IEEE Systems, Man, & Cybernetics Magazine - Vol. 9, No. 3, July 2023

IEEE Systems, Man, and Cybernetics Magazine
EDITOR-IN-CHIEF Yo-Ping Huang, Vice President, Membership and Student Activities Committee Ferat Sahin
Tingwen Huang Conferences and Meetings Karen Panetta, Chair György Eigner
Texas A&M University at Qatar, Doha, Qatar Karen Panetta, Vice President, György Eigner, Coordinator
tingwen.huang@qatar.tamu.edu Membership and Student Activities Christopher Nemeth
Lance Fung Chapter Coordinators Subcommittee
Okyay Kaynak, Vice President, Lance Fung, Chair
ASSOCIATE EDITORS Organization and Planning Robert Kozma
Roxanna Pakkar Enrique Herrera-Viedma
Mali Abdollahian, Australia
Shun-Feng Su, Vice President, Publications Imre Rudas
Mohammad Abdullah-Al-Wadud, Saudi Arabia Saeid Nahavandi
Ying (Gina) Tang, Vice President, Finance Adrian Stoica
Choon Ki Ahn, Korea Okyay Kaynak
Maria Pia Fanti
Bernadetta Kwintiana Ane, India Vladik Kreinovich, Treasurer Tadahiko Murata
Karen Panetta
Krishna Busawon, UK Tom Gedeon, Secretary Ferial El-Hawary
Hideyuki Takagi
György Eigner, Hungary Paolo Fiorini
Valeria Garai, Asst. Secretary Ching-Chih Tsai
Liping Fang, Canada Shun-Feng Su
Hossam Gaber, Canada Virgil Adumitroaie
Editors Student Activities Subcommittee
Aurona Gerber, South Africa Peng Shi
Peng Shi, EIC, IEEE Transactions Roxanna Pakkar, Chair
Jason Gu, Canada Ashitey Trebi-Ollennu
on Cybernetics Bryan Lara Tovar
Abdollah Homaifar, USA Hideyuki Takagi Piril Nergis
Okyay Kaynak, Turkey Robert Kozma, EIC, IEEE Transactions
JuanJuan Li
Kevin Kelly, Ireland on Systems, Man, and Cybernetics: Systems Standards Committee X. Wang
Kazuo Kiguchi, Japan Ljiljana Trajkovic, EIC, IEEE Transactions Loi Lei Lai, Chair (China)
Abbas Khosravi, Australia on Human–Machine Systems Chun Sing Lai, Vice Chair (UK) Young Professionals Subcommittee
Vladik Kreinovich, USA Bin Hu, EIC, IEEE Transactions Wei-jen Lee (USA) György Eigner, Chair
Wei Lei, China on Computational Social Systems Thomas Strasser (Austria) Ronald Bock
Kovács Levente, Hungary Dongxiao Wang (Australia) Sonia Sharma
Tiago H. Falk, EIC, SMC E-Newsletter
Huaqing Li, China Chaochai Zhang (China) Xuan Chen
Jing Li, China Haibin Zhu (Canada) Raul Roman
Dongning Liu, China Industrial Liaison Committee Fernando Schramm
Agostino Marcello Mangini, Italy Christopher Nemeth, Chair Nominations Committee
Darius Nahavandi, Australia Sunil Bharitkar Imre Rudas, Chair
Chris Nemeth, USA Michael Henshaw C.L. Philip Chen
Vinod Prasad, Singapore Yo-Ping Huang Vladimir Marik
Hong Qiao, China Azad Madni Ljiljana Trajkovic
Rodney Roberts IEEE PERIODICALS
Ferat Sahin, USA
Awards Committee MAGAZINES DEPARTMENT
Mehrdad Saif, Canada
Claudio Savaglio, Italy Organization and Planning Committee Dimitar Filev, Chair 445 Hoes Lane, Piscataway, NJ 08854 USA
Bahram Shafai, USA Vladimir Marik, Chair Edward Tunstel
Peter Stavenick
Yin Sheng, China Enrique Herrera Viedma Laurence Hall Journals Production Manager
Jinshan Tang, USA Mengchu Zhou Ljiljana Trajkovic
Dimitar Filev Peng Shi Katie Sullivan
Liqiong Tang, New Zealand
Robert Woon Michael H. Smith Senior Manager, Journals Production
Ying Tan, Australia
Jiacun Wang, USA Ferat Sahin Vladik Kreinovich Janet Dudar
Yingxu Wang, Canada Edward Tunstel Senior Art Director
Margot Weijnen, Netherlands Larry Hall Fellows Evaluation Committee
Jay Wang Gail A. Schnitzer
Peter Whitehead, USA Edward Tunstel, Chair
Associate Art Director
Zhao Xingming, China Michael Smith Mengchu Zhou, Vice Chair
Laurence T. Yang, Canada C.L. Philip Chen Liping Fang Theresa L. Smith
Karen Panetta Maria Pia Fanti Production Coordinator
Vladimir Marik
SOCIETY BOARD OF Mark David
Publications Ethics Committee Germano Lambert-Torres
GOVERNORS Director, Business Development—
Shun-Feng Su, Chair Karen Panetta Media & Advertising
Executive Committee Imre Rudas Ching-Chih Tsai
Sam Kwong, President Edward Tunstel Felicia Spagnoli
Imre Rudas, Jr. Past President Vladik Kreinovich Electronic Communications Advertising Production Manager
Edward Tunstel, Sr. Past President Peng Shi Subcommittee Peter M. Tuohy
Fei-Yue Wang Saeid Nahavandi, Chair Production Director
Enrique Herrera Viedma, Vice President,
Robert Kozma Syed Salaken, Web Editor
Cybernetics Kevin Lisankie
Ljiljana Trajkovic Darius Nahavandi, Social Media
Saeid Nahavandi, Vice President, Editorial Services Director
Haibin Zhu Mariagrazia Dotoli
Human–Machine Systems Dawn M. Melley
Patrick Chan
Thomas I. Strasser, Vice President, History Committee Haibin Zhu Staff Director, Publishing Operations
Systems Science and Engineering Michael Smith Ying (Gina) Tang
IEEE SYSTEMS, MAN, AND CYBERNETICS MAGAZINE (ISSN 2333-942X) is published quarterly by the Institute of Electrical and Electronics Engineers, Inc. Headquarters: 3 Park Avenue, 17th Floor, New York, NY 10016-5997 USA, Telephone: +1 212 419 7900. Responsibility for the content rests upon the authors and not upon the IEEE, the Society or its members. IEEE Service Center (for orders, subscriptions, address changes): 445 Hoes Lane, Piscataway, NJ 08855-1331 USA. Telephone: +1 732 981 0060. Subscription rates: Annual subscription rates included in IEEE Systems, Man, and Cybernetics Society member dues. Subscription rates available on request. Copyright and reprint permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright law for the private use of patrons 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles without a fee. For other copying, reprint, or republication permission, write Copyrights and Permissions Department, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. Copyright © 2023 by the Institute of Electrical and Electronics Engineers Inc. All rights reserved.

IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.
Digital Object Identifier 10.1109/MSMC.2023.3280352

Smart Solutions
for Technology
www.ieeesmc.org
Volume 9, Number 3 • July 2023
Features
UAVs-Enabled Maritime Communications
Opportunities and Challenges
By Muhammad Waseem Akhtar and Nasir Saeed
An ASD Classification Based on a Pseudo 4D ResNet
Utilizing Spatial and Temporal Convolution
By Shuaiqi Liu, Siqi Wang, Hong Zhang, Shui-Hua Wang, Jie Zhao, and Jingwen Yan
Tooth.AI
Intelligent Dental Disease Diagnosis and Treatment Support Using Semantic Network
By Hossam A. Gabbar, Abderrazak Chahid, Md. Jamiul Alam Khan, Oluwabukola Grace Adegboro, and Matthew Immanuel Samson
MDN-Enabled SO for Vehicle Proactive Guidance in Ride-Hailing Systems
Minimizing Travel Distance and Wait Time
By Xiaoming Li, Jie Gao, Chun Wang, Xiao Huang, and Yimin Nie
Edge Processing
A LoRa-Based LCDT System for Smart Building With Energy and Delay Constraints
By B Shilpa, Hari Prabhat Gupta, and Rajesh Kumar Jha
ABOUT THE COVER

Functional magnetic resonance imaging
display of the human brain.
©SHUTTERSTOCK/STEPAN KAPL
Departments & Columns
Conference Reports
Mission Statement
The mission of the IEEE Systems, Man, and Cybernetics Society is to serve the interests of its members
and the community at large by promoting the theory, practice, and interdisciplinary aspects of systems
science and engineering, human–machine systems, and cybernetics. It is accomplished through
conferences, publications, and other activities that contribute to the professional needs of its members.
July 2023 IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE 1

UAVs-Enabled Maritime Communications:
Opportunities and Challenges
by Muhammad Waseem Akhtar and Nasir Saeed
The next generation of wireless communication systems will integrate terrestrial and nonterrestrial networks, targeting the coverage of undercovered regions, especially those connected to marine activities. Unmanned aerial vehicle (UAV)-based connectivity solutions offer significant advances to support conventional terrestrial networks. However, the use of UAVs for maritime communication is still an unexplored area of research. Therefore, this article highlights different aspects of UAV-based maritime communication, including the basic architecture, various channel characteristics, and use cases. The article then discusses several open research problems, such as mobility management, trajectory optimization, interference management, and beam forming.
Date of current version: 17 July 2023

Introduction
Seawater covers around 70% of the Earth, and more than 90% of the world’s products are moved by a commercial fleet of approximately 46,000 ships [1], [2], [3]. The world is experiencing an ever-growing marine economy, with continuous development in conventional sectors, such as fisheries and transportation, and new dimensions in maritime activities, such as tourism, oil and gas exploration, and weather monitoring. Most of these applications depend on a reliable and efficient maritime communication network.
Existing maritime networks mainly comprise very high frequency (VHF) radios, whose bandwidth is too low, or satellite communication networks, whose cost is too high, to support the International Maritime Organization (IMO) eNavigation concept. Emerging maritime networks need wideband, low-cost communication systems to achieve better security, surveillance, and coverage for efficient working conditions for the onboard crew and passengers. Although wireless broadband access (WBA) can fulfill the IMO eNavigation requirement, the implementation of WBA technologies in maritime areas is questionable.
Typical marine networks comprise a mesh network of different entities in an integrated satellite–air–sea–ground network. A stand-alone satellite-based solution has the potential to cover a large area with high-speed data transmission. However, it suffers from unavoidable large propagation delays and expensive implementation costs. Alternatively, HF/VHF-based systems are simple to implement but have limited utilization, i.e., only in vessel identification, tracking/monitoring, and alerting. Therefore, developing high-speed maritime networks is of great importance to improve the onboard user experience. As a result, maritime communications have garnered substantial interest in the recent past, where the primary purpose is to enhance the broadband network coverage for terrestrial users with the aid of UAVs that can serve as aerial base stations (BSs) and relays [4].
In this context, UAVs can play a vital role in maritime communications either as relays or flying sensors, gathering information in cheaper, safer, and faster ways. They can successfully perform complex tasks with less human involvement and cost. UAVs in the maritime network have the potential to manage, control, and monitor maritime activities, including the identification of defects in ships to diagnose and resolve issues while keeping ships at sea, reducing maintenance costs and time. Moreover, UAVs can also be helpful for maritime natural resource exploration, such as oil and gas exploration, especially in harsh and challenging environmental conditions. Furthermore, UAVs equipped with high-resolution cameras can also be used for security and surveillance purposes. A single drone can gather more information than cameras installed at different locations. Inspired by these trends, we present the key aspects of UAV-aided maritime communication networks. The goal is to identify the prospects and challenges of deploying UAVs in the maritime network. Our major contributions in this article are summarized as follows:
◆ First, we present a design architecture of a UAV-based maritime communication network.
◆ Then, we discuss the channel characteristics in maritime communication networks, such as air-to-sea and near-sea-surface channels. Also, we present the use cases of UAV-aided maritime communication (Table 1).
◆ Finally, we present the research challenges and future directions for UAV-based maritime communication networks.
UAV-Aided Maritime Communication Network Architecture
The basic network architecture of a UAV-aided maritime communication network is shown in Figure 1. In such a network, UAVs are simultaneously connected with the maritime control station (MCS), satellites, and sea vessels. The communication links between UAVs and the MCS, satellites, and ships are primary, whereas the communication link between satellites and the MCS is secondary. In the following, we discuss the MCS, control links, and data links in detail.
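As an illustrative aside, the link roles just described can be written down as a small graph. The sketch below is a minimal model under stated assumptions, not part of the architecture itself: the node labels, the `Link` and `Role` types, and the `neighbors` and `route` helpers are hypothetical names introduced here, and only the four links named in the paragraph above are included. Treating the secondary satellite-to-MCS link as a fallback route when the direct UAV-to-MCS link is unavailable is also an assumption made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass(frozen=True)
class Link:
    a: str
    b: str
    role: Role


# Link roles as stated in the text; node names are illustrative labels.
LINKS = [
    Link("UAV", "MCS", Role.PRIMARY),
    Link("UAV", "Satellite", Role.PRIMARY),
    Link("UAV", "Ship", Role.PRIMARY),
    Link("Satellite", "MCS", Role.SECONDARY),
]


def neighbors(node, role=None):
    """Return the nodes linked to `node`, optionally filtered by link role."""
    out = []
    for link in LINKS:
        if role is not None and link.role is not role:
            continue
        if link.a == node:
            out.append(link.b)
        elif link.b == node:
            out.append(link.a)
    return sorted(out)


def route(src, dst, down=frozenset()):
    """Breadth-first search for a shortest hop path.

    `down` is a collection of frozenset({a, b}) pairs naming links that are
    assumed to be unavailable (a hypothetical failure model).
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbors(path[-1]):
            if frozenset((path[-1], nxt)) in down or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(neighbors("UAV", Role.PRIMARY))
    print(route("UAV", "MCS"))
    print(route("UAV", "MCS", down={frozenset(("UAV", "MCS"))}))
```

In this toy model, the UAV is the hub of all primary links, so a ship that loses its UAV link has no remaining path to the MCS, while a UAV that loses its direct MCS link can still be reached through the satellite.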
MCS
An MCS is the brain of a maritime network, positioned on a ship, on a UAV, or underwater to support the operators of UAVs. The control station may be either stationary or movable for command and control (C&C) transmission. The control station equipment can be as simple as a laptop with an antenna connected to it or as complex as a rat’s nest of wires, antennas, computers, electronics boxes, joysticks, and monitors.
Control Links
The link used to communicate from a BS on a ship or at the coast to users (UAV, satellite, or ship) in UAV-assisted maritime networks is called the control link. The control link is responsible for transmitting commands and controls from a BS to the users in the uplink. The control links from a maritime BS to the satellite may be utilized for orbit selection, speed control of the satellite, and coverage control. Similarly, for the UAV and maritime vessels, control links are used for speed control, path selection, and transmission direction control.
Data Links
Information is exchanged in maritime networks using data links, where the communication technologies are responsible for data delivery between system elements and external units. The fundamental challenges of the maritime network are the security of C&C from a BS to the users and the cognitive control of the bandwidth, frequency, and data flow. The following are the different types of data links that exist in maritime networks.
UAV–Ship and Satellite–Ship Data Links
These links deliver information from the UAV/satellite to a sea-based reception device. They are responsible for the data communication between UAVs and ships and between satellites and ships.
UAV–Satellite, UAV–UAV, and Satellite–Satellite Data Links
UAVs can cooperate with other space/airborne platforms, such as satellites and other UAVs. These types of data links demand that air-to-air communication be established between the platforms. Establishing these links is more challenging due to the relative movement of both transmitters and receivers [5].

Table 1. A depiction of UAV-based prospective integrated solutions for challenges in maritime applications.
Use Cases | Challenges | Prospective UAV-Based Integrated Solutions
Relaying | Mobility, beam forming, and handovers | Sonar, UAVs, and machine learning
IoT data harvesting | Interference and path planning | Sonar, UAVs, and machine learning
Wireless power transfer | 3D handovers | Sonar, UAVs, and machine learning
Computation offloading | Complexity | UAVs and machine learning
Localization | Channel variations and 3D Doppler effect | Sonar, UAVs, satellite, and machine learning
Delivering goods | Path/trajectory planning | Sonar, UAVs, satellite, and machine learning
Security, safety, and fault identification | Cost and complexity | Sonar, UAVs, satellite, and machine learning
IoT: Internet of Things.

Figure 1. A depiction of the basic network architecture for UAV-aided maritime communication. [Diagram labels: satellite, UAV, ships, maritime control station, and underwater vessels, connected by satellite-to-UAV, satellite-to-ship, and UAV-to-ship links and by primary and secondary control links.]

Channel Characteristics
It is important to understand and model the wireless channels in order to establish an efficient maritime communication network. As far as maritime communication is concerned, three major channel types are to be investigated. The first is the air-to-sea channel, used to communicate between UAVs and ships. The second is the near-sea-surface channel, used for ship-to-ship communication. Finally, the underwater communication channel is used to communicate between underwater vessels. Underwater communication channels can further be divided into near-sea-surface (i.e., up to 600 m below the sea surface) and deep-sea underwater (i.e., more than 600 m below the sea surface) wireless channels due to differences in their characteristics, such as the temperature, salinity, and pressure at different depths.
Maritime wireless channels differ from conventional terrestrial channels in many aspects, such as the ducting effect and heavy scattering over the sea surface, unpredictable sea wave proportions, water density, and temperature variations in the sea. All of these aspects result in significant complexity in the receiver design. Although the satellite-to-ship channels have been explored extensively in the past [6], the wireless channels
in the terrestrial and nonterrestrial integrated networks (TaNTIN) [7] are less explored for the near-coast situation. Therefore, researchers have recently investigated maritime wireless channels and developed several models.
The two most essential and distinguishing properties of maritime wireless channels are sparsity and location dependence. Sparsity is extensively observed in the maritime environment, especially for the unpredictable scattering and distribution of maritime receivers. In contrast, the location dependency feature implies that there should be a completely different channel model for different locations of the maritime receiver. Figure 2 depicts the challenging maritime environment and channel variations observed at sea level due to the traveling sea waves, moving UAVs, and ships. Sea waves traveling in random directions and with dynamic wave amplitudes cause high fluctuations in the receiver’s signal-to-interference-plus-noise ratio (SINR) level. The mobility of ships, sea waves, and UAVs in random directions makes channel estimation challenging for the receiver. Similarly, the variable speed of sea waves, UAVs, and ships leads to an unpredictable Doppler effect. Consequently, these traits create new difficulties and dimensions in the design of UAVs in a maritime communication system.
In the following, we discuss different models for the air-to-sea, near-sea-surface, and underwater wireless channels.

Figure 2. The UAV as a use case of reliable maritime communication in dynamically changing environmental conditions. [Diagram labels: UAVs, ships, underwater vessels, and sea waves, with UAV-to-UAV and UAV-to-ship links.]

Air-to-Sea Channel
Air-to-ground channels are widely studied in the literature [2]. However, the air-to-sea channel differs from the air-to-ground channel in many aspects, such as ducting, the sparsity effect, and instability in the maritime environment, which lead to remarkable differences in channel modeling. In many cases, the two-ray model is applied. The first component of the two-ray model is the line-of-sight (LoS) component, and the second is the surface-reflected ray component. When the transmission distance is very large and the transmitter is located at some notable height, the curved-Earth two-ray model is used to account for the Earth curvature [8]. In some cases, the rays received from other weak scattered paths can also be considered, apart from the two strong paths. However, a dispersion around the maritime receiver is observed when the transmitter is located at a very high altitude [9]. Compared to the terrestrial (i.e., near the urban area) environment, a maritime receiver is expected to face more sparse scattering, which may lead to simplification in the air-to-sea channel modeling.
As discussed earlier, a standard two-ray or three-ray model can be used in an air-to-sea channel. However, due to long-distance transmission in the maritime environment, two main elements, i.e., the ducting effect and Earth curvature, must be considered. Also, the location of the transmitter (UAV or satellite) is usually above the ducting layer; therefore, a part of the radio energy could be absorbed in the ducting layer, especially when the grazing angle (the angle between the sea surface and the direct path) is less than a threshold. In this case, the ray-trapping action of the ducting layer can also increase the power of the received signal, resulting in reduced path loss [10].
Near-Sea-Surface Channel
As mentioned earlier, near-sea-surface (such as ship-to-ship, ship-to-land, and land-to-ship) channels are distance dependent. Different channel models can be used for different locations of transmitters and receivers. The standard two-ray model can be used for a modest distance between the transmitter and receiver. However, the LoS and the reflected ray components vanish due to Earth curvature with increased distance between the transmitter and receiver. Nevertheless, the receiver can still receive the transmitted signal due to the ducting effect, provided there is proper beam alignment between the transmitter and receiver. In short, as the distance between the transmitter and receiver
increases, the two- or three-ray channel model is replaced by a duct-only model. The ducting effect across the sea surface allows beyond-LoS (BLoS) transmission in marine communications, which has gained much popularity in secure and long-distance maritime communication.
Figure 3 shows the path loss [10], [11] against the distance between the transmitter and receiver for different maritime channels with acoustic waves at 500-kHz frequency. The path loss varies with the level of water density in the wireless channel. For instance, the path loss in deep seawater is higher than that in free-space, near-sea-surface, and sea-surface channels. The reasons for this are the factors of temperature, shadowing, and density of the water. We also show the trend of path loss for radio-frequency (RF) waves in Figure 4, where the RF waves face the highest path loss in deep-seawater channels compared to other maritime wireless channels. For the free-space channel, we do not consider shadowing caused by the sea waves; rather, we consider the LoS communication link between the UAV and ship at sea level. By comparing Figures 3 and 4, we can determine that acoustic waves are more suitable for maritime communication in the underwater environment. At the same time, RF is better suited for near-surface and free-space links above the seawater environment.

Figure 3. A depiction of the path loss for free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels at 500-kHz acoustic waves. [Plot: path loss (dB) versus distance (m), 0 to 1,000 m; curves: Free Space (LoS), Sea Surface, Near Sea Surface, Deep Sea Water.]
Figure 4. A depiction of the path loss for free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels at 500-kHz RF waves. [Plot: path loss (dB) versus distance (m), 0 to 1,000 m; curves: Free Space (LoS), Sea Surface, Near Sea Surface, Deep Sea Water.]

Use Cases of UAV-Aided Maritime Communication
This section covers various use cases of UAV-aided maritime communication, such as relaying, wireless power transfer, data offloading, and localization (Table 1). In the following, we discuss each of the use cases in detail.
UAV-Based Relaying
UAV-based communications have growing importance for many applications, particularly with the arrival of high-altitude, long-endurance platforms. These UAVs can enable BLoS communications in support of a range of maritime activities. The UAV-based airborne relay will enable range extension for maritime communication services. Also, with the flexible mobility and high possibilities of LoS air-to-sea links, UAV-enabled relays can display increasingly important advantages for maritime networks, as shown in Figure 1.
UAV-Aided Maritime Internet of Things Data Harvesting
Underwater sensor networks have attracted a lot of research attention in recent years. However, it is evident that major obstacles remain to be solved. Several telemetry activities for maritime monitoring, research, and exploration can be performed based on collecting data from marine buoys rapidly and in real time. Satellites, ships, and airplanes can all collect marine data, but satellite transmission is often expensive and bandwidth limited, while manned ships/aircraft have high manpower/mission costs and risks. Therefore, using UAVs that can resist strong winds over the sea surface as agile data collectors appears to be an exciting solution.
UAVs can fly near the buoys and use a stable communication channel to wirelessly and quickly capture a significant amount of data because of their high mobility.
Maritime Wireless Power Transfer
Wireless charging has been acknowledged as a viable technology to provide an energy supply for battery-limited nodes, such as underwater Internet of Things (IoT) devices and sensors. UAV-based wireless charging can bring more flexibility in terms of mobility and accessing hard-to-reach areas [12]. Due to the LoS links between the UAV and sensors, the UAV-enabled wireless power transfer system may substantially improve energy transfer efficiency by deploying the UAV as a mobile energy transmitter.
Maritime Computation Offloading
Because of their great sensitivity to time and energy consumption, many computation- and data-intensive jobs are challenging to accomplish on energy-constrained maritime devices. UAV-based mobile edge computing (MEC) is a promising solution to overcome this challenge, providing ubiquitous Internet services for emerging maritime applications, such as marine environmental monitoring, ocean resource exploration, disaster prevention, and navigation. As a result, UAV-based MEC has emerged as a new paradigm that receives great attention in both academic and industrial sectors. Increasing demand for large-scale connection and communications, ultralow information-processing latency, and high dependability in delay-sensitive marine applications poses problems for delivering reliable quality of service in a resource-constrained maritime network.
Maritime Localization
Localization plays a significant role in communication in the TaNTIN environment [1]. Maritime localization uses a ship’s measuring devices to determine the location of other nautical targets. Ocean surveillance satellites can take advantage of space and altitude to cover large ocean areas, monitor submarine operations in real time, and detect radar signals sent by ships. Nevertheless, the position precision based on satellites may not be satisfactory, especially in unforeseen situations that require high accuracy, such as ocean rescue and noncooperative (enemy) ship location. In this case, UAVs can be used to improve the localization accuracy of the targets where the UAVs can be controlled remotely [3]. Nevertheless, the self-positioning of UAV platforms and determining the location of unknown marine targets by UAVs are challenging.
Research Challenges and Directions
Although there has been great interest in UAV-aided maritime communication over the past few years, various open research issues should be targeted. In the following, we explore some promising upcoming research challenges for UAV-aided maritime communication networks.
UAV 3D Maritime Trajectory Design
Exploiting the high mobility of UAVs is projected to unlock the full potential of UAV-to-sea communications. Various trajectory optimization models exist in the literature that optimize air-to-sea communications under different UAV configurations. The problems of trajectory optimization are often nonconvex, and variants of the successive convex approximation (SCA) technique are used to solve them suboptimally. Nevertheless, these SCA-based approaches depend heavily on trajectory initialization and do not explicitly account for the wind effect. Furthermore, for fixed-wing UAVs that must sustain forward motion to stay in the air, the computational complexity and resulting trajectory complexity make it costly to collect a high volume of data. Therefore, designing an energy-efficient 3D maritime UAV trajectory is very important.
UAV-to-Sea and UAV-to-UAV Interference Management
For maritime applications, UAVs largely send data in the downlink. Nevertheless, the capacity of maritime-connected UAVs to establish LoS communication with several sea vessels might lead to severe mutual interference between them and the ships. To overcome this difficulty, additional advances in the architecture of future UAV-based maritime networks, such as enhanced receivers, 3D frequency reuse, and 3D beam forming, are needed. For instance, because of their capabilities of detecting and categorizing images, deep learning models can be implemented on each UAV to recognize numerous environmental elements, such as the location of UAVs and ships. Such a method will enable each UAV to change its beamwidth tilt angle to minimize the interference to the ships.
3D Mobility Management (3D Handoffs)
UAVs can be deployed as aerial BSs or aerial users in UAV-assisted maritime networks. In the case of their
deployment as the aerial BSs, UAVs can be deployed far away from maritime users, such as a ship. This might degrade the signal strength at the receiver and cause poor mobility performance, such as radio connection loss and handover failure. In addition, loss of the C&C signal may result in dangerous events, such as the collision of UAVs with commercial aircraft, or may even cause UAVs to fall into the sea.

In the other case, UAVs are deployed as aerial users in maritime communication networks. However, they can still face many mobility management issues, especially when there is no LoS link between the maritime BS and the aerial users [13]. Although the sidelobes of the BS antenna can still serve aerial users, there may be a loss of connection and handover failure due to lower antenna gains in the sidelobes [1]. Hence, excellent mobility management is essential for enabling reliable connections between UAVs and ships sailing on the sea.

Beam Forming for High-Mobility Ships and UAVs
In maritime communication networks, beam-forming and power control issues are more challenging due to the frequent switching of frequency-access points and collaborative operation. Joint power control and beam forming provide reliable coverage for UAV-assisted maritime networks, but a fixed beam-forming vector may lead to SINR variations due to variations in angle of departure (AoD) and angle of arrival (AoA). Empirical measurements with Doppler effects can be of substantial value for constructing more accurate statistical air-to-sea channel models, and modern technologies can improve beam forming and mobility management for ships and UAVs.

Conclusion
This article presents the possible architecture, important applications, challenges, and solutions for using UAVs in maritime networks. This article identifies various types of wireless maritime channel characteristics. Furthermore, several use cases of UAV-assisted maritime communications, such as monitoring and surveillance, relaying, IoT harvesting, computation offloading, localization, and the delivery of goods, are discussed. This article further tries to spur the interest of researchers in the future evolution of UAV-enabled maritime communication networks that will enable digital use cases for the future marine economy.

About the Authors
Muhammad Waseem Akhtar (muhammadwaseem.akhtar@miun.se) is a postdoctoral research fellow with the Information Systems and Technology department of Mid Sweden University, Sundsvall 851 70, Sweden. His research interests include the Internet of Things; cooperative communication; energy- and bandwidth-efficient network designing; massive multiple-input, multiple-output and device-to-device communication; artificial intelligence; machine learning and blockchain technologies; and maritime communication. He is a Member of IEEE.

Nasir Saeed (mr.nasir.saeed@ieee.org) earned his Ph.D. degree in electronics and communication engineering from Hanyang University, Seoul, South Korea, in 2015. He is currently an associate professor with the Department of Electrical and Communication Engineering at United Arab Emirates University, Al Ain 15551, United Arab Emirates. His research interests include nonconventional communication networks, heterogeneous vertical networks, multidimensional signal processing, and localization.

References
[1] J.-B. Wang et al., “Unmanned surface vessel assisted maritime wireless communication toward 6G: Opportunities and challenges,” IEEE Wireless Commun., early access, 2022, doi: 10.1109/MWC.008.2100554.
[2] Y. Song et al., “Internet of maritime things platform for remote marine water quality monitoring,” IEEE Internet Things J., vol. 9, no. 16, pp. 14,355–14,365, Aug. 2022, doi: 10.1109/JIOT.2021.3079931.
[3] F. S. Alqurashi et al., “Maritime communications: A survey on enabling technologies, opportunities, and challenges,” IEEE Internet Things J., early access, 2022, doi: 10.1109/JIOT.2022.3219674.
[4] M. W. Akhtar et al., “The shift to 6G communications: Vision and requirements,” Human Centric Comput. Inf. Sci., vol. 10, no. 1, pp. 1–27, Dec. 2020, doi: 10.1186/s13673-020-00258-2.
[5] N. Saeed et al., “Point-to-point communication in integrated satellite-aerial 6G networks: State-of-the-art and future challenges,” IEEE Open J. Commun. Soc., vol. 2, pp. 1505–1525, Jun. 2021, doi: 10.1109/OJCOMS.2021.3093110.
[6] C. Azzarello, C. Gerbino, and R. Mehta, “Enhanced sensing methods for UAV-based disaster recovery,” Comput. Sci. Eng. Senior Theses, Santa Clara Univ., Dept. Comput. Sci. Eng., Santa Clara, CA, USA, 2021. [Online]. Available: https://scholarcommons.scu.edu/cseng_senior/194.
[7] M. W. Akhtar and S. A. Hassan, “TaNTIN: Terrestrial and non-terrestrial integrated networks-a collaborative technologies perspective for beyond 5G and 6G,” Internet Technol. Lett., early access, 2021, doi: 10.1002/itl2.274.
[8] A. Verma et al., “VaCoChain: Blockchain-based 5G-assisted UAV vaccine distribution scheme for future pandemics,” IEEE J. Biomed. Health Inform., vol. 26, no. 5, pp. 1997–2007, May 2022, doi: 10.1109/JBHI.2021.3103404.
[9] S. Bauk, “Performances of some autonomous assets in maritime missions,” TransNav, Int. J. Marine Navig. Safety Sea Transp., vol. 14, no. 4, pp. 875–881, Feb. 2021, doi: 10.12716/1001.14.04.12.
[10] J. Wang et al., “Wireless channel models for maritime communications,” IEEE Access, vol. 6, pp. 68,070–68,088, Nov. 2018, doi: 10.1109/ACCESS.2018.2879902.
[11] J. Wang and S. Wang, “Seawater short-range electromagnetic wave communication method based on OFDM subcarrier allocation,” J. Comput. Commun., vol. 7, no. 10, pp. 63–71, Jan. 2019, doi: 10.4236/jcc.2019.710006.
[12] E. Lvsouras and A. Gasteratos, “A new method to combine detection and tracking algorithms for fast and accurate human localization in UAV-based SAR operations,” in Proc. IEEE Int. Conf. Unmanned Aircraft Syst. (ICUAS), 2020, pp. 1688–1696, doi: 10.1109/ICUAS48674.2020.9213873.
[13] Z. Haider et al., “A novel cooperative relaying-based vertical handover technique for unmanned aerial vehicles,” Secure Commun. Netw., vol. 2022, Sep. 2022, Art. no. 5702529, doi: 10.1155/2022/5702529.

An ASD Classification Based on a Pseudo 4D ResNet
Utilizing Spatial and Temporal Convolution
by Shuaiqi Liu, Siqi Wang, Hong Zhang, Shui-Hua Wang, Jie Zhao, and Jingwen Yan
Date of current version: 17 July 2023

The psychiatric condition known as autism spectrum disorder (ASD) affects children and adults alike. As a medical imaging technology, functional magnetic resonance imaging (fMRI) is widely used to study the brains of persons with ASD. This study introduces a novel technique, a pseudo 4D ResNet (P4D ResNet), to simultaneously extract and classify the brain activity of ASD patients. A P4D ResNet, which mainly consists of two different residual blocks stacked together, can extract both temporal and spatial information from fMRI data. In a P4D ResNet, to reduce computational and parametric quantities, each residual block combines a 3D spatial filter and a 1D temporal filter instead of a 4D spatiotemporal convolution, and these filters can be computed in parallel. Due to the high dimensionality of the complete data and the limited amount of data, in this article, each piece of fMRI data is
2333-942X/23©2023IEEE Ju ly 2023 IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE 9

sampled at equal intervals of a set length in the time dimension for data expansion. Compared with other existing models, the experiments show that the proposed model for ASD classification achieved better results.

Introduction
ASD, usually known as autism, is a common neurodevelopmental cognitive condition in children that is primarily inherited. Neurodegenerative conditions, including autism spectrum diseases, have recently drawn more attention. Patients usually have very slow language development and are unable to communicate properly. They are not interested in the activities around them and rarely initiate social interactions. Moreover, they often exhibit repetitive, stereotyped behaviors and are extremely resistant to change and transformation. The families of ASD patients suffer significant psychological and financial stress for a protracted period of time due to the lack of a specific prescription for ASD and the difficulties in finding a permanent cure. This causes losses and injury to individuals, families, and society at large. Traditional ASD diagnostic techniques are time consuming and prone to error because they are dependent on the Diagnostic and Statistical Manual of Mental Disorders. As a result, the creation of a fully automated diagnostic method for ASD is required.

Numerous functional neuroimaging techniques have been utilized in brain study since the advancement of medical imaging. One of the most widely used is fMRI [1], [2], [3], [4]. The high temporal and spatial resolution obtained by fMRI makes it possible to see both physiological and pathological functional brain activity [5], [6]. The blood-oxygenation-level-dependent signal serves as the foundation for the fundamental theory of fMRI, which in brain research can be separated into two modalities, namely, task and resting states. Resting-state fMRI (rs-fMRI) requires subjects to be fully relaxed to acquire images, and the images acquired have high spatial and temporal resolution. Because the acquisition method is quick and easy, it is widely applied in the classification of ASDs [7], [8], [9]. The rs-fMRI data used in this study were mainly dichotomized into ASD patients and typical controls (TCs).

In terms of model composition, the research on ASD classification can generally be categorized into two types: traditional machine learning and deep learning. Traditional machine learning methods provide effective models for ASD classification and recognition problems. Scholars from various countries have proposed different traditional machine learning-based methods for ASD classification, and the main steps include manual feature extraction and a classifier. Iidaka [10] input the correlation matrix calculated from rs-fMRI time-series data into a probabilistic neural network (PNN) for ASD classification. The PNN classifier consists of four fully interconnected layers: an input, a pattern, a summation, and an output layer. The proposed algorithm obtained approximately 90% accuracy in 312 subjects with ASD and 328 subjects with typical development. Bi et al. [11] proposed a random NN cluster consisting of multiple NNs to classify 50 ASD patients and 42 TCs, addressing the low accuracy of a single NN in separating ASD patients from TCs. They constructed five random NN clusters, namely, a random backpropagation NN cluster, a random probabilistic NN cluster, a random learning vector quantization NN cluster, a random competitive NN cluster, and a random Elman NN cluster. Among them, the accuracy of the random Elman NN cluster was greatly improved.

Mostafa et al. [12] proposed a brain network-based algorithm for ASD classification. This algorithm used 264 regions-based wrapping schemes from the fMRI of the brain to construct a brain network. Then, 264 original brain features were defined by the 264 feature values of the Laplacian matrix of the brain network, and three additional features of the brain network were defined by the network centrality. Finally, this algorithm obtained 64 discriminative features through a feature-selection algorithm and obtained an accuracy of 77.7% in ASD classification. Liu et al. [13] proposed an ASD classification algorithm based on dynamic functional connectivity and multitask feature selection, which was validated by the fMRI data from ABIDE I with a classification accuracy of 76.8%. Zhao et al. [14] used the method of extracting central moments of data to extract time-invariant features in low- or high-order dynamic functional connectivity networks of fMRI data. By integrating the features extracted from conventional functionally connected, low-order dynamically connected, and high-order dynamically connected networks, an accuracy of up to 83% was obtained in 45 ASD patients and 47 TCs by using a linear, kernel-based support vector machine (SVM) classifier.

Deep learning algorithms have been well applied in various fields [15], [16], [17]. Deep learning-based ASD classification algorithms have also recently gained popularity due to the quick advancement of computers. Among the most widely used are convolutional NNs (CNNs). For example, Xiao et al. [18] decomposed the dataset of each subject into 30 independent components. Then, an array of 84 key features of all the subjects was reshaped into a 3,400 × 84-dimensional key-feature matrix and was input into a stacked autoencoder for classification. This study

obtained an average classification accuracy of 87.21% in 84 subjects. Jia et al. [19] extracted the functional connectivity correlation matrix of the brain from rs-fMRI data after preprocessing and then used a stacked autoencoder for ASD classification. ASD identification was obtained with an accuracy of 95.27% in 656 subjects. In 2019, Rathore et al. [20] obtained a classification accuracy of 69.2% in 1,035 subjects with a simple three-layer NN by using a functional correlation and its topological features. In the same year, Zhuang et al. [21] proposed an invertible network for ASD classification and biomarker selection. This invertible network has two invertible blocks that map the data from the input domain to the feature domain. Then, a fully connected (FC) layer was applied for classification, and a classification accuracy of 71% was achieved in 530 ASD patients and 505 subjects. In 2020, Tang et al. [22] proposed an end-to-end multimodal architecture based on deep NNs that can analyze the region-of-interest time-series activation maps by combining different deep learning networks. This method can analyze functional images more comprehensively and achieved 74% classification accuracy among 1,035 subjects. In 2021, Shao et al. [23] proposed an ASD classification algorithm by combining deep feature selection and graph convolutional networks (GCNs), which achieved better ASD classification results. In the same year, Yin et al. [24] constructed brain networks from brain fMRI images and then combined self-encoders and deep NNs for ASD classification, which achieved good classification results.

The quantity of data voxels is substantial because fMRI images are an arrangement of a series of 3D images acquired in a time series. The huge amount of spatiotemporal information within the fMRI 4D image data is ignored in most current methods, which inevitably leads to the loss of important information. Traditional models are unable to extract more effective features, and the classification accuracy is relatively low. In addition to this, the sample size has a significant impact on the classification results. There tends to be greater accuracy when using small-sample data. In the aforementioned studies, the classification accuracy of the small-sample studies can reach nearly 90%, while that of large-sample studies reaches only about 70%. However, the value of ASD classification studies lies precisely in supporting realistic medical judgment. If the variability of sites in the database is not taken into account and only a small sample is used for the study, the results are not extensive. To classify ASDs, a P4D-ResNet-based ASD classification method is created and employed in this research. This model puts spatial convolution and temporal convolution together into a residual block, thus realizing the simultaneous extraction of spatiotemporal features of fMRI data, and these convolutions can also be computed in parallel. The results of the experiments show how effectively the proposed method works.

The Proposed Algorithm
In this article, we propose a P4D-ResNet model based on different residual architectures. This model can extract both spatial and temporal features of fMRI data and fully exploit the spatiotemporal information, which achieves satisfactory classification results. Construction of the P4D-ResNet network model is described in this section. The P4D-ResNet model consists of a 4D maximum pooling layer, a P4D-convolution (P4DC) block, a mixed residual block (MRB), a Flatten layer, a dropout layer, and an FC layer. The network structure of the P4D-ResNet model is shown in Figure 1.

Figure 1. The network structure of the P4D-ResNet model. max: maximum. (Blocks shown: fMRI Data, 4D Max Pooling, P4DC, 4D Max Pooling, MRB, 4D Max Pooling, MRB, 4D Max Pooling, MRB, Flatten, Dropout, FC, Classification.)

The P4D-ResNet model first performs dimensionality reduction by using a 4D maximum pooling layer, followed by P4D convolution, i.e., a spatial and a temporal convolution, to obtain the spatial and temporal features of the fMRI data. The P4D-ResNet model then feeds the extracted features into three successive 4D maximum pooling layers, each followed by an MRB module, to downscale and further extract spatiotemporal features from the data. Finally, through the Flatten layer and the FC layers, the classification results are obtained by the Sigmoid function. The proposed model in this article can be expressed as

$$y = \mathrm{MRB}(\mathrm{MP}(\mathrm{MRB}(\mathrm{MP}(\mathrm{MRB}(\mathrm{MP}(\mathrm{P4DC}(\mathrm{MP}(x)))))))) \tag{1}$$

where x denotes the input 4D fMRI data, and y denotes the output of the last MRB function. MRB denotes the MRB function, P4DC denotes the P4D-convolutional block, and MP denotes the 4D maximum pooling function. The substructures in the model are described separately in the next section.

4D Convolution
4D CNNs are well suited for spatiotemporal feature learning of medical images. It is possible to better extract the data’s temporal and spatial information by performing 4D convolutional procedures over space and time. To gain more detailed temporal information, the spatial feature maps in the convolutional layer are connected to numerous nearby time points in the previous layer. The principle of 4D convolution is presented in Figure 2. The same color in the convolutional connection indicates weight sharing. As displayed in Figure 2, the 4D convolution operation applies the same 4D kernel to a continuous 3D image, extracting features over the entire time series by shifting the step size. Let $k_{ij}^{x_0 y_0 z_0 t_0}$ be the value at the $(x_0, y_0, z_0, t_0)$ position of the $j$th feature map of the $i$th layer, that is,

$$k_{ij}^{x_0 y_0 z_0 t_0} = \sigma\left( b_{ij} + \sum_{c} \sum_{p=0}^{P_i-1} \sum_{q=0}^{Q_i-1} \sum_{r=0}^{R_i-1} \sum_{s=0}^{S_i-1} w_{ijc}^{pqrs}\, k_{(i-1)c}^{(x_0+p)(y_0+q)(z_0+r)(t_0+s)} \right) \tag{2}$$

where $\sigma$ is the activation function. $P_i$, $Q_i$, $R_i$, and $S_i$ denote the size of the dimension in each of the four directions. $w_{ijc}^{pqrs}$ is the weight value at position $(p, q, r, s)$, which connects the $c$th feature map of the $(i-1)$th layer with the $j$th feature map of the $i$th layer.

With the expansion of convolutional layers from three to four dimensions, the skyrocketing number of parameters and computational effort may lead to overfitting. To solve this problem, we decompose the 4D spatiotemporal convolution into the combination of a 3D spatial and a 1D temporal convolution, that is, the original 3 × 3 × 3 × 3 convolution is split into a combination of a 3 × 3 × 3 × 1 spatial convolution and a 1 × 1 × 1 × 3 temporal convolution, which is the principle of the P4DC module.
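The saving behind the P4DC decomposition can be checked with a small numerical sketch. The NumPy code below is an illustration under simplifying assumptions, not the authors' implementation: it uses a single channel, no bias, no padding, and no activation between the two filters, and the helper names (`conv4d_valid`, `parameter_count`) are hypothetical. Under those assumptions, a 3 × 3 × 3 × 1 spatial filter followed by a 1 × 1 × 1 × 3 temporal filter gives the same output as one 3 × 3 × 3 × 3 kernel formed as their outer product, while storing 27 + 3 = 30 weights instead of 81.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv4d_valid(x, kernel):
    """Single-channel 4D correlation without padding (a 'valid' convolution)."""
    # windows has the output grid on the first four axes and one kernel-sized
    # patch on the last four axes.
    windows = sliding_window_view(x, kernel.shape)
    return np.einsum("wxyzpqrs,pqrs->wxyz", windows, kernel)


def parameter_count(shape):
    """Number of weights in one single-channel kernel of the given shape."""
    return int(np.prod(shape))


rng = np.random.default_rng(0)
volume = rng.standard_normal((6, 7, 6, 8))    # toy (w, h, d, t) input, sizes are arbitrary
spatial = rng.standard_normal((3, 3, 3, 1))   # 3 x 3 x 3 x 1 spatial kernel
temporal = rng.standard_normal((1, 1, 1, 3))  # 1 x 1 x 1 x 3 temporal kernel

# Spatial filtering followed by temporal filtering (assumption: no activation in between).
factorized = conv4d_valid(conv4d_valid(volume, spatial), temporal)

# One full 4D kernel that is the outer product of the two factors.
full_kernel = spatial * temporal              # broadcasts to shape (3, 3, 3, 3)
direct = conv4d_valid(volume, full_kernel)

full_params = parameter_count((3, 3, 3, 3))
p4d_params = parameter_count(spatial.shape) + parameter_count(temporal.shape)
print(np.allclose(factorized, direct), full_params, p4d_params)
```

The equivalence holds only for kernels that factor in this way; a general 4D kernel cannot be written as such a product, which is the expressiveness traded for the smaller parameter count.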
Figure 2. The principle of 4D convolution.

The MRB
In this article, as shown in Figure 3, a 4D MRB is built to conduct the simultaneous extraction of spatiotemporal information. The residual block is composed of a P4D-serial residual block (P4D-SRB) and a P4D-parallel residual block (P4D-PRB).

The P4D-SRB and P4D-PRB constructed in this article are obtained by modifying the conventional 3D bottleneck residual block. The conventional residual structure is shown in Figure 4(a), and the residual blocks constructed in this article are depicted in Figure 4(b) and (c).
Figure 3. The mixed residual block structure. P4D-SRB: pseudo-4D serial residual block; P4D-PRB: pseudo-4D parallel residual block.
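The difference between the serial and the parallel arrangement can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' network: it uses a single channel, omits the 1 × 1 × 1 × 1 channel-matching convolutions, assumes zero padding so that shapes are preserved, assumes the serial block comes first in the cascade, and reuses the same two kernels in both blocks for brevity. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relu(x):
    return np.maximum(x, 0.0)


def conv4d_same(x, kernel):
    """Single-channel 4D correlation with zero padding (odd kernel sizes keep the shape)."""
    pad = [(k // 2, k // 2) for k in kernel.shape]
    windows = sliding_window_view(np.pad(x, pad), kernel.shape)
    return np.einsum("wxyzpqrs,pqrs->wxyz", windows, kernel)


def serial_block(x, k_spatial, k_temporal):
    """Serial arrangement: the temporal filter consumes the spatial filter's output."""
    spatial_out = relu(conv4d_same(x, k_spatial))
    temporal_out = conv4d_same(spatial_out, k_temporal)
    return relu(x + temporal_out)              # identity shortcut, then activation


def parallel_block(x, k_spatial, k_temporal):
    """Parallel arrangement: both filters see the same input and their outputs are summed."""
    spatial_out = conv4d_same(x, k_spatial)
    temporal_out = conv4d_same(x, k_temporal)
    return relu(x + spatial_out + temporal_out)


def mixed_block(x, k_spatial, k_temporal):
    """Cascade of a serial and a parallel block (order and weight sharing are assumptions)."""
    return parallel_block(serial_block(x, k_spatial, k_temporal), k_spatial, k_temporal)
```

In the serial block a zero spatial kernel also silences the temporal path, whereas in the parallel block the temporal path is unaffected; this is one way to see that spatial extraction directly influences temporal features only in the serial case.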

The kernel size of both the first and fourth convolutional layers in the P4D-SRB is set to 1 × 1 × 1 × 1, which can match the number of channels. The number of output channels in the P4D-SRB is four times the number of input channels. The P4D-SRB uses spatial convolution followed by temporal convolution for the spatiotemporal feature extraction of data, whereas the P4D-PRB extracts the spatiotemporal features of data by using spatial convolution and temporal convolution in parallel. In the P4D-SRB, the output of the spatial convolution is directly used as the input of the temporal convolution, which indicates that the extraction of spatial information has a direct impact on the temporal features. In contrast, in the P4D-PRB, spatial and temporal features are extracted separately and then directly accumulated as feature outputs, so the extraction of spatial information in the same residual block does not have a direct effect on temporal feature extraction. Cascading these two blocks to form MRBs is helpful, as it improves ASD classification results by capturing the spatiotemporal features of fMRI data well.

4D Maximum Pooling Layer
This study extends the 3D maximum pooling layer to the 4D maximum pooling layer. The numbers of channels used in this article for the 4D maximum pooling layers are 1, 16, 32, and 64, respectively. The 4D maximum pooling with a channel number of 1 proceeds as follows:
◆ Let the size of each dimension of the input data of the pooling layer be b, w, h, d, t, and l. b denotes the batch size of the data input. w, h, and d represent the width, height, and depth, respectively, of the input fMRI data. t represents the time dimension of the input data, and l denotes the number of channels.
◆ The input data are reshaped into a dimensional size of b, w, h, d, and t by using the reshaping function.
◆ A 3D maximum pooling operation is performed on the reshaped input, and the output data have a dimension size of b, w/2, h/2, d/2, and t. “/” denotes a division operation with upward rounding.
◆ The current data are reshaped into a dimension size of b, w/2, h/2, d/2, t/2, and 2 by using the reshaping function again.
◆ Take the maximum value of the current data in the channel dimension and output the data with dimension sizes of b, w/2, h/2, d/2, t/2, and 1.

When the number of channels is eight, the data are first sliced into eight tensors with channel number one, and the 4D maximum pooling operation with channel number one is invoked separately on each. When the number of channels is 16, the data are sliced into two tensors of channel number eight. Similarly, when the number of channels is 32 or 64, the data are processed in the same way. Thus, the 4D maximum pooling layer can be computed in parallel.

Data Enhancement and Model Training
The dataset from the global, openly accessible Autism Brain Imaging Data Exchange (ABIDE) project [25] is used to generate the rs-fMRI results in this study. The samples with poor brain coverage, excessive motion peaks, ghosting, and other scanner aberrations are eliminated to leave a final dataset of 871 participants, including 403 ASD patients and 468 TCs.

In a 4D NN, more data samples are needed for training. Therefore, the data in this article are enhanced by drawing multiple samples from the original dataset in the temporal dimension. Specifically, the 871 subjects are shuffled before
Figure 4. 3D residual block and P4D residual block structures. (a) A ResNet. (b) A P4D-SRB. (c) A P4D-PRB. ReLu: rectified linear unit. (Kernel sizes shown: (a) 1×1×1, 3×3×3, 1×1×1; (b) 1×1×1×1, 3×3×3×1, 1×1×1×3, 1×1×1×1; (c) 1×1×1×1, 3×3×3×1 alongside 1×1×1×3, 1×1×1×1.)
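The single-channel 4D maximum pooling steps listed in the 4D Maximum Pooling Layer section can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' TensorFlow code: the function name `max_pool_4d` is hypothetical, a kernel and stride of two are assumed on every axis, and upward rounding is realized by padding odd-sized axes with negative infinity, which is one possible reading of the rounding rule.

```python
import numpy as np


def ceil_half(n):
    """Division by two with upward rounding."""
    return -(-n // 2)


def max_pool_4d(x):
    """4D max pooling (kernel 2, stride 2) for data shaped (b, w, h, d, t, 1)."""
    x = np.asarray(x, dtype=float)
    b, w, h, d, t, channels = x.shape
    assert channels == 1, "this sketch handles the single-channel case only"
    x = x.reshape(b, w, h, d, t)                  # reshape to (b, w, h, d, t)

    # Pad odd axes with -inf so sizes round upward without changing any maximum.
    pad = [(0, 0)] + [(0, n % 2) for n in (w, h, d, t)]
    x = np.pad(x, pad, constant_values=-np.inf)
    w2, h2, d2, t2 = (ceil_half(n) for n in (w, h, d, t))

    # 3D max pooling over (w, h, d); the time axis is left untouched.
    x = x.reshape(b, w2, 2, h2, 2, d2, 2, 2 * t2).max(axis=(2, 4, 6))

    # Split the time axis into pairs, then take the maximum of each pair.
    x = x.reshape(b, w2, h2, d2, t2, 2).max(axis=-1, keepdims=True)
    return x                                      # (b, w/2, h/2, d/2, t/2, 1)
```

For multichannel inputs, the text describes slicing the tensor along the channel axis and applying the single-channel routine to each slice; in this sketch that would amount to calling `max_pool_4d` on `x[..., c:c + 1]` for each channel `c` and concatenating the results.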

the experiment, and each subject’s 4D fMRI data are sampled in the temporal dimension in turn. Sixteen time slices are drawn at an interval of one per frame, and each subject enhances the data by the maximum expansion. The data of every 69 subjects and the corresponding labels are encapsulated into one generated TFRecord file. The TFRecord format stores the data compactly: it uses the “Protocol Buffer” binary encoding scheme internally, occupies only one block of memory, and needs to load only one binary file at a time. It is simple and fast, especially for large training data. When the training data are large, they can be divided into multiple TFRecord files to improve processing efficiency. Fifteen TFRecord files are generated for training and testing; 12 of them are used for training and three for testing. In the data augmentation scheme used in this article, the data are first divided into a training set and a testing set at the level of the individual subject (“person”), and the data of each subject are then expanded separately. Each person’s expanded data are therefore either in the training or the testing set, which aids in preventing similar data from impairing the model’s classification effect and improves the generalization performance. The amount of data used in the actual experiment after data augmentation is listed in Table 1.

Table 1. The dataset after data enhancement.
The datasets            ASD     TC      Total
The original dataset    403     468     871
The expanded dataset    2,901   3,051   5,952

The experiments in this article are implemented on the TensorFlow 1.0 platform with an Ubuntu 18.4 operating system, 32 G of random-access memory, an Intel(R) Xeon(R) E5-2667 central processing unit, and an Nvidia Tesla K40c GPU card. The experiments start with data enhancement of the preprocessed fMRI data, which gives dimensional sizes of 61, 73, 61, and 16 for all the data. Second, to reduce the risk of model overfitting, a 4D maximum pooling layer with a step size of two and a kernel of 2 × 2 × 2 × 2 is used for dimensionality reduction. The low-level spatiotemporal features are then extracted by a layer of spatial convolution with a kernel size of 3 × 3 × 3 × 1 and a layer of temporal convolution with a kernel size of 1 × 1 × 1 × 3. Then, high-level spatiotemporal features of the data are extracted by the maximum pooling and MRB modules. In this article, three MRB modules are used. The first MRB module has eight channels, the second has 16 channels, and the third has 32 channels. The output features of the last MRB module are flattened by using the Flatten layer. Finally, the flattened feature vector is fed into the FC layer after the dropout operation and classified by using the Sigmoid classifier. In this article, model optimization is accomplished by using the Adam optimization algorithm, and cross-entropy is the loss function. The experimental parameters include a batch size of four and a learning rate of 0.00001. Dropout is set to 0.5, and the dense layer’s two-parameter regularization parameter is set to 0.0005.

Experimental Results and Analysis
The data are split into a training set and a test set at a ratio of 8:2 to test the model algorithm’s efficacy and save as much training data as possible. The test set is used to evaluate the classification performance of the model, whereas the training set is used to train the model.

Ablation Experiments
To better illustrate the effectiveness of the MRB module on ASD classification, a “mixed serial residual block (MSRB-2)” is built by replacing the P4D-PRB residual block in the MRB with the P4D-SRB residual block. Then, a “mixed parallel residual block (MPRB-2)” is built by replacing the P4D-SRB residual block in the MRB with the P4D-PRB residual block. The classification results are presented in Table 2. When MSRB-2 is used, the accuracy, specificity, and sensitivity of ASD classification are 66.8, 62.68, and 70.18%, respectively. When using MPRB-2, the accuracy, specificity, and sensitivity of ASD classification are 68.54, 60.62, and 75.08%, respectively. In contrast, when using the MRB module, the accuracy of ASD classification is improved by 7.87 and 6.13 percentage points, respectively, and the sensitivity and specificity are the highest. It can be seen that a more structured MRB can achieve better results, especially for sensitivity improvement, which validates the effectiveness of the MRB model.
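For reference, the three metrics reported in the ablation study follow from confusion-matrix counts, and the accuracy gains quoted in the text can be recomputed from the values in Table 2. The sketch below assumes that ASD is treated as the positive class, which the article does not state explicitly; the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, and sensitivity (in %) from confusion-matrix counts."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    specificity = 100.0 * tn / (tn + fp)   # share of negatives (assumed: TCs) correctly identified
    sensitivity = 100.0 * tp / (tp + fn)   # share of positives (assumed: ASD) correctly identified
    return accuracy, specificity, sensitivity


# Accuracies reported in Table 2 (percent).
reported_accuracy = {"MSRB-2": 66.8, "MPRB-2": 68.54, "MMRB": 74.67}

# Gain of the mixed block over each single-type variant, in percentage points.
gains = {
    name: round(reported_accuracy["MMRB"] - value, 2)
    for name, value in reported_accuracy.items()
    if name != "MMRB"
}
print(gains)
```

The recomputed gains match the 7.87 and 6.13 figures given in the text, which are differences in percentage points rather than relative improvements.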
Table 2. The performance of different kinds of residual block combinations.
Residual structures   Accuracy (%)   Specificity (%)   Sensitivity (%)
MSRB-2                66.8           62.68             70.18
MPRB-2                68.54          60.62             75.08
MMRB                  74.67          71.9              76.91

For the aforementioned three different residual structures, we plot their receiver operating characteristic (ROC) curves and calculate area-under-the-curve (AUC) values to evaluate the three algorithms. Figure 5 illustrates ROC analysis results of the model by using MSRB-2, MPRB-2, and mixed many residual block (MMRB), respectively. Figure 5 shows that the model performs at its best and the AUC value is its highest when MRB is employed.

In this article, we also conduct experiments on the effect of the number of MRB modules. And we use 1–4 MRB modules, respectively, to further verify reliability of the
model’s design. As shown in Table 3, ASD classification accuracy is merely 64.74% when only one MRB module is used to extract spatiotemporal features, which indicates that one MRB module cannot extract effective and representative spatiotemporal information. As the number of model layers increases, ASD classification accuracy increases, but when the number of stacked groups reaches four, ASD classification accuracy decreases and the model appears to be overfitted. In summary, three MRB modules are selected for model experiments in this article.

Table 3. The impact of the number of MRB modules on the classification effect.
The number of MRB modules   Accuracy (%)   Specificity (%)   Sensitivity (%)
1                           64.74          62.34             66.72
2                           69.54          66                72.47
3                           74.67          71.9              76.91
4                           70.36          67.45             72.77

Figure 5. The ROC curves of different residual block superposition experiments. (Axes: false-positive rate versus true-positive rate. Legend: MSRB-2, AUC = 0.75; MPRB-2, AUC = 0.74; MMRB, AUC = 0.8.)

In this article, the selection of time frames for data sampling is also discussed. The number of time frames is set to 8, 16, and 32 for training and testing, respectively. The classification effects are listed in Table 4.

Table 4. The classification effect of different time frames.
Time frames   Accuracy (%)   Specificity (%)   Sensitivity (%)
8             70.86          67.88             73.22
16            74.67          71.9              76.91
32            71.27          70.38             72.01

Table 4 shows that ASD classification accuracy is low when the time dimension is chosen to be 8. This is mostly due to the time being too short, which causes the model to extract fewer features and makes it difficult to properly extract the temporal signals in the fMRI data. When 32 is used for the temporal dimension, more parameters and computation are required for model training, which results in overfitting. As a result, the experiments in this article use data from 16 time points, which had the best classification effect.

The Comparison With Existing Algorithms
We compare the proposed method with the current ASD classification algorithms to test its performance. The compared algorithms are as follows: 1) an ASD classification algorithm based on functional connection networks and recursive-cluster elimination SVMs (RCE-SVMs), put forth by Chaitra et al. [26]; 2) a hybrid ASD classification algorithm, abbreviated as HFR, that merges various functional connectivity matrix creation techniques, brain segmentation definitions, and feature-extraction techniques, proposed by Graña and Silva [27]; 3) a CNN and multilayer perceptron (CNN-MLP)-based ASD classification system [28]; 4) a deep multimodal ASD classification system based on joint representation learning, namely, DiagNet, proposed by Eslami et al. [29]; 5) a 4D CNN-based ASD classification algorithm proposed by Guo et al. [30]; 6) an ASD classification system based on 4D CNNs, namely, UM_1, proposed by Guo et al. [30]; 7) an ASD classification algorithm based on USM sites and 4D CNNs, offered by Guo et al. [30]; 8) a CNN and gate-recursive unit-based ASD classification algorithm reported by Jiang et al. [31]; and 9) a GCN used by Parisot et al. [32] to train ASD detection models in a semisupervised learning setting. The results of the comparison algorithms are taken from the test results provided by the authors in the corresponding references. The test dataset contains data from every site, providing for the calculation of the average accuracy. The proposed algorithm’s and comparison algorithms’ test set results are listed in Table 5.

As shown in Table 5, the proposed algorithm can achieve 74.67% accuracy in the experiments with 871 subjects. It improves by 7.37% compared to the RCE-SVM,

5.23% compared to the UM_1, and 3–5% compared to the HFR, CNN-MLP, DiagNet, 4D CNN, and USM. In addition, the proposed algorithm obtains 71.9% specificity and 76.91% sensitivity. The proposed algorithm uses large samples for experiments, so the results are more broadly representative. The subsequent single-site experiments also verify that the algorithm in this article can obtain better classification results on the New York University (NYU) data.

Table 5. The performance of the P4D-ResNet model compared with other algorithms.
Classification algorithm   Accuracy (%)
RCE-SVM                    67.3
HFR                        71.1
CNN-MLP                    70.22
DiagNet                    70.3
4D CNN                     70.49
UM_1                       69.44
USM                        69.7
CNNG                       72.46
GCN                        69.5
P4D-ResNet                 74.67

The classification accuracy, sensitivity, and specificity of the proposed algorithm at 17 sites are computed in this study to further explore the classification performance of the model at each site, as shown in Table 6. Table 6 makes it clear that the variability between sites has a significant impact on the final results. Although the Carnegie Mellon University (CMU), SBL, and UM sites had less than 70% classification accuracy, the Kennedy Krieger Institute (KKI), Leuven, MaxMun, and Trinity sites have more than 80% classification accuracy. The varying scanning apparatuses, subject counts, and time dimensions at each site contributed to the variation in the expanded data as well. The noise introduced by this fluctuation makes it more difficult to extract features from the fMRI data to categorize illness states.

Table 6. The classification effect of the algorithm in this article at 17 sites.
Serial number   Site       Accuracy (%)   Specificity (%)   Sensitivity (%)
1               Caltech    70.45          70                70.83
2               CMU        69.7           84.62             60
3               KKI        86.36          94.74             80
4               Leuven     80.36          77.5              87.5
5               MaxMun     83.33          83.33             83.33
6               NYU        72.59          75                70.67
7               OHSU       71.42          75                66.67
8               Olin       76.67          83.33             72.22
9               Pitt       78.72          82.35             76.67
10              SBL        69.44          73.33             50
11              SDSU       80             60                87.14
12              Stanford   72.46          70                73.47
13              Trinity    85.42          87.5              84.38
14              UCLA       70             77.78             63.63
15              UM         69.53          81.82             62.78
16              USM        75.89          75.71             76.19
17              Yale       80.95          76.19             85.71
Caltech: California Institute of Technology; CMU: Carnegie Mellon University; KKI: Kennedy Krieger Institute; Leuven: University of Leuven; MaxMun: University of Munich; NYU: New York University Langone Medical Center; OHSU: Oregon Health and Science University; Olin: Olin Institute of Living at Hartford Hospital; Pitt: University of Pittsburgh School of Medicine; SBL: Social Brain Lab; SDSU: San Diego State University; Stanford: Stanford University; Trinity: Trinity Centre for Health Sciences; UCLA: University of California, Los Angeles; UM: University of Michigan; USM: University of Utah School of Medicine; Yale: Yale Child Study Center.

In addition, the confusion matrices of the 17 sites are given in Figure 6, which shows the proportions of ASD and TC samples that are correctly and incorrectly identified. From Figure 6, it can be seen that the percentage of ASD and TC samples that are correctly classified is high in the KKI, MaxMun, and Trinity sites, while the accuracy of ASD and TC recognition varies more in the CMU, SBL, San Diego State University (SDSU), and UM sites.

Figure 6. The confusion matrices of 17 sites (values in %; each panel has rows ASD and TC and columns ASD and TC).
Panel   Site       ASD row (ASD, TC)   TC row (ASD, TC)
(a)     Caltech    70, 30              29.17, 70.83
(b)     CMU        84.62, 15.38        40, 60
(c)     KKI        94.74, 5.26         20, 80
(d)     Leuven     77.5, 22.5          12.5, 87.5
(e)     MaxMun     83.33, 16.67        16.67, 83.33
(f)     NYU        75, 25              29.33, 70.67
(g)     OHSU       75, 25              33.33, 66.67
(h)     Olin       83.33, 16.67        27.78, 72.22
(i)     Pitt       82.35, 17.65        23.33, 76.67
(j)     SBL        73.33, 26.67        50, 50
(k)     SDSU       60, 40              42.86, 57.14
(l)     Stanford   70, 30              26.53, 73.47
(m)     Trinity    87.5, 12.5          15.62, 84.38
(n)     UCLA       77.78, 22.22        36.37, 63.63
(o)     UM         81.82, 18.18        37.22, 62.78
(p)     USM        75.71, 24.29        23.81, 76.19
(q)     Yale       76.19, 23.81        14.29, 85.71

Conclusion and Future Work
In this study, the P4D-ResNet deep learning model was proposed for the simultaneous extraction of spatiotemporal information. Instead of using 4D spatiotemporal convolution, we employed spatial and temporal convolution and built a mixed residual model to extract richer spatiotemporal feature information. This study conducted an enhancement operation on fMRI data, taking into account the constraints of the current data volume and the sample size necessary for deep learning models. To evaluate the performance of the model, we conducted ablation experiments on the proposed algorithm. Additionally, by contrasting the method in this article with current ASD classification algorithms, the proposed algorithm’s efficacy was confirmed. In addition, we calculated ASD classification accuracy, sensitivity, and specificity indexes among 17 sites and assessed the effect of site variability on the results. However, several issues can be considered in the future. First, we used only the functional imaging modality, without considering the structural imaging modalities related to brain states. In the future, we will integrate both functional and structural modalities to train our model for ASD identification. Second, we treated ASD diagnosis as a binary classification problem. However, it is well known that ASD is divided into eight categories in the latest edition of ICD-11, published by the
World Health Organization. Therefore, we will seek to model a multiclass classifier. In addition, the deep learning model is like a black box, and it is difficult to achieve physiological interpretation. We will continue to explore interpretive methods suitable for the model.

Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under grant 62172139, the Natural Science Foundation of Hebei Province under grant F2022201055, and the Science Research Project of Hebei Province under grant BJ2020030. The project was funded by the China Postdoctoral under grant 2022M713361, Natural Science Interdisciplinary Research Program of Hebei University under grant DXK202102, Research Project of Hebei University Intelligent Financial Application Technology R & D Center under grant XGZJ2022022, Open Project Program of the National Laboratory of Pattern Recognition under grant 202200007, and Open Foundation of Guangdong Key Laboratory of Digital Signal and Image Processing Technology (2020GDDSIPL-04). This work was also supported by the High-Performance Computing Center of Hebei University. Jingwen Yan is the corresponding author.

About the Authors
Shuaiqi Liu (shdkj-1918@163.com) earned his Ph.D. degree from the Institute of Information Science, Beijing Jiaotong University, in 2014. He is a professor at the College of Electronic and Information Engineering, Hebei University, Baoding 071002, China. His research interests include image processing and signal processing.

Siqi Wang (sqwang_hbu@163.com) earned her B.S. degree from the College of Electronic and Information Engineering, Hebei University, Baoding, China, in 2021. She is currently pursuing her M.S. degree at the College of Electronic and Information Engineering, Hebei University, 071002 Baoding, China. Her research interests include computer vision and image processing.

Hong Zhang (hzhang_hbu@163.com) earned her B.S. degree from the College of Information Engineering, Yanshan University, Qinhuangdao, China, in 2019. She is currently pursuing her M.S. degree at the College of Electronic and Information Engineering, Hebei University, 071002 Baoding, China. Her research interests include computer vision and image processing.

Shui-Hua Wang (shuihuawang@ieee.org) earned her Ph.D. degree in electrical engineering from Nanjing University in 2017. She was a professor in the School of Computer Science and Technology, Henan Polytechnic University, 454000 Jiaozuo, China. She also served as a research associate at Loughborough University from 2018 to 2019. Her research interests include machine learning and biomedical image processing.

Jie Zhao (jzhao_hbu@163.com) earned his Ph.D. degree in optics from the State Key Laboratory of Applied Optics, Changchun Institute of Fine Mechanics and Optics, Academia Sinica, Changchun, China, in 1997. He is a professor in the Department of Electronic Engineering, University of Shantou, 515063 Shantou, China. His current research interests include SAR image processing, hyper-wavelet transforms, and compressed sensing.

Jingwen Yan (jwyan@stu.edu.cn) is with the School of Engineering, Shantou University, 515063 Shantou, China.

References
[1] C. M. Michel, M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. Grave de Peralta, “EEG source imaging,” Clin. Neurophysiol., vol. 115, no. 10, pp. 2195–2222, Oct. 2004, doi: 10.1016/j.clinph.2004.06.001.
[2] S. Liu et al., “3DCANN: A spatio-temporal convolution attention neural network for EEG emotion recognition,” IEEE J. Biomed. Health Inform., vol. 26, no. 11, pp. 5321–5331, Nov. 2022, doi: 10.1109/JBHI.2021.3083525.
[3] E. Moradi, A. Pepe, C. Gaser, H. Huttunen, and J. Tohka, “Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects,” NeuroImage, vol. 104, pp. 398–412, Jan. 2015, doi: 10.1016/j.neuroimage.2014.10.002.
[4] S. Liu, C. Zhao, Y. An, P. Li, J. Zhao, and Y. Zhang, “Diffusion tensor imaging denoising based on Riemannian geometric framework and sparse Bayesian learning,” J. Med. Imag. Health Inform., vol. 9, no. 9, pp. 1993–2003, Dec. 2019, doi: 10.1166/jmihi.2019.2832.
[5] S. Liu, L. Zhao, J. Zhao, B. Li, and S.-H. Wang, “Attention deficit/hyperactivity disorder Classification based on deep spatio-temporal features of functional Magnetic Resonance Imaging,” Biomed. Signal Process. Control, vol. 71, Jan. 2022, Art. no. 103239, doi: 10.1016/j.bspc.2021.103239.
[6] A. Kastrup, G. Kruger, G. H. Glover, and M. E. Moseley, “Assessment of cerebral oxidative metabolism with breath holding and fMRI,” Magn. Reson. Med., vol. 42, no. 3, pp. 608–611, Sep. 1999, doi: 10.1002/(SICI)1522-2594(199909)42:3<608::AID-MRM26>3.0.CO;2-I.
[7] E. Kirino, S. Tanaka, Y. Nagai, A. Hattori, and S. Aoki, “S1-3 Functional connectivity in autism spectrum disorder evaluated using rs-fMRI and DKI,” Clin. Neurophysiol., vol. 131, no. 10, pp. e244–e245, Oct. 2020, doi: 10.1016/j.clinph.2020.04.062.
[8] J. F. Agastinose Ronicko, J. Thomas, P. Thangavel, V. Koneru, G. Langs, and J. Dauwels, “Diagnostic classification of autism using resting-state fMRI data improves with full correlation functional brain connectivity compared to partial correlation,” J. Neurosci. Methods, vol. 345, Nov. 2020, Art. no. 108884, doi: 10.1016/j.jneumeth.2020.108884.
[9] M. Wang, J. Huang, M. Liu, and D. Zhang, “Modeling dynamic characteristics of brain functional connectivity networks using resting-state functional MRI,” Med. Image Anal., vol. 71, Jul. 2021, Art. no. 102063, doi: 10.1016/j.media.2021.102063.
[10] T. Iidaka, “Resting state functional magnetic resonance imaging and neural network classified autism and control,” Cortex, vol. 63, pp. 55–67, Feb. 2015, doi: 10.1016/j.cortex.2014.08.011.
[11] X. Bi, Y. Liu, Q. Jiang, Q. Shu, Q. Sun, and J. Dai, “The diagnosis of autism spectrum disorder based on the random neural network cluster,” Frontiers Hum. Neurosci., vol. 12, Jun. 2018, Art. no. 257, doi: 10.3389/fnhum.2018.00257.
[12] S. Mostafa, L. Tang, and F. X. Wu, “Diagnosis of autism spectrum disorder based on eigenvalues of brain networks,” IEEE Access, vol. 7, pp. 128,474–128,486, Sep. 2019, doi: 10.1109/access.2019.2940198.
[13] J. Liu, Y. Sheng, W. Lan, R. Guo, Y. Wang, and J. Wang, “Improved ASD classification using dynamic functional connectivity and multi-task feature selection,” Pattern Recognit. Lett., vol. 138, pp. 82–87, Oct. 2020, doi: 10.1016/j.patrec.2020.07.005.
[14] F. Zhao, Z. Chen, I. Rekik, S.-W. Lee, and D. Shen, “Diagnosis of autism spectrum disorder using central-moment features from low- and high-order dynamic resting-state functional connectivity networks,” Frontiers Neurosci., vol. 14, Apr. 2020, Art. no. 258, doi: 10.3389/fnins.2020.00258.
[15] Y. Wu et al., “JCS: An explainable COVID-19 diagnosis system by joint classification and segmentation,” IEEE Trans. Image Process., vol. 30, pp. 3113–3126, Feb. 2021, doi: 10.1109/TIP.2021.3058783.
[16] K. Fu, D. Fan, G. Ji, Q. Zhao, J. Shen, and C. Zhu, “Siamese network for RGB-D salient object detection and beyond,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5541–5559, Sep. 2022, doi: 10.1109/TPAMI.2021.3073689.
[17] Q. Hu, S. Hu, and S. Liu, “BANet: A balance attention network for anchor-free ship detection in SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, Jan. 2022, doi: 10.1109/TGRS.2022.3146027.
[18] Z. Xiao, C. Wang, N. Jia, and J. Wu, “SAE-based classification of school-aged children with autism spectrum disorders using functional magnetic resonance imaging,” Multimedia Tools Appl., vol. 77, no. 17, pp. 22,809–22,820, Sep. 2018, doi: 10.1007/s11042-018-5625-1.
[19] N. Jia, J. Tan, Z. Xiao, Z. Qi, and J. Wu, “Classification of autism spectrum disorder based on brain functional connectivity and SAE,” J. Nanchang Univ. (Natural Sci.), vol. 42, no. 4, pp. 399–403, Aug. 2018, doi: 10.13764/j.cnki.ncdl.2018.04.017.
[20] A. Rathore, S. Palande, J. S. Anderson, B. A. Zielinski, P. T. Fletcher, and B. Wang, “Autism classification using topological features and deep learning: A cautionary tale,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2019, pp. 736–744, doi: 10.1007/978-3-030-32248-9_82.
[21] J. Zhuang, N. C. Dvornek, X. Li, P. Ventola, and J. S. Duncan, “Invertible network for classification and biomarker selection for ASD,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2019, pp. 700–708, doi: 10.1007/978-3-030-32248-9_78.
[22] M. Tang, P. Kumar, H. Chen, and A. Shrivastava, “Deep multimodal learning for the diagnosis of autism spectrum disorder,” J. Imag., vol. 6, no. 6, p. 47, Jun. 2020, doi: 10.3390/jimaging6060047.
[23] L. Shao, C. Fu, Y. You, and D. Fu, “Classification of ASD based on fMRI data with deep learning,” Cogn. Neurodynamics, vol. 15, no. 6, pp. 961–974, Dec. 2021, doi: 10.1007/s11571-021-09683-0.
[24] W. Yin, S. Mostafa, and F. Wu, “Diagnosis of autism spectrum disorder based on functional brain networks with deep learning,” J. Comput. Biol., vol. 28, no. 2, pp. 146–165, Feb. 2021, doi: 10.1089/cmb.2020.0252.
[25] B. Lullo. “Autism Brain Imaging Data Exchange I ABIDE I.” ABIDE. Accessed: Jun. 24, 2016. [Online]. Available: https://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html
[26] N. Chaitra, P. A. Vijaya, and G. Deshpande, “Diagnostic prediction of autism spectrum disorder using complex network measures in a machine learning framework,” Biomed. Signal Process. Control, vol. 62, Sep. 2020, Art. no. 102099, doi: 10.1016/j.bspc.2020.102099.
[27] M. Graña and M. Silva, “Impact of machine learning pipeline choices in autism prediction from functional connectivity data,” Int. J. Neural Syst., vol. 31, no. 4, p. 2,150,009, Apr. 2021, doi: 10.1142/s012906572150009x.
[28] Z. Sherkatghanad, M. Akhondzadeh, S. Salari, M. Zomorodi, and V. Salari, “Automated detection of autism spectrum disorder using a convolutional neural network,” Frontiers Neurosci., vol. 13, Jan. 2020, Art. no. 1325, doi: 10.3389/fnins.2019.01325.
[29] T. Eslami, V. Mirjalili, A. Fong, A. R. Laird, and F. Saeed, “ASD-DiagNet: A hybrid learning approach for detection of autism spectrum disorder using fMRI data,” Frontiers Neuroinformatics, vol. 13, Nov. 2019, Art. no. 70, doi: 10.3389/fninf.2019.00070.
[30] L. Guo et al., “Classification of the functional magnetic resonance image of autism based on 4D convolutional neural network,” CAAI Trans. Intell. Syst., vol. 16, no. 6, pp. 1021–1029, Nov. 2021, doi: 10.11992/tis.202009022.
[31] W. Jiang et al., “CNNG: A convolutional neural networks with gated recurrent units for autism spectrum disorder classification,” Frontiers Aging Neurosci., vol. 14, Jul. 2022, Art. no. 948704, doi: 10.3389/fnagi.2022.948704.
[32] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, and D. Rueckert, “Spectral graph convolutions for population-based disease prediction,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2017, pp. 177–185, doi: 10.1007/978-3-319-66179-7_21.

Tooth.AI
Intelligent Dental Disease Diagnosis and Treatment
Support Using Semantic Network
by Hossam A. Gabbar, Abderrazak Chahid, Md. Jamiul Alam Khan, Oluwabukola Grace Adegboro, and Matthew Immanuel Samson
The emerging fourth industrial revolution (Industry 4.0) is leading the healthcare system toward more digitalization and smart management. For instance, recent digital healthcare solutions can help dentists/practitioners save time by managing their schedules and managing diagnosis and treatment. The proposed solution is a diagnostic module that can be integrated into existing dental software. This module is based on artificial intelligence (AI) that allows the diagnosis of X-ray images/volumes and helps in the early detection and diagnosis of oral health diseases. The solution presents a smart and automated assistive platform to aid dental practitioners in identifying underlying tooth diseases and to assist doctors with treatment suggestions.

Date of current version: 17 July 2023

Introduction
According to the Global Burden of Disease 2010, of the people affected by dental and oral diseases worldwide, around 35% suffer from untreated decay (caries) of permanent teeth, 11% have severe periodontal (gum) disease, and 2% even have tooth loss. Oral health diseases happen due to different factors, such as a lack of resources, oral hygiene habits, etc. Such diseases may cause the loss of all-natural

teeth, which can lead to changes in eating patterns, nutrient deficiency, and involuntary weight loss, as well as speech difficulty (if left uncorrected). The State of Oral Health in Canada report noted that the government’s main challenge is providing required oral health care to the most vulnerable segments of its population (e.g., low-income groups, indigenous peoples, people with special needs, children, and new immigrants with refugee status) [1]. Figure 1 shows the age distribution of the health survey of the Canadian community.

Figure 1. Percentage of Canadians aged 12 years and over who consulted with a dentist or orthodontist in 2012, by age group (vertical axis: %; horizontal axis: age). Source: Statistics Canada, Canadian Community Health Survey (CCHS), 2012.

The time loss due to dental problems and treatment causes an economic loss, estimated at over 40 million hours lost annually and US$442 billion worldwide in 2010 (see Table 1 for more details). It is crucial to design preventive healthcare solutions to help improve the oral health system and reduce economic loss.

In addition, tooth-related diseases might result from some skull/mouth geometry abnormalities. In some cases, surgical interventions are needed to correct this deformation and restore healthy teeth. Such skeletal abnormalities are usually diagnosed using cephalometric analysis, which checks the position of some key locations, called landmarks, against their normal positions. Therefore, it is crucial to design preventive healthcare solutions that integrate skeletal and dental diagnosis to help improve the oral health system and reduce treatment expenses. Many studies demonstrate that preventative healthcare solutions are cost-effective, with substantial economic benefits regarding reduced treatment costs and decreased productivity losses in the labor market.

Most of the existing dental and skeletal software packages provide independent diagnosis and/or treatment solutions with data management and appointment schedulers. The systems available in the market can be divided into two main categories. First, the hardware-based solutions provide the medical dental and skeletal scanner for data acquisition and for capturing the medical recording used for the medical diagnosis. The scanners use different imaging technologies, such as X-ray computed tomography (CT) and intraoral cameras using near-infrared imaging (NiRi). These solutions mainly provide doctors with raw and/or enhanced medical images used for manual diagnosis, for example, iTero [2], Carestream Dental [3], and GO [4]. These solutions allow fast scan time with additional postprocessing phases to reduce motion blur risk and limit exposure time with minimal radiation. Some of these solutions allow advanced postprocessing, such as automatic cephalometric tracing, superimposition, image reporting, and surgical simulation using a visual treatment objective.
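The cephalometric analysis mentioned above compares landmark positions against expected geometry. As a minimal sketch of the kind of angle and distance computation this involves, the Python snippet below measures the angle formed by three landmarks and the distance between two of them. The landmark names (S, N, A, giving the commonly used SNA angle with its vertex at N) and all coordinates are illustrative assumptions, not values or methods taken from this article.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D landmarks (same units as the inputs)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def angle_at(vertex, a, b):
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot is numerically stable for small and large angles.
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical landmark coordinates in millimetres (not patient data).
landmarks = {"S": (0.0, 0.0), "N": (68.0, 12.0), "A": (66.0, -42.0)}

sna = angle_at(landmarks["N"], landmarks["S"], landmarks["A"])
sn_length = distance(landmarks["S"], landmarks["N"])
print(f"SNA angle: {sna:.1f} degrees, S-N distance: {sn_length:.1f} mm")
```

In practice the landmark coordinates would come from manual tracing or an automatic detector, and the measured angles and distances would be compared with clinical reference ranges.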
Table 1. Potential productivity losses due to dental problems and treatment at the individual and societal level [1].
Occupation Classification                                              Mean Hours Lost   Potential Individual Losses ($)   Potential Societal Losses ($)
Management                                                             2.9               108.16                            104,287,872
Business, finance, and administrative                                  3.8               85.15                             239,109,715
Natural and applied sciences and related occupations                   2.9               95.17                             103,278,484
Health occupations                                                     3.6               97.44                             97,790,784
Occupations in social science, education, government service, and religion   3.7         112.51                            165,333,445
Occupations in art, culture, recreation, and sport                     3.9               91.67                             33,212,041
Sales and service occupations                                          3.1               58.12                             220,857,664
Trades, transport, and equipment operators and related occupations     2.8               64.31                             131,064,967
Occupations unique to primary industry                                 3.3               76.39                             16,439,128
Occupations unique to processing, manufacturing, and utilities         2.2               42.96                             32,232,888
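To make the relation between the individual and societal columns of Table 1 concrete, the short Python sketch below divides the societal loss by the individual loss for a few rows. It assumes that the societal loss equals the individual loss multiplied by the number of affected workers; that relation is an assumption made here for illustration and is not stated in the article.

```python
# Rows copied from Table 1: (occupation, individual loss in $, societal loss in $).
ROWS = [
    ("Management", 108.16, 104_287_872),
    ("Business, finance, and administrative", 85.15, 239_109_715),
    ("Health occupations", 97.44, 97_790_784),
]


def implied_workers(individual_loss, societal_loss):
    """Head count implied if societal loss = individual loss x workers (assumption)."""
    return round(societal_loss / individual_loss)


for name, individual, societal in ROWS:
    print(f"{name}: about {implied_workers(individual, societal):,} workers")
```

Under this assumption, the societal figures are driven as much by the size of each occupational group as by the per-person loss.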

These software solutions include but are not limited to: Cephx [5], Planmeca [6], FACAD [7], OrisCeph Rx [8], AudaxCeph [9], Carestream Dental [3], and DolphinCeph Tracing [10]. Most of the advanced analysis is performed using deep learning-based models (classification, semantic segmentation, and landmark detection, etc.) [11] or knowledge-based techniques (generative programming, pattern detection, etc.) [12]. However, deep learning-based features provided by this software still depend on the initial data used for training. This limits its flexibility for variant patient profiles. Thus, it becomes challenging to ensure the generalizability of the trained models. Moreover, some factors, such as age, gender, and existing medical conditions, are not considered in model training.

Figure 2. The general framework of the proposed solution. DSN, dental semantic network. Blocks shown: Input Medical Data (patient information: age, sex, gender, health condition, previous treatment; CT image/3D volume); Annotated Dataset (DSN); Dental Diagnosis (tooth structure segmentation; caries detection and characterization; predict future dental complications; provide a justified diagnostic report); Treatment Database (previous successful disease treatments; patients with similar profile/disease); Treatment Suggestion (suggest the most relevant successful treatment protocol); Expert Feedback; Verified Diagnosis/Treatment; Update Database; Incremental Learning.

In this article, we propose a smart and automated solution that combines dental and skeletal diagnosis based on deep learning techniques. In addition, it assists doctors in treatment suggestions, taking into consideration the patient profile parametrized by their medical records and previous diseases treatment history. The outline of the rest of this article is as follows. The “Proposed Solution” section describes the proposed Tooth.AI framework with the different diagnosis and treatment modules. The “Results and Discussion” section presents the obtained results of a case study. The “Knowledge Translation” section explores the knowledge transfer plan to take our solution to the next stage of public usage. The “Novelty and Anticipated Impact” section presents a

summary of our contribution and novelty with concluding remarks.

Proposed Solution
There has been a massive amount of X-ray data and cumulative knowledge from dental radiologists and experts over the last few decades. They have identified thousands of pathological changes and traces of previous dental treatment on X-rays worldwide. Our solution will offer an integrated computer vision and knowledge-based system to extract diagnostic information from input medical images/volumes collected from dental exams using CT scanners. The proposed research solution is to design an intelligent preventive system named Tooth.AI to detect and diagnose skeletal and dental diseases. It aims to provide a real-time inspection of the teeth and skull geometry, simulate the future development of the disease in the case of no treatment, and suggest suitable treatments for the patient (Figure 2).

Dental Diagnosis
This integrated dental diagnosis component of this solution will support the detection and diagnosis of vertical root fractures, assessment of root morphologies, determining the working length of the tooth, locating apical foramen, retreatment predictions, and prediction of periapical pathologies. The medical images will be collected from open-access resources/data sets and our collaborative dentists in Toronto and internationally. The developed deep learning segmentation techniques will identify the tooth’s structure (see Figure 3). In addition, it will classify its health condition (healthy tooth or tooth with caries).

Figure 3. Illustration of the tooth structure (source [13]). Labeled parts: enamel, dentine, pulp, gum line, and alveolar bone; regions: crown, neck, and root.

The extracted knowledge will be accumulated with deterministic and probabilistic parameters in the dental semantic network (DSN) that will be dynamically updated using expert feedback. The collaboration with experts will allow our team to annotate the medical data, evaluate the performance of the developed algorithm on different patients, and validate the deployment of the proposed solution in clinical case studies. The case studies will include different sex/gender from different communities and regions across Canada to have a good representation of the dental healthcare data with varying health conditions. All of these factors will be investigated in relation to the dental treatment process. The proposed computer vision-based approach will process 2D images/scans, which will be mapped into 3D digital form. Tooth.AI will mainly analyze cone-beam CT images/volumes because they provide a better understanding of the mouth/teeth morphology. In addition, we use 2D images collected from the nonradiative intraoral camera using NiRi technology.

Skeletal Diagnosis
The second integrated component of Tooth.AI will support dentists, orthodontists, and oral surgeons in cephalometric analysis and help them to understand the dental and skeletal relationships in the human skull. They will be able to plan the treatment correctly and accurately with reduced time. Tooth.AI will reduce the manual examination of X-ray images, where it will automatically identify landmarks with preprocessed knowledge in DSN. It will visualize the integrated view with landmarks and possible diagnoses or issues based on stored expertise, as shown in Figure 2. Tooth.AI will provide details about patient diagnoses of dental and skeletal abnormalities and propose a possible treatment plan. Tooth.AI will offer an automated process with human-in-the-loop to work fully automatically or with human intervention based on user preferences and configurations. The proposed techniques within Tooth.AI for cephalometric landmark detection are based on state-of-the-art methods categorized into two main categories, as shown in Figure 4.

The latest techniques of cephalometric landmark detection and dental disease detection using the latest deep learning algorithms produce results comparable to human examiners [14]. For instance, very encouraging results were achieved in landmark detection, with point-to-point errors of less than 2 mm relative to ground truth positions [15], [16], [17], [18], [19], [20]. In addition, there exist other types of methods used to search for landmarks, such as shape model [21], employing resampling in conjunction with the convolutional neural networks (CNN) algorithm [22], CNN for regression analysis of cephalometric coordinates [23], and various others.

However, we need to go beyond landmark detection and suggest a suitable treatment based on the previous successful treatments of similar patient profiles. We propose developing a fully integrated toolbox for automatic analysis of X-ray images, detection of abnormalities or diseases, and help in treatment planning. The system would have a proper data management system to input patient data. Then the landmarks would be identified with a trained deep learning model. In detecting landmarks, we propose investigating the effects of factors, such as age, gender, and noise data. The proposed system would analyze the landmark data (providing the needed angles and distances computations

necessary for the diagnosis). By combining the computed annotated data set. This will boost the deep learning
results and the previous expert’s treatment, the system model and help build a compromised knowledge base
would suggest the presence of abnormalities or diseases that can be transferred to other doctors and healthcare
and suggest treatment planning. The collaboration with dentists will enable the team to annotate the diagnosis image and link it to diseases. It will also provide detailed inputs to label images for skeletal analysis to support the planning of surgical modifications. The main toolbox will be directed to clinical use using X-ray CT images and 3D volumes. The proposed algorithms will further validate nonradiative data using our laboratory setup based on an intraoral camera [24]. The workflow of the proposed solution is shown in Figure 2.

Tooth.AI will provide details about patient diagnoses of dental and skeletal abnormalities and propose a possible treatment plan.

Enabled Incremental Learning
Deep learning-based diagnosis has shown a remarkable ability to achieve high accuracy, even comparable to that of expert practitioners. However, this cannot be guaranteed if these models are trained on a small data set or on data sets with little variability that do not represent most samples. In addition, medical data are subject to additional constraints related to privacy and ethics restrictions. Therefore, it becomes highly challenging to access a labeled data set of sufficient size and variability. Furthermore, data labeling presents a second challenge, as this type of labeling (dental and skeletal diagnosis) depends on each expert's experience and daily practices. Therefore, it is vital to design a system that can use the doctor's diagnosis and treatment and convert them into a standardized annotated data set. The incremental learning framework presents a solution to this problem as follows:
◆◆ Gradually build an annotated data set from the daily practice of doctors.
◆◆ Centralize the knowledge base from different experts and build generalizable models.
◆◆ Enable transfer learning by using these pretrained models for another similar disease while preserving patient data privacy.

DSN
During the medical treatment journey of a specific disease, the doctors create a treatment file describing the diagnosis procedure and record the prescribed treatment and its efficiency evaluation during the follow-up sessions. In this article, the different data collected during this treatment journey are structured into a semantic network database including patient health condition, disease and treatment history, etc. All patients' treatment journeys are thus put together and grouped into different nodes: patients (denoted P), tooth diseases (denoted D-T), gum diseases (denoted D-G), and their corresponding treatments (T-T, T-G). These nodes are associated by their relationship: e.g., the patient (P1) is affected by the disease (D-T1), which is treated with the treatment (T-T2). The patient nodes are linked with a weighted edge
Figure 4. Categorization of landmark identification techniques (source [12]). Knowledge-based techniques: edge and pattern detection; genetic programming; models (active shape and active appearance models). AI-based techniques, machine learning: random forest; regression; support vector machines; decision tree; linear affine and linear principal component. AI-based techniques, deep learning: convolutional neural network; pulse-coupled neural network; cellular neural network.

defining their similarity. This similarity is computed as the covariance of patients, retrieved from the semantic network edge, considering their health conditions and disease/treatment history. Figure 5 shows an illustration of converting regular data into semantic network-based data.

Results and Discussion
In this section, a case study is presented to show an example of the results obtained using the proposed framework. It explores a scenario of deploying the proposed Tooth.AI system for teeth diagnosis and skull landmark detection and shows how this diagnosis report can be used to update the semantic network and suggest a suitable treatment.

Teeth Diagnosis
The panoramic dental data set used consists of 1,000 radiography images, each with a corresponding mask that localizes the different teeth [24]. These data are used first to train a segmentation model. The segmented teeth are then cropped and used to generate a second data set for classification. The classification model is deployed to distinguish three different tooth classes: healthy, unhealthy, and treated (with filling). The cascaded models help in diagnosing each tooth separately, as shown in Figure 6. In training, the segmentation model achieves an intersection over union (IOU) of up to 0.79. Similarly, the classification model achieves an accuracy of 0.95. Figure 7 presents the training and validation performance of both models.

Skeletal Landmark Detection
The cephalograms data set [25] consists of 400 lateral cephalogram images of 400 different subjects, whose ages are between 7 and 76. Each image of the data set is annotated with 19 landmarks, as presented in
Figure 5. Illustration of the treatment suggestion process. Labels recoverable from the figure: patients P1: John and P2: Jamiul; diseases Gingivitis and Caries; treatments Chlorhexidine and Root Canal Treatment; edge labels Affected By, Treated With, Success, and Failure.

Figure 6. Example of teeth diagnosis: healthy teeth (green), unhealthy (red), treated (blue).

Figure 7. Teeth segmentation and classification performance: (a) teeth segmentation IOU (training and validation IOU versus epochs); (b) teeth classification accuracy (training and validation accuracy versus epochs).

Figure 8. For landmark detection, a deep learning model based on a CNN architecture is deployed. Figure 9 shows the predicted landmark points. The obtained results will be used to compute the different clinical measurements needed to characterize the skull shape and extract the anomalies.

Figure 8. Cephalogram annotation example showing the 19 landmarks (source [17]): L1 Sella, L2 Nasion, L3 Orbitale, L4 Porion, L5 Subspinale, L6 Supramentale, L7 Pogonion, L8 Menton, L9 Gnathion, L10 Gonion, L11 Lower Incisal Incision, L12 Upper Incisal Incision, L13 Upper Lip, L14 Lower Lip, L15 Subnasale, L16 Soft Tissue Pogonion, L17 Posterior Nasal Spine, L18 Anterior Nasal Spine, L19 Articulare.

Figure 9. Example of skeletal landmark locations detection: predicted landmarks (green), ground truth of landmarks (red).

Treatment Suggestion Using the Semantic Network
The diagnostic reports generated by the dental and skeletal modules will be used to recognize the disease and suggest the appropriate treatment. In this work, a list of diseases and treatments is drawn from the medical literature to build the initial database needed for treatment suggestions [26], [27], [28]. The suggestion of treatment for a specific patient has three levels. First, the system suggests the most recent successful treatment if the patient was previously treated for the same disease. Second, if the patient was not affected by the disease before, the system suggests the successful treatment of the most similar patient from the database. Third, if no such treatment is available, the system suggests the most commonly used treatment for the diagnosed disease.

The generated system diagnosis and the suggested treatment are then added to the semantic network, creating additional nodes if applicable. Figure 10 presents two examples of adding new nodes to the semantic network.

Knowledge Translation
The collaborating partner dentists from Canada and international clinics will provide sample images (with consent) and diagnosis and treatment data, which will support the research team in building training data and associated analysis. The interviews with expert dentists and dental data providers will offer expertise in the validation and analysis of images, diagnosis, and treatment details, which will be transferred to the research team. Medical data from 20 patients are expected each year. In addition, we will conduct around 28 interview sessions with dentists and experts to annotate the collected data and get their opinion about the algorithms, approach, and integrated solutions. Thanks to the interaction with experts and practitioners, the proposed toolbox is equipped with an interactive user interface. Thus, the experts can correct the wrong predictions of the AI models to boost their performance. Moreover, we propose handling the lack of an annotated data set by developing an incremental model training framework that keeps updating the annotated data from recent interactions with the expert. All of these interactions between the toolbox and the expert will be saved to the database and used to further train the deep learning models. We will communicate with the Canadian Dental Association to get more views and expertise on our solution and potential implementation guidelines. Our team will communicate with the Canadian Dental Regulatory Authorities Federation to gain experience with the application of automation in view of the regulatory framework.
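The expert-in-the-loop annotation described under Knowledge Translation can be sketched as a small record store. This is a minimal illustration only, not the Tooth.AI implementation: the class name `AnnotationStore`, its fields, and the `retrain_every` threshold are all hypothetical, and the actual retraining step is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationStore:
    """Hypothetical store of expert-reviewed predictions (not the authors' database)."""
    retrain_every: int = 3                      # assumed number of corrections per retraining
    records: list = field(default_factory=list)
    pending: int = 0                            # corrections since the last retraining

    def review(self, image_id, predicted, expert_label=None):
        """Save one toolbox/expert interaction; an expert label overrides the model."""
        final = expert_label if expert_label is not None else predicted
        corrected = final != predicted
        self.records.append({"image": image_id, "label": final, "corrected": corrected})
        if corrected:
            self.pending += 1
        return final

    def training_batch(self):
        """Return all annotated records once enough corrections have accumulated."""
        if self.pending < self.retrain_every:
            return None
        self.pending = 0
        return list(self.records)
```

In this sketch, a model would be retrained only when `training_batch()` returns a list, so the annotated data set grows with every reviewed image while retraining is triggered by accumulated corrections.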

Figure 10. Example of the semantic networks after new patient–disease–treatment augmentation: (a) small DSN; (b) larger DSN. Each panel shows disease nodes, treatment nodes, and patient nodes labeled by patient ID, with a zoom area.
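The three-level treatment suggestion and the subsequent network update can be sketched as follows. This is a simplified illustration under stated assumptions: the history tuples, the `SIMILARITY` edge weights, and the function names are hypothetical, and the rule that a zero similarity counts as "no similar patient available" is an assumption rather than something the article specifies.

```python
from collections import Counter

# Hypothetical treatment history: (patient, disease, treatment, success, visit order)
HISTORY = [
    ("P1", "caries", "filling", False, 1),
    ("P1", "caries", "root canal treatment", True, 2),
    ("P2", "gingivitis", "chlorhexidine", True, 1),
    ("P3", "gingivitis", "scaling", True, 1),
    ("P3", "gingivitis", "scaling", True, 2),
]
# Hypothetical weighted patient-patient edges (higher means more similar).
SIMILARITY = {("P1", "P2"): 0.8, ("P1", "P3"): 0.3, ("P2", "P3"): 0.5}


def similarity(a, b):
    """Look up the undirected edge weight between two patients (0.0 if absent)."""
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def suggest_treatment(patient, disease, history=HISTORY):
    """Return (treatment, level) following the three suggestion levels."""
    rows = [r for r in history if r[1] == disease]
    # Level 1: the patient's own most recent successful treatment for this disease.
    own = [r for r in rows if r[0] == patient and r[3]]
    if own:
        return max(own, key=lambda r: r[4])[2], 1
    # Level 2: a successful treatment of the most similar other patient.
    others = [r for r in rows if r[0] != patient and r[3]]
    if others:
        best = max(others, key=lambda r: similarity(patient, r[0]))
        if similarity(patient, best[0]) > 0.0:
            return best[2], 2
    # Level 3: the most commonly used treatment for the disease.
    if rows:
        return Counter(r[2] for r in rows).most_common(1)[0][0], 3
    return None, 0


def record_outcome(history, patient, disease, treatment, success):
    """Add a new patient-disease-treatment record, as when the network is updated."""
    order = 1 + max((r[4] for r in history if r[0] == patient), default=0)
    history.append((patient, disease, treatment, success, order))
```

With this toy data, `suggest_treatment("P1", "caries")` uses level 1, `suggest_treatment("P1", "gingivitis")` falls back to the most similar patient (level 2), and a patient with no similarity edges falls through to the most common treatment (level 3).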
Novelty and Anticipated Impact
The proposed system includes different deep learning-based techniques for dental and skeletal diseases and treatments, which will improve the accuracy and efficiency of dental treatments and reduce errors. The proposed novel incremental learning framework will allow a gradual and improved understanding of dental and skeletal diseases and the transfer of this knowledge to an AI-based model through active interaction between the toolbox and the expert. It will preserve the doctor's experiences in diagnosis and treatment and convert them into standardized annotated data sets that will be used to support young dentists with less experience in improving dental treatments.

The proposed DSN and knowledge base would be useful for both dentists and the public to share and transfer expertise. The accumulation of expertise around dental diagnosis and treatment will preserve the expertise of doctors and will allow continuous exchange and transfer of expertise between healthcare providers. The proposed solution will also support dental surgeries, which are expensive, and will reduce errors and increase comfort and satisfaction through improved precision and accuracy to meet patient expectations. It will open the door for digital and smart dental healthcare systems. The solution will enable plug-and-play interfaces to different X-ray and camera technologies for national and international deployments.

Acknowledgment
Research reported in this publication has been supported by New Vision Systems Canada Inc. and Mitacs.

About the Authors
Hossam A. Gabbar (hossam.gaber@ontariotechu.ca) is with the Faculty of Energy Systems and Nuclear Science and the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Abderrazak Chahid (abderrazak.chahid@ontariotechu.net) is with the Faculty of Energy Systems and Nuclear Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Md. Jamiul-Alam Khan (mdjamiul.khan@ontariotechu.net) is with the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Oluwabukola Grace-Adegboro (oluwabukola.adegboro@ontariotechu.net) is with the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Matthew Immanuel Samson (sunnyssj8@gmail.com) is with New Visions Systems Canada Inc., Scarborough, ON M1S 3L1, Canada.

References
[1] H. Amasya, D. Yildirim, T. Aydogan, N. Kemaloglu, and K. Orhan, “Cervical vertebral maturation assessment on lateral cephalometric radiographs using artificial intelligence: Comparison of machine learning classifier models,” Dentomaxillofacial Radiol., vol. 49, no. 5, Mar. 2020, Art. no. 49, doi: 10.1259/dmfr.20190441.
[2] “iTero element 5D — iTero intraoral scanner.” iTero. Accessed: Mar. 12, 2022. [Online]. Available: https://global.itero.com/en/products/itero_element_5d
[3] “Cephalometric imaging systems.” Carestream Dental. Accessed: Mar. 12, 2022. [Online]. Available: https://www.carestreamdental.com/en-us/csd-products/extraoral-imaging/cephalometric-imaging/
[4] “GO extraoral imaging,” Newtom. Accessed: Mar. 12, 2022. [Online]. Available: https://www.newtom.it/en/medicale/prodotti/go/
[5] “Cephalometric anlaysis archives— CephX— AI driven dental services.” CephX. Accessed: Mar. 12, 2022. [Online]. Available: https://cephx.com/it/tag/cephalometric-anlaysis-it/
[6] “Cephalometric anlaysis archives— CephX— AI driven dental services.” CephX. Accessed: Mar. 12, 2022. [Online]. Available: https://cephx.com/it/tag/cephalometric-anlaysis-it/
[7] “Facad ortho tracing software.” facad.com. Accessed: Mar. 12, 2022. [Online]. Available: https://www.facad.com/wp/
[8] “Software for cephalometric analysis OrisCeph Rx CE.” OrisLine. Accessed: Mar. 12, 2022. [Online]. Available: https://www.orisline.com/en/software-for-cephalometric-analysis/
[9] “AudaxCeph software.” audaxceph.com. Accessed: Mar. 12, 2022. [Online]. Available: https://www.audaxceph.com/
[10] “Content library — Aquarium — Orthodontic imaging and practice management software — Patient education — 1(818)435-1368 — Dolphin imaging and management solutions — Product.” Dolphin Imaging. Accessed: Mar. 12, 2022. [Online]. Available: https://www.dolphinimaging.com/product/Aquarium?Subcategory_OS_Safe_Name=Content_Library
[11] F. Schwendicke, T. Golla, M. Dreher, and J. Krois, “Convolutional neural networks for dental image diagnostics: A scoping review,” J. Dentistry, vol. 91, Dec. 2019, Art. no. 103226, doi: 10.1016/j.jdent.2019.103226.
[12] M. Juneja et al., “A review on cephalometric landmark detection techniques,” Biomed. Signal Process. Control, vol. 66, Apr. 2021, Art. no. 102486, doi: 10.1016/j.bspc.2021.102486.
[13] K. Watson and C. Frank. “How to brush your teeth properly.” Healthline. Accessed: Mar. 12, 2022. [Online]. Available: https://www.healthline.com/health/dental-and-oral-health/how-to-brush-your-teeth
[14] H. W. Hwang, J. H. Moon, M. G. Kim, R. E. Donatelli, and S. J. Lee, “Evaluation of automated cephalometric analysis based on the latest deep learning method,” Angle Orthodontist, vol. 91, no. 3, pp. 329–335, May 2021, doi: 10.2319/021220-100.1.
[15] C. W. Wang et al., “Evaluation and comparison of anatomical landmark detection methods for cephalometric X-ray images: A grand challenge,” IEEE Trans. Med. Imag., vol. 34, no. 9, pp. 1890–1900, Sep. 2015, doi: 10.1109/TMI.2015.2412951.
[16] H. Kim, E. Shim, J. Park, Y. Y. J. Kim, U. Lee, and Y. Y. J. Kim, “Web-based fully automated cephalometric analysis by deep learning,” Comput. Methods Programs Biomed., vol. 194, Oct. 2020, Art. no. 105513, doi: 10.1016/j.cmpb.2020.105513.
[17] “Fully automatic cephalometric evaluation using random forest regression-voting,” Univ. of Manchester, Manchester, U.K., 2015. [Online]. Available: https://www.research.manchester.ac.uk/portal/en/publications/fully-automatic-cephalometric-evaluation-using-random-forest-regressionvoting(b42c658f-0a66-4d1e-99c7-9cb67fb282a0).html
[18] “Grand challenges in dental X-ray image analysis 2014.” Accessed: Mar. 12, 2022. [Online]. Available: https://www.be.ntust.edu.tw/p/404-1009-44930.php?Lang=zh-tw
[19] Y. Song, X. Qiao, Y. Iwamoto, and Y. W. Chen, “Automatic cephalometric landmark detection on X-ray images using a deep-learning method,” Appl. Sci. (Switzerland), vol. 10, no. 7, Apr. 2020, Art. no. 2547, doi: 10.3390/app10072547.
[20] J. Kim et al., “Accuracy of automated identification of lateral cephalometric landmarks using cascade convolutional neural networks on lateral cephalograms from nationwide multi-centres,” Orthodontics Craniofacial Res., vol. 24, no. S2, pp. 59–67, Dec. 2021, doi: 10.1111/ocr.12493.
[21] J. Montúfar, M. Romero, and R. J. Scougall-Vilchis, “Automatic 3-dimensional cephalometric landmarking based on active shape models in related projections,” Amer. J. Orthodontics Dentofacial Orthopedics, vol. 153, no. 3, pp. 449–458, Mar. 2018, doi: 10.1016/j.ajodo.2017.06.028.
[22] S. H. Kang, K. Jeon, H. J. Kim, J. K. Seo, and S. H. Lee, “Automatic three-dimensional cephalometric annotation system using three-dimensional convolutional neural networks: A developmental trial,” Comput. Methods Biomechanics Biomed. Eng., Imag. Vis., vol. 8, no. 2, pp. 210–218, Mar. 2020, doi: 10.1080/21681163.2019.1674696.
[23] S. Nishimoto, Y. Sotsuka, K. Kawai, H. Ishise, and M. Kakibuchi, “Personal computer-based cephalometric landmark detection with deep learning, using cephalograms on the internet,” J. Craniofacial Surgery, vol. 30, no. 1, pp. 91–95, Jan. 2019, doi: 10.1097/SCS.0000000000004901.
[24] K. Panetta, R. Rajendran, A. Ramesh, S. Rao, and S. Agaian, “Tufts dental database: A multimodal panoramic X-ray dataset for benchmarking diagnostic systems,” IEEE J. Biomed. Health Inform., vol. 26, no. 4, pp. 1650–1659, Apr. 2022, doi: 10.1109/JBHI.2021.3117575.
[25] C. Lindner, C. W. Wang, C. T. Huang, C. H. Li, S. W. Chang, and T. F. Cootes, “Fully automatic system for accurate localisation and analysis of cephalometric landmarks in lateral cephalograms,” Scientific Rep., vol. 6, no. 1, pp. 1–10, Jun. 2021, doi: 10.1038/s41598-021-91681-7.
[26] “Gum problems: 6 types, causes, symptoms, treatment & oral cancer.” MedicineNet. Accessed: Mar. 12, 2022. [Online]. Available: https://www.medicinenet.com/gum_problems/article.htm
[27] “Fractured tooth (Cracked Tooth): What it is, symptoms & repair,” Cleveland Clinic, Cleveland, OH, USA, 2021. Accessed: Mar. 12, 2022. [Online]. Available: https://my.clevelandclinic.org/health/diseases/21628-fractured-tooth-cracked-tooth
[28] “Healthline: Medical information and health advice you can trust.” Healthline. Accessed: Mar. 12, 2022. [Online]. Available: https://www.healthline.com/

MDN-Enabled SO for Vehicle Proactive Guidance in Ride-Hailing Systems
Minimizing Travel Distance and Wait Time
by Xiaoming Li, Jie Gao, Chun Wang, Xiao Huang, and Yimin Nie

Date of current version: 17 July 2023
28 IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE July 2023 2333-942X/23©2023IEEE

Vehicle proactive guidance strategies are used by ride-hailing platforms to mitigate supply–demand imbalance across regions by directing idle vehicles to high-demand regions before the demands are realized. This article presents a data-driven stochastic optimization framework for computing idle vehicle guidance strategies. The objective is to minimize drivers' idle travel distance, riders' wait time, and the oversupply costs (OSCs) and undersupply costs (USCs) of the platform. Specifically, we design a novel neural network that integrates gated recurrent units (GRUs) with mixture density networks (MDNs) to capture the spatial-temporal features of the rider demand distribution. The outcome of the neural network is fed into a stochastic optimization process to compute near-optimal idle vehicle guidance solutions. The performance of the proposed framework is validated through numeric experiments using New York yellow taxi trip record data. Our results show that the MDN-enabled stochastic optimization approach outperforms other machine learning-based vehicle guidance models that only utilize the point estimates of rider demands. In terms of managerial implications, it is clear from our experimental results that, by adopting data-driven stochastic optimization models in their vehicle guidance systems, ride-hailing platforms can improve rider and driver satisfaction and reduce their operating costs.

Introduction
The most important service provided by ride-hailing platforms, such as Lyft, Uber, and Didi, is to match drivers and riders efficiently. To ensure service quality and reduce wait times, the demand of riders needs to be promptly met by the supply of drivers. However, dynamic changes in the demands across the service regions often cause a supply–demand imbalance in the regions and make it challenging for the platforms to dispatch sufficient drivers to high-demand regions in a timely manner to ensure low wait times. Without a proactive guidance strategy, a ride-hailing platform has to react to the rider demands across regions when they are realized. This reactive strategy may prolong riders' wait times since the needed idle vehicles may not be in riders' immediate proximity.

Idle vehicle proactive guidance strategies have been proposed in recent literature to tackle this challenge [1], [2]. A proactive guidance strategy guides needed vehicles to regions where future demands are expected to outstrip supply. As a result, it can increase the rider serving rate (SR) and, at the same time, reduce driver idle driving distances and rider wait times [2]. Guo et al. [3] propose an online ride-hailing dispatch framework that is based on spatiotemporal thermos guidance to address the real-time service vehicle dispatching problem. A concept named spatiotemporal thermos is defined to represent the demand density of ride-hailing regions. In addition, the random forest regression machine-learning method is utilized for spatiotemporal thermos forecasting. A data-driven recommendation system that exploits the benefits of vehicular social networks for ride-hailing services is designed in [4], where long short-term memory is utilized to forecast the demands. Chen et al. [5] propose a hierarchical framework for vehicle dispatch in ride-sharing systems. The higher hierarchy optimizes idle mileage by rebalancing vehicles across regions toward current and predicted rider demands, while the lower hierarchy minimizes the total mileage delay and serves as many rider requests as possible. Miao et al. [6] develop a data-driven taxi dispatch framework under spatial-temporally correlated demand uncertainty using robust optimization modeling techniques. In this work, vacant vehicles are dispatched toward predicted rider demand that varies in an uncertain demand set constructed on spatial-temporally correlated data sets. The objective is to minimize the total idle travel distance under the worst case demand scenario while maintaining service fairness across the whole city. In addition to guidance strategies at the system level, the impact of guidance signals on individual drivers' decisions is also studied. In [7], a sequential binary logistic regression model is proposed to determine the factors influencing the driver's cruising decisions when receiving taxi-calling signals. The model is calibrated by survey data. Recently, machine learning [8] and deep reinforcement learning [9] approaches have been widely utilized in ride-hailing applications, which sheds light on a research trend of combining learning approaches with optimization modeling techniques.

The articles mentioned previously provide important insights into designing a proactive guidance mechanism in ride-hailing systems. However, their approaches do not incorporate uncertainties in their optimization process in the sense that they only predict scalar point estimates of the demands in regions, which does not allow stochastic optimization (SO) models. This simplified modeling of uncertainty often leads to a considerable decline in system performance [10]. As an exception, the approach proposed in [6] does involve the uncertainty sets of the demand. However, their robust optimization models focus on guaranteed performance in worst case

scenarios, which is rather conservative for the purpose of proactive guidance.

In this article, we propose a data-driven SO framework to compute near-optimal idle vehicle proactive guidance strategies given the dynamic rider demand and driver supply across ride-hailing service regions. Instead of just predicting the demand in the form of a scalar, the framework models the uncertainty of rider demand by estimating its probability distribution using historical rider demand data. The uncertainty model is then integrated into an SO process to compute proactive guidance strategies. The contribution of this article is two-fold: 1) we extend MDNs [11] by integrating GRUs [12], which enables the MDN to capture various spatial-temporal features in estimating rider demand distributions and 2) we integrate the extended MDN with an SO process to minimize the vehicle guidance related costs, including USC, OSC, and driver idle travel cost.

The MDN-SO Framework
In this section, we present the MDN-enabled SO (MDN-SO) framework, which consists of two modules: an extended MDN that is suitable for estimating demand distributions of time-series data and an SO process that computes near-optimal proactive guidance strategies.

The Extended MDN
MDN is a combination of a neural network and a Gaussian mixture model (GMM). Unlike the regular neural network that only predicts a single value as the output, MDN can capture the model's stochastic behaviors by parameterizing a Gaussian mixture distribution using the outputs of a neural network. However, regular MDN models are not sufficient for our purpose as they do not possess the capability of capturing spatial-temporal features in rider demand data. Therefore, we propose an extended MDN to be integrated into our SO process, which requires the distribution of the rider demand as input.

The extended MDN (XMDN) is an integration of regular MDN with GRU. The GMM used by the XMDN is configured by the mixing coefficients (also known as weights), mean, and variance of each Gaussian kernel, as shown in (1):

$$p(y \mid X, \theta) = \sum_{i=1}^{K} \pi_i(X)\, \mathcal{N}\big(y;\ \mu_i(X), \sigma_i(X)\big) \tag{1}$$

where $\theta = (\pi, \mu, \sigma)$, and $K$ is the number of Gaussian distributions (also known as components in the literature). Generally, GMM can be considered as a group of Gaussian distributions with different weights, where the $i$th Gaussian is determined by weight $\pi_i$, mean $\mu_i$, and covariance matrix $\Sigma_i$ (variance $\sigma_i^2$ for a univariate Gaussian). Then the predicted probability distribution can be represented using GMM by adjusting the parameter $\theta$. Notice that the sum of Gaussian component weights must be equal to 1 because each weight is computed by the following softmax function, which is shown in (2):

$$\pi_i = \mathrm{softmax}(h^{\pi})_i = \frac{e^{h^{\pi}_i}}{\sum_{k=1}^{K} e^{h^{\pi}_k}} \tag{2}$$

where $h^{\pi}_i$ denotes the outputs of the hidden layer prior to the layer that stores the GMM components. Meanwhile, the corresponding $\mu_i$ and $\sigma_i^2$ are computed from (3) and (4), respectively:

$$\mu_i = h^{\mu}_i \tag{3}$$

$$\sigma_i = \exp\big(h^{\sigma}_i\big). \tag{4}$$

The probabilistic forecasting model is built on the XMDN, where GRUs can encode useful information of the past in single or multiple layers. The input of each layer is the output of the previous layer concatenated with the network input. Then the outputs of the GRU hidden layer $h_t$ will be used to compute the parameters of GMM from (2)–(4). In addition, the concatenation of outputs of all layers is used to predict the network's output, which is compared with the target $y$. Finally, we use the mixture density parameters to parameterize a Gaussian mixture distribution as the probabilistic forecasting outcome. The prediction process can be repeated in a loop to predict rider demand for multiple time steps.

Furthermore, one of the issues in MDNs, like the conventional deep neural network, is the overfitting problem [13]. In this work, besides the dropout operations in XMDN, we introduce the L2 regularization technique to avoid the overfitting issue. In this regard, we design the loss function of XMDN shown in (5):

$$E(w_{\mathrm{GRU}}) = -\sum_{n=1}^{N} \ln\Big\{\sum_{k=1}^{K} \pi_k(X_n, w_{\mathrm{GRU}})\, \mathcal{N}\big(t_n;\ \mu_k(X_n, w_{\mathrm{GRU}}), \sigma_k^2(X_n, w_{\mathrm{GRU}})\big)\Big\} + \frac{1}{2}\lVert w_{\mathrm{GRU}}\rVert^2 \tag{5}$$

where the parameter $w_{\mathrm{GRU}}$ denotes the set of weights and biases in the GRU deep neural networks.
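The mapping from raw network outputs to mixture parameters in (2)–(4), the mixture density (1), and the regularized loss (5) can be sketched in plain Python for a scalar target. This is a minimal illustration, not the authors' TensorFlow implementation: the function names are hypothetical, the raw outputs are passed in as plain lists rather than produced by a GRU, and `sigma` is treated as a standard deviation inside the Gaussian density.

```python
import math


def mixture_params(h_pi, h_mu, h_sigma):
    """Map raw hidden-layer outputs to GMM parameters, following (2)-(4)."""
    shift = max(h_pi)                                   # stabilizes the softmax numerically
    exps = [math.exp(v - shift) for v in h_pi]
    total = sum(exps)
    pi = [e / total for e in exps]                      # (2): weights sum to 1
    mu = list(h_mu)                                     # (3): means are used as-is
    sigma = [math.exp(v) for v in h_sigma]              # (4): exp keeps sigma positive
    return pi, mu, sigma


def gaussian_pdf(y, mu, sigma):
    """Univariate Gaussian density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, pi, mu, sigma):
    """Mixture density (1) for a scalar target y."""
    return sum(p * gaussian_pdf(y, m, s) for p, m, s in zip(pi, mu, sigma))


def xmdn_loss(targets, raw_outputs, weights):
    """Negative log-likelihood plus the L2 penalty, as in (5)."""
    nll = 0.0
    for t, (h_pi, h_mu, h_sigma) in zip(targets, raw_outputs):
        nll -= math.log(mixture_pdf(t, *mixture_params(h_pi, h_mu, h_sigma)))
    return nll + 0.5 * sum(w * w for w in weights)
```

For example, `mixture_params([0.0, 0.0], [1.0, 3.0], [0.0, 0.0])` yields two equally weighted unit-variance components centered at 1 and 3. The softmax guarantees the weights sum to 1, and the exponential guarantees a positive scale, which is why the raw outputs can be left unconstrained.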

The Stochastic Optimization Process
We assume the ride-hailing platform operates during a day that is discretized into a group of batching windows (also known as time slots) with fixed size $\Delta T$ (e.g., 10 min). (We use "batching window" and "time slot" interchangeably in this article.) To facilitate the vehicle allocations, the ride-hailing service zone is divided into a group of disjoint ride-hailing service regions denoted as $M$. Let $V_t$ denote the set of idle vehicles in batching window $t$. The binary variable $x^t_{v,m} = 1$ if idle vehicle $v$ is guided to the point of interest (POI) in region $m$ at time $t$, and $x^t_{v,m} = 0$ otherwise.

At the beginning of each batching window, a certain number of idle vehicles are guided to the ride-hailing regions' POIs with minimum guidance distance to meet the riders' requests in the future. This proactive guidance operation incurs the idle vehicle guidance cost, which can be formulated in (6):

$$\alpha \sum_{v \in V_t} \sum_{m \in M} g_{v,m}\, x^t_{v,m} \tag{6}$$

where $g_{v,m}$ denotes the distance between idle vehicle $v$'s GPS location and the GPS location of region POI $m$, and $\alpha$ denotes the idle travel cost per mile. In addition, OSCs are incurred when the number of guided vehicles exceeds the rider demand (including predicted rider demands for the current batching window and the unserved riders from the previous batching window). Likewise, USCs are incurred when the number of guided vehicles is lower than the rider demand. The sum of OSC and USC is defined in (7):

$$\sum_{m \in M} \mathbb{E}_{\hat{d}^{t,s}_m \sim P}\Big[\beta \cdot \max\Big\{0,\ \Big(\sum_{v \in V_t} x^t_{v,m} - \hat{d}^{t,s}_m - d^{t-1}_m\Big)\Big\} + \gamma \cdot \max\Big\{0,\ \Big(d^{t-1}_m + \hat{d}^{t,s}_m - \sum_{v \in V_t} x^t_{v,m}\Big)\Big\}\Big] \tag{7}$$

where $\hat{d}^{t,s}_m$ and $d^{t-1}_m$ denote the predicted rider demand at region $m$ in time slot $t$ under scenario $s$ and the number of unserved riders at region $m$ in time slot $t - 1$, respectively. Notice that the stochastic programming model will degenerate to the deterministic model if only one scenario is involved. $\beta$ and $\gamma$ denote the OSC per vehicle and USC per requested order, respectively. Since the stochastic programming model has a set of rider demand scenarios (drawn from the rider demand distribution), the previous formula denotes the expected total cost (TC) over the rider demand distribution.

A group of constraints must be satisfied according to our problem settings. First, a certain level of supply–demand ratio ($\theta$), along with the supply–demand ratio gap ($\xi$) among ride-hailing regions, must be taken into consideration, which is captured by the following constraints:

$$(\theta - \xi)\big(\hat{d}^{t}_m + d^{t-1}_m\big) \le \sum_{v \in V_t} x^t_{v,m} \le \theta\big(\hat{d}^{t}_m + d^{t-1}_m\big), \quad \forall m \in M. \tag{8}$$

In addition, each idle vehicle can be guided to one region's POI at most, which is represented by

$$\sum_{m \in M} x^t_{v,m} \le 1, \quad \forall v \in V_t. \tag{9}$$

Further, each idle vehicle, if guided, can only be guided to a region's POI that the vehicle can reach within the length of the batching window. These time constraints are captured by

$$g_{v,m}/\lambda \le \Delta T + H\big(1 - x^t_{v,m}\big), \quad \forall v \in V_t,\ \forall m \in M \tag{10}$$

where $H$ is a large positive number to linearize the "if" constraints [14], and $\lambda$ is the idle vehicle's travel speed that is assumed to be a constant value during the guidance operation. Therefore, $g_{v,m}/\lambda$ is the guidance time between the GPS location of vehicle $v$ and the GPS location of region POI $m$. Moreover, the total number of idle vehicles must be less than the fleet size under a certain supply–demand ratio, which leads to the following constraint:

$$\sum_{v \in V_t} \sum_{m \in M} x^t_{v,m} \le \theta C_t. \tag{11}$$

Given the objective function and constraints, the holistic optimization model for the idle vehicle proactive guidance problem is summarized as follows:

$$\begin{aligned} \text{minimize} \quad & (6) + (7) \\ \text{subject to} \quad & (8),\ (9),\ (10),\ (11) \\ & x^t_{v,m} \in \{0, 1\}, \quad \forall v \in V_t,\ \forall m \in M,\ \forall t \in T. \end{aligned} \tag{12}$$

As discussed previously, the objective is to minimize the overall ride-hailing system costs.

To solve the SO model, we first reformulate it to its corresponding deterministic counterpart with a large group of scenarios by applying the sample average approximation (SAA) [15] technique. The resulting deterministic model can then be solved by an off-the-shelf solver such as Gurobi (https://www.gurobi.com/) or CPLEX (https://www.ibm.com/analytics/cplex-optimizer).

Numerical Experiment
In this section, we validate the performance of MDN-SO through numerical experiments. We first describe the numerical validation environment and performance metrics. Next, we discuss data processing and feature engineering for XMDN and GRU. Finally, we evaluate the proposed approach by comparing the performance with other machine learning-based vehicle guidance models.

Experiment Setup
Both batching matching and historical averages are coded in Python 3.8, and the mathematical optimization models are solved by Gurobi 9.1 (https://www.gurobi.com/academia/academic-program-and-licenses/). The experiments are run on a PC with Intel Core i7 CPU, 32 GB RAM, Windows 10. The deep learning models

(GRU and XMDN) are coded in Python 3.8 and TensorFlow 2.4 under NVIDIA GeForce RTX 2080 GPU, 16 GB RAM, and Ubuntu 18.04.

For GRU models, the training time of each epoch is around 275 s, and the average training time of the GRU model is approximately 3.5 h. For XMDN models, the training time of each epoch is around 358 s, and the average training time of the XMDN model is approximately 4.7 h. After the training process, the deep learning models can predict the rider demand (using GRU) and rider demand distribution (using XMDN) by utilizing the time-series sequence data from the testing set, where the computational time for prediction is only a few seconds. In addition, the optimization model can be solved by Gurobi within 2 min. Therefore, the overall time is far less than the batching window size, which indicates that our proposed framework can be applied to the dynamic ride-hailing platform.

(In this case, the riders must wait until the next batching window for service, and we assume the riders do not cancel their requests if they are not served in the current batching window. A similar assumption is discussed in [16].) In addition, we assume that riders are picked up using the first-come-first-serve (FCFS) protocol. Also, for the no guidance scenario, riders are picked up by their nearest drivers. (Since the FCFS protocol is adopted, rider A, whose request time is before rider B, will be picked up by a driver even if the distance between the driver and rider B is closer than the distance between the driver and rider A.)

Feature Engineering
We consider the following features that are highly correlated to rider demands. Features extracted from the data set in this work include rider demand, region ID, day of the month, month, day of the week, hour of the day, and minute of the hour. The rider demand is used as the predicted target, while the rest of the features are used to
Evaluation Metrics observe how they affect the target. We adopt XGBoost [17]
We adopt the following three data-driven optimization to determine the feature importance for the deep learning
models as the guidance approaches 1) our proposed predictor, whose metric is based on impurity value. The
approach MDN-SO, 2) the integration of GRU and deter- result of the feature importance is illustrated in Figure 1.
ministic optimization model that is labeled as GRU-DM, We can observe that the region ID and hour of the day are
and 3) the integration of historical average (HA) and deter- the most important features for the selected data set. The
ministic optimization model that is labeled as HA-DM. In feature of region ID and hour of the day takes over 50%
addition, the nonguidance mechanism is also introduced and 30%, respectively, which implies that the features sig-
to compare the results. Meanwhile, we select the following nificantly impact rider demand prediction.
metrics for the performance comparison.
◆◆ OSC, USC, and TC: The metric involves two types of Performance Evaluation
costs, namely, OSC, which can be computed by the In this section, we choose one-week trip records (2 March
driver’s idle driving distance, and USC, which can be 2016–8 March 2016) that involve five weekdays and two
computed by the profit of service orders. The results weekend days for the experiment validations. The experi-
can be computed from (7) by replacing the predicted mental results averaged five and two for the weekday and
rider demand with the real demand. weekend scenarios, respectively. Since no idle driver infor-
◆◆ Rider’s SR: For the ride-hailing service region k, the mation is available in the data sets, we assume the coordi-
metric is defined as the proportion of served (satisfied) nates of idle vehicles are randomly generated in the eight
riders. Namely, the rider’s SR at region k is ride-hailing regions. The parameter setting of the optimi-
zation models is described in Table 1.
SR k = min $ 1, dk .
s
(13) We assume that the coordinates of the idle vehicles are
k
randomly generated across the eight ride-hailing service
where s k and d k denote the number of (guided) idle regions. In addition, the number of idle vehicles (fleet size)
vehicles at region k and the number of requests (real in the current time slot is determined by the real rider
rider demand) at region k, respectively. demands from the previous time slot. We set the supply–
◆◆ Rider’s waiting time (WT): WT is computed in differ- demand ratio parameter i to 0.95, 1.0, and 1.05 to evaluate
ent ways depending on the approaches. To be specific, the experimental results under different fleet size levels.
for guidance approaches (i.e., MDN-SO, GRU-DM, and We are more interested in how much benefit the ride-hail-
HA-DM), WT involves three parts, namely: 1) the time ing platform could obtain from vehicle proactive guidance.
duration between the end of the current batching win- Since idle vehicles may distribute across regions under any
dow and the rider’s request time (WT1), 2) the driver’s patterns, we consider the three idle vehicle distribution
travel time from POI (from driver’s GPS for no guidance scenarios shown as follows.
scenario) to rider’s pickup coordinate (WT2), and ◆◆ Positively correlated idle vehicle distribution: Given a
3) 10 minutes if the rider cannot be picked up in the cur- set of region index K = " 1, 2, f, k ,, a set of idle vehi-
rent batching window (WT3): cles distributed across regions {s i} i ! K, and a set of
demands across regions {d j} j ! K, we formulate such a
WT = WT1 + WT2 + WT3 . (14) tuple sequence as follows:

$$\ldots, \langle s_{i^-}, d_{j^-} \rangle, \langle s_i, d_j \rangle, \langle s_{i^+}, d_{j^+} \rangle, \ldots \tag{15}$$

such that

$$\ldots, s_{i^-} \le s_i \le s_{i^+}, \ldots$$
$$\ldots, d_{j^-} \le d_j \le d_{j^+}, \ldots.$$

We call this type of idle vehicle distribution Positively Correlated, labeled as PC, if $\forall i, j \in K$, $i = j$.
Intuitively, PC is introduced to describe a scenario in which the idle vehicles are "ideally" distributed across regions, which indicates more idle vehicles are cruising around the higher demand regions and vice versa. In this sense, vehicle proactive guidance operation is unnecessary since the number of idle vehicles can meet the demand for each region. However, this ideal scenario seldom happens in realistic applications [18].
◆ Negatively correlated idle vehicle distribution: Using the same notation, we formulate such a tuple sequence as follows:

$$\ldots, \langle s_{i^-}, d_{j^-} \rangle, \langle s_i, d_j \rangle, \langle s_{i^+}, d_{j^+} \rangle, \ldots \tag{16}$$

such that

$$\ldots, s_{i^-} \le s_i \le s_{i^+}, \ldots$$
$$\ldots, d_{j^-} \ge d_j \ge d_{j^+}, \ldots.$$

We call this type of idle vehicle distribution Negatively Correlated, labeled as NC, if $\forall i, j \in K$, $i = j$.
In contrast to PC, NC is introduced to describe a "worst case" scenario in which the idle vehicles are cruising around the "wrong" regions. In this case, vehicle proactive guidance operations are quite necessary to alleviate the imbalance of supply and demand.
◆ Uniform idle vehicle distribution: In this case, the idle vehicles are uniformly distributed across multiple ride-hailing regions. We call this type of idle vehicle distribution Uniform, labeled U.

We compare the validation results based on the previous idle vehicle distributions. First, as shown in Table 2, we observe that the OSC increases and the USC decreases as the fleet size grows ($\theta$ ranges from 0.95 to 1.05). This is because more rider requests will be satisfied as the number of idle vehicles increases, which leads to more OSC and less USC. In addition, MDN-SO outperforms the remaining data-driven competitors GRU-DM and HA-DM in terms of the TC, with the average TC reduction by 17.5% and 63.8% on weekdays, 21.4% and 62.1% on weekends under $\theta = 0.95$; 17.2% and 70.5% on weekdays, 23.2% and 68.8% on weekends under $\theta = 1.0$; 23.7% and 64.4% on weekdays, 31.9% and 63.7% on weekends under $\theta = 1.05$.

Second, as shown in Table 2, MDN-SO is approximately 2% and 17% higher than GRU-DM and HA-DM in terms of average SR. In addition, without guidance operation, the SR under positively correlated distribution (labeled NG-PC) is much higher than the one under uniform (labeled NG-U) and negatively correlated distributions (labeled NG-NC). This is because NG-PC considers an ideal scenario in which the idle vehicles are cruising at their "right" regions. Therefore, all the regions can satisfy the rider's requests. Notice that during some time slots (around 4 a.m. to 8 a.m. on weekdays, around 4 a.m. to 11 a.m. on weekends), HA-DM is inferior to NG-U in terms of average SR, implying that without accurate rider demand predictions, a guidance approach can be even worse than no guidance.

Further, MDN-SO is quite close to the NG-PC scenario in terms of average SR, which indicates that our proposed

[Figure 1: pie chart titled "Feature Importance". Region-ID 55.26%, HoD 30.05%, DoW 7.74%, DoM 3.24%, Month 2.5%, MoH 1.21%.]
Figure 1. The pie plot of the feature importance where DoM, MoY, DoW, HoD, and MoH denote the day of the month, the month of the year, the day of the week, the hour of the day, and the minute of the hour, respectively.

Table 1. Parameter Settings in the Optimization Model.
Parameter    Value
$\alpha$     US$0.4–US$0.9
$\beta$      idle travel distance cost: $\alpha \cdot g_{v,m}$
$\gamma$     estimated from the real data set in the corresponding time slot
$\lambda$    30 mi/h
$\theta$     {0.95, 1, 1.05}
$\xi$        0.1
$\Delta T$   10 min
$C_t$        set to the total rider demand in the previous time slot
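The three idle vehicle distributions defined in (15) and (16) can be illustrated with a small sketch. The article only states the ordering conditions; the proportional split used here for PC, the reversed-rank assignment used for NC, and the function names are assumptions made for illustration, not the authors' generator.

```python
def proportional_shares(demand, fleet_size):
    """Split fleet_size across regions in proportion to demand (assumed rule)."""
    total = sum(demand)
    raw = [fleet_size * d / total for d in demand]
    shares = [int(x) for x in raw]
    # Give the vehicles lost to rounding to the largest fractional parts.
    leftover = fleet_size - sum(shares)
    order = sorted(range(len(demand)), key=lambda r: raw[r] - shares[r], reverse=True)
    for r in order[:leftover]:
        shares[r] += 1
    return shares


def idle_vehicle_distribution(demand, fleet_size, pattern):
    """Return idle vehicle counts per region for pattern 'PC', 'NC', or 'U'."""
    k = len(demand)
    if pattern == "U":
        base, extra = divmod(fleet_size, k)
        return [base + (1 if r < extra else 0) for r in range(k)]
    shares = proportional_shares(demand, fleet_size)
    if pattern == "PC":
        # Supply ranks follow demand ranks.
        return shares
    if pattern == "NC":
        # Largest supply goes to the lowest-demand region, and so on.
        by_demand = sorted(range(k), key=lambda r: demand[r])
        supply = [0] * k
        for region, share in zip(by_demand, sorted(shares, reverse=True)):
            supply[region] = share
        return supply
    raise ValueError(f"unknown pattern: {pattern}")


demand = [10, 40, 20, 30]  # hypothetical requests in four regions
pc = idle_vehicle_distribution(demand, 100, "PC")
nc = idle_vehicle_distribution(demand, 100, "NC")
u = idle_vehicle_distribution(demand, 100, "U")
```

With the hypothetical demand above, PC places the most vehicles in the busiest region, NC places them in the quietest one, and U ignores demand altogether.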

approach is capable of guiding idle vehicles in a more reasonable manner. This is because the MDN-SO framework utilizes uncertainty in the forecasting results that involves all the potential rider demand possibilities for decision-making. Moreover, comparing MDN-SO with GRU-DM and HA-DM, we observe that our proposed data-driven approach comes closest to the NG-PC scenario, which implies that MDN-SO provides a fairly effective strategy for the idle vehicle proactive guidance operation.

Finally, the rider's average waiting time is an essential metric from the rider's perspective. As shown in Figure 2, the rider's average waiting time drops as the fleet size increases. This is because more rider requests will be satisfied when more idle vehicles are available. Therefore, WT3 will be smaller. In addition, NG-PC outperforms NG-U and NG-NC regarding the rider's average waiting time. This is quite straightforward because the riders in each region can be served by the idle drivers in the corresponding region under the NG-PC scenario, while there exist a few riders who are served by the drivers in other regions under the NG-NC scenario, where the rider's average waiting time will increase.

Further, among the three data-driven guidance approaches, GRU-DM and HA-DM are 2.1% and 11.5% higher than MDN-SO in terms of the rider's average waiting time. Also, MDN-SO can reduce the rider's average waiting time by 20% compared with the NG-U scenario without guidance, which is closer to the realistic scenario. This is because MDN-SO leverages not only the predicted demand uncertainty in each ride-hailing region but also guidance operations to achieve a better solution.

Conclusions and Future Work
Effective idle vehicle guidance strategies provide ride-hailing platforms with competitive advantages in terms of improved matching rates, reduced rider wait times, and driver idle travel distances. More research work is needed in this area to ensure the sustainable growth of ride-hailing

Table 2. The average OSC, under-supply cost (USC), TC, and SR using different data-driven guidance approaches (HA-DM, GRU-DM, and MDN-SO) and no guidance with different idle vehicle distributions (NG-PC, NG-U, and NG-NC).
                          Weekday                          Weekend
$\theta$   Approach   OSC    USC     TC      SR        OSC    USC     TC      SR
0.95       HA-DM      20     6,795   6,815   79.2%     31     6,500   6,531   80.7%
0.95       GRU-DM     47     2,941   2,988   90.9%     57     3,096   3,153   91.1%
0.95       MDN-SO     72     2,394   2,466   92.7%     79     2,399   2,478   93.2%
0.95       NG-PC      N/A*   N/A     N/A     96%       N/A    N/A     N/A     96.5%
0.95       NG-U       N/A    N/A     N/A     75.7%     N/A    N/A     N/A     77.8%
0.95       NG-NC      N/A    N/A     N/A     38.3%     N/A    N/A     N/A     39.3%
1          HA-DM      40     5,509   5,549   82.8%     58     5,133   5,193   84.1%
1          GRU-DM     111    1,865   1,976   93.7%     125    1,983   2,108   93.9%
1          MDN-SO     132    1,503   1,635   95.1%     145    1,473   1,618   95.6%
1          NG-PC      N/A    N/A     N/A     96.6%     N/A    N/A     N/A     97.4%
1          NG-U       N/A    N/A     N/A     78.1%     N/A    N/A     N/A     80.6%
1.05       HA-DM      140    3,867   4,007   87.8%     125    3,858   3,983   87.3%
1.05       GRU-DM     204    1,665   1,869   94.5%     213    1,910   2,123   94.2%
1.05       MDN-SO     268    1,158   1,426   96.1%     241    1,203   1,444   96.4%
1.05       NG-PC      N/A    N/A     N/A     97.7%     N/A    N/A     N/A     97.7%
1.05       NG-U       N/A    N/A     N/A     80.2%     N/A    N/A     N/A     83%
*OSC, USC, and TC are set to N/A under NG-PC, NG-U, and NG-NC since no idle vehicle guidance operation is involved.
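The TC reductions quoted in the text can be recomputed from the TC columns of Table 2. The short sketch below does this arithmetic; the dictionary layout and the function name are illustrative choices, and the first value of each pair is taken to be the weekday column because that assignment reproduces the percentages reported for weekdays.

```python
# Average TC copied from Table 2 as (weekday, weekend) pairs,
# keyed by the supply-demand ratio.
TC = {
    0.95: {"HA-DM": (6815, 6531), "GRU-DM": (2988, 3153), "MDN-SO": (2466, 2478)},
    1.0: {"HA-DM": (5549, 5193), "GRU-DM": (1976, 2108), "MDN-SO": (1635, 1618)},
    1.05: {"HA-DM": (4007, 3983), "GRU-DM": (1869, 2123), "MDN-SO": (1426, 1444)},
}


def tc_reduction(ratio, baseline, day):
    """Percentage by which MDN-SO lowers TC relative to a baseline approach."""
    index = 0 if day == "weekday" else 1
    base = TC[ratio][baseline][index]
    ours = TC[ratio]["MDN-SO"][index]
    return 100.0 * (base - ours) / base


for ratio in TC:
    for day in ("weekday", "weekend"):
        gru = tc_reduction(ratio, "GRU-DM", day)
        ha = tc_reduction(ratio, "HA-DM", day)
        print(f"ratio={ratio} {day}: vs GRU-DM {gru:.1f}%, vs HA-DM {ha:.1f}%")
```

The recomputed values agree with the percentages stated in the text to within about 0.1 percentage point, which is consistent with rounding or truncation of the reported figures.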

platforms in the long run. We propose an MDN-enabled SO framework by integrating an extended MDN with a stochastic optimization process. The proposed framework produces high service quality and low-cost vehicle guidance solutions. In future work, we plan to study the impacts of adopting such a vehicle guidance framework on the downstream matching/dispatching operations of a ride-hailing platform. In addition, as an enhancement to our
[Figure 2: six panels, (a) to (f), each plotting Average Waiting Time (min) against Time of a Day (0:00 to 23:00). Legend: MDN-SO, GRU-DM, HA-DM, NG, NG-NC, NG-PC.]
Figure 2. The rider's average waiting time under different supply–demand ratio scenarios: (a) weekday, $\theta = 0.95$, (b) weekend, $\theta = 0.95$, (c) weekday, $\theta = 1$, (d) weekend, $\theta = 1$, (e) weekday, $\theta = 1.05$, and (f) weekend, $\theta = 1.05$.
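The two rider-side metrics, the SR in (13) and the waiting time in (14), reduce to a few lines of arithmetic. The sketch below is a minimal illustration for the guidance approaches; the function and argument names are hypothetical, and treating a region with zero requests as fully served is an assumption not stated in the article.

```python
def satisfaction_rate(supply, demand):
    """SR_k = min{1, s_k / d_k}, following (13)."""
    if demand == 0:
        # Assumption: a region without requests counts as fully served.
        return 1.0
    return min(1.0, supply / demand)


def waiting_time(request_min, window_end_min, travel_min, served_in_window,
                 penalty_min=10.0):
    """WT = WT1 + WT2 + WT3 in minutes, following (14)."""
    wt1 = window_end_min - request_min  # wait until the batching window closes
    wt2 = travel_min                    # driver travel time to the pickup point
    wt3 = 0.0 if served_in_window else penalty_min  # 10-min penalty if unserved
    return wt1 + wt2 + wt3


sr_short = satisfaction_rate(8, 10)           # fewer vehicles than requests
sr_full = satisfaction_rate(12, 10)           # capped at 1
wt_served = waiting_time(3.0, 10.0, 4.0, True)
wt_unserved = waiting_time(3.0, 10.0, 4.0, False)
```

The 10-minute default for `penalty_min` mirrors the WT3 definition; a rider who is not picked up in the current window simply carries that extra term.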

previous work on guidance and matching [2], we will design integrated vehicle guidance and rider–driver matching systems that make use of the special characteristics of the ride-hailing data and domain-specific constraints to further improve the performance of the framework in terms of system scalability and solution quality.

About the Authors
Xiaoming Li (xiaoming.li@mail.concordia.ca) earned his M.S. degree in computer software and theory from Northeastern University and his Ph.D. degree in information and systems engineering from Concordia University. He is a research associate at Concordia University, Montreal, QC H3G 1M8 Canada. His research interests include optimization under uncertainty, large-scale optimization, network optimization, machine learning with applications in intelligent transportation systems, and supply chain optimization.

Jie Gao (jie.gao@hec.ca) earned her MASc. degree in information systems and her Ph.D. degree in information systems engineering from Concordia University. She is a postdoctoral research fellow at HEC Montreal at the University of Montreal, Montreal, QC H3T 2A7 Canada. Her research interests include data-driven optimization, game theory, mechanism design, and machine learning with applications in intelligent transportation systems, smart cities, and community healthcare.

Chun Wang (chun.wang@concordia.ca) is a professor with the Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1M8 Canada. His research interests include the interface between economic models, operations research, and artificial intelligence. He is actively conducting research in multiagent systems, data-driven optimization, and economic model-based resource allocation with applications to healthcare management, smart grid, and smart city environments. He is a Member of IEEE.

Xiao Huang (xiao.huang@concordia.ca) earned her B.E. degree in electronic engineering from Tsinghua University, her M.S. degree in mathematical finance from the University of Southern California, and her Ph.D. degree from the Marshall School of Business at the University of Southern California. She is a professor and the Concordia University Research Chair in Supply Chain Management in the John Molson School of Business at Concordia University, Montreal, QC H3G 1M8 Canada. Her research interests include competition and cooperation in supply chains, product and pricing strategies, and data-driven decision-making.

Yimin Nie (yimin.nie@ericsson.com) earned his B.S. and M.S. degrees in theoretical physics from Peking University and his Ph.D. degree in computational neuroscience from the Canadian Center of Behavior Neuroscience at the University of Calgary. He is currently a senior data scientist and artificial intelligence researcher at Global AI Accelerator (GAIA) at Ericsson Inc., Montreal, QC H4R 2A4 Canada. He worked as a senior data scientist in multiple business fields including E-commerce, finance, and telecommunication. His research interests include machine learning, computer vision, and natural language processing.

References
[1] H. Wang and H. Yang, “Ridesourcing systems: A framework and review,” Transp. Res. B, Methodol., vol. 129, pp. 122–155, Nov. 2019, doi: 10.1016/j.trb.2019.07.009.
[2] J. Gao, X. Li, C. Wang, and X. Huang, “BM-DDPG: An integrated dispatching framework for ride-hailing systems,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 11,666–11,676, Aug. 2022, doi: 10.1109/TITS.2021.3106243.
[3] Y. Guo, Y. Zhang, J. Yu, and X. Shen, “A spatiotemporal thermo guidance based real-time online ride-hailing dispatch framework,” IEEE Access, vol. 8, pp. 115,063–115,077, Jun. 2020, doi: 10.1109/ACCESS.2020.3003942.
[4] X. Wan, H. Ghazzai, and Y. Massoud, “A generic data-driven recommendation system for large-scale regular and ride-hailing taxi services,” Electronics, vol. 9, no. 4, p. 648, Apr. 2020, doi: 10.3390/electronics9040648.
[5] X. Chen, F. Miao, G. J. Pappas, and V. Preciado, “Hierarchical data-driven vehicle dispatch and ride-sharing,” in Proc. IEEE 56th Annu. Conf. Decis. Control (CDC), 2017, pp. 4458–4463, doi: 10.1109/CDC.2017.8264317.
[6] F. Miao et al., “Data-driven robust taxi dispatch under demand uncertainties,” IEEE Trans. Control Syst. Technol., vol. 27, no. 1, pp. 175–191, Jan. 2019, doi: 10.1109/TCST.2017.2766042.
[7] W. Szeto, R. Wong, and W. Yang, “Guiding vacant taxi drivers to demand locations by taxi-calling signals: A sequential binary logistic regression modeling approach and policy implications,” Transp. Policy, vol. 76, pp. 100–110, Apr. 2019, doi: 10.1016/j.tranpol.2018.06.009.
[8] Y. Liu, R. Jia, J. Ye, and X. Qu, “How machine learning informs ride-hailing services: A survey,” Commun. Transp. Res., vol. 2, 2022, Art. no. 100075, doi: 10.1016/j.commtr.2022.100075.
[9] Y. Liu, F. Wu, C. Lyu, S. Li, J. Ye, and X. Qu, “Deep dispatching: A deep reinforcement learning approach for vehicle dispatching on online ride-hailing platform,” Transp. Res. E, Logistics Transp. Rev., vol. 161, 2022, Art. no. 102694, doi: 10.1016/j.tre.2022.102694.
[10] E. Delage, S. Arroyo, and Y. Ye, “The value of stochastic modeling in two-stage stochastic programs with cost uncertainty,” Oper. Res., vol. 62, no. 6, pp. 1377–1393, Nov./Dec. 2014, doi: 10.1287/opre.2014.1318.
[11] C. M. Bishop, “Mixture density networks,” Aston University, Birmingham, U.K., Tech. Rep. NCRG/94/004, 1994.
[12] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, arXiv:1412.3555.
[13] D. Ormoneit and V. Tresp, “Improved Gaussian mixture density estimates using Bayesian penalty terms and network averaging,” in Proc. 8th Int. Conf. Neural Inf. Process. Syst., Nov. 1995, vol. 95, pp. 542–548.
[14] R. L. Rardin, Optimization in Operations Research, vol. 166. Upper Saddle River, NJ, USA: Prentice-Hall, 1998.
[15] S. Kim, R. Pasupathy, and S. G. Henderson, “A guide to sample average approximation,” in Handbook of Simulation Optimization, M. Fu, Ed. New York, NY, USA: Springer Science & Business Media, 2015, pp. 207–243.
[16] T. Oda and C. Joe-Wong, “MOVI: A model-free approach to dynamic fleet management,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), 2018, pp. 2708–2716, doi: 10.1109/INFOCOM.2018.8485988.
[17] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 785–794, doi: 10.1145/2939672.2939785.
[18] E. Brown, “The ride-hail utopia that got stuck in traffic,” Wall Street J., Feb. 2020. [Online]. Available: https://www.wsj.com/articles/the-ride-hail-utopia-that-got-stuck-in-traffic-11581742802

Edge Processing
A LoRa-Based LCDT System for Smart Building With Energy and Delay Constraints
by B Shilpa, Hari Prabhat Gupta, and Rajesh Kumar Jha
Date of current version: 17 July 2023

A smart building is an emerging technology that has the potential to be used in a variety of ubiquitous computing applications. The majority of existing work for smart building monitoring consumes a significant amount of energy to communicate the sensory data from the building to the end users (EUs). This work presents a low-cost data transmission (LCDT) system for a smart building in the context of a noisy environment. The system uses the

long-range (LoRa) communication protocol to conserve energy and enable long-distance communication. The smart building sensors generate data in the form of a multivariate time series (MTS). The system compresses such an MTS before transmission by utilizing deep learning (DL) techniques. A channel to reduce the transmission noise of sensory data is also designed using the DL method. The system decompresses the received data at the receiver end and obtains the original MTS. Additionally, we also conducted experiments to demonstrate the utility of the system. The experimental results demonstrate that selecting a finite number of distinct edge device (ED) types aids in developing an LCDT system subject to energy and latency constraints.

Overview
The smart building consists of various types of sensor nodes (SNs) for gathering, processing, and communicating the surrounding environment information to the users [1], [2]. An SN has sensing, communication, processing, and power units. Examples of sensing units are temperature, light, and humidity sensors. The sensors in smart buildings generate huge data in the form of an MTS, which contains significant information that must be mined to enable timely responses and better decision making. The components of an MTS are the data of different sensors with a given sampling rate.

The communication unit of SNs in smart buildings commonly uses Zigbee, Bluetooth, Wi-Fi, and other 2.4-GHz technologies [3]. Such technologies support short-range communication and, therefore, have scalability issues. Communicating the information using such technologies increases the cost of multihop devices. The scalability issue motivates the use of promising wireless solutions capable of simultaneously supporting many nodes and long-range communication. Low-power wide-area networks (LPWANs) have evolved as the leading connection option for smart applications requiring extended range, high energy efficiency, and low cost.

An LPWAN protocol that is built on LoRa technology is specified by an open standard known as the Long-Range Wide-Area Network (LoRaWAN). The primary advantage of LoRa is its scalability because the gateway modules in LoRa support concurrent communication of multiple SNs [4]. Another advantage of LoRa is low energy usage during the transmission of the data over a large distance. LoRa also provides tradeoffs among power consumption, communication range, and data rate. Despite the aforementioned advantages of LoRa, communicating a significant volume of the sensory data of smart buildings to the EUs is difficult because LoRa supports a low data rate. The transmission of a large amount of sensory data takes a huge amount of time and consumes a lot of energy. The pervasive usage of unlicensed frequency bands by a large number of LoRa nodes creates the issue of LoRa interference.

LoRa is commonly used for long-distance applications, although it also performs well in indoor applications. There are just a few studies [5], [6], [7], [8], [9] that assess LoRa in dense indoor networks. The existing work [2], [10], [11] proposed various solutions to solve the LoRa interference issue. The authors in [10] used multiple gateways to handle LoRa interference. The scheduling of nodes also reduces the interference by transmitting the data in a given period [2]. The effective use of LoRa network parameters, such as spreading factors, also helps to reduce the interference [11]. However, the use of multiple gateways increases the network's cost; scheduling nodes reduces gateway utility; and fixed spreading factors may consume high power. The employment of cutting-edge machine learning and DL algorithms is enhancing traditional communication systems. Several DL models for wireless communication systems were developed in the existing research works [12], [13], [14], [15], [16]. We intend to put such principles into practice for LoRa communication. The research studies [1], [17], [18] related to smart building are mainly focused on energy-efficient systems. As they have not taken into account the system's cost, the primary focus of this work is cost optimization.

Edge processing is a potential solution for communicating the smart building data with limited energy and delay. It minimizes the communication time and energy consumption for conveying sensory data by allowing tasks to be processed locally. The cost of such EDs may vary based on the specification of the devices. A dynamic compression ratio of sensory data for edge inference systems with strict deadlines was described in [19]. The authors in [20] proposed an adaptive data reduction method that uses compressive sampling to lower the bandwidth needed for sensory data transmission while minimizing the information loss.

In this work, we consider a smart building scenario, where several nodes generate the sensory data while sensing the environment and communicate those data to the EDs for further processing. The success of the scenario depends on the size of the data and the number of nodes. Large data size and multiple nodes give high accuracy with high energy consumption and communication delay. The smart building scenario works successfully for a long time if the acquired sensory data

are transferred in the given duration and the energy consumption of all of the deployed nodes is equal. The system cost of such a scenario may be reduced by using different types of EDs based on the requirement of the scenario. We address the following problem in this work: How does one design an LCDT system to transmit the huge size of sensory data of the smart building with given energy and delay constraints? To solve this problem, we present an LCDT system for smart buildings in a noisy environment. The solution uses DL techniques for the compression and effective transmission of sensory data. The system uses the LoRa communication protocol to transfer the compressed smart building data to the EUs. Along with this, the key contributions are as follows:
◆ We propose a compression–decompression approach called transmitter- and receiver-nets for lowering the amount of the sensory data at the ED. The approach employs deep neural network (DNN) architectures for compressing and decompressing the sensory data. The DNN designed for compressing the data is lightweight and can successfully run on low-processing EDs.
◆ We employ a mixture density network architecture for the channel-net [21] to reduce the noise effects between EDs and the LoRa gateway (LG). The channel-net works on EDs after reducing the size of the sensory data by using the proposed compression DNN architecture.
◆ We present the analysis of the delay and energy required for sensory data compression and communication. The analysis considers the different types of devices with unequal processing, energy, and storage capabilities.
◆ An optimization problem is formulated to minimize the cost and energy consumption of the data transmission system of the smart building. We also present a low-time-complexity algorithm to solve the optimization problem.
◆ Finally, the experimental results are presented to illustrate the solution's effectiveness. The experiment's parameters are defined based on the analysis of existing hardware to make it practical.

The LCDT System
The LCDT system architecture consists of SNs, EDs attached with a LoRa node, an LG, a network server (NS), an application server (AS), and EUs, as shown in Figure 1. The SNs attached with the smart building collect the sensory data in the form of the MTS and forward it to the ED. The ED is responsible for compressing the received MTS and transmitting it to the LG. The LG receives the compressed MTS and forwards the same to the NS. The compressed MTS is restored to the original form at the NS and forwarded to the AS. The AS identifies the data and forwards them to the respective user based on the application. Finally, the EU receives the information collected by the SNs.

[Figure 1 diagram: Smart Building With Sensor Nodes, Edge Device (LoRa Node) running the Transmitter-Net and Channel-Net, LoRa Gateway, Network Server and Application Server running the Receiver-Net, and End Users. Legend: LoRa Communication, Non-LoRa Communication.]
Figure 1. An illustration of the LCDT system components for smart building using LoRa. The transmitter-net and receiver-net are mirror-image DNNs.
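The mirror-image relationship between the transmitter-net and receiver-net can be made concrete with a toy forward pass. The sketch below is not the authors' architecture: the layer widths, the tanh activation, the omission of biases, and the random untrained weights are all assumptions chosen only to show how a row of Z samples shrinks at the ED and is expanded back by a decoder whose layer sizes are the encoder's in reverse.

```python
import math
import random


def make_layers(sizes, rng):
    """Build untrained dense layers (weights only) for consecutive sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        layers.append([[rng.gauss(0.0, scale) for _ in range(n_out)]
                       for _ in range(n_in)])
    return layers


def forward(row, layers):
    """Pass one MTS component (a list of floats) through the dense layers."""
    for index, weights in enumerate(layers):
        out = [sum(x * w_row[j] for x, w_row in zip(row, weights))
               for j in range(len(weights[0]))]
        # tanh on hidden layers only; the last layer stays linear (assumption).
        row = out if index == len(layers) - 1 else [math.tanh(v) for v in out]
    return row


rng = random.Random(0)
D, Z, Z_REDUCED = 3, 64, 16           # hypothetical MTS dimensions
encoder_sizes = [Z, 32, Z_REDUCED]    # hypothetical transmitter-net widths
decoder_sizes = encoder_sizes[::-1]   # receiver-net mirrors the encoder

transmitter_net = make_layers(encoder_sizes, rng)
receiver_net = make_layers(decoder_sizes, rng)

mts = [[rng.gauss(0.0, 1.0) for _ in range(Z)] for _ in range(D)]
compressed = [forward(row, transmitter_net) for row in mts]     # D x Z_REDUCED
restored = [forward(row, receiver_net) for row in compressed]   # D x Z
```

Because the weights are random, `restored` does not approximate `mts`; only the shapes are meaningful here. In a trained system the two nets would be fitted jointly so that the receiver-net output reconstructs the input.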

The system uses the DNN model for the compression of the MTS. The encoder and decoder of the DNN work for the compression and decompression of the MTS at the ED and NS, respectively. Since the ED is a very lightweight device compared to the NS, the delay and energy analysis of the system is considered only at the ED. The system uses the transmitter-net as an encoder for compressing the MTS and channel-net for handling the noise between the ED and LG. Both the transmitter-net and channel-net are DNNs and work at the ED. The system consists of a set of $\mathcal{N}$ EDs with $\mathcal{I}$ different types, where $\mathcal{N} = \{1, 2, \ldots, n\}$ and $\mathcal{I} = \{1, 2, \ldots, k\}$. The costs of the $i$ and $j$ types of EDs are denoted as $C_i$ and $C_j$, where $i, j \in \mathcal{I}$ and $C_i \ne C_j$. The total number of $i$th type EDs in the system is given by $X_i$. The parameters of an ED, such as energy $E_i$, processing speed $V_i$, and cost $C_i$, will differ based on the ED type $i$.

The Transmitter-Net
The transmitter-net is a DNN that runs on an ED for mapping the input MTS data $I \in \mathbb{R}^{D \times Z}$ to a reduced dimensional MTS $I' \in \mathbb{R}^{D \times Z'}$, where $D$, $Z$, and $Z'$ are the number of components of MTS; original size of MTS; and reduced size of MTS, respectively, and $Z' \le Z$. The DNN model consists of $L$ layers with $q$ neurons in each layer. Initially, $I$ is one-hot encoded, and the elements of the encoded vector are $\{I_1, I_2, \ldots, I_Z\}$. The one-hot-encoded vector $I_1$ is input to the first layer of the DNN. The neurons in the first layer receive input and perform simple computation with activation function $h$ and forward output to the next layer. The neurons in the next layer receive weighted input from the previous layer, perform the computation, and forward the output to the next layer. Likewise, the outputs of the $L$th and $(L-1)$th layers are given by

$$h_L = \sum_{j=1}^{L} f(W_j h_{j-1}), \qquad h_{L-1} = \sum_{j=1}^{L-1} \sum_{i=1}^{q} h_{ij} W_{ij} \left( h_{i,j-1} (W_i I_i + b_i) \right) \tag{1}$$

where $f$, $W_j$, $I$, and $b$ are the activation function, weight matrix of the $j$th layer, input, and bias, respectively. Due to hardware constraints, the output of the last layer is given to normalization, which transforms the data to satisfy the average power constraint or amplitude constraint. Lastly, the compressed data $I' \in \mathbb{R}^{D \times Z'}$ are transmitted to the LG via the channel-net.

sender to receiver [12]. The channel-net is designed as a mixture density network (MDN) with Gaussian components to simulate the conditional density of the channel output given its input [21]. The MDN is a concatenation of a DNN and a mixture model with parameters $\phi(I')$ as a function of the channel input $I'$. The DNN model of the channel-net consists of $L$ dense layers followed by a sampling layer.

The channel has a maximum fixed speed, denoted as the channel rate $c$, to process the received data. The conditional probability density $P(I'' \mid I')$ of a mixture model is given to the sampling layer to obtain the output $I''$. The conditional channel density modeled by the MDN is given by

$$P(I'' \mid I') = \sum_{i=1}^{k} \pi_i(I')\, \phi(I'' \mid I') \tag{2}$$

where $k$ is the number of mixture components, $\pi_i(I') \in [0, 1]$ is the mixing coefficient of component $i$, and $\phi(I'' \mid I')$ is the function representing the conditional densities of $I''$. The output of the channel-net, i.e., $I''$, is forwarded to the NS through the LG. The receiver-net at the NS decompresses and retrieves the original data $\hat{I}$.

Estimation of Cost, Delay, and Energy Consumption of the System
The cost of the LCDT system is determined by the number of EDs of each type utilized for the smart building. Let $X_i$ be the total number of the $i$th type ED in the system, $i \in \mathcal{I}$. The system cost is therefore

$$C_{\mathrm{sys}} = C_1 X_1 + C_2 X_2 + \cdots + C_k X_k. \tag{3}$$

The delay of the LCDT system depends on the time taken by the transmitter-net and channel-net. The delay of the transmitter-net is the estimated time to compress the MTS of SNs, i.e., the sum of the number of operations in the DNN of each ED. Let the SNs generate MTS with sampling rate $m$, which is processed by $k$ types of EDs. The delay of the $n$th type of ED with the transmitter-net of the $L$-layered DNN is given by

$$T_n^{\mathrm{comp}} = \sum_{i=1}^{k} \sum_{j=1}^{L} \frac{m\, q_j\, (2 I_a + 1)\, h_q}{V_i X_i}. \tag{4}$$

The estimated delay in the channel-net is given by
k L
T chan
n = | | cq j (2I al + 1) h q Vi X i . (5)
The Channel-Net i=1 j=1
The channel-net learns resilient representations of the

input data that can be retrieved with a low likelihood of The total delay of the system is the sum of the delays of the
errors despite channel conditions translating from transmitter-net and channel-net, which is given by

$$T_n = T_n^{\mathrm{comp}} + T_n^{\mathrm{chan}}. \tag{6}$$

Let $E_i^{o}$ be the energy consumption per operation of the $i$th type ED; then, the total energy consumption of the system is given by

$$E_n = (T_n^{\mathrm{comp}} \times E_i^{o}) + (T_n^{\mathrm{chan}} \times E_i^{o}). \tag{7}$$

The Optimization Problem of the LCDT System
This work aims to design a low-cost system for the transmission of sensory data of the smart building with given energy and delay constraints. The optimization problem of the LCDT system is defined as

The LCDT Problem

$$\min C_{\mathrm{sys}} \tag{8a}$$
$$\text{subject to Constraint 1: } E_n \le E_{\mathrm{th}} \tag{8b}$$
$$\text{Constraint 2: } T_n \le T_{\mathrm{th}}. \tag{8c}$$

The objective function of the LCDT problem is to determine the number of the various types of devices necessary to achieve the lowest system cost. Constraint 1 indicates that the energy consumption of the system should be below the threshold $E_{\mathrm{th}}$. It helps to prolong the life of the system. Constraint 2 ensures that the delay of the system for receiving the data at the NS should not exceed the threshold $T_{\mathrm{th}}$. The thresholds $E_{\mathrm{th}}$ and $T_{\mathrm{th}}$ are given by the user based on the application of the system.
To solve the LCDT problem, Algorithm 1 computes the required number of different types of EDs with given energy and delay constraints. We start by fixing the maximum number of EDs of each type, $n_i$, to $n_{\max}$ for $1 \le i \le k$. We consider the scenario described in the “The LCDT System” section. Algorithm 1 takes $C_i$, $E_i^{o}$, $V_i$, $E_{\mathrm{th}}$, $T_{\mathrm{th}}$, and $n_{\max}$ as inputs, where $1 \le i \le k$. It then iteratively computes $E_n$ for the $n$th type of ED by using (7), where $1 \le n \le k$. If constraint 1 is satisfied, i.e., $E_n \le E_{\mathrm{th}}$, the algorithm checks constraint 2, i.e., $T_n \le T_{\mathrm{th}}$, by using (6). Algorithm 1 returns the number of EDs of each type that satisfies both constraints. Finally, Algorithm 1 calculates the cost of the system with the selected number of EDs of each type and returns the number of EDs that gives the minimum cost.

Algorithm 1: The Solution of the LCDT Problem
Input: C_i, E_i^o, V_i, E_th, T_th, n_max
Output: q_1, ..., q_k
    for int X_1 <- 1 to n_max do
        ...
        for int X_k <- 1 to n_max do
            if E_n <= E_th and T_n <= T_th then
                {q_1, q_2, ..., q_k} = Insert(X_1, X_2, ..., X_k)
    return q_1, q_2, ..., q_k;
    Function Insert(X_1, X_2, ..., X_k)
    begin
        Compute Cost = C_1 X_1 + C_2 X_2 + ... + C_k X_k;
        if Cost < C_sys then q_1 = X_1, q_2 = X_2, ..., q_k = X_k and C_sys = Cost;
        return q_1, q_2, ..., q_k;
    end

The time complexity of the proposed algorithm is as follows: There are $1 + k$ for loops in the function Insert, resulting in a time complexity of $O(k \times n_{\max} \times f_t)$, where $f_t$ is the time complexity of the function Insert. The function Compute Cost has a time complexity of $O(q) \times c$, where $q$ and $c$ are the number of times and the complexity of computing the cost, respectively. Thus, the computational complexity of the algorithm is $O(k \times n_{\max} \times q \times (f_t + c))$, which is in polynomial time.

Example
Consider an LCDT system with the maximum number of devices $n_{\max} = 10$ with two different types of EDs, i.e., $X_1$ and $X_2$. We fix the total number of instructions to be performed to 300. The cost, energy consumption, and processing speed of the different EDs are assumed to be in the ratios of 1:3, 1:4, and 10:1, respectively. The threshold values $E_{\mathrm{th}}$, $T_{\mathrm{th}}$, and $C_{\mathrm{th}}$ are set to 1,500, 300, and 5,000, respectively. Algorithm 1 is implemented to find the minimum cost of the system. Initially, the algorithm computes the energy consumption and delay for all of the combinations of EDs of different types. Next, it finds the list of combinations of EDs that satisfy the system constraints. Finally, the system’s cost is calculated for a given number of EDs of different types, and it selects the combination of EDs that gives the minimum cost. For the maximum number of 10 devices, the optimal cost found by Algorithm 1 is 11 with $X_1 = 5$ and $X_2 = 2$.

Discussion and Results
In this section, we illustrate the performance of the proposed system by using simulation results. The parameters considered for simulation are $X_1$ and $X_2$ types of EDs with cost, energy consumption, and processing speed in the ratios of 1:3, 1:4, and 10:1, respectively.
For example, the ratio of parameters selected by a market analysis considers the type 1 ED as Arduino and the type 2 ED as Raspberry Pi. The cost of the Raspberry Pi is three times the cost of the Arduino; the energy consumption is four times higher; and the processing speed is 10 times that of the Arduino. The threshold values $E_{\mathrm{th}}$, $T_{\mathrm{th}}$, and $C_{\mathrm{th}}$ are unit free and set initially to 1,500, 300, and 5,000, respectively. These threshold values may be varied depending on the scenario of the application.
Figure 2(a) illustrates the impact of the number of instructions to be performed on the system cost. It shows that an increasing number of instructions increases the

system cost. This is because the system uses a greater number of devices for performing an increased number of instructions in the given delay and energy thresholds. Figure 2(a) also shows the impact of delay on the system cost. We can minimize the system cost by increasing the delay threshold for an increased number of instructions. The delay threshold $T_{\mathrm{th}}$ is varied from 300 to 1,000. These values can be adjusted based on the requirements of the use case. The results show that the cost of the system depends on the delay threshold, number, and cost of different types of devices.
Figure 2(b) and (c) demonstrate the impact of the number of instructions on the number of devices. Figure 2(b) shows that the number of type 1 devices increases with respect to the number of instructions to be performed. It is also observed that by increasing the delay threshold, we can minimize the number of devices, which, in turn, minimizes the cost of the system. The impact of the number of instructions on type 2 devices is shown in Figure 2(c). Type 2 devices also increase in number with respect to the number of instructions, but we can see a very minimal increase compared to type 1 devices. This is because the cost of type 2 devices is higher than that of type 1 devices, so the system considers fewer type 2 devices to minimize the system cost.

Conclusion and Future Work
In this article, an LCDT method for smart building data is proposed. Compression–decompression models based on DL estimate the energy and communication delay for
[Figure 2: three panels plotted against the number of instructions (100 to 1,000): (a) cost of the system, (b) number of type 1 devices, (c) number of type 2 devices. Curves are shown for Tth = 300, 500, 700, and 1,000.]
Figure 2. An illustration of the effect of the number of instructions on the system cost and the required number of devices with different delay thresholds. (a) The cost of the system. (b) The required number of type 1 devices. (c) The required number of type 2 devices.
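The search behind the curves in Figure 2 can be illustrated with a minimal sketch of the exhaustive enumeration that Algorithm 1 describes: try every combination of device counts up to n_max, keep the combinations that meet the energy and delay thresholds, and return the cheapest one using the system cost of (3). The function names `min_cost_devices` and `make_toy_constraints` are hypothetical, and the energy and delay models inside `make_toy_constraints` are simplifying assumptions chosen only for illustration (work shared in proportion to device speed); they are not the expressions in (4) to (7), so the numbers below are not expected to match the article's example.

```python
from itertools import product


def min_cost_devices(costs, feasible, n_max):
    """Enumerate device counts 1..n_max per type and return the cheapest
    feasible combination as (cost, counts), or (None, None) if none exists."""
    best_cost, best_counts = None, None
    for counts in product(range(1, n_max + 1), repeat=len(costs)):
        if not feasible(counts):
            continue  # violates the energy or delay threshold
        # System cost as in (3): C_sys = C_1 X_1 + ... + C_k X_k
        cost = sum(c * x for c, x in zip(costs, counts))
        if best_cost is None or cost < best_cost:
            best_cost, best_counts = cost, counts
    return best_cost, best_counts


def make_toy_constraints(instructions, speeds, energy_per_op, e_th, t_th):
    """Build a feasibility check from an ASSUMED toy model (not the
    article's equations): devices share the work in proportion to speed."""

    def feasible(counts):
        capacity = sum(v * x for v, x in zip(speeds, counts))
        delay = instructions / capacity
        energy = sum(
            instructions * (v * x / capacity) * e
            for v, x, e in zip(speeds, counts, energy_per_op)
        )
        return energy <= e_th and delay <= t_th

    return feasible


# Two device types with cost ratio 1:3; type 2 is assumed 10x faster and
# 4x more energy hungry per operation. Thresholds are illustrative values.
toy = make_toy_constraints(300, (1, 10), (1, 4), e_th=1500, t_th=10)
print(min_cost_devices((1, 3), toy, n_max=10))  # (10, (1, 3))
```

Under this toy model a tight delay threshold pushes the search toward the faster, more expensive device type; relaxing `t_th` lets cheaper combinations become feasible, which is the qualitative behavior reported for Figure 2(a).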

sensory data. A novel approach to implementing a communication channel entails creating a DL model to minimize transmission error. The best combination of EDs needed to build an LCDT system is determined using an algorithm that is described. The experimental findings demonstrate that the system’s cost, which is constrained by energy and latency considerations, can be decreased by using a fixed number of distinct EDs. Future research directions include expanding the analysis to take into account various performance-enhancing characteristics. The development of a channel-net with higher performance is highlighted as a next step in reducing interference and transmission errors in LoRa communication.

About the Authors
B Shilpa (b.shilpa@ifheindia.org) is a research scholar with the Department of Electronics and Communication Engineering, Faculty of Science and Technology, IFHE, Hyderabad 501203, India. Her research interests include wireless communication, wireless sensor networks, and the Internet of Things.
Hari Prabhat Gupta (hariprabhat.cse@iitbhu.ac.in) is an assistant professor in the Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India. His research interests include wireless sensor networks, distributed algorithms, and the Internet of Things.
Rajesh Kumar Jha (rajeshjha@ifheindia.org) is an assistant professor in the Department of Electronics and Communication Engineering, Faculty of Science and Technology, IFHE, Hyderabad 501203, India. His research interests include very large scale integration and the Internet of Things.

References
[1] B. Qolomany et al., “Leveraging machine learning and big data for smart buildings: A comprehensive survey,” IEEE Access, vol. 7, pp. 90,316–90,356, Jul. 2019, doi: 10.1109/ACCESS.2019.2926642.
[2] P. Kumari, H. P. Gupta, and T. Dutta, “A nodes scheduling approach for effective use of gateway in dense LoRa networks,” in Proc. ICC IEEE Int. Conf. Commun. (ICC), 2020, pp. 1–6, doi: 10.1109/ICC40277.2020.9149006.
[3] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A survey on enabling technologies, protocols, and applications,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 2347–2376, 4th quarter 2015, doi: 10.1109/COMST.2015.2444095.
[4] J. C. Liando, A. Gamage, A. W. Tengourtius, and M. Li, “Known and unknown facts of LoRa: Experiences from a large-scale measurement study,” ACM Trans. Sens. Netw., vol. 15, no. 2, pp. 1–35, May 2019, doi: 10.1145/3293534.
[5] E. D. Ayele, C. Hakkenberg, J. P. Meijers, K. Zhang, N. Meratnia, and P. J. M. Havinga, “Performance analysis of LoRa radio for an indoor IoT applications,” in Proc. Int. Conf. Internet Things Global Commun. (IoTGC), 2017, pp. 1–8, doi: 10.1109/IoTGC.2017.8008973.
[6] W. Xu, J. Y. Kim, W. Huang, S. S. Kanhere, S. K. Jha, and W. Hu, “Measurement, characterization, and modeling of LoRa technology in multifloor buildings,” IEEE Internet Things J., vol. 7, no. 1, pp. 298–310, Jan. 2020, doi: 10.1109/JIOT.2019.2946900.
[7] J. Petäjäjärvi, K. Mikhaylov, M. Hämäläinen, and J. H. Iinatti, “Evaluation of LoRa LPWAN technology for remote health and wellbeing monitoring,” in Proc. 10th Int. Symp. Med. Inf. Commun. Technol. (ISMICT), 2016, pp. 1–5, doi: 10.1109/ISMICT.2016.7498898.
[8] J. Haxhibeqiri, A. Karaagac, F. V. D. Abeele, W. Joseph, I. Moerman, and J. Hoebeke, “LoRa indoor coverage and performance in an industrial environment: Case study,” in Proc. 22nd IEEE Int. Conf. Emerg. Technol. Factory Automat. (ETFA), 2017, pp. 1–8, doi: 10.1109/ETFA.2017.8247601.
[9] L. Gregora, L. Vojtech, and M. Neruda, “Indoor signal propagation of LoRa technology,” in Proc. 17th Int. Conf. Mechatronics - Mechatronika (ME), 2016, pp. 1–4.
[10] D. Croce, M. Gucciardo, S. Mangione, G. Santaromita, and I. Tinnirello, “LoRa technology demystified: From link behavior to cell-level performance,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 822–834, Feb. 2020, doi: 10.1109/TWC.2019.2948872.
[11] P. Kumari, H. P. Gupta, and T. Dutta, “Estimation of time duration for using the allocated LoRa spreading factor: A game-theory approach,” IEEE Trans. Veh. Technol., vol. 69, no. 10, pp. 11,090–11,098, Oct. 2020, doi: 10.1109/TVT.2020.3007566.
[12] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017, doi: 10.1109/TCCN.2017.2758370.
[13] T. J. O’Shea, K. Karra, and T. C. Clancy, “Learning to communicate: Channel auto-encoders, domain specific regularizers, and attention,” in Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), 2016, pp. 223–228, doi: 10.1109/ISSPIT.2016.7886039.
[14] H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANS as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020, doi: 10.1109/TWC.2020.2970707.
[15] S. Dörner, S. Cammerer, J. Hoydis, and S. t. Brink, “Deep learning based communication over the air,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018, doi: 10.1109/JSTSP.2017.2784180.
[16] D. Wu, M. Nekovee, and Y. Wang, “Deep learning-based autoencoder for m-user wireless interference channel physical layer design,” IEEE Access, vol. 8, pp. 174,679–174,691, Sep. 2020, doi: 10.1109/ACCESS.2020.3025597.
[17] I. Sülo, S. R. Keskin, G. Dogan, and T. Brown, “Energy efficient smart buildings: LSTM neural networks for time series prediction,” in Proc. Int. Conf. Deep Learn. Mach. Learn. Emerg. Appl. (Deep-ML), 2019, pp. 18–22, doi: 10.1109/Deep-ML.2019.00012.
[18] I. Abdennadher, N. Khabou, I. B. Rodriguez, and M. Jmaiel, “Designing energy efficient smart buildings in ubiquitous environments,” in Proc. 15th Int. Conf. Intell. Syst. Design. Appl. (ISDA), 2015, pp. 122–127, doi: 10.1109/ISDA.2015.7489212.
[19] X. Huang and S. Zhou, “Dynamic compression ratio selection for edge inference systems with hard deadlines,” IEEE Internet Things J., vol. 7, no. 9, pp. 8800–8810, Sep. 2020, doi: 10.1109/JIOT.2020.2997128.
[20] S. Tripathi and S. De, “An efficient data characterization and reduction scheme for smart metering infrastructure,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4300–4308, Oct. 2018, doi: 10.1109/TII.2018.2799855.
[21] D. García Martí, J. Palacios Beltrán, J. O. Lacruz, and J. Widmer, “A mixture density channel model for deep learning-based wireless physical layer design,” in Proc. 23rd Int. ACM Conf. Model., Anal. Simul. Wireless Mobile Syst. (MSWiM), 2020, pp. 53–62, doi: 10.1145/3416010.3423229.

Conference Reports
The 19th IEEE International Conference on Networking, Sensing, and Control
by Qi Kang and Shuaiyu Yao
Date of current version: 17 July 2023

The 19th IEEE International Conference on Networking, Sensing, and Control (ICNSC 2022) was held between 15 and 18 December 2022 in Shanghai, China. ICNSC 2022 was hosted by the IEEE Systems, Man, and Cybernetics Society; Tongji University (China); Fudan University (China); and Shanghai Association for System Simulation (China). It was supported by the K.C. Wong Education Foundation, Hong Kong, China.
The theme of this conference was “autonomous intelligent systems,” focusing on intelligent control, machine learning, deep learning, network communication, multiagent systems, Internet of Things, and swarm intelligence. Following this theme, the conference provided a platform for both academic researchers and industrial practitioners involved in different but related domains to discuss key problems, exchange ideas, and tackle emerging challenges, while sharing innovative solutions and looking into future research prospects.
The conference was held in a hybrid format with online and in-person attendance. A total of 211 papers were submitted to the conference, out of which 144 were selected based on a rigorous single-blind peer review for oral presentations. This indicates a paper acceptance rate of approximately 68.2%. The accepted papers have been included in Proceedings of 2022 IEEE International Conference on Networking, Sensing, and Control, which has now been published in IEEE Xplore and indexed by Engineering Index COMPENDEX. Notably, the authors hailed from various countries, including China, the United States, Japan, Canada, France, Italy, and The Netherlands. ICNSC 2022 was successfully held as a multinational and multidisciplinary conference that provided scientists, engineers, and students with a platform to convene and discuss their shared interests (Figures 1 and 2), thanks to the collaborative efforts of the organizing, program, and steering committees; the authors who submitted exceptional papers; and the reviewers who examined the papers and provided many insightful comments.
The program agenda of the conference encompassed various technical activities, including a plenary session,
four keynote speeches, a best paper award session, and 28 parallel panel sessions that featured eight special sessions. The plenary session kicked off with opening remarks delivered by Prof. Xiaohua Tong, vice president of Tongji University (Figure 3); Prof. Mengchu Zhou, chair of the ICNSC Steering Committee (Figure 4); and Prof. Qi Kang, general chair of ICNSC 2022 (Figure 5). The conference featured keynote speeches from renowned experts (shown in the following paragraphs), whose thought-provoking ideas set the tone for the event. These speakers captivated the audience with their visionary outlook and provided inspiring insights into the future of networked systems and control.

Figure 1. Some attendees of ICNSC 2022.
Figure 2. The conference site of ICNSC 2022.
Figure 3. The opening remarks delivered by Prof. Xiaohua Tong, vice president of Tongji University.
Figure 4. The opening remarks delivered by Prof. Mengchu Zhou, chair of the ICNSC Steering Committee.

1) Prof. Peng Shi, editor-in-chief of IEEE Transactions on Cybernetics, who is from the University of Adelaide, Australia, gave a presentation titled “Consensus and Formation Control for Multi-agent Systems.” Prof. Shi’s presentation focused on multiagent systems (MASs) and highlighted their key features of communication, coordination, and collaboration for achieving common goals effectively and efficiently. The presentation covered three main topics: consensus, flocking/swarming, and formation control within MASs. Consensus, a fundamental problem in MASs, was explored as a requirement for cooperation among agents. Flocking, a self-organizing behavior inspired by lower-intelligence animals, enables

the emergence of swarm intelligence to enhance system survivability and competitiveness. Additionally, formation control aims to drive agents toward desired scalable and adaptable formations. Prof. Shi’s presentation covered modeling, analysis, design, simulations, and experimental examples to showcase the potential of distributed schemes in achieving consensus and formation control (Figure 6).

Figure 5. The opening remarks delivered by Prof. Qi Kang, general chair of ICNSC 2022.
Figure 6. The keynote speech provided by Prof. Peng Shi.
Figure 7. The keynote speech provided by Prof. Ke Tang.

2) Prof. Ke Tang from Southern University of Science and Technology, China, introduced “Learn to Optimize” to the audience of the conference. Prof. Tang’s speech focused on the automation of algorithm design to address complex real-world optimization problems. Off-the-shelf algorithms and tools are inadequate for these problems, requiring extensive prior knowledge and manual algorithm design efforts. The concept of learn to optimize (L2O), a data-driven approach for automated algorithm and solver design, was introduced. The speech discussed the building blocks and recent advancements in L2O, along with successful case studies. Future directions in this field were also presented (Figure 7).
3) Prof. Zhi Wei from the New Jersey Institute of Technology, USA, provided a talk titled “Deep Autoencoders for Analysis of Single-Cell RNA Sequencing Data.” Prof. Wei’s talk focused on clustering analysis, specifically in the context of single-cell RNA sequencing (scRNA-seq) studies. Traditional clustering methods often overlook the unique characteristics of scRNA-seq data and fail to utilize prior information or filter out irrelevant genes during the clustering process. To overcome these limitations, Prof. Wei proposed the use of model-based deep autoencoders. These novel methods aim to address the identified

issues and enhance clustering performance. Through extensive experiments on both simulated and real datasets, the proposed methods demonstrate a significant improvement in clustering performance, leading to the generation of biologically meaningful clusters (Figure 8).
4) Prof. Tadahiko Murata from Kansai University, Japan, delivered a presentation titled “Synthetic Societal Data (Synthetic Population + Basic Behavioral Data).” Prof. Murata’s presentation focused on real-scale social simulations for specific communities such as cities, towns, and villages. With the COVID-19 pandemic, researchers are developing social simulations for countermeasures against the virus. To develop such simulations, synthetic populations have been synthesized based on publicly released statistics without containing any private information. Prof. Murata’s research outcome enables the generation of synthetic societal data, which include household compositions and basic behavioral data, facilitating the development of real-scale social simulations for emergency and peaceful times (Figure 9).

Figure 8. The keynote speech provided by Prof. Zhi Wei.
Figure 9. The keynote speech provided by Prof. Tadahiko Murata.
Figure 10. The offline parallel sessions.

The parallel sessions allowed researchers to delve into specific subtopics, fostering focused discussions on areas such as autonomous agents and multiagent systems, continual learning, cyberphysical systems, edge computing, heterogeneous wireless networks, Internet of Things, networked control systems, smart civil aviation and

aerospace, swarm intelligence, and transfer learning. The researchers presented their latest discoveries and breakthroughs, sparking intense debates and encouraging the exchange of different perspectives (Figures 10 and 11).

Figure 11. The online parallel sessions.

In addition, ICNSC 2022 was composed of eight special sessions that addressed a diverse range of topics, including
◆ Modeling, analysis, and control of resource allocation systems
◆ A connected and autonomous mobility system for energy and environmental sustainability
◆ Artificial intelligence for IT operations
◆ Deep learning and optimization for distributed industrial systems
◆ An evolutionary algorithm for big data applications
◆ Data-driven estimation in industrial scenarios
◆ Latent representation learning for incomplete big data
◆ Transfer perception and control in real robotic applications.
The discussions of these sessions brought together experts and attendees to tackle challenging issues and address the emerging trends in the field. These sessions facilitated dynamic conversations, where ideas were rigorously examined, and diverse viewpoints were respectfully debated. Attendees eagerly explored interesting topics, engaging in deep conversations, sharing feedback, and exploring potential collaborations. These interactions not only enriched the knowledge of the attendees but also nurtured a sense of community and camaraderie.
After a series of oral presentation competitions, a total of five papers were chosen from the pool of candidate papers to receive the prestigious accolades and best paper awards of ICNSC 2022. Specifically, these awards included two best conference paper awards, two best student paper awards, and one best emerging technology paper award. The winners of the best paper awards are listed as follows:
1) The winners of the best conference paper awards:
◆ “Heuristic scheduling method of flexible manufacturing based on Petri nets and artificial potential field” by Sijia Yi et al.
◆ “Open the black box of recurrent neural network by decoding the internal dynamics” by Jiacheng Tang et al.
2) The winners of the best student paper awards:
◆ “Design and implementation of autonomous mapping system for UGV based on lidar” by Xiaohong Xu et al.
◆ “Detection transformer: Ultrasonic echo signal inclusion detection with transformer” by Xiaoxin Fang et al.
3) The winner of the best emerging technology paper award:
◆ “Design of resilient supervisory control for autonomous connected vehicles approaching unsignalized intersection in presence of communication delays” by Carlo Motta et al.
The attendees of ICNSC 2022 experienced a vibrant and intellectually stimulating environment, fostering lively and in-depth exchanges and discussions. The conference attracted a diverse group of academicians, researchers, industry experts, and students from around the globe, and all of them were eager to share their knowledge and insights in the field of networking, sensing, and control. Throughout the conference, the participants actively engaged in various sessions and presentations, each offering unique perspectives and cutting-edge research findings. The atmosphere was characterized by a palpable enthusiasm and a genuine passion for advancing the field of networking, sensing, and control. In addition, the social events, including receptions, banquets, and networking breaks, offered valuable opportunities for participants to forge new connections, foster collaborations, and establish lasting friendships. In these informal settings, participants engaged in lively conversations, sharing their experiences, exchanging ideas, and exploring potential joint projects. For more information about ICNSC 2022, including details about the conference program, keynote speeches, and special sessions, please visit the official website of the conference: http://www.icnisc2022.com/. The upcoming conference, ICNSC 2023, will take place in the captivating city of Marseille, France, which is renowned for its rich cultural heritage, breathtaking landscapes, and vibrant atmosphere.

The 1st IEEE International Summer School on
E-CARGO and Applications
(Online)
July 16-21, 2023
http://www.e-cargoschool.com/
Sponsors:
• IEEE Systems, Man, and Cybernetics Society
Organizer:
• Technical Committee of Distributed Intelligent Systems
Co-Organizers:
• Technical Committee of Computer-Supported Cooperative Work in Design
• Guangdong Chapter
• Nipissing University, Canada
Acknowledgement:
• Jinling Institute of Technology, China
Goal:
The Environments-Classes, Agents, Roles, Groups, and Objects (E-CARGO) model is an abstract model for
complex systems. It has been successfully applied in different applications. It has numerous potentials to promote
investigations into academic and industry problems. It fits the SMCS requirements of initiatives.
Role Based Collaboration (RBC) and its E-CARGO model have been developed into a powerful tool for
investigating collaboration and complex systems. Related research has brought and will bring in exciting
improvements to the development, evaluation, and management of systems including collaboration, services, clouds,
productions, and administration systems.
E-CARGO assists scientists and engineers in formalizing abstract problems, which originally are taken as
complex problems, and finally points out solutions to such problems including programming. The E-CARGO model
possesses all the preferred properties of a computational model. It has been verified by formalizing and solving
significant problems in collaboration and complex systems, e.g., Group Role Assignment (GRA). With the help of
E-CARGO, the methodology of RBC can be applied to solve various real-world problems. E-CARGO itself can be
extended to formalize abstract problems as innovative investigations in research. On the other hand, the details of
each E-CARGO component are still open for renovations for specific fields to make the model easily applied. For
example, in programming, we need to specify the primitive elements for each component of E-CARGO. When these
primitive elements are well-specified, a new type of modeling or programming language can be developed and
applied to solve general problems with software design and implementations.
This summer school will extend the applications of E-CARGO and RBC, which promote problem solving for
complex systems that are considered in SMCS, such as Cybernetics, Systems Science and Engineering, Human-
Machine Systems, and Computational Social Systems.
Motivation:
In the field of Systems, Man, and Cybernetics (SMC), many researchers require solid tools to develop their
methodologies or solutions to their specific problems in their specific areas. There are many traditional tools for
specific areas, such as object or agent models, deep learning, evolutionary computation, or evolutionary
optimizations. However, these methodologies and models have their own limitations. Researchers are eager to have
high-level, abstract, but expressive models and methodologies to guide them in understanding the requirements of
their specific problems, which are usually very complex. It is very hard for them to grasp the key elements to
analyze their problems, specify the requirement, and design a feasible solution.
E-CARGO is a novel model that meets this need. With E-CARGO, researchers
gain a tool for investigating a problem along an easy-to-follow route, gradually delving into the details
of the system or problem that concerns them most. Such a tool helps them understand their problems or
systems in an adaptive and incremental way.
In the summer school, we will present, through lectures and labs, many success stories and case studies
for researchers to learn from, follow, and practice.
The SMC Society encourages interdisciplinary research and innovation and is a reputable technology
incubator. It is the SMC Society that has enabled E-CARGO to develop, expand, and mature.

Attendees:
This school is open to anyone with some familiarity with abstract mathematical structures who wants to
learn about the E-CARGO model and RBC theory. Our goal is to make the E-CARGO/RBC theory accessible to,
and inclusive of, everyone who is interested. We believe that E-CARGO is for everyone, and we are committed to
fostering a kind, inclusive environment. In our experience, fourth-year undergraduates, graduate students
(master's and PhD), and early-career researchers and practitioners in STEM fields are the best fit.
Registration:
Registration includes:
1) Five days (10 sessions) of online participation in the summer school program.
2) A certificate for registered attendees who attend at least 7 sessions.
3) An author-signed hardcopy book for the top 10 students, and a hardcopy book for the students ranked 11-50 by
performance (value: $170, including shipping): H. Zhu, E-CARGO and Role-Based Collaboration: Modeling
and Solving Problems in the Complex World, Wiley-IEEE Press, NJ, USA, Dec. 2021.
Note: We will also send out more books (51-?) as the budget allows. The criterion is registration time, i.e., first
come, first served.
IEEE SMC student member: $50 CAD
IEEE SMC member: $50 CAD
IEEE student member: $85 CAD
IEEE member: $120 CAD
Non-IEEE student: $120 CAD
Non-IEEE member: $190 CAD
Organization Committee:
General Chair:
Haibin Zhu, Nipissing University, Canada
Program Co-Chairs:
Dongning Liu, Guangdong University of Technology, China
Yin Sheng, Hohai University, China
Registration Co-Chairs:
Xianjun Zhu, Jinling Institute of Technology, China
Publicity Co-Chairs:
Hua Ma, Hunan Normal University, China
Libo Zhang, Southwest University, China
Instructors:
Haibin Zhu, Nipissing University, Canada
Dongning Liu, Guangdong University of Technology, China
Yin Sheng, Hohai University, China
Lab Instructor:
Qian Jiang, Macau University of Science and Technology, China
Secretary:
Chengyu Peng, Laurentian University, Canada
Contact: cpeng@laurentian.ca
Confirmed Panelists:
Sam Kwong, IEEE Fellow, Chair Professor, City University of Hong Kong, President, IEEE SMC Society
Mariagrazia Dotoli, Professor, Politecnico di Bari, Vice President – Membership & Student Activities, IEEE SMC
Society
Ljiljana Trajkovic, IEEE Fellow, Professor, Simon Fraser University, EiC, IEEE Transactions on Human-Machine
Systems
Peng Shi, IEEE Fellow, Professor, University of Adelaide, EiC, IEEE Transactions on Cybernetics
Robert Kozma, IEEE Fellow, Professor, University of Memphis, EiC, IEEE Transactions on Systems, Man, and
Cybernetics: Systems
Weiming Shen, IEEE Fellow, Professor, Huazhong University of Science and Technology,

IEEE Systems, Man, and Cybernetics Magazine

Digital Object Identifier 10.1109/MSMC.2023.3280352

Volume 9, Number 3 • July 2023

2 UAVs-Enabled Maritime Communications

28 MDN-Enabled SO for Vehicle Proactive

ABOUT THE COVER

July 2023 IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE 1

by Muhammad Waseem Akhtar and Nasir Saeed

Digital Object Identifier 10.1109/MSMC.2022.3231415

2333-942X/23 © 2023 IEEE

UAV-Aided Maritime Communication

es in channel modeling. In many cases, the two-ray model is applied. The first component of the two-ray model is the line-of-sight (LoS) component, and the second is the surface-reflected ray component. When the transmission distance is very large, and the transmitter is

strong paths. However, a dispersion around the maritime

by Shuaiqi Liu, Siqi Wang, Hong Zhang,

Figure 1. The network structure of the P4D-ResNet model. max: maximum.

(x0, y0, z0, t0) position of the jth feature map of the ith

function, P4DC denotes the P4D-convolutional block, and

4D Maximum Pooling Layer

Data Enhancement and Model Training

0.8 4 70.36 67.45 72.77

Table 4. The classification effect of

7 OHSU 71.42 75 66.67

14 UCLA 70 77.78 63.63

Table 5. The performance of the

15 UM 69.53 81.82 62.78

17 Yale 80.95 76.19 85.71

by Hossam A. Gabbar, Abderrazak Chahid,

These solutions allow fast scan time with additional post-

                                                        Mean   Potential Individual   Potential Societal
Management                                              2.9    108.16                 104,287,872
Business, finance, and administrative                   3.8    85.15                  239,109,715
Health occupations                                      3.6    97.44                  97,790,784
Occupations in social science, education, government    3.7    112.51                 165,333,445
Occupations in art, culture, recreation, and sport      3.9    91.67                  33,212,041
