July 2023 IEEE Systems, Man, & Cybernetics Magazine


English Pages [52] Year 2023




IEEE Systems, Man, and Cybernetics Magazine EDITOR-IN-CHIEF Tingwen Huang Texas A&M University at Qatar, Doha, Qatar [email protected]

ASSOCIATE EDITORS Mali Abdollahian, Australia Mohammad Abdullah-Al-Wadud, Saudi Arabia Choon Ki Ahn, Korea Bernadetta Kwintiana Ane, India Krishna Busawon, UK György Eigner, Hungary Liping Fang, Canada Hossam Gaber, Canada Aurona Gerber, South Africa Jason Gu, Canada Abdollah Homaifar, USA Okyay Kaynak, Turkey Kevin Kelly, Ireland Kazuo Kiguchi, Japan Abbas Khosravi, Australia Vladik Kreinovich, USA Wei Lei, China Kovács Levente, Hungary Huaqing Li, China Jing Li, China Dongning Liu, China Agostino Marcello Mangini, Italy Darius Nahavandi, Australia Chris Nemeth, USA Vinod Prasad, Singapore Hong Qiao, China Ferat Sahin, USA Mehrdad Saif, Canada Claudio Savaglio, Italy Bahram Shafai, USA Yin Sheng, China Jinshan Tang, USA Liqiong Tang, New Zealand Ying Tan, Australia Jiacun Wang, USA Yingxu Wang, Canada Margot Weijnen, Netherlands Peter Whitehead, USA Zhao Xingming, China Laurence T. Yang, Canada

SOCIETY BOARD OF GOVERNORS Executive Committee Sam Kwong, President Imre Rudas, Jr. Past President Edward Tunstel, Sr. Past President Enrique Herrera Viedma, Vice President, Cybernetics Saeid Nahavandi, Vice President, Human–Machine Systems Thomas I. Strasser, Vice President, Systems Science and Engineering

Yo-Ping Huang, Vice President, Conferences and Meetings Karen Panetta, Vice President, Membership and Student Activities Okyay Kaynak, Vice President, Organization and Planning Shun-Feng Su, Vice President, Publications Ying (Gina) Tang, Vice President, Finance Vladik Kreinovich, Treasurer Tom Gedeon, Secretary Valeria Garai, Asst. Secretary Editors Peng Shi, EIC, IEEE Transactions on Cybernetics Robert Kozma, EIC, IEEE Transactions on Systems, Man, and Cybernetics: Systems Ljiljana Trajkovic, EIC, IEEE Transactions on Human–Machine Systems Bin Hu, EIC, IEEE Transactions on Computational Social Systems Tiago H. Falk, EIC, SMC E-Newsletter Industrial Liaison Committee Christopher Nemeth, Chair Sunil Bharitkar Michael Henshaw Yo-Ping Huang Azad Madni Rodney Roberts Organization and Planning Committee Vladimir Marik, Chair Enrique Herrera Viedma Mengchu Zhou Dimitar Filev Robert Woon Ferat Sahin Edward Tunstel Larry Hall Jay Wang Michael Smith C.L. Philip Chen Karen Panetta Publications Ethics Committee Shun-Feng Su, Chair Imre Rudas Edward Tunstel Vladik Kreinovich Peng Shi Fei-Yue Wang Robert Kozma Ljiljana Trajkovic Haibin Zhu History Committee Michael Smith

Membership and Student Activities Committee Karen Panetta, Chair György Eigner, Coordinator Christopher Nemeth Lance Fung Robert Kozma Roxanna Pakkar Saeid Nahavandi Okyay Kaynak Tadahiko Murata Ferial El-Hawary Paolo Fiorini Shun-Feng Su Virgil Adumitroaie Peng Shi Ashitey Trebi-Ollennu Hideyuki Takagi Standards Committee Loi Lei Lai, Chair (China) Chun Sing Lai, Vice Chair (UK) Wei-jen Lee (USA) Thomas Strasser (Austria) Dongxiao Wang (Australia) Chaochai Zhang (China) Haibin Zhu (Canada) Nominations Committee Imre Rudas, Chair C.L. Philip Chen Vladimir Marik Ljiljana Trajkovic Awards Committee Dimitar Filev, Chair Edward Tunstel Laurence Hall Ljiljana Trajkovic Peng Shi Michael H. Smith Vladik Kreinovich Fellows Evaluation Committee Edward Tunstel, Chair Mengchu Zhou, Vice Chair Liping Fang Maria Pia Fanti Vladimir Marik Germano Lambert-Torres Karen Panetta Ching-Chih Tsai Electronic Communications Subcommittee Saeid Nahavandi, Chair Syed Salaken, Web Editor Darius Nahavandi, Social Media Mariagrazia Dotoli Patrick Chan Haibin Zhu Ying (Gina) Tang

Ferat Sahin György Eigner Chapter Coordinators Subcommittee Lance Fung, Chair Enrique Herrera-Viedma Imre Rudas Adrian Stoica Maria Pia Fanti Karen Panetta Hideyuki Takagi Ching-Chih Tsai Student Activities Subcommittee Roxanna Pakkar, Chair Bryan Lara Tovar Piril Nergis JuanJuan Li X. Wang Young Professionals Subcommittee György Eigner, Chair Ronald Bock Sonia Sharma Xuan Chen Raul Roman Fernando Schramm

IEEE PERIODICALS MAGAZINES DEPARTMENT 445 Hoes Lane, Piscataway, NJ 08854 USA Peter Stavenick Journals Production Manager Katie Sullivan Senior Manager, Journals Production Janet Dudar Senior Art Director Gail A. Schnitzer Associate Art Director Theresa L. Smith Production Coordinator Mark David Director, Business Development— Media & Advertising Felicia Spagnoli Advertising Production Manager Peter M. Tuohy Production Director Kevin Lisankie Editorial Services Director Dawn M. Melley Staff Director, Publishing Operations

IEEE SYSTEMS, MAN, AND CYBERNETICS MAGAZINE (ISSN 2333-942X) is published quarterly by the Institute of Electrical and Electronics Engineers, Inc. Headquarters: 3 Park Avenue, 17th Floor, New York, NY 10016-5997 USA, Telephone: +1 212 419 7900. Responsibility for the content rests upon the authors and not upon the IEEE, the Society or its members. IEEE Service Center (for orders, subscriptions, address changes): 445 Hoes Lane, Piscataway, NJ 08855-1331 USA. Telephone: +1 732 981 0060. Subscription rates: Annual subscription rates included in IEEE Systems, Man, and Cybernetics Society member dues. Subscription rates available on request. Copyright and reprint permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright law for the private use of patrons 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles without a fee. For other copying, reprint, or republication permission, write Copyrights and Permissions Department, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. Copyright © 2023 by the Institute of Electrical and Electronics Engineers Inc. All rights reserved.

Digital Object Identifier 10.1109/MSMC.2023.3280352

IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

Smart Solutions for Technology

www.ieeesmc.org

Volume 9, Number 3 • July 2023

Features

UAVs-Enabled Maritime Communications: Opportunities and Challenges
By Muhammad Waseem Akhtar and Nasir Saeed

An ASD Classification Based on a Pseudo 4D ResNet
Utilizing Spatial and Temporal Convolution
By Shuaiqi Liu, Siqi Wang, Hong Zhang, Shui-Hua Wang, Jie Zhao, and Jingwen Yan

Tooth.AI
Intelligent Dental Disease Diagnosis and Treatment Support Using Semantic Network
By Hossam A. Gabbar, Abderrazak Chahid, Md. Jamiul Alam Khan, Oluwabukola Grace Adegboro, and Matthew Immanuel Samson

MDN-Enabled SO for Vehicle Proactive Guidance in Ride-Hailing Systems
Minimizing Travel Distance and Wait Time
By Xiaoming Li, Jie Gao, Chun Wang, Xiao Huang, and Yimin Nie

Edge Processing
A LoRa-Based LCDT System for Smart Building With Energy and Delay Constraints
By B Shilpa, Hari Prabhat Gupta, and Rajesh Kumar Jha

ABOUT THE COVER Functional magnetic resonance imaging display of the human brain. ©SHUTTERSTOCK/STEPAN KAPL

Departments & Columns

Conference Reports

Mission Statement The mission of the IEEE Systems, Man, and Cybernetics Society is to serve the interests of its members and the community at large by promoting the theory, practice, and interdisciplinary aspects of systems science and engineering, human–machine systems, and cybernetics. It is accomplished through conferences, publications, and other activities that contribute to the professional needs of its members. Digital Object Identifier 10.1109/MSMC.2023.3273049


UAVs-Enabled Maritime Communications: Opportunities and Challenges

by Muhammad Waseem Akhtar and Nasir Saeed

The next generation of wireless communication systems will integrate terrestrial and nonterrestrial networks, targeting coverage of underserved regions, especially those connected to marine activities. Unmanned aerial vehicle (UAV)-based connectivity solutions offer significant advances to support conventional terrestrial networks. However, the use of UAVs for maritime communication is still a largely unexplored area of research. Therefore, this article highlights different aspects of UAV-based maritime communication, including the basic architecture, various channel characteristics, and use cases. The article then discusses several open research problems, such as mobility management, trajectory optimization, interference management, and beam forming.

Digital Object Identifier 10.1109/MSMC.2022.3231415 Date of current version: 17 July 2023

Introduction

Seawater covers around 70% of planet Earth, and more than 90% of the world's products are moved by a commercial fleet of approximately 46,000 ships [1], [2], [3]. The world is experiencing an ever-growing marine economy, with continuous development in conventional sectors, such as fisheries and transportation, and new dimensions of maritime activity, such as tourism, oil and gas exploration, and weather monitoring. Most of these applications depend on a reliable and efficient maritime communication network. Existing maritime networks mainly comprise very high frequency (VHF) radios, whose bandwidth is too low, or satellite communication networks, whose cost is too high, to support the International Maritime Organization (IMO) eNavigation concept. However, emerging maritime networks need wideband, low-cost communication systems to achieve better security, surveillance, and coverage, as well as efficient working conditions for the onboard crew and passengers. Although wireless broadband access (WBA) can fulfill the IMO eNavigation requirement, the implementation of WBA technologies in maritime areas is questionable.

Typical marine networks comprise a mesh network of different entities in an integrated satellite–air–sea–ground network. A stand-alone satellite-based solution can cover a large area with high-speed data transmission. However, it suffers from unavoidably large propagation delays and expensive implementation costs. Alternatively, HF/VHF-based systems are simple to implement but have limited utilization, i.e., only in vessel identification, tracking/monitoring, and alerting.

Therefore, developing high-speed maritime networks is of great importance to improve the onboard user experience. As a result, maritime communications have garnered substantial interest in the recent past, where the primary purpose is to enhance the broadband network coverage for terrestrial users with the aid of UAVs that can serve as aerial base stations (BSs) and relays [4]. In this context, UAVs can play a vital role in maritime communications either as relays or flying sensors, gathering information in cheaper, safer, and faster ways. They can successfully perform complex tasks with less human involvement and at lower cost. UAVs in the maritime network have the potential to manage, control, and monitor maritime activities, including the identification of defects in ships so that issues can be diagnosed and resolved while the ships remain at sea, reducing maintenance costs and time. Moreover, UAVs can also be helpful for maritime natural resource exploration, such as oil and gas exploration, especially in harsh and challenging environmental conditions. Furthermore, UAVs equipped with high-resolution cameras can also be used for security and surveillance purposes. A single drone can gather more information than cameras installed at different locations.

Inspired by these trends, we present the key aspects of UAV-aided maritime communication networks. The goal is to identify the prospects and challenges of deploying UAVs in the maritime network. Our major contributions in this article are summarized as follows:
◆ First, we present a design architecture of a UAV-based maritime communication network.
◆ Then, we discuss the channel characteristics in maritime communication networks, such as air-to-sea and near-sea-surface channels. Also, we present the use cases of UAV-aided maritime communication (Table 1).
◆ Finally, we present the research challenges and future directions for UAV-based maritime communication networks.


UAV-Aided Maritime Communication Network Architecture

The basic network architecture of a UAV-aided maritime communication network is shown in Figure 1. In such a network, UAVs are simultaneously connected with the maritime control station (MCS), satellite, and sea vessels. The communication links between UAVs and the MCS, satellites, and ships are primary, whereas the communication link between satellites and the MCS is secondary. In the following, we discuss the MCS, control links, and data links in detail.
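The link roles described in this paragraph can be written down as a small data structure. The following is a minimal sketch, not the authors' model: the `Link` class, the `LINKS` list, and the `neighbors` helper are hypothetical names introduced for illustration, and only the links named in the paragraph above are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One bidirectional link between two node types (hypothetical helper)."""
    a: str
    b: str
    role: str  # "primary" or "secondary", as described in the text


# Only the links stated in the paragraph above; Figure 1 may show more.
LINKS = [
    Link("UAV", "MCS", "primary"),
    Link("UAV", "Satellite", "primary"),
    Link("UAV", "Ship", "primary"),
    Link("Satellite", "MCS", "secondary"),
]


def neighbors(node, role=None):
    """Return the sorted node types linked to `node`, optionally filtered by role."""
    found = set()
    for link in LINKS:
        if role is not None and link.role != role:
            continue
        if link.a == node:
            found.add(link.b)
        elif link.b == node:
            found.add(link.a)
    return sorted(found)


if __name__ == "__main__":
    print(neighbors("UAV"))               # all node types a UAV links to
    print(neighbors("MCS", "secondary"))  # secondary links of the MCS
```

Keeping the role on the link rather than on the node reflects the text, where the same MCS has a primary link to UAVs and a secondary link to satellites.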

MCS

An MCS is the brain of a maritime network. It may be positioned on a ship, on a UAV, or underwater to support the operators of UAVs. The control station may be either stationary or movable for command and control (C&C) transmission. The control station equipment can be as simple as a laptop with an antenna connected to it or as complex as a rat's nest of wires, antennas, computers, electronics boxes, joysticks, and monitors.

Control Links

The link used for communication from a BS on a ship or at the coast to users (UAV, satellite, or ship) in UAV-assisted maritime networks is called the control link. The control link is responsible for transmitting commands and controls from a BS to the users in the uplink. The control links from a maritime BS to the satellite may be utilized for orbit selection, speed control of the satellite, and coverage control. Similarly, for the UAV and maritime vessels, control links are used for speed control, path selection, and transmission direction control.

Table 1. A depiction of UAV-based prospective integrated solutions for challenges in maritime applications.

Use Cases | Challenges | Prospective UAV-Based Integrated Solutions
Relaying | Mobility, beam forming, and handovers | Sonar, UAVs, and machine learning
IoT data harvesting | Interference and path planning | Sonar, UAVs, and machine learning
Wireless power transfer | 3D handovers | Sonar, UAVs, and machine learning
Computation offloading | Complexity | UAVs and machine learning
Localization | Channel variations and 3D Doppler effect | Sonar, UAVs, satellite, and machine learning
Delivering goods | Path/trajectory planning | Sonar, UAVs, satellite, and machine learning
Security, safety, and fault identification | Cost and complexity | Sonar, UAVs, satellite, and machine learning

IoT: Internet of Things.

Figure 1. A depiction of the basic network architecture for UAV-aided maritime communication. [Labeled elements: satellite, UAV, ships, underwater vessels, and maritime control station; UAV-to-ship, satellite-to-ship, and satellite-to-UAV links; primary and secondary control links.]

Data Links

Information is exchanged in maritime networks over data links, where the communication technologies are responsible for data delivery between system elements and external units. The fundamental challenges of the maritime network are the security of C&C from a BS to the users and cognitive control of the bandwidth, frequency, and data flow. The following are the different types of data links that exist in maritime networks.

UAV–Ship and Satellite–Ship Data Links

These links deliver information from the UAV or satellite to a sea-based reception device. They are responsible for data communication between UAVs and ships and between satellites and ships.

UAV–Satellite, UAV–UAV, and Satellite–Satellite Data Links

UAVs can cooperate with other space/airborne platforms, such as satellites and other UAVs. These types of data links demand that air-to-air communication be established between the platforms. Establishing these links is more challenging due to the relative movement of both transmitters and receivers [5].

Channel Characteristics

It is important to comprehend and model the wireless channels to establish an efficient maritime communication network. As far as maritime communication is concerned, three major channel types are to be investigated. The first is an air-to-sea channel, used for communication between UAVs and ships. The second is a near-sea-surface channel, used for ship-to-ship communication. Finally, an underwater communication channel is used to communicate between underwater vessels.
Underwater communication channels can further be divided into near-sea-surface (i.e., up to 600 m below the sea surface) and deep-sea underwater (i.e., more than 600 m below the sea surface) wireless channels due to differences in their characteristics, such as the temperature, salinity, and pressure at different depths. Maritime wireless channels differ from conventional terrestrial channels in many aspects, such as the ducting effect and heavy scattering over the sea surface, unpredictable sea wave proportions, water density, and temperature variations in the sea. All of these aspects result in significant complexity in the receiver design. Although the satellite-to-ship channels have been explored extensively in the past [6], the wireless channels in the terrestrial and nonterrestrial integrated networks (TaNTIN) [7] are less explored for the near-coast situation. Therefore, researchers have recently investigated maritime wireless channels and developed several models.

The two most essential and distinguishing properties of maritime wireless channels are sparsity and location dependence. Sparsity is extensively observed in the maritime environment, especially for the unpredictable scattering and distribution of maritime receivers. In contrast, the location dependency feature implies that there should be a completely different channel model for different locations of the maritime receiver. Figure 2 depicts the challenging maritime environment and channel variations observed at sea level due to the traveling sea waves, moving UAVs, and ships. Sea waves traveling in random directions and with dynamic wave amplitudes cause high fluctuations in the receiver's signal-to-interference-plus-noise ratio (SINR) level. The mobility of ships, sea waves, and UAVs in random directions makes channel estimation challenging for the receiver. Similarly, the variable speed of sea waves, UAVs, and ships leads to an unpredictable Doppler effect. Consequently, these traits create new difficulties and dimensions in the design of UAVs in a maritime communication system.

In the following, we discuss different models for the air-to-sea, near-sea-surface, and underwater wireless channels.

Figure 2. The UAV as a use case of reliable maritime communication in dynamically changing environmental conditions. [Labeled elements: UAVs, ships, underwater vessels, and sea waves; UAV-to-UAV and UAV-to-ship links.]

Air-to-Sea Channel

Air-to-ground channels are widely studied in the literature [2]. However, the air-to-sea channel differs from the air-to-ground channel in many aspects, such as ducting, the sparsity effect, and instability in the maritime environment, which lead to remarkable differences in channel modeling. In many cases, the two-ray model is applied. The first component of the two-ray model is the line-of-sight (LoS) component, and the second is the surface-reflected ray component. When the transmission distance is very large and the transmitter is located at some notable height, the curved-Earth two-ray model is used to account for the Earth curvature [8]. In some cases, the rays received from other weak scattered paths can also be considered, apart from the two strong paths. However, a dispersion around the maritime receiver is observed when the transmitter is located at a very high altitude [9]. Compared to the terrestrial (i.e., near the urban area) environment, a maritime receiver is expected to face more sparse scattering, which may lead to simplification in the air-to-sea channel modeling.

As discussed earlier, a standard two-ray or three-ray model can be used in an air-to-sea channel. However, due to long-distance transmission in the maritime environment, two main elements, i.e., the ducting effect and Earth curvature, must be considered. Also, the location of the transmitter (UAV or satellite) is usually above the ducting layer; therefore, a part of the radio energy could be absorbed in the ducting layer, especially when the grazing angle (the angle between the sea surface and the direct path) is less than a threshold. In this case, the ray-trapping action of the ducting layer can also increase the power of the received signal, resulting in reduced path loss [10].

Near-Sea-Surface Channel

As mentioned earlier, near-sea-surface (such as ship-to-ship, ship-to-land, and land-to-ship) channels are distance dependent. Different channel models can be used for different locations of transmitters and receivers. The standard two-ray model can be used for a modest distance between the transmitter and receiver. However, the LoS and the reflected ray components vanish due to Earth curvature as the distance between the transmitter and receiver increases. Nevertheless, the receiver can still receive the transmitted signal due to the ducting effect, provided there is proper beam alignment between the transmitter and receiver. In short, as the distance between the transmitter and receiver increases, the two- or three-ray channel model is replaced by a duct-only model. The ducting effect across the sea surface allows beyond LoS (BLoS) transmission in marine communications, which has gained much popularity in secure and long-distance maritime communication.

Figure 3 shows the path loss [10], [11] against the distance between the transmitter and receiver for different maritime channels with acoustic waves at 500-kHz frequency. The path loss varies with the level of water density in the wireless channel. For instance, the path loss in deep seawater is higher than that in free-space, near-sea-surface, and sea-surface channels. The reasons for this are the factors of temperature, shadowing, and density of the water. We also show the trend of path loss for radio-frequency (RF) waves in Figure 4, where the RF waves face the highest path loss in deep-seawater channels compared to other maritime wireless channels. For the free-space channel, we do not consider shadowing caused by the sea waves; rather, we consider the LoS communication link between the UAV and ship at sea level. By comparing Figures 3 and 4, we can determine that acoustic waves are more suitable for maritime communication in the underwater environment. At the same time, RF is better suited for near-surface and free-space links above the seawater environment.

Figure 3. A depiction of the path loss for free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels at 500-kHz acoustic waves. [Axes: distance (m), 0 to 1,000, versus path loss (dB).]

Figure 4. A depiction of the path loss for free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels at 500-kHz RF waves. [Axes: distance (m), 0 to 1,000, versus path loss (dB).]

Use Cases of UAV-Aided Maritime Communication

This section covers various use cases of UAV-aided maritime communication, such as relaying, wireless power transfer, data offloading, and localization (Table 1). In the following, we discuss each of the use cases in detail.

UAV-Based Relaying

UAV-based communications have growing importance for many applications, particularly with the arrival of high-altitude, long-endurance platforms. These UAVs can enable BLoS communications in support of a range of maritime activities. The UAV-based airborne relay will enable range extension for maritime communication services. Also, with their flexible mobility and the high likelihood of LoS air-to-sea links, UAV-enabled relays can offer increasingly important advantages for maritime networks, as shown in Figure 1.

UAV-Aided Maritime Internet of Things Data Harvesting

Underwater sensor networks have attracted a lot of research attention in recent years. However, it is evident that major obstacles remain to be solved. Several telemetry activities for maritime monitoring, research, and exploration can be performed based on collecting data from marine buoys rapidly and in real time. Satellites, ships, and airplanes can all collect marine data, but satellite transmission is often expensive and bandwidth limited, while manned ships/aircraft have high manpower/mission costs and risks. Therefore, using UAVs that can resist strong winds over the sea surface as agile data collectors appears to be an exciting solution. UAVs can fly near the buoys and use a stable communication channel to wirelessly and quickly capture a significant amount of data because of their high mobility.
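As a rough illustration of the buoy-visiting idea, the sketch below orders a set of buoys with a greedy nearest-neighbor rule and reports the resulting flight distance. This is an assumption-laden toy, not a method from the article: the function name `nearest_neighbor_tour`, the flat 2D coordinates in meters, and the greedy rule itself are all hypothetical choices, and the sketch ignores wind, energy limits, and hover time for data transfer.

```python
import math


def nearest_neighbor_tour(start, buoys):
    """Greedy visiting order for buoys (hypothetical helper).

    start: (x, y) launch point in meters.
    buoys: dict mapping buoy name -> (x, y) position in meters.
    Returns (visit_order, total_distance), including the return leg to start.
    """
    remaining = dict(buoys)
    position = start
    order = []
    total = 0.0
    while remaining:
        # Pick the closest buoy that has not been visited yet.
        name = min(remaining, key=lambda n: math.dist(position, remaining[n]))
        total += math.dist(position, remaining[name])
        position = remaining.pop(name)
        order.append(name)
    total += math.dist(position, start)  # fly back to the launch point
    return order, total


if __name__ == "__main__":
    order, total = nearest_neighbor_tour(
        (0.0, 0.0), {"A": (3.0, 4.0), "B": (6.0, 8.0), "C": (0.0, 1.0)}
    )
    print(order, round(total, 2))
```

A greedy order is generally not the shortest tour; it only serves here as a simple baseline for the path-planning challenge listed for IoT data harvesting in Table 1.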

Maritime Wireless Power Transfer

Wireless charging has been acknowledged as a viable technology to provide an energy supply for battery-limited nodes, such as underwater Internet of Things (IoT) devices and sensors. UAV-based wireless charging can bring more flexibility in terms of mobility and access to hard-to-reach areas [12]. Due to the LoS links between the UAV and sensors, a UAV-enabled wireless power transfer system may substantially improve energy transfer efficiency by deploying the UAV as a mobile energy transmitter.

Maritime Computation Offloading

Because of great sensitivity to time and energy consumption, many computation- and data-intensive jobs are challenging to accomplish on energy-constrained maritime devices. UAV-based mobile edge computing (MEC) is a promising solution to overcome this challenge, providing ubiquitous Internet services for emerging maritime applications, such as marine environmental monitoring, ocean resource exploration, disaster prevention, and navigation. As a result, UAV-based MEC has emerged as a new paradigm that receives great attention in both academic and industrial sectors. Increasing demand for large-scale connection and communications, ultralow information-processing latency, and high dependability in delay-sensitive marine applications poses problems for delivering reliable quality of service in a resource-constrained maritime network.

Maritime Localization

Localization plays a significant role in communication in the TaNTIN environment [1]. Maritime localization uses a ship's measuring devices to determine the location of other nautical targets. Ocean surveillance satellites can take advantage of space and altitude to cover large ocean areas, monitor submarine operations in real time, and detect radar signals sent by ships. Nevertheless, the position precision based on satellites may not be satisfactory, especially in unforeseen situations that require high accuracy, such as ocean rescue and noncooperative (enemy) ship location. In this case, UAVs can be used to improve the localization accuracy of the targets where the UAVs can be controlled remotely [3]. Nevertheless, the self-positioning of UAV platforms and determining the location of unknown marine targets by UAVs are challenging.

Research Challenges and Directions

Although there has been great interest in UAV-aided maritime communication over the past few years, various open research issues should be targeted. In the following, we explore some promising upcoming research challenges for UAV-aided maritime communication networks.

UAV 3D Maritime Trajectory Design

Exploiting the high mobility of UAVs is projected to unlock the full potential of UAV-to-sea communications. Various trajectory optimization models exist in the literature that optimize air-to-sea communications under different UAV configurations. The problems of trajectory optimization are often nonconvex, and variants of the successive convex approximation (SCA) technique are used to solve them suboptimally. Nevertheless, these SCA-based approaches depend heavily on trajectory initialization and do not explicitly account for the wind effect. Furthermore, for fixed-wing UAVs that must sustain forward motion to stay in the air, the computational complexity and resulting trajectory complexity make it costly to collect a high volume of data. Therefore, designing an energy-efficient 3D maritime UAV trajectory is very important.

UAV-to-Sea and UAV-to-UAV Interference Management

For maritime applications, UAVs largely send data in the downlink. Nevertheless, the capacity of maritime-connected UAVs to establish LoS communication with several sea vessels might lead to severe mutual interference between them and the ships. To overcome this difficulty, additional advances in the architecture of future UAV-based maritime networks, such as enhanced receivers, 3D frequency reuse, and 3D beam forming, are needed. For instance, because of their capabilities of detecting and categorizing images, deep learning models can be implemented on each UAV to recognize numerous environmental elements, such as the location of UAVs and ships. Such a method will enable each UAV to change its beamwidth tilt angle to minimize interference with the ships.

3D Mobility Management (3D Handoffs)

UAVs can be deployed as aerial BSs or aerial users in UAV-assisted maritime networks. When deployed as aerial BSs, UAVs can be far away from maritime users, such as a ship. This might degrade the signal strength at the receiver and cause poor mobility performance, such as radio connection loss and handover failure. In addition, loss of the C&C signal may result in dangerous events, such as the collision of UAVs with commercial aircraft, or may even cause UAVs to fall into the sea. In such cases, UAVs are deployed as aerial users in maritime communication networks. However, they can still face many mobility management issues, especially when there is no LoS link between the maritime BS and the aerial users [13]. Although the sidelobes of the BS antenna can still serve aerial users, there may be a loss of connection and handover failure due to lower antenna gains in the sidelobes [1]. Hence, excellent mobility management is essential for enabling reliable connections between UAVs and ships sailing on the sea.
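To make the handover discussion concrete, the following sketch shows one generic way a handover decision is often stabilized: switch stations only when a neighbor is stronger than the serving station by a hysteresis margin for several consecutive measurements. This is not a scheme proposed in the article. The function `handover_decisions`, the parameters `hysteresis_db` and `time_to_trigger`, and the sample signal values are all hypothetical and chosen only for illustration.

```python
def handover_decisions(serving, samples, hysteresis_db=3.0, time_to_trigger=2):
    """Track the serving station over a sequence of signal measurements.

    serving: name of the initial serving station.
    samples: list of dicts mapping station name -> received signal (dBm).
    A handover happens only after the same neighbor has exceeded the serving
    station by at least `hysteresis_db` for `time_to_trigger` samples in a row.
    Returns the serving station after each sample.
    """
    history = []
    candidate, streak = None, 0
    for rss in samples:
        best = max(rss, key=rss.get)
        serving_level = rss.get(serving, float("-inf"))
        if best != serving and rss[best] >= serving_level + hysteresis_db:
            # Count consecutive samples in which the same neighbor wins.
            streak = streak + 1 if best == candidate else 1
            candidate = best
            if streak >= time_to_trigger:
                serving, candidate, streak = best, None, 0
        else:
            candidate, streak = None, 0
        history.append(serving)
    return history


if __name__ == "__main__":
    measurements = [
        {"BS1": -80.0, "BS2": -85.0},
        {"BS1": -84.0, "BS2": -82.0},
        {"BS1": -88.0, "BS2": -82.0},
        {"BS1": -90.0, "BS2": -81.0},
        {"BS1": -79.0, "BS2": -81.0},
    ]
    print(handover_decisions("BS1", measurements))
```

The margin and the consecutive-sample requirement trade responsiveness for stability: larger values suppress ping-pong handovers caused by sea-wave-induced fluctuations but delay the switch when the serving link, for example a sidelobe link, is genuinely failing.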

Beam Forming for High-Mobility Ships and UAVs
In maritime communication networks, beam-forming and power control issues are more challenging due to the frequent switching of frequency-access points and collaborative operation. Joint power control and beam forming provide reliable coverage for UAV-assisted maritime networks, but a fixed beam-forming vector may lead to SINR variations due to variations in angle of departure (AoD) and angle of arrival (AoA). Empirical measurements with Doppler effects can be of substantial value for constructing more accurate statistical air-to-sea channel models, and modern technologies can improve beam forming and mobility management for ships and UAVs.

Conclusion
This article presents the possible architecture, important applications, challenges, and solutions for using UAVs in maritime networks. It identifies various types of wireless maritime channel characteristics. Furthermore, several use cases of UAV-assisted maritime communications, such as monitoring and surveillance, relaying, IoT harvesting, computation offloading, localization, and the delivery of goods, are discussed. This article also aims to spur the interest of researchers in the future evolution of UAV-enabled maritime communication networks that will enable digital use cases for the future marine economy.

About the Authors
Muhammad Waseem Akhtar (muhammadwaseem.[email protected]) is a postdoctoral research fellow with the Information Systems and Technology department of Mid Sweden University, Sundsvall 851 70, Sweden. His research interests include the Internet of Things; cooperative communication; energy- and bandwidth-efficient network designing; massive multiple-input, multiple-output and device-to-device communication; artificial intelligence; machine learning and blockchain technologies; and maritime communication. He is a Member of IEEE.

Nasir Saeed ([email protected]) earned his Ph.D. degree in electronics and communication engineering from Hanyang University, Seoul, South Korea, in 2015. He is currently an associate professor with the Department of Electrical and Communication Engineering at United Arab Emirates University, Al Ain 15551, United Arab Emirates. His research interests include nonconventional communication networks, heterogeneous vertical networks, multidimensional signal processing, and localization.

References
[1] J.-B. Wang et al., “Unmanned surface vessel assisted maritime wireless communication toward 6G: Opportunities and challenges,” IEEE Wireless Commun., early access, 2022, doi: 10.1109/MWC.008.2100554.
[2] Y. Song et al., “Internet of maritime things platform for remote marine water quality monitoring,” IEEE Internet Things J., vol. 9, no. 16, pp. 14,355–14,365, Aug. 2022, doi: 10.1109/JIOT.2021.3079931.
[3] F. S. Alqurashi et al., “Maritime communications: A survey on enabling technologies, opportunities, and challenges,” IEEE Internet Things J., early access, 2022, doi: 10.1109/JIOT.2022.3219674.
[4] M. W. Akhtar et al., “The shift to 6G communications: Vision and requirements,” Human Centric Comput. Inf. Sci., vol. 10, no. 1, pp. 1–27, Dec. 2020, doi: 10.1186/s13673-020-00258-2.
[5] N. Saeed et al., “Point-to-point communication in integrated satellite-aerial 6G networks: State-of-the-art and future challenges,” IEEE Open J. Commun. Soc., vol. 2, pp. 1505–1525, Jun. 2021, doi: 10.1109/OJCOMS.2021.3093110.
[6] C. Azzarello, C. Gerbino, and R. Mehta, “Enhanced sensing methods for UAV-based disaster recovery,” Comput. Sci. Eng. Senior Theses, Santa Clara Univ., Dept. Comput. Sci. Eng., Santa Clara, CA, USA, 2021. [Online]. Available: https://scholarcommons.scu.edu/cseng_senior/194.
[7] M. W. Akhtar and S. A. Hassan, “TaNTIN: Terrestrial and non-terrestrial integrated networks-a collaborative technologies perspective for beyond 5G and 6G,” Internet Technol. Lett., early access, 2021, doi: 10.1002/itl2.274.
[8] A. Verma et al., “VaCoChain: Blockchain-based 5G-assisted UAV vaccine distribution scheme for future pandemics,” IEEE J. Biomed. Health Inform., vol. 26, no. 5, pp. 1997–2007, May 2022, doi: 10.1109/JBHI.2021.3103404.
[9] S. Bauk, “Performances of some autonomous assets in maritime missions,” TransNav, Int. J. Marine Navig. Safety Sea Transp., vol. 14, no. 4, pp. 875–881, Feb. 2021, doi: 10.12716/1001.14.04.12.
[10] J. Wang et al., “Wireless channel models for maritime communications,” IEEE Access, vol. 6, pp. 68,070–68,088, Nov. 2018, doi: 10.1109/ACCESS.2018.2879902.
[11] J. Wang and S. Wang, “Seawater short-range electromagnetic wave communication method based on OFDM subcarrier allocation,” J. Comput. Commun., vol. 7, no. 10, pp. 63–71, Jan. 2019, doi: 10.4236/jcc.2019.710006.
[12] E. Lygouras and A. Gasteratos, “A new method to combine detection and tracking algorithms for fast and accurate human localization in UAV-based SAR operations,” in Proc. IEEE Int. Conf. Unmanned Aircraft Syst. (ICUAS), 2020, pp. 1688–1696, doi: 10.1109/ICUAS48674.2020.9213873.
[13] Z. Haider et al., “A novel cooperative relaying-based vertical handover technique for unmanned aerial vehicles,” Secure Commun. Netw., vol. 2022, Sep. 2022, Art. no. 5702529, doi: 10.1155/2022/5702529.

An ASD Classification Based on a Pseudo 4D ResNet Utilizing Spatial and Temporal Convolution


by Shuaiqi Liu, Siqi Wang, Hong Zhang, Shui-Hua Wang, Jie Zhao, and Jingwen Yan

The psychiatric condition known as autism spectrum disorder (ASD) affects children and adults alike. As a medical imaging technology, functional magnetic resonance imaging (fMRI) is widely used to study the brains of persons with ASD. This study introduces a novel technique: a pseudo 4D ResNet (P4D ResNet) to simultaneously extract and classify the brain activity of ASD patients. A P4D ResNet, which mainly consists of two different residual blocks stacked together, can extract both temporal and spatial information from fMRI data. In a P4D ResNet, to reduce the computation and the number of parameters, each residual block combines a 3D spatial filter with a 1D temporal filter instead of using a 4D spatiotemporal convolution, and the computation can be performed in parallel. Due to the high dimensionality of the complete data and the limited amount of data, in this article, each piece of fMRI data is sampled at equal intervals of a set length in the time dimension for data expansion. Compared with other existing models, the experiments show that the proposed model for ASD classification achieved better results.

Digital Object Identifier 10.1109/MSMC.2022.3228381
Date of current version: 17 July 2023
2333-942X/23 ©2023 IEEE

Introduction
ASD, usually known as autism, is a common neurodevelopmental cognitive condition in children that is primarily inherited. Neurodevelopmental conditions, including autism spectrum disorders, have recently drawn more attention. Patients usually have very slow language development and are unable to communicate properly. They are not interested in the activities around them and rarely initiate social interactions. Moreover, they often exhibit repetitive, stereotyped behaviors and are extremely resistant to change and transformation. The families of ASD patients suffer significant psychological and financial stress for a protracted period of time due to the lack of a specific prescription for ASD and the difficulties in finding a permanent cure. This causes losses and injury to individuals, families, and society at large. Traditional ASD diagnostic techniques are time consuming and prone to error because they are dependent on the Diagnostic and Statistical Manual of Mental Disorders. As a result, the creation of a fully automated diagnostic method for ASD is required.

Numerous functional neuroimaging techniques have been utilized in brain study since the advancement of medical imaging. One of the most widely used is fMRI [1], [2], [3], [4]. The high temporal and spatial resolution obtained by fMRI makes it possible to see both physiological and pathological functional brain activity [5], [6]. Blood-oxygenation-level-dependent (BOLD) contrast serves as the foundation for the fundamental theory of fMRI, which in brain research can be separated into two modalities, namely, task and resting states. Resting-state fMRI (rs-fMRI) requires subjects to be fully relaxed to acquire images, and the images acquired have high spatial and temporal resolution. Because the acquisition method is quick and easy, it is widely applied in the classification of ASDs [7], [8], [9]. The rs-fMRI data used in this study were mainly dichotomized into ASD patients and typical controls (TCs).

In terms of model composition, the research on ASD classification can generally be categorized into two types: traditional machine learning and deep learning. Traditional machine learning methods provide effective models for ASD classification and recognition problems. Scholars from various countries have proposed different traditional machine learning-based methods for ASD classification, and the main steps include manual feature extraction and a classifier. Iidaka [10] input the correlation matrix calculated from rs-fMRI time-series data into a probabilistic neural network (PNN) for ASD classification. The PNN classifier consists of four fully interconnected layers: an input, a pattern, a summation, and an output layer. The proposed algorithm obtained approximately 90% accuracy in 312 subjects with ASD and 328 subjects with typical development. Bi et al. [11] proposed a random NN cluster consisting of multiple NNs to classify 50 ASD patients and 42 TCs, addressing the low accuracy of a single NN in separating ASD patients from TCs. They also constructed five types of random NN clusters, namely, a random backpropagation NN cluster, a random probabilistic NN cluster, a random learning vector quantization NN cluster, a random competitive NN cluster, and a random Elman NN cluster. Among them, the accuracy of the random Elman NN cluster was greatly improved.

Mostafa et al. [12] proposed a brain network-based algorithm for ASD classification. This algorithm used a 264-region parcellation scheme from the fMRI of the brain to construct a brain network. Then, 264 original brain features were defined by the 264 eigenvalues of the Laplacian matrix of the brain network, and three additional features of the brain network were defined by the network centrality. Finally, this algorithm obtained 64 discriminative features through a feature-selection algorithm and obtained an accuracy of 77.7% in ASD classification. Liu et al. [13] proposed an ASD classification algorithm based on dynamic functional connectivity and multitask feature selection, which was validated by the fMRI data from ABIDE I with a classification accuracy of 76.8%. Zhao et al. [14] used the method of extracting central moments of data to extract time-invariant features in low- or high-order dynamic functional connectivity networks of fMRI data. By integrating the features extracted from conventional functionally connected, low-order dynamically connected, and high-order dynamically connected networks, an accuracy of up to 83% was obtained in 45 ASD patients and 47 TCs by using a linear, kernel-based support vector machine (SVM) classifier.

Deep learning algorithms have been well applied in various fields [15], [16], [17]. Deep learning-based ASD classification algorithms have also recently gained popularity due to the quick advancement of computers. One of the most widely used is convolutional NNs (CNNs). For example, Xiao et al. [18] decomposed the dataset of each subject into 30 independent components. Then, an array of 84 key features of all the subjects was reshaped into a 3,400 × 84-dimensional key-feature matrix and was input into a stacked autoencoder for classification. This study obtained an average classification accuracy of 87.21% in 84 subjects. Jia et al. [19] extracted the functional connectivity correlation matrix of the brain from rs-fMRI data after preprocessing and then used a stacked autoencoder for ASD classification. ASD identification was obtained with an accuracy of 95.27% in 656 subjects. In 2019, Rathore et al. [20] obtained a classification accuracy of 69.2% in 1,035 subjects with a simple three-layer NN by using a functional correlation and its topological features. In the same year, Zhuang et al. [21] proposed an invertible network for ASD classification and biomarker selection. This invertible network has two invertible blocks that map the data from the input domain to the feature domain. Then, a fully connected (FC) layer was applied for classification, and a classification accuracy of 71% was achieved in 530 ASD patients and 505 TCs. In 2020, Tang et al. [22] proposed an end-to-end multimodal architecture based on deep NNs that can analyze the region-of-interest time-series activation maps by combining different deep learning networks. This method can analyze functional images more comprehensively and achieved 74% classification accuracy among 1,035 subjects. In 2021, Shao et al. [23] proposed an ASD classification algorithm by combining deep feature selection and graph convolutional networks (GCNs), which achieved better ASD classification results. In the same year, Yin et al. [24] constructed brain networks from brain fMRI images and then combined autoencoders and deep NNs for ASD classification, which achieved good classification results.

The quantity of data voxels is substantial because fMRI images are an arrangement of a series of 3D images acquired in a time series. The huge amount of spatiotemporal information within the fMRI 4D image data is ignored in most current methods, which inevitably leads to the loss of important information. Traditional models are unable to extract more effective features, and the classification accuracy is relatively low. In addition to this, the sample size has a significant impact on the classification results. There tends to be greater accuracy when using small-sample data. In the aforementioned studies, the classification accuracy of the small-sample studies can reach nearly 90%, while that of large-sample studies reaches only about 70%. However, the value of ASD classification studies lies precisely in their use for realistic medical judgment. If the variability of sites in the database is not taken into account and only a small sample is used for the study, the results are not broadly applicable. To classify ASDs, a P4D-ResNet-based ASD classification method is created and employed in this research. This model puts spatial convolution and temporal convolution together into a residual block, thus realizing the simultaneous extraction of spatiotemporal features of fMRI data, and it can also perform parallel computations. The results of the experiments show the effectiveness of the proposed method.

The Proposed Algorithm
In this article, we propose a P4D-ResNet model based on different residual architectures. This model can extract both spatial and temporal features of fMRI data and fully exploit the spatiotemporal information, which achieves satisfactory classification results. Construction of the P4D-ResNet network model is described in this section. The P4D-ResNet model consists of a 4D maximum pooling layer, a P4D-convolution (P4DC) block, a mixed residual block (MRB), a Flatten layer, a dropout layer, and an FC layer. The network structure of the P4D-ResNet model is shown in Figure 1.

Figure 1. The network structure of the P4D-ResNet model (fMRI data → 4D max pooling → P4DC → 4D max pooling → MRB → 4D max pooling → MRB → 4D max pooling → MRB → Flatten → dropout → FC → classification). max: maximum.

The ResNet model first performs dimensionality reduction by using a 4D maximum pooling layer, followed by P4D convolution, i.e., a spatial and a temporal convolution, to obtain the spatial and temporal features of the fMRI data. The P4D-ResNet model feeds the extracted features into three connected 4D maximum pooling layers and MRB modules to downscale and further extract spatiotemporal features from the data. Finally, through the Flatten layer and the FC layers, the classification results are obtained by the Sigmoid function. The proposed model in this article can be expressed as

$$y = \mathrm{MRB}(\mathrm{MP}(\mathrm{MRB}(\mathrm{MP}(\mathrm{MRB}(\mathrm{MP}(\mathrm{P4DC}(\mathrm{MP}(x)))))))) \tag{1}$$

where $x$ denotes the input 4D fMRI data, and $y$ denotes the output of the last MRB. MRB denotes the MRB function, P4DC denotes the P4D-convolutional block, and MP denotes the 4D maximum pooling function. The substructures in the model are described separately in the following sections.

4D Convolution
4D CNNs are well suited for spatiotemporal feature learning of medical images. It is possible to better extract the data’s temporal and spatial information by performing 4D convolutional procedures over space and time. To gain more detailed temporal information, the spatial feature maps in the convolutional layer are connected to numerous nearby time points in the previous layer. The principle of 4D convolution is presented in Figure 2. The same color in the convolutional connection indicates weight sharing. As displayed in Figure 2, the 4D convolution operation applies the same 4D kernel to a continuous 3D image, extracting features over the entire time series by shifting the step size. Let $k_{ij}^{x_0 y_0 z_0 t_0}$ be the value at the $(x_0, y_0, z_0, t_0)$ position of the $j$th feature map of the $i$th layer, that is,

$$k_{ij}^{x_0 y_0 z_0 t_0} = \partial\left(b_{ij} + \sum_{c} \sum_{p=0}^{P_i-1} \sum_{q=0}^{Q_i-1} \sum_{r=0}^{R_i-1} \sum_{s=0}^{S_i-1} w_{ijc}^{pqrs}\, k_{(i-1)c}^{(x_0+p)(y_0+q)(z_0+r)(t_0+s)}\right) \tag{2}$$

where $\partial$ is the activation function and $b_{ij}$ is the bias term. $P_i$, $Q_i$, $R_i$, and $S_i$ denote the kernel size along each of the four directions. $w_{ijc}^{pqrs}$ is the weight value at position $(p, q, r, s)$, which connects the $c$th feature map of the $(i-1)$th layer with the $j$th feature map of the $i$th layer. With the expansion of convolutional layers from three to four dimensions, the sharply increased number of parameters and computational effort may lead to overfitting. To solve this problem, we decompose the 4D spatiotemporal convolution into the combination of a 3D spatial and a 1D temporal convolution; that is, the original 3 × 3 × 3 × 3 convolution is split into a combination of a 3 × 3 × 3 × 1 spatial convolution and a 1 × 1 × 1 × 3 temporal convolution, which is the principle of the P4DC module.

Figure 2. The principle of 4D convolution.

The MRB
In this article, as shown in Figure 3, a 4D MRB is built to conduct the simultaneous extraction of spatiotemporal information. The residual block is composed of a P4D-serial residual block (P4D-SRB) and a P4D-parallel residual block (P4D-PRB). The P4D-SRB and P4D-PRB constructed in this article are obtained by modifying the conventional 3D bottleneck residual block. The conventional residual structure is shown in Figure 4(a), and the residual blocks constructed in this article are depicted in Figure 4(b) and (c).

Figure 3. The mixed residual block structure. P4D-SRB: pseudo-4D serial residual block; P4D-PRB: pseudo-4D parallel residual block.

The kernel size of both the first and fourth convolutional layers in the P4D-SRB is set to 1 × 1 × 1 × 1, which can match the number of channels. The number of output channels in the P4D-SRB is four times the number of input channels. The P4D-SRB applies a spatial convolution followed by a temporal convolution for the spatiotemporal feature extraction of data, whereas the P4D-PRB applies a spatial convolution and a temporal convolution in parallel. In the P4D-SRB, the output of the spatial convolution is directly used as the input of the temporal convolution, so the extracted spatial information has a direct impact on the temporal features. In contrast, in the P4D-PRB, spatial and temporal features are extracted separately and then directly summed as the feature output, so the extraction of spatial information in the same residual block does not have a direct effect on temporal feature extraction. Cascading these two blocks to form MRBs is helpful: it improves ASD classification results by capturing the spatiotemporal features of fMRI data well.
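The difference between the serial and parallel arrangements can be illustrated with a minimal NumPy sketch. This is a single-channel simplification under stated assumptions, not the authors' implementation: the 1 × 1 × 1 × 1 channel-matching convolutions are omitted, zero padding is assumed, and the function names (`conv_spatial`, `conv_temporal`, `srb`, `prb`), the placement of the ReLU activations, and the cascade order are all illustrative choices.

```python
import numpy as np

def conv_spatial(x, k):
    """3x3x3x1 filter: acts on (w, h, d) at every time point, zero padded."""
    w, h, d, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape)
    for p in range(3):
        for q in range(3):
            for r in range(3):
                out += k[p, q, r] * xp[p:p + w, q:q + h, r:r + d, :]
    return out

def conv_temporal(x, k):
    """1x1x1x3 filter: acts along t at every voxel, zero padded."""
    t = x.shape[3]
    xp = np.pad(x, ((0, 0), (0, 0), (0, 0), (1, 1)))
    out = np.zeros(x.shape)
    for s in range(3):
        out += k[s] * xp[:, :, :, s:s + t]
    return out

def relu(a):
    return np.maximum(a, 0.0)

def srb(x, ks, kt):
    """Serial block: the temporal filter consumes the spatial filter's output."""
    y = relu(conv_temporal(relu(conv_spatial(x, ks)), kt))
    return relu(x + y)  # residual (skip) connection

def prb(x, ks, kt):
    """Parallel block: both filters see the same input; outputs are summed."""
    y = relu(conv_spatial(x, ks)) + relu(conv_temporal(x, kt))
    return relu(x + y)  # residual (skip) connection

rng = np.random.default_rng(0)
x = rng.random((5, 6, 5, 8))            # toy (w, h, d, t) volume, hypothetical size
ks = rng.normal(size=(3, 3, 3))         # spatial kernel
kt = rng.normal(size=3)                 # temporal kernel
mrb_out = prb(srb(x, ks, kt), ks, kt)   # cascade of the two blocks (order assumed)

full_4d_weights = 3 ** 4                 # one 3x3x3x3 kernel: 81 weights
factorized_weights = ks.size + kt.size   # 27 + 3 = 30 weights
```

Per input-output channel pair, the factorized form needs 30 weights instead of 81, which is the parameter saving that motivates the pseudo-4D design.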

4D Maximum Pooling Layer
This study extends the 3D maximum pooling layer to the 4D maximum pooling layer. The numbers of channels used in this article for the 4D maximum pooling layers are 1, 16, 32, and 64, respectively. The 4D maximum pooling with a channel number of 1 proceeds as follows:
◆ Let the size of each dimension of the input data of the pooling layer be b, w, h, d, t, and l. b denotes the batch size of the data input. w, h, and d represent the width, height, and depth, respectively, of the input fMRI data. t represents the time dimension of the input data, and l denotes the number of channels.
◆ The input data are reshaped into a dimensional size of b, w, h, d, and t by using the reshaping function.
◆ A 3D maximum pooling operation is performed on the reshaped input, and it outputs data with a dimension size of b, w/2, h/2, d/2, and t. “/” denotes a division operation with upward rounding.
◆ The current data are reshaped into a dimension size of b, w/2, h/2, d/2, t/2, and 2 by using the reshaping function again.
◆ Take the maximum value of the current data in the channel dimension and output the data with dimension sizes of b, w/2, h/2, d/2, t/2, and 1.
When the number of channels is eight, the data are first sliced into eight tensors with channel number one, and the 4D maximum pooling operation with channel number one is invoked on each separately. When the number of channels is 16, the data are sliced into two tensors of channel number eight. Similarly, when the number of channels is 32 or 64, the data are processed the same way. So, the 4D maximum pooling layer can be computed in parallel.
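The single-channel procedure above can be sketched in NumPy as follows. This is an illustrative reading of the listed steps, not the authors' TensorFlow code: the function names are hypothetical, and padding odd dimensions (including an odd t) with negative infinity is an assumption used here to obtain the upward rounding.

```python
import math
import numpy as np

def max_pool_3d(x):
    """2x2x2 max pooling, stride 2, over (w, h, d) of a (b, w, h, d, c) array.
    Odd sizes are padded with -inf so that output sizes are rounded up."""
    b, w, h, d, c = x.shape
    pw, ph, pd = w % 2, h % 2, d % 2
    x = np.pad(x, ((0, 0), (0, pw), (0, ph), (0, pd), (0, 0)),
               constant_values=-np.inf)
    x = x.reshape(b, (w + pw) // 2, 2, (h + ph) // 2, 2, (d + pd) // 2, 2, c)
    return x.max(axis=(2, 4, 6))

def max_pool_4d(x):
    """4D max pooling of a single-channel input of shape (b, w, h, d, t, 1)."""
    b, w, h, d, t, l = x.shape
    assert l == 1, "this sketch covers the single-channel case only"
    x = x.astype(float).reshape(b, w, h, d, t)   # drop the channel axis
    x = max_pool_3d(x)                           # time acts as the channel axis
    pt = t % 2
    x = np.pad(x, ((0, 0),) * 4 + ((0, pt),), constant_values=-np.inf)
    x = x.reshape(x.shape[:4] + ((t + pt) // 2, 2))  # pair adjacent time points
    return x.max(axis=-1, keepdims=True)         # max over each pair

def pooled_shape(shape, n_layers):
    """Shape after n_layers of stride-2 pooling with upward rounding."""
    for _ in range(n_layers):
        shape = tuple(math.ceil(s / 2) for s in shape)
    return shape

print(pooled_shape((61, 73, 61, 16), 1))  # (31, 37, 31, 8)
```

The shape helper only tracks pooling; it assumes the layers between pooling stages preserve the spatial and temporal sizes.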

Figure 4. 3D residual block and P4D residual block structures. (a) A ResNet. (b) A P4D-SRB. (c) A P4D-PRB. ReLU: rectified linear unit.

Data Enhancement and Model Training
The rs-fMRI data in this study come from the global, openly accessible Autism Brain Imaging Data Exchange (ABIDE) [25]. Samples with poor brain coverage, excessive motion peaks, ghosting, and other scanner artifacts are eliminated, leaving a final dataset of 871 participants, including 403 ASD patients and 468 TCs. A 4D NN needs more data samples for training. Therefore, the data in this article are enhanced by drawing multiple samples from each original scan in the temporal dimension. Specifically, the 871 subjects are shuffled before the experiment, and each subject’s 4D fMRI data are sampled in the temporal dimension in turn. Sixteen time slices are drawn at an interval of one per frame, and each subject’s data are expanded as far as possible. The data of every 69 subjects and the corresponding labels are encapsulated into one TFRecord file. The TFRecord format stores the data compactly: internally, it uses the “Protocol Buffer” binary data encoding scheme, occupies only one block of memory, and needs to load only one binary file at a time. It is simple and fast, especially for large training data. When the training data are large, they can be divided into multiple TFRecord files to improve processing efficiency. Fifteen TFRecord files are generated for training and testing. Among them, 12 TFRecord files are used for training and three TFRecord files are used for testing. In the data augmentation scheme used in this article, the data are first divided into a training set and a testing set at the level of the individual subject (“person”), and the data of each subject are then expanded separately. Each person’s expanded data are therefore entirely in either the training or the testing set, which helps prevent similar data from impairing the model’s classification effect and improves the generalization performance. The amount of data used in the actual experiment after data augmentation is listed in Table 1.

The experiments in this article are implemented on the TensorFlow 1.0 platform with an Ubuntu 18.04 operating system, 32 GB of random-access memory, an Intel(R) Xeon(R) E5-2667 central processing unit, and an Nvidia Tesla K40c GPU card. The experiments start with data enhancement of the preprocessed fMRI data, which gives all the data dimensional sizes of 61, 73, 61, and 16. Second, to reduce the risk of model overfitting, a 4D maximum pooling layer with a step size of two and a kernel of 2 × 2 × 2 × 2 is used for dimensionality reduction.
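The temporal sampling and the subject-level train/test split described above can be sketched in plain Python. This is one plausible reading rather than the authors' pipeline: the article does not state the stride between successive 16-frame windows, so non-overlapping windows are assumed, and the function names, subject identifiers, and scan lengths are hypothetical.

```python
import random

def window_starts(n_frames, window=16, stride=16):
    """Start indices of every full window of `window` consecutive frames."""
    return list(range(0, n_frames - window + 1, stride))

def split_subjects(subject_ids, train_fraction=0.8, seed=0):
    """Shuffle the subjects, then split subjects (not windows) into train/test."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

def expand(subject_ids, n_frames_by_subject, window=16, stride=16):
    """One (subject, start) pair per window; frames start..start+window-1."""
    return [(sid, start)
            for sid in subject_ids
            for start in window_starts(n_frames_by_subject[sid], window, stride)]

# Hypothetical scan lengths in frames, for illustration only.
n_frames = {"s01": 100, "s02": 78, "s03": 150, "s04": 120, "s05": 96}
train_ids, test_ids = split_subjects(n_frames)
train_samples = expand(train_ids, n_frames)   # windows from training subjects only
test_samples = expand(test_ids, n_frames)     # windows from testing subjects only
```

Because the split happens before expansion, no subject contributes windows to both sets, which is the property the article relies on to avoid near-duplicate samples across training and testing.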
Table 1. The dataset after data enhancement.
| The datasets | ASD | TC | Total |
| --- | --- | --- | --- |
| The original dataset | 403 | 468 | 871 |
| The expanded dataset | 2,901 | 3,051 | 5,952 |

The low-level spatiotemporal features are then extracted by a layer of spatial convolution with a kernel size of 3 × 3 × 3 × 1 and a layer of temporal convolution with a kernel size of 1 × 1 × 1 × 3. Then, high-level spatiotemporal features of the data are extracted by the maximum pooling and MRB modules. In this article, three MRB modules are used. The first MRB module has eight channels, the second has 16 channels, and the third has 32 channels. The output features of the last MRB module are flattened by using the Flatten layer. Finally, the flattened feature vector is fed into the FC layer after the dropout operation and classified by using the Sigmoid classifier. In this article, model optimization is accomplished by using the Adam optimization algorithm. Cross-entropy is the loss function. The batch size is four. The learning rate is 0.00001. Dropout is set to 0.5, and the dense layer’s L2 regularization parameter is set to 0.0005.

Experimental Results and Analysis
The data are split into a training set and a test set at a ratio of 8:2 to test the model’s efficacy while keeping as much training data as possible. The training set is used to train the model, whereas the test set is used to evaluate its classification performance.

Ablation Experiments
To better illustrate the effectiveness of the MRB module for ASD classification, two variants are tested. In the “mixed serial residual block (MSRB-2),” the P4D-PRB in the MRB is replaced with a P4D-SRB. In the “mixed parallel residual block (MPRB-2),” the P4D-SRB in the MRB is replaced with a P4D-PRB. The classification results are presented in Table 2. When MSRB-2 is used, the accuracy, specificity, and sensitivity of ASD classification are 66.8, 62.68, and 70.18%, respectively. When using MPRB-2, the accuracy, specificity, and sensitivity of ASD classification are 68.54, 60.62, and 75.08%, respectively.

Table 2. The performance of different kinds of residual block combinations.
| Residual structures | Accuracy (%) | Specificity (%) | Sensitivity (%) |
| --- | --- | --- | --- |
| MSRB-2 | 66.8 | 62.68 | 70.18 |
| MPRB-2 | 68.54 | 60.62 | 75.08 |
| MMRB | 74.67 | 71.9 | 76.91 |
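For reference, the three reported metrics can be computed from confusion-matrix counts as in the short sketch below. The article does not spell out the formulas, so the standard definitions are assumed, with ASD treated as the positive class; the function name and the example counts are hypothetical and are not taken from the article.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, specificity, and sensitivity (in %) from confusion-matrix counts.

    tp/fn: ASD subjects classified correctly/incorrectly (ASD assumed positive).
    tn/fp: TC subjects classified correctly/incorrectly.
    """
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    specificity = 100.0 * tn / (tn + fp)   # share of TCs correctly identified
    sensitivity = 100.0 * tp / (tp + fn)   # share of ASD subjects correctly identified
    return accuracy, specificity, sensitivity

# Hypothetical counts, for illustration only.
acc, spec, sens = classification_metrics(tp=80, fn=20, tn=70, fp=30)
```

Under these definitions, a model can trade specificity against sensitivity at similar accuracy, which is why Table 2 reports all three.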
In contrast, when using the MRB module, the accuracy of ASD classification is improved by 7.87 and 6.13%, respectively, and the sensitivity and specificity are the highest. It can be seen that a more structured MRB can achieve better results, especially for sensitivity improvement, which validates the effectiveness of the MRB model. For the aforementioned three different residual structures, we plot their receiver operating characteristic (ROC) curves and calculate area-under-the-curve (AUC) values to evaluate the three algorithms. Figure 5 illustrates ROC analysis results of the model by using MSRB-2, MPRB-2, and mixed many residual block (MMRB), respectively. Figure 5 shows that the model performs at its best and the AUC value is its highest when MRB is employed. In this article, we also conduct experiments on the effect of the number of MRB modules. And we use 1–4 MRB modules, respectively, to further verify reliability of the model’s design. As shown in Table 3, ASD classification

accuracy is merely 64.74% when only one MRB module is used to extract spatiotemporal features, which indicates that one MRB module cannot extract effective and representative spatiotemporal information. As the number of model layers increases, ASD classification accuracy increases, but when the number of stacked groups reaches four, ASD classification accuracy decreases and the model appears to be overfitted. In summary, three MRB modules are selected for the model experiments in this article.

Table 3. The impact of the number of MRB modules on the classification effect.

| Number of MRB modules | Accuracy (%) | Specificity (%) | Sensitivity (%) |
|---|---|---|---|
| 1 | 64.74 | 62.34 | 66.72 |
| 2 | 69.54 | 66 | 72.47 |
| 3 | 74.67 | 71.9 | 76.91 |
| 4 | 70.36 | 67.45 | 72.77 |

The selection of time frames for data sampling is also discussed in this article. The number of time frames is set to 8, 16, and 32 in turn for training and testing. The classification effects are listed in Table 4.

Table 4 shows that ASD classification accuracy is low when the time dimension is chosen to be 8. This is mostly because the time window is too short, so the model extracts fewer features and has difficulty properly capturing the temporal signals in the fMRI data. When 32 is used for the temporal dimension, more parameters and computation are required for model training, which results in overfitting. As a result, the experiments in this article use data from 16 time points, which gave the best classification effect.

Table 4. The classification effect of different time frames.

| Time frames | Accuracy (%) | Specificity (%) | Sensitivity (%) |
|---|---|---|---|
| 8 | 70.86 | 67.88 | 73.22 |
| 16 | 74.67 | 71.9 | 76.91 |
| 32 | 71.27 | 70.38 | 72.01 |

[Figure 5: ROC curves; horizontal axis: false-positive rate; vertical axis: true-positive rate. Legend: MSRB-2 (AUC = 0.75), MPRB-2 (AUC = 0.74), MMRB (AUC = 0.8).]
Figure 5. The ROC curves of different residual block superposition experiments.

It can be seen that a more structured MRB can achieve better results, especially for sensitivity improvement, which validates the effectiveness of the MRB model.

The Comparison With Existing Algorithms
We compare the proposed method with current ASD classification algorithms to test its performance. The compared algorithms are as follows:
1) An ASD classification algorithm based on functional connection networks and recursive-cluster elimination SVMs (RCE-SVMs), put forth by Chaitra et al. [26].
2) A hybrid ASD classification algorithm, abbreviated as HFR, that merges various functional connectivity matrix creation techniques, brain segmentation definitions, and feature-extraction techniques, proposed by Graña and Silva [27].
3) A CNN and multilayer perceptron (CNN-MLP)-based ASD classification system [28].
4) A deep multimodal ASD classification system based on joint representation learning, namely, DiagNet, proposed by Eslami et al. [29].
5) A 4D CNN-based ASD classification algorithm proposed by Guo et al. [30].
6) An ASD classification system based on 4D CNNs, namely, UM_1, proposed by Guo et al. [30].
7) An ASD classification algorithm based on USM sites and 4D CNNs, offered by Guo et al. [30].
8) A CNN and gated recurrent unit-based ASD classification algorithm reported by Jiang et al. [31].
9) A GCN used by Parisot et al. [32] to train ASD detection models in a semisupervised learning setting.

The results of the comparison algorithms are taken from the test results provided by the authors in the corresponding references. The test dataset contains data from every site, providing for the calculation of the average accuracy. The test set results of the proposed algorithm and the comparison algorithms are listed in Table 5.

As shown in Table 5, the proposed algorithm can achieve 74.67% accuracy in the experiments with 871 subjects. It improves by 7.37% compared to the RCE-SVM,


5.23% compared to the UM_1, and 3–5% compared to the HFR, CNN-MLP, DiagNet, 4D CNN, and USM. In addition, the proposed algorithm obtains 71.9% specificity and 76.91% sensitivity. The proposed algorithm uses large samples for its experiments, so the results are more general. The subsequent single-site experiments also verify that the algorithm in this article can obtain better classification results on the New York University (NYU) site.

The classification accuracy, sensitivity, and specificity of the proposed algorithm at 17 sites are computed in this study to further explore the classification performance of the model at each site, as shown in Table 6. Table 6 makes it more obvious that the variability between sites has a significant impact on the final results. Although the Carnegie Mellon University (CMU), SBL, and UM sites have less than 70% classification accuracy, the Kennedy Krieger Institute (KKI), Leuven, MaxMun, and Trinity sites have more than 80% classification accuracy. The varying scanning apparatuses, subject counts, and time dimensions at each site also contribute to the variation in the expanded data. The noise introduced by this variation makes it more difficult to extract features from the fMRI data to classify illness states. In addition, the confusion matrices of the 17 sites are given in Figure 6, which shows the proportions of ASD and TC samples that are correctly and incorrectly identified. Figure 6 shows that the percentages of correctly classified ASD and TC samples are both high at the KKI, MaxMun, and Trinity sites, while the ASD and TC recognition rates differ more at the CMU, SBL, San Diego State University, and UM sites.

Conclusion and Future Work
In this study, the P4D-ResNet deep learning model was proposed for the simultaneous extraction of spatiotemporal information. Instead of using 4D spatiotemporal convolution, we employed separate spatial and temporal convolutions and built a mixed residual model to extract richer spatiotemporal feature information. This study conducted an enhancement operation on fMRI data, taking into account the constraints of the current data volume and

Table 5. The performance of the P4D-ResNet model compared with other algorithms.

| Classification algorithm | Accuracy (%) |
|---|---|
| RCE-SVM | 67.3 |
| HFR | 71.1 |
| CNN-MLP | 70.22 |
| DiagNet | 70.3 |
| 4D CNN | 70.49 |
| UM_1 | 69.44 |
| USM | 69.7 |
| CNNG | 72.46 |
| GCN | 69.5 |
| P4D-ResNet | 74.67 |
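The accuracy gains quoted in the text can be rechecked directly from Table 5. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative choices, not names from the article, and the gains are plain differences in percentage points.

```python
# Accuracy values (%) copied from Table 5.
TABLE5_ACCURACY = {
    "RCE-SVM": 67.3,
    "HFR": 71.1,
    "CNN-MLP": 70.22,
    "DiagNet": 70.3,
    "4D CNN": 70.49,
    "UM_1": 69.44,
    "USM": 69.7,
    "CNNG": 72.46,
    "GCN": 69.5,
    "P4D-ResNet": 74.67,
}


def gains_over_baselines(table, proposed="P4D-ResNet"):
    """Return {baseline: proposed accuracy minus baseline accuracy} in percentage points."""
    reference = table[proposed]
    return {
        name: round(reference - accuracy, 2)
        for name, accuracy in table.items()
        if name != proposed
    }


if __name__ == "__main__":
    gains = gains_over_baselines(TABLE5_ACCURACY)
    # Print the baselines from the smallest gain to the largest.
    for name, gain in sorted(gains.items(), key=lambda item: item[1]):
        print(f"{name:8s} +{gain:.2f} points")
```

Under these table values, the gain over the RCE-SVM is 7.37 points and the gain over UM_1 is 5.23 points, and the gains over HFR, CNN-MLP, DiagNet, 4D CNN, and USM all fall between 3 and 5 points.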


the sample size necessary for deep learning models. To evaluate the performance of the model, we conducted ablation experiments on the proposed algorithm. Additionally, by comparing the method in this article with current ASD classification algorithms, the proposed algorithm’s efficacy was confirmed. In addition, we calculated the ASD classification accuracy, sensitivity, and specificity at 17 sites and assessed the effect of site variability on the results.

However, several issues remain to be considered in the future. First, we used only the functional imaging modality, without considering the structural imaging modalities related to brain states. In the future, we will integrate both functional and structural modalities to train our model for ASD identification. Second, we treated ASD diagnosis as a binary classification problem. However, ASD is divided into eight categories in the latest edition of ICD-11, published by the

Table 6. The classification effect of the algorithm in this article at 17 sites.

| Serial number | Site | Accuracy (%) | Specificity (%) | Sensitivity (%) |
|---|---|---|---|---|
| 1 | Caltech | 70.45 | 70 | 70.83 |
| 2 | CMU | 69.7 | 84.62 | 60 |
| 3 | KKI | 86.36 | 94.74 | 80 |
| 4 | Leuven | 80.36 | 77.5 | 87.5 |
| 5 | MaxMun | 83.33 | 83.33 | 83.33 |
| 6 | NYU | 72.59 | 75 | 70.67 |
| 7 | OHSU | 71.42 | 75 | 66.67 |
| 8 | Olin | 76.67 | 83.33 | 72.22 |
| 9 | Pitt | 78.72 | 82.35 | 76.67 |
| 10 | SBL | 69.44 | 73.33 | 50 |
| 11 | SDSU | 80 | 60 | 87.14 |
| 12 | Stanford | 72.46 | 70 | 73.47 |
| 13 | Trinity | 85.42 | 87.5 | 84.38 |
| 14 | UCLA | 70 | 77.78 | 63.63 |
| 15 | UM | 69.53 | 81.82 | 62.78 |
| 16 | USM | 75.89 | 75.71 | 76.19 |
| 17 | Yale | 80.95 | 76.19 | 85.71 |

Caltech: California Institute of Technology; CMU: Carnegie Mellon University; KKI: Kennedy Krieger Institute; Leuven: University of Leuven; MaxMun: University of Munich; NYU: New York University Langone Medical Center; OHSU: Oregon Health and Science University; Olin: Olin Institute of Living at Hartford Hospital; Pitt: University of Pittsburgh School of Medicine; SBL: Social Brain Lab; SDSU: San Diego State University; Stanford: Stanford University; Trinity: Trinity Centre for Health Sciences; UCLA: University of California, Los Angeles; UM: University of Michigan; USM: University of Utah School of Medicine; Yale: Yale Child Study Center.
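The site-level observations drawn from Table 6 can be reproduced with a few lines of code. The sketch below groups sites by the 70% and 80% accuracy thresholds mentioned in the text and computes an unweighted mean over sites. The names are illustrative, and because per-site subject counts are not listed in this section, the unweighted mean is only an assumption-laden summary and need not equal the pooled 74.67% accuracy.

```python
from statistics import mean

# Accuracy values (%) copied from Table 6.
SITE_ACCURACY = {
    "Caltech": 70.45, "CMU": 69.7, "KKI": 86.36, "Leuven": 80.36,
    "MaxMun": 83.33, "NYU": 72.59, "OHSU": 71.42, "Olin": 76.67,
    "Pitt": 78.72, "SBL": 69.44, "SDSU": 80.0, "Stanford": 72.46,
    "Trinity": 85.42, "UCLA": 70.0, "UM": 69.53, "USM": 75.89,
    "Yale": 80.95,
}


def split_by_threshold(accuracy_by_site, low=70.0, high=80.0):
    """Return (sites strictly below `low`, sites strictly above `high`), each sorted by name."""
    below = sorted(site for site, acc in accuracy_by_site.items() if acc < low)
    above = sorted(site for site, acc in accuracy_by_site.items() if acc > high)
    return below, above


def unweighted_mean_accuracy(accuracy_by_site):
    """Mean over sites, giving every site the same weight regardless of its sample size."""
    return mean(accuracy_by_site.values())


if __name__ == "__main__":
    below, above = split_by_threshold(SITE_ACCURACY)
    print("below 70%:", below)
    print("above 80%:", above)
    print(f"unweighted mean: {unweighted_mean_accuracy(SITE_ACCURACY):.2f}%")
```

With the Table 6 values, the sites below 70% are CMU, SBL, and UM, and the sites above 80% are KKI, Leuven, MaxMun, Trinity, and Yale.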

[Figure 6: 17 confusion-matrix panels, (a) through (q), one per site; rows and columns are labeled ASD and TC, and the entries are percentages.]
Figure 6. The confusion matrices of 17 sites. (a) Caltech. (b) CMU. (c) KKI. (d) Leuven. (e) MaxMun. (f) NYU. (g) OHSU. (h) Olin. (i) Pitt. (j) SBL. (k) SDSU. (l) Stanford. (m) Trinity. (n) UCLA. (o) UM. (p) USM. (q) Yale.
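To make the link between a confusion matrix and the reported indexes concrete, the following sketch derives accuracy, sensitivity, and specificity from raw counts and also produces the row-normalized percentages that a panel such as those in Figure 6 displays. The counts are hypothetical (per-class sample counts are not given in this section), and which class is treated as positive is passed in explicitly as an assumption.

```python
def metrics(counts, positive):
    """Compute accuracy, sensitivity, and specificity (all in %) for two classes.

    counts[true_label][predicted_label] is a number of samples.
    `positive` names the class treated as positive (an assumption of the caller).
    """
    negative = next(label for label in counts if label != positive)
    tp = counts[positive][positive]
    fn = counts[positive][negative]
    tn = counts[negative][negative]
    fp = counts[negative][positive]
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


def row_percentages(counts):
    """Normalize each true-label row so that it sums to 100%."""
    return {
        true_label: {
            predicted: 100.0 * n / sum(row.values())
            for predicted, n in row.items()
        }
        for true_label, row in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example = {
        "ASD": {"ASD": 14, "TC": 6},
        "TC": {"ASD": 7, "TC": 17},
    }
    print(row_percentages(example))
    print(metrics(example, positive="TC"))
    print(metrics(example, positive="ASD"))
```

Swapping the positive class exchanges sensitivity and specificity while leaving accuracy unchanged, so the convention should be fixed before comparing values across tables and figures.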

World Health Organization. Therefore, we will seek to build a multiclass classifier. In addition, the deep learning model is like a black box, and it is difficult to achieve physiological interpretation. We will continue to explore interpretation methods suitable for the model.

Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under grant 62172139, the Natural Science Foundation of Hebei Province under grant F2022201055, and the Science Research Project of Hebei Province under grant BJ2020030. The project was funded by the China Postdoctoral under grant 2022M713361, the Natural Science Interdisciplinary Research Program of Hebei University under grant DXK202102, the Research Project of Hebei University Intelligent Financial Application Technology R&D Center under grant XGZJ2022022, the Open Project Program of the National Laboratory of Pattern Recognition under grant 202200007, and the Open Foundation of Guangdong Key Laboratory of Digital Signal and Image Processing Technology (2020GDDSIPL-04). This work was also supported by the High-Performance Computing Center of Hebei University. Jingwen Yan is the corresponding author.

About the Authors
Shuaiqi Liu earned his Ph.D. degree from the Institute of Information Science, Beijing Jiaotong University, in 2014. He is a professor at the College of Electronic and Information Engineering, Hebei University, Baoding 071002, China. His research interests include image processing and signal processing.

Siqi Wang earned her B.S. degree from the College of Electronic and Information Engineering, Hebei University, Baoding, China, in 2021. She is currently pursuing her M.S. degree at the College of Electronic and Information Engineering, Hebei University, 071002 Baoding, China. Her research interests include computer vision and image processing.

Hong Zhang earned her B.S. degree from the College of Information Engineering, Yanshan University, Qinhuangdao, China, in 2019. She is currently pursuing her M.S. degree at the College of Electronic and Information Engineering, Hebei University, 071002 Baoding, China. Her research interests include computer vision and image processing.

Shui-Hua Wang earned her Ph.D. degree in electrical engineering from Nanjing University in 2017. She was a professor in the School of Computer Science and Technology, Henan Polytechnic University, 454000 Jiaozuo, China. She also served as a research associate at Loughborough University from 2018 to 2019. Her research interests include machine learning and biomedical image processing.

Jie Zhao earned his Ph.D. degree in optics from the State Key Laboratory of Applied Optics, Changchun Institute of Fine Mechanics and Optics, Academia Sinica, Changchun, China, in 1997. He is a professor in the Department of Electronic Engineering, University of Shantou, 515063 Shantou, China. His current research interests include SAR image processing, hyper-wavelet transforms, and compressed sensing.

Jingwen Yan is with the School of Engineering, Shantou University, 515063 Shantou, China.

References

[1] C. M. Michel, M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. Grave de Peralta, “EEG source imaging,” Clin. Neurophysiol., vol. 115, no. 10, pp. 2195–2222, Oct. 2004, doi: 10.1016/j.clinph.2004.06.001.

[2] S. Liu et al., “3DCANN: A spatio-temporal convolution attention neural network for EEG emotion recognition,” IEEE J. Biomed. Health Inform., vol. 26, no. 11, pp. 5321–5331, Nov. 2022, doi: 10.1109/JBHI.2021.3083525.

[3] E. Moradi, A. Pepe, C. Gaser, H. Huttunen, and J. Tohka, “Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects,” NeuroImage, vol. 104, pp. 398–412, Jan. 2015, doi: 10.1016/j.neuroimage.2014.10.002.

[4] S. Liu, C. Zhao, Y. An, P. Li, J. Zhao, and Y. Zhang, “Diffusion tensor imaging denoising based on Riemannian geometric framework and sparse Bayesian learning,” J. Med. Imag. Health Inform., vol. 9, no. 9, pp. 1993–2003, Dec. 2019, doi: 10.1166/jmihi.2019.2832.

[5] S. Liu, L. Zhao, J. Zhao, B. Li, and S.-H. Wang, “Attention deficit/hyperactivity disorder classification based on deep spatio-temporal features of functional magnetic resonance imaging,” Biomed. Signal Process. Control, vol. 71, Jan. 2022, Art. no. 103239, doi: 10.1016/j.bspc.2021.103239.

[6] A. Kastrup, G. Kruger, G. H. Glover, and M. E. Moseley, “Assessment of cerebral oxidative metabolism with breath holding and fMRI,” Magn. Reson. Med., vol. 42, no. 3, pp. 608–611, Sep. 1999, doi: 10.1002/(SICI)1522-2594(199909)42:33.0.CO;2-I.

[7] E. Kirino, S. Tanaka, Y. Nagai, A. Hattori, and S. Aoki, “S1-3 Functional connectivity in autism spectrum disorder evaluated using rs-fMRI and DKI,” Clin. Neurophysiol., vol. 131, no. 10, pp. e244–e245, Oct. 2020, doi: 10.1016/j.clinph.2020.04.062.

[8] J. F. Agastinose Ronicko, J. Thomas, P. Thangavel, V. Koneru, G. Langs, and J. Dauwels, “Diagnostic classification of autism using resting-state fMRI data improves with full correlation functional brain connectivity compared to partial correlation,” J. Neurosci. Methods, vol. 345, Nov. 2020, Art. no. 108884, doi: 10.1016/j.jneumeth.2020.108884.

[9] M. Wang, J. Huang, M. Liu, and D. Zhang, “Modeling dynamic characteristics of brain functional connectivity networks using resting-state functional MRI,” Med. Image Anal., vol. 71, Jul. 2021, Art. no. 102063, doi: 10.1016/j.media.2021.102063.

[10] T. Iidaka, “Resting state functional magnetic resonance imaging and neural network classified autism and control,” Cortex, vol. 63, pp. 55–67, Feb. 2015, doi: 10.1016/j.cortex.2014.08.011.

[11] X. Bi, Y. Liu, Q. Jiang, Q. Shu, Q. Sun, and J. Dai, “The diagnosis of autism spectrum disorder based on the random neural network cluster,” Frontiers Hum. Neurosci., vol. 12, Jun. 2018, Art. no. 257, doi: 10.3389/fnhum.2018.00257.

[12] S. Mostafa, L. Tang, and F. X. Wu, “Diagnosis of autism spectrum disorder based on eigenvalues of brain networks,” IEEE Access, vol. 7, pp. 128,474–128,486, Sep. 2019, doi: 10.1109/access.2019.2940198.

[13] J. Liu, Y. Sheng, W. Lan, R. Guo, Y. Wang, and J. Wang, “Improved ASD classification using dynamic functional connectivity and multi-task feature selection,” Pattern Recognit. Lett., vol. 138, pp. 82–87, Oct. 2020, doi: 10.1016/j.patrec.2020.07.005.

[14] F. Zhao, Z. Chen, I. Rekik, S.-W. Lee, and D. Shen, “Diagnosis of autism spectrum disorder using central-moment features from low- and high-order dynamic resting-state functional connectivity networks,” Frontiers Neurosci., vol. 14, Apr. 2020, Art. no. 258, doi: 10.3389/fnins.2020.00258.

[15] Y. Wu et al., “JCS: An explainable COVID-19 diagnosis system by joint classification and segmentation,” IEEE Trans. Image Process., vol. 30, pp. 3113–3126, Feb. 2021, doi: 10.1109/TIP.2021.3058783.

[16] K. Fu, D. Fan, G. Ji, Q. Zhao, J. Shen, and C. Zhu, “Siamese network for RGB-D salient object detection and beyond,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5541–5559, Sep. 2022, doi: 10.1109/TPAMI.2021.3073689.

[17] Q. Hu, S. Hu, and S. Liu, “BANet: A balance attention network for anchor-free ship detection in SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, Jan. 2022, doi: 10.1109/TGRS.2022.3146027.

[18] Z. Xiao, C. Wang, N. Jia, and J. Wu, “SAE-based classification of school-aged children with autism spectrum disorders using functional magnetic resonance imaging,” Multimedia Tools Appl., vol. 77, no. 17, pp. 22,809–22,820, Sep. 2018, doi: 10.1007/s11042-018-5625-1.

[19] N. Jia, J. Tan, Z. Xiao, Z. Qi, and J. Wu, “Classification of autism spectrum disorder based on brain functional connectivity and SAE,” J. Nanchang Univ. (Natural Sci.), vol. 42, no. 4, pp. 399–403, Aug. 2018, doi: 10.13764/j.cnki.ncdl.2018.04.017.

[20] A. Rathore, S. Palande, J. S. Anderson, B. A. Zielinski, P. T. Fletcher, and B. Wang, “Autism classification using topological features and deep learning: A cautionary tale,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2019, pp. 736–744, doi: 10.1007/978-3-030-32248-9_82.

[21] J. Zhuang, N. C. Dvornek, X. Li, P. Ventola, and J. S. Duncan, “Invertible network for classification and biomarker selection for ASD,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2019, pp. 700–708, doi: 10.1007/978-3-030-32248-9_78.

[22] M. Tang, P. Kumar, H. Chen, and A. Shrivastava, “Deep multimodal learning for the diagnosis of autism spectrum disorder,” J. Imag., vol. 6, no. 6, p. 47, Jun. 2020.

[23] L. Shao, C. Fu, Y. You, and D. Fu, “Classification of ASD based on fMRI data with deep learning,” Cogn. Neurodynamics, vol. 15, no. 6, pp. 961–974, Dec. 2021, doi: 10.1007/s11571-021-09683-0.

[24] W. Yin, S. Mostafa, and F. Wu, “Diagnosis of autism spectrum disorder based on functional brain networks with deep learning,” J. Comput. Biol., vol. 28, no. 2, pp. 146–165, Feb. 2021, doi: 10.1089/cmb.2020.0252.

[25] B. Lullo. “Autism Brain Imaging Data Exchange I ABIDE I.” ABIDE. Accessed: Jun. 24, 2016. [Online]. Available: https://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html

[26] N. Chaitra, P. A. Vijaya, and G. Deshpande, “Diagnostic prediction of autism spectrum disorder using complex network measures in a machine learning framework,” Biomed. Signal Process. Control, vol. 62, Sep. 2020, Art. no. 102099, doi: 10.1016/j.bspc.2020.102099.

[27] M. Graña and M. Silva, “Impact of machine learning pipeline choices in autism prediction from functional connectivity data,” Int. J. Neural Syst., vol. 31, no. 4, Apr. 2021, Art. no. 2150009, doi: 10.1142/s012906572150009x.

[28] Z. Sherkatghanad, M. Akhondzadeh, S. Salari, M. Zomorodi, and V. Salari, “Automated detection of autism spectrum disorder using a convolutional neural network,” Frontiers Neurosci., vol. 13, Jan. 2020, Art. no. 1325, doi: 10.3389/fnins.2019.01325.

[29] T. Eslami, V. Mirjalili, A. Fong, A. R. Laird, and F. Saeed, “ASD-DiagNet: A hybrid learning approach for detection of autism spectrum disorder using fMRI data,” Frontiers Neuroinformatics, vol. 13, Nov. 2019, Art. no. 70, doi: 10.3389/fninf.2019.00070.

[30] L. Guo et al., “Classification of the functional magnetic resonance image of autism based on 4D convolutional neural network,” CAAI Trans. Intell. Syst., vol. 16, no. 6, pp. 1021–1029, Nov. 2021, doi: 10.11992/tis.202009022.

[31] W. Jiang et al., “CNNG: A convolutional neural networks with gated recurrent units for autism spectrum disorder classification,” Frontiers Aging Neurosci., vol. 14, Jul. 2022, Art. no. 948704, doi: 10.3389/fnagi.2022.948704.

[32] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, and D. Rueckert, “Spectral graph convolutions for population-based disease prediction,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2017, pp. 177–185, doi: 10.1007/978-3-319-66179-7_21.


Tooth.AI
Intelligent Dental Disease Diagnosis and Treatment Support Using Semantic Network
by Hossam A. Gabbar, Abderrazak Chahid, Md. Jamiul Alam Khan, Oluwabukola Grace Adegboro, and Matthew Immanuel Samson

Digital Object Identifier 10.1109/MSMC.2023.3245814
Date of current version: 17 July 2023
2333-942X/23©2023IEEE

The emerging fourth industrial revolution (Industry 4.0) is leading the healthcare system toward more digitalization and smart management. For instance, recent digital healthcare solutions can help dentists/practitioners save time by managing their schedules and managing diagnosis and treatment. The proposed solution is a diagnostic module that can be integrated into existing dental software. This module is based on artificial intelligence (AI) that allows the diagnosis of X-ray images/volumes and helps in the early detection and diagnosis of oral health diseases. The solution presents a smart and automated assistive platform to aid dental practitioners in identifying underlying tooth diseases and to assist doctors with treatment suggestions.

Introduction
According to the Global Burden of Disease 2010, dental and oral diseases affect people worldwide: around 35% suffer from untreated decay (caries) of permanent teeth, 11% have severe periodontal (gum) disease, and 2% have tooth loss. Oral health diseases arise from different factors, such as a lack of resources, oral hygiene habits, etc. Such diseases may cause the loss of all natural teeth, which can lead to changes in eating patterns, nutrient deficiency, and involuntary weight loss, as well as speech difficulty (if left uncorrected). A report on the state of oral health in Canada indicates that the government’s main challenge is providing the required oral health care to the most vulnerable segments of its population (e.g., low-income groups, indigenous peoples, people with special needs, children, and new immigrants with refugee status) [1]. Figure 1 shows the age distribution from the Canadian Community Health Survey. The time lost to dental problems and treatment is estimated at over 40 million hours annually, and the associated economic loss was estimated at US$442 billion worldwide in 2010 (see Table 1 for more details). It is crucial to design preventive healthcare solutions to help improve the oral health system and reduce economic loss.

[Figure 1: bar chart; vertical axis: percentage (%); horizontal axis: age group.]
Figure 1. Percentage of Canadians aged 12 years and over who consulted with a dentist or orthodontist in 2012.
Source: Statistics Canada, Canadian Community Health Survey (CCHS), 2012.

In addition, tooth-related diseases might result from skull/mouth geometry abnormalities. In some cases, surgical interventions are needed to correct the deformation and restore healthy teeth. Such skeletal abnormalities are usually diagnosed using cephalometric analysis, which checks the normal position of some key locations, called landmarks. Therefore, it is crucial to design preventive healthcare solutions that integrate skeletal and dental diagnosis to help improve the oral health system and reduce treatment expenses. Many studies demonstrate that preventative healthcare solutions are cost-effective, with substantial economic benefits in terms of reduced treatment costs and decreased productivity losses in the labor market.

Most of the existing dental and skeletal software provides independent diagnosis and/or treatment solutions with data management and appointment schedulers. The systems available on the market can be divided into two main categories. First, the hardware-based solutions provide the dental and skeletal scanners used for data acquisition and for capturing the medical recordings used in diagnosis. The scanners use different imaging technologies, such as X-ray computed tomography (CT) and intraoral cameras using near-infrared imaging (NiRi). These solutions mainly provide doctors with raw and/or enhanced medical images used for manual diagnosis, for example, iTero [2], Carestream Dental [3], and GO [4]. They allow fast scan times, with additional postprocessing phases to reduce the risk of motion blur, and limit exposure time with minimal radiation. Some of these solutions allow advanced postprocessing, such as automatic cephalometric tracing, superimposition, image reporting, and surgical simulation using a visual treatment objective.

Table 1. Potential productivity losses due to dental problems and treatment at the individual and societal level [1].

| Occupation classification | Mean hours lost | Potential individual losses ($) | Potential societal losses ($) |
|---|---|---|---|
| Management | 2.9 | 108.16 | 104,287,872 |
| Business, finance, and administrative | 3.8 | 85.15 | 239,109,715 |
| Natural and applied sciences and related occupations | 2.9 | 95.17 | 103,278,484 |
| Health occupations | 3.6 | 97.44 | 97,790,784 |
| Occupations in social science, education, government service, and religion | 3.7 | 112.51 | 165,333,445 |
| Occupations in art, culture, recreation, and sport | 3.9 | 91.67 | 33,212,041 |
| Sales and service occupations | 3.1 | 58.12 | 220,857,664 |
| Trades, transport, and equipment operators and related occupations | 2.8 | 64.31 | 131,064,967 |
| Occupations unique to primary industry | 3.3 | 76.39 | 16,439,128 |
| Occupations unique to processing, manufacturing, and utilities | 2.2 | 42.96 | 32,232,888 |


These software solutions include but are not limited to: Cephx [5], Planmeca [6], FACAD [7], OrisCeph Rx [8], AudaxCeph [9], Carestream Dental [3], and DolphinCeph Tracing [10]. Most of the advanced analysis is performed using deep learning-based models (classification, semantic segmentation, landmark detection, etc.) [11] or knowledge-based techniques (generative programming, pattern detection, etc.) [12]. However, the deep learning-based features provided by this software still depend on the initial data used for training. This limits its flexibility for variant patient profiles. Thus, it becomes challenging to ensure the generalizability of the trained models. Moreover, some factors, such as age, gender, and existing medical conditions, are not considered in model training.

In this article, we propose a smart and automated solution that combines dental and skeletal diagnosis based on deep learning techniques. In addition, it assists doctors in treatment suggestions, taking into consideration the patient profile parametrized by their medical records and previous disease treatment history. The outline of the rest of this article is as follows. The “Proposed Solution” section describes the proposed Tooth.AI framework with the different diagnosis and treatment modules. The “Results and Discussion” section presents the obtained results of a case study. The “Knowledge Translation” section explores the knowledge transfer plan to take our solution to the next stage of public usage. The “Novelty and Anticipated Impact” section presents a

[Figure 2: block diagram. Blocks: input medical data (CT image/3D volume; patient information: age, sex, gender, health condition, previous treatment); dental diagnosis (tooth structure segmentation, caries detection and characterization, prediction of future dental complications, a justified diagnostic report); treatment database (previous successful disease treatments, patients with similar profile/disease); treatment suggestion (suggest the most relevant successful treatment protocol); verified diagnosis/treatment; annotated data set (DSN); incremental learning; expert feedback; update database.]
Figure 2. The general framework of the proposed solution. DSN, dental semantic network.
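One block of the framework in Figure 2 suggests the most relevant successful treatment protocol from a database of previous successful treatments and patients with similar profiles. The sketch below illustrates that idea only; it is not the article's implementation. The record fields, the similarity measure (an equal-weight mix of an age term and a Jaccard overlap of health conditions), and the protocol names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentRecord:
    """One past case in a hypothetical treatment database."""
    age: int
    conditions: frozenset  # existing health conditions of the patient
    disease: str
    treatment: str
    successful: bool


def profile_similarity(age, conditions, record, age_scale=20.0):
    """Toy similarity in [0, 1]: mean of an age term and a Jaccard overlap term."""
    age_term = max(0.0, 1.0 - abs(age - record.age) / age_scale)
    union = conditions | record.conditions
    overlap = len(conditions & record.conditions) / len(union) if union else 1.0
    return 0.5 * age_term + 0.5 * overlap


def suggest_treatment(age, conditions, disease, database):
    """Return the treatment of the most similar successful record, or None if there is none."""
    candidates = [r for r in database if r.disease == disease and r.successful]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: profile_similarity(age, conditions, r))
    return best.treatment


if __name__ == "__main__":
    # Hypothetical records; protocol names are placeholders, not clinical advice.
    database = [
        TreatmentRecord(30, frozenset({"diabetes"}), "caries", "protocol_A", True),
        TreatmentRecord(65, frozenset(), "caries", "protocol_B", True),
        TreatmentRecord(31, frozenset({"diabetes"}), "caries", "protocol_C", False),
    ]
    print(suggest_treatment(32, frozenset({"diabetes"}), "caries", database))
```

Filtering on successful records before ranking mirrors the "previous successful disease treatments" label in the figure; a real system would need a clinically validated similarity measure and expert review of every suggestion.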


summary of our contribution and novelty with concluding remarks.

Proposed Solution
A massive amount of X-ray data and cumulative knowledge from dental radiologists and experts has built up over the last few decades. They have identified thousands of pathological changes and traces of previous dental treatment on X-rays worldwide. Our solution will offer an integrated computer vision and knowledge-based system to extract diagnostic information from input medical images/volumes collected from dental exams using CT scanners. The proposed research solution is to design an intelligent preventive system named Tooth.AI to detect and diagnose skeletal and dental diseases. It aims to provide a real-time inspection of the teeth and skull geometry, simulate the future development of the disease if no treatment is given, and suggest suitable treatments for the patient (Figure 2).

Dental Diagnosis
The integrated dental diagnosis component of this solution will support the detection and diagnosis of vertical root fractures, assessment of root morphologies, determining the working length of the tooth, locating the apical foramen, retreatment predictions, and prediction of periapical pathologies. The medical images will be collected from open-access resources/data sets and our collaborative dentists in Toronto and internationally. The developed deep learning segmentation techniques will identify the tooth’s structure (see Figure 3). In addition, they will classify its health condition (healthy tooth or tooth with caries). The extracted knowledge will be accumulated with deterministic and probabilistic parameters in the dental semantic network (DSN), which will be dynamically updated using expert feedback. The collaboration with experts will allow our team to annotate the medical data, evaluate the performance of the developed algorithm on different patients, and validate the deployment of the proposed solution in clinical case studies. The case studies will include different sex/gender from different communities and

[Figure 3: labeled tooth diagram. Regions: crown, neck, root. Parts: enamel, dentine, pulp, gum line, alveolar bone.]
Figure 3. Illustration of the tooth structure (source [13]).

regions across Canada to have a good representation of the dental healthcare data with varying health conditions. All of these factors will be investigated in relation to the dental treatment process. The proposed computer vision-based approach will process 2D images/scans, which will be mapped into 3D digital form. Tooth.AI will mainly analyze cone-beam CT images/volumes because they provide a better understanding of the mouth/teeth morphology. In addition, we use 2D images collected from the nonradiative intraoral camera using NiRi technology.

Skeletal Diagnosis
The second integrated component of Tooth.AI will support dentists, orthodontists, and oral surgeons in cephalometric analysis and help them to understand the dental and skeletal relationships in the human skull. They will be able to plan the treatment correctly and accurately in less time. Tooth.AI will reduce the manual examination of X-ray images by automatically identifying landmarks with preprocessed knowledge in the DSN. It will visualize the integrated view with landmarks and possible diagnoses or issues based on stored expertise, as shown in Figure 2. Tooth.AI will provide details about patient diagnoses of dental and skeletal abnormalities and propose a possible treatment plan. Tooth.AI will offer an automated process with a human in the loop, working fully automatically or with human intervention based on user preferences and configurations.

The proposed techniques within Tooth.AI for cephalometric landmark detection are based on state-of-the-art methods categorized into two main categories, as shown in Figure 4. The latest techniques of cephalometric landmark detection and dental disease detection using the latest deep learning algorithms produce results comparable to human examiners [14]. For instance, very encouraging results were achieved in landmark detection, with point-to-point errors of less than 2 mm relative to ground truth positions [15], [16], [17], [18], [19], [20]. In addition, there exist other types of methods used to search for landmarks, such as the shape model [21], resampling in conjunction with the convolutional neural network (CNN) algorithm [22], CNNs for regression analysis of cephalometric coordinates [23], and various others. However, we need to go beyond landmark detection and suggest a suitable treatment based on the previous successful treatments of similar patient profiles.

We propose developing a fully integrated toolbox for the automatic analysis of X-ray images, detection of abnormalities or diseases, and help in treatment planning. The system would have a proper data management system to input patient data. Then the landmarks would be identified with a trained deep learning model. In detecting landmarks, we propose investigating the effects of factors such as age, gender, and noisy data. The proposed system would analyze the landmark data (providing the angle and distance computations necessary for the diagnosis). By combining the computed results and the previous expert’s treatment, the system would suggest the presence of abnormalities or diseases and suggest treatment planning. The collaboration with dentists will enable the team to annotate the diagnosis image and link it to diseases. It will also provide detailed inputs to label images for skeletal analysis to support the planning of surgical modifications. The main toolbox will be directed to clinical use using X-ray CT images and 3D volumes. The proposed algorithms will be further validated on nonradiative data using our laboratory setup based on an intraoral camera [24]. The workflow of the proposed solution is shown in Figure 2.
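The landmark analysis described above rests on two simple geometric computations: the point-to-point error between a predicted and a ground-truth landmark (judged against the 2 mm figure quoted earlier) and the angles and distances between landmarks. The sketch below shows these computations for 2D pixel coordinates; the function names, the pixel spacing parameter, and the sample coordinates are assumptions made for illustration and do not come from the article.

```python
import math


def radial_error_mm(pred, truth, mm_per_pixel):
    """Point-to-point (Euclidean) error between two 2D landmarks, in millimeters."""
    return math.hypot(pred[0] - truth[0], pred[1] - truth[1]) * mm_per_pixel


def success_rate(preds, truths, mm_per_pixel, threshold_mm=2.0):
    """Fraction of landmarks whose error is strictly below `threshold_mm`."""
    errors = [radial_error_mm(p, t, mm_per_pixel) for p, t in zip(preds, truths)]
    return sum(e < threshold_mm for e in errors) / len(errors)


def distance_mm(a, b, mm_per_pixel):
    """Distance between two landmarks, in millimeters."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) * mm_per_pixel


def angle_deg(a, vertex, b):
    """Unsigned angle a-vertex-b in degrees, in the range [0, 180]."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


if __name__ == "__main__":
    # Hypothetical pixel coordinates and spacing, for illustration only.
    predicted = [(103.0, 204.0), (330.0, 440.0)]
    ground_truth = [(100.0, 200.0), (300.0, 400.0)]
    print(success_rate(predicted, ground_truth, mm_per_pixel=0.1))
    print(angle_deg((1.0, 0.0), (0.0, 0.0), (0.0, 1.0)))
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers for nearly parallel vectors, which matters when small angular differences carry diagnostic meaning.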

DSN Enabled Incremental Learning
Deep learning-based diagnosis has shown remarkable abilities to achieve high accuracy, even comparable to expert practitioners. However, this cannot be guaranteed if these models are trained on a small data set or on data sets with little variability that do not represent most samples. In addition, medical data have some additional constraints related to privacy and ethics restrictions. Therefore, it becomes highly challenging to access the needed labeled data set with enough size and variability. Furthermore, data labeling presents a second challenge, as this type of labeling (dental and skeletal diagnosis) is subjective to each expert’s experience and daily practices. Therefore, it is vital to design a system that can use the doctor’s diagnosis and treatment and convert them into a standardized annotated data set. This will boost the deep learning model and help build a comprehensive knowledge base that can be transferred to other doctors and healthcare systems. The incremental learning framework presents a solution to this problem as follows:
◆ Gradually build an annotated data set from the daily practice of doctors.
◆ Centralize the knowledge base from different experts and build generalizable models.
◆ Enable transfer learning by using these pretrained models for another similar disease while preserving patient data privacy.

During the medical treatment journey of a specific disease, the doctors create a treatment file describing the diagnosis procedure and record the prescribed treatment and its efficiency evaluation during the follow-up sessions. In this article, the different data collected during this treatment journey are structured into a semantic network database including patient health condition, disease and treatment history, etc. Therefore, all patients’ treatment journeys are put together and grouped into different nodes: patients (denoted P), tooth diseases (denoted D-T), gum diseases (denoted D-G), and their corresponding treatments (T-T, T-G). These nodes are associated by their relationships; e.g., the patient (P1) is affected by the disease (D-T1), which is treated with the treatment (T-T2). The patient nodes are linked with a weighted edge

Figure 4. Categorization of landmark identification techniques (source [12]). [Diagram: landmark identification divides into knowledge-based techniques (edge and pattern detection; genetic programming; active shape and active appearance models) and AI-based techniques. AI-based techniques divide into machine learning (random forest; regression; support vector machines; decision tree; linear affine and linear principal component) and deep learning (convolution neural network; pulse coupled neural network; cellular neural network).]

Ju ly 2023

IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE

23

defining their similarity. This similarity is computed as the covariance of patients, retrieved from the semantic network edge, considering their health conditions and disease/treatment history. Figure 5 shows an illustration of converting regular data into semantic network-based data.

Figure 5. Illustration of the treatment suggestion process. [Diagram: patient nodes (P1: John, P2: Jamiul) are connected by "Affected By" edges to disease nodes (Gingivitis, Caries), which are connected by "Treated With" edges to treatment nodes (ChlorHexidine, Root Canal Treatment) marked as success or failure.]

Results and Discussion
In this section, a case study is presented to show an example of the results obtained using the proposed framework. It explores a scenario of deploying the proposed Tooth.AI system for teeth diagnosis and skull landmark detection and shows how this diagnosis report can be used to update the semantic network and suggest a suitable treatment.

Teeth Diagnosis
The panoramic dental data set used consists of 1,000 radiography images, with corresponding masks that localize the different teeth [24]. These data are first used to train a segmentation model. The segmented teeth are then cropped and used to generate a second data set for classification. The classification model is deployed to distinguish three tooth classes: healthy, unhealthy, and treated (with filling). The cascaded models help in diagnosing each tooth separately, as shown in Figure 6.
In training, the segmentation model achieves an intersection over union (IOU) of up to 0.79. Similarly, the classification model achieves an accuracy of 0.95. Figure 7 presents the training and validation performance of both models.

Figure 6. Example of teeth diagnosis: healthy teeth (green), unhealthy (red), treated (blue).

Figure 7. Teeth segmentation and classification performance: (a) teeth segmentation IOU; (b) teeth classification accuracy. [Plots: training and validation IOU versus epochs in (a); training and validation accuracy versus epochs in (b).]

Skeletal Landmark Detection
The cephalogram data set [25] consists of 400 lateral cephalogram images of 400 different subjects, aged between 7 and 76. Each image of the data set is annotated with 19 landmarks, as presented in Figure 8. For landmark detection, a deep learning model based on a CNN architecture is deployed. Figure 9 shows the predicted landmark points. The obtained results will be used to compute the different clinical measurements needed to characterize the skull shape and extract the anomalies.
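As an illustration of how clinical measurements could be derived from detected landmarks, the following minimal sketch computes a distance and an angle from 2D landmark coordinates. The function names and the pixel coordinates are hypothetical and chosen only for illustration; the angle at Nasion between Sella and Subspinale (commonly called SNA in cephalometric analysis) is used as an example and is not stated in the article as one of the toolbox's measurements.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def angle_deg(vertex, p1, p2):
    """Angle in degrees at `vertex` between the rays vertex->p1 and vertex->p2."""
    ax, ay = p1[0] - vertex[0], p1[1] - vertex[1]
    bx, by = p2[0] - vertex[0], p2[1] - vertex[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    # atan2 of |cross| and dot gives an unsigned angle in [0, 180].
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical landmark coordinates (pixels), not taken from any data set.
landmarks = {
    "Sella": (0.0, 0.0),
    "Nasion": (10.0, 0.0),
    "Subspinale": (10.0, -10.0),
}

# Example angle with its vertex at Nasion.
sna = angle_deg(landmarks["Nasion"], landmarks["Sella"], landmarks["Subspinale"])
sella_nasion = distance(landmarks["Sella"], landmarks["Nasion"])
```

Converting pixel distances to millimeters would additionally require the image resolution, which is not covered in this sketch.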

Landmark legend (Figure 8): L1 Sella; L2 Nasion; L3 Orbitale; L4 Porion; L5 Subspinale; L6 Supramentale; L7 Pogonion; L8 Menton; L9 Gnathion; L10 Gonion; L11 Lower Incisal Incision; L12 Upper Incisal Incision; L13 Upper Lip; L14 Lower Lip; L15 Subnasale; L16 Soft Tissue Pogonion; L17 Posterior Nasal Spine; L18 Anterior Nasal Spine; L19 Articulate.

Figure 8. Cephalogram annotation example showing the 19 landmarks (source [17]).

Treatment Suggestion Using the Semantic Network
The diagnostic reports generated by the dental and skeletal modules will be used to recognize the disease and suggest the appropriate treatment. In this work, a list of diseases and treatments drawn from the medical literature is used to build the initial database needed for treatment suggestions [26], [27], [28]. The suggestion of treatment for a specific patient has three levels. First, the system suggests the most recent successful treatment if the patient was previously treated for the same disease. Second, if the patient was not affected by the disease before, the system suggests the successful treatment of the most similar patient from the database. Third, if neither is available, the system suggests the most commonly used treatment for the diagnosed disease. The generated system diagnosis and the suggested treatment are then added to the semantic network, creating additional nodes if applicable. Figure 10 presents two examples of adding new nodes to the semantic network.

Knowledge Translation
The collaborating partner dentists from Canada and international clinics will provide sample images (with consent) and diagnosis and treatment data, which will support the research team in building training data and associated analysis. The interviews with expert dentists and dental data providers will offer expertise in the validation and analysis of images, diagnosis, and treatment details, which will be transferred to the research team.
We expect to obtain medical data from 20 patients each year. In addition, we will conduct around 28 interview sessions with dentists and experts to annotate the collected data and get their opinion about the algorithms, approach, and integrated solutions. Thanks to the interaction with experts and practitioners, the proposed toolbox is equipped with an interactive user interface. Thus, the experts can correct the wrong predictions of the AI models to boost their performance. Moreover, we propose handling the lack of an annotated data set by developing an incremental model training framework that keeps updating the annotated data from recent interactions with the expert. All of these interactions between the toolbox and the expert will be saved to the database and used to further train the deep learning models. We will communicate with the Canadian Dental Association to get more views and expertise on our solution and potential implementation guidelines. Our team will communicate with the Canadian Dental Regulatory Authorities Federation to gain experience in the application of automation in view of the regulatory framework.
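The three-level treatment suggestion described under "Treatment Suggestion Using the Semantic Network" can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tooth.AI implementation: the record layout, the `similarity` dictionary standing in for the weighted patient edges, the function name, and the sample data are all hypothetical.

```python
from collections import Counter


def suggest_treatment(patient, disease, records, similarity):
    """Return (treatment, level) following the three suggestion levels.

    records: chronological list of dicts with keys
             "patient", "disease", "treatment", "success" (assumed layout).
    similarity: dict mapping frozenset({a, b}) to a patient-patient weight
                (assumed stand-in for the weighted semantic network edge).
    """
    # Level 1: most recent successful treatment of this patient for this disease.
    for rec in reversed(records):
        if rec["patient"] == patient and rec["disease"] == disease and rec["success"]:
            return rec["treatment"], 1

    # Level 2: successful treatment of the most similar other patient.
    candidates = [
        rec for rec in records
        if rec["disease"] == disease and rec["success"] and rec["patient"] != patient
    ]
    if candidates:
        best = max(
            candidates,
            key=lambda rec: similarity.get(frozenset((patient, rec["patient"])), 0.0),
        )
        return best["treatment"], 2

    # Level 3: most commonly used treatment for the disease.
    used = Counter(rec["treatment"] for rec in records if rec["disease"] == disease)
    if used:
        return used.most_common(1)[0][0], 3
    return None, 0


# Hypothetical example data.
records = [
    {"patient": "P1", "disease": "Gingivitis", "treatment": "Chlorhexidine", "success": True},
    {"patient": "P2", "disease": "Caries", "treatment": "Root canal treatment", "success": True},
    {"patient": "P3", "disease": "Fractured tooth", "treatment": "Crown", "success": False},
]
similarity = {frozenset(("P1", "P2")): 0.6, frozenset(("P1", "P3")): 0.2}

level1 = suggest_treatment("P1", "Gingivitis", records, similarity)
level2 = suggest_treatment("P1", "Caries", records, similarity)
level3 = suggest_treatment("P1", "Fractured tooth", records, similarity)
```

In this sketch a missing similarity edge counts as weight 0, so level 2 still returns a successful treatment from another patient when one exists; a real system might instead require a minimum similarity before falling through to level 3.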

Figure 9. Example of skeletal landmark locations detection: predicted landmarks (green), ground-truth landmarks (red).

Figure 10. Example of the semantic networks after new patient–disease–treatment augmentation: (a) small DSN; (b) larger DSN. [Legend: disease nodes, patient nodes, treatment nodes, patient ID, zoom area.]

Novelty and Anticipated Impact
The proposed system includes different deep learning-based techniques for dental and skeletal diseases and treatments, which will enhance the accuracy and efficiency of dental treatments and reduce errors. The proposed novel incremental learning framework will allow for a gradual and improved understanding of dental and skeletal diseases and will transfer this knowledge to an AI-based model through active interaction between the toolbox and the expert. It will preserve the doctor’s experience in diagnosis and treatment and convert it into standardized annotated data sets that will be used to support young dentists with less experience in delivering improved dental treatments.

The proposed DSN and knowledge base would be useful for both dentists and the public to share and transfer expertise. The accumulation of expertise around dental diagnosis and treatment will preserve the expertise of doctors and will allow continuous expertise exchange and transfer between healthcare providers. The proposed solution will also support dental surgeries, which are expensive, and reduce errors and increase comfort and satisfaction based on improved precision and accuracy to meet patient expectations. It will open the door for digital and smart dental healthcare systems. The solution will enable plug-and-play interfaces to different X-ray and camera technologies for national and international deployments.

Acknowledgment Research reported in this publication has been supported by New Vision Systems Canada Inc. and Mitacs.

About the Authors
Hossam A. Gabbar ([email protected]) is with the Faculty of Energy Systems and Nuclear Science and the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L1G 0C5, Canada.
Abderrazak Chahid (abderrazak.chahid@ontariotechu.net) is with the Faculty of Energy Systems and Nuclear Science, Ontario Tech University, Oshawa, ON L1G 0C5, Canada.
Md. Jamiul-Alam Khan (mdjamiul.khan@ontariotechu.net) is with the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L1G 0C5, Canada.
Oluwabukola Grace-Adegboro (oluwabukola.[email protected]) is with the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L1G 0C5, Canada.
Matthew Immanuel Samson ([email protected]) is with New Visions Systems Canada Inc., Scarborough, ON M1S 3L1, Canada.

References
[1] H. Amasya, D. Yildirim, T. Aydogan, N. Kemaloglu, and K. Orhan, “Cervical vertebral maturation assessment on lateral cephalometric radiographs using artificial intelligence: Comparison of machine learning classifier models,” Dentomaxillofacial Radiol., vol. 49, no. 5, Mar. 2020, Art. no. 49, doi: 10.1259/dmfr.20190441.
[2] “iTero element 5D — iTero intraoral scanner.” iTero. Accessed: Mar. 12, 2022. [Online]. Available: https://global.itero.com/en/products/itero_element_5d
[3] “Cephalometric imaging systems.” Carestream Dental. Accessed: Mar. 12, 2022. [Online]. Available: https://www.carestreamdental.com/en-us/csd-products/extraoral-imaging/cephalometric-imaging/
[4] “GO extraoral imaging,” Newtom. Accessed: Mar. 12, 2022. [Online]. Available: https://www.newtom.it/en/medicale/prodotti/go/
[5] “Cephalometric anlaysis archives— CephX— AI driven dental services.” CephX. Accessed: Mar. 12, 2022. [Online]. Available: https://cephx.com/it/tag/cephalometric-anlaysis-it/
[6] “Cephalometric anlaysis archives— CephX— AI driven dental services.” CephX. Accessed: Mar. 12, 2022. [Online]. Available: https://cephx.com/it/tag/cephalometric-anlaysis-it/
[7] “Facad ortho tracing software.” facad.com. Accessed: Mar. 12, 2022. [Online]. Available: https://www.facad.com/wp/
[8] “Software for cephalometric analysis OrisCeph Rx CE.” OrisLine. Accessed: Mar. 12, 2022. [Online]. Available: https://www.orisline.com/en/software-for-cephalometric-analysis/
[9] “AudaxCeph software.” audaxceph.com. Accessed: Mar. 12, 2022. [Online]. Available: https://www.audaxceph.com/
[10] “Content library — Aquarium — Orthodontic imaging and practice management software — Patient education — 1(818)435-1368 — Dolphin imaging and management solutions — Product.” Dolphin Imaging. Accessed: Mar. 12, 2022. [Online]. Available: https://www.dolphinimaging.com/product/Aquarium?Subcategory_OS_Safe_Name=Content_Library
[11] F. Schwendicke, T. Golla, M. Dreher, and J. Krois, “Convolutional neural networks for dental image diagnostics: A scoping review,” J. Dentistry, vol. 91, Dec. 2019, Art. no. 103226, doi: 10.1016/j.jdent.2019.103226.
[12] M. Juneja et al., “A review on cephalometric landmark detection techniques,” Biomed. Signal Process. Control, vol. 66, Apr. 2021, Art. no. 102486, doi: 10.1016/j.bspc.2021.102486.
[13] K. Watson and C. Frank. “How to brush your teeth properly.” Healthline. Accessed: Mar. 12, 2022. [Online]. Available: https://www.healthline.com/health/dental-and-oral-health/how-to-brush-your-teeth
[14] H. W. Hwang, J. H. Moon, M. G. Kim, R. E. Donatelli, and S. J. Lee, “Evaluation of automated cephalometric analysis based on the latest deep learning method,” Angle Orthodontist, vol. 91, no. 3, pp. 329–335, May 2021, doi: 10.2319/021220-100.1.
[15] C. W. Wang et al., “Evaluation and comparison of anatomical landmark detection methods for cephalometric X-ray images: A grand challenge,” IEEE Trans. Med. Imag., vol. 34, no. 9, pp. 1890–1900, Sep. 2015, doi: 10.1109/TMI.2015.2412951.
[16] H. Kim, E. Shim, J. Park, Y. Y. J. Kim, U. Lee, and Y. Y. J. Kim, “Web-based fully automated cephalometric analysis by deep learning,” Comput. Methods Programs Biomed., vol. 194, Oct. 2020, Art. no. 105513, doi: 10.1016/j.cmpb.2020.105513.
[17] “Fully automatic cephalometric evaluation using random forest regression-voting,” Univ. of Manchester, Manchester, U.K., 2015. [Online]. Available: https://www.research.manchester.ac.uk/portal/en/publications/fully-automatic-cephalometric-evaluation-using-random-forest-regressionvoting(b42c658f-0a66-4d1e-99c7-9cb67fb282a0).html
[18] “Grand challenges in dental X-ray image analysis 2014.” Accessed: Mar. 12, 2022. [Online]. Available: https://www.be.ntust.edu.tw/p/404-1009-44930.php?Lang=zh-tw
[19] Y. Song, X. Qiao, Y. Iwamoto, and Y. W. Chen, “Automatic cephalometric landmark detection on X-ray images using a deep-learning method,” Appl. Sci. (Switzerland), vol. 10, no. 7, Apr. 2020, Art. no. 2547, doi: 10.3390/app10072547.
[20] J. Kim et al., “Accuracy of automated identification of lateral cephalometric landmarks using cascade convolutional neural networks on lateral cephalograms from nationwide multi-centres,” Orthodontics Craniofacial Res., vol. 24, no. S2, pp. 59–67, Dec. 2021, doi: 10.1111/ocr.12493.
[21] J. Montúfar, M. Romero, and R. J. Scougall-Vilchis, “Automatic 3-dimensional cephalometric landmarking based on active shape models in related projections,” Amer. J. Orthodontics Dentofacial Orthopedics, vol. 153, no. 3, pp. 449–458, Mar. 2018, doi: 10.1016/j.ajodo.2017.06.028.
[22] S. H. Kang, K. Jeon, H. J. Kim, J. K. Seo, and S. H. Lee, “Automatic three-dimensional cephalometric annotation system using three-dimensional convolutional neural networks: A developmental trial,” Comput. Methods Biomechanics Biomed. Eng., Imag. Vis., vol. 8, no. 2, pp. 210–218, Mar. 2020, doi: 10.1080/21681163.2019.1674696.
[23] S. Nishimoto, Y. Sotsuka, K. Kawai, H. Ishise, and M. Kakibuchi, “Personal computer-based cephalometric landmark detection with deep learning, using cephalograms on the internet,” J. Craniofacial Surgery, vol. 30, no. 1, pp. 91–95, Jan. 2019, doi: 10.1097/SCS.0000000000004901.
[24] K. Panetta, R. Rajendran, A. Ramesh, S. Rao, and S. Agaian, “Tufts dental database: A multimodal panoramic X-ray dataset for benchmarking diagnostic systems,” IEEE J. Biomed. Health Inform., vol. 26, no. 4, pp. 1650–1659, Apr. 2022, doi: 10.1109/JBHI.2021.3117575.
[25] C. Lindner, C. W. Wang, C. T. Huang, C. H. Li, S. W. Chang, and T. F. Cootes, “Fully automatic system for accurate localisation and analysis of cephalometric landmarks in lateral cephalograms,” Scientific Rep., vol. 6, no. 1, pp. 1–10, Jun. 2021, doi: 10.1038/s41598-021-91681-7.
[26] “Gum problems: 6 types, causes, symptoms, treatment & oral cancer.” MedicineNet. Accessed: Mar. 12, 2022. [Online]. Available: https://www.medicinenet.com/gum_problems/article.htm
[27] “Fractured tooth (Cracked Tooth): What it is, symptoms & repair,” Cleveland Clinic, Cleveland, OH, USA, 2021. Accessed: Mar. 12, 2022. [Online]. Available: https://my.clevelandclinic.org/health/diseases/21628-fractured-tooth-cracked-tooth
[28] “Healthline: Medical information and health advice you can trust.” Healthline. Accessed: Mar. 12, 2022. [Online]. Available: https://www.healthline.com/


MDN-Enabled SO for Vehicle Proactive Guidance in Ride-Hailing Systems
Minimizing Travel Distance and Wait Time
by Xiaoming Li, Jie Gao, Chun Wang, Xiao Huang, and Yimin Nie

Digital Object Identifier 10.1109/MSMC.2022.3220315 Date of current version: 17 July 2023


2333-942X/23©2023IEEE

Vehicle proactive guidance strategies are used by ride-hailing platforms to mitigate supply–demand imbalance across regions by directing idle vehicles to high-demand regions before the demands are realized. This article presents a data-driven stochastic optimization framework for computing idle vehicle guidance strategies. The objective is to minimize drivers’ idle travel distance, riders’ wait time, and the oversupply costs (OSCs) and undersupply costs (USCs) of the platform. Specifically, we design a novel neural network that integrates gated recurrent units (GRUs) with mixture density networks (MDNs) to capture the spatial-temporal features of the rider demand distribution. The outcome of the neural network is fed into a stochastic optimization process to compute near-optimal idle vehicle guidance solutions. The performance of the proposed framework is validated through numeric experiments using New York yellow taxi trip record data. Our results show that the MDN-enabled stochastic optimization approach outperforms other machine learning-based vehicle guidance models that only utilize the point estimates of rider demands. In terms of managerial implications, it is clear from our experimental results that, by adopting data-driven stochastic optimization models in their vehicle guidance systems, ride-hailing platforms can improve rider and driver satisfaction and reduce their operating costs.

Introduction
The most important service provided by ride-hailing platforms, such as Lyft, Uber, and Didi, is to match drivers and riders efficiently. To ensure service quality and reduce wait times, the demand of riders needs to be promptly met by the supply of drivers. However, dynamic changes in the demands across the service regions often cause a supply–demand imbalance in the regions and make it challenging for the platforms to dispatch sufficient drivers to high-demand regions in a timely manner to ensure low wait times. Without a proactive guidance strategy, a ride-hailing platform has to react to the rider demands across regions when they are realized. This reactive strategy may prolong riders’ wait times since the needed idle vehicles may not be in riders’ immediate proximity.
Idle vehicle proactive guidance strategies have been proposed in recent literature to tackle this challenge [1], [2]. A proactive guidance strategy guides needed vehicles to regions where future demands are expected to outstrip supply. As a result, it can increase the rider serving rate (SR) and, at the same time, reduce driver idle driving distances and rider wait times [2]. Guo et al. [3] propose an online ride-hailing dispatch framework that is based on spatiotemporal thermos guidance to address the real-time service vehicle dispatching problem. A concept named spatiotemporal thermos is defined to represent the demand density of ride-hailing regions. In addition, the random forest regression machine-learning method is utilized for spatiotemporal thermos forecasting. A data-driven recommendation system that exploits the benefits of vehicular social networks for ride-hailing services is designed in [4], where long short-term memory is utilized to forecast the demands. Chen et al. [5] propose a hierarchical framework for vehicle dispatch in ride-sharing systems. The higher hierarchy optimizes idle mileage by rebalancing vehicles across regions toward current and predicted rider demands, while the lower hierarchy minimizes the total mileage delay and serves as many rider requests as possible. Miao et al. [6] develop a data-driven taxi dispatch framework under demand uncertainty that is spatial-temporally correlated using robust optimization modeling techniques. In this work, vacant vehicles are dispatched toward predicted rider demand that varies in an uncertain demand set constructed on spatial-temporally correlated data sets. The objective is to minimize the total idle travel distance under the worst case demand scenario while maintaining service fairness across the whole city. In addition to guidance strategies at the system level, the impact of guidance signals on individual drivers’ decisions is also studied. In [7], a sequential binary logistic regression model is proposed to determine the factors influencing the driver’s cruising decisions when receiving taxi-calling signals. The model is calibrated by survey data. Recently, machine learning [8] and deep reinforcement learning [9] approaches have been widely utilized in ride-hailing applications, which sheds light on a research trend of combining learning approaches with optimization modeling techniques.
The articles mentioned previously provide important insights into designing a proactive guidance mechanism in ride-hailing systems. However, their approaches do not incorporate uncertainties in their optimization process in the sense that they only predict scalar point estimates of the demands in regions, which does not allow stochastic optimization (SO) models. This simplified modeling of uncertainty often leads to a considerable decline in system performance [10]. As an exception, the approach proposed in [6] does involve the uncertainty sets of the demand. However, their robust optimization models focus on guaranteed performance in worst case scenarios, which is rather conservative for the purpose of proactive guidance.
In this article, we propose a data-driven SO framework to compute near-optimal idle vehicle proactive guidance strategies given the dynamic rider demand and driver supply across ride-hailing service regions. Instead of just predicting the demand in the form of a scalar, the framework models the uncertainty of rider demand by estimating its probability distribution using historical rider demand data. The uncertainty model is then integrated into an SO process to compute proactive guidance strategies. The contribution of this article is two-fold: 1) we extend MDNs [11] by integrating GRUs [12], which enables the MDN to capture various spatial-temporal features in estimating rider demand distributions and 2) we integrate the extended MDN with an SO process to minimize the vehicle guidance related costs, including USC, OSC, and driver idle travel cost.

The MDN-SO Framework
In this section, we present the MDN-enabled SO (MDN-SO) framework, which consists of two modules: an extended MDN that is suitable for estimating demand distributions of time-series data and an SO process that computes near-optimal proactive guidance strategies.

The Extended MDN
MDN is a combination of a neural network and a Gaussian mixture model (GMM). Unlike the regular neural network that only predicts a single value as the output, MDN can capture the model’s stochastic behaviors by parameterizing a Gaussian mixture distribution using the outputs of a neural network. However, regular MDN models are not sufficient for our purpose as they do not possess the capability of capturing spatial-temporal features in rider demand data. Therefore, we propose an extended MDN to be integrated into our SO process, which requires the distribution of the rider demand as input.
The extended MDN (XMDN) is an integration of regular MDN with GRU. The GMM used by the XMDN is configured by the mixing coefficients (also known as weights), mean, and variance of each Gaussian kernel, as shown in (1):

$$p(y \mid X, \theta) = \sum_{i=1}^{K} \pi_i(X)\, \mathcal{N}\big(y \mid \mu_i(X), \sigma_i(X)\big) \tag{1}$$

where $\theta = (\pi, \mu, \sigma)$, and $K$ is the number of Gaussian distributions (also known as components in the literature).
Generally, GMM can be considered as a group of Gaussian distributions with different weights, where the $i$th Gaussian is determined by weight $\pi_i$, mean $\mu_i$, and covariance matrix $\Sigma_i$ (variance $\sigma_i$ for univariate Gaussian). Then the predicted probability distribution can be represented using GMM by adjusting the parameter $\theta$. Notice that the sum of Gaussian component weights must be equal to 1 because each weight is computed by the following softmax function, which is shown in (2):

$$\pi_i = \mathrm{softmax}(h)_i = \frac{e^{h_i^{\pi}}}{\sum_{k=1}^{K} e^{h_k^{\pi}}} \tag{2}$$

where $h_i^{\pi}$ denotes the outputs of the hidden layer prior to the layer that stores the GMM components. Meanwhile, the corresponding $\mu_i$ and $\sigma_i^2$ are computed from (3) and (4), respectively:

$$\mu_i = h_i^{\mu} \tag{3}$$

$$\sigma_i = \exp\big(h_i^{\sigma}\big). \tag{4}$$

The probabilistic forecasting model is built on the XMDN where GRUs can encode useful information of the past in single or multiple layers. The input of each layer is the output of the previous layer concatenated with the network input. Then the outputs of the GRU hidden layer $h_t$ will be used to compute the parameters of GMM from (2)–(4). In addition, the concatenation of outputs of all layers is used to predict the network’s output, which is compared with the target $y$. Finally, we use the mixture density parameters to parameterize a Gaussian mixture distribution as the probabilistic forecasting outcome. The prediction process can be repeated in a loop to predict rider demand for multiple time steps.
Furthermore, one of the issues in MDNs, like the conventional deep neural network, is the overfitting problem [13]. In this work, besides the dropout operations in XMDN, we introduce the L2 regularization technique to avoid the overfitting issue. In this regard, we design the loss function of XMDN shown in (5):

$$E(w_{\mathrm{GRU}}) = -\sum_{n=1}^{N} \ln\left\{ \sum_{k=1}^{K} \pi_k(X_n, w_{\mathrm{GRU}})\, \mathcal{N}\big(t_n \mid \mu_k(X_n, w_{\mathrm{GRU}}), \sigma_k^2(X_n, w_{\mathrm{GRU}})\big) \right\} + \frac{1}{2}\lVert w_{\mathrm{GRU}} \rVert^2 \tag{5}$$

where the parameter $w_{\mathrm{GRU}}$ denotes the set of weights and biases in the GRU deep neural networks.
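To make the mapping from hidden-layer outputs to mixture parameters concrete, the following minimal sketch evaluates (1)–(5) for a univariate mixture in plain Python. It is an illustration only: the function names are hypothetical, the GRU itself is omitted (the hidden outputs are passed in as plain lists), and the article's models are built in TensorFlow rather than with this code.

```python
import math


def gmm_params(h_pi, h_mu, h_sigma):
    """Map hidden-layer outputs to GMM parameters, following (2)-(4)."""
    shift = max(h_pi)  # subtracting the max keeps the softmax numerically stable
    exps = [math.exp(v - shift) for v in h_pi]
    total = sum(exps)
    pi = [e / total for e in exps]           # (2): softmax weights, sum to 1
    mu = list(h_mu)                          # (3): means are used as-is
    sigma = [math.exp(v) for v in h_sigma]   # (4): exp keeps sigma positive
    return pi, mu, sigma


def gaussian_pdf(y, mu, sigma):
    """Univariate Gaussian density with standard deviation `sigma`."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, pi, mu, sigma):
    """Mixture density in (1)."""
    return sum(p * gaussian_pdf(y, m, s) for p, m, s in zip(pi, mu, sigma))


def xmdn_loss(targets, params, weights):
    """Negative log-likelihood plus L2 penalty, following (5).

    params: one (pi, mu, sigma) tuple per target.
    weights: flat list of network weights (a stand-in for w_GRU).
    """
    nll = -sum(math.log(mixture_pdf(t, *p)) for t, p in zip(targets, params))
    return nll + 0.5 * sum(w * w for w in weights)


# Two components with equal logits, zero means, and unit standard deviations.
pi, mu, sigma = gmm_params([0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
loss = xmdn_loss([0.0], [(pi, mu, sigma)], weights=[])
```

Here `gaussian_pdf` treats the output of (4) as a standard deviation; if it is instead meant as a variance, the density would need `math.sqrt(sigma)` in its place.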

The Stochastic Optimization Process
We assume the ride-hailing platform operates during a day that is discretized into a group of batching windows (also known as time slots) with fixed size $\Delta T$ (e.g., 10 min). (We use “batching window” and “time slot” interchangeably in this article.) To facilitate the vehicle allocations, the ride-hailing service zone is divided into a group of disjoint ride-hailing service regions denoted as $M$. Let $V^t$ denote the set of idle vehicles in batching window $t$. The binary variable $x_{v,m}^{t} = 1$ if idle vehicle $v$ is guided to the point of interest (POI) in region $m$ at time $t$, and $x_{v,m}^{t} = 0$ otherwise. At the beginning of each batching window, a certain number of idle vehicles are guided to the ride-hailing regions’ POIs with minimum guidance distance to meet the riders’ requests in the future. This proactive guidance operation incurs the idle vehicle guidance cost, which can be formulated in (6):

$$\alpha \sum_{v \in V^t} \sum_{m \in M} g_{v,m}\, x_{v,m}^{t} \tag{6}$$

where $g_{v,m}$ denotes the distance between idle vehicle $v$’s GPS location and the GPS location of region POI $m$, and $\alpha$ denotes the idle travel cost per mile. In addition, OSCs are incurred when the number of guided vehicles exceeds the rider demand (including predicted rider demands for the current batching window and the unserved riders from the previous batching window). Likewise, USCs are incurred when the number of guided vehicles is lower than the rider demand. The sum of the OSC and the USC is defined in (7):

$$\sum_{m \in M} \mathbb{E}_{\hat{d}_m^{t,s} \sim P}\left[ \beta \cdot \max\left\{0, \Big(\sum_{v \in V^t} x_{v,m}^{t} - d_m^{t-1} - \hat{d}_m^{t,s}\Big)\right\} + \gamma \cdot \max\left\{0, \Big(d_m^{t-1} + \hat{d}_m^{t,s} - \sum_{v \in V^t} x_{v,m}^{t}\Big)\right\} \right] \tag{7}$$

where $\hat{d}_m^{t,s}$ and $d_m^{t-1}$ denote the predicted rider demand at region $m$ in time slot $t$ under scenario $s$ and the number of unserved riders at region $m$ in time slot $t-1$, respectively. Notice that the stochastic programming model will degenerate to the deterministic model if only one scenario is involved. $\beta$ and $\gamma$ denote the OSC per vehicle and the USC per requested order, respectively. Since the stochastic programming model has a set of rider demand scenarios (drawn from the rider demand distribution), the previous formula denotes the expected total cost (TC) over the rider demand distribution.
A group of constraints must be satisfied according to our problem settings. First, a certain level of supply–demand ratio ($\theta$), along with the supply–demand ratio gap ($\xi$) among ride-hailing regions, must be taken into consideration, which is captured by the following constraints:

$$(\theta - \xi)\big(\hat{d}_m^{t} + d_m^{t-1}\big) \le \sum_{v \in V^t} x_{v,m}^{t} \le \theta\big(\hat{d}_m^{t} + d_m^{t-1}\big), \quad \forall m \in M. \tag{8}$$

In addition, each idle vehicle can be guided to at most one region’s POI, which is represented by

$$\sum_{m \in M} x_{v,m}^{t} \le 1, \quad \forall v \in V^t. \tag{9}$$

Further, each idle vehicle, if guided, can only be guided to a region’s POI that it can reach within the length of the batching window. These time constraints are captured by

$$g_{v,m}/\lambda \le \Delta T + H\big(1 - x_{v,m}^{t}\big), \quad \forall v \in V^t,\ \forall m \in M \tag{10}$$

where $H$ is a large positive number used to linearize the “if” constraints [14], and $\lambda$ is the idle vehicle’s travel speed, which is assumed to be constant during the guidance operation. Therefore, $g_{v,m}/\lambda$ is the guidance time between the GPS location of vehicle $v$ and the GPS location of region POI $m$. Moreover, the total number of guided idle vehicles must be less than the fleet size under a certain supply–demand ratio, which leads to the following constraint:

$$\sum_{v \in V^t} \sum_{m \in M} x_{v,m}^{t} \le \theta\, C^t. \tag{11}$$

Given the objective function and constraints, the holistic optimization model for the idle vehicle proactive guidance problem is summarized as follows:

$$\begin{aligned} \text{minimize} \quad & (6) + (7) \\ \text{subject to} \quad & (8), (9), (10), (11) \\ & x_{v,m}^{t} \in \{0, 1\}, \quad \forall v \in V^t,\ \forall m \in M,\ \forall t \in T. \end{aligned} \tag{12}$$
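As a rough illustration of how the objective in (12) could be evaluated for a fixed allocation, the sketch below computes the guidance cost (6) and replaces the expectation in (7) with an average over a finite list of demand scenarios. It only evaluates a given allocation and does not solve the optimization problem; the function names, data layout, and numbers are hypothetical.

```python
def guidance_cost(x, g, alpha):
    """Idle vehicle guidance cost (6) for allocation x[v][m] and distances g[v][m]."""
    return alpha * sum(
        g[v][m] * x[v][m] for v in range(len(x)) for m in range(len(x[v]))
    )


def average_supply_cost(x, unserved, scenarios, beta, gamma):
    """Scenario average of the oversupply and undersupply costs in (7).

    unserved[m]: unserved riders carried over from the previous window.
    scenarios[s][m]: sampled rider demand for region m under scenario s.
    """
    n_regions = len(unserved)
    supply = [sum(row[m] for row in x) for m in range(n_regions)]
    total = 0.0
    for demand in scenarios:
        for m in range(n_regions):
            need = unserved[m] + demand[m]
            total += beta * max(0, supply[m] - need)   # oversupply term
            total += gamma * max(0, need - supply[m])  # undersupply term
    return total / len(scenarios)


# Hypothetical instance: 3 idle vehicles, 2 regions, 2 demand scenarios.
x = [[1, 0], [1, 0], [0, 1]]               # vehicle-to-region allocation
g = [[1.0, 2.0], [0.5, 3.0], [4.0, 1.5]]   # guidance distances in miles
unserved = [0, 1]
scenarios = [[1, 1], [3, 0]]

travel = guidance_cost(x, g, alpha=0.5)
supply = average_supply_cost(x, unserved, scenarios, beta=2.0, gamma=5.0)
total_cost = travel + supply
```

With a single scenario the average reduces to the deterministic cost, which mirrors the remark that the stochastic model degenerates to the deterministic one in that case.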

As discussed previously, the objective is to minimize the overall ride-hailing system costs. To solve the SO model, we first reformulate it as its deterministic counterpart over a large set of scenarios by applying the sample average approximation (SAA) technique [15]. The resulting deterministic model can then be solved by an off-the-shelf solver such as Gurobi (https://www.gurobi.com/) or CPLEX (https://www.ibm.com/analytics/cplex-optimizer).

Numerical Experiment
In this section, we validate the performance of MDN-SO through numerical experiments. We first describe the numerical validation environment and performance metrics. Next, we discuss data processing and feature engineering for XMDN and GRU. Finally, we evaluate the proposed approach by comparing its performance with other machine learning-based vehicle guidance models.

Experiment Setup
Both batching matching and historical averages are coded in Python 3.8, and the mathematical optimization models are solved by Gurobi 9.1 (https://www.gurobi.com/academia/academic-program-and-licenses/). The experiments are run on a PC with an Intel Core i7 CPU, 32 GB RAM, and Windows 10. The deep learning models (GRU and XMDN) are coded in Python 3.8 and TensorFlow 2.4 and run on an NVIDIA GeForce RTX 2080 GPU with 16 GB RAM under Ubuntu 18.04. For GRU models, each training epoch takes around 275 s, and the average training time is approximately 3.5 h. For XMDN models, each training epoch takes around 358 s, and the average training time is approximately 4.7 h. After training, the deep learning models predict the rider demand (using GRU) and the rider demand distribution (using XMDN) from the time-series sequence data in the testing set; the prediction takes only a few seconds. In addition, the optimization model can be solved by Gurobi within 2 min. Therefore, the overall time is far less than the batching window size, which indicates that our proposed framework can be applied to a dynamic ride-hailing platform.

Evaluation Metrics
We adopt the following three data-driven optimization models as the guidance approaches: 1) our proposed approach, MDN-SO; 2) the integration of GRU and a deterministic optimization model, labeled GRU-DM; and 3) the integration of the historical average (HA) and a deterministic optimization model, labeled HA-DM. In addition, a no-guidance mechanism is introduced for comparison. We select the following metrics for the performance comparison.
◆◆ OSC, USC, and TC: The metric involves two types of costs, namely, OSC, which is computed from the driver’s idle driving distance, and USC, which is computed from the profit of service orders; TC is their sum. The results are computed from (7) by replacing the predicted rider demand with the real demand.
◆◆ Rider’s SR: For the ride-hailing service region $k$, the metric is defined as the proportion of served (satisfied) riders. Namely, the rider’s SR at region $k$ is

$$SR_k = \min\left\{1, \frac{s_k}{d_k}\right\} \tag{13}$$

where $s_k$ and $d_k$ denote the number of (guided) idle vehicles at region $k$ and the number of requests (real rider demand) at region $k$, respectively.
◆◆ Rider’s waiting time (WT): WT is computed in different ways depending on the approach. Specifically, WT involves three parts: 1) the time between the rider’s request time and the end of the current batching window (WT1); 2) the driver’s travel time to the rider’s pickup coordinate, starting from the POI under the guidance approaches (i.e., MDN-SO, GRU-DM, and HA-DM) or from the driver’s GPS location in the no-guidance scenario (WT2); and 3) 10 min if the rider cannot be picked up in the current batching window (WT3):

$$WT = WT_1 + WT_2 + WT_3. \tag{14}$$
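The two rider-side metrics in (13) and (14) can be written as short helper functions. This is only a sketch of the formulas as stated: the function names are hypothetical, and treating a region with zero demand as fully served is an assumption that the text does not specify.

```python
def service_rate(supply: int, demand: int) -> float:
    """SR_k = min(1, s_k / d_k), following (13).

    Returning 1.0 when demand is zero is an assumption, not from the source.
    """
    if demand == 0:
        return 1.0
    return min(1.0, supply / demand)


def waiting_time(wt1_min: float, wt2_min: float, picked_up: bool,
                 penalty_min: float = 10.0) -> float:
    """WT = WT1 + WT2 + WT3 in minutes, following (14).

    WT3 is the 10-min penalty applied when the rider is not picked up
    in the current batching window, and 0 otherwise.
    """
    wt3_min = 0.0 if picked_up else penalty_min
    return wt1_min + wt2_min + wt3_min


print(service_rate(8, 10))            # 0.8
print(service_rate(12, 10))           # 1.0
print(waiting_time(2.0, 4.5, True))   # 6.5
print(waiting_time(2.0, 4.5, False))  # 16.5
```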

When a rider cannot be picked up in the current batching window, the rider must wait until the next batching window for service, and we assume that riders do not cancel their requests in that case. A similar assumption is discussed in [16]. In addition, we assume that riders are picked up using the first-come-first-served (FCFS) protocol. For the no-guidance scenario, riders are picked up by their nearest drivers. Under FCFS, rider A, whose request time is earlier than that of rider B, is picked up by a driver even if that driver is closer to rider B than to rider A.

Feature Engineering
We consider features that are highly correlated with rider demand. The features extracted from the data set in this work are rider demand, region ID, day of the month, month, day of the week, hour of the day, and minute of the hour. The rider demand is used as the prediction target, while the remaining features are used to observe how they affect the target. We adopt XGBoost [17], whose importance metric is based on impurity value, to determine the feature importance for the deep learning predictor. The result is illustrated in Figure 1. We observe that region ID and hour of the day are the most important features for the selected data set: they account for over 50% and 30% of the total importance, respectively, which implies that these features significantly impact rider demand prediction.

Performance Evaluation
In this section, we choose one week of trip records (2 March 2016–8 March 2016), comprising five weekdays and two weekend days, for the experimental validation. The experimental results are averaged over the five weekdays and over the two weekend days for the weekday and weekend scenarios, respectively. Since no idle driver information is available in the data sets, we assume that the coordinates of idle vehicles are randomly generated in the eight ride-hailing regions.
The parameter settings of the optimization models are described in Table 1. The number of idle vehicles (fleet size) in the current time slot is determined by the real rider demand in the previous time slot. We set the supply–demand ratio parameter $\theta$ to 0.95, 1.0, and 1.05 to evaluate the experimental results under different fleet size levels. We are particularly interested in how much benefit the ride-hailing platform can obtain from proactive vehicle guidance. Since idle vehicles may be distributed across regions in any pattern, we consider the following three idle vehicle distribution scenarios.
◆◆ Positively correlated idle vehicle distribution: Given a set of region indices $K = \{1, 2, \ldots, k\}$, a set of idle vehicle counts across regions $\{s_i\}_{i \in K}$, and a set of demands across regions $\{d_j\}_{j \in K}$, we formulate a tuple sequence as follows:

$$\ldots, \langle s_{i^-}, d_{j^-} \rangle, \langle s_i, d_j \rangle, \langle s_{i^+}, d_{j^+} \rangle, \ldots \tag{15}$$

such that

$$\ldots,\ s_{i^-} \le s_i \le s_{i^+},\ \ldots \qquad \ldots,\ d_{j^-} \le d_j \le d_{j^+},\ \ldots$$

We call this type of idle vehicle distribution positively correlated (PC) if, $\forall i, j \in K$, $i = j$; that is, each tuple pairs the supply and the demand of the same region. Intuitively, PC describes the scenario in which the idle vehicles are “ideally” distributed across regions: more idle vehicles are cruising around the higher-demand regions and vice versa. In this sense, proactive vehicle guidance is unnecessary since the number of idle vehicles can meet the demand in each region. However, this ideal scenario seldom happens in realistic applications [18].
◆◆ Negatively correlated idle vehicle distribution: Using the same notation, we formulate a tuple sequence as follows:

$$\ldots, \langle s_{i^-}, d_{j^-} \rangle, \langle s_i, d_j \rangle, \langle s_{i^+}, d_{j^+} \rangle, \ldots \tag{16}$$

such that

$$\ldots,\ s_{i^-} \le s_i \le s_{i^+},\ \ldots \qquad \ldots,\ d_{j^-} \ge d_j \ge d_{j^+},\ \ldots$$

We call this type of idle vehicle distribution negatively correlated (NC) if, $\forall i, j \in K$, $i = j$. In contrast to PC, NC describes the “worst case” scenario in which the idle vehicles are cruising around the “wrong” regions. In this case, proactive vehicle guidance operations are necessary to alleviate the imbalance between supply and demand.
◆◆ Uniform idle vehicle distribution: In this case, the idle vehicles are uniformly distributed across the ride-hailing regions. We label this type of idle vehicle distribution U.

Figure 1. The pie plot of the feature importance, where DoM, MoY, DoW, HoD, and MoH denote the day of the month, the month of the year, the day of the week, the hour of the day, and the minute of the hour, respectively. Region ID accounts for 55.26% and HoD for 30.05%.

Table 1. Parameter Settings in the Optimization Model.
Parameter    Value
a            US$0.4–US$0.9
b            idle travel distance cost: a · g_{v,m}
c            estimated from the real data set in the corresponding time slot
λ            30 mi/h
θ            {0.95, 1, 1.05}
p            0.1
ΔT           10 min
C_t          set to the total rider demand in the previous time slot

We compare the validation results based on the previous idle vehicle distributions. First, as shown in Table 2, we observe that the OSC increases and the USC decreases as the fleet size grows ($\theta$ ranges from 0.95 to 1.05). This is because more rider requests are satisfied as the number of idle vehicles increases, which leads to more OSC and less USC. In addition, MDN-SO outperforms the remaining data-driven competitors, GRU-DM and HA-DM, in terms of TC, reducing the average TC by 17.5% and 63.8% on weekdays and by 21.4% and 62.1% on weekends under $\theta = 0.95$; by 17.2% and 70.5% on weekdays and by 23.2% and 68.8% on weekends under $\theta = 1.0$; and by 23.7% and 64.4% on weekdays and by 31.9% and 63.7% on weekends under $\theta = 1.05$. Second, as shown in Table 2, MDN-SO is approximately 2% and 17% higher than GRU-DM and HA-DM in terms of average SR. In addition, without guidance, the SR under the positively correlated distribution (labeled NG-PC) is much higher than that under the uniform (labeled NG-U) and negatively correlated (labeled NG-NC) distributions. This is because NG-PC represents the ideal scenario in which the idle vehicles are cruising in their “right” regions; therefore, all the regions can satisfy the riders’ requests. Notice that during some time slots (around 4 a.m. to 8 a.m. on weekdays and around 4 a.m. to 11 a.m. on weekends), HA-DM is inferior to NG-U in terms of average SR, implying that without accurate rider demand predictions, a guidance approach can be even worse than no guidance. Further, MDN-SO is quite close to the NG-PC scenario in terms of average SR, which indicates that our proposed approach is capable of guiding idle vehicles in a more reasonable manner. This is because the MDN-SO framework utilizes the uncertainty in the forecasting results, which covers all the potential rider demand possibilities, for decision-making. Moreover, among MDN-SO, GRU-DM, and HA-DM, our proposed data-driven approach comes closest to the PC scenario, which implies that MDN-SO provides a fairly effective strategy for the idle vehicle proactive guidance operation.

Finally, the rider’s average waiting time is an essential metric from the rider’s perspective. As shown in Figure 2, the rider’s average waiting time drops as the fleet size increases. This is because more rider requests are satisfied when more idle vehicles are available; therefore, WT3 is smaller. In addition, NG-PC outperforms NG-U and NG-NC in terms of the rider’s average waiting time. This is straightforward because the riders in each region can be served by the idle drivers in the same region under the NG-PC scenario, while there

exist a few riders who are served by drivers from other regions under the NG-NC scenario, which increases the rider’s average waiting time. Further, among the three data-driven guidance approaches, the rider’s average waiting times under GRU-DM and HA-DM are 2.1% and 11.5% higher than that under MDN-SO, respectively. Also, MDN-SO reduces the rider’s average waiting time by 20% compared with the no-guidance NG-U scenario, which is closer to the realistic scenario. This is because MDN-SO leverages not only the predicted demand uncertainty in each ride-hailing region but also guidance operations to achieve a better solution.

Conclusions and Future Work
Effective idle vehicle guidance strategies provide ride-hailing platforms with competitive advantages in terms of improved matching rates, reduced rider wait times, and reduced driver idle travel distances. More research is needed in this area to ensure the sustainable growth of ride-hailing platforms in the long run. We propose an MDN-enabled SO framework that integrates an extended MDN with a stochastic optimization process. The proposed framework produces vehicle guidance solutions with high service quality and low cost. In future work, we plan to study the impacts of adopting such a vehicle guidance framework on the downstream matching/dispatching operations of a ride-hailing platform. In addition, as an enhancement to our previous work on guidance and matching [2], we will design integrated vehicle guidance and rider–driver matching systems that make use of the special characteristics of the ride-hailing data and domain-specific constraints to further improve the performance of the framework in terms of system scalability and solution quality.

Table 2. The average OSC, under-supply cost (USC), TC, and SR using different data-driven guidance approaches (HA-DM, GRU-DM, and MDN-SO) and no guidance with different idle vehicle distributions (NG-PC, NG-U, and NG-NC).
                        Weekday                          Weekend
θ      Approach   OSC    USC     TC      SR       OSC    USC     TC      SR
0.95   HA-DM      20     6,795   6,815   79.2%    31     6,500   6,531   80.7%
0.95   GRU-DM     47     2,941   2,988   90.9%    57     3,096   3,153   91.1%
0.95   MDN-SO     72     2,394   2,466   92.7%    79     2,399   2,478   93.2%
0.95   NG-PC      N/A*   N/A     N/A     96%      N/A    N/A     N/A     96.5%
0.95   NG-U       N/A    N/A     N/A     75.7%    N/A    N/A     N/A     77.8%
0.95   NG-NC      N/A    N/A     N/A     38.3%    N/A    N/A     N/A     39.3%
1      HA-DM      40     5,509   5,549   82.8%    58     5,133   5,193   84.1%
1      GRU-DM     111    1,865   1,976   93.7%    125    1,983   2,108   93.9%
1      MDN-SO     132    1,503   1,635   95.1%    145    1,473   1,618   95.6%
1      NG-PC      N/A    N/A     N/A     96.6%    N/A    N/A     N/A     97.4%
1      NG-U       N/A    N/A     N/A     78.1%    N/A    N/A     N/A     80.6%
1      NG-NC      N/A    N/A     N/A     38.3%    N/A    N/A     N/A     39.3%
1.05   HA-DM      140    3,867   4,007   87.8%    125    3,858   3,983   87.3%
1.05   GRU-DM     204    1,665   1,869   94.5%    213    1,910   2,123   94.2%
1.05   MDN-SO     268    1,158   1,426   96.1%    241    1,203   1,444   96.4%
1.05   NG-PC      N/A    N/A     N/A     97.7%    N/A    N/A     N/A     97.7%
1.05   NG-U       N/A    N/A     N/A     80.2%    N/A    N/A     N/A     83%
1.05   NG-NC      N/A    N/A     N/A     38.4%    N/A    N/A     N/A     39.4%
*OSC, USC, and TC are set to N/A under NG-PC, NG-U, and NG-NC since no idle vehicle guidance operation is involved.

Figure 2. The rider’s average waiting time under different supply–demand ratio scenarios: (a) weekday, θ = 0.95, (b) weekend, θ = 0.95, (c) weekday, θ = 1, (d) weekend, θ = 1, (e) weekday, θ = 1.05, and (f) weekend, θ = 1.05. Each panel plots the average waiting time (min) against the time of day (0:00 to 23:00) for MDN-SO, GRU-DM, HA-DM, NG, NG-NC, and NG-PC.
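As a quick arithmetic check, the TC reductions quoted in the performance evaluation can be reproduced from the Table 2 entries. The sketch below uses the first group of TC values under a supply–demand ratio of 0.95, which are assumed here to be the weekday figures; the helper name is hypothetical.

```python
def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# TC values copied from Table 2 (ratio 0.95, first group of columns).
tc = {"HA-DM": 6815, "GRU-DM": 2988, "MDN-SO": 2466}

vs_gru = percent_reduction(tc["GRU-DM"], tc["MDN-SO"])
vs_ha = percent_reduction(tc["HA-DM"], tc["MDN-SO"])
print(f"{vs_gru:.1f}% {vs_ha:.1f}%")  # prints: 17.5% 63.8%
```

The two values match the 17.5% and 63.8% reductions reported for weekdays under that ratio, which is also what supports reading the first group of columns as the weekday results.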

About the Authors
Xiaoming Li ([email protected]) earned his M.S. degree in computer software and theory from Northeastern University and his Ph.D. degree in information and systems engineering from Concordia University. He is a research associate at Concordia University, Montreal, QC H3G 1M8 Canada. His research interests include optimization under uncertainty, large-scale optimization, network optimization, machine learning with applications in intelligent transportation systems, and supply chain optimization.
Jie Gao ([email protected]) earned her MASc. degree in information systems and her Ph.D. degree in information systems engineering from Concordia University. She is a postdoctoral research fellow at HEC Montreal at the University of Montreal, Montreal, QC H3T 2A7 Canada. Her research interests include data-driven optimization, game theory, mechanism design, and machine learning with applications in intelligent transportation systems, smart cities, and community healthcare.
Chun Wang ([email protected]) is a professor with the Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1M8 Canada. His research interests include the interface between economic models, operations research, and artificial intelligence. He is actively conducting research in multiagent systems, data-driven optimization, and economic model-based resource allocation with applications to healthcare management, smart grid, and smart city environments. He is a Member of IEEE.
Xiao Huang ([email protected]) earned her B.E. degree in electronic engineering from Tsinghua University, her M.S. degree in mathematical finance from the University of Southern California, and her Ph.D. degree from the Marshall School of Business at the University of Southern California. She is a professor and the Concordia University Research Chair in Supply Chain Management in the John Molson School of Business at Concordia University, Montreal, QC H3G 1M8 Canada. Her research interests include competition and cooperation in supply chains, product and pricing strategies, and data-driven decision-making.
Yimin Nie ([email protected]) earned his B.S. and M.S. degrees in theoretical physics from Peking University and his Ph.D. degree in computational neuroscience from the Canadian Center of Behavior Neuroscience at the University of Calgary. He is currently a senior data scientist and artificial intelligence researcher at Global AI Accelerator (GAIA) at Ericsson Inc., Montreal, QC H4R 2A4 Canada. He worked as a senior data scientist in multiple business fields including E-commerce, finance, and telecommunication. His research interests include machine learning, computer vision, and natural language processing.

References
[1] H. Wang and H. Yang, “Ridesourcing systems: A framework and review,” Transp. Res. B, Methodol., vol. 129, pp. 122–155, Nov. 2019, doi: 10.1016/j.trb.2019.07.009.
[2] J. Gao, X. Li, C. Wang, and X. Huang, “BM-DDPG: An integrated dispatching framework for ride-hailing systems,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 11,666–11,676, Aug. 2022, doi: 10.1109/TITS.2021.3106243.
[3] Y. Guo, Y. Zhang, J. Yu, and X. Shen, “A spatiotemporal thermo guidance based real-time online ride-hailing dispatch framework,” IEEE Access, vol. 8, pp. 115,063–115,077, Jun. 2020, doi: 10.1109/ACCESS.2020.3003942.
[4] X. Wan, H. Ghazzai, and Y. Massoud, “A generic data-driven recommendation system for large-scale regular and ride-hailing taxi services,” Electronics, vol. 9, no. 4, p. 648, Apr. 2020, doi: 10.3390/electronics9040648.
[5] X. Chen, F. Miao, G. J. Pappas, and V. Preciado, “Hierarchical data-driven vehicle dispatch and ride-sharing,” in Proc. IEEE 56th Annu. Conf. Decis. Control (CDC), 2017, pp. 4458–4463, doi: 10.1109/CDC.2017.8264317.
[6] F. Miao et al., “Data-driven robust taxi dispatch under demand uncertainties,” IEEE Trans. Control Syst. Technol., vol. 27, no. 1, pp. 175–191, Jan. 2019, doi: 10.1109/TCST.2017.2766042.
[7] W. Szeto, R. Wong, and W. Yang, “Guiding vacant taxi drivers to demand locations by taxi-calling signals: A sequential binary logistic regression modeling approach and policy implications,” Transp. Policy, vol. 76, pp. 100–110, Apr. 2019, doi: 10.1016/j.tranpol.2018.06.009.
[8] Y. Liu, R. Jia, J. Ye, and X. Qu, “How machine learning informs ride-hailing services: A survey,” Commun. Transp. Res., vol. 2, 2022, Art. no. 100075, doi: 10.1016/j.commtr.2022.100075.
[9] Y. Liu, F. Wu, C. Lyu, S. Li, J. Ye, and X. Qu, “Deep dispatching: A deep reinforcement learning approach for vehicle dispatching on online ride-hailing platform,” Transp. Res. E, Logistics Transp. Rev., vol. 161, 2022, Art. no. 102694, doi: 10.1016/j.tre.2022.102694.
[10] E. Delage, S. Arroyo, and Y. Ye, “The value of stochastic modeling in two-stage stochastic programs with cost uncertainty,” Oper. Res., vol. 62, no. 6, pp. 1377–1393, Nov./Dec. 2014, doi: 10.1287/opre.2014.1318.
[11] C. M. Bishop, “Mixture density networks,” Aston University, Birmingham, U.K., Tech. Rep. NCRG/94/004, 1994.
[12] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, arXiv:1412.3555.
[13] D. Ormoneit and V. Tresp, “Improved Gaussian mixture density estimates using Bayesian penalty terms and network averaging,” in Proc. 8th Int. Conf. Neural Inf. Process. Syst., Nov. 1995, vol. 95, pp. 542–548.
[14] R. L. Rardin, Optimization in Operations Research, vol. 166. Upper Saddle River, NJ, USA: Prentice-Hall, 1998.
[15] S. Kim, R. Pasupathy, and S. G. Henderson, “A guide to sample average approximation,” in Handbook of Simulation Optimization, M. Fu, Ed. New York, NY, USA: Springer Science & Business Media, 2015, pp. 207–243.
[16] T. Oda and C. Joe-Wong, “MOVI: A model-free approach to dynamic fleet management,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), 2018, pp. 2708–2716, doi: 10.1109/INFOCOM.2018.8485988.
[17] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 785–794, doi: 10.1145/2939672.2939785.
[18] E. Brown, “The ride-hail utopia that got stuck in traffic,” Wall Street J., Feb. 2020. [Online]. Available: https://www.wsj.com/articles/the-ride-hail-utopia-that-got-stuck-in-traffic-11581742802

Edge Processing

A LoRa-Based LCDT System for Smart Building With Energy and Delay Constraints

by B Shilpa, Hari Prabhat Gupta, and Rajesh Kumar Jha

Digital Object Identifier 10.1109/MSMC.2022.3204848
Date of current version: 17 July 2023

2333-942X/23©2023IEEE

A smart building is an emerging technology that has the potential to be used in a variety of ubiquitous computing applications. The majority of existing work for smart building monitoring consumes a significant amount of energy to communicate the sensory data from the building to the end users (EUs). This work presents a low-cost data transmission (LCDT) system for a smart building in the context of a noisy environment. The system uses the long-range (LoRa) communication protocol to conserve energy and enable long-distance communication. The smart building sensors generate data in the form of a multivariate time series (MTS). The system compresses such an MTS before transmission by utilizing deep learning (DL) techniques. A channel to reduce the transmission noise of sensory data is also designed using the DL method. The system decompresses the received data at the receiver end and obtains the original MTS. Additionally, we conducted experiments to demonstrate the utility of the system. The experimental results demonstrate that selecting a finite number of distinct edge device (ED) types aids in developing an LCDT system subject to energy and latency constraints.

Overview
The smart building consists of various types of sensor nodes (SNs) for gathering, processing, and communicating the surrounding environment information to the users [1], [2]. An SN has sensing, communication, processing, and power units. Examples of sensing units are temperature, light, and humidity sensors. The sensors in smart buildings generate huge data in the form of an MTS, which contains significant information that must be mined to enable timely responses and better decision making. The components of an MTS are the data of different sensors with a given sampling rate.

The communication unit of SNs in smart buildings commonly uses Zigbee, Bluetooth, Wi-Fi, and other 2.4-GHz technologies [3]. Such technologies support short-range communication and, therefore, have scalability issues. Communicating the information using such technologies increases the cost of multihop devices. The scalability issue motivates the use of promising wireless solutions capable of simultaneously supporting many nodes and long-range communication. Low-power wide-area networks (LPWANs) have evolved as the leading connection option for smart applications requiring extended range, high energy efficiency, and low cost.

An LPWAN protocol that is built on LoRa technology is specified by an open standard known as the Long-Range Wide-Area Network (LoRaWAN). The primary advantage of LoRa is its scalability because the gateway modules in LoRa support concurrent communication of multiple SNs [4]. Another advantage of LoRa is low energy usage during the transmission of the data over a long distance. LoRa also provides tradeoffs among power consumption, communication range, and data rate. Despite the aforementioned advantages of LoRa, communicating a significant volume of the sensory data of smart buildings to the EUs is difficult because LoRa supports a low data rate. The transmission of a large amount of sensory data takes a huge amount of time and consumes a lot of energy. The pervasive usage of unlicensed frequency bands by a large number of LoRa nodes creates the issue of LoRa interference. LoRa is commonly used for long-distance applications, although it also performs well in indoor applications. There are just a few studies [5], [6], [7], [8], [9] that assess LoRa in dense indoor networks. The existing work [2], [10], [11] proposed various solutions to solve the LoRa interference issue. The authors in [10] used multiple gateways to handle LoRa interference. The scheduling of nodes also reduces the interference by transmitting the data in a given period [2]. The effective use of LoRa network parameters, such as spreading factors, also helps to reduce the interference [11]. However, the use of multiple gateways increases the network’s cost; scheduling nodes reduces gateway utility; and fixed spreading factors may consume high power. The employment of cutting-edge machine learning and DL algorithms is enhancing traditional communication systems. Several DL models for wireless communication systems were developed in the existing research works [12], [13], [14], [15], [16]. We intend to implement such principles into practice for LoRa communication.

The research studies [1], [17], [18] related to smart building are mainly focused on energy-efficient systems. As they have not taken into account the system’s cost, the primary focus of this work is cost optimization. Edge processing is a potential solution for communicating the smart building data with limited energy and delay. It minimizes the communication time and energy consumption for conveying sensory data by allowing tasks to be processed locally. The cost of such EDs may vary based on the specification of the devices. A dynamic compression ratio of sensory data for edge inference systems with strict deadlines was described in [19]. The authors in [20] proposed an adaptive data reduction method that uses compressive sampling to lower the bandwidth needed for sensory data transmission while minimizing the information loss.

In this work, we consider a smart building scenario, where several nodes generate the sensory data while sensing the environment and communicate those data to the EDs for further processing. The success of the scenario depends on the size of the data and the number of nodes. Large data size and multiple nodes give high accuracy with high energy consumption and communication delay. The smart building scenario works successfully for a long time if the acquired sensory data are transferred in the given duration and the energy consumption of all of the deployed nodes is equal. The system cost of such a scenario may be reduced by using different types of EDs based on the requirement of the scenario. We address the following problem in this work: How does one design an LCDT system to transmit the huge size of sensory data of the smart building with given energy and delay constraints? To solve this problem, we present an LCDT system for smart buildings in a noisy environment. The solution uses DL techniques for the compression and effective transmission of sensory data. The system uses the LoRa communication protocol to transfer the compressed smart building data to the EUs. Along with this, the key contributions are as follows:
◆◆ We propose a compression–decompression approach called transmitter- and receiver-nets for lowering the amount of sensory data at the ED. The approach employs deep neural network (DNN) architectures for compressing and decompressing the sensory data. The DNN designed for compressing the data is lightweight and can successfully run on low-processing EDs.
◆◆ We employ a mixture density network architecture for the channel-net [21] to reduce the noise effects between EDs and the LoRa gateway (LG). The channel-net works on EDs after reducing the size of the sensory data by using the proposed compression DNN architecture.
◆◆ We present the analysis of the delay and energy required for sensory data compression and communication. The analysis considers the different types of devices with unequal processing, energy, and storage capabilities.
◆◆ An optimization problem is formulated to minimize the cost and energy consumption of the data transmission system of the smart building. We also present a low-time-complexity algorithm to solve the optimization problem.
◆◆ Finally, the experimental results are presented to illustrate the solution’s effectiveness. The experiment’s parameters are defined based on the analysis of existing hardware to make it practical.

The LCDT System
The LCDT system architecture consists of SNs, EDs attached with a LoRa node, an LG, a network server (NS), an application server (AS), and EUs, as shown in Figure 1. The SNs attached with the smart building collect the sensory data in the form of the MTS and forward it to the ED. The ED is responsible for compressing the received MTS and transmitting it to the LG. The LG receives the compressed MTS and forwards the same to the NS. The compressed MTS is retrieved to the original form at the NS and forwarded to the AS. The AS identifies the data and forwards them to the respective user based on the application. Finally, the EU receives the information collected by the SNs.
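The data path described above (SN to ED to LG to NS to AS to EU) can be sketched as a chain of small functions. This is only an illustration of the message flow under loud assumptions: the standard-library zlib codec stands in for the DNN-based transmitter-net and receiver-net (it is lossless and unrelated to the authors' models), the channel-net is omitted, and all function names, the application key, and the sensor readings are hypothetical.

```python
import json
import zlib
from typing import Dict, List

MTS = List[List[float]]  # one inner list per sensor component


def ed_compress(mts: MTS) -> bytes:
    """Edge device: shrink the MTS before transmission (zlib stands in for the transmitter-net)."""
    return zlib.compress(json.dumps(mts).encode("utf-8"))


def lg_forward(payload: bytes) -> bytes:
    """LoRa gateway: relay the compressed payload to the network server unchanged."""
    return payload


def ns_restore(payload: bytes) -> MTS:
    """Network server: recover the MTS (zlib stands in for the receiver-net)."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


def as_route(mts: MTS, application: str, inboxes: Dict[str, List[MTS]]) -> None:
    """Application server: deliver the restored MTS to the end user of the given application."""
    inboxes.setdefault(application, []).append(mts)


# Hypothetical readings: temperature and humidity, four samples each.
readings: MTS = [[21.5, 21.6, 21.6, 21.7], [40.0, 40.5, 41.0, 41.0]]
inboxes: Dict[str, List[MTS]] = {}
as_route(ns_restore(lg_forward(ed_compress(readings))), "hvac", inboxes)
```

Keeping each hop as a separate function mirrors the division of roles in the architecture: only the ED and the NS transform the data, while the LG and the AS only relay or route it.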

Figure 1. An illustration of the LCDT system components for smart building using LoRa. The transmitter-net and receiver-net are the mirror image of DNNs. Labeled components: smart building with sensor nodes, edge device (LoRa node), LoRa gateway, network server and application server, end users, transmitter-net, channel-net, and receiver-net; links are marked as LoRa or non-LoRa communication.


The system uses the DNN model for the compression of the MTS. The encoder and decoder of the DNN handle the compression and decompression of the MTS at the ED and NS, respectively. Since the ED is a very lightweight device compared to the NS, the delay and energy analysis of the system is considered only at the ED. The system uses the transmitter-net as an encoder for compressing the MTS and the channel-net for handling the noise between the ED and LG. Both the transmitter-net and channel-net are DNNs and work at the ED. The system consists of a set of $N$ EDs with $I$ different types, where $N = \{1, 2, \ldots, n\}$ and $I = \{1, 2, \ldots, k\}$. The costs of the $i$ and $j$ types of EDs are denoted as $C_i$ and $C_j$, where $\{i, j\} \in I$ and $C_i \ne C_j$. The total number of $i$th type EDs in the system is given by $X_i$. The parameters of an ED, such as energy $E_i$, processing speed $V_i$, and cost $C_i$, will differ based on the ED type $i$.

The Transmitter-Net

The transmitter-net is a DNN that runs on an ED for mapping the input MTS data $I \in \mathbb{R}^{D \times Z}$ to a reduced-dimensional MTS $I' \in \mathbb{R}^{D \times Z'}$, where $D$, $Z$, and $Z'$ are the number of components of the MTS, the original size of the MTS, and the reduced size of the MTS, respectively, and $Z' \le Z$. The DNN model consists of $L$ layers with $q$ neurons in each layer. Initially, $I$ is one-hot encoded, and the elements of the encoded vector are $\{I_1, I_2, \ldots, I_Z\}$. The one-hot-encoded vector $I_1$ is input to the first layer of the DNN. The neurons in the first layer receive the input, perform a simple computation with activation function $h$, and forward the output to the next layer. The neurons in the next layer receive weighted input from the previous layer, perform the computation, and forward the output to the next layer. Likewise, the outputs of the $L$th and $(L-1)$th layers are given by

$$ h_L = \sum_{j=1}^{L} f(W_j h_{j-1}), \qquad h_{L-1} = \sum_{j=1}^{L-1} \sum_{i=1}^{q} h_{ij} W_{ij} \bigl( h_{ij-1} (W_i I_i + b_i) \bigr) \tag{1} $$

where $f$, $W_j$, $I$, and $b$ are the activation function, weight matrix of the $j$th layer, input, and bias, respectively. Due to hardware constraints, the output of the last layer is passed to a normalization step, which transforms the data to satisfy the average power constraint or amplitude constraint. Lastly, the compressed data $I' \in \mathbb{R}^{D \times Z'}$ are transmitted to the LG via the channel-net.

The Channel-Net

The channel-net learns resilient representations of the input data that can be retrieved with a low likelihood of errors despite channel conditions translating from sender to receiver [12]. The channel-net is designed as a mixture density network (MDN) with Gaussian components to simulate the conditional density of the channel output given its input [21]. The MDN is a concatenation of a DNN and a mixture model with parameters $\phi(I')$ that are a function of the channel input $I'$. The DNN model of the channel-net consists of $L$ dense layers followed by a sampling layer. The channel has a maximum fixed speed, denoted as the channel rate $c$, to process the received data. The conditional probability density $P(I'' \mid I')$ of the mixture model is given to the sampling layer to obtain the output $I''$. The conditional channel density modeled by the MDN is given by

$$ P(I'' \mid I') = \sum_{i=1}^{k} \pi_i(I') \, \phi(I'' \mid I') \tag{2} $$

where $k$ is the number of mixture components, $\pi_i(I') \in [0, 1]$ is the mixing coefficient of component $i$, and $\phi(I'' \mid I')$ is the function representing the conditional densities of $I''$. The output of the channel-net, i.e., $I''$, is forwarded to the NS through the LG. The receiver-net at the NS decompresses and retrieves the original data $\hat{I}$.

Estimation of Cost, Delay, and Energy Consumption of the System

The cost of the LCDT system is determined by the number of EDs of each type utilized for the smart building. Let $X_i$ be the total number of the $i$th type ED in the system, with $i \in I$. The system cost is therefore

$$ C_{sys} = C_1 X_1 + C_2 X_2 + \cdots + C_k X_k. \tag{3} $$

The delay of the LCDT system depends on the time taken by the transmitter-net and channel-net. The delay of the transmitter-net is the estimated time to compress the MTS of SNs, i.e., the sum of the number of operations in the DNN of each ED. Let the SNs generate MTS with sampling rate $m$, which is processed by $k$ types of EDs. The delay of the $n$th type of ED with the transmitter-net of the $L$-layered DNN is given by

$$ T_n^{comp} = \sum_{i=1}^{k} \sum_{j=1}^{L} \frac{m \, q_j \, (2 I_a + 1) \, h_q}{V_i X_i}. \tag{4} $$

The estimated delay in the channel-net is given by

$$ T_n^{chan} = \sum_{i=1}^{k} \sum_{j=1}^{L} \frac{c \, q_j \, (2 I'_a + 1) \, h_q}{V_i X_i}. \tag{5} $$

The total delay of the system is the sum of the delays of the transmitter-net and channel-net, which is given by

$$ T_n = T_n^{comp} + T_n^{chan}. \tag{6} $$

Let $E_i^o$ be the energy consumption per operation of the $i$th type ED; then, the total energy consumption of the system is given by

$$ E_n = (T_n^{comp} \times E_i^o) + (T_n^{chan} \times E_i^o). \tag{7} $$
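To make the bookkeeping in (3)-(7) concrete, the following Python sketch evaluates the cost, delay, and energy expressions for a given assignment of device counts. It is an illustration under stated assumptions, not the authors' implementation: the `EDType` container and all function names are hypothetical, and the delay expressions (4) and (5) are read here as fractions with $V_i X_i$ in the denominator, which is an interpretation of the printed formulas. The energy function takes a single per-operation energy value, leaving the choice of ED type to the caller.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EDType:
    """Parameters of one ED type (hypothetical container, not from the article)."""
    cost: float           # C_i
    speed: float          # V_i
    energy_per_op: float  # E_i^o


def system_cost(types: Sequence[EDType], counts: Sequence[int]) -> float:
    """Equation (3): C_sys = sum over types of C_i * X_i."""
    return sum(t.cost * x for t, x in zip(types, counts))


def net_delay(rate: float, layer_widths: Sequence[int], input_size: int,
              h_q: float, types: Sequence[EDType],
              counts: Sequence[int]) -> float:
    """Equations (4)/(5), assuming the form
    sum_i sum_j rate * q_j * (2 * input_size + 1) * h_q / (V_i * X_i).
    Pass the sampling rate for the transmitter-net and the channel rate
    for the channel-net. Every count must be at least 1."""
    ops = sum(rate * q_j * (2 * input_size + 1) * h_q for q_j in layer_widths)
    return sum(ops / (t.speed * x) for t, x in zip(types, counts))


def total_delay(t_comp: float, t_chan: float) -> float:
    """Equation (6): T_n = T_comp + T_chan."""
    return t_comp + t_chan


def total_energy(t_comp: float, t_chan: float, energy_per_op: float) -> float:
    """Equation (7): E_n = T_comp * E_o + T_chan * E_o."""
    return t_comp * energy_per_op + t_chan * energy_per_op
```

With unit costs in the ratio 1:3 and device counts of 5 and 2, `system_cost` evaluates to 1 × 5 + 3 × 2 = 11.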

The Optimization Problem of the LCDT System

This work aims to design a low-cost system for the transmission of sensory data of the smart building with given energy and delay constraints. The optimization problem of the LCDT system is defined as

The LCDT Problem

$$ \min \; C_{sys} \tag{8a} $$

$$ \text{subject to constraint 1: } E_n \le E_{th} \tag{8b} $$

$$ \text{constraint 2: } T_n \le T_{th}. \tag{8c} $$
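For small instances, a problem of the form (8a)-(8c) can be solved by exhaustive enumeration of the device counts. The Python sketch below is one minimal way to do this, assuming the energy and delay of a candidate assignment are supplied as callables; the function name `solve_lcdt` and its parameters are hypothetical and chosen for illustration. It keeps the cheapest feasible assignment seen so far and returns `None` when no assignment satisfies both constraints.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Counts = Tuple[int, ...]


def solve_lcdt(costs: Sequence[float], n_max: int,
               energy_fn: Callable[[Counts], float],
               delay_fn: Callable[[Counts], float],
               e_th: float, t_th: float) -> Tuple[Optional[Counts], float]:
    """Enumerate X_1..X_k in 1..n_max and return the feasible assignment
    with the lowest cost C_sys = sum_i C_i * X_i, together with that cost."""
    best_counts: Optional[Counts] = None
    best_cost = float("inf")
    for counts in product(range(1, n_max + 1), repeat=len(costs)):
        # Constraints (8b) and (8c): skip assignments that violate either.
        if energy_fn(counts) > e_th or delay_fn(counts) > t_th:
            continue
        cost = sum(c * x for c, x in zip(costs, counts))
        if cost < best_cost:
            best_counts, best_cost = counts, cost
    return best_counts, best_cost


if __name__ == "__main__":
    # Toy energy and delay models, invented purely for this demonstration.
    toy_energy = lambda x: 1.0 * x[0] + 4.0 * x[1]
    toy_delay = lambda x: 300.0 / (1.0 * x[0] + 10.0 * x[1])
    print(solve_lcdt((1.0, 3.0), 10, toy_energy, toy_delay, 20.0, 10.0))
```

The search visits every combination of counts, so its running time grows as $n_{max}^k$; this is practical only when the number of ED types $k$ is small.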

The objective function of the LCDT problem is to determine the number of the various types of devices necessary to achieve the lowest system cost. Constraint 1 indicates that the energy consumption of the system should be below the threshold $E_{th}$. It helps to prolong the life of the system. Constraint 2 ensures that the delay of the system for receiving the data at the NS does not exceed the threshold $T_{th}$. The thresholds $E_{th}$ and $T_{th}$ are given by the user based on the application of the system. To solve the LCDT problem, Algorithm 1 computes the required number of different types of EDs with given energy and delay constraints. We start by fixing the maximum number of EDs, i.e., we set $n_i$ to $n_{max}$ for $1 \le i \le k$. We consider the scenario described in the “The LCDT System” section. Algorithm 1 takes $C_i$, $E_i^o$, $V_i$, $E_{th}$, $T_{th}$, and $n_{max}$ as inputs, where $1 \le i \le k$. It then computes $E_n$ iteratively for the $n$th type of ED by using (7), where $1 \le n \le k$. If constraint 1 is satisfied, i.e., $E_n \le E_{th}$, the algorithm checks constraint 2, i.e., $T_n \le T_{th}$, by using (6). Algorithm 1 returns the number of EDs of each type that satisfies both constraints. Finally, Algorithm 1 calculates the cost of the system with the selected number of EDs of each type and returns the number of EDs that gives the minimum cost. The time complexity of the proposed algorithm is as follows: There are $1 + k$ for loops in the function Insert, resulting in a time complexity of $O(k \times n_{max} \times f_t)$, where $f_t$ is the time complexity of the function Insert. The function Compute Cost has a time complexity of $O(q) \times c$, where $q$ and $c$ are the number of times and the complexity of computing the cost, respectively. Thus, the computational complexity of the algorithm is $O(k \times n_{max} \times q \times (f_t + c))$, which is in polynomial time.

Example

Consider an LCDT system with the maximum number of devices $n_{max} = 10$ and two different types of EDs, i.e., $X_1$ and $X_2$. We fix the total number of instructions to be performed to 300. The cost, energy consumption, and processing speed of the different EDs are assumed to be in the ratios of 1:3, 1:4, and 10:1, respectively. The threshold values $E_{th}$, $T_{th}$, and $C_{th}$ are set to 1,500, 300, and 5,000, respectively. Algorithm 1 is implemented to find the minimum cost of the system. Initially, the algorithm computes the energy consumption and delay for all of the combinations of EDs of different types. Next, it finds the list of combinations of EDs that satisfy the system constraints. Finally, the system's cost is calculated for each such combination, and the algorithm selects the combination of EDs that gives the minimum cost. For the maximum number of 10 devices, the optimal cost found by Algorithm 1 is 11 with $X_1 = 5$ and $X_2 = 2$.

Discussion and Results

In this section, we illustrate the performance of the proposed system by using simulation results. The parameters considered for simulation are $X_1$ and $X_2$ types of EDs with cost, energy consumption, and processing speed in the ratios of 1:3, 1:4, and 10:1, respectively. For example, these ratios, selected by a market analysis, treat the type 1 ED as an Arduino and the type 2 ED as a Raspberry Pi. The cost of the Raspberry Pi is three times the cost of the Arduino; the energy consumption is four times higher; and the processing speed is 10 times that of the Arduino. The threshold values $E_{th}$, $T_{th}$, and $C_{th}$ are unit free and set initially to 1,500, 300, and 5,000, respectively. These threshold values may be varied depending on the scenario of the application. Figure 2(a) illustrates the impact of the number of instructions to be performed on the system cost. It shows that an increasing number of instructions increases the

Algorithm 1: The Solution of the LCDT Problem
Input: C_i, E_i^o, V_i, E_th, T_th, n_max
Output: q_1, …, q_k
for X_1 ← 1 to n_max do
    ⋮
        for X_k ← 1 to n_max do
            if E_n ≤ E_th and T_n ≤ T_th then
                {q_1, q_2, …, q_k} = Insert(X_1, X_2, …, X_k)
return q_1, q_2, …, q_k;
Function Insert(X_1, X_2, …, X_k)
begin
    Compute Cost = C_1 X_1 + C_2 X_2 + ⋯ + C_k X_k;
    if Cost < C_sys then q_1 = X_1, q_2 = X_2, …, q_k = X_k and C_sys = Cost;
    return q_1, q_2, …, q_k;
end


system cost. This is because the system uses a greater number of devices for performing an increased number of instructions within the given delay and energy thresholds. Figure 2(a) also shows the impact of delay on the system cost. We can minimize the system cost by increasing the delay threshold for an increased number of instructions. The delay threshold $T_{th}$ was varied from 300 to 1,000. These values can be adjusted based on the requirements of the use case. The results show that the cost of the system depends on the delay threshold, number, and cost of different types of devices.

Figure 2(b) and (c) demonstrate the impact of the number of instructions on the number of devices. Figure 2(b) shows that the number of type 1 devices increases with respect to the number of instructions to be performed. It is also observed that by increasing the delay threshold, we can minimize the number of devices which, in turn, minimizes the cost of the system. The impact of the number of instructions on type 2 devices is shown in Figure 2(c). Type 2 devices also increase in number with respect to the number of instructions, but we can see a very minimal increase compared to type 1 devices. This is because the cost of type 2 devices is higher than that of type 1 devices, so the system considers fewer type 2 devices to minimize the system cost.

Conclusion and Future Work

In this article, an LCDT method for smart building data is proposed. Compression–decompression models based on DL estimate the energy and communication delay for

Figure 2. An illustration of the effect of the number of instructions (100 to 1,000) on the system cost and the required number of devices with different delay thresholds ($T_{th}$ = 300, 500, 700, and 1,000). (a) The cost of the system. (b) The required number of type 1 devices. (c) The required number of type 2 devices.


sensory data. A novel approach to implementing a communication channel entails creating a DL model to minimize transmission error. The best combination of EDs needed to build an LCDT system is determined using an algorithm that is described. The experimental findings demonstrate that the system’s cost, which is constrained by energy and latency considerations, can be decreased by using a fixed number of distinct EDs. Future research directions include expanding the analysis to take into account various performance-enhancing characteristics. The development of a channel-net with higher performance is highlighted as a next step in reducing interference and transmission errors in LoRa communication.

About the Authors

B Shilpa ([email protected]) is a research scholar with the Department of Electronics and Communication Engineering, Faculty of Science and Technology, IFHE, Hyderabad 501203, India. Her research interests include wireless communication, wireless sensor networks, and the Internet of Things.

Hari Prabhat Gupta ([email protected]) is an assistant professor in the Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India. His research interests include wireless sensor networks, distributed algorithms, and the Internet of Things.

Rajesh Kumar Jha ([email protected]) is an assistant professor in the Department of Electronics and Communication Engineering, Faculty of Science and Technology, IFHE, Hyderabad 501203, India. His research interests include very large scale integration and the Internet of Things.

References

[1] B. Qolomany et al., “Leveraging machine learning and big data for smart buildings: A comprehensive survey,” IEEE Access, vol. 7, pp. 90,316–90,356, Jul. 2019, doi: 10.1109/ACCESS.2019.2926642.

[2] P. Kumari, H. P. Gupta, and T. Dutta, “A nodes scheduling approach for effective use of gateway in dense LoRa networks,” in Proc. ICC IEEE Int. Conf. Commun. (ICC), 2020, pp. 1–6, doi: 10.1109/ICC40277.2020.9149006.

[3] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A survey on enabling technologies, protocols, and applications,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 2347–2376, 4th quarter 2015, doi: 10.1109/COMST.2015.2444095.

[4] J. C. Liando, A. Gamage, A. W. Tengourtius, and M. Li, “Known and unknown facts of LoRa: Experiences from a large-scale measurement study,” ACM Trans. Sens. Netw., vol. 15, no. 2, pp. 1–35, May 2019, doi: 10.1145/3293534.

[5] E. D. Ayele, C. Hakkenberg, J. P. Meijers, K. Zhang, N. Meratnia, and P. J. M. Havinga, “Performance analysis of LoRa radio for an indoor IoT applications,” in Proc. Int. Conf. Internet Things Global Commun. (IoTGC), 2017, pp. 1–8, doi: 10.1109/IoTGC.2017.8008973.

[6] W. Xu, J. Y. Kim, W. Huang, S. S. Kanhere, S. K. Jha, and W. Hu, “Measurement, characterization, and modeling of LoRa technology in multifloor buildings,” IEEE Internet Things J., vol. 7, no. 1, pp. 298–310, Jan. 2020, doi: 10.1109/JIOT.2019.2946900.

[7] J. Petäjäjärvi, K. Mikhaylov, M. Hämäläinen, and J. H. Iinatti, “Evaluation of LoRa LPWAN technology for remote health and wellbeing monitoring,” in Proc. 10th Int. Symp. Med. Inf. Commun. Technol. (ISMICT), 2016, pp. 1–5, doi: 10.1109/ISMICT.2016.7498898.

[8] J. Haxhibeqiri, A. Karaagac, F. V. D. Abeele, W. Joseph, I. Moerman, and J. Hoebeke, “LoRa indoor coverage and performance in an industrial environment: Case study,” in Proc. 22nd IEEE Int. Conf. Emerg. Technol. Factory Automat. (ETFA), 2017, pp. 1–8, doi: 10.1109/ETFA.2017.8247601.

[9] L. Gregora, L. Vojtech, and M. Neruda, “Indoor signal propagation of LoRa technology,” in Proc. 17th Int. Conf. Mechatronics - Mechatronika (ME), 2016, pp. 1–4.

[10] D. Croce, M. Gucciardo, S. Mangione, G. Santaromita, and I. Tinnirello, “LoRa technology demystified: From link behavior to cell-level performance,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 822–834, Feb. 2020, doi: 10.1109/TWC.2019.2948872.

[11] P. Kumari, H. P. Gupta, and T. Dutta, “Estimation of time duration for using the allocated LoRa spreading factor: A game-theory approach,” IEEE Trans. Veh. Technol., vol. 69, no. 10, pp. 11,090–11,098, Oct. 2020, doi: 10.1109/TVT.2020.3007566.

[12] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017, doi: 10.1109/TCCN.2017.2758370.

[13] T. J. O’Shea, K. Karra, and T. C. Clancy, “Learning to communicate: Channel auto-encoders, domain specific regularizers, and attention,” in Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), 2016, pp. 223–228, doi: 10.1109/ISSPIT.2016.7886039.

[14] H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANS as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020, doi: 10.1109/TWC.2020.2970707.

[15] S. Dörner, S. Cammerer, J. Hoydis, and S. t. Brink, “Deep learning based communication over the air,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018, doi: 10.1109/JSTSP.2017.2784180.

[16] D. Wu, M. Nekovee, and Y. Wang, “Deep learning-based autoencoder for m-user wireless interference channel physical layer design,” IEEE Access, vol. 8, pp. 174,679–174,691, Sep. 2020, doi: 10.1109/ACCESS.2020.3025597.

[17] I. Sülo, S. R. Keskin, G. Dogan, and T. Brown, “Energy efficient smart buildings: LSTM neural networks for time series prediction,” in Proc. Int. Conf. Deep Learn. Mach. Learn. Emerg. Appl. (Deep-ML), 2019, pp. 18–22, doi: 10.1109/Deep-ML.2019.00012.

[18] I. Abdennadher, N. Khabou, I. B. Rodriguez, and M. Jmaiel, “Designing energy efficient smart buildings in ubiquitous environments,” in Proc. 15th Int. Conf. Intell. Syst. Design. Appl. (ISDA), 2015, pp. 122–127, doi: 10.1109/ISDA.2015.7489212.

[19] X. Huang and S. Zhou, “Dynamic compression ratio selection for edge inference systems with hard deadlines,” IEEE Internet Things J., vol. 7, no. 9, pp. 8800–8810, Sep. 2020, doi: 10.1109/JIOT.2020.2997128.

[20] S. Tripathi and S. De, “An efficient data characterization and reduction scheme for smart metering infrastructure,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4300–4308, Oct. 2018, doi: 10.1109/TII.2018.2799855.

[21] D. García Martí, J. Palacios Beltrán, J. O. Lacruz, and J. Widmer, “A mixture density channel model for deep learning-based wireless physical layer design,” in Proc. 23rd Int. ACM Conf. Model., Anal. Simul. Wireless Mobile Syst. (MSWiM), 2020, pp. 53–62, doi: 10.1145/3416010.3423229.


Conference Reports

by Qi Kang

and Shuaiyu Yao

The 19th IEEE International Conference on Networking, Sensing, and Control

Digital Object Identifier 10.1109/MSMC.2023.3273460
Date of current version: 17 July 2023

The 19th IEEE International Conference on Networking, Sensing, and Control (ICNSC 2022) was held between 15 and 18 December 2022 in Shanghai, China. ICNSC 2022 was hosted by the IEEE Systems, Man, and Cybernetics Society; Tongji University (China); Fudan University (China); and Shanghai Association for System Simulation (China). It was supported by the K.C. Wong Education Foundation, Hong Kong, China. The theme of this conference was “autonomous intelligent systems,” focusing on intelligent control, machine learning, deep learning, network communication, multiagent systems, Internet of Things, and swarm intelligence. Following this theme, the conference provided a platform for both academic researchers and industrial practitioners involved in different but related domains to discuss key problems, exchange ideas, and tackle emerging challenges, while sharing innovative solutions and looking into future research prospects. The conference was held in a hybrid format with online and in-person attendance. A total of 211 papers were submitted to the conference, out of which 144 were selected based

Figure 1. Some attendees of ICNSC 2022.

on a rigorous single-blind peer review for oral presentation. This indicates a paper acceptance rate of approximately 68.2%. The accepted papers have been included in Proceedings of 2022 IEEE International Conference on Networking, Sensing, and Control, which have now been published in IEEE Xplore and indexed by Engineering Index COMPENDEX. Notably, the authors hailed from various countries, including China, the United States, Japan, Canada, France, Italy, and The Netherlands. ICNSC 2022 was successfully held as a multinational and multidisciplinary conference that provided scientists, engineers, and students with a platform to convene and discuss their shared interests (Figures 1 and 2), thanks to the collaborative efforts of the organizing, program, and steering committees; the authors who submitted exceptional papers; and the reviewers who examined the papers and provided many insightful comments. The program agenda of the conference encompassed various technical activities, including a plenary session,

four keynote speeches, a best paper award session, and 28 parallel panel sessions that featured eight special sessions. The plenary session kicked off with opening remarks delivered by Prof. Xiaohua Tong, vice president of Tongji University (Figure 3); Prof. Mengchu Zhou, chair of the ICNSC Steering Committee (Figure 4); and Prof. Qi Kang, general chair of ICNSC 2022 (Figure 5). The conference featured keynote speeches from renowned experts (shown in the following paragraphs), whose thought-provoking ideas set the tone for the event. These speakers captivated the audience with their visionary outlook and provided inspiring insights into the future of networked systems and control.

1) Prof. Peng Shi, editor-in-chief of IEEE Transactions on Cybernetics, who is from the University of Adelaide, Australia, gave a presentation titled “Consensus and Formation Control for Multi-agent Systems.” Prof. Shi’s presentation focused on multiagent systems (MASs) and highlighted their key features of communication, coordination, and collaboration for achieving common goals effectively and efficiently. The presentation covered three main topics: consensus, flocking/swarming, and formation control within MASs. Consensus, a fundamental problem in MASs, was explored as a requirement for cooperation among agents. Flocking, a self-organizing behavior inspired by lower-intelligence animals, enables

Figure 2. The conference site of ICNSC 2022.

Figure 3. The opening remarks delivered by Prof. Xiaohua Tong, vice president of Tongji University.

Figure 4. The opening remarks delivered by Prof. Mengchu Zhou, chair of the ICNSC Steering Committee.


the emergence of swarm intelligence to enhance system survivability and competitiveness. Additionally, formation control aims to drive agents toward desired scalable and adaptable formations. Prof. Shi’s presentation included modeling analysis, design, simulations, and experimental examples to showcase the potential of distributed schemes

Figure 5. The opening remarks delivered by Prof. Qi Kang, general chair of ICNSC 2022.

Figure 6. The keynote speech provided by Prof. Peng Shi.

Figure 7. The keynote speech provided by Prof. Ke Tang.

in achieving consensus and formation control (Figure 6).

2) Prof. Ke Tang from Southern University of Science and Technology, China, introduced “Learn to Optimize” to the audience of the conference. Prof. Tang’s speech focused on the automation of algorithm design to address complex real-world optimization problems. Off-the-shelf algorithms and tools are inadequate for these problems, requiring extensive prior knowledge and manual algorithm design efforts. The concept of learn to optimize (L2O), a data-driven approach for automated algorithm and solver design, was introduced. The speech discussed the building blocks and recent advancements in L2O, along with successful case studies. Future directions in this field were also presented (Figure 7).

3) Prof. Zhi Wei from the New Jersey Institute of Technology, USA, provided a talk titled “Deep Autoencoders for Analysis of Single-Cell RNA Sequencing Data.” Prof. Wei’s talk focused on clustering analysis, specifically in the context of single-cell RNA sequencing (scRNA-seq) studies. Traditional clustering methods often overlook the unique characteristics of scRNA-seq data and fail to utilize prior information or filter out irrelevant genes during the clustering process. To overcome these limitations, Prof. Wei proposed the use of model-based deep autoencoders. These novel methods aim to address the identified

issues and enhance clustering performance. Through extensive experiments on both simulated and real datasets, the proposed methods demonstrate a significant improvement in clustering performance, leading to the generation of biologically meaningful clusters (Figure 8).

4) Prof. Tadahiko Murata from Kansai University, Japan, delivered a presentation titled “Synthetic Societal Data (Synthetic Population + Basic Behavioral Data).” Prof. Murata’s presentation focused on real-scale social simulations for specific communities such as cities, towns, and villages. With the COVID-19 pandemic, researchers are developing social simulations for countermeasures against the virus. To develop such simulations, synthetic populations have been synthesized based on publicly released statistics without containing any privacy information. Prof. Murata’s research outcome enables the generation of synthetic societal data, which include household compositions and basic behavioral data, facilitating the development of real-scale social simulations for emergency and peaceful times (Figure 9).

The parallel sessions allowed researchers to delve into specific subtopics, fostering focused discussions on areas such as autonomous agents and multiagent, continual learning, cyberphysical systems, edge computing, heterogeneous wireless networks, Internet of Things, networked control systems, smart civil aviation and

Figure 8. The keynote speech provided by Prof. Zhi Wei.

Figure 9. The keynote speech provided by Prof. Tadahiko Murata.

Figure 10. The offline parallel sessions.


aerospace, swarm intelligence, and transfer learning. The researchers presented their latest discoveries and breakthroughs, sparking intense debates and encouraging the exchange of different perspectives (Figures 10 and 11).

Figure 11. The online parallel sessions.

In addition, ICNSC 2022 was composed of eight special sessions that addressed a diverse range of topics, including
◆ Modeling, analysis, and control of resource allocation systems
◆ A connected and autonomous mobility system for energy and environmental sustainability
◆ Artificial intelligence for IT operations
◆ Deep learning and optimization for distributed industrial systems
◆ An evolutionary algorithm for big data applications
◆ Data-driven estimation in industrial scenarios
◆ Latent representation learning for incomplete big data
◆ Transfer perception and control in real robotic applications.

The discussions of these sessions brought together experts and attendees to tackle challenging issues and address the emerging trends in the field. These sessions facilitated dynamic conversations, where ideas were rigorously examined, and diverse viewpoints were respectfully debated. Attendees eagerly explored interesting topics, engaging in deep conversations, sharing feedback, and exploring potential collaborations. These interactions not only enriched the knowledge of the attendees but also nurtured a sense of community and camaraderie.

After a series of oral presentation competitions, a total of five papers were chosen from the pool of candidate papers to receive the prestigious accolades and best paper awards of ICNSC 2022. Specifically, these awards included two best conference paper awards, two best student paper awards, and one best emerging technology paper award. The winners of the best paper awards are listed as follows:
1) The winners of the best conference paper awards:
◆ “Heuristic scheduling method of flexible manufacturing based on Petri nets and artificial potential field” by Sijia Yi et al.
◆ “Open the black box of recurrent neural network by decoding the internal dynamics” by Jiacheng Tang et al.
2) The winners of the best student paper awards:
◆ “Design and implementation of autonomous mapping system for UGV based on lidar” by Xiaohong Xu et al.
◆ “Detection transformer: Ultrasonic echo signal inclusion detection with transformer” by Xiaoxin Fang et al.
3) The winner of the best emerging technology paper award:
◆ “Design of resilient supervisory control for autonomous connected vehicles approaching unsignalized intersection in presence of communication delays” by Carlo Motta et al.

The attendees of ICNSC 2022 experienced a vibrant and intellectually stimulating environment, fostering lively and in-depth exchanges and discussions. The conference attracted a diverse group of academicians, researchers, industry experts, and students from around the globe, and all of them were eager to share their knowledge and insights in the field of networking, sensing, and control. Throughout the conference, the participants actively engaged in various sessions and presentations, each offering unique perspectives and cutting-edge research findings. The atmosphere was characterized by a palpable enthusiasm and a genuine passion for advancing the field of networking, sensing, and control. In addition, the social events, including receptions, banquets, and networking breaks, offered valuable opportunities for participants to forge new connections, foster collaborations, and establish lasting friendships. In these informal settings, participants engaged in lively conversations, sharing their experiences, exchanging ideas, and exploring potential joint projects.

For more information about ICNSC 2022, including details about the conference program, keynote speeches, and special sessions, please visit the official website of the conference: http://www.icnisc2022.com/. The upcoming conference, ICNSC 2023, will take place in the captivating city of Marseille, France, which is renowned for its rich cultural heritage, breathtaking landscapes, and vibrant atmosphere.

The 1st IEEE International Summer School on E-CARGO and Applications (Online)

July 16-21, 2023 http://www.e-cargoschool.com/

Sponsors:
• IEEE Systems, Man, and Cybernetics Society

Organizer:
• Technical Committee of Distributed Intelligent Systems

Co-Organizers:
• Technical Committee of Computer-Supported Cooperative Work in Design
• Guangdong Chapter
• Nipissing University, Canada

Acknowledgement:
• Jinling Institute of Technology, China

Goal: The Environments-Classes, Agents, Roles, Groups, and Objects (E-CARGO) model is an abstract model for complex systems. It has been successfully applied in different applications. It has numerous potentials to promote investigations into academic and industry problems. It fits the SMCS requirements of initiatives. Role-Based Collaboration (RBC) and its E-CARGO model have been developed into a powerful tool for investigating collaboration and complex systems. Related research has brought and will bring in exciting improvements to the development, evaluation, and management of systems including collaboration, services, clouds, productions, and administration systems. E-CARGO assists scientists and engineers in formalizing abstract problems, which originally are taken as complex problems, and finally points out solutions to such problems including programming. The E-CARGO model possesses all the preferred properties of a computational model. It has been verified by formalizing and solving significant problems in collaboration and complex systems, e.g., Group Role Assignment (GRA). With the help of E-CARGO, the methodology of RBC can be applied to solve various real-world problems. E-CARGO itself can be extended to formalize abstract problems as innovative investigations in research. On the other hand, the details of each E-CARGO component are still open for renovations for specific fields to make the model easily applied. For example, in programming, we need to specify the primitive elements for each component of E-CARGO. When these primitive elements are well-specified, a new type of modeling or programming language can be developed and applied to solve general problems with software design and implementations. This summer school will extend the applications of E-CARGO and RBC, which promote problem solving for complex systems that are considered in SMCS, such as Cybernetics, Systems Science and Engineering, Human-Machine Systems, and Computational Social Systems.

Motivation: In the field of Systems, Man, and Cybernetics (SMC), many researchers require solid tools to develop their methodologies or solutions to their specific problems in their specific areas. There are many traditional tools for specific areas, such as object or agent models, deep learning, evolutionary computation, or evolutionary optimizations. However, these methodologies and models have their own limitations. Researchers are eager to have high-level, abstract, but expressive models and methodologies to guide them in understanding the requirements of their specific problems, which are usually very complex. It is very hard for them to grasp the key elements to analyze their problems, specify the requirements, and design a feasible solution. E-CARGO is a novel model to meet the requirements of researchers in this aspect. Using E-CARGO, researchers master a tool to start to investigate a problem along an easy-to-follow route and can gradually delve into the details of the system or problem they are mainly concerned about. Such a tool helps them to understand their problems or systems in an adaptive and incremental way. In the summer school, we will demonstrate through lectures and labs many successful stories and case studies for researchers to learn, follow, and practice. The SMC Society encourages interdisciplinary research and innovations and is a reputable technology incubator. It is the SMC Society that makes E-CARGO develop, expand, and mature.

Digital Object Identifier 10.1109/MSMC.2023.3275041

Attendees: This school is open to anyone with some familiarity with abstract mathematical structures who wants to learn about the E-CARGO model and RBC theory. Our goal is to make the E-CARGO/RBC theory accessible to, and inclusive of, everyone who is interested. We believe that E-CARGO is for everyone, and we are committed to fostering a kind, inclusive environment. In our experience, fourth-year undergraduates, graduate students (master's and PhD), and early-career researchers and practitioners in STEM fields are the best fit.

Registration: Registration includes:
1) Five days (10 sessions) of online participation in the summer school program.
2) A certificate for registered attendees who attend at least seven sessions.
3) An author-signed hardcopy book for the top 10 students, and a hardcopy book for the students ranked 11-50 in performance (value: $170, including shipping cost): H. Zhu, E-CARGO and Role-Based Collaboration: Modeling and Solving Problems in the Complex World, Wiley-IEEE Press, NJ, USA, Dec. 2021.
Note: Budget permitting, we will also send out more books (51-?). These will be allocated by registration time, i.e., first in, first served (FIFS).
IEEE SMC student member: $50 CAD
IEEE SMC member: $50 CAD
IEEE student member: $85 CAD
IEEE member: $120 CAD
Non-IEEE student: $120 CAD
Non-IEEE member: $190 CAD

Organization Committee:
General Chair: Haibin Zhu, Nipissing University, Canada
Program Co-Chairs: Dongning Liu, Guangdong University of Technology, China; Yin Sheng, Hohai University, China
Registration Co-Chairs: Xianjun Zhu, Jinling Institute of Technology, China
Publicity Co-Chairs: Hua Ma, Hunan Normal University, China; Libo Zhang, Southwest University, China
Instructors: Haibin Zhu, Nipissing University, Canada; Dongning Liu, Guangdong University of Technology, China; Yin Sheng, Hohai University, China
Lab Instructor: Qian Jiang, Macau University of Science and Technology, China
Secretary: Chengyu Peng, Laurentian University, Canada
Contact: [email protected]

Confirmed Panelists:
Sam Kwong, IEEE Fellow, Chair Professor, City University of Hong Kong, President, IEEE SMC Society
Mariagrazia Dotoli, Professor, Politecnico di Bari, Vice President – Membership & Student Activities, IEEE SMC Society
Ljiljana Trajkovic, IEEE Fellow, Professor, Simon Fraser University, EiC, IEEE Transactions on Human-Machine Systems
Peng Shi, IEEE Fellow, Professor, University of Adelaide, EiC, IEEE Transactions on Cybernetics
Robert Kozma, IEEE Fellow, Professor, University of Memphis, EiC, IEEE Transactions on Systems, Man, and Cybernetics: Systems
Weiming Shen, IEEE Fellow, Professor, Huazhong University of Science and Technology,