Proceedings of Asia Pacific Computer Systems Conference 2021: APCS 2021 9811979030, 9789811979033

This book contains select proceedings papers from the Asia Pacific Computer Systems Conference (APCS 2021). The contents


English Pages 172 [173] Year 2023


Table of contents:
Organizing Committee
Preface
Contents
About the Editors
Computer and Automation Systems
Optimal Design of Rail Support Structure for Mountain Rail Transport Vehicle
1 Introduction
2 Finite Element Theory
2.1 Establishment of Finite Element Models
2.2 Simulation Analysis of Support Mode
2.3 Simulation Analysis of Support Height
2.4 Simulation Analysis of High-Strength Materials
3 Conclusion
References
Hazardous Behavior Recognition Based on Multi-Model Fusion
1 Introduction
2 Ease of Use
3 Dataset
4 Proposed Method
5 Experiments
6 Conclusion
References
A Study on Intelligent Agricultural Monitoring System Based on Internet of Things
1 Introduction
2 Theoretical Basis
2.1 The Concept of IoT
2.2 The Concept of Smart Agriculture
2.3 Smart Agriculture System Structure
2.4 Key Technologies of Smart Agriculture
3 Research Status at Home and Abroad
3.1 Current Status of Foreign Research
3.2 Domestic Research Status
4 The Overall Scheme Design of the System
4.1 Analysis of System Requirements
4.2 System Functions
4.3 System Scheme and Architecture Design
5 Conclusion
References
Design of Ship Abnormal Behavior Recognition System for Intelligent Maritime Supervision
1 Introduction
2 Monitoring Technology of Intentional Shutdown of Ship-Borne AIS Radio Station
2.1 Shutdown Identification Based on the Distribution Rule of Ship-Borne AIS Signal Strength
2.2 Anomaly Identification Based on Ship-Borne AIS “Shutdown Event” Model
3 The Reconnaissance Technology of Illegally Occupying the Working Channel of the Ship-Borne AIS
3.1 Communication Status of Working Channel of Ship-Borne AIS Equipment
3.2 Abnormal Identification of Ship-Borne AIS Working Level Signal
4 Conclusion
References
Phase Recovery Algorithm Based on Intensity Transport Equation and Angular Spectrum Iteration
1 Introduction
2 Phase Retrieval Algorithm
2.1 Intensity Transport Equation (TIE)
2.2 Iterative Angular Spectrum Algorithm (IASA)
2.3 Improved Algorithm
3 Simulation Experiment Results
3.1 IASA Simulation Experiment
3.2 Improved Algorithm Simulation Experiment
4 Conclusion
References
Advanced Computer Science and Information Management
Debias in Deep Learning Recommender System
1 Introduction
2 Application of Statistical Learning in Recommendation System
2.1 Collaborative Filtering (CF)
2.2 Matrix Factorization (MF)
2.3 Factorization Machine (FM)
2.4 Field-Aware Factorization Machine (FFM)
2.5 Gradient Boosting Decision Tree (GBDT) with Logistic Regression (LR)
3 Application of Deep Learning in Recommendation System
3.1 The AutoRec Model
3.2 The NeuralCF Model
4 Debiasing on Recommender System
4.1 Propensity Score
4.2 Inverse Propensity Score (IPS)
4.3 Doubly Robust Model
5 Conclusion
References
Entity Relationship Extraction Based on Knowledge Graph Knowledge
1 Introduction
2 Related Research
3 Model of This Paper
4 Technical Route
4.1 Named Entity Recognition by Fusing Multi-feature Embedding
4.2 Entity Relationship Extraction Research in Novel Scenes
4.3 Research on Novel Character Relationships Based on Knowledge Graph
5 Conclusion
References
Algorithm Research and Program Implementation of Q Matrix Eigenvalue
1 Introduction
2 Definition and Lemma
3 Eigenvalue Dichotomy Method for Solving Q Matrix
4 Numerical Experiment and Programming
4.1 Using Python Command to Find the Eigenvalue of Q Matrix
4.2 Using Java Language to Design Q Matrix Dichotomy Program
5 Conclusions
References
Proposal for an Employee Management Support System for Regional Public Transportation Based on Health Data
1 Introduction
2 Regional Public Transport in Japan
2.1 Major and Regional Public Transport
2.2 Aging of Employees
2.3 Operation Management
3 Related Works
3.1 Employee Health Care
3.2 Privacy Data Management
4 Proposal of the System
4.1 Encryption of Personal Data
4.2 Design of the Applications
4.3 Application for Roll Callers
4.4 Application for Managers
4.5 Application for Employees
4.6 Design of the Backend System
5 Implementation and Evaluation
5.1 Implementation of the System
5.2 Evaluation
6 Conclusion
References
Gamification to Grow Motivation for Interactive Engagement of Health Nurses in Using Health Information Systems: A Conceptual Framework
1 Introduction
2 Literature Study
2.1 Health Information System
2.2 Required Use of the System
2.3 Motivation and Gamification Strategy
3 Research Method
4 Results
4.1 Game Features and Motivating Needs
4.2 Drivers of Nurse Motivation
4.3 Gamification Effect
4.4 Nurse Involvement in Information System as a Gamification
4.5 Patient-Centered Healthcare and Quality Improvement
5 Conclusions
6 Limitations
References
Analysis of the Influence of System Quality, Information Quality, and Service Quality of PBB
1 Introduction
2 Literature Review
2.1 Local Tax
2.2 Rural and Urban Land and Building Tax
2.3 Tax Compliance
2.4 DeLone and McLean
2.5 Application
2.6 System Quality
2.7 Information Quality
2.8 Service Quality
2.9 Hypothesis
3 Method
3.1 Outer Model
3.2 Inner Model
3.3 Variable Operations
4 Results and Discussions
4.1 Results
4.2 Discussions
4.3 The Effect of IPBB Application Service Quality on Taxpayer Compliance
5 Conclusions
References
Effects of Different Normalization on the ESRGAN
1 Introduction
2 Method
2.1 ESRGAN
2.2 Batch Normalization (BN)
2.3 Layer Normalization (LN)
2.4 Instance Normalization (IN)
2.5 Group Normalization (GN)
2.6 Representative Batch Normalization (RBN)
3 Experiment
4 Conclusion
References
Early Detection of Mental Disorder Via Social Media Posts Using Deep Learning Models
1 Introduction
2 Data Collection and Preprocessing
3 Methods
3.1 Random Predictor Model
3.2 Sentence2vec
3.3 Deep Learning LSTM Model
3.4 Pre-trained BERT
3.5 Fine-Tuned BERT
4 Experiments
4.1 Quantitative Evaluations
4.2 Qualitative Analysis
5 Conclusions and Future Work
References
Design of PLC Training Platform Based on Digital Twin
1 Introduction
2 Current Situation of PLC Course Training Teaching
2.1 Large Investment in Training Equipment
2.2 Training Space and Time Are Limited
2.3 Lack of Learning Initiative
3 Construction and Exploration of Digital Twin PLC Course
3.1 To Meet the Upgrading Requirements of Manufacturing Industry
3.2 Online and Offline Teaching Mode
3.3 Establish a Whole-Process Evaluation Mechanism
4 Design and Construction of Digital Twin Platform
4.1 Platform Tool Selection
4.2 NX MCD to Create 3D Scene
4.3 Connection Between PLC Program and MCD
5 Summary
References


Lecture Notes in Electrical Engineering 978

Anu Gokhale Emanuel Grant   Editors

Proceedings of Asia Pacific Computer Systems Conference 2021 APCS 2021

Lecture Notes in Electrical Engineering Volume 978

Series Editors Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China Gianluigi Ferrari, Università di Parma, Parma, Italy Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt Torsten Kroeger, Stanford University, Stanford, CA, USA Yong Li, Hunan University, Changsha, Hunan, China Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain Tan Cher Ming, College of Engineering, 
Nanyang Technological University, Singapore, Singapore Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany Subhas Mukhopadhyay, School of Engineering and Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan Luca Oneto, Department of Informatics, BioEngineering, Robotics and Systems Engineering, University of Genova, Genova, Genova, Italy Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre“, Rome, Italy Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China Walter Zamboni, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy Junjie James Zhang, Charlotte, NC, USA

The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering—quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and applications areas of electrical engineering. The series covers classical and emerging topics concerning:

• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS

For general information about this book series, comments or suggestions, please contact [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country: China Jasmine Dou, Editor ([email protected]) India, Japan, Rest of Asia Swati Meherishi, Editorial Director ([email protected]) Southeast Asia, Australia, New Zealand Ramesh Nath Premnath, Editor ([email protected]) USA, Canada Michael Luby, Senior Editor ([email protected]) All other Countries Leontina Di Cecco, Senior Editor ([email protected]) ** This series is indexed by EI Compendex and Scopus databases. **

Anu Gokhale · Emanuel Grant Editors

Proceedings of Asia Pacific Computer Systems Conference 2021 APCS 2021

Editors Anu Gokhale Department of Computer Information Systems St. Augustine’s University Raleigh, NC, USA

Emanuel Grant University of North Dakota Grand Forks, ND, USA

Illinois State University Normal, Illinois, USA

ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-981-19-7903-3 ISBN 978-981-19-7904-0 (eBook) https://doi.org/10.1007/978-981-19-7904-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Organizing Committee

Conference Chairs
Anu Gokhale, St. Augustine’s University, USA
Mohsen Guizani, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE (IEEE Fellow)

Program Chairs
Emanuel Grant, University of North Dakota, USA
Sen Zhang, State University of New York College at Oneonta, USA
Marvin Schneider, DeVry University, USA
Abdel-Badeeh M. Salem, Ain Shams University, Egypt

Program Co-chairs
Valeriy Gavrishchaka, Applied Quantitative Solutions for Complex Systems, USA
Sule Yildirim Yayilgan, Norwegian University of Science and Technology, Norway
Don Zinger, Northern Illinois University, USA

Regional Chair
Shawon Rahman, University of Hawaii-Hilo, USA


Special Session Chairs
Jenny Li, Kean University, USA
Yulia Kumar, Kean University, USA

Publicity Chair
Sujoy Chakraborty, Stockton University, USA

Technical Program Committee
Michiko Tsubaki, The University of Electro-Communications, Japan
Mohammed A. Gharawi, Institute of Public Administration, Saudi Arabia
Aboamama Atahar Ahmed, Galactic Bridge Research and Technology Center, Malaysia
Jian-Ao Lian, Prairie View A&M University, USA
Youssif Al-Nashif, Florida Polytechnic University, USA
Chui Young Yoon, Institute of On Kwang Technology Research, South Korea
Driss Kettani, Al-Akhawayn University, Morocco
Mehmet Karaata, Kuwait University, Kuwait
Filippo Sanfilippo, University of Agder (UiA); Oslo Metropolitan University (OsloMet)
Gabriel Jimenez, Pontifical Catholic University of Peru, Peru
Hanku Lee, Minnesota State University Moorhead, USA
Matthieu Martel, University of Perpignan Via Domitia, France
Muhammad Sarfraz, Kuwait University, Kuwait
Nai-Wei Lo, National Taiwan University of Science and Technology, Taiwan
Robinson Jiménez-Moreno, Militar Nueva Granada University, Colombia
Smain Femmam, Strasbourg University of Haute Alsace, France
Toshiro Minami, Kyushu Institute of Information Sciences, Japan
Chin-Shiuh Shieh, National Kaohsiung University of Science and Technology, Taiwan
Chunshan Yang, Guilin University of Aerospace Technology, China
Erica Teixeira Gomes de Sousa, Federal Rural University of Pernambuco, Brazil
Francesco Colace, University of Salerno, Italy
Muazzam A. Khan Khattak, University of Missouri Kansas City, USA
Muthoni Masinde, Central University of Technology, South Africa
Tian Song, Tokushima University, Japan
Doyel Pal, LaGuardia Community College, USA
Phoey Lee, Sunway University, Malaysia


Eyüp Burak Ceyhan, Bartin University, Turkey
Fedra Trujillano, Pontifical Catholic University of Peru, Peru
Liming Zhang, University of Macau, Macau
Md Faisal Kabir, Pennsylvania State University-Harrisburg, USA
Nelly Elsayed, University of Cincinnati, USA
Athar Khodabakhsh, Norwegian University of Science and Technology, Norway
Bharath Reddy, Process Automation R&D, Schneider-Electric, USA
Federico Reghenzani, Politecnico di Milano (POLIMI), Italy
Xin Fang, Northeastern University, USA
R. Shekhar, Alliance University, India


Preface

It is our pleasure to introduce you to the Proceedings of the 2021 Asia Pacific Computer Systems Conference (APCS 2021), which was held virtually to minimize the risk of COVID-19. The safety and well-being of all participants are our top priority. At the same time, we strive to provide this long-awaited conference for many scholars and researchers to conduct academic exchanges with colleagues. APCS 2021 provides a scientific platform for both local and international scientists, engineers and technologists who work in all aspects of Computer Systems. In addition to the contributed papers, internationally known experts from several countries were also invited to deliver keynote and invited speeches at APCS 2021. The volume includes 14 selected papers which were submitted to the conference from universities, research institutes and industries. Each contributed paper has been peer-reviewed by reviewers who were program committee and technical committee members as well as other experts in the field from different countries. The proceedings are intended to present to the readers the newest research results and findings in the field of Computer Systems. Much of the credit for the success of the conference is due to the topic coordinators, who devoted their expertise and experience to promoting the conference as well as to general coordination, organization and operation of the activities during the conference. The coordinators of the various session topics devoted considerable time and energy to soliciting papers from relevant researchers for presentation at the conference. The chairpersons of the different sessions played an important role in conducting the proceedings of each session in a timely and efficient manner, and on behalf of the conference committee, we express sincere appreciation for their involvement.
The reviewers of the manuscripts, who by tradition remain anonymous, have also been very helpful, efficiently reviewing the manuscripts and providing valuable comments well within the time allotted to them. We express our sincere gratitude to all reviewers.


We strongly believe that APCS 2021 will channel intellectual resources relevant to Computer Systems and provide a roadmap for further studies, investigations, analyses, and applications in this field.

Anu Gokhale, Ph.D. Professor and Chair Department of Computer Information Systems Saint Augustine’s University Raleigh, NC, USA Distinguished Professor Emerita Illinois State University Normal, Illinois, USA

Contents

Computer and Automation Systems

Optimal Design of Rail Support Structure for Mountain Rail Transport Vehicle  3
Zhu Chen, Tao Lin, Hong Chen, Bingfeng Bai, and Guohong Zhang

Hazardous Behavior Recognition Based on Multi-Model Fusion  15
Bingyi Zhang, Bincheng Li, and Yuhan Zhu

A Study on Intelligent Agricultural Monitoring System Based on Internet of Things  23
Fangfang Jiang

Design of Ship Abnormal Behavior Recognition System for Intelligent Maritime Supervision  33
Yangyang Wu, Hailin Zheng, and Nibin Huang

Phase Recovery Algorithm Based on Intensity Transport Equation and Angular Spectrum Iteration  43
Haiyan Wu

Advanced Computer Science and Information Management

Debias in Deep Learning Recommender System  55
Jialin He, Kenan Li, Haonan Yao, and Haoqiang Kang

Entity Relationship Extraction Based on Knowledge Graph Knowledge  65
Dengyun Zhu and Hongzhi Yu

Algorithm Research and Program Implementation of Q Matrix Eigenvalue  73
Weida Qin, Xinqi Wang, and Ricai Luo

Proposal for an Employee Management Support System for Regional Public Transportation Based on Health Data  85
Toshihiro Uchibayashi, Chinasa Sueyoshi, Hideya Takagi, Yoshihiro Yasutake, and Kentaro Inenaga

Gamification to Grow Motivation for Interactive Engagement of Health Nurses in Using Health Information Systems: A Conceptual Framework  99
Faisal Binsar, Ignatius Edward Riantono, Rano Kartono, Agustinus Bandur, and Wibowo Kosasih

Analysis of the Influence of System Quality, Information Quality, and Service Quality of PBB  121
Maya Safira Dewi, Meiryani, Ignatius Edward Riantono, and Nastasya Sekar Ayu Utami

Effects of Different Normalization on the ESRGAN  139
Yongqi Tian, Jialin Tang, Lihong Niu, Binghua Su, and Yulei An

Early Detection of Mental Disorder Via Social Media Posts Using Deep Learning Models  149
Amanda Sun and Zhe Wu

Design of PLC Training Platform Based on Digital Twin  159
Fuhua Yang

About the Editors

Dr. Anu Gokhale is professor and department chairperson at St. Augustine’s University, USA. She has completed thirty years as faculty and has received several college and university research, teaching, and service awards. Dr. Gokhale was named Fulbright Distinguished Chair in STEM+C at the University of Pernambuco, Brazil, 2016–17; was Faculty Fellow in Israel and Fulbright Specialist in Cybersecurity at Gujarat Technological University, India, in 2017; and was Visiting Professor in the College of Business at Shandong University, China, in 2017, where she focused on data analytics and e-commerce. Her achievements encompass extensively cited refereed publications, groundbreaking externally funded research supported for 20 years by a continuous stream of grants from state and federal agencies including the National Science Foundation, and elevation of the ISU student experience through excellence in teaching, mentorship, and the creation of opportunities for students to get involved in research. She consults for business and industry to increase productivity using data analytics and leveraging e-technologies. She has delivered multiple workshops focusing on inclusion and diversity as well as “STEM for All” public policy.

Dr. Emanuel Grant received his B.Sc. from the University of the West Indies, MCS from Florida Atlantic University, and Ph.D. from Colorado State University, all in Computer Science. Since 2008, he has been Associate Professor in the Department of Computer Science and the School of Electrical Engineering and Computer Science at the University of North Dakota, USA, where he started as Assistant Professor in 2002. He currently serves as Associate Director of the School of Electrical Engineering and Computer Science. His research interests are in software development methodologies, formal specification techniques, domain-specific modeling languages, model-driven software development, and software engineering education. Dr. Grant has conducted research in software engineering teaching with collaborators from Holy Angel University, Philippines; HELP University College, Malaysia; IIIT-Hyderabad, India; Singapore Management University, Singapore; Montclair State University and University of North Carolina Wilmington of the USA; and the University of Technology, Jamaica. He is affiliated with the Software Engineering Method and Theory (SEMAT) organization, as a member of the Essence-Kernel and Language for Software Engineering Methods (Essence) group.

Computer and Automation Systems

Optimal Design of Rail Support Structure for Mountain Rail Transport Vehicle Zhu Chen, Tao Lin, Hong Chen, Bingfeng Bai, and Guohong Zhang

Abstract The rail transport vehicle is a solution for the transportation of heavy materials in mountainous areas. The support scheme of the rail is a key difficulty of the design, but a unified design criterion has not been established in many industries. In order to meet the requirements of the technical development of rail transport vehicles and improve the reliability of rail transport, the rail support structure needs to be optimized. This paper uses ANSYS Workbench 19.0 to analyze the strength of the rail support structure of the dual rail transport vehicle, compares the stress conditions of three support structures, and optimizes the design of the installation height of the support to ensure the safety and reliability of the support structure in practical applications. The final analysis results show that the stress of the cross-braced structure scheme is optimal, and within the range allowed by the structural design, the height of the connection point of the diagonal bracing can be increased as much as possible to increase the structural stability. Keywords Rail transport vehicle · Support structure · Optimal design · Finite element analysis

1 Introduction The rail transportation system is a new transportation method in which the truck is towed and transported by the engine equipped with the locomotive and rides on a specific rail. At present, the mainstream rail transportation methods include monorail Z. Chen (B) Fujian Electricity Transmission & Transformation Facilities Engineering Co., Ltd, Fuzhou, China e-mail: [email protected] T. Lin · H. Chen · B. Bai Machinery and Equipment Branch, Fujian Electricity Transmission & Transformation Facilities Engineering Co., Ltd, Fuzhou, China G. Zhang Construction Management Department, Fujian Electricity Transmission & Transformation Facilities Engineering Co., Ltd, Fuzhou, China © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_1


transportation and dual rail transportation. Monorail transportation technology is mature and has been used in many countries and regions for many years. It has been applied in agricultural production activities such as mountain fruit picking and fertilization and, to a certain extent, has improved the mechanization level of agricultural transportation. However, the maximum load of the monorail transport vehicle does not exceed 1 ton, which makes it difficult to meet application requirements in mining, metallurgy, and construction projects. The maximum load of the dual rail transport vehicle can reach 4 tons, and it has strong overturning resistance, so it can meet the transportation needs of materials below 4 tons in mining, metallurgy, and engineering construction in mountainous areas. It can be predicted that with the development of rail transportation technology, the rail transportation system will be widely used in more industries in the future, and a standard system will be established. The monorail transportation system is only suitable for small loads, and its overturning resistance is weak, as shown in Fig. 1. Although the dual rail transportation system can transport large loads, its rails are often laid close to the ground, and the transport vehicles also run close to the ground, which is not suitable for the complex and variable terrain in mountainous areas, as shown in Fig. 2. Therefore, it is necessary to study the overhead rail of the dual rail transportation system. Compared with the monorail transportation system, the structure of the dual rail transportation system, especially the rail support structure, is more complicated. At present, no systematic reference exists in the actual design process, which leads to our inability to judge which support structure performs best or to locate the weak positions of the rail support structure where potential safety hazards exist.
This paper conducts finite element analysis and discusses the stress conditions of three support structures under the same boundary conditions using the ANSYS Workbench 19.0 tool: (1) set up a diagonal support column to assist in supporting the vertical columns; (2) connect the vertical support columns at the same position on both sides of the rail with a beam to form a truss; and (3) connect the vertical support columns at the same position on both sides of the rail with two diagonal braces to form a cross-braced structure. Finally, the optimal structure is selected, the support height and material strength are varied, and the resulting stress improvement is further discussed.

2 Finite Element Theory The transport vehicle runs smoothly and slowly on the rail, and the deformation of the rail and support structure can be regarded as the elastic deformation under the static load, which conforms to the small deformation assumption in the mechanics of materials [1–3]. Therefore, the problem studied in this paper belongs to the category of elastic mechanics. Starting from the micro-unit, considering the three aspects of static force, geometry, and physics, the basic differential equations are obtained for solution, and finally the constants in the solution are determined by using the


Fig. 1 Rail transport vehicle: (a) monorail, (b) dual rail

Fig. 2 Rail support scheme

boundary conditions, which is the basic method for solving elastic mechanics problems [4–7]. This article will use the ANSYS Workbench 19.0 tool for finite element analysis.


Table 1 Main parameters of the rail

Single length (m): 6
Single weight (kg): 32.6
Material: Q345, Q690
Rail dimensions (length, width, wall thickness; mm): 50 × 50 × 4
Ground clearance (m): 1
Rail spacing (m): 1
Material of the support column: Q295
Dimensions of the support column (outside diameter, wall thickness; mm): 27 × 4
Spacing of support columns (m): 1
Support base (diameter, thickness; mm): 120 × 15
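As a quick sanity check on Table 1, the single-rail weight can be estimated from the cross-section, length, and steel density. This is a rough sketch under the idealization of a sharp-cornered square tube; the real profile (rounded corners, cut-outs) will weigh somewhat less, which is consistent with the listed 32.6 kg:

```python
# Estimate the mass of one 6 m rail segment from the Table 1 parameters.
# Hollow square section: 50 x 50 mm outer, 4 mm wall; steel density 7850 kg/m^3.
outer = 0.050      # outer width/height of the section (m)
wall = 0.004       # wall thickness (m)
length = 6.0       # single rail length (m)
density = 7850.0   # steel density from Table 2 (kg/m^3)

inner = outer - 2 * wall            # inner width/height (m)
area = outer**2 - inner**2          # cross-sectional area (m^2)
mass = area * length * density      # idealized mass (kg)

print(f"estimated mass: {mass:.1f} kg")  # ~34.7 kg vs. 32.6 kg listed
```

The ~2 kg gap between the idealized estimate and the catalog weight is plausibly the material removed by corner radii.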

2.1 Establishment of Finite Element Models

2.1.1 Model Import

Parameters such as the spacing and height of the rail must be designed according to the design requirements of mountain transport vehicles. The main parameters of the rail are given in Table 1 below. Based on these parameters, the rail support structure of the mountain transport vehicle is designed, and three support structure schemes are proposed [8, 9]. In Scheme 1, a diagonal support is provided for each vertical support column, as shown in Fig. 2a; in Scheme 2, crossbars are used to connect the vertical support columns at the same position on both sides of the rail, as shown in Fig. 2b; in Scheme 3, two diagonal braces are used to connect the vertical support columns at the same position on both sides of the rail, as shown in Fig. 2c. For each scheme, 3D models were created and assembled in SolidWorks 18. In order to improve mesh quality and the speed of the simulation calculation, the 3D model is simplified in comparison with the actual structure, and the structures that have a greater influence on the strength are retained [10]. After building the 3D model, the model is imported into the ANSYS Workbench static structural module.

2.1.2

Material Properties

The materials of the various parts are stipulated in the design requirements, so the model only needs to be imported into Workbench and the corresponding material properties assigned to each part. The material properties of the main parts of the rail and support are given in Table 2.

Optimal Design of Rail Support Structure for Mountain Rail Transport …


Table 2 Property sheet of the rail and its supporting materials

Part      Poisson's ratio   Young's modulus (GPa)   Ultimate tensile strength (MPa)   Yield strength (MPa)   Density (kg/m³)
Rail      0.28              206                     490                               390                    7850
Support   0.28              206                     390                               295                    7850
Bolt      0.3               207                     780                               640                    7800

2.1.3 Boundary Conditions

Constraint Settings Contact constraints are applied between the rail and the pins, clips, and wheels. Bonded contact is used between the clips, the pin bush, and the support column, and the lower surface of the support base is fixed to simulate a rigid connection with the ground.

Load Application The forces on the rail support are the pressure exerted by the rail and the support's own gravity, of which the rail pressure is dominant. The rail pressure comes from the trolley's weight in the vertical direction. When the trolley climbs, it also generates a horizontal force along the rail direction, and when it turns it generates a horizontal centrifugal force perpendicular to the rail. The rail is therefore most heavily stressed when the trolley is climbing and turning:

Fz = M · g · cos(α)    (1)

Fy = M · g · sin(α)    (2)

Fx = M · v² / r    (3)

The rail stress at the limited working condition is shown in Fig. 3. Loads are applied to the wheels and the upper surface of the rail in the workbench to simulate the stress of the limited working condition.
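As a numeric illustration of Eqs. (1)–(3), the loads can be computed directly; the trolley mass, slope angle, speed, and turning radius below are hypothetical placeholders, not values taken from the paper.

```python
import math

def rail_loads(M, alpha_deg, v, r, g=9.81):
    """Loads from a trolley of mass M (kg) climbing a slope of alpha_deg
    degrees at speed v (m/s) while turning with radius r (m)."""
    alpha = math.radians(alpha_deg)
    Fz = M * g * math.cos(alpha)  # vertical load on the rail, Eq. (1)
    Fy = M * g * math.sin(alpha)  # horizontal force along the rail, Eq. (2)
    Fx = M * v ** 2 / r           # centrifugal force perpendicular to the rail, Eq. (3)
    return Fz, Fy, Fx

# Hypothetical example: 1000 kg trolley, 10 degree slope, 1 m/s, 5 m turn radius
Fz, Fy, Fx = rail_loads(1000, 10, 1.0, 5.0)
```

For the limiting working condition sketched in Fig. 3, the paper applies Fz = 12000 N, Fy = 2000 N, and Fx = 2000 N.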

Meshing Workbench is used to mesh the rail and support models. The quality of the mesh ultimately affects the results of the finite element analysis. Therefore, mesh refinement is applied to small features such as bolts to ensure the accuracy of the simulation


Fig. 3 Schematic diagram of force under the limited working condition (applied loads: Fz = 12000 N, Fy = 2000 N, Fx = 2000 N; rail spacing and support spacing both 1 m)

Table 3 Unit type of the main components

Part                    Part type              Element type
Rail                    3D deformable solid    Linear hexahedron
Wheel, Bolt, Support    3D deformable solid    Linear tetrahedron

[11, 12]. The unit types of each component are given in Table 3. The total number of elements after meshing is 138,097. The meshing results are shown in Fig. 4.

2.2 Simulation Analysis of Support Mode

For the three support methods above, finite element simulations are carried out with a support height of 800 mm; the parameters are given in Table 4. In Fig. 5, Fig. 5a shows the diagonal support method, Fig. 5b the beam support method, and Fig. 5c the fork-shaped support method. The maximum stress values and maximum stress positions are given in Table 5. It can be seen from Fig. 5 and Table 5 that, under lateral force, connecting the two rails greatly improves the stress state of the rail and the vertical supports. Among the schemes, the fork-shaped support has the most obvious


Fig. 4 Meshing results

Table 4 Support method

Support method        Support height (mm)
Diagonal support      800
Beam support          800
Fork-shaped support   800

Fig. 5 Simulation results of different support methods

Table 5 Comparison of maximum stress values for different support methods

Support method        Support height (mm)   Maximum stress (MPa)   Maximum stress position
Diagonal support      800                   261.45                 Contact surface between the rail and wheel
Beam support          800                   184.58                 Contact surface between the rail and wheel
Fork-shaped support   800                   126.86                 Contact surface between the rail and wheel


improvement on the stress state of the rail and vertical supports. Meanwhile, the fork-shaped support constitutes a stable triangular structure, which has advantages in strength and rigidity.

2.3 Simulation Analysis of Support Height

With the fork-shaped support adopted, and considering economy and ease of construction, the effect of raising the support-point height on the rail and vertical supports needs to be examined [13, 14]. Theoretical mechanics suggests that the higher the support point, the better the stress state of the rail and vertical supports, so simulations are run starting from a support height of 700 mm. Table 6 gives the working conditions for the three support heights. The simulation results are shown in Fig. 6: the support height is 700 mm in Fig. 6a, 800 mm in Fig. 6b, and 900 mm in Fig. 6c. Table 7 gives the maximum stress value and maximum stress location at each support height. It can be seen from Fig. 6 and Table 7 that, with the fork-shaped support, the higher the support point, the better the stress state of the rail and vertical supports, which is consistent with the theoretical analysis. Table 7 also shows that the maximum stress values at support heights of 800 and 900 mm are close. When the support height is 900 mm, the maximum stress position changes, and the stress on the rail surface is better. For the sake of stability, a fork-shaped support with a support height of 900 mm is used to support the double rails [15].

Table 6 Stress situation of different support heights of fork-shaped support

Support method        Material   Support height (mm)
Fork-shaped support   Q345       700, 800, 900

Fig. 6 Simulation results of different support heights


Table 7 Stress conditions at different support heights

Support method        Material   Support height (mm)   Maximum stress (MPa)   Maximum stress position
Fork-shaped support   Q345       700                   154.41                 Contact surface between the rail and wheel
Fork-shaped support   Q345       800                   126.86                 Contact surface between the rail and wheel
Fork-shaped support   Q345       900                   123.48                 Junction of the vertical support and rail

Table 8 Stress of different support heights of fork-shaped support

Support method        Support height (mm)   Material
Fork-shaped support   900                   Q345, Q690

2.4 Simulation Analysis of High-Strength Materials

To further improve the safety and stability of the truss structure, the safety factors of different materials under the same working conditions are compared. The selected materials are given in Table 8. The simulation results are shown in Fig. 7: Fig. 7a is the stress nephogram of the fork-shaped support in Q690, Fig. 7b the stress nephogram in Q345, Fig. 7c the safety factor nephogram in Q690, and Fig. 7d the safety factor nephogram in Q345. The maximum stress values and minimum safety factors are given in Table 9. It can be seen from Fig. 7 and Table 9 that, with the fork-shaped support and a 900 mm support point, using high-strength material has no obvious effect on the structural stress. However, the higher the strength of the material, the higher the structural stability and safety margin. To meet the safety requirements, Q690 is selected as the material for the fork-shaped support.
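The minimum safety factor in a static analysis is conventionally the material yield strength divided by the maximum equivalent stress. Assuming a 690 MPa yield strength for Q690, this simple ratio approximately reproduces the factor reported in Table 9 (the Q345 row evidently uses a different yield value inside the solver, so it is not checked here):

```python
def safety_factor(yield_strength_mpa, max_stress_mpa):
    """Minimum safety factor against yielding in a static analysis."""
    return yield_strength_mpa / max_stress_mpa

# Q690 fork-shaped support at 900 mm: Table 9 reports 5.5581
sf_q690 = safety_factor(690.0, 124.14)  # approximately 5.558
```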

3 Conclusion

Three design schemes for the rail support structure of a dual-rail transport vehicle system are first proposed in this paper: (1) diagonal bracing; (2) beam; and (3) cross-bracing. SolidWorks 18 is used for design modeling, and static structural analysis is then conducted in ANSYS Workbench 19.0 to solve the stress of the three design schemes under the same working conditions. Comparative analysis of the results shows that the maximum stress of the diagonal bracing structure is 261.45 MPa; the maximum


Fig. 7 Simulation results of different materials

Table 9 Stress conditions at different support heights

Support method        Support height (mm)   Material   Maximum stress (MPa)   Minimum safety factor
Fork-shaped support   900                   Q345       123.48                 2.0246
Fork-shaped support   900                   Q690       124.14                 5.5581

stress of the beam structure is 184.58 MPa; the maximum stress of the cross-bracing structure is 126.86 MPa. Therefore, the cross-bracing structure is the best solution for rail support. On the basis of the cross-bracing structure, the connection position with the vertical support rod was then varied to further analyze the relationship between the stress and the connection position of the cross-bracing structure. Using the same static structural analysis, the maximum stress is 154.41 MPa at a support height of 700 mm, 126.86 MPa at 800 mm, and 123.48 MPa at 900 mm. The higher the support position, the better the structural stress condition. Therefore, within the allowable range of the structural design, the height of the connection point of the diagonal bracing can be raised as far as possible to increase structural stability.


On the basis of the optimal support height of the cross-bracing structure, using the high-strength material Q690 does not change the stress of the support structure significantly, but it improves the overall safety factor of the rail support structure.

References

1. Amiri N, Tasnim F, Anbarani MT, Dagdeviren C, Karami MA (2021) Experimentally verified finite element modeling and analysis of a conformable piezoelectric sensor. Smart Mater Struct 30(8)
2. Kamel A, Dammak K, Yangui M, El Hami A, Jdidia MB, Hammami L, Haddar M (2021) A reliability optimization of a coupled soil structure interaction applied to an offshore wind turbine. Appl Ocean Res 113
3. Gu D, Yang J, Wang H, Lin K, Yuan L, Hu K, Wu L (2021) Laser powder bed fusion of bio-inspired reticulated shell structure: optimization mechanisms of structure, process, and compressive property. CIRP J Manuf Sci Technol 35
4. Gao H, Liang J, Li B, Zheng C, Matsumoto T (2021) A level set based topology optimization for finite unidirectional acoustic phononic structures using boundary element method. Comput Methods Appl Mech Eng 381
5. Su J, Lou J, Jiang X (2021) Finite element optimization design of aircraft equipment installation structure. In: IOP conference series: earth and environmental science 769(4)
6. Peng W-M, Cheng K-J, Liu Y-F, Nizza M, Baur DA, Jiang X-F, Dong X-T (2021) Biomechanical and mechanostat analysis of a titanium layered porous implant for mandibular reconstruction: the effect of the topology optimization design. Mater Sci Eng C 124
7. Li S, Qu Z (2021) Optimized design of structure of high-bending-rigidity circular tube. Sustain 13(8)
8. Liu ZH, Tian SL, Zeng QL, Gao KD, Cui XL, Wang CL (2021) Optimization design of curved outrigger structure based on buckling analysis and multi-island genetic algorithm. Sci Prog 104(2)
9. Nia AB, Nejad AF, Xin L, Ayob A, Yahya MY (2020) Energy absorption assessment of conical composite structures subjected to quasi-static loading through optimization based method. Mech Ind 21(1)
10. Theotokoglou EE, Balokas G, Savvaki EK (2019) Linear and nonlinear buckling analysis for the material design optimization of wind turbine blades. Int J Struct Integrity 10(6)
11. Wu H, Kuang S, Hou H (2019) Research on application of electric vehicle collision based on reliability optimization design method. Int J Comput Meth 16(7)
12. Zuo W, Zhao C, Zhou L, Guo G (2019) Comparison of gradient and nongradient algorithms in the structural optimization course. Int J Mech Eng Educ 47(3)
13. Wang D, Zhang S, Xu W (2019) Multi-objective optimization design of wheel based on the performance of 13° and 90° impact tests. Int J Crashworthiness 24(3)
14. Wondimu A, Alemup N, Regassa Y (2019) Modeling and simulation of rail end bolt hole and bolted rail joint by FEM, pp 2348–7968
15. Li S, Feng X (2020) Study of structural optimization design on a certain vehicle body-in-white based on static performance and modal analysis. Mech Syst Signal Proc 135

Hazardous Behavior Recognition Based on Multi-Model Fusion Bingyi Zhang, Bincheng Li, and Yuhan Zhu

Abstract Distracted driving has become a serious traffic problem. This study proposes an image processing and multi-model fusion scheme to maximize the accuracy of distracted-driving discrimination. First, the training and test datasets are processed to a common specification by translation and clipping. Second, we set VGG16 as the benchmark model for evaluation and train ResNet50, InceptionV3, and Xception on the input images. Finally, since each model has its own advantages, we fine-tune each model with part of its network layers frozen, take the output of each fine-tuned single model from just before the fully connected layer, concatenate these outputs in series, and then learn each model's weight through neural network training. Keywords Driver distraction · Driving behavior · Computer vision · Machine learning

1 Introduction

There are many traffic accidents caused by distracted driving today. More than 3700 people around the world die each day in road traffic collisions [1]. Many, if not all, of these deaths could have been avoided. These crashes are the leading killer of children and young people aged one to 25 in the USA and five to 29 around the world. We need to create a safe systems approach so we can protect people and save lives. According to NHTSA [2], distracted driving behaviors mainly include (a) talking or texting on one's phone, (b) eating and drinking, (c) talking to passengers,

B. Zhang (B) · B. Li · Y. Zhu Dalian Maritime University, Dalian, China e-mail: [email protected] B. Li e-mail: [email protected] Y. Zhu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_2


B. Zhang et al.

Fig. 1 Overall idea flowchart

and (d) fiddling with the stereo, entertainment, or navigation system. We believe that detecting distracted driver gestures is the key to further preventive measures. Detection of driver distraction is also important for autonomous vehicles. We propose a hybrid-model distracted-driving posture evaluation system that mixes Xception, InceptionV3, and ResNet50 and adopts a more appropriate number of convolutional layers. Our main purpose is to improve the accuracy of the classifier (Fig. 1).

2 Ease of Use

A. Maintaining the Integrity of the Specifications

Early work treated distracted driving as an evaluation of multiple issues, such as drivers making phone calls and not wearing seat belts. The Southeast University driving posture (SEU-DP) dataset [3] was proposed in 2011. It mainly includes four categories of distracted driving behaviors: holding the steering wheel, operating the gear lever, eating, and making phone calls. Berri proposed a polynomial-kernel support vector machine (SVM) classification system for this dataset, with a classification success rate of 91.57%. Yan et al. [4] proposed a random forest algorithm extracting PHOG features based on time series, with an accuracy of 96.56%. With the refinement of models and the development of machine learning, the impact of hand and face analysis on distraction detection has been discussed separately for different positions. One such model uses the Kaggle public dataset to train a weighted model, reaching a classification accuracy of 95.98%.

B. Image Enhancement


The closest work to our image enhancement part is from Zhou et al. [5]. The authors' purpose is to locate the core region of an image given its label, in order to exclude interference from the external environment.

3 Dataset

Since most of the Southeast University datasets and related distracted driving datasets are not public, we use the competition dataset provided by Kaggle. The data features are as follows: (1) there are 102,150 pictures in total, of which 79,726 are in the test set, so the test set is much larger than the training set; (2) the drivers in the training set differ from those in the test set: the training set is collected from 26 drivers and the test set from 55 drivers; (3) the pictures have obvious temporal continuity; (4) the data is divided into ten labels, and the number of images per label is fairly even; (5) the camera placement varies slightly, and preprocessing is required to reduce the image range to prevent overfitting. The approximate distribution of the images is shown in Tables 1 and 2.
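Because the test drivers differ from the training drivers and frames are temporally continuous, a validation split should hold out whole drivers rather than individual frames; otherwise near-duplicate consecutive frames of one driver leak across the split. A hypothetical helper (not from the paper) illustrating the idea:

```python
import random

def split_by_driver(samples, val_fraction=0.2, seed=42):
    """Hold out whole drivers for validation so that near-duplicate
    consecutive frames of one driver never appear in both splits."""
    drivers = sorted({driver_id for driver_id, _ in samples})
    random.Random(seed).shuffle(drivers)
    n_val = max(1, int(len(drivers) * val_fraction))
    val_drivers = set(drivers[:n_val])
    train = [s for s in samples if s[0] not in val_drivers]
    val = [s for s in samples if s[0] in val_drivers]
    return train, val

# samples are (driver_id, image_path) pairs; paths here are placeholders
samples = [("p002", "img_0.jpg"), ("p002", "img_1.jpg"),
           ("p012", "img_2.jpg"), ("p014", "img_3.jpg")]
train, val = split_by_driver(samples, val_fraction=0.25)
```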

4 Proposed Method

Our proposed solution consists of image augmentation and fusion models. We use fine-tuning: on the basis of transfer learning, part of the ImageNet-pretrained weights of Xception [6], InceptionV3 [7], and ResNet50 [8] are locked at the layer level, and the remaining layers are retrained. We use multiple models because different models embody different design ideas and extract image features on different principles. In theory, multi-model fusion can improve accuracy. The fusion method we use is not simple averaging: we take the output of each fine-tuned single model from just before the fully connected layer, concatenate these features in series, and then learn the weight of each model through neural network training.

A. Image Processing

(1) Image zooming and panning: The dataset is collected mainly with an in-car camera recording the driver's driving process [9], capturing the driver's face and hands. Differences in resolution, installation location, camera angle, and the driver's habitual driving posture across datasets therefore introduce redundant obstacles for the classification system.
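The zooming and panning step can be pictured with a minimal NumPy sketch of random translation (zero-padding the exposed border); the shift range is illustrative, and the real pipeline also applies zoom and rotation:

```python
import numpy as np

def translate(img, dy, dx):
    """Shift image content by (dy, dx) pixels, zero-padding the border."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def random_augment(img, max_shift=10, rng=None):
    """Randomly translate an image, increasing data diversity."""
    rng = rng or np.random.default_rng()
    dy, dx = (int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    return translate(img, dy, dx)
```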

Table 1 Number of images per training set driver

Training set driver   Number of images
p002   725
p012   823
p014   876
p015   875
p016   1078
p021   1237
p022   1233
p024   1226
p026   1196
p035   848
p039   651
p041   605
p042   591
p045   724
p047   835
p049   1011
p050   790
p051   920
p052   740
p056   794
p061   809
p064   820
p066   1034
p072   346
p075   814
p081   823

Table 2 Category of driver

Category   Description               Number of images
C0         Safe driving              2489
C1         Right-handed texting      2267
C2         Right-handed phone use    2317
C3         Left-hand texting         2346
C4         Left-handed phone use     2326
C5         Operating the radio       2312
C6         Drinking                  2325
C7         Glancing behind           2002
C8         Hair and makeup           1911
C9         Talking to passengers     2129


Due to differences in camera placement and the small amount of data, the data is augmented before training in order to prevent later overfitting. We translate, zoom, and rotate each image, increasing the diversity of the data.

B. Parametric Optimization

(1) Use of dropout layers: There are roughly two reasons why dropout can prevent overfitting [10]: it averages over sub-networks, and it reduces complex co-adaptation between neurons (reducing reliance on specific neuron connections makes the network more robust). We apply dropout with probability 0.5 before the final dense layer.

(2) Classification methods:

• ResNet50 [8]: Addresses the problem that, beyond a certain network depth, stacking more layers degrades performance. ResNet50 can increase accuracy without increasing complexity.
• InceptionV3 [7]: Based on the original Inception, the number of parameters is significantly reduced by sharing weights between adjacent blocks, which reduces time complexity.
• Xception [6]: Replaces the multi-size convolution-kernel feature extraction of the original InceptionV3 with depth-wise separable convolutions, reducing parameters and increasing accuracy.

(3) Our classification model: We use multiple models because different models embody different design ideas and extract image features on different principles. In theory, multi-model fusion can improve accuracy. Rather than averaging, we take the output of each fine-tuned single model from just before the fully connected layer, concatenate these features in series, and then learn the weight of each model through neural network training (Fig. 2).
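The fusion just described can be pictured with a toy NumPy sketch: each fine-tuned backbone contributes its feature vector from just before the fully connected layer, the vectors are concatenated in series, and a trained dense layer learns how much each model's features contribute. Random stand-in features are used below; real features would come from the three CNNs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in feature vectors; in the real system these are the outputs taken
# just before each fine-tuned model's fully connected layer.
feat_resnet = rng.standard_normal(2048)
feat_inception = rng.standard_normal(2048)
feat_xception = rng.standard_normal(2048)

# Concatenate the three feature vectors in series (length 6144)
fused = np.concatenate([feat_resnet, feat_inception, feat_xception])

# One dense softmax layer over the fused features; training this layer is
# what assigns each model its weight in the final prediction.
W = 0.01 * rng.standard_normal((10, fused.size))
b = np.zeros(10)
logits = W @ fused + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()  # class probabilities over the 10 driving categories
```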

5 Experiments

A. Single-Model Optimization

(1) Benchmark model: For final optimization and comparison, we use a unified fully connected layer across all models. When ResNet is the input model, the image is scaled to (224, 224, 3), and the output after removing the fully connected layer is a vector of length 2048.
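The unified head itself is not reproduced in the text; one plausible reconstruction (our assumption, consistent with the 0.5 dropout mentioned in Sect. 4) is dropout followed by a 10-way softmax dense layer acting on the 2048-length feature vector:

```python
import numpy as np

def head(features, W, b, drop_p=0.5, train=True, rng=None):
    """Assumed dropout + dense softmax head on a backbone feature vector.

    features: length-2048 vector; W: (10, 2048) weights; b: (10,) biases.
    """
    x = features
    if train:
        rng = rng or np.random.default_rng(0)
        keep = rng.random(x.shape) >= drop_p
        x = x * keep / (1.0 - drop_p)  # inverted dropout, p = 0.5
    logits = W @ x + b
    e = np.exp(logits - logits.max())
    return e / e.sum()
```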


Fig. 2 Multi-model CAM activation image

(2) Various experimental models:

• ResNet50: The image is reduced to (240, 320, 3); many scenes contain very small actions, and shrinking the image further makes such features hard to extract, so the image aspect ratio is kept unchanged when entering the network. After repeated attempts, the best result is obtained by first training with Adam for six epochs and then with RMSprop for six epochs at a very small learning rate of 0.00001. Many experiments show that fine-tuning works best starting at the 152nd layer, that is, locking the weights of layers 0–151 and training the weights from layer 152 onward.
• InceptionV3: The image is reduced to (360, 480, 3); Adam is used for four epochs, then RMSprop for six epochs at a minimal learning rate of 0.00001. The weights are trainable from the 172nd layer.
• Xception: The image is reduced to (360, 480, 3); Adam is used for four epochs, then RMSprop for six epochs at a very small learning rate of 0.00001. The weights are trainable from layer 172.
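The per-model settings above can be summarized in one place; the layer indices and epoch counts are taken from the text, while the dictionary layout and the helper are our own illustration (actual Keras layer counts may differ from the nominal 175 used in the test):

```python
# (optimizer, learning rate, epochs); None means the optimizer's default rate
FINE_TUNE = {
    "ResNet50":    {"input": (240, 320, 3), "first_trainable_layer": 152,
                    "schedule": [("adam", None, 6), ("rmsprop", 1e-5, 6)]},
    "InceptionV3": {"input": (360, 480, 3), "first_trainable_layer": 172,
                    "schedule": [("adam", None, 4), ("rmsprop", 1e-5, 6)]},
    "Xception":    {"input": (360, 480, 3), "first_trainable_layer": 172,
                    "schedule": [("adam", None, 4), ("rmsprop", 1e-5, 6)]},
}

def trainable_mask(n_layers, first_trainable):
    """Freeze layers [0, first_trainable) and train the rest."""
    return [i >= first_trainable for i in range(n_layers)]
```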

Table 3 Recognition accuracy of each single and mixed model

Model                               Accuracy (%)
VGG16 (baseline)                    64.49
ResNet50                            85.96
InceptionV3                         91.76
Xception                            90.36
ResNet50 + InceptionV3 + Xception   93.45

(3) Comparison between different models: To verify the superiority of the hybrid model and the effectiveness of the network, we conduct experiments on the Kaggle public dataset. The hybrid model also has an advantage in accuracy; most models exceed 90% accuracy, and the results show good recognition performance (Table 3).

6 Conclusion

Distracted driving has become a major culprit in traffic tragedies. To identify distracted driving behaviors, we first use image enhancement to extract driving-behavior-related regions in the image. Second, we propose a new classification model (a hybrid model) that achieves high classification accuracy for distracted driving behavior. The system combines three models: the weight of each model is calculated through a neural network, and the models' connection layers are linked to each other, which strengthens the confidence of the model. Finally, the convolutional-network classification system classifies the original dataset and the model accuracy is obtained. Experiments on the Kaggle dataset show that, after image processing, the accuracy of the mixture model is far better than that of other existing models. Therefore, this method has better classification efficiency.

References

1. World Health Organization (2018) Global status report on road safety 2018: summary. World Health Organization, Geneva, Switzerland
2. Pickrell TM, Li HR, Shova KC (2016) Traffic safety facts. Retrieved from https://www.nhtsa.gov/risky-driving/distracted-driving
3. Berri RA et al (2014) A pattern recognition system for detecting use of mobile phones while driving. In: 2014 International conference on computer vision theory and applications (VISAPP), vol 2. IEEE
4. Yan C, Coenen F, Zhang B (2014) Driving posture recognition by joint application of motion history image and pyramid histogram of oriented gradients. Int J Veh Technol 2014
5. Abouelnaga Y, Eraqi HM, Moustafa MN (2017) Real-time distracted driver posture classification. arXiv preprint arXiv:1706.09498


6. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE
7. Szegedy C, Vanhoucke V, Ioffe S et al (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 2818–2826
8. Ou C, Ouali C, Karray F (2018) Transfer learning based strategy for improving driver distraction recognition. In: International conference image analysis and recognition. Springer, Cham, pp 443–452
9. Wang J, Wu ZC, Li F et al (2021) A data augmentation approach to distracted driving detection. Future Internet 13(1):1
10. Saito K, Ushiku Y, Harada T et al (2017) Adversarial dropout regularization. arXiv preprint arXiv:1711.01575

A Study on Intelligent Agricultural Monitoring System Based on Internet of Things Fangfang Jiang

Abstract Based on the Internet of Things (IoT) architecture, this paper designs the overall structural framework of an intelligent agricultural system. For the software design of the monitoring center, Java and JSP are used in the MyEclipse environment to complete the overall design, including the monitoring center web server, the front-end interface, and the interaction between the client and the database. The system uses the Tomcat server to publish information and connects the Tomcat server to the Internet so that users can access the monitoring center anywhere, realizing information sharing. The intelligent agricultural monitoring system researched and designed in this paper has the following basic functions: (1) it can collect important farmland environmental parameters; (2) it can conduct real-time video monitoring of the growth status of crops; (3) it can store the collected farmland environmental data and crop growth information on the web server; (4) it can realize short-distance wireless networking and data transmission of monitoring points; (5) agricultural managers can log in to the system for real-time understanding of farmland environmental parameters and crop growth conditions. Keywords Intelligent agriculture · IoT · Monitoring system · Framework

1 Introduction

As the third wave of the information industry, the Internet of Things will play an important role in upgrading traditional agriculture to modern agriculture, and IoT-based intelligent agriculture will be a new form of agriculture. As the technical support of agricultural modernization, the application of IoT technology in agricultural production and scientific research will accelerate the development of modern agriculture. Agricultural environmental monitoring is an

F. Jiang (B) Party School of Weifang Municipal Party Committee, Weifang 0536, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_3


F. Jiang

important part of modern agriculture and an effective means to master agricultural environmental information and crop growth. The extensive management methods of traditional agriculture cannot achieve refined and intelligent management of agricultural environmental information. To improve the traditional agricultural production methods and realize the intelligent management of agriculture, this paper studies and designs a smart agricultural monitoring system based on the IoT. The system can realize real-time monitoring and intelligent management of agricultural environmental parameters and crop growth conditions, which plays a vital role in improving agricultural production efficiency and management level.

2 Theoretical Basis

2.1 The Concept of IoT

The IoT connects items to one another through the Internet to realize the exchange and communication of information. The IoT concept was initially proposed on the basis of RFID technology, beginning with the first radio frequency identification (RFID) system researched and designed at the Massachusetts Institute of Technology (MIT) in 1999. By connecting to the Internet, RFID can realize the positioning, monitoring, and management of objects, applying radio frequency identification between objects to realize intelligent identification [1, 2]. With the increasing maturity of wireless sensor networks (WSNs), IoT architectures based on RFID and WSNs have emerged as the times require. According to the ITU Internet report issued by the International Telecommunication Union, the IoT is defined as follows: the IoT uses related information sensing devices, mainly including RFID, two-dimensional code readers, and GPS [3–5]; these devices are connected to each other through certain protocols and, at the same time, to the Internet, forming a network that realizes the identification, positioning, tracking, and management of items [6].

2.2 The Concept of Smart Agriculture

With the rapid development of IoT technology, smart agriculture, which combines IoT technology and agriculture, is constantly progressing and maturing. To develop smart agriculture in production, operation, and management services alike, it is necessary to keep integrating rapidly developing information technologies such as the IoT, which in turn enables the rapid development of

A Study on Intelligent Agricultural Monitoring System Based …


smart agriculture [7, 8]. The goals of smart agriculture development are to upgrade and transform traditional agriculture, to realize intelligent control, fine management, and scientific planting, and to achieve high-yield, high-efficiency, high-quality agricultural production. Smart agriculture refers to the use of a series of information technologies, including wireless sensor network (WSN) technology, Internet technology, automatic control technology, computer technology, and artificial intelligence technology, to collect agricultural environmental parameters, to transmit, analyze, and process the collected data, to use these data to guide agriculture scientifically, and finally to increase agricultural production and income [9].

2.3 Smart Agriculture System Structure

Smart agriculture is the product of the widespread application of IoT technology in the agricultural field, and the division of smart agriculture systems can follow the standards for dividing IoT systems [10]. Agricultural environmental parameter information is first collected and transmitted; the data is then processed by computer and other related technologies; finally, the processed data is applied in different settings. These are the four processes of collection, transmission, processing, and application (as shown in Fig. 1).

Fig. 1 Smart agriculture architecture
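The four processes can be pictured as a small pipeline; the sensor values, threshold, and function names below are illustrative only, not part of the system design:

```python
def collect():
    # Collection: read environmental parameters from field sensors (placeholders)
    return {"soil_moisture": 31.5, "air_temp": 24.2}

def transmit(reading):
    # Transmission: in a real deployment this would travel over a short-range
    # wireless link to the monitoring center's web server
    return dict(reading)

def process(reading, moisture_threshold=35.0):
    # Processing: derive a decision from the received data
    return {"irrigate": reading["soil_moisture"] < moisture_threshold}

def apply_decision(decision):
    # Application: actuate equipment or notify the agricultural manager
    return "start irrigation" if decision["irrigate"] else "no action"

action = apply_decision(process(transmit(collect())))
```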


F. Jiang

Fig. 2 Smart agriculture technology system model

2.4 Key Technologies of Smart Agriculture

Considering the architecture of smart agriculture, it integrates a variety of advanced technologies and mainly involves four classes of key technologies: perception and identification technology, network and communication technology, computing and service technology, and management and support technology (as shown in Fig. 2).

3 Research Status at Home and Abroad

3.1 Current Status of Foreign Research

In the research and application of smart agriculture, foreign countries have a very complete technical system. European and American countries, for example, have accumulated rich experience in smart agriculture. The most famous and influential example is the use of "5S" technology (GPS, the global positioning system; RS, remote sensing; GIS, geographic information systems; ES, expert systems; and DPS, digital photogrammetry systems). The use of modern technology has realized fine management of the large-scale agricultural production process and remote control of the agricultural production environment. It is of great significance for the construction of agricultural informatization, improving the utilization rate of agricultural resources, and increasing agricultural output, and it has produced great achievements. A device called "CI-600" produced by a company in the USA can monitor crop root growth in real time, capturing information such as root profile images and the soil conditions around the roots. Its probe head is light and versatile enough for root monitoring in different environments. The equipment also draws little power and can be powered by a combination of AC and DC. It has been designated by the National Ecological Observatory Network (NEON) as a special instrument for root monitoring research.

3.2 Domestic Research Status

The Chinese government has repeatedly identified the IoT as a strategic technology for national development. It first proposed the concept of "perceiving China", emphasizing that IoT technology (Zhou 2008) should be applied to agricultural production to accelerate the transformation of agricultural production methods [11]. In June 2010, through the joint efforts of the national standardization organization and other relevant departments, the "Joint Working Group on IoT Standards" was established to accelerate the standardization of China's IoT technology, bring it into line with that of developed countries, and master the core technology and the initiative in its development. Relevant research institutes and universities in China have also done much work on the research and application of smart agriculture and have achieved certain results. Although China (Zhang 2011) has applied IoT technology well in agricultural production, the equipment and systems developed and produced still have many shortcomings [12], mainly three. First (Wen 2010), from the perspective of the system's network architecture, a deployed system can only be applied to one particular setting and cannot achieve universal applicability nationwide [13]. Second, the subordinate sensing nodes have simple functions and short working lifetimes. Third, from the point of view of information release and sharing, the released information can only be viewed through the host and cannot support nationwide information sharing.
In view of the shortcomings of China’s agricultural environmental monitoring, we (Yan 2011) can use IoT-related technologies to design a set of systems that can be widely applied in different agricultural occasions, which is of great significance to improve China’s agricultural production efficiency and management level and promote the development of agricultural information in China [14, 15].

4 The Overall Scheme Design of the System

4.1 Analysis of System Requirements

In this paper, an intelligent agricultural monitoring system based on the Internet of Things is proposed and implemented. The advantages of the system are real-time monitoring and intelligent management. In smart agriculture (Dong 2008), the farmland environmental parameters that need to be monitored are mainly soil temperature, soil moisture, ambient temperature, humidity, precipitation, wind speed, CO concentration, and the pH value of the water environment. Because farmland geography is complex and many parameters must be collected, many sensors must be deployed in the field to collect the ecological parameters of farmland. These field sensors often have to be moved and are subject to power outages and other failures. In the past, monitoring system designs were mainly based on wired networks; for large farmland, such designs have limited applicability and do not meet the network's dynamic requirements.
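As one way to make the parameter list above concrete, the readings a monitoring node reports could be grouped into a single record; the field names and the example bounds below are assumptions for illustration only, not values from the system.

```python
from dataclasses import dataclass

# Illustrative record for the farmland parameters listed above; field names
# and threshold values are assumptions, not from the system described here.
@dataclass
class FarmlandSample:
    soil_temp_c: float
    soil_moisture_pct: float
    air_temp_c: float
    air_humidity_pct: float
    rainfall_mm: float
    wind_speed_ms: float
    co_ppm: float
    water_ph: float

def out_of_range(s: FarmlandSample) -> list:
    """Flag parameters outside crude example bounds."""
    alerts = []
    if not 6.0 <= s.water_ph <= 8.5:
        alerts.append("water_ph")
    if s.soil_moisture_pct < 20.0:
        alerts.append("soil_moisture")
    return alerts
```

Grouping the parameters this way also makes the storage function described later (saving records to the web server's database) straightforward to implement.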

4.2 System Functions

The intelligent agricultural monitoring system based on the Internet of Things has the following basic functions: (1) it can collect important farmland environmental parameters; (2) it can carry out real-time video monitoring of crop growth; (3) the collected farmland environmental parameters and crop growth information can be stored on the web server to provide a basic information database for modern agricultural development; (4) it supports short-distance wireless networking and data transmission among monitoring nodes; (5) agricultural managers can log in to the system after registration to follow the farmland environmental parameters and crop growth status in real time.

4.3 System Scheme and Architecture Design

A hybrid network (ZigBee and Wi-Fi) can be used to realize the system's functions (Fig. 3 is the overall structural block diagram), comprising a farmland information collection module, a farmland real-time monitoring module, and a monitoring center module.

The farmland information collection module consists of soil temperature and humidity sensors and small weather stations forming a ZigBee wireless monitoring network, mainly responsible for collecting soil temperature, soil humidity, ambient temperature, ambient humidity, rainfall, wind speed, CO concentration, the pH value of the water environment in the crop growth blocks, and so on. Each ZigBee monitoring network has two main components. One is the soil temperature and humidity data acquisition node, mainly responsible for parameter collection. The other is the gateway node, which has a dual function: first, as the network coordinator, it automatically establishes and maintains the ZigBee monitoring network while aggregating the collected data; second, it handles message transmission between the monitoring network and the monitoring center. The temperature and humidity sensors are distributed in a regular-triangle pattern in the monitoring area; the collected data are sent to adjacent routing nodes via the ZigBee protocol and transmitted through the gateway to the monitoring center, realizing the collection and display of farmland environmental parameters.

The farmland real-time monitoring module consists of multiple video surveillance cameras (spaced 100–200 m apart) distributed across the planting block, Wi-Fi wireless routers, and wireless gateways forming a Wi-Fi wireless video monitoring network, mainly responsible for real-time monitoring of crop growth conditions. The cameras in the block transmit video surveillance data over Wi-Fi to the nearest wireless router, which forwards the video data to the wireless gateway, which in turn transmits the data to the monitoring center. The Wi-Fi routing within the planting block uses medium-power Wi-Fi routers for communication over 100–1000 m; the wireless link to the monitoring center uses high-power Wi-Fi routers for communication over 1000–5000 m.

The monitoring center module is mainly responsible for displaying soil temperature, soil humidity, ambient temperature, ambient humidity, rainfall, wind speed, CO concentration, the pH value of the water environment, and other data for the crop growth area, as well as real-time video monitoring and display of the crops, so as to provide an effective basis for scientific planting by agricultural managers.

An intelligent agricultural monitoring system based on the Internet of Things organically integrates a wireless sensor network, a wireless video monitoring network, and the Internet. Following the three-layer IoT architecture, the system can be divided into three parts: perception layer, transmission layer, and application layer. The perception layer is the bottom layer of the system; its main task is to acquire agricultural environment information through the various sensor arrays of the wireless sensor network and the wireless video surveillance network deployed in the fields. The transmission layer is the middle layer of the system, and its basic equipment is the wireless gateway (coordinator gateway).
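The gateway node's dual role described above — coordinating the monitoring network while relaying data to the monitoring center — can be sketched as follows; the class and method names are hypothetical, and the ZigBee and uplink radio details are abstracted away.

```python
# Hypothetical sketch of the gateway (coordinator) node's dual role described
# above: aggregate readings from acquisition nodes, then forward them upstream.

class GatewayNode:
    def __init__(self):
        self.buffer = []

    def on_node_report(self, node_id, reading):
        """Coordinator role: collect a reading from an acquisition node."""
        self.buffer.append((node_id, reading))

    def flush_to_center(self, send):
        """Uplink role: hand the buffered batch to the monitoring center."""
        batch, self.buffer = self.buffer, []
        send(batch)

gw = GatewayNode()
gw.on_node_report("node-1", {"soil_temp_c": 17.9})
batches = []
gw.flush_to_center(batches.append)
```

Buffering at the gateway is one plausible reading of "aggregating the collected data"; an actual implementation would also handle network formation, retries, and time-stamping.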

Fig. 3 System overall structure frame diagram


The application layer is the upper layer of the system, namely the monitoring center module, which mainly consists of a web server and a database for data storage and output. The monitoring center uses the B/S (browser/server) architecture. Through it, agricultural managers and ordinary users can not only view agricultural environment information, including real-time and historical data, but also monitor and manage the system's devices and send relevant instructions. The whole system works as follows: sensor nodes, small meteorological stations, and high-precision video monitoring nodes placed in the crop fields collect the fields' environmental parameters and perform real-time video monitoring of crop growth. The data and on-site images are aggregated at the wireless gateway (coordinator gateway), which transmits them to the monitoring center for storage. After registering, agricultural managers or ordinary users can access the web server to follow the ecological parameters of the farmland and the growth status of the crops in real time. The front-end interface of the monitoring center can show the evolution of the environment in the form of bar graphs, diagrams, and curves, and can also monitor the sensor nodes and the wireless gateway (coordinator gateway).

5 Conclusion

Based on the test and operation results, the system can collect the environmental parameters of agricultural areas and monitor the growth status of plants in real time. The monitoring center's web server can store the collected data, providing a basic information base for the development of modern agriculture, and can publish and exchange information. After registration, agricultural managers can log in to the system to obtain real-time information on field environmental parameters and crop growth conditions, as well as perform data query, inspection, and other functions. In short, the IoT-based intelligent agricultural monitoring system presented in this article is of real importance for the development of agriculture.

References

1. Zhou Y, Jing B, Zhang J (2008) Embedded remote monitoring system based on ZigBee wireless sensor network. Instrum Technol Sens 2:47–49
2. Zhang J, Li A, Li J, et al (2011) Research of real-time image acquisition system based on ARM7 for agricultural environmental monitoring. In: 2011 International conference on remote sensing, environment and transportation engineering. IEEE, pp 6216–6220
3. Syria WD (2010) IoT applications in agriculture. Mod Agric Sci Technol 15:54–56
4. Yan D (2011) Design of intelligent agriculture management information system based on IoT. In: 2011 Fourth international conference on intelligent computation technology and automation, pp 1046–1049
5. Yachao D (2008) Development of wireless environmental monitoring network based on ZigBee technology. Dalian University of Technology, Dalian
6. Jiafang W (2010) Research on the development trend of big data in the era of smart agriculture. Technol Econ Manage Res 2:124–1289
7. Linjun Z (2021) Development status and application of smart agriculture based on big data. Agric Eng 11(4):50–53
8. Yingbo C (2020) Research on the application of big data in smart agriculture. Hubei Agric Sci 59(1):17–22
9. Liang H, Yuehua L, Mingzhi H Research on the growth model of organic vegetables in smart agriculture based on big data and the Internet of Things.
10. Xin F, Nian J, Min F (2021) Research on the construction of smart agricultural parks based on big data. Smart Agric Guide 1(3):25–27
11. Gangming G (2021) Application analysis of big data in smart agriculture. Mod Agric Mach 3:19–20
12. Longshan S, Lijuan P (2021) Application and research of big data in smart agriculture. Agric Eng Technol 41(3):47–48
13. Dalong W (2021) Research on the development of "Internet + smart agricultural machinery" under the background of agricultural big data. Smart Agric Guide 1(8):44–46
14. Huiqiang A (2021) Analysis on the construction of smart water conservancy and smart watershed based on big data. Agric Eng Technol 41(9):63–64
15. Yongqing W (2021) Research on the development of smart agriculture based on big data. Hebei Agric Mach 6:8–10

Design of Ship Abnormal Behavior Recognition System for Intelligent Maritime Supervision Yangyang Wu, Hailin Zheng, and Nibin Huang

Abstract The research in this article helps coastal countries maintain full awareness of the maritime situation in the waters under their jurisdiction and provides strong support for ensuring the safety of their navigation waters. Abnormal shutdown of ship-borne AIS equipment can be effectively identified by monitoring changes in the transmission strength of the ship-borne AIS radio signal and using the established shutdown event model. In addition, by monitoring abnormal changes in the signal level of the ship-borne AIS radio station, the system can effectively identify whether the station is being illegally occupied. The experimental results show that the abnormal behavior recognition system designed in this paper can effectively identify abnormal ship behavior based on the established shutdown event model and on monitoring changes in the signal strength of the ship-borne AIS radio station.

Keywords Water transportation · Maritime supervision · Shutdown event · Abnormal behavior · Illegal occupation

1 Introduction

In 2006, the International Maritime Organization (IMO) formulated the e-navigation development strategy to actively promote realization of the goal of "e-navigation". An electronic patrol system is based on the integration of AIS, VTS, CCTV and other information systems with supervision system functions, adding technical means such as simulated cruising, delimited boundary lines and set critical values to realize the functions of maintaining ship traffic order, correcting and punishing ship violations and collecting navigation environment data. The electronic cruise system compares the supervision object information collected by AIS, VTS, GPS and other systems with the standard data or system set values of the maritime business data center, identifies the movement state and development trend of the supervision object and gives early warning of abnormal states, so as to provide reference and decision support for the maritime administration in implementing water traffic safety supervision [1].

China Academy of Launch Vehicle Technology Aohai Technology Co., Ltd. has developed the "clean sea guardian" water radio-assisted law enforcement system, which can not only accurately identify abnormal ships but also monitor national ship information in real time around the clock, so that illegal ships have nowhere to escape. The system is connected with the national AIS data source, the ship registration database and the ship radio database. It can carry out continuous monitoring and real-time intelligent analysis of national ship information, quickly locate illegal ships, provide clues and a basis for law enforcement and improve its accuracy. At the same time, the system can provide law enforcement personnel with functions for on-site collection of evidence of violations and for storage and upload of processing results, improving the standardization of law enforcement [2]. "Smart sharing in dark blue" deeply integrates the ship dynamic data of Ningbo Zhoushan port, E-port customs clearance data, AIS data, over-the-horizon CCTV, real-time meteorological and sea condition monitoring data and some maritime supervision data; its modules for intelligent identification of ship abnormalities, a risk control center, whole-process safety management of large ships in port and intelligent auxiliary supervision provide a powerful tool for the transformation of maritime law enforcement from the "sweat-based type" to the "wisdom type" [3].

Y. Wu · H. Zheng (B) · N. Huang
School of Naval Architecture and Maritime, Zhejiang Ocean University, Zhoushan 316022, China
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_4
This project designs an identification system for abnormal ship behavior for intelligent maritime supervision, providing powerful help for strengthening the management of maritime radio order and for addressing prominent illegal behaviors such as deliberately shutting down the ship automatic identification system (AIS) to evade maritime supervision, tampering with or fraudulently using radio station identification codes, unauthorized modification of ship-borne radio equipment and malicious signal interference. It focuses on tackling such prominent problems as incomplete allocation of legal licenses to ships, "multiple ships per yard", "multiple yards per ship", illegal occupation of channels and violation of communication order.

2 Monitoring Technology for Intentional Shutdown of the Ship-Borne AIS Radio Station

2.1 Shutdown Identification Based on the Distribution Rule of Ship-Borne AIS Signal Strength

The ship and port traffic management system that collects target signals through shore-based radar is called VTS. The radar that collects target signals through ship-based equipment, displays own course and speed and can simulate collision avoidance is called ARPA collision avoidance radar. With the development of navigation and rising requirements for navigation communication, the problem that VTS and ARPA radar cannot directly identify targets has become more prominent. AIS is a newer type of navigation aid system and equipment, and it has since developed into a general automatic identification system. Correct use of AIS helps strengthen the safety of life at sea, improve the safety and efficiency of navigation and protect the marine environment. AIS can prevent collisions between ships, enhance the functions of ship traffic management systems, display static and dynamic ship information on electronic charts, improve maritime communication, enhance ships' overall situational awareness and bring the maritime world into the digital age. High-precision positioning by the global positioning system (civil GPS) can ensure accuracy better than 10 m (measured accuracy up to 3 m), which meets the positioning requirements of AIS.

The globally unique MMSI code of a ship is also called the ship identification number: each ship is given a globally unique MMSI code from the beginning of construction to the end of use. IMO passed a resolution to promote the application of the MMSI code in 1987. Self-organized time-division multiple access (SOTDMA) is a technology that links through data packaging. The AIS technical standard stipulates that every minute is divided into 4500 time slots, and each slot can carry a message no longer than 256 bits. Each ship automatically selects a time slot that does not conflict with other ships and releases its own message in the corresponding slot. On the unified VHF channel, any ship within AIS range can send reports and receive reports from all ships (and shore stations) without mutual interference; this is the technical core of SOTDMA. An AIS system can accommodate 200–300 ships at the same time in the same area. When the system is overloaded, only distant targets are abandoned, in order to guarantee priority for short-range targets.

As some vessels may engage in illegal or dangerous activities, maritime transport and port security have become major concerns. Since AIS is a self-reporting system, the main disadvantage of the technology is its high data error rate and the ease with which data can be changed. Therefore, verifying the reliability of AIS data has become a key issue in fully exploiting the safety and security potential of this technology. AIS data are unreliable because: (1) AIS information can be wrong, as some of it is entered manually by the crew, either when initializing the system for permanent data (such as the ship name) or on every new voyage; (2) AIS reports may be falsified (spoofed) as a form of deception; (3) a ship can turn off its power and AIS transponder to engage in illegal activities. It is difficult to detect whether a ship's AIS has been maliciously shut down to cover up illegal behavior, because the AIS reporting interval is irregular, varying from a few seconds to a few minutes. Moreover, the variation of radio wave propagation in the marine environment is very complex, which greatly affects the received signal strength indicator (RSSI) at the base station [4, 5].
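The SOTDMA slot scheme described earlier (4500 slots per minute, each ship picking a slot that does not conflict with other stations) can be sketched in simplified form; real SOTDMA also reserves future slots and announces them in each transmission, which this toy version omits.

```python
import random

SLOTS_PER_MINUTE = 4500  # per the AIS technical standard cited above

def choose_slot(occupied, rng=random):
    """Simplified stand-in for SOTDMA slot selection: pick any slot
    not already claimed by another station in this frame."""
    free = [s for s in range(SLOTS_PER_MINUTE) if s not in occupied]
    if not free:
        return None  # frame overloaded: no conflict-free slot available
    return rng.choice(free)

occupied = {0, 1, 2}
slot = choose_slot(occupied)
```

The `None` branch corresponds to the overload case discussed above, where the real protocol resolves contention by dropping distant targets rather than refusing to transmit.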


Therefore, based on the variation law of ship-borne AIS signal strength, it is possible to identify whether the ship-borne AIS has been shut down artificially. First, we must understand AIS receiving characteristics. AIS was born as a collision avoidance system based on regular VHF transmission and reception of binary messages containing ship dynamic information (such as position, speed over ground (SOG), course over ground (COG), heading and rate of turn (ROT)) and static information (such as the MMSI ship code, name, call sign, destination, ship type and size). When analyzing maritime communication on the open sea, the direct signal and reflections from the sea surface are considered the relevant components of RSSI, while reflections from the shore and noise are considered undesired components to be removed. However, some phenomena produce interfering signals that have the effect of extending the signal range. In the literature on VHF-band propagation, the following three main phenomena are determined to have a practical impact on AIS transmission: (1) diffraction around the curvature of the earth over the ocean extends the AIS signal; (2) multipath effects can produce variability in RSSI and depend largely on the transmitting ship's antenna, its installation position, the vessel superstructure, wind speed, receiver configuration, antenna height and the surrounding terrain; (3) the transmission range can also be extended when the vertical gradient of the tropospheric refractive-index profile exceeds a certain threshold.

2.2 Anomaly Identification Based on the Ship-Borne AIS "Shutdown Event" Model

Ship static and dynamic data are sent by ship-borne AIS equipment and received by surrounding ships and shore base stations. If the ship's AIS equipment fails during navigation, or the sailing distance exceeds the coverage of the shore base station, the data will no longer be received. During navigation, the equipment of a ship that meets shipping specifications will keep sending data; when the ship slows down and stops, the equipment may be shut down, after which the system no longer receives the data sent by the ship. This kind of event resembles a disappearance event in that the signal is not received for a long time, but the causes differ, and so do the discrimination algorithms. The algorithm for discriminating a shutdown event is as follows [6, 7]. First, judge the speed in the most recently received data from the ship; if it is less than or equal to the set shutdown speed threshold, i.e., the upper speed limit before shutdown, the loss of signal may be caused by a shutdown. Then request the latest two data records for the ship through the interface. If the ship shut down its AIS device after decelerating and stopping, the latest two records must show a decelerating state, with speeds less than or equal to the upper speed limit before shutdown. If so, the ship is considered to have had a shutdown event. If the two consecutive records are not both below the pre-shutdown upper limit, the ship's data are handled as a disappearance event.

After the discrimination algorithm determines that a ship has disappeared, a ship disappearance event is generated. Based on analysis of historical data, loss of a ship's signal for t_th1 consecutive minutes is regarded as the ship having disappeared. The discrimination algorithm is as follows: before each periodic data reception, check whether any record in the current real-time data was last updated t_th1 minutes ago or more, but by no more than an additional t_th2 seconds; if so, the ship corresponding to that record is considered to have disappeared from the system.

According to IMO guidelines, if the master believes that continuous operation of AIS may endanger the safety or security of the ship, or that a security incident is imminent, AIS may be switched off. The date, place, time and reason for the AIS shutdown should be recorded in the ship's log book, and the master should restart the AIS once the source of danger has passed. A frequently cited reason for switching off AIS is to hide the identity, location and route of a ship from pirates when passing through pirate-prone waters. Otherwise, a ship's AIS equipment must operate continuously during the voyage and while berthed in port or at anchorage, which helps the competent authority monitor the ship's navigation dynamics and ensure the safety of its navigation waters. If the ship-borne AIS equipment has not failed and the ship is within the coverage of the shore base station, but the ship nevertheless disappears, there is only one explanation: the ship-borne AIS equipment has been illegally shut down.
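The shutdown-versus-disappearance discrimination described above can be sketched as follows; the threshold values (the pre-shutdown speed limit and t_th1) are configuration parameters of the system, and the numbers used here are illustrative assumptions.

```python
# Sketch of the shutdown-vs-disappearance discrimination described above.
# Threshold values are illustrative assumptions, not from the paper.

SHUTDOWN_SPEED_MAX = 1.0  # knots: upper speed limit before shutdown
T_TH1_MIN = 30            # minutes without signal before a loss is examined

def classify_loss(latest_two_speeds, minutes_since_last):
    """latest_two_speeds: the two most recent reported speeds, newest first."""
    if minutes_since_last < T_TH1_MIN:
        return "still_tracked"
    if (len(latest_two_speeds) == 2
            and latest_two_speeds[0] <= latest_two_speeds[1]   # decelerating
            and all(v <= SHUTDOWN_SPEED_MAX for v in latest_two_speeds)):
        return "shutdown_event"
    return "disappearance_event"
```

A "disappearance_event" for a ship still inside base-station coverage is exactly the case the text flags as a candidate illegal shutdown.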
Anomaly identification also covers the equipment's own behavior:

(1) Whether the power supply of the AIS equipment is normal. Class A AIS and the related sensors should be powered by both the main power supply and the emergency power supply, with automatic switching; Class B AIS is powered by a regulated power supply. On most ships there is basically no situation in which the AIS cannot be powered on during the voyage, unless the main and emergency power supplies of the AIS are manually switched off. Since Class B AIS equipment must be able to store at least the 10 most recent shutdown and startup times, and some Class A AIS units also have this function, inspectors can board the ship to verify whether the AIS starts normally. If it operates normally and the signal checks out, intentional shutdown of the AIS can be confirmed by querying the equipment's alarm information and stop-operation records; if it cannot be turned on, check the power connections, including the switch on the switchboard and the power connection line to the AIS host.

(2) Whether the transceiver antenna of the AIS equipment is abnormal. The transceiver function of AIS equipment relies on the antenna; if an abnormality occurs, check whether the antenna signal cable of the AIS host has been pulled out. For Class A AIS, the ship position signal will be abnormal when the external GPS equipment interface is disconnected. Once law enforcement officers have eliminated the above situations, they can ask the crew to send information to other ships, check whether the AIS equipment is in a normal transmit/receive state, and verify whether the installation of the AIS VHF antenna is reasonable (the AIS VHF antenna and the ship's VHF antenna should not be in the same horizontal plane and should be at least 2 m apart vertically; if they are in the same horizontal plane, they should be at least 10 m apart).

(3) The own ship can receive the AIS signals of other ships, but other ships cannot see the own ship's signal. The transmit power of an AIS transmitter is generally 25 W, and some AIS devices have a transmit power adjustment function. If the transmit power is adjusted to a low value, VTS monitoring will not be able to receive the ship's signal when the ship is far from the AIS base station. To verify whether the ship's AIS transmission function is abnormal, a self-check can be carried out through the built-in integrity test (BIIT) inside the AIS equipment, or a dedicated AIS test instrument can be used to measure the transmit power and transmit frequency of the inspected ship's AIS, so as to further verify whether the crew deliberately turned off the AIS or whether there is a problem with its antenna or transmission function.

3 The Reconnaissance Technology of Illegally Occupying the Working Channel of the Ship-Borne AIS 3.1 Communication Status of Working Channel of Ship-Borne AIS Equipment VHF wireless communication equipment is the main equipment for realizing shortrange radio communication on water, and it is also an important equipment to ensure the safety of ship navigation. AIS and VHF communication equipment plays a vital role in the navigation of modern ships. However, due to the numerous electronic devices on board, the operation of the driver is complicated and inconvenient, and it also brings hidden dangers to the navigation of the ship [8–10]. At present, the main way of maritime communication is through VHF communication equipment, whether on VHF16 channel, or the special channel of port authorities and VTS, each channel is very busy. Despite this, many crew members occupy the channel for a long time to chat, and singing is common, which seriously affects the normal communication between ships and between ships and shores and greatly reduces the navigation efficiency of ships. However, the current maritime communication lacks effective supervision methods, and VHF communication equipment cannot effectively identify the speaker’s identity. When someone on the ship uses AIS communication equipment to chat and talk for a long time, the transmitting power of the AIS equipment is much greater than the receiving power, so the current flowing through the power line is also significantly larger. The Hall sensor detection circuit, according to the Hall effect, can detect the


change of the current during the operation of the ship-borne AIS in real time and can also reflect the use of the AIS equipment. When it is detected that the AIS working channel is occupied and the continuous speech exceeds the preset time, the ship’s AIS equipment will send an automatic identification message, and the surrounding ships equipped with AIS equipment and the shore AIS base station can identify the ship occupying the channel.

3.2 Abnormal Identification of Ship-Borne AIS Working-Level Signal

This solution identifies abnormal AIS high-level signals by connecting a device between the VHF communication equipment and the AIS equipment. It comprises: a current sensor connected to the VHF communication equipment, which detects the current change of the VHF equipment when speaking and outputs a corresponding level signal; a control module, whose input is connected to the output of the current sensor and which drives the AIS equipment to send messages according to the output time of the level signal; a communication controller, connected to the controller through a data interface, which selects the signaling method, controls the transmission of information, and exchanges data across the interfaces; and a VHF transmitter used to send the early-warning signal. The specific components of the self-monitoring and early-warning marine VHF communication device are shown in Fig. 1, and the components of the controller module are shown in Fig. 2.

This scheme provides a supervision and early-warning method for the self-supervision and early-warning marine VHF communication device. The specific process is shown in Fig. 3:

Step (1): The current sensor judges whether the detected current variation exceeds the critical value; if so, go to step (2).
Step (2): The current sensor outputs a high level to the controller.
Step (3): The controller judges whether the duration of the received high-level signal exceeds the preset time; if so, the behavior is judged abnormal and the method proceeds to step (4).
Step (4): The controller sends the warning command to the communication controller through the data interface.
Step (5): The communication controller drives the VHF transmitter to send the early-warning information.
Fig. 1 Circuit block diagram of the self-supervision and early-warning marine VHF communication device

Fig. 2 Composition of the controller module (time detection unit, processing unit, time setting module)

Fig. 3 Supervision and early-warning method of the self-supervision and early-warning marine VHF communication device

In step (5), the early-warning information sent by the VHF transmitter includes the ship information, the maritime mobile service identity (MMSI), binary broadcast information, messages with specific marks, and the timeout early-warning information. The AIS dynamic message is used to carry the early warning, and a reserved bit in the AIS dynamic message indicates the ships that occupy the VHF communication equipment over time, so that surrounding ships and shore base stations can identify them. This scheme not only integrates all the functions of traditional AIS equipment and VHF communication equipment, but also monitors the use of the AIS communication equipment, so that its use can be effectively supervised. This reduces long-term channel occupation, improves the ship's navigation efficiency, and simplifies the electronic equipment on board, letting ship drivers operate the equipment more conveniently and efficiently.
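The threshold-and-timeout logic of steps (1)–(5) can be sketched in software as follows (a minimal illustration with hypothetical names and values; the real device implements this in the controller hardware):

```python
def detect_channel_occupation(current_samples, dt, threshold, max_duration):
    """Flag an alarm when the Hall-sensor current stays above the critical
    value (transmit state) for longer than the preset time.

    current_samples : supply-current readings, one sample every dt seconds
    Returns the times (in seconds) at which a warning would be issued."""
    high_time = 0.0
    alarms = []
    for i, amps in enumerate(current_samples):
        if amps > threshold:              # steps (1)-(2): current jump -> high level
            high_time += dt
            if high_time > max_duration:  # step (3): duration exceeds preset time
                alarms.append(i * dt)     # steps (4)-(5): issue the early warning
                high_time = 0.0           # re-arm after alerting (a sketch choice)
        else:
            high_time = 0.0
    return alarms
```

With a 1 Hz sample rate, for instance, a 10 s transmission against a 6 s limit triggers exactly one warning.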


4 Conclusion

To sum up, the research in this paper helps coastal states maintain full awareness of the maritime situation in the waters under their jurisdiction and provides strong support for ensuring the safety of navigation in those waters. Abnormal shutdown of ship-borne AIS equipment can be effectively identified by monitoring changes in the transmission intensity of the ship-borne AIS radio signal and using the established shutdown-event model. In addition, by monitoring abnormal changes in the signal level of the ship-borne AIS radio station, it can be effectively determined whether the station is being illegally occupied. This research is a key basis for detecting suspicious traffic situations or threats in the maritime domain, and it can provide strong support and decision-making guidance for maritime monitoring systems.

Acknowledgements Fund Project: Zhejiang University Students' Science and Technology Innovation Activity Plan and New Miao Talent Project (2021R411041); Zhejiang Ocean University's National Innovation and Entrepreneurship Training Program for College Students (202010340035).

References

1. Kunyang W (2020) Application of virtual “electronic fence” in the supervision of ship entry and exit reports. Transp Enterprise Manag 35(2):3
2. Sanqiang Z (2020) Reflections on ship AIS signal loss and safety management. Pearl River Water Transp 22:2
3. Zhikai X, Application of ship AIS shutdown monitoring system in inland waters. In: Proceedings of the 2020 Maritime Management Academic Annual Conference
4. Mazzarella F, Vespe M, Alessandrini A et al (2017) A novel anomaly detection approach to identify intentional AIS on-off switching. Expert Syst Appl 78:110–123
5. Hong X, Design and implementation of massive data storage and real-time event discovery system. Beijing University of Posts and Telecommunications
6. Qinyou H, Chuanxin C, Liang C et al (2016) Self-supervision and early-warning maritime VHF communication device and its supervision and early-warning method: CN105551309A
7. Qinyou H, Qifei Z, Liang C et al (2016) Device for VHF communication equipment to drive AIS equipment to send automatic identification message: CN105262603A
8. Yong H, Changyun J, Shixiang C et al (2014) Maritime dynamic supervision system in the era of “e-navigation”—electronic cruise. J Wuhan Univ Technol: Traffic Sci Eng Edn 38(2):5
9. Changyun J, Research on maritime dynamic supervision mode of Yangtze River trunk line based on electronic cruise. Wuhan University of Technology
10. Jiatong H (2020) Screening method of slipway irregularities based on AIS technology. Pearl River Water Transp 22:2

Phase Recovery Algorithm Based on Intensity Transport Equation and Angular Spectrum Iteration

Haiyan Wu

Abstract A phase recovery algorithm uses directly measured intensity distributions to recover the phase and reconstruct the wave function. The main approaches are the intensity transport equation (TIE) and iterative algorithms. The TIE algorithm is fast and suitable for small imaging distances; when the imaging distance is too large, its linear approximation no longer holds, which is equivalent to low-pass filtering of the image, so it is effective mainly for low-frequency reconstruction. Iterative algorithms recover the high-frequency part very effectively in only a few iterations, but for the low-frequency part of the image their reconstruction efficiency is lower and more iterations are needed to reach high accuracy, which increases the computational burden. Combining the characteristics of the TIE and the iterative algorithm, the two can be used together. Among the iterative algorithms, the angular spectrum iterative algorithm is selected. Its initial phase is normally random; if the initial phase is close to the true value, convergence is faster. We take the TIE reconstruction as the initial phase of the angular spectrum iterative algorithm and then run the iteration for phase reconstruction. Because the phase map reconstructed by the TIE is accurate in the low-frequency part, the angular spectrum iterative algorithm can further reconstruct the high-frequency information with fewer iterations and finally complete the phase reconstruction. This not only improves the calculation accuracy of the TIE in the Fresnel domain, but also reduces the computational burden of the iterative algorithm.

Keywords Phase retrieval · Intensity transport equation · Iterative angular spectrum algorithm · Fusion

H. Wu (B) Anhui Sanlian University, Hefei 230601, Anhui, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_5


1 Introduction

A complete description of a light field includes intensity, wavelength, and phase information. Studies show that most of the useful information in the light field (about 75%) is contained in the phase. Because optical fields oscillate far too fast for detectors to follow, today's instruments cannot measure the phase directly. Phase retrieval computes the phase from directly measured intensity information and thereby reconstructs the whole wave function. Phase information provides important data for light-field reconstruction and holographic three-dimensional display technology, and phase retrieval is an important research field of modern optics.

At present, phase retrieval comprises two main families of algorithms: those based on the intensity transport equation (TIE) and iterative methods. Teague [1] proposed the TIE in 1983. There are many methods to solve it, such as the Fourier method [2], the multigrid method [3, 4], and the Green's function method [1]. The TIE approach is accurate and effective, but solving it is complex and it is only suitable for the near-field region of the light field. In 1972, Gerchberg et al. proposed the GS iterative algorithm [5]. Many scholars later proposed modifications and improvements to the GS algorithm, such as the error-reduction algorithm [5], the hybrid input–output algorithm [6], and the angular spectrum iterative algorithm [7]. Iterative algorithms exploit the reversibility of wave propagation to iterate repeatedly between the object plane and the image plane. However, they have several disadvantages, such as slow convergence and iterative uncertainty, and they are only suitable for the far-field region of the light field.
The iterative angular spectrum algorithm (IASA) builds on the traditional GS algorithm, using angular spectrum propagation between the object plane and the image plane to recover the phase information on the object plane. Based on the applicability of the two algorithms, this paper combines the IASA with the intensity transport equation method: the phase solved from the TIE is used as the initial phase of the IASA, which then iterates from there. The experiments below show that this new algorithm effectively improves the accuracy of the TIE solution, reduces the number of angular spectrum iterations, compensates for the shortcomings of each method, and ultimately recovers the phase better.

2 Phase Retrieval Algorithm

2.1 Intensity Transport Equation (TIE)

When a plane wave passes through the test piece and the forward direction of the wave changes (a phase change), a change in the intensity distribution is formed on the imaging plane. The relationship between the intensity changes on different imaging planes contains the phase information imparted by the test piece, and the intensity transport equation describes the relationship between the image intensity distribution and the phase distribution. Therefore, if the intensity distributions on different imaging planes can be recorded, the phase information can be solved through the TIE.

Fig. 1 Schematic diagram of the TIE

The TIE describes the relationship between phase distribution and intensity distribution. Its basic principle is as follows: assuming the plane wave propagates along the z-axis (as shown in Fig. 1), the Poynting vector, intensity, and phase satisfy

\langle \vec{S} \rangle = \frac{\lambda}{2\pi}\, I(x, y, z)\, \nabla\phi(x, y, z)    (1)

where I(x, y, z) represents the intensity, \phi(x, y, z) is the phase, \lambda is the wavelength, \langle \vec{S} \rangle is the time average of the Poynting vector over a period, and \nabla = (\partial_x, \partial_y) is the transverse gradient operator. Decomposing the phase into a scalar component \phi_s(x, y, z) and a vector component \vec{\phi}_v(x, y, z) gives

\langle \vec{S} \rangle = I(x, y, z)\left[\nabla\phi_s(x, y, z) + \nabla \times \vec{\phi}_v(x, y, z)\right]    (2)

In the paraxial regime, assuming that the energy propagates at small angles about the optical axis, Eqs. (1) and (2) yield

-\frac{2\pi}{\lambda}\,\partial_z I = \nabla \cdot (I\, \nabla\phi)    (3)


The above formula is the TIE. There are many methods to solve it, such as the Fourier method, the Green's function method, and the multigrid method. In this paper, the Fourier method is used. Rearranging Eq. (3) to solve for the phase gives

\phi(r_\perp, z) = -\mathcal{F}^{-1}\left\{ q_\perp^{-2}\, \mathcal{F}\left[ \nabla_\perp \cdot \left( I^{-1}(r_\perp, z)\, \nabla_\perp \psi(r_\perp, z) \right) \right] \right\}    (4)

where the auxiliary function \psi is

\psi(r_\perp, z) = \mathcal{F}^{-1}\left\{ q_\perp^{-2}\, \mathcal{F}\left[ k\, \partial_z I(r_\perp, z) \right] \right\}    (5)

Here \mathcal{F} and \mathcal{F}^{-1} denote the forward and inverse Fourier transforms and q_\perp is the transverse spatial angular frequency. The axial intensity derivative \partial_z I(r_\perp, z) in Eq. (5) can be approximated by a central intensity difference:

\partial_z I(r_\perp, z) = \frac{I(r_\perp, z + \Delta z) - I(r_\perp, z - \Delta z)}{2\,\Delta z}    (6)

As shown above, the phase can be solved from two intensity maps using the Fourier method.
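The Fourier-method solution of Eqs. (4)–(6) can be sketched numerically. The sketch below assumes the paper's uniform-intensity case I(x, y) ≡ I0, in which Eq. (3) reduces to a Poisson equation for the phase and Eqs. (4)–(5) collapse to a single inverse Laplacian; all function and variable names are ours, not the paper's:

```python
import numpy as np

def solve_tie_fourier(I_minus, I_plus, dz, wavelength, dx, I0=1.0):
    """Recover the phase from two defocused intensity images via the Fourier
    solution of the TIE, specialized to uniform in-plane intensity I0."""
    k = 2.0 * np.pi / wavelength
    n, m = I_plus.shape
    # Eq. (6): central-difference estimate of the axial intensity derivative
    dI_dz = (I_plus - I_minus) / (2.0 * dz)
    # squared transverse angular frequency q^2 = (2*pi*f)^2
    fx = np.fft.fftfreq(m, d=dx)
    fy = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    q2 = (2.0 * np.pi) ** 2 * (FX ** 2 + FY ** 2)
    q2[0, 0] = np.inf            # the DC term carries no phase-gradient information
    # invert the Laplacian in Fourier space: -k dI/dz = I0 * lap(phi)
    phi_hat = k * np.fft.fft2(dI_dz) / (I0 * q2)
    return np.real(np.fft.ifft2(phi_hat))
```

Because the DC component is dropped, the phase is recovered up to an additive constant, which is all any TIE solver can provide.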

2.2 Iterative Angular Spectrum Algorithm (IASA)

The complex amplitude on the object plane is

U_1(x_1, y_1) = \rho_1(x_1, y_1) \exp[i\phi_1(x_1, y_1)]    (7)

The complex amplitude on the image plane can be expressed as

U_2(x_2, y_2) = |U_2(x_2, y_2)| \exp[i\phi_2(x_2, y_2)]    (8)

Assuming that the light-field transformation between the object plane and the image plane obeys scalar diffraction theory, the forward and reverse diffraction between the two planes can be constructed with the angular spectrum transfer function according to angular spectrum propagation theory. The light fields on the object and image planes satisfy

U_2(x_2, y_2) = \mathcal{F}^{-1}\left\{ \mathcal{F}\left[U_1(x_1, y_1)\right]\, H(f_{x_j}, f_{y_j}) \right\}    (9)

where the transfer function is

H(f_{x_j}, f_{y_j}) = \exp\left[ i k \Delta z \sqrt{1 - (\lambda f_{x_j})^2 - (\lambda f_{y_j})^2} \right]

with k = 2\pi/\lambda, f_{x_j} = n/\Delta L, f_{y_j} = m/\Delta L (m, n = -N/2, -N/2 + 1, \ldots, N/2 - 1). Here \Delta z is the distance between the object plane and the imaging plane (positive for forward diffraction, negative for reverse diffraction), \Delta L is the calculated width, m and n index the sampling points, and N is the total number of samples. The discrete numerical calculation can be completed with the fast Fourier transform.

The steps of the iterative angular spectrum algorithm are as follows:

(a) A random phase is used as the initial phase distribution of the object wave and, together with the known amplitude distribution, forms the initial wavefront function. Forward angular spectrum propagation gives the wavefront function on the image plane.
(b) The phase is kept unchanged, but the amplitude is replaced with the amplitude distribution of the image plane measured in advance.
(c) The wavefront function on the object plane is obtained by backward angular spectrum propagation of the new image-plane wave function.
(d) Keeping the phase unchanged, the amplitude is replaced with the known amplitude distribution of the object plane, which serves as the initial object wave function of the next iteration.

The iteration is repeated until the defined sum of squared errors reaches the set accuracy or the set maximum number of iterations. The SSE is defined as

SSE = \left\| \rho_2 - \rho_2^{(n)} \right\|^2 / \left\| \rho_2 \right\|^2    (10)

where \rho_2 is the measured amplitude distribution of the image plane and \rho_2^{(n)} is the amplitude distribution obtained at the end of the n-th iteration. The flowchart of the iterative angular spectrum algorithm is shown in Fig. 2.
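Steps (a)–(d) can be sketched as follows (a minimal NumPy illustration with our own names; the optional phase0 argument anticipates seeding the iteration with a TIE result, as this paper does later):

```python
import numpy as np

def angular_spectrum(u, dz, wavelength, dx):
    """Propagate field u over distance dz with the angular-spectrum transfer
    function H of Eq. (9); a negative dz gives the reverse propagation."""
    n, m = u.shape
    fx = np.fft.fftfreq(m, d=dx)
    fy = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    k = 2.0 * np.pi / wavelength
    H = np.where(arg > 0,
                 np.exp(1j * k * dz * np.sqrt(np.maximum(arg, 0.0))),
                 0.0)  # evanescent components are simply dropped in this sketch
    return np.fft.ifft2(np.fft.fft2(u) * H)

def iasa(amp_obj, amp_img, dz, wavelength, dx, n_iter=100, phase0=None):
    """Steps (a)-(d): iterate between planes, keeping phases and replacing
    amplitudes with the measured ones."""
    rng = np.random.default_rng(0)
    phase = rng.uniform(-np.pi, np.pi, amp_obj.shape) if phase0 is None else phase0
    for _ in range(n_iter):
        u_img = angular_spectrum(amp_obj * np.exp(1j * phase), dz, wavelength, dx)  # (a)
        u_img = amp_img * np.exp(1j * np.angle(u_img))                              # (b)
        u_obj = angular_spectrum(u_img, -dz, wavelength, dx)                        # (c)
        phase = np.angle(u_obj)                                                     # (d)
    return phase
```

For band-limited fields (no evanescent components), forward propagation followed by reverse propagation returns the original field exactly, which is the reversibility the iteration relies on.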

Fig. 2 Flowchart of the iterative angular spectrum algorithm

2.3 Improved Algorithm

The TIE algorithm is fast and suitable for small imaging distances. When the imaging distance is too large, the linear approximation no longer holds, which is equivalent to low-pass filtering of the image; the TIE is therefore effective for low-frequency reconstruction. The iterative algorithm is very effective at recovering the high-frequency part and needs only a few iterations there, but for the low-frequency part of the image its reconstruction efficiency is lower and more iterations are needed to reach high accuracy, which increases its computational burden. Combining these characteristics, the two algorithms can be used together. In the iterative angular spectrum algorithm, the initial phase is normally random; if the initial phase is close to the true value, convergence is faster. We therefore take the TIE reconstruction as the initial phase of the iterative angular spectrum algorithm and then run the iteration for phase reconstruction. Because the phase map reconstructed by the TIE is accurate in the low-frequency part, the iterative angular spectrum algorithm can further reconstruct the high-frequency information with fewer iterations and finally complete the phase reconstruction. This not only improves the calculation accuracy of the TIE in the Fresnel domain, but also reduces the computational burden of the iterative angular spectrum algorithm.

3 Simulation Experiment Results

We take Fig. 3 as the imaginary object phase; its phase values lie in [-0.45\pi, 0.9\pi] and the wavelength is \lambda = 10^{-10} m. The intensity distribution in the object plane is assumed uniform, I_0(x, y) \equiv 1.


Fig. 3 Imaginary object phase

3.1 IASA Simulation Experiment

Following the steps of the iterative angular spectrum algorithm, Fig. 4 shows the results of 100 iterations at different distances, and Fig. 5 shows the results of 100, 500, and 1000 iterations at the distance z = 0.1 m.

Fig. 4 Results of 100 iterations at different distances: (a) z = 0.01 m, (b) z = 0.1 m, (c) z = 1 m

Fig. 5 Results of different iteration counts at z = 0.1 m: (a) 100, (b) 500, (c) 1000 iterations

To verify the applicability of the iterative algorithm for recovering phase information, we ran 100 iterations at different distances, as shown in Fig. 4. The iterative algorithm performs poorly in the near-field region but better at larger distances. Its reconstruction of the low-frequency part is poor, while the high-frequency part needs only a few iterations to reconstruct well. This is because, in the Fresnel domain, the phase transfer function of the low-frequency components is smaller than that of the high-frequency components. As can be seen from Fig. 5, the low-frequency part requires many iterations to reach high accuracy, which makes the algorithm very slow.

3.2 Improved Algorithm Simulation Experiment

We run the TIE algorithm and the IASA at imaging distances of z = 0.1 m and z = 1 m, respectively, and then apply the combined algorithm for 100 iterations. The experimental results are shown in Figs. 6 and 7. As can be seen from Fig. 6, at z = 0.1 m (near-field region) the TIE recovery is better than the IASA result, and the fused TIE + IASA result is clearly better than either algorithm used alone. Moreover, comparing Fig. 5c with Fig. 6c, the new algorithm reaches the quality of 1000 IASA iterations after only 100 iterations at the same z value. Figure 7 shows that at z = 1 m (far-field region) the TIE recovery is poor while the IASA result is good, confirming, as noted above, that the TIE suits the near field and the IASA the far field; again, the recovery of the fused algorithm is clearly better than that of either alone.

The experimental results show, first, that the TIE reconstruction is a good choice for the initial phase of the iteration: an appropriate initial phase greatly accelerates convergence, and the desired accuracy can be obtained with fewer iterations. Second, the TIE computation is fast, so the fused TIE + iterative algorithm does not significantly increase the amount of computation.

Fig. 6 Comparison of the three algorithms at z = 0.1 m: (a) TIE, (b) IASA, (c) TIE + IASA

Fig. 7 Comparison of the three algorithms at z = 1 m: (a) TIE, (b) IASA, (c) TIE + IASA

4 Conclusion

Of the two phase retrieval approaches studied here, the iterative algorithm is suited to the far field and the TIE to the near field, and each has its own shortcomings. This paper combines the two algorithms: the phase obtained by the TIE is used as the initial value of the IASA. Experiments show that the fused algorithm is fast and effective.

Acknowledgements This work was supported by the Support Program for Excellent Young Talents in Universities of Anhui Province (gxyq2020082) and the Anhui Provincial Natural Science Foundation (KJ2020A0810, KJ2021A1190).

References

1. Teague MR (1983) Deterministic phase retrieval: a Green's function solution. J Opt Soc Am 73:1434–1441
2. Paganin D, Nugent KA (1998) Noninterferometric phase imaging with partially coherent light. Phys Rev Lett 80:2586–2589
3. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in Fortran, 2nd edn. Cambridge University Press, Cambridge
4. Dang XB, Zheng SL, Jiang ZG (2009) Phase recovery method for solving intensity transmission equation by complete multigrid method, pp 0253–2239, 06–1514–05
5. Wang X, Mao H, Zhao DZ (2007) Phase recovery based on light intensity propagation equation. J Opt 27(12):2117–2121
6. Fienup JR (1982) Phase retrieval algorithms: a comparison. Appl Opt 21:2758–2769
7. Yu B, Peng X, Tian JD, Niu HB et al (2005) Simulation of phase recovery of hard X-ray coaxial proportional imaging. Sci China Ser G Phys Mech Astron 35(3):233–240

Advanced Computer Science and Information Management

Debias in Deep Learning Recommender System

Jialin He, Kenan Li, Haonan Yao, and Haoqiang Kang

Abstract In recent years, many researchers have worked on recommender systems, which predict users' ratings of or preferences for items from massive data. Traditional statistical learning recommender systems include collaborative filtering (CF), matrix factorization (MF), the factorization machine (FM), the field-aware factorization machine (FFM), and the gradient boosting decision tree (GBDT). Deep learning algorithms such as AutoRec and NeuralCF are also introduced in this article. However, we argue that existing recommender systems are based on association and therefore deviate from the actual causal situation; thus, the inverse propensity score (IPS) and the doubly robust model are proposed for debiasing.

Keywords Recommender system · Deep learning · Causal inference · Inverse propensity score

1 Introduction

The development of recommender systems has spanned nearly 20 years. Depending on the application field, a recommender system recommends items to users such as audio, music, digital products, or news, items that are either helpful or of interest to the user. As the scale of the Internet continues to expand, the number and types of recommended items continue

J. He Huazhong University of Science and Technology, Wuhan 430074, China K. Li (B) International School, Jinan University, Guangzhou 510632, China e-mail: [email protected] H. Yao School of Mathematics, University of Connecticut, Storrs, CT 06269, USA H. Kang Applied and Computational Math Sciences—University of Washington, Seattle, WA 98195, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_6


to increase, and users' attention to and requirements for retrieval and recommendation are also constantly rising. If people in the field of recommender systems were asked to choose the most widely used model, collaborative filtering (CF) would undoubtedly be the first choice. Collaborative filtering automatically predicts a user's interests by collecting preference or taste information from many users. At present, deep learning recommendation models such as AutoRec and NeuralCF are gradually becoming mainstream, but collaborative filtering is still used in a large number of scenarios thanks to its intuitive, interpretable, and easy-to-train nature.

Recommender systems in current use are based on co-occurrence frequency, which can cause bias; for example, the system may recommend items we already own. Although collaborative filtering is an intuitive and highly interpretable model, it does not have strong generalization ability: it reflects only the association between data, not causality. This produces a strong head effect, in which popular products easily accumulate similarities with a large number of products, while tail products, lacking features, lack similarities with other products and are therefore hard to recommend. As a result, there is an error between the actually measured data and the predicted data. Inverse propensity scoring (IPS) is a recently adopted method to reduce this error. Our goal is to use debiasing to combine association with causality.

In the following, we introduce statistical learning methods based on collaborative filtering and matrix factorization, and the deep learning models AutoRec and NeuralCF. We introduce both statistical learning and deep learning algorithms in recommender systems and test them on the same dataset, MovieLens, with over 1 million records.
Then, we introduce the inverse propensity score (IPS) and doubly robust algorithms to debias the recommender system.
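The idea behind IPS can be shown with a toy estimator (a hedged sketch with hypothetical names; real systems must also estimate the propensities): weighting each observed entry by the inverse of its observation probability recovers an unbiased average over all user-item pairs.

```python
import numpy as np

def naive_error(observed_errors):
    """Average error over the observed ratings only; biased whenever the
    chance of a rating being observed depends on the user/item."""
    return float(np.mean(observed_errors))

def ips_error(observed_errors, propensities, n_total):
    """Inverse-propensity-scored estimate: each observed entry is weighted by
    1 / P(observed), making the sum unbiased for the average error over ALL
    n_total user-item pairs (propensities assumed known here)."""
    return float(np.sum(observed_errors / propensities) / n_total)
```

For example, with 200 pairs in total, if 100 pairs with error 1.0 are observed with probability 0.5 and 100 pairs with error 3.0 with probability 0.25, the naive average of the 75 observed errors is about 1.67, while the IPS estimate recovers the true population mean of 2.0.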

2 Application of Statistical Learning in Recommendation System

Nowadays there are several statistical learning algorithms in use, such as the factorization machine (FM), the field-aware factorization machine (FFM), and gradient boosting decision trees stacked with logistic regression (GBDT + LR).

2.1 Collaborative Filtering (CF)

Collaborative filtering (CF) comprises two algorithms, UserCF and ItemCF. UserCF is based on user similarity [1], which matches human intuition: “if people with


similar interests like an item, then I will also like it.” However, in this situation the number of users is usually much larger than the number of items, so UserCF becomes extremely expensive; moreover, if a user lacks historical data, the matching accuracy drops sharply. ItemCF [2] remedies these defects of UserCF: it is based on the similarity of items, selecting items according to the user's historical behavior and recommending other items through item-item similarity. Therefore, UserCF favors scenarios where users are far fewer than items and items are updated frequently, such as news and hot-topic recommendation; ItemCF favors scenarios where items are far fewer than users and are not updated frequently, such as major e-commerce platforms like Amazon and Taobao, where the item catalog does not change too fast.
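A minimal ItemCF sketch (our own illustrative names, not the paper's code): item-item cosine similarities are computed from the rating matrix, and a user's candidate items are scored by a similarity-weighted sum of that user's ratings.

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between the item columns of a user-item rating
    matrix R (0 = unrated); the diagonal is zeroed so an item never
    recommends itself."""
    norms = np.linalg.norm(R, axis=0)
    norms = np.where(norms == 0, 1.0, norms)  # guard against empty items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)
    return S

def itemcf_scores(R, user, S):
    """Score every item for one user as the similarity-weighted sum of the
    items that user has already rated."""
    return S @ R[user]
```

In practice the top-scoring items the user has not yet rated would be recommended.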

2.2 Matrix Factorization (MF)

The matrix factorization algorithm alleviates the obvious head effect of the collaborative filtering algorithm. Matrix factorization introduces the concept of the implicit (latent) vector [3], which improves the model's ability to handle sparse data. It generates a latent vector for each user and item and places users and items in a shared representation space, so that the distance between any user and item can be measured; items close to the target user are recommended, because a closer distance means more similar interest characteristics.
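A minimal SGD sketch of MF (illustrative names and hyperparameters, not the paper's implementation): each observed rating pulls the corresponding user and item latent vectors so that their inner product approaches the rating.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=8, lr=0.01, reg=0.01, epochs=200, seed=0):
    """Learn user latent vectors P and item latent vectors Q by stochastic
    gradient descent on observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user latent vectors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item latent vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            p, q = P[u].copy(), Q[i].copy()
            err = r - p @ q                        # prediction error
            P[u] += lr * (err * q - reg * p)       # gradient step with L2 reg
            Q[i] += lr * (err * p - reg * q)
    return P, Q
```

A prediction for any (user, item) pair is then simply `P[u] @ Q[i]`, even for pairs never observed during training.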

2.3 Factorization Machine (FM)

Factorization machine (FM) algorithms combine linear regression with matrix factorization. The neat idea behind them is to model interactions between features using low-dimensional latent vectors (\omega_{j_1}, \omega_{j_2}), with their dot product as the interaction weight. By doing so, FM can estimate all interactions between features even with extremely sparse data. Broadly speaking, factorization machines can estimate interactions in sparse settings because they break the independence of the interaction parameters by factorizing them: data for one interaction also helps to estimate the parameters of related interactions (similar to the idea of matrix factorization and collaborative filtering). The second-order part of the FM expression is

\phi_{FM}(\omega, x) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} (\omega_{j_1} \cdot \omega_{j_2})\, x_{j_1} x_{j_2}
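The second-order term above can be computed naively in O(n²k), or in O(nk) using the well-known algebraic identity 0.5(|Σ v_j x_j|² − Σ |v_j x_j|²); the sketch below (our own names) implements both:

```python
import numpy as np

def fm_pairwise(V, x):
    """Naive FM second-order term: sum over j1 < j2 of <v_j1, v_j2> x_j1 x_j2.
    V has shape (n_features, k); x has shape (n_features,)."""
    n = len(x)
    return sum((V[j1] @ V[j2]) * x[j1] * x[j2]
               for j1 in range(n) for j2 in range(j1 + 1, n))

def fm_pairwise_fast(V, x):
    """Same quantity in O(nk): 0.5 * (|sum_j v_j x_j|^2 - sum_j |v_j x_j|^2)."""
    s = V.T @ x
    return 0.5 * (s @ s - np.sum((V * x[:, None]) ** 2))
```

The fast form is what makes FM training linear in the number of features.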


2.4 Field-Aware Factorization Machine (FFM)

The field-aware factorization machine (FFM) [4] lets the model know and memorize the field of the element interacting with each latent vector. Compared with FM, the latent-vector pair is altered from (\omega_{j_1} \cdot \omega_{j_2}) to (\omega_{j_1, f_2} \cdot \omega_{j_2, f_1}), so that each feature corresponds to a group of latent vectors rather than a single one. For example, when feature x_1 interacts with feature x_2, x_1 uses \omega_{j_1, f_2} in the dot product because field f_2 corresponds to feature x_2; similarly, x_2 uses \omega_{j_2, f_1} because field f_1 corresponds to feature x_1. The second-order part of the FFM expression is

\phi_{FFM}(\omega, x) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} (\omega_{j_1, f_2} \cdot \omega_{j_2, f_1})\, x_{j_1} x_{j_2}
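A direct sketch of the FFM second-order term (illustrative names): each feature stores one latent vector per field, and the pair (j1, j2) combines the vector of j1 indexed by j2's field with the vector of j2 indexed by j1's field.

```python
import numpy as np

def ffm_pairwise(W, fields, x):
    """FFM second-order term.

    W      : array of shape (n_features, n_fields, k), one latent vector
             per (feature, field) combination
    fields : fields[j] is the field index of feature j
    x      : feature values, shape (n_features,)"""
    n = len(x)
    total = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            total += (W[j1, fields[j2]] @ W[j2, fields[j1]]) * x[j1] * x[j2]
    return total
```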

2.5 Gradient Boosting Decision Tree (GBDT) with Logistic Regression (LR)

GBDT is a commonly used nonlinear model based on the boosting idea of ensemble learning [5]. Since GBDT itself can discover a variety of discriminative features and feature combinations, the decision-tree paths can be used directly as LR input features, eliminating the manual work of finding features and feature combinations. Therefore, as shown in Fig. 1, the leaf nodes of GBDT can serve as the input of LR.
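The stacking in Fig. 1 can be sketched with scikit-learn, assuming it is available (the data and all names here are illustrative): the leaf indices GBDT assigns to each sample are one-hot encoded and fed to a logistic regression.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# toy nonlinear binary-classification data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)

# 1) fit GBDT, then read off which leaf each sample lands in per tree
gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
gbdt.fit(X, y)
leaves = gbdt.apply(X)[:, :, 0]          # shape (n_samples, n_trees)

# 2) one-hot encode the leaf indices and fit LR on them
enc = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.fit_transform(leaves), y)
train_acc = lr.score(enc.transform(leaves), y)
```

For new samples the same `gbdt.apply` + `enc.transform` pipeline produces the LR input.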

3 Application of Deep Learning in Recommendation Systems

There are two fundamental ideas in recommendation-system algorithms: one is user and item representation, and the other is feature interaction. Among the traditional algorithms, matrix factorization (MF) starts from user and item representations (embeddings) and expresses a user's preference for an item as the inner product of the user representation vector and the item representation vector, while the factorization machine (FM) is dedicated to the feature-interaction problem. After the introduction of AlexNet in 2012, deep learning models gradually swept the recommendation and advertising fields, becoming the mainstream of a new generation of recommendation models.

Debias in Deep Learning Recommender System


Fig. 1 Structure of GBDT

3.1 The AutoRec Model

In 2015, Suvash et al. proposed AutoRec, a novel autoencoder framework for collaborative filtering (CF), at the International Conference on World Wide Web. Figure 2 shows the item-based AutoRec model [6]. The single-hidden-layer autoencoder takes r^(i) as the input, maps the vector to a low-dimensional representation through the encoder, and then reconstructs the output vector through the decoder so that the output is close to the input, which gives the model the ability to complete the missing values of the original input vector.

Fig. 2 Item-based AutoRec model
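The single-hidden-layer reconstruction can be sketched as follows (illustrative shapes, with a sigmoid encoder and identity decoder as one common choice; not the authors' code):

```python
import numpy as np

def autorec_forward(r, V, mu, W, b):
    """Item-based AutoRec reconstruction h(r; theta) = f(W * g(V r + mu) + b).

    r  : partially observed rating vector for one item (length = #users)
    V  : encoder weights (k x #users), mu : encoder bias (k,)
    W  : decoder weights (#users x k), b  : decoder bias (#users,)
    """
    hidden = 1.0 / (1.0 + np.exp(-(V @ r + mu)))  # g = sigmoid encoder
    return W @ hidden + b                          # f = identity decoder
```

Training minimizes the reconstruction error over the observed entries only; the reconstructed vector then fills in the missing ratings.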


Although AutoRec’s compact and efficiently trainable model outperforms CF techniques on the MovieLens datasets, it is only a single-layer network with weak generalization and expression capabilities.

3.2 The NeuralCF Model

MF computes the similarity between user u and item i as the inner product of their hidden vectors. This model structure is relatively simple and is often unable to effectively fit the optimization target in practical applications. To better learn the relationship between the hidden vectors of users and items and the ratings, in 2017 He Xiangnan et al. used a deep learning network to improve the traditional collaborative filtering algorithm, named neural collaborative filtering (NeuralCF) [7]. The authors use a multi-layer neural network to replace the simple inner-product operation and thereby enhance the learning ability of the model. The framework of the NeuralCF model is shown in Fig. 3. Beyond the AutoRec model, which changes the complexity of the neural network, and the NeuralCF model, which changes the feature-crossing method, applications of deep learning in recommender systems also include combination models such as Deep & Cross and DeepFM, deep evolutions of FM models such as NFM and FMM, and the recently popular DIN, DIEN, DRN, etc. On the one hand, compared with traditional machine learning models, deep learning models have stronger expressive ability and can mine more hidden patterns in the data; on the other hand, the deep learning model

Fig. 3 Neural collaborative filtering framework


has a very flexible structure, and the model structure can be flexibly adjusted according to business scenarios and data characteristics so that the model fits the application scenario well.
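The core replacement NeuralCF makes, an MLP over the concatenated user and item embeddings instead of their inner product, can be sketched as follows (illustrative layer sizes, not the published implementation):

```python
import numpy as np

def neuralcf_score(u_emb, i_emb, layers):
    """Score a user-item pair by passing the concatenated embeddings
    through an MLP: ReLU hidden layers, sigmoid on the scalar output."""
    h = np.concatenate([u_emb, i_emb])
    for W, b in layers[:-1]:
        h = np.maximum(W @ h + b, 0.0)        # ReLU hidden layers
    W, b = layers[-1]
    logit = float((W @ h + b)[0])             # final layer outputs one logit
    return 1.0 / (1.0 + np.exp(-logit))       # predicted preference in (0, 1)
```

The learned nonlinearity lets the model express interactions that a fixed inner product cannot.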

4 Debiasing in Recommender Systems

A recommender system recommends content that users may be interested in based on analysis of historical data [8]. However, users cannot rate all items because of their sheer quantity, and users do not evaluate the existing items completely at random. Jiawei et al. describe a feedback loop in recommendation and demonstrate that observed data contain biases, including selection bias, position bias, exposure bias, popularity bias, etc. [9]. Because the ratings are missing not at random (MNAR), the probability distributions of the real data and the observed data differ. Thus, the observed data should not be used directly and need debiasing.

4.1 Propensity Score

Existing recommender systems typically consider only the displayed data and ignore the missing-data mechanism. To address this, the propensity score evaluates the probability that user u ∈ {1, …, U} rates item i and the rating is observed. In general, the propensity depends on observable or unobservable features X (e.g., the user's gender, occupation, and age) and the ratings Y ∈ R^{U×I}. The binary matrix O ∈ {0, 1}^{U×I} indicates which items each user has rated. Once the observable characteristics are taken into account, it is reasonable to assume that O_{u,i} is independent of the new prediction Ŷ. Propensity-estimation methods include, but are not limited to, logistic regression (LR), naïve Bayes, and neural networks (NN). The propensity score estimated from the features X and ratings Y can be expressed as follows.

P_{u,i} = P(O_{u,i} = 1 | X, Y)

The empirical risk minimization (ERM) with propensity scores is given as follows [10]: given training observations O from Y with marginal propensities P, a hypothesis space H of predictions Ŷ, and a loss function δ_{u,i}(Y, Ŷ), ERM selects the Ŷ ∈ H that optimizes:




Ŷ_ERM = argmin_{Ŷ ∈ H} { R̂(Ŷ | P) + Reg(θ) }

4.2 Inverse Propensity Score (IPS) 

The mechanism for evaluating how well a predicted rating matrix Ŷ represents the real ratings Y is the major distinction between the IPS and naïve frameworks. The most common metrics in standard assessment are the mean absolute error (MAE) and mean squared error (MSE), which can be expressed as

MAE: δ_{u,i}(Y, Ŷ) = |Y_{u,i} − Ŷ_{u,i}|

MSE: δ_{u,i}(Y, Ŷ) = (Y_{u,i} − Ŷ_{u,i})²

Accuracy: δ_{u,i}(Y, Ŷ) = 1{Ŷ_{u,i} = Y_{u,i}}

The standard evaluation is

R(Ŷ) = (1 / (U · I)) Σ_{u=1}^{U} Σ_{i=1}^{I} δ_{u,i}(Y, Ŷ)

Because Y is only known in part, the conventional method is to estimate R by taking the average of only the observed entries, which can be written as

R̂_naive(Ŷ) = (1 / |{(u, i) : O_{u,i} = 1}|) Σ_{(u,i):O_{u,i}=1} δ_{u,i}(Y, Ŷ)

The IPS estimator, unlike the naïve estimator R̂_naive(Ŷ), is unbiased for any probability assignment mechanism. The IPS estimator can be expressed as

R̂_IPS(Ŷ | P) = (1 / (U · I)) Σ_{(u,i):O_{u,i}=1} δ_{u,i}(Y, Ŷ) / P_{u,i}

The marginal probability P_{u,i} is the only input required by the IPS estimator, and its unbiasedness is unaffected by correlations within O.


E_O[R̂_IPS(Ŷ | P)] = (1 / (U · I)) Σ_u Σ_i E_{O_{u,i}}[ (δ_{u,i}(Y, Ŷ) / P_{u,i}) O_{u,i} ]
                  = (1 / (U · I)) Σ_u Σ_i δ_{u,i}(Y, Ŷ) = R(Ŷ)
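A small NumPy simulation (illustrative, not the cited paper's code) shows this unbiasedness: averaging the IPS estimate over many observation draws recovers the full-matrix risk, while under an MNAR mechanism the naïve average converges to a biased value.

```python
import numpy as np

def ips_risk(loss, observed, P):
    """IPS estimate of the average loss: observed entries reweighted by 1/P."""
    U, I = loss.shape
    return float(np.sum(loss[observed] / P[observed]) / (U * I))

rng = np.random.default_rng(0)
loss = rng.random((20, 30))        # delta_{u,i}(Y, Yhat), fully known in simulation
P = 0.9 - 0.7 * loss               # MNAR: high-loss entries are observed less often
true_risk = float(loss.mean())

# Average of many IPS estimates over random observation draws -> true risk,
# while the naive average converges to the biased value sum(P*loss)/sum(P).
estimates = [ips_risk(loss, rng.random((20, 30)) < P, P) for _ in range(3000)]
```

The bias of the naïve average comes entirely from the dependence of P on the loss; IPS removes it by reweighting.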

4.3 Doubly Robust Model

Since models based on data imputation usually have large bias due to misspecification, while models based on IPS usually have high variance, Xiaojie, Rui, Yu, and Jianzhong suggested in 2019 combining the two models to enjoy the desired double-robustness property: the estimator remains unbiased as soon as either the propensities or the imputed errors are accurate [11]. They defined the objective function as follows:

ε_DR = ε_DR(r̂, r^o) = (1 / (nm)) Σ_{u,i} ( ê_{u,i} + o_{u,i} d_{u,i} / p̂_{u,i} )

They conducted extensive experiments on four real data sets using the doubly robust model. The results show that their method outperforms the state-of-the-art methods in rating prediction, and that the proposed estimator significantly reduces the bias of the prediction-error estimate.
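A sketch of the estimator (illustrative NumPy, reading ê as the imputed errors and d = e − ê as the imputation residual): when the imputation is exact, the estimate equals the true average error regardless of the propensities, which is one half of the double-robustness property.

```python
import numpy as np

def dr_error(e_true, e_hat, observed, p_hat):
    """Doubly robust estimate of the average prediction error.

    e_hat is the imputed error matrix; on observed entries an IPS-weighted
    correction d = e_true - e_hat is added, so the estimate stays unbiased
    when either the imputation or the propensities p_hat are accurate.
    """
    n, m = e_hat.shape
    corr = np.zeros_like(e_hat)
    corr[observed] = (e_true[observed] - e_hat[observed]) / p_hat[observed]
    return float(np.sum(e_hat + corr) / (n * m))
```

The other half of the property, unbiasedness under correct propensities with a wrong imputation, follows because the correction term is then an unbiased IPS estimate of the residual.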

5 Conclusion

In this article, we review several traditional statistical learning methods for recommender systems, including collaborative filtering (CF), matrix factorization (MF), the factorization machine (FM), the field-aware factorization machine (FFM), and the gradient boosting decision tree (GBDT). We then introduce the deep learning recommender systems AutoRec and NeuralCF. In the fourth section, we introduce the inverse propensity score (IPS) and doubly robust algorithms and how they debias recommender systems. We have tested several statistical learning and deep learning recommendation methods on the same dataset, MovieLens, with over 1 million ratings. In future work we will use IPS and the doubly robust estimator to debias these algorithms.

Acknowledgements Special thanks to our professor David Woodruff and teaching assistant Hudson Li for all the lectures and work for the project. Kenan Li, Haonan Yao and Haoqiang Kang contributed equally to this work and should be considered co-second authors.


References

1. Goldberg D et al (1992) Using collaborative filtering to weave an information tapestry. Commun ACM 35(12):61–70
2. Linden G, Smith B, York J (2003) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput 7(1):76–80
3. Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37
4. Juan Y et al (2016) Field-aware factorization machines for CTR prediction. In: Proceedings of the 10th ACM conference on recommender systems
5. He X et al (2014) Practical lessons from predicting clicks on ads at Facebook. In: Proceedings of the eighth international workshop on data mining for online advertising
6. Sedhain S, Menon AK et al (2015) AutoRec: autoencoders meet collaborative filtering. In: Proceedings of the 24th international conference on World Wide Web, pp 111–112
7. He X, Liao L et al (2017) Neural collaborative filtering. In: International conference on World Wide Web. International World Wide Web Conferences Steering Committee, pp 173–182
8. Steck H (2010) Training and testing of recommender systems on data missing not at random. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining
9. Jiawei C, Hande D, Xiang W, Fuli F, Meng W, Xiangnan H (2020) Bias and debias in recommender system: a survey and future directions. arXiv:2010.03240
10. Tobias S, Adith S, Ashudeep S, Navin C, Thorsten J (2016) Recommendations as treatments: debiasing learning and evaluation. In: International conference on machine learning. PMLR
11. Xiaojie W, Rui Z, Yu S, Jianzhong Q (2019) Doubly robust joint learning for recommendation on data missing not at random. In: International conference on machine learning. PMLR

Entity Relationship Extraction Based on Knowledge Graph Knowledge Dengyun Zhu and Hongzhi Yu

Abstract In this paper, the ontology design of the schema layer of the knowledge graph is carried out, and the design of an RDF database storage scheme based on the existing knowledge graph is completed. Meanwhile, named entity recognition and relationship extraction techniques, following the idea of distant supervision, are used to identify entity pairs in the novel corpus and extract knowledge triples from sentences, which are added to the knowledge graph to finally complete the construction of a knowledge graph of novel character relationships.

Keywords Knowledge graph · Entity relationship extraction · RDF database

1 Introduction

With the rapid development of science and technology, knowledge is updated at an ever-increasing pace, and how to effectively manage and utilize this massive knowledge has become one of the urgent problems to be solved. With the continuous growth of computing power, manually constructed knowledge bases can no longer meet the requirements of intelligence and immediacy [1], and the knowledge graph provides an effective means to address this. A knowledge graph is a graph-based data structure consisting of nodes and edges, where each node represents an "entity" and each edge a "relationship" between entities. A knowledge graph is essentially a semantic network: a structured semantic knowledge base used to describe the physical world in terms of concepts and their mutual relations. Knowledge reasoning over knowledge graphs is an important downstream task and is also the basis of entity linking and intelligent question answering, which is of great significance for knowledge graph construction and knowledge cleaning. The formation and development of the knowledge graph has gone through nearly 70 years.

D. Zhu · H. Yu (B) Department of Chinese Language and Literature, Northwest Minzu University, Lanzhou, Gansu, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_7


2 Related Research

In the twenty-first century, with the vigorous development of the Internet and the explosive growth of knowledge, search engines have been widely used. However, facing the ever-increasing amount of information on the Internet, the traditional document World Wide Web can no longer meet people's need to obtain information quickly [2]. People want the information they need faster, more accurately, and more intelligently, and the knowledge graph came into being to meet this need: it strives to organize knowledge in a more orderly and organic way and provide users with a more intelligent interface, so that they can obtain the knowledge they need more quickly and accurately. In recent years, many scholars and institutions have conducted in-depth research on the knowledge graph, hoping to show the connections between concepts in a clearer and more dynamic way, so as to realize the intelligent acquisition and management of knowledge [3]. In November 2012, Google pioneered the concept of the knowledge graph, adding knowledge-graph representations to search results [4]. According to statistics from January 2015, Google's knowledge graph has 500 million entities and about 3.5 billion entity-relationship facts, and it has been widely used to improve the search engine. Another representative knowledge graph system is Probase, built by Microsoft. According to Microsoft's official website, as of April 2016 Probase contained more than ten million concepts, including about 2.7 million core concepts, making it the knowledge base system with the largest number of concepts. Zhishi.me [5] of Shanghai Jiao Tong University is the earliest knowledge base constructed in China; the Zhishi.me knowledge base integrates tens of thousands of entries from the Chinese Wikipedia [6], Baidu Encyclopedia, and Interactive Encyclopedia to provide a Linked Open Data (LOD) service to its users. The Chinese Academy of Sciences Institutional Repository (CAS-IR) is a secondary development of the DSpace software. By September 2013, CAS-IR had collected and preserved more than 440,000 scientific research results, more than 70% of them available in full text, making it the largest institutional knowledge base network in China. In addition, well-known domestic search engine companies have also invested in knowledge graph construction and added knowledge-graph functions to their search engines, such as Baidu's "Bosom" and Sogou's "Knowledge Cube" [7].

3 Model of This Paper

Based on a novel dataset crawled from the web, we focus on information extraction. After collecting and reading extensive literature on the current state of this problem, the main task is to transform unstructured text data into a structured form. The research content is divided into three parts: the first is named entity recognition that fuses multi-feature embedding [8], the second is entity

Fig. 1 Research content diagram

relationship extraction in novel scenes, and the third is the study of character relationships in novels [9]. Finally, a Web-based information extraction system is implemented; the content is shown in Fig. 1.

4 Technical Route

The technical route is divided into four major parts: named entity recognition with multi-feature embedding, construction and training of the novel entity relationship extraction model, construction and training of the knowledge-graph-based novel character relationship model, and implementation of the Web information extraction system. The overall technical route is shown in Fig. 2.

Fig. 2 Framework of technology


4.1 Named Entity Recognition by Fusing Multi-feature Embedding

In 2018, researchers at Google proposed the BERT pre-training model, which has been widely praised in the field of natural language processing; Google then published the results achieved by BERT on 11 natural language processing tasks [10], affirming its academic value. The BERT pre-training model mainly consists of an input layer and a multi-layer transformer encoder. The input layer of BERT is formed by summing the token embeddings, segment embeddings, and position embeddings, with [CLS] and [SEP] flags added to the beginning and end of the sentence, respectively. The transformer layer contains multiple encoders and decoders with the same structure: the sentence input to the encoder passes through a self-attention layer and is then fed to a feed-forward neural network [11], while the decoder adds, in addition to these two layers, an attention layer between them that focuses on the relevant parts of the input sentence. The introduction of BERT changed the relationship between the word-embedding vectors obtained from the pre-trained language model and the downstream natural language processing task: the BERT pre-training model is able to consider the context of a word and obtain more accurate word vectors than the traditional Word2vec and GloVe pre-training models. The whole process is shown in Fig. 3.

Lexical features are an important topic in natural language processing, and words of different parts of speech play different roles in sentences. In the fiction corpus, nouns and verbs often carry important roles in sentences and are helpful for named entity recognition. In addition, the accuracy of automatic lexical labeling is high, so the chance of feeding noisy data to the model is low. The word-vector representation of the sentence is fused with the lexical-feature vector representation, and the fused vector is input to the model for training. Using lexical features, the model can better perceive the context and improve entity-recognition performance.
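The fusion step is a simple per-token concatenation; a sketch with made-up dimensions (the word vectors and lexical-feature table here are stand-ins for the BERT and Word2vec outputs):

```python
import numpy as np

def fuse_features(word_vecs, pos_ids, pos_table):
    """Concatenate each token's word vector with its lexical (POS) feature
    vector before feeding the sequence to the downstream model."""
    pos_vecs = pos_table[pos_ids]                         # (seq_len, pos_dim)
    return np.concatenate([word_vecs, pos_vecs], axis=1)  # (seq_len, word_dim + pos_dim)
```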

Fig. 3 BERT pre-training model


The word-based joint embedding employs a bidirectional lattice LSTM to encode Chinese text, incorporating word information on top of the characters. The lattice LSTM first uses a large, automatically acquired external dictionary to match sentences [12], obtaining all words that can be formed from adjacent characters and forming a lattice structure. The word vectors from the lattice structure are then fed into the LSTM encoding layer together with the character vectors, thus combining word information with character information. This lattice structure avoids the problems caused by word-segmentation errors in the usual word-vector models and is the core part that distinguishes the lattice LSTM from other LSTM models. The advantage of this encoding approach is that it compensates for the polysemy and missing information that arise when character vectors alone are used as input, while avoiding the error propagation caused by word-segmentation errors in word vectors. Word embedding, word-and-lexical-feature embedding, and joint character-word embedding are used in the input representation layer to compare and analyze the rationality and necessity of multi-feature embedding in the proposed algorithm. The input text sequence is pre-trained by BERT to achieve word embedding, and the sequence text is converted into the corresponding vectors by table look-up; joint word embedding is achieved by lexicon matching; lexical features are obtained by Word2vec, the word vectors and lexical feature vectors are stitched together, and feature extraction on the stitched vectors is performed by BiLSTM; the globally optimal label sequence is obtained by CRF decoding, as shown in Fig. 4.
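The final CRF decoding step, recovering the globally optimal label sequence from per-token emission scores and label-transition scores, is Viterbi search; a minimal sketch with hypothetical score matrices:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions   : (T, K) per-token scores for each of K labels
    transitions : (K, K) score for moving from label i to label j
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = best score ending at t-1 in i, then moving to j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):          # backtrace the best predecessors
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Dynamic programming makes the search over the exponentially many label sequences linear in sentence length.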

4.2 Entity Relationship Extraction Research in Novel Scenes

Facing novel text data with complex structure and diverse language styles, conventional approaches struggle to be effective. A piece of fiction data often contains multiple characters and multiple attributes, with one-to-many, many-to-one, many-to-many, and no-correspondence cases between them. To address the poor extraction caused by the interconnected relationships between entities in the novel corpus, this paper proposes a novel entity relationship extraction model incorporating a multi-channel self-attention mechanism, which is divided into the following five parts. (1) Embedding layer: According to the recognition results of named entity recognition, extracted entities are marked as 1 and non-entities as 0, and the input feature vector of the utterance is obtained using BERT pre-training. Using the named-entity-recognition results to guide entity relationship extraction avoids the incomplete entity extraction that a direct relationship-extraction study may cause. (2) BiLSTM layer: BiLSTM learns the contextual information and shallow semantic features of the text utterances from the input feature vector to obtain the sentence vector.


Fig. 4 Joint word-based embedding model

(3) Multi-channel self-attention layer: a multi-channel self-attention mechanism learns deep global semantic features of the text utterances from the sentence vectors to obtain the global feature vector of the sentence. (4) CNN layer: a CNN learns local phrase features of the text utterances from the sentence vectors to obtain the local feature vector of the sentence. (5) Output layer: the global and local feature vectors of the sentence are spliced and input to a fully connected network, and the final result is output by the softmax function. The model is based on the "recurrent + CNN" network framework combined with the multi-channel self-attention mechanism: the text is converted into an input feature vector consisting of word vectors and position vectors, and BiLSTM captures the contextual information and shallow semantic features of the text sentences,


Fig. 5 Novel entity relationship extraction model incorporating multi-channel self-attentiveness mechanism

while CNN captures the local phrase features of the sentences; the multi-channel self-attention mechanism then deeply mines the global semantic features of the utterance, as shown in Fig. 5.
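One self-attention channel over the sentence's vector matrix can be sketched in scaled dot-product form; the multi-channel variant runs several such channels with separate projections and concatenates the outputs (illustrative, not the paper's exact parameterization):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_channel(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: each token re-weights all tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (seq, seq) attention weights
    return A @ V

def multi_channel(X, channels):
    """Concatenate the outputs of several independent attention channels."""
    return np.concatenate([attention_channel(X, *c) for c in channels], axis=1)
```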

4.3 Research on Novel Character Relationships Based on Knowledge Graph

Common knowledge graphs contain knowledge of generic domains and generally focus on overall comprehensiveness, but less on the relationships between characters. In this paper, we focus on the attributes of each character in a novel and the associations between characters, which can clearly show the introduction


of characters in the novel. Character attributes mainly include character’s personality, appearance, age, and style (e.g., martial arts schools of characters in martial arts novels, etc.). Through deep intelligent analysis and mining of novels, we are able to extract character attributes in novels, classify novel styles, and establish corresponding knowledge graphs.

5 Conclusion

Research on the knowledge graph mainly comes from three directions: first, data, information, knowledge, and visualization in the field of computer science; second, the visual analysis of citation data in the field of library and information science; and third, the study of complex network systems and social network analysis. In the process of fusing these directions, some problems remain in knowledge-graph drawing methods, which future work will develop along these lines.

Acknowledgements This research was supported by Lanzhou City Cheng Guan District Science and Technology Bureau Talent Innovation and Entrepreneurship Project (2021RCCX0016).

References

1. Garfield E (1955) Citation indexes for science. Science 122:108–111
2. Price DJ (1983) Citation classic for "Little Science, Big Science". Current Contents: Social & Behavioral Sciences 29:18
3. Eugene G (2004) Theory and application of citation indexing method. Translated by Hou Hanqing, Liu Yu. Beijing Library Publishing House, pp 243–246
4. Garfield E (1970) Citations in popular and interpretive science writing. Science 227:669–671
5. White HD, Griffith BC (1982) Authors as markers of intellectual space: co-citation in studies of science, technology and society. J Document 38(4):255–272
6. Hummon NP, Doreian P (1989) Connectivity in a citation network: the development of DNA theory. Social Netw 11(1):39–63
7. Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20(3):197–243
8. Cheong H, Li W, Cheung A, Nogueira A, Iorio F (2017) Automated extraction of function knowledge from text. ASME J Mech Des 139(11):111407
9. Nickel M, Murphy K, Tresp V et al (2016) A review of relational machine learning for knowledge graphs. Proc IEEE 104(1):11–33
10. Li F, Ke J (2018) Research progress of entity relation extraction based on deep learning framework. Info Sci 36(3):169–176
11. Wang L, Xie Y, Zhou JS et al (2018) Segment-level Chinese named entity recognition based on neural network. J Chinese Inform Sci 32(3):84–90, 100
12. Tran SN, Garcez ASD (2018) Deep logic networks: inserting and extracting knowledge from deep belief networks. IEEE Trans Neural Netw Learn Syst 29(2):246–258

Algorithm Research and Program Implementation of Q Matrix Eigenvalue Weida Qin, Xinqi Wang, and Ricai Luo

Abstract The problem of solving matrix eigenvalues is a hot research topic in systems engineering and electronics, because matrix eigenvalues have important applications in many fields such as unmanned aerial vehicle (UAV) cluster pinning control and satellite remote-sensing image processing. There are many algorithms for solving matrix eigenvalues, and dichotomy is an important one. Building on existing research on matrix eigenvalue algorithms, the Q matrix is studied and found to have nine pairs of adjacent sequential principal square submatrices whose eigenvalues are not strictly interlaced; that is, the matrix does not satisfy the key property required for solving eigenvalues by dichotomy. The feasible conditions and feasibility of the dichotomy algorithm for solving the eigenvalues of the Q matrix are then studied by analyzing the sign-change numbers of the characteristic-polynomial sequence of the sequential principal square submatrices of the Q matrix. Finally, the accuracy of the algorithm is verified by numerical experiments using the Java programming language.

Keywords Q matrix · Dichotomy · Eigenvalue · Java programming language · Program implementation

1 Introduction

The problem of solving matrix eigenvalues is a hot research topic in systems engineering and electronics [1, 2], because matrix eigenvalues have important applications in many fields such as unmanned aerial vehicle (UAV) cluster pinning control and satellite remote-sensing image processing [3, 4]. There are many algorithms for solving matrix eigenvalues, and dichotomy is an important one [5, 6]. Let the eigenvalues of the k-order sequential principal square submatrix R_k of an n-order matrix R be λ_1, λ_2, …, λ_k, and the eigenvalues of the (k+1)-order sequential principal square submatrix R_{k+1} be μ_1, μ_2, …, μ_{k+1}. Reference [7] pointed

W. Qin · X. Wang (B) · R. Luo Hechi University, Hechi, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_8


out that the key property for using dichotomy to solve the eigenvalues of a matrix is that the eigenvalues of any two adjacent sequential principal square submatrices are strictly interlaced, i.e., for k = 1, 2, …, n−1, inequality (1) holds:

μ_1 < λ_1 < μ_2 < λ_2 < ⋯ < μ_k < λ_k < μ_{k+1}.    (1)

Supposing m_i, d_i, g_i are real numbers, let R be the n×n tridiagonal matrix with diagonal entries d_1, …, d_n, superdiagonal entries g_1, …, g_{n−1}, and subdiagonal entries m_1, …, m_{n−1}. If m_i g_i > 0 (i = 1, …, n−10) and m_{n−9} g_{n−9} = m_{n−8} g_{n−8} = ⋯ = m_{n−1} g_{n−1} = 0, the n-order matrix R is called a Q matrix.

Supposing the j-order sequential principal square submatrix of the Q matrix R is R_j, then R_1 = (d_1) and R_n = R. If the characteristic polynomial of R_j is p_j(x), then expanding the determinant det(xI − R_j) along its last row gives

p_j(x) = det(xI − R_j) = (x − d_j) p_{j−1}(x) − m_{j−1} g_{j−1} p_{j−2}(x).    (2)

Let p_0(x) = 1. Then, according to Eq. (2), for j = 2, 3, …, n, there will be

p_j(x) = (x − d_j) p_{j−1}(x) − m_{j−1} g_{j−1} p_{j−2}(x).    (3)

According to the definition of the Q matrix, since m_{n−9} g_{n−9} = m_{n−8} g_{n−8} = ⋯ = m_{n−1} g_{n−1} = 0, Eq. (3) gives

p_{n−8}(x) = (x − d_{n−8}) p_{n−9}(x)    (4)
p_{n−7}(x) = (x − d_{n−7}) p_{n−8}(x)    (5)
p_{n−6}(x) = (x − d_{n−6}) p_{n−7}(x)    (6)
p_{n−5}(x) = (x − d_{n−5}) p_{n−6}(x)    (7)
p_{n−4}(x) = (x − d_{n−4}) p_{n−5}(x)    (8)
p_{n−3}(x) = (x − d_{n−3}) p_{n−4}(x)    (9)
p_{n−2}(x) = (x − d_{n−2}) p_{n−3}(x)    (10)
p_{n−1}(x) = (x − d_{n−1}) p_{n−2}(x)    (11)
p_n(x) = (x − d_n) p_{n−1}(x)    (12)

Equations (4)–(12) show that a Q matrix has nine pairs of characteristic polynomials p_{l−1}(x) and p_l(x) (l = n−8, n−7, …, n) of two adjacent sequential principal square submatrices that share a common root; that is, the roots of p_{l−1}(x) and p_l(x) (l = n−8, n−7, …, n) are not strictly interlaced. Thus, the Q matrix does not satisfy the key property for solving matrix eigenvalues by dichotomy. Reference [8] points out that dichotomy is the most suitable algorithm for calculating partial eigenvalues in specified intervals. In view of the important position of dichotomy in algorithmic problem solving, the necessary conditions and feasibility of dichotomy for solving the eigenvalues of a Q matrix were studied in this paper. Finally, the algorithm was verified by a numerical experiment using the Java programming language.


2 Definition and Lemma

Definition 1 [9]: Let c_0, c_1, ..., c_{m−1}, c_m be a sequence of real numbers with c_0 ≠ 0 and c_m ≠ 0. Counting from left to right, if c_{i−1} and c_i are opposite in sign, this counts as one sign change; if c_i = 0 and c_{i−1} and c_{i+1} are opposite in sign, this also counts as one sign change. The sign change number of the sequence is the total number of such sign changes.

For example, counting the sequence −12, −35, 3, 6, −1, 2, −9, −8, 12 from left to right gives five sign changes, namely between −35 and 3, 6 and −1, −1 and 2, 2 and −9, and −8 and 12. Counting the sequence −84, −11, 0, 22, −5 gives two sign changes, namely between −11 and 22 and between 22 and −5.

Definition 2 [9]: Let f_0(x), f_1(x), ..., f_m(x) be polynomials with real coefficients and let x_0 be a real number with f_0(x_0) ≠ 0 and f_m(x_0) ≠ 0. The sign change number of the real sequence f_0(x_0), f_1(x_0), ..., f_m(x_0) is called the sign change number of the polynomial sequence at x_0.

Definition 3 [10]: If the real number x_0 is a root of f_0(x), and there is a positive number ε such that f_0(x) f_1(x) > 0 (or f_0(x) f_1(x) < 0) for all x ∈ (x_0 − ε, x_0), then x_0 is called a generalized zero of the third kind of f_0(x) with respect to f_1(x).

Definition 4 [9, p. 123]: Let m_i, d_i, g_i be real numbers and let H be the n-order tridiagonal matrix with diagonal entries d_1, ..., d_n, sub-diagonal entries m_1, ..., m_{n−1}, and super-diagonal entries g_1, ..., g_{n−1} (all other entries zero). If m_i g_i > 0 for every i = 1, ..., n − 1, then H is a Jacobi matrix. Let the j-order sequential principal square submatrix of H be H_j, with characteristic polynomial p_j(x). Setting p_0(x) = 1 and applying Eq. (2) for j = 2, 3, ..., n gives
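The counting rule of Definition 1 can be sketched in a few lines of Python (the same language the paper uses for its numerical verification); `sign_changes` is an illustrative helper, not code from the paper, and it reproduces the two worked counts above.

```python
def sign_changes(seq):
    """Count sign changes per Definition 1: zeros are skipped, which matches
    the rule that c_i = 0 with c_{i-1}, c_{i+1} of opposite sign counts once."""
    nonzero = [c for c in seq if c != 0]
    return sum(1 for a, b in zip(nonzero, nonzero[1:]) if a * b < 0)

print(sign_changes([-12, -35, 3, 6, -1, 2, -9, -8, 12]))  # 5
print(sign_changes([-84, -11, 0, 22, -5]))                # 2
```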

Algorithm Research and Program Implementation of Q Matrix Eigenvalue

p_j(x) = (x − d_j) p_{j−1}(x) − m_{j−1} g_{j−1} p_{j−2}(x),  m_{j−1} g_{j−1} > 0.  (13)

The sequence p_0(x), p_1(x), ..., p_{n−1}(x), p_n(x) satisfying Eq. (13) is called the characteristic polynomial sequence of the sequential principal square submatrices of the n-order Jacobi matrix.

Lemma 1 [9]: The dichotomy for solving the eigenvalues of a Jacobi matrix is as follows. Let p_0(x) = 1, and let V(x) be the sign change number of the characteristic polynomial sequence p_0(x), p_1(x), ..., p_n(x) of the sequential principal square submatrices of the n-order Jacobi matrix. For given a, b with p_n(a) ≠ 0 and p_n(b) ≠ 0, V(a) − V(b) is the number of roots of p_n(x) in the interval [a, b].
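Lemma 1 can be sketched as follows. `sturm_count` evaluates V(x) directly from the recurrence (13); the 3×3 Jacobi matrix below, with made-up entries, has eigenvalues 2 − √2, 2, and 2 + √2, all inside [0, 4]. This is an illustrative sketch, not the paper's program.

```python
def sturm_count(d, m, g, x):
    """V(x): sign change number of p_0(x), ..., p_n(x) built via Eq. (13)
    for the Jacobi matrix with diagonal d, sub-diagonal m, super-diagonal g."""
    p_prev, p = 1.0, x - d[0]
    changes = 1 if p_prev * p < 0 else 0
    for j in range(1, len(d)):
        p_next = (x - d[j]) * p - m[j - 1] * g[j - 1] * p_prev
        if p != 0 and p * p_next < 0:
            changes += 1
        elif p == 0 and p_prev * p_next < 0:  # zero skipped per Definition 1
            changes += 1
        p_prev, p = p, p_next
    return changes

d, m, g = [2.0, 2.0, 2.0], [1.0, 1.0], [1.0, 1.0]
# V(a) - V(b) counts the eigenvalues in [a, b] (Lemma 1):
print(sturm_count(d, m, g, 0.0) - sturm_count(d, m, g, 4.0))  # 3
```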

3 Eigenvalue Dichotomy Method for Solving Q Matrix

Let the j-order sequential principal square submatrix of the Q matrix R be R_j, with characteristic polynomial p_j(x). Setting p_0(x) = 1 and applying Eq. (2) for j = 2, 3, ..., n gives

p_j(x) = (x − d_j) p_{j−1}(x) − m_{j−1} g_{j−1} p_{j−2}(x).  (14)

According to the definition of the Q matrix, m_i g_i > 0 for i = 1, 2, ..., n − 10, so

p_0(x) = 1, p_1(x) = x − d_1, p_2(x) = (x − d_2) p_1(x) − m_1 g_1 p_0(x), p_3(x) = (x − d_3) p_2(x) − m_2 g_2 p_1(x), ..., p_{n−9}(x) = (x − d_{n−9}) p_{n−10}(x) − m_{n−10} g_{n−10} p_{n−11}(x).

By Eq. (13), p_0(x), p_1(x), ..., p_{n−9}(x) is the characteristic polynomial sequence of the sequential principal square submatrices of an (n − 9)-order Jacobi matrix. According to Eqs. (4)–(12),

p_n(x) = (x − d_{n−8})(x − d_{n−7}) ⋯ (x − d_{n−1})(x − d_n) p_{n−9}(x).  (15)
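Equation (15) can be checked numerically on a toy Q-like matrix of order n = 12: the couplings m_j g_j are positive for the first n − 10 rows and vanish afterwards, so the recurrence (14) collapses to the product form. All numeric values below are made up for illustration only.

```python
import math

def char_seq(d, mg, x):
    """Return p_0(x), ..., p_n(x) via the three-term recurrence of Eq. (14);
    mg[j] holds the product m_{j+1} * g_{j+1} coupling consecutive submatrices."""
    p = [1.0, x - d[0]]
    for j in range(1, len(d)):
        p.append((x - d[j]) * p[-1] - mg[j - 1] * p[-2])
    return p

d = [1.0, 2.0, 3.0] + [float(k) for k in range(4, 13)]  # n = 12 diagonal entries
mg = [0.5, 0.25] + [0.0] * 9                            # zero for the last nine couplings
x = 0.3
p = char_seq(d, mg, x)
prod = 1.0
for dl in d[3:]:                                        # d_{n-8}, ..., d_n
    prod *= (x - dl)
print(math.isclose(p[12], prod * p[3], rel_tol=1e-9))   # True: Eq. (15) holds
```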


Let S(x) be the sign change number of the sequence p_0(x), p_1(x), ..., p_n(x), V(x) the sign change number of the sequence p_0(x), p_1(x), ..., p_{n−9}(x), and Y(x) the sign change number of the sequence p_{n−9}(x), ..., p_{n−1}(x), p_n(x); then S(x) = V(x) + Y(x).

Theorem 1: Suppose d_{n−8}, ..., d_{n−1}, d_n ∉ [a, b]. Let p_0(x) = 1, and let S(x) be the sign change number of the characteristic polynomial sequence p_0(x), p_1(x), ..., p_n(x) of the sequential principal square submatrices of the n-order Q matrix. For given a, b with p_n(a) ≠ 0 and p_n(b) ≠ 0, S(a) − S(b) is the number of roots of p_n(x) in the interval [a, b].

Proof:

S(a) − S(b) = V(a) + Y(a) − (V(b) + Y(b)) = (V(a) − V(b)) + (Y(a) − Y(b)).  (16)

By Eq. (15) and p_n(a) ≠ 0, p_n(b) ≠ 0, we have p_{n−9}(a) ≠ 0 and p_{n−9}(b) ≠ 0. Let k be the number of roots of p_{n−9}(x) in the interval [a, b]. Since p_0(x), p_1(x), ..., p_{n−9}(x) is the characteristic polynomial sequence of the sequential principal square submatrices of an (n − 9)-order Jacobi matrix, V(a) − V(b) = k by Lemma 1. Since d_{n−8}, ..., d_{n−1}, d_n ∉ [a, b], the numbers of roots of p_n(x) and p_{n−9}(x) in [a, b] are equal by Eq. (15). By Eq. (16) and V(a) − V(b) = k, the theorem holds provided that Y(a) − Y(b) = 0. Arrange all the roots of p_{n−9}(x), ..., p_{n−1}(x), p_n(x) that lie in [a, b] in increasing order as h_1, h_2, ..., h_{m−1}, h_m, and let h_0 = a, h_{m+1} = b; then there are m + 1 disjoint short intervals in [a, b].

First, convergent validity is assessed from the outer loadings: an indicator is valid if its loading factor is greater than 0.7 with the construct to be measured. Second, discriminant validity is assessed, for a measurement model with reflective indicators, from the cross-loadings of the indicators with the constructs. If an indicator's loading on its own construct is greater than its loadings on the other constructs, then its block is better than the other blocks. Discriminant validity can also be assessed by comparing the square roots of the average variance extracted (AVE) values with the correlations between constructs. Third, composite reliability is an indicator that measures a construct as seen from its latent variable coefficients. It consists of two measures, namely internal

126

M. S. Dewi et al.

consistency and Cronbach's alpha. In this measurement, if the value obtained is greater than 0.7, the construct has high reliability.
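The AVE and composite-reliability formulas the outer model relies on can be sketched in Python. Applied to the system-quality loadings reported later in Table 3, they reproduce the 0.767 (Table 5) and 0.929 (Table 7) values; this is an illustrative sketch, not the authors' SmartPLS computation.

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    err = sum(1 - l * l for l in loadings)
    return s * s / (s * s + err)

sq = [0.885, 0.872, 0.935, 0.807]           # system-quality loadings (Table 3)
print(round(ave(sq), 3))                    # 0.767, matching Table 5
print(round(composite_reliability(sq), 3))  # 0.929, matching Table 7
```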

3.2 Inner Model

The analysis of the inner model, commonly referred to as the inner relation, structural model, or substantive theory, explains the relationships between latent variables based on substantive theory. It is evaluated using the R-square for the dependent variables, the Stone–Geisser Q-square test for predictive relevance, and the t-test with the significance of the structural path coefficients. Evaluating the inner model with PLS begins by examining the R-square of each dependent latent variable; its interpretation is the same as in regression. Changes in the R-square value can then be used to assess the influence of particular independent latent variables on the dependent latent variable. In addition to the R-square, the model can be evaluated with the predictive Q-square, which measures how well the model and its parameter estimates reproduce the observed values. A Q-square value > 0 means the model has predictive relevance, whereas a Q-square value < 0 means its predictive relevance is poor. The next step in the inner model is the effect size (F-square), used to assess how good a research model is, that is, the relative impact of the exogenous variables on the endogenous variables. According to Ghozali [11], an F-square value of 0.02 indicates a small effect, 0.15 a medium effect, and 0.35 a large effect. The final stage is hypothesis testing using the path coefficient estimates to determine the relationships between variables. This stage is carried out through a bootstrapping procedure in which the estimated path coefficients are tested with the t-statistic; the t-statistics are used to draw conclusions on the hypothesis tests in this study, determining whether a hypothesis is rejected or accepted. If the t-statistic value is greater than the t-table value (1.96), the hypothesis is accepted, and vice versa [11] (Table 1).
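The two decision rules above (Ghozali's F-square bands and the t > 1.96 cut-off) can be sketched as follows; the function names and the "negligible" label are chosen for illustration, not taken from the paper.

```python
def f_square_label(f2):
    """Effect-size bands per Ghozali [11]: 0.02 small, 0.15 medium, 0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"

def hypothesis_accepted(t_stat, t_table=1.96):
    """Decision rule at the 5% significance level used in the paper."""
    return t_stat > t_table

print(f_square_label(0.134))      # small
print(hypothesis_accepted(2.393)) # True
```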

3.3 Variable Operations

See Table 1.

Analysis of the Influence of System Quality, Information Quality, …

127

Table 1 Variable operations

Variables | Indicators | Sources | Measurement
System quality (exogenous variable) | 1. Ease of using the IPBB application 2. All data in the IPBB application comes from a trusted source 3. The IPBB application has a fast response to a request for information 4. IPBB application security can be trusted | [8] | Likert scale: STS = 1, TS = 2, N = 3, S = 4, SS = 5
Information quality (exogenous variable) | 1. The information presented is easy to understand 2. The information presented is helpful for users 3. The information presented is complete 4. The information presented is the latest | [8] | Likert scale: STS = 1, TS = 2, N = 3, S = 4, SS = 5
Service quality (exogenous variable) | 1. The IPBB application responds quickly when the system has problems 2. Has a complete FAQ 3. Understands user needs 4. Displays attractive visuals | [8, 9] | Likert scale: STS = 1, TS = 2, N = 3, S = 4, SS = 5
Taxpayer compliance (endogenous variable) | 1. Pays taxes due on time 2. Pays taxes owed according to applicable regulations 3. Knows the deadline for PBB payments 4. Has no administrative sanctions | Minister of Finance Regulation Number 74/PMK.03/2012 | Likert scale: STS = 1, TS = 2, N = 3, S = 4, SS = 5

4 Results and Discussions

In collecting data, the researchers distributed 130 paper questionnaires to the Tangerang City Bapenda service department. However, 20 questionnaires did not meet the criteria or were not feasible, and six questionnaires were not returned. Thus, only 104 questionnaires could be used in this study.


Table 2 Descriptive analysis

Variable | N | Minimum | Maximum | Mean | Standard deviation
System quality (SQ) | 104 | 9 | 20 | 15.96 | 2.401
Information quality (IQ) | 104 | 9 | 20 | 16.04 | 2.048
Service quality (SVQ) | 104 | 10 | 20 | 15.57 | 2.135
Taxpayer compliance (KWP) | 104 | 7 | 20 | 16.00 | 2.181
Valid N (listwise) | 104 | | | |

Primary data processed by the author

4.1 Results

4.1.1 Descriptive Analysis Statistics

Descriptive statistics are statistics used to analyze data by describing the data as collected, without drawing conclusions that generalize to the population [12]. Descriptive statistics include the presentation of data in tables, graphs, and diagrams, and the calculation of the mode, mean, median, deciles, and the distribution of the data through the average and standard deviation (Table 2).

4.1.2 Outer Model

The outer model test is used to test the validity and reliability of the data in this study (Table 3).

a. Convergent Validity

All indicators of every variable have outer loading values above 0.70, so all are valid (Table 4).

b. Discriminant Validity

It can be concluded that all latent variables are valid and can be used for this research, with loading values above 0.70 that meet discriminant validity. Discriminant validity is further examined through the average variance extracted (AVE) (Table 5). All variables in this study meet the AVE requirement and are declared valid, because each AVE value is greater than 0.5. After obtaining the square root of the AVE of each construct, the next step is to compare it with the correlations between constructs in the model. If the square root of the AVE is greater than every correlation between that construct and the others, the model has good discriminant validity.


Table 3 Convergent validity

Variable | Code | Item | Outer loading | Information
System quality | SQ1 | IPBB application is easy to use | 0.885 | Valid
System quality | SQ2 | IPBB applications can respond quickly to requests for information | 0.872 | Valid
System quality | SQ3 | I feel safe accessing the IPBB application | 0.935 | Valid
System quality | SQ4 | I am sure that all data in the IPBB application comes from a trusted source | 0.807 | Valid
Information quality | IQ1 | I can understand the information presented | 0.875 | Valid
Information quality | IQ2 | The information presented is helpful to me | 0.894 | Valid
Information quality | IQ3 | The information presented is complete and detailed | 0.895 | Valid
Information quality | IQ4 | The information presented is the latest | 0.831 | Valid
Service quality | SVQ1 | The IPBB application provides a fast response when the system encounters problems | 0.842 | Valid
Service quality | SVQ2 | The IPBB application has a complete FAQ | 0.801 | Valid
Service quality | SVQ3 | The IPBB application understands my tax needs, especially PBB | 0.796 | Valid
Service quality | SVQ4 | The visuals displayed in the IPBB application are attractive | 0.826 | Valid
Taxpayer compliance | KWP1 | I submitted the SPPT-PBB (Tax Notice Payable—PBB) on time | 0.753 | Valid
Taxpayer compliance | KWP2 | I have no tax arrears | 0.834 | Valid
Taxpayer compliance | KWP3 | I pay taxes owed according to applicable regulations | 0.888 | Valid
Taxpayer compliance | KWP4 | I have never committed a tax crime for the last 10 (ten) years | 0.818 | Valid

Primary data processed by the author

130

M. S. Dewi et al.

Table 4 Cross loading

System quality (SQ): SQ1 0.885 | SQ2 0.872 | SQ3 0.935 | SQ4 0.837
Information quality (IQ): IQ1 0.875 | IQ2 0.894 | IQ3 0.895 | IQ4 0.831
Service quality (SVQ): SVQ1 0.842 | SVQ2 0.801 | SVQ3 0.796 | SVQ4 0.826
Taxpayer compliance (KWP): KWP1 0.753 | KWP2 0.834 | KWP3 0.888 | KWP4 0.818

Primary data processed by the author

Table 5 Average variance extracted (AVE)

Variable | AVE | Root AVE | Information
System quality (SQ) | 0.767 | 0.875 | Valid
Information quality (IQ) | 0.764 | 0.871 | Valid
Service quality (SVQ) | 0.667 | 0.816 | Valid
Taxpayer compliance (KWP) | 0.680 | 0.824 | Valid

Primary data processed by the author

Table 6 shows that the Fornell–Larcker criterion value of each variable (the square root of its AVE) is higher than its correlations with the other variables. Based on this analysis, all indicators used in this study meet discriminant validity and convergent validity and can be used for further research (Table 7).

c. Composite Reliability

It can be concluded that all indicators used in this study are consistent and have good reliability. The structural model can therefore be tested next.


Table 6 Fornell–Larcker criterion

Variable | Taxpayer compliance | Information quality | Service quality | System quality
Taxpayer compliance | 0.824 | | |
Information quality | 0.604 | 0.874 | |
Service quality | 0.565 | 0.732 | 0.817 |
System quality | 0.471 | 0.749 | 0.745 | 0.876

Primary data processed by the author
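The Fornell–Larcker check described above can be sketched with the values from Tables 5 and 6; `fornell_larcker_ok` is a hypothetical helper written for illustration, not the authors' code.

```python
import math

# AVE values (Table 5) and latent correlations (Table 6, lower triangle);
# KWP = taxpayer compliance, IQ = information quality, SVQ = service quality,
# SQ = system quality.
ave = {"KWP": 0.680, "IQ": 0.764, "SVQ": 0.667, "SQ": 0.767}
corr = {("IQ", "KWP"): 0.604, ("SVQ", "KWP"): 0.565, ("SVQ", "IQ"): 0.732,
        ("SQ", "KWP"): 0.471, ("SQ", "IQ"): 0.749, ("SQ", "SVQ"): 0.745}

def fornell_larcker_ok(ave, corr):
    """Each construct's sqrt(AVE) must exceed its correlations with all others."""
    for (a, b), r in corr.items():
        if math.sqrt(ave[a]) <= r or math.sqrt(ave[b]) <= r:
            return False
    return True

print(fornell_larcker_ok(ave, corr))  # True: discriminant validity holds
```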

Table 7 Composite reliability

Variable | Composite reliability | Cronbach's alpha
System quality | 0.929 | 0.898
Information quality | 0.928 | 0.898
Service quality | 0.889 | 0.833
Taxpayer compliance | 0.894 | 0.843

Primary data processed by the author

4.1.3 Inner Model

The result of the R-square test represents the amount of variance of the construct explained by the model (Table 8). In the next step, the effect size (F-square) is used to assess the relative impact of each variable (Table 9). After testing the R-square value, the model is also tested for goodness of fit with Q-square; the higher the value, the better the model fits the data.

Table 8 R-square value

Variable | R-square
Taxpayer compliance | 0.404

Primary data processed by the author

Table 9 F-square value

Variable | F-square | Criteria
System quality–taxpayer compliance | 0.006 | No effect
Information quality–taxpayer compliance | 0.134 | Small
Service quality–taxpayer compliance | 0.057 | Small

Primary data processed by the author

Table 10 Value of Q-square

Variable | SSO | SSE | Q² (= 1 − SSE/SSO)
System quality | 416.000 | 416.000 |
Information quality | 416.000 | 416.000 |
Service quality | 416.000 | 416.000 |
Taxpayer compliance | 416.000 | 315.549 | 0.241

Primary data processed by the author
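The Q-square in Table 10 follows directly from the blindfolding sums, reading 315.549 as the SSE of the endogenous construct (the layout of the extracted table makes this assignment an inference, but it is the only one consistent with the reported 0.241):

```python
# Q-square from blindfolding outputs: Q^2 = 1 - SSE/SSO (Table 10).
sso, sse = 416.000, 315.549  # taxpayer compliance, the endogenous construct
q2 = 1 - sse / sso
print(round(q2, 3))          # 0.241 > 0, so the model has predictive relevance
```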

Q-square is obtained via the blindfolding procedure in the SmartPLS 3.0 software; this blindfolding process produces the cross-validated redundancy (Table 10). The predictive relevance value is 0.241, which is greater than 0; a Q-square value > 0 means that the model has predictive relevance [11]. Based on these results, the structural model in this study has good goodness of fit. Figure 1 shows the path coefficient values used to determine the relationship of each variable (Table 11). To test the hypotheses, the researchers use a significance level of 5% with a confidence level of 95%, so the t-statistic must be > 1.96. If the t-statistic is greater than the t-table value, it means that the hypothesis can be accepted. Vice

Fig. 1 Path diagram of the research analysis of the effect of system quality, information quality, and service quality of the Tangerang City IPBB application on taxpayer compliance


Table 11 Hypothesis testing

Relationship | Original sample (O) | Sample mean (M) | Standard deviation (STDEV) | T-statistics | p-value | Information
System quality–taxpayer compliance | −0.100 | −0.089 | 0.165 | 0.605 | 0.545 | Rejected
Information quality–taxpayer compliance | 0.463 | 0.443 | 0.194 | 2.393 | 0.017 | Accepted
Service quality–taxpayer compliance | 0.300 | 0.321 | 0.160 | 1.880 | 0.061 | Rejected

Primary data processed by the author

versa, if the t-statistic value is smaller than the t-table value, the hypothesis is rejected. Table 11 shows that two hypotheses in this study are rejected because their t-statistic values are less than 1.96: the hypothesis of system quality on taxpayer compliance, with a t-statistic of 0.605, and the hypothesis of service quality on taxpayer compliance, with a t-statistic of 1.880. Meanwhile, the remaining hypothesis has a t-statistic greater than 1.96 and is accepted. In addition, the p-values show how significant each effect is: if the p-value of a hypothesis is less than 0.05, the effect is significant. In this study, only one hypothesis has a significant effect, namely the effect of information quality on taxpayer compliance.
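Applying the stated decision rules to the rows of Table 11 reproduces the accepted/rejected verdicts; this is a sketch, and the arrow labels are shorthand chosen here for the three relationships.

```python
# (relationship, t-statistic, p-value) from Table 11.
rows = [("SQ -> KWP",  0.605, 0.545),
        ("IQ -> KWP",  2.393, 0.017),
        ("SVQ -> KWP", 1.880, 0.061)]

# Accept when t > 1.96 and the effect is significant (p < 0.05).
verdicts = {name: ("Accepted" if t > 1.96 and p < 0.05 else "Rejected")
            for name, t, p in rows}
print(verdicts)
# {'SQ -> KWP': 'Rejected', 'IQ -> KWP': 'Accepted', 'SVQ -> KWP': 'Rejected'}
```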

4.2 Discussions

4.2.1 The Effect of IPBB Application System Quality on Taxpayer Compliance

The first hypothesis in this study concerns the effect of system quality on taxpayer compliance, with a path coefficient of −0.100, a p-value of 0.545, and a t-statistic of 0.605. This means that the quality of the Tangerang City IPBB system does not significantly affect taxpayer compliance, because the t-statistic does not meet the requirement t-statistic > t-table (1.96). The first hypothesis in this study was therefore rejected. These results show that the quality of the IPBB application system does not meet the characteristics that users can take advantage of. In line with the research in [13], system quality can be measured by the dimensions of reliability, flexibility, accessibility, response time, and integration. Reliability is the level of trust in the system and how consistently it works. Flexibility refers


to the ability to adapt the system to users' changing needs. Accessibility refers to the ease of obtaining information. Integrity refers to data originating from a credible source, and response time is the time the system takes to respond to user requests. Taxpayer compliance is the discipline of the taxpayer in paying off taxes owed and obeying the applicable laws and regulations. The better the system's quality in processing data for user needs, the more often users will access the system, because it helps them solve problems or meet a need. Research in [14] states that higher user satisfaction results in higher actual use: the more satisfied users are with a system, the more often they use it. The net benefit in this study is the taxpayer compliance variable. If the quality of the system does not meet the characteristics proposed by [5], such as being flexible, reliable, and fast in responding to requests for information, users will feel uncomfortable using the Tangerang City IPBB and will be dissatisfied with it. The results of this hypothesis are in line with [9], which states that the system quality variable does not affect taxpayer compliance because, when users access an application, their concern is the content and services available, not the system. Research [15] also states that if a system is not reliable, users are not satisfied with it. Likewise, research [16] on customer service applications states that the quality of the system produced by the application did not match the needs and desires of users. However, the results of this hypothesis differ from [17] and [18], which suggest that system quality has a significant influence on taxpayer compliance. The results of this hypothesis indicate that low system quality affects the level of taxpayer compliance.

One indicator has a low value in the respondents' responses to the system quality questions and is a weak point in the use of the Tangerang City IPBB: the application is less able to respond quickly to requests for information. At the same time, the indicator that plays the most important role in determining system quality for the use of IPBB is the security of accessing the Tangerang City IPBB. This condition indicates that the Tangerang City IPBB has not yet met the characteristics of a quality system. The Tangerang City IPBB is still relatively new, so the system is still undergoing changes, and it is not yet very fast in responding to requests for information. However, the security of the Tangerang City IPBB needs to be maintained, because the Tax Object Number (NOP) must be entered before checking a PBB bill, so the risk of leaking taxpayer data or identity is small. Therefore, the Tangerang City government must continue to manage the IPBB by evaluating and improving the quality of the IPBB system, so that taxpayers can directly benefit from the improvements and meet their tax needs. Thus, the better the quality of the IPBB system, the higher the level of taxpayer compliance in Tangerang City.

4.2.2 The Effect of IPBB Application Information Quality on Taxpayer Compliance

The second hypothesis in this study concerns the influence of information quality on taxpayer compliance, with a path coefficient of 0.463, a p-value of 0.017, and a t-statistic of 2.393. This value indicates that the information quality of the Tangerang City IPBB has a significant influence on taxpayer compliance, so the second hypothesis in this study is accepted. The results mean that a high level of information quality will increase taxpayer compliance, whereas low information quality will decrease taxpayer compliance in Tangerang City; the results therefore support the model developed by DeLone and McLean. Research supporting this hypothesis is [19], which examined the success of the e-filing system using the DeLone and McLean success model and found that if a system provides valuable information that matches users' wishes, users will feel satisfied. Likewise, research [20] states that information quality positively affects user satisfaction: the better the quality of the information, the more accurate decisions will be, which positively affects user satisfaction, whereas poor information quality harms user satisfaction. The results also follow [21], which suggests that information quality is the leading and most important benchmark for measuring taxpayer compliance in an information system. If an information system provides complete, timely, and accurate information that users can use in their work, it can increase the desire to pay off tax obligations. The results of this hypothesis are supported by field data from the questionnaires.

Based on the responses to the information quality questions, 2 (two) of the 4 (four) indicators have the highest value in predicting taxpayer compliance: the information presented in the Tangerang City IPBB is easy to understand, and the information presented is complete and detailed. This is because the output generated by the Tangerang City IPBB application contains the taxpayer's personal data, consisting of the taxpayer's name, Tax Object Number (NOP), and complete address, together with a table of the transaction history and bills for the current year. Because users get exactly the information related to their PBB bills, they are satisfied with the IPBB. It can be concluded that taxpayer compliance with an information system can be measured by the quality of the information produced. If taxpayers believe the information provided is optimal, it can be used as a reference for tax compliance. The information in the Tangerang City IPBB is clear and structured, satisfying taxpayers with the information generated. Therefore, a better level of information quality can increase taxpayer compliance in Tangerang City.


4.2.3 The Effect of IPBB Application Service Quality on Taxpayer Compliance

The third hypothesis in this study concerns the effect of service quality on taxpayer compliance, with a path coefficient of 0.300, a p-value of 0.061, and a t-statistic of 1.880. These values show that the service quality of the Tangerang City IPBB does not have a significant effect on taxpayer compliance, so this hypothesis was rejected. Research supporting this result is [22], which suggests that service quality does not affect taxpayer compliance. Likewise, research [23] concluded that even good service quality did not make taxpayers comply with their tax obligations; several other supporting factors are needed to increase taxpayer compliance. A different result is reported in [17], which states that service quality has a positive influence on system use: providing fast and reliable services according to users' specific needs leads to better service for information system users. The results of this hypothesis are also supported by field data from the questionnaires. Based on the responses to the service quality questions, 2 (two) of the 4 (four) indicators have the lowest values and are the weakest projections for taxpayer compliance. The first is whether the IPBB has a complete FAQ (Frequently Asked Questions): the low score indicates that the IPBB does not provide the FAQs users require, so users still need the available helpdesk services to address questions that arise when using this information system. The second indicator is that the visuals displayed in the IPBB are less attractive: the display only contains tables of PBB information in white or neutral colors, and a display that is too simple quickly bores users.

Taxpayers who are new users also feel unfamiliar with the IPBB and need assistance through the available services. Unfortunately, given the low quality of IPBB services, taxpayers cannot use the IPBB properly, and as a result they feel no motivation to fulfill their tax obligations immediately. Therefore, the Tangerang City government must improve the IPBB by increasing the quality of its services, so that taxpayers can use the improvements to fulfill their tax obligations. Thus, the better the quality of IPBB services, the higher the level of taxpayer compliance in Tangerang City.

5 Conclusions

The purpose of this study was to determine the effect of the system quality, information quality, and service quality of the Tangerang City IPBB on taxpayer compliance. After testing the collected data and producing an analysis of each of the variables studied, it can be concluded


that, first, the quality of the Tangerang City IPBB system has no significant effect on taxpayer compliance (path coefficient −0.100, p-value 0.545, t-statistic 0.605). The characteristics of the Tangerang City IPBB system have not been met, so it cannot serve users' needs adequately; thus, a lower quality level of the IPBB system lowers the level of taxpayer compliance in Tangerang City. Second, the quality of Tangerang City IPBB information has a significant effect on taxpayer compliance (path coefficient 0.463, p-value 0.017, t-statistic 2.393); the better the quality of the information generated by the system, the higher the level of taxpayer compliance in Tangerang City. Third, the service quality of the Tangerang City IPBB does not have a significant effect on taxpayer compliance (path coefficient 0.300, p-value 0.061, t-statistic 1.880), because the services in the IPBB do not help users use the IPBB; thus, the lower the quality of IPBB services, the lower the level of taxpayer compliance in Tangerang City.

Acknowledgements This work was supported by BINUS University.

References

1. Gusar H, Azlina N, Susilatri S (2015) Pengaruh Sosialisasi Pemerintah, Pengetahuan Perpajakan, Sanksi Pajak, Kesadaran Wajib Pajak, dan Kualitas Pelayanan terhadap Kepatuhan Wajib Pajak dalam Membayar Pajak Bumi dan Bangunan (Kecamatan Bengkong). J Online Mhs Fak Ekon Univ Riau 2(2):33988
2. Kania P, Wahyuni A, Luh N, Erni G, Arie M (2017) Pengaruh Penerapan E-System Perpajakan Terhadap Tingkat Kepatuhan Wajib Pajak Orang Pribadi Dalam Membayar Pajak Pada Kantor Pelayanan Pajak (KPP) Pratama Singaraja. e-Journal S1 Ak Univ Pendidik Ganesha 7(1)
3. Resmi S (2018) Perpajakan: Teori dan Kasus, Delapan. Salemba Empat, Jakarta
4. Ilhamsyah R, Endang M, Dewantara R (2016) Pengaruh Pemahaman dan Pengetahuan Wajib Pajak Tentang Peraturan Perpajakan, Kesadaran Wajib Pajak, Kualitas Pelayanan, dan Sanksi Perpajakan Terhadap Kepatuhan Wajib Pajak Kendaraan Bermotor. J Mhs Perpajak 8:1–9
5. DeLone WH, McLean ER (2003) The DeLone and McLean model of information systems success: a ten-year update. J Manag Inf Syst 19(4):9–30. https://doi.org/10.1080/07421222.2003.11045748
6. Mustakini JH (2017) Analisis dan Desain (Sistem Informasi Pendekatan Terstruktur Teori dan Praktek Aplikasi Bisnis). Penerbit Andi
7. Siregar HF, Siregar YH, Melani M (2019) Perancangan Aplikasi Komik Hadist Berbasis Multimedia. J Teknol Inf 2(2):113. https://doi.org/10.36294/jurti.v2i2.425
8. DeLone WH, McLean ER (2016) Information systems success measurement. Found Trends Inf Syst 2(1):1–116. https://doi.org/10.1561/2900000005
9. Rachmadi TY, Handaka RD (2019) Evaluasi Penerapan E-Faktur Dengan Model Kesuksesan Sistem Informasi Delone Dan Mclean (Studi Kasus Di KPP Pratama Metro). Substansi: Sumber Artikel Akuntansi, Auditing, dan Keuangan Vokasi 3(2):129. https://doi.org/10.35837/subs.v3i2.580
10. Pering IMA (2020) Kajian Analisis Jalur Dengan Structural Equation Modeling (SEM) SmartPLS 3.0. J Ilm Satyagraha 3(2):28–48. https://doi.org/10.47532/jis.v3i2.177
11. Ghozali I (2015) Partial Least Squares: Konsep Teknik Menggunakan SmartPLS 3.0. Universitas Diponegoro, Semarang
12. Sugiyono (2018) Metode Penelitian Administrasi Dilengkapi dengan Metode R&D. Alfabeta, Bandung
13. Hsu CL, Lin JCC (2016) Effect of perceived value and social influences on mobile app stickiness and in-app purchase intention. Technol Forecast Soc Change 108:42–53. https://doi.org/10.1016/j.techfore.2016.04.012
14. Hidayati N, Harimurti F, SPA D (2017) Pengaruh Entrepreneurial Orientation, Culture Organization Internal Factor Terhadap Performance Organization Melalui Corporate Entrepreneurship Capability Pada UMKM Batik Tulis Di Jawa Timur. Ris Akunt dan Keuang Indones 2(1):1–18. https://doi.org/10.23917/reaksi.v2i1.3412
15. Yuniarti IF, Novrikasari, Misnaniarti (2021) Pengaruh Kualitas Sistem, Kualitas Informasi, Kualitas Pelayanan pada Kepuasan Pengguna dan Dampaknya pada Manfaat Bersih (Penelitian terhadap Sistem Informasi Surveilans Penyakit Tidak Menular di Kota Palembang). J Epidemiol Kesehat Komunitas 6(1):161–180
16. Maryana F, Ridhawati R, et al (2018) Pengaruh Kualitas Sistem Dan Kualitas Informasi Terhadap Kepuasan Pengguna Aplikasi Pelayanan Pelanggan Terpusat (AP2T) PT PLN (Persero) Wilayah … J Ekon dan Bisnis 11(2):213–229. Available: https://www.journal.stienas-ypb.ac.id/index.php/jdeb/article/view/123
17. Garnetia Pramanita IGAAN, Rasmini NK (2020) Sistem E-Filing dan Kepatuhan Wajib Pajak Orang Pribadi: Studi D&M IS Success Model pada KPP Pratama Denpasar Timur. E-Jurnal Akunt 30(11):2825. https://doi.org/10.24843/eja.2020.v30.i11.p09
18. Kholis A, Husrizalsyah D, Pramana A (2020) Analisis Model Delone And Mclean Pada Penerapan Sistem Informasi Akuntansi Pemerintah Kota Medan. J Ilm MEA (Manajemen, Ekonomi, dan Akuntansi) 4(2):1–13
19. Azwar, Saragih R (2018) Analisis Faktor-Faktor Yang Mempengaruhi Kesuksesan Implementasi Sistem E-Filling Pajak: Studi Kasus Kantor Pelayanan Pajak Madya Makassar. J BPPK 11(1):12–34
20. Rukhmiati NMS, Budiartha IK (2016) Informasi Dan Perceived Usefulness Pada Kepuasan Pengguna Akhir Software Akuntansi (Studi Empiris Pada Hotel Berbintang Di Provinsi Bali). E-Jurnal Ekon dan Bisnis Univ Udayana 5(1):115–142
21. Nugroho Y, Prasetyo A (2018) Assessing information systems success: a respecification of the DeLone and McLean model to integrating the perceived quality. Probl Perspect Manag 16(1):348–360. https://doi.org/10.21511/ppm.16(1).2018.34
22. Kara RN (2018) E-Filling Terhadap Kepatuhan Pajak. J Ekon dan Bisnis
23. Ester KG, Nangoi GB, Alexander SW (2017) Pengaruh Kualitas Pelayanan Pajak Dan Pengetahuan Wajib Pajak Terhadap Kepatuhan Wajib Pajak Orang Pribadi Di Kelurahan Kleak Kecamatan Malalayang Kota Manado. Going Concern J Ris Akunt 12(2):523–530. https://doi.org/10.32400/gc.12.2.17951.2017

Effects of Different Normalization on the ESRGAN Yongqi Tian, Jialin Tang, Lihong Niu, Binghua Su, and Yulei An

Abstract Batch normalization (BN) has become a core component of deep learning. Thanks to its stable training behavior, the image super-resolution reconstruction model enhanced super-resolution generative adversarial networks (ESRGAN) uses BN to help stabilize training. However, BN normalizes statistics across different types of images, which introduces artifacts into the generated super-resolution images. For this reason, the generator of ESRGAN removes BN while the discriminator retains it. Yet BN in the discriminator also normalizes the statistics of different images and thus affects the discriminator's judgment. Motivated by this, we replace BN in the discriminator with layer normalization (LN), instance normalization (IN), group normalization (GN), and representative batch normalization (RBN), as well as with no normalization at all. Extensive experiments show that ESRGAN reaches the state of the art on the Set5 dataset when GN (PSNR: 27.72, SSIM: 0.8316) is used in the discriminator. Keywords Image super-resolution · Generative adversarial networks · Normalization

Y. Tian · B. Su School of Optics and Photonics, Beijing Institute of Technology, Beijing, China e-mail: [email protected] B. Su e-mail: [email protected] Y. Tian · J. Tang · B. Su · Y. An School of Information Technology, Beijing Institute of Technology, Zhuhai, China e-mail: [email protected] L. Niu (B) College of Physics and Optoelectronic, Shenzhen, China e-mail: [email protected] J. Tang Faculty of Data Science, City University of Macau, Macau, China © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_12


Y. Tian et al.

1 Introduction Image super-resolution is a challenging problem in computer vision. It has significant applications in many fields, such as face super-resolution reconstruction [1] and license plate super-resolution reconstruction [2]. With the popularity of convolutional neural networks (CNNs), SRCNN [3], based on CNN, achieved good results. However, images processed using SRCNN are usually blurred. To address this, Ledig et al. proposed SRGAN [4], which introduced generative adversarial networks (GANs) [5] into the image super-resolution task. Compared with SRCNN, SRGAN can better capture high-frequency image information and obtain more accurate super-resolution images. However, SRGAN generates artifact regions; therefore, in 2018, Wang et al. proposed enhanced super-resolution generative adversarial networks (ESRGAN) [6] to solve this problem. ESRGAN introduces the dense [7] module and removes batch normalization (BN) [8]. Meanwhile, it uses the relativistic GAN as the judgment basis. Finally, it adopts VGG [9] features before activation. These efforts enable ESRGAN to overcome the previous problems and improve the visual quality of reconstructed images. In the model design of ESRGAN, the generator removes BN, but the discriminator retains it. During training, the BN layer computes and normalizes the mean and variance within a batch, which brings a problem: the difference between the test set used in the inference stage and the dataset used in training causes the normalization of the BN layer to negatively influence the results. Motivated by this, we equip the ESRGAN discriminator with BN, layer normalization (LN) [10], instance normalization (IN) [11], group normalization (GN) [12], representative batch normalization (RBN) [13], or no normalization, respectively. Extensive experiments show that state-of-the-art performance is achieved when the discriminator of ESRGAN utilizes GN.

2 Method In this section, we present ESRGAN in detail. Then, the implementation methods of BN, LN, IN, GN, and RBN are explained.

2.1 ESRGAN The discriminator of ESRGAN uses VGG networks, and its objective function selects relativistic GAN as follows:

$$ L_D = -\mathbb{E}_{x_r}\left[\log D_{Ra}(x_r, x_f)\right] - \mathbb{E}_{x_f}\left[\log\left(1 - D_{Ra}(x_f, x_r)\right)\right] \quad (1) $$

where $x_r$ denotes the real image, $x_f$ denotes the generated image, and $D_{Ra}$ is the relativistic average discriminator. For the generator, ESRGAN combines the dense and ResNet [14] modules to propose the residual-in-residual dense block (RRDB). The objective function of the ESRGAN generator model is as follows:

$$ L = L_{percep} + \lambda L_G + \mu L_1 \quad (2) $$

where $L_{percep}$ is the perceptual loss, computed as an $L_1$ loss on the pre-activation features of the VGG network; $L_G$ is the generator adversarial objective; and $L_1$ is the pixel-wise loss. The formula for $L_G$ is as follows:

$$ L_G = -\mathbb{E}_{x_r}\left[\log\left(1 - D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log D_{Ra}(x_f, x_r)\right] \quad (3) $$

and the $L_1$ loss is

$$ L_1 = \mathbb{E}_{x_i}\left[\left\lVert G(x_i) - y \right\rVert_1\right] \quad (4) $$

λ and μ are hyperparameters and are set to 0.005 and 0.01, respectively, in ESRGAN.
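The relativistic losses of Eqs. (1) and (3) can be sketched in numpy, assuming `c_real` and `c_fake` hold the raw critic outputs C(x) before the sigmoid; this is an illustrative sketch, not the original ESRGAN implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relativistic_d_loss(c_real, c_fake, eps=1e-12):
    # Eq. (1): D_Ra(x_r, x_f) = sigmoid(C(x_r) - E[C(x_f)])
    d_ra_real = sigmoid(c_real - c_fake.mean())
    d_ra_fake = sigmoid(c_fake - c_real.mean())
    return -(np.log(d_ra_real + eps).mean()
             + np.log(1.0 - d_ra_fake + eps).mean())

def relativistic_g_loss(c_real, c_fake, eps=1e-12):
    # Eq. (3): the symmetric counterpart that drives the generator
    d_ra_real = sigmoid(c_real - c_fake.mean())
    d_ra_fake = sigmoid(c_fake - c_real.mean())
    return -(np.log(1.0 - d_ra_real + eps).mean()
             + np.log(d_ra_fake + eps).mean())

# When the critic cleanly separates real from fake, the discriminator
# loss is near zero while the generator loss is large.
c_real = np.array([5.0, 6.0])
c_fake = np.array([-5.0, -6.0])
```

The relativistic formulation only scores how much *more* realistic a real image looks than the average fake, which is what makes the two losses mirror each other.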

2.2 Batch Normalization (BN) Internal covariate shift (ICS) [15] refers to the mismatch between the input and output data distributions of internal layers during the iterative training of a neural network. To mitigate this, BN normalizes each mini-batch to accelerate model convergence using the following equation:

$$ \tilde{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \quad (5) $$

Finally, each normalized element in the mini-batch is scaled and shifted by the learnable parameters $\gamma$ and $\beta$ to obtain $y_i$:

$$ y_i = \gamma \tilde{x}_i + \beta \quad (6) $$
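Equations (5) and (6) can be sketched in a few lines of numpy; this is an illustrative training-time sketch only, so the running statistics used at inference time are omitted:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Eqs. (5)-(6): normalize over the mini-batch axis, then apply the
    learnable scale gamma and shift beta."""
    mu_b = x.mean(axis=0)                      # per-feature batch mean
    var_b = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)  # Eq. (5)
    return gamma * x_hat + beta                # Eq. (6)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(16, 8))  # batch of 16, 8 features
y = batch_norm(x, gamma=1.0, beta=0.0)
```

After normalization, each feature has approximately zero mean and unit variance across the batch, which is exactly the property that becomes unreliable when batch statistics differ between training and inference.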


2.3 Layer Normalization (LN) During network training, a small batch size can degrade the statistics that BN relies on, so LN instead normalizes along the channel direction using the following equation:

$$ \hat{a}^l = \frac{a^l - \mu^l}{\sqrt{\left(\sigma^l\right)^2 + \epsilon}} \quad (7) $$

LN needs to ensure that the normalization does not destroy the previously learned information, so it adds a gain parameter $g^l$ and a bias parameter $b^l$; its output is as follows:

$$ \mathrm{Output}_L = f\left(g^l \odot \hat{a}^l + b^l\right) \quad (8) $$

2.4 Instance Normalization (IN) Different from BN and LN, IN normalizes within a single channel of a single image, and its mean $\mu_{ti}$ and variance $\sigma_{ti}^2$ are as follows:

$$ \mu_{ti} = \frac{1}{HW} \sum_{l=1}^{H} \sum_{m=1}^{W} x_{tilm} \quad (9) $$

$$ \sigma_{ti}^2 = \frac{1}{HW} \sum_{l=1}^{H} \sum_{m=1}^{W} \left(x_{tilm} - \mu_{ti}\right)^2 \quad (10) $$

where $t$ is the index of the image and $i$ is the index of the feature map. The final output $y$ is as follows:

$$ y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \epsilon}} \quad (11) $$

2.5 Group Normalization (GN) A large batch size makes training costly, while a small batch size degrades the batch statistics and hurts the results. GN therefore normalizes independently of the batch size, remaining effective even when the batch size is small, and improves model performance.


GN divides the input channels into G groups (G is a hyperparameter), reshaping the input of size N × C × H × W into N × G × C/G × H × W, so that each group has C/G channels. GN then performs the normalization within each group.
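The grouping described above can be sketched with a single reshape; this is a minimal numpy illustration, not the original implementation, and the learnable per-channel scale and shift are folded into scalars here:

```python
import numpy as np

def group_norm(x, groups, gamma=1.0, beta=0.0, eps=1e-5):
    """Reshape (N, C, H, W) to (N, G, C//G, H, W) and normalize within
    each group, independently of the batch size."""
    n, c, h, w = x.shape
    assert c % groups == 0, "C must be divisible by G"
    xg = x.reshape(n, groups, c // groups, h, w)
    mu = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mu) / np.sqrt(var + eps)
    return gamma * xg.reshape(n, c, h, w) + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=(1, 8, 4, 4))  # batch size 1 works
y = group_norm(x, groups=4)                            # GN(4): 4 groups of 2
```

Because the statistics are computed per sample and per group, the result does not change with the batch size, which is why GN remains stable at the small batch size (16) used in this paper.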

2.6 Representative Batch Normalization (RBN) Because BN accelerates network training while ignoring the representation differences between examples, RBN introduces feature calibration to enhance useful information and suppress useless noise. RBN adds to the input feature the representative feature $K_m$ of the current instance before normalizing, where $K_m$ is obtained via global average pooling [16] using the following formula:

$$ K_m = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{(N,C,H,W)} \quad (12) $$

At this point, the calibrated input $X_m$ is as follows:

$$ X_m = X + w_m \cdot K_m \quad (13) $$

where $w_m$ is a learnable variable. RBN also introduces scaling calibration after the normalization, using the following formula:

$$ X = X + w_m * R\left(w_v \cdot K_S + w_b\right) \quad (14) $$

where $K_S$ is the instance feature after global average pooling, $R$ is the sigmoid function, and both $w_v$ and $w_b$ are learnable variables.

3 Experiment In this experiment, we use the DIV2K dataset, which contains 1000 high-definition images (2K resolution), of which 800 are used for training, 100 for validation, and 100 for testing. We use the 800 training images as the training set and the 100 validation images as the validation set during training. The Set5, Set14, and BSD100 datasets were used for testing. These datasets consist of high-resolution images and their 2×, 3×, and 4× downsampled versions. The high-resolution images and the 4× downsampled low-resolution images were used as test sets in our experiments.


We use peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate the quality of the generated super-resolution images. The mean square error (MSE) is defined as follows:

$$ \mathrm{MSE} = \frac{1}{HW} \sum_{i=0}^{H-1} \sum_{j=0}^{W-1} \left[I(i,j) - K(i,j)\right]^2 \quad (15) $$

The height and width of images I and K are H and W. Based on the MSE, the formula for PSNR is as follows:

$$ \mathrm{PSNR} = 20 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I}{\sqrt{\mathrm{MSE}}}\right) \quad (16) $$

where $\mathrm{MAX}_I$ is the maximum possible pixel value of the image; for 8-bit images it is 255. The larger the PSNR value, the better the image quality. Unlike MSE and PSNR, which measure absolute error, SSIM is more in line with human visual perception:

$$ \mathrm{SSIM} = \frac{\left(2\mu_x\mu_y + c_1\right)\left(2\sigma_{xy} + c_2\right)}{\left(\mu_x^2 + \mu_y^2 + c_1\right)\left(\sigma_x^2 + \sigma_y^2 + c_2\right)} \quad (17) $$
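As a quick sanity check of Eqs. (15) and (16), PSNR can be computed directly from the MSE; a minimal sketch, with `max_i = 255` for 8-bit images:

```python
import numpy as np

def psnr(img_i, img_k, max_i=255.0):
    """PSNR from Eqs. (15)-(16)."""
    mse = np.mean((np.asarray(img_i, dtype=np.float64)
                   - np.asarray(img_k, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 20.0 * np.log10(max_i / np.sqrt(mse))

a = np.full((8, 8), 100.0)
b = a + 10.0                         # constant error of 10, so MSE = 100
```

With a constant pixel error of 10, `psnr(a, b)` reduces to 20·log10(255/10) ≈ 28.13 dB, the same order of magnitude as the scores reported in Table 1.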

The value range of SSIM is [0, 1]. Like PSNR, the larger the SSIM value, the higher the image quality. For fair comparison, all models involved in the evaluation use the same training strategy. We first downsample the training images by a factor of 4 and then use bicubic interpolation to reshape them to 32 × 32 as low-resolution inputs. The processed images are fed into the generator, which is trained alone for 1162 epochs (the discriminator is not trained at this stage) and generates 4× upsampled images (128 × 128). After pre-training the generator, 128 × 128 crops of the training images serve as real high-resolution images for the discriminator, while the 32 × 32 low-resolution images are sent to the generator to produce 128 × 128 high-resolution images; the generated images are then sent to the discriminator for training. The epoch hyperparameter for this stage is set to 465. The learning rate is 0.0002 when training the generator alone and 0.0001 when training the generator and discriminator jointly. The batch size is 16, and Adam is selected as the optimization algorithm for this experiment. Figures 1, 2, and 3, respectively, show the super-resolution images generated on the Set5, Set14, and BSD100 datasets by all models participating in the comparative experiment; the PSNR value of each image is shown in parentheses. Tables 1 and 2 show the PSNR and SSIM scores of all models on the Set5, Set14, and BSD100 datasets under different normalizations, respectively. LN(1), (2), and (3) denote normalizing different dimensions of the input [N × C × H × W]; for example, LN(3) normalizes the [C × H × W] dimensions, and LN(2)


Fig. 1 Generated super-resolution images of all participating models on the Set5 dataset

Fig. 2 Generated super-resolution images of all participating models on the Set14 dataset


Fig. 3 Generated super-resolution images of all participating models on the BSD100 dataset

normalizes the [H × W] dimensions of the input image [N × C × H × W]. GN(2) and GN(4) mean dividing the input channels into two and four groups, respectively. The experimental results show that when the discriminator of ESRGAN adopts GN with the number of groups set to 4, it performs better than BN and the other normalizations. Table 1 PSNR of all participating models on Set5, Set14, and BSD100 datasets

Model            | Set5 (PSNR) | Set14 (PSNR) | BSD100 (PSNR)
RRDBNet          | 30.00       | 27.14        | 26.77
BN               | 27.39       | 24.67        | 24.55
LN(3)            | 27.47       | 24.94        | 24.72
LN(2)            | 27.42       | 24.89        | 24.67
LN(1)            | 27.39       | 24.83        | 24.55
GN(2)            | 27.84       | 25.06        | 24.86
GN(4)            | 27.72       | 25.09        | 24.95
RBN              | 27.52       | 24.92        | 24.58
No normalization | 27.56       | 25.02        | 24.83

Table 2 SSIM of all participating models on Set5, Set14, and BSD100 datasets

Model            | Set5 (SSIM) | Set14 (SSIM) | BSD100 (SSIM)
BN               | 0.8285      | 0.7234       | 0.6939
LN(3)            | 0.8294      | 0.7236       | 0.6930
LN(2)            | 0.8251      | 0.7215       | 0.6925
LN(1)            | 0.8228      | 0.7172       | 0.6875
GN(2)            | 0.8367      | 0.7283       | 0.6989
GN(4)            | 0.8316      | 0.7288       | 0.7005
RBN              | 0.8245      | 0.7248       | 0.6922
No normalization | 0.8333      | 0.7261       | 0.6981

4 Conclusion For ESRGAN, we trained the discriminator with different normalizations and used the ESRGAN generator to test the quality of the super-resolved images it generates on Set5, Set14, and BSD100. Extensive experiments show that when the discriminator adopts GN with the group number set to 4, ESRGAN trained on the DIV2K dataset scores highest on Set5 (PSNR: 27.72, SSIM: 0.8316), Set14 (PSNR: 25.09, SSIM: 0.7288), and BSD100 (PSNR: 24.95, SSIM: 0.7005). ESRGAN uses a small batch size (16), so the batch statistics estimated during normalization are error-prone: images with large differences are normalized together, which dilutes the effect of inter-image differences on the results. GN avoids this by grouping the channels of each input and normalizing within each group. Acknowledgements This research work is supported in part by the key scientific research platforms and projects in general colleges and universities in Guangdong Province 2021GXJK368 and Guangdong Higher Education Association Research Project 21GYB110.

References
1. Zefreh K, Aarle W, Batenburg K, Sijbers J (2013) Super-resolution of license plate images using algebraic reconstruction technique. J Imag Graph 1(2). https://doi.org/10.12720/joig.1.2.94-98
2. Hamdan S, Fukumizu Y, Izumi T, Yamauchi H (2018) Face image super-resolution with adaptive patch size to scaling. J Imag Graph 6(2). https://doi.org/10.18178/joig.6.2.167-173
3. Dong C, Loy CC, He K, Tang X (2016) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307. https://doi.org/10.1109/TPAMI.2015.2439281
4. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken AP, Tejani A, Totz J, Wang Z, Shi W (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp 105–114
5. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville AC, Bengio Y, Generative adversarial networks. CoRR abs/1406.2661
6. Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, Qiao Y, Loy CC (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In: European Conference on Computer Vision (ECCV) workshops, vol 11133 of Lecture Notes in Computer Science, pp 63–79
7. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, IEEE Computer Society, pp 2261–2269. https://doi.org/10.1109/CVPR.2017.243
8. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, ICML, vol 37 of JMLR Workshop and Conference Proceedings, pp 448–456
9. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9. http://arxiv.org/abs/1409.1556
10. Ba LJ, Kiros JR, Hinton GE, Layer normalization. CoRR abs/1607.06450. http://arxiv.org/abs/1607.06450
11. Ulyanov D, Vedaldi A, Lempitsky VS, Instance normalization: the missing ingredient for fast stylization. CoRR abs/1607.08022. http://arxiv.org/abs/1607.08022
12. Wu Y, He K (2018) Group normalization. In: Computer Vision – ECCV 2018, 15th European Conference, Munich, Germany, September 8–14, Proceedings, Part XIII, vol 11217 of Lecture Notes in Computer Science, Springer, pp 3–19. https://doi.org/10.1007/978-3-030-01261-8_1
13. Gao S, Han Q, Li D, Cheng M, Peng P (2021) Representative batch normalization with feature calibration. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp 8669–8679
14. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, IEEE Computer Society, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
15. Santurkar S, Tsipras D, Ilyas A, Madry A (2018) How does batch normalization help optimization? In: Advances in Neural Information Processing Systems 31, NeurIPS 2018, December 3–8, Montréal, Canada, pp 2488–2498. https://proceedings.neurips.cc/paper/2018/hash/905056c1ac1dad141560467e0a99e1cf-Abstract.html
16. Lin M, Chen Q, Yan S (2014) Network in network. In: International Conference on Learning Representations, ICLR

Early Detection of Mental Disorder Via Social Media Posts Using Deep Learning Models Amanda Sun and Zhe Wu

Abstract Mental health, which affects people's lives as much as physical health, is receiving more and more attention nowadays, especially with the significant increase in pressure brought by the fast-paced evolution of technology and society. The diagnosis of mental health symptoms, however, mostly relies on the interpretation of language and behavior by experienced psychologists, who are not accessible to the general population. Depression causes cognitive and motor changes that affect speech production: reduced verbal activity and productivity, prosodic speech irregularities, and monotonous speech have all been shown to be symptomatic of depression. In this study, we aim to provide a deep learning-based model that can give an initial diagnosis of mental health problems for individuals and screen for the risk of developing mental health issues. This AI-driven model focuses on understanding and analyzing people's daily public comments and posts, capturing the mental health status embedded in the semantic and syntactic structure of those online posts. Keywords Mental health · Depression detection · Deep learning · Artificial intelligence

1 Introduction Individual mental health problems have received more and more attention nowadays, as they threaten people's well-being as much as physical health problems do. According to Mental Health America (MHA)'s 2019 report [1], nearly 20% of adults, equivalent to nearly 50 million Americans, experience mental illness, and suicidal ideation continues to increase, with nearly 5% of adults A. Sun (B) Princeton High School, 151 Moore ST, Princeton, NJ 08540, USA e-mail: [email protected] Z. Wu Nanjing University of Aeronautics and Astronautics, Nanjing, China © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_13


A. Sun and Z. Wu

having serious thoughts of suicide. Over half of the adults with mental illness remain untreated. Meanwhile, the situation for youth is not optimistic either, with over 15% of youth having experienced severe depression and 60% of those receiving no mental health treatment. The lack of treatment is partially due to the scarcity of psychotherapy resources, but even more so to unawareness of the existence of mental illness and a lack of timely warnings. Unlike most physical illnesses, there is no standard medical test to diagnose mental illness. Instead, feelings, symptoms, and behaviors are usually used by psychologists, and diagnosis relies heavily on the psychologists' interpretation of those observations. Social media provides platforms for people to express their feelings via posts in text, and those text data embed abundant information about people's feelings and emotions. It was reported in [2] that people with depression are more likely to post tweets with negative emotional sentiment, and many efforts have been made to understand the risk of mental illness via social platform data. Feature engineering is widely performed to generate attributes such as linguistic style and social engagement to build traditional machine learning classifiers (e.g., support vector machines) for mental disorder prediction [3–5]. Hand-engineered features are not an effective way to capture the information embedded in the posts, as such features may miss details in the raw text. With the significant amount of text data on social media, it is possible to build end-to-end natural language processing (NLP) models to enable the early detection of mental disorders. Both convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) have been applied to user content from social media [6–8].
The challenge associated with those end-to-end models is the requirement of a large amount of labeled training data to guarantee satisfactory prediction accuracy. Language model pre-training is effective for improving performance on different natural language processing (NLP) tasks, including the mental disorder detection via social media posts in this study. Several large-scale pre-trained language models have been developed that provide useful language embeddings that can be directly used for downstream language tasks (e.g., sentiment classification, text generation, question answering, etc.). Embeddings from Language Models (ELMo) [9] utilizes a bidirectional language model to learn contextualized word representations in an unsupervised learning setup. The ELMo model is task-specific and needs to be trained separately for different tasks. The generative pre-training transformer (GPT) [10] was then proposed, based on a transformer decoder architecture and a supervised fine-tuning process for downstream task applications. The GPT model architecture generalizes easily to different NLP tasks with the pre-trained model, and the general GPT framework was shown to outperform existing models at the time it was proposed. Later, GPT-2 [11] and GPT-3 [12] were published as upgraded versions, each containing roughly 10× more parameters than the previous version and therefore more powerful, yet more computationally expensive to use. Bidirectional Encoder Representations from Transformers (BERT) [13] follows a similar idea to the GPT model: it trains a large language model and is fine-tuned on specific tasks with a generic model architecture. BERT utilizes a multi-layer


bidirectional transformer encoder and is trained on two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM randomly masks 15% of the words/tokens in the sequence, and the model tries to predict the masked words from the other context words in the sequence. NSP is motivated by the observation that many NLP tasks, such as question answering, involve understanding the relationships between neighboring sentences; the model attempts to learn whether the second sentence actually follows the first in the data. The pre-trained language models discussed above have not been extensively studied and applied for mental disorder detection. In this study, we investigate several methods, consisting of deep learning networks and state-of-the-art pre-trained language models, for mental illness detection via social media posts. We want to explore and validate whether natural language processing models can find subtle language patterns in the raw text of online posts to detect mental health issues, and whether the detection performance can be improved by more advanced language models.

2 Data Collection and Preprocessing A large amount of textual data has flooded social media with the increase in social media usage, giving researchers the opportunity to examine emotions expressed in text. These data may aid the analysis of feelings and provide valuable insight into sudden discrepancies in users' personality traits as reflected in their posts. The goal of this work is real-time mental health status classification of online tweets, Reddit posts, comments, etc., using the developed natural language processing model. The model performance relies heavily on data quality; therefore, the collection and preprocessing of data with and without mental health issues is one of the most important contributions of this study. Despite recent research on identifying different mental health problems from text data, public datasets in this area remain limited. Meanwhile, it is challenging to obtain tweets and Reddit posts that indicate depression with manual labeling, which requires tremendous labor. Instead, we take advantage of existing public datasets, as well as Twitter data filtering tools, for data collection. The existing datasets that we use include TalkLife [14] and Dreaddit [15]. TalkLife is the largest global peer-to-peer mental health support network. It enables seekers to have textual interactions with peer supporters through conversational threads. The dataset contains 6.4M threads and 18M interactions (seeker post, response post pairs). Dreaddit is a text corpus of lengthy multi-domain social media data for the identification of stress, from which we extract only the text data and corresponding labels. Besides reformulating existing datasets to our needs, we also collect data directly from Twitter and Reddit that indicate depression, loneliness, anxiety, etc. These datasets were collected so that an AI agent can be trained to evaluate mental health status via verbal expression.
We developed a script using Twint (an open-source Twitter scraping tool) that filters data based on hashtags such as '#depressed,' '#depression,' '#hopeless,' and so on. Meanwhile, we

Table 1 Dataset

Labels                 | Number of samples
Mental health issue    | 7952
No mental health issue | 8000

use the Reddit PRAW API to collect data from sub-Reddits that are likely to discuss mental health issues, including interpersonal conflicts and mental illness, following the 55 mental health-focused sub-Reddits summarized by Sharma et al. [16]. The scraped data is then manually reviewed to remove noisy samples. The above discussion describes how the negative samples that indicate mental health issues are collected. When a machine learning classifier is trained, positive samples that indicate no mental health problems are equally important as negative samples. We randomly sample a subset (8000 samples) from the sentiment140 [17] dataset, a public dataset of 1.6 million tweets, as positive samples. Combining the positive and negative samples collected via the methods described above, we arrive at the dataset given in Table 1. The dataset collected above contains raw text, emojis, URLs, etc. To better assist machine learning model training, proper preprocessing is required. The following data preprocessing steps are performed:

• Convert all text to lower case
• Remove symbols such as emojis and URLs
• Expand contractions (e.g., can't → cannot)
• Remove punctuation and stop words
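The four preprocessing steps above can be sketched as follows; this is an illustrative sketch, and the contraction map and stop-word list are small stand-ins for the fuller resources (e.g., NLTK stop words) a real pipeline would use:

```python
import re
import string

# Illustrative subsets; a real pipeline would use complete lists.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am"}
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "i", "to", "and"}

def preprocess(text):
    text = text.lower()                                  # 1. lower-case
    text = re.sub(r"https?://\S+", "", text)             # 2. strip URLs
    text = text.encode("ascii", "ignore").decode()       #    strip emojis
    for contraction, full in CONTRACTIONS.items():       # 3. expand contractions
        text = text.replace(contraction, full)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # 4. stop words
    return " ".join(tokens)
```

For example, `preprocess("I can't sleep 😞 https://t.co/xyz")` yields `"cannot sleep"`: the URL and emoji are stripped, the contraction is expanded, and the stop word "i" is dropped.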

Those preprocessing steps can effectively reduce the data noise and facilitate the training of the machine learning model. We then analyze the word frequency in posts indicating mental disorders and visualize the results in Fig. 1. The larger the font size is, the more frequently the word appears in the negative posts with mental health issues. It is observable that keywords such as ‘depression’ and ‘anxiety’ make up a large proportion in the negative posts.

3 Methods Five models including one baseline model are trained and evaluated over the dataset we created.

3.1 Random Predictor Model We use the random predictor model as a natural baseline model, where the label distributions in the training data are memorized and then used to predict a random

Early Detection of Mental Disorder Via Social …

153

Fig. 1 Word cloud visualization for posts indicating mental health problems. The larger the word font size is, the more frequently it appears in the posts/tweets with mental health problems

class conforming to the label distribution of each new test case regardless of features we have for the test case.
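A minimal sketch of such a baseline; the integer labels and counts follow Table 1, and the sampling interface is illustrative:

```python
import random
from collections import Counter

def fit_random_predictor(train_labels, seed=0):
    """Memorize the training label distribution, then predict each new
    case by sampling a label from that distribution, ignoring features."""
    counts = Counter(train_labels)
    labels = list(counts)
    weights = [counts[lab] for lab in labels]
    rng = random.Random(seed)
    return lambda _features: rng.choices(labels, weights=weights)[0]

train = [1] * 7952 + [0] * 8000     # label counts from Table 1
predict = fit_random_predictor(train)
preds = [predict(None) for _ in range(10000)]
```

Because the training data is nearly balanced, this baseline is right about half the time on a balanced test set, which matches the 0.511 accuracy reported in Table 3.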

3.2 Sentence2vec We implement a 100-dimension sentence2vec model, which returns the average of the word embeddings in the input sequence (noting that one sample may contain multiple sentences). The pre-trained word embeddings provided by Gensim [18], an open-source natural language processing package, are obtained by applying GloVe to a Twitter corpus. We specifically use the 'glove-twitter-100' version of the pre-trained embeddings.
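A minimal sketch of the averaging step; the toy vectors below stand in for the pre-trained 'glove-twitter-100' embeddings, which would normally be loaded through Gensim's downloader (`gensim.downloader.load('glove-twitter-100')`):

```python
import numpy as np

def sentence2vec(text, embeddings, dim=100):
    """Average the vectors of all in-vocabulary tokens; out-of-vocabulary
    words are skipped, and an empty match yields a zero vector."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# Toy 100-d vectors used only for illustration.
rng = np.random.default_rng(0)
toy = {w: rng.normal(size=100) for w in ["i", "feel", "sad"]}
vec = sentence2vec("I feel sad today", toy)   # 'today' is out of vocabulary
```

The resulting 100-dimensional vector is then used as the feature input to a logistic regression classifier, as in the "Sentence2vec + LR" row of Table 3.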

3.3 Deep Learning LSTM Model An LSTM model with an embedding layer was implemented to take the tokenized sequence as input and predict the mental health status directly. The architecture of the LSTM model is given in Table 2. The 105 in the output size from Table 2 denotes the maximum number of tokens within the training data. All training samples are padded to match that number so the model can batch process the data. All embedding weights in our LSTM model are randomly initialized and no pre-trained weights are utilized. The LSTM model is trained end-to-end using the training data we collected.

Table 2 LSTM model architecture

Layer              | Output shape     | # Parameters
Embedding layer    | (None, 105, 128) | 256,000
Spatial dropout 1D | (None, 105, 128) | 0
LSTM               | (None, 196)      | 254,800
Dense              | (None, 2)        | 394
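The parameter counts in Table 2 can be checked against the standard layer formulas; a quick verification, where the vocabulary size of 2000 is inferred from the 256,000 embedding parameters and is an assumption of this sketch:

```python
# Verify the parameter counts in Table 2 from standard layer formulas.
vocab_size, embed_dim = 2000, 128   # vocabulary size assumed from 256,000
units, num_classes = 196, 2

embedding_params = vocab_size * embed_dim            # lookup table
# LSTM: 4 gates, each with input, recurrent, and bias weights
lstm_params = 4 * ((embed_dim + units + 1) * units)
dense_params = (units + 1) * num_classes             # weights + biases
# The spatial dropout layer adds no parameters.
```

Here `embedding_params`, `lstm_params`, and `dense_params` come out to 256,000, 254,800, and 394, matching Table 2 exactly.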

3.4 Pre-trained BERT BERT pre-trains language representations to build practical models that can be widely used for different tasks. We first use pre-trained BERT to extract high-quality language features, then train a logistic regression model over the extracted language features to classify each processed tweet/post as either positive or negative. The extracted feature for each processed raw text is a vector of size 768, an embedding of the tweet/post that we can use for classification. In this study, instead of the vanilla BERT model, we use DistilBERT [19], a smaller, faster, and lighter version open-sourced by Hugging Face. It has only 60% of the size of vanilla BERT but runs 60% faster and matches 97% of its language understanding performance. Several data processing steps are required before the processed data from Sect. 2 is fed to DistilBERT:

• Tokenization: break words into tokens, add [CLS] and [SEP] tokens to the start and end of the sentence, respectively, and substitute tokens with their corresponding ids.
• Padding: pad all lists of tokens to the same size so the input can be represented as a 2D array for batch processing.
• Masking: create a mask array of the same size as the input array that masks the padding, so that the BERT model is not confused by the padded area.

Three embeddings are combined as the input embedding in BERT: token embedding, segment embedding, and position embedding, as illustrated in Fig. 2. WordPiece tokenization is used in the BERT model so that unusual words can be split into sub-word units. Sentence embeddings are designed to differentiate two sentences, and position embeddings are learned to reflect the position of words in the sequence. Trained BERT-based models are rarely used as-is, but are generally fine-tuned (transfer learning) on the target dataset. The pre-trained BERT model grasps the semantic and syntactic relationships between words from a large dataset.
Because of the shared semantic and syntactic information in different language datasets, a new model can be simply fine-tuned on a small target dataset to obtain superior performance and avoid overfitting.
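The tokenize/pad/mask steps above can be sketched as follows. This is a toy illustration with a made-up vocabulary and ids, not the real DistilBERT WordPiece tokenizer (which in practice comes from Hugging Face's transformers library):

```python
# Toy sketch of the tokenize/pad/mask pipeline described above.
# The vocabulary and ids are invented; DistilBERT's real tokenizer
# uses WordPiece subwords via the Hugging Face transformers library.

VOCAB = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
         "i": 1, "feel": 2, "fine": 3, "really": 4, "sad": 5, "today": 6}

def tokenize(sentence):
    # Map words to ids and wrap them with [CLS] ... [SEP], as BERT expects.
    ids = [VOCAB.get(w, VOCAB["[UNK]"]) for w in sentence.lower().split()]
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]

def pad_and_mask(batch):
    # Pad every sequence to the longest one and build the attention
    # mask (1 = real token, 0 = padding the model should ignore).
    max_len = max(len(seq) for seq in batch)
    padded = [seq + [VOCAB["[PAD]"]] * (max_len - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return padded, mask

batch = [tokenize("I feel fine"), tokenize("I feel really sad today")]
padded, mask = pad_and_mask(batch)
# Both rows now have length 7; the first row ends in two [PAD] ids
# that the mask zeroes out for the model.
```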

Early Detection of Mental Disorder Via Social …


Fig. 2 BERT structure overview

3.5 Fine-Tuned BERT

In Sect. 3.4, the pre-trained DistilBERT model is used purely as a feature extractor, with all weights in the model frozen. Because the pre-trained weights already carry rich language information, fine-tuning on a specific task requires a much smaller dataset and less training time. In this subsection, we add an extra fully connected layer after the general BERT model and fine-tune the overall model on the processed dataset.
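The contrast between the two setups (frozen extractor vs. trainable head) can be sketched with a minimal linear head trained over fixed 768-dimensional features. The random vectors, labels, and hyperparameters below are illustrative stand-ins for the real DistilBERT features, not the study's actual pipeline:

```python
import numpy as np

# Minimal sketch of the "frozen encoder + trainable head" setup.
# Random 768-d vectors stand in for DistilBERT features; the data,
# labels, and hyperparameters here are illustrative only.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))        # frozen features (never updated)
y = (X[:, 0] > 0).astype(int)          # synthetic binary labels

W = np.zeros((768, 2))                 # the only trainable parameters:
b = np.zeros(2)                        # a single linear (logistic) head

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll():
    p = softmax(X @ W + b)
    return -np.log(p[np.arange(len(y)), y]).mean()

start = nll()
for _ in range(300):                   # plain batch gradient descent
    g = softmax(X @ W + b)
    g[np.arange(len(y)), y] -= 1.0     # dLoss/dLogits
    g /= len(y)
    W -= 0.2 * X.T @ g
    b -= 0.2 * g.sum(axis=0)
end = nll()
acc = (softmax(X @ W + b).argmax(axis=1) == y).mean()
# The loss drops and accuracy rises while the "encoder" output X
# stays fixed; only the head weights W and b move.
```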

4 Experiments

Our experiments sought to determine whether the proposed NLP models can detect mental health status purely from textual language such as tweets and Reddit posts, and with what accuracy.

4.1 Quantitative Evaluations

The quantitative testing results of the different models described in Sect. 3 are given in Table 3. Because both the training and testing data are fairly balanced, accuracy and the macro-F1 score (as well as other metrics such as precision and recall) are similar. Therefore, we use only accuracy as the evaluation metric for simplicity.
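The claim that accuracy and macro-F1 nearly coincide on balanced data can be checked quickly. The confusion-matrix counts below are illustrative, not the paper's actual results:

```python
# Quick check of the claim above: on balanced data, accuracy and the
# macro-F1 score nearly coincide. The confusion-matrix counts below
# are illustrative, not the paper's actual results.

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def macro_f1(tp, fp, fn, tn):
    def f1(tp_, fp_, fn_):
        precision = tp_ / (tp_ + fp_)
        recall = tp_ / (tp_ + fn_)
        return 2 * precision * recall / (precision + recall)
    # Average the F1 of the positive class and of the negative class
    # (for the negative class the roles of fp and fn swap).
    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2

# Balanced test set: 500 positive-class posts, 500 negative-class posts.
counts = dict(tp=480, fp=25, fn=20, tn=475)
acc = accuracy(**counts)
f1m = macro_f1(**counts)
# Both come out at roughly 0.955, so reporting accuracy alone is safe.
```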

A. Sun and Z. Wu

Table 3 Quantitative results

Model                     Accuracy
Random                    0.511
Sentence2vec + LR         0.857
LSTM                      0.934
Pre-trained BERT + LR     0.947
Fine-tuned BERT           0.963

The test accuracies of the different language models (sentence2vec, LSTM, pre-trained BERT, fine-tuned BERT) are significantly higher than that of the random model, which shows that NLP models can find language statistics and patterns that help detect mental health abnormalities. The BERT models (pre-trained and fine-tuned) outperform the other language models, sentence2vec and LSTM, which shows that more powerful language model architectures improve the performance of detecting tweets/posts that reflect mental health issues. The fine-tuned BERT model achieves above 96% classification accuracy on the random test data, which demonstrates the effectiveness of fine-tuning pre-trained language models.

4.2 Qualitative Analysis

We now show example results from the fine-tuned BERT model in Table 4, where the first column shows the raw texts (without any processing) from different posts, the second column gives the true label of the posts, with 0 representing normal posts and 1 denoting posts with depression or other mental health problems, and the third column shows the prediction of our best model, the fine-tuned BERT.

Table 4 Qualitative analysis for examples

Post: Just checked my user timeline on my blackberry, it looks like the twanking is still happening Are ppl still having probs w/BGs and UIDs?
Label: 0   Prediction: 0

Post: Just heard gun shots in my neighborhood!!!
Label: 0   Prediction: 1

Post: If anyone will listen. I'm in a bad place right now. I could really use a friend
Label: 1   Prediction: 1

Post: It cleared up and I was okay but. On Monday I was thinking about humans and how the brain works and it tripped me out I got worried that because I was thinking about how the brain works that I would lose sleep and I did. That night was bad just like last time. Also, yesterday my sleep was bad I woke up like every hour of the night just like last time
Label: 1   Prediction: 1

The first example, 'Just checked my user timeline on my blackberry …,' is a fairly normal post: we can safely conclude that no mental disorder is visible in it, and our language model makes the correct prediction. The second example, 'Just heard gun shots in my neighborhood!!!,' however, is very interesting: the fine-tuned BERT model classifies this post as potentially showing signs of a mental disorder, while the label marks it as a normal post. The label here can be viewed as an error or data noise, because the post undeniably reflects a certain level of anxiety and fear, which may lead to a potential mental disorder. That the proposed language model detects it demonstrates the model's robustness. Similarly, the third and fourth examples prove that the proposed language model is more than a simple 'keyword detector,' as very few words in those examples appear in the high-frequency set shown in Fig. 1.

5 Conclusions and Future Work

We explored the application of different natural language processing models to the early detection of mental health disorders via social media posts, and found that deep learning language models help detect potential mental disorders based solely on raw text in social media. Among all the models investigated, the fine-tuned BERT model proved the most effective at detecting signs of mental disorder in text. Compared with the baseline model, in which sentence embeddings are derived by averaging word embeddings, the LSTM, the pre-trained BERT, and the fine-tuned BERT all led to clearly better classification performance. For future work, we want to explore whether the performance of the fine-tuned BERT model can be further improved with a larger training dataset. The language models investigated in this study could also be deployed online to provide real-time analysis of individual mental health status by analyzing comments, as well as public mental health statistics. Moreover, attention mechanism visualization could be investigated and applied to explain the model's behavior, i.e., why the model classifies specific posts/tweets as ones indicating mental health problems.

References

1. Reinert M, Fritze D, Nguyen T (2022) The state of mental health in America
2. Park G, Schwartz HA, Eichstaedt JC, Kern ML, Kosinski M, Stillwell DJ, Ungar LH, Seligman ME (2015) Automatic personality assessment through social media language. J Personal Soc Psychol 108(6):934
3. De Choudhury M, Gamon M, Counts S, Horvitz E (2013) Predicting depression via social media. In: Seventh international AAAI conference on weblogs and social media
4. Reece AG, Reagan AJ, Lix KL, Dodds PS, Danforth CM, Langer EJ (2017) Forecasting the onset and course of mental illness with Twitter data. Sci Rep 7(1):1–11
5. Tsugawa S, Kikuchi Y, Kishino F, Nakajima K, Itoh Y, Ohsaki H (2015) Recognizing depression from Twitter activity. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, pp 3187–3196
6. Gkotsis G, Oellrich A, Velupillai S, Liakata M, Hubbard TJ, Dobson RJ, Dutta R (2017) Characterisation of mental health conditions in social media using informed deep learning. Sci Rep 7(1):1–11
7. Du J, Zhang Y, Luo J, Jia Y, Wei Q, Tao C, Xu H (2018) Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med Inform Decis Mak 18(2):77–87
8. Kim J, Lee J, Park E, Han J (2020) A deep learning model for detecting mental illness from user content on social media. Sci Rep 10(1):1–6
9. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. In: Proceedings of NAACL
10. Radford A, Narasimhan K, Salimans T, Sutskever I, Improving language understanding by generative pre-training
11. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I et al (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9
12. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al, Language models are few-shot learners. arXiv:2005.14165
13. Devlin J, Chang M-W, Lee K, Toutanova K, BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
14. Saha K, Sharma A (2020) Causal factors of effective psychosocial outcomes in online mental health communities. In: Proceedings of the international AAAI conference on web and social media, vol 14, pp 590–601
15. Sharma A, Miner AS, Atkins DC, Althoff T (2020) A computational approach to understanding empathy expressed in text-based mental health support. In: EMNLP
16. Sharma E, De Choudhury M (2018) Mental health support and its relationship to linguistic accommodation in online communities. In: Proceedings of the 2018 CHI conference on human factors in computing systems, pp 1–13
17. Friedrich N, Bowman TD, Stock WG, Haustein S, Adapting sentiment analysis for tweets linking to scientific papers. arXiv:1507.01967
18. Rehurek R, Sojka P, Gensim: Python framework for vector space modelling. NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic 3(2)
19. Sanh V, Debut L, Chaumond J, Wolf T, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108

Design of PLC Training Platform Based on Digital Twin

Fuhua Yang

Abstract In order to adapt to the digital development of the manufacturing industry and meet the construction needs of "golden courses", this paper analyzes the problems in PLC training and proposes an intelligent manufacturing digital twin training platform, relying on TIA, PLCSIM Advanced, NX MCD, and other software to perform 3D modeling of mechanical parts and recreate real work scenes. The digital twin training platform provides an open learning mode that can improve students' problem-solving and innovation ability; it also provides a new method for the design and debugging of automated products.

Keywords Golden course · PLC · Digital twin · Combination of virtual and real

1 Introduction

The idea of the digital twin emerged from the advanced manufacturing field in the twenty-first century. It is a new concept of information-physical fusion, relying on computer technology, first proposed by Professor Michael Grieves of the University of Michigan, United States. It was initially called the "information mirroring model" and later evolved into the "digital twin" [1]. The digital twin is the virtual-real interconnection technology of smart factories: it enables accurate mapping between the virtual and the real, proactively warns of problems, and optimizes operation plans, greatly shortening program design, installation, and debugging time [2]. Through a variety of high-precision sensors and communication interface technologies, digital twin technology can use the accurate data of physical entities for data analysis and simulation [3]. It is therefore particularly important to explore innovative training and teaching models based on digital twins.

"Golden course" refers to the construction of courses with depth, difficulty, and challenge, and the evaluation standard is "high-level, innovative and challenging" [4]. As one of the five golden courses, the virtual simulation golden course has received

F. Yang (B) Department of Electronic Engineering, Huizhou Technician Institute, Huizhou 516000, China
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Gokhale and E. Grant (eds.), Proceedings of Asia Pacific Computer Systems Conference 2021, Lecture Notes in Electrical Engineering 978, https://doi.org/10.1007/978-981-19-7904-0_14


extensive attention from schools. It realizes the integration of computer technology and practical teaching, solving the problems of "cannot do it" and "cannot do it well" in practical training. It is of great significance for improving teaching quality and cultivating highly skilled electrical automation talents against the background of new engineering [5].

2 Current Situation of PLC Course Training Teaching

The PLC course is a core course of the electrical automation major. It is a comprehensive course combining relay control technology, computer technology, and communication technology. However, the traditional teaching mode is highly dependent on the training platform and suffers from the following problems:

2.1 Large Investment in Training Equipment

PLC training equipment is updated slowly and is costly, so it cannot keep up with the rapid upgrading of the intelligent manufacturing industry and the fast iteration of automation technology. Against the digital background, there is a gap between students' learning needs and employers' requirements. Students' learning is highly dependent on equipment, and there is a gap between school equipment and factory equipment [6].

2.2 Training Space and Time Are Limited

Because the training room has fixed space and class hours, the number of machines is insufficient, and students' knowledge and practical ability differ, students do not get enough training time in class. Non-standard and unskilled operation also easily damages equipment, which can harm students and make them fear difficulties [7].

2.3 Lack of Learning Initiative

Traditional practice teaching is a one-to-many mode carried out mainly through the teacher's explanation and demonstration. Most students remain at the stage of imitating the teacher, which shows up as poor learning initiative and a lack of exploratory spirit; they are prone to giving up when encountering problems [8, 9].


3 Construction and Exploration of the Digital Twin PLC Course

3.1 To Meet the Upgrading Requirements of the Manufacturing Industry

The interdisciplinary, integrated manufacturing industry requires more interdisciplinary talents, so it is worthwhile to build a training program that "makes up for the real with the virtual". The digital twin platform can host basic training projects and comprehensive design projects, guiding students to discover and analyze problems and focusing on cultivating their comprehensive problem-solving ability. Intelligent manufacturing needs more talents with self-learning and innovation ability. Through the digital twin platform, we can integrate cutting-edge technology and realize the integration of computer technology and practical training. Recently, a large amount of automation equipment such as mask production machines, drug production lines, and nucleic acid detection devices has been widely used; students can learn about such new equipment and technologies in time through the digital twin platform [10].

3.2 Online and Offline Teaching Mode

Digital twin technology supports 3D modeling through NX MCD software, and a complete working scene of automated equipment can be built online. Through participatory learning, students can understand what to do in the training, how to do it, and what to pay attention to. The training process is not limited to the training equipment: each student can design and debug PLC programs on the digital twin learning platform, effectively overcoming the fear of difficulty in training. Students with weak foundations, in particular, can study in advance before class, check for gaps in class, and practice repeatedly after class. The learning mode of "digital twin, combination of virtual and real" solves the problem of insufficient training equipment, provides sufficient learning opportunities for every student, and improves students' learning motivation.

3.3 Establish a Whole-Process Evaluation Mechanism

A data record of the whole process can be established through the digital twin virtual simulation platform, which helps teachers track students' learning, check their preparation and completion of training, ensure the scientific rigor and fairness of evaluation, and guide the implementation of teaching activities from "experience-based decisions" to "data-based decisions" [11].

Fig. 1 Software of digital platform

4 Design and Construction of the Digital Twin Platform

4.1 Platform Tool Selection

The digital twin teaching platform relies on TIA, PLCSIM Advanced, NX MCD, and other software. First, according to the real equipment structure, the mechanical motion and control parts are modeled in 3D using NX MCD, and the motion mechanisms are assembled to complete the construction of the platform. PLCSIM Advanced is a high-function simulator: by creating a virtual controller it can simulate an S7-1500 controller, provide the data interface for NX MCD mechatronics concept design, establish the connection between the TIA and NX MCD software, and realize the application of digital virtual debugging technology. The software of the digital platform is shown in Fig. 1.

4.2 NX MCD to Create 3D Scene

NX MCD is used to select and assemble mechanical parts and to design the physical characteristics and interfaces of the assembled parts. One can build a basic training module or a comprehensive training module, combining frequency converters, sensors, servo drives, and stepper control to build an automated production line. Figure 2 shows the simulation model of an automatic assembly line built with NX MCD. Mechanisms such as drive belts and turntables can be used to build different scenes; for example, adding a turntable, chute, and color sensor, and replacing the material-grabbing gripper with suction cups, yields the simulation model of the automatic sorting production line shown in Fig. 3.


Fig. 2 Simulation model of automatic assembly line

Fig. 3 Simulation model of automatic color sorting

4.3 Connection Between PLC Program and MCD

After the mechanical attribute setting and the electrical signal interface configuration and connection are completed in NX MCD, the next step is to program the PLC on the TIA platform and download the program to the PLCSIM Advanced simulated PLC. In the simulation environment, the signals of NX MCD are connected with the variables in the PLC.

Fig. 4 NX MCD virtual platform and virtual PLC variable mapping

As shown in Fig. 4, the mapping between PLC signals and MCD signals is realized, so that the digital twin virtual model can be controlled by the program. Programs debugged on the digital twin platform can be transplanted into the hardware environment to control real devices, thereby realizing safe and convenient debugging without physical cost. The seven-segment digital tube simulation control is shown in Fig. 5.
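Conceptually, the signal mapping is a lookup table that binds each MCD signal to one PLC variable. The sketch below illustrates this idea in Python with invented signal names; the real connection is configured through PLCSIM Advanced and the NX MCD signal mapping dialog, not through code like this:

```python
# Conceptual sketch of mapping virtual PLC variables to NX MCD
# signals. Signal names and values are invented for illustration;
# this is not the actual PLCSIM Advanced API.

# Each MCD signal is bound to exactly one PLC variable.
SIGNAL_MAP = {
    "MCD.ConveyorDrive.Speed": "PLC.QW64",   # PLC output: belt speed
    "MCD.ColorSensor.Value":   "PLC.IW68",   # PLC input: detected color
    "MCD.Gripper.Close":       "PLC.Q0_0",   # PLC output: suction on/off
}

def push_outputs(plc_state, mcd_state):
    """Copy the PLC output variables onto their mapped MCD signals."""
    for mcd_signal, plc_var in SIGNAL_MAP.items():
        if plc_var in plc_state:
            mcd_state[mcd_signal] = plc_state[plc_var]
    return mcd_state

plc_state = {"PLC.QW64": 120, "PLC.Q0_0": True}
mcd_state = push_outputs(plc_state, {})
# mcd_state now drives the virtual conveyor at speed 120 with the
# gripper closed; PLC.IW68 is an input and flows the other way.
```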

5 Summary

Virtual simulation based on the digital twin has become a new teaching mode. This paper explores the construction of a PLC virtual simulation platform, designs typical digital twin models, and adopts a whole-process quantitative data evaluation mode to make the teaching process visible and controllable. The platform can effectively expand learning time and space. It focuses on students' needs, integrates cutting-edge technology, keeps up with the development of the industry, breaks the barriers between courses, and provides methods and ideas for the construction of virtual simulation courses.


Fig. 5 Seven-segment digital tube simulation debugging

References

1. Grieves M, Vickers J (2017) Digital twin: mitigating unpredictable, undesirable emergent behavior in complex systems. In: Kahlen F-J, Flumerfelt S, Alves A (eds) Transdisciplinary perspectives on complex systems. Springer, Berlin, pp 85–113
2. Korth B, Schwede C, Zajac M (2018) Simulation-ready digital twin for real-time management of logistics systems. In: 2018 IEEE international conference on big data (Big Data), Seattle, WA, USA. IEEE, pp 4194–4201
3. Merkle L, Segura AS, Grummel JT et al (2019) Architecture of a digital twin for enabling digital services for battery systems. In: 2019 IEEE international conference on industrial cyber physical systems (ICPS), Taipei, Taiwan, China. IEEE, pp 155–160
4. Yong G, Guangyi Z (2021) Research on the application of intelligent manufacturing technology based on digital twin. Auto Control 7:189–207
5. Yan W (2018) Constructing China's "golden course". Chinese University Teaching 12:4–9
6. Fan Z, Li Z et al (2020) Research on mixed practice teaching model based on digital twin. Lab Res Expl 2(39):241–244
7. Li Y, Zhuoran Z, Li W, Shanshui Y, Jiadan W (2021) Exploration of virtual simulation experimental teaching mode of aircraft electrical power system. J Elect Electron Educ 1:135–138
8. Di W, Yi F (2020) Design and development of the virtual simulation for electromagnetic waves. J Elect Electron Educ 42:130–134
9. Yin H, Lisha M (2021) Research on innovative school-enterprise cooperation practice teaching modes based on digital twin technology. J High Eng Educ Res 4:105–110
10. Baiyan H, Youqiang Y et al (2020) Construction and practice of biological virtual simulation experiment teaching project based on the connotation of golden class. Chem Life 40(9):1612–1616
11. Shanshan Z, Haihui X, Kejun T, Ruiwu L (2020) Construction of a multi-mode virtual simulation experiment platform based on three-semester teaching reform. Educ Teach Forum 11:391–392