Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter 3031261348, 9783031261343

This edited book presents scientific results of the 24th ACIS International Winter Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD2022-Winter), held on December 7–9, 2022, in Taichung, Taiwan.


English Pages 164 [165] Year 2023


Table of contents :
Foreword
Contents
Contributors
Automatic Piano Accompaniment Generation Method by Drum Rhythm Features with Selectable Difficulty Level
1 Introduction
2 Related Works
3 Proposed Method
3.1 Overview
3.2 Music Information Extraction Function
3.3 Sound Source Separation Function
3.4 Accompaniment Pattern
3.5 Optimal Pattern Extraction Function
3.6 Chord and Accompaniment Pattern Synthesis Function
4 Preliminary Experiments
4.1 Rhythm Vectorization of Accompaniment Patterns
4.2 Experimental Results
4.3 Discussion
5 Conclusion
References
Using Mixed Reality Technology for Teaching a New Language: A Study from Teachers’ Perspectives
1 Introduction
2 Literature Review
2.1 Use of MR Technology for Teaching New Languages
2.2 Human-Organization-Technology (HOT) Model
2.3 Self-Determination Theory (SDT)
3 Research Model and Propositions
3.1 Human Factors
3.2 Organization Factors
3.3 Technology Factors
3.4 Relatedness
3.5 Autonomy
3.6 Engagement
3.7 Attitude
4 Proposed Methodology and Future Work
5 Conclusion
References
Reinforcement Learning and Action Space Shaping for a Humanoid Agent in a Highly Dynamic Environment
1 Introduction
2 Methodology
2.1 Simulation Environment Setup
2.2 Reinforcement Learning
2.3 Action Space Shaping—Control Modes
2.4 Reward Function
3 Results
3.1 Experimental Evaluation
3.2 Analysis of the Control Modes
4 Conclusion
References
Color-SIFT Features for Histopathological Image Analysis
1 Introduction
2 Review of Histological Image Analysis Systems
2.1 Pre-processing Practices
2.2 Image Processing
3 Review of SIFT
3.1 SIFT Detector
3.2 SIFT Descriptor
4 Methodology
4.1 Pre-processing
4.2 Features Extraction by Color-SIFT
4.3 Features Vectors Encoding
4.4 BoF Classification Using SVM
5 Tests and Evaluation
5.1 Experimental Setup
5.2 Experiments Data
5.3 Results
5.4 Discussion
6 Conclusion
References
Time-Series Multidimensional Dialogue Feature Visualization Method for Group Work
1 Introduction
2 Related Works
2.1 Research About Divergent Thinking and Convergent Thinking
2.2 Related Research on Meeting Analysis Systems
3 Proposed Method
3.1 Transcription Decoder
3.2 Transcript Video Sync
3.3 Extracting Metadata
3.4 Filter
3.5 Segmentation
3.6 Extracting User Data
3.7 Extracting Statistics
3.8 Meeting Analysis Interface
4 Visualization Examples
4.1 Usage Environment
4.2 Consideration of Use Results
4.3 Discussion
5 Conclusion
References
A Study on the Usage Prediction Model of Demand Response Resource Using Machine Learning
1 Introduction
2 Related Study
2.1 Predicting Power Demand Based on Time Series Analysis
2.2 Forecasting Power Demand Based on Regression Analysis
2.3 Prediction of Electricity Demand Based on Artificial Neural Network Analysis
3 Research Model
3.1 Research Model Summary
3.2 Analysis on the Characteristics of the Data
4 Experiment and Result
4.1 Experiment Design
4.2 Deep Learning Model Experiment Results
4.3 ARIMA Model Experiment Result
4.4 Comparative Verification of Prediction Model
5 Conclusion
References
A Study on AI Profiling Technology of Malicious Code Meta Information
1 Introduction
2 Related Research
2.1 Types and Analysis of Malicious Code Detection Technology
2.2 Malicious Code Static Analysis Detection Technology
2.3 Dynamic Malicious Code Analysis Detection Technology
2.4 Malicious Code Analysis and the Comparison of Detection Techniques
2.5 MITRE ATT&CK Framework
2.6 Machine Learning Convergence Technology for Intelligent Malicious Code Detection
3 Research Model
3.1 Outline
3.2 OP-CODE Extraction and Data Processing Technology
3.3 OP-CODE Extraction and Data Processing Technology
3.4 Data Classification Modeling
3.5 Malicious Code Profiling Technology
3.6 OP-CODE TTP Matching Technology
4 Experiment and Result
4.1 Experiment Setting and Method
4.2 Experiment Result
5 Conclusion
References
A Study on the Influence of Smart Factory’s Intention to Continue to Use on the Management Effect of Enterprises
1 Introduction
2 Theoretical Background
2.1 Definition and Components of Smart Factories
2.2 Modified Information System Success Model
2.3 Relationships Between Perceived Usefulness and Ease of Use, Intention to Continue to Use, and Net Impacts
3 Research Method
4 Analysis and Results
5 Conclusions
References
Protecting the Rights and Legitimate Interests of the Parties in B2B Non-personal Data Sharing Transactions: Experiences from the UK and EU and Lessons Learnt for Vietnam
1 Introduction
2 Features of B2BNPDST
2.1 The Pricing of B2BNPDST
2.2 B2BNPDST from an Investment Perspective
2.3 Multi-layered Digital Business Ecosystem (MLDBE)15
2.4 Promptness
3 Legal Issues Related to B2BNPDST
3.1 The Legal Approach
3.2 The Technological Approach
4 A Comparative Study of the Legal Approaches of the UK and EU in Protecting Legitimate Rights and Interests of the Parties in B2BNPDST
5 Experiences for Vietnam
6 Conclusion
References
Capacitive Pressure Sensor Based on Interdigitated Capacitor for Applications in Smart Textiles
1 Introduction
2 Theoretical Background
2.1 Principle of Operation
2.2 Materials
3 Research Method
3.1 Result and Discussion
4 Conclusions
References
Author Index


Studies in Computational Intelligence 1086

Roger Lee   Editor

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter

Studies in Computational Intelligence Volume 1086

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

Roger Lee Editor

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter

Editor Roger Lee Software Engineering and Information Technology Institute Central Michigan University Mount Pleasant, USA

ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-031-26134-3 ISBN 978-3-031-26135-0 (eBook) https://doi.org/10.1007/978-3-031-26135-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

The 24th ACIS International Winter Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD2022-Winter), held on December 7–9, 2022, in Taichung, Taiwan, aimed to bring together researchers and scientists, businessmen and entrepreneurs, teachers, and students to discuss the numerous fields of computer science and to share ideas and information in a meaningful way. The conference discussed a wide range of issues with significant implications, from artificial intelligence to communication systems and networks, embedded systems, data mining and big data, data-driven business models, and data privacy and security issues. The conference organizers selected the best 11 papers from those accepted for presentation at the conference in order to publish them ONLY in this volume (NOT in the conference proceedings). The papers were chosen based on review scores submitted by members of the program committee and underwent further rigorous rounds of review.

In the chapter “Automatic Piano Accompaniment Generation Method by Drum Rhythm Features with Selectable Difficulty Level”, Kazuma Komiya, Ryotaro Okada, Ayako Minematsu, and Takafumi Nakanishi present an automatic piano accompaniment generation method by drum rhythm features with selectable difficulty levels. The proposed system takes as input the URL of a piece of music uploaded on a video distribution site and outputs several accompaniment scores of different difficulty levels.

In the chapter “Using Mixed Reality Technology for Teaching a New Language: A Study from Teachers’ Perspectives”, Noura Tegoan, Srimannarayana Grandhi, Santoso Wibowo, and Robin Yang present a conceptual framework for assessing language teachers’ attitudes and decisions to adopt mixed reality (MR) technology in teaching a new language to high school students in Australia.

In the chapter “Reinforcement Learning and Action Space Shaping for a Humanoid Agent in a Highly Dynamic Environment”, Jyun-Ting Song, Guilherme Christmann, Jaesik Jeong, and Jacky Baltes tackle a balancing task in a highly dynamic environment, using a humanoid robot agent and a balancing board. They propose an RL algorithm structure based on the state-of-the-art Proximal Policy Optimization (PPO) using a GPU-based implementation; the agent achieves successful balancing in under 40 min of real time.


In the chapter “Color-SIFT Features for Histopathological Image Analysis”, Ghada Ouddai, Ines Hamdi, and Henda Ben Ghezala explore the use of a model based on Color-SIFT descriptors, Bag-of-Features (BoF), and Support Vector Machine (SVM) to analyze and classify tumoral histopathological tissues.

In the chapter “Time-Series Multidimensional Dialogue Feature Visualization Method for Group Work”, Rikito Ohnishi, Yuki Murakami, Takafumi Nakanish, Ryotaro Okada, Teru Ozawa, Kosuke Fukushima, Taichi Miyamae, Yutaka Ogasawara, Kei Akiyama, and Kazuhiro Ohashi present a time-series multidimensional dialogue feature visualization method for group work. The method takes group work recording data and live transcription data as input and performs time-series multidimensional dialogue feature visualization to show group work visualization results as output.

In the chapter “A Study on the Usage Prediction Model of Demand Response Resource Using Machine Learning”, Hyeonju Park, Chungku Han, Kilsang Yoo, and Gwangyong Gim present a study on a usage prediction model for demand response resources using machine learning. The study is notable for adopting a new approach of real-time prediction based on per-minute power consumption over a short period of 1–2 months, which differs from existing hourly or daily forecasts.

In the chapter “A Study on AI Profiling Technology of Malicious Code Meta Information”, Dongcheol Kim, Taeyeon Kim, Jinsool Kim, and Gwangyong Gim present a study on AI profiling technology for malicious code meta-information. They intend to develop an artificial intelligence-based malicious code detection technology that can detect maliciousness and attack techniques and automatically generate profiling information to trace attackers.

In the chapter “A Study on the Influence of Smart Factory’s Intention to Continue to Use on the Management Effect of Enterprises”, Seoung Jong Lee, Jong Woo Park, and Hee Jun Cho present a study on the factors affecting the intention to continue to use smart factories and their net impacts. The data from this study will contribute to studies on the establishment of smart factory strategies, or implementation plans for such strategies, conducted by companies or the government.

In the chapter “Protecting the Rights and Legitimate Interests of the Parties in B2B Non-personal Data Sharing Transactions: Experiences from the UK and EU and Lessons Learnt for Vietnam”, Phùng Thị Mỹ Dung presents a study on protecting the rights and legitimate interests of the parties in B2B non-personal data sharing transactions. The study draws experiences from a comparison of the UK and EU approaches and gives recommendations for the drafting of Vietnamese law on B2B non-personal data sharing.

In the chapter “Capacitive Pressure Sensor Based on Interdigitated Capacitor for Applications in Smart Textiles”, Tran Thuy Nga Truong and Jooyong Kim present a systematic approach to electro-textile pressure sensors based on interdigitated capacitors (IDCs) for applications surrounding intelligent wearable devices, robots, and e-skins.


The proposed method covers a broad range of highly sensitive pressure sensors based on IDCs, composite Ecoflex, and carbon nanotubes (CNTs). It is our sincere hope that this volume provides stimulation and inspiration, and that it will be used as a foundation for works to come.

December 2022

Roland Stenzel Dresden University of Applied Sciences Dresden, Germany Hsiung-Cheng Lin National Chin-Yi University of Technology Taichung, Taiwan

Contents

Automatic Piano Accompaniment Generation Method by Drum Rhythm Features with Selectable Difficulty Level
Kazuma Komiya, Ryotaro Okada, Ayako Minematsu, and Takafumi Nakanishi . . . 1

Using Mixed Reality Technology for Teaching a New Language: A Study from Teachers’ Perspectives
Noura Tegoan, Srimannarayana Grandhi, Santoso Wibowo, and Robin Yang . . . 17

Reinforcement Learning and Action Space Shaping for a Humanoid Agent in a Highly Dynamic Environment
Jyun-Ting Song, Guilherme Christmann, Jaesik Jeong, and Jacky Baltes . . . 29

Color-SIFT Features for Histopathological Image Analysis
Ghada Ouddai, Ines Hamdi, and Henda Ben Ghezala . . . 43

Time-Series Multidimensional Dialogue Feature Visualization Method for Group Work
Rikito Ohnishi, Yuki Murakami, Takafumi Nakanish, Ryotaro Okada, Teru Ozawa, Kosuke Fukushima, Taichi Miyamae, Yutaka Ogasawara, Kei Akiyama, and Kazuhiro Ohashi . . . 59

A Study on the Usage Prediction Model of Demand Response Resource Using Machine Learning
Hyeonju Park, Chungku Han, Kilsang Yoo, and Gwangyong Gim . . . 77

A Study on AI Profiling Technology of Malicious Code Meta Information
Dongcheol Kim, Taeyeon Kim, Jinsool Kim, and Gwangyong Gim . . . 91

A Study on the Influence of Smart Factory’s Intention to Continue to Use on the Management Effect of Enterprises
Seoung Jong Lee, Jong woo Park, and Hee Jun Cho . . . 107

Protecting the Rights and Legitimate Interests of the Parties in B2B Non-personal Data Sharing Transactions: Experiences from the UK and EU and Lessons Learnt for Vietnam
Phùng Thị Mỹ Dung . . . 125

Capacitive Pressure Sensor Based on Interdigitated Capacitor for Applications in Smart Textiles
Tran Thuy Nga Truong and Jooyong Kim . . . 139

Author Index . . . 155

Contributors

Kei Akiyama ITOKI Corporation, Tokyo, Japan
Jacky Baltes Department of Electrical Engineering, National Taiwan Normal University (NTNU), Taipei, 10610, Taiwan
Henda Ben Ghezala RIADI Laboratory, National School of Computer Science (ENSI), University of La Manouba, La Manouba, Tunisia
Hee Jun Cho Soongsil University, Seoul, South Korea
Guilherme Christmann Department of Electrical Engineering, National Taiwan Normal University (NTNU), Taipei, 10610, Taiwan
Phùng Thị Mỹ Dung University of the West of England, Ho Chi Minh City, Vietnam
Kosuke Fukushima ITOKI Corporation, Tokyo, Japan
Gwangyong Gim Department of IT Policy and Management, Soongsil University, Seoul, South Korea
Srimannarayana Grandhi Central Queensland University, Melbourne, VIC, Australia
Ines Hamdi RIADI Laboratory, National School of Computer Science (ENSI), University of La Manouba, La Manouba, Tunisia
Chungku Han Department of IT Policy and Management, Soongsil University, Seoul, South Korea
Jaesik Jeong Department of Electrical Engineering, National Taiwan Normal University (NTNU), Taipei, 10610, Taiwan
Dongcheol Kim Soongsil University, Seoul, South Korea
Jinsool Kim Soongsil University, Seoul, South Korea
Jooyong Kim Soongsil University, Seoul, South Korea
Taeyeon Kim Soongsil University, Seoul, South Korea
Kazuma Komiya Department of Data Science, Musashino University, Koto, Tokyo, Japan
Seoung Jong Lee Soongsil University, Seoul, South Korea
Ayako Minematsu Department of Data Science, Musashino University, Koto, Tokyo, Japan
Taichi Miyamae ITOKI Corporation, Tokyo, Japan
Yuki Murakami Department of Data Science, Musashino University, Tokyo, Japan
Takafumi Nakanish Department of Data Science, Musashino University, Tokyo, Japan
Takafumi Nakanishi Department of Data Science, Musashino University, Koto, Tokyo, Japan
Yutaka Ogasawara ITOKI Corporation, Tokyo, Japan
Kazuhiro Ohashi ITOKI Corporation, Tokyo, Japan
Rikito Ohnishi Department of Data Science, Musashino University, Tokyo, Japan
Ryotaro Okada Department of Data Science, Musashino University, Koto, Tokyo, Japan
Ghada Ouddai RIADI Laboratory, National School of Computer Science (ENSI), University of La Manouba, La Manouba, Tunisia
Teru Ozawa ITOKI Corporation, Tokyo, Japan
Hyeonju Park Department of IT Policy and Management, Soongsil University, Seoul, South Korea
Jong woo Park Soongsil University, Seoul, South Korea
Jyun-Ting Song Department of Electrical Engineering, National Taiwan Normal University (NTNU), Taipei, 10610, Taiwan
Noura Tegoan Central Queensland University, Sydney, NSW, Australia
Tran Thuy Nga Truong Soongsil University, Seoul, South Korea
Santoso Wibowo Central Queensland University, Melbourne, VIC, Australia
Robin Yang Kaplan Business School, Adelaide, SA, Australia
Kilsang Yoo Department of IT Policy and Management, Soongsil University, Seoul, South Korea

Automatic Piano Accompaniment Generation Method by Drum Rhythm Features with Selectable Difficulty Level Kazuma Komiya, Ryotaro Okada, Ayako Minematsu, and Takafumi Nakanishi

Abstract This paper presents an automatic piano accompaniment generation method based on drum rhythm features with selectable difficulty levels. In general, arranging a piece of music so that one can play it on the piano alone requires an exceptionally high level of knowledge, particularly for the left-hand accompaniment part. This paper presents a new system that takes as input the URL of a piece of music uploaded on a video distribution site and outputs several accompaniment scores of different difficulty levels. The system obtains the drum sounds and chord progressions of a piece of music from the URL input by the user and generates an accompaniment score. The system focuses on the rhythm of the music: it reproduces the piece’s rhythm by selecting, from several accompaniment patterns prepared in advance, the pattern whose rhythm is closest to that of the music. In addition, it generates scores at multiple difficulty levels by designing accompaniment patterns per level, making it possible to suggest scores that match the performer’s skill. With this system, piano players will have more opportunities to select and play music from a vast and diverse range based on their preferences.

Keywords Piano · Automatic accompaniment generation · Rhythm · Drum · Similarity · Difficulty level

K. Komiya · R. Okada · A. Minematsu · T. Nakanishi (B) Department of Data Science, Musashino University, Koto, Tokyo, Japan e-mail: [email protected] K. Komiya e-mail: [email protected] R. Okada e-mail: [email protected] A. Minematsu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/ Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_2

1

2

K. Komiya et al.

1 Introduction

With the spread of creating digital music on a PC, a vast and diverse range of musical compositions has come to exist on the Internet. In addition, there are many requests to play these musical pieces on musical instruments. Among musical instruments, many users especially enjoy keyboard instruments such as the piano. The piano has held one of the top two positions for three consecutive years in Yahoo Japan’s musical instrument search ranking [1]. In general, piano players need sheet music when playing a piece of music. For piano players to enjoy playing more, each player must be able to quickly obtain sheet music that matches their skill level for each piece of music. When selecting sheet music, piano players mainly choose scores sold and distributed on the Internet or at music stores that match their skills. However, when a player selects a piece to play from the vast amount of music available on the Internet, they may be unable to find a score that matches their skill level. In such cases, a player with a high level of musical knowledge can listen to the music, arrange it so that they can play it themselves, and perform it. However, beginners with limited musical background have difficulty making a playable arrangement independently. Performers find it particularly difficult to arrange the accompaniment part played with the left hand, as it requires a high level of ability and knowledge. In this study, we first focus on the accompaniment part, which is considered particularly difficult to arrange by oneself, to realize the automatic generation of piano-arranged scores for any piece of music and difficulty level. Accompaniment is also an important element for giving expression to a piece of music. It is said that music without accompaniment is very monotonous, whereas with accompaniment the theme of a piece is fresher and more easily conveyed [2]. For this accompaniment, we consider a method for automatically generating scores of multiple difficulty levels for an arbitrary piece of music. This paper presents an automatic piano accompaniment generation method by drum rhythm features with selectable difficulty levels. The method takes the URL of an arbitrary piece of music uploaded to a video distribution site such as YouTube and outputs an accompaniment score to be played by the left hand alongside the melody played by the right hand. This system increases opportunities for piano players to select and play music from a vast and diverse range based on their preferences. This paper is organized as follows: Sect. 2 presents related research; Sect. 3 describes the automatic piano accompaniment generation method by drum rhythm features with selectable difficulty levels; Sect. 4 generates accompaniment scores from the configured system, and Sect. 5 summarizes this paper.

Automatic Piano Accompaniment Generation Method by Drum …

3

2 Related Works

Takamori et al. [3] proposed a method that takes a music score as input and automatically generates a piano-arranged score. In that study, a database of accompaniments is created from existing scores, and scores are generated by selecting accompaniments that are similar in thickness and rhythm to the input score. However, this method requires a music score as input, making its use for general pop music difficult. In another study, Takamori et al. [4] proposed a method for the automatic generation of piano reduction scores for pop music. A music recording is used as input, and the melody, chords, and rhythm are extracted from it to generate a score for piano performance. In that study [4], the score is generated such that the accompaniment patterns are similar within sections such as verse, bridge, and chorus, taking into account the connections between measures. Percival et al. [5] proposed a method for automatically generating a string quartet score for pop music. As in Takamori et al.’s work [4], the score is generated from the audio of the music. A probability model is prepared by extracting the features of melody, chord, bass, and beat from the music using songle [9] and by analyzing a corpus of classical music; the probability model and extracted features are then used to generate a score for the quartet. Nakamura and Sagayama [6] proposed a method for automatic piano reduction using an ensemble score of a piece as input. They use a probabilistic model of piano fingering to quantify the performance difficulty of the score, and a merged-output hidden Markov model [7] to generate the score for both hands. Ariga et al. [8] propose a method for the automatic generation of guitar solo covers using a music recording as input. The method uses MIR technology to handle beats, melodies, and chords, and allows the user to specify the performance difficulty; based on the results of a preliminary study, the difficulty level of the generated score is controlled by focusing on performance fingering. In contrast to these studies, our study takes the URL of any song uploaded on a video distribution site and generates accompaniment scores of various difficulty levels, focusing on the drum sound of the input song. This allows us to propose accompaniment scores with rhythms appropriate to the input audio that match the user’s performance skills.

4

K. Komiya et al.

3 Proposed Method

This section presents our automatic piano accompaniment generation method by drum rhythm features with selectable difficulty levels.

3.1 Overview

Figure 1 shows the overview of this system. This system consists of a music information extraction function, a sound source separation function, an optimal pattern extraction function, and a chord and accompaniment pattern synthesis function. In this system, the music information extraction function first obtains the drum sound source, chord progression, beat position, and chorus position from the URL input by the user. Next, the sound source separation function generates each chord section, and the optimal pattern extraction function assigns multiple accompaniment patterns appropriate for the degree of difficulty to each chord section. Finally, the chord and accompaniment pattern synthesis function combines the assigned accompaniment patterns and chord progressions to generate accompaniment scores for multiple difficulty levels. In this system, an accompaniment score combines chord progressions and accompaniment patterns. An accompaniment pattern defines how the notes that make up a chord are arranged. The system obtains the chord progression using songle’s API [9] and suggests appropriate accompaniment patterns by focusing on the song’s rhythm. Specifically, the system extracts the rhythm from the song’s drum part, compares the extracted rhythm with the rhythms of multiple accompaniment patterns prepared in advance, and suggests an accompaniment pattern with the most similar rhythm. The system also delimits the accompaniment pattern at the timing of chord changes. In this paper, each section separated by the timing of the chord change is called a chord section.

3.2 Music Information Extraction Function

A sound source separation tool called spleeter [10] is used to acquire the drum sound source of a song. After obtaining the original sound source of a piece from a URL using a library, etc., only the drum part is extracted using spleeter. Songle is a service that allows you to enjoy music while visually obtaining information such as chord progressions, beat positions, and choruses of songs. To use songle, a song must be uploaded to a video distribution site such as YouTube and registered on songle. The songle API allows you to retrieve information about a song analyzed by songle. The song information parsed by songle is tied to the URL of the original video.

Fig. 1 Overview of this system

Fig. 2 Examples of accompaniment patterns handled by this system. This accompaniment pattern is expressed in C major and shows how the notes that make up C major are arranged

Since the songle API can be used to obtain various information on songs registered on songle, this system uses it to obtain song information.
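As a concrete illustration, the drum-extraction step might look like the following Python sketch. This assumes the audio has already been downloaded from the video URL to a local file; the file and directory names are placeholders, not values from the paper.

```python
from spleeter.separator import Separator

# Load spleeter's pretrained 4-stem model (vocals/drums/bass/other).
separator = Separator('spleeter:4stems')

# Writes output/song/{vocals,drums,bass,other}.wav; only drums.wav
# is used by the subsequent rhythm-analysis steps.
separator.separate_to_file('song.mp3', 'output/')
```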

3.3 Sound Source Separation Function

The chord progressions obtained in the previous section are used for sound source splitting. At this time, the length of each chord section is expressed in units of beats and tied to the chord section information using the beat positions obtained in the previous section. The obtained chord progression also includes information on where each chord occurs in the music, which is used to split the drum sound source at the timing of chord changes.
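A minimal sketch of this splitting step, assuming the chord sections have already been converted from the Songle API response into (start time, end time, chord) tuples in seconds; the variable names and example values are illustrative.

```python
import soundfile as sf

# Drum stem produced by the separation step above.
audio, sr = sf.read('output/song/drums.wav')

# Hypothetical chord sections: (start_sec, end_sec, chord_label).
sections = [(0.0, 1.85, 'C'), (1.85, 3.70, 'Am'), (3.70, 5.55, 'F')]

# One drum clip per chord section, cut at chord-change timings.
chord_clips = [(chord, audio[int(start * sr):int(end * sr)])
               for start, end, chord in sections]
```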

3.4 Accompaniment Pattern

For this paper, an accompaniment pattern is defined as “a pattern that defines the arrangement of the notes that make up a chord.” An example of an accompaniment pattern is shown in Fig. 2. This accompaniment pattern has a length of two beats and is expressed in C major. If we distinguish the three notes C, E, and G that make up C major as “the first note,” “the second note,” and “the third note,” in that order from the bottom, this accompaniment pattern can be expressed as “0.5 beats for the first note, 0.5 beats for the third note, and 1 beat for the first note one octave higher.” This generalization enables the generation of scores with arbitrary chords and accompaniment patterns. The database stores the accompaniment patterns generalized as described above, organized by difficulty level and pattern length.

3.5 Optimal Pattern Extraction Function

In this method, the accompaniment pattern is determined mainly by the rhythm of the music, and the optimal pattern is the accompaniment pattern with the closest rhythm to that of the music. In this function, the optimal pattern is assigned to each drum sound source that each chord section has. The degree of difficulty is adjusted by narrowing down the candidate accompaniment patterns to a specific difficulty level. The optimal pattern for each drum sound source is the accompaniment pattern whose rhythm is most similar when the rhythm of the drum sound source is compared with the rhythms of all the candidate accompaniment patterns. To compare rhythms, the rhythm of the drum sound source and the rhythm of each accompaniment pattern are converted into vectors of the same dimension, and the cosine similarity is used. The number of dimensions of the vector is defined per beat and varies according to the beat length of the chord section. Given two vectors p and q, the cosine similarity is obtained by the following equation:

$$\cos(p, q) = \frac{p \cdot q}{\|p\| \, \|q\|} \qquad (1)$$

We define each of the two vectors in Sects. 3.5.1 and 3.5.2.
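A small Python sketch of this comparison, using numpy; the library choice is an assumption, as the paper does not specify the implementation:

```python
import numpy as np

def cosine_similarity(p, q):
    """Cosine similarity between two rhythm vectors (Eq. 1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return float(p @ q) / denom if denom else 0.0

def most_similar_pattern(drum_vec, pattern_vecs):
    """Index of the accompaniment pattern whose rhythm vector is
    closest to the drum rhythm vector."""
    return int(np.argmax([cosine_similarity(drum_vec, v)
                          for v in pattern_vecs]))
```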

3.5.1 Vectorization of Accompaniment Pattern Rhythms

First, a vector is generated by filling in 1 at the moment a note is played and 0 for the rest of the time. For example, the accompaniment pattern in Fig. 2 is vectorized with 2 as the number of dimensions per beat, resulting in [1, 1, 1, 0]. Next, the 1’s are varied according to the height of the note sounding at that time: the higher the note, the smaller the value. This operation is based on the hypothesis that lower tones have a more significant effect on the rhythm. In this method, the values are varied according to the following equation:

$$f(x) = a^{x/12} \qquad (2)$$

Here, the variable x is a natural number representing the note’s height in semitone steps, such as a MIDI note number. The constant a is a real number greater than 0 and less than 1, representing the rate at which the value decays when a note is one octave higher. In this way, the vector value is multiplied by a each time the sound rises an octave, producing a vector whose values decrease as the notes rise. In addition, this method increases the attenuation rate when the chord section falls on the chorus. This operation emphasizes the low notes more in the chorus section and makes it sound grander. For example, the accompaniment pattern shown in Fig. 2 is vectorized with 2 dimensions per beat. If the attenuation constant a is 0.5 and x is the distance in semitones from the first C note, we get [1, 0.66, 0.5, 0].
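The following sketch reproduces this vectorization for the Fig. 2 pattern; the (position, semitone offset) encoding of note onsets is an assumption made for illustration.

```python
def vectorize_pattern(onsets, n_beats, dims_per_beat, a):
    """Rhythm vector of an accompaniment pattern (Eq. 2).

    onsets: (position, x) pairs, where position indexes the vector
    element at which a note starts and x is the note's height in
    semitones above the lowest chord tone.
    a: decay per octave, with 0 < a < 1.
    """
    vec = [0.0] * (n_beats * dims_per_beat)
    for pos, x in onsets:
        vec[pos] = a ** (x / 12)  # lower notes get larger weights
    return vec

# Fig. 2 pattern (C, G, then C an octave up) at 2 dims per beat:
print(vectorize_pattern([(0, 0), (1, 7), (2, 12)], 2, 2, 0.5))
# -> [1.0, 0.667..., 0.5, 0.0], i.e. the paper's [1, 0.66, 0.5, 0]
```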

8

K. Komiya et al.

Fig. 3 An example of vectorizing the drum sound source of a certain chord section. The drum sound source shown above is vectorized as shown below. The horizontal axis of the upper figure is the number of samples and the vertical axis is amplitude (sampling rate: 22,050). In the lower figure, the drum sound source is divided equally by the number of dimensions of the vector, and the root-mean-square of each section is shown. Horizontal axis: number of elements, vertical axis: root-mean-square values

3.5.2 Vectorization of Drum Sound Source Rhythms

The root-mean-square is used to vectorize the drum source rhythm. Figure 3 shows an example of vectorization. The drum sound source is divided equally by the number of dimensions, and a vector is generated by finding the root-mean-square in each section.
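A corresponding sketch for the drum side, again with numpy assumed:

```python
import numpy as np

def vectorize_drums(signal, n_dims):
    """Rhythm vector of a drum clip: split the samples into n_dims
    equal segments and take the root-mean-square of each."""
    segments = np.array_split(np.asarray(signal, dtype=float), n_dims)
    return np.array([np.sqrt(np.mean(seg ** 2)) for seg in segments])

# For a chord section of b beats with 12 dimensions per beat
# (as in Sect. 4.1), n_dims would be 12 * b.
```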

3.6 Chord and Accompaniment Pattern Synthesis Function

This section describes the function for synthesizing the chord progression obtained in Sect. 3.2 and the accompaniment patterns selected in Sect. 3.5. As described in Sect. 3.4, an accompaniment pattern distinguishes the notes that make up a chord as 1, 2, and 3 from bottom to top and indicates how they are arranged. The accompaniment score is generated by mapping the chord of the corresponding chord section onto that accompaniment pattern. For chords with four or more component notes, such as seventh chords, this method uses only the bottom three notes.
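A minimal sketch of this mapping, with chord tones given as MIDI note numbers; the pattern encoding (degree, octave shift, duration) is an illustrative assumption rather than the paper's internal representation:

```python
def realize_pattern(pattern, chord_tones):
    """Map a generalized accompaniment pattern onto a concrete chord.

    pattern: list of steps, each (degree, octave_shift, duration_beats)
    or None for a rest; degree 0/1/2 indexes the chord tones from the
    bottom. Chords with four or more tones use only the bottom three.
    """
    tones = sorted(chord_tones)[:3]
    return [None if step is None
            else (tones[step[0]] + 12 * step[1], step[2])
            for step in pattern]

# The Fig. 2 pattern on C major (C4=60, E4=64, G4=67): 0.5 beats on
# the 1st note, 0.5 on the 3rd, 1 beat on the 1st note an octave up.
print(realize_pattern([(0, 0, 0.5), (2, 0, 0.5), (0, 1, 1.0)],
                      [60, 64, 67]))
# -> [(60, 0.5), (67, 0.5), (72, 1.0)]
```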


4 Preliminary Experiments

In this section, we describe a preliminary experiment of this method, in which we implement the method described in Sect. 3 and generate accompaniment scores for several pieces of music. The generated scores are observed, and the degree of rhythmic reproduction is examined and discussed.

4.1 Rhythm Vectorization of Accompaniment Patterns

For the decay rate introduced in Sect. 3.5.1, this experiment uses 1/2 for chorus sections and 2/3 elsewhere. The number of dimensions per beat is 12, the least common multiple of 4 and 3, to accommodate both sixteenth notes and triplets. The pieces used are Charles/Balloon, Into the Night/YOASOBI, and Zenzenzense/RADWIMPS. For the database of accompaniment patterns, the author prepared his own patterns, arbitrarily divided them into three difficulty levels, and stored them. Hereafter, the difficulty levels are distinguished as Level 1, Level 2, and Level 3. In addition, accompaniment patterns of four different lengths, from one to four beats, were prepared. The accompaniment patterns used in this experiment are shown in Figs. 4, 5, 6, and 7, notated with each pattern contained within a measure. The length of a chord section is measured in beats, and accompaniment patterns with a matching beat length are used as candidates. When the length of a chord section exceeds 4 beats, the longest prepared length, it is handled by combining the prepared lengths. For example, an 8-beat chord section is covered by connecting two 4-beat accompaniment patterns.

Fig. 4 The accompaniment patterns used in the experiment (Levels 1–3). Beat length: 1

Fig. 5 The accompaniment patterns used in the experiment (Levels 1–3). Beat length: 2

Fig. 6 The accompaniment patterns used in the experiment (Levels 1–3). Beat length: 3

4.2 Experimental Results

As an example of an entire generated score, parts of the Charles scores generated by this system are shown in the figures: Level 1 in Fig. 8, Level 2 in Fig. 9, and Level 3 in Fig. 10. In each of these scores, the chorus part of Charles is extracted, and a single-note melody is manually added. Next, a drum sound source and the generated score are shown to confirm the reproducibility of the rhythm. The last chord section of the chorus of the first verse of “Zenzenzense” was adopted as a part that well represents the reproducibility of the rhythm and is shown in Fig. 11. The audio waveform shows that this is a chord section with loud sounds on the first and third beats, and the accompaniment pattern adopted is also characteristic of this section.

Fig. 7 The accompaniment patterns used in the experiment (Levels 1–3). Beat length: 4

The chord section at the beginning of the chorus of “Into the Night” was also adopted as a part that well represents the reproduction of the rhythm and is shown in Fig. 12. In this chord section, an accompaniment pattern with a rest on the first beat was chosen, and the same was true of the surrounding chord sections.


Fig. 8 Level 1 score of Charles’ chorus section generated by this system

4.3 Discussion

Consider the beginning of the chorus of “Into the Night” shown in Fig. 12. Given that the chorus of “Into the Night” has a common 8-beat rhythm, an accompaniment pattern in which the first beat is a rest is inappropriate. One reason a pattern whose first beat is not a rest was not selected is that the amplitude at the beginning of the chord section was small when the drum source was divided. This is thought to be because the beat positions obtained using the songle API did not match the drums in the music. Figure 13 shows a vectorized version of the drum sound source shown in Fig. 12. The first element is close to zero, which prevented a large cosine similarity with the rhythm vectors of accompaniment patterns whose first beat is not a rest. One solution to this problem is, when vectorizing the rhythm of an accompaniment pattern, to raise not only the element at the moment a note sounds but also its neighboring elements. In this way, even if there is a discrepancy between the beat positions acquired by the songle API and the rhythm of the drum sound source, the cosine similarity is not affected significantly.
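One possible realization of that fix, sketched below; the spread width and weight are illustrative parameters, not values from the paper:

```python
import numpy as np

def widen_onsets(vec, spread=1, weight=0.5):
    """Raise the elements around each onset so that small
    beat-alignment errors do not zero out the cosine similarity."""
    vec = np.asarray(vec, dtype=float)
    out = vec.copy()
    for i in np.nonzero(vec)[0]:
        for d in range(1, spread + 1):
            if i - d >= 0:
                out[i - d] = max(out[i - d], weight * vec[i])
            if i + d < len(vec):
                out[i + d] = max(out[i + d], weight * vec[i])
    return out
```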


Fig. 9 Level 2 score of Charles’ chorus section generated by this system

5 Conclusion

This paper presented a piano accompaniment generation system with selectable difficulty based on the drum part of a piece of music. This study devised a system that outputs an appropriate accompaniment score for a piece of music by taking the URL of the music on the Internet as input and selecting, from accompaniment patterns prepared in advance, the pattern whose rhythm is closest to that of the music. By generating accompaniment scores for different difficulty levels, we aimed to propose accompaniment scores that match the skill level of various piano players. Observation of the generated scores showed that the rhythmic characteristics of the drum part were successfully reproduced. At the same time, there were situations in which an inappropriate accompaniment pattern was selected due to a discrepancy with the beat positions. We will resolve the issues raised in Sect. 4.3 and extract rhythmic features from sources other than drums to deal with situations where no drum part is present.


Fig. 10 Level 3 score of Charles’ chorus section generated by this system

Fig. 11 The drum sound source for the last chord section of the chorus of “Zenzenzense” and the optimal pattern assigned to it. The upper figure shows the drum sound source, with the number of samples on the horizontal axis and amplitude on the vertical axis (sampling rate: 22,050). The lower figure is the accompaniment pattern


Fig. 12 Drum sound source for the first chord section of the chorus of “Into the Night” and the optimal pattern assigned to it. The upper figure shows the drum sound source, with the number of samples on the horizontal axis and amplitude on the vertical axis (sampling rate: 22,050). The lower figure is the accompaniment pattern

Fig. 13 Vector generated from the drum source shown in the upper side of Fig. 12. Horizontal axis: number of elements, vertical axis: root-mean-square values

When our method is applied in a system that quickly provides accompaniment scores matching any piece of music and the player’s skill, many piano players, including beginners, will enjoy playing the piano more.


References 1. “Musical Instrument” Search Ranking No. 1 in 2021 is “Piano” - Topics - Yahoo Japan Corporation (in Japanese), https://about.yahoo.co.jp/topics/20210603.html 2. Y. Mo, Designing an automatic piano accompaniment system using artificial intelligence and sound pattern database. Mob. Inf. Syst. 2022 (2022) 3. H. Takamori, H. Sato, T. Nakatsuka, S. Morishima, Automatic arranging musical score for piano using important musical elements, in Proceedings of the 14th Sound and Music Computing Conference 2017 (2017), pp. 35–41 4. H. Takamori, T. Nakatsuka, S. Fukayama, M. Goto, S. Morishima, Audio-based automatic generation of a piano reduction score by considering the musical structure, in International Conference on Multimedia Modeling (2019), pp. 169–181 5. G. Percival, S. Fukayama, M. Goto, Song2Quartet: a system for generating string quartet cover songs from polyphonic audio of popular music. Int. Soc. Music Inf. Retr. 114–120 (2015) 6. E. Nakamura, S. Sagayama, Automatic piano reduction from ensemble scores based on mergedoutput hidden Markov model, in Proceedings of the 41st International Computer Music Conference (2015), pp. 298–305 7. E. Nakamura, N. Ono, Y. Saito, S. Sagayama, Merged-output hidden Markov model for score following of MIDI performance with ornaments, desynchronized voices, repeats and skips. 1185–1192 (2014) 8. S. Ariga, S. Fukayama, M. Goto, Song2Guitar: a difficulty-aware arrangement system for generating guitar solo covers from polyphonic audio of popular music. ISMIR 568–574 (2017) 9. Songle API, https://api.songle.jp/ 10. R. Hennequin, A. Khlif, F. Voituret, M. Moussallam, Spleeter: a fast and efficient music source separation tool with pre-trained models. J. Open Source Softw. 2154 (2020)

Using Mixed Reality Technology for Teaching a New Language: A Study from Teachers’ Perspectives Noura Tegoan, Srimannarayana Grandhi, Santoso Wibowo, and Robin Yang

Abstract This paper presents a conceptual framework for assessing language teachers’ attitudes and decisions to adopt mixed reality (MR) technology in teaching a new language to high-school students in Australia. A research model grounded in the Human-Organization-Technology (HOT) model and Self-Determination Theory (SDT) is proposed. This research study will involve an online questionnaire survey for data collection. The outcomes of this study will support schools, teachers and students in gaining insights into the use of MR technology for supporting learning and teaching. On the theoretical side, this study will focus on the implications of adopting MR technology and the identification of the main factors affecting teachers’ decision to adopt MR technology. On the practical side, this study will provide insight for teachers, school principals and education providers to develop appropriate strategies for addressing key concerns and making sound decisions in adopting MR technology for effective learning and teaching.

Keywords New language · Human-organization-technology · Mixed reality technology · Self-determination theory · Teachers

N. Tegoan (B) Central Queensland University, Sydney, NSW, Australia e-mail: [email protected] S. Grandhi · S. Wibowo Central Queensland University, Melbourne, VIC, Australia e-mail: [email protected] S. Wibowo e-mail: [email protected] R. Yang Kaplan Business School, Adelaide, SA, Australia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/ Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_3


1 Introduction

The number of people learning new languages is growing due to the high demand for multilingual staff in various sectors, especially tourism and education. This has contributed to an unparalleled growth in the number of students learning new languages in public and private schools. For example, the Arabic language is the official language of 22 Arab countries and ranks as the fifth most spoken language worldwide. Due to its wide use, several schools around the world teach this language [1]. Unarguably, there are numerous challenges associated with teaching a new language to students. Al-Busaidi et al. [2] investigated the issues in teaching new languages and reported that students are generally unable to grasp new languages well. Prior studies [3, 4] attempted to understand the poor motivation for learning a new language and its low acceptance. These studies point out that insufficient teaching resources contributed to poor delivery of the curriculum. Another study by Mohd et al. [1] noticed that even though students are willing to learn a new language, most of them found it difficult to stay engaged. To address the challenges of delivering the content and keeping language students engaged and motivated, this study investigates the use of mixed reality (MR) technology to (a) inspire students’ motivation in learning a new language, and (b) alleviate any challenges and difficulties these students may encounter in the learning process. MR technology refers to any portion of the reality-virtuality continuum containing both real and virtual objects: from augmented reality, where virtual objects are displayed in the real world, to augmented virtuality, where real objects are displayed in the virtual world [4]. Within the MR environment, the user is able to ‘see’ the virtual/digital overlay or object, and physically or mentally interact with and/or manipulate it. For example, students can go on a virtual trip to another country and try to interact with the locals. MR technology is becoming increasingly popular and has been used in the education sector in several countries, including Canada and China [5]. Mohd et al. [1] argue that the use of MR technology to support education has grown remarkably because of its flexibility, availability and effectiveness. Proper adoption of technology in the learning and teaching space improves teaching quality, motivates students, and increases their fluency and engagement in learning languages [6]. This study is motivated by an important research gap that we observed. Despite the potential benefits of using MR technology in learning and teaching, there is limited research on the use of MR to support teaching new languages, particularly in the high-school environment in Australia. Thus, this study aims to: (a) identify the major challenges in adopting MR technology faced by language teachers, and (b) provide recommendations to alleviate these challenges. The outcomes of this study are expected to push this research area a step forward by contributing new and insightful knowledge on how MR technology can be used to improve the effectiveness of teaching new languages in Australian high schools. This research attempts to answer the following two research questions.


RQ1. What are the major challenges faced by language teachers in Australian high schools?

RQ2. How can MR technology provide proper teaching support in overcoming the challenges associated with teaching new languages?

This research work is organized into five sections. Section 2 reviews the existing research on MR technology and the relevant technology adoption models. Section 3 presents the proposed research model and propositions. Section 4 describes the proposed methodology and future work. Section 5 concludes the paper with expected contributions to the research.

2 Literature Review

2.1 Use of MR Technology for Teaching New Languages

In the last few years, MR technology has been considered a potential technology to support the learning and teaching process [5, 23]. Some research works have been conducted on the benefits of using MR technology in language teaching in various educational systems [4, 5, 7]. Lew et al. [5] state that MR technology can be efficiently used for teaching new languages and supporting students’ motivation and self-efficacy. A study by Zainuddin and Perera [7] found that students became more engaged in learning a language when MR technology was used in the classroom. Bonner and Reinders [4] observed that teachers generally showed positive attitudes towards using MR technology in educational institutes. They suggest that teachers’ attitudes have changed and that teachers become facilitators, counsellors and decision-makers after MR technology adoption. The new role of teachers in the classroom is not only to disseminate new information and knowledge but also to teach learners how to acquire and evaluate data electronically. Further, Sahrir and Yusri [3] found that students showed more interest, engagement, comprehension, development, success and positive improvement when using MR technology to learn new languages. In general, several studies [2] highlighted the positive impact of technology in teaching new languages. In particular, students’ analysis, comprehension and dialogue skills improved significantly when technology was used for teaching new languages. Mohd et al. [1] reported a similar finding. Their study on the use of technology for teaching new languages to European high school students highlighted the benefits of using virtual learning technology and its role in motivating students to continue learning new languages. Over the years, several studies [5, 8] investigated the role of MR technology in improving authenticity, personalization and collaboration in teaching. The majority of these studies focused on understanding how MR technology can be used to motivate higher education students to learn new languages [22]. For example, Cheng et al. [8] investigated the role of MR technology in improving teaching quality, supporting


students’ motivation, and improving students’ concentration and problem-solving skills. Al-Busaidi et al. [2] argue that the use of immersive educational games with pictures and sounds in learning a new language helps to increase students’ engagement, generate higher learning outcomes, and support effective learning.

2.2 Human-Organization-Technology (HOT) Model

The HOT model has been widely used to study the intention and acceptance of adopting innovative technologies [9–11]. The model distinguishes human, organizational and technological factors as impacting factors in technology adoption in organizations [11]. The technological factor explains adoption in terms of the technology’s functionality and reliability as well as its perceived usefulness. The HOT model focuses on social and behavioral contexts to clarify the connection between technology development and an organizational setting impacted by the surrounding environment [9]. The HOT model was built to assess the quality of a system through three important contexts: human, organization, and technology [10]. In this research, the HOT model is used to identify the issues relating to the adoption of MR technology for teaching new languages in Australian high schools from the teachers’ perspectives. The HOT model is useful for presenting the main contributing factors that influence the adoption of MR technology in teaching a new language.

2.3 Self-Determination Theory (SDT)

SDT [12] is a commonly applied theory for examining human motivational behavior and has been effective in explaining motivational dynamics. SDT postulates that the satisfaction of psychological needs (competence, autonomy and relatedness) determines the underlying motivational mechanism that energizes individuals to pursue an activity and thus directs people’s behavior [12]. Following recent research in computer-mediated environments, researchers have applied SDT to explore the use of virtual environments and found that the satisfaction of psychological needs leads to sustained engagement with virtual contexts, including how the fulfilment of technology needs can facilitate motivational behavior in those activities [13]. SDT focuses on the variables of relatedness, autonomy, engagement, motivation and competence [14]. Several studies have used SDT to explain motivated behavior and behavioral intentions towards Web-based technology in areas such as marketing [14] and education [15].


3 Research Model and Propositions

Motivation is a critical aspect of technology adoption decisions. It is about enabling individuals to take the necessary action to achieve their goals. Motivation studies are often associated with identifying organizational goals, linking them to employees’ goals, and enabling employees to perform specific tasks. Over the years, scholars [16, 17] have thoroughly investigated the role of motivation in accepting new technologies and using them to achieve organizational goals. These studies point out that individuals are motivated either by internal factors such as interest and cultural values or by external factors such as pay and organizational requirements. Self-determination theory (SDT) highlights the importance of motivation quality: it suggests that motivation itself is not sufficient for achieving goals unless the quality aspect is addressed [18]. SDT helps explain the interplay between intrinsic and extrinsic factors and distinguishes between internal and external motivation [19]. SDT presents three key variables: competence, autonomy and relatedness. Competence refers to an individual’s ability to deal with the environment effectively; the knowledge and skills gained through experience help achieve competence. Autonomy refers to an individual’s control over their life; most importantly, people should believe that they are the masters of their own behavior and are willing to accept responsibility for the subsequent outcome. Relatedness refers to people connecting themselves to others [18]. Other theories besides SDT include the TAM, TPB, TOE and HOT models. Of these, the HOT model is commonly used to explain IT adoption decisions in organizational contexts [20]. Yusof et al. [21] proposed the HOT-Fit model to demonstrate the relationship between organization, technology and human factors. They elaborated that the human factors (self-efficacy, user experience and user satisfaction) help examine users’ experience of, perspectives on, and satisfaction towards new technology. The organization factors (technology awareness, top management support and technology readiness) help assess the organizational context in terms of preparedness to adopt new technology. The technology factors (functionality, reliability and perceived usefulness) outline the perceived benefits of new technology and to what extent users can rely on it [22]. Due to its ability to predict individuals’ behavioral intentions, the theory has been used widely in the education context [23]. Interestingly, some technology adoption studies have combined SDT with the Technology Acceptance Model (TAM) to predict technology adoption intention [16, 17]. The use of technology in education is not new; educational institutions have adopted new technology to support learning and teaching activities. Prior studies highlight the role of various factors such as attitude, beliefs, competence, autonomy and relatedness in technology adoption decisions [17]. However, there is limited evidence to pinpoint the role of emotional constructs.



Fig. 1 Proposed research model

to pinpoint the role of emotional constructs. Therefore, this study attempts to integrate human, organization and technology factors with the factors of SDT to determine the intention to adopt mixed reality technology for teaching new languages. This study investigates the impacts of using MR technology in teaching new languages in Australian high schools. The HOT model and SDT will be used to build the theoretical model and to identify the issues relating to the adoption of MR technology for teaching new languages in Australian high schools from teachers' perspectives. The research model combines the HOT and SDT models: the HOT model is used to study the human, organizational and technological factors in technology adoption, while SDT is used to understand the behavioral intentions and motivational mechanisms for applying MR technology. Figure 1 shows the proposed research model.

3.1 Human Factors

Human factors are among the factors that most significantly impact learners when new technologies are adopted for teaching. Human factors have three dimensions: computer self-efficacy, user experience and user satisfaction [1]. Computer self-efficacy refers to the user's capability of using computers to complete a task and their understanding of computer skills [24]. Ismail et al. [25] found that language teachers



positively employ more modern technologies in classrooms if they have higher degrees of computer self-efficacy. User experience is the degree of positive or negative emotion experienced by a user in a specific context during and after product usage; that experience will affect the motivation for further use [24]. Cheng et al. [8] state that using technology to achieve a high level of computer experience in the education system can improve instructors' teaching behaviors, attitudes, performance, confidence and skills. User satisfaction refers to the extent to which users feel or think positively about using e-learning programs and to how well those programs meet users' requirements [6]. Mohd et al. [1] found that the more satisfied users are with MR technology, the stronger their support for using it will be. Al-Busaidi et al. [2] noticed that students who spend more time using the Internet for their studies are more likely to be satisfied with their technology experience; however, if students are dissatisfied with the technology, they are more inclined to enroll in another study program at a different institution. Therefore, the following propositions are proposed:

H1: Computer self-efficacy positively influences the decision to adopt MR technology for learning and teaching new languages.
H2: User experience positively influences the adoption of MR technology for learning and teaching new languages.
H3: User satisfaction positively influences the adoption of MR technology for learning and teaching new languages.

3.2 Organization Factors

Organizational factors have a significant impact on technology adoption; for example, they can influence successful collaboration and engagement in using computers and adopting other technologies. Three important organizational factors should be considered in technology adoption: technology awareness, top management support and technology readiness [26]. Technology awareness reflects the employer's knowledge about the features, advantages and benefits of technology adoption in the workplace [25]. Albirini [27] pointed out that technical knowledge is an essential requirement for improving and developing teachers' attitudes and awareness, and that teachers' knowledge of the cultural non-neutrality of information technology may have a substantial influence on their attitudes and teaching approach. Neupane et al. [26] defined top management support as the participation and involvement of top decision-makers in performing a set of specialized roles, such as interpersonal, informational and decisional roles. Top management support positively affects system quality and system function, and it is an essential element in the success of technology adoption and systems development projects [28]. Technology readiness is defined as a multidimensional construct that concentrates on the positive and negative impacts of technology adoption in the workplace. Neupane



et al. [26] reported a positive relationship between technology readiness and technology adoption in a general context. Al-Busaidi et al. [2] investigated the impact of demographics on technology readiness among primary-school teachers and found no major differences across age groups and subject teachers. This leads to the following propositions:

H4: Technology awareness positively influences the adoption of MR technology for learning and teaching new languages.
H5: Top management support positively influences the adoption of MR technology for learning and teaching new languages.
H6: Technology readiness positively influences the adoption of MR technology for learning and teaching new languages.

3.3 Technology Factors

Technology factors comprise functionality, reliability and perceived usefulness, which together indicate the capability of a technology to deal with a specific task [27]. Functionality refers to the capability of a particular technology to deliver the characteristics and functions required for a certain task, operating consistently, genuinely and appropriately as anticipated [26]. More specifically, Mohd et al. [1] observed that using MR technology to support learning and teaching has a positive impact on students' motivation and passion towards language study. A similar finding was reported by Sahrir and Yusri [3]: using online games has a positive impact on students' motivation, encouragement, engagement and vocabulary mastery of a new language. Reliability refers to the likelihood that a system will achieve its intended aim and function for a certain period under a given set of circumstances [26]. Cheng et al. [8] argued that computers and other technological devices have become essential parts of the educational world; nowadays, many information activities involve converting data to a digital format to increase the capacity, speed and reliability of information dissemination. Perceived usefulness is the degree to which a person thinks that using a certain method can support and advance their work [10]. Sahrir and Yusri [3] found that students showed more interest, engagement, comprehension and success in learning a new language when using digital tools. Hence, the following propositions are presented:

H7: Functionality positively influences the adoption of MR technology for learning and teaching new languages.
H8: Reliability positively influences the adoption of MR technology for learning and teaching new languages.
H9: Perceived usefulness positively influences the adoption of MR technology for learning and teaching new languages.



3.4 Relatedness

Relatedness is the need for acceptance by, belonging to or association with significant others and one's essential community [14]. According to Zainuddin and Perera [7], relatedness is connected to social networking that enables learners to interact and connect with their teachers and peers in or after class. For example, students can learn new things through flipped classrooms, via which they can regularly exchange knowledge with their teachers and peers. Mollaei and Riasati [28] reported that psychological factors have a positive effect on students' behavioral intentions in virtual environments and argued that an immersive virtual system offers greater opportunities for people to develop a sense of relatedness, which subsequently predicts behavioral intentions. Thus, the following proposition is developed:

H10a: Relatedness positively influences students' and teachers' attitudes towards using MR technology for learning and teaching a new language.

3.5 Autonomy

Autonomy has two perspectives: learner and teacher. Learner autonomy is an attitude towards learning in which the learner is prepared to take responsibility or make decisions on their own depending on requirements, benefits and values [14]. Teacher autonomy, on the other hand, refers to teachers' flexibility, freedom, ability and desire to adjust their teaching strategies and processes [1]. Huang et al. [14] argued that teacher autonomy should be managed within a framework of responsibility, knowledge, awareness, engagement, participation, collaboration and dealing with challenges. Al-Busaidi et al. [2] found a positive correlation between teachers' attitudes and the adoption of technology in teaching. We therefore formulate the following proposition:

H10b: Teachers' autonomy positively influences students' and teachers' attitudes towards using MR technology for learning and teaching a new language.

3.6 Engagement

Mollaei and Riasati [28] defined engagement as the quality and quantity of psychological and physical interest in an educational environment. Sahrir and Yusri [3] found that online gaming applications can support and improve students' attitudes, engagement and vocabulary learning. Cheng et al. [8] reported a similar outcome when virtual reality games and 3D video games were used to help students learn the Japanese language. These findings suggest that digital technology contributes to teaching a new language as a second language [29, 30]. Thus, the following proposition is developed:



H10c: Engagement positively influences students' and teachers' attitudes towards using MR technology for learning and teaching a new language.

3.7 Attitude

Bonner and Reinders [4] stated that attitude is a learned, global evaluation of an object (a person, place or issue) that impacts ideas and actions. Albirini [27] pointed out that teachers' awareness of the cultural non-neutrality of technology adoption may have a substantial influence on their attitudes and approaches. Ismail et al. [25] reported a positive relationship between teachers' attitudes and technology adoption. Tegoan et al. [29] argued that there are many benefits to using technology to motivate students and alter their attitudes; for example, when a classroom becomes technologically equipped, students become more capable of making decisions on their own and being responsible and independent. Mollaei and Riasati [28] further argued that the new role of teachers after implementing technology in a classroom is not restricted to transmitting new information and knowledge, but extends to teaching students how to acquire data and value from textbooks, software applications and Internet programs. In view of the above, we formulate the following proposition:

H11: Attitude positively moderates the relationship between human/organizational/technological factors and the decision to adopt MR technology for learning and teaching a new language.

4 Proposed Methodology and Future Work

The goal of this study is to investigate the factors influencing the adoption of MR technology for teaching new languages in Australian high schools. To achieve this, a research model built on the HOT and SDT models is developed. A quantitative study involving an online questionnaire survey administered via Qualtrics will be used to gather data for testing the formulated propositions. Emails will be sent to high-school teachers in Australia, and we aim to collect data from at least 387 respondents. The online survey questionnaire will contain three sections. The first section will ask the respondents for basic demographic data such as gender, education level, teaching experience and age. The second section will ask the respondents to state the extent to which they agree or disagree with each survey question. The third section will ask the respondents about their experiences with using MR technology and their general evaluations. A 5-point Likert scale ranging from (1) Strongly Disagree to (5) Strongly Agree will be used to measure each survey item. The statistical analysis tool IBM SPSS will be used for preliminary tests such as reliability tests and exploratory factor analysis, and the structural equation modelling technique will be used to test the relationships among the constructs presented in the research model.
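As a concrete illustration of the planned reliability test, the short sketch below computes Cronbach's alpha for one multi-item construct in plain Python. It is a minimal stand-in for the SPSS procedure; the respondent count, item count and function name are illustrative assumptions rather than part of the study design.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal-consistency reliability of one construct.
    Rows are respondents; columns are that construct's Likert items."""
    k = items.shape[1]
    item_variance_sum = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variance_sum / total_variance)

# Example: 387 respondents answering a hypothetical 3-item construct
# on the 5-point scale (1 = Strongly Disagree ... 5 = Strongly Agree).
responses = np.random.default_rng(0).integers(1, 6, size=(387, 3))
print(f"Cronbach's alpha: {cronbach_alpha(responses):.3f}")
```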



5 Conclusion

There has been limited research investigating the adoption of MR technology for teaching new languages, particularly for high-school students in an Australian context; this research study will therefore fill an important research gap. The outcomes of this research work are anticipated to have both theoretical and practical implications. On the theoretical side, this study will contribute to the IS body of knowledge on technology adoption and learning technology by identifying the factors influencing MR technology adoption. On the practical side, this study provides insights for teachers, school principals and education providers to develop appropriate strategies for addressing key concerns and making sound decisions when adopting MR technology for effective learning and teaching of new languages.

References

1. K. Mohd, A. Adnan, A. Yusof, M. Ahmad, M. Kamal, Teaching Arabic language to Malaysian university students using education technologies based on Education 4.0 principles, in Proceedings of the International Invention, Innovative & Creative Conference (2019), pp. 38–51
2. F. Al-Busaidi, A. Al Hashmi, A. Al Musawi, A. Kazem, Teachers' perceptions of the effectiveness of using Arabic language teaching software in Omani basic education. Int. J. Educ. Dev. Technol. 12, 139–157 (2016)
3. M. Sahrir, G. Yusri, Online vocabulary games for teaching and learning Arabic. Gema Online J. Lang. Stud. 12, 961–977 (2012)
4. E. Bonner, H. Reinders, Augmented and virtual reality in the language classroom: practical ideas. Teach. Engl. Technol. 18(3), 33–53 (2018)
5. S. Lew, T. Gul, J. Pecore, ESOL pre-service teachers' culturally and linguistically responsive teaching in mixed-reality. Inf. Learn. Sci. 122, 45–67 (2021)
6. K. Burden, M. Kearney, Investigating and critiquing teacher educators' mobile learning practices. Interact. Technol. Smart Educ. 14, 110–125 (2017)
7. Z. Zainuddin, C. Perera, Exploring students' competence, autonomy and relatedness in the flipped classroom pedagogical model. J. High. Educ. 43, 115–126 (2017)
8. A. Cheng, L. Yang, E. Andersen, Teaching language and culture with a virtual reality game, in Proceedings of the Conference on Human Factors in Computing Systems (2017), pp. 541–549
9. M. Hossain, M. Quaddus, The adoption and continued usage intention of RFID: an integrated framework. Inf. Technol. People 24, 236–256 (2011)
10. J. Lian, D. Yen, Y.T. Wang, An exploratory study to understand the critical factors affecting the decision to adopt cloud computing in Taiwan hospital. Int. J. Inf. Manag. 34, 28–36 (2014)
11. L. Tornatzky, M. Fleischer, The processes of technology innovation (Lexington Books, Lexington, 1990)
12. E. Deci, R. Ryan, Intrinsic motivation and self-determination in human behavior (Plenum Press, New York, 1985)
13. A. Przybylski, C. Rigby, R. Ryan, A motivational model of video game engagement. Rev. Gen. Psychol. 14, 154–166 (2010)
14. Y. Huang, K. Backman, S. Backman, L. Chang, Exploring the implications of virtual reality technology in tourism marketing: an integrated research framework. Int. J. Tour. Res. 18, 116–128 (2015)
15. J. Roca, M. Gagné, Understanding e-learning continuance intention in the workplace: a self-determination theory perspective. Comput. Hum. Behav. 24, 1585–1604 (2008)



16. S. Fathi, T. Okada, Technology acceptance model in technology-enhanced OCLL contexts: a self-determination theory approach. Australas. J. Educ. Technol. 34, 138–154 (2018)
17. F. Sahin, Y.L. Sahin, Drivers of technology adoption during the COVID-19 pandemic: the motivational role of psychological needs and emotions for pre-service teachers. Soc. Psychol. Educ. 25, 567–592 (2022)
18. R.M. Ryan, E.L. Deci, Self-determination theory: basic psychological needs in motivation, development, and wellness (Guilford Publications, New York, 2017)
19. N.R. Salikhova, M.F. Lynch, A.B. Salikhova, Psychological aspects of digital learning: a self-determination theory perspective. Contemp. Educ. Technol. 12, 1–13 (2020)
20. J. Xu, W. Lu, Developing a human-organization-technology fit model for information technology adoption in organizations. Technol. Soc. 70, 102010 (2022)
21. M.M. Yusof, R.J. Paul, L.K. Stergioulas, Towards a framework for health information system evaluation, in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (2006)
22. S. Tri Purnama, H. Zulfadli, T. Wen Via, P. Astri Ayu, Human-organization-technology (HOT) analysis on the primary care application users. Revista 41, 1–11 (2020)
23. A.A. Larionova, N.A. Zaitseva, Y.F. Anoshina, L.V. Gaidarenko, V.M. Ostroukhov, The modern paradigm of transforming the vocational education system. Astra Salvensis 6, 436–448 (2018)
24. D. Compeau, C. Higgins, Computer self-efficacy: development of a measure and initial test. MIS Q. 19, 189–211 (1995)
25. S. Ismail, A. Almekhlafi, M. Al-Mekhlafy, Teachers' perceptions of the use of technology in teaching languages in United Arab Emirates' schools. Int. J. Res. Educ. 27, 21–28 (2010)
26. C. Neupane, S. Wibowo, S. Grandhi, M. Hossain, A trust-based smart city adoption model for the Australian regional cities: a conceptual framework, in Proceedings of the Australasian Conference on Information Systems, Perth, Australia (2019)
27. A. Albirini, An exploration of the factors associated with the attitudes of high school EFL teachers in Syria toward information and communication technology. Ph.D. dissertation, The Ohio State University, 2004
28. F. Mollaei, M. Riasati, Teachers' perceptions of using technology in teaching EFL. Int. J. Appl. Linguist. Engl. Lit. 2, 13–22 (2013)
29. N. Tegoan, S. Wibowo, S. Grandhi, R. Yang, Extended reality technology adoption for learning new languages: an integrated research framework, in PACIS Proceedings, 303 (2022)
30. N. Tegoan, S. Wibowo, S. Grandhi, Application of the extended reality technology for teaching new languages: a systematic review. Appl. Sci. 11(23), 11360 (2021)

Reinforcement Learning and Action Space Shaping for a Humanoid Agent in a Highly Dynamic Environment

Jyun-Ting Song, Guilherme Christmann, Jaesik Jeong, and Jacky Baltes

Abstract Reinforcement Learning (RL) is a powerful tool and has been increasingly used in continuous control tasks such as locomotion and balancing in robotics. In this paper, we tackle a balancing task in a highly dynamic environment, using a humanoid robot agent and a balancing board. This task requires complex continuous actuation for the agent to stay in a balanced state. We propose an RL training structure based on the state-of-the-art Proximal Policy Optimization (PPO) algorithm with a GPU-based implementation; the agent achieves successful balancing in under 40 min of real time. We examine the impact of action space shaping on sample efficiency using 6 distinct control modes. Our constrained parallel control modes outperform the naive baseline in both sample efficiency and variance with respect to the starting seed. The best-performing control mode, a parallel configuration including the lower body and shoulder roll joints (named PLS-R), is 33% more sample efficient than the next best mode, indicating the impact of action space shaping on the sample efficiency of our approach. Our implementation is open-source and freely available at: https://github.com/NTNU-ERC/Robinion-Balance-Board-PPO.

Keywords Reinforcement learning · Humanoid robot system · Body balancing

J.-T. Song · G. Christmann · J. Jeong · J. Baltes (B)
Department of Electrical Engineering, National Taiwan Normal University (NTNU), Taipei 10610, Taiwan
e-mail: [email protected]
J.-T. Song
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_4




1 Introduction

The use of reinforcement learning (RL) in continuous control tasks has been gaining traction [1]. Recent works have begun favoring RL methods to enable real and simulated agents to solve challenging tasks [2–4]. Where more traditional control system approaches were previously preferred, modern deep learning algorithms now let agents learn emergent behaviors and solve dynamic tasks from scratch [5]. In the field of robotics specifically, this has pushed many researchers to tackle age-old problems such as locomotion and manipulation under the lens of deep reinforcement learning [6].

In RL, one of the most impactful design decisions is the choice of the action space [7]. The action space refers to the way the actions inferred by the policy are applied to the agent in the environment. Action spaces may differ in properties such as the number of actions, i.e., the dimensionality of the space, as well as in how the action values are executed by the agent. Moreover, action spaces can be completely discrete [8], completely continuous [9], or a mix of discrete and continuous actions [10]. They can also differ in normalization ranges or other transformations. The idea of transforming the action space to enable more efficient learning is known in the literature as action space shaping [7].

The growing use of RL is also observed in the sub-field of humanoid robotics. For a long time, balancing and locomotion problems were solved using traditional control methods. For example, the work of [11] generated trajectories using a linear inverted pendulum model, and the work of [12] developed biped walking based on the zero moment point (ZMP). In recent works, it is more common to see deep RL applied to the same or similar problems. For example, [13] investigated the use of deep RL for human-like walking behavior and explored action space shaping with two control methods: torque-based and muscle-based. Other works, such as [14–16], show that with careful training and design, RL policies for locomotion are robust to a large range of environments and disturbances; they also demonstrate that policies trained completely in simulation can be feasibly deployed to the real-world counterpart of the agent.

Although RL for locomotion is well-established in the field, this seems less true for explicit balancing tasks. We wish to highlight the difference between balancing as part of locomotion and balancing in a highly dynamic environment, such as on top of a balancing board. In the first case, there are states in which the agent is more or less stable, for example, during the stance phase of commonly used locomotion cycles [17, 18]. In the second case, a highly dynamic environment means that constant actuation is necessary to remain in a balanced state, and failing to do so will inevitably cause a collapse. For control in highly dynamic environments, there are works such as [19], which developed a momentum-based controller that is robust to disturbances and used a balance board in one of its experiments. In [20], the authors designed a sensor filter and a predictive control algorithm and managed to balance a real humanoid robot on a balance board for a few seconds. These research



studies focused on employing traditional control techniques. In [21], the authors proposed a deep Q-learning algorithm for a similar control problem, using only a classic deep reinforcement learning technique [22].

In this paper, we investigate the effects of action space shaping on training efficiency in a highly dynamic environment under a state-of-the-art RL algorithm. We ran several experiments using PPO [23] to train an agent based on a humanoid robot [24] in a balancing task. The goal of the agent is to stay on top of a balancing board for as long as possible. As a humanoid, the agent has as many as 23 degrees of freedom (DoF) capable of actuation. We designed 6 distinct action spaces, called control modes, that shape the execution of actions in different manners. Each control mode differs in the number of outputted actions and constrains the propagation of the actions to the joints. The freest control mode (FA) has 23 actions and controls each joint independently. On the other hand, the most constrained control mode (PL), where the agent can only control its legs, outputs only 6 actions, which are mapped to the joints using a scheme that ensures the feet remain parallel to the ground. Our results show that the action space has a significant impact on the sample efficiency of training. In the best case, our implementation using Isaac Gym [25] trains an agent capable of balancing for the whole duration of an episode in under 40 min on a single GPU. We make our implementation open-source and freely available for all at: https://github.com/NTNU-ERC/Robinion-Balance-Board-PPO.

2 Methodology

We aim to show how the choice of action space affects the sample efficiency and final performance of RL. We designed a balancing problem to be solved by our agent in a simulation environment. The goal of our agent is to remain stable on a balance board while preventing itself or the board from touching the ground. In the following sections, we describe the complete setup of our experiments, including the simulator and the RL algorithm. We provide the details of our observation features, actions, and reward function design. For action space shaping, we define and describe in detail 6 distinct control modes; for each control mode, we train a new agent using 5 different seeds. Finally, for improved policy stability and generalization, we also provide the parameters used for domain randomization.

2.1 Simulation Environment Setup

The agent used in this study is the humanoid robot Robinion2 [24], which has 23 degrees of freedom (DoF). The goal of the agent is to stay balanced on top of a balance board. The balance board is composed of two separate rigid-body objects, a board and a roller, which are modeled as a cuboid and a cylinder,



respectively. We used 3D CAD software to design the models of the agent and the balance board, compute their physical properties, such as mass and inertia matrices, and import them into the simulation environment for training. The specifications of the agent and balance board are shown in Table 1. We designed our simulation environment using Isaac Gym, a physics simulation environment for RL research developed by NVIDIA with support for large-scale parallel training. With Isaac Gym, we can create a physics simulation scene directly on the GPU, and an API provides access to read and write the states of multiple environments simultaneously. In all of our experiments, we created 4096 parallel environments, each containing an instance of the humanoid agent and the balance board. Figure 1 shows a screenshot of the training scene with many parallel environments in Isaac Gym. The GPU used for our experiments was an RTX 2080 Super with 8 GB of VRAM.

Table 1 Specifications of the humanoid agent, the board, and the roller

Specification    | Agent               | Board          | Roller
Dimensions (mm)  | 300 × 112 × 920     | 730 × 280 × 37 | 113 × 113 × 430
Weight (kg)      | 7.5                 | 2.3            | 1.5
DoFs             | 23                  | –              | –
Actuators        | XH540, XM540, XM430 | –              | –

Fig. 1 Training environment in Isaac Gym. Our training loop simultaneously reads and writes the states of 4096 parallel environments



2.2 Reinforcement Learning

The goal of reinforcement learning is to optimize a policy π that maximizes the expected return J(π),

$$J(\pi) = \mathbb{E}_{\tau \sim p(\tau \mid \pi)}\left[\sum_{t=0}^{T} \gamma^{t} r_{t}\right] \qquad (1)$$

where $p(\tau \mid \pi)$ denotes the likelihood of a trajectory $\tau$ under the given policy $\pi$ and $\sum_{t=0}^{T} \gamma^{t} r_{t}$ is the total reward collected in one trajectory, with $r_t$ being the reward collected at time $t$, $T$ denoting the length of each episode, and $\gamma \in [0, 1]$ denoting the discount factor that determines the weight of future rewards. We employ Proximal Policy Optimization (PPO) [23] to train our policies because it effectively optimizes Eq. (1) and has demonstrated remarkable success in several continuous control tasks [14, 26]. Our network consists of three hidden layers with 256, 128, and 64 neurons, respectively, each followed by an ELU activation function [27]. Our policy architecture is presented in Fig. 2, and Table 2 shows the hyper-parameter settings used for training with PPO.

At each timestep, the agent receives the latest observations from the environment. The observations are the inputs of the policy network, which models each output action as an independent normal distribution. The observations include information about the current state of the agent, such as the current position and velocity of each joint, the agent's current pose, its angular velocity, and the position of its feet relative to its center-of-mass (CoM). Additionally, to facilitate the convergence of training, we include extra information such as the actions taken at the previous timestep, the angle of the balance board and the position of the roller, the distance to the ideal position of the feet on the board's surface, and a projection of the robot torso's upwards vector, which indicates how vertical the agent currently is. Table 3 presents the complete list of observations and their dimensionality. For the actions, we realize action space shaping with 6 different control modes, presented in detail in Sect. 2.3.
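To make the architecture concrete, the sketch below shows a minimal PyTorch version of the actor network described above (three ELU hidden layers of 256, 128 and 64 units, with each action modeled as an independent normal distribution). The state-independent log-standard-deviation parameterization is an assumption borrowed from common PPO implementations; the chapter does not specify how the variance is produced.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Actor: maps observations to the mean of one Normal per action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, 64), nn.ELU(),
        )
        self.mean = nn.Linear(64, act_dim)
        # State-independent log std (an assumption, see lead-in).
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mu = self.mean(self.backbone(obs))
        return torch.distributions.Normal(mu, self.log_std.exp())

# Example: the PLS-R control mode has 60 observations and 8 actions
# (Table 5); one batch row per parallel environment.
policy = GaussianPolicy(obs_dim=60, act_dim=8)
actions = policy(torch.randn(4096, 60)).sample()
```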

Fig. 2 Network Architecture. The network consists of three fully connected hidden layers of size 256, 128 and 64, respectively. Each hidden layer is followed by an ELU activation function

Table 2 Hyper-parameter settings for training of PPO

Parameter                              | Value
Discount factor                        | 0.99
Learning rate                          | 3 × 10⁻⁵
PPO clip threshold                     | 0.2
PPO batch size                         | 32768
PPO epochs                             | 2250
Kullback-Leibler divergence threshold  | 0.008
Entropy coefficient                    | 0
λ for generalized advantage estimation | 0.95



Table 3 Complete list of observations. The dimensionality (*) of the position and velocity of controlled joints and of the previous actions varies from 6 to 23 depending on the control mode applied to the agent. The board angle is computed relative to the ground. The upwards projection is the orthogonal projection of the robot's orientation onto the z-axis

Feature                                        | Dimensionality
Position of controlled joints                  | [6–23]*
Velocity of controlled joints                  | [6–23]*
Previous actions                               | [6–23]*
Agent's position                               | 3
Agent's angular velocity                       | 3
Agent's orientation                            | 3
Left and right foot position (relative to CoM) | 6
Distance of left foot to ideal position        | 1
Upwards projection of robot torso              | 1
Board position                                 | 3
Board angle                                    | 1
Roller position                                | 3

To improve the robustness and generalization properties of the policy, as well as to introduce some variance into the training loop, we employ domain randomization [28]. During training, we randomly sample a diverse set of physics parameters for each environment. Table 4 presents the parameters that are randomized as well as the ones that are kept fixed. Optimizing the policy on trajectories from a diverse set of physics parameters forces the agent to learn a behavior that performs well across all of them.
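A minimal sketch of this sampling step is given below, following the ranges and distributions of Table 4. How exactly the additive action-noise range is applied is not stated in the text, so reading it as a Gaussian standard deviation is an assumption, as are the helper names.

```python
import numpy as np

rng = np.random.default_rng()

def sample_physics_params() -> dict:
    """One draw of randomized physics parameters per environment (Table 4)."""
    return {
        "mass_scale":      rng.uniform(0.85, 1.15),   # scaling noise
        "stiffness_scale": rng.uniform(0.75, 1.25),   # scaling noise
        "damping_scale":   rng.uniform(0.9, 1.1),     # scaling noise
        "init_pos_offset": rng.uniform(-0.03, 0.03),  # additive noise
        "init_vel_offset": rng.uniform(-0.1, 0.1),    # additive noise
        "gravity": 9.81, "static_friction": 1.0, "dynamic_friction": 1.0,
    }

def perturb_actions(actions: np.ndarray) -> np.ndarray:
    # Additive Gaussian noise on the actions; treating the Table 4 range
    # [0.0, 0.003] as the noise standard deviation is an assumption.
    return actions + rng.normal(0.0, 0.003, size=actions.shape)
```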

2.3 Action Space Shaping—Control Modes

We train our agent with six different control modes to show how action space shaping, i.e., the choice of the control scheme, can affect sample efficiency and final performance. In our experiments, we have two primary schemes: free and parallel. In free mode, the agent can control its joints freely, without any constraints. In parallel mode (the robot's parallel configuration), the agent's feet are constrained to remain parallel to the ground. Under this control mode, we can reduce the action-space dimension and facilitate the deployment of inverse kinematics. This is possible because of the agent's mechanical configuration: the thighs and calves have the same length.



Table 4 Parameter ranges for domain randomization. Additive noises are added to the original value, while scaling noises are multiplied with the original value

Parameter            | Range         | Distribution | Type
Actions              | [0.0, 0.003]  | Gaussian     | Additive
Mass                 | [0.85, 1.15]  | Uniform      | Scaling
Stiffness of DoF     | [0.75, 1.25]  | Uniform      | Scaling
Damping of DoF       | [0.9, 1.1]    | Uniform      | Scaling
Initial DoF position | [–0.03, 0.03] | Uniform      | Additive
Initial DoF velocity | [–0.1, 0.1]   | Uniform      | Additive
Gravity              | 9.81          | Constant     | –
Static friction      | 1.0           | Constant     | –
Dynamic friction     | 1.0           | Constant     | –

Fig. 3 Side view of the agent’s lower body mechanics while moving in the parallel type of control modes. The 3 horizontal lines show that the knee angles will always be twice the angle of the hip and ankle joints

Thus, we can ensure the feet remain parallel to the ground by rotating the ankle and hip joints by the same amount while simultaneously rotating the knee joints by twice the target angle when moving the legs' pitch joints. Figure 3 shows the agent's lower body mechanics for the parallel type of control modes; a minimal code sketch of this mapping follows below. When designing our experiments and choosing the action space, we had two main questions: 1. How much impact does action space shaping have on the sample efficiency of training? 2. How critical is a humanoid's upper body for solving balancing problems? To answer both of these questions, we carefully designed 6 control modes. In the freest mode, the agent can control all of the joints independently, whereas in the most constrained mode, it can only control 6 joints. Some control modes allow control of only the lower body joints, while others allow varying degrees of upper body control.
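The sketch below illustrates the parallel-mode propagation rule for one leg: the knee target is twice the hip/ankle pitch target, which keeps the foot parallel to the ground given the equal thigh and calf lengths. Joint names and sign conventions are illustrative assumptions, not taken from the robot's actual configuration files.

```python
def parallel_leg_targets(pitch: float) -> dict:
    """Propagate one pitch action to hip, knee and ankle targets so the
    foot stays parallel to the ground (cf. Fig. 3)."""
    return {
        "hip_pitch": pitch,
        "knee_pitch": 2.0 * pitch,  # knee angle is twice the hip/ankle angle
        "ankle_pitch": pitch,
    }
```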



The 6 control modes and their detailed descriptions are presented below:

• FA—Free Mode + All Joints: The agent can control all of its joints without any limitations. Since there are no constraints in this mode, this is considered the baseline control mode. The total number of actions in this mode is 23.
• FL—Free Mode + Lower Body: The agent can control any of its lower body's joints independently. The upper body's joints are set to a fixed pose and not controlled. The total number of actions in this mode is 10.
• PA—Parallel Mode (robot configuration) + All Joints: The agent's feet are constrained to remain parallel to the ground. It can control the joints from both the lower and upper body. The lower body actions are propagated using the parallel scheme. The total number of actions in this mode is 19.
• PL—Parallel Mode (robot configuration) + Lower Body: The agent's feet are constrained to remain parallel to the ground. The agent can control any of its lower body's joints. The upper body's joints are set to a fixed pose and not controlled. The total number of actions in this mode is 6.
• PLS-R—Parallel Mode (robot configuration) + Lower Body + Shoulder Roll Joints: The agent's feet are constrained to remain parallel to the ground. The agent can control any of its lower body's joints as well as its shoulder roll joints. The remaining upper body joints are set to a fixed pose and not controlled. The total number of actions in this mode is 8.
• PLS-RP—Parallel Mode (robot configuration) + Lower Body + Shoulder Roll Joints + Shoulder Pitch Joints: The agent's feet are constrained to remain parallel to the ground. The agent can control any of its lower body's joints as well as its shoulder roll and shoulder pitch joints. The remaining upper body joints are set to a fixed pose and not controlled. The total number of actions in this mode is 10.

The dimensionality of both the action and observation spaces differs depending on the control mode. Table 5 shows the dimensions of the action space and observation space for the different control modes. To ensure reproducibility and robust results, we train a policy from scratch using 5 random seeds for each control mode. The graphs and tables presented in Sect. 3 are averaged over these 5 random seeds.

2.4 Reward Function

In this section, we present the details of our reward function. The reward function is designed to encourage the agent to remain balanced on top of the board. It is conditioned on the board's angle to the ground ($r_t^b$), the orthogonal projection of the agent's orientation onto the z-axis ($r_t^p$), the angular velocity of the board ($r_t^a$), and the distance between the agent's left foot and an optimal



Table 5 Size of the observation and action space for the different control modes. In the parallel mode, the lower body joints are constrained to keep the feet parallel to the ground, which reduces the number of actions needed from the network

Mode name | Observation space | Action space
FA        | 93                | 23
FL        | 54                | 10
PA        | 93                | 19
PL        | 54                | 6
PLS-R     | 60                | 8
PLS-RP    | 66                | 10

position on top of the board ($r_t^d$). Figure 4 shows the information components used in the formulation of the reward function. The complete reward function $r_t$ is a summation of four reward components:

$$r_t = r_t^b + r_t^p + r_t^a + r_t^d \qquad (2)$$

The first component $r_t^b$ encourages the agent to keep the board parallel to the ground. It is computed according to the following equation:

$$r_t^b = 1 - 20 \cdot \theta_t^2 \qquad (3)$$

where $\theta_t$ represents the board's angle relative to the ground at timestep $t$. The angle is computed relative to $x$, which is orthogonal to the upwards vector of the world and runs parallel to the ground, as shown in Fig. 4. The second component $r_t^p$ encourages the agent to keep its body vertical to the ground and is defined as:

$$r_t^p = -(1 - \mathrm{proj}_o z) \qquad (4)$$

where $\mathrm{proj}_o z$ represents the projection of the upwards vector of the agent's orientation onto the world's z-axis vector (upwards). Figure 4 shows that the value of $\mathrm{proj}_o z$ becomes smaller the more tilted the agent is. The third component $r_t^a$ encourages the agent to keep the board as stable as possible:

$$r_t^a = -\omega_t^2 \qquad (5)$$

where $\omega_t$ represents the board's angular velocity at timestep $t$. The fourth and last component of the reward function, $r_t^d$, encourages the agent to place its foot in an ideal position on top of the board. It is defined as follows:



Fig. 4 Vector and angle components used by the reward function (yellow symbols). x, z are unit vectors in the direction of the world’s x-axis and z-axis, respectively

$$r_t^d = 1 - 20 \cdot D_{lf}^2 \qquad (6)$$

where $D_{lf}$ is the distance between the current position of the left foot and its ideal position on the board.
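The sketch below assembles Eqs. (2)–(6) into a single function; it is a minimal scalar version, assuming the four state quantities are already read from the simulator (in practice they would be batched tensors over the 4096 environments).

```python
def balance_reward(board_angle: float, upwards_proj: float,
                   board_ang_vel: float, left_foot_dist: float) -> float:
    """Total reward of Eq. (2) from its four components."""
    r_b = 1.0 - 20.0 * board_angle ** 2     # Eq. (3): board levelness
    r_p = -(1.0 - upwards_proj)             # Eq. (4): torso uprightness
    r_a = -board_ang_vel ** 2               # Eq. (5): board stability
    r_d = 1.0 - 20.0 * left_foot_dist ** 2  # Eq. (6): foot placement
    return r_b + r_p + r_a + r_d            # Eq. (2)
```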

3 Results

3.1 Experimental Evaluation

Using PPO, the policy is optimized with samples collected from 300 million timesteps in Isaac Gym. This process takes less than 40 min on a single RTX 2080 Super.



Fig. 5 Training curves for the six different control modes, averaged over 5 random seeds: (a) average reward versus training steps; (b) average time versus training steps. The top-performing parallel control modes achieve much better sample efficiency than the other control modes

Figure 5 presents the training curves for the 6 control modes defined in Sect. 2.3. Each line of the graph is averaged over 5 distinct random seeds. After training is finished, we run 2048 validation episodes and measure the performance of the policies using three metrics: average reward, average time the agent stays alive, and tolerance to disturbances. To measure tolerance to disturbances, we apply a force at the root of the agent in a random direction, starting from 0 N and increasing by 10 N every 2 s until the agent collapses; a sketch of this schedule is given below. For each control mode, the data is averaged over the 5 random seeds and the standard deviation is calculated, as presented in Table 6.
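A minimal sketch of that disturbance schedule follows; the direction-sampling detail is an assumption, as the text only states that the direction is random.

```python
import numpy as np

rng = np.random.default_rng()

def disturbance_force(elapsed_s: float) -> np.ndarray:
    """Force applied at the agent's root: 0 N at the start, +10 N every
    2 s, in a randomly sampled direction, until the agent collapses."""
    magnitude = 10.0 * int(elapsed_s // 2.0)
    direction = rng.normal(size=3)
    return magnitude * direction / np.linalg.norm(direction)
```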



Table 6 Evaluation trials for each control mode. The metrics were collected from 2048 validation episodes with a maximum duration of 16 s and averaged over 5 different seeds for each control mode

Control mode | Reward  | Steps   | Tolerance
FA           | 765±62  | 852±66  | 90±11
FL           | 470±101 | 612±133 | 37±4
PA           | 866±11  | 954±14  | 94±6
PL           | 533±203 | 665±255 | 37±9
PLS-R        | 922±14  | 974±13  | 104±3
PLS-RP       | 900±16  | 967±16  | 105±10

3.2 Analysis of the Control Modes

Our results show that action space shaping, i.e., the choice of control mode, had a significant impact on the sample efficiency and final performance of the RL policies. The training curves in Fig. 5 show a large gap between the parallel mode methods (PLS-R, PLS-RP, and PA) and the other control modes. Additionally, the variance across different starting seeds is lower, which implies that the training is more stable and convergence is more strongly guaranteed.

The first question we sought to answer was how much action space shaping impacts sample efficiency. Using a reward value of 800 as a threshold reference, Fig. 5a shows that the most efficient method is PLS-R, crossing the reference line at about 140 million timesteps. The second best-performing method, PLS-RP, crosses the reference line after about 210 million timesteps. From this, we can infer that PLS-R reaches a good level of performance with 70 million fewer timesteps; it is 33% more sample efficient than the second best control mode.

The second question we sought to answer was whether control of the humanoid agent's upper body is critical for solving the balancing problem. Figure 5b shows poor final performance, as well as a huge variance with respect to the initial random seed, for the control modes constrained to the lower body only (PL and FL). Similarly, the evaluation trials presented in Table 6 also demonstrate sub-optimal performance for these control modes. On the other hand, the effectiveness of the parallel mode constraint in combination with upper body control is notable: the top 3 constrained control modes, PA, PLS-R, and PLS-RP, substantially outperformed the naive baseline FA, where all joints are controlled independently.

4 Conclusion

In this paper, we proposed an RL training structure based on the state-of-the-art Proximal Policy Optimization (PPO) algorithm with a humanoid agent to tackle a balancing problem. Our implementation can train a successful balancing policy in under 40 min on a



single GPU. We explored the effect of action space shaping on the sample efficiency of training. We designed 6 distinct control modes, with varying constraints on the propagation of actions to the agent's joints. For each control mode, we averaged the results over 5 different random seeds. Our results demonstrated the control modes' significant impact on the sample efficiency and variance of training. The constrained parallel control modes outperformed the naive baseline (the FA control mode) by a wide margin, and our best-performing control mode required 70 million fewer timesteps to converge. In future work, we plan to extend the findings of this work to the sim-to-real domain. Our goal is to deploy the policies learned in simulation on the real-world robot. The biggest challenge to overcome will be handling the observation features that cannot be easily estimated in the real world but were used in the simulation in this work.

References

1. T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning. arXiv:1509.02971
2. D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, et al., Scalable deep reinforcement learning for vision-based robotic manipulation, in Conference on Robot Learning, PMLR (2018), pp. 651–673
3. S. Gu, E. Holly, T. Lillicrap, S. Levine, Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, in IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2017), pp. 3389–3396
4. O.M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray et al., Learning dexterous in-hand manipulation. Int. J. Robot. Res. 39(1), 3–20 (2020)
5. H. Zhu, J. Yu, A. Gupta, D. Shah, K. Hartikainen, A. Singh, V. Kumar, S. Levine, The ingredients of real-world robotic reinforcement learning. arXiv:2004.12570
6. J. Kober, J.A. Bagnell, J. Peters, Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32(11), 1238–1274 (2013)
7. A. Kanervisto, C. Scheller, V. Hautamäki, Action space shaping in deep reinforcement learning, in 2020 IEEE Conference on Games (CoG) (IEEE, 2020), pp. 479–486
8. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, Playing Atari with deep reinforcement learning. arXiv:1312.5602
9. X.B. Peng, M. van de Panne, Learning locomotion skills using DeepRL: does the choice of action space matter?, in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2017), pp. 1–13
10. M. Neunert, A. Abdolmaleki, M. Wulfmeier, T. Lampe, T. Springenberg, R. Hafner, F. Romano, J. Buchli, N. Heess, M. Riedmiller, Continuous-discrete reinforcement learning for hybrid control in robotics, in Conference on Robot Learning, PMLR (2020), pp. 735–751
11. S. Kajita, O. Matsumoto, M. Saigo, Real-time 3D walking pattern generation for a biped robot with telescopic legs, in Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. no. 01CH37164), vol. 3 (IEEE, 2001), pp. 2299–2306
12. S. Kagami, K. Nishiwaki, T. Kitagawa, T. Sugihara, M. Inaba, H. Inoue, A fast generation method of a dynamically stable humanoid robot trajectory with enhanced ZMP constraint, in Proceedings of the IEEE International Conference on Humanoid Robotics (Humanoid2000) (2000)



13. A.S. Anand, G. Zhao, H. Roth, A. Seyfarth, A deep reinforcement learning based approach towards generating human walking behavior with a neuromuscular model, in 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids) (IEEE, 2019), pp. 537–543
14. T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, M. Hutter, Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot. 7(62), eabk2822 (2022)
15. J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, M. Hutter, Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 5(47), eabc5986 (2020)
16. N. Rudin, D. Hoeller, P. Reist, M. Hutter, Learning to walk in minutes using massively parallel deep reinforcement learning, in Conference on Robot Learning, PMLR (2022), pp. 91–100
17. F. Lacquaniti, Y.P. Ivanenko, M. Zago, Patterned control of human locomotion. J. Physiol. 590(10), 2189–2199 (2012)
18. K. Yin, K. Loken, M. Van de Panne, Simbicon: simple biped locomotion control. ACM Trans. Graph. (TOG) 26(3), 105-es (2007)
19. A. Herzog, L. Righetti, F. Grimminger, P. Pastor, S. Schaal, Balancing experiments on a torque-controlled humanoid with hierarchical inverse dynamics, in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2014), pp. 981–988
20. J. Baltes, C. Iverach-Brereton, J. Anderson, Sensor filtering for balancing of humanoid robots in highly dynamic environments, in CACS International Automatic Control Conference (CACS) (IEEE, 2013), pp. 170–173
21. L. Liu, J. Hodgins, Learning to schedule control fragments for physics-based characters using deep q-learning. ACM Trans. Graph. (TOG) 36(3), 1–14 (2017)
22. H. Van Hasselt, A. Guez, D. Silver, Deep reinforcement learning with double q-learning, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
23. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms. arXiv:1707.06347
24. J. Jeong, J. Yang, J. Baltes, Robot magic show: human robot interaction. Knowl. Eng. Rev. 35
25. V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al., Isaac Gym: high performance GPU-based physics simulation for robot learning. arXiv:2108.10470
26. X.B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel, Sim-to-real transfer of robotic control with dynamics randomization, in IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2018), pp. 3803–3810
27. B. Ding, H. Qian, J. Zhou, Activation functions and their characteristics in deep neural networks, in Chinese Control and Decision Conference (CCDC) (IEEE, 2018), pp. 1836–1841
28. J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, P. Abbeel, Domain randomization for transferring deep neural networks from simulation to the real world, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2017), pp. 23–30

Color-SIFT Features for Histopathological Image Analysis

Ghada Ouddai, Ines Hamdi, and Henda Ben Ghezala

Abstract Histopathology is one of the most used practices in the bio-medical field for disease and cancer detection and grading. Following the digitalization of biological data and the improvement of machine/deep learning methods, the challenge of developing Computer-Assisted Diagnosis (CAD) systems arose. In this work, we explore the use of a model based on Color-SIFT descriptors, Bag-of-Features (BoF) and a Support Vector Machine (SVM) to analyse and classify tumoral histopathological tissues. We tested the system using a limited amount of data and compared its results with those of ResNet18. Our model obtained 64.8% accuracy, while ResNet18 obtained 61.8% classification accuracy. We performed the experiments using the CPU rather than the GPU: the training and validation of ResNet18 took 7 h and 29 min, while the proposed model took 3 h and 14 min.

Keywords Histopathological image processing · Features extraction · Color scale-invariant feature transform (Color-SIFT) · Bag-of-features (BoF) · Machine learning

G. Ouddai (B) · I. Hamdi · H. Ben Ghezala
RIADI Laboratory, National School of Computer Science (ENSI), University of La Manouba, La Manouba, Tunisia
e-mail: [email protected]
I. Hamdi
e-mail: [email protected]
H. Ben Ghezala
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_5




1 Introduction

With the growth of digitalized histopathological image data-sets in the past years, there has been a huge number of research efforts and attempts to create the perfect autonomous Computer-Assisted Diagnosis (CAD) system for this type of image; however, no attempt has ever reached a perfect recall or precision score, and the challenge of building such a system is ongoing. CAD architectures in digital histopathology are necessarily composed of two segments: image processing and machine/deep learning. Reviews of the state-of-the-art works, such as those presented in [1–7], show that the majority depend on deep learning architectures such as CNNs and RNNs.

The nature of histopathological images is complex; textures and colors can vary depending on the studied cells and on the stains used in the histological preparation of the specimen which, after scanning, generates the digital image. The coloring techniques for the sample are histochemistry (HC), immunohistochemistry (IHC) and immunofluorescence (IF); in each category, there are hundreds of possible stains, each of which emphasizes certain parts. The most used pigments (stains) are Eosin (E), Hematoxylin (H) and their combination (H&E). The choice of stain and target cell is purely determined by the expert biologist. When dealing with a digitalized histopathological specimen, the tools used in its preparation process must be known; this allows the selection of the most suitable processing techniques for its case.

Our survey of the state-of-the-art methods in the histopathological image analysis field led to some observations. Firstly, the majority of research is specific to H&E images, with just a few works for IHC images and practically no properly established method specific to IF images. Secondly, the existing methods in digital histopathology are mostly based on neural networks (CNN, RNN...). The problem with such architectures is their complexity: neural networks are time and memory consuming. Another problem is the quantity of training data: the more images used, the higher the classification precision obtained. In this work, we explore an approach based on Color-SIFT, Bag-of-Features (BoF) and SVM. Our main goal is to study the possibility of obtaining good classification results while using a limited number of images.

This manuscript is organized as follows: in Sects. 2 and 3, we review, respectively, the state-of-the-art works and the SIFT method. The used approach is introduced in Sect. 4 and evaluated in Sect. 5. In Sect. 6, we give a conclusion of this chapter and perspectives for future works.

2 Review of Histological Image Analysis Systems

To ensure optimal results, image processing systems in general follow a well-defined workflow, and digital histopathology is no exception. Due to the complex nature of histological images, the methods used in every step of the general process



Fig. 1 Architecture of histopathological image analysis systems

of analysis, as shown in Fig. 1, must be chosen thoroughly. In this section, we explain the practices used by existing systems in each step.

2.1 Pre-processing Practices

The pre-processing of histopathological digital images is a very important step; using the right algorithms to adjust luminance and contrast and to eliminate noise is a crucial task that aims to regulate the texture of the image without changing the nature of its structures.

2.1.1 WSI Fragmentation

Raw Whole-Slide Image (WSI) processing is considered a tough operation due to the very large resolution of the data; generally, an image of 100000 × 100000 pixels is considered a WSI. Dividing such images into small patches proves to be the best solution to this problem. The generated patches can be relatively small, varying from 256 × 256 pixels to 960 × 960 pixels [6].
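A minimal sketch of this fragmentation step is shown below for an image already loaded as an array; for genuine WSIs, a lazy reader such as OpenSlide would be needed instead of holding the full slide in memory, and the default patch size is just one of the values quoted above.

```python
import numpy as np

def fragment_wsi(wsi: np.ndarray, patch: int = 256):
    """Yield non-overlapping square patches from an H x W x 3 image;
    incomplete border patches are simply dropped."""
    h, w = wsi.shape[:2]
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            yield wsi[y:y + patch, x:x + patch]

# Example: count the 256 x 256 patches of a (small) synthetic slide.
slide = np.zeros((2048, 2048, 3), dtype=np.uint8)
print(sum(1 for _ in fragment_wsi(slide)))  # 64 patches
```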

2.1.2 Color-space Conversion and Luminance Adjustment

One of the most used practices in digital histology is the conversion of the image from RGB to another color-space; in fact, for each stain used in the histological process there exists an optimal color-space that emphasizes the right characteristics of the specimen and allows an automatic adjustment of luminance and contrast. For H&E images, the Cyan-Magenta-Yellow-Black (CMYK) color space was used in [8], L*a*b* was used in [9–12] instead of RGB, and L*u*v* was used in [13] for its color uniformity.
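The OpenCV sketch below shows the conversions mentioned above, together with one plausible luminance adjustment (CLAHE on the lightness channel); the file name is hypothetical, and the CLAHE step is an assumption, since the cited works adjust luminance in different ways.

```python
import cv2

bgr = cv2.imread("he_patch.png")               # hypothetical H&E patch
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2Lab)     # L*a*b*, as in [9-12]
luv = cv2.cvtColor(bgr, cv2.COLOR_BGR2Luv)     # L*u*v*, as in [13]

# One common luminance adjustment: contrast-limited adaptive histogram
# equalization applied to the lightness channel only.
l, a, b = cv2.split(lab)
l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
lab_equalized = cv2.merge((l, a, b))
```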

2.2 Image Processing

Extracting low-level information from digital images can be a tough operation, and over the past years the need to perfect image analysis methods has grown. Over the past two decades, a huge number of methods for histopathological image analysis have been proposed. The practices used in the state-of-the-art works are divided into many categories; here, we explain some of them.

2.2.1 Segmentation and Regions-of-Interest Detection

When dealing with relatively large images, reducing their size is a smart solution that facilitates subsequent processing. Unlike the resizing of WSIs, low-level segmentation aims to create an image smaller than the original by searching for and keeping only the informative and significant pixels. In digital histopathology, one of the most used algorithms for primary segmentation is thresholding by Otsu [14]; it is convenient when the luminance of the image is regular and uniform, which is not always the case, so [15] proposes a rapid pre-process to prepare the image and fix its luminance for a better Otsu result. In [15], the threshold corresponds to the average intensity of the R channel of RGB, and the segmentation is performed based on it. Other than Otsu, segmentation using the Chan and Vese method [16, 17] was used in [18–20]. One efficient superpixel-based segmentation was introduced in [21]; it is especially designed to operate on H&E histopathological images.
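The sketch below shows both thresholding variants mentioned above in OpenCV: automatic Otsu thresholding on the grayscale image, and a threshold set to the mean intensity of the red channel in the spirit of [15]. The file name is hypothetical, and this is a simplified reading of [15], not a reimplementation.

```python
import cv2

img = cv2.imread("he_patch.png")               # hypothetical input patch
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Otsu: the threshold argument (0) is ignored and chosen automatically.
_, otsu_mask = cv2.threshold(gray, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# [15]-style: threshold the red channel at its own mean intensity.
red = img[:, :, 2]                             # OpenCV stores BGR
_, red_mask = cv2.threshold(red, red.mean(), 255, cv2.THRESH_BINARY)
```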

2.2.2 Features Extraction

Analyzing the image pixel-wise can lead to the extraction of common features related to color, shape, texture or other specifications; these can also be local (significant to some regions) or global (representing the whole image). Features extraction in digital histopathology is less used than segmentation/ROI detection; unlike the latter, low-level features must be combined with a machine/deep learning method for a precise interpretation. In [22], the authors study the efficiency of combining different feature extractors (GLCM, GLRLM, SFTA, LBP and LBGLCM) with a variety of classifiers (SVM, KNN, LDA and SFTA). Other than that, KAZE features were used in [23] to classify breast cancer H&E-stained images.

3 Review of SIFT

First introduced in [24], Scale-Invariant Feature Transform (SIFT) is a widely used image processing technique for feature extraction, object recognition and image/key-point matching. The method is proved to be invariant to scaling, translation and rotation. The workflow of the SIFT algorithm consists of four steps divided into two major parts, as presented in Fig. 2.

Fig. 2 Workflow of the SIFT method

3.1 SIFT Detector

In this segment of the SIFT method, the key-points of the image are extracted. At this level, the results are scale invariant but need some refinement to keep only the significant ones. The architecture introduced by the authors is composed of two sub-processes, as shown in Fig. 2.

3.1.1 Scale-Space Construction

This sub-process ensures the scale invariance of the final results by blurring the input image gradually. The authors proposed the use of a series of Gaussian filters (see Eq. (1)), organized in increasing order of the intensity α:

G(i, j) = \frac{1}{2\pi\alpha^2} \, e^{-\frac{i^2 + j^2}{2\alpha^2}}    (1)

At the end of this step, the result is a block of blurred copies of the original image; each copy corresponds to a certain intensity, with an associated radius. As a preparation for the next step, this group is organized in ascending order of intensity.

3.1.2 Differences of Gaussians (DoG) Calculation and Key-Points Localization

Identifying important pixels in the previously constructed scale-space block is a time-consuming process. The computation of the DoG is a simple task: it consists of calculating the pixel-by-pixel difference of each two successive images from the scale-space block. On the other hand, extracting points-of-interest from the DoG is a more complicated step. To localize the scale-invariant key-points, each pixel of a given DoG is compared with its neighbors; if its value is greater or smaller than that of all its neighbors, the pixel is considered a local key-point. The neighborhood of a given position is defined as follows: the eight direct neighbors (in the same DoG), the corresponding nine neighbors in the previous DoG and the nine in the next DoG.
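The DoG construction and the 26-neighbor extremum test can be sketched as follows; this is a simplified illustration using SciPy's Gaussian filter (function names and border handling are our assumptions, and the refinement steps of the full SIFT detector are omitted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(gray, sigmas):
    """Blur the image with a series of Gaussian filters of increasing
    intensity and take the pixel-wise difference of successive copies."""
    blurred = [gaussian_filter(gray.astype(float), s) for s in sigmas]
    return [b - a for a, b in zip(blurred, blurred[1:])]

def is_local_extremum(dogs, k, i, j):
    """True if pixel (i, j) of DoG k is strictly greater or strictly
    smaller than all 26 neighbors: 8 in its own DoG and 9 in each of
    the previous and next DoGs (k and (i, j) must be interior)."""
    cube = np.stack([dogs[k - 1][i - 1:i + 2, j - 1:j + 2],
                     dogs[k][i - 1:i + 2, j - 1:j + 2],
                     dogs[k + 1][i - 1:i + 2, j - 1:j + 2]]).ravel()
    others = np.delete(cube, 13)  # index 13 is the center pixel itself
    value = dogs[k][i, j]
    return value > others.max() or value < others.min()
```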

3.2 SIFT Descriptor

The main task of this segment of the SIFT method is to turn the key-points extracted by the SIFT detector into a matrix containing the descriptor vectors. As shown in Fig. 2, this sub-process is composed of two steps.

3.2.1 Orientation Assignment

This part ensures the rotation invariance of the results. For each point-of-interest neighborhood, a convolutional filter, such as Sobel filtering, must be applied; this generates a gradient. For the same neighborhood, a histogram representing the 360 possible rotations is constructed; in practice, a histogram of 36 bins is sufficient, each bin covering a 10° range. After the construction of the histogram, it is updated by calculating the amplitude and direction over the neighborhood; each pixel possessing a high amplitude value is considered significant. The points-of-interest are extracted from the peaks of each neighborhood histogram, and the orientation of a point is that of the corresponding bin.

3.2.2 Descriptor Vectors Deployment

In this segment, the points-of-interest obtained in the previous step are transformed into a matrix of descriptor vectors. For each key-point, a 12 × 12 mask is applied and partitioned into 16 windows of 3 × 3; each window is associated with a histogram of 8 orientations grouped into 45° bins. The next step is to apply the Hough Transform [25], which generates a gradient. The new rotation of each pixel of a given neighborhood is calculated by subtracting the orientation of the centroid of that neighborhood. The histogram is then updated as follows: each pixel close to the centroid and possessing a high amplitude value contributes to its orientation bin; on the other hand, far and weak pixels are neglected. At the end of the process, 16 histograms are assigned to each mask, each of them containing 8 bins. When concatenated, these histograms form a descriptor vector of 128 values. The SIFT descriptor vectors are stored in a matrix of 128 columns and V rows, where V is the number of final points-of-interest.

4 Methodology

As stated before, the purpose of our study is the development of a light-weight system for histopathological image analysis. The system is based on Color-SIFT features, BoF and an SVM classifier. We exploit this architecture and evaluate it on a limited amount of data using the CPU; a comparison to ResNet is then performed. Our goal is to answer two questions: which system obtains a better precision when dealing with a restricted set of learning images, and which system is faster when operating on the CPU rather than the GPU?

4.1 Pre-processing

During the digitalization of the histological specimen, some luminance noise and hues may occur; in the case of H&E-stained samples, a green shadow can be noticed immediately. In our system, we correct luminance and contrast with the bias and gain function (see Eq. (2)).

Output(i, j) = α · Input(i, j) + β    (2)

In the equation above, the parameter α controls the contrast; its optimal value lies between 1 and 3, whereas if 0 < α < 1, the resulting image will have less contrast as the colors are compressed. The parameter β controls the brightness and should range between 0 and 100. These parameters are fixed experimentally; searching for the optimal values can be a time-consuming task.

Fig. 3 Results of our pre-processing methods—Examples of H&E and IHC images of Medisp HICL database ([26–28])

In the preparation of our system, we ran numerous experiments to find the right values for the parameters α and β; we noticed that the results differ depending on the type of image. In the case of H&E-stained images, the green hue is much more noticeable than in IHC images. We also observed that focusing on the elimination of the green shadow can lead to the fading of informative colors. To remedy these problems, we propose adjusting luminance and contrast for each channel of the RGB space separately. For H&E images, where the blue and red colors are the most significant, we fix α = 1.2 and β = 25 for the R and B channels and α = 1.2 and β = −25 for the G channel. For IHC images, we fix α = 1.1 and β = 10 for all three channels. The results of our experiments are shown in Fig. 3.
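A minimal per-channel implementation of this adjustment, with the H&E values from the text plugged in, could look like the following sketch (the helper name is ours; OpenCV loads images in B, G, R order, which the parameter map reflects):

```python
import numpy as np

def adjust_channels(image_bgr, params):
    """Apply Output(i, j) = alpha * Input(i, j) + beta independently per
    channel; `params` maps a channel index to an (alpha, beta) pair."""
    out = image_bgr.astype(np.float32)
    for channel, (alpha, beta) in params.items():
        out[:, :, channel] = alpha * out[:, :, channel] + beta
    return np.clip(out, 0, 255).astype(np.uint8)

# H&E setting from the text, in OpenCV's B, G, R channel order:
# boost blue and red (beta = +25), dampen the green hue (beta = -25).
he_params = {0: (1.2, 25), 1: (1.2, -25), 2: (1.2, 25)}
```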


4.2 Features Extraction by Color-SIFT

Scale-Invariant Feature Transform is not widely used in digital histopathology; in our study of the state-of-the-art works, we found that in [29, 30] the authors used SIFT alongside other methods to prepare cervical histopathological images for classification. In [31], to detect mitosis in H&E-stained images, the authors proposed a system based on SIFT and texture features that detects the key-points in the R and B channels. In this work, rather than using the classic SIFT method, we propose the use of Color-SIFT. The original SIFT, as introduced in [24], detects key-points on gray-scale images. To explore the color information of the image, some variants of SIFT were proposed:

• HSV-SIFT [32]: the input RGB image is converted to the HSV color space. A key-points vector is calculated from each channel and the final SIFT descriptor is the combination of the three vectors.

• Opponent-SIFT: the input RGB image is converted to an opponent space calculated following Eq. (3); the final SIFT descriptor is the combination of the descriptors of each channel.

\begin{pmatrix} O1 \\ O2 \\ O3 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{-2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}    (3)

• CSIFT [33]: as in Opponent-SIFT, CSIFT proposes a redefinition of the RGB image using the Gaussian color model; the resulting image is color and geometry invariant. The conversion of the input RGB image is performed following Eq. (4), and the CSIFT descriptor vectors are then calculated from each channel.

\begin{pmatrix} E1(x, y) \\ E2(x, y) \\ E3(x, y) \end{pmatrix} = \begin{pmatrix} 0.06 & 0.63 & 0.27 \\ 0.3 & 0.04 & -0.35 \\ 0.34 & -0.6 & 0.17 \end{pmatrix} \begin{pmatrix} R(x, y) \\ G(x, y) \\ B(x, y) \end{pmatrix}    (4)

• Other variants of SIFT were presented, such as Color Independent Component SIFT (CIC-SIFT) [34], transformed Color-SIFT and rg-SIFT.

To explore the color information of the histopathological images, we use RGB-SIFT to extract features from the three channels. RGB-SIFT is applied to the input RGB channels without conversion to the gray-scale color space. For each channel, a 128-dimensional SIFT descriptor vector is calculated; the final features vector is the concatenation of the R, G and B descriptor vectors, so for each key-point RGB-SIFT produces a 384-dimensional descriptor (see Fig. 4 for results of RGB-SIFT features extraction on the BreaKHis database [35]).
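A sketch of the RGB-SIFT extraction with OpenCV (a library choice of ours; we assume OpenCV >= 4.4, where SIFT is available as cv2.SIFT_create). Detecting the key-points once on the gray-scale image so that the same locations are described in all three channels is our assumption; the chapter only specifies that a 128-D descriptor is computed per channel and concatenated into 384 dimensions:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def rgb_sift(image_bgr):
    """Compute one 128-D SIFT descriptor per channel at shared key-point
    locations and concatenate them into 384-D vectors."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    keypoints = sift.detect(gray, None)
    per_channel = []
    for c in range(3):  # B, G, R in OpenCV's channel order
        _, desc = sift.compute(image_bgr[:, :, c], keypoints)
        per_channel.append(desc)
    return np.hstack(per_channel)  # shape: (n_keypoints, 384)
```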


Fig. 4 Results of RGB-SIFT features extraction on H&E

4.3 Features Vectors Encoding

In the previous section, the features and key-points of the RGB images were extracted and presented as descriptor vectors. These can be used directly in various tasks such as key-point matching; in our case, the descriptors are used in the classification of histopathological images for disease and cancer detection and recognition. The RGB-SIFT vectors cannot be fed directly to the classification step. We use the k-means/frequency-histogram method to regroup the SIFT features into Bags-of-Features (BoF), also known as Visual Bags-of-Words (VBoW). This method has proved efficient in whole-image classification tasks. The BoF encoding by k-means/frequency histogram is performed as follows:

• After choosing the number of clusters (in our case, k = 5), the centroid of each cluster is randomly initialized with an element of the descriptor-vector space.

• Each remaining descriptor vector is assigned to a cluster, whose centroid is then recalculated.

• In the end, the descriptor vectors are regrouped into clusters (in our case, 5 groups).

• Each entry of the features dictionary (each cluster) is assigned a frequency histogram representing the number of occurrences of descriptors belonging to that cluster. A sketch of this encoding is given below.
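This is a minimal sketch of the encoding with scikit-learn (a library choice of ours; the chapter only specifies k-means with k = 5 and frequency histograms):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_bof_histograms(descriptor_sets, k=5):
    """Cluster all training descriptors with k-means (k = 5 as in the
    text), then encode every image as a k-bin frequency histogram
    counting how many of its descriptors fall into each visual word."""
    all_descriptors = np.vstack(descriptor_sets)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)
    histograms = []
    for descriptors in descriptor_sets:
        words = kmeans.predict(descriptors)
        hist, _ = np.histogram(words, bins=np.arange(k + 1))
        histograms.append(hist / hist.sum())  # normalized frequencies
    return np.array(histograms), kmeans
```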

4.4 BoF Classification Using SVM

Support Vector Machine (SVM) [36] is a classification and regression method based on well-defined supervised learning models. The goal of SVM is to find the linear hyperplane that separates the data of two known classes while optimizing the margins (binary SVM). In the case of non-separable data, SVM proposes a new representation of the learning samples: the original SVM work states that if a linear separation cannot be obtained in the definition space, a passage to a larger one is necessary, which secures the existence of a linear separating hyperplane. In this work, we use the classic binary SVM with soft margins to classify our BoFs. This choice is justified by the nature of our experimental data: the base contains two labelled classes of images.
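Training the soft-margin classifier on the resulting histograms then reduces to a few lines; a sketch with scikit-learn's SVC (the linear kernel matches the separating-hyperplane formulation above, but the chapter does not state the kernel, and C is left at an illustrative default):

```python
from sklearn.svm import SVC

# Soft-margin binary SVM over the BoF histograms; train_histograms and
# train_labels (benign/malignant) come from the encoding step above.
classifier = SVC(C=1.0, kernel="linear")
classifier.fit(train_histograms, train_labels)
predictions = classifier.predict(test_histograms)
```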

5 Tests and Evaluation

5.1 Experimental Setup

Our execution machine is configured as follows: Intel Core i9 10th Gen CPU (up to 5.30 GHz), 32 GB of RAM, NVIDIA GeForce RTX 2080 SUPER GPU, 512 GB SSD. For the implementation of our system, we used the Python 3.8 programming language and TensorFlow CPU 2.4.1.

5.2 Experiments Data

The staining techniques in histology are categorized into three groups: histochemistry (HC), immunohistochemistry (IHC) and immunofluorescence (IF); in each one, hundreds of stains can be used, the most utilized being the combination of Eosin and Hematoxylin (H&E). In our study, we focus on HC H&E and IHC images; this choice is mainly due to the lack of IF databases. Our experiments were performed on the BreaKHis data-set [35], which offers a collection of breast tumor tissues classified following different criteria: magnification, tumor type (benign or malignant) and tumoral cells. The samples were obtained using the SOB technique and H&E stains; details can be found in Table 1.

Table 1 BreaKHis dataset details

Tumor type   Cell    x40    x100   x200   x400
Benign       A       114    113    111    106
             F       253    260    264    237
             PT      109    121    108    115
             TA      149    150    140    130
             Total   625    644    623    588
Malignant    DC      864    903    896    788
             LC      156    170    163    137
             MC      205    222    196    169
             PC      145    142    135    138
             Total   1370   1437   1390   1232

The cells contained in the benign class are: Adenosis (A), Fibroadenoma (F), Phyllodes Tumor (PT) and Tubular Adenoma (TA). The malignant cells are: Ductal Carcinoma (DC), Lobular Carcinoma (LC), Mucinous Carcinoma (MC) and Papillary Carcinoma (PC). The data-set contains a total of 7,909 images: 2,480 of benign tumors and 5,429 of malignant tumors. The image size is 700 × 460 pixels. In our work, we used 360 random images (180 benign and 180 malignant) for the learning step. For the tests and validation of our approach, 360 other random images (180 benign + 180 malignant) were used. Our data was collected randomly from the x100 magnification class. The images were resized to 234 × 154 pixels.

5.3 Results

To evaluate our Color-SIFT/SVM-based model, we compare its results with the widely used deep learning method ResNet. The latter was used alongside other architectures in [37] for WSI segmentation; it was also used in [38] to create a weakly supervised model for cancer diagnosis in WSI. As stated in the original work [39], the architecture of ResNet can be pushed to contain 152 layers.


In our work, we use ResNet18 (18 layers). We used 5-fold cross-validation to remedy the over-fitting problem that can occur when training deep learning models on limited data. For each fold, 20 epochs were performed. For our model, we use 5 means for the encoding of the BoFs. The models were trained using the CPU. The classification accuracy and the execution time of each model are given in Table 2.

Table 2 Evaluation of the proposed method (using the CPU)

Model                           Accuracy (%)   Execution time
Proposed (Color-SIFT/BoF/SVM)   64.8           3 h 14 min
ResNet18                        61.8           7 h 29 min

5.4 Discussion

The results of our experiments show that, when using a limited amount of data, our model outperforms the classic deep learning method: for the tumor classification task, the Color-SIFT/BoF/SVM-based system proved slightly more accurate than the ResNet18 model. From these experiments, we retain the following:

• When using the CPU, the proposed system took less time than the deep learning architecture. GPU computation offers fast training of neural networks; however, when dealing with large data, a memory overflow is to be expected.

• The use of Color-SIFT in the proposed method ensures the color invariance of the system; this allows the latter to be flexible and easily adaptable to other histopathological image types (IHC, IF, other stains, ...).

6 Conclusion

In this work, we introduced a histopathological image analysis system based on RGB-SIFT features, k-means/histogram-encoded Bags-of-Features and a Support Vector Machine (SVM). The aim of this study was to develop a fast and color-invariant model. The experiments showed that our model consumes less execution time on the CPU and, compared to the deep learning method (ResNet18), proved to be more precise.


For future works, we intend to test the other available Color-SIFT models and evaluate the possibility of feature extraction from only the two dominant channels. Another perspective is to combine SIFT features with CNN architectures: rather than using the image directly as the input of a CNN, we intend to explore the possibility of classifying the Bags-of-Features.

References

1. C.L. Srinidhi, O. Ciga, A.L. Martel, Deep neural network models for computational histopathology: a survey. Med. Image Anal. 67 (2021)
2. T.A.A. Tosta, L.A. Neves, M.Z. do Nascimento, Segmentation methods of H&E-stained histological images of lymphoma: a review. Inform. Med. Unlock. 9(May), 35–43 (2017)
3. A. Das, M.S. Nair, S.D. Peter, Computer-aided histopathological image analysis techniques for automated nuclear atypia scoring of breast cancer: a review. J. Digit. Imaging (2020)
4. M.N. Gurcan, L.E. Boucheron, A. Can, A. Madabhushi, N.M. Rajpoot, B. Yener, Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009)
5. H. Irshad, A. Veillard, L. Roux, D. Racoceanu, Methods for nuclei detection, segmentation, and classification in digital histopathology: a review—current status and future potential. IEEE Rev. Biomed. Eng. 7, 97–114 (2014)
6. D. Komura, S. Ishikawa, Machine learning methods for histopathological image analysis. Comput. Struct. Biotechnol. J. 16, 34–42 (2018)
7. C. Li, H. Chen, X. Li, N. Xu, Z. Hu, D. Xue, S. Qi, H. Ma, L. Zhang, H. Sun, A review for cervical histopathology image analysis using machine vision approaches, vol. 53 (2020)
8. L.S. Hammes, J.E. Korte, R.R. Tekmal, P. Naud, M.I. Edelweiss, P.T. Valente, A. Longatto-Filho, N. Kirma, J.S. Cunha-Filho, Computer-assisted immunohistochemical analysis of cervical cancer biomarkers using low-cost and simple software. Appl. Immunohistochem. Mol. Morphol. 15(4), 456–462 (2007)
9. B. Pang, Y. Zhang, Q. Chen, Z. Gao, Q. Peng, X. You, Cell nucleus segmentation in color histopathological imagery using convolutional networks, in 2010 Chinese Conference on Pattern Recognition, CCPR 2010—Proceedings (2010), pp. 555–559
10. B. Oztan, H. Kong, M.N. Gürcan, B. Yener, Follicular lymphoma grading using cell-graphs and multi-scale feature analysis, in Medical Imaging 2012: Computer-Aided Diagnosis, eds. by B. van Ginneken, C.L. Novak, vol. 8315 (2012), pp. 831516
11. O. Sertel, J. Kong, U.V. Catalyurek, G. Lozanski, J.H. Saltz, M.N. Gurcan, Histopathological image analysis using model-based intermediate representations and color texture: follicular lymphoma grading. J. Signal Process. Syst. 55(1–3), 169–183 (2009)
12. O. Sertel, J. Kong, G. Lozanski, A. Shana'Ah, U. Catalyurek, J. Saltz, M. Gurcan, Texture classification using nonlinear color quantization: application to histopathological image analysis, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing—Proceedings (2008), pp. 597–600
13. L. Yang, P. Meer, D.J. Foran, Unsupervised segmentation based on robust estimation and color active contour models. IEEE Trans. Inf. Technol. Biomed. 9(3), 475–486 (2005)
14. N. Otsu, Threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. SMC-9(1), 62–66 (1979)
15. A. Vahadane, A. Sethi, Towards generalized nuclear segmentation in histological images, in 13th IEEE International Conference on BioInformatics and BioEngineering, IEEE BIBE 2013 (2013)
16. T. Chan, L. Vese, An active contour model without edges, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1682 (Springer, 1999), pp. 141–151


17. T.F. Chan, L.A. Vese, Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
18. B. Arora, A. Banerjee, Computer assisted grading schema for follicular lymphoma based on level set formulation, in 2013 Students Conference on Engineering and Systems, SCES 2013 (2013)
19. K. Belkacem-Boussaid, S. Samsi, G. Lozanski, M.N. Gurcan, Automatic detection of follicular regions in H&E images using iterative shape index. Comput. Med. Imaging Graph. 35(7–8), 592–602 (2011)
20. K. Belkacem-Boussaid, J. Prescott, G. Lozanski, M.N. Gurcan, Segmentation of follicular regions on H&E slides using a matching filter and active contour model, in Medical Imaging 2010: Computer-Aided Diagnosis, eds. by N. Karssemeijer, R.M. Summers, vol. 7624 (SPIE, 2010), pp. 762436
21. L. Sulimowicz, I. Ahmad, 'Rapid' regions-of-interest detection in big histopathological images, in Proceedings—IEEE International Conference on Multimedia and Expo (IEEE Computer Society, 2017), pp. 595–600
22. S. Öztürk, B. Akdemir, Application of feature extraction and classification methods for histopathological image using GLCM, LBP, LBGLCM, GLRLM and SFTA, in Procedia Computer Science, vol. 132 (Elsevier B.V., 2018), pp. 40–46
23. D. Sanchez-Morillo, J. González, M. García-Rojo, J. Ortega, Classification of breast cancer histopathological images using KAZE features, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10814 (LNBI, 2018), pp. 276–286
24. D.G. Lowe, Object recognition from local scale-invariant features, in Proceedings of the IEEE International Conference on Computer Vision, vol. 2 (IEEE, 1999), pp. 1150–1157
25. P.V.C. Hough, Patent: method and means for recognizing complex patterns (1962)
26. D. Glotsos, I. Kalatzis, P. Spyridonos, S. Kostopoulos, A. Daskalakis, E. Athanasiadis, P. Ravazoula, G. Nikiforidis, D. Cavouras, Improving accuracy in astrocytomas grading by integrating a robust least squares mapping driven support vector machine classifier into a two level grade classification scheme. Comput. Methods Progr. Biomed. 90(3), 251–261 (2008)
27. S. Kostopoulos, D. Glotsos, D. Cavouras, A. Daskalakis, I. Kalatzis, P. Georgiadis, P. Bougioukos, P. Ravazoula, G. Nikiforidis, Computer-based association of the texture of expressed estrogen receptor nuclei with histologic grade using immunohistochemically-stained breast carcinomas. Anal. Quant. Cytol. Histol. Technical report
28. K. Ninos, S. Kostopoulos, I. Kalatzis, K. Sidiropoulos, P. Ravazoula, G. Sakellaropoulos, G. Panayiotakis, G. Economou, D. Cavouras, Microscopy image analysis of p63 immunohistochemically stained laryngeal cancer lesions for predicting patient 5-year survival. Eur. Arch. Oto-Rhino-Laryngol. 273(1), 159–168 (2016)
29. C. Li, H. Chen, L. Zhang, X. Ning, D. Xue, H. Zhijie, H. Ma, H. Sun, Cervical histopathology image classification using multilayer hidden conditional random fields and weakly supervised learning. IEEE Access 7, 90378–90397 (2019)
30. C. Li, H. Chen, D. Xue, Z. Hu, L. Zhang, L. He, N. Xu, S. Qi, H. Ma, H. Sun, Weakly supervised cervical histopathological image classification using multilayer hidden conditional random fields, in Advances in Intelligent Systems and Computing, vol. 1011 (Springer, 2019), pp. 209–221
31. H. Irshad, S. Jalali, L. Roux, D. Racoceanu, G.L. Naour, L.J. Hwee, F. Capron, Automated mitosis detection using texture, SIFT features and HMAX biologically inspired approach. J. Pathol. Inform. 4(2), 12 (2013)
32. A. Bosch, A. Zisserman, X. Muñoz, Scene classification using a hybrid generative/discriminative approach. IEEE Trans. Pattern Anal. Mach. Intell. 30(4), 712–727 (2008)
33. A.E. Abdel-Hakim, A.A. Farag, CSIFT: a SIFT descriptor with color invariant characteristics, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2 (2006), pp. 1978–1983
34. D.N. Ai, X.H. Han, X. Ruan, Y.W. Chen, Color independent components based SIFT descriptors for object/scene classification. IEICE Trans. Inf. Syst. E93-D(9), 2577–2586 (2010)


35. F.A. Spanhol, L.S. Oliveira, C. Petitjean, L. Heutte, A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462 (2016)
36. C. Cortes, V. Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
37. Y.V. Eycke, C. Balsat, L. Verset, O. Debeir, I. Salmon, C. Decaestecker, Segmentation of glandular epithelium in colorectal tumours to automatically compartmentalise IHC biomarker quantification: a deep learning approach. Med. Image Anal. 49, 35–45 (2018)
38. G. Campanella, M.G. Hanna, L. Geneslaw, A. Miraflor, V.W.K. Silva, K.J. Busam, E. Brogi, V.E. Reuter, D.S. Klimstra, T.J. Fuchs, Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25(8), 1301–1309 (2019)
39. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778

Time-Series Multidimensional Dialogue Feature Visualization Method for Group Work Rikito Ohnishi, Yuki Murakami, Takafumi Nakanish, Ryotaro Okada, Teru Ozawa, Kosuke Fukushima, Taichi Miyamae, Yutaka Ogasawara, Kei Akiyama, and Kazuhiro Ohashi

Abstract This paper presents a time-series multidimensional dialogue feature visualization method for group work. The new coronavirus has changed our lives and moved many things online. Group work is more prevalent now than ever before, as online access has eliminated location restrictions, allowing multiple people to gather and share ideas. However, when group work is conducted, discussions and opinions may not proceed smoothly, and sometimes the group work is meaningless. This method takes group work recording data and live transcription data as input and performs time-series multidimensional dialogue feature visualization
to show group work visualization results as output. The results of the visualization are shown using data from actual group work; we consider it possible to visualize whether the discussion is active at a certain time and whether the discussion is organized.

Keywords Group work · Meeting · Visualization · Natural language processing · Time-series multidimensional dialogue

1 Introduction

Our lives have changed dramatically since 2020 due to the new coronavirus. In this context, people have moved many things online in an effort to reduce contact. In the corporate world, the concept of commuting has been eliminated and a style of working from home has been established; in the field of education, online classes using video and call capabilities have become common. In particular, group work, in which multiple people gather to exchange ideas and opinions, has become more popular than ever before, as location restrictions have been removed in all aspects of the work environment.

However, while group work is increasing, one problem is that it is difficult to improve its quality. In most group work, an assignment is set; to prepare one conclusion or product in response to it, members discuss and formulate opinions. This sequence of work does not go smoothly in most cases, and depending on the content of the conversation, the group work may be meaningless. Even if one tries not to repeat the same mistakes, it is difficult to know what to change from the previous group work, so a tool for looking back is important. From this perspective, we realized that we needed to visualize the contents of group work in numerical and graphical form to make an application suitable for retrospection.

This paper shows a time-series multidimensional dialogue feature visualization method that can be used to look back on group work using recorded group work data and live transcription data. The system visualizes what is said in group work in chronological order, targeting live transcription data, i.e., transcribed statements made during group work. The system visualizes the contents listed below; these visualizations allow group work members to efficiently analyze and reflect on past group work.

• The negative–positive value of each member's statements.
• The number and percentage of times each member spoke.
• Representativeness, which expresses the importance within the group work of each of the words uttered.
• Freshness, which indicates whether new content is being discussed.

The contributions of this paper are as follows:


• We present a new visualization method for time-series multidimensional dialogue features. This method realizes clear visualization of analysis results for reviewing past group work.
• We develop a system to realize this visualization method and present several visualization examples.

This paper is organized as follows. In Sect. 2, we describe research on divergent thinking and convergent thinking and on meeting analysis systems relevant to our study. In Sect. 3, we describe the functionality of the system, an application that performs time-series multidimensional dialogue feature visualization on group work recording data and live transcription data to show the results of meeting analysis. In Sect. 4, as an example of our visualization, we describe the results of applying the system to data from an actual group work session and discuss them. Finally, in Sect. 5, we summarize this paper.

2 Related Works

In this section, we describe related research. Our proposed method supports the analysis and review of meetings. In Sect. 2.1, we show related works about divergent thinking and convergent thinking. In Sect. 2.2, we show related research on meeting analysis systems.

2.1 Research About Divergent Thinking and Convergent Thinking

Psychologist Guilford [1] introduced the idea that the human thought process has two categories, divergence and convergence. Guilford proposed the Structure of Intellect theory (SI theory) as a model expressing factors of intelligence; divergent thinking and convergent thinking in Guilford's SI theory were classifications of an individual person's intelligence. Osborn [2] proposed Brainstorming, a famous meeting method, and defined two phases, the divergent phase and the convergent phase, as states of the meeting; here, this categorization concerns a group, not an individual person's intelligence. Brainstorming is a meeting method for generating many ideas, but it is not a technique only for divergent thinking: the method requires clearly separating divergence and convergence as states of the meeting, and it encourages focusing on divergent thinking in scenes for generating ideas and on convergent thinking in scenes for evaluation and criticism. Design thinking [3], a mainstream creative thinking framework in recent years, also incorporates divergent and convergent thinking as basic processes.


In our existing research [4], we focused on the divergence and convergence of meetings and proposed a method to automatically estimate them from the transcribed text. We divided the meeting into 20 segments and defined representativeness as the degree of relevance of each segment to the overall topic of the meeting, and freshness as the degree of new words in that segment. We defined the divergence mode as low representativeness and high freshness, and the convergence mode as high representativeness and low freshness. In this study, we extract and visualize representativeness and freshness as an analysis of meetings.

2.2 Related Research on Meeting Analysis Systems

There are various types of meeting analysis. In recent years, the progress of automatic speech recognition (ASR) technology has made it easier to analyze meetings based on the content of their speeches. Praharaj et al. classify indicators of co-located collaboration (CC) into a social space (non-verbal audio indicators such as speaking time and gestures) and an epistemic space (verbal audio indicators such as the content of the conversation) [5]. They list Roles and Knowledge co-construction as epistemic parameters, and Dominance, Active participation, Expertise, and Rapport as social parameters. Furthermore, they list indicators of CC and their operationalization of collaboration quality for those parameters. For example, indicators of Roles are "topics covered detected from keywords" and "frequently used keywords and phrases," and their operationalization of collaboration quality is "topical closeness to meeting agenda" and "proximity of commonly used words and phrases to the roles." Studies that extract indicators to evaluate each parameter include the following: Roles [6, 7], Dominance [8–11], Active participation [12], Expertise [13, 14], Rapport [15], Knowledge co-construction [16–18]. Having made the above classification, Praharaj et al. built a meeting analysis system that focuses specifically on the epistemic space [5]. They analyzed the collected group speech data to perform role-based profiling and visualized it with the help of a dashboard. Our study shares with theirs the use of a dashboard to implement a meeting review system.

Chandrasegaran et al. developed "TalkTraces" [6] as a support system focusing on the verbal contents of meetings. TalkTraces provides a real-time visualization that helps teams identify themes in their discussions and obtain a sense of the agenda items covered. It integrates speech recognition, natural language processing, topic modeling, and word embedding; though it shares features with smart meeting rooms, it is designed to exist as a peripheral meeting component. TalkTraces uses topic modeling to identify themes within the discussions and word embeddings to compute the discussion's "relatedness" to items in the meeting agenda. The system uses LDA [19] for the topic model and ConceptNet Numberbatch [20] for the word embedding. These two processes are performed separately as two iterations at the time of use. In iteration 1, the initial meeting is transcribed


using speech-to-text algorithms and, with LDA, used to train a topic model; the model is then used to categorize utterances from subsequent meetings. Iteration 2 additionally uses word embeddings to compute speech-to-agenda-item similarities. In both iterations, the visualizations update in real time to represent the results of these computations. The visualization design of TalkTraces shares characteristics with Pousman and Stasko's taxonomy [21], in that it steadily conveys a high volume of information.

3 Proposed Method

This section describes the functionality of the system, an application that performs time-series multidimensional dialogue feature visualization on group work recording data and live transcription data to show the results of meeting analysis. Figure 1 shows an example of an application screen, and Fig. 2 shows a system configuration diagram of the application.

Fig. 1 Example of an application screen. The Meeting Video Block shows the meeting video. The Transcription Block shows the speaker, time, and sentences of each speech. The User Data Block shows information about speech, such as the amount of conversation and negative–positive values, aggregated per user. The Graph Block shows fluctuations in the amount of speech, negative–positive values, etc. The Filter Block allows filtering by speaker, time, and speech content


Fig. 2 A system configuration diagram of the application

3.1 Transcription Decoder

The Transcription Decoder converts VTT files, the transcriptions output by Teams [22], Microsoft's conference application, into data that can be analyzed by this application. For each speech, the "speech start time, speech end time, speaker, and speech sentence" are acquired and defined as "Talk Data."
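A sketch of such a decoder for the cue/speaker layout that Teams commonly writes into its live-transcription VTT files (the <v Speaker>text</v> voice tag); the regular expressions and field names are our assumptions, as the paper does not show the file format:

```python
import re

TIME_LINE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")
SPEAKER_LINE = re.compile(r"<v ([^>]+)>(.*?)</v>")

def parse_vtt(path):
    """Extract (start, end, speaker, sentence) records -- the "Talk Data" --
    from a Teams live-transcription VTT file; other VTT exports may
    use a different speaker format."""
    talks, start, end = [], None, None
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = TIME_LINE.search(line)
            if m:
                start, end = m.groups()
                continue
            s = SPEAKER_LINE.search(line)
            if s and start:
                talks.append({"start": start, "end": end,
                              "speaker": s.group(1), "text": s.group(2)})
    return talks
```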


3.2 Transcript Video Sync

Transcript Video Sync eliminates the time difference between the transcript and the meeting video. This function eliminates the offset that occurs when the file is generated and allows the transcript to be displayed in sync with the video. It provides a method to synchronize the start time of a statement in the group work video with the content of that statement by specifying the text to be displayed. This makes it possible to easily correct the offset between the live transcription data and the meeting video without having to measure how large it is.

3.3 Extracting Metadata

Extracting Metadata uses the acquired "Talk Data" to extract metadata, enabling analysis of the speech from various perspectives. The data extracted are the "number of characters, speech time, and negative–positive value" of each speech. The extracted data is appended to the speech data, and the resulting data is defined as "Extended Talk Data."

3.3.1 Extracting the Number of Characters

Extracting the Number of Characters extracts the number of characters in each speech using the speech sentences in the "Talk Data." By extracting the number of characters of a speech, it is possible to analyze whether it was a simple reaction or a long-winded explanation or impression.

3.3.2 Extraction of Speech Time

Extraction of Speech Time extracts the duration of each speech using the speech start time and speech end time in the "Talk Data." This makes it possible to analyze how much time participants spent talking and how much time they spent not talking.

3.3.3 Extraction of Negative–Positive Values

Extraction of Negative–Positive Values extracts a negative–positive value for each speech using the speech sentences in the "Talk Data." This function can be used to analyze whether members make more positive or more negative comments, and to analyze the fluctuation of negative–positive values during group work.

Table 1 Definition of negative–positive value

References              Definition             Negative–positive value
Inflectable word [25]   Negative experiences   −1
                        Negative evaluation    −1
                        Positive experiences   1
                        Positive evaluation    1
Noun [26]               ?e                     0
                        ?p                     0
                        ?p?e                   0
                        ?p?n                   0
                        a                      0
                        e                      0
                        n                      −1
                        o                      0
                        p                      1

In this function, certain parts of speech are extracted using a word segmentation method in the calculation of negative–positive values. For this extraction, MeCab [23], a morphological analysis engine, is used with NEologd [24] as the dictionary for word segmentation. The parts of speech extracted are nouns, adjectives, verbs, and adjectival verbs, which are considered to carry negative–positive values. The function then assigns a negative–positive value to each word in the segmented data using the Japanese Polarity Dictionary [25, 26]. For inflectable words, the negative–positive value is set to −1 or 1 using the definitions of negative–positive experience and evaluation, respectively. For nouns, the negative–positive value is set to −1 and 1 for "n" and "p," which are defined as negative and positive; the other defined values are set to 0, because their polarity is not clear and there are only a few of them. The average of those values is defined as the negative–positive value of the speech. The negative–positive values assigned to the evaluation polarity criteria of the Japanese Polarity Dictionary are listed in Table 1; a smaller value indicates a more negative speech.
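A sketch of this scoring, assuming MeCab is installed with a NEologd-style dictionary and that the polarity lexicon has already been loaded into a plain word-to-{−1, 0, 1} dict (loading the published lexicon files is omitted; the part-of-speech set mirrors the one named in the text):

```python
import MeCab  # morphological analyzer used by the system

tagger = MeCab.Tagger()  # assumes a NEologd-style dictionary is configured

# Content words named in the text: nouns, adjectives, verbs, adjectival verbs
CONTENT_POS = {"名詞", "形容詞", "動詞", "形容動詞"}

def negative_positive_value(sentence, polarity):
    """Average the per-word polarity (-1/0/1) over the content words of
    one speech; `polarity` is a dict built from the Japanese Polarity
    Dictionary. Returns 0.0 when no content word is found."""
    scores = []
    node = tagger.parseToNode(sentence)
    while node:
        pos = node.feature.split(",")[0]
        if pos in CONTENT_POS:
            scores.append(polarity.get(node.surface, 0))
        node = node.next
    return sum(scores) / len(scores) if scores else 0.0
```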

3.4 Filter

The Filter allows filtering so that only speech sentences that match user-defined conditions are used in the calculations. This function uses the speaker, time, and speech content for filtering.


Fig. 3 A graph of the variation in the number of characters (x: time, y: number of characters)

With this function, it is possible to visualize the statements of each member, calculate the number of speeches and negative–positive values of each member within a set time range, and visualize the statements and timing of each member for a certain word.

3.5 Segmentation

Segmentation divides the "Extended Talk Data" of the entire group work into minute-by-minute segments and generates "Segmented Talk Data," enabling time-series analysis of the group work. This function divides the "Extended Talk Data" in terms of speech content, the number of characters, and the number of speeches. This section describes the division method for each kind of data.

3.5.1 Extraction of Speech Amount and Number of Characters

Extraction of speech amount and number of characters calculates the sum of the number of characters and the number of speeches within a certain range. Figure 3 shows a graph of the variation in the number of characters obtained with this function. As shown in Fig. 3, fluctuations in the number of speeches and the number of characters in a meeting can be displayed graphically, making it possible to analyze at what point in time the meeting is active.
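With the talk data in a pandas DataFrame, the minute-by-minute aggregation can be sketched as follows (the column names are our assumptions):

```python
import pandas as pd

def per_minute_counts(talks):
    """Sum the number of characters and the number of speeches per
    one-minute segment. `talks` is assumed to have a datetime 'start'
    column and a 'text' column."""
    talks = talks.assign(chars=talks["text"].str.len())
    counts = talks.set_index("start").resample("1min")["chars"].agg(["sum", "size"])
    counts.columns = ["characters", "speeches"]
    return counts
```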

3.5.2 Extraction of Percentage of Characters Per User

Extraction of the percentage of characters per user calculates each user's share of the number of characters within a certain range. Figure 4 shows the percentage of the number of characters contributed by each member as extracted by this function. As shown in Fig. 4, it is possible to visualize each member's share at a given time and how it changes over time.


Fig. 4 The percentage of the number of characters occupied by members (x: time, y: percentage of the number of characters occupied by members)

Therefore, it is possible to analyze who was actively speaking and how the share of each speaker changed with time.

3.5.3 Extraction of Representativeness and Freshness

Extraction of representativeness and freshness visualizes "Representativeness" and "Freshness" by acquiring and analyzing the content of each speech; the details of the extraction method are described in [4, 27–29]. Freshness is a measure of the percentage of newly uttered words in a segment: the higher the freshness, the more new ideas are being generated and discussed. It is used as a measure of the segment's contribution to the agenda. Representativeness is an index that expresses the importance of each segment within the entire meeting: similarities between segments are weighted based on the words spoken within each segment, and the sum of a segment's similarities with the other segments is its representativeness. Segments with high similarity indicate convergence in the content of the speech, i.e., that a consensus is being reached on the agenda. If the analysis used only one of these indicators, a representative segment might appear to be converging when in reality the speech is stagnant because no new topic emerges; conversely, with freshness alone it is difficult to judge whether new ideas are being generated while the content remains incoherent. Therefore, the accuracy of the analysis is improved by using the two indicators together. A sketch of both measures is given below.
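The two indicators can be sketched as follows for segments given as whitespace-joined token strings (Japanese text would first be tokenized, e.g. with MeCab). Cosine similarity over TF-IDF vectors stands in here for the similarity weighting of the cited papers, which is our simplification:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def freshness(segments):
    """Fraction of words in each segment that have not appeared in any
    earlier segment."""
    seen, scores = set(), []
    for seg in segments:
        words = seg.split()
        new = [w for w in words if w not in seen]
        scores.append(len(new) / len(words) if words else 0.0)
        seen.update(words)
    return scores

def representativeness(segments):
    """Sum of each segment's similarity to all other segments."""
    vectors = TfidfVectorizer().fit_transform(segments)
    sim = cosine_similarity(vectors)
    np.fill_diagonal(sim, 0.0)  # exclude self-similarity
    return sim.sum(axis=1)
```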

3.6 Extracting User Data

Extracting User Data extracts the characteristics of each member from the "Extended Talk Data." This function aggregates a total of nine items: the number of speeches, the total number of speeches, the total speech time per user, the speech occupancy rate, the average number of characters per speech, the speech time per speech, the speech speed, the average negative–positive value, and the average negative–positive value over speeches that have a negative–positive value.


This function enables comparison of members across these characteristics.

3.7 Extracting Statistics

Extracting Statistics extracts statistical information from the transcription. Three types of statistical information are provided: the number of speakers, the number of speeches, and the number of characters. This enables a rough analysis of the entire meeting.

3.8 Meeting Analysis Interface

3.8.1 Screen Layout

Figure 1 shows the layout of the implemented application. A "Group Work Video Block" is placed in the upper left corner, allowing the user to watch the group work video. A "Transcriptions Block" is placed in the upper right corner, where the "Talk Data" can be viewed in a timeline format. Switching a tab in the "Transcriptions Block" changes the display to the "User Data Block," in which the extracted characteristics of each member can be inspected and members can be compared. The "Statistics Block" is placed below the "Transcriptions Block" and the "User Data Block," allowing the user to check the statistics of the entire meeting. A "Graph Block" is placed in the lower left corner, where graphs generated from the "Segmented Talk Data" can be checked: the variation in the number of characters, the variation in the number of speeches, the number of characters per speaker, the negative–positive values, and the "Freshness" and "Representativeness" of the speeches. A "Filter Block" is placed in the lower right corner, allowing filtering by speaker, time, and speech content. Figure 5 shows an example of the display of speech data in the "Transcriptions Block." The color assigned to each member is displayed to improve visibility and to identify members in the graphs.

3.8.2 Search and Filtering Function

The search and filtering function allows the user to apply the filtering described above. Figures 6 and 7 show the results of filtering speeches that contain the word "space." In Fig. 6, only the relevant speeches are displayed in the "Transcriptions Block," allowing the user to understand the content of the speeches containing the word.


Fig. 5 An example of the display of speech data in the "Transcriptions Block." The block shows the user color, user name, start time, and speech content

By temporarily removing the filter after selecting the relevant speech, it is also possible to grasp the contents of the previous and following speeches. In Fig. 7, the "Graph Block" shows the time of each relevant speech as a bar in the corresponding member's color, indicating at what time each member spoke content matching the selected filter.

Fig. 6 The results of filtering conversations in the "Transcriptions Block." Here the word "space" is filtered; speeches containing the word are extracted and displayed with the word highlighted


Fig. 7 The results of filtering conversations in the "Graph Block." Bars appear in the graph when the filtering function is used. Here the word "space" is filtered, and the user color of the member speaking the speech containing the word is displayed at the time of speaking (x: time, y: number of characters)

3.8.3 Time-Synchronized Display

This function synchronizes the display time across the "Group Work Video Block," "Transcriptions Block," and "Graph Block." It allows the user to check the transcription at the current playback time of the video, play the video near a transcription of interest, and check the video and transcription near a region of interest in the graph, such as a rapidly changing area.

4 Visualization Examples

4.1 Usage Environment

Two groups of three university students each were created, and each group conducted group work using Microsoft Teams [22]; the recorded data and live transcription data were analyzed with this system. Each student logged into Teams and participated in the group work from his or her own PC. Each group had 15 min to discuss the topic, "What items would be important if a spaceship made an emergency landing?"

4.2 Consideration of Use Results

Figure 8 shows a graph of the character-count variation in Group A. It can be seen in Fig. 8 that the number of characters decreases significantly at about the 2-min mark. This can be assumed to be because the discussion proceeded in a manner in which each individual first ranked the items for a certain period of time, followed by a discussion. Thus, a time-series analysis of the number of characters can be used to confirm fluctuations in the liveliness of the group work.


Fig. 8 A graph of character variation in Group A (x: time, y: number of characters). The figure shows that there is little discussion during the first few minutes and active discussion once that time is up

Figure 9 shows a graph of the results of filtering for "water (水)," one of the 15 items given to each group. The "Graph Block" in Fig. 9 shows that in Group A the discussion of this item is concentrated in a short period, while in Group B it is spread throughout. By filtering for specific words in this way, the flow of discussion in group work can be visualized, and by comparing multiple group work sessions, differences in how group work proceeds can be visualized. Figure 10 shows the results of filtering for the words "なるほど (I see), うん (yeah), ハイ (yes), 確かに (indeed)," which indicate agreement, in each group. In Fig. 10, the graph block shows that the members in blue in Group A and the members in red in Group B most often expressed agreement. In addition, the "User Data Block" shows that the blue member spoke words of agreement 27 times, more than the second-place user

Fig. 9 A graph of the results of filtering for "water (水)" (x: time, y: number of characters). In Group A the discussion of this word is concentrated in a short period of time, while in Group B it continues for a longer period


Fig. 10 The results of filtering for the words indicating agreement (x: time, y: number of characters). Both Group A and Group B actively speak words indicating agreement; the blue members in Graph A and the orange members in Graph B speak them most often

(17 times). The member in red spoke words of agreement 32 times, more than the second place (14 times). By limiting the filtering to words of agreement and seeing at which time agreement was expressed, it is possible to visualize what kind of content was agreed upon most often. Figure 11 shows the "Freshness and Representativeness" graph for Group B. The freshness graph in Fig. 11 consistently shows high values, while the representativeness graph shows high values at the end of the group work; this confirms that the participants discussed new topics from the beginning to the middle of the work and summarized their ideas at the end. Using the graphs of freshness and representativeness, it is possible to visualize the divergent phase at times of high freshness and low representativeness, and the convergent phase at times of low freshness and high representativeness.

4.3 Discussion

The visualization examples show that it is possible to understand the fluctuations in the activity of the discussion in a meeting and how much a given word is spoken at any given time. Also, by changing the words used in the Filter, it is possible to identify the center of the discussion and its turning points.


Fig. 11 The "Freshness and Representativeness" graph for Group B (top: x: time, y: freshness; bottom: x: time, y: representativeness). At the end of the group work, freshness is low and representativeness is high, indicating that the discussion was being summarized at that time

In addition, the graphs of representativeness and freshness allowed us to understand the generation of new ideas in the discussions and the consolidation of the discussions. As described above, detailed feedback on group work can be obtained by reflecting on it with this application; using this information, group work methods can be reviewed and improved.

5 Conclusion

In this paper, we presented an application that shows the results of analyzing the frequency and content of each user's conversations by performing time-series multidimensional dialogue feature visualization on live transcription data from group work. Using the system, we found it possible to visualize how discussions were conducted, the timing of the topics discussed, and the freshness and representativeness of the discussions. By using the visualized information to look back on group work in chronological order, we consider it possible to grasp both the good points of the group work and the points needing improvement. Three issues remain for future work. The first is to conduct usefulness experiments: although we were able to build the visualization, we have not measured the usefulness of the system. Therefore, it


is necessary to define what kind of group work is good and to analyze the improvement of group work with and without this system in order to determine its usefulness. Second, it is necessary to visualize the results of the analysis using topic modeling. TalkTraces [6] and Okada's work [29] use topic modeling to create word themes and visualize which topics are assigned to each conversation and each word. By implementing this function in our system, we believe it will be possible to visualize topic transition points and the percentage of topics discussed by each member in a way that is easier for users to understand; combined with the rest of the system, it would also be possible to visualize which topics were agreed upon most often and which were discussed most actively. The third is the use of face recognition technology: by capturing facial movements and expressions during group work, it is possible to capture nods and emotions, which would increase the amount of information in the time-series analysis.

References

1. J.P. Guilford, The Nature of Human Intelligence (McGraw-Hill, New York, 1967)
2. A.F. Osborn, Applied Imagination: Principles and Procedures of Creative Problem-Solving (Charles Scribner's Sons, 1953)
3. T. Brown, Design thinking. Harv. Bus. Rev. 86(6), 84 (2008)
4. R. Okada, T. Nakanishi, Y. Tanaka, Y. Ogasawara, K. Ohashi, A time series structure analysis method of a meeting using text data and a visualization method of state transitions. N. Gener. Comput. 37, 113–137 (2019)
5. S. Praharaj, M. Scheffel, M. Schmitz, M. Specht, H. Drachsler, Towards collaborative convergence: quantifying collaboration quality with automated co-located collaboration analytics, in LAK22: 12th International Learning Analytics and Knowledge Conference (2022), pp. 358–369
6. S. Chandrasegaran, C. Bryan, H. Shidara, T.Y. Chuang, K.L. Ma, TalkTraces: real-time capture and visualization of verbal content in meetings, in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (2019), pp. 1–14
7. S. Praharaj, M. Scheffel, M. Schmitz, M. Specht, H. Drachsler, Towards automatic collaboration analytics for group speech data using learning analytics. Sensors 21(9), 3156 (2021)
8. T. Kim, A. Chang, L. Holland, A.S. Pentland, Meeting mediator: enhancing group collaboration using sociometric feedback, in Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (ACM, 2008), pp. 457–466
9. K. Bachour, F. Kaplan, P. Dillenbourg, An interactive table for supporting participation balance in face-to-face collaborative learning. IEEE Trans. Learn. Technol. 3(3), 203–213 (2010)
10. T. Bergstrom, K. Karahalios, Conversation clock: visualizing audio patterns in co-located groups, in 2007 40th Annual Hawaii International Conference on System Sciences (HICSS'07) (IEEE, 2007), p. 78
11. S. Praharaj, M. Scheffel, H. Drachsler, M. Specht, Group coach for co-located collaboration, in European Conference on Technology Enhanced Learning (Springer, 2019), pp. 732–736
12. J. Kim, K.P. Truong, V. Charisi, C. Zaga, M. Lohse, D. Heylen, V. Evers, Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory study, in Sixteenth Annual Conference of the International Speech Communication Association (2015)
13. J. Zhou, K. Hang, S. Oviatt, K. Yu, F. Chen, Combining empirical and machine learning techniques to predict math expertise using pen signal features, in Proceedings of the 2014 ACM Workshop on Multimodal Learning Analytics Workshop and Grand Challenge (ACM, 2014), pp. 29–36


14. S. Oviatt, K. Hang, J. Zhou, F. Chen, Spoken interruptions signal productive problem solving and domain expertise in mathematics, in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ACM, 2015), pp. 311–318
15. N. Lubold, H. Pon-Barry, Acoustic-prosodic entrainment and rapport in collaborative learning dialogues, in Proceedings of the 2014 ACM Workshop on Multimodal Learning Analytics Workshop and Grand Challenge (ACM, 2014), pp. 5–12
16. H. Jeong, M.T. Chi, Knowledge convergence and collaborative learning. Instr. Sci. 35(4), 287–315 (2007)
17. S. Teasley, F. Fischer, P. Dillenbourg, M. Kapur, M. Chi, A. Weinberger, K. Stegmann, Cognitive convergence in collaborative learning (2008), https://repository.isls.org//handle/1/3275
18. B. Huber, S. Shieber, K.Z. Gajos, Automatically analyzing brainstorming language behavior with Meeter. Proc. ACM Hum.-Comput. Interact. 3(CSCW), 1–17 (2019)
19. D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation. J. Mach. Learn. Res. 3(4/5), 993–1022 (2003)
20. R. Speer, J. Chin, C. Havasi, ConceptNet 5.5: an open multilingual graph of general knowledge, in AAAI Conference on Artificial Intelligence (2017), pp. 4444–4451
21. Z. Pousman, J. Stasko, A taxonomy of ambient information systems: four patterns of design, in Proceedings of the Working Conference on Advanced Visual Interfaces (2006), pp. 67–74
22. Microsoft Teams—video conferencing, meetings, calling, https://www.microsoft.com/en-us/microsoft-teams/group-chat-software. Accessed 5 Sept 2022
23. MeCab: yet another part-of-speech and morphological analyzer, https://taku910.github.io/mecab/
24. Neologd, https://github.com/neologd/mecab-ipadic-neologd
25. N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, Collecting evaluative expressions for opinion extraction. J. Nat. Lang. Process. 12(3), 203–222 (2005)
26. M. Higashiyama, K. Inui, Y. Matsumoto, Learning sentiment of nouns from selectional preferences of verbs and adjectives, in Proceedings of the 14th Annual Meeting of the Association for Natural Language Processing (2008), pp. 584–587
27. R. Okada, T. Nakanishi, Y. Tanaka, Y. Ogasawara, K. Ohashi, A visualization method of relationships among topics in a series of meetings. Inf. Eng. Express 3(4), 115–124 (2017)
28. T. Nakanishi, R. Okada, Y. Tanaka, Y. Ogasawara, K. Ohashi, A topic extraction method on the flow of conversation in meetings, in 6th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2017, Hamamatsu, Japan, 9–13 July 2017
29. R. Okada, T. Nakanishi, Y. Tanaka, Y. Ogasawara, K. Ohashi, A topic structuration method on time series for a meeting from text data, in Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (Springer, Cham, 2017), pp. 45–59

A Study on the Usage Prediction Model of Demand Response Resource Using Machine Learning Hyeonju Park, Chungku Han, Kilsang Yoo, and Gwangyong Gim

Abstract In the Demand Response (DR) market, it is very important to predict the usage of demand response resources accurately in real time based on the power usage per minute. There are 60 per-minute readings in an hour, so the data do not draw a continuous trend line but repeatedly move back and forth among a few usage values within a certain range. This acts as a constraint in developing a prediction model. Based on one month of data, the LSTM (Long Short-Term Memory) model had the best performance indicator values when the window size of the input data and the number of predicted results were set to (60/10). The Gated Recurrent Unit (GRU) prediction model showed slightly better results than the LSTM model. In a comparative experiment in which the training dataset was increased to 2, 3, and 6 months, the performance indices were lower than those for 1 month. In the Autoregressive Integrated Moving Average (ARIMA) model, differences in fit-model indicators and periodicity were compared and tested for each of five individual months. The ARIMA (p,d,q) orders for the entire time series differed from month to month, and differencing orders appeared in months where seasons change. In ARIMA (p,d,q)(P,D,Q) for the period pattern, the P and Q orders were 1 or 2, indicating similar periodicity within a certain range. When data at minutes 0, 15, 30, and 45 were extracted from a 36-month dataset for comparative verification between predictive models, the ARIMA model showed better performance indicator values. This study is meaningful in adopting a new approach to real-time prediction based on power consumption per minute within a short period of 1–2 months, in contrast to existing hourly or daily forecasts.

Keywords Machine learning · Demand response resource · Usage prediction model · LSTM · ARIMA

H. Park · C. Han · K. Yoo · G. Gim (B)
Department of IT Policy and Management, Soongsil University, Seoul, South Korea
e-mail: [email protected]
H. Park
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_7


1 Introduction

As instability in power supply and demand gradually increases due to growing obstacles to power supply and the difficulty of predicting power demand [1], it is necessary to reduce or distribute demand through demand-side management in parallel with expanding supply capacity. Accordingly, a paradigm shift in power policy toward efficient supply and demand, including demand management, was required [2]. The Demand Response (DR) market pays compensation when industries, buildings, and apartment complexes reduce electricity demand by a pre-contracted reduction capacity for a specific time (1–4 h) according to instructions from the government (Power Exchange). Electricity consumers cannot participate directly; they can participate only as demand response resources, groups of tens to hundreds of electricity consumers, through DR aggregators. The demand management business operator is responsible for managing and operating each demand response resource so that it achieves a reduction in power consumption above the capacity registered with the power exchange. Most importantly, it is crucial to collect electricity readings per minute, make real-time predictions of usage for the coming hour, and manage the portfolio of electricity consumers in each resource so that the achieved reduction does not fall short of the pre-arranged reduction capacity. This study seeks to find a model suitable for the characteristics of per-minute power consumption using machine learning and to identify optimal conditions through comparative experiments on the factors to be considered when constructing a predictive model. Power usage data for about 36 months, from November 2017 to October 2020, was used. This study develops and applies the LSTM (Long Short-Term Memory) model and the ARIMA model, which are most often used for time series prediction, and analyzes the performance and differences of each model. The LSTM prediction model was developed using Python-based libraries, and the ARIMA prediction model was developed using the IBM SPSS Statistics Version 22 statistical analysis tool. For the evaluation of the predictive models, the Mean Absolute Percentage Error (MAPE), which is frequently used for time series models, was used as the main performance indicator, with the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) as auxiliary indicators.
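For reference, the three evaluation indicators can be computed as in the following minimal sketch; the paper does not include its evaluation code, so this is only an illustration:

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def mae(actual, predicted):
    """Mean Absolute Error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))
```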

2 Related Study

2.1 Predicting Power Demand Based on Time Series Analysis

In a time series model, a time series is defined as a series of data listed at regular time intervals and refers to data recorded over time. It also refers to a method of predicting power demand by creating an analysis model using these time series data


and predicting future values [3]. Power demand is representative time-series data that fluctuates over time and across seasons and has certain patterns of trends or cycles. The ARIMA model represents the current observation as a function of past observations and error terms [4]. In a model expressed in the form ARIMA (p,d,q), p represents the autoregressive order, q the moving average order, and d the differencing order [5]. Son [6] noted that the ARIMA model is effective for analyzing time series with seasonal variation and for short-term prediction. Park [7] confirmed that performance improved when power demand was predicted with a fused model that sums the predicted value calculated by an AR (Auto Regressive) model and an error correction estimated by an ANFIS-based neuro-fuzzy model. Son [6] also confirmed that the predictive performance of a model using time series cluster analysis decreased significantly after 4 days, because the combined prediction weights applied at the start of the prediction become inaccurate over time. Jo [8] conducted an empirical study on an algorithm for predicting power demand based on weather information using neighborhood ("Dong-Nae") forecasts; compared with the exponential smoothing model of the power exchange, accuracy was greatly improved in summer and winter. Lee [9] predicted university campus power usage with models including the ARIMA time series model, focusing on cooling and heating demand.

Looking at foreign research cases, Hunt et al. [10] performed major energy demand predictions for the UK based on semi-statistical cycle pattern analysis, showing results similar to Winters' exponential smoothing method. Comparing a seasonal ARIMA model with a simple AR model, Ackerman [11] showed that the seasonal model, with the values attributable to power distribution devices removed, produced better results than the simple AR model. Bakhat et al. [12] applied the GARCH (Generalized Auto Regressive Conditional Heteroskedasticity) model and the ARMAX (Auto Regressive Moving Average model with Exogenous inputs) model in a study on daily power demand. Taylor [13] showed better prediction performance with a modified Holt-Winters exponential smoothing model than with the existing Holt-Winters model or a dual-season ARIMA model, and Taylor [14] further developed an ultra-short-term triple-season model.
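As an illustration of the ARIMA (p,d,q) notation above, the following is a minimal Python sketch using statsmodels; the study itself fitted its ARIMA models in IBM SPSS, and the file and column names here are assumptions:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Assumed input: a CSV of per-minute power usage indexed by timestamp.
usage = pd.read_csv("power_usage.csv", index_col=0, parse_dates=True)["kwh"]

# order=(3, 0, 8): AR order p=3, differencing d=0, MA order q=8 --
# one of the fitted orders reported later in Table 6 (August 2020).
result = ARIMA(usage, order=(3, 0, 8)).fit()
forecast = result.forecast(steps=10)  # predict the next 10 min
```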

2.2 Forecasting Power Demand Based on Regression Analysis

Because of the relationship between weather factors and power demand, regression models with weather factors as independent variables and power demand as the dependent variable have been widely used in predicting power demand. Typically, the SVR (Support Vector Regression) model is widely used, and the multiple linear regression model uses two or more independent variables to explain the dependent variable, estimating the regression coefficients from past data [4].


Baek [15] showed a significant improvement in prediction performance when applying the bagging technique, compared to individual prediction models, in a study that applied bagging and boosting to Naive, ANN (Artificial Neural Network), and SVM (Support Vector Machine) prediction models. Kim [16] compared a multiple regression model and a SARIMA model for predicting weekly maximum power demand. The SARIMA model, which uses only past observations of a single variable, achieved higher prediction accuracy because the multiple regression model's multivariate inputs (temperature and GDP) carry their own prediction errors, which the SARIMA model avoids. Looking at foreign research cases, Yumurtaci et al. [17] used a linear regression model to predict power usage from population and per-capita consumption rates. Lam et al. [18] studied residential and commercial power consumption patterns in Hong Kong using multiple regression models. Hong [19] and Fan et al. [20] analyzed Chinese data and showed excellent results with the SVR (Support Vector Regression) model when little suitable data was available.

2.3 Prediction of Electricity Demand Based on Artificial Neural Network Analysis

Artificial neural networks use multi-layer perceptron structures that mimic the neural structures of living organisms [21]. The nonlinear characteristics of major variables related to power demand or usage are modeled with nonlinear transformation techniques that summarize key features in large amounts of data, or relationships between data, through a multi-layer processing hierarchy [22]. The recurrent neural network is a structure that can reflect the temporal relationships of variables related to power demand prediction [4]. Since parameters are shared across all time steps, the standard error backpropagation method can be applied [23]. However, recurrent neural networks suffer from poor learning due to exploding or vanishing gradients when trained on long time series [24]. The Long Short-Term Memory (LSTM) model handles inputs and outputs with a structure that stores previous information in a hidden layer [4]. With its structure of forget, input, and output gates, it can mitigate the long-term memory loss problem, and it is widely used in short-term power demand prediction. Park [4] improved the performance of a prediction model by selecting similar dates based on reinforcement learning as a power demand prediction technique. Moon et al. [25] found that a model applying an ANN (Artificial Neural Network) showed more accurate prediction performance than SVR (Support Vector Regression) in a study of power usage prediction for university campus buildings. Song et al. [26] found that per-consumer predictions for reliability demand response using the LSTM model were better. Kim et al. [27] greatly improved the accuracy of a campus power prediction model through a


stepwise optimization method based on CNN-LSTM deep learning, even though only a single input variable was used. Kim et al. [28] showed high predictive performance for holidays with an RNN-LSTM model in a power demand prediction study considering holiday information. Looking at foreign research cases, Sozen et al. [29] used an artificial neural network model to predict energy demand in Turkey. Comparing and evaluating several hybrid models for electricity and oil consumption prediction, Pao [30] showed that the WARCH-artificial neural network model was the best. Garcia-Ascanio et al. [31] showed that the interval MultiLayer Perceptron (iMLP) model is more suitable than the Vector AutoRegressive (VAR) model for Spain's monthly power consumption prediction.

3 Research Model

3.1 Research Model Summary

The overall process of the research method is shown in Fig. 1. Actual power usage data were aggregated in real time in units of 1 min. Prediction data are generated periodically at one-minute granularity by the LSTM and ARIMA prediction models. To evaluate prediction accuracy between the two models, the MAPE, RMSE, and MAE indicators were used. This study aims to compare the prediction accuracy of an artificial neural network model with that of a traditional statistical time series model.

Fig. 1 Process of research method


Fig. 2 1 min power usage (60 data/1 h, 240 data/4 h, 1,440 data/24 h)

3.2 Analysis on the Characteristics of the Data

The data are one-minute readings collected through real-time power meters installed specifically for participation in the DR market. The collection period is 36 months, from 2017.11.01 to 2020.10.31, and the dataset consists of the collection time (YYYY-MM-DD HH:MM) and the power usage (kWh) per minute. The power usage per minute shows a trend pattern significantly different from typical hourly or daily power usage. Figure 2 shows that the power consumption per minute exhibits no linear trend at all within one hour; rather, it shows a highly irregular, jumpy pattern. As the number of points increases to 240 over 4 h and 1,440 over 24 h, the values concentrate much more stably within a certain range, showing a periodic pattern. As shown in Fig. 2, the variability and irregularity of the 1-min power usage values are relatively greater than those of power usage accumulated in 1-h units. This characteristic acts as a constraint in developing a model that must predict at one-minute granularity, imposing a structural limitation that can increase the relative prediction error.

4 Experiment and Result

4.1 Experiment Design

The experiments identified the optimal sliding window and number of targets in the deep learning prediction model, compared the three RNN models, analyzed the effect of extending the learning period, and examined the characteristics of the fitted model and periodicity in the ARIMA model. Afterwards, comparative verification between the models was conducted. MAPE is used as the main evaluation index, and the major details of the experimental design are described in Table 1.


Table 1 Classification of experimental design

Division | Experimental data | Deep learning model | ARIMA model
Base month | 1 month: 2020.08 (44,638 data) | Window/target; model comparison (LSTM, GRU, RNN); MAPE, RMSE, MAE | Fit model search; seasonality; MAPE, RMSE, MAE
Extension of train period | 2 months: 2020.07–2020.08 (89,278 data); 3 months: 2020.06–2020.08 (132,477 data); 6 months: 2020.03–2020.08 (264,490 data) | Comparative verification; MAPE, RMSE, MAE | Comparative verification; MAPE, RMSE, MAE
Comparative verification of predictive models | 36-month data: 1,565,951; selected minutes: 0, 15, 30, 45; selected data: 104,373 (6.7%); train/test: 33 months/3 months | Window/target; model comparison (LSTM, GRU, RNN); MAPE, RMSE, MAE | Fit model search; seasonality; MAPE, RMSE, MAE

4.2 Deep Learning Model Experiment Results

Training, verification, and test data were distributed at a ratio of 6:2:2 for model learning, based on the base month of August 2020. The power usage per minute is continuous time-series data, and model performance varies with the window size of the input data used for prediction and the number of prediction results required. It is important to account for the fact that predicting demand response resource usage requires 60 predictions per hour, the unit of DR transaction time. The baseline configuration for comparative analysis used the 60 power usage values of the previous 60 min as the input window and 10 predicted values (the next 10 min) as the target. The results are shown in Tables 2 and 3.

Table 2 Performance comparison (based on input data window)

Division | 60/10 | 90/10 | 120/10 | 40/10 | 20/10
MAPE | 1.1995 | 1.2966 | 1.2663 | 1.3317 | 1.1240
RMSE | 1.6578 | 1.7988 | 1.7267 | 1.8356 | 1.5401
MAE | 1.3416 | 1.4528 | 1.4060 | 1.4940 | 1.2543
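A minimal sketch of the 60-input/10-output sliding-window setup evaluated in Table 2 follows; the input file, layer size, and training settings are illustrative assumptions rather than the study's exact configuration:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, window=60, target=10):
    """Slice a 1-D series into (window)-step inputs and (target)-step outputs."""
    X, y = [], []
    for i in range(len(series) - window - target + 1):
        X.append(series[i:i + window])
        y.append(series[i + window:i + window + target])
    return np.array(X)[..., np.newaxis], np.array(y)

series = np.loadtxt("usage_minutely.txt")  # assumed: one kWh value per line
X, y = make_windows(series, window=60, target=10)

model = Sequential([
    LSTM(64, input_shape=(60, 1)),  # 60 past minutes as input
    Dense(10),                      # the next 10 min as output
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, validation_split=0.25, epochs=20, batch_size=128)
```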


Table 3 Performance comparison (based on the number of targets)

Division | 60/10 | 60/5 | 60/15 | 60/20
MAPE | 1.1995 | 1.4702 | 1.2389 | 1.5058
RMSE | 1.6578 | 1.9936 | 1.7180 | 2.0330
MAE | 1.3416 | 1.6269 | 1.3804 | 1.6645

Table 4 Performance comparison (extension of train period)

Division | 1 month | 2 months | 3 months | 6 months
MAPE | 1.1995 | 1.3293 | 1.3799 | 1.2292
RMSE | 1.6578 | 1.7909 | 1.8552 | 1.6649
MAE | 1.3416 | 1.4709 | 1.5254 | 1.3659

Table 5 Performance comparison (RNN model)

Division | LSTM | GRU | SimpleRNN
MAPE | 1.1995 | 1.1549 | 1.5077
RMSE | 1.6578 | 1.5895 | 2.0235
MAE | 1.3416 | 1.2893 | 1.6665

Next, the effect of expanding only the training dataset from 1 month to 2, 3, and 6 months was analyzed, with the same test dataset as the base month of August 2020. In predicting per-minute power usage with a deep learning model, the correlation between a longer learning period and more accurate predictions is not constant; rather, setting an appropriate learning period is important. The results are shown in Table 4. All performance indicators showed slightly better results for the Gated Recurrent Unit (GRU) prediction model than for the LSTM prediction model, as shown in Table 5, which is also based on August 2020.

4.3 ARIMA Model Experiment Result

Table 6 describes the fit models and performance indicators obtained by running the ARIMA model for five individual months. In the ARIMA (p,d,q)(P,D,Q) notation of the fitted model, the (p,d,q) part for the entire time series produced different results for each month. In November 2019, December 2019, and June 2020, the differencing order was 1, which can be interpreted to mean that the seasonal shift to winter or summer was reflected as a


trend in the power consumption per minute. The (P,D,Q) orders for the period pattern differ slightly from month to month, but the AR and MA orders are 1 or 2, so similar periodicity appears within a certain range. The normal R square indicates the explanatory power of the fitted predictive model; it is very high for July 2020 and August 2020. MAPE shows similar values that do not differ significantly from month to month, except for July 2020. The experimental results for the changes in the fitted-model and periodicity indices when the training period is extended are shown in Table 7. Although the p, d, and q orders for the entire time series varied, the P, D, and Q orders for the cycle pattern were 1 or 2, similar to the one-month results. MAPE did not show a significant difference, ranging between 1.039 and 1.244.

4.4 Comparative Verification of Prediction Model

The data extracted from the 36-month period (readings at minutes 0, 15, 30, and 45) correspond to about 6.7% of the total data. As described in Table 1, the data for the first 33 months were used as training data and the data for the final 3 months as test data. The deep learning results are shown in Tables 8 and 9. As shown in Table 10, the ARIMA experiments on the extracted data showed that data continuity was greatly diluted and the fitted cycle pattern was (0,0,0); performance on the test data was better than on the training data. The final results of the deep learning model and the ARIMA model are compared in Table 11. The ARIMA model produced better values on all performance indicators. This reflects the stability of the ARIMA model in traditional long-term time series analysis and prediction, as well as the limitation that the deep learning model in this study was not optimized through extensive hyperparameter tuning.
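A sketch of this extraction step follows, under the stated assumptions (a per-minute CSV with a `timestamp` column; the exact file layout is not given in the paper). Keeping only minutes 0, 15, 30, and 45 retains 4 of every 60 readings, about 6.7%:

```python
import pandas as pd

df = pd.read_csv("usage_36_months.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# Keep only readings taken at minutes 0, 15, 30, and 45.
quarter_hour = df[df.index.minute.isin([0, 15, 30, 45])]

train = quarter_hour[:"2020-07-31"]   # first 33 months (2017.11-2020.07)
test = quarter_hour["2020-08-01":]    # final 3 months (2020.08-2020.10)
```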


Table 6 Fit model and evaluation index (target month)

Division | 2019.11 | 2019.12 | 2020.06 | 2020.07 | 2020.08
Fit model | ARIMA(0,1,8)(2,0,1) | ARIMA(0,1,8)(1,0,2) | ARIMA(0,1,8)(2,0,1) | ARIMA(2,0,8)(1,0,1) | ARIMA(3,0,8)(1,0,1)
Normal R square | 0.623 | 0.639 | 0.624 | 0.894 | 0.796
MAPE | 1.274 | 1.132 | 1.169 | 0.866 | 1.228
RMSE | 1.571 | 1.461 | 1.642 | 1.192 | 1.619
MAE | 1.063 | 1.047 | 1.110 | 0.948 | 1.027


Table 7 Fit model and evaluation index (extension of train period)

Division | 1 month | 2 months | 3 months | 6 months
Fit model | ARIMA(3,0,8)(1,0,1) | ARIMA(0,0,8)(1,0,2) | ARIMA(2,1,8)(1,0,2) | ARIMA(2,1,7)(2,0,1)
Normal R square | 0.796 | 0.796 | 0.893 | 0.635
MAPE | 1.228 | 1.244 | 1.039 | 1.221
RMSE | 1.619 | 1.702 | 1.477 | 1.664
MAE | 1.027 | 1.206 | 0.974 | 1.005

Table 8 Performance comparison (based on window/target)

Division | 96/6 | 48/6 | 72/6 | 144/6 | 192/6
MAPE | 2.2602 | 2.5247 | 3.3840 | 5.4358 | 89.4967
RMSE | 2.8648 | 3.3638 | 3.8851 | 7.1797 | 96.1853
MAE | 2.0908 | 2.4202 | 3.2322 | 5.6878 | 96.0578

Table 9 Performance comparison (RNN model)

Division | LSTM | GRU | SimpleRNN
MAPE | 2.2602 | 2.3080 | 2.5191
RMSE | 2.8648 | 2.7863 | 3.3070
MAE | 2.0908 | 2.1080 | 2.4017

Table 10 Fit model and evaluation index (extracted data of 36 months)

Division | Train (33 months) | Test (3 months)
Fit model | ARIMA(1,1,7)(0,0,0) | ARIMA(1,1,7)(0,0,0)
Normal R square | 0.293 | 0.558
MAPE | 1.913 | 1.497
RMSE | 2.456 | 1.998
MAE | 1.466 | 1.286

Table 11 Predictive model comparative verification results

Division | Deep learning model | ARIMA model
MAPE | 2.2602 | 1.497
RMSE | 2.8648 | 1.998
MAE | 2.0908 | 1.286


5 Conclusion

An experiment was conducted using a deep learning model, which typically requires a lot of data, and an ARIMA model, which is used for long-term time series analysis, to compare various conditions on several months of per-minute power usage data. In the deep learning model, performance differences arose according to the window/target setting and the training period. In the ARIMA model, the monthly performance indicators were similar, but trends and constant cycle patterns appeared according to seasonal fluctuations. The performance comparison between the two models confirmed the need to optimize the deep learning model through extensive hyperparameter settings, as well as the stability of the ARIMA model, which performed relatively well. Power usage prediction in previous studies was mainly hourly or daily; this study is meaningful in implementing an ultra-short-term, one-minute power usage prediction model and suggests implications for future work.

References

1. H.S. Jung, A study on the factors affecting the intention to use the people's participatory power transaction system for energy prosumer. Doctoral Thesis, Graduate School of Soongsil University (2021)
2. J.G. Choi, A study on economic analysis of small scale demand response resources for the stabilization of power system supply. Doctoral Thesis, Graduate School of Chosun University (2021)
3. E. Erdogdu, Electricity demand analysis using cointegration and ARIMA modelling: a case study of Turkey. Energy Policy 35(2), 1129–1146 (2007)
4. R.J. Park, A study of short-term load forecasting method using similar day selection based on reinforcement learning. Doctoral Thesis, Graduate School of Soongsil University (2020)
5. T.H. Lee, Prediction of Seoul house price index using artificial neural network: focused on the Seoul housing price index. Doctoral Thesis, Graduate School of Chung-Ang University (2018)
6. H.G. Sohn, Electricity demand combined forecasting based on time series clustering analysis. Doctoral Thesis, Graduate School of Chung-Ang University (2016)
7. Y.S. Park, Daily peak power demand forecasting method with fusion model composed of AR and ANFIS. Doctoral Thesis, Graduate School of Korea National University of Transportation (2015)
8. S.W. Jo, Short-term load forecasting algorithm using Dong-Nae forecast for weather. Master Thesis, Graduate School of Soongsil University (2018)
9. J.Y. Lee, Campus electric load forecasting using time series model. Master Thesis, Graduate School of Chung-Ang University (2019)
10. L.C. Hunt, G. Judge, Y. Ninomiya, Underlying trends and seasonality in UK energy demand: a sectoral analysis. Energy Econ. 25(1), 93–118 (2003)
11. G. Ackerman, Short-term load prediction for electric-utility control of generating units, in Comparative Models for Electrical Load Forecasting, ed. by Farmer (1985), pp. 33–42
12. M. Bakhat, J. Rossello, Estimation of tourism-induced electricity consumption: the case study of Balearics Islands, Spain. Energy Econ. 33(3), 437–444 (2011)
13. J.W. Taylor, Short-term electricity demand forecasting using double seasonal exponential smoothing. J. Oper. Res. Soc. 54(8), 799–805 (2003)
14. J.W. Taylor, Triple seasonal methods for short-term electricity demand forecasting. Eur. J. Oper. Res. 204(1), 139–152 (2010)
15. S.J. Baek, Short-term load forecasting with small-scale loads by types using artificial neural network. Master Thesis, Graduate School of Soongsil University (2019)
16. S.Y. Kim, Development of weekly maximum electric load forecasting method for systematic power system planning and operation. Master Thesis, Graduate School of Soongsil University (2013)
17. Z. Yumurtaci, E. Asmaz, Electric energy demand of Turkey for the year 2050. Energy Sources 26(12), 1157–1164 (2004)
18. J.C. Lam, H.L. Tang, D.H. Li, Seasonal variations in residential and commercial sector electricity consumption in Hong Kong. Energy 33(3), 513–523 (2008)


19. W.C. Hong, Electric load forecasting by support vector model. Appl. Math. Model. 33(5), 2444–2454 (2009)
20. S. Fan, L. Chen, W.J. Lee, Machine learning based switching model for electricity load forecasting. Energy Convers. Manage. 49(6), 1331–1344 (2008)
21. V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
22. J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
23. A. Gulli, S. Pal, Deep Learning with Keras (Packt Publishing Ltd, 2017)
24. M.H. Kim, An empirical study on time series nonlinear prediction model using generative adversarial network: focused on epidemics (influenza, COVID-19) and the unemployment rate. Doctoral Thesis, Graduate School of Soongsil University (2021)
25. J.H. Moon, S.H. Jun, J.W. Park, Y.H. Choi, E.H. Hwang, Electric load forecasting scheme for university campus buildings using artificial neural network and support vector regression. Korea Inf. Process. Soc. 5(10), 293–302 (2016)
26. S.J. Song, S.W. Lee, S.S. Choi, J.M. Kang, Estimation of reliability demand response by customer based on reduction history data using LSTM, in The Institute of Electronics and Information Engineers Conference (2018), pp. 1015–1017
27. Y.I. Kim, S.U. Lee, Y.S. Kwon, Proposal of a step-by-step optimized campus power forecast model using CNN-LSTM deep learning. J. Korea Acad.-Ind. Coop. Soc. 21(10), 8–15 (2020)
28. H.S. Kim, H.C. Song, S.K. Ko, B.T. Lee, J.W. Shin, RNN-LSTM based short-term electricity demand forecasting using holiday information, in The Institute of Electronics and Information Engineers Conference (2016), pp. 552–555
29. A. Sozen, E. Arcaklioglu, Prediction of net energy consumption based on economic indicators (GNP and GDP) in Turkey. Energy Policy 35(10), 4981–4992 (2007)
30. H.T. Pao, Forecasting energy consumption in Taiwan using hybrid nonlinear models. Energy 34(10), 1438–1446 (2009)
31. C. Garcia-Ascanio, C. Mate, Electric power demand forecasting using interval time series: a comparison between VAR and iMLP. Energy Policy 38(2), 715–725 (2010)

A Study on AI Profiling Technology of Malicious Code Meta Information Dongcheol Kim, Taeyeon Kim, Jinsool Kim, and Gwangyong Gim

Abstract The global COVID-19 pandemic is accelerating the change to a non-face-to-face society. Non-face-to-face services are expanding in fields such as telecommuting and online education, and cyber threats to infrastructure are becoming more intelligent and advanced by exploiting the vulnerabilities of these services. To identify and detect these security threats, there is a growing need for research that profiles, as indicators, the information extracted through static, dynamic, and detailed analysis of infringement incidents. Research on artificial intelligence-based malicious code detection is also needed to cope with the techniques detected through such profiling and with zero-day attacks. Taking this trend into consideration, this study generates feature vectors that combine the OP-Code and ASM-Code extracted through binary reverse engineering based on a threat hunting model. By developing a learning model over the vectorized data, this study intends to develop an artificial intelligence-based malicious code detection technology that can detect maliciousness and attack techniques and automatically generate profiling information to trace attackers. To overcome the inability to explain detected malicious codes, a limitation of existing artificial intelligence-based malicious code detection programs, TTP (Tactics, Techniques, Procedures) is applied to explain how a malicious code is organized and what damage it does.

Keywords Malicious code · Machine learning · Random forest · MITRE ATT&CK TTPs (Tactics Techniques Procedures) · OP-Code · ASM-Code

D. Kim · T. Kim · J. Kim · G. Gim (B) Soongsil University, Seoul, South Korea e-mail: [email protected] J. Kim e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/ Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_8


1 Introduction

The most commonly used technique among recent cybersecurity threat detection methods is to store malicious code patterns in a database in advance. With appropriate monitoring technologies placed where data flows, such technologies have evolved to identify and respond to threats when data flows or malicious codes matching those patterns are detected. This existing approach has the advantage of detecting quickly and accurately when a sample matches a previously secured pattern. However, detection was often impossible for new and mutant threats whose patterns had not been secured or that bypassed the patterns. Malicious codes are steadily increasing by 10% to 30% every year as ICT infrastructure advances, and the diversity of cyberattacks and the level of damage are increasing in proportion. To respond to this growing number of malicious codes and cybersecurity threats, foundational response technologies have long been researched; nevertheless, massive cybersecurity incidents occur every year, and the scale of the damage continues to increase (Fig. 1).

Fig. 1 Announcement of an increasing number of malware statistics (www.av-test.org)


2 Related Research

2.1 Types and Analysis of Malicious Code Detection Technology

In general, malicious code analysis and detection techniques are divided into static analysis techniques and dynamic analysis techniques [1]. The static analysis method analyzes the assembly code of a file and the structure of the Portable Executable (PE) without executing the executable file under analysis. Dynamic analysis monitors API calls, processes, memory, and network resources by executing the malicious executable file and analyzing its state changes. Dynamic analysis builds and uses virtual environments to protect the operating system [2].

2.2 Malicious Code Static Analysis Detection Technology

Malicious code static analysis detection technology extracts characteristic information through static analysis and detects malicious code by comparing it with a previously confirmed pattern database. Since the analysis is performed without executing the file under analysis, the analysis speed is fast. Overall file structure information such as file size, APIs, and registry information is obtained from the PE [4]. However, if the executable file under analysis is encrypted, the assembly code cannot be extracted, so code analysis is limited.

2.3 Dynamic Malicious Code Analysis Detection Technology

Malicious code dynamic analysis detection technology extracts the specific actions of the file under analysis by executing the actual file in a virtual machine environment. It identifies whether the file is malicious based on the actions observed during a specific time. In addition, a separate environment is needed to protect the operating system from malicious code infection [7].


Table 1 Malicious code analysis technology comparison

Division | Static technique | Dynamic technique
Analysis system size and complexity | Simple | Complex
Analysis time | Fast (within 1 ms) | Slow (within 5 min)
Type of file | All files | Executable files
Avoidance methods | Many | Many
Analyst intervention | Little | A lot
Attacker profiling | Impossible | Impossible
Provision of malicious information | None | Some malicious information
Prediction of attack | Impossible | Impossible
Known threat detection rate | High (over 98%) | High (over 98%)
Zero-day threat detection rate | Low (40–50%) | Normal (over 70%)
Response time | Slow | Normal
Reason for detection | Provides malicious code detection name | Provides malicious code behavior information

2.4 Malicious Code Analysis and the Comparison of Detection Techniques

The advantages and disadvantages of the static and dynamic analysis technologies used in malicious code analysis were compared; the result is shown in Table 1.

2.5 MITRE ATT&CK Framework

MITRE is a non-profit organization that oversees the Common Vulnerabilities and Exposures (CVE) vulnerability database. ATT&CK openly provides information based on the tactics and techniques of cyberattacks, suggesting a security framework that anyone or any organization can use for free [3]. MITRE (https://attack.mitre.org) has studied the definition of malicious behavior, and since the publication of the ATT&CK® Framework, its 14 Enterprise tactics and more than 450 attack technique models have been used globally. Based on actual cyberattack cases, cyberattack features such as attackers' strategies and tactics are represented through Tactics, Techniques, and Common Knowledge [8] (Fig. 2).


Fig. 2 MITRE ATT&CK framework (https://attack.mitre.org)


2.6 Machine Learning Convergence Technology for Intelligent Malicious Code Detection

Recent malicious code detection and analysis techniques construct behavior-analysis databases of features such as API functions, sequences, and n-grams of malicious code in order to apply machine learning techniques [6]. Malicious code is classified by analyzing its characteristics according to the type of execution command, generating multidimensional vectors from them, and analyzing similarity with statistical behavior-characteristic analysis [5].

2.6.1 N-gram (2-gram)

The N-gram is a feature commonly used in fields such as natural language processing and DNA sequence analysis, and it can capture associations between adjacent units. The N-gram structure forms a feature vector; when the unit is a byte, a 2-gram vector has up to 65,536 dimensions.
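A minimal sketch of byte-level 2-gram counting follows; the hex-pair keys it produces (e.g. "3736") match the form of the 2-g column in Table 2 later in this chapter, and the file name is an assumption:

```python
from collections import Counter

def byte_2grams(data: bytes) -> Counter:
    """Count adjacent byte pairs, keyed by their 4-hex-digit representation."""
    return Counter(data[i:i + 2].hex() for i in range(len(data) - 1))

features = byte_2grams(open("sample.bin", "rb").read())  # file name assumed
```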

2.6.2 WEM (Window Entropy Map)

The WEM feature was proposed by Zhuojun Ren and uses entropy information over bytes [10]. Entropy is a measure of disorder: it is high for an irregular data set and low for a regular one.
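A minimal sketch of the per-window entropy computation underlying a Window Entropy Map; the window size is an assumption:

```python
import math
from collections import Counter

def window_entropy(data: bytes, window: int = 256):
    """Shannon entropy (bits per byte) of each non-overlapping window."""
    entropies = []
    for i in range(0, len(data), window):
        chunk = data[i:i + window]
        counts = Counter(chunk)
        entropies.append(-sum((c / len(chunk)) * math.log2(c / len(chunk))
                              for c in counts.values()))
    return entropies
```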

3 Research Model

3.1 Outline

Based on a threat hunting model, the AI profiling technology for attack techniques and binary reverse engineering meta-information, used to trace attacker groups, operates on the code extracted after disassembling a binary file. Hence, this profiling can be performed by an artificial intelligence engine alone, using data generated through a relatively simple system and normalization process, without a separate analysis engine.


3.2 OP-CODE Extraction and Data Processing Technology

For OP-CODE extraction and data processing, the executable file is analyzed with the disassembly tools IDA Pro (https://hex-rays.com/ida-pro/) [9] and Radare2 [11]. The required ASM code and OP-CODE are extracted and reclassified per function to organize the OP-CODE arrays. Only the OP-CODE of the analysis target and the required ASM code are separated and reconstructed into a form readable by humans or computers for data processing.
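The study used IDA Pro and Radare2; as an illustrative stand-in, the following sketch extracts an OP-CODE (mnemonic) sequence and the corresponding ASM lines from a raw x86-64 code section with the Capstone disassembler (the file name and base address are assumptions):

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = open("text_section.bin", "rb").read()  # assumed: a dumped .text section
md = Cs(CS_ARCH_X86, CS_MODE_64)

opcodes, asm = [], []
for insn in md.disasm(code, 0x1000):              # 0x1000: assumed base address
    opcodes.append(insn.mnemonic)                 # OP-CODE, e.g. "mov"
    asm.append(f"{insn.mnemonic} {insn.op_str}")  # full ASM line
```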

3.3 Similarity Analysis and Fuzzy Hashing Technology

Data analysis processes and analyzes the data for the actual similarity analysis, based on each vectorized function-level feature dataset. For similarity analysis, this study converted each set of vectorized OP-CODE and ASM-CODE data back to byte form. A hash is extracted per block, and a block hash value for the entire data is generated based on the eigenvalues of the partial units. Unlike MD5, SHA-1, or SHA-256, this hash value is not a verification value over the entire data for checking file integrity. Rather, to support partial comparison, a hash value is extracted for each block of a designated unit size and compared. This is a fuzzy hashing technique, used to compare the similarity of two or more different files, in contrast to the traditional hash method of checking whether files are identical. The similarity hash algorithms used in this experiment are ssdeep, TLSH, and DHASH. Ssdeep is a technique for classifying similar malware based on fuzzy hash tools [13].
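A minimal sketch of the fuzzy-hash comparison step with ssdeep, one of the three algorithms named above (file names are assumptions); unlike MD5-style hashes, compare() returns a 0–100 similarity score rather than an exact match:

```python
import ssdeep

h1 = ssdeep.hash(open("func_set_a.bin", "rb").read())
h2 = ssdeep.hash(open("func_set_b.bin", "rb").read())

similarity = ssdeep.compare(h1, h2)  # 0 = unrelated, 100 = identical
```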

3.4 Data Classification Modeling

This study constructed a classification model with a Decision Tree or Random Forest learning algorithm over the prepared data. The Decision Tree builds a tree by optimizing comparison conditions on feature values (pattern occurrence counts) that can distinguish one or more classes (T-IDs) according to an information gain measure; Figs. 3, 4, 5, 6 and 7 show this structure. The orange squares in Fig. 6 indicate the conditions used for classification at internal nodes, and the blue squares are terminal nodes, meaning classified classes. The Decision Trees constituting the Random Forest model are diversified by varying the input data and features. Classification is performed over the multiple generated Decision Trees, and the final class is determined by majority voting. A classification model is learned based on the generated vector data.


Fig. 3 Malware analysis of machine learning

Fig. 4 Concept of reverse engineering meta-information AI profiling

Fig. 5 OP-CODE extraction

For classification, threshold values are set to prevent over-detection and misclassification: values below the lower threshold are discarded, and classification is performed only on data at or above the detection threshold [7].
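A sketch of the Random Forest classification step with the two thresholds described above; the stand-in data, feature dimensionality, T-ID labels, and threshold values are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: 200 samples of 2-gram count vectors (reduced to 256
# dimensions here for brevity) labeled with two hypothetical T-IDs.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 5, (200, 256))
y_train = rng.choice(["T1027", "T1055"], 200)
X_test = rng.integers(0, 5, (10, 256))

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

DETECT, DISCARD = 0.7, 0.3                  # assumed thresholds
for row in clf.predict_proba(X_test):
    best = row.argmax()
    if row[best] >= DETECT:                 # above detection threshold: classify
        label = clf.classes_[best]
    elif row[best] < DISCARD:               # below lower threshold: discard
        label = None
    else:
        label = "undecided"                 # between thresholds
```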


Fig. 6 Decision tree model

Fig. 7 Learning steps and classification steps in profiling

3.5 Malicious Code Profiling Technology

Malicious code profiling analyzes existing well-known attack codes or malicious codes and extracts the OP-CODE implemented inside each binary using the feature information extraction methods described above. This information is then vectorized and labeled based on human analysis results. Labeling requires two simultaneous actions: one assigns a T-ID, the unique index number of each attack pre-defined in MITRE ATT&CK®; the other fills in information about the actor who wrote the attack code (Table 2).

Table 2 Labeled learning data

Sample 1
Label (T-ID): T1027
Attacker (group): TA505
SHA-256 (size): 389E03B1A1FD1C527D48DF74D3C26A0483A5B105F36841193172F1EE80E62C1B (182,344 bytes)
(OP-CODE + ASM-CODE)'s fuzzy hash: 32:76EcDrURBPQk58tGcoHE129qSwkkw7jmJ4zm2fGjbTnyIKtv67PcP3dkpidSUye7:7d4UBYxtGcoHE129qSwkkw7jmJ4zm2fR
2-g: {"3736": 1, "3645": 1, "4563": 1, "6344": 1, "4472": 1, "7255": 1, "5552": 1, "5242": 1, "4250": 1, "5051": 1, "516b": 1, "6b35": 1, "3538": 1, "3874": 1, "7447": 2, "4763": 2, "636f": 2, "6f48": 2, "4845": 2, "4531": 2, "3132": 2, "3239": 2, "3971": 2, "7153": 2, "5377": 2, "776b": 2, "6b6b": 2, "6b77": 2, "7737": 2, "376a": 2, "6a6d": 2, "6d4a": 2, "4a34": 2, "347a": 2, "7a6d": 2, "6d32": 2, "3266": 2, "6647": 1, "476a": 1, "6a62": 1, "6254": 1, "546e": 1, "6e79": 1, "7949": 1, "494b": 1, "4b74": 1, "7476": 1, "7636": 1, "3637": 1, "3750": 1, "5063": 1, "6350": 1, "5033": 1, "3364": 1, "646b": 1, "6b70": 1, "7069": 1, "6964": 1, "6453": 1, "5355": 1, "5579": 1, "7965": 1, "6553": 1, "533a": 1, "3a37": 1, "3764": 1, "6434": 1, "3455": 1, "5542": 1, "4259": 1, "5978": 1, "7874": 1, "6667": 1}

Sample 2
Label (T-ID): T1027
Attacker (group): TA505
SHA-256 (size): E3EC2AA04AFECC6F43492BFE2E0D271045AB693ABFA332A2C89A5115FFE77653 (1,311,488 bytes)
(OP-CODE + ASM-CODE)'s fuzzy hash: 32:76EcDrURBPQk58tGcoHE129qSwkkw7jmJ4zm2fGjbTnyIKtv67PcP3dkpidSUyeS:7d4UBYxtGcoHE129qSwkkw7jmJ4zm2fg
2-g: {"3736": 1, "3645": 1, "4563": 1, "6344": 1, "4472": 1, "7255": 1, "5552": 1, "5242": 1, "4250": 1, "5051": 1, "516b": 1, "6b35": 1, "3538": 1, "3874": 1, "7447": 2, "4763": 2, "636f": 2, "6f48": 2, "4845": 2, "4531": 2, "3132": 2, "3239": 2, "3971": 2, "7153": 2, "5377": 2, "776b": 2, "6b6b": 2, "6b77": 2, "7737": 2, "376a": 2, "6a6d": 2, "6d4a": 2, "4a34": 2, "347a": 2, "7a6d": 2, "6d32": 2, "3266": 2, "6647": 1, "476a": 1, "6a62": 1, "6254": 1, "546e": 1, "6e79": 1, "7949": 1, "494b": 1, "4b74": 1, "7476": 1, "7636": 1, "3637": 1, "3750": 1, "5063": 1, "6350": 1, "5033": 1, "3364": 1, "646b": 1, "6b70": 1, "7069": 1, "6964": 1, "6453": 1, "5355": 1, "5579": 1, "7965": 1, "6537": 1, "373a": 1, "3a37": 1, "3764": 1, "6434": 1, "3455": 1, "5542": 1, "4259": 1, "5978": 1, "7874": 1, "6652": 1}


3.6 OP-CODE TTP Matching Technology

To generate core evidence data for detection, the OP-CODE of a file is extracted through disassembly of the binary. The extracted OP-CODE is divided into function units, the minimum components for program code to execute, and organized into sets of the OP-CODEs that make up each function. MITRE ATT&CK®'s TTPs, defined in natural language, are not specified in pattern form. Thus, malicious code is analyzed in advance to determine in what form each TTP constituting the malicious code is implemented, based on the extracted OP-CODE. By vectorizing the extracted OP-CODE sets, a dataset is constructed so that function-level OP-CODE can be used as learning data. Based on this dataset, learning is performed with the Random Forest algorithm on the OP-CODE dataset for each TTP. The learned artificial intelligence model serves as the main function for identifying TTPs when new files are collected. As seen in Fig. 8, it can identify multiple TTPs composing the file under analysis based on the extracted OP-CODE.
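A sketch of this "multiple identification" idea: one Random Forest per T-ID, each trained on function-level OP-CODE vectors, so a single file can match several TTPs at once. The dataset variable and function names here are assumptions, not the study's implementation:

```python
from sklearn.ensemble import RandomForestClassifier

# per_ttp_datasets: assumed dict mapping a T-ID, e.g. "T1027", to a pair
# (function-level OP-CODE feature matrix, binary labels) for that TTP.
models = {t_id: RandomForestClassifier(n_estimators=100).fit(X_t, y_t)
          for t_id, (X_t, y_t) in per_ttp_datasets.items()}

def identify_ttps(function_vectors):
    """Return every T-ID whose per-TTP model flags at least one function."""
    return sorted(t_id for t_id, m in models.items()
                  if (m.predict(function_vectors) == 1).any())
```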

Fig. 8 Technical concept of matching TTP with the results of disassemble extracted from binary

Fig. 9 Each TTP matching configuration in the OP-CODE combination


Table 3 Specifications for testing equipment

No | Role | OS | CPU | Mem | Storage | Note
1 | Test notebook | Mac OS Big Sur 11.5.1 | Intel i9 2.4 GHz 8-core | 64 GB | 4 TB | AI-based TTP identification engine; samples (malicious files, normal files)

4 Experiment and Result

4.1 Experiment Setting and Method

The research was performed on a test laptop with an Intel i9 8-core @ 2.4 GHz, 4 TB storage, 64 GB RAM, and Mac OS Big Sur 11.5.1, as shown in Table 3. The artificial intelligence-based TTP identification engine under test was installed on a single test laptop separated from the network. As measurement samples, malicious files and normal files were prepared from malwares.com [12]. The network was separated in the test environment so that the results were derived while malicious code pattern updates were prohibited, as in a closed network environment.

4.2 Experiment Result

4.2.1 Accuracy of Malicious Code Detection Based on TTP Learning Data

This experiment tested the detection and accuracy of malicious code among normal and malicious files while the Internet was disconnected. For the experimental procedure, 2,500 normal files and 17,500 malicious files were prepared, and malicious code detection was performed on the files under test. The results are shown in Table 4.

4.2.2 TTP Learning Data-Based Malicious Code T-ID Analysis Accuracy

This experiment confirmed whether the malicious code T-ID of a malicious file is analyzed using the vector information extracted from OP-Code and ASM-Code while the Internet is disconnected; through this, the accuracy of T-ID analysis was measured.


Table 4 Malicious code detection accuracy

Number of sample files | Sample file classification | Measurement results | Detection accuracy by classification | Overall detection accuracy
20,000 files | 2,500 normal files | 2,494 normal detections; 6 cases of over-detection | 99.76% | 99.76%
(cont.) | 17,500 malicious files | 17,498 normal detections; 2 undetected cases | 99.98% | (see above)

Table 5 Malicious code T-ID analysis accuracy

Number of sample files | Sample file classification | Measurement results | Malicious code detection accuracy
100 files | 100 malicious files | T-ID analysis results match in 97 cases; 3 inconsistencies | 97.00%

In the experimental procedure, 100 malicious files were measured to identify each file's T-ID. The test results are shown in Table 5.

4.2.3 Accuracy of Detection of New Malware Based on TTP Learning Data

This experiment checked whether new malicious codes, found after the learning data composition date, were detected among normal and malicious files while the Internet was disconnected, and measured the detection accuracy. The procedure confirmed the composition date of the learning data under test and the generation dates of 3,000 newly collected normal files and 7,000 malicious files, and malicious code detection was then performed on these files. The test results are shown in Table 6.


Table 6 New malicious code detection accuracy based on TTP learning data

Number of sample files | Sample file classification | Measurement results | Detection accuracy by classification | Overall detection accuracy
10,000 files | 3,000 normal files | 3,000 normal detections; zero cases of over-detection | 100% | 99.76%
(cont.) | 7,000 malicious files | 6,976 normal detections; 20 undetected cases; 4 failed detections | 99.65% | (see above)

Table 7 Malicious code attacker group T-ID analysis accuracy

Number of sample files | Sample file classification | Measurement results | Attacker-group-associated T-ID analysis accuracy
100 files | 100 malicious files | T-ID analysis results match in 98 cases; 2 inconsistencies | 98.00%

4.2.4 Accuracy of T-ID Analysis for Malicious Code Attacker Group Based on TTP Learning Data

This experiment verified whether the test target analyzed the malicious code of malicious files while the Internet was disconnected and output T-ID analysis information associated with previously learned malicious code attacker groups, and it measured the analysis accuracy. For the experimental procedure, 100 malicious files developed by a specific attacker were prepared, and the accuracy of identifying the attacker group was measured on these 100 files. Additionally, the associated T-ID output for the identified attacker group was confirmed. The test results are shown in Table 7.

5 Conclusion

In this study, the attack tactics and techniques accompanying cybersecurity threats were identified using binary reverse engineering technology, big data, and


artificial intelligence technologies. This technology tracks and automatically profiles the attackers or attacker groups that implemented an attack. It extracts feature information from the OP-CODE and ASM-CODE that implement attackers' attacks, so it is not merely a malware detection technology but one that can be used across cyber threats. This feature information is vectorized into a big data set for artificial intelligence learning, and threats are identified based on the learned data. This study confirmed that the information available to users through existing malicious code detection technology, such as antivirus products, is limited to statements such as "This file is malicious code, and the detection name is this." Because this technology identifies TTPs based on OP-CODE, however, it can explain the malicious behavior of a specific file itself, and when collecting and analyzing malicious codes, it is possible to understand how attackers implement each TTP. Building on the data identified in this study, many research results are expected to emerge as a starting point for proactively responding to cyber threats by learning and predicting how new attack codes will be developed.

References

1. M. Sikorski, A. Honig, Practical Malware Analysis (No Starch Press, 2012)
2. K. Rieck, T. Holz, C. Willems, P. Dussel, P. Laskov, Learning and classification of malware behavior, in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (Springer, 2008), pp. 108–125
3. MITRE, MITRE ATT&CK (2021), https://www.attack.mitre.org
4. B.B. Hyun, Histogram visualization and prototype selection algorithm for malicious code analysis (2020)
5. L. Hyunjong, H. Seongyul, H.D. Sung, An ensemble learning model based on API characteristics for malicious code family classification (KIISC, 2019)
6. L. Hyunjong, Next-generation endpoint proactive detection and response technology for effective response to intelligent APT attacks (2019)
7. L. Breiman, Random forests. Mach. Learn. 45(1), 5–32 (2001)
8. H. Chanwoong, B. Sungho, L. Taejin, A study on abnormal attack sign detection technology based on MITRE ATT&CK and anomaly detection. Convergence Security Paper
9. C. Eagle, The IDA Pro Book, 2nd edn. (No Starch Press, 2011)
10. Z. Ren, G. Chen, EntropyVis: malware classification, in 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (IEEE, 2017)
11. Radare2, radare2 (2019), https://rada.re/r/
12. Malwares.com, malwares.com (2022), https://malwares.com
13. J. Kornblum, Identifying almost identical files using context triggered piecewise hashing. Digit. Investig. 3, 91–97 (2006)

A Study on the Influence of Smart Factory’s Intention to Continue to Use on the Management Effect of Enterprises Seoung Jong Lee, Jong woo Park, and Hee Jun Cho

Abstract In South Korea, although companies have created many business outcomes by building infrastructure for smart manufacturing through the smart factory construction projects promoted by the government since 2015, the smartization levels of most small and medium-sized enterprises (SMEs) still remain at the basic stage. Therefore, an empirical study was conducted on the factors affecting the intention to continue using smart factories, and on the net impacts, with large enterprises (including public enterprises), strong medium enterprises, and SMEs that have built and are operating smart factories. Using a research model that combines the IS success model, the TOE model, and the UTAUT2 model, this study surveyed executives and employees of companies that have built and are using smart factories about how their companies are affected when they continue to use smart factories, and conducted empirical analyses. The data from this study are expected to contribute to studies on companies' or the government's smart factory strategies and implementation plans. However, the results have limitations because perceptions differ considerably between large enterprises and SMEs as users that have built and are operating smart factories. Therefore, further studies are needed on the net impacts of the intention to continue using and operating smart factories by company scale, business type, and smart factory level.

Keywords Smart factory · Information system · Perceived usefulness · Perceived ease of use · Intention to continue to use · Net impacts

S. J. Lee · J. Park (B) · H. J. Cho Soongsil University, Seoul, South Korea e-mail: [email protected] S. J. Lee e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/ Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_9


1 Introduction

The Korean economy, which had contracted due to the COVID-19 pandemic, is showing an overall recovery trend thanks to the effects of the global economy, but the gaps among industries, business types, and business scales are gradually widening in the constrained manufacturing environment. As the 4th industrial revolution became an issue, manufacturing companies took a strong interest in introducing and operating smart factories through digital innovation. Through the smart factory program initiated by the government in 2015, traditional companies have gradually been building and operating smart factories with automation and informatization based on ICT, thereby creating corporate value through productivity improvement, quality improvement, cost reduction, and delivery time shortening. Recently, as the number of manufacturing companies that have introduced and are operating smart factories has increased, many studies have examined the factors affecting the intention to introduce, the intention to use, and the intention to continue to use smart factories, as well as management effects, but studies on the factors affecting the intention to continue to use and net impacts are still insufficient. The latter studies were therefore judged to be academically and practically meaningful, and a survey was conducted with manufacturing companies to empirically analyze the effects of information system factors, organizational factors, and environmental factors on the intention to continue to use and net impacts.

2 Theoretical Background

2.1 Definition and Components of Smart Factories

Places where manufacturing companies input raw materials and produce products through production processes such as processing and assembling are called factories. Here, the term "smart" is used to mean artificial intelligence beyond its dictionary meaning: something that thinks, judges, and acts like a human. A smart factory can therefore be described as an intelligent factory that embodies artificial intelligence. To implement a smart factory, data should be collected from factory facilities or equipment using various sensors, kiosks, or PCs, stored in the cloud as big data, and then analyzed and intellectualized through various optimization tools, artificial intelligence, or CPS so that the physical factory can be autonomously controlled. Accordingly, for the implementation of smart factories, the Smart Manufacturing Innovation Promotion Team classified FA, MES, ERP, PLM, and SCM as the five major construction areas and is intensively supporting information system construction.
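The collect-store-analyze-control loop described above can be sketched as follows; this is a conceptual illustration only, in which read_sensor, cloud_store, and optimize are hypothetical placeholders rather than a real smart factory API, and the analytics step is reduced to a simple average.

```python
# A conceptual sketch only: the collect-store-analyze-control loop
# described above. read_sensor, cloud_store and optimize are hypothetical
# placeholders, not a real smart factory API.
import random
import statistics

def read_sensor() -> float:
    return 20.0 + random.random()      # e.g. a temperature reading

cloud_store: list[float] = []          # stands in for cloud big data storage

def optimize(history: list[float]) -> float:
    # stands in for the analytics/AI/CPS layer deriving a setpoint
    return statistics.mean(history)

for _ in range(10):
    cloud_store.append(read_sensor())  # collect from facilities/equipment
setpoint = optimize(cloud_store)       # analyze the accumulated data
print(f"autonomous control setpoint: {setpoint:.2f}")
```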


According to KS X 9001-3 (Smart Factory—Part 3: Operation Management System, Diagnostic Evaluation Model), the smart factory operation management system framework consists of five layers: vision and strategy, goals and performance, corporate management, manufacturing operation, and machinery and control. The areas under the corporate management layer include components from the perspectives of processes, systems, and automation. There are six components from the perspective of processes: product development, production planning, process management, quality control, facility management, and logistics management. Components from the perspective of systems and automation are divided into an information system module and a facility automation module, and the information system module consists of sub-modules such as PLM, ERP, SCM, MES, FEMS, and a security system [1].

2.2 Modified Information System Success Model

After the updated IS Success Model was released in 2003, DeLone and McLean [2] made two further modifications to it. The first concerned the term "Net Benefits", used in the updated DeLone and McLean IS Success Model (2003) to indicate the final measure of success. They concluded that "Net Impacts" would be a better title than "Net Benefits", because "Net Benefits" implies only positive outcomes, whereas the intention of DeLone and McLean [2] was that the IS Success Model recognize both positive and negative outcomes. If positive results are obtained, more frequent "use" will lead to higher "user satisfaction"; conversely, negative results can hinder "use" and lower "user satisfaction". For this reason, the term "Net Benefits" has since been replaced with "Net Impacts". The second modification was the recognition that an additional set of feedback loops is necessary. As experience with a system accumulates, problems become apparent and possible improvements are recognized, generally resulting in requests for modifications and updates to the system, commonly referred to as "maintenance". These changes are the next steps in the evolution of the system life cycle. To express this graphically, feedback arrows were drawn from "use" and "user satisfaction" back to "system quality", "information quality", and "service quality" [3].

2.3 Relationships Between Perceived Usefulness and Ease of Use, Intention to Continue to Use, and Net Impacts

Whereas information in an information system does not change once it has been entered into the system, information in the real world changes constantly. It has therefore been argued that the quality of the information in a system deteriorates over time [4].


In this study, information quality was taken to be the degree to which the smart factory provides reliable information accurately, quickly, and completely to users, in a form they can easily understand, and service quality was taken to be the quality of the suppliers' support, which affects customer satisfaction with the use of smart factories when information systems are provided to users at manufacturing companies. Accordingly, the following hypotheses were set.

Hypothesis 1 (H1a). System quality will positively affect perceived usefulness.
Hypothesis 1 (H1b). System quality will positively affect perceived ease of use.
Hypothesis 2 (H2a). Information quality will positively affect perceived usefulness.
Hypothesis 2 (H2b). Information quality will positively affect perceived ease of use.
Hypothesis 3 (H3a). Service quality will positively affect perceived usefulness.
Hypothesis 3 (H3b). Service quality will positively affect perceived ease of use.

According to the TAM model, perceived usefulness is the perception that using a certain system can improve one's work performance, and perceived ease of use refers to the degree to which information technology users expect to be able to use newly introduced information technology without much effort. Many studies have found that positive attitudes and acceptance intentions are formed when the ease of use of information technology is perceived, and not otherwise [5]. Based on these previous studies, the following hypotheses were set.

Hypothesis 4 (H4a). Perceived usefulness will positively affect intention to continue to use.
Hypothesis 4 (H4b). Perceived usefulness will positively affect net impacts.
Hypothesis 5 (H5a). Perceived ease of use will positively affect intention to continue to use.
Hypothesis 5 (H5b). Perceived ease of use will positively affect net impacts.

Parida and Örtqvist [6] investigated the effect of the ability to use ICT (the ability to use ICT strategically in business) on the innovation performance of technology-based Swedish SMEs, and reported that the ability to use ICT affected innovation performance and that a company's ability to use ICT was closely connected to its intention to introduce technology [7]. Based on these previous studies, the following hypotheses were set.

Hypothesis 6 (H6a). Top manager influence will positively affect the intention to continue to use.
Hypothesis 6 (H6b). Top manager influence will positively affect net impacts.
Hypothesis 7 (H7a). The ability to use ICT will positively affect the intention to continue to use.
Hypothesis 7 (H7b). The ability to use ICT will positively affect net impacts.


Fig. 1 Research model

Government assistance expectancy can be defined as the degree to which government support projects related to smart factory construction are recognized and support from them is expected; because of the characteristics of smart technologies, government assistance expectancy was expected to greatly affect the intention to introduce smart factories [8]. Based on previous studies, the following hypotheses were set.

Hypothesis 8 (H8a). Financial readiness will positively affect the intention to continue to use.
Hypothesis 8 (H8b). Financial readiness will positively affect net impacts.
Hypothesis 9 (H9a). Government assistance expectancy will positively affect the intention to continue to use.
Hypothesis 9 (H9b). Government assistance expectancy will positively affect net impacts.

Since it was judged that net impacts will be affected if a company's manager is interested in smart factories, improves the organization's ability to use ICT, and promotes the continued use of ICT with financial readiness and government assistance, the following hypothesis was set.

Hypothesis 10 (H10). The intention to continue to use will positively affect net impacts.

Based on the above theoretical background and hypotheses, the research model of this study is presented in Fig. 1.

3 Research Method

Based on previous studies, this study used a questionnaire to examine the effect of the variables of each factor on the intention to continue to use smart factories in terms of


information system factors, organizational factors, and environmental factors. A total of 568 copies of the questionnaire were sent to public corporations, large enterprises, mid-sized companies, and SMEs that had introduced and were operating information systems, and 376 copies were returned. The empirical analysis was conducted with 306 questionnaires, after excluding 70 with missing values, non-ICT application, or insincere answers.

System quality in the information system success model refers to the desirable characteristics of the information system itself [9]. In this study, system quality was used to indicate how reliable, responsive, accessible in various environments, and stable the operation of a smart factory is. Information quality was adapted to indicate how quickly, informatively, accurately, completely, and understandably the smart factory provides information compared to the user's expected level. Service quality was adapted to indicate the quality of the support the user receives from the smart factory supplier (an internal organization or ICT support personnel). Perceived ease of use is the degree to which a person believes that using a particular system would be free of effort [10–13]. Perceived usefulness in this study is the degree to which the smart factory user believes that work performance can be improved through the use of the information system, and perceived ease of use was used as the extent to which users believe that using a particular system will be effortless. Tippins and Sohi [14] defined the ability to use ICT as the degree to which a company has knowledge of ICT and how effectively it manages information within the company using ICT. Top manager influence was defined as the degree of interest in and support for the introduction and use of smart factories, and the ability to use ICT was adapted as the degree of capability in having ICT knowledge, effectively managing internal company information, and strategically using ICT in smart factories. The automation and smart factory levels of companies that received government support were higher than those of companies that did not [15]. In this study, government assistance expectancy was used as the level to which government support projects related to smart factory construction are perceived and support is expected [16]. Previous studies almost uniformly defined the intention to continue to use as "the degree of willingness to continue using a product or system to improve the value and competitiveness of a company" [17]; in this study, it was adapted as the extent to which a company that has used the smart factory since its introduction wants to keep using it. Net impact is the extent to which information systems contribute to the success of individuals, groups, organizations, industries, and countries, i.e., improved decision-making, improved productivity, increased sales, reduced costs, improved profits, market efficiency, consumer welfare, job creation, etc. [3]. In this study, net impact was adapted as the extent to which an individual or organization contributes (or not) to the company as a result of operating the smart factory.


4 Analysis and Results

A regression analysis was performed to test the hypotheses on the factors affecting the intention to continue to use smart factories and their net impacts, with the following results (Tables 1, 2 and 3).
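For illustration, the following sketch shows the kind of multiple regression summarized in Table 3, reporting standardized coefficients, t values, R-square, F, and the Durbin-Watson statistic. The data are synthetic and the variable names merely mirror the constructs, so this is not the authors' analysis.

```python
# A minimal sketch, not the authors' analysis: an OLS regression of the
# kind reported in Table 3, with standardized betas, t values, R-square,
# F and Durbin-Watson. All data here are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "system_quality": rng.normal(size=306),
    "information_quality": rng.normal(size=306),
    "service_quality": rng.normal(size=306),
})
# Construct a dependent variable with effects of roughly the reported size.
df["perceived_usefulness"] = (0.26 * df["system_quality"]
                              + 0.24 * df["information_quality"]
                              + 0.14 * df["service_quality"]
                              + rng.normal(size=306))

z = (df - df.mean()) / df.std()  # standardize so coefficients are betas
X = sm.add_constant(z[["system_quality", "information_quality",
                       "service_quality"]])
model = sm.OLS(z["perceived_usefulness"], X).fit()
print(model.params)                      # standardized betas
print(model.tvalues)                     # t values
print(model.rsquared, model.fvalue)      # R-square and F
print("Durbin-Watson:", durbin_watson(model.resid))
```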

5 Conclusions

First, the hypotheses that the information system factors (system quality, information quality, and service quality) will positively (+) affect perceived usefulness were supported. This indicates that the higher the system quality, information quality, and service quality of the smart factory information system, the more efficiently tasks are performed and the more positively perceived usefulness, which improves work performance, is affected. However, service quality, that is, the quality of support received from suppliers (including internal organizations or ICT support personnel), was not found to positively (+) affect perceived ease of use (H3b).

With regard to hypothesis H4a, which concerns the relationship between perceived usefulness and the intention to continue to use, perceived usefulness was not found to positively (+) affect the intention to continue to use; the effect was positive but not statistically significant. With regard to hypothesis H5a, which concerns the relationship between perceived ease of use and the intention to continue to use, the fact that users can operate equipment such as mobile systems, terminals, and tablet PCs and apply new technologies such as artificial intelligence and CPS without much effort was found to positively affect the intention to continue to use.

With regard to the relationships between the organizational factors (top manager influence and the ability to use ICT) and the intention to continue to use, hypothesis H6a, that the top manager's participation and support positively (+) affect the intention to continue to use smart factories, was supported, and hypothesis H7a, concerning the relationship between smart factory users' ability to use ICT and the intention to continue to use, was also supported. With regard to the relationships between the environmental factors (financial readiness and government assistance expectancy) and the intention to continue to use, hypothesis H8a, concerning the relationship between financial readiness and the intention to continue to use, was supported, whereas H9a, concerning the relationship between government assistance expectancy and the intention to continue to use, was not supported.


Table 1 Confirmatory factor analysis based on reliability (factor loadings, eigenvalues, and Cronbach's alpha per variable)

Intention to use (eigenvalue 3.58, Cronbach's alpha 0.893)
4. Our company is willing to continue to use the smart factory introduced and used to improve the value of the company (loading 0.838)
3. Our company is willing to continue to use the smart factory introduced and used to improve the company's competitiveness (0.819)
2. Our company is willing to continue to use the smart factory by investing resources (material, human capital) (0.802)
1. Our company is satisfied with the smart factory introduced and used and is willing to continue to use it (0.802)
5. Our company is willing to actively recommend to other colleagues in the vicinity to use the introduced and used smart factory (0.802)

Net impacts (eigenvalue 3.15, Cronbach's alpha 0.844)
1. The cost reduction effect has improved compared to before the introduction of the smart factory (0.791)
5. Compared to before the introduction of the smart factory, customer response and production speed have improved (0.772)
4. Compared to before the introduction of the smart factory, the delivery standards and lead time reduction effect have improved (0.767)
3. Flexibility (product production, schedule) and productivity have improved compared to before the introduction of the smart factory (0.725)
2. The quality has improved (reduced nonconforming products) compared to before the introduction of the smart factory (0.706)

Top manager influence (eigenvalue 3.13, Cronbach's alpha 0.839)
1. The high interest of the CEO of our company had a great impact on the introduction and use of smart factories (0.808)
4. The CEO's confidence in the expected effect compared to the investment had a great influence on the introduction and use of the smart factory (0.800)
3. Appropriate financial support by the CEO of our company had a great impact on the introduction and use of the smart factory (0.769)
2. Accurate information and awareness of the CEO of our company had a great impact on the introduction and use of the smart factory (0.742)
5. Our company's CEO's will to respond to changes rather than stability had a major impact on the adoption and use of smart factories (0.718)

System quality (eigenvalue 3.05, Cronbach's alpha 0.882)
2. The processing and response speed of our company's smart factory is fast (0.868)
3. Our smart factory can be easily accessed and used whenever and wherever you want (0.856)
1. Our company's smart factory is integrated and stable (0.824)
4. It is difficult to find errors or failures in our company's smart factory (0.809)

Government assistance expectancy (eigenvalue 2.95, Cronbach's alpha 0.866)
2. Our company wants to receive help from government support projects related to the introduction and use of the smart factory (0.836)
3. There is a possibility that our company will receive benefits from being selected for a government support project related to the introduction and use of the smart factory (0.827)
4. Government support projects related to the introduction and use of the smart factory will help manufacturers like our company (0.824)
1. Our company is aware of government support projects related to the introduction and use of the smart factory (0.817)

Financial readiness (eigenvalue 2.80, Cronbach's alpha 0.840)
2. Our company has secured a smart factory-related investment budget (0.823)
3. If necessary, the company can use the budget of other sectors for the smart factory construction project (0.807)
4. Our company has the ability to procure the budget for the introduction and use of the smart factory from outside (0.800)
1. Our company has established smart factory promotion plans (short-term, mid-term and long-term roadmaps, etc.) (0.702)

Service quality (eigenvalue 2.42, Cronbach's alpha 0.867)
2. The smart factory supplier's staff immediately deal with any problems (our requirements) that arise when using the smart factory (0.892)
3. The smart factory supplier's staff handle the problems (our requirements) that arise when using the smart factory with sincerity (0.864)
4. The smart factory supplier's staff provide feedback on the processing results after handling problems (our requirements) that occur when using the smart factory (0.858)

Information quality (eigenvalue 2.29, Cronbach's alpha 0.838)
3. The information provided by our company's smart factory is suitable for the purpose of the user's system use (0.827)
2. The information provided by our company's smart factory is the information that users need (0.824)
4. The information provided by our company's smart factory is reliable (0.812)

Ability to use ICT (eigenvalue 2.27, Cronbach's alpha 0.833)
1. Our company is using ICT well in its internal work for the smart factory (0.839)
2. Our company tends to utilize the smart factory well for collaboration with existing business partners (0.837)
3. Our employees tend to have the ability to use ICT (0.833)

Perceived ease of use (eigenvalue 2.21, Cronbach's alpha 0.822)
1. It is easy to learn how to use our company's smart factory (0.842)
2. I think that our company's smart factory can be used easily by anyone (0.794)
3. Our company's smart factory system can be conveniently used anytime, anywhere (including mobile systems, tablet PCs, kiosks, apps, etc.) (0.787)

Perceived usefulness (eigenvalue 2.20, Cronbach's alpha 0.823)
1. A smart factory enables our company to perform its business more efficiently (0.807)
3. The smart factory improves the business performance of our company (0.797)
2. A smart factory is useful for conducting our business (0.788)
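As a sketch of how a reliability coefficient like the Cronbach's alpha values in Table 1 can be computed, the following Python fragment implements the standard formula on made-up Likert-scale responses; it is not the authors' code.

```python
# A minimal sketch, not the authors' code: Cronbach's alpha for one
# construct, as in the "Crb. Alpha" column of Table 1. The responses
# are made-up five-point Likert data for 306 respondents.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(0)
latent = rng.integers(1, 6, size=(306, 1))                 # shared attitude
responses = np.clip(latent + rng.integers(-1, 2, size=(306, 5)), 1, 5)
print(round(cronbach_alpha(responses), 3))
```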

With regard to hypothesis H4b, which investigated the relationship between perceived usefulness and net impacts, it was found that when the smart factory is efficient for business performance, it positively (+) affects productivity improvement (P), quality improvement (Q), cost reduction (C), and delivery time shortening (D). In addition, with regard to hypothesis H5b, which investigated the relationship between perceived ease of use and net impacts, the use of mobile systems or terminals so that the smart factory can be used easily by anybody was found to positively (+) affect net impacts. With regard to the relationships between the organizational factors (top manager influence and the ability to use ICT) and net impacts (hypotheses H6b and H7b), it was found that the higher the CEO's support and interest and the users' ability to use ICT, the more positively (+) the company's net impacts were affected. Hypothesis H8b, concerning the relationship between financial readiness among the environmental factors

Table 2 Results of correlation analysis

Variable                               (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)
(1) System quality                     1
(2) Information quality                0.253**  1
(3) Service quality                    0.152**  0.231**  1
(4) Perceived ease of use              0.216**  0.359**  0.165**  1
(5) Perceived usefulness               0.340**  0.341**  0.233**  0.304**  1
(6) Top manager influence              –        –        –        0.257**  0.189**  1
(7) Ability to use ICT                 –        –        –        0.161**  0.175**  0.063    1
(8) Financial readiness                –        –        –        0.166**  0.208**  0.208**  0.293**  1
(9) Government assistance expectancy   –        –        –        0.202**  0.175**  0.194**  0.191**  0.251**  1
(10) Intention to continue to use      –        –        –        0.247**  0.162**  0.165**  0.191**  0.241**  0.113*   1
(11) Net impacts                       0.100    0.214**  -0.022   0.206**  0.304**  0.167**  0.238**  0.263**  0.136*   0.366**  1

** The correlation is significant at level 0.01 (both sides)
* The correlation is significant at level 0.05 (both sides)
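A minimal sketch of how a correlation table with significance stars like Table 2 can be produced is given below; the data are synthetic stand-ins for the constructs, not the study's data.

```python
# A minimal sketch, not the study's code: printing a lower-triangular
# correlation table with significance stars in the style of Table 2.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 306  # sample size used in the paper
data = {
    "Perceived usefulness": rng.normal(size=n),
    "Perceived ease of use": rng.normal(size=n),
    "Intention to continue to use": rng.normal(size=n),
}
names = list(data)
for i, a in enumerate(names):
    for b in names[:i]:
        r, p = pearsonr(data[a], data[b])
        star = "**" if p < 0.01 else ("*" if p < 0.05 else "")
        print(f"{a} x {b}: r = {r:.3f}{star}")
```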


Table 3 Results of hypotheses tests

H1a. System quality will positively (+) affect perceived usefulness: β = 0.257, t = 4.809 (0.000**), Supported
H2a. Information quality will positively (+) affect perceived usefulness: β = 0.244, t = 4.496 (0.000**), Supported
H3a. Service quality will positively (+) affect perceived usefulness: β = 0.137, t = 2.585 (0.010*), Supported
(R Square = 0.202, F = 25.507, p = 0.000, Durbin-Watson = 1.845)

H1b. System quality will positively (+) affect perceived ease of use: β = 0.126, t = 2.294 (0.022*), Supported
H2b. Information quality will positively (+) affect perceived ease of use: β = 0.310, t = 5.537 (0.000**), Supported
H3b. Service quality will positively (+) affect perceived ease of use: β = 0.074, t = 1.349 (0.178), Not supported
(R Square = 0.151, F = 17.865, p = 0.000, Durbin-Watson = 1.902)

H4a. Perceived usefulness will positively (+) affect intention to continue to use: β = 0.094, t = 1.622 (0.106), Not supported
H5a. Perceived ease of use will positively (+) affect intention to continue to use: β = 0.218, t = 3.748 (0.000**), Supported
(R Square = 0.069, F = 11.230, p = 0.000, Durbin-Watson = 2.116)

H6a. Top manager influence will positively (+) affect the intention to continue to use: β = 0.154, t = 2.751 (0.006**), Supported
H7a. The ability to use ICT will positively (+) affect the intention to continue to use: β = 0.181, t = 3.243 (0.001**), Supported
(R Square = 0.060, F = 9.639, p = 0.000, Durbin-Watson = 2.079)

H8a. Financial readiness will positively (+) affect the intention to continue to use: β = 0.227, t = 3.942 (0.000**), Supported
H9a. Government assistance expectancy will positively (+) affect the intention to continue to use: β = 0.056, t = 0.969 (0.334), Not supported
(R Square = 0.061, F = 9.820, p = 0.000, Durbin-Watson = 2.045)

H4b. Perceived usefulness will positively (+) affect net impacts: β = 0.266, t = 4.674 (0.000**), Supported
H5b. Perceived ease of use will positively (+) affect net impacts: β = 0.125, t = 2.192 (0.029*), Supported
(R Square = 0.107, F = 18.120, p = 0.000, Durbin-Watson = 1.980)

H6b. Top manager influence will positively (+) affect net impacts: β = 0.153, t = 2.765 (0.006**), Supported
H7b. The ability to use ICT will positively (+) affect net impacts: β = 0.229, t = 4.146 (0.000**), Supported
(R Square = 0.080, F = 13.187, p = 0.000, Durbin-Watson = 1.868)

H8b. Financial readiness will positively (+) affect net impacts: β = 0.245, t = 4.285 (0.000**), Supported
H9b. Government assistance expectancy will positively (+) affect net impacts: β = 0.074, t = 1.296 (0.196), Not supported
(R Square = 0.074, F = 12.189, p = 0.000, Durbin-Watson = 1.965)

H10. The intention to continue to use will positively (+) affect net impacts: β = 0.366, t = 6.862 (0.000**), Supported
(R Square = 0.134, F = 47.090, p = 0.000)

and net impacts, was also supported: a company's financial readiness to invest in the introduction of smart factory information systems or equipment has a positive effect on the company's net impacts. With regard to hypothesis H9a, concerning the relationship between government assistance expectancy among the environmental factors and the intention to continue to use, and hypothesis H9b, concerning the relationship between government assistance expectancy and net impacts, it was found that companies' recognition of government support projects related to smart factories and their wish to receive government support did not positively (+) affect their intention to continue to use smart factories or their net impacts. Hypothesis H10, concerning net impacts when a user who introduced and is operating a smart factory has the intention to continue to use it, was supported. This indicates that the continuous use of the smart factory increases work efficiency, so that users recognize that it will positively affect net impacts.


The findings of this study identify what users at companies that have constructed and are using smart factories are aware of in terms of information system factors, organizational factors, and environmental factors, and will be provided as research material for the government and companies to promote the continued use of smart factories and increase their net impacts. From this viewpoint, more studies are necessary on increasing net impacts through the continuous use of smart factories.

The implications of this study are as follows. First, this study is meaningful in that it provides a research model, designed on the basis of previous studies of the IS Success, TAM, TOE, and UTAUT models, for the intention to continue to use smart factories and their net impacts. In previous studies of the factors affecting the intention to use smart factories and net impacts, IT expertise was included among the technical factors of the TOE model; in this study, the ability to use ICT was used instead of IT expertise, suggesting that, for companies that use smart factories, applying the ability to use ICT in studies will be more effective academically. Second, the relationships between perceived usefulness and perceived ease of use among the information system factors, the organizational factors, and the environmental factors on the one hand, and the intention to continue to use and net impacts on the other, were identified based on the results of questionnaire surveys of companies using smart factories. Recently, various technologies have been grafted onto manufacturing sites through the digital transformation of the 4th industrial revolution era. A significant conclusion is that information systems should be made more easily accessible and more convenient for users, so that ease of use and, in turn, usefulness increase, which will have positive net impacts on corporate activities through the continuous use of smart factories. Third, whereas most studies on smart factories have focused on limited areas such as small and medium-sized enterprises or industrial complexes, this study is academically meaningful in that it collected opinions from smart factory users working in the innovation, production, quality, maintenance, purchasing and logistics, development, and IT teams of large enterprises, strong medium companies, and small and medium-sized enterprises (SMEs) in studying the intention to continue to use and net impacts.

The limitations of this study and future research directions are as follows. First, this study used surveys of large enterprises (including public enterprises), strong medium enterprises, and SMEs that have constructed and are operating smart factories, but the results were not analyzed by region, business type, company size, or smart factory level. In addition, 68% of the companies that responded to the survey were in the machinery, materials, parts, electrical, electronic, and equipment industries, so there is a limitation in representing all areas of the manufacturing industry. Because perceptions of the intention to continue to use and net impacts vary with the period of smart factory operation and the ability to use ICT according to company size, that is, large companies, strong medium companies, and SMEs, conducting questionnaire surveys by business type and company size is also considered desirable. Accordingly, as the findings of this study have limitations


in representing all manufacturing companies that operate smart factories, it would be desirable to conduct future studies with narrower scopes by region, business type, and industrial complex. Second, although the researcher made every effort to ensure that respondents understood the concept of the smart factory before answering the questionnaire, some users answered without an accurate understanding of the concept and levels of smart factories, so there may be differences in the analysis, even where smart factory levels are the same, owing to answers based on different perceptions. Third, many studies have so far been conducted on the intention to introduce smart factories, use intentions, and management effects. It is hoped that this study will be helpful in introducing and operating smart factories and that, as more companies build smart factories and use them continuously, more studies will be conducted on what net impacts this has on companies.

References

1. H.N. Min, A study on the level diagnosis and recognition of smart factory in the pharmaceutical industry, M.S. thesis, Industrial Pharmaceutical Science, Sungkyunkwan University, Seoul, South Korea, 2020
2. W.H. DeLone, E.R. McLean, The DeLone and McLean model of information systems success: a 10-year update. J. Manag. Inf. Syst. 19(4), 9–30 (2003)
3. N. Urbach, B. Müller, The updated DeLone and McLean model of information systems success, in Information Systems Theory (Springer, New York, NY, 2016), pp. 1–18
4. K. Orr, Data quality and systems theory. Commun. ACM 41(2), 66–71 (1998)
5. F.D. Davis, Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q. 13(3), 319–339 (1989)
6. V. Parida, D. Örtqvist, Interactive effects of network capability, ICT capability, and financial slack on technology-based small firm innovation performance. J. Small Bus. Manage. 53, 278–298 (2015)
7. S.T. Kim, A study on factors affecting intention of introducing smart factory: focusing on the moderating effect of innovation resistance, Ph.D. dissertation, Dept. of Business, Incheon National Univ., Incheon, South Korea, 2021
8. J.R. Kim, Factors affecting intention to introduce smart factory in SMEs—including government assistance expectancy and task technology fit. J. Ventur. Innov. 3(2), 41–76 (2020)
9. W.H. DeLone, E.R. McLean, Information systems success: the quest for the dependent variable. Inf. Syst. Res. 3(1), 60–95 (1992)
10. F.D. Davis, R.P. Bagozzi, P.R. Warshaw, User acceptance of computer technology: a comparison of two theoretical models. Manage. Sci. 35(8), 982–1003 (1989)
11. K. Mathieson, Predicting user intentions: comparing the technology acceptance model with the theory of planned behavior. Inf. Syst. Res. 2(3), 173–191 (1991)
12. D. Gefen, D. Straub, The relative importance of perceived ease of use in IS adoption: a study of e-commerce adoption. J. Assoc. Inf. Syst. 1(8), 1–30 (2000)
13. S. Al-Gahtani, The applicability of TAM outside North America: an empirical test in the United Kingdom. Inf. Resour. Manag. J. (IRMJ) 14(3), 37–46 (2001)
14. M.J. Tippins, R.S. Sohi, IT competency and firm performance: is organizational learning a missing link? Strateg. Manag. J. 24(8), 745–761 (2003)
15. J.S. Kang, K.T. Cho, An analysis of the effect of government support on automation and smart factory. J. Korea Technol. Innov. Soc. 21(2), 738–766 (2018)


16. H.G. Kim, An empirical study on continuous use intention and switching intention of the smart factory. J. Korea Soc. Ind. Inf. Syst. 24(2), 65–80 (2019)
17. S.H. Hong, A study on the effect of goal level and change management factors on the continuous intention to use at the stage of smart factory introduction and operation, Ph.D. dissertation, Dept. of Management of Technology, Hoseo Univ., Asan, South Korea, 2021

Protecting the Rights and Legitimate Interests of the Parties in B2B Non-personal Data Sharing Transactions: Experiences from the UK and EU and Lessons Learnt for Vietnam

Phùng Dung Thi Mỹ

Abstract In the context of the digital transformation and the digital economy taking place globally, the role of data becomes more crucial, since data can be both an input and an output of the production or provision of goods and services, or can be analyzed to create value for decision making or innovation. Moreover, the emergence of technologies such as AI, IoT, cloud computing, VR/AR, 5G, and data analytics has promoted B2B data sharing. The data-related rights and interests of the stakeholders involved in B2B data sharing transactions have therefore become critical. Besides the issue of personal data protection, non-personal data in B2B transactions is also a concern for policymakers because of its economic value. Several governments have already issued legal documents to regulate these issues, so it is necessary to study the legal frameworks on B2B non-personal data sharing of those countries in order to obtain a comparative view and to understand the reasons behind the commonalities and differences between these legal solutions. From there, the article draws on this comparison to give recommendations for the drafting of Vietnamese law on B2B non-personal data sharing.

Keywords Digital law · Digital economy · B2B data sharing · Non-personal data · Data protection law · Big data

D. T. M. Phùng (B)
University of the West of England, Ho Chi Minh City, Vietnam
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_10

1 Introduction

Before proceeding to the introduction of this study, some definitions of the terms used in this article should be clarified. "'Data' means any digital representation of acts, facts or information and any compilation of such acts, facts or information, including in the form of sound, visual or audiovisual recording" [1]. Non-personal data (NPD) is defined by the European Commission as "data other than personal data as defined in point (1) of Article 4 of Regulation (EU) 2016/679" [2]. B2B non-personal data sharing is defined as "making (non-personal) data available to, or accessing data from, other companies for business purposes" [3, p. 6]. "'User' means a natural or legal person that owns, rents or leases a product or receives a service" [4]. "'Data holder' means a legal or natural person who has the right or obligation, or in the case of non-personal data and through control of the technical design of the product and related services, the ability, to make available certain data" [5].

The development and expansion, in both type and volume, of B2B Non-personal Data Sharing Transactions (B2BNPDST) have been documented in previous reports [6, 7]. The main reason for this is the combination of the "growing scale and scope of NPD and increased capability and convergence of technology" [8] (such as AI, IoT, cloud computing, VR/AR, 5G, and data analytics technologies). This combination "has brought us past a point of inflection where the transformation of the economy has markedly accelerated, which is the transition to a new economic era, that of the Data-driven Economy (DDE), which succeeds the era of the knowledge-based economy" [8]. B2BNPDST are an integral part of this economy, so their study is one of the essential studies of the DDE. Currently, several governments have issued legal documents to regulate B2BNPDST, and a comparative study of the legal frameworks on B2B non-personal data sharing of those countries is therefore necessary to understand the reasons behind the commonalities and differences between these legal solutions. From there, the article draws on this comparison to give recommendations for Vietnam's law-making on B2BNPDST.

2 Features of B2BNPDST

Before examining the legal issues related to B2BNPDST, it is necessary to understand their features. Some of these features are introduced below.

2.1 The Pricing of B2BNPDST

B2BNPDST should not be considered "for free" [9]. However, the pricing of these transactions is not the same as that of transactions in the traditional economy, because they refer "to the collection, aggregation, organization, analysis, exchange, and exploitation" of NPD [10, p. 260]. To understand how to price these transactions, we need to understand the demands for NPD usage. Data is used "in production (such as in 'smart manufacturing' and 'smart agriculture'), the sale of goods and services (such as through electronic commerce), the provision of services (such as through online platforms like Uber), and trade in data as such (whether for advertising, solicitation, or assessment, as for credit ratings)" [10]. From these demands, B2BNPDST can be classified into two groups: services involving NPD, and transactions that treat data as goods. For the first group, the pricing of digital services can no longer use


the "pricing mechanisms that worked well for traditional industrial products in the past" [11, p. 10]. Instead, value-based pricing is the more appropriate approach [11, p. 12], because the delivery of new service offerings or digitally disrupted business models "can create the entirely new types of value-creation" [12], and the role of NPD in these services is as the digital format of the value delivered. NPD can also be viewed as goods, because NPD are traded for the value derived from them, depending on different usage or exploitation demands [13, para. 52]. In short, the conduct of B2BNPDST is the "process of securing profits from value-creation and the distribution of those profits among participating actors such as providers and customers" [12]; NPD in this process is part of the total value, delivered in digital format.

2.2 B2BNPDST from an Investment Perspective

Purchasing NPD can be seen as an investment, since a business can generate additional value from aggregating and processing the acquired NPD [14] and can use them for many purposes [13]. NPD are excludable by technology and stored privately [13, para. 53]; they can thus be seen as an asset of value to whoever holds them [14].

2.3 Multi-layered Digital Business Ecosystem (MLDBE) [15]

In providing data-driven services, companies can rely on other services such as Platform as a Service (PaaS) or Software as a Service (SaaS), thus forming a business ecosystem [3]. These PaaS or SaaS offerings in turn rely on digital and network infrastructure such as cloud services and physical storage services. From here, a multi-layered digital business ecosystem is formed, and this is the typical business model of the DDE [3].

2.4 Promptness

Promptness is a typical characteristic of B2BNPDST [16, p. 2]. In the DDE, "speed is everything" [16]. In some services, the demand for promptness has even been raised to (near) real-time data [17]. Promptness is also a major difference between transactions in the traditional economy and in the DDE.


3 Legal Issues Related to B2BNPDST

NPD are considered an asset. Thus, the legitimate "owners" of NPD (those who legally acquire NPD without breaching any contractual rights or infringing the legal rights of others) should have their legitimate interests protected (exclusive rights over their NPD, the right to make transactions related to NPD in the secondary market, the right to sue for deception, etc.). Currently, the legitimate rights of such "owners" in "owning" NPD in B2BNPDST are not fully protected by current law and technology. This study applies current law and technology to analyze how legitimate rights and interests related to NPD are being protected in the context of B2BNPDST, thereby pointing out the gaps in these protections. The aspects of monopoly, competition law, personal data, and illegal activities will not be discussed.

3.1 The Legal Approach

There are three legal approaches to protecting the legitimate interests related to NPD.

3.1.1 Ownership Rights

The first approach is ownership rights. Ownership is a bundle of rights granted to the owner [18]. Given the characteristics of NPD, some rights in that bundle are inappropriate in the B2B NPD sharing context. For example, ownership allows the owner to mortgage his or her property, but a database can be inconsistent [19], so mortgaging NPD would cause complications and unwanted consequences. Hence, ownership of NPD should not be encouraged, and other scholars share this opinion [13, para. 63], [20]. The question this issue raises is what rights, instead of ownership, should be granted to the person who has paid for NPD.

3.1.2 Intellectual Property (IP) Rights

The second approach is IP rights. However, this approach can protect only the intellectual property in an NPD set, not the rest. NPD sets can also be protected by the sui generis right. Under EU law, this right protects the content of a database, and "the maker of the database can prevent the extraction and/or reuse of the whole or a substantial part of the database's content" for 15 years [21]. However, this right is not suitable for B2BNPDST, since buyers need to extract or reuse the whole or a substantial part [22] of the database for the provision of goods or services.

3.1.3 Contractual Rights

The third approach is contract law. This approach is more flexible than the other two, because "the contractual obligations can protect data irrespective of any underlying legal rights and can impose fine-grained controls on the access, use and dissemination" [23, p. 15]. Contractual obligations can also include the protection of intellectual property and confidential information. However, this negotiation mechanism cannot protect the legitimate interests of the parties in the situations below.

(a) Economic loss because of NPD sharing. B2BNPDST are based on the value created and delivered, so the risks are borne by both sellers and buyers. On the buyer's side, determining an acceptable price is difficult, since there is no standard for NPD and the data generation sources vary (different people and devices), so buyers have to depend on the sellers or the service providers. Unlike physical goods, NPD cannot be bought by everyone, because not everyone can use or exploit NPD; a specific set of NPD is therefore unlikely to be repurchased by others, and buying NPD that turn out not to be useful [16] is a great loss for buyers. Once NPD are technically open for sharing, the excludable characteristic of NPD is no longer valid because, unlike physical goods, NPD can be accessed or kept by multiple parties at the same time [13, para. 52]. Therefore, on the seller's side, once NPD have been shared they cannot be recovered, and the seller's loss will be greater if the shared NPD include IP rights and confidential information. Contractual obligations cannot protect both parties in this situation, since the wording of a contract cannot fully describe the NPD and the contract performance process, and preventing the sharing of NPD with non-legitimate parties is impossible.

(b) Information asymmetry. "The concept of asymmetric information plays a crucial role in the market microstructure theory involving the fact that market participants may have different information" [24, p. 179]. In B2BNPDST, the sellers have more information and more technical power in controlling the making of products or the delivery of services. Thus, even a buyer with good knowledge and risk measures is unable to fully assess the risk because of information asymmetry. Moreover, information asymmetry exists not only in a single transaction but also in the chains of other transactions in the MLDBE. Additionally, designed technology and digital platforms can increase users' "digital dependency" [25]; in these circumstances, users will be less careful in measuring risks, which facilitates information asymmetry on a large scale. This issue cannot be prevented by the contract negotiation mechanism.

(c) The uncertainty of contract performance. NPD can be technically excludable [13, para. 52], so the party that has technical control of NPD has the ability to prevent others from accessing them. The issue may arise when one party blocks access to NPD; this action can affect the chain of transactions in the MLDBE and the promptness of digital services. Thus, contractual obligations can only protect one


specific transaction; they cannot prevent the consequences of a breach or protect the legitimate interests of the parties along the chain.

(d) Risks from using platforms. This issue concerns the role of service providers such as PaaS or SaaS. Businesses can use the tools or platforms provided by these providers to make or provide their goods or services for their clients. Reliance on these tools or platforms is a double-edged sword: businesses may not need to make huge investments in IT infrastructure to run their business, but they may encounter risks from using such tools and platforms. Specifically, existing PaaS and SaaS business models do not make it possible to tailor the design to each customer, because they have been designed to provide services for multiple customers at the same time [26, p. 179]. Therefore, if PaaS or SaaS are designed with a non-transparent, inefficient, and unsafe workflow, this may affect the transaction chains based on those platforms. Second, these providers can increase the dependency of businesses by restricting the sharing of data created on their platform with other platforms (what may be called temporary-access models [27, p. 170]). Even where such sharing is possible, switching to another platform will face incompatibility problems, thereby affecting the quality of the NPD. In these circumstances, contractual rights cannot protect the legitimate interests of the users in the chain of transactions, including their rights related to NPD and their freedom to choose services.

(e) The rights of the legitimate "owners" in the secondary market. Legitimate "owners" holding NPD may wish to resell or lease out their NPD, or to transfer the transactions related to NPD, but these transactions on the secondary market can be hindered. For example, NPD may be tied to a certain platform, and that platform may limit sharing or lack a suitable format to facilitate NPD sharing [27, pp. 171–173]. In practice, the transfer of NPD in this situation can occur by transferring access to the platform to the next buyer, but this may also be impeded if the platform provider obstructs or discriminates against the acquirer. Second, NPD can be made excludable by contract [13, para. 52]: provisions prohibiting the re-sharing of NPD in the secondary market can be used, which affects the legitimate "owners" of NPD. In such circumstances, contractual rights cannot protect the legitimate interests of the legitimate "owners" in making those transactions on the secondary market.

(f) Third-party-related issues. Since contractual rights "are rights in personam and can only be enforced against specific persons" [23, p. 15], the rights and obligations of third parties are not regulated and can thus affect the contractual rights of the other parties. The question to be raised in this situation is who can gain access to the NPD of the legitimate "owners". First, hackers: attacks by hackers that cause losses to transactions have been occurring in the DDE [28], raising questions about the parties' responsibility to enhance security so as to secure their own transactions and not affect others. Second, the parties in the lower layers of the MLDBE. These third parties have great power in terms of data collection because


they have access to the NPD traded on their platforms. Therefore, the free-rider issue [13, para. 55] should be raised for these third parties, because such companies gain profits for themselves without having to pay the costs of NPD collection. Moreover, since they have access to the NPD, they may also have access to the IP and confidential information of other parties in the upper layers of the MLDBE, and they can use such NPD without any obligation to protect the confidentiality of the legitimate "owners". Furthermore, the failure of one party in making NPD or services available will affect the whole DDE; for example, if a company that provides physical storage for data fails, the DDE will face serious consequences.

(g) Digital ethics [29, 30]. Data-driven decision-making is becoming common in everyday life. It has its benefits but also its downsides. In particular, decisions based on incorrect, faulty, outdated, or biased data can themselves be biased, which has many consequences for society. First, unsafe products or services based on faulty data can affect society on a large scale, for example through misdiagnosis in the medical industry or the use of faulty data in the construction industry. Second, the use of faulty data-driven applications, such as self-driving cars and facial recognition systems, can also cause societal harm. Third, since data can be technically controlled and adjusted in favor of personal or group interests, this can lead to issues of societal manipulation. These circumstances raise ethical questions that go beyond the contract negotiation mechanism.

3.2 The Technological Approach

Before discussing the technological approach to protecting rights related to NPD, the following points need to be clarified. First, even if technology can guarantee the interests of the parties, from a legal perspective only legitimate interests will be recognized. Second, no technology is perfect at protecting NPD-related rights in the multitude of unforeseen circumstances. Therefore, legitimate interests must be recognized by law, regardless of the effectiveness of the technology (Fig. 1).

Current digital business models include two types of systems: centralized and decentralized. Both have their advantages and disadvantages [32, pp. 10–15]. However, in terms of protecting NPD-related rights, the decentralized system has more advantages, because a centralized system allows centralized control over other companies' NPD [32] and will therefore cause large effects on the MLDBE chain, as in the issues mentioned above. Nevertheless, the two systems are still used concurrently, problems related to centralized systems still occur, and security issues remain a problem for both [28].


Fig. 1 Simplified patterns for centralized/decentralized and direct/indirect cross-ledger interoperability illustrating a single asset transfer from the source distributed ledger (a) to the target distributed ledger (b) [31]

useful to automate markets, products, services, and even new system of business organizations (Decentralized Autonomous Organizations or DAO)” [33, p. 7]. However, the issue of contract negotiation and contract performance still depends on the parties. Therefore, the legitimate interests of the parties have not been optimally protected [33, p. 12].
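To make the self-executing character of an SC concrete, the sketch below models a toy escrow-style exchange in Python. It only mirrors the conditional logic described in the quotation above; it is not tied to any real blockchain platform, and all names (EscrowContract, DataCo, MfgCorp, the dataset identifier) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EscrowContract:
    """Toy escrow: the asset is released only when the agreed condition
    (full payment deposited) is met -- the 'self-executing' idea of an SC."""
    seller: str
    buyer: str
    price: float
    asset: str          # identifier of the NPD asset being traded
    paid: float = 0.0
    released: bool = False

    def deposit(self, amount: float) -> None:
        """Buyer escrows funds with the contract."""
        self.paid += amount

    def settle(self) -> str:
        """Executes automatically once the condition holds."""
        if self.paid >= self.price and not self.released:
            self.released = True
            return f"{self.asset} transferred from {self.seller} to {self.buyer}"
        return "condition not met: no transfer"

contract = EscrowContract("DataCo", "MfgCorp", price=100.0, asset="dataset-42")
print(contract.settle())    # condition not met: no transfer
contract.deposit(100.0)
print(contract.settle())    # dataset-42 transferred from DataCo to MfgCorp
```

A real SC would run on a distributed ledger so that no single party could alter the escrow state; the toy above reproduces only the conditional release logic, which is why, as noted, negotiation and performance still depend on the parties.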

4 A Comparative Study of the Legal Approaches of the UK and EU in Protecting Legitimate Rights and Interests of the Parties in B2BNPDST

From the issues above, we can see that, from the asset view of NPD, neither current law nor current technology fully guarantees the legitimate interests of the parties in B2BNPDST. The question, therefore, is to what extent and how the law should intervene to protect these legitimate interests. Referring to the legal approaches of countries to this problem, including hard law (acts, regulations) and soft law (guidance, standards), with regulatory scope covering either the whole economy or specific industries, this article provides a comparison of the legal approaches of the UK and the EU; it does not analyze the effectiveness of those approaches. The comparison is presented in Table 1. The UK and EU legal approaches have similarities: both have data strategies, hard law, and soft law applying to the whole economy and to specific industries. Specifically, UK case law and EU law have provisions on the data holder’s obligation to make data available to users. Additionally, their Intellectual Property laws have been adjusted to fit the context of the data-driven economy.


Table 1 Comparison of the legal approaches of the UK and the EU

Ownership rights
UK: • This approach is not mentioned in the National Data Strategy
EU: • Purchase, rent or lease (Article 3.2 Data Act)

IP protections
UK: • Part 4 (Intellectual Property) of the Digital Economy Act 2017 • Amending relevant IP laws: increasing infringement cases and conditions for exemptions (in the context of the digital economy)
EU: • Keep the existing rules in the areas of intellectual property • Sui generis right of the Database Directive: under review to align with the Data Act

Beyond contractual obligations
UK: • There is no specific law for B2BNPDST within the DDE; hard law and soft law for the DDE are under development according to the UK National Data Strategy • Industry-specific regulations are also being developed and applied (for example, the Industrial Strategy Artificial Intelligence Sector Deal)
EU: • For the DDE: hard law: Data Governance Act, Data Act, Digital Services Act, Digital Markets Act; soft law: Recommended Contract Terms for Data Sharing • Specific industry: proposal for a regulation on the European Health Data Space

(continued)

The main difference is that, compared to the EU, the UK is more inclined toward soft law. The reason for this may lie in the policies and management styles of the two jurisdictions. Overall, it can be concluded that the UK and EU legal approaches to protecting the NPD-related rights and legitimate interests of the parties in B2BNPDST are broadly similar.

5 Experiences for Vietnam

Vietnam’s DDE “has been booming and is the second fastest-growing market in Southeast Asia after Indonesia” [35, p. 1]. Although Vietnam has an Intellectual Property Law, a Cybersecurity Law, and a draft decree on the protection of personal data [36], it does not have laws on B2BNPDST. It is advisable to refer to other countries’ regulations on this matter. However, wholesale law importation should not be encouraged, for the following reasons. First, different countries with different economic, cultural, and social contexts cannot apply the same formula. Second, the enactment of a law comes with the responsibility of the competent authorities to manage the activities arising from that enactment. Given Vietnam’s civil law background, the application of hard law to B2BNPDST will be more appropriate. The development of a legal framework for B2BNPDST needs to go hand in hand with the protection of personal data.


Table 1 (continued)

1. Economic loss because of NPD sharing
UK: • Enhancing trust by soft law • For the DDE: Data Quality Framework, Information Management Framework, Data Maturity Model, Data Manage Community of Good Practice, AI Auditing Framework • Specific industries: ISO 19650 (construction industry)
EU: • For the users: the right of users to access and use data generated by the use of products or related services (Article 4 Data Act) • For the seller, renter, lessor, or data holder: “The data recipient shall destroy the data made available by the data holder and any copies thereof or end the production, offering, placing on the market or use of goods” in cases where “they have provided inaccurate or false information to the data holder, deployed deceptive or coercive means or abused evident gaps in the technical infrastructure of the data holder designed to protect the data, has used the data made available for unauthorized purposes or has disclosed those data to another party without the data holder’s authorization” (Article 11.2 Data Act)

2. Information asymmetry
UK: • N/A
EU: • For the seller, renter, lessor, or data holder: obligation to make data generated by the use of products or related services accessible (Article 3 Data Act); essential requirements regarding interoperability (Article 28 Data Act)

3. The uncertainty of contract performance
UK: • Case law on the obligation of the data holder to make data available: Mott MacDonald Ltd v Trant Engineering Ltd [2017] EWHC 2061 (TCC)
EU: • Data holder: conditions under which data holders make data available to data recipients (Article 8 Data Act); compensation for making data available (Article 9 Data Act)

(continued)


Table 1 (continued)

4. Risks from using platforms
UK: • N/A
EU: • Chapter VI of the Data Act: switching between data processing services; interoperability for data processing services (Article 29 Data Act) • Digital Services Act: “sets a higher standard of transparency and accountability on how the providers of such platforms moderate content, on advertising and on algorithmic processes. It sets obligations to assess the risks their systems pose to develop appropriate risk management tools to protect the integrity of their services against the use of manipulative techniques” [34]

5. Rights of the legitimate “owners” in the secondary market
UK: • N/A
EU: • For the users: right to share data with third parties (Article 5 Data Act) • For the third parties in the secondary market: obligations of third parties receiving data at the request of the user (Article 6 Data Act)

6. Third party related issues
UK: • Security: soft law guidance on security matters: Cyber Assessment Framework, NIS Regulations 2018, etc. • For MLDBE and storage providers: N/A
EU: • Security: technical protection measures and provisions on unauthorized use or disclosure of data (Article 11 Data Act), EU Cybersecurity Act • For MLDBE: Chapter III of the Digital Markets Act (practices of gatekeepers that limit contestability or are unfair); Chapter III of the Data Governance Act (requirements applicable to data sharing services) • Storage providers: Data Storage and Processing Services: Interaction with Data Protection Rules (in development)

(continued)


Table 1 (continued)

7. Digital ethics
UK: • AI: UK proposal for AI regulation • Others: N/A
EU: • AI: Artificial Intelligence Act • Others: N/A

For the regulation of B2BNPDST to be more effective, the law on personal data protection must be drafted in accordance with the way B2BNPDST will be managed in the future. While there is no general regulation for personal data and NPD protection, the protection of the legitimate interests of parties in B2BNPDST in industries that are already undergoing digital transformation should be prioritized. It is therefore advisable to have industry-specific guidelines before a general protection law for personal data and NPD exists. Finally, since B2BNPDST is the source creating NPD for society, digital ethics issues should also be considered when regulating B2BNPDST, to minimize the related societal consequences.

6 Conclusion

This study analyzes the features of B2BNPDST in the DDE. On that basis, it analyzes the legal issues of protecting the legitimate interests of the parties in B2BNPDST from the legal approaches (ownership, IP protection, contract law) and the technological approach. The study also compares UK law and EU law on these legal issues and finds that both the EU and the UK protect access to NPD through hard law. Finally, the article outlines the provisions of current Vietnamese law and provides implications for Vietnam’s law-making process on B2BNPDST.

References

1. EU Data Act, Article 2(1)
2. EU Data Governance Act, Article 2(3)
3. European Round Table, Expert Paper: B2B Data Sharing (2021). https://ert.eu/wp-content/uploads/2021/06/ERT-Expert-Paper-B2B-Data-Sharing-FINAL.pdf. Accessed 24 May 2022
4. EU Data Act, Article 2(5)
5. EU Data Act, Article 2(6)
6. OECD, Data-Driven Innovation: Big Data for Growth and Well-Being (2015). https://read.oecd-ilibrary.org/science-and-technology/data-driven-innovation_9789264229358-en#page23. Accessed 24 May 2022
7. UN, The Value and Role of Data in Electronic Commerce and the Digital Economy and Its Implications for Inclusive Trade and Development (2019). https://unctad.org/system/files/official-document/tdb_ede3d2_en.pdf. Accessed 24 May 2022


8. C. Dan, The economics of data: implications for the data-driven economy, in Data Governance in the Digital Age, ed. by Centre for International Governance Innovation (2018)
9. European Commission, Study on Data Sharing Between Companies in Europe (2018). https://op.europa.eu/en/publication-detail/-/publication/8b8776ff-4834-11e8-be1d-01aa75ed71a1/language-en. Accessed 24 May 2022
10. G. Shaffer, Trade law in a data-driven economy: the need for modesty and resilience. World Trade Rev. 20(3), 259–281 (2021)
11. Deloitte, Pricing of Digital Products and Services in the Manufacturing Eco-system: From Cost-Based to Value-Based Pricing. https://www2.deloitte.com/content/dam/Deloitte/de/Documents/energy-resources/IPC_Pricing%20of%20digital%20products_final.pdf. Accessed 24 May 2022
12. G.K. Agarwal et al., Value-capture in digital servitization. J. Manuf. Technol. Manag. (2022). https://doi.org/10.1108/JMTM-05-2021-0168
13. T. Tombal, Imposing Data Sharing Among Private Actors: A Tale of Evolving Balances (Kluwer Law International BV, 2022)
14. UN, Digital Economy Report 2021: Cross-border Data Flows and Development: For Whom the Data Flow. United Nations (2021)
15. V. Chang, L. Uden, Governance for E-learning ecosystem, paper presented at the 2008 2nd IEEE International Conference on Digital Ecosystems and Technologies (2008). https://doi.org/10.1109/DEST.2008.4635164
16. Z. Zhang et al., The elements of data sharing. Genom. Proteom. Bioinf. 18(1), 1–4 (2020). https://doi.org/10.1016/j.gpb.2020.04.001
17. S. Davari et al., Demystifying the definition of digital twin for built environment, in The 9th International Conference on Construction Engineering and Project Management, University of Nevada, Las Vegas, 20–24 June 2022
18. E. Baskind, G. Osborne, L. Roach, Commercial Law, 3rd edn. (Oxford University Press, 2019)
19. L. Bertossi, Inconsistent databases, in Encyclopedia of Database Systems, ed. by L. Liu, M.T. Özsu (Springer, 2009). https://doi.org/10.1007/978-0-387-39940-9_1242
20. I. Cofone, Beyond data ownership. Cardozo Law Rev. 43(2), 501–572 (2020)
21. EU, Database Protection. https://europa.eu/youreurope/business/running-business/intellectual-property/database-protection/index_en.htm. Accessed 24 May 2022
22. OECD, Measuring the Economic Value of Data and Cross-border Data Flows: A Business Perspective (OECD Publishing, 2020)
23. T. Bond, How best to protect proprietary data in data-sharing deals, in The Guide to Data as a Critical Asset 2022. Law Business Research (2022). https://www.twobirds.com/-/media/newwebsitecontent/pdfs/2022/articles/2022_gdr-data_how-best-to-protect-proprietary-data.pdf. Accessed 23 May 2022
24. R. Karaa, S. Slim, D. Hmaied, Trading intensity and informed trading in the Tunis Stock Exchange, in Emerging Markets and the Global Economy, ed. by M. Arouri, S. Boubaker, D. Nguyen (Academic Press, 2014), pp. 179–200
25. J. Pitt, From Trust and Loyalty to Lock-In and Digital Dependence (2020). https://technologyandsociety.org/from-trust-and-loyalty-to-lock-in-and-digital-dependence/. Accessed 24 July 2022
26. K.S. Rahman, K. Thelen, The rise of the platform business model and the transformation of twenty-first-century capitalism. Polit. Soc. 47(2), 177–204 (2019)
27. A. Perzanowski, J. Schultz, The End of Ownership: Personal Property in the Digital Economy (The MIT Press, 2018)
28. S. Morgan, Cybercrime to Cost the World $10.5 Trillion Annually By 2025 (2016). https://cybersecurityventures.com/hackerpocalypse-cybercrime-report-2016/. Accessed 24 May 2022
29. R. Montague, Digital ethics, in Irish Building Magazine 2022 Issue 2 (2022). https://edition.pagesuite-professional.co.uk/html5/reader/production/default.aspx?pubname=&edid=09ff22aa-baaf-4b71-aafe-b4934ba12bfa&pnum=104. Accessed 24 July 2022
30. OECD, Good Practice Principles for Data Ethics in the Public Sector. https://www.oecd.org/digital/digital-government/good-practice-principles-for-data-ethics-in-the-public-sector.htm. Accessed 24 May 2022


31. A. Sunyaev, Token economy. Bus. Inf. Syst. Eng. 63(4), 457–478 (2021)
32. G. Cuillier, Advantages and Disadvantages of Centralized Versus Decentralized Information Systems and Services from a Project Management Perspective. M.S. thesis, California State University, San Bernardino (2022). https://scholarworks.lib.csusb.edu/etd/1487. Accessed 24 July 2022
33. P. Bayón, Key legal issues surrounding smart contract applications. KLRI J. Law Legislation 9(1), 63–91 (2019). https://ssrn.com/abstract=3525778
34. European Commission, Digital Services Act: Questions and Answers (2022). https://digital-strategy.ec.europa.eu/en/faqs/digital-services-act-questions-and-answers. Accessed 24 May 2022
35. B. Le, P. Tran, Digital Economy and Digital Transformation in Vietnam: A Reader Prepared for Roundtable Series on EVFTA, EVIPA and Post-COVID-19 Economic Recovery in Vietnam (2020). https://www.economica.vn/Content/files/PUBL%20%26%20REP/EVFTA%20and%20Digital%20Economy%20in%20Vietnam%20ENG.pdf. Accessed 24 May 2022
36. Vietnam Business Law, Draft New Decree on Personal Data Protection in Vietnam (2021). https://vietnam-business-law.info/blog/2021/8/30/draft-new-decree-on-personal-data-protection-in-vietnam. Accessed 24 May 2022

Capacitive Pressure Sensor Based on Interdigitated Capacitor for Applications in Smart Textiles

Tran Thuy Nga Truong and Jooyong Kim

Abstract Today, electronic skin (E-skin) is an active research area in human–computer interaction and intelligent wearable devices, pursued with a variety of approaches. Our research aims to offer a systematic approach to electro-textile pressure sensors that rely on interdigitated capacitors (IDCs) for applications in artificial intelligence, particularly smart wearable textile devices, including the choice of appropriate materials to achieve high sensing performance. First, the sensor characteristics were measured with a precision LCR meter over the frequency range from 1 to 300 kHz. The parallel-plate measurement method was applied to measure and quantify the dielectric change as a function of pressure-sensing performance: the Keysight 16451B dielectric test fixture, which can measure a dielectric material as a function of frequency, was used to measure the dielectric constant and dissipation factor, and the influence of the choice of substrate layer on sensitivity was also examined. Second, the influence of the volume percentage of CNTs on the characteristics of the composite material is examined; the presence of CNTs improves the bonding strength of the sensor layer as well as the sensor’s robustness in high-frequency applications. Third, a large pressure detection range (up to 400 kPa) and quick response times (less than 20 ms), with a sensitivity ranging from 0.008125 kPa⁻¹ (at 400 kPa) to 0.1 kPa⁻¹ (at 20 kPa), have been made possible by the combination of CNTs with Ecoflex rubber. Finally, to obtain excellent performance, variables including frequency, fabric substrate, filler loading, and dielectric layer structure must be properly considered.

Keywords Capacitive pressure sensor · Electronic textiles · Electronic skin · Flexible wearable sensor · Printed-type sensor

This research is supported by Soongsil University.
T. T. N. Truong · J. Kim (B), Soongsil University, Seoul, South Korea
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter, Studies in Computational Intelligence 1086, https://doi.org/10.1007/978-3-031-26135-0_11


1 Introduction

Electronic skin is an artificial skin that mimics human skin’s functionalities and mechanical properties, which brings many potential applications in robotics [1], prosthetics, and wearable attachable devices that perceive the external environment with tactile sensing capability [2]. With the help of E-skin, professionals can measure human activity and diagnose diseases. An analysis of keywords related to current E-skin research shows that the pressure sensor, the strain sensor, and flexible electronics are the most focused research directions; among these topics, the pressure sensor is the primary area of E-skin research. Tactile sensing is one of the vital functions of skin, so creating electronic skin with tactile capability is critical. Note that the electrical and mechanical properties of composites strongly depend on the percolation of conductive fillers such as carbon nanotubes, graphene, and silver nanowires; high filler loading may improve conductivity at the cost of compromised flexibility and stretchability. Many sensing methods based on piezoelectric, triboelectric, piezoresistive, and piezocapacitive effects have been studied. Among them, capacitive pressure sensors, which rely on two electrically conductive parallel electrodes separated by a dielectric layer, are broadly used because of their lower power consumption, quicker response times, and simple structure and working principle. Theoretically, the capacitance of a parallel-plate capacitor is calculated by Eq. (1):

$C = \dfrac{\varepsilon_r \varepsilon_0 A}{d}$  (1)

where εr stands for the permittivity constant of the material, ε0 is the vacuum dielectric constant, A represents the effective area of the upper and lower plates, and d denotes the separation gap between the two plates. This sensing method involves changing the dielectric layer’s thickness in response to an external force, which in turn changes the sensor’s capacitance. Because the pressure sensitivity depends on the parameters A and d in Eq. (1), changing the area or the thickness has a direct impact. The most common mode of capacitive pressure sensing is changing the separation gap: under an applied pressure the gap is compressed by Δd, so the greater the pressure, the narrower the distance between the two plates and the higher the capacitance. Accordingly, by varying εr or d, capacitive pressure sensors can be separated into two major categories: dynamic dielectric layer [3] and compression of the distance between the two plates [4]. Because of the reliance on the dielectric constant εr and thickness d in Eq. (1), changing these parameters influences both the responsiveness and the size; as a result, responsiveness is often extremely low, and the size is often large, to achieve high performance. In any case, these strategies have slow recovery times, high costs, and complex manufacturing processes to create microstructures.
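As a worked illustration of Eq. (1) and the gap-change sensing mode just described, the short Python sketch below computes the capacitance of a parallel-plate sensor before and after the dielectric is compressed. The geometry and permittivity values are illustrative assumptions, not values from this paper.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def parallel_plate_capacitance(eps_r: float, area: float, gap: float) -> float:
    """Eq. (1): C = eps_r * eps_0 * A / d (SI units)."""
    return eps_r * EPS0 * area / gap

# Hypothetical sensor: 1 cm^2 plates, elastomer dielectric (eps_r ~ 3), 100 um gap.
c0 = parallel_plate_capacitance(3.0, area=1e-4, gap=100e-6)
# Applied pressure compresses the gap by delta_d = 20 um, so capacitance rises.
c_pressed = parallel_plate_capacitance(3.0, area=1e-4, gap=80e-6)
print(f"C0 = {c0 * 1e12:.1f} pF -> pressed: {c_pressed * 1e12:.1f} pF")
```

With these assumed numbers the capacitance rises from about 26.6 pF to 33.2 pF, showing why a smaller gap d directly increases C.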


In this study, we present a wearable capacitive pressure sensor based on an interdigitated capacitor. Our work proposes the design of a wearable pressure sensor that relies on a single-sided electrode. The presented sensor is governed by the number of fingers and the dielectric constant of the synthetic polymer layer, rather than by the separation distance or the contact area (A) of the dielectric layer. The interdigitated capacitor pressure sensor is then introduced. The electrodes of the proposed detector are created using silver ink printed on polyester fabric and resemble a brush with many interdigitated fingers. This paper’s efforts to characterize the proposed sensor are gathered into two main investigations: the change in the dielectric constant under pressure, and the concentration of nanoparticles inside the dielectric layer. Finally, the examination results demonstrate that the presented sensor offers advantages in size, sensitivity, cost, and power consumption, enabling numerous potential applications in smart textiles.

2 Theoretical Background

In this section, we focus on three main investigations. The first is the principle of operation of the IDC. The second is the dispersion of carbon nanotubes (CNTs) inside the composite to improve the polymer’s properties. Finally, we discuss some promising solutions to further increase the sensor’s performance. In previous studies, we evaluated the effect of frequency on sensor performance; this was also done with sensors printed on fabric and yielded the same results [5]. In other studies, we demonstrated the advantages of silver thread over silver ink when operating at high frequencies [6, 7]. Therefore, to increase the flexibility of the sensor, it is possible to change from the printed to the embroidered form for many purposes while keeping the same operating mechanism. It should be added that, when switching from the printed to the embroidered form, parasitic capacitance occurs at high frequencies; our previous work showed that these parasitic capacitances reduce the stability of the sensor. Therefore, the embroidered form is less stable and thicker than the printed form, but in return offers greater flexibility. Our goal is not only to provide a systematic analysis of the proposed sensor but also to provide concluding information on various design aspects for researchers and manufacturers.

2.1 Principle of Operation

The interdigitated capacitor has a structure that looks like a comb with multiple fingers. Compared with a parallel-plate capacitor (PPC), this creates a structure with a larger effective area for the same overall dimensions.


Compared to a PPC, this design also has a higher quality factor. The sensor’s construction is based solely on a single coplanar port, requiring just a single side for connecting it to the controller. The capacitance estimated between the narrow gaps of the finger electrodes depends on the dielectric constants of both the sensing and substrate layers; when the gaps decrease, the capacitance increases accordingly. Moreover, the sensitivity is determined by Eq. (2):

$S = \dfrac{\Delta C / C_0}{P}$  (2)

where P is the applied pressure and C0 is the standard (initial) capacitance, which is proportional to the dielectric constant of the substrate. The higher the initial capacitance, the lower the achieved sensitivity, so choosing a suitable fabric becomes important for reaching high sensitivity (S). The dimensions shown in Fig. 1 define the proposed sensor. The planar interdigital microstrip in Fig. 1b is calculated by summing the unit-cell capacitances. The characteristic capacitance of each unit cell can be analyzed as in Eqs. (3)–(6) [8]:

$C_{Cell} = C_{MUT} + C_{Sub} + C_G$  (3)

$C_{MUT} + C_{Sub} = \varepsilon_0 \dfrac{(\varepsilon_{MUT} + \varepsilon_{Sub})\, K\!\left(\sqrt{1-\delta^2}\right)}{2K(\delta)}$  (4)

$C_G = \varepsilon_0 \varepsilon_{MUT} \dfrac{h}{a}$  (5)

$\delta = \dfrac{h}{a}$  (6)

Fig. 1 a Top view and b cross-section

where ε0 = 8.85 × 10⁻¹² F/m represents the permittivity of free space, εSub represents the dielectric constant of the substrate (here, polyester fabric), CSub is the capacitance of the substrate, εMUT is the relative permittivity of the material under test (MUT), CMUT is the capacitance of the MUT, and CG is the capacitance between two fingers. K(x) is the complete elliptic integral of the first kind, h is the height of the metal layer, and a is the dimension of one unit cell. Usually, the gap between fingers (G) and the space from the electrode to the end of the fingers are chosen to be equal. The proposed sensor with eight fingers, as shown in Fig. 1, has the dimensions listed in Table 1. Because of its potential for low resistance and high conductivity, the introduced sensor was screen printed using silver paste, meaning the fabric substrate is directly attached to the silver paste. Moreover, the advantage of using a liquid paste is that the shape of the electrode can be defined on the flexible substrate using a simple technique: printing. The characteristics and curing conditions of the silver paste (DM-SIP-2001) from Dycotec applied in this study are displayed in Table 2. Under pressure, the SUT (sample under test) is placed on the substrate surface, and the capacitance between the finger terminals varies depending on the frequency and the dielectric. The effect of the SUT’s dielectric constant on the change in capacitance under pressure was used to estimate the sensor’s sensitivity. As demonstrated in the following sections, the introduced sensor converts the dielectric variations of the MUT into force; this works because an increase in dielectric constant corresponds to an increase in compressive strength. Equation (4) shows that (εMUT + εSub) represents the dielectric variations of the MUT and substrate corresponding with the sensor’s capacitance variation. This sum of two capacitances is far larger than the capacitance between the electrodes, CG, so the contribution of CG in this case can be neglected.
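A numeric sketch of Eqs. (2)–(6) in Python is given below, using scipy’s complete elliptic integral of the first kind (scipy.special.ellipk takes the parameter m = k², so K(k) is ellipk(k**2)). The geometry and permittivity values, and the assumed permittivity change under pressure, are illustrative assumptions, not the paper’s measured data.

```python
from scipy.special import ellipk  # complete elliptic integral of the first kind, K(m), m = k**2

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def unit_cell_capacitance(eps_mut: float, eps_sub: float, h: float, a: float) -> float:
    """Unit-cell capacitance of the IDC following Eqs. (3)-(6) as given."""
    delta = h / a                                              # Eq. (6)
    c_mut_sub = (EPS0 * (eps_mut + eps_sub)
                 * ellipk(1 - delta**2) / (2 * ellipk(delta**2)))  # Eq. (4)
    c_g = EPS0 * eps_mut * delta                               # Eq. (5)
    return c_mut_sub + c_g                                     # Eq. (3)

def sensitivity(c0: float, c: float, pressure_kpa: float) -> float:
    """Eq. (2): S = (dC / C0) / P, in kPa^-1."""
    return ((c - c0) / c0) / pressure_kpa

# Illustrative: polyester substrate (eps_sub ~ 3) and a CNT/Ecoflex MUT whose
# permittivity rises from 5 to 6 under 20 kPa of applied pressure.
c0 = unit_cell_capacitance(5.0, 3.0, h=10e-6, a=1.27e-3)
c1 = unit_cell_capacitance(6.0, 3.0, h=10e-6, a=1.27e-3)
print(f"S = {sensitivity(c0, c1, 20.0):.4f} kPa^-1")
```

The sketch also shows the design point made above: because CMUT + CSub dominates CG, the response is driven almost entirely by the permittivity change of the MUT rather than by electrode geometry.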

Table 1 The integrated capacitor’s dimensions

Parameter | Dimension (mm)
Width of the finger (W) | 0.57
The gap between two fingers (G) | 0.70
Distance of the overlapped region (L) | 15
Dimension of feedline (Wf) | 2.80

Table 2 Characteristics of silver ink printed on fabric

Type | Property
Curing conditions (°C, min) | 120, 15
Density (g/cm³) | 2.1
Sheet resistance (mΩ/Sq./mil) |