

Studies in Computational Intelligence 847

Roger Lee Editor

Applied Computing and Information Technology

Studies in Computational Intelligence Volume 847

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted to indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.

More information about this series at http://www.springer.com/series/7092

Roger Lee Editor

Applied Computing and Information Technology


Editor Roger Lee Software Engineering and Information Technology Institute Central Michigan University Mount Pleasant, MI, USA

ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-030-25216-8 ISBN 978-3-030-25217-5 (eBook) https://doi.org/10.1007/978-3-030-25217-5 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

The purpose of the 7th International Conference on Applied Computing and Information Technology (ACIT 2019), held on May 29–31 in Honolulu, Hawaii, was to bring together researchers, scientists, engineers, industry practitioners, and students to discuss, encourage, and exchange new ideas, research results, and experiences on all aspects of Applied Computing and Information Technology, and to discuss the practical challenges encountered along the way and the solutions adopted to solve them. The conference organizers selected the best 12 papers from those accepted for presentation at the conference in order to publish them in this volume. The papers were chosen based on review scores submitted by members of the program committee and underwent further rigorous rounds of review.

In chapter “An Algorithm Integrating and Updating Rights Management Information on Public Domain Images,” Youngmo Kim, Il-Hwan Kim, Deok-Gi Hong, and Seok-Yoon Kim proposed an algorithm that extracts and analyzes the information needed to update Rights Management Information (RMI) to its latest state and to correct wrong or duplicated RMI in various public domain sites.

In chapter “Improvement of Data Sparsity and Scalability Problems in Collaborative Filtering Based Recommendation Systems,” Ji-Won Choi, Sang-Kweon Yun, and Jong-Bae Kim proposed a collaborative filtering technique that improves the data sparsity and scalability limits of the collaborative filtering recommendation technique through two-stage clustering.

In chapter “A Study on the Methods for Establishing Security Information & Event Management,” Geyong-Sik Jeon, Sam-Hyun Chun, and Jong-Bae Kim presented a security log analysis system utilizing a security information and event management (SIEM) system to cope with advanced attacks that existing ESM can hardly detect. The SIEM analyzes associations between data and security events occurring in major IT infrastructure networks, systems, applied services, and a great number of information security systems, and then presents methods for identifying potential security threats in advance.




In chapter “Multi-TSV (Through Silicon Via) Error Detection Using the Non-contact Probing Method,” Sang-Min Han, Youngkyu Kim, and Jin-Ho Ahn proposed a simple and fast through-silicon via (TSV) fabrication error detection method for 3-D IC applications. From the post-processing of the measured data, the capability of error detection is verified in terms of error recognition and the size and number of errors.

In chapter “Real-Time Ultra-Wide Viewing Player for Spatial and Temporal Random Access,” Gicheol Kim and Haechul Choi introduced a spatial and temporal random-access method that uses NVIDIA’s NvCodec library. Furthermore, they proposed a transrating method to lower the bit rate of the region of non-interest. Using the proposed methods, a customized service can be implemented.

In chapter “A Study on the Faith Score of Telephone Voices Using Machine Learning,” Hyungwoo Park presented a study to evaluate whether a telephone voice is human or a machine. He proposed using a support vector machine to judge whether a voice is real, yielding a personal telephone truth discriminator.

In chapter “A Comparative Study of Using Bag-of-Words and Word-Embedding Attributes in the Spoiler Classification of English and Thai Text,” Rangsipan Marukatat compared the effectiveness of traditional bag-of-words and word-embedding attributes for classifying movie comments as spoiler or non-spoiler. Both approaches were applied to comments in English, an inflectional language, and in Thai, a non-inflectional language.

In chapter “Fall Detection of Elderly Persons by Action Recognition Using Data Augmentation and State Transition Diagram,” Ayaka Takebayashi, Yuji Iwahori, Shinji Fukui, James J. Little, Lin Meng, Aili Wang, and Boonserm Kijsirikul proposed a method to acquire the posture information of a person from a camera watching over elderly people and to recognize the person’s behavior using long short-term memory. The proposed method detects falls of elderly people automatically using the result of action recognition.

In chapter “Elliptic Curve Cryptography and LSB Steganography for Securing Identity Data,” Christofer Derian Budianto, Arya Wicaksana, and Seng Hansun proposed using elliptic curve cryptography (ECC) to secure Indonesian identity card data. Least significant bit steganography is used after the ECC to embed the information (ciphertext) into the person’s picture, so the only information stored in the application is a collection of photos.

In chapter “Labeling Algorithm and Fully Connected Neural Network for Automated Number Plate Recognition System,” Kevin Alexander, Arya Wicaksana, and Ni Made Satvika Iswari proposed using a labeling algorithm and a fully connected neural network to create an ANPR system for vehicle parking management at Universitas Multimedia Nusantara, Indonesia. The system was built using Java and the Android SDK for the client and PHP for the server.



In chapter “Implementation of Creation and Distribution Processes of DACS Rules for the Cloud Type Virtual Policy Based Network Management Scheme for the Specific Domain,” Kazuya Odagiri, Shogo Shimizu, and Naohiro Ishii proposed a method to address the problems arising from the anonymity of network communication, such as personal information leaks and crimes committed using the Internet. To solve this problem, they applied a Destination Addressing Control System (DACS) scheme to Internet system management.

In chapter “Transforming YAWL Workflows into Petri Nets,” Wanwisa Paakbua and Wiwat Vatanawood proposed an alternative way to transform a YAWL workflow into a corresponding Petri net model. A set of mapping rules is proposed to cope with non-well-formed YAWL models, and the resulting Petri net model of the YAWL workflow is correct and ready for model checking.

It is our sincere hope that this volume will provide stimulation and inspiration and that it will be used as a foundation for works to come.

May 2019

Takaaki Goto Toyo University Kawagoe, Japan

Contents

An Algorithm Integrating and Updating Rights Management Information on Public Domain Images
Youngmo Kim, Il-Hwan Kim, Deok-Gi Hong and Seok-Yoon Kim . . . . . 1

Improvement of Data Sparsity and Scalability Problems in Collaborative Filtering Based Recommendation Systems
Ji-Won Choi, Sang-Kweon Yun and Jong-Bae Kim . . . . . 17

A Study on the Methods for Establishing Security Information & Event Management
Geyong-Sik Jeon, Sam-Hyun Chun and Jong-Bae Kim . . . . . 33

Multi-TSV (Through Silicon Via) Error Detection Using the Non-contact Probing Method
Sang-Min Han, Youngkyu Kim and Jin-Ho Ahn . . . . . 47

Real-Time Ultra-Wide Viewing Player for Spatial and Temporal Random Access
Gicheol Kim and Haechul Choi . . . . . 57

A Study on the Faith Score of Telephone Voices Using Machine Learning
Hyungwoo Park . . . . . 71

A Comparative Study of Using Bag-of-Words and Word-Embedding Attributes in the Spoiler Classification of English and Thai Text
Rangsipan Marukatat . . . . . 81

Fall Detection of Elderly Persons by Action Recognition Using Data Augmentation and State Transition Diagram
Ayaka Takebayashi, Yuji Iwahori, Shinji Fukui, James J. Little, Lin Meng, Aili Wang and Boonserm Kijsirikul . . . . . 95

Elliptic Curve Cryptography and LSB Steganography for Securing Identity Data
Christofer Derian Budianto, Arya Wicaksana and Seng Hansun . . . . . 111

Labeling Algorithm and Fully Connected Neural Network for Automated Number Plate Recognition System
Kevin Alexander, Arya Wicaksana and Ni Made Satvika Iswari . . . . . 129

Implementation of Creation and Distribution Processes of DACS Rules for the Cloud Type Virtual Policy Based Network Management Scheme for the Specific Domain
Kazuya Odagiri, Shogo Shimizu and Naohiro Ishii . . . . . 147

Transforming YAWL Workflows into Petri Nets
Wanwisa Paakbua and Wiwat Vatanawood . . . . . 165

Author Index . . . . . 179

Contributors

Jin-Ho Ahn School of Electronics and Display Engineering, Hoseo University, Asan, Chungnam, Republic of Korea
Kevin Alexander Department of Informatics, Universitas Multimedia Nusantara, Tangerang, Indonesia
Christofer Derian Budianto Department of Informatics, Universitas Multimedia Nusantara, Tangerang, Indonesia
Haechul Choi Department of Multimedia Engineering, Hanbat National University, Daejeon, Yuseong-gu, South Korea
Ji-Won Choi Graduate School of Software, Soongsil University, Seoul, Korea
Sam-Hyun Chun Department of Law, Soongsil University, Seoul, Korea
Shinji Fukui Department of Information Education, Aichi University of Education, Kariya, Japan
Sang-Min Han Department of Information and Communication Engineering, Soonchunhyang University, Asan, Chungnam, Republic of Korea
Seng Hansun Department of Informatics, Universitas Multimedia Nusantara, Tangerang, Indonesia
Deok-Gi Hong Department of Computer Science and Engineering, Soongsil University, Seoul, Republic of Korea
Naohiro Ishii Aichi Institute of Technology, Toyota, Aichi, Japan
Ni Made Satvika Iswari Department of Informatics, Universitas Multimedia Nusantara, Tangerang, Indonesia
Yuji Iwahori Graduate School of Engineering, Chubu University, Kasugai, Japan
Geyong-Sik Jeon Department of IT Policy and Management, Soongsil University, Seoul, Korea
Boonserm Kijsirikul Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
Gicheol Kim Department of Multimedia Engineering, Hanbat National University, Daejeon, Yuseong-gu, South Korea
Il-Hwan Kim Department of Computer Science and Engineering, Soongsil University, Seoul, Republic of Korea
Jong-Bae Kim Startup Support Foundation, Soongsil University, Seoul, Korea
Seok-Yoon Kim Department of Computer Science and Engineering, Soongsil University, Seoul, Republic of Korea
Youngkyu Kim Department of Information and Communication Engineering, Soonchunhyang University, Asan, Chungnam, Republic of Korea
Youngmo Kim Department of Computer Science and Engineering, Soongsil University, Seoul, Republic of Korea
James J. Little Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
Rangsipan Marukatat Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakhon Pathom, Thailand
Lin Meng Department of Electrical and Electronics Engineering, Ritsumeikan University, Kusatsu, Japan
Kazuya Odagiri Sugiyama Jogakuen University, Nagoya, Aichi, Japan
Wanwisa Paakbua Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
Hyungwoo Park School of Information Technology, Soongsil University, Seoul, Republic of Korea
Shogo Shimizu Gakushuin Women’s College, Tokyo, Japan
Ayaka Takebayashi Graduate School of Engineering, Chubu University, Kasugai, Japan
Wiwat Vatanawood Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
Aili Wang Higher Education Key Lab, Harbin University of Science and Technology, Harbin, China
Arya Wicaksana Department of Informatics, Universitas Multimedia Nusantara, Tangerang, Indonesia
Sang-Kweon Yun Department of IT Policy and Management, Soongsil University, Seoul, Korea

An Algorithm Integrating and Updating Rights Management Information on Public Domain Images Youngmo Kim, Il-Hwan Kim, Deok-Gi Hong and Seok-Yoon Kim

Abstract Public domain sites provide various types of public domain works and display Rights Management Information (RMI) for them. Users may use public domain works in accordance with the RMI specified by the author. However, users who do not understand the RMI of a public domain work can become involved in copyright infringement disputes. To solve this problem, it is necessary to provide users with accurate and up-to-date RMI for public domain works. In this paper, we therefore propose an algorithm that extracts and analyzes the information needed to update RMI to its latest state and to correct wrong or duplicated RMI across various public domain sites. The proposed algorithm has been verified using 10,000 data items for the accuracy of extraction and the consistency of updates. The experiment shows that automatic updating and integration was accomplished for 99.96% of the data, and only 0.04% of the data required manual verification.

Keywords Public domain · Free-for-use license · Rights management information (RMI) · Web crawling

Y. Kim · I.-H. Kim · D.-G. Hong · S.-Y. Kim (B)
Department of Computer Science and Engineering, Soongsil University, Seoul, Republic of Korea
e-mail: [email protected]
Y. Kim e-mail: [email protected]
I.-H. Kim e-mail: [email protected]
D.-G. Hong e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_1




Fig. 1 Korean ‘Gong-u Madang’ status

1 Introduction

Nowadays, various organizations around the world operate various types of copyrighted work sites to support users' right to use copyrighted works, and in particular, there are various types of freely available works, such as public domain works and copyrighted public records. Figure 1 illustrates the use of copyrighted works in 'Gong-u Madang', a site for sharing works in Korea. The number of database uses in 2016 was 11.44 million, a significant increase compared with the previous year [1]. Such public domain and copyrighted public records sites do not restrict the scope of permitted use (copying, modification, redistribution, etc.) and can be easily accessed by anyone. One possible difficulty is copyright infringement, when a protected work is used by a second or third user without the permission of the copyright owner, or when unauthorized use results from the easy access to public domain and copyrighted public records sites [2]. The use of such public domain works is therefore likely to cause copyright disputes between the copyright owner and the user [3, 4]. Other difficulties include the fact that each site uses a different RMI scheme and that incorrect RMI notation makes it difficult for users to use the public domain works they want [5]. Therefore, there is a need for a public domain works search engine that can show the integrated RMI of each public domain works site and the updated RMI of public domain works. In this paper, we propose a method of extracting, through web crawling, the RMI and meta information needed for an integrated search engine of public domain images, reflecting the latest changes of distributed public domain works in the search engine database, and an algorithm for updating the RMI of public domain images.



Table 1 Terminology related to public domain

Public domain: General term referring to works that are freely available to people and businesses, such as expired works, donated works, CCL works, and public works subject to private opening
Expired works: Copyrighted works whose copyright term has expired
Donated works: Works whose rights have been donated by the right holder to the state
Creative Commons License (CCL): A license by which copyright holders let their works be freely used under certain conditions; it has spread as a private autonomous movement and is applied in 53 countries around the world
Public works: Works created or maintained by the public sector, such as government departments, local governments, and public institutions
Copyrighted public records: Works owned and managed by public institutions
Orphan work: A work for which copyright exists but the copyright holder is unknown or unidentified
Free use information: Unprotected works under copyright law

2 Related Research

2.1 Public Domain

A public domain work is a work whose copyright has expired, a work whose rights (including the right to create secondary works) have been donated by the copyright owner, a CCL work that is available under certain conditions, or a public work carrying a copyrighted-license mark (KOGL, Korea Open Government License) (Table 1).

2.2 RMI (Rights Management Information)

RMI is the identification information for a work, the identifying information for its author, and the information on the terms and conditions of use that are attached to a work or public domain work in relation to its public delivery [1]. Table 2 shows the representation of RMI, including the CCL.

2.3 Copyright Dispute Care of Public Domain

Public domain works are readily available to anyone and can be distributed, modified, and used commercially within the scope permitted by the copyright owner.

Table 2 Representation of RMI

Common RMI in public domain works sites (CCL):
CC-BY: Attribution
CC BY-NC: Attribution—noncommercial
CC BY-ND: Attribution—no derivative works
CC BY-SA: Attribution—share alike
CC BY-NC-SA: Attribution—noncommercial—share alike
CC BY-NC-ND: Attribution—noncommercial—no derivative works

CCL and others:
Commercial license: Works commercially available
Derivative license: Work content can be changed

Different RMIs in public domain sites:
Public domain: Modified, commercially available
Donated works: A copyrighted work by which the state has the rights
Expired works: Copyright expired (70 years after the author's death)
Public organization works: Data collection prohibited
Site-specific marks: No re-use (Europeana), In copyright (Europeana), Copyright not evaluated (Europeana), All rights reserved (Flickr), Public domain marked, No restrictions on copyright law, Changed license (Gong-u Madang)



Fig. 2 Korean copyright infringement cases

However, because such works are available to anyone, copyright infringement can occur when a work is used without knowing its permitted range of use. The Korean copyright infringement cases shown in Fig. 2 illustrate an increasing trend over the years.

3 Algorithm for Updating and Integrating RMI on Public Domain Images

3.1 Overall Structure of Proposed Algorithm

In this paper, we propose an algorithm that collects image data and its RMI from selected public domain image sites, such as Gong-u Madang, Flickr, and Europeana, in order to provide accurate rights information. The algorithm collects image data and its RMI, then compares the deleted, added, and duplicated data, and finally updates and integrates it. The overall flow diagram of the algorithm is shown in Fig. 3; it is divided into a data extraction part and an updating part.

3.2 Data Extraction of Public Domain Images

Public domain image data extraction uses web crawling technology to collect the basic data for updating the RMI of public domain images. The web sites to be crawled are the most popular public domain sites, such as Europeana, Flickr, and


Fig. 3 RMI extraction and updating algorithm on public domain images




Fig. 4 Image data extraction algorithm

Gong-u Madang. The algorithm that collects and manages the copyrighted works, the RMI, and the identification information needed for the update is shown in Fig. 4. A public domain image search site contains images, RMI, image hash information, and other identifiers. To collect this information, the web crawler is executed first; the RMI is then examined and it is judged whether it is a prohibited RMI. The public domain works that are the subject of collection are all works except public domain images whose RMI reserves all rights to the work or prohibits downloading, such as No Re-Use, In Copyright, and Copyright Not Evaluated. In addition, donated works, expired works, free-use works, and public domain works are integrated into free-use works among the extracted RMIs. In this paper, image hash information is used as the identifier. Identifiers are needed to match records with the databases of search engines. When matching data by using specific characters as identifiers, the database of the search engine to be updated must hold the same type of information; however, the data format differs for each public domain work site. In addition, the method of extracting image features and matching by similarity [6] is inefficient because the images must be downloaded again in the collection step [7, 8]. Therefore, in this paper, we propose a matching method that uses image hash information instead of comparisons based on specific data or images, as shown in Fig. 5. The hash library provided by Python generates a total of 128 bits of hash information by creating and combining hash information for the rows and columns of the image. Different images therefore yield different hash information, which makes the hash suitable for use as an identifier.
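As a concrete illustration of this idea, the following minimal sketch combines a row-wise and a column-wise difference hash into a 128-bit identifier. The paper does not name the exact hashing package, so the use of Pillow and the third-party imagehash library (and the dhash/dhash_vertical functions) is an assumption here, not the authors' implementation.

```python
# Sketch only: the hashing package is an assumption (pip install pillow imagehash).
from PIL import Image
import imagehash

def image_identifier(path):
    """Combine a row-wise and a column-wise difference hash (64 bits each)
    into a single 128-bit identifier for matching the same picture across sites."""
    img = Image.open(path)
    row_hash = imagehash.dhash(img)           # differences along rows
    col_hash = imagehash.dhash_vertical(img)  # differences along columns
    return str(row_hash) + str(col_hash)      # 32 hex characters = 128 bits
```

Two copies of the same picture crawled from different sites yield the same identifier, which is what makes the hash usable as a matching key between the crawler output and the search engine database.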



Fig. 5 An example of image hash

3.3 Updating RMI on Public Domain Works

The algorithm for updating the RMI of public domain works, after collecting the basic data according to the algorithm in Sect. 3.2, is shown in Fig. 6. The collected data and the data to be updated are placed in the RMI DB and the update set DB, respectively, and then the algorithm is executed. First, the data to be matched for updating is selected from the RMI DB and the update set DB. The image data set (B), which holds the latest information, is compared with the image RMI database (A) of the public domain sites by searching on the RMI, the hash information, the author, and the latest update date, yielding Result A and Result B. In other words, Result A is the existing collected data, and Result B is the newly collected data used for updating. When the data selection step is completed, the values of Result A are compared against Result B, and it is determined whether a matching record exists only in Result A, only in Result B, or in both. First, if a record exists only in Result A, the previously specified RMI has disappeared and the work has become unavailable; therefore, the corresponding record in the image RMI database (A) is deleted. For example, if a copyright holder changes the terms of use of an uploaded work so that it is no longer available, users can no longer use that image, so its RMI record is deleted. Secondly, if a record exists only in Result B, a new image has been created, and the related information is added to the existing image RMI database (A). Finally, there is the case where a record exists in both Result A and Result B. If a copyright holder uploads the same image to several public domain sites such as Gong-u Madang, Flickr, or Europeana, duplicated data occurs, so it is necessary to determine which site's data is the most up-to-date. First, check whether the source of the duplicated work is Europeana or Flickr. If records from both sites exist, update the RMI based on the latest update date or the latest posting date. In addition, since Europeana and Flickr share data with each other, the results are reflected in the search engine database (A) so that the two sites can be updated automatically against each other.



Fig. 6 Updating algorithm for image RMI

If the source of the duplicated record is neither Europeana nor Flickr, it is determined to be a work collected from Gong-u Madang. Since Gong-u Madang does not provide a latest update date, the related data in the existing image RMI database (A) is updated on the basis that the source of the latest image data set (B) being compared is Gong-u Madang.
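A minimal sketch of this comparison and update step (Fig. 6) is shown below, assuming each record is a dictionary keyed by the image hash; the field names source and updated are illustrative assumptions, not the authors' actual schema.

```python
# Sketch of the Result A / Result B comparison; field names are assumptions.
def update_rmi(result_a, result_b):
    """result_a: existing image RMI database (A), result_b: newly crawled
    image data set (B), both as {image_hash: record}. Returns the updated DB."""
    updated = {}
    for h, new in result_b.items():
        old = result_a.get(h)
        if old is None:
            updated[h] = new                            # only in B: new image, add it
        elif new["source"] in ("Europeana", "Flickr"):  # these sites provide dates
            updated[h] = max(old, new, key=lambda r: r["updated"])
        else:
            updated[h] = new                            # Gong-u Madang: no date, newest crawl wins
    # records present only in result_a are not carried over: the work has
    # disappeared from the source site, so its RMI entry is deleted
    return updated
```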



Fig. 7 An example of Gong-u Madang image

4 Implementation and Verification of the Data Extraction and Update Algorithm

In this paper, we implemented the RMI algorithm in Python based on web crawling and DB matching technology. We selected Gong-u Madang as the site for extracting data, analyzed the number of image search result pages and the URLs of the site through HTML document parsing, and collected only the necessary data (Fig. 7). Figure 8 shows how the total number of image works on a public domain work site is obtained by web crawling Gong-u Madang images and obtaining the maximum page number. When the total number of public domain images is needed, the remaining HTML code is refined and non-numeric characters are removed to leave numbers only [8]. The maximum page number is used to obtain URL information for all public domain image assets on the site, as shown in Fig. 9. Using the URLs of the collected public domain works, the image hash information and the rights management information are extracted; an example is shown in Fig. 10, and the code is shown in Fig. 11. Figure 12 shows the image hash information, rights code, and source code extracted as a result of the processing in Fig. 11; the update column, however, does not display the most recent publication date because Gong-u Madang does not provide it. Figures 13 and 14 show the case where the same public domain image is uploaded to Flickr and Europeana, the public domain work search sites, but different RMIs



Fig. 8 Pseudo-codes for collecting the maximum number of pages in image search results

Fig. 9 Pseudo-codes for collecting URLs of all public domain works

are displayed. The process of updating to the latest RMI by applying the proposed algorithm is as follows. The results of the data extraction from each site are shown in Fig. 15, where the source code of Europeana is expressed as ‘UP’ and the source code of Flickr is ‘FK’. Even though the images are collected on different sites, the hash information is stored the same, resulting in duplicate results. In addition, the registration date and renewal date are extracted from each site so that the latest data can be selected according to the algorithms shown in Fig. 3.
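An illustrative requests/BeautifulSoup version of the crawling steps in Figs. 8 and 9 is sketched below. The search URL and the CSS selectors are placeholders, not the real Gong-u Madang page structure; the authors' actual pseudo-code is the one shown in the figures.

```python
# Illustrative crawler sketch; the URL and selectors are placeholders.
import re
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://example.org/images?page={page}"   # placeholder URL

def max_page_number():
    """Fig. 8: parse the search result page and keep only the digits of the
    'last page' link, discarding all other characters."""
    soup = BeautifulSoup(requests.get(SEARCH_URL.format(page=1)).text, "html.parser")
    last_link = soup.select_one("a.last")                 # placeholder selector
    return int(re.sub(r"\D", "", last_link.get_text()))

def collect_work_urls():
    """Fig. 9: walk every result page up to the maximum page number and
    collect the detail-page URL of each public domain work."""
    urls = []
    for page in range(1, max_page_number() + 1):
        soup = BeautifulSoup(requests.get(SEARCH_URL.format(page=page)).text, "html.parser")
        urls += [a["href"] for a in soup.select("a.work-detail")]  # placeholder selector
    return urls
```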


Fig. 10 An image example for a public domain web page

Fig. 11 Pseudo-codes for hash information for RMI extraction



Fig. 12 Extracted data result for the example in Fig. 10

Fig. 13 Duplicated image example in Flickr




Fig. 14 Duplicated image example in Europeana

Fig. 15 An example of duplicated data for the same image

Fig. 16 Data shown before updating

In the case of Fig. 15, the latest data by date is the row with a 'seq' value of 144,210. According to the algorithm presented in this paper, the Europeana work with the latest update date has the highest priority, so the row with the 'seq' value of 144,210 is selected and updated by comparison with the search engine database. The data to be updated is shown in Fig. 16.



Fig. 17 An example of duplicated image data from the same source

Fig. 18 An example of image data with redundant origin

Although the existing 'right_code' was ccl21, we can confirm that 'right_code' is changed to free, as shown in Fig. 17, by updating according to the algorithm proposed in this paper.

For the verification of this algorithm, the accuracy of the data extraction and the consistency of the renewal were measured. To compare the data extraction rate, the crawling was performed five times on 10,000 data cases and compared with the actual number of collected data, using the following formula, where the number of extracted data items ($b_i$) is 10,000 and the number of recognized extracted data items ($a_i$) is the number of data items stored in the actual DB:

\[ \text{accuracy}(\%) = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} \times 100 \]

As a result of the crawling test, it was confirmed that all 10,000 data items were recognized and extracted (100%) in all five tests. In order to measure the consistency of the data updating, the renewal accuracy of works duplicated from the same source was compared with the renewal accuracy of works collected from different sources. First, when the source of the image work was the same, 9 hash-information duplications occurred among the 10,000 collected data items. Three of these were not image formats, and since extracted data other than images is excluded from the measurement, the results are not affected. The other six duplicated data items formed three pairs, and the algorithm was applied to combine all metadata into one record, as shown in Fig. 18. For duplicate renewals from different sources, a total of 14 out of 10,000 data items were duplicated, some of which were not updated with RMI data. These examples have different data sources and RMI but the same update dates; because there is no metadata that can be used to determine the latest RMI, it is difficult to perform the latest RMI update. The non-updated data amounted to 4 out of 10,000 items, or a very low 0.04%. When such an error occurs, the administrator needs to check the RMI and move the record to another search engine database so that the RMI can be updated.



5 Conclusion

In this paper, we selected public domain search sites, extracted data such as authors, hash information, and RMI using web crawling, and stored the extracted public domain images and related data in an image RMI database. In addition, the image search system implements algorithms that delete, add, and update the image RMI database by comparing it with the image data set on the basis of image hash information, which handles the duplication problems that arise in the latest RMI updates. This enables a search engine to be implemented that can retrieve works distributed across multiple public domain sites from a single public domain image retrieval system. In addition, as the RMI for public domain works is kept up to date, the reliability of the search engine will improve and copyright infringement cases involving creators' works will decrease. In the future, it may be necessary to expand the range of crawled sites beyond Europeana, Flickr, and Gong-u Madang to broaden the public domain works available to users. It may also be necessary to compensate for the occurrence of duplicate data with the same latest update date and hash information. Research is also required to incorporate big data technology into the updating system to efficiently manage and process large amounts of data.

Acknowledgements This research project was supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Copyright Commission in 2018 (2017-SHARE-9500).

References

1. Korea Copyright Commission: Copyright Statistics, 6 (2017)
2. Pyun, S.-H.: A study of copyright infringement in video works. J. Korea Contents Assoc. 7, 107–118 (2007)
3. Do, D.H.: Method of use for, and infringement on, the author's property right for joint work. Judic. Preced. Rev. 29(1), 185–205 (2015)
4. Oh, S.-H., Choi, Y.-S.: A study on the standardization of the copyright management information metadata for operational efficiency in CLMS. J. Telecommun. Technol. Assoc. 5 (2015)
5. Yoon, J.S.: The study on the commons of copyrighted works. J. Informedia Law 10, 1–44 (2006)
6. Hong, S.-J., Park, Y.-B.: The design and implementation of WEB crawling search engine using reverse RSS. J. Korean Inst. Commun. Sci. 34, 139–147 (2009)
7. Yoon, K.S., Kim, Y.H.: Designing and implementing web crawling-based SNS web site. In: Korea Computer Information Association Winter Conference, vol. 26, pp. 21–24 (2018)
8. Cho, J., Garcia-Molina, H., Page, L.: Efficient crawling through URL ordering. Comput. Netw. ISDN Syst. 30, 161–172 (1998)

Improvement of Data Sparsity and Scalability Problems in Collaborative Filtering Based Recommendation Systems Ji-Won Choi, Sang-Kweon Yun and Jong-Bae Kim

Abstract As the demand grows for recommendation systems that select and show, according to various criteria, the vast amount of information existing on the Internet, the related technologies have also advanced. Among them, the technique that analyzes customer preference information to measure similarity between customers or between items and recommends items based on it is called collaborative filtering. Since it has the advantage that more data yields more exact recommendation results and better performance, it has been used in various fields. However, it also has some limits. The data sparsity problem, in which the recommendation system's performance is reduced if there is not sufficient preference information, and the scalability problem, in which operation time increases exponentially as the amount of data becomes larger, are typical limits. Although studies to overcome these limits have continued, more practical studies are needed. Therefore, this paper proposes a collaborative filtering technique that improves on the data sparsity and scalability limits of the collaborative filtering recommendation technique through two-stage clustering. First, to improve the data sparsity problem, it proposes a method that clusters customers using their basic information data and then predicts a specific customer's preference scores from the cluster's preference information to fill the customer versus item matrix. Second, it proposes a method that reduces the data space through clustering to improve the scalability problem. Finally, it verifies through experimentation how much the collaborative filtering technique's performance is improved by the proposed method. By improving the data sparsity and scalability problems, which are endemic problems of collaborative filtering recommendation systems, the proposed technique

J.-W. Choi
Graduate School of Software, Soongsil University, Seoul, Korea
e-mail: [email protected]
S.-K. Yun
Department of IT Policy and Management, Soongsil University, Seoul, Korea
e-mail: [email protected]
J.-B. Kim (B)
Startup Support Foundation, Soongsil University, Seoul, Korea
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_2




can make recommendations more accurately and quickly even when customer preference information is insufficient or the amount of data is large.

Keywords Data sparsity · Data scalability · Collaborative filtering · Recommendation · K-means clustering

1 Introduction

The amount of information on the Internet is increasing explosively due to advances in communication technology. This offers the advantage of wide choice in various content areas such as music, video, and news, but it also makes it hard for users to find the information they want among so much information. Accordingly, the demand for recommendation systems that offer services fitted to consumers' changing preferences is increasing daily, and the related technology is also growing steadily. A recommendation system analyzes a customer's preferences through existing data related to content and suggests matching goods. Although recommendation systems are invisible, they are used extensively in many services. The most representative case is the book recommendation system of the online commerce company Amazon.com, which analyzes customers' purchase histories to recommend goods personalized for each customer. The goods recommended through it account for 35% of annual sales, and to maximize this, 10% of profits is reinvested annually in R&D on the recommendation system [1]. From the business point of view, the recommendation system's performance is an important factor that can determine the success or failure of a business. If customers feel the service is not suited to them, they will eventually churn; conversely, because recommendations become more exact as data accumulates while customers use the service, customers do not easily churn to other services. To assist consumer choice in this way and to strengthen business, many companies use recommendation systems. A diversity of techniques is used in recommendation systems, and among them, the collaborative filtering recommendation technique is a successful one that is the most widely used and studied. The collaborative filtering recommendation technique analyzes customer consumption patterns through existing data to measure customer-customer, item-item, and customer-item similarity and recommends items based on the similarity. It has the advantage that the accuracy of recommendation results improves as data accumulates over time. However, if there is not sufficient input data, there are several limits, such as the data sparsity problem, which reduces the performance of recommendation systems, and the scalability problem, which occurs because the filtering operations increase excessively as the data size becomes larger. There have been various studies to improve these limits. Seong-Gweon Cheon combined a mentor concept to improve the data sparsity problem in the movie



recommendation system using movie rating data [2], and Hyeon-Hi Kim, Dong-Geon Kim, and Jin-Nam Cho improved the data sparsity problem in a music recommendation system by combining users' listening habits with tag information to calculate preference scores [3]. However, the essential reason why data sparsity problems arise is that the amount of rating information is fairly scarce compared to the number of customers and services or products, since customers do not usually leave rating information about services or products. Previous studies could not substantially remove this cause, because they require another type of customer rating data such as review information or tag information. In addition, for the scalability problem there was a study on improvement using dimensionality reduction [4], but research was very lacking compared to the data sparsity problem. Therefore, this paper proposes a method to improve the data sparsity and scalability problems with minimal use of customer rating data. The collaborative filtering recommendation technique has various other limits, such as cold start, popularity bias, and hacking, which also need to be improved; in this paper, however, the study is limited to the data sparsity and scalability problems of collaborative filtering. The data used in this paper is the customer information and purchase history data of 5000 customers who bought goods at the L supermarket in 2015, and R and RStudio, which are data analysis tools, were used to build the experiment environment for the study.

2 Related Studies

A recommendation system is one that analyzes preference information about items through data such as customers' rating information and transaction history and suggests new items likely to satisfy the user. Studies on recommendation systems have a long history in various fields. In addition to Ringo in the field of music, GroupLens for news articles, and Video Recommender in the field of video, studies have been conducted across a wide area, including CDNow and MovieFinder. At present, YouTube, Amazon, and Netflix, among others, have applied recommendation systems to content and goods and received a very positive response. YouTube sorts billions of videos with the collaborative filtering recommendation technique and generates a personalized recommendation list from each customer's click history; therefore, even when the same video is clicked, the recommended result varies depending on which customer clicks it. Amazon first introduced a recommendation system under the name 'Book Match' in 1996 and has since implemented recommendation systems with higher satisfaction based on goods rated by customers themselves and the similarity between items; 35% of Amazon's total sales comes through the recommendation system. Finally, there is Netflix, an online movie streaming site



having more than 30 million customers throughout the world. Netflix developed a system called CineMatch that determines customer’s preference through rating and then recommends similar content based on it. Even if customers do not search movies one by one, a favorite content could be recommended and watched, so 60% of the rented movies is recommended from the recommendation system. The recommendation system uses various techniques, and they could be classified into collaborative filtering, content-based filtering and hybrid [5]. The collaborative filtering means a technique that recommends an item purchased by other customers whose preference and taste to items is similar. Thus, it should first find customers whose preference and taste is similar, and they are called neighbors. In other words, if they have similar preference to the same item, they become neighbors, and it recommends the item preferred by the neighbors accordingly [6]. The collaborative filtering recommendation technique includes user-based and item-based techniques. The user-based technique is a method that defines neighbors with a like similarity and recommends items preferred by neighbors. On the other hand, the item-based technique is a method that calculates similarity between items based on the preference score of customers and recommends an item with a high similarity to the relevant item. In other words, two methods are the same in they basically calculate similarity, but varies depending on whether they calculate similarity between customers or between items [7–10]. Of the problems pointed out as a limit of the collaborative filtering based recommendation system, there are data sparsity and scalability problems. First, the data sparsity problem means that the recommendation performance is reduced because similarity decreases fundamentally due to not enough input data. In particular, if there are a large number of items like the data used in this paper, customers could not use all the items, and the number of customers who use only very few items also increases. Therefore, most of the customer versus item matrix elements are left blank. The more the number of customers and items increase, the more the proportion of the items consumed actually by customer’s decreases. In other words, the information about preference scores becomes sparser and the accuracy of recommendation systems is reduced. Various studies have been carried out to improve the data sparsity problem. Seong-Kweon Cheon combined the existing data with a mentor concept to improve the data sparsity problem resulted from insufficient input data when implementing the movie recommendation system using the movie rating data. It adopted a hybrid method that combines the collaborative filtering technique with the content-based filtering one [2]. In addition, Hyeon-Hi Kim, Dong-Geon Kim and Jin-Nam Cho improved the data sparsity problem by using user’s listening habits and tag information in the process of implementing a music recommendation system. Tag information is subjective information on the characteristic of music, and the data sparsity problem could be considerably improved by combining the existing rating information with the tag information [3]. 
Like this, it may be considered various studies have been carried out to improve data sparsity, but in fact, since data such as movie review information data and tag information is customer’s rating data for items, it becomes the cause that increases the burden on the customers who want to use services conveniently. And because



customers do not usually leave the rating information, it is not easy to get this data. It is that asks for another hard-to-acquire data to improve the data sparsity problem arisen from not enough input data. Second, the scalability problem is one that operation time increases exponentially if the number of customers and the transaction history data increase. It is because the use of computing resource and disk space increases sharply as data increases. Although the scalability problem may vary with the basic performance of computer, it becomes an important problem to be solved as the interest and demand for big data increases recently. There was a study to improve the scalability problem of recommendation systems through a dimensional reduction method in regard to the scalability problem [4], but except it, there was a lack of various studies.

3 Recommendation System Solving the Data Sparsity and Scalability Problems

3.1 Problem Definition

There are three elements in the collaborative filtering recommendation technique: customers, items, and customers' preference scores for items. Figure 1 shows the customer versus item matrix used in the collaborative filtering recommendation technique. Each row and column of the matrix indicates a customer and an item, respectively, and a number inside the matrix is a customer's preference for an item. The preference scores are filled in either by surveying customers' ratings of items or, under the assumption that the number of purchases, log records, numbers of clicks, and so on are proportional to preference, by converting such data into preference scores. A blank inside the matrix means there is no preference information for that customer and item. The problems can therefore be restated as follows: (1) the data sparsity problem arises when customer preference information is sparse, that is, when there are many blanks in the customer versus item matrix; (2) the scalability problem arises because operation time increases exponentially as the number of operations grows sharply when the customer versus item matrix becomes large.
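The following toy example, with made-up values in the spirit of Fig. 1, shows the customer versus item matrix as a pandas DataFrame in which NaN marks a missing preference score, and measures how sparse it is; the paper itself works in R, so this is only an illustration of the problem definition.

```python
# Toy customer-versus-item matrix; values are made up for illustration.
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    [[4, np.nan, 2, np.nan],
     [np.nan, 3, np.nan, 5],
     [1, np.nan, np.nan, np.nan]],
    index=["cust1", "cust2", "cust3"],
    columns=["item1", "item2", "item3", "item4"],
)

sparsity = ratings.isna().sum().sum() / ratings.size
print(f"{sparsity:.0%} of the matrix is blank")   # the data sparsity problem
```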

3.2 Proposing a Method to Improve the Data Sparsity Problem

To improve the data sparsity problem, the blanks of the customer versus item matrix should be filled in. However, because asking customers for additional preference ratings is not easy and increases the burden on customers, this study uses a method that predicts preference scores from customers' basic information data.




Fig. 1 Customer versus item matrix

Fig. 2 Collaborative filtering improving the data sparsity problem

Figure 2 shows, in diagram form, the method of improving the data sparsity problem proposed in this paper. The method, which uses customers' basic information data to predict preference scores, works in the following order (a sketch follows the list).

(1) Carry out k-means clustering for the 'customer' rows of the customer versus item matrix.
  ➀ Use the elbow method to find a proper k value.



  ➁ Carry out k-means clustering based on customers' basic information data to sort the customers into k clusters. The basic information data may be gender, age group, residence, etc.
(2) Define neighbors from the customer clusters.
  ➀ Using the k-means clustering result, define customers who belong to the same cluster and have similar personal data as neighbors; that is, k neighbor groups are formed.
  ➁ Assume that neighbors have similar preferences for items.
(3) Fill in the matrix.
  ➀ For each item, compute the average of the neighboring customers' preference scores, since neighbors have similar preferences.
  ➁ Fill the averages into the blanks of the target customer.
(4) Recommend items through the collaborative filtering technique based on the newly filled matrix.
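A minimal sketch of steps (1)–(3) is given below, assuming demographics is a numerically encoded DataFrame of the basic information (gender, age group, residence) sharing the same row order as the ratings matrix; the paper used R, so scikit-learn and the variable names here are illustrative assumptions.

```python
# Sketch: cluster customers by basic information, then fill blanks with
# the average preference scores of the same-cluster neighbors.
from sklearn.cluster import KMeans

def fill_by_cluster(ratings, demographics, k=3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(demographics)
    filled = ratings.copy()
    for cluster in range(k):
        members = ratings.index[labels == cluster]       # neighbors: same cluster
        cluster_means = ratings.loc[members].mean()       # average preference per item
        filled.loc[members] = filled.loc[members].fillna(cluster_means)
    return filled   # blanks replaced by the neighbors' averages where available
```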

3.3 Proposing a Method to Improve the Scalability Problem

The scalability problem is that operation time increases considerably as the number of customers and the amount of transaction data increase. In this paper, the rate at which operation time increases with growing data volume is defined as the 'expansion rate'. The key to improving the scalability problem is to reduce the expansion rate. The method proposed in this study is to carry out k-means clustering as a pre-process of the collaborative filtering recommendation algorithm. The operation time can be reduced considerably because a large amount of data is divided into k small data sets; in other words, after k-means clustering, the expansion rate decreases. Figure 3 is a diagram showing the method proposed to improve the scalability problem. The method is carried out in the following order.

(1) Fetch the existing customer versus item matrix.
(2) Carry out k-means clustering on the customer versus item matrix.
  ➀ Use the elbow method to find a proper k value.
  ➁ Carry out k-means clustering based on customers' preference scores to sort the customers into k clusters.
(3) Carry out the collaborative filtering recommendation for each of the k clusters.

The final collaborative filtering process, which improves both the data sparsity and scalability problems, is shown in Fig. 4. With the two-stage clustering, the collaborative



Fig. 3 Collaborative filtering improving scalability

Fig. 4 Collaborative filtering improving data sparsity and scalability problems

filtering recommendation would be able to be done without additional preference rating scores and exponential growth of operation time.
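A sketch of the pre-clustering step in Fig. 3 is shown below: the filled matrix is split into k smaller matrices and the collaborative filtering routine is run on each one separately, so similarity is computed only inside a cluster. The recommend argument stands in for whatever collaborative filtering implementation is used (the paper used the R recommenderlab package) and is an assumption of this sketch.

```python
# Sketch: run the recommender per cluster to keep the expansion rate down.
from sklearn.cluster import KMeans

def recommend_per_cluster(filled_ratings, recommend, k=3):
    """filled_ratings: customer-versus-item matrix with no blanks;
    recommend: any routine that takes a smaller matrix and returns
    {customer: recommended items}."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(filled_ratings)
    results = {}
    for cluster in range(k):
        members = filled_ratings.index[labels == cluster]
        results.update(recommend(filled_ratings.loc[members]))
    return results
```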

Table 1 Experiment environment

CPU: Intel Core i5-4200U
OS: Windows 8.1
RAM: 4 GB
R: version 3.4.2

4 Experiments and Results

In this chapter, we apply the improved collaborative filtering recommendation technique proposed in Chap. 3 to real data and measure its performance by comparing it with the existing collaborative filtering recommendation technique.

4.1 Experiment Environment The experiment was carried out on a PC with the environment shown in Table 1. R version 3.4.2 was used on an Intel Core i5-4200U with Windows 8.1 and 4 GB of RAM. As for the major R packages, the recommenderlab package was used for the user-based collaborative filtering recommendation technique, and the NbClust package was used to carry out k-means clustering as a pre-analysis.

4.2 Experimental Data This paper used data whose personal information had been modified, obtained for research purposes through the 'Lotte L.point Big Data Competition'; the customer information and purchase history data of customers who bought goods at L supermarket in 2015 were sampled randomly. After processing the purchase history data, the number of times a customer bought a specific product was taken as the preference score for that product. Among the collaborative filtering recommendation techniques, the user-based technique was used, calculating the similarity between customers to define neighbors. The customer basic information and the purchase history data were used, and the purchase history data was transformed into the customer versus item matrix. The information contained in the initial purchase history data is shown in Table 2. The data, consisting of a total of 1,727,092 rows, included the customer identification number (ID), receipt number (RCT_NO), business unit (BIZ_UNIT), product number (PD_S_C), branch number (BR_C), purchase date (DE_DT), purchase time (DE_HR), purchase price (BUY_AM) and purchase quantity (BUY_CT).


Table 2 Purchase history data table
ID | RCT_NO | BIZ_UNIT | PD_S_C | BR_C | DE_DT | DE_HR | BUY_AM | BUY_CT
4009 | 2109 | A02 | 215 | 2 | 20150216 | 13 | 59,600 | 2
6379 | 2109 | A02 | 75 | 29 | 20150213 | 11 | 35,000 | 1
6379 | 2109 | A02 | 149 | 4 | 20150115 | 10 | 85,000 | 1
8002 | 2110 | A02 | 139 | 10 | 20151220 | 10 | 25,000 | 1
8002 | 2110 | A02 | 139 | 10 | 20151220 | 10 | 21,000 | 1

Table 3 Customer versus item matrix created by purchase history data (excerpt): rows are customer IDs (264, 334, 854, 1352, 920, 1654) and columns are product subcategories (domestic mineral water, carbonated water, fruit and vegetable drinks, mixed drinks, tea bags, chocolate bar, normal gum, imported mineral water); each cell holds the number of purchases, and most cells are empty.

Of these columns, the receipt number (RCT_NO), business unit (BIZ_UNIT), branch number (BR_C), purchase date (DE_DT), purchase time (DE_HR), purchase price (BUY_AM) and purchase quantity (BUY_CT) were judged unnecessary for constructing the customer versus item matrix, so those columns were removed. For intuitive interpretation, the product number (PD_S_C) was merged with the product category data and replaced by the product subcategory name. Because the full data set is very large, the purchase history of 500 customers was extracted through random sampling to simplify the experiment. The number of times a customer bought an item during the year was taken as the preference score for the item, and the customer versus item matrix was constructed; Table 3 shows a part of the matrix.
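An equivalent construction of the customer versus item matrix can be sketched with pandas; the column names follow Table 2, while the file names and the merged subcategory column PD_S_NM are assumptions.

```python
import pandas as pd

history = pd.read_csv("purchase_history.csv")           # hypothetical file name
category = pd.read_csv("product_category.csv")          # maps PD_S_C to a subcategory name
history = history.merge(category[["PD_S_C", "PD_S_NM"]], on="PD_S_C")
history = history.drop(columns=["RCT_NO", "BIZ_UNIT", "BR_C",
                                "DE_DT", "DE_HR", "BUY_AM", "BUY_CT"])

# random sample of 500 customers; purchase counts serve as preference scores
sampled_ids = history["ID"].drop_duplicates().sample(500, random_state=0)
sample = history[history["ID"].isin(sampled_ids)]
matrix = pd.crosstab(sample["ID"], sample["PD_S_NM"])    # counts per customer x item
matrix = matrix.where(matrix > 0)                        # never-bought items stay blank
```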

4.3 Experiment of Improving the Data Sparsity Problem The experiment was carried out in the order of the method proposed in Chap. 3. First, the basic information data containing ID, age group, region and gender, which could be matched with the purchase history, was prepared. Based on this basic information data, k-means clustering was carried out to divide customers into k clusters. A proper k value was found by gradually increasing the number of clusters and applying the elbow method [11]; the most appropriate value was k = 3, as shown in Fig. 5.


Fig. 5 The k values verified through the elbow method
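The elbow computation itself can be sketched as follows (Python/scikit-learn; the paper's own analysis was done in R, and the scan range k = 1..10 is an assumption).

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_curve(features, k_max=10):
    """Plot the within-cluster sum of squares against k and look for the bend."""
    wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(features).inertia_
           for k in range(1, k_max + 1)]
    plt.plot(range(1, k_max + 1), wss, marker="o")
    plt.xlabel("number of clusters k")
    plt.ylabel("within-cluster sum of squares")
    plt.show()
    return wss
```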

Consequently, customers were clustered into 3 groups through k-means clustering. Next, the preference scores were averaged in each group to fill in the blanks of the matrix. Finally, the filled matrix was used to carry out the collaborative filtering recommendation to make a recommendation list.

4.4 Results of the Experiment to Improve the Data Sparsity Problem To evaluate the improvement of the data sparsity problem, the performance of the collaborative filtering recommendation system is compared before and after changing the customer versus item matrix. For the evaluation, the precision, recall, true positive rate (TPR) and false positive rate (FPR) were computed, and the precision-recall and TPR-FPR graphs were plotted for comparison. As shown in Figs. 6 and 7, the area below the curve increases in both graphs after carrying out the collaborative filtering with the improved matrix. Accordingly, the results support the premise that customers with similar basic personal data have similar preferences, and they validate the approach of predicting a customer's missing preference scores with the average scores of the customers clustered on that basis.
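The four measures can be computed per test customer from the recommended and the actually relevant item sets, as in the following sketch; the function name and the set-based formulation are assumptions.

```python
def confusion_metrics(recommended, relevant, catalogue_size):
    """recommended, relevant: sets of item names for one held-out customer."""
    tp = len(recommended & relevant)
    fp = len(recommended - relevant)
    fn = len(relevant - recommended)
    tn = catalogue_size - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # recall is identical to TPR
    tpr = recall
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, tpr, fpr
```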

4.5 Experiment of Improving the Scalability Problem The key question of the scalability experiment is 'Does the expansion rate decrease after the improvement?' Therefore, the operation time should be measured as the data quantity increases, and the experiment is carried out with two different data quantities. The resulting configurations are summarized in Table 4. (1) Experiment 1-1; in the case of not carrying out k-means clustering with the data of 1/10 of the customers


Fig. 6 Comparison of the precision-recall graph results: (1) before filling the matrix, (2) after filling the matrix

Fig. 7 Comparison of the TPR-FPR graph results: (1) before filling the matrix, (2) after filling the matrix

Table 4 Scalability problem improvement experiment table
 | Not carrying out k-means clustering | Carrying out k-means clustering
Data of 1/10 customers (purchase history of 500 customers) | Experiment 1-1 | Experiment 2-1
Data of the whole customers (purchase history of 5000 customers) | Experiment 1-2 | Experiment 2-2

(2) Experiment 1-2; In the case of not carrying out k-means clustering with data of the whole customers (3) Experiment 2-1; In the case of carrying out k-means clustering with data of 1/10 customers (4) Experiment 2-2; In the case of carrying out k-means clustering with data of the whole customers A total of four experiments are carried out, and after repeating each experiment three times, the results are compared and verified. For a small amount of data in this


Table 5 Experimental result operation time table (unit: second)
 | Not carrying out k-means clustering | Carrying out k-means clustering
Data of 1/10 customers (transaction history of 500 customers) | 2.06 | 1.60
The entire customer data (transaction history of 5000 customers) | 183.61 | 68.04
Expansion rate (%) | 3.70 | 1.47

paper, the purchase history data of about 500 customers was extracted by reducing the number of customers to about 1/10 through random sampling from the entire data.

4.6 Results of the Experiment to Improve the Scalability Problem This experiment compares the running time of the collaborative filtering recommendation process between the case of not carrying out k-means clustering as a pre-process and the case of carrying it out. By changing the amount of data, it also compares how the running time varies as the data becomes larger. (1) Experiment 1-1; when not carrying out k-means clustering with the data of 1/10 of the customers (2) Experiment 2-1; when carrying out k-means clustering with the data of 1/10 of the customers (3) Experiment 1-2; when not carrying out k-means clustering with the entire customer data (4) Experiment 2-2; when carrying out k-means clustering with the entire customer data. The four experiments were each repeated three times, and they were compared by averaging the operation times according to the criteria of Table 5; the mean value was rounded off to four decimal places. Looking at Table 5, which shows the measured operation times, the operation time greatly increased from 2.06 to 183.61 s as the data grew when the existing collaborative filtering process was carried out, that is, when k-means clustering was not carried out; the corresponding expansion rate was about 3.7%. On the other hand, after carrying out k-means clustering, the operation time itself was reduced, and for the same growth of data the operation time increased comparatively less, from 1.60 to 68.04 s; the expansion rate was about 1.47%, a reduction of about 2.5 times. In other words,


since the search space becomes smaller through the clustering, the operation time can be reduced effectively even as the data grows.

5 Conclusion The collaborative filtering recommendation technique, whose excellent performance is widely recognized, is being used in many aspects of daily life in step with the big data era. It is an important system that helps consumers make purchase choices and, from a business point of view, prevents customer churn. However, studies on the data sparsity and scalability problems, which are limitations of collaborative filtering, have continued. This paper proposed a method to improve the data sparsity and scalability problems through two-stage k-means clustering and carried out experiments with real data to verify it. First, the key point of the process to improve the data sparsity problem was that it should not require additional preference rating data. Because such data is not easy to obtain, requiring it not only reduces the usefulness of studies but also burdens customers. Therefore, this paper proposed a method that uses customers' basic information data. Customers were divided into three clusters through the first k-means clustering based on the basic information data to improve the data sparsity problem. The experiments were carried out under the premise that customers whose basic personal data are similar would also have similar preferences for items. Customers who belong to the same cluster were defined as neighbors. For items with no preference score in a customer's row of the matrix, the average of the preference scores of the other customers belonging to the same cluster was filled in. The collaborative filtering recommendation was then carried out based on the customer versus item matrix filled in this way. The experiments compared the collaborative filtering performance before and after improving the customer versus item matrix, and the results verified that the performance of collaborative filtering improved after the matrix was improved. Second, clustering was proposed as a pre-process of collaborative filtering to improve the scalability problem. In the second k-means clustering, the customer versus item matrix was clustered based on the preference scores. Through this, the search space of collaborative filtering was reduced and the operation time decreased. Experiments were carried out for a small amount of data (transaction history of 500 customers, 1/10 of the entire data) and a large amount of data (transaction history of 5000 customers), comparing the expansion rate before and after carrying out k-means clustering. As a result, the expansion rate was about 3.7% before carrying out k-means clustering and about 1.47% afterwards, a reduction of about 2.5 times. Therefore, carrying out k-means clustering as a pre-process of collaborative filtering can be considered a factor that improves scalability.


References 1. Jeon, D.Y.: A Movie Recommender System Based on Demographic User Profile and Genre Preference. Master Thesis, Sungkyunkwan University (2017) 2. Chun, S.K.: Movie Recommender System Based on Mentor for the Users Who Have Sparse Rating Data. Master Thesis, Seoul National University (2014) 3. Kim, H.H., Kim, D., Jo, J.: A hybrid music recommendation system combining listening habits and tag information. KSCI J. Korea Soc. Comput. Inf. 18(2) (2013) 4. Lee, O.-J., You, E.-S.: Predictive clustering-based collaborative filtering technique for performance-stability of recommendation system. KIISS J. Intell. Inf. Syst. 21(1), 119–142 (2015) 5. Son, Ch-H, Kim, J.-O., Ha, G.-R.: A study on development of hybrid collaborative filtering algorithm, Korean association of industrial business administration. J. Bus. Res. 25(4), 47–66 (2010) 6. Kim, S., Oh, B., Kim, M., Yang, J.: A movie recommendation algorithm combining collaborative filtering and content information. KIISE J. KIISE 39(4), 261–268 (2012) 7. Gomez-Uribe, C.A., Hunt, N.: The netflix recommender system: algorithms, business value, and innovation. ACM Trans. Manag. Inf. Syst. (TMIS) 6(4), 13 (2016) 8. http://darkpgmr.tistory.com/162 9. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010) 10. http://newsight.tistory.com/53 11. https://ko.wikipedia.org/wiki

A Study on the Methods for Establishing Security Information & Event Management Geyong-Sik Jeon, Sam-Hyun Chun and Jong-Bae Kim

Abstract Global threats have recently emerged from attacks that use multiple hacking technologies against national infrastructure, industrial control systems and enterprises, so-called cyber-hacking and cyber-attacks in cyberspace resembling cyber war, carried out on behalf of nations and organizations. In addition, APT (Advanced Persistent Threat) attacks, which combine complex attack types against a specific target, cause tremendous chaos at the national and social level. Under such circumstances, the necessity of ESM (Enterprise Security Management) is emphasized in order to establish multi-network enterprise security systems for defense against outside attacks and for efficient management. However, ESM collected and analyzed data mainly from information-security-system-based security events and network-sensor-based harmful traffic events, without a function to analyze events based on general system and application logs. For effective security detection, strategies for systematic preparation and execution are necessary that actively solve security issues by utilizing the enormous big data occurring throughout the enterprise IT infrastructure. In this regard, this study presents a security log analysis system utilizing a SIEM (Security Information & Event Management) system to cope with advanced attacks that the existing ESM can hardly detect. SIEM analyzes the associations between data and security events occurring in the major IT infrastructure (network, systems, application services) and in a large number of information security systems, and then presents methods for identifying potential security threats in advance.

Keywords Security information · Event management · APT · ESM · SIEM

G.-S. Jeon, Department of IT Policy and Management, Soongsil University, Seoul, Korea, e-mail: [email protected] · S.-H. Chun, Department of Law, Soongsil University, Seoul 156-743, Korea, e-mail: [email protected] · J.-B. Kim (B), Startup Support Foundation, Soongsil Univ, Seoul, Korea, e-mail: [email protected]

© Springer Nature Switzerland AG 2020 R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_3


1 Introduction The APT (Advanced Persistent Threat) attacks that have recently caused tremendous social chaos are targeting attacks that utilize new types of malicious code and complex cyber-attack techniques. An APT attack selects a target and approaches the targeted system using social engineering techniques. Through these techniques, the attack steals user information by exploiting the targeted system and the user's account information when the user connects to and uses the system. Then, through the hijacked account of the system manager, the attack persistently sends information to the outside or proceeds by altering intranet information. Such attacks can neither be detected nor prevented by the existing ESM (Enterprise Security Management), which detects attacks coming from the outside. The existing ESM collected the various logs occurring in the external network, saved them into an RDBMS (Relational Database Management System), and recorded the ongoing situation on a dashboard to inform the manager of problems. However, such a method is limited to collecting security logs that only monitor traffic approaching the intranet from the extranet, and it can detect nothing but external cyber-attacks, relying only on network-based attack detection techniques against attacks that impede the usability of services. Consequently, ESM remains vulnerable with respect to direct attacks on PC clients in the enterprise. Accordingly, the purpose of this study is to present a security log analysis system utilizing SIEM (Security Information & Event Management) that can analyze the logs generated in every system and detect potential security threats. It is expected that the SIEM presented in this research will provide real-time analytic performance through analysis-based multiple search systems and, for substantial analysis, will apply a flexible dashboard and correlation techniques over every factor, thereby enabling usability, extensibility and enterprise security management. Chapter 2 of this study introduces the current situation and research types of ESM (Enterprise Security Management) as the theoretical background of this research, and describes and summarizes in a chart the concept of the SIEM system model that conducts a correlation analysis of little-known security threats; it also presents the existing SIEM and the future direction for SIEM based on a total of five factors. Chapter 3 presents an architectural platform for advanced information analysis and an algorithm for the treatment of big data; as the method, it illustrates the platform for analyzing the targets and their activity domains and then presents an advanced information analysis platform that collects, saves, analyzes and optimizes enormous data to establish the specific system of each function. Chapter 4 takes a practical approach to the analysis system established by means of the presented algorithm, which is the key element of this study, and verifies its results and the resulting improvement. That chapter mainly describes the assessed values of the


search velocity for character strings and commands over data amounting to about 900 million records, according to the data structure. Chapter 5 summarizes the final research results of this study, draws conclusions and presents a direction for subsequent research.

2 Related Studies The aim of ESM mainly referring as remote security management is to enhance efficiency of security and its management by collectively controlling, managing and maintaining security according to the systemic and consistent policies. A great deal of research related to the existing ESM is conducted, real-time, about electronic intrusion, by connecting network equipment such as server, router, as well as fire wall, IDS, VPN and various security devices. Those relevant researches utilize IT unit log enterprise and defect treatment monitoring system. Security control is also being applied into those researches through an analysis of linkage of IT and security systems. But, those research methods which involve a problem with the performance and cost of DB saving numerous data have difficulties in coping with the recent APT (Advanced Persistent Threat) attack [1]. Accommodating the function of ESM, SIEM is a system model that enables a correlation analysis of security threat unknown by utilizing analytic techniques of event and log through the linkage of application. ESM performs intrusion detection by collecting data by the agent, through intrusion prevention event, intrusion detection event and various network-level events generated by virus. In addition, ESM performs monitoring as to security types known in and out of the network to detect security threat in advance. Yet, the recent security threat utilizes insider’s information through social engineering techniques to collect information for a long time in order to obtain information the hacker wants. The obtaining of the account information of the manager in order to analyze collected data or connect it into server resulted in the intrusion of usability of server, the deletion of data or the malfunction of multiple services, in the past. Meanwhile today, log or events are deleted to prevent the detecting of the hacked log for the purpose of collecting data persistently—even though the targeted information is obtained, which puts the heavy brake on detecting. There are an increasing number of cases where confidential information or patent technologies or data obtained in such a manner are being specially encrypted to demand money for it or threaten enterprises. Subsequently, no track of security intrusion is found so there is a persistent attack threat. Server is either reset or re-established in order to prevent the secondary damage caused by the failure to find information on security intrusion or hacking program, which inflicts the cessation of service operation and damage incurred by it. To solve such a problem, it is important to analyze every activity of internal user associated with the method and route of system access and IP (Internet Protocol). Information on usability of internal user was not considered in the past, but in order to prevent attacks like APT, doing so becomes integral today. The information about those log or events generates enormous data according to the


Table 1 Characteristics of SIEM
Classification | Existing SIEM | The future of SIEM
Collect | 1. Being able to automatically collect and obtain data on log and events through multiple servers 2. Being able to handle enormous security information | Providing distributed data architecture
Data | 1. Generating integration storage for security data 2. Data included in the log limitedly collected | 1. Collecting network traffic 2. Restoring the session to detect and investigate how an attacker intrudes and attacks 3. Collecting automatically threat information from external sources and providing visualization of threat
Investigate | 1. Integrating log data to generate data integration storage for major security 2. Low usability and much time consumed in investigating the event | Providing user interface saved to complement the method for a security analyst to carry out an investigation
Analyze | 1. Providing a high-level control report 2. No a high grade of control over security risk | The outcome on security-focused program creates the proof proving of compliance
Predict | 1. Understanding the past incidents and providing a warning alarm 2. Attack detection relies on recognizing attack or attack types in advance | 1. Establishing an integration platform to collect security data from multiple environments 2. Detecting an attacker in case a suspicious activity is caught 3. Reducing the number of incident analysts required in order to detect and investigate threat


3 Architectural Systems to Analyze Security Log Utilizing Big Data 3.1 Collect Enormous Data The factors to be considered in order to collect enormous data are every data collection technology method, enormous data transmission, operation stability and high usability. Especially, it is necessary to establish data relay functions in order to collect multiple channels, and the loss prevention system of collected data in order to secure the continuity of data. Every source, every format data, structured/unstructured original log equipment occurring in security equipment is to be collected real time and saved into a collector through data transmitter. The existing data collection system performs Web log, WEB/WAS (Web Application Server), DB operation log, statistical information on Net Flow, DNS (Domain Name System) sink hole detection event, intranet communication record, and detection event of abnormal sign of Web, but poses a vulnerability to an analysis of abnormal traffic. The integration collector presented in this research enables a control over a connection to harmful Websites through the system to analyze intranet abnormal traffic and detect abnormal Web activity, and DNS sink hole, and can early detect Web hacking through web shell. To collect information, the two methods, Agent/Agentless, are to be utilized, and flexibility is to be offered in the course of choosing the collection methods, in consideration of real time and stability. Data transmitter carries out a function to automatically distribute the defects and load of data and prevent a loss of data through the techniques of automatically distributing the load and detecting/reforwarding defects and log.

3.2 Data Analysis An analysis system targeting enormous data ought to be assured of a performance to analyze real time through distribution-based multiple search. For a multiple, instant and substantial analysis, visual techniques involving flexible dashboard, every factor correlation analysis, baseline and critical/statistical and trending/historical pattern analysis function must be considered to establish a data analysis system. Enormous data cannot be assured of velocity by investigating real time. But, Indexing data saved in a collector at data-saving stage enables the function to enter keyword or criteria to be searched and find indexing data. Searched massive security log data analysis is a data drill down to sub-divide problems into a lot of factors and analyze such problems, which enables an easy multi-search. Data generated in security equipment secures real-time analysis performance through distributionbased multi-search by utilizing the methods to detect a rapid change of data based on baseline and critical value, and a trending analysis to predict data based on statistics analysis.
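As one possible realization of the baseline/critical-value detection described above, the sketch below flags the minutes whose event count exceeds a rolling baseline by a chosen number of standard deviations; the window length and the multiplier are assumptions, not values from the paper.

```python
import pandas as pd

def baseline_alarms(event_counts, window=60, n_sigma=3):
    """event_counts: pandas Series of per-minute event counts indexed by time."""
    baseline = event_counts.rolling(window, min_periods=window).mean()
    spread = event_counts.rolling(window, min_periods=window).std()
    critical = baseline + n_sigma * spread           # critical value above the baseline
    return event_counts[event_counts > critical]     # minutes that trigger an alarm
```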


Big data standard platform (Hadoop)-based statistics analysis system produces a wide variety of statistical data through Batch job and analyzes big data utilizing Hadoop distribution file system. This system functioning to analyze statistical data through the linkage with Open Source enables low cost, fast and flexible development, compatibility, reliability and stability. In addition, this system pursues the methods for building the linkage with the enterprise security management system for independent data linkage by supporting data mining algorithm for big data-based statistics analysis, and natural language (NoSQL) search for data analysis and the extension of flexibility of statistical function.

3.3 Log Structures Utilized for Data Analysis To collect and relay data, the data collection system in consideration of data collection technology methods, enormous data transmission, operation stability and usability ought to be established. To do it, every source, every format data, structured/unstructured original log occurring in security equipment is to be collected real time and saved into a collector through a data transmitter. Then, the collected log data structure is classified according to its structured type, as follows: First, there is an unstructured data that is not saved into a fixed field. A typical example are text documents, image/video/voice data that enable a text analysis. Second, there is a structured data that is saved into a fixed field. As a typical example is mentioned relational database and spread sheet that process and produce unstructured data. Such a data transmits into a collector log data collected from each terminal adaptor without changing the original, and then saves IP address, security log, and the description of data collecting date and time into the collector. Then, it involves the stage of regularizing integration log server and the stage of processing identifiable original data to produce a structured data. All the manipulated data tracks such as user’s information, data inquiry, document output are to be gathered. The information on the inquired and output data is to be printed out. The unified structure gives an easier access to data in which an error message is output, and can visibly show data information to user.

3.4 Log Analysis Algorithm The technology for structuring unstructured data uses a separator (factor value) technique and the PCRE (Perl Compatible Regular Expressions) library, which supports regular expressions. The log analysis algorithm consists of three elements: the log-collecting server, the logParser and log transmission. The log-collecting server filters the data collected from collection equipment such as firewall events, IDS events, traffic events, Web firewall events and so on, and then transmits the filtered data to the logParser.


The logParser parses the filtered data according to the collection equipment, standardizes it and converts it into standard data. The converted data is saved into a file or memory for log transmission. The original data is then checked, and the regular expression, log replacement, deletion and field value decisions are selected; the selected directives are saved into an XML (eXtensible Markup Language) file. With this, the whole regularization process is complete.
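A minimal sketch of this parsing and standardization step is shown below in Python; the firewall-style line format, the field names and the regular expression are hypothetical, chosen only to illustrate the PCRE-style normalization into standard records.

```python
import re

# Hypothetical raw line: "2018-04-08 15:46:02 FW DENY 109.87.60.26 -> 211.168.24.147"
FW_PATTERN = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<log_type_Name>\w+)\s+(?P<action>ALLOW|DENY)\s+"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})\s+->\s+"
    r"(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})"
)

def parse_line(line):
    """Return a standardized record (dict of named fields), or None if filtered out."""
    match = FW_PATTERN.search(line)
    return match.groupdict() if match else None
```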

4 Verification of Security Log Analysis System 4.1 Verification of the Leaking of Security Log Analysis System To prevent information leaking, the traffic of enormous files and the format of attached files were detected. The following shows the cases where an enormous file (more than 100 MB) was uploaded or downloaded, listing eight out of a total of 66 resulting values. The company number of the staff using the enormous file, the security grade, and the number of cases occurring against the standard number of cases are indicated in Table 2. To designate the system values that select the preferred results out of the total resulting values, the ESCORT log was set as the log linkage information. For the log set, the case of utilizing the attached file was searched as the field name, and the file size was restricted with the pattern [0-9]{9,} (nine or more digits). The variable company_id was used as the search field. The standard case of 1 case per 1 min is indicated in Table 3.

4.2 Verification of the Plagiarizing and Abusing of Security Log Analysis System To control external IPs (Internet Protocol), connections from areas where there is no branch office, liaison office or local office were examined; the IPs were detected on the firewall systems. The following shows the results of detecting the IP addresses of external intrusion based on the established platform, listing 10 out of a total of 1700 resulting values. Table 4 describes the IP addresses connected from outside the designated nations, with the date of occurrence and the number of cases occurring according to the security grade and the standard number of cases. The following table gives the system values used to select the preferred results out of the total resulting values. First, the IPMS (Internet Protocol Management System) user connection log was set as the log linkage information. For the log set, the nation names and 'Private Network' were designated so that South Korea, the US, Japan, China and Germany were excluded from the search. The variable src_ip was used as the search field for the point of departure. The standard case of 20 cases per 1 min is indicated in Table 5.
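The two detection rules of Sects. 4.1 and 4.2 can be evaluated with a simple per-minute grouping, as sketched below; the DataFrame column names loosely follow Tables 3 and 5, while the function names and the byte interpretation of file_size are assumptions.

```python
import pandas as pd

EXCLUDED_NATIONS = {"Private Network", "Korea, Republic of", "United States",
                    "China", "Japan", "India", "Germany", "N/A"}

def large_file_alerts(events, threshold=1):
    """Rule of Table 3: at least 1 large-file case per company_id per minute."""
    big = events[(events["log_type_Name"] == "ESCORT") &
                 (events["file_size"] >= 10**8)]           # nine or more digits
    counts = big.groupby([pd.Grouper(key="time", freq="1min"), "company_id"]).size()
    return counts[counts >= threshold]

def foreign_ip_alerts(events, threshold=20):
    """Rule of Table 5: at least 20 connections per src_ip per minute from other nations."""
    fw = events[(events["log_type_Name"] == "FW") &
                (~events["src_country_name"].isin(EXCLUDED_NATIONS))]
    counts = fw.groupby([pd.Grouper(key="time", freq="1min"), "src_ip"]).size()
    return counts[counts >= threshold]
```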

Table 2 Detection of enormous file of security log analysis system
Number | Time occurring | Event name | Classification | Grade | Checking cycle (min) | Number of cases occurring/number of standard cases
1 | 3:01 pm, Apr. 15, 2018 | [ESCORT] Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 1/1
2 | 2:20 pm, Apr. 15, 2018 | [B02011] Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 1/1
3 | 2:28 pm, Apr. 15, 2018 | [B02011]_Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 1/1
4 | 1:42 pm, Apr. 15, 2018 | [B02011] Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 2/1
5 | 1:40 pm, Apr. 15, 2018 | [B02011] Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 1/1
6 | 9:28 am, Apr. 15, 2018 | [B02011] Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 1/1
7 | 8:31 am, Apr. 15, 2018 | [B02011] Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 1/1
8 | 7:58 pm, Apr. 14, 2018 | [B02011]_Traffic of Enormous File & Attach | Plagiarize, abuse | Middle | 1 | 1/1

Table 3 Value of the detection of enormous file of security log analysis system
Classification | System value
Log linkage information | ESCORT log
Log set | log_type_Name: (Case of utilizing the attached file) AND file_size:/[0-9]{9,}/
Standard field | Company number (company_id)
Standard cases | 1 case per 1 min

The log was searched according to the values presented, and the source (point of departure) and destination IPs were successfully extracted, as indicated in Table 6.

4.3 Measurement of Velocity of Data Analysis System For substantial data analysis, a number of dynamic dashboards are established according to the characteristics of the users. Real-time monitoring of data by equipment and log type is conducted through a correlation analysis of every event so that it can be grasped intuitively. When an abnormal sign appears in the analysis data while the analytic process is being monitored in real time, a warning alarm rings to visualize the security threat for the user. A single search handles up to 2 billion cases. Regarding search velocity, searching can be executed within about 1 min over 200–400 GB of log information, corresponding to one day of data under the single-search criteria. The test results of the search system applying big data processing technology are indicated in Tables 7, 8, 9 and 10.

5 Conclusions To establish a SIEM-based security log analysis system, technologies to collect, save, handle and analyze data were applied through an advanced information analysis platform. Usability and extensibility are secured by collecting security logs by means of Agent/Agentless methods to store enormous data. Through fast search and analysis, a visualization method was pursued to establish a system that enables the analysis of intranet security issues, which was not realized in the past. To verify this, Chap. 4 conducted a log analysis of the detection of external IPs and of the traffic of enormous files. Abnormal signs were detected through the number of cases occurring against the standard number of cases, and it was confirmed that the corresponding log sets extracted only the preferred resulting values. Finally, the search time for character strings and commands was measured, and the correlation of every event was illustrated. The results revealed that, for a total of about 900 million cases, the assessed value required 1 s on average, showing the high efficacy of the

Table 4 Detection of IP of security log analysis system
Number | Time occurring | Event name | Standard field | Classification | Grade | Checking cycle (min) | Number of cases occurring/number of standard cases
1 | 3:01 pm, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 223.17.240.131 | Security | Low | 1 | 20/20
2 | 3:47 pm, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 109.168.116.238 | Security | Low | 1 | 48/20
3 | 3:46 pm, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 192.167.0.56 | Security | Low | 1 | 22/20
4 | 3:45 pm, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 211.30.222.91 | Security | Low | 1 | 37/20
5 | 3:44 pm, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 211.30.222.91 | Security | Low | 1 | 22/20
6 | 1:36 pm, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 109.87.60.26 | Security | Low | 1 | 80/20
7 | 1:18 pm, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 109.87.60.26 | Security | Low | 1 | 30/20
8 | 9:01 am, Apr. 08, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 109.87.60.26 | Security | Low | 1 | 28/20
9 | 7:14 am, Apr. 09, 2018 | [IP-PLUS]FW_Connected outside the designated nations | 109.87.60.26 | Security | Low | 1 | 30/20
10 | 6:33 am, Apr. 08, 2018 | [IP-PLUS]FW Connected outside the designated nations | 109.87.60.26 | Security | Low | 1 | 31/20


Table 5 Values of the detection of IP of security log analysis system
Classification | System values
Log linkage information | User connection log IPMS
Log set | log_type_Name: FW AND src_ip:* NOT src_country_name: ("Private Network" "Korea, Republic of" "United States" China Japan India Germany "N/A")
Standard field | Point of departure IP (src_ip)
Standard case | 1 min/20 cases

Table 6 Result of security log analysis system
Number | Time of equipment occurring | Equipment IP | IP as a point of departure | Nation of departure | IP as a point of destination
1 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.147
2 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.181
3 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.181
4 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.147
5 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.147
6 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.181
7 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.181
8 | 3:46 pm, Apr. 08, 2018 | 172.21.11.2 | 109.87.60.26 | Ukraine | 211.168.24.147

Table 7 Velocity of the search system applying big data processing technology: result of the 1st test
Classification | Item | Assessed value (s) | Remark
Velocity of search | Character string | 1 | Result of search: 464,099 cases
 | Time | 1 | Result of search: 9179 cases
 | Single IP | 1 | Result of search: 118 cases
 | Band IP | 8 | Result of search: 276,339,321 cases
 | and | 2 | Result of search: 3,665,593 cases
 | or | 2 | Result of search: 1,242,696 cases
 | drill down | 3 | Result of search: 1,083,326 cases


Table 8 Real-time monitoring of the search system applying big data processing technology: result of the 1st test
Classification | Item | Assessed value | Remark
Providing real-time monitoring function | Searching the corresponding event on the management screen after the event occurs | 11.9 s | Real-time monitoring system: 1 s; Management screen: 11.9 s
 | Presenting agent before assessing utilization rate of system resource | 0.2370% | 

Table 9 Velocity of the search system applying big data processing technology: result of the 2nd test
Classification | Item | Assessed value (s) | Remark
Velocity of search | Character string | 1.00 | Result of search corresponding to reference value
 | Time | 1.00 | Result of search corresponding to reference value
 | Single IP | 1.01 | Result of search corresponding to reference value
 | Band IP | 7.00 | Result of search corresponding to reference value
 | and | 1.00 | Result of search corresponding to reference value
 | or | 1.01 | Result of search corresponding to reference value
 | drill down | 1.00 | Result of search corresponding to reference value

Table 10 Real-time monitoring of the search system applying big data processing technology: result of the 2nd test
Classification | Item | Assessed value | Remark
Providing real-time monitoring function | Searching the corresponding event on the management screen after the event occurs | 3.4 s | 
 | Presenting agent before assessing utilization rate of system resource | 0.6% | 
 | Providing the categorized events of log consisting of multirow gathered through Agent | P | 


search system applying big data processing technology. It was also verified that the data process accurately searches result sets amounting to about 900 million cases without error. Accordingly, it was shown that internal and external enterprise security is possible by analyzing the associations between data and security events occurring in external intrusions and internal services. It is also expected that real-time monitoring of the inflow of malicious code and of infection (dissemination) within enterprises will have a ripple effect that enhances customer service satisfaction. A limitation of this research is that it adopted only some of the security domains among the many methodologies of big data analysis. Taking this into account, subsequent research is needed on analysis techniques that apply to multiple domains such as manufacturing, service and finance, as well as to the security domain, within big data analysis.

References 1. Ha, L.S., Won, K.S., Hong, K.K., Sechung, P.: Design of big data ETL model for aggregating of security log/event. KICS, no. 06 (2014) 2. Kavanagh, K.M., Nicolett, M., Rochford, O.: Magic quadrant for security information and event management. Gartner Group, no. 06 (2014) 3. Nicolett, M., Feiman, J.: SIEM Enables Enterprise Security Intelligence. Gartner Group, no. 01 (2011) 4. MacDonald, N.: Information Security Is Becoming a Big Data Analytics Problem. Gartner Group, no. 05 (2012) 5. Nicolett, M., Kavanagh, K.M.: Magic Quadrant for Security Information and Event Management. Gartner Group, no. 05 (2012) 6. Nicolett, M., Kavanagh, K.M.: Critical Capabilities for Security Information and Event Management. Gartner Group, no. 05 (2012)

Multi-TSV (Through Silicon Via) Error Detection Using the Non-contact Probing Method Sang-Min Han, Youngkyu Kim and Jin-Ho Ahn

Abstract In this paper, a simple and fast TSV (Through Silicon Via) fabrication error detection method is proposed for 3-D IC applications. The DUTs are EM-modeled for the capacitive probes and TSVs with fabrication errors of a crack, a pin-hole, and a micro-void. The large probe can measure multiple TSVs, simultaneously. From the post-processing of the measured data, the capability of error detection is verified for error cognition, the size and the number of errors. Keywords TSV (Through Silicon Via) · 3-D IC · Eye-patterns · Fabrication errors · Measurements · Error detection

S.-M. Han · Y. Kim, Department of Information and Communication Engineering, Soonchunhyang University, Asan, Chungnam 31538, Republic of Korea, e-mail: [email protected] · J.-H. Ahn (B), School of Electronics and Display Engineering, Hoseo University, Asan, Chungnam 31499, Republic of Korea, e-mail: [email protected]

© Springer Nature Switzerland AG 2020 R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_4

1 Introduction Since the mobile communication terminal requires highly integrated compact chipsets, semiconductor packaging has considered vertical integration by stacking chip dies. Because each die itself is thin, the vertically integrated 3-D IC can save planar space. However, it needs a lot of transmission lines between dies, fabricated as vertical through vias, TSVs (Through Silicon Vias). The 3-D IC is composed of several silicon dies that are vertically stacked. The stacked dies are connected by a lot of TSVs that transmit signals, clocks, and DC power between the dies. Therefore, the 3-D IC can achieve a small form factor, high-speed signaling, low parasitic effects, and low power consumption. Even though it has these excellent merits, the fabrication technology is not yet stabilized for mass production. Because


current fabrication technologies can be re-used for 3-D ICs except for the TSV and adhesion technologies, one of the key fabrication issues is the fabrication, yield, and error detection of the TSV [1, 2]. Samsung Electronics Ltd. started mass production of 256-gigabit 3D V-NAND flash memory in 2015 [3]. TSVs can suffer several types of errors during the fabrication process, such as a crack, a micro-void, and a pin-hole, so the TSV needs to be tested for error detection. Two kinds of test methods have been introduced [4]. One is the contact probing method using a wafer probe, and the other is the non-contact probing method using remote sensors or resonators. The former has physical limitations with extremely narrow probe pitches, a large number of probing pads, and contact damage such as scrub marks and deformation [5–7]. The latter includes three types of measurement: an RF resonator [8], inductive coupling [8, 9], and capacitive coupling [4, 10, 11]. The RF type requires many large transceivers, whereas the inductive coupling type has difficulties in adjusting the coil size and the distance between coils. The capacitive coupling type needs relatively high power consumption, an optimized probe size, and careful control of the coupling over the distance between a probe and a TSV. However, the capacitive coupling type can measure densely packed TSVs with a faster process and simple instruments. In this paper, a TSV error detection method is proposed with non-contact capacitive-coupling probing. In addition, the resolution of the error detection is verified for the error size and the number of TSVs with errors. The capacitively coupled probe and the TSVs on a wafer are modeled for electromagnetic simulation. The extracted characteristics are evaluated by eye-pattern tests. The capacitive probing tests are investigated for a single TSV and multi-TSVs to achieve a fast testing process. This paper is organized as follows. After the brief Introduction, the models of the TSV errors and the capacitive coupling are described in Sect. 2. The proposed investigation process and the geometrical design of a large probe and TSVs are explained in Sect. 3. Section 4 describes the error detection results for error sizes and the number of TSVs with errors. Finally, the feasibility and industrial approaches are mentioned in the Conclusion.

2 TSV Fabrication Error Model and Detection In this Section, the DUT (Devices Under Test) of TSVs, errors, and measuring probes are modeled with two-port networks to characterize the transmission performance [12, 13]. One port is a capacitive probe configured of a conductive disk, whereas the other is a ground pad on a bottom of the silicon die. The TSV is designed from a conductive cylinder coated by Teflon insulator. The insulator coating prevents current leakage from a TSV to a silicon die. The TSV is surrounded by a silicon die with a conductive GND pad at a bottom. The TSV and its fabrication error models are described in Fig. 1 for a crack, micro-void, and a pin-hole. The crack error is designed of the disconnection in the middle of the TSV cylinder with an air gap. The micro-void is modeled of an empty


Fig. 1 TSV fabrication error models: (a) TSV, (b) crack, (c) micro-void, (d) pin-hole

Fig. 2 Equivalent circuit model of 2 × 2 array TSVs: (a) top view, (b) side view (the diagram labels the mutual capacitances C12, C23, C34, C41, the probe-to-TSV capacitances C1, C2, the mutual inductances L12, L23, L34, L41, LG12, and the TSV inductances L1a, L1b, L2a, L2b)

sphere within the TSV cylinder. The pin-hole is designed as a circular hole in the insulator at the middle of the TSV cylinder. In this paper, a multiple TSV array is modeled so that several DUTs can be measured simultaneously using one large capacitive probing disk. Figure 2 shows the equivalent circuit model for the 2 × 2 arrayed TSVs. Each TSV is modeled as a conductive cylinder with mutual capacitances Cmn (m, n = TSV number), and leakage current through the silicon wafer generates the mutual inductances Lmn in Fig. 2a. Figure 2b presents the vertical equivalent circuit components. The mutual capacitances between the large probe and the tops of the TSVs are C1 and C2. The mutual inductance LGmn represents current flow via the ground plane, and the inductances Lma and Lmb are induced along the current direction inside the TSVs.


Fig. 3 Post-process for the TSV fabrication error detection using measured datum and eye-diagram analysis

3 Multi-TSV Error Detection Method Using Large Non-contact Probes The non-contact probing method with capacitive coupling is proposed for a faster measurement procedure. The 3-D IC has a lot of TSVs to be tested; when they are measured one by one, the testing process takes a lot of time. Therefore, the proposed method utilizes a moving probe that can measure several TSVs simultaneously. While one port of all TSVs on a wafer is the common ground port, the other ports of the TSVs are measured by the non-contact probe with a 2 × 2 or 3 × 3 array coverage. After one measurement is completed, the probe moves to the next group without contact alignment. The error detection can check the existence and the size of errors such as cracks, micro-voids, and pin-holes. The error detection is performed with an eye-pattern diagram: the measured data is tested by checking eye-patterns. Figure 3 shows the experimental process for the TSV error detection. A PRBS (Pseudo Random Bit Sequence) with a data rate of 500–700 Mbps and a length of 2^8 − 1 is generated by an LFSR (Linear Feedback Shift Register). For the simulation process, the characteristics of the two-port network are extracted from the EM simulation. Because the capacitive coupling differentiates the transmitted pulses, the digital pulse is recovered by an integrator and an amplifier. Then, the output data is measured as an eye-pattern diagram on an oscilloscope. This is simulated by the ADS (Advanced Design System) circuit simulator, Keysight Technologies Ltd. The TSV array in a silicon wafer is modeled for electromagnetic simulation. Because the non-contact measurement is based on an EM coupling effect, the DUT is defined from the top probe to the bottom ground of the TSV in the EM analysis. The silicon wafer mounting four TSVs and a metal probe is modeled as shown in Fig. 4. A 2 × 2 TSV array is mounted inside the silicon wafer. The bottom surface of the wafer is covered by a conducting plane, which means that all TSVs are connected by a common ground plane. The TSV is designed as a circular cylinder coated by an insulator. The insulator coating prevents leakage current through the silicon wafer. The probe is designed with two conducting disks of different radii to account for impedance matching.
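The signal chain of Fig. 3 (PRBS source, capacitive differentiation, integrator recovery, eye-pattern folding) can be imitated numerically with the short Python sketch below; the LFSR tap positions, the oversampling factor and the first-order-difference model of the coupling are assumptions made only for illustration.

```python
import numpy as np

def prbs8(seed=0b1111_1111, length=255):
    """PRBS of length 2**8 - 1 from an 8-bit LFSR (taps 8,6,5,4 -- one common maximal set)."""
    state, bits = seed, []
    for _ in range(length):
        bits.append(state & 1)
        fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | fb) & 0xFF
    return np.array(bits)

samples_per_bit = 32
tx = np.repeat(prbs8().astype(float), samples_per_bit)   # ideal NRZ pulse train
coupled = np.diff(tx, prepend=tx[0])                      # capacitive coupling ~ d/dt
recovered = np.cumsum(coupled)                            # integrator (unit gain) recovery
# fold the recovered waveform into two-bit-wide traces to draw an eye pattern
usable = (len(recovered) // (2 * samples_per_bit)) * 2 * samples_per_bit
eye = recovered[:usable].reshape(-1, 2 * samples_per_bit) # one row per eye trace
```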


Fig. 4 2 × 2 TSV array EM modeling using large non-contact probes: (a) side view, (b) top view (the drawing labels the EM port, the probe heights hprv1 and hprv2, gap g, insulator thickness tins, TSV height HTSV and diameter DTSV, wafer height Hwaf and width Wwaf, TSV spacing s, ground-plane thickness tGND, and probe radii rprv1 and rprv2)

An EM port is assigned on the top surface of the probe, while the alternative port is designed on a planar ground structure, which also contributes to increase the measurement speed. In order to evaluate the coupling characteristics, the EM analysis is simulated by a 3-D full wave simulator, the HFSS (High Frequency Structure Simulator), AnSys Ltd. The dimensions of the 2 × 2 TSV array model is summarized in Table 1.

Table 1 Parameters of the 2 × 2 array TSV and probe model
Parameter | Value (µm)
hprv1 | 3
tGND | 1
hprv2 | 3
g | 1
HTSV | 100
rprv1 | 30
Hwaf | 99
rprv2 | 136
DTSV | 50
Wwaf | 600
s | 100
tins | 1

4 Experimental Results for TSV Fabrication Error Resolution In this section, the experimental results and the analysis are presented. While the previous research has shown the error existence detection by the proposed method, this research presents the detection resolution for error sizes and the number of TSVs with error. From the detection procedure introduced above, the TSV error detection performance is evaluated. Figure 5 presents the crack error detection. The crack error is designed of the disconnection in the middle of the TSV cylinder with an air gap of 8 µm and 18 µm. Only one TSV has the crack in the 2 × 2 TSV array. It is evaluated for different crack sizes from eye-pattern diagrams. Figure 5a shows the eye-pattern for 8 µm crack, while the larger crack of 18 µm is detected as shown in Fig. 5b. For the smaller crack, it has 0.7 ns jitter and eye closing of up to 50.2%. For the larger crack, the eye-pattern is degraded and eye is almost closed from the interference. Therefore, the proposed test procedure can detect the existence of TSV errors and resolve the crack size. Figure 6 presents the micro-void error detection of TSVs. The sphere-shaped micro-void error is located at the center of the TSV with a radius of 7 µm and 19 µm, respectively. From the post-processing, the eye-patterns are presented as below. The larger the size of micro-void is, the smaller the eye of the eye-pattern is, which means the degradation of the digital data transmission. Figure 7 describes the error detection of the number of pin-hole errors in TSVs. For the 2 × 2 TSV array, the number of TSV including a pin-hole error is increased from 0 to 4. The pin-hole error is designed of a circular hole of insulator in the middle of the TSV cylinder coated by the insulator. The circular hole has a radius of 16 µm. Each test is performed with the same simulation condition. The eye-pattern diagrams are tested at a bit rate of 500 MHz. Figure 7a shows the clear pattern of the 2 × 2 TSV array without any errors. From the eye-pattern evaluations, as the number of error increases, it can be detected that the eye-pattern is degraded. In case of one pin-hole error, it presents a jitter of 0.3 ns and the eye is closed up to 33%. For the diagonally located two errors in Fig. 7c, the jitter is 0.3 ns and the eye closes up to 47.4%. In addition, it starts that the detected voltage level drops. At the three error TSV array, the detected voltage level is rapidly dropped. For all error TSV array, it is difficult for the eye pattern to be detected.
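Eye-quality figures such as the vertical eye opening (and hence the eye-closing percentage) and the peak-to-peak jitter quoted in this section can be extracted from folded eye traces with a sketch like the following; the sampling conventions and the 0.5 threshold are assumptions, and the sketch assumes both logic levels and at least one rising edge appear in the traces.

```python
import numpy as np

def eye_metrics(eye, samples_per_bit, dt, threshold=0.5):
    """eye: 2-D array of traces, each two bit-periods wide; dt: seconds per sample."""
    mid = samples_per_bit                                  # nominal sampling instant
    ones = eye[eye[:, mid] > threshold, mid]
    zeros = eye[eye[:, mid] <= threshold, mid]
    eye_height = ones.min() - zeros.max()                  # vertical opening at mid
    swing = eye.max() - eye.min()
    eye_closing = 100.0 * (1.0 - eye_height / swing)       # percent eye closure
    # peak-to-peak jitter: spread of the first threshold crossing across traces
    crossings = [np.argmax(tr > threshold) for tr in eye if (tr > threshold).any()]
    jitter_pp = (max(crossings) - min(crossings)) * dt
    return eye_height, eye_closing, jitter_pp
```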


Fig. 5 Fabrication error detection for various crack sizes: (a) error free, (b) 8 µm, (c) 18 µm


Fig. 6 Fabrication error detection for various micro-void radii: (a) 7 µm, (b) 19 µm

5 Conclusion A non-contact TSV error detection method is proposed using a large capacitively coupled probe. The capacitive coupling probe and the TSV errors are modeled for EM analysis. The analyzed detection characteristics are evaluated with eye-pattern diagrams to recognize the existence of errors, their size, and the number of TSVs with errors. From the evaluated results, the proposed method demonstrates error detection and the ability to distinguish the size and number of errors. Additionally, it can be one of the fastest measurement methods, since one large probe is moved without contact alignment and without TSV damage.

Fig. 7 Eye-diagram comparison for the number of TSVs with errors: (a) one error, (b) two errors, (c) three errors, (d) four errors

Acknowledgements This research was supported by the MOTIE and KSRC support program for the development of the future semiconductor device, Soonchunhyang University research fund, and the MSIT, Korea, under the ITRC support program (IITP-2018-2015-0-00403) supervised by the IITP.


Real-Time Ultra-Wide Viewing Player for Spatial and Temporal Random Access Gicheol Kim and Haechul Choi

Abstract Recently, the demand for ultra-high-resolution images of 4K or more, such as ultra-wide viewing (UWV) images, is increasing. This paper investigates the problems of playing UWV video in terms of system resources and user preference. Users may be interested in only a certain region or a particular time interval of the entire set of UWV images; in this case, decoding all images wastes system resources and requires a lot of computational time. This paper introduces a spatial and temporal random-access method that uses NVIDIA's NvCodec library. Furthermore, it proposes a trans-rating method to lower the bitrate of the region-of-non-interest. Using the proposed methods, customized services can be implemented. Keywords NVIDIA video codec · Trans-rating · Random access · HEVC

1 Introduction Recently, the demand for ultra-high-resolution images of 4K or more, such as wide panoramic images, 4K ultra high-definition broadcasting, and ultra-wide viewing (UWV) images, is increasing due to the development of display devices and the expansion of transmission bandwidth of Giga networks. Since the amount of data required to store or transmit ultra-high resolution images is much larger than that of conventional image content, advanced video compression technologies are required.

G. Kim · H. Choi (B) Department of Multimedia Engineering, Hanbat National University, 125 Dongseo-daero, Daejeon, Yuseong-gu, South Korea e-mail: [email protected] G. Kim e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_5


Accordingly, the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group established the JCT-VC and released the video coding standard High Efficiency Video Coding (HEVC)/H.265 [1–10] in January 2013. HEVC/H.265 is the latest video coding standard and provides high encoding efficiency using various coding techniques; however, its complexity is also very high compared to the previous standard, H.264/AVC [11, 12]. In particular, very high computational complexity is required to encode or decode ultra-high-resolution images with HEVC/H.265 [13]. Therefore, graphics processing unit (GPU) based methods [14–16] are being studied to accelerate the encoding and decoding processes. NVIDIA currently plays an important role in accelerating GPU computing in software as well as hardware: the GPU-based NVIDIA Video Codec software development kit (SDK) [17–19] supports accelerated video encoding and decoding on NVIDIA graphics cards, facilitating live-stream conversion to H.264/AVC or HEVC/H.265 files at greater scale and speed.

UWV images are made with very wide viewing angles, both vertically and horizontally, using stitching methods. They therefore have ultra-high spatial resolution and contain an extremely large amount of data, which makes it difficult to encode or decode entire UWV images even when GPU-based methods are used. This paper investigates the problems of playing UWV video in terms of system resources and user preference. Users may only be interested in a certain region of a UWV image or a particular time interval of the image sequence; in other words, they may want spatial and temporal random-access features. In this case, decoding all images wastes system resources, and such random-access and resource-allocation functions are very important to prevent that waste [20]. The full decoding process also takes a long time because it is computationally intensive. This paper therefore introduces a spatial and temporal random-access method for UWV images that uses NVIDIA's GPU-based NvCodec library [19]. Furthermore, it proposes a trans-rating (bitrate conversion) method for HEVC/H.265 to lower the bitrate of the region-of-non-interest (RONI). By decoding only the desired regions through random access in time and space, and by providing low bitrates for the RONI, customized services can be realized and storage space and transmission bandwidth can be managed efficiently.

The rest of this paper is organized as follows. Section 2 introduces the trans-rating, spatial random-access, and temporal random-access techniques. Section 3 describes the experimental results. Finally, conclusions are drawn in Sect. 4.


Fig. 1 Adaptive trans-rating according to region-of-interest and region-of-non-interest

2 Proposed Method

The block diagram of the proposed bitrate conversion method is shown in Fig. 1. The proposed method uses the NVIDIA transcoder [19, 21]: it receives the image bitstream file from the user, re-encodes the YUV file produced by the decoder, and finally outputs a new bitstream file. Figure 1 thus shows a system for changing the bitrate of a bitstream compressed with HEVC/H.265. A spatial/temporal random-access player was also implemented using the NVIDIA decoder; this player can decode and render in real time [22–24]. Spatial random access refers to decoding only a selected portion of an image to be reproduced, and temporal random access refers to selecting an arbitrary point in the sequence and decoding from that point [25].

2.1 Trans-Rating

In this paper, the quantization module shown in Fig. 1 is analyzed and the following features are added. In the quantization part of the NVENC encoder, the delta quantization parameter (QP) map related parameters are checked, and the size of the delta QP map is determined using a coding tree unit (CTU) size of 32 × 32. In the tile format, region-of-interest (ROI) tiles and RONI tiles can be assigned independent QP values. The tile layout can be set by the number of tiles and a tile length chosen by the user. A QP map is created by the following procedure.


Table 1 NVIDIA transcoder options

Command      Explanation
-i           Specify input yuv420 file
-o           Specify output bitstream file
-size        Specify input resolution
-codec       0: H264; 1: HEVC
-preset      hp: High Performance Preset; hq: High Quality Preset; lowLatencyHP: nvenc low latency HP; lowLatencyHQ: nvenc low latency HQ; lossless: nvenc Lossless HP
-fps         Encoding frame rate
-goplength   Specify gop length
-qp          Specify qp for Constant QP mode
-rcmode      0: Constant QP mode; 1: Variable bitrate mode; 2: Constant bitrate mode; 8: low-delay CBR, high quality; 16: CBR, high quality (slower); 32: VBR, high quality (slower)
-help        Prints help information

First, the qpDeltaMapArray pointer is dynamically allocated to the CTU grid size, using the width and height of the input image bitstream file. To generate the one-dimensional array DeltaQPArray, the delta QP map size is defined as

Delta QP map size = (Image width / CTU width) × (Image height / CTU height)    (1)

To map the DeltaQPArray to the tile area, the ROI tile delta QP, the RONI tile delta QP, and the per-CTU tile delta QP are given for a single image, and a 1-D array of QP values is created. This 1-D array is input to NVENC to convert the bitrate of the ROI/RONI. Based on the number of tiles and the tile length, the ROI delta QP value is written into the qpDeltaMapArray in raster-scan order, and the existing QP value is kept for the remaining (RONI) tiles; a minimal sketch of this mapping is given below. Additional options supported by the NVIDIA transcoder are shown in Table 1.
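The following sketch illustrates the sizing of Eq. (1) and the raster-scan assignment of per-CTU delta QPs. It is an illustration under stated assumptions rather than the authors' implementation: the array name qpDeltaMapArray is taken from the text, the tile geometry is simplified to a uniform grid, and the default delta QPs follow the experiment of Sect. 3.

import numpy as np

def build_delta_qp_map(width, height, tiles_x, tiles_y, roi_tiles,
                       roi_delta_qp=0, roni_delta_qp=51, ctu=32):
    """Return a 1-D per-CTU delta QP array in raster-scan order."""
    ctus_x = (width + ctu - 1) // ctu               # Eq. (1): image width / CTU width
    ctus_y = (height + ctu - 1) // ctu              # Eq. (1): image height / CTU height
    qp_delta_map_array = np.empty(ctus_x * ctus_y, dtype=np.int8)
    for cy in range(ctus_y):
        for cx in range(ctus_x):
            tx = min(cx * tiles_x // ctus_x, tiles_x - 1)   # tile column of this CTU
            ty = min(cy * tiles_y // ctus_y, tiles_y - 1)   # tile row of this CTU
            delta = roi_delta_qp if (ty, tx) in roi_tiles else roni_delta_qp
            qp_delta_map_array[cy * ctus_x + cx] = delta
    return qp_delta_map_array

# Example: 2 x 3 tile grid with the bottom-right tile as the ROI, as in Fig. 3
qp_map = build_delta_qp_map(1920, 1080, tiles_x=3, tiles_y=2, roi_tiles={(1, 2)})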


Fig. 2 Real-time UWV player with spatial/temporal random access

2.2 Spatial/Temporal Random Access

The overall flow after the trans-rating is shown in Fig. 2. After decoding, the YUV420 format image is converted into NV12 format, separated into a Y bitmap and a UV bitmap, passed through DirectX [26] effect rendering, and placed into the back buffer. For temporal random access, the method finds the intra random access point (IRAP) nearest to the point selected by the user and reconstructs the sequence from that picture; a brief sketch follows. For spatial random access, only the selected tiles are decoded and rendered. Finally, high-speed rendering is implemented with hardware acceleration through the DirectX library.
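As a rough, self-contained illustration of the temporal random-access rule (find the nearest IRAP at or before the user-selected picture and decode forward from it), consider the sketch below; the GOP length of 32 and the frame numbers are hypothetical, not taken from the paper.

def frames_to_decode(selected_frame, irap_frames):
    """Frame indices that must be decoded to display selected_frame.
    irap_frames: sorted indices of intra random access points (e.g. one per GOP)."""
    start = max((f for f in irap_frames if f <= selected_frame),
                default=irap_frames[0])
    # all pictures from the IRAP up to the requested one are reconstructed,
    # but display starts only at the requested picture
    return list(range(start, selected_frame + 1))

print(frames_to_decode(220, irap_frames=list(range(0, 256, 32))))
# decodes frames 192..220; frame 220 is the first picture shown, as in Fig. 5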

3 Experimental Results

The experiments were run on a 4-core, 8-thread CPU (i7-3770, 3.4 GHz), 16 GB of RAM, and a GTX 1060 6 GB graphics card for the NVIDIA video codec SDK. The test images were common test sequences used in HEVC/H.265 standardization [27]. Table 2 shows the compression ratios obtained by reducing the image quality in the RONI. The average bit compression ratio over classes A, B, and C is 43%. In terms of BD-rate [28], the proposed method achieves an average coding gain of −12.99% for the luma component. Tables 3 and 4 show the results for spatial random access and temporal random access, respectively. As shown in these tables, the proposed method achieves an average decoding speed of 0.014 s per frame, which is sufficient for real-time processing.
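BD-rate [28] is obtained by fitting third-order polynomials to the log-rate versus PSNR curves of the anchor and the test configuration and integrating them over the common PSNR range. The sketch below is a generic implementation of that calculation, not necessarily the one used by the authors.

import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bit-rate (%) between two rate-distortion curves."""
    p_anchor = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)   # log-rate as cubic in PSNR
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))                   # common PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int_anchor = np.polyint(p_anchor)
    int_test = np.polyint(p_test)
    avg_anchor = (np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    return (np.exp(avg_test - avg_anchor) - 1) * 100             # negative = bit-rate saving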


Table 2 Coding efficiency by the trans-rating process for RONI

Class   Sequence          Bit compression (%)   BD rate Y (%)   BD rate U (%)   BD rate V (%)
A       PeopleOnStreet    −53.0                 −17.62          −32.93          −35.18
A       Traffic           −40.0                 −10.32          −22.54          −23.74
B       Kimono            −44.0                 −13.09          −18.76          −22.62
B       ParkScene         −42.0                 −9.73           −22.73          −25.64
B       BasketballDrive   −56.0                 −27.87          −34.09          −32.53
B       BQTerrace         −48.0                 −16.08          −28.47          −27.68
C       BQMall            −34.0                 −5.00           −12.30          −16.60
C       BasketballDrill   −34.0                 −7.52           −14.49          −12.72
C       RaceHorses        −39.0                 −9.66           −12.96          −11.16
        Average           −43.0                 −12.99          −22.14          −23.10

Table 3 Result of spatial random access (decoding time in seconds)

Class   Sequence          1st Tile   6th Tile   Average
A       PeopleOnStreet    0.0210     0.0208     0.021
A       Traffic           0.0205     0.0184     0.019
        Class A average                         0.020
B       Kimono            0.013      0.013      0.013
B       ParkScene         0.013      0.013      0.014
B       Cactus            0.013      0.014      0.014
B       BasketballDrive   0.015      0.015      0.015
B       BQTerrace         0.013      0.013      0.013
        Class B average                         0.014
C       BQMall            0.009      0.009      0.009
C       PartyScene        0.009      0.009      0.009
C       BasketballDrill   0.007      0.008      0.008
C       RaceHorses        0.008      0.008      0.008
        Class C average                         0.008
        Average                                 0.014


Table 4 Result of temporal random access (decoding time in seconds)

Class   Sequence          1st Frame   30th Frame   Average
A       PeopleOnStreet    0.021       0.0210       0.021
A       Traffic           0.020       0.0205       0.020
        Class A average                            0.021
B       Kimono            0.013       0.013        0.013
B       ParkScene         0.015       0.013        0.014
B       Cactus            0.013       0.013        0.013
B       BasketballDrive   0.013       0.015        0.014
B       BQTerrace         0.015       0.013        0.014
        Class B average                            0.014
C       BQMall            0.008       0.009        0.009
C       PartyScene        0.008       0.009        0.009
C       BasketballDrill   0.007       0.007        0.007
C       RaceHorses        0.006       0.008        0.007
        Class C average                            0.008
        Average                                    0.014

To evaluate the trans-rating performance with different qualities in the ROI and RONI, the original image shown in Fig. 3a is divided into 2 × 3 tiles. The bottom-right tile is assigned to the ROI and the remaining tiles are assigned to the RONI, with delta QPs of 0 and 51, respectively. Figure 3b shows the result of the trans-rating for the ROI and RONI. As shown in this figure, the ROI tile has quality very similar to the original image, whereas the RONI tiles have lower quality than the original image. By reducing the quality of the RONI tiles, high coding efficiency is achieved, as shown in Table 2. The other experiment is conducted as shown in Fig. 4, where the top region is assigned to the RONI and the bottom region to the ROI, with delta QPs of 51 and 28, respectively. In this experiment, the trans-rating also works well. By applying separate delta QPs to the ROI and RONI, high compression is obtained while maintaining the image quality of the ROI tile. Figure 5 shows the result of temporal random access using the proposed real-time ultra-wide viewing player: when the 220th frame is selected as the starting picture for decoding, the 220th frame is indeed the first frame decoded.


Fig. 3 Trans-rated image where the bottom-right block is the ROI and the rest of the blocks are the RONI: (a) original image, 28th frame of the Kimono sequence; (b) result of trans-rating


Fig. 4 Trans-rated image where the top region is the RONI and the bottom region is the ROI: (a) original image, 22nd frame of the RaceHorses sequence; (b) result of trans-rating


Fig. 5 Temporal random access result when the 220th frame is selected as the starting picture for decoding

Figure 6a, b, and c show the results of various spatial random-access decodings using the proposed real-time ultra-wide viewing player. When one or more tiles are selected, only the selected tiles are decoded and displayed in real time.

4 Conclusion

UWV images, which provide very wide viewing angles, have ultra-high spatial resolution and contain a huge amount of data. To view them with low system resources and in a user-friendly way, this paper introduced the spatial and temporal random-access method and the trans-rating method according to the ROI and RONI. The proposed methods make it possible to manage storage space and transmission bandwidth efficiently by lowering the bitrate of the RONI. Further, users can view and select desired regions of UWV images by using the spatial random-access feature, and the temporal random-access feature gives the ability to start decoding at a desired picture location. The proposed method would be a suitable solution for customized UWV image services.


Fig. 6 Results of spatial random access: (a) the top-left region in a 2 × 2 tile structure is selected; (b) the center and top-left regions in a 3 × 3 tile structure are selected; (c) the right region in a 1 × 3 tile structure is selected


References 1. JCT-VC.: High Efficiency Video Coding (HEVC) Text Specification Draft 10, JCT-VC-L1003, Geneva (Jan 2013) 2. Ohm, J.-R., et al.: Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 22(12), 1669–1684 (2012) 3. Sullivan, G.J., Ohm, J.R., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012) 4. Bossen, F., Bross, B., Suhring, K., Flynn, D.: HEVC complexity and implementation analysis. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1685–1696 (2012) 5. Kim, I.-K., Min, J., Lee, T., Han, W.-J., Park, J.: Block partitioning structure in the HEVC standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1697–1706 (2012) 6. Yuan, Y., Kim, I.-K., Zheng, X., Liu, L., Cao, X., Lee, S., Cheon, M.-S., Lee, T., He, Y., Park, J.-H.: Quadtree based nonsquare block structure for inter frame coding in high efficiency video coding. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1707–1719 (2012) 7. Helle, P., Oudin, S., Bross, B., Marpe, D., Bici, M.O., Ugur, K., Jung, J., Clare, G., Wiegand, T.: Block merging for quadtree-based partitioning in HEVC. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1720–1731 (2012) 8. Sze, V., Budagavi, M.: High throughput CABAC entropy coding in HEVC. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1778–1791 (2012) 9. Lainema, J., Bossen, F., Han, W.-J., Min, J., Ugur, K.: Intra coding of the HEVC standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1792–1801 (2012) 10. Norkin, A., Bjøntegaard, G., Fuldseth, A., Narroschke, M., Ikeda, M., Andersson, K., Zhou, M., der Auwera, G.V.: HEVC deblocking filter. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1746–1754 (2012) 11. Marpe, D., Wiegand, T., Sullivan, G.J.: The H.264/MPEG4 advanced video coding standard and its applications. IEEE Commun. Mag. 44(8), 134–143 (2006) 12. Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A.: Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003) 13. Son, S., Baek, A., Choi, H.: Tile, slice, and deblocking filter parallelization method in HEVC. J. Broadcast. Eng. 22(4), 484–495 (2017) 14. Kruger, J., & Westermann, R.: Acceleration techniques for gpu-based volume rendering. In: Proceedings of the 14th IEEE Visualization 2003 (VIS’03). IEEE Comput. Soc. (2003) 15. Cheung, N.-M., Fan, X., Au, O.C., Kung, M.-C.: Video coding on multicore graphics processors. IEEE Signal Process. Mag. 27(2), 79–89 (2010) 16. Harish, P., Narayanan, P.J.: Accelerating large graph algorithms on the GPU using CUDA. In: Proceedings of the International Conference on High-Performance Computing, Goa, India, pp. 197–208 (2007) 17. Patait, A., Young, E.: High Performance Video Encoding with NVIDIA GPUs. In: 2016 GPU Technology Conference (2016). https://goo.gl/Bdjdgm 18. Kirk, D.: NVIDIA CUDA software and GPU parallel computing architecture. In: Proceedings of the 6th International Symposium on Memory Management, pp. 103–104. Quebec, Canada (2007) 19. NVIDIA Corporation, VIDEO CODEC SDK. https://developer.nvidia.com/nvidia-videocodec-sdk 20. Choi, Hyun-Ho: Adaptive and prioritized random access and resource allocation schemes for dynamic TDMA/TDD protocols. J. Inf. Commun. Converg. Eng. 15(1), 28–36 (2017) 21. Kim, G.C., Choi, H.: HEVC Trans-Rating using NVIDIA transcoder library. 
In: Proceedings of Korea Institute of Broadcast and Media Engineers Fall Conference, pp. 221–222. (2017) 22. Tennøe, M., Helgedagsrud, E.O., Næss, M., Alstad, H.K., Stensland, H.K., Halvorsen, P., Griwodz, C.: Real-time panorama video processing using NVIDA GPUs. In: Proceedings of GPU Technology Conference, Mar 2013


23. Lefohn, Aaron E., Kniss, Joe, Strzodka, Robert, Sengupta, Shubhabrata, Owens, John D.: Glift: Generic, efficient, random-access gpu data structures. ACM Trans. Graph. 25(1), 60–99 (2006) 24. Akenine-Moller, T., Haines, E., Hoffman, N.: Real-time Rendering. CRC Press, Aug 2008 25. Wang, Y.K., Hannuksela, M.: Random access points in video encoding. U.S. Patent No. 7,302,001, Nov 2007 26. Microsoft Corporation, DirectX (2010). http://www.microsoft.com/directx/ 27. Bossen, F.: Common Test Conditions and Software Reference Configurations. In: The 6th Meeting of Joint Collaborative Team on Video Coding (JCT-VC), JCTVC-F900, Torino, Italia, (2011) 28. Bjøntegaard, G.: Calculation of average PSNR differences between RD-curves. ITU-T SG16 Q.6 VCEG, Doc. VCEG-M33. (2001)

Gicheol Kim received his B.S. in radio engineering from Hanbat national university, Daejeon, Rep. of Korea, in 2017, and his M.S. in multimedia engineering from Hanbat national university, in 2019. His research interests include video coding, image processing, and deep learning.

Haechul Choi received his B.S. in electronics engineering from Kyungpook National University, Daegu, Rep. of Korea, in 1997, and his M.S. and Ph.D. in electrical and electronics engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Rep. of Korea, in 1999 and 2004, respectively. He is a professor in the information and communication engineering at Hanbat national university, Daejeon, Korea. From 2004 to 2010, he was a senior member of the research staff in the broadcasting media research group of the Electronics and Telecommunications Research Institute (ETRI). His current research interests include image processing, video coding, and video transmission.

A Study on the Faith Score of Telephone Voices Using Machine Learning Hyungwoo Park

Abstract Human voices are one of the easiest ways to communicate information between humans. Voice characteristics vary from person to person and include speaking rate, the form and function of the voice, pitch and tone, language habits, and gender. Human voices are a key element of human communication, and in the era of the Fourth Industrial Revolution they are a main means of communication between people, between humans and machines, and between machines. For that reason, people try to communicate their intent clearly to others, and in the process both linguistic information and various kinds of additional information are conveyed, such as emotional state, health status, reliability, the presence of lies, and changes due to alcohol. This linguistic and non-linguistic information, which appears as various parameters, can be used to assess whether a telephone voice is lying. In particular, it can be obtained by analyzing the relationship between the fundamental frequency (fundamental tone) of the vocal cords and the resonance frequency characteristics of the vocal tract. Previous studies extracted parameters for false testimony from various telephone voices, and this study builds on them to evaluate whether a telephone voice is lying. We propose a discriminator based on a support vector machine that judges whether a statement is true, that is, a personal telephone truth discriminator. Keywords Voice analysis · Lie detection · Credit evaluation · Telephone voice · Faith score

1 Introduction

Human voices are one of the most convenient and easy ways to communicate information among people and machines in everyday life. These voices are made by the human vocal cords and propagated through the air.

H. Park (B) School of Information Technology, Soongsil University, Seoul, Republic of Korea e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_6


The voice produced by the vocal organs has been exercised since childhood, so people have learned to speak easily and habitually. As voices spread far away, they gradually fade. The voices generated in this way carry the information necessary for verbal communication, and they are shaped by the personality of the person who creates them [1]. These individual characteristics look similar, but different parameters are obtained by analyzing differences in the vocal organs, the characteristics of the language, the psychological situation, and the health status. One function of the sound characteristics examined in speech analysis is the meaning of the language; other functions arise from individual habits.

With the development of information and communication technology, people's voices can be transmitted in various ways, and recorded voices can be delivered over long distances. Today, these voices are analyzed to extract and record information, judge psychology, and support communication between people and machines [1–3]. During the process of being created, voices acquire different characteristic parameters for each individual: they sound different because of the vocalization process and the resonance characteristics of the vocal organs, as well as different habits of producing the voice. The vocabulary, language habits, and regional characteristics also vary widely depending on where the speaker lives, and voice characteristics change with health and psychological state. Therefore, these voices can be analyzed in detail to discover such differences. The sound can be analyzed by a person or by a machine. The basic result of the analysis conveys the linguistic meaning; in addition to this basic language information, more than 100 kinds of additional information can be analyzed to identify health and psychological states and to judge the authenticity or deception of a speaker [3, 4].

With information devices, a voice can be processed and analyzed to obtain a variety of results. First, the voice can be transmitted by a communication device or stored by a recording device. Next, the meaning of the voice can be analyzed, and machine instructions and operations can be performed through speech recognition. People can also be recognized by a speaker recognition system, identified through a speaker identification system, or verified before important functions are performed. The computer can also provide voice guidance or send information to a person through the device [5, 6]. Technologies such as communication, storage, speech recognition, speaker identification, speaker recognition, and speech synthesis still find it difficult to handle the speaker's feelings as well. However, studies have attempted to add emotional attributes to speech synthesis [6] and to distinguish the amount and type of emotional information recognized and communicated in speech recognition and communication [5, 7].

Artificial intelligence technology in the Fourth Industrial Revolution era is developing and influencing the entire society. In particular, deep learning is a popular type of machine learning, often performed by machines themselves, as demonstrated by the Google DeepMind challenge match in March 2016 [6]. Machine learning methods combined with big data processing technology have been actively applied to


process areas that were previously impossible for artificial intelligence. The big data processing method known as deep learning is a machine learning approach that combines various nonlinear transformation techniques so that a computer can perform multi-step abstraction (combining the key parameters of large data) [1]. However, such machine learning methods require a large amount of data, the computational ability to reach a conclusion, and resources to judge the result [1, 3, 4]. In this study, we distinguish the characteristics of speech by using a support vector machine, which can obtain effective results even from a small amount of training data. The support vector machine (SVM) is a supervised learning method for pattern recognition and data analysis [8]. Machine learning can be divided into supervised and unsupervised learning [9]; the advantage of supervised learning is that predictions are relatively accurate even with small amounts of data, so quick decisions can be made with little computation. SVM classifies data by finding a linear decision boundary (hyperplane) that separates all data elements of one class from those of the other classes [10], and it achieves very high classification accuracy [8]. The accuracy is high because the margins between the data points to be classified are maximized, so overfitting occurs less frequently [9]. Moreover, when a linear classifier is insufficient, performance can easily be improved by using a kernel function [8, 9].

The basic function of finance is to ensure that households, businesses, governments, financial institutions, and so on have the necessary funds and can manage those funds through transactions. The credit of individuals and financial institutions is therefore very important information. Credit rating criteria include how faithfully past obligations were fulfilled, the willingness to repay, the probability of meeting due dates, and the probability of default. FinTech and P2P finance are very popular because they adopt the latest technologies, such as deep learning and psychological evaluation, for credit rating. In this study, we propose a lie detection algorithm for personal telephone voices that is fast, easy to use, and accurate through sound analysis. In previous studies, we examined speech parameters as a function of credit change, in particular changes in voice parameters before and after personal bankruptcy (default), and characterized those changes. Using these parameters and machine learning (SVM), we also propose a personal credit evaluation module that analyzes telephone voices to determine whether a telephone voice is deceptive.

2 Related Works and Basic Algorithm Reviews

2.1 Speech Analysis

Voice communication is a long-used information transmission technology. The process by which the speaker's meaning is conveyed and understood by the listener


Fig. 1 Voice generation flow diagram [10]

basically starts from the concept that the speaker wants to convey, and proceeds as follows. The speaker shapes the idea through the structure of the language, selecting appropriate words to represent the thought and arranging them according to the grammar. Next, the brain commands the muscle tissues involved to move the vocal organs into position for the desired pronunciation; in this process, accent, speech rate, intonation, speech habits, and so on are created together. The command is carried out in the vocal organs: the air flow from the lungs vibrates the vocal cords, and this vibration and air flow resonate in the vocal tract, spreading through the nose and mouth. An acoustic waveform according to the speaker's intention is then generated [10, 11]. Figure 1 shows this voice generation process.

In speech signal processing, speech information can be broadly classified into characteristics of the excitation source and characteristics of the vocal tract parameters. First, the characteristics of the excitation source can be confirmed by the presence or absence of vocal cord vibration, and the fundamental frequency of the vibrating vocal cords is called the pitch. It can be determined by analyzing the number of vibrations per unit time, or the time during which the vocal cords are opened and


closed. When the pitch is accurately detected, the influence of the individual speaker on speech recognition can be reduced, and the naturalness of speech synthesis can easily be maintained. If we know the pitch exactly, we can also use it to transform the voice into another voice. For a typical male the available pitch range is 80–250 Hz, while women show a range of 150–300 Hz [11, 12]. The change of pitch over time can be regarded as a major parameter of change in speech, and, beyond the language information included in speech, pitch changes can be used to estimate other information.

Second, the vocal tract parameters are the formants. The formant frequencies of the voice are the frequency bands emphasized by resonance when the air vibration produced at the vocal cords passes through the vocal tract. The formant frequencies are denoted, in order of increasing frequency, by the first and second formants, F1 and F2, and so on. Generally the first through fourth resonances are present, and the fifth and sixth formants can also be detected when the vocal conditions are good. The position of each formant peak is indicated by its frequency value, and inspecting the formant bands reveals the shape of the vocal tract: the excitation of the air is the same, but the emphasized frequency bands vary with its thickness, length, and rate of change [4, 11, 12]. Generally speaking, the phonetic characteristics of phonemes are represented by F1 and F2, while F3, F4, and F5 represent the individual characteristics of the speaker; the frequency position, bandwidth, and amplitude of F3 and F4 can be used as such characteristics. In speech recognition, F1 and F2 are the important information in voiced sections. In unvoiced sections, however, not only F1 and F2 but also F3, F4, and F5 carry voice and other information, because the formant structure there is more complicated than in voiced sections. The slope of the formants can also be checked to evaluate how clearly the pronunciation conveys information to the listener [10, 12]. Therefore, the positions and slopes of the first and second formants are obtained, and the slopes of the first and fourth formants are compared to confirm the reliability and sharpness of the utterance as a whole. This formant frequency change can also be used as a parameter for measuring speech [4].
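As a concrete illustration of pitch extraction (the study itself relies on cepstrum analysis, see Sect. 4), the following minimal autocorrelation-based sketch estimates the pitch of voiced frames within the 80–300 Hz range mentioned above; the frame and hop sizes are illustrative choices, not values from the paper.

import numpy as np

def estimate_pitch(frame, fs, fmin=80.0, fmax=300.0):
    """Rough autocorrelation pitch estimate (Hz) for one voiced frame (1-D numpy array)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)                      # shortest period of interest
    lag_max = min(int(fs / fmin), len(ac) - 1)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return fs / lag

def pitch_contour(signal, fs, frame_ms=30, hop_ms=10):
    """Pitch values over time, as in the contours of Figs. 3 and 4."""
    n, h = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    return [estimate_pitch(signal[i:i + n], fs)
            for i in range(0, len(signal) - n, h)]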

2.2 Support Vector Machine

The support vector machine (SVM) is one of the algorithms used for machine learning analysis in the IT industry. SVM was developed on the basis of statistical analysis: the results of the algorithm are called dependent variables, and the factors affecting the results are called independent variables [8, 9]. Statistical analysis estimates the dependent variables by modeling the statistical behavior of the independent variables over a large amount of data [9, 13]; in other words, it analyzes various conditions and builds a statistical decision criterion from many observations [14]. SVM is a comparatively simple algorithm that finds hyperplanes that separate or predict the dependent variables (decision results) through feature extraction from


many data [15]. “SVM is a supervised learning model with an associated learning algorithm that analyzes the data used in classification and regression analysis” [15]. SVM separates two categories as far apart as possible when the dependent variable has more than one category; for example, an algorithm that distinguishes ripe apples from unripe ones can be built through machine learning and high-volume data analysis. SVM is widely used in machine learning because of its high classification accuracy, which comes from separating the classes with a maximum-margin hyperplane [16, 17]. Whereas other predictors that simply learn to reduce the training error tend to overfit, SVM is less prone to overfitting [18, 19]. Prediction performance can also be improved by changing the dimension of the data with a kernel function. In this study, SVM judges the separating hypersurface by analyzing the characteristics of voice change before and after personal bankruptcy [20, 21]. The parameters used in the decision are the change parameters of the fundamental (pitch) frequency, the slope according to formant position and size, parameters of the speech rate and the time variation of the formants, and slope parameters according to the energy changes.

3 Telephone Voice Discrimination by the Proposed Method

The proposed method of evaluating the reliability of a voice over the telephone is as follows. When a voice is input through telephone communication, noise is removed by preprocessing. The voice is then divided into voiced and unvoiced (silent) sections by a separator, and the necessary parameters are extracted for each interval and used as independent variables of the SVM. The remaining analysis results are also checked for any other variables that the discriminator can use and classify separately. The parameters are then entered into the SVM discriminator to evaluate the telephone voice, and the result indicates the credibility of the other party on the call. A block diagram of the proposed technique is shown in Fig. 2.

The spectral subtraction method is commonly used as a preprocessor for producing clean speech; it removes noise by using the noise information estimated in silent sections (a minimal sketch is given below). In the feature extraction stage, the necessary time- and frequency-domain parameters are extracted, using the feature information obtained in our previous studies, which identified speech components that change before and after personal bankruptcy in financial transactions. These features appear in various independent variables such as pitch, pitch perturbation, duration, formants, formant slope, and energy slope. In this study, we classify the above parameters and use them in the discriminator. For the pitch used as an SVM decision parameter, the feature is extracted from the mean and variance of the pitch frequency over time rather than from the raw frequency alone. Figure 3 shows a speaker-independent pitch contour for the utterance 'ne' (meaning 'yes'); the abscissa is the frame index and the ordinate is the pitch frequency.
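The following is a minimal sketch of the spectral-subtraction preprocessing. It assumes the noise spectrum is estimated from a silent segment of the call and uses illustrative frame and hop sizes, so it is a simplified stand-in rather than the exact preprocessor of this study.

import numpy as np

def spectral_subtraction(noisy, noise_segment, frame=512, hop=256):
    """Magnitude spectral subtraction with a noise estimate from a silent section."""
    win = np.hanning(frame)
    noise_mag = np.abs(np.fft.rfft(noise_segment[:frame] * win))   # noise floor estimate
    cleaned = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame, hop):
        seg = noisy[start:start + frame] * win
        spec = np.fft.rfft(seg)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)            # subtract noise magnitude
        frame_out = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
        cleaned[start:start + frame] += frame_out * win            # simple overlap-add
    return cleaned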


Fig. 2 Block diagram of the proposed method: input → pre-processing (noise reduction) → feature extraction → voiced/unvoiced (U/uV) separator → time-domain and frequency-domain analysis → SVM classifier → output

Fig. 3 Pitch contour of normal speeches

Fig. 4 Pitch contour of voice after personal default occurrence

Figure 3 shows the voice of a normal debtor who has not defaulted, while Fig. 4 shows the result for a person who caused a default. In Fig. 3, the pitch change rate is low and the pitch is evenly distributed; in Fig. 4, the pitch change is relatively high and the variance is expected to be high. Using these results, the truthfulness of a finance-related telephone voice can be judged through pitch analysis, and the pitch dispersion can be defined as the first discrimination criterion of the SVM.

Table 1 The independent variables for SVM

Classify      Marker   Variable
Independent   X1       Time, pitch perturbation
Independent   X2       Formants rate
Independent   X3       Speech speed
Independent   X4       Speech speed jitter
Dependent     Y        Default detection

4 Experiment and Results

To obtain the parameters for this study, we were able to use, for research purposes, telephone recordings between borrowers and counselors of S-Capital and m-bank. The recorded voices were analyzed by the proposed parameterization and result calculation, and the analysis was compared with actual personal credit scores to find similarities between the data. Thirty telephone conversations, from 18 male and 12 female speakers, were used as material; the ages of the speakers range from the 20s to the 40s. The collected speech was sampled at 11 kHz and quantized to 16 bits per sample. Cepstrum analysis of the speech signals was used for fast parameter extraction: based on the quefrency, the cepstrum provides pitch information at high quefrency and formant information at low quefrency, and information about speech speed can be extracted from the change of the quefrency components over time. Personal credit status changes appear in the telephone consultation voices, that is, in the voice data before and after personal bankruptcy; with enough data, the state just before bankruptcy can be observed, and some of these parameters can be used to create baselines for the SVM discriminator. The independent variables for the SVM are defined as shown in Table 1. The values of X1 to X4 with respect to default are summarized from the most distinguishable data, and the Y value in Table 1 indicates the actual personal credit state and past personal default conditions. The parameters X1 to X4 extracted from the telephone consultation voices were recorded, and these four independent variables were used for SVM learning. Based on this, a classification criterion for the faith score can be established; a minimal training sketch with these variables is given below.
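In the sketch below, the feature values and labels are placeholders, not the recorded S-Capital/m-bank data, and scikit-learn with an RBF kernel is assumed as the SVM implementation.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# One row per call: X1 pitch perturbation, X2 formant rate, X3 speech speed,
# X4 speech-speed jitter; y = 1 for calls whose speaker later defaulted.
X = np.array([[0.8, 1.2, 4.1, 0.30],
              [2.5, 0.7, 5.6, 0.90],
              [0.9, 1.1, 4.3, 0.25],
              [2.2, 0.8, 5.2, 0.85]])          # placeholder values
y = np.array([0, 1, 0, 1])

faith_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
faith_clf.fit(X, y)
print(faith_clf.predict([[1.0, 1.0, 4.5, 0.40]]))   # faith-score decision for a new call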

5 Conclusion The voice is an important way of communicating information between various subjects (people, machines). In addition to language information, common voices include a variety of data such as health, emotions, and honesty. A voice signal is a time-varying signal that changes various parameters with time. In addition, since


speech includes semi-periodic information, it carries parameters for various kinds of information besides the language itself. Voice changes vary depending on the language but typically occur 3–4 times per second; that is, the vocal tract parameters and the excitation source change at that rate. Meanwhile, the development of information and communication technology has devoted a great deal of effort to understanding the language information contained in people's words. In this study, we introduced a classifier that can evaluate speech by analyzing non-linguistic information. This discriminant algorithm can be used to predict, with binary decision logic, whether the analyzed speech is in a normal (truthful) state. We examined the characteristics of individual voices and confirmed the parameters associated with trustworthiness: in short, a reliable voice has a wide vocal range, clear pronunciation, and a well-defined formant structure. These parameters can be used to build a discriminator that evaluates trust. In future work, we will acquire additional data, carry out sufficient training, and construct a discriminator from the results.

References 1. Park, H.W.: A study on personal credit evaluation system through voice analysis (by machine learning method). KSII, The 9th International Conference on Internet (ICONI), no. 12, Vientiane Laos (2017) 2. Lee, S.: Evaluation of mobile application in user’s perspective (case of P2P lending apps in FinTech Industry). KSII Trans. Internet Inf. Syst. 11(2) (2017) 3. Park, J.W., Park, H.W., Lee, S.Mm: An analysis on reliability parameter extraction using formant of voice over telephone. Asia-Pac. J. Multimedia Serv. Converg. Art Humanities Sociol. 7(3), 183–190 (2015) 4. Park, H.W., Bae, M.J.: Analysis of confidence and control through voice of Kim Jung-un’s. Information 19(5), 1469–1474 (2016) 5. Han, K., Yu, D., Tashev, I.: Speech emotion recognition using deep neural network and extreme learning machine. Interspeech, pp. 223–227 (2014) 6. Lin, Y.-L., Wei, G.: Speech emotion recognition based on HMM and SVM. In: Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, August 2005, pp. 4898–4901 (2005) 7. Fundamentals of Telephone Communication Systems. Western Electric Company, p. 2.1 (1969) 8. Mathworks: eBook of Matlab and machine learning. https://kr.mathworks.com/campaigns/ products/offer/machine-learning-with-matlab.html (2017) 9. Kim, C.W., Yun, W.G.: Support Vector Machine and Manufacturing Application. Chaos Book (2015) 10. Bae, M.J., Lee, S.: Digital Voice Analysis. Dongyoung Press (1987) 11. Vlachos, A.: Active Learning with Support Vector Machines. Master thesis, University of Edinburgh (2004) 12. Ribiner, L.R., Schafer R.W.: Theory and Applications of Digital Speech Processing. Pearson (2011) 13. Park, H.W., Kim, M.S., Bae, M.-J.: Improving pitch detection through emphasized harmonics in time-domain. Commun. Comput. Inf. Sci. (CCIS) 352, 184–189 (2012) 14. Suykens, Johan A.K., Vandewalle, Joos: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999) 15. Fung, G.M., Mangasarian, O.L.: Multicategory proximal support vector machine classifiers. Mach. Learn. 59(2), 77–97 (2005)


16. Zheng, Lu, et al.: Integrating granger causality and vector auto-regression for traffic prediction of large-scale WLANs. KSII Trans. Internet Inf. Syst. (TIIS) 10(1), 136–151 (2016) 17. Lee, D.Y., Shin, D.K., Shin, D.I.: A finger counting method for gesture recognition. J. Internet Comput. Serv. 17(2), 29–37 (2016) 18. Su, C.-L.: Ear recognition by major axis and complex vector manipulation. KSII Trans. Internet Inf. Syst. 11(3) (2017) 19. Yoon, S.-H., Bae, M.-J.: Analyzing characteristics of natural seismic sounds and artificial seismic sounds by using spectrum gradient. J. Inst. Electron. Eng. Korea SP 46(1), 79–86 (2009) 20. Huang, G.-B., et al.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 42(2), 513–529 (2012) 21. Ishidaa, H., Oishib, Y., Moritac, K., Moriwakic, K., Nakajimab, T.Y.: Development of a support vector machine based cloud detection method for MODIS with the adjustability to various conditions. Remote Sens. Environ. 205, 390–407 (2018)

A Comparative Study of Using Bag-of-Words and Word-Embedding Attributes in the Spoiler Classification of English and Thai Text Rangsipan Marukatat

Abstract This research compares the effectiveness of using traditional bag-of-words and word-embedding attributes to classify movie comments into spoiler or nonspoiler. Both approaches were applied to comments in English, an inflectional language; and in Thai, a non-inflectional language. Experimental results suggested that in terms of classification performance, word embedding was not clearly better than bag of words. Yet, a decision to choose it over bag of words could be due to its scalability. Between Word2Vec and FastText embeddings, the former was favorable when few out-of-vocabulary (OOV) words were present. Finally, although FastText was expected to be helpful with a large number of OOV words, its benefit was hardly seen for Thai language. Keywords Spoiler classification · Word embedding · Word2Vec · FastText

1 Introduction Classifying movie comments on web forums or social media websites into spoiler or non-spoiler is a challenging task in text mining. A spoiler is generally a comment that discloses some important details of a movie such as the ending, the twist, or even the presence of some characters or actors. Such disclosure spoils the enjoyment of users who have not yet watched that movie. Plot summary, however, is not a spoiler because it is given by the filmmaker themselves as part of their promotion materials. There have been a few researches on spoiler classification. Boyd-Graber et al. [1], Hijikata et al. [2], and Chang et al. [3] classified individual sentences; Iwai et al. [4] classified individual sentences and whole comments; and lastly Jeon et al. [5] classified individual tweets. The classification was done by support vector machine (SVM) [1, 5], Naïve Bayes [2, 4], and deep learning [3]. Their attributes were mostly R. Marukatat (B) Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakhon Pathom 73170, Thailand e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_7


words extracted from the comments or sentences, typically in bag-of-words form [1–4]. Additional attributes could be used, but they were much fewer than bag-ofwords attributes. In Boyd-Graber et al., meta attributes such as genre, length, and the recency of the show were included. In Hijikata et al., sentence location was included, following an observation that spoiler sentences were often clustered in the middle of a comment. In Chang et al., genre information was passed to the hidden state of deep learning network. In Jeon et al., linguistic and subjectivity analyses were employed, following observations that spoilers were usually written in past tense and in objective tone. According to the aforementioned researches, contents or words extracted from comments or sentences are primary attributes for spoiler classification. However, by using bag of words, the attributes can vary and the number of them can explode to many thousands. Boyd-Graber et al. ended up with bags of around 20,000 word unigrams and 125,000 word bigrams. Likewise, although Hijikata et al. and Iwai et al. ended up with bags of more than 2,000 words, the performance of their classifiers reached plateau at around 400 words. A recent alternative to bag of words is word embedding, whose appeal is its ability to represent words and documents by fixed-length condensed vectors [6, 7]. But the effectiveness of word embedding might depend on language characteristics, particularly whether or how words are inflected by grammatical features such as tense and gender [7]. Thus, this research applied bag of words and word embedding to movie comments in English, a synthetic (inflectional) language1 ; and in Thai, an analytic (non-inflectional) language. As far as we know from literature survey, our research is the first attempt at spoiler classification of Thai text. The rest of this paper is organized as follows. Section 2 reviews bag of words and word embedding. Section 3 summarizes main characteristics of Thai language and reviews Thai text mining researches that employed both word representation approaches. Section 4 describes data preparation and experimental setup. Section 5 reports and discusses results. Section 6 concludes the paper.

2 Word Representation 2.1 Bag of Words In bag of words, attributes are unique words extracted from all documents in a data set. Stop words are normally excluded. Given a set of attributes, attribute values for each record representing a document can be as simple as word frequencies in that document, or complicated measures such as term frequency-inverse document 1 Some

linguistic researches suggested that modern English is drifting towards analyticity [8, 9]. It has lower degree of inflection than Old English and other languages such as German. However, it is still more synthetic than Thai.

A Comparative Study of Using Bag-of-Words and Word-Embedding …

83

frequency (TF-IDF) [10, 11]. Some studies used bags of word n-grams instead of unigrams [1, 10]. Among extracted words, some may be inflectional forms of others such as go, went, gone, and going. Their frequencies are usually quite low, which leads to high data variation. Word normalization helps reduce such variation by aggregating the frequencies of inflectional words into the frequencies of their roots. There are two approaches to word normalization: stemming and lemmatization. Stemming strips prefixes and/or suffixes from inflectional words to obtain their roots or stems. For example, the stem of going is go. The stripping of prefixes and suffixes is regardless of vocabulary, word context, or part of speech. As a result, a stem needs not be any meaningful word. A notable rule-based stemmer for English is Porter stemmer. Porter later developed a framework called Snowball [12], which allows stemming algorithms to be written in high-level Snowball scripts and compiled into mainstream programming languages including C, Java, and Python. The framework also provides a collection of stemmers for various natural languages, including his own improved version of Porter stemmer, Porter2. Other well-known English stemmers are Lovins and Paice/Husk [13]. Contrarily, lemmatization utilizes word context and part of speech. One simple method is to scan the entire lexical database, such as WordNet, to find all variants of a word and determine the root or lemma from them. Therefore, every lemma is a meaningful word. Unlike stemming, lemmatization can find the lemma of irregular inflection. For example, go would be found as the lemma, but not the stem, of went.

2.2 Word Embedding The general idea of word embedding is to represent a word W by a vector of (relationship) scores between W and context words [C 1 , C 2 , …, C n ]. Typically derived from a massive corpus, the vector [C 1 , C 2 , …, C n ] is universal across data sets and much more condensed than bag of words. Suppose that a context vector is [car, sugar, water] and the embeddings or vector representations of words boat, milk, and wine with respect to this vector are Boat [0.39, 0.01, 0.60], Milk [0.02, 0.40, 0.58], W ine [0.15, 0.30, 0.55]. These embeddings capture latent features of words and because they are based on the same context, the relationship between them can be measured. Intuitively, milk is more similar to wine than boat because the cosine similarity between milk and wine is 0.9, but that between milk and boat is only 0.71. An aggregation method such as averaging can be applied to obtain the embedding of the whole document. Thus, a

84

R. Marukatat

document containing boat, milk, and wine will have an average embedding of [0.19, 0.24, 0.58]. Proposed by Mikolov et al. [6], Word2Vec is a renowned method to determine word embeddings by using neural network. Two common Word2Vec architectures are continuous bag of words (CBOW) and skip gram. CBOW predicts a target word from its context or surrounding words, whereas skip gram predicts context words within a certain range of the current word. The hidden layer in both architectures is a projection layer whose size (number of nodes) equals the size of the context vector. After a model is trained, the embeddings of all words in the training corpus can be calculated for later lookup. Mikolov et al. reported that skip gram was more versatile than CBOW. It gave higher accuracies than CBOW across different data sets and parameter combinations, despite longer training time. One shortcoming of Word2Vec is that it cannot calculate the embeddings of words that do not exist in the training corpus, i.e. out of vocabulary or OOV words. Suppose that a non-spoiler comment is “he loves his wife” and a spoiler one is “he murders his wife.” If loves and murders are both OOV, the embeddings of both comments will be identical because they are averaged from the embeddings of the remaining words: he, his, and wife. In order to tackle this shortcoming, Bojanowski et al. [7] proposed an extension of Word2Vec, namely FastText. In FastText, the embedding of a word is summed from the embeddings of its subwords or character n-grams. This allows the embedding of any OOV word to be calculated. FastText architectures are also based on CBOW and skip gram. Bojanowski et al. reported that FastText was helpful for highly morphological languages such as German and Russian; however, its benefit was traded off by much longer training time than Word2Vec. Additionally, as in bag-of-words approach, either stemming or lemmatization can be applied prior to the word embedding calculation. The effectiveness of these word normalization and word representation on the spoiler classification of English and Thai text will be investigated in this paper.

3 Thai Language

3.1 Characteristics of Thai Language

Thai is an analytic language [14]. Unlike synthetic languages such as English, Thai words are not inflected for tense, subject, gender, voice, or any other grammatical feature. Instead, all these grammatical features are expressed by separate words. For example, consider the following English sentences and their Thai counterparts:

(English) I watch this movie. = (Thai) …
(English) I watched this movie. = (Thai) …


In the second pair, the Thai past-tense marker is an isolated word; it is neither a prefix nor a suffix of another word, unlike -ed in watched. There is no explicit word boundary in Thai writing. Words are written consecutively until reaching a space that signals the end of the sentence. The sentence “…” above is composed of 5 words. Thai sentences differ from English sentences in this regard. That is, given a sequence of consecutive words, decisions (to add spaces) to break it into sentences may vary from one native speaker to another. As a result, a Thai sentence can be equivalent to a complete sentence, a clause, or merely a phrase in English. This is illustrated by the following sequence of words:

(She waited, only to find that he comes back with a wife and a little daughter.)

Although the above sequence is already a valid sentence, different breaks can be added to produce multiple sentences, (1)–(5), which are also valid in Thai writing.

Furthermore, the Thai alphabet is larger than the English one. It consists of 44 consonants, 18 vowel symbols that make up 32 vowels, 4 tone marks, and 2 diacritics. Characters that make up a word can be stacked on 4 lines: the base line for consonants; upper and lower lines for vowels; and the top line for tone marks or diacritics. Typing Thai text, especially on mobile devices, is thus cumbersome and prone to spelling errors. Wrong combinations of vowel symbols and omission of tone marks or diacritics are quite common in social media text.

3.2 Other Thai Text Mining Research

We end this section by reviewing some Thai text mining studies. Although they did not target spoiler classification, their methods and results gave us more insight into the application of bag of words and word embedding. Songram et al. [10] extracted bags of 4,000 word unigrams and 4,000 word bigrams for classifying deceptive Facebook posts. Tuarob and Mitrpanont [11] also extracted bags of words for classifying abusive Facebook posts. In both studies, the attribute values were TF-IDF measurements. Their best performance was achieved by SVM [10] and discriminative multinomial Naïve Bayes [11]. Seneewong Na Ayutthaya and Pasupa [15] used a pre-trained Word2Vec model, namely Thai2Vec, along with part-of-speech and sentiment features to classify the sentiment of Thai children's tales, while Promrit and Waijanya [16] used Word2Vec attributes to


classify the topic and sentiment of Thai poems. The classification in both studies was done by deep learning. Polpinij et al. [17] exploited Word2Vec differently. Starting with a short list of positive and negative sentiment words, they looked up Word2Vec embeddings to find other semantically similar words. Then, SVM with a bag of polarity words was employed to classify the sentiment of hotel reviews. Their method was able to select only predictive words and reduce the bag size.

4 Data Preparation and Experimental Setup

4.1 English Data Set

As shown in Table 1, reviews on IMDB.com for 75 movies released during 2013–2017 were collected. For each movie, 20 non-spoiler and 20 spoiler reviews were randomly selected. Each review, henceforth called a comment, consisted of a few paragraphs of text. Spoilers were comments hidden in collapsible blocks. Non-spoilers were verified manually to ensure that they disclosed nothing beyond the plot summary. Our data set was split into 2,400 comments from 60 movies as training data, and the other 600 comments from the other 15 movies as testing data.

This research classified spoilers at a coarser level than other studies. Boyd-Graber et al. [1] collected 16,000 sentences from TVTropes.org for sentence-level classification. Their data set was later used by Chang et al. [3], also for classifying spoiler sentences. Similarly, Hijikata et al. [2] and Iwai et al. [4] collected 5,000 sentences from Amazon.com for sentence-level classification, and those sentences made up 500 comments for comment-level classification by Iwai et al. Due to the ambiguity of sentence boundaries in the Thai language, we opted for comment-level classification for both Thai and English data, so that their performance could be compared. Individual comments in our study could be much longer than individual sentences in other studies.

Table 2 summarizes the text processing for the English data, which was done by using the Python Natural Language Toolkit (NLTK). In all combinations, stop words were first

Table 1  Movie reviews in English, retrieved from IMDB.com

                                       Training   Testing   Total
  Number of movies                         60        15        75
  Number of comments (i.e. records)     2,400       600     3,000
    Non-spoiler                         1,200       300     1,500
    Spoiler                             1,200       300     1,500
  Average words per comment
    Non-spoiler                           167       174
    Spoiler                               153       159


Table 2  Text processing for English data

  Processing combination                                     Number of attributes
  Bag of words
    EN + BOW               No normalization                        3,531
    EN.st + BOW            Stemming                                2,687
    EN.lm + BOW            Lemmatization                           3,010
  Word2Vec
    EN + W2V.cbow          No normalization + CBOW                   400
    EN + W2V.skipgram      No normalization + skip gram              400
    EN.st + W2V.cbow       Stemming + CBOW                           400
    EN.st + W2V.skipgram   Stemming + skip gram                      400
    EN.lm + W2V.cbow       Lemmatization + CBOW                      400
    EN.lm + W2V.skipgram   Lemmatization + skip gram                 400
  FastText
    EN + FT.cbow           No normalization + CBOW                   400
    EN + FT.skipgram       No normalization + skip gram              400
    EN.st + FT.cbow        Stemming + CBOW                           400
    EN.st + FT.skipgram    Stemming + skip gram                      400
    EN.lm + FT.cbow        Lemmatization + CBOW                      400
    EN.lm + FT.skipgram    Lemmatization + skip gram                 400

removed. Then, either Snowball stemming (Porter2) or WordNet lemmatization was applied. For bag of words, the remaining unique words in the training data became attributes, and the attribute values were the frequencies of these words in a comment. For Word2Vec and FastText, the CBOW and skip gram models were trained by using Python's Gensim and FastText packages with default parameters. The simplewiki dump (144 MB)2 of articles on Simple English Wikipedia was used as the training corpus. From the literature survey, we found that many studies used context vectors of 300–400 words, so our attributes in all cases were 400 trained context words. Attribute values were the average embeddings of all words in a comment.
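A small sketch of how one comment can be turned into the two attribute types described above is given below. The `vocabulary` and `keyed_vectors` names are placeholders for the training-data word list and a trained embedding lookup (e.g. a Gensim `KeyedVectors` object); this is an illustration, not the authors' implementation.

```python
# Hedged sketch: bag-of-words frequencies and a 400-dimensional average embedding for one comment.
from collections import Counter
import numpy as np

def bow_attributes(tokens, vocabulary):
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]            # frequency of each training-data attribute word

def embedding_attributes(tokens, keyed_vectors, dim=400):
    vecs = [keyed_vectors[w] for w in tokens if w in keyed_vectors]   # skip OOV words
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)           # average embedding of the comment
```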

4.2 Thai Data Set

Discussions on Pantip.com, Thailand's popular web forum, on 61 movies released during 2013–2017 were collected. For each movie, 20 non-spoiler and 20 spoiler comments were randomly selected. Spoilers were comments hidden in collapsible blocks or spoiler-alert threads; for some movies, there were fewer than 20 of them. 2 https://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-pages-articles.xml.bz2.


Table 3  Movie discussions in Thai, retrieved from Pantip.com

                                       Training   Testing   Total
  Number of movies                         51        10        61
  Number of comments (i.e. records)     2,011       391     2,402
    Non-spoiler                         1,020       200     1,220
    Spoiler                               991       191     1,182
  Average words per comment
    Non-spoiler                           176       163
    Spoiler                               112        96

Table 4  Text processing for Thai data

  Processing combination                                     Number of attributes
  Bag of words
    TH + BOW               No normalization                        1,802
  Word2Vec
    TH + W2V.cbow          No normalization + CBOW                   400
    TH + W2V.skipgram      No normalization + skip gram              400
  FastText
    TH + FT.cbow           No normalization + CBOW                   400
    TH + FT.skipgram       No normalization + skip gram              400

Non-spoilers were also verified manually. As displayed in Table 3, our training data set contained 2,011 comments from 51 movies, and testing data set contained the other 391 comments from the other 10 movies. Table 4 shows text processing for Thai data, which was done by using Python’s pyThaiNLP with deep learning tokenization (Deepcut), together with Gensim and FastText packages with default parameters. Only stop-word removal was applied because there is neither stemming nor lemmatization in Thai language. Bag-of-words processing was done in the same way as was done with English data. The training corpus for Word2Vec and FastText was thwiki dump (241 MB)3 of articles on Thai Wikipedia.
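For illustration only, a Thai comment can be segmented and filtered roughly as in the sketch below, which assumes pyThaiNLP's word_tokenize with the deepcut engine (the optional deepcut package must be installed; otherwise the default newmm engine can be used). The `comment_text` variable is a placeholder for one Pantip comment.

```python
# Hedged sketch: Thai word segmentation and stop-word removal with pyThaiNLP.
from pythainlp.tokenize import word_tokenize
from pythainlp.corpus import thai_stopwords

tokens = word_tokenize(comment_text, engine="deepcut")                 # deep-learning tokenization
tokens = [t for t in tokens if t not in thai_stopwords() and not t.isspace()]
```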

4.3 Experimental Setup

All data sets were saved in attribute relation file format (ARFF) to be classified by support vector machine (SVM) and Naïve Bayes in Weka (Waikato Environment for Knowledge Analysis). Naïve Bayes did not require any parameters. For SVM, we did 3 https://dumps.wikimedia.org/thwiki/latest/thwiki-latest-pages-articles.xml.bz2.


preliminary tests and found that RBF kernel was slightly better than other kernels. The remaining parameters were Weka’s defaults. In each run of experiment, the overall accuracy and the F-measure of class spoiler were computed. This F-measure, being a harmonic mean of precision and recall, indicated how well the classifier learned spoiler patterns.

5 Results and Discussion

Experimental results are reported in Table 5. The main observations about classifiers, word representation, and word normalization are as follows.

• Between SVM and Naïve Bayes, SVM outperformed Naïve Bayes in all cases.
• Among bag of words, Word2Vec, and FastText, all achieved similar results for the English data. For the Thai data, however, bag of words was clearly the winner.
• Between Snowball stemming and WordNet lemmatization for the English data, the latter was better but not by much. In many cases, not normalizing words yielded even better results.
• Between the CBOW and skip gram models for Word2Vec and FastText, skip gram outperformed CBOW in all cases.

When using bag of words, predicting testing records was based on the occurrences of attributes in them. As shown in Table 6, many attributes did not exist in the testing data and hence did not contribute to the prediction. Yet, the remaining ones (about 15% for the Thai data and 20% for the English data) were adequate to identify spoiler patterns. When stemming and lemmatization were applied, more attributes matched words in the testing data, resulting in slightly better performance.

It is worth noting the results of sentence-level classification by others. Boyd-Graber et al. [1] reported a best accuracy of 67%. Hijikata et al. [2] reported a best F-measure of 0.80. Chang et al. [3] reported a best accuracy of 75% and a best F-measure of 0.78. Our bag of words performed better (86% accuracy and 0.86 F-measure by SVM), possibly due to the coarser level of classification. At comment level, more words with high frequencies could be extracted for each record, leading to clearer patterns for spoiler classification.

Apparently, in the bag-of-words approach, extracted attributes depended on the training data and a majority of them were redundant. With word embedding, a fixed set of attributes was used, and words in the training and testing data did not need to match. However, training Word2Vec and FastText was costly. Our training was run on a laptop with an 8th generation Intel Core i7-8550U processor and 16 GB RAM. It took about 1 h to train the CBOW and 1.5 h to train the skip gram model of Word2Vec. As for FastText, it took almost 7 h to train its CBOW and skip gram. Nevertheless, once each model was trained, looking up word embeddings and averaging them took less than a minute per record. The number of word embeddings obtained from the training depended on the vocabulary size of the training corpus. There were 120,346 words in simplewiki (English) and 410,808 words in thwiki (Thai).


Table 5  Spoiler classification results

                           SVM                         Naïve Bayes
                           Overall        F-measure    Overall        F-measure
                           accuracy (%)   (spoiler)    accuracy (%)   (spoiler)
  EN + BOW                 85.3           0.85         83.5           0.82
  EN.st + BOW              85.5           0.86         85.0           0.84
  EN.lm + BOW              86.0           0.86         81.8           0.81
  EN + W2V.cbow            85.0           0.84         81.7           0.80
  EN + W2V.skipgram        85.7           0.85         85.7           0.85
  EN.st + W2V.cbow         82.5           0.82         80.8           0.79
  EN.st + W2V.skipgram     84.5           0.84         81.7           0.80
  EN.lm + W2V.cbow         82.7           0.82         80.5           0.79
  EN.lm + W2V.skipgram     85.7           0.85         84.2           0.83
  EN + FT.cbow             85.3           0.85         81.7           0.80
  EN + FT.skipgram         86.7           0.86         85.0           0.84
  EN.st + FT.cbow          83.3           0.83         80.3           0.79
  EN.st + FT.skipgram      84.5           0.84         82.2           0.81
  EN.lm + FT.cbow          84.8           0.84         81.0           0.80
  EN.lm + FT.skipgram      86.2           0.86         84.2           0.83
  TH + BOW                 86.7           0.87         86.4           0.85
  TH + W2V.cbow            76.2           0.73         70.3           0.67
  TH + W2V.skipgram        82.1           0.80         72.1           0.70
  TH + FT.cbow             76.2           0.74         68.3           0.61
  TH + FT.skipgram         78.8           0.75         75.0           0.72

Table 6  Attributes (from training data) that existed and did not exist in testing data

                Attributes extracted     Attributes existing    Attributes not existing
                from training data       in testing data        in testing data
  EN + BOW      3,531                    785 (22%)              2,746
  EN.st + BOW   2,678                    696 (26%)              1,982
  EN.lm + BOW   3,010                    714 (24%)              2,296
  TH + BOW      1,802                    275 (15%)              1,527

Table 7  Unique and non-OOV words with respect to English and Thai vocabularies

            Training data                  Testing data
            Unique    Non-OOV              Unique    Non-OOV
  EN        3,531     3,282 (93%)          1,289     1,192 (92%)
  EN.st     2,678     1,820 (68%)          1,067     755 (71%)
  EN.lm     3,010     2,772 (92%)          1,149     1,052 (92%)
  TH        1,802     530 (29%)            474       153 (32%)

From Table 7, the English embeddings covered approximately 90% of words in the training and testing data. The coverage decreased to 70% for stemmed data, which could explain its lesser Word2Vec performance (84.5% accuracy) than the unnormalized and lemmatized counterparts (85.7% accuracy for both). The benefit of FastText was marginal because there were few OOV words. Its performance on unnormalized data (86.7% accuracy) was slightly better than that on stemmed and lemmatized data (84.5% and 86.2% accuracies, respectively). The redundancy of word normalization supported Bojanowski et al.’s [7] assumption that subwording in FastText already accounted for possible morphemes (roots, prefixes, and suffixes) in inflectional languages. On the other hand, the Thai embeddings covered only 30% of words found in the training and testing data. We observed that user discussions on Pantip.com were much more casual than articles on Wikipedia. There were plenty of spoken words, slangs, acronyms, misspellings, and obscured profanities that became OOV. We believed these were main characteristics of online comments, and consequently did not remove or correct any of them. This could explain the lesser performance of Word2Vec (82% accuracy) in comparison to bag of words (86% accuracy). Note that other Thai text mining studies reported good Word2Vec performance, but without benchmarking it against bag of words [15, 16]. Although accuracies and F-measures from different domains cannot be compared, we spotted a factor that may contribute to their success. In [15], Thai2Vec was obtained from a small Thai Wikipedia corpus. Its vocabulary contained only 60,000 words, but the Wikipedia articles were likely to cover a majority of words in their children tale data. In [16], Word2Vec was obtained from a massive corpus (5.9 million words) of online Thai literature and poem resources, which undoubtedly covered their domain of interest in poem classification. Interestingly, FastText was not helpful in handling numerous OOV words in our experiments. In fact, its performance was even worse than Word2Vec. A reason was possibly that in Thai, a non-inflectional language, subwords are only random characters. As a result, the embeddings did not capture any meaningful pattern.


6 Conclusion

This paper compares bag-of-words and word-embedding attributes in the spoiler classification of movie comments in English and in Thai. In summary, bag of words is still a good option if the training data, from which attributes are extracted, represent the domain of interest well. Choosing word embedding over bag of words is not necessarily due to superior performance, but rather its scalability, provided that a pre-trained model is available (e.g. [18]). In that case, word normalization is unnecessary. The skip gram model of Word2Vec is favorable if few OOV words are expected. Otherwise, FastText would be helpful, but rather for inflectional languages such as English.

A few research directions for further improvement are as follows. First, other attributes will be included. Potential ones are genres, named entities, part-of-speech features, and sentiment features. To handle OOV words in the Thai language, we will explore alternative corpora that cover social media text and investigate other methods that may be suitable for Thai, for example, by considering word context instead of character n-grams [19]. Finally, other embedding methods are also worth studying. For example, Doc2Vec [20] represents a whole document by a single embedding rather than aggregating multiple Word2Vec ones, while GloVe [21] derives word embeddings from a co-occurrence matrix instead of neural network training.

References 1. Boyd-Graber, J., Glasgow, K., Zajac, S.J.: Spoiler alerts: machine learning approaches to detect social media posts with revelatory information. In: Proceedings of the American Society for Information Science and Technology (ASIST), Montreal, Quebec, Canada, Nov 2013 2. Hijikata, Y., Iwai, H., Nishida, S.: Context-based plot detection from online comments for preventing spoilers. In: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, NE, USA, pp. 57–65, Oct 2016 3. Chang, B., Kim, H., Kim, R., Kim, D., Kang, J.: A deep neural spoiler detection model using a genre-aware attention mechanism. In: Phung, D., et al. (eds.) PAKDD 2018. Lecture Notes in Artificial Intelligence (LNAI), vol. 10937, pp. 183–195 (2018) 4. Iwai, H., Hijikata, Y., Ikeda, K., Nishida, S.: Sentence-based plot classification for online review comment. In: Proceedings of the IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technologies, Warsaw, Poland, pp. 245–253, Aug 2014 5. Jeon, S., Kim, S., Yu, H.: Spoiler detection in TV program tweets. Inf. Sci. 329, 220–235 (2016) 6. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR) Workshop, Scottsdale, AZ, USA, May 2013 7. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. (TACL) 5, 135–146 (2017) 8. Haspelmath, M., Michaelis, S.M.: Analytic and synthetic: typological change in varieties of European languages. In: Buchstaller, I., Siebenhaar, B. (eds.). Language Variation—European Perspective VI (2017) 9. Szmrecsanyi, B.: An analytic-synthetic spiral in the history of English. Linguist. Today 227, 93–112 (2016)


10. Songram, P., Choompol, A., Thipsanthia, P., Boonjing, V.: Detecting Thai messages leading to deception on Facebook. In: Huynh, V.-N., et al. (eds.) IUKM 2016. Lecture Notes in Artificial Intelligence (LNAI), vol. 9978, pp. 293–304 (2016) 11. Tuarob, S., Mitrpanont, J.L.: Automatic discovery of abusive Thai language usages in social networks. In: Choemprayong, S., et al. (eds.) ICADL 2017. Lecture Notes in Computer Science (LNCS), vol. 10647, pp. 267–278 (2017) 12. Porter, M.F.: Snowball: a language for stemming algorithms. http://snowball.tartarus.org/texts/ introduction.html. Accessed 15 May 2019 13. Singh, J., Gupta, V.: Text stemming: approaches, applications, and challenges. ACM Comput. Surv. 49(3) (article 45) (2016) 14. Slayden, G.: Overview of Thai language. http://www.thai-language.com/ref/overview. Accessed 15 Apr 2019 15. Seneewong Na Ayutthaya, T., Pasupa, K.: Thai sentiment analysis via bidirectional LSTMCNN model with embedding vectors and sentic features. In: Proceedings of the International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Pattaya, Thailand, Nov 2018 16. Promrit, N., Waijanya, S.: Convolutional neural networks for Thai poem classification. In: Cong, F., et al. (eds.) ISNN 2017, Part I. Lecture Notes in Computer Science (LNCS), vol 10261, pp. 449–456 (2017) 17. Polpinij, J., Srikanjanapert, N., Sopon, P.: Word2Vec approach for sentiment classification relating to hotel reviews. In: Meesad, P., et al. (eds.) Recent Advances in Information and Communication Technology 2017. Advances in Intelligence Systems and Computing, vol. 556, pp. 308–316 (2017) 18. Facebook Open Source: Word vectors for 157 languages. https://fasttext.cc/docs/en/crawlvectors.html. Accessed 15 Apr 2019 19. Horn, F.: Context encoder as a simple but powerful extension of Word2Vec. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, Vancouver, Canada, pp. 10–14, Aug 2017 20. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, pp. 1188–1196, June 2014 21. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1532–1543, Oct 2014

Fall Detection of Elderly Persons by Action Recognition Using Data Augmentation and State Transition Diagram

Ayaka Takebayashi, Yuji Iwahori, Shinji Fukui, James J. Little, Lin Meng, Aili Wang and Boonserm Kijsirikul

Abstract Because of the increase of the population of elderly people and the decreasing number of care workers, there is at present a problem of delay in the detection of falls requiring emergency treatment. This paper proposes a method to acquire posture information of a person from a camera watching over elderly people, and to recognize the behavior of the person using Long Short Term Memory. The proposed method detects falls of elderly people automatically using the result of action recognition. Since it is difficult to capture dangerous scenes such as falls of elderly people and to capture a lot of data, this paper proposes a new method that can create a large amount of data necessary for learning from a small amount of data.

Keywords Fall detection · Elderly people · Action recognition · Data augmentation

A. Takebayashi · Y. Iwahori (B) Graduate School of Engineering, Chubu University, 1200 Matsumoto-cho, Kasugai 487-8501, Japan, e-mail: [email protected]; A. Takebayashi e-mail: [email protected]
S. Fukui Department of Information Education, Aichi University of Education, Kariya 448-8542, Japan, e-mail: [email protected]
J. J. Little Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada, e-mail: [email protected]
L. Meng Department of Electrical and Electronics Engineering, Ritsumeikan University, Kusatsu 525-8577, Japan, e-mail: [email protected]
A. Wang Higher Education Key Lab, Harbin University of Science and Technology, Harbin 150006, China, e-mail: [email protected]
B. Kijsirikul Department of Computer Engineering, Chulalongkorn University, Bangkok 20330, Thailand, e-mail: [email protected]

1 Introduction

The number of elderly people in Japan is increasing: the population aged over 65 is 35.15 million, and the ratio of elderly people to the total population (aging ratio) was 27.7% in 2017. The working-age population (15–64) in Japan will keep shrinking; the ratio of the working-age population to one elderly person over the age of 65 was 12.1 in 1950, while it is 2.3 in 2017. This decrease will continue and is expected to reach 1.3 in 2065 [1]. With such an aging population, the lack of care staff is likely to reduce the number of people supporting the elderly, and elderly people living alone have become a social problem. Therefore, it is difficult to care for and watch over many elderly people in nursing homes and private homes, and various problems have occurred. One of the problems is the delay in the detection of falls which require emergency treatment.

It is necessary to recognize behavior using a computer system to solve such problems, and many methods for behavior recognition have been proposed. Methods of human action recognition using Kinect are summarized in Ref. [2]. In the fall motion detection method described in Ref. [2], the authors focus on specific flows obtained from the joint information vector of a person captured by Kinect, and use multiple classification methods such as a backpropagation neural network and a support vector machine. The authors perform fall detection based on behavior recognition and the distance between the joint positions and the floor. However, a sensor such as Kinect that can obtain range images is necessary to perform the task.

To relax this requirement, we propose a method to automatically detect a fall when an elderly person falls down while getting up from the bed, using a monocular camera and machine learning. OpenPose is a method for extracting posture information of a person using a monocular camera. Posture information is acquired using OpenPose from a monocular camera watching over the elderly person, and a Recurrent Neural Network recognizes the behavior of the person from the obtained information. By detecting a fall using the recognition results, it is possible to quickly find elderly people who need urgent treatment. In addition, this paper proposes a method to create a data set for elderly people, for whom a large amount of images cannot be taken, and to improve the accuracy of the recognition results.


2 Background Knowledge

2.1 Long Short Term Memory

Long Short Term Memory [3] (hereinafter referred to as LSTM) is a kind of recurrent neural network (RNN) [4], a deep learning model used for time series data. At time t, an RNN can take the time series into consideration by feeding the response of the middle layer at time t-1 into the middle layer together with the input sample at time t. Backpropagation through time (BPTT) is used for learning in an RNN, and past data are updated retrospectively; however, as the data become long-term, the gradient of the error vanishes or grows explosively. LSTM is a method that solves this vanishing gradient problem of RNNs for long-term memory. LSTM replaces the middle layer of the RNN with an LSTM unit, which consists of a memory cell, an input gate, an output gate, and a forget gate. Since unnecessary error calculations are suppressed by the input and output gates, the vanishing gradient is prevented. Moreover, error propagation can be performed more efficiently because unnecessary memory is discarded by the forget gate. Therefore, long-term memory can be correctly reflected in the output, and highly accurate identification can be performed.

DropOut

When training a neural network with Dropout [5], one update disables some of the nodes in a layer and performs learning, and the next update disables other nodes and performs learning. As a result, the degree of freedom of the network can be forcibly reduced during learning to improve generalization performance and avoid over-fitting. Because the method is similar to ensemble learning, which combines multiple models to create one learning model, highly accurate learning results can be obtained.

2.2 OpenPose

OpenPose is the name under which the method of [6] is released. It extracts the feature points of human joints in a single image and estimates the human posture. Key points can be detected from an image, video, or web camera: 15 or 18 points for the entire body, 21 points for one hand, and 70 points for the face. Because it is a bottom-up model, the processing speed does not slow down even if there are many people in the scene. This paper uses the OpenPose method for pose estimation.


2.3 Optimization Algorithm

Several methods have been proposed for updating the parameters when using gradient descent. In this paper, we use Adaptive Moment Estimation [7] (hereinafter Adam), which has attracted attention in recent years. Adam accumulates an exponentially decaying average of past squared gradients and additionally keeps an exponentially decaying average of past gradients. The parameters of Adam use the values recommended in Ref. [7].

3 Methodology

This section describes an action recognition method which automatically detects a fall when an elderly person gets out of bed. Since this method uses machine learning to perform action recognition, the processing procedure is described below for the two stages of "learning" and "action recognition".

• Learning
  Step 1. Creation of a data set using Kinect: Data Augmentation is performed using 3D coordinates obtained from Kinect to create a data set.
  Step 2. Learning by LSTM: time series learning is performed using LSTM, which is a type of RNN.
• Action recognition
  Step 1. Extract joint coordinates of the person by OpenPose: the pose estimation method OpenPose is used to extract the 2D coordinates of human joints from the video taken by a general camera.
  Step 2. Determine the action from the extracted information by LSTM: recognize and discriminate the behavior using the learned model.
  Step 3. Correction of the action discrimination result: correct the discrimination result using the state transition diagram.

3.1 Creation of Action Recognition Data Set Using Kinect

In this method, the 2D coordinate data of the joints of a person are used to recognize the action of the person in the video. A large amount of data of the target person is necessary to perform accurate recognition with machine learning. However, it is difficult to capture data of a large number of people, and it is even more difficult to capture a large amount of data related to falls of elderly people.


Therefore, this method adjusts the joint coordinates and performs Data Augmentation based on the 3D coordinates obtained using Kinect, in order to create a large data set of 2D human joint coordinates. By using 3D coordinates, Data Augmentation can be performed by transforming the camera position, and natural joint coordinate data can be created. The procedure for creating a data set using Kinect is as follows.

1. Get 3D coordinates of human joints using Kinect
2. Create joint coordinates of elderly people based on the acquired joint coordinates
3. Data Augmentation by camera position conversion

3.1.1 Get 3D Coordinates of Human Joints Using Kinect

Kinect can obtain 3D information of a human from the posture of the person in the image. The human joints that can be acquired by Kinect v2 are shown in Fig. 1, taken from Kinect for Windows Help. Kinect v2 can obtain 25 joint coordinates of a human: head, neck, center of shoulder, right shoulder, left shoulder, right elbow, left elbow, right wrist, left wrist, right hand, left hand, right thumb, left thumb, right hand tip, left hand tip, spine, waist center, right hip, left hip, right knee, left knee, right heel, left heel, right foot, and left foot.

Fig. 1 Kinect human joints acquired with Kinect v2

In this method, the 3D coordinates of human joints are obtained from the video of each action taken by Kinect. Of the 25 positions that Kinect v2 can acquire, a total of 14 positions are used: "Head, Shoulder center, Right shoulder, left wrist, right hip, left hip, right knee, left knee, right foot, left foot".

3.1.2 Create Joint Coordinates of Elderly People Based on Acquired Joint Coordinates

Using the obtained 3D coordinates of human joints, we adjust the joint coordinates to create joint coordinates specific to elderly people. According to Ref. [8], elderly people tend to have lower shoulders, lower hips, and shorter height compared with younger adults. Therefore, to reproduce this tendency, we adjust the joint coordinates of the points where changes can be seen: "head, center of shoulder, right shoulder, left shoulder, right hip, left hip". The adjustment lowers the joint coordinates of each applicable point, and the "right shoulder, left shoulder" and "right hip, left hip" pairs are adjusted so that the heights of the joint coordinates are not unnatural. Figure 2 shows an image in which two types of joint coordinate adjustments have been applied.

Fig. 2 Joint coordinate adjustment image
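As a rough illustration of this adjustment step, the sketch below lowers a few selected joints by fixed offsets. The offset values, joint names, and the `index` mapping are illustrative assumptions, not the parameters used by the authors.

```python
# Hedged sketch: lowering selected 3D joints to mimic an elderly posture.
import numpy as np

LOWERED = {"head": 0.06, "shoulder_center": 0.05, "shoulder_right": 0.05,
           "shoulder_left": 0.05, "hip_right": 0.03, "hip_left": 0.03}   # offsets in metres (made up)

def make_elderly(joints, index):
    adjusted = joints.copy()                 # joints: (N, 3) array of 3D joint positions
    for name, offset in LOWERED.items():
        adjusted[index[name], 1] -= offset   # lower the joint along the vertical (Y) axis
    return adjusted
```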

3.1.3 Data Augmentation by Transformation of Camera Position

Using the joint data obtained from the captured video and the 3D joint coordinates adjusted for elderly people, the same video is virtually re-shot from different angles: the 3D joints are projected onto the image plane to obtain new 2D joint coordinates. Changing the virtual camera position therefore increases the amount of 2D coordinate data without capturing each action again. The projection from 3D coordinates to the image plane is given by

$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\,[R|t]\,
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\qquad (1)
$$

$$
[R|t] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
\qquad (2)
$$

Here, (X, Y, Z) represents the three-dimensional coordinates in the world coordinate system, and (u, v) represents the coordinates of the point projected on the image plane. (cx, cy) is the principal point, and (fx, fy) is the focal length expressed in pixels. The translation-rotation homogeneous transformation matrix [R|t] represents the motion of the camera relative to the static environment, or the rigid body motion of the object in front of a fixed camera. This method uses Kinect v2, where (cx, cy) is (960, 540) and the focal length (fx, fy) is (1051.79, 1048.55). Data Augmentation is performed by arbitrarily changing the rotation R that represents the orientation of the camera in the world coordinate system of Eqs. (1) and (2), and the translation vector t that represents the position of the camera in the world coordinate system. Figures 3 and 4 show an image of the 2D coordinates of a human joint acquired by Kinect and a transformed image with rotation or translation added.
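A compact sketch of this re-projection, following Eqs. (1)-(2), is given below. The rotation angle, translation offset, and the `joints_3d` array are arbitrary placeholders; only the intrinsic parameters come from the text above.

```python
# Hedged sketch: re-projecting Kinect 3D joints through a virtual camera pose.
import numpy as np

fx, fy, cx, cy = 1051.79, 1048.55, 960.0, 540.0
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])      # intrinsic matrix from Eq. (1)

def project(joints_3d, R, t):
    cam = joints_3d @ R.T + t          # rigid transform [R|t] into the virtual camera frame
    uv = cam @ K.T                     # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]      # perspective division gives pixel coordinates (u, v)

theta = np.deg2rad(10)                 # e.g. rotate the virtual camera 10 degrees about the y-axis
R = np.array([[np.cos(theta), 0, np.sin(theta)], [0, 1, 0], [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.05, 0.0, 0.0])         # small sideways translation (placeholder, metres)
augmented_2d = project(joints_3d, R, t)   # joints_3d: (14, 3) array of joint positions
```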

Fig. 3 Original image


Fig. 4 Image with coordinate transformation

3.2 Learning by LSTM

Learning is performed using the data set of 2D coordinates of human joints for each action. In this method, LSTM is used to learn the time series information. The 2D coordinates of the 14 joints of the person, a total of 28 values per frame arranged in time series, are regarded as one training sample, and the ground-truth action label is one-hot encoded. Figure 5 shows the structure of the network. A stacked LSTM, in which LSTM layers are stacked in multiple layers, is used. Multiple layers make it possible to learn long and short correlations in the respective layers. The output dimensionality of the first LSTM layer is 200, and that of the second LSTM layer is 100. In the Dropout layer, about 20% of the nodes are invalidated during learning to prevent over-fitting. The Dense layer is the fully connected layer, and the dimension of its output is the number of learned action labels. The softmax function is used as the activation function; it decides how the sum of

Fig. 5 Network structure


the input signals is activated, and the prediction results can be output as a probability for each action label. Adam is used as the optimization algorithm.
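A minimal Keras sketch of the network just described is shown below. It assumes sequences of 150 frames with 28 coordinate values per frame (14 joints x 2 coordinates) and 5 action labels; the training variables at the end are placeholders.

```python
# Hedged sketch: stacked LSTM (200 -> 100) with Dropout and a softmax output layer in Keras.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

T, FEATURES, NUM_LABELS = 150, 28, 5

model = Sequential([
    LSTM(200, return_sequences=True, input_shape=(T, FEATURES)),   # first LSTM layer, output dim 200
    LSTM(100),                                                     # second LSTM layer, output dim 100
    Dropout(0.2),                                                  # disable ~20% of nodes during training
    Dense(NUM_LABELS, activation="softmax"),                       # probability per action label
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train_onehot, batch_size=50, epochs=45)     # placeholders for the training data
```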

3.3 Extract Joint Coordinates of Person by OpenPose

Pose recognition by OpenPose is performed on the real video captured with a monocular camera. OpenPose can obtain various key points for the body, face, and hands; here a total of 14 points are used: "nose, neck, right shoulder, left shoulder, right elbow, left elbow, right wrist, left wrist, right hip, left hip, right knee, left knee, right ankle, left ankle". Figure 6 shows the 2D coordinates of the joints that can be acquired by Kinect and OpenPose. As can be seen from Fig. 6, the differences between the 2D joint coordinates that Kinect and OpenPose acquire are small, so action recognition can be performed even though different posture estimation methods are used in learning and testing.

3.4 Determine Action by LSTM from Extracted Information

Using the 2D coordinates of human joints extracted by OpenPose, we predict behavior labels with the trained neural network. The action labels are explained in the next section. The 2D coordinates of the 14 joints, a total of 28 values per frame arranged in time series, are given as one sample to the trained neural network. From the given data, the probability predicted for each learned action label is output for each

Fig. 6 Comparison of human joint coordinates


Fig. 7 State transition diagram

label. The label with the highest probability is the prediction result, and the label is output as the action discrimination result of the data.

3.5 Correction of Action Discrimination Result

Based on the flow of the assumed operation, the discrimination result is corrected for transitions that are considered not to actually occur. The assumed flow of behavior is "wake up from the bed → sit on the bed → stand up or fall over". Based on this, the transition action labels are "1. awake, 2. sitting, 3. standing, 4. fall, 5. intermediate action", and the state transition diagram is set as shown in Fig. 7. The "intermediate action" is the action performed between labels whose action has been completed, such as from "wake up" to "sit down". Examples of transitions to be corrected are a transition from the "fall" state to the "wake up" state, or a transition to a different action label without passing through the "intermediate action". This reduces false positives and improves accuracy.
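One possible way to apply such a correction is sketched below. The allowed-transition table is an assumption inferred from the description of Fig. 7, not the authors' exact rule set: a completed action may only persist or move to the intermediate action, while the intermediate action may lead into any state.

```python
# Hedged sketch: suppressing impossible label transitions with a simple state machine.
ALLOWED = {
    1: {1, 5},           # awake -> awake or intermediate action
    2: {2, 5},           # sitting -> sitting or intermediate action
    3: {3, 5},           # standing -> standing or intermediate action
    4: {4, 5},           # fall -> fall or intermediate action
    5: {1, 2, 3, 4, 5},  # intermediate action may lead into any label
}

def correct(labels):
    fixed = [labels[0]]
    for nxt in labels[1:]:
        fixed.append(nxt if nxt in ALLOWED[fixed[-1]] else fixed[-1])   # keep previous label on an illegal jump
    return fixed

print(correct([1, 5, 2, 5, 4, 1]))   # -> [1, 5, 2, 5, 4, 4]: the jump from "fall" back to "awake" is suppressed
```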

4 Experiments

The fall detection results of the proposed method are evaluated in terms of detection accuracy.

4.1 Environment

The specifications of the computer and cameras used in this experiment are shown below.

• Computer specifications
  – CPU: Intel Core i7-3370 3.40 GHz
  – GPU: NVIDIA GeForce GTX 1060 6 GB
  – Memory: 16.0 GB


• Camera 1 used for shooting (Kinect v2)
  – Number of pixels of color image: 1920 × 1080 pixels
  – Number of pixels of distance image: 512 × 424 pixels
  – Frame rate: 30 fps
• Camera 2 used for shooting (GoPro HERO3)
  – Number of pixels of color image: 1920 × 1080 pixels
  – Frame rate: 50 fps

4.2 Create Training Data Set

The action labels to be learned are "awake, sit, stand, fall, and intermediate action", five in total. The subject wakes up in the bed, sits up, and then either stands up or falls over. Table 1 shows the learning conditions of this method. To prepare the data set for learning, 11 motion videos for each action label are first taken using Kinect, and the 3D coordinates of human joints are acquired. After that, 10 sets of parameters, each adjusting the selected joint points to the joint coordinates of elderly people, are created for each label, because the joint positions may differ greatly depending on the label. All data were converted with these parameters to increase the amount of data. Next, to perform Data Augmentation, the parameters of the rotation vector and translation vector are set. The parameters are set manually while confirming the result visually on the actual image, because roughly chosen parameters may place the 2D joint coordinates outside the image. The angle is changed little by little with the original image as the front view, and the parameters are chosen so as not to deviate far from the front view. In this experiment, 109 conversions are performed on all five motion labels of every captured video to increase the data.

Table 1  Learning conditions

  Labels                   "Awake", "sitting", "standing", "fall", "intermediate action"
  Number of videos taken   11 per label
  Number of joints         14
  Learning data            1,633,500 samples


Fig. 8 Learning curve

4.3 Neural Network Learning

The learning is performed using the created data set. The batch size is 50, and the initial learning rate is 0.001. Keras is used as the framework. Figure 8 shows the learning curve. Since no change was seen in the accuracy after 40 epochs, learning was stopped at 45 epochs. It can be confirmed from the learning curve in Fig. 8 that learning is performed with high accuracy.

4.4 Test Set

Figure 9 shows the test scenes used in the experiment. The test videos are taken using a GoPro HERO3, and the camera angle is changed for each scene. In addition, a fixed-camera environment is assumed to recognize behavior changes in the 2D coordinates of human joints. For each scene, videos of "falling when standing up from the bed" and of "standing up from the bed" are recorded by different persons for the experiments: 6 "fall" videos and 4 "non-fall" videos per scene.

Fig. 9 Each scene


The captured videos are divided every 150 frames in the same way as the data set, and the joint coordinates are extracted with OpenPose.

4.5 Action Recognition

In this experiment, the action label is predicted 150 frames at a time, and the result is output as a numerical value. The assignment of each value is "1. awake, 2. sitting, 3. standing, 4. fall, 5. intermediate action". After that, the result is corrected using the state transition diagram. For each scene, videos of "standing up from the bed" and of "falling when standing up from the bed" by different persons are used for the experiments. In addition, the cases with and without the additional data set for elderly people are investigated to show the validity of the proposed method. Together with the cases with and without correction using the state transition diagram, a total of four patterns is obtained; Table 2 summarizes the methods. Tables 3 and 4 summarize the results for the fall videos and the non-fall videos, respectively. From the confusion matrix of each method, it is confirmed that the actions are recognized with high accuracy in every scene. In all the videos with a fall, it was possible to recognize "fall" (action label 4), and "standing" (action label 3) was not recognized. Furthermore, in all the videos without a fall, "standing" (label 3) is recognized without recognizing "fall" (label 4).

Table 2  Comparison of methods

             Elderly data   State transition diagram
  Method 1   No             No
  Method 2   Yes            No
  Method 3   No             Yes
  Method 4   Yes            Yes

Table 3  Fall video recognition result

  Labels      1      2      3                4      Intermediate action
  Method 1    0.91   0.81   No recognition   0.49   0.94
  Method 2    0.99   0.87   No recognition   0.72   0.84
  Method 3    0.91   0.81   No recognition   0.67   0.94
  Method 4    1.0    0.88   No recognition   0.75   0.84

Table 4  Non-fall video recognition result

  Labels      1      2      3      4                Intermediate action
  Method 1    1.0    0.59   0.53   No recognition   0.87
  Method 2    1.0    0.72   0.79   No recognition   0.82
  Method 3    1.0    0.60   0.54   No recognition   0.87
  Method 4    1.0    0.77   0.79   No recognition   0.81

According to Methods 1 and 2, and Methods 3 and 4, the accuracy was improved overall by increasing the data for elderly people. It is considered to be due to the increase in the number of data and the ability to recognize without being affected by height or posture during movement by adjusting the joint position. The reason for the decrease in the recognition accuracy of the intermediate movement is considered to be that the joint position of the intermediate movement has been adjusted close to the movement of other labels. Next, from Method 1 and 3, and Method 2 and 4, the recognition accuracy of “standing” in action label 3 and “fall” in action label 4 was improved. This is because what was misrecognized as an intermediate movement is corrected and it becomes a correct action label. In addition, even if the image scene is different, each action label can be recognized, so it is confirmed that the action can be recognized even when the taking position of image is changed. The reason is considered that Data Augmentation is learning data of various angles. From the above results, it can be judged that detection of fall using creation of elderly person data and correction with state transition diagram is correctly performed in the proposed method.

5 Conclusion In this paper, a data set is generated and the movement of human joints is recognized using Recurrent Neural Network. After that the method of detecting the fall of the person is proposed using the recognition result obtained by acquiring the joint coordinates of the person from the monocular camera watching over the elderly person. In recent years, due to the ever-increasing aging rate, the ratio of the working-age population to one elderly person over 65 years old decreases, and the number of workers for nursing care increases and the number of people supporting the elderly people decreases and becomes a social problem. Therefore, there is a problem that


the finding is delayed when the nursing care facilities have a shortage of labor and the elderly people live alone and the elderly people fall down when getting up from the bed alone. To solve this problem, Deep Learning was used to learn the behavior of the person in the sequence and to learn the change of 2D coordinates of the joints of the person, recognize the behavior, and the fall of the elderly people was detected. First, a large amount of data is created using Kinect. The 3D coordinates of human joints are obtained from the sequence of each action taken by Kinect. Next, based on the obtained 3D coordinates of the joints of the person, joint coordinates for the elderly people are created. After that, the obtained coordinates are taken from the same moving image at virtually different angles, and the results are projected on the image to obtain 2D coordinates of human joints. This makes it possible to increase the amount of data more effectively than taking image of each action every time and increasing the data. Then learning is performed using the data set of the 2D coordinates of the joints of the created person. In this method, LSTM is used to learn time series information. At the time of action recognition, OpenPose is used to extract 2D coordinates of human joints. Based on the obtained 2D information of human joints, the predicted result of behavior is output by the learned network. It was confirmed that the action recognition was carried out accurately. As future work, the generalization is improved by changing the process of normalizing the 2D coordinates of the joint with the image size at the time of learning by LSTM to the distance based on the joint point with a person and performing normalization. In addition, when data for elderly people is created, adjustment parameters are determined for some joint points, but children other than elderly people can be prepared by increasing the number of adjustment points or adjusting adjustment points. It can be applied to data for the purpose of increasing its versatility. Acknowledgements Iwahori’s research is supported by JSPS Grant-in-Aid for Scientific Research (C)(17K00252) and Chubu University Grant.

References 1. Cabinet Office: The 20th Edition Aging Society White Paper (Overall Version). http://www8. cao.go.jp/kourei/whitepaper/w-2018/zenbun/pdf/1s1s_01.pdf 2. Lun, R. et al.: A survey of applications and human motion recognition with microsoft kinect. Int. J. Pattern Recognit. Artif. Intell. 29(05), 1555008 (2015) 3. Hochreiter, S.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 4. Elman, J.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990) 5. Srivastava, et al.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014) 6. Cao, Z., et al.: Realtime multi-person 2D pose estimation using part affinity fields. In: CVPR (2017) 7. Kingma, et al.: Adam: A method for stochastic optimization (2014). arXiv preprint, arXiv:1412 8. Hujita, H.: Posture and fall of old people. Physiotherapy Sci. 10(3), 141–147 (1995)

Elliptic Curve Cryptography and LSB Steganography for Securing Identity Data Christofer Derian Budianto, Arya Wicaksana and Seng Hansun

Abstract Data theft is a growing phenomenon in which attackers obtain useful information from a victim's sensitive data for their own benefit. The loss caused by this fraudulent act can be severe, as shown by the Equifax data breach in 2017. Here cryptography and steganography are applied to secure identity data contained in the electronic Indonesian identity card (e-KTP). Cryptography has been used for decades to secure data in storage. Steganography is often added after the cryptography process to provide two layers of protection. Elliptic curve cryptography (ECC) is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields. ECC is used here to secure the Indonesian identity card data. Least significant bit steganography is used after the ECC to embed the information (ciphertext) into the person's picture. Thus the only stored information in the application is a collection of photos. The testing and evaluation show that the implementation is successful with a PSNR value above 79 dB.

1 Introduction

The Equifax data breach in 2017 exposed the sensitive personal information of 146 million consumers [1]. When it comes to identity theft, users may be putting themselves at risk without realizing it. Storing sensitive data without any protection is highly vulnerable to attacks such as data theft. Cryptography and steganography have been widely used for decades to secure data in storage [2–4]. Both methods are complementary to each other and provide two layers of protection to the data. Thus, data can be securely stored beyond the reach of unauthorized parties.

C. D. Budianto (B) · A. Wicaksana · S. Hansun, Department of Informatics, Universitas Multimedia Nusantara, Tangerang 15810, Indonesia, e-mail: [email protected]; A. Wicaksana e-mail: [email protected]; S. Hansun e-mail: [email protected]


Elliptic curve cryptography (ECC) is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields [5]. The advantage of ECC compared to other cryptography is that ECC can obtain the same level of security with shorter key lengths [5, 6]. Shorter key lengths mean faster computing, lower power consumption, and less memory and bandwidth usage. However, using cryptography alone would arouse suspicion, as the resulting ciphertext is obvious to human eyes. Thus, steganography is also used to conceal it. Steganography hides the original data in a way that its existence is unknown to other parties [3]. A steganography method that can be used with cryptography is the least significant bit (LSB) method, which literally replaces the least significant bit (smallest bit value) of a medium (video, audio, image) with the bit(s) of the message to be hidden. The changes that this method causes to the hosting medium are not apparent to the human eye.

Here the two methods are used specifically for securing identity data contained in the electronic Indonesian identity card (e-KTP). The hosting medium used is the person's photograph in the identity card. Therefore, the only data stored in the storage is an image of the person. Nonetheless, all information regarding the identity of the person is securely stored within the photo. The application is built as a web-based application, and the result of the cryptography and steganography on the photo is measured using the peak signal-to-noise ratio (PSNR).

2 Methods

2.1 Elliptic Curve Cryptography

Elliptic curve cryptography (ECC) is an asymmetric cryptography that uses the elliptic curve equation. In mathematics [6], an elliptic curve is a plane algebraic curve defined by an equation of the following form that is non-singular, i.e. it has no cusps or self-intersections. The elliptic curve can be created using this equation [6]:

$$ y^2 = x^3 + ax + b, \quad \text{where } 4a^3 + 27b^2 \neq 0 \pmod{p} \qquad (1) $$

An example of an elliptic curve built with a = 4 and b = 9 is displayed in Fig. 1. The elliptic curve displayed in Fig. 2 illustrates its usage for cryptography. The most important property of the curve is that if a line intersects the curve at two points, it will always intersect it at a third. This is essential since that third point will be the representation of the public key. At this point, the public key produced from the private key is a massive number. The details of the encryption and decryption process are given using the following mathematical operations [6].


Fig. 1 An elliptic curve with a = 4 and b = 9

Fig. 2 The use of elliptic curve for cryptography

1. Point Addition
   If Q is the point at infinity, then P(x, y) + Q = P(x, y) + ∞ = P(x, y).
   If P(x, y) ≠ Q(x, y) and Q ≠ ∞, then P(x, y) + Q(x, y) = R(x, y), where

$$ R_x = \left( \frac{Q_y - P_y}{Q_x - P_x} \right)^2 - P_x - Q_x, \qquad
   R_y = \left( \frac{Q_y - P_y}{Q_x - P_x} \right)(P_x - R_x) - P_y \qquad (2) $$

2. Point Doubling
   If P(x, y) = Q(x, y), then P + Q = P + P = 2P(x, y) = R(x, y), where

$$ R_x = \left( \frac{3P_x^2 + a}{2P_y} \right)^2 - 2P_x, \qquad
   R_y = \left( \frac{3P_x^2 + a}{2P_y} \right)(P_x - R_x) - P_y \qquad (3) $$

3. Point Subtraction
   R(x, y) = P(x, y) − Q(x, y) = P(x, y) + (−Q(x, y)) = P(x, y) + Q(x, −y).
   For any point on the curve, the negation is its reflection across the x-axis: −P(x, y) = P(x, −y).
4. Point Multiplication
   The multiplication of a point P by an integer k is kP = P + P + · · · + P (k times), where P is any point on the elliptic curve E.

These point operations are the building blocks of the encryption and decryption process. The private key d is picked at random, and the public key consists of two points e1 and e2: e1 is a point on the curve and e2 is the product of e1 and the private key d. The encryption and decryption process is as described in [6].

• Encryption. Every encrypted character P produces a pair of cipher points C1 and C2:
  C1 = k × e1
  C2 = P + k × e2
• Decryption. The plaintext point is recovered from the pair of cipher points as:
  P = C2 − (d × C1)
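The following self-contained sketch puts these operations together over a small prime field. The prime, curve coefficients, and base-point search are illustrative only and far too small for real security; it is not the application's implementation.

```python
# Hedged sketch: ECC point arithmetic and ElGamal-style encryption over a toy prime field.
import random

p = 9739                         # small prime modulus (illustrative only)
a, b = 4, 9                      # curve y^2 = x^3 + 4x + 9 over F_p
INF = None                       # point at infinity

def inv(x):
    return pow(x, p - 2, p)      # modular inverse via Fermat's little theorem

def add(P, Q):
    if P is INF: return Q
    if Q is INF: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF                                        # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv(2 * y1) % p         # doubling slope, Eq. (3)
    else:
        lam = (y2 - y1) * inv(x2 - x1) % p                # addition slope, Eq. (2)
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def mul(k, P):
    R = INF                       # double-and-add scalar multiplication
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def find_point():                 # brute-force a base point (p % 4 == 3 makes the sqrt easy)
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        y = pow(rhs, (p + 1) // 4, p)
        if y * y % p == rhs:
            return (x, y)

e1 = find_point()                 # public base point
d = random.randrange(2, p)        # private key
e2 = mul(d, e1)                   # public key component

P = mul(5, e1)                    # toy "message" point mapped onto the curve
k = random.randrange(2, p)        # per-message random value
C1, C2 = mul(k, e1), add(P, mul(k, e2))
neg = lambda Q: (Q[0], (-Q[1]) % p)
assert add(C2, neg(mul(d, C1))) == P        # decryption recovers the message point
```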


Fig. 3 Binary representation of the first 3 pixels from 24-bit color image

Fig. 4 Binary representation of the first 3 pixels after the LSB insertion of 01000001 (A)

2.2 Least Significant Bit

The least significant bit (LSB) steganography method embeds secret information by replacing the least significant bit(s) of the cover file with the bit(s) of the secret information [7]. The changes made to the cover file are minimal and not observable to the human eye. Since the technique replaces the least significant bit of the cover file for each of the pixels, a lossless compression type such as PNG is suitable for LSB images [7]. Figures 3 and 4 demonstrate the embedding of the character A (01000001) using LSB into the first 3 pixels of a 24-bit color image.
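A minimal sketch of this embedding with Pillow is shown below. The file paths are placeholders, and a practical implementation would also store the message length or a terminator so the extractor knows where to stop.

```python
# Hedged sketch: LSB embedding of a text message into the RGB channels of a cover image.
from PIL import Image

def embed(cover_path, message, out_path):
    img = Image.open(cover_path).convert("RGB")
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    flat = [channel for pixel in img.getdata() for channel in pixel]   # R, G, B stream
    if len(bits) > len(flat):
        raise ValueError("message too long for this cover image")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | int(bit)            # replace the least significant bit
    stego = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    out = Image.new("RGB", img.size)
    out.putdata(stego)
    out.save(out_path, "PNG")                          # lossless format preserves the embedded bits
```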

2.3 Peak Signal-to-Noise Ratio

Peak signal-to-noise ratio (PSNR) is a value used to determine image quality after a reconstruction [8]. PSNR is commonly measured in decibels (dB). The value of PSNR generally ranges from 20 to 40 dB [9]. A reconstructed image is said to be of high quality if it has a PSNR value higher than 40 dB. The PSNR calculation requires the mean squared error (MSE) value. The value of PSNR is used to determine the similarity level of the two images, while the MSE value is used to measure the difference between the two images [8]. The equation for calculating MSE as in [8] is shown below.

\[ MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} (X_{ij} - X'_{ij})^2 \tag{4} \]

where:
• M and N are the height and width of the image,
• X_{ij} is the pixel in the ith row and jth column of the original image,
• X'_{ij} is the pixel in the ith row and jth column of the stego image.

After obtaining the value of the MSE, the value of PSNR in dB can be calculated using the formula below as written in [8], where I is the maximum fluctuation value in the input image:

\[ PSNR = 10 \log_{10} \left( \frac{I^2}{MSE} \right) \tag{5} \]
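The following is a small Java sketch of the MSE and PSNR computation of Eqs. (4) and (5), assuming 8-bit grayscale images (maximum fluctuation I = 255). The two tiny example images are made up.

```java
public class PsnrSketch {
    static double mse(int[][] original, int[][] stego) {
        int m = original.length, n = original[0].length;
        double sum = 0;
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double diff = original[i][j] - stego[i][j];   // per-pixel difference
                sum += diff * diff;
            }
        }
        return sum / (m * n);                                 // Eq. (4)
    }

    static double psnr(int[][] original, int[][] stego, double maxValue) {
        return 10.0 * Math.log10((maxValue * maxValue) / mse(original, stego));  // Eq. (5)
    }

    public static void main(String[] args) {
        int[][] cover = {{120, 121}, {119, 200}};
        int[][] stego = {{120, 120}, {119, 201}};   // LSB changes of at most 1 per pixel
        System.out.printf("PSNR = %.2f dB%n", psnr(cover, stego, 255));
    }
}
```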

2.4 Indonesian Identity Card

The Indonesian Identity Card or Kartu Tanda Penduduk (KTP) is an Indonesian compulsory identity card. Separate versions exist for Indonesian and non-Indonesian residents. The card is issued upon reaching the age of 17 or upon marriage. In the case of Indonesian citizens, the card must be renewed every five years, except for the electronic KTP (e-KTP), which is valid for a lifetime. For non-Indonesian residents, the card’s expiry date is the same as that of their residency permit. The following information is displayed on the card (e-KTP), sorted by its order of appearance, with the Indonesian term in brackets:
• single identity number (NIK),
• full name (Nama),
• place and date of birth (Tempat/Tgl Lahir),
• gender (Jenis Kelamin),
• blood type (Gol. Darah),
• address (Alamat),
• religion (Agama),
• marital status (Status Perkawinan),
• occupation (Pekerjaan),
• nationality (Kewarganegaraan),
• expiry date (Berlaku Hingga),
• photo,
• place and date of issue, and
• bearer’s signature.


3 Results

The application is built using the C# programming language with the .NET framework. The application has four menus: Save Identity (Simpan Identitas), View Identity (Lihat Identitas), How to Use (Cara Pemakaian), and Credits. In Fig. 5, the main menu of the application is shown; it introduces the use of the application. Figure 6 displays the input form for users to insert their identity data into the application. The next step, as presented in Fig. 7, is to upload the image to be used as the cover file by the application. In Fig. 6, the user is asked to fill in the fields of the electronic Indonesian Identity Card (e-KTP). All inputs are checked and cannot contain special characters such as: #, *, , ’, and ”. There is also a picture of a blank e-KTP card next to the form as an example of the data that must be filled in. The NIK (single identity number) and RT/RW columns can only be filled with numbers. The Citizenship column provides an additional column when a foreign national is chosen. The additional

Fig. 5 Application main menu


Fig. 6 Identity data input form

column verifies that the data has the same value as the date of validity in the Permanent Stay Permit (ITAP). The information inside the stego file (covert message) can be fully retrieved, as shown in Fig. 8. Based on the initial results, the implementation is successful with the use of real data from the electronic Indonesian identity card (e-KTP). The cover file for the steganography process is the e-KTP photo of the user, to reduce suspicion. Thus, the only file that is stored in the filesystem is an image file, the user's own photo, with no other data stored explicitly. Meanwhile, the full information is securely embedded inside the photo and can be retrieved later on.


Fig. 7 Upload file for the cover image

4 Analysis

4.1 Testing

Testing of the ECC implementation in this research has been done, and one test scenario is described in this section. In this test scenario, the generation of the elliptic curve uses a = 6 and b = 7, which produces 64 distribution points. The 64 points represent all of the characters required by the application, as shown in Table 1, and the produced elliptic curve is shown in Fig. 9. Figure 10 presents the point distribution for the elliptic curve in Fig. 9. In this test scenario, four dummy data are used for the testing. This test uses p = 61 and satisfies (1). The public keys chosen for each of the data are (32, 24), (50, 14), (47, 31), and (30, 46). The chosen private key is 59. The four test data are displayed in Tables 2 and 3. The cipher text produced from the encryption process for each of the test data is presented in Table 4. These cipher texts are then inserted into each person's e-KTP photo using the LSB technique. The resulting stego images are evaluated using the PSNR measurement explained in the next subsection.


Fig. 8 The recovered information from the stego file

Table 1 List of characters: a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 , . / ? : ; [ ] { } \ | ‘ ! @ # $ % ˆ & * ( ) _ = +

4.2 Evaluation

The evaluation of the stego image is measured using the PSNR application made by Bertolino [10]. The image used for the PSNR evaluation is the Lena image retrieved from [11], as illustrated in Fig. 11. This particular image is used to provide a fair PSNR evaluation of the ECC regardless of each person's photo file format and resolution. The PSNR results for the four test data, measured using the Lena image, can be seen in Table 5.

Fig. 9 The elliptic curve for a = 6 and b = 7

Fig. 10 Point distribution for a = 6 and b = 7

Table 2 Test data 1

Field                Data 1                        Data 2
NIK                  3348610401970005              3360059202960004
Nama                 Christofer Derian Budianto    Rahma Febryani
Tempat Lahir         Tegal                         Semarang
Tanggal Lahir        1997-03-04                    1996-02-02
Jenis Kelamin        Laki-laki                     Perempuan
Golongan Darah       B                             –
Alamat               Jl. Pala 22 No. 30            Jl. Sidomulia No. 12
RT/RW                005/017                       001/025
Kelurahan/Desa       Mejasem Tengah                Muktirejo Kidul
Kecamatan            Kramat                        Pasurungan
Agama                Katholik                      Islam
Status Perkawinan    Belum Kawin                   Belum Kawin
Pekerjaan            Pelajar/Mahasiswa             Pelajar/Mahasiswa
Kewarganegaraan      WNI                           WNI
Masa Berlaku         Seumur Hidup                  Seumur Hidup

Table 3 Test data 2

Field                Data 3                        Data 4
NIK                  3531021501960008              3586052905960004
Nama                 Janssen                       Yosua Winata
Tempat Lahir         Jakarta                       Jakarta
Tanggal Lahir        1997-04-08                    1997-04-17
Jenis Kelamin        Laki-laki                     Laki-Laki
Golongan Darah       A                             –
Alamat               Villa Melati Blok V No. 8     Jl. Kebon Dalem No. 57
RT/RW                008/008                       003/004
Kelurahan/Desa       Jelupang                      Keagungan
Kecamatan            Serpong                       Taman Sari
Agama                Katholik                      Budha
Status Perkawinan    Belum Kawin                   Belum Kawin
Pekerjaan            Perdagangan                   Pelajar/Mahasiswa
Kewarganegaraan      WNA                           WNI
Masa Berlaku         2020-04-08                    Seumur Hidup


Table 4 Encryption result Plain text 1

3348610401970005#christofer*derian*budianto#tegal#19 97-03-04#laki-laki#b#jl.*pala*22*no.*30#005#017#meja sem*tengah#kramat#katholik#belum*kawin#pelajar/mahas iswa#wni#seumur*hidup Cipher text 1

dˆ|&0%|0*6c#|8z˜c7c#zm8d*0*08*8p8$*cd,d+zez,|6z)8!cq zb**zzcq8jcazqz!**dkds*dcacl*n|6z)0y0/cq7)8=*ld1c#7q *9d08}0|0:7.*08w8$dacl0+zez?z=8=cbze|{cjc[db8m0]07c, 0e8mzq700x*2cdc+z)c2**8˜dz7_c7|88p|{0|0_*78$*mza8+8= d(0i*m||7gza0jc/8=c\d18f8jzqzi|9|68$7ndi7g8(|g*lcazr 0ycjdm*l0,7l07zrdi0&zec+0yd\8l8m0e7r7m7!76*mclz58=z, d9z,83*ac[|%dfze0yz,dm0,09dsd+07*h0mc|zhzs

Plain text 2

3360059202960004#rahma*febryani#semarang#1996-02-02# perempuan#-#jl.*sidomulia*no.*12#001#025#muktirejo*k idul#pasurungan#islam#belum*kawin#pelajar/mahasiswa# wni#seumur*hidup Cipher text 2

.-wz.=*0w4#+.n*2w4w[#gwj._._._*4w.#a#o.v#v.r.#w;*e.p *rw]w(#9#/.w*s.k.!w(*rw(wg#u.w.˜#g.n*6*-*0*2#3*0w[*# wq#t.(w\*mwq#nw(*n#**-*#.,.f*..#*sw5*d#(.!*uwsw5w(** *n*o#wwww}w[#*w4#@.˜#*#@#:./#*#vw9wt#\#/wo*e*jwr.#.\ *i.[*u*l#*#bw(wlw9.(.g.u.9#o#9*#w5.;ws#o*m.w*b.k.f*u *mww#e*a..w5.uw.*p*e*l.r.,#o#a#5#v*awm.r.;w5.;..*a.w *w.u*i#*wl*e*u#v#n.(**.v.+*d.gwq


Table 4 (continued) Plain text 3

3531021501960008#janssen#jakarta#1997-04-08#laki-lak i#a#villa*melati*blok*v*no.*8#008#008#jelupang#serpo ng#katholik#belum*kawin#perdagangan#wna#2020-04-08 Cipher text 3

=k˜ahv@4+sg)=lg{f5*1@jgˆ=)f5hk=v:}f3e&h_˜9˜9fye-p5@s =3f&:5+%h.|9˜2g!y)h0|zywoa@?hmpj+5gk|ixj˜)h7$%p7e&=2 h7*#h8x3:jxbeˆ˜u|9$xhcp|*l+{:!g?hf+4|i@ee}y@h}$xgd*o y4||x{*#|8˜&ga@&=)x#:&=9h{@+=4h4$fg}˜/f|*#$j:hymh:p@ :;h@y0$a+{˜bp:eyeˆh7f&y0h[$oyt˜i˜\eoh#@k+mxbyq@&˜=e˜ yme;h8f|o2|r*g+{f%hsg\@,+{f)esg9$3=)o/y2:zpq+s@#

Plain text 4

3586052905960004#yosua*winata#jakarta#1997-04-17#lak i-laki#-#jl.*kebon*dalem*no.*57#003#004#keagungan#ta man*sari#budha#belum*kawin#pelajar/mahasiswa#wni#seu mur*hidup Cipher text 4

|&]6*8t:9g6‘‘4:ss+5:5*:b:39g_?_{!;|$5?s:-v!‘]_s)_v92 !‘b|_t˜2-u5|*k‘6!ˆt˜‘6*#˜.]ba˜!p!!ao‘#‘˜]3a;a=bya:6d _vs9:(:59@‘)a=9s-ˆ|ka%˜[_]_i5z]5s.!4a6]y|9:(]n!1695# |gto]_-t!p*#!\tm|&!;s+_?th_7tcb{‘6]qac92_,‘6]p9eb|‘6 ]f-b!4**_b9˜5{-k-ˆ9]])9):=5|a=!:!#˜u*u]fsbax|9bi˜(92 a=|h|=5c-b](_ts#˜r|q]!‘r:5|55d˜9!l*a_7‘$a}5da=5˜s7˜i b[t$:/:˜_n:u]yact#


Fig. 11 Lena image (512 × 512 pixels) in PNG file format

Table 5 PSNR result

Data    Cipher length    PSNR (dB)
1       354              79.41
2       344              79.99
3       308              79.94
4       330              79.75

Table 6 Comparison between cover and stego image

Data    Cover image size    Stego image size    PSNR (dB)
1       2.44 MB             7 MB                64.53
2       126 KB              2.06 MB             63.83
3       60.1 KB             101 KB              64.56
4       764 KB              1.74 MB             66.02

After the steganography evaluation using the Lena image is done, another steganography evaluation is carried on for the test data used in the previous subsection. The PSNR between the original person e-KTP photo (cover image) with the stego image is obtained using the PSNR application mentioned earlier. The PSNR evaluation is displayed in Table 6.


5 Discussion

The ECC and LSB steganography are successfully implemented for securing identity data, particularly the data in the Indonesian Identity Card (KTP). The implementation in the application shows that the data can be encrypted using ECC and stored within the person's photo using LSB. The identity information stored within the photo can be retrieved completely using the application, without any discrepancy from the initial identity. The PSNR evaluation shows that the overall process is still able to maintain a similarity level above 79 dB using the Lena image. Meanwhile, the four test data give a lower PSNR with an average of 64.735 dB. This is due to the lower image quality of the persons' e-KTP photos. Previous work on LSB steganography with DWT shows a lower PSNR result [12]. Further research can be done in testing the security of the ECC and LSB implementation in this work. This could be extended to system-level simulation using a specifically built virtual prototyping platform as in [13]. Another task is to find the optimum trade-off between the PSNR and the cipher size. This could be done by comparing the LSB with 2-bit, 3-bit, and 4-bit insertion techniques. Furthermore, this work could be extended with a lossless compression algorithm for the stego image, such as run-length encoding, Huffman encoding, or Lempel-Ziv-Welch compression.
Acknowledgements Foremost, I would like to express my sincere gratitude to my advisors Arya Wicaksana and Seng Hansun for the continuous support of my undergraduate final year project, for the patience, motivation, enthusiasm, and immense knowledge.

References 1. Equifax breaks down just how bad last year’s data breach was (2018). Available via NBC News. https://www.nbcnews.com/news/us-news/equifax-breaks-down-just-how-badlast-year-s-data-n872496. Cited 05 Aug 2018 2. Ahmed, D., Khalifa, O.: Robust and secure image steganography based on elliptic curve cryptography (2014). https://doi.org/10.1109/ICCCE.2014.88 3. Mishra, R., Bhanodiya, P.: A review on steganography and cryptography (2015). https://doi. org/10.1109/ICACEA.2015.7164679 4. Saritha, V., Khadabadi, S.M.: Image and text steganography with cryptography using MATLAB (2016). https://doi.org/10.1109/SCOPES.2016.7955506 5. Rouse, M.: What is elliptical curve cryptography (ECC)? SearchSecurity (2018). https:// searchsecurity.techtarget.com/definition/elliptical-curve-cryptography. Cited 05 Aug 2018 6. Mohanta, H.K.: Secure data hiding using elliptical curve cryptography and steganography. Int. J. Comput. Appl. 108(3), 16–20 (2014) 7. Zin, W.: Message embedding in PNG file using LSB steganographic technique. 2(1) 227–228 (2018) 8. Compute peak signal-to-noise ratio (PSNR) between images - Simulink. Mathworks.com (2018). https://www.mathworks.com/help/vision/ref/psnr.html. Cited 05 Aug 2018


9. Saffor, A., Ramli, A., Ng, K.: A comparative study of image compression between JPEG and WAVELET 14(1), 42 (2018) 10. Bertolino, P.: gipsa-lab. Gipsa-lab.grenoble-inp.fr (2018). http://www.gipsa-lab.grenoble-inp. fr/~pascal.bertolino/software.html. Cited 13 Aug 2018 11. Hrthe, O.: The Lena standard test image, full version (!) - Velmont Tech. Tech.velmont.net (2018). http://tech.velmont.net/the-lena-standard-test-image-full-version/. Cited 05 Aug 2018 12. Wilson, F., Kristanda, M.B., Hansun, S.: New capacity formula for steganography using discrete wavelet transform. 13(1), 157–165 (2014) 13. Wicaksana, A., Tang, C.M.: Virtual prototyping platform for multiprocessor system-on-chip hardware/software co-design and co-verification (2018). https://doi.org/10.1007/978-3-31960170-0_7

Labeling Algorithm and Fully Connected Neural Network for Automated Number Plate Recognition System Kevin Alexander, Arya Wicaksana and Ni Made Satvika Iswari

Abstract Applications of automated number plate recognition (ANPR) technology in the commercial sector have developed rapidly in recent years. Applications of ANPR systems such as vehicle parking, toll enforcement, and traffic management are already widely used, but not yet in Indonesia today. In this paper, the Labeling algorithm and a fully connected neural network are used to create an ANPR system for vehicle parking management at Universitas Multimedia Nusantara, Indonesia. The system is built using Java and the Android SDK for the client and PHP for the server. The proposed ANPR system is targeted at Indonesian civilian number plates. Testing shows that the ANPR system has been implemented successfully. Evaluation of the system gives a precision value of 1 and a recall value of 0.78. These values are obtained with 75, 85, and 95 hidden layer nodes. These numbers of hidden nodes deliver an F-score of 0.88 with an accuracy of 88%.

1 Introduction

The establishment of machine learning in artificial intelligence, in particular the artificial neural network (NN), has brought automated number plate recognition (ANPR) applications into existence. ANPR can be developed today with higher accuracy and precision than many decades ago by using machine learning and deep learning. ANPR itself is a supervision method, and one of its many applications is vehicle parking. It recognizes a vehicle number plate from an image captured using a camera.


A number plate is a metal plate that is used to display the vehicle registration number. In Indonesia, the civilian number plate has a black background with white characters (letters and numbers) [1]. The number plate is registered and recognized by the Indonesian authority. Universitas Multimedia Nusantara is a private university in Indonesia where, currently, the limited parking space is managed by a security guard who checks and allows only certain employees to occupy reserved parking spaces. Thus, the proposed ANPR system in this work could overcome the issue by recognizing the vehicle number plate automatically, without the need for any human intervention. One key technology of this ANPR system is optical character recognition (OCR), which is well suited to identifying the characters in the number plate [2]. The Labeling algorithm and a fully connected neural network (FCNN) are chosen in this work due to their high accuracy in image processing applications. The complexity of the Labeling algorithm is O(n), where n is the number of pixels within a frame, and the computation of the artificial neural network runs in constant time, O(1) [3]. Sitompul in [4] suggested that further research could be carried out to implement the Labeling algorithm and an FCNN on a mobile platform for real-time number plate recognition. Based on recent works on image processing and classification problems [5, 6], the Labeling algorithm along with a fully connected neural network is used here for a vehicle parking management (ANPR) system at Universitas Multimedia Nusantara. The proposed ANPR system is built using Java and the Android SDK for the client part and PHP for the server part. Data processing and the training of the NN are done in the back-end. The chosen architecture of the NN is fully connected (feed-forward), with one hidden layer and several configurations of hidden nodes in the hidden layer. The testing and evaluation of the proposed system are intended to measure the F-score and accuracy of the system, and also to find the best number of hidden nodes for the hidden layer.

2 Methods

2.1 Number Plate Recognition

Number plate recognition is a method of identifying and obtaining the characters in the number plate so that they can be converted into text [7]. This technology utilizes optical character recognition (OCR) to electronically convert images into machine-encoded text. The steps required in number plate recognition are given in Fig. 1. The Plate Localization process recognizes the number plate pattern and looks for features such as shape and color in the image [8]. This process may not be effective if the number plate has different colors and marks. The Plate Orientation and Sizing process transforms the number plate image proportionally in size to a rectangular shape. The Normalization process adjusts the lighting and contrast level of the image. After the Normalization is done, the Character Segmentation


Fig. 1 Automated number plate recognition (ANPR) system flow

process seeks to decompose an image of a sequence of characters into sub-images of individual symbols. Finally, the Character Recognition process translates each of the individual symbols into characters from the sub-images.

2.2 Labeling

The Labeling algorithm is used in image processing to detect areas that are connected in a binary digital image, that is, a digital image with two possible colors for each pixel (black and white). It is also able to detect such areas in colored images and in data of higher dimensions [9]. The Labeling algorithm performs two checks (passes) on the image. The first check labels each connected component of the image based on the neighboring pixels; it is described in Fig. 2. The second check collects and gathers pixels into the label list and, in addition, ensures that no component has more than one label; it is described in Fig. 3. A sketch of this two-pass approach is given below.
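The following is an illustrative Java sketch of two-pass connected-component labeling on a binary image (4-connectivity, foreground = 1). It follows the general idea of the Labeling algorithm described above and is not the authors' exact implementation.

```java
import java.util.Arrays;

public class LabelingSketch {
    // Union-find parent table used to record label equivalences found in the first pass.
    static int find(int[] parent, int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }
    static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }

    static int[][] label(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] labels = new int[h][w];
        int[] parent = new int[h * w + 1];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        int next = 1;

        // First pass: assign provisional labels from the left/top neighbours.
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (img[y][x] == 0) continue;
                int left = (x > 0) ? labels[y][x - 1] : 0;
                int top  = (y > 0) ? labels[y - 1][x] : 0;
                if (left == 0 && top == 0) {
                    labels[y][x] = next++;
                } else if (left != 0 && top != 0) {
                    labels[y][x] = Math.min(left, top);
                    union(parent, left, top);       // remember that the two labels touch
                } else {
                    labels[y][x] = Math.max(left, top);
                }
            }
        }
        // Second pass: replace every provisional label by its representative,
        // so that one component never keeps more than one label.
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y][x] != 0) labels[y][x] = find(parent, labels[y][x]);
        return labels;
    }

    public static void main(String[] args) {
        int[][] img = {
            {1, 1, 0, 1},
            {0, 1, 0, 1},
            {0, 0, 0, 1}
        };
        for (int[] row : label(img)) System.out.println(Arrays.toString(row));
    }
}
```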

2.3 Neural Network

A neural network (NN) is a data-mining tool that can be used for classification and clustering applications [9]. An NN learns from patterns and examples. When given sufficient inputs, an NN can find new patterns in the data. An NN is composed of three different layers: an input layer, a hidden layer, and an output layer. Each of the layers can have a different number of nodes (neurons). These nodes are connected to other nodes in the neighboring layer, and a weight is assigned to each connection between nodes. These weights are updated during the learning process (training) of the NN. There are two types of NN based on the learning process: supervised, where the output value is

Fig. 2 First check of the Labeling algorithm

Fig. 3 Second check of the Labeling algorithm


Fig. 4 Fully connected neural network (FCNN) architecture

known beforehand (Back Propagation algorithm), and unsupervised, where the output value is unknown (clustering).
In Fig. 4, a fully connected NN is presented. The input layer represents the raw information that is entered into the network. Each input to the network is duplicated and sent to the nodes in the hidden layer. The hidden layer modifies the received values with the weight value of each connection. The new values are then sent to the output layer, again modified based on the weights between the nodes in the hidden layer and the output layer. This output is then processed using the activation function [10]. The number of nodes in each layer is determined by the problem that is going to be solved by the NN, the type of data that the NN receives, and the quality of the data. The number of nodes in the input and output layers depends on the training set [10]. On the other hand, determining the number of nodes in the hidden layer is a challenging task. When there are too many hidden nodes, the amount of computing power required for training the NN increases. If there are too few hidden nodes, the learning ability of the NN suffers [11]. Thus, the formula for deciding the appropriate number of hidden nodes in the hidden layer is defined as follows:

\[ N(h) = \sqrt{N(i) \times N(o)} \tag{1} \]

where N(h) is the number of hidden nodes, N(i) is the number of input nodes, and N(o) is the number of output nodes.
The neural network is monitored through the learning process. When the results do not improve, the NN model has to be modified. The output of the NN is controlled by the weight values between connecting nodes. Initially, the weights are set to random values and, during the learning process (training), the weights are adjusted repeatedly. The overall weight changes in the neural network have to be done simultaneously [12]. If the NN results improve after the weight renewal, the weight value


is kept and the iteration process continues. Finding the best combination of weights is the way to minimize errors and optimize accuracy. Determining the learning rate and momentum term can help in the process of updating (adjusting) the weights. When the learning rate is too small, the model takes a long time to converge. On the other hand, if the learning rate is too large, the model diverges. A large momentum term pushes each weight adjustment to move along the same path as the previous adjustment. In addition, an activation function is required by the hidden layer to introduce nonlinearity [13]. Without an activation function, the NN would not work optimally, because it would be just the same as a simple perceptron. In this work, there are two activation functions in use: ReLU and Softmax.

2.3.1

Rectified Linear Unit (ReLU)

The ReLU activation function can only be used in the hidden layer. The ReLU function is given by the following formula and illustrated in Fig. 5 [14]; the definition of SUM is described in Sect. 2.4.1:

\[ f(x) = \max(0, x) \tag{2} \]

where x = SUM.
The ReLU activation function is non-linear, which means that back propagation can be used to correct errors, and multiple layers of nodes can be activated using this function. The advantage of the ReLU function is that it does not activate all nodes at the same time. Based on the ReLU activation function formula, when the input is negative it is converted to 0 and the node is not activated. Thus, only a few nodes are activated at a time, which makes the network more efficient and easier to manage in terms of computing power.

2.3.2

Softmax

The Softmax activation function is a generalization of the Sigmoid function that is easier to control for the classification process. This function addresses classification problems with many classes and produces an output value between 0 and 1 for each class. The Softmax function is defined using the following formula [14]:

\[ S(y_i) = \frac{e^{y_i}}{\sum_{j} e^{y_j}} \tag{3} \]

where y_i is the ith output node and y_j is the jth output node.

Fig. 5 Illustration of the ReLU activation function

The Softmax activation function is ideal for use in the output layer (classification layer), where the probability of each class for a given input is sought [14].
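The following is a minimal Java sketch of the two activation functions used in this work: ReLU for the hidden layer (Eq. 2) and Softmax for the output layer (Eq. 3). The input values in main() are illustrative.

```java
public class ActivationSketch {
    static double relu(double x) {
        return Math.max(0.0, x);               // negative inputs are "switched off"
    }

    static double[] softmax(double[] y) {
        double max = Double.NEGATIVE_INFINITY; // subtracting the max improves numerical stability
        for (double v : y) max = Math.max(max, v);
        double sum = 0.0;
        double[] out = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            out[i] = Math.exp(y[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;  // each value in (0, 1), summing to 1
        return out;
    }

    public static void main(String[] args) {
        System.out.println(relu(-3.2) + " " + relu(1.5));        // 0.0 1.5
        double[] probs = softmax(new double[]{2.0, 1.0, 0.1});
        System.out.println(java.util.Arrays.toString(probs));    // roughly [0.66, 0.24, 0.10]
    }
}
```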

2.4 Back Propagation Algorithm Back propagation algorithm is used to calculate the necessary improvements, that is to update the weight between nodes. The Back propagation algorithm is divided into the following four main steps [15]: feed-forward computation, back propagation to the output layer, back propagation to the hidden layer, and weight updates.

2.4.1

Feed-Forward Computation

Feed-forward computation is a two-stage process. The first step is to get the values of the nodes in the hidden layer, and the second step is to get the values in the output layer by calculating with the values of the nodes from the first stage [10]. These values are calculated using the SUM function: it calculates a weighted sum of the inputs, adds a bias, and the result then decides whether the node should be activated or not.


\[ SUM = \sum_{j} (x_j w_j) + b \tag{4} \]

where x_j is the jth input, w_j is the jth weight, and b is the bias.
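The following is a small Java sketch of the feed-forward computation of Eq. (4): each node computes a weighted sum of its inputs plus a bias, and the hidden layer then applies ReLU; the output layer would apply Softmax (Sect. 2.3.2) afterwards. All weights and inputs shown are illustrative, not trained values.

```java
public class FeedForwardSketch {
    // Computes SUM_j = sum_i(x_i * w_ij) + b_j for every node j of a layer (Eq. 4).
    static double[] weightedSums(double[] x, double[][] w, double[] b) {
        double[] sums = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            sums[j] = b[j];
            for (int i = 0; i < x.length; i++) sums[j] += x[i] * w[i][j];
        }
        return sums;
    }

    public static void main(String[] args) {
        double[] input = {0.0, 1.0};                           // two input nodes
        double[][] w = {{0.5, -0.2, 0.1}, {0.3, 0.8, -0.4}};   // 2 x 3 weights to three hidden nodes
        double[] bias = {0.1, 0.0, -0.1};
        double[] hidden = weightedSums(input, w, bias);
        for (int j = 0; j < hidden.length; j++) hidden[j] = Math.max(0.0, hidden[j]); // ReLU
        System.out.println(java.util.Arrays.toString(hidden)); // [0.4, 0.8, 0.0]
    }
}
```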

2.4.2

Back Propagation to the Output Layer

The next step is to calculate the error value at the output node [10] using the following formula:

\[ error = n_{out} \times (1 - n_{out}) \times (n_{target} - n_{out}) \tag{5} \]

where n_out is the produced output value and n_target is the desired output value. The error value indicates how far the generated value is from the desired value. Once the error value is known, it is used in back propagation and in the renewal of the weight values. At this stage, the learning rate and the momentum term are used [10]:

\[ \Delta W_{(i,j)} = \beta \times error \times n_{(i,j)} \tag{6} \]

where ΔW_(i,j) is the weight change from the ith node in the hidden layer to the jth node in the output layer, β is the learning rate, and n_(i,j) is the output value of the ith hidden node. After obtaining the change that must be made, the new weight value can be calculated with the following formula:

\[ W_{i,j}^{new} = W_{i,j} + \Delta W_{(i,j)} + (\alpha \times \Delta(t-1)) \tag{7} \]

where α is the momentum term and Δ(t − 1) is the previous weight change.

2.4.3

Back Propagation to the Hidden Layer

This stage resembles the previous stage, but what needs to be considered is the error value in the hidden layer, the value changes that must be made, and the new weight values for the weights between the input and hidden layers [10]. The following is the error calculation when the hidden layer output is greater than 0:

\[ n_{(i)error} = \sum_{j} (error_{(j)} \times W_{(i,j)} \times 1) \tag{8} \]

where error_(j) is the jth error value. The following is the error calculation when the hidden layer output is smaller than or equal to zero:

\[ n_{(i)error} = \sum_{j} (error_{(j)} \times W_{(i,j)} \times 0) \tag{9} \]

2.4.4

Weight Updates

The important thing at this stage is not to renew any weight until all errors have been calculated. After all error values are calculated and the weight value is updated, it is necessary to check whether or not the error value has been reduced. The learning process is said to be developing when the error value is reduced [10].

2.5 F-Measure

F-measure is defined as the harmonic average of precision (P) and recall (R) [16]. Precision is the number of true positive predictions divided by the total number of positive predictions [17]. Recall is the number of true positive predictions divided by the number of all relevant test values [17]. Intuitively, the F-measure is not as easily understood as accuracy, but it is far more useful than accuracy, especially when there is an uneven distribution of classes [18]:

\[ F = \frac{2 \times P \times R}{P + R} \tag{10} \]

where P is the precision and R is the recall.
The confusion matrix (Fig. 6) is used to describe the performance of the classification model. The confusion matrix below produces the values of True Positive (TP), False


Fig. 6 Confusion matrix

Positive (FP), False Negative (FN), and True Negative (TN), which are used to calculate the precision and recall values:

\[ P = \frac{TP}{TP + FP} \tag{11} \]

\[ R = \frac{TP}{TP + FN} \tag{12} \]

\[ Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \tag{13} \]
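The following is a minimal Java sketch of the evaluation metrics of Eqs. (10)-(13), computed from confusion-matrix counts. The counts used in main() are those reported later for the 85-hidden-node configuration in Table 2 (TP = 14, FP = 0, TN = 15, FN = 4).

```java
public class MetricsSketch {
    static double precision(int tp, int fp)    { return (double) tp / (tp + fp); }                 // Eq. (11)
    static double recall(int tp, int fn)       { return (double) tp / (tp + fn); }                 // Eq. (12)
    static double fMeasure(double p, double r) { return 2 * p * r / (p + r); }                     // Eq. (10)
    static double accuracy(int tp, int fp, int fn, int tn) {
        return (double) (tp + tn) / (tp + fp + fn + tn);                                            // Eq. (13)
    }

    public static void main(String[] args) {
        int tp = 14, fp = 0, tn = 15, fn = 4;
        double p = precision(tp, fp);          // 1.00
        double r = recall(tp, fn);             // 0.78
        System.out.printf("P=%.2f R=%.2f F=%.2f Acc=%.2f%n",
                p, r, fMeasure(p, r), accuracy(tp, fp, fn, tn));
    }
}
```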

3 Results

The number plate recognition system is implemented using the Java programming language and the Android SDK for the Android platform. The back-end is built using the PHP programming language. The system consists of seven modules: Plate Detection, Preprocessing, Plate Localization, Plate Transformation, Character Segmentation, Character Recognition, and Vehicle Identification. The flow of the system is presented in Fig. 7. The Plate Detection step detects plate objects in the camera frame using the TensorFlow Object Detection API in real time. The TensorFlow model is trained using a hundred images of number plates, with an annotation box given for the number plate in each image. The Preprocessing, Plate Localization, and Plate Transformation steps are done using the OpenCV API. The Preprocessing step converts colored images into grayscale and then into binary images. The Plate Localization step crops the image using the points obtained from the TensorFlow Object Detection API. It is then refined using the Canny edge detector to detect the outer white lines of the number plate. Based on the Canny edge results, four outer corner points are searched for, and after that the Plate Transformation step makes the number plate perpendicular. Character Segmentation is performed using the Labeling algorithm mentioned earlier. The results from this step are sub-images of the individual symbols in the number plate. The individual symbols are then selected based on the character aspect ratio in


Fig. 7 The ANPR system flow

Fig. 8 The FCNN architecture of the ANPR system

the straight line of the midpoint and the height of the plate. The Character Recognition step is done using the FCNN. The architecture of the FCNN is presented in Fig. 8. In the input layer, two hundred nodes are used, which represent all pixels of the input characters. The image of an input character is the result of the Labeling algorithm, with a size of 10 × 20 pixels. In the output layer, there are thirty-six nodes that represent twenty-six letters (A–Z) and ten numbers (0–9). In the hidden layer, the number of hidden nodes to be used is calculated as the


following:

\[ N(h) = \sqrt{N(i) \times N(o)} = \sqrt{200 \times 36} = \sqrt{7200} \approx 85 \]

There are 85 hidden nodes to be used in the hidden layer of the FCNN. In this work, six other numbers of hidden nodes are also considered, giving seven configurations in total: 55, 65, 75, 85, 95, 105, and 115.

4 Analysis

4.1 Testing

The test is divided into two parts to verify the implementation of the Labeling algorithm and the FCNN in the ANPR system. Further evaluation of the system performance is done by measuring the accuracy and F-measure; this evaluation is discussed in the next subsection. The implementation of the Labeling algorithm is tested using a 480 × 154 pixel image. The number plate image used for the test is displayed in Fig. 9. Figure 9 is processed in two stages by the Labeling algorithm. The labels are represented in color, as illustrated in Fig. 10.

Fig. 9 Test image

Fig. 10 Labeling


Fig. 11 Pattern aggregation

Fig. 12 Labeling test result

The results of the first stage of the Labeling algorithm are displayed in Fig. 10 above. The figure shows that there are several components that have more than one color. Component number “9”, for example, has blue and red colors. Therefore, the second stage (pattern aggregation) is carried out. At this stage, each pixel is checked to see whether or not it has an appropriate label (one component has no more than one label). The results of the pattern aggregation process are shown in Fig. 11. There are some obstacles when obtaining the individual symbols (objects other than the black background color): the labels obtained do not produce the characters in order, and small components in the image are also picked up. The solution to this issue is to sort the sub-images from left to right based on the midpoint height of the number plate image, and to take only components with an aspect ratio between 1.4211 and 5.6429; these numbers were obtained after a numerous testing process. The sorted results are shown in Fig. 12. The test on the Labeling algorithm implementation as described above is successful. The Labeling algorithm can identify each individual symbol and produce the sub-images. However, other tests show that there are cases where the algorithm cannot produce the expected result. This is due to external conditions such as the preprocessing step before the Labeling algorithm, light reflection on the plate, insufficient paint quality on the plate, and the existence of another object near the character. The sub-images produced by the Labeling algorithm are resized to a size of 20 × 10 pixels to suit the input size of the FCNN. The next test is on the FCNN implementation. The artificial neural network architecture used only for the test purpose is:
• One input layer, one hidden layer, and one output layer.


Fig. 13 Android application (SiPelanok)

• The number of input nodes is two, the number of hidden nodes is three, and the number of output nodes is two.
• The ReLU activation function is used in the hidden layer and the Softmax activation function is used in the output layer.
• The learning rate is 0.01.
• The training stops when the MSE is 0.01.
This test also aims to determine the ideal number of hidden nodes in the hidden layer with respect to the evaluation of the FCNN. Before the testing is carried out, the network is trained using the Back Propagation algorithm. Data from the Labeling algorithm test results contributed to a data set of 180 characters. The data set is split into training and testing sets (80/20). The initial weights are set to random values and the target Mean Squared Error (MSE) is set to 10^-4. The initial learning rate is 0.38, but the initial results show that the produced neural network model could not work properly. Thus, the learning rate is adjusted to 0.001. The results of this test are evaluated in the next subsection. Figure 13 presents the implementation of the methods in the client part (Android application). The implementation successfully recognizes number plates that are already registered in the system. The Android application is named SiPelanok.

4.2 Evaluation

The results of the NN training using the Back Propagation algorithm are described in Table 1. Based on the results, the average number of epochs required to reach an MSE of 0.0001 is approximately 27,105. The highest accuracy is achieved by the network with 95 hidden nodes in the hidden layer.

Table 1 Artificial neural network training result

# Hidden nodes    Epoch     MSE       Training time (ms)    Accuracy (%)
55                32,082    0.0001    375,070               97.22
65                27,216    0.0001    426,996               97.22
75                33,726    0.0001    462,007               88.89
85                23,684    0.0001    544,069               97.22
95                24,442    0.0001    526,050               100
105               25,552    0.0001    716,876               88.89
115               23,039    0.0001    699,230               97.22

Table 2 F-Score data set

# Hidden nodes    TP    FP    TN    FN    P    R      F
55                11    0     15    7     1    0.61   0.76
65                12    0     15    6     1    0.67   0.80
75                14    0     15    4     1    0.78   0.88
85                14    0     15    4     1    0.78   0.88
95                14    0     15    4     1    0.78   0.88
105               10    0     15    8     1    0.56   0.71
115               11    0     15    7     1    0.61   0.76

Meanwhile, 75 and 105 hidden nodes both produced the lowest accuracy. The accuracy value is obtained by using the testing data (twenty percent of the data set). Recognition accuracy is evaluated using a sample of 33 number plates. These number plates are divided into registered (18) and not registered (15). Each number plate went through a one-time testing process for each number of hidden nodes. The image for the testing process is captured at a distance of one to two meters from the vehicle number plate. The height of the camera during the capture is between 50–145 cm from the ground surface. The obtained accuracy values are summarized in Table 2.

144

K. Alexander et al.

Based on Table 2, the FP value for all configurations of hidden nodes is zero. All precision values are one due to the zero FP value. The highest recall value is achieved by 75, 85, and 95 hidden nodes with the recall value of 0.78, while the lowest value is achieved by 105 hidden nodes with the recall value of 0.56. The highest F score is achieved by 75, 85, and 95 hidden nodes with the F score of 0.88. The F score of 0.88 (close to 1) shows that the 75, 85, and 95 hidden nodes configurations could be chosen for the FCNN. These amounts of hidden nodes produce the best performance compared to other amounts of hidden nodes. The accuracy for the 75, 85 and 95 hidden nodes reaches 88%.

5 Discussion

The implementation of the Labeling algorithm and the fully connected artificial neural network (FCNN) for the ANPR system presented in this paper has been successfully completed. The Labeling algorithm can separate the character components on the plate. The artificial neural network can classify the Labeling result components into the characters ‘A’ to ‘Z’ and ‘0’ to ‘9’. The results of the F-measure calculation indicate that the artificial neural network model has an F-score ranging from 0.71 to 0.88. The models of the artificial neural network that give the best F-score are the ones with 75, 85, and 95 hidden nodes. In this study, the number of hidden nodes that delivers 88% accuracy and the best F-score of 0.88 is 95. These results are obtained under a test environment with good lighting conditions, a distance of one to two meters, and a camera height between 50–145 cm from the ground surface. Future work on this research is to try other parameters of the neural network, such as the activation function, to find the best activation function for the network. The goal is to boost the accuracy of recognizing characters in number plates. The architecture of the NN could also be modified to achieve higher accuracy. In this work, an architecture with one input layer (two hundred nodes), one hidden layer (whose number of hidden nodes is the research variable), and one output layer (36 nodes) is used. Other architectures could be implemented to obtain the best model; the recognition accuracy may improve or even worsen. Another work that could be carried out is to add a convolutional layer to the neural network (CNN) to assist the extraction and normalization processes of the input for the fully connected neural network layer.
Acknowledgements I really thank and appreciate both of my supervisors Arya Wicaksana and Ni Made Satvika Iswari for their guidance and support on this work. I would also like to extend my thanks to Universitas Multimedia Nusantara for giving me the chance and opportunity to carry out this work.


References 1. Direktorat Jenderal Perhubungan Darat Kementerian Perhubungan Republik Indonesia. Direktorat Jenderal Perhubungan Darat - Undang Undang Republik Indonesia Nomor 22 Tahun 2009 Tentang Lalu Lintas dan Angkutan Jalan (2017). http://hubdat.dephub.go.id/uu/288-uunomor-22-tahun-2009-tentang-lalu-lintas-dan-angkutan-jalan/download. Cited 01 Oct 2017 2. Rouse, M.: What is automated license plate recognition (ALPR)? - Definition from WhatIs.com (2017). http://whatis.techtarget.com/definition/Automated-License-PlateRecognition-ALPR. Cited 01 Oct 2017 3. Janowski, L., Koozlowski, P., Baran, R., Romaniak, P., Glowacz, A., Rusc, T.: Quality assessment for a visual and automatic license plate recognition. Multimedia Tools App. 68, 23–40 (2014) 4. Sitompul, A., Sulistiyo, M., Purnama, B.: Indonesian vehicle number plates recognition using multi layer perceptron neural network and connected component labeling. Int. J. ICT 1, 29–37 (2016) 5. Iswari, N.M.S., Wella, Ranny.: Fish freshness classification method based on fish image using k-Nearest Neighbor (2018). https://doi.org/10.1109/CONMEDIA.2017.8266036 6. Kurniawan, V., Wicaksana, A., Prasetiyowati, M.I.: The implementation of eigenface algorithm for face recognition in attendance system. (2018). https://doi.org/10.1109/CONMEDIA.2017. 8266042 7. Toda, S.: License Plate Recognition. United States of America Patent US20050084134 A1 (2005) 8. Kim, K., Kim, K., Park, S., Jung, K., Park, M., Kim, H.: VEGA VISION: a vision system for recognizing license plates. In: IEEE International Symposium on Consummer Electronics (ISCE 99), vol. 2, pp. 176–181 (1999) 9. Samet, H., Tamminen, M.: Efficient component labeling of images of arbitrary dimension represented by linear bintrees. IEEE Trans. Pattern Anal. Mach. Intell. 579 (1988) 10. Cilimkovic, M.: Neural Network and Back Propagation Algorithm. Dublin: Institute of Technology Blanchardstown 11. Larose, D.: Discovering Knowledge in Data: An Introduction to Data Mining. WileyInterscience, New York (2005) 12. Fogel, D.: Blondie24: Playing at The Edge of AI. Morgan Kaufmann Publishers, Burlington (2002) 13. Faqs.org: Why use activation functions (2010). http://www.faqs.org/faqs/ai-faq/neural-nets/ part2/section-10.html. Cited 29 Dec 2017 14. Gupta, D.: Fundamentals of deep learning - activation functions and their use (2017). https:// www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functionswhen-to-use-them/. Cited 29 Dec 2017 15. Rojas, R.: Neural Networks: A Systematic Introduction. Springer, Berlin (2005) 16. Sasaki, Y.: f-measure-25Oct07.dvi - F-measure-YS-26Oct07.pdf (2007). http://www.cs.odu. edu/~mukka/cs795sum09dm/Lecturenotes/Day3/F-measure-YS-26Oct07.pdf. Cited 29 Dec 2017 17. Brownlee, J.: Classification accuracy is not enough: more performance measures you can use - machine learning mastery (2014). https://machinelearningmastery.com/classificationaccuracy-is-not-enough-more-performance-measures-you-can-use/. Cited 29 Dec 2017 18. Joshi, R.: Accuracy, precision, recall &; F1 score: interpretation of performance measures - Exsilio Blog (2016). http://blog.exsilio.com/all/accuracy-precision-recall-f1-scoreinterpretation-of-performance-measures/. Cited 12 Jan 2018

Implementation of Creation and Distribution Processes of DACS Rules for the Cloud Type Virtual Policy Based Network Management Scheme for the Specific Domain

Kazuya Odagiri, Shogo Shimizu and Naohiro Ishii

Abstract In the current Internet system, there are many problems that exploit the anonymity of network communication, such as personal information leaks and crimes committed over the Internet. This is because the TCP/IP protocol used in the Internet system does not carry user identification information in the communication data, which makes it difficult to identify immediately a user performing such acts. As a study for solving this problem, there is the study of Policy Based Network Management (PBNM). This is a scheme for managing a whole Local Area Network (LAN) through communication control for every user. In this PBNM, two types of schemes exist. As one scheme, we have studied theoretically the Destination Addressing Control System (DACS) Scheme, which has affinity with the existing Internet. By applying this DACS Scheme to Internet system management, we aim to realize policy-based Internet system management. In this paper, to realize management of a specific domain consisting of network groups from plural organizations, the results of implementation of the creation and distribution processes of DACS rules are described.

Keywords Policy-based network management · DACS scheme · NAPT


1 Introduction

In the current Internet system, there are many problems that exploit the anonymity of network communication, such as personal information leaks and crimes committed over the Internet. As a study for solving these problems, Policy Based Network Management (PBNM) [1] exists. PBNM is a scheme for managing a whole Local Area Network (LAN) through communication control for every user, and it cannot be applied to the Internet system as it is. In the existing PBNM, there are two types of schemes. The first is the scheme of managing the whole LAN by locating the communication control mechanisms on the path between network servers and clients. The second is the scheme of managing the whole LAN by locating the communication control mechanisms on the clients. As the second scheme, we have studied theoretically the Destination Addressing Control System (DACS) Scheme. In previous works on the DACS Scheme, we showed its basic principle and security function [2]. After that, we implemented a DACS System to realize the concept of the DACS Scheme. By applying this DACS Scheme to the Internet system, we aim to realize policy-based Internet system management. Then, the Wide Area DACS system (wDACS system) [3], for use within one organization, was presented as the second phase toward the final goal. As the first step of the second phase, we showed the results of implementation of the creation and distribution processes of DACS rules [3]. In Sect. 2 of this paper, the motivation and related research for this study are described. In Sect. 3, the existing DACS Scheme and the wDACS Scheme are described. In Sect. 4, the content of the proposed scheme and the results of its implementation are described.

2 Motivation and Related Research In the current Internet system, problems using anonymity of the network communication such as personal information leak and crimes using the Internet system occur. Because TCP/IP protocol used in Internet system does not have the user identification information on the communication data, it is difficult to supervise the user performing the above acts immediately. As studies and technologies for Internet system management to be comprises of TCP/IP [4], many technologies are studied. For examples, Domain name system (DNS), Routing protocol such as Interior gateway protocol (IGP) such as Routing information protocol (RIP) and Open shortest path first (OSPF), Fire Wall (F/W), Network address translation (NAT)/Network address port translation (NAPT), Load balancing, Virtual private network (VPN), Public key infrastructure (PKI), Server virtualization. Except these studies, various studies are performed elsewhere. However, they are for managing the specific part of the Internet system, and have no purpose of solving the above problems.


Fig. 1 Principle in first scheme

As a study for solving the above problem, the study area about PBNM exists. This is a scheme of managing a whole LAN through communication control every user. Because this PBNM manages a whole LAN by making anonymous communication non-anonymous, it becomes possible to identify the user who steals personal information and commits a crime swiftly and easily. Therefore, by applying this policybased thinking, we study about the policy-based Internet system management. In policy-based network management, there are two types of schemes. The first scheme is the scheme described in Fig. 1. The standardization of this scheme is performed in various organizations. In IETF, a framework of PBNM [1] was established. Standards about each element constituting this framework are as follows. As a model of control information stored in the server called Policy Repository, Policy Core Information model (PCIM) [5] was established. After it, PCMIe [6] was established by extending the PCIM. To describe them in the form of Lightweight Directory Access Protocol (LDAP), Policy Core LDAP Schema (PCLS) [7] was established. As a protocol to distribute the control information stored in Policy Repository or decision result from the PDP to the PEP, Common Open Policy Service (COPS) [8] was established. Based on the difference in distribution method, COPS usage for RSVP (COPS-RSVP) [9] and COPS usage for Provisioning (COPS-PR) [10] were established. RSVP is an abbreviation for Resource Reservation Protocol. The COPS-RSVP is the method as follows. After the PEP having detected the communication from a user or a client application, the PDP makes a judgmental decision for it. The decision is sent and applied to the PEP, and the PEP adds the control to it. The COPS-PR is the method of distributing the control information or decision result to the PEP before accepting the communication.


Fig. 2 Essential principle

Next, in DMTF, a framework of PBNM called Directory-enabled Network (DEN) was established. Like the IETF framework, control information is stored in the server storing control information called Policy Server, which is built by using the directory service such as LDAP [11], and is distributed to network servers and networking equipment such as switch and router. As the result, the whole LAN is managed. The model of control information used in DEN is called Common Information Model (CIM), the schema of the CIM (CIM Schema Version 2.30.0) [12] was opened. The CIM was extended to support the DEN [13], and was incorporated in the framework of DEN. In addition, Resource and Admission Control Subsystem (RACS) [14] was established in Telecoms and Internet converged Services and protocols for Advanced Network (TISPAN) of European Telecommunications Standards Institute (ETSI), and Resource and Admission Control Functions (RACF) was established in International Telecommunication Union Telecommunication Standardization Sector (ITU-T) [15]. However, all the frameworks explained above are based on the principle shown in Fig. 1. As problems of these frameworks, two points are presented as follows. Essential principle is described in Fig. 2. To be concrete, in the point called PDP (Policy Decision Point), judgment such as permission or non-permission for communication pass is performed based on policy information. The judgment is notified and transmitted to the point called the PEP, which is the mechanism such as VPN mechanism, router and Fire Wall located on the network path among hosts such as servers and clients. Based on that judgment, the control is added for the communication that is going to pass by.


Fig. 3 Principle in second scheme

The principle of the second scheme is described in Fig. 3. By locating the communication control mechanisms on the clients, the whole LAN is managed. Because this scheme controls the network communications on each client, the processing load is low. However, because the communication control mechanisms need to be located on each client, the installation work load becomes heavy. When considering managing the Internet system with these two schemes, it is practically difficult to apply the first scheme to Internet system management. This is because the communication control mechanism would need to be located on every path between network servers and clients without exception. On the other hand, the second scheme locates the communication control mechanisms on each client; that is, the software for communication control is installed on each client. So, by devising an installation mechanism that lets users install the software on their clients easily, it becomes possible to apply the second scheme to Internet system management. As a first step toward the final goal, we showed the Wide Area DACS system (wDACS system) [3]. This system manages a wide area network that one organization manages. Therefore, it is impossible for plural organizations to use this system. In order to improve it, we showed the cloud type virtual PBNM, which can be used by plural organizations. After that, to expand its application area, the scheme to manage a specific domain and the user authentication processes for that scheme were examined. In this paper, the policy information decision processes performed after the user authentication are examined.


Fig. 4 Basic principle of the DACS scheme

3 Existing DACS Scheme and wDACS System

3.1 Basic Principle of the DACS Scheme

Figure 4 shows the basic principle of the network services provided by the DACS Scheme. At the timing of (a) or (b) below, the DACS rules (rules defined per user) are distributed from the DACS Server to the DACS Client.
(a) At the time of a user logging in to the client.
(b) At the time of a delivery indication from the system administrator.
According to the distributed DACS rules, the DACS Client performs operation (1) or (2) below. In this way, communication control of the client is performed for every login user.
(1) The destination information of an IP packet sent from an application program is changed.
(2) An IP packet sent from an application program on the client to the outside of the client is blocked.
An example of case (1) is shown in Fig. 4: the system administrator can direct a communication of the login user to a specified server among servers A, B, or C. As for case (2), for example, when the system administrator wants to forbid a user from using an MUA (Mail User Agent), this is done by blocking IP packets with the corresponding destination information. In order to realize the DACS Scheme, the operation is done by a DACS Protocol as shown in Fig. 5. As shown by (1) in Fig. 5, the distribution of the DACS rules is performed on communication between the DACS Server and the DACS Client,


Fig. 5 Layer setting of the DACS scheme

which is arranged at the application layer. The application of the DACS rules to the DACS Control is shown by (2) in Fig. 5. The steady communication control, such as modification of the destination information or blocking of the communication, is performed at the network layer, as shown by (3) in Fig. 5.
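As an illustration of the rule application described above, the following is a minimal Java sketch of how per-user DACS rules (destination rewriting and blocking) could be represented and applied to outgoing packets on the client side. The rule format, class names, and addresses are hypothetical; they are not taken from the actual DACS Client implementation.

```java
import java.util.List;

public class DacsRuleSketch {
    enum Action { REWRITE, BLOCK }

    record Rule(Action action, String matchDestination, String newDestination) {}

    record Packet(String destination, int port) {}

    /** Applies the first matching rule; returns null when the packet is blocked. */
    static Packet apply(List<Rule> rules, Packet packet) {
        for (Rule rule : rules) {
            if (!packet.destination().equals(rule.matchDestination())) continue;
            if (rule.action() == Action.BLOCK) return null;             // case (2): block
            return new Packet(rule.newDestination(), packet.port());    // case (1): rewrite
        }
        return packet;   // no rule matched: leave the packet unchanged
    }

    public static void main(String[] args) {
        // Rules distributed for one login user (addresses are made up).
        List<Rule> rules = List.of(
            new Rule(Action.REWRITE, "192.168.1.10", "192.168.1.20"),  // redirect server A -> server B
            new Rule(Action.BLOCK, "192.168.1.30", null)               // forbid the mail server used by the MUA
        );
        System.out.println(apply(rules, new Packet("192.168.1.10", 80)));  // rewritten destination
        System.out.println(apply(rules, new Packet("192.168.1.30", 25)));  // null (blocked)
    }
}
```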

3.2 Communication Control on Client

The communication control described above is on every user. However, it may be better to perform communication control on every client instead of every user; for example, this is the case when many unspecified users use a controlled computer room. In this section, the method of communication control on every client is described, and the method of coexistence with the communication control on every user is considered. When a user logs into a client, the IP address of the client is transmitted from the DACS Client to the DACS Server. Then, if DACS rules corresponding to that IP address are registered on the DACS Server side, they are transmitted to the DACS Client. Communication control for every client can then be realized by applying them to the DACS Control. In this case, it is a premise that a client uses a fixed IP address. However, when a DHCP service is used, it is possible, for example, to apply the same control to all the clients linked to the whole network or to one of its subnetworks. When communication control on every user and on every client are both used, the communication controls may conflict; in that case, a priority needs to be given. The judgment is performed on the DACS Server side as shown in Fig. 6. Although not necessarily stipulated, a network policy or security policy exists in an organization such as a university (1). The priority is decided according to that policy (2). In (a), priority is


Fig. 6 Creating the DACS rules on the DACS server

In (a), priority is given to the user's rule, so communication is controlled per user. In (b), priority is given to the client's rule, so communication is controlled per client. In (c), the user's rule is the same as the client's rule. For each conflict, one rule is determined as the result of this comparison. These rules, together with the non-overlapping rules, are gathered, and the DACS rules are created (3). The DACS rules are then transmitted to the DACS Client, where they are applied to the DACS Control; at that point, user rules and client rules are no longer distinguished.
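For illustration only (this is not the authors' implementation), the merging of a user's rules with a client's rules on the DACS Server side could be sketched as follows; the rule representation and method names are assumptions.

```java
// Illustrative sketch of creating the DACS rules by merging user rules and client rules.
// Rules are keyed by the communication they match; when both sets define a rule for the
// same key, the set chosen by the organization's policy wins.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DacsRuleMerger {
    enum Priority { USER_RULE_FIRST, CLIENT_RULE_FIRST }

    static List<String> createDacsRules(Map<String, String> userRules,
                                        Map<String, String> clientRules,
                                        Priority priority) {
        // Start from the lower-priority set, then overwrite conflicts with the higher-priority set.
        Map<String, String> merged = new LinkedHashMap<>(
                priority == Priority.USER_RULE_FIRST ? clientRules : userRules);
        merged.putAll(priority == Priority.USER_RULE_FIRST ? userRules : clientRules);
        // The merged result no longer distinguishes user rules from client rules.
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        Map<String, String> user = Map.of("smtp", "rewrite destination to mail-a:25", "http", "allow");
        Map<String, String> client = Map.of("smtp", "block", "dns", "allow");
        System.out.println(createDacsRules(user, client, Priority.USER_RULE_FIRST));
    }
}
```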


Fig. 7 Extend security function

3.3 Security Mechanism of the DACS Scheme

In this section, the security function of the DACS Scheme is described. The communication is tunneled and encrypted by means of SSH. By using the port forwarding function of SSH, the communication between a network server and the client on which the DACS Client is installed is tunneled and encrypted. Normally, when a client application communicates with a network server through SSH port forwarding, the local host (127.0.0.1) must be specified in that client application as the communicating server. This breaks the transparent use of a client, which is a characteristic of the DACS Scheme: transparent use means that a client can continue to be used without changing its setup when the network system is updated. A function that does not break this transparency is therefore needed; its mechanism is shown in Fig. 7.
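As a minimal sketch of the tunnelling idea only, SSH local port forwarding can be set up from Java with the JSch library; JSch is not mentioned in the paper, and the host names, ports, and credentials below are placeholders.

```java
// Sketch of SSH local port forwarding with JSch (https://www.jcraft.com/jsch/).
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SshTunnelExample {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "ssh-gateway.example.org", 22);
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no"); // acceptable only for this sketch
        session.connect();

        // Forward local port 2525 to the SMTP port of the mail server, so that traffic
        // sent to 127.0.0.1:2525 is tunnelled and encrypted over the SSH connection.
        session.setPortForwardingL(2525, "mail.example.org", 25);

        // The client application would now talk to 127.0.0.1:2525. In the DACS Scheme,
        // the destination rewrite of Sect. 3.1 hides this redirection from the user,
        // which is how the transparent use of the client is preserved.
    }
}
```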

3.4 The Cloud Type Virtual PBNM for the Common Use Between Plural Organizations

In this section, the concept and implementation of the proposed scheme are described, followed by the functional evaluation results.


Fig. 8 Cloud type virtual PBNM for the common use between plural organizations

The proposed concept, described in [3], is shown in Fig. 8. Because the existing wDACS Scheme realized PBNM control purely with the software called the DACS Server and the DACS Client, no other mechanism was needed, and application to the cloud environment was therefore easy. The scheme proposed here realizes common usage by plural organizations by adding the following elements: user identification across the plural organizations, management of the policy information of the plural organizations, application of a PKI for encrypted communication over the Internet, a redundant configuration of the DACS Server (policy information server), a load balancing configuration of the DACS Server, and an installation function for the DACS Client via the Internet. In the past study [2], the DACS Client was operated on the Windows operating system (Windows OS), because Windows was most often used as the client OS. Recently, however, the Linux operating system (Linux OS) has also acquired sufficient functions to be used on clients, and its use on clients is expected to grow. Therefore, to demonstrate the feasibility of the DACS Scheme on Linux, the basic functions of the DACS Client were implemented on Linux in this study. The basic functions of the DACS Server and DACS Client were implemented in the Java language.


4 Content and Implementation of the Proposed Scheme to Manage the Specific Domain

In this section, the user authentication processes applied in the scheme to manage the specific domain are shown in Fig. 10, and then the concept of the policy information decision method is examined and described.

4.1 Content of the Proposed Scheme to Manage the Specific Domain

The proposed scheme manages a group of plural networks. The concept is explained in Fig. 9. Specifically, network group 1 exists as a logical range that manages organization A and organization B, and network group 2 similarly exists as a logical range that manages organization C and organization D. Each individual network group is managed by the existing scheme shown in Fig. 9. When plural network groups managed by the existing scheme exist, those network groups become the management target of the proposed method. For example, when user A belonging to org. A in network group 1 uses the network held by org. C in the different network group 2, administrative organization Y of network group 2 requests the policy information of user A from administrative organization X of network group 1 and acquires it. The final policy information is then decided by collating it with the policy information registered in advance with network group 2. As a result, the policy information is applied to the client that user A uses in network group 2, and communication control on that client is performed. Even when a user moves among plural network groups rather than staying within a specific network group, the PBNM scheme is thus expected to maintain a constant management state. This scheme consists of the following three factors.

(Factor 1) Method of user authentication
(Factor 2) Determination method of the policy information
(Factor 3) Distribution method of the policy information

As Factor 1, the proposed user authentication method suitable for this scheme is described in Fig. 10. Because the proposed PBNM method is intended to manage the whole Internet system, the proposed user authentication system also takes a distributed form. For example, when user A belonging to org. A in network group 1 accesses the network of network group 1, the user authentication process is performed against the user authentication server of network group 1. Likewise, when user A belonging to org. A in network group 1 accesses the network of network group 2, the user authentication process is still performed against the user authentication server of network group 1. The server name with its domain name is required as the information necessary for this user authentication.


[Figure 9 shows Domain 1 containing Network Group 1 (the LAN or WAN networks of Orgs. A and B under specific administrative organization X) and Network Group 2 (the LAN or WAN networks of Orgs. C and D under specific administrative organization Y), each group with its own Policy Information Server and a relation between the two administrative organizations. A movement terminal from Org. A (1) moves to and connects in the other group and (2) is used depending on the applied policy information.]

Fig. 9 Concept of the proposed scheme

The first access to the user authentication server is performed based on the server name with its domain name, which is incorporated in the DACS CL in advance. After that, the input of a user name and password is requested. When these pieces of information are sent over the network, they must be encrypted with SSL (TLS). That is, the user authentication is handled by the organization to which the user belongs. The point to note here is the meaning of this user authentication. In some cases, the user authentication server possessed by the network group that the user is visiting may also need to be used, but that is only a complementary measure. The user authentication referred to here is the one performed before the distribution of the DACS rules as policy information; it must be explicitly distinguished from the separate user authentication that permits network connection. Depending on the implementation, it may be possible to integrate these two user authentications into one process, but this is a matter to be considered at the implementation stage.

As Factor 2, the concept of the policy information decision method is described. In Fig. 11, as an example, a user belonging to organization A connects a notebook client computer to another organization in Network Group 2. In this case, the DACS rules as policy information are extracted from the DACS SVs (policy information servers) located in both network groups (Network Group 1 and Network Group 2). After that, the DACS rules are determined and applied to the DACS CL in the client computer. When the DACS rules are determined, the following points are considered.

(Point 1) Use of the information services of the organization to which the user belongs
(Point 2) Use of the information services of the organization to which the user connects
(Point 3) Use of the packet filtering service


Fig. 10 Concept of the proposed user authentication method

In the first example, the user in organization A, having connected their own client to the network of network group 2, accesses the mail system of their own organization. To enable the client to access the mail service, the communication must be permitted by a packet filtering setting. Because the server name of the mail system of the user's own organization is normally specified in the mail client's settings, it then becomes possible to access that mail system from the client. In this case, the policy information transmitted from the policy information server of management organization X is reflected on the client as the packet filtering setting. In the second example, the user in organization A, again connected to the network of network group 2, accesses the mail system of the connected organization. As in the first example, the communication must be permitted by a packet filtering setting to allow access to the mail service. In addition, because the server name of the user's own mail system is normally specified in the mail client's settings, the mail system actually accessed must be changed by communication control using NAT. As the policy information for these two communication controls, the information transmitted from the policy information server of the connected organization is reflected on the client; after that, it becomes possible to access the mail system of the connected organization from the client. However, since in the second example the policy information used in the first example also exists, the policy information used in these two examples is set in both management organizations.
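Purely as an illustration of these two examples, the corresponding policy entries could look roughly as follows; the rule syntax and host names are invented here and are not the paper's rule format.

```java
// Hypothetical policy entries for the two mail examples, written as plain strings.
import java.util.List;

public class MailPolicyExamples {
    public static void main(String[] args) {
        // Example 1: policy from the home organization (management org. X):
        // permit traffic from the roaming client to the home mail server.
        String permitHomeMail = "permit tcp from client to mail.org-a.example:25";

        // Example 2: policy from the connected organization (management org. Y):
        // permit the traffic and rewrite (NAT) the destination so that the client,
        // still configured with its home mail server name, reaches the connected
        // organization's mail server instead.
        String permitVisitedMail = "permit tcp from client to mail.org-a.example:25";
        String natToVisitedMail  = "nat destination mail.org-a.example:25 -> mail.org-c.example:25";

        List.of(permitHomeMail, permitVisitedMail, natToVisitedMail).forEach(System.out::println);
    }
}
```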


[Figure 11 shows the same two network groups in Domain 1, here with each specific administrative organization (X and Y) operating both a Policy Information Server and an Authentication Server over the LAN or WAN networks of Orgs. A–D: (a) the policy information for the user in Org. A is decided through the relation between the two administrative organizations, and (b) the resulting policy information is applied to the client used by that user in Network Group 2.]

Fig. 11 Concept of policy information decision method

In this case, the fact that the policy information is set in the connected organization means that use of the connected organization's services is permitted.

4.2 Implementation of the Proposed Scheme to Manage the Specific Domain

Here, the whole implementation is explained. OpenLDAP was adopted as the User Authentication Server; it is free software and has a high affinity with the Internet system. PostgreSQL was adopted as the database for the Policy Information Server; it also has a high affinity with the Internet system and is used in some Web application systems. Communications between the User Authentication Server and the client computer are encrypted with Transport Layer Security (TLS), and communications from the DACS SV to the DACS Client are also protected by TLS. TLS runs on top of TCP/IP and realizes secure communication channels between servers and clients. TLS is separated from application protocols such as the Hypertext Transfer Protocol (HTTP), the Simple Mail Transfer Protocol (SMTP), and the Internet Message Access Protocol (IMAP), and can be used from those application protocols. It was previously called Secure Sockets Layer (SSL) and is now called TLS.


Fig. 12 Prototype’s system configuration

Fig. 13 Access log on OpenLDAP

Because the DACS SV and DACS CL are implemented in the Java programming language, and TLS is supported both by OpenLDAP and by languages such as Java, TLS was adopted in consideration of its affinity with the Internet. Because Java is used to implement many services on the Internet and runs on all the operating systems found on client computers, Java itself has a high affinity with the Internet system. From here, the results of the implementation are described. A simplified configuration of the prototype system is shown in Fig. 12. Two domains (Domain-a, Domain-b) exist. When the client with DACS CL-b moves from Domain-b to Domain-a, it accesses OpenLDAP-b, DACS SV-a, and DACS SV-b. OpenLDAP-b was accessed from the client; the access log is shown in Fig. 13 and shows that the user (test1) accessed it from the client with IP address 192.168.93.133. The client then accessed the two DACS SVs (DACS SV-a, DACS SV-b), and the DACS rules for "test1" were created and extracted from both DACS SVs; the access logs are shown in Figs. 14 and 15. In Fig. 14, the user account information sent from the client is shown in the upper part, under "Displaying the Information from the Client". Based on this information, the DACS SV recognized the IP address of the client, the user name, and the domain name to which the client was connected. The information extracted from the PostgreSQL database is shown in the middle part, under "Displaying the Extraction Information from Database".


Fig. 14 Access log of DACS SV-a in Domain-a

Fig. 15 Access log of DACS SV-b in Domain-b

database was shown at the middle of “Displaying the Extraction Information from Database”. Destination of IP address and port number to be converted and those after conversion were shown. In Fig. 15, access log of similar contents was displayed. Destination of IP address and port number to be converted and those after conversion were displayed differently. After both content of DACS rules are sent to the client, when rules with same destination IP address and port number exist, the rule with high priority is selected and applied to the client. The access log was shown in Fig. 16. At the upper side of “User Information which is sent from OpanlDAP server of own org”, user account information which is extracted from OpenLDAP Server was shown. This shows that the user authentication on the OpenLDAP server in its own domain was successful. At the two line starting with “Displaying the DACS rules—”, DACS rules sent from DACS SV-a and DACS SB-b were shown. After comparison with two DACS rules, the one that was given priority was indicated. At last part of line starting with “high Priority of DACS rules”, the rules sent from domain-a (connected domain) are displayed as the priority rule.
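For reference, a user authentication step against an OpenLDAP server over TLS can be performed from Java with JNDI roughly as follows; this is only a sketch, and the server URL, base DN, and attribute layout are assumptions rather than the prototype's actual settings.

```java
// Minimal sketch of authenticating a user against OpenLDAP over TLS using JNDI (part of the JDK).
import javax.naming.Context;
import javax.naming.directory.InitialDirContext;
import java.util.Hashtable;

public class LdapAuthExample {
    static boolean authenticate(String uid, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldaps://openldap-a.example.org:636"); // TLS-protected port
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "uid=" + uid + ",ou=People,dc=example,dc=org");
        env.put(Context.SECURITY_CREDENTIALS, password);
        try {
            new InitialDirContext(env).close(); // a successful bind means the credentials are valid
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(authenticate("test1", "password"));
    }
}
```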


Fig. 16 Access log on the client side

5 Conclusion

In this paper, the results of implementing the creation process of DACS rules for the Cloud Type Virtual Policy Based Network Management Scheme for the Specific Domain were described. The system implementation was carried out with affinity with the Internet system in mind. In the near future, we plan to conduct a load experiment using about 100 virtual clients. Beyond that, research on realizing Internet PBNM will be started based on this paper.

Acknowledgements This work was supported by the research grant of JSPS KAKENHI Grant Number 17K00113 and by a research grant from the Support Center for Advanced Telecommunications Technology Research, Foundation (SCAT). We express our gratitude.

References

1. Yavatkar, R., Pendarakis, D., Guerin, R.: A framework for policy-based admission control. IETF RFC 2753 (2000)
2. Odagiri, K., Shimizu, S., Takizawa, M., Ishii, N.: Theoretical suggestion of policy-based wide area network management system (wDACS system part-I). Int. J. Netw. Distrib. Comput. (IJNDC) 1(4), 260–269 (2013)
3. Odagiri, K., Shimizu, S., Ishii, N., Takizawa, M.: Suggestion of the cloud type virtual policy based network management scheme for the common use between plural organizations. In: Proceedings of the International Conference on Network-Based Information Systems (NBiS-2015), pp. 180–186 (2015)


4. Cerf, V., Kahn, R.E.: A protocol for packet network intercommunication. IEEE Trans. Commun. COM-22, 637–648 (1974)
5. Moore, B., et al.: Policy core information model—version 1 specification. IETF RFC 3060 (2001)
6. Moore, B.: Policy core information model (PCIM) extensions. IETF RFC 3460 (2003)
7. Strassner, J., Moore, B., Moats, R., Ellesson, E.: Policy core lightweight directory access protocol (LDAP) schema. IETF RFC 3703 (2004)
8. Durham, D., et al.: The COPS (common open policy service) protocol. IETF RFC 2748 (2000)
9. Herzog, S., et al.: COPS usage for RSVP. IETF RFC 2749 (2000)
10. Chan, K., et al.: COPS usage for policy provisioning (COPS-PR). IETF RFC 3084 (2001)
11. CIM Core Model V2.5 LDAP Mapping Specification (2002)
12. CIM Schema: Version 2.30.0 (2011)
13. Wahl, M., Howes, T., Kille, S.: Lightweight directory access protocol (v3). IETF RFC 2251 (1997)
14. ETSI ES 282 003: Telecommunications and Internet converged Services and Protocols for Advanced Networking (TISPAN); Resource and Admission Control Sub-system (RACS); Functional Architecture (2006)
15. ETSI ES 283 026: Telecommunications and Internet Converged Services and Protocols for Advanced Networking (TISPAN); Resource and Admission Control; Protocol for QoS reservation information exchange between the Service Policy Decision Function (SPDF) and the Access-Resource and Admission Control Function (A-RACF); Protocol specification (2006)

Transforming YAWL Workflows into Petri Nets Wanwisa Paakbua and Wiwat Vatanawood

Abstract Business workflow management is essential and helps enhance the efficiency, productivity, and automation of business functions. YAWL is one of the powerful business workflow modeling languages, capable of representing the essential business workflow patterns needed for software-driven workflow automation. However, model checking of a workflow written in YAWL is barely automated, and exhaustive simulation of the YAWL model is still time-consuming. In this paper, we propose an alternative: transforming a YAWL workflow into a corresponding Petri nets model. A set of mapping rules is proposed that also copes with non-well-formed YAWL models. The resulting Petri nets model of the YAWL workflow is correct and ready for model checking.

Keywords YAWL workflow · Formal model · Petri nets

1 Introduction

In order to develop a large software system with complex business rules, workflow automation is essential to enable efficiency, productivity, and, most importantly, maintainability. Designing a large and complex business software system from scratch is hardly possible or practical these days, whereas business workflow automation introduces high-level workflow languages and workflow patterns that enhance our capability to construct such software systems. We use workflows to represent business processes by identifying the tasks and the control flows among them. Several workflow modeling languages have been proposed, such as YAWL, BPEL, and BPMN. However, the design of the business workflows should be analyzed and verified to ensure the correctness of the business processes beforehand.

W. Paakbua (B) · W. Vatanawood
Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
e-mail: [email protected]
W. Vatanawood
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Lee (ed.), Applied Computing and Information Technology, Studies in Computational Intelligence 847, https://doi.org/10.1007/978-3-030-25217-5_12


Fortunately, several simulation tools have been proposed to simulate the YAWL workflow model beforehand, such as the YAWL simulator in [1]. Many studies have proposed schemes to convert business workflow models into formal models such as YAWL, Petri nets, and related types of Petri nets (timed Petri nets, coloured Petri nets, etc.). Some formal models are written in a formal specification language, such as Promela [2, 3] or CSP [4]. For example, [2] proposed an approach to analyze the formal semantics of BPMN process models using YAWL; the resulting formal semantics helps understand and check the correctness of the behaviors of the workflow patterns found in large and complicated BPMN models. Likewise, [3] proposed an approach to convert the structure of BPMN to YAWL in order to let business end users understand and analyze a complicated business process via the animation of the YAWL workflow model. In contrast, [5] proposed a YAWL-to-BPEL transformation in order to support complex business workflow patterns available only in BPEL. Alternatively, several studies have proposed how to convert business workflows into Petri nets models. For example, [6] showed how to transform a UML activity diagram into coloured Petri nets; a set of rules was proposed to convert the scenarios of the essential control flows of the UML activity diagram, such as decision and merge, fork and join nodes. In [7], a framework to transform BPMN design models into coloured Petri nets was proposed, together with hierarchical and compositional verification techniques. As mentioned earlier, exhaustive simulation of a complex YAWL workflow model is still tedious and time-consuming. In this paper, we therefore propose an alternative: transforming a YAWL workflow into Petri nets so that the resulting Petri nets are compatible with and useful in model checking tools. The rest of this paper is organized as follows. Section 2 gives the background, and Sect. 3 describes our transforming scheme. Section 4 shows our demonstration and case study. Section 5 is our conclusion.

2 Background

2.1 YAWL Workflow [1]

YAWL stands for "Yet Another Workflow Language". YAWL is a workflow language used to visually define the graph structure of the workflow patterns in a business process. YAWL's graph structure helps illustrate complicated business processes in a well-formed manner, and it supports the detection of errors and conflicts in the control flows within the workflow model beforehand. A YAWL workflow model is composed of a set of tasks and conditions and a set of connecting edges. A sample of the YAWL elements is shown in Fig. 1.

– Input Condition element indicates the starting point of the workflow.
– Output Condition element specifies the ending point of the workflow.


Fig. 1 A sample of the YAWL elements [1]


– Condition element represents a state of the workflow and can be located between tasks. It allows us to decide which of two or more paths to proceed along.
– Atomic Task element represents a single task in the workflow to be performed.
– Multiple Instance Atomic Task element represents a set of multiple instances of a task to be performed concurrently.
– Composite Task element represents a sub-net, with its own set of YAWL elements constrained by the same syntax.
– Multiple Instance Composite Task element represents a set of multiple instances of a composite task (i.e. a sub-net) to be performed concurrently.
– AND-Split element allows the subsequent tasks to be performed in parallel. It may have one or more outgoing flows and activates every outgoing flow simultaneously.
– AND-Join element requires all of the preceding tasks to complete before continuing. It may have one or more incoming flows and waits for the completion of all incoming flows before activating its outgoing flow.
– XOR-Split element allows us to choose exactly one exclusive alternative among the outgoing flows. It may have one or more outgoing flows and activates exactly one of them.
– XOR-Join element allows one or more incoming flows and activates the outgoing flow as soon as any one incoming flow completes.
– OR-Split element allows one or more outgoing flows, each labelled with an associated Boolean condition. It activates each outgoing flow whose labelled condition evaluates to "true," or the designated default flow if none of the other outgoing flow conditions is "true."
– OR-Join element allows one or more incoming flows and activates the outgoing flow as soon as each, or any number, of the incoming flows has completed.

2.2 Petri Nets [8]

A Petri net is a particular type of bipartite directed graph containing two types of nodes: places (depicted as circles) and transitions (depicted as bars). Petri nets were originally invented by Carl Adam Petri to model communication between asynchronous devices. Around 1965 they began to be used to model and analyze concurrent systems and were subsequently adopted in many different areas. The Petri net concept is well suited to modeling and simulating the structural and behavioral properties of a variety of real-world systems, especially concurrent systems. Petri nets have been further extended to cope with the time constraints of real-time systems and with stochastic behavior where conflicts, concurrency, and non-determinism exist. Petri nets use a few symbols, as shown in Fig. 2.


Fig. 2 Symbols of petri nets [8]

– Place element typically represents a status of the system or a set of conditions used to determine the occurrence of an event. It is depicted as a circle.
– Transition element typically represents an action that will happen. It is depicted as a rectangular bar.
– Arc element represents a flow path of the system, with an arrowhead determining the direction. It is depicted as an arrowed line.
– Token element is depicted as a black dot and marks the occurrence of an event.

In Fig. 3, a sample Petri net is shown. It consists of two places, one transition, two arcs, and one token. The place p1 is the input place of the transition t1, while the place p2 is the output place of the transition t1. A token located within the place p1 represents the status of the system at that time, as defined by p1. The change of the token's location among the different places indicates the change of the state of the system. We call the token distribution in a Petri net a "marking."
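A minimal sketch, not taken from the paper, of the Petri net in Fig. 3 can make the marking and firing semantics concrete: places p1 and p2, transition t1, and one token initially in p1.

```java
// Tiny Petri net: firing t1 consumes a token from its input place p1 and produces one in p2.
import java.util.LinkedHashMap;
import java.util.Map;

public class TinyPetriNet {
    // marking: number of tokens currently in each place
    private final Map<String, Integer> marking = new LinkedHashMap<>();

    TinyPetriNet() {
        marking.put("p1", 1); // initial marking M0: one token in p1
        marking.put("p2", 0);
    }

    /** t1 is enabled when its input place p1 holds at least one token. */
    boolean t1Enabled() {
        return marking.get("p1") >= 1;
    }

    /** Firing t1 moves a token from p1 to p2, i.e. changes the marking. */
    void fireT1() {
        if (!t1Enabled()) throw new IllegalStateException("t1 is not enabled");
        marking.merge("p1", -1, Integer::sum);
        marking.merge("p2", 1, Integer::sum);
    }

    public static void main(String[] args) {
        TinyPetriNet net = new TinyPetriNet();
        System.out.println("before firing: " + net.marking); // {p1=1, p2=0}
        net.fireT1();
        System.out.println("after firing:  " + net.marking); // {p1=0, p2=1}
    }
}
```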

3 Our Model Transforming Approach

In this section, the overview of our transforming approach is described and the mapping rules for model transformation are presented. As shown in Fig. 4, our transforming approach begins with the YAWL workflow model written in XML format as our input file.


Fig. 3 A sample of Petri nets [8]
Fig. 4 The overview of our transforming approach

The elements of YAWL are extracted and matched against the available mapping rules. The mapping rules are illustrated graphically in Table 1; each particular element of the YAWL language is converted to the corresponding set of Petri nets elements. The resulting Petri net is finally generated and verified using Petri net simulation tools.
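For illustration only, extracting the YAWL elements from the XML input file (the first two steps of the approach) could be sketched in Java with the standard DOM parser as follows; the tag names used here are assumptions about the YAWL file layout, not a verified schema.

```java
// Sketch of reading a YAWL workflow file and listing its elements with the JDK DOM parser.
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.File;

public class YawlElementExtractor {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("creditcard.yawl")); // hypothetical input file

        // Assumed element names; each extracted element would then be matched
        // against a mapping rule from Table 1.
        for (String tag : new String[] { "inputCondition", "outputCondition", "condition", "task" }) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                System.out.println(tag + ": id=" + e.getAttribute("id"));
            }
        }
    }
}
```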


Table 1 Mapping rules for transforming YAWL elements into Petri nets elements (the YAWL and Petri nets columns contain graphical notations). The mapped element names are: Input condition, Output condition, Condition, Atomic task, Multiple instance atomic task, AND-split, AND-join, XOR-split, XOR-join, OR-split, OR-join.


Definition 1 (YAWL workflow) A YAWL workflow is a 6-tuple YWL = (C, T, F, Ci, Co, Ttype) such that:
– C is a set of conditions;
– T is a set of tasks;
– Ci ⊆ C is the set of input conditions;
– Co ⊆ C is the set of output conditions;
– Ta ⊆ T is the set of atomic tasks;
– Tm ⊆ T is the set of multiple instance tasks;
– F ⊆ (C\Co × T) ∪ (T × C\Ci) ∪ (T × T) is the flow relation. The graph structure of YWL is depicted as (C ∪ T, F), such that every node is on a directed path from i to o, where i ∈ Ci and o ∈ Co;
– Ttype: T → {AND-Split, OR-Split, XOR-Split, AND-Join, OR-Join, XOR-Join} is the function that specifies the applicable type of a task in YWL.

Definition 2 (Petri nets) Formally, a Petri net is a 5-tuple PN = (P, T, I, O, M0) where:
– P = {p1, p2, …, pn} is a finite set of places;
– T = {t1, t2, …, tn} is a finite set of transitions, with P ∪ T ≠ ∅ and P ∩ T = ∅;
– I: (P × T) → N is an input function that defines directed arcs from places to transitions;
– O: (T × P) → N is an output function that defines directed arcs from transitions to places;
– M0: P → N is the initial marking.

A. Import YAWL in XML format
A business workflow is represented as a YAWL workflow model using a traditional YAWL editor. The YAWL workflow is expected in XML file format so that we can investigate the well-formed structure of the given XML file.

B. Extract the elements of YAWL
The input YAWL workflow model is extracted into a set of YWL elements with their inscriptions attached. Each element is then matched against the corresponding set of Petri nets elements found in Table 1.

C. Generate the Petri nets using mapping rules
Our initial version of the transforming mapping rules was proposed for the basic elements of YAWL, such as the Input condition and Output condition elements, ordinary conditions, atomic tasks, etc., excluding the Composite Task and Multiple Instance Composite Task. Consequently, the input YAWL had to be strictly written as well-formed YAWL. In short, well-formed YAWL means that whenever a kind of path splitter element is applied, the same kind of path merger element should also be present; for example, if we draw an XOR-Split element then there should exist a corresponding XOR-Join in our YAWL model structure.


To give our mapping rules the flexibility to handle non-well-formed YAWL model structures, we additionally extend the initial version of our mapping rules to cope with the XOR-Join, OR-Split, and OR-Join using the solution proposed in [9]. The final mapping rules support the non-well-formed YAWL model precisely. Our mapping rules are presented as follows.

Our Transforming Rules:

(1) Mapping Input Condition. Given an input condition i ∈ Ci, which has only one outgoing arc, we generate one place p1 connecting to a transition t1. The transition t1 is expected to connect further to the subsequent parts. Moreover, one token is available and located in p1.

(2) Mapping Output Condition. Given an output condition o ∈ Co, which is a terminal node of the YWL, we generate one place p1 for each condition o. The output condition o is expected to be connected from the previous parts.

(3) Mapping Condition. Given a condition c ∈ C, we generate one place p1 connecting to one transition t1. The place p1 is expected to be connected from the previous parts, while the transition t1 is expected to connect to the subsequent parts.

(4) Mapping Atomic Task. Given an atomic task ta ∈ Ta, we generate one place p1 connecting to one transition t1. The place p1 is expected to be connected from the previous parts, while the transition t1 is expected to connect to the subsequent parts.

(5) Mapping Multiple Instance Atomic Task. Given a multiple instance task tm ∈ Tm, we generate one place p1 connecting to one transition t1 in the first layer. In the middle layer, we generate a set of multiple atomic tasks in a parallel structure, with the transition t1 of the first layer connecting to the place pi of each atomic task. In the last layer, we generate a set of places connected from the atomic tasks in the middle layer, and provide one single transition connected from those places. In our example, the first layer contains place p1 and transition t1; the middle layer contains three atomic tasks, namely place p2 and transition t2, place p3 and transition t3, and place p4 and transition t4; and the last layer contains places p5, p6, p7 and transition t5.

(6) Mapping AND-Split. Given a task t ∈ T with Ttype(t) = AND-Split, we generate one place p1 connecting to one transition t1. The transition t1 is expected to connect further to two or more subsequent parts, and the place p1 is expected to be connected from the previous parts.


(7) Mapping AND-Join. Given a task t ∈ T with Ttype(t) = AND-Join, we generate one or more places p1, p2, …, pn connecting to one transition t1. The transition t1 is expected to connect further to the subsequent parts, and the places p1, p2, …, pn are expected to be connected from the previous parts.

(8) Mapping XOR-Split. Given a task t ∈ T with Ttype(t) = XOR-Split, we generate one place p1 connecting to two or more transitions t1, t2, …, tn. The transitions t1, t2, …, tn are expected to connect further to their own subsequent parts, and the place p1 is expected to be connected from the previous parts.

(9) Mapping XOR-Join. Given a task t ∈ T with Ttype(t) = XOR-Join, we generate one or more places p1, p2, …, pn connecting to their own subsequent transitions t1, t2, …, tn. Moreover, for each generated place in {p1, p2, …, pn}, we generate inhibitor arcs to the rest of the places other than itself; for example, there are inhibitor arcs from p1 to p2, p3, …, pn, and from p2 to p1, p3, …, pn. The transitions t1, t2, …, tn are expected to connect further to one single subsequent place, which connects to the last transition.

(10) Mapping OR-Split. Given a task t ∈ T with Ttype(t) = OR-Split, we generate one place p1 connecting to two or more transitions t1, t2, …, tn. The main concept is that there is an additional choice of transition firing to produce the tokens for all of the following places.

(11) Mapping OR-Join. Given a task t ∈ T with Ttype(t) = OR-Join, we generate one or more places p1, p2, …, pn connecting to their own subsequent transitions t1, t2, …, tn. Moreover, for each generated place in {p1, p2, …, pn}, we generate inhibitor arcs to the rest of the places other than itself. The main concept is that there is an additional choice of transition checking the available tokens from all of the previous places.

D. Verify the resulting Petri nets using Simulation Tools
The resulting Petri nets are generated and verified using CPN Tools [10]. The Petri nets are simulated, and we can watch the step-by-step animated distribution of the tokens, which reflects the behavior of the original workflow in YAWL. Moreover, the LTL properties of liveness and safety, concerning the reachability and the deadlock or livelock freedom of the workflow, can be further verified using the plug-in module of CPN Tools.
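As a rough sketch only (the paper provides no code), rules such as (3) and (4) above, which each produce one place connected to one transition, might be realized as follows; all type and method names here are hypothetical.

```java
// Hypothetical sketch of applying the "one place connected to one transition"
// mapping rules (Condition and Atomic Task) while building a Petri net.
import java.util.ArrayList;
import java.util.List;

record Arc(String from, String to) {}

final class PetriNetBuilder {
    final List<String> places = new ArrayList<>();
    final List<String> transitions = new ArrayList<>();
    final List<Arc> arcs = new ArrayList<>();
    private int counter = 0;

    /** Applies rules (3)/(4): generate a place p_i connected to a transition t_i. */
    String[] mapConditionOrAtomicTask(String yawlElementId) {
        counter++;
        String p = "p" + counter + "_" + yawlElementId;
        String t = "t" + counter + "_" + yawlElementId;
        places.add(p);
        transitions.add(t);
        arcs.add(new Arc(p, t)); // the place feeds its transition
        return new String[] { p, t }; // p is wired to the previous parts, t to the subsequent parts
    }
}

public class MappingDemo {
    public static void main(String[] args) {
        PetriNetBuilder builder = new PetriNetBuilder();
        String[] receive = builder.mapConditionOrAtomicTask("ReceiveApplication");
        String[] check = builder.mapConditionOrAtomicTask("CheckCompleteness");
        // Connect the two fragments: the transition of the first feeds the place of the second.
        builder.arcs.add(new Arc(receive[1], check[0]));
        System.out.println(builder.arcs);
    }
}
```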


4 Demonstration and Case Study

In this section, we demonstrate a case study concerning a credit card application process. The YAWL workflow model of this process is shown in Fig. 5 and can be described as follows. The workflow begins when an applicant submits an application. Upon receiving the application, an XOR-Split task checks its completeness and a Condition selects the next path. If the application is not complete, it is held or cancelled depending on conditions represented by atomic tasks. Otherwise, an AND-Split task checks both the document completeness and any warning incidents from the credit bureau. After the results from both paths are collected with an AND-Join, an XOR-Split models the exclusive decision to either approve or reject the application. An approved application is further considered by asking for preferences on any extra features before a credit card is produced and delivered. In the case of rejection, the process ends. Each node of the graph structure (C ∪ T, F) is extracted. We find one input condition i ∈ Ci and one output condition o ∈ Co, which are mapped using the rules in rows 1 and 2 of Table 1. Each task t ∈ T with a single incoming and outgoing arc is considered as either an atomic task t ∈ Ta or a multiple instance task t ∈ Tm. In the case of multiple incoming arcs and/or multiple outgoing arcs connecting to the task t, the function Ttype(t) is used to determine its applicable type, i.e. whether the task t is one of the following types: XOR-Split, XOR-Join, AND-Split, AND-Join, OR-Split, OR-Join. The type of the task t indicates the mapping rule to be applied. Finally, the resulting Petri net generated from the original YAWL workflow model of this case study is shown in Fig. 6.

Fig. 5 A sample of YAWL workflow model [11]


Fig. 6 A credit card application process in petri nets

5 Conclusion

In order to ease the simulation and verification of YAWL, we provide an alternative that transforms a given YAWL workflow model into a corresponding Petri net. In this paper, a set of mapping rules is proposed and illustrated graphically. We demonstrate the transforming approach using a case study showing that the resulting Petri net is generated correctly and verified using a Petri nets simulation tool.

References

1. The YAWL Foundation: YAWL: yet another workflow language. [Online] Available: http://www.yawlfoundation.org. Last visited: 20 Mar 2019 (2019)
2. Ye, J.H., Sun, S.X., Song, W., Wen, L.J.: Formal semantics of BPMN process models using YAWL. In: 2008 Second International Symposium on Intelligent Information Technology Application (2008)
3. Ye, J.H., Sun, S.X., Wen, L., Song, W.: Transformation of BPMN to YAWL. In: Proceedings of the 2008 International Conference on Computer Science and Software Engineering—Volume 02, IEEE Computer Society (2008)
4. Peleska, J.: CSP, Formal Software Engineering and the Development of Fault-Tolerant Systems. Kluwer Academic Publishers (1993)
5. Pornudomthap, S., Vatanawood, W.: Transforming YAWL workflow to BPEL skeleton. In: 2011 IEEE 2nd International Conference on Software Engineering and Service Science (2011)
6. Maneerat, N., Vatanawood, W.: Transformation of UML activity diagram into colored petri nets with inscription. In: 2016 13th International Joint Conference on Computer Science and Software Engineering, JCSSE (2016)
7. Deesukying, J., Vatanawood, W.: Transformation of business rule to CPN ML. In: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science, ICIS (2016)
8. Peterson, James L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall Inc., Englewood Cliffs, N.J. (1981)
9. Rittgen, P.: Paving the road to business process automation. In: Proceedings of the 8th European Conference on Information Systems (ECIS 2000): Trends in Information and Communication Systems for the 21st Century, Vienna, Austria, 3–5 July 2000 (2000)


10. Westergaard, M., Verbeek, H.M.W. (Eric): CPN Tools. [Online] Available: http://cpntools.org. Last visited: 25 Mar 2019 (2019)
11. Ouyang, C., Adams, M., Wynn, M.T., ter Hofstede, A.H.M.: Workflow management. In: Handbook on Business Process Management, vol. 1, pp. 475–506 (2010)

Author Index

A
Ahn, Jin-Ho, 47
Alexander, Kevin, 129

B
Budianto, Christofer Derian, 111

C
Choi, Haechul, 57
Choi, Ji-Won, 17
Chun, Sam-Hyun, 33

F
Fukui, Shinji, 95

H
Han, Sang-Min, 47
Hansun, Seng, 111
Hong, Deok-Gi, 1

I
Ishii, Naohiro, 147
Iswari, Ni Made Satvika, 129
Iwahori, Yuji, 95

J
Jeon, Geyong-Sik, 33

K
Kijsirikul, Boonserm, 95
Kim, Gicheol, 57
Kim, Il-Hwan, 1
Kim, Jong-Bae, 17, 33
Kim, Seok-Yoon, 1
Kim, Youngkyu, 47
Kim, Youngmo, 1

L
Little, James J., 95

M
Marukatat, Rangsipan, 81
Meng, Lin, 95

O
Odagiri, Kazuya, 147

P
Paakbua, Wanwisa, 165
Park, Hyungwoo, 71

S
Shimizu, Shogo, 147

T
Takebayashi, Ayaka, 95

V
Vatanawood, Wiwat, 165

W
Wang, Aili, 95
Wicaksana, Arya, 111, 129

Y
Yun, Sang-Kweon, 17
