
PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE ENGINEERING

Editors: Hamid R. Arabnia, Leonidas Deligiannidis, Ray Hashemi, Fernando G. Tinetti
Associate Editor: Ashu M. G. Solo

CSCE'17: July 17-20, 2017, Las Vegas, Nevada, USA
americancse.org

© CSREA Press

This volume contains papers presented at The 2017 International Conference on Information & Knowledge Engineering (IKE'17). Their inclusion in this publication does not necessarily constitute endorsements by editors or by the publisher.

Copyright and Reprint Permission Copying without a fee is permitted provided that the copies are not made or distributed for direct commercial advantage, and credit to source is given. Abstracting is permitted with credit to the source. Please contact the publisher for other copying, reprint, or republication permission.

© Copyright 2017 CSREA Press ISBN: 1-60132-463-4 Printed in the United States of America

Foreword

It gives us great pleasure to introduce this collection of papers to be presented at the 2017 International Conference on Information and Knowledge Engineering (IKE'17), July 17-20, 2017, at the Monte Carlo Resort, Las Vegas, USA. An important mission of the World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE (a federated congress with which this conference is affiliated), includes "Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes concerted effort to reach out to participants affiliated with diverse entities (such as: universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in the USA. We are proud to report that this federated congress has authors and participants from 64 different nations, representing a variety of personal and scientific experiences that arise from differences in culture and values. As can be seen below, the program committee of this conference, as well as the program committees of all other tracks of the federated congress, are as diverse as its authors and participants.

The program committee would like to thank all those who submitted papers for consideration. About 65% of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory recommendations, a member of the conference program committee was charged to make the final decision; often, this involved seeking help from additional referees. In addition, papers whose authors included a member of the conference program committee were evaluated using the double-blind review process. One exception to the above evaluation process was for papers that were submitted directly to chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were responsible for the evaluation of such submissions. The overall paper acceptance rate for regular papers was 27%; 9% of the remaining papers were accepted as poster papers (at the time of this writing, we had not yet received the acceptance rate for a couple of individual tracks).

We are very grateful to the many colleagues who offered their services in organizing the conference. In particular, we would like to thank the members of the Program Committee of IKE'17, members of the congress Steering Committee, and members of the committees of federated congress tracks that have topics within the scope of IKE. Many individuals listed below will be requested after the conference to provide their expertise and services for selecting papers for publication (extended versions) in journal special issues as well as for publication in a set of research books (to be prepared for publishers including Springer, Elsevier, BMC journals, and others).

• Prof. Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical and Computer Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS, MAMS); The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer); Fellow, Center of Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research (CENTRIC)
• Dr. Travis Atkison; Director, Digital Forensics and Control Systems Security Lab, Department of Computer Science, College of Engineering, The University of Alabama, Tuscaloosa, Alabama, USA
• Dr. Arianna D'Ulizia; Institute of Research on Population and Social Policies, National Research Council of Italy (IRPPS), Rome, Italy
• Prof. Kevin Daimi (Congress Steering Committee); Director, Computer Science and Software Engineering Programs, Department of Mathematics, Computer Science and Software Engineering, University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Zhangisina Gulnur Davletzhanovna; Vice-rector of Science, Central-Asian University, Almaty, Republic of Kazakhstan; Vice President of International Academy of Informatization, Almaty, Republic of Kazakhstan
• Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer Information Systems, Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting Professor, MIT, USA
• Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of Engineering Practice, University of Southern California, California, USA; Adjunct Professor, Electrical Engineering, University of California Los Angeles (UCLA), Los Angeles, California, USA
• Prof. Ray Hashemi (Session Chair, IKE); Professor of Computer Science and Information Technology, Armstrong Atlantic State University, Savannah, Georgia, USA
• Prof. Dr. Abdeldjalil Khelassi; Computer Science Department, Abou Bekr Belkaid University of Tlemcen, Algeria; Editor-in-Chief, Medical Technologies Journal; Associate Editor, Electronic Physician Journal (EPJ) - PubMed Central
• Prof. Louie Lolong Lacatan; Chairperson, Computer Engineering Department, College of Engineering, Adamson University, Manila, Philippines; Senior Member, International Association of Computer Science and Information Technology (IACSIT), Singapore; Member, International Association of Online Engineering (IAOE), Austria
• Dr. Andrew Marsh (Congress Steering Committee); CEO, HoIP Telecom Ltd (Healthcare over Internet Protocol), UK; Secretary General of World Academy of BioMedical Sciences and Technologies (WABT), a UNESCO NGO, The United Nations
• Dr. Somya D. Mohanty; Department of Computer Science, University of North Carolina - Greensboro, North Carolina, USA
• Dr. Ali Mostafaeipour; Industrial Engineering Department, Yazd University, Yazd, Iran
• Dr. Houssem Eddine Nouri; Informatics Applied in Management, Institut Superieur de Gestion de Tunis, University of Tunis, Tunisia
• Prof. Dr., Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of Electrical & Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli University, Nigeria
• Prof. James J. (Jong Hyuk) Park (Congress Steering Committee); Department of Computer Science and Engineering (DCSE), SeoulTech, Korea; President, FTRA; EiC, HCIS Springer, JoC, IJITCC; Head of DCSE, SeoulTech, Korea
• Dr. Prantosh K. Paul; Department of Computer and Information Science, Raiganj University, Raiganj, West Bengal, India
• Dr. Xuewei Qi; Research Faculty & PI, Center for Environmental Research and Technology, University of California, Riverside, California, USA
• Dr. Akash Singh (Congress Steering Committee); IBM Corporation, Sacramento, California, USA; Chartered Scientist, Science Council, UK; Fellow, British Computer Society; Senior Member, IEEE; Member, AACR, AAAS, and AAAI
• Chiranjibi Sitaula; Head, Department of Computer Science and IT, Ambition College, Kathmandu, Nepal
• Ashu M. G. Solo (Publicity); Fellow of British Computer Society; Principal/R&D Engineer, Maverick Technologies America Inc.
• Prof. Fernando G. Tinetti (Congress Steering Committee); School of CS, Universidad Nacional de La Plata, La Plata, Argentina; Co-editor, Journal of Computer Science and Technology (JCS&T)
• Varun Vohra; Certified Information Security Manager (CISM); Certified Information Systems Auditor (CISA); Associate Director (IT Audit), Merck, New Jersey, USA
• Dr. Haoxiang Harry Wang (CSCE); Cornell University, Ithaca, New York, USA; Founder and Director, GoPerception Laboratory, New York, USA
• Prof. Shiuh-Jeng Wang (Congress Steering Committee); Director, Information Cryptology and Construction Laboratory (ICCL); Director, Chinese Cryptology and Information Security Association (CCISA); Department of Information Management, Central Police University, Taoyuan, Taiwan; Guest Editor, IEEE Journal on Selected Areas in Communications
• Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National Institute of Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean Engineering, Virginia Polytechnic Institute & State University, Blacksburg, Virginia, USA
• Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong

We would like to extend our appreciation to the referees and the members of the program committees of individual sessions, tracks, and workshops; their names do not appear in this document but are listed on the web sites of the individual tracks.

As sponsors-at-large, partners, and/or organizers, each of the following (separated by semicolons) provided help for at least one track of the Congress: Computer Science Research, Education, and Applications Press (CSREA); US Chapter of World Academy of Science; American Council on Science & Education & Federated Research Council (http://www.americancse.org/); HoIP, Health Without Boundaries, Healthcare over Internet Protocol, UK (http://www.hoip.eu); HoIP Telecom, UK (http://www.hoip-telecom.co.uk); and WABT, Human Health Medicine, UNESCO NGOs, Paris, France (http://www.thewabt.com/). In addition, a number of university faculty members and their staff (names appear on the cover of the set of proceedings), several publishers of computer science and computer engineering books and journals, chapters and/or task forces of computer science associations/organizations from 3 regions, and developers of high-performance machines and systems provided significant help in organizing the conference as well as providing some resources. We are grateful to them all.

We express our gratitude to keynote, invited, and individual conference/track and tutorial speakers; the list of speakers appears on the conference web site. We would also like to thank the following: UCMSS (Universal Conference Management Systems & Support, California, USA) for managing all aspects of the conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the staff of the Monte Carlo Resort (Convention department) in Las Vegas for the professional service they provided. Last but not least, we would like to thank the Co-Editors of IKE'17: Prof. Hamid R. Arabnia, Prof. Leonidas Deligiannidis, Prof. Ray Hashemi, and Prof. Fernando G. Tinetti.

We present the proceedings of IKE'17.

Steering Committee, 2017 http://americancse.org/

Contents

SESSION: KNOWLEDGE EXTRACTION, KNOWLEDGE MANAGEMENT, AND NOVEL APPLICATIONS

Building a Learning Machine Classifier with Inadequate Data for Crime Prediction (Trung Nguyen, Amartya Hatua, Andrew Sung) ... 3
A lexicon-based method for Sentiment Analysis using social network data (Linh Vu, Thanh Le) ... 10
Knowledge Management and WfMS Integration (Ricardo Anderson, Gunjan Mansingh) ... 17
A Knowledge Management Framework for Sustainable Rural Development: the case of Gilgit-Baltistan, Pakistan (Liaqut Ali, Anders Avdic) ... 24
'Innovation Description Languages, IDLs' & Knowledge Representations, KRs - and Easily Drafting & Testing Patents for Their Total Robustness (Sigram Schindler) ... 29

SESSION: INFORMATION EXTRACTION AND ENGINEERING, PREDICTION METHODS AND DATA MINING, AND NOVEL APPLICATIONS

Optical Polling for Behavioural Threshold Analysis in Information Security (Dirk Snyman, Hennie Kruger) ... 39
Prediction of Concrete Compressive Strength Using Multivariate Feature Extraction with Neurofuzzy Systems (Deok Hee Nam) ... 46
Business and Technical Characteristics of the Bid-Process Information System (BPIS) (Sahbi Zahaf, Faiez Gargouri) ... 52
Deep Convolutional Neural Networks for Spatiotemporal Crime Prediction (Lian Duan, Tao Hu, En Cheng, Jianfeng Zhu, Chao Gao) ... 61
Proposed Method for Modified Apriori Algorithm (Thanda Tin Yu, Khin Thidar Lynn) ... 68
Simplified Long Short-term Memory Recurrent Neural Networks: part I (Atra Akandeh, Fathi Salem) ... 74
Simplified Long Short-term Memory Recurrent Neural Networks: part II (Atra Akandeh, Fathi Salem) ... 78
Simplified Long Short-term Memory Recurrent Neural Networks: part III (Atra Akandeh, Fathi Salem) ... 82
Extraction of Spam and Promotion Campaigns using Social Behavior and Similarity Estimation from Compromised Accounts (Selvamani Kadirvelu, Nathiya S, Divyasree I R, Kanimozhi S) ... 86

SESSION: INFORMATION RETRIEVAL, DATABASES, AND NOVEL APPLICATIONS

The Semantic Structure of Query Search Results (Ying Liu) ... 95
Migration of Relational Databases RDB to Database for Objects db4o for Schema and Data (Youness Khourdifi, Mohamed Bahaj) ... 99
Key Information Retrieval System by using Diagonal Block Based Method (Mie Mie Tin, Nyein Nyein Myo, Mie Mie Khin) ... 104
Developing Corpus Management System: Architecture of System and Database (Olga Nevzorova, Damir Mukhamedshin, Ramil Gataullin) ... 108

SESSION: NATURAL LANGUAGE PROCESSING + TEXT MINING

A Verb-based Algorithm for Multiple-Relation Extraction from Single Sentences (Qi Hao, Jeroen Keppens, Odinaldo Rodrigues) ... 115
Content based Segmentation of Texts using Table based KNN (Taeho Jo) ... 122

SESSION: POSTER PAPERS

Multi-Source Identity Tracking (Gurpreet Singh Bawa) ... 131
A Steganographic Scheme Implemented on BTC-Compressed Image by Histogram Modification (Shih-Chieh Shie) ... 135


SESSION: KNOWLEDGE EXTRACTION, KNOWLEDGE MANAGEMENT, AND NOVEL APPLICATIONS
Chair(s): TBA


Building a Learning Machine Classifier with Inadequate Data for Crime Prediction Trung T. Nguyen, Amartya Hatua, and Andrew H. Sung School of Computing, The University of Southern Mississippi, Hattiesburg, MS 39406, U.S.A.

Abstract—In this paper, we describe a crime prediction method which forecasts the types of crimes that will occur based on location and time. In the proposed method, crime forecasting is done for the jurisdiction of the Portland Police Bureau (PPB). The method comprises the following steps: data acquisition and pre-processing, linking the data with demographic data from various public sources, and prediction using machine learning algorithms. In the first step, data pre-processing is done mainly by cleaning the dataset, formatting, inferring, and categorizing. The dataset is then supplemented with additional publicly available census data, which mainly provides demographic information for the area and the educational, economic, and ethnic background of the people involved; thereby some very important features are imported into the dataset provided by PPB in statistically meaningful ways, which contributes to achieving better performance. Under-sampling techniques are used to deal with the imbalanced dataset problem. Finally, the entire dataset is used to forecast the crime type in a particular location over a period of time using different machine learning algorithms, including Support Vector Machines (SVM), Random Forest, Gradient Boosting Machines, and Neural Networks, for performance comparison.

Keywords: crime prediction, missing features, random forest, gradient boosting, SVM, neural networks

INTRODUCTION

Crime is a common problem in nearly all societies. Several important factors, such as quality of life and the economic growth of a society, are affected by crime. There are many reasons that cause different types of crimes. In the past, criminal behavior was believed to be the result of a possessed mind and/or body, and the only way to exorcise the evil was usually by some torturous means [1]. A person's criminal behavior can be analyzed from different perspectives, such as his/her socio-economic background, education, and psychology. Researchers have done exhaustive research on these factors. Data mining and analytics have contributed to the development of many applications in medical, financial, business, science, technology, and various other fields. Likewise, to obtain a better understanding of crime, machine learning can be used for crime data analytics. Analysis and forecasting of the nature of crime have been done based mainly on the criminal's economic status, race, social background, psychology, and the demographics of a particular location.

In the article by Gottfredson [2], the author discussed how to predict whether a person will become a criminal. He also summarized and reviewed many previous works in order to identify general problems, limitations, potential methods, and the general nature of crime prediction problems. The scope of that paper was limited to individual prediction; it did not address global prediction problems such as predicting the number of offenses or offenders to be expected at a given time and place. In [3], Hongzhi et al. applied an improved fuzzy BP neural network to crime prediction; there is no mention of the place, time, or type of the crime. In [4, 5], Mohler used crime hotspots to forecast a particular type of crime (gun crime) in Chicago; they did not address other issues such as other types of crime and the occurrence time of those crimes. In [6], Tahani et al. focused on all three major aspects of crime forecasting: place of crime, time of crime, and type of crime. They performed their experiments using several machine learning algorithms for the states of Colorado and California in the United States. In their research, they used only a dataset based on the National Incident Based Reporting System (NIBRS) [7], where information related to crime type, crime time, and crime place was present, but they did not consider any information about the demographic, economic, and ethnic details of criminals.

In order to forecast crimes successfully, we need to forecast the three main parameters of a particular crime: its type, location, and time. Also, the methodology of crime prediction should consider the pattern of previously committed crimes and other external factors such as the demographic, economic, and ethnic details of criminals. In the present article, we have taken care of all the above-mentioned factors. Our main objective is to forecast a crime along with its type, location, and time. In the following, we describe data pre-processing, prediction methods, results, and the conclusion.

PROPOSED METHODOLOGY

Our proposed methodology can be broadly divided into four phases: data acquisition, data preprocessing, application of classification algorithms, and evaluation of results and drawing conclusions. A diagrammatic representation of the proposed methodology is given in Fig. 1.


Fig. 1. Proposed methodology

The data is acquired from the Portland Police Bureau (PPB) and the public government source American FactFinder. In the data preprocessing phase, we have performed feature engineering tasks and integrated the PPB dataset and the demographic dataset. Thereafter, we have applied several machine learning algorithms to build the crime prediction models. Finally, we have performed testing on the trained models and evaluated the performance. Detailed descriptions of each of the phases are provided below.

DATA PREPROCESSING

One of the key contributions of this article is data preprocessing. In previous research, either only the crime occurrence data obtained from the police were considered or only data related to the criminals were considered, while in this research both aspects are considered at the same time. In addition, several other data preprocessing techniques are described next.

Description of the Dataset

In our experiments, we have used data from two different sources: first, the dataset provided by the PPB for the period of March 1, 2012, through September 30, 2016 [8], and second, the dataset from the American FactFinder website [9]. The data in the PPB dataset are listed as calls-for-service (CFS) records giving the following information: Category of crime, Call group, Final case type, Case description, Occurrence date, X and Y coordinates of the location of the crime, and Census tract. The data in American FactFinder are the census data of the Portland area. From these data, we obtained information about the economic, educational, employment, and racial background of people in this area.

Data Reduction

When we examined the data from PPB, there were some missing values in the census tract information. Those data points are ignored, as we have enough data to perform our experiments. In the dataset, the four different parameters that describe crime type are Category, Call group, Final case type, and Case description. Of these four, we forecast only the Final case type; the other parameters are not important in this case, so we have not included them in our experiments. Thereafter, we performed dimensionality reduction and obtained a dataset with a reduced number of dimensions. Our reduced dataset has a total of five parameters: Final case type, Case description, Occurrence date, X and Y coordinates of the location of the crime, and Census tract.

Data Transformation

In the PPB dataset, one of the fields is the census tract. The state and county information has been removed from this field. Therefore, we have to convert that field into an eleven-digit census tract, because in a later part of our experiments it is necessary to integrate the PPB data with data from the American FactFinder dataset using the census tract as the join key. As the PPB data were collected from Multnomah County of Oregon State, the first five digits of the census tract are 41051. The last six digits are formed by zero-padding the original census tract value from the PPB dataset. We thus obtain the standard eleven-digit census tract in which the first five digits are 41051 and the last six digits are the census tract. The total area of each census tract differs from the others, so we divided each census tract into small clusters with a fixed area of 0.71 square miles. The number of clusters in a particular census tract depends on the total area of that census tract. A new parameter named Area Code is derived to identify the location of a crime using the census tract of that place and the cluster number within that place. In our original dataset, the X and Y coordinates of the crime location are provided, and from this information we have created the clusters of an area. If a crime is committed in cluster number "MM" of census tract "41051000XXX", then the Area Code of that crime will be "41051000XXXMM". This is a unique location id, and in the rest of this article this parameter is used to identify the location of a crime. A minimal code sketch of this transformation is given below.
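The tract normalization and Area Code construction described above can be sketched as follows. This is a minimal illustration only: the column names (census_tract, x, y) and the cluster-assignment function are assumptions, since the paper does not publish its preprocessing code.

```python
# Sketch of the census-tract padding and Area Code construction described above.
# Column names (census_tract, x, y) and assign_cluster() are illustrative assumptions.
import pandas as pd

STATE_COUNTY_PREFIX = "41051"  # Oregon (41), Multnomah County (051)

def to_eleven_digit_tract(raw_tract) -> str:
    """Zero-pad the PPB tract value to six digits and prepend the state/county code."""
    return STATE_COUNTY_PREFIX + str(raw_tract).zfill(6)

def assign_cluster(x: float, y: float) -> int:
    """Placeholder for the fixed-area (0.71 sq. mile) gridding of a tract.
    A real implementation would bin the X/Y coordinates into grid cells."""
    return int(x // 1000) % 100  # illustrative only

def add_area_code(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(subset=["census_tract"]).copy()      # drop rows with missing tract
    tract = df["census_tract"].map(to_eleven_digit_tract)
    cluster = [assign_cluster(x, y) for x, y in zip(df["x"], df["y"])]
    df["area_code"] = tract + pd.Series(cluster, index=df.index).map("{:02d}".format)
    return df
```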

Fig. 2. Different crimes occur in different census tracts of Portland

Fig. 2 visualizes crime hotspots in different census tracts on the Portland map. Each crime hotspot is represented by a pie chart. The size of each pie chart is proportional to the number of crimes that happened in the corresponding census tract.

Data Discretization

In the PPB dataset, every crime has a corresponding occurrence time and date. However, our objective is to predict crime within a span of seven days. Therefore, instead of a particular date, the date is converted into the corresponding day of the week and week number in the year. We can then divide all the crimes by their occurrence day out of the seven days of the week and their occurrence week out of the 53 weeks of the year. That makes it easy to handle the data and also helps to achieve our target more easily. A sketch of this conversion is shown below.
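The following is a minimal sketch of the date discretization step, assuming the occurrence date is stored in a column named occ_date (the column name is an assumption) and that pandas' ISO calendar accessor is available.

```python
# Sketch of the date discretization step: replace the exact occurrence date with
# the day of the week (1-7) and the week of the year (1-53).
import pandas as pd

def discretize_dates(df: pd.DataFrame) -> pd.DataFrame:
    dates = pd.to_datetime(df["occ_date"])
    iso = dates.dt.isocalendar()          # year / week / day components
    df["week_day"] = iso["day"]           # 1 = Monday ... 7 = Sunday
    df["week_of_year"] = iso["week"]      # 1 .. 53
    return df.drop(columns=["occ_date"])
```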

Data Integration

Data integration is one of the most important steps of this work on crime prediction. Economic, demographic, educational, and ethnic information about the people of Multnomah County, Oregon, is collected from the census data provided by American FactFinder and then integrated with the data provided by PPB. A total of 21 features are added; the features are described in Table I below.

TABLE I. FEATURES ADDED BY CENSUS DATA

Demographic ID    Description
HC01_EST_VC01     Total Population
HC02_EST_VC01     Below poverty level; Estimate; Population for whom poverty status is determined
HC03_EST_VC01     Percent below poverty level
HC01_EST_VC14     Total population of One race - White
HC01_EST_VC15     Total population of One race - Black or African American
HC01_EST_VC16     Total population of One race - American Indian and Alaska Native
HC01_EST_VC17     Total population of One race - Asian
HC01_EST_VC18     Total population of One race - Native Hawaiian and Other Pacific Islander
HC01_EST_VC19     Total population of One race - Some other race
HC01_EST_VC20     Total population of Two or more races
HC01_EST_VC28     EDUCATIONAL ATTAINMENT - Total population of Less than high school graduate
HC01_EST_VC29     EDUCATIONAL ATTAINMENT - Total population of High school graduate
HC01_EST_VC30     EDUCATIONAL ATTAINMENT - Total population of Some college, associate's degree
HC01_EST_VC31     EDUCATIONAL ATTAINMENT - Total population of Bachelor's degree or higher
HC01_EST_VC36     EMPLOYMENT STATUS - Employed
HC01_EST_VC39     EMPLOYMENT STATUS - Unemployed
HC01_EST_VC51     Total population below 50 percent of poverty level
HC01_EST_VC52     Total population below 125 percent of poverty level
HC01_EST_VC53     Total population below 150 percent of poverty level
HC01_EST_VC54     Total population below 185 percent of poverty level
HC01_EST_VC55     Total population below 200 percent of poverty level

We observe that certain relations exist between each of these features and the rate of crimes for a particular census tract. To assess the effectiveness of the demographic data integrated with the PPB data, we compared the performance of our models on two kinds of datasets. Our first dataset contains only the preprocessed PPB data, in which there is no demographic information. In the second type of dataset, we assigned values randomly for those parameters in such a way that the overall percentages are the same. In the experiment, we have 92,715 data points. Based on the data distribution from the demographic data, we generated the missing data using random variables with pre-determined distributions. For example, the HC01_EST_VC01 attribute represents the percentage of people below the poverty level for each census tract. In the present context, 9.1% of people are below the poverty level in census tract 100. So, in the newly generated data, 9.1% of the data points of census tract 100 will be assigned 0 (which denotes below poverty level) and the rest 1 (which denotes not below poverty level). Similarly, for all other attributes, missing data are assigned based on the percentages given in the census data. The poverty level of all census tracts is divided into four levels, from 1 to 4, in which the lower the poverty level, the poorer the people in that census tract. Fig. 3 shows the percentage of census tracts corresponding to the different poverty levels. Similarly, the crime density of census tracts is divided into six levels, denoting increasing crime density from level 1 to level 6. In order to find the relationship between poverty level and crime density, we compare the distribution of crime density corresponding to poverty level in each census tract. Fig. 4 represents the crime density levels of all census tracts that have poverty level 2, while Fig. 5 represents the crime density levels of all census tracts that have poverty level 3. By observation, it can be seen that census tracts with a higher poverty level (i.e., a higher income level) are fewer in number and have lower crime density than census tracts with a lower poverty level. So, there is a relation between poverty level and crime density in a particular census tract. Similarly, we have found some other relations between the parameters mentioned in Table I and the numbers of crimes in different areas of Portland. A minimal sketch of this percentage-based imputation is given below.

Fig. 3. Percentage of different poverty levels

Fig. 4. Occurrences of crimes in areas with poverty level 2
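The per-tract imputation of a binary demographic attribute described above can be sketched as follows. Column names and the example dictionary are illustrative assumptions; the paper does not publish this code.

```python
# Sketch of how a missing binary demographic feature (e.g. below-poverty status) can be
# imputed per census tract from the published percentage, as described above.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def impute_binary_feature(df: pd.DataFrame, pct_by_tract: dict, new_col: str) -> pd.DataFrame:
    """pct_by_tract maps a census tract to the fraction of people with the attribute
    (e.g. {"41051010000": 0.091} for 9.1% below poverty level)."""
    p = df["census_tract"].map(pct_by_tract).fillna(0.0)
    # 0 denotes "has the attribute" (e.g. below poverty level), 1 denotes "does not",
    # matching the coding used in the paper.
    df[new_col] = np.where(rng.random(len(df)) < p, 0, 1)
    return df
```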

Fig. 5. Occurrences of crimes in areas with poverty level 3

LEARNING MACHINE MODELS

For this crime prediction problem, we employed several machine learning algorithms to build models to accomplish the classification task and then compared their results. The learning machines used include Support Vector Machines (SVM) [10, 11, 12], Random Forest [13], Gradient Boosting Machines [14], and multilayer neural networks (using the MATLAB toolbox, employing scaled conjugate gradient training and resilient backpropagation training algorithms) [15, 16, 17, 18]. These learning machines are well known, and details about them can be found in a variety of references. Illustrative instantiations are sketched below.
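The following is a rough, illustrative set of scikit-learn stand-ins for the four learning machines listed above. The hyperparameters shown are only those mentioned in the text (Gaussian/RBF kernel, minimum leaf size of 5, around 100 trees, two hidden layers); the original neural networks were trained with the MATLAB toolbox, so MLPClassifier here is merely an approximation, not the authors' implementation.

```python
# Illustrative stand-ins for the four learning machines compared in the paper.
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

models = {
    "SVM (Gaussian kernel)": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, min_samples_leaf=5),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
    "Neural Network (2 hidden layers)": MLPClassifier(hidden_layer_sizes=(30, 50)),
}
```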

RESULTS

All the models described in the previous section were trained and tested on our crime prediction tasks. The following sections present the results. The first subsection discusses our solution to the imbalanced dataset problem we face. The second and third subsections describe the results for the two kinds of datasets we have after preprocessing.

Dealing with the imbalanced dataset

Below is the distribution of our compiled dataset after preprocessing. In TABLE II, Class 1, Class 2, Class 3, and Class 4 represent the classes STREET CRIMES, OTHER, MOTOR VEHICLE THEFT, and BURGLARY, respectively.

TABLE II. NUMBER OF DATA POINTS FOR DIFFERENT CLASSES

Class 1    Class 2    Class 3    Class 4    Total
20,216     70,390     1,221      924        92,751

In TABLE II, we can see that there is a big difference between the numbers of each crime type because of the nature of crime occurrence probability. According to Chawla [19, 20], there are four ways of dealing with imbalanced data:

1) Adjusting class prior probabilities to reflect realistic proportions
2) Adjusting misclassification costs to represent realistic penalties
3) Oversampling the minority class
4) Under-sampling the majority class

In this project, we have applied the under-sampling of the majority class technique. In our dataset, class 3 and class 4 are the minority classes, and class 1 and class 2 are the majority classes, because the number of samples of class 3 and class 4 is much smaller than the number of samples of class 1 and class 2. The steps to construct the new dataset T from the original dataset are as follows:

• We applied k-means clustering with the number of clusters k = 2000 on the two majority classes (class 1 and class 2).
• From each cluster of class 1 and class 2, we select m random samples and put them into the new dataset T. In our experiment, we chose m ranging from 1 to 5 to select representative samples from the clusters of the majority classes.
• We put all samples of class 3 and class 4 into the new dataset T.

After the new dataset T is constructed, 10-fold cross-validation is used to divide this dataset into a training set and a test set, which are fed into the selected learning machines to build classification models. These models are then validated on the original dataset to benchmark the performance. A minimal sketch of this under-sampling procedure is given below.
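The cluster-based under-sampling above can be sketched as follows. This is a minimal illustration assuming a NumPy feature matrix X and label vector y; the paper does not publish its implementation, so the function and variable names are hypothetical.

```python
# Minimal sketch of the cluster-based under-sampling used to build dataset T:
# k-means (k = 2000) on each majority class, m representatives per cluster,
# all minority-class samples kept.
import numpy as np
from sklearn.cluster import KMeans

def build_undersampled_set(X, y, majority_classes=(1, 2), k=2000, m=1, seed=0):
    rng = np.random.default_rng(seed)
    keep_idx = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        if c not in majority_classes:
            keep_idx.extend(idx)                      # keep every minority sample
            continue
        labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X[idx])
        for cluster_id in range(k):
            members = idx[labels == cluster_id]
            if len(members) > 0:
                keep_idx.extend(rng.choice(members, size=min(m, len(members)), replace=False))
    keep_idx = np.array(keep_idx)
    return X[keep_idx], y[keep_idx]
```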

Prediction with only the PPB dataset

After preprocessing the original PPB data, we have a dataset with 6 features and more than 92,000 records. We then applied the under-sampling technique (described in 5.1 above) to our first dataset to create a training dataset. The new dataset T constructed as described above has 18,145 samples (6,000 samples of class 1; 10,000 samples of class 2; 1,221 samples of class 3; 924 samples of class 4). This training dataset has an almost equal number of samples for the 4 different CFS classes defined by the Portland Police Bureau. The CFS classes correspond to Burglary, Street Crimes, Theft of Auto, and Others. The following subsections discuss the results of applying different machine learning algorithms to our first dataset.

5.2.1 Support Vector Machines (SVM)
After the model was trained using SVM with a Gaussian kernel on the above under-sampled training dataset, we tested the model on all the available samples of our first dataset and obtained overall 72.6% correct prediction of crime types. The confusion matrix of the SVM model and the classification accuracy for each class are given in TABLE III.

TABLE III. RESULTS USING SVM ON FIRST DATASET

          Class 1   Class 2   Class 3   Class 4   Precision
Class 1   10242     6542      2038      1394      50.66%
Class 2   19576     42154     5423      3237      59.89%
Class 3   45        57        1100      19        90.09%
Class 4   37        45        13        829       89.72%
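The evaluation protocol used here (train on the under-sampled set T, then report a confusion matrix on the full original dataset) can be sketched as follows. The paper does not specify its SVM tooling, so this scikit-learn version is illustrative only.

```python
# Sketch of the evaluation protocol: train an RBF-kernel SVM on the under-sampled set T
# and report the confusion matrix on the full original dataset.
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

def evaluate_svm(X_train, y_train, X_full, y_full):
    model = SVC(kernel="rbf").fit(X_train, y_train)
    y_pred = model.predict(X_full)
    cm = confusion_matrix(y_full, y_pred)   # scikit-learn convention: rows = true, columns = predicted
    return accuracy_score(y_full, y_pred), cm
```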

5.2.2 Random Forest
Next, we applied Random Forest to the crime prediction task. We trained random forest models using deep trees with a minimum leaf size of 5 on our first training set and evaluated the resulting models on all samples of our first dataset. We started with 50 trees and added 50 more trees in each subsequent run. In our experiment, the best accuracy occurred when the number of trees was set to 100; increasing the number of trees further did not improve the results. TABLE IV shows the confusion matrix obtained when evaluating the random forest on our first dataset (number of trees = 100).

TABLE IV. RESULT OF RANDOM FOREST ON FIRST DATASET (NUMBER OF TREES IS 100)

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   11427     8704      78        7         56.52%
Class 2   19375     50827     153       35        72.2%
Class 3   583       559       78        1         6.38%
Class 4   455       440       3         26        2.81%

5.2.3 Gradient Boosting Machines
Finally, we applied Gradient Tree Boosting to the prediction task. We directly trained and evaluated this model using AdaBoost on the training set. Generally, employing more decision trees yields higher prediction accuracy (lower mean square error). However, in our experiment, the best result occurred when the number of tree estimators was set to 100, and the results did not improve as the number of trees increased. TABLE V shows the confusion matrix obtained when evaluating Gradient Boosting Machines on our first dataset (number of trees = 100).

TABLE V. RESULT OF GBM ON FIRST DATASET (NUMBER OF TREES IS 100)

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   12019     6491      1125      581       59.45%
Class 2   20648     45685     2511      1546      64.9%
Class 3   64        71        1077      9         88.0%
Class 4   68        47        13        796       86.1%

5.2.4 Neural Networks
Lastly, we trained neural networks on our first dataset using two backpropagation training techniques: Scaled Conjugate Gradient (SCG) and Resilient Backpropagation (RP). We constructed our neural networks with 2 hidden layers. Different combinations of numbers of nodes in the two layers were tested and the classification performance was recorded. With SCG, we obtained the best result of 65.7% classification accuracy with 30 nodes in hidden layer 1 and 50 nodes in hidden layer 2. With RP, we obtained the best result of 66.46% classification accuracy with 45 nodes in hidden layer 1 and 50 nodes in hidden layer 2. TABLE VI and TABLE VII show the confusion matrices of classification using the two best neural network models obtained from the SCG and RP training methods.

TABLE VI. RESULT OF NEURAL NETWORK ON FIRST DATASET WITH SCG TRAINING

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   9459      10523     181       53        46.79%
Class 2   18664     51454     170       102       73.10%
Class 3   565       630       19        7         48.32%
Class 4   414       493       8         9         0.97%

TABLE VII. RESULT OF NEURAL NETWORK ON FIRST DATASET WITH RP TRAINING

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   8750      11337     96        33        43.28%
Class 2   17309     52860     177       44        75.10%
Class 3   529       679       12        1         0.98%
Class 4   406       507       5         6         0.65%
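The tree-count sweep used for Random Forest and Gradient Boosting above can be sketched as follows; the helper name and the use of a held-out evaluation set are illustrative assumptions rather than the authors' exact procedure.

```python
# Sketch of the tree-count sweep described above: train ensembles with an increasing
# number of trees (50, 100, 150, ...) and keep the best-scoring model.
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

def sweep_trees(model_cls, X_train, y_train, X_eval, y_eval, max_trees=300, step=50, **kwargs):
    best_score, best_model = -1.0, None
    for n in range(step, max_trees + step, step):
        model = model_cls(n_estimators=n, **kwargs).fit(X_train, y_train)
        score = model.score(X_eval, y_eval)
        if score > best_score:
            best_score, best_model = score, model
    return best_model, best_score

# Illustrative usage:
# rf, acc = sweep_trees(RandomForestClassifier, X_t, y_t, X_full, y_full, min_samples_leaf=5)
# gbm, acc = sweep_trees(GradientBoostingClassifier, X_t, y_t, X_full, y_full)
```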

Prediction results with the second dataset

After working on our first dataset, we developed our second dataset, which is derived from the original PPB data together with demographic data based on census tract obtained from FactFinder. After preprocessing the original PPB data, the dataset has 6 features. Based on the demographic dataset from FactFinder, 9 new features have been introduced to form the second dataset. The 9 features are: below poverty level status, age, sex, race, education status, employment status, working time status, poverty level status, and past 12 months working status. Based on the data distribution from the demographic data, missing data have been generated using random variables with pre-determined distributions. The resulting dataset contains 15 features and has the same size as our first dataset (over 92,000 records). The under-sampling technique has also been applied to this dataset to create a training dataset of 12,145 samples, which contains 4,000 samples of class 1; 6,000 samples of class 2; 1,221 samples of class 3; and 924 samples of class 4. The following subsections discuss the results of applying different machine learning algorithms to our second dataset.

5.3.1 SVM
After the model was trained using SVM with a Gaussian kernel on the above training dataset, we tested the model on all samples of the second dataset and obtained overall 79.39% correct prediction of crime types. The confusion matrix of the SVM model and the classification accuracy for each class are given in TABLE VIII. We got good results for class 2, class 3, and class 4, but very poor accuracy for class 1.

TABLE VIII. RESULT OF SVM ON SECOND DATASET

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   1398      18816     2         0         7.0%
Class 2   2         70324     34        30        99.9%
Class 3   0         367       854       0         70.0%
Class 4   0         281       1         642       69.5%

5.3.2 Random Forest
Next, we trained a model using Random Forest with deep trees and a minimum leaf size of 5 on the second training dataset to complete the crime prediction task. We trained different Random Forest models ranging from 50 to 300 trees and recorded the classification performance. In our experiment, the best accuracy occurred when the number of trees was set to 250 (65.79% accuracy). TABLE IX shows the resulting confusion matrix and the classification accuracy for each class when training Random Forest with 250 trees.

TABLE IX. RESULT OF RANDOM FOREST ON SECOND DATASET

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   11878     8014      278       46        58.75%
Class 2   473       47647     17978     4292      67.69%
Class 3   0         122       1047      52        85.75%
Class 4   0         133       341       450       48.7%

5.3.3 Gradient Boosting Machines
Next, we trained a model using Gradient Boosting with the AdaBoost training technique on the second training dataset to complete the crime prediction task. We trained different Gradient Boosting Tree models ranging from 50 to 300 trees and recorded the classification performance. In our experiment, the best accuracy occurred when the number of trees was set to 300 (61.67% accuracy). TABLE X shows the resulting confusion matrix and the classification accuracy for each class when training GBM with 300 trees.

TABLE X. RESULT OF GRADIENT BOOSTING TREE ON SECOND DATASET

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   12869     7016      236       95        63.65%
Class 2   459       42298     16491     11142     60.09%
Class 3   0         18        1167      36        95.57%
Class 4   0         14        45        865       93.61%

5.3.4 Neural Networks
Lastly, we trained neural networks on our second dataset using two popular backpropagation training techniques, Scaled Conjugate Gradient (SCG) and Resilient Backpropagation (RP). We constructed our neural networks with 2 hidden layers. Different combinations of numbers of nodes in the two layers were tested and the classification performance was recorded. With SCG, we obtained the best result of 74.02% classification accuracy with 60 nodes in hidden layer 1 and 40 nodes in hidden layer 2. With RP, we obtained the best result of 74.24% classification accuracy with 40 nodes in hidden layer 1 and 25 nodes in hidden layer 2. TABLE XI and TABLE XII show the confusion matrices of classification using the two best neural network models obtained from the SCG and RP training methods.

TABLE XI. RESULT OF NEURAL NETWORK ON SECOND DATASET WITH SCG TRAINING

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   12659     7170      295       92        62.62%
Class 2   952       52402     12304     4732      74.44%
Class 3   4         484       590       143       48.32%
Class 4   1         424       297       202       21.86%

TABLE XII. RESULT OF NEURAL NETWORK ON SECOND DATASET WITH RP TRAINING

          Class 1   Class 2   Class 3   Class 4   Accuracy
Class 1   13373     6493      278       72        62.2%
Class 2   1134      54785     11513     2958      77.8%
Class 3   0         548       581       92        47.6%
Class 4   0         494       308       122       13.2%

Comparison of results

Fig. 6. Prediction accuracy of different classifiers on the first dataset

Fig. 6 above displays the comparison of prediction accuracy of the different classifiers on the first dataset. Among those classifiers, SVM with a Gaussian kernel and AdaBoost (GBM) gave 72.6% and 74.3% correct prediction of crime types, respectively, while Random Forest (RF) and the Neural Network performed poorly. GBM handled the imbalanced data issue best among these classifier models. Fig. 7 shows the comparison of prediction accuracy of the different classifiers on the second dataset. Among these classifier models, RF and GBM (AdaBoost) gave the lowest overall performance, with 65.79% for RF and 61.67% for GBM. However, SVM and the neural networks gave the best overall prediction accuracy (79.39% for SVM, 74.02% for SCG, and 74.24% for RP). Nevertheless, SVM and NN have problems with imbalanced classification between classes, while GBM still handles this problem well.

Fig. 7. Prediction accuracy of different classifiers on the second dataset


CONCLUSION

For the crime prediction problem offered as a competition by the U.S. National Institute of Justice, we started with preprocessing the dataset from the Portland Police Bureau. Then, we attempted to select some helpful features to represent the attributes of the samples in a proper manner. The total forecast area is divided into crime hotspots and integrated with demographic data from FactFinder. Thereafter, machine learning techniques are used to train our prediction models, which calculate the probabilities of different categories of crimes. Because of the large dataset and the problem of imbalanced data, we employed an under-sampling technique on our dataset to reduce the training set size to fewer than 20,000 samples.

According to the results, integrating demographic data from other public sources such as FactFinder and the U.S. Census Bureau has improved the performance of our models significantly. With our first dataset, in which demographic data were not used, as shown in the results of the previous section, SVM does not seem to be a suitable model for this task because of its poor classification accuracy compared to the other methods used. Ensemble methods such as Random Forest and Gradient Boosting turned out to be the two best models when their performance was compared with SVM. These two methods can handle big training sets, and their training time is faster than that of the SVM model. With our second dataset, in which we use demographic data as a reference to generate missing features, SVM and the Neural Network show the best accuracy when compared to the two ensemble methods, Random Forest and Gradient Boosting Machines. Overall, the classification accuracy of the different machines on our second dataset is better than on our first dataset. However, there is imbalanced classification accuracy between the four desired classes, where the resulting models have large misclassification for one of the four classes while the classification of the other three classes is very good.

In the era of big data, analytics are increasingly being used for modeling, prediction, knowledge extraction, decision making, etc. How to make the best use of datasets that are missing important features poses an often very challenging problem in data mining tasks. This research demonstrates a successful approach to building learning machine models with insufficient features. The authors are continuing the work to explore methods to handle imbalanced data, and to develop a more general model to predict crime type, time, and place using the best performing algorithms.

REFERENCES

[1] C. N. Trueman, “Why Do People Commit Crime,” March 2015. http://www.historylearningsite.co.uk/sociology/crime-and-deviance/why-do-people-commit-crime.
[2] Don M. Gottfredson, “Assessment and Prediction Methods in Crime and Delinquency,” Contemporary Masters in Criminology, pp. 337-372, Springer US, 1995.
[3] Yu, Hongzhi, Fengxin Liu, and Kaiqi Zou, “Improved Fuzzy BP Neural Network and its Application in Crime Prediction,” Journal of Liaoning Technical University (Natural Science) 2 (2012): 025.
[4] Mohler, George, “Marked Point Process Hotspot Maps for Homicide and Gun Crime Prediction in Chicago,” International Journal of Forecasting, vol. 30, issue 3, pp. 491-497, July-September 2014.
[5] Office of Justice Programs, “Why Crimes Occur in Hot Spots,” http://www.nij.gov/topics/law-enforcement/strategies/hot-spot-policing/pages/why-hot-spots-occur.aspx.
[6] Tahani Almanie, Rsha Mirza, and Elizabeth Lor, “Crime Prediction Based on Crime Types and Using Spatial and Temporal Criminal Hotspots,” arXiv preprint arXiv:1508.02050, 2015.
[7] DENVER Open Data Catalog, http://data.denvergov.org/dataset/city-and-county-of-denver-crime. [Accessed: 20-May-2015].
[8] National Institute of Justice, https://www.nij.gov/funding/Pages/fy16-crime-forecasting-challenge.aspx#data
[9] American FactFinder, https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml
[10] Boser, Bernhard E., Isabelle M. Guyon, and Vladimir N. Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, ACM, 1992.
[11] Bottou, Léon, Corinna Cortes, John S. Denker, Harris Drucker, Isabelle Guyon, Lawrence D. Jackel, Yann LeCun, et al., “Comparison of Classifier Methods: A Case Study in Handwritten Digit Recognition,” IAPR International Conference, vol. 2, pp. 77-82, 1994.
[12] Tong, Simon, and Edward Chang, “Support Vector Machine Active Learning for Image Retrieval,” Proceedings of the Ninth ACM International Conference on Multimedia, pp. 107-118, ACM, 2001.
[13] Breiman, Leo, “Random Forests,” Machine Learning 45.1 (2001): 5-32.
[14] Freund, Yoav, and Robert E. Schapire, “A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting,” European Conference on Computational Learning Theory, pp. 23-37, Springer Berlin Heidelberg.
[15] Møller, Martin Fodslette, “A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning,” Neural Networks 6.4 (1993): 525-533.
[16] Orozco, José, and Carlos A. Reyes García, “Detecting Pathologies from Infant Cry Applying Scaled Conjugate Gradient Neural Networks,” European Symposium on Artificial Neural Networks, Bruges (Belgium), pp. 349-354, 2003.
[17] Riedmiller, Martin, and Heinrich Braun, “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm,” IEEE International Conference on Neural Networks, pp. 586-591, 1993.
[18] MathWorks, Inc., MATLAB Neural Networks Toolbox, V.9.1.0, https://www.mathworks.com/.
[19] Chawla, Nitesh V., “Data Mining for Imbalanced Datasets: An Overview,” Data Mining and Knowledge Discovery Handbook, Springer US, vol. 4, pp. 853-867, 2005.
[20] Rahman, M. Mostafizur, and D. N. Davis, “Addressing the Class Imbalance Problem in Medical Datasets,” International Journal of Machine Learning and Computing 3.2 (2013): 224.


A lexicon-based method for Sentiment Analysis using social network data

Linh Vu
School of Business Information Technology, University of Economics Ho-Chi-Minh City, Ho-Chi-Minh City, Vietnam
[email protected]

Thanh Le, Ph.D.
School of Business Information Technology, University of Economics Ho-Chi-Minh City, Ho-Chi-Minh City, Vietnam
[email protected]

Abstract—In the era of social media, the use of social networking data to study customers' attitudes toward brands, products, services, or events has become an increasingly dominant trend in business strategic management research. Sentiment analysis, also called opinion mining, is a field of study that aims at extracting opinions and sentiments from natural language text using computational methods. It can assist in search engines, recommender systems, and market research. With the growth of the internet, numerous business websites have been deployed to support online shopping and booking services as well as to allow online reviewing and commenting on the services, in the form of either business forums or social networks. Mining opinions automatically using the reviews from such online platforms is not only useful for customers seeking advice, but also necessary for businesses to understand their customers and to improve their services. There are currently several approaches to sentiment analysis. In this research, we propose a lexicon-based method using sentiment dictionaries with a heuristic method for data pre-processing. We show that our method outperforms state-of-the-art lexicon-based methods.

Keywords— sentiment analysis; text mining; social network; natural language processing; strategic management

I. INTRODUCTION

Understanding what customers think about business products or services has always been one of the most important issues in business strategic management, particularly in business decision making. According to Liu et al. [Liu, 2012], the beliefs, or perceptions of reality, and the choices one makes are somehow conditioned by the way others act. This is true not only for individuals but also for businesses. While consumers hunger for and rely on online advice or recommendations about products and services, businesses demand utilities that can transform customers' thoughts and conversations into customer insights, or that support social media monitoring, reputation management, and voice-of-the-customer programs. Traditionally, individuals usually ask for opinions from friends and family members, while businesses rely on surveys, focus groups, opinion polls, and consultants. In the modern age of Big Data, when millions of consumer reviews and discussions flood the Internet every day, individuals feel overwhelmed with information, and it is likewise impossible for businesses to keep up manually. Thus, there is a clear need for computational methods that automatically analyze sentiment in unstructured text from social media to aid people with information digestion.

Several approaches have therefore been proposed for sentiment analysis using text data. Among those that are based on lexical resources, methods that utilize a dictionary of opinionated synsets (sets of synonyms) such as SentiWordNet¹, or a set of opinion adjective terms such as that of Liu et al.², are widely used these days thanks to their simplicity and effectiveness. SentiWordNet is basically a lexical resource for opinion mining, in which the connections between synsets and opinions are defined based on the WordNet dictionary. Starting with a set of seed words known to be popularly used, the synset set was expanded based on a gloss similarity measure, resulting in a dictionary of 147,306 synsets. That makes SentiWordNet suitable for many real-world sentiment analysis problems. On the other hand, the lexicon of Liu et al. (the Liu lexicon) is a set of around 6,800 English terms classified into positive and negative opinion groups. This set was also initiated from a set of seed adjective terms, meaning either good or bad, and was augmented using a knowledge discovery method based on semantic synonym and antonym relations. Both SentiWordNet and the Liu lexicon therefore provide databases of English terms and sentiment values. Such databases are ready to automate the process of deriving opinion lexicons. Besides their benefits, they have some drawbacks. The Liu lexicon consists of two lists, one of positive terms and the other of negative terms, making it easy to use. Moreover, as it has evolved over the past decade, misspellings, morphological variants, slang, and social-media markup are also included in each list. Thus, the Liu lexicon can perform well on social media text analysis without using advanced methods for text pre-processing. It cannot, however, cover all kinds of real-world problems because of its limitation in terms of sentiment intensity. SentiWordNet, in contrast, has a sentiment-specific definition for every synset, but its large number of selected terms, including those that have no positive or negative sentiment polarity, makes it highly noisy, resulting in its failure to account for sentiment-bearing lexical features relevant to text in microblogs [Hutto et al., 2014]. The significant disagreement³ between SentiWordNet and the gold-standard lexicons, namely the Harvard General Inquirer [Stone et al., 1966] and Linguistic Inquiry and Word Count [Pennebaker, 2001], may be another reason for its worse performance and prevents it from being used widely in real-world problems.

¹ http://sentiwordnet.isti.cnr.it/
² https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
³ http://sentiment.christopherpotts.net/lexicons.html


In this research, we propose a lexical-resource-based approach that combines the method of Baccianella et al. [Baccianella et al., 2010], which utilizes SentiWordNet (SentiWN), with that of Liu et al. (LIU) [Liu et al., 2004, 2005], together with a heuristic data pre-processing method for sentiment-oriented filtering of words. We show that our method outperforms the state-of-the-art lexicon-based methods SentiWN and LIU.

II. BACKGROUND AND RELATED WORK

1. Sentiment Analysis - A challenging problem of Natural Language Processing

Sentiment analysis is a multi-discipline research field that analyzes people's opinions, sentiments, appraisals, attitudes, and emotions toward entities and their attributes expressed in written text. Entities can be products, services, organizations, individuals, events, issues, or discussion topics. Numerous research papers in this field have been published during the past decade, in the form of both proceedings and journal articles, spanning several research disciplines including natural language processing (NLP), machine learning, data mining, information retrieval, e-commerce, management science, and so on. The research works seem to indicate that rather than being a sub-problem of NLP, sentiment analysis is actually more like a mini version or a special case of full NLP [Liu, 2015]. It touches every core area of NLP, such as lexical semantics, coreference resolution, word sense disambiguation, discourse analysis, information extraction, and semantic analysis. The only difference is that sentiment analysis requires the usage of an opinion lexicon. In general, sentiment analysis is a semantic analysis problem, but it highly focuses on determining “positive” or “negative”, without the need to fully understand every sentence or document. Since human language is rarely precise or plainly spoken, NLP is characterized as an NP-complete problem of computer science [Barton et al., 1987]. Sentiment analysis is therefore hard and challenging. Thus, "Anyone who says they're getting better than 70% [today] is lying, generally speaking", said Nick Halstead, CTO of DataSift, in The Guardian (June 2013)⁴.

There are different levels of tasks in sentiment analysis. A basic task is to identify the sentiment polarity of a given text, i.e., to learn whether the opinion expressed in that text is positive, negative, or neutral. A slightly more complicated task is to calculate the sentiment level of the text on a scale from 1 to 5, where the value of 1 stands for the most negative and the value of 5 stands for the most positive. Higher-level tasks in sentiment analysis determine the opinions expressed toward different features of a given entity, e.g., the screen of a cell phone, the services of a restaurant, or the resolution of a camera. Besides that, applications of sentiment analysis in some industries may require complex attitude types, such as “happy”, “sad”, “funny”, “surprised”, and “angry”.



In this research, we aim at improving the accuracy of determining sentiment from short reviews, such as those made available on websites or social media.

2. State-of-the-art overview
Sentiment analysis techniques can be roughly divided into the lexicon-based approach, the machine learning approach and the integrated approach [Medhat et al., 2014]. The lexicon-based approach relies on a sentiment lexicon, a collection of known and precompiled sentiment terms. The machine learning approach, on the other hand, applies popular machine learning algorithms and uses linguistic features to "learn" the sentiment-relevant features of text. The integrated approach combines methods and models from both the lexicon-based and machine learning approaches [Le et al., 2016]. Since sentiment terms are instrumental to sentiment analysis, sentiment lexicons should be used to fully understand sentiments toward a given topic expressed in natural language. Recent research has pointed out that sentiment analysis methods that do not use sentiment lexicons result in poor performance because they lack an understanding of human language while solving an NLP problem [Liu, 2015]. Sentiment lexicons can be collected manually by psychologists or compiled automatically from dictionaries such as WordNet [Miller, 1995] using algorithms or machine-learned methods. Over the years, researchers have designed numerous algorithms to compile such lexicons. A common drawback of the lexicon-based approach is that while manually created sentiment lexicons are gold-standard, they contain only a few thousand popular terms. In contrast, machine-learned lexicons may be a hundred times larger in size but are sometimes not reliable. Thus, choosing the sentiment lexicon to rely on is very important. The following sections describe some popular sentiment lexicons and non-lexicon-based approaches.

2.1. Liu lexicon
The Liu lexicon consists of a set of around 6,800 English words classified into positive and negative opinion groups. The set was initiated from a small set of seed adjective words meaning either good or bad. The seed set was augmented using knowledge discovery methods based on semantic synonym and antonym relations. In WordNet, adjectives are organized into bipolar clusters. Clusters for fast and slow, for example, are shown in Figure 1. Of the two half clusters, one stands for senses of fast and the other stands for senses of slow. Each half cluster is headed by a head synset, in this case fast and its antonym slow. Liu et al. utilized the adjective synonym and antonym sets in WordNet to predict the semantic orientation of adjectives. At first, a small list of seed adjectives tagged with either positive or negative labels is created manually. This seed adjective list is domain independent; for example, great, fantastic, nice, cool are positive adjectives, and bad, dull are negative adjectives. The list is then expanded using WordNet, resulting in a list of 4,783 negative terms and 2,006 positive terms including misspellings, morphological variants, slang, and social-media markup, which are useful for social network data analysis. However, due to its small size, the Liu lexicon cannot cover all real world sentiment analysis problems.
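To make the use of such a term-list lexicon concrete, the following is a minimal sketch of lexicon-based polarity counting. The tiny inline word sets (taken from the seed adjectives mentioned above) and the plain counting rule are illustrative assumptions; in practice the full positive and negative word lists of the Liu lexicon would be loaded from their distribution files.

```python
# Minimal sketch of polarity scoring with two opinion term lists (Liu-style lexicon).
# The small inline sets and the simple counting rule are illustrative assumptions only.

positive = {"good", "great", "fantastic", "nice", "cool"}
negative = {"bad", "dull", "poor", "disappointing"}

def polarity(text):
    """Classify a text by counting positive and negative lexicon hits."""
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("This phone is great and the battery is fantastic"))  # -> positive
```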



Figure 1 Bipolar adjective structure of WordNet [Liu et al., 2004]

2.2. SentiWordNet
SentiWordNet is basically a sentiment intensity lexical resource for opinion mining in which the connections between terms and opinions are defined based on the WordNet dictionary, starting from a set of popularly used seed words. The term set was expanded based on a gloss similarity measure, resulting in a set of around 146,000 terms, which makes SentiWordNet applicable to many real world sentiment analysis problems. SentiWordNet is therefore an automatic annotation of all synsets in WordNet according to the notions of "positivity", "negativity" and "objectivity". Each synset s is associated with three numerical scores, Pos(s), Neg(s), and Obj(s), which indicate how positive, negative and objective the terms in the synset are. Different senses of the same term may thus have different opinion-related properties. Each of these scores ranges in the interval [0.0, 1.0], and their values sum to 1.0 for each synset. Because of its large size, the SentiWordNet lexicon is very noisy: a large majority of its synsets have neither positive nor negative polarity. The lexicon therefore fails to account for sentiment-bearing lexical features relevant to text in microblogs [Hutto et al., 2014].

2.3. Other approaches
Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Machine-learning-based classification of documents, for example, requires two data sets, one for training and the other for testing. On the training set, a set of automatic classifiers learn various characteristics of the documents. The testing set, in turn, is used to validate the performance of the classifiers [Le et al., 2016]. Many modern machine learning algorithms have been used so far, including unsupervised methods [Turney, 2002; Kennedy et al., 2006] and supervised methods such as NB, ME and SVMs, to improve performance. In addition to machine learning methods, customized techniques for sentiment classification have also been proposed [Dave et al., 2003]. Like unsupervised learning methods, they actually rely on sentiment lexicons. The use of machine learning methods has contributed significantly to the field. Socher et al. [Socher et al., 2013] applied a Recursive Neural Tensor Network model to a sentiment treebank and pushed the state of the art in single-sentence polarity classification from 80% up to 85.4%. Kotzias et al. [Kotzias et al., 2015] recently claimed a rejection rate of 0% with their new method, in which a deep learning technique was applied to yield a slightly higher accuracy than the method of Socher et al., 85.7 percent versus 85.5 percent. While machine learning techniques require large datasets and computational time, they still cannot achieve significantly high performance without sentiment lexicons and appropriate models. Thus, this research introduces a lexicon-based method that combines two popular sentiment lexicons and utilizes a heuristic data pre-processing method to overcome the drawbacks of existing lexicon-based approaches.

III. PROPOSED METHODS
In terms of human language, texts are made up of words, and words are defined in a dictionary. Thus, to understand a given text, we propose to look up the words of the text in the dictionary for their meanings and then sum up the sentiment scores. Our proposed method essentially replicates the way human beings solve this problem. The method consists of three key subtasks, described in the following sections.

1. Lexicon loading
1.1. Liu lexicon
The LIU method is based on a dictionary of sentiment terms divided into two sets, a positive set and a negative one. The use of the Liu lexicon is therefore quite simple.

1.2. SentiWordNet
In the SentiWN method, the positive (PosS) and negative (NegS) polarity scores are assigned to synsets independently. Each synset consists of numerous terms that have the same meaning in a specific context. Terms are ranked based on their popularity in that context. Information about synsets is stored in the SentiWordNet database using the format of which an example is given in Table 1; each column stands for a synset attribute.

| POS | ID | PosS | NegS | Synset terms (with rank) | Gloss |
|-----|----------|-------|-------|--------------------------------|-------------------------------------------------|
| n | 05159725 | 0.5 | 0 | good#1 | benefit; "for your own good" |
| a | 01586752 | 1 | 0 | good#6 | agreeable or pleasing; "we all have a good time" |
| a | 01068306 | 0.375 | 0.125 | unspoilt#1 unspoiled#1 good#20 | not left to spoil; "the meat is still good" |
| n | 03076708 | 0 | 0 | trade_good#1 good#4 commodity#1 | articles of commerce |

Table 1 Example of synsets from SentiWordNet
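As an illustration of the entry format shown in Table 1, the sketch below parses a few SentiWordNet-style rows and aggregates them into a per-term sentiment dictionary, following the averaging of (PosS - NegS) over a term's synsets that is described in the next paragraphs. The tuple layout and variable names are assumptions made for illustration, not the actual database schema.

```python
# Illustrative parsing of SentiWordNet-style entries (format as in Table 1) into a
# per-term sentiment dictionary.

from collections import defaultdict

# (POS, ID, PosS, NegS, terms-with-ranks, gloss)  -- illustrative layout
ROWS = [
    ("n", "05159725", 0.5,   0.0,   ["good#1"],                                "benefit"),
    ("a", "01586752", 1.0,   0.0,   ["good#6"],                                "agreeable or pleasing"),
    ("a", "01068306", 0.375, 0.125, ["unspoilt#1", "unspoiled#1", "good#20"],  "not left to spoil"),
    ("n", "03076708", 0.0,   0.0,   ["trade_good#1", "good#4", "commodity#1"], "articles of commerce"),
]

scores = defaultdict(list)
for _pos, _sid, pos_s, neg_s, terms, _gloss in ROWS:
    for term in terms:
        word = term.split("#")[0]           # strip the sense rank
        scores[word].append(pos_s - neg_s)  # synset score, in [-1, +1]

sentiment_dict = {w: sum(v) / len(v) for w, v in scores.items()}
print(sentiment_dict["good"])               # average over the four senses of "good"
```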

Note that each term may be involved in many synsets and may have different sentiment scores and ranks. Use of SentiWordNet therefore requires computing an overall score for each term. Given the synsets of SentiWordNet, each of which includes many term instances with ranks, let T be the term whose overall sentiment score is to be computed from all synsets in which T appears. The computing process consists of two main steps:

Step 1: Traverse all synsets of SentiWordNet
● Score each synset: score(s) = PosS(s) - NegS(s)
● Extract the synset into words and their ranks

Step 2: Compute the average score of term T
● score(T) = ( sum of score(s) over all synsets s containing T ) / ( number of such synsets )

This results in a "dictionary" of terms and their sentiment scores. Scores are on a scale of [-1, +1]. If a score is positive, the corresponding term has a positive meaning; otherwise, it has a negative meaning. Figure 2 demonstrates how a sentiment dictionary is generated from SentiWordNet.

Figure 2 Example of sentiment dictionary created from SentiWordNet

2. Text pre-processing
This is an essential task in text mining, especially in social network data analysis. Our pre-processing procedure includes four steps (text extraction, text cleaning, stemming and negation marking) and is essentially a heuristic method. Heuristics drawn from experience with syntax and grammar are applied to make it easy to match words in the text with those from the lexicons in the following task. This approach does not guarantee an optimal or perfect solution. However, an approximate method, together with additional downstream analyses, is an acceptable approach for text sentiment analysis.

2.1. Text extraction
The reviews, which consist of a few sentences, are first split into clauses based on clause-level punctuation marks. A clause-level punctuation mark is any token matching the regular expression ^[.,:;!?]$. This is actually an intermediate step for negation marking.

2.2. Text cleaning
This step removes special characters and turns uppercase letters into lowercase ones.

2.3. Stemming
Stemming is a heuristic method for collapsing distinct word forms by trying to remove affixes. A noun can be in either singular or plural form, using either the -es or -s suffix. Similarly, a verb can be in present or past participle form, using -ing and -ed respectively. Adjectives can be in comparative form using the -er suffix or superlative form using the -est suffix. The base form is then used for looking up the word in the dictionary.

2.4. Negation marking
Negators are words and phrases that switch the sentiment orientation of other words in the same sentence. For example, the sentence "I don't like this book" expresses a negative opinion although the word "like" is positive. Thus, negation needs to be handled with care. In the LIU method, the sentiment orientation of words that appear close to "not", "no" and "yet" is simply reversed [Liu et al., 2004]. In this research, the negation marking method of Christopher Potts (http://sentiment.christopherpotts.net/lingstruc.html#negation) is applied. This method appends a _NEG suffix to every word standing between a negator and a clause-level punctuation mark. To improve performance, additional negators are also considered, including the following words and phrases: not, dont, cant, shouldnt, couldnt, havent, hasnt, hadnt, wont, doesnt, didnt, isn, arent, aint, never, no, noone, none, nowhere, nothing, neither, nor, no sense of, lack, lacks, lacked, lacking, far from, away from, have yet to, has yet to, had yet to. For example, given the text "It is not delicious: it is TOO spicy!!!", the content after applying the pre-processing method is "it is not delicious_NEG it is too spicy".

3. Sentiment scoring
To compute the sentiment score, we propose a simple method that combines the LIU method and the SentiWordNet method. Given t, the text to analyze, denote by n the length of t and by w_i the i-th word in t, i = 1..n. Each word is first assigned a negation coefficient neg(w_i), equal to -1 if w_i carries the "_NEG" mark and +1 otherwise.

3.1. LIU scoring method
The LIU score of the text sums, over all words, the negation coefficient multiplied by the word's Liu-lexicon polarity (+1 for positive terms, -1 for negative terms, 0 otherwise).

3.2. SentiWordNet scoring method
The SentiWordNet score of the text sums, over all words, the negation coefficient multiplied by the word's score in the sentiment dictionary derived from SentiWordNet.

3.3. SentiWordNet scoring method using word count
This variant counts sentiment-bearing words instead of summing their intensities: it sums, over all words, the negation coefficient multiplied by the sign (+1, -1 or 0) of the word's SentiWordNet dictionary score.

3.4. Proposed scoring method
The proposed score combines the three scores above using three weights (the HSL parameters reported in Section IV); the text is considered positive, negative or neutral according to whether the combined score is positive, negative or zero. The average score is in the range of [-1, +1].
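To make the overall pipeline concrete, the sketch below combines the negation marking of Section 2.4 with the three scoring components of Section 3. The helper names (mark_negation, score_text), the normalization by text length and the exact way the three components are weighted are our own illustrative assumptions; only the weight settings themselves correspond to the HSL parameter values reported in Section IV.

```python
# Illustrative sketch of the pre-processing and scoring pipeline described above.
# The function names, the normalization and the combination details are assumptions.

import re

NEGATORS = {"not", "dont", "cant", "never", "no", "nothing", "neither", "nor"}
CLAUSE_PUNCT = re.compile(r"^[.,:;!?]$")

def mark_negation(tokens):
    """Append _NEG to every token between a negator and a clause-level punctuation mark."""
    marked, in_scope = [], False
    for tok in tokens:
        if CLAUSE_PUNCT.match(tok):
            in_scope = False          # punctuation closes the negation scope and is dropped
            continue
        if tok in NEGATORS:
            in_scope = True
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if in_scope else tok)
    return marked

def score_text(tokens, liu_pos, liu_neg, swn_scores, weights=(0.4, 0.5, 0.1)):
    """Weighted combination of the LIU, SentiWordNet word-count and SentiWordNet scores."""
    w_liu, w_count, w_swn = weights
    s_liu = s_count = s_swn = 0.0
    for tok in tokens:
        sign = -1.0 if tok.endswith("_NEG") else 1.0
        word = tok[:-4] if tok.endswith("_NEG") else tok
        if word in liu_pos:
            s_liu += sign
        elif word in liu_neg:
            s_liu -= sign
        swn = swn_scores.get(word, 0.0)   # average synset score, in [-1, +1]
        s_swn += sign * swn
        if swn > 0:
            s_count += sign
        elif swn < 0:
            s_count -= sign
    n = max(len(tokens), 1)               # normalize so the result stays in [-1, +1]
    return (w_liu * s_liu + w_count * s_count + w_swn * s_swn) / n

# Example: a negated sentence ends up with a negative combined score.
tokens = mark_negation("it is not delicious : it is too spicy".split())
print(score_text(tokens, {"delicious"}, {"spicy"}, {"delicious": 0.75, "spicy": -0.25}))
```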



IV. EXPERIMENTAL RESULTS
There are three different versions of our proposed method: (1) LPL, which uses the Liu lexicon with the data pre-processing method; (2) SPL, which uses SentiWordNet with the data pre-processing method; and (3) HSL, the hybrid version described in detail in Section III. Experiments were done using these three versions in comparison with the SentiWN method (SentiWordNet) and the LIU method [Liu et al., 2004, 2005].

1. Datasets
In order to evaluate and compare the performance of LPL, SPL and HSL with LIU and SentiWN, we utilized the real datasets made available online by Amazon, IMDb, and Yelp [Kotzias et al., 2015] (https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences). These datasets have been widely used in recent research works. Each dataset includes 1000 reviews that are manually labelled, 0 and 1 for negative and positive respectively.
- Amazon dataset: reviews and scores of goods sold on the Amazon website, mainly from transactions on cell phones and phone accessories. Part of the dataset was collected by McAuley et al. [McAuley et al., 2013].
- IMDb dataset: IMDb movie reviews, originally introduced by Maas et al. [Maas et al., 2011] as a benchmark for sentiment analysis.
- Yelp dataset: reviews of restaurant services made publicly available on the Yelp website (https://www.yelp.com/dataset_challenge).

2. Experimental results
Results of the five methods are shown in Table 2. Without the data pre-processing method, SentiWN performed worst, achieving the lowest accuracy of 50%. LIU performed a little better, achieving an accuracy of about 74%, thanks to its capability of working on review analysis without text pre-processing. By applying the heuristic text pre-processing method, SPL and LPL produced significant improvements. Interestingly, SPL performed much better than SentiWN, with an accuracy 28.6 percentage points higher. LPL also performed better than both LIU and SentiWN. However, HSL performed best of all. Regarding the performance of HSL on the three datasets, HSL achieved an accuracy of 84.8% on the Amazon dataset, its highest. On the Yelp dataset its accuracy was a little lower, around 83%. It yielded a lower accuracy on the IMDb dataset, which is nevertheless higher than that of the other methods. HSL is the best method among those discussed in this study. The experimental results also indicate that movie reviews are apparently harder to analyze than reviews of other product types [Turney, 2002; Dave et al., 2003], because opinions about movies, especially negative opinions, are not clearly and directly expressed. For example, reviews such as "God, and I can never get that 90 minutes back!" or "1/10 - and only because there is no setting for 0/10" are not totally clear and are hard to understand. In addition, a proper setting of the three HSL weighting parameters may impact the algorithm's performance. For example, when they were set to 0.4, 0.5 and 0.1 respectively, HSL yielded a better result, 82.3% average accuracy, than with the parameters set to ⅓, ⅓ and ⅓, which achieves 81.4%. This is because the Liu-lexicon-based scoring method and the SentiWordNet word-count scoring method work much better; they are therefore given higher priority.

| Method (parameters) | Amazon (1000) | IMDb (1000) | Yelp (1000) | Average |
|---------------------|---------------|-------------|-------------|---------|
| SentiWN | 49.9% | 50% | 50% | 50% |
| LIU | 74.7% | 75.3% | 70.6% | 73.5% |
| SPL | 80.9% | 74.3% | 80.6% | 78.6% |
| LPL | 77.3% | 75.9% | 72.8% | 75.3% |
| HSL (⅓, ⅓, ⅓) | 83.7% | 78% | 82.6% | 81.4% |
| HSL (0.4, 0.4, 0.2) | 84.8% | 78.2% | 83.2% | 82.1% |
| HSL (0.4, 0.5, 0.1) | 84.8% | 78.7% | 83.4% | 82.3% |

Table 2: Experiment results
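A minimal sketch of the evaluation loop used to produce accuracies of this kind is given below. It assumes the tab-separated "sentence \t label" layout of the UCI distribution of these datasets; the file name and the classify() placeholder are assumptions, not the authors' actual evaluation code.

```python
# Illustrative evaluation loop over a labelled review file (one review and a 0/1
# label per line, tab-separated). The file name and classify() are assumptions.

def evaluate(path, classify):
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "\t" not in line:
                continue
            text, label = line.rsplit("\t", 1)
            predicted = 1 if classify(text) > 0 else 0
            correct += (predicted == int(label))
            total += 1
    return correct / total if total else 0.0

# e.g. accuracy = evaluate("amazon_cells_labelled.txt",
#                          classify=lambda t: score_text(mark_negation(t.lower().split()), ...))
```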

In summary, the three versions of our proposed method, SPL, LPL and HSL, performed better than SentiWN and LIU. The hybrid method HSL performs best, and is also much better than other methods that use a lexicon-based approach [Socher et al., 2013].

V. DISCUSSIONS

1. Proposed method limitations
The proposed lexicon-based method is simple but performs quite effectively. It does, however, have some drawbacks. Firstly, it was designed for sentiment analysis of reviews in the form of short texts on social networks, where people usually share quick thoughts about a given topic. For long texts such as articles and books, where many topics and entities are mentioned simultaneously, additional analyses such as classification are required to extract the topic, related entities and aspects before conducting sentiment analysis. Secondly, the proposed method heavily depends on the lexicon being used. As indicated in the introduction, the lexicon should be chosen properly. Although SentiWordNet is the largest sentiment lexicon, the Liu lexicon is the one that works best on review sentiment analysis. For social network data, where there are numerous slang words and emoticons, a more specialized lexicon is needed to handle them all. Finally, because it does not model text structure, the proposed method treats all words of the text equally, while they clearly play different roles based on their different meanings. This may cause the method to focus improperly on





unimportant parts of the text, resulting in an incorrect identification of the key opinion from the text.

2. State-of-the-art problems
Although sentiment analysis has been a very active research field in recent years, a number of challenging problems remain. Firstly, sarcasm: sarcasm is a sophisticated form of speech act in which speakers or writers say or write the opposite of what they mean. Sarcastic sentences are very difficult to deal with in sentiment analysis since the opinion is not clearly and directly expressed, and they are also hard to recognize. Consider the following sentences as examples: "Glad I didn't pay to see it", "Essentially you can forget Microsoft's tech support". In our experience, sarcastic sentences are more commonly used in reviews of movies than of other products and services. Secondly, conditional sentences are hard to deal with [Narayanan et al., 2009]. For instance, "And, FINALLY, after all that, we get to an ending that would've been great had it been handled by competent people and not Jerry Falwell." and "Even if you love bad movies, do not watch this movie". Lastly, objective sentences can imply sentiment [Liu, 2015], for example: "I bought this camera and it broke on the second day" and "What on earth is Irons doing in this film?".

3. Further directions
Our further directions stem from the drawbacks of the proposed method discussed in Section V.1. Firstly, there is a need to perform POS tagging in the text pre-processing method. POS tagging is the process of assigning part-of-speech tags, also known as word classes or syntactic categories (e.g. noun, verb, adjective), to every word in a given sentence. POS tagging helps with the lexical ambiguity problem, because a word may have many different meanings across its various classes. The synsets in SentiWordNet are, in fact, words with POS, so use of SentiWordNet may bring a benefit regarding this issue. In addition, knowing a word's category also says a lot about the syntactic structure around the word, which allows further investigation of that structure in order to fully understand the text. There are currently numerous popular POS taggers, for example the Stanford Log-linear Part-Of-Speech Tagger [Toutanova et al., 2003]. Secondly, regarding the clause extraction step, besides punctuation marks, clause-level markers that are words and phrases, e.g. as, but, however, therefore, although, because, due to, should also be used. Thirdly, sentiment analysis should be performed only if domain-specific sentiment lexicons are leveraged. Sentiment classification is sensitive to the domain because words and language structures used in different domains may express different opinions; a given word may carry different sentiment polarity in different domains. Moreover, people usually compare different things in their statements. Therefore, further investigation at the aspect level to identify the opinion toward a specific entity and feature still needs to be considered, although there has already been a number of research works on this problem [Liu et al., 2004]. Lastly, an important issue with social network data analysis is emoticons, slang words and abbreviations, for instance lol, omg, wtf, lmao, :)), :(, :-D, B-). Thus, construction of a dictionary for them should be taken into account [Kouloumpis et al., 2011], and large and varied datasets should be used to validate the benefit of using it.

VI. CONCLUSION
Sentiment analysis remains a challenging problem and attracts the interest of many researchers from different disciplines. Its applications are practical, promising and varied across many industries, including business strategic management. In this paper, we have introduced a lexicon-based method for sentiment analysis of social networking data. Our proposed method is new and effective because it combines the most popular lexicon-based sentiment analysis methods, the SentiWN and LIU methods. In addition, the application of a data pre-processing step in our method allows opinion-oriented words to be filtered from the text before conducting sentiment analysis. This allows the method to achieve better performance than state-of-the-art methods using lexicon-based approaches. The experimental results in Section IV indicated that our methods outperformed the other methods using the same approach. Our directions for future work are (1) to make a deeper study of language structures to fully understand opinion at the aspect level, (2) to leverage additional lexicons and pre-processing algorithms to achieve better results, and (3) to evaluate the proposed method on large and varied datasets.

REFERENCES
[1] [Asur et al., 2010] Asur, S. and Huberman, B.A., 2010, August. Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on (Vol. 1, pp. 492-499). IEEE.
[2] [Baccianella et al., 2010] Baccianella, S., Esuli, A. and Sebastiani, F., 2010, May. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In LREC (Vol. 10, pp. 2200-2204).
[3] [Barton et al., 1987] Barton, G.E., Berwick, R.C. and Ristad, E.S., 1987. Computational complexity and natural language. MIT Press.
[4] [Dave et al., 2003] Dave, K., Lawrence, S. and Pennock, D.M., 2003, May. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International Conference on World Wide Web (pp. 519-528). ACM.
[5] [Fan et al., 2014] Fan, W. and Gordon, M.D., 2014. The power of social media analytics. Communications of the ACM, 57(6), pp.74-81.
[6] [Liu et al., 2004] Hu, M. and Liu, B., 2004, August. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 168-177). ACM.
[7] [Hutto et al., 2014] Hutto, C.J. and Gilbert, E., 2014, May. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media.
[8] [Jansen et al., 2009] Jansen, B.J., Zhang, M., Sobel, K. and Chowdury, A., 2009. Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 60(11), pp.2169-2188.
[9] [Kennedy et al., 2006] Kennedy, A. and Inkpen, D., 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2), pp.110-125.
[10] [Kotzias et al., 2015] Kotzias, D., Denil, M., De Freitas, N. and Smyth, P., 2015, August. From group to individual labels using deep features. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 597-606). ACM.



[11] [Kouloumpis et al., 2011] Kouloumpis, E., Wilson, T. and Moore, J.D., 2011. Twitter sentiment analysis: The good the bad and the omg!. ICWSM, 11(538-541), p.164.
[12] [Le et al., 2016] Le, H.S. and Ho, T.T., 2016. Applying opinion mining for exploring foreign visitors' preferences on hotel services. Information Systems in Business and Management Conference (ISBM16), ISBN: 978-604-922-440-9, UEH Publishing House.
[13] [Liu et al., 2005] Liu, B., Hu, M. and Cheng, J., 2005, May. Opinion observer: Analyzing and comparing opinions on the web. In Proceedings of the 14th International Conference on World Wide Web (pp. 342-351). ACM.
[14] [Liu, 2012] Liu, B., 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), pp.1-167.
[15] [Liu, 2015] Liu, B., 2015. Sentiment analysis. Cambridge University Press.
[16] [Maas et al., 2011] Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y. and Potts, C., 2011, June. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (pp. 142-150). Association for Computational Linguistics.
[17] [Medhat et al., 2014] Medhat, W., Hassan, A. and Korashy, H., 2014. Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), pp.1093-1113.
[18] [Miller, 1995] Miller, G.A., 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11), pp.39-41.
[19] [McAuley and Leskovec, 2013] McAuley, J. and Leskovec, J., 2013, October. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems (pp. 165-172). ACM.
[20] [Mullen, 2006] Mullen, T. and Malouf, R., 2006. A Preliminary Investigation into Sentiment Analysis of Informal Political Discourse. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs (pp. 159-162).
[21] [Narayanan et al., 2009] Narayanan, A. and Shmatikov, V., 2009, May. De-anonymizing social networks. In Security and Privacy, 2009 30th IEEE Symposium on (pp. 173-187). IEEE.
[22] [Ohana et al., 2009] Ohana, B. and Tierney, B., 2009, October. Sentiment classification of reviews using SentiWordNet. In 9th IT&T Conference (p. 13).
[23] [Pang et al., 2008] Pang, B. and Lee, L., 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), pp.1-135.
[24] [Pennebaker, 2001] Pennebaker, J.W., Francis, M.E. and Booth, R.J., 2001. Linguistic inquiry and word count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 71(2001), p.2001.
[25] [Socher et al., 2013] Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013, October. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (Vol. 1631, p. 1642).
[26] [Stone et al., 1966] Stone, P.J., Dunphy, D.C. and Smith, M.S., 1966. The General Inquirer: A Computer Approach to Content Analysis.
[27] [Toutanova et al., 2003] Toutanova, K., Klein, D., Manning, C.D. and Singer, Y., 2003, May. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1 (pp. 173-180). Association for Computational Linguistics.
[28] [Turney, 2002] Turney, P.D., 2002, July. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (pp. 417-424). Association for Computational Linguistics.




Knowledge Management and WfMS Integration

R. Anderson1, G. Mansingh2
1 Department of Computing, The University of the West Indies, Mona-WJC, Montego Bay, Jamaica
2 Department of Computing, University of the West Indies, Mona, Kingston, Jamaica

Abstract— Several advances in workflow technology have made it possible to extend the use of workflow management systems to many domains that require flexible workflows due to their dynamic and unpredictable nature. Additionally, the development of a repeatable process for conducting knowledge management initiatives provides the basis to integrate workflow technologies into the knowledge management domain. In this paper, we discuss the design, implementation and integration of a workflow management system for knowledge management. Most importantly, we present a set of tasks which forms the basis of a successful configuration of the workflow management system. Through this work we highlight the important components of the design, implementation and configuration that facilitate workflow system integration into knowledge management activities. We demonstrate that workflow management systems, if appropriately designed and configured, can be integrated into knowledge management initiatives to provide improvements in process management and in the likelihood of initiative success.

Keywords: Knowledge Management, Workflow Systems, KMI, WfMS, KM

1. Introduction
Knowledge is a fluid mix of framed experience, values, contextual information and expert insight that provides a framework for evaluating and incorporating new experiences and information [5]. There is an increased need to manage knowledge in the organization, since it has been suggested to be the new organizational wealth [26]. A knowledge management initiative (KMI) brings together the necessary tools, techniques and activities to improve the generation, sharing, management and storage of knowledge in a domain. This requires effort to understand the nature of the organization, what exists in terms of knowledge assets, and how these may be used to better support the organization's goals. KMI activities may be guided by a methodology, dependent on the nature of the organization and the type of knowledge assets they wish to exploit: people, process or data. In conducting a KMI, the activities involved must be organized in some specified sequence to create a process. This sequence, defined as a process, may be carried out by one human or a team of humans, or a combination of these may be required to perform tasks within that process. Human tasks include interacting with computers closely (e.g. providing input commands) or loosely (e.g. using computers only to indicate task progress) [19].

A formal process gives rise to the use of computers to manage these activities through a special type of software called a workflow management system (WfMS). Having established the need for such a software solution through a thorough assessment of prior research, and having identified a gap in the use of WfMS in the knowledge management domain for initiatives geared at improving knowledge capability, we propose a WfMS primarily for managing knowledge management initiatives. We build a prototype system based on our design and derive a configuration of tasks which allows for easy integration of the system into the conduct of a KMI. We conclude that this system and configuration specification can be applied in the knowledge management domain to provide improvements in process management and to increase the likelihood of KMI success. This paper therefore discusses aspects of the design of a workflow management system, its implementation and its integration into the knowledge management domain for managing a knowledge management initiative. We focus on how the system was designed and the task configuration necessary for integration with knowledge management activities for improved knowledge initiative success. We provide notes on our continued work on improving and generalizing integration activities based on advances in both the workflow systems and knowledge management domains.

2. Knowledge Management and Workflow Systems
2.1 Knowledge and Knowledge Management
Knowledge management involves the processes that provide for the creation, storage, application and update of knowledge. Several studies have underscored the importance of knowledge to organizational success, and many methods and algorithms for extracting knowledge from data through data mining have been well documented [17]. The maturity of these techniques and the rich tool-set have enabled widespread use across various industries with significant game-changing success. Knowledge management has therefore become vital to the modern organization [14][24]. Given the importance of knowledge to organizations, it is important to find strategies to manage knowledge effectively. The knowledge management process within the organization may be guided by some established process, similar to the software development models used in the development of traditional software systems.




In order to realize the benefits of knowledge, deliberate activities are required to improve knowledge exploitation in the domain towards improvements in knowledge capability. In general, the activities to transition the domain toward improved knowledge management capability can be considered as a workflow.

2.2 Workflow management systems
A workflow is a collection of work items (or tasks) organized to accomplish some process, for example processing an office requisition through to settlement or managing the scientific process [6], often established to ensure proper sequencing and quality assurance. A workflow management system (WfMS) either completely or partially supports the processing of work items to accomplish the objective of a group of tasks within a process. These systems usually include features for routing tasks from person to person in sequence, allowing each person to contribute before moving on in the process [9]. Given that workflow systems allow tracking of tasks from one step to the next and assign participants in a process, they have the advantage of providing positive benefits to managing processes and enhance the visibility of the processes they manage. WfMS are particularly advantageous in supporting efficient business process execution and timely monitoring [7]. Other benefits observed from the application of WfMS in varied domains include: enabling reuse of process templates, robust integration of enterprise applications, and flexible coordination of human agents and teams [28]. Additionally, these systems help achieve software flexibility by modeling business processes explicitly and managing business processes as data that are much easier to modify than conventional program modules [28]. WfMS technology was initially applied to support the flow of documents in administrative operations; however, there are several examples where this technology has been successfully extended to automate production processes involving the execution of complex transactions on top of heterogeneous, autonomous and distributed systems, as well as the composition of e-services [4]. These systems have also been used successfully in various scientific domains for science automation, among other specialized areas [6][13]. Since the need for flexibility became important, especially in fast-paced dynamic environments, different techniques have been used to achieve flexibility in the modeling and enactment of workflows [8][16], but most of these systems do not allow modification to a process once it has started executing [28]. The increased complexity of business processes and the fast-paced dynamic nature of many environments, however, require WfMS to be adaptive and flexible in managing workflow [3][14][23]. The key reasons supporting the need for adaptability include new business needs, supporting change after the process has begun, handling exceptions during

process execution, and providing flexibility while assuring coherence and process quality [14][20]. Hence, more emphasis has been placed on methods for developing intelligent, adaptive and flexible workflow systems. Klein and Dellarocas [15] also emphasize the need for these systems to be adaptive to effectively support today's dynamic, uncertain and error-prone collaborative work environments. Given that the activities involved in improving a firm's knowledge capability are dependent on the types of knowledge present and what the firm is keen on exploiting, the application of workflow systems in the knowledge management domain must embrace adaptability in its implementation. This will allow modifications to the workflow at any stage of the initiative, so that activities can be re-organized and defined process activities can be added or removed. Work done in the workflow and knowledge management domains provides the foundation necessary to implement a workflow management system for managing knowledge management initiatives. The suggested reuse of process templates is of particular benefit to the knowledge management domain, especially since there may be similarities in the type of knowledge from one initiative to another, although environments may be vastly different.

3. Research Methodology The objective of the design science paradigm is to extend organizational capabilities by creating new and innovative artifacts [12][11]. The artifact embodies the ideas, practices, technical capabilities and products which are required to accomplish the analysis, design, implementation and use of information systems [12]. This research is conducted based on the design science guidelines proposed in [12]. A discussion of how these guidelines were applied to our work is presented below.

3.1 Application of Research Methodology
Design as an Artifact: We developed a workflow management system for integrated knowledge management activities. A specific configuration instance was also developed and integrated in a Knowledge Capability Improvement Initiative.
Problem Relevance: We outlined the need for this solution in the background. We further identified relevance through the review of literature related to workflow systems and knowledge management and their benefits.
Design Evaluation: We evaluated the impact of integrating the developed workflow system in a real-world knowledge management initiative.
Research Contribution: The successful implementation, the specification of a set of sequenced tasks and their configuration, and their use in a selected domain to improve the process involved in knowledge initiatives, which shows a path to repeatable integration of such systems.
Research Rigor: The Rapid Application Development [18] methodology was utilized to manage the development of the prototype WfMS. Evaluation was done by using the system in an actual knowledge capability improvement initiative.



Design as a Search Process: We reviewed the existing literature and practice for improving knowledge capability in firms, with specific emphasis on activities towards implementing knowledge management systems. We evaluated the literature and practice in the workflow management systems domain and recognized the need to integrate these two domains. We used insights from the experiences reported in the knowledge management literature to guide the integrated use of our system and task configuration in the knowledge management initiative.
Communication of Research: Presentation of our findings and evaluation of the results in a conference proceeding and a journal paper. The prototype system and the results have been discussed with stakeholders of the KMI.
Based on the application of the guidelines described in [12], we proceed to discuss the details of the project results.

4. Designing the WfMS
Many workflow researchers advocate the use of models and systems to define the way the organization performs work. The work procedure in a WfMS is generally defined by a process composed of a set of discrete work steps, with explicit specification of each activity flow between the steps required to complete the process [10][25]. The workflow model prescribes that a specific management process is separated from the application process, providing process definition, management and implementation to support efficient business process execution and timely monitoring [7]. The workflow engine requires a definition and interpretation of the operation of the process it implements [1]. WfMS usually implement a modeling component that provides facilities for creating and browsing a representation model, for applying various algorithms to an analysis model, and for collaborative interaction and information archiving for design models. This modeling system should allow for editing and preparation of the workflow specification [21]. The modeling components may also include facilities to assist in the coordination and execution of a procedure/process, to describe how the procedure/process performs the work, and to analyze the behavior of a procedure [21]. According to Walsh et al. [27], workflow systems should include capabilities that allow the definition of an execution pattern for a computational process, mechanisms for storing and retrieving the defined pattern, and tools that allow the execution of the pattern in different scenarios. Based on the knowledge of the activities required for knowledge management initiatives, we created a design for implementing the prototype WfMS which focuses on the arrangement of the features into three main components, as depicted in figure 1.


Fig. 1: WfMS Component Diagram

4.1 The AppCore Component
The AppCore component contains the key application logic for the specification and management of the workflow, upon which the other key features depend for the management of tasks. The key interfaces provided by this component include:
1) ManageTask, which allows adding and removing tasks for the workflow, including any constraints on the tasks;
2) CreateProcessModel, which organizes the tasks into an acceptable sequence, ensuring all constraints are met. This can be invoked at any time to effect a change in the process sequence, especially in cases where tasks are added to or removed from the workflow or constraint specifications change;
3) CommitProcessSequence, which assigns the configured workflow for access by the interface that is used to build the application menu for execution of the knowledge management initiative.
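The following is a minimal sketch of how these three interfaces might be exposed. The class shape, the dictionary of task constraints and the use of topological ordering to satisfy them are assumptions made for illustration; the interface names mirror those listed above but the actual implementation (PHP/MySQL, as described in Section 5) differs.

```python
# Illustrative sketch of the AppCore interfaces: ManageTask, CreateProcessModel,
# CommitProcessSequence. Data structures and ordering logic are assumptions.

from graphlib import TopologicalSorter

class AppCore:
    def __init__(self):
        self.tasks = {}                    # task id -> set of prerequisite task ids

    def manage_task(self, task_id, depends_on=()):
        """Add a task together with its constraints (here: prerequisite tasks)."""
        self.tasks[task_id] = set(depends_on)

    def remove_task(self, task_id):
        self.tasks.pop(task_id, None)
        for deps in self.tasks.values():
            deps.discard(task_id)

    def create_process_model(self):
        """Organize the tasks into a sequence that satisfies every constraint."""
        return list(TopologicalSorter(self.tasks).static_order())

    def commit_process_sequence(self):
        """Publish the configured sequence for the initiative interface to render as a menu."""
        self.committed = self.create_process_model()
        return self.committed

core = AppCore()
core.manage_task("KSINV")
core.manage_task("MAPR", depends_on=["KSINV"])
core.manage_task("VMAP", depends_on=["MAPR"])
print(core.commit_process_sequence())      # ['KSINV', 'MAPR', 'VMAP']
```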

4.2 The Process Manager Component
The main objective of the process manager component is to provide features that allow the initiative owners to apply specific tools to perform the activities necessary within the KMI. Since KMIs may differ in what is required to complete them, depending on the type of knowledge that exists [22], the tools required to support the execution of specific tasks in one initiative may differ significantly from those of another. In general, the tools may include knowledge elicitation (KE) / knowledge acquisition (KA) tools such as a questionnaire manager and other artificial intelligence tools that are widely used in the elicitation/acquisition and coding of knowledge. Built into these tools are relevant intelligent agents that ensure the proper elicitation of knowledge. One simple example is the use of an expert system to perform automated questioning of an expert, where responses to questions are used to determine the next questions in the sequence. The Process Manager provides four main interfaces to the other application units.




The ManagePlugin interface allows installation of the tools that are necessary for use in the knowledge management initiative, and ensures that the tools can access and update data within the WfMS. The ConfigReader interface allows access to the configured sequence and its rendering in the main application. The TaskLogging interface is important for managing the audit trail within each initiative managed by the application, in addition to task completion logs, which become important for performance reporting. The process improvement sub-component should also include document management capabilities, which allow the system to act as the document repository for the initiative. This is useful to ensure that the entire initiative management and tracking is integrated into the designed workflow management system.

4.3 The Analytics Engine Component The analytics engine will provide assessments of task performance within the workflow. It will log activities and provide analysis of performance with visualizations of tasks progress. This component will also include a set of tools that can perform basic data analysis to support knowledge elicitation and acquisition activities within the execution of the overall KMI. The collated data within the analytics engine describes how tasks are done and their progression. This will have capabilities for eliciting trends based on the performance of tasks within the initiatives and will include tools to allow the user to specify rules for identifying lagging activities, prompting where tasks are overdue and managing the metrics that must be met as performance indicators for the initiative. The analytics engine, therefore, collects, collates and assesses performance within the context of the initiative itself such that on-the-fly reports and analysis of the KMI can be done. This should inform the user of how well the initiative is going, patterns of success or failure on execution, changes and any trends that could inform how things should change to meet the objectives of the initiative. The user may specify rules for determining how tasks are assessed through the AnalysisModel interface and this will allow the generation of alerts, suggestions, triggers and visualizations via the SuggestionBot, PerformanceIndicator, ManageTriggers and Visualize interfaces respectively.
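As a simple illustration of the kind of rule the analytics engine could apply, the sketch below flags activities that are past their deadline and not yet complete. The record layout, the dates and the overdue rule are assumptions for illustration only.

```python
# Minimal sketch of an analytics-engine rule for flagging lagging activities.
# The record layout and the overdue rule are illustrative assumptions.

from datetime import date

activities = [
    {"task": "KSINV", "activity": "Collect existing reports", "due": date(2017, 3, 1),  "done": True},
    {"task": "GAPA",  "activity": "Interview stakeholders",   "due": date(2017, 3, 15), "done": False},
]

def overdue(items, today=None):
    """Return the activities that are past their deadline and not yet complete."""
    today = today or date.today()
    return [a for a in items if not a["done"] and a["due"] < today]

for a in overdue(activities, today=date(2017, 4, 1)):
    print(f"ALERT: '{a['activity']}' for task {a['task']} is overdue")
```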

5. Implementation and Integration
A prototype of the WfMS for KMIs based on the design discussed in the previous section was successfully implemented using the object-oriented approach. The system has two main user interfaces: 1) the initiative interface and 2) the administrator interface. All setup prior to the start of any KMI, and before the initiative interface can be accessed, must be done in the administrator section. Modifications made in the administrator section are immediately propagated and accessed by the initiative interface to ensure changes are reflected in real time.

The integration of the WfMS prototype required that specifics of the knowledge management domain be carefully considered. We therefore used the CoMIS-KMS process [2], developed for the knowledge management domain, to guide the creation of the configuration for the workflow. As outlined in Table 1, the configuration of tasks represents one template that we developed for KMIs. The implementation of the prototype depended on a mixture of in-memory and disk-based techniques and technologies for managing the configuration data-sets, managing the workflow menu and its adaptability, and providing interfaces for end users who need not interact with the administrative functions of the system but are also engaged in the knowledge initiative. The MySQL database system was used to manage configurations and data which required disk storage. PHP 5 and the Apache web server were used to run the web-based interface that supports both administrative and stakeholder access to the administrative and initiative features respectively. A task interface was provided to the administrator to add tasks and task constraints. Thereafter, activities, tools, restrictions on dates and assignments could be added. Once a task is fully specified, it becomes available for the sequencing functions, and activities are attached to the task (and therefore are only accessible via that task). The CoMIS-KMS process model [2] specified a sequence of phases with guidelines, which we used to derive a discrete set of KMI tasks (see Table 1). This was installed for the KMI, after which the Create Process Sequence routine was invoked and produced a configuration for the workflow menu in the initiative interface. The tasks derived based on the CoMIS-KMS are shown in Table 1. These were preconfigured for easy installation with any new KMI.

Table 1: CoMIS-KMS Tasks Configuration

| Phase | Task Description |
|-------|------------------|
| 1-2 | KSINV - Knowledge Source Inventory; MAPR - Create mappings of people and processes; VMAP - Validate mappings with stakeholders; GAPA - Conduct Gap Analysis; ISO - Identify Specific Outcomes; PRO - Prioritize Outcomes; KSA - Knowledge sources assessment |
| 3 | LNK - Link Outcomes to knowledge sources; KAM - Conduct elicitation and acquisition |
| 4 | ENC - Encode / Represent Knowledge; DPL - Develop deployment plan |
| 5 | KUM - Develop knowledge audit protocols; IKW - Integrate knowledge, deploy |
| 6 | MO - Assess and monitor Outcomes; RUS - Establish and conduct review and update |
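As an illustration of how such a preconfigured template can be stored as data and drive the workflow menu, the sketch below encodes the Table 1 task codes with simple dependency constraints. The strictly sequential dependencies and the enabling rule are simplifying assumptions for illustration; the prototype stores its configuration in MySQL rather than in program code.

```python
# Illustrative encoding of the CoMIS-KMS task template as data (cf. Table 1).
# Sequential dependencies and the enabling rule below are illustrative assumptions.

COMIS_KMS_TASKS = [
    "KSINV", "MAPR", "VMAP", "GAPA", "ISO", "PRO", "KSA",
    "LNK", "KAM", "ENC", "DPL", "KUM", "IKW", "MO", "RUS",
]

# Assume each task depends on the task immediately before it in the template.
DEPENDS_ON = {t: ([COMIS_KMS_TASKS[i - 1]] if i else [])
              for i, t in enumerate(COMIS_KMS_TASKS)}

def enabled_tasks(completed):
    """A task is shown as enabled in the workflow menu only when all of its
    prerequisite tasks have been marked complete."""
    completed = set(completed)
    return [t for t in COMIS_KMS_TASKS
            if t not in completed and all(d in completed for d in DEPENDS_ON[t])]

print(enabled_tasks([]))                    # ['KSINV']
print(enabled_tasks(["KSINV", "MAPR"]))     # ['VMAP']
```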

This storage of features as data that enables the workflow



is a key difference between traditional software systems and WfMS, as supported in the literature. Although the CoMIS-KMS model [2] provided a list of phases and suggested work to be completed in each phase, the identified tasks needed activities to be defined in order to complete them. Figure 3 gives an example of the activities defined for a specific task, in this case the knowledge source assessment (KSA). The definition of tasks and the discretization of activities were important factors in ensuring that the system could be integrated into the knowledge management environment, as these activities were specifically defined based on the domain and its requirements. The prototype system that was developed further allowed additional tasks to be defined with attached activities. The specification of tasks and their attendant activities was paramount to the successful use of the system for an actual knowledge initiative. The implemented system was deployed to an Internet server, where the administrator interface was used to set up a new initiative for the case study. The default CoMIS-KMS configuration was installed using the administrative section of the system, and the activities for each task, based on Table 1, were configured to allow for assigning tasks, setting deadlines, automated stakeholder communication and document management services for each of the activities to be performed. Figure 3 gives an example of the activity view and the files/tools associated with an activity for a partially completed knowledge initiative.

Fig. 2: New Initiative Setup Page

Figure 2 displays the new initiative landing page, from where the workflow process sequence can be installed, tasks can be added and removed, and constraints can be inserted. All tasks in the process sequence must have at least one activity. Having set up and initialized an instance for an educational institution that indicated the desire to improve the application of its knowledge to decision making towards improved student performance, we deployed the instance and made it accessible in a secured web environment. Since


the KMI included several stakeholders, a meeting was held to provide a basic overview of the prototype and how it would be utilized. Key to this was that the management of the initiative was through the leadership of the institution in conjunction with the researchers, who were the technical experts in both knowledge management and the use of the workflow software. The task setup and sequence configuration were done before introducing the institution's team to the system. The system was then initialized and the first set of tasks within the workflow was assigned. This primarily included gathering reports, reviewing processes and the data that supported each of these processes (see KSINV in Table 1). It immediately became apparent that the use of the WfMS provided significant benefits to this set of activities, as different stakeholders were assigned responsibilities to supply important reports and complete data collection forms built into the system. This material was held in the WfMS repository for the knowledge experts and other stakeholders to review and act on before moving to the next task in the workflow menu. Importantly, the workflow menu only allowed access to the activities in a subsequent task if the preceding activities on which that task depends were all completed. Therefore, the stakeholder who was assigned an activity was required to mark the activity as complete once their work was done. The general conduct of tasks in the workflow sequence continued with assignment, performance of assigned activities and transition to subsequent activities until the workflow was complete. Additionally, the administrative section was used to install and instantiate different tools that support different activities throughout the conduct of the KMI. At the beginning of the initiative, all tasks are automatically set as pending. The change of status of a task to in-progress or complete was done by the workflow routine, which dynamically adjusts tasks based on the status of the activities defined for them. Therefore, once an activity was assigned and the task was enabled in the workflow, any note, tool installed or log recorded would automatically change the task status to in-progress. Once all activities are marked complete by the assignee or administrator, the task status is automatically updated to complete and all tasks that depend on it are enabled in the workflow. Figure 4 shows an example of the actual workflow menu in the initiative we conducted. Variance in color indicates the tasks which have been completed or enabled and those that are disabled because all their dependencies (tasks) have not been completed. Figure 5 illustrates one tool that was installed for this initiative. It was configured for a specific task assigned to a key stakeholder in the initiative, who interactively specified details that were then used to map knowledge assets to some specified initiative outcomes. The constant use of the tools in the prototype was essential




to the completion of the activities required for the success of the initiative. The ability to manage the assignment and routing of tasks and activities to various participants proved key to the timely execution of tasks and provided a single reference point for information about the progress of the KMI. The prototype also acted as a repository for the resources required to complete key activities in some tasks, and we posit that this enhanced the timeliness of the actions taken by stakeholders involved in the initiative. The successful use of the implemented system provided confirmation of the positive impacts that WfMS can have on process management, as supported in the literature. In this study, we demonstrate that these impacts also extend to the knowledge management domain.

Fig. 3: Example: Activity View for Task - KSA

Fig. 4: Example: Workflow Menu

Fig. 5: Example Tool: KPI Owner

6. Conclusion and Future Work
In this paper a design and an implementation of a workflow management system specially built and configured for KMIs is presented. We highlight the important considerations for the success of WfMS in the knowledge management domain, especially task and activity configuration integrated with the ability to adjust the work sequence and install plug-in tools for different activities. A list of tasks, informed by the knowledge management literature, was developed and then used to configure a workflow. Further, we identified some benefits derived from the successful integration of the WfMS in an actual knowledge improvement initiative. These included improvements in communication among stakeholders, better tracking of progress, improved accessibility to resources in the workflow, and the ability to adjust activities to allow changes necessary as the tasks are being performed. The design of the prototype that was developed provides a framework for implementing WfMS in the knowledge management domain. The successful prototype implementation, configuration and use has also provided a key contribution to theory in the


WfMS and KM domains. We have demonstrated that a set of discrete tasks, as specified in Table 1, which we derived, can be applied in a flexible sequence to conduct knowledge management initiatives. We also identified some key considerations necessary for successful integration of WfMS in the knowledge management domain. As part of our future work, we intend to use the prototype in additional initiatives across different domains and identify common factors that impact the successful application of WfMS in KMI. Additionally, we intend to expand the range of tasks and tools that are available for configuration and assess the impact of these configurations on integration and on the overall successful application of WfMS in knowledge management.





A Knowledge Management Framework for Sustainable Rural Development: the case of Gilgit-Baltistan, Pakistan

Liqut Ali1 and Anders Avdic2
1 Department of Informatics, Orebro University, Orebro, Sweden
2 Dalarna University, Dalarna, Sweden

Abstract - Some 50% of the people in the world live in rural areas. The need for knowledge of how to improve living conditions is well documented. Knowledge on how to improve living conditions in rural areas and elsewhere is continuously being developed by researchers and practitioners around the world. People in rural areas, in particular, would certainly benefit from being able to share relevant knowledge with each other, as well as with stakeholders (e.g. researchers) and other organizations (e.g. NGOs). Central to knowledge management is the idea of knowledge sharing. This study aims to present a framework for knowledge management in sustainable rural development. The study is interpretive and presents a framework for a knowledge management system for sustainable rural development.

Keywords: knowledge management, requirement analysis, framework, knowledge society, rural development

1 Introduction

The need for knowledge of how to improve living conditions is well documented [18]. In response to this need, new knowledge of how to improve living conditions in rural areas and elsewhere is continuously being developed by researchers and practitioners around the world. People in rural areas, in particular, would certainly benefit from being able to share relevant knowledge with each other, as well as with stakeholders (e.g. researchers) and other organizations (e.g. NGOs). Central to knowledge management (KM) is the idea of knowledge sharing. The significance of KM in sustainable development has been described by several researchers. According to Wong [19], KM provides a good foundation for sustainable development. KM is also critical for innovation, prioritization and the efficient use of resources [15]. In this paper we discuss how to apply a KM approach in order to take advantage of knowledge, experiences and good examples of sustainable rural development to improve life for the people of remote rural regions. We perceive knowledge management as the process of “continually managing knowledge of all kinds to meet existing and emerging needs, to identify and exploit existing and acquired knowledge assets and to develop new opportunities” [16]. When we use the term development we refer to human development. “Human development is about creating an environment in which people can develop their full potential and lead productive, creative lives in accord with their needs and interests” [19].

From our perspective in this context of rural development, sustainability implies the use of methods, systems and materials that will not deplete resources or harm natural cycles [10]. Thus, we hypothesize that the use of KM to discover, capture, share, and apply knowledge about rural development activities can support sustainable development. A knowledge-based society and knowledge-sharing environment can make the development process sustainable and the goals of that development process achievable. The long-term goal is to contribute to a better life for vulnerable and exposed people in rural areas. This study examines the case of the region of Gilgit-Baltistan in Pakistan in order to develop a framework for knowledge sharing for sustainable rural development. The Gilgit-Baltistan region of Pakistan is a rural region. Geographically, the region is situated in the north of Pakistan (approximately 35–36° North and 74–75° East) and surrounded by the world’s greatest and highest mountain ranges: the Himalayas, the Karakoram, and the Hindu Kush. Economically, this area is poor, and the region has been the subject of a rural development process by numerous NGOs, international development agencies, and the local government of Gilgit-Baltistan in Pakistan. For the last twenty years, the AKRSP (Aga Khan Rural Support Program) has also sought to contribute to the reduction of poverty in the region [23]. The research question for this paper is: How can knowledge management contribute to sustainable rural development? In addition, the specific objectives are to specify the knowledge needs for a repository and for knowledge sharing.

1.1 Knowledge management frameworks for rural development: A number of knowledge management development frameworks exist in the literature. In his study, Heisig found 160 frameworks [2]. In this paper, we present 10 selected frameworks that are specifically related to rural development. Different components of these frameworks have been selected in order to put forward one specific framework that is appropriate for rural contexts such as Gilgit-Baltistan, Pakistan. The selection of frameworks from the literature study is presented in Table 1, below.


Table 1: Selected knowledge management frameworks for rural development (columns: Frameworks; Purpose/Objective; Type of Knowledge; Technology used; Knowledge beneficiary). The ten frameworks [2], [3], [4], [5], [6], [7], [9], [11], [13], [17] cover purposes such as integrating resources to help farmers, integration of traditional and scientific knowledge, sustainable rural development management, knowledge-based decision support, rural education resource platforms, a regional viable system for agriculture, a knowledge management platform for rural energy services in ASEAN countries, a community knowledge sharing centre in the Dominican Republic, rural business decision support, and research and training; types of knowledge including agriculture and market, traditional and scientific, implicit and explicit, agriculture-based, economical and ecological, community, and livestock-based rural knowledge; technologies including information portal websites and knowledge networks, mobile phone, email and radio, databases and knowledge bases, IP and satellite networks, knowledge pools, relational databases, search algorithms and knowledge servers, digital identities, and knowledge-base/expert systems; and knowledge beneficiaries including farmers, local and rural communities, national and regional actors, researchers, and ASEAN countries.

The framework inventory mainly contributes by offering techniques and strategies that allow rural development knowledge to be shared. It also verifies the categorization of the beneficiaries and of rural development knowledge.

2 Methodology

This work is an interpretive qualitative study [20] that focuses on how to use KM for sustainable rural development. After an initial literature review, the study was carried out in two phases. One phase consisted of an empirical study to specify stakeholders and knowledge resources. The other phase consisted of a selection of relevant frameworks and concepts, carried out in order to find a relevant KM approach to realize a KMS. Phase one takes aspects of requirements analysis as its point of departure, while phase two searches for KM approaches that claim to contribute to rural development. The first phase started with a field study in the region of Gilgit-Baltistan in Pakistan. The field study was carried out over a period of 21 days, during which interviews were conducted in this specific region. The study was designed using a traditional method of stakeholder-driven requirement elicitation put forward by Lamsweerde [14]. In parallel with the requirements analysis we carried out an inventory of relevant approaches to the design of knowledge management frameworks. Databases used were: ACM


Digital Library, Elin@orebro, and Google Scholar. Search words were “knowledge management system”, “rural development”, “sustainable development”, “development”, and “framework”, in different combinations. A review of abstracts and conclusions elicited ten articles on the design of frameworks in knowledge management connected to rural development. The articles were selected due to their close connection to knowledge management and rural development. The three main elements considered when selecting frameworks were the same as those used for the empirical study. A data matrix was used for analyzing the concepts of the selected frameworks [21]. As noted in [12], the use of ideas in the literature is to justify the particular approach to the topic. Ten frameworks were selected. Empirical information was gathered in line with initial information requirements for knowledge management. This was then related to the selected frameworks outlined in the literature study and described in the sections that follow. The results are presented as the findings from the empirical study and the framework for knowledge sharing for sustainable rural development.
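The data matrix used to analyze the concepts of the selected frameworks (after [21]) can be pictured as a concept-by-framework grid. The snippet below is only a hypothetical illustration of such a matrix; the marked cells are placeholder examples, not the study's actual coding, and the concept labels are taken from the column headings of Table 1.

```python
# Hypothetical concept matrix for the literature analysis (after Webster & Watson [21]):
# rows are analysed concepts, columns are selected frameworks; a cell marks whether a
# framework addresses a concept. The marked cells below are placeholder examples only.
frameworks = ["[2]", "[3]", "[4]", "[9]", "[11]"]
concepts = ["type of knowledge", "technology used", "knowledge beneficiary"]

matrix = {concept: {fw: False for fw in frameworks} for concept in concepts}
matrix["technology used"]["[9]"] = True         # placeholder: framework [9] coded for technology
matrix["knowledge beneficiary"]["[11]"] = True  # placeholder: framework [11] coded for beneficiaries

# List which frameworks cover each concept.
for concept, row in matrix.items():
    covered = [fw for fw, hit in row.items() if hit]
    print(f"{concept}: {', '.join(covered) if covered else 'none marked'}")
```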

3 Knowledge on development activities

In order to develop a knowledge repository, the primary requirement is to acquire knowledge from stakeholders. Initially, information about development activities in the


region was collected and categorized. The interviewed organizations were categorized into four main stakeholder groups and analyzed with regard to the knowledge beneficiaries. The interviewed stakeholders had specific knowledge with respect to their development practices. Their practices concerned animal husbandry, social welfare, safe drinking water, tree planting, glacier protection, wildlife preservation, building health care units, school building, teacher training, culture preservation, tourism promotion, building ICT centers, gender development, and more. Knowledge was categorized on the basis of stakeholders’ knowledge resources and needs relevant for rural development, as presented below. The knowledge is available in different forms. The local government’s planning and development department has, for example, knowledge on funding and infrastructural development of the region in paper format. The LSO (local support organization) and the local population have information on the agriculture sector. AKRSP (Aga Khan Rural Support Program) has 25 years’ worth of development records, kept both in paper and digital formats.

3.1 Technical conditions and ICT infrastructure

The ICT infrastructure represents the conditions for any ICT-based initiative such as a KMS. The initiatives are candidates for incorporation in the KMS and they demonstrate that ICT-based knowledge exists even in remote rural regions. The region has Internet connectivity, and some of the NGOs have relatively fast Internet connections. More recently, a few mobile phone companies have started operations in the region. ICT-based initiatives already exist. AKRSP has a project in collaboration with Telenor Pakistan that examines e-market access for farmers in the remote valleys. Another recent project is a digital resource centre, which stores the current documentation of organizational activities. AKRSP has also initiated a 3D program (democracy, dialogue and development) to study local governance in terms of the initiation of a dialogue culture. AKCSP (Aga Khan Cultural Service Pakistan) has initiated a project, with support from the Government of Norway, to restore and rehabilitate historic landmarks and places. KADO (Karakoram Area Development Organization) has started a business incubation project for website development for small entrepreneurs. An e-schooling concept has been introduced

and four Internet cafés have been established. The main requirement for the knowledge repository is to utilize all these kinds of resources offered by individual organizations.

3.2 Proposed framework for knowledge sharing of sustainable rural development activities

We present a framework for knowledge management in rural development based on the literature study and the empirical findings. The description of the KMS is compiled from the selected frameworks for rural development. The stakeholder categorization and knowledge content are both derived from the empirical study. In the proposed framework we claim that the sustainability of rural development can be achieved through a knowledge society in which knowledge of the rural development process is shared among all relevant stakeholders. A knowledge society and sustainability occur when the local population can independently inform themselves about the rural development process and activities. In the proposed framework the process starts with the stakeholders. The stakeholders are local government, the local population, academia, NGOs, civil society and donor agencies. The second layer consists of rural development activities, including ICT and infrastructure. The third layer is the KM system, which consists of creating/capturing knowledge, knowledge storage, and sharing/application of knowledge. The processes and technology are adapted from the 10 selected frameworks. In our framework, the created knowledge is captured, converted and stored in different digital formats using the relevant ICT tools and technology. The framework shows the technologies that can be used to share knowledge; these include Internet and intranet web portals, as well as mobile and smart phones, and other formats. Different communication channels can also be used, such as TV programs showing documentaries. The final process, on the topmost layer of the framework, relates to the sustainability of rural development. Our main claim about this process is that when shared knowledge is applied further on in the rural development process, it leads to the sustainability of the whole rural development process. Three key points come into play: knowledge is important for development [22]; a sustainable knowledge society is composed of three elements, economic, environmental and social development [2]; and a knowledge society can make the development process achievable.
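Read bottom-up, the framework's layers can be summarized as a simple nested structure. The sketch below is only an illustrative restatement of the layers named in this section; the key names are ours and not part of the proposed framework.

```python
# Illustrative summary of the proposed framework's layers (key names are ours;
# the layer contents follow the description in the text).
KMS_FRAMEWORK = {
    "layer_1_stakeholders": [
        "local government", "local population", "academia",
        "NGOs", "civil society", "donor agencies",
    ],
    "layer_2_activities": "rural development activities, including ICT and infrastructure",
    "layer_3_km_system": {
        "create_capture": "knowledge created in activities is captured and converted",
        "store": "knowledge stored in different digital formats in the repository",
        "share_apply": [
            "Internet/intranet web portals", "mobile and smart phones",
            "TV programs / documentaries",
        ],
    },
    "layer_4_outcome": "sustainability of the rural development process",
}

# Reading the framework bottom-up: stakeholders carry out activities, the KM system
# captures, stores and shares the knowledge produced, and applying that shared knowledge
# in further development is what the paper ties to sustainability.
```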


Figure 1: KMS framework for sustainable rural development

4 Conclusions

Knowledge is crucial for any kind of development. Sustainability is composed of three main areas: social development, economic development and environmental development. Diverting rural development towards the path of sustainability implies a reliance on knowledge of natural processes, natural resources and the inter-relations between social and ecological systems. A knowledge society can also make the rural development process achievable. In this context, the achievability of the rural development process and its sustainability rely on knowledge. Thus, we designed a knowledge management framework so that knowledge can be used to support the sustainability of the rural development process in the case of Gilgit-Baltistan, Pakistan.

5 References

[1] Afgan, N. H. & Carvalho, M. G. “The Knowledge Society: A Sustainability Paradigm”. Cadmus. 1(1), 28–41, Oct 2010. [2] Hess, C.G. “Knowledge Management and Knowledge Systems for Rural Development”. GTZ knowledge Systems in Rural Areas. 09/2007, 1-16, (2006).

[3] Hu, M., Liu, Q. & Li, H. “Design and Implementation of Rural Educational Resource Regional Service Platform”. IEEE Computer Society 2009 Second International Symposium on Knowledge Acquisition and Modeling. Pp. 16-19, 2009. [4] Kurlavicius, A. “Knowledge based approach to sustainable rural development management”. Biometrics and information technologies in agriculture: Research and development in proceedings of the second International Scientific conference, 24-25 Nov 2006, Academia, Kaunas, 710. [5] Kurlavicius, A. “A viable systems Approach to sustainable Rural Development: Knowledge-Based Technologies and OR Methodologies for Decisions of Sustainable Development”. 5th International Conference, KORSD. 75-79, 2009. [6] Kurlavicius, A. “Sustainable agricultural development: knowledge-based decision support”. Technological and economic development of economy. 15 (2), 294-309, 2009. [7] Miah, J.M., Kerr, D.,Gammack, J., & Crowan, T. (2008). “A generic design environment for the rural industry knowledge acquisition”. Knowledge-Based Systems, 21 (2008) 892–899, 2008.


[8] Mohamed, M., Stankosky, M., & Mohamed, M. “An empirical assessment of knowledge management criticality for sustainable development”. Journal of Knowledge Management, 13(5), 271 – 286, 2009. [9] Payakpate, J., Fung, C.C., Nathakarankule, S., Cole, P., & Coke.P. “A knowledge management platform for the promotion of modern rural energy services in ASEAN countries”. IEEE Xplore, 535-538, 2004. [10] Rosenbaum, M. “Sustainability definition” www.hsewebdepot.org/imstool/GEMI.nsf/WEBDocs/Glossar y?OpenDocument, 1993. [11] Shakeel, H., & Best, M. L. “Community knowledge sharing: an Internet application to support communications across literacy levels”. Technology and Society, (ISTAS'02) International Symposium, 37- 44. 2002.

[12] Levy, Y., & Ellis, J. “Towards a Framework of Literature Review Process in Support of Information Systems Research”. Proceedings of the 2006 Informing Science and IT Education Joint Conference, 2006. [13] Liu, F. & Makoto, H. “Leadership’s function in the knowledge management of rural regional development”. IEEE, 978-1-4244-5326-9/10, 2010. [14] van Lamsweerde, A. “Requirements Engineering: From System Goals to UML Models to Software Specifications”. Wiley, Feb 2009. [15] Mohamed, M., Stankosky, M., & Mohamed, M. “An empirical assessment of knowledge management criticality for sustainable development”. Journal of Knowledge Management, 13(5), 271–286, 2009. [16] Quintas, P., Lefrere, P. & Jones, G. “Knowledge Management: a Strategic Agenda”. Long Range Planning, 30(3), 385-391, 1997. [17] Rahman, A. “Development of an Integrated Traditional and Scientific Knowledge Base: A Mechanism for Accessing, Benefit-Sharing and Documenting Traditional Knowledge for Sustainable Socio-Economic Development and Poverty Alleviation”, 2000. [18] UNDP. “Knowledge Sharing”. http://www.undp.org/content/undp/en/home/ourwork/knowledge_exchange/, 2013. [19] UNDP. “The human development concept”. Human Development Reports, http://hdr.undp.org/en/humandev/, 2007. [20] Walsham, G. “Interpretive case studies in IS research: nature and method”. European Journal of Information Systems, 4, 74-81, 1995.

[21] Webster, J., & Watson, R. T. “Analyzing the past to prepare for the future: Writing a literature review”. MIS Quarterly 26(2), xiii-xxiii, 2002. [22] Wong, D. M. L. “Knowledge Management Catalyst for Sustainable Development”. ITSim 2010 (3), 1444-1449, 2010. [23] Wood, G., Malik, A., & Sagheer, S. “Valleys in Transition, Twenty Years of AKRSP’s Experience in the Northern Pakistan”. Oxford University Press. World Bank, 2013. [24] World Bank. “Indigenous knowledge for development: a framework for action”. Knowledge and Learning Center, Africa Region, 1998.


"Innovation Description Languages, IDLs" & Knowledge Representations, KRs and ─ Easily Drafting&Testing Patents for Their Total Robustness ─ Sigram Schindler TU Berlin & TELES Patent Rights International GmbH, Berlin, Germany Abstract - For any ETCI its KR in any IDL dramatically facilitates mathematically & semi-automatically ●drafting it totally robust and/or ●proving it to be totally robust (if preexisting). Key for totally robust patents is the rationalization by the Supreme Court’s MBA-framework of 35 USC Substantive Patent Law, SPL, thus enabling it to unassailably protect Emerging Technology Claimed Inventions without jeopardizing the US patent system. Even if applying the FSTP-Test, hitherto no way has been known to stereotypically for any ETCI semi-mathematically/automatically ●draft by a human an ETCI’s totally robust patent(application) specification in a thus expanded subset of a natural language, and/or ●test any ETCI by this IDL’s J IDL LACs for its meeting all requirements of 35 USC/SPL.

Keywords: "Innovation Description Languages, IDLs" & Knowledge Representations (KRs), Total Robustness, Substantive Patent Law (SPL), MBA-framework, unlimited preemptivity, Alice’s PE analysis

1 DEFINITIONs of a TOTALLY ROBUST PATENT1.a), of an IDL, and of its IDLETCI

:[354/ftn2.a)], .b :[354/ftn2.h)], .c :[354/ftn2.c)], .d :[371.1]. .e ─ which to pass by an ETCI is necessary and sufficient for it to satisfy SPL in MBA-framework flavor ─ All results presented here also hold for the EPA and for all other ‘National Patent Systems, NPS’ and ‘Classic Technology Claimed Inventions, CTCIs’ ─ yet in both areas are still considered to be unnecessary and hence left aside for the time being. Due to the rapid development of emerging technologies, it is unavoidable that many examiners/jurors/lawyers/judges often can’t fully understand ETCIs or SPL. Though their decisions about ETCIs then are totally untenable ─ especially when they construe alleged prima-facie cases ─ they nevertheless are done in spite of contradicting quite evidently correct presentations by the inventors and often become final. This inevitably leads to ETCI/case-wise unpredictability of USPTO and/or court decisions about them and over several of them becoming irreconcilably inconsistent. This has frequently happened in

1 .a

For any ETCI its KR[2] in any IDL dramatically facilitates mathematically & semi-automatically ●drafting it (from scratch) totally robust and/or ●proving it to be totally robust (if preexisting) ─ unless it is false. Key for totally robust patents is the rationalizationb)4) by the Supreme Court’s MBA-framework of 35 USC Substantive Patent Law, SPL (§§ 112/101/102/103), thus enabling it to unassailably protect Emerging Technology Claimed Inventions, ETCIs, without jeopardizing the US patent system. IDL-Axiom: An “IDL” is any elementarily mathematizable2) subset of any natural language that is next to trivial, yet is expanded by all notionsc) that the Supreme Court by its MBAframework ex- or implicitly requires to be used in legally particular with the CAFC and/or the USPTO (see the FSTP-Project Reference List). .f by a set of J uniform (over all ETCIs) and NT-area vastly independent, yet potentially complex Legal Argument Chains, LACs. From FIG1.3 follows structurally (i.e. crC0S_AXIOMs4) Lib[373] depending ): J ≈ 2K + 7 (as the K E-crCs are checked twice by FSTP-test1&-test2, and test7 is evidently redundant), whereby only 9 independent4) LACs are needed, yet any LAC having several parameters. I.e., this total number of questions for a LAC needed in SPL testing an ETCI is much smaller than the 240+ questions reported in a recent USPTO event, the statements of which moreover are not resilient (not to say meaningless) as hopelessly metaphysical2). These for both communities totally unexpected results ─ for the patent community just as for the program correctness proving community ─ are basically due to the Supreme Court having induced a rationalb) SPL-satisfaction-test by ETCIs, i.e. the FSTP-Teste), on the one hand, and this FSTP-Test is a program of an absolutely simple structure, on the other hand, .g :[261,298],[354/c),355/II.4],[371.2] ‘Pair of Teaching, alias .h IDL_or_freestylePTR::= IDL_or_freestyle COM(ETCI), and a Reference Set of prior art documents ’ for this teaching . The notion PTR is replaced by the notion ETCI, wherever possible without causing misunderstanding, and the prefix “IDL_or_freestyle” is often omitted. .i :[9.c]. The 8 KRs are metaphysical, metarational, rational, and/or (not yet fully) mathematical4). All cases assume the same IDL resp. freestyle language ─ unless additional provisions are given for overcoming the distinctions between unequal IDLs resp. freestyles, being always determinable (unless the freestyle language is not rational4)) but not elaborated on here. A distinction between an IDL and a freestyle language is that the latter fails to know all rational MBA framework notions. Many popular programming languages’ (Algol, Fortran, Pascal, …) tiny subsets are easily expandable to IDLs (if needed). Vice versa: Often a second glance at an alleged simple IDL is required for verifying that it is not still a freestyle language ─ a point not understood by the USPTO and the CAFC, hence often delivering SPL decisions about ETCIs contradicting the Supreme Courts MBA-framework for them.


deciding an ETCI to satisfy SPL ─ by testing it in this IDL. Thus, an IDL is a Domain-specific Language, DSLd), for facilitating drafting and or using (e.g. testing) IDLETCIs. Even if applying the FSTP-Teste), hitherto no way has been known to stereotypically for any ETCI semi-mathematically/-automatically ●draft by a human an ETCI’s totally robust patent(application) specification in a thus expanded subset of a natural languagef), e.g. an English IDL, and/or ─ the other way around ─ ●test any ETCI by this IDL’s J IDL LACsf) for its meeting all requirements of 35 USC/SPLg). Section II shows by 8 FIGs1 the “IDL/freestyle versus nIDLFSTPTest” (= Facts/Screening/Transforming/Presenting-Test) of a IDL_or_nIDL PTRh) in one of its KR(PTR)s (= IDL_or_nIDL KR(ETCI)h)) ─ its FIGs2 in[373] their ‘joint mental structure brainKR’[373] in the tester’s brain, incrementally completed. I.e.: After FIGs1 remind[354] of and elaborate on the vastly automatic IDLFSTP-Test for the mathematically proven total robustness of any ETCI in IDLKR over the Supreme Court framework’s SPL interpretation, these FIGs2 indicate the large number of freestyleFSTP-tests for this ETCI to be performed free-handedly by a human tester, if it is in today’s freestyleKR ─ for assessing its likely robustness only! FSTP-Test- start input LCOM(ETCI) ≡ inL0S ::= ‘{inL0n ≈ MUIS0n, ∀1≤n≤N}’; if [, ∀1≤n≤N, is lawfully_disclosed] if [crL0n, ∀1≤n≤N, is enablingly_disclosed] L

Section III sums up the enormous increases in efficiency IDLs enable with all PTOs & patentees & law firms, as any IDL FSTP-Test warrants for any IDLETCI a huge increase of convenience&efficiency& quality ●in drafting it to be totally robust and/or ●testing it for its total robustness. And Section IV briefly comments on 2 actual letters to Congress concerning allegedly only the PE problem, but in truth the entire SPL satisfaction problem as caused by ETCIs’ importance and their threatening the US NPS.

2 ETCIs’ FSTP-DRAFTING/-TESTING for TOTAL ROBUSTNESS in an IDLKR versus a nIDLKR

An ETCI’s FSTP-Test is shown by 8 different KRs, 4 (nIDL_&_nonrefined) KRs and 4 (IDL_&_refined)KRs: ●The 2 nonIDL_alias_freestyle KRs1.1&.2 contradict the MBA-framework as not refining it accordingly, as ●the 2 IDL & refinedKRs do, and ●the 4 KRs1.1’-4’ are semantically ‘equivalent’ to those of FIGs1.1-42)3). (Classic ETCI KR, need of MBA-framework-refinement not yet recognized)

else_output ‘LCOM(ETCI) is not lawfully_disclosed’  stop; else_output ‘LCOM(ETCI) is not enablingly_disclosed’  stop;

(in the ETCI’s claim interpretation by §§ 112 for a COM(ETCI) of it ─ representing the ETCI ─ these 2 tests are necessarily passed by the ETCI for its satisfying the SPL)

if [LCOM(ETCI) is PE] if [LCOM(ETCI) is over RS not (anticipated  obvious)] output ‘LCOM(ETCI) satisfies SPL’  stop.

FSTP-Test- stop.

else_output ‘LCOM(ETCI) is nPE’  stop; else_output ‘LCOM(ETCI) is nonpatentable over RS’  stop;

FIG1.1: The ETCI’s FSTP-Test in nonIDL(O-)KR is of pre-MBA era, i.e. of highly metaphysical quality2)3)

FSTP-Test- start input LCOM(ETCI) ≡ inL0S ::= ‘{inL0n ≈ MUIS0n, ∀1≤n≤N}’; if [, ∀1≤n≤N, is lawfully_disclosed] if [{crL0n, ∀1≤n≤N} is enablingly_disclosed] if [{crL0n, ∀1≤n≤N} is ((Ldefinite∀1≤n≤N)  uniquely_defined  useful)]

(Hypothetic ETCI KR: Need of inventive concept not yet recognized)

else_output ‘LCOM(ETCI) is not lawfully_disclosed’  stop; else_output ‘LCOM(ETCI) is not enablingly_disclosed’  stop; else_output ‘LCOM(ETCI) is not (useful  definite)’  stop;

(the ETCI’s claim interpretation: Closing Remark is the same as in FIG1.1 ─ yet now also enabled meeting the ‘Biosig requirements’)

if [LCOM(ETCI) comprises an nPE TT0] if [LCOM(ETCI) is an application of the nature of TT0] if [LCOM(ETCI) is significantly more than TT0] if [LCOM(ETCI) is limited preemptive]

else_output ‘LCOM(ETCI) is not comprising an nPE TT0’  stop; else_output ‘LCOM(ETCI) is not an application of the nature of TT0’  stop; else_output ‘LCOM(ETCI) is not significantly more than TT0’  stop; else_output ‘LCOM(ETCI) is not limited preemptive’  stop;

if [LCOM(ETCI) is over RS not (anticipated  obvious)]

else_output ‘LCOM(ETCI) is not patentable due to RS’  stop;

(in the ETCI’s claim construction by §§ 101 these 4 tests are additionally to be passed ─ then the ETCI’s PE test by Bilski, Myriad and Alice is completed, i.e. it is patent-eligible)

(in the ETCI’s claim construction by §§ 102/103 this test is additionally to be passed ─ then also the ETCI’s patentability test by KSR & Graham is completed, i.e. it satisfies SPL)

output ‘ COM(ETCI) satisfies SPL’  stop. L

FSTP-Test- stop.

FIG1.2: The ETCI’s FSTP-Test in much_less_nonIDL(O-)KR is of pre-&post-MBA era, i.e. of metaphysical quality2)3)

FSTP-Test- start (Complete MBA-framework-refined KR) input ‘COM(ETCI) ≡ O-/A-/E-inC0S ::= ’O-inC0S ::= {O-inC0n ≈ O-MUIS0n, ∀1≤n≤N}  A-inC0S ::= {A-inC0n ≈ A-MUIS0n, ∀1≤n≤N}   E-inC0S ::= {(E-inC0nk  E-ninC0nk) ≈ E-MUIS0nk, ∀1≤k≤Kn∧∀1≤n≤N}  inC0S_AXIOM-Lib‘; 1) if [(E-crC0nk  E-ncrC0nk), ∀1≤k≤Kn, ∀1≤n≤N, is lawfully_disclosed] else_output ‘COM(ETCI) is not lawfully_disclosed’  stop; 2) if [A-crC0n=∧1≤k≤Kn(E-inC0nk  E-ninC0nk), ∀1≤n≤N, is enablingly disclosed] else_output ‘COM(ETCI) is not enablingly_disclosed’  stop; 3) if [COM(ETCI) is (E-definite  E-complete  uniquely_defined  useful)] else_output ‘COM(ETCI) is not (useful  definite)’  stop;

4) 5) 6) 7)

(the ETCI’s claim interpretation: Closing Remark same as in FIG1.2 ─ yet now also O-/A-/E-level notional resolution alias refinement performed)

if [COM(ETCI) comprises an nPE TT0] if [COM(ETCI) is an application of the nature of TT0] if [COM(ETCI) is significantly more than TT0] if [COM(ETCI) is limited preemptive]

else_output ‘COM(ETCI) is not comprising an nPE TT0’  stop; else_output ‘COM(ETCI) is not an application of the nature of TT0’  stop; else_output ‘COM(ETCI) is not significantly more than TT0’  stop; else_output ‘COM(ETCI) is not limited preemptive’  stop;

(in the ETCI’s claim construction by §§ 101 these 4 tests are additionally to be passed ─ then the ETCI’s PE test by Bilski, Myriad, and Alice is completed, i.e. it is patent-eligible)

8) if [COM(ETCI) comprises only independent E-inC0nk] 9) if [COM(ETCI) is definite over RS] 10) if [COM(ETCI)’s semantic height over RS is (>0/>1 over one/severalRS)]

else_output ‘COM(ETCI) is not independent’  stop; else_output ‘COM(ETCI) is not definite over RS’  stop; else_output ‘COM(ETCI) is not patentable over RS’  stop;

(in the ETCI’s claim construction by §§ 102/103 this test is additionally to be passed ─ then also the ETCI’s patentability test by KSR & Graham is completed, i.e. it satisfies SPL)

output ‘COM(ETCI) satisfies SPL’  stop.

FSTP-Test- stop.

FIG1.3: The ETCI’s FSTP-Test in IDLKR is of post-MBA era, i.e. of rational quality2)3)
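All eight FIGs share the same control flow: a fixed sequence of checks in which the first failing test aborts with a diagnostic, and only a full pass yields "COM(ETCI) satisfies SPL". The sketch below renders the ten test steps of FIG1.3 as such an early-exit chain purely for illustration; the predicate names are hypothetical placeholders, not the FSTP-Test's actual implementation.

```python
# Illustrative early-exit rendering of FIG1.3's ten FSTP-Test steps.
# The predicates are hypothetical placeholders (assumed methods on a COM(ETCI) object);
# steps 1-3 mirror claim interpretation (§ 112), 4-7 patent-eligibility (§ 101),
# and 8-10 patentability over the reference set RS (§§ 102/103).

def fstp_test(com_etci):
    checks = [
        (com_etci.is_lawfully_disclosed,          "is not lawfully_disclosed"),
        (com_etci.is_enablingly_disclosed,        "is not enablingly_disclosed"),
        (com_etci.is_definite_and_useful,         "is not (useful and definite)"),
        (com_etci.comprises_npe_tt0,              "is not comprising an nPE TT0"),
        (com_etci.applies_nature_of_tt0,          "is not an application of the nature of TT0"),
        (com_etci.is_significantly_more,          "is not significantly more than TT0"),
        (com_etci.is_limited_preemptive,          "is not limited preemptive"),  # per the text, a conjunction of steps 5 and 6
        (com_etci.has_independent_concepts,       "is not independent"),
        (com_etci.is_definite_over_rs,            "is not definite over RS"),
        (com_etci.has_sufficient_semantic_height, "is not patentable over RS"),
    ]
    for predicate, failure in checks:
        if not predicate():
            return f"COM(ETCI) {failure}"   # first failure stops the test
    return "COM(ETCI) satisfies SPL"
```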


FSTP-Test- start (Vastly mathematized MBA-framework-refined-KR) input ‘COM(ETCI) ≡ O-/A-/E-inC0S ::= ’O-inC0S ::= {O-inC0n ≈ O-MUIS0n, ∀1≤n≤N}  A-inC0S ::= {A-inC0n ≈ A-MUIS0n, ∀1≤n≤N}   E-inC0S ::= {(E-inC0nk  E-ninC0nk) ≈ E-MUIS0nk, ∀1≤k≤Kn∧∀1≤n≤N}  inC0S_ AXIOM-Lib ‘; 1) if [,∀1≤n≤N, is lawfully_disclosed] else_output ‘COM(ETCI) is not lawfully_disclosed’  stop; 2) if [{A-crC0n=∧1≤k≤Kn(E-inC0nk  E-ninC0nk), ∀1≤n≤N}, is enablingly disclosed] else_output ‘COM(ETCI) is not enablingly_disclosed’  stop; 3) if [COM(ETCI) is (E-definite  E-complete  uniquely_defined  useful)] else_output ‘COM(ETCI) is not (useful  definite)’  stop;

4) 5) 6) 7)

(in the ETCI’s claim interpretation: Closing Remark same as in FIG1.3)

if [scope(E-crCSTT0) ≠+ Φ] if [(TT0scope(E-crCSETCI)  scope(E-crCSTT0 )] if [(E-crCSAlice ≠ Φ)] if [(TT0scope(E-crCSETCI)  scope(E-crCSTT0 ))  (E-crCSAlice ≠ Φ)]

else_output ‘COM(ETCI) is not comprising an nPE TT0’  stop; else_output ‘COM(ETCI) is not an application of the nature of TT0’  stop; else_output ‘COM(ETCI) is not significantly more than TT0’  stop; else_output ‘COM(ETCI) is not limited preemptive’  stop;

8) if [∀ϵ{E-crC0nk | 1≤n≤N ˄ 1≤k≤Kn} are independent of each other] 9) if [∀i,n,k∃Δi,n,k ∷= if (E-crCink = E-crC0nk) ‘A’ else ‘N’] 10) if [∑1≤n≤N (min∀i[1,I] I{}I)≥2]

else_output ‘COM(ETCI) is not independent’  stop; else_output ‘COM(ETCI) is not definite over RS’  stop; else_output ‘COM(ETCI) is not patentable over RS’  stop;

(in the ETCI’s claim construction by §§ 101: Closing Remark same as in FIG1.3)

(in the ETCI’s claim construction by §§ 102/103: As in FIG1.2 ─ yet now also meeting the KSR & Graham requirements, testing the ETCI’s patentability, i.e. its satisfying 35USC/SPL)

output ‘COM(ETCI) satisfies SPL’  stop.

FSTP-Test- stop.

FIG1.4: The ETCI’s FSTP-Test in IDLKR is of post-MBA-era, i.e. of rational & vastly mathematical quality2)

FSTP-Test- start input LCOM(ETCI) ≡ inL0S ::= ‘{inL0n ≈ MUIS0nUSPTO, ∀1≤n≤N}’; if [, ∀1≤n≤N, is lawfully_disclosed] if [crL0n, ∀1≤n≤N, is enablingly_disclosed] if [{crL0n, ∀1≤n≤N} is (useful  (Ldefinite∀1≤n≤N)  uniquely_defined)]

(USPTO: BRIUSPTO&2-step-test KR, no inCS, no MBA-framework-refinement)

else_output ‘LCOM(ETCI) is not lawfully_disclosed’  stop; else_output ‘LCOM(ETCI) is not enablingly_disclosed’  stop; else_output ‘LCOM(ETCI) is not (useful  definite)’  stop;

(in the ETCI’s claim interpretation by §§ 112 for a LCOM(ETCI) of it ─ representing the ETCI ─ these 3 tests are necessarily passed by the ETCI for its satisfying SPL)

if [LCOM(ETCI) passes the USPTO’s 2-step-test] if [LCOM(ETCI)] passes the Graham test output ‘LCOM(ETCI) satisfies SPL’  stop.

else_output ‘LCOM(ETCI) is nPE’  stop; else_output ‘LCOM(ETCI) is nonpatentable over RS’  stop;

FSTP-Test- stop.

FIG1.1’: The ETCI’s FSTP-Test in nonIDL(O-)KR is of pre-MBA era, i.e. of highly metaphysical quality2)3)

FSTP-Test- start (CAFC: BRIPHI & incomplete Alice-interpretation, no O-/A-/E-refinement) input LCOM(ETCI) ≡ inL0S ::= ‘{inL0n ≈ MUIS0nPHI, ∀1≤n≤N}’; if [, ∀1≤n≤N, is lawfully_disclosed] else_output ‘LCOM(ETCI) is not lawfully_disclosed’  stop; if [{crL0n, ∀1≤n≤N} is enablingly_disclosed] else_output ‘LCOM(ETCI) is not enablingly_disclosed’  stop; L if [{crL0n, ∀1≤n≤N} is(( definite∀1≤n≤N)  uniquely_defined  useful)] then_met Biosig else_output ‘LCOM(ETCI) is not (useful  definite)’  stop; the ETCI’s claim interpretation: Closing Remark same as in FIG1.1 ─ yet now also meeting the ‘Biosig requirements’

if [LCOM(ETCI) is an application of the nature of TT0] if [LCOM(ETCI) is significantly more than TT0]

else_output ‘LCOM(ETCI) is not an application of the nature of TT0’  stop; then_met Alice else_output ‘LCOM(ETCI) is not significantly more than TT0’  stop;

if [LCOM(ETCI) is over RS not (anticipated  obvious)]

then_met Graham

(in the ETCI’s claim construction by §§ 101 such tests are additionally to be passed ─ then the ETCI’s PE test by Bilski, Myriad, and Alice is completed, i.e. it is patent-eligible)

else_output ‘LCOM(ETCI) is not patentable due to RS’  stop;

(in the ETCI’s claim construction by §§ 102/103 this test is additionally to be passed ─ then also the ETCI’s patentability test by KSR & Graham is completed, i.e. it satisfies SPL)

output ‘ COM(ETCI) satisfies SPL’  stop. L

FSTP-Test- stop.

FIG1.2’: The ETCI’s FSTP-Test in much_less_nonIDL(O-)KR is of pre-/post-MBA era, i.e. of metaphysical quality2)3)

FSTP-Test- start (authentic MBA framework KR) input ‘COM(ETCI) ≡ O-/A-/E-inC0S ::= ’O-inC0S ::= {O-inC0n ≈ O-MUIS0n, ∀1≤n≤N}  A-inC0S ::= {A-inC0n ≈ A-MUIS0n, ∀1≤n≤N}   E-inC0S ::= {(E-inC0nk  E-ninC0nk) ≈ E-MUIS0nk, ∀1≤k≤Kn∧∀1≤n≤N}  E-crC0S_DEF‘; 1) if [(E-crC0nk  E-ncrC0nk), ∀1≤k≤Kn, ∀1≤n≤N, is lawfully_disclosed] else_output ‘COM(ETCI) is not lawfully_disclosed’  stop; 2) if [A-crC0n=  1≤k≤Kn(E-inC0nk  E-ninC0nk), ∀1≤n≤N, is enablingly disclosed] else_output ‘COM(ETCI) is not enablingly_disclosed’  stop; 3) if [COM(ETCI) is (E-definiteE-completeuniquely_defineduseful)] then_met Biosig else_output ‘COM(ETCI) is not (useful  definite)’  stop; 4) 5) 6) 7)

(the ETCI’s claim interpretation: Closing Remark same as in FIG1.2 ─ yet now also O-/A-/E-level notional resolution alias refinement)

if [COM(ETCI) comprises an nPE TT0] if [COM(ETCI) is an application of the nature of TT0] if [COM(ETCI) is significantly more than TT0] if [COM(ETCI) is limited preemptive]

then_met Bilski

then_met Alice

else_output ‘COM(ETCI) is not comprising an nPE TT0’  stop; else_output ‘COM(ETCI) is not an application of the nature of TT0’  stop; else_output ‘COM(ETCI) is not significantly more than TT0’stop; else_output ‘COM(ETCI) is not limited preemptive’  stop;

(in the ETCI’s claim construction by §§ 101 these 4 tests are additionally to be passed ─ then the ETCI’s PE test by Bilski, Myriad, and Alice is completed, i.e. it is patent-eligible)

8) if [COM(ETCI) comprises only independent E-inC0nk]

9) if [COM(ETCI) is definite over RS]

1/≥2 10) if [COM(ETCI)’s seman. height over RS is (≥1/≥2 if AC RS)] then_met Graham

else_output ‘COM(ETCI) is not independent’  stop; else_output ‘COM(ETCI) is not definite over RS’  stop; else_output ‘COM(ETCI) is not patentable over RS’  stop;

(in the ETCI’s claim construction by §§ 102/103 this test is additionally to be passed ─ then also the ETCI’s patentability test by KSR & Graham is completed, i.e. it satisfies SPL)

output ‘COM(ETCI) satisfies SPL’  stop.

FSTP-Test- stop.

FIG1.3’: The ETCI’s FSTP-Test in IDLKR is of post-MBA era, i.e. of rational quality2)3) 2

The FIGs1.1’-4’ are slightly more informative than FIGs1.1-4 for explicitly quoting/reminding of the Supreme Court’s 6 decisions defining its MBA framework. To facilitate grasping these 8 FIGs’ messages when reading them, it helps to toggle forth-and-back between them (and all ftn’s). These 8 KRs of the FSTP-Test indicate by different highlights the vocabularies, syntaxes, and semantics of any of the 3 object-languagefragments integrated into the object-IDL, including those of their tiny natural languages wordings (self-explanatory for any IT expert, in[182,373] explained in more detail by3)4)a simple BNF). All of an IDL’s linguistic constructs are used stereotypically over all ETCIs, as occurring in any COM(ETCI)’s FSTP-Test . Moreover: These 8 KRs in total evidently comprise all notions of ●Elementary Mathematics and of ●SPL of MBAframework flavor, reported in trivialized English ─ thus greatly facilitating accepting the MBA-framework. This is amplified by showing: KRs1.1&1’ are metaphysical (in a highly speculative manner as to several aspects, which applies to much of classic patent law thinking), KR1.2’ is still metaphysical (in a less speculative manner, as 2 of the 3 preceding grounds remain), KR1.2 is metarational (in less speculative manner), KRs1.3&3’ are rational ─ as by the MBA-framework expected[378,378,378] ─ and KR1.4&4’ are partially resp. fully mathematical. The deficient SPL understanding of the semantics of ETCIs in KR1.1 is currently the same in all national SPL jurisdictions, worldwide, except in the US: The Supreme Court’s MBA-framework ─ and its discussion in the US patent community supported by the CAFC and USPTO, not being aware of these two institutions’ uncertainties about the currently nonexistent logical foundations of their working, in particular not noticing that their internal and external clashes of recent years are the unavoidable consequences of this lack of a consistent paradigm for it ─ has overcome these uncertainties, thus putting the US SPL into an international lead of about 10 years, in particular as US SPL thereby additionally has been put in line with AIT[2], thus enabling it to achieve the advantages of robustness & efficiency in the ETCI’s patent business reported here.


FSTP-Test- start (notional partially mathematized MBA framework KR) input ‘COM(ETCI) ≡ O-/A-/E-inC0S ::= ’O-inC0S ::= {O-inC0n ≈ O-MUIS0n, ∀1≤n≤N}  A-inC0S ::= {A-inC0n ≈ A-MUIS0n, ∀1≤n≤N}   E-inC0S ::= {(E-inC0nk  E-ninC0nk) ≈ E-MUIS0nk, ∀1≤k≤Kn∧∀1≤n≤N}  E-crC0S_MatDEF‘; 1) if [,∀1≤n≤N, is lawfully_disclosed] else_output ‘COM(ETCI) is not lawfully_disclosed’  stop; 2) If [{A-crC0n=∧1≤k≤Kn(E-inC0nk  E-ninC0nk), ∀1≤n≤N} is enablingly disclosed] else_output ‘COM(ETCI) is not enablingly_disclosed’  stop; 3) If [COM(ETCI) is (E-definiteE-completeuniquely_defineduseful)] then_met Biosig else_output ‘COM(ETCI) is not (useful  definite)’  stop;

4) 5) 6) 7)

(the ETCI’s claim interpretation: Closing Remark same as in FIG1.2 ─ yet now also O-/A-/E-notional resolution alias refinement)

if [scope(E-crCSTT0) ≠+ Φ] if [(TT0scope(E-crCSETCI)  scope(E-crCSTT0 )] if [(E-crCSAlice ≠ Φ)] if [(TT0scope(E-crCSETCI)  scope(E-crCSTT0 ))(E-crCSAlice ≠ Φ)]

then_met Bilski else_output ‘COM(ETCI) is not comprising an nPE TT0’  stop; else_output ‘COM(ETCI) is not an application of the nature of TT0’  stop; then_met Alice else_output ‘COM(ETCI) is not significantly more than TT0’  stop; else_output ‘COM(ETCI) is not limited preemptive’  stop;

(in the ETCI’s claim construction by §§ 101: As in FIG1.2 ─ yet now also meeting the Bilski & Alice requirements, testing the ETCI’s PE, i.e. its patent-eligible)

8) if [∀ϵ{E-crC0nk | 1≤n≤N ˄ 1≤k≤Kn} are independent of each other] else_output ‘COM(ETCI) is not independent’  stop; 9) if [∀i,n,k∃Δi,n,k ∷= if (E-crCink = E-crC0nk) ‘A’ else ‘N’] else_output ‘COM(ETCI) is not definite over RS’  stop; 1≤n≤N i,n,1 i,n,Kn i[1,I] 10) if [∑ (min∀ I{}I)≥2] then_met Graham else_output ‘COM(ETCI) is not patentable over RS’  stop;

(in the ETCI’s claim construction by §§ 102/103: As in FIG1.2 ─ yet now also meeting the KSR & Graham requirements, testing the ETCI’s patentability, i.e. its satisfying 35USC/SPL)

output ‘COM(ETCI) satisfies SPL’  stop.

FSTP-Test- stop.

FIG1.4’: The ETCI’s FSTP-Test in IDLKR is of post-MBA era, i.e. of rational & total mathematical quality3) Prior to starting the FSTP-Test3) for an ETCI or PTR , or iteratively after this start, a human tester must derive from this ETCI’s specification (and potentially from its RS) a tentative COM(ETCI). In the above 4 nIDLKRs COM(ETCI) comprise only its (O-)inC0S[354,355] and in the 4 IDL KRs additionally its A- and E-inCS, for any inC in any of the 8 KRs also its corresponding MUIS, just as its ETCI-embedded TT0 and potentially its RS’es A/N-Matrix (both for brevity omitted as evident). Any rationalized and potentially mathematized4) KR of an ETCI needs confirmation by its FSTP-Test ─ as the 4 IDLKRs show, while for the 4 nIDLKRs this rationalization and potential mathematization is impossible2). The 4 IDLKRs show that an ETCI is checked by the FSTP-Test for its satisfying the refined SPL in semantically independent4) ways ─ more precisely: by 9 independent tests of 9 independent aspects of the SPL semantics embodied by the ETCI (as FSTP-test7 evidently is a logical conjunction of FSTP-test5 and FSTP-test6, hence is dependent on the both of them3), i.e. must not be counted twice when determining an 1.h)

3

While the above main text surveyed the 8 KRs1.1-1.4’, this ftn3) outlines key details/aspects of these KRs’ definitions: Upfront 4 remarks are helpful in avoiding misunderstanding in the cognitive highly topical analysis ─ triggered by the Supreme Court’s MBA framework in the FSTP-Project, underlying this paper ─ of rational4) interrelations between 35 USC/SPL, ETCIs, and the current state of development of AIT[2]: 1.) Neither the ●disaggregation of a given patent and the ETCI it specifies into its whatever elements and their properties (comprising their mutual interrelations just as those to the nPE posc and prior art) nor ●their descriptions implied by the MBA framework need to be ─ and often are ─ unique, even if their paradigms to be used are prescribed implicitly only (what the MBA framework implies). 2.) Any of the hints quoted under one KR may apply to several other of the 8 KRs, too, following or preceding the former KR. 3.) The title of any of the 8 KR-Boxes briefly characterizes its content by a catchphrase segment. 4.) For typo fixes & more explanations see[373,182]. ● FIG1.1 All notions used by the 8 KRs’ definitions are confirmed by the “Person of Pertinent Ordinary Skill & Creativity, Pposc”. As explained in[355], the notion of ‘limitation’ as broadly used by the patent community (here in the 4 nIDLKRs) is not definable – probably being the main reason that the MBA framework requires basing the descriptions of an invention (i.e. of its properties) on the definable notion ‘(inventive) concept(s)’, as done in IT System Design since 50 years ─ nevertheless rejected by the patent community until today. Yet this conflict can be easily eliminated by axiomizing the term ‘limitation’ to mean ‘inventive concept’, as explained in5.a). KR1.1 is presented thereby as a riminder of the wellknown non-refined classical SPL satisfaction test that is still unaware of any ETCI’s inevitable peculiarities for excluding that it would put the US patent system into jeopardy. Initially these peculiar notions

introduced into the SPL by the Supreme Court’s MBA-framework have been very nebulous, yet in the FSTP literature elaborated on them since this FSTP-Project was started, immediately after the Supreme Court’s KSR decision, for clarifying their meanings and building an IES[352] around them. This clarification basically means: Tightening Kant’s broad notion of ‘Aufklärung’ to today’s needs in [‘378] describing inventions, i.e. to his famous ‘mathematizability testimonial ─ in more detail: in precisely drafting ETCIs according to the ‘SPL MBA-framework interpretation’ by the Supreme Court, for short: in precisely drafting ETCIs in FSTP-IDL. ● FIG1.2 KR1.2 shows, how the KR1.1 of the classical SPL satisfaction test (i.e. of FIG1.1) is expanded by the Supreme Court’s MBA-framework if the latter’s notion of inCs, is ignored. Indeed, this key notion of an ‘inC’─ introduced into SPL (for ETCIs) by the Supreme Court’s Mayo decision ─ is strongly disliked by the USPTO just as the other MBA-framework introduced notions (quoted in the ‘Bilski&Alice section’ of an ETCI’s KR of FIG1.2): In spite of the fact that the inC notion being introduced by the Mayo decision, applied in the Supreme Court’s Myriad decision, and refined in its Alice decision, and in spite of in the latter’s hearing the Supreme Court clearly stated that it expects appropriate refinements of its then earlier framework-decisions’ notions to be provided by the patent community[378]. Instead, the USPTO did [378] not even mention them in its ‘Interim (Patent) Eligibility Guideline, IEG’ that the USPTO released shortly thereafter ─ not to mention that it oughtat least to have clarified these MBA-framework notions therein, especially the notion of an ETCI’s inC(s) ─ evidently expecting these indeed semiotic SPL notions introduced by the Supreme Court into legally dealing with ETCIs would sooner or later fade away, as allegedly being meaningless. As consequence of this ignoring the MBA-framework notions therein, they evidently are still considered by the patent community to be too metarational, as defined here. Yet the Supreme Court nowhere indicated that the notion of inventive concept should by abandoned again, but repeatedly confirmed that such refinements are to be developed “in the light of the Mayo and Myriad framework” that it consequently ─ as the CAFC ignored this directive ─ eventually developed on its own with Alice. It thus provided further hints that its refinement should resemble this key notion of thinking as known since Aristotle and broadly used in the late 60s in the then young ITtechnique of ‘System Design’. The explanation in FIG1.2’ of which of these 6 framework-decisions by the 10 FSTP-tests checks which of the 9 independent semantic aspects of the SPL-tested ETCI would remove nothing of this allegedly speculative Metaphysics, mistakenly felt to control the MBA-framework. Indeed, the only systematic way known until today to break down Metaphysics and Metarationality [378] to Rationality and eventually to Mathematics, as postulated by Kant , is known from System Design[2] and there called “separation of concerns”[278] ─ here achieved by splitting an invention into ETCI-elements, disaggregating any of their ‘original’ properties into a conjunction of ‘elementary’ properties (usually via its intermediate refinement into a conjunction of ‘abstract’ alias predicatable, i.e. formalizable properties[91,182]), and finally layering any abstract and its elementary properties’ predicates alias description, i.e. 
this ETCI-elements' complete conjunctive lattice of “use hierarchy” structured description, into what here is called its ‘O-/A-/E-levels description’ (by a conjunctive lattice of notions of refining resolutions). ●FIG1.3 Let a set S1 be ‘independent’ of a set S2, s1* iff ∄S1*S1 such s1* that ∀s1*S1* ∃ a Pposc-known function F such that s1*=F (S2).


ftn3) continued


ETCI’s semantic height over RS)4). This scrutiny, hitherto unneeded in "CTCI patent thinking", thus tells: the A/N-matrix that any IDLPTR comprises is extremely error prone, as its entries are often determined without checking whether a TT allegedly found in some doc.i there has e.g. an enabling disclosure and meets the other 8 FSTP-Aspects ─ happening as a rule if TT0's claim is interpreted by the USPTO's BRI (thus rendering doc.0 irrelevant for TT0's claim interpretation), as the USPTO regularly does (recently even by a precedential decision[364], thus diametrically contradicting the Supreme Court's MBA framework), and often the CAFC, too[378]. Already in [5] it was shown that for a non-pathologic ETCI all KRs of all its COM(ETCI)s are isomorphic as to their semantic height over RS ─ by the same reasoning this holds also for PE[373]. The remaining explanations for KR1.3 are shifted into [373]. The 8 IDLKRs in total indicate, by any ETCI's BNF, that their "FSTP-IDL" definition[373] in the future likely is the IDL of highest popularity[182]. Thus, while the scientific thinking required for enabling this scrutiny in 'patent thinking' about ETCIs is sophisticated, the IES[352] completely hides it.
The definitions from [354/2.h)] qualifying a 'human's SPL perception' of the meaning of an ETCI, of its scope, of ..., are here repeated in improved wordings for clarifying Kant's broad Aufklärungs-testimonial[354/2.b)] as to its specific meaning in SPL precedents about ETCIs1.a). Next, for objectively/uniformly describing the independent incremental increase of this human perception, the common sense notion of the 'dependable understanding, to a definitory degree, by an ordinary human brain' is taken as delimiter of the below quality levels of such perceptions. Thereby not an artificial, scientific, and/or otherwise conditioned human brain is assumed ─ but only its being familiar with (besides the posc in SPL of MBA-framework flavor) elementary Mathematics ─ i.e. elementary Arithmetic/Analysis/Algebra/Geometry/Set Theory/Logic[355] ─ and with KR[2] by "O-/A-/E-levels of refining notional resolutions" of/about such notions' below quality levels to be perceived. Accordingly, such notions' human incremental perception as to this level-wise refined KR of such a notion ─ in SPL precedents about it ─ is initially qualified as being of ●'metaphysical' quality, as its understanding then is by the Pposc3) derived from the O-MUIS0(COM(ETCI)); then, increased by notional refinement, as being of ●'metarational' (= 'aggregated-rational') quality, as aggregated from (meta-)atomic predicates; and finally as being of ●'rational' (= 'elementary-rational') quality ─ all such increments of understanding being, for the Pposc, disclosed by an ETCI's specification and/or its posc and/or its prior art ─ whereby by Kant[378] this final total KR is ●'mathematizable', which is achieved by an ETCI's 'axiomatization' by using an IDL (otherwise this ETCI is seen to be pathological[5,373] and is not considered here, if it should exist at all, which is extremely unlikely in the SPL context).
The meaning of any of these 3 (overlapping) human perception quality levels of an ETCI and its specified constituents is:
▪ metaphysical iff identified by all N ETCI-elements' *O-MUIS0n's, which denote the property alias meaning of any O-crC0n ─ potentially originally defining it only vaguely ─ by its set of "O-Mark-up Units Identified, O-MUIs" in ETCI's specification*, this meaning being subject to not being 'transcendent' or only 'highly speculative', and its vagueness to being of only 'low speculativity' and amenable to rationalization (as described next). This set of all N O-crC0n is ETCI's "Outer Shell" (as by the Supreme Court[315,378]) and is located on its O-level.
▪ metarational (= aggregated-rational) iff identified by all N ETCI-elements' *A-MUI0n ≡ O-MUI0n ─ the latter a priori statable more precisely by an A-crC0n than by the O-crC0n (and quite precisely statable after having determined A-crC0n's conjunctive, in COM(ETCI) independent3), K E-crC0k's making up this A-crC0n, or the E-crC0k conjunctive and in COM(ETCI) independent mirror (meta-)atomic predicates making up this A-crC0n's mirror meta-predicate)* ─ which determines of any of the N A-crC0n its property alias meaning and is located on its A-level.
▪ rational (= elementary-rational) iff identified by all K' atomic and K'' meta-atomic (with K' + K'' = K) E-crC0k mirror predicates' *E-MUI0k, defined ∀k' a priori by the posc and ∀k'' by an axiom (defined such that it encapsulates also its, for any k'' in crCS0-AXIOMs-IDLLib given, low Metaphysics, thus rationalizing it[373])* ─ which determines of any of the K E-crC0k its meaning and is located on its E-level.
NOTE: The crC0S-AXIOMs-IDLLib1.f) may seem to split off from the ETCI's E-level ─ in its 'use hierarchy'[278] defined by its O-/A-/E-hierarchy ─ an M-level supporting it, comprising the same items as the E-level, yet in their mathematical KR. But this is false, as this M-layer is not a notion-'refining' layer, but a notion-'précising' layer of the E-layer ─ similar to the relation between ETCI's O- and A-levels ─ i.e. logically irrelevant for SPL, as SPL nowhere (once its crCs' independence is guaranteed, which is achieved by its ETCIs' E-levels) requires this additional notional preciseness. I.e., in particular the MBA-framework gets along without it ─ it needs no mathematization of SPL precedents about ETCIs ─ though making the IES, which uses the just summarized advantages on top of such decisions, is impossible without its internal mathematization.

4) The Supreme Court is by the Constitution entitled to interpret SPL by its MBA-framework in favor of ETCIs. But many patent practitioners feel it is metaphysical. Yet this is false! Any IDL proves: the semi-automatic application of the FSTP-Test on ETCIs is next to trivial, extremely practical, rational/mathematical, and totally robust! And: nothing of all that goes without the ETCI's MBA-framework.

3  THE AFTERMATH: Enormous EFFICIENCY INCREASE for all PTOs and R&D-GROUPs

By Section II, the FSTP-Test is THE silver bullet in drafting & testing an ETCI for total robustness. By contrast, this Section III clarifies the two fundamentally different alternatives for ETCIs' FSTP-tests. To this end, it briefly outlines the basis for identifying and estimating in [373] the total amount of functionality a human must inevitably execute on an ETCI for verifying that it satisfies SPL ─ i.e. for generating the J LACs of 1.f) ─ by considering 8 different exemplary ETCI-KRs, 4 nIDLKRs and 4 IDLKRs:
 On way1, in a freestyleETCI's freestyleFSTP-Test ─ to be performed freehand by the human tester, who therefore must understand it in detail and completely, thereby forgoing any semi-automatic service and accepting that this test is of 'good will' quality only, i.e. metarational at best if not metaphysical ─ as well as
 on way2, in this IDLETCI's vastly mathematized IDLFSTP-Test ─ there to be semi-automatically performed by the human tester, as this test's results are proven to be mathematically correct upfront iff the ETCI passes it, so that the tester need not understand this test's working.
These two ways of executing an ETCI's FSTP-Test evidently are the ●manufacture-like/pre-industrial/pre-scientific way1b) (currently implemented by the IES prototype, open for a few friendly testers over the Internet from June 2017 on), and the ●semi-automatic/post-industrial/AIT-scientific way2 (planned for Q4/2017), not requiring the tester to (fully) understand the ETCI or the FSTP-Test.
In more detail: Using a freestyleETCI requires the human tester to fully understand this ETCI per se & the subtle but decisive intricacies of refined SPL & the FSTP-Test's implementation ─ exceeding the usual human capability to think dependably. But this 'human capacity bottleneck' may be avoided right from the outset by using mathematically proven correct[354,355] IDLKRs[373]: this enables relieving the tester of all 'understanding requirements' by leveraging the 'semi-automatic IDLETCI total robustness testing' capability of the IES. The only residual risk (existing in freestyle, too)


then is the correct definition of the 'necessarily by a human to provide' aspects of defining the ETCI's facts[373]. Thus, thinking ─ in an IDLPTR's[373] IDLFSTP-Test ─ about other issues than its few 'necessarily human-provided facts', in particular axioms4), is totally superfluous. Moreover, the IES may already have all its (final number of) results stored upfront in a data structure PTR-DS[7], and additionally all involved programs and their data may be audited to be correct. I.e.: committing a legal error in applying the IDLFSTP-Test to an IDLETCI is excluded (also after having dynamically and correctly input to it the necessarily human-provided ETCI facts, i.e. also if they are not stored prechecked in the PTR-DS). And: recognizing incorrect fact definitions is also significantly facilitated for the Pposc by the IES, as the mathematization of most such 'E-facts' starts from their 'nearby' metarational posc.
In total, the 4 main results of this paper may be summarized as follows:
 An IDLETCI's IDLFSTP-Test not only greatly facilitates drafting & testing its patent's total robustness, but also finding a potential failure in an attack on its validity, e.g. by a faulty prima-facie case[373].
 Depending on an IDLETCI's crCS-AXIOMs-IDLLib, the IDLFSTP-Test may be executed (almost) fully automatically[373]; its result is thus unassailable in everyday patent business, if the IES is approved by a recognized auditor or the absolute reliability of the IES's implementation & input is otherwise confirmed.
 As the 'FSTP-IDL' is a trivial subset of English, expanded by all SPL notions elementarily mathematized to different degrees, it likely will instantly be broadly used by the patent community.
 While the IES prototype is 35 USC/SPL based, its EPA/SPL variant1.e) is planned for Q2/18.

4  ABOUT 2 "MBA-FRAMEWORK LETTERS" ─ AND ALIKE ─ TO CONGRESS

There is a nice "Client Alert Commentary"[375] providing a compact survey of the trend-indicating/-making voices in the patent community, explicitly and implicitly including ─ besides the CAFC and USPTO ─ in particular the AIPLA[376] and the IPO[377], just as clusters of IT or pharma firms, just as …. Most of them are so upset about not understanding the Supreme Court's MBA-framework that they now even ask Congress for relief from at least the peak of the unfortunate mistakes the Supreme Court allegedly committed by it, the Patent-Eligibility alias §101 problem ─ except patent experts with a profound IT qualification, especially in System Design Technique, who disagree with this majority and send out quite positive signals about the MBA-framework. The only purpose of this very brief Section is to put these voices into a ranking relation to the Supreme Court's MBA-framework initiative, evidently designed to break up the

standstill in adjusting the SPL to the needs of ETCIs, as the classical SPL interpretation's paradigm turned out to be far too coarse for robustly protecting by SPL the notionally much more sophisticated filigree inevitably embodied by any ETCI. To put it short: the FSTP-Technology as represented by the FSTP-Test ─ induced by the principles underlying the MBA-framework, though brought in line with AIT[2] thinking, yet without compromising the Supreme Court's socio-economic cognitions concerning the US society, summarized in its Mayo opinion ─ provides a simple and unquestionable means for rating these voices in such relations, as explained next.
It namely instantly becomes evident that neither the IPO's nor the AIPLA's suggested modifications of the current § 101 provide any provision for excluding the patenting of ETCIs of unlimited preemptivity, which politically threaten to put the entire US NPS into jeopardy as socio-economically untenable, unless fair sublicensing becomes mandatory in favor of preempted ETCIs (as practiced in Europe). Yet thereby the notion of 'fair' is known to be indefinable, and thus their suggestions create another source of inconsistency and unpredictability of patent precedents about ETCIs. By contrast, the MBA-framework minimizes the likelihood of an ETCI encountering a preemption without notice, by requiring that the area of admissible potential preemptions ─ this area must be preserved for and granted to the investor of financial and/or personal engagement also in the future, for incentive reasons ─ must, by the specification of any ETCI to be patent-eligible, be ●explicitly disclosed and ●precisely verified to satisfy § 112, whereby the impact of meeting these requirements on the set of all future ETCIs is a priori minimized under the cost function logically/necessarily implied by the ETCI to be patent-eligible. None of the suggested modifications seems to comprise the faintest hint that this Solomonic cognition embodied by Alice's PE analysis (representing purely mathematical thinking as envisioned by Kant[378]) has been preserved ─ probably because it was not recognized ─ just as in the USPTO's IEG interpretation of this Alice analysis by its '2-step test', just as still in part of the current CAFC PE precedents (DDR et al., though being correct in the Alice sense). In the FSTP-Test this Solomonic Supreme Court decision is covered by the 4 FSTP-tests4-7, which are logically totally intermeshed with all remaining 6 FSTP-testso.
Hearsay tells that a similar phenomenon occurred in Theoretical Physics, when N. Bohr's atom model had been replaced by E. Schrödinger's: 10 years later part of this scientific community still struggled with trying to derive Heisenberg's uncertainty relation as to an elementary particle's p and v from Bohr's simple and clear atom model ─ refusing to accept that the latter's deterministic paradigm simply is notionally too coarse to meet the requirements stated by the then refined Theoretical Physics. This story shows that nothing is unusual about the broad reluctance to accept the paradigm refinement that the Supreme Court (necessarily) found for meeting by SPL the requirements stated by ETCIs.


5  The FSTP-Project's Reference List

*) FSTP = Facts Screening/Transforming/Presenting (Version of 15.05.2017)

Most of the FSTP-Project papers below are written in preparation of the textbook [182] – i.e. are not intended to be fully self-explanatory independent of their predecessors.

[2] AIT: "Advanced Information Technology" alias "Artificial Intelligence Technology" denotes cutting-edge IT areas, e.g. Knowledge Representation (KR)/Description Logic (DL)/Natural Language (NL)/Semantics/Semiotics/System Design, just as MAI: "Mathematical Artificial Intelligence", the resilient fundament of AIT and of the "Facts Screening/Transforming/Presenting, FSTP"-Technology developed in this FSTP-Project.
[5] S. Schindler: "Math. Model. Substantive Patent Law (SPL) Top-Down vs. Bottom-Up", Yokohama, 2012, JURISIN 20*).
[6] S. Schindler, "FSTP" pat. appl.: "THE FSTP EXPERT SYSTEM", 2012*).
[7] S. Schindler, "DS" pat. appl.: "AN INNOVATION EXPERT SYSTEM, IES, & ITS PTR-DS", 2013*).
[9] .a S. Schindler, "Patent Business – Before Shake-up", 2013*). .b S. Schindler, "Patent Business – Before Shake-up", 2015*). .c S. Schindler, "Patent Business – Before Shake-up", 2017*).
[35] S. Schindler, IPR-MEMO: "Definitional Distinctions between ─ and Common Base Needed of ─ Subs. Trademark Law, Subs. Copyright Law, and Subs. Patent Law", in prep.
[37] D. Bey, C. Cotropia, "The Unreasonableness of the BRI Standard", AIPLA, 2009*).

[1] S. Schindler: "US Highest Courts' Patent Precedents in Mayo/Myriad/CLS/Ultramercial/LBC: 'Inventive Concepts' Accepted, – 'Abstract Ideas' Next? Patenting Emerging Technologies' Inventions Now without Intricacies"*).

[3] R. Brachman, H. Levesque: "Knowledge Represent. & Reasoning", Elsevier, 2004.
[4] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider: "The Description Logic Handbook", CUP, 2010.

[8] S. Schindler, J. Schulze: "Technical Report #1 on '902 PTR", 2014.

[10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34]

SSBG's AB to CAFC in LBC, 2013*). S. Schindler, “inC” pat. appl.: “inC ENABLED SEM I-AUTO. TESTS OF PATENTS”, 2013*). C. Correa: “Handbook on Prot. of IP under WTO Rules”, EE, 2010. N. Klunker: "Harmonisierungsbest. im m at. Patentrecht”, MPI, 2010. “USPTO/M PEP: “2111 Claim Interpretation; Broadest Reason. Interpretation”). S. Schindler: “KR Support for SPL Precedents”, Barcelona, eKNOW-2014*). J. Daily, S. Kieff: “Anyt. under the Sun M ade by Hum ans SPL Doctrine as End. Instit. for Com m. Innovation”, Stanford/GWU*). CAFC En banc Hearing in LBC, 12.09.2013. USSC: SSBG’s AB in CLS, 07.10.2013*). USSC: SSBG’s AB in WildTangt, 23.09.2013*). USPTO, “Intellectual Property and the US Econom y: INDUSTR. IN FOCUS”, 2012*). K. O'Malley: Keynote Address, IPO, 2013*). S. Schindler, “An Inventor View at the Grace Period”, Kiev, 2013*). S. Schindler, “The IES and inC Enabled SPL Tests”, Munich, 2013*). S. Schindler, “Two Fund. Theorem s of ‘Math. Innovation Science’”, Hong Kong, ECM -2013*). S. Schindler, A. Paschke, S. Ram akrishna, “Form . Leg. Reas. that an Inven. Satis. SPL”, Bologna, JURIX-2013*). USSC: SSBG’s AB in Bilski, 06.08.2009*). T. Bench-Capon, F. Coenen: “Isomo. and Legal Knowledge Based System s”, AI&Law, 1992*). N. Fuchs, R. Schwitter. "Att. to Con. E.", 1996. A. Paschke: “Rules / Logic Programm ing in the Web”. 7. ISS, Galway, 2011. K. Ashley, V. Walker, “From Info. to Arg. Retr. for Legal Cases”, Bologna, JURIX-2013*). CAFC, H. in Oracle / Google, “As to Copyrightability of the Java Platf.”, 06.12.2013. S. Schindler, “A KR Based Inno. E. Sys. (IES) for US SPL Preceds”, Phuket, ICIIM -2014*). S. Schindler, “Status Report about the FSTP Prototype”, Hyderabad, GIPC-2014. S. Schindler, “Status of the FSTP Prototype”, M oscow, LESI, 2014.

[36] S. Schindler, "Boon and Bane of Inventive Concepts and Refined Claim Construction in the Supreme Court's New Patent Precedents", Berkeley, IPSC, 08.08.2014*).

[38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63]

CAFC, Transcript of the Hearing in TELES vs. CISCO/USPTO, 08.01.2014*). CAFC, Transcript of the en banc Hearing in CLS vs. ALICE, 08.02.2013*). SSBG's Brief to the CAFC in case '453*). SSBG's Brief to the CAFC in case '902*). SSBG's Amicus Brief to the CAFC in case CLS, 06.12.2012*). S. Schindler, “LAC” pat. appl.: „Sem i-Auto. Gen./Custom. of (All) Confirm ative Legal Arg. Chains (LACs) in a CI`s SPL Test, Enabled by Its Inventive Concepts”, 2014*). R. Rader, S. Schindler: Panel disc. "Patents on Life Sciences", Berlin, LESI, 2012. USSC: SSBG’s AB as to CIIs, 28.01. 2014*). S. Schindler: "Autom . Deriv. of Leg. Arg. Chains (LACs) from Arguable Subtests (ASTs) of a Claim ed Invention's Test for Satisfying. SPL", U Warsaw, 24.05.2014*). S. Schindler: "Autom atic Generation of All ASTs for an Invention's SPL Test".*). USPTO/M PEP, “2012 Proc. for Subj. M at. Eli. ... of Pro. Claim s Inv. Laws of Nature”, 2012*). USPTO/M PEP, Supp. Ex. Guideli. for Determ . Com pli. with 35 U.S.C. 112; MPEP 2171*). NAUTILUS v. BIOSIG, PFC, 2013*). BIOSIG, Respondent, 2013*) Public Knowledge et al., AB, 2013*). Am azon et al., AB, 2013*). White House, FACT SHEET - ... the Presid.’s Call to Str. Our PS and Foster Inno., 2014*). B. Russel: “Principia M athematica”, see wikipedia. CAFC Decision Phillips v. AWH Corp., 12.07.2005 M. Adelm an, R. Rader, J. Thom as: "Cases and M aterials on Patent Law", West AP, 2009. SSBG's Amicus Brief to the Supreme Court as to its (In)Definiteness Quest’s, 03.03, 2014*). S. Schindler, “UI” pat. appl.: “An IES Cap. of S-Auto. Gen./Invoking All LACs in the SPL-T …, Ean. by InCs”, 2014*). S. Schindler: "Auto. Der. of All Arg. Chains Leg. Def. Patenting/Patented Inventions", ISPIM , Montreal, 6.10.2014, *). H. Wegner: "Indf., the Sl. Giant in SPL", www. laipla.net/hal-wegners-top-ten-patent-cases/. .a) CAFC decision on reexam ination of U.S. Pat. No. 7,145,902, 21.02.2014*). .b) CAFC decision on reexam ination of U.S. Pat. No 6,954,453, 04.04.2014*).

[64] B. Wegner, S. Schindler: "A Math. Structure Modeling Inventions", Coimbra, CICM-2014*).

[65] SSBG's Petition to the CAFC for Rehearing En Banc in the '902 case, 18.04.2014*).
[66] CAFC: VEDERI vs. GOOGLE, 14.03.2014.
[67] CAFC: THERASENSE decision, 25.05.2011.
[68] B. Fiacco: Amicus Brief to the CAFC in VERSATA v. SAP&USPTO, 24.03.14*).

[69] USSC, Transcript of the oral argument in Alice Corp. v. CLS Bank, 31.03.2014*).

[70] [71] [72]

R. Rader, Keynote Speech: “Pat. Law and Liti. Ab.”, ED Tex Bench and Bar Conf., 01.11.2013*). S. Schindler, Keynote Speech: “eKnowledge of SPL – Trail Blazer into the Innovation Age”, Barcelona, eKNOW-2014*). .a) S. Schindler: “The USSC's ‘SPL Init.’: Sci. Its SPL Interpreta. Rem oves 3 Everg. SPL Obscurities”, PR, 08.04.2014*). .b) S. Schindler: “The Supreme Court’s ‘SPL Initiative’: Sci. Its SPL Int. Rem . 3 Everg. SPL Obsc. and En. Auto. in a CI’s SPL Tests and Arg. Chains”, Honolulu, IAM2014S, 18.07.14*). .a) USPTO/MPEP: “2014 Procedure For Subject M atter Eligibility Analysis Of Claims Reciting Or Involving Laws Of Nature/Natural Principles, Natural Phenomena, And/Or Natural Products”, [48,49], 2014*). .b) M EM ORANDUM : “Prelim . Examin. Instructions in view of Alice v. CLS”*). B. Wegner: "The Math. Background of Proving InCs Based Claimed Inv. Satisfies SPL”, 7. GIPC, Mum bai, 16.01.2015.*) CAFC Order as to denial [65], 27.05.2014 D. Crouch: “En Banc Fed. Cir. Panel Changes the Law of Claim Construction”, 13.07.2005*). Video of the USPTO Hearing, 09.05.2014*). R. Rader, Keynote Speech at GTIF, Geneva, 2014 and LESI, M oscow, 2014 S. Schindler: “On the BRI-Schism in the US NPS …”, publ. 22.05.2014.*) USSC: SSBG’s PfC in the ‘902 case, Draft_V.133_of_ [121], publ. 14.07.2014*). S. Schindler: “To Whom is Interested in the Suprem e Court’s Biosig Decision”*) R. DeBerardine: “Inno.Corp.Per.”, FCBA*). SSBG’s Petition to the CAFC for Rehearing En Banc in the ‘453 case, 09.06.2014*). CAFC’s Order as to denial [83], 14.07.2014*). CAFC: “At Three Decades”, DC, 2012. S. Schindler Foundation: “Transatlantic Coop. for Growth and Security”, DC, 2011. DPM A: “Recent Developm ents and Trends in US Patent Law“, Munich, 2012. FCBA: “Inno., Trade, Fis. Real.”, Col. S., 2013. LESI: GTIF, Geneva, 2014. FCBA: “Sharp. C. M an.”, Asheville, N.C., 2014

[73] [74] [75] [76] [77] [78] [79] [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90]

[314] [315] [316]

[91] B. Wegner, S. Schindler: "A Math. KR Model for Refining Claim Interpret. & Constr.", in prep.

[92] [93] [94] [95] [96] [97] [98] [99] [100] [101] [102] [103] [104] [105] [106] [107] [108] [109] [110] [111] [112]

SSBG’s Petition for Writ of Certiorari to the Suprem e Court in the ‘453 case, 06.10.2014*). E. M orris: “What is ‘Technology’?”, IU I.N.*) E. M orris: “Alice, Artifice, and Action – and Ultram ercial”, IU I.N., 08.07.2014*). S. Schindler, ArAcPEP-M EM O: “Artifice, Action, and the Pat.-Eli. Prob.”, in prep., 2014. A. Chopra: “Deer in the Headlights. Response of Incum bent Firms to … ”, School of M anagement, Fribourg, 2014*). S. Schindler, DisInTech-M EMO: “R&D on Pat. Tech.: Eff. and Safety Boost.”, in prep., 2014. G. Boolos, J. Burgess, R. Jeffrey: “Com putability and Logic”, Cambridge UP, 2007. A. Hirshfeld, Alexandria, PTO, 22.07.2014*). C. Chun: “PTO’s Scrutiny on Software Patents Paying Off”, Law360, N.Y.*). P. M ichel, Keynote, PTO, 22.07.2014. D. Jones, Alexandria, PTO, 22.07.2014. R. Gomulkiewicz, Seattle, CASRIP, 25.07.14. M. Lemley, Seattle, CASRIP, 25.07.2014. D. Jones, Seattle, CASRIP, 25.07.2014. B. LaMarca, Seattle, CASRIP, 25.07.2014. J. Duffy, Seattle, CASRIP, 25.07.2014. J. Pagenberg, Seattle, CASRIP, 25.07.2014. M. Adelm an, Seattle, CASRIP, 25.07.2014. B. Stoll, Seattle, CASRIP, 25.07.2014. R. Rader, Seattle, CASRIP, 25.07.2014. E. Bowen, C. Yates: “Justices Should Back Off Patent Eligibility, …”, L360*).

[113] S. Schindler: "The CAFC's Rebellion is Over – The USSC, by Mayo/Biosig/Alice, ...", published 07.08.2014*).

[114] [115] [116] [117] [118] [119] [120] [121] [122] [123] [124] [125] [126] [127]

S. Elliott: “The USPTO Patent Subj. M atter Eligi. Guidance TRIPSs”, 30.07.2014*). W. Zheng: “Exhausting Patents”, Berkeley, IPSC, 08.08.2014*). R. M erges: “Ind. Inv.: A Lim ited Defense of Absolute Infringem ent Liability in Patent Law”, Berkeley, IPSC, 08.08.2014*). J. Sarnoff, Berkeley, IPSC, 08.08.2014. H. Surden: “Principles of Problem atic Pats”, Berkeley, IPSC, 08.08.2014*). www.zeit.de/2013/33/m ultiple-sklerose-medikam ent-tecfidera/seite-2*). J. M erkley, M . Warner, M . Begich, M . Heinrich, T. Udal: “Letter to Hon. Penny Pritzker”, DC, 06.08.2014*). USSC: SSBG’s PfC in ‘902 case, 25.08.2014*). D. Parnas, see Wikipedia. E. Dijkstra, see Wikipedia. S. Schindler: “Computer Organization III”, 3. Sem ester Class in Com p. Sc., TUB, 1974-1984. S. Schindler: “Nonsequential Algorithm s”, 4. Sem ester Class in Com p. Sc., TUB, 1978-1984. S. Schindler: “Optim al Satellite Orbit Transfers”, PhD Thesis, TUB, 1971. USSC Decision in KSR ….. USSC Decision in Bilski ….. USSC Decision in Mayo ….. USSC Decision in Myriad ….. USSC Decision in Biosig ….. USSC Decision in Alice …. R. Feldm an: “Coming of Age for the Federal Circuit”, The Green Bag 2014, UC Hastings. G. Quinn: “Judge M ichel says Alice Decision ‘will create total chaos’”, IPWatch,*). G. Frege: “Function und Begriff“, 1891. L. Wittgenstein: “Tract. logico-philoso.”, 1918. B. Wegner, M EM O: “About relations (V.7-final)”, 25.04.2013*). B. Wegner, M EM O: “About con. of pre. /con., scope and solution of problems”, 20.08.2013. B. Wegner, M EM O: “A refined relat. between dom ains in BADset and BEDset”, 18.09.2014. H. Goddard, S. Schindler, S. Steinbrener, J. Strauss: FSTP M eeting, Berlin, 29.09.2014. S. Schindler: “Tutorial on Comm onalities Between System Design and SPL Testing”.*). S. Schindler: “The Rationality of a Claim ed Invention’s (CI’s) post-Mayo SPL Test – It Increases CI’s Legal Quality and Professional Efficiency in CI’s Use”, in prep. S. Schindler: “The USSC Guid. to Robust ET CI Patents”, ICLPT, Bangkok, 22.01.2015*). USSC: Order as to denial [121], 14.10.2014*). S. Schindler: “§ 101 Bashing or § 101 Clarification”, published 27.10.2014*). BGH, “Dem onstrationsschrank” decision*). B. Wegner, S. Schindler: “A Mathematical KR Model for Refined Claim Interpretation & Construction II”, in prep... … Press, …… to go into [137]………… “Turmoil …..”, see program of AIPLA m eeting, DC, 23.10.2014 “Dark side of Innovation”, …… see [137] D. Kappos: About his recent west coast meetings, AIPLA, DC, 23.10.2014. CAFC, Transcript of the Hearing in Biosig case, 29.10.2014*). R. Rader: Confirming that socially inacceptable CIs as extrem ely preemptive, such as for example [119], should be patent-eligible, AIPLA m eeting, DC, 24.10.2014. A. Hirshfeld: Announcing the USPTO’s readiness to consider also hypo. CIs in its EG, AIPLA meeting, DC, 24.10.2014. S. Schindler: “Alice-Tests Enable ‘Quantifying’ Their Inventive Concepts … “, USPTO&GWU, 06.02.2015*), see also [175]*). S. Schindler: “Biosig, Refined by Alice, Vastly Increases the Robustness of Patents”, in prep.“*). S. Schindler: ”Auto. Deriv./Reprod. of LACs, Protecting Patens Against SPL Attacks”, Singapore, ISPIM , 09.12.2014*). S. Schindler: “Practical Im pacts of the Mayo/Alice/Biosig-Test”, t., Drake Uni. Law School, 27.03.2015*) CAFC Decision in Interval, 10.09. 2014*). S. Schindler: “A Tutorial into (Operating) Sys. Design and AIT Term s/Notions on Rigorous ETCIs' Analysis. “, in prep. CAFC Decision in DDR, 05.12. 2014*). USPTO: “2014 Int. Guidance on Pat. Subj. M . Eli. 
& Exam ples: Abs. Ideas”, 16.12.2014*). USSC’s Order as to denial [92], 08.12.2014*). CAFC Decision in Myriad, 17.12.2014*).

[128] [129] [130] [131] [132] [133] [134] [135] [136] [137] [138] [139] [140] [141] [142] [143] [144] [145] [146] [147] [148] [149] [150] [151] [152] [153] [154] [155] [156] [157] [158] [159]

[160]

S. Schindler: “The USSC Mayo/Myriad/Alice Decisions, The PTO’s Implementation by Its IEG, The CAFC’s DDR & Myriad Recent Decisions”*), publ. 14.01.2015*), its short version*), and its PP presentation at USPTO, 21.01.2015*).. S. Schindler: ”The IES: Phil. & Func. &, Ma. F. – A Proto.”, 7. GIPC, Mum bai, 16.01.2015*). CAFC Decision in CET, 23.12.2014*).

[163] S. Schindler: "The USSC's Mayo/Myriad/Alice Decisions: Their Overinterpret. vs. Oversimpl. of ETCIs – Scie. of SPL Prec. as to ET CIs in Action: The CAFC's Myriad & CET Decisions", USPTO, 07.01.2015*).

[164] J. Schulze, D. Schoenberg, L. Hunger, S. Schindler: "Intro. to the IES UI of the FSTP-Test", 7. GIPC, Mumbai, 16.01.2015*).
[165] "ALICE AND PATENT DOOMSDAY IN THE NEW YEAR", IPO, 06.01.2015*).
[166] S. Schindler: "Today's SPL Precedents and Its Perspectives, Driven by ET CIs", 7. GIPC, Mumbai, 15.01.2015*).
[167] R. Sachs: "A Survey of Pat. Inv. since Alice", F&W LLP, Law360, New York, 13.01.2015*).
[168] S. Schindler: "PTO's IEG Forum – Some Aftermath", publ. 10.02.2015*).
[169] Agenda of this Forum on [157], Alexandria, USPTO, 21.01.2015*).
[170] G. Quinn: "Patent Eli. For. Discuss. Ex. Appli. of Mayo/Myriad/Alice", IPWatchd, 21.01.2015*).

[171] S. Schindler: "Semiotic Impacts of the Supreme Court's Mayo/Biosig/Alice Decisions on Leg. Anal. ETCIs"*).

[172] USSC Decision in Teva, 20.01.2015*).
[173] USSC Dec. in Pullman-Standard, 27.04.1982*).
[174] USSC Decision in Markman, 23.04.1996*).

[175]

S. Schindler: “Patent’s Robustness & ‘Double Quantifying’ Their InCs as of Mayo/Alice”, WIPIP. USPTO&GWU, 06.02.2015*). R. Rader: Questions as to the FSTP-Test, WIPIP, USPTO&GWU, 06.02.2015. D. Karshtedt: “The Completeness Requ. in Pat Law”, WIPIP, USPTO&GWU, 06.02.2015*). O. Livak: “The Unresol. Am biguity of Patent Claim s”, WIPIP, USPTO&GWU, 06.02.2015*). J. M iller: “Reasonable Certain Notice”, WIPIP, USPTO&GWU, 06.02.2015*) S. Ghosh: “Dem arcating Nature After Myriad”, WIPIP, USPTO&GWU, 06.02.2015*) CAFC Decision in Cuozzo, 04.02.2015*).

[182] S. Schindler: "Basics of Innovation-Theory and Substantive Patent Law Technology", Textbook, in prep.

[183] [184] [185] [186] [187] [188] [189] [190] [191] [192] [193] [194] [195] [196] [197] [198] [199] [200] [201] [202] [203] [204] [205] [206] [207] [208] [209] [210]

S. Schindler: “The Mayo/Alice SPL Ts/Ns in FSTP-T&PTO Init.”, USPTO, 16.03.2015*). S. Schindler: “PTOs Efficiency Increase by the FSTP-Test, e.g. EPO and USPTO”, LESI, Brussels, 10.04.2015*). R. Chen: Com m enting politely on “tensions” about the BRI, PTO/IPO-EF Day, 10.03.2015. A. Hirshfeld: Rep. about the PTO’s progress of the IEG work, PTO/IPO-EF Day, 10.03.2015. P. M ichel: Moderating the SPL paradigm ref. by Mayo/Alice, PTO/IPO-EF Day, 10.03.2015. P. M ichel: Asking this panel as to diss. of Mayo/Alice, PTO/IPO-EF Day, 10.03.2015. M. Lee: Luncheon Keynote Speech, PTO/IPO-EF Day, 10.03.2015*). A. Hirshfeld: Remark on EPQI’s ref. of pat. ap. exam ination, PTO/IPO-EF Day, 10.03.2015. 16th Int. Roundt. on Sem., Hilo, 29.04.2015*). M. Schecter, D. Crouch, P. Michel: Panel Disc., Patent Quality Sum m it, USPTO, 25.03.2015. Finnegan: 3 fund. current uncert. on SPL prec, Patent Quality Sum m it, USPTO, 25.03.2015. S. Schindler, B. Wegner, J. Schulze, D. Schoenberg: “post-Mayo/Biosig/Alice – The Precise Meanings of Their New SPL Term s”, publ. 08.04.15*). R. Stoll: “Fed. Cir. Cases to Watch on Softw. Pat. – Planet Blue”, Patently-O, 06.04.2015*). See the panel at the IPBCGlobal’2015, San Francisco, 14-16.06.2015*).. S. Schindler: “Mayo/Alice – The USSC’s Requirem ent Statem ent as to Sem iotics in SPL & ETCIs, USPTO, 06.05.2015r*). S. Schindler: “Pats’ Abs. Robust. & the FSTP-Test”, LESI 2015, Brussels 18.04.2015*), DBKDA 2015 Rome 27.05.2015. B. Wegner: “The FSTP Test – Its Mathe. Assess. of an ET CI’s Practical and SPL Quality”, LESI 2015, Brussels, 18.04.2015*). and DBKDA 2015, Rom e, 27.05.2015. D. Schoenberg: “The FSTP Test: A SW Sys. for Ass. an ET CI’s Pract. and SPL Quality”, LESI 2015 Brussels 18.04.2015 and DBKDA 2015 Rom e 27.05.2015*). Panel: “Patent Prosecution Session”, AIPLA, LA, 31.04.2015. S. Schindler:; “The Notion of “InC”, Fully Scientized SPL, and “Controlled Preem ptive” ETCIs”, published by 11.06.2015*). I. Kant, http://plato.stanford.edu/entries/kant/. J. Lefstin: “The Three Faces of Prometheus: A Post-Alice Jurisprudence of Abstraction”, N.C.J.L.&TECH, July 2015*). CAFC Decision in Biosig, 27.04.2015*). USSC Petition for Cert in ULTRAM ERCIAL vs, WILDTANGENT, May 2015. . K.-J. Melullis, report about a thus caused problem with a granted patent at the X. Senate of the German BGH. S. Schindler: “Reach of SPL Prot. for ETCIs of Tied Preem ptivity”, published by 25.06.2015*). CAFC Decision in Ariosa, 12.06.2015*) S. Braswell: “All Rise for Chief Justice Robot”, Sean Braswell, 07.06.2015*)

[211] S. Schindler: "The Cons. of Ideas Mo. USSC's MBA-Semiotics and its Hi-Level", in prep.
[212] R. Merges: "Uncertainty, and the Standard of Patentability", 1992*).
[214] K. O'Malley, …..: "Pat. Lit. Case Man.: Reforming the Pat. Lit. Proc. …", FCBA, 25.06.2015.
[215] R. Chen, …..: "Claim Construct.", FCBA, 26.06.2015.
[217] S. Schindler: "The US NPS: The MBA Framework a Rough Diamond – but Rough for Ever? Teva will Cut this Diamond and thus Create a Mega-Trend in SPL, Internat.", publ. 21.07.2015*).

[213]

[216]

CAFC Decision in Teva, 18.06.2015*)

P. Naik, C. Laporte, C, Kinzig, T. Chappel, K. Gupta: “Chan. IP Norm s and their Effect on Inno. in Bio-/Pharm aceut.-/High-Tech Sectors of the Corporate World”, FCBA, 27.06.2015.

B. Russel: “Principles of M athem atics”, see Wikipedia. A.v. Wijngaarden, s.Wikipedia CAFC Decision in LBC, 23.06.2015*).. CAFC Decision in Cuozzo, 08.07.2015*).. CAFC Decision in Versata, 09.07.2015*).. CAFC Decision in Int. Ventures, 06.07.2015*).. J. Duffy, J. Dabney: PfC, 13.08.2009*).

S. Schindler: “A PS to an Appraisal to the USSC’s Teva Decision: CAFC Teaming-up with PTO for Barring Teva – and this entire ‘ET Spirit’ Framework?”, pub 27.07.2015*).

[226] [227] [228] [229] [230]

R. Stoll, B. LaM arca, S. Ono, H. Goddard, N. Hoelder: “Challenging Software-Business M ethod Pat. Eli. in Civil Actions and Post Grant Review”, CASRIP, Seattle, 24.07.2015. A. Serafini, D. Kettelberger, J. Haley, J. Krauss: „Biotech and Pharm a Patents Eligi.:“, CASRIP, Seattle, 24.07.2015. D. Kettelberger, see [227] Justice Breyer: “Archim edes Metaphor”, [69]*). I. Kant: https://en.wikipedia. com /wiki/Im manuel_Kant. & I. Kant: *Critique of Pure Reason”, https://en.wikipedia.com /wiki/I_Kant. I. Kant: “The M etaphysical Foundations of Natural Science “, Wikipedia. I. Kant: *Groundwork of the Metaphysics of M orals”, https://en.wikipedia.org/wiki/. I. Kant: *Categorical Im perative”, https:/en. wikipedia.org/wiki/Categorical_Imperative I. Kant: "What Real Progress has Metaphysics Made in Germ any since the Tim e of Leibniz and Wolff?", AbarisB., NY,'83. I. Kant: *Prolegomena to Any Future M etaphysics”, https://en.wikipedia.org/wiki/ J. Dabney: “The Return of the Inventive Concept?”, 06.12.2012*). .a USPTO: "July 2015 Update on Subj. Matter Eligibility", 30.07.2015*) .b USPTO: „M ay 2016 Update: M em orandum - Recent Subj.M atter Eligibility Decisions“, 19.05.2016*) Concepts, http://plato.stanford.edu/entries/concepts/. S. Schindler: “The Suprem e Court’s Substantive Law (SPL) Interpretation – and Kant“, publ.13.04.2016*). R. Hanna: “Kant and the Foundations of Analytic Philosophy”, OUP, 2001. S. Koerner: “The Philosophy of M athem atics”, DOVER, 2009 USSC: PfC by Cuozzo*). S. Schindler: “Draft of an Am icus Brief to the USSC in Cuozzo supporting“, publ. 05.11.2015*). Panel: “The Evolving Landscape at PTAB Proceedings”, AIPLA, DC, 22.10.2015 M. Lee: Publ. Interview at Opening Plenary Session, AIPLA, DC, 21.10.2015. S. Schindler: “The IEG’s July 2015 Update & the ‘Patent-Eligibility Granted/-ing, PEG’ Test”, publ. 18.12.2015*) M. Lee: USPTO Director's Forum , „Enhanced Patent Quality Initiative: M oving Forward”, 06.11.2015*). ISO/OSI Reference M odel of Open Systems Interconnection, see Wikipedia. S. Graham (LAW.COM): Q&A With AIPLA President Denise DeFranco, 13.11.2015*). USSC Decision in Parker vs. Flook, 22.06.1978*). CAFC Denial of En Banc Petition in Ariosa v. Sequenom , 02.12.2015*). D. Crouch (Patently-O): Federal Circuit Reluctantly Affirm s Ariosa v. Sequenom and Denies En Banc Rehearing, 03.12.2015*) S. Schindler: “Patent-Eligibility and the “Patent-Eligibility Granted/-ing , PEG” Test, resp. the CAFC Object. Counters the Suprem e Court’s M BA Framework, by its DDR vs. M yriad/ Cuozzo Decisions”, 05.01.2016*). E. Coe: “M ichelle Lee Steers USPTO Through Choppy Waters”, Law360 , 09.12.2015*).. USSC Cert Petitions in Halo v. Pulse and Stryker v. Zim m er, 22.06.2015 CAFC Oral Argument in M cRo v. Bandai, 11.12.2015 CAFC Oral Argument in Lexmark v. Im pression, 02.10.2015 CAFC Decision in Carnegie v. M arvell, 04.08.2015 S. Schindler: “A PS as to the Motio Decision ….”, 11.01.2016*). S. Schindler: “BRIPTO by the USPTO or BRIMBA by the Suprem e Court?”, 03.02.2016, *). S. Schindler: “Classical Lim itations or M BA Fram ework’s Inventive Concepts?”, 08.02.2016*), S. Schindler: “Patent-Eligibility: Vague Feelings or an M BA Fact?”, pub. 12.02.2016*)

[231] [232]

[261]

[354] [355] [356]

[360] [361] [362] [363] [364] [365] [366] [367] [368] [369] [370] [371] [372]

S. Schindler, U. Diaz, T. Hofmann, L. Hunger, C. Negrutiu, D. Schoenberg, J. Schulze, J. Wang, B. Wegner, R. Wetzler: “The User Interface Design of an Innovation Expert System (= IES) for Testing an Emerging Technology Claimed Invention (= ETCI) for its Satisfying Substantive SPL”, p.. 07.03.2016*)

[263] [264] [265] [266]

[262] M . McCormick: "Im m anuel Kant: M etaphysics", www.iep.utm .edu/kantm eta/. M. Fuller, D. Hirshfeld, M. Schecter, L. Sheridan, C Brinckerhoff (M oderator), Panel Disc., IPO, DC,15.03.2016. W. Quine, see Wikipedia. USSC PfC by Samsung v. Apple, 21.03.2016 "The Chicago M anual of Style Online", http://www.chicagomanualofstyle.org.

[267] S. Schindler: "IDL" pat. appl.: "THE IDL TOOLBOX", 2016, in prep.
[268] S. Schindler: "IES-UIE" pat. appl.: "THE IES USER INTERFACE DESIGN", 2016, in prep.
[269] S. Schindler: "FSTP II" pat. appl.: "THE FSTP-II", 2016, in prep.
[270] S. Schindler: "PEGG-Test" pat. appl.: "THE PI GRANTING/GRANTED TEST", 2016, in prep.
[271] S. Schindler: "The Supreme Court's MBA Framework Implies 'Levels Of Abstraction'", 12.05.2016*).
[272] S. Schindler: "CSIP" pat. appl.: "CONTEXT SENSITIVE ITEMS PROMPTING", 2016, in prep.
[273] S. Schindler: MEMO about "Mathematical Inventive Intelligence, MII", published on 21.06.2016*).
[275] V. Winters, K. Collins, S. Mehta, van Pelt: "After Williamson, Are Functional Claims for SW Viable?"
[276] K. Collins: "The Williamson Revolution in SW Structure", Washington University, Draft 04/01/16.
[277] CAFC Decision in Williamson v. Citrix Online, 2015*).
[278] D. Parnas: "Software Fundamentals", ADDISON-WESLEY, 2001.
[279] USSC: Transcript of its Hearing in Cuozzo on 25.04.2016*).
[280] M. Lee: Opening Statement at the Patent Quality Community Sympos., USPTO, Alexandria, 27.04.2016.
[281] USPTO: "EPQI", http://www.uspto.gov/patent/initiatives/enhanced-patent-quality-initiative-0
[282] R. Bahr, USPTO: "Formulating a Subject Matter Eligibility Rejection and Evaluating ...", 04.05.2016*).

[274]

[349] [350] [351] [352] [353]

[357] [358]

[225]

[233] [234] [235]

[347] [348]

[359]

[218] [219] [220] [221] [222] [223] [224]

[236] [237] [238] [239] [240] [241] [242] [243] [244] [245] [246] [247] [248] [249] [250] [251] [252] [253] [254] [255] [256] [257] [258] [259] [260]

[335] [336] [337] [338] [339] [340] [341] [342] [343] [344] [345] [346]

[161] [162]

[176] [177] [178] [179] [180] [181]

[317] [318] [319] [320] [321] [322] [323] [324] [325] [326] [327] [328] [329] [330] [331] [332] [333] [334]

[373] [374] [375] [376] [377] [378]  


S. Schindler: "Prototype Demonstration of the Innovation Expert System", LESI 2016, Peking, 16.05.2016. B. Wegner: "FSTP – Math. Assess. of an ETCI's Practical/SPL Quality", LESI 2016, Peking, 16.05.2016. D. Schoenberg: "Presentation of the IES Prototype", LESI 2016, Peking, 16.05.2016. W. Rautenberg: "Einführung in die Mathematische Logik" ,VIEWEG*TEUBNER, 2008 ISO/IEC 7498-1:1994; Information technology -- Open Systems Interconnection -- Basic Reference Model:; www.iso.org N. Fuchs, K. Kaljurand, T. Kuhn: “Attempto Controlled English for KR”, U. Bonn, 2008 CAFC, Decision in TLI, 17.05.2016*). CAFC, Decision in Enfish, 12.05.2016*). S. Schindler: "Enfish & TLI: The CAFC in Line with the Supreme Court's MBA Framework", .25.05.2016*) R. Bahr, USPTO: MEMORANDUM as to "Recent Subject Matter Eligibility Decisions ...", 19.05.2016*). S. Schindler: " MRF, the Master Review Form in USPTO's EPQI, SPL, and the IES ", publ. 30.05.2016 *). USPTO: "Strategic IT Plan for FY 2015-2018", USPTO's home page L. Hunger, M. Weather: “The IES GUI – a Tutorial”, prep. for publ. S. Schindler: "A Comment on the 2016 IEG Update – Suggesting More Scrutiny ", publ. on 09.06.2016 *). USPTO:" Patent Public Advisory Com., Quarterly Meeting, IT Update", 05.05.2016, USPTO's home page S. Schindler, U. Diaz, C. Negrutiu, D. Schoenberg, J. Schulze, J. Wang, B. Wegner, R. Wetzler: “The User Interface Design of IES for Testing an ETCI’s its Satisfying SPL – Including Arguing Mode ”, in prep.. S. Schindler: "On Consolidating the Preemptivity and Enablement Problems", in prep. S. Schindler: "Epilog to the Patent-Eligibility Problem (Part I)", 20.07.2016*) S. Schindler: "Epilog to the Basic Patent-Eligibility Problem (Part II)", publ. 19.09.2016*) S. Schindler: "MEMO – Abstract Ideas and Natural Phenomena as Separate Causes of nPE", in prep. CAFC, Decision in Jericho v. Axiomatics, 14.03.2016*). CAFC, Decision in Rapid Litigation Management v. Cellzdirect, 05.07.2016*). E.. Chatlynne, „The High Court's Artific. And Fictitious Patent Test Part 1“, 05.07.2016 CAFC, Decision In re Alappat, 29.07.1994*). USSC, Decision in Diamond v. Diehr, 03.03.1981*). USSC, Petition for Certiorari, OIP v. Amazon, 12.11.2015*). USSC, Petition for Certiorari, Sequenom v. Ariosa, 21.03.2016*). USSC, Petition for Certiorari, Jericho v. Axiomatics, 10.06.2016*). CAFC, Decision in Bascom v. AT&T, 27.6.2016*). R. Bahr, USPTO: MEMO as to "Recent Subject Matter Eligibility Rulings", 14.07.2016*). a. Wikipedia: "First-order logic"*). b. Wikipedia: "Prädikatenlogik"*). J. Duffy: "Counterproductive Notice in Literalistic v. Peripheral Claiming", U. of Virginia, June 2016*). J. Duffy: “Section 112 and Functional Claiming”, FCBA, Nashville, 22.06.2016. S. Schindler: "MEMO on Metaphysics vs. Rationality in SPL Precedents about ETCIs" alias on "Mathematical Cognition Theory by Far Exceeds Hitherto Knowledge Representation ", in prep. R. Stoll: " Innovation Issues in the Americas - Subject Matter Eligibility " CASRIP, Seattle, 22.07.2016*). CAFC, Decision in Philips v. Zoll. Medical, 28.07.2016 CAFC, Decision in AGIS v. LIFE360, 28.7.2016 S. Schindler: “Modeling Semantics for the ‘Innovation Description Language, IDL’ for ETCIs”, this Memo, publ. 20.03.2017,*). S. Schindler: "Epilog to the Basic Patent-Eligibility Problem (Part III)", in prep. CAFC, Decision in In re CSB-System International, 09.08.2016.*) USSC, Decision in Cuozzo, 20.06.2016*). P. Suppes: "Axiomatic Set Theory", DOVER Publ., Stanford, 1972. P. 
Suppes: Probabilistic Metaphysics, Basil Blackwell, Oxford and New York, 1984 H: Burkhardt, B. Smith: "Handbook of Metaphysics and Ontology", Philosophia Verlag, Munich, 1991. G. Quinn: “USPTO handling of PI sparks substant. discussion at PPAC meeting“, IP Watchdog, 24.08.2016 tbd LAW360: D. Kappos: Modern-Day 101 Cases Spell Trouble For ATMs Of The Future, 16.08.2016 M. Holoubek: tbd S. Schindler: "A PS to my Epilog for the PE-Problem (Part I[300] & II[301])", publ. 22.09.2016*) S. Schindler: " MEMO: The Notion of Claiming in SPL – pre and post the Aufklärung", publ. 10.10.2016*) CAFC, Decision in Intellectual Ventures v. SYMANTEC, 30.09.2016*). S. Schindler: "Two Blueprints for Refining the IEG’s Update to Solving the PE Problem or A PS to my Comment on John Duffy's Essay about "Claiming" under 35 USC ", publ. 03.12.2016*).. T. Kuhn: "The Structure of Scientific Revolutions", UCP, 1962. EU's Biotech Directive EU's CII Directive EU's Enforcement Directive EU's SBC Regulation S. Schindler: "MEMO: The Two § 101 Flaws in the CAFC's IV Decision, caused by the Phenomenon of 'Paradigm Shift Paralysis' in SPL Precedents about ETCIs", publ. 26.10.2016*). D. Kappos: "Getting Practical About Patent Quality", Law360, 21.10.2016 J.Herndo:"Just When You Thought the CAFC would Softening … the Tide Turns Again", PATENTDOCS*) D.Atkins: "Federal Judges Slam Alice at Event Honoring Judge Whyte",Law360, 20.10.2016*) CAFC, Decision in AMDOCS v. OPENET TELECOM, 01.11.2016*). R. Bahr, USPTO: MEMORANDUM as to "Recent Subject Matter Eligibility Decisions ...", 02.11.2016*). S. Schindler: "The AMDOCS Dissent Stirs up the Key Deficiency of the CAFC's pro-PE Alice Decisions, thus showing: The Time is Ripe for Ending the §101 Chaos! ", pub., 10.11.2016*). S. Schindler: " ROUNDTABLE ON PATENT SUBJECT MATTER ELIGIBILITY ", pub., 14.11.2016*). B. Wegner: Invited paper, “Innovation, knowledge representation, knowledge management and classical mathematical thinking”, Corfu, Ionian University, pub., 22.11.2016*) B. Wegner: Invited paper, “Math. Modelling of a Robust Claim Interpretation and Claim Construction for an ETCI, - Adv. Steps of a “Mathematical Theory of Innovation””, Bangkok, ICMA-MU, 17.-19.12.2016*) S. Schindler: "The IES Qualification Machine: Prototype Demonstration", GIPC, New Delhi, 11.-13.01.2017. B. Wegner: "FSTP – Math. Assess. of ETCIs’ Quality", GIPC, New Delhi, 11.-13.01.2017*). D. Schoenberg: "The IES Prototype Qualification Machine ", GIPC, New Delhi, 11.-13.01.2017*) S. Schindler: “The Lesson to be Learned from the US Patent-Eligibility Hype: It Supports the USPTOs Enhanced Patent Quality Initiative, EPQI/MRF”, published, on 11.12.2016*). S. Schindler: “An Amazing SPL Cognition: Any Patent Application is Draftable Totally Robust, Memo A”, published on 31.01.2017*). S. Schindler: “An Amazing SPL Cognition: Any Patent Application is Draftable Totally Robust, Memo B”, published by 07.03.2017. S. Schindler: “An Amazing SPL Cognition: Any Patent Application is Draftable Totally Robust, Memo C”, to be published by the end of.04.2017. M. Kiklis: “The Supreme Court on Patent Law”, Wolters Kluwer, 2015. N. Solomom: “The Disintegration of the American Patent System ─ Adverse Consequences of Court Decisions”, IPWatchdog, 26/29.01.2017, *) IPO (“Intellectual Property Owners Association”): “Proposed Amendments to Patent Eligible Subject Matter under 35 U.S.C. § 101”, 07.02.2017,*) IA (Internet Association”): “Letter to the President-elect Trump”,14.12.2016*) N. 
N.: Survey about LifeCycle-/Biotechnique, to come soon. USPTO/PTAB: Ex parte Schulhauser, 2016,*) B. Kattehrheinrich et al.: ”What Schulhauser Means For Condit. Claim Limitation”, Law360, 03.02.2017*) S. Schindler: “The PTAB’s Schulhauser decision is Untenable”, published 08.03.2017*) R. Katznelson: “Can the Supreme Court’s erosion of patent rights be reversed?”, IPWatchdog, 02.03.2017*) CAFC, Decision in TVI v. Elbit, 08.03.2017*). P. Michel, P. Stone, P. Evans, P. Detkin, D. Matteo, R. Sterne, J. Mar-Spinola, et al.“The Current Patent Landscape in the US and Abroad”, 12th APLI, USPTO, 09.-10.03.2017*). Transcript of[367], ??.03.2017,*) P. Newman, dinner speech, 12th APLI, USPTO, 09.-10.03.2017. S. Schindler: “IDL” pat. appl.: “An ‘Innovation Description Language, IDL’ & its IES Interpreter”, 2017 .1 Wikipedia: “DSL”, .2 Wikipedia: “Compiler”, .3 Wikipedia: “BNF” S. Schindler: "Innovation Description Languages, IDLs & Knowledge Representations, KRs, and Easily Drafting&Testing Patents for Their Total Robustness”, this paper, 16.05.2017*) S. Schindler: "Innovation Description Languages, IDLs & Brain Knowledge Representation, brainKR”, in prep. Justice Thomas: Friendly Comment, 04.12.2015*) J. Koh, P.Tresemer: “Client Alert of 15.05.2017”, Latham & Watkins*) AIPLA: “Legislative Proposal and Report On Patet Eligible Subject Matter”,12.05.2017*) IPO: “Proposed Amendments to Patent Eligible Subject Matter”, 07.02.2017*) see the correct reference in the V.27 at the below URL, in a few days

*) available at www.fstp-expert-system.com

M. Flanagan, R. Merges, S. Michel, A. Rai, W. Taub: "After Alice, Are SW Innovations Ever Patentable Subj. Matter?"


SESSION
INFORMATION EXTRACTION AND ENGINEERING, PREDICTION METHODS AND DATA MINING, AND NOVEL APPLICATIONS
Chair(s)
TBA


Optical Polling for Behavioural Threshold Analysis in Information Security

Dirk Snyman, Hennie Kruger
School of Computer-, Statistical- and Mathematical Sciences, North West University, Potchefstroom, South Africa
e-mail: {dirk.snyman, hennie.kruger}@nwu.ac.za

Abstract—In terms of information security research, human attitude and behaviour are commonly sought to be assessed and addressed. These aspects remain complex and difficult to understand and control in order to efficiently manage them. Many psychological and sociological models exist to describe and test human behaviour, but they are all reliant on data. This research presents a novel approach for data collection for the specific application of behavioural threshold analysis. Optical polling is found to be a viable alternative to traditional data collection methods which provides good quality data. It addresses some of the issues associated with behavioural threshold analysis and information security research in general.

Keywords—information security; optical polling; behavioural threshold analysis; data collection.

I. INTRODUCTION

Research pertaining to information security often aims to understand the factors that drive the behaviour of individuals in their everyday interactions with information technology systems [1-3]. Human behaviour is often fickle and unpredictable, which makes managing it a problematic endeavour. The business risk associated with information security behaviour (namely financial or data loss) necessitates a better understanding by management of how and why people do what they do, in order to ultimately provide for different eventualities or, ideally, to act in a pre-emptive manner so that these eventualities do not occur at all.
One of the common controls that are put in place to help prevent the above-mentioned losses is the development and deployment of information security policies [4]. These policies aim to govern the interaction between humans and systems in a manner that reduces the associated risks by limiting behaviour that might lead to risky actions. The main issue with policies, however, is that they are only effective when followed to the letter. Individuals tend to develop averse attitudes to policies due to their constricting nature [5], for instance where a less secure shortcut to completing a task exists but is contrary to the procedures specified by policy. They therefore often opt to disregard policies in part or in their entirety in exchange for the ease or speed conveyed by the risky behaviour [4]. Furthermore, disregard for policies may also be due to ignorance about them rather than resistance towards them [6]. If an individual is oblivious to policies, following them becomes impossible.
To remedy the latter situation, where ignorance is the cause of failing policies, is relatively simple. By introducing security awareness programs, employees can be educated about the

existence and contents of information security policies and taught good information security practices [1, 3]. The former situation, where attitude is to blame, is far more complicated to address. Security awareness programs may help in re-affirming the already known intricacies of said policies, but without a specific focus on addressing behaviour and correcting the underlying attitude such programs may fail. The individual might simply become security fatigued due to the possibly repetitive nature of programs that lay emphasis on the same tired topics [7]. Once the audience of such programs has become fatigued, the programs may lose their efficacy to promote healthy information security practices.
The attitude that an individual has towards a situation influences the behaviour that the individual will exhibit when confronted with that specific situation; this link has been formalised into behaviour models (see [8, 9]). Given this link between attitude and behaviour, it is evident that influencing the former will influence the latter. It should be noted that before an attitude can be influenced, the existing attitude should first be determined in order to decide on a course of action. One approach to bring information security attitude and behaviour together is through behavioural threshold analysis [10]. The application of behavioural threshold analysis as a means of measuring information security attitudes of individuals in a group setting was first presented by Snyman and Kruger [11]. One of the key issues highlighted by this study was the unique requirements for a measuring instrument that would be appropriate for behavioural threshold analysis. The current study investigates and comments on an alternative data collection method, specifically for behavioural threshold studies, to address the shortcomings of existing measuring instruments.
Section II presents a brief overview of behavioural threshold analysis in the context of information security and also points out some of the specific key issues pertaining to data collection. Existing instruments for measuring attitude are, although widely used, not without problems [2, 6-9, 11]. Based on these common problems, Crossler et al. [6] state that the methods of measuring data relating to information security should be improved. Subsequently, Pattinson et al. [2] review and compare two standard data collection methods commonly used for measuring attitude. These methods were analysed in terms of ease of implementation and the overall quality of the results obtained by using each method. The two techniques are standard self-reporting (online) questionnaires and repertory grid technique interviews. They conclude that when trying to measure attitude, an approach that combines these two methods might be advantageous. In an attempt to combine these methods


and draw from their respective strengths, this study posits evaluating the feasibility of an optical polling data collection method to obtain information on individual attitudes, with a specific focus on gathering information security data for behavioural threshold analysis.
The remainder of the paper is organised in the following manner: Section II provides the background for the study and describes the related literature on data collection methods for attitude and behaviour in information security. Section III presents the methodology for this study as well as the results obtained. Section IV provides a discussion of the suggested data collection method, and Section V concludes the paper and presents an overview of planned future work.

II. BACKGROUND

In an earlier and related study that forms part of the same larger research project, Snyman and Kruger [11] presented the first application of behavioural threshold analysis as an instrument for analysing the human factor of information security. The behavioural threshold approach was first presented by Granovetter [10] in an attempt to provide a model that could be used to predict the outcome of circumstances where people are assembled in a group setting. The proposition of the model is that each individual in a group has an innate attitude toward the actions of others in the specific group. When the individual is confronted with either partaking in or refraining from a specific activity in the group setting, the decision is said to be based on the perceived number of others that are already participating in the activity. This innate attitude towards the action, given the knowledge of the number of current participants, is presented as the behavioural threshold of the individual in the context of Granovetter's [10] model. When the number of participants in the activity exceeds the individual's threshold, the individual will participate in the activity. This can then be used to determine the number of eventual participants in the specific activity, given the dispersion of the individuals that comprise the group and their own thresholds that determine their participation (a small simulation sketch illustrating this cascade is given after the list of key issues below).
Snyman and Kruger [11] conducted an initial inquiry into employing behavioural threshold analysis for information security. The pilot study that was conducted indicated that behavioural threshold analysis is a feasible approach in terms of information security and, among others, for determining topics for security awareness programs (see [11] for more applications of behavioural threshold analysis in information security). There are, however, some issues that were raised and that must be addressed in future work if behavioural threshold analysis for information security is to properly succeed. Some of the key issues that were identified in the pilot study, with specific reference to behavioural threshold analysis research, were that:
1) Data collection for behavioural threshold analysis requires a unique method of questioning. The initial question in [11] was posed in such a manner that respondents1) had to nominate a percentage of other group members that were sharing their passwords with others before the respondent would also share their password. This led to respondents misunderstanding the question and indicating random percentage values, which led to unexpected aggregated results;
2) The choice of the topic (password security in [11]) that is used for collecting behavioural threshold data influences the manner in which respondents answer the question, due to their familiarity with the topic and its recommended practices (and also their individual outlooks). For behavioural threshold analysis it is imperative to identify suitable topics upon which questions are based. Topics on which respondents are already security fatigued may not yield useful results for threshold analysis;
3) The influence that biographic differences and circumstances of individuals might have on their reported attitudes. Behavioural threshold analysis measures attitudes in a group setting. The distribution of individuals that make up the group directly influences the attitudes of others in the group and ultimately determines the outcome of group participation;
4) The problem of social desirability and the impact that it might have on the accuracy of the results obtained from the application of this model. Social desirability is the supposed predisposition of a person, when presented with a questionnaire, to answer in a manner they deem socially acceptable [8, 9, 11]. This desire not to offend social decorum leads to answers that reflect what the person deems acceptable, instead of mirroring the actual events the person is being asked about. In the context of behavioural threshold analysis, social desirability plays a deeper role than only influencing the responses on a questionnaire. In essence, behavioural threshold analysis measures the influence of the group on the individual. If an individual is prone to social desirability, they may demonstrate an uncharacteristic willingness for group participation, with a low threshold, because the participant wants to fit in with the group's social dynamic and be accepted. However, when an outsider like a researcher questions them about their threshold for participation, they might exaggerate their threshold to seem socially acceptable from the researcher's perspective. This phenomenon might bias the behavioural threshold analysis to understate the eventual outcome that is being investigated and would need to be managed; and finally,
5) The manner in which questions and possible answers are presented to respondents in a questionnaire might have an effect on the answers they elicit by causing a bias. This includes ranking the attitudes of a participant using traditional Likert scales vs. questions that require direct answers, as discussed in 1) above. Traditional Likert scales are based on an uneven number of possible answers, e.g. a five or seven point scale [12]. This presents the respondent with the option to express neutrality by selecting a response exactly in the middle of the scale. For behavioural threshold analysis a neutral answer does not convey any proper information on the personal threshold of a respondent. Traditional scales of reporting are therefore not suited to express either participation or abstinence.
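As announced above, the following minimal Python sketch illustrates the cascade mechanism of Granovetter-style thresholds; the threshold values and group size are invented purely for illustration and are not data from [10] or [11]:

# Minimal illustrative sketch of a Granovetter-style threshold cascade.
# Each threshold is the percentage of the group that must already be participating
# before that individual also joins; 0 denotes an unconditional instigator.

def eventual_participation(thresholds_pct):
    """Return the fraction of the group that eventually participates."""
    n = len(thresholds_pct)
    participants = 0
    while True:
        share = 100.0 * participants / n              # current level of participation
        joined = sum(1 for t in thresholds_pct if t <= share)
        if joined == participants:                    # nobody new joins: cascade is stable
            return participants / n
        participants = joined

if __name__ == "__main__":
    # Thresholds 0%, 10%, ..., 90%: each new participant tips the next one, so all join.
    print(eventual_participation([10 * i for i in range(10)]))                 # 1.0
    # Raising the single 10% threshold to 40% breaks the chain after the instigator.
    print(eventual_participation([0, 40, 20, 30, 40, 50, 60, 70, 80, 90]))     # 0.1

The two runs illustrate the point made in issue 3) above: the eventual outcome depends not on the average attitude but on the exact dispersion of thresholds within the group, which is why the precise, percentage-based questioning discussed in issues 1) and 5) matters.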

Note on terminology: Respondent is used to refer to an individual that completes questionnaires, while Participant refers to an individual in an interview or group setting.


biases and also introduce social desirability as previously mentioned [13]. Behavioural threshold analysis in the context of information security is currently unknown to the conventional respondent. With this in mind, traditional questionnaires might not convey enough background or structure for a respondent to confidently reply in the expected fashion. Mechanisms would need to be employed to restructure the approach and guide a respondent through the process. Pattinson et al. [2] confirm the occurrence of social desirability and other issues in questionnaires when comparing different data collection methods for measuring attitudes for information security research. They statistically analysed the outcomes of two different approaches, namely questionnaires and the repertory grid interview technique, and found that both techniques have merit under different circumstances. Questionnaires provide a way to facilitate quick and easy dissemination of questions to many respondents with the possibility of a large amount of data being acquired, but as mentioned above, questionnaires are prone to biases. The repertory grid technique for interviews is shown to reduce socially desirable responses and other biases, but is tedious and time-consuming to implement due to the small number of participants that can be accommodated at any given interview as well as interview length. It may, however, lead to better quality results when compared to questionnaires. Pattinson et al. [2] then proceed to advise a combination of the two techniques when attempting to measure attitude. The main difference between these two approaches is self-reporting (when using questionnaires) contrasted with facilitator-led reporting (repertory grid technique interviews) [2]. Combining these methods would then require some sort of supervised self-reporting where a facilitator or interviewer would guide the reporting process (as with repertory grid technique interviews) while still maintaining, to some extent, the volumes of responses that one would expect from using a questionnaire. In an attempt to combine the two approaches mentioned above and to address the issues 1) to 5) mentioned before, this research employs an optical polling platform for attitude data acquisition. According to a collection of patents, optical polling is the acquiring of feedback from participants by means of a camera and real-time image processing [14, 15]. Each participant is given an object (usually a paper-based encoded pattern) by means of which the participant can provide one of a set of pre-defined responses. The objects are then recognised by a computer system which logs the nominated responses that each participant provides. Very little research on using optical polling for data acquisition for any general research purposes, and indeed none for information security research, currently exists. De Thomas et al. [16] and Lai et al. [17] describe the use of optical polling as a mechanism for testing student participation and knowledge and getting student feedback, while some individual unpublished lecture and seminar notes also allude to audience interaction with optical polling methods [18-20]. The main advantage of optical polling would be that many participants can be guided through answering a set of questions by a facilitator that is present in the room. This meets the requirements, as set out above, for combining the common data


collection techniques. Furthermore, it also helps to address some of the issues in data collection for behavioural threshold analysis, e.g. the real-time feedback from participants in order to clarify any uncertainties that may arise due to their understanding of questionnaire items, and may help to limit the effects of social desirability. Another advantage is that due to the interactive nature of optical polling (i.e. facilitated sessions), the questions on behavioural thresholds can be better structured to guide participants in responding in a structured manner, rather than expecting a direct answer on a relatively complex question with little or no context. Other advantages that are specific to behavioural threshold analysis are discussed in Section IV. The following section describes the methodology that was used to test optical polling as a data collection tool for behavioural thresholds in information security. III. METHODOLOGY A. Using an optical polling method The optical polling method that was selected for this research is the Plickers platform [16-20]. Plickers is free to use and can accommodate up to 63 participants. The Plickers website provides functionality to set up multiple-choice questions that are answered during the facilitated questionnaire session. The questions are shown on a projector to the participants via the Plickers website where the current question is selected from a mobile device (e.g. a smartphone or tablet) running the Plickers application. The participants then use a Plicker card to nominate their answer to the specific question. Figure 1 shows an example of a typical Plicker card.

Fig. 1. Example of a Plicker card

Note the numbers and letters around the edges of the encoded pattern on the card in Figure 1. The number indicates the card number in the series of cards and shown in Figure 1 is card 1 of 63. Responses are recorded against the specific card number, which in turn (depending on the needs of the researcher) can be linked to a specific participant. The letters indicate the desired response from the participant where the answer (A, B, C or D) is selected by rotating the card so that the corresponding letter is

Plickers available at www.plickers.com


at the top of the card. The card shown in Figure 1 indicates that the answer selected is "A". When selecting their answer to a question, the participants hold up their cards, taking care that the entire card is visible to the facilitator. The facilitator, in turn, takes the mobile device mentioned above and points the device camera towards the group of participants. The live feed from the camera is presented in a real-time, augmented reality fashion showing the number of responses as well as a cursory summary of responses to the facilitator. A hovering indicator is shown next to the card through the view finder when a card has been scanned. The website is then updated to reflect which card number has answered the question and which card's responses are still outstanding. The website can also show a summary of responses to the attending participants. The general advantages that are conveyed by using optical polling are that social desirability might be reduced as with the repertory grid technique interviews [2]. Questions would most likely not be misinterpreted [11] as any uncertainties may be discussed with the facilitator during the interactive session, which in turn should lead to better quality results. The facilitated sessions can accommodate up to 63 participants, which is an improvement over the individual interviews described by Pattinson et al. [2]. While a facilitated session does mean time spent on the interaction with participants, it should be much less than the total time that would be invested in individual interviews. The advantages that this optical polling method encompasses for behavioural threshold analysis are presented in Section IV. Disadvantages include that participants all have to be present for the facilitated session where the optical polling is conducted. This corresponds to one of the drawbacks [2] of repertory grid technique interviews and is contrary to one of the benefits of self-reporting online questionnaires where respondents can complete the survey at any geographical location and at any time that suits them. The possible number of responses for a specific question is limited to four. As stated above, the available responses are only a choice of responses A-D. This might limit the sensitivity of the responses where more options are needed, especially for Likert-scale-based questions where a common practice is to use as wide a scale as possible, but narrowing the scale under certain conditions is also commonly done and should not pose an insurmountable problem [12]. Due to the limitation on the type and number of responses it is not possible to conduct any research that requires open-ended questions when using optical polling in the manner in which it is implemented here. These limitations are not necessarily detrimental for research on behavioural thresholds and can even be beneficial. By limiting the responses to four possible options, participants are compelled to take a stance on the question that is posed to them as there is no choice for a moral middle ground that is often offered by the uneven number of options in classic Likert scales [12]. This stance would typically be for or against a specific information security behaviour. The following section describes the detail of data collection for behavioural threshold analysis by using optical polling.
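As a rough illustration only (not the actual Plickers implementation or API), a scanned card can be reduced to a card number plus a rotation, with the rotation determining which of the four options was nominated; the mapping and helper names below are hypothetical.

```python
# Illustrative sketch: map a scanned card's rotation to the nominated answer
# and log it against the (anonymous) card number.
ROTATION_TO_OPTION = {0: "A", 90: "B", 180: "C", 270: "D"}   # assumed encoding
OPTION_TO_SCORE = {"A": 1, "B": 2, "C": 3, "D": 4}           # 1 = Never ... 4 = Always

def record_response(log, card_number, rotation_degrees, question_id):
    """Store one participant's response for one sub-question."""
    option = ROTATION_TO_OPTION[rotation_degrees % 360]
    log.setdefault(question_id, {})[card_number] = OPTION_TO_SCORE[option]
    return log

responses = {}
record_response(responses, card_number=1, rotation_degrees=0, question_id="interval_71_80")
print(responses)  # {'interval_71_80': {1: 1}}
```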

B. Data collection In order to test the feasibility of optical polling as a method for collecting data for information security behaviour research, a questionnaire on information security behavioural threshold questions [11] was posed to a group of participants using Plickers in the manner described in the previous section. A facilitated questionnaire session was conducted with 35 students enrolled for honours level computer science and information systems at a South African university. The voluntary nature of participation in this research meant that only 23 students elected to participate in the session. To ensure anonymity, the Plicker cards were distributed randomly among the participants with no specific linking of participants to the cards. To be able to track answers on a per-participant basis, each card was identified on the Plickers system as only "Participant1", "Participant2" and so on, without any knowledge by the researchers on the eventual user of the card. The session was preceded by an instructional tutorial on the use of Plickers including a trial run of responses on a gender identification question. Behavioural threshold analysis requires a very specific style of questionnaire to elicit the required responses from participants [21]. Based on the questionnaire used in [11] and on the understanding gained from the pilot study, some changes were made to the approach to adapt the questionnaire to be used in an optical polling setup. Rather than opting for a participant to nominate their threshold value as a percentage as in [11], the participant rates their level of inclination in intervals of threshold values. The behavioural threshold question was based on the use of dubious websites as an example of naïve and accidental internet use [2, 5]. Figure 2 shows the questionnaire that was used in its entirety.

How inclined would you be to use torrent sites, given the percentage (%) of others in your group that use torrent sites?
Percentage of others that use torrent sites: 0–10%, 11–20%, 21–30%, 31–40%, 41–50%, 51–60%, 61–70%, 71–80%, 81–90%, 91–100%
For each interval the respondent selects one of: 1 = Never, 2 = Somewhat inclined, 3 = Strongly inclined, 4 = Always

Fig. 2. Example of complete behavioural threshold question

The questionnaire in Figure 2 had to be further adjusted to be implemented in an optical polling environment. Each threshold interval was shown to the participants in isolation in order to structure and facilitate the process. Figure 3 shows an example of one of the behavioural threshold interval sub-questions as posed to the participants via the Plickers website during the facilitated session.


If 71–80% of the honours class makes use of torrent sites, how inclined would you be to use torrent sites?
1=Never, 2=Somewhat inclined, 3=Mostly inclined, 4=Always (A:1, B:2, C:3, D:4)

Fig. 3. Example behavioural threshold question

The responses that were recorded for the behavioural threshold analysis were answered on a scale of 1-4, which corresponds with the options A-D on the four sides of a Plicker card. An answer of 1 indicates that the participant will have no inclination towards participating in using torrent sites given the indicated number of others that use them, while 4 indicates that the participant is inclined to always participate, given the number of others already participating. Ten versions of the sub-question in Figure 3 were presented, with a new 10% interval for the behavioural threshold value (see Section II and [11]) nominated for each sub-question in order to test all threshold value intervals from 0-100%. This resulted in ten behavioural threshold sub-questions in total. The series of behavioural threshold sub-questions was supplemented by a further eight questions [22] to determine the level of social desirability expressed by the participants in answering the behavioural threshold questions. By measuring the level of social desirability, certain adaptations can be made to the answers of the participants to reflect the true version of events, rather than the reported one. The facilitated questionnaire session, including the introductory instruction, trial question and social desirability questions, took around 20 minutes to complete. C. Data analysis and interpretation After completing the facilitated questionnaire session, the resulting responses were tabulated (see [11]) in order to construct a graphic representation for analysing the behavioural thresholds of the participants. The individual thresholds are determined when a participant is willing to participate, given the number of current participants. By nominating a value of either 2 or lower, or 3 or higher, the participant indicates willingness to abstain or to participate. It is therefore necessary to translate all responses to their extremes, i.e. a response of 2 is translated to a 1 at the "Never" end of the spectrum and a response of 3 is translated to a 4 at the "Always" end of the spectrum [11]. This would, however, prove problematic if an uneven number of possible responses were presented to a participant by means of a five-point Likert scale, where participants have the possibility to choose a neutral response for which classification could not be easily achieved. Given this binary classification of responses, and given the four responses that are possible using the Plicker cards, this approach seems to be well suited for the task. After the binary classification of responses was completed, the answers were analysed on a per-respondent basis to determine the behavioural threshold for each participant. The remaining eight questions that were used to measure the level of social desirability were scored and interpreted as instructed in [22]. Nine of the 23 participants exhibited a high level of social desirability. The responses for these participants were adjusted to one notch higher than the self-reported level of inclination to participate. The following sub-section shows the results for both

the unadjusted behavioural threshold analysis as well as the analysis where the thresholds were adjusted for social desirability. D. Results Snyman and Kruger [11] discuss the method of behavioural threshold analysis in terms of information security in detail and therefore this section will only mention the specifics pertaining to this research. Figure 4 shows the resulting graphical representation of the cumulative thresholds as reported by the participants using Plickers as an optical polling platform.

[Figure 4 plots the percentage of respondents that use torrent sites (y-axis, 0–100%) against the behavioural threshold (x-axis, 0–100%) for three series: the original responses, the responses adjusted for social desirability, and the equilibrium line.]

Fig. 4. Behavioural threshold analysis
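As a minimal sketch of the analysis described in Section III-C (not the authors' actual scripts), the 1–4 responses per 10% interval can be re-coded to a binary participate/abstain decision, each participant's threshold taken as the lowest interval at which participation is indicated, and the cumulative curve compared against the 45-degree equilibrium line; the one-notch social-desirability adjustment is included. The example data and variable names are hypothetical.

```python
import numpy as np

INTERVALS = np.arange(10, 101, 10)   # upper bound of each 10% threshold interval

def participant_threshold(answers):
    """Lowest interval at which the participant indicates participation (answer >= 3);
    returns None if the participant never participates."""
    for bound, answer in zip(INTERVALS, answers):
        if answer >= 3:              # 3/4 -> participate, 1/2 -> abstain
            return bound
    return None

def adjust_for_social_desirability(answers):
    """Shift each reported answer one notch towards participation (capped at 4)."""
    return [min(a + 1, 4) for a in answers]

def cumulative_curve(thresholds, n_participants):
    """Percentage of the group participating once x% of the group already participates."""
    return [100.0 * sum(1 for t in thresholds if t is not None and t <= x) / n_participants
            for x in INTERVALS]

def equilibrium(curve):
    """First interval at which the cumulative curve no longer exceeds the 45-degree line."""
    for x, y in zip(INTERVALS, curve):
        if y <= x:
            return x, y
    return 100, curve[-1]

# hypothetical example: five participants, one flagged for high social desirability
raw = {1: [3, 3, 4, 4, 4, 4, 4, 4, 4, 4],
       2: [1, 1, 2, 3, 3, 4, 4, 4, 4, 4],
       3: [1, 1, 1, 1, 2, 2, 3, 3, 4, 4],
       4: [2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
       5: [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]}
high_sd = {4}
adjusted = {p: adjust_for_social_desirability(a) if p in high_sd else a for p, a in raw.items()}

thresholds = [participant_threshold(a) for a in adjusted.values()]
curve = cumulative_curve(thresholds, len(adjusted))
print(curve, equilibrium(curve))
```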

Based on the original reported figures as designated within Figure 4, a strong initial number of participants with a low threshold for using torrent sites is shown. Over 50% of the respondents indicated they will participate in using torrent sites when only 10% of the group are already participating. The number of participants in the group is likely to continue growing until an equilibrium is reached. The equilibrium point where the graph intersects the equilibrium line is both stable against decrease and stable against further increase, with the preceding and subsequent line segments having a gradient (m) that is less than one (m=0 in both cases) [10, 21]. This indicates that given the distribution of individual thresholds, the behaviour of using torrent sites is likely to grow until about 86% of the individuals in the group take part in the behaviour. Once this point is reached the number of participants should become dormant and remain stable, but without an external intervention it will also not show any decline. Section III-C described a series of questions which were presented to the participants to test the level of social desirability exhibited when answering the behavioural threshold questions. Figure 4 also shows the resulting graph for the adjusted thresholds to correct the responses of the nine participants showing high levels of social desirability. The leftmost part of


the graph shows a higher number of participants exhibiting lower thresholds than initially indicated. This means that more participants are willing to use torrent sites when a lower number of the group are using the torrent sites. To the right of the graph where the equilibrium line is intersected, the resulting equilibrium remains the same as with the original thresholds which were not adjusted for social desirability. This shows that the outcome remains the same but the escalation of the number of partaking participants is quicker than the original threshold analysis indicated. The unchanged outcome, even after adjustment, might indicate that the influence of social desirability might not have been as high as expected. This might be due to the manner in which the data was collected, i.e. by using optical polling as the data collection method. In the penultimate section (Section IV), a discussion is given on the results presented here. IV. DISCUSSION The results obtained by using data collected via optical polling for behavioural threshold analysis seems promising. The cumulative behavioural thresholds presented in the previous section follow the expected conformation when graphically depicted. In the typical plot of cumulative behavioural thresholds, the majority of the reported thresholds reside above the uniformly distributed thresholds which forms the equilibrium line [10, 21]. A stable equilibrium can only be reached when the intersection with the equilibrium line is from above as the gradient of the adjacent line segments cannot be less than one if the intersection comes from below. The latter can indicate a situation where the influence of the group on the individual is negligible as the number of group members exhibiting a behaviour never exceeds their personal threshold [21]. Section III shows results that are somewhat in contrast to the earlier results presented in [11] where the graphic depiction shows only a small allocation of the reported thresholds above the equilibrium line with the bulk falling below it. In [11], equilibrium is reached early on and the problem of password sharing becomes stable with 10% of the group being likely to exhibit this behaviour. These results [11] are said to be an analogical example of the application of behavioural threshold analysis for information security behaviour. This was due to the deviation of the results from the expected form and was attributed to the issues highlighted in Section II. Using optical polling in this research leads to results that better resemble the expected outcomes from literature than that of [11]. The most notable difference between the two studies is that [11] made use of a self-reporting online questionnaire, while the optical polling method was used in this research. It appears that by changing the data collection method to a hybrid approach as suggested by Pattinson et al. [2], an improvement of the quality of the data that were collected is evident. It simultaneously helps address some of the issues mentioned in Section II: The misunderstanding of the question being presented to individuals. All of the participants can hear the instructions as issued by the facilitator. This ensures that all participants have a uniform understanding of the questions and the process. Participants could directly ask for clarification

during the facilitated session if anything was still unclear, which means that the question will be answered in the manner and context expected by the researcher; secondly, the issue of social desirability seems to be (at least partly) improved. When measuring the social desirability of participants, it was noted that some of them are rated high and are unlikely to supply responses that are truthful. Nonetheless, the outcome of the behavioural threshold analysis (at least in this instance) seems unaffected; and finally, optical polling provides a new way in which to structure and present questions to participants which is particularly useful in addressing the unique requirements for behavioural threshold analysis. By forcing a participant to choose only participation or abstinence with the four-point scale of the optical polling platform, the issue of neutral answers can be overcome, which leads to better interpretability of the data. It helps in organising the data collection process in such a fashion as to assist the participant in understanding and contextualising the questions and guide their responses along the lines of what the researcher would expect. It helps participants answer what they should (the question), rather than what they think they should (their interpretation of the question). The final section concludes this research paper and indicates future directions for optical polling as a data collection method for behavioural threshold analysis. V. CONCLUSION This report shows the first investigation into using optical polling as an innovative data collection method for use in behavioural threshold analysis. The goal of this research was to test the feasibility of the approach; this was done by implementing optical polling in the data collection phase of a behavioural threshold analysis experiment in the domain of information security. The motivation for this research was determined by a former study [11] in an ongoing larger project on information security behavioural threshold analysis research. Optical polling addresses some of the common data collection issues for information security research, but more importantly it addresses some of the issues that are unique to behavioural threshold analysis (see Sections II and IV). The results of behavioural threshold analysis that were obtained when using optical polling for data collection closely resemble that which is expected from literature. This indicates that optical polling solves some of the data quality complications that were experienced in earlier behavioural threshold analysis experiments. As with any other data collection approach, optical polling is not an infallible solution and cannot be seen as a silver bullet to address all the problems relating to behavioural threshold analysis. For traditional information security research, some general issues remain unanswered. For instance, where it is physically impossible to have the participants together in one venue due to geographical or scheduling issues, the optical polling approach would not be suited. In this instance a self-reporting (possibly online) questionnaire would remain the instrument of choice. Also, where more in-depth interviews with individual participants are required (e.g. for qualitative research), repertory grid interviews remain the preferred data collection mechanism. The problems which are specific to behavioural threshold analysis and which still remained unanswered


are indicated for inclusion in future work. This includes an investigation to determine the suitability of the information security topic that is used in questioning to determine behavioural thresholds. Choosing the correct topics remains critically important in ensuring the legitimacy of the behavioural thresholds that are measured; and finally, the current studies on behavioural threshold analysis in information security and related data collection will be extended to include real-world applications thereof in an industry setting in order to extensively test these concepts.

REFERENCES
[1] S. Furnell, N. Clarke, M. R. Pattinson, and G. Anderson, "How well are information risks being communicated to your computer end-users?," Information Management & Computer Security, vol. 15, pp. 362-371, 2007.
[2] M. Pattinson, K. Parsons, M. Butavicius, A. McCormac, and D. Calic, "Assessing Information Security Attitudes: A comparison of two studies," Information & Computer Security, vol. 24, pp. 228-240, 2016.
[3] J. M. Stanton, K. R. Stam, P. Mastrangelo, and J. Jolton, "Analysis of end user security behaviors," Computers & Security, vol. 24, pp. 124-133, 2005.
[4] S. R. Boss, L. J. Kirsch, I. Angermeier, R. A. Shingler, and R. W. Boss, "If someone is watching, I'll do what I'm asked: mandatoriness, control, and information security," European Journal of Information Systems, vol. 18, pp. 151-164, 2009.
[5] D. Calic, M. Pattinson, K. Parsons, M. Butavicius, and A. McCormac, "Naïve and Accidental Behaviours that Compromise Information Security: What the Experts Think," in Tenth International Symposium on Human Aspects of Information Security & Assurance, Frankfurt, Germany, 2016, pp. 12-21.
[6] R. E. Crossler, A. C. Johnston, P. B. Lowry, Q. Hu, M. Warkentin, and R. Baskerville, "Future directions for behavioral information security research," Computers & Security, vol. 32, pp. 90-101, 2013.
[7] S. Furnell and K. Thomson, "Recognising and addressing 'security fatigue'," Computer Fraud & Security, vol. 2009, pp. 7-11, 2009.
[8] R. J. Fisher, "Social desirability bias and the validity of indirect questioning," Journal of Consumer Research, vol. 20, pp. 303-315, 1993.
[9] W. D. Kearney and H. A. Kruger, "Theorising on risk homeostasis in the context of information security behaviour," Information & Computer Security, vol. 24, pp. 496-513, 2016.
[10] M. Granovetter, "Threshold models of collective behavior," American Journal of Sociology, vol. 83, pp. 1420-1443, 1978.
[11] D. P. Snyman and H. A. Kruger, "Behavioural thresholds in the context of information security," in Tenth International Symposium on Human Aspects of Information Security & Assurance, Frankfurt, Germany, 2016, pp. 22-32.
[12] I. E. Allen and C. A. Seaman, "Likert scales and data analyses," Quality Progress, vol. 40, p. 64, 2007.
[13] E. D. Frangopoulos, M. M. Eloff, and L. M. Venter, "Human Aspects of Information Assurance: A Questionnaire-based Quantitative Approach to Assessment," in Eighth International Symposium on Human Aspects of Information Security & Assurance, 2014, pp. 217-229.
[14] N. R. Amy and S. R. Amy, "Optical polling platform methods, apparatuses and media," US Patent 9,098,731 B1, issued August 4, 2015.
[15] W. F. Thies, A. C. Cross, and E. B. Cutrell, "Audience polling system," US Patent 2014/0040928 A1, issued February 6, 2014.
[16] J. R. de Thomas, V. López-Fernández, F. Llamas-Salguero, P. Martín-Lobo, and S. Pradas, "Participation and knowledge through Plickers in high school students and its relationship to creativity," in UNESCO-UNIR ICT & Education Latam Congress 2016, Bogota, Colombia, 2016, pp. 113-123.
[17] C.-H. Lai, S.-H. Huang, and Y.-M. Huang, "Evaluation of Inquiry-based Learning with IRS in the Technique Course: A Pilot Study in Taiwan Industrial High School," in 44th Annual Conference of the European Society for Engineering Education, Tampere, Finland, 2016.
[18] J. Bacon and B. Ward. (2015, accessed 2017-03-07). The Horizon Report 2015 with Audience Participation Using Paper Clickers [Presentation slides]. Available: http://scholarspace.jccc.edu/cgi/viewcontent.cgi?article=1129&context=c2c_sidlit
[19] L. J. Darcy and S. James. (2015, accessed 2017-03-07). Take IT to the Next Level: Moving from Item Teaching to Authentic Instruction [Presentation slides]. Available: http://www.doe.virginia.gov/instruction/english/professional_development/institutes/2015/elementary/darcy_and_james/presentation.pdf
[20] M. Taylor. (2016, accessed 2017-03-07). Raise your Cards - A look at Plickers in an Adult Learning Environment [Presentation slides]. Available: http://scholarspace.jccc.edu/context/c2c_sidlit/article/1205/type/native/viewcontent
[21] J. S. Growney, "I will if you will: Individual thresholds and group behavior - Applications of algebra to group behavior," Modules in Undergraduate Mathematics and Its Applications - Tools for Teaching, 1983, pp. 108-137.
[22] J. J. Ray, "The reliability of short social desirability scales," The Journal of Social Psychology, vol. 123, pp. 133-134, 1984.


Prediction of Concrete Compressive Strength Using Multivariate Feature Extraction with Neurofuzzy Systems Deok Hee Nam Engineering and Computing Science, Wilberforce University, Wilberforce, OHIO, USA

Abstract - The proposed work shows how to evaluate the target values applying the efficient data handling methods using various multivariate analyses with reduced dimensionalities. In order to explore the proposed methods, neurofuzzy systems developed by the original data and the reduced data are adapted to estimate the high performance concrete through the concrete compressive strength data. In addition, two different paradigms of the reduced dimensionalities are compared to show the better performance between three extracted features and four extracted features. Finally, various statistical categories are applied to determine the best performance among the applied techniques to evaluate the results through the neurofuzzy systems with the original and reduced data of concrete compressive strength. Keywords – concrete compressive strength, data mining, dimension reduction, feature extraction, multivariate analysis, neurofuzzy system

1

Introduction

In recent years, researches on the ability of concrete have received increasing attention. Concrete is widely used for construction material since it is cost-effective and able to carry relatively high compressive loads. Nevertheless, for the structure of concrete, it is often vulnerable to microcracks, which can cause the durability problems of civil infrastructure of the construction in premature degradation. Some factors can cause the issues of cracking formations in a concrete matrix, including mechanical load, restrained shrinkage or thermal deformation, differential settlement, poor construction methods and faulty workmanship. Moreover, conventional concrete repairing and rehabilitation techniques are time consuming and often not effective. Hence, the accurate measurements of the concrete slump components can improve the problematic issues for the better reliabilities of the concrete for the construction. As a result, the essence of high-performance concrete (HPC) is emphasized on such characteristics as high strength, high workability with good consistency,

dimensional stability and durability [1]. In addition to the three basic ingredients in conventional concrete, i.e., Portland cement, fine and coarse aggregates, and water, the making of HPC needs to incorporate supplementary cementitious materials, such as fly ash and blast furnace slag, and chemical admixture, such as superplasticizer [2]. The use of fly ash and blast furnace slag, as well as other replacement materials, plays an important role in contributing to a better workability [1]. In other words, the number of properties to be adjusted has also increased results in modeling workability behavior for the concrete containing these materials is inherently more difficult than for the concrete without them. There are several studies regarding the modeling of strength of HPC, however, it is more difficult to estimate the slump and slump flow of concrete with these complex materials described above. The traditional approach used in modeling the effects of these performances of concrete starts with an assumed form of analytical equation and is followed by a regression analysis using experimental data to determine unknown coefficients in the equation [3]. However, the prediction ability of regression analyses may be limited for highly non-linear problems [4]. On the other hand, the research about the efficient data management is getting more focused and important in these days. Simultaneously, data mining techniques to deal with the reduced data without any significant meaning instead of using the original data are developed more and more. Among data mining techniques, the most frequently used techniques are applying the various multivariate analysis techniques with the extracted features by reducing the dimensionalities. In this paper, factor analysis and principal component analysis, and subtractive clustering analysis are used with maximum likelihood estimation and varimax rotation.

2

Literature review

In general, concrete consists of a mixture of paste and aggregates, or rocks. In order to make a good concrete mix, aggregates with clean, hard, strong particles free of absorbed chemicals or


coatings of clay and other fine materials that could cause the deterioration of concrete, are required. Fundamentally, the concrete consists of a mixture of aggregates and paste. First, the aggregates are comprised of inert granular materials like sand, gravel, or crushed stone mixed with water and cement. Aggregates also strongly influence concrete's mixed and hardened properties, mixture proportions, and economy. Secondly, the paste, which is a mixture of cement and water, hardens and increases strength to form concrete through the hydration between the cement and water. The quality of the paste determines the character of the concrete as well such as increasing the strength of concrete by aging the hardening of concrete [5]. In addition, the compressive strength of concrete can be affected by the mixture ratio and curing conditions and methods of mixtures along with transporting, placing and testing the concrete. For the modernized construction using concrete, the prediction of concrete strength is the key point for the engineering judgement since it is important to know the conditions of the construction about removing concrete form, reshoring to slab, project scheduling and quality control, and the application of post tensioning for the structural engineers. Predicting the concrete strength [6, 7] has been researched for many decades based upon the concept of maturity of concrete [8, 9]. The maturity of concrete can be initially defined as “the rate of hardening at any moment is directly proportional to the amount by which the curing temperature exceeds the [datum] temperature” by McIntosh in 1949 [17]. Based on McIntosh’s concept, the same maturity of the same mix for concrete can approximately bring the same strength. In Best Practice Guide [18], the comparison of early age strength assessment of concrete has been presented by four test methods including Lok-tests, Capo tests, maturity measurement , Cube testing with air-cured cubes and temperature matched cubes, and Limpet pull-off test, along with the advantages and disadvantages.
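McIntosh's rate-of-hardening idea quoted above is usually operationalised as the Nurse-Saul maturity index, the time integral of temperature above a datum temperature; the short sketch below assumes that formulation, a datum of -10 degrees C, and hypothetical temperature readings, and is not taken from [17] or [18].

```python
def nurse_saul_maturity(temps_c, interval_hours, datum_c=-10.0):
    """Maturity index in degC-hours: sum of (T - datum) * dt for temperatures above the datum."""
    return sum(max(t - datum_c, 0.0) * interval_hours for t in temps_c)

# hypothetical hourly curing temperatures for the first day
temps = [22.0] * 6 + [25.0] * 12 + [20.0] * 6
print(nurse_saul_maturity(temps, interval_hours=1.0))   # degC-hours after 24 h
```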

2.1

Factor Analysis (FA)

Factor analysis [15] is a method for explaining the structure of data by explaining the correlations between variables. Factor analysis summarizes data into a few dimensions by condensing a large number of variables into a smaller set of latent variables or factors without losing any significance of the given data. Since factor analysis is a statistical procedure to identify interrelationships that exist among a large number of variables, factor analysis identifies how suites of variables are related. Factor analysis can be used for exploratory or confirmatory purposes. As an exploratory procedure, factor analysis is used to


search for a possible underlying structure in the variables. In confirmatory research, the researcher evaluates how similar the actual structure of the data, as indicated by factor analysis, is to the expected structure. The major difference between exploratory and confirmatory factor analysis is that researcher has formulated hypotheses about the underlying structure of the variables when using factor analysis for confirmatory purposes. As an exploratory tool, factor analysis doesn't have many statistical assumptions. The only real assumption is presence of relatedness between the variables as represented by the correlation coefficient. If there are no correlations, then there is no underlying structure. There are five basic factor analysis steps such as data collection and generation of the correlation matrix, partition of variance into common and unique components, extraction of initial factor solution, rotation and interpretation, and construction of scales or factor scores to use in further analyses. In addition, FA applies some rotational transformation based upon how each variable lies somewhere in the plane formed by the factors. The factor loadings, which represent the correlation between the factor and the variable, can also be thought of as the variable's coordinates on this plane. In un-rotated factor solution the Factor "axes" may not line up very well with the pattern of variables and the loadings may show no clear pattern. Factor axes can be rotated to more closely correspond to the variables and therefore become more meaningful. Relative relationships between variables are preserved. The rotation can be either orthogonal or oblique.
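As an illustration of the exploratory steps listed above (correlation structure, extraction, rotation, factor scores), and not the author's actual procedure, scikit-learn's FactorAnalysis can be applied to standardised inputs; the rotation argument requires a reasonably recent scikit-learn release, and the synthetic data stand in for the seven concrete mix variables.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(103, 7))            # stand-in for the 7 concrete mix variables

Z = StandardScaler().fit_transform(X)     # work with the correlation structure
fa = FactorAnalysis(n_components=3, rotation="varimax")  # rotation needs scikit-learn >= 0.24
scores = fa.fit_transform(Z)              # factor scores used as reduced inputs
loadings = fa.components_.T               # variable-by-factor loadings for interpretation
print(scores.shape, loadings.shape)       # (103, 3) (7, 3)
```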

2.2 Principal Component Analysis (PCA) Principal components analysis [16] is a procedure for identifying a smaller number of uncorrelated variables, called "principal components", from a large set of data. The goal of principal components analysis is to explain the maximum amount of variance with the fewest number of principal components without losing any significance of the given data. Principal Component Analysis (PCA) is a dimension-reduction tool that can be used to reduce a large set of variables to a small set that still contains most of the information in the large set. Hence, principal component analysis (PCA) is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. The following characteristics explain the PCA. First, the first principal component accounts for as much of the variability in the data as possible, and each successive component accounts for as much of the remaining variability as possible. Second,


PCA reduces attribute space from a larger number of variables to a smaller number of components and as such is a "non-dependent" procedure if it does not assume a dependent variable is specified. Third, PCA is a dimensionality reduction or data compression method. Since the goal is dimension reduction and there is no guarantee that the dimensions are interpretable, in order to select a subset of variables from a larger set based on the original variables, the highest correlations with the principal components need to be considered.
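A minimal sketch of the PCA-based reduction discussed here (not the paper's exact code): standardise the seven inputs, inspect the eigenvalues to choose three or four components, and keep the corresponding component scores. The data shown are synthetic placeholders, and the eigenvalue-greater-than-one rule is one common, assumed selection criterion.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(103, 7))              # placeholder for cement, slag, fly ash, water, SP, aggregates

Z = StandardScaler().fit_transform(X)       # PCA on the correlation matrix
pca = PCA(n_components=7).fit(Z)
eigenvalues = pca.explained_variance_       # plotted against component number (cf. Fig. 5)
n_keep = max(1, int(np.sum(eigenvalues > 1.0)))   # e.g. Kaiser criterion; 3 or 4 in the paper
reduced = PCA(n_components=n_keep).fit_transform(Z)
print(eigenvalues.round(2), reduced.shape)
```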

2.3 Subtractive Clustering Analysis [14] Yager and Filev [13] in 1994 proposed a clustering method called mountain clustering for estimating the number and initial location of cluster centers, depending on the grid resolution, by gridding the data space and computing a potential value for each grid point based on its distances to the actual data points. Chiu [14] suggested an extension of mountain clustering referred to as subtractive clustering, in which each data point is considered as a potential cluster centroid rather than the grid point. With the subtractive clustering method, the applied data points, instead of the grid points, are evaluated independently of the dimensionality of the problem. The following steps summarize the subtractive clustering method.

Step 1. Decide the measure (potential), Mi, of each data point, xi, using

Mi = Σj exp(−α ‖xi − xj‖²), with α = 4/ra²   (1)

where ra is a positive constant.

Step 2. Find the first center point using the highest value from Step 1.

Step 3. Recalculate the measure, Mi, for all data points using

Mi ⇐ Mi − M1* exp(−β ‖xi − x1*‖²), with β = 4/rb²   (2)

where rb is a positive constant, x1* is the center found in Step 2 and M1* is its measure.

Step 4. Find the next center point as the data point with the highest revised measure, unless Mk* < εM1*, where ε is a small fraction; the value of ε may generate many cluster centers if it is too small.

Step 5. Repeat this procedure until the kth cluster center, which satisfies the stopping condition Mk* < εM1* with the above criteria for ε, is reached.

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model given observations, by finding the parameter values that maximize the likelihood of making the observations given the parameters. For example, one may be interested in calculating a characteristic measurement which is unable to measure its characteristic measurement of every single object in a targeted group due to other constraints like cost or time based upon normal distribution with unknown mean and variance. At that time, the mean and variance can be estimated with MLE while only knowing the information of some sample of the overall population. In other words, the mean and variance as parameters are calculated and particular parametric values that make the observed results the most probable given the model can be identified by MLE.

2.5

Varimax Rotation [12]

The varimax rotation procedure was first proposed by Kaiser in 1958. The procedure is to find an orthonormal rotation matrix T by multiplying the given number of points and the number of dimensions configuration A. Then, the sum of variances of the columns of B × B, is a maximum, where B = AT. A direct solution for the optimal T is not available, except for the case when the number of dimensions equals two. Kaiser suggested an iterative algorithm based on planar rotations, i.e., alternate rotations of all pairs of columns of A.

3

Concrete slump test data [10]

Concrete is a highly complex material. The slump flow of concrete is not only determined by the water content, but that is also influenced by other concrete ingredients. The data set includes 103 data points. There are 7 input variables, and 3 output variables in the data set. The initial data set included 78 data. After several years, 25 new data points were added to form 103 data. Seven input variables are cement, slag, fly ash, water, SP, coarse aggregate and fine aggregate. The output variable is 28-day Compressive Strength in the unit of mega pascal.

4 Applied Neurofuzzy Systems To evaluate the acquired results, the neurofuzzy systems are used. Fig. 1 shows a neurofuzzy system

ISBN: 1-60132-463-4, CSREA Press ©

Int'l Conf. Information and Knowledge Engineering | IKE'17 |

49

using seven inputs and one output with the original data to evaluate the concrete compressive strength.

Fig. 3 Rulebase System for Reduced Components

Fig. 1 Neurofuzzy Inference System of Compressive Strength for High Performance Concrete

Fig. 4 ANFIS Model Structure of Reduced Components

5

Fig. 2 Neurofuzzy Inference System with membership functions of Input Variables Fig. 2 shows the membership functions for each input variables and an output variable. Fig. 3 presents the rulebase system for the neurofuzzy system using the reduced data set. Fig. 4 shows ANFIS model structure of four reduced components. Fig. 3 shows the developed rulebase system for applied neurofuzzy systems with reduced four inputs and one output for the prediction of the strength of concrete. Fig. 4 shows the ANFIS structure of reduce four components and one output as the prediction of the strength of concrete.

Analyses and results

The proposed work has been analyzed by comparing the results applying factor analysis, principal component analysis, and subtractive clustering analysis with maximum likelihood estimation and varimax rotation using the neurofuzzy systems. In Fig. 5, to determine the reduced dimensionality through a proposed procedure, the number of the reduced components or factors is determined by the accumulation of the covariance and the significant eigenvalues for the system when the eigenvalues are plotted versus each factor or component extracted by the applied multivariate analyses. As an example of using the principal component analysis among the other compared techniques, from Fig. 5, the first three or four newly extracted factors are relatively significant to implement the deployed data.



[Figure 5 plots the eigenvalue (y-axis, 0.0–2.5) of each extracted feature against the component number (x-axis, 1–7).]

Fig. 5 Graph of Eigenvalues vs. Extracted Features Hence, in the evaluation, three or four extracted factors by principal component analysis are used for the predicted estimation of concrete compressive strength. As a result, three and four reduced attributes are determined and used for applying the proposed techniques. Similarly, applying factor analysis, three and four newly extracted components are determined and used for evaluating the prediction of concrete compressive strength. TABLE 1 Four Reduced Dimension Analysis org fa favar pcacr pcacv org Sub faSub pca Sub faml

         corr    trms    stdev   mad     ewi     err
org      0.22    878.2   84.99   853.1   1817    33.65
fa       0.45    6.09    11.17   5.91    23.7    5.82
favar    0.38    6.08    10.21   5.91    22.8    5.91
pcacr    0.41    6.10    11.47   5.92    24.1    5.82
pcacv    0.41    6.31    10.39   6.13    23.4    6.15
orgSub   0.13    875.5   85.82   850.6   1812    33.54
faSub    0.66    3.05    4.86    2.96    11.2    2.63
pcaSub   0.60    3.30    5.40    3.20    12.3    2.80
faml     0.38    6.67    11.45   6.48    25.2    6.40

Note: The employed statistical categories are Correlation (corr), Total root mean square (trms), Standard deviation (stdev), Mean average distance (mad) and Equally weighted index (ewi). The deployed neurofuzzy systems are developed by the original data (org), the original data with applying subtractive cluster analysis (orgSub), applying factor analysis using covariance (favar), applying factor analysis using maximum likelihood (faml), applying factor analysis with subtractive clustering analysis (faSub), applying principal component analysis using correlation (pcacr), applying principal component analysis using covariance (pcacv), and applying principal component analysis with subtractive clustering analysis (pcaSub).
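The note above names the comparison categories but not their formulas; a plausible reading of the main ones (an assumption, not the paper's definitions, and the equally weighted index is deliberately left out) is sketched below for a vector of predictions against measured strengths.

```python
import numpy as np

def comparison_metrics(y_true, y_pred):
    """Assumed definitions: correlation, root-mean-square error, standard deviation of the
    error, and mean absolute deviation of the error."""
    err = y_pred - y_true
    return {
        "corr": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "rms": float(np.sqrt(np.mean(err ** 2))),
        "stdev": float(np.std(err)),
        "mad": float(np.mean(np.abs(err))),
    }

y_true = np.array([34.0, 41.5, 29.8, 37.2])    # hypothetical measured strengths (MPa)
y_pred = np.array([32.5, 43.0, 31.0, 36.0])    # hypothetical model predictions (MPa)
print(comparison_metrics(y_true, y_pred))
```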

From TABLE 1, the best performance using four reduced dimensions of the data set is the prediction of concrete compressive strength applying factor analysis with the post-subtractive clustering analysis. Using the examined methodologies of dimensionality reduction and various multivariate analysis methods, TABLE 2 shows the comparison of the methodologies using the original and reduced data of the concrete compressive strength using the three reduced dimension analysis.

TABLE 2 Three Reduced Dimension Analysis

         corr    trms    stdev   mad     ewi     err
org      0.22    878.2   84.99   853.1   1817    33.65
fa       0.55    4.20    6.69    4.084   15.4    3.657
favar    0.45    8.78    13.97   8.532   31.8    7.850
pcacr    0.47    4.57    8.25    4.443   17.8    4.000
pcacv    0.64    4.56    3.63    4.43    13.0    4.145
orgSub   0.13    875.6   85.82   850.6   1812    33.54
faSub    0.72    3.03    4.83    2.939   11.1    2.649
pcaSub   0.78    2.78    3.91    2.702   9.62    2.442
faml     0.86    2.96    2.60    2.876   8.57    2.822

For the three reduced dimension analysis, the prediction using principal component analysis with post-subtractive clustering shows the best performance among the compared techniques.

6

Conclusion

In conclusion, two analyses applying different numbers of reduced factors/components are compared by estimation through the neurofuzzy systems implemented with the reduced data sets as well as the original data set to predict the concrete compressive strength. Overall, from both reduction analyses, the technique using principal component analysis with the post-subtractive clustering analysis shows the best performance. However, due to the data dependency, unique results from both cases are not obtained, even though the reduced cases show better results than the original data set cases. Therefore, for future study, more varied data need to be explored to find a better prediction of the concrete compressive strength with the various dimensionality reduction methods.

ACKNOWLEDGEMENT The Concrete Slump Test Data Set [1, 10] from the UCI Machine Learning Repository is used; it is available at https://archive.ics.uci.edu/ml/datasets/Concrete+Slump+Test

7

References

[1] I. Yeh, “Modeling slump flow of concrete using second-order regressions and artificial neural networks,” Cement & Concrete Composites, Vol. 29, 2007, pp. 474-480.


[2] I. Yeh, “Exploring concrete slump model using artificial neural networks,” Journal of Computing in Civil Eng. Vol. 20. Issue 3, 2006, pp. 217-221. [3] M. Mansour, M. Dicleli, Y. Lee, and J. Zhang, “Predicting the shear strength of reinforced concrete beams using artificial neural networks,” Engineering Structures, vol. 26, 2004, pp. 781–799. [4] L. Baykasog, T. Dereli and S. Tanıs, “Prediction of cement strength using soft computing techniques,” Cement and Concrete Research, Vol. 34, Issue 11, 2004, pp. 2083-2090. [5] S. Kosmatka, B. Kerkhoff, W. Panarese, editors. Design and Control of Concrete Mixtures. 14th ed. Portland Cement Association, 2002. [6] L. Snell, J. Van Roekel, and N. Wallace, “Predicting early concrete strength,” Concrete International, Vol. 11(12), 1989, pp. 43–47. [7] S. Popovics, “History of a mathematical model for strength development of Portland cement concrete,” ACI Materials Journal Vol. 95, Issue 5, 1998, pp. 593–600. [8] G. Chengju, “Maturity of concrete: Method for predicting early stage strength,” ACI Materials Journal Vol. 86(4) 1989, pp. 341–353. [9] F. Oluokun, E. Burdette, and J. Deatherage, “Early-age concrete strength prediction by maturity — Another look,” ACI Materials Journal, Vol. 87(6), 1990, pp. 565–572. [10] UCI Machine Learning Repository, Concrete Slump Test Data Set https://archive.ics.uci.edu/ml/datasets/Concrete+ Slump+Test [11] I. Myung, “Tutorial on maximum likelihood estimation,” Journal of Mathematical Psychology, Vol. 47, 2003, pp. 90–100. [12] Henry F. Kaiser, "The varimax criterion for analytic rotation in factor analysis,” Psychometrika, Vol. 23, Issue 3, September 1958, pp. 187 – 200. [13] R. Yager and D. Filev, “Approximate clustering via the mountain method,” IEEE Trans. Systems, Man, Cybernet. Vol. 24, Issue 8, 1994, pp. 1279–1284. [14] S. Chiu, “Extracting fuzzy rules for pattern classification by cluster estimation,” Journal of Intelligent and Fuzzy System, vol.2, pp. 267-278. [15] A. Wright, “The Current State and Future of Factor Analysis in Personality Disorder Research,” Personality Disorders: Theory, Research, and Treatment, Vol. 8, No. 1, 2017, pp. 14 –25. [16] H. Abdi, L. Williams2 and D. Valentin, “Multiple factor analysis: principal component


analysis for multitable and multiblock data sets,” WIREs Comput Stat Vol. 5, 2013, pp. 149–179. [17] S. Wade, A. Schindler, R. Barnes, and J. Nixon, “Evaluation of the maturity method to estimate concrete strength,” Research Report No. 1, Alabama Department of Transportation, MAY 2006. [18] Best Practice Guide, “Early age strength assessment of concrete on site,” based upon “Early age acceptance of concrete (Improved quality management),” by J. H. Bungey, A. E. Long, M. N. Soutsos and G. D. Henderson. BRE Report 387, published by CRC Ltd.


Business and technical characteristics of the BidProcess Information System (BPIS) Sahbi Zahaf

Faiez Gargouri

Higher Institute of Computer and Multimedia MIRACL Laboratory, Sfax University, Tunisia [email protected] Abstract— Bid process is a key business process which influences the company’s survival. The Bid Process Information System (BPIS) that supports this process must be: integrated, flexible and interoperable. Nevertheless, the urbanization approach has to deal with “three fit” problems. Four dimensions have been identified to deal with such failures: operational, organizational, decision-making, and cooperative dimensions. We are particularly interested at reduce the gap between business and technical infrastructures. We propose solutions to deal with “vertical fit” problems and to define the characteristics of the operational dimension. In this context, we show that the ERP (Enterprise Resource Planning) implements the operational dimension of the BPIS.

Higher Institute of Computer and Multimedia MIRACL Laboratory, Sfax University, Tunisia [email protected] resist to changes in the market and to cope with the agility of business); and (iii) interoperable (able to exploit communications between companies in order to contribute to the construction of the techno-economic proposal that materializes the bid proposition). Nevertheless, the urbanization approach [5], on which we rely to implement this BPIS, has to deal with “three fit” problems: “vertical fit”, “horizontal fit” and “transversal fit” (Figure 1). Such problems handicap the exploitation of these three criteria [21].

Keywords- Bid Process, Information System, ERP, Ontology Design Pattern, Organizational Memory.

I.

INTRODUCTION

The bid process is a company business key as it does affect its future by implementing competence and competitiveness capable of beating competitors. Indeed, this process corresponds to the conceptual phase of the lifecycle of a project or a product. A bid process interacts upstream and involves other processes being design process. It aims to examine suitability of the bid before negotiating any contract with any owner following a pre-study carried out before a project launch. Many companies work together in interorganizational bid processes. To enable ad-hoc bid interaction it is necessary to align business processes of the business partners, especially in communication processes in the context of product conception activity. These business processes can be partly standardized, but need to be slightly adapted for several similar use cases by the involved companies. This fosters adaptability and reuse for the business partners. It is the company’s agility and competence that allows acquiring the owner’s confidence and interest, and as a consequence wins the offer. It is to remember that doing business means taking risks, something that can influence the company’s survival. So, the strategic management of business is a current concern to an innovative company. The latter promotes to restructure its Information System (IS) around its business processes. The I.S that allows to exploit the bid process (Bid Process Information System or BPIS) must be : (i) integrated (able to restore and exploit the patrimony of knowledge and expertise acquired during past experiments bids); (ii) flexible (able to

Figure 1. Urban I.S reference model [5]: “three fit” problems [21].

The “vertical fit” represents the problems of integrity, extensibility, and transposition from a business infrastructure that is abstract, to a technical infrastructure that represents implementations. The “horizontal fit” translates not only the applications’ problems of identification (induced by the “vertical fit” problems) that cover the entire infrastructure of the company’s business, but also the intra-applicative communications problems (internal interoperability) to ensure the interactions between applications of the same technical infrastructure in the company (the same homogeneous system). The “transversal fit” translates the inter-applicative communications problems (external interoperability carried out dynamically through a network). Four dimensions have been identified to deal with such failures (“three fit” problems): operational, organizational, decision-making, and cooperative dimensions [21]. Afterwards we are particularly interested at reduce the gap between business and technical-infrastructures of the BPIS.


Thus, we propose solutions to the “vertical fit” problems and define the characteristics of the operational dimension. This work is organized as follows. The second section describes alternatives for resolving the “vertical fit” problems. The third section presents our modelling of the business infrastructure of the BPIS. The fourth section describes the features of the bid-memory, which represents the organizational dimension of the BPIS. The fifth section presents our implementation of the operational dimension of the BPIS. We end with a conclusion and the prospects of our future work.

II.

RESOLUTION OF THE “VERTICAL FIT” PROBLEMS

Four dimensions have been identified to deal with the “three fit” problems [21]: (i) the operational dimension, which serves to specify the bid exploitation process when undertaking a specific project; (ii) the organizational dimension, which organizes the set of skills and knowledge that the company acquired during the previous bids in which it participated, the objective being the possible reuse of this patrimony in future bid projects; (iii) the decision-making dimension, which aims at optimizing and making the right decisions concerning market offers during the company's eventual participation in bid processes; and (iv) the cooperative dimension, which aims at ensuring intra-enterprise communication (internal interoperability) and at planning inter-enterprise communication on demand in order to realize a common goal (dynamic interoperability), i.e. while creating the techno-economic bid proposition. In the following, we are particularly interested in solving the “vertical fit” problems. This requirement is necessary to implement the operational dimension.

TABLE I. BPIS AT THE CORE OF THE COUPLING CAPACITY OF THE EIGHT APPROACHES [21].




 

Table 1 shows that the PRIMA, Lean Manufacturing, BPM, and Engineering Systems approaches participate in overcoming the “vertical fit” problems:
• PRIMA, or Project Risk Management [2], allows specifying the activities involved in a bid process. However, PRIMA does not propose an explicit model of the bid process (i.e. for the business infrastructure), a requirement that appears necessary in a multi-actor and multi-criterion context.
• Lean Manufacturing [10] helps to select the potential activities involved in a bid process. In fact, Lean allows designing a product perfectly adapted to the needs of its owner. Integrating this approach adds value to the techno-economic proposition which materializes a bid process.
• Engineering Systems [1] allows aligning the business processes of the business partners involved in a bid process.
• BPM, or Business Process Management [18], allows describing the life cycle of a bid process. This approach facilitates the alignment of the integrated I.S regardless of technological constraints.

In this paper we propose to integrate these different approaches to reduce the gap between the business and technical infrastructures of the BPIS. In this perspective, we propose to model the value chain of a bid process. Thus, we define:
• A model of the business infrastructure (understandable by analysts) built with BPMN 2.0 (Business Process Model and Notation) [14]. We present our mapping of the activities required to serve a bid process; in the following, we are particularly interested in the elaboration of the techno-economic bid. Our requirement is to establish a pivot, standard business bid language.
• A model of the technical infrastructure (understandable by experts) built with UML 2.0 (Unified Modeling Language) [13]. Thus, we propose a class diagram to build the techno-economic bid proposition for the product design activity. This model is intended for the implementation of the operational dimension of the BPIS. We also show that an ERP (Enterprise Resource Planning) system [8] ensures the exploitation of our model and covers the operational dimension of the BPIS.

Table 2 presents a comparative study of business process definition languages: UML 2.0, YAWL [19] and BPMN 2.0. This study covers the criteria of integrity, flexibility and interoperability [23]. This evaluation supports our choice of modelling languages for both the business and technical infrastructures.

TABLE II. EVALUATION OF MODELLING LANGUAGES ACCORDING TO INTEGRITY, FLEXIBILITY AND INTEROPERABILITY [23].


III.

BUSINESS INFRASTRUCTURE MODELLING OF THE BPIS

This section deals with bid process modelling using BPMN 2.0, relying on the Bizagi tool, according to the PRIMA approach, which distinguishes three main phases: assessment of the bid eligibility, elaboration of the bid proposition, and closure of the bid process. Bid process exploitation is thus achieved through the interaction of these phases (or sub-processes). These sub-processes are themselves composed of a set of sub-processes and/or tasks.

Figure 2. Bid process phases modelled with BPMN 2.0.

Figure 2 materializes our modelling. Once the company receives a bid, it first checks the eligibility of the offer. This is an important step in which the bidder assesses the achievability of the bid with respect to: the capacity of its company (equipment, personnel, etc.); the constraints imposed by the owner (performance, time, financial factors, risk factors, etc.); and the costs, which are evaluated in relation to the offer and the price the owner is willing to pay. A decision is taken at the end of this phase: either to continue and proceed to the development of the proposition, or to abandon the offer. In the latter case, it is necessary that the bidder capitalizes and validates the knowledge and skills built during this inspection phase. If the bidder proceeds to the development of the proposition, the bid has been found eligible. A decision is taken at the end of this phase: to readjust the proposal, to finalize it and transmit it to the owner, or to abandon the process. In the two last cases, the bidder proceeds to the closure of the bid process, for reasons related to feedback and to the capitalization of validated knowledge. Indeed, this phase is devoted to monitoring the implementation of the proposal chosen by the owner. Even when this choice concerns the work of another competitor, involvement in this phase is beneficial to the bidder for its future bid participations.

In the following, we focus on the development of the bid proposition. This phase corresponds to planning the product design activity, which will be assessed by indicators (technical, business, time, risk, etc.). It is an innovation activity whose main objective is to anticipate the design of the product: optimizing construction costs, respecting manufacturing deadlines and minimizing the risks of its future exploitation, while providing the highest quality of service. The bidder must mobilize its own human resources to form the bid-staff, designating the following profiles: bid-manager, cost-manager, risk-manager, expert and knowledge-manager. Various approaches have been formalized for designing products. In our strategy we have opted to design this activity by decomposition of problems. This is a top-down approach, based on customer requirements, to derive the product functions. We justify this choice by the fact that Lean Manufacturing follows the same design approach. Figure 3 materializes our cartography for this phase. First, the bid-manager must define the value from the owner's requirements; then he must model the value chain. This model should integrate the flow of value-generating activities according to the Lean Manufacturing strategy, and describe the environment in which the design activity unfolds. This work is then transmitted to the other members of the bid-staff. The knowledge-manager proceeds to gather the knowledge needed for this bid (cf. Figure 3). To do so, he must rely on the repositories of the bidder (which reflect the expertise and skills acquired by the company) to create a bid-memory for each bid project. Note that we treat the characteristics of the bid-memory in the next section. The bid-memory will be fuelled and driven by the knowledge produced in the following steps:
• Locate the bid-knowledge: the knowledge-manager must identify and collect the knowledge and expertise needed for the product design activity. He must rely on: (i) the bid of the current project; (ii) bids of similar projects carried out by the bidder, and their respective bid-propositions; (iii) bids of similar projects carried out by other bidders, and their respective bid-propositions; and (iv) bids of other projects undertaken by this bidder and their respective bid-propositions.
• Filter the bid-knowledge: the knowledge-manager must select only the knowledge that is crucial for this project (value-creating knowledge according to the Lean Manufacturing strategy).
• Preserve the bid-knowledge: the knowledge-manager must model this crucial knowledge to ensure its later exploitation by the applications and tools of the product design activity. In this context, we propose below our model of the techno-economic bid-proposition.
• Organize the bid-memory: the knowledge-manager must fuel the bid-memory and share it among the members of the bid-staff. They will exploit and enrich it with new


knowledge. However, validation of the produced knowledge is carried out only by the knowledge-manager. The intervention of the expert starts once he receives the model of the value chain. For this he relies on the bid issued by the owner and on the bid-memory. First, the expert must apply functional analysis methodologies to the bid in order to identify the owner requirements. He should not only provide a list of explicit requirements, but must also be able to reveal the implicit requirements (not expressed by the owner). These requirement specifications constitute a support from which the expert can determine the features desired by the owner for its future product. The result should be a list of the main functions and their sub-functions. The expert must then weigh each function according to the vision of the owner and specify its evaluation criteria. Thereafter, each owner-function must be translated into a product-function. For this, the expert relies on the bid-memory in order to establish a description of the product at a detailed level. This memory should contain a correspondence sub-memory between requirements and specifications, fuelled by the expert. For that, he should have access to the technical repositories of the bidder. Indeed, the expert chooses the most suitable product-function if it exists in the repositories; otherwise he proposes an alternative transposition into a product-function. Furthermore, the expert must assess the correspondence-memory and select only the value-creating functions (according to the Lean Manufacturing strategy). Once the functions of the product are identified, the expert can deduce the type of design to achieve. It is a routine design of the product when all the predefined functions were extracted from the technical repositories. It is a re-design of an existing product when the majority of the predefined functions are derived from the repositories and only a few new alternative functions have been proposed. In contrast, it is an innovative design of a similar product when the majority of the proposed functions are alternatives and only a few predefined functions are derived from the repositories. It is a creative design of a new product if no proposed function was identified from the repositories. Finally, the expert should prepare and store the design plan. After that, the expert must characterize all the components which constitute the manufactured product. Indeed, he must rely on the bid-memory, specifically on the technical repositories, to check whether the identified component has already been used by the bidder, or whether he could replace it with a similar component. Otherwise, it is a new component; it should be validated later by the knowledge-manager at the closure of the bid process. Since the main objective of the bid-staff is to measure the technical feasibility and the financial provision of the bid, it is necessary to quantify the business characteristics of each component. Thus, if the bid-staff is going to reuse or readjust an existing component, it


should specify the costs (cost-manager) and risks (risk-manager) related to its use. Otherwise, a decision must be made for the new component: either to buy it or to make it. If the bid-staff opts to purchase the component, it must rely on the market to predict simultaneously the price of the product (cost-manager) and the risks associated with its use (risk-manager). If it chooses to manufacture the component, it must estimate both the manufacturing cost (cost-manager) and the risks associated with its operations (risk-manager). This characterization of the technical components helps the expert to plan the product design activity: it is the most important phase of the technical building process. In fact, during this phase the expert proposes the different solutions that embody the techno-economic proposal. For this reason he must rely on a tool that allows him to assemble the selected components. He then retains only the best solutions. The construction of the technical solution ends with the incorporation of the financial proposal, which includes the design cost, bid-staff fees, supplier prices, and subcontractor costs. Finally, the techno-economic proposition is estimated. On the one hand, the cost-manager estimates the overall cost of the technical proposal and participates in the evaluation of its price on the market. On the other hand, the risk-manager assesses the risks related to the creation of the technical proposal. Ultimately, the bid-manager finalizes the proposition. Following this phase, the bid-staff may judge it necessary to adjust the proposition; a decision is then made to rebuild or re-evaluate it. Otherwise, if the estimated proposition appears convincing, it proceeds to the finalization of the proposition. The last step is to evaluate the bid-proposition as a whole and take a final decision before forwarding it to the owner. In this context, the bid-staff must identify the decision criteria and develop a scorecard to align and weigh these indicators, in order to optimize the final proposition. In the following, we focus on the functions required to achieve the bid process. These are: (i) functions to build technical solutions (to plan the product design activity); (ii) functions for cost management (to assess the overall cost of each technical solution) and for price management (to assess the market price of the designed product); (iii) functions for risk management (to assess the risks associated with the creation of the technical proposal); (iv) functions for decision support (to optimize decisions); and (v) functions for knowledge management (to manage the knowledge and business skills). In conclusion, the functions described in:
• (i), (ii) and (iii) cover the operational dimension;
• (iv) covers the decisional dimension;
• (v) covers the organizational dimension;
• interoperability between these different functions covers the cooperative dimension.


Figure 3. Elaboration of the bid-proposition.


IV. FEATURES OF THE BID-MEMORY: ORGANIZATIONAL DIMENSION OF THE BPIS

Working on a specific bid implies the intervention of several collaborators. Certainly, these contributors exchange knowledge and information flows. However, their environmental differences lead to various representations and interpretations of knowledge; therefore, on the same corpus, different skills and semantics overlap (“horizontal fit” problems). The resulting failures can be described in terms of a set of five conflicts. The syntactic conflicts result from the different terminologies used by stakeholders in the same application domain. The structural conflicts are related to the different levels of abstraction used to classify knowledge within a virtual company (the bid team). The semantic conflicts concern the ambiguity that emerges from the stakeholders' reasoning during the development of the technical and economic proposal. The heterogeneity conflicts are due to the diversity of data sources (specifications, owner, experts, collaborators, etc.). Finally, the contextual conflicts come mainly from environmental scalability problems, since stakeholders may evolve in different environments. In order to address these conflicts, we suggested in [22] an Organizational Memory (OM) sustained by an ontological framework to overcome the “horizontal fit” problems. However, this memory needs to be empowered by a knowledge-based system in order to operate, share and automatically reason on business knowledge between the different stakeholders. Such a system overcomes the structural and syntactic conflicts, and as a result solves the problem of knowledge acquisition. In return, it does not solve the ambiguities related to knowledge representation (the semantic and contextual conflicts). To meet the requirements related to solving the semantic and contextual conflicts, we suggest an ontological framework for modeling business knowledge. Our approach, which seeks to construct an ontological framework for the business operation process, relies jointly, on the one hand, on the specialization of the foundational ontology DOLCE [12] applying the OntoSpec method [11], and on the other hand, on Ontology Design Patterns (ODP) [6] relating to kernel ontologies:
• The specialized foundational ontology DOLCE allowed us to master the complexity of conceptual modeling; hence, it solves the problems related to semantic conflicts. Accordingly, we reused concepts from DOLCE to specify the generic concepts related to business processes (DOLCE is the backbone of the OntoSpec method). This work also allowed us to model different levels of abstraction.
• The Ontology Design Patterns (ODP) allowed us to master the complexity of consensual modeling at the generic level; this solution solves the problems related to contextual conflicts. Indeed, the use of these ODPs is based on the reuse of ontological modules already designed and evaluated in other areas. It is worth noting that the concepts used in the ODPs are defined according to the concepts and relations issued by


the specialized ontology DOLCE. In practice, we defined the ODP relating to business process treatment, the ODP for resources, the ODP for risks, the contextual ODP, and the ODP for construction products [20]. The application of our proposal, framed in a context of identification of bid knowledge for a specific project, aims to define four types of ontologies: (i) a foundational ontology (specialized DOLCE, which defines the invariant concepts of business processes); (ii) a kernel ontology (ODPs for the reuse of the invariant concepts of business processes); (iii) a domain ontology (which specializes the concepts of the kernel ontology in the bid domain); and (iv) an application ontology (which specializes the concepts of the bid domain ontology for a particular application: the bid of a specific project). The produced business skills must be stored for possible future use. For this reason, we suggested an OM for the management of business processes (Fig. 4). We use this memory to exploit the bid process in the context of a particular application. In fact, this OM deals with the problem of capitalization and restitution of knowledge, and therefore it resolves the heterogeneity conflicts. We organized our OM as a set of five sub-memories: a reusable-resources memory, a context memory, a roles memory, an action memory, and a use-cases memory. These memories are supported by the different ODPs enumerated above and by the different models of CommonKADS [16]. In a specific bid project, our starting point is the tender issued by the owner (a tender is a set of specifications). Concretely, a tender defines and details the set of elements needed to execute and manage the project. Its objective is to describe explicitly the desired functionality of the future product: the owner vision. The analysis of the specifications feeds the context memory (organization model, agent model, and contextual ODP) as well as the action memory (task model and application model realized in the form of the application ontology). A well-specified context frames the implementation environment of the different uses of the reusable resources. The reusable-resources memory stores the knowledge generated by the set of objects and reusable concepts that the company manipulates and controls in its routine activities (ODP treatment process, ODP resources, ODP construction product and ODP risk). Thus, the reusable resources participate in constructing the techno-economic proposal of the offer while relying on the roles memory, which stores the knowledge describing the use of a reusable resource within a given context. The use-cases memory describes the knowledge built for each bid proposal from the content of the other sub-memories. Each bid proposal is subsequently evaluated by indicators (technical, business, temporal, risk, etc.). Thus, for a particular solution we suggest constructing a bid-memory made up of: a technical referential for product design, a cost referential and a price referential to evaluate the case, and a risk referential to specify the possible risks associated with its


design. The bid-memory (cf. Figure 4) will therefore be the driving force of bid exploitation, notably for preparing the proposal (the OM feeds the bid-memory, which covers the organizational dimension).

Figure 4. Organizational dimension of the BPIS: characteristics of the bid-memory.
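As a purely illustrative sketch (not the paper's ontology) of how the four ontology levels described above could be laid down with the owlready2 library, with hypothetical concept and property names:

```python
# Minimal sketch with owlready2 (pip install owlready2); the concept and
# property names below are hypothetical illustrations, not the paper's ontology.
from owlready2 import Thing, ObjectProperty, get_ontology

onto = get_ontology("http://example.org/bid-odp.owl")

with onto:
    class Endurant(Thing): pass              # stands in for a DOLCE-level concept (foundational)
    class Resource(Endurant): pass           # kernel level (ODP resources)
    class BidResource(Resource): pass        # domain level (bid ontology)
    class WardrobePanel(BidResource): pass   # application level (a specific bid project)

    class used_in_context(ObjectProperty):   # links a reusable resource to a context
        domain = [Resource]
        range = [Thing]

onto.save(file="bid-odp.owl")                # each OM sub-memory could then be persisted as an OWL file
```

This only shows the specialization chain (foundational → kernel → domain → application); populating the five sub-memories would follow the same pattern with the ODPs enumerated above.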

V. IMPLEMENTATION OF THE OPERATIONAL DIMENSION OF THE BPIS

The modelling of the technical infrastructure allows identifying the tools needed to cover the four dimensions of the BPIS. In the following, we focus on implementing the operational dimension of this system.

A. Technical Infrastructure modelling

The operational dimension should cover the product design activity. Figure 5 specifies our contribution to build the techno-economic proposition. Precisely, we present our model of bid-knowledge capitalization by risk to serve this activity. This model represents the “ODP Construction Product” of the reusable-resources memory (cf. Figure 4).

Figure 5. Techno-economic bid-knowledge capitalization for the bid solutions model.

The expert uses this model. First, he should identify the list of composition links: the compound/component relations. The nomenclature describes the composition of a manufactured product from its components, and the coefficients that indicate the quantity of each component in each compound. These compositions are described level by level, corresponding to the stages of development of the product. The expert thus obtains manufacturing trees whose root is the final product and whose leaves are the purchased components and raw materials. Doing so allows him to identify the list of components to buy and the list of components to manufacture (a sketch of this traversal is given below). For the components to buy, the task is to draw up a list of suppliers and then choose the suitable ones. For the components to make, the task is to identify the manufacturing ranges: the processes for developing a product from its direct components. Such processes are represented by a set of phases and consume resources. Each component must be assigned a manufacturing range, and several components can share the same range. The resources used by the different phases of the specified ranges must also be defined, as well as the risks related to the exploitation of these resources by the ranges. The ERP ensures the exploitation of this model; thus the ERP permits the implementation of the operational dimension of the BPIS [21]. Figure 6 presents the tools that cover the requirements of the four dimensions of the BPIS.

Figure 6. Applicative architecture of the BPIS.
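As announced above, a rough illustrative sketch of this manufacturing-tree traversal (the bill of materials and component names are invented for the example):

```python
# Sketch: walk a compound/component nomenclature to separate purchased parts
# from manufactured ones. The bill of materials below is an invented example.
bom = {
    "wardrobe": [("door", 2), ("panel", 4), ("hinge", 8)],   # compound -> (component, coefficient)
    "door":     [("panel", 1), ("hinge", 2)],
}

def classify(product, bom):
    """Return (to_make, to_buy): leaves are purchased parts or raw materials."""
    to_make, to_buy = set(), set()
    def walk(item):
        if item in bom:               # compound: needs its own manufacturing range
            to_make.add(item)
            for component, _qty in bom[item]:
                walk(component)
        else:                         # leaf: purchased component or raw material
            to_buy.add(item)
    walk(product)
    return to_make, to_buy

print(classify("wardrobe", bom))      # e.g. ({'wardrobe', 'door'}, {'panel', 'hinge'})
```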

B. Features offered by the ERP to solve the “three fit” problems

The management of a company covers several processes (sales, purchasing, production, etc.). Each process could be handled by a specific application with its own graphical interface and its own database. In this case the information is dispersed across disjoint systems (the “spaghetti effect”), which reduces the productivity and performance of the company. It then becomes difficult to ensure the integrity of the company within its functional scope, and problems of “vertical fit”, and consequently of “horizontal fit” and “transversal fit”, inevitably appear. The ERP is able to exploit all the functions of the company in a coherent way. It relies on a single database to manage all the business processes. Thus, the ERP permits overcoming the


“vertical fit” problems. Indeed: (i) it ensures the integrity of information within the company; (ii) it covers all business processes; and (iii) it allows synchronization and consistency between the various processes. Moreover, the ERP participates in the resolution of the “horizontal fit” problems: in fact, it ensures communication between its various modules via a single database. Furthermore, the ERP could also participate in the resolution of the “transversal fit” problems if it is coupled with and supported by suitable technological capacities [21].


C. Choice of the ERP to implement the bid solutions model

There are two classes of ERP publishers: commercial and open source. The exploitation of a commercial ERP requires the acquisition of an expensive license, as with SAP and Oracle Business Suite. In addition to the price of the license, the company must also pay for a maintenance contract. The open source ERP category does not require the purchase of a license, although it does entail an integration cost; examples are OpenERP, Compiere and Openbravo. It is true that a commercial ERP offers a more reliable and powerful solution than an open source ERP; however, this solution is expensive and intended for large companies. Moreover, a commercial ERP is not flexible: the company is dependent on the publisher's processes, i.e. it is forced to adapt and change its processes and working procedures according to the processes programmed by the ERP publisher. An open source ERP is more flexible and can be adapted to the specific requirements of the company. However, open source ERPs have limitations, such as the lack of customized documentation, which makes their implementation in the company difficult. In the following, we propose to adapt an open source ERP to our context so that it can cover any bid process solution. An ERP can manage and cover the requirements of the bid process through collaborations between its various modules, but it does not offer a specific module to exploit the bid process. In this context we propose to extend the open source ERP “OpenERP 7.0” with a new module that covers the construction of the bid proposition. Concretely, we have chosen “OpenERP 7.0” because it is currently the most complete open source ERP on the market. Moreover, this tool is characterized by its flexibility: “OpenERP 7.0” supports SOA [3] and Cloud Computing [7]; specifically, it supports Software as a Service, or SaaS [4].

D. Our new module under “OpenERP 7.0” to build the bid solution

We have proposed to extend the open source “OpenERP 7.0” with a new module that allows treating the model of bid solutions. Figure 7 shows the visibility of our module “Réponse à un Appel d’Offre” (bid process solution) under “OpenERP 7.0”.

Figure 7. Visibility of our module “bid process solution” under “OpenERP 7.0”.

Our module allows its operators to detail the manufacture of a product (or a service) from: (i) the components used (raw materials, manufactured products and purchased products); (ii) the manufacturing processes used; and (iii) the resources consumed for construction. The cost of this product design activity is calculated automatically. Furthermore, we have explicitly integrated internal risks (risks controlled by the stakeholders) and external risks (risks imposed on and not controlled by the stakeholders); these risks are related to the manufacture of the finished product. These different criteria influence the stakeholders' final decision to transmit (or not) their bid proposition to the owner. Figure 8 shows an example of the use of our module to plan the techno-economic proposition for the manufacture of a wardrobe. In summary, we propose an extension of “OpenERP 7.0” by a new module that allows the stakeholders to build the bid proposition. Our extension allows planning the risk-aware techno-economic design activity of products. Concretely, our module helps the stakeholders to decide whether or not to submit their proposition to the owner. We support this decision-making assistance with: (i) functionalities to achieve the product design activity according to customer requirements; (ii) a function that automatically calculates the cost of this design activity; (iii) an interval bounding the selling price of the manufactured product on the market; and (iv) knowledge concerning the internal and external risks associated with the manufacture of the future product.
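A minimal, hypothetical sketch of what such a model declaration could look like under the OpenERP 7.0 ORM (field and model names are invented; the actual module is richer than this):

```python
# Hypothetical sketch of an OpenERP 7.0 model for a bid solution; not the
# actual module, just the general shape of an extension under the v7 ORM.
from openerp.osv import osv, fields

class bid_solution(osv.osv):
    _name = 'bid.solution'
    _description = 'Techno-economic bid proposition'
    _columns = {
        'name': fields.char('Solution', size=128, required=True),
        'product_id': fields.many2one('product.product', 'Designed product'),
        # 'bid.solution.component' would be a companion model declared the same way
        'component_ids': fields.one2many('bid.solution.component', 'solution_id', 'Components'),
        'design_cost': fields.float('Estimated design cost'),
        'internal_risk': fields.text('Internal risks'),
        'external_risk': fields.text('External risks'),
    }
```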



Figure 8. Example of the bid proposition to manufacture the wardrobe.

VI.

CONCLUSION AND PERSPECTIVES

We have presented our strategy for implementing a Bid Process Information System (BPIS). We showed that such a system must be integrated, flexible and interoperable. However, during the implementation of such a system, the “three fit” problems (vertical, horizontal and transversal fits) prevent the fulfilment of these requirements. We are particularly interested in solving the “vertical fit” problems. In this article, we have defined the characteristics of the operational dimension of the BPIS. In this context, we have proposed a model of the bid process at the business level and we have defined our model of the bid solutions. The advantages of such a cartography are clear: it facilitates communication within the bid-staff; it unifies the business strategies on a common bid process platform; and it reduces the gap between the business and technical infrastructures during the exploitation of the BPIS. The implementation of the operational dimension is realized by adapting an ERP to our context. Indeed, we have proposed to extend the open source “OpenERP 7.0” with a new module that allows treating the model of bid solutions. Our extension is reusable from one bid to another. Furthermore, we have presented the characteristics of the organizational dimension of the BPIS. Bid processes are actually a family of business processes rather than an individual process; thus, using “standard” business process languages, such as BPMN 2.0, may not be sufficient. In future work we must express variability in our model (e.g., optional vs. mandatory activities, variants of the same activity, changes in the order or flow of activities, etc.).


REFERENCES

[1] AFIS (Association Française d'Ingénierie Système) (2009). Découvrir et comprendre l'Ingénierie Système. Ouvrage collectif AFIS préparé par le groupe de travail d'IS.
[2] Alquier, A.M. and Tignol, M.H. (2007). Management de risque et Intelligence Economique, l'approche PRIMA. Economica.
[3] Bean, J. (2010). SOA and Web Services Interface Design, Principles, Techniques and Standards. Elsevier, Burlington, USA.
[4] Cancian, M.H. and Rabelo, R.J. (2013). Supporting Processes for Collaborative SaaS. IFIP Advances in Information and Communication Technology, vol. 408, pp. 183-190.
[5] Fourrier-Morel, X., Grojean, P., Plouin, G. and Rognon, C. (2008). SOA, le Guide de l'Architecture du SI. Dunod, Paris.
[6] Gangemi, A. (2006). Ontology Design Patterns. Tutorial on ODP, Laboratory for Applied Ontology, Institute of Cognitive Sciences and Technology, CNR, Rome, Italy.
[7] Gerald, K. (2010). Cloud Computing Architecture. Corporate Research and Technologies, Munich.
[8] Gronau, N. (2010). Enterprise Resource Planning. Oldenbourg Verlag, 2nd Edition.
[9] Hachani, S. (2013). Approche orientée Services pour un support Agile et flexible des Processus de conception de produit dans les systèmes PLM. Thèse de Doctorat, Laboratoire G-SCOP, École Doctorale I-MEP, Université de Grenoble, France.
[10] Hasle, P. (2014). Lean Production: An Evaluation of the Possibilities for an Employee Supportive Lean Practice. National Research Centre for the Working Environment, Copenhagen, Human Factors and Ergonomics in Manufacturing & Service Industries, DOI: 10.1002/hfm.
[11] Kassel, G. (2005). Intégration de l'ontologie de haut niveau DOLCE dans la méthode OntoSpec. http://hal.ccsd.cnrs.fr/ccsd-00012203.
[12] Masolo, C., Borgo, S. and Gangemi, A. (2003). The WonderWeb Library of Foundational Ontologies and the DOLCE ontology. Technical Report, WonderWeb Deliverable D18.
[13] OMG (2011). Unified Modeling Language (OMG UML), Superstructure, Version 2.4.1. Technical report, Object Management Group.
[14] OMG (2011). Business Process Model and Notation (BPMN), Version 2.0. Technical report, Object Management Group (OMG), http://taval.de/publications/BPMN2.0.
[15] Schonenberg, H., Mans, R., Russell, N., Mulyar, N. and van der Aalst, W. (2008). Process Flexibility: A Survey of Contemporary Approaches. Advances in Enterprise Engineering.
[16] Schreiber, G. and Akkermans, H. (2000). Knowledge Engineering and Management: The CommonKADS Methodology. MIT Press, Cambridge, USA.
[17] Simon, A. (2014). Implémentation d'un éditeur BPMN au sein d'un outil de métamodélisation. Mémoire de Master, Université de Namur.
[18] Weske, M. (2012). Business Process Management - Concepts, Languages, Architectures, 2nd ed. Springer.
[19] YAWL (2014). Yet Another Workflow Language. http://www.yawlfoundation.org/.
[20] Zahaf, S. and Gargouri, F. (2013). Mémoire organisationnelle appuyée par un cadre ontologique pour l'exploitation des processus d'affaires. 31ème Congrès INFormatique des ORganisations et Systèmes d'Information et de Décision (INFORSID), Paris, France.
[21] Zahaf, S. and Gargouri, F. (2014a). ERP Inter-enterprises for the Operational Dimension of the Urbanized Bid Process Information System. Procedia Technology, Vol. 16, pp. 813-823, Elsevier.
[22] Zahaf, S. and Gargouri, F. (2014b). Operational and Organizational Dimensions of the Bid Process Information System. 13th International Conference on Information & Knowledge Engineering (IKE), Las Vegas, USA.
[23] Zahaf, S. (2016). Approche de conception de Système d'Information de Situations Sensibles : Application au processus métier réponse à un appel d'offre. Thèse en Informatique, Faculté des Sciences Economiques et de Gestion, Université de Sfax, Tunisie.


Deep Convolutional Neural Networks for Spatiotemporal Crime Prediction

Lian Duan1,2, Tao Hu3, En Cheng4, Jianfeng Zhu5, Chao Gao6*
1 Geography Science and Planning School, Guangxi Teachers Education University, Nanning, China
2 Education Ministry Key Laboratory of Environment Evolution and Resources Utilization in Beibu Bay, Guangxi Teachers Education University, Nanning, China
3 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
4 College of Arts and Sciences, University of Akron, Akron, Ohio, USA
5 School of Digital Sciences, Kent State University, Kent, Ohio, USA
6 Key Laboratory of Police Geographic Information Technology, Ministry of Public Security, Changzhou Municipal Public Security Bureau, Changzhou, China
* Corresponding author, e-mail: [email protected]

Abstract - Crime, as a long-term global problem, exhibits complex interactions with space, time and the environment. Extracting effective features that reveal such entangled relationships in order to predict where and when crimes will occur has become a hot topic, and also a bottleneck, for researchers. We therefore propose a novel Spatiotemporal Crime Network (STCN) that applies deep Convolutional Neural Networks (CNNs) to automatic crime-related feature extraction. The model forecasts the crime risk of each region of an urban area for the next day from a retrospective volume of high-dimensional data. We evaluated the STCN using felony and 311 datasets from New York City covering 2010 to 2015. The STCN achieved an F1 of 88% and an AUC of 92%, exceeding the performance of four baselines. Finally, the predicted results are visualized to help readers understand how they relate to the ground truth.

Keywords: Crime Spatiotemporal Prediction; Crime Risk Estimation; CNN (Convolutional Neural Networks); Deep Learning; Urban Computing

1 Introduction

There has been increasing interest in spatiotemporal predictive policing, which forecasts future crime risks for each community in an urban area, since it provides significant aid to law enforcement in identifying the underlying patterns behind crime propagation, efficiently deploying limited police resources and improving public safety [3]. However, the task of forecasting where and when crimes will occur is inherently difficult because it is sensitive to the highly complex distribution of crimes in space and time. Recent literature has recognized that crime incidents tend to exhibit spatial and temporal dependencies with dynamic social environments [1] [2]. The spatial

dependencies denote that the crime risk of a region is affected by the crime-related events or environmental factors in its spatially proximate regions as well as in distant regions. For example, empirical evidence has identified that burglary offenders, bars, incomes, racial composition [4] and traffic [13] are statistically related to the spatial concentration of crime. On the other side, the temporal dependencies mean that the crime risk of a region is influenced by the crime-related events or environmental factors at recent, near and even distant time intervals. For example, the near-repeat patterns found in crime indicate that frequent recent crimes are a powerful variable for predicting local crime risks in the immediate future [4]. Moreover, dynamic social events, such as 311 or 911 incidents, may imply high crime risks within a nearby spatiotemporal scope [8]. In summary, the challenge in crime prediction centers on identifying effective spatiotemporal dependencies from the dynamic interplay of crimes with space, time and environmental factors. In order to address this challenge, scholars have incorporated spatiotemporal point processes [9] and the random space–time distribution hypothesis [10] into crime prediction by modeling the spatial propagation of crime. Risk terrain analysis [11], geographic regression models and Bayesian models [12] have also been developed to assess how multiple environmental factors contribute to future crime risks. Taking the dynamic correlations between crimes and other social activities into consideration, recent studies have explored various feature engineering approaches to characterize crime-related features from Foursquare data [6], Twitter data [7], 911 events [8] and taxi trajectories [13] in order to enhance predictive power. However, most of these studies require profound crime knowledge and resort to complicated spatiotemporal analysis or cumbersome feature engineering processes. Furthermore, once the data change slightly, many features must be re-analyzed and re-engineered by hand. Thus, it is not only challenging to capture valid


spatiotemporal dependencies from geo-crime data automatically and efficiently, but the performance of such models is also significantly affected by hand-crafted features. Recently, deep convolutional neural networks (CNNs), which use multi-layer architectures to extract the inherent local correlations of data from the lowest to the highest levels, have been developed to address feature extraction from high-dimensional data. They have led to marked improvements in many domains, especially computer vision [14] and natural language processing [15]. Similar to the pixels in an image or the words in a sentence, social incidents such as crime incidents and traffic flows exhibit multiple local spatial dependencies. Accordingly, studies handling big geo-datasets with deep CNNs for poverty prediction [16], traffic congestion prediction [17], crowd flow prediction [23] and air pollution prediction [18] have become available. Hence, based on deep CNNs, this paper presents a novel Spatiotemporal Crime Network (STCN), drawing on crime data along with 311 data, to automatically abstract the spatiotemporal dependencies that are tied to crime risk prediction. The main contributions of this paper are as follows: (1) a novel deep CNN framework is proposed for end-to-end spatiotemporal crime prediction, which makes it particularly flexible and minimizes feature engineering bias; (2) the combination of inception networks and fractal networks in a unified framework enables it to automatically learn the nearby and distant spatiotemporal dependencies between crime and 311 incidents; (3) we evaluated the STCN using crime and 311 datasets of New York City from 2010 to 2015, with the results showing that our model outperformed four well-known baselines on F1 and AUC. The remainder of this paper is structured as follows. Section 2 describes the datasets and formulates the problem, Section 3 illustrates the architecture and learning procedure of the proposed STCN model, Section 4 validates the effectiveness of the proposed algorithm through numerical experiments, and Section 5 presents the concluding remarks.

2 Preliminaries

This section describes the datasets and basic definitions, and then formalizes the prediction problem.

2.1 Datasets

The raw datasets came from New York City (NYC) Open Data (https://data.cityofnewyork.us/) and span January 1st, 2010 to December 31st, 2015, including felony data and 311 data.

Felony dataset: This dataset contains 653,447 incidents. Each incident has coordinates, an offense type and a time. We removed 8,000 incidents without borough information and 8 incidents with abnormal coordinates; the remaining 645,439 incidents were used in our experiments.

311 dataset: This dataset contains about 10 million complaint records. About 0.12 million complaint records were omitted because they did not contain coordinates or fell outside the sector boundaries. The final 311 dataset therefore contains 9.75 million records.

2.2 Formulation of the Crime Prediction Problem

Definition 1 (Region): As shown in Figure 1(a) and (b), we divided the study area into 120×100 disjoint grids, G = {g1, g2, ⋯, g120×100}. Each grid in G is considered a region. The colored grids denote at least one crime occurring during 2010 to 2015 in NYC. The area of each region is 0.18 km2 (0.47 km × 0.38 km). Such a fine spatial scale is well aligned with police resource deployment [19].

Figure 1. Grid-based map: (a), (b) the study area and the number of crimes per grid; (c) a target region g and its spatial neighbors.

Definition 2 (Spatial neighbor set): As shown in Figure 1(c), a target region g is the region for which we would like to forecast whether a crime is likely to occur. The 3×3 regions geographically surrounding g are called the spatial neighbor set of g.

Definition 3 (Time window): The duration of M days is regarded as a time window.

Definition 4 (Crime feature and 311 feature): $x_{crime}^{g} \in \mathbb{R}^{M \times N}$ represents the crime feature in the form of a 2D data structure, in which M indicates the length of the time window and N denotes the total number of regions in the spatial neighbor set. Each entry of $x_{crime}^{g}$ indicates a crime count. As shown in Figure 2, the three grid-nets in the 3D coordinate system express the crime numbers of the regions during three consecutive days (a time window with M = 3); we obtain $x_{crime}^{g} \in \mathbb{R}^{3 \times 9}$ by mapping the spatial neighbor set of g along the three consecutive days. We define the 311 feature $x_{311}^{g}$ in the same way.

Figure 2. $x_{crime}^{g}$ constructed from the raw data.

Definition 5 (Observation): For the target region g at day t, the observation is represented as $X_{t}^{g} \in \mathbb{R}^{M \times N \times 2}$. It is composed of $x_{crime}^{g}$ and $x_{311}^{g}$, as shown in Figure 3.

Figure 3. Observation structure.

Definition 6 (Crime label): $y_{t}^{g}$ is the number of crimes for the target region g at day t, and is used as the crime label.

Each sample, $\{X_{t}^{g}, y_{t+1}^{g}\}$, is generated by combining the observation and the crime label. However, we found that 93% of all samples were labeled 0, resulting in a class imbalance problem. Two approaches were therefore used to address it. The first is to collapse the crime label to two categories, so that $y_{t}^{g}$ is either 0 or 1. The second is to under-sample the 0 class by 50% [8] and over-sample the 1 class according to SMOTE [26].

Problem: Given the observation $X_{t}^{g}$, the goal is to estimate whether $y_{t+1}^{g}$ is 0 or 1.

3 Methodology

This section elaborates how the deep CNN architecture resolves this problem.

3.1 Architecture

Figure 4 shows an overview of the proposed STCN model, with its parameters in Table 1. The crime and 311 data are converted into two 2D image-like arrays (Definition 5) that serve as input feature maps. By passing them through a sequence of two convolution layers, the model captures the low-level spatiotemporal dependencies of crime incidents. As the output feature maps flow deeper into the network, the model begins to abstract the high-level spatiotemporal features, relying mainly on inception blocks and fractal blocks, which leverage branches of convolutional layers and merge layers to fuse crime-related features at different abstraction levels. Finally, the highest-level crime-related features are aggregated into the dense layer, which acts as the classifier in the framework, to realize the crime risk prediction. The number of each type of layer/block and its position (depth) in the network follow the architectures described in [21] [22] and were further determined by experiments.

Figure 4. STCN framework: (a) STCN, (b) Inception block, (c) Fractal block. Red layers are convolution layers. In (a), “×” denotes the number of stacked components and “*” marks an abbreviation of the structures depicted in (b) or (c), where the numbers denote the size and count of the kernels.

Table 1. Parameters of the STCN.

Position   Layer (stacked num)   Kernel size (num)   Output size
0          Raw input (1)         —                   60×9×2
1-2        Conv (2)              3×3 (32)            60×9×32
3-13       *Inception (2)        —                   60×9×64
14         Max Pool (1)          3×3 (1)             30×4×64
15-27      *Fractal (2)          —                   30×4×128
28         Conv (2)              3×3 (64)            30×4×64
29         Max Pool (1)          5×5 (1)             64
30         Dense (1)             64                  2

3.2 Spatiotemporal dependencies learning within CNNs

3.2.1 Convolution Operation

A convolution operation naturally captures a close spatiotemporal dependency among crime incidents, as shown in formula (1):

$$f_l(x) = f\Big(\sum_{j=1}^{N_l} w_{lj}\, x_j + b_l\Big) = f(\mathbf{w}_l^{T} \mathbf{x}_l + b_l) \qquad (1)$$

where l indicates the l-th layer, $N_l$ indicates the size of the kernel, $x_j$ indicates the crime (or 311) features of the regions and period covered by the kernel, $w_{lj}$ is a learnable weight of the kernel, and $b_l$ is the linear bias. f(.) is a ReLU activation function [14] that decides whether a neuron in this layer is activated, i.e. whether a particular spatiotemporal feature is captured. Furthermore, a stack of convolution operations automatically captures the mutual influences of crimes over a more distant spatiotemporal scope [23]. Figure 5 illustrates this self-learning process. A neuron in layer 1 merely captures a small-scale spatiotemporal dependency across a 3×3 range of layer 0. Compared to it, a neuron in layer 2 is able to describe a large-scale spatiotemporal dependency across a 6×6 range of layer 0. In this manner, more convolutions can capture much farther spatiotemporal dependencies.
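The exact implementation is not given here; as a rough, hedged illustration of how the input tensor of Definition 5 and the first rows of Table 1 could be expressed with a Keras-style API (layer sizes are illustrative, and the inception/fractal blocks are only indicated):

```python
# Sketch only: an input of shape (M, N, 2) = (60, 9, 2) passed through two
# 3x3 convolutions, mirroring the first rows of Table 1. Layer counts and
# hyper-parameters are illustrative, not the exact STCN implementation.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(60, 9, 2))          # crime + 311 feature maps (Definition 5)
x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(inputs)
x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(x)
# ... inception and fractal blocks would follow here (see Figure 4) ...
x = layers.MaxPooling2D(pool_size=(2, 2), padding='same')(x)
x = layers.Flatten()(x)
outputs = layers.Dense(2, activation='softmax')(x)  # crime / no-crime

model = models.Model(inputs, outputs)
model.summary()
```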

Figure 5. Stacking of convolution layers. A neuron in conv layer 0 represents a spatiotemporal region.

3.2.2 Inception block and fractal block

Existing studies have found that straightforwardly stacking more conv layers or simply widening the network makes the extended network more prone to overfitting and dramatically inefficient [21]. To avoid such drawbacks, this study develops an inception block and a fractal block. By stacking them, our model is armed with a deep network structure that improves the crime prediction performance while keeping the network efficient to train.

Inception block. Crime risks are usually affected by various spatiotemporal patterns among crime-related incidents. For example, the crime risk of a particular region may have not only positive correlations with the historical crime intensities of some surrounding areas, but also negative associations with other nearby areas. It may further be affected by temporal patterns such as seasons, and by the evolution patterns of 311 incidents. In order to extract such complicated features, we designed this block, which stacks asymmetric convolutions in the form of sequences of 1×3, 3×1 and 1×3 conv layers [21], as shown in Figure 4(b). At the bottom of the block, the merge layer (green rectangle) is leveraged to collect the spatiotemporal dependency information from the different branches into a single “thick” feature map.

Fractal block. The design of the fractal block was adopted from FractalNet [22], with its core ideas of a fractal stacking architecture and multiple drop-paths, as shown in Figure 4(c). However, there are three major differences between the proposed fractal block and FractalNet. (1) Conv layers were factorized into 3×1 and 1×3 asymmetric ones, maintaining the expressive ability with fewer parameters. (2) The merge layer applies a concatenation function that overlays the outputs of the different branches to capture multiple crime-related features, instead of using the element-wise average function. (3) A deep drop-path and a shallow drop-path were formulated to build a complex substructure and a simple substructure, respectively. The fractal block selects either of them by an automatic parameter-search method [24]. This helps to handle the diverse complexity of the relations in the data, thereby improving the prediction performance.

In the deep drop-path or the shallow drop-path, a layer with multiple branches drops each of them with decreasing (Figure 6(a)) or increasing (Figure 6(b)) probabilities from left to right, while making sure that at least one branch survives. The decreasing or increasing scheme determines the dropout probability of each branch. For example, Figure 6(a) depicts the deep drop-path applied in a fractal block with seven floors of layers, whereas the shallow drop-path in Figure 6(b) has only three floors of layers. Since the “previous layer” of the block owns four branches, the dropout probability set is $\rho_d = \{\tfrac{4}{4}, \tfrac{3}{4}, \tfrac{2}{4}, \tfrac{1}{4}\}$ from left to right for the deep drop-path in Figure 6(a), or $\rho_s = \{\tfrac{1}{4}, \tfrac{2}{4}, \tfrac{3}{4}, \tfrac{4}{4}\}$ for the shallow drop-path in Figure 6(b).
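A hedged sketch, not the authors' code, of a branch-and-merge block in the spirit of Figure 4(b): asymmetric 1×3/3×1 convolutions in parallel branches fused by a concatenation merge (branch widths are illustrative):

```python
# Sketch of an inception-style block with asymmetric convolutions and a
# concatenation merge, in the spirit of Figure 4(b); not the exact STCN block.
from tensorflow.keras import layers

def inception_block(x, filters=64):
    b1 = layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(x)

    b2 = layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(x)
    b2 = layers.Conv2D(filters, (1, 3), padding='same', activation='relu')(b2)
    b2 = layers.Conv2D(filters, (3, 1), padding='same', activation='relu')(b2)

    b3 = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    b3 = layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(b3)

    return layers.Concatenate()([b1, b2, b3])   # "thick" fused feature map
```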


Figure 6. Drop-paths: (a) deep drop-path, (b) shallow drop-path. Gray means the branch is disabled.
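The retention schedule shown in Figure 6 can be generated mechanically; a tiny illustrative helper (the actual sampling strategy in the model may differ):

```python
# Sketch: dropout probabilities for k branches, decreasing (deep drop-path)
# or increasing (shallow drop-path) from left to right, as in Figure 6.
def drop_path_probs(k, deep=True):
    probs = [i / k for i in range(k, 0, -1)]     # deep:    [k/k, ..., 1/k]
    return probs if deep else probs[::-1]        # shallow: [1/k, ..., k/k]

print(drop_path_probs(4, deep=True))    # [1.0, 0.75, 0.5, 0.25]
print(drop_path_probs(4, deep=False))   # [0.25, 0.5, 0.75, 1.0]
```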

4 Experiments

4.1 Baselines

The STCN was evaluated by comparison with four other models: a conventional time series prediction approach and classifiers recently employed for spatiotemporal prediction.

Rolling weighted average (RWA): The crime risk is determined by the weighted average of previous crime counts over a time window (e.g., the last 60 days), with the values of more recent days given bigger weights:

$$\hat{y}_{t+1} = \frac{n\,y_t + (n-1)\,y_{t-1} + \cdots + 2\,y_{t-n+2} + y_{t-n+1}}{n + (n-1) + \cdots + 2 + 1} \qquad (2)$$

where n is the size of the time window and $y_t$ is the number of crime incidents at day t. A threshold ε = 0.1 is used as a cutoff to separate the 1 and 0 classes.

Support Vector Machine (SVM) [5]: in this model we used the Gaussian (RBF) kernel; the two parameters C and γ were set to 3 and 0.16 respectively to maximize its F1.
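A small illustrative sketch of the RWA baseline of Eq. (2); the history array and threshold below are placeholders:

```python
# Sketch of the rolling weighted average (RWA) baseline: recent days receive
# larger weights; a threshold turns the weighted score into a 0/1 prediction.
import numpy as np

def rwa_predict(history, n=60, eps=0.1):
    """history: daily crime counts for one region, most recent last."""
    window = np.asarray(history[-n:], dtype=float)
    weights = np.arange(1, len(window) + 1)        # oldest day -> 1, latest -> n
    score = (weights * window).sum() / weights.sum()
    return int(score > eps)

print(rwa_predict([0, 0, 1, 0, 2], n=5))           # -> 1
```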

ISBN: 1-60132-463-4, CSREA Press ©

Int'l Conf. Information and Knowledge Engineering | IKE'17 |

Random Forests [5] [6] [8]: in our implementation, the number of trees was 200 and the maximum depth of each tree was 20.

Shallow fully connected neural network (SFCNN) [23]: an artificial neural network composed of 3 fully connected hidden layers, each containing 64 neurons. This model was built to test whether a shallow structure, compared to the deep learning structure, can capture the complex spatiotemporal dependencies among crime and 311 incidents.
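For reference, the two scikit-learn baselines with the settings reported above could be instantiated as follows (a sketch; the feature matrix X_train and label vector y_train are assumed to be prepared elsewhere):

```python
# Sketch of the SVM and Random Forest baselines with the reported settings.
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

svm = SVC(kernel='rbf', C=3, gamma=0.16, probability=True)
rf = RandomForestClassifier(n_estimators=200, max_depth=20)

# svm.fit(X_train, y_train); rf.fit(X_train, y_train)   # assuming prepared arrays
```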

4.2 Evaluation metrics

This paper uses F1 and AUC (of the ROC curve) as performance metrics for our imbalanced dataset, since these two metrics are robust to class imbalance [26]. In the training stage of each model, we repeated ten cross-validation runs, in which the whole dataset was randomly split into 90% for training and 10% for validation using stratified sampling. In the testing stage, the validity of each model is assessed by the metrics mentioned above.
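A hedged sketch of this protocol with scikit-learn (arrays X and y and a fitted-able model are assumed; the authors' exact pipeline may differ):

```python
# Sketch of the evaluation protocol: stratified 90/10 splits repeated ten
# times, scored with F1 and ROC AUC.
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import f1_score, roc_auc_score

splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=0)
for train_idx, val_idx in splitter.split(X, y):
    model.fit(X[train_idx], y[train_idx])                      # any of the classifiers above
    prob = model.predict_proba(X[val_idx])[:, 1]
    print(f1_score(y[val_idx], prob > 0.5), roc_auc_score(y[val_idx], prob))
```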

4.3 Experimental Setup

The experiment used TensorFlow 1.01 and CUDA 8.0 to build STCN and SFCNN, which were executed on a desktop computer with an Intel i7 3.4 GHz CPU, 32 GB memory and an NVIDIA GeForce GTX 650 GPU (2 GB RAM). The learnable parameters are initialized from a Gaussian distribution with mean 0 and standard deviation 0.02. These neural network models were trained on the full training data for 100 epochs with a mini-batch size of 128. Besides, all of the conv layers use the ReLU activation function with batch normalization. The model was then trained with the Adam gradient-descent optimizer [26], with a weight decay of 0.0001 and a momentum of 0.9, to minimize the cross-entropy error. The other baseline models were developed with the Scikit-learn package. The over-sampling processing for the imbalanced dataset drew support from the imbalanced-learn package [27].

4.4 Performance Comparison

4.4.1 Performance studies on 311 data

The experiment has been carried out to evaluate the effectiveness of 311 data for crime prediction. The size of the spatial neighbor set is fixed to 3×3 and the length of the time window to 60 days. In Table 2, we found that the F1 and AUC of the proposed STCN were the highest even without the 311 data. Further, after incorporating the 311 data, its F1 and AUC increased by 3% and 6% respectively, the largest improvement among all models. This demonstrates not only that the 311 incidents have certain spatiotemporal correlations with crimes, but also the better capability of our model in mining such relationships.

Table 2: Performance with and without 311 data

Model          F1     F1 (no 311)   AUC    AUC (no 311)
SVM            0.70   0.68          0.76   0.74
RandomForest   0.81   0.81          0.82   0.82
SFCNN          0.81   0.80          0.82   0.79
STCN           0.88   0.85          0.92   0.86

4.4.2 Performance studies on time windows

Figure 7: F1 on time window
Figure 8: AUC on time window

It is evident from Figure 7 and Figure 8 that the proposed STCN approach outperformed the other baselines over increasing time windows. There were more than 6% and 10% improvements in F1 and AUC when compared to the SFCNN, which was the best classifier among the baselines. The other models, SVM and RWA, gained suboptimal performance. The curves in the two figures also demonstrate that the prediction performance of all models improved as the time window increased, because more temporal patterns or trends could be discovered. The F1 and AUC of SVM began to decline when the length of the time window exceeded 40, while the performance of our STCN continued to grow until the length of the time window surpassed 60. This implies that the STCN benefited from its capability to capture more valid spatiotemporal features hidden in the crime and 311 incidents. However, when the length of the time window exceeded 60, all of the models produced more errors, implying that more remote historical data probably has less correlation with the current time.

This also showed that our proposed model had the best generalization performance, since both the F1 and AUC of the proposed STCN were still higher than those of the other baselines even when the length of the time window reached 100.

4.5 Results Visualization

The predicted result acts as an important indicator for identifying high-risk locations that require further attention and place-based police intervention. As an example, Figure 9 gives a comparison of the predicted result and the ground truth of crimes in NYC on 2nd Nov 2015. By observing the spatial distribution of red grids in Figure 9, we can see where crimes would happen in the future. In addition, by spatial adjacency analysis on Figure 9, we found that 86% of the yellow grids were located within 2 km of their nearest red grids, with the remaining 10 grids beyond 2 km. Therefore, just by expanding the police


resources (e.g., patrol cars) from the predicted red grids outward by an extra 2 km, we could further remove opportunities for committing crimes at from 76% (225/(225+71)) to 97% ((225+71−10)/(225+71)) of the high-risk locations. That could be used to answer the question "how would crime be prevented?".

Figure 9: Comparison of the predicted and actual crime distribution in NYC on 2nd Nov 2015

5 Conclusions

In this paper, we proposed a novel deep CNN-based model for crime prediction. By comparison with 4 baseline methods on the crime and 311 data, the proposed model demonstrated its superiority in the spatiotemporal crime risk forecasting task. However, a CNN is a "black box" in which the neuron connection structures are not readily interpretable [25]. Rather than waiting for the day when the "black box" is opened, this study is eager to take a quick step toward applying deep CNNs to spatiotemporal crime analysis for preventing crimes now. We believe this work just touches the surface of what is possible in this direction and there are many avenues for further exploration, such as modeling different categories of crime risks, identifying significant socioeconomic features to prevent crimes, etc. In addition, more data types should be taken into account to increase prediction performance in the future.

Acknowledgements

This project has primarily been funded by: National Natural Science Foundation of China, "Research on House-breaking Suspects Location Spatiotemporal Prediction Base on Semantic Correlation" (#41401524); Guangxi Natural Science Foundation, "Research on Crime Events Spatial Correlation Evolution and its Location Prediction Based on Spatiotemporal Near-Repeating Network Pattern Analysis" (#2015GXNSFBA139191); Open Research Program of Key Laboratory of Police Geographic Information Technology, Ministry of Public Security, "House-breaking Suspects Location Spatiotemporal Prediction Base on Semantic Correlation" (#2016LPGIT03); Scientific Project of Guangxi Education Department, "Research on Crime Spatiotemporal hotspot Prediction Based on Spatiotemporal Near-Repeating Network Analysis" (#KY2015YB189); Open Research Program of Key Laboratory of Environment Change and Resources Use in Beibu Gulf (Guangxi Teachers Education University), Ministry of Education, "Research on Intelligent Activity-route Planning Based on Multi-source Mobiling Trajectory Pattern Mining" (#2014BGERLXT14); Open Research Program of Key Laboratory of Mine Spatial Information Technologies of National Administration of Surveying, Mapping and Geoinformation, "Intelligent Activity-route Planning Research Based on Multi-source Mobiling Trajectory Pattern Mining in 'Intelligent City'" (#KLM201409).

References

[1] Kelvin Leong, Anna Sung. “A review of spatio-temporal pattern analysis approaches on crime analysis”. International e-Journal of Criminal Science, vol. 9, 1—33, 2015. [2] Xiaofeng Wang, Donald E. Brown, Matthew S. Gerber, “Spatio-Temporal Modeling of Criminal Incidents Using Geographic, Demographic, and Twitter-derived Information”. IEEE International Conference on Intelligence and Security Informatics (ISI), 11-14, June 2012. [3] Spencer Chainey. “The Crime Prediction Framework- a spatial temporal framework for targeting patrols, crime prevention and strategic policy”. http://proceedings.esri.com/library/userconf/nss15/paper s/nss_03.pdf, 2015. [4] Mohsen Kalantari, Bamshad Yaghmaei, Somaye Ghezelbash (). Spatio-temporal analysis of crime by developing a method to detect critical distances for the Knox test, International Journal of Geographical Information Science, vol. 30, 11, 2302-2320, 2016. [5] Addarsh Chandrasekar, Abhilash Sunder Raj, Poorna Kumar. “Crime Prediction and Classification in San Francisco City”. http://cs229.stanford.edu/proj2015/228_report.pdf, 2015. [6] Kadar Cristina, Iria José, Pletikosa Cvijikj, Irena, “Exploring Foursquare-derived features for crime prediction in New York City”. The 5th International Workshop on Urban Computing, San Francisco, California USA, 2016. [7] M. S. Gerber. “Predicting crime using Twitter and kernel density estimation. Decision Support Systems”. Vol.61, 1, 115–125, 2014 [8] Alex Chohlas-Wood, Aliya Merali, Warren Reed, Theodoros Damoulas. “Mining 911 Calls in New York City- Temporal Patterns, Detection and Forecasting”. AAAI Workshop: AI for Cities, 2015. [9] Monsuru Adepeju, Gabriel Rosser, Tao Cheng. “Novel Evaluation Metrics for Sparse Spatio-temporal Point Process Hotspot Predictions: A Crime Case Study”, International Journal of Geographical Information Science, vol. 30, 11, 2133-2154, 2016 [10] William Wells, Ling Wu, Xinyue Ye. “Patterns of NearRepeat Gun Assaults in Houston”. Journal of Research in Crime and Delinquency, vol. 49, 2, 49-186, 2012. [11] Joel M. Caplan, Leslie W. Kennedy, Joel Miller, “Risk Terrain Modeling: Brokering Criminological Theory and GIS Methods for Crime Forecasting”. Justice Quarterly, Vol. 28, 2, 360-381, April 2011.


[12] Jane Law, Matthew Quick and Ping Chan, “Bayesian Spatio-Temporal Modeling for Analysing Local Patterns of Crime Over Time at Small Area-level”. Journal of Quantitative Criminology, vol. 30, 1, 57–78, 2014. [13] Hongjian Wang, Zhenhui Li, Daniel Kifer, Corina Graif, “Crime Rate Inference with Big Data”. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 635-644, August 2016. [14] K. Simonyan, A. Zisserman. “Very deep convolutional networks for largescale visual recognition”. https://arxiv.org/abs/1409.1556, 2015. [15] Wang, P., Xu, J., Xu, B., Liu, C., Zhang, H., Wang, F., & Hao, H. . “Semantic Clustering and Convolutional Neural Network for Short Text Categorization”. Proceedings ACL, 352–357, Beijing China, July 2015. [16] Neal Jean, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, Stefano Ermon, “Combining satellite imagery and machine learning to predict poverty”. Science, vol. 353, 790-794, 2016. [17] Ma X., Yu H., Wang Y., Wang Y.. “Large-Scale Transportation Network Congestion Evolution Prediction Using Deep Learning Theory”. PLoS ONE, vol. 10, 3, 117, 2015. [18] Li X., Peng L., Hu Y., Shao J., Chi T.. “Deep learning architecture for air quality predictions”. Environ Sci Pollut Res Int, vol. 23, 22, 22408-22417, 2016. [19] Wilpen L. Gorr, YongJei Lee. “Early Warning System for Temporary Crime Hot Spots”. J Quant Criminol, vol. 31, 25–47, 2015 [20] Y. Movshovitz-Attias, Q. Yu, M. C. Stumpe, V. Shet, S. Arnoud, L. Yatziv. “Ontological Supervision for Fine Grained Classification of Street View Storefronts”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1693–1702, Boston MA USA, June 2015. [21] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna, “Rethinking the Inception Architecture for Computer Vision”. https://arxiv.org/abs/1512.00567, 2015 [22] Gustav Larsson, Michael Maire, Gregory Shakhnarovich. “FractalNet: Ultra-Deep Neaural Networks without Residuals”. https://arxiv.org/abs/1605.07648, 2017. [23] Junbo Zhang, Yu Zheng, Dekang Qi. “Deep SpatioTemporal Residual Networks for Citywide Crowd Flows Prediction”. https://arxiv.org/abs/1610.00081, 2017. [24] Bolme, D. S., Beveridge, J. R., Draper, B. A., Phillips, P. J., Lui, Y. M.. “Automatically Searching for Optimal Parameter Settings Using A Genetic Algorithm”. In Lecture Notes in Computer Science, 6962, 213–222, 2011. [25] Ravid Shwartz-Ziv, Naftali Tishby. “Opening the Black Box of Deep Neural Networks via Information”. arXiv:1703.00810, 2017. [26] Kingma, D., Ba, J.. “Adam: A method for stochastic optimization”. arXiv:1412.6980, 2017. [27] Guillaume Lematre, Fernando Nogueira, Christos K. Aridas. “Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning”.


Journal of Machine Learning Research, vol. 18, 17, 1-5, 2017.


Proposed Method for Modified Apriori Algorithm Thanda Tin Yu, Khin Thidar Lynn University of Computer Studies, Mandalay Mandalay, Myanmar

Abstract - There are many algorithms in data mining. The Apriori algorithm is the most important algorithm used to extract frequent itemsets from large databases and to derive association rules. First, we check whether the items meet or exceed the minimum support and find the frequent itemsets respectively. Then, the minimum confidence is used to form association rules. This paper proposes a new algorithm based on the Apriori algorithm. The new algorithm has lower computational complexity than the Apriori algorithm, so the processing time is faster, and it can be used on any dataset on which the Apriori algorithm is executable. Keywords: Data mining, Apriori algorithm, Frequent Pattern mining, Modified Apriori algorithm

1 Introduction

Mining association rules can be divided into two phases. In phase one, in accordance with a user-specified minimum degree of support, all frequent itemsets whose frequency is greater than or equal to the minimum support are found in the database. The earliest work, the AIS algorithm proposed by Agrawal in 1993, generated too many candidate itemsets, which made the resulting association rule mining inefficient; consequently, in 1994 Agrawal also introduced the Apriori algorithm. Later, many improved algorithms based on Apriori were proposed, such as the Partition algorithm by Savasere, the Sampling algorithm by Toivonen, and the DHP hashing algorithm proposed by Park in 1995. Building on these studies, the goal in mining association rules is to increase the efficiency of the method, in particular by reducing the number of unrelated candidate itemsets. Data mining has found applications in classification, clustering, prediction, association rule mining, pattern recognition, and pattern analysis. This research uses any dataset on which the Apriori algorithm can be run, and discusses the mechanism by which WEKA uses the Apriori algorithm. The new method is described as follows:

1.1 Modified Apriori algorithm

Algorithm: Find frequent itemsets using the new approach based on a term frequency matrix.
Input: D, the dataset; min_sup, the minimum support count threshold.
Output: frequent itemsets in D.
Method:

(1) Count the number of distinct items, N, in D.
(2) Create the frequency matrix M, of dimension N x N, and initialize all elements with zero.
(3) For each transaction t in D:
    a. for each possible pair of items p in transaction t,
    b. increase the corresponding element of the frequency matrix M by 1.
(4) Generate the infrequent itemsets from the frequency matrix M.
(5) For each row in the frequency matrix M:
    a. extract all possible frequent itemsets by matching rows and columns satisfying min_sup.
(6) Remove extracted frequent itemsets which include infrequent subsets.
(7) Remove extracted frequent itemsets which do not satisfy min_sup.
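The following minimal Python sketch illustrates steps (1)–(5) of the procedure on pairwise itemsets, leaving the pruning of steps (6) and (7) to the reader; the function and variable names are illustrative, and the transactions are those of the worked example in Table 1 later in the paper.

from itertools import combinations

import numpy as np

def frequent_pairs(transactions, min_sup):
    """Build the item-pair frequency matrix and extract pairs meeting min_sup."""
    items = sorted({i for t in transactions for i in t})
    index = {item: k for k, item in enumerate(items)}
    n = len(items)
    M = np.zeros((n, n), dtype=int)
    for t in transactions:
        for a in t:                      # diagonal: single-item counts
            M[index[a], index[a]] += 1
        for a, b in combinations(sorted(t), 2):
            M[index[a], index[b]] += 1   # symmetric pair counts
            M[index[b], index[a]] += 1
    pairs = [(a, b) for a, b in combinations(items, 2)
             if M[index[a], index[b]] >= min_sup]
    return M, pairs

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
M, pairs = frequent_pairs(transactions, min_sup=2)
print(pairs)  # the six frequent 2-itemsets of the worked example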

1.2 Related works

Paula R.C. Silva et al., 2015, describe a novel approach to discover professional profile patterns from LinkedIn, using association rule mining to extract relevant patterns from a data warehouse and evaluating their approach on academic activities and curricula in educational institutions [6]. P. Nancy et al., 2013, discussed the Facebook 100 universities dataset in the United States, from which association rules were mined; knowledge patterns regarding the association between major (course) and gender were identified [11]. Trand et al., 2010, examined the community structures of Facebook networks whose links represent "friendship" between user pages within each of five American universities [2]. Ahmet Seiman Bozkir et al., 2009, investigated demographic characteristics of Facebook users and their frequency, time spent on Facebook, and membership in Facebook groups using association rules [3]. Xiao Cui et al., 2014, explored the relationships among different profile attributes in Sina Weibo, using association rule mining to identify the dependencies among the attributes [5]. Balaji Mahesh and VRK Rao G Subrahmanya [14] proposed an adaptive implementation of the Apriori algorithm for a retail scenario in a cloud environment, which solves the time-consumption problem for retail transactional databases. It aims to reduce the response time


significantly by using the approach of frequent itemsets. Wang Feng, Li Young-hua et al. presented "An Improved Apriori Algorithm Based on the Matrix", which uses a matrix to effectively represent the operations on the database and uses the AND operation on the matrix to generate the largest frequent itemsets. The database does not need to be scanned again and again, so the method takes less time, and it also greatly reduces the number of candidate frequent itemsets. Ke-Chung Lin et al. proposed a new algorithm, the LP-tree (Linear Prefix tree), which is composed of array forms and minimizes pointers between nodes. The algorithm requires only the minimum information needed in the mining process and accesses the corresponding nodes linearly. This results in less memory usage for building trees and less time for traversal of the linear structure [17].


2 Background theory

The research work uses the classical Apriori algorithm for extracting the association rules.

2.1 States of the problem

The Apriori algorithm suffers from some weaknesses in spite of being clear and simple. Apriori becomes very slow and inefficient when memory capacity is limited and the number of transactions is large. The approach proposed in this paper reduces the time spent searching the database and performing transactions for frequent itemsets, and also reduces the computational complexity for large numbers of data transactions, as described in the proposed method, the modified Apriori algorithm. The Apriori algorithm takes a lot of memory space and response time since it has exponential complexity: for example, with 50 items there are 2^50 candidate itemsets, and the database is also mined repeatedly. We can reduce the itemsets by frequent pattern mining, which reduces the time taken, a lot of space, and many transactions.

2.2 Association rule mining process

Association rule generation is usually split into two steps. First, minimum support is applied to find all frequent itemsets in a database. Second, these frequent itemsets and the minimum confidence constraint are used to form rules. While the second step is straightforward, the first step needs more attention: finding all frequent itemsets in a database is difficult since it involves searching all possible itemsets (item combinations).

2.2.1 Pseudo code of the Apriori algorithm

Ck: candidate itemset of size k
Lk: frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
Return ∪k Lk;

2.2.2 Apriori algorithm

The problem of association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). The lift of a rule is defined as lift(X => Y) = supp(X ∪ Y) / (supp(X) × supp(Y)), i.e., the observed support relative to that expected if X and Y were independent. In correlation analysis using lift, a value less than 1 indicates a negative correlation between the occurrences.

2.3 Sample usages of the Apriori algorithm

Transactional data for an AllElectronics branch follows.

Table 1: Transaction table

TID     List of item_IDs
T100    I1, I2, I5


T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

Fig. 1. First step: generating the 1-itemset frequent patterns
Fig. 2. Second step: generating the 2-itemset frequent patterns
Fig. 3. Third step: generating the 3-itemset frequent patterns

The frequent itemsets obtained by the Apriori algorithm for support count = 2 are {I1,I2,I3} and {I1,I2,I5}. This method takes many steps and much time to solve the problem because it scans the database repeatedly. We solve this problem with our proposed method as follows.

2.4 Example of our proposed algorithm

First, we construct the frequency matrix for the example in Table 1. The matrix is the following.

      I1   I2   I3   I4   I5
I1     6    4    4    1    2
I2     4    7    4    2    2
I3     4    4    6    0    1
I4     0    1    0    1    0
I5     2    2    0    2    1

Figure 4. Frequency matrix

Step 1: with support count = 2, extract all possible frequent itemsets by matching rows and columns satisfying min_sup = 2. Then we get {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}.

Step 2:
(1) Remove extracted frequent itemsets which include infrequent subsets.
(2) Remove extracted frequent itemsets which do not satisfy min_sup.
Then we get {I1,I2,I3}, {I1,I2,I5}, {I2,I4}.

The resulting frequent itemsets are {I1,I2,I3}, {I1,I2,I5}, and {I2,I4}. The algorithm uses only two steps to find the frequent itemsets. Thus our proposed algorithm has less computational complexity than the Apriori algorithm.
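To connect the mined itemsets to the rule metrics reported in the next section, the following minimal sketch computes support, confidence, and lift for one candidate rule on the Table 1 transactions; it is illustrative only and is not the implementation used in the experiments.

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    return support(set(antecedent) | set(consequent), transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    return (support(set(antecedent) | set(consequent), transactions)
            / (support(antecedent, transactions) * support(consequent, transactions)))

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
# Candidate rule {I2, I5} -> {I1}: confidence 1.0 and lift 1.5 on this data
print(confidence({"I2", "I5"}, {"I1"}, transactions))
print(lift({"I2", "I5"}, {"I1"}, transactions))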


3 Experimental results

Association rules were obtained with the Apriori algorithm in the WEKA tool and with the Modified Apriori algorithm implemented in Java, both reading the same .arff files. The two are compared in terms of execution time and computational complexity. We first solve the problem on the sample data of Table 1 as follows (results in Table 2):

Minimum support = 0.2 (2 instances)
Total execution time = 18 ms (Modified Apriori algorithm)

Table 2: Result of rules in Apriori (WEKA) and Modified Apriori

Usages             Conf   Lift   Rule   Consequent   Antecedent
WEKA               1      1.29   R1     I4           I2
Modified Apriori   1      1.29   R1     I4           I2
WEKA               1      1.5    R2     I2,I5        I1
Modified Apriori   1      1.5    R2     I2,I5        I1
WEKA               1      1.29   R3     I1,I5        I2
Modified Apriori   1      1.29   R3     I1,I5        I2
WEKA               1      2.25   R4     I5           I1,I2
Modified Apriori   1      2.25   R4     I5           I1,I2

Comparing the two algorithms, the dataset is small, so the execution times do not differ much, but the frequent itemsets may differ. In Section 2.3 the frequent itemsets were {I1,I2,I3} and {I1,I2,I5}. The proposed algorithm found {I1,I2,I3}, {I1,I2,I5}, {I2,I4}, whereas WEKA found {I1,I2,I5}, {I2,I4}, {I1,I5}, {I2,I5}. So our proposed algorithm has lower computational complexity than the Apriori algorithm.

For the 100 American universities dataset, we tested 3000 instances and 8 attributes (ID, FacultyStatus, Gender, Major, SecondMajor, House, Year, HighSchool), with:

Minimum support: 0.1 (300 instances)
Minimum confidence: 0.9
Total execution time = 16 s (WEKA Apriori algorithm)
Total execution time = 32 ms (Modified Apriori algorithm)

Table 3: Result of rules in Apriori and Modified Apriori on the 100 American universities dataset

Usages             Conf   Lift   Rule   Consequent            Antecedent
WEKA               1      1.23   R1     Gender=1 Year=2009    FacultyStatus=1
Modified Apriori   1      1.23   R1     Gender=1 Year=2009    FacultyStatus=1
WEKA               0.99   1.22   R2     Gender=1 Year=2008    FacultyStatus=1
Modified Apriori   0.99   1.22   R2     Gender=1 Year=2008    FacultyStatus=1
WEKA               0.99   1.22   R3     House=89              FacultyStatus=1
Modified Apriori   0.99   1.22   R3     House=89              FacultyStatus=1
WEKA               0.99   2.31   R4     Major=0               SecondMajor=0
Modified Apriori   0.99   2.31   R4     Major=0               SecondMajor=0
WEKA               0.95   1.17   R5     Gender=1 Year=2007    FacultyStatus=1
Modified Apriori   0.95   1.17   R5     Gender=1 Year=2007    FacultyStatus=1


In other datasets, we tested the two algorithms; the association rules, confidence, and lift are the same, but the execution time differs for the following three .arff datasets.

Table 4: Time comparison on 3 datasets for Apriori and Modified Apriori

Dataset         Execution time (WEKA)   Execution time (Modified Apriori)   Minimum support
Diabetes.arff   1 s                     0.021 s                             0.1 (77 instances)
Segment.arff    9 s                     0.126 s                             0.15 (122 instances)
Ame3000.arff    16 s                    0.032 s                             0.1 (300 instances)

We have performed experiments with multilevel association rules. Our test results show that both the Apriori and the Modified Apriori methods produce the frequent itemsets and association rules with the given minimum support, confidence, and lift. According to the above results, the two are mostly the same, but our proposed algorithm takes fewer steps to find the frequent itemsets, so its computational complexity is less than that of the Apriori algorithm.

4 Conclusion

Association rules play a major role in many data mining applications, which try to find interesting patterns in databases. In order to obtain the frequent itemsets, a new algorithm must be generated; in this paper, the algorithm is based on the properties of cutting the database. The typical Apriori algorithm has a performance bottleneck in massive data processing, so we need to optimize the algorithm in a variety of ways. The new method proposed in this paper not only reduces the size of the candidate set of k-itemsets, but also reduces the computational complexity. The performance of the Apriori algorithm is optimized so that we can mine association information from massive data better and faster; thus an approach with fewer scans of the database is obtained.

5 References

[1] A. Seiman Bozkir, Ebru Akcapinar Sezer, Identification of User Patterns in Social Networks by data mining techniques: Facebook case, Springer, 2010.
[2] S. Sphulari, P.U. Bhulchadra, Dr. S.D. Khamitkar & S.N. Lokhande, Understanding rule behavior through apriori algorithm over social network data, Global Journal of Computer Science and Technology: Software & Data Engineering, 2012.
[3] Xiao Cui, Hao Shi, Xun Yi, Application of association rule mining theory in Sina Weibo, Journal of Computer and Communications, 2014.
[4] Paula R.C. Silva, Wladmir C. Brandao, Mining Professional Profiles from LinkedIn Using Association Rules, The 7th International Conference on Information, Process, and Knowledge Management, 2015.
[5] P. Nancy, R. Geetha Ramani & Shomona Gracia Jacob, Mining of Association Patterns in Social Network Data (Facebook 100 Universities) through Data Mining Techniques and Methods, Springer, 2013.
[6] Gayana Femanda & Md Gapar Md Johar, Framework for social network data mining, International Journal of Computer Applications, April 2015.
[7] R. Sathya, A. Aruna Devi, S. Divya, Data mining and analysis of online social networks, International Journal of Data Mining Techniques and Applications, June 2015.
[8] Muhammad Mahbubur Rahman, Mining social data to extract intellectual knowledge, I.J. Intelligent Systems and Applications, 2012.
[9] Paresh Tanna, Dr. Yogesh Ghodasara, Using Apriori with WEKA for frequent pattern mining, International Journal of Engineering Trends and Technology (IJETT), June 2014.
[10] D. Magdalene Delighta Angeline, I. Samuel Peter James, Efficient Apriori Mend algorithm for pattern extraction process, International Journal of Computer Science and Information Technologies, 2011.
[11] Nergis Yilmaz and Gulfem Isiklar Alptekin, The Effect of Clustering in the Apriori Data Mining Algorithm: A Case Study, Proceedings of the World Congress on Engineering, July 2013.
[12] Nancy P., Dr. R. Geetha Ramani, Discovery of classification and prediction rules of application usage in social network data (Facebook application data) using data mining algorithms, International Journal in Human Machine Interaction (IJHMI), June 2014.
[13] Vipul Mangla, Chandni Sarda, Sarthak Madra, VIT University, Vellore, Improving the efficiency of Apriori Algorithm, International Journal of Engineering and Innovative Technology (IJEIT), September 2013.
[14] Balaji Mahesh, VRK Rao G Subrahmanya, An Adaptive Implementation Case Study of Apriori Algorithm for a Retail Scenario in a Cloud Environment, 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2013.
[15] Jiawei Han et al., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[16] Wang Feng, Li Young-hua, An Improved Apriori Algorithm Based on the Matrix, International Seminar on Future BioMedical Information Engineering (FBIE), 2008.
[17] Ke-Chung Lin et al., An improved frequent pattern growth method for mining association rules, Expert Systems with Applications, 2011.


Simplified Long Short-term Memory Recurrent Neural Networks: part I Atra Akandeh and Fathi M. Salem Circuits, Systems, and Neural Networks (CSANN) Laboratory Computer Science and Engineering || Electrical and Computer Engineering Michigan State University East Lansing, Michigan 48864-1226 [email protected]; [email protected] Abstract—We present five variants of the standard Long Shortterm Memory (LSTM) recurrent neural networks by uniformly reducing blocks of adaptive parameters in the gating mechanisms. For simplicity, we refer to these models as LSTM1, LSTM2, LSTM3, LSTM4, and LSTM5, respectively. Such parameter-reduced variants enable speeding up data training computations and would be more suitable for implementations onto constrained embedded platforms. We comparatively evaluate and verify our five variant models on the classical MNIST dataset and demonstrate that these variant models are comparable to a standard implementation of the LSTM model while using less number of parameters. Moreover, we observe that in some cases the standard LSTM’s accuracy performance will drop after a number of epochs when using the ReLU nonlinearity; in contrast, however, LSTM3, LSTM4 and LSTM5 will retain their performance. Index Terms—Gated Recurrent Neural Networks (RNNs), Long Short-term Memory (LSTM), Keras Library.

1. Introduction

Gated Recurrent Neural Networks (RNNs) have shown great success in processing data sequences in applications such as speech recognition, natural language processing, and language translation. Gated RNNs are a more powerful extension of the so-called simple RNNs. A simple RNN model is usually expressed using the following equations:

ht = σ(Whx xt + Whh ht−1 + bh)
yt = Why ht + by                                  (1)

where Whx, Whh, bh, Why, by are an adaptive set of weights and σ is a nonlinear bounded function. In the LSTM model, the usual activation function has been replaced with an equivalent but more complicated activation function; i.e., the hidden units are changed in such a way that the back-propagated gradients are better behaved, permitting sustained gradient descent without vanishing to zero or growing unbounded [6]. The LSTM RNN uses memory cells containing three gates: (i) an input gate (denoted by it), (ii) an output gate (denoted by ot), and (iii) a forget gate (denoted by ft). These gates collectively control signaling. Specifically, the standard LSTM is expressed mathematically as

it = σin(Wi xt + Ui ht−1 + bi)
ft = σin(Wf xt + Uf ht−1 + bf)
ot = σin(Wo xt + Uo ht−1 + bo)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (2)

where σin is the inner activation (logistic) function, which is bounded between 0 and 1, and ⊙ denotes point-wise multiplication. The output layer of the LSTM model may be chosen to be a linear map, namely,

yt = Why ht + by                                  (3)

LSTMs can be viewed as composed of the cell network and its 3 gating networks. LSTMs are relatively slow due to the fact that they have four sets of "weights," of which three are involved in the gating mechanism. In this paper we describe and demonstrate the comparative performance of five simplified LSTM variants obtained by removing select blocks of adaptive parameters from the gating mechanism, and demonstrate that these variants are competitive alternatives to the original LSTM model while requiring less computational cost.

2. New Variants of the LSTM model

LSTM uses a gating mechanism to control the signal flow. It possesses three gating signals driven by 3 main components, namely, the external input signal, the previous state, and a bias. We have proposed five variants of the LSTM model, aiming at reducing the number of (adaptive) parameters in each gate, and thus reducing computational cost [11]. The first three models have been demonstrated previously in initial experiments in [8]. In this work, we detail and demonstrate the comparative performance of the expanded 5 variants using the classical benchmark MNIST dataset formatted in sequence mapping experiments. Moreover, for modularity


and ease in implementation, we apply the same changes to all three gates uniformly.

2.1. LSTM1

In this first model variant, the input signals and their corresponding weights, namely the terms Wi xt, Wf xt, Wo xt, have been removed from the equations of the three corresponding gating signals. The resulting model becomes

it = σin(Ui ht−1 + bi)
ft = σin(Uf ht−1 + bf)
ot = σin(Uo ht−1 + bo)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)

(4)
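As an illustration of the parameter savings (a sketch under our own naming, not the authors' Keras code), one step of the LSTM1 cell can be written in NumPy as follows; the dimensions and the Gaussian initialization are arbitrary choices made for the example.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm1_step(x_t, h_prev, c_prev, params):
    """One LSTM1 step: the gates see only the previous state (no W*x terms)."""
    Ui, bi, Uf, bf, Uo, bo, Wc, Uc, bc = params
    i_t = sigmoid(Ui @ h_prev + bi)
    f_t = sigmoid(Uf @ h_prev + bf)
    o_t = sigmoid(Uo @ h_prev + bo)
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)
    c_t = f_t * c_prev + i_t * c_tilde      # point-wise multiplication
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

n_in, n_h = 28, 100                          # one MNIST row per time step, 100 hidden units
rng = np.random.default_rng(0)
params = (rng.normal(0, 0.02, (n_h, n_h)), np.zeros(n_h),
          rng.normal(0, 0.02, (n_h, n_h)), np.zeros(n_h),
          rng.normal(0, 0.02, (n_h, n_h)), np.zeros(n_h),
          rng.normal(0, 0.02, (n_h, n_in)), rng.normal(0, 0.02, (n_h, n_h)), np.zeros(n_h))
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm1_step(rng.normal(size=n_in), h, c, params)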

2.2. LSTM2

In this second model variant, the gates have no bias and no input signals Wi xt, Wf xt, Wo xt; only the state is used in the gating signals. This produces

it = σin(Ui ht−1)
ft = σin(Uf ht−1)
ot = σin(Uo ht−1)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)

(5)

2.3. LSTM3

In the third model variant, the only term in the gating signals is the (adaptive) bias. This model uses the least number of parameters among the variants.

it = σin(bi)
ft = σin(bf)
ot = σin(bo)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (6)

2.4. LSTM4

In the fourth model variant, the Ui, Uf, Uo matrices have been replaced with the corresponding ui, uf, uo vectors in LSTM2. The intent is to render the state signal with a point-wise multiplication; thus one reduces parameters while retaining state feedback in the gatings.

it = σin(ui ⊙ ht−1)
ft = σin(uf ⊙ ht−1)
ot = σin(uo ⊙ ht−1)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (7)

2.5. LSTM5

In the fifth model variant, we revise LSTM1 so that the matrices Ui, Uf, Uo are replaced with corresponding vectors denoted by small letters. Then, as in LSTM4, we acquire (Hadamard) point-wise multiplication in the state variables.

it = σin(ui ⊙ ht−1 + bi)
ft = σin(uf ⊙ ht−1 + bf)
ot = σin(uo ⊙ ht−1 + bo)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (8)

We note that, for ease of tracking, odd-numbered variants contain biases while even-numbered variants do not. Table 1 provides a summary of the number of parameters as well as the times per epoch during training corresponding to each of the 5 model variants; we also add the parameters of the forward layer. These simulations and the training times are obtained by running the Keras Library [4].

TABLE 1: Variants specifications.

variants    # of parameters    time (s) per epoch
LSTM        52610              30
LSTM1       44210              27
LSTM2       43910              25
LSTM3       14210              14
LSTM4       14210              23
LSTM5       14510              24

3. Experiments and Discussion

The goal of this paper is to provide a fair comparison among the five model variants and the standard LSTM model. We train and evaluate all models on the benchmark MNIST dataset using the images as row-wise sequences. MNIST images are 28 × 28. In the experiment, each model reads one row at a time from top to bottom and produces its output after seeing all 28 rows. Table 2 gives the specification of the network used.

TABLE 2: Network specifications.

Input dimension                  28 × 28
Number of hidden units           100
Non-linear function              tanh, sigmoid, relu
Output dimension                 10
Non-linear function              softmax
Number of epochs / Batch size    100 / 32
Optimizer / Loss function        RMSprop / categorical cross-entropy

Three different nonlinearities, i.e., tanh, sigmoid, and relu, have been employed for the first (RNN) layer. For each nonlinearity, we train three cases with different values of η. Two of those for each case are depicted in the figures below, while the tables summarize all three results.

3.1. The tanh activation

The tanh activation has been used as the nonlinearity of the first hidden layer. To improve performance of the model,


we perform parameter tuning over different values of the learning parameter η. From the experiments (see the samples in Fig. 1 and Fig. 2, as well as Table 3), there is a small amount of fluctuation in the testing accuracy; however, all variants converge to above 98%. The general trend among all three η values is that LSTM1 and LSTM2 have the closest prediction to the standard LSTM, then LSTM5 follows, and finally LSTM4 and LSTM3. As shown, setting η = 0.002 results in a test accuracy of 98.60% for LSTM3 (i.e., the fastest model with the least number of parameters), which is close to the best test score of the standard LSTM, i.e., 99.09%. The best results obtained among all the epochs are shown in Table 3. For each model, the best result over the 100 epochs of training and using parameter tuning is shown in bold.

TABLE 3: Best results obtained using tanh.

                   η = 1e−4   η = 1e−3   η = 2e−3
LSTM     train     0.9995     1.0000     0.9994
         test      0.9853     0.9909     0.9903
LSTM1    train     0.9993     0.9999     0.9996
         test      0.9828     0.9906     0.9907
LSTM2    train     0.999      0.9997     0.9995
         test      0.9849     0.9897     0.9897
LSTM3    train     0.9889     0.9977     0.9983
         test      0.9781     0.9827     0.9860
LSTM4    train     0.9785     0.9975     0.9958
         test      0.9734     0.9853     0.9834
LSTM5    train     0.9898     0.9985     0.9983
         test      0.9774     0.9835     0.9859

Figure 1: Training & Test accuracy, σ = tanh, η = 1e−4
Figure 2: Training & Test accuracy, σ = tanh, η = 2e−3

3.2. The (logistic) sigmoid activation

Next, the sigmoid activation has been used as the nonlinearity of the first hidden layer. Again, we explored 3 different values of the learning parameter η. The same trend is observed using the sigmoid nonlinearity. In this case, one can clearly observe the training profile of each model. LSTM1, LSTM2, LSTM5, LSTM4 and LSTM3 have the closest prediction to the base LSTM, in that order. Again, a larger η results in better test accuracy and more fluctuation. It is observed that setting η = 0.002 results in a test score of 98.34% for LSTM3, which is close to the test score of the base LSTM, 98.86%. The best results obtained over the 100 epochs are summarized in Table 4.

Figure 3: Training & Test accuracy, σ = sigmoid, η = 1e−4
Figure 4: Training & Test accuracy, σ = sigmoid, η = 2e−3


TABLE 4: Best results obtained using sigmoid.

                   η = 1e−4   η = 1e−3   η = 2e−3
LSTM     train     0.9751     0.9972     0.9978
         test      0.9739     0.9880     0.9886
LSTM1    train     0.9584     0.9901     0.9905
         test      0.9635     0.9863     0.9858
LSTM2    train     0.9636     0.9901     0.9907
         test      0.9660     0.9856     0.9858
LSTM3    train     0.8721     0.9787     0.9828
         test      0.8757     0.9796     0.9834
LSTM4    train     0.8439     0.9793     0.9839
         test      0.8466     0.9781     0.9822
LSTM5    train     0.9438     0.9849     0.9879
         test      0.9431     0.9829     0.9844

3.3. The relu activation

The relu activation has been used as the nonlinearity of the first hidden layer. It is observed (in Fig. 6) that the performance of LSTM, LSTM1 and LSTM2 drops after a number of epochs; however, this is not the case for LSTM3, LSTM4 and LSTM5, which sustain their performance for all three choices of η. Also LSTM3, the fastest model with the least number of parameters, shows the best performance among all 5 variants! With relu as the nonlinearity, the models fluctuate for larger η that is outside the tolerance range of the model. Setting η = 0.002 results in a test score of 99.00% for LSTM3, which beats the best test score of the base LSTM, i.e., 98.43%. The best results obtained for all models are summarized in Table 5.

Figure 5: Training & Test accuracy, σ = relu, η = 1e−4
Figure 6: Training & Test accuracy, σ = relu, η = 2e−3

TABLE 5: Best results obtained using relu.

                   η = 1e−4   η = 1e−3   η = 2e−3
LSTM     train     0.9932     0.9829     0.9787
         test      0.9824     0.9843     0.9833
LSTM1    train     0.9926     0.9824     0.9758
         test      0.9803     0.9832     0.9806
LSTM2    train     0.9896     0.9795     0.98
         test      0.9802     0.9805     0.9836
LSTM3    train     0.9865     0.9967     0.9968
         test      0.9808     0.9882     0.9900
LSTM4    train     0.9808     0.9916     0.9918
         test      0.9796     0.9857     0.9847
LSTM5    train     0.987      0.9962     0.9964
         test      0.9807     0.9885     0.9892

4. Conclusion

Five variants of the base LSTM model have been presented and evaluated. These models have been examined and evaluated on the benchmark classical MNIST dataset using different nonlinearities and different learning rates η. In the first model variant, the inputs and their weights have been removed uniformly from the three gates. In the second model variant, the input weights and the biases have been removed from all gates. In the third model, the gates only retain their biases. The fourth model variant is similar to the second variant, and the fifth variant is similar to the first variant, except that the weight matrices become vectors to execute point-wise multiplication. It has been found that the new model variants are comparable to the base LSTM model. Thus, these variant models may be suitably chosen in applications in order to benefit from speed and/or computational cost.

Acknowledgment

This work was supported in part by the National Science Foundation under grant No. ECCS-1549517.

References

[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5, 1994.
[2] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription, 2012.
[3] F. Chollet. Keras github.
[4] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.
[5] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9:1735–1780, 1997.
[6] Q. V. Le, N. Jaitly, and G. E. Hinton. A simple way to initialize recurrent networks of rectified linear units. 2015.
[7] Y. Lu and F. Salem. Simplified gating in long short-term memory (LSTM) recurrent neural networks. arXiv:1701.03441, 2017.
[8] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. Ranzato. Learning longer memory in recurrent neural networks, 2014.
[9] F. M. Salem. A basic recurrent neural network model. arXiv preprint arXiv:1612.09022, 2016.
[10] F. M. Salem. Reduced parameterization of gated recurrent neural networks. MSU Memorandum, 7.11.2016.


Simplified Long Short-term Memory Recurrent Neural Networks: part II Atra Akandeh and Fathi M. Salem Circuits, Systems, and Neural Networks (CSANN) Laboratory Computer Science and Engineering, Electrical and Computer Engineering Michigan State University East Lansing, Michigan 48864-1226 [email protected]; [email protected] Abstract—This is part II of a three-part work. Here, we present a second set of five inter-related variants of simplified Long Short-term Memory (LSTM) recurrent neural networks obtained by further reducing adaptive parameters. Two of these models have been introduced in part I of this work. We evaluate and verify our model variants on the benchmark MNIST dataset and assert that these models are comparable to the base LSTM model while using progressively fewer parameters. Moreover, we observe that, in the case of the ReLU activation, the test accuracy of the standard LSTM will drop after a number of epochs when the learning parameter becomes larger; however, all of the new model variants sustain their performance. Index Terms—Gated Recurrent Neural Networks (RNNs), Long Short-term Memory (LSTM), Keras Library.

1. Introduction

In contrast to simple Recurrent Neural Networks (RNNs), Gated RNNs are more powerful in performance when considering sequence-to-sequence relationships [1-8]. The simple RNN model can be mathematically expressed as:

ht = σ(Whx xt + Whh ht−1 + bh)
yt = Why ht + by                                  (1)

where Whx, Whh, bh, Why and by are an adaptive set of weights and σ is a nonlinear function. In the base LSTM model, the usual activation function has been equivalently morphed into a more complicated activation function, so that the hidden units enable the back-propagation-through-time (BPTT) gradient technique to function properly [6]. The base LSTM uses memory cells in the base network with three incorporated gating mechanisms to properly process the sequence data. To do so, the base LSTM models introduce new sets of parameters in the gating signals and hence

incur more computational cost and slower training speed. The base (standard) LSTM model is expressed as

it = σin(Wi xt + Ui ht−1 + bi)
ft = σin(Wf xt + Uf ht−1 + bf)
ot = σin(Wo xt + Uo ht−1 + bo)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (2)

Note that the first three equations represent the gating signals, while the remaining three equations express the memory-cell network. In our previous work we have introduced 5 reduced variants of the base LSTM, referred to as LSTM1, LSTM2, LSTM3, LSTM4, and LSTM5, respectively. We will use the same numbering designation, and continue the numbering system in our publications to distinguish among the model variants.

2. Parameter-reduced variants of the LSTM base model

A unique set of five variants of the base LSTM model is introduced and evaluated here. Two of them have been presented in part I and are used in this comparative evaluation. To facilitate identification, we designate odd-numbered variants to include biases and even-numbered variants to have no biases in the gating signals.

2.1. LSTM4 LSTM4 has been introduced in part I. We have removed the input signal and the bias from the gating equations. Furthermore, the matrices Ui , Uf and Uo are replaced with the corresponding vectors ui , uf and uo respectively, in order to render a point-wise multiplication. Specifically, one obtains


it = σin(ui ⊙ ht−1)
ft = σin(uf ⊙ ht−1)
ot = σin(uo ⊙ ht−1)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (3)

2.2. LSTM5

LSTM5 was also presented in part I and is similar to LSTM4, except that the biases are retained in the three gating signal equations.

it = σin(ui ⊙ ht−1 + bi)
ft = σin(uf ⊙ ht−1 + bf)
ot = σin(uo ⊙ ht−1 + bo)
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (4)

2.3. LSTM4a

In LSTM4a, a fixed real number with absolute value less than 1 has been set for the forget gate in order to preserve bounded-input-bounded-output (BIBO) stability [10]. Meanwhile, the output gate is set to 1 (which in practice eliminates this gate altogether).

it = σin(ui ⊙ ht−1)
ft = 0.96
ot = 1.0
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (5)

2.4. LSTM5a

LSTM5a is similar to LSTM4a, but the bias term in the input gate equation is preserved.

it = σin(ui ⊙ ht−1 + bi)
ft = 0.96
ot = 1.0
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (6)

2.5. LSTM6

Finally, this is the most aggressive parameter reduction. Here all gating equations are replaced by appropriate constants. For BIBO stability, we found that the forget gate must be set to ft = 0.59 or below. The other two gates are set to 1 each (which practically eliminates them, to the benefit of computational efficiency). In fact, this model variant now becomes equivalent to the so-called basic RNN model reported in [10].

it = 1.0
ft = 0.59
ot = 1.0
c̃t = σ(Wc xt + Uc ht−1 + bc)
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ σ(ct)                                  (7)

TABLE 1: Variants specifications.

variants    # of parameters    time (s) per epoch
LSTM        52610              30
LSTM4       14210              23
LSTM5       14510              24
LSTM4a      14010              16
LSTM5a      14110              17
LSTM6       13910              12

Table 1 provides the number of total parameters as well as the training times per epoch corresponding to each variant. As shown, the number of parameters and the time per epoch have been progressively decreased to less than half of the corresponding values for the base (standard) LSTM model.

3. Experiments and Discussion

To perform an equitable comparison among all variants, similar conditions have been adopted using the Keras Library environment [4]. The model specifications are depicted in Table 2. We have trained the models using the row-wise fashion of the MNIST dataset. Each image is 28 × 28; hence, the sequence duration is 28 and the input dimension is also 28. There are three case studies involving distinct activations of the first layer: tanh, sigmoid, and relu. For each case, we have tuned the learning parameter η over three values.

TABLE 2: Network specifications.

Input dimension                  28 × 28
Number of hidden units           100
Non-linear function              tanh, sigmoid, relu
Output dimension                 10
Non-linear function              softmax
Number of epochs / Batch size    100 / 32
Optimizer / Loss function        RMSprop / categorical cross-entropy
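The experimental setup of Table 2 can be reproduced approximately with the following tf.keras sketch; a standard LSTM layer stands in for the variant cells (which would require a custom cell implementation), so this is a minimal illustration rather than the authors' code.

from tensorflow import keras
from tensorflow.keras import layers

# MNIST digits read row by row: each image is a sequence of 28 rows of 28 pixels.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

model = keras.Sequential([
    layers.Input(shape=(28, 28)),
    layers.LSTM(100, activation="tanh"),   # stand-in for the LSTM4/LSTM5/... variants
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.RMSprop(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=100,
          validation_data=(x_test, y_test))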

3.1. The tanh activation

The activation function is tanh. To improve performance, the learning parameter is evaluated over only three different values. In training, there is initially a gap among the different model variants; however, they all catch up with the base (standard) LSTM quickly within the 100 epochs. The test accuracy of the base LSTM model is at 99%, and the test accuracy of the other variants after parameter tuning is around 98%. As


one increases η, more fluctuation is observed. However, the models still sustain their performance levels. One advantage of the model variant LSTM6 is that it rises faster in comparison to the other variants. However, η = 0.002 causes it to decrease, indicating that this η value is relatively large. In all cases, the LSTM4a and LSTM5a performance curves overlap.

Figure 1: Training & Test accuracy, σ = tanh, η = 1e−4
Figure 2: Training & Test accuracy, σ = tanh, η = 2e−3

The best results obtained over the 100-epoch training duration are summarized in Table 3. For each model variant, the best results are shown in bold font.

TABLE 3: Best results obtained by the tanh activation.

                    η = 1e−4   η = 1e−3   η = 2e−3
LSTM      train     0.9995     1.0000     0.9994
          test      0.9853     0.9909     0.9903
LSTM4     train     0.9785     0.9975     0.9958
          test      0.9734     0.9853     0.9834
LSTM5     train     0.9898     0.9985     0.9983
          test      0.9774     0.9835     0.9859
LSTM4a    train     0.9835     0.9957     0.9944
          test      0.9698     0.9803     0.9792
LSTM5a    train     0.9836     0.9977     0.998
          test      0.9700     0.9820     0.9821
LSTM6     train     0.9948     0.9879     0.9657
          test      0.9771     0.9792     0.9656

3.2. The (logistic) sigmoid activation

For these cases a similar trend is observed. The only difference is that the sigmoid activation with η = 1e−4 slowly progresses towards its maximal performance. After learning parameter tuning, the typical test accuracy of the base LSTM model is 99%, the test accuracy of variants LSTM4 and LSTM5 is about 98%, and the test accuracy of variants LSTM4a and LSTM5a is 97%. As shown in the table and associated plots, η = 1e−4 is seemingly relatively small when using the sigmoid activation. As evidence, LSTM6 attains only training and test accuracies of about 74% after 100 epochs, while it attains an accuracy of about 97% when η is increased 10-fold. In our variants, models using tanh saturate quickly; however, models using sigmoid rise steadily from the beginning to the last epoch. In these cases, again, the LSTM4a and LSTM5a performances overlap.

Figure 3: Training & Test accuracy, σ = sigmoid, η = 1e−4
Figure 4: Training & Test accuracy, σ = sigmoid, η = 2e−3

TABLE 4: Best results obtained by the sigmoid activation.

                    η = 1e−4   η = 1e−3   η = 2e−3
LSTM      train     0.9751     0.9972     0.9978
          test      0.9739     0.9880     0.9886
LSTM4     train     0.8439     0.9793     0.9839
          test      0.8466     0.9781     0.9822
LSTM5     train     0.9438     0.9849     0.9879
          test      0.9431     0.9829     0.9844
LSTM4a    train     0.8728     0.9726     0.9778
          test      0.8770     0.9720     0.9768
LSTM5a    train     0.8742     0.9725     0.9789
          test      0.8788     0.9707     0.9783
LSTM6     train     0.7373     0.9495     0.9636
          test      0.7423     0.9513     0.9700

3.3. The relu activation

For the relu activation cases, we tune the performance over three learning rate parameters for comparison. All

models perform well with η = 1e−4. In this case, LSTM4a and LSTM5a do not overlap. For η = 1e−3, the base LSTM does not sustain its performance and drops drastically. One may need to leverage an early-stopping strategy to avoid this problem. In this case study, the LSTM model begins to fall around epoch 50. LSTM6 also drops in performance. Models LSTM4, LSTM5, LSTM4a and LSTM5a show sustained accuracy performance as for η = 1e−4. For η = 2e−3, models LSTM4 and LSTM5 still sustain their performances.

Figure 5: Training & Test accuracy, σ = relu, η = 1e−4

Figure 6: Training & Test accuracy, σ = relu, η = 2e−3

The best results obtained among all the epochs are shown in Table 5.

TABLE 5: Best results obtained by the relu activation.

                    η = 1e−4   η = 1e−3   η = 2e−3
LSTM      train     0.9932     0.9829     0.9787
          test      0.9824     0.9843     0.9833
LSTM4     train     0.9808     0.9916     0.9918
          test      0.9796     0.9857     0.9847
LSTM5     train     0.987      0.9962     0.9964
          test      0.9807     0.9885     0.9892
LSTM4a    train     0.9906     0.9949     0.1124
          test      0.9775     0.9878     0.1135
LSTM5a    train     0.9904     0.996      0.1124
          test      0.9769     0.9856     0.1135
LSTM6     train     0.9935     0.9719     0.09737
          test      0.9761     0.9720     0.0982

4. Conclusion

We have aimed at reducing the computational cost and increasing execution speed by presenting new variants of the (standard) base LSTM model. LSTM4 and LSTM5, which were introduced earlier, use point-wise multiplication between the states and their corresponding weights. Model LSTM5, like all other odd-numbered model variants, retains the bias terms in the gating signals. Models LSTM4a and LSTM5a are similar to models LSTM4 and LSTM5, respectively; the only difference is that the forget and the output gates are set to appropriate constants. In model LSTM6, which is actually a basic recurrent neural network, all gating equations have been replaced by constants! It has been demonstrated that all new variants are relatively comparable to the base (standard) LSTM in this initial case study. Using appropriate learning rate η values, LSTM5, then LSTM4a & LSTM5a, LSTM4 and LSTM6, respectively, have progressively close performance to the base LSTM. Finally, we can conclude that any of the introduced model variants, with hyper-parameter tuning, can be used to train a dataset with markedly less computational effort.

Acknowledgment

This work was supported in part by the National Science Foundation under grant No. ECCS-1549517.

References

[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5, 1994.
[2] F. Chollet. Keras github.
[3] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.
[4] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9:1735–1780, 1997.
[5] Q. V. Le, N. Jaitly, and G. E. Hinton. A simple way to initialize recurrent networks of rectified linear units. 2015.
[6] Y. Lu and F. Salem. Simplified gating in long short-term memory (LSTM) recurrent neural networks. arXiv:1701.03441, 2017.
[7] F. M. Salem. A basic recurrent neural network model. arXiv preprint arXiv:1612.09022, 2016.
[8] F. M. Salem. Reduced parameterization of gated recurrent neural networks. MSU Memorandum, 7.11.2016.


Simplified Long Short-term Memory Recurrent Neural Networks: part III Atra Akandeh and Fathi M. Salem Circuits, Systems, and Neural Networks (CSANN) Laboratory Computer Science and Engineering, Electrical and Computer Engineering Michigan State University East Lansing, Michigan 48864-1226 [email protected]; [email protected] Abstract—This is part III of a three-part work. In parts I and II, we have presented eight variants of simplified Long Short-Term Memory (LSTM) recurrent neural networks (RNNs). It is noted that fast computation, especially on constrained computing resources, is an important factor in processing big time-sequence data. In this part III paper, we present and evaluate two new LSTM model variants which dramatically reduce the computational load while retaining comparable performance to the base (standard) LSTM RNNs. In these new variants, we impose (Hadamard) pointwise state multiplications in the cell-memory network in addition to the gating signal networks.

parameter reductions to the main network! Only the state and the bias are candidates. We describe and evaluate two new simplified LSTM variants by uniformly reducing blocks of adaptive parameters in the gating mechanisms and also in main equation of the gated system.

1. Introduction

2.1. LSTM6

Nowadays Neural Networks play a great role in Information and Knowledge Engineering in diverse media forms including text, language, image, and video. Gated Recurrent Neural Networks have shown impressive performance in numerous applications in these domains [1-8]. We begin with the simple building block for clarity, namely the simple RNN. The simple RNN is is expressed using following equations: ht = σ(Whx xt + Whh ht−1 + bh ) yt = Why ht + by

(1)

The Gated RNNs, called Long Short-Term Memory (LSTM) RNNs, were introduced in [5] by defining the concept of gating signals to control the flow of information [1-5]. A base (standard) LSTM model can be expressed as

i_t = σ_in(W_i x_t + U_i h_{t-1} + b_i)
f_t = σ_in(W_f x_t + U_f h_{t-1} + b_f)
o_t = σ_in(W_o x_t + U_o h_{t-1} + b_o)
c̃_t = σ(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ σ(c_t)        (2)

The first three equations express the three gating control signals. The three remaining equations express the main cell-memory network. In this part III paper, we shall apply parameter reductions to the main network; only the state and the bias are candidates. We describe and evaluate two new simplified LSTM variants by uniformly reducing blocks of adaptive parameters in the gating mechanisms and also in the main equation of the gated system.

2. New Variants LSTM Models

In part I and part II of this study, we introduced eight variants. In this part III, we present two new model variants. We seek to reduce the number of parameters, and thus the computational cost, in this endeavor.

2.1. LSTM6

This minimal model variant was introduced earlier and is included here for baseline comparison. Only constants are used in the gate equations, i.e., there are no parameters associated with the input, output and forget gates. The forget gate value must be less than one in absolute value for bounded-input bounded-output (BIBO) stability [8].

i_t = 1.0
f_t = 0.59
o_t = 1.0
c̃_t = σ(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t c_{t-1} + i_t c̃_t
h_t = o_t σ(c_t)        (3)
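For concreteness, the following is a minimal NumPy sketch of one LSTM6 time step; the layer sizes, variable names and the choice of tanh for σ are illustrative assumptions, not the authors' implementation.

import numpy as np

def lstm6_step(x_t, h_prev, c_prev, Wc, Uc, bc,
               i_t=1.0, f_t=0.59, o_t=1.0):
    # One LSTM6 step: all gates are fixed constants (Eq. 3).
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde              # constant-gated memory update
    h_t = o_t * np.tanh(c_t)                        # output
    return h_t, c_t

# toy dimensions: 28 inputs (one MNIST row), 100 hidden units
rng = np.random.default_rng(0)
Wc, Uc, bc = rng.normal(size=(100, 28)), rng.normal(size=(100, 100)), np.zeros(100)
h, c = lstm6_step(rng.normal(size=28), np.zeros(100), np.zeros(100), Wc, Uc, bc)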

Note that when a gate signal value is set to 1, this is, in practice, equivalent to eliminating the gate. The next two models perform nuanced parameter reductions on the cell-body network equations. We adopted a numbering system that starts from 10 for ease of distinct referencing.

2.2. LSTM10

In this model, pointwise multiplications are applied to the hidden state and the corresponding weights in the cell-body equations as well. We apply this modification not only to the

ISBN: 1-60132-463-4, CSREA Press ©

Int'l Conf. Information and Knowledge Engineering | IKE'17 |

gating equations but also to the main equation, i.e., the matrix U_c is replaced with a vector u_c used in a pointwise multiplication:

i_t = σ_in(u_i ⊙ h_{t-1})
f_t = σ_in(u_f ⊙ h_{t-1})
o_t = σ_in(u_o ⊙ h_{t-1})
c̃_t = σ(W_c x_t + u_c ⊙ h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ σ(c_t)        (4)

2.3. LSTM11

This variant is similar to LSTM10; however, it reinstates the biases in the gating signals. Mathematically, it is expressed as

i_t = σ_in(u_i ⊙ h_{t-1} + b_i)
f_t = σ_in(u_f ⊙ h_{t-1} + b_f)
o_t = σ_in(u_o ⊙ h_{t-1} + b_o)
c̃_t = σ(W_c x_t + u_c ⊙ h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ σ(c_t)        (5)
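The following NumPy sketch contrasts one LSTM10/LSTM11 time step with the base LSTM: the recurrent gate weights are vectors applied with Hadamard products rather than full matrices. Names, sizes and the logistic-sigmoid gate activation are assumptions for illustration only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm10_11_step(x_t, h_prev, c_prev, p, with_bias=False):
    # One step of LSTM10 (Eq. 4) or, with with_bias=True, LSTM11 (Eq. 5).
    b_i = p["b_i"] if with_bias else 0.0
    b_f = p["b_f"] if with_bias else 0.0
    b_o = p["b_o"] if with_bias else 0.0
    i_t = sigmoid(p["u_i"] * h_prev + b_i)          # pointwise gate, no input term
    f_t = sigmoid(p["u_f"] * h_prev + b_f)
    o_t = sigmoid(p["u_o"] * h_prev + b_o)
    c_tilde = np.tanh(p["W_c"] @ x_t + p["u_c"] * h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
p = {k: rng.normal(size=100) for k in ["u_i", "u_f", "u_o", "u_c", "b_i", "b_f", "b_o", "b_c"]}
p["W_c"] = rng.normal(size=(100, 28))
h, c = lstm10_11_step(rng.normal(size=28), np.zeros(100), np.zeros(100), p, with_bias=True)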

Table 1 provides the total number of parameters and the comparative elapsed time per epoch for each variant.

TABLE 1: Variants specifications.
Variant   # of parameters   time (s) per epoch
LSTM      52610             30
LSTM6     13910             12
LSTM10    4310              18
LSTM11    4610              19

3. Experiments and Discussion

We have trained the variants on the benchmark MNIST dataset. Each 28 × 28 image is passed to the network as row-wise sequences; the network reads one row at a time and infers its decision after all rows have been read. In all cases, the variants have been trained using the Keras library [3]. Table 2 summarizes the network architecture used.

TABLE 2: Network specifications.
Input dimension          28 × 28
Number of hidden units   100
Non-linear function      tanh, sigmoid, relu
Output dimension         10
Output non-linearity     softmax
Number of epochs         100
Optimizer                RMSprop
Batch size               32
Loss function            categorical cross-entropy
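A minimal sketch of the baseline configuration of Table 2 in present-day Keras is given below; it reproduces only the standard-LSTM reference setting (the simplified variants would need a custom recurrent cell), and the exact API differs from the 2017 Keras release used in the paper.

import numpy as np
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0   # each image: 28 rows of 28 pixels
x_test = x_test.astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),        # 28 time steps, 28 features per step
    keras.layers.LSTM(100, activation="tanh"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=100,   # 100 epochs as in Table 2
          validation_data=(x_test, y_test))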

3.1. Default η

Initially, we picked 0.001 for η. In the cases with sigmoid or tanh activation, all variants performed comparatively well. However, using the relu activation caused model LSTM10 to drop its accuracy to 52%. The accuracy of the base (standard) LSTM also dropped after 50 epochs. The best test accuracy of the base LSTM is around 99%, and the test accuracies of LSTM10 and LSTM11 are about 92% and 95% respectively using tanh. Other cases are summarized in Table 3. We explored a range of η for sigmoid and tanh in which variants LSTM10 and LSTM11 can become competitive within the 100 epochs. We also explored a valid range of η for relu.

Figure 1: Training & Test accuracy, σ = tanh, η = 1e−3

Figure 2: Training & Test accuracy, σ = sigmoid, η = 1e−3

3.2. Searching for the best η

We increased η from 0.001 to 0.005 in increments of 0.001. This led to an increase in the test accuracy of models LSTM11 and LSTM10, yielding values of 95.31% and 93.56% respectively for the tanh case. As expected,


Figure 3: Training & Test accuracy, σ = relu, η = 1e−3

Figure 5: Training & Test accuracy of different η , lstm10, sigmoid

TABLE 3: Best results obtained with η = 0.001.
                 tanh     sigmoid   relu
LSTM     train   1.000    0.9972    0.9829
         test    0.9909   0.9880    0.9843
LSTM6    train   0.9879   0.9495    0.9719
         test    0.9792   0.9513    0.9720
LSTM10   train   0.9273   0.9168    0.4018
         test    0.9225   0.9184    0.5226
LSTM11   train   0.9573   0.9407    0.9597
         test    0.9514   0.9403    0.9582

LSTM10 with the relu activation failed progressively compared to smaller η values. The training and test accuracies for these new η values are shown in Table 4 and Table 5.

Figure 6: Training & Test accuracy of different η , lstm10, relu

Figure 4: Training & Test accuracy of different η , lstm10, tanh

3.3. Finding η for LSTM10 with relu

We have explored a range of η for LSTM10 with the relu activation to improve its performance; however, the effort was not successful. Increasing η from 2e−6 to 1e−5 leads to an increase in accuracy up to a value of 53.13%. For η less than 1e−5, the plots show an increasing trend. However, after

Figure 7: Training & Test accuracy of different η , lstm11, tanh

this point, the accuracy starts to drop after a number of epochs depending on the value of η .

ISBN: 1-60132-463-4, CSREA Press ©

Int'l Conf. Information and Knowledge Engineering | IKE'17 |

85

Figure 10: Training & Test accuracy of different η, lstm10, relu

Figure 8: Training & Test accuracy of different η, lstm11, sigmoid

Figure 9: Training & Test accuracy of different η , lstm11, relu

TABLE 4: Best results obtained by LSTM10.
                   tanh     sigmoid   relu
η = 0.002   train  0.9376   0.9354    0.3319
            test   0.9366   0.9376    0.2510
η = 0.003   train  0.9388   0.9389    0.1777
            test   0.9357   0.9367    0.0919
η = 0.004   train  0.9348   0.9428    0.1946
            test   0.9350   0.9392    0.0954
η = 0.005   train  0.9317   0.9453    0.1519
            test   0.9318   0.9444    0.0919

TABLE 5: Best results obtained by LSTM11.
                   tanh     sigmoid   relu
η = 0.002   train  0.9566   0.9546    0.9602
            test   0.9511   0.9534    0.9583
η = 0.003   train  0.9557   0.9601    0.9637
            test   0.9521   0.9598    0.9656
η = 0.004   train  0.9552   0.9608    0.9607
            test   0.9531   0.9608    0.9582
η = 0.005   train  0.9539   0.9611    0.9565
            test   0.9516   0.9635    0.9569

4. Conclusion

In this study, we have described and evaluated two new reduced variants of the LSTM model, which we call LSTM10 and LSTM11. These models have been examined and evaluated on the MNIST dataset with different activations and different learning rate η values. In our part I and part II, we considered variants of the base LSTM obtained by removing weights/biases from the gating equations only. In this study, we have reduced weights even within the main cell-memory equation of the model, by converting a weight matrix to a vector and replacing the regular (matrix) multiplication with a (Hadamard) pointwise multiplication. The only difference between models LSTM10 and LSTM11 is that the latter retains the bias terms in the gating equations. LSTM6 is equivalent to the so-called basic recurrent neural network (bRNN), since all gating equations have been replaced by fixed constants; see [8]. It has been found that all of the variants, except model LSTM10 when using the relu activation, are comparable to the (standard) base LSTM RNN. We anticipate that further case studies and experiments will serve to fine-tune these findings.

Acknowledgment This work was supported in part by the National Science Foundation under grant No. ECCS-1549517.

References

[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5, 1994.
[2] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription, 2012.
[3] F. Chollet. Keras. GitHub repository.
[4] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.
[5] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9:1735–1780, 1997.
[6] Q. V. Le, N. Jaitly, and G. E. Hinton. A simple way to initialize recurrent networks of rectified linear units. 2015.
[7] Y. Lu and F. Salem. Simplified gating in long short-term memory (LSTM) recurrent neural networks. arXiv:1701.03441, 2017.
[8] F. M. Salem. A basic recurrent neural network model. arXiv:1612.09022, 2016.
[9] F. M. Salem. Reduced parameterization of gated recurrent neural networks. MSU Memorandum, 7.11.2016.


Extraction of Spam and Promotion Campaigns Using Social Behavior and Similarity Estimation from Compromised Accounts Selvamani K1, Nathiya S2, Divasree I R3, Kanimozhi S4 1,2,3 Department of CSE, CEG Campus, Anna University, Chennai, Tamilnadu, India 4 Department of IST, CEG Campus, Anna University , Chennai, Tamilnadu, India

Abstract - Nowadays, compromised account detection is a challenging issue for service providers and normal users in Online Social Networks (OSN), because compromised accounts retain well-established trust relationships with normal users and their friends. A compromised account may belong to a normal user, a spam user or a promotion campaign. The proposed system collects a set of behavioral features together with the wall messages generated by OSN users, which leads to an effective characterization of social activities through the creation of social behavioral profiles. From the detected compromised accounts, the similarity of the accounts posting the same URLs is identified and an account graph is constructed. By using clustering and classification methods, normal, spam and promotion types of users are identified. The experimental results show that the proposed system efficiently differentiates OSN users, detects compromised accounts and also identifies spam and promotion campaigns.

Keywords: Online Social Networks, Compromised Accounts, Behavioral Profiles

1 Introduction

Online Social Networks (OSNs) are among the most popular and important communication tools for millions of users nowadays. Social networking services are not just bringing Internet users into fast-flowing online conversations; they are helping people follow breaking news, maintain relationships with friends, contribute to online debates and learn about events. They are transforming online user behavior in terms of initial entry point, search, browsing and purchasing behavior. Social media are rapidly changing users' expectations of acceptable online behavior. Historically, it was enough to have an online presence on the Internet for broadcasting and dissemination of information. Today, social networks provide new forms of social interaction, information exchange and collaboration. Social networking sites enable users to post ideas, to update and comment, or to participate in activities, while sharing their wider interests [7]. In the same way, they help spammers to disseminate various malicious websites, and promotion campaigns to actively participate in networking sites to promote their products, as shown in Figure 1.

Figure 1. Different Users in OSN

A compromised account is one that has been accessed by someone other than its legitimate user, such as spammers and promotion campaigns. These compromised accounts can be detected by analyzing social behavioral activities as behavioral patterns and collecting each user's individual wall messages, profiling the behavioral features and the messages as the user's individual behavioral profile in order to distinguish compromised accounts from normal accounts.

To spread malicious spam, spammers establish trust relationships, as apparently normal users, with legitimate users. Understanding how users behave when they connect to social networking sites creates opportunities for improved design of content distribution. For this problem, the proposed system characterizes the social activities of several users and profiles those behavioral features and wall messages as individual profiles for the detection of compromised accounts.

Behavioral features cover the several activities performed by every user; using those activities, a behavioral profile for every user can be built as a vector. The feature vector includes first activity, activity preference, activity sequence, action latency, browsing preference, visit duration, request latency, browsing sequence and messages. By comparing the behavioral profiles, users are differentiated into normal users and compromised users using clustering algorithms. From the compromised accounts, using a similarity estimation method and account graph construction, normal, spam and promotion campaigns are identified. By using cross-validation experiments, the proposed system can effectively differentiate online social network users and identify normal, spam and promotion campaigns.


2 Literature Survey

The behaviors of the social network users are identified to promote normal users for the awareness of compromised accounts and malicious activities in social networking sites. . Benevenuto et al. [2] described the recent account hacking incidents in OSNs serves as evidence for this trend. Mostly, dedicated spam compromised accounts are created to serve malicious purposes. While malicious accounts can be banned or removed upon during detection, compromised accounts cannot be handled in that way because it creates negative impact on social network user experience. They used user activities and profile the activities with 85%. If the user cannot use an account for a long time and if the normal user may share the message based on their interest, it will lead to false positive rate on their terms. Bogers et al. [3] adopted the language based modeling based on similarity function to detect the malicious users. They used KNN and the Naive Bayes classifier for the classification.it involves the accuracy of 95%. By improving feature selection it will lead to better classification performance. Egele et al. [4] in their work showed major OSNs today’s battle against account compromization with the accuracy of 92%. However, this approach suffers from high false positive rate. The analysis of the behavior of the user, clickstream data reveals social network workloads, such as how frequently people connect to social networks, as well as the types and sequences of activities that users conduct on these sites. Ago et al. [1] proposed a framework that describes a research on spamming account detection, since previous work cannot differentiate compromised accounts from spam accounts and legitimate user accounts. One recent study that features compromised account detection using decision tree algorithm with the accuracy of 87%. They showed little false positive rate. For better performance the classifier should be trained with large data sets. Gianluca et al. [5] suggested a framework for characterizing user behavior and it is quite challenging because user attributes exhibit twisted distribution in OSNs. Therefore, any characterization involves user population in an OSN. Primarily they represent low-degree with active users of OSN, they achieved 99% with SVM classifier, but it requires high training time to train the classifier Thomas et al. [9] implemented a research which describes compromising accounts in social networks is very effective; attackers can easily establish the trust relationships with the normal account owners. Moreover, compromised accounts cannot be simply deleting the corresponding profiles. They can be identified using the Logistic regression algorithm for distributed data collection with the performance of 90.78% but they cannot be predicted with the short URLs.


Reza et al. [8] describes the system with the click sequence and inter-arrival time of Sybil and common users’ are found that considering both factors leads to better detection results. As a result, the above methods cannot handle compromised accounts well. In contrast, this paper aims cover users’ social behavior patterns and habits, which perform accurate and delicate detection on behavioral deviation with 89% of accuracy. The benefit is that popular content can be taken and implant to other sites for advertisement purposes. For the better performance, they can train the system using several OSN sites. Hongyu et al. [6] describe their work on the intuition is that spam campaigns are generated using templates and that messages in the same campaign shall retain certain similarity among them. The spam labeling approach assumed that suspended accounts generated no legitimate message. Xianchao et al. [11] proposed a framework to detect the spam and promotion campaigns. It consists of similarity estimation method between the accounts; account graph construction based on cohesive extraction algorithm and identifies the spam and promotion campaigns with 90% of accuracy by several campaign extraction features. Xin et al. [10] proposed the system with concrete behavioral metrics in hand; build a user’s social behavioral profile. Collecting the data as behavioral features from the user and profile those data as user’s individual behavioral profile. These behavioral profiles used for the identification of compromised accounts with 92% of accuracy. The drawback is considered if the account is not used for a long time.

3 System Design

This section discusses the overall system architecture and gives a detailed description of all modules. The proposed system provides a framework for detecting compromised accounts among normal accounts and for differentiating spam and promotion campaigns. The major contributions of this system are: (i) preprocessing the data; (ii) profiling the preprocessed data as social behavioral profiles; (iii) detecting compromised accounts by comparing the behavioral profiles; (iv) estimating the similarity between accounts using the posted URLs; (v) linking the accounts using the estimated similarity values; (vi) identifying the spam and promotion campaigns. The system architecture of the proposed system is illustrated in Figure 2.

3.1 Preprocessing and Data Collection

This module describes the data collection used to understand users' online social behaviors. In order to observe both extroversive and introversive behaviors of the participating users, a browser extension is used to record user activities from click streams.


Figure. 2. Overall Architecture of the System

The dataset is preprocessed using preprocessing techniques such as data cleaning, data transformation and stemming. The preprocessed data is then used to form each user's social behavioral profile based on the proposed behavioral features: first activity, activity preference, activity sequence, action latency, browsing sequence, request latency, browsing preference, visit duration and messages. The messages are processed using the SVD algorithm with the Latent Semantic Analysis and Indexing (LSI) technique, by computing the term-document matrix; the context is then processed based on the resulting score values.
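As an illustration of the message-processing step described above, a hedged scikit-learn sketch of the TF-IDF term-document matrix followed by truncated SVD (LSI) is shown below; the library calls and toy messages are assumptions, not the authors' code.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

wall_messages = ["check out this amazing deal", "meeting friends tonight",
                 "win a free phone now", "photos from the weekend trip"]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(wall_messages)          # document x term TF-IDF matrix
lsa = TruncatedSVD(n_components=2, random_state=0)
message_topics = lsa.fit_transform(X)           # low-rank LSI representation of messages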

3.2 Profile Creation and Clustering

With concrete behavioral metrics in hand, a user's social behavioral profile is built by combining the social behavior metrics into a 9-vector tuple, and each vector is then normalized. The social behavioral profiles are then compared using the difference between the vectors of two different profiles, computed as the Euclidean distance, and the K-means algorithm is applied for the detection of compromised accounts.
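A small illustrative sketch of this step is given below: 9-feature behavioral profiles are normalized, compared by Euclidean distance, and clustered with K-means. The feature values are synthetic placeholders, not measured data.

import numpy as np
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

profiles = np.array([
    # first_act, act_pref, act_seq, act_latency, brows_pref,
    # visit_dur, req_latency, brows_seq, msg_score
    [0.1, 0.7, 0.3, 0.2, 0.6, 0.4, 0.2, 0.5, 0.1],
    [0.9, 0.1, 0.8, 0.9, 0.1, 0.1, 0.9, 0.2, 0.9],
    [0.2, 0.6, 0.4, 0.3, 0.5, 0.5, 0.3, 0.4, 0.2],
])
profiles = normalize(profiles)                       # normalize each 9-vector tuple
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
dist = np.linalg.norm(profiles[0] - profiles[1])     # Euclidean profile difference
print(km.labels_, dist)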

3.3 Similarity Estimation and Account Graph Construction

Using the compromised accounts, the similarity between accounts is identified using Shannon information theory. The similarity is estimated from the accounts' purposes in posting URLs, by using the information contained in the URLs. With the similarity estimation method, accounts are linked using the estimated similarity value and a corresponding threshold ӷ; accounts whose similarity value is higher than the threshold ӷ are linked in the account graph.
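The following sketch illustrates the account-graph idea: accounts are linked whenever the similarity of their posted-URL sets exceeds a threshold. Jaccard overlap is used here as a simple stand-in for the information-theoretic similarity measure, so this is not the paper's exact estimator; the account names and URLs are made up.

import networkx as nx

posted_urls = {
    "acct_a": {"http://spam.example/x", "http://spam.example/y"},
    "acct_b": {"http://spam.example/x", "http://spam.example/z"},
    "acct_c": {"http://news.example/article"},
}
threshold = 0.3
g = nx.Graph()
g.add_nodes_from(posted_urls)
accounts = list(posted_urls)
for i, a in enumerate(accounts):
    for b in accounts[i + 1:]:
        inter = posted_urls[a] & posted_urls[b]
        union = posted_urls[a] | posted_urls[b]
        sim = len(inter) / len(union) if union else 0.0
        if sim > threshold:                 # link accounts above the threshold
            g.add_edge(a, b, weight=sim)
print(g.edges(data=True))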

Using the account graph, the dense account subgraphs can be extracted with the cohesive campaign extraction algorithm [11].

Algorithm: Greedy Approximation Approach (Cohesive Campaign Extraction)
1: Randomly select a vertex vi ∈ V that has not been visited.
2: Compute the maximum co-clique η(vi, vnbr) of each neighbor of the current node vi.
3: Insert all neighbors into a priority queue sorted by η(vi, vnbr) and update the priority queue.
4: Pop the first vertex vj of the queue, obtain θ(vi) = η(vi, vj), then select vj as the current node.
5: Repeat Steps 2 through 4 until all vertices are visited.
6: Traverse the nodes in the order of Step 4 and identify the first increasing θ(vi) as the boundary of a new campaign.
7: Recursively include all connected vertices whose θ value is larger than λ∗θ(vi) as a new campaign.
8: Repeat Steps 6 and 7 to find all cohesive campaigns.

3.4 Identification of Spam and Promotion Campaigns

The approach is based on the greedy strategy: using the cohesive campaign extraction algorithm, candidate campaigns are extracted. Not every extracted candidate campaign is a spam or promotion campaign. The candidate campaigns are therefore fed into K-means clustering to separate normal and abnormal accounts, and an SVM classifier is then used to distinguish spam and promotion campaigns among the abnormal accounts using certain features [11].
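A hedged scikit-learn sketch of this final step is shown below: K-means separates normal from abnormal candidate-campaign accounts, and an SVM then labels the abnormal ones as spam or promotion. The features and labels are synthetic placeholders.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

campaign_features = np.random.default_rng(0).random((40, 5))
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(campaign_features)
abnormal = campaign_features[clusters == 1]          # accounts flagged as abnormal

# training data for the spam-vs-promotion classifier (placeholder labels)
train_X = np.random.default_rng(1).random((20, 5))
train_y = np.array([0, 1] * 10)                       # 0 = spam, 1 = promotion
clf = SVC(kernel="rbf").fit(train_X, train_y)
labels = clf.predict(abnormal) if len(abnormal) else []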


4 Implementation and Results

This section discusses the experiments adopted to evaluate the accuracy and performance of the proposed system against the specified approaches. Experiments were performed using several parameter settings for the models on the same datasets. Performance is evaluated by comparing the predicted values with the actual values.

4.1 Profile completeness vs. Accuracy

Table I gives the accuracy of the detection of compromised accounts using the number of vectors for the complete profiles.

TABLE I: DETECTION ACCURACY WITH PROFILE COMPLETENESS
Vector number       3    4    5    6    7
Complete profiles   14   12   6    3    3

Here the vector number reflects the behavioral activities of every individual user of the online social network. The complete-profiles row denotes profile completeness, a process in the proposed framework based on the number of vectors per user.

Figure 3. Profile completeness vs. Accuracy

In Figure 3, because certain activities are insignificant, some behavioral feature vectors may be N/A. By adjusting the least number of non-empty feature vectors, the completeness of the selected behavioral profiles is achieved.

4.2 Cluster mode vs. Sensitivity

Table II gives the different sizes of datasets and their sensitivity rates.

TABLE II: CLUSTER MODE VS. SENSITIVITY RATE (%)
Percentage split (%)   0.66    0.50   0.55    0.80    0.76
Sensitivity rate (%)   0.921   0.8    0.912   0.999   0.903

The sensitivity rate is obtained from the dataset using various cluster modes (percentage splits), which involves the evaluation of the dataset for the proposed system.

Figure 4. Cluster mode vs. Sensitivity

Figure 4 gives the sensitivity rate, i.e., the correct identification of normal accounts, when percentage-split validation of the dataset is applied.

4.3 Cluster mode vs. Specificity

Table III gives the specificity rate by percentage-split validation of the dataset.

TABLE III: CLUSTER MODE VS. SPECIFICITY RATE (%)
Percentage split (%)   0.66     0.50     0.55     0.80
Specificity rate (%)   0.9059   0.8269   0.9233   0.9384

Figure. 5. Cluster mode vs. Specificity


The Figure 5 gives the specificity rate that gives the correct identification of compromised accounts by percentage split validation of dataset collected for the detection of compromised accounts.

4.4 Cluster mode vs. Accuracy

Table IV gives the accuracy rate for the dataset by percentage-split validation, used to evaluate the accuracy of the proposed system.

TABLE IV: CLUSTER MODE VS. ACCURACY (%)
Percentage split (%)   0.66    0.50   0.55    0.80    0.76
Accuracy rate (%)      0.884   0.9    0.886   0.883   0.896

Figure. 6. Cluster mode vs. Accuracy

The Figure 6 gives the accuracy rate for the correct identification of compromised accounts in the given dataset by percentage split validation of dataset collected for the detection of compromised accounts.

4.5 Performance by SVM

Table V gives the performance of the support vector machine in terms of precision, recall and F1-measure for the identification of spam and promotion campaigns among the abnormal accounts of the detected compromised accounts.

TABLE V: PERFORMANCE BY SVM
Algorithm                Precision   Recall   F-measure
Support vector machine   8.8         8.9      8.8

Figure 7 shows the performance of the support vector machine for the identification of spam and promotion campaigns among the abnormal accounts of the compromised accounts. Hence, considering all the measures, a performance of 88% is achieved by the proposed system for identifying spam and promotion campaigns from the compromised accounts.

Figure. 7. Performance by SVM

5 Conclusion and Future Work

In this paper, the social behaviors and wall messages of individual online social network users are collected and profiled. Based on the social behavioral profiles, compromised accounts are detected. From the compromised accounts, using the similarity estimation method and account graph construction, spam and promotion campaigns are identified. In future work, we plan to analyze short messages and the content of artificially generated images. Similarly, to increase detection accuracy, several other features have to be considered during the clustering process for the extraction of candidate campaigns and the identification of spam and promotion campaigns. However, if hackers compromise the physical machines that a user owns in order to learn the user's social behavioral pattern, this requires more resourceful and determined attackers and costs more resources and time.

6 References

[1] Ago H., Hu J., Wilson C., Li Z., Chen Y., and Hao B. H., "Detecting and characterizing social spam campaigns", in Proc. 10th ACM SIGCOMM Conf. Internet Meas. (IMC), Melbourne, VIC, Australia, 2010, pp. 35–47.
[2] Benevenuto F., Rodrigues T., Cha M., and Almeida V., "Characterizing user behavior in online social networks", in Proc. 9th ACM SIGCOMM Conf. Internet Meas. (IMC), Chicago, IL, USA, 2009, pp. 49–62.
[3] Bogers T. and Van den Bosch A., "Using language modeling for spam detection in social reference manager websites", in Proc. 9th Belgian-Dutch Information Retrieval Workshop, 2009, pp. 87–94.
[4] Egele M., Stringhini G., Kruegel C., and Vigna G., "COMPA: Detecting compromised accounts on social networks", in Proc. Symp. Netw. Distrib. Syst. Secur. (NDSS), San Diego, CA, USA, 2013.


[5] Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna, "Detecting Spammers on Social Networks", ACSAC, USA, 2010, Vol. No. 478.
[6] Hongyu Gao, Yi Yang, Kai Bu, Tiantian Zhu, Yan Chen, Doug Downey, Kathy Lee, and Alok N., "Beating the Artificial Chaos: Fighting OSN Spam Using Its Own Templates", IEEE/ACM Transactions on Networking, 2016, Vol. 24, no. 6.
[7] John Ray Acopio and Lucila Bance O., "Personality traits as predictors of Facebook use", Journal of Languages and Culture, 2016, Vol. No. 8(4), pp. 45-52.
[8] Reza Motamedi, Roberto Gonzalez, Reza Farahbakhsh, Reza Rejaie, Angel Cuevas and Ruben Cuevas, "Characterizing Group-Level User Behavior in Major Online Social Networks", ACSAC, USA, 2013.
[9] Thomas K., Grier C., Ma J., Paxson V. and Song D., "Design and evaluation of a real-time URL spam filtering service", in Proc. IEEE Symp. Secur. Privacy (S&P), Oakland, CA, USA, 2011, pp. 447–462.
[10] Xin Ruan, Zhenyu Wu, Haining Wang and Sushil Jajodia, "Profiling online social behaviors for compromised account detection", IEEE Transactions on Information Forensics and Security, 2016, Vol. No. 11.
[11] Xianchao Zhang, Zhaoxing Li, Shaoping Zhu, Wenxin Liang, "Detecting spam and promoting campaigns in Twitter", ACM Trans. Web 10, 1, Article 4, 2016, 28 pages.


SESSION INFORMATION RETRIEVAL, DATABASES, AND NOVEL APPLICATIONS Chair(s) TBA


The Semantic Structure of Query Search Results Ying Liu 1

Department of Computer Science, Mathematics and Science, College of Professional Studies, St. John’s University, Queens, NY 11439 [email protected]

Abstract - Organization of a query's search results is a key issue in modern information retrieval systems. In recent years, ranking and listing Web pages based on their Web links has had great success in Web information systems. However, merely ranking the retrieved documents is not sufficient for users, and this kind of ranking is specific to text collections with extra linkage information among documents. Moreover, the organization of search results relies not only on the extra linkage information but also on the intra-content information. When the extra linkage information is of poor quality, understanding the intra-content information of the retrieved documents plays the crucial role. Therefore, to effectively organize a query's search results, in this paper we study their semantic structure, i.e., the content-based relationships among them, from the perspective of statistical networks. We show that such a semantic structure is a complex network with small-world, scale-free and hierarchical properties. The semantic relationships between retrieved documents are thus revealed, providing a beneficial understanding of how these documents can be organized for users according to their contents.

Keywords: Query Search Results; Semantic Network; Small Worlds Network; Scale Free Network

1 Introduction

In the last decade, the online text in the Internet has grown into a massive repository of information. Therefore, information retrieval increasingly attracted more interests and attention with the rapid growth of the Internet. The vast volume of online text information has made the phenomenon “information overload” worse in modern information retrieval systems. For instance, an ad-hoc query could return too many results, only few of which are relevant to users. Therefore, the bulk of studies in modern information retrieval have focused on how to present relevant documents effectively, e.g., ranking Web pages by exploring their hyperlinks, or grouping documents into clusters with distinct topics. With hyperlinks linking Web pages, Web information retrieval regards the Word Wide Web as a graph and explores the link structure of the Web graph to help measure the relevance of each retrieved Web page. The Web graph has been reported to be a complex network with remarkable properties [1, 2], small-

world, scale-free, hierarchy, community structure and so on. These discoveries have potentially advanced the development of modern information retrieval techniques [3, 4]. However, in recent years, with intense and considerable interests in exploring and discovering the link structure of the Web, there are few researches paying attention to the study of the semantic structure of query search results. Recalling popular ranking algorithms in Web information retrieval, e.g., HITS [5] and PageRank [6], they firstly retrieve large amounts of query search results containing query’s text information, and then explore their link structure, which is a network with documents as nodes and their hyperlinks as edges. Similarly, the semantic structure of query search results is a network of content relationships among search results, with documents as nodes and their textual relationships as edges. Like the exploration of the link structure of query search results, understanding and analyzing the semantic structure of query search results can also effectively enhance the organization and representation of search results. More important, unlike the Web, many other text resources are free text and do not have the additional linkage information to be exploited, e.g., digital libraries. In these cases, the understanding of their semantic structures is important to information retrieval systems. Although there have been much researches in the topological properties of the link structure of the Web, there are few studies related to the semantic structure. For example, Menczer [7] studied the topological relationship of the link structure and the semantic structure in the Web. The power-law relationship was discovered, and then based on this discovery, two applications were discussed: a content-based generative model for explaining Web growth and content-based crawling algorithms for Web navigation. However, Menczer’s work did not involve with the study of the semantic structure and his results are limited to the Web. In contrast, the study of purely textual relationships among query search results has potential impact on all kinds of text resources, including free-text collections without linkage information. Therefore, in this paper, given a query, we present a study of a semantic network of this query’s search results that is large – possibly containing tens of thousands of retrieved documents – and for which a precise definition of semantic proximity is possible. This network is weighted, where the weight of a semantic link between two documents is their content similarity.


2 Semantic Networks of a Query Search Results

We study networks of a query's search results (or retrieved documents) in which two retrieved documents are considered connected if their similarity in text content is high enough to exceed a threshold value. Two factors in this definition affect the construction of semantic edges among search results. One is the definition of content similarity, based purely on the textual information of the retrieved documents. In this paper, we use the cosine similarity function, a widely used measure in information retrieval, as the content similarity measure. It is defined as

s(d_1, d_2) = ( Σ_{j ∈ d_1 ∩ d_2} w_{d_1 j} w_{d_2 j} ) / sqrt( (Σ_{j ∈ d_1} w_{d_1 j}^2) (Σ_{j ∈ d_2} w_{d_2 j}^2) )

where w_{dj} is some weight function for term j in the retrieved document d, e.g., the term frequency (TF) or term frequency-inverse document frequency (TFIDF) function. Hence, similarity values lie in the range between zero (completely dissimilar) and one (identical). The other factor is the similarity threshold defining whether two retrieved documents are similar enough to be connected. To investigate the impact of similarity thresholds on the topology of semantic networks, we use different similarity thresholds by iterating over the range [0,1] with a small gap.

In addition to the content similarity function and similarity thresholds, the preprocessing of text documents follows typical steps in information retrieval: each retrieved document is broken into tokens, which, in this paper, are single terms (or words). Word stemming is used to truncate suffixes so that words having the same root (e.g., activate, activates, and activating) are collapsed to the same term for frequency counting; in our work, Porter's stemmer was applied [8]. Stop lists are used to filter out non-scientific English words. The standard TFIDF function was used, in this paper, to assign a weight to each term in a document. Each document was then modeled as an m-dimensional TFIDF vector, where m is the number of distinct terms in all the search results. Formally, a document is a vector (tfidf_1, tfidf_2, ..., tfidf_m), where tfidf_i is the TFIDF value of word i. A term-by-document matrix was then built, in which each column represents a document and each row represents a term. The values in the matrix are TFIDF weights; if a term does not appear in a document, the value 0 is assigned to that cell of the matrix. To summarize the process of constructing semantic networks of a query's search results, a flowchart is presented in Figure 1.
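For illustration, a compact Python sketch of the pipeline described above is given here: TF-IDF vectors, pairwise cosine similarities, and an edge wherever the similarity exceeds the threshold σ. The scikit-learn/NetworkX calls and the toy documents are assumptions; the paper's own pipeline additionally applies Porter stemming.

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["breast cancer screening and treatment outcomes",
        "chemotherapy outcomes in breast cancer patients",
        "information retrieval on the world wide web"]
sigma = 0.1
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
S = cosine_similarity(X)                       # document-document similarity matrix
G = nx.Graph()
G.add_nodes_from(range(len(docs)))
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if S[i, j] > sigma:                    # connect documents above threshold σ
            G.add_edge(i, j, weight=S[i, j])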

Figure 1. Flowchart of building semantic networks of a query's search results (submit a query to the information retrieval system → search results returned (retrieved documents) → preprocess: stop words, stemming, term weighting → term-document matrix → compute content similarities among results → document-document similarity matrix → build semantic networks: two documents are connected if their similarity value is above σ, for σ = 0.3, 0.5, 0.8).

In this paper, we analyze the semantic networks constructed from queries' search results returned by three different online information retrieval systems: a large digital library, PubMed¹; a small digital library, the Human-Computer Interaction (HCI) Bibliography² (both libraries provide abstracts of retrieved research papers as textual information); and a popular Web search engine, Google³ (Web pages are retrieved and HTML tags are removed, so that textual information remains after preprocessing). To diversify queries, we consider specific queries, i.e., a set of ten cancer names in PubMed corresponding to ten categories from the Society of Surgical Oncology (http://www.surgonc.org) Annotated Bibliography (among them, we selected only the query "breast cancer", which returns the largest number of relevant documents, while results of the other nine cancer queries are reported in the supplementary information of this paper), and a broad-topic query, i.e., "cognitive" in the HCI Bibliography (this term is commonly used in HCI). Moreover, we also submitted the cancer name "breast cancer" as a specific query to Google and analyzed the semantic networks of the first 1000 ranked Web pages (after excluding PDF-format links and secured Web pages from the search results, the number of actual Web pages used in the experiment is less than 1000). The summary of the data used is listed in Table 1.

¹ PubMed is a service of the U.S. National Library of Medicine that includes over 16 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. Its address is http://www.ncbi.nlm.nih.gov/entrez.
² The HCI Bibliography is an online resource for academic use in Human-Computer Interaction, which provides a search service over 36,000 publications about Human-Computer Interaction. Its address is http://www.hcibib.org.
³ Google is one of the most popular Web search engines in the World Wide Web. Its address is http://www.google.com.


Table 1. Summary of queries' search results
Query               Searched in         #Documents Retrieved   #Terms
Breast Cancer       PubMed              47871                  40877
Colorectal Cancer   PubMed              26556                  29408
Cognitive           HCI Bibliography    2310                   6359
Breast Cancer       Google              788                    17339

3 Results

Following the flowchart for constructing semantic networks of a query's search results in Figure 1, we set the similarity thresholds to 0.8, 0.6, 0.4, 0.2 and 0.1, so that five semantic networks are obtained for each query's search results. In the following, we investigate the structural properties of these semantic networks.

3.1 Clustering Coefficient

The clustering coefficient was first introduced by Watts and Strogatz [9] as a graph metric to determine whether a network is highly clustered or not. Let a network with n nodes be G(V,E) with |V| = n. A selected node i has k_i edges which connect it to other nodes. If the first neighbors of the original node were part of a clique, there would be k_i(k_i − 1)/2 edges between them. Therefore, the ratio between the number e_i of edges that actually exist between these k_i first neighbors and the total possible number k_i(k_i − 1)/2 yields the clustering coefficient of node i,

C(i) = 2 e_i / (k_i (k_i − 1))

Consequently, the average of all nodes' clustering coefficients in G is the overall clustering coefficient of G, i.e.,

C(G) = (1/n) Σ_i C(i)

We call this computation of the clustering coefficient the local clustering coefficient (LCC), because it is an average of each node's locally computed clustering coefficient. An alternative definition of the clustering coefficient was proposed by Newman [10]. It is globally computed by counting the number of triangles and paths of length two, and is therefore called the global clustering coefficient (GCC). It is formally defined as

GCC(G) = 6 × (number of triangles in the network) / (number of paths of length two)

3.2 Average Shortest Path Length

The average shortest path length, also called the "characteristic path length" by Watts and Strogatz [9], measures the typical separation between two nodes in a network. It is defined as

L(G) = (1/|P|) Σ_{(i,j) ∈ P} d_ij

where P is the set of all possible shortest paths in the network G, and |P| is thus the number of shortest paths in G.

3.3 Small-World Effect

The introduction of the small-world network by Watts and Strogatz [9] implied that most real-world networks lie between regular networks and random networks, sharing the same phenomenon: they are highly clustered, yet have a small separation between nodes. More often than not, the clustering degree is measured by the LCC or GCC, and the separation of nodes is measured by the average shortest path length. Typically, identifying small-world networks follows the framework of Watts and Strogatz, by comparing a real-world network G_actual with a random network G_random having the same number of nodes and edges in terms of clustering coefficient and characteristic path length: if the actual graph satisfies C(G_actual) >> C(G_random) and L(G_actual) ≈ L(G_random), it is a small-world network. In Tables 2, 3, 4 and 5, the values of these measures in the semantic networks of the four queries' search results are summarized. Observing the semantic networks of these queries' search results, we found an interesting phenomenon: when the similarity threshold is high (e.g., σ = 0.8), there is a small number of edges (even much smaller than the number of nodes) in the semantic network, and the networks therefore do not appear as small-world networks, although most of these semantic networks have larger clustering coefficients than random networks (e.g., in Table 4, LCC(G_actual) = 0.912 is much larger than LCC(G_random) = 0) and approximately smaller average shortest path lengths (their ratio is less than 5); however, when the similarity threshold is set lower (e.g., 0.2 and 0.1), the semantic networks do show the small-world phenomenon, with L(G_actual) ≈ L(G_random) but C(G_actual) >> C(G_random).
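The reported metrics can be computed, for example, with NetworkX, comparing a semantic network against a random graph with the same numbers of nodes and edges, following the Watts-Strogatz small-world test; the sketch below uses a placeholder graph and is not the authors' code.

import networkx as nx

def small_world_summary(G):
    lcc = nx.average_clustering(G)                 # local clustering coefficient
    gcc = nx.transitivity(G)                       # global clustering coefficient
    # average shortest path over the largest connected component
    H = G.subgraph(max(nx.connected_components(G), key=len))
    L = nx.average_shortest_path_length(H)
    return lcc, gcc, L

G_actual = nx.karate_club_graph()                  # placeholder for a semantic network
G_random = nx.gnm_random_graph(G_actual.number_of_nodes(),
                               G_actual.number_of_edges(), seed=0)
print(small_world_summary(G_actual), small_world_summary(G_random))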


Table 2. Summary of Query "Breast Cancer" Searched in PubMed (47871 nodes/documents)
Similarity Threshold (σ)   0.8      0.6      0.4      0.2      0.1
Edge Number                204      1493     6463     33334    116079
L    (G_actual)            1.43     4.13     17.96    7.88     5.01
L    (G_random)            6.24     19.70    15.60    8.36     5.14
LCC  (G_actual)            0.0857   0.0031   0.0064   0.0059   0.0049
LCC  (G_random)            0.0199   0.0018   0        0.0001   0.0003
GCC  (G_actual)            0.0588   0.0038   0.0068   0.0045   0.0037
GCC  (G_random)            0.0125   0.0016   0        0.0001   0.0003

Table 3. Summary of Query "Cognitive" Searched in HCI Bibliography (2310 nodes/documents)
Similarity Threshold (σ)   0.8          0.6          0.4      0.2      0.1
Edge Number                16           33           167      2315     19825
L    (G_actual)            1.56         1.47         2.69     7.24     3.13
L    (G_random)            1.74         3.53         6.23     6.90     2.98
LCC  (G_actual)            0            0            0.1033   0.1084   0.0690
LCC  (G_random)            2.9167E-01   3.8265E-02   0.0160   0        0.0077
GCC  (G_actual)            0            0            0.1044   0.0846   0.0556
GCC  (G_random)            2.0000E-01   5.0000E-02   0.0136   0        0.0076

Table 4. Summary of Query "Breast Cancer" Searched in Google (788 nodes/documents)
Similarity Threshold (σ)   0.8      0.6      0.4      0.2      0.1
Edge Number                59       123      318      1667     6705
L    (G_actual)            1.14     1.02     1.97     5.04     3.17
L    (G_random)            5.46     7.67     7.08     3.83     2.59
LCC  (G_actual)            0.9120   0.9642   0.6643   0.5735   0.4630
LCC  (G_random)            0        0.0205   0        0.0144   0.0243
GCC  (G_actual)            0.8      0.9550   0.6879   0.4708   0.4137
GCC  (G_random)            0        0.0155   0        0.0140   0.0245

4 Conclusion

In this paper, we investigated the semantic structure, or content relationships, of the search results of queries, which can be domain-specific or broad-topic. The results showed that the semantic structures of a query's search results exhibit the properties of complex networks, like most real-world networks. Specifically, the semantic networks of a query's search results are small-world, scale-free, and hierarchical. We believe these results can provide benefits and useful implications for the design of information retrieval systems, towards better organization and presentation of search results.

5 References

1. Albert, R., Jeong, H. & Barabasi, A.-L. Diameter of the World Wide Web. Nature 401, 130-131 (1999).
2. Albert, R., Barabasi, A.-L., Jeong, H. & Bianconi, G. Power-law distribution of the World Wide Web. Science 287, 2115a (2000).
3. Chakrabarti, S. et al. Mining the link structure of the World Wide Web. Computer 32, 60-67 (1999).
4. Kleinberg, J. & Lawrence, S. The structure of the Web. Science 294, 1849 (2001).
5. Kleinberg, J. M. Authoritative sources in a hyperlinked environment. Journal of the ACM 46, 604-632 (1999).
6. Page, L., Brin, S., Motwani, R. & Winograd, T. The PageRank citation ranking: bringing order to the Web. Stanford Digital Library Technologies Project (1998).
7. Menczer, F. Growing and navigating the small world Web by local content. PNAS 99, 14014-14019 (2002).
8. Porter, M. An algorithm for suffix stripping. Program 14, 130-137 (1980).
9. Watts, D. J. & Strogatz, S. H. Collective dynamics of 'small-world' networks. Nature 393, 440-442 (1998).
10. Newman, M. E. J. The structure and function of complex networks. SIAM Review 45, 167-256 (2003).


Migration of relational databases RDB to database for objects db4o for schema and data. Youness Khourdifi

Mohamed Bahaj

Ph.D.: Department of Mathematics and Computer Sciences, Faculty of Science and Technology, University Hassan I, Settat, Morocco. E-mail: [email protected]

Professor: Department of Mathematics and Computer Sciences, Faculty of Science and Technology, University Hassan I, Settat, Morocco. E-mail: [email protected]

Abstract— In this paper, we present an approach that takes an existing relational database as input, obtains a copy of its metadata and applies the principle of semantic enrichment to extract the different object principles, including aggregation, inheritance, and composition. The first phase of the migration process is the automatic generation of the CDM; in the second phase we move towards the creation of the object schema from the relational schema; and the last step is the mapping of data with Query by Example in db4o.

Keywords— Relational databases, database for objects db4o, Migration, Canonical Data Model CDM, Data Mapping, Object Oriented Database, Semantic Enrichment, Schema Translation, Query by Example.

I. INTRODUCTION

Most people who write software for a living have at least some familiarity with databases. Many of them are also familiar with object-oriented programming and have probably needed to use a database to provide persistence for software objects. db4o is an object database, which means that, unlike the more common relational databases, it looks at data the same way that programs do. That is not possible with relational databases, because the relational model is based on the principle of relations: all data is organized in tables [1]. We tackle this problem by proposing a solution for the migration of an RDB to db4o, covering schema translation and data mapping with Query by Example. Traditional relational databases are dominant in the market, as most data today is still stored and maintained in relational database systems. However, relational databases are limited in supporting complex structures and user-defined data types based on object technologies and the World Wide Web. Therefore, a new generation of database management systems has begun to emerge in the market, offering more features and flexibility. Among them, the relatively new object-oriented databases (OODB) [2] support several concepts that meet the requirements of complex applications (e.g., multimedia, computer-aided design, etc.) that require rich data types. Therefore, it is expected that the need to convert RDBs into

technologies that have emerged recently, such as object databases (OODB), will increase significantly [3].

1.1. OODBMS and db4o

1.1.1. Instances of utilization and functionality

When only one application at a time runs on a database, or for an application dedicated to mobility, as found in location-based systems or in embedded software, it can be worth trying a second-generation OODBMS. Because of their ability to manage relationships between objects, OODBMSs are intrinsically well suited when complex object models, flat objects or large object tree structures are involved, which is often the case in Geographical Information Systems (GIS). Here are some of the technical features of db4o:

- Integrated (embedded) mode and client-server mode.
- No server administration at runtime.
- The database properties are checked from outside the host application.
- Little disk space needed for the program libraries, and little memory during execution.
- Simplicity of use: db4o uses the reflective application programming interfaces (APIs) of Java and .NET. There is therefore no additional annotation, no pre- or post-processing (byte-code engineering), and no subclassing or interface implementation to perform.
- Lazy loading of relationships and control methods for embedded objects.
- Replication tools as supplements.

As one might expect from a database, db4o implements ACID (atomicity, consistency, isolation and durability) that provide safe transactions: a transaction starts when opening or queries the database, and ends with the methods commit () or rollback (). Three approaches are implemented for queries: Query by Example, the SODA queries (Simple Object Database Access, or Simple Object Access database) and 'Native Queries'.

1.1.2. Advantages

Talking about ORM environments leads us to compare OODBMSs with that approach. db4o is very fast, judging by the published test beds [4]. A conceptual argument is that there is no object/relational mismatch: there are no data types or sizes to adjust between separate relational tables, and no textual SQL to manage. It is important to mention that db4o conveniently supports agile development techniques: queries are written in the language of the application (Java, .NET) and are therefore safe in terms of data typing. There are also friendly schema-evolution features, without SQL. Software engineers have an easier life than before because they remain in the object world, unlike database professionals.

1.1.3. Restrictions

db4o is probably not suited for use in large data warehouses or data mining, and it is less suitable when multiple applications access the database with many views. The fact that different query languages are available, with no mature standard language comparable to the predominant SQL of the relational model, shows a lack of standardization. Constraints such as referential integrity are not (yet) part of any of the query languages, except Native Queries, which simply implement callback functions. Finally, the current lack of standardization has been recognized: actions of the Object Management Group (OMG) are under way to move to a new release, Version 4, of the ODMG standard.

II. RELATED WORK

In relational database migration engineering, there is significant research addressing the problem of moving relational databases already in production to applications that run on the object paradigm. Among this research, some proposals target migration to a single target (ORDB | OODB | XML), which can be a disadvantage or limitation; others chose to work with an intermediate model in order to migrate a relational database to several models. Migration starts with the extraction of relationships and dependencies into a conceptual schema model such as the Entity-Relationship model and its variants; this is called semantic enrichment [5] [6].

input and produces the object-oriented database corresponding output. The system draws a chart summarizing the conceptual model. Links in the graph are classified inheritance links and aggregation links. This classification resulted in the class hierarchy. A. Behm and al.[10] they work in the first phase, with the use of transformation rules to build an object-oriented schema that is semantically equivalent to the relational schema. In the second phase, schema transformation information is used to generate programs which migrate the data in a relational oriented database object. The two concepts, the schema transformation and data migration are implemented using O2 as OODBMS A. El alami and al. [11] Propose a solution to the migration of RDB to OODB which is based on metadata and the principle of semantic enrichment to extract different object principles, including inheritance, aggregation, and composition, this solution is based on the New Canonical data model (NDM). Maatuk and al. suggest a solution takes an existing RDB input, expanding its representation of the metadata with the necessary semantics, and produce an improved model canonical data, which captures the essential features of the ORDB target, and is suitable for migration. A prototype has been developed, which migrates successfully BRD in ORDBs (Oracle 11g) based on the canonical model [12]. C. Fahrner and al. [13] propose a three-step process that first goal is to complete a given relational schema, namely, to make the semantic information also carries explicit as possible using a variety of data dependencies. A complete schematic is then converted into a ODMG schema in a simple way, by generating class relational schemas. The result is generally not optimal object-oriented point of view; so the original objectoriented scheme is finally improved to better exploit the available options in the object-oriented paradigm. Another research focus also on the schema transformation like X. Zhang and al. [14] they discuss class structures and define well structured classes. Based on MVDs, a theorem is given the transformation of a relationship diagram in a wellstructured class. With the aim of transforming the RDB schema diagram OODB, a composition process simplifying RDB entry scheme and an algorithm transforming the RDB simplified diagram in OODB well-structured classes are developed. A new approach discusses the transition from relational databases to a document-oriented model of NoSQL. The method is based on the routing of a data model which establishes a migration of the physical schema and the data mapping [16]. In Our approach we discusses an approach that takes an existing relational database as input, obtains an object-oriented database in the output with its metadata and the principle of semantic enrichment to extract different object principles including aggregation, inheritance, and composition. The first phase of the migration process is the Canonical Data Model which is invented by Maatuk and al.[15] , in the second phase we move towards the creation of schema object from the




In the second phase, we create the object schema from the relational schema with a schema transformation algorithm; in the last step, we apply a data migration algorithm that preserves associations and keys while mapping the data with Query by Example (QBE) for db4o.

III. SEMANTIC ENRICHMENT

Semantic enrichment is the process of analyzing and examining a database in order to capture its structure and definitions at a higher level of meaning. This is done by enhancing a representation of an existing database's structure in order to make hidden semantics explicit.

Student (stu_id, stu_fname, stu_lname)
Department (dep_code, dep_name)
Instructor (ins_id, ins_fname, ins_lname, dep_code*)
Location (loc_code, loc_name, loc_country)
Course (crs_code, crs_title, crs_credits, dep_code*, crs_description)
Section (sec_id, sec_term, sec_bldg, sec_room, sec_time, crs_code*, loc_code*, ins_id*)
Enrollment (stu_id*, sec_id*, grade_code)
Prerequisite (crs_code*, crs_requires*)
Qualified (ins_id*, crs_code*)

Fig. 1 Relational database schema for University (PKs are underlined; FKs are marked by "*")

Fundamentally, we mean enriching the content and context of data by tagging, categorizing, and/or classifying data items in relation to each other, to dictionaries, and/or to other base reference sources. At its simplest, this means adding contextual information to an existing data set (think of adding traffic data to road maps, where the traffic data provides the context of road conditions, the probability of delay, the length of projected obstructions, the condition of the road, etc.).
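As a small illustration of how such hidden semantics can be made explicit programmatically, the sketch below reads primary-key and foreign-key metadata from the relational source through the standard JDBC DatabaseMetaData interface; the PostgreSQL connection settings mirror those used in Section V, and the snippet is only an example of metadata extraction, not the enrichment procedure itself.

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class MetadataProbe {
    public static void main(String[] args) throws Exception {
        // Same PostgreSQL source database as in Section V (assumed credentials).
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/University", "admin", "admin");
        DatabaseMetaData md = con.getMetaData();

        // Primary-key columns of the Instructor table.
        ResultSet pk = md.getPrimaryKeys(null, null, "instructor");
        while (pk.next()) {
            System.out.println("PK: " + pk.getString("COLUMN_NAME"));
        }

        // Foreign keys of the Instructor table (e.g. dep_code referencing Department).
        ResultSet fk = md.getImportedKeys(null, null, "instructor");
        while (fk.next()) {
            System.out.println("FK: " + fk.getString("FKCOLUMN_NAME")
                    + " -> " + fk.getString("PKTABLE_NAME") + "." + fk.getString("PKCOLUMN_NAME"));
        }
        con.close();
    }
}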


In the semantic enrichment phase, we build the CDM to preserve the semantics of the database and to drive both the creation of the OODB schema and the data migration performed by the transformation algorithms. Taking the relational database of Fig. 1 as an example:

Table 1 shows the result of the CDM generated for the Student, Department, and Instructor tables of the university database. For each class the CDM records its name (Cn), classification (Cls), abstract flag (Abs), its attributes Acdm (an, t, tag, l, n, d), its relationships Rel (RelType, dirC, dirAs, c, invAs), and UK (ua, S).

Student (Cls: RRC, Abs: false)
  Acdm: stu_id char PK 9 n; stu_fname char 20 n; stu_lname char 20 n
  Rel:  asso, Enrollment, stu_id, 0..*, stu_id

Department (Cls: RST, Abs: false)
  Acdm: dep_code char PK 4 n; dep_name char 40 n
  Rel:  asso, Course, dep_code, 1..*, dep_code; asso, Instructor, dep_code, 1..1, dep_code
  UK:   dep_name, 1

Instructor (Cls: SST, Abs: false)
  Acdm: ins_id char PK 9 n; ins_fname char 20 n; ins_lname char 20 n; dep_code char FK 4 n
  Rel:  asso, Section, Ins_id, 1..*, Ins_id; asso, Qualified, Ins_id, 1..*, Ins_id; asso, Department, Dep_code, 1..1, Dep_code

Table 1: Results of CDM generation
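To make the structure behind Table 1 concrete, the following minimal sketch shows one possible in-memory representation of a CDM entry in Java; the class names (CdmClass, CdmAttribute, CdmRelationship) and the encoding of the Instructor entry are illustrative assumptions based on the columns of Table 1, not the implementation used in this work.

import java.util.ArrayList;
import java.util.List;

// Illustrative in-memory representation of one CDM entry, mirroring the columns of Table 1.
class CdmAttribute {
    String an;   // attribute name
    String t;    // type
    String tag;  // PK / FK tag (empty if none)
    int l;       // length
    String n;    // null flag as listed in Table 1
    CdmAttribute(String an, String t, String tag, int l, String n) {
        this.an = an; this.t = t; this.tag = tag; this.l = l; this.n = n;
    }
}

class CdmRelationship {
    String relType; // e.g. "asso"
    String dirC;    // directed class
    String dirAs;   // directed association attribute
    String c;       // cardinality, e.g. "1..*"
    String invAs;   // inverse association attribute
    CdmRelationship(String relType, String dirC, String dirAs, String c, String invAs) {
        this.relType = relType; this.dirC = dirC; this.dirAs = dirAs; this.c = c; this.invAs = invAs;
    }
}

class CdmClass {
    String cn;      // class name
    String cls;     // classification (SST, RST, RRC, ...)
    boolean abs;    // abstract flag
    List<CdmAttribute> acdm = new ArrayList<CdmAttribute>();
    List<CdmRelationship> rel = new ArrayList<CdmRelationship>();
    CdmClass(String cn, String cls, boolean abs) { this.cn = cn; this.cls = cls; this.abs = abs; }
}

public class CdmExample {
    public static void main(String[] args) {
        // The Instructor entry of Table 1, encoded with the sketch above.
        CdmClass instructor = new CdmClass("Instructor", "SST", false);
        instructor.acdm.add(new CdmAttribute("ins_id", "char", "PK", 9, "n"));
        instructor.acdm.add(new CdmAttribute("dep_code", "char", "FK", 4, "n"));
        instructor.rel.add(new CdmRelationship("asso", "Section", "ins_id", "1..*", "ins_id"));
        instructor.rel.add(new CdmRelationship("asso", "Department", "dep_code", "1..1", "dep_code"));
        System.out.println(instructor.cn + " has " + instructor.rel.size() + " relationships");
    }
}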

IV. SCHEMA TRANSLATION

In this section we explain how the CDM is translated into an equivalent target db4o schema. A set of rules was designed to map the classes and their relationships. A target db4o schema is defined as a set of classes:

OOSchemaDb4o := {Cdb4o | Cdb4o = <Cn, Attdb4o, Reldb4o>}
Attdb4o := {Adb4o | Adb4o = <an, t, l, n>}
Reldb4o := {Relc | Relc = <RelType, dirC, c>}

where Cn is the class name, Attdb4o its attributes, and Reldb4o its relationships with other classes.

The processing algorithm walks over the CDM in order to simplify the migration task towards the OODB schema. For every class of the CDM, if the classification of the class C is SST, SUR, or SSC and the relationship between classes is a composition, we create a variable compo that holds the composed element and determines the location of the class C with respect to the class C', either inside the class or outside it; in the case of a composition inside an inheritance, when the compo value is assigned to the class Cn we create a collection for compo. For the relationships between classes there are two kinds of values, single-valued or collection-valued: if the relationship between two classes is (0..*, 1..*), it is mapped to a collection value, otherwise to a single value. For the treatment of inheritance, if the classification of a class C is SST, we create a new class and a variable n that records the location of the class C in the CDM; this variable is incremented inside a loop in order to reach the next class. If the classification of a class C is SSC, we create a class C that extends the class that accepts inheritance; otherwise we check whether the classification is SUB, in which case we create another class C'' that extends the class C and is declared final, i.e., it does not accept further inheritance.
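To illustrate the inheritance rules above, the sketch below shows the shape of the Java code they would emit: an SST class becomes a root class, an SSC class extends it, and a SUB class is generated as a final class that does not accept further inheritance. The class names (Person, Employee, PartTimeEmployee) are purely hypothetical, since the university schema of Fig. 1 contains no inheritance hierarchy.

// Hypothetical output of the inheritance rules (class names are illustrative only).
class Person {                                  // SST: root class of the hierarchy
    String id;
    String name;
}

class Employee extends Person {                 // SSC: subclass that still accepts inheritance
    String department;
}

final class PartTimeEmployee extends Employee { // SUB: final class, no further inheritance
    int hoursPerWeek;
}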




If the classification of a class C is RST and the type of its relationship is a composition, then we create a simple collection; otherwise we create a single class. For relationships in the RRC and CAC cases, if the relationship is (m, n), then we create a class with its own properties. The ProduceDB4Oschema algorithm shown in Fig. 2 implements these translation rules.

1.  Algorithm ProduceDB4Oschema (cdm: CDM) return OOschema
2.  Target Schema: OOSchema := ∅
3.  Foreach class C ∈ cdm do
    // Composition inside inheritance
4.    If (Cn.classification = (SST | SUR | SSC) && RelType = composition) then
5.      Var compo = getDirC
6.      Foreach C ∈ CDM do
7.        If Cn = compo then
8.          Create collection of compo
9.        End if
10.     End for
11.   Foreach relationship ∈ C.relation do
12.     If c.relation = (0..*, 1..*) then
13.       Multiplicity := 'Collection value'
14.     Else multiplicity := 'Single value'
    // Treatment of inheritance
15.   Else if C.classification = SST then
16.     Create class Cn
17.     Specify dirC (element n)
      Else if C.classification = SSC then
18.     Create class Cn extend element n
19.     Specify dirC (element m)
20.   Else if C.classification = SUB then
21.     Create class Cn extend element m final
22.   End if
23.   If (Cn.classification = RST && RelType = composition) then
24.     Var compo = getDirC
25.     Foreach C ∈ CDM do
26.       If Cn = compo then
27.         Create collection of compo
28.       End if
29.     End for
30.   Else
31.     Create class Cn
32.   End if
33.   Foreach rel ∈ RELdb4o do
34.     If C.classification = (RRC | CAC) && rel.relType = "associate with" then
35.       If rel.c = (m, n) then
36.         Create class with its own properties
37.       End if
38.     End if

Fig. 2 The ProduceDB4Oschema algorithm
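As a concrete illustration of the rules implemented by ProduceDB4Oschema, applying them to the Instructor entry of Table 1 could produce plain Java classes of the following shape, where the 1..* relationship to Section becomes a collection-valued field and the 1..1 relationship to Department becomes a single-valued reference; the attribute selection and field names are an assumption for illustration, and the db4o call at the end only shows that such objects can be persisted directly in an ObjectContainer.

import java.util.ArrayList;
import java.util.List;
import com.db4o.Db4oEmbedded;
import com.db4o.ObjectContainer;

// Sketch of classes that the schema translation could generate for part of the university schema.
class Department {
    String depCode;
    String depName;
}

class Section {
    String secId;
    String secTerm;
}

class Instructor {
    String insId;
    String insFname;
    String insLname;
    Department department;                             // 1..1 relationship -> single value
    List<Section> sections = new ArrayList<Section>(); // 1..* relationship -> collection value
}

public class SchemaDemo {
    public static void main(String[] args) {
        ObjectContainer db = Db4oEmbedded.openFile("university.db4o");
        try {
            Instructor ins = new Instructor();
            ins.insId = "I100";
            ins.department = new Department();
            ins.department.depCode = "CS";
            db.store(ins); // db4o persists the reachable object graph
        } finally {
            db.close();
        }
    }
}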

V. DATA CONVERSION WITH QBE

In this section we describe the GenerateDb4oData algorithm given in Figure 4, which processes and fills the OODB schema generated by the schema translation phase. We chose db4o for the reasons cited in the introduction; querying in db4o is based on Query by Example (QBE). For the extraction of the data we use the JDBC programming interface shown in Figure 3, which allows Java applications to access relational databases through a common interface by means of JDBC drivers.

import java.sql.*;

public class Connect {
    public static void main(String[] args) {
        String tablename;
        try {
            Class.forName("org.postgresql.Driver");
            String url = "jdbc:postgresql://localhost:5432/University";
            String user = "admin";
            String passwd = "admin";
            Connection condb = DriverManager.getConnection(url, user, passwd);
            Statement sql4o = condb.createStatement();
            ResultSet res = sql4o.executeQuery("SELECT * FROM tablename");
            ResultSetMetaData resMeta = res.getMetaData();
            for (int i = 1; i