
ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT

Frontiers in Artificial Intelligence and Applications FAIA covers all aspects of theoretical and applied artificial intelligence research in the form of monographs, doctoral dissertations, textbooks, handbooks and proceedings volumes. The FAIA series contains several sub-series, including “Information Modelling and Knowledge Bases” and “Knowledge-Based Intelligent Engineering Systems”. It also includes the biennial ECAI, the European Conference on Artificial Intelligence, proceedings volumes, and other ECCAI – the European Coordinating Committee on Artificial Intelligence – sponsored publications. An editorial panel of internationally well-known scholars is appointed to provide a high quality selection. Series Editors: J. Breuker, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras, R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong

Volume 269 Recently published in this series Vol. 268. A. Utka, G. Grigonytė, J. Kapočiūtė-Dzikienė and J. Vaičenonienė (Eds.), Human Language Technologies – The Baltic Perspective: Proceedings of the Sixth International Conference Baltic HLT 2014 Vol. 267. P. Garbacz and O. Kutz (Eds.), Formal Ontology in Information Systems – Proceedings of the Eighth International Conference (FOIS 2014) Vol. 266. S. Parsons, N. Oren, C. Reed and F. Cerutti (Eds.), Computational Models of Argument – Proceedings of COMMA 2014 Vol. 265. H. Fujita, A. Selamat and H. Haron (Eds.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the Thirteenth SoMeT_14 Vol. 264. U. Endriss and J. Leite (Eds.), STAIRS 2014 – Proceedings of the 7th European Starting AI Researcher Symposium Vol. 263. T. Schaub, G. Friedrich and B. O’Sullivan (Eds.), ECAI 2014 – 21st European Conference on Artificial Intelligence Vol. 262. R. Neves-Silva, G.A. Tshirintzis, V. Uskov, R.J. Howlett and L.C. Jain (Eds.), Smart Digital Futures 2014 Vol. 261. G. Phillips-Wren, S. Carlsson, A. Respício and P. Brézillon (Eds.), DSS 2.0 – Supporting Decision Making with New Technologies Vol. 260. T. Tokuda, Y. Kiyoki, H. Jaakkola and N. Yoshida (Eds.), Information Modelling and Knowledge Bases XXV Vol. 259. K.D. Ashley (Ed.), Legal Knowledge and Information Systems – JURIX 2013: The Twenty-Sixth Annual Conference Vol. 258. K. Gerdes, E. Hajičová and L. Wanner (Eds.), Computational Dependency Theory

ISSN 0922-6389 (print) ISSN 1879-8314 (online)

Artificial Intelligence Research and Development
Recent Advances and Applications

Edited by

Lledó Museros
Engineering and Computer Science Department, Universitat Jaume I, Spain

Oriol Pujol
Dept. Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Spain

and

Núria Agell
ESADE Business School, Department of Information Systems Management, Universitat Ramon Llull, Spain

Amsterdam • Berlin • Tokyo • Washington, DC

© 2014 The authors and IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher. ISBN 978-1-61499-451-0 (print) ISBN 978-1-61499-452-7 (online) Library of Congress Control Number: 2014951481 Publisher IOS Press BV Nieuwe Hemweg 6B 1013 BG Amsterdam Netherlands fax: +31 20 687 0019 e-mail: [email protected] Distributor in the USA and Canada IOS Press, Inc. 4502 Rachael Manor Drive Fairfax, VA 22032 USA fax: +1 703 323 3668 e-mail: [email protected]

LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS


Preface The Catalan Association for Artificial Intelligence (ACIA),1 created in 1994 as a nonprofit association, aims primarily at fostering the cooperation among researchers from the Catalan-speaking Artificial Intelligence research community. In 1998, the first International Conference of Artificial Intelligence (the CCIA) was organised at the Universitat Rovira i Virgili in the city of Tarragona. From then to now, the CCIA has been organised every year by the ACIA together with a local committee, striving to take place in all the Catalan-speaking lands. Since then, the CCIA has been held in Girona (1999), Vilanova i la Geltrú (2000), Barcelona (2001, 2004), Castelló de la Plana (2002), Mallorca (2003), L’Alguer (2005), Perpinyà (2006), Andorra (2007), Sant Martí d’Empúries (2008), Cardona (2009), L’Espluga de Francolí (2010), Lleida (2011), Alacant (2012) and Vic (2013). The CCIA is now a consolidated forum of debate among Catalan-speaking researchers and colleagues from all around the world. This year the conference is being held in Barcelona, hosted by the Universitat de Barcelona, and organised by the ACIA together with the Applied Mathematics and Analysis department of the hosting university. CCIA 2014 has been organised as a single-track conference consisting of high quality, previously unpublished papers on new and original research on Artificial Intelligence. This volume contains 34 original contributions, which have been accepted for presentation at the Seventeenth International Conference of the Catalan Association of Artificial Intelligence (CCIA 2014), which will take place on October 22–24, 2014. All contributions have been reviewed by at least two referees. The papers have been organised around different topics providing a representative sample of the current state of the art in the Catalan Artificial Intelligence Community and of the collaboration between ACIA members and the worldwide AI community. As a novelty, this year conference hosts the Cognitive Science Society (CSS) 2 award to the best student paper on interdisciplinary work in the fields of Cognitive Science and Artificial Intelligence. We would like to express our sincere gratitude to all authors for making the conference and this book possible with their contributions and participation. We sincerely thank the members of the organizing committee for their effort in the preparation of this event, and all the members of the Scientific Committee for their help in the reviewing process. Special thanks go also to the two outstanding plenary speakers, Ramón López de Mántaras and Leo Wanner, for their effort in preparing very interesting lectures, to Karina Gibert, former PC chair in CCIA’2013, for her help during this year and, last but not least, to Vicenç Torra, president of ACIA, for his kind support and involvement.

1 ACIA is a member of ECCAI, the European Coordinating Committee for Artificial Intelligence (http://www.acia.org). 2 http://cognitivesciencesociety.org

 


We wish all participants a successful and inspiring conference and a pleasant stay in Barcelona. Lledó Museros, Universitat Jaume I Oriol Pujol, Universitat de Barcelona Núria Agell, ESADE Business School October 2014


Conference Organization The CCIA 2014 Conference was organized by the Catalan Association for Artificial Intelligence. General Chair Núria Agell ESADE Business School Department of Information Systems Management Universitat Ramon Llull, Spain Local Organizing Committee Chair Oriol Pujol Dept. Matemàtica Aplicada i Anàlisi Universitat de Barcelona, Spain Scientific Committee Chair Lledó Museros Engineering and Computer Science Department University Jaume I, Spain Scientific Committee Núria Agell (ESADE-URL) Isabel Aguiló (UIB) René Alquézar (UPC) Cecilio Angulo (UPC) Carlos Ansótegui (UdL) Josep Argelich (UdL) Eva Armengol (IIIA) Federico Barber (UPV) Ramón Béjar (UdL) Miquel Bofill (UdG) María Luisa Bonet (UPC) Vicent Botti (UPV) Dídac Busquets (Imperial College London) Carlos Carrascosa (UPV) Pompeu Casanovas (UAB) Miguel Cazorla (UA) Hubie Chen (UPV/EHU Ibervasque) Dante Conti (ULA, Venezuela) Ulises Cortés (UPC) Pilar Dellunde (IIIA-CSIC) Sergio Escalera (UB) Zoe Falomir (U. Bremen, Alem) Ricard Gavalda (UPC) Ana García Fornés (UPV) Karina Gibert (UPC)

Maite López Sánchez (UB) Felip Manyà (IIIA-CSIC) Carles Mateu (UdL) Joaquim Meléndez (UdG) Pedro Meseguer (IIIA-CSIC) Antonio Moreno (URV) Lledó Museros (UJI) Angela Nebot (UPC) Eva Onaindia (UPV) José Oncina (UA) Mihaela Oprea (U. Ploiesti, Romania) Jordi Planes (UdL) Domenec Puig (URV) Oriol Pujol Vila (UB) Josep Puyol Gruart (IIIA-CSIC) Petia Radeva (UAB) David Riaño (URV) Andrea Rizzoli (IDSIA, Suïssa) Horacio Rodríguez(UPC) Francisco J. Ruíz (UPC) Jordi Sabater Mir (IIIA-CSIC) J. Salvador Sánchez Garreta (UJI) Miquel Sànchez Marrè (UPC) Ismael Sanz (UJI) Ricardo Toledo (UAB)


Elisabet Golobardes (URL) José Manuel Iñesta (UA) Anders Jonsson (UPF) Vicente Julián (UPV) Mikhail Kanevski (UNIL, Suïssa) Jordi Levy (IIIA-CSIC) Ramón López de Mántaras (IIIA-CSIC) Beatriz López (UdG)

Vicenç Torra (IIIA-CSIC) Carme Torras (IRI-CSIC/UPC) Aida Valls (URV) Maria Vanrell (UAB) Mateu Villaret (UdG) Jordi Vitrià (UB) Franz Wotawa (U. Graz, Austria)

Local Organizing Committee
Oriol Pujol, Universitat de Barcelona
Lluís Garrido, Universitat de Barcelona
Sergio Escalera, Universitat de Barcelona
Simone Balocco, Universitat de Barcelona
Maria Salamó, Universitat de Barcelona
Inmaculada Rodríguez, Universitat de Barcelona
Victor Ponce, Universitat de Barcelona
Eloi Puertas, Universitat de Barcelona
Alex Pardo, Universitat de Barcelona
Albert Clapes, Universitat de Barcelona
Antonio Hernández, Universitat de Barcelona
Miguel Angel Bautista, Universitat de Barcelona
Santiago Seguí, Universitat de Barcelona
Petia Radeva, Universitat de Barcelona
Laura Igual, Universitat de Barcelona

Organizing Institutions


Contents

Preface v
Lledó Museros, Oriol Pujol and Núria Agell
Conference Organization vii

Machine Learning
Approximate Policy Iteration with Bellman Residuals Minimization 3
Gennaro Esposito and Mario Martin
A Randomized Algorithm for Exact Transduction 13
Gennaro Esposito and Mario Martin
Comparing Feature-Based and Distance-Based Representations for Classification Similarity Learning 23
Emilia López-Iñesta, Francisco Grimaldo and Miguel Arevalillo-Herráez

Computer Vision I
Bag-of-Tracklets for Person Tracking in Life-Logging Data 35
Maedeh Aghaei and Petia Radeva
Improving Autonomous Underwater Grasp Specification Using Primitive Shape Fitting in Point Clouds 45
David Fornas, Jorge Sales, Antonio Peñalver, Javier Pérez, J. Javier Fernández and Pedro J. Sanz
Emotions Classification Using Facial Action Units Recognition 55
David Sanchez-Mendoza, David Masip and Àgata Lapedriza

Decision Support Systems
Intelligent System for Optimisation in Golf Course Maintenance 67
Gerard Pons, Marc Compta, Xavier Berjaga, Filippo Lulli and Josep Maria Lopez
A Hierarchical Decision Support System to Evaluate the Effects of Climate Change in Water Supply in a Mediterranean River Basin 77
Tzu Chi Chao, Luis Del Vasto-Terrientes, Aida Valls, Vikas Kumar and Marta Schuhmacher
A Computational Creativity System to Support Chocolate Designers Decisions 87
Francisco J. Ruiz, Cristóbal Raya, Albert Samà and Núria Agell
Learning by Demonstration Applied to Underwater Intervention 95
Arnau Carrera, Narcís Palomeras, Natàlia Hurtós, Petar Kormushev and Marc Carreras

Social and Cognitive Systems
Understanding Color Trends by Means of Non-Monotone Utility Functions 107
Mohammad Ghaderi, Francisco J. Ruiz and Núria Agell
Analysis of a Collaborative Advisory Channel for Group Recommendation 116
Jordi Pascual, David Contreras and Maria Salamó
Discovery of Spatio-Temporal Patterns from Location Based Social Networks 126
Javier Bejar, Sergio Alvarez, Dario Garcia, Ignasi Gomez, Luis Oliva, Arturo Tejeda and Javier Vazquez-Salceda
Collaborative Assessment 136
Patricia Gutierrez, Nardine Osman and Carles Sierra

Computer Vision II
Analysis of Gabor-Based Texture Features for the Identification of Breast Tumor Regions in Mammograms 149
Jordina Torrents-Barrena, Domenec Puig, Maria Ferre, Jaime Melendez, Joan Marti and Aida Valls
Improvement of Mass Detection in Breast X-Ray Images Using Texture Analysis Methods 159
Mohamed Abdel-Nasser, Domenec Puig and Antonio Moreno
Validating and Customizing a Colour Naming Theory 169
Lledó Museros, Ismael Sanz, Luis Gonzalez-Abril and Zoe Falomir

Fuzzy Logic and Reasoning I
One-Dimensional T-Preorders 183
D. Boixader and J. Recasens
On the Derivation of Weights Using the Geometric Mean Approach for Set-Valued Matrices 193
Vicenç Torra

Fuzzy Logic and Reasoning II
Local and Global Similarities in Fuzzy Class Theory 205
Eva Armengol, Pilar Dellunde and Àngel García-Cerdaña
On the Characterization of the Maximal Ideal Recursive Semantics of RP-DeLP 215
Teresa Alsinet, Ramón Béjar, Lluís Godo and Francesc Guitart

Planning
A Comparison of Two MCDM Methodologies in the Selection of a Windfarm Location in Catalonia 227
Arayeh Afsordegan, Mónica Sánchez, Núria Agell, Juan Carlos Aguado and Gonzalo Gamboa
A System for Generation and Visualization of Resource-Constrained Projects 237
Miquel Bofill, Jordi Coll, Josep Suy and Mateu Villaret

Short Contributions and Applications
Towards a Remote Associate Test Solver Based on Language Data 249
Ana-Maria Olteţeanu and Zoe Falomir
Fear Assessment: Why Data Center Servers Should Be Turned Off 253
Damián Fernández-Cerero, Alejandro Fernández-Montes, Luis González-Abril, Juan A. Ortega and Juan A. Álvarez
Low-Complex Real-Time Breathing Monitoring System for Smartphones 257
Pere Marti-Puig, Gerard Masferrer and Moises Serra
Influencer Detection Approaches in Social Networks: A Current State-of-the-Art 261
Jordi-Ysard Puigbò, Germán Sánchez-Hernández, Mònica Casabayó and Núria Agell
Combinatorial and Multi-Unit Auctions Applied to Digital Preservation of Self-Preserving Objects 265
Jose Antonio Olvera, Paulo Nicolás Carrillo and Josep Lluis De La Rosa
Manifold Learning Visualization of Metabotropic Glutamate Receptors 269
Martha-Ivón Cárdenas, Alfredo Vellido and Jesús Giraldo
Evaluation of Random Forests on Large-Scale Classification Problems Using a Bag-of-Visual-Words Representation 273
Xavier Solé, Arnau Ramisa and Carme Torras
Web Pattern Detection for Business Intelligence with Data Mining 277
Arturo Palomino and Karina Gibert
About Model-Based Distances 281
Gabriel Mattioli
Defining Dimensions in Expertise Recommender Systems for Enhancing Open Collaborative Innovation 285
J. Nguyen, A. Pereda, G. Sánchez-Hernández and C. Angulo
Using the Fuzzy Inductive Reasoning Methodology to Improve Coherence in Algorithmic Musical Beat Patterns 289
Iván Paz-Ortiz, Àngela Nebot, Francisco Mugica and Enrique Romero

Subject Index 293
Author Index 295


Machine Learning


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-3


Approximate Policy Iteration with Bellman Residuals Minimization
Gennaro ESPOSITO a,1 and Mario MARTIN a
a Polytechnical University of Catalunya, 08034 Barcelona, Spain

Abstract. Reinforcement Learning (RL) provides a general methodology for solving complex, uncertain decision problems, which are very challenging in many real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), deeply studied in the literature. We consider Policy Iteration (PI) algorithms for RL, which iteratively evaluate and improve control policies. When handling problems with continuous or very large state spaces, generalization is mandatory: the generalization ability of an RL algorithm determines how well it predicts values for unexplored states. A natural candidate for value function approximation is Support Vector Regression (SVR), known for its good generalization properties. SVR has been used in batch RL frameworks, but smart implementations of exact incremental SVR can extend its generalization ability to online RL, where the expected reward of states changes constantly with experience. Our online SVR is thus a novel method that allows fast and accurate estimation of the value function, achieving the RL objective very efficiently. Through simulation tests, the feasibility and usefulness of the proposed approach is demonstrated. Keywords. Reinforcement Learning, Support Vector Machine, Approximate Policy Iteration, Regularization

1. Introduction By using RL an autonomous agent interacting with the environment can learn how to take the optimal action for a specific situation. RL is modeled as a Markov Decision Process (MDP) and RL algorithms can in principle solve nonlinear, stochastic optimal control problems without using a model. The immediate performance is measured by a scalar reward, and the goal is to find an optimal control policy maximizing the value function. In large state state spaces RL solutions cannot always be represented exactly and approximation must be used in general. State-of-the-art RL algorithms use weighted summations of basis functions to approximate the value function. To avoid the need of learning a model, action value functions are computed, making the policy improvement step trivial like in the Least-Squares Policy Iteration (LSPI) algorithm of [5]. However, unlike LSPI which builds on a parametric approximation using least-squares temporal difference learning (LSTD) we build our algorithm on the idea of Bellman Residuals Minimization (BRM) [1]. An important aspect of any method solving RL problems is the way that data are collected and processed. Data collection setting can be categorized as online or offline and the data processing method can be categorized as batch or in1 Corresponding Author: Gennaro Esposito, Department of Languages and Informatics Systems, Polytechnical University of Catalunya, 08034 Barcelona, Spain; E-mail: [email protected].


cremental. In the online setting the behavior policy should be the same as the learning policy or can be updated once every several transitions. On the contrary in the offline setting the agent does not have control on how the data are generated and the agent is provided with a given data set of experiences. In the offline setting the behavior policy is usually stochastic and might be unknown to the agent. Data processing while learning may use a batch or an incremental algorithm. A batch algorithm processing the collected observations can freely access any element at any time. An incremental algorithm continues to learn whenever a new data sample is available while the computation in principle might not directly depend on the whole data set of observations but could rely on the last sample. A possibility is to alternate between phases of exploration, where a set of training examples is grown by interacting with the system, and phases of learning, where the whole batch of observations is used called growing batch learning problem. In practice, the growing batch approach is the modeling of choice when applying batch reinforcement learning algorithms to real systems. Central goal of RL is to develop algorithms that learn online, in which case the performance should improve once every few transition samples. In this paper we propose and empirically study an online algorithm that evaluates policies with the BRM using SVR called online API − BRMε . A crucial difference from the offline case is that policy improvements must be performed once every few samples, before an accurate evaluation of the current policy can be completed. Online API −BRMε collects its own samples making exploration necessary, does not suffer from sub-optimality, always finding the solution to the approximation problem. Being a non-parametric learning method, choosing an appropriate kernel can automatically adapt to the complexity of the problem. After describing some theoretical background we introduce online API − BRMε and provides an experimental evaluation for inverted pendulum and bike balancing problem. 2. Background and notation A finite-action discounted MDP can then be defined as a tuple (S, A, P, R, γ) with S a measurable state space, A a finite set of available actions P is a mapping giving the distribution over R × S with marginals defined as P(·|s, a) (transition probability) while R(·|s, a) represents the expected immediate reward when the agent makes a transition. At stage t an action at ∈ A is selected by the agent controlling the process and in response the     pair (rt , st ) is drawn from the distribution P(r, s |st , at ) i.e (rt , st ) ∼ P(r, s |st , at ) where  rt is the reward the agent receives and st the next MDP state. An agent in RL is usually assumed to be very simple, consisting mainly of an action selection policy such that at = π(st ). More generally a stationary stochastic policy maps states to distributions over the action space with πt (a|s) denoting the probability that the agent will select the action a to perform in state s at time t. Stochastic policies are also called soft when they do not commit to a single action per state. An ε−greedy policy is a soft policy which for some 0 ≤ ε ≤ 1 picks deterministically a particular action with probability 1 − ε and a uniformly random action with probability ε. We will then use a ∼ π(·|s) to indicate that action a is chosen according to the probability function in state s. 
For an agent following the policy π, considering the sequence of rewards {r_t : t ≥ 1} when the MDP is started in the state-action pair (s_1, a_1) ∼ ν(s, a) ∈ M(S × A), the action value function Q^π is defined as Q^π(s, a) = E{∑_{t=1}^∞ γ^{t−1} r_t | s_1 = s, a_1 = a, π}, where γ ∈ [0, 1] is a discount factor. A policy π = π̂(·, Q) is greedy w.r.t. an action value function


Q if ∀ s ∈ S we choose π(s) = arg max_{a∈A} Q(s, a). The Bellman operator for the action value function is defined as (T^π Q)(s, a) = ∫ P(dr, ds'|s, a) (r + γ ∑_{a'∈A} π(a'|s') Q(s', a')). Given a policy π, Q^π = T^π Q^π is a fixed point of the Bellman operator for the action value function. Approximating the value function can be done using regularization in a given Hilbert space H defined by a kernel function κ(s, a, s_t, a_t), which can be easily implemented in the framework of Support Vector Regression [4]. The collected data D_n = {(s_1, a_1, s'_1, r_1), ..., (s_n, a_n, s'_n, r_n)} can in general be assumed non-i.i.d. according to some unknown distribution and used to define the empirical operators. Given the policy π and the data set D_n, the empirical Bellman operator is defined as (T̂^π Q)(s_t, a_t) = r_t + γ ∑_{a'∈A} π(a'|s'_t) Q(s'_t, a'), which provides an unbiased estimate of the Bellman operator.

3. Approximate Policy Iteration with Bellman Error Minimization

Policy iteration (PI) is a method for discovering the optimal policy of any given MDP, providing an iterative procedure in the space of policies. PI discovers the optimal policy by generating a sequence of monotonically improving policies. Each iteration consists of two phases: policy evaluation computes the state-action value function Q_k of the current policy π_k by solving the linear system of the Bellman equations, and policy improvement defines the improved greedy policy π_{k+1} over Q^{π_k} as π_{k+1}(s) = arg max_{a∈A} Q_k(s, a). Exact representations and methods are impractical for large state and action spaces; in such cases, approximation methods are used. Approximations in the policy iteration framework can be introduced into the representation of the value function and/or the representation of the policy. This form of policy iteration is known as Approximate Policy Iteration (API). One way to implement API is through so-called Bellman Residuals Minimization (BRM). In this case API proceeds at iteration k by evaluating π_k, choosing Q_k such that the Bellman residuals ε_k^{BR} = |Q_k − T^{π_k} Q_k| are small; API then calculates π_{k+1} = π̂(·, Q_k), producing the sequence Q_0 → π_1 → Q_1 → ... BRM minimizes the Bellman error of the Bellman residuals of Q, given the distribution ν of the input data, using SVR. It can be formulated considering the Bellman residuals BR(s, a) = Q − T^π Q = Q^π_r(s, a) − r(s, a), where the approximating function is Q^π_r(s, a) = Q(s, a) − γ ∫ P(ds'|s, a) ∑_{a'∈A} π(a'|s') Q(s', a') while r(s, a) = E[r̂|s, a]. Using the ε-insensitive loss ℓ_ε(Q − T^π Q) = max(0, |Q − T^π Q| − ε), the BRM loss can be written as L_BRMε(Q, π) = E[ℓ_ε(Q^π_r − r)] = E[ℓ_ε(Q − T^π Q)]. The empirical estimate is L̂_BRMε(Q, Π_n, D_n) = E_n[ℓ_ε(Q̂^π_r − r̂)] with Q̂^π_r(s_t, a_t, s'_t) = Q(s_t, a_t) − γ ∑_{a'∈A} π_t(a'|s'_t) Q(s'_t, a'), and the BRMε optimization problem becomes Q̂ = arg min_{Q∈H} {L̂_BRMε(Q, Π_n, D_n) + λ ||Q||²_H}, where the regularization term uses the norm in the Hilbert space H. BRMε shows a remarkable sparsity property in the solution, which essentially relies on the training support vectors. L̂_BRMε(Q, Π_n, D_n) is an almost unbiased estimator of L_BRMε(Q, π). In practice the empirical estimate can be biased whenever slacks are present, i.e. when the errors of the regression function are above the fixed threshold ε; it is unbiased when the error is contained in the resolution tube of the SVR. Nevertheless, the choice of the SVR parameters C and ε gives a way to control this effect.
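As a concrete illustration of the empirical operators above, the short Python sketch below computes the empirical Bellman residuals Q − T̂^π Q on a batch of transitions and the corresponding ε-insensitive BRM loss. The tabular Q-function, the policy matrix and the transition format are illustrative assumptions for this sketch only; the paper itself uses a kernel-based approximator.

import numpy as np

def bellman_residuals(Q, policy, transitions, gamma):
    """Empirical residuals BR(s_t, a_t) = Q(s_t, a_t) - (r_t + gamma * E_{a'~pi}[Q(s'_t, a')]).

    Q:           array (n_states, n_actions), a tabular stand-in for the approximator.
    policy:      array (n_states, n_actions), each row a probability distribution.
    transitions: iterable of (s, a, r, s_next) tuples sampled from the MDP.
    """
    residuals = []
    for s, a, r, s_next in transitions:
        expected_next = np.dot(policy[s_next], Q[s_next])   # sum_a' pi(a'|s') Q(s', a')
        residuals.append(Q[s, a] - (r + gamma * expected_next))
    return np.array(residuals)

def brm_eps_loss(residuals, eps):
    """Empirical eps-insensitive BRM loss: mean of max(0, |BR| - eps)."""
    return np.mean(np.maximum(0.0, np.abs(residuals) - eps))

# Tiny usage example with made-up numbers.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 2))
policy = np.full((4, 2), 0.5)                 # uniform stochastic policy
data = [(0, 1, -1.0, 2), (2, 0, 0.0, 3), (3, 1, 0.0, 1)]
print(brm_eps_loss(bellman_residuals(Q, policy, data, gamma=0.95), eps=0.01))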
We analyze BRMε by solving the geometrical version of the SVR and the corresponding constrained optimization problem using Lagrange multipliers. Consider the set of observed samples D_n and express the approximation of the value function as Q(s, a) = ⟨Φ(s, a), w⟩ + b, with weight vector w = (w_1, ..., w_d)^T and feature vector Φ(s, a) = (φ_1(s, a), ..., φ_d(s, a))^T, from which we may build the kernel function κ(s_t, a_t, s, a) = ⟨Φ(s_t, a_t), Φ(s, a)⟩. We assume that the action value function belongs to the Hilbert space, Q ∈ H, and can be expressed as Q(s, a) = ∑_t α_t κ(s, a, s_t, a_t). The Bellman residual at each training point for a fixed policy π is BR(s_t, a_t) = Q(s_t, a_t) − T^π Q(s_t, a_t) = Q^π_r(s_t, a_t) − r(s_t, a_t), and substituting the functional form yields BR(s_t, a_t) = ⟨Φ(s_t, a_t), w⟩ − γ ∫ P(ds'|s_t, a_t) ∑_{a'∈A} π(a'|s') ⟨Φ(s', a'), w⟩ + (1 − γ)b − r(s_t, a_t). It is worth noting that in the expression of the Bellman operator we used the average over the policy, ∑_{a'∈A} π(a'|s'_t) Q(s'_t, a'). Using the average helps exploiting the knowledge of the policy, which may prevent stochasticity; this can be thought of as an extension of the Expected Sarsa algorithm [10], used in tabular RL methods, to the action value function approximation case. So we may define the Bellman feature mapping as Ψ^π(s_t, a_t) = Φ(s_t, a_t) − γ ∫ P(ds'|s_t, a_t) ∑_{a'∈A} π(a'|s') Φ(s', a'), taking into account the structure of the MDP dynamics. The Bellman residuals are now expressed as BR(s_t, a_t) = ⟨Ψ^π(s_t, a_t), w⟩ + (1 − γ)b − r(s_t, a_t), and using Ψ^π(s_t, a_t) we define the Bellman kernel κ̃^π(s_t, a_t, s, a) = ⟨Ψ^π(s_t, a_t), Ψ^π(s, a)⟩. The function Q^π_r belongs to the Bellman Hilbert space H_{Ψ^π} and can be expressed as Q^π_r(s, a) = ∑_t β_t κ̃^π(s, a, s_t, a_t), where the policy and the MDP dynamics are incorporated in H_{Ψ^π}. Hence, while the kernel κ corresponding to the feature mapping Φ(·) is given by κ(s_t, a_t, s, a) = ⟨Φ(s_t, a_t), Φ(s, a)⟩, the Bellman kernel κ̃ corresponding to the feature mapping Ψ^π(·) is given by κ̃^π(s_t, a_t, s, a) = ⟨Ψ^π(s_t, a_t), Ψ^π(s, a)⟩. The weighting vector w can be found by requiring small Bellman residuals, |BR(s_t, a_t)| ≤ ε, using the Bellman kernel and the feature mapping Ψ^π(s, a), searching for a solution of the regression function Q^π_r(s, a) = ⟨Ψ^π(s, a), w⟩ + b with Q^π_r ∈ H_{Ψ^π}, and solving the SVR problem

  min_{w,b,ξ,ξ*}   (1/2) ||w||²_{H_{Ψ^π}} + C ∑_{t=1}^n (ξ_t + ξ*_t)                    (1)

  s.t.   r(s_t, a_t) − ⟨Ψ^π(s_t, a_t), w⟩ − (1 − γ)b ≤ ε + ξ_t
        −r(s_t, a_t) + ⟨Ψ^π(s_t, a_t), w⟩ + (1 − γ)b ≤ ε + ξ*_t
         ξ_t, ξ*_t ≥ 0
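To make the kernel used in Problem (1) concrete, the sketch below assembles the Gram matrix of the empirical Bellman kernel from an ordinary state-action kernel and a fixed policy, replacing the expectation over next states by the single observed next state of each sample, exactly as in the empirical operators above. Function and variable names (rbf_kernel, bellman_kernel_matrix, policy(a, s)) are illustrative assumptions, not the authors' implementation.

import numpy as np

def rbf_kernel(z1, z2, inv_lengthscale=1.0):
    """Base state-action kernel kappa(z1, z2); z is the concatenated (state, action) vector."""
    d = np.asarray(z1, dtype=float) - np.asarray(z2, dtype=float)
    return np.exp(-0.5 * inv_lengthscale**2 * np.dot(d, d))

def bellman_kernel_matrix(samples, policy, actions, gamma, kappa=rbf_kernel):
    """Empirical Bellman kernel Gram matrix, entry (i, j) = <Psi_hat(z_i), Psi_hat(z_j)>,
    with Psi_hat(z_t) = Phi(s_t, a_t) - gamma * sum_b pi(b|s'_t) Phi(s'_t, b).

    samples: list of (s, a, s_next); policy(b, s): probability of action b in state s;
    actions: the finite action set A.
    """
    z = lambda s, a: np.concatenate([np.atleast_1d(s), np.atleast_1d(a)])

    def psi_dot(i, j):
        s_i, a_i, sn_i = samples[i]
        s_j, a_j, sn_j = samples[j]
        k = kappa(z(s_i, a_i), z(s_j, a_j))
        k -= gamma * sum(policy(b, sn_j) * kappa(z(s_i, a_i), z(sn_j, b)) for b in actions)
        k -= gamma * sum(policy(b, sn_i) * kappa(z(sn_i, b), z(s_j, a_j)) for b in actions)
        k += gamma**2 * sum(policy(b, sn_i) * policy(c, sn_j) * kappa(z(sn_i, b), z(sn_j, c))
                            for b in actions for c in actions)
        return k

    n = len(samples)
    return np.array([[psi_dot(i, j) for j in range(n)] for i in range(n)])

With this Gram matrix and the observed rewards as regression targets, Problem (1) has the shape of a standard SVR with a precomputed kernel, up to the (1 − γ)b bias term.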

Once the Bellman kernel κ̃^π(s_t, a_t, s, a) and the rewards r(s_t, a_t) are provided, Problem 1 can in principle be solved using any standard SVM package. SVR can be solved very efficiently using an incremental algorithm (see [8], [6] for SVM and [7] for the extension to SVR) which updates the trained SVR function whenever a new sample is added to the training set D_n. The basic idea is to change the Lagrangian coefficient corresponding to the new sample in a finite number of discrete steps until it meets the KKT conditions, while ensuring that the existing samples in D_n continue to satisfy the KKT conditions at each step.

4. API−BRMε algorithm description

SVMs are powerful tools, but their computation and storage requirements increase rapidly with the number of training points. The core of an SVM is a quadratic programming problem, separating support vectors from the rest of the training data. The speed of learning depends on the number of support vectors, which also influences performance. Using SVR with the ε-insensitive loss function allows us to build a non-parametric approximator that is intrinsically sparse; the sparsity of the solution directly depends on the combination of kernel and loss function parameters. The incremental version of API−BRMε can be implemented in an online setting whenever the agent interacts with the environment.
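A possible shape of that online loop is sketched below in Python; it anticipates Algorithm 1 further down. The env.reset/env.step interface, the IncrementalBRMSVR-style object with add_sample and predict_q methods, and the decay constant are assumptions made for this sketch, not the Matlab/C implementation used by the authors.

import copy
import random

def online_api_brm(env, svr, actions, K_P=10, eps0=1.0, eps_min=0.1, decay=0.995, n_steps=10_000):
    """Online API-BRM_eps: act eps-greedily, update the incremental SVR after every
    transition, and improve the policy every K_P steps.

    Assumed interfaces: env.reset() -> state, env.step(a) -> (next_state, reward, done);
    svr.add_sample(s, a, r, s_next, policy) performs one incremental BRM_eps update,
    svr.predict_q(s, a) evaluates the current action value estimate.
    """
    frozen = copy.deepcopy(svr)                 # policy is greedy w.r.t. this frozen evaluator

    def policy(s):
        return max(actions, key=lambda a: frozen.predict_q(s, a))

    eps, s = eps0, env.reset()
    for k in range(1, n_steps + 1):
        eps = max(eps_min, eps * decay)                        # exponential exploration decay
        a = random.choice(actions) if random.random() < eps else policy(s)
        s_next, r, done = env.step(a)
        svr.add_sample(s, a, r, s_next, policy)                # incremental policy evaluation
        if k % K_P == 0:                                       # partially optimistic improvement
            frozen = copy.deepcopy(svr)
        s = env.reset() if done else s_next
    return policy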


We adapt the incremental SVR formulation given in [7] to our API−BRMε. After each incremental step it allows us to implement a policy evaluation, updating the approximation of the action value function; one might then perform the policy improvement and update the policy. Online API−BRMε collects its own samples by interacting with the system using the current behavior policy. As a consequence, some exploration has to be added to the policy, which becomes soft. In principle one may assume that the action value function estimates remain similar for subsequent policies, or at least do not change too much. Another possibility would be to rebuild the action value function estimates from scratch before every update; unfortunately this alternative can be computationally costly and is not necessary in practice. The API−BRMε algorithm was implemented using a combination of Matlab and C routines and was tested on the inverted pendulum and the bicycle balancing benchmarks. The simulation is implemented using a generative model capable of simulating the environment, while the learning agent is represented by the API−BRMε algorithm. To rank performance it is necessary to introduce some metrics measuring the quality of the solutions. In each benchmark a specific goal is foreseen, and a performance measure is represented by the fulfillment of a given task. The quality of a solution can be measured by defining the score of a policy [2] for a given set of initial states S_0, where we compute the average expected return of the stationary policy chosen independently from the set of tuples D_n. Given the learned policy π̂, its score is defined by Score_π̂ = (1/|S_0|) ∑_{s_0∈S_0} R̂^π̂(s_0), where R̂^π̂(s_0) is the empirical estimate of R^π̂(s) = E[∑_{t=0}^{n−1} γ^t r(s_t, π̂(s_t)) | s_0 = s], the average return. In all our experiments, for any pair z_i = (s_i, a_i) and z_j = (s_j, a_j) we use the RBF kernel κ(z_i, z_j) = exp(−(1/2)(z_i − z_j)^T Σ² (z_i − z_j)), where Σ is a diagonal matrix specifying the weight of each state-action vector component. Using this kernel also allows us to handle variants of the problems where the action space may be considered continuous or may present some noise. Apart from the matrix Σ, we also have to define the parameters (C, ε) of the SVR. We performed a grid search to find an appropriate set of parameters (Σ, C, ε), looking at the resulting performance of the learning system; in fact, using different sets of parameters might help finding good policies. Another important aspect which may affect the performance of the algorithm is the way we collect data, and therefore how we manage the compromise between the need for exploration and the exploitation of the learned policy. We run experiments using two different methods. Method-1 (online API−BRMε): some data are generated offline using a random behavior policy, which produces a set D_0 of i.i.d. tuples from a set of different initial states S_0, used to initialize the algorithm; API−BRMε then proceeds incrementally, adding new experiences and improving the policy every K_P steps using an ε-greedy policy. Exploration relies on an exponential decay of ε, starting from ε_0 ≤ 1 down to a minimum value ε_∞ = 0.1. Method-2 (online-growth API−BRMε): alternate explorative samples generated with a random behavior policy with exploitative samples every K_e steps, obtained with an ε-greedy policy learned using a small exploration ε. Algorithm 1 illustrates the online variant of API−BRMε using an ε-greedy exploration policy. The algorithm introduces two parameters which are not present in the offline version: the number of transitions K_P ∈ N_0 between consecutive policy improvements and the exploration schedule K_e. The policy is fully optimistic whenever K_P = 1, so that the policy is updated after every sample, and partially optimistic when 1 < K_P ≤ K_max, where in the experiments we choose K_max = 10. The exploration sched-


Algorithm 1 Online BRMε Policy Iteration with ε-greedy exploration
Require: (κ, λ, γ, K_P, ε_k function)
  l ← 0
  initialize Q̂_0(s, a) (and implicitly π̂_0) arbitrarily for all (s, a)
  solve initial SVR Q_1 ← BRMε(Π_0, D_0, κ, λ)    (policy evaluation)
  store initial next-state policy Π_0
  measure initial state s_0
  for all time steps k > 0 do
    update exploration factor ε_k
    choose action a_k = π_k(·) w.p. 1 − ε_k, or a uniformly random action w.p. ε_k
    apply a_k and measure next state s_{k+1} and reward r_{k+1}
    update training sample set D_k ← D_{k−1} ∪ (s_k, a_k, r_{k+1}, s_{k+1})
    update next-state policy Π_k ← Π_{k−1} ∪ π_k(·, s_{k+1})
    solve incremental SVR Q̂_k ← Incremental-BRMε(Π_k, D_k, κ, λ)
    if k = (l + 1) K_P then
      update policy π_l(·) ← π̂(·, Q̂_{k−1})
      l ← l + 1
    end if
  end for
  return

ule can be controlled by the parameter Ke and the decay factor decay which should be chosen not too large while a significant amount of exploration is necessary. In this way we perform learning alternating exploration trials using some exploration factor ε f with exploitation trials using a ε0 . Proof of the convergence of Algorithm 1 under general hypothesis can be found in [3] and is omitted for brevity. 5. The inverted pendulum control problem The inverted pendulum control problem consists in balancing at the upright position a pendulum of unknown length and mass. This can be done by applying a force on the cart where the pendulum cart is attached to [11] Due to its simplicity but still challenging control task, this benchmark is widely used to test the performance of state of the art methods for function approximation in RL. The state space S \ ST = {(θ , θ˙ ) ∈ R2 } is continuous and consists of the vertical angle θ and the angular velocity θ˙ of the inverted pendulum and a terminal state ST described later. Three discrete actions are allowed A = {−am , 0, am } where am = 50N and some uniform noise in σa ∈ [−10, 10] might be added to the chosen action. The transitions are governed by nonlinear dynamics described in [11]. The angular velocity θ˙ is restricted to [−4π, 4π]rads−1 using saturation. The discrete-time dynamics is obtained by discretizing the time between t and t + 1 chosen with dt = 0.1s. If θt+1 is such that |θt+1 | > θm a terminal state ST = {|θ | > θm } is reached where we fixed θm = π/2. The reward function r(st , at ) is defined through the following expression: r(st , at ) = {−1 i f |θ | > θm , 0 otherwise} The discount factor γ has been chosen equal to 0.95. The dynamical system is integrated by using an Euler method with a 0.001s integration time step. To generate data samples we may consider episodes starting from the same initial state s0 = (θ0 , θ˙0 ) or using a random initial state and stopping when the pole leaves the region represented by S \ ST meaning enter in a terminal states ST . In our simulation using online API − BRMε we run


Figure 1. Inverted pendulum: representative subsequences of the policy found by online API−BRMε using Method-1 (actions are discretized and only three grey levels show up). [Three policy snapshots; panel labels read "simulation time (sec)".]

[Figure 2: two panels, "Policy score" and "Average balancing time"; y-axes: score and balancing time (s); x-axes: simulation time (s).]

Figure 2. Inverted pendulum: (left) average score of online API − BRMε with KP = 10 over a grid of initial states; (right) average balancing time over the same grid of initial states using Method-1

for 1000s of simulated time collecting around 10000 samples using Method-1. Run was split into separate learning episodes initiated at random initial states and stopping when a terminal state has been reached or otherwise after 30s (300 steps). Policy improvement were performed once every KP = 10 steps (0.1s) using an ε−greedy policy with ε0 = 1 and reaching a value of ε∞ = 0.1 after 350s. We also used an RBF kernel with parameters Σ = I3 σ with σ = 0.5 and the regression parameters where chosen as C = 10 and ε = 0.01 selected using a grid search. Figure 1 shows a subsequence of policies found during representative run taken after simulation times t = 10s, 50s, 200s, 1000s. Clearly the generalization ability of the SVR makes possible to capture the structure of the approximated policy only after 50s (500 steps) of simulation time which closely resembles the final policy obtained after 1000s of simulation time. Figure 2 shows the performance of the final policy found by online API − BRMε along the online learning process. The performance was measured evaluating the score over a grid of initial states simulating balancing up to 300s (3000 steps). Also on the right we shows the balancing time of the pendulum which come close to 300s (3000 steps) in less than 50s (500 steps) of simulated time. Simulation of the same benchmark with parametric approximation like LSPI [5] needs to almost double the number of samples to reach similar performances


Figure 3. Inverted pendulum: States and actions in representative subsequences of learning trials. Each trial lasts 30s max considered as the minimum balancing to reach. Using Method-2 (online growth) with a fixed initial state S0 = (0, 0) API − BRMε learns a local policy in a few episodes (20s of simulation time).

(while Q-learning needs even more). In Figure 3 we report states and actions in subsequences of learning trials. Each trial lasts 30s max considered as the minimum balancing to reach. Using Method-2 (online growth) with a fixed initial state S0 = (0, 0) a local approximation can be found in less then 30s (300 steps) of simulation time. Finally the number of support vectors necessary to represents the approximate action value function with the set of parameters used in the approximation usually stays below 5% of the total number of collected samples which is quite sparse giving also an indication of the quality of the approximation. 6. The bicycle balancing control problem We consider the control problems related to a bicycle [9] moving at constant speed on an horizontal plane. Bike balancing is a quite challenging RL problem to solve due to the large number of state variables. For the bicycle balancing the agent has to learn how to balance the bicycle. The system has a continuous time dynamics described in [9] ˙ θ , θ˙ , ψ) ∈ R5 | ω ∈ and is composed of five state continuous variables S \ ST = {(ω, ω, π π [−ωm , ωm ] θ ∈ [−θm , θm ] ωm = 15 rad θm = 2.25 rad} plus a terminal state ST . Four states are related to the bicycle itself and three to the position of the bicycle on the plane. The state variables related to the bicycle are ω, ω˙ (the angle and radial speed from vertical to the bicycle), θ , θ˙ (the angle and radial speed the handlebars are displaced from normal). If |ω| > ωm the bicycle has fallen down reaching a terminal state ST . The state variables related to the position of the bicycle on the plane are the coordinates (xb , yb ) of the contact point of the back tire with the horizontal plane and the angle Ψ formed by the bicycle with the x-axis. The actions space is discretized A = {(u, T ) ∈ {−0.02, 0, 0.02} × {−2, 0, 2}} composed of 9 elements, depends on the the torque T applied to the handlebars and the displacement d of the rider. Some noise might be present in the system, added to action d and uniformly distributed σdt = [−0.02, 0.02]. Details of the dynamic can be found in [9] and it holds valid if |ωt+1 | ≤ ωm while if |ωt+1 | > ωm the bicycle is supposed to have fallen down reaching a terminal state ST . We suppose that the state variables (xb , yb ) cannot be observed. Since these two state variables do not intervene in the dynamics of the other state variables nor in the reward functions. Hence they may be considered no relevant variables which does not make the control problem partially observable. The bicycle balancing control problem has reward function defined as r(st , at ) = {−1 i f |ωt+1 | > ωm 0 otherwise}. The value of the discount factor γ has been chosen for both problems equal to 0.98. The dynamical system is integrated by using an Euler method with a 0.001s integration time step. To generate data samples we

[Figure 4: two panels, "Policy score" and "Average balancing time"; y-axes: score and balancing time (s); x-axes: simulation time (s).]

Figure 4. Bike balancing: performance of online API − BRMε with KP = 10 using Method-1

may consider episodes starting from the same initial state corresponding to the bicycle standing and going in straight line with s0 = (ω0 , ω˙ 0 , θ0 , θ˙0 , ψ0 ) = (0, 0, 0, 0, Ψ0 ) with a fixed value of Ψ or chosen at random Ψ0 ∈ [−π, π] and stopping when the bicycle leaves the region represented by S \ ST meaning a terminal state ST . In our simulation of this benchmark using online API − BRMε we run for 500s of simulated time collecting around 50000 samples. Run was split into separate learning episodes initiated at random initial states s0 = (0, 0, 0, 0, Ψ0 ) with Ψ0 ∈ [−π, π] and stopping when a terminal state has been reached or otherwise after 1s (100 steps). Policy improvement were performed once every KP = 10 steps (0.1s) using an ε−greedy policy with ε0 = 1 and reaching a value of ε∞ = 0.1 after 200s (20000 steps). We also used an RBF kernel with parameters Σ = I7 σ with σ = 1.5 and the regression parameters where chosen as C = 10 and ε = 0.01 selected using a grid search. Left Figure 4 shows the performance of the final policy found by online API − BRMε along the online learning process. The performance was measured evaluating the score over a grid of initial states S0 = {0, 0, 0, 0, Ψ0 } with Ψ0 ∈ [−π, π]. Right Figure 4 shows the balancing time of the bicycle which come close to 500s (50000 steps) in less than 500s (50000 steps) of simulated time. Simulation of the same benchmark with parametric approximation like LSPI [5] needs to almost double the number of samples to reach similar performances (while Q-learning needs even more). In Figure 5 we report states and actions in subsequences of learning trials. Each trial lasts 50s max (5000 steps) considered as the minimum balancing to reach the goal. Using Method-2 (online growth) with a fixed initial state S0 = (0, 0, 0, 0, Ψ0 ) a local approximation can be found in less then 50s of simulation time. In the lower part of Figure 5 we also show some of the trajectories during the learning process as well as the final one. Finally the number of support vectors necessary to represent the approximate action value function with the set of parameters used in the approximation usually stays below 5% of the total number of collected samples which is quite sparse giving also an indication of the quality of the approximation.
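The score metric of Section 4, used to produce the curves of Figures 2 and 4, can be estimated with plain Monte Carlo rollouts of the learned policy on the generative model. The sketch below assumes an env.reset(s0)/env.step(a) simulator interface; it is an illustration of the metric, not the authors' evaluation code.

import numpy as np

def policy_score(env, policy, initial_states, gamma, horizon):
    """Score_pi = (1/|S0|) * sum_{s0 in S0} Rhat^pi(s0), where
    Rhat^pi(s0) = sum_{t=0}^{horizon-1} gamma^t r(s_t, pi(s_t)) from one rollout."""
    returns = []
    for s0 in initial_states:
        s, ret, disc = env.reset(s0), 0.0, 1.0
        for _ in range(horizon):
            s, r, done = env.step(policy(s))
            ret += disc * r
            disc *= gamma
            if done:
                break
        returns.append(ret)
    return float(np.mean(returns))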

7. Conclusions The main contribution of this work is the experimental analysis of a non-parametric approximation algorithm for the generalization problem in RL using policy iteration and SVR. We developed a model free kernel based online SVR approximation intrinsically sparse using the Bellman Residual Minimization. We also studied its practical implementation issues in standard RL benchmarks where the method shows the ability to find good policies in continuous state RL problems.


Figure 5. Bike balancing: (Upper, A) States and actions in representative subsequences of learning trials. Each trial lasts 50 s max (5000 steps), considered sufficient to reach the goal. Using Method-2 (online-growth) with small perturbations of a fixed initial state S_0 = (0, 0, 0, 0, π/2), API−BRMε may learn a local policy in a few episodes (50 s (5000 steps) of simulation time). (Lower) Sketch of the trajectory (B: zoom, C: overall) in the time interval (0, 500 s) for the bicycle on the (x_b, y_b) plane, controlled by the final policy of API−BRMε.

References
[1] Leemon Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth ICML, pages 30–37. Morgan Kaufmann, 1995.
[2] Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.
[3] Gennaro Esposito. Explorations of Generalization Methods using Kernel-Based Policy Iteration in Reinforcement Learning. PhD thesis, UPC, Barcelona, 2014.
[4] Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio. A unified framework for regularization networks and support vector machines. Technical report, Cambridge, MA, USA, 1999.
[5] Michail G. Lagoudakis and Ronald Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.
[6] Pavel Laskov, Christian Gehl, Stefan Krüger, Klaus-Robert Müller, Kristin Bennett, and Emilio Parrado-Hernández. Incremental support vector learning: Analysis, implementation and applications. Journal of Machine Learning Research, 7, 2006.
[7] Mario Martin. On-line support vector machine regression. In Proceedings of the 13th European Conference on Machine Learning, ECML '02, pages 282–294, London, UK, 2002. Springer-Verlag.
[8] Gert Cauwenberghs and Tomaso Poggio. Incremental and decremental support vector machine learning. Advances in Neural Information Processing Systems, 13:409–415, 2001.
[9] Jette Randløv and Preben Alstrøm. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 463–471, 1998.
[10] Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering. A theoretical and empirical analysis of Expected Sarsa. In ADPRL 2009: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages 177–184, March 2009.
[11] H.O. Wang, K. Tanaka, and M.F. Griffin. An approach to fuzzy control of nonlinear systems: stability and design issues. IEEE Transactions on Fuzzy Systems, 4(1):14–23, February 1996.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-13


A Randomized Algorithm for Exact Transduction
Gennaro ESPOSITO a,1 and Mario MARTIN a
a Polytechnical University of Catalunya, 08034 Barcelona, Spain

Abstract. Random sampling is an efficient method for dealing with constrained optimization problems. In computational geometry, it has been applied, through Clarkson's algorithm [10], to solve a general class of problems called violator spaces. In machine learning, TSVM is a learning method used when only a small fraction of labeled data is available, and it implies solving a non-convex optimization problem. Several approximation methods have been proposed to solve it, but they usually find suboptimal solutions. The global optimal solution may be obtained using exact techniques, at the cost of a time complexity that is exponential with respect to the number of instances. In this paper, an interpretation of TSVM in terms of violator spaces is given. A randomized method is then presented which extends the use of exact methods by reducing the time complexity to sub-exponential: exponential w.r.t. the number of support vectors of the optimal solution instead of exponential w.r.t. the number of instances. Keywords. Transduction, Semi-Supervised Learning, Support Vector Machine, Violator Spaces, Branch and Bound

1. Introduction In computational geometry, random sampling is an efficient method to deal with constrained optimization problems. First, one finds the optimal solution subject to a random subset of constraints. The expected number of constraints violating that solution is then likely to be significantly smaller than the overall number of remaining constraints, and in some lucky cases the solution found does not violate the remaining constraints at all. Hence, one can exploit this property to build a simple randomized algorithm. Clarkson's algorithm [10] is a two-stage random sampling technique able to solve linear programming problems, which can also be applied to the more general framework of violator spaces. The violator space framework has become a well-established tool in the field of geometric optimization, yielding subexponential algorithms that start from a randomized variant of the simplex method. The class of violator spaces includes the problem of computing the minimum-volume ball or ellipsoid enclosing a given point set in R^d, the problem of finding the distance between two convex polytopes in R^d, and many other computational geometry problems [6]. The generalization to violator space problems makes the approach applicable to a number of non-linear and mostly geometric problems. Clarkson's algorithm stages are based on random sampling and are conceptually very simple. 1 Corresponding Author: Gennaro Esposito, Department of Languages and Informatics Systems, Polytechnical University of Catalunya, 08034 Barcelona, Spain; E-mail: [email protected].
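A generic skeleton of the sampling idea just described might look as follows in Python. The primitives solve_subset (an exact solver on a small constraint set) and violates (the violation test) are placeholders for problem-specific components, so this is only an illustration of the control flow under those assumptions, not Clarkson's full two-stage algorithm as used in the paper.

import random

def sampled_solve(constraints, solve_subset, violates, r, max_rounds=1000):
    """Iterated random sampling over a violator space with vote doubling.

    Each (hashable) constraint starts with one voting slip; each round draws r slips
    without repetition, solves the subproblem exactly, and doubles the slips of the
    violated constraints so they are sampled more often later. Stops when the
    sampled solution violates no constraint.
    """
    weights = {h: 1 for h in constraints}
    for _ in range(max_rounds):
        population = [h for h in constraints for _ in range(weights[h])]
        sample = random.sample(population, min(r, len(population)))
        solution = solve_subset(set(sample))
        violators = [h for h in constraints if violates(h, solution)]
        if not violators:
            return solution
        for h in violators:
            weights[h] *= 2
    raise RuntimeError("no violator-free solution found within max_rounds")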


Once it is shown that a particular optimization problem can be regarded as a violator space problem, and certain algorithmic primitives are implemented for it, the algorithm is immediately applicable. In machine learning, the Transductive Support Vector Machine (TSVM) [1] extends Support Vector Machines (SVM) to handle partially labeled datasets. TSVM learns the maximum margin hyperplane classifier using the labeled training data, such that no unlabeled data lie inside its margin. Unfortunately, dealing with a TSVM implies solving a non-convex optimization problem. A wide spectrum of approximate techniques has been applied to solve the TSVM problem [3], but they do not guarantee finding the global optimal solution. In fact, when state-of-the-art approximate TSVM methods have been applied to different benchmark problems, solutions far from optimal have been found [3]. Exact methods can in practice be applied only to small datasets, due to their exponential time complexity with respect to the number of instances. Balcázar et al. [7] show that a hard margin SVM belongs to the class of violator spaces, proposing a random sampling technique for the determination of the maximum margin separating hyperplane. Note that the problem of solving an SVM is convex while solving a TSVM is not, so they are very different in nature. In this paper we show that the optimal solution of a TSVM relies entirely on the set of support vectors, whose size is smaller than the whole set of instances. Moreover, we also show how TSVM can be formulated as a violator space problem, allowing the use of Clarkson's algorithm to find its global optimal solution. Exploiting the TSVM sparsity property, we introduce a randomized algorithm able to reduce the time complexity of exact methods, scaling exponentially w.r.t. the number of support vectors of the optimal solution instead of exponentially w.r.t. the number of instances. Using our method one may find the exact solution independently of the number of instances when the problem has few support vectors.

2. Transductive Support Vector Machines

A TSVM is described in terms of a training set of l labeled examples {(x_i, y_i)}, where x_i ∈ R^d with labels y_i ∈ {±1} and i = 1, ..., l, drawn i.i.d. according to an unknown distribution p(x, y), together with u unlabeled examples {x_k}, k = l+1, ..., n, distributed according to some distribution p(x), with n = l + u the total number of examples. Denoting by w the vector normal to the hyperplane and by b the bias, the problem can be formulated as finding the vector of labels y_u = {y_{l+1}, ..., y_n} (y_k ∈ {±1}) having the maximal geometric margin with a separating hyperplane (w, b), i.e. solving:

    I(w, b, y_u) = min_{(w,b,y_u)}  ‖w‖²₂/2 + C Σ_{i=1}^{l} ξ_i^p + C* Σ_{k=l+1}^{n} ξ_k^p        (1)

    subject to  y_i (wᵀ φ(x_i) + b) ≥ 1 − ξ_i,   ξ_i ≥ 0,   1 ≤ i ≤ l          (L)
                y_k (wᵀ φ(x_k) + b) ≥ 1 − ξ_k,   ξ_k ≥ 0,   l+1 ≤ k ≤ n        (U)

with p = 1 or 2 respectively for linear (L1) and quadratic (L2) losses. The first term controls the generalization capacity while the others, through the slack variables ξ, control the number of misclassified samples. The two regularization parameters (C and C*) reflect our confidence in the known labels. The decision function f ∈ F is represented as f(x_i) = wᵀ φ(x_i) + b with y_i = sign(f(x_i)), assuming there exist a given Hilbert space H and a mapping φ : R^d → H. The mapping sends the example data into a feature space



Figure 1. SEB: extremes (points essential for the solution, in red), violators (points lying outside the ball, in blue), and a basis (a minimal set of points defining the same ball).

generating the kernel and satisfying Mercer's conditions. In this work we refer to quadratic losses since, once y_u is fixed, the associated Hessian matrix is positive definite, yielding a unique and strictly convex objective function. Moreover, the optimization problem is considered computationally more stable for L2 losses. The main results, however, still apply to the L1 loss case. The solution has to be found by solving the constrained optimization problem subject to the labeled (L) and unlabeled (U) constraints. Historically, two main strategies have been adopted to minimize I(w, b, y_u):
• Local approximation methods: starting from a tentative labeling y_u, they perform a local search in the space of labelings, using effective heuristics, to optimize I(w, b, y_u). These methods are sensitive to local minima. For instance, the SVMlight method [2] uses a local search algorithm which may fail to deliver a solution close to the global one.
• Exact combinatorial methods: fixing the unlabeled vector y_u in I(w, b, y_u) converts the optimization over (w, b) into a standard SVM. Combinatorial methods find the global optimal solution by searching the entire space of possible labelings y_u for the SVM with maximum margin.
Focusing on exact combinatorial optimization of J(y_u) = min_{(w,b)} I(w, b, y_u), the objective becomes minimizing J(y_u) over a set of binary variables. Such an optimization is non-convex and NP-hard. It can be solved using Branch and Bound (BB) [4] or Integer Programming (IP) [5], both computationally very demanding due to the large number of possible labelings of the unlabeled instances; a brute-force sketch of this combinatorial search is given below.
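To make the cost of the exact combinatorial strategy concrete, the following minimal sketch enumerates all 2^u labelings and solves a standard SVM for each one, keeping the labeling with the smallest objective. It is an illustration only, not the authors' implementation: a linear kernel and L1 losses are assumed, and the function and parameter names are ours.

from itertools import product
import numpy as np
from sklearn.svm import SVC

def exact_tsvm_bruteforce(X_l, y_l, X_u, C=10.0, C_star=10.0):
    """Enumerate every labeling y_u of the unlabeled points (2^u SVM solves).

    Illustrative sketch: for each fixed y_u the problem reduces to a
    standard SVM, as described in the text.
    """
    X = np.r_[X_l, X_u]
    sample_weight = np.r_[np.full(len(X_l), C), np.full(len(X_u), C_star)]
    best_yu, best_obj = None, np.inf
    for y_u in product([-1, 1], repeat=len(X_u)):
        y = np.r_[y_l, y_u]
        if len(set(y)) < 2:
            continue                                  # SVC needs both classes present
        clf = SVC(kernel="linear", C=1.0).fit(X, y, sample_weight=sample_weight)
        w = clf.coef_[0]
        slacks = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
        obj = 0.5 * w @ w + sample_weight @ slacks    # I(w, b, y_u) with p = 1
        if obj < best_obj:
            best_yu, best_obj = np.array(y_u), obj
    return best_yu, best_obj

Even for modest u, the 2^u factor dominates, which is exactly what motivates the randomized approach developed in the following sections.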

3. Violator spaces and randomized algorithms

Violator space problems were introduced as an abstract framework for randomized algorithms that solve linear programs by a variant of the simplex algorithm. In computational geometry, an example problem that can be solved with this method is the Smallest



Enclosing Ball (SEB) problem. Here the goal is to compute the smallest ball enclosing a set of n points in a d-dimensional space (Fig. 1). In the following we introduce the main tools of the abstract framework of violator spaces in order to show how randomized methods devised in computational geometry can also be applied to solve the TSVM problem. Details and proofs of the reported properties can be found in [6]. Consider a finite set of constraints H and a weight function w giving, for each G ⊆ H, the cost w(G) ∈ W of the optimum solution, satisfying the following general properties:
• Monotonicity: for all F ⊆ G ⊆ H we have w(F) ≤ w(G), and
• Locality: for all F ⊆ G ⊆ H and all h ∈ H with w(F) = w(G) and w(G) < w(G ∪ {h}), we have w(F) < w(F ∪ {h}).
For any G ⊆ H, a basis is an inclusion-minimal subset B ⊆ G such that w(G) = w(B), while the combinatorial dimension δ is the size of the largest basis. A violator of G is an additional constraint h ∈ H such that w(G) < w(G ∪ {h}). An element h is extreme in G if w(G \ {h}) < w(G). By definition, h violates G ⇔ h is extreme in G ∪ {h}. The set of all constraints violating G defines the violator mapping V(G) := {h ∈ H : w(G) < w(G ∪ {h})}, and the pair (H, V) is called a violator space. To set up a basis one has to define the primitive operation of a violation test, which, given G ⊆ H and h ∈ H \ G, decides whether h ∈ V(G).

Primitive 1 (Violation test) A violator space can be implicitly defined by the primitive: given G ⊆ H and h ∈ H \ G, decide whether h ∈ V(G).

Defining the set of violators of R as V(R) := {h ∈ G \ R | w(R ∪ {h}) ≠ w(R)} and the set of extremes of R as X(R) := {h ∈ R | w(R \ {h}) ≠ w(R)}, the following lemma holds:

Lemma 1 (Sampling Lemma) For a set R of size r chosen uniformly at random from the set of all r-element subsets of G (with |G| = n), define the two random variables V_r : R → |V(R)| and X_r : R → |X(R)| with expected values v_r := E(V_r), x_r := E(X_r). Then, for 0 ≤ r ≤ n, we have v_r = x_{r+1} (n − r)/(r + 1), bounding the expected number of violators through v_r ≤ δ (n − r)/(r + 1).

Clarkson [10] envisaged a smart randomized algorithm able to solve violator space problems, relying on the expected number of violators being bounded according to the sampling lemma. In the SEB case, H is the set of constraints requiring all the points to lie inside the ball, w is the radius of the ball, and violators are points outside the ball. SEB is therefore a violator space problem and Clarkson's algorithm can be used to solve it. The algorithm proceeds in rounds, maintaining a voting box that initially contains one voting slip per point. In each round, a set of r voting slips is drawn at random without repetition from the voting box and the SEB of the corresponding set R of points is computed. At the end of each round, the number of voting slips of the violating points is doubled. The algorithm terminates as soon as there are no violators (no points outside the ball). If r ≈ d², the expected number of rounds is O(log n), reducing a problem of size n to O(log n) problems of size O(d²). Clarkson's algorithm finds a basis using a Trivial method able to find the solution for subsets of size at most δ. Clarkson's Basis2 algorithm randomly chooses a candidate basis from the voting box (according to the μ(h) distribution) and calls Trivial to find a solution for it; a minimal sketch of this scheme is given below.
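The overall Basis2 scheme, elaborated in the next paragraph, can be sketched as follows. This is a generic illustration under assumed interfaces (a trivial solver and a violation test supplied by the concrete problem, e.g. SEB), not the authors' implementation; for simplicity it samples with replacement and uses the 1/(3δ) reweighting threshold suggested by the analysis.

import random

def basis2(H, trivial, violates, delta, seed=0):
    """Generic sketch of Clarkson's Basis2 scheme for a violator space.

    H        : list of hashable constraints (e.g. the points of an SEB instance)
    trivial  : exact solver for small constraint subsets (the Trivial method)
    violates : violation test, violates(solution, h) -> bool (Primitive 1)
    delta    : (an estimate of) the combinatorial dimension
    """
    rng = random.Random(seed)
    mu = {h: 1 for h in H}                      # voting slips (multiplicities)
    r = min(6 * delta * delta, len(H))          # sample size suggested by the analysis
    while True:
        # draw r slips with probability proportional to mu (with replacement here)
        R = rng.choices(H, weights=[mu[h] for h in H], k=r)
        sol = trivial(R)
        V = [h for h in H if h not in R and violates(sol, h)]
        if not V:                               # no violators: sol is globally optimal
            return sol
        if sum(mu[h] for h in V) <= sum(mu.values()) / (3 * delta):
            for h in V:                         # successful round: double violator weights
                mu[h] *= 2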



Applying the violation test, Basis2 doubles the probability μ(h) of each violator, increasing the probability of obtaining a true basis in further iterations (due to the sampling lemma). When no violators are found, the solution has been found. Consider G as a multiset, with μ(h) (initially set to one) denoting the multiplicity of h. For a set F ⊆ G, the compound multiplicity of all elements of F is μ(F) := Σ_{h∈F} μ(h). Sampling from G is done as if it contained μ(h) copies of every element h. At each round, when the amount of violators is below a threshold, Basis2 doubles the multiplicity μ (weight) of the violating points, increasing the probability of selecting a basis in the next rounds. Convergence of the Basis2 algorithm relies on the fact that Trivial is correct. An iteration of the loop is successful if it changes the weights of the elements. To estimate how many unsuccessful iterations pass between two successful ones, the sampling lemma bound reveals that after kδ successful iterations the inequality 2^k ≤ E[μ(B)] ≤ n e^{k/3} holds for every basis B of G with |G| = n, and in particular k < 3 log n. Summarizing, Clarkson's Basis2 algorithm computes a basis of G with an expected number of at most 6δn log n calls to Primitive 1, and an expected number of at most 6δ log n calls to Trivial with sets of size 6δ².

4. Randomized algorithms for TSVM

The TSVM belongs to the class of violator space problems, and Clarkson's algorithm can be used to find its global optimal solution, where w is represented by I(w, b, y_u) evaluated over subsets of the labeled L and unlabeled U constraints, with a combinatorial dimension depending on the number of support vectors. Given a subset of partially labeled points, the TSVM global optimal solution (a basis) can be obtained using an exact method like IP or BB (our Trivial algorithm). Moreover, we need to define a violation test, as Clarkson's algorithm relies on increasing the weight of violating points to raise the probability of selecting a basis. Violators can be detected as the remaining points lying inside the TSVM separating margin. Details of the proofs are omitted and can be found in [8]. The TSVM thus admits a formulation in terms of a violator space problem. Combining the constrained formulation of the TSVM problem with the violator space definition, we formally propose:

Proposition 1 Given a TSVM with constraints H_F, let w_F : 2^{H_F} → W_F be a mapping defined for each subset G_F ⊆ H_F as w_F(G_F) = min F(G_F) = F(w_g, b_g, y_g), with W_F bounded and linearly ordered by ≤. Then the quadruple (H_F, w_F, W_F, ≤) represents an associated violator space problem.

We may prove the following:

Proposition 2 Given a TSVM with constraints H_F, for each subset G_F ⊆ H_F containing labeled L and unlabeled R ⊆ U data (L ∪ R), consider the mapping w_F : 2^{H_F} → W_F defined as w_F(G_F) = min_{G_F} F(G_F) = I(w_g, b_g, y_g), with W_F bounded and linearly ordered by ≤. Then the quadruple (H_F, w_F, W_F, ≤) represents an associated violator space problem.

In order to verify locality and monotonicity we need the following lemma (proved by contradiction, see [8] for details):

Lemma 2 Given a TSVM with constraints H_F and a subset G_F ⊆ H_F with global optimum F(G_F), adding a constraint h_F ∈ H_F to G_F changes the global optimum according to F(G_F) ≤ F(G_F ∪ {h_F}).



Adopting Clarkson's algorithm requires the violator mapping, which has to be defined through the violation test primitive:

Primitive 2 (TSVM violation test) Given a TSVM with global solution F(G_F) over the subset of constraints G_F and decision function f_{G_F}(x_h), any other constraint h ∈ H_F \ G_F violates the solution (h ∈ V_F(G_F)) if η_h = max(0, 1 − y_h f_{G_F}(x_h)) > 0, with the label obtained through y_h = sign(f_{G_F}(x_h)).

So far we have shown that we can associate a violator space problem to a TSVM once we have an exact method providing the global optimal solution. Remarkably, Clarkson's randomized method is then a viable way to solve a TSVM, acquiring a basis by means of the violation test of Primitive 2. Its efficiency relies on the sparsity property of the TSVM.
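Since y_h = sign(f_{G_F}(x_h)), the condition η_h > 0 is equivalent to |f_{G_F}(x_h)| < 1, i.e. the point lies inside the margin of the current solution. A minimal sketch of this test, with illustrative names for the decision function and the candidate point, could look as follows.

def tsvm_violation_test(decision_fn, x_h):
    """Sketch of Primitive 2: does constraint h violate the current TSVM solution?

    decision_fn : callable returning f_{G_F}(x) for the TSVM trained on G_F
    x_h         : feature vector of the candidate constraint
    """
    f = decision_fn(x_h)
    y = 1.0 if f >= 0 else -1.0        # label assigned by the current solution
    eta = max(0.0, 1.0 - y * f)        # hinge slack of h w.r.t. the margin
    return eta > 0.0                   # h lies inside the margin -> violator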

5. Algorithm implementation

Exact methods like Integer Programming (IP) [5] and Branch and Bound (BB) [4] have been investigated for the TSVM. The basic idea of IP for the TSVM is to add an integer decision variable indicating the class of each point in the working set; the solution can then be found with any mixed integer programming code. BB is also able to solve combinatorial optimization problems, using an enumerative technique to search the entire space of solutions. Any exact TSVM method could implement the Trivial algorithm. In the following we focus on BB, denoting by BBTSVM the method devised by Chapelle for the TSVM. It solves the problem over a set of binary variables, where each evaluation of J(y_u) embodies an inductive SVM. BBTSVM minimizes over the 2^u possible choices of y_u. At each node of the binary search tree, a partial labelling of the data set is generated, and the corresponding children fix the labels of some unlabelled data. At the root of the tree there is the set of labelled data L with U = ∅, while at the leaves the complete set of labels is provided. The exploration follows a depth-first strategy. The design of BBTSVM foresees: (1) the upper bound (UB) at a leaf node is the objective J(y_u), while for the other nodes there is no UB, meaning that the best known value of the objective function is used at each stage (initially UB = +∞); (2) the lower bound (LB) at each node is assessed by optimizing the SVM on the whole set of data labelled so far; this conveys the idea that the minimum of the objective function J(y_u) is smaller when unlabelled data are ignored; (3) branching is pursued by observing that, if J(L) is the objective trained on the already labelled set, the new point {x, y} to add should be the one whose label we are most confident about [4]; (4) finally, a branch is cut when UB < LB. The core of BBTSVM is a subroutine optimizing the SVM on the data already labelled as we proceed in the search tree. SVMs are usually solved as constrained quadratic programming problems in the dual, but they can also be worked out in the primal as unconstrained problems. BBTSVM uses this method, requiring few iterations to get an approximate solution with a (worst case) complexity of O(n³). SVMs, being convex quadratic programming problems, are polynomial-time solvable, the currently best method attaining a (worst case) complexity of O(n^{3/2} d²). Through the enumeration process, BBTSVM may reach a complexity from O(2^n n³) to O(2^n n^{3/2} d²), depending on the SVM solver used. Algorithm 1, implementing STSVM, slightly revises Clarkson's scheme. In general we may ignore the combinatorial dimension of the corresponding violator space: apart from the linear case, theoretical bounds on the number of support vectors are not of practical use. Hence, we start with a fixed given value of r_0.



Algorithm 1 STSVM(X_l, Y_l, X_u, Maxit, r)
Require: r = 2 g_0 δ ≪ |X_l ∪ X_u|
repeat
    Choose X_r ⊂ X_u at random according to the μ distribution
    Base(obj_R, Y_r, α, w, b) := BB(X_l, X_r, Y_l)               // call branch and bound
    Y_{u\r} := sign(f_{(w,b)}(x_{u\r}))                          // compute labels...
    ξ_{u\r} := max(0, 1 − Y_{u\r} f_{(w,b)}(x_{u\r}))²           // ...and slacks
    V_Base := {h ∈ U\R : ξ_{u\r} ≠ 0}                            // check violators
    if μ(V_Base) ≤ μ(U)/g_0 then
        μ(h) := 2 μ(h), h ∈ V_Base                               // update weights of violators
    end if
    OBJ_R := obj_R + 1/2 Σ_{u\r} η²_{u\r} / q                    // update best...
    Base_best := min_OBJ(Base(OBJ_R, Y_r, α, w, b), Base_best)   // ...known solution
until Maxit or V_Base = ∅
if V_Base = ∅ then
    return Base(obj_R, Y_r, α, w, b)                             // global optimum
else
    return Base_best                                             // best solution found
end if

According to Basis2, to increase the probability of selecting a given violator in further rounds, we double its weight whenever the slack ξ_{u\r} ≠ 0; in the quadratic loss implementation, for example, where η_{u\r} = α_{u\r}, this means that the point would change the margin if added to the basis. The basic implementation of Trivial uses an exact method working on the random samples L ∪ R ⊆ L ∪ U. As envisaged, in the practical implementation of Trivial we use BB. Using a different exact algorithm to implement Trivial does not change the convergence of STSVM; it can only affect the performance. Two stop conditions are foreseen: (1) V_Base = ∅ (no-violators condition: the global optimum is returned) or (2) the maximum number of iterations is reached (the best known solution is returned). In the latter case, the best known solution is the minimum of F_best = min_{obj_R} BB(L ∪ R); this criterion takes into account obj_R, the best result from BB, minimizing the violators' contribution Δ(V_R). Our implementation of BB is able to optimize the underlying SVM in the primal or dual formulation, with quadratic or linear losses. Concerning convergence and complexity, it can be shown that the number of violation tests is at most O(δ n log n), while the expected number of calls to BBTSVM is at most O(δ log n), with R sets having average size r = O(δ²). The time complexity of STSVM reveals that the sparsity property allows for a relative gain, with respect to running BB over the whole data set, from O(2^n n^{3/2} d²) to O(δ log(n) 2^r (r + l)^{3/2} d²). Our method is thus more effective when the number of support vectors is much lower than the number of instances.

6. Experiments

In this section we briefly describe the results obtained using the proposed randomized method on two well-known benchmark problems that have not been previously solved using exact methods.



Figure 2. (Left) Distribution and exact solution for the two moons data set (4,000 unlabelled points, 2 labelled, shown as a triangle and a cross). (Right) Weight distribution for the whole set of points at the final round. Support vectors of the optimal solution are encircled.

STSVM stands out on the well-known two moons benchmark problem, here composed of 4,000 unlabeled examples (Fig. 2). The two moons problem has been reported to be burdensome for state-of-the-art transduction methods [3]. Table 1 shows the error rates produced by some of these methods: all of them fall into local minima very far from the global one. This problem had been solved previously by applying BB over the space of all possible labellings of the instances [4]. Unfortunately, that method has a time complexity that is exponential in the number of unlabeled instances, so it can only solve the two moons problem with a few hundred unlabeled instances. The method described in this paper finds the optimal solution while scaling exponentially with the number of support vectors. In the case of the two moons, the number of support vectors does not grow with the number of points, so, as expected, STSVM allows us to find the exact solution of the two moons problem with 4,000 unlabeled examples within a few minutes. STSVM is also able to find the exact solution of another well-known benchmark problem that had not been previously solved using exact methods: the Coil20 data set [3]. It contains 1440 images of 20 objects photographed from different perspectives (72 images per object), with 2 labeled images for each object. Coil20 is a multi-class problem, commonly solved through a one-vs-all scheme. Table 1 shows the error rates of state-of-the-art transduction algorithms on this data set, as described in [3]. BB is not able to solve this problem due to the large number of instances, although it can manage a reduced version with pictures of only 3 classes that are hard to discriminate (Coil3, made of 210 unlabeled and 6 labeled examples). STSVM was able to solve Coil20: the final result was 20 errors out of the 1400 initially unlabeled examples, reducing the error on Coil20 to 1.4%.

7. Conclusions and Future Work

In this paper, we have presented an original interpretation of the TSVM problem in terms of violator spaces, which makes it possible to extend the use of any exact method to find the optimal


data sets   SVM    ∇S3VM   cS3VM   CCCP   S3VMlight   ∇DA    Newton   STSVM
TwoMoons    50.2   61      37.7    63.1   68.8        22.5   11       0
Coil3       66.6   61.6    61      56.6   56.7        61.6   61.5     0
Coil20      24.1   25.6    30.7    26.6   25.3        12.3   24.1     1.4

Table 1. Error rate on datasets for supervised SVM, state-of-the-art transduction algorithms as reported in [3], and STSVM. Details of the acronyms and of the methods other than STSVM are reported in [3].

solution, but now scaling in time complexity with the number of support vectors instead of the number of points. The most suitable situation for our method is a dataset entailing a small number of support vectors, independently of the size of the data set. Limitations of our approach appear when the size of the support vector set is larger than a few hundred, a common situation in real data sets. In the future, we plan to investigate an implementation using very sparse SVM formulations, which might allow us to extend the application of our method to larger datasets. Preliminary experiments with larger benchmark datasets (where the number of support vectors was in the range of hundreds) show that, as expected, the method is not able to find the optimal solution in a reasonable time. However, in these cases we kept the best solution as described in Section 5, and in all cases the returned solution was better than the one returned by an SVM trained on the labeled examples alone. These experiments encourage us to explore error bounds for the proposed method and to apply it to find good approximations to the optimal solution. Finally, we consider interesting a possible interpretation of the weight obtained by each example in the randomized method: samples with high weights usually appear as violators, which could help us identify points relevant to the final solution even when the method is not able to find the exact one.

Acknowledgements

We greatly thank O. Chapelle for providing us with the BBTSVM Matlab demo code.

References
[1] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
[2] T. Joachims, Transductive inference for text classification using support vector machines, in: International Conference on Machine Learning, 1999.
[3] O. Chapelle et al., Optimization techniques for semi-supervised support vector machines, Journal of Machine Learning Research, 9, 203-233, 2008.
[4] O. Chapelle, V. Sindhwani, and S. Keerthi, Branch and bound for semi-supervised support vector machines, in: Advances in Neural Information Processing Systems, 2006.
[5] K.P. Bennett and A. Demiriz, Semi-supervised support vector machines, Advances in Neural Information Processing Systems, 12, 368-374, 1998.
[6] B. Gärtner, J. Matoušek, L. Rüst, and P. Škovroň, Violator spaces: Structure and algorithms, Discrete Applied Mathematics, 156(11), 2008.
[7] J.L. Balcázar, Y. Dai, J. Tanaka, and O. Watanabe, Provably fast training algorithms for Support Vector Machines, Theory of Computing Systems, 2008.
[8] G. Esposito, LP-type methods for optimal Transductive Support Vector Machines, PhD thesis, University of Perugia, 2011, http://bit.ly/lZSo1L
[9] G. Cauwenberghs and T. Poggio, Incremental and decremental support vector machine learning, Advances in Neural Information Processing Systems, 13, 409-415, MIT Press, 2001.
[10] K.L. Clarkson, Las Vegas algorithms for linear and integer programming when the dimension is small, Journal of the ACM, 42, 1996.
[11] S.A. Nene, S.K. Nayar, and H. Murase, Columbia Object Image Library (COIL-20), Technical Report CUCS-005-96, Columbia University, USA, 1996.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-23


Comparing feature-based and distance-based representations for classification similarity learning Emilia LÓPEZ-IÑESTA, Francisco GRIMALDO and Miguel AREVALILLO-HERRÁEZ Departament d’Informàtica, Universitat de València Av. de la Universitat s/n. 46100-Burjassot (Spain) [email protected],[email protected], [email protected] Abstract. The last decades have shown an increasing interest in studying how to automatically capture the likeness or proximity among data objects due to its importance in machine learning and pattern recognition. Under this scope, two major approaches have been followed that use either feature-based or distance-based representations to perform learning and classification tasks. This paper presents the first results of a comparative experimental study between these two approaches for computing similarity scores using a classification-based method. In particular, we use the Support Vector Machine, as a flexible combiner both for a high dimensional feature space and for a family of distance measures, to finally learn similarity scores in a CBIR context. We analyze both the influence of the different input data formats and the training size on the performance of the classifier. Then, we found that a low dimensional multidistance-based representation can be convenient for small to medium-size training sets whereas it is detrimental as the training size grows. Keywords. similarity learning, distance-based representation, training size

1. Introduction

Learning a function that measures the similarity between a pair of objects is a common and important task in applications such as classification, information retrieval, machine learning and pattern recognition. The Euclidean distance has been widely used, since it provides a simple and mathematically convenient metric on raw features, even when dealing with a small training set, but it is not always the optimal solution for the problem being tackled [14]. This has led to the development of numerous similarity learning techniques [4,6] aimed at building a model or function that, from pairs of objects, produces a numeric value indicating some kind of conceptual or semantic similarity and that also allows objects to be ranked in descending or ascending order according to this score. Some studies have focused on automatically learning a similarity measure that satisfies the properties of a metric distance [19,8] from the available data (e.g. in the form of pairwise constraints obtained from the original labeled information)



and have turned supervised metric learning into a topic of great interest [5]. Under this scope, when the properties of a metric are not required, a similar setting can also be used to train a classifier to decide whether a new pair of unlabeled objects is similar or not, an approach known as classification similarity learning. Classification similarity learning has traditionally represented the annotated objects in the training set as numeric vectors in a multidimensional feature space. It is also known that the performance of many classification algorithms largely depends on the size and the dimensionality of the training data. Hence, a question that arises is what the ideal size and dimension should be to obtain a good classification performance, considering that greater values generally yield better classification but at the cost of increasing the computational load and the risk of overfitting [13]. To deal with this issue, the dimensionality of the training data has commonly been reduced by using distance-based representations such as pairwise distances or (dis)similarities [12]. It is also a frequent practice to use different (dis)similarity measures, each acting on distinct subsets of the multidimensional available features, which are later combined to produce a similarity score. A number of combination techniques exist, known as fusion schemes, that have been categorised as either early or late fusion [20]. While the former uses a unified measure that merges all the features, the latter computes multiple feature measures separately and then combines them to obtain the similarity between two objects. Inspired by the late fusion scheme, in this paper we use a multidistance representation that transforms the original feature space into a distance space resulting from the concatenation of several distance functions computed between pairs of objects. This kind of input data involves an additional knowledge injection into the classifier, because the use of a distance measure is an implicit match between the characteristics of two objects, and also because of the usual correlation between semantic similarity and small distance values. It is worth mentioning that this multidistance space is related to the dissimilarity space defined in [7]. Nevertheless, it differs from it in that the space transformation is carried out at the feature level between freely selected pairs of objects instead of using a fixed representation set. The aim of this paper is to compare the performance obtained from the feature-based and the multidistance-based representations when applied to a classification similarity learning setting, as well as to analyze the influence of different training data sizes. Thus, our goal is twofold: on the one hand, we want to study the ability of a classifier to deal with a high feature dimensionality as the training size grows; on the other hand, we want to test under which circumstances the reduction in dimensionality leads to better results than treating objects as a whole. The proposed experimentation concerns the problem of Content-Based Image Retrieval (CBIR), where image contents are characterized by multidimensional vectors of visual features (e.g. shape, color or texture). By considering pairs of images labeled as similar or dissimilar as training instances, we face a binary classification problem that can be solved through a soft classifier that provides the probability of belonging to each class.
This probability value can be considered as the score determining the degree of similarity between the images, and it can be used for ranking purposes. In particular, the Support Vector Machine classification algorithm has been selected, and we use four different values of the Minkowski distance parameter to construct the multidistance-based representation. Additionally, we use as baselines for our comparison the performances obtained



from the global Euclidean distance and two other traditional score-based normalization methods: the standard Gaussian normalization and the Min-max normalization. The rest of the paper is organized as follows: Section 2 formulates the problem and describes the multidistance-based representation in detail; Section 3 presents the experimental setting and analyzes the obtained results; finally, Section 4 states the conclusions and discusses future work.

2. Problem Formulation

Let us assume we have a collection of images X = {x_i}, i = 1, 2, ..., which are conveniently represented in a multidimensional feature space F. Let us also assume that this feature space is defined as the Cartesian product of the vector spaces related to T different descriptors such as color, texture or shape:

    F = F^(1) × ... × F^(t) × ... × F^(T)        (1)

Hence, we can denote by x_i^(t) the set of features of x_i that correspond to descriptor t. Let us finally consider a classical similarity learning setup [19,8], where k training pairs (x_i, x_j) are available, accordingly labeled as similar (S) or dissimilar (D). In classification-based learning, these pairs are used to train a classifier that can later classify new sample pairs. Thus, when a soft classifier is used, its output provides a score that may be used to judge the similarity between objects. A straightforward approach that fits this scheme is to concatenate the feature vectors of the two objects and use the resulting double-size vector as the input to the classifier (see the arrow labeled "feature-based representation" in Figure 1). However, by following this approach, the size of the learning problem depends heavily on the dimensionality of the feature space F, which is usually rather large. This situation can be especially critical for small sample datasets, which unfortunately are often the case. The dimensionality of the input data can be reduced by using feature reduction techniques such as Principal or Independent Component Analysis. Another way of tackling this problem is to apply a similarity-based spatial transformation [7]. In this paper we evaluate the performance of a multidistance-based representation resulting from a preprocessing layer that acts before passing the training data to an SVM (see the arrow labeled "multidistance-based representation" in Figure 1). The preprocessing layer is composed of two steps. The first one computes a family of N distance functions (e.g. Euclidean, cosine or Mahalanobis) for every training pair. Each distance function is defined on each descriptor vector space as in Equation 2:

    d_n^(t) : F^(t) × F^(t) → R        (2)

Thus, we define a transformation function w, as indicated in Equation 3, that, given the feature-based representation of two images x_i and x_j, constructs a tuple of values ⟨d_1^(1), ..., d_N^(1), d_1^(2), ..., d_N^(2), ..., d_1^(T), ..., d_N^(T)⟩, where d_n^(t) denotes the distance between x_i^(t) and x_j^(t):

    w : F × F → R^(N·T)        (3)



Figure 1. Feature-based and multidistance-based classification similarity learning approaches.

The choice of the most suitable distance function depends on the task at hand and affects the performance of a retrieval system [15]. This has led different authors to analyze the performance of several distance measures for specific tasks [1,9]. Therefore, rather than choosing the most appropriate distance for a task, the proposed multidistance representation aims to boost performance by combining several distance functions simultaneously. This operation transforms the original data into a labeled set of N·T-tuples, where each element refers to a distance value calculated on a particular subset of the features (i.e. the corresponding descriptor). The second step of the preprocessing layer normalizes the labeled tuples in order to increase the accuracy of the classification [18] by placing equal emphasis on each descriptor space [2]. We denote by ⟨d̂_1^(1), ..., d̂_N^(1), d̂_1^(2), ..., d̂_N^(2), ..., d̂_1^(T), ..., d̂_N^(T)⟩ the normalized tuples, where a simple linear scaling into the range [0, 1] has been applied. A complete schema of the input data transformation performed by the preprocessing layer can be seen in Figure 2. Once the SVM soft classifier has been trained, it can be used to provide a score value that can be treated as a similarity estimate between images. For any new pair (x_i, x_j), the function w is applied to convert the original features into a tuple of distances (using the same family of functions as for training). After normalization, the resulting vector is used as input to the classifier, which provides a confidence estimate that the pair belongs to each of the classes. This estimate can be used directly for ranking purposes, or converted into a probability value by using the method in [16].
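A minimal sketch of this preprocessing layer and of the classifier training is given below. It is an illustration under our own assumptions, not the authors' code: function names are ours, descriptors are given as index slices into the feature vector, the four Minkowski distances used later in the evaluation stand in for the distance family, and min-max scaling implements the normalization step.

import numpy as np
from sklearn.svm import SVC

def multidistance_tuple(xi, xj, descriptor_slices, p_values=(0.5, 1, 1.5, 2)):
    """Transformation w: one Minkowski (Lp) distance per descriptor and per p."""
    feats = []
    for sl in descriptor_slices:                 # loop over the T descriptors
        diff = np.abs(xi[sl] - xj[sl])
        for p in p_values:                       # N = 4 Lp distances per descriptor
            feats.append(np.sum(diff ** p) ** (1.0 / p))
    return np.array(feats)                       # length N * T

def train_similarity_classifier(pairs, labels, descriptor_slices):
    """Min-max normalise the multidistance tuples and train the SVM soft classifier."""
    D = np.array([multidistance_tuple(a, b, descriptor_slices) for a, b in pairs])
    lo, hi = D.min(axis=0), D.max(axis=0)
    D = (D - lo) / np.maximum(hi - lo, 1e-12)    # linear scaling into [0, 1]
    clf = SVC(kernel="rbf", probability=True)    # probability=True applies Platt scaling [16]
    clf.fit(D, labels)
    return clf, (lo, hi)                         # keep the scaling for new pairs

At test time, a new pair would be transformed with the same descriptor slices, scaled with the stored (lo, hi) values, and scored with the classifier's probability output for ranking.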



Figure 2. Scheme of the preprocessing layer.

3. Evaluation

3.1. Experimental setting

To analyze the performance of the SVM classifier with the feature-based and the multidistance-based training data formats, a number of experiments have been run on a medium-size image dataset that has also been used in previous studies (e.g., [3]). The dataset contains a subset of 5476 images from the large commercial collection called "Art Explosion", which is composed of a total of 102894 royalty-free photographs distributed by the company Nova Development¹. The images from the repository, originally organized in 201 thematic folders, have been carefully selected and classified into 63 categories, so that images in the same category represent a similar semantic concept. Each image in the dataset is described by a label that refers to the semantic concept the image represents according to this manual classification, and by a 104-dimensional numeric feature vector defined in a multidimensional feature space through a set of ten visual descriptors². The two classification similarity learning approaches have also been compared with three other traditional score methods, used as baselines. The first one is the global Euclidean distance applied to the entire feature vectors. The second one is the standard Gaussian normalization described in [10], which consists of the mapping d_2^(t) → (d_2^(t) − μ)/3σ, where μ and σ represent the mean and the standard deviation of the Euclidean distance d_2^(t) on each descriptor vector space. The third one is the Min-max normalization, which performs a linear transformation of the data computed from the minimum and the maximum of the distance d_2^(t). The last two approaches are both applied to the individual visual image descriptors and will be referred to as Gaussian normalization and Min-max normalization, respectively.
¹ http://www.novadevelopment.com
² The

database and details about their content can be found in http://www.uv.es/arevalil/dbImages/



The experiments were run 50 times each and the results were averaged. To evaluate the influence of the training size, the experiments were run over twelve training sets of increasing size. The smallest training set had 100 pairs, while the remaining training sets increased sequentially from 500 up to 5500 pairs in steps of 500. In each training set the pairs were labeled as similar (S) when the labels associated with the vectors were the same, and as dissimilar (D) otherwise. It is worth noting that sizes smaller than 100 pairs were discarded, as the classification method was outperformed by the baseline methods for such tiny training sets. On the other hand, sizes greater than 5500 pairs did not show any qualitative difference and followed the trends shown in this paper. After the training phase, if any, the ranking performance of each algorithm was assessed on a second, independent test set composed of 5000 pairs randomly selected from the repository. To this end, the Mean Average Precision (MAP), a commonly used evaluation measure in the context of information retrieval [17], was used. The MAP value corresponds to a discrete computation of the area under the precision-recall curve. Thus, by calculating the mean average precision we obtain a single overall measure that provides a convenient trade-off between precision and recall along the whole ranking. For the preprocessing layer generating the multidistance-based representation, we have considered a pool of four Minkowski distances (Lp norms), with values p = 0.5, 1, 1.5, 2. These are widely used dissimilarity measures that have shown relatively large differences in performance on the same data [1,9], suggesting that they may be combined to obtain improved results. Fractional values of p have been included because they have been reported to provide more meaningful results for high-dimensional data, both from the theoretical and the empirical perspective [1], a result that has also been confirmed in a CBIR context [9]. In addition, the kernel chosen for the SVM is a Gaussian radial basis function. The parameters γ and C have been tuned using an exhaustive grid search on a held-out validation set composed of a 30% partition of the training data (C ∈ {10⁻⁶, 10⁻⁵, ..., 10⁰, 10¹} and γ ∈ {10⁻², 10⁻¹, ..., 10⁴, 10⁵}). To compensate for the SVM's sensitivity to unbalanced data sets [11], we fix the percentage of similar pairs in the training set to 30%.

3.2. Results

Figure 3 plots the average MAP obtained by the compared similarity learning methods as a function of the training size, where one can distinguish three performance regions. On the left-hand side, when the training set is very small (i.e. less than or equal to 100 pairs), the normalization methods perform better than the two classification approaches, which show a limited learning capacity. The reason behind this result is that there is not enough information for training the classifier. Even so, the multidistance-based representation outperforms the feature-based representation, since it handles the high dimensionality in a better way. As the training set increases in size, we identify a second interval (i.e. from 100 to 2000 pairs, approximately) in which the reduction of dimensionality achieved through the concatenation of multiple descriptor distances has positive effects on the classifier performance.
Indeed, the multidistance-based representation obtains the highest values in this region of small to medium-size training sets.
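For reference, the ranking evaluation used throughout this comparison (MAP as the discrete area under the precision-recall curve) can be sketched as follows; the function names and input layout are ours.

import numpy as np

def average_precision(scores, relevant):
    """Discrete area under the precision-recall curve for a single ranking."""
    order = np.argsort(-np.asarray(scores, dtype=float))   # rank by decreasing score
    rel = np.asarray(relevant, dtype=bool)[order]
    hits = np.cumsum(rel)
    precision_at_k = hits / (np.arange(len(rel)) + 1)
    return precision_at_k[rel].sum() / max(rel.sum(), 1)   # mean precision at relevant ranks

def mean_average_precision(rankings):
    """MAP over a set of rankings, each given as a (scores, relevant) pair."""
    return float(np.mean([average_precision(s, r) for s, r in rankings]))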



Figure 3. Average MAP vs. training set size.

Notwithstanding, when we allow the training size to grow far enough (i.e. beyond 2000 pairs), the classical feature-based representation improves on the results obtained by the rest of the algorithms. This third region demonstrates that the information loss incurred by the multidistance-based representation can be detrimental for big training sizes, since it limits the learning capabilities of the SVM classifier. All in all, these results suggest that the training size can have an important effect on the classification performance when different strategies are adopted for the representation of the input data. Regarding the performance of the baseline approaches, the average MAP remains constant for all the different training set sizes and shows small values for the global Euclidean distance. Values for the Gaussian and Min-max normalization are almost equal and generally fall below the average MAP values obtained for both classification methods. Figures 4 and 5 allow us to observe the variability of the MAP values resulting from executing the feature-based and the multidistance-based representations, respectively. Each point in these plots corresponds to one of the 50 executions run for each training set size, while the solid line shows the linear curve fit. By comparing the plots, we observe that the feature-based representation generates a higher variability, while the multidistance-based representation obtains more robust results. Even though these results are still preliminary and need further exploration, the reason behind such a behaviour could again be the difficulty of having enough examples to learn a high-dimensional classification model.



Figure 4. MAP vs. training set size for the feature-based representation.

4. Conclusion

In this paper we have conducted an experimental study comparing two approaches to learning similarity scores in a multidimensional feature space using a classification-based method such as the SVM. The difference between these approaches lies in the representation format of the sample dataset used to train the classifier. On the one hand, a feature-based representation of objects has as a drawback the high dimensionality of the learning problem that it poses to the classifier. On the other hand, a multidistance-based representation can reduce dimensionality by transforming the original multidimensional space into a distance space constructed as the concatenation of a number of distance functions. A series of performance patterns have been extracted from the analysis of the different input data formats and training sizes. We found that a low-dimensional multidistance-based representation can be convenient for small to medium-size training sets, whereas it is detrimental as the training size grows. The dimensionality reduction (e.g. in the form of distance relations and their combination) provides additional information to the classifier and boosts its performance. For large training sets, though, a higher-dimensional feature-based representation provides better results for the database considered. These results can be of value when designing future systems that need to automatically capture the similarity of pairs of objects.



Figure 5. MAP vs. training set size for the multidistance-based representation.

Future work will extend this study by including other databases with different characteristics in size and dimensionality. Besides, further investigation is needed that considers more distance combinations, as well as other suitable techniques to reduce the dimensionality of the training set and ultimately improve the performance of the classifiers.

Acknowledgements

We would like to thank Juan Domingo for extracting the features from the images in the database. This work has been supported by the Spanish Ministry of Science and Innovation and the Vice-chancellor for Training Policies and Educational Quality at the University of Valencia through projects TIN2011-29221-C03-02 and UV-SFPIE_FO13147196.

References
[1] C. Aggarwal, A. Hinneburg, and D. Keim. On the surprising behavior of distance metrics in high dimensional space. In J. Bussche and V. Vianu, editors, Database Theory ICDT, volume 1973 of Lecture Notes in Computer Science, pages 420–434. Springer Berlin Heidelberg, 2001.
[2] S. Ali and K. Smith-Miles. Improved support vector machine generalization using normalized input space. In AI 2006: Advances in Artificial Intelligence, volume 4304 of Lecture Notes in Computer Science, pages 362–371. Springer Berlin Heidelberg, 2006.

[3] M. Arevalillo-Herráez and F. J. Ferri. An improved distance-based relevance feedback strategy for image retrieval. Image and Vision Computing, 31(10):704–713, 2013.
[4] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning distance functions using equivalence relations. In Proceedings of the 20th International Conference on Machine Learning, pages 11–18, 2003.
[5] A. Bellet, A. Habrard, and M. Sebban. Similarity learning for provably accurate sparse linear classification. In ICML, pages 1871–1878, 2012.
[6] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In Z. Ghahramani, editor, Machine Learning, Proceedings of the 24th International Conference (ICML), volume 227, pages 209–216. ACM, 2007.
[7] R. P. W. Duin and E. Pękalska. The dissimilarity space: Bridging structural and statistical pattern recognition. Pattern Recognition Letters, 33(7):826–832, May 2012.
[8] A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In Advances in Neural Information Processing Systems 18 (NIPS). MIT Press, 2005.
[9] P. Howarth and S. Rüger. Fractional distance measures for content-based image retrieval. In Proceedings of the 27th European Conference on Advances in Information Retrieval Research (ECIR), pages 447–456, Berlin, Heidelberg, 2005. Springer-Verlag.
[10] Q. Iqbal and J. K. Aggarwal. Combining structure, color and texture for image retrieval: A performance evaluation. In 16th International Conference on Pattern Recognition (ICPR), pages 438–443, 2002.
[11] S. Köknar-Tezel and L. J. Latecki. Improving SVM classification on imbalanced time series data sets with ghost points. Knowledge and Information Systems, 28(1):1–23, 2011.
[12] W.-J. Lee, R. P. W. Duin, A. Ibba, and M. Loog. An experimental study on combining Euclidean distances. In Proceedings of the 2nd International Workshop on Cognitive Information Processing (14–16 June 2010, Elba Island, Tuscany, Italy), pages 304–309, 2010.
[13] Y. Liu and Y. F. Zheng. FS_SFS: A novel feature selection method for support vector machines. Pattern Recognition, 39(7):1333–1345, 2006.
[14] B. McFee and G. R. G. Lanckriet. Metric learning to rank. In Proceedings of the 27th International Conference on Machine Learning (ICML), pages 775–782. Omnipress, 2010.
[15] J. P. Papa, A. X. Falcão, and C. T. N. Suzuki. Supervised pattern classification based on optimum-path forest. Int. J. Imaging Syst. Technol., 19(2):120–131, 2009.
[16] J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61–74. MIT Press, 1999.
[17] B. Thomee and M. S. Lew. Interactive search in image retrieval: a survey. International Journal of Multimedia Information Retrieval, 1(2):71–86, 2012.
[18] J. Vert, K. Tsuda, and B. Schölkopf. A Primer on Kernel Methods, pages 35–70. MIT Press, Cambridge, MA, USA, 2004.
[19] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems 15 (NIPS), pages 505–512. MIT Press, 2002.
[20] J. Zhang and L. Ye. Local aggregation function learning based on support vector machines. Signal Processing, 89(11):2291–2295, 2009.

Computer Vision I


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-35


Bag-of-Tracklets for Person Tracking in Life-Logging Data Maedeh AGHAEI a,1, Petia RADEVA a,b
a Universitat de Barcelona, MAIA Department, Barcelona, Spain
b Computer Vision Center, Cerdanyola del Vallès (Barcelona), Spain
Abstract. With the increasing popularity of wearable cameras, life-logging data analysis is becoming more and more important and useful for deriving significant events out of this substantial collection of images. In this study, we introduce a new tracking method for visual life-logging, called bag-of-tracklets, which is based on detecting, localizing and tracking people. Given the low spatial and temporal resolution of the image data, our model generates and groups tracklets in an unsupervised framework and extracts image sequences of person appearance according to a similarity score over the bag-of-tracklets. The model output is a meaningful sequence of events expressing human appearance, which are tracked through the life-logging data. The achieved results prove the robustness of our model in terms of efficiency and accuracy despite the low spatial and temporal resolution of the data.
Keywords. Person tracking, life-logging data, low spatial and temporal resolution videos.

1. Introduction

The human mind procreates memories in terms of autobiographical events that can be explicitly stated. A photo is very much like a memory of little events, and studying photos of life events can aid the memory and boost mental performance. SenseCam [17] is a passive image capturing tool that has the ability to capture up to 3,000 photos (2 images per minute) during a day. It is a very well-acknowledged tool for producing life-logging data, understood as the collection of one's daily life for later retrieval and/or summarization. Life-logging has mostly been used for medical purposes, particularly to aid those with memory loss [17]. Studies show that, by reviewing the day's filmstrip, patients suffering from Alzheimer's, amnesia and other memory impairments found it much easier to retrieve lost memories. The SenseCam produces images (Fig. 1) which are very similar to one's memories. These photographs are important pieces of information, since they store necessary data about what happened during the day, answering questions such as who, what, where and when, which is essential for event recognition. Humans occupy a special place in our memories. In this work, we focus on recovering segments of videos based on the appearance of people around the person wearing the camera.
1 Corresponding

Author: Maedeh Aghaei, E-mail: [email protected]



Figure 1. A sequence of images acquired by a SenseCam camera. One can appreciate the low quality of the images, the free motion of the camera and the low-temporal resolution of the video.

In recent years, there has been a lot of interest in the analysis of life-logging images to help obtain significant segments of information out of them. In our proposed model, we focus in particular on extracting the who information, i.e. on human appearance in the life-logging data. We believe that the presence of a certain person, at a specific instant of time and in a specific place, leads to a distinctive event. Therefore, we are interested in designing a model that is able to automatically subdivide a sequence of egocentric images of one's daily life into several segments of sequential images, where every segment should represent an event related to the presence of a certain person. In this context, we call "human-based video segmentation" the process of detecting events that start once a particular person appears in the sight of the SenseCam and end when the same person is no longer captured by the SenseCam. Recently, there has been growing interest in life-logging data analysis. Previous work on egocentric videos has mainly focused on event and action recognition [11], discovering important people or objects [14,15], video summarization [16], social interaction recognition [12] or trajectory reconstruction [13]. In [11], the authors introduced a novel framework for detecting activities of daily living based on, first, temporal pyramids, which generalize the well-known spatial pyramid in order to approximate the temporal correspondence of objects when scoring a model, and, second, interaction-based composite object models, which exploit the fact that objects look different when being interacted with. A new egocentric video summarization method is presented in [14], which focuses on extracting the most important people and objects with which the camera wearer interacts. To accomplish this, region cues are developed, such as nearness to hands, gaze and frequency of occurrence, and a regressor is trained to predict the relative importance of any new region based on these cues. Another robust summarization method is introduced in [16], which captures event connectivity beyond simple object co-occurrence and focuses on the influence of visual objects between subshots and the way they contribute to the progression of events. In [12], a model is presented for the detection and recognition of social interactions in first-person videos of social events. The line of sight for each face in the video frames is computed using information about the location and orientation of faces. The lines of sight are later converted into locations in space to which individuals are supposed to attend. Further, individuals are assigned roles based on their patterns of attention, and the roles and locations of individuals are analyzed over time to detect and recognize types of social interactions. The method proposed in [13] serves to estimate the geospatial trajectory of a moving camera in an urban environment, based on a three-step process: first, finding the best visual matches of images from the moving camera in a dataset of geo-referenced street view images; second, Bayesian tracking to



estimate the frame geolocalization and its temporal evolution, and third, a trajectory reconstruction algorithm to eliminate inconsistent estimations. However, to the best of our knowledge, person-based video segmentation and person tracking in life-logging data have not been studied yet. While person detection and tracking in videos are widely studied in computer vision, the problem of tracking in SenseCam images is relatively new, challenging and equally interesting. Most tracking difficulty in SenseCam videos occurs due to the low spatial and temporal resolution of images, free motion of the camera (it used to be hung on the neck of the wearer), frequent scene occlusion and distortion, and wide variation in appearance, scale and location of people along the videos (Fig. 1). Tracking algorithms based on appearance information [1,2] are often able to track targets very well, when the scene is uncluttered and the camera is static. However, as the complexity of the scene increases (complex background, crowded scene, etc.), these algorithms suffer from the well-known tracker drift problem [3]. Thus, due to the substantial difference between images from the SenseCam and images from a usual video cameras, studies which are addressing the tracking problem on videos from static cameras, are not applicable to the SenseCam images. In this context, images acquired from moving cameras are the most similar to the SenseCam data. Lately, tracking-by-detection mechanisms become of special interest, since they rely less on temporal coherence of objects appearance and thus could be applied to both type of images, captured from either static or moving cameras, and specially of interest to moving camera images. To address the challenges of tracking from a moving platform, several approaches have been proposed. Meanwhile, applying the Hungarian algorithm [18] to get the correspondance of objects appearance detected in different frames is a classical approach, in [4,5], the authors proposed a mobile tracking system that can simultaneously carry out detection and tracking by incorporating various sources of information and combining multiple detectors. While this method shows that integration of cues greatly helps reduce the false alarms, it is basically dependent on the usage of stereo cameras. In [6], the authors propose a robust method that is able to model multiple 2D/3D trajectories in space-time based on a probabilistic formulation. However, this is a sensitive approach in the case of missed detections or false positives. Later, in [7] the authors used both the positive detections and confidence values to model the observation likelihood from a HOG detector. The work in [7] is most similar in spirit to the work in [4,5], but in contrast, it tracks targets and estimates camera motion in a unified framework as well as does not require stereo information. However, this work relies on estimates of the camera parameters simultaneously (such as focal length and camera pose) and tracks objects as they move in the scene. It is not applicable to the SenseCam images, due to the low spatial and temporal resolution of the videos and the free motion of the camera. In [8], the authors propose an effective and efficient tracking algorithm called compressive tracking that uses appearance model based on features extracted in a compressed sensing domain. Inspired by this work, we present a new framework for the tracking problem in life-logging data. 
Although the work in [8] performs favorably against other salient trackers on challenging datasets, it still produces significant false alarms, and these are even more likely on a dataset as challenging as SenseCam videos. Our approach to this problem is to introduce a novel concept, which we call bag-of-tracklets, that allows robust tracklet prototypes to be extracted from reliable bags-of-tracklets. By excluding unreliable bags-of-tracklets, we largely discard the tracklets that are prone to cause false alarms, whether due to errors in detection or in tracking. Thus, the main idea of our approach is to find reliable tracklets and discard unreliable ones generated by the usual false alarms in person detection or tracking. This allows us to obtain a set of events extracted from series of 'coherent' tracklets. Instead of deciding the reliability of each tracklet individually, our proposal makes this decision for the bag-of-tracklets containing similar tracklets, grouped in a specially designed log-likelihood based framework.

The article is organized as follows: in Sec. 2, we introduce the main concepts of our algorithm, namely the seed and tracklet generation, as well as the bag-of-tracklets and prototype extraction. In Sec. 3, we discuss our results and finish the paper with conclusions and future work.

2. Bag-of-tracklets for life-logging human tracking

In this section, we describe the proposed life-logging segmentation model based on human appearance in detail. Our model addresses the human-based video segmentation problem in four sequential phases: person detection, tracking, bag-of-tracklets formation and prototype extraction.

2.1. Seed and Tracklet Generation

The first phase of our approach detects the visible people in each image of the SenseCam sequence. In this work, we used Felzenszwalb et al.'s part-based human detector [9], although any alternative detector could be used. Each detected individual generates a seed, defined by a box surrounding the person's appearance. As an illustration, Fig. 2 shows a sequence of images in which the frames where a seed has been detected are marked by boxes around the detected person.

Figure 2. An example of tracklet construction based on people appearance and tracking. Two detected tracklets are shown. The first tracklet starts in the second frame and lasts until the last frame; the second tracklet starts in the fifth frame and lasts until the last frame.

In the second phase, a tracking technique is used to track each generated seed forward and backward through the sequence. A tracklet starts in the frame where the backward tracking ends and finishes in the frame where the forward tracking ends. Thus, a tracklet $t^i$, $i = 1, 2, \ldots, |G|$, where $G$ is the set of tracklets found, can be represented by $t^i = (t_{i1}, t_{i2}, \ldots, t_{is}, t_{is+1}, \ldots, t_{in_i})$, where $t_{i1}$ is the final frame reached by the backward tracking, $t_{is}$ is the seed frame in which the person detector found a person, and $t_{in_i}$ is the final frame reached by the forward tracking. All the tracklets generated in this phase are stored in $G$ for further processing.

To generate the tracklets, we used the compressive tracking algorithm presented in [8], which has the advantage of being able to track an object independently of the temporal direction, in contrast to the majority of tracking techniques, which only follow a forward temporal pattern. Compressive tracking relies on an appearance model based on features extracted from a multi-scale image feature space with a data-independent basis. A very sparse measurement matrix is adopted to efficiently extract the features for the appearance model. Starting from the first positive human detection, at each subsequent frame the model takes positive samples near the current target location and negative samples far away from the object center to update a Naive Bayes classifier. To predict the object location in the next frame, the model draws samples around the current target location and selects the one with the maximal classification score. The ending condition of compressive tracking is therefore controlled by the Naive Bayes classification score. The model is able to select the most correct positive sample because its probability is larger than those of the less correct positive samples, which makes the method relatively tolerant to many changes in the scene during tracking. In addition, the measurement matrix used in compressive tracking is data-independent and no noise is introduced into it by mis-aligned samples. As a result, compressive tracking obtains a tracklet for every seed, as shown in Fig. 2.

2.2. Bag-of-tracklets and prototype extraction

Compared to other state-of-the-art approaches, compressive tracking shows very favorable results on SenseCam videos. However, it still fails to keep the true track of the object in the case of large camera motion and of variation of the object appearance from one frame to the next, which is very common in SenseCam videos. To address this challenge, our strategy is to integrate the results over multiple tracklets from multiple seeds, generated in different frames, that correspond to the same person along the video. Our model, called bag-of-tracklets, assumes that tracklets generated by seeds belonging to the same person in the video are very likely to be similar to each other.

In the third phase of our model, we aggregate the tracklets in $G$ into groups, where each group contains the tracklets that our model decides are similar; each group represents a bag-of-tracklets. Let us consider a bag-of-tracklets $M = \{t^i\}_{m_1, m_2, \ldots, m_n}$. Initially, it contains a single tracklet, $M = \{t^{m_1}\}$, where $t^{m_1}$ does not belong to any other bag-of-tracklets. Let us define the similarity of two tracklets $t^i$ and $t^j$ in frame $k$ as the intersection of their boxes ($\mathrm{box}(t_{ik})$ and $\mathrm{box}(t_{jk})$), where the person is detected (by the person detector or the compressive tracker), normalized by the area of the boxes:

$$L(t_{jk}, t_{ik}) = \begin{cases} \dfrac{\mathrm{area}\big(\mathrm{box}(t_{jk}) \cap \mathrm{box}(t_{ik})\big)}{\mathrm{area}\big(\mathrm{box}(t_{jk}) \cup \mathrm{box}(t_{ik})\big)}, & \text{if } \mathrm{box}(t_{jk}) \neq \emptyset \text{ and } \mathrm{box}(t_{ik}) \neq \emptyset, \\[6pt] 0, & \text{otherwise,} \end{cases}$$

the similarity of the two tracklets being the average $\frac{1}{n}\sum_{k=1}^{n} L(t_{jk}, t_{ik})$ over the $n$ frames they share. Given a tracklet $t^j$, it will be accepted by the model $M$ and added to the bag-of-tracklets if the log-likelihood that the tracklet belongs to the model is sufficiently large.
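To make the pairwise similarity concrete, the following sketch (our own illustration, not part of the original implementation; the box representation and helper names are hypothetical) computes the per-frame overlap score between two tracklets stored as per-frame bounding boxes.

```python
import numpy as np

def frame_similarity(box_j, box_i):
    """Overlap score between two boxes given as (x1, y1, x2, y2).
    None marks a frame where the tracklet has no box; the score is then 0."""
    if box_j is None or box_i is None:
        return 0.0
    x1 = max(box_j[0], box_i[0]); y1 = max(box_j[1], box_i[1])
    x2 = min(box_j[2], box_i[2]); y2 = min(box_j[3], box_i[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_j = (box_j[2] - box_j[0]) * (box_j[3] - box_j[1])
    area_i = (box_i[2] - box_i[0]) * (box_i[3] - box_i[1])
    union = area_j + area_i - inter
    return inter / union if union > 0 else 0.0

def tracklet_similarity(boxes_j, boxes_i):
    """Average per-frame overlap over the frames the two tracklets share.
    boxes_* map frame index -> box (or None)."""
    common = sorted(set(boxes_j) & set(boxes_i))
    if not common:
        return 0.0
    return float(np.mean([frame_similarity(boxes_j[k], boxes_i[k]) for k in common]))
```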

Figure 3. A snapshot of our model. Six tracklets have been accepted by the model. Frames surrounded by the boxes correspond to the seeds of the tracklets.

The log-likelihood depends on the similarity of the new tracklet $t^j$ and all the tracklets of the model $M$, namely $t^i$, $i = m_1, m_2, \ldots, m_n$. Hence, the log-likelihood is defined as follows:

$$L(t^j, M) = \frac{1}{n} \sum_{i = m_1, m_2, \ldots, m_n} \frac{1}{|t^{m_i}|} \sum_{k=1}^{|t^{m_i}|} L(t_{jk}, t_{ik}) \qquad (1)$$
As a result of this definition, the tracklets in a bag-of-tracklets are very likely to correspond to the same person in the video, which increases the chance of extracting a true track of that person throughout the video. In this way, the video is represented by a group of bags-of-tracklets corresponding to the persons that appear in it. Fig. 3 shows an illustration of a bag-of-tracklets. Note that not all tracklets have the same initial and final frame (as shown in Fig. 3), but grouping them in the same bag-of-tracklets helps to decide the right beginning and end of the person's appearance. On the other hand, different persons in the same frames give rise to different bags-of-tracklets.

So far, we have used all the tracklets generated from all the seeds to form the bags-of-tracklets. However, some seeds might be detection false alarms, which can in turn produce false prototypes. Such tracklets are likely to produce sparse bags-of-tracklets, which lead to unreliable human-based video segmentation. In the last phase of our approach, we aim to exclude unreliable bags-of-tracklets and to segment the video mainly from the reliable ones. To compute the reliability of a bag-of-tracklets $B$, we define its density $d(B)$ as the number $N_B$ of tracklets in the bag divided by the median length of all the tracklets $t^i$ inside it:
$$d(B) = \frac{N_B}{\operatorname{median}_{t^i \in B}\big(\mathrm{length}(t^i)\big)} \qquad (2)$$
Note that in ideal conditions, where detection and tracking do not fail, $d(B) = 1$: each frame in which the person appears generates a seed, and that seed generates a tracklet spanning from the first to the last frame of the person's appearance, so the bag would contain as many tracklets as the number of frames in which the person persists in the video. In practice, since person detection and tracking may fail, the bag-of-tracklets looks for a consensus between the different tracklets to obtain the right tracking outcome. The diversity among different bags-of-tracklets becomes more evident if we plot the density of a bag-of-tracklets against the median length of its tracklets: reliable bags-of-tracklets show very different behavior from unreliable ones. In the training stage, we used this behavior to train a linear SVM classifier to remove unreliable bags-of-tracklets; the trained SVM is later used to distinguish reliable from unreliable bags-of-tracklets based on their density. By excluding unreliable bags-of-tracklets, the final prototype of our method emerges from the reliable ones. The final prototype of the system is obtained as the sequence of the $k$ tracklet boxes in the frames of the bag-of-tracklets that have the largest intersection with the rest of the tracklet boxes in the corresponding frames (Fig. 4, third row).
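As an illustration of the reliability filter, here is a hedged sketch (our own; scikit-learn's LinearSVC stands in for the authors' SVM implementation, and the feature pairing of density and median length is an assumption based on the description above).

```python
import numpy as np
from sklearn.svm import LinearSVC

def bag_density(bag):
    """Eq. (2): number of tracklets in the bag divided by the median tracklet length."""
    lengths = [len(tracklet) for tracklet in bag]   # tracklet: frame index -> box
    return len(bag) / np.median(lengths)

def train_reliability_classifier(bags, labels):
    """labels[i] = 1 for a reliable bag, 0 otherwise (annotated for training).
    Features: (density, median tracklet length) per bag, as suggested by the text."""
    features = np.array([[bag_density(b),
                          np.median([len(t) for t in b])] for b in bags])
    clf = LinearSVC()
    clf.fit(features, labels)
    return clf
```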

3. Experimental Results

3.1. Dataset

Given the unique and highly specific nature of life-logging images and the lack of an available dataset with ground-truth information on the presence of people in videos, we collected our own dataset to test the proposed model. This dataset was acquired by three users of different ages. Each one wore a SenseCam for a number of non-consecutive days over a total period of 10 days, collecting ∼24,000 images, of which ∼11,000 contain a total of 65 different trackable people across 58 sequences with an average length of 80 frames. In this experiment, we consider a person trackable when he or she appears in at least 10 consecutive frames. To create a ground truth, we reviewed the image collections and manually marked the boundaries of human appearance. Table 1 provides details about the data used.

3.2. Results

We ran our person detection algorithm, generated the seeds from all positive detections, generated the tracklets and formed the bags-of-tracklets. All unreliable bags-of-tracklets were removed using the linear SVM, and a prototype was extracted for each remaining bag-of-tracklets. Fig. 4 shows how the bag-of-tracklets improves on the detection and tracking algorithms. The first row shows a tracklet where compressive tracking failed to follow the person, confusing it with a region of the scene.


Figure 4. Example of bags-of-tracklets that overcome false positives and false negatives of the detection and tracking algorithms. Rows 1–5 represent different tracklets; row 3 is the extracted prototype. The first row shows a tracklet where compressive tracking fails. The last row shows a tracklet whose seed is a false positive of the person detector. Seeds are shown by a frame around them; bounding boxes in the images represent the person detected or tracked.

Since this tracklet differs significantly from the rest of the tracklets, it did not affect the final prototype extracted (shown in the middle row). The last row shows a tracklet whose seed was generated by a false positive of the person detector. Again, although this tracklet was accepted by the bag-of-tracklets model, it was too different from the rest of the tracklets, so it did not affect the extracted prototype. Thus, by looking for a consensus among the tracklets, the bag-of-tracklets model is able to overcome problems of detection and tracking and to improve the final performance of the algorithm.

To determine the accuracy of our method, for each manual track specified in the ground truth we analyzed the performance of the bag-of-tracklets model in the following way. We estimated the tracking accuracy using the measure of [8], which computes the number of true-positive tracked frames divided by the total number of frames of that track in the ground truth. Note that this measure is quite similar to the Jaccard measure used in video segmentation. Then, we averaged the accuracy of the tracklet prototype extraction over the whole set of bags-of-tracklets. The last column of Table 1 gives the accuracy obtained by the bag-of-tracklets for the three users. As a result, our model succeeds in tracking persons with an average accuracy of 84%. Comparison with state-of-the-art methods is difficult, since there is no other study on person tracking in SenseCam data. However, to demonstrate how our work improves on existing state-of-the-art tracking techniques when applied to a SenseCam dataset, we compared our results with two other methods.

Table 1. This table provides a detailed breakdown of the 23,909 images captured by the 3 users.

User   Total frames   Total frames with person(s)   Average daily duration   Accuracy
1      7582           2413                          10 h                     83%
2      7280           5651                          8 h                      86%
3      9046           1994                          11 h                     85%

Table 2. Comparison of tracking accuracy between the Hungarian algorithm, compressive tracking (CT) and bag-of-tracklets.

Hungarian   Compressive Tracking   Bag-of-tracklets
40%         48%                    84%

The first method is based on the Hungarian algorithm, which solves the assignment problem of person detections between frames using appearance information of the objects (RGB and HOG) in every video frame. The second method is the compressive tracking technique [8]. Tracking results over SenseCam images are shown in Table 2. Note that tracking SenseCam data turns out to be very hard for standard tracking techniques due to the low spatial and temporal resolution of the data, whereas the bag-of-tracklets succeeds in tracking persons with 84% accuracy.
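For reference, the Hungarian baseline can be reproduced along the following lines. This is a sketch under assumptions: the appearance distance is a generic descriptor distance (the exact RGB/HOG combination used by the authors is not specified here), and SciPy's linear_sum_assignment provides the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(desc_prev, desc_curr, max_cost=0.8):
    """Frame-to-frame assignment of person detections.
    desc_prev, desc_curr: arrays of appearance descriptors (e.g. RGB histogram + HOG),
    one row per detection. Returns a list of (prev_idx, curr_idx) matches."""
    cost = np.linalg.norm(desc_prev[:, None, :] - desc_curr[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm [18]
    # Discard assignments whose appearance distance is too large (threshold assumed).
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```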

4. Conclusions

In this work, we introduced a novel method to track persons in low spatial and temporal resolution images acquired by a SenseCam camera. This tracking makes it possible to obtain a human-based segmentation of visual life-logging data, which is of interest for detecting events based on human presence. The bag-of-tracklets is a novel idea for extracting life-log events based on the presence of people over a time span in the SenseCam video. We integrate similar tracklets in a bag-of-tracklets and exclude unreliable bags in a supervised manner. In this way, the bag-of-tracklets addresses the tracking problem using a higher level of information (dealing with a group of information as a bag-of-tracklets instead of single tracklets) and increases the robustness of the model. We tested our model on a dataset of 24,000 images. As shown in Table 2, our model achieves an overall accuracy of 84%. We compared our model to two other methods on this dataset, and the experiments show that a significant improvement is achieved with our proposed model compared to state-of-the-art tracking techniques. As future work, we are tackling the tracking of persons under occlusions and applying the human-based video segmentation to build a final life-logging application for cognitive exercises of patients with mild cognitive impairment.

Acknowledgments

This work was partially funded by the projects TIN2012-38187-C03-01, Fundació "Jaume Casademont" - Girona and SGR 1219.


References

[1] Z. Yin, R. Collins, On-the-fly object modeling while tracking, CVPR, 2007.
[2] A. A. Butt, R. T. Collins, Multi-target Tracking by Lagrangian Relaxation to Min-Cost Network Flow, CVPR, 2013.
[3] I. Matthews, T. Ishikawa, S. Baker, The template update problem, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, pp. 810-815, 2008.
[4] A. Ess, B. Leibe, K. Schindler, L. van Gool, A mobile vision system for robust multi-person tracking, CVPR, 2008.
[5] B. Leibe, K. Schindler, L. Van Gool, Robust multi-person tracking from a mobile platform, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, pp. 1831-1846, 2009.
[6] W. Choi, S. Savarese, Multiple target tracking in world coordinate with single, minimally calibrated camera, ECCV, 2010.
[7] C. Pantofaru, S. Savarese, A General Framework for Tracking Multiple People from a Moving Camera, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 1577-1591, 2013.
[8] K. Zhang, L. Zhang, M. Yang, Real-time compressive tracking, ECCV, 2012.
[9] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627-1645, 2010.
[10] A. Roshan Zamir, A. Dehghan, M. Shah, GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs, ECCV, 2012.
[11] H. Pirsiavash, D. Ramanan, Detecting Activities of Daily Living in First-person Camera Views, CVPR, 2012.
[12] A. Fathi, J. K. Hodgins, J. M. Rehg, Social Interactions: A First-Person Perspective, CVPR, 2012.
[13] G. Vaca-Castano, A. Roshan Zamir, M. Shah, City Scale Geo-Spatial Trajectory Estimation of a Moving Camera, CVPR, 2012.
[14] Y. Jae Lee, J. Ghosh, K. Grauman, Discovering Important People and Objects for Egocentric Video Summarization, CVPR, 2012.
[15] C. Li, K. M. Kitani, Pixel-level Hand Detection in Ego-Centric Videos, CVPR, 2013.
[16] Zh. Lu, K. Grauman, Story-Driven Summarization for Egocentric Video, CVPR, 2013.
[17] S. Hodges, L. Williams, E. Berry, Sh. Izadi, J. Srinivasan, A. Butler, G. Smyth, N. Kapur, K. Wood, SenseCam: A retrospective memory aid, UbiComp 2006: Ubiquitous Computing, pp. 177-193, 2006.
[18] H. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83-97, 1955.

Artificial Intelligence Research and Development
L. Museros et al. (Eds.)
IOS Press, 2014
© 2014 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-452-7-45


Improving Autonomous Underwater Grasp Specification Using Primitive Shape Fitting in Point Clouds

David FORNAS, Jorge SALES, Antonio PEÑALVER, Javier PÉREZ, J. Javier FERNÁNDEZ, Pedro J. SANZ
{dfornas, salesj, penalvea, japerez, fernandj, sanzp}@uji.es

Abstract. This paper presents research in progress towards autonomous underwater robot manipulation. Current research in underwater robotics aims to increase the autonomy of intervention operations that require physical interaction. Autonomous grasping is still a very challenging skill, especially in underwater environments, with highly unstructured scenarios, limited availability of sensors and adverse conditions that affect the robot perception and control systems to various degrees. To tackle these issues, we propose the use of vision and segmentation techniques that aim to improve the specification of grasping operations on underwater primitive-shaped objects. Several sources of stereo information are used to gather 3D information in order to obtain a model of the object. Using a RANSAC primitive shape recognition algorithm, the model parameters are estimated and a set of feasible grasps is computed. This approach is validated in simulation, and the quality of different 3D reconstructions from both real and virtual scenarios is analyzed.

Keywords. underwater autonomous grasping, grasp specification, point cloud, RANSAC, shape fitting, UWSim underwater realistic simulator

1. Introduction

Research on autonomous robotic intervention on land has made some valuable achievements. In contrast, the current state of the art in underwater intervention is at a very primitive stage, where most systems are tele-operated by an expert user with complex interfaces. Nowadays, Remotely Operated Vehicles (ROVs) are the most used machines, but the trend is to advance towards Autonomous Underwater Vehicles (AUVs). This approach poses many challenges, autonomous grasping and manipulation being among the biggest. Within this context there exist only a few projects in autonomous manipulation. In the field of underwater intervention it is worth mentioning previous projects like AMADEUS [1], in which two underwater robot arms were mounted on a cell to demonstrate bi-manual skills; nevertheless, this functionality, demonstrated on a fixed frame, was never demonstrated on board an AUV, and hence with a mobile base. TRIDENT [2] demonstrated in 2012 the capability to autonomously survey an area and recover an object from the seafloor, still with some interaction with a human operator and restricted to shallow waters.

In the context of that project, a framework to grasp objects with user interaction only in the grasp-planning phase was presented in [3]. This approach, focused on increasing autonomy using 3D data, is further developed here. Presently, in the TRITON project [4], an underwater robot that docks to an intervention panel is used to turn a valve or remove a connector, treating the structure as a well-known object and thus limiting the amount of information needed from the operator and increasing the robot's autonomy. Meanwhile, two ongoing projects funded by the European Commission are running in the underwater intervention context: MORPH [5] and PANDORA [6]. In the latter, a learning-by-demonstration approach is used: a human operator teaches the system how to turn a valve through a set of trials, and the system then generalizes this information to turn a valve autonomously under similar conditions.

With respect to segmentation and modelling of characteristics for object manipulation, it is worth mentioning some advances that use 3D environment information to determine a grasp posture. In more controlled land scenarios it is possible to obtain an almost perfect object model, for example using a rotating table and stereo cameras. In the presented context, however, only a single view of the object can be easily obtained and part of the object is unknown. There is an important effort in the 3D information processing field to extract the geometric features of an object; in particular, the RANSAC (RANdom SAmple Consensus) method [7] estimates the parameters of a mathematical model from a set of observed data which contains outliers. The present paper is inspired by [8], which uses RANSAC on a 3D representation of an object in order to determine the best available grasp. In [9], the decomposition of complex objects into a series of primitive shapes, with which a better grasp can be determined, is described. These methods are not directly applicable here because less 3D information is available to marine sensors.

In this paper we present a method able to perform grasping tasks more autonomously in the constrained, yet realistic, problem of grasping cylindrical objects like an amphora or a pipe. More generally, it allows unknown objects that resemble primitive shapes to be grasped autonomously, in the sense that an amphora can be considered a cylinder, or an airplane black box resembles a cuboid, for the purpose of manipulation. Grasping objects generally requires at least some partial 3D structure, which is gathered using stereo vision and laser reconstruction [3]. The obtained point cloud is then used for planning a grasp, which is then executed fully autonomously by the robot. This article is organized as follows: Section 2 describes the considered scenarios and setups; Section 3 briefly outlines the 3D point cloud acquisition and the RANSAC shape fitting algorithm; Section 4 describes the grasp specification with the analytical model of the object and the specification interface; Section 5 shows the current results; and finally, further work and conclusions are included in Section 6.

2. Experimental setup

Although real experiments are in progress, a simulated environment has been used to develop and test the proposed algorithms. UWSim [10] has been chosen as the simulator for these experiments because it allows working with robots in underwater environments within the ROS [11] architecture. The considered scenario is illustrated in Figure 1. The mechatronics consists of a virtual model of an underwater robotic arm attached to the Girona500 AUV [12].

Figure 1. Simulation environment that reproduces the CIRS pool with the GT arm attached to the Girona 500 AUV (left). Real water tank conditions used for the experiments: the ARM5E arm is attached to a floating platform placed in the water in a fixed position (right).

The arm has 7 DOF and was developed by GraalTech (named here GT-arm) in the context of the TRIDENT project [2]. The end-effector is a jaw that allows a parallel grip; in this way, only an approach vector is needed to specify the grasp, although this solution has less flexibility than using a dexterous hand. The vehicle floats in the simulated CIRS pool (Centre d'Investigació en Robòtica Submarina, University of Girona) and the target object is an amphora lying on a textured floor which simulates the seafloor. This vehicle-arm configuration is not physically available for experimental validation, but its features are optimal for testing purposes in simulation.

The real experimental setup, which is also considered, is based on a 5-DOF underwater robotic arm (in our case the CSIP Light-weight ARM 5E [13]) attached to a floating vehicle prototype that remains static (Figure 1). The real scenario, resembling the aforementioned virtual scene, consists of a 2 m x 2 m x 1.5 m water tank. The real target object is also a cylindrical object (a clay jar) lying on a planar surface surrounded by stones. There are two real vision systems placed looking towards the ground. The first one consists of a Videre stereo camera near the base of the arm inside a sealed case. The second consists of a monocular underwater camera (Bowtech 550C-AL), used in conjunction with a laser projector (Tritech SeaStrip) attached to the arm to obtain a 3D representation of the scene [3]. As can be seen in Figure 2, a laser stripe is projected and visually segmented to recover the 3D coordinates of each point. It is worth mentioning that the simulated vehicle has access to simulated versions of the same sensors as the real system, i.e. stereo vision, monocular cameras and a laser projector. Thanks to this, it is possible to use the same sources of 3D data that could be used in a real system.

Figure 2. Laser stripe projection over the scene. Camera point of view (left) and system overview (right).


3. 3D Reconstruction and Segmentation

The first step in order to grasp an object is the acquisition of information about the environment. As stated previously, it can be gathered either with laser stripe reconstruction or with a stereo camera (real or virtual in UWSim). The algorithms have been tested with both sources, obtaining a single point cloud which can be processed using the Point Cloud Library (PCL [14]). This processing consists of downsampling the point cloud, in order to decrease the number of points and keep only the most relevant ones, and applying an outlier filter, in order to remove points that are potentially wrong values caused by spurious particles or optical reconstruction errors. Decreasing the number of points decreases the computation time of the following steps and increases the robustness of the overall method.

With this reduced point cloud, the RANSAC algorithm described in [7] is used twice to separate the object from the background. First, the background plane is detected with a RANSAC plane fitting algorithm, and the resulting parameters are used to remove the plane inliers from the original point cloud. In the next step, another RANSAC algorithm is used to obtain the cylinder parameters associated with the object (which is assumed to be an amphora). These algorithms are parameterized to allow control of fitting quality and performance; the main parameters are the distance-to-plane threshold in the case of the plane segmentation, and the distance to the cylinder and maximum radius in the case of the cylinder. The result of these steps is a set of inliers that represent the detected amphora points, together with the analytical parameters: a point on the obtained model axis, the axis direction and the cylinder radius. This process has been tested separately from the final execution using clouds extracted from all the previous point cloud sources, with different degrees of success. In general, stereo cameras perform better with good light conditions, such as in the experimental water tank or the simulation scene, while laser reconstruction benefits from darker conditions where the contrast with the green stripe is higher (as in deep sea operations or the water tank with lights off and windows closed). Different segmentations from virtual and real stereo cameras are shown in Figures 4-8 and are further explained in the Results section.
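As a rough illustration of the segmentation step (not the PCL code used by the authors), the following numpy sketch fits the background plane with RANSAC and removes its inliers; the cylinder is then fitted to the remaining points with the same sample-and-score pattern, parameterized by the distance-to-cylinder threshold and the maximum radius.

```python
import numpy as np

def ransac_plane(points, dist_threshold=0.10, iterations=500, seed=0):
    """Fit a plane to an (N, 3) point cloud with RANSAC.
    Returns (normal, d, inlier_mask) for the plane n.x + d = 0."""
    rng = np.random.default_rng(seed)
    best_mask, best_model = None, None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(sample[0])
        mask = np.abs(points.dot(normal) + d) < dist_threshold
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    return best_model[0], best_model[1], best_mask

# Usage sketch: remove the floor inliers, then run a cylinder RANSAC on the rest
# to recover the axis point, axis direction and radius of the amphora.
# normal, d, floor = ransac_plane(cloud, dist_threshold=0.10)
# object_points = cloud[~floor]
```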

4. Grasp specification and execution

Using the cylinder model and the corresponding points (also called inliers) obtained with the RANSAC algorithm, a grasp posture can be specified. To avoid errors, the grasp point is computed using the most significant points of the cylinder inliers (the 90% of points nearest to the centre). The middle point of the cylinder axis is used as the starting grasp point. Then, taking into account the amphora radius and the desired approach distance and angle, the grasping end-effector frame is moved away from the starting position, with these free variables allowing different grasp frames to be computed around the cylinder axis. This freedom allows two different possibilities: (1) use these variables to autonomously maximize grasp characteristics such as the angle with the floor and stability, or (2) use them to allow the end-user to set up a grasp with an easy and quick interface. While the first approach is really appealing, the loss of autonomy of the second option sometimes becomes a large gain in robustness, as it is the user who decides whether a grasp pose is good enough or not.

Figure 3. Grasp specification interface showing the two types of grasp: side grasp (top left) and front grasp (top right). UWSim showing the GT-arm in the obtained grasp poses from two grasp specification processes (bottom).

This differs from the approach used in [3], where the user selected a pair of points within a 2D image. In our approach, the user instead modifies the grasp pose in 3D space until he considers the position desirable. This could be quite a difficult task if the interface allowed the grasp to be set in a completely free 3D space; for this reason, an interface was developed in which the obtained analytical model is used to set the grasp approach vector, allowing the user to move the end-effector around the cylinder axis and place it in the desired pose very quickly. Two possible grasp configurations are shown in Figure 3.

After the grasp to perform has been specified, it is necessary to check whether it is feasible or not. This can be done by computing the inverse kinematics of the whole arm kinematic chain and checking its reachability. Our approach is to adopt a classical iterative inverse-Jacobian method, where the Jacobian is computed in order to exploit the kinematic redundancy of the system. This process can be run immediately after any movement of the grasp posture, showing the result to the user the moment a new posture has been chosen. When the grasp frame is selected and reachable, it is executed in the UWSim simulator. Figure 3 shows the arm configurations for the two possible approaches. With the 7-DOF GT-arm there are enough degrees of freedom to reach different positions with completely different orientations, while the ARM5E arm would constrain the possible approach vectors to a great extent; this would make it almost impossible for the user to set orientation variations without including the vehicle in the kinematic chain.

Finally, the grasp is executed by moving the end-effector directly towards the object axis. The grasp simulation provided by UWSim consists of a capability called the virtual object picker, which attaches the object to the inner part of the gripper when a given distance threshold is reached, thus assuming that if the hand reaches the object without colliding with the jaws it will be able to grasp it. Although it does not yet use physics to perform the grasp, it lets the user visualize how the grasp would perform in a real scenario. Increasing the capabilities of UWSim in this sense is not the goal of this paper.
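A hedged sketch of how the grasp frame described above could be parameterized from the fitted cylinder (axis point, axis direction, radius) plus the user-controlled approach angle and stand-off distance; the frame convention (approach axis pointing towards the cylinder axis) is our assumption, not taken from the authors' code.

```python
import numpy as np

def grasp_frame(axis_point, axis_dir, radius, angle, standoff):
    """Return a 4x4 homogeneous end-effector pose around the cylinder.
    angle rotates the approach direction about the cylinder axis;
    standoff is the extra distance kept from the cylinder surface."""
    a = axis_dir / np.linalg.norm(axis_dir)
    # Build an arbitrary radial direction perpendicular to the axis.
    ref = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(a, ref); u /= np.linalg.norm(u)
    v = np.cross(a, u)
    radial = np.cos(angle) * u + np.sin(angle) * v
    position = axis_point + radial * (radius + standoff)
    z_axis = -radial                      # approach vector: towards the cylinder axis
    x_axis = a                            # gripper opening aligned with the axis
    y_axis = np.cross(z_axis, x_axis)
    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x_axis, y_axis, z_axis, position
    return T
```

Sweeping the angle and stand-off values reproduces the family of candidate grasp frames the user browses through in the interface.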

The low-level control architecture, including the arm kinematics, was implemented in C++ and makes use of ROS for inter-module communication. The kinematic module accepts either Cartesian or joint information (i.e. pose, velocity). In the specification stage, velocity control is used to move the end-effector towards the desired position. Then, in the execution stage, a linear velocity towards the center of the object is applied. Finally, the robot is commanded to a folded configuration to carry the object.
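The reachability check used in the specification stage can be sketched as a classical iterative (damped) inverse-Jacobian loop; this is our own illustrative version, where the forward-kinematics and Jacobian functions are placeholders for the robot-specific model implemented in the C++ kinematic module.

```python
import numpy as np

def solve_ik(q0, target, fk, jacobian, tol=1e-3, max_iters=200, damping=1e-2):
    """Iterative inverse kinematics used as a reachability check.
    q0: initial joint vector; target: desired 6D pose [x, y, z, roll, pitch, yaw];
    fk(q): forward kinematics returning a 6D pose; jacobian(q): 6xN Jacobian.
    fk and jacobian are placeholders for the robot-specific model."""
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(max_iters):
        err = np.asarray(target) - fk(q)            # 6D pose error
        if np.linalg.norm(err) < tol:
            return q, True                          # pose is reachable
        J = jacobian(q)
        # Damped least-squares inverse: robust near singularities and lets the
        # 7-DOF arm exploit its kinematic redundancy.
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(6), err)
        q = q + dq
    return q, False
```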

5. Results

As mentioned above, the grasp specification algorithm presented in this paper has been tested with various image sources and has been used to execute a grasp with the described vehicle in the simulation environment, demonstrating the grasp simulation capabilities of UWSim. Moreover, using the simulator has made it possible to work with an otherwise unavailable 7-DOF arm that can exploit the algorithms better than an arm with fewer DOFs. This decision (the use of UWSim) has also prevented most optical errors from appearing, because the light conditions of the simulated environment are nearly ideal. The grasp specification interface can be seen in http://youtu.be/cXUz1Wx9Ycg, which shows the two grasp possibilities proposed for a cylindrical object, where a non-expert user can guide the system to a proper starting position. The grasp execution capabilities are then shown in http://youtu.be/3lSvb6Tgrbs, where the robot approaches the object from the specified position and carries the amphora. These trials do not fully validate the execution algorithm, but they give us a sense of the integration level of the perception, specification, user interaction, kinematics and control systems of the robot within this framework. Execution in the real scenario might need a higher level of control complexity using a variety of sensors.

Figure 4. Point clouds from the Videre stereo camera. Segmentation with a 10 cm distance-to-cylinder threshold (center). Segmentation with a 5 cm threshold (right).

Figure 5. Point clouds from the virtual stereo camera. Extreme scenario where accurate segmentation parameters are needed.

The segmentation and grasp specification stages have been analyzed with real and virtual images from both the stereo camera and the laser reconstruction method; these steps are critical for the quality of the final grasp posture. Several point clouds have been segmented in order to analyze the parametrization of the RANSAC algorithm. Depending on the distance-to-plane threshold, the distance-to-cylinder threshold and the maximum cylinder radius, a better segmentation can be obtained. In Figure 4, the point cloud obtained with the real stereo camera is segmented with good results, and varying the thresholds results in some points of the floor being considered part of the cylinder. A similar effect can be seen in the simulator experiment (Figure 5), where with a certain plane threshold some part of the pool floor is not considered part of the plane, resulting in an error in the following cylinder segmentation. This, though, can be considered an extreme case, because the floor corner has lots of points and its curved shape fits the cylinder model well. With regard to the laser reconstruction, Figure 6 shows that this method generates more occlusions than stereo vision, due to the fact that some areas lit by the laser are not seen from the camera and some areas cannot be covered by the laser. In this context, a lower distance-to-cylinder threshold allows the algorithm to segment only the points that are really part of the object. Another possibility would be to increase the plane threshold in order to extract more plane inliers.

Figure 6. Point clouds from real laser reconstruction and segmentation with varying cylinder segmentation thresholds.

Although the laser reconstruction within UWSim (Figure 7) obtains worse point clouds than the virtual stereo vision, the segmentation results are still very good because the cylindrical shape is well preserved by the reconstruction. The previous results seem to indicate that using a higher threshold can be better to ensure that the cylinder is found. However, if two different objects are placed in the scene and the threshold is not high enough, the object with more points can be segmented instead of the smaller, more cylindrical object (Figure 8).

With respect to the segmentation execution time, the following results show that it is not possible to run the segmentation in real time if the object is moving. The number of iterations of the algorithm sets the upper limit on the time, which in this case has been up to 0.4 seconds. For this analysis, distance-to-plane thresholds of 10 cm and 5 cm were considered over various executions. Tables 1 and 2 show the point cloud size, the mean segmentation time and the mean inlier percentage. In Table 1, where the higher threshold is used, the relationship between point cloud size and execution time is almost linear. However, in Table 2, the execution time depends on the segmentation complexity: when the threshold is lowered, the time spent is generally higher, while a higher percentage of inliers reduces the computing time.

Figure 7. Point clouds from virtual laser reconstruction and segmentation with different cylinder segmentation thresholds.

Figure 8. Point clouds from real laser segmentation of a scene with a clay jar and a wooden trunk. Low thresholds (center) lead the algorithm to segment bigger objects instead of cylindrical shaped ones.

Source           Num. of points   Seg. time (s)   % of plane inliers
Real Laser       51971            0.058           77
Real Stereo      307200           0.257           59
Virtual Stereo   786432           0.414           78
Virtual Laser    60326            0.043           90

Table 1. Plane segmentation time and percentage of found inliers using a 10 cm threshold with RANSAC.

Source           Num. of points   Seg. time (s)   % of plane inliers
Real Laser       51971            0.27            47
Real Stereo      307200           1.066           37
Virtual Stereo   786432           0.397           77
Virtual Laser    60326            0.056           82

Table 2. Plane segmentation time and percentage of found inliers using a 5 cm threshold with RANSAC.

Finally, Table 3 shows the results of the next step, the cylinder segmentation, for both plane segmentation cases, together with the number of points remaining after removing the plane inliers. The results differ considerably between sources. With the real sources, a looser plane threshold removes more points from the original cloud, thus making the cylinder segmentation quicker. Using the virtual sensors, though, cylinder segmentation is always relatively fast because the points fit the model very well, even more so in the case of the virtual laser segmentation, where the number of points is lower. These results indicate that the segmentation process is flexible but not fast enough to be executed at a reasonable frequency such as 5 Hz, so other tracking techniques should be used to follow the object's movements.

                 10 cm threshold              5 cm threshold
Source           Plane outliers   Time (s)    Outliers   Time (s)
Real Laser       11536            1.385       27497      1.81
Real Stereo      124850           0.476       193518     4.8
Virtual Stereo   171560           0.22        174087     0.18
Virtual Laser    5860             0.022       10688      0.08

Table 3. Cylinder segmentation time using RANSAC depending on the threshold.

6. Conclusions and future work

After very successful research achievements in previous projects like TRIDENT, which followed a semi-autonomous strategy, the ongoing TRITON project follows a more autonomous strategy. This paper presents a new framework to improve the autonomy of manipulation of unknown structured objects in underwater environments. The experimental validation has focused on grasp tasks in the constrained, yet realistic, problem of grasping unknown cylindrical objects like an amphora or a pipe, although the framework can be further developed to recognize other primitive shapes. These shapes could be spheres, cuboids or even more complex models that can be approximated with a set of primitive shapes, as shown in [9]. With that flexibility, the system could be capable of specifying the grasp of other objects without needing to know the exact model of the object, only its approximate shape. However, this flexibility would have the drawback of more computation time in these time-sensitive tasks.

The steps to obtain a model, specify a grasp and execute it have been described, and should be further validated in real and practical scenarios. The real testbed for the experiments has been shown in Figure 1. The optical devices and segmentation algorithms have been extensively tested (see Figure 9), but a real grasp execution is still work in progress. This will demonstrate the use of this specification procedure, although it will not use all of its flexibility, since it is not possible to reach all orientations with the ARM5E arm, as it has only 5 DOF. The real object to grasp, with a cylindrical shape, is also shown in Figure 1. With these experiments, other issues are expected to arise, such as grasp execution control using tactile sensor feedback.

Figure 9. User specified grasp pose with a point cloud using the Videre Stereo Camera (left). Right camera image (right).

Acknowledgement

This research was partly supported by Spanish Ministry of Research and Innovation DPI2011-27977-C03 (TRITON Project), by Foundation Caixa Castelló Bancaixa PI-1B2011-17, by Universitat Jaume I PhD grants PREDOC/2012/47 and PREDOC/2013/46, and by Generalitat Valenciana PhD grant ACIF/2014/298.

References

[1] G. Marani, D. Angeleti, G. Cannata, G. Casalino, On the functional and algorithmic control architecture of the AMADEUS dual arm robotic workcell, WAC 2000, Hawaii (USA), June 2000.
[2] P. J. Sanz, P. Ridao, G. Oliver, G. Casalino, Y. Petillot, C. Silvestre, C. Melchiorri, A. Turetta, TRIDENT: An European project targeted to increase the autonomy levels for underwater intervention missions, OCEANS 2013 MTS/IEEE, San Diego (USA), pp. 1-10, 23-27 September 2013.
[3] M. Prats, J. J. Fernández, P. J. Sanz, Combining Template Tracking and Laser Peak Detection for 3D Reconstruction and Grasping in Underwater Environments, International Conference on Intelligent Robots and Systems IROS 2012, Algarve (Portugal), October 2011.
[4] P. J. Sanz, J. Pérez, A. Peñalver, J. J. Fernández, D. Fornas, J. Sales, R. Marin, GRASPER: HIL Simulation Towards Autonomous Manipulation of an Underwater Panel in a Permanent Observatory, OCEANS 2013 MTS/IEEE, San Diego (USA), September 2013.
[5] J. Kalwa, M. Carreiro-Silva, F. Tempera, J. Fontes, R. S. Santos, M. C. Fabri, L. Brignone, P. Ridao, A. Birk, T. Glotzbach, M. Caccia, J. Alves, A. Pascoal, The MORPH concept and its application in marine research, OCEANS 2013 MTS/IEEE, Bergen, pp. 1-8, 10-14 June 2013. doi: 10.1109/OCEANS-Bergen.2013.6607988
[6] D. M. Lane, F. Maurelli, T. Larkworthy, D. Caldwell, J. Salvi, M. Fox, K. Kyriakopoulos, PANDORA: Persistent Autonomy through Learning, Adaptation, Observation and Re-planning, 3rd IFAC Workshop on Navigation, Guidance and Control of Underwater Vehicles.
[7] R. Schnabel, R. Wahl, R. Klein, Efficient RANSAC for Point-Cloud Shape Detection, Computer Graphics Forum, Vol. 26, No. 2, pp. 214-226, Blackwell Publishing Ltd., June 2007.
[8] S. Garcia, Fitting primitive shapes to point clouds for robotic grasping, Master Thesis, Royal Institute of Technology, Sweden, 2009.
[9] K. Huebner, S. Ruthotto, D. Kragic, Minimum volume bounding box decomposition for shape approximation in robot grasping, IEEE International Conference on Robotics and Automation, pp. 1628-1633, 19-23 May 2008.
[10] M. Prats, J. Pérez, J. J. Fernández, P. J. Sanz, An Open Source Tool for Simulation and Supervision of Underwater Intervention Missions, International Conference on Intelligent Robots and Systems IROS 2012, Algarve (Portugal), October 2011.
[11] M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler, A. Y. Ng, ROS: an open-source Robot Operating System, International Conference on Robotics and Automation (ICRA), Kobe (Japan), May 2009.
[12] D. Ribas, P. Ridao, LL. Magí, N. Palomeras, M. Carreras, The Girona 500, a multipurpose autonomous underwater vehicle, Proceedings of the Oceans IEEE, Santander, Spain, June 2011.
[13] CSIP. Light-weight arm 5E datasheet. http://www.ecarobotics.com/ftp/ecatalogue/520/Fiche_Light Weight_5E.pdf
[14] R. B. Rusu, S. Cousins, 3D is here: Point Cloud Library (PCL), IEEE International Conference on Robotics and Automation (ICRA), Shanghai (China), May 2011.

Artificial Intelligence Research and Development
L. Museros et al. (Eds.)
IOS Press, 2014
© 2014 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-452-7-55


Emotions Classification Using Facial Action Units Recognition

David SANCHEZ-MENDOZA, David MASIP, Àgata LAPEDRIZA
{dsanchezmen,dmasipr,alapedriza}@uoc.edu
Scene Understanding and Artificial Intelligence Lab, Open University of Catalonia

Abstract. In this work we build a system for automatic emotion classification from image sequences. We analyze subtle changes in facial expressions by detecting a subset of 12 representative facial action units (AUs). We then classify emotions based on the output of these AU classifiers, i.e. the presence or absence of AUs. The AU classification is based upon a set of spatio-temporal geometric and appearance features for facial representation, which are fused within the emotion classifier. A decision tree is trained for emotion classification, making the resulting model easy to interpret by capturing the combinations of AU activations that lead to a particular emotion. For the Cohn-Kanade database, the proposed system classifies 7 emotions with a mean accuracy of nearly 90%, attaining recognition accuracy similar to that of non-interpretable models that are not based on AU detection.

Keywords. Computer Vision, Emotion Detection, Facial Expression Recognition, Facial Action Units

1. Introduction

Dynamic facial behavior is a rich source of information for conveying emotions. In any communication, humans infer much of the information of the message from the emitter's facial expression; for instance, the severity or the kindness of the message is often perceived from relevant changes in expressiveness [1]. Facial expression is related to the Facial Action Coding System (FACS), developed by Ekman and Friesen [2]. FACS defines accurately and unequivocally a set of atomic and isolated facial movements that people are able to execute. In particular, FACS uses a set of 44 main Action Units (AUs), described by their location and intensity. Ekman and Friesen also defined six basic emotions [3] that are common across different cultures, namely anger, disgust, fear, happiness, sadness and surprise.

In this paper we classify emotions given the presence (or absence) of a subset of AUs. Additionally, we pursue interpretable models from which emotions can be described in terms of AUs. We propose a two-step method: (i) we train Support Vector Machines to classify AUs using two kinds of facial representation, namely a geometric descriptor (facial structure) and local binary patterns (facial appearance), and (ii) we use the output of the AU classifiers to train interpretable emotion detection models by means of Decision Trees, fusing both facial representations. The resulting emotion recognition model is both coherent and interpretable, while attaining recognition accuracy similar to that of non-interpretable models that are not based on AUs.

Figure 1. System overview

2. Related Work

Our method involves two stages: the first trains classifiers, in particular Support Vector Machines, to recognize a subset of representative Action Units, while the second uses the AU activations to train a Decision Tree to classify emotions.

2.1. Action Units detection

We rely on a well-known methodology based on the Facial Action Coding System [2]. The FACS system comprises 44 Action Units (AUs). Each AU defines a set of muscle movements that anatomically define expression descriptors, which can be used for higher-level emotion recognition tasks. Two kinds of facial representation exist: geometric-based and appearance-based. Geometric features represent the shape of the face based on the location and displacement in time of a set of landmark points fitted by means of Active Shape Models [4]. On the other hand, appearance features such as Gabor wavelets [5], Histograms of Oriented Gradients [6,7] or Local Binary Patterns [8] model face appearance changes. Additionally, AUs are detected not only on static frames but also on frame sequences by characterizing the temporal dynamics. The fusion of geometric-based and appearance-based features, together with taking the temporal dynamics into account, has been shown to increase AU detection performance.

Tian et al. [9] detect AUs in static frames using a geometric-based descriptor that defines the state (open, closed, present, etc.) of facial components such as lips, eyes, brows and cheeks based on the position of a set of fiducial points.

Valstar and Pantic [10] and Valstar et al. [11] use the dynamics of a set of tracked fiducial points to recognize AUs in frame sequences. Jiang et al. [12] use LBP [8] and LBP-TOP [13] as static and dynamic appearance-based descriptors for AU detection, the latter achieving higher accuracy. Gabor wavelets are used by Tian et al. [14] along with geometric descriptors; the best AU recognition performance was attained by combining both kinds of descriptors. Almaev and Valstar [15] define a dynamic appearance-based descriptor called LGBP-TOP from the combination of LBP-TOP and Gabor wavelets, outperforming LBP-TOP on the MMI Database [16].

2.2. Emotion Recognition

With regard to emotion detection, we rely on the 7 basic emotions, namely anger, contempt, disgust, fear, happiness, sadness and surprise. As with AUs, emotions can be detected on static frames or frame sequences using geometric-based and appearance-based descriptors or a combination of both. Zhang et al. [17] compared geometric-based and appearance-based features and concluded that Gabor wavelets outperformed the former; they also showed that combining both kinds of features resulted in only a slight improvement over using appearance-based features alone. More recently, Jeni et al. [18] use only dynamic geometric descriptors to recognize emotions by means of the Procrustes transformation. Kotsia and Pitas [19] use a grid-tracking method over a set of fiducial points of the face as a geometric descriptor for emotion inference. On the other hand, Shan et al. [20,21] use Local Binary Patterns with multi-scale sliding windows as an appearance descriptor for emotion detection, and Bartlett et al. [5] developed a real-time system for emotion recognition using a bank of Gabor filters. Of the works cited so far, only [19] makes use of AUs to infer emotions. In a similar way, Pantic and Rothkrantz [22] developed an expert system for emotion inference based on the presence/absence of AUs. Both [19] and [22] establish a set of rules with the most representative AUs for the emotion recognition task.

3. Data Collection

3.1. Database

The database used in this work is the Extended Cohn-Kanade database (CK+) [23]. It includes 593 fully FACS-coded sequences from 123 different subjects, all of them starting with a neutral expression and moving towards the peak expression. All the image sequences are annotated with 68 landmark points. Participants are 18 to 50 years of age; 69% of them are female, 81% Euro-American, 13% Afro-American and 6% from other ethnic groups. A subset of 118 subjects is annotated with basic emotions: anger, contempt, disgust, fear, happiness, sadness and surprise. In particular, we choose a set of 316 sequences from 117 different individuals, namely all the emotion-labeled sequences having more than 9 frames.

3.2. Image registration

All the faces are normalized to a horizontal interocular distance of 50 px by means of an affine transformation T based on two reference points: the eye centers. This preprocessing aims to minimize the changes in fiducial point locations, rigid head motion and distance to the camera across different individuals.

4. Action Units classifiers

We build a set of AU classifiers for a representative subset of 12 AUs (see Table 1), i.e. those that cover the most relevant facial parts and have enough training samples in the CK+ database to obtain robust AU classifiers.

1 - Inner Brow Raiser      2 - Outer Brow Raiser        4 - Brow Lowerer
5 - Upper Lid Raiser       6 - Cheek Raiser             9 - Nose Wrinkle
12 - Lip Corner Puller     15 - Lip Corner Depressor    17 - Chin Raiser
20 - Lip Stretcher         25 - Lips Part               27 - Mouth Stretch

Table 1. Subset of the 12 chosen AUs (all the images belong to the Cohn-Kanade database [23]).

We train two classifiers for each AU, the first using a geometric descriptor (Section 4.1) and the second using an appearance descriptor (Section 4.2).

4.1. Geometric descriptor

The structural descriptor, based on the one introduced by Valstar and Pantic [24], is defined as follows. Let $f_1, \ldots, f_W$ be the $W$ frames of a particular sliding window position, in which $f_W$ is considered the peak expression frame. Let $P_t = \{p_{t1}, \ldots, p_{tK}\} : p_{ti} \in \mathbb{N}_0^2,\ i \in \{1, \ldots, K\}$ be the $K$ locations of the landmark points within a particular frame $f_t$, $t \in \{1, \ldots, W\}$. The descriptor comprises:

The subtraction between the position of a landmark in the last frame and the same landmark in the first frame:

$$p_i^W - p_i^0 \qquad \forall i = 1, \ldots, K \qquad (1)$$

The subtraction among all the landmarks within the last frame:

$$I^W = p_i^W - p_j^W \qquad \forall i, j = 1, \ldots, K : i \neq j \qquad (2)$$

The subtraction defined in equation 2 with respect to the first frame:

$$F^W = I^W - I^0 \qquad (3)$$

The subtraction defined in equation 2 with respect to the mid-sequence frame:

$$M^W = I^W - I^{W/2} \qquad (4)$$
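The four families of differences can be assembled as in the following sketch (our own illustration; variable names are hypothetical, pairwise differences are taken over pairs i < j, and the polar-coordinate conversion described next is included at the end).

```python
import numpy as np

def geometric_descriptor(landmarks):
    """landmarks: array of shape (W, K, 2) with the K landmark positions in each of
    the W frames of the sliding window (frame W-1 is the peak expression frame)."""
    first, mid, last = landmarks[0], landmarks[len(landmarks) // 2], landmarks[-1]
    K = last.shape[0]
    iu = np.triu_indices(K, k=1)                       # landmark pairs i < j

    d1 = last - first                                  # Eq. (1): p_i^W - p_i^0
    I_W = (last[:, None, :] - last[None, :, :])[iu]    # Eq. (2): pairwise differences
    I_0 = (first[:, None, :] - first[None, :, :])[iu]
    I_mid = (mid[:, None, :] - mid[None, :, :])[iu]
    F_W = I_W - I_0                                    # Eq. (3)
    M_W = I_W - I_mid                                  # Eq. (4)

    def to_polar(v):                                   # 2D vectors -> (magnitude, angle)
        return np.stack([np.hypot(v[..., 0], v[..., 1]),
                         np.arctan2(v[..., 1], v[..., 0])], axis=-1)

    return np.concatenate([to_polar(x).ravel() for x in (d1, I_W, F_W, M_W)])
```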

All the subtractions defined in equations 1, 2, 3 and 4 are converted to polar coordinates, making up a 752-feature descriptor.

4.2. Appearance descriptor: LBP from three orthogonal planes

Basic LBP is a simple operator that labels a pixel by thresholding its neighborhood with its intensity and considering the result as a binary number (see Figure 2). The resulting descriptor for the whole image is the histogram of the pixel labels. The basic LBP parameters are the number of neighbors and the radius, with $LBP_{P,R}$ denoting the LBP operator that samples $P$ neighbors with radius $R$.

Figure 2. The basic LBP operator on a 3 × 3 region. The LBP pattern for this particular center pixel would be 11001101.

Applying $LBP_{P,R}$ to every pixel of the image generates a histogram with $2^P$ bins. However, Ojala et al. [25] found that uniform patterns (patterns containing at most 2 bitwise transitions) account for about 90% of all patterns with the $LBP_{8,1}$ setting. The uniform operator $LBP^{u2}_{P,R}$ applied to every pixel of the image generates a histogram with just 59 bins. Rather than applying this operator over the entire face, describing faces with LBP requires dividing the image into $m \times n$ local regions, computing $LBP^{u2}_{P,R}$ on each region and concatenating the resulting histograms. Retaining this spatial information is proven to increase accuracy [15,20,26]. Describing a face in this fashion takes up $59 \times m \times n$ features. Since temporal-dynamics information is shown to improve emotion detection accuracy, we choose Local Binary Patterns from Three Orthogonal Planes (LBP-TOP) [13] as the appearance descriptor. LBP-TOP receives a frame sequence as input and extends the LBP operator by computing the patterns not just on the $XY$ plane but also over the $XT$ and $YT$ planes. Applying $LBP\text{-}TOP^{u2}_{P,R}$ to a sequence of $k$ frames, splitting the $XY$ planes into $m \times n$ local regions, generates a feature vector of size $m \times n \times 59 \times 3$ (notice that the feature vector length is independent of the sequence length). We choose $LBP\text{-}TOP^{u2}_{8,2}$ to describe all the sequences, dividing the $XY$ planes into $7 \times 6$ regions [20]. Thus, the appearance descriptor for a given sequence has $3 \times 7 \times 6 \times 59 = 7434$ features.
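A minimal sketch of a block-wise LBP-TOP-style descriptor using scikit-image, assuming the input is a grayscale frame volume of shape (T, H, W). For brevity, each block is described by the uniform-LBP histograms of its three central orthogonal slices rather than by accumulating patterns over every slice, so it is only an approximation of the full operator; the block grid and parameters follow the text.

```python
import numpy as np
from skimage.feature import local_binary_pattern

P, R, N_BINS = 8, 2, 59          # uniform (u2) LBP with 8 neighbors -> 59 bins

def u2_hist(plane):
    """59-bin histogram of non-rotation-invariant uniform LBP codes."""
    codes = local_binary_pattern(plane, P, R, method="nri_uniform")
    hist, _ = np.histogram(codes, bins=N_BINS, range=(0, N_BINS))
    return hist / max(hist.sum(), 1)

def lbp_top_descriptor(volume, m=7, n=6):
    """Concatenate XY/XT/YT uniform-LBP histograms over an m x n block grid.

    volume: grayscale frame sequence of shape (T, H, W).
    Returns a vector of length m * n * 59 * 3.
    """
    T, H, W = volume.shape
    ys = np.linspace(0, H, m + 1, dtype=int)
    xs = np.linspace(0, W, n + 1, dtype=int)
    feats = []
    for i in range(m):
        for j in range(n):
            block = volume[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            xy = block[T // 2]                     # spatial appearance plane
            xt = block[:, block.shape[1] // 2, :]  # horizontal motion texture
            yt = block[:, :, block.shape[2] // 2]  # vertical motion texture
            feats.extend([u2_hist(xy), u2_hist(xt), u2_hist(yt)])
    return np.concatenate(feats)
```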


4.3. Classification

We train a classifier to detect each AU independently (one-vs-all setting). Given a particular AU, we consider the frame sequences that contain this AU as positive samples and all the sequences not containing it as negative ones. We use a Support Vector Machine (SVM) [27] classifier, in particular the Radial Basis Function kernel of the LIBSVM implementation [28]. Certain AUs are more easily detected by geometric approaches (eyebrow raising, mouth stretch, etc.), while others involve wrinkles that produce changes in texture (chin raiser, nose wrinkle, etc.) and are detected more accurately using appearance features. Therefore, for each AU we train two separate classifiers: one using the geometric-based descriptor (Section 4.1) and the other using the appearance-based descriptor (Section 4.2). We thus end up with 24 AU classifiers (two per AU). Before training the SVM we select the optimal features using GentleBoost [29]. The classifiers' accuracy is assessed using double 5-fold person-independent cross-validation. Given a particular fold, none of the test samples is used for feature selection (GentleBoost) or for optimizing the SVM parameters (cost and gamma). Table 2 shows the results of AU detection.

AU    1a    1g    2a    2g    4a    4g    5a    5g    6a    6g    9a    9g
A    96%   96%   94%   95%   84%   87%   90%   88%   87%   87%   95%   96%
P    96%   97%   92%   96%   81%   82%   84%   79%   82%   82%   96%   94%
R    94%   91%   87%   88%   77%   85%   77%   76%   73%   72%   77%   86%
F    95%   94%   89%   92%   79%   83%   80%   77%   78%   77%   86%   90%

AU   12a   12g   15a   15g   17a   17g   20a   20g   25a   25g   27a   27g
A    92%   94%   90%   87%   83%   80%   94%   95%   95%   88%   96%   97%
P    88%   93%   56%   44%   80%   72%   82%   93%   95%   90%   92%   94%
R    80%   82%   27%   36%   70%   70%   34%   54%   95%   89%   88%   94%
F    84%   87%   37%   40%   75%   71%   49%   68%   95%   89%   90%   94%

Table 2. Accuracy (A), precision (P), recall (R) and F-score (F) for all the AU classifiers. For each AU, the column labeled 'a' corresponds to the appearance-based descriptor and the column labeled 'g' to the geometric-based descriptor.
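A minimal sketch of the one-vs-all, person-independent evaluation described in Section 4.3, using scikit-learn (whose SVC wraps LIBSVM). The use of GroupKFold to keep subjects disjoint across folds, the fixed cost/gamma values and the F-score metric are illustrative assumptions; the GentleBoost feature selection and the inner parameter search of the double cross-validation are omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import SVC
from sklearn.metrics import f1_score

def evaluate_au_classifier(X, y, subject_ids, C=10.0, gamma=1e-3):
    """Person-independent 5-fold evaluation of a one-vs-all RBF-SVM for one AU.

    X           : (n_sequences, n_features) descriptor matrix (geometric or appearance).
    y           : binary labels (1 = AU present in the sequence, 0 = absent).
    subject_ids : subject identifier per sequence, so that no subject appears
                  in both the training and the test fold.
    """
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=subject_ids):
        clf = SVC(kernel="rbf", C=C, gamma=gamma)   # RBF kernel, as with LIBSVM
        clf.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(scores))
```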

5. Emotion recognition

We classify the seven basic emotions based on the absence/presence of AUs, i.e. on the output of the 24 AU classifiers described in Section 4.3 (two SVMs per AU, one trained on the geometric facial description and the other on the appearance description). Thus, the emotion dataset has 24 categorical features, each taking values in {1, −1} depending on the corresponding AU classifier output for a given emotion-labeled example. As in Section 3.1, we use the set of 316 sequences from 117 different individuals, namely all the emotion-labeled sequences having more than 9 frames.


We train a decision tree [30] (a classifier that is simple to understand and to interpret in terms of AU activation) on the emotion dataset. Two parameters (the minimum number of observations per leaf and the minimum number of observations required for an impure node to be split) are optimized using double person-independent cross-validation. Given a particular fold, none of the test samples is used for parameter optimization. The test accuracy of the decision tree for the emotion classification task is 89.9%; its confusion matrix is shown in Table 3.

         AN      CO      DI      FE      HA      SA      SU      rec
AN       40       1       3       0       0       1       0    88.88%
CO        0      15       0       0       0       1       0    93.75%
DI        7       1      46       2       0       0       0    82.14%
FE        0       0       1      16       2       4       1    61.54%
HA        0       2       0       1      66       0       0    95.65%
SA        2       0       0       4       0      22       0    78.57%
SU        0       2       0       0       0       0      76    97.44%
pre  81.63%  71.43%  92.00%  69.57%  97.06%  78.57%  98.70%    acc: 89.9%

Table 3. Confusion matrix for the test samples. Precision (pre) and recall (rec) for each class and global accuracy (acc) are also shown. AN=anger, CO=contempt, DI=disgust, FE=fear, HA=happiness, SA=sadness, SU=surprise.
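A minimal sketch of this last stage using scikit-learn, assuming the 24 AU classifier outputs have already been collected into a {1, −1} feature matrix. The two tree parameters mentioned above map naturally to min_samples_leaf and min_samples_split; the parameter grid is illustrative, and the inner folds of the double cross-validation are not subject-grouped here for brevity.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def emotion_tree_accuracy(au_outputs, emotions, subject_ids):
    """Evaluate a decision tree that maps AU activations to emotions.

    au_outputs  : (n_sequences, 24) matrix with values in {1, -1}.
    emotions    : emotion label per sequence (7 basic emotions).
    subject_ids : subject identifier per sequence for person-independent folds.
    """
    grid = {"min_samples_leaf": [1, 2, 5, 10],     # minimum observations per leaf
            "min_samples_split": [2, 5, 10, 20]}   # minimum observations to split a node
    accs = []
    for train_idx, test_idx in GroupKFold(n_splits=5).split(
            au_outputs, emotions, groups=subject_ids):
        search = GridSearchCV(DecisionTreeClassifier(), grid, cv=5)
        search.fit(au_outputs[train_idx], emotions[train_idx])
        accs.append(search.score(au_outputs[test_idx], emotions[test_idx]))
    return float(np.mean(accs))
```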

6. Discussion

Classifying emotions based on the output of AU classifiers trained with both appearance-based and geometric-based descriptors improves the accuracy. Performing this task with only one kind of AU classifier drops the overall performance to 84.9% for appearance-based descriptors and 82.8% for geometric-based ones, whereas using both kinds of facial descriptors to model AU activation increases the emotion classification accuracy to 89.9% (see Table 3). Inferring emotions from AU activation does not worsen the performance compared to classifying emotions directly, i.e. without taking AUs into account. For instance, classifying emotions by directly training an SVM on the facial representations from Sections 4.1 and 4.2 leads to similar emotion detection accuracy, but any interpretation of the emotions in terms of AU activation is lost. Given the nature of the emotion classifier (a decision tree), a set of rules can be established for inferring emotions from the activation of a set of AUs, facilitating its psychological interpretation. For clarity and interpretability purposes, Figure 3 shows the emotion classifier decision tree trained with only half of the features, i.e. one per AU instead of two; in particular, for each AU we consider only the output of the corresponding classifier that uses the appearance-based features (see Section 4.2). Let us trace some paths across the emotion classifier decision tree to show that they are coherent in terms of classifying emotions as a combination of AUs:


Figure 3. Emotion classifier decision tree.

· Absence of all AUs implies Contempt, which is the basic emotion closest to a neutral face.
· Outer Brow Raiser together with Mouth Stretch implies Surprise, whereas Outer Brow Raiser together with Lip Stretcher implies Fear; Outer Brow Raiser alone implies Sadness.
· Lip Corner Puller together with Cheek Raiser or Lips Part implies Happiness, while an isolated Lip Corner Puller implies Contempt.
· Nose Wrinkle implies Disgust.
· Inner Brow Raiser together with Lips Part and Upper Lid Raiser implies Fear; the emotion changes to Sadness if Upper Lid Raiser is removed.
· Brow Lowerer implies Sadness if accompanied by Lip Corner Depressor, Fear with Lips Part, and Disgust with Cheek Raiser, whereas an isolated Brow Lowerer implies Anger.

According to the emotion classifier confusion matrix shown in Table 3, fear and sadness are the most difficult emotions to recognize. Intuitively, both are similar in terms of facial expression. In fact, most errors come from classifying fear as sadness and vice versa, and anger as disgust and vice versa, the latter pair also being similar in facial expression shape. Finally, emotion classification is built on top of AU recognition and is therefore conditioned by the accuracy of the AU classifiers (Table 2). Most of the AU classifiers perform well in terms of precision and recall with either of the two facial representations. However, Lip Corner Depressor (AU 15) and Lip Stretcher (AU 20) show poor classification performance. Since both AUs are related to fear and sadness, these two emotions are harder to detect from the AU classifiers' output.

7. Conclusion

In this paper we propose an emotion classification scheme based on facial expression analysis. In particular, we classify emotions from AU activation, i.e. the presence/absence of AUs. Our method consists of two phases: (i) we train predictive models (SVMs) for a representative subset of AUs, separately for geometric-based and appearance-based facial representations; and (ii) we build a decision tree for the emotion classification task based on the AU activations obtained for a given image sequence. From our results we conclude that using a decision tree to classify emotions in terms of AU activation improves the interpretability and clarity of the classifier (see Figure 3), while attaining performance similar to that of direct emotion classifiers, i.e. classifiers that do not use AU activation. Additionally, we showed that combining the geometric-based and appearance-based facial representations increases the overall accuracy of the decision tree for the emotion classification task.

References

[1] J. Zaki, N. Bolger, and K. Ochsner, "It takes two: the interpersonal nature of empathic accuracy," Psychological Science, vol. 19, no. 4, pp. 399-404, 2008.
[2] P. Ekman and W. V. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Palo Alto, CA: Consulting Psychologists Press, 1978.
[3] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, p. 124, 1971.
[4] P. Lucey, J. F. Cohn, I. Matthews, S. Lucey, S. Sridharan, J. Howlett, and K. M. Prkachin, "Automatically detecting pain in video through facial action units," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 3, pp. 664-674, 2011.
[5] M. S. Bartlett, G. Littlewort, I. Fasel, and J. R. Movellan, "Real time face detection and facial expression recognition: Development and applications to human computer interaction," in Computer Vision and Pattern Recognition Workshop (CVPRW'03), vol. 5, pp. 53-53, IEEE, 2003.
[6] Z. Li, J.-i. Imai, and M. Kaneko, "Facial-component-based bag of words and PHOG descriptor for facial expression recognition," in IEEE International Conference on Systems, Man and Cybernetics (SMC 2009), pp. 1353-1358, IEEE, 2009.
[7] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886-893, IEEE, 2005.
[8] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: Application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.
[9] Y.-l. Tian, T. Kanade, and J. F. Cohn, "Recognizing action units for facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, 2001.
[10] M. Valstar and M. Pantic, "Fully automatic facial action unit detection and temporal analysis," in Computer Vision and Pattern Recognition Workshop (CVPRW'06), pp. 149-149, IEEE, 2006.
[11] M. F. Valstar, I. Patras, and M. Pantic, "Facial action unit detection using probabilistic actively learned support vector machines on tracked facial point data," in Computer Vision and Pattern Recognition Workshops (CVPR Workshops 2005), pp. 76-76, IEEE, 2005.
[12] B. Jiang, M. F. Valstar, and M. Pantic, "Action unit detection using sparse appearance descriptors in space-time video volumes," in IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011), pp. 314-321, IEEE, 2011.
[13] G. Zhao and M. Pietikainen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915-928, 2007.
[14] Y.-l. Tian, T. Kanade, and J. F. Cohn, "Evaluation of Gabor-wavelet-based facial action unit recognition in image sequences of increasing complexity," in Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FG 2002), pp. 229-234, IEEE, 2002.
[15] T. R. Almaev and M. F. Valstar, "Local Gabor binary patterns from three orthogonal planes for automatic facial expression recognition," in Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII 2013), pp. 356-361, IEEE, 2013.
[16] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in IEEE International Conference on Multimedia and Expo (ICME 2005), 5 pp., IEEE, 2005.
[17] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron," in Third IEEE International Conference on Automatic Face and Gesture Recognition (FG 1998), pp. 454-459, IEEE, 1998.
[18] L. A. Jeni, D. Takacs, and A. Lorincz, "High quality facial expression recognition in video streams using shape related information only," in IEEE International Conference on Computer Vision Workshops (ICCV Workshops 2011), pp. 2168-2174, IEEE, 2011.
[19] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 172-187, 2007.
[20] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: A comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803-816, 2009.
[21] C. Shan, S. Gong, and P. W. McOwan, "Robust facial expression recognition using local binary patterns," in IEEE International Conference on Image Processing (ICIP 2005), vol. 2, pp. II-370, IEEE, 2005.
[22] M. Pantic and L. J. Rothkrantz, "Expert system for automatic analysis of facial expressions," Image and Vision Computing, vol. 18, no. 11, pp. 881-905, 2000.
[23] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The Extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2010), pp. 94-101, IEEE, 2010.
[24] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 28-43, 2012.
[25] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.
[26] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: Application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.
[27] V. Vapnik, The Nature of Statistical Learning Theory. Springer, 2000.
[28] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[29] J. Friedman, T. Hastie, and R. Tibshirani, "Additive logistic regression: a statistical view of boosting," 1998.
[30] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees. CRC Press, 1984.

Decision Support Systems


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-67


Intelligent System for Optimisation in Golf Course Maintenance

Gerard PONS a,1, Marc COMPTA a, Xavier BERJAGA a, Filippo LULLI b, and Josep Maria LOPEZ a
a EDMA Innova, Vulpellac, Girona (Spain), {gerard.pons, marc.compta, xavier.berjaga, josepmaria.lopez}@edma.cat
b Turf Europe Srl, Pisa (Italy), fi[email protected]

Abstract. The use of conventional water resources for golf course activities is increasingly contested. As a result, in recent years there has been a considerable demand for golf courses to adopt environmentally sustainable strategies. Moreover, other resources used in turfgrass treatment have important economic costs. Thus, a major optimisation of resources and time is required, which the WATERGOLF project addresses through different modules. This paper presents an intelligent system to optimise water consumption in golf courses and to improve their management. The system is designed to suggest corrective actions to the greenkeeper through an expert system, as well as to trigger alarms in risky situations, in four different modules: irrigation, weeds, diseases, and fertility. These suggestions are based on expert knowledge acquired from turfgrass experts and on a set of measurements gathered by advanced sensors placed in the golf course. Although the intelligent system is currently under development, it is expected that a further evaluation in real conditions will help golf facilities to reduce their maintenance costs and environmental impact.

Keywords. Intelligent Systems, Rule Engine, Golf Course, Image Processing, Decision Support Systems

1. Introduction

Recreational water takes up a growing percentage of total water use and is mostly tied to reservoirs. When a reservoir has a water level higher than the one designated for consumption, the excess water can be categorised as being for recreational usage. A report from the European Environment Agency (EEA) revealed that Europe has so far concentrated on increasing the supply of water rather than exploring ways to limit its demand. Golf courses are often targeted as using excessive amounts of water, especially in drier regions, where some governments have even labelled golf course usage as agricultural in order to deflect environmentalists' charges of wasting water [1]. This is why the sector urgently needs to find ways of optimising its water consumption by increasing efficiency and using innovative technologies.

1 Corresponding Author: Gerard Pons, EDMA Innova, Vulpellac, Spain; E-mail: [email protected].


This work deals with the optimisation of the water consumption of golf courses, but also with the enhancement of their maintenance and management, by providing an intelligent system to prevent and manage turfgrass diseases, the growth of weeds and fertility problems. The work presented in this paper is framed within the European WATERGOLF project [2]. The objective of the WATERGOLF project is to develop a system based on a wireless sensor network and an embedded Artificial Intelligence (AI) system that provides support to the irrigation processes on golf courses. Based on the estimations provided by the expert partners of the project, the expected impact is a saving of approximately one third of the current annual water usage. The system integrates both underground sensors, capable of measuring soil humidity, salinity, temperature, pH and nitrate concentration levels, and surface sensors, capable of measuring weather conditions and processing turfgrass images. All the measurements are transmitted wirelessly by low-consumption battery-powered sensors. The AI-driven software suggests the irrigation parameters for the different regions of the course and informs the greenkeeper of any existing or potentially looming turfgrass diseases, along with possible treatments when required. The aim of this paper is to present the architecture of the intelligent system developed for the optimisation of golf course maintenance. The system is designed to suggest recommendations to the greenkeeper of the golf course, as well as to dispatch warnings or alarms in risky situations, using measurements from both underground and surface sensors. The paper is organised as follows: the next section provides a brief overview of the related work (Section 2). Section 3 overviews the general architecture of the WATERGOLF system. Section 4 outlines the intelligent system and, finally, the last section (Section 5) provides concluding remarks and the future work planned to be carried out.

2. Related Work

Artificial intelligence has been applied to irrigation control systems to optimise the use of water and fertilisers, among other resources. Mostly applied in precision farming [3,4], different technologies have been used, such as neural networks [5,6], fuzzy-based systems [7,8], multi-agent-based systems [9,10], or expert systems [11,12]. The aim of these works is the optimisation of irrigation systems given the soil type, crop nature and other parameters. These technologies have been used for irrigation planning and control and have been widely proven effective in this field. In the specific case of turfgrass crops, only a few works using artificial intelligence for their management can be found [13,14]. Nevertheless, current technology can provide solutions to the diagnosis and treatment of turfgrass problems: several systems such as TORO [15], Bailoy [16], UgMO [17], ETS-Controls [18] or Rain Bird [19] allow computerised control of golf courses, but they do not use intelligent systems. Regarding similar European projects, only the WATER-BEE project [20] is related to this work. The WATER-BEE project developed a prototype of an intelligent, flexible, easy-to-use yet accurate irrigation scheduling system for intensive farming at an affordable cost, taking advantage of recent technological advances in wireless networking (ZigBee), environmental sensors and crop modelling. Soil sensors were placed at different depths to measure the water content at different root levels.


This data was then combined with other appropriate environmental data, such as historical and forecast meteorological information. Intelligent software processed the information and then controlled irrigation events to precisely match the needs of the crop. Moreover, the data collected was sent wirelessly to a local computer where decisions were made. WATER-BEE only considered the irrigation process, while the WATERGOLF project extends its scope to the intelligent handling of weeds, fungi and fertility risk situations for the optimisation of golf course management. After reviewing the current works related to this paper, we can conclude that none of them provides advanced sensing and artificial intelligence for optimisation and decision support in the field of golf course management.

3. System Overview

A description of the architecture and components of the system is given in this section. The general schema (Figure 1) is composed of two main components: the central server and the client software. The central server is deployed as an API which is accessible through a web-service interface, allowing other systems to monitor a golf course. Further integration, such as direct cooperation with a third-party irrigation system, is also envisaged. The client software is divided into the following subsystems: i) the WATERGOLF main dashboard, which can be used as a desktop application to manage the entire system; ii) the server, which is in charge of the management and processing of all data fetched from the golf course and of dealing with external requests; iii) the mobile-device application, in charge of managing remote notifications; and iv) the consultancy system, which is a tool for the expert consultant to remotely advise the greenkeeper. Note that a complex network of advanced sensors is also used in the system. The sensors used are underground sensors, which gather measurements of soil temperature, salinity, humidity, nitrate and pH; evapotranspiration sensors, which determine the evapotranspiration of the turfgrass; weather stations, which measure air temperature, precipitation, wind direction and wind speed; and optical sensors, which take periodic photos of the turfgrass that are processed to detect stressed zones or diseases. All devices communicate with one another through a wireless network, and the gateway gathers all the required measurements. Later on, the WATERGOLF back-end retrieves this data from the gateway.

3.1. User interfaces

The system provides the following user interfaces (front-end):

Desktop application. This is the default client software used by the greenkeeper in a standard stand-alone deployment. It allows the management of the entire system through web-method calls to the server API. Suggestions and irrigation recommendations are shown on the main screen, while all complex behaviours are only revealed to an advanced, password-protected user. It is developed using a 2-tier architecture: the user interface layer (for user interactions) and the middleware layer (for communications with the back-end). It is also possible to use it away from the golf course (at home, while commuting, etc.).

3. System Overview A description of the architecture and components for the system is given in this section. The general schema (Figure 1) is composed of two main components: the central server and the client software. The central server is deployed as an API which is accessible through a web-service interface, allowing other systems to monitor a golf course. Further integration such as direct cooperation with a third-party irrigation system is also envisaged. The client software is divided into the following subsystems: i) the WATERGOLF main dashboard, which can be used as a desktop application to manage the entire system; ii) the server, which is in charge of the management and processing of all data fetched from the golf course and to deal with external requests; iii) the mobile-device application, in charge of managing remote notifications; and iv) the consultancy system, which is a tool for the expert consultant to remotely advise the greenkeeper. Note that a complex network of advanced sensors is also used in the system. The sensors used are underground sensors, which gather measurements of soil temperature, salinity, humidity, nitrate and pH; evapotranspiration sensors, which determine the evapotranspiration measurements of the turfgrass; weather stations, which measure air temperature, precipitation, wind direction and wind speed; and optical sensors, which take periodical photos of the turfgrass that are processed to detect stressed zones or diseases. All devices communicate one another through a wireless network, and then the gateway gathers all the required measurements. Later on, the WATERGOLF back-end retrieves this data from the gateway. 3.1. User interfaces The system provides the following user interfaces (front-end): Desktop application. This is the default client software that is used by the greenkeeper on a standard stand-alone deployment. It allows the management of the entire system through web-method calls to the server API. Suggestions and irrigation recommendations are shown in the main screen, while all complex behaviors are only revealed to an advanced and password-protected user. It is developed using a 2tier architecture: the user interface layer (for user interactions) and the middleware layer (for communications with the back-end). It is also possible to use it abroad the golf course (at home, while commuting, etc.).


Figure 1. System overview.

Mobile application. It is based on the desktop application but does not replicate all its functionalities, as it is a simplified version with read-only access to current notifications and data.

Consultancy application. WATERGOLF provides adequate recommendations, but complex and dangerous situations may appear that require the participation of a turfgrass expert. In this case, the greenkeeper can consult a turfgrass expert. This subsystem provides the turfgrass expert with the appropriate interfaces to access relevant data, communicate directly with the greenkeeper, and maintain the system remotely using an expert-system rule editor. An advanced indexing service is also provided for accessing documentary data. The service consists of a repository of relevant documents (informative, specialised, technical, scientific papers, etc.) that the consultant is able to manage.

3.2. Back-end application

The back-end is the kernel of the system and is implemented as a dynamic link library (.DLL). The library exports its classes and methods as web services. The web-service interface has been developed using Windows Communication Foundation (WCF) [22]. It can be exposed using the following technologies:

• RESTful. It accepts JSON and XML as input encodings and uses JSON as the output encoding (also XML-ready). A broad range of languages can use this transport method, so it provides a high integration capacity with third-party systems.


• SOAP. Uses the basic HTTP binding (neither authentication nor encryption is needed, since these are provided by the SSL/TLS security layer). MEX and WSDL are used for defining the interface. It is a well-established industry standard for business interoperability.

• Net.TCP. Uses the WCF custom binding (with binary encoding) for high performance. This is useful when both sides are implemented in .NET.

3.3. Golf course representation

The golf field in the WATERGOLF project is represented as a series of holes (commonly 18), and each hole can be divided into different regions. Ideally, each Region should be monitored using the sensors required by each of the expert systems (described in Section 4). However, this is economically unaffordable for most golf facilities. Hence, only some specific regions are monitored; these are named Regions of Interest (ROIs) in the system. The Regions (without sensors) are linked to the ROIs (with sensors) and use the measurements of the ROI to perform the necessary calculations. Therefore, each normal Region is connected to the ROI with the most similar features, and the measurements and calculations performed on the ROI are also applicable to the linked Region (taking into account an offset value for deviations, provided by the corresponding expert system).
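A minimal sketch of how a Region without sensors could reuse the measurements of its linked ROI, assuming a simple additive offset per variable; the class and field names are illustrative, since the paper does not specify the data model.

```python
from dataclasses import dataclass, field

@dataclass
class ROI:
    """Region of Interest: a monitored region with its own sensor readings."""
    name: str
    measurements: dict = field(default_factory=dict)   # e.g. {"soil_humidity": 23.5}

@dataclass
class Region:
    """Unmonitored region, linked to the most similar ROI."""
    name: str
    roi: ROI
    offsets: dict = field(default_factory=dict)        # per-variable correction

    def estimate(self, variable):
        """Estimate a variable from the linked ROI plus the expert-system offset."""
        return self.roi.measurements[variable] + self.offsets.get(variable, 0.0)

# Usage: the fairway of hole 3 reuses the readings of the monitored green of hole 3.
roi = ROI("hole3-green", {"soil_humidity": 23.5})
region = Region("hole3-fairway", roi, offsets={"soil_humidity": -1.2})
print(region.estimate("soil_humidity"))   # 22.3
```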

4. Intelligent System

This section describes how the intelligent system works. Figure 2 shows a general schema of the intelligent system. Firstly, a gateway installed within the golf course gathers all measurements from the sensors deployed in the facilities. Then, the expert system hosted in the API retrieves this information and stores it in a database. The expert knowledge from turfgrass experts is embedded in the API and uses the sensor measurements and other parameters provided by the greenkeeper to calculate the data used to trigger alarms and provide suggestions. The intelligent system is divided into four different expert systems, one for each type of treatment (irrigation, weeds, fungi and fertility), which are presented individually in the next subsections. The final subsection describes complementary information acquired from the optical sensors. The main function of each expert system is to extrapolate the suggestions for the different ROIs in the golf field to the Regions (without sensors) that use them as a reference. For each expert system, the relationship between a ROI and the Regions that use it as a reference (a correction factor or offset) is established by means of statistical models, so as to avoid unnecessary annoyances to the greenkeeper. Since each expert system uses different inputs, each one has its own offset, which corrects the amount of water to reintegrate and the time at which to do it (irrigation), the probability of appearance of certain hazardous weed species and fungi (weeds and fungi expert systems), or any issue with the nitrate levels of the golf field (fertility expert system). Additionally, each expert system is complemented with a rule engine to manage its alarms and suggestions (e.g. those related to irrigation). The rule engine is built using the Windows Workflow Foundation (WWF), a framework that implements a full-featured forward-chaining rules engine [23]. An easy and intuitive rule editor is available to the user (the greenkeeper) for this purpose.


Figure 2. Intelligent system overview.

The user is able to specify rules, e.g. "IF Irrigation Time is less than 5 hours THEN send Alarm", remotely and independently of the API implementation.

4.1. Irrigation Expert System

Intelligent irrigation management is a key factor in optimising the maintenance of a golf facility. The irrigation expert system gathers measurements of the evapotranspiration and the soil water content, and complements them with information about the soil characteristics (provided by the greenkeeper). On the one hand, the soil can be composed of sand, loam, clay, silt or their respective combinations, which has a direct relationship with the underground sensor measurements. On the other hand, the turfgrass type affects the evapotranspiration, which is the key parameter in determining the amount of water to reintegrate. Considering that the precipitation forecast affects the amount of water that has to be reintegrated, the irrigation expert system needs an accurate and trustworthy forecast source to avoid any issue for the golf field. In order to guarantee this, the adapted Schillo trust model [24,25] in (1) is used to determine the most accurate weather forecast source, using the information provided by the weather station installed in the golf field:

$T(f) = \frac{\sum_{i=0}^{n} \left( 1 - \frac{|R_i - P_i|}{\max(R_i, P_i)} \right)}{n}$  (1)

where $T(f)$ is the trust in the weather forecast source $f$; $n$ is the number of samples and forecasts; $R_i$ is the precipitation measured by the weather station; and $P_i$ is the precipitation forecast.
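A minimal sketch of this trust computation, assuming paired lists of observed and forecast precipitation; the guard for the case where both values are zero is an assumption, since Eq. (1) is undefined there.

```python
def forecast_trust(observed, forecast):
    """Trust in a forecast source following Eq. (1): mean of 1 - |R - P| / max(R, P).

    observed : precipitation values R_i measured by the weather station.
    forecast : precipitation values P_i predicted by the source for the same days.
    """
    scores = []
    for r, p in zip(observed, forecast):
        denom = max(r, p)
        # Assumed convention: a day with neither observed nor forecast rain
        # counts as a perfect prediction (score 1).
        scores.append(1.0 if denom == 0 else 1.0 - abs(r - p) / denom)
    return sum(scores) / len(scores) if scores else 0.0

# Usage: pick the most trusted of several forecast providers (names illustrative).
sources = {"provider_a": [0.0, 4.5, 1.0], "provider_b": [2.0, 3.0, 0.0]}
station = [0.0, 5.0, 0.5]
best = max(sources, key=lambda s: forecast_trust(station, sources[s]))
```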


4.2. Weeds Expert System

The control of weeds plays a large role in golf course turfgrass management, both in financial terms and from the point of view of environmental viability. There is a substantial push towards the progressive reduction (or outright elimination) of chemical weed control products on golf courses, but currently the main means of weed control remains chemical. Hence, the early detection of weed germination would allow greenkeepers to take preventive actions before the weeds appear, thus avoiding the use of chemical products. Seven of the most common golf course weeds (crabgrass, foxtail, goosegrass, white clover, daisy, dandelion, annual meadowgrass/bluegrass) are considered within the WATERGOLF project, for which their Growing Degree Days (GDD, a measurement of the growth and development of plants and insects during the growing season) are computed. Based on the difference between the average air/soil temperature and the weed basal temperature (unique to each weed type), accumulated only when the average temperature is above the basal temperature, different thresholds are defined to determine the probability of each weed appearing. More specifically, three alarm levels have been defined: no alarm (the probability of a certain weed is close to 0%); low probability alarm (there is some evidence indicating that a weed may grow, but the system cannot be sure, so a warning is launched); and high probability alarm (there is firm evidence of weed germination, so the greenkeeper is prompted with an alarm to take actions to eradicate it). As a final note, WATERGOLF does not indicate which chemical products to use for weed eradication, in order to avoid any potential misuse or error. These thresholds can be modified by the greenkeeper using a rule engine editor similar to the one used for irrigation. Remember that the expert system is responsible for extrapolating the information of the ROIs to their associated Regions in terms of alarms, while the rule engine determines the activation of the alarms in the different ROIs.

4.3. Fungi Expert System

The control of turfgrass fungal diseases has a financial and environmental impact on golf courses similar to that of weed management. The following common fungal diseases were selected and researched for etiology parameters: Pythium spp. (Pythium blight), Rhizoctonia cerealis (Yellow patch), Sclerotinia homoeocarpa (Dollar spot), Bipolaris spp. (Anthracnose 1), Drechslera spp. (Anthracnose 2), Microdochium nivale (Pink snow mold), and Laetisaria fuciformis (Red thread). Soil and air temperature, pH, air relative humidity and leaf wetness were used to build the statistical models that determine the probability of contracting the different diseases. Based on the different probability ranges, three alarm levels have been defined by the Fungi Rule Engine to indicate the likelihood of each fungal disease appearing: no alarm (no evidence was found that a certain fungal disease has appeared); low probability alarm (there is some evidence that a certain fungal disease has appeared, but the system cannot be sure, so the greenkeeper is warned about the situation); and high probability alarm (the system is certain that a fungal disease has appeared and prompts the greenkeeper to take corrective actions to eradicate it). This expert system is responsible for indicating the appropriate dosage and chemical product to eliminate the fungal disease, as well as for extrapolating the information to the associated Regions of each ROI.
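A minimal sketch of the three-level alarm scheme shared by the weeds and fungi expert systems (Sections 4.2 and 4.3), using GDD accumulation for a weed as the example input; the basal temperature and the two thresholds are illustrative values, not the project's calibrated parameters.

```python
def growing_degree_days(avg_temps, basal_temp):
    """Accumulate daily (average temperature - basal temperature), counting
    only the days whose average temperature is above the basal temperature."""
    return sum(t - basal_temp for t in avg_temps if t > basal_temp)

def alarm_level(value, warn_threshold, alarm_threshold):
    """Map an accumulated indicator (GDD or disease probability) to an alarm level."""
    if value >= alarm_threshold:
        return "high probability alarm"   # firm evidence: prompt corrective action
    if value >= warn_threshold:
        return "low probability alarm"    # some evidence: warn the greenkeeper
    return "no alarm"

# Usage with illustrative numbers: a weed with an assumed basal temperature
# of 12 degrees C and assumed GDD thresholds of 50 and 100.
daily_avg_temps = [10.0, 14.5, 16.0, 18.2, 20.1, 19.4]
gdd = growing_degree_days(daily_avg_temps, basal_temp=12.0)
print(alarm_level(gdd, warn_threshold=50.0, alarm_threshold=100.0))
```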


4.4. Fertility Expert System

The last expert system within the WATERGOLF project is related to the fertility status of the golf course. Large sums of money are spent yearly on fertilisation, and man-hours and fuel are used to control the vegetative peaks usually associated with excessive fertilisation or incorrect timing or dosage. Furthermore, excessive or insufficient fertilisation can cause the onset of fungal diseases or the emergence of weeds, thus generating the need for more financial expenditure and a higher environmental impact of the golf course. Soil nitrates measured by a specific sensor are used as the main input, and the output is a discrete soil fertility index. The fertility status of the golf course is measured through the nitrogen level registered in different zones of the facilities. If a critical value below the expected range of nitrates in the soil is registered (the normal range is between 10 and 30 ppm of nitrate ions (NO3−)), an alarm indicating a poor fertilisation state is launched; on the other hand, if the value largely surpasses this normal range, an over-fertilisation alarm is launched. In both situations, the greenkeeper has to take corrective actions to normalise the nitrate levels in the soil. In this case, the expert system only has to extrapolate the results from the ROIs to their associated Regions.

4.5. NASTEK lens image processing

The system also uses images captured by a purpose-built optical sensor. This sensor combines a normal camera and a camera with a NASTEK lens to analyse the turfgrass. NASTEK lenses block out the green colours reflected by the chlorophyll found in healthy vegetation, highlighting stressed zones or zones with any type of disease before the naked eye is able to detect them. Figure 3 shows an example in which a spot indicating a turfgrass disease in its initial stage can be detected in the NASTEK image but not in the normal image. The images captured by the optical sensors are processed using computer vision techniques. Visible indicators are then searched for to determine whether the turfgrass is stressed, and hence needs water to be reintegrated in the zone, or whether there are any signs of turfgrass disease. This analysis sends alarms directly to the greenkeeper, independently of the previously described expert systems, and determines the best action to perform. Several initial experiments have been carried out in the facilities of Golf Platja de Pals, where a set of images of turfgrass under different conditions was studied and processed.
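The paper does not detail the image-processing pipeline, so the following is only a hypothetical illustration of flagging low-chlorophyll (stressed) zones with a simple excess-green vegetation index and a block-wise threshold; the index choice, block size and threshold are all assumptions, not the WATERGOLF method.

```python
import numpy as np

def stressed_blocks(rgb, block=32, threshold=0.05):
    """Flag image blocks whose average excess-green index falls below a threshold.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    Returns a boolean grid where True marks a potentially stressed block.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    exg = 2.0 * g - r - b                      # excess-green vegetation index
    h, w = exg.shape
    rows, cols = h // block, w // block
    flags = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            patch = exg[i * block:(i + 1) * block, j * block:(j + 1) * block]
            flags[i, j] = patch.mean() < threshold
    return flags
```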

5. Conclusions and Future Work

This paper presented the intelligent system of the WATERGOLF project, which is focused on optimising the irrigation of golf courses while also considering their status in terms of fertility, weeds and fungal disease. To do so, it uses several specialised expert systems (irrigation, weeds, fungi and fertility), together with an image-processing tool, to advise the greenkeeper on the actions to take in order to keep the facilities in an optimal state, which in the end reduces maintenance costs and environmental impact.


Figure 3. Optical sensor images: (a) a picture of a golf green captured with the normal camera and (b) the same view captured with the NASTEK lens camera. The rectangle highlights a zone with spots that indicate an initial-stage turfgrass disease not visible to the naked eye.

Future work related to the WATERGOLF project is directed towards a real implementation in golf courses. The different expert systems described in this paper will be tested in several golf course facilities and, based on the results and the experiences of the users, improvements to the final system will be introduced.


Acknowledgements

The work described in this paper was carried out as part of the WATERGOLF project, and has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement number 315054.

References

[1] R. L. Green. Trends in golf course water use and regulation in California, 2005. http://ucrturf.ucr.edu/topics/trends_in_golf_course_water_use.pdf.
[2] WATERGOLF project. http://watergolf-project.com/
[3] Z. Yao, G. Lou, X. Zeng, Q. Zhao. Research and development precision irrigation control system in agricultural, International Conference on Computer and Communication Technologies in Agriculture Engineering (CCTAE), 3, 117-120, 2010.
[4] S. Zhang, H. Zhou. An Improvement of ZigBee Tree Routing in the Application of Intelligent Irrigation, International Conference on Intelligent System Design and Engineering Application (ISDEA), 1, 255-260, 2010.
[5] S. Muhammad Umair, R. Usman. Automation of Irrigation System Using ANN based Controller, International Journal of Electrical and Computer Sciences IJECS-IJENS, 10(2), 2010.
[6] F. Capraro, D. Patiño, S. Tosetti, C. Schugurensky. Neural Network-Based Irrigation Control for Precision Agriculture, IEEE International Conference on Networking, Sensing and Control, 357-362, 2008.
[7] P. Javadi Kia, A. Tabatabaee Far, M. Omid, R. Alimardani and L. Naderloo. Intelligent Control Based Fuzzy Logic for Automation of Greenhouse Irrigation System and Evaluation in Relation to Conventional Systems, World Applied Sciences Journal, 6(1), 16-23, 2009.
[8] X. Peng, G. Liu. Intelligent Water-Saving Irrigation System Based on Fuzzy Control and Wireless Sensor Network, Fourth International Conference on Digital Home (ICDH), 252-256, 2012.
[9] S. Feuillette, F. Bousquet, P. Le Goulven. SINUSE: a multi-agent model to negotiate water demand management on a free access water table, Environmental Modelling and Software, 18(5), 413-427, 2003.
[10] D. Smith, W. Peng. Machine learning approaches for soil classification in a multi-agent deficit irrigation control system, IEEE International Conference on Industrial Technology, 1-6, 2009.
[11] R. M. Faye, F. Mora-Camino, S. Sawadogo, and A. Niang. An Intelligent Decision Support System for Irrigation System Management, IEEE International Conference on Systems, Man, and Cybernetics, 4, 3908-3913, 1998.
[12] S. N. Islam. ShellAg: Expert System Shell for Agricultural Crops, International Conference on Cloud and Ubiquitous Computing and Emerging Technologies (CUBE), 83-86, 2013.
[13] M. Yusoff, S. Mutalib, S. Abdul-Rahman, A. Mohamed. Intelligent Water Dispersal Controller: Comparison between Mamdani and Sugeno Approaches, International Conference on Computational Science and its Applications, 86-96, 2007.
[14] J. F. Vinsonhaler, P. G. Johnson. TurfDoctor: A web-based expert system for turfgrass problem diagnosis and treatment, International Turfgrass Society Research Journal, 9, 115-119, 2001.
[15] TORO. http://www.toro.com/
[16] Bailoy. http://www.bailoy.com/
[17] UgMO. http://www.ugmo.com/
[18] ETS-Controls. http://www.ets-controls.co.uk/
[19] Rain Bird. http://www.rainbird.com/
[20] WATER-BEE project. http://waterbee.iris.cat/
[21] M. Salgot, G. K. Priestley, M. Folch. Golf Course Irrigation with Reclaimed Water in the Mediterranean: A Risk Management Matter, Water, 4, 389-429, 2012.
[22] What Is Windows Communication Foundation. http://msdn.microsoft.com/en-us/library/ms731082%28v=vs.100%29.aspx
[23] Windows Workflow Foundation. http://msdn.microsoft.com/en-us/library/aa480193.aspx
[24] M. Schillo, P. Funk, M. Rovatsos. Who can you trust: dealing with deception, in Proceedings of the Second Workshop on Deception, Fraud and Trust in Agent Societies, Seattle, 95-106, 1999.
[25] M. Schillo, P. Funk, M. Rovatsos. Using trust for detecting deceitful agents in artificial societies, Applied Artificial Intelligence (Special Issue on Trust, Deception and Fraud in Agent Societies), 2000.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-77


A hierarchical decision support system to evaluate the effects of climate change in water supply in a Mediterranean river basin

Tzu Chi CHAO a, Luis DEL VASTO-TERRIENTES a,1, Aida VALLS a, Vikas KUMAR b and Marta SCHUHMACHER b
a ITAKA research group, Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Catalans, 26, 43007 Tarragona (Spain)
b AGA research group, Departament d'Enginyeria Química, Universitat Rovira i Virgili, Av. Països Catalans, 26, 43007 Tarragona (Spain)

Abstract. This paper presents a study of the applicability and suitability of ELECTRE-III-H for evaluating sectorial water allocation policies for a water-stressed Mediterranean area of Northeastern Spain (Tarragona). The proposed method is based on the outranking model for decision support, which constructs a partial preorder from pairwise preference relations among all the possible actions. This work addresses the multi-criteria water management problem by considering several water allocation strategies to mitigate the water scarcity induced by climate change. We compare several adaptation measures, including alternative water sources, inter-basin water transfer and sectorial demand management for the industrial, agricultural and domestic sectors.

Keywords. Decision support systems, preference relations, environmental modelling, water allocation problem.

Introduction

The necessity of developing Intelligent Environmental Decision Support Systems (IEDSS) is well recognized in the literature [11]. Several inherent difficulties appear in this domain, such as the uncertainty of the data intrinsic to some environmental modelling techniques, the presence of spatial relationships between the areas studied, or the temporal relationships between the current state and the past states of the environmental system, all of which must be considered in knowledge discovery and planning processes [6]. In addition to these particular characteristics of the data, embracing a global perspective in environmental decision making implies accepting that multiple, usually conflicting criteria must be taken into account. Decisions in environmental problems usually deal with a set of diverse indicators measured on different scales and with different levels of uncertainty. Therefore, the development of an IEDSS must consider the analysis of complex data. In this case, summarizing the multiple criteria into a single perspective that encompasses all of them is difficult and ineffective [11].

1 Corresponding Author.


A new generation of methods based on building outranking relations by pairwise comparison has been attracting attention in the environmental domain [1, 3, 7, 9]. The outranking approach has several advantages, such as the management of heterogeneous scales of measurement without requiring any artificial transformation and the management of uncertainty by means of comparisons at a qualitative level in terms of the decision maker's preferences. Moreover, the result given to the decision maker is not a numerical score but a model of the preference relations between the different objects studied (i.e., alternatives, actions) that takes into account all the data in an integrated way. In particular, this paper focuses on a method called ELECTRE-III-H, which has been especially designed for complex systems where the criteria are organized into a hierarchy. Each of the intermediate nodes of the hierarchy accounts for a subset of the overall decision problem (e.g. economic costs, social impacts, health risks, etc.). With this method, the decision maker (DM) obtains a ranking of the different alternatives at each node of the hierarchy. As said above, due to the special nature of the environmental field, this kind of structure usually appears when designing an IEDSS. This paper presents a study of the applicability and suitability of ELECTRE-III-H for evaluating several sectorial water allocation policies in Tarragona, an industrial and agricultural city in Spain. Changes in water supply conditions for economic activities and environmental uses are likely to be affected by climate change, including an altered frequency of extreme events such as droughts [12]. The lack of additional water resources to fulfil the increasing water demand, coupled with increased awareness of environmental issues and of the adverse effects of climate change, makes decisions on water allocation rather complex. Meeting the new challenges in water resources management implies quantifying the impact of climate change on basin-scale hydrology [12]. The goal of the IEDSS presented in this paper is to rank the different water supply strategies according to several indicators related to cost and environmental impact. The paper is organized as follows. Section 1 presents the ELECTRE-III-H method. Section 2 explains the case study of a water management problem and then presents the proposed model, including the set of actions (based on different strategies) and the criteria hierarchy used for the evaluation of the different actions. In Section 3, the obtained results are discussed. Section 4 gives some conclusions and future work.

1. The ELECTRE-III-H method

ELECTRE is a family of outranking-based methods [5]. Outranking methods have been very successful because of their adaptability to complex problems and because they are based on social choice models that are easy to understand. The aim of an outranking method is to build a binary outranking relation S, meaning "is at least as good as", obtained by pairwise comparison of the alternatives in a set A for each criterion in the family of criteria G. ELECTRE was proposed by Roy [10]. In this method the assessment of the relation S is made on the basis of two socially-inspired rules: the majority opinion and the right of veto. The final aim is to use this outranking relation to establish a realistic representation of four basic situations of preference: indifference, weak preference, strict preference, and incomparability. Several versions of ELECTRE have been defined for different purposes: ELECTRE-Is for the selection of the best alternatives, ELECTRE-II and ELECTRE-III/IV for constructing a ranking, and ELECTRE-TRI for classifying alternatives into predefined categories.


ELECTRE input data consists of a finite set of alternatives (or actions) A = {a1, a2, ..., an} and a finite set of criteria G = {g1, g2, ..., gm}. Each alternative a ∈ A is evaluated on the m criteria gj. A criterion is a tool constructed for evaluating and comparing potential alternatives according to a well-defined point of view of the DM. The preference on each criterion is modeled with respect to three parameters: the indifference (qj), preference (pj) and veto (vj) thresholds, which depend on the evaluations gj(a) and satisfy qj ≤ pj ≤ vj if the criterion is maximized (inversely if minimized). Each criterion may have a different weight wj that represents its relative importance with respect to the others. Weights must be interpreted as the voting power of each criterion, instead of substitution rates as in the case of compensatory aggregation operators.

The classical ELECTRE methods deal with criteria defined on a common level. However, in many real-world problems, criteria are organized hierarchically. In this case, three types of criteria can be distinguished depending on their level of generality: (a) the root criterion, the unique and most general criterion, placed at the top of the tree, which represents the global goal of the decision maker; (b) elementary criteria, placed at the lowest level of the hierarchical tree, each with a unique parent; and (c) intermediate criteria, each corresponding to a group of intermediate or elementary criteria and placed at an intermediate level of the tree, between the root and the elementary criteria.

To construct a ranking of the alternatives at different levels of a hierarchy of criteria, an extension of the classical ELECTRE-III method has been proposed by the authors, called ELECTRE-III-H [4]. This method follows the hierarchical organization of the criteria to aggregate the information at each node of the tree, according to the corresponding sub-criteria. First, the evaluations of the alternatives on the elementary criteria are aggregated by applying the classical ELECTRE-III method separately to each subset with the same ancestor. This results in a partial preorder at each intermediate ancestor. The process continues for the subsets of intermediate nodes that have the same ancestor, which are aggregated by calculating the partial concordance and discordance indices from the partial preorders, as proposed in ELECTRE-III-H. The process continues until the aggregation at the global root criterion, where the alternatives are finally ranked from best to worst. At each non-elementary criterion, the method constructs a partial preorder O that establishes a preference structure on the set of alternatives A. For each possible pair of alternatives, it assigns one of the following four binary relations {P, P−, I, R}, with the following meaning: aPb (a is preferred to b), aP−b (b is preferred to a), aIb (a is indifferent to b), and aRb (a is incomparable to b). The relations of strict preference correspond to situations where there is clear evidence (in the criteria values) in favor of one of the two alternatives (aPb or aP−b); aIb occurs when the two alternatives have equivalent evaluations on all criteria; and aRb corresponds to the situation where some of the criteria are better in a and others are better in b, so that it is not possible to set any preference relation between them.

1.1. The ELECTRE-III-H procedure

The ELECTRE-III-H method defines two steps to build a partial preorder from a set of criteria.

STEP 1. Construction of an outranking relation aSb for each pair of alternatives (a,b) ∈ A × A. The relation is built on the basis of two tests: the concordance test, sometimes referred to as "the respect of the majority", involving the calculation of a concordance index c(a,b) measuring the strength of the coalition of criteria that support the hypothesis "a is at least as good as b"; and the discordance test, sometimes referred to as "the respect of minorities", involving the calculation of discordance indices dj(a,b) measuring the strength of the evidence provided by the j-th criterion against this hypothesis. The overall concordance index is computed for each pair (a,b) as:

$c(a,b) = \frac{\sum_{j=1}^{m} w_j \, c_j(a,b)}{\sum_{j=1}^{m} w_j}$  (1)

The calculation of the partial concordance index cj(a,b) depends on the type of the criterion. For elementary criteria, cj(a,b) is measured by comparing the values of the alternatives on the basic indicators gj (e.g., comparing the price). At intermediate criteria, cj(a,b) is measured in terms of the rank-order value represented in the preference structure of a partial preorder (i.e., in terms of the binary preference relations P, P−, I, R) [4]. At intermediate criteria, the weight wj indicates the relative importance of a criterion with respect to the other descendants of its parent (i.e., adjacent nodes). The computation of the discordance index takes into account the criteria that disagree with the assertion aSb. In this case, each criterion is assigned a veto threshold vj: the maximum difference allowed between the values of a pair of alternatives when gj(a) is worse than gj(b) before the discordance index dj(a,b) reaches its maximum and the assertion aSb is rejected.

STEP 2. Exploitation of the outranking relation S. This procedure follows an iterative distillation algorithm that selects at each step a subset of alternatives, taking into account the credibility values of the outranking relation previously calculated, ρ(aSb). This procedure yields two complete preorders Op and On, called the descending and ascending distillation chains respectively. These two complete preorders are intersected to generate a final partial preorder. For each possible pair of alternatives, it assigns one of the four binary relations {P, P−, I, R}. Although the result of the exploitation is a partial preorder, to facilitate the interpretation or the management of large sets of data it is also possible to build a complete ranking of the alternatives from the partial preorder at each of the non-elementary nodes of the hierarchy.
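A minimal sketch of the concordance test of STEP 1 for elementary criteria, assuming maximized criteria and the usual piecewise-linear partial concordance of ELECTRE III between the indifference and preference thresholds; the data structures are illustrative, and the discordance test and the distillation of STEP 2 are not covered.

```python
def partial_concordance(g_a, g_b, q, p):
    """c_j(a, b) for a maximized elementary criterion: 1 if b does not exceed a
    by more than q, 0 if it exceeds a by at least p, linear in between."""
    diff = g_b - g_a
    if diff <= q:
        return 1.0
    if diff >= p:
        return 0.0
    return (p - diff) / (p - q)

def overall_concordance(a_vals, b_vals, weights, q_thr, p_thr):
    """Weighted concordance index c(a, b) of Eq. (1)."""
    num = sum(w * partial_concordance(ga, gb, q, p)
              for ga, gb, w, q, p in zip(a_vals, b_vals, weights, q_thr, p_thr))
    return num / sum(weights)

# Usage with illustrative numbers: two alternatives evaluated on three criteria.
a = [70.0, 0.30, 12.0]
b = [65.0, 0.35, 15.0]
w = [0.5, 0.3, 0.2]
q = [2.0, 0.02, 1.0]     # indifference thresholds
p = [10.0, 0.10, 5.0]    # preference thresholds
print(overall_concordance(a, b, w, q, p))
```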

2. The case study: Water Management in a Mediterranean River Basin

Water management along the Francolí River and its tributaries is complex because it is a Mediterranean environment and there is a limited supply of water to satisfy the demand of all sectors as well as the environmental needs [2]. The city of Tarragona, located at the mouth of the Francolí River, was solely dependent on its own water resources before


1988. Sea water intrusion in the groundwater aquifers compelled the municipalities to meet the water demand by inter-basin transfer from neighbouring river basins (Ebro River and Gaià River). Moreover, Tarragona is the second largest industrial area of Catalonia (Northeast Spain) and most of the industries in the Francolí River basin are located close to Tarragona, including a large petrochemical industry. Many other small industries are situated in the upper part of the river basin. The agricultural demand varies along the river basin depending on the crop type and cultivated area.

Water allocation describes a process whereby an available water resource is distributed to legitimate claimants and the resulting water rights are granted, transferred, reviewed, and adapted. The allocation of water resources in river basins is one of the most critical issues. A holistic approach to water supply management at the watershed scale considering different criteria would be valuable, where individual water-related sectors, such as agriculture, domestic, and industrial water supply, are considered together to draft possible management strategies.

The main goal of this study is to rank the different water supply strategies for possible demand scenarios. Due to the shortfall in supply from primary water resources, this study is focused on the use of alternative water supply scenarios for the water demand of three major sectors: domestic, industrial and agricultural. For each future scenario of climate change, the goal is to obtain a ranking of a set of possible actions with regard to different types of indicators, such as costs, environmental impact, water stress, etc. A hierarchy of criteria has been defined and is explained below. The case study presented in this paper assumes that the demand in the future is the same as in 2013, and a time span between 2011 and 2040 for supply, with an estimated reduction of 21% in the water yield of the rivers [8].

2.1. The actions

The set of possible actions to analyze is defined by some general rules of water allocation to three sectors (industrial, domestic and agricultural), which change the percentage of water supply coming from the different types of water sources. The primary sources are the 3 rivers. Two alternative resources have been taken into consideration: reclaimed water, which includes the reuse of recycled domestic water in industrial processes or for irrigation in agriculture; and desalination, which is water obtained from processing sea water. The actions can be grouped under 4 main strategies:
A. Nature First: giving priority to water coming from primary sources, especially for the domestic sector.
B. Low use of alternative resources: low desalination for domestic water supply and use of reclaimed water.
C. Medium use of alternative resources: medium desalination for domestic and industrial water supply and use of reclaimed water.
D. High use of alternative resources: high desalination for domestic and industrial water supply as well as high use of reclaimed water.

For each of those 4 strategies, 12 actions have been defined with different percentages of the demanded water obtained from the rivers and from the alternative resources. Thus, we have a total of 48 actions to compare. A threshold on the maximum amount of water that can be extracted from the rivers is considered, in order to maintain a minimal ecological yield in the rivers. The next section details the alternative resources.


2.1.1. Alternative resources

Two alternative water resources are considered: water recycling and desalination. Water recycling is reusing treated wastewater to meet other demands such as agricultural irrigation, industrial demand, or even other urban uses. Water reuse offers environmental benefits, conservation of a precious natural resource and financial savings. In the present case study area, the regional water authority (the Catalan Agency of Water) has a Water Reclamation Project in Tarragona, whose primary aim is to address regional water scarcity by reclaiming water. The current domestic consumption in Tarragona is estimated to be 44 hm³, from which a maximum of 80% recycling efficiency has been assumed. It is reasonable to assume that there will be a gradual increase in water recycling and reuse in the industrial sector. For agricultural usage it has been assumed that a maximum of 30% of the total water demand in agriculture can be met by treated water from the domestic sector. Table 1 shows the considered values of water reuse in the different sectors. The amount of reused water in agriculture is notably smaller since there are several constraints such as geographical distance and distribution cost. Moreover, water reuse for the domestic (drinking water) sector is unviable as the recycling cost is too high. Water desalination is contemplated only for the domestic and industrial sectors since there is no intensive centralized agricultural area in Tarragona where costly desalinated water can be used. Based on experts' recommendations we assume that the domestic desalinated water supply can be up to 25% whereas the maximum for industrial water taken into account is 20%. The minimum percentage of water supply (20%) from the desalination plant is based on the cost viability of the desalination plant.

Table 1. Water reuse scenarios
Reuse scenario   Industry (%)   Agriculture (%)
No reuse         0              0
Low reuse        20             10
Medium reuse     40             20
High reuse       80             30

Table 2. Water desalination scenarios
Desalination scenario   Industry (%)   Domestic (%)
No desalination         0              0
Low                     0              20
Medium                  10             20
High                    20             25

2.2. The hierarchical structure of the IEDSS

Different sets of environmental and economic criteria and different ways of organizing the information have been studied. In this section we present the final criteria used to evaluate the different actions.

Figure 1. Hierarchical structure for the water allocation problem


Since the problem can be viewed from different perspectives and with different levels of generality, it is especially suitable to define a hierarchical structure for the decision support system (Figure 1). This model makes it possible to distinguish sub-goals (i.e., intermediate criteria) at different levels of generality.

Cost of water: Each action has a cost for each demand sector and supply source. This includes the cost of the primary water coming from the rivers Francolí, Ebro and Gaià as well as the cost of the alternative resources, which are different for each sector.

Scarcity index: The impact on the environment has been evaluated using the water supply stress index (WaSSI). This scarcity index is defined as the ratio of the demand to the total supply from primary sources, and it is calculated for each sector s as WaSSI_s = WD_s / WS_s. The WaSSI can be analyzed in comparison to the current scenario by means of the following ratio: WaSSIR_x = (WaSSI_x − WaSSI_c) / WaSSI_c, where x represents an action of a scenario and c refers to the current scenario. A positive value of WaSSIR indicates increased water stress while a negative value indicates reduced water stress when compared to the current water stress conditions.

Ecological impact: The EcoStress is a water use index that represents the percentage of water extracted from a river to fulfill the demand. This index gives an estimation of the ecological stress on the river. It is calculated by summing all water allocations from each river and then dividing by the total annual water flow. This index has to be minimized, i.e., the lower the EcoStress index the better.
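A short sketch of the three indicators as reconstructed above; the relative form of WaSSIR is inferred from the sign interpretation given in the text and should be read as an assumption.

```python
def wassi(demand, primary_supply):
    """Water supply stress index of one sector: demand over supply from primary sources."""
    return demand / primary_supply


def wassir(wassi_action, wassi_current):
    """Change of WaSSI relative to the current scenario.
    Positive values indicate increased stress, negative values reduced stress."""
    return (wassi_action - wassi_current) / wassi_current


def ecostress(river_allocations, total_annual_flow):
    """Fraction of a river's total annual flow extracted to meet the demand."""
    return sum(river_allocations) / total_annual_flow
```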

3. Results and discussion

The results at the different intermediate and global criteria are presented in this section. For each criterion the ELECTRE-III-H parameters have been fixed. For the elementary criteria, all weights are equal because the experts considered that differentiating criterion priorities was not appropriate for this problem: all of them must contribute in the same proportion to the final result. The indifference threshold qj is 0 in all the elementary criteria, because any difference in the values is important to take into account. The preference threshold pj is also 0 because no tolerance to a worse value should be admitted. Finally, the criteria are given a low right of veto, so that they are only able to completely veto the majority opinion if the difference between two values is the maximum permitted in each criterion (vj = max gj − min gj).

The partial preorders obtained at the intermediate level are shown in Figure 2. In the EcoStress partial preorder a large number of indifference relations can be seen. They correspond to cases where the actions have equivalent values on the EcoStress of the 3 rivers. For WaSSIR and Cost, the preorder identifies more strict preference relations than for EcoStress, which leads to a more linear ranking (i.e., fewer rank ties at each criterion, such that WaSSIR has 32 rank positions while the Cost criterion has 30, out of a total of 48 options).

For intermediate criteria, the parameters (Table 3) have been set up according to the partial preorders shown in Figure 2. First, several rank ties occur on the EcoStress criterion as the same rank position is shared by small subsets of alternatives (around 5 or 6), indicating quite similar evaluations of the alternatives in EcoStress. Thus, the preference threshold is pEcoStress = 5 and the veto is vEcoStress = 25 (which is about 5 rank positions). Second, for Cost and WaSSIR, the ranks obtained have fewer rank ties, and


consequently they allow the distinction of a larger set of values in γ. Moreover, the parameters fixed for Cost are stricter than those for the WaSSIR criterion. The reason is that a highly negative comparison in the Cost evaluation should be avoided when establishing the preference relations. For water supply stress, the veto power has been reduced (vWaSSIR = 40), also permitting a more relaxed measurement of concordance (pWaSSIR = 20). In that way, the final decision will be more in accordance with the majority opinion of the environmental criteria, while preventing situations of high cost.

Figure 2. Partial preorders obtained from the elementary criteria at intermediate nodes (panels: Cost criterion, WaSSIR criterion, EcoStress criterion).

Table 3. Parameters at intermediate nodes
                         Cost   WaSSIR   EcoStress
Indifference threshold   0      0        0
Preference threshold     10     20       5
Veto threshold           25     40       25

Table 4. Positions of the 5 best options: B11, B10, B12, C7 and B9
Alternative   Global   Cost   WaSSIR   EcoStress
B11           1        19     11       2
B10           2        17     17       4
B12           3        25     5        1
C7            4        21     16       2
B9            5        11     23       6

After applying ELECTRE-III-H up to the root node, a global ranking is obtained. Due to space limitations, we have selected the 5 best options in the global ranking (Table 4) and the 5 worst options (Table 5), and we show their rank positions in the 3 intermediate nodes as well as in the final global ranking. The first observation is that the best options belong to group B (the strategy consisting of "Low use of alternative resources"). On the other hand, we can also observe that the worst


actions belong to strategies in group D (with "High use of alternative resources": high desalination for domestic and industrial water supply and high use of reclaimed water).

Table 5. Positions of the 5 worst options: A4, D10, D1, D5 and D9
Alternative   Global   Cost   WaSSIR   EcoStress
A4            25       17     24       6
D10           26       27     8        4
D1            27       19     23       6
D5            28       22     19       6
D9            29       24     16       6

In addition, in Table 6 we show some interesting cases that are worth mentioning because they have conflicting evaluations for the different intermediate criteria. Both D8 and D12 are very well ranked in the WaSSIR and EcoStress criteria, but they are placed in the worst positions regarding the cost criterion (as they are the most expensive). On the other hand, B1 and B5 have low costs but have unacceptable values in the environmental criteria. All of them are in an intermediate position in the global ranking. Therefore, not all the strategies in group D or B behave in the same way, because of the different proportion taken from each type of water source; so the final decision must consider the detailed characteristics of water allocation in those strategies.

Table 6. Conflicting actions
Alternative   Cost Rank   WaSSIR Rank   EcoStress Rank   Global Rank
D8            32          2             1                20
D12           33          1             1                19
B1            4           31            9                22
B5            5           27            9                17

After the analysis of the results, the recommended options are B11, B10 and B12. They correspond to the "low use of alternative resources" strategy with the following characteristics: for the industrial sector only 40% should be taken from the rivers and no desalination is recommended; for the domestic sector 80% of the water should come from the rivers and the rest from desalination; for agriculture between 10% and 30% of water reuse is recommended. On the other hand, we must avoid the actions D1, D5 and D9, which correspond to the "high use of alternative resources" strategy, where the agricultural and domestic sectors are 100% and 75% respectively dependent on the primary water sources. Action A4 suggests a low water reuse for industry but then agriculture is based on primary water, which increases the EcoStress and WaSSIR indices. A first sensitivity study shows that the ranking of the best alternatives remains the same when the thresholds at the intermediate level are changed. The threshold values have been varied from highly strict to highly tolerant, always obtaining a high correlation (between 0.81 and 0.96), from which we conclude that the global ranking is consistent.
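The correlation coefficient used in this sensitivity check is not specified in the text; as an illustrative sketch, a rank correlation such as Kendall's tau between the global rankings obtained under two threshold settings (a hypothetical choice) could be computed as follows.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings, given as dicts mapping
    each action to its rank position (1 = best)."""
    actions = list(rank_a)
    n = len(actions)
    concordant = discordant = 0
    for x, y in combinations(actions, 2):
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# A value close to 1 between the rankings produced with strict and with tolerant
# thresholds indicates that the global ranking is essentially unchanged.
```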

4. Conclusions and future work

The decision support system presented in this paper has shown quite interesting results from the environmental point of view. The appropriate allocation of water in future conditions of climate change is crucial since water scarcity is highly expected for a


major part of the world. The Mediterranean rivers are more prone to drought, especially small rivers such as the ones that provide water to the city of Tarragona. The proposed method is useful for water managers as it gives the possibility of integrating different management criteria and water allocation strategies into a single modeling framework and of exploring the different adaptation measures without losing transparency in the analysis. This work is part of a research project that studies the allocation of water among competing uses (industry, agriculture and municipal) due to global change (not only climatic, but also demographic and economic). Global change will affect the current patterns of water demand. So, the next step is to extend this evaluation model based on ELECTRE-III-H to other scenarios of global change, including different demand driving factors. Moreover, to evaluate how reliable and safe the decisions proposed by the DSS are, appropriate sensitivity and robustness analysis tools will be defined.

Acknowledgements

This work is funded by the Spanish project SHADE (TIN-2012-34369). Luis Del Vasto is supported by an FI predoctoral grant from the Generalitat de Catalunya. Kumar received a fellowship grant (BP-DGR 2010) from AGAUR and the European Social Fund (Catalonia, Spain).

References

[1] C. Arondel and P. Girando, Sorting Cropping Systems on the Basis of Impact of Groundwater Quality, European Journal of Operational Research, 127:467–482, 2000.
[2] R.F. Bangash, A. Passuello, M. Hammond and M. Schuhmacher, Water allocation assessment in low flow river under data scarce conditions: A study of hydrological simulation in Mediterranean basin, Science of the Total Environment, 440:60–71, 2012.
[3] F. Cavallaro, A comparative assessment of thin-film photovoltaic production processes using the ELECTRE-III method, Energy Policy, 38:463–474, 2010.
[4] L. Del Vasto-Terrientes, A. Valls, R. Slowinski, and P. Zielniewicz, Extending Concordance and Discordance Relations to Hierarchical Sets of Criteria in ELECTRE-III Method, Modelling Decisions for Artificial Intelligence – 9th International Conference, 7647:78–89, Springer-Verlag, Germany, 2012.
[5] J. Figueira, S. Greco, M. Ehrgott, Multiple criteria decision analysis: state of the art surveys, Springer-Verlag, 2005.
[6] K. Gibert, G. Rodríguez Silva and I. Rodríguez-Roda, Knowledge discovery with clustering based on rules by states: A water treatment application, Environmental Modelling & Software, 25(6):712–723, 2010.
[7] N.R. Khalili and S. Duecker, Application of multi-criteria decision analysis in design of sustainable environmental management system framework, Journal of Cleaner Production, 47:186–198, 2013.
[8] M. Marquès, R.F. Bangash, V. Kumar, R. Sharp, M. Schuhmacher, The impact of climate change on water provision under a low flow regime: A case study of the ecosystems services in the Francoli river basin, Journal of Hazardous Materials, 263:224–232, 2013.
[9] M.F. Norese, ELECTRE-III as a support for participatory decision-making on the localization of waste-treatment plants, Land Use Policy, 23:76–85, 2006.
[10] B. Roy, Analyse et choix multicritere, Informatique et Gestion, 57:21–27, 1974.
[11] M. Sànchez-Marrè, K. Gibert, R.S. Sojda, J.P. Steyer, P. Struss, I. Rodríguez-Roda, J. Comas, V. Brilhante, E.A. Roehl, Intelligent Environmental Decision Support Systems, in: Environmental Modelling, Software and Decision Support. State of the Art and New Perspectives, 119–144, Elsevier, 2008.
[12] O. Varies, T. Kajander and R. Lemmela, Climate and water: from climate models to water resources management and vice versa, Climatic Change, 66:321–344, 2004.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-87


A Computational Creativity System to Support Chocolate Designers Decisions

Francisco J. RUIZ a,1, Cristóbal RAYA a, Albert SAMÀ a, Núria AGELL b
a Automatic Control Department, BarcelonaTech, Vilanova i la Geltrú, Spain
b ESADE Business School, Ramon Llull University, Sant Cugat, Spain

Abstract. In this paper, a new formulation of the central ideas of the well-established theory of Boden about creativity is presented. This new formulation redefines some terms and reviews the formal mechanisms of exploratory and transformational creativity. The presented approach is based on the conceptual space proposed by Boden and formalized by other authors in a way that facilitates the implementation of these mechanisms. The presented formulation is applied to a real case of creative design in which a new combination of chocolate and fruit is desired. The experimentation has been conducted jointly with a Spanish chocolate chef. Data collected from the chef has been used to validate the proposed system. Experimental results show that the formulation presented is not only useful for understanding how the creative mechanisms of design work, but also facilitates their implementation in real cases to support creativity processes.

Keywords. Creativity, Creativity Support System, Food industry.

Introduction

Computational creativity is a multidisciplinary area whose goal is to model, simulate or serve as a support tool for creative tasks. In this paper we focus on the last of these goals. Such systems are normally referred to as Creativity Support Systems (CSS). CSS can be defined as systems capable of enhancing human creativity without necessarily being creative themselves. They act as creative collaborators with scientists, designers, artists and engineers. CSS consist of applying technology to assist humans in thinking outside the box and expanding their exploration boundaries, generating ideas that have never been imagined before [1]. They can help us to look farther and avoid thinking of the obvious concepts. The grand challenge for CSS is to enable more people to be more creative more of the time [2].

Creativity can be found in painting, sculpture, music, literature and architecture, but also in engineering, software development, scientific discoveries and almost all human activities. It is assumed to be closely related to the rational decision-making process. In the literature, decision-making processes are normally considered to be composed of four steps: framing the decision, generating alternatives, evaluating the alternatives, and choosing and implementing the chosen alternative. Creativity is mainly associated with the second step: the generation of alternatives. Alternatives are normally generated by reviewing processes that were used in the past or were used in different frameworks with subtly common aspects. The skills of creative people are, on the one hand, to find these apparently different frameworks with common characteristics and, on the other hand, to evaluate the alternatives taking into account the relation between frameworks.

Edifici VG2 (EPSEVG). Avda. Víctor Balaguer, s/n. 08800 Vilanova i la Geltrú. Spain



CSS have a potential role in the food industry. Today, a significant portion of a food service's or manufacturer's business is focused on generating new ingredient combinations and finding new flavors that will be a commercial success. This research will allow chefs and other food professionals to be more creative and to shorten the time to bring a new flavor to market by helping them in the development process.

In this paper, a new formulation of the central ideas of the well-established theory of Boden about creativity is presented. This new formulation redefines some terms and also reviews the formal mechanisms of exploratory and transformational creativity based on the conceptual space proposed by Boden and formalized by other authors, in a way that facilitates the implementation of these mechanisms. To illustrate this formulation, a computational system is developed and tested in the support process of a creative chocolate designer. The study has been conducted jointly with the team of the chocolate chef Oriol Balaguer (http://www.oriolbalaguer.com). Oriol Balaguer is one of Spain's most awarded pastry chefs, who is actively involved in the research and development of new products.

The remainder of the paper is organized as follows: first, a literature review on computational creativity is conducted. In the second section, the proposed CSS methodology is presented. A real case example where the proposed methodology is applied is given in section three and, finally, in the last section conclusions and future work are discussed.

1. Literature review on computational creativity

Creativity should be regarded as one of the highest-level cognitive functions of the human mind. It is a phenomenon whereby something new and valuable is produced, such as an idea, a problem solution, a marketing strategy, a literary work, a painting, a musical composition or a new cookery recipe. Authors have diverged in the precise definition of creativity beyond these two features: originality (new) and appropriateness (valuable).

One of the few attempts to address the problem of creative behavior and its relation with Artificial Intelligence was made by Margaret Boden [3][4]. She aimed to study creativity processes from a philosophical viewpoint, focusing on understanding human creativity rather than trying to create a creative machine. Boden distinguishes between creativity that is novel merely to the agent that produces it and creativity that is recognized as novel by society. The first is usually known as P-creativity (or "psychological creativity") and the second is known as H-creativity (or "historical creativity"). The most important contribution of Boden's study is the introduction of the idea of a conceptual space composed of partial or complete concepts. She conceives the process of creativity as the location and identification of a concept in this conceptual space. The creative process can be performed by exploring or transforming this conceptual space. If the conceptual space is defined through a set of rules, the creative process can be thought of as finding new and satisfactory elements of this space that satisfy these rules. This is the kind of creativity which Boden calls exploratory creativity. If the rules defining the conceptual space can be changed, then the process is called transformational creativity. However, from Boden's study, it is not clear how the rules give rise to a particular conceptual space and, therefore, what the true difference is between exploring the space



and transforming it. In order to clarify and formalize the creative process, G. A. Wiggins [5] presented several papers in which he emphasized the notion of search as the central mechanism for exploratory creativity and the notion of meta-level search as related to transformational creativity. Wiggins proposes to define a universe of possibilities U which is a superset of the conceptual space. The universe is a multidimensional space whose dimensions are capable of representing all possible concepts which are relevant to the domain in which we wish to be creative. For transformational creativity to be meaningful, all conceptual spaces are required to be subsets of U.

Wiggins conceives exploratory creativity as a search for concepts in a specific conceptual space. The process involves three sets of rules that can be denoted as acceptability, appropriateness and strategy. The first set of rules is associated with membership of the conceptual space. Moreover, acceptability is related to the style. On the other hand, appropriateness rules are related to the value of the concept. Valuable concepts may become successful regardless of being acceptable according to the acceptability rules. This second set of rules that defines the value of a concept is much more difficult to define because it depends on cultural and aesthetic aspects, the specific context, personal mood, etc. It is important to remark that, in this context, appropriate means suitable to the task, but above all original and surprising. Finally, there exists a third set of rules associated with the search strategy. For instance, some people prefer to work "top-down", others "bottom-up", others rely on ad-hoc methodologies, using informed or uninformed heuristics, or even proceed at random. Wiggins points out that by separating acceptability and strategy rules, situations where different designers, each with a personal way of finding new ideas, are working within the same style can be described (a shared notion of acceptability).

From Wiggins' perspective, the interaction of these three sets of rules (acceptability, appropriateness and strategy) leads to the exploratory creativity process. However, although working within three invariant sets of rules may produce interesting results, a higher form of creativity can result from making changes to these rules (transformational creativity). In other words, on the one hand, exploratory creativity consists of finding a concept in a specific conceptual space, following a specific strategy and assessing it using a specific appropriateness set of rules. On the other hand, transformational creativity consists of the same process as exploratory creativity but changing the conceptual space, the search strategy or the appropriateness assessment.

Besides Wiggins' work, there have been other formalizations of specific aspects of the computational creative process [6][7][8]. Although these formalizations are very helpful in clarifying the nature of creative computation and have enabled some applications in diverse domains including graphic design, creative language, video game design and visual arts [9], the details of most of them are unspecified and the concepts they include are not easy to implement. The current paper starts from the central ideas of Boden and Wiggins and redefines the formal mechanisms of exploratory and transformational creativity in a way which facilitates the implementation of these mechanisms.

2. The proposed CSS

As in Wiggins' theory, let us start by considering a universal set of all concepts, U. The idea is that U is universal enough to contain concepts for every type of artifact that



might ever be conceived. In addition, we define a framework F as the object F = {C, a(), r()}, where C ⊆ U is the H-conceptual space (or "historical conceptual space") formed by all concepts related to the framework F, and a() and r() are maps from U to R: a() is the appropriateness and r() is the relevance. The first map is related to the success of considering a concept, while the second one is an objective measure of the relation of the concept with the framework. The relevance is the result of the experts' activities. A naïve relevance measure might be a 0/1 value (1 if x ∈ C and 0 otherwise), but it is possible to consider more complex measures containing more information about the relation between the concept and the framework. The underlying idea is that, although evaluating the appropriateness requires some kind of talent or expertise, relevance evaluation can be performed by means of an objective analysis of other problems in the framework. Thus a concept with high relevance in a framework does not necessarily have high appropriateness. In fact, an original concept always has low relevance in the considered framework.

In the proposed CSS, it is assumed that an expert i on a given framework F0 = {C0, a0(), r0()} knows the value a0(x) for some subset Ci0 of C0, but the values a0(y) for concepts y outside Ci0 are unknown to the expert. Following Boden, we call Ci0 the psychological or P-conceptual space, that is, the concept space associated with the framework F0 and the expert i. At the same time, the expert does not necessarily know ai(x) for another framework Fi. Thus, it is considered that expertise is only guaranteed in a single framework. In this model, we consider that the function a() depends on the framework but not on the expert. The difference between experts of the same framework is related to their different P-conceptual spaces, all of them subsets of the H-conceptual space. In addition, the activity of an expert is not only to evaluate concepts but especially to select them following some selection strategy. Once a new concept is selected and applied by the expert, if it does not belong to the P-conceptual space, the expert can obtain the value of a() and hence the P-conceptual space and/or the H-conceptual space is extended to include this new concept.

Let us consider a set of different frameworks F1, F2, …, Fm. If x ∈ U is a concept, we can consider the relevance vector of x with respect to the set of frameworks F1, F2, …, Fm as r(x) = (r1(x), r2(x), …, rm(x)). This vector describes the membership relation of x to the set of frameworks F1, F2, …, Fm. If x ∉ Ci, then ri(x) = 0. Following Boden's notation, if the expert only explores the conceptual space Ci0, the task is just exploratory creativity. If the expert explores beyond Ci0 by extending it, transformational creativity is then performed.

The utility of the CSS relies on proposing to the expert new concepts y ∈ U, y ∉ Ci0, from the relevance information of these concepts with respect to frameworks different from the one initially considered and from the relation among all these frameworks. The system that we consider is able to propose new concepts y ∈ U, y ∉ Ci0 with a likely high a0(y). In order to predict how valuable a new concept y is, i.e. a0(y), our hypothesis is that, although no obvious relations between different frameworks exist, the appropriateness a0(y) and the relevance vector r(y) are closely related.
In this sense, it is considered that concepts with similar relevance vectors should have a similar appropriateness in the current framework. This hypothesis may not hold for a small set of frameworks but seems to hold for larger ones. Given the relation between appropriateness and relevance, our CSS will use the set Ci0, or a subset of it, as a training set in a learning system in order to extract the relation



between the appropriateness in F0 and the relevance vector in a set of frameworks F1, F2, …, Fm. Once trained, we only have to feed the CSS with other concepts and the system will propose those concepts with an expected high appropriateness. We propose in this study an illustrative example to validate our formulation and hypothesis. The example will highlight the relation between the appropriateness in a given framework and the relevance of the concept with respect to other, apparently distinct, frameworks.
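A minimal sketch of this training-set construction, assuming a hypothetical count_recipes(concept, framework) lookup over a recipe database (the function names are illustrative and not taken from the paper):

```python
def relevance_vector(concept, frameworks, count_recipes):
    """Relevance of `concept` with respect to each framework, measured here as the
    number of recipes that mention both the concept and the framework term."""
    return [count_recipes(concept, framework) for framework in frameworks]


def build_training_set(p_space, a0, frameworks, count_recipes):
    """Pairs (relevance vector, expert appropriateness) for the concepts in the
    expert's P-conceptual space, ready to be fed to a learning system."""
    X = [relevance_vector(c, frameworks, count_recipes) for c in p_space]
    y = [a0[c] for c in p_space]
    return X, y
```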

3. Experimental framework: combining chocolate with fruits

To illustrate the implementation of the ideas presented in the previous section, let us consider the following creative problem: creating a new chocolate cake by combining dark chocolate with a fruit to obtain a highly accepted product. The expert that has to create the new chocolate cake has extensive experience in combining chocolate with many different ingredients: cheese, liqueurs, olive oil, nuts and, of course, fruits. Due to his experience, he knows whether several combinations of specific types of chocolates and fruits are suitable or not but, of course, he does not know how well chocolate combines with all existing fruits. Thus, a CSS is going to be developed according to the methodology presented in the previous section in order to assist the expert in creating suitable new combinations. In our case, since we constrained the problem to combining fruit and dark chocolate, the universe U is formed by all fruits. The P-conceptual space Ci0, which consists of all fruits for which the expert knows whether they blend in well or not with dark chocolate, is just a subset of U. Moreover, the expert is able to assign a value a0(x) for all x ∈ Ci0, which is represented as a qualitative value (very bad, bad, …, good and very good). The objective of our CSS consists in suggesting other fruits y ∈ U, y ∉ Ci0, with a high predicted value function a0(y), i.e. fruits that are valuable to the expert. Following the proposed CSS methodology, we can learn the value a() of a fruit with respect to dark chocolate (framework F0) through the way it is related to other frameworks. In this example we consider only frameworks related to recipes and ingredients, but other alternatives could also be considered. For obtaining the relevance value with respect to a framework we have made use of a large recipe database and we counted the number of recipes containing both the fruit and the term associated with the framework. Although the combination of fruits and dark chocolate could have nothing to do with the combination of fruits and rice, for instance, according to our assumption, given a fruit that has a high value of a0 and a relevance vector similar to that of another, unknown fruit, this unknown fruit could be considered as a good option for extending the search.

3.1. Data collection

In order to validate our method, we used the data provided by the chocolate chef Oriol Balaguer who assessed, according to his expertise, the combinations of 28 different fruits with respect to their suitability to combine with dark chocolate [10]. In addition we have considered 14 frameworks aside from the main framework (dark chocolate). All considered frameworks consisted of ingredients used in cooking, but not necessarily in pastry making. In this implementation of the CSS, we are not focusing on the



frameworks selection problem. Instead, we think that the ad-hoc selection for this example is enough to illustrate how the presented formulation can be implemented, and we leave framework selection as near-future work. In order to obtain the relevance vector for each fruit we used the online recipe database www.allrecipes.com. Table 1 depicts the list of the 28 fruits and the 14 frameworks. The last column of this table shows the qualitative assessment provided by the expert, following the labels in Table 2. The values of Table 1 are obtained by searching for both terms simultaneously (fruit and framework); each value represents the number of recipes in the database containing both terms. We can think that this list of 28 fruits constitutes the conceptual space, and the CSS can learn the relation between the relevance vectors of these fruits with respect to the 14 frameworks and the assessment provided by the expert. Once this relation is captured, the CSS can obtain the relevance vector of other fruits with respect to this set of frameworks; it then calculates the predicted value a0 for these new fruits and proposes those which have a high predicted a0.

Table 1. List of the 28 fruits assessed by the expert (last column) and the list of 14 frameworks considered in this example.
Column headers: rice, chicken, vinegar, sugar, pie, caramel, bread, salad, lasagna, milk, pudding, garlic, butter, honey, app

chicken vinegar sugar pie caramel bread salad lasagna milk pudding garlic butter honey app

331 494 18 29 0 0 53 80 26 44 55 119 0 0 4 6 25 38 0 0 18 33 38 111 0 0 129 210 202 351 41 31 8 11 4 8 2 2 9 9 268 491 18 32 68 111 1036 2219 0 0 5 11 4 4 0 0

606 2265 586 49 185 38 0 7 0 51 250 30 49 374 150 91 142 18 0 4 0 15 156 72 95 794 219 0 0 0 73 418 103 90 157 14 0 0 0 119 818 129 235 1416 129 32 679 95 16 34 0 11 32 3 2 0 0 13 54 18 210 591 65 43 92 18 70 140 11 1433 998 141 0 2 0 8 19 0 30 54 3 2 46 0

152 17 0 6 3 2 0 5 8 0 2 6 0 11 15 31 0 0 0 0 9 0 0 103 0 0 0 0

525 38 0 51 53 89 0 10 116 0 51 48 0 141 276 327 9 6 0 4 106 8 40 1496 0 3 4 6

661 79 0 40 59 103 0 13 174 0 99 248 0 314 391 94 22 32 0 26 380 147 95 1668 0 17 55 0

5 0 0 0 0 4 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 0 91 0 0 0 0

567 42 0 72 145 46 2 56 330 0 121 37 0 220 346 430 8 6 0 17 208 21 64 547 2 5 6 15

173 6 0 16 24 16 0 6 84 0 35 8 0 106 78 107 0 0 0 6 19 22 7 11 0 0 0 0

465 36 0 42 42 281 0 6 29 0 22 101 0 157 391 52 18 10 2 10 810 20 107 5011 0 13 16 0

1253 113 3 160 195 107 4 85 330 0 217 65 0 310 671 401 12 7 0 21 251 23 59 1223 3 8 5 24

Table 2. Labels and linguistic meaning in the fruit assessment by the expert.
1 - It does not combine at all
2 - It does not combine well
3 - Combines well
4 - Combines very well
5 - It is an excellent combination

265 35 0 46 36 26 0 15 92 0 42 44 0 94 257 112 11 16 0 14 140 16 47 318 0 8 18 6

1 1 2 4 3 5 4 4 4 5 5 1 5 5 5 5 4 5 4 3 4 5 5 2 4 1 1 3



3.2. CSS training and results

Our proposal is based on the existing relation between the appropriateness of a concept with respect to a framework and the relevance vector of this concept with respect to a set of other frameworks. To validate this, we used the data from Table 1 to obtain this relation and assess its significance. This validation is performed twice: on the one hand, using the complete range of the expert's valuation shown in Table 2 and, on the other hand, using just a binary valuation (suitable or not suitable) that simplifies the expert's assessment. We used a multiclass and a two-class support vector machine (SVM) and validated them by means of a leave-one-out cross-validation process. If the SVM can correctly estimate the appropriateness of a fruit from its relevance vector, it can be used to propose new fruits with a high predicted appropriateness. The parameters of the SVM were tuned by optimizing the geometric mean of sensitivity and specificity because the data are imbalanced.

In the first case, we employed a multiclass SVM (5 classes) with a Gaussian kernel. The best parameters obtained were C = 1000 (regularization cost) and γ = 10000 (Gaussian kernel parameter). The R software and the LibSVM library were used to train the datasets and predict the classification accuracy. The total accuracy obtained is 42.86%, which means that 42.86% of the time the predicted value matches the expert assessment. Taking into account that there are 5 classes and the expected accuracy in the case of random predictions is 100/5 = 20%, the accuracy value obtained reaffirms our hypothesis.

In the second case, the pattern labels are changed in order to maximize the CSS utility. Instead of the labels shown in Table 1, a binary classification is employed in which the first class contains those combinations that are suitable to the expert and the second class those which are not. Patterns corresponding to values of 3 and less are considered to belong to the first class and the rest of the patterns are considered to belong to the second class. In this case, the best parameters obtained by the tuning process were C = 100000 and γ = 0.1. The total accuracy obtained is 85%.
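The authors report results obtained with R and the LibSVM library; the following scikit-learn code is an assumed, equivalent sketch of the leave-one-out validation (the parameter values shown are those reported for the multiclass case).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut

def loo_accuracy(X, y, C=1000.0, gamma=1e4):
    """Leave-one-out accuracy of an RBF-kernel (Gaussian) SVM trained on the
    relevance vectors X (one row per fruit) and the expert labels y."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    hits = 0
    for train, test in LeaveOneOut().split(X):
        model = SVC(kernel="rbf", C=C, gamma=gamma)
        model.fit(X[train], y[train])
        hits += int(model.predict(X[test])[0] == y[test][0])
    return hits / len(y)

# Binary variant used in the second experiment: labels of 3 and less form one class.
# y_bin = [1 if label <= 3 else 2 for label in y_expert]
# loo_accuracy(X_relevance, y_bin, C=1e5, gamma=0.1)
```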

4. Conclusion and future work

In this paper we proposed a new formalization of the mechanism of creativity based on Boden's notions of conceptual space and of transformational creativity as a search beyond the boundaries of this conceptual space. This study redefines the formal mechanisms of exploratory and transformational creativity, introducing the concepts of framework and of the relevance of a concept with respect to a framework. The presented formalization has been implemented in a real example conducted with a Spanish chocolate chef. The obtained CSS was able to propose new, unknown fruits that are predicted to combine well with dark chocolate. The validation of the method has been performed using an SVM. The obtained results allow us to conclude that the assumptions on which the method is based are satisfied in this example. It is important to remark that in this implementation we are not focusing on the framework selection problem; this is an important point to study in future work. Also, including both complete and incomplete concepts in the presented formalization will be an interesting topic for research.



Acknowledgments

This work is supported by the Spanish project SENSORIAL (TIN2010-20966-C02-02), Spanish Ministry of Education and Science.

References

[1] C.J. Thornton, How thinking inside the box can become thinking outside the box, In Proceedings of the 4th International Joint Workshop on Computational Creativity, 2007.
[2] B. Shneiderman, Creating Creativity: User Interfaces for Supporting Innovation, ACM Transactions on Computer-Human Interaction 7(1) (2000), 114-138.
[3] M.A. Boden, The Creative Mind: Myths and Mechanisms, Weidenfeld and Nicolson, London, 1990.
[4] M.A. Boden, What is creativity? In M.A. Boden, editor, Dimensions of Creativity, MIT Press (1996), 75-118.
[5] G. Wiggins, A preliminary framework for description, analysis and comparison of creative systems, Knowledge-Based Systems 19 (2006), 449-458.
[6] G. Ritchie, Some empirical criteria for attributing creativity to a computer program, Minds and Machines 17(1) (2007), 67-69.
[7] G. Ritchie, A closer look at creativity as search, In Proceedings of the 3rd International Conference on Computational Creativity, 2012.
[8] J. Charnley, A. Pease and S. Colton, On the Notion of Framing in Computational Creativity, In Proceedings of the 3rd International Conference on Computational Creativity, 2012.
[9] R. Manurung, G. Ritchie and H. Thompson, Using genetic algorithms to create meaningful poetic text, Journal of Experimental & Theoretical Artificial Intelligence 24(1) (2012), 43-64.
[10] N. Agell, G. Sánchez, M. Sánchez, F. Ruiz, Group decision-making system based on a qualitative location function. An application to chocolates design, In Proceedings of the 27th International Workshop on Qualitative Reasoning, 2013.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-95


Learning by Demonstration Applied to Underwater Intervention

Arnau CARRERA a, Narcís PALOMERAS a, Natàlia HURTÓS a, Petar KORMUSHEV b and Marc CARRERAS a
a Computer Vision and Robotics Group (VICOROB), University of Girona
b Department of Advanced Robotics, Istituto Italiano di Tecnologia

Abstract. Performing subsea intervention tasks is a challenge due to the complexities of the underwater domain. We propose to use a learning by demonstration algorithm to intuitively teach an intervention autonomous underwater vehicle (I-AUV) how to perform a given task. Taking as input a few operator demonstrations, the algorithm generalizes the task into a model and simultaneously controls the vehicle and the manipulator (using 8 degrees of freedom) to reproduce the task. A complete framework has been implemented in order to integrate the LbD algorithm with the different onboard sensors and actuators. A valve turning intervention task is used to validate the full framework through real experiments conducted in a water tank.

Keywords. Learning by Demonstration (LbD), Dynamic Movement Primitives (DMP), Autonomous Underwater Vehicle (AUV), Underwater Intervention

Introduction

The interest in having human operators interacting and working side by side with robots has been growing steadily during the last decade. Two key elements are necessary to achieve this goal. The first one is to establish a safe workspace in which human operators cannot be hurt by the robots. The second element, which is related to the focus of this paper, concerns how an operator can teach a new task to a robot in a natural way.

A recent example of this progress in the industrial environment is the Baxter [1] robot. It is an industrial robot with two arms that can share the same work area with human operators. Moreover, Baxter is programmed by means of an operator moving its compliant arms and recording the desired waypoints to perform a particular task. A more general way to teach a new task to a robot is using a learning by demonstration (LbD) algorithm [2]. This algorithm allows a robot to learn a new task through a set of demonstrations. In contrast to Baxter's way of teaching, where only some points are stored, the LbD algorithm records several demonstration trajectories, which are then used to generate a representative model of the task.

Our aim in this paper is to apply an LbD technique to an intervention autonomous underwater vehicle (I-AUV) to enable it to learn an intervention task. This type of vehicle is designed to explore the underwater world autonomously and interact with it using a manipulator. By using LbD we provide the possibility of easily using the I-AUV for different tasks. In this way, every new task just requires an operator to perform demonstrations, thus avoiding the implementation of new code every time. Some research projects have begun to demonstrate similar capabilities with I-AUVs, although none of them uses machine learning but classical manipulation theory. The SAUVIM project [3] proposed a system to recover objects from the seafloor. In the TRIDENT project [4] a system to search for and recover objects with a light I-AUV was presented. The TRITON project [5] shows some manipulation with an I-AUV docked to a sub-sea panel. The work presented here is conducted in the context of the PANDORA project, and for the first time machine learning, in the form of LbD, is applied to an I-AUV that has the goal of performing a valve turning task in free floating mode.

The application of LbD techniques in the underwater domain presents several added complications due to water perturbations (i.e. currents, waves), reduced visibility, difficulties in understanding the scene and high sensorial uncertainty for navigation and perception. For this reason, the LbD implementation presented here has required the development of a complete framework to successfully integrate the algorithms. The proposed framework learns 8 degrees of freedom (DoF) to control the trajectory of an AUV and its manipulator simultaneously. To perceive the environment, the framework uses the vehicle cameras and also a force and torque (F/T) sensor to detect the contact between the manipulator and the target. Information from all these sensors is acquired whilst a pilot is performing the intervention task to be learned. Then, from a set of demonstrations, the proposed LbD algorithm generalizes a control policy able to accomplish the intervention task with the same performance as the human operator.

To validate the proposed approach we present experiments in the context of a valve turning intervention task. An AUV equipped with a manipulator, two cameras, an F/T sensor and a haptic device is set to perform the turning of a valve placed on a sub-sea panel in free floating mode. Results show good performance when attempting the task both in normal conditions and under external perturbations.

The rest of this paper is organized as follows. Section 1 overviews related work on LbD for robotics and describes the LbD algorithm that has been used. Section 2 describes the vehicle used to perform the intervention task as well as the software architecture. Results obtained from the valve turning test scenario are presented and analyzed in Section 3. Section 4 summarizes and concludes the work.

1 University of Girona, 17071 Girona, Spain. E-mail: arnau.carrera, narcis.palomeras, marc.carreras at udg.edu and nhurtos at eia.udg.edu
2 Istituto Italiano di Tecnologia, via Morego, 30, 16163 Genova, Italy. E-mail: petar.kormushev at iit.it
This research was sponsored by the Spanish government (COMAROB Project, DPI2011-27977-C03-02) and the PANDORA EU FP7-Project under the Grant agreement FP7-ICT-2011-7-288273. We are grateful for this support.

1. Learning by Demonstration

LbD is a machine learning technique designed to transfer knowledge from an expert to a machine. This type of algorithm follows three sequential phases: first, a set of demonstrations of the task is recorded; second, the algorithm learns by generalising all the demonstrations and creating a model; finally, the algorithm loads the model and uses it to reproduce the task.



There are mainly two methods to transfer the knowledge: Imitation, where the teacher performs the task by itself and the robot extracts the information; and Demonstration, where the robot is used to perform the task by tele-operation or by kinesthetic teaching, in which the robot is moved by the teacher. The learned controllers can generate trajectories adaptable to the current robot state.

1.1. LbD related work

Several LbD algorithms have been proposed depending on the application requirements. D.R. Faria [6] proposed to learn the manipulation and grasping of an object using geometry, based on the position of the fingers and their pressure, representing them with a probabilistic volumetric model. Calinon [7] proposed to represent trajectories using a Gaussian mixture model (GMM). This representation was extended by Kruger [8] using an incremental GMM to automatically set the number of Gaussians. Furthermore, Calinon [9] used different types of parametrized regressions to adjust the trajectory learnt during the demonstrations. Similarly to the GMM, a hidden Markov model (HMM) [10] can be used to represent a trajectory and can be parametrized [11]. A different option to encode the trajectory is using dynamic movement primitives (DMP) [12], which can be extended to work in closed loop [13]. Moreover, the forces exerted along the trajectory can be learned by an extended DMP [14].

1.2. Dynamic movement primitives

Considering the context of this work, we have chosen to use DMP as the base of our learning framework. The main motivation is the fact that it dynamically generates the trajectories during the reproduction, which makes the approach robust to external perturbations. Also, the flexibility and simplicity of the representation allows the adaptation of the algorithm to specific requirements, as will be described in Section 2.2.4. DMP is an algorithm where the learned skill is encapsulated in a superposition of basis motion fields (see Figure 1). The method used in this paper is an extension of DMP proposed by Kormushev [14].

Figure 1. Left figure shows a set of 2D demonstrated trajectories (black) and one reproduction (red). In this case, the demonstrated trajectory has to pass between the two depicted obstacles. On the right, the h function is represented. The encoding of the trajectories using a DMP algorithm has been done using 5 Gaussians adequately weighted over time.



To better understand this encoding, we can imagine a mass attached to different damped springs. These springs attract the mass, changing their forces along the time of the experiment and moving the mass to follow the desired trajectory. To generate the superposition, each attractor has an associated weight which changes along time as defined by the hi(t) function (1). The weight of each attractor is represented with a Gaussian, whose centers μ_i^T are equally distributed in time, and whose variance parameters Σ_i^T = total_time/K are set to a constant value inversely proportional to the number of Gaussians (K):

h_i(t) = \frac{\mathcal{N}(t;\, \mu_i^T, \Sigma_i^T)}{\sum_{k=1}^{K} \mathcal{N}(t;\, \mu_k^T, \Sigma_k^T)},    (1)
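A minimal sketch of the weighting in Eq. (1), assuming the K Gaussian centers are spread evenly over the duration of the movement and share the constant variance total_time/K quoted above:

```python
import numpy as np

def attractor_weights(t, total_time, K):
    """h_i(t) from Eq. (1): normalized Gaussian activations of the K attractors."""
    centers = np.linspace(0.0, total_time, K)        # mu_i^T equally distributed in time
    variance = total_time / K                        # shared Sigma_i^T
    activations = np.exp(-0.5 * (t - centers) ** 2 / variance)  # Gaussians up to a common factor
    return activations / activations.sum()           # the common factor cancels in the ratio
```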

Instead of using the real time, a decay term is used to obtain a time-invariant model:

t = \frac{\ln(s)}{\alpha}, \quad \text{where } s \text{ is a canonical system: } \dot{s} = -\alpha s,    (2)

and the α value is selected by the user depending on the duration of the demonstrated task. The number of attractors is preselected by the user, depending on the complexity of the task, and each attractor is represented using a Gaussian. The position of the attractor is the center of the Gaussian (μ_i^x) and the stiffness (matrix K_i^P) is represented by the covariance. The values are learned from the observed data through least-squares regression. All the data from the demonstrations is concatenated in a matrix

Y = \ddot{x}\,\frac{1}{K^P} + \dot{x}\,\frac{K^V}{K^P} + x,

where x, \dot{x} and \ddot{x} are the position, velocity and acceleration recorded at each time instant of the demonstrations. The weights at each time instant are also concatenated to obtain the matrix H. With these two matrices, the linear equation Y = H\mu^x can be written. The least-squares solution to estimate the attractor centers is then given by \mu^x = H^{\dagger} Y, where H^{\dagger} = (H^T H)^{-1} H^T is the pseudo-inverse of H.

The user needs to define a minimum K^P_{min} and a maximum K^P_{max} to define the limits of the stiffness and to estimate the damping as follows:

K^P = K^P_{min} + \frac{K^P_{max} - K^P_{min}}{2}, \qquad K^V = 2\sqrt{K^P}.    (3)

To take into account variability and correlation along the movement and among the different demonstrations, the residual errors of the least-squares estimations are computed in the form of covariance matrices, for each Gaussian (i ∈ {1, …, K}):

\Sigma_i^X = \frac{1}{N}\sum_{j=1}^{N} (Y'_{j,i} - \bar{Y}_i)(Y'_{j,i} - \bar{Y}_i)^T, \quad \forall i \in \{1, \ldots, K\},    (4)

where:

Y'_{j,i} = H_{j,i}\,(Y_j - \mu_i^x).    (5)



In Equation (4), \bar{Y}_i is the mean of Y'_{j,i} over the N datapoints. Finally, the residual terms of the regression process are used to estimate K_i^P through an eigencomponent decomposition:

K_i^P = V_i D_i V_i^{-1},    (6)

where:

D_i = k^P_{min} + (k^P_{max} - k^P_{min})\,\frac{\lambda_i - \lambda_{min}}{\lambda_{max} - \lambda_{min}}.    (7)

In the Equation above, the λi and the Vi are the concatenated eigenvalues and eigenvector fo the inverse covariance matrix (Σxi )−1 . The basic idea is to determine a stiffness matrix proportional to the inverse of the observed covariance. To sum up, the model for the task will be composed by: the kiP matrices and μxi centers representing the Gaussians; hi (t) representing the influence of each matrix functions; K V representing the damping; and α, which is assigned according to the duration of the sample. Figure 1 shows a simple example where the learned data is represented. Finally, to reproduce the learned skill, the desired acceleration is generated with ˆ x ¨=

Finally, to reproduce the learned skill, the desired acceleration is generated with

\hat{\ddot{x}} = \sum_{i=1}^{K} h_i(t)\left[K_i^P(\mu_i^X - x) - K^V \dot{x}\right],   (8)

where x and \dot{x} are the current position and velocity.

2. Intervention Framework

The intervention framework can be divided into two different parts. First, the hardware components, namely the I-AUV and the manipulator. Second, the software architecture, which interprets the information gathered by the sensors, sends commands to the actuators, and learns the demonstrated task controlling both the AUV and the manipulator.

2.1. Hardware components

The Girona 500 I-AUV [15] is a compact and lightweight AUV with hovering capabilities which can fulfill the particular needs of a wide diversity of applications by means of mission-specific payloads and a reconfigurable propulsion system. For the purpose of this paper, the propulsion system is configured with 5 thrusters to control 4 DoFs (surge, sway, heave and yaw). To perform intervention tasks, the Girona 500 I-AUV is equipped with an under-actuated manipulator (see Figure 2), with 4 DoFs (slew, elbow, elevation and roll) and a custom end-effector.
The custom end-effector is composed of three different parts (see Figure 2). The first one is a compliant passive gripper to absorb small impacts. Since we aim to demonstrate a valve-turning task, the gripper has been designed with a V-shape in order to easily drive the handle of a T-bar valve to the end-effector center. The second element consists of a camera in hand, which has been installed in the center of the gripper to provide visual feedback of what the end-effector is manipulating. This camera has been placed to prevent the occlusion of the vehicle's camera by the manipulator during the demonstration of the intervention. Finally, an F/T sensor provides information about the quality of the grasping and the necessary torque to turn the valve during the manipulation.


Figure 2. On the right, the Girona 500 I-AUV in a water tank with a mock-up of a sub-sea panel in the background and a Sea-eye thruster (right) used to introduce perturbations during the manipulation. On the left, a 3D model of the customized end-effector, in which the three blocks can be distinguished: 1 passive gripper, 2 camera in-hand and 3 F/T sensor.

2.2. Software Architecture

The software architecture for intervention is composed of several modules organized in several layers (see Figure 3). Starting from the bottom, the first layer contains all the sensors and actuators. The next layer contains all the perception systems that process sensor information, such as the localization module and the perception systems that process camera and F/T sensor data. On top of it, the AUV and manipulator velocity controllers are in charge of following the set points of the LbD architecture. Finally, in the top-level layer, the LbD architecture is in charge of acquiring data from demonstrations (phase 1), learning the model (phase 2) and reproducing the task by generating velocity setpoints (phase 3).

Figure 3. Diagram of software architecture showing the LbD architecture and its connection to the AUV control architecture.

2.2.1. Localization and tracking of elements

A simultaneous localization and mapping algorithm based on an extended Kalman filter (EKF-SLAM) is used to obtain robust


AUV navigation [16]. The EKF-SLAM system combines different navigation sensors and merges their information through a constant motion model to obtain an estimation of its position. Furthermore, the EKF-SLAM can include information about landmarks to improve the AUV navigation. In our proposal, the pose of the target of interest (the goal valve) is included as a landmark in the system.

2.2.2. Perception module

AUVs have different sensors to perceive the environment. To identify the target of interest, a vision-based algorithm analyzes the gathered images and compares them with an a priori target template. With this information, the main system is able to obtain the position of the target with respect to the AUV. Additionally, during the intervention task, the F/T sensor mounted on the end-effector is used to obtain contact information.

2.2.3. Control system

In general, AUV and manipulator controllers accept velocity or pose requests. Our strategy uses two independent velocity controllers: one dealing with the 4 DoF of the AUV and another controlling the 4 DoF of the manipulator. The AUV velocity controller computes the force and torque to be generated to reach the desired velocity. The force output is a combination of a standard 4 DoF proportional-integral-derivative (PID) controller and an open-loop controller based on a model. The low-level controller for the manipulator controls the velocity of each joint (q̇ ∈ R^4) in order to reach the desired velocity of the end-effector in Cartesian space. To this end, the desired velocity is transformed to an increment in Cartesian space (ẋ) and, using the pseudo-inverse Jacobian (J†) of the manipulator, q̇ is obtained as q̇ = J†ẋ (a minimal sketch of this rule is given after the description of the LbD phases below).

2.2.4. LbD architecture for underwater intervention

The LbD approach introduced in Section 1.2 has been tailored to the complexities of the underwater environment and the need for tight cooperation between the vehicle and the manipulator. The implemented LbD architecture is divided into 3 phases that we detail in the following lines, describing also the particular modifications that have been performed at each stage.
• Demonstration: The operator performs the task by tele-operating the I-AUV, using the feedback from the onboard camera and the F/T sensor. Knowing the target pose, the manipulator and AUV poses are transformed with respect to a frame located at the target's center (i.e. the center of the valve). The tele-operation phase is paramount for the proper functioning of the system, as the quality of the learning will depend on the quality of the demonstrations. Toward that end, we propose to use a haptic device with force feedback to control the vehicle and the manipulator. Furthermore, for better feedback while performing the demonstration, the operator uses a Graphical User Interface (GUI) to watch the vehicle's camera as well as a 3D representation of the AUV pose.


• Learning: After several demonstrations, the LbD algorithm generates a model of Gaussian attractors defined by the positions of their centers and the stiffness matrices, using a modified version of the DMP algorithm explained in Section 1.2. The DMP algorithm has been adapted to allow an efficient control of both the vehicle and the manipulator. This implied the addition of the vehicle's yaw orientation, and of the end-effector's position in Cartesian space (x, y, z) and roll orientation. Hence, the modified DMP controls 8 DoF instead of 3 DoF. To take into consideration the relation between the movement of the vehicle and the manipulator, the requested manipulator velocities are computed by subtracting the end-effector requested velocities from the AUV requested velocities. Besides, the DMP has been modified to integrate a finalization condition using the information of the F/T sensor, to detect the contact with the valve and thus the accomplishment of the trajectory.
• Reproduction: The LbD Reproductor loads the task model and, using the same inputs as in the demonstration phase, generates the AUV and manipulator requested velocities to perform the task.
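As referenced in Section 2.2.3, the manipulator's low-level controller maps a desired Cartesian end-effector velocity to joint velocities through the pseudo-inverse Jacobian, q̇ = J†ẋ. The following minimal sketch is our illustration only, with assumed array shapes, and not the authors' controller.

```python
import numpy as np

def joint_velocities(jacobian, cartesian_velocity):
    """Resolved-rate control for the 4-DoF arm: q_dot = J^+ x_dot.

    jacobian           : (m, 4) manipulator Jacobian evaluated at the current joint configuration.
    cartesian_velocity : (m,) end-effector velocity requested by the LbD reproductor.
    """
    j_pinv = np.linalg.pinv(jacobian)        # Moore-Penrose pseudo-inverse J^+
    return j_pinv @ cartesian_velocity       # joint velocity command q_dot in R^4
```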

3. Results

To validate the proposed LbD framework, a valve-turning intervention task has been proposed. The task is divided into 3 steps: approaching the panel, moving the manipulator to an appropriate configuration to grasp the valve, and finally turning the valve. Results are presented by following the 3 phases of the LbD algorithm.

3.1. Demonstration

In the demonstration phase, the AUV is placed approximately 4 m from the panel. The operator, using the haptic device, drives the vehicle to a position where the intervention can start (around 1.5 to 2 m from the panel). When the Girona 500 I-AUV reaches this position, the operator moves the manipulator to obtain a desired configuration to grasp the valve. Finally, the valve is turned.

3.2. Learning

The learning algorithm uses the previously demonstrated trajectories from the beginning of the approach until the valve is grasped (steps 1 and 2). The last step, turning the valve, is not part of the learned model. For that, we use a controller that, given a maximum torque and a desired angle, turns the valve. In the proposed experiment, the learning algorithm used 20 attractor points and 5 demonstrations to learn the task.

3.3. Reproduction

Figure 4 shows the 8 learned DoF in independent plots comparing the 5 demonstrations against an autonomously performed trajectory. As can be observed, the LbD reproduction follows the desired trajectory, producing smoother movements than the human


operator. This can be easily appreciated in the Z axis. The LbD reproduction can also impose stricter or more flexible constraints, depending on the variability of the demonstrated trajectories. For example, in the Roll graph, the reproduced trajectory is flexible until second 80, and strictly follows the demonstrated trajectories after that.

Figure 4. Demonstrated trajectories (black dashed lines) and autonomous trajectory (red line) for the valve grasping task, for (a) the AUV and (b) the end-effector. All trajectories are represented in the frame of the target valve. Each plot shows a single DoF of the AUV or the end-effector. In both the demonstration and the reproduction, the time shown is the real time of the experiments, which in the reproduction is equivalent to the one generated by the canonical system.

Regarding the overall performance, 13 of the 16 reproductions have been successful. Most of the failures are caused by errors in the alignment between the valve and the end-effector. These errors are introduced by the vision-based system that detects the valve orientations and should be further investigated.

Table 1. The average of the standard deviation along the completed reproduction of the 13 successful reproductions.

σ̄ X_AUV    σ̄ Y_AUV    σ̄ Z_AUV    σ̄ Yaw_AUV    σ̄ X_EE     σ̄ Y_EE     σ̄ Z_EE     σ̄ Roll_EE
0.0713 m   0.0812 m   0.2071 m   0.0633 rad   0.0984 m   0.1056 m   0.2198 m   0.925 rad

To show the similarity of all the successful reproductions, the standard deviation between them has been computed. Table 1 shows the average of the standard deviation obtained for each axis during the reproduction time. All the axes obtain a small deviation, proving the similarity between them. The roll of the end-effector has the biggest deviation due to the flexibility in the learned model. On the other hand, the X and Y axes of the AUV and the end-effector have similarly small values, while the Z axis has bigger values because of the difficulty of stabilizing the AUV after modifying the depth.

4. Conclusions

This paper has presented, for the first time, the use of machine learning in the context of an I-AUV task, showing real experiments in a controlled water tank. We have


implemented an LbD algorithm, integrated in a full framework, that allows a task to be taught intuitively using a few operator demonstrations. The core of the implemented LbD consists of a DMP algorithm that has been tailored to simultaneously control the vehicle and the manipulator (8 DoF). In this way, we achieve a tight cooperation between both components and greater stability in the performed trajectories. The validation experiments have been performed with the Girona 500 I-AUV equipped with a manipulator and a custom end-effector in the context of a valve-turning intervention task. The results of the experiments have proved the suitability of the proposed method, obtaining results similar to or better than those of a human operator. Future work will focus on dealing with strong perturbations and on detecting failures during the evolution of the intervention, trying to adapt the strategy to the new conditions or aborting the task. Also, F/T sensor data will be included in the DMP algorithm to learn/reproduce how the human operator interacts with the target.

References

[1] (2013) Rethink robotics. [Online]. Available: http://www.rethinkrobotics.com/products/baxter
[2] B. Argall, S. Chernova, M. M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[3] G. Marani and S. Yuh, "Underwater autonomous manipulation for intervention missions AUVs," Oceans Engineering, vol. 36, no. 1, pp. 15–23, 2009.
[4] M. Prats, J. García, J. Fernández, R. Marín, and P. Sanz, "Towards specification, planning and sensor-based control of autonomous underwater intervention," IFAC 2011, 2011.
[5] J. Fernandez, M. Prats, P. Sanz, J. Garcia, R. Marin, M. Robinson, D. Ribas, and P. Ridao, "Grasping for the Seabed: Developing a New Underwater Robot Arm for Shallow-Water Intervention," Robotics Automation Magazine, IEEE, vol. 20, no. 4, pp. 121–130, Dec 2013.
[6] D. R. Faria, R. Martins, J. Lobo, and J. Dias, "Extracting data from human manipulation of objects towards improving autonomous robotic grasping," Robotics and Autonomous Systems, vol. 60, no. 3, pp. 396–410, 2012.
[7] S. Calinon, F. D'halluin, E. L. Sauser, D. G. Caldwell, and A. Billard, "Learning and Reproduction of Gestures by Imitation," IEEE Robot. Automat. Mag., vol. 17, no. 2, pp. 44–54, 2010.
[8] V. Krüger, V. Tikhanoff, L. Natale, and G. Sandini, "Imitation learning of non-linear point-to-point robot motions using dirichlet processes," in ICRA. IEEE, 2012, pp. 2029–2034.
[9] S. Calinon, T. Alizadeh, and D. G. Caldwell, "Improving extrapolation capability of task-parameterized movement models," in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Tokyo, Japan, November 2013.
[10] G. Hovland, P. Sikka, and B. McCarragher, "Skill acquisition from human demonstration using a hidden Markov model," in Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on, vol. 3, 1996, pp. 2706–2711.
[11] V. Kruger, D. Herzog, S. Baby, A. Ude, and D. Kragic, "Learning Actions from Observations," Robotics Automation Magazine, IEEE, vol. 17, no. 2, pp. 30–43, 2010.
[12] P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, "Learning and generalization of motor skills by learning from demonstration," in ICRA. IEEE, 2009, pp. 763–768.
[13] M. Parlaktuna, D. Tunaoglu, E. Sahin, and E. Ugur, "Closed-loop primitives: A method to generate and recognize reaching actions from demonstration," in ICRA. IEEE, 2012, pp. 2015–2020.
[14] P. Kormushev, S. Calinon, and D. G. Caldwell, "Imitation Learning of Positional and Force Skills Demonstrated via Kinesthetic Teaching and Haptic Input," Advanced Robotics, vol. 25, no. 5, pp. 581–603, 2011.
[15] D. Ribas, N. Palomeras, P. Ridao, M. Carreras, and A. Mallios, "Girona 500 AUV: From Survey to Intervention," Mechatronics, IEEE/ASME Transactions on, vol. 17, no. 1, pp. 46–53, Feb. 2012.
[16] N. Palomeras, S. Nagappa, D. Ribas, N. Gracias, and M. Carreras, "Vision-based localization and mapping system for AUV intervention," in OCEANS'13 MTS/IEEE, 2013.

Social and Cognitive Systems


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-107


Understanding color trends by means of non-monotone utility functions

Mohammad GHADERI a,1, Francisco J. RUIZ b and Núria AGELL a
a ESADE Business School, Ramon Llull University, Sant Cugat, Spain
b Automatic Control Department, BarcelonaTech, Vilanova i la Geltrú, Spain

Abstract. In this paper we explore the possibility of capturing color trends and understanding the rationale behind the popularity of a color. To this end, we propose using a preference disaggregation approach from the field of Multi-Criteria Decision Analysis. The main objective is to identify the criteria aggregation model that underlies the global preference of a color. We introduce a new disaggregation method, based on the well-known UTASTAR algorithm, able to represent preferences by means of non-monotonic utility functions. The method is applied to a large database of ranked colors from three different years, based on the information published on the webpage of an international creative community. Non-monotone marginal utility functions for each of the coordinates are obtained for each year. These functions capture the color preference information in an understandable way.
Keywords. Multi-Criteria Decision Analysis, disaggregation preference method, non-monotonic utility function, color trends.

Introduction

Color is one of the key features that play an important role in the purchase decisions of consumers. Fashionable colors depend on many uncontrolled factors related to the nature of the product, the target market and other environmental characteristics such as cultural, religious and even climatic variables. Color trends are ephemeral and prevail for just one season; thus, it is crucial for the industry to understand color fashion trends in order to offer the product to the market in the most efficient way. The common practice for forecasting color trends in industry is based on the opinion of field experts, which is hard to substitute with analytical models.
In this paper, we explore the option of capturing color trends and understanding the rationale behind the popularity of each color. To this end, we propose to use a preference disaggregation approach from the field of Multi-Criteria Decision Analysis (MCDA). The aim of this approach is to identify the criteria aggregation model that underlies the global preference in a multi-criteria decision problem by means of the marginal utility function of each of the attributes considered.
UTA (Utilités Additives) is one of the most representative preference disaggregation methods. In most of the fields where UTA and its extensions have been applied, the input attributes are normally expected to be monotone with respect to the preferences. The assumption of monotonicity is widely accepted as reasonable for



many criteria, such as price, risk level, security, safety, comfort, required time, etc. However, this is not the case for other attributes. For instance, whether a color is preferable or not may depend on the red/green coordinate (if we use CIELab coordinates), but this attribute is not expected to be monotone. This fact has motivated us to propose an extension of the UTA method to address non-monotone preferences. Several attempts have been made in the literature to overcome the aforementioned shortcoming. This paper contributes to the existing literature by introducing a faster and simpler method able to capture the preferential system of the DM in the form of marginal additive non-monotonic utility functions. The method is applied to a database of colors obtained from an international creative community. The resulting functions capture the color preference information in an understandable way.
The paper is organized as follows: first, the problem of color preference description is briefly introduced and the basics of color spaces are presented. Section 2 is devoted to the description of the proposed methodology. The experiment description and the results corresponding to the application of the proposed method to the color database are presented in Section 3. Finally, in the last section conclusions and future work are discussed.

1. Color coordinates and color spaces

Color preferences are the tendency of an individual or a group to prefer some colors over others. People make associations with certain colors due to their past experiences. For an individual, colors associated with good experiences are preferred and colors associated with bad experiences are disliked. For a group, color preferences can be influenced by many global factors including, among others, culture, politics, religion, economy, climate and geography. Designers and manufacturers desire to know what the "in" colors are going to be before their products can be developed. To this end, it is useful to understand how some color attributes influence color preferences.
Color attributes are normally referred to as color coordinates, and the space formed by all possible colors is denoted the color space. Several numeric specifications for color definition can be found in the literature. The most classic and internationally accepted of these are the ones based on tristimulus values or coordinates. The best known of these is RGB, proposed by the Commission Internationale de l'Eclairage (CIE) in 1931. RGB uses additive color mixing, because it describes what kind of light (red, green or blue) needs to be emitted to produce a given color. The RGB color model is implemented in different ways, depending on the capabilities of the system used. By far the most common is the 24-bit implementation, which is limited to 256 levels per channel (about 16.7 million colors). It is a convenient color model for computer graphics, but it can be unintuitive in use. The combination of red, green and blue values required to produce a particular color can be difficult for untrained people to anticipate (try selecting brown using an RGB vector).
In 1976, the CIE proposed the CIE Lab color scale as an attempt to linearize the perceptibility of color differences [6]. CIE Lab (CIELab) is the most complete color model used conventionally to describe the colors visible to the human eye. Its three parameters represent the luminance (L) of the color, its position between red and green


(a) and its position between yellow and blue (b). It is generally argued that CIELab is more intuitive than RGB and that its coordinates L, a, b are more readily and easily recognized. For this reason, in this paper we use the CIELab color coordinate representation. Figure 1 represents geometrically the two most common color representation systems, RGB and CIELab.

Figure 1. RGB (left) and CIE L*a*b* (right) coordinates.

2. The preference disaggregation methodology proposed

Most machine learning tools can automatically discover relations between attributes and preferences but, usually, they behave as a black box, in the sense that this relation is difficult to understand in a rational way. In this study we are interested in representing this relation in an understandable way. For this reason, we will use a preference disaggregation method from MCDA capable of identifying the criteria aggregation model that underlies the preferences from the analysis of the global preferences.
UTA (Utilités Additives) is one of the most representative preference disaggregation methods. It was first introduced by Jacquet-Lagrèze and Siskos as a Linear Programming (LP) model to capture the preferential system of the Decision Maker (DM) through nonlinear (piecewise linear) monotonic additive utility functions. The aim of the UTA method is to reproduce the ranking made by the DM over the set of alternatives by minimizing the level of ranking errors. Ranking errors are generally defined as the distance between the utility levels of two consecutive alternatives that are ranked incorrectly; however, the definition of the error differs slightly in each specific variant of UTA. The method leads to a simple Linear Programming (LP) model whose optimal solution can be easily obtained. Several extensions of the UTA method have been introduced in the MCDA literature since then, incorporating variations on the original algorithm and considering different forms of global preference and optimality criteria. In this paper we propose an extension of the UTA method to address non-monotone preferences.


In the following subsections, we present the most representative UTA method for ranking (UTASTAR) and the non-monotone variant proposed to address the problem of color preferences.

2.1. UTASTAR method

Suppose that there are m criteria g_1, g_2, ..., g_m to assess a preordered set of N alternatives a_1, a_2, ..., a_N (in which a_1 is the most and a_N the least preferred alternative in the ranking list), and that x_i^n is the performance of alternative a_n on criterion g_i. Given a preordering of the alternatives by the DM, the aim of the UTASTAR algorithm is to extract and represent the underlying logic behind the given ranking by estimating a set of m monotonic and additive utility functions, as consistent as possible with the preferential system of the DM.
The formulation of the UTASTAR method involves defining D_i breakpoints, and hence D_i − 1 subintervals [g_i^0, g_i^1], [g_i^1, g_i^2], ..., [g_i^{D_i−2}, g_i^{D_i−1}], on the ith criterion, in which g_i^0 and g_i^{D_i−1} are the minimum and maximum values over the ith scale, respectively. The marginal value at a breakpoint g_i^l on criterion i is expressed, in terms of the step variables w_{ij} = u_i(g_i^j) − u_i(g_i^{j−1}), as in equation (1):

u_i(g_i^l) = \sum_{j=1}^{l} w_{ij}, \qquad u_i(g_i^0) = 0,   (1)

and the marginal value for an alternative a_n whose performance on the ith scale is x_i^n ∈ [g_i^l, g_i^{l+1}] is obtained by linear interpolation between u_i(g_i^l) and u_i(g_i^{l+1}), as follows:

u_i(x_i^n) = u_i(g_i^l) + \frac{x_i^n − g_i^l}{g_i^{l+1} − g_i^l}\left[u_i(g_i^{l+1}) − u_i(g_i^l)\right].   (2)

The global utility of an alternative a_n is obtained by the sum of all of the marginal utilities, as in equation (3):

U(a_n) = \sum_{i=1}^{m} u_i(x_i^n).   (3)

The linear programming problem solved by UTASTAR is provided in (4):

\min \sum_{n=1}^{N} \left[V^+(a_n) + V^-(a_n)\right]
\text{s.t.}\;\; \Delta(a_n, a_{n+1}) \ge \delta \;\; \text{if } a_n \succ a_{n+1}, \qquad \Delta(a_n, a_{n+1}) = 0 \;\; \text{if } a_n \sim a_{n+1},
\sum_{i=1}^{m} \sum_{j=1}^{D_i-1} w_{ij} = 1, \qquad w_{ij} \ge 0, \;\; V^+(a_n) \ge 0, \;\; V^-(a_n) \ge 0,
\text{with } \Delta(a_n, a_{n+1}) = U(a_n) - U(a_{n+1}) + V^+(a_n) - V^-(a_n) - V^+(a_{n+1}) + V^-(a_{n+1}),   (4)

in which V^+(a_n) and V^−(a_n) are the underestimation and overestimation error terms. The term V^+(a_n) (respectively V^−(a_n)) is the lowest amount that must be deducted from (added to) the estimated global utility of a_n to satisfy the DM preferential order over a_n and a_{n+1}. The term δ is a parameter with a small value, and the first two constraints represent the preorder relations provided by the DM. The third constraint ensures that the relative weights of the criteria sum up to 1, and the objective function minimizes the deviation between the utility function proposed by the model and the one assumed as the tacit knowledge of the DM. By solving this model, the marginal utility function over each criterion scale is constructed based on the expression in (1).

2.2. Non-monotonic UTA-based algorithm

The input attributes in the UTASTAR method are normally expected to be monotone with respect to the preferences. However, this is not a reasonable requirement for colorimetric components. Obviously, no one can expect a monotonic relationship between the preference degree of a color and its degree of greenness, or its luminance. Therefore, a modification of the UTASTAR algorithm so that it is able to represent the preferential system of the DM by non-monotonic utility functions is of great importance in this setting. Although several attempts have been conducted in the literature to overcome the mentioned shortcoming [2],[3],[4],[5], all are computationally intensive or require extra information from the DM. The method we apply here, inspired by the UTA methodology, is fast and tractable, unlike the existing ones.
The general idea is to relax the sign constraint on the decision variables representing the difference of utility level between two consecutive breakpoints. Therefore, the marginal utility function can change its monotonicity at any breakpoint. This might lead to two problems. The first one is overfitting, in the case that the monotonicity changes many times. This is prevented by defining a small, but reasonable, number of breakpoints. The second problem is about normalization. By normalization, we mean that the minimum and maximum global utility must be equal to zero and one, respectively. The challenge is that we cannot predict where the maximum utility will be achieved in order to impose a constraint on the sum of the marginal utilities over the set of criteria. Furthermore, we do not know the attribute level corresponding to the minimum marginal utility on each criterion to set it equal to zero. To solve this problem, an iterative approach is followed. Whenever the maximum global utility is less than one, its value is forced to increase in the next iteration by adding a new constraint considering the performance level corresponding to the highest marginal utility in the current stage. The added constraint is applied just in the next iteration, and will be removed from the LP model in subsequent iterations, because it is not necessarily satisfied in the final solution. Whenever the maximum global utility is greater than one, a restrictive constraint is imposed to ensure that the global utility of the attribute levels corresponding to the highest marginal utility in the current stage will not take a value greater than one in any of the next iterations. Furthermore, to satisfy the other normalization condition, namely that the minimum global utility is zero, a penalization term is added to the objective function to penalize any violation of this assumption. The results obtained by applying this method, which is able to represent the preferential system of the DM by a set of additive non-monotonic utility functions, to the color dataset are presented in the next section.
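To illustrate the relaxation, the following sketch (ours, not the authors' implementation) evaluates piecewise-linear marginal utilities as in Eqs. (1)-(2) and the additive global utility of Eq. (3) without requiring the breakpoint values to be monotone; the breakpoint grids and utility values are made up for the example.

```python
import numpy as np

def marginal_utility(x, breakpoints, values):
    """Piecewise-linear marginal utility u_i(x) over one criterion.

    breakpoints : increasing grid g_i^0 < ... < g_i^{D_i-1} on the criterion scale.
    values      : utility at each breakpoint; monotonicity is NOT required, which is
                  exactly the relaxation used for the colour attributes.
    """
    return float(np.interp(x, breakpoints, values))   # linear interpolation, Eq. (2)

def global_utility(alternative, grids, value_tables):
    """Additive global utility U(a) = sum_i u_i(x_i), Eq. (3)."""
    return sum(marginal_utility(x, g, v)
               for x, g, v in zip(alternative, grids, value_tables))

# Illustrative (made-up) breakpoint utilities for a colour in CIELab coordinates.
grids = [np.linspace(0, 100, 6), np.linspace(-100, 100, 6), np.linspace(-100, 100, 6)]
value_tables = [[0.05, 0.02, 0.10, 0.20, 0.15, 0.08],   # L: non-monotone
                [0.00, 0.04, 0.08, 0.05, 0.10, 0.06],   # a
                [0.03, 0.07, 0.02, 0.09, 0.12, 0.04]]   # b
print(global_utility([80.2, 67.6, 9.7], grids, value_tables))
```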


3. Experiment description

The purpose of this study is to be able to represent color preference in an understandable way. This representation can be useful when a color has to be chosen for a new product. To this end, a database of color preferences has been collected and the non-monotone method presented is applied. The following subsections present the database used and the results of the experiment.

3.1. Color Database

The data collected in this research, expressing the collective color preference, were obtained from the website www.colourlovers.com. It is managed by an international creative community that focuses on color inspiration and color trends for both personal and professional creative projects. Each year since 2010, community members vote for their favorite colors created by themselves during the previous year. In this way, votes can be taken as a preference measure of each color. Colors, represented by their RGB colorimetric components, along with their number of votes, were collected for three different years: 2010, 2011, and 2012. The colorimetric components are transformed into the CIELab color space. There are no simple formulas for conversion between RGB and Lab, because RGB is device-dependent. Therefore, RGB is first transformed to a specific absolute color space using the CIE standard illuminant D50 and then it is transformed into CIELab. In order to increase the validity of our findings, we exclude colors with less than 100 votes from the dataset. Finally, for each year, colors are ranked according to their votes. In conclusion, we obtain a dataset with 114 colors, each one represented by its year, two sets of colorimetric coordinates (RGB and CIELab) and an output representing its ranking order in the specific year considered.
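The RGB-to-CIELab conversion applied to the dataset is standard; the sketch below (ours) shows the usual sRGB pipeline with a D65 white point, whereas the paper adapts to the D50 illuminant, so the exact values would differ slightly.

```python
import numpy as np

# sRGB (D65) -> XYZ matrix; the paper adapts to a D50 white point instead,
# so treat this as a simplified sketch of the conversion pipeline.
M_RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                         [0.2126729, 0.7151522, 0.0721750],
                         [0.0193339, 0.1191920, 0.9503041]])
WHITE = np.array([0.95047, 1.0, 1.08883])   # D65 reference white

def rgb_to_lab(rgb_255):
    rgb = np.asarray(rgb_255, dtype=float) / 255.0
    # Undo the sRGB gamma to obtain linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = M_RGB_TO_XYZ @ lin
    t = xyz / WHITE
    f = np.where(t > (6 / 29) ** 3, np.cbrt(t), t / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return L, a, b

print(rgb_to_lab([255, 0, 0]))   # roughly (53.2, 80.1, 67.2)
```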

3.2. Experimental results

The algorithm was run on a 64-bit OS and a 2.53 GHz Intel Core2Duo using MATLAB R2012b. The extracted marginal utility functions obtained in the CIELab color space for each of the three years are provided in Figure 2. Each row of this table of figures represents the non-monotone marginal utility functions for the attributes L, a and b. Each graph represents the influence of one of the attributes on the global color preference of the year. The obtained graphs demonstrate how the difference of attractiveness among a set of colors, as perceived by a set of users, can be represented by the color coordinates. Concretely, if color A is perceived as more popular than color B based on the collective preferences of users, the figure explains the reason. Furthermore, it provides the contribution of each color coordinate to explaining the underlying preference structure.
From Figure 2, it can be seen that the shape of the marginal utility function of the first attribute is approximately the same in all three datasets. Generally speaking, as the value of the luminance increases, the marginal utility value first decreases, then increases, and decreases again at the end. Therefore, we can conclude that the marginal utility function over the attribute L (luminance) has an S-shape, while the general shapes of the other two marginal utility functions are not very clear and differ considerably between the years.


Table 1 shows the weights of each colorimetric dimension in the CIELab color space for each year.

Figure 2. Marginal utility functions obtained in the CIELab color space (one column per coordinate: CIEL, CIEa, CIEb).

Table 1. Weights of each colorimetric dimension in the CIELab color space for each year

        CIEL      CIEa      CIEb
2010    37.00%    36.00%    27.00%
2011    44.10%    27.60%    28.30%
2012    44.50%    17.90%    37.60%

As can be seen, the L dimension is always the most important dimension in the CIELab space. It has the highest weight in all three rows, supporting the idea that the attractiveness of a color mostly depends on its luminance. In other words, luminance (L) plays the most important role in determining the extent to which a color is going to be perceived as favorable, and therefore it should be considered more carefully than the other two attributes.


To evaluate the performance of the learning algorithm, the accuracy of the results is calculated by comparing the ranking achieved by the estimated utility and the ranking given by the voters' opinion. The results are as follows.

Table 2. Accuracy of the results

          2010      2011      2012
CIELab    68.18%    50.80%    74.20%
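The paper does not specify how the two rankings are compared; one plausible reading, shown here purely as our own illustration, is the fraction of concordant pairs between the model ranking and the vote-based ranking.

```python
from itertools import combinations

def pairwise_accuracy(estimated_utility, votes):
    """Fraction of color pairs ordered the same way by the model and by the voters.

    estimated_utility : list of global utilities U(a_n) produced by the model.
    votes             : list of vote counts for the same colors.
    """
    pairs = list(combinations(range(len(votes)), 2))
    concordant = sum((estimated_utility[i] - estimated_utility[j]) * (votes[i] - votes[j]) > 0
                     for i, j in pairs)
    return concordant / len(pairs)
```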

Given the marginal utility functions, it is also possible to detect the color that provides maximum utility to the DMs. The following table provides numerical values of those colors, along with their visualization.

Table 3. Most favorable colors based on the extracted utility functions in the CIELab space

        L*      a*       b*
2010    80.2    67.6      9.7
2011    58.2    -0.6    -93.7
2012    79      -5.6    -83.1

It is clear that the three colors need not be the same, as the set of voters differs in the three years. But the interesting point is that all three colors have quite similar values of the luminance dimension, while the values of the other two dimensions differ considerably across the three rows.

4. Conclusions and future research

This paper presents a methodology, based on a non-monotonic UTA algorithm, to capture color trends and understand the rationale behind the popularity of colors. An experiment has been performed to analyze whether the presented algorithm can provide some insight into the preference structure underlying color popularity. Results show that luminance is the dimension that most affects color preference; it always dominates the other two dimensions, red/green and yellow/blue. The results also shed light on the general shape of the marginal utility function of luminance, the most important dimension, which is shown to be S-shaped.


In further research we will try to analyze the dynamics of these trends by taking into account the sequence of marginal utility functions to forecast the influence of each of the color attributes on future preferences.

Acknowledgments

This work is supported by the Spanish project SENSORIAL (TIN2010-20966-C02-02), funded by the Spanish Ministry of Education and Science. Mohammad Ghaderi has been partially supported by the SUR (Secretaria d'Universitats i Recerca) of the DEC (Departament d'Economia i Coneixement) of the Government of Catalonia (2014FI_V00637).

References

[1] Jacquet-Lagrèze, E., Siskos, Y., Assessing a set of additive utility functions for multicriteria decision making: The UTA method, European Journal of Operational Research 10, 151-164 (1982).
[2] Despotis, D. K., Zopounidis, C., Building additive utilities in the presence of non-monotonic preferences, Advances in Multicriteria Analysis, 5, 101-114 (1995).
[3] Kliegr, T., UTA-NM: Explaining Stated Preferences with Additive Non-Monotonic Utility Functions, Preference Learning (PL-09) ECML/PKDD-09 workshop (2009).
[4] Eckhardt, A., Kliegr, T., Preprocessing Algorithm for Handling Non-Monotone Attributes in the UTA method, Preference Learning: problems and applications in AI (PL-12) workshop (2012).
[5] Doumpos, M., Learning non-monotonic additive value functions for multicriteria decision making, OR Spectrum, 34, 89-106 (2012).
[6] Billmeyer, F., Principles of Color Technology, 2nd edition, John Wiley & Sons, New York (1981).


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-116

Analysis of a collaborative advisory channel for group recommendation

Jordi PASCUAL a, David CONTRERAS a,1 and Maria SALAMÓ a
a Dept. Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585-08007, Barcelona, Spain

Abstract. To date, recommendation systems have focused mainly on recommending products to individuals rather than to groups of people intending to participate in a group activity. In the last decade, with the growth of interactive activities over the internet such as e-commerce services or social virtual spaces, many recommendation scenarios have appeared that involve groups of inter-related users. Though there have been attempts to establish group recommendation, most of them focus on off-line environments. In this paper we present a novel web-based environment that supports on-line group recommendation scenarios. Specifically, we propose a Collaborative Advisory CHannel for group recommendation called gCOACH. We introduce the environment with the multiple interaction modalities developed to communicate, coordinate and persuade the group in a case-based group recommender. We demonstrate its usability through a live-user case study.
Keywords. Group Recommendation, Conversational Case-Based Recommendation, Critiquing, Interaction

1. Introduction Recommender Systems (RSs) help users search large amounts of digital contents and services by allowing them to identify the items that are likely to be more attractive and useful [15]. RSs are playing an increasingly important role in many on-line scenarios [20], for example, e-commerce services are one example of these scenarios [14]. In particular, recommender systems has the ability to provide even the most tentative shoppers with compelling and timely product suggestions. Moreover, there are many recommendation scenarios that involve groups of inter-related users such as movies, trips, restaurants, and museum exhibits. But to date, much of the research in the area of recommender systems has focused mainly on recommending products to individuals rather than groups of people intending to participate in a group activity [13]. These types of scenario have motivated recent interest in group recommendation and to date a variety of early-stage systems have been developed in domains such as recommending vacations or tours to groups of tourists [9,6] or recommending music tracks and playlists to large groups of listeners [3,8]. 1 Corresponding Author: David Contreras, Dept. Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585-08007, Barcelona, Spain; E-mail: [email protected]


The role of the group recommender system (GRS) is to make suggestions that reflect the preferences of the group as a whole, while offering reasonable and acceptable options to individual group members. In this regard, a key aspect of group recommendation concerns the way in which individual preferences are captured and stored into the user profiles. These user profiles are combined to recommend a product (or a set of products) for the group at the completion of the group session. Though there have been attempts to establish group recommendation, most of them focus on user profiles extracted from off-line environments. Taking into account the growth of activities and environments that offer group interactions (e.g., virtual spaces or ubiquitous personalization services), we consider that group recommendations in on-line environments are becoming nowadays commonplace. When the recommendations are made for a group of on-line users, new challenges and issues arise to provide compelling item suggestions as well as new interaction mechanisms to keep aware the members of the group and to facilitate communication among them. In this paper we consider a web-based environment that supports on-line group recommendation scenarios. The proposal is called gCOACH (COllaborative Advisory CHannel for group recommendation). In particular, gCOACH aims at being an on-line framework that facilitates group interaction and communication among members. For this reason, gCOACH uses a conversational case-based recommender [18], which is a form of content-based recommendation that is well suited to many product recommendations. It uses the description of the product base in terms of a complete set of features instead of using the preferences (mainly provided off-line) from other users to make a recommendation, as occurs in collaborative filtering recommenders [16]. Conversational recommenders have proven to be especially helpful for users with ill-defined needs and preferences. Notice that recommender systems can be distinguished by the type of feedback that they support; examples include value elicitation, ratings-based feedback and preference-based feedback [19]. Specifically, we use a form of user feedback called critiquing [12], where a user indicates a directional feature preference in relation to the current recommendation. It is a form of feedback that strikes a useful balance between the information content of the feedback and the level of user effort or domain expertise that is required. In summary, the main contributions of this paper are two-fold. First of all, gCOACH not only elicits users’ preferences in the form of critiques during the session but also it provides multiple interaction modalities and interface components to keep awareness of member’s decisions and to provide suggestions among them. Second, we evaluate the usability of the proposal based on a live-user study.

2. Related Work Most of the research on group recommendation investigated the core algorithms used for recommendation generation. Two different strategies have been mostly used for generating group recommendations: aggregating individual predictions into group predictions (aggregated predictions) [1] or aggregating individual models into group models (aggregated models) [17]. Differences among these strategies differ in the timing of data aggregation step. Apart from the recommendation generation, group recommender systems can be distinguished based upon the different features used in the design of the GRSs [6]. In this paper we are interested in group recommenders that work in an on-line environ-


For this reason, we pay attention to those features that heavily influence preference elicitation. Preference elicitation refers to the manner in which information is acquired from the user. For example, preferences may be acquired by asking users directly (explicit preference elicitation, e.g., [5]) or by inferring their preferences from their actions and feedback (implicit preference elicitation), as for example in [7]. Specifically, we consider that preference elicitation is also related to: (1) the way the user interacts with the GRS; (2) the type of domains that the GRS can work with; and (3) the outcome of the GRS.
First of all, considering the user interaction, most of the approaches use an off-line interaction with the users. That is, users provide some feedback and the GRS provides a recommendation without further user interaction with the system (e.g., [4]). On the other hand, in an on-line interaction, users are active and engaged in a collaborative session, as in [5,9]. Our proposal, gCOACH, follows this vein by supporting on-line group recommendation scenarios. Secondly, according to the domains that a GRS can work with, nearly all GRSs are defined for a specific domain (e.g., the tourist domain [9,5] or a recipe recommender [1]). There are a few proposals that are domain-independent [4], but they use an off-line environment for group interaction. It is worth noting that gCOACH is domain-independent too. Finally, there are usually two forms of presenting the outcome of the GRS: a single recommendation that must be a successful selection for the group (e.g., [9]) or a list of items ordered according to the preferences of the group members (e.g., [13]). Some GRSs generate a different outcome; for example, [5], instead of returning recommendations, returns a list containing the group preferences. Our proposal uses a conversational case-based recommender which aims at guiding the user over the search space by using critiquing as feedback. For this reason, it is more convenient to use a single recommendation instead of an ordered list of products. However, it is remarkable that gCOACH defines two areas for keeping lists of recommendations. The first one stores the preferred products of the group and is called the stack area. The second area receives and stores suggestions in a proactive way. These suggestions may come from the GRS itself or from other group members.

3. gCOACH: group-based COllaborative Advisory CHannel

This section presents the gCOACH framework, which supports on-line group recommendation scenarios. This framework allows several users to participate in a group activity that involves searching for a product for the whole group.

3.1. Conceptual Architecture of gCOACH

The conceptual architecture of gCOACH is based on a web-based environment developed on a client-server model, enabling the interaction of the users from anywhere. Figure 1 shows the conceptual architecture of our proposal, divided into three main layers: a Space Client, a Space Server, and a Group Conversational Recommender (GCR) module.
Firstly, the Space Client is a space that offers interaction, collaboration, and awareness among users. This space allows the concurrent connection of one or more users. They can interact with a recommendation object, with a stack object, with a suggestion box object,


Figure 1. Conceptual Architecture of gCOACH Framework

or with an awareness object. A Recommendation Object (RO) is an object that represents a case in the interface. It contains product information and the description of all features of the current recommendation, interactive elements to perform critiques, and collaborative elements to perform collaboration and communication among users (e.g., sending a suggestion to another user). A Stack Object (SO) is an object that contains all the products that are of interest for the group. It is common for the group and it is updated each time a user operates over it. The Suggestion Box (SB) object contains a list of products that have been suggested to the user by anyone of the members or by the recommender itself. The Awareness Object (AO) includes a list of users with the information of the current product view of that particular user. Secondly, Space Server is responsible for the communication between user and GCR module. Basically, it has three components. The first one is the Communication Management Module which maps user’s events to recommendation’s actions and, in reverse, recommendation actions to user events. Second component is the Users Management Module which stores and manages users’ information such as user identification and states (critiquing or sending suggestions), and the RO the user is interacting with. Third component is the Content Management Module which is responsible for updating the RO visualization (i.e. product image and features values) in each recommendation cycle. Finally, Group Conversational Recommender module is composed by the Group Conversational Recommender (GCR) Algorithm, a case base (CB), a set of individual user models (IM) (i.e. one for each user in the environment), and a group preference model(GM). The case base CB contains a set of products or cases for recommendation. Each product is described with a set of a features (F) (e.g., price, duration, or location). The set of critiques applied by the user constitutes the individual user model IM of that particular user, where each element in IM is a single unit critique. A group preference model is also maintained by combining the individual user models and associating critiques with the users who contributed them. GCR algorithm is a module that performs the recommendation process, controls data access (massive products with diversity of features), and updates user models during the recommendation process. Briefly, the recommendation process starts when the user is inside of a Space Client and she performs one critique about a product feature through a Critique Element displayed on a Recommendation Object, this critique is sent to the Group Conversational Recommender (GCR) through the Communication Channel placed on the Space Server.


Once received the critique in GCR algorithm, it updates the individual user model (IM) and the group preference model (GM) with the new preference and it selects the next recommendation from the full set of products in the case base, CB. The recommendation is generated based on the individual model of the user and the group preference model as described in [10,11]. Next, using both the communication management and content management modules, Group Conversational Recommender module sends a new recommendation to be displayed on the Recommendation Object of the user that performed the critique. In addition, users can communicate and provide a suggestion by using a Collaborative Element. The interaction modalities for communicating and allowing suggestion among users are deeply described in next section. 3.2. Interaction modalities in gCOACH Figure 2 depicts the interface screen shown to each user in their browser. The example interface shown is focused on a skiing package domain. It is divided in several areas, each one with a specific interaction modality (i.e. individual or social) and functionality. Note first that each one of the users is represented by a color (look at the top of the figure). In the example the user is the red one. This color is used in the interface to denote to who belongs each one of the objects in the interface (i.e., RO, SO, SB, AO).

Figure 2. The main gCOACH interface with a skiing package view

The first area, denoted with number one in Figure 2, represents the RO in the conceptual architecture shown in Figure 1. It is devoted to individual interaction, which has three different views: category view, subcategory view, and product view. In gCOACH, the user interaction starts at a category view, which focuses the user on a specific category of products (in our case, we use the location in the domain used in the experiments, see Figure 3a). Once a category is selected, users are moved to a subcategory that, in the domain analyzed, corresponds to a resort view (see Figure 3b), where they can see all the available hotels. If the user selects one hotel, they arrive at a specific skiing package, as depicted in Figure 2.

Figure 3. Due to space limitations, we only show the individual interaction of the two views: (a) gCOACH category view; (b) gCOACH subcategory view.

Here, in the product view, a product is described in terms of its features and the particular value for each one of them. Additionally, each one of the features contains one or two buttons for performing critiques (i.e. these are the critique elements in Figure 1). The user is able to make a critique (i.e. express a preference over a specific feature in line with their personal requirements, e.g., cheaper or a higher star rating for a hotel), which affords the user an opportunity to provide informative feedback. This feedback is introduced into the individual model (IM) and the group preference model (GM). Next, the GCR algorithm uses this informative feedback about the user's taste and answers this action by replacing the displayed product with a new recommendation that better matches the preference expressed. Currently, the GCR algorithm implementation is the one proposed in [10,11]. Furthermore, when individual users arrive at a particular product recommendation that satisfies their requirements and wish to draw it to the attention of the other group members, they can do this by performing a drag and drop and adding it to the suggestion box of another user or to a stack area, which is a social interaction modality.
The second area in the gCOACH interface is for keeping awareness among the members of the group (see AO in the conceptual architecture and number 2 in Figure 2). This area contains a list of color boxes, each one representing a group member. Each color box shows which product each user is currently looking at. Users can browse this product by performing a click on it. Besides, the color box contains a 0-5 hearts ranking that represents how compatible the current product is with each one of the users. The ultimate goal of this heart ranking is to know which users may be interested in the current recommendation and, if considered interesting by the user, suggest it to them. With this goal in mind, this area also allows the user to make suggestions of the current recommendation in area 1 to a specific user by doing a drag and drop of the product onto the target user box. Then, the suggested product will appear in the suggestion box of the target user. These collaborative actions correspond to the collaborative elements in the conceptual architecture.
The third area is the suggestion box, depicted in Figure 2, number 4, which is represented as SB in the conceptual architecture shown in Figure 1. These suggestions may come from any of the group members or as a result of a proactive suggestion of the recommender algorithm. The GCR algorithm suggests a product to the whole group when

122

J. Pascual et al. / Analysis of a Collaborative Advisory Channel for Group Recommendation

one or more cases exceeds a certain critical compatibility threshold with respect to the group preference model. As the previous area, each product is identified by the border color and shows few features including a compatibility ranking with the current user. Furthermore, it is also available the option of clicking on the product to take a look on it in the area of individual interaction. Another social interaction modality is the STACK area (shown in Figure 2, number 5), which represents the SO in the conceptual architecture. It serves as repository of particular holiday recommendations the user is interested in and it is also useful to draw the attention of the other group members over a particular product. The stack stores summaries of the user’s recommendations, as well as displaying compatibility information relating to group compatibility. Each product recommendation appears boxed with the color of the user that added it to the list and shows a summary of its features (in the skiing domain, hotel name, resort name, kind of resort, number of stars, and price). In addition of this content, a measure of compatibility between the current user and the item appears through a visual ranking of 0-5 hearts (0 hearts means a non suitable product for the current user and 5 heart means a perfect item to choose). At any time, when users detect an attractive product in this area, they can open it in the individual interaction area to examine it by performing a simple click on it. Finally, there is a waste basket area (see Figure 2, number 3). It is used to the disposal of products that the user is not interested in anymore from the suggestions box area or the STACK area . This functionality is activated when the user perform a drag and drop from one of these areas to the waste basket area.

4. Evaluating the Usability in gCOACH

In this section we evaluate the usability of gCOACH with real users. Given the high fidelity of our prototype, we used a summative usability evaluation method, which focuses on gathering both qualitative and quantitative data [2].

4.1. Setup and methodology

We recruited 20 participants, diverse in terms of age, gender, computer skills and experience with web-based environments. The participants were grouped into heterogeneous groups of four for each test. The test was performed using a SKI domain that contains 153 European skiing holidays described by 41 features related to the resort (e.g., country or transfer time) and the accommodation (e.g., rating, price, or restaurant facilities). The test was conducted by a moderator and an observer. Users were asked to join a group using an initial web page of the interface and then performed a search task for their favorite ski vacation. The test protocol consisted of four stages:
1. Pre-test interview: In this stage the moderator welcomed the user, briefly explained the test objectives and asked about their previous experience with ski vacations and recommender systems.
2. Training: During this stage users navigated freely in the web interface of gCOACH. They were asked to locate a predefined product using individual or social interaction modalities. The training stage finished when users found this product.


3. Test: In this phase users performed, without guidance, a test task that consisted of selecting the product that best satisfies the group preferences for going skiing together. To this end, users were asked to navigate, communicate, and provide suggestions with the aim of reaching a consensus in the group about which product to purchase. However, users were free to finish the search process once they had found a product that best satisfied their preferences. Among the products in the stack, each user selects their preferred one. When all users have finished the search process, GRC recommends, among the group's preferred products (one per user), the one that best satisfies the group. This product is shown as the final product for the group. During the task, a computer recorded the test session and the observer made annotations.
4. Post-test questionnaire: Users were asked to fill out a web form containing a satisfaction questionnaire of 10 questions (see Table 1), answered on a five-point Likert scale, where 1 corresponds to "strongly disagree" and 5 to "strongly agree".
After the test, we analyzed the post-test questionnaire to extract relevant data concerning usability. The next section describes this analysis.

4.2. Analysis of usability

To collect user satisfaction measures we designed a post-test questionnaire, shown in Table 1. Figure 4 presents the collected data using a bar chart and a pie chart.

Table 1. Post-test questionnaire

Question Number                        Statement
Q1 (Effectiveness)                     The items recommended to me matched my interests.
Q2 (Learnability)                      The interface provides an adequate way for me to express my preferences.
Q3 (Learnability)                      The recommender's interface provides sufficient information.
Q4 (Learnability)                      I became familiar with the recommender system very quickly.
Q5 (Usefulness)                        It is easy for me to inform the system if I dislike/like the recommended item.
Q6 (Usefulness)                        The recommended items effectively helped me find the ideal product.
Q7 (Satisfaction)                      The recommender made me more confident about my selection/decision.
Q8 (Satisfaction)                      The interface let me know easily where are the rest of my teammates at all times.
Q9 (Satisfaction)                      Overall, I am satisfied with the recommender.
Q10 (Intention to use in the future)   I will use this system for buying products in the future.

Figure 4a depicts the results obtained from the post-test questionnaire. Note that these results reflect the subjective perception of users, but they are quantitative data that give us valuable information about users' perception of the usefulness and usability of our gCOACH framework. Overall, the quantitative results obtained from the questionnaire are very satisfactory. It is worth noting that 85.3% of the responses were rated with 3 or more points, 2.1% of the responses received the minimum score (value 1), and 12.6% received a value of 2. Considering the learnability of gCOACH (i.e. questions Q2 to Q4), 80% of the participants' responses show that users found the system easy to learn, rating this aspect with 3 or more points. With regard to user satisfaction and intention to use the system in the future, the results of Q7 to Q10 show that 85% of the participants evaluated this aspect positively with 3 or more points.


Figure 4. Usability analysis of the gCOACH interface: (a) grouped ratings for each question in the post-questionnaire; (b) area of most attention in the interface.

Regarding whether users consider gCOACH useful (i.e. the usefulness aspect), the responses to Q5 and Q6 show that 93% of the participants rated these questions with 3 or more points. Finally, with regard to the effectiveness perceived by users during the recommendation process, the results of Q1 show that 95% of the participants evaluated this aspect positively (i.e. 3 or more points). We also asked users which area of the interface they paid most attention to for the recommendations received: the stack area, the suggestion box area, the awareness area, or no preferred area. We report the results on the users' perception of the most useful area in Figure 4b. 45% of the users prefer the awareness area as their main source of information about members' activities and for choosing a product. This means that group interaction is strongly influenced by observation: users prefer to observe which products their teammates are consulting and then select one of them. The next most preferred area, with 30%, is the suggestion box area. Note that the suggestions received in this area come from teammates or are proposed proactively by the recommender algorithm. 15% of the users prefer the stack area as their main source of recommendations. Finally, 10% of the users have no clear preference over an area because they looked at all of them equally.
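As a rough illustration of how such aggregate figures can be derived from the raw questionnaire answers, the following minimal sketch (the response matrix and variable names are hypothetical, not the study data) tabulates the share of answers at or above 3 points overall and for a group of questions.

import numpy as np

# Hypothetical 20 participants x 10 questions matrix of Likert scores (1-5).
responses = np.random.randint(1, 6, size=(20, 10))

total = responses.size
print("score >= 3:", 100 * (responses >= 3).sum() / total)
print("score == 1:", 100 * (responses == 1).sum() / total)
print("score == 2:", 100 * (responses == 2).sum() / total)

def positive_share(cols):
    # Percentage of positive responses (>= 3) for a subset of questions,
    # e.g. learnability corresponds to Q2-Q4 (columns 1-3).
    block = responses[:, cols]
    return 100 * (block >= 3).sum() / block.size

print("learnability (Q2-Q4):", positive_share([1, 2, 3]))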

5. Conclusions

This paper presented the gCOACH framework, which supports on-line group recommendation scenarios. We have described the conceptual architecture of the system and explained the interaction modalities developed to communicate, coordinate and persuade group participants. The usability of this novel interface has been evaluated with live users. The results show that 83.5% of the participants responded positively to the various social interaction modalities. The results also show that users mostly prefer to be aware of the products that other group members are viewing, and that this awareness is their main influence when choosing products. Considering all the interaction modalities, as future work we plan to use the behavior implicitly detected in the gCOACH interface to define roles (i.e. leader, collaborator, or follower) among the teammates and to let these roles influence the group recommendation algorithm.


Acknowledgments

D. Contreras is supported by a doctoral fellowship "Becas Chile". This research has also received support from the projects TIN2012-38603-C02, CSD2007-0022, TIN2011-24220 and TIN2012-38876-C02-02 funded by the Spanish Ministry of Science and Innovation.

References
[1] S. Berkovsky and J. Freyne. Group-based recipe recommendations: analysis of data aggregation strategies. In Proc. of the fourth ACM conference on Recommender systems, pages 111-118. ACM, 2010.
[2] D. Bowman, J. Gabbard, and D. Hix. A survey of usability evaluation in virtual environments: Classification and comparison of methods. Presence: Teleoperators and Virtual Environments, 11(4):404-424, 2002.
[3] A. Crossen, J. Budzik, and K. J. Hammond. Flytrap: Intelligent group music recommendation. In Proc. of the Int. Conf. on Intelligent User Interfaces, pages 184-185, USA, 2002. ACM.
[4] I. Garcia, S. Pajares, L. Sebastia, and E. Onaindia. Preference elicitation techniques for group recommender systems. Inf. Sci., 189:155-175, Apr. 2012.
[5] A. Jameson. More than the sum of its members: Challenges for group recommender systems. In Proc. of the International Working Conference on Advanced Visual Interfaces, pages 48-54, Italy, 2004.
[6] A. Jameson and B. Smyth. Recommendation to groups. In P. Brusilovsky, A. Kobsa, and W. Nejdl, editors, The Adaptive Web, pages 596-627. Springer-Verlag, Berlin, Heidelberg, 2007.
[7] H. Lieberman, N. V. Dyke, and A. Vivacqua. Let's browse: A collaborative web browsing agent. In Proc. of the International Conference on Intelligent User Interfaces, page 65, New York, 1999.
[8] J. McCarthy and T. Anagnost. MusicFX: An arbiter of group preferences for computer supported collaborative workouts. In Proc. of Conf. on Computer Supported Cooperative Work, pages 363-372, 1998.
[9] K. McCarthy, L. McGinty, B. Smyth, and M. Salamó. The needs of the many: A case-based group recommender system. In Proc. of the European Conf. on Case Based Reasoning, pages 196-210. Springer Verlag, 2006.
[10] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth, and P. Nixon. CATS: A synchronous approach to collaborative group recommendation. In Proceedings of the FLAIRS 2006 Conference, pages 1-16, Florida, USA, 2006. Springer Verlag.
[11] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth, and P. Nixon. Group recommender systems: A critiquing-based approach. In Proc. of the Int. Conf. on Intelligent User Interfaces, pages 267-269, Australia, 2006.
[12] L. McGinty and J. Reilly. On the evolution of critiquing recommenders. In Recommender Systems Handbook, pages 419-453. Springer, 2011.
[13] M. O'Connor, D. Cosley, J. Konstan, and J. Riedl. PolyLens: A recommender system for groups of users. In Proc. of European Conf. on Computer-Supported Cooperative Work, pages 199-218, 2001.
[14] F. Resatsch, S. Karpischek, U. Sandner, and S. Hamacher. Mobile sales assistant: NFC for retailers. In Proc. of Int. Conf. on HCI with Mobile Devices and Services, pages 313-316. ACM, 2007.
[15] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.
[16] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative filtering recommender systems. In The Adaptive Web, pages 291-324. Springer-Verlag, 2007.
[17] C. Senot, D. Kostadinov, M. Bouzid, J. Picault, A. Aghasaryan, and C. Bernier. Analysis of strategies for building group profiles. In User Modeling, Adaptation, and Personalization, Lecture Notes in Computer Science, pages 40-51. Springer Berlin / Heidelberg, 2010.
[18] B. Smyth. Case-based recommendation. In The Adaptive Web, pages 342-376. Springer-Verlag, 2007.
[19] B. Smyth and L. McGinty. An analysis of feedback strategies in conversational recommender systems. In P. Cunningham, editor, Proceedings of the 14th National Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, 2003.
[20] C.-Y. Wang, Y.-H. Wu, and S.-C. Chou. Toward a ubiquitous personalized daily-life activity recommendation service with contextual information: a services science perspective. Information Systems and E-Business Management, 8:13-32, 2010.


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-126

Discovery of Spatio-Temporal Patterns from Location Based Social Networks

Javier Bejar a,1, Sergio Alvarez a, Dario Garcia a, Ignasi Gomez a, Luis Oliva a, Arturo Tejeda a, Javier Vazquez-Salceda a
a Dep. de Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya

Abstract. Location Based Social Networks (LBSN) have become an interesting source for mining user behavior. These networks (e.g. Twitter, Instagram or Foursquare) collect spatio-temporal data from users in such a way that they can be seen as a set of collective, distributed sensors over a geographical area. Processing this information in different ways can yield patterns useful for several application domains. These patterns include simple or complex user visits to places in a city, or groups of users that can be described by a common behavior. The application domains range from the recommendation of points of interest and route planning for touristic recommender systems to city analysis and planning. This paper presents the analysis of data collected over several months from such LBSN inside the geographical area of two large cities. The goal is to obtain, by means of unsupervised data mining methods, sets of patterns that describe groups of users in terms of routes, mobility patterns and behavior profiles that can be useful for city analysis and mobility decisions.
Keywords. Spatio-Temporal Data, Data Mining, Clustering, Frequent Itemsets

1. Introduction

Location Based Social Networks [14], like Twitter or Instagram, are an interesting source for user geospatial behavior analysis. The data that these applications provide include users' spatio-temporal information that can be used to uncover complex behaviors and patterns, including frequent routes, points of interest, group profiles or unusual events. This information is important for applications such as city management and planning or recommender systems. The goal of this paper is to analyze these spatio-temporal data using unsupervised data mining techniques in order to find out what patterns arise from user behavior in large cities.
1 Corresponding Author: Javier Béjar, Universitat Politecnica de Catalunya (Barcelona Tech); E-mail: [email protected].


The data used in the experiments was obtained from the Twitter and Instagram social networks in the geographical area that surrounds the cities of Barcelona and Milan. The expectation is that these unsupervised techniques can be applied to any city (or general area) to discover useful patterns that can be used in applications that need to reason and make decisions based on the spatio-temporal activities of citizens. This study is part of the European ICT project SUPERHUB, which has among its goals the integration of different sources of information to support decision making oriented to improving urban mobility. This project is part of the EU initiative towards the development of smart city technologies. Two of the key points of the project are to use the citizens as a network of distributed sensors that gather information about city mobility conditions and to generate mobility profiles for these users. This information will be used to implement route planning and mobility recommendation systems. The plan of the paper is as follows: Section 2 reviews other approaches to discovering patterns from spatio-temporal data in general and from LBSN in particular. Section 3 describes the characteristics of the data and the transformations applied to obtain datasets suitable for the unsupervised data mining algorithms. Section 4 explains the approach, based on frequent itemset algorithms, used to discover frequent patterns as an approximation to the common regions of interest of the users and their connections in the city. Section 5 addresses the problem of finding user profiles and groups that show interesting collective behaviors using clustering algorithms. Finally, Section 6 presents some conclusions about the results of the different techniques along with possible extensions of this work.

2. Related work

Since the wide availability of devices capable of transmitting the location of users (mobile phones, GPS devices, tablets, laptops), there has been an interest in studying user mobility patterns inside a geographical area. These data are available from different sources, from GPS traces extracted from these devices to internet sites where the users voluntarily share their location. From these data, different information can be extracted depending on the goal. One important application is the generation of visualizations, so the patterns in the data can be easily interpreted by experts in the specific domain of analysis, for example, city officials studying citizen mobility and traffic distribution. In this line of work, [1,2] describe different methods to obtain visualizations of clusters of GPS trajectories extracted from the movements of cars in the city of Milan. In [10], different techniques based on Kernel Density Estimation are employed to detect and visualize hot spots for the domains of epidemiology and criminology. Other applications include user routine mining and prediction. The idea is either to recognize user activities from the repeating temporal behavior of individuals or groups, or to recommend activities to users according to past behavior or user context. In [4], data gathered from the mobile phones of MIT students and faculty was used to predict user routines and social networks using hidden Markov models and Gaussian mixtures. In [5], the same dataset was used for user and group routine prediction, user profiling and change discovery in user routines.


The methodology used a text mining analogy, considering the individual activities as words and the sequences of activities as documents, transforming the user activities into a bag-of-words representation and grouping them using Latent Dirichlet Allocation. In [13], data collected from GPS trajectories over an extended period of time inside a city was used. From these trajectories, a set of special points, defined as points where a user stays in a bounded region for a short period of time, were extracted. These points were considered points of interest for the user. This information and the categories of the places inside the region surrounding the points of interest were used for a touristic recommender system. An alternative to the information collected by GPS-enabled devices are the Location Based Social Networks. The main issue with GPS data is that it is difficult to obtain continuously for a large number of users. These social networks make it possible to sample information from more users, at the cost of a much lower sample rate. In [8], the text mining analogy is used to analyze data from Foursquare. Only information relative to the category of the check-in places was used, and all the check-ins of a user were put together to represent their global activity. Latent Dirichlet Allocation was then applied to obtain clusters described by sets of salient activities. These sets of activities made it possible to characterize the different groups of persons in a city as a first step towards extracting user profiles to be used for different applications. In [9], the BrightKite social network (similar to Foursquare) was analyzed to obtain geographical profiles of the users and to measure the correlation between their activities and their geographical locations. In [11], data from Twitter was used to predict user activity. From the Twitter events, only the ones corresponding to Foursquare check-ins were extracted. Different clusterings were obtained using spatial location, time of the day and venue type as characteristics. These clusters were used as characteristics for activity prediction and recognition. The venue types were transformed to a set of predefined activities (lunch, work, nightlife, ...) and the prototypes of the clusters were used to obtain activity predictions for new data.

3. The dataset

The goal of this work is to show the feasibility of extracting meaningful patterns from LBSN, so our interest was focused on the most popular of these kinds of social networks, specifically Twitter, Foursquare and Instagram. We also wanted to test whether it is possible to perform the analysis at minimal monetary cost, so the data was obtained from the public feeds of these networks. This makes it possible to collect the data to analyze for free. Initially, the Foursquare network was discarded from the analysis because it is not possible to obtain an identifier for the users from its public feed, so the actual datasets only include Twitter and Instagram data. The results to be expected from the data of the public feeds will not be as good as those obtained by analyzing all the data that can be gathered from these social networks, but that would involve, apart from the monetary cost, applying scalability techniques to the analysis, and that was not the intention at this stage of the work. All the events from these networks include, among other user information, a spatio-temporal stamp represented by the latitude, longitude and time of the event.


A unique user identifier is also provided, which makes it possible to relate all the events of a user. This information is enough to obtain a low rate sampling of the behavior of a large number of users in a geographical area. The data obtained from Twitter and Instagram was filtered to contain all the events inside approximately a 30 x 30 km2 area around the cities. That includes all the populated area of the city and other surrounding cities. For the city of Barcelona, the dataset was collected during seven months (October 2013 to April 2014); the number of events extracted from Twitter and Instagram is around a million and a half each. For the city of Milan, the dataset was collected during two months (March 2014 to April 2014); the number of events in the datasets is around three hundred thousand for Twitter and one hundred thousand for Instagram. Despite the large number of events, the data is actually very sparse from the user-event perspective. Both datasets present similar user-event distributions, where around 50% of the users only generate one event on a given day and 40% of the users only generate one event during the whole collected period. It is difficult to extract patterns of user behavior directly from the raw events, so the data first has to be transformed into a more useful representation. There are different ways of obtaining such a transformation. It was considered that the behavior of a user during a day may contain useful information for the application domains, so it was decided to generate transactions from the users' daily events. The attributes of such transactions are the presence of an event in a geographical position (latitude, longitude) during a specific range of hours of the day. Different discretizations of the geographical area and of time make it possible to extract patterns at different resolutions and complexities. From these transactions, two different unsupervised data mining methods will be used to obtain patterns. First, the extraction, using a frequent itemset approach, of frequent related events that can be interpreted as connected places or frequent routes. Second, the extraction of user group profiles by clustering the users by their general daily behavior.
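As an illustration of this transformation, the following minimal sketch (the column names, bounding box and grid parameters are hypothetical, not the paper's exact setup) builds one transaction per user and day, where each item encodes a grid cell and a time slot.

import pandas as pd

# Hypothetical raw events: one row per LBSN event.
events = pd.DataFrame({
    "user": ["u1", "u1", "u2"],
    "timestamp": pd.to_datetime(["2014-03-01 09:10", "2014-03-01 19:30", "2014-03-01 10:00"]),
    "lat": [41.40, 41.38, 41.39],
    "lon": [2.17, 2.15, 2.18],
})

# Assumed bounding box of the studied area and grid resolution (e.g. 150x150 cells).
LAT_MIN, LAT_MAX, LON_MIN, LON_MAX, CELLS = 41.2, 41.6, 1.9, 2.4, 150

def to_item(row):
    # Discretize space into a regular grid and time into 8-hour slots.
    r = int((row.lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * CELLS)
    c = int((row.lon - LON_MIN) / (LON_MAX - LON_MIN) * CELLS)
    slot = row.timestamp.hour // 8
    return f"cell_{r}_{c}@slot{slot}"

events["item"] = events.apply(to_item, axis=1)
events["day"] = events.timestamp.dt.date

# One transaction per (user, day): the set of visited (cell, slot) items.
transactions = (events.groupby(["user", "day"])["item"]
                      .apply(lambda s: sorted(set(s))).tolist())
print(transactions)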

4. Frequent route discovery

Association rule algorithms can discover frequent associations among elements in a transaction database. This paradigm can also be used to discover groups of spatio-temporal points that are frequently visited by people at similar times. This information can be useful, for instance, to discover places that are visited frequently by people with a certain profile (e.g. tourists), places in the city that need to be connected by public transportation, traffic bottlenecks that are connected in time, or travel patterns of citizens during the day (e.g. home to work). A preliminary step before applying association rule algorithms to this data is to decide what a transaction is in this domain. A possible answer is to consider as an item a place that a person has visited at a specific time (or period) of the day, and as a transaction the collection of all these events during a day. This is similar to the transactions considered in market basket analysis. To reduce the possible number of events, time and space can be discretized. This has two beneficial effects: first, it reduces the computational cost and, second, it allows patterns to emerge.


The approach used is to divide the geographical area into a regular grid and to divide time into groups of hours. Other, more costly discretization processes could be applied, but at this step of the process they have been discarded. Usually, not all places in a geographical area can be accessed, so the actual number of places where a user can be is much smaller than the total number of cells in the grid. For example, for the data from Barcelona, less than half of the grid cells have at least one event. The use of a grid also makes it possible to study the events at different levels of granularity, from specific places (e.g. points of interest) to meaningful regions (e.g. city districts). Another possibility would be to use a density-based clustering algorithm to extract interesting areas from the raw data. The main problem with this approach is the sparseness and irregular distribution of the events. Usually, the clusters obtained this way are concentrated in the areas with more events, discarding other less dense areas that could also be of interest. The algorithm used for extracting frequent patterns from the daily user events is the FP-growth algorithm [7]. This scalable algorithm uses a specialized data structure called the FP tree, which makes it possible to obtain frequent itemsets without generating candidates. To reduce the number of patterns obtained for further analysis, only the maximal itemsets are presented, these being the largest itemsets whose support exceeds the minimum support threshold. Analyzing the data, the patterns obtained from Twitter and Instagram show different perspectives on the city. The most frequent Instagram events are generally related to tourist activity, so many of the frequently related areas show connections among touristic points of interest. For example, for the city of Barcelona, the Sagrada Familia, Plaça Catalunya, Güell Park, Casa Batlló and La Pedrera appear among the most connected points of interest; the rest are concentrated around the center of the city. Using different time and space resolutions, as well as different support thresholds, makes it possible to extract more general or more specific patterns. For example, for the Instagram data from Barcelona, using a 150x150 grid, dividing the day into three ranges of hours and using a minimum support of 20 results in 688 itemsets, mostly of length 2; increasing the support to 60 reduces the number of itemsets to 84. Reducing the granularity of the grid increases the number of extracted itemsets, and increasing it has the opposite effect. Patterns from Twitter events are more diverse and also show behaviors of the actual dwellers of the city, which makes this data more interesting from the city planning perspective. This shows in the diversity of places that appear connected, which include touristic points of interest but also other places distributed all over the city. As an example of patterns for Barcelona, some of them show people from the nearby cities that tweet at the beginning or the end of the day from home and also when arriving at their place of work inside the city, or people that traverse the city from their home outside Barcelona to their workplace in another city near Barcelona. A surprising pattern is the connection between the areas of the city where large business events and conventions are usually hosted and some areas that concentrate the hotels where these people stay, like the Fira de Barcelona (convention center) and Diagonal Mar (Marriott, Princess and Hilton hotels), which are at opposite sides of the city.
Public transportation behavior also appears, for example in patterns that directly connect train stations outside the city with stations inside the city.
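The following sketch shows how this mining step could be reproduced with the FP-growth implementation of the mlxtend package; it is a minimal, self-contained example with a toy transaction list, and the absolute support used in the paper is converted to the relative threshold the library expects.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Toy daily transactions of (cell, slot) items, as built in Section 3.
transactions = [["cell_10_20@slot1", "cell_11_21@slot2"],
                ["cell_10_20@slot1", "cell_11_21@slot2"],
                ["cell_10_20@slot1", "cell_30_40@slot0"]]

# One-hot encode the transactions.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# mlxtend expects a relative support; an absolute support of 2 over |D| transactions.
min_support = 2 / len(transactions)
frequent = fpgrowth(onehot, min_support=min_support, use_colnames=True)

# Keep only the maximal itemsets: those with no frequent proper superset.
itemsets = list(frequent["itemsets"])
maximal = frequent[[not any(s < t for t in itemsets) for s in itemsets]]
print(maximal)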


Figure 1. Frequent connected areas from Barcelona Twitter data (150×150 Grid, groups of 8 hours, support=40)

The data from Milan covers a shorter period, but some patterns already emerge. For example, from the Twitter data, patterns appear connecting Duomo square and several of the surrounding churches and palaces, and also patterns connecting some of the main train stations and surrounding cities with the center of Milan. Reducing the support, more specific connections appear, like patterns inside the university city campus and its surroundings. In order to facilitate the interpretation of the results, the frequent itemsets are represented as a layer over a map of the city. A representation of some of these itemsets for Barcelona is presented in Figure 1.

5. Group behavior from clustering

A complement to the frequent pattern analysis is to discover different profiles from the geospatial behavior of the citizens. For this analysis, instead of the user information from a single day, it was considered that the behavior of a user during the whole study period would be more informative. To compute this information, it was decided to summarize all the daily information of the users, collecting all the different places where they have been and when. The same discretization scheme used before was applied to reduce the possible events for the users.


To obtain the summary of each user's behavior, it was decided to use the text mining analogy already used in related work. A feature vector was generated using different techniques based on the vector space model/bag of words and TF-IDF attributes [12]. There are different term/event attributes that can be computed. The task at hand being exploratory, three different possibilities were evaluated, namely:
1. Absolute term frequency, computed as the number of times the user has been in a place at a specific time.
2. Normalized term frequency, computed as the number of times the user has been in a place at a specific time, normalized by the total number of places the user has been.
3. Binary term frequency, computed as 0 or 1, depending on whether the user has been in a place at a specific time.
To include in the representation the importance of the places in the city with respect to the global number of visits they receive, the inverse document frequency (IDF) was also computed for all the different places/times in the dataset. This yields six different representations of the citizen data: one for each type of term/event attribute and one for each respective IDF normalization. Different clustering algorithms can be applied to extract group profiles. Due to the nature of the data (geographically distributed), our intuition was that exemplar-based or grid-based clustering algorithms would be more successful in finding meaningful clusters than prototype- or model-based clustering algorithms. To test this intuition, we experimented with two clustering algorithms, K-means [3] and affinity propagation clustering [6]. The first one is based on finding spherical clusters around a prototype. The main issue with this method is finding the correct number of clusters. This task is harder in our case because the K-means assumption of spherical clusters is probably wrong for most of the clusters in the data, so the usual quality indices employed to decide the number of clusters are not very useful. The second algorithm is based on belief propagation and is exemplar based, being able to find irregularly shaped clusters. It also has the advantage of being able to decide the number of clusters that best fits the data. In order to enhance the quality of the dataset, we filtered out users without a minimum number of distinct events (place/time). This makes it possible to extract more meaningful profiles at the cost of reducing the actual number of users. In the experiments, people that had visited fewer than 20 place-times were discarded, reducing the number of examples to a few thousand users. Also, from the obtained clusterings, all the clusters with fewer than 20 users were discarded as non-informative.


The experiments with the different representations for all the datasets show that the absolute term frequency and the normalized term frequency (with and without IDF normalization) do not allow a good representation of group behavior. The K-means algorithm produces, for all four combinations, only one or two large clusters that include most of the examples, together with a small number of very small clusters (around 30-40 users each). Affinity propagation does not result in much better clusterings, also yielding only one or two very large clusters and a small number of smaller ones, although only about a third of the instances concentrate in the larger clusters. The exception for this algorithm is the normalized term frequency representation without IDF normalization. In this case a larger set (6 to 10) of more size-balanced clusters is obtained, with some hundreds of examples each, and some additional smaller clusters. The binary term frequency (with and without IDF normalization) gives slightly better results for K-means, resulting in a larger number of clusters, ranging from 5 to 25 depending on the dataset. The distribution of the examples in clusters consists of a large cluster, a few medium-sized ones and a few small ones. For affinity propagation, the number of clusters is much larger and the sizes are more evenly distributed, this being the best result. Without the IDF normalization, the number of clusters ranges from 50 to 100 depending on the dataset, with cluster sizes from 25 to 100 users. This result seems more reasonable, because it is more plausible that several thousands of users picked at random will show a large variety of group behaviors. Normalizing by IDF, the number of clusters is reduced to a range between 10 and 30 clusters that group larger numbers of users, representing more general groups. In order to facilitate the interpretation of the clusters, a prototype, represented over a map, is computed as the absolute frequency of the visits of the users to the different points of the grid, without considering the time slot of the day. Also, for a more complex interpretation, a representation that includes the frequency of visits broken down by time slot is computed. This representation makes it possible to see geographical behavior associated with the time of the day. It is difficult to interpret clustering results without a more profound knowledge of the domain, but from the visualization of the prototypes some evident clusters appear. For example, for the Instagram data from Barcelona, several clusters include visits to different subsets of touristic points of interest in the city, showing different choices about what to visit and when. From the Twitter data from Barcelona, richer clusters can be discovered. Some of them show behaviors that also appear in the frequent itemsets, like a cluster of people tweeting from home in outside cities and later tweeting from places distributed all over Barcelona. Others show other kinds of behaviors, like people that tweet after work mostly from different places inside districts of the city with an active night life. Another example is a cluster of people that tweet from convention centers and also from hotels, obtained using affinity propagation and the binary term frequency representation with IDF normalization. This cluster could be characterized as business people that attend conventions in Barcelona. As an example of the representation obtained to visualize the clusters, Figure 2 shows an extract of the map of this cluster.
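A minimal sketch of this profiling step, assuming a users x (place, time-slot) count matrix has already been built from the daily events (the matrix and all variable names are hypothetical), could use scikit-learn to obtain the binary term frequency representation, its IDF-weighted variant, and the two clusterings discussed above.

import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.cluster import KMeans, AffinityPropagation

# Hypothetical user x (place, slot) visit count matrix.
counts = np.random.poisson(0.1, size=(200, 500))

# Keep only users with at least 20 distinct (place, slot) events.
counts = counts[(counts > 0).sum(axis=1) >= 20]

# Binary term frequency, with and without IDF weighting.
binary = (counts > 0).astype(float)
binary_idf = TfidfTransformer(use_idf=True).fit_transform(binary).toarray()

# Affinity propagation decides the number of clusters by itself.
ap_labels = AffinityPropagation(damping=0.9).fit_predict(binary_idf)

# K-means requires the number of clusters to be fixed in advance.
km_labels = KMeans(n_clusters=10, n_init=10).fit_predict(binary)

# Discard clusters with fewer than 20 users as non-informative
# (labels of -1 indicate that affinity propagation did not converge).
labels = ap_labels[ap_labels >= 0]
sizes = np.bincount(labels) if labels.size else np.array([])
kept = [c for c, s in enumerate(sizes) if s >= 20]
print("affinity propagation clusters kept:", kept)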

6. Conclusions and future work

Location Based Social Networks are an important source of knowledge for user behavior analysis. Different treatments of the data and the use of different attributes make it possible to analyze and study the patterns of users in a geographical area. Methods and tools that help to analyze these data will be of crucial importance for the success of, for example, smart city technologies.


Figure 2. Cluster from Barcelona Twitter data representing people that attend business conventions (150×150 Grid, groups of 8 hours, users with at least 20 different places-time). The figure shows a large number of events at convention center Fira de Barcelona Gran Via, restaurants/convention center Fira de Barcelona Montjuïc and hotels/convention center at Diagonal Mar

In this paper we present two methodologies that are able to extract patterns that can help to make decisions in the context of the management of a city from different perspectives, like mobility patterns, touristic interests or citizen profiles. The patterns extracted show that it is possible to obtain behavior information from LBSN data. Increasing the quantity and the quality of the data will further improve the patterns and the information that can be obtained. As future work, we want to link the information of these different networks to extract more complex patterns. The data from Twitter includes Foursquare check-ins; this makes it possible to tag some of the events with specific venues and their categories, allowing for recommender system applications and user activity recognition and prediction. There are also links to Instagram photographs, allowing us to cross-reference both networks, augmenting the information of a user's Twitter events with the Instagram events of the same user and reducing in this way the sparsity of the data.


Also, in this paper, the temporal dimension of the dataset has not been fully exploited. Analyzing the events' temporal relationships will allow the study of causal dependencies and temporal correlations.

7. Acknowledgments

This work has been supported by the EU funded SUPERHUB project (ICT-FP7-289067).

References
[1] Gennady Andrienko, Natalia Andrienko, Salvatore Rinzivillo, Mirco Nanni, Dino Pedreschi, and Fosca Giannotti. Interactive visual clustering of large collections of trajectories. In Visual Analytics Science and Technology, 2009. VAST 2009. IEEE Symposium on, pages 3-10. IEEE, 2009.
[2] Natalia Andrienko and Gennady Andrienko. A visual analytics framework for spatiotemporal analysis and modelling. Data Mining and Knowledge Discovery, pages 1-29, 2012.
[3] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Nikhil Bansal, Kirk Pruhs, and Clifford Stein, editors, SODA, pages 1027-1035. SIAM, 2007.
[4] Nathan Eagle and Alex (Sandy) Pentland. Reality mining: Sensing complex social systems. Personal Ubiquitous Comput., 10(4):255-268, March 2006.
[5] Katayoun Farrahi and Daniel Gatica-Perez. Discovering routines from large-scale human locations using probabilistic topic models. ACM Trans. Intell. Syst. Technol., 2(1):3:1-3:27, January 2011.
[6] Frey and Dueck. Clustering by passing messages between data points. Science, 315, 2007.
[7] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
[8] Kenneth Joseph, Chun How Tan, and Kathleen M. Carley. Beyond "local", "categories" and "friends": Clustering foursquare users with latent "topics". In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, UbiComp '12, pages 919-926, New York, NY, USA, 2012. ACM.
[9] Nan Li and Guanling Chen. Analysis of a location-based social network. In Computational Science and Engineering, 2009. CSE '09. International Conference on, volume 4, pages 263-270, Aug 2009.
[10] Ross Maciejewski, Stephen Rudolph, Ryan Hafen, Ahmad Abusalah, Mohamed Yakout, Mourad Ouzzani, William S Cleveland, Shaun J Grannis, and David S Ebert. A visual analytics approach to understanding spatiotemporal hotspots. Visualization and Computer Graphics, IEEE Transactions on, 16(2):205-220, 2010.
[11] F. Pianese, Xueli An, F. Kawsar, and H. Ishizuka. Discovering and predicting user routines by differential analysis of social network traces. In World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2013 IEEE 14th International Symposium and Workshops on a, pages 1-9, June 2013.
[12] Sholom M Weiss, Nitin Indurkhya, and Tong Zhang. Fundamentals of Predictive Text Mining, volume 41. Springer, 2010.
[13] Vincent W Zheng, Yu Zheng, Xing Xie, and Qiang Yang. Collaborative location and activity recommendations with GPS history data. In Proceedings of the 19th International Conference on World Wide Web, pages 1029-1038. ACM, 2010.
[14] Yu Zheng. Location-based social networks: Users. In Computing with Spatial Trajectories, pages 243-276. Springer, 2011.


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-136

Collaborative Assessment

Patricia Gutierrez a,1, Nardine Osman a,2 and Carles Sierra a,3
a IIIA-CSIC, Spain

Abstract. In this paper we introduce an automated assessment service for online learning support in the context of communities of learners. The goal is to introduce automatic tools to support the task of assessing the massive numbers of students found in Massive Open Online Courses (MOOC). The final assessments are a combination of the tutor's assessment and peer assessments. We build a trust graph over the referees and use it to compute weights for the assessment aggregations.
Keywords. Automated Assessment, Trust Model, Online Learning

1. Introduction

Self and peer assessment have clear pedagogical advantages. Students increase their responsibility and autonomy, get a deeper understanding of the subject, become more active and reflect on their role in group learning, and improve their judgement skills. It may also have the positive side effect of reducing the marking load of tutors. This is especially critical when tutors face the challenge of marking large numbers of students, as needed in the increasingly popular Massive Open Online Courses (MOOC). Online learning communities encourage different types of peer-to-peer interactions along the learning process. These interactions allow students to get more feedback, to be more motivated to improve, and to compare their own work with other students' accomplishments. Tutors, on the other hand, benefit from these interactions as they get a clearer perception of the students' engagement and learning process. Previous works have proposed different methods of peer assessment. The authors of [6] propose methods to estimate peer reliability and correct peer biases. They present results over real-world data from 63,000 peer assessments of two Coursera courses. The models proposed are probabilistic. Differently from them, we place more trust in students who grade like the tutor and do not model students' biases. When a student is biased, their trust measure will be very low and their opinion will have only a moderate impact on the final marks. The authors of [2] propose the CrowdGrader framework, which defines a crowdsourcing algorithm for peer evaluation. The accuracy degree (i.e. reputation) of each student is measured as the distance between their self-assessment and the aggregated opinion of the peers weighted by their accuracy degrees. In this paper, and differently from previous works, we want to study the reliability of student assessments when compared with tutor assessments.

Author: E-mail: [email protected]. Author: E-mail: [email protected]. 3 Corresponding Author: E-mail: [email protected]. 2 Corresponding


Although part of the learning process is that students participate in the definition of the evaluation criteria, tutors want to be certain that the scoring of the students' works is fair and as close as possible to their own expert opinion. To achieve this objective we propose in this paper an automated assessment method (Section 2) based on tutor assessments, aggregations of peer assessments and trust measures derived from peer interactions. We experimentally evaluate (Section 3) the accuracy of the method over different topologies of student interactions (i.e. different types of student grouping). The results obtained are based on simulated data, leaving the validation with real data for future work. We then conclude with a discussion of the results (Section 4).

2. Collaborative Assessment

In this section we introduce the formal model and algorithm for collaborative assessment.

2.1. Notation and preliminaries

We say an online course has a tutor τ, a set of peer students S, and a set of assignments A that need to be marked by the tutor and/or students with respect to a given set of criteria C. The automated assessment state X is then defined as the tuple X = ⟨R, A, C, L⟩, where R = {τ} ∪ S defines the set of possible referees (or markers), a referee being either the tutor τ or some student s ∈ S. A is the set of submitted assignments that need to be marked and C = ⟨c_1, . . . , c_n⟩ is the set of criteria that assignments are marked upon. L is the set of marks (or assessments) made by referees, such that L : R × A → [0, λ]^n (we assume marks to be real numbers between 0 and some maximum value λ).

2. Collaborative Assessment In this section we introduce the formal model and algorithm for collaborative assessment. 2.1. Notation and preliminaries We say an online course has a tutor τ , a set of peer students S, and a set of assignments A that need to be marked by the tutor and/or students with respect to a given set of criteria C. The automated assessment state X is then defined as the tuple: X = R, A, C, L where R = {τ } ∪ S defines the set of possible referees (or markers), where a referee could either be the tutor τ or some student s ∈ S. A is the set of submitted assignments that need to be marked and C = c1 , . . . , cn  is the set of criteria that assignments are marked upon. L is the set of marks (or assessments) made by referees, such that L : R × A → [0, λ]n (we assume marks to be real numbers between 0 and some maximum  , where α ∈ A, value λ). In other words, we define a single assessment as: μρα = M  ρ ∈ R, and M = m1 , . . . , mn  describes the marks provided by the referee on the n criteria of C, mi ∈ [0, λ]. Similarity between marks We can definie a similarity function sim : [0, λ]n ×[0, λ]n → [0, 1] to determine how close two assesments μρα and μηα are. We calculate the similarity between assessments μρα = {m1 , . . . , mn } and μηα = {m1 , . . . , mn } as follows: n |mi − mi | sim(μρα , μηα ) = 1− i=1

n

. This measure satisfies the basic properties of a fuzzy λ

i=1

similarity [5]. We nothe that other similarity measures can be used. Trust relations between referees Tutors need to decide up to which point they can believe on the assessments made by peers. We use two different intuitions to make up this belief. First, if the tutor and the student have both assessed some assigments, their similarity gives a hint of how close the judegements of the student and the tutor are. Similarly, we can define the judgement closeness of any two students by looking into the assign-
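As a small illustration, the following minimal sketch implements this similarity under the definitions above (the example marks are hypothetical; λ = 10 is taken from the paper's later example).

from typing import Sequence

def sim(m: Sequence[float], m_prime: Sequence[float], lam: float = 10.0) -> float:
    """Similarity between two assessments over the same n criteria:
    1 minus the normalized sum of absolute mark differences."""
    n = len(m)
    return 1.0 - sum(abs(a - b) for a, b in zip(m, m_prime)) / (n * lam)

# Two assessments of the same assignment on the criteria <speed, maturity>.
print(sim([5, 5], [6, 6]))  # 0.9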


Trust relations between referees. Tutors need to decide up to which point they can believe the assessments made by peers. We use two different intuitions to build this belief. First, if the tutor and a student have both assessed some assignments, their similarity gives a hint of how close the judgements of the student and the tutor are. Similarly, we can define the judgement closeness of any two students by looking at the assignments evaluated by both of them. In case there are no assignments evaluated by both the tutor and one particular student, we could simply not take that student's opinion into account, because the tutor would not know how much to trust the judgement of this student; or, as we do in this paper, we can approximate that unknown trust by looking at the chain of trust between the tutor and the student through other students. To model this we define two different types of trust relations:
• Direct trust: This is the trust between referees ρ, η ∈ R that have at least one assignment assessed in common. The trust value is the average of the similarities of their assessments over the same assignments. Let A_{ρ,η} be the set of all assignments that have been assessed by both referees, that is, A_{ρ,η} = {α | μ^ρ_α ∈ L and μ^η_α ∈ L}. Then,

T_D(ρ, η) = ( Σ_{α ∈ A_{ρ,η}} sim(μ^ρ_α, μ^η_α) ) / |A_{ρ,η}|

We could also define direct trust as the conjunction of the similarities for all common assignments, T_D(ρ, η) = ⊗_{α ∈ A_{ρ,η}} sim(μ^ρ_α, μ^η_α). In this case T_D would be a similarity measure satisfying ⊗-transitivity for any t-norm. However, this would not be practical, as a significant difference in just one assessment of two referees would make their mutual trust very low.
• Indirect trust: This is the trust between referees ρ, η ∈ R without any assignment assessed by both of them. We compute this trust as a transitive measure over chains of referees for which we have pair-wise direct trust values. We define a trust chain as a sequence of referees c_j = ⟨ρ_1, . . . , ρ_i, ρ_{i+1}, . . . , ρ_{m_j}⟩ where ρ_i ∈ R, ρ_1 = ρ and ρ_{m_j} = η, and T_D(ρ_i, ρ_{i+1}) is defined for all pairs (ρ_i, ρ_{i+1}) with i ∈ [1, m_j − 1]. We denote by C(ρ, η) the set of all trust chains between ρ and η. Thus, indirect trust is defined as an aggregation of the direct trust values over these chains as follows:

T_I(ρ, η) = max_{c_j ∈ C(ρ,η)} Π_{i ∈ [1, m_j − 1]} T_D(ρ_i, ρ_{i+1})

Hence, indirect trust is based on the notion of transitivity.4 Ideally, we would not like to overrate the trust of a tutor in a student, that is, we would like T_D(a, b) ≥ T_I(a, b) in all cases. Guaranteeing this in all cases is impossible, but we can decrease the number of overtrusted students by selecting an operator that gives low values to T_I. In particular, we prefer to use the product operator, because this is the t-norm that gives the smallest possible values. Other operators could be used, for instance the min function.
4 T_I is based on the fuzzy similarity relation sim presented before, which fulfills the ⊗-transitivity property: sim(u, v) ⊗ sim(v, w) ≤ sim(u, w), ∀u, v, w ∈ V, where ⊗ is a t-norm [5].
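The sketch below is a minimal implementation of these two trust measures under the definitions above, using networkx to enumerate the chains; the toy assessments follow the Figure 1 example and are otherwise hypothetical.

import networkx as nx
from math import prod

# L as a dict: (referee, assignment) -> marks on the criteria <speed, maturity>.
L = {("tutor", "ex1"): [5, 5], ("dave", "ex1"): [6, 6],
     ("dave", "ex2"): [2, 2], ("patricia", "ex2"): [8, 8]}
LAM = 10.0

def sim(m, mp):
    return 1.0 - sum(abs(a - b) for a, b in zip(m, mp)) / (len(m) * LAM)

def direct_trust(rho, eta):
    # Average similarity over the assignments assessed by both referees.
    common = [a for (r, a) in L if r == rho and (eta, a) in L]
    if not common:
        return None
    return sum(sim(L[(rho, a)], L[(eta, a)]) for a in common) / len(common)

# Build the graph of direct trust relations.
referees = {r for (r, _) in L}
G = nx.Graph()
for rho in referees:
    for eta in referees:
        if rho < eta:
            t = direct_trust(rho, eta)
            if t is not None:
                G.add_edge(rho, eta, weight=t)

def indirect_trust(rho, eta):
    # Maximum over all chains of the product of the direct trusts along the chain.
    chains = nx.all_simple_paths(G, rho, eta)
    return max(prod(G[u][v]["weight"] for u, v in zip(c, c[1:])) for c in chains)

print(direct_trust("tutor", "dave"))        # 0.9
print(indirect_trust("tutor", "patricia"))  # 0.9 * 0.4 = 0.36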


Trust Graph. We use a graph data structure to store and represent the trust relations between referees. The trust graph is defined as G = ⟨R, E, w⟩, where the set of nodes R is the set of referees, E ⊆ R × R is the set of edges between referees with direct or indirect trust relations, and w : E → [0, 1] provides the trust value. We denote by D ⊂ E the set of edges that link referees with direct trust, that is, D = {e ∈ E | T_D(e) ≠ ⊥}. Similarly, I ⊂ E for indirect trust, I = {e ∈ E | T_I(e) ≠ ⊥} \ D. The w values are defined as:

w(e) = T_D(e) if e ∈ D, and w(e) = T_I(e) if e ∈ I.

Figure 1 shows examples of trust graphs with e ∈ D (in black) and e ∈ I (in red/light gray) for different sets of assessments L.

Figure 1. Trust graph example 1: (a) L = {μ^tutor_ex1 = ⟨5, 5⟩}; (b) L adds μ^dave_ex1 = ⟨6, 6⟩; (c) L adds μ^dave_ex2 = ⟨2, 2⟩; (d) L adds μ^patricia_ex2 = ⟨8, 8⟩.

2.2. Computing collaborative assessments

Algorithm 1 implements the collaborative assessment method. We keep the notation (ρ, η) to refer to the edge connecting nodes ρ and η in the trust graph and C(ρ, η) to refer to the set of trust chains between ρ and η. First, the trust graph is built from L. Then, the final assessments are computed as follows. If the tutor marks an assignment, then the tutor's mark is taken as the final mark. Otherwise, a weighted average of the marks of the student peers is calculated for this assignment, where the weight of each peer is the trust value between the tutor and that peer. Figure 1 shows the evolution of a trust graph built from a chronological sequence of assessments. The criteria C in this example are speed and maturity and the maximum mark value is λ = 10. For simplicity we only represent those referees that have made assessments in L. In Figure 1(a) there is one node representing the tutor, who has made the only assessment over the assignment ex1. In (b) student Dave assesses the same exercise as the tutor and thus a link is created between them.


Algorithm 1: collaborativeAssessments(X = ⟨R, A, C, L⟩)

// Initial trust between referees is zero
D = I = ∅
for ρ, η ∈ R, ρ ≠ η do
    w(ρ, η) = 0
end

// Update direct trust and edges
for ρ, η ∈ R, ρ ≠ η do
    A_{ρ,η} = {β | μ^ρ_β ∈ L and μ^η_β ∈ L}
    if |A_{ρ,η}| > 0 then
        D = D ∪ {(ρ, η)}; w(ρ, η) = T_D(ρ, η)
    end
end

// Update indirect trust and edges between tutor and students
for ρ ∈ R do
    if (τ, ρ) ∉ D and C(τ, ρ) ≠ ∅ then
        I = I ∪ {(τ, ρ)}; w(τ, ρ) = T_I(τ, ρ)
    end
end

// Calculate automated assessments
assessments = {}
for α ∈ A do
    if μ^τ_α ∈ L then
        // Tutor assessments are preserved
        assessments = assessments ∪ {(α, μ^τ_α)}
    else
        R' = {ρ | μ^ρ_α ∈ L}
        if |R'| > 0 then
            // Generate automated assessments
            μ_α = ( Σ_{ρ∈R'} μ^ρ_α · w(τ, ρ) ) / ( Σ_{ρ∈R'} w(τ, ρ) )
            assessments = assessments ∪ {(α, μ_α)}
        end
    end
end
return assessments
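A minimal Python sketch of this aggregation step, assuming the trust weights w(τ, ρ) have already been computed as above (the toy data is hypothetical), could look as follows.

import numpy as np

def collaborative_assessments(assignments, L, w, tutor="tutor"):
    """Final marks: the tutor's mark if it exists, otherwise the trust-weighted
    average of the peer marks (weights are the tutor's trust in each peer)."""
    final = {}
    for alpha in assignments:
        if (tutor, alpha) in L:
            final[alpha] = np.array(L[(tutor, alpha)], dtype=float)
        else:
            peers = [r for (r, a) in L if a == alpha and r != tutor]
            if peers:
                weights = np.array([w.get(r, 0.0) for r in peers])
                marks = np.array([L[(r, alpha)] for r in peers], dtype=float)
                final[alpha] = (weights[:, None] * marks).sum(axis=0) / weights.sum()
    return final

# Toy example: the tutor marked ex1; ex2 only has peer marks.
L = {("tutor", "ex1"): [5, 5], ("dave", "ex2"): [2, 2], ("patricia", "ex2"): [8, 8]}
w = {"dave": 0.9, "patricia": 0.36}
print(collaborative_assessments(["ex1", "ex2"], L, w))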

The direct trust value w(tutor, Dave) is high since their marks were similar. In (c) a new assessment by Dave is added to L with no consequences for the graph construction. In (d) student Patricia adds an assessment on ex2, which creates a direct trust between Dave and Patricia and an indirect trust between the tutor and Patricia, through Dave. Note that the trust graph built from L is not necessarily connected. A tutor wants to reach a point at which the graph is connected, because that means a trust measure between the tutor and every student can be deduced. Figure 2 shows an example of a trust graph of a learning community involving 50 peer students and a tutor. When the history of assessments is small (|L| = 30) we observe that not all nodes are connected. As the number of assessments increases, the trust graph becomes denser and eventually gets connected. In (b) and (c) we see a complete graph.

3. Experimental Platform and Evaluation

In this section we describe how we generate simulated social networks, describe our experimental platform, define our benchmarks and discuss the experimental results.

3.1. Social Network Generation

Several models for social network generation have been proposed, reflecting different characteristics present in real social communities.


Figure 2. Trust graph example 2: (a) |L| = 30, (b) |L| = 200, (c) |L| = 400.

Topological and structural features of such networks have been explored in order to understand which generating model best resembles the structure of real communities [4]. A social network can be defined as a graph N where the set of nodes represents the individuals of the network and the set of edges represents connections or social ties among those individuals. In our case, the individuals are the members of the learning community: the tutor and the students. Connections represent the social ties and they are usually the result of interactions in the learning community. We rely on the social network to simulate which student will assess the assignment of which other student. We assume students will assess the assignments of students they know, as opposed to picking random assignments. As such, we clarify that social networks are different from the trust graph of Section 2. While the nodes of both graphs are the same, the edges of the social network represent social ties, whereas the edges in the trust graph represent how much one referee trusts another in judging others' work. To model social networks where relations represent social ties, we follow three different approaches: the Erdős-Rényi model for random networks [3], the Barabási-Albert model for power law networks [1], and a hierarchical model for cluster networks.

3.1.1. Random Networks

The Erdős-Rényi model for random networks consists of a graph containing n nodes connected randomly. Each possible edge between two vertices is included in the graph with probability p and is not included with probability (1 − p). In addition, in our case there is always an edge between the node representing the tutor and the rest of the nodes, as the tutor knows all of the students (and may eventually mark any of them). The degree distribution of random graphs follows a Poisson distribution.

3.1.2. Power Law Networks

The Barabási-Albert model for power law networks bases its graph generation on the notions of growth and preferential attachment. The generation scheme is as follows. Nodes are added one at a time. Starting with a small number of initial nodes, at each time step we add a new node with m edges linked to nodes already part of the network. In our experiments, we start with m + 1 initial nodes. The edges are not placed uniformly at random but preferentially, in proportion to the degree of the network nodes. The probability p that the new node is connected to a node i already in the network depends on the degree k_i of node i, such that p = k_i / Σ_{j=1}^{n} k_j. As above, there is also always an edge between the node representing the tutor and the rest of the nodes. The degree distribution of this network follows a power law distribution. Recent empirical results on large real-world networks often show their degree distribution following a power law [4].
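A minimal sketch of these first two generators, assuming networkx is used and that a dedicated "tutor" node is added by hand (the node name and parameter values here are illustrative assumptions), could be:

import networkx as nx

N_STUDENTS = 50

def with_tutor_hub(g):
    # The tutor node is always connected to every student.
    g.add_edges_from(("tutor", s) for s in list(g.nodes))
    return g

# Erdos-Renyi random network: each student pair is connected with probability p.
random_net = with_tutor_hub(nx.erdos_renyi_graph(N_STUDENTS, p=0.5))

# Barabasi-Albert power law network: each new node attaches m edges preferentially.
power_net = with_tutor_hub(nx.barabasi_albert_graph(N_STUDENTS, m=16))

print(random_net.number_of_nodes(), power_net.number_of_nodes())  # 51 and 51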

142

P. Gutierrez et al. / Collaborative Assessment

tion of this network follows a Power Law distribution. Recent empirical results on large real-world networks often show their degree distribution following a power law [4]. 3.1.3. Cluster Networks As our focus is on learning communities, we also experiment with a third type of social network: the cluster network which is based on the notions of groups and hierarchy. Such networks consists of a graph composed of a number of fully connected clusters (where we believe clusters may represent classrooms or similar pedagogical entities). Additionally, as above, all the nodes are connected with the tutor node. 3.2. Experimental Platform In our experimentation, given an initial automated assessment state X = R, A, C, L with an empty set of assessments L = {}, we want to simulate tutor and peer assessments so that the collaborative assessment method can eventually generate a reliable and definitive set of assessments for all assignments. To simulate assessments, we say each students is defined by its profile that describes how good its assessments are. The profile is essentially defined by distance, dρ ∈ [0, 1] that specifies how close are the student’s assessments to that of the tutor. We then assume the simulator knows how the tutor and each student would assess an assignment, even if they do not actually assess it in the simulation. Then: • For every assignment α ∈ A, we calculate the tutor’s assessment, which is randomly generated according to the function fτ : A → [0, λ]n . • For every assignment α ∈ A, we also calculate the assessment of each student ρ ∈ S. This is calculated according to the function fρ : A → [0, λ]n , such that: sim(fρ (α), fτ (α)) ≤ dρ We note that we only need to calculate ρ’s assessment of α if the student who submitted the assignment α is a neighbour of ρ in N . 3.3. Benchmark We consider 50 students and one assignment is submitted by each student, resulting in |S| = 50 and |A| = 50. Three types of social networks are used: random social networks (with 51 nodes, p = 0.5, and approximate density of 0.5), power law networks (with 51 nodes, m = 16, and approximate density of 0.5), and cluster networks (with 51 nodes, 5 clusters of 10 nodes each, and approximate density of 0.2). Two cases are considered for generating the set of student profiles P r = {ds }∀s∈S , where d ∈ [0, 0.5]. A first case where d is picked randomly following a power law distribution and a second case where d is picked randomly following a uniform distribution. The set of criteria is C = speed, maturity and the maximum mark value is λ = 10. We also compute the ‘error’ of the collaborative assessment method, whose range is sim(fτ (α), φ(α)) [0, 1], over the set of assignments A accordingly:

α∈A

, where φ(α) |A| describes the automated assessment for a given assignment α ∈ A With the settings presented above, we run two different experiments, each with 50 executions. The results presented are an average of the 50 executions.
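As a rough illustration of the generation schemes described in Section 3.1 (not the authors' implementation), the following Python sketch builds the three kinds of social networks with networkx generators and then attaches a tutor node connected to every student. The function and parameter names are ours, and the built-in generators' initial conditions differ slightly from the scheme described in the text.

```python
import networkx as nx

def build_social_network(n_students, model="random", p=0.5, m=16, n_clusters=5, seed=None):
    """Sketch of the three generation schemes; node 0 plays the role of the tutor."""
    if model == "random":
        # Erdos-Renyi: each student-student edge appears independently with probability p
        g = nx.erdos_renyi_graph(n_students, p, seed=seed)
    elif model == "power_law":
        # Barabasi-Albert: growth + preferential attachment, m edges per new node
        g = nx.barabasi_albert_graph(n_students, m, seed=seed)
    else:
        # cluster model: fully connected groups of equal size
        size = n_students // n_clusters
        g = nx.disjoint_union_all([nx.complete_graph(size) for _ in range(n_clusters)])
    g = nx.relabel_nodes(g, {i: i + 1 for i in g.nodes()})  # students are 1..n
    g.add_node(0)                                           # the tutor
    g.add_edges_from((0, s) for s in range(1, n_students + 1))
    return g

net = build_social_network(50, model="power_law", m=16, seed=7)
```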


Figure 3. Experiment 1

In experiment 1, students provide their assessments before the tutor. Each student ρ provides assessments for a randomly chosen number aρ of peer assignments. We run the experiment for 5 different values of aρ = {3, 4, 5, 6, 7}. After the students provide their assessments, the tutor starts assessing assignments incrementally. After every tutor assessment, the error over the set of automated assessments is calculated. Notice that the collaborative assessment method takes the tutor's assessment, when it exists, to be the final assessment. As such, the number of automated assessments calculated by aggregating students' assessments is reduced over time. Finally, when the tutor has assessed all 50 students, the resulting error is 0. In experiment 2, the tutor provides its assessments before the students. The tutor in this experiment assesses a randomly chosen number of assignments aτ = {5, 10, 15, 20, 25}. After the tutor provides its assessments, student assessments are performed. In every iteration, a student ρ randomly selects a neighbour in N and assesses that neighbour's assignment. We note that in the case of random and power law networks (denser networks), a total of 1000 student assessments are performed, whereas in the case of cluster networks (a sparser network), a total of 400 student assessments are performed. We also note that, initially, the trust graph is not fully connected, so the service is not able to provide automated assessments for all assignments. When the graph gets fully connected, the service generates automated assessments for all assignments and we start measuring the error after every new iteration.


Figure 4. Experiment 2

3.4. Evaluation

In experiment 1, we observe (Figure 3) that the error decreases as the number of tutor assessments increases, as expected, until it reaches 0 when the tutor has assessed all 50 students. This decrease is quite stable and we do not observe abrupt error variations. More variation is observed in the initial iterations, since the service has only a few assessments to deduce the weights of the trust graph and to calculate the final outcome. In the case of experiment 2 (Figure 4), the error diminishes slowly as the number of student assessments increases, although it never reaches 0. Since the number of tutor assessments is fixed in this experiment, we have an error threshold (a lower bound) which is linked to the students' assessment profiles: the closer they are to the tutor's, the lower this threshold will be. In fact, in both experiments we observe that when using a power law distribution profile the automated assessment error is lower than when using a uniform distribution profile. This is because when using a power law distribution, more student profiles are generated whose assessments are closer to the tutor's. In general, the error trends observed in all experiments comparing different social network scenarios (random, cluster or power law) show a similar behavior. Taking a closer look at experiment 2, cluster social graphs have the lowest error and we observe that assessments on all assignments are achieved earlier (that is, the trust graph gets connected earlier). We attribute this to the topology of the fully connected clusters, which favors the generation of indirect edges between the tutor and the nodes of each cluster earlier in the graph. Power law social graphs have lower error than random networks in most cases. This can be attributed to the criterion of preferential attachment in their network generation, which favors the creation of some highly connected nodes. Such


nodes are likely to be assessed more frequently, since more peers are connected to them. The automated assessments of these highly connected peers are therefore performed with more available information, which could lead to more accurate outcomes.

4. Conclusion and Future Work

The use of AI techniques is key for the future of online learning communities. The application presented in this paper is especially useful in the context of MOOCs: with a low number of tutor assessments, and by encouraging students to interact and assess each other, direct and indirect trust measures can be calculated among peers and automated assessments can be generated. Several error indicators can be designed and displayed to the tutor managing the course, which we leave for future work. For example, the error indicators may inform the tutor which assignments have not received any assessments yet, or which deduced marks are considered unreliable. Alternatively, a reliability measure may also be assigned to the computed trust measure TD. Observing such error indicators, the tutor can decide to assess more assignments, and as a result the error may improve or the set of deduced assessments may increase. When the error reaches an acceptable level, the tutor can decide to endorse and publish the marks generated by the collaborative assessment method. Another interesting question for future work is the following. Adding connections that are missing from the trust graph might improve its connectivity or maximize the number of direct edges. The question that follows is: which assignments should be suggested to which peers so that those connections appear and both the trust graph and the overall assessment outcome improve? Additionally, future work may also study different approaches for calculating the indirect trust value between two referees. In this paper, we use the product operator. We suggest studying a number of operators and running an experiment to test which is most suitable. To do such a test, we may calculate the indirect trust values for edges that do have a direct trust measure, and then see which approach for calculating indirect trust gets closest to the direct trust measures.
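As an illustration of the operator comparison suggested above (our own sketch, not part of the original system), the indirect trust along a path of direct trust weights could be aggregated as follows; the path and weight values are hypothetical.

```python
from functools import reduce
import operator

def indirect_trust(weights, how="product"):
    """Aggregate the direct trust weights w(r1,r2), w(r2,r3), ... along a path."""
    if how == "product":
        return reduce(operator.mul, weights, 1.0)
    if how == "minimum":
        return min(weights)
    if how == "average":
        return sum(weights) / len(weights)
    raise ValueError(how)

# e.g. tutor -> Dave -> Patricia with direct weights 0.9 and 0.8
print(indirect_trust([0.9, 0.8], how="product"))   # 0.72
print(indirect_trust([0.9, 0.8], how="minimum"))   # 0.8
```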

References
[1] A. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[2] L. de Alfaro and M. Shavlovsky. CrowdGrader: Crowdsourcing the evaluation of homework assignments. Technical Report 1308.5273, arXiv.org, 2013.
[3] P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae, 6:290–297, 1959.
[4] E. Ferrara and G. Fiumara. Topological features of online social networks. Communications in Applied and Industrial Mathematics, 2:1–20, 2011.
[5] L. Godo and R. Rodríguez. Logical approaches to fuzzy similarity-based reasoning: an overview. Preferences and Similarities, (504):75–128, 2008.
[6] C. Piech, J. Huang, Z. Chen, C. Do, A. Ng, and D. Koller. Tuned models of peer assessment in MOOCs. Proc. of the 6th International Conference on Educational Data Mining (EDM 2013), pages 153–160, 2013.


Computer Vision II


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-149


Analysis of Gabor-Based Texture Features for the Identification of Breast Tumor Regions in Mammograms

Jordina TORRENTS-BARRENA a, Domenec PUIG a,1, Maria FERRE a, Jaime MELENDEZ b, Joan MARTI c and Aida VALLS a
a Department of Computer Engineering and Mathematics, University Rovira i Virgili
b Department of Radiology, Radboud University Medical Center
c Computer Vision and Robotics Research Institute, University of Girona
1 Corresponding Author: Domenec Puig, e-mail: [email protected]

Abstract. Breast cancer is one of the most common neoplasms in women and a leading cause of death worldwide. However, it is also among the most curable cancer types if it can be diagnosed early through a proper mammographic screening procedure. Suitable computer aided detection systems can therefore help radiologists to detect many subtle signs that are normally missed during the first visual examination. This study proposes a Gabor filtering method for the extraction of textural features by multi-sized evaluation windows applied to the four probabilistic distribution moments. Then, an adaptive strategy for data selection is used to eliminate the most irrelevant pixels. Finally, a pixel-based classification step is applied using Support Vector Machines in order to identify the tumor pixels. In this step we also estimate the appropriate kernel parameters to obtain an accurate configuration for the four existing kernels. Experiments have been conducted on different training-test partitions of the mini-MIAS database, which is commonly used among researchers who apply machine learning methods for breast cancer diagnosis. The performance of our framework is evaluated using several measures: classification accuracy, positive and negative predictive values, receiver operating characteristic curves and confusion matrix.

Keywords. Mammographic images, Gabor filters, statistical moments, binary-class Support Vector Machine, kernel functions, optimized model parameters.

1. Introduction

Cancer is a group of diseases in which cells in the body grow, change, and multiply out of control. A group of rapidly dividing cells may form a lump or mass of extra tissue, called a tumor. Tumors can either be cancerous (malignant) or non-cancerous (benign). Malignant tumors can penetrate and destroy healthy body tissues. Breast cancer refers to the erratic growth of cells that originate in the breast tissue [1]; it is the leading cause of death among women between 40 and 55 years of age. The identification of cancerous cells in a patient is highly subjective and reliant on the
physician's expertise. This may lead to inaccurate predictions, since the examinations are prone to human and visual error and may be affected by blurred mammograms. Fortunately, the mortality rate has decreased in recent years with an increased emphasis on diagnostic techniques and more effective treatments [1]. Diagnosis is employed to discern between malignant and benign cancerous patterns. Mammography is a commonly used imaging modality that enhances the radiologist's ability to detect and diagnose cancer at an early stage and take immediate precautions. There is no doubt that the evaluation of data taken from patients and the decisions of experts are the most important factors in diagnosis [1,2]. However, CAD systems and different artificial intelligence techniques for classification also help experts a great deal. There has been a lot of research on medical diagnosis of breast cancer in the literature, and most of it reports high classification accuracies. Oliver and Marti [3] applied Eigenfaces to true mass detection. In Polat and Gunes [4], least squares Support Vector Machines (SVM) were used and an accuracy of 98.53% was obtained. Local Binary Patterns were used by Llado and Oliver [5] to represent salient micro-patterns in order to preserve the spatial structure of the masses; once the descriptors were extracted, Support Vector Machines were used for classifying the detected masses, and the obtained accuracy was above 90%. In Hajare and Dixit [2], a Gabor filter was used to extract intensity features from patches in order to recognize whether the mammogram image was normal, benign or malignant; the extracted features were then classified using SVM. In another system, designed by Lavanya et al. [6], unilateral and bilateral information was fused using a multivariate statistical technique called canonical correlation analysis, and the fused feature vector was then used to train an SVM with a radial basis function kernel [6]. As its contribution, our work provides a new method based on the analysis of Gabor texture features and the available SVM configurations, in order to provide more information about how to detect breast masses and which methodology is more effective. The main goal is to classify normal and abnormal tissues in mammograms. The rest of the paper is organized as follows. Section 2 describes the proposed methodology. Section 3 presents experimental results obtained by using our method to diagnose breast cancer. Finally, Section 4 concludes the paper and outlines future directions.

2. Proposed Method

The methodology proposed in this work follows these steps. During an initial training phase, a subset of mammographic images is selected. A set of statistical features is computed at every texture region of interest (normal and abnormal tissue) by applying multi-sized evaluation windows. The training data associated with each pattern are first filtered by a multichannel Gabor filter bank, obtaining a cloud of texture feature vectors for every pattern. Then, an accurate selection of these features is performed in order to achieve better results. The binary classification stage processes several test images to identify the texture pattern corresponding to each pixel. This is done with the same methodology explained for the first stage: each feature vector is assigned to one of the given texture patterns by an SVM-based classifier fed with the prototypes extracted during the training stage. To obtain the most accurate results we apply our method to the four existing kernel functions:


linear, radial, polynomial and sigmoid, in order to investigate which provides the best classification. In addition, we include a study that illustrates the effect of the required regularization parameters. All phases involved in this scheme are detailed below.

2.1. Textural Feature Extraction

Feature extraction involves simplifying the amount of resources required to describe a set of data accurately. Analysis with a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm which overfits the training set and generalizes poorly to new samples. The accuracy of the classification therefore depends on the feature extraction stage [7], which plays an important role in detecting abnormalities given the nature of mammograms (see Fig. 1). Our work takes advantage of statistical features that have been proven to differentiate suspicious and normal breast tissues and to isolate benign and malignant lesions with masses and calcifications [8].

2.1.1. Statistical Approaches

In general, any image processing/analysis task requires a particular feature strategy for classification. Statistical features in particular are significant in the pattern recognition area [8]. A frequently used approach for texture analysis is based on the properties of statistical moments [9].

1. The expression for the first moment (mean) is given by: μ = (1/n) Σ_{i=1..n} xi.
2. The second is the standard deviation, designated by σ, which represents the estimation of the mean square deviation of a gray pixel value p(i, j). It is determined using the formula: σ = √( (1/n) Σ_{i=1..n} (xi − μ)² ).
3. To characterize the asymmetry degree of a pixel distribution in the specified window around its mean, we use γ. Skewness is a pure number that characterizes only the shape of the distribution. The formula for this third moment is: γ = (1/n) Σ_{i=1..n} (xi − μ)³ / σ³, where μ and σ are the mean and standard deviation, respectively.
4. Finally, kurtosis is used to measure the peakedness or flatness of a distribution relative to a normal distribution. The conventional definition of κ is: κ = (1/n) Σ_{i=1..n} (xi − μ)⁴ / σ⁴.
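As a minimal illustration of these definitions (a Python/NumPy sketch written for this text, not the authors' MATLAB implementation), the four moments of the values inside one evaluation window can be computed as:

```python
import numpy as np

def window_moments(window_values):
    """First four statistical moments of the (Gabor-filtered) values in a window.
    Uses the population standard deviation and non-excess kurtosis, as defined above."""
    x = np.asarray(window_values, dtype=float).ravel()
    mu = x.mean()
    sigma = x.std()                              # sqrt of the mean squared deviation
    skew = np.mean((x - mu) ** 3) / sigma ** 3
    kurt = np.mean((x - mu) ** 4) / sigma ** 4
    return mu, sigma, skew, kurt
```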

These four moments have been computed with our own function. The implementation follows their mathematical definitions and performs the required convolutions through the 2D Fast Fourier Transform (FFT) in order to decrease the execution time of the algorithm.

2.1.2. Gabor Filtering Function

A 2-D Gabor function is a Gaussian modulated by a sinusoid. Gabor filters are suitable for modeling simple cells in the visual cortex and give optimal joint resolution in both space and frequency, allowing simultaneous analysis in both domains. A complex Gabor filter is defined as the product of a Gaussian kernel with a complex sinusoid [10]:

g(x, y) = (1 / (2π σx σy)) · exp( −(1/2) (x̃²/σx² + ỹ²/σy²) ) · exp( 2π j W x̃ ),   (1)

A self-similar filter dictionary can be obtained by associating an appropriate scale factor and a rotation parameter with the mother wavelet. M and N represent the scales and orientations of the Gabor wavelets.

Figure 1. (1) A mammogram of the mini-MIAS Database, (2) The result of Fast Fourier transform (FFT) of the mammogram, (3) Our Gabor Filter bank with four scales and six orientations.
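As an illustration of how such a self-similar filter bank can be constructed (a Python sketch using OpenCV's getGaborKernel, written for this text; the kernel size, wavelengths and σ/λ ratio are placeholder choices rather than the configuration reported below):

```python
import numpy as np
import cv2

def gabor_bank(ksize=31, wavelengths=(4, 8, 16, 32), n_orientations=6):
    """Bank of 4 scales x 6 orientations = 24 Gabor kernels (real part)."""
    bank = []
    for lambd in wavelengths:                      # wavelength controls the scale
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations     # evenly spaced orientations
            # args: ksize, sigma, theta, lambda, gamma (aspect ratio), psi (phase)
            kern = cv2.getGaborKernel((ksize, ksize), 0.56 * lambd, theta, lambd, 1.0, 0)
            bank.append(kern)
    return bank
```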

Our Gabor filter bank has been configured with four scales and six orientations, and a range of frequencies between 0.05 and 0.4. Thus, every feature vector is composed of 48 dimensions (6 × 4 × 2: mean, stdev), 72 dimensions (6 × 4 × 3: mean, stdev, skewness) or 96 dimensions (6 × 4 × 4: mean, stdev, skewness, kurtosis).

2.1.3. Multi-sized Evaluation Windows

The statistical moments of the modulus of the Gabor wavelet coefficients characterize each pixel and its neighborhood. These are computed over W different windows (i.e., 33 × 33 and 51 × 51). In order to find the most suitable size, we calculated the mean size in pixels of all tumors contained in the mini-MIAS database [11]. As the result was 48.8 pixels, we decided to choose windows around this value to increase accuracy. We also present a comparison between the results of the previous windows and the direct convolution of the image matrix with the specific Gabor filter, without calculating any feature. In this case the window size is 1 × 1 and the feature vector is composed of 24 dimensions. Thus, W sets of feature vectors are generated for each pixel of the given texture patterns during the training stage, as well as for each pixel of the test image during the classification stage.

2.2. Feature Selection

Feature selection is an important issue in building classification systems. It is advantageous to limit the number of input features in a classifier in order to have a good predictive and less computationally intensive model [7]. In the area of medical diagnosis, a small feature subset also means lower test and diagnostic costs. To automatically extract the breast region of each image and prepare training and test data pixels, the boundary of the breast was determined using an interactive threshold method to find the correct ground-truth. In this way, we chose the training images (36 in our case). The selection was done manually such that they cover different shapes, textures, tumor types and sizes as much as possible. From the tumor-type perspective, we have clustered mammograms with calcifications, architectural distortions, asymmetries, circumscribed masses, spiculated masses and ill-defined masses (see Fig. 2). In addition, some of them have dense-glandular, fatty or fatty-glandular tissues. In summary, the selected training images provide a convenient variety of texture information.

Figure 2. Class of abnormality: Architectural distortion - Fatty (A), Circumscribed masses - Fatty glandular (B), Asymmetry - Fatty glandular (C), Ill defined masses - Fatty glandular (D), Spiculated masses - Fatty (E), Calcification - Dense glandular (F).

As a result, for each mammogram, 1500 random pixels of the image are selected as negative samples. The negative samples that were close to the tumor were discarded. Positive and negative samples were collected from all training images and, in total, 54000 negative samples and 43200 positive samples were obtained. Finally, we selected another 58 hard-to-process test images from the database and extracted their ground-truth tumors in the same way. The test tumors are not used for training the SVM model. We have selected only 58 test images because, in most cases, the texture information of the mammograms is similar, so they would not be good test cases.

2.3. Support Vector Machine Classification

The Support Vector Machine is a promising pattern classification technique proposed by Vapnik [12]. Unlike traditional methods, which minimize the empirical training error, SVM aims at minimizing an upper bound of the generalization error by maximizing the margin between the separating hyperplane and the data. What makes SVM attractive is the property of condensing information in the training data and providing a sparse representation by using a very small number of data points. Significant research has employed it in various classification problems, and much of the current interest in breast cancer detection is due to its robustness [13,2]. Therefore, the aim is to classify the pixels of a mammogram into a predefined class of interest, tumor (label 1) or non-tumor (label -1), according to the characteristics of a pattern presented in feature vector form. Nevertheless, the regularization parameters and the kernel functions are the components that have to be determined before training.

2.3.1. Kernel Functions

Kernel functions must be continuous, symmetric, and most preferably should have a (semi) positive definite Gram matrix. The use of a positive definite kernel ensures that the optimization problem will be convex and the solution will be unique. However, many kernel functions which are not strictly positive definite have also been shown to perform very


well in practice. An example is the Sigmoid kernel which, despite its wide use, is not positive semi-definite for certain values of its parameters. Choosing the most appropriate kernel is closely related to the problem at hand, because it depends on what we are trying to model. The motivation behind the use of a particular kernel can be very intuitive and straightforward depending on what kind of information we are expecting to extract about the data [13]. For instance, a Linear kernel only allows picking out lines or hyperplanes, in contrast to the Radial basis function. Polynomial kernels are well suited for problems where all the training data is normalized, since they allow modeling feature conjunctions up to the order of the polynomial. The kernel functions that we apply in our work, all available from the existing literature, are listed below:

• Linear: K(xi, xj) = xi · xj.
• Radial basis function (RBF): K(xi, xj) = exp(−γ ‖xi − xj‖²), γ > 0.
• Sigmoid: K(xi, xj) = tanh(γ · xi · xj + r).
• Polynomial: K(xi, xj) = (γ · xi · xj + r)^d, γ > 0.

The parameter that should be optimized for the Linear kernel is the penalty C, and for the RBF kernel both C and the function parameter γ. The Sigmoid kernel uses the same parameters as the radial function plus r (coef0). Finally, the Polynomial kernel uses the previous three plus the d parameter (degree).

2.3.2. Setting Model Parameters

In addition to the feature subset and kernel function selection, setting appropriate kernel parameters can greatly improve the SVM classification accuracy. In order to explain the method we take the RBF kernel as an example. For medium-sized problems, the grid search approach is an efficient way to find the best C and γ. In grid search, pairs of (C, γ) are tried and the one with the best cross-validation accuracy is chosen. To improve the generalization ability, grid search uses a cross-validation process. That is, for each of the k subsets of the dataset D, create a training set T = D − k, then run a cross-validation (CV) process as follows [1]:

1. Consider a grid space of (C, γ) with log2 C ∈ [−5, ..., −1] and log2 γ ∈ [−7, ..., 10].
2. For each pair (C, γ) in the search space, conduct k-fold CV on the training set.
3. Choose the pair (C, γ) that leads to the highest overall CV classification rate.
4. Use the best parameters to create a model for training the dataset.
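The same grid search can be sketched with scikit-learn (our own Python illustration; the paper itself uses LibSVM from MATLAB, and the feature matrix and labels below are synthetic placeholders):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# placeholder data: one 48-dimensional feature vector per pixel, labels +1 (tumor) / -1 (normal)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 48))
y = rng.choice([-1, 1], size=200)

param_grid = {"C": 2.0 ** np.arange(-5, 0),       # log2 C in [-5, ..., -1]
              "gamma": 2.0 ** np.arange(-7, 11)}  # log2 gamma in [-7, ..., 10]
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)  # 3-fold CV
search.fit(X, y)
print(search.best_params_)
```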

The following table lists the calculated values to train classifiers appropriately:

Table 1. Best kernel function parameters for our work.

Kernel function    C        γ        coef0   degree
Linear             0.5      –        –       –
Radial             0.0312   64       –       –
Sigmoid            0.0312   64       0       –
Polynomial         1        (1/N)    0       1


3. Experiments and Discussions

We have implemented the proposed method in the MATLAB R2012a environment. In addition, we used LibSVM (version 3.1) [14] to train the SVM classifier. A typical use of LibSVM involves two steps: first, training a data set to obtain a model and, second, using the model to predict information about a testing data set. For C-Support Vector Classification (SVC) and SVR, LibSVM can also output probability estimates. To find the best values for the parameters of the SVM, we have used the logarithmic grid search explained before with a 3-fold cross-validation.

3.1. Breast Cancer Dataset

The Mammographic Image Analysis Society (MIAS) has generated a digital database of mammograms. Films taken from the UK National Breast Screening Programme have been digitized to a 50-micron pixel edge with a Joyce-Loebl scanning microdensitometer, a device linear in the optical density range [0 - 3.2], representing each pixel with an 8-bit word [11]. For our experiments, we have used the mini-MIAS database [11], which contains 322 mammograms belonging to three different categories and is obtained by digitizing the original MIAS database. It consists of 208 normal images, plus 63 benign and 51 malignant cases, which are considered abnormal. The database has been reduced to a 200-micron pixel edge and clipped/padded so that every image is 1024 × 1024 pixels. It contains background information such as the breast contour, so pre-processing of these images is required. The mini-MIAS database is publicly available for scientific research at the site of the University of Essex.

3.2. Measures for Performance Evaluation

The experiments were run on an Intel Core i3 2.40GHz with 4GB of RAM, which imposes some limitations due to the computational time demanded by this problem: adding a high number of features to the vectors used to train a classifier increases the execution time exponentially. The solution was to reduce the training data in order to reach a compromise between the total execution time of the algorithm and the quality of the obtained results. However, if the number of pixels is reduced in the training phase, the accuracy of the final results is affected considerably. The tests with feature vectors composed of 48 dimensions can be performed without problems, but beyond this threshold it has been difficult to get a clear estimation. For this reason, the results with feature vectors of 72 and 96 dimensions are shown as a percentage added to the mean and standard deviation experiments. A commonly used evaluation measure of the predictive ability of breast cancer detection systems is the F1 score. It helps to determine the effectiveness of our pixel-based classifier in terms of its capability to discriminate between cancerous and normal regions (see Table 2). In the statistical analysis of binary classification, the F1 score is a measure of a test's accuracy, defined as F1 = 2TP / (2TP + FN + FP), where TP, FN and FP are the number of true positives, false negatives and false positives, respectively. The F1 score can also be interpreted as a weighted average of the precision and recall, where it reaches its best value at 1 and its worst at 0.
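For concreteness, the two measures reported in Table 2 can be computed as follows (a small Python sketch written for this text; the function name is ours):

```python
def f1_and_accuracy(tp, fp, tn, fn):
    """F1 = 2TP / (2TP + FN + FP); accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    f1 = 2 * tp / (2 * tp + fn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return f1, accuracy
```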


Table 2. Quality scores corresponding to different configurations of the SVM classifier with 2 features (mean & stdev). TP, FP, TN, FN and F1 ratios are shown between 0 and 1.

Classifier        Window-size   TP     FP     TN     FN     F1     Accuracy
Linear SVM        1x1           0.21   0.18   0.49   0.12   0.53   70.13 %
                  33x33         0.21   0.17   0.49   0.13   0.52   70.04 %
                  51x51         0.18   0.19   0.47   0.15   0.45   65.67 %
Radial SVM        1x1           0.22   0.19   0.48   0.11   0.55   69.71 %
                  33x33         0.25   0.24   0.42   0.08   0.56   67.25 %
                  51x51         0.22   0.24   0.43   0.12   0.48   64.75 %
Sigmoid SVM       1x1           0.22   0.19   0.48   0.11   0.55   69.48 %
                  33x33         0.23   0.23   0.44   0.10   0.53   67.25 %
                  51x51         0.19   0.22   0.44   0.14   0.42   63.77 %
Polynomial SVM    1x1           0.16   0.13   0.53   0.17   0.45   69.51 %
                  33x33         0.10   0.10   0.56   0.23   0.24   66.47 %
                  51x51         0.09   0.10   0.56   0.24   0.22   65.63 %

In conclusion, our method needs to improve the detection of calcifications, since when we deal with dense mammograms the accuracy of the prediction is reduced. However, the 1x1 window size allows us to increase the accuracy up to 72.88%, unlike the 33x33 window, which obtains 69.45%. By contrast, circumscribed masses are well defined through the application of our method, even in dense mammograms. We have observed that the windows mentioned above (1x1 and 33x33) perform better, especially when the tumor size is close to the window size or a multiple of it. As for the kernels, all offer satisfactory results for this problem, but the linear and non-linear functions deserve most consideration, since the sigmoid gives predictions similar to the non-linear one and the polynomial obtains about 1% less accuracy. Another analysis is based on the receiver operating characteristic (ROC) curves, a fundamental tool for diagnostic test evaluation. Fig. 3 shows the multi-sized window curves corresponding to the sensitivity, specificity and mean AUC of all kernel functions. The true positive rate (sensitivity) is plotted as a function of the false positive rate (1 − specificity) for different cut-off points of a parameter. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between the two diagnostic groups, tumor/non-tumor (see Table 3).
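Such ROC curves and AUC values can be obtained, for example, with scikit-learn (a sketch for this text using synthetic labels and decision scores, not the paper's data):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(1)
y_true = rng.choice([0, 1], size=500)               # 1 = tumor pixel, 0 = non-tumor
scores = y_true + rng.normal(scale=1.5, size=500)   # e.g. SVM decision values

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC =", auc(fpr, tpr))
```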

Figure 3. ROC curves for the different multi-sized evaluation windows.


Table 3. ROC-curve parameters.

Window   Sensitivity   Specificity   1 - Specificity   Balanced AUC   AUC-ROC
1x1      0.6062        0.7425        0.2574            0.6744         0.7667
33x33    0.5921        0.7203        0.2797            0.6562         0.7679
51x51    0.5171        0.7158        0.2842            0.6164         0.7242

The results presented belong to the tests with two features (mean and stdev). In subsequent tests, we added the skewness and kurtosis moments. On the one hand, the third statistical moment increases the accuracy of the predicted images by up to 7-10%. On the other hand, the fourth moment also helps to improve the total percentage, but in a lower range: 3-5%. Therefore, our method can achieve 83-85% accuracy. Below, there are several images with the graphical result of our method. Every image includes information about the existing anomalies (see Fig. 4). The first row shows the location of the real tumor and the radius of the circle that roughly delimits the lesion, and the second row shows in gray the pixels that our algorithm detects as malignant.

Figure 4. Results example (% accuracy): Col 1. Radial win = 33x33 (89.07 %), Col 2. Sigmoid win = 51x51 (89.62 %), Col 3. Linear win = 1x1 (92.14 %), Col 4. Polynomial win = 1x1 (34.74 %), Col 5. Linear win = 33x33 (40.99 %), Col 6. Radial win = 51x51 (49.82 %).

4. Conclusions and Future Work

Detecting breast carcinomas in mammograms is a difficult task that has captured the attention of the scientific community. In this paper, we have reviewed the available methods to detect malignant tumor regions in mammographic images. Our methodology proposes a pixel-based classification approach for solving some of the problems of the current methods. First of all, the breast contour is extracted by a threshold method in order to remove the labels in the background region and the annotations in the frame from the whole image. Instead of using raw gray-level values or the gradient of the pixels, we apply the four statistical moments to build texture features based on Gabor filters. To estimate the probability of a point being part of a carcinoma, several SVM kernel functions are used. Our basic idea is developed through an accurate selection of the best features in the training stage to improve the prediction results. Thus, the malignant points in the breast region are found by classifying the pixels as tumor or non-tumor and selecting the ones with the highest probability of being malignant.


Experimental results on test data show that our method is able to detect breast tumors. We have evaluated the importance of the support vector machine optimization problem and of prior knowledge for determining the best strategy for feature selection, and we have shown that they are necessary for finding precise cancer-affected regions. In addition, we have compared our method against several off-the-shelf approaches and demonstrated its advantages in terms of both accuracy and stability. Future work will focus on the inner tumor regions and their surrounding parts. Our study will try to develop a breast biomarker intended to provide an automatic quantitative estimation of mammographic density [15]. As a result, we will be able to correlate the tumor density with specific subtypes of breast cancer and other prognosis factors.

Acknowledgements

This work was partly supported by the Spanish Government through projects TIN2012-37171-C02-01 and TIN2012-37171-C02-02.

References
[1] M. Fatih Akay, "Support vector machines combined with feature selection for breast cancer diagnosis", Expert Systems with Applications, vol. 36, pp. 3240-3247, 2009.
[2] P. S. Hajare, V. V. Dixit, "Gabor filter, PCA and SVM based breast tissue analysis and classification", International Journal of Advanced Research in Computer Engineering and Technology, vol. 1, no. 5, pp. 303-306, 2012.
[3] A. Oliver, J. Marti, R. Marti, A. Bosch, J. Freixenet, "A new approach to the classification of mammographic masses and normal breast tissue", IAPR International Conference on Pattern Recognition, vol. 4, pp. 707-710, 2006.
[4] K. Polat, S. Gunes, "Breast cancer diagnosis using least square support vector machine", Digital Signal Processing, vol. 17(4), pp. 694-701, 2007.
[5] A. Oliver, X. Llado, J. Freixenet, J. Marti, "False Positive Reduction in Mammographic Mass Detection Using Local Binary Patterns", MICCAI 2007, vol. 1, pp. 286-293, 2007.
[6] R. Lavanya, N. Nagarajan, M. Nirmala Devi, "False positive reduction in computer aided detection of mammographic masses using canonical correlation analysis", Journal of Theoretical and Applied Information Technology, vol. 59, no. 1, pp. 139-145, 2014.
[7] B. Elfarra, I. Abuhaiba, "New feature extraction method for mammogram computer aided diagnosis", International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 6, no. 1, pp. 13-36, 2013.
[8] H. S. Sheshadri, A. Kandaswamy, "Breast tissue classification using statistical feature extraction of mammograms", Medical Imaging and Information Sciences, vol. 23, no. 3, pp. 105-107, 2006.
[9] M. Sharma, R. B. Dubey, Sujata, S. K. Gupta, "Feature extraction of mammograms", International Journal of Advanced Computer Research, vol. 2, no. 3 (5), pp. 192-199, 2012.
[10] B. S. Manjuntah, W. Y. Ma, "Texture features for browsing and retrieval of image data", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837-842, 1996.
[11] The mini-MIAS database of mammograms: http://peipa.essex.ac.uk/info/mias.html
[12] C. Cortes, V. Vapnik, "Support-Vector Networks", Machine Learning, vol. 20, pp. 273-295, 1995.
[13] H. You, G. Rumbe, "Comparative study of classification techniques on breast cancer FNA biopsy data", International Journal of Artificial Intelligence and Interactive Multimedia, vol. 1, no. 3, pp. 6-13, 2010.
[14] C. C. Chang, C. J. Lin, "LIBSVM: a Library for support vector machines", ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[15] P. K. Saha, J. K. Udupa, E. F. Conant, D. P. Chakraborty, D. Sullivan, "Breast tissue density quantification via digitized mammograms", IEEE Transactions on Medical Imaging, vol. 20, no. 8, pp. 792-803, 2001.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-159

Improvement of Mass Detection in Breast X-Ray Images
M. Abdel-Nasser et al. (pp. 159-168)

[The body of this paper could not be recovered from the extracted text; only the running headers and page numbers survive.]
9 @: R $ 3$       3 R 4 S 2$ ; $ =>>D3

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-169

169

Validating and Customizing a Colour Naming Theory Lledó MUSEROSa, Ismael SANZa, Luis GONZALEZ-ABRILb, and Zoe FALOMIRc a Universitat Jaume I, bUniversidad de Sevilla, cUniversität Bremen

Abstract. The problem of colour naming consists of successfully recognizing, categorising and labeling colours. This paper presents a colour naming theory based on a Qualitative Colour Description (QCD) model, which is validated here by an experiment carried out by real users in order to determine whether the QCD model is close enough to common human colour understanding. Then, this paper presents the necessity to customize this QCD by means of a user profile, since different users may perceive colours differently because of different cultures, knowledge, experiences and even physiology. A standard QCD model adaptable to users could improve colour reference and grounding in human-machine communication situations. Keywords: colour naming, HSL colour space, user customization, colour perception.

1. Introduction Humans attach labels to relevant features of stimuli/things, so that a classification can be performed and further abstract work can be carried out. Colour is one of these essential features. Any technology that aims to reproduce some level of human intelligence, or involves human-machine communication regarding the environment (for instance, assistive technology or ambient intelligence systems), must be able to successfully recognize and categorize colours. From the point of view of colour naming research, Kay and Regier [1] found that: (i) colour categories appear to be organized around universal colour foci, but (ii) naming differences across languages do cause differences in colour cognition because colour categories are determined at their boundaries by language. Therefore, a cognitive colour-naming model should be parameterized not only in a universal way, so that an intelligent agent (robot or ambient intelligent system) can communicate prototypical colours to a human user, but also in a specific way, so that the system adapts to a specific type of user or community which understand colours differently. A great effort has been done during recent years to mimic the colour naming capabilities of humans. In the literature, colour-naming models have been defined using different colour spaces. Menegaz et al. [2] presented a model for computational colour categorization and naming based on the CIE Lab colour space and fuzzy partitioning. Van De Weijer and Schmid [3] presented a colour name descriptor based on the CIE Lab colour space. Mojsilovic [4] presented a computational model for colour

170

L. Museros et al. / Validating and Customizing a Colour Naming Theory

categorization and naming and extraction of colour composition based on the CIE Lab and HSL colour spaces. Seaborn et al. [5] defined fuzzy colour categories based on the Musell colour space (L*C*H). Liu et al. [6] converted the dominant colour of a region (in HSV space) into a set of 35 semantic colour names. Stanchev et al. [7] defined 12 fundamental colours based on the CIELUV colour space and used Johannes Itten’s theory of colour to define the contrasts light-dark, warm-cold, etc. Corridoni et al. [8] presented a model for colour-naming based on the Hue Saturation and Lightness (HSL) colour space and also introduced some semantic connotations, such as warm/cold or light/dark colours. Lammens [9] presented a computational model for colour perception and colour-naming based on the CIE XYZ, CIE Lab and NPP colour spaces. Berk et al. [10] defined the Colour Naming System (CNS), which quantizes the HSL space into 627 distinct colours. Here, the Qualitative Colour Description (QCD) [11] model, based on the HSL colour space –one of the most suitable colour spaces to be divided into intervals of values corresponding to colour names [11]– is used. In contrast to the work by Berk et al. [9], Corridoni et al. [7] and Mojsilovic [3], the QCD model is kept as simple as possible, since Conway [13] demonstrated that human beings can only identify a reduced number of colour names. Furthermore, the QCD model may be understandable by non-experts in colour. Regarding colour customization, [14] proposes a model for customization in fuzzy colour spaces. However, to the best of our knowledge there is no model for colour customization based on a qualitative colour model, as the work presented in this paper. In order to validate the proposed QCD model, an experiment designed to determine if the colour descriptions it produces are close to those of human beings is presented. This is achieved by determining the length of the intervals –corresponding to colour names–. Colour is assumed that is a subjective feature; for instance, there are people who categorize the colour of the fabric shown in row 5, column 2 of Figure 6 as yellow while others may categorize it as green. This is an example of how important it is to customize the colour intervals to a user profile for establishing successful communication. For example, the colour name included in a user interface, which can be both written and read aloud by a speech synthesizer application to help users to communicate (even blind and/or deaf users), will be understood by the system and the user as the same colorimetric feature. That is, there will not be ambiguities between the subjective perspective of the user and the colour reference system used by the machine. The remainder of this paper is organized as follows: Section 2 introduces the QCD model. Two experiments are described in Section 3: the first one is done in order to validate the QCD model, and the second one is defined in order to test the assumptions of the subjectivity of colour naming and the need for customizability

2. Overview of the Qualitative Colour Description (QCD) Model The QCD model [11] defines a reference system in HSL for qualitative colour description as QCRS={uH, uS, uL, QCLAB1…M,QCINT1…M}, where uH/S/L refers to units of Hue, Saturation and Lightness in the colour spindle (Figure 1). In the QCD model it is necessary to consider a number of colours (granularity of the system), which

L. Museros et al. / Validating and Customizing a Colour Naming Theory

171

is determined by a parameter M. Furthermore, once a qualitative colour is chosen, an interval has to be defined which shows all colours that in HSL coordinates are represented by this qualitative colour. These intervals are set using a maximum and a minimum values for each interval, which are parameters that are experimentally defined to match the settings of particular applications.

(KIWTG6JG3%&OQFGNQHVJG*5.EQNQWTURCEG

 In the HSL colour space the rainbow colours are located in the horizontal central circle. The colour lightness changes in the vertical direction. Therefore, light rainbow colours are located above, while dark rainbow colours are located below. The colour saturation changes from the boundary of the two cone bases to the axis of the cone bases and, therefore, pale rainbow colours are located inside the horizontal central circle. Finally, the vertical axis locates the qualitative colours corresponding to the grey scale. Let us consider in this paper a granularity of M = 5, which corresponds to: (1) grey colours, (2) rainbow colours, (3) pale rainbow colours, (4) light rainbow colours and, (5) dark rainbow colours, where the QCLABi and QCINTi for i = 1, … , 5 are: QCLAB1={G1,G2,G3,…,GKG} QCINT1={[0,gul1],(gul1,gul2],(gul2,gul3],…,(gulKG-1,100]ЩUL/СUHЩ[0,360]шСUSЩ[0,gusMAX]}

where KG colour names are defined for the grey scale in QCLAB1 whose corresponding intervals of values in HSL are determined by QCINT1. All the colours in this set can take any value of hue, values of saturation between 0 and gusMAX and values of lightness (gulKG)(gulKG) between 0 and 100, which determine the different colour names defined. Note that the saturation coordinate of the HSL colour space (US) determines if the colour corresponds to the gray scale or to the rainbow scale. QCLAB2={R1,R2,R3,…,RKR} QCINT2={(ruhKR-1,360]ш[0,ruh1],(ruh1,ruh2],..,(ruhKR-2,ruhKR-1]ЩUH/СULЩ СUSЩ(rusMIN,100]}

(rulMIN,rulMAX]

ш

where KR colour names are defined for the rainbow scale in QCLAB2 and considered the most saturated or strongest ones. In QCINT2, saturation can take values between rusMIN and 100, whereas lightness can take values between rulMIN and rulMAX. Here, hue (ruhKR) can take values between 0 and 360, and it determines the colour names defined for this set.

172

L. Museros et al. / Validating and Customizing a Colour Naming Theory

QCLAB3={pale_+QCLAB2} QCINT3={СUHЩQCINT2/СULЩ(rulMIN,rulMAX]шСUSЩ(gusMAX,rusMIN]}

where KR pale colour names are defined in QCLAB3 by adding the prefix pale_ to the colours defined for the rainbow scale (QCLAB2). The colour names defined in QCINT3 have the same interval values of hue as rainbow colours (QCINT2). The lightness intervals also coincide, but they differ from rainbow colours in their saturation, which can take values between gusMAX and rusMIN. QCLAB4={light_+QCLAB2} QCINT4={СUHЩQCINT2/СULЩ(rulMAX,100]шСUSЩ(rusMIN,100]} QCLAB5={dark_+QCLAB2} QCINT5={СUHЩQCINT2/СULЩ(rdul,rulMIN]шСUSЩ(rusMIN,100]}

where KR light and dark colour names are defined in QCLAB4 and QCLAB5, respectively, by adding the prefixes dark_ and light_ to the colour names in the rainbow scale (QCLAB2). The intervals of values for dark and light colour sets (QCINT4 and QCINT5, respectively) take the same values of hue as rainbow colours (QCINT2). The saturation intervals also coincide, but the lightness coordinate (UL) differs and determines the luminosity of the colour (dark or light) taking values between rulMAX and 100 for light colours and between rul and rulMIN for dark colours.

3. Experimentation and results In this section two experiments are presented: the first experiment is intended to validate the QCD model as generating names close to human colour perception; and the second one is intended to demonstrate the necessity of customizing the colour description based on a user profile 3.1. First Experiment: validating the QCD model When people name colours, they are influenced by many aspects, such as culture, visual ability, experience, and so on. Therefore, it is very difficult to select the set of tags to name colours used in an application. For that reason, in order to parameterize the QCD model using a taxonomy of colours as general as possible, an experiment was done in [10] where people were asked to freely determine a name and an adjective (if necessary) for describing a displayed colour. From their responses it was observed that the most used names were the colours {red, orange, yellow, green, turquoise, blue, purple, pink} and the adjectives {pale, light, dark}. With respect to the grey scale, the most used colour names were {black, dark_grey, grey, light_grey, white}. Therefore, the QCD was parameterized using a QCRS including 11 basic colours (black, grey, white, red, orange, yellow, green, turquoise, blue, purple, pink) and the semantic descriptors pale, light and dark. That is, a total of 5 + 8 × 4 = 37 colour names were obtained. The answers not appearing in the most common set of colours were discarded from the initial test and the thresholds in HSL to determine these prototypical 37 colours were calculated using the AMEVA discretization algorithm [15]. The resulting intervals have been used in this test in order to validate this specific parameterization of the QCD model, that is, to test that these intervals divide the HSL colour space into the same colour names most people use and recognize. To validate

L. Museros et al. / Validating and Customizing a Colour Naming Theory

173

this parameterization of the QCD model, a test similar to the previous one has been carried out, but now participants cannot freely chose a colour name, but can only choose among the 37 colour names of the model. The initial parameterization of the QCD model is shown in Figure 2, where the distribution in the H coordinate is depicted in Figure 2(a), while Figure 2(b) shows the distribution in the S and L coordinates.

C                D 

(KIWTG&KCITCOFGUETKDKPIVJG3%&RCTCOGVGTK\CVKQPKPVQEQNQWTUGVUCPFEQNQWTPCOGU

  In order to validate this QCD parameterization, a web application has been designed (see Figure 3) which automatically generates and shows a random RGB colour to the participants, and then it asks them to select which name they would use to describe the colour shown. Then, it also asks the participants the adjective that they would like to add to the colour name (if one adjective is necessary). A total of 545 responses were obtained, and they were used to modify the intervals for colour names. The process followed to modify each interval threshold is explained below.

(KIWTG9GDCRRNKECVKQPVQXCNKFCVGVJG3%45OQFGN



174

L. Museros et al. / Validating and Customizing a Colour Naming Theory

Let us set, a priori, an absolute factor Ǭ>0 and, for the sake of simplicity, let us choose as a toy example the case of establishing the thresholds between a colour with the adjective light and the prototypical colour (a colour without adjective), and Ǭ=0.2. However, it is worth noting that this procedure is carried out between all adjacent colours. In this case, after the experiment, the confusion matrix obtained is shown in Table 1. Table 1. Confussion matrix between adjective light and a prototypical colour. Colour given by the system

Colour given by the user

Prototypical

Light

Prototypical

64 (OK)

29 (FLN)

Light

67 (FNL)

51 (OK)

In Table 1 FLN represents the number of times that the user has chosen a prototypical colour, but the system has classified the colour with the light adjective. FNL represents the number of times that the user has chosen a light colour while the system classified it as a prototypical colour. In this case, the shared boundary of the QCD model is US=55. Therefore, if the user determines that the colour has no adjective, and the QCD model determines light, then it is assumed that the QCD model is not correct and it is necessary to increase the threshold between the different set of colours (with the light adjective or without it) in order to try to solve the mistake. This has to be done using the δ factor, for FLN times. The same reasoning is followed when the system determines that there is no need to an adjective and the user determines the light adjective. Then, the threshold has to be decreased by δ×FNL times to try to solve the mistake. Therefore, the new threshold would be: USNew= USQCDmodel + δ × (FLN – FNL) = 55 + 0,2×(29-67) = 47,4. This process is carried out with all the intervals. Table 2 shows the result of the experiment with respect to the adjective of the colour.

User Perception

6CDNG4GUWNVUQHVJGGZRGTKOGPVYKVJTGURGEVVQVJGWUGQHCPCFLGEVKXG QCD model Dark none light pale dark 75 23 11 12 none 40 64 29 16 light 13 67 51 17 pale 28 18 50 31 Total

Total 121 149 148 127 545

It can be seen that the percentage of success is 41% (total number of agreements between the colour given by the user and the colour provided by the system). This value could be consider low, but the perception of this feature of a colour is very subjective, as it can be seen in the table (29 user observations determine that it is not

175

L. Museros et al. / Validating and Customizing a Colour Naming Theory

necessary to use an adjective, while the model determines light, and 67 observation determine light when the system determines that there is no need for an adjective, and the same happens between light and pale). As a result, by considering the adjectives that share a boundary (Figure 3) and considering δ equal to 0.2 as a baseline (parameter fixed experimentally), the thresholds in table 3 have been obtained. With respect to the adjective in the scale of greys, no change has been considered since the number of inputs with these features has been low (8 observations). Let us indicate that the percentage of success between grey scale colour or the rest of colours has been of 95.97%, that is, the QCD is well-designed in order to distinguish between grey and not grey.

dark-none dark-pale none-light none-pale light-pale grey-colour

6CDNG0GYVJTGUJQNFHQTCFLGEVKXGEQNQWTU Initial QCD Model New QCD Model US UL US -40 --40 --55 -50 -50,4 50 -56,6 20 -22

UL 36,6 36,8 47,4 ----

Table 4 is obtained by considering only the colour without adjective (QCLAB2). It shows that the percentage of success has been 67%, which we consider a good result given that, as mentioned above, colour perception is very subjective.

User Red orange yellow Green turquoise Blue purple Pink

red 36 14 0 1 0 0 4 10

6CDNG3%.#$EQNQWTRGTEGRVKQPTGUWNVU QCD Model orange Yellow green turquoise blue 2 1 0 0 0 19 2 0 0 0 4 27 0 0 0 0 22 107 7 0 0 0 4 22 0 0 0 4 32 65 0 0 0 0 11 0 0 0 0 0

purple 0 0 0 0 1 3 34 5

pink 1 0 0 0 2 0 30 26

Analyzing the results in Table 4, and considering only neighbouring colours (colours sharing a boundary) and with a δ equal to 0.72 (δ was set to 0.2 in a scale up to 100, but it must be scaled to 360 in this case; thus, δ =0.2×360/100=0.72), then the new thresholds obtained are as follows (see Figure 4 for a graphical interpretation of the new thresholds). 3%.#$]TGFQTCPIG[GNNQYITGGPVWTSWQKUGDNWGRWTRNGRKPM_ 3%+06] ?ш=? ? ? ?  ? ? ? ?Щ7*С75Щ ?шС7.Щ ?_

176

L. Museros et al. / Validating and Customizing a Colour Naming Theory

(KIWTTG)TCRJKECNKPPVGTRTGVCVKQPQHVJ JGPGYEQNQWTKPVGGTXCNU

3.2. Secon nd Experimentt: is it necessa ary to customiize the QCD parameterizati p ion accorrding to a userr-profile? To test th he assumption ns about the subjectivity of colour naaming and thee need for customizaability, this second experim ment has been carried out in ncluding speciial objects, such as clothes, c whicch can have different colours and shaadows. The aapplication d for this expeeriment has beeen designed to t be adaptablle to multiple platforms, developed and sincee a future go oal is that th he system could learn fro om people inn everyday situations,, it is designed d to be availab ble for mobilee devices too. Given n an image, th he application n determines the t predominaant colour of the picture from the set s of colours of the QCD parameterizat p tion (Figure 5aa). Then it askks the user if she agreees with the name n given to o the colour. If I the user agrrees, the coloour name is stored into o the databasee. If the user disagrees, d the application assks the user too determine the colourr name by: (i) freely giving a name to thee colour (Figu ure 5b), and (iii) selecting one of thee colour namess of the QCD parameterizattion (Figure 5c). +PQTFFGTVQECTT[QWWVVJGVGUVUCFFCVCDCUGQH KOCIGUQHFKHHGTGPVHCDTKEEUJCUDGGP ETGCVGF ( (KIWTG   GCCEJ QPG CRRCCTGPVN[ HKVVKPI I QP QPG QHH VJG  EQNQQWT PCOGU EQPUKFGTGFFD[VJG3%& &OQFGNFor each e image itss histogram is calculated annd the most representaative colour iss the one seleccted as the collour of the fab bric. This imagge analysis is very sim mple and doess not account for colour co onstancy, spattial relations oor shadows among colours. This su uits the experim mental databaase; to adapt th he applicationn to be used ng, a correction n method [3] could be appllied for obtainning a more in a more general settin olour analysis of the image. precise co 

(a)

(b)

(c)

(KIWTG5PPCRUJQVUQHVJGCRRRNKECVKQPFGXGNQRGFHQTVJGUGEQPPFGZRGTKOGPV

177

L. Museros et al. / Validating and Customizing a Colour Naming Theory

Five users have done the tests and all of them have labeled the colour of the 37 fabrics. The results can be seen in Table 5, in which it is shown the percentage of agreement (the user suggests the same colour than the system), considering the colour plus adjective, only the colour or only the adjective.

(KIWTG1XGTXKGYQHVJGHCDTKEUQHVJGFCVCDCUGGCEJRKEVWTGUJQYUVJGEQNQWTPCOGIKXGPD[QWTOQFGN CPFVJGPWODGTQHRGQRNGVJCVCITGGQTPQVYKVJVJKURGTEGRVKQP KPCPKPKVKCNVGUVRJCUG 

 

User A B C D E

Mean

6CDNG4GUWNVUQHVJGUGEQPFGZRGTKOGPV Colour&Adjective Colour 59,46 70,27 86,49 89,19 64,86 70,27 43,24 56,76 64,86 70,27

63,78

71,35

Adjective 67,57 91,89 75,68 59,46 67,57

72,43

The most relevant result is that there is a great variability, which again shows that colour is a very subjective perception. For instance, user B agrees with the system 86.49% of the times, while user D only agrees 43,24%. Therefore, as each user sees the colour in a different way, it is very important to customize a colour system able to communicate in an efficient way with users. An analysis considering the fabrics can be done as well (Table 6). The first column of Table 6 shows the number of users that agree with the system in the name given to a fabric. For instance, there are 12 fabrics (32,43%) for which the system and the 5 users who have done the test agree in the colour and adjective given; 19 (51,35%) fabrics for which the users and the system agree in the colour name given, and 14 fabrics (37,84%) for which the users and the system agree in the adjective given. 6CDNG5WOOCT[QHPWODGTQHVKOGUCWUGTCITGGUKPVJGEQNQWTIKXGPVQCHCDTKE Agree Users Colour&Adjetive Colour Adjective 0 4 3 2 1 4 5 2 2 4 2 4 3 6 4 6 4 7 4 9 5 12 19 14

178

L. Museros et al. / Validating and Customizing a Colour Naming Theory

It is interesting to note that, for 4 fabrics (10,81%) the system gives a different colour plus adjective to the one given by all 5 users. Therefore, it is very important to customize the QCD model to the colour user profile. By repeating experiment number 1 for each target user, it is possible to customize the QCD model to her colour user-profile.

4. Conclusions A validated QCD model is presented in this paper. The necessity to customize it to a user profile has also been demonstrated, and a method to carry out this customization has been provided, based on the modification of the original colour intervals, wich provides a simple but general method. As future work we intend to customize the QCD model and test if the customized resulting model fits the user expectations. Moreover, a fuzzy colour system can be developed based on the results of the experiments presented in this paper. Acknowledgments This work was funded by Universität Bremen, the European Commission through FP7 Marie Curie IEF actions under project COGNITIVE-AMI (GA 328763), the interdisciplinary Trasregional Collaborative Research Center Spatial Cognition SFB/TR 8, the Deutscher Akademischer Austausch Dienst (DAAD), the Andalusian Regional Ministry of Economy (project SIMON TIc-8052), the Spanish Ministry of Economy and Competitiveness (project TIN2011-24147), GeneralitatValenciana (project GVA/2013/135) and Universitat Jaume I (project P11B2013-29). References [1]

P. Kay and T. Regier. Language, thought, and color: Recent developments. Trends in Cognitive Sciences, 10(2): 51–54, 2006. [2] G. Menegaz, A.L. Troter, J. Sequeira, J.M. Boi, A discrete model for color naming, EURASIP Journal of Applied Signal Process, Special Issue on Image Perception (2007) pp. 1–10. [3] J. Van De Weijer, C. Schmid, Applying color names to image description, in: ICIP (3), IEEE, 2007, pp. 493–496. [4] A. Mojsilovic, A computational model for color naming and describing color composition of images, IEEE Transactions on Image Processing 14 (5) (2005) pp. 690–699. [5] M. Seaborn, L. Hepplewhite, T.J. Stonham, Fuzzy colour category map for the measurement of colour similarity and dissimilarity, Pattern Recognition 38 (2) (2005) 165–177. [6] Y. Liu, D. Zhang, G. Lu, W. Ma, Region-based image retrieval with perceptual colors, in: Proc. of Pacific-Rim Multimedia Conference (PCM2004), 2004, pp. 931–938. [7] P.L. Stanchev, D. Green Jr., B. Dimitrov, High level colour similarity retrieval, International Journal on Information Theories and Applications 10 (3) (2003) pp. 363–369. [8] J.M. Corridoni, A.D. Bimbo, E. Vicario, Image retrieval by color semantics with incomplete knowledge, Journal of the American Society for Information Science 49 (3) (1998) 267–282. [9] J.M.G. Lammens, A computational model of color perception and color naming, Ph.D. thesis, Faculty of the Graduate School of State University of New York at Buffalo, USA, 1994. [10] T. Berk, L. Brownston, A. Kaufman, A new color-naming system for graphics languages, IEEE Computer Graphics and Applications, 2: 37–44, 1982. [11] Z. Falomir, Ll. Museros, L. González-Abril, I. Sanz, A model for qualitative colour comparison using interval distances, Displays, 34 (4): 250-257, 2013. [12] P. Gärdenfors, Conceptual Spaces: On The Geometry Of Thought, MIT Press, Cambridge, MA, 2000.

L. Museros et al. / Validating and Customizing a Colour Naming Theory

179

[13] D. Conway, An experimental comparison of three natural language colour naming models. In: Proc. East–West International Conference on Human– Computer Interactions, pp. 328–339, 1992 [14] J.M. Soto-Hidalgo, J. Chamorro-Martínez, D. Sánchez, A new approach for defining a fuzzy color space. In:Proc. 2010 IEEE International Conference on Fuzzy Sustems (FUZZ), pp.1-6, 2010. [15] L. Gonzalez-Abril, F.J. Cuberos, F. Velasco y J.A. Ortega. Ameva: An autonomous discretization, Expert Systems with Applications Volumen 36, Issue 3, Part 1: 5327-5332, 2009.

This page intentionally left blank

Fuzzy Logic and Reasoning I

This page intentionally left blank

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-183

183

One-dimensional T -preorders D. Boixader and J. Recasens Secció Matemàtiques i Informàtica ETS Arquitectura del Vallès Universitat Politècnica de Catalunya Pere Serra 1-15 08190 Sant Cugat del Vallès Spain {dionis.boixader,j.recasens}@upc.edu

Abstract. This paper studies T -preorders by using their Representation Theorem that states that every T -preorder on a set X can be generated in a natural way by a family of fuzzy subsets of X. Especial emphasis is made on the study of onedimensional T -preorders (i.e.: T -preorders that can be generated by only one fuzzy subset). Strong complete T -preorders are characterized. Keywords. T -preorder, Representation Theorem, generator, dimension, strong complete T -preorder

Introduction T -preorders were introduced by Zadeh in [8] and are very important fuzzy relations, since they fuzzify the concept of preorder on a set. Although there are many works studying their properties and applications to different fields, starting with [8], [7] [1], authors have not paid much attention to their relationship with the very important Representation Theorem. Roughly speaking the Representation Theorem states that every fuzzy subset μ of a set X generates a T -preorder Pμ on X in a natural way and that every T -preorder can be generated by a family of such special T -preorders. The Representation Theorem provides us with a method to generate a T -preorder from a family of fuzzy subsets. These fuzzy subsets can measure the degrees in which different features are fulfilled by the elements of a universe X or can be the degrees of compatibility with different prototypes. Reciprocally, from a T -preorder a family (in fact many families) of fuzzy subsets can be obtained providing thus semantics to the relation. This paper provides some results of T -preorders related to the Representation Theorem. Special attention is paid to one-dimensional T -preorders (i.e.: T -preorders generated by a single fuzzy subset) because they are the bricks out of which T -preorders are built. The fuzzy subsets that generate the same T -preorder are determined (Propositions 2.1, 2.3 and 2.5) and a characterization of one-dimensional T -preorders by the use of Sincov-like functional equations is provided in Propositions 2.12 and 2.13. Also the relation with T -preorders and reciprocal matrices [6] will allow us to find a one-dimensional T -preorder close to a given one as explained in Example 2.17.

184

D. Boixader and J. Recasens / One-Dimensional T-Preorders

A strong complete T -preorder P on a set X is a T -preorder satisfying that for x, y ∈ X, either P (x, y) = 1 or P (y, x) = 1. These are interesting fuzzy relations used in fuzzy preference structures [4]. It is a direct consequence of Lemma 1.6 that one-dimensional T -preorders are strong complete, but there are strong complete T -preorders that are not one-dimensional. In Section 3 strong complete T -preorders are characterized using the Representation Theorem (Propositions 3.4 and 3.5). The last section of the paper contains some concluding remarks and an interesting open problem: Which conditions must a couple of fuzzy subsets μ and ν fulfill in order to exist a t-norm T with Pμ = Pν . Also the possibility of defining two dimensions (right and left) is discussed. A section of preliminaries with the results and definitions needed in the rest of the paper follows.

1. Preliminaries This section contains the main definitions and properties related mainly to T -preorders that will be needed in the rest of the paper. Definition 1.1. [8] Let T be a t-norm. A fuzzy T -preorder P on a set X is a fuzzy relation P : X × X → [0, 1] satisfying for all x, y, z ∈ X • P (x, x) = 1 (Reflexivity) • T (P (x, y), P (y, z)) ≤ P (x, z) (T -transitivity). Definition 1.2. The inverse or dual R−1 of a fuzzy relation R on a set X is the fuzzy relation on X defined for all x, y ∈ X by R−1 (x, y) = R(y, x). Proposition 1.3. A fuzzy relation R on a set X is a T -preorder on X if and only if R−1 is a T -preorder on X. → − Definition 1.4. The residuation T of a t-norm T is defined for all x, y ∈ [0, 1] by − → T (x|y) = sup{α ∈ [0, 1] such that T (α, x) ≤ y}. Example 1.5. 1. If T is a continuous Archimedean t-norm with additive generator t, then → − T (x|y) = t[−1] (t(y) − t(x)) for all x, y ∈ [0, 1]. As special cases, → − • If T is the Łukasiewicz t-norm, then T (x|y) = min(1 − x + y, 0) for all x, y ∈ [0, 1]. → − • If T is the Product t-norm, then T (x|y) = min( xy , 1) for all x, y ∈ [0, 1].  → − y if x > y 2. If T is the minimum t-norm, then T (x|y) = 1 otherwise.

D. Boixader and J. Recasens / One-Dimensional T-Preorders

185

Lemma 1.6. Let μ be a fuzzy subset of X. The fuzzy relation Pμ on X defined for all x, y ∈ X by → − Pμ (x, y) = T (μ(x)|μ(y)) is a T -preorder on X. Theorem 1.7. Representation Theorem [7]. A fuzzy relation R on a set X is a T -preorder on X if and only if there exists a family (μi )i∈I of fuzzy subsets of X such that for all x, y ∈ X R(x, y) = inf Rμi (x, y). i∈I

Definition 1.8. A family (μi )i∈I in Theorem 1.7 is called a generating family of R and an element of a generating family is called a generator of R. The minimum of the cardinalities of such families is called the dimension of R (dimR) and a family with this cardinality a basis of R. A generating family can be viewed as the degrees of accuracy of the elements of X to a family of prototypes. A family of prototypes with low cardinality, especially a basis, simplifies the computations and gives clarity to the structure of X. The next proposition states a trivial but important result. Proposition 1.9. μ is a generator of R if and only if Rμ ≥ R. Definition 1.10. Two continuous t-norms T, T  are isomorphic if and only if there exists a bijective map ϕ : [0, 1] → [0, 1] such that ϕ ◦ T = T  ◦ (ϕ × ϕ). Isomorphisms ϕ are continuous and increasing maps. It is well known that all strict continuous Archimedean t-norms T are isomorphic. In particular, they are isomorphic to the Product t-norm and T (x, y) = ϕ−1 (ϕ(x) · ϕ(y)) Also, all non-strict continuous Archimedean t-norms T are isomorphic. In particular, they are isomorphic to the Łukasiewicz t-norm and T (x, y) = ϕ−1 (max(ϕ(x) + ϕ(y) − 1), 0). → → − − Proposition 1.11. If T, T  are two isomorphic t-norms, then their residuations T , T  → − also are isomorphic (i.e. there exists a bijective map ϕ : [0, 1] → [0, 1] such that ϕ◦ T = − → T  ◦ (ϕ × ϕ)). 2. One-dimensional T -preorders Let us recall that according to Definition 1.8 a T -preorder P on X is one dimensional if and only if there exists a fuzzy subset μ of X such that for all x, y ∈ X, P (x, y) = → − T (μ(x)|μ(y)).

186

D. Boixader and J. Recasens / One-Dimensional T-Preorders

2.1. Generators of One-dimensional T -preorders For a one-dimensional T -preorder P it is interesting to find all fuzzy subsets μ that are a basis of P (i.e.: P = Pμ ). The two next propositions answer this question for continuous Archimedean t-norms and for the minimum t-norm. Proposition 2.1. Let T be a continuous Archimedean t-norm, t an additive generator of T and μ, ν fuzzy subsets of X. Pμ = Pν if and only if ∀x ∈ X the following condition holds: t(μ(x)) = t(ν(x)) + k with k ≥ sup{−t(ν(x))|x ∈ X} Moreover, if T is non-strict, then k ≤ inf{t(0) − t(ν(x)) | x ∈ X}. Proof. ⇒) If μ(x) ≥ μ(y), then → − Pμ (x, y) = T (μ(x)|μ(y)) = t−1 (t(μ(y)) − t(μ(x))) Pν (x, y) = t−1 (t(ν(y) − t(ν(x))) where t[−1] is replaced by t−1 because all the values in brackets are between 0 and t(0). If Pμ = Pν , then t(μ(y)) − t(μ(x)) = t(ν(y)) − t(ν(x)). Let us fix y0 ∈ X.Then t(μ(y0 )) − t(μ(x)) = t(ν(y0 )) − t(ν(x)). and t(μ(x)) = t(ν(x)) + t(μ(y0 )) − t(ν(y0 )) = t(ν(x)) + k ⇐) Trivial thanks to Example 1.5.1. Example 2.2. With the previous notations, • If T is the Łukasiewicz t-norm, then μ(x) = ν(x) + k with inf {1 − ν(x)} ≥ k ≥ sup {−ν(x)}. x∈X

x∈X

• If T is the product t-norm, then μ(x) =

ν(x) with k ≥ sup {ν(x)}. k x∈X

D. Boixader and J. Recasens / One-Dimensional T-Preorders

187

Proposition 2.3. Let T be the minimum t-norm, μ a fuzzy subset of X and xM an element of X with μ(xM ) ≥ μ(x) ∀x ∈ X. Let Y ⊂ X be the set of elements x of X with μ(x) = μ(xM ) and s = sup{μ(x) such that x ∈ X − Y }. A fuzzy subset ν of X generates the same T -preorder than μ if and only if ∀x ∈ X − Y μ(x) = ν(x) and ν(y) = t with s ≤ t ≤ 1 ∀y ∈ Y. Proof. It follows easily from the fact that  μ(y) if μ(x) > μ(y) Pμ (x, y) = 1 otherwise.

At this point, it seems that the dimension of P and of P −1 should coincide, but this is not true in general as we will show in the next example. Nevertheless, for continuous Archimedean t-norms they do coincide in most of the cases as will be proved in Proposition 2.5. Example 2.4. Consider the one-dimensional min-preorder P of X = {x1 , x2 , x3 } generated by the fuzzy subset μ = (0.8, 0.7, 0.4). Its matrix is x1 x2 x3 ⎞ x1 1 1 1 x2 ⎝ 0.7 1 1 ⎠ x3 0.4 0.4 1 ⎛

while the matrix of P −1 is x1 x1 1 x2 ⎝ 1 x3 1 ⎛

x2 x3 ⎞ 0.7 0.4 1 0.4 ⎠ 1 1

which clearly is not one-dimensional. Proposition 2.5. Let T be a continuous Archimedean t-norm, t an additive generator of T , μ a fuzzy subset of X and Pμ the T -preorder generated by μ. Then Pμ−1 is generated by the fuzzy subset ν of X such that t(ν(x)) = −t(μ(x)) + k. Proof. Pμ−1 (x, y) = Pμ (y, x) = t[−1] (t(μ(x)) − t(μ(y))) = t[−1] (−t(μ(y)) + k + t(μ(x)) − k) = t[−1] (t(ν(y)) − t(ν(x))) = Pν (x, y).

Example 2.6. • If T it the t-norm of Łukasiewicz, μ a fuzzy subset of X and P the T -preorder

188

D. Boixader and J. Recasens / One-Dimensional T-Preorders

on X generated by μ (i.e.: P = Pμ ), then P −1 is generated by k − μ, with supx∈X {μ(x)} ≤ k ≤ 1 + inf x∈X {μ(x)}. • If T it the product t-norm, μ a fuzzy subset of X such that inf x∈X {μ(x)} > 0 and P the T -preorder on X generated by μ (i.e.: P = Pμ ), then P −1 is generated by k μ , with 0 < k ≤ inf x∈X {μ(x)}. Hence, the dimensions of a T -preorder P and its inverse P −1 coincide when T is the t-norm of Łukasiewicz (and any other continuous non-strict Archimedean t-norm) while for the product t-norm (and any other continuous strict t-norm) coincide when inf x,y∈X {P (x, y)} = 0. 2.2. Sincov Functional Equation and AHP Definition 2.7. [3] A mapping F : X × X → R satisfies the Sincov functional equation if and only if for all x, y, z ∈ X we have F (x, y) + F (y, z) = F (x, z). The following result characterizes the mappings satisfying Sincov equation. Proposition 2.8. [3] A mapping F : X ×X → R satisfies the Sincov functional equation if and only if there exists a mapping g : X → R such that F (x, y) = g(y) − g(x) for all x, y ∈ X. Proposition 2.9. The real line R with the operation ∗ defined by x ∗ y = x + y − 1 for all x, y ∈ R is an Abelian group with 1 as the identity element. The opposite of x is −x + 2. Replacing the addition by this operation ∗ we obtain a Sincov-like functional equation: Proposition 2.10. Let F : X ×X → R be a mapping. F satisfies the functional equation F (x, y) ∗ F (y, z) = F (x, z)

(1)

if and only if there exists a mapping g : X → R such that F (x, y) = g(y) − g(x) + 1 for all x, y ∈ X. Proof. The mapping G(x, y) = F (x, y) − 1 satisfies the Sincov functional equation and so G(x, y) = g(y) − g(x). Replacing the addition by multiplication we obtain another Sincov-like functional equation:

D. Boixader and J. Recasens / One-Dimensional T-Preorders

189

Proposition 2.11. Let F : X × X → R+ be a mapping. F satisfies the functional equation F (x, y) · F (y, z) = F (x, z)

(2)

if and only if there exists a mapping g : X → R+ such that F (x, y) =

g(y) g(x)

for all x, y ∈ X. Proof. Simply calculate the logarithm of both hand sides of the functional equation to transform it to Sincov functional equation. If μ is a fuzzy subset of X, we can consider μ as a mapping from X to R or to R+ . This will allow us to characterize one-dimensional T -preorders on X when T is a continuous Archimedean t-norm. For this purpose we will use the isomorphism ϕ between T and the Łukasiewicz or the Product t-norm. Proposition 2.12. Let T (x, y) = ϕ−1 (max(ϕ(x) + ϕ(y) − 1), 0) be a non-strict Archimedean t-norm and X a set. F : X × X → R satisfies equation (1) if and only if ϕ ◦ P = min(ϕ ◦ F, 1) is a one-dimensional T -preorder on X. Proposition 2.13. Let T (x, y) = ϕ−1 (ϕ(x) · ϕ(y)) be a non-strict Archimedean t-norm and X a set. F : X ×X → R+ satisfies equation (2) if and only if ϕ◦P = min(ϕ◦F, 1) is a T -preorder on X. Definition 2.14. [6] An n × n real matrix A with entries aij > 0, 1 ≤ i, j ≤ n, is reciprocal if and only if aij = a1ji ∀i, j = 1, 2, ..., n. A reciprocal matrix is consistent if and only if aik = aij · ajk ∀i, j = 1, 2, ..., n. If the cardinality of X is finite (i.e.: X = {x1 , x2 , ..., xn }), then we can associate the matrix A = (aij ) with entries aij = F (xi , xj ) to every map F : X × X → R+ . Then F satisfies equation (2) if and only if A is a reciprocal consistent matrix as defined in [6]. Proposition 2.11 can be rewritten in this context by Proposition 2.15. [6] An n × n real matrix A is reciprocal and consistent if and only if there exists a mapping g of X such that aij =

g(xi ) ∀i, j = 1, 2, ..., n. g(xj )

For a given reciprocal matrix A, Saaty obtains a consistent matrix A close to A [6]. A is generated by an eigenvector associated to the greatest eigenvalue of A and fulfills the following properties. 

1. If A is already consistent, then A = A . 2. If A is a reciprocal positive matrix, then the sum of its eigenvalues is n.

190

D. Boixader and J. Recasens / One-Dimensional T-Preorders

3. If A is consistent, then there exist a unique eigenvalue λmax = n different from zero. 4. Slight modifications of the entries of A produce slight changes to the entries of A . We will use the third property to obtain a one-dimensional T -preorder close to a given one. Definition 2.16. [6] The consistent matrix A = (aij ) associated to a T -preorder P = (pij ) i, j = 1, 2, ..., n is defined by aij = pij if pij ≤ pji aij =

1 if pij > pji . pji

Then in order to obtain a one dimensional T -preorder P  close to a given one P (T the Product t-norm), the following procedure can be used: • Calculate the consistent reciprocal matrix A associated to P . • Find an eigenvector μ of the greatest eigenvalue of A. • P  = Pμ . Example 2.17. Let T be the Product t-norm and P the T -preorder on a set X of cardinality 5 given by the following matrix. ⎛ ⎞ 1 1 1 1 1 ⎜ 0.74 1 1 1 1⎟ ⎜ ⎟ ⎜ 1 1⎟ P = ⎜ 0.67 0.87 1 ⎟. ⎝ 0.50 0.65 0.74 1 1⎠ 0.41 0.53 0.60 0.80 1 Its associated reciprocal matrix A is ⎛ ⎞ 1 1.3514 1.4925 2.0000 2.4390 ⎜ 0.7400 1 1.1494 1.5385 1.8868 ⎟ ⎜ ⎟ ⎜ 1.3514 1.6667 ⎟ A = ⎜ 0.6700 0.8700 1 ⎟. ⎝ 0.5000 0.6500 0.7400 1 1.2500 ⎠ 0.4100 0.5300 0.6000 0.8000 1 Its greatest eigenvalue is 5.0003 and an eigenvector for 5.0003 is μ = (1, 0.76, 0.67, 0.50, 0.40). This fuzzy set generates Pμ which is a one dimensional T -preorder close to P . ⎛

⎞ 1 1 1 1 1 ⎜ 0.76 1 1 1 1⎟ ⎜ ⎟ ⎜ 1 1⎟ Pμ = ⎜ 0.67 0.88 1 ⎟. ⎝ 0.50 0.66 0.74 1 1⎠ 0.40 0.53 0.60 0.81 1

D. Boixader and J. Recasens / One-Dimensional T-Preorders

191

The results of this section can be easily generalized to continuous strict Archimedean t-norms. If T  is a continuous strict Archimedean t-norm, then it is isomorphic to the Product t-norm T . Let ϕ be this isomorphism. If P is a T  -preorder, then ϕ◦P is a T -preorder. We can find P  one-dimensional close to ϕ ◦ P as before. Since isomorphisms between continuous t-norms are continuous and preserve dimensions, ϕ−1 ◦ P  is a one-dimensional T  -preorder close to P .

3. Strong Complete T -preorders Definition 3.1. [4] A T -preorder P on a set X is a strong complete T -preorder if and only if for all x, y ∈ X, max(P (x, y), P (y, x)) = 1. Of course every one-dimensional fuzzy T -preorder is a strong complete T -preorder, but there are strong complete T -preorders that are not one-dimensional. In Propositions 3.3 and 3.5 these fuzzy relations will be characterized exploiting the fact that they generate crisp linear orderings. Lemma 3.2. Let μ be a generator of a strong complete T -preorder P on X. If P (x, y) = 1, then μ(x) ≤ μ(y). → − Proof. Trivial, since T (μ(x)|μ(y)) = Pμ (x, y) ≥ P (x, y) = 1. Proposition 3.3. Let μ, ν be two generators of a strong complete T -preorder P on X. Then for all x, y ∈ X, μ(x) ≤ μ(y) if and only if ν(x) ≤ ν(y). Proof. Given x, y ∈ X, x = y, let us suppose that P (x, y) = 1 (and P (y, x) < 1). Then μ(x) ≤ μ(y) and ν(x) ≤ ν(y). Proposition 3.4. Let P be a strong complete T -preorder on a set X. The elements of X can be totally ordered in such a way that if x ≤ y, then P (x, y) = 1. Proof. Consider the relation ≤ on X defined by x ≤ y if and only if μ(x) ≤ μ(y) for any generator μ of P . (If for x = y, μ(x) = μ(y) for any generator, then chose either x < y or y < x). Reciprocally, Proposition 3.5. If for any couple of generators μ and ν of a T -preorder P on a set X μ(x) ≤ μ(y) if and only if ν(x) ≤ ν(y), then P is strong complete. Proof. Trivial.

192

D. Boixader and J. Recasens / One-Dimensional T-Preorders

4. Concluding Remarks T -preorders have been studied with the help of its Representation Theorem. The different fuzzy subsets generating the same T -preorder have been characterized and the relation between one-dimensional T -preorders, Sincov-like functional equations and Saaty’s reciprocal matrices has been studied. Also strong complete T -preorders have been characterized. We end pointing at two directions toward a future work. • We can look at the results of the Subsection 2.1 from a different point of view: Let us suppose that we obtain two different fuzzy subsets μ and ν of a universe X by two different measurements or by two different experts. It would be interesting to know in which conditions we could assure the existence of a (continuous Archimedean) t-norm for which Pμ = Pν . • A fuzzy subset μ of a set X generates a T -preorder Pμ by Pμ (x, y) = → − → − T (μ(x)|μ(y)), but also another T -preorder μ P (x, y) = T (μ(y)|μ(x)) (in fact, the inverse of Pμ ). In this way we could define two dimensions of a T -preorder according weather we consider it generated by families (Pμi )i∈I or by families (μi P )i∈I . For instance the min-preorder of Example 2.4 would have right dimension 1 and left dimension 2.

References [1] [2] [3] [4] [5] [6] [7] [8]

U. Bodenhofer. A Similarity-Based Generalization of Fuzzy Orderings. Schriftenreihe der Johannes Kepler Universität 26. Universitätsverlag Rudolf Trauner, Linz, 1999. D. Boixader. Some Properties Concerning the Quasi-inverse of a t-norm. Mathware & Soft Computing 5, 5-12, 1998. E. Castillo-Ron, R. Ruiz-Cobo. Functional equations in Science and Engineering. Marcel Decker, New York, 1992. J. Fodor, M. Roubens. Fuzzy Preference Modelling and Multicriteria Decision Support. Kluwer Academic Publishers, Dordrecht, 1994. J.Recasens. Indistinguishability Operators. Modelling Fuzzy Equalities and Fuzzy Equivalence Relations. Studies in Fuzziness and Soft Computing 260 Springer, 2011. T.L. Saaty. The Analytic Hierarchy Process. McGraw-Hill, 1980. L. Valverde. On the Structure of F-Indistinguishability Operators. Fuzzy Sets and Systems 17 313-328, 1985. L.A. Zadeh. Similarity relations and fuzzy orderings. Inform. Sci. 3, 177-200, 1971.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-193

193

On the derivation of weights using the geometric mean approach for set-valued matrices Vicen¸c Torra a Institut d’Investigaci´ o en Intellig`encia Artificial (IIIA) Consejo Superior de Investigaciones Cient´ıficas (CSIC) Campus UAB s/n, 08193 Bellaterra (Catalonia, Spain)

a

Abstract. Several approaches have been proposed to derive the weights for the Analytic Hierarchy Process (AHP). These methods can be used for any problem in which we need to determine weights for an aggregation operator and we interview an expert to obtain them. For example, they can be used for weighted means, ordered weighted averaging (OWA) operators, and quasi-arithmetic means. Most of these methods interview an expert to elicit a preference matrix, and use this matrix to obtain the vector. In this paper we consider the problem in which each of the cells in the preference matrix does not contain a single value but a set of them. We propose a method following the logarithmic least square approach. Keywords. Weighted mean, aggregation operators, OWA operator, weights derivation, hesitant fuzzy sets.

Introduction Aggregation operators [11,12] often rely on a parameter that permits us to supply some background knowledge on the data sources being aggregated. In the case of the weighted mean, we use a weighting vector where each vector gives information on the importance of each source. Similarly, in the case of the ordered weighted averaging (OWA) operator [14] we have a weighting vector for the relative importance of the values (either we prefer low or high values), in the case of the weighted ordered weighted averaging (WOWA) operator [9] we have two weighting vector for the relative importance of the information sources and the values. The determination of these weights is not always an easy task. The literature presents several approaches for this problem. Based on the description in [11] we can distinguish the following approaches. • Methods based on an expert. In this case an expert gives some relevant information which is used to extract the weights. This is the case of the approaches related to the Analytical Hierarchy Process (AHP) [8,7] and

194

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

also the approaches for determining the OWA weights. In the case of the AHP, we elicit a preference matrix from the expert and we use this matrix to find the weights of e.g. a weighted mean. In the case of the OWA, we ask for an orness level [4] and then we use this orness level to find the OWA weights. • Methods based on data. In this case a set of examples is available and these examples are used for determining the weights. Optimization and machine learning approaches are used for this purpose. The examples can be either records of the form (input,output) or a set of data with a ranking of these data. In this paper we consider the problem of determining the weights when we elicit a matrix from an expert (or a set of experts) and each cell in the matrix is not a single value but a set of them. It is known that when an expert is asked for preferences, the matrix is usually non consistent. See e.g. [3]. Methods are developed to determine the weights from this non consistent matrix. The method proposed here can be applied to deal with these inconsistencies. The structure of the paper is as follows. In Section 1 we review some results on aggregation operators that are used later. In Section 2 we review the approach applied in the analytic hierarchy process to derive the weights. In Section 3 we present our new approach and an example. The paper finishes with some conclusions and lines for future research. 1. Preliminaries In this section we review a few results on aggregation operators that are used later. An aggregation operators is a function that satisfies unanimity and monotonicity (see e.g. [11] for details). We will consider separable functions. Definition 1.1. C(a1 , . . . , aN ) is separable when there exist functions g1 , . . . , gN and ◦ such that C(a1 , . . . , aN ) = g1 (a1 ) ◦ g2 (a2 ) ◦ . . . ◦ gN (aN ), with ◦ a continuous, associative, and cancellative operator. Let us now consider two properties for aggregation operators. First, unanimity or idempotency. An aggregation operator satisfies unanimity when the following equation holds C(a, . . . , a) = a.

(1)

Second, reciprocity. An aggregation operator satisfies reciprocity when the following equation holds C(1/a1 , . . . , 1/aN ) = 1/C(a1 , . . . , aN ).

(2)

The following result characterizes aggregation operators which are separable and satisfy unanimity and reciprocity.

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

195

Proposition 1.1. [2,1] An operator C is separable in terms of a unique monotone increasing g (gi = g for all i) and satisfies unanimity and reciprocity if and only if C is of the form C(a1 , . . . , aN ) = exp ω

−1

N 1 ( ω(log(ai ))), N i=1

(3)

with ω an arbitrary odd function. Here exp is the exponential function, and exp a corresponds to ea . We consider an additional condition which is positive homogeneity. An aggregation operator satisfies positive homogeneity when the following condition holds. C(ra1 , . . . , raN ) = rC(a1 , . . . , aN )

(4)

for r > 0 If we add positive homogeneity to the conditions above we have the following result. Proposition 1.2. [2,1] An operator C is separable in terms of a unique monotone increasing g and satisfies unanimity, positive homogeneity (Equation 4), and reciprocity (Equation 2) if and only if C is the geometric mean: C(a1 , . . . , aN ) = (

N

ai )1/N .

i=1

2. The AHP process for the weighted mean In this section we outline the problem of learning the weights for the weighted mean as typically used in the Analytical Hierarchy Process (AHP) [8,7]. Note that the same approach can be used for learning the weights in the OWA, the WOWA, the quasi-weighted mean, and related operators. In the description we consider that an aggregation operator will aggregate data from a set of n information sources. Let X be the set of information sources, and X = {X1 , . . . , Xn }. Weights (w1 , . . . , wn ) for the variables X1 , . . . , Xn are obtained as follows within the AHP framework. Step 1. The expert is interviewed and is asked about the importance of source Xi with respect to source Xj for all pairs of sources i, j. In the original formulation this comparison was given in a linguistic scale, with a numerical translation. We just consider a numerical scale. Thus, this step results into a matrix {aij }ij of numerical values. Step 2. If the matrix {aij }ij is consistent, the process is finished. Here, it is said that a matrix {aij }ij is consistent when • aij = aik akj for all i, j, k, and

196

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

• aij = 1/aji for all i, j. When the matrix is consistent, we can define asi wi = n j=1

asj

using any row s. It is easy to prove that the weights defined in this way do not depend on the selected row s, and that aij = wi /wj . Step 3. If the matrix is not consistent, find weights wi such that wi /wj approximate aij . Several approaches exist for the case in which the matrix is not consistent. [5,6] classifies methods in two groups: the eigenvalue approach and the methods minimizing the distance between the user-defined matrix and the nearest consistent matrix. Crawford and Williams [3] proposed to select weights such that the following difference was minimized. n  n j=1 i=1

ln(aij ) − ln(

2 wi ) wj

They prove (Theorem 3 in [3]) that the solution of this minimization problem when matrices are not consistent but they are such that aii = 1 and aij = 1/aji for all i, j is the weighting vector (w1 , . . . , wn ) where wk is defined as follows: wk =

n j=1

1/n

akj .

(5)

This way to derive the weights is known as the logarithmic least square method and also as the geometric mean because Equation 5 is the geometric mean. In the next section we consider this problem when instead of a single value aij we have a set of them. That is, we have a set valued matrix. Although this problem is inspired on hesitant fuzzy sets [10,13], we can not use this term here because when aij ∈ (0, 1) then 1/aij does not belong to [0, 1]. 3. The geometric mean approach for set-valued matrices In this section we consider the problem of having a matrix {aij } where in each position aij instead of having a single value we have a set of them. That is, for each pair of variables (xi , xj ) we have that the relative preference of xi over xj is a set of values. In the more general case, we denote this set by k

C(i, j) = {a1ij , . . . , aijij }. Although we allow the number of elements to be different in each set, for the sake of simplicity when no confusion arises we will use k without subindexes. That is,

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

197

C(i, j) = {a1ij , . . . , akij }. We call such matrix a n × n structure and denote it by {C(i, j)}ij . Inspired on the approach by Crawford and Williams [3], given the structure {C(i, j)}ij , we consider the elicitation of weights wi that minimize the following distance, 2 n  n wi 1 k ln(C(aij , . . . , aij )) − ln( ) wj j=1 i=1 where C(a1ij , . . . , akij ) is an aggregation operator that satisfies idempotency and reciprocity. The conditions on C are based on the following considerations. 1. We consider idempotency for simplicity and technical reasons. Note that idempotency is only necessary when we have a multiset instead of a set. 2. Reciprocity is necessary when we expect that sets k

C(i, j) = {a1ij , . . . , aijij } and k

C(j, i) = {a1ji , . . . , ajiji }. are such that (i) have the same number of elements (i.e., kij = kji ), and (ii) for each arij there is an element asji such that arij = 1/asji and for each asij there is an element arij such that arij = 1/asji , and in addition we have that the following equality holds k

k

C({a1ij , . . . , aijij }) = 1/C({a1ji , . . . , ajiji }). Note that this equation is related to the consistency of aij and aji (i.e., aij = 1/aji ). The solution of this problem is given in the next theorem. Inthe theorem a vector w = (w1 , . . . , wn ) is a weighting vector when wi ≥ 0 and wi = 1. This latter property is required in aggregation operators so that they satisfy unanimity. Proposition 3.1. Let M be an n × n structure {C(i, j)}ij where for each cell (i, j) k we have the set C(i, j) = {a1ij , . . . , aijij }, let C be a given separable aggregation function C(a1 , . . . , at ) satisfying idempotency and reciprocity; then, the weighting vector (w1 , . . . , wn ) that is a solution of the minimization problem 2 n  n wi 1 k ln(C(aij , . . . , aij )) − ln( ) wj j=1 i=1 is given by weights 

wi = exp

r 1 r 1 i=k (ln C(aki ,...,aki )−ln C(aik ,...,aik )) 2n

(6)

198

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

for i = 1, . . . , n. In addition, C is of the form C(a1 , . . . , aN ) = exp ω

−1

N 1 ( ω(log(ai ))). N i=1

(7)

where ω is an arbitrary odd function. Proof. First, Expression 7 for the given operator C follows from Proposition 1.1. Let us now derive the expression for the weights. Let L be the function to be minimized L=

n  n

ln(C(a1ij , . . . , akij ))

j=1 i=1

2 wi − ln( ) wj

We first rewrite L taking into account that ln(wi /wj ) = ln wi − ln wj . That is, L=

n n 

ln(C(a1ij , . . . , akij )) − ln wi + ln wj

2

.

j=1 i=1

Let us denote C(a1ij , . . . , akij ) by C(i, j). In order to find the minimum, we compute the derivatives of this expression with respect to the weights wk and make them equal to zero. That is, ∂L 1 = 2 (ln C(i, r) − ln wi + ln wr ) ∂wr wr i=r

+



2 (ln C(r, i) − ln wr + ln wi )

i=r

−1 wr

= 0.

(8)

If wr = 0, we can rewrite these equations as follows (ln C(i, r) − ln C(r, i) − ln wi − ln wi ) + (ln wr + ln wr ) = 0. i=r

Or, equivalently, (ln C(i, r) − ln C(r, i)) − 2 ln wi = − 2 ln wr . i=r

i=r

As we have required last equation reduces to



i

(10)

i=r

which adding 2 ln wr to both sides results into (ln C(i, r) − ln C(r, i)) − 2 ln wi = − 2 ln wr = −2n ln wr . i=r

(9)

i=r

i

wi = 1, this implies that



(11)

ln wi = 0. Therefore, the

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

 i=r

199

(ln C(r, i) − ln C(i, r))

= ln wr . (12) 2n which corresponds to weights as in Equation 6.  We need to check that this definition of weights is such that ln wi = 0. That is, that  i=r (ln C(r, i) − ln C(i, r)) 2n

k

equals zero. As this corresponds to 1 1 (ln C(r, i)) − (ln C(i, r)) 2n 2n k

i=r

k

i=r

and both terms add the whole elements of the matrix except for the diagonal, it is clear that the total is zero. Therefore, the proposition is proven. Note that the constraint that C satisfies reciprocity does not have any role in the proof and any function C would be equally acceptable. If we use a function C that does not satisfy reciprocity we have that Equation 12 is still the solution. However, we will have ratio weights which approximate a matrix C(i, j) which does not necessarily satisfy reciprocity even when the data in the {C(i, j)}ij structure satify it. We illustrate the previous proposition with an example. 1/N Example 3.1. Let C(a1 , . . . , aN ) = ( N , which as shown in Proposii=1 ai ) tion 1.2 satisfies unanimity and reciprocity (and positive homogeneity). Let us consider the following structure: ⎞ 1 {0.2, 0.4} {0.8, 0.9} ⎝ {5, 2.5} 1 {3, 3.5} ⎠ {1.25, 1.11} {0.3, 0.28} 1 ⎛

For this data, the aggregated matrix is ⎞ 1 0.2828 0.8485 ⎝ 3.5355 1 3.2403 ⎠ 1.1779 0.2898 1 ⎛

The solution using Proposition 3.1 is the following vector: • w1 = 1.3280876 • w2 = 0.5318296 • w3 = 1.4157964 We can observe that the product of these weights is one so the constraint wi = 1 is satisfied.

200

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

4. Conclusions In this paper we have considered the problem of finding the optimal weights corresponding to a set-based matrix. As future work we consider the following two problems. First, in Proposition 3.1 we consider that there is a single aggregation function for all the cells of the table. A more general situation would be when we do not require from the start all aggregation operators to be the same. Second, let us consider the case that we have a structure where we have a single value in each cells (i, j). Let us presume that we have a function reconcile (a kind of consistentize function) that adds to each cell all those values required so that for each i, j we have that if aij ∈ C(i, j) then we have aik ∈ C(i, k) and akj ∈ C(k, j) such that aik akj = aij . Then, we will study the solution of our optimization problem to this reconciled data and its similarity and relation to the solution of the approach by Crawford and Williams in [3] to the original data.

Acknowledgments This work is partially funded by projects ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, TIN2010-15764 and TIN2011-27076-C03-03 of the Spanish Government, and by project FP7/2007-2013 (Data without Boundaries).

References [1] [2] [3] [4]

[5] [6] [7] [8] [9] [10] [11]

Acz´ el, J. (1987) A Short Course on Functional Equations, D. Reidel Publishing Company (Kluwer Academic Publishers Group). Acz´ el, J., Alsina, C. (1986) On synthesis of judgements, Socio-Econ. Plann. Sci 20:6 333339. Crawford, G., Williams, C. (1985) A note on the analysis of subjective judgment matrices, Journal of Mathematical Psychology 29 387-405. Dujmovi´c, J. J. (1973) A generalization of some functions in continous mathematical logic – evaluation function and its applications (in Serbo-Croatian), Proc. of the Informatica Conference, paper d27. Golany, B., Kress, M. (1993) A multicriteria evaluation of the methods for obtaining weights from ratio-scale matrices, European Journal of Operations Research 69 210-202. Ishizaka, A., Lusti, M. (2006) How to derive priorities in AHP: a comparative study, CEJOR 14 387-400. Saaty, R. W. (1987) The analytic hierarchy process – what it is and how it is used, Mathematical Modelling 9:3-5 161-176. Saaty, T. L. (1980) The Analytic Hierarchy Process, McGraw-Hill. Torra, V. (1997) The weighted OWA operator, Int. J. of Intel. Syst. 12 153-166. Torra, V. (2009) Hesitant fuzzy sets, Intl. J. of Intel. Systems, 25:6 (2010) 529-539. Torra, V., Narukawa, Y. (2007) Modeling decisions: information fusion and aggregation operators, Springer.

V. Torra / On the Derivation of Weights Using the Geometric Mean Approach

201

[12] Torra, V., Narukawa, Y. (2007) Modelitzaci´ o de decisions: fusi´ o d’informaci´ o i operadors d’agregaci´ o, UAB Press. [13] Torra, V., Narukawa, Y. (2009) On hesitant fuzzy sets and decision, Proc. of the 2009 IEEE Int. Conf. on Fuzzy Systems (ISBN: 978-1-4244-3597-5), DVD-ROM, Jeju Island, Korea, August, 2009, 1378-1382. [14] Yager, R. R. (1988) On ordered weighted averaging aggregation operators in multi-criteria decision making, IEEE Trans. on Systems, Man and Cybernetics 18 183-190.

This page intentionally left blank

Fuzzy Logic and Reasoning II

This page intentionally left blank

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-205

205

Local and Global Similarities in Fuzzy Class Theory Eva ARMENGOL a , Pilar DELLUNDE a,b , Àngel GARCÍA-CERDAÑA a,c a Artificial Intelligence Research Institute (IIIA - CSIC) Campus UAB, 08193 Bellaterra, Catalonia, Spain e-mail: [email protected] b

Philosophy Department, Universitat Autònoma de Barcelona, Campus UAB, E-08193 Bellaterra, Catalonia, Spain e-mail: [email protected] c

Information and Communication Technologies Department, University Pompeu Fabra, Tànger, 122-140, E-08018 Barcelona, Catalonia, Spain e-mail: [email protected]

Abstract. Similarity is a central issue in many topics from psychology to artificial intelligence. In the present paper we want to contribute to the study of the similarity relation between objects represented as attribute-value pairs in the logical context. Following the idea that similarity is a phenomena essentially fuzzy, we introduce fuzzy similarities and prove some logical properties in the framework of Fuzzy Class Theory based on the logic MTLΔ . We prove that the global similarity inherits the usual properties of the local similarities (reflexivity, symmetry, transitivity). We also show that similar objects have similar properties, being these properties expressed by MTLΔ ∀ formulas evaluated in these objects. Keywords. Similarity Relation, Fuzzy Class Theory, Attribute-Value Representation

1. Introduction
In this paper we want to contribute to the study of the similarity relation between objects represented as attribute-value pairs in a logical context. Ruspini suggests in [13] that the degree of similarity between two objects A and B may be regarded as the degree of truth of the vague proposition “A is similar to B”. Thus, similarity among objects can be seen as an essentially fuzzy phenomenon. Following this idea, we study the relationship between global similarities and local similarities in the graded framework of Fuzzy Class Theory (FCT). FCT, introduced in [5], is a part of Mathematical Fuzzy Logic [10,7] devoted to the axiomatization of the notion of fuzzy set. This formalism serves as the foundation


of a large part of fuzzy mathematics. In particular, fuzzy relations such as fuzzy orders and similarities can be studied in this graded framework. For instance, in FCT we can express the fact that a fuzzy relation is transitive up to a certain degree. Roughly speaking, global similarities between objects are defined by the fusion of local similarities (defined between the values of the objects’ attributes). In this paper we study how the degree to which a global similarity has a property (reflexivity, symmetry and transitivity) relates to the degree to which the local similarities have this same property. In FCT the properties of reflexivity, symmetry and transitivity are expressed by means of first-order sentences. For instance, the degree up to which the relation R is reflexive is the truth-value of the sentence Refl(R) = ∀xRxx. In the fuzzy framework, the notion of similarity was introduced by Zadeh in [18] as a generalisation of the notion of equivalence relation (see [12] for a historical overview of the notion of t-norm based similarity). As Zadeh pointed out, one of the possible semantics of fuzzy sets is in terms of similarity. Indeed, the membership degree of an object in a fuzzy set can be seen as the degree of resemblance between this object and the prototypes of the fuzzy set. Hájek [10] studies similarities and congruences in fuzzy predicate logics and proposes the following axioms:¹
(S1) (Reflexivity) (∀x) x ≈ x
(S2) (Symmetry) (∀x, y)(x ≈ y → y ≈ x)
(S3) (Transitivity) (∀x, y, z)(x ≈ y & y ≈ z → x ≈ z)
(Cong) For each n-ary predicate P,

(∀x1 , . . . , xn , y1 , . . . , yn )(x1 ≈y1 & · · · &xn ≈yn → (P x1 , . . . , xn ↔ P y1 , . . . , yn ))

All the results in this paper are presented in the framework of FCT. We obtain some results that, adequately interpreted, allow us to say that:
• the properties of reflexivity, symmetry and transitivity of fuzzy binary relations at the global level are inherited from the fuzzy binary relations at the local level when we fuse them (Proposition 1),
• the global similarity is a congruence if some of the local similarities are congruences (Proposition 2).
Moreover, we obtain a result analogous to Lemma 5.6.8 of [10] for the logic MTLΔ (Theorem 1). We apply this result to show that similar objects (global similarity) have similar properties, where these properties are expressed by first-order formulas evaluated on these objects. The paper is organised as follows. In Section 2 we introduce the basics of FCT. In Section 3 we present our main logical results. Finally, there is a section devoted to conclusions and future work.

2. Fuzzy Class Theory
Fuzzy Class Theory was introduced in [5] with the aim of axiomatizing the notion of fuzzy set, and it is based on the logic ŁΠ∀. Later, in [4], FCT was developed in the more general setting of the logic MTLΔ∀. In that paper, Běhounek et al. studied fuzzy relations in the context of FCT, which generalises existing crisp results on fuzzy relations to the graded framework.

¹ In order to economize on parentheses, we will consider → the least binding connective.


The algebra of truth values for formulas is the standard MTLΔ∀-chain over the real unit interval [0, 1]. Fuzzy Class Theory over MTLΔ (for a presentation of this logic and its first-order version see [4, Appendix A]) is a theory over the multi-sorted first-order logic MTLΔ∀ with crisp equality. It has sorts of individuals of order 0 (atomic objects) a, b, c, x, y, z, . . .; individuals of first order (fuzzy classes) A, B, X, Y, . . .; and individuals of second order (fuzzy classes of fuzzy classes) A, B, X, Y, . . .. For every variable x of any order n and for every formula ϕ there is a class term {x|ϕ} of order n + 1. Besides the logical predicate of identity, the only primitive predicate is the membership predicate ∈ between successive sorts. For variables of all orders, the axioms for ∈ are:

(∈1) y ∈ {x|ϕ(x)} ↔ ϕ(y), for every formula ϕ   (Comprehension Axioms)
(∈2) (∀x)Δ(x ∈ A ↔ x ∈ B) → A = B   (Extensionality)

The basic properties of fuzzy relations are defined as sentences as follows:

Definition 1 Let R be a binary predicate symbol.
Reflexivity:   Refl(R) ≡df (∀x)Rxx
Symmetry:    Sym(R) ≡df (∀x, y)(Rxy → Ryx)
Transitivity:  Trans(R) ≡df (∀x, y, z)(Rxy & Ryz → Rxz)

Example 1 Let R1, R2 and R3 be fuzzy relations on the set U = {v, w} defined as follows (rows and columns are indexed by v, w):

R1 = ( 0.7  1  )    R2 = ( 1  1   )    R3 = ( 1    0.3 )
     ( 0.5  0.7)         ( 1  0.2 )         ( 0.9  1   )

Let us calculate the truth value of the sentences Refl(Ri), Sym(Ri) and Trans(Ri) by using the minimum t-norm. The truth value of Refl(Ri) is given by inf{Rxx : x ∈ U}; thus we have ||Refl(R1)|| = 0.7, ||Refl(R2)|| = 0.2, and ||Refl(R3)|| = 1. The truth value of Sym(Ri) is given by inf{Rxy → Ryx : x, y ∈ U}, where → is the Gödel implication; thus we have ||Sym(R1)|| = 0.5, ||Sym(R2)|| = 1, and ||Sym(R3)|| = 0.3. The truth value of Trans(Ri) is given by inf{Rxy ∧ Ryz → Rxz : x, y, z ∈ U}; thus we have ||Trans(R1)|| = 1, ||Trans(R2)|| = 0.2, and ||Trans(R3)|| = 1.
In FCT we have two notions of similarity: the strong one, defined using the strong conjunction &, and the weak one, defined using the weak conjunction ∧. They are defined as sentences in the following way:

Definition 2 Let R be a binary predicate symbol,
Strong similarity:  Sim(R) ≡df Refl(R) & Sym(R) & Trans(R)
Weak similarity:   wSim(R) ≡df Refl(R) ∧ Sym(R) ∧ Trans(R)


Following Example 1 above, the strong similarity of each Ri, i.e., the truth value of the sentence Sim(Ri), is: ||Sim(R1)|| = 0.5, ||Sim(R2)|| = 0.2, and ||Sim(R3)|| = 0.3. Observe that in this particular example, since the t-norm is the minimum, the strong and the weak similarities coincide.
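As an illustration only (this sketch is ours, not part of the paper), the computations of Example 1 can be reproduced in a few lines of code. The encoding of the relations as nested lists and the function names are our own choices; the Gödel implication is the residuum of the minimum t-norm used in the example.

```python
# Hedged sketch (ours, not the paper's): truth values of Refl, Sym, Trans
# and Sim for finite fuzzy relations on U = {v, w}, using the minimum
# t-norm and its residuum (the Goedel implication), as in Example 1.
U = [0, 1]  # indices standing for the elements v and w

def goedel_impl(a, b):
    # residuated implication of the minimum t-norm
    return 1.0 if a <= b else b

def refl(R):
    return min(R[x][x] for x in U)

def sym(R):
    return min(goedel_impl(R[x][y], R[y][x]) for x in U for y in U)

def trans(R):
    return min(goedel_impl(min(R[x][y], R[y][z]), R[x][z])
               for x in U for y in U for z in U)

def sim(R):
    # with the minimum t-norm, strong and weak similarity coincide
    return min(refl(R), sym(R), trans(R))

R1 = [[0.7, 1.0], [0.5, 0.7]]
R2 = [[1.0, 1.0], [1.0, 0.2]]
R3 = [[1.0, 0.3], [0.9, 1.0]]
for name, R in [("R1", R1), ("R2", R2), ("R3", R3)]:
    print(name, refl(R), sym(R), trans(R), sim(R))
```

Running the sketch reproduces the values reported above: ||Refl|| = 0.7, 0.2, 1, ||Sym|| = 0.5, 1, 0.3, ||Trans|| = 1, 0.2, 1 and ||Sim|| = 0.5, 0.2, 0.3 for R1, R2 and R3 respectively.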

3. Local and Global Similarities in Fuzzy Class Theory: Main Results
Similarity has been a central issue for decades in different disciplines, ranging from philosophy (Leibniz’s Principle of the Identity of Indiscernibles) and psychology (Tversky’s judged similarity between stimuli) to the natural sciences (taxonomy) and mathematics (geometric similarity). The common approach to similarity (or dissimilarity) between objects is to define it by means of a distance measure. This implies, however, that objects are described geometrically, which is not always the case. In many situations objects are described symbolically and, in these cases, Tversky [17] proposes to define similarity through the comparison of the features that describe these objects. This implies that when we want to assess the similarity between two objects by comparing their features, we need to assess how similar the two objects are in each feature and then aggregate these similarities. This gives us a notion of global and local similarity. Moreover, in [17] Tversky shows situations in which similarities do not satisfy the usual mathematical properties of metrics. Sometimes we can also consider weaker notions of similarity in which some of these properties do not necessarily hold, or hold only up to some degree. In the context of FCT we can deal with such graded notions of similarity.
Now we proceed to prove the main logical results of this paper concerning the relationship between local and global similarities. We show that the basic properties of local similarities are preserved when we define a global similarity between objects using these local similarities. Let U be a set of objects represented by attribute-value pairs. We assume that we have a way to describe domain objects in U by means of a fixed number of attributes. For every i, 1 ≤ i ≤ l, let Si be a binary fuzzy relation on the set of values Vi of an attribute Ai. Each relation Si induces a relation Ri on U as follows. For every v, w ∈ U, with v = (v1, . . . , vl) and w = (w1, . . . , wl), we define: Ri vw ≡df Si vi wi. We call each Ri a local relation. From these local relations, and using a t-norm ∗, we define a new relation R as follows: Rvw ≡df R1 vw ∗ · · · ∗ Rl vw. We say that R is a global relation. In Example 1 of the previous section, the global relation R on U = {v, w} induced by the local relations R1, R2 and R3 (using the minimum t-norm) is the following:

R = ( 0.7  0.3 )
    ( 0.5  0.2 )
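Continuing the same illustrative sketch (again ours rather than the paper's), the global relation is obtained by fusing the local relations pointwise with the chosen t-norm, following the definition Rvw ≡df R1vw ∗ · · · ∗ Rlvw:

```python
# Pointwise fusion of the local relations with a t-norm (here the minimum),
# following the definition Rvw = R1vw * ... * Rlvw of Section 3.
def fuse(relations, tnorm=min):
    n = len(relations[0])
    return [[tnorm([S[x][y] for S in relations]) for y in range(n)]
            for x in range(n)]

R = fuse([R1, R2, R3])
print(R)                                        # [[0.7, 0.3], [0.5, 0.2]]
print(sim(R), min(sim(R1), sim(R2), sim(R3)))   # 0.2 0.2
```

With the minimum t-norm this yields exactly the matrix displayed above, and sim(R) = 0.2 coincides with the fusion of the local similarity degrees, anticipating the lower bound (TS4) of Corollary 1 below.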

Intuitively, the following proposition shows that the properties of reflexivity, symmetry and transitivity of fuzzy binary relations at the global level are inherited from the fuzzy binary relations at the local level when we fuse them.

Proposition 1 Let R1, . . . , Rl be binary predicate symbols from the language of FCT, R1xy, . . . , Rlxy atomic formulas and Rxy = R1xy & · · · & Rlxy. Then, the following theorems are provable in FCT:
(TS1) Refl(R1) & · · · & Refl(Rl) → Refl(R)
(TS2) Sym(R1) & · · · & Sym(Rl) → Sym(R)
(TS3) Trans(R1) & · · · & Trans(Rl) → Trans(R)

Proof: Throughout the proof of this proposition, we repeatedly apply the following theorem of FCT (see for instance [4, Lemma B.8 (L15)]):

⊢FCT (∀x)ϕ1 & · · · & (∀x)ϕk → (∀x)(ϕ1 & · · · & ϕk),   (1)

and we also apply the following theorem of FCT (see for instance [6, Lemma 3.2.2(3)]):

⊢FCT (ϕ1 → ψ1) & · · · & (ϕk → ψk) → (ϕ1 & · · · & ϕk → ψ1 & · · · & ψk).   (2)

(TS1): By Definition 1, Refl(R1 ) & · · · & Refl(Rl ) = (∀x)R1 xx & · · · & (∀x)Rl xx. By applying (1), we have:

FCT (∀x)R1 xx & · · · & (∀x)Rl xx → (∀x)(R1 xx & · · · & Rl xx) and since (∀x)(R1 xx & · · · & Rl xx) = Refl(R), we have Refl(R1 ) & · · · & Refl(Rl ) → Refl(R). (TS2): By Definition 1, Sym(R1 ) & · · · & Sym(Rl ) = = (∀x, y)(R1 xy → R1 yx) & · · · & (∀x, y)(Rl xy → Rl yx). Now, by applying two times (1), we obtain:

FCT (∀x, y)(R1 xy → R1 yx) & · · · & (∀x, y)(Rl xy → Rl yx) → → (∀x, y)((R1 xy → R1 yx) & · · · & (Rl xy → Rl yx)). By applying (2) we obtain:

FCT (∀x, y)((R1 xy → R1 yx) & · · · & (Rl xy → Rl yx)) → → (∀x, y)(R1 xy & · · · & Rl xy → R1 yx & · · · & Rl yx). Finally, by transitivity, we obtain:

FCT (∀x, y)(R1 xy → R1 yx) & · · · & (∀x, y)(Rl xy → Rl yx) → (∀x, y)(R1 xy & · · · & Rl xy → R1 yx & · · · & Rl yx). Consequently, Sym(R1 ) & · · · & Sym(Rl ) → Sym(R). (TS3): By Definition 1, Trans(R1 ) & . . . & Trans(Rl ) = = (∀x, y, z)(R1 xy & R1 yz → R1 xz) & · · · & (∀x, y, z)(Rl xy & Rl yz → Rl xz).


Now, by applying three times (1), we obtain:

FCT (∀x, y, z)(R1 xy & R1 yz → R1 xz) & · · · & (∀x, y, z)(Rl xy & Rl yz → Rl xz) → → (∀x, y, z)((R1 xy & R1 yz → R1 xz) & · · · & (Rl xy & Rl yz → Rl xz)). By applying (2) we obtain:

FCT (∀x, y, z)((R1 xy & R1 yz → R1 xz) & · · · & (Rl xy & Rl yz → Rl xz)) → → (∀x, y, z)((R1 xy & R1 yz) & · · · & (Rl xy & Rl yz) → (R1 xz & · · · & Rl xz)). By repeatedly applying the commutativity and associativity axioms for & we obtain:

FCT (∀x, y, z)((R1 xy & R1 yz) & · · · & (Rl xy & Rl yz) → (R1 xz & · · · & Rl xz)) → → (∀x, y, z)((R1 xy &· · ·&Rl xy)&(R1 yz &· · ·&Rl yz) → (R1 xz &· · ·&Rl xz)). That is, by definition of the atomic formula Rxy,

FCT (∀x, y, z)((R1 xy & R1 yz) & · · · & (Rl xy & Rl yz) → (R1 xz & · · · & Rl xz)) → → (∀x, y, z)(Rxy & Ryz → Rxz). The consequent formula of the previous sentence is precisely Trans(R). Thus, by transitivity, we obtain FCT Trans(R1 ) & · · · & Trans(Rl ) → Trans(R). 2 We can establish a lower bound of the degree of similarity of a global relation using the degrees of similarity of the local relations, as it is proved in the following corollary. Corollary 1 Let R1 , . . . , Rl be binary predicate symbols from the language of FCT, R1 xy, . . . , Rl xy atomic formulas and Rxy = R1 xy & · · · & Rl xy. Then, the following theorems are provable in FCT: (TS4) (TS5)

Sim(R1 ) & · · · & Sim(Rl ) → Sim(R), wSim(R1 ) ∧ · · · ∧ wSim(Rl ) → wSim(R).

Proof: (TS4): By Definition 2, we have Sim(R1 ) & · · · & Sim(Rl ) = = Refl(R1 ) & Sym(R1 ) & Trans(R1 ) & · · · & Refl(Rl ) & Sym(Rl ) & Trans(Rl ). Observe that, using the axioms of commutativity and associativity for &, we get:

FCT Sim(R1 ) & · · · & Sim(Rl ) → → Refl(R1 )&· · ·&Refl(Rl )&Sym(R1 )&· · ·&Sym(Rl )&Trans(R1 )&· · ·&Trans(Rl ). Now we use the fact that if α1 , . . . αk are theorems of FCT, α1 & · · · & αk is also a theorem of FCT. From (TS1), (TS2) and (TS3) of Proposition 1, using (2), we obtain that the following formula is a theorem of FCT: Refl(R1 )&· · ·&Refl(Rl )&Sym(R1 )&· · ·&Sym(Rl )&Trans(R1 )&· · ·&Trans(Rl ) → → Refl(R) & Sym(R) & Trans(R). The consequent formula of the previous sentence is precisely Sim(R). Finally, by transitivity we get FCT Sim(R1 ) & · · · & Sim(Rl ) → Sim(R). (TS5): It is analogously proved by using the following theorem (3) (see for instance [6, Lemma 3.2.2(4)]) instead of (2)

⊢FCT (ϕ1 → ψ1) ∧ · · · ∧ (ϕk → ψk) → (ϕ1 ∧ · · · ∧ ϕk → ψ1 ∧ · · · ∧ ψk).   (3) □


The following proposition shows that the global similarity is a congruence if some of the local similarities are also congruences. Proposition 2 Let R1 , . . . , Rl be binary predicate symbols from the language of FCT, R1 xy, . . . , Rl xy atomic formulas and Rxy = R1 xy & · · · & Rl xy. Assume that T is a theory that includes for some 1 ≤ i ≤ l the formulas obtained by substituting in (Cong) the identity symbol ≈ by Ri . Then, T FCT α, for every formula α obtained by substituting in (Cong) the symbol ≈ by R. Proof: For the sake of clarity we prove the proposition for every binary predicate P , but the same proof holds for predicates of arbitrary arity. Let i, 1 ≤ i ≤ l be such that T includes the formulas obtained substituting in (Cong) the identity symbol ≈ by Ri . Let us consider the following instance of the MTL axiom (ϕ → ψ) → ((ψ → χ) → (ϕ → χ)): (Rx1 y1 & Rx2 y2 → Ri x1 y1 & Ri x2 y2 ) → ((Ri x1 y1 & Ri x2 y2 → (P x1 x2 ↔ P y1 y2 )) → (Rx1 y1 & Rx2 y2 → (P x1 x2 ↔ P y1 y2 ))).

(⋆)

On the one hand, since Rxy = R1 xy & · · · & Rl xy, using the following theorem:

⊢FCT ϕ1 & · · · & ϕi & · · · & ϕl → ϕi.   (4)

We have that Rx1y1 → Rix1y1 and Rx2y2 → Rix2y2 are theorems of FCT. Therefore, by applying (2), we have:

⊢FCT Rx1y1 & Rx2y2 → Rix1y1 & Rix2y2.   (5)

On the other hand, by assumption we have:
T ⊢FCT Rix1y1 & Rix2y2 → (P x1x2 ↔ P y1y2),   (6)

because this formula is an instance of the axiom (Cong) applied to Ri. Now, taking (⋆), (5), and (6) as premises and applying Modus Ponens twice, we obtain:
T ⊢FCT Rx1y1 & Rx2y2 → (P x1x2 ↔ P y1y2).   (7)

Finally, by applying the Generalization rule four times to (7), we obtain:
T ⊢FCT (∀x1, x2, y1, y2)(Rx1y1 & Rx2y2 → (P x1x2 ↔ P y1y2)).   (8) □
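The congruence property of Proposition 2 can also be checked numerically on the running example. The following fragment, again an illustration of ours building on the helper functions of the earlier sketch, evaluates the degree to which a fuzzy relation S is a congruence for a unary fuzzy predicate P, that is, the truth value of (∀x, y)(Sxy → (P x ↔ P y)):

```python
# Hedged illustration (ours): the graded degree to which a fuzzy relation S
# is a congruence for a unary fuzzy predicate P, i.e. the truth value of
# (forall x, y)(Sxy -> (Px <-> Py)) under the Goedel semantics used above.
def biimpl(a, b):
    return min(goedel_impl(a, b), goedel_impl(b, a))

def cong_degree(S, P):
    return min(goedel_impl(S[x][y], biimpl(P[x], P[y])) for x in U for y in U)

P = [1.0, 0.9]                # a fuzzy predicate on U = {v, w}
print(cong_degree(R3, P))     # 1.0: P is fully congruent w.r.t. the local R3
print(cong_degree(R1, P))     # 0.9: but not fully congruent w.r.t. R1
print(cong_degree(R, P))      # 1.0: hence fully congruent w.r.t. the global R
```

Here P is fully congruent with respect to the local relation R3, and therefore also with respect to the global relation R, in line with Proposition 2, even though it is not fully congruent with respect to R1.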

In [1, Theorem 1] we generalized Lemma 5.6.8 of [10] to the logic MTL. In order to work in FCT based on MTLΔ, we extend this result to that logic. We can apply this result to show that similar objects (global similarity) have similar properties, where these properties are expressed by first-order formulas evaluated on these objects. Following Definition 5.6.7 of [10], we introduce the notion of syntactic degree of a formula φ in the following way:
1. dg(φ) = 1, if φ is atomic
2. dg(φ) = 0, if φ is a truth constant


3. dg(∀xφ) = dg(∃xφ) = dg(¬φ) = dg(Δφ) = dg(φ)
4. dg(φ → ψ) = dg(φ ∗ ψ) = dg(φ) + dg(ψ)
5. dg(φ ∧ ψ) = dg(φ ∨ ψ) = max{dg(φ), dg(ψ)}
Notation: Let x ≈k y abbreviate (x ≈ y) & · · · & (x ≈ y) (k times).

Theorem 1 [1, Theorem 1] Let T be a theory in MTL∀ containing axioms (S1), (S2), (S3) and (Cong) for ≈. Let φ be a first-order formula of MTL∀ with dg(φ) = k. Let x1, . . . , xn be the variables including all free variables of φ and let, for every 1 ≤ i ≤ n, yi be substitutable for xi in φ. Then, T ⊢ (x1 ≈k y1) & · · · & (xn ≈k yn) → (φ(x1, . . . , xn) ↔ φ(y1, . . . , yn)).

Theorem 2 Let T be a theory in MTLΔ∀ containing axioms (S1), (S2), (S3) and (Cong) for ≈. Let φ be a first-order formula of MTLΔ∀ with dg(φ) = k. Let x1, . . . , xn be the variables including all free variables of φ and let, for every 1 ≤ i ≤ n, yi be substitutable for xi in φ. Then, T ⊢ Δ[(x1 ≈k y1) & · · · & (xn ≈k yn)] → (φ(x1, . . . , xn) ↔ φ(y1, . . . , yn)).

Proof: By induction on the complexity of formulas. By axioms (S1), (S2), (S3) and (Cong), the assertion is true for atomic formulas (and is vacuous for truth constants). For all the inductive steps except the Δ step, we refer to the proof of [1, Theorem 1]. For the sake of clarity, we prove the Δ step only for two variables, that is, for x ≈k y instead of (x1 ≈k y1) & · · · & (xn ≈k yn); the generalization to the n-variable case is straightforward.
Inductive step Δφ. By definition of the syntactic degree, dg(Δφ) = k. By the inductive hypothesis we have T ⊢ x ≈k y → (φ(x) ↔ φ(y)). Thus, by the Δ rule, T ⊢ Δ[(x ≈k y) → (φ(x) ↔ φ(y))], and then, by Axiom (Δ5) of MTLΔ, T ⊢ Δ(x ≈k y) → Δ(φ(x) ↔ φ(y)). Consequently, by [6, Lemma 3.2.1(TΔ3)] and Axiom (Δ5) of MTLΔ, T ⊢ Δ(x ≈k y) → (Δφ(x) ↔ Δφ(y)). □

4. Conclusions and Future Work
Description Logics (DLs) are knowledge representation languages built on the basis of classical logic. DLs allow the creation of knowledge bases and provide ways to reason on the contents of these bases. A full reference manual of the field can be found in [2]. Fuzzy Description Logics (FDLs) are natural extensions of DLs expressing vague concepts commonly present in real applications (see for instance [15,16,14,3], or [11] for a survey). In [9] we studied the notion of similarity between objects represented as attribute-value pairs in the context of FDLs. In that paper we proposed to add an


SBox (Similarity Box) to the knowledge bases of an ALC-like fuzzy description language. The SBox allows the expression of properties such as reflexivity, symmetry, transitivity and congruence of a relation. In this way we can explicitly declare which of these properties are satisfied by an eventual similarity. Following this line of research, in [1] we proved some logical properties in the context of IALCE [8] with concrete domains. In that paper we showed, in the context of the logic MTL, that the global similarity inherits the usual properties of the local similarities (reflexivity, symmetry, transitivity). We also proved that similar objects have similar properties, where these properties are expressed by fuzzy description logic formulas evaluated on these objects. In the present paper we have extended the work done at the logical level in [1] to the graded framework of FCT, and we have proved that the global similarity inherits the usual properties of the local similarities also in this new context. This means, in practice, that we can assess the similarity between objects by focusing on the similarities of the attributes that describe them. In other words, the more similar the attributes describing two objects are, the more similar the two objects will be. Moreover, according to the congruence property, we have also shown that similar objects have similar properties. The satisfaction of these properties assures regularity among domain objects and, in classification tasks, this means that similar objects will receive similar classifications. The results presented in Section 3 will be used in a forthcoming paper to introduce graded axioms for reflexivity, symmetry and transitivity in the SBox of an FDL in a systematic way.

Acknowledgments
The authors acknowledge support by the Spanish MICINN project EdeTRI (TIN2012-39348-C02-01). The work of Eva Armengol was also partially supported by the Spanish MICINN project COGNITIO (TIN2012-38450-C03-03) and the grant 2014SGR-118 from the Generalitat de Catalunya. The work of Pilar Dellunde was also partially funded by the Spanish MICINN project CONSOLIDER (CSD2007-0022). The work of Àngel García-Cerdaña was also partially funded by the Spanish MICINN project MTM2011-25747. Pilar Dellunde and Àngel García-Cerdaña also acknowledge the support of the grant 2009SGR-1433 from the Generalitat de Catalunya.

References
[1] E. Armengol, P. Dellunde, and À. García-Cerdaña. On similarity in Fuzzy Description Logics. Fuzzy Sets and Systems, submitted.
[2] F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, and P.F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, New York, NY, USA, 2003.
[3] F. Bobillo, M. Delgado, J. Gómez-Romero, and U. Straccia. Fuzzy Description Logics under Gödel semantics. International Journal of Approximate Reasoning, 50(3):494–514, 2009.
[4] L. Běhounek, U. Bodenhofer, and P. Cintula. Relations in Fuzzy Class Theory. Fuzzy Sets and Systems, 159(14):1729–1772, July 2008.
[5] L. Běhounek and P. Cintula. Fuzzy Class Theory. Fuzzy Sets and Systems, 154(1):34–55, 2005.
[6] L. Běhounek and P. Cintula. Fuzzy Class Theory: A primer v1.0. Technical Report V-939, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, 2006.
[7] L. Běhounek, P. Cintula, and P. Hájek. Introduction to Mathematical Fuzzy Logic, chapter I of Handbook of Mathematical Fuzzy Logic, volume 1, pages 1–102. Number 37 in Studies in Logic. College Publications, London, 2011.
[8] M. Cerami, À. García-Cerdaña, and F. Esteva. On finitely-valued Fuzzy Description Logics. International Journal of Approximate Reasoning, http://dx.doi.org/10.1016/j.ijar.2013.09.021, 2013.
[9] À. García-Cerdaña, E. Armengol, and P. Dellunde. Similarity for attribute-value representations in Fuzzy Description Logics. In R. Alquezar, A. Moreno, and J. Aguilar, editors, Artificial Intelligence Research and Development, volume 220 of Frontiers in Artificial Intelligence and Applications (Proceedings of CCIA 2010), pages 269–278, 2010.
[10] P. Hájek. Metamathematics of Fuzzy Logic, volume 4 of Trends in Logic, Studia Logica Library. Kluwer Academic Publishers, Dordrecht, 1998.
[11] T. Lukasiewicz and U. Straccia. Managing uncertainty and vagueness in Description Logics for the Semantic Web. Journal of Web Semantics, 6(4):291–308, 2008.
[12] J. Recasens. Indistinguishability Operators – Modelling Fuzzy Equalities and Fuzzy Equivalence Relations, volume 260 of Studies in Fuzziness and Soft Computing. Springer, 2011.
[13] E. H. Ruspini. On the semantics of fuzzy logic. International Journal of Approximate Reasoning, 5:45–88, 1991.
[14] G. Stoilos, G. Stamou, J.Z. Pan, V. Tzouvaras, and I. Horrocks. Reasoning with very expressive Fuzzy Description Logics. Journal of Artificial Intelligence Research, 30(8):273–320, 2007.
[15] U. Straccia. Reasoning within Fuzzy Description Logics. Journal of Artificial Intelligence Research, 14:137–166, 2001.
[16] U. Straccia. A Fuzzy Description Logic for the Semantic Web. In E. Sanchez, editor, Fuzzy Logic and the Semantic Web, Capturing Intelligence, chapter 4, pages 73–90. Elsevier, 2006.
[17] A. Tversky. Features of similarity. Psychological Review, 84(4):327–352, 1977.
[18] L. A. Zadeh. Similarity relations and fuzzy orderings. Information Sciences, 3:177–200, 1971.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-215


On the Characterization of the Maximal Ideal Recursive Semantics of RP-DeLP
Teresa Alsinet a,1, Ramón Béjar a, Lluís Godo b and Francesc Guitart a
a Department of Computer Science, University of Lleida, Spain
b Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Spain
Abstract. Possibilistic Defeasible Logic Programming (P-DeLP) is a logic programming framework which combines features from argumentation theory and logic programming, in which defeasible rules are attached with weights expressing their relative belief or preference strength. In P-DeLP a conclusion succeeds if there exists an argument that entails the conclusion and this argument is found to be undefeated by a warrant procedure that systematically explores the universe of arguments in order to present an exhaustive synthesis of the relevant chains of pros and cons for the given conclusion. Recently, we have proposed a new warrant recursive semantics for P-DeLP, called Recursive P-DeLP (RP-DeLP for short), based on the claim that the acceptance of an argument should also imply the acceptance of all its subarguments, which reflect the different premises on which the argument is based. In RP-DeLP, an output of a program is a pair of sets: a set of warranted and a set of blocked conclusions. Arguments for both warranted and blocked conclusions are recursively based on warranted conclusions but, while warranted conclusions do not generate any conflict with the set of already warranted conclusions and the strict part of the program (information we take for granted to hold true), blocked conclusions do. Conclusions that are neither warranted nor blocked correspond to rejected conclusions. This paper explores the relationship between the exhaustive dialectical analysis based semantics of P-DeLP and the recursive based semantics of RP-DeLP, and analyzes a non-monotonic inference operator for RP-DeLP which models the expansion of a given program by adding new weighted facts associated with warranted conclusions.
Keywords. argumentation, logic programming, uncertainty, non-monotonic inference

1. Introduction and motivations Defeasible argumentation is a natural way of identifying relevant assumptions and conclusions for a given problem which often involves identifying conflicting information, resulting in the need to look for pros and cons for a particular conclusion [10]. This process may involve chains of reasoning, where conclusions are used in the assumptions for deriving further conclusions and the task of finding pros and cons may be decomposed recursively. Logic-based formalizations of argumentation that take pros and cons for some conclusion into account assume a set of formulas and then lay out arguments and counterarguments that can be obtained from these assumed formulas [4]. 1 Correspondence to: T. Alsinet. Department of Computer Science, University of Lleida. C/Jaume II, 69. Lleida, Spain. Tel.: +34 973702734; E-mail: [email protected].



Defeasible Logic Programming (DeLP) [8] is a formalism that combines techniques of both logic programming and defeasible argumentation. As in logic programming, knowledge is represented in DeLP using facts and rules; however, DeLP also provides the possibility of representing defeasible knowledge under the form of weak (defeasible) rules, expressing reasons to believe in a given conclusion. In DeLP, a conclusion succeeds if it is warranted, i.e., if there exists an argument (a consistent sets of defeasible rules) that, together with the non-defeasible rules and facts, entails the conclusion, and moreover, this argument is found to be undefeated by a warrant procedure which builds a dialectical tree containing all arguments that challenge this argument, and all counterarguments that challenge those arguments, and so on, recursively. Actually, dialectical trees systematically explore the universe of arguments in order to present an exhaustive synthesis of the relevant chains of pros and cons for a given conclusion. In fact, the interpreter for DeLP [7] (http://lidia.cs.uns.edu.ar/DeLP) takes a knowledge base (program) P and a conclusion (query) Q as input, and it then returns one of the following four possible answers: YES, if Q is warranted from P ; NO, if the complement of Q is warranted from P ; UNDECIDED, if neither Q nor its complement are warranted from P ; or UNKNOWN, if Q is not in the language of the program P . Possibilistic Defeasible Logic Programming (P-DeLP) [2] is an extension of DeLP in which defeasible facts and rules are attached with weights (belonging to the real unit interval [0, 1]) expressing their relative belief or preference strength. As many other argumentation frameworks [6,10], P-DeLP can be used as a vehicle for facilitating rationally justifiable decision making when handling incomplete and potentially inconsistent information. Actually, given a P-DeLP program, justifiable decisions correspond to warranted conclusions (to some necessity degree), that is, those which remain undefeated after an exhaustive dialectical analysis of all possible arguments for and against. Recently in [1], we have proposed a new semantics for P-DeLP based on a general notion of collective (non-binary) conflict among arguments and on the claim that the acceptance of an argument should imply also the acceptance of all its subarguments which reflect the different premises on which the argument is based. In this framework, called Recursive P-DeLP (RP-DeLP for short), an output (extension) of a program is now a pair of sets, a set of warranted and a set of blocked conclusions, with maximum necessity degrees. Arguments for both warranted and blocked conclusions are recursively based on warranted conclusions but, while warranted conclusions do not generate any conflict with the set of already warranted conclusions and the strict part of program (information we take for granted they hold true), blocked conclusions do. Conclusions that are neither warranted nor blocked correspond to rejected conclusions. The key feature that our warrant recursive semantics addresses corresponds with the closure under subarguments postulate recently proposed by Amgoud [3], claiming that if an argument is excluded from an output, then all the arguments built on top of it should also be excluded from that output. As stated in [9], this recursive definition of acceptance among arguments can lead to different outputs (extensions) for warranted conclusions. 
For RP-DeLP programs with multiple outputs we have also considered in [1] the problem of deciding the set of conclusions that could be ultimately warranted. We have called this output (extension) maximal ideal output of an RP-DeLP program. In this paper we explore the relationship between the exhaustive dialectical analysis based semantics of P-DeLP and the maximal ideal output of RP-DeLP, and we analyze a



non-monotonic inference operator for RP-DeLP which models the expansion of a given program by adding new weighed facts associated with warranted conclusions.

2. The language of P-DeLP and RP-DeLP In order to make this paper self-contained, we will present next the main definitions that characterize P-DeLP and RP-DeLP frameworks. For details the reader is referred to [2,1]. The language of P-DeLP and RP-DeLP, denoted L, is inherited from the language of logic programming, including the notions of atom, literal, rule and fact. Formulas are built over a finite set of propositional variables p, q, ... which is extended with a new (negated) atom “∼p” for each original atom p. Atoms of the form p or ∼p will be referred as literals, and if P is a literal, we will use ∼P to denote ∼p if P is an atom p, and will denote p if P is a negated atom ∼p. Formulas of L consist of rules of the form Q ← P1 ∧ . . . ∧ Pk , where Q, P1 , . . . , Pk are literals. A fact will be a rule with no premises. We will also use the name clause to denote a rule or a fact. P-DeLP and RP-DeLP frameworks are based on the propositional logic (L, ) where the inference operator  is defined by instances of the modus ponens rule of the form: {Q ← P1 ∧ . . . ∧ Pk , P1 , . . . , Pk }  Q. A set of clauses Γ will be deemed as contradictory, denoted Γ  ⊥, if , for some atom q, Γ  q and Γ  ∼q. In both frameworks a program P is a tuple P = (Π, Δ, ) over the logic (L, ), where Π, Δ ⊆ L, and Π  ⊥. Π is a finite set of clauses representing strict knowledge (information we take for granted they hold true), Δ is another finite set of clauses representing the defeasible knowledge (formulas for which we have reasons to believe they are true). Finally,  is a total pre-order on Π ∪ Δ representing levels of defeasibility: ϕ ≺ ψ means that ϕ is more defeasible than ψ. Actually, since formulas in Π are not defeasible,  is such that all formulas in Π are at the top of the ordering. For the sake of a simpler notation we will often refer in the paper to numerical levels for defeasible clauses and arguments rather than to the pre-ordering , so we will assume a mapping N : Π ∪ Δ → [0, 1] such that N (ϕ) = 1 for all ϕ ∈ Π and N (ϕ) < N (ψ) iff ϕ ≺ ψ. 1 The notion of argument is the usual one inherited from similar definitions in the argumentation literature [11,10,6]. Given a program P, an argument for a literal (conclusion) Q of L is a pair A = A, Q, with A ⊆ Δ such that Π ∪ A  ⊥, and A is minimal (w.r.t. set inclusion) such that Π ∪ A  Q. If A = ∅, then we will call A a s-argument (s for strict), otherwise it will be a d-argument (d for defeasible). We define the strength of an argument A, Q, written s(A, Q), as follows: s(A, Q) = 1 if A = ∅, and s(A, Q) = min{N (ψ) | ψ ∈ A}, otherwise. The notion of subargument is referred to d-arguments and expresses an incremental proof relationship between arguments which is defined as follows. Let B, Q and A, P  be two d-arguments such that the minimal sets (w.r.t. set inclusion) ΠQ ⊆ Π and ΠP ⊆ Π such that ΠQ ∪ B  Q and ΠP ∪ A  P verify that ΠQ ⊆ ΠP . Then, B, Q is a subargument of A, P , written B, Q  A, P , when either B ⊂ A (strict inclusion for defeasible knowledge), or B = A and ΠQ ⊂ ΠA (strict inclusion for strict knowl1 Actually, a same pre-order  can be represented by many mappings, but we can take any of them to since only the relative ordering is what actually matters.



edge). A literal Q of L is called justifiable conclusion w.r.t. P if there exists an argument for Q, i.e. there exists A ⊆ Δ such that A, Q is an argument. As in most argumentation formalisms (see e.g. [10,6]), in P-DeLP and RP-DeLP frameworks it can be the case that there exist arguments supporting contradictory literals, and thus, there exist sets of conflicting arguments. Since arguments can rely on defeasible information, conflicts among arguments may be resolved in both frameworks by comparing their strength. In this sense the aim of both frameworks is to provide a useful warrant procedure in order to determine which conclusions are ultimately accepted (or warranted) on the basis of a given program. The difference between the two frameworks lies in the way in which this procedure is defined and the type of conflicts are handled. In P-DeLP warranted conclusions are justifiable conclusions which remain undefeated after an exhaustive dialectical analysis of all possible arguments for an against and only binary attacks or defeat relations are considered. In RP-DeLP semantics for warranted conclusions is based on a collective (non-binary) notion of conflict between arguments and if an argument is excluded from an output, then all the arguments built on top of it are excluded from that output. In the following sections we describe both mechanisms.
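As a purely illustrative aside (not part of the paper), the Horn-style inference operator ⊢ described above can be implemented as a simple forward-chaining closure; a set of clauses is then contradictory exactly when its closure contains some atom q together with ∼q. The clause encoding and function names below are our own assumptions.

```python
# Hedged sketch: forward-chaining closure for rules "Q <- P1,...,Pk"
# (facts are rules with an empty body). Literals are strings; "~p" is
# the negated atom of "p".
def closure(clauses):
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in derived and all(p in derived for p in body):
                derived.add(head)
                changed = True
    return derived

def neg(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def contradictory(clauses):
    c = closure(clauses)
    return any(neg(q) in c for q in c)

# Program discussed in Section 3: Pi = {p, ~p <- a,b,c}, Delta = {a, b, c}
Pi = [("p", []), ("~p", ["a", "b", "c"])]
Delta = [("a", []), ("b", []), ("c", [])]
print(contradictory(Pi))            # False: the strict part is consistent
print(contradictory(Pi + Delta))    # True: a, b and c collectively conflict
```

The last two calls use the program discussed in Section 3, whose defeasible facts a, b and c are pairwise compatible with Π but collectively conflicting.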

3. Warrant semantics of P-DeLP Let P be a P-DeLP program, and let A1 , Q1  and A2 , Q2  be two arguments w.r.t. P. A1 , Q1  defeats A2 , Q2 2 iff Q1 = ∼ Q2 and s(A1 , Q1 ) ≥ s(A2 , Q2 ), or A, Q  A2 , Q2  and Q1 = ∼ Q and s(A1 , Q1 ) ≥ s(A, Q). Moreover, if A1 , Q1  defeats A2 , Q2  with strict relation > we say that A1 , Q1  is a proper defeater for A2 , Q2 , otherwise we say that A1 , Q1  is a blocking defeater for A2 , Q2 . In P-DeLP warranted conclusions are formalized in terms of an exhaustive dialectical analysis of all possible argumentation lines rooted in a given argument. An argumentation line starting in an argument A0 , Q0  is a sequence of arguments λ = [A0 , Q0 , A1 , Q1 , . . . , An , Qn , . . .] such that each Ai , Qi  defeats the previous argument Ai−1 , Qi−1  in the sequence, i > 0. In order to avoid fallacious reasoning additional constraints are imposed, namely: 1. Non-contradiction: given an argumentation line λ, the set of arguments of the proponent (respectively opponent) should be non-contradictory w.r.t. P. 3 2. Progressive argumentation: (i) every blocking defeater Ai , Qi  in λ with i > 0 is defeated by a proper defeater4 Ai+1 , Qi+1  in λ; and (ii) each argument Ai , Qi  in λ, with i ≥ 2, is such that Qi = ∼Qi−1 . The non-contradiction condition disallows the use of contradictory information on either side (proponent or opponent). The first condition of progressive argumentation enforces the use of a proper defeater to defeat an argument which acts as a blocking 2 In what follows, for a given goal Q, we will write ∼ Q as an abbreviation to denote “∼ q", if Q ≡ q, and “q", if Q ≡ ∼ q. S 3 Non-contradiction for a set of arguments is defined as follows: a set S = n {A , Q } is contradictory i i i=1 Sn w.r.t. P iff Π ∪ i=1 Ai is contradictory. 4 It must be noted that the last argument in an argumentation line is allowed to be a blocking defeater for the previous one.



defeater, while the second condition avoids non optimal arguments in the presence of a conflict. An argumentation line satisfying the above restrictions is called acceptable, and can be proven to be finite. The set of all possible acceptable argumentation lines results in a structure called dialectical tree. Given a program P and a goal Q, Q is warranted w.r.t. P with maximum strength α iff there exists an argument A, Q with s(A, Q) = α such that: i) every acceptable argumentation line starting with A, Q has an odd number of arguments; and ii) there is no other argument of the form B, Q, with s(B, Q) > α, satisfying the above condition. In the rest of the paper we will write P |∼w A, Q, α to w denote this fact and we will write CDT (P) to denote the set of warranted conclusions of w P based on dialectical trees, i.e. CDT (P) = {Q | P |∼w A, Q, α}. In [5] Caminada and Amgoud proposed three rationality postulates which every rule-based argumentation system should satisfy. One of such postulates (called Indirect Consistency) requires that the set of warranted conclusions must be consistent (w.r.t. the underlying logic) with the set of strict clauses. This means that the warrant semantics of P-DeLP satisfies the indirect consistency postulate iff given a program P = (Π, Δ, ) w w its set of warranted conclusions CDT (P) is such that Π ∪ CDT (P)  ⊥. The defeat relation in P-DeLP, as occurs in most rule-based argumentation systems, is binary and, in some cases, the conflict relation among arguments is hardly representable as a binary relation when we compare them with the strict part of a program. For instance, consider the following program P = (Π, Δ, ) with Π = {p, ¬p ← a ∧ b ∧ c}, Δ = {a, b, c} and a single defeasibility level α for Δ. Clearly, A1 = {a}, a, A2 = {b}, b and A3 = {c}, c are arguments that justify conclusions a, b and c respectively, and A1 , A2 and A3 have no defeaters, and thus, {a, b, c} are warranted w.r.t. the P-DeLP program P. Indeed, conclusions a, b and c do not pairwisely generate a conflict since Π ∪ {a, b}  ⊥, Π ∪ {a, c}  ⊥ and Π ∪ {b, c}  ⊥. However, these conclusions are collectively conflicting w.r.t. the strict part of program Π since Π ∪ {a, b, c}  ⊥, and thus, the warrant semantics of P-DeLP does not satisfy the indirect consistency postulate. In order to characterize such situations we proposed in [1] the RP-DeLP framework, a new warrant semantics for P-DeLP based on a general notion of collective (non-binary) conflict among arguments ensuring the three rationality postulates defined by Caminada and Amgoud. 4. Warrant semantics of RP-DeLP The warrant recursive semantics of RP-DeLP is based on the following general notion of collective conflict in a set of arguments which captures the idea of an inconsistency arising from a consistent set of justifiable conclusions W together with the strict part of a program and the set of conclusions of those arguments. Let P = (Π, Δ, ) be a program and let W ⊆ L be a set of conclusions. We say that a set of arguments {A1 , Q1 , . . . , Ak , Qk } minimally conflicts with respect to W iff the two following conditions hold: (i) the set of argument conclusions {Q1 , . . . , Qk } is contradictory with respect to W , i.e. it holds that Π ∪ W ∪ {Q1 , . . . , Qk }  ⊥; and (ii) the set {A1 , Q1 , . . . , Ak , Qk } is minimal with respect to set inclusion satisfying (i), i.e. if S  {Q1 , . . . , Qk }, then Π ∪ W ∪ S  ⊥. 
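To make the definition concrete, the following small sketch (ours, not the authors') tests the minimal collective conflict condition, reusing the closure and contradiction helpers from the previous sketch; conclusions in W and in the candidate set are simply added as facts:

```python
# Hedged sketch (not from the paper): testing whether a set of argument
# conclusions minimally conflicts with respect to W and the strict part Pi.
from itertools import combinations

def conflicts(Pi, W, concls):
    extra = [(q, []) for q in list(W) + list(concls)]
    return contradictory(Pi + extra)

def minimally_conflicts(Pi, W, concls):
    if not conflicts(Pi, W, concls):
        return False
    # every proper subset must be consistent together with Pi and W
    return all(not conflicts(Pi, W, s)
               for k in range(len(concls))
               for s in combinations(concls, k))

Pi = [("p", []), ("~p", ["a", "b", "c"])]
print(minimally_conflicts(Pi, W=[], concls=["a", "b", "c"]))  # True
print(minimally_conflicts(Pi, W=[], concls=["a", "b"]))       # False
```

For this program, {a, b, c} is a minimal conflicting set with respect to W = ∅, while any proper subset, such as {a, b}, is not.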
This general notion of conflict is used to define an output of an RP-DeLP program P = (Π, Δ, ) as a pair (Warr, Block) of subsets of L of warranted and blocked conclu-



sions respectively. Since we are considering several levels of strength among arguments, the intended construction of the sets of conclusions Warr and Block is done level-wise, starting from the highest level and iteratively going down from one level to next level below. If 1 > α1 > . . . > αp ≥ 0 are the strengths of d-arguments that can be built within P, we define: Warr = Warr(1) ∪ {∪i=1,p Warr(αi )} and Block = ∪i=1,p Block(αi ), where Warr(1) = {Q | Π  Q}, and Warr(αi ) and Block(αi ) are respectively the sets of the warranted and blocked justifiable conclusions of strength αi . Intuitively, an argument A, Q of strength αi is valid whenever (i) it is based on warranted conclusions; (ii) there does not exist a valid argument for Q with strength > αi ; and (iii) Q is consistent with warranted and blocked conclusions of strength > αi . Then, a valid argument A, Q becomes blocked as soon as it leads to some conflict among valid arguments of same strength and the set of already warranted conclusions, otherwise, it is warranted. In [1] we show that, in case of some circular dependences among arguments, the output of an RP-DeLP program may be not unique, that is, there may exist several pairs (Warr, Block) satisfying the above conditions for a given RP-DeLP program. The following example shows a circular relation among arguments involving strict knowledge. Consider the RP-DeLP program P = (Π, Δ, ) with Π = {y, ∼y ← p∧r, ∼y ← q∧s}, Δ = {p, q, r ← q, s ← p} and a single defeasibility level α for Δ. Then, Warr(1) = {y} and A1 = {p}, p and A2 = {q}, q are valid arguments for conclusions p and q, respectively, and thus, conclusions p and q may be warranted or blocked but not rejected. Moreover, since arguments B1 = {q, r ← q}, r and B2 = {p, s ← p}, s are valid whenever q and p are warranted, respectively, and Π ∪ {p, r}  ⊥ and Π ∪ {q, s}  ⊥, we get that p can be warranted iff q is blocked and that q can be warranted iff p is blocked. Hence, in that case we have two possible outputs: (Warr1 , Block1 ) and (Warr2 , Block2 ), where Warr1 = {y, p}, Block 1 = {q, s} and Warr2 = {y, q}, Block2 = {p, r}. Figure 1 shows the circular dependences among {A1 , A2 } and {B1 , B2 }. Conflict and support dependencies among arguments are represented as dashed and solid arrows, respectively. The cycle of the graph expresses that (1) the warranty of p depends on a (possible) conflict with r; (2) the support of r depends on q (i.e., r is valid whenever q is warranted); (3) the warranty of q depends on a (possible) conflict with s; and (4) the support of s depends on p (i.e., s is valid whenever p is warranted). r







Figure 1. Circular dependences for program P.

In [1] we analyze the problem of deciding the set of conclusions that can be ultimately warranted in RP-DeLP programs with multiple outputs. The usual skeptical approach would be to adopt the intersection of all possible outputs. However, in addition to the computational limitation, as stated in [9], adopting the intersection of all outputs

T. Alsinet et al. / On the Characterization of the Maximal Ideal Recursive Semantics of RP-DeLP

221

may lead to an inconsistent output in the sense of violating the base of the underlying recursive warrant semantics, claiming that if an argument is excluded from an output, then all the arguments built on top of it should also be excluded from that output. Intuitively, for a conclusion, to be in the intersection does not guarantee the existence of an argument for it that is recursively based on ultimately warranted conclusions. Instead, the set of ultimately warranted conclusions we are interested in for RP-DeLP programs is characterized by means of a recursive level-wise definition considering at each level the maximum set of conclusions based on warranted information and not involved in neither a conflict nor a circular definition of warranty. We refer to this output as maximal ideal output of an RP-DeLP program. Intuitively, a valid argument A, Q becomes blocked in the maximal ideal output,as soon as (i) it leads to some conflict among valid arguments of same strength and the set of already warranted conclusions or (ii) the warranty of A, Q depends on some circular definition of conflict between arguments of same strength; otherwise, it is warranted. Consider again the previous program P . According to Figure 1, valid arguments for conclusions p and q are involved in a circular circular definition of conflict, and thus, conclusions p and q must be blocked in the maximal ideal output of P and arguments for conclusions r and s are rejected. Hence, in that case, we have the following maximal ideal output of P: (Warrmax , Blockmax ), where Warrmax = {y}, Blockmax = {p, q}.

5. Dialectical analysis and maximal ideal output In [1] we prove that the maximal ideal output of an RP-DeLP program is unique and satisfies the indirect consistency property defined by Caminada and Amgoud with respect to the strict knowledge. w Next we show that given a program P and its set CDT (P) of warranted conclusions w based on dialectical trees, CDT (P) contains each warranted conclusion in the maximal ideal output of P. 5 Proposition 1 Let P = (Π, Δ, ) be a program with levels of defeasibility 1 > α1 > . . . > αp ≥ 0. If (Warr, Block) is the maximal ideal output of P, for each level αi it holds that Warr(αi ) ⊆ {Q | P |∼w A, Q, αi }. Obviously, Warr(1) = {Q | P |∼w A, Q, 1}. Notice that the inverse of Prop. 1 does not hold since the dialectical analysis based semantics of P-DeLP does not satisfy the indirect consistency property defined by Caminada and Amgoud with respect to the strict knowledge. Because we are interested in exploring the relationship between the warrant semantics of P-DeLP and the maximal ideal output of RP-DeLP, we have to extend the P-DeLP framework with some mechanism ensuring this property. In [5] Caminada and Amgoud propose as a solution the definition of a special transposition operator Cltp for computing the closure of strict rules. This accounts for taking every strict rule r = φ1 , φ2 , . . . , φn → ψ as a material implication in propositional logic which is equivalent to the disjunction φ1 ∨ φ2 ∨ . . . , φn ∨ ¬ψ. From that disjunction different rules of the form φ1 , . . . , φi−1 , ¬ψ, φi+1 , . . . , φn → ¬φi can be obtained 5 Proofs

are not included in the paper for space reasons.



(transpositions of r). If S is a set of strict rules, Cltp is the minimal set such that (i) S ⊆ Cltp (S) and (ii) If s ∈ Cltp (S) and t is a transposition of s, then t ∈ Cltp (S). Computing the closure under transposition of strict rules allows the indirect consistency property to be satisfied in the case of rule-based argumentation systems like DeLP or P-DeLP as it was proved in [5]. In fact, in some sense, it allows to perform forward reasoning from warranted conclusions, and thus, to evaluate collective conflicts among arguments. However, P-DeLP is a Horn-based system, so that strict rules should be read as inference rules rather than as material implications. In this respect, the use of transposed rules might lead to unintuitive situations in a logic programming context. Consider e.g. the program P = (Π, Δ, ) with Π = {q ← p ∧ r , s ← ∼r , p, ∼q, ∼s}, Δ = ∅. w (P) = {p, ∼q, ∼s}. In P-DeLP, p, ∼q and ∼s would be warranted conclusions, i.e. CDT However, the closure under transposition Cltp (Π) would include the rule ∼r ← p ∧ ∼q, resulting in inconsistency since both s and ∼s can be derived, so that the whole program would be deemed as invalid. Apart from the above limitation, when extending a P-DeLP program with all possible transpositions of every strict rule, the system can possibly establish as warranted goals conclusions which are not explicitly expressed in the original program. Consider e.g. the program P = (Π, Δ, ) with Π = {∼y ← a ∧ b, y}, Δ = {a, b} and two levels of defeasibility for Δ as follows: {b} ≺ {a}. Assume α1 is the level of {a} and α2 is the level of {b}, with 1 > α1 > α2 > 0. Transpositions of the strict rule ∼y ← a ∧ b are ∼a ← y ∧ b and ∼b ← y ∧ a. Then, the argument A = {∼b ← a ∧ y, a}, ∼b with strength α1 justifies conclusion ∼b. Moreover, as there is neither a proper nor a blocking defeater of A, we conclude that ∼b is warranted w.r.t. P  = (Π ∪ Cltp (Π), Δ, ), although no explicit information is given for literal ∼b in P. Moreover, notice that w w (P) = {y, a, b} and CDT (P  ) = {y, a, ∼b}. CDT Next we show that if (Warr, Block) is the maximal ideal output of a program P = (Π, Δ, ), the set Warr of warranted conclusions contains indeed each literal Q satisfying that P  |∼w A, Q, α and Π ∪ A  Q, with P  = (Π ∪ Cltp (Π), Δ, ) and whenever Π ∪ Cltp (Π)  ⊥. Proposition 2 Let P = (Π, Δ, ) be a program with levels of defeasibility 1 > α1 > . . . > αp ≥ 0 and such that Π ∪ Cltp (Π)  ⊥. If (Warr, Block) is the maximal ideal output of P and P  = (Π ∪ Cltp (Π), Δ, ), for each level αi it holds that Warr(αi ) = {Q | P  |∼w A, Q, αi  and Π ∪ A  Q}. Obviously, Warr(1) = {Q | Π  Q} = {Q | P  |∼w A, Q, 1 and Π ∪ A  Q}. Following the approach we made in [2] for dialectical semantics, next we study the behavior of the maximal ideal output of an RP-DeLP program in the context of nonmonotonic inference relationships. In order to do this, we define an inference operator w that computes the expansion of a program including all new facts which correspond ERS to warranted conclusions in the maximal ideal output. Formally: Let P = (Π, Δ, ) be an RP-DeLP program with levels of defeasibility 1 > α1 > . . . > αp ≥ 0 and let (Warr, Block) be the maximal ideal output of P. w w associated with P as follows: ERS (P) = (Π ∪ Warr(1), Δ ∪ We define the operator ERS   (∪i=1...p Warr(αi )),  ) and such that N (ϕ) = N (ϕ) for all ϕ ∈ Π ∪ Δ, N  (ϕ) = 1 for all ϕ ∈ Warr(1), and N (ϕ) = αi for all ϕ ∈ Warr(αi ), i = 1 . . . p. 
w is well-defined (i.e., given an RP-DeLP proNotice that by definition operator ERS w satgram as input, the associated output is also an RP-DeLP program). Moreover, ERS



isfies inclusion: given an RP-DeLP program P = (Π, Δ, ) with levels of defeasibility 1 > α1 > . . . > αp ≥ 0 and maximal ideal output (Warr, Block), Π ⊆ Π ∪ Warr(1), Δ ⊆ Δ ∪ (∪i=1...p Warr(αi )) and  preserves the total pre-order  on Π ∪ Δ. In what follows, given an RP-DeLP program P = (Π, Δ, ), a clause ϕ and a set of clauses Γ, we will write ϕ ∈ P and Γ ⊆ P to denote that ϕ ∈ Π ∪ Δ and Γ ⊆ Π ∪ Δ, respectively. w Besides, monotonicity does not hold for ERS , as expected. It is satisfied if all warranted conclusions from a given program are preserved when the program is augmented with new clauses. As a counterexample consider the program P = (Π, Δ, ) with Π = {q}, Δ = {p ← q} and a single level of defeasibility α for Δ. Then, w Warr(1) = {q} and Warr(α) = {p}, and thus, {q, p} ⊆ ERS (P). However, if we extend program P with the strict fact ∼p, we get the following program P  = (Π , Δ,  ) with Π = {q, ∼p} and N  (∼p) = 1. Then, Warr(1) = {q, ∼p} and Warr(α) = ∅ in the w w maximal ideal output of P  . Hence, p ∈ ERS (P  ) but p ∈ ERS (P) . Semi-monotonicity is an interesting property for analyzing non-monotonic consequence relationships. It is satisfied if all defeasible warranted conclusions are preserved when the program is augmented with new defeasible clauses. Semi-monotonicity does w not hold for ERS , as adding new defeasible clauses cannot invalidate already valid arguments, but it can enable new ones that were not present before, thus introducing new conflicts or new circular dependences among arguments. Arguments that were warranted may therefore no longer keep that status. Consider a variant of the previous counterexample: we consider the fact ∼p as defeasible information, i.e. we define the following program P  = (Π, Δ ,  ) with Δ = {p ← q, ∼p} N  (∼p) = N  (p ← q). Now, Warr(1) = {q}, Warr(α) = ∅ and Block(α) = {p, ∼p} for the maximal ideal output of w w P  . Hence, p ∈ ERS (P  ) but p ∈ ERS (P). w Next we define some relevant logical properties that operator ERS satisfies. w w w w satisfies idempotence: ERS (P) = ERS (ERS (P)). • The operator ERS w w w • The operator ERS satisfies cummulativity: if Q ∈ ERS (P), then if R ∈ ERS (P ∪{Q}) w implies R ∈ ERS (P). w w • The operator ERS satisfies (Horn) supraclassicality: Π ⊆ ERS (P), where Π = {Q | Π  Q}. w Finally, the operator ERS satisfies (somewhat softened) right weakening with respect to the set of strict rules. Indeed, it is satisfied in the full sense for RP-DeLP programs with a single defeasibility level: let P = (Π, Δ, ) be an RP-DeLP program with a single w defeasibility level for Δ, if Q ← P1 ∧ . . . ∧ Pk ∈ Π and {P1 , . . . , Pk } ⊆ ERS (P), then w Q ∈ ERS (P). The key point here is how warranted and blocked conclusions at higher levels of the maximal ideal output are taken into account in lower levels. In particular blocked conclusions play a key role in the propagation mechanism between defeasibility levels. In the RP-DeLP approach if a conclusion ϕ is blocked at level α, then for any lower level than α, not only the conclusion ϕ is rejected but also every conclusion ψ such that {ϕ, ψ}  ⊥. Then, for the general case we have the following right weakening logical w property for operator ERS . Let P = (Π, Δ, ) be an RP-DeLP program with defeasibility levels 1 > α1 > . . . > αp > 0, and let (Warr, Block) be the maximal ideal output of w w P. If Q ← P1 ∧ . . . ∧ Pk ∈ Π and {P1 , . . . , Pk } ⊆ ERS (P), then either Q ∈ ERS (P)   and N (Q) ≥ min{N (Pi ) | Pi ∈ {P1 , . . . 
, Pk }}, or Q, or ∼Q ∈ Block(β) for some β > min{N  (Pi ) | Pi ∈ {P1 , . . . , Pk }}.



6. Conclusions and future work In this paper we have analyzed the relationship between the exhaustive dialectical analysis based semantics of P-DeLP and the recursive based semantics of RP-DeLP and we have shown that the maximal ideal semantics of RP-DeLP provides a useful framework for making a formal analysis of logical properties of warrant in defeasible argumentation. Our current research work in RP-DeLP will follow two main directions: on the one hand, we are concerned with characterizing a lower bound on complexity for computing the warranty status of arguments according to the maximal ideal recursive semantics. On the other hand, we are concerned with developing a graphic representation framework of the maximal ideal recursive semantics. This representation could be used as a mechanism for refinement of strict and defeasible information. Acknowledgments This research was partially supported by the Spanish projects EdeTRI (TIN2012-39348C02-01) and AT (CONSOLIDER- INGENIO 2010, CSD2007-00022).


Planning


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-227


A comparison of two MCDM methodologies in the selection of a windfarm location in Catalonia

Arayeh Afsordegan a, Mónica Sánchez b, Núria Agell c,1, Juan Carlos Aguado d and Gonzalo Gamboa e
a Department of Engineering Projects, Universitat Politècnica de Catalunya
b Department of Applied Mathematics 2, Universitat Politècnica de Catalunya
c ESADE Business School, Universitat Ramon Llull
d Department of ESAII (Automatic Control), Universitat Politècnica de Catalunya
e Institute for Environmental Sciences and Technologies, Universitat Autònoma de Barcelona

Abstract. A case study in a social multi-criteria evaluation framework for selecting a windfarm location in the regions of Urgell and La Conca de Barberà in Catalonia is presented. Two different MCDM approaches are introduced and compared through their application to this case. On the one hand, a Qualitative TOPSIS methodology, able to address uncertainty and to deal with different levels of precision, is considered. On the other hand, we consider the results obtained by a non-compensatory outranking MCDM method. Both approaches are analyzed and their performance in the selection of a windfarm location is compared. Although the results show that both methods lead to similar rankings of the alternatives, the study highlights both their advantages and drawbacks.

Keywords. Multi-criteria decision-making, linguistic labels, qualitative reasoning, TOPSIS, wind energy.

Introduction

Since the beginning of the 1970s, multi-criteria decision making (MCDM) has been put forward as a comprehensive tool for solving problems in which several alternatives are described by means of a set of variables. Recently, a large number of challenges have been related to sustainability issues [1], [2]. In sustainability, the objective is to balance economic activities with their environmental and social impacts. In particular, in the energy sector the aim is to provide an appropriate balance between energy production and consumption with minimum negative impact on the environment [3]. Energy problems involve conflicting economic, technological, social and environmental criteria, with variables of both qualitative and quantitative nature. A suitable method must therefore be selected, among the many multi-criteria models that have been formulated, to rank or sort the alternatives.

1 Corresponding Author. E-mail address: [email protected], [email protected], [email protected]



Qualitative Reasoning (QR) is a subfield of Artificial Intelligence research that attempts to understand and explain the human ability to reason without precise knowledge. It has been used as a systematic tool for sustainability assessment. The absolute order-of-magnitude models introduced in qualitative reasoning use a linguistic approach able to work with different levels of precision. They can be integrated with multi-criteria decision making methods, such as TOPSIS, to evaluate alternatives with respect to different criteria in ranking problems [4]. The main aims of this paper are the application of a Qualitative TOPSIS methodology to a windfarm location problem and the comparison of its results with a technique that has already been used for this specific case. The methodology considered for the comparison, CKYL (Condorcet-Kemeny-Young-Levenglick), along with the windfarm location problem, is taken from previous work [9]. The comparison between the Qualitative TOPSIS and CKYL models highlights their advantages and disadvantages and represents a first step towards a possible integration of both methods in future studies. The reason for selecting the CKYL method is its simple adaptation to social choice and sustainability issues. The paper is organized as follows: Section 1 presents the windfarm location problem; Section 2 introduces the two MCDM ranking approaches, including the Qualitative TOPSIS methodology; Section 3 applies them to the windfarm location problem; Section 4 compares the results obtained by the Qualitative TOPSIS methodology and the CKYL method; finally, Section 5 closes the paper with a few conclusions and a brief description of our future work.

1. A Windfarm Location Problem in Catalonia

The rapid development of wind energy technology has recently made it a promising alternative to conventional energy systems. Wind energy, as a powerful source of renewable energy with rapid and simple installation, lack of emissions and low water consumption, is one of the most promising tools to confront global warming [5], [6]. Nevertheless, windfarm location is a problem that involves multiple and conflicting factors related to public opinion and public interest. To find the best windfarm location, the relevant economic, social, technical and environmental perspectives must be taken into account. Some studies have examined these key factors [6]–[8]. A case of the windfarm location problem in Catalonia is used to illustrate the potential of the methods introduced in Section 2. Early in this political process, different alternatives were proposed for the location of the desired windfarms (Table 1). The alternatives were constructed combining information from participatory processes, interviews and a review of the projects performed by the research group in [5].

Table 1. Alternatives for the location of the windfarm
CB-Pre: Coma Bertran preliminary project.
CB: Coma Bertran project.
ST: Serra del Tallat project.
CBST: Combination of the CB and ST projects.
L: Based on the CB and ST projects, this alternative considers the windmills located at least 1.5 km from the inhabited centers and potential tourist attractions (Santuari del Tallat).
R: This option tries to move the windmills away from the inhabited centers presenting higher resistance to the windfarms (Senan and Montblanquet).
NP: The possibility of not constructing parks at all.



2. Two MCDM Ranking Approaches

Two MCDM ranking approaches are applied in this paper to the windfarm location problem introduced above: the Qualitative TOPSIS methodology and the CKYL methodology.

2.1. Qualitative TOPSIS Methodology

This paper considers a TOPSIS methodology using an extension of the qualitative approach introduced in [9]. TOPSIS, developed by Hwang and Yoon in 1981, is one of the MCDM techniques for ranking alternatives. The basic idea of TOPSIS as used in this paper is that the compromise solution has the shortest distance to the positive "ideal" solution and the farthest distance from the negative "ideal" solution [10]–[12]. On the other hand, QR techniques manage data in terms of absolute or relative orders of magnitude. Absolute order-of-magnitude models usually consider a discretization of the real line into intervals corresponding to different qualitative labels. In this setting, and as a contribution of this paper to show the advantages of using qualitative labels, a Q-TOPSIS approach is introduced in which alternatives are represented by k-dimensional vectors of qualitative labels, with k = r·m, where r and m are the number of criteria and the number of experts, respectively. The experts' evaluations are given through a set of qualitative labels belonging to certain order-of-magnitude spaces. The basic qualitative labels, corresponding to linguistic terms, are defined via a discretization given by a set {a1, . . . , an+1} of real numbers a1 < a2 < . . . < an+1, with Bi = [ai, ai+1], i = 1, . . . , n. The non-basic qualitative labels, describing different levels of precision, are defined by [Bi, Bj] = [ai, aj+1], i, j = 1, . . . , n, with i < j. Then, considering [Bi, Bi] = Bi, and μ a measure defined over the set of basic labels, the location function of a qualitative label [Bi, Bj] is introduced as an element of R^2 whose first component is the opposite of the sum of the measures of the basic labels to the left of [Bi, Bj] and whose second component is the sum of the measures of the basic labels to its right (Eq. 1):

L([Bi, Bj]) = ( - Σ_{l < i} μ(Bl) , Σ_{l > j} μ(Bl) ).    (1)

The location function is applied to each component of the k-dimensional vector of labels representing an alternative. As a result, each alternative is codified via a vector in R^(2k). The reference labels used to compute distances are, respectively, the minimum reference label A_min = (B1, . . . , B1) and the maximum reference label A_max = (Bn, . . . , Bn). Their location function values are:

L(A_min) = ( 0, Σ_{l>1} μ(Bl), . . . , 0, Σ_{l>1} μ(Bl) )
L(A_max) = ( - Σ_{l<n} μ(Bl), 0, . . . , - Σ_{l<n} μ(Bl), 0 )

This representation can then be integrated with TOPSIS to find the Euclidean distances of each alternative location to the locations of A_max and A_min: d_i^+ is the distance between the alternative location L(A_i) and L(A_max), while d_i^- is the distance between the alternative location L(A_i) and L(A_min). The qualitative closeness coefficient of each alternative is obtained by (Eq. 2):

QCC_i = d_i^- / (d_i^+ + d_i^-),   i = 1, 2, . . . , N.    (2)

Finally, the alternatives are ranked in decreasing order of their QCC_i values.

2.2. The CKYL Approach

The CKYL method was proposed by Young and Levenglick in 1978 [13]. This model is able to integrate social, economic and technical factors within a coherent framework, and it is a powerful model for energy policy analysis. The underlying idea in the development of this method is to enrich the dominance relation with elements based on preference aggregation. In the CKYL method, when a decision maker must compare two alternatives, she/he will express either a preference for one of them or indifference between them [14].

3. An application to the windfarm location problem in Catalonia

Taking into account the alternatives in Table 1, both approaches introduced in Section 2 are applied on the basis of nine indicators. The weights are considered equal in order to avoid compensation (Table 2) [5].

Table 2. Evaluation criteria
Criteria         Indicators               Weight   Direction
Economic         Land owner's income      0.11     +
                 Economic activity tax    0.11     +
                 Construction tax         0.11     +
Social           Number of jobs           0.11     +
                 Visual impact            0.11     -
Socio-ecologic   Forest lost              0.11     -
                 Avoided CO2 emissions    0.11     +
                 Noise                    0.11     -
Technical        Installed capacity       0.11     +

3.1. Qualitative TOPSIS Results

The criteria scores are first computed to construct the multi-criteria impact matrix (Table 3). All the scores are taken at the regional scale from [5].

Table 3. Multi-criteria impact matrix
Criteria                 Threshold   CB-Pre    CB        ST        CBST      L         R         NP
Land owner's income      3000        48000     33000     99000     132000    78000     72000     -
Economic activity tax    10000       12750     15470     46410     61880     36570     33750     -
Construction tax         12000       61990     55730     96520     15250     81890     67650     -
Number of jobs           1           2         1         4         5         3         3         -
Visual impact            1.5         76.057    71.465    276.55    348.015   220.4     163.29    -
Forest lost              -           8.04      8.1       6.6       14.7      3.9       2.6       -
Avoided CO2 emissions    200         4680      6010      19740     25750     14740     13760     -
Noise                    10          14.64     23.86     18.6      23.84     20.88     14.66     -
Installed capacity       1           13.6      16.5      49.5      66        39        36        -



Then, the steps of the Qualitative TOPSIS algorithm are executed. To this end, the highest and the lowest scores of each criterion are considered, respectively, as the maximum and the minimum elements of the qualitative space, and therefore as reference labels. The first step of the algorithm is to assign qualitative labels to the quantitative scores, which simplifies the computation in the ranking process. The Qualitative TOPSIS approach considered in this example uses seven basic qualitative labels for each criterion, corresponding to seven intervals whose length is defined via the corresponding threshold in Table 3. Table 4 shows these qualitative labels together with their locations, obtained directly from Eq. (1) with the measure μ over the set of basic labels given by μ(Bi) = 1 for all i = 1, . . . , 7.

Table 4. Different levels of qualitative labels
Qualitative label   Location
B1                  (0, 6)
B2                  (-1, 5)
B3                  (-2, 4)
B4                  (-3, 3)
B5                  (-4, 2)
B6                  (-5, 1)
B7                  (-6, 0)
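To make Eq. (1) concrete, the following sketch (an illustration, not code from the paper) computes the location of an arbitrary label [Bi, Bj] over the seven basic labels with μ(Bi) = 1; for the basic labels it reproduces the locations listed in Table 4.

```python
# Location function of Eq. (1) over n basic labels B1..Bn with measure mu(Bi) = 1.
def location(i, j, n=7):
    """Location of the (possibly non-basic) label [Bi, Bj], with 1 <= i <= j <= n."""
    left = sum(1 for l in range(1, i))            # measure of basic labels strictly to the left
    right = sum(1 for l in range(j + 1, n + 1))   # measure of basic labels strictly to the right
    return (-left, right)

# Basic labels: location(i, i) reproduces Table 4, e.g. B1 -> (0, 6), B4 -> (-3, 3), B7 -> (-6, 0).
print([location(i, i) for i in range(1, 8)])
# A non-basic (less precise) label such as [B2, B5] gets (-1, 2).
print(location(2, 5))
```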

Here, starting from the impact matrix and using the thresholds presented in Table 3 and the qualitative labels presented in Table 4, a qualitative impact matrix is constructed (Table 5).

Table 5. Qualitative impact matrix
Criteria                 CB-Pre   CB   ST   CBST   L    R    NP
Land owner's income      B3       B2   B6   B7     B5   B4   B1
Economic activity tax    B3       B3   B6   B7     B5   B5   B1
Construction tax         B4       B4   B6   B7     B5   B4   B1
Number of jobs           B3       B2   B6   B7     B5   B5   B1
Visual impact            B5       B6   B2   B1     B3   B4   B7
Forest lost              B3       B3   B4   B1     B6   B6   B7
Avoided CO2 emissions    B2       B3   B6   B7     B5   B4   B1
Noise                    B6       B6   B6   B6     B6   B6   B7
Installed capacity       B2       B3   B6   B7     B5   B4   B1

Each alternative A is represented by a 9-dimensional vector of qualitative labels (Eq. 3), obtained from its indicators' assessments (Ci denotes the assessment on the i-th indicator):

A ≡ (C1, . . . , C9).    (3)

Then, as mentioned in Subsection 2.1, each label is represented via a vector in R^2. Therefore, the location function codifies each alternative by an 18-dimensional vector of real numbers (Eq. 4), representing the positions of the labels in Eq. (3). Table 6 shows the alternatives' evaluation matrix via the locations of the nine indicators:

L(A) = (l1, . . . , l18).    (4)

The two vectors L(A_min) = L(B1, . . . , B1) = ((0,6), . . . , (0,6)) and L(A_max) = L(B7, . . . , B7) = ((-6,0), . . . , (-6,0)) are considered as the reference labels to compute distances.



Then, the Euclidean distance of each alternative from the two reference labels is calculated by means of Eq. (5). Note that the vector in Eq. (5) for each alternative A_i is formed by the locations of the nine criteria in the corresponding column of the matrix.

Table 6. Location impact matrix
Criteria                 CB-Pre    CB        ST        CBST     L         R         NP
Land owner's income      (-2,4)    (-1,5)    (-5,1)    (-6,0)   (-4,2)    (-3,3)    (0,6)
Economic activity tax    (-2,4)    (-2,4)    (-5,1)    (-6,0)   (-4,2)    (-4,2)    (0,6)
Construction tax         (-3,3)    (-3,3)    (-5,1)    (-6,0)   (-4,2)    (-3,3)    (0,6)
Number of jobs           (-2,4)    (-1,5)    (-5,1)    (-6,0)   (-4,2)    (-4,2)    (0,6)
Visual impact            (-4,2)    (-5,1)    (-1,5)    (0,6)    (-2,4)    (-3,3)    (-6,0)
Forest lost              (-2,4)    (-2,4)    (-3,3)    (0,6)    (-5,1)    (-5,1)    (-6,0)
Avoided CO2 emissions    (-1,5)    (-2,4)    (-5,1)    (-6,0)   (-4,2)    (-3,3)    (0,6)
Noise                    (-5,1)    (-5,1)    (-5,1)    (-5,1)   (-5,1)    (-5,1)    (-6,0)
Installed capacity       (-1,5)    (-2,4)    (-5,1)    (-6,0)   (-4,2)    (-3,3)    (0,6)

d(A_i, A_ref) = sqrt( Σ_{c=1}^{9} w_c · || L_c(A_i) - L_c(A_ref) ||^2 )    (5)

The weights considered in this case are equal, and the procedure detailed in Subsection 2.1 is applied. Table 7 shows the distances of each alternative to the reference labels together with the corresponding QCC_i values.

Table 7. Closeness coefficient factors
Alternative   d_i^-    d_i^+    QCC_i
CB-Pre        3.126    5.811    0.349
CB            3.018    5.98     0.335
ST            6.411    3.018    0.679
CBST          8.013    2.357    0.772
L             5.099    3.915    0.565
R             4.268    4.570    0.482
NP            0        8.485    0

According to the maximum QCC_i values, the best alternative is CBST, and the remaining alternatives are ranked in the order ST, L, R, CB-Pre, CB and NP.
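The closeness coefficients of Table 7 can be reproduced from the qualitative labels of Table 5. The sketch below is an illustration, not the authors' code, and rests on two assumptions that are consistent with the reported figures: equal weights of 1/9 inside the distance of Eq. (5), and reference labels fixed per indicator (B7 as the ideal for the benefit indicators, B1 as the ideal for visual impact, forest lost and noise).

```python
import math

# Qualitative labels of Table 5 (B1..B7) per indicator, in the order of Table 2.
labels = {
    "CB-Pre": [3, 3, 4, 3, 5, 3, 2, 6, 2],
    "CB":     [2, 3, 4, 2, 6, 3, 3, 6, 3],
    "ST":     [6, 6, 6, 6, 2, 4, 6, 6, 6],
    "CBST":   [7, 7, 7, 7, 1, 1, 7, 6, 7],
    "L":      [5, 5, 5, 5, 3, 6, 5, 6, 5],
    "R":      [4, 5, 4, 5, 4, 6, 4, 6, 4],
    "NP":     [1, 1, 1, 1, 7, 7, 1, 7, 1],
}
direction = ["+", "+", "+", "+", "-", "-", "+", "-", "+"]   # Direction column of Table 2 (assumed)
w = [1.0 / 9] * 9                                           # equal weights (assumed)

def loc(i, n=7):                       # location of basic label Bi, Eq. (1) with mu = 1
    return (-(i - 1), n - i)

def dist(locs, refs):                  # weighted Euclidean distance of Eq. (5)
    return math.sqrt(sum(wi * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)
                         for wi, a, b in zip(w, locs, refs)))

# Per-indicator reference labels: B7 is the ideal for "+" indicators, B1 for "-" ones.
ideal      = [loc(7) if d == "+" else loc(1) for d in direction]
anti_ideal = [loc(1) if d == "+" else loc(7) for d in direction]

for alt, labs in labels.items():
    locs = [loc(i) for i in labs]
    d_plus, d_minus = dist(locs, ideal), dist(locs, anti_ideal)
    qcc = d_minus / (d_plus + d_minus)
    print(f"{alt:7s} d-={d_minus:5.3f}  d+={d_plus:5.3f}  QCC={qcc:5.3f}")
# Reproduces Table 7 up to rounding, e.g. CBST: d- ~ 8.01, d+ ~ 2.36, QCC ~ 0.77.
```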

4. Comparison of the Qualitative TOPSIS and a non-compensatory method

In this section, Qualitative TOPSIS is compared with the CKYL methodology in two respects: theoretically, and by comparing the results obtained by both methodologies. Many multi-criteria models exist for obtaining a final ranking of the available alternatives [15], [16], each with its own advantages and disadvantages. As stated in the introduction, the reason for selecting the CKYL method is its simple adaptation to social choice and sustainability issues. In addition, in this method a weakness on one criterion is not compensated by strength on other desirable criteria. Using non-compensatory models in a social framework thus helps preserve all social actors' opinions, but we believe they can be further enhanced by combining them with qualitative reasoning methods. Let us first compare the advantages and drawbacks of both methods using an example based on the data provided in [5].



In the following subsection, the outranking method introduced in [5] is considered and compared.

4.1. CKYL outranking method results

In this subsection, the results provided by the CKYL method are considered. Starting from the impact matrix and using the thresholds presented in Table 3, an ordinal matrix is constructed (Table 8).

Table 8. Ordinal impact matrix
Criteria                 CB-Pre   CB   ST   CBST   L   R   NP
Land owner's income      3        2    6    7      5   4   1
Economic activity tax    3        3    6    7      5   5   1
Construction tax         4        4    6    7      5   4   1
Number of jobs           3        2    6    7      5   5   1
Visual impact            5        6    2    1      3   4   7
Forest lost              3        3    4    1      6   6   7
Avoided CO2 emissions    2        3    6    7      5   4   1
Noise                    6        6    6    6      6   6   7
Installed capacity       2        3    6    7      5   4   1

In the CKYL method, the equal weights are taken as importance coefficients to avoid compensation, and the criterion scores must be aggregated. The algorithm builds an outranking matrix by pair-wise comparison between alternatives j and k by means of Eq. (6):

e_jk = Σ_{i=1}^{m} [ w_i(P_jk) + (1/2) w_i(I_jk) ]    (6)

where P_jk and I_jk indicate a "preference" and an "indifference" relation, respectively: a higher criterion score is preferred to a lower one, and equal scores indicate an indifference relation. The maximum likelihood ranking of alternatives is the ranking supported by the maximum number of criteria for each pairwise comparison, summed over all pairs of alternatives considered. All the N(N-1) pair-wise comparisons compose the outranking matrix (Table 9).

Table 9. Outranking matrix
          CB-Pre   CB     ST     CBST   L      R      NP
CB-Pre    0        0.44   0.17   0.28   0.17   0.22   0.67
CB        0.56     0      0.17   0.28   0.17   0.22   0.67
ST        0.83     0.83   0      0.28   0.72   0.72   0.67
CBST      0.72     0.72   0.72   0      0.72   0.72   0.67
L         0.83     0.83   0.28   0.28   0      0.67   0.67
R         0.78     0.78   0.28   0.28   0.33   0      0.67
NP        0.33     0.33   0.33   0.33   0.33   0.33   0

Considering that there are N! possible complete rankings of alternatives, the corresponding score φ^r is computed for each one of them, and the final ranking is the one that maximizes Eq. (7):

φ^r = Σ_{e_jk ∈ R^r} e_jk,   r = 1, 2, . . . , N!    (7)

where R^r denotes the set of pairwise relations asserted by ranking r. The five best rankings, with the maximum scores among all 5040 possible rankings for the seven alternatives, are presented in Table 10.



This method gives the opportunity to perform sensitivity analyses with respect to the points of view of different policy makers in different social evaluation frameworks.

Table 10. Ranking
CKYL   Ranking 1   Ranking 2   Ranking 3   Ranking 4   Ranking 5
1      CBST        CBST        CBST        CBST        CBST
2      ST          ST          L           ST          ST
3      L           L           ST          R           L
4      R           R           R           L           R
5      CB          CB-Pre      CB          CB          CB
6      CB-Pre      CB          CB-Pre      CB-Pre      NP
7      NP          NP          NP          NP          CB-Pre
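Equations (6) and (7) can be made concrete with a short script. The sketch below is an illustration (with equal importance coefficients of 1/9, as in the paper), not the authors' implementation: it rebuilds the outranking matrix of Table 9 from the ordinal scores of Table 8 and then scores all 7! = 5040 complete rankings; the maximum-likelihood ranking it returns is Ranking 1 of Table 10.

```python
from itertools import permutations

# Ordinal impact matrix of Table 8 (higher is better on every indicator).
scores = {
    "CB-Pre": [3, 3, 4, 3, 5, 3, 2, 6, 2],
    "CB":     [2, 3, 4, 2, 6, 3, 3, 6, 3],
    "ST":     [6, 6, 6, 6, 2, 4, 6, 6, 6],
    "CBST":   [7, 7, 7, 7, 1, 1, 7, 6, 7],
    "L":      [5, 5, 5, 5, 3, 6, 5, 6, 5],
    "R":      [4, 5, 4, 5, 4, 6, 4, 6, 4],
    "NP":     [1, 1, 1, 1, 7, 7, 1, 7, 1],
}
alts = list(scores)
w = [1.0 / 9] * 9   # equal importance coefficients

# Eq. (6): e_jk = sum_i [ w_i(P_jk) + 1/2 w_i(I_jk) ]
def e(j, k):
    return sum(wi if a > b else wi / 2 if a == b else 0.0
               for wi, a, b in zip(w, scores[j], scores[k]))

# Eq. (7): score of a complete ranking = sum of e_jk over all ordered pairs it asserts.
def phi(ranking):
    return sum(e(ranking[i], ranking[j])
               for i in range(len(ranking)) for j in range(i + 1, len(ranking)))

best = max(permutations(alts), key=phi)
print(best)   # ('CBST', 'ST', 'L', 'R', 'CB', 'CB-Pre', 'NP'), i.e. Ranking 1 of Table 10
```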

4.2. The comparison of the two methodologies

Although the two methods produce similar rankings, they have different characteristics in their frameworks and structures. The Qualitative TOPSIS method does not require handling a previous discretization or defining landmarks for the initial qualitative terms, because the calculations are performed directly with the labels through the location functions, so the computations are fast and easy. In contrast, the CKYL method, which uses a maximum likelihood approach as its aggregation function, is more difficult to compute and becomes unmanageable as the number of alternatives rises. As can be seen in Table 11, the two rankings differ only in the fifth and sixth positions, where CB-Pre and CB are swapped. Also, Qualitative TOPSIS constructs only one ranking, whereas CKYL explores all the possible ones, which could of course represent the preferences of different social actors.

Table 11. Ranking results
Ranking              1      2    3   4   5        6        7
Qualitative TOPSIS   CBST   ST   L   R   CB-Pre   CB       NP
CKYL                 CBST   ST   L   R   CB       CB-Pre   NP

Additionally, the Qualitative TOPSIS method can handle different levels of precision, from the basic labels, which are the most precise, to the least precise label, which can be used to represent unknown values. This strength of the proposed method is not exercised in this study because an illustrative example with previously available evaluation scores is used; in a real case study it allows trade-offs among criteria or attribute values. On the other hand, CKYL avoids compensation and trade-offs by treating the weights as importance coefficients. Therefore, low scores on one criterion cannot be compensated by high scores on another. Table 12 summarizes the main characteristics of the two methodologies.

Table 12. Comparison of both ranking methodologies
                       Qualitative TOPSIS                     CKYL
Scale                  Qualitative labels                     Ordinal scale
Granularity            Multi-granularity                      Fixed granularity
Compensatory           Compensatory model                     Non-compensatory model
Weights                Trade-off                              Importance coefficients
Aggregation step       Based on distance function             Outranking and pair-wise comparison
Aggregation function   Distance to the maximum and minimum    Maximum likelihood approach
Results                One ranking                            N! rankings



These differences suggest that both methods could be used together synergistically: Qualitative TOPSIS can more efficiently process data that is qualitative from the outset and support expert group decision making, while CKYL can enforce the absence of compensation.

5. Conclusion

This paper presents an MCDM application to windfarm location selection based on a Qualitative TOPSIS methodology that gives the experts the ability to deal with uncertainty. The results are compared with those of the outranking-based CKYL methodology. One of the main advantages of the proposed method is that each stakeholder can express their preferences with their preferred degree of precision, a desirable feature for transparent decision processes. In addition, from a sustainability point of view, a social or environmental disaster can never be compensated by an economic success; this is also one of the main characteristics of the proposed approach. As future research, the integration of both methods will be considered. On the one hand, the use of non-compensatory methods is considered to be very useful in sustainability problems, because the weakness of one criterion is not compensated by the strength of others with the same weights. On the other hand, the use of qualitative labels determined by a partition of the real line, introduced in qualitative reasoning, is considered to provide an appropriate evaluation framework for group decision-making. Although in the illustrative case introduced in this paper the strength of the proposed method in handling different levels of precision is not exploited, new real cases will be considered in future research to apply it.

Acknowledgements

This research was partially supported by the SENSORIAL Research Project (TIN2010-20966-C02-01 and TIN2010-20966-C02-02), funded by the Spanish Ministry of Science and Information Technology. Partial support was also provided by a doctoral fellowship awarded to one of the authors at the ESADE Business School, with additional support from Ramon Llull University.

References
[1] S. D. Pohekar and M. Ramachandran, "Application of multi-criteria decision making to sustainable energy planning—A review," Renew. Sustain. Energy Rev., vol. 8, no. 4, pp. 365–381, Aug. 2004.
[2] K. F. R. Liu, "Evaluating environmental sustainability: an integration of multiple-criteria decision-making and fuzzy logic," Environ. Manage., vol. 39, no. 5, pp. 721–736, May 2007.
[3] J.-J. Wang, Y.-Y. Jing, C.-F. Zhang, and J.-H. Zhao, "Review on multi-criteria decision analysis aid in sustainable energy decision-making," Renew. Sustain. Energy Rev., vol. 13, no. 9, pp. 2263–2278, Dec. 2009.
[4] J. M. Tapia García, M. J. del Moral, M. A. Martínez, and E. Herrera-Viedma, "A consensus model for group decision making problems with linguistic interval fuzzy preference relations," Expert Syst. Appl., vol. 39, no. 11, pp. 10022–10030, Sep. 2012.
[5] G. Gamboa and G. Munda, "The problem of windfarm location: A social multicriteria evaluation framework," Energy Policy, vol. 35, no. 3, pp. 1564–1583, Mar. 2007.
[6] A. H. I. Lee, H. H. Chen, and H.-Y. Kang, "Multi-criteria decision making on strategic selection of wind farms," Renew. Energy, vol. 34, no. 1, pp. 120–126, Jan. 2009.
[7] T.-M. Yeh and Y.-L. Huang, "Factors in determining wind farm location: Integrating GQM, fuzzy DEMATEL, and ANP," Renew. Energy, vol. 66, pp. 159–169, Jun. 2014.
[8] M. Wolsink, "Near-shore wind power—Protected seascapes, environmentalists' attitudes, and the technocratic planning perspective," Land Use Policy, vol. 27, no. 2, pp. 195–203, Apr. 2010.
[9] N. Agell, M. Sánchez, F. Prats, and L. Roselló, "Ranking multi-attribute alternatives on the basis of linguistic labels in group decisions," Inf. Sci., vol. 209, pp. 49–60, Nov. 2012.
[10] C.-L. Hwang and K. Yoon, Multiple Attribute Decision Making: Methods and Applications. Springer-Verlag, 1981.
[11] H. Shih, H. Shyur, and E. Lee, "An extension of TOPSIS for group decision making," Math. Comput. Model., vol. 45, pp. 801–813, 2007.
[12] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS," Eur. J. Oper. Res., vol. 156, no. 2, pp. 445–455, Jul. 2004.
[13] H. Young and A. Levenglick, "A consistent extension of Condorcet's election principle," SIAM J. Appl. Math., Jul. 1978.
[14] J. Figueira, V. Mousseau, and B. Roy, "ELECTRE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, pp. 1–35, 2005.
[15] "… decision aiding method," EURO J. Decis. Process., vol. 1, no. 1–2, pp. 69–97, May 2013.
[16] H. Polatidis, D. Haralambopoulos, G. Munda, and R. Vreeker, "Selecting an Appropriate Multi-Criteria Decision Analysis Technique for Renewable Energy Planning," Energy Sources, Part B: Econ. Planning, Policy, vol. 1, no. 2, pp. 181–193, Jul. 2006.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-237


A System for Generation and Visualization of Resource-Constrained Projects

Miquel BOFILL a,1, Jordi COLL a, Josep SUY a and Mateu VILLARET a
a Universitat de Girona, Spain

Abstract. In this paper we present a system for solving the Resource-Constrained Project Scheduling Problem (RCPSP) by means of reformulation into SMT. We extend previous results to RCPSP/max, where maximum delays between tasks are considered in addition to minimum delays. We also present a tool for easy modeling and solving of RCPSP/max instances, which allows a graphical representation of the solution obtained. The system is modular in the sense that new solvers, not necessarily based on SMT, can be plugged-in as external executables.

Keywords. Scheduling, RCPSP, Reformulation, SMT, Tools

Introduction

The Resource-Constrained Project Scheduling Problem (RCPSP) is one of the most general scheduling problems and has been extensively studied in the literature. It consists in scheduling a set of non-preemptive activities (or tasks) with predefined durations and demands on each of a set of renewable resources, subject to partial precedence constraints. Durations and precedence constraints between tasks imply minimal distances between tasks (minimum time lags). Normally, the goal is to minimize the makespan, i.e., the end time of the schedule. Many generalizations exist: for example, in RCPSP/max, minimum and maximum time lags are considered. That is, a maximum delay between the start times of every two tasks can be specified, in addition to the minimum delays implied by precedences. RCPSP/max is harder than RCPSP, since already finding a feasible solution becomes NP-hard. For a survey on variants and extensions of the RCPSP see [5]. Exact methods for solving the RCPSP are based on Mixed Integer Linear Programming (MILP) [11], Boolean satisfiability (SAT) [6], Satisfiability Modulo Theories (SMT) [1] and Lazy Clause Generation (LCG) [16]. The latter approach is the state of the art for RCPSP and RCPSP/max. In this paper we extend the SMT-based approach of [1] to RCPSP/max, and report some experimental results showing that our approach is competitive with LCG.

1 Corresponding Author: Miquel Bofill, Universitat de Girona, Campus de Montilivi, Edifici P-IV, E-17071 Girona, Spain; E-mail: [email protected]



Moreover, we present a tool, similar to PSPSolver [15], to graphically represent RCPSP instances and their solutions. Our system differs from PSPSolver in several aspects:
• The user can create and modify instances.
• The user can plug in any solver by simply setting its execution parameters, provided that the solver accepts the rcp or sch formats proposed in [10].
• Solutions can automatically be retrieved and graphically represented if they are supplied in a simple format (described in Section 3).
• The system supports several variants of RCPSP, such as RCPSP/max, multi-mode RCPSP and multi-mode RCPSP/max.
We especially remark the feature of creating/modifying instances, since we believe that this feature can be very helpful to the researcher. The rest of the paper is organized as follows. In Section 1 we briefly introduce the RCPSP. In Section 2 we extend the SMT encodings for the RCPSP presented in [1] to RCPSP/max, and provide some experiments to evaluate the approach. In Section 3 we present the graphical tool. In Section 4 we conclude and point out some future research directions.

1. The Resource-Constrained Project Scheduling Problem (RCPSP)

The RCPSP is defined by a tuple (V, p, E, R, B, b) where:
• V = {A0, A1, . . . , An, An+1} is a set of activities. A0 and An+1 are dummy activities representing, by convention, the starting and finishing activities, respectively. The set of non-dummy activities is defined as A = {A1, . . . , An}.
• p ∈ N^(n+2) is a vector of durations, where pi denotes the duration of activity i, with p0 = pn+1 = 0 and pi > 0, ∀i ∈ {1, . . . , n}.
• E is a set of pairs representing precedence relations; (Ai, Aj) ∈ E means that the execution of activity Ai must precede that of activity Aj, i.e., activity Aj must start after activity Ai has finished. We assume that we are given a precedence activity-on-node graph G(V, E) that contains no cycles; otherwise the precedence relation is inconsistent. Since precedence is a transitive binary relation, the existence of a path in G from node i to node j means that activity i must precede activity j. We assume that E is such that A0 precedes all other activities and An+1 succeeds all other activities.
• R = {R1, . . . , Rm} is a set of m renewable resources.
• B ∈ N^m is a vector of renewable resource availabilities, where Bk denotes the available amount of resource Rk per time unit.
• b ∈ N^((n+2)×m) is a matrix containing the resource demands of activities, where bi,k denotes the amount of resource Rk required by activity Ai. Note that b0,k = 0, bn+1,k = 0 and bi,k ≥ 0, ∀i ∈ {1, . . . , n}, ∀k ∈ {1, . . . , m}.

A schedule is a vector S = (S0, S1, . . . , Sn, Sn+1), where Si denotes the start time of activity Ai. We assume that S0 = 0. A solution to the RCPSP is a non-preemptive² schedule S of minimal makespan Sn+1, subject to the precedence and resource constraints.

² An activity cannot be interrupted once it is started.



[Figure 1 appears here: the precedence activity-on-node graph of an example instance, with each arc labelled with the activity duration and each node annotated with its demand on the three resources, together with Gantt charts showing a feasible schedule on Resource 1 (availability 3), Resource 2 (availability 3) and Resource 3 (availability 2).]

Figure 1. An example of RCPSP and one of its possible solutions [12].

It can be stated as follows:

minimize Sn+1    (1)

subject to:

Sj - Si ≥ pi    ∀(Ai, Aj) ∈ E    (2)

Σ_{Ai ∈ At} bi,k ≤ Bk    ∀Bk ∈ B, ∀t ∈ H    (3)

where At = {Ai ∈ A | Si ≤ t < Si + pi} represents the set of non-dummy activities in process at time t, and H = {0, . . . , T} is the scheduling horizon, with T (the length of the scheduling horizon) being an upper bound of the makespan. A schedule S is said to be feasible if it satisfies the precedence constraints (2) and the resource constraints (3).

In the example of Figure 1, three resources and seven (non-dummy) activities are considered. Each node is labeled with the number of the activity it represents. The durations of the activities are indicated on the outgoing arcs, and the resource consumptions are indicated in the labels next to the nodes. The upper part of the picture represents the instance to be solved, while the lower part gives a feasible solution using Gantt charts. For each resource, the horizontal axis represents time and the vertical axis represents the consumption.

The work in [1] introduced the rcpsp2smt³ system for solving RCPSP instances in the rcp and sch formats⁴ by means of reformulation into SMT, using several SMT encodings, named Time, Task and Flow. We briefly recall the main ideas of the first two, which are the ones exhibiting the best performance:
• The Time encoding is very similar to the MILP encoding proposed in [14]. The idea is to check, for every time step t and resource Rk, that the sum of the demands of all activities for resource Rk at time t does not exceed the availability of resource Rk. 0/1 variables are used to represent whether an activity is being processed at each time step (a sketch of this per-time-step check is given below).
• The Task encoding is inspired by the encoding proposed in [13]. In this encoding, variables are indexed by activity number instead of by time. The key idea is that it suffices to check that there is no overload at the beginning (or end) of each activity. Contrary to the Time encoding, here the number of variables does not depend on the time horizon.

³ http://imae.udg.edu/recerca/lap/rcpsp2smt/
⁴ RCPSP formats of the PSPLib [10].
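Constraints (2) and (3) can be checked directly on a candidate schedule, and the per-time-step resource check is also the essence of the Time encoding. The sketch below is illustrative (the tiny instance in it is hypothetical, not the one of Figure 1), not the rcpsp2smt encoding itself.

```python
def is_feasible(S, p, E, b, B):
    """Check a schedule S (start times, indexed 0..n+1) against constraints (2) and (3)."""
    # Precedence constraints (2): S_j - S_i >= p_i for every (i, j) in E.
    if any(S[j] - S[i] < p[i] for (i, j) in E):
        return False
    # Resource constraints (3): at every time step, summed demands stay within availability.
    horizon = max(S[i] + p[i] for i in range(len(S)))
    for t in range(horizon):
        for k, Bk in enumerate(B):
            demand = sum(b[i][k] for i in range(len(S)) if S[i] <= t < S[i] + p[i])
            if demand > Bk:
                return False
    return True

# Tiny illustrative instance: two real activities plus dummies.
p = [0, 2, 3, 0]                       # durations of A0, A1, A2, A3 (dummies have duration 0)
E = [(0, 1), (0, 2), (1, 3), (2, 3)]   # A0 precedes everything, A3 succeeds everything
b = [[0], [2], [2], [0]]               # demands on a single resource
B = [3]                                # availability of that resource
print(is_feasible([0, 0, 0, 3], p, E, b, B))   # False: A1 and A2 overlap and need 4 > 3
print(is_feasible([0, 0, 2, 5], p, E, b, B))   # True: A2 starts after A1 finishes
```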

2. The Resource-Constrained Project Scheduling Problem with Minimum and Maximum Time Lags (RCPSP/max)

RCPSP/max is a generalization of the RCPSP where the precedence graph G(V, E) becomes G(V, E, g), with g an edge labeling function valuating each edge (Ai, Aj) ∈ E with an integer time lag gi,j. Non-negative lags gi,j ≥ 0 correspond to a minimum delay in the start of activity Aj with respect to the start of activity Ai. The case gi,j < 0 corresponds to a maximum delay of -gi,j units in the start of activity Ai with respect to the start of activity Aj (see Figure 2). Note that the standard RCPSP can be seen as the particular case of RCPSP/max where only minimum time lags are considered, taking gi,j = pi ∀(Ai, Aj) ∈ E. Regardless of minimizing the makespan, deciding whether there exists a resource-feasible schedule that respects the minimum and maximum lags is NP-complete [2], and the optimization problem is NP-hard in the general case.

The Time and Task encodings proposed for the RCPSP in [1] can be straightforwardly adapted to RCPSP/max as follows. We simply need to replace the constraints (2) by the following, where general time lags are considered instead of only durations:

Sj - Si ≥ gi,j    ∀(Ai, Aj) ∈ E    (4)

Although the constraints are almost the same as before, the preprocessing steps of [1] need to be adapted to this more general setting, as we describe in the next subsection.
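A minimal sketch of how constraint (4) subsumes both kinds of lags: an ordinary precedence (Ai, Aj) becomes a positive lag gi,j = pi, while a maximum delay of d units of Aj after Ai is a negative lag of -d on the reversed edge. The instance below is hypothetical:

```python
def respects_lags(S, lags):
    """Constraint (4): S_j - S_i >= g_ij for every labelled edge (i, j, g_ij)."""
    return all(S[j] - S[i] >= g for (i, j, g) in lags)

# Hypothetical instance: A1 (duration 2) must precede A2 (duration 3),
# and A2 must start no more than 4 time units after A1.
lags = [
    (1, 2, 2),    # minimum lag g_12 = p_1: ordinary precedence A1 -> A2
    (2, 1, -4),   # maximum lag: S_1 - S_2 >= -4, i.e. S_2 - S_1 <= 4
]
print(respects_lags({1: 0, 2: 2}, lags))   # True: A2 starts 2 after A1
print(respects_lags({1: 0, 2: 6}, lags))   # False: the maximum delay of 4 is exceeded
```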



[Figure 2 appears here: the RCPSP/max version of the example, with each arc labelled with its time lag (including negative lags for maximum delays) and each node annotated with its activity duration and resource demands, together with Gantt charts of a feasible schedule on Resource 1 (availability 3), Resource 2 (availability 3) and Resource 3 (availability 2).]

Figure 2. An example of RCPSP/max and one of its possible solutions.

2.1. Preprocessing

In [1], several preprocessing computations are proposed to derive implied constraints that can help improve propagation during the solving process. These include the computation of an extended precedence set, a lower and an upper bound for the makespan, time windows of activities, and a matrix of incompatibilities between activities.

By extended precedences the authors refer to minimum time lags between all pairs of activities. They are computed in polynomial time O(n³), where n is the number of activities, by running the Floyd-Warshall algorithm on the graph defined by the precedence relation E. For RCPSP/max, only positive edges must be considered. The extended precedence set can be used to easily detect some unsatisfiable instances, given that an RCPSP/max instance is satisfiable only if the following two conditions hold:
1. There is no cycle of positive length, i.e., the minimum time lag between each activity and itself is zero.
2. There is no contradiction between the maximum and the minimum time lags between activities, i.e., for every negative edge gj,i, indicating a maximum delay of |gj,i| time units in the start of activity Aj after the start of activity Ai, we must have that the minimum time lag between Ai and Aj in the extended precedence set is not greater than |gj,i|.

Although finding a solution (with minimal makespan) for the RCPSP is NP-hard, a feasible schedule can be trivially found by concatenating all activities, i.e., without running any two activities in parallel. This determines a trivial upper bound. Moreover, well-known heuristics, such as the parallel scheduling generation scheme [7,8], can be used to improve this bound. However, this is not the case for RCPSP/max, due to the presence of maximum time lags; in fact, as commented before, in this case already finding a feasible solution is NP-complete. For this reason, the upper bound that we consider here is the following trivial one:

UB = Σ_{Ai ∈ A} max( pi , max_{(Ai,Aj) ∈ E} gi,j )
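The preprocessing just described can be sketched as follows (an illustration of the ideas, not the authors' implementation): Floyd-Warshall longest paths over the positive lag edges give the extended precedences, the two satisfiability conditions are checked against them, and the trivial upper bound UB is summed over the activities.

```python
NEG_INF = float("-inf")

def extended_precedences(n, lags):
    """Minimum time lag between every pair of activities: Floyd-Warshall longest
    paths over the positive lag edges only (edges are (i, j, g_ij) triples)."""
    d = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for (i, j, g) in lags:
        if g >= 0:
            d[i][j] = max(d[i][j], g)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] > NEG_INF and d[k][j] > NEG_INF:
                    d[i][j] = max(d[i][j], d[i][k] + d[k][j])
    return d

def obviously_unsatisfiable(n, lags):
    d = extended_precedences(n, lags)
    # Condition 1: no cycle of positive length (positive entry on the diagonal).
    if any(d[i][i] > 0 for i in range(n)):
        return True
    # Condition 2: a negative edge (j, i, g) declares that A_j may start at most |g|
    # after A_i; contradiction if the positive edges already force a larger delay.
    return any(g < 0 and d[i][j] > -g for (j, i, g) in lags)

def trivial_upper_bound(p, lags, activities):
    # UB = sum over non-dummy activities of max(p_i, max outgoing lag g_ij).
    return sum(max([p[i]] + [g for (a, _, g) in lags if a == i]) for i in activities)
```

Condition 1 relies on the standard property that, after the Floyd-Warshall loop, a positive entry on the diagonal of the longest-path matrix signals a cycle of positive length.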

The lower bound, time windows and incompatibilities between activities are calculated exactly as in [1].

2.2. Experiments

We have carried out experiments on the following RCPSP/max instances, accessible from the PSPLib [10]:
• Test sets J10, J20 and J30, each consisting of 270 instances with 5 resources and 10, 20 and 30 activities, respectively (cf. [9]).
• Test sets UBO10, UBO20 and UBO50, each consisting of 90 instances with 5 resources and 10, 20 and 50 activities, respectively (cf. [4]).
The experiments were run on an Intel i5-2400 at 3.10 GHz with 8 GB of RAM, using Yices [3] version 1.0.40 as the core SMT solver, with a cutoff of 600 seconds per instance. All instances in the J10, J20, UBO10 and UBO20 test sets were successfully solved within the allowed time. For the J30 test set, only 7 instances could not be solved with the Time encoding, and 13 with the Task encoding. For the UBO50 test set, only 3 instances could not be solved with the Time encoding, and also 3 with the Task encoding. These results show that our approach is competitive with the state-of-the-art LCG approach, which is a hybrid of a finite domain solver and a SAT solver: the results reported in [16] show between 7 and 9 unsolved instances in the J30 test set (depending on the strategy used) and 3 unsolved instances in the UBO50 test set, with a runtime limit of 600 seconds. We do not know which memory limit was used in the experiments of [16]. Unfortunately, our system raised an out-of-memory error for bigger instances, such as the ones in the UBO100 test set, consisting of 100 activities.

3. Description of the tool

VIEWPROJECT⁵ is a desktop tool for visualizing and editing instances of scheduling problems. It represents, in a graphical environment, the resources, activities, and restrictions on time and resource usage in a project, and lets the user modify them interactively.

⁵ http://imae.udg.edu/recerca/lap/viewproject/



Figure 3. Graph with a leftmost distribution. The filled nodes are members of the critical path between the starting and finishing activities.

The user can create instances or modify existing ones by introducing or removing activities and updating their durations, resource usages and precedences. The user can also introduce or remove resources and modify their capacities. Finally, the system is also able to run external solvers to find schedules for the project and to plot them in Gantt charts, with the only restriction of getting the solver's output in a specific format. Problem instances are loaded from and saved to plain text files. The supported encodings are .rcp files for RCPSP and .sch files for RCPSP/max (see [10]). The set of activities and their relations are displayed in an interactive directed acyclic graph (see Figure 3). Each activity has a corresponding node labeled with its identifier number. The edges between pairs of activity nodes represent time constraints (e.g., precedences), and they are labeled with the associated time lags. The application offers a default set of node distributions giving visual information about precedences, but it is possible to move the nodes anywhere on the display area. The default distributions are the following (a sketch of the leftmost one is given below):
• Leftmost distribution: All nodes are placed in columns and each node is in the leftmost possible column according to the precedence relation (i.e., always at the right of all its predecessors).
• Rightmost distribution: All nodes are placed in columns and each node is in the rightmost possible column according to the precedence relation.
• Centered distribution: All nodes are placed in columns and as centered as possible according to the precedence relation.
• Circular distribution: The nodes are placed equidistantly on the edge of an imaginary circle. This distribution ensures that no two edges overlap completely. All nodes are at the right of, or at the same vertical position as, all their predecessors.
The rest of the project properties, such as the durations of the activities, the resource usages and the capacities of the resources, are displayed in text panels (see Figure 4).
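As an illustration (not the tool's actual code), the leftmost distribution can be computed by placing each activity one column to the right of its furthest predecessor:

```python
def leftmost_columns(n, E):
    """Column of each node in the leftmost distribution: every node is placed
    one column to the right of its furthest predecessor (E is assumed acyclic)."""
    preds = {j: [] for j in range(n)}
    for (i, j) in E:
        preds[j].append(i)
    memo = {}
    def col(j):
        if j not in memo:
            memo[j] = 1 + max((col(i) for i in preds[j]), default=0)
        return memo[j]
    return [col(j) for j in range(n)]

# Hypothetical 5-activity example: 0 -> 1 -> 3, 0 -> 2 -> 3, 3 -> 4.
print(leftmost_columns(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]))
# [1, 2, 2, 3, 4]: node 3 sits to the right of both of its predecessors.
```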

244

M. Bofill et al. / A System for Generation and Visualization of Resource-Constrained Projects

Figure 4. Editor panel of project properties.

Another utility of the tool is to draw critical (longest necessary) paths between pairs of nodes in the graph.⁶ In order to find the critical paths, the longest path between each pair of nodes is calculated with the all-pairs shortest (longest) path Floyd-Warshall algorithm.

It is possible to run external solvers to solve problem instances. For that purpose, the path to an executable file must be given to the tool, as well as the needed command line arguments. No information other than the arguments is passed to the solvers, so it is the solver's responsibility to load the problem instance from the source file. The expected output is plain text on the standard output channel, with only the following tokens:
1. A sequence of assignments S0 = s0; S1 = s1; S2 = s2; . . ., where si is the start time of activity Ai in the schedule. It is mandatory to finish the sequence with an end-of-line character "\n".
2. Optionally, it can be stated that the schedule is optimal with a final line: s OPTIMUM FOUND
If the output does not match this template, it is displayed as plain text. Once the solution is loaded, it is represented in m Gantt charts, one for each resource Ri in the project (see Figure 5). The horizontal axis represents the timeline, and the vertical axis the capacity of the resource. In the chart for a resource Ri, each activity Aj is represented as a rectangle placed at the horizontal position corresponding to its start time Sj. The width of the rectangle is equal to the duration pj of activity Aj. The height of the rectangle is equal to the usage bj,i of resource Ri by activity Aj. All the start times S0, . . . , Sn+1 are also displayed in a text panel.

Figure 5. Gantt chart for a resource in a schedule.

⁶ Note that the critical path between two nodes determines the minimum time lag between the two corresponding activities.
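A small sketch of a parser for the expected solver output described above (a hypothetical helper, not part of VIEWPROJECT):

```python
import re

def parse_solver_output(text):
    """Parse 'S0 = 0; S1 = 2; ...' (plus an optional 's OPTIMUM FOUND' line)
    into a dict of start times and an optimality flag."""
    starts = {int(i): int(v) for i, v in re.findall(r"S\s*(\d+)\s*=\s*(\d+)", text)}
    optimum = "s OPTIMUM FOUND" in text
    return starts, optimum

example = "S0 = 0; S1 = 0; S2 = 2; S3 = 5;\ns OPTIMUM FOUND\n"
print(parse_solver_output(example))
# ({0: 0, 1: 0, 2: 2, 3: 5}, True)
```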

4. Conclusion

We have presented an extension of the results of [1] for the RCPSP to RCPSP/max, and shown that reformulation into SMT is a competitive approach for these problems compared with other methods. With preprocessing, an iterative minimization process, and an SMT solver as core solver, we have obtained performance similar to that of state-of-the-art systems.

As future work, a first goal is to exploit maximal time lags to improve the initial upper bound of the schedule. This improvement could imply a reduction of the number of variables in the Time encoding and hence mitigate the memory problems when dealing with big instances. The search for encodings that require less memory is also planned as further work. In a broader sense, future plans involve using SMT to solve more general problems such as the Multi-Mode Resource-Constrained Project Scheduling Problem (MRCPSP), the Multi-Mode Resource-Constrained Project Scheduling Problem with Minimal and Maximal Time Lags (MRCPSP/max), the Multi-Skill Project Scheduling Problem (MSPSP), and the Resource Investment Problem with Minimal and Maximal Time Lags (RIP/max). In all these generalizations, with more complex Boolean structure, we believe that SMT will excel further.

We have also presented a tool for easy definition, modification and graphical representation of instances, as well as of solutions. Contrary to the library-oriented approach of PSPSolver [15], our tool communicates with the RCPSP solver through text files in standard formats, so that any solver, based on any technology, can be easily plugged in.

Acknowledgements

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness through the project HeLo (ref. TIN2012-33042).

References
[1] C. Ansótegui, M. Bofill, M. Palahí, J. Suy, and M. Villaret. Satisfiability Modulo Theories: An Efficient Approach for the Resource-Constrained Project Scheduling Problem. In Proceedings of the 9th Symposium on Abstraction, Reformulation, and Approximation (SARA), pages 2–9. AAAI, 2011.
[2] M. Bartusch, R. H. Möhring, and F. J. Radermacher. Scheduling Project Networks with Resource Constraints and Time Windows. Annals of Operations Research, 16:201–240, January 1988.
[3] B. Dutertre and L. de Moura. The Yices SMT solver. Tool paper available at http://yices.csl.sri.com/tool-paper.pdf, August 2006.
[4] B. Franck, K. Neumann, and C. Schwindt. Truncated branch-and-bound, schedule-construction, and schedule-improvement procedures for resource-constrained project scheduling. OR-Spektrum, 23(3):297–324, 2001.
[5] S. Hartmann and D. Briskorn. A Survey of Variants and Extensions of the Resource-Constrained Project Scheduling Problem. European Journal of Operational Research, 207(1):1–14, 2010.
[6] A. Horbach. A Boolean Satisfiability Approach to the Resource-Constrained Project Scheduling Problem. Annals of Operations Research, 181:89–107, 2010.
[7] J. Kelley. The critical-path method: Resources planning and scheduling. Industrial Scheduling, pages 347–365, 1963.
[8] R. Kolisch. Serial and parallel resource-constrained project scheduling methods revisited: Theory and computation. European Journal of Operational Research, 90(2):320–333, 1996.
[9] R. Kolisch, C. Schwindt, and A. Sprecher. Benchmark instances for project scheduling problems. In J. Weglarz, editor, Project Scheduling, volume 14 of International Series in Operations Research & Management Science, pages 197–212. Springer US, 1999.
[10] R. Kolisch and A. Sprecher. PSPLIB – A Project Scheduling Problem Library. European Journal of Operational Research, 96(1):205–216, 1997.
[11] O. Koné, C. Artigues, P. Lopez, and M. Mongeau. Event-Based MILP Models for Resource-Constrained Project Scheduling Problems. Computers & Operations Research, 38:3–13, January 2011.
[12] O. Liess and P. Michelon. A Constraint Programming Approach for the Resource-Constrained Project Scheduling Problem. Annals of Operations Research, 157:25–36, 2008.
[13] A. O. El-Kholy. Resource Feasibility in Planning. PhD thesis, Imperial College, University of London, 1999.
[14] A. A. B. Pritsker, L. J. Watters, and P. S. Wolfe. Multiproject Scheduling with Limited Resources: A Zero-One Programming Approach. Management Science, 16:93–108, 1969.
[15] J. Roca, F. Bossuyt, and G. Libert. PSPSolver: An Open Source Library for the RCPSP. In 26th Workshop of the UK Planning and Scheduling Special Interest Group (PlanSIG 2007), pages 124–125, 2007.
[16] A. Schutt, T. Feydy, P. Stuckey, and M. Wallace. Solving RCPSP/max by lazy clause generation. Journal of Scheduling, 16(3):273–289, 2013.

Short Contributions and Applications


Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-249


Towards a Remote Associate Test Solver based on Language Data

Ana-Maria OLTEŢEANU a,1 and Zoe FALOMIR a
a Cognitive Systems, Universität Bremen, Germany

Abstract. The Remote Associate Test (RAT) has been used for measuring creativity in humans. In this paper, the initial parts of a creative cognitive problem-solving framework are implemented to solve the RAT automatically using a knowledge base of language data extracted from the Corpus of Contemporary American English. The results provided by the computational RAT are compared to those obtained in previous RATs carried out with human participants in order to hypothesize about an associationist creative cognition paradigm.

Keywords. computational creativity, Remote Associate Test, cognitive systems, knowledge base, language corpus

Introduction From the cognitive systems perspective, one of the major unsolved questions about creative problem-solving is what kind of knowledge organization and processes endow the human cognitive system with creative abilities. Data on the particular profile such abilities take in human performance (i.e. specific errors, functional fixedness, ability to freely associate, incubation, insight) can be used to try to understand and model such knowledge organization and processes. Running hypotheses on the type of knowledge organization and processes that are used by humans can be modeled in artificial systems. Thus any artificial system aiming to implement creative problem-solving in a cognitivelyinspired manner should be matched against human abilities and performance data. In order to test any assumptions about principles of knowledge organization and processes in creative problem-solving, a task is needed which enlists abilities similar to insight problem-solving, and also provides enough human data for a rich comparison of the performance of the machine. Due to these reasons, the Remote Associate Test (RAT), initially proposed by Mednick [1], is used here as a comparison point. This work is focused towards modeling a computational problem-solver that can answer this test in a cognitively inspired manner. The rest of this paper presents: the RAT test and the principles of the proposed framework (Section 1); the set-up of the proposed RAT problem-solver (Section 2), together with the knowledge used, the system’s knowledge organization and the system flow; the obtained results and their comparison to human normative data [2] (Section 3); finally, the results and proposals of further work are discussed (Section 4). 1 Corresponding Author: Universität Bremen, Enrique-Schmidt-Str. 5, 28359 Bremen, Germany, E-mail: [email protected]



1. The Remote Associate Test (RAT) The Remote Associate Test [1] takes the following form: given three word items, the participant has to find a fourth term, which is common or can be connected to all of them. For example, the following 3 items are given: COTTAGE - SWISS - CAKE; and the participant has to come up with a fourth related term. An answer considered correct in this case according to Mednick’s studies[1] is CHEESE, because of the following associates: cottage cheese, swiss cheese and cheese cake. The associative theoretical framework our approach is based on [3] proposes the use of specific knowledge organization principles to help the agent search its own memory in the creative problem-solving process, in order to find relevant but remote information, replace missing objects, re-represent the problem in new ways which make solution inference possible. The RAT problem-solver mechanism presented here implements a few of these principles in a domain in which we can compare it to human data. It has a knowledge base in which formerly encountered knowledge creates associative links. To find the solution, these links are brought together in an associative problem-solving process, and converge upon a solution. The actual mechanism will become clear in the following description of the RAT problem-solver set-up.

2. The Computational RAT Problem-Solver

The RAT problem-solver framework is composed of:

1. The RAT Knowledge Base (KB): the system is endowed with knowledge from language data of the Corpus of Contemporary American English2 (COCA), specifically 2-grams3. The 1 million most frequent 2-grams of this corpus are pruned based on the semantic analysis of the words (i.e. nouns, adjectives, verbs, etc. are selected4); in this way items not relevant for the RAT task are removed. As a result of the pruning, approx. 200,000 items are retained in the KB.

2. Knowledge Acquisition and Organization by Association: The system is presented sequentially with each of the pre-selected 2-grams of the corpus and endowed with three types of atomic knowledge structures: Concepts, Expressions and Links. When a 2-gram is presented to the system, it is registered as an Expression. The system then checks if it is aware of the concepts contained in this expression - any concept that is unknown is added to its Concepts list. A Link is attached to each of the Concepts in the presented Expression. The Link is bidirectional - it can lead from either Concept to the other Concept with which it has been encountered in an Expression. After a while, each Concept is thus connected by Links to all the other Concepts it has formed an expression with, thus forming a hub of incoming connections.

2 Corpus of Contemporary American English (COCA): http://corpus.byu.edu/coca/
3 n-grams are contiguous sequences of n items from a given sequence of text or speech, where n stands for the number of items in the sequence.
4 The UCREL CLAWS7 Tagset is used for extracting the semantic categories: http://ucrel.lancs.ac.uk/claws7tags.html
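As an illustration of this knowledge organization, the following minimal sketch (an assumption about the implementation, not the authors' code) builds Concepts, Expressions and bidirectional Links from a list of 2-grams:

```python
from collections import defaultdict

def build_knowledge_base(two_grams):
    """Build the associative KB: Concepts, Expressions and bidirectional Links."""
    concepts = set()
    expressions = []
    links = defaultdict(set)  # concept -> set of concepts it co-occurred with

    for first, second in two_grams:
        expressions.append((first, second))   # register the Expression
        concepts.update((first, second))      # add any unknown Concepts
        links[first].add(second)              # bidirectional Link
        links[second].add(first)

    return concepts, expressions, links

# Toy example with a few 2-grams (illustrative only, not the COCA data):
kb_concepts, kb_expressions, kb_links = build_knowledge_base(
    [("cottage", "cheese"), ("swiss", "cheese"), ("cheese", "cake"), ("swiss", "army")])
```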


3. Query System Flow: Whenever a 3-item query of the RAT form is received, each of the 3 items is activated. Then all the Concepts which are Linked to the 3 active items are activated. This implies activation of all the Concepts which have been previously observed in an Expression with them, independent of whether they appeared in the first or second position of the Expression. Thus the other Concept in all Expressions which contain the initial query items becomes active too. Through the activation of all these Concepts, convergence of activation towards a Concept sometimes occurs. For example, the items COTTAGE, SWISS and CAKE activate many separate items, but they also convergently activate the Concept CHEESE. This high activation (via convergence) allows the Concept CHEESE to be considered as a response for the RAT query.
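A minimal sketch of this query flow, reusing the `kb_links` structure from the previous fragment (again an illustrative assumption, not the original implementation): a Concept activated by all three query items is preferred, falling back to 2-item convergence.

```python
from collections import Counter

def answer_rat_query(items, links):
    """Spread activation from the three query items and return the Concept
    with the highest convergence (3-item convergence first, then 2-item)."""
    activation = Counter()
    for item in items:
        for neighbour in links.get(item, ()):   # Concepts linked to the query item
            activation[neighbour] += 1
    for needed in (3, 2):                       # best-next principle of selection
        for concept, hits in activation.most_common():
            if hits == needed and concept not in items:
                return concept
    return None

# e.g. answer_rat_query(["cottage", "swiss", "cake"], kb_links) -> "cheese"
```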

3. Experimentation and Results

For comparison of performance to tests given to human participants, the normative data from Bowden and M. Jung-Beeman [2] is used. The results show that out of the 144 items used in the psychological test, 64 are answered correctly5 by the proposed system. The system is set up to offer as answer the first Concept6 found with the highest activation. Thus, in the case in which multiple items are activated from the three different Concepts, the first one is chosen. As multiple items might be activated from all three Concepts, a different answer to that obtained by Mednick's test [1] could be offered. Note that some expressions that compose the correct answer in Mednick's test might not be present in the extraction from the corpus, and other items might associatively arise as the answer. If no 3-item convergence is found, the system proposes the first encountered Concept on which 2-item convergence has happened (a best-next principle of selection). The performance of the system can also be assessed by analyzing the knowledge acquired by the system. For example, an interesting issue is to check in how many cases all three expressions suggested by the query were inside the knowledge base of the system, and how the system performed when only two items were present (Table 1). The data we compare to [2] only offers one correct answer. However, the results obtained compared to human data exceeded expectations, validating the usability of some of the framework's organizational principles. As Table 1 shows, in the cases where the system had all 3 items in its database, the system's accuracy of response compared to human data was 97.92%, while when the system only knew two of the given expressions, it found the correct answer in 30.36% of the cases. Note that this is a bonus, since humans are normally assumed to answer correctly the queries for which they know all three items. The fact that some queries are solved with only 2 items in the KB shows that associative principles can add robustness to the system and help find solutions even in cases where knowledge is lacking. These accuracy results are achieved with knowledge organization and associationist principles alone, without yet introducing frequency data.

5 Correctness in this case is considered as the exact answer provided by the system on its first try.
6 Note this difference in notation: a Concept is part of the linked KB, whereas an item is part of the psychological test by Bowden and M. Jung-Beeman [2].


Table 1. Analysis of the accuracy data provided by the system.

Answer / items known   0 items   1 item   2 items   3 items   Total
Correct                      0        0        17        47      64
Not solved                   6       34        39         1      80
Accuracy                     -        -    30.36%    97.92%       -

4. Conclusions and Further Work

In this paper, an associative approach from a creative cognitive problem-solving framework is implemented to solve the Remote Associate Test (RAT) automatically from a knowledge base of language data extracted from the Corpus of Contemporary American English. The RAT has previously been used for measuring creativity in humans. The results provided by the implementation of the RAT are compared to those obtained in previous RATs carried out on human participants in order to hypothesize on the viability of a knowledge-organization oriented associationist paradigm for creative problem-solving. The accuracy of response of the system (compared to human data from Bowden and M. Jung-Beeman's test [2]) is 97.92% in the cases where the system had all 3 items in its knowledge base, while the system still manages to answer accurately in 30.36% of the cases when only two of the given expressions are known. Humans are normally assumed to answer correctly the queries for which they know all three items, so this shows that associative principles can add robustness to the system and help find solutions even in cases of incomplete knowledge. The RAT has not yet been studied thoroughly in the literature from a computational knowledge-organization paradigm. This puts us in the position of being able to suggest a large amount of further work. These suggestions can be narrowed to four categories: (1) addition of frequency data; (2) analysis of semantic influences; (3) comparison to other categories of RAT problems; and (4) reverse-engineering to generate other RAT tests for humans.

Acknowledgements

This work was conducted in the scope of the project R1-[Image-Space] of the Collaborative Research Center SFB TR 8 Spatial Cognition. Funding by the German Research Foundation (DFG) is gratefully acknowledged by Ana-Maria Olteţeanu. Moreover, the support by the European Commission through FP7 Marie Curie IEF actions under project COGNITIVE-AMI (GA 328763) is also acknowledged by Dr.-Ing. Zoe Falomir.

References

[1] S. A. Mednick and M. Mednick, Remote associates test: Examiner's manual, Houghton Mifflin, 1971.
[2] E. M. Bowden and M. Jung-Beeman, "Normative data for 144 compound remote associate problems", Behavior Research Methods, Instruments, & Computers, vol. 35, no. 4, pp. 634–639, 2003.
[3] A.-M. Olteţeanu, "From Simple Machines to Eureka in Four Not-So-Easy Steps. Towards Creative Visuospatial Intelligence", in V.C. Müller (ed.), Philosophy and Theory of Artificial Intelligence, Synthese Library, Berlin: Springer, 2014.
[4] J.W. Schooler and J. Melcher, "The ineffability of insight", in T.B. Ward and R.A. Finke (eds.), The creative cognition approach, Cambridge, The MIT Press, pp. 249–268, 1995.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-253


Fear Assessment: Why data center servers should be turned off

Damián Fernández-Cerero a, Alejandro Fernández-Montes a,1, Luis González-Abril b, Juan A. Ortega a and Juan A. Álvarez a
a Dpto. Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Spain
b Dpto. Economía Aplicada I, Universidad de Sevilla, Spain

Abstract. The purpose of this paper is to demonstrate the extra costs incurred by keeping all the machines of a data center running continuously for fear of damaging hardware, degrading service or losing data. In order to demonstrate this, an objective function is provided which minimizes energy consumption depending on the number of times that the machines are switched on/off. Finally we demonstrate that, no matter how afraid of damaging hardware an administrator is, it is always a better option to turn off machines in order to reduce costs.

Keywords. energy efficiency, grid computing, cloud computing, data center

Introduction

A data center is a facility used to house computer systems and associated components, including redundant or backup power supplies, redundant data communications connections and environmental controls. A data center consumes as much electricity as a small town and is sometimes a significant source of air pollution. Large companies like Google locate some of their data centers at high latitudes to minimize the cooling costs associated with these facilities, which represent almost 40% of their total energy consumption [1]. Data center energy consumption has grown in the past ten years, reaching 1.5% of worldwide energy consumption [6]. Thus, big companies have directed their energy-efficiency efforts to areas such as cooling [3], hardware scaling [4] or power distribution [5], slowing the growth of power consumption in these facilities in recent years [2], in line with the latest predictions. In addition to these areas of work, saving energy by switching machines on/off in grid computing environments has been simulated using different energy-efficiency policies, such as turning off every machine whenever possible or turning off some machines depending on the workload [7,8], saving about 30% of energy. However, big companies prefer not to adopt such policies because of their impact on the hardware, the possibility of damaging machines, and the costs associated with this hardware deterioration.

1 Corresponding Author: Alejandro Fernández-Montes, Dpto. Lenguajes y Sistemas Informáticos, E.T.S. Ing. Informática, Universidad de Sevilla, Av. Reina Mercedes s/n, 41012, Sevilla, Spain; E-mail: [email protected].


1. Problem Analysis

It makes sense that one of the most effective ways to achieve considerable energy savings is to turn off devices that are not being used. Although the average server utilization within data centers is very low (typically between 10 and 20 percent), very few companies prefer to turn off the machines that are not in use rather than leaving them in an idle state. While idle servers consume about half the energy of active ones [6], this still remains a high energy cost. IT departments generally prefer to keep the machines idle because of different fears, mainly:

1. Hardware damage: with a high number of switching on/off cycles, some computer hardware components suffer stress which can lead to their deterioration [9]. We then incur a repair cost.
2. Service degradation: if a task needs this damaged computer, we incur a new cost due to the worsening in service quality, response times, etc., which is the opportunity cost.
3. Data loss: this is a critical issue in a data center infrastructure. If the damaged machine (and its hard drive) was the only one that stored some data and the data has been lost, some critical operations could not be performed, which would entail very high operation costs.

2. Fear cost

A function that quantifies the fear costs is proposed, i.e. the costs associated with the belief that turning off data center machines involves more costs than the energy savings achieved. First, we assume that the minimum power consumption m is achieved by turning off the machines whenever possible, while the maximum power consumption Ma is obtained in the case where we never turn off the machines. We can then consider M = Ma − m as the extra expense of having all the machines turned on without interruption. To achieve this goal we propose a cost function on the number of times that machines could be turned off. We consider a value N that represents the maximum number of times that a machine can be turned on during an operation time T. This maximum depends on the operation time T, the shutting-down time T_{off} and the turning-on time T_{on} as N = T / (T_{off} + T_{on}). Then the random variable X_i, which takes the value 1 if a computer i breaks down on power switching and 0 otherwise, follows a Bernoulli model (which measures the probability of success or failure of an experiment) with probability of success p, whose value should be very close to zero. Finally, a new variable X = \sum_{i=1}^{x} X_i is considered, which represents the average number of damaged machines after x switching on/off cycles. On the other hand, we have to consider the cost of repairing the computers damaged by the switching on/off cycles. This is computed as an average cost C_r. Thus we can quantify the fear cost derived from switching machines on/off as C_{fear}(x) = x \, p \, C_r. In addition, if a computer is turned off and a request that requires that machine arrives, the client will need to wait until the computer turns on. Considering C_o as the opportunity cost that measures the value that a customer gives to that lost time, and T_{on} as the time that a computer needs to turn on, we can quantify the turn-on cost as C_{on}(x) = x \, T_{on} \, C_o. From both costs, the total cost of turning off the machines can be quantified as C(x) = x \, (T_{on} C_o + p \, C_r), with x = 0, 1, 2, \dots, N.


From this we can obtain the cost of switching off the machines whenever possible, C(N) = N \, (T_{on} C_o + p \, C_r). Nowadays most companies prefer not to turn off machines, so this decision implies that C(N) > M. In order to simplify this function we can consider y = x/N, which indicates the proportion (per unit) of switching on/off cycles applied with respect to the natural maximum applicable. From the above assumptions, C(y) = y \, N \, (T_{on} C_o + p \, C_r) is obtained. Now (1 − y) represents the proportion of switching on/off cycles not applied to the machines. Assuming that the extra cost M of having all the machines turned on is proportional to this proportion of cycles not applied, (1 − y)M is the cost of keeping the machines switched on. From these two costs we can compute the cost of applying a proportion y of the possible switch-offs:

f(y) = y \, N \, (T_{on} C_o + p \, C_r) + (1 - y) \, M, \qquad 0 \le y \le 1 \qquad (2)

Following the current hypothesis, which assumes that switching off any machine implies more cost (the so-called fear cost), this objective function reaches its minimum at y = 0, i.e., when no machines are turned off and they maintain continuous execution. However, if we assume that switching machines off and on moderately may have a benefit, we can consider a new objective function that starts from this hypothesis:

f(y) = A \, y^{B} + (1 - y), \qquad 0 < y < 1, \; A > 1, \; B > 1 \qquad (3)

This new objective function is convex and reaches its minimum at the point

y_0 = (AB)^{1/(1-B)} \qquad (4)

obtained by setting f'(y) = A B y^{B-1} - 1 = 0.

Thus, a high value of the B parameter favours shutting down the servers in the data center, while a value of B close to 1 favours keeping the machines on/idle. Nowadays, most data center companies do not shut down servers, so the B parameter stays close to 1 while y takes the value 0. Based on this objective function, we can model a strong reluctance to turn off machines by setting B to 1.1. However, even then, the minimum cost is achieved by turning off (a few) machines, although in this way we incur many fear-related costs. In the following figures (Figure 1a, Figure 1b and Figure 1c) the proportion of shutdowns applied is represented on the x axis, while the y axis represents the total cost of the data center. It is noticeable that the minimum cost is always achieved when shutting down at least some machines, never when no machines are turned off. If we overcome these fears of turning off machines, we can lower energy consumption: with B set to 1.5 the minimum cost is achieved by turning off more machines. This trend continues as we increase the value of B, which means that the smaller the fear-related costs we assume, the more machines we can turn off to reach the minimum cost, as is noticeable when B takes the value of 5.
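The following small numerical sketch evaluates the objective function of Eq. (3) and its minimum (4) for the parameter values used in the figures; the code itself and the printed comparison with f(0) are an illustration, not part of the original study.

```python
def objective(y, A, B):
    """f(y) = A*y**B + (1 - y): normalised cost of applying a proportion y
    of the possible switch on/off cycles."""
    return A * y**B + (1.0 - y)

def optimal_shutdown_proportion(A, B):
    """Minimum of the convex objective, y0 = (A*B)**(1/(1-B))."""
    return (A * B) ** (1.0 / (1.0 - B))

A = 1.5
for B in (1.1, 1.5, 5.0):
    y0 = optimal_shutdown_proportion(A, B)
    # f(0) = 1 corresponds to never switching anything off; f(y0) is always lower.
    print(f"A={A}, B={B}: y0={y0:.3f}, f(y0)={objective(y0, A, B):.3f}, f(0)={objective(0.0, A, B):.3f}")
```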


Figure 1. Objective function f(y) for (a) A = 1.5, B = 1.1; (b) A = 1.5, B = 1.5; (c) A = 1.5, B = 5.

3. Conclusions

The fear cost has been presented, which most companies nowadays incur due to the false belief that turning off machines in data centers involves more costs than savings. In order to demonstrate this, an objective function has been proposed which shows that a lower total cost can always be obtained by turning off data center servers a number of times greater than zero, showing that the current belief is a mistake that should be corrected by applying switching on/off policies. As future work, measurements of a) the extra costs associated with turning off the machines in terms of hardware and b) the energy savings that could be obtained by building a software system which implements energy-efficiency policies in data centers have to be analyzed.

Acknowledgements

This research is supported by the project Simon (TIC-8052) of the Andalusian Regional Ministry of Economy, Innovation and Science and by the project Hermes (TIN2013-46801-C4) of the Spanish Ministry of Economy and Competitiveness.

References

[1] Ahuja N, Rego C, Ahuja S, Warner M. Data center efficiency with higher ambient temperatures and optimized cooling control. 27th IEEE SEMI-THERM Symposium, 2011.
[2] Koomey J. Worldwide electricity used in data centers. Environmental Research Letters, 2008.
[3] El-Sayed N. Temperature management in data centers: Why some (might) like it hot. Technical report, Department of Computer Science, University of Toronto, 2012.
[4] Fan X. Power provisioning for a warehouse-sized computer. Technical report, Google Inc., 2007.
[5] Femal ME. Boosting data center performance through non-uniform power allocation. Autonomic Computing, 2005.
[6] Koomey J. Growth in data center electricity use 2005 to 2010. Analytics Press, 2011.
[7] Fernández-Montes A, González-Abril L, Ortega JA, Lefevre L. Smart scheduling for saving energy in grid computing. Expert Systems with Applications, 2012.
[8] Fernández-Montes A, Velasco F, Ortega JA. Evaluating decision-making performance in a grid-computing environment using DEA. Expert Systems with Applications, 2012.
[9] Pinheiro E, Weber WD, Barroso LA. Failure Trends in a Large Disk Drive Population. 5th USENIX Conference on File and Storage Technologies, 2007.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-257


Low-complex Real-Time Breathing Monitoring System for Smartphones

Pere MARTI-PUIG 1 (Corresponding Author), Gerard MASFERRER 2, Moises SERRA 3
Dept. Digital Information Technologies, UVic-UCC, 08500 Vic (Barcelona), Spain
1 [email protected], 2 [email protected], [email protected]

Abstract. Monitoring breathing is required in many applications in the medical and health fields, but it can also be used in a new set of applications in areas as diverse as entertainment games or new applications oriented to developing skills such as focused attention. As smartphones are widespread around the world, it is assumed that an important number of such applications must run on these platforms. In this work, an automatic, low-complex real-time system for monitoring breathing is presented. The system uses the acoustic signal recorded by a standard microphone placed in the area of the nostrils. The method is based on Sequential Walsh Hadamard (SWH) transform coefficients computed on non-overlapped frames to provide a reduced set of real parameters. These parameters feed a linear classifier that labels the frames, in real time, into three different groups: inspiration, transition and expiration. The system runs on a smartphone or any portable device, requiring only a standard microphone from a conventional Bluetooth headset as auxiliary equipment.

Keywords. Breathing monitoring, low-complex system, linear discriminant analysis, smartphone app, Sequential Walsh Hadamard (SWH) Transform

1. Introduction

This paper specifically considers accurate real-time breathing monitoring using an acoustic signal recorded by a standard microphone placed in the area of the nostrils. By analysing this acoustic signal, the breathing is continuously tagged in terms of its cycles of inspiration-expiration and an intermediate stage called transition. This preliminary design is intended to work in low-noise environments, running on portable devices such as smartphones or tablets. Therefore, in this work we focus our efforts on obtaining low-complexity algorithms. Breathing is one of the body's few autonomic functions that can be controlled and can affect the functioning of the autonomic nervous system [1]. Hence, it has received great attention within the biofeedback framework. Following the biofeedback breathing approach, some relevant research has been developed related to stress and health [2]–[7] or in the area of respiratory illnesses, as for example respiratory sinus arrhythmia (RSA),

the phenomenon by which respiration modulates the heart rate in normal humans [3], [8]. It is also reported that breathing affects arterial blood pressure and pulse volume. Therefore, nowadays an important part of those biofeedback clinical studies relate breath control to heart rate coherence [9] and to the idea of developing future non-pharmacological treatments for hypertension. Many clinical studies are being done under this assumption. Classical methods to monitor breathing are based on movement, volume and tissue composition detection. Methods in this category are transthoracic impedance monitoring, measurement of chest and/or abdominal circumference, electromyography, various motion detectors and photoplethysmography. A good compilation of these methods can be found in [10]. In the literature, several other non-intrusive methods have been proposed to detect breathing. Recently, in [11], [12] breathing was detected using far-infrared (FIR) cameras by monitoring the air flow temperature in the nostrils due to the inspiration and expiration phases. Those approaches involve image processing techniques and have to deal with practical issues such as head rotation, and the distance or angle between the person and the camera. Our approach follows the acoustic signal approximation, similarly to [13], [14], where the respiratory sound is measured using a microphone placed either close to the respiratory airways or over the throat to detect sound variations. The acoustic breathing signal has been studied and modelled in different works [15]–[18]. Current smartphones have become more ubiquitous and provide high computing power and connectivity, so a huge increase in the number of applications (apps) is expected. Biofeedback apps, for example, can take advantage of those devices, allowing a patient to perform training in different places and, also taking advantage of their powerful connectivity capabilities, to collect data - with user permission if the apps are distributed for free - to enable large-scale clinical studies. Nowadays, there are several applications available for smartphones, mainly for the iOS and Android operating systems, which work on different breathing aspects. These apps come from fields like health [1], [19], relaxation [20] or meditation techniques [21], and have a careful presentation but suffer from little real-time monitoring capacity, or have no control at all over the quality of the exercise. Most of these apps simply provide breathing rhythms that should guide the exercise. Our system can overcome these limitations and provide accurate control of breath monitoring.

2. Low-complex parameterization and system performance

In the case presented, a cheap standard microphone from a conventional Bluetooth headset was placed very close to the nostrils area with the amplifier gain near its maximum. Microphones detect airflow thanks to the sound created by turbulence in the human respiratory system: even in shallow breathing, turbulence occurs in parts of this system, creating a noise which is transferred through tissue to the surface of the skin [22]. The works in [15]–[18] have investigated the acoustic breathing signal from the point of view of its physical production. In a real-time breathing monitoring system able to run on handset devices, it is essential to avoid complex operations in order to save power, so we propose a new parameterization based on the Sequence Ordered (SO) Walsh Hadamard Transform (WHT), which has fast algorithms that do not require multiplications, only additions and subtractions. A less complex algorithm can then be developed by first detecting the very low energy frames of the signal, which will form the transition frames, and then introducing the rest of the frames into a very simple classifier in order to label them as inspiration or expiration frames. The acoustic signal is sampled at 8.8 kHz. The frame length is kept at 200 ms and the number of coefficients is fixed experimentally at three, based on our previous work in [23]. A Voice Activity Detector (VAD) is implemented by simply establishing a threshold on the output of the envelope detector given by:

e(n) = \left| v_{\downarrow 32}(n) \right| + 0.995 \, e(n-1) \qquad (1)

where v_{\downarrow 32}(n) is the input sample at instant n after downsampling by a factor of 32, e(n) is the envelope signal, and |·| stands for the modulus. The threshold can be slowly adjusted from the smartphone screen. The real-time implementation follows a classical two-buffer structure in order to manage the memory of the device appropriately. One entire frame can be allocated in each buffer.
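A sketch of the envelope detector and threshold VAD of Eq. (1), assuming a leaky accumulation of the rectified, downsampled signal; function names and the list-based interface are illustrative assumptions.

```python
def envelope(v, decay=0.995, downsample=32):
    """Leaky envelope detector on the downsampled acoustic signal."""
    e, out = 0.0, []
    for sample in v[::downsample]:          # downsampling factor of 32
        e = abs(sample) + decay * e         # e(n) = |v(n)| + 0.995 * e(n-1)
        out.append(e)
    return out

def voice_activity(v, threshold):
    """True where the envelope exceeds the user-adjustable threshold
    (inspiration/expiration); False marks candidate transition segments."""
    return [e > threshold for e in envelope(v)]
```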

Each frame is parameterized with only three coefficients p_a (a = 0, 1, 2), which are obtained from the above operations written in matrix form as follows:

w_{M/2} = W_{M/2} \, [\, I_{M/2} \;\; I_{M/2} \,] \, P_M \, v_M \qquad (2)

where P_M is the even-odd permutation matrix which, written using the column vectors i_n of the identity matrix I_N, takes the form

P_N = [\, i_0 \;\; i_2 \;\; \cdots \;\; i_{N-2} \;\; i_1 \;\; i_3 \;\; \cdots \;\; i_{N-1} \,], \qquad (3)

and W_{M/2} is the square SO WHT matrix of size M/2. The coefficients p_a, considering the elements {w_m} of the vector w_{M/2}, are obtained as follows:

p_a = \sum_{i=0}^{M/6 - 1} w_{i + a M/6}, \qquad a = 0, 1, 2 \qquad (4)
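A compact sketch of the parameterization of Eqs. (2)–(4). For clarity it uses a dense sequency-ordered Hadamard matrix rather than the fast, multiplication-free algorithm the text alludes to, and it truncates the frame to a power-of-two length, which is an assumption (the paper does not state how the 200 ms frame length is handled).

```python
import numpy as np
from scipy.linalg import hadamard

def so_wht(x):
    """Sequency-ordered Walsh-Hadamard transform of a power-of-two-length vector:
    the rows of the natural-ordered Hadamard matrix are reordered by the number
    of sign changes (sequency)."""
    H = hadamard(len(x))
    sequency = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(sequency)] @ x

def frame_parameters(frame):
    """p_a (a = 0, 1, 2): pairwise-sum the even- and odd-indexed samples
    ([I I] P_M v_M), take the half-length SO WHT and sum three equal bands."""
    m = 2 ** int(np.log2(len(frame)))          # largest power of two that fits (assumption)
    v = np.asarray(frame[:m], dtype=float)
    folded = v[0::2] + v[1::2]                 # [I  I] P_M v_M
    w = so_wht(folded)                         # W_{M/2} applied to the folded frame
    bands = np.array_split(w, 3)               # three bands of roughly M/6 coefficients
    return [band.sum() for band in bands]
```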

To evaluate the system performance, a set of 500 frames of 200 ms from each user (three males and three females) was parameterized with the three p_a parameters and hand-labelled into the three groups: inspiration, expiration and transition frames. Many possible techniques for data classification are available. Among them, Linear Discriminant Analysis (LDA) has been selected for its simplicity. We have shown that a low-complex, user-independent breathing classification system is able to properly classify 200 ms frames into the three categories in real time. When we feed all the frames from all the available subjects into the LDA classification system, a classification rate of 95.85% is achieved. When the breathing monitoring system only needs to follow the cycles, and can therefore base the classification decision on more than one frame, this result can be improved to nearly 100%.
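A sketch of the classification step with scikit-learn's LDA, assuming the labelled frames are available as three p_a parameters each (variable names are illustrative):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# X: one row of three p_a parameters per 200 ms frame;
# y: hand-assigned labels ("inspiration", "expiration", "transition").
def train_frame_classifier(X, y):
    clf = LinearDiscriminantAnalysis()
    return clf.fit(X, y)

# Example usage (X_train, y_train and the p0, p1, p2 values are assumed):
# label = train_frame_classifier(X_train, y_train).predict([[p0, p1, p2]])
```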

3. Conclusions

A preliminary study for developing a low-complex system for real-time breathing monitoring has been presented. The system is designed to be used with the acoustic signal recorded by a standard microphone and is based on a low-complex signal parameterization (only three parameters) performed on non-overlapping frames, which are classified as inspiration, expiration or transition by an LDA classifier. The classification rate of almost 96% is a promising result and encourages us to explore this approach for designing a real system using only a smartphone.


The major drawback of this study is that the database of the experiments is small. It has only six different volunteers, three males and three females, all of them between 18 and 25 years old. Therefore, the results have to be more widely contrasted and verified.

References

[1] L. J. Badra, W. H. Cooke, J. B. Hoag, A. A. Crossman, T. A. Kuusela, K. U. O. Tahvanainen, and D. L. Eckberg, “Respiratory modulation of human autonomic rhythms,” American Journal of Physiology - Heart and Circulatory Physiology, vol. 280, no. 6, pp. H2674–H2688, 2001.
[2] A. McGrady, “The effects of biofeedback in diabetes and essential hypertension,” Cleveland Clinic Journal of Medicine, vol. 77, no. Suppl 3, pp. S68–S71, 2010.
[3] P. Mikosch, T. Hadrawa, K. Laubreiter, J. Brandl, J. Pilz, H. Stettner, and G. Grimm, “Effectiveness of respiratory-sinus-arrhythmia biofeedback on state-anxiety in patients undergoing coronary angiography,” Journal of Advanced Nursing, vol. 66, no. 5, pp. 1101–1110, 2010.
[4] S. K. Moore, “Tools & toys: Calm in your palm,” IEEE Spectrum, vol. 43, no. 3, p. 60, 2006.
[5] V. M. Pokrovskii and L. V. Polischuk, “On the conscious control of the human heart,” Journal of Integrative Neuroscience, vol. 11, no. 2, pp. 213–223, 2012.
[6] G. F. Rafferty and W. N. Gardner, “Control of the respiratory cycle in conscious humans,” Journal of Applied Physiology, vol. 81, no. 4, pp. 1744–1753, 1996.
[7] M. Stock, K. Kontrisova, K. Dieckmann, J. Bogner, R. Poetter, and D. Georg, “Development and application of a real-time monitoring and feedback system for deep inspiration breath hold based on external marker tracking,” Medical Physics, vol. 33, no. 8, pp. 2868–2877, 2006.
[8] A. J. R. Van Gestel, M. Kohler, J. Steier, S. Teschler, E. W. Russi, and H. Teschler, “The effects of controlled breathing during pulmonary rehabilitation in patients with COPD,” Respiration, vol. 83, no. 2, pp. 115–124, 2012.
[9] M. Sharma, “RESPeRATE: Nonpharmacological treatment of hypertension,” Cardiology in Review, vol. 19, no. 2, pp. 47–51, 2011.
[10] M. Folke, L. Cernerud, M. Ekström, and B. Hök, “Critical review of non-invasive respiratory monitoring in medical care,” Medical and Biological Engineering and Computing, vol. 41, no. 4, pp. 377–383, 2003.
[11] L. J. Goldman, “Nasal airflow and thoracoabdominal motion in children using infrared thermographic video processing,” Pediatric Pulmonology, 2012.
[12] T. Koide, S. Yamakawa, D. Hanawa, and K. Oguchi, “Breathing detection by far infrared (FIR) imaging in a home health care system,” in Proceedings of the 2009 International Symposium on Bioelectronics and Bioinformatics, 2009, p. 206.
[13] P. Corbishley and E. Rodriguez-Villegas, “A nanopower bandpass filter for detection of an acoustic signal in a wearable breathing detector,” IEEE Transactions on Biomedical Circuits and Systems, vol. 1, no. 3, pp. 163–171, 2007.
[14] P. Corbishley and E. Rodríguez-Villegas, “Breathing detection: towards a miniaturized, wearable, battery-operated monitoring system,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 1, pp. 196–204, 2008.
[15] I. Hossain and Z. Moussavi, “Relationship between airflow and normal lung sounds,” in Electrical and Computer Engineering, IEEE CCECE 2002, Canadian Conference on, 2002, vol. 2, pp. 1120–1122.
[16] I. Hossain and Z. Moussavi, “Respiratory airflow estimation by acoustical means,” in Engineering in Medicine and Biology, 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society EMBS/BMES, Proceedings of the Second Joint, 2002, vol. 2, pp. 1476–1477.
[17] B. E. Shykoff, Y. Ploysongsang, and H. K. Chang, “Airflow and normal lung sounds,” American Journal of Respiratory and Critical Care Medicine, vol. 137, no. 4, pp. 872–876, 1988.
[18] Y. L. Yap and Z. Moussavi, “Acoustic airflow estimation from tracheal sound power,” in Electrical and Computer Engineering, IEEE CCECE 2002, Canadian Conference on, 2002, vol. 2, pp. 1073–1076.
[19] “Breath Health Tester Pro,” Mar-3AD.
[20] “Relax: Stress & Anxiety Relief,” Mar-3AD.
[21] “Pranayama - Yoga Breathing,” Mar-3AD.
[22] V. P. Harper, H. Pasterkamp, H. Kiyokawa, and G. R. Wodicka, “Modeling and measurement of flow effects on tracheal sounds,” IEEE Transactions on Biomedical Engineering, vol. 50, no. 1, pp. 1–10, 2003.
[23] P. Martí-Puig, J. Solé-Casals, G. Masferrer, and E. Gallego-Jutglà, “Towards a Low-Complex Breathing Monitoring System Based on Acoustic Signals,” in Advances in Nonlinear Speech Processing, vol. 7911, T. Drugman and T. Dutoit, Eds. Springer Berlin Heidelberg, 2013, pp. 128–135.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-261


Influencer Detection Approaches in Social Networks: A Current State-of-the-Art

Jordi-Ysard PUIGBÒ 1, Germán SÁNCHEZ-HERNÁNDEZ, Mònica CASABAYÓ and Núria AGELL
GREC-ESADE, Ramon Llull University, Spain

Abstract. In this paper a literature review of approaches for influencer detection in social networks is conducted. The paper contributes a comparison of the three most popular influencer detection tools, an analysis of their methods and algorithms, and a list of proposed extended capabilities.

Keywords. Influencers, Social Networks, Word-of-Mouth Marketing, Opinion Leaders

Introduction

Identifying opinion leaders (OLs) has been a classical challenge for academics and practitioners in the marketing field. Rogers and Cartano [1] described them as "individuals who exert an unequal amount of influence on the decisions of others". The new digital OLs and the power of Word-of-Mouth communication strategies in Social Networks (SNs) have attracted the interest of the scientific community to the identification of influence in SNs. The principal approaches to the detection of OLs in SNs are based on information diffusion models [2], on identifying central nodes in the social network [3] and on ranking influencers using search engine ranking algorithms [4]. Besides, several companies have also developed tools which help marketing departments identify influencers. The double aim of this paper is to compare these tools, to help marketing managers find out which are more useful for their marketing standards, and to provide some extension guidelines. The paper is organised as follows: Section 1 introduces the main approaches to influencer detection and some interesting cases. Section 2 compares the aforementioned commercial applications, states the flaws of current approaches and proposes new work lines to extend their applicability in a business environment. Finally, Section 3 closes this document with a few conclusions and a brief description of our future work.

1. Influencer Ranking Approaches: A Survey

The detection of influencers has recently been addressed by Social Network Analysis (SNA), given the exponential growth of available data in web services.

1 Corresponding Author: Jordi-Ysard Puigbò, Avda. de la Torre Blanca 59, 312N. 08172 - Sant Cugat del Vallès, Spain; E-mail: [email protected]


This problem has been modelled by graphs, and so graph theory has become closely related to SNA. In this section we present the most representative approaches to identify influencers in social networks: centrality measures, defining users' prestige, and identifying the optimal paths for the diffusion of innovations.

1.1. Centrality Measures

Centrality, in graph theory, is defined as a measure of the importance of a given node within a graph. This method was widely explored in the early years of identification of opinion leaders in social networks2 and has been readopted to explore online SNs. In online SNs almost every action implies interaction (adding a friend in Facebook, retweeting in Twitter, etc.), either directed or undirected. Using these interactions as edges and considering users as nodes allows several different interaction graphs to be defined. The most common measures are InDegree Centrality, understood as the number of inward connections of a node i; Closeness Centrality, which considers the shortest paths to reach the studied node i from every node j; and Betweenness Centrality, which measures the proportion of shortest paths passing through a node i. Kiss and Bichler [3] provide an extended review and comparison of different centrality measures.

1.2. Prestige Ranking Algorithms

Ranking algorithms have their origin in web search engines. The need to find the most relevant documents, given a specific query, led to algorithms that determine a numerical value for each document, interpreted as a measure of prestige. This measure depends on the hyperlinks pointing from one web document to another. Although these algorithms (especially PageRank and HITS) were originally conceived for web page ranking, they have been applied and adapted to the specific area of SNA. As an introduction to these algorithms, the reader is referred to PageRank [5] and HITS [6], which can be directly applied to Twitter. PageRank iteratively determines the prestige of a webpage or user as the weighted sum of the directed links to a certain node, the weight being the prestige of the linking node. HITS, on the contrary, defines two measures for each node: authority, understood as the weighted sum of links directed to a page; and hub, defined as the weighted sum of links going out from the studied node. Since then, several adaptations of both algorithms have been created for SNs, although those based on PageRank are used more frequently. An example is TwitterRank [4], an adaptation of PageRank that considers the topical similarity between users, as a measure of direct influence from one user to another, as an additional weighting factor.

1.3. Information Diffusion

The objective of information diffusion approaches is to identify the paths that optimise the spread of information. The general workflow of these techniques consists in the selection of a diffusion model and an optimisation process to select the best initial set of nodes, i.e. the nodes that maximise the spread of information. These diffusion models define how a certain node becomes active due to the activity of others.

2 We understand a social network as any interconnected system whose connections are the product of social relations or interactions between persons or groups.
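As a brief illustration of the centrality and prestige measures of Sections 1.1 and 1.2, the following sketch computes them on a toy interaction graph with the networkx library; the graph and the edge semantics are assumptions for illustration only.

```python
import networkx as nx

# Directed interaction graph: an edge u -> v means user u retweeted or linked v.
G = nx.DiGraph([("a", "b"), ("c", "b"), ("d", "b"), ("b", "a"), ("d", "a")])

in_degree   = nx.in_degree_centrality(G)    # number of inward connections (normalised)
closeness   = nx.closeness_centrality(G)    # based on shortest paths to the node
betweenness = nx.betweenness_centrality(G)  # proportion of shortest paths through the node
pagerank    = nx.pagerank(G)                # prestige as weighted sum of incoming prestige
hubs, authorities = nx.hits(G)              # HITS hub and authority scores
```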


Kempe et al. [2] identified two basic diffusion models: Linear Threshold Models, in which a node i becomes active when the sum of the weights of its active neighbouring nodes becomes greater than a randomly chosen threshold θi; and Independent Cascade Models, in which, when a node i becomes active, it receives a single chance to activate each neighbouring node j, independently of the history so far. Many extensions of these models can be found in the literature, the most popular being those considering additional node states, such as SIS (susceptible-infectious-susceptible) or SIRS (susceptible-infectious-resistant-susceptible). Identifying the initial set of users that maximises the spread of information offers an interesting approach to the identification of influential users.
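A minimal sketch of the Independent Cascade model just described; the activation probability, the graph encoding and the seed set are illustrative assumptions.

```python
import random

def independent_cascade(graph, seeds, p=0.1, rng=None):
    """graph: dict node -> list of neighbours; seeds: initially active nodes.
    Each newly activated node gets a single chance, with probability p, to
    activate each of its still-inactive neighbours."""
    rng = rng or random.Random(0)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        new = []
        for node in frontier:
            for neighbour in graph.get(node, []):
                if neighbour not in active and rng.random() < p:
                    active.add(neighbour)
                    new.append(neighbour)
        frontier = new
    return active  # the spread achieved by this seed set
```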

2. Discussion on Current Approaches: Analysis and Extensions

2.1. Existing Commercial Applications

The increasing interest of companies in detecting OLs on the net has led to the appearance of several startups offering their own tools, based on a combination of different methodologies, which aim to increase the efficiency of marketing communication campaigns. Although an exhaustive analysis of online applications is beyond the scope of this work, Klout, PeerIndex and Kred3, among others, can be highlighted as some of the most popular existing tools in this area. Their strategies are different and their purpose is strictly to help companies reach their customers more efficiently, while they converge in providing an influence measure. Klout offers personalised marketing campaigns based on the identification of influencers in several SNs, detecting these influencers by the aggregation of several indicators. On the other hand, PeerIndex provides its clients with a toolbox that offers measures of general and topic-specific influence, together with the tools to launch and supervise a marketing campaign based on Twitter. Finally, Kred offers an open measure of influence based on badges, points and achievements, contrary to its competitors, which do not clearly state how they define influence.

2.2. Analysis and Extensions

The aforementioned algorithms face some difficulties and challenges that are still to be solved. On the one hand, the influence of users is spread across the internet and the SNs. Since most SNs place restrictions on data availability due to privacy and bandwidth limitations, Twitter has become the most widely addressed SN for identifying influencers, while others are left at a secondary level. The absence of adequate aggregations undermines the effectiveness of current approaches. Nonetheless, the methods described in Section 1 are more effective the more complete and connected the studied portion of the whole set of users is. On the other hand, influence is considered to be linked to specific topics. Identifying users who are influential on very specific topics is more useful than considering a general influence indicator. Consequently, the identification of these users potentially improves the efficiency of marketing campaigns. However, this last issue has been addressed in very simplistic ways, due to the difficulties in obtaining information about these topics. In order to face the previous challenges regarding influencer detection, some directions are suggested.

3 http://www.klout.com; http://peerindex.com; http://kred.com


The first is using advanced indicators, understood as measures not limited to the edges of each SN graph, but as generalised indicators identifiable in every SN, such as the engagement generated by a user's publications, their echo or the interest they generate. These indicators have been shown to be core aspects in the definition of influence in SNs [7]. Moreover, considering topical influence, as stated above, by collecting not only general but also topical indicators, is key to identifying both general influencers and domain-specific opinion leaders in a SN. Finally, finding the appropriate indicators that define influence and considering a proper fuzzy aggregation of the aforementioned advanced measures would allow a network-independent and generalised measure of influence to be properly defined.

3. Conclusions and Future Research

Some extensions have been proposed and discussed to improve the usability of current influencer detection approaches. The consideration of the widest possible context in the influence analysis, and of users' thematic influence across SNs, has been studied. Finally, to introduce future research, a call for the homogenisation of measures and techniques to advance influence detection approaches in SNs is shared in this paper. The next research lines will be oriented towards the development of functional and robust measures of influence involving several SNs.

Acknowledgements

This work is partially supported by the SENSORIAL Research Project (TIN2010-20966-C02-01), funded by the Spanish Ministry of Science and Information Technology. In addition, the support of the Spanish technology-based services provider Arvato (http://www.arvato.com) is also gratefully acknowledged.

References

[1] E. M. Rogers and D. G. Cartano. Living Research Methods of Measuring Opinion Leadership. Public Opinion Quarterly, 3(26):435–441, 1962.
[2] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '03, page 137, 2003.
[3] Christine Kiss and Martin Bichler. Identification of influencers – Measuring influence in customer networks. Decision Support Systems, 46(1):233–253, 2008.
[4] Jianshu Weng, Ee-Peng Lim, and Jing Jiang. TwitterRank: Finding Topic-Sensitive Influential Twitterers. pages 261–270, 2010.
[5] Lawrence Page and Sergey Brin. The PageRank Citation Ranking: Bringing Order to the Web. pages 1–17, 1998.
[6] Jon M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46(5):604–632, 1999.
[7] Leisa Reinecke Flynn, Ronald E. Goldsmith, and Jacqueline K. Eastman. Opinion leaders and opinion seekers: two new measurement scales. Journal of the Academy of Marketing Science, 24(2):137–147, 1996.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-265


Combinatorial and Multi-Unit Auctions Applied to Digital Preservation of Self-Preserving Objects

Jose Antonio OLVERA 1, Paulo Nicolás CARRILLO and Josep Lluis DE LA ROSA
TECNIO - Centre EASY, Agents Research Lab, VICOROB Institute, Univ. of Girona

Abstract. The optimization of the budget of self-preserving digital objects is a new bottom-up approach to digital preservation. In this paper it is developed through micro-negotiations, in the form of combinatorial and multi-unit auctions of preservation and curation services, so that these objects remain preserved, accessible and reproducible in the long term despite the frequent software and hardware updates that drive them towards digital obsolescence.

Keywords. Digital preservation, self-preservation, agents, electronic auctions

1. Introduction

Long Term Digital Preservation (LTDP) increasingly concerns companies, scientists and citizens, because it becomes a problem for anyone holding digital information that is over 15 years old and has outlived two or more versions of software and hardware. As [2] explains, our paradigm is bottom-up, where digital objects (DO) are self-preserving. In this paper, we propose that they manage their own cost of LTDP. Works like [1][3][4] propose behaviors for DOs to fight digital obsolescence with a promising level of success. However, part of that success is due to the budget management that DOs should perform for their preservation. We call them Cost Aware Digital Objects (CADO).

2. The Paradigm of Cost Aware Digital Object (CADO)

In our paradigm, we rethink the relationship between DOs and digital preservation services as follows. The CADO: it is responsible for its own conservation, with its own budget for hiring preservation services such as data-loss risk assessment, data recovery, metadata extraction, migration and new storage formats. We define the state of obsolescence with four attributes: accessibility, readability, integrity and authenticity, the combination of which represents the quality of the CADO and its digital longevity.1

1 Corresponding Author, email address: [email protected]. Email addresses: [email protected], [email protected], [email protected].


In our simulations, we consider that a CADO is "at risk" of obsolescence if any of its attributes is below the threshold value of 2/3, because this means that it can be accessed, read, or found integral or authentic in fewer than 2 out of 3 attempts.

Preservation Services: They are specific preservation actions, such as today's format migrations, metadata extraction and new storage. Examples of them were taken from [5]; the services are of types relevant to and compatible with a CADO, so that they can increase one or more attributes at the same time. We organize them in terms of a generic type of preservation (i.e., general families of services that will be valid 100 years from now, such as storage, conversion, checksum, etc.) and a particular type (the services themselves), an initial sale price and a minimum one, and an estimate of the percentage increase of the four attributes that a CADO can experience with each service. Both the price and the percentage quality increase in each CADO attribute are estimated from several heuristic criteria, such as the family and specific type of the service and its number of downloads, as shown in [5].

SAW – Software Adoption Wave: It represents a massive upgrade from old software to new, a wave of change, such as the one Microsoft urges every few years with its operating system, Office, or .NET and SharePoint. It occurs at the same pace as format updates, roughly every 5 years. In each SAW, we activate digital preservation actions to recover the quality the CADOs lose in these waves. We simulate the SAWs with a percentage decrement of the CADOs' attributes: 10% or 20% decrements are applied at random. With a 10% decrement in the quality of all its attributes, a CADO suffers a deterioration of about 30% after 15 years, the moment at which it urgently needs preservation or curation; the urgency is higher if the deterioration is larger.

Virtual Currency: In order to extemporize and universalize the price assignment of LTDP service costs, so that we can talk of the same prices now and in the very long future, regardless of the monetary paradigm of the future, as well as to provide budget to CADOs in a transparent, easier and general way, we define a virtual currency called PRESERVA (₱). We define that 1₱ is the cost of preserving a DO for 100 years. Among other advantages, this currency brings a measure of value: the degree of confidence one has in LTDP services based on their price, or the importance of a DO for a person depending on the budget that he or she allocated to it. In our experiments we use the m₱ (miliPRESERVA).
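A small sketch of this quality model: each SAW randomly degrades the four attributes by 10% or 20%, and a CADO is at risk when any attribute falls below 2/3. The data structure and names are assumptions for illustration.

```python
import random

ATTRIBUTES = ("accessibility", "readability", "integrity", "authenticity")
RISK_THRESHOLD = 2.0 / 3.0

def apply_saw(cado, rng=None):
    """Software Adoption Wave: decrement every attribute by 10% or 20%."""
    rng = rng or random.Random(0)
    decrement = rng.choice((0.10, 0.20))
    for attr in ATTRIBUTES:
        cado[attr] *= (1.0 - decrement)
    return cado

def at_risk(cado):
    """Attributes currently below the 2/3 obsolescence threshold."""
    return [a for a in ATTRIBUTES if cado[a] < RISK_THRESHOLD]

# Example CADO with full quality and an average budget in m₱ (illustrative values):
cado = {a: 1.0 for a in ATTRIBUTES}
cado["budget"] = 1500
```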

3. Implementation of Purchasing Mechanisms

In the simulations, we seek to maximize the life of the CADOs, which is estimated from the DOs' attributes; these therefore have to be maximized. Each DO is given a budget that it has to manage to achieve this purpose, with the following mechanisms:

Free-choice buying: In each SAW, the CADO chooses the cheapest service among the compatible services.

Descending and reverse multi-unit auction: The bidders interested in selling (the preservation services) compete by offering their best (lowest) price to be purchased by the CADOs, which thus proceed to increase the quality of the attribute that is at risk (below the 2/3 threshold). In each SAW the auction process is initiated, and each CADO with one of its four attributes below the threshold enters the process. In our implementation, there are 4 auctions in each process, one for each CADO attribute. For each of the 4 attributes, the CADOs for which that attribute is below the threshold and the preservation services that can enhance it are selected.


At this point rounds of auctions begin, searching through all the services for the one with the best quality-price ratio for the attribute in question; the CADOs that have enough budget and are willing to buy the service are identified, and they then purchase it. If there are still CADOs with any of their attributes under the threshold, and services ready to have their prices cut, the process goes on until one of the following criteria is met: no more DOs have preservation needs for that attribute; or the preservation services cannot cut their prices any further.

Descending and reverse combinatorial auction: CADOs with at least two under-threshold attributes look for services, and are now the offerors instead of the bidders of the multi-unit auction. In every SAW the auction process starts: for every demanding CADO, all the combinations of preservation services that contribute to keeping its attributes up are calculated. The winning combination is the one with the lowest price and the highest contribution to the quality of the CADO that requires the service. The winning combination is purchased by the CADO if it has enough budget; otherwise a new auction is started for it, as long as the preservation services are willing to cut prices. This process goes on until one of the following criteria is met: there are no more CADOs with preservation needs; the services cannot cut their prices any further; or the CADOs do not have enough budget for new purchases.

Simulation set-up: 5 simulations per scenario are run to have statistical ground; in each scenario the price of the services varies ±10%, as does their impact on the quality boost of the CADOs. 20 iterations happen in every run (of 100 years), and a SAW is simulated at every iteration, that is, every 5 years. The simulations run 1000 CADOs and 100 preservation services. Three scenarios are developed according to the budget: a low budget, which is depleted after a few iterations (800-1500 m₱); a high budget, which lasts many more iterations (4501-8500 m₱); and the average budget of 1501-4500 m₱. To give a clearer idea of the budget, having less than 1000 m₱ means that a CADO does not have enough budget to survive digitally over the 100 years of our experiments.
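A compressed sketch of one round of the descending, reverse multi-unit auction for a single attribute; the data structures, the 5% price-cut step and the stopping logic are assumptions, and the combinatorial variant would work analogously on bundles of services.

```python
def multi_unit_auction(cados, services, attribute, price_cut=0.05):
    """cados: list of dicts with attribute qualities and a 'budget' (in m₱);
    services: list of dicts with 'price', 'min_price' and per-attribute 'boost'.
    Repeatedly sell the best quality/price service to demanding CADOs,
    cutting prices while demand remains and prices can still drop."""
    while True:
        demanding = [c for c in cados if c[attribute] < 2.0 / 3.0]
        compatible = [s for s in services if s["boost"].get(attribute, 0) > 0]
        if not demanding or not compatible:
            break
        best = max(compatible, key=lambda s: s["boost"][attribute] / s["price"])
        buyers = [c for c in demanding if c["budget"] >= best["price"]]
        for c in buyers:
            c["budget"] -= best["price"]
            c[attribute] = min(1.0, c[attribute] + best["boost"][attribute])
        if not buyers:
            if best["price"] <= best["min_price"]:
                break                     # the cheapest useful service cannot cut its price further
            best["price"] = max(best["min_price"], best["price"] * (1 - price_cut))
```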

4. Results and Conclusions

We compare the three mechanisms with low (Figure 1), average (Figure 2) and high (Figure 3) budgets. The x-axis is the iteration at which auctions happen and the y-axis is the average quality of the CADOs. For low and average budgets, the average quality of the CADOs is best for the combinatorial auction, followed by the free choice, with the multi-unit auction performing worst; the combinatorial auction also keeps roughly double the remaining budget with respect to the free-choice mechanism. With high budgets, both the combinatorial auction and the free choice perform similarly in quality, yet the combinatorial auction manages the budget slightly better (at the end of the simulations it has 24% more budget). These are signs that the combinatorial auction contributes significantly to the LTDP of the CADOs. This result encourages us to follow this line of research with experiments of wider scope, refined behavioral auction heuristics, and more precise and extemporal definitions of preservation quality and virtual money, to properly back this bottom-up management of LTDP costs.


Figure 1. Low budget experiments

Figure 2. Average budget experiments

Figure 3. High budget experiments

Acknowledgements

This research is partly funded by the Spanish project IPT20120482430000 (MIDPOINT) Nuevos enfoques de preservación digital con mejor gestión de costes que garantizan su sostenibilidad, the EU DURAFILE num. 605356, FP7-SME-2013, BSG-SME (Research for SMEs) Innovative Digital Preservation using Social Search in Agent Environments, TIN2013-48040-R (QWAVES) Nuevos métodos de automatización de la búsqueda social basados en waves de preguntas, as well as the AGAUR 2012 FI_B00927 grant awarded to Jose Antonio Olvera and the consolidated research group CSI, ref. 2014 SGR 1469.

References

[1] de la Rosa J. L. and Olvera J.A. First Studies on Self-Preserving Digital Objects. AI Res. & Dev., Procs 15th Intl Conf. of the Catalan Assoc. for AI, CCIA 2012, Vol. 248, pp. 213-222, 2012, Alacant, Spain.
[2] Olvera, J.A. 2013. Digital Preservation: a New Approach from Computational Intelligence, Joint Conference on Digital Libraries 2013, JCDL Doctoral Consortium 2013, July 22-26, Indianapolis, Indiana, USA. Available at http://www.ieee-tcdl.org/Bulletin/current/papers/olvera.pdf
[3] Charles L. Cartledge and Michael L. Nelson. When Should I Make Preservation Copies of Myself? Technical Report arXiv:1202.4185, 2012.
[4] Nelson M. 2001, Buckets: Smart Objects for Digital Libraries, PhD thesis, Old Dominion Univ.
[5] Ruusalep R., Dobreva M. Digital Preservation Services: State of the Art Analysis. Available at www.dcnet.org/getFile.php?id=467
[6] Rothenberg, J. 1995. Ensuring the Longevity of Digital Documents, Scientific American, Vol. 272, Number 1, pp. 42-47, January 1995.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-269


Manifold learning visualization of Metabotropic Glutamate Receptors

Martha-Ivón Cárdenas a,b,1, Alfredo Vellido a,c and Jesús Giraldo b
a Llenguatges i Sistemes Informàtics, UPCatalunya, 08034 Barcelona, Spain
b Institut de Neurociències and Unitat de Bioestadística, UAB, 08193 Bellaterra, Spain
c CIBER-BBN, Cerdanyola del Vallès, Spain

Abstract. G-Protein-Coupled Receptors (GPCRs) are cell membrane proteins with a key role in biological processes. GPCRs of class C, in particular, are of great interest in pharmacology. The lack of knowledge about their 3-D structures means they must be investigated through their primary amino acid sequences. Sequence visualization can help to explore the existing receptor sub-groupings at different partition levels. In this paper, we focus on Metabotropic Glutamate Receptors (mGluR), a subtype of class C GPCRs. Different versions of a probabilistic manifold learning model are employed to comparatively sub-group and visualize them through different transformations of their sequences.

Keywords. G-Protein-Coupled Receptors, Metabotropic Glutamate Receptors, data visualization, Generative Topographic Mapping.

Introduction
The G-protein-coupled receptors (GPCRs) in the human genome form five main families (A to E) according to their similarity [9]. Class C GPCRs include metabotropic glutamate receptors (mGluRs), on which we focus our research. They are promising targets for the development of new therapeutic drugs. The functionality of GPCRs is often studied from the 3-D structure of their sequences. As no complete crystal structure data is currently available for class C GPCRs, the investigation of their primary structure as amino acid (AA) sequences is necessary. The unaligned symbolic sequences are unsuitable for direct analysis, but many different sequence transformation techniques are available to overcome this limitation. In this study, we used two relatively simple ones: the first is AA composition (AAC [2]), which accounts only for the relative frequencies of appearance of the 20 AAs in the sequence. Recent analyses using semi-supervised and supervised classification [3,4] with this type of transformation showed that accuracy reaches an upper bound. The second choice is the digram transformation, which considers the frequencies of occurrence of any given pair of AAs.
1 This research was partially funded by MINECO TIN2012-31377 and SAF2010-19257, as well as Fundació La Marató de TV3 110230 projects.
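As a concrete illustration of these two transformations (a sketch, not the authors' actual preprocessing pipeline), the functions below compute AAC and digram frequency vectors for an unaligned sequence; the toy sequence is made up:

from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def aac_features(seq):
    """AAC transformation: relative frequency of each of the 20 amino acids."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

def digram_features(seq):
    """Digram transformation: relative frequency of each ordered amino acid pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    n = len(pairs)
    return [pairs.count(a + b) / n for a, b in product(AMINO_ACIDS, repeat=2)]

toy_seq = "MKTLLLALAGVLLCA"              # hypothetical fragment, for illustration only
print(len(aac_features(toy_seq)))        # 20 features
print(len(digram_features(toy_seq)))     # 400 features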


Digram-transformed sequences were used for the more general classification of class C GPCR sequences in [5], obtaining accuracies in the area of 93-94%. The target of this study was exploratory mGluR sequence clustering and visualization as a preliminary but complementary step towards full-blown mGluR subtype classification (into their eight known subtypes). This was implemented using different variants of a nonlinear dimensionality reduction (NLDR) method: Generative Topographic Mapping (GTM [6]). This machine learning technique has previously been applied with success to the more general problem of class C GPCR visualization [7,8]. The visual discrimination of mGluR subtypes was quantitatively assessed here using an entropy measure.
1. Materials and methods
1.1. Class C GPCR mGluR data
The GPCRDB [9] database of GPCRs divides them into five major classes (namely, A to E). The investigated class C data (from version 11.3.4, as of March 2011) include 351 mGluR sequences, in turn sub-divided into 8 subtypes (mGluR1 to mGluR8) plus a group of mGluR-like sequences. They are distributed as 33 cases of mGluR1, 26 mGluR2, 44 mGluR3, 23 mGluR4, 32 mGluR5, 15 mGluR6, 4 mGluR7, 98 mGluR8 and 76 mGluR-like. These 8 subtypes can also be grouped into 3 categories according to sequence homology, pharmacology and transduction mechanism: group I mGluRs include mGluR1 and mGluR5; group II includes mGluR2 and mGluR3; whereas group III includes mGluR4, 6, 7 and 8.
1.2. The basic GTM and Kernel GTM
The GTM [6] is a non-linear latent variable model of the manifold learning family that performs simultaneous data clustering and visualization through a topology-preserving generative mapping from the latent space in R^L (with L = 2 for visualization) onto the R^D space of the observed data, in the form y = Φ(u)W, where y is a D-dimensional vector, Φ is a set of M basis functions, u is a point in the visualization space and W is a matrix of adaptive weights w_md. The likelihood of the full model can be approximated and maximum likelihood methods can be used to estimate the adaptive parameters. Details can be found in [6] and elsewhere. The probability of each of the K latent points u_k for the generation of each data point x_n, p(k|x_n), also known as a responsibility r_kn, can be calculated as part of the parameter estimation process. For data visualization, it is used to obtain, for each x_n, a posterior mode projection, defined as k_n^mode = arg max_k r_kn, as well as a posterior mean projection u_n^mean = Σ_{k=1}^{K} r_kn u_k. The standard GTM is used here to model and visualize the AAC- and digram-transformed unaligned sequences.
The kernel-GTM (KGTM) [10] is a kernelized version of the standard GTM that is specifically well-suited to the analysis of symbolic sequences such as those characterizing proteins. This is achieved by describing sequence similarity through a kernel function based on the mutations and gaps between sequences: K(x, x') = ρ exp( ν π(x, x') / (π(x, x) + π(x', x')) ) for sequences x and x', where ρ and ν are prefixed parameters and π(·,·) is a score function of common use in bioinformatics. Further details on these parameters can be found in [10]. KGTM is used here to model and visualize the multiple sequence alignment (MSA)-transformed sequences, using the posterior mode projection.
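The projections defined above reduce to a few matrix operations once a GTM has been fitted. The sketch below assumes the latent grid U, the basis activations Φ(u_k), the weight matrix W and the inverse noise variance β are already given; it follows the formulas in [6] rather than any specific implementation:

import numpy as np

def gtm_projections(X, U, Phi, W, beta):
    """Responsibilities and posterior mean/mode projections of a fitted GTM.

    X   : (N, D) observed data
    U   : (K, 2) latent grid points u_k
    Phi : (K, M) basis function activations Phi(u_k)
    W   : (M, D) adaptive weights
    beta: inverse noise variance of the Gaussian mixture
    """
    Y = Phi @ W                                               # mixture centres y_k = Phi(u_k) W
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max(axis=1, keepdims=True)                 # numerical stability
    R = np.exp(log_r)
    R /= R.sum(axis=1, keepdims=True)                         # responsibilities r_kn = p(k | x_n)
    mean_proj = R @ U                                         # posterior mean: sum_k r_kn u_k
    mode_proj = U[R.argmax(axis=1)]                           # posterior mode: u_k maximizing r_kn
    return R, mean_proj, mode_proj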


2. Results The standard GTM visualization of the AAC- and digram-transformed mGluR sequences according to their posterior mean projection is shown in Fig.1.

Figure 1. Visualization map of the standard GTM-based posterior mean projection of the mGluR AAC- (left) and digram-transformed (right) sequences. Different mGluR subtypes are identified by color, as in Fig. 2.

Given that, for KGTM, all the conditional probabilities (responsibilities) r_kn are sharply peaked around the latent points u_k, the visualization of the mGluR sequences is better and more intuitively represented by their posterior mode projections, as shown in Fig. 2.

Figure 2. KGTM-based visualization of the mGluR subtypes through their posterior mode projection. Left) Individual pie charts represent sequences assigned to a given latent point and their size is proportional to the ratio of sequences assigned to them by the model. Each portion of a chart corresponds to the percentage of sequences belonging to each mGluR subtype. Right) The same map without sequence ratio size scaling, for better visualization.

An entropy-based measure, suitable for discrete clustering visualizations, was used to quantify the level of mGluR subtype overlapping: if map areas are completely subtype specific, entropy will be zero, whereas high entropies will characterize highly overlapping subtypes. For a given latent point k, the entropy is S_k = − Σ_{j=1}^{C} p_kj ln p_kj, where j is one of the C = 9 subtypes (the 8 mGluR subtypes plus mGluR-like) and p_kj = m_kj / m_k, where, in turn, m_k is the number of sequences in cluster k and m_kj is the number of subtype j sequences in cluster k. The total entropy of a given GTM map can thus be calculated as S = Σ_{k=1}^{K} (m_k / N) S_k, where m_k / N is the proportion of mGluR sequences assigned to latent point k. The entropy results for the standard GTM representation of the transformed sequences are summarized in Table 1.


Table 1. Entropies for each of the 8 mGluR and mGluR-like subtypes, together with total entropy.

            1      2      3      4      5      6      7      8      Like   Total
AAC       0.35   0.41   0.69   0.65   0.55   0.19   0.41   0.33   0.48   0.31
digram    0.77   0.55   0.53   0.59   0.50   0.80   0.69   0.48   0.35   0.37
KGTM      0.50   0.65   0.51   0.47   0.53   0.57   0.64   0.27   0.50   0.33
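A minimal sketch of the entropy computation behind Table 1, assuming the cluster assignments and subtype labels are available as plain Python lists (all names are illustrative):

import math
from collections import Counter, defaultdict

def map_entropy(cluster_ids, subtypes):
    """Per-cluster entropies S_k and total map entropy S = sum_k (m_k / N) S_k."""
    N = len(cluster_ids)
    by_cluster = defaultdict(list)
    for k, s in zip(cluster_ids, subtypes):
        by_cluster[k].append(s)
    per_cluster, total = {}, 0.0
    for k, members in by_cluster.items():
        m_k = len(members)
        S_k = 0.0
        for m_kj in Counter(members).values():
            p_kj = m_kj / m_k
            S_k -= p_kj * math.log(p_kj)
        per_cluster[k] = S_k
        total += (m_k / N) * S_k
    return per_cluster, total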

3. Discussion
All GTM visualizations provide insights about the inner grouping structure of mGluRs. The first overall finding is that most subtypes show a reasonable level of separation, but none of them avoids subtype overlapping. Most subtypes also show clear inner structure. The differences between the AAC sequence mapping and its digram counterpart in Fig. 1 are noticeable, although there are also clear coincidences, such as the neat separation of the heterogeneous mGluR-like sequences in the bottom-left quadrants of both maps, with mGluR3 located nearby. These differences indicate that the visual data representation is at least partially dependent on the type of sequence transformation. This is further corroborated by the KGTM visualization in Fig. 2. The mapping differs in many ways from the previous ones, although many characteristics remain consistent. As stated in Section 1.1, the 8 main mGluR subtypes are commonly grouped into 3 categories. The visualizations in Figs. 1 and 2 provide only partial support for these categories.
The entropy measure described in the previous section provides us with a quantitative measure of subtype location specificity. The results in Table 1 are quite telling. First, because the overall entropy is not too dissimilar between transformations; despite this, the transformation yielding the lowest entropy (highest level of subtype discrimination) is, unexpectedly, the simplest one: AAC, which does not even consider ordering in the AA sequence. It is clear, in any case, that subtype overlapping is substantial. Second, because the dependency of the results on the type of sequence transformation is clearly confirmed.
References
[1] M.C. Lagerström and H.B. Schiöth, Structural diversity of G protein-coupled receptors and significance for drug discovery, Nature Reviews Drug Discovery 7 (2008), 339–357.
[2] M. Sandberg et al., New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids, Journal of Medicinal Chemistry 41 (1998), 2481–2491.
[3] R. Cruz-Barbosa, A. Vellido, and J. Giraldo, Advances in semi-supervised alignment-free classification of G-protein-coupled receptors. IWBBIO'13, Granada, Spain, pp. 759–766 (2013).
[4] C. König et al., SVM-based classification of class C GPCRs from alignment-free physicochemical transformations of their sequences. ICIAP 2013, LNCS 8158, pp. 336–343 (2013).
[5] C. König et al., Finding class C GPCR subtype-discriminating n-grams through feature selection. PACBB 2014.
[6] C.M. Bishop, M. Svensén, and C.K.I. Williams, GTM: The Generative Topographic Mapping. Neural Computation 10 (1998), 215–234.
[7] M.I. Cárdenas, A. Vellido, I. Olier, X. Rovira, and J. Giraldo, Complementing kernel-based visualization of protein sequences with their phylogenetic tree, CIBB 2011, LNCS/LNBI 7548, 136–149 (2012).
[8] M.I. Cárdenas et al., Exploratory visualization of misclassified GPCRs from their transformed unaligned sequences using manifold learning techniques. IWBBIO 2014, 623–630 (2014).
[9] B. Vroling et al., GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Research 39, suppl 1 (2011), D309–D319.
[10] I. Olier et al., Kernel Generative Topographic Mapping. ESANN 2010, 481–486 (2010).

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-273


Evaluation of Random Forests on large-scale classification problems using a Bag-of-Visual-Words representation
Xavier SOLÉ a,1, Arnau RAMISA a and Carme TORRAS a
a Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens i Artigas 4-6, 08028 Barcelona, Spain
Abstract. Random Forest is a very efficient classification method that has shown success in tasks like image segmentation or object detection, but has not been applied yet in large-scale image classification scenarios using a Bag-of-Visual-Words representation. In this work we evaluate the performance of Random Forest on the ImageNet dataset, and compare it to standard approaches in the state-of-the-art.
Keywords. large-scale image classification, classifier forest, random forests

Introduction
In recent years, “big data” has emerged as a trove of enormous potential to tackle problems typically too hard for artificial intelligence: large datasets can be used to learn very accurate predictors, able to match or even surpass expert human performance in tasks such as automatic translation, medical diagnostics or legal counseling. These predictors can then be used to automate processes, and provide service to citizens at a level never imagined before. However, in such a scenario, it is as important to learn good classifiers as it is to have methods with a very low computational footprint at test time, since only then can a system be scaled to dimensions that can truly serve millions of users simultaneously. A good example of this is web services: more than half a billion search queries have to be served each day, and thousands of pictures are uploaded to photo-sharing sites every minute. Services such as picture search by example or automatic tagging of new photos therefore have to be based on methods that are very efficient at test time to be useful in practice. In this context, the Random Forest method, proposed by Breiman [1], is a good candidate, as its computational cost at test time is very small. This machine learning approach, which combines discriminative and generative aspects, can be used for multiple tasks, like classification, regression or density estimation. However, even though Random Forests have been used in many different fields with successful results [2,3,4], to the best of our knowledge they have not yet been evaluated in a large-scale image classification scenario using a Bag-of-Visual-Words representation. In order to see how Random Forest behaves in a large-scale image classification context, and to study its error and computational complexity, we have evaluated its performance on the ImageNet [5] Large Scale Visual Recognition Challenge'10 (LSVRC'10) dataset. We also compare the performance obtained with Random Forest on this dataset to that of two other methods for multi-class image classification: the widely used One-versus-Rest Support Vector Machines (OvR-SVM) approach, and the Ensembles of Class-Balanced Nested Dichotomies (ECBND), both reported in an earlier work [6].
1 Corresponding Author: Xavier Solé. E-mail: [email protected]


1. Random Forest
In this section we briefly describe the Random Forest method by Breiman [1]. Random Forests are sets of random decision trees constructed as follows: beginning at the root node, we separate the initial set of training images I using some split function into two disjoint sets; this procedure is then recursively repeated until a stopping criterion is met, and a leaf node is generated. Each leaf node has an associated probability distribution over classes c_j ∈ C, computed as the fraction of images labeled as c_j that reached the leaf node. A forest of random trees is generated by repeating the random tree creation process. At test time, a new example traverses each random tree in the forest to determine its corresponding leaf node. Then, the probability distribution over classes for this particular example is computed as the average of the distributions of the leaf nodes reached in every tree.
The objective function that is optimized during the creation of a random tree is the information gain, measured as the reduction in Shannon entropy of the image labels when a node is split into its descendants. The split function associated with the internal nodes is computed by randomly generating a number of random axis-aligned divisions of the set of vectors associated with a node and then choosing the one that induces the highest gain. The creation of division candidates is controlled by two parameters: the number of candidate features and the number of candidate thresholds per feature.
Computational cost. The computational complexity at test time for a Random Forest of size T and maximum depth D (excluding the root) is O(T · D). However, the computational cost can be lower if trees are not balanced. It must be noted that the aggregation costs of ensemble methods, usually left out of the theoretical cost computation as they are often negligible, become quite significant in the regime where Random Forest operates and dominate the total cost. Another important cost to be considered in our method is memory space, which is exponential in the depth of the tree: O(2^D).
2. Experimental Results
We have evaluated the Random Forest method on the ImageNet Large Scale Visual Recognition Challenge'10 (LSVRC'10) dataset. This dataset is formed by 1000 classes, approximately one million training images and 150k testing images, with categories as diverse as “lemon”, “cress”, “web site”, “church” or “Indian elephant”. Since it is common to find images with more than one category of objects, the recommended evaluation criterion is error at five, i.e. five class predictions are allowed for each image without penalization. To facilitate comparison with related work, we used the demonstration Bag-of-Visual-Words (BoVW) features for the LSVRC'10 dataset2. Our implementation of the Random Forest method is based on the Sherwood C++ library [2].
2 http://www.image-net.org/challenges/LSVRC/2010/download-public
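As an illustration of the randomized split selection described in Section 1 (a minimal sketch, not the Sherwood implementation used in our experiments), the function below draws a few candidate features and thresholds and keeps the axis-aligned split with the highest information gain:

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_random_split(X, y, n_features=10, n_thresholds=5, rng=None):
    """Randomized axis-aligned split maximizing information gain over labels y."""
    if rng is None:
        rng = np.random.default_rng(0)
    parent = entropy(y)
    best = (None, None, -np.inf)                 # (feature, threshold, gain)
    for f in rng.choice(X.shape[1], size=min(n_features, X.shape[1]), replace=False):
        lo, hi = X[:, f].min(), X[:, f].max()
        for t in rng.uniform(lo, hi, size=n_thresholds):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (f, t, gain)
    return best

In a full tree, this search would be repeated recursively on the two resulting subsets until the stopping criterion is met, and the leaf class distributions would then be averaged over the forest at test time.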


Figure 1. (a) Error at five results for Random Forest, T = 60 in all experiments. (b) Gain (objective function) for each tree level. (c) Average percentage of “null nodes”, i.e. branches terminated at a previous level.

Small-scale experiments. The first step in our experiments consisted of evaluating the performance of Random Forest with a small number of classes (20 and 100 classes), before moving to the large-scale case. First, we adjusted the randomness parameters, fixing D and T. The optimal parameters found were used in the rest of the experiments. Next, we conducted experiments to see how the size of the forest T and the maximum depth D affect the results, interleaving the optimization of the two parameters. In practice, we fixed a depth, and then increased the size of the forest until the error saturated (i.e. adding trees does not result in more accurate predictions). Figure 1a shows how the tree depth affects the error at five and also that, although results with a small number of classes are good, the error increased dramatically as more classes were considered.
Experiments with the full dataset. Finally, we conducted experiments with all 1000 classes, varying the size of the forest (T and D parameters). Figure 2 shows the experimental results as well as the computational and spatial complexity of Random Forests of increasing sizes. We have chosen operating points for the parameters D and T that, at a higher cost, improve the error at five of the final result. This way we explore the most optimistic possibilities of the Random Forest method. As can be seen in Figure 2a, the cost grows fast with respect to the error at five, but the real limiting factor is the exponential spatial cost, as can be seen in Figure 2b. We observe only a moderate decrease of the error with an exponential increase in computational requirements. To estimate the improvement we can expect by increasing D we use the evolution of entropy between consecutive levels. Therefore, in order to find out if we are close to the optimal D parameter, we compute the forest level gain G(F, l) = E(F, l − 1) − E(F, l), which tells us how increasing D translates into gain towards our objective function. E(F, l) is the average forest F entropy at level l. In particular, it allows us to model how the forest evolves when increasing D to values greater than 17, where it is impractical to operate due to space constraints. Extrapolating the bell-shaped curves of the small-scale experiments to the full-scale one, we can conclude that at depth 17 we are close to the maximum gain per level, and that increasing the depth of the trees would yield diminishing returns. This conclusion is supported by Figure 1a, where we can see that the error at five does not decrease significantly when D increases above depth 13 for the 20-class experiment and above depth 15 for the 100-class one. Finally, Figure 1c suggests that approximating the computational cost with the worst-case scenario (i.e. a balanced tree) is accurate, since at the second-to-last level used in our experiments, trees only have 1.89% null nodes on average, and 8.35% at the last.
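Once the average forest entropies per level are available, the level gain defined above is a simple difference; a minimal sketch with made-up entropy values (the root entropy for 1000 balanced classes is about log2(1000) ≈ 9.97):

def forest_level_gain(level_entropy):
    """G(F, l) = E(F, l-1) - E(F, l), given average forest entropies per level (root = level 0)."""
    return [level_entropy[l - 1] - level_entropy[l] for l in range(1, len(level_entropy))]

gains = forest_level_gain([9.97, 9.4, 8.6, 7.5, 6.2])   # hypothetical values
print(gains)   # roughly [0.57, 0.8, 1.1, 1.3]: gain per level still increasing with depth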


Figure 2. Error at five of the evaluated methods on the LSVRC’10 dataset at different complexity points.

Comparison with other methods. We compare the performance of the Random Forest classifier on the LSVRC'10 dataset with those of the Ensembles of Class-Balanced Nested Dichotomies (ECBND) and One-versus-Rest Support Vector Machines (OvR-SVM) classifiers used in [6]. In Figure 2a we can see that Random Forest obtains worse results overall than ECBND and OvR-SVM, but with a much lower computational cost at the same error at five. However, as discussed earlier, the most pressing limitation of the Random Forest method is its memory requirements. In Figure 2b we can see the relation between memory complexity and error at five for the evaluated methods.
3. Conclusions
In this work we have seen that Random Forest is impractical for large-scale multi-class image classification problems using BoVW because of the high spatial cost and low accuracy. Other hierarchical classification methods such as ECBND are much more discriminative at each node, thus requiring fewer levels to reach the same error. On the other hand, Random Forest attains the lowest computational complexity at testing time, and for certain problems it can be the best choice [3].
Acknowledgements
This research is partially funded by the Spanish Ministry of Science and Innovation under project DPI2011-27510, by the CSIC project CINNOVA (201150E088) and by the ERA-Net CHISTERA project ViSen PCIN-2013-047. A. Ramisa worked under a CSIC/FSE JAE-Doc grant.
References
[1] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[2] A. Criminisi and J. Shotton, Decision Forests for Computer Vision and Medical Image Analysis. Springer, 2013.
[3] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, “Real-time human pose recognition in parts from single depth images,” in CVPR, pp. 1297–1304, 2011.
[4] A. Bosch, A. Zisserman, and X. Munoz, “Image Classification using Random Forests and Ferns,” in ICCV, pp. 1–8, 2007.
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in CVPR, pp. 248–255, 2009.
[6] A. Ramisa and C. Torras, “Large-scale image classification using ensembles of nested dichotomies,” in Frontiers in Artificial Intelligence and Applications: Artificial Intelligence Research and Development, vol. 256, pp. 87–90, IOS Press, 2013.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-277


Web pattern detection for Business Intelligence with data mining
ARTURO PALOMINO a,b and KARINA GIBERT b,c,1
a Kantar Worldpanel, Consumer Data Science director, Sant Cugat del Vallès, Spain
b Associate Professor, Department of Statistics & Operations Research, Universitat Politècnica de Catalunya-BarcelonaTech, Barcelona, Spain
c Knowledge Engineering and Machine Learning groups, Universitat Politècnica de Catalunya-BarcelonaTech, Barcelona, Spain
Abstract. Finding Internet browsing patterns is a current hot topic, with expected benefits in many areas, marketing and business intelligence among others. Discovering users' Internet habits might improve fields like chained publicity, e-commerce and media optimization. The large amount of data contained in the log files currently analyzed to find user patterns requires efficient and scalable data mining solutions. This paper proposes an algorithm to identify the most frequent route followed by Internet users, based on a specific combination of simple statistical and vectorial operators that provides an exact solution at a very low computational cost. In the paper, its performance is compared with two other algorithms and an application to a real case study in the field of business intelligence and chained publicity is presented.
Keywords. Eclat, data mining, marketing mix, cross media, media optimization, web mining, return on investment, web domain, internet

Introduction
Recently there has been increasing interest in analyzing the behavior of users within the network. Several references on related topics are found in the literature [7] [6] [1] [4], most of them related to proactivity and recommender systems. The aim of this work is to extract the most common browsing patterns of a panel of Internet users by applying data mining methods to user-centric log files, where a pattern is conceived as an ordered sequence of visited domains rather than a bag of domains, as often considered in the literature [8]. The order of visited sites is strategic to design chained-advertisement campaigns, where the information is gradually provided to the user through a sequence of banners with partial information in successive websites [3]. According to the most recent trends, dividing the advertisement into several short texts in multiple sites likely to be consecutively visited is a friendlier way to smoothly drive the user towards a possible purchase. Structurally, the problem consists of identifying the maximum of a probability function built over the union of all domain sequences of increasing length (max_R P(R), with R ∈ ∪_{l>0} D^l, where l is the length of the pattern, D is the set of visitable Internet domains and R represents a route); it is a combinatorial problem with no trivial solution in reasonable runtime.
1 Corresponding author: Karina Gibert, [email protected].


In [2], site-centric log files with support constraints are analyzed, but no works were found using user-centric data. The main goal is to design and implement a scalable algorithm that finds the most frequent domain sequence using user-centric data, and also to provide a ranking of variable-length domain sequences according to their probability. Three algorithms have been designed and implemented. Scalability is an issue here, as the targeted log files are large. The three algorithms have been tested over real data coming from KantarWorldpanel and their performances compared in terms of accuracy and computational cost. Once the most frequent sequence of the panel has been identified, this information has been used in a business intelligence application which is presented in Section 3. Section 1 describes the data. Section 2 presents the developed methods. In Section 3, results are provided with a comparison of both performance and quality of solutions, and an application to marketing is shown. Section 4 gives conclusions.

1. Data
A real-world sample from a panel conducted by KantarWorldpanel about the Internet browsing of 1700 families is used. The annual log files registering the Internet activity of users contain more than 600 million records on average. Here, data from the week of 3rd-9th December 2011 is used. They originally contained 6,678,000 rows (see Fig. 1). Some preprocessing was required prior to the analysis: all consecutive clicks of a user on the same domain were reduced to a single row, reducing the data to 441,721 rows. Finally, error sites and involuntary websites, such as banners opened in the background, were disregarded, as they are irrelevant for user sequence detection. The useful information was reduced to 151,260 rows. In this work, sessionId, panelistId and domain are used. Further studies will include additional information, like the social class of the user or the persons in the household, to profile the user associated with a certain Internet path.

Figure 1: Structure of original log file

2. Methods
Three methods were developed to find max_R P(R), starting from a naïve algorithm (M1) where all possible domain combinations are searched and frequencies counted by brute force. M2 proposes to include ECLAT [9] as an internal operator; ECLAT is a pre-existing method for identifying frequent itemsets, and it is used in M2 to filter unusual combinations of domains, thus making the counting step cheaper. However, this provides an approximate solution, as it sometimes prunes bags that could derive valid domain sequences. M3 is a totally original proposal based on the combination of simple and efficient vectorial operations, commonly used in statistical procedures. M3 is designed from scratch to find max_R P(R) in the least time. It gives an exact solution by using a transformed dataset. Here, only the details of M3 are provided (see [5] for M1 and M2).
M3 is based on transforming the original log format (see Fig. 1) into a cross-table with users' sessions in rows, visits in columns and visited domains per session in cells (Step 1, Fig. 2).


A previous recoding assigns unique numeric codes to domains (Step 0, Fig. 2) and each row of the cross-table becomes the complete route followed by a user in a session; e.g., session 9 (Fig. 2) follows domains 9-13-8-20, which are google, musica, facebook and youtube, according to the codification (Fig. 2).

Figure 2. Method with recoding and tabulation flow process.

Efficient counting of the frequencies of L-length patterns is performed by means of a stack operation followed by a concatenation of fields. Step 2 of Fig. 2 visualizes the stack operation for patterns of length L=2, where a moving window of L=2 columns scans the whole cross-table horizontally. Step 3 concatenates the resulting structure; thus, sequences of L consecutive domains remain encoded in a single identifier. A simple tally operation then computes the frequencies of the occurring domain sequences. Finally, the results are mapped back to domain literals, and the patterns are prioritized by frequency (Step 4, Fig. 2). The method only uses basic primitives such as recoding, cross-tabbing and concatenation. The computational cost of most of these operations is robust to L.
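In terms of standard data structures, M3's counting step amounts to collapsing consecutive repetitions, sliding a window of L columns over each session route and tallying the encoded windows. The sketch below reproduces this logic with plain Python containers; it illustrates the idea and is not the original implementation:

from collections import Counter
from itertools import groupby

def rank_patterns(sessions, L):
    """Rank L-length domain sequences by frequency over a list of sessions.

    sessions: list of lists of visited domains, ordered by click time.
    """
    counts = Counter()
    for visits in sessions:
        # Steps 0-1: collapse consecutive repetitions of the same domain (route per session)
        route = [d for d, _ in groupby(visits)]
        # Steps 2-3: moving window of L columns; each window becomes a single identifier
        for i in range(len(route) - L + 1):
            counts[tuple(route[i:i + L])] += 1
    # Step 4: patterns prioritized by frequency
    return counts.most_common()

sessions = [
    ["google", "musica", "facebook", "youtube"],
    ["google", "google", "msn", "facebook"],
    ["google", "msn", "facebook", "youtube"],
]
print(rank_patterns(sessions, 3)[0])   # (('google', 'msn', 'facebook'), 2)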

3. Results and application to business intelligence
In this section, a comparison of the results and runtimes of the 3 methods is shown. Fig. 3 shows the runtimes of ranking sequences for several values of L and of m (the minimum support for frequent itemsets required by Eclat) in M2. M1 used more than one day for patterns of L=2; an expected result, as 18,000 domains give more than 324 million combinations. For M2 with m = 25, runtime increases quickly, and no solution for L>3 can be derived, as the probabilities of longer patterns were below the threshold and the algorithm stops. For m=50, runtime improves by more than 200%, but there is still no solution for L>7. With m=100, runtime improves further but far fewer patterns are detected. M3 shows quasi-constant runtime for increasing L. It is significantly faster than M2 and provides exact solutions.
M3 provides a ranking of user patterns useful to design chained advertising campaigns, like the following: a customer wants to place some creativities in L websites likely to be consecutively visited by users, so that the probability of viewing the complete message in the exact order is maximized.


M3 was used to find the most frequent patterns for increasing L. Sequences with L>4 show a really low frequency and are disregarded for publicity. To place 3 banners, google → msn → facebook is chosen according to the obtained ranking. The complete chain of banners, seen in the proper order, stimulates purchase and leads the panelist to the online shopping site in the last banner displayed.

Figure 3. Execution times by method.

4. Conclusions
M2 and M3 were designed to overcome the excessive computational requirements of M1. Although the introduction of Eclat (M2) reduces runtime, the pruning it performs might mask interesting solutions. M3 is based on a transformation of the log files into a new format and the use of basic primitive operations, like cross-tabbing and concatenation, to find a complete set of solutions in quasi-constant runtime. The proposal is highly scalable and sustainable for massive log files in real-time systems. M3 is interesting to support chained-advertisement Internet campaigns. Any kind of subpopulation can be targeted just by defining inclusion criteria that filter the subset of users. In a further analysis, additional individual user information will be used to find relationships with web patterns.
Acknowledgements
Thanks to the generous data contribution of KantarWorldpanel, which maintains a continuous panel of Internet users with click-stream information.

References
[1] Adomavicius, Gediminas, and Alexander Tuzhilin. "Expert-driven validation of rule-based user models in personalization applications." Data Mining and Knowledge Discovery 5.1-2 (2001): 33-58.
[2] Baumgarten, M., et al. "User-driven navigation pattern discovery from internet data." Web Usage Analysis and User Profiling. Springer Berlin Heidelberg, 2000. 74-91.
[3] McElfresh, C., Mineiro, P., Radford, M. "Method and system for optimum placement of advertisements on a webpage." U.S. Patent No. 7,373,599. 13 May 2008.
[4] Nasraoui, Olfa, et al. "WebKDD 2004: web mining and web usage analysis post-workshop report." ACM SIGKDD Explorations Newsletter 6.2 (2004): 147-151.
[5] Palomino, A. "Detecció de sequencies web per a bussines intelligence", Master thesis, UPC (2013).
[6] Rao, Yanghui, et al. "Sentiment topic models for social emotion mining." Inf. Sci. 266 (2014): 90-100.
[7] Tang, Jie, et al. "A combination approach to web user profiling." ACM Trans. KDD 5.1 (2010): 2.
[8] Yang, Yinghui C. "Web user behavioral profiling for user identification." DSS 49-3 (2010): 261-271.
[9] Zaki, M.J. "Scalable algorithms for association mining." IEEE Trans. KDE 12.3 (2000): 372-390.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-281


About model-based distances
Gabriel MATTIOLI
Mathematics and Computer Science Department, ETSAV - Universitat Politècnica de Catalunya
[email protected]
Abstract. The literature of any scientific field is full of models. In this work it is shown how every model defines a distance function in a natural way. This distance is much more suitable for specific problems than the ones usually used (generally Euclidean or Manhattan), because it is directly related to the particular problem and does not have to be chosen arbitrarily.
Keywords. Model, Distance, Extensional sets, Indistinguishability

1. Introduction
Models are everywhere in science. We find models in Physics, Biology, all branches of Engineering, the Social Sciences... Essentially, a model is used as a representation of an object or phenomenon. Hence, given a particular sample or case (terminology varies between scientific fields) and a model, the latter provides information about the particular case. Another recurrent concept is distance functions, which are used to measure the dissimilarity (qualitative, spatial, etc.) between objects. A distance is defined by some functional equations, and thereby there is a wide variety of distance functions that can be used in every particular field and problem. Although the study, development and usage of new distance functions (interval distances [2], kernel-based distances [6], semantic distances [8], ...) has increased in the last decades, the most used distances are still the Euclidean and Manhattan distances, because their performance is good in general. Although this justification is acceptable if the goal is a good performance of an algorithm or method, it is an arbitrary choice without a scientific background or justification related to the specific problem of interest. In this paper it is shown how, when there is a model of the problem, there is a natural distance function defined by the model if the latter is understood in terms of extensional sets (fuzzy equivalence classes). This distance does not suffer from the arbitrariness of choice pointed out before, as it is strongly related to the specificities of the problem.

2. What is a model?
A quick search for the definition of a scientific model shows the difficulty of covering all its different usages. Without any pretension of giving a final and complete definition, and summarizing some of the ideas present in the many different definitions proposed in the literature, we can define a scientific model in the following way:


Definition 2.1. A scientific model is an idealized representation of a real phenomenon or object in a symbolic space.
In other words, given a supposed relation between a real situation and a phenomenon,

    Situation --(relation)--> Phenomenon

this is idealized as:

    Measurable inputs --(f)--> Classifiable output

• The "measurable inputs" are measures of real entities. These inputs correspond to those real entities that the modeler thinks are related to the phenomenon.
• f is a function or relation between symbolic spaces that is built so that its behavior is similar to that of the real relation.
• The "classifiable output" represents the space of states of the phenomenon. In this space, different classes are defined that correspond to the qualitatively different states of the phenomenon.
In this work we illustrate the ideas developed with two different models. The first one is the behaviorist learning model of Pavlov's dog [5]. It models how a subject learns the relation between a stimulus (the sound of a bell) and a reward (food). The input is the number of times the conditioning is done and the output is whether the dog learns or not. The second is a climate model. In this case the inputs are the different meteorological measures (temperature, humidity, wind speed, ...) and the output the different possible states of the system: rain, sun, snow...

3. Indistinguishabilities and distances
In the following we define indistinguishability operators, as they will be one of the keys for defining model-based distances.
Definition 3.1. Let T be a t-norm. A fuzzy relation E on a set X is a T-indistinguishability if and only if for all x, y, z ∈ X:
a) E(x, x) = 1
b) E(x, y) = E(y, x)
c) T(E(x, y), E(y, z)) ≤ E(x, z)
Indistinguishabilities model the intuitive notion of similarity between objects, and fuzzify classical equivalence relations. To learn more about them we recommend [7]. There is an (inverse) equivalence between indistinguishabilities and distances.
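As a small illustration (not taken from the paper), the following sketch checks conditions a)-c) for a fuzzy relation given as a matrix, using the Łukasiewicz t-norm T(a, b) = max(0, a + b − 1):

import numpy as np

def lukasiewicz(a, b):
    return np.maximum(0.0, a + b - 1.0)

def is_T_indistinguishability(E, tol=1e-9):
    """Check reflexivity, symmetry and T-transitivity of a fuzzy relation matrix E."""
    reflexive = np.allclose(np.diag(E), 1.0)
    symmetric = np.allclose(E, E.T)
    n = len(E)
    # T-transitivity: T(E[x, y], E[y, z]) <= E[x, z] for all x, y, z
    transitive = all(
        lukasiewicz(E[x, y], E[y, z]) <= E[x, z] + tol
        for x in range(n) for y in range(n) for z in range(n)
    )
    return reflexive and symmetric and transitive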


Theorem 3.2. E is a T-indistinguishability with T ≥ Ł, Ł being the Łukasiewicz t-norm, if and only if d_E(x, y) = 1 − E(x, y) is a normalized pseudo-distance.
For general t-norms there is an analogous result concerning S-distances [3]. There are many ways of generating indistinguishabilities. One of the easiest ones is to consider the indistinguishability E_μ related to a fuzzy set μ [7]. One of the most important results of the field is the well-known Representation Theorem stated below. It allows one to define an indistinguishability given a family of fuzzy subsets, which will be the main fact used in the construction we propose.
Theorem 3.3 (Representation Theorem [9]). Let R be a fuzzy relation on a set X and T a continuous t-norm. Then R is a T-indistinguishability relation if and only if there exists a family {μ_i}_{i∈I} of fuzzy subsets such that for all x, y ∈ X, R(x, y) = inf_{i∈I} E_{μ_i}(x, y).

The generators of E correspond to the set H_E of extensional fuzzy subsets related to E [9], and can be understood as the observable sets for an "eye" identifying objects according to the relation E.

4. Model-based distances
A phenomenon can have several qualitatively different states. Pavlov's dog can learn (σ1) or not (σ2). In the climate model, the different weather conditions define different classes σi. Given a model of a phenomenon, we can reread the different classes in the "Classifiable output" as a family of sets σi, and the relation f as a function that assigns to each input its membership to each σi. Without loss of generality we can consider that these sets are fuzzy sets. Hence, we can reread the diagram of the model in the following way:

    Measurable inputs --(f)--> Classifiable output --(σi)--> [0, 1]

where σi stands for the membership function of the sets that represent the different states of the system. This way, we can take these sets as a generating family of a set of extensional sets on the input space of the model, H_E = <σ1 ∘ f, ..., σn ∘ f>. This set H_E is unique (except for the choice of the t-norm) and uniquely defines an indistinguishability E that models the similarity between the different possible inputs of the model. The model-based distance is thus d_E(x, y) = 1 − E(x, y) (or any involutive negation of E to obtain an S-distance). In the climate model example, this distance will be more sensitive to the qualitative difference of a variation of the temperature between −1 °C and +1 °C than between +23 °C and +24 °C, because although the numerical difference is the same on a temperature scale, the outcome of the phenomenon is not.


In the behaviorist model, this distance will better capture the difference between repeating the experiment 10 or 11 times (where the dog is at maximum learning) and between 200 and 201 times (no learning). In general, model-based distances do not suffer from the problem of arbitrariness of choice, given that they are built taking into account the specificities and particular properties of the problem of interest.
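To make the construction concrete: under the Łukasiewicz t-norm, the indistinguishability generated by a fuzzy set μ is E_μ(x, y) = 1 − |μ(x) − μ(y)|, so the Representation Theorem gives E(x, y) = min_i E_{σi∘f}(x, y) and hence d_E(x, y) = max_i |σi(f(x)) − σi(f(y))|. The sketch below applies this to a toy climate-like model; all membership functions and numbers are made up for illustration:

def model_based_distance(x, y, model, memberships):
    """d_E(x, y) = 1 - E(x, y) with E generated by the sets sigma_i o f (Lukasiewicz t-norm)."""
    fx, fy = model(x), model(y)
    # E_mu(x, y) = 1 - |mu(x) - mu(y)|; E = inf over generators; hence d_E is the max difference
    return max(abs(s(fx) - s(fy)) for s in memberships)

# Toy membership functions over temperature (illustrative only)
freezing = lambda t: max(0.0, min(1.0, (2.0 - t) / 4.0))   # 1 below -2 deg C, 0 above +2 deg C
mild     = lambda t: max(0.0, 1.0 - abs(t - 15.0) / 15.0)
hot      = lambda t: max(0.0, min(1.0, (t - 20.0) / 10.0))

identity_model = lambda inputs: inputs   # here the "model" just passes the temperature through

print(model_based_distance(-1.0, 1.0, identity_model, [freezing, mild, hot]))   # 0.5
print(model_based_distance(23.0, 24.0, identity_model, [freezing, mild, hot]))  # about 0.1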

5. Conclusions
In this work it has been shown how model-based distances can be built in a natural way. To do so, techniques from the field of indistinguishability operators, as well as the duality between these operators and distance functions, have been used. This construction has been done in the most general case and for that reason it has been explained in terms of scientific models in general. In order to be applied to particular cases, a specification exercise of the ideas explained has to be done. As future work, the author intends to apply the ideas explained to a real problem concerning structural models of brain MRI images.

Acknowledgements The author acknowledges the National Scholarship Programme of SAIA, funded by the Ministry of Education, Science, Research and Sport of the Slovak Republic.

References
[1] Castro, J.L., Klawonn, F.: Similarity in Fuzzy Reasoning. Mathware & Soft Computing 2, 197-228 (1996).
[2] Falomir, Z., Museros, L., Gonzalez-Abril, L., Sanz, I.: A Model for Qualitative Colour Comparison Using Interval Distances. Displays, vol. 34, no. 4, 250-257 (2013).
[3] Jacas, J.: On the generators of T-indistinguishability operators. Stochastica 12, 173-191 (1988).
[4] Jacas, J., Recasens, J.: Fixed points and generators of fuzzy relations. J. Math. Anal. Appl. 186, 21-29 (1994).
[5] Pavlov, I.P.: Conditioned reflexes. Courier Dover Publications (2003).
[6] Phillips, J.M., Venkatasubramanian, S.: A gentle introduction to the kernel distance. arXiv preprint arXiv:1103.1625 (2011).
[7] Recasens, J.: Indistinguishability Operators. Modelling fuzzy equalities and fuzzy equivalence relations. Studies in Fuzziness and Soft Computing 260 (2010).
[8] Rips, L.J., Shoben, E.J., Smith, E.E.: Semantic distance and the verification of semantic relations. Journal of Verbal Learning and Verbal Behavior, 12(1), 1-20 (1973).
[9] Valverde, L.: On the structure of F-indistinguishability operators. Fuzzy Sets and Systems 20, 313-328 (1985).

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-285


Defining Dimensions in Expertise Recommender Systems for Enhancing Open Collaborative Innovation
J. NGUYEN a, A. PEREDA a, G. SÁNCHEZ-HERNÁNDEZ a and C. ANGULO b,1
a GREC–ESADE Business School, Ramon Llull University, Spain
b GREC–Universitat Politècnica de Catalunya - BarcelonaTech, Spain
Abstract. In open innovation, a firm's R&D crosses not only internal boundaries but also disciplines. It is an interactive process of knowledge generation and transfer between internal and external firms. Despite the assistance of open innovation marketplaces, the process of matching seekers and solvers remains a challenge. Expertise recommender systems in an open innovation marketplace can facilitate finding the “right partner”. With this aim, a list of appropriate dimensions to be considered for the expertise recommender system is defined. An illustrative example is also provided.
Keywords. open innovation, recommender system, expertise

Introduction
In the past, a corporation's internal research and development organization was a strategic asset which acted as a barrier to entry for competitors. Closed innovation environments, where firms managed the development of a product from conception to distribution, required heavy investment in Research and Development (R&D) resources. Towards the end of the 20th century, the closed innovation model began to erode due to the increased mobility of workers, who transported their ideas and expertise with them. An open innovation model emerged where innovation could easily move between a firm and its surroundings. Organizations recognized they could profit from research developed externally and from licensing the use of their intellectual property [1]. In an open innovation model, firms can commercialize ideas which they purchased externally and commercialize internally generated ideas through external channels. Open innovation marketplaces, such as InnoCentive, consist of a network of scientists, professionals, retirees, and students who solve a wide variety of challenges presented by seeker companies [2]. However, due to the unsystematic nature of partner identification, realizing transactions presents a managerial challenge [3].
1 Corresponding Author: Cecilio Angulo, GREC Research Group, Universitat Politècnica de Catalunya, Pau Gargallo 5, 08028 Barcelona, Spain; E-mail: [email protected]. This research is partially supported by the SENSORIAL (TIN2010-20966-C02-01) and PATRICIA (TIN2012-38416-C03-01) Research Projects, funded by the Spanish Ministry of Science and Information Technology, and by the European Commission funded project COLLAGE (GA318536 2012-15).


In this context, recommender systems may be able to facilitate the technology brokering. Recommender systems are able to filter the range of available choices [4] down to content of interest to individuals of a community or to users with similar profiles. In turn, expertise recommender systems are a specific type of recommender system which helps find people who have some expertise with a problem. The aim of this paper is to show how expertise recommender systems can help seeker firms find the right solver in open innovation marketplaces. A number of specific selected dimensions will be listed and an illustrative example will also be provided.
1. Background
Open Innovation Platforms
Open innovation, a widely researched area in the last decade, is defined as the use of purposive inflows and outflows of knowledge to accelerate internal innovation, and expand the markets for external use of innovation, respectively [1]. Intermediaries in open innovation play a key role in facilitating the transfer of this knowledge by connecting firms to unknown resources. With the use of Internet platforms, the cost of linking seekers with potential solvers has decreased dramatically [5]. However, there are still very important challenges for improving these best practices. Firstly, open innovation intermediaries create value by matching seekers and solvers. However, today intermediaries face difficulties finding solvers that fulfill the requirements of the seeker in a timely manner. Studies have shown that people at the boundaries of disciplines create innovative solutions [6,7]. Identifying these solvers for each solution will increase the pool of potential solvers and the diversity and quality of the solutions. Secondly, the intermediaries' main goal is to make the process of bringing in ideas efficient and to reduce the cost of the knowledge transaction. However, this efficiency approach makes the nature of the ties that are formed between the solver and the seeker very weak [5]. Tie strength can be enhanced by reducing the distance and the frequency of the transaction between both parties.
Expertise Recommender Systems
In this era of information, people struggle to filter vast quantities of data. Recommender systems have sought to fulfill this need. However, information is not always stored in systems or databases. Rather, information can be processed and embodied in people. Expertise recommender systems address this issue by identifying people with specific information and knowledge. To date, several expertise recommender (ER) systems have been developed [8]. ER systems should recommend people based on an appropriate mixing and an optimal matching of the characteristics of the candidates and the preferences of the user. Currently, recommender systems focus on finding the person with the “right level of expertise” rather than “the right person”. According to [9], intermediaries aim to find the “uniquely prepared mind” to solve the problem.
2. Improving Open Innovation Platforms with ER Systems
Expert recommender systems can address two of the main challenges that open innovation platforms are currently facing: to match seekers and solvers efficiently and effectively, and to strengthen the tie between the seekers and the solvers.


Match. Expertise recommender systems filter the number of capable solvers down to the most suitable ones, allowing seekers to arrive at a solution faster. Moreover, the number of resources required for firms to review, select and test solutions is reduced.
Tie strength. According to [10], firms need to integrate specialized knowledge. However, this knowledge integration and transfer is not efficient across markets due to the sticky nature of knowledge. In this regard, it is suggested in [11] that the more closely related two people are, the more likely it is that tacit knowledge is transferred. For this reason, finding the “right person” is of paramount importance to bridging the gap of weak ties. Advanced recommender systems can play an important role by incorporating specific dimensions to find the best match for both parties. This facilitation requires assessing not only the level of knowledge (expertise) but also the collaborative behavior of and distance between both parties. Taking these dimensions into account, recommender systems enable open innovation platform users to initiate relationships which can lead to strong ties.
2.1. Proposed Dimensions Needed in this Environment
The following dimensions for expertise recommender systems in an open innovation marketplace are proposed: expertise, qualifications, proximity and availability. The expertise dimension represents the areas of knowledge of each candidate to solve a problem. They reflect in which topics the candidate has a certain degree of expertise. This information can be analyzed in both an explicit and an implicit way. The qualification variables capture the behavior of the solver. The solver can directly state explicit information by choosing the qualification topics that best define herself. But in this case, implicit information would be more important due to its objective nature. This implicit information must be collected by analyzing the interaction between the solver and the intermediary platform. The proximity information is used to measure the distance between the solver and the seeker. It will be extracted by analyzing the connections between the solver's and the seeker's social networks. Finally, it is proposed to incorporate the dimension of availability, which informs about the current availability of each solver.
2.2. Illustrative Example of an Expertise Recommender
This subsection details an illustrative example of an expertise recommender prototype that is being developed. Leo (seeker) is starting the design and development of a new product and would like fresh ideas from outside of his R&D team. He decides to look for external solutions. On the website of the intermediary, Leo enters his requirements for the desired solver (Figure 1, left). In the skills and expertise section he selects the required skills for the desired solver. Because Leo is concerned with the transfer of knowledge, he looks for qualifications of communication skills with the intent of obtaining complete and accurate documentation of the innovation. Lastly, Leo is not interested in the proximity of the solvers, so he disables this dimension. The ER automatically adds “high availability” to the list of requirements and returns to Leo the recommended solvers detailed in Figure 1 (right). The recommendations table lists the solver candidates in rows and the requirements taken into account in columns.
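A minimal sketch of how such a recommender could rank candidates once the four dimensions are encoded numerically; all field names, weights and candidate data below are hypothetical and not part of the prototype described above:

def score_solver(solver, required_skills, weights=(0.5, 0.2, 0.1, 0.2)):
    """Weighted score over the four proposed dimensions, each normalized to [0, 1]."""
    w_exp, w_qual, w_prox, w_avail = weights
    expertise = len(required_skills & solver["skills"]) / len(required_skills)
    return (w_exp * expertise
            + w_qual * solver["communication"]   # qualification (e.g. communication skills)
            + w_prox * solver["proximity"]       # set this weight to 0 to disable the dimension
            + w_avail * solver["availability"])

required = {"materials", "3d printing"}
candidates = [
    {"name": "A", "skills": {"materials", "3d printing"}, "communication": 0.9,
     "proximity": 0.2, "availability": 1.0},
    {"name": "B", "skills": {"materials"}, "communication": 0.6,
     "proximity": 0.8, "availability": 0.5},
]
ranked = sorted(candidates, key=lambda s: score_solver(s, required), reverse=True)
print([s["name"] for s in ranked])   # candidate A comes first with these toy numbers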


Figure 1. (Left) Selection of requirements, (Right) Recommended solvers.

3. Conclusions and Future Work
Expert recommender systems can address two of the main challenges that open innovation platforms are currently facing: matching, and building strong ties between firms and experts. By accelerating the process of finding the right solver for a challenge, ER systems reduce the costs of partner identification and the innovation lead time for the seeker, and increase the community of participants for the intermediary while growing their revenue stream. In this paper, it has been proposed that ER systems can help to fill this need by applying specific dimensions. The proposed dimensions for an ER are derived from the literature on open innovation intermediaries and expertise recommender systems. However, it should be recognized that there may be additional dimensions specific to open innovation marketplaces which can further enhance matching and strong ties.

References
[1] H.W. Chesbrough, “The era of open innovation,” Managing Innovation and Change, vol. 127, no. 3, pp. 34–41, 2006.
[2] J. Howe, “The rise of crowdsourcing,” Wired Magazine, vol. 14, 06 2006.
[3] U. Lichtenthaler and H. Ernst, “Innovation intermediaries: Why internet marketplaces for technology have not yet met the expectations,” Creativity and Innovation Management, vol. 17, no. 1.
[4] F. Ricci, L. Rokach, and B. Shapira, “Introduction to recommender systems handbook,” in Recommender Systems Handbook (F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, eds.), pp. 1–35, Springer, 2011.
[5] C. Billington and R. Davidson, “Leveraging open innovation using intermediary networks,” Production and Operations Management, vol. 22, no. 6, pp. 1464–1477, 2013.
[6] K.R. Lakhani, L.B. Jeppesen, P.A. Lohse, and J.A. Panetta, The Value of Openness in Scientific Problem Solving. Division of Research, Harvard Business School, 2007.
[7] E. Guinan, K.J. Boudreau, and K.R. Lakhani, “Experiments in open innovation at Harvard Medical School,” MIT Sloan Management Review, vol. 54, no. 3, pp. 45–52, 2013.
[8] P. Chandrasekaran, A. Joshi, M.S. Yang, and R. Ramakrishnan, “An expertise recommender using web mining,” in FLAIRS Conference, pp. 291–294, 2001.
[9] M. Sawhney, E. Prandelli, and G. Verona, “The power of innomediation,” MIT Sloan Management Review, vol. 44, no. 2, pp. 77–82, 2002.
[10] R.M. Grant, “Toward a knowledge-based theory of the firm,” Strategic Management Journal, vol. 17, no. Special Issue: Knowledge and the Firm, pp. 109–122, 1996.
[11] K. Venkitachalam and P. Busch, “Tacit knowledge: Review and possible research directions,” J. Knowledge Management, vol. 16, no. 2, pp. 357–372, 2012.

Artificial Intelligence Research and Development L. Museros et al. (Eds.) IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-61499-452-7-289


Using the Fuzzy Inductive Reasoning methodology to improve coherence in algorithmic musical beat patterns
Iván PAZ-ORTIZ 1, Àngela NEBOT, Francisco MUGICA, Enrique ROMERO
Llenguatges i Sistemes Informàtics, Soft Computing research group, Barcelona Tech
Abstract. In the present work, the Fuzzy Inductive Reasoning methodology (FIR) is used to improve coherence among beat patterns structured in a musical A-B form. Patterns were generated based on a probability matrix, encoding a particular musical style, designed by experts. Then, all possible patterns were generated and the most probable ones were selected. A-B musical forms were created and the coherence of each sequence was evaluated by experts using linguistic quantities. The output pairs (A-B pattern and its qualification) were used as inputs to train a FIR system, and the variables that produce “coherent” outputs, as well as the relations among them, were identified as rules. The extracted rules are discussed in the context of the musical form and of psychological perception.
Keywords. Fuzzy Inductive Reasoning, musical coherence, algorithmic composition.

Introduction

Automated algorithmic composition systems are now well understood and documented [2, 7]. In the search for more effective and expressive systems, recent attempts have shown the need to extract representations for capturing and managing high-level musical features such as coherence or composer personality [4]. However, these features commonly appear as a side effect of machine learning research on composition or interactive systems, and whether machine learning processes have effectively captured them to a significant degree is still under discussion. Moreover, existing systems have not extensively incorporated the perception and semantics of the generated music, including the listener's psychological sensation of the musical form. Attempts to do so often rely on machine listening techniques that require high computational capacity, using modules with a pre-established, symbolic domain for evaluating and adjusting the output [3]. Fuzzy systems require fewer resources and are not restricted to pre-established structures for the evaluation modules, allowing systems to include humans (with their psychological perspective) without predefined representations of the desired output. In this work, we used the Fuzzy Inductive Reasoning methodology [8] as a module to evaluate the coherence between two algorithmically produced beat patterns. This allows the system to extract the expert's musical representation and translate it into combinations of variables that preserve consistency between musical parts, based on the expert's subjective evaluation of the listened combinations. The paper is structured as follows: Section 1 introduces the basic concepts, Section 2 describes the methodology, and Section 3 presents the results and discussion.

1 E-mail: Iván: [email protected], Ángela: [email protected]. Proyecto PAPIIT IG400313.


1. Basic concepts: music coherence and the Fuzzy Inductive Reasoning methodology

In this work we explored the musical coherence between two patterns arranged in an A-B form. For methodological reasons we defined coherence as "how well the A and B patterns fit together" when they are perceived by a listener. The evaluation was made using linguistic variables [6]. Coherence depends on the contrast and repetition points between A and B and on the moments at which these occur.

A complete documentation of FIR can be found in [5,1]. The system is fed with raw data from the system under study and has four basic functions. The fuzzification process transforms the data into triplet format. The qualitative modelling uses a fuzzy optimal mask function that selects which variables participate in the prediction of the output; this process finds the qualitative relationships between the different input variables and is performed either by exhaustive search or by means of search trees or genetic algorithms. FIR uses masks as qualitative models, analysing the episodical behaviour of the system to identify a qualitative model used for forecasting. FIR creates the best mask for a single input variable, the best for two, and so on; these masks are said to be of complexity one, two, etc. The quality of a mask (Q) is determined by an uncertainty reduction measure based on the entropy of the state-transition matrix associated with the mask's variables. The fuzzy simulation process allows the model to predict future qualitative outputs from past experiences, interpolating in the input variables to extrapolate the output. Finally, the regeneration module performs the inverse of fuzzification, transforming triplets back into the original data format.

2. System design and methodology

The beat patterns were created in the style of UK garage/two-step [9] for three instruments (kick, snare and hi-hat), based on the analysis in [2]. The probability vectors below give the independent probability, in the interval [0,1], of each instrument playing at a particular moment. Each vector represents a 4/4 bar in which each quarter note is divided into four sixteenth notes, giving 16 steps per bar.

[0.7, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3] kick
[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.2, 0.0, 0.0, 1.0, 0.0, 0.0, 0.3] snare
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.7, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.7] hihat
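To make the generation step concrete, the following Python sketch enumerates the on/off patterns of a single instrument and ranks them by joint probability under the independence assumption stated above. The function names and the choice to enumerate per instrument are illustrative assumptions; the paper does not specify the exact selection procedure.

```python
from itertools import product

# Independent per-sixteenth strike probabilities for the kick,
# taken from the probability vectors above.
KICK_PROBS = [0.7, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.2,
              0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3]

def pattern_probability(pattern, probs):
    """Joint probability of a 16-step on/off pattern under independence."""
    prob = 1.0
    for hit, p in zip(pattern, probs):
        prob *= p if hit else (1.0 - p)
    return prob

def most_probable_patterns(probs, top_n=10):
    """Enumerate all 2^16 on/off patterns and keep the top_n most probable."""
    candidates = []
    for pattern in product((0, 1), repeat=len(probs)):
        prob = pattern_probability(pattern, probs)
        if prob > 0.0:
            candidates.append((prob, pattern))
    candidates.sort(reverse=True)
    return candidates[:top_n]

if __name__ == "__main__":
    for prob, pattern in most_probable_patterns(KICK_PROBS, top_n=5):
        print(f"{prob:.4f}  {pattern}")
```

The same enumeration can be repeated for the snare and hi-hat vectors before the hierarchy rules described below are applied.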

To avoid cases in which more than one instrument plays at the same time, we applied three musical-hierarchy rules: if kick and snare, then kick; if kick and hi-hat, then kick; if snare and hi-hat, then snare. From all possible patterns we selected the 10 with the highest probabilities, which yielded a set of 20. These were sequenced in A-B form and played to the listener at 120 beats per minute, as four repetitions of A followed by four repetitions of B, for reasons of psychological perception. The coherence between the A and B patterns was evaluated using the linguistic labels low, medium and high. We considered 105 different A-B forms. The data were structured in the format:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]

where the numbered variables correspond to each of the 32 sixteenths in the A-B pattern. Entry 33 is the listener's evaluation of the sequence and is the output of the system. The entries of the vector take the values 1 (kick), 2 (snare), 3 (hi-hat) and 4 (no strike); the output takes the coherence values 1 (low), 2 (medium) and 3 (high). With these considerations we fed the FIR model, setting the membership values at the centres of the bell membership functions in the fuzzification process so that the model could handle the crisp data.
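A minimal sketch of the hierarchy resolution and row encoding described above; the helper names are hypothetical, and the fuzzification step performed inside FIR is not reproduced here.

```python
# Symbolic values used in the 33-entry training vectors (see text):
# 1 = kick, 2 = snare, 3 = hihat, 4 = no strike.
KICK, SNARE, HIHAT, SILENCE = 1, 2, 3, 4

def resolve_step(kick_hit, snare_hit, hihat_hit):
    """Apply the musical hierarchy rules: kick > snare > hihat."""
    if kick_hit:
        return KICK
    if snare_hit:
        return SNARE
    if hihat_hit:
        return HIHAT
    return SILENCE

def encode_ab_form(pattern_a, pattern_b, coherence):
    """Build one 33-entry training row: the 32 sixteenths of A followed by B,
    plus the listener's coherence rating (1 = low, 2 = medium, 3 = high)."""
    row = [resolve_step(*step) for step in pattern_a + pattern_b]
    assert len(row) == 32, "expected two 16-step patterns"
    return row + [coherence]

# Example: each pattern is a list of 16 (kick, snare, hihat) on/off triples.
a = [(1, 0, 0)] + [(0, 0, 0)] * 14 + [(0, 0, 1)]
b = [(0, 1, 1)] + [(0, 0, 0)] * 15
print(encode_ab_form(a, b, coherence=3))
```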


To find the relations among the variables we used an exhaustive search based on Shannon entropy, i.e. we searched for the set of variables that makes the state-transition matrix as deterministic as possible; these are the relevant variables. We also used the Linguistic Rules in FIR algorithm (LR_FIR, [1]), a rule-extraction algorithm that starts from the set of pattern rules obtained by the previously synthesized FIR model and derives linguistic rules from it.

3. Results and discussion

We were interested in modelling the different configurations of the relevant variables that produce a coherent perception in the listener and are consistent with the musical structure of the style. The results allow us to change the rhythmic motives so that a new system produces parts that are perceived as more coherent. The extracted rules showed that neither V4 nor V20 varies. This can be attributed to the fact that the probability of having a strike at these positions is determined by the kick, with probability 0.1. This result leaves only variables 1, 8, 10 and 16 creating variation in part A, and 17, 24, 26 and 32 in part B. The rules extracted with LR_FIR when considering (1) the eight variables and (2) the most relevant variables obtained by FIR, which correspond to variables 17, 24 and 32, are displayed in Figure 1, left and right respectively. Each rule states the value (instrument) of a particular variable (e.g. V17-1 should be read as "variable 17 was 1 (kick)") and how the combination of variables produces a particular output. The third rule of Figure 1 should be read as: IF V16 is 1 AND V26 is 2 AND V32 is 1 THEN V33 is 3.
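To make the rule format concrete, the following hypothetical helper applies IF-THEN rules of the kind listed above to an encoded 33-entry row; the data structure is illustrative and is not the representation used by LR_FIR.

```python
# A rule is a dict {variable index: required value} plus a predicted output,
# e.g. "IF V16 is 1 AND V26 is 2 AND V32 is 1 THEN V33 is 3".
RULES = [
    ({16: 1, 26: 2, 32: 1}, 3),
]

def predict_coherence(row, rules, default=None):
    """Return the output of the first rule whose antecedent matches the
    33-entry row (variables V1..V32 are 1-indexed; V33 is the output)."""
    for antecedent, output in rules:
        if all(row[v - 1] == value for v, value in antecedent.items()):
            return output
    return default
```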

Figure 1. Rules extracted with LR_FIR from the input data, considering the eight input variables and one output (left) and the three most relevant variables (right).

At the bottom left are the rules for patterns evaluated as low coherence (LC), i.e. where the output V33 is 1. We found a large number of silences (variables taking the value 4), especially in V32. Comparing this behaviour with the rules for medium-coherence (MC) patterns (centre of the figure) and high-coherence (HC) patterns (top) for the same variable, we can say that, in general, silences in V32 reduce the perceived coherence of the sequence. An explanation can be given in terms of the subjective perception of the patterns. Since each stimulus consists of four repetitions of A followed by four repetitions of B, the sensation of periodicity of B is produced largely by the variable that completes the cycle by connecting it with its repetition, which is V32, in this case connected with variable 17, which we know (from the probability matrix) will be 1 (kick) in 70% of the cases. As noted, this behaviour is expected from the point of view of the subjective perception of rhythm.
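The role of silences in V32 can be inspected directly on the encoded data set; a small illustrative tally over rows in the format of Section 2 (everything else here is assumed):

```python
from collections import Counter

SILENCE = 4
V32_IDX, OUTPUT_IDX = 31, 32  # 0-based indices into a 33-entry row

def silences_at_v32_by_coherence(rows):
    """Count how often V32 is a silence (value 4) for each coherence rating."""
    silent = Counter()
    totals = Counter()
    for row in rows:
        rating = row[OUTPUT_IDX]
        totals[rating] += 1
        if row[V32_IDX] == SILENCE:
            silent[rating] += 1
    return {rating: (silent[rating], totals[rating]) for rating in sorted(totals)}
```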


If pattern A has been interesting enough, the focus when judging coherence will be on pattern B, and to perceive B as cyclically coherent we need to look at V32. This hypothesis is also supported by two rules belonging to patterns evaluated as HC: V10-2 AND V17-1 AND V24-2 AND V32-2 THEN V33-3, and V16-4 AND V17-1 AND V26-2 AND V32-2 THEN V33-3. In both cases V17 is 1 and V32 is 2, and together these two rules cover approximately 4/16 of the cases evaluated as HC. Moreover, looking at the different masks (Table 1), when we searched for the mask with a single input variable we obtained V32. Sequences evaluated as MC (left centre of the figure) also have no silences in V32. On the right are the rules extracted considering three variables; there is only one rule describing the HC cases, V17-1 AND V32-1/2 THEN V33-3, in which the behaviour described above is clearly expressed: V17 must be 1, and V32 may be 1 or 2. As said, those variables determine the cyclic sensation in B. In addition, all silences but one are found in patterns evaluated as LC. The different masks created by LR_FIR for one to eight variables are shown in Table 1.

Table 1. Masks created by LR_FIR for one to eight variables. The quality of each mask is denoted by Q.

        V1   V8   V10  V16  V17  V24  V26  V32      Q
  Q1    -    -    -    -    -    -    -    ∗      0.14
  Q2    -    -    -    ∗    -    -    -    ∗      0.27
  Q3    -    -    -    -    ∗    ∗    -    ∗      0.30
  Q4    -    -    -    -    ∗    ∗    ∗    ∗      0.28
  Q5    ∗    ∗    ∗    ∗    ∗    -    -    -      0.27
  Q6    ∗    ∗    ∗    ∗    ∗    -    ∗    -      0.21
  Q7    ∗    ∗    ∗    ∗    ∗    ∗    ∗    -      0.22
  Q8    ∗    ∗    ∗    ∗    ∗    ∗    ∗    ∗      0.20
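For reference, a minimal sketch of how a mask's quality Q could be scored from the Shannon entropy of its state-transition behaviour, following the description in Section 1; the normalisation shown here is an assumption, and FIR's actual measure may differ.

```python
from collections import Counter, defaultdict
import math

def mask_quality(rows, mask_vars, output_var=33):
    """Score a mask by how deterministic the input-state -> output mapping is:
    1 - H(output | input state) / H_max, so higher means less uncertainty."""
    by_state = defaultdict(Counter)
    outputs = Counter()
    for row in rows:
        state = tuple(row[v - 1] for v in mask_vars)
        by_state[state][row[output_var - 1]] += 1
        outputs[row[output_var - 1]] += 1

    n = len(rows)
    h_cond = 0.0
    for counts in by_state.values():
        total = sum(counts.values())
        h_state = -sum((c / total) * math.log2(c / total) for c in counts.values())
        h_cond += (total / n) * h_state

    h_max = math.log2(len(outputs)) if len(outputs) > 1 else 1.0
    return 1.0 - h_cond / h_max
```

With the 105 encoded rows of Section 2, a call such as mask_quality(rows, (17, 24, 32)) would yield a score in the same spirit as the Q column of Table 1, though not necessarily the same numerical value.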

The masks in Table 1 indicate the variables that have the most influence on the prediction of the output [1]. At the top, the mask with a single variable (Q1) contains V32. As discussed, this can be explained by considering V32 as the variable that, together with V17 (which is 1 in 70% of the cases), gives a cyclic sensation to pattern B. Q2 is expressed in terms of V16 and V32. This selection can be explained by the role of V32 and by the fact that V16 plays for the A patterns the same role that V32 plays for the B patterns, so it can again be understood in terms of the cyclic sensation they produce. V16 is also the connecting variable between parts A and B of the pattern, which gives it another important role in the perception of the whole. For Q3 the selected variables were 17, 24 and 32; 17 and 32 are related to the cyclic perception of the B pattern, while V24 plays a role in increasing the rhythmic interest. The same idea explains Q4, where variables 24 and 26 were selected. An interesting behaviour appears in Q5, where, with the exception of V17, all selected variables belong to pattern A. The following masks add 26, 24 and 32, respectively. This can be explained because, as new variables are added to the mask, the added variables taken together explain a larger share of the overall perception than the original ones.

References
[1] Castro, F., Nebot, A., Mugica, F., 2011. On the extraction of decision support rules from fuzzy predictive models. Applied Soft Computing, 11(4), 3463-3475.
[2] Collins, N., 2003. Algorithmic Composition Methods for Breakbeat Science. ARiADA, No. 3, May 2003.
[3] Collins, N., 2012. Automatic Composition of Electroacoustic Art Music Utilizing Machine Listening. Computer Music Journal, 36(3), 8-23.
[4] Williams, D., Kirke, A., Miranda, E. R., Roesch, E. B., Nasuto, S. J., 2013. Towards Affective Algorithmic Composition. Proceedings of the 3rd International Conference on Music & Emotion (ICME3), Jyväskylä, Finland, 11-15 June 2013. G. Luck & O. Brabant (Eds.).
[5] Escobet, A., Nebot, A., Cellier, F. E., 2008. Visual-FIR: A tool for model identification and prediction of dynamical complex systems. Simulation Modelling Practice and Theory, 16(1), 76-92.
[6] Kosko, B., 1986. Fuzzy Cognitive Maps. International Journal of Man-Machine Studies, 24, 65-75.
[7] Nierhaus, G., 2009. Algorithmic Composition: Paradigms of Automated Music Generation. Springer, Wien/New York.
[8] Nebot, A., Mugica, F., 2012. Fuzzy Inductive Reasoning: a consolidated approach to data-driven construction of complex dynamical systems. International Journal of General Systems, 41(7), 645-665.
[9] Shapiro, P., 1999. Drum 'n' Bass: The Rough Guide. Rough Guides Ltd.


Subject Index

agents 265
aggregation operators 193
algorithmic composition 289
approximate policy iteration 3
argumentation 215
attribute-value representation 205
automated assessment 136
Autonomous Underwater Vehicle (AUV) 95
binary-class support vector machine 149
branch and bound 13
breathing monitoring 257
classifier forest 273
cloud computing 253
clustering 126
cognitive systems 249
color trends 107
colour naming 169
colour perception 169
computational creativity 249
computer vision 55
conversational case-based recommendation 116
creativity 87
creativity support system 87
critiquing 116
cross media 277
data center 253
data mining 126, 277
data visualization 269
decision support systems 67, 77
digital preservation 265
dimension 183
disaggregation preference method 107
distance 281
distance-based representation 23
Dynamic Movement Primitives (DMP) 95
Eclat 277
electronic auctions 265
emotion detection 55
energy efficiency 253
environmental modelling 77
expertise 285
extensional sets 281
facial action units 55
facial expression recognition 55
feature combination 159
food industry 87
frequent itemsets 126
fuzzy class theory 205
fuzzy inductive reasoning 289
G-protein-coupled receptors 269
Gabor filters 149
generative topographic mapping 269
generator 183
golf course 67
grasp specification 45
grid computing 253
group recommendation 116
hesitant fuzzy sets 193
HSL colour space 169
image processing 67
indistinguishability 281
influencers 261
intelligent systems 67
interaction 116
internet 277
kernel functions 149
knowledge base 249
language corpus 249
large-scale image classification 273
Learning by Demonstration (LbD) 95
life-logging data 35
linear discriminant analysis 257
linguistic labels 227
logic programming 215
low spatial and temporal resolution videos 35
low-complex system 257
mammographic images 149
marketing mix 277
media optimization 277
metabotropic glutamate receptors 269
model 281
multi-criteria decision analysis 107
multi-criteria decision-making 227
musical coherence 289
non-monotonic inference 215
non-monotonic utility function 107
online learning 136
open innovation 285
opinion leaders 261
optimized model parameters 149
OWA operator 193
person tracking 35
point cloud 45
preference relations 77
qualitative reasoning 227
random forests 273
RANSAC 45
RCPSP 237
recommender system 285
reformulation 237
regularization 3
reinforcement learning 3
remote associate test 249
representation theorem 183
return of investment 277
rule engine 67
scheduling 237
self-preservation 265
semi-supervised learning 13
Sequencial Walsh Hadamard (SWH) transform 257
shape fitting 45
similarity learning 23
similarity relation 205
smartphone app 257
SMT 237
social networks 261
spatio temporal data 126
statistical moments 149
strong complete T-preorder 183
support vector machine(s) 3, 13, 159
T-preorder 183
texture analysis 159
tools 237
TOPSIS 227
training size 23
transduction 13
trust model 136
uncertainty 215
underwater autonomous grasping 45
underwater intervention 95
user customization 169
UWSim underwater realistic simulator 45
violator spaces 13
water allocation problem 77
web domain 277
web mining 277
weighted mean 193
weights derivation 193
wind energy 227
word-of-mouth marketing 261


Author Index

Abdel-Nasser, M. 159
Afsordegan, A. 227
Agell, N. v, 87, 107, 227, 261
Aghaei, M. 35
Aguado, J.C. 227
Alsinet, T. 215
Álvarez, J.A. 253
Alvarez, S. 126
Angulo, C. 285
Arevalillo-Herráez, M. 23
Armengol, E. 205
Bejar, J. 126
Béjar, R. 215
Berjaga, X. 67
Bofill, M. 237
Boixader, D. 183
Cárdenas, M.-I. 269
Carrera, A. 95
Carreras, M. 95
Carrillo, P.N. 265
Casabayó, M. 261
Chao, T.C. 77
Coll, J. 237
Compta, M. 67
Contreras, D. 116
De La Rosa, J.L. 265
Del Vasto-Terrientes, L. 77
Dellunde, P. 205
Esposito, G. 3, 13
Falomir, Z. 169, 249
Fernández, J.J. 45
Fernández-Cerero, D. 253
Fernández-Montes, A. 253
Ferre, M. 149
Fornas, D. 45
Gamboa, G. 227
Garcia, D. 126
García-Cerdaña, À. 205
Ghaderi, M. 107
Gibert, K. 277
Giraldo, J. 269
Godo, L. 215
Gomez, I. 126
González-Abril, L. 169, 253
Grimaldo, F. 23
Guitart, F. 215
Gutierrez, P. 136
Hurtós, N. 95
Kormushev, P. 95
Kumar, V. 77
Lapedriza, À. 55
Lopez, J.M. 67
López-Iñesta, E. 23
Lulli, F. 67
Marti, J. 149
Marti-Puig, P. 257
Martin, M. 3, 13
Masferrer, G. 257
Masip, D. 55
Mattioli, G. 281
Melendez, J. 149
Moreno, A. 159
Mugica, F. 289
Museros, L. v, 169
Nebot, À. 289
Nguyen, J. 285
Oliva, L. 126
Olteţeanu, A.-M. 249
Olvera, J.A. 265
Ortega, J.A. 253
Osman, N. 136
Pérez, J. 45
Palomeras, N. 95
Palomino, A. 277
Pascual, J. 116
Paz-Ortiz, I. 289
Peñalver, A. 45
Pereda, A. 285
Pons, G. 67
Puig, D. 149, 159
Puigbò, J.-Y. 261
Pujol, O. v
Radeva, P. 35
Ramisa, A. 273
Raya, C. 87
Recasens, J. 183
Romero, E. 289
Ruiz, F.J. 87, 107
Salamó, M. 116
Sales, J. 45
Samà, A. 87
Sánchez, M. 227
Sánchez-Hernández, G. 261, 285
Sanchez-Mendoza, D. 55
Sanz, I. 169
Sanz, P.J. 45
Schuhmacher, M. 77
Serra, M. 257
Sierra, C. 136
Solé, X. 273
Suy, J. 237
Tejeda, A. 126
Torra, V. 193
Torras, C. 273
Torrents-Barrena, J. 149
Valls, A. 77, 149
Vazquez-Salceda, J. 126
Vellido, A. 269
Villaret, M. 237