Intelligent Autonomous Systems 9 1586035959, 9781586035952, 9781423799283

The papers in this publication cover both the applied as well as the theoretical aspects of intelligent autonomous syste

854 59 25MB

English Pages 1064 Year 2006

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

Intelligent Autonomous Systems 9 
 1586035959, 9781586035952, 9781423799283

Table of contents :
Title page......Page 1
Preface......Page 5
IAS-9 Conference Organization......Page 6
Contents......Page 9
Papers of Invited Guest Speakers......Page 19
Reaction-Diffusion Intelligent Wetware......Page 21
Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality......Page 29
Real World Informatics Environment System......Page 37
Understanding and Realization of Constrained Motion - Human Motion Analysis and Robotic Learning Approaches......Page 48
Navigation and Motion Planning......Page 57
The Specifiability Requirement in Mobile Robot Self-Localization......Page 59
Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments......Page 67
Autonomous Robot Vision System for Environment Recognition......Page 75
Multi-Resolution Field D*......Page 83
Path and Observation Planning of Vision-Based Mobile Robots with Multiple Sensing Strategies......Page 93
Mobile Robot Motion Planning Considering Path Ambiguity of Moving Obstacles......Page 103
Robotic Navigation Using Harmonic Functions and Finite Elements......Page 112
The Hippocampal Place Cells and Fingerprints of Places: Spatial Representation Animals, Animats and Robots......Page 122
Incremental Reconstruction of Generalized Voronoi Diagrams on Grids......Page 132
Learning from Nature to Build Intelligent Autonomous Robots......Page 142
Tracking Control & Active Vision......Page 151
A Laser Based Multi-Target Tracking for Mobile Robot......Page 153
Simultaneous Environment Mapping and Mobile Target Tracking......Page 163
Omnidirectional Active Vision for Evolutionary Car Driving......Page 171
Localization......Page 181
Map of Color Histograms for Robot Navigation......Page 183
Designing a System for Map-Based Localization in Dynamic Environments......Page 191
Appearance-Based Localization of Mobile Robots Using Local Integral Invariants......Page 199
Enhancing Self Covertness in a Hostile Environment from Expected Observers at Unknown Locations......Page 207
Multi-Agent Robots......Page 215
An Experimental Study of Distributed Robot Coordination......Page 217
Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot......Page 225
Mutual Localization and 3D Mapping by Cooperative Mobile Robots......Page 235
Network Agent Systems......Page 243
Exploration of Complex Growth Mechanics of City Traffic Jam for the Adaptive Signal Control......Page 245
Coordinated Control of Mobile Antennas for Ad-Hoc Networks in Cluttered Environments......Page 253
An Adaptive Behavior of Mobile Ad Hoc Network Agents......Page 261
Evolution and Learning......Page 271
Learning the Cooperative Behaviors of Seesaw Balancing Agents - An Actor-Critic Aproach -......Page 273
Evolutionary Reinforcement Learning for Simulated Locomotion of a Robot with a Two-Link Arm......Page 281
Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot......Page 290
Transition Entropy in Partially Observable Markov Decision Processes......Page 300
Movement Control of Tensegrity Robot......Page 308
An Adaptive Neural Controller for a Tendon Driven Robotic Hand......Page 316
Self-Organizing Route Guidance Systems Based on Coevolution of Multi-Layered Guidance Vector Fields......Page 326
Object Transportation Using Two Humanoid Robots Based on Multi-Agent Path Planning Algorithm......Page 336
Cognitive Map Plasticity and Imitation Strategies to Extend the Performance of a MAS......Page 344
Examination of Abilities Based on Pseudolite System for Indoor Positioning......Page 352
A Memory-Based PID Controller for Indoor Airship Robot......Page 359
Co-Evolutionary Design for AGV Systems......Page 367
An Update Method of Computer Simulation for Evolutionary Robotics......Page 375
Vision-Based Teleoperation of a Mobile Robot with Visual Assistance......Page 383
Adaptation......Page 391
Adaptive Control Strategy for Micro/Nano Manipulation Systems......Page 393
Smart Roadster Project: Setting up Drive-by-Wire or How to Remote-Control Your Car......Page 401
A Reactive Approach for Object Finding in Real World Environments......Page 409
A Geometric Approach for an Intuitive Perception System of Humanoids......Page 417
Autonomous Learning of a Topological Model in a Road Network......Page 426
Reinforcement Learning Performance Evaluation: An Evolutionary Approach......Page 434
Quantify Distinguishability in Robotics......Page 443
Group Transport Along a Robot Chain in a Self-Organised Robot Colony......Page 451
Growing Virtual Neural Tissue: Binding Spiking Neurons Through Sensory Input......Page 461
Emergent Synthesis......Page 471
Hormone-Inspired Adaptive Distributed Synchronization of Reconfigurable Robots......Page 473
Spatial Prisoner's Dilemma in a Network Environment Introducing Heterogeneous Information Distribution......Page 481
Behavioral Decision for Multi-Agent Systems with Dynamic Interaction......Page 489
Co-Creative Composition Using Multiagent Learning: Toward the Emergence of Musical Structure......Page 497
Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony......Page 505
Lot Release Control Using Genetics Based Machine Learning in a Semiconductor Manufacturing System......Page 515
Design of an AGV Transportation System by Considering Management Model in an ACT......Page 523
Analysis of Purchase Decision Making: Network Externalities and Asymmetric Information......Page 533
Dynamics, Morphology, and Materials in Intelligent Behavior......Page 541
Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling......Page 543
Active Learning of Local Structures from Attentive and Multi-Resolution Vision......Page 552
Modular Design of Home Service Robot System with Hierarchical Colored Petri Net......Page 560
Auctions for Task Allocation to Robots......Page 568
Exploration of Natural Dynamics Through Resonance and Chaos......Page 576
One-Legged Locomotion with a Compliant Passive Joint......Page 584
Analysis of Dynamical Locomotion of Two-Link Locomotors......Page 592
Mobiligence......Page 601
A Modular Robot That Self-Assembles......Page 603
Autonomous Robust Execution of Complex Robotic Missions......Page 613
Emergence of Small-World in Ad-Hoc Communication Network Among Individual Agents......Page 623
Parametric Path Planning for a Car-Like Robot Using CC Steers......Page 631
Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol......Page 640
Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System Using Communication......Page 650
From Mobility to Autopoiesis: Acquiring Environmental Information to Deliver Commands to the Effectors......Page 658
RoboCup......Page 667
Cooperative Agent Behavior Based on Special Interaction Nets......Page 669
A Framework for Advanced Robot Programming in the RoboCup Domain - Using Plug-In System and Scripting Language......Page 678
Major Behavior Definition of Football Agents Through XML......Page 686
Getting Back on Two Feet: Reliable Standing-Up Routines for a Humanoid Robot......Page 694
Ball Tracking with Velocity Based on Monte-Carlo Localization......Page 704
Fast Vector Quantization for State-Action Map Compression......Page 712
Incremental Purposive Behavior Acquisition Based on Modular Learning System......Page 720
Real World Information Systems......Page 729
Simple Form Recognition Using Bayesian Programming......Page 731
Towards Robust State Estimation with Bayesian Networks: A New Perspective on Belief Propagation......Page 740
HRP-2W: A Humanoid Platform for Research on Support Behavior in Daily Life Environments......Page 750
Human Supporting Production Cell "Attentive Workbench"......Page 758
A Foveal 3D Laser Scanner Integrating Texture into Range Data......Page 766
Autonomous Collaborative Environment for Project Based Learning......Page 774
Humanoid Robots......Page 783
Pedaling Motion of a Cycle by Musculo-Skeletal Humanoid with Adapting Ability Based on an Evaluation of the Muscle Loads......Page 785
Behavior Transition Between Biped and Quadruped Walking by Using Bifurcation......Page 794
Tendon Arrangement Based on Joint Torque Requirements for a Reinforceable Musculo-Skeletal Humanoid......Page 804
Vision-Based Virtual Information and Semi-Autonomous Behaviours for a Humanoid Robot......Page 812
Load Distributed Whole-Body Motion Generation Method for Humanoids by Minimizing Average Joint Torque Ratio......Page 822
Measurement and Simulation Verification of Reflexive Responses to Perturbation During Walking......Page 830
Toward a Human-Like Biped Robot with Compliant Legs......Page 838
Service Robotics and Human Support......Page 847
Using JavaSpace for a PEIS Ecology......Page 849
A New Heating Method for the Actuation of the Shape Memory Alloy (SMA) Actuator......Page 857
Informational Organization on Network Among Multiple Agents......Page 865
A Flexible Task Knowledge Representation for Service Robots......Page 874
Intelligent Autonomous Japanese Comic "MANGA" Designing Support System......Page 883
Human Behavior Analysis......Page 891
Behavior Induction by Geometric Relation Between Symbols of Multi-Sensory Pattern......Page 893
Effects of Robotic Arm Orthosis Behaviors to User's Motion Structure. Qualitative Assessment Using Arm Trajectory Profiles......Page 901
A Human Behavior Discrimination Method Based on Motion Trajectory Measurement for Indoor Guiding Services......Page 909
Mutual Adaptation among man and machines......Page 919
Effects of Shared Communicational Modality to Joint Activity of Human Operator and Robot Autonomy......Page 921
Learning of Object Identification by Robots Commanded by Natural Language......Page 931
An f-MRI Study of an EMG Prosthetic Hand Biofeedback System......Page 939
Wearable Inertial Sensors for Arm Motion Tracking in Home-Based Rehabilitation......Page 948
Learning and Control Model of Arm Posture......Page 956
Competitive Learning Method for Robust EMG-to-Motion Classifier......Page 964
Mutual Adaptation Among Man and Machine by Using f-MRI Analysis......Page 972
Women in Robotics, Human Science and Technology......Page 981
Behavior Generation of Humanoid Robots Depending on Mood......Page 983
Construction of Human-Robot Cooperating System Based on Structure/Motion Model......Page 991
Motion Assist Devices for Rehabilitation Using Parallel Wire Mechanisms......Page 999
Analysis of Skill Acquisition Process - A Case Study of Arm Reaching Task -......Page 1009
Generation of Size-Variable Image Template for Self-Position Estimation Considering Position Shift......Page 1017
Subjective Age Estimation Using Facial Images - the Effects of Gender, Expressions and Age Groups -......Page 1025
Evaluation and Suppression of Overrun of Microorganisms Using Dynamics Model for Microrobotic Application......Page 1033
From Muscle to Brain: Modelling and Control of Functional Materials and Living Systems......Page 1043
Using the Sony AIBOs to Increase Diversity in Undergraduate CS Programs......Page 1051
Author Index......Page 1059

Citation preview

INTELLIGENT AUTONOMOUS SYSTEMS 9

This page intentionally left blank

Intelligent Autonomous Systems 9 IAS-9

Edited by

Tamio Arai The University of Tokyo, Japan

Rolf Pfeifer University of Zurich, Switzerland

Tucker Balch Georgia Institute of Technology, USA

and

Hiroshi Yokoi The University of Tokyo, Japan

Amsterdam • Berlin • Oxford • Tokyo • Washington, DC

© 2006 The authors All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher. ISBN 1-58603-595-9 Library of Congress Control Number: 2006920171 Publisher IOS Press Nieuwe Hemweg 6B 1013 BG Amsterdam Netherlands fax: +31 20 687 0019 e-mail: [email protected] Distributor in the UK and Ireland Gazelle Books Falcon House Queen Square Lancaster LA1 1RN United Kingdom fax: +44 1524 63232

Distributor in the USA and Canada IOS Press, Inc. 4502 Rachael Manor Drive Fairfax, VA 22032 USA fax: +1 703 323 3668 e-mail: [email protected]

LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS

v

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Preface The IAS-9 conference aims to address the main issues of concern within the IAS community. The conference covers both the applied as well as the theoretical aspects of intelligent autonomous systems. Autonomy and adaptivity are key aspects of truly intelligent artificial systems, dating from the first IAS conference in 1989. New directions of research have recently emerged from the synergetic interaction of many fields, such as cognitive science, operations research, mathematics, robotics, mechanics, electronics, informatics, and economics, interdisciplinary as well as transdisciplinarily. One key insight is that to realize both intelligence and autonomy, it is crucial to build real-world devices and abstract principles of design from them. The goal of IAS-9 is to lay out new scientific ideas and design principles for artificial systems able to survive in nature and in our society. The conference proceedings stimulate novel challenges as well as exciting research directions. A total of 146 scientific papers were submitted from 16 countries. All of the submitted papers were reviewed by the program committee, and 112 were accepted as full papers. We have 5 invited guest speakers at IAS-9: Andrew Adamatzky from the University of West England addresses the new direction of computation; Hod Lipson from Cornell University shows the frontier study of evolutionary robotics; Tomomasa Sato from The University of Tokyo presents the COE project of Japan for the real world application; Masahiro Fujita from SONY addresses the communication and service robotic system; and Shigeyuki Hosoe from RIKEN Bio-mimetic control research center shows the human analysis toward robotic learning. The conference takes place at Kashiwa new campus of the University of Tokyo, where frontier sciences are being created as “transdisciplinary” studies. A novel research center on artifacts, RACE, is also located on this campus with other three interdisciplinary research centers. I hope all participants of IAS-9 will enjoy the atmosphere of the campus and the facilities of the research building, and experience the novel trend of “transdisciplinary” studies in Japan. We sincerely appreciate the support of the Inoue Foundation of Science and Kayamori Foundation of Informational Science Advancement, the Robotics Society of Japan and the Research into Artifact Center of Engineering at the University of Tokyo. We would also like to express our gratitude to everybody of the program committee who contributed to the collection and the selection of high-level papers, and to the local committee members who supported the management of IAS-9. We look forward to seeing you at the conference site of IAS-9 in Tokyo. Tamio Arai, Rolf Pfeifer, Tucker Balch and Hiroshi Yokoi

vi

IAS-9 Conference Organization General Chair Tamio Arai, The Univ. of Tokyo, Japan Steering Committee Rüdiger Dillmann, Univ. of Karlsruhe, Germany Maria Gini, Univ. of Minnesota, USA Frans Groen, Univ. of Amsterdam, the Netherlands Thomas C. Henderson, University of Utah Yukinori Kakazu, Hokkaido Univ., Japan Enrico Pagello, Univ. of Padua and LADSEB-CNR, Italy Anthony Stentz, Carnegie Mellon Univ., USA Program Committee Co-Chairs In America: Tucker Balch, Georgia Institute of Technology, USA In Europe/Africa: Rolf Pfeifer, The Univ. of Zurich, Switzerland In Asia/Oceania: Hiroshi Yokoi, The Univ. of Tokyo, Japan Publicity Chair Kanji Ueda, Univ. of Tokyo, Japan Organized Session Chair Jun Ota, Univ. of Tokyo, Japan Max Lungarella, Univ. of Tokyo, Japan Local Organization Hajime Asama, Univ. of Tokyo, Japan Yusuke Maeda, Yokohama National Univ., Japan Masao Sugi, Univ. of Tokyo, Japan Ryosuke Chiba, Univ. of Tokyo, Japan Ryuichi Ueda, Univ. of Tokyo, Japan Program Committee Europe Fumiya Iida, The University of Zurich, Switzerland Miriam Fend, The University of Zurich, Switzerland Gabriel Gomez, The University of Zurich, Switzerland Giorgio Metta, University of Genoa, Italy Giulio Sandini, University of Genova, Italy Frank Pasemann, Frauenhofer Institute for Autonomous Intelligent Systems, Germany Tom Ziemke, University of Skovde, Sweden

vii

Noel Sharkey, University of Sheffield, England Owen Holland, University of Essex, England Barbara Webb, University of Edinburgh, Scottland Philippe Gaussier, Cergy Pontoise University, France Arnaud Revel, ENSEA, France Pierre-Yves Oudeyer, Sony Computer Science Lab., France Francesco Mondada, EPFL, Switzerland Roland Siegwart, EPFL, Switzerland Rüdiger Dillmann, University of Karlsruhe, Germany Sven Behnke, University of Freiburg, Germany Raoul Rojas, Freie Universitat Berlin, Germany Auke Ijspeert, EPFL, Switzerland Aude Billard, EPFL, Switzerland Andrew Adamatzky, University of West England, England Huosheng Hu, University of Essex, England Stefano Carpin, International University Bremen, Germany Daniele Nardi, University of Roma “La Sapienza”, Italy Alicia Casals, Technical University of Catalonia, Spain Ray Jarvis, Monash University, Australia Pedro Lima, Technical University of Lisbon, Portugal Enzo Mumolo, University of Trieste, Italy Carme Torras, Institut de Robotica i Informatica Industrial, Spain Ernesto Burattini, University of Naples “Federico II”, Italy Andrea Bonarini, Politecnico di Milano, Italy Gerhard K. Kraetzschmar, University of Applied Sciences Bonn-Rhein-Sieg, Germany Antonio D’Angelo, University of Udine, Italy Angel P. del Pobil, Universitat Jaume I, Spain America Paul Scerri, Carnegie Mellon University Tony Stentz, Carnegie Mellon University Omead Amidi, Carnegie Mellon University Doug MacKenzie, Mobile Intelligence Inc. Maria Gini, University of Minnesota Sven Koenig, University of Southern California Daniel Lee, University of Pennsylvania Devin Balkom, Dartmouth University Wes Huang, Rensselaer Polytechnic Institute Joelle Pineau, McGill University Tom Collins, Georgia Institute of Technology Eric Johnson, Georgia Institute of Technology Artur Arsenio, Massachusetts Institute of Technology Jose Carmena, UC Berkeley Darrin Bentivegna, ATR, Kyoto, Japan Josh Bongard, Cornell University Paul Fitzpatrick, Massachusetts Institute of Technology Steve Collins, University of Michigan

viii

Chandana Paul, Cornell University Chris Atkeson, Carnegie Mellon University Asia Tetusnari Inamura, The University of Tokyo, Japan Yoshihiko Nakamura, The University of Tokyo, Japan Yuichi Kobayashi, RIKEN BMC, Japan Hiroaki Yamaguchi, Musashi Institute of Technology, Japan Yoshihiro Miyake, Tokyo Institute of Technology, Japan Yasutake Takahashi, Osaka University, Japan Shinkichi Inagaki, Nagoya University, Japan Kousuke Inoue, Ibaraki University, Japan Wenwei Yu, Chiba University, Japan Yuko Ishiwaka, Hokkaido University, Japan Toshiyuki Kondo, Tokyo Institute of Technology, Japan Toshio Hori, AIST, Japan Mitsuo Wada, Hokkaido University, Japan Mihoko Otake, The University of Tokyo, Japan Masashi Furukawa, Asahikawa National College of Technology, Japan Koichi Osuka, Kobe University Koichi Nishiwaki, AIST, Japan Koh Hosoda, Osaka University Keiji Suzuki, Future University – Hakodate, Japan Kazuhiro Ohkura, Kobe University, Japan Takashi Kawakami, Hokkaido Institute of Technology, Japan Katsuyoshi Tsujita, Osaka Institute of Technology, Japan Jun Hakura, Iwate Prefectural University, Japan Kosei Ishimura, Hokkaido University, Japan Hiroshi Ishiguro, Osaka University, Japan Akio Ishiguro, Nagoya University, Japan

ix

Contents Preface Tamio Arai, Rolf Pfeifer, Tucker Balch and Hiroshi Yokoi IAS-9 Conference Organization

v vi

Papers of Invited Guest Speakers Reaction-Diffusion Intelligent Wetware Andrew Adamatzky Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality Hod Lipson, Josh Bongard, Victor Zykov and Evan Malone Real World Informatics Environment System Tomomasa Sato Understanding and Realization of Constrained Motion – Human Motion Analysis and Robotic Learning Approaches Shigeyuki Hosoe, Yuichi Kobayashi and Mikhail Svinin

3

11 19

30

Part 1. Navigation and Motion Planning The Specifiability Requirement in Mobile Robot Self-Localization Francesco Capezio, Antonio Sgorbissa and Renato Zaccaria

41

Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments Mattia Castelnovi, Antonio Sgorbissa and Renato Zaccaria

49

Autonomous Robot Vision System for Environment Recognition Woong-Jae Won, Sang-Woo Ban and Minho Lee

57

Multi-Resolution Field D* Dave Ferguson and Anthony Stentz

65

Path and Observation Planning of Vision-Based Mobile Robots with Multiple Sensing Strategies Mitsuaki Kayawake, Atsushi Yamashita and Toru Kaneko

75

Mobile Robot Motion Planning Considering Path Ambiguity of Moving Obstacles Hiroshi Koyasu and Jun Miura

85

Robotic Navigation Using Harmonic Functions and Finite Elements Santiago Garrido and Luis Moreno

94

x

The Hippocampal Place Cells and Fingerprints of Places: Spatial Representation Animals, Animats and Robots Adriana Tapus, Francesco Battaglia and Roland Siegwart

104

Incremental Reconstruction of Generalized Voronoi Diagrams on Grids Nidhi Kalra, Dave Ferguson and Anthony Stentz

114

Learning from Nature to Build Intelligent Autonomous Robots Rainer Bischoff and Volker Graefe

124

Part 2. Tracking Control & Active Vision A Laser Based Multi-Target Tracking for Mobile Robot Masafumi Hashimoto, Satoshi Ogata, Fuminori Oba and Takeshi Murayama Simultaneous Environment Mapping and Mobile Target Tracking Abedallatif Baba and Raja Chatila

135

Omnidirectional Active Vision for Evolutionary Car Driving Mototaka Suzuki, Jacob van der Blij and Dario Floreano

153

145

Part 3. Localization Map of Color Histograms for Robot Navigation Takanobu Kawabe, Tamio Arai, Yusuke Maeda and Toshio Moriya

165

Designing a System for Map-Based Localization in Dynamic Environments Fulvio Mastrogiovanni, Antonio Sgorbissa and Renato Zaccaria

173

Appearance-Based Localization of Mobile Robots Using Local Integral Invariants Hashem Tamimi, Alaa Halawani, Hans Burkhardt and Andreas Zell

181

Enhancing Self Covertness in a Hostile Environment from Expected Observers at Unknown Locations Mohamed Marzouqi and Ray Jarvis

189

Part 4. Multi-Agent Robots An Experimental Study of Distributed Robot Coordination Stefano Carpin and Enrico Pagello Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot Kenneth Payne, Jacob Everist, Feili Hou and Wei-Min Shen Mutual Localization and 3D Mapping by Cooperative Mobile Robots Julian Ryde and Huosheng Hu

199

207 217

xi

Part 5. Network Agent Systems Exploration of Complex Growth Mechanics of City Traffic Jam for the Adaptive Signal Control Kouhei Hamaoka and Mitsuo Wada Coordinated Control of Mobile Antennas for Ad-Hoc Networks in Cluttered Environments Gianluca Antonelli, Filippo Arrichiello, Stefano Chiaverini and Roberto Setola An Adaptive Behavior of Mobile Ad Hoc Network Agents Masao Kubo, Chau Dan, Hiroshi Sato and Takashi Matubara

227

235

243

Part 6. Evolution and Learning Learning the Cooperative Behaviors of Seesaw Balancing Agents – An Actor-Critic Aproach – Takashi Kawakami, Masahiro Kinoshita and Yukinori Kakazu

255

Evolutionary Reinforcement Learning for Simulated Locomotion of a Robot with a Two-Link Arm Yohannes Kassahun and Gerald Sommer

263

Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot Viktor Zhumatiy, Faustino Gomez, Marcus Hutter and Jürgen Schmidhuber

272

Transition Entropy in Partially Observable Markov Decision Processes Francisco S. Melo and Isabel Ribeiro

282

Movement Control of Tensegrity Robot Masaru Fujii, Shinichiro Yoshii and Yukinori Kakazu

290

An Adaptive Neural Controller for a Tendon Driven Robotic Hand Gabriel Gómez, Alejandro Hernandez and Peter Eggenberger Hotz

298

Self-Organizing Route Guidance Systems Based on Coevolution of Multi-Layered Guidance Vector Fields Yasuhiro Ohashi and Kosuke Sekiyama

308

Object Transportation Using Two Humanoid Robots Based on Multi-Agent Path Planning Algorithm Shotaro Kamio and Hitoshi Iba

318

Cognitive Map Plasticity and Imitation Strategies to Extend the Performance of a MAS P. Laroque, E. Fournier, P.H. Phong and P. Gaussier

326

Examination of Abilities Based on Pseudolite System for Indoor Positioning Isamu Kitano and Keiji Suzuki

334

xii

A Memory-Based PID Controller for Indoor Airship Robot Takamasa Sato and Keiji Suzuki

341

Co-Evolutionary Design for AGV Systems Ryosuke Chiba, Jun Ota and Tamio Arai

349

An Update Method of Computer Simulation for Evolutionary Robotics Yoshiaki Katada and Kazuhiro Ohkura

357

Vision-Based Teleoperation of a Mobile Robot with Visual Assistance Naoyuki Kubota, Daisuke Koudu, Shinichi Kamijima, Kazuhiko Taniguchi and Yasutsugu Nogawa

365

Part 7. Adaptation Adaptive Control Strategy for Micro/Nano Manipulation Systems Hwee Choo Liaw, Denny Oetomo, Bijan Shirinzadeh and Gursel Alici Smart Roadster Project: Setting up Drive-by-Wire or How to Remote-Control Your Car Joachim Schröder, Udo Müller and Rüdiger Dillmann

375

383

A Reactive Approach for Object Finding in Real World Environments Abdelbaki Bouguerra

391

A Geometric Approach for an Intuitive Perception System of Humanoids David Israel Gonzalez-Aguirre and Eduardo Jose Bayro-Corrochano

399

Autonomous Learning of a Topological Model in a Road Network Gabriel Aviña Cervantès and Michel Devy

408

Reinforcement Learning Performance Evaluation: An Evolutionary Approach Genci Capi and Masao Yokota

416

Quantify Distinguishability in Robotics Aurélien Hazan, Frédéric Davesne, Vincent Vigneron and Hichem Maaref

425

Group Transport Along a Robot Chain in a Self-Organised Robot Colony Shervin Nouyan, Roderich Groß, Marco Dorigo, Michael Bonani and Francesco Mondada

433

Growing Virtual Neural Tissue: Binding Spiking Neurons Through Sensory Input Pascal Kaufmann and Gabriel Gómez

443

Part 8. Emergent Synthesis Hormone-Inspired Adaptive Distributed Synchronization of Reconfigurable Robots Feili Hou and Wei-Min Shen

455

xiii

Spatial Prisoner’s Dilemma in a Network Environment Introducing Heterogeneous Information Distribution Hiroshi Kuraoka, Nobutada Fujii and Kanji Ueda Behavioral Decision for Multi-Agent Systems with Dynamic Interaction Yusuke Ikemoto and Toshio Fukuda

463 471

Co-Creative Composition Using Multiagent Learning: Toward the Emergence of Musical Structure Shintaro Suzuki, Takeshi Takenaka and Kanji Ueda

479

Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony Roderich Groß, Marco Dorigo and Masaki Yamakita

487

Lot Release Control Using Genetics Based Machine Learning in a Semiconductor Manufacturing System Ryohei Takasu, Nobutada Fujii, Kanji Ueda and Motohiro Kobayashi

497

Design of an AGV Transportation System by Considering Management Model in an ACT Satoshi Hoshino, Jun Ota, Akiko Shinozaki and Hideki Hashimoto

505

Analysis of Purchase Decision Making: Network Externalities and Asymmetric Information Yohei Kaneko, Nariaki Nishino, Sobei H. Oda and Kanji Ueda

515

Part 9. Dynamics, Morphology, and Materials in Intelligent Behavior Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling Simon Bovet

525

Active Learning of Local Structures from Attentive and Multi-Resolution Vision Maxime Cottret and Michel Devy

534

Modular Design of Home Service Robot System with Hierarchical Colored Petri Net Guohui Tian, Feng Duan and Tamio Arai

542

Auctions for Task Allocation to Robots Maitreyi Nanjanath and Maria Gini

550

Exploration of Natural Dynamics Through Resonance and Chaos Alex Pitti, Max Lungarella and Yasuo Kuniyoshi

558

One-Legged Locomotion with a Compliant Passive Joint Juergen Rummel, Fumiya Iida and Andre Seyfarth

566

Analysis of Dynamical Locomotion of Two-Link Locomotors Kojiro Matsushita, Hiroshi Yokoi and Tamio Arai

574

xiv

Part 10. Mobiligence A Modular Robot That Self-Assembles Akio Ishiguro, Hiroaki Matsuba, Tomoki Maegawa and Masahiro Shimizu

585

Autonomous Robust Execution of Complex Robotic Missions Paul Robertson, Robert Effinger and Brian Williams

595

Emergence of Small-World in Ad-Hoc Communication Network Among Individual Agents Daisuke Kurabayashi, Tomohiro Inoue, Akira Yajima and Tetsuro Funato

605

Parametric Path Planning for a Car-Like Robot Using CC Steers Weerakamhaeng Yossawee, Takashi Tsubouchi, Masamitsu Kurisu and Shigeru Sarata

613

Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol Chomchana Trevai, Jun Ota and Tamio Arai

622

Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System Using Communication Tomohisa Fujiki, Kuniaki Kawabata and Hajime Asama

632

From Mobility to Autopoiesis: Acquiring Environmental Information to Deliver Commands to the Effectors Antonio D’Angelo and Enrico Pagello

640

Part 11. RoboCup Cooperative Agent Behavior Based on Special Interaction Nets Oliver Zweigle, Reinhard Lafrenz, Thorsten Buchheim, Uwe-Philipp Käppeler, Hamid Rajaie, Frank Schreiber and Paul Levi A Framework for Advanced Robot Programming in the RoboCup Domain – Using Plug-In System and Scripting Language Hayato Kobayashi, Akira Ishino and Ayumi Shinohara Major Behavior Definition of Football Agents Through XML José Luis Vega, Ma. de los Ángeles Junco and Jorge Ramírez Getting Back on Two Feet: Reliable Standing-Up Routines for a Humanoid Robot Jörg Stückler, Johannes Schwenk and Sven Behnke

651

660 668

676

Ball Tracking with Velocity Based on Monte-Carlo Localization Jun Inoue, Akira Ishino and Ayumi Shinohara

686

Fast Vector Quantization for State-Action Map Compression Kazutaka Takeshita, Ryuichi Ueda and Tamio Arai

694

xv

Incremental Purposive Behavior Acquisition Based on Modular Learning System Tomoki Nishi, Yasutake Takahashi and Minoru Asada

702

Part 12. Real World Information Systems Simple Form Recognition Using Bayesian Programming Guy Ramel, Adriana Tapus, François Aspert and Roland Siegwart

713

Towards Robust State Estimation with Bayesian Networks: A New Perspective on Belief Propagation Jan Nunnink and Gregor Pavlin

722

HRP-2W: A Humanoid Platform for Research on Support Behavior in Daily Life Environments Tetsunari Inamura, Kei Okada, Masayuki Inaba and Hirochika Inoue

732

Human Supporting Production Cell “Attentive Workbench” Masao Sugi, Yusuke Tamura, Makoto Nikaido, Jun Ota, Tamio Arai, Kiyoshi Takamasu, Kiyoshi Kotani, Akio Yamamoto, Hiromasa Suzuki, Yoichi Sato, Fumihiko Kimura and Seiichi Shin

740

A Foveal 3D Laser Scanner Integrating Texture into Range Data Marcus Walther, Peter Steinhaus and Rüdiger Dillmann

748

Autonomous Collaborative Environment for Project Based Learning Mihoko Otake, Ryo Fukano, Shinji Sako, Masao Sugi, Kiyoshi Kotani, Junya Hayashi, Hiroshi Noguchi, Ryuichi Yoneda, Kenjiro Taura, Nobuyuki Otsu and Tomomasa Sato

756

Part 13. Humanoid Robots Pedaling Motion of a Cycle by Musculo-Skeletal Humanoid with Adapting Ability Based on an Evaluation of the Muscle Loads Tomoaki Yoshikai, Yuto Nakanish, Ikuo Mizuuchi and Masayuki Inaba

767

Behavior Transition Between Biped and Quadruped Walking by Using Bifurcation Kenji Asa, Kosei Ishimura and Mitsuo Wada

776

Tendon Arrangement Based on Joint Torque Requirements for a Reinforceable Musculo-Skeletal Humanoid Yuto Nakanishi, Ikuo Mizuuchi, Tomoaki Yoshikai, Tetsunari Inamura and Masayuki Inaba Vision-Based Virtual Information and Semi-Autonomous Behaviours for a Humanoid Robot Olivier Stasse, Jean Semere, Neo Ee Sian, Takashi Yoshimi and Kazuhito Yokoi

786

794

xvi

Load Distributed Whole-Body Motion Generation Method for Humanoids by Minimizing Average Joint Torque Ratio Ryusuke Adachi, Shigeru Kanzaki, Kei Okada and Masayuki Inaba Measurement and Simulation Verification of Reflexive Responses to Perturbation During Walking Shahed Sarwar, Wenwei Yu, Masaru Kumagai, Masaki Sekine, Tamotsu Katane, Toshiyo Tamura and Osami Saitou Toward a Human-Like Biped Robot with Compliant Legs Fumiya Iida, Yohei Minekawa, Juergen Rummel and Andre Seyfarth

804

812

820

Part 14. Service Robotics and Human Support Using JavaSpace for a PEIS Ecology Beom Su Seo, Mathias Broxvall, Marco Gritti, Alessandro Saffiotti and Jung Bae Kim A New Heating Method for the Actuation of the Shape Memory Alloy (SMA) Actuator Chee Siong Loh, Kojiro Matsushita, Hiroshi Yokoi and Tamio Arai

831

839

Informational Organization on Network Among Multiple Agents Yoshihito Shikanai, Koichi Ozaki and Sumio Yamamoto

847

A Flexible Task Knowledge Representation for Service Robots Steffen Knoop, Sven R. Schmidt-Rohr and Rüdiger Dillmann

856

Intelligent Autonomous Japanese Comic “MANGA” Designing Support System Yuko Ishiwaka and Yuka Kobayasi

865

Part 15. Human Behavior Analysis Behavior Induction by Geometric Relation Between Symbols of Multi-Sensory Pattern Naoki Kojo, Tetsunari Inamura and Masayuki Inaba Effects of Robotic Arm Orthosis Behaviors to User’s Motion Structure. Qualitative Assessment Using Arm Trajectory Profiles Yukio Horiguchi, Satoshi Tsukamoto, Hiroyuki Ono, Tetsuo Sawaragi and Masahiro Sato A Human Behavior Discrimination Method Based on Motion Trajectory Measurement for Indoor Guiding Services Hajime Asama, Atsushi Morimoto, Kuniaki Kawabata and Yasushi Hada

875

883

891

xvii

Part 16. Mutual Adaptation among man and machines Effects of Shared Communicational Modality to Joint Activity of Human Operator and Robot Autonomy Yukio Horiguchi and Tetsuo Sawaragi Learning of Object Identification by Robots Commanded by Natural Language Chandimal Jayawardena, Keigo Watanabe and Kiyotaka Izumi An f-MRI Study of an EMG Prosthetic Hand Biofeedback System Alejandro Hernández A., Hiroshi Yokoi, Takashi Ohnishi and Tamio Arai Wearable Inertial Sensors for Arm Motion Tracking in Home-Based Rehabilitation Huiyu Zhou, Huosheng Hu and Nigel Harris

903 913 921

930

Learning and Control Model of Arm Posture K.S. Kim, H. Kambara, D. Shin, M. Sato and Y. Koike

938

Competitive Learning Method for Robust EMG-to-Motion Classifier Ryu Kato, Hiroshi Yokoi and Tamio Arai

946

Mutual Adaptation Among Man and Machine by Using f-MRI Analysis Hiroshi Yokoi, Alejandro Hernandez Arieta, Ryu Katoh, Takashi Ohnishi, Wenwei Yu and Tamio Arai

954

Part 17. Women in Robotics, Human Science and Technology Behavior Generation of Humanoid Robots Depending on Mood Kazuko Itoh, Hiroyasu Miwa, Yuko Nukariya, Massimiliano Zecca, Hideaki Takanobu, Stefano Roccella, Maria Chiara Carrozza, Paolo Dario and Atsuo Takanishi Construction of Human-Robot Cooperating System Based on Structure/Motion Model Fumi Seto, Yasuhisa Hirata and Kazuhiro Kosuge

965

973

Motion Assist Devices for Rehabilitation Using Parallel Wire Mechanisms Keiko Homma

981

Analysis of Skill Acquisition Process – A Case Study of Arm Reaching Task – Kahori Kita, Ryu Kato, Hiroshi Yokoi and Tamio Arai

991

Generation of Size-Variable Image Template for Self-Position Estimation Considering Position Shift Kae Doki, Naohiro Isetani, Akihiro Torii and Akiteru Ueda Subjective Age Estimation Using Facial Images – the Effects of Gender, Expressions and Age Groups – Noriko Nagata, Naoyuki Miyamoto, Yumi Jinnouchi and Seiji Inokuchi

999

1007

xviii

Evaluation and Suppression of Overrun of Microorganisms Using Dynamics Model for Microrobotic Application Naoko Ogawa, Hiromasa Oku, Koichi Hashimoto and Masatoshi Ishikawa From Muscle to Brain: Modelling and Control of Functional Materials and Living Systems Mihoko Otake

1015

1025

Using the Sony AIBOs to Increase Diversity in Undergraduate CS Programs Maria Gini, Jan Pearce and Karen Sutherland

1033

Author Index

1041

Papers of Invited Guest Speakers

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

3

Reaction-Diffusion Intelligent Wetware Andrew Adamatzky Unconventional Computing Centre, Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Bristol, United Kingdom [email protected]

Abstract. We give an overview of recent results on implementation of computing, actions, and emotions in spatially extended reaction-diffusion chemical systems [1-3]. We pinpoint all essential ingredients of intelligence found in spatio-temporal dynamics of nonlinear chemical systems, and show outlines of future designs and prototypes of chemical intelligent ‘gooware’.

Reaction-diffusion chemical systems are well known for their unique ability to efficiently solve combinatorial problems with natural parallelism. In spatially distributed chemical processors, the data and the results of the computation are encoded as concentration profiles of the chemical species. The computation per se is performed via the spreading and interaction of wave fronts. The reaction-diffusion computers are parallel because the chemical medium’s micro-volumes update their states simultaneously, and molecules diffuse and react in parallel [1,3]. During the last decades a wide range of experimental prototypes of reaction-diffusion computing devices have been fabricated and applied to solve various problems of computer science, including: image processing, pattern recognition, path planning, robot navigation, computational geometry, logical gates in spatially distributed chemical media, arithmetical and memory units. These important ņ but scattered across many scientific fields ņ results convince us that reaction-diffusion systems can do a lot. Are they are capable enough to be intelligent? Yes, reaction-diffusion systems are smart ņ showing a state of readiness to respond, able to cope with difficult situations, capable for determining something by mathematical and logical methods ņ and endowed with capacity to reason. Sensing and Reacting Reaction-diffusion computers allow for massive parallel input if data. Equivalently, reaction-diffusion robots would need no dedicated sensors, each micro-volume of the medium, each site of the matrix gel, is sensitive to changes in one or another physical characteristic of the environment. Electric field, temperature and illumination are ‘sensed’ by reaction-diffusion devices, and these are three principle parameters in controlling and programming reaction-diffusion robots. Thus, for example, velocity of excitation waves in

4

A. Adamatzky / Reaction-Diffusion Intelligent Wetware

Belousov-Zhabotinsky reactors is increased by a negative and decreased by a positive electric field, waves are splitted in two by high intensity electric field, wave-fronts can be (de)stabilised, spiral waves are generated by alternating fields, morphological diversity of generated patterns can be increased. Increasing temperature is known to drive space-time dynamics of the Belousov-Zhabotinsky reactor from periodic to chaotic oscillations. Light is a key to sophisticated control of excitable (light-sensitive) chemical media: applying light of varying intensity we can tune medium’s excitability, waves’ velocities, pattern formation. Coping with Difficult Tasks Hard computational problems of geometry, image processing and optimisation on graphs are resource-efficiently solved in reaction-diffusion media due to intrinsic natural parallelism of the problems [1,3]. Let us look at three examples. The Voronoi diagram is a subdivision of plane by data planar set. Each point of the data set is represented by a drop of a reagent. The reagent diffuses and produces a colour precipitate when reacting with the substrate. When two or more diffusive fronts of the ‘data’ chemical species meet, no precipitate is produced (due to concentration-dependent inhibition). Thus, uncoloured domains of the computing medium represent bisectors of the Voronoi diagram. A skeleton of a planar shape is computed in the similar manner. A contour of the shape is applied to computing substrate as a disturbance in reagent concentrations. The contour concentration profile induces diffusive waves. A reagent diffusing from the data-contour reacts with the substrate and the precipitate is formed. Precipitate is not produced at the sites of diffusive waves’ collision. The uncoloured domains correspond to the skeleton of the data shape. To compute a collision-free shortest path in a space with obstacles, we can couple two reaction-diffusion media. Obstacles are represented by local disturbances of concentration profiles in one of the media. The disturbances induce circular waves travelling in the medium and approximating a scalar distance-to-obstacle field. This field is mapped onto the second medium, which calculates a tree of ‘many-sources-one-destination’ shortest paths by spreading wave-fronts [3]. Logical reasoning Most known so far experimental prototypes of reaction-diffusion processors exploit interaction of wave fronts in a geometrically constrained chemical medium, i.e. the computation is based on a stationary architecture of medium’s inhomogeneities. Constrained by a stationary wires and gates RD chemical universal processors pose a little computational novelty and none dynamical reconfiguration ability because they simply imitate architectures of conventional silicon computing devices. To appreciate in full massive-parallelism of thin-layer chemical media and to free the chemical processors from limitations of fixed computing architectures we adopt an unconventional paradigm of architecture-less, or collision-based, computing. A non-linear medium processor can be either specialised or general-purpose (universal). A specialized processor is built to solve only one particular problem, possibly

A. Adamatzky / Reaction-Diffusion Intelligent Wetware

5

with different data sets and variations in interpretations of results. Specialised computing devices are quite handy when we deal with image processing, problems of mathematical morphology, or computation on graphs [1]. A device is called computation universal if it computes any logical function. Therefore to prove a medium's universality we must represent quanta of information, routes of information transmission and logical gates, where information quanta are processes, in states of the given system. We could highlight two types of computational universality. They can be generally named architecture-based, or stationary, and architecture-less, or collision-based, types of a universal computation. An architecture-based, or stationary, computation implies that a logical circuit is embedded into the system in such a manner that all elements of the circuit are represented by the system's stationary states. The architecture is static. If there is any kind of 'artificial' or 'natural' compartmentalisation the medium is classified as an architecture-based computing device. Personal computers, living neural networks, cells, and networks of chemical reactors are typical examples of architecture-based computers.

Figure 1. Implementation of simple logical gate in numerical model of sub-excitable Belousov-Zhabotinsky medium. Boolean variables X and Y are represented by wave-fragments travelling west and east, respectively. In this particular example, both variables take value Truth. When wave-fragments collide, they change their trajectories. Thus signals following new trajectories represent logical conjunction X AND Y. If one of the wave-fragments were absent, then another fragment would travel undisturbed and its undisturbed trajectory represent operation NOT X AND Y or X AND NOT Y, respectively.

A collision-based, or dynamical, computation employs mobile compact finite patterns, mobile self-localized excitations or simply localisations, in active non-linear medium (Fig. 1). The localisations travel in space and do computation when they collide with each other. Essentials of collision-based computing are following. Information values (e.g. truth values of logical variables) are given by either absence or presence of the localisations or other parameters of the localisations. The localisations travel in space and do computation when they collide with each other. There are no predetermined stationary wires (a trajectory of the travelling pattern is a momentarily wire). Almost any part of the medium space can be used as a wire. Localisations can collide anywhere within a space sample, there are no fixed positions at which specific operations occur, nor location specified gates with fixed operations. The localisations undergo transformations, form bound states, annihilate or fuse when they interact with other mobile patterns. Information values of localisations are transformed in result of collision and thus a computation is implemented [1,3].

6

A. Adamatzky / Reaction-Diffusion Intelligent Wetware

Chemical Emotions Sensing, computing and reasoning are essential but not sufficient components of intelligence: emotions are the prime ingredient which turns the brew of computing into actions of apprehension, cooperation, creation and synergy. In reaction-diffusion systems, an affective mixture is a theoretical construct which represents emotions spreading and reacting one with another as a massive pool of locally interacting entities. Molecules of the affective mixture correspond to basic emotions ņ happiness, anger, fear, confusion and sadness ņ which diffuse and react with each other by quasi-chemical laws (Fig. 2).

Figure 2. Space-time development of one-dimensional chemical reactor mimicking evolution of emotional pool with happiness, anger and confusion [2].

In computational experiments with affective solutions, in well-stirred and thin-layer reactors, we uncover and classify varieties of behavioural modes and regimes determined by particulars of emotion interactions. We demonstrate that a paradigm of affective solutions offers an unconventional but promising technique for designing, analyzing and interpreting integral and spatio-temporal dynamics of emotional developments in artificial chemical minds [2]. Locomotion and manipulation We have demonstrated that reaction-diffusion chemical systems can solve complex problems, do logical reasoning and express emotions. Now we want the chemical reactors to stop minding their own business but act: move and manipulate. The physico-chemical artefacts are known to be capable for sensible motion. Most famous are Belousov-Zhabotinsky vesicles, self-propulsive chemo-sensitive drops and ciliary arrays (Yoshikawa Lab, Kyoto). Their motion is directional but somewhere lacks sophisticated control mechanisms.

A. Adamatzky / Reaction-Diffusion Intelligent Wetware

7

In computational experiments we established that two reagents are enough to produce a sustained amoeba-like pattern ņ compact domain of wave-like dynamics enclosed in undulating membrane ņ in reaction-diffusion system (Fig. 3). Such types of proto-cell like patterns are sensitive to stimulation of their membranes, which can provoke their movement.

Figure 3. Reaction-diffusion proto-cell emerged in cellular automaton (binary cell state set, eight-cell neighbourhood) model of chemical system. Each cell of the automaton takes two states, 0 (white pixel) and 1 (black pixel). A cell in state 0 (1) takes state 1 (0) if it has from 4 to 7 neighbours in state 1; otherwise the cell does not change its state.

At present stage of reaction-diffusion intelligence research it seems to be difficult to provide effective solutions for experimental prototyping of combined sensing, decisionmaking and actuating. However as a proof-of-concept we can always consider hybrid ‘wetware+hardware’ systems. For example, to fabricate a chemical controller for robot, we place a reactor with Belousov-Zhanotinsky solution onboard of a wheeled robot (Fig. 4), and allow the robot to observer excitation wave dynamics in the reactor. When the medium is stimulated at one point, target waves are formed. The robot becomes aware of the direction towards source of stimulation from the topology of the wave-fronts [3].

Figure 4. The robot controlled by liquid-phase Belousov-Zhabotinsky medium [3].

8

A. Adamatzky / Reaction-Diffusion Intelligent Wetware

A set of remarkable experiments were undertaken by Hiroshi Yokoi and Ben De Lacy Costello. They built interface between robotic hand and Belousov-Zhabotinsky chemical reactor (Fig. 5).

Figure 5. Closed-loop system of robotic hand interacting with Belousov-Zhabotinsky medium [3,4]. Photograph of Prof. Yokoi and Dr. De Lacy Costello experimental setup.

Excitation waves propagating in the reactor were sensed by photo-diodes, which triggered finger motion. When bending fingers touched the chemical medium with their glass nails filled with colloid silver, which triggered circular waves in the medium [3]. Starting from any initial configuration, the chemical-robotic system does always reach a coherent activity mode, where fingers move in regular, somewhat melodic patterns, and few generators of target waves govern dynamics of excitation in the reactor [4].

Figure 6. Transporting rectangular object in Belousov-Zhabotinsky chemical medium [5].

A. Adamatzky / Reaction-Diffusion Intelligent Wetware

9

Massively parallel smart manipulation is yet another hot topic of reaction-diffusion intelligence. This was demonstrated in our recent experiments on manipulating simple objects in a simulated massive-parallel array of actuators controlled by experimental excitation dynamics in a Belousov-Zhabotinsky chemical medium [5]. We assumed that manipulated objects should be pushed by travelling excitation wave-fronts. Therefore we encoded snapshots of gel-based Belousov-Zhabotinky media and converted them into arrays of local force vectors in such a manner that every pixel of the medium’s snapshot got its own force vector, the orientation of which was determined by the gradient of concentrations of chemical species in the medium in the neighbourhood of the pixel. We have shown that various types of excitation dynamics ņ target waves, spiral waves, wave-fragments ņ are capable of transportation and manipulation of small objects and larger, spatially extended objects, planar shapes (Fig. 6). Conclusion In this sketchy paper we provided basic paradigms and facts which will lead us to design and fabrication, and mass-production, of shapeless and sluggish but highly intelligent creatures. These artificial reaction-diffusion wet-bots will act as computers, transporters, builders, and, ultimately, parts of our body. The wet-bots will fill our life with care, understanding, gentleness and love (Fig. 7).

Figure 7. The slogan on the first poster of “The Blob” (1956) reflects our vision of future reaction-diffusion intelligence.

10

A. Adamatzky / Reaction-Diffusion Intelligent Wetware

When things can go wrong? The things could not possibly go wrong unless we step with both feet in the trap of conventional architectures! The field of reaction-diffusion computing started twenty years ago [6] as a sub-field of physics and chemistry dealing with image processing operations in uniform thin-layer excitable chemical media. The basic idea was to apply input data as two-dimensional profile of heterogeneous illumination, then allow excitation waves spread and interact with each, and then optically record result of the computation. The first even reaction-diffusion computers were already massively parallel, with parallel optical input and outputs. Later computer scientists came to the field, and started to exploit traditional techniques ņ wires were implemented by channels where wave-pulses travel, and specifically shaped junctions acted as logical valves. In this manner, most ‘famous’ chemical computing devices were implemented, including Boolean gates, coincidence detectors, memory units and more. The upmost idea of reaction-diffusion computation was if not ruined then forced into cul-de-sac of non-classical computation. The breakthrough happened when paradigms and solutions from the field of dynamical, collision-based computing and conservative logic, were mapped onto realms of spatiallyextended chemical systems. The computers became uniform and homogeneous again.

Refererence [1] Adamatzky A. Computing in Nonlinear Media and Automata Collectives (IoP Publishing, 2001). [2] Adamatzky A. Dynamics of Crowd-Minds (World Scientific, 2005). [3] Adamatzky A., De Lacy Costello B., Asai T. Reaction-Diffusion Computers (Elsevier, 2005). [4] Yokoi H., Adamatzky A., De Lacy Costello B., Melhuish C. Excitable chemical medium controlled for a robotic hand: closed loop experiments, Int. J. Bifurcation and Chaos (2004). [5] Adamatzky A., De Lacy Costello B., Skachek S., Melhuish C. Manipulating objects with chemical waves: Open loop case of experimental Belousiv-Zhabotinsky medium. Phys. Lett. A (2006). [6] Kuhnert L. A new photochemical memory device in a light sensitive active medium. Nature 319 (1986) 393.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

11

Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality Hod Lipson, Josh Bongard, Victor Zykov, Evan Malone Computational Synthesis Lab Cornell University, Ithaca NY 14853, USA

Abstract. This talk will outline challenges and opportunities in translating evolutionary learning of autonomous robotics from simulation to reality. It covers evolution and adaptation of both morphology and control, hybrid co-evolution of reality and simulation, handling noise and uncertainty, and morphological adaptation in hardware.

Keywords. Evolutionary robotics, co-evolutionary learning, estimation-exploration, rapid prototyping.

Introduction The idea that machine learning processes inspired by biological evolution can be used to design autonomous machines, has its roots in the early days of evolutionary computation and has been implemented numerous times, starting with the seminal work of Sims [9]. Nevertheless, the transition of evolutionary robotics from simulation to reality has been met with many challenges, as is evident from the relatively few examples of successful implementations of these methods in physical reality. Though many robotic experiments are carried out in simulation, a robot must ultimately function in physical reality. Consider the problem of evolving controllers for a dynamical, legged robot, shown in Figure 1 [13]. The nine-legged machine composed of two Stewart platforms back to back. The platforms are powered by twelve pneumatic linear actuators, with power coming from an onboard 4500psi paintball canister. While most robotic systems are use position-controlled actuators whose exact extension can be set, pneumatic actuators of the kind used here are force-controlled. Like biological muscle, the controller can specify the force and duration of the actuation, but not the position. It is therefore a challenging control problem. The controller architecture for this machine was an open-loop pattern generator that determines when to open and close pneumatic valves. The on-off pattern was evolved; Candidate controllers were evaluated by trying them out on the robot in a cage, and measuring fitness using a camera that tracks the red ball on the foot of one of the legs of the machine (see inset in Figure 1b for a view from the camera). Snapshots from one of the best evolved gates are

12

H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality

shown in Figure 1c. Nolfi and Floreano [8] describe many other interesting hardware experiments evolving controllers for wheeled robots

(a)

(b)

(c) Figure 1: Evolving a controller for physical dynamic legged machine: (a) The nine-legged machine is powered by twelve pneumatic linear actuators arranged in two Stewart platforms. The controller for this machine is an open-loop pattern generator that determines when to open and close pneumatic valves. (b) Candidate controllers are evaluated by trying them out on the robot in a cage, and measuring fitness using a camera that tracks the red foot (see inset). (c) Snapshots from one of the best evolved gates. From [13].

H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality

13

While evolution successfully generated viable gaits in this case, applying evolutionary processes to physical machines is difficult for two reasons. First, even if we are only evolving controllers for a machine with a fixed morphology, each evaluation of a candidate controller involves trying it out in reality. This is a slow and costly process that also wears out the target system. Performing thousand of evaluations is usually impractical. Second, if we are evolving morphology as well, then how would these morphological changes take place in reality? Changes to the controller can be done simply by reprogramming, but changes to the morphology require more sophisticated processes. Nature has some interesting solutions to this problem, such as growing materials, or self-assembling and self-replicating basic building blocks like cells.

1. Evolving controllers for physical morphologies One approach to evolving controllers for fixed morphologies is to make a simulator with such fidelity that whatever works in simulation will also work in reality equally well. This is possible only for some types of locomotion, such as quasi-static kinematics that can be accurately predicted [6][4]. Figure 2a shows some of the machines that evolved for quasistatic locomotion in simulation; these machines were “copied” from simulation into reality using rapid-prototyping technology (Figure 2b) where they functioned in a way similar to their simulation. Unfortunately, however, it is unlikely that a similarly predictive dynamic simulator would exist, given that machine dynamics are inherently chaotic and sensitive to initial conditions and many small parameter variations. But even if such simulators existed, creating accurate models would be painstakingly difficult, or may be impossible because the target environment is unknown.

Figure 2: Evolving bodies and brains: (a) Three evolved robots, in simulation (b) the three robots reproduced in physical reality using rapid prototyping. From [6].

14

H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality

An alternative approach to “crossing the reality gap” is to use a crude simulator that captures the salient features of the search space. Techniques have been developed for creating such simulators and using noise to cover uncertainties so that the evolved controllers do not exploit these uncertainties [5]. Yet another approach is to use plasticity in the controller: Allow the robot to learn and adapt in reality. In nature, animals are born with mostly predetermined bodies and brains, but these have some ability to learn and make final adaptations to whatever actual conditions may arise. A third approach is to co-evolve simulators so that they are increasingly predictive. Just as we use evolution to design a controller, we can use evolution to design the simulator so that it captures the important properties of the target environment. Assume we have a rough simulator of the target morphology, and we use it to evolve controllers in simulation. We then take the best controller and try it – once – on the target system. If successful, we are done; but if the controller did not produce the anticipated result (as is likely to happen since the initial simulator was crude), then we observed some unexpected sensory data. We then evolve a new set of simulators, whose fitness is their ability to reproduce the actual observed behavior when the original controller is tested on them. Simulators that correctly reproduce the observed data are more likely to be predictive in the future. We then take the best simulator, and use to evolve a new controller, and the cycle repeats: If the controller works in reality, we are done. If it does not work as expected, we now have more data to evolve better simulators, and so forth. The co-evolution of controllers and simulators is not necessarily computationally efficient, but it dramatically reduces the number of trials necessary on the target system. The co-evolutionary process consists of two phases: Evolving the controller (or whatever we are trying to modify on the target system) – we call this the exploration phase. The second phase tries to create a simulator, or model of the system – we call this the estimation phase. To illustrate the estimation-exploration process, consider a target robot with some unknown, but critical, morphological parameters, such as mass distribution and sensory lag times. Fifty independent runs of the algorithm were conducted against the target robot. Figure 3a shows the 50 series of 20 best simulator modifications output after each pass through the estimation phase. Figure 3a makes clear that for all 50 runs, the algorithm was better able to infer the time lags of the eight sensors than the mass increases of the nine body parts. This is not surprising in that the sensors themselves provide feedback about the robot. In other words, the algorithm automatically, and after only a few target trials, deduces the correct time lags of the target robot's sensors, but is less successful at indirectly inferring the masses of the body parts using the sensor data. Convergence toward the correct mass distribution can also be observed. But even with an approximate description of the robot's mass distribution, the simulator is improved enough to allow smooth transfer of controllers from simulation to the target robot. Using the default, approximate simulation, there is a complete failure of transferal: the target robot simply moves randomly, and achieves no appreciable forward locomotion. It is interesting to note that the evolved simulators are not perfect; they capture well only those aspects of the world that are important for accomplishing the task. The exploration-estimation approach can be used for much more than transferring controllers to robots – it could be used by the robot itself to estimate its own structure. This

H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality

15

would be particularly useful if the robot may undergo some damage that changes some of its morphology in unexpected ways, or some aspect in its environment changes. As each controller action is taken, the actual sensory data is compared to that predicted by the simulator, and new internal simulators are evolved to be more predictive. These new simulators are then used to try out new, adapted controllers for the new and unexpected circumstances. Figure 3b shows some results applying this process to design controllers for a robot which undergoes various types of drastic morphological damage, like loosing a leg, motor, or sensor, or combinations of these. In most cases, the estimation-exploration process is able to reconstruct a new simulator that captures the actual damage using only 4-5 trials on the target robot, and then use the adapted simulator to evolve compensatory controllers that recover most of the original functionality. There are numerous applications to this identification and control process in other fields.

(a)

(c)

(b) Figure 3: Co-evolving robots and simulators: (a) Convergence toward the physical characteristics of the target robot. Each pass through the estimation phase produces a set of mass changes for each of the nine body parts of the robot (top row) and a set of time lags for each of the eight sensors (bottom row). The open circles indicate the actual differences between the target robot and the starting default simulated robot [1]. (b) Three typical damage recoveries. a: The evolutionary progress of the four sequential runs of the exploration EA on the quadrupedal robot, when it undergoes a failure of one of its touch sensors. The hypotheses generated by the three runs of the estimation EA (all of which are correct) are shown. The dots indicate the fitness of the best controller from each generation of the exploration EA. The triangle shows the fitness of the first evolved controller on the target robot (the behavior of the ‘physical’ robot with this controller is shown in b); the filled circle shows the fitness of the robot after the damage occurs (the behavior is shown in c); the squares indicate the fitness of the ‘physical’ robot for each of the three subsequent hardware trials (the behavior of the ‘physical’ robot during the third trial is shown in d). e-h The recovery of the quadrupedal robot when it experiences unanticipated damage. i-l The recovery of the hexapedal robot when it experiences severe, compound damage. The trajectories in b-d, f-h and j-l show the change in the

16

H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality

robot’s center of mass over time (the trajectories are displaced upwards for clarity) [2]. (c) The simulator progressively learns the entire robot morphology from scratch. Panels (a-g) are progressive intermediate selfinference stages, panel (h) is the true target system [3].

2. Making morphological changes in hardware An evolutionary process may require a change of morphology, or production of a new physical morphology altogether. One approach for generating new morphology is to use reconfigurable robots [12]. Reconfigurable robots are composed of many modules that can be connected, disconnected and rearranged in various topologies to create machines with variable body plans. Self-reconfigurable robots are able to rearrange their own morphology, and thus adapt in physical reality. Figure 4a shows one example of a self-reconfiguring robot composed of eight identical cubes [14]. Each cube can swivel around its (1,1,1) axis, and connect and disconnect to other cubes using electromagnets on its faces. Though this robot contains only 8 units, it is conceivable that future machine will be composed of hundreds and thousands of modules of smaller modules, allowing much greater control and flexibility in morphological change.

Figure 4: Transferring morphological changes into reality (a) Reconfigurable molecube robots [14], (b) Rapid prototyping, (c) Future rapid prototyping systems will allow deposition of multiple integrated materials, such as elastomers, conductive wires, batteries and actuators, offering evolution a larger design space of integrated structures, actuators and sensors, not unlike biological tissue. From [7].

H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality

17

An alternative approach to varying morphology is to produce the entire robot morphology automatically. For example, the robots shown in Figure 2b were produced using rapid prototyping equipment: These are 3D printers, that deposit material layer by layer to gradually build up a solid object of arbitrary geometry, as shown in Figure 4b. This “printer”, when coupled to an evolutionary design process, can produce complex geometries that are difficult to produce any other way, and thus allow the evolutionary search much greater design flexibility. But even when using such automated fabrication equipment we needed to manually insert the wires, logic, batteries and actuators. What if the printer could print these components too? Future rapid prototyping systems may allow deposition of multiple integrated materials, such as elastomers, conductive wires, batteries and actuators, offering evolution an even larger design space of integrated structures, actuators and sensors, not unlike biological tissue. Figure 4c shows some of these printed components [7].

Figure 5: Macro-scale physical models of stochastic self-assembly. (a) Stochastic self-assembly and self reconfiguration of 10-cm scale modules on an oscillating air table: Top: units with electromagnets; Bottom: Units with swiveling permanent magnets [10]. (b) Three dimensional stochastic self assembly and reconfiguration of 10cm cubes in oil [11].

Looking at biology, one would ultimately like to emulate ‘growing structures’ – structures that can actively move material from one place to another, adapting to needs in situ. As we move to smaller and smaller scales, however, deterministically moving material becomes increasingly difficult. An interesting alternative is to exploit the random ‘Brownian’ motion of the particles in the environment to assist in stochastic self assembly. Figure 5 shows some macro-scale prototypes of such stochastically reconfiguring systems, both in 2D and in 3D. Implementation of such systems at the micro scale, using many thousands of units, entails many physical as well as computational challenges, involving local actuation, sensing, and control.

18

H. Lipson et al. / Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality

3. Conclusions The transition of evolutionary robotics methods from simulation to reality has met several hurdles: Besides the scaling limits of evolutionary computation itself, we are confronted with the limits of simulation and modeling, the cost, time and risk of training machines in reality, and the technical challenge of adapting morphology in reality. Again we have resorted to inspiration from biology: Better ways to design adaptive simulations, better ways to determine the most useful physical evaluation, and new ways to adapt physical morphology through automatic reconfiguration and material growth, all leading to new ideas for engineering and new engineering insights into biology.

4. References [1]

Bongard J. C., Lipson H., (2004) “Once More Unto the Breach: Automated Tuning of Robot Simulation using an Inverse Evolutionary Algorithm”, Proceedings of the Ninth Int. Conference on Artificial Life (ALIFE IX), pp.57-62

[2]

Bongard J., Lipson H. (2004), “Automated Damage Diagnosis and Recovery for Remote Robotics”, IEEE International Conference on Robotics and Automation (ICRA04), pp. 3545-3550

[3]

Bongard, J. and Lipson, H. (2004) “Integrated Design, Deployment and Inference for Robot Ecologies”, Proceedings of Robosphere 2004, November 2004, NASA Ames Research Center, CA USA

[4]

Hornby G.S., Lipson H., Pollack. J.B., 2003 “Generative Encodings for the Automated Design of Modular Physical Robots”, IEEE Transactions on Robotics and Automation, Vol. 19 No. 4, pp 703-719

[5]

Jakobi, N. (1997). Evolutionary robotics and the radical envelope of noise hypothesis. Adaptive Behavior, 6(1):131–174.

[6]

Lipson H., Pollack J. B. (2000) Automatic design and manufacture of artificial lifeforms. Nature, 406:974–978

[7]

Malone E., Lipson H., (2004) “Functional Freeform Fabrication for Physical Artificial Life”, Ninth Int. Conference on Artificial Life (ALIFE IX), Proceedings of the Ninth Int. Conference on Artificial Life (ALIFE IX), pp.100-105

[8]

Nolfi S., Floreano D. (2004), Evolutionary Robotics: The Biology, Intelligence, and Technology of SelfOrganizing Machines, Bradford Books

[9]

Sims K. “Evolving 3D morphology and behaviour by competition”. Artificial Life IV, pages 28–39, 1994.

[10] White P. J., Kopanski K., Lipson H. (2004), “Stochastic Self-Reconfigurable Cellular Robotics”, IEEE International Conference on Robotics and Automation (ICRA04), pp. 2888-2893 [11] White P., Zykov V., Bongard J., Lipson H. (2005) Three Dimensional Stochastic Reconfiguration of Modular Robots, Proceedings of Robotics Science and Systems, MIT, Cambridge MA, June 8-10, 2005 [12] Yim, M., Zhang, Y. and Duff, D., "Modular Reconfigurable Robots, Machines that shift their shape to suit the task at hand," IEEE Spectrum Magazine cover article, Feb. 2002 [13] Zykov V., Bongard J., Lipson H., (2004) "Evolving Dynamic Gaits on a Physical Robot", Proceedings of Genetic and Evolutionary Computation Conference, Late Breaking Paper, GECCO'04 [14] Zykov V., Mytilinaios E., Adams B., Lipson H. (2005) "Self-reproducing machines", Nature Vol. 435 No. 7038, pp. 163-164

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

19

Real World Informatics Environment System Tomomasa SATO1 Department of Mechano-Informatics Graduate School of Information Science and Technology The University of Tokyo

Abstract. This paper proposes a real world informatics environment system realized by multiple human behavior support cockpits (HBSC’s). The human behavior support cockpit (HBSC’s) is synthesized by such supportive environment as illumination environment (physiological support), object access environment (physical support) and background music environment (psychological support). These HBSC’s are implemented by cooperating the real world informatics environment system components of humanoid robots, audio/visual agents and ubiquitous appliances. In the paper, the author describes images of real world informatics environment system and presents research results of its constituent elements by dividing them into the following research groups; a humanoid robot (HR), VR system (VR), attentive environment system (AE), neo-cybernetic system (NC), and human informatics (HI) research group. Keywords. Human Robot Symbiosis System, Human Robot Interaction, Human Support

1.

Introduction

Advanced robot in the future should be able to offer such services of a servant, a secretary, a friend and a pet simultaneously and continuously to human. To this end, a lot of research efforts have been devoted on humanoid robots, communication robots, mobile robots and manipulation robots. These efforts can be classified as an individual robot approach. An environmental robot is another research approach to this end. In the environmental robot system, multiple robotic elements are distributed over space to offers services to human. Thanks to the progress of micro devices and network technology, the environmental robot systems have been widely under study such as Smart Room [1], Intelligent Room [2], Easy Living [3], Self [4], Aware House [5], neural network House [6], and Intelligent Space [7]. While research approach of individual robot or environmental robot attracts many researchers independently, these two approaches are not exclusive but complimentary to 1 Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656 Japan. Tel: +81 3 5841 6441; Fax: +81 3 5802 5379; Email: [email protected].

20

T. Sato / Real World Informatics Environment System

each other, i.e. the integrated system of both individual robot and environmental robot will realize better robotic performance than each independent robot. Taking account of the stated consideration, researchers of the 21st century COE (center of excellence) program of the University of Tokyo started a new project of a real world information system from 2002 supported by the Ministry of Education, Culture, Sports, Science and Technology. The real world information system project is intended to develop a real world informatics environment system. The system is a human symbiotic environment system where such intelligent entities as agents in the information world, avatars in the virtual reality world and humanoid robots or ubiquitous appliances with sensors and actuators in the real world are seamlessly linked with each other and are interacting with human to offer support. The social significance of constructing such real world informatics environment system is that it can demonstrate and show a prototype of the forthcoming ubiquitous society in a concrete manner. Scientific significance of the project is that the project reveals a model of human recognition and behavior by analyzing the relationship between the real world informatics system and the human inside it. In this paper, the author first of all makes clear basic concept of the real world informatics environment system and proposes to realize the system by combining multiple human behavior support cockpits (HBSC’s). The paper then touches upon several new functions of the human behavior support cockpits (HBSC’s) from the viewpoint of human behavior support environment by introducing research groups of the project. Lastly, the paper also reports realized new interactions between a human and a system in terms of the human behavior support cockpit taking account of the real world informatics related researches.

Higher-order communication with a remote place Assistance in reading or operations

Dialogue with an information agent Work assistance by AWB

The point of integration Show Room = Future Living Room x Natural x Always available

Life support by a humanoid Promotion of work efficiency and activation of communication

x Nonbinding x Coexisting with humans

Figure 1: Image of a real-world information system environment

T. Sato / Real World Informatics Environment System

2.

21

Project Target

2.1 Image of Real World Informatics Environment Figure 1 illustrates a conceptual sketch of the real world informatics environment system proposed by the members of 21st century COE project in 2002. In the environment, an information agent recognizes human daily movements continuously, a Virtual Reality (VR) system capable of natural dialogue interacts with the human, a humanoid robot handling complicated tasks comes close to the human to assist, and a ubiquitous appliance (a future information home electronics item) offers a helping hand to the human. As a result of discussion in the demonstration committee of the project, we obtained the stated research target image of the real world informatics environment system. This image sketch illustrates daily life in the future with multiple people, i.e. the daily life in the ubiquitous society in the future. Major characteristics of such an environment system are that it is a natural environment and is a comfortable place for a human to live, various system elements work all the time providing services while excessive physical and psychological constraints on a human are avoided as much as possible, and it is a humansystem symbiotic environment where the system adjusts itself to a human. 2.2 Human Behavior Support Cockpit In this section, the author proposes to realize the world informatics environment system by multiple human behavior support cockpits (HBSC’) which supports a human to perform that specific behavior by providing necessary and/or satisfactory environment. The human behavior support cockpit (HBSC) is synthesized by cooperating such elements as humanoid robots, information agents, VR systems and ubiquitous appliances depending upon human behavior. Here let us explain the human behavior cockpit by taking concrete example of reading newspaper behavior. When we want to read a newspaper, we have to have a newspaper at hand and we need illumination if it is dark in a room. These are the environments we need to read a newspaper. In this case, we need both illuminated environment and newspaper access environment. By synthesizing these two necessary environments, the newspaper reading cockpit is realized. This is the typical example of human behavior support cockpit by synthesizing multiple human support environments of illumination and object access. In the real world informatics environment system, the human behavior support cockpit is synthesized by the combination of not only a newspaper access environment realized by a humanoids robot bringing it near a human but also a newspaper readable illumination environment realized by a robotic lamp which is a component of the ubiquitous appliances of the room as described later.

22

T. Sato / Real World Informatics Environment System

2.3 Main Features of Human Behavior Support Cockpit 2.3.1 Elementary Human Behavior Support Environments to Realize Cockpit The human behavior support cockpit (HBSC) is synthesized by combining the following four categories of human behavior support environments by considering human property. These are the support environment of a) physiological support environment, b) physical support environment, c) informational support environment and d) psychological support environment. An illuminated space is an example of a physiologically supported environment. Physical support environment enables the human physically supported situation, for example a humanoid robot is able to realize a newspaper accessible physical environment by bringing the newspaper close to the human. An active projector easily realizes informational support environment by projecting necessary information required to perform particular behavior. Lastly the robot system should realize psychologically comfortable environment by providing BGM (back ground music) if it is adequate time for that. Thus by combining the stated human behavior support environment, the human behavior support cockpit (HBSC) is synthesized. A specific human behavior support cockpit (HBSC) is required depending upon the individual human behavior. 2.3.2 Factors of Human Behavior Support Cockpit The most important factors of a specific human behavior support cockpit (HBSC) are related to a) who, b) where, c) when, d) what, e) how and, f) why. When just one person is living in a room alone, then who to support is apparent, however in the case of multiple person, the system has to decide to whom the cockpit give support. The place or the space where the cockpit should be generated and when the support should be offered must be carefully examined, otherwise the support becomes useless or troublesome. The content of the support can be classified as explained in the previous section, i.e. a) physiological support, b) physical support, c) informational support and d) psychological support. In many cases, multiple superposed supports are needed in our daily life. In addition to this, even if the content of the support is the same, the cockpit has to select the way or the media of the support carefully. For example, the system has to select whether it uses voice channel or visual channel in case of informational support. The system always has to pay keen attention to a human so that it can provides adequate support to the human constantly, it means that the system has to have the capability to explain why some specific cockpit is offered to the human. 2.4 Realization of Human Behavior Support Cockpit The size of the human behavior support cockpit (HBSC) can be categorized by either a) a device level, b) an appliance level, c) a furniture level, c) an individual robot level, and d) a room level or combination of them depending upon the size of the cockpit. In addition to construction and sophistication of such individual components of the human behavior support cockpit realized by either devices, furniture, robots and/or room

T. Sato / Real World Informatics Environment System

23

itself, the combination of them have also to be investigated for better performance of support environment. 2.5 Research Group of Project Before realizing the final real world informatics environment system, the following subenvironment systems have been realized so far, i.e., a) a humanoid and robot environment system, b) a VR environment system, c) an ubiquitous environment system, and d) an audio/visual agent environment system. To realize the stated sub-systems, the following five research groups are formed in the project; x Human Robotics Research Group (HR), x VR System Research Group (VR), x Neo-Cybernetics Research Group (NC), x Attentive Environment Research Group (AE), x Human Informatics Research Group (HI). The first four groups correspond to the stated a) to d) sub-environment and the last research group of Human Informatics is formed to provide the project with basic scientific base of human. So far, discussions on a real world informatics environment system and its constituent elemental environment system were deepened. Details of these images will be presented in the following chapters along with individual research results.

3.

Human Robotics Research Group (HR)

Within the real world information system project that is intended to construct an intelligent environment interacting with a human in a new way, the Human Robotics Research Group (HR) will conduct researches focusing on innovative robots physically assisting a human such as a humanoid capable of equally interacting with a human in terms of physical handling of object and communication, wearable a ubiquitous supporting device that provides necessary information accompanying a human, and a room environment integrated robot casually watching over and giving helping hand to a human. Through collaboration with the other Human Informatics Research Group, NeoCybernetics Research Group and VR System Research Group within the real world information system project, we attempt to implement an intelligent system environment

Figure 2: Image of human robotics environment

24

T. Sato / Real World Informatics Environment System

interacting with a human by way of human behavior cockpit. Figure 2 illustrates the image of a human robot environment system. This image sketch shows a place where people get together, and a robot answers a human’s call correctly (communication environment), attends a human maintaining a comfortable distance (communication environment), exchanges information (communication environment) and goods (object access environment) with a human politely. They are related to provide assistance by knowing a person’s daily living behaviors. As a perceptual function of a cockpit, knowing of the existence of a human, responding to a human motion, and observing the direction of a sight line are basic and indispensable; however, there still remain some unsolved problems. To realize such a cockpit, this research group allotted and deepens research and development of an environment robot with a sensing floor and robotic lamp device (described later in section 7) collaborating with each other, a mobile and a humanoid robot.

4. VR System Research Group (VR) This research group promotes research and development of augmented reality (real space with information and images) technology and tele-existence technology (in which an operator controls a remote robot as if it A actually is on board.) to display a “cyber world essentially equivalent to real world” within daily space where a human actually lives rather than in a special room. This provides the human behavior cockpit with powerful tool to realize displaying environment Figure 3 is an image sketch of the VR system environment. The upper right and lower left of the figure are remote space— B here, referred to as Space A (upper right) and Space B (lower left). An entirecircumference cameras we installed around. Cameras in Space B are used to capture Figure 3: Image of a VR system environment a 3-D entire-circumference image of Person, and the image is presented by an entire-circumference 3-D image generator in Space A to bring about the effect of making a viewer in Space A feel as if the Person existing in Space B existed in Space A (communication environment). Bidirectional tele-existence so that Person B also feels as if he or she were in Space A is to be implemented. Moreover, construction of a system to realize high-level integration of the real world and information world using a self-propelled projector and projector capable of free adjustment of a projection direction is utilized.

T. Sato / Real World Informatics Environment System

25

5. Neo-Cybernetic Research Group (NC) The purpose of this research group is to provide a machine with functions and abilities such as vision, hearing, and touch senses, targeting a wide range of research subjects from intelligent high-performance hearing, vision and touch sensors to a high-speed robot or a voice interaction personalized agent that behaves like a human. Figure 4 shows an image sketch of the neo-cybernetics system environment. The plot currently under our consideration could be a system guiding through a museum demonstration site. A large self-propelled display with a voice interaction personalized agent has a dialogue with a user through speech recognition and synthesis (communication environment). A guest is to be led or followed by this equipment or by acoustic processing, directional hearing and high-speed vision functions, where a selection of information Figure 4: Image of neo-cybernetics environment media becomes important research issue (information display environment). Stated communication and information displaying environment are synthesized to realize support cockpit of museum walk around behavior.

6. Attentive Environment Research Group (AE) The group mainly studies an environment which we call the Attentive Environment (AE) that “offers a helping hand to an operator” as a new intelligent machinery system for assisting operations in cell production, under the theme of “a new interaction between a human and machine” in a production environment. Furthermore, as a specific development objective, a system that flexibly handles major parts during assembly was reviewed, and an Attentive Workbench (AWB) system that integrates a motion tray (object access environment), information display (information display environment), vital sign monitor (physiological support environment) and parts tag (computer readable information environment) have been under development. Figure 5 shows an image of an attentive environment system. This is an environment Figure 5: Image of an attentive environment system where a worker is doing assembly system environment

26

T. Sato / Real World Informatics Environment System

work, i.e. the figure illustrates the assembly behavior support cockpit in the configuration of a room. The system recognizes the intention of the worker from his motions, and tries to “offer a helping hand”—presenting information or moving parts which corresponds to information environment and physical object access environment. For example, the system may display necessary information on the desktop, bring necessary tools and parts to an operator’s hand as he extends his arm, or transport assembled parts to the next procedure to provide object access environment to the worker. Motions recognition shall be done without discomforting the operator. A visual system installed at the top of the system is used to recognize especially motions of the operator’s arm or finger and the direction of a sight line (head). Information presentation is done by presenting an icon, figure or document on the desktop by a projector, using a method of extended reality to provide object access environment. Furthermore, the operator’s fatigue degree is also recognized by letting the operator wear a monitor device for vital signs such as the heart rate, and work speed and the degree of assistance shall be varied accordingly to ensure suitable physiological environment.

7. Human Informatics Research Group (HI) The Human Informatics Research Group develops research for modeling human perception, behavior and recognition functions based on measurement and analysis to form human scientific basis of this project; establishing mathematical scientific theories of real world information processing; and constructing basic methods for new forms of recognition/behavior, communication, and coordination/assistance functions. This environment is called a showroom, intending not only to demonstrate interactive system technology truly compatible with a human but also to do analysis and modeling study of a human (i.e., mathematical science of real-world information system). A measurement interface for a robot experiment and interface construction, and international multipoint remote lectures for result transmission and educational feedback will also be tried in the future.

8. Example of Human Behavior Support Cockpits In this chapter, an example of a human behavior support cockpits (HBSC’s) will be presented. These cockpits are composed of an illumination environment to a adapt to several human behaviors such as entering the room, sitting on the chair, sitting on the cushion, being on bed and getting up from bed, i.e. a entering the room cockpit, a sitting on the chair cockpit, a sitting on the cushion cockpit, a being on bed cockpit and a getting up from bed cockpit respectively realized by the cooperation of a sensing floor and a robotic lamp to generate illumination support environment [8].

T. Sato / Real World Informatics Environment System

27

8.1 Sensing Floor A sensing floor is a high resolution pressure sensor distributed floor to detect human position and objects such as a chair and a cushion. The sensing floor is composed of 16 units (Fig.6). Each unit has 16*16=4096 on/off sensors distributed in a square of 500mm width. Therefore the sensor pitch is 7mm. The output of the sensors is digital one bit. The sampling frequency is 5 Hz. This sensing floor is able to detect the user's a) moving on the bed b) sitting on the chair c) sitting on the cushion d) being on bed.

Figure 7: Entrance Illumination

Figure 8: Desk Illumination

Figure 9: Table Illumination

Figure 10: Bed Illumination

Figure 6: Sensing Floor

8.2 Robotic Lamp The Robotic lamp is a robotic 5 DOF desk light with the following functional capabilities: a) brightness control, b) posture control and c) back drivability to detect human physical contact. 8.3 Human Behavior Support Illumination Environment The following figures show how the robotic lamp illuminated the entrance as the user entered the room (Fig.7) and illuminated the desk as the user sat on the chair (Fig.8). Robotic lamp illuminated the table as the user sat on the cushion near the table (Fig.9) and illuminated the bed after the user moved on the bed (Fig.10) which corresponds to the entering the room behavior support cockpit, the sitting on the chair behavior support cockpit,

28

T. Sato / Real World Informatics Environment System

the sitting on the cushion behavior support cockpit, the being on bed behavior support cockpit and the getting up from bed behavior support cockpit respectively. These figures presents the robotic lamp system is able to provide adequate illumination environment depending upon user’s behavior.

9. Conclusion Bearing realization of a new human-system interaction in mind, we have been promoting the project to realize a real world informatics environment system that is implemented by multiple human behavior support cockpits. This paper mentioned the research objectives of the real world information system project of the University of Tokyo and proposes human behavior support cockpit realized by synthesizing the behavior support environment of a physiological support environment (e.g. an illumination environment), a physical support environment (e.g. object access environment) and psychological support environment (in some case realized by a communication environment). The project is composed of research group such as a Human Robot Research Group (HI), a VR Research Group (VR), a Neocybernetics Research Group (NC), a Attentive Work Bench Research Group (AE) and Human Informatics Research Group (HI). Research target of each research group and examples of human behavior support cockpit are also explained. Our future tasks are promotion and integration of researches in sophistication of many system elements including a humanoid, VR system, information agent and ubiquitous appliance, and pursuit of the relation between those information systems and a human to realize human behavior environment. By synthesizing these environment, proposed human behavior support cockpit would be realized as the new function of the real world informatics environment system.

Acknowledgement The author would like to express great thanks to Professor Shigeki Sagayama, Professor Masayuki Inaba, Professor Yasuo Kuniyoshi, Professor Susumu Tachi, Professor Hiromasa Suzuki and Professor Kiyoshi Takamasu who are the leaders of the research group of the Neo-Cybernetics Environment, Human Robotics Environment, Human Informatics Environment, VR System Environment, Attentive Environment, and Attentive Environment of this project respectively. Great thank also to Dr. Hiroshi Morishita, Prof. Mihoko Otake, Dr. Shinji Sako, Dr. Kiyoshi Kotani, Dr. Masao Sugi for their keen discussion as specially appointed researchers and Mr. Yosuke Kita for nice picture of conceptual sketches of environments. This project is the part of the 21st century COE programs of and supported by the Ministry of Education, Culture, Sports, Science and Technology.

T. Sato / Real World Informatics Environment System

29

References [1] Alex Pentland, “Smart Rooms”, Scientific American, pp.54-62, April, 1996. [2] Mark C. Torrance, “Advances in human computer interaction: The intelligent room,” In Working Notes of CHI 95 Research Symposium, Denver, Colorado, May 6-7, 1995. [3] Barry Brumitt et.al., “EasyLiving: Technologies for Intelligent Environments,” Proceedings of International Symposium on Handheld and Ubiquitous Computing, 2000. [4] Y. Nishida, T. Hori, T. Suehiro, and S. Hirai, “Sensorized environment for selfcommunication based on observation of daily human behavior,” Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems pp.1364-1372, 2000. [5] I. A. Essa, “Ubiquitous sensing for smart and aware environments: technologies towards the building of an aware home,” Position Paper for the DARPA/NSF/NIST workshop on Smart Environment, July 1999. [6] Michael C. Mozer, “The neural network house: An environment that adapts to its inhabitants”. In Proceedings of the AAAI, 1998. [7] Joo-Ho Lee, Noriaki Ando, Hideki Hashimoto, “Design Policy for Intelligent Space”, IEEE Systems, Man, and Cybernetics Conference, 1999. [8] Tomomasa Sato, Mehrab Hosseinbor, Takeru Kuroiwa, Rui Fukui, Jun Tamura, and Taketoshi Mori: “Behavior and Personal Adaptation of Robotic Lamp,” Journal of Robotics and Mechatronics, Vol.17, No.1, 2005 pp69-76, 2005. [*] http://www.u-tokyo.ac.jp/coe/list06_e.html

30

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Understanding and realization of constrained motion – human motion analysis and robotic learning approaches a,b

a

Shigeyuki Hosoe, Yuichi Kobayashi and Mikhail Svinin a RIKEN Bio-mimetic Control Research Center b Nagoya University Nagoya, Japan [email protected], {koba,svinin}@bmc.riken.jp

a

Abstract. This paper presents two approaches toward the understanding and realization of constrained motion of the upper limbs. The first approach deals with the analysis on the constrained human movements under the framework of optimal control. It is shown that the combination of the optimality criteria constructed by muscle force change and hand contact force change can explain the formation of the constrained reaching movements. The second approach comes from robotics. It is illustrated by application of reinforcement learning to a robotic manipulation problem. Here, an assumption on the existence of holonomic constraints can accelerate learning by introducing function approximation techniques within model-based reinforcement learning. The idea of estimation (parameterization) of the constraints can relate two different problems under the common perspective of “learning constrained motion.” Keywords. Constrained motion, Human motion analysis, Optimal control, Reinforcement learning

1. Introduction Historically, the robotics research has grown from industrial applications. However, recently it attracts more and more attention for the reason that robots can be regarded as “bio-mimetic machines,” where the principles of the robot control are expected to help understanding of the principle of human motion control. That is, we may understand the function of human brain by constructing machines that mimics human body and by testing those machines using biologically inspired control principles. It has been pointed out that the interaction between the human body and the environment plays an important role for the understanding of human intelligence [1]. This perspective was emphasized in contrast to the symbol-based information processing researches, which mainly considered functions of brain independently of the motion of human body. Examples of interaction with the external environment can be found in locomotion and manipulation tasks. In locomotion tasks one deals with the control of lower limbs, while in the manipulation task we are interested in the control of upper limbs. Typically,

S. Hosoe et al. / Understanding and Realization of Constrained Motion

31

the external environment in the locomotion tasks is static. In comparison with the locomotion problems, the control of upper limbs motions may have different properties. For example, as can be seen in many manipulation problems, the interaction between the upper limbs and the environment can change the latter. In general, the property of “changing the world” can not be seen in locomotion tasks. This can require the agent to have qualitatively different abilities of adapting to the environment. Additionally, the interaction between the upper limbs and the environments can be physically constrained. The property of being constrained by the external environment requires further analysis of the basic mechanisms for the generation of constrained movements that can be exploited by the human control system for the creation of adaptation mechanisms with richer computational structures. In this paper, we present two approaches toward the understanding and realization of the constrained motion of the upper limbs. First, we show the analysis of upper limbs motion physically constrained by the environment. The analysis is based on the idea that the human motion can be explained using optimization principles. Secondly, we show the realization of constrained motions by a manipulator using reinforcement learning. Here, we introduce some assumptions on the learning problem so that learning performance is improved in comparison with the conventional reinforcement learning methods taken without such specific assumptions. Fig.1 shows the relation between two approaches described in this paper. Both these approaches have constrained motions as the common control problem. The two arrows in the figure denote the connection between the approaches. The left arrow indicates the research on human motion analysis which aims to the development of the robot technology as well as understanding of human itself. The right arrow indicates the expected contribution to the understanding of human motion by developing robot control methods while borrowing some ideas from human motion for the construction of robot control strategies. Understanding of human motion analysis of human motion

Constrained motion construction of robot control

Development of engineering technology

Fig. 1 : Two approaches for the understanding and realization of constrained motion

2. Analysis of human constrained motion It is known that unconstrained reaching movements of human arm have invariant features expressed by straight (slightly curved) hand path in the task space and a bell-shaped hand velocity profile [3]. One approach to account for the invariant

32

S. Hosoe et al. / Understanding and Realization of Constrained Motion

features of reaching movements is via optimization theory. The cost function for the optimal control problem can be generally described as

J

³

T 0

L(q, q , q,")dt ,

(1)

where q denotes the vector of the joint angles. The optimal control and the optimal trajectory for the upper limb system can be derived by minimizing this cost function. For example, Flash and Hogan proposed a minimum hand jerk criterion [3]. Other criteria, such as a minimum joint torque change criterion [4] and a minimum muscle tension change criterion, were also tested for the prediction of reaching movements. Roughly, the criteria of optimality can be classified depending on the planning space (task or joint) and the criterion domain (motion or force) as indicated in Fig.2.

Fig. 2 : Classification of optimization criteria (cost function)

Fig. 3 : Experimental setup for human motion measurement

The above-indicated criteria of optimality were tested mainly for the reaching movements in free space. However, humans also make arm movements interacting with the environment (opening a door, turning a steering wheel, rotating a coffee mill, and so on). To manipulate objects or to use tools, the central nervous system (CNS) has to take into account the constraints imposed by the external environment when planning hand trajectories. It should also be noted that constrained movements constitute a class of force interactive tasks with redundancy in the distribution of the force actuation across the muscles. Understanding how human deals with the external constraints and solves the redundancy problem is important not only from the point of view of computational neuroscience but also for the construction of intelligent control systems for robot manipulators interacting with the external environment. Considering optimal control problems for the prediction of reaching movements, we noticed that criteria defined in the force domain of the task space have not been considered in literature. Next, by analyzing human motion profiles in a crank rotation task (see Fig. 3), we found that the conventional criteria (minimum joint torque and minimum muscle force change criteria) do not predict well the experimental data in the motion domain (tangential direction of movement) but they can capture the tendency of the internal force profiles. To take into consideration the force interactive nature of the crank rotation task, we have proposed a minimum hand force change criterion which is essentially a dynamic version of the minimum hand jerk criterion. This criterion gives

33

S. Hosoe et al. / Understanding and Realization of Constrained Motion

much better prediction of the velocity profiles. However the level of the normal forces, given by this criterion, is much lower than that observed in the experimental data. Consequently, we proposed a combined form of the optimality criterion, which has two components‒the hand force change and the muscle force change–balanced by a weight factor. Our simulation results show that in comfortable point-to-point movements the combined criteria predict the experimental data better. In addition to the motion and force trajectories in the task space, the combined criterion formulated at the muscle level can also capture the basic patterns of muscle activity (see Fig. 4). Even though the mathematical model we employed is relatively simple and does not take into account the dynamics at the metabolic level, the predictions are quite satisfactory. This suggests that the smoothness of both hand and actuating forces is of primary importance for explaining human movements constrained by the external environment. 0 400

f6

0 400

f5

0 400

f4

0 400

f3

0 400

f2

EMG6

EMG4 EMG3 EMG2

Muscle force (N)

IEMG

EMG5

EMG1

0.2

0.4

0.6

0.8

1

1.2

1.4

0

0.25

Time (s)

0.5

0.75

1

1.25

1.5

f1

Time (s)

Fig. 4 : EMG signals (left) and muscle forces predicted by proposed criterion (right)

It should be noted that the combined criteria introduced in this research take into account the geometry of the external constraints through the force change term, which depends on the orientation of the moving contact frame. Knowledge of the normal and the tangential directions of the constraint is essential in the calculation of the combined criteria. It is therefore assumed that, after enough learning and adaptation, the CNS implicitly constructs a parameterization of the constraint manifold. This may raise new interesting questions about the nature of the biological motor control of human movements.

3. Reinforcement learning for robotic constrained motion Reinforcement learning (RL) was originally related to the psychology of animal learning. Later it was connected to the optimal control framework and recently applied to robot control problems (e.g. [10]). RL is characterized by the fact that we do not need the perfect information about the evaluation and the dynamics of the system [1]. The general objective of reinforcement learning is to maximize the discounted sum of reward. One of the key notions in RL is known as a Temporal-Difference (TD) learning, where the following TD-error G t is used to update the value function V ;

34

S. Hosoe et al. / Understanding and Realization of Constrained Motion

Gt

rt  J (V ( xt 1 )  V ( xt )) ,

(2)

Here, xt and rt denote the state variable and reward at time t , respectively. One of the advantages of the TD-learning is that it does not require explicit representations of the state transition function and the reward function, which results in less computation. Another advantage is that the TD-learning does not require any specific assumptions except the one that the state transition should be Markov decision process [1]. This generality means wide applicability of the learning method in principle, though it is difficult to practically apply TD-learning to control problems due to the slow convergence. On the other hand, there is a possibility to improve the efficiency of learning. This can be done by the explicit approximation of the state transition function and the reward function, and by utilizing them for learning even at the higher computational cost. In the discrete state and action domains, such explicit approximation methods are known as Dyna [7] or real-time dynamic programming [8]. Here, we extend the approximation to the continuous state domain by introducing a suitable function approximation scheme and call such an approach a model-based reinforcement learning. In this paper, we focus on a manipulation problem where the actual motion is restricted to a sub-manifold, which is included in the manifold of the total system, as indicated in Fig.5. In other words, we deal with the problem which includes holonomic constraints. The actual motion of the manipulator is restricted to the motion of the object as long as the contacts between the manipulator and the object are maintained (see Fig.6). If we can obtain an appropriate expression for such a low-dimensional motion, learning will be more effective. In the control of the constrained motion, there are two objectives, 1) to reach the desired configuration and 2) not to break the constraints (not to let the fingertip slip or part from the object). The second objective can be transformed to a discontinuous reward (see Fig. 7), which is given when the manipulator fails to keep the constraint. Thus, we introduce two assumptions on the control problem, the holonomic constraint and the discontinuous reward.

x3 constrained motion space

p2 p1

x1

x2

original state space

Fig. 5 : Sub-manifold generated by constraint

Fig. 6 : Constrained manipulation

motion

of

35

S. Hosoe et al. / Understanding and Realization of Constrained Motion

Based on these assumptions, we propose a learning scheme which is specified by the following three features: a) Construction of the mapping from N to a smaller-dimensional parameter space. b) Model-based reinforcement learning taking into account the system dynamics and using an approximation of the reward function. c) Approximation of discontinuous reward functions. Fig.8 shows a specific example of the manipulation task used in our study. Here, a two-link manipulator rotates an object to the desired angle without slipping and parting from the object.

R

y

q2

x2

Fcx

W2

Fcy

x1

q1 W1

x

T

xr , yr

Fig. 8 : Rotational task of one-dof Fig. 7 : Discrete reward function with object continuous boundary The motion of the manipulator is restricted to one-dimensional space while the constraint is maintained as indicated in Fig.9 which shows the constraint curve the joint space obtained during the learning. Fig. 10 shows the reward profile of the proposed learning method.

q2 -1.6

-1.8

-2

-2.2

0.9

1

1.1

q1

Fig. 9 : Obtained curve of constraint in joint angle space

Fig. 10 : Reward profile of proposed learning method

First, we compared the developed model-based learning method (using the proposed mapping) with the conventional Q-learning method and found that it was

36

S. Hosoe et al. / Understanding and Realization of Constrained Motion

impossible to obtain a control policy achieving the goal with Q-learning. This implies that the explicit approximation of the reward plays an important role in learning. Secondly, we applied the model-based learning without the proposed mapping. This time, the learning method could achieve the goal, but the averaged performance of the obtained policies was much worse. Thus, the learning performance could be improved by changing the compactness of the representation of state.

4. Discussion In the analysis of constrained human motion, we showed that the combined criteria of hand force change criterion and muscle force change criterion explained the experimental results well. The combined criterion essentially requires the knowledge of the geometrical information of the constraint. This result implies that human might construct a parameterization of the constraint, which can be obtained after some trial movements. This idea of parameterization of the constrained manifold was then considered within reinforcement learning framework. In the application of the model-based reinforcement learning to a manipulation task, we confirmed that the parameterization of the constraint improved the learning performance. The idea of estimation of the geometrical information of the constraint was also applied to the estimation of a discontinuous reward that specifies the boundary of the region for a discrete reward. This also greatly helped to improve the learning performance. From the perspective of robot control, in addition to the parameterization of constrained manifold, there can be other kinds of knowledge that can accelerate reinforcement learning. This issue can be addressed by constructing reinforcement learning methods that effectively utilize the additional knowledge on the manipulation problem and by measuring human motion with different conditions and clarifying whether humans really utilize such knowledge or not. In the human motion analysis, we dealt with the human motions that had been obtained by sufficient amount of training. It is also very important to focus on the learning of such motions. The relation of RL and the brain function has been discussed in the context of reward estimation and TD-error [12, 13]. On the other hand, some correspondences between the activities of Purkinje cells in the cerebellum and the learning framework based on motion primitives have been pointed out [14]. It is natural that human motion skills are obtained not by a single RL algorithm but by the combination of several kinds of learning frameworks, and the combination can be characterized by an effective usage of acquired (and somehow abstracted) knowledge. We will focus on this aspect of the acquisition of human motion skill in the future study.

5. Conclusion In this paper, we presented two approaches for the understanding of human constrained motion and the realization of robotic learning. In the analysis of human constrained motion, we proposed a combined criterion of hand-force change criterion and muscle

S. Hosoe et al. / Understanding and Realization of Constrained Motion

37

force change criterion. The criterion predicted the experimental results well enough. The idea that human might construct the parameterization of the constraint was applied to the robotic motion learning. We proposed to introduce the knowledge of the constraint into the reinforcement learning framework by model-based learning with function approximation. The simulation results showed that the proposed learning method improved the learning performance. We expect that the further development of formal learning methods can be realized by mimicking the learning techniques witnessed in human movements.

Reference [1] Richard S.Sutton and Andrew G.Barto, “Reinforcement Learning,” MIT Press, 1998. [2] Rolf Pfeifer and Christian Scheier, “Understanding Intelligence,” MIT Press, 1999. [3] P. Morasso, “Spatial control of arm movements,” Experimental Brain Research, 42, pp. 223-227, 1981. [4]

T. Flash and N. Hogan, ”The coordination of arm movements”, Journal of Neuroscience, vol.5, pp.1688-1703,1985.

[5] K. Ohta, M. Svinin, Z.W. Luo, S. Hosoe, and R. Laboissiere, “Optimal Trajectory Formation of Constrained Human Arm Reaching Movements,” Biological Cybernetics, Vol. 91, No. 1, July 2004, pp. 23-36. [6] Mikhail Svinin, Youhei Masui, Zhiwei Luo, and Shigeyuki Hosoe, “On the Dynamic Version of the Minimum Hand Jerk Criterion,” Journal of Robotic Systems, Vol. 22, No. 11, pp.639-651, 2005. [7] R. S. Sutton, “Dyna, an Integrated Architecture for Learning, Planning, and Reacting,” Proc. of the 7th Int. Conf. on Machine Learning, pp. 216-224,1991. [8] Andrew G. Barto, Steven J. Bradke and Satinder P. Singh,

Learning to Act using Real-Time Dynamic

Programming,” Artificial Intelligence, Special Volume: Computational Research on Interaction and Agency, 72, 1995, pp.81-138. [9] C. G. Atkeson, A. W. Moore, and S. Schaal: “Locally Weighted Learning for Control,” Artificial Intelligence Review, Vol.11, pp.75-113,1997. [10] J. Morimoto and K. Doya:“Reinforcement Learning of dynamic motor sequences: learning to stand up,” Proc. of IEEE/RSJ Int. Conf. of Intelligent Robots and Systems, Vol.3, pp.1721-1726, 1998. [11] Y. Kobayashi and S. Hosoe, “Reinforcement Learning for Manipulation Using Constraint between Object and Robot,” Proc. of IEEE Int. Conf. on Systems, Man & Cybernetics, pp. 871-876, 2005. [12] A.G. Barto, “Adaptive critics and the basal ganglia,” In Models of Information Processing in the Basal Ganglia, Cambridge, MA: MIT Press, pp.215-232, 1995. [13] C.D. Fiorillo, P.N. Tobler, and W. Schultz, “Discrete coding of reward probability and uncertainty by dopamine neurons,” Science, vol. 299, pp. 1898-1902, 2003. [14] K. A. Thoroughman and R. Shadmehr, “Learning of action through adaptive combination of motor primitives,” Nature 2000 Oct 12;407(6805):682-3.

This page intentionally left blank

Part 1 Navigation and Motion Planning

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

41

The Specifiability Requirement in Mobile Robot Self-Localization Francesco Capezio, Antonio Sgorbissa 1 , and Renato Zaccaria DIST - University of Genova Abstract. In last years, Simultaneous Localization And Mapping techniques monopolize research in mobile robot self-localization, especially in the civilian, Service Robotics domain. In this paper we argue that techniques derived from the industrial scenario, in particular beacon-based triangulation systems, should be taken into considerations even for civilian applications whenever we deem important to provide accurate, a priori specifications about the system behavior in a generic, yet untested environment. The paper provides an analytical expression of sensitivity to errors for triangulation depending on the system geometry. Keywords. Mobile Robotics, Beacon-based triangulation, System design

1. Introduction Self-localization has been the key for success for all those systems which are agreed upon as “big achievements” in the last decade (some renown examples are [1] [2] [3]). This is not a surprise, if we consider long term autonomy as a discriminator (the “six months between missteps” that Moravec advocates [4]): whenever robots are required to work continuously, for a long period of time, carrying out their tasks with no performance degradation or human intervention, the ability to keep dead reckoning errors (which are intrinsically cumulative) limited over time plays a fundamental role. Among the criteria which are usually adopted for classifying localization methods, in this work we focus onto a single, specific criterion: i.e., by extending the dialectic between “artificial landmark” and “natural landmark” [5], we distinguish two categories: 1. Approaches which rely on features which are naturally present in the environment to update the robot’s perceptions. Autonomous systems in the civilian, Service domain mainly belong to this category; features in the environment are detected by means of proximity sensors (sonars and, more recently, laser rangefinders [2] [3]) or vision. Simultaneous Localization And Mapping falls in this class: it puts further constraints by assuming that, in addition to being unmodified, the environment is also unknown. 2. Approaches which introduce artificial, ad hoc designed features or devices in the environment in order to improve and simplify the perception process. AGVs in the industrial domain very often belong to this category; since both the signal emitting source and the sensor are designed ad hoc, the only limit is one’s imagination. Some examples 1 Correspondence to: Antonio Sgorbissa, DIST - University of Genova. Tel.: +39 010 353 2801; Fax: +39 010 353 2154; E-mail: [email protected].

42

F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization

are spray-painted tracks in the floor adopted by Bell and Howell Mailmobile, Komatsu Z-shaped metal landmarks [6], MDARS-I retroreflective landmarks [7], DBIR LaserNav [8], and TRC Beacon Navigation System [9]. Approaches belonging to the second class allow better performance: that is why they are often preferred in the industrial domain. However, there are often very good motivations to leave the environment unaltered: commonly raised questions address the system’s migrability (i.e., can it behave indifferently in different environments?), biological plausibility, cost, accessibility (i.e., is it possible at all to modify the environment?), and would deserve further discussion. Here we focus onto a different issue, which is rarely taken into consideration when evaluating a localization systems: we call it specifiability, a market-inspired term which means “how accurately can a consumer assess... whether or not the good will satisfy the purpose it is purchased for” [12]. In Section II we show that the current trend in mobile robotics (especially indoor, SLAM-based, Service Robotics) has many difficulties to answer to the specifiability requirement. In Section III we focus onto the triangulation algorithm: we provide a closed form expression of triangulation and we formally derive an expression of the algorithm sensitivity to measurement errors. In Section IV, MatLab simulations and experimental results with the beacon-based system that we designed are shown. Conclusions follow.

2. Specifiability In [3] a large scale application in the museum scenario is discussed: the localization subsystem, based on two laserscanner and a feature based approach, undoubtedly demonstrates to be robust and reliable enough for the assigned tasks. Suppose however to be interested in buying or renting a similar system, and to consequently ask for a priori, formal specifications about its localization accuracy in a new, untested environment. Suppose also that, as the only available information, you can provide interior plans of the new environment and, maybe, some photographic snapshots. Unfortunately, in the case of laser-based (or vision based) feature extraction and matching, predicting the system’s accuracy from elementary considerations about the geometry of the environment is not simple. The situation is even more dramatic if the environment dynamically changes during task execution. Notice that this lack of specifiability is not due to limitations of the sensor; the main problem is the complexity of the interactions of the sensor with the environment, which causes feature extraction and matching to combinatorially depend on many factors: the number, the geometry and distribution of available features, the relative position and orientation of the robot with respect to each of them, the similarities and differences between potentially confusable features, lighting conditions (for vision), etc. On the opposite, complete, formal specifications are usually provided for AGV localization and navigation in the industrial domain [5]. Let us focus onto positioning errors: either the robot’s motion is constrained along a fixed trajectory or it relies on a finite number of artificial landmarks to correct its position, distribution p(errr |r) of the positioning error errr can be predicted for every robot’s configuration r by knowing 1. measurement error distribution p(errs |r) for each landmark si sensed in r; 2. sensitivity to errors, i.e. how positioning errors depends on measurement errors. If the robot’s state can be expressed in closed form as r = F(s1 , .., sn ), sensitivity is given by the Jacobian J of the partial derivatives of F with respect to s1 , .., sn .

F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization

43

About 1, notice that also statistical approaches relying on natural landmarks are required to compute p(errs |r). However, with artificial landmarks, measurement errors can be evaluated with higher fidelity both in the theoretical and the experimental noise analysis: since one usually designs both the sensor and the landmark, the underlying model is better known and - hopefully - simpler. About 2, sensitivity proves to be particularly relevant when using artificial landmarks. In this case, landmark detection does not depend anymore on the complexity of the environment (which can be ignored), but on the spatial arrangement of landmarks as it has been decided with the sole aid of interior plans: i.e., for every r, it is always possible to know in advance which landmarks are visible and the corresponding p(errs |r). Thus, once the spatial arrangement of landmarks has been decided and sensitivity has been computed on the basis of pure geometric considerations, positioning errors p(errr |r) are predictable and fully specifiable, for any area of the Workspace in which enough landmarks are available for localization, as a function of the sole robot state r. Among localization systems commonly used in the industrial domain, beacon-based localization through triangulation seems a good compromise to meet all the requirements introduced so far while suiting two major needs of civilian scenarios: beacons can be elevated from ground (thus avoiding occlusions due to surrounding people) and they allow the robot to exhibit complex metric navigation in the free space [8] [9]).

3. Triangulation expressed in closed form Triangulation assumes that the robot knows the position of (at least) three landmarks bi = (xbi , ybi ), i = 1, .., 3 (i.e., beacons in our system) with respect to a fixed Cartesian frame F . By measuring the relative angles θ1 , θ2 , θ3 between the vehicle’s heading (to be computed) and the three landmarks, triangulation allows to retrieve the position and orientation r= (xr , yr , θr ) of the vehicle with respect to F (Fig. 1a). The problem has a graphic solution which is very simple and evocative (Fig. 1b): if the robot observes two landmarks (say b1 , b2 ) separated by an angle θ12 (i.e., the difference θ1 − θ2 , with positive angles measured counter clockwise), it must be on a circle C12r univocally determined by that angle and the distance between the landmarks. When considering another couple of landmarks (say b2 , b3 ) and the corresponding angle θ23 , another circle C23r is defined, which intersects with the previous one in two points: b2 and (xr , yr ), i.e. the actual robot’s position. Finally, by knowing (xr , yr ) and any of the three angles θ1 , θ2 , θ3 , orientation θr can be retrieved. This graphic solution (also referred to as Circle Intersection [10]) can be implemented as a closed form expression xr = fx (θ12 , θ23 ), yr = fy (θ12 , θ23 ), θr = fθ (θ12 , θ23 )

(1)

which will be described in details in the following. The graphic approach helps to visualize singularity configurations which are well known in literature: 1. the robot lies on the same circle (possibly collapsing to a straight line) which connects the three beacons: C12r and C23r overlap and the solution is not univocal; 2. the robot lies on the line which connects two landmarks, say b1 , b2 : the subtended angle θ12 equals 0 or π and C12r consequently collapse to a line. Notice that, whilst in case 1 the problem has intrinsically infinite solutions, in case 2 a unique solution still exits: it just requires a special care to deal with it. Moreover, in

44

F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization

Figure 1. a) The Triangulation Problem b) Graphical solution

both cases, all configurations which are close to singularities reveal to be problematic: a report [10] of triangulation algorithms in literature, compared on the basis of their sensitivity to measurement errors, shows that Circle Intersection provides the higher accuracy as possible (by ignoring iterative, approximate methods such as the Newton-Rapson which assume an a priori estimate of the robot’s position). In [11] authors claim that the Trigonometric Method is able to find a solution even for type 2 singularities whilst Circle Intersection is not. This is imprecise: of the three subtended angle θ12 , θ23 , θ31 (each uniquely defining a circle C12r , C23r , C31r ), at most one can equal 0 or π (unless all beacons lie on the same line). The information available is thus redundant, allowing to consider the two angles for which the singularity is not reached. Since its simplicity and efficiency, we consider only the Circle Intersection algorithm, of which we give a closed form expression which is much simpler than in [10]. Next, to express sensitivity to error, we analytically compute the Jacobian J of the partial derivatives of fx , fy and fθ with respect to θ12 , θ23 , θ31 . This idea is first introduced in [11]; surprisingly, authors do not provide a closed form expression of J, and limit themselves to simulating the effect of angular errors on an Error Sensitivity Map. Let us imagine to choose angles θ12 and θ23 , and to consequently consider the circles C12r and C23r . First, we compute the center of C12r : by elementary geometric considerations (Fig. 2a), we know that the angle between b1 and b2 , when measured from the center cr12 , is doubled with respect to θ12 . Thus, the distance between cr12 and b1 b2 can be computed as follows  2 2 d12 = 0.5 (xb2 − xb1 ) + (yb2 − yb1 ) tan(θ12 ) (2)   notice that, if we consider θ12 in the interval 0 2π , d12 can assume negative values: this implicitly considers the fact that cr12 can be on both sides of b1 b2 . Notice also that the circles defined by θ12 and θ12 + π are the same; however, for every angle θ12 , cr12 is univocally defined. Next, we compute the midpoint m12 between b1 and b2 and we add to m12 a vector d12 which is perpendicular to b1 b2 and has length d12 After some substitutions, this yields the following expression for cr12 (and cr23 ): xcr12 = 0.5 (xb1 + xb2 + (yb2 − yb1 )/tan (θ12 )) ycr12 = 0.5 (yb1 + yb2 − (xb2 − xb1 )/tan (θ12 ))

(3)

There are some singular points in Equation 3: 1. θ12 = 0, π corresponds to a circle with infinite radius (i.e., collapsing to a line), a type 2 singularity which requires considering a different couple of landmarks

F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization

45

2. θ12 = π, 3π/2 corresponds to a circle whose center lies on b1 b2 , i.e., the midpoint between the two landmarks. Next, we compute the Equation of the straight line cr12 cr23 (Fig. 2b) and the distance db2 between cr12 cr23 and b2 (the first intersection of C12r and C23r ) db2 =

(ycr23 − ycr12 ) xb2 − (xcr23 − xcr12 ) yb2 − ycr23 xcr12 + ycr12 xcr23  2 2 (ycr23 − ycr12 ) + (xcr23 − xcr12 )

(4)

since the second intersection (i.e., corresponding to xr , yr ) is symmetrical to b2 with respect to cr12 cr23 , we compute it by adding to b2 a vector db2 which is perpendicular to cr12 cr23 and is doubled than db2 (but has an opposite sign).  2 2 xr = xb2 − 2db2 (ycr23 − ycr12 ) (ycr23 − ycr12 ) + (xcr23 − xcr12 )  (5) 2 2 yr = yb2 + 2db2 (xcr23 − xcr12 ) (ycr23 − ycr12 ) + (xcr23 − xcr12 ) To compute orientation θr , we use Equation 6 (handling the special case ycr23 = ycr12 ).     xcr23 − xcr12 yb2 − yr − θ2 = arctan − − θ2 θr = arctan (6) xb2 − xr ycr23 − ycr12 Notice that, at this point, we could simply substitute Equations 3 and 4 in 5 and 6, thus obtaining a single, closed form expression as in Equation 1. Sensitivity to errors can be analytically written as follows: Δxr = Δθ12 · ∂fx /∂θ12 + Δθ23 · ∂fx /∂θ23 Δyr = Δθ12 · ∂fy /∂θ12 + Δθ23 · ∂fy /∂θ23 Δθr = Δθ12 · ∂fθ /∂θ12 + Δθ23 · ∂fθ /∂θ23

(7)

which requires to compute the Jacobian J of the partial derivatives of fx , fy , fθ with respect to θ12 and θ23 . By noticing that the denominator in Equation 4 and 5 is the same, we are allowed to write: n = num (db2 ) , d = den (db2 )2 ∂xr ∂θ12 ∂yr ∂θ12 ∂θr ∂θ12

cr12 n = 2 ∂y ∂θ12 d − 2 (ycr23 − ycr12 )



1 ∂n d∂θ12



n ∂d d2 ∂θ12



1 ∂n n ∂d cr12 n = −2 ∂x + 2 (x − x ) − cr23 cr12 2 d ∂θ12 d ∂θ12  ∂θ12 d ∂ycr12 cr12 = d1 ∂x (y − y ) − (x − xcr12 ) cr23 cr12 cr23 ∂θ12 ∂θ12

(8)

Similarly, we can compute the partial derivatives with respect to θ23 . Equation 8 can be used 1) to improve the system’s accuracy in run-time and 2) to compute the sensitivity to errors for every r and with different beacon configurations. 1. In Equation 1, we expressed r= (xr , yr , θr ) as a function F(θ12 , θ23 ); however, as we already stressed, other two choice are available, namely to consider the couple (θ23 , θ32 ) or (θ12 , θ31 ). We call F1 = F(θ12 , θ23 ), F2 = F(θ23 , θ31 ) and F3 = F(θ12 , θ31 ) and J1 , J2 , J3 the corresponding Jacobian matrices of the partial derivatives. Whenever a triplet (θ12 , θ23 , θ32 ) is measured, we compute, for each component of the state (xr , yr , θr ), which Ji yields the minimum sensitivity to errors (it

46

F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization

Figure 2. a) Computing the circle center b) Intersecting circles A. Circle Intersection

just requires to compute Equation 8 and 7 three times): this allows to choose which of the subtended angles (θ12 , θ23 , θ32 ) should be considered (and which one should be ignored). Angles which are close to 0 or π are automatically rejected; solutions which are very sensitive to errors can be rejected as well. Finally, when more than three beacons are available, Ji can be used to determine which triplet should be considered. 2. For any given beacon distribution, we are allowed to predict the positioning error distribution p(errr |r) for any location of the environment r in which at least three beacons are visible. To achieve this, for any given r we compute the three expected angles (θ12 , θ23 , θ32 ) = h(r) (i.e., h is the measurement model), we evaluate the three expected Jacobian matrices J1 , J2 , J3 as a function of r as in Equation 8, and we finally derive positioning errors from measurement errors p(errs |r) through Equation 7. If we limit ourselves to a worst case analysis, computing the maximum positioning error as a function of the maximum measurement error is trivial; otherwise, more complex manipulations are required. This procedure can provide guidelines for beacon arrangement.

4. Experimental Results In the following, to simplify computations, we assume a maximum error errθmax on the measure of θ12 , θ23 , and θ31 , thus ignoring the full distribution p(errs |r); moreover, we consider the x− and y−components of r, whilst θr is neglected. Since errθmax = Δθ12 = Δθ23 = Δθ31 in Equation 7 is constant, we can easily compute, for each Fi , i = 1, .., 3, the maximum position error errrmax as a function of errθmax and r: in Equation 9, (θ12 , θ23 ) = h(r) are considered.

∂fx 2 ∂fy 2 ∂fy 2 ∂fx 2 errr max = Δx2r + Δyr2 = errϑ max + + + (9) ∂θ12 ∂θ23 ∂θ12 ∂θ23 Under these assumptions, the Euclidean norm S of J is sufficient to determine the sensitivity to measurement errors for each position in the Workspace r, and can be effectively adopted to decide which angle in the triplet (θ12 , θ23 , θ32 ) should be ignored. Figure 3 shows the plot of S(r) versus (xr , yr ) when b1 = (0, 0), b2 = (10, 5), b3 = (10, 0) and different couples of subtended angles are considered, i.e., S1 (θ12 , θ23 ), S2 (θ23 , θ31 ), and S3 (θ12 , θ31 ); values are saturated to Smax = 100 for ease of representation. In all the Figures the highest sensitivity to error corresponds to the circle which connects the three beacons; positions which are close to this circle have an increasing sensitivity, thus

F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization

47

Figure 3. Sensitivity computed with (θ12 , θ23 ), (θ23 , θ31 ), and (θ12 , θ31 ); minimum sensitivity.

yielding very inaccurate results during localization. Straight lines connecting couples of beacons yields inaccurate results as well (e.g., b1 b2 and b2 b3 for S1 ); however, it is sufficient to consider different couples: Figure 3 shows the plot of mini (Si ), in which the singularity is no more present. Figure 4a shows the plot of errrmax (expressed in meters) when errθmax = 1◦ , projected onto (x, errrmax ): it can be noticed that the highest accuracy (about 6cm) is achieved within the connecting circle, whilst errors increase outside. Figure 4c and d shows S(r) with different beacon configurations: by plotting sensitivity it is possible to infer how to arrange beacons in order to achieve the desired accuracy in any given area of the environment. Finally, the approach has been tested for many years with different experimental setups: Figure 4b shows the positioning errors of a real robot by considering the couple (θ12 , θ23 ); the resolution is obviously lower (data are collected every 0.3m), but the trend of errr and its magnitude approximately meet our expectations (values expressed in mm) 5. Conclusions If we want autonomous mobile robots to “hit the market” for civilian applications, it is fundamental to provide customers with a priori accurate specifications of the behavior of a system in a previously untested environment. To deal with the specifiability requirement, we propose to borrow localization methods from industrial AGVs; in particular, beacon based triangulation, which proves to be a good compromise for its simplicity and versatility. Analytical expressions of triangulation and its sensitivity to errors are given: simulations show how this information can be used to predict the system’s accuracy through geometric considerations.

48

F. Capezio et al. / The Specifiability Requirement in Mobile Robot Self-Localization

Figure 4. a) Plot of errrmax b) A real world experiment c, d) Sensitivity with different beacon configurations

References [1] Thrun, S., Beetz, M., Bennewitz, M., Burgard, W., Cremers, A., Dellaert, F., Fox, D., Hähnel, D., Rosenberg, C., Roy, N., Schulte, J., Schulz. D., 2000. Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva. Int. J. of Robotic Research, Vol. 19, No 11. [2] Burgard, W., Cremers, A.B., Fox, D., Hähnel, D., Lakemeyer, G., Schulz, D., Steiner, W., and Thrun., S., 2000. Experiences with an interactive museum tour-guide robot, AI, 114 (1-2). [3] Siegwart R., Arras K.O., Bouabdallah S., Burnier D., Froidevaux G., Greppin X., Jensen B., Lorotte A., Mayor L., Meisser M., Philippsen R., Piguet R., Ramel G., Terrien G., Tomatis N., 2003. Robox at Expo.02: A Large-Scale Installation of Personal Robots, Robotics and Autonomous Syst., vol. 42, issue 3-4. [4] Moravec, H., Rise of the Robots, in Scientific American, December 1999, pp. 124-135. [5] J. Borenstein, H. R. Everett, L. Feng, Where am I? Sensors and Methods for Autonomous Mobile Robot Positioning - Available from ftp.eecs.umich.edu/people/johannb, 1996. [6] Matsuda, T. and Yoshikawa, E., 1989. Z-shaped Position Correcting Landmark, Proc. of the 28th SICE Annual Conference. [7] Laird, R.T, H.R. Everett, G.A. Gilbreath, R.S. Inderieden, and K.J. Grant, "Early User Appraisal of the MDARS Interior Robot," American Nuclear Society 8th International Topical Meeting on Robotics and Remote Systems (ANS’99), Pittsburgh, PA, 25-29 April, 1999 [8] Maddox, J., Smart Navigation Sensors for Automatic Guided Vehicles, Sensors, April 1994. [9] TRC - Transitions Research Corp., Beacon Navigation System, Product Literature, Shelter Rock Lane, Danbury, CT 06810, 203-798-8988. [10] Cohen, C. and Koss, F., 1992. A Comprehensive Study of Three Object Triangulation, Proc. of the 1993 SPIE Conference on Mobile Robots, Nov. 18-20, pp. [11] Shoval, S., and Sinriech, D., 2001. Analysis of Landmark Configuration for Absolute Positioning of Autonomous Vehicles, Journal of Manifacturing Systems, Vol. 20 No. 1. [12] Koppius, O., 1999. Dimensions of Intangible Goods, Proc. of the Thirty-second Annual Hawaii Int. Conference on System Sciences-Vo 5, 1999

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

49

Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments Mattia Castelnovi, Antonio Sgorbissa, Renato Zaccaria Laboratorium DIST Università di Genova {mattia.castelnovi, antonio.sgorbissa, renato.zaccaria}@unige.it Abstract—A force field algorithm based on inclinometer readings is presented. It leads the robot to the goal by preventing it from turning upside down because of the inclination of the terrain. The method is supported by a navigation algorithm, which helps the system to overcome the well known local minima problem for force fieldbased algorithms. By appropriately selecting the “ghost-goal” position, known local minima problem can be solved effectively. When the robot finds itself in danger of local minimum, a “ghost-goal” appears while the true goal temporarily disappears in order to make the robot go outside dangerous configurations. When the robot escapes the possible dangerous configuration the system makes the true goal appear again and the robot is able to continue on its journey by heading towards it. Simulation results showed that the Ghost-Goal algorithm is very effective in environments with complex rugged terrain for a robot only equipped with an inclinometer.

1. Introduction Autonomous outdoor navigation in vast unknown environments is a special challenge for mobile robots. In the recent past, autonomous mobile robots have been able to accomplish a great deal of tasks in indoor environment, such as exploration[1], self-localization[2] and map building[3]. In most of these cases the problem of autonomous navigation supposed that the world is a flat surface. Such assumption allowed researchers to develop navigation algorithms that consider the robot a material point, and to consequently plan a path to the goal as a curve on a plane. One category of algorithms which proved to be especially suited to deal with unknown, dynamic environments is the so called force-field (also known as Artificial Potential Field) method. Unfortunately, when moving autonomous robots from indoor to outdoor environments many problems arise. In fact, in outdoor environments robotic platforms must deal with other hazardous situations [4], in addition to collision with obstacles one of them being the risk of turning upside down while trying to reach the goal [5, 6]. When the robot is climbing up or down a hill, moving through dunes, etc., the navigation system must take into consideration the maximum allowable pitch and roll values for the robot to avoid turning upside down. The following work proposes a forcefield algorithm which relies on data acquired from an inclinometer with the purpose of keeping the robot far from “dangerous slopes”. Obviously the difference between “dangerous slopes” and obstacles is quite hard to describe: in this paper the ground slope is considered as if it were continuous; we assume that obstacles would locally produce an

50

M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments

infinite derivative, which cannot be detected by the inclinometer and requires ad hoc sensors (proximity sensors, for example). However obstacles are not considered here, we focus on a force-field algorithm which proves to be adequate to lead the robot to the goal, whilst preventing it from turning upside down because of the inclination of the terrain. Our method differs from other existing approaches, which usually rely on vision systems [7, 8] or proximity sensors [6], our system is thus provided with just an inclinometer which locally provides information about the current pitch and roll of the robot. As a consequence of the poor sensing capabilities that are available, the potential for getting stuck in local minima is higher. This problem is solved by a novel approach, which we call the “ghost goal” method. The method works in the following manner: when the robot finds itself in danger because of a local minimum, a “ghost-goal” appears, and the true goal temporarily disappears. Once the potential for local minima is reduced, it is able to continue on its journey by heading towards the true goal. Simulation results showed that the approach is very effective in environments with complex configuration for a robot equipped just with an inclinometer. The paper is structured as follows: section II describes how the force-field inspired method is implemented using data from an inclinometer; section III is focuses Ghost-goal algorithm which help the system not to fall into local minima; section IV describes experimental results. The final section is devoted to conclusions.

2. Inclinometer Based Force Field Method The presented system implements a force-field method based on the robot position and its pitch and roll angles. In force-field methods [9], the goal configuration exerts an attractive force on the robot, while obstacles exert a repulsive force. We extend the basic principle in order to deal with constraints on the robot’s maximum allowable pitch and roll. First of all we define a fixed Cartesian frame (Of, Xf, Yf) and a Cartesian frame (Or, Xr, Yr) centered on the robot, with the Xr axis oriented along the heading of the robot (Fig.1). In the following, all the quantities with subscript ‘f’ are defined with respect to (Of, Xf, Yf ), whilst those with subscript ‘r’ are defined with respect to (Or, Xr, Yr). Since the robot is going to be used in outdoor missions, we assume that it is equipped with a GPS and that the system is able to localize itself and to know the position of the goal with enough precision for our purpose. In particular, we define: x pr and rr , two unit vectors lying respectively along the Xr and the Yr axes. x qf(t), the configuration of (Or, Xr, Yr) with respect to the fixed frame at time t x tr(t), a unit vector directed from qf(t) to the goal x fr(t), the sum of the repulsive forces exerted by terrain slopes in qf(t) x gr(t), the gradient of the terrain in qf(t) expressed as inclination percentage x ir(t), a vector defined as follows: ir(t)·gr(t)=0, |ir(t)|=|gr(t)| (tangential to the isocline passing through qf(t), with the same magnitude as the gradient) It is straightforward to show that, if we want to minimize the pitch of the robot at time t, we need to minimize the absolute value of the dot product pr·gr(t), which can be done by forcing the robot to move along the isocline curve (i.e., by maximizing|pr·ir(t)|). On the

M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments

51

opposite, if we want to minimize the roll, we have to minimize the absolute value of rr·gr(t), which can be done by forcing the robot to climb or descend the gradient (i.e., by maximizing |pr·gr(t)|). Thus, if we want to minimize both the pitch and the roll components, we have to choose the direction of motion of the robot in order to maximize the quantity |pr·ir(t)|+|pr·gr(t)|. This can be done by choosing the velocity vector (i.e. the robot heading) v(t)= ir(t) + gr(t). (1) However, in most cases, the maximum values allowed for the pitch and the roll components are different, since they depend on various physical characteristics (e.g. geometrical dimensions, weight distribution). Thus, at time t, given the current values pr(t) and rr(t) and their maximum allowed values (PitchUB, RollUB), we define the following quantities: 1 (2) mag r

mag p

RollUB

2

- RollAngle2

2

1

PitchUB

2

(3)

- PitchAngle 2

2

Finally, we calculate the velocity vector v (t)

mag p i r (t)  mag r g r (t) mag p

2

i r (t)

2

 mag r

2

g r (t)

(4) 2

as the most promising direction for the robot to safely move to (we are not considering the goal yet, nor obstacles to be avoided). In other words, as the pitch of the robot gets closer to its upper bound, magp quickly increases, thus forcing the robot to move along the isocline curve. Analogously, as the roll component gets closer to its upper bound, magr quickly increases, forcing the robot to climb/descend the gradient. That is, magp and magr are used to weight the sum of ir(t) and gr(t) in eq. 4, thus imposing different constraints for the maximum pitch and roll values (respectively PitchUB and RollUB) in order to safely keep the robot away from a dangerous configuration. Finally notice that, for each configuration qf(t), there are two vectors tangential to the isocline: ir(t) and -ir(t). As a consequence, since we want to maximize the absolute value | pr·ir(t)|, we can choose both ir(t) and -ir(t) in eq. 4. The choice is performed on the basis of the following rule: If ir(t) . tr(t ) < - ir(t). tr(t) choose ir(t) else choose - ir(t)

(5)

The vector pointing in the direction closer to the goal is selected to introduce a bias toward it (vector tr(t) ). In the same way the gr (t) vector is selected: if gr(t) · tr(t ) > - gr(t)·tr(t) choose gr(t) else choose - gr(t)

(6)

Finally, the attractive vector tr(t) heading towards the target is weighted and added to the velocity vector vr(t), as is typical in force-field models [1]. The resulting law of motion is:

52

M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments

vr (t)=w1(magp ir(t) + magr gr(t))+w2 tr (t)

(7)

which drives the robot towards the goal while safely keeping pitch and roll values below the given threshold (w1, w2 are weighting factors) . In Fig. 3 it is possible to see different paths made by the robot when different upper bounds were imposed (in figures the robot icon just represents the robot and it is not scaled, but in the algorithm the robot is treated as a point). The Fig 3.a shows the path when both PitchUB and RollUB are at a very low value (5°=11.1%). The Fig 3.b shows the same mission with PitchUB = 5° and RollUB = 90° and the rest of the figures show the robot path when other values of PitchUB and RollUB (in Fig. 3 as p and r) are selected. In Figs. 13 and 14, the behavior of system with different PitchUB and RollUB values are shown when the Goal is located on the top of the hill. In the first of those Figure RollUB is much higher than PitchUB (PitchUB = 30°, RollUB = 90°). As a consequence, the robot moves towards the top of the hill by following a spiral shaped trajectory, which is necessary to keep the pitch very low with respect to the roll. In the second experiment (Fig. 14) the opposite is true: since PitchUB is much higher than RollUB (PitchUB = 90°, RollUB = 30°). The robot climbs toward the top of the hill by following a straight trajectory which allows to keep the roll value very low.

3. Ghost-goal algorithm 3.1. Local Minima problem in force- field methods The force-field method considers the robot as a particle under the influence of an artificial field of forces which is expected to summarize the “structure” of the free space [10]. The force-field function is usually defined as the sum of an attractive vector which pulls the robot toward the goal configuration and a repulsive vector which pushes the robot away from obstacles. One known drawback of the force-field method is the occurrence of local minima in the force function [15]. Local minima occur in configurations where the repulsive force generated by obstacles cancel exactly the attractive force generated by the goal (Figs. 6, 7, 12). The Ghost-Goal algorithm includes appropriate techniques for escaping from local minima. 3.2. Ghost-Goal applied on a inclinometer based Force Field Method This section explains how the Ghost-Goal algorithm avoids local minima. The Ghost-goal algorithm can be divided into 2 steps: 1) Barrier detection, 2) Ghost-Goal position selection. 1. Barrier detection: two unit vectors are introduced: the first one, which we call tdisc(t) heads to the goal like unit vector tr(t), but its direction is discretised with a very low resolution of 45° (i.e., like a wind star). The second one, which we call gdisc(t), points toward the direction of the gradient like gr(t) (with the same resolution than tdisc(t)). For example, in Fig. 4. , tdisc(t) is pointing West, and gdisc(t) is pointing North-West instead. The system periodically checks the direction of tdisc(t) and gdisc(t), and makes some

M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments

53

considerations about the situation that the robot is probably about to face. If the two unit vectors “almost” lie over the same straight line, the robot is possibly approaching a local minimum (Fig. 5). It has been decided to discretized the directions because it is a fast and easy way to compare the directions of 2 vectors keeping a margin of safety: If (| tdisc(t) gdisc(t)| = 1 )AND(|pr| > PitchBarrierFinderTH OR |rr| > RollBarrierFinderTH) then barrier = TRUE (8) bdisc(t) = gdisc(t) else barrier = FALSE Of course, this can happen only if the slope of the ground is significantly high to influence the robot’s motion as stated in Equations 4 and 7. At the same time, the system monitors if both pitch and roll values are below their Barrier-Finder-thresholds defined as follows: PitchBarrierFinderTH = PitchUB/\ RollBarrierFinderTH = RollUB/\

(9)

where \ ҏis a constant which manage the sensibility of the system to the local minima. If one of them is over the threshold and the tdisc(t) and gdisc(t) almost lie over the same sector direction (for example in Fig. 5 gdisc(t) is pointing North-West), a flag barrier is set equal to TRUE and a new vector bdisc(t) is set equal to gdisc(t)..This means that there is a potential local minimum to the goal. This step allows the system to identify dangerous situations which are due either to the slope of the ground or the relative position of goal.

Figure 2. 3D model for Figure 1. Definition of the area of the the used vectors experiments in Fig.3.

Figure 4. Configuration Figure 5. Configuration Figure 3. Different experiments made with different without barrier detection with barrier detection PitchUB and RollUB .

2. Ghost-Goal position selection. Once we have detected a possible local minimum, the robot must avoid falling into it. To deal with this, the algorithm does not require to perform local search procedures; in fact, a solution can be easily found just by temporarily move the Goal position, thus providing the robot with fallacious perceptions which, in some sense, are the analogous of what we call “a mirage”. The robot “sees” the goal in a more

54

M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments

conformable and less hazardous position and tries to reach it using the same rules used in its usual (= without dangerous situations) navigation. In this way it takes itself far away from the local minima. If barrier = TRUE Then Set the Ghost Goal(…)

(10)

As soon as the dangerous situation disappears, the system is able to “understand” that it was just a mirage, a ghost, and it is no more useful to follow that direction. In fact, when the relative position of the robot has changed with respect to the goal and the surrounding hills or valleys, the just avoided local minima -probably - are not dangerous anymore. The robot can start again heading to the real goal. Obviously, we must carefully select the Ghost-Goal in order to make the robot escape from local minima while moving closer to the real goal or, at least, while not going “too far” from it. To achieve this, the Ghost Goal is not explicitly represented as a position in the Workspace, but as a suggested direction tsafe(t) (a unit vector with the same 45° discretization than tdisc(t), gdisc(t), bdisc(t)), which is chosen according to the following rules: Set the Ghost Goal(…) If RobotDir[arg(bdisc(t)-HT), arg(bdisc(t))+HT] then arg(tsafe (t)) = arg(bdisc(t))±HT

(11)

where HT identifies the maximum deviation from the real goal allowed; its sign is chosen in order to minimize the discrepancy between tsafe (t) and tdisc(t). If barrier = FALSE then tsafe(t) = tdisc(t)

(12)

I.e., as soon as the gradient changes, either in its direction or its magnitude, the Ghost-Goal disappears and the robot starts heading again towards the real goal.

4. Experimental Results The system has been developed and tested using the AE-SIM simulator [11] a tool included in ETHNOS [12] (Expert Tribe in a Hybrid Network Operating System) a programming environment for multi autonomous robot systems. The software allows the robot to receive data from the simulated environment in the same way as it would receive it from a real sensor. To simulate an outdoor environment, Digital Elevation Models have been used. The size of each cell is 1 meter2, which is very accurate compared to other inclinometer-driven navigation algorithms implemented in simulation for analogous mobile robot tasks [13]. Figure 11 shows the path of the robot in a labyrinth which has a flat surface on the ground and sloped walls for making inclinometer sensing. This experiment shows that the robot reaches the goal even though it encounters 5 local minima. In Figs. 8, 9, 10, a “castle-like” environment is considered. Walls are inclined with an increasing slope (i.e., the second

M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments

55

order derivative is positive). The goal is placed onto a “flat” square in the middle of the castle. A small entrance is present in the walls, to show that the system is able to find paths in very difficult situation too. The first mission (Fig.9) is made with a low value of PitchUB and a high value of RollUB. The robot reaches the entrance after a spiral shaped trajectory that makes the robot come closer to the goal without overcome the threshold. This simulation shows how the Ghost-Goal algorithm it is able to reach the goal even in extreme situations. Next figure (Fig.10) shows the same experiment made with a low value of RollUB and a high value of PitchUB. Since the walls at their top are almost vertical the robot turns in order not to damage itself. Two short turns allow it to reach the goal.

Figure 6. Same of Fig. 2.b. without Ghost-Goal algorithm

Figure 8. 3D model Figure 7. Same of for the castle-like Figs. 13 and 14 Figure 9. Spiral Figure 10. Straight to environment without Ghostgoal robot shaped robot trajectory the experiment Goal algorithm trajectory.

13. Spiral Figure 14. Straight to Figure 11. Path made inside Figure 12. Same of Fig.11 Figure without Ghost-Goal algorithm shaped trajectory goal trajectory. a wall-sloped Labyrinth

In Figs. 6 and 7 examples of navigation missions without the Ghost-goal algorithm is shown: in Fig. 6 is possible to see the same experiment of Fig 2.b. using the same threshold (PitchUB = 5° and RollUB = 90°). The robot stops in front of the hill in correspondence of a local minimum. It continues to wheel around itself, but it is not able to move away from that position. The same happens in fig, 7, the robot, after coasting around the hill, stops in a local minimum and it is not able to escape from that position. In Fig 12 the robot stops at the first local minimum of the labyrinth. It is important to remember that, in these situations, the robot would have been able to reach the goal. Because the PitchUB is lower than the slope of the hills or the walls of the labyrinth it could have reached the goal by simply coasting around the hill (or the walls of the labyrinth) or using a spiral shaped trajectory.

5. Conclusions This paper presents a force-field inspired method based on inclinometer data which implements a new algorithm for escaping from local minima. The results from experiments

56

M. Castelnovi et al. / Ghost-Goal Algorithm for Reactive Safe Navigation in Outdoor Environments

conducted on different kinds of slope and threshold configurations have been shown. The developed system seems to be robust enough to be used with real robot. Before and during the system’s development, many experiments were made with different slopes and robot configurations. Without the Ghost-Goal algorithm, the robot was often able to overstep the upper bounds on its allowed pitch and roll, i.e., a really dangerous situation in the case a real robot. After developing the Ghost-Goal algorithm, the simulated robot becomes able to complete missions without exceeding the maximum allowed pitch or roll. The Ghost-goal algorithm allows the system to better capitalize on its capabilities and to reach the goal overcoming the problem of local minima even if some issues still arise, for example, from situation like C-obstacles due also to the discretization of the direction. For a complete autonomous navigation using the Ghost-Goal algorithm, future work will focus on, installing the system on a real robot provided with a localization system and testing it with supplementary sensors (e.g., for obstacle avoidance). REFERENCES [1] [2] [3] [4]

[5] [6]

[7] [8] [9] [10] [11]

[12]

[13] [14]

[15]

W. Burgard, M. Moors, C. Stachniss, and F. Schneider Coordinated Multi-Robot Exploration. IEEE Transactions on Robotics, 21(3), 2005. S. Thrun, D. Fox, W. Burgard and F. Dellaert Robust Monte Carlo localization for mobile robots. Journal of Artificial Intelligence, 128 (1-2), pp. 99-141, 2001. Folkesson, J., Jensfelt, P., and Christensen, H. Vision slam in the measurement subspace. In Intl Conf. on Robotics and Automation (Barcelona, ES, Apr 2005), IEEE. ] Castelnovi, M., Arkin, R., and Collins, T., 2005. "Reactive Speed Control System Based on Terrain Roughness Detection", 2005 IEEE International Conference on Robotics and Automation, Barcelona, SP, April 2005. Arkin, R.C. and Gardner, W., 1990. "Reactive Inclinometer-Based Mobile Robot Navigation", Proc. 1990 IEEE International Conf. on Robotics and Automation, Cincinnati, OH, pp. 936-941. Marco Giaroli, Antonio Sgorbissa, Renato Zaccaria, 2002, "Reactive Safe Navigation in Outdoor Environments", International Symposium on Robotics and Automation (ISRA 02), September 2002, Toluca, Mexico. Langer, D., Rosenblatt, J., and Hebert, M.,.1994. A Behavior-Based System for Off-Road Navigation, IEEE Trans. Robotics and Automation, Vol. 10, No. 6. Matthies, L.H., 1992. Stereo Vision for Planetary Rovers: Stochastic Modeling to Near Real-Time Implementation, International Journal of Computer Vision, 8:1. Arkin, R.C., 1989. "Motor Schema-Based Mobile Robot Navigation", International Journal of Robotics Research, Vol. 8, No. 4, pp. 92-112. Brooks, R. A., 1986, "A Robust Layered Control System for a Mobile Robot." IEEE Journal of Robotics and Automation, Vol. RA-2, No. 1, pp. 14-23. Andrea Raggi, Antonio Sgorbissa, Renato Zaccaria, “AE-SIM: Simulating Intelligent Robots in Intelligent Environments”, Proceedings from The IEEE Conference on Control Applications/ International Symposium on Intelligent Control/ International Symposium on Computer Aided Control Systems Design (CCA/ISIC/CACSD 2004), Taipei, Taiwan, September 1-4 2004 Maurizio Piaggio, Antonio Sgorbissa , Renato Zaccaria ," A Programming Environment for Real Time Control of Distributed Multiple Robotic Systems " Advanced Robotics , The International Journal of the Robotics Society of Japan, Vol. 14, N.1, VSP Publisher, 2000. Arkin, R.C. and Gardner, W., 1990. "Reactive Inclinometer-Based Mobile Robot Navigation", Proc. 1990 IEEE International Conf. on Robotics and Automation, Cincinnati, OH, pp. 936-941 Zou Xi-yong and Zhu Jing “Virtual local target method for avoiding local minimum in potential field based robot navigation”, Journal of Zhejiang University SCIENCE (ISSN 1009-3095, Monthly) 2003 Vol. 4 No. 3 p.264-269. Arkin, R. 2000. Behavior-Based Robotics. MIT Press, Cambridge, MA .

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

57

Autonomous Robot Vision System for Environment Recognition Woong-Jae Won, Sang-Woo Ban and Minho Lee* School of Electrical Engineering and Computer Science, Kyungpook National University 1370 Sankyuk-Dong, Puk-Gu, Taegu 702-701, Korea *[email protected] Abstract. In this paper, we propose a novel autonomous robot vision system that can recognize an environment by itself. The developed system has self-motivation to search an interesting region using human-like selective attention and novelty scene detection models, and memorize the relative location information for selected regions. The selective attention model consists of bottom-up saliency map and low level topdown attention model. The novelty detection model generates a scene novelty using the topology information and energy signature in the selective attention model. Experimental results show that the developed system successfully identify an environment as well as changing of environment in nature scene. Keywords. Autonomous robot vision system, selective attention model, novelty scene detection, environment recognition

Introduction The present intelligent system has been mainly based on task-specific manners and/or the programmer’s input about environmental information. Even through such a traditional artificial intelligent robot has been successfully used for simple tasks, we need a new artificial intelligent paradigm to continuously adapt the changes of an environmental and to voluntarily increase a new knowledge without any interrupt by a programmer. This concept would be useful in developing a truly human-like robot and it could be applied to many engineering fields such as artificial secretary, household robot, medical treatment nursing robot, etc. Recent researches have been toward developing a more human-like machine such as an autonomous mental development (AMD) mechanism [1,2,3]. The AMD mechanism is that robots can increase their own knowledge through continuous interaction with human and environment. In order to implement such an intelligent system with the AMD concept, we need to endow the developed robot system with environment perception and novelty detection based on human interaction [4]. In this paper, we propose an autonomous robot vision system which can not only learn the surroundings with self-motivated desire but also detect a novelty in a dynamic scene. The developed system is based on human-like selective attention model which imitates the human visual search mechanism. The selective attention model uses bottom-up process in conjunction with an interactive process that ignores an unwanted area and/or pays attention to a desired area in subsequent visual search process through human interaction [5,6,7]. Novelty detection has been used in video streams by

58

W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition

Gaborski et al. [8], in which the subtraction between two saliency maps for two consecutive input frames was used to detect a novel scene. In section 1, we describe an object preferred selective attention model with a scaleability using a maximum entropy algorithm. In section 2, we explain a novelty detection and environment recognition model that can memorize a visual scan path topology and an energy signature for a dynamic scene. In section 3, we describe the implementation of proposed autonomous robot vision system, and experimental results will be followed. In final section, we summarize our results and mention about a further research.

1. Visual selective attention model The proposed selective attention model is implemented by imitating the functions of the visual cortex from the retina to the LIP (Lateral Intra-Parietal cortex) through the LGN and the visual cortex area [9]. Reward and punishment function of the hypothalamus and associated limbic structures is considered to develop the low level top-down attention model reflecting the reinforcement and inhibition function [6,9]. Figure 1 shows the proposed object preferred selective attention model which consists of bottom-up processing and low level top-down attention model.

Figure 1. Object preferred selective attention model

The simple biological visual pathway for bottom-up processing is from the retina to the visual cortex through the LGN [9]. In our approach, we use the bottom-up saliency map (SM) model that reflects the functions of the retina cells, the LGN and the visual cortex. Since the retina cells can extract edge and intensity information as well as color opponency, we use these factors as the basis features of the bottom SM model. In order to consider the function of the LGN and the ganglian cells, we consider the oncenter and off-surround operation by the Gaussian pyramid image with different scale, and then constructing three feature bases such as intensity, edge, color feature map. These feature maps make a candidate of salient regions by the bottom-up processing. In order to reflect the human-like attention that has prefer selection to an object, we consider a symmetry operator because an arbitrary object contains a symmetry

W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition

59

information, which is realized by the noise tolerant generalized symmetry transformation (NTGST) algorithm based on edge information [10]. After obtaining the candidate of selective attention regions, we modify the scan path to pay attention on an object by considering the degree of symmetry. Then, we can construct a saliency map which has more preference on an object [7]. Then, we adapt an entropy maximization approach to select the proper scale of the salient areas by using the bottom-up SM model which has prefer attention to an object. The scale selection algorithm is based on Kadir’s approach [11]. For each salient location, the proposed model chooses those scales at which the entropy is its maximum or peaked, then it weights the entropy value at such scales by some measure of the self-dissimilarity in the scale space of the bottom-up SM model. The most appropriate scale for each salient area centered at location x is obtained by equation

scale ( x )

arg m ax{ H D ( s , x ) u W D ( s , x )}

(1)

s

where D is the set of all descriptor values, H D ( s, x) is the entropy defined by equation (2) and WD ( s, x) is the inter-scale measure define by equation (3)

H D ( s , x )   ¦ p d , s , x log 2 p d , s , x

(2)

d D

W D (s, x) 

s2 2s 1

¦

p d , s ,x  p d , s 1,x

(3)

dD

where pd , s , x is the probability mass function for scale s, position x, and descriptor value d, which takes on value in D. Although the proposed bottom-up the SM model is generated only by primitive features such as intensity, edge, color and symmetry information. In order to develop a more plausible selective attention model, we need to develop an interactive procedure with a human supervisor, together with bottom-up information processing. Human beings ignore uninteresting area, even if they have primitive salient features, and they can memorize the characteristics of the unwanted area, Humans do not pay attention to new areas that have characteristic similar to learned unwanted area. In addition, human perception can focus on an interesting area, even if it does not have primitive salient features, or if it is less salient than other areas. We propose a now selective attention model that mimics the human-like selective attention mechanism and that can consider not only primitive input features, but also interactive properties in the environment as reflecting limbic structures including the hypothalamus function in our brain [9]. We use a fuzzy ART network “learns” and “memorizes” the characteristics of uninteresting and/or interesting areas decided by a human supervisor [6]. After the successful training of the fuzzy ART network, an unwanted salient area is inhibited and a desired area is reinforced by the vigilance value U of the fuzzy ART network [6,12]. As shown in figure 1, inputs for the fuzzy ART network can be composed of intensity, edge and color feature maps within a salient area. However, the sizes of the selected salient areas can be different, and thus we need to consider how we can adapt the input dimension of the fuzzy ART network corresponding to the various sizes of the salient regions. In this paper, we use a log-polar transformation to generate a fixed size of input dimension for the fuzzy ART network for various sizes of salient areas in the selective attention model. It means that one of the quantized areas by log-polar transformation corresponds to one of input neurons in the fuzzy ART network. The proposed model can generate a fixed uniform scale for every salient area with different

60

W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition

scales, of which the proposed fuzzy ART model makes to train the arbitrary scaled feature maps for the selected area in order to reinforce and inhibit a salient region.

2. Environment recognition based on novelty detection In order to develop a low level novelty scene detection model, we developed a robust scene description model with a tolerance against noise and a slight affine transformation such as translation, rotation, zoom-in and zoom-out in a dynamic environment. Such a model can be used to detect novelty by comparing the description of the current scene with that of an experienced scene, like the assumable function of the hippocampus in our brain. In the proposed model, we developed a relative topology matching method by using visual scan paths obtained from a saliency map, which were generated by the low level top-down attention model, together with the bottom-up saliency map (SM) model. We can describe an input scene robustly by using the visual scan path based on the selective attention model. Moreover, the size information of every salient area is obtained by the entropy maximization method, which is used to represent a scan path topology of an input scene. In addition, the energy signatures obtained from the SM are used as an additional factor to decide whether a novelty is occurred in the input scene [13]. The obtained scan path topology and energy signature are used as the input of the developed incremental computation model, which can incrementally memorize scene information that is experienced at a new place or at different time for the same place. Moreover, the proposed model is able to recognize the placement and the direction of a field of view from memorized scene information in the incremental computation model. Figure 2 shows the proposed incremental computation model of which the input consists of geometry topology of locations of the salient areas, the scale information of the salient areas and the relative energy signature, which are defined as a relative distance score f d , a relative scale score f s , and a energy score f e , respectively. If the model is executed for the first time, the incremental computation model needs to be initiated with only one output neuron. When the input of the incremental computation model is prepared, the proposed model obtains the location information pi of the camera system and the direction information d j of the field of view of the camera system, by feedback information such as encoder sensors. If the location information is new, the incremental computation model generates a new output node for that location and direction. After getting the location and direction information, the proposed model selects a winner node from among the nodes related with the dedicated location and direction. Then, the model checks the degree of similarity using a vigilance value U ' . If the degree of similarity between the current input scene and the winner node is less than the vigilance value U ' , then the proposed model indicates it is a novelty scene and creates a new node, otherwise the current input scene is considered as previously seen and the proposed model adjusts the weights of the winner node according to the input scene through the unsupervised learning. The developed model can recognize the location of its placement by using the trained incremental computation model for scene memorization and scene novelty detection, as described

W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition

61

above. The proposed placement recognition algorithm uses the same procedure that makes an input vector ( f d , f s , f e ) for the incremental computation model.

Figure 2. Incremental model for recognizing an environment

Assume that the training for novelty scene detection is successfully finished using the proposed incremental computation model. When the model is placed in a room without a priori information, the proposed computation model autonomously selects a winner node from among all of the output nodes according to the input vector ( f d , f s , f e ) . Then, the proposed computation model can get location and direction information as well as the instance that the model has been seen the scene from the indices pi and d j , respectively, of the selected winner node. Finally, the proposed model can indicate its placement from the experienced scene memory, whenever it is placed at some place.

3.

Hardware implementation and experimental results

We implement a pan-title unit for autonomous robot vision system for recognizing environment. Figure 3 shows implemented system called by SMART-v1 (Self Motivated Artificial Robot with Trainable selective attention model version 1).

Figure 3. SMART-v1 platform

62

W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition

The SMART-v1 has four DOF and 1394 CCD camera and Text to Speech module (TTS) to communicate with human to inquire about an attentive object, and title sensor to set offset position before starting moving. We use the Atmega128 as the motor controller and zigbee to transmit motor command from a PC to SMART-v1. The SMART-v1 can continuously search an interesting region by the selective attention model which is explained in section 2. Also, it can detect a novelty in dynamic situation as well as environment recognition. Followings are one of operating mode to recognize an environment (1) We set the SMART-v1 in arbitrary place. The SMART-v1 does not have any a priori knowledge for environment. (2) Then, the visual selective attention model operates for the first scene. After deciding the salient regions, it asks whether the selected region interests to human supervisor. After getting an answer about the selected region, the fuzzy ART network learns the region as inhibition and reinforcement region according to supervisor’s decision. (3) The SMART-v1 memorizes the location and direction information of the selected regions by the explained the environment recognition model using the motor encoder signals. (4) If the SMART-v1 detects a novelty in a scene, then it asks the human what it is. Then human gives a labeling, which consists of the geometry topology of locations of the salient areas, the scale information of the salient areas and the relative energy signature, for the selected region. The SMART-v1 simply memorizes the characteristics of the selected region by color histogram and topology of orientation information. (5) After training the environment, we can ask environment information to the SMART-v1 where a specific object is. Then, the SMART-v1 can reply the direction and location information based on past experience. Figure 4 show the experimental results of the proposed selective attention model. Figure 4 (a) is the result of bottom-up SM model, Figure 4 (b) is the result after inhibition for the second attention region in Figure 4 (a). After considering the symmetry information, we can ignore the 5th attentive region because it does not contain symmetry information so much. Finally, we can change the scan path to pay attention a left computer monitor rather than a red cushion as most salient region using the reinforcement fuzzy ART network as shown in Figure 4 (d).

(a)

(b)

(c)

(d)

Figure 4. The result of selective attention model : (a) bottom-up attention point, (b)inhibit for 2nd attention region in (a), (c) ignore for 5th attention region in (b), (d)reinforcement for 2nd attention region in (c)

Figure 5 shows that it is hard to indicate a novel scene by simply subtracting the saliency maps in a dynamic environment, in which the obtained input scenes usually

63

W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition

have different affine transformed field of view each other. Therefore, Gaborski’s [8] approach can not apply to detecting novel scenes in the dynamic environment. Figure 6 shows that the proposed model can successfully detect a novelty scene in a dynamic environment. Figure 6 (c) has novelty information, while figures 6 (a) and (b) do not have novelty information, except for a slight change in the field of view. Moreover, figures 6 (d) and (e) show that the proposed model can get similar energy signatures in two corresponding SMs. Table 1 shows the performance of the proposed novelty scene detection model using total 61 scenes which consist of 44 novelty scenes and 17 nonovelty scenes at three directions in the interior environment.

(a)

(b)

(c)

(d)

(e)

Figure 5. Scan path topologies and saliency maps for two different time scenes for the assumed same scene that has a little affine transformed field of view: (a) and (b) scan path topologies at time t and t + dt, respectively, (c) and (d) saliency maps at time t and t+dt, respectively, (e) subtraction between (c) and (d)

(a)

(b)

(c)

(d)

(e)

(f)

Figure 6. Scan path topologies and saliency maps for three different time scenes for the assumed same scene that has a little affine transformed field of view: (a), (b) and (c) scan path topologies at time t, t + dt and t + 2dt, respectively, (d), (e) and (f) saliency maps at time t, t+dt and t+2dt, respectively.

Table 1. The performance of the proposed novelty scene detection model:61 scenes were considered to evaluate among which 44 scenes were novel scenes and the remainder 17 scenes were no novel scenes. (TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative)

Novelty perception No-novelty perception Correct Recognition Rate

Novelty scenes (44 scenes) TP 41 FN 3 95.3%

No-novelty scenes (17 scenes) FP 2 TN 15 88.2%

Total (61 scenes) F(P+N) 5 T(P+N) 56 91.8%

64

4.

W.-J. Won et al. / Autonomous Robot Vision System for Environment Recognition

Conclusion

We implemented a novel autonomous robot vision system that can detect a novelty as well as environment recognition. The proposed system is based on human-like selective attention model which consists of bottom-up SM and top-down reinforcement and inhibition models. The variant scale problem in selective attention regions can be solved by introducing the log-polar transformation. The incremental model based on novelty detection successfully recognizes both the environment and change of environment. As a future work, we are investigate in efficient way to describe an attention region including a task-nonspecific object representation Acknowledgment This research was funded by the Brain Neuroinformatics Research Program of the Ministry of Commerce, Industry and Energy. References [1]

C. Breazeal : Imitation as Social Exchange between Humans and Robots, in proceedings of the Symposium on Imitation in Animals and Artifacts (AISB99), Edinburg, Scotland, 1999, 96 - 104. [2] J. Weng et, all: A theory for mentally developing robots, The 2 International conference on Development and Learning on Development and Learning, 2002, 131 - 140. [3] B. Scassellati: Investigating models of social development using a humanoid robot, in proceedings of International Joint Conference on Neural Networks, 2003, Portland, USA. 2704 - 2709. [4] S. Singh and M. Markou: An approach to novelty detection applied to the classification of image regions, IEEE Trans. On Knowledge and Data Eng, 16 (2004), 396-407. [5] L. Itti, C. Koch, E. Niebur: A model of saliency-based visual attention for rapid scene analysis, IEEE Trans PATT Mach Intell ,20 (1998), no. 11, 1254 -1259. [6] S. B. Choi, S. W. Ban, M Lee: Human-Like selective Attention Model with Reinforcement and Inhibition Mechanism, in proceedings of the 11th International Conference on Neural Information Processing, 2004, Calcutta, India, 694- 699. [7] W. J. Won, S. W. Ban, M. Lee: Real Time Implementation of a selective attention model for the intelligent robot with autonomous mental development, in proceeding of International Symposium on Industrial Electronics 2005, Dubrovnik, Croatia, D7-D5. [8] R.S. Gaborski, V.S. Vanigankar, V.S. Chaoji and A.M. Teredesai, “VENUS: A system for novelty detection in video streams with learning,” in proceedings of AAAI 2004, San Jose, USA, 2004, pp. 1-5. [9] E.B GoldStein (Ed.), Sensation and perception 4th ed., USA: International Thomson Publishing Company, 1996. [10] K. S. Seo, C. J. Park, C. J. Cho and H. M. Choi: Context-free maker-controlled watershed transform for efficient multi-object detection and segmentation, IEICE Trans, 384-A (2001), no.6, 1392-1400. [11] T. Kadir and M. Brady: Scale, saliency and image description, International Journal of Computer Vision, 45 (2001), 83 -105. [12] G.A Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds and D.B. Rosen: Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Trans. on Neural Networks, 3 (1992), No.5, 698-713. [13] S.-W. Ban and M. Lee: Novelty analysis in dynamic scene for autonomous mental development, Lecture notes in Computer Science, 3696 (2005), 1-6.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

65

Multi-resolution Field D* Dave Ferguson and Anthony Stentz The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA {dif, tony}@cmu.edu Abstract. We present a multi-resolution path planner and replanner capable of efficiently generating paths across very large environments. Our approach extends recent work on interpolation-based planning to produce direct paths through non-uniform resolution grids. The resulting algorithm produces plans with costs almost exactly the same as those generated by the most effective uniform resolution grid-based approaches, while requiring only a fraction of their computation time and memory. In this paper we describe the algorithm and report results from a number of experiments involving both simulated and real field data.

1. Introduction A common task in mobile robot navigation is to plan a path from an initial location to a desired goal location. Typically, this is done by storing the information about the environment in a discrete traversability cost map, where the cost of traversing each area depends on any number of factors, including safety risk, traversal time, or fuel expended, and then planning over this cost map. In robotics, it is common to improve the efficiency of planning by approximating the cost map as a graph. For instance, given a cost map consisting of a 2D grid of cells of uniform size (or resolution), a graph is typically constructed by placing nodes at the centers or corners of the grid cells and inserting edges between each node and the nodes in its 8 adjacent cells. Several algorithms exist for planning paths over such graphs. Dijkstra’s algorithm and A* are useful for computing initial paths to the goal from every location or a single location, respectively [2,6]. D* and D* Lite are very effective at repairing these paths when new information about the environment is received [9,5]. Combining uniform resolution grids with these incremental planning algorithms has been a very popular approach for mobile robot navigation in partially-known or dynamic environments. However, such an approach has two significant limitations. Firstly, the small, discrete set of transitions modeled by the graph can reduce the quality of the resulting paths. In the case of a uniform resolution grid, each node in the extracted graph only has edges connecting it to its 8 closest neighboring nodes. This limits the heading of any solution path to increments of π4 . This can lead to suboptimal paths and undesirable behavior [3]. Secondly, uniform resolution grids can be very memory-intensive. This is because the entire environment must be represented at the highest resolution for

66

D. Ferguson and A. Stentz / Multi-Resolution Field D*

Figure 1. (left, center ) Multi-resolution Field D* produces direct, low-cost paths (in blue/dark gray) through both high-resolution and low-resolution areas. (right) The GDRS XUV robot used for autonomous navigation of outdoor terrain. Multi-resolution Field D* was initially developed in order to extend the effective range of rugged outdoor vehicles such as this by one to two orders of magnitude.

which information is available. For instance, consider a robot navigating a large outdoor environment with a prior overhead map. The initial information contained in this map may be coarse. However, the robot may be equipped with onboard sensors that provide very accurate information about the area within some field of view of the robot. Using a uniform resolution grid-based approach, if any of the high-resolution information obtained from the robot’s onboard sensors is to be used for planning, then the entire environment needs to be represented at a high-resolution, including the areas for which only the low-resolution prior map information is available. Storing and planning over this representation can require vast amounts of memory. To overcome the first of these limitations, improvements to uniform resolution grid-based planners have been developed that use interpolation to produce paths not constrained to a small set of headings [7,3]. In particular, the Field D* algorithm extends the D* family of algorithms by using linear interpolation to derive the path cost of points not sampled in the grid [3]. This algorithm efficiently produces very low-cost paths with a range of continuous headings. A number of techniques have been devised to address the second limitation. One popular approach is to use quadtrees rather than uniform resolution grids [8]. Quadtrees offer a compact representation by allowing large constant-cost regions of the environment to be modeled as single cells. They thus represent the environment using grids containing cells of varying sizes, known as non-uniform resolution grids or multi-resolution grids. However, paths produced using quadtrees and traditional quadtree planning algorithms are again constrained to transitioning between the centers or corners of adjacent cells and can thus be grossly suboptimal. More recently, framed quadtrees have been used to alleviate this problem somewhat [1,10]. Framed quadtrees add cells of the highest resolution around the boundary of each quadtree region and allow transitions between these boundary cells through the quadtree region. As a result, the paths produced can be much less costly, but the computation and memory required can be large due to the overhead of the representation (and in pathological cases can become significantly more than is required by a full high-resolution representation). Also, segments of the path between adjacent

D. Ferguson and A. Stentz / Multi-Resolution Field D*

67

high-resolution cells suffer the same π4 heading increment restrictions as classical uniform resolution grid approaches. In this paper, we present an approach that combines the ideas of interpolation and non-uniform resolution grid representations to address both of the limitations mentioned above. The resulting approach provides very cost-effective paths for a fraction of the memory and, often, for a fraction of the time required by uniform resolution grid-based approaches. We begin by describing how linear interpolation can be used to provide direct, low-cost paths through non-uniform resolution grids. We then present our new algorithm, Multi-resolution Field D*, along with a number of example illustrations and results comparing it to its uniform resolution predecessor.

2. Interpolation-based Path Planning Given a grid with nodes at the corners of every grid cell, classic grid-based planners restrict the available actions from each node to be direct paths to neighboring nodes. This restriction affects the quality and effectiveness of grid-based paths, as mentioned above. 2.1. Using interpolation to produce more accurate path cost estimates We can overcome the limitations of classical grid-based planning if we allow each node to transition to any point on any neighboring edge, rather than to just the small set of neighboring nodes. We define the neighboring edges of a node s to be all edges that can be reached from s via a straight-line path for which s is not an endpoint (see Figure 2 (left)). The rationale here is that the optimal path from s to the goal must pass through one of these neighboring edges, so if we knew the optimal paths from every point on any of these edges to the goal we could compute the optimal path for s. If E is the set of neighboring edges of s and PE is the set of all points on these edges, then the cost of the optimal path for s is g ∗ (s) = min (c(s, p) + g ∗ (p)), p∈PE

(1)

where c(s, p) is the cost of the cheapest path from s to p that does not travel through any other neighboring edges of s, and g ∗ (p) is the cost of the optimal path from point p to the goal. Unfortunately, there are an infinite number of points p on any one of the neighboring edges of s, so solving for the least-cost paths from each of these points is impossible. Fortunately, a method was presented in [3] that uses linear interpolation to approximate the path cost of any point on an edge using just the path costs of the two endpoints of the edge. In that work, this approximation was used to provide an efficient closed form solution to the above path cost calculation for any given corner node s in a uniform resolution grid. Using this method, very accurate paths through uniform resolution grids can be efficiently computed.

68

D. Ferguson and A. Stentz / Multi-Resolution Field D*

(a)

(b)

(c)

(d)

Figure 2. (left) The neighboring edges (dashed in red/gray, along with their endpoints in gray) from a given node (white) in a grid containing cells with two different resolutions: low-resolution and high-resolution. (a, b) Some of the possible paths to a neighboring edge/node from a low-resolution node. On the left are the possible optimal path types (in blue/dark gray) through the top low-resolution edge (dashed in red/gray) and its endpoint nodes (in gray). Linear interpolation is used to compute the path cost of any point along the top edge. On the right are the possible optimal path types (in blue/dark gray) to one neighboring high-resolution corner node (in gray). (c, d) Some of the possible paths (in blue/dark gray) from a high-resolution corner node (white) to a neighboring low-resolution edge (c) and to a high-resolution node (d, left) and edge (d, right).

2.2. Combining interpolation with multiple resolutions We can use the same basic idea to provide accurate path costs for nodes in non-uniform resolution grids. The main difference is that, while in the uniform resolution case each node s has exactly 8 neighboring edges of uniform length, in the non-uniform resolution case a node may have many more neighboring edges with widely-varying lengths. However, linear interpolation can still be used to approximate the path costs of points along these edges, exactly as in the uniform resolution case. As a concrete example of how we compute the path cost of a node in a multi-resolution grid, we now focus our attention on a grid containing cells of two different resolutions: high-resolution and low-resolution. This two-resolution case addresses the most common navigation scenario we are confronted with: a lowresolution prior map is available and the robot is equipped with high-resolution onboard sensors. Although we restrict our attention to this scenario, the approach is general and can be used with arbitrarily many different resolutions. In a grid containing two different resolutions, each node can reside on the corner of a low-resolution cell, the corner of a high-resolution cell, and/or the edge of a low-resolution cell. Examples of each of these possibilities can be seen in Figure 2(b): the white node is the corner of a low-resolution cell and the gray node is the corner of a high-resolution cell and on the edge of a low-resolution cell. Let’s look at each of these possibilities in turn. Firstly, imagine we have a node that resides on the corner of a low-resolution cell. We can calculate the least-cost path from the node through this cell by

D. Ferguson and A. Stentz / Multi-Resolution Field D*

69

ComputePathCost(s) 1 vs = ∞; 2 for each cell x upon which s resides 3 if x is a high-resolution cell 4 for each neighboring edge e of s that is on the boundary of x 5 vs = min(vs , minp∈Pe (c(s, p) + g i (p))); 6 else 7 for each neighboring edge e of s that is on the boundary of x 8 if e is a low-resolution edge 9 vs = min(vs , minp∈Pe (c(s, p) + g i (p))); 10 else 11 vs = min(vs , minp∈EPe (c(s, p) + g(p))); 12 return vs ;

Figure 3. Computing the Path Cost of a Node s in a Grid with Two Resolutions

looking at all the points on the boundary of this cell and computing the minimumcost path from the node through any of these points. We can approximate the cost of this path by using linear interpolation to provide the path cost of arbitrary boundary points, exactly as in uniform resolution Field D*. However, some of the boundary may be comprised of high-resolution nodes. In such a case, we can either use interpolation between adjacent high-resolution nodes and allow the path to transition to any point on an adjacent high-resolution edge, or we can restrict the path to transitioning to one of the high-resolution nodes. The former method provides more accurate approximations, but it is slightly more complicated and less efficient. Depending on the relative sizes of the high-resolution cells and the low-resolution cells, either of these approaches may be appropriate. For instance, if the high-resolution cells are much smaller than the low-resolution cells, then interpolating across the adjacent high-resolution edges when computing the path from a low-resolution node is not critical, as there will be a wide range of heading angles available just from direct paths to the adjacent high-resolution nodes. However, if the high-resolution cells are not significantly smaller than the lowresolution cells then this interpolation becomes important, as it allows much more freedom in the range of headings available to low-resolution nodes adjacent to high-resolution nodes. In Figure 2 we illustrate the latter, simpler approach, where interpolation is used to compute the path cost of points on neighboring, strictly low-resolution edges (e.g. the top edge in (a)), and paths are computed to each neighboring high-resolution node (e.g. the gray node on the right edge in (b)). For nodes that reside on the corner of a high-resolution cell, we can use interpolation to approximate the cost of the cheapest path through the highresolution cell (see the paths to the right edge in (d)). Finally, for nodes that reside on the edge of a low-resolution cell, we can use a similar approach as in our low-resolution corner case. Again, we look at the boundary of the low-resolution cell and use interpolation to compute the cost of points on strictly low-resolution edges (e.g. the top edge in (d)), and for each high-resolution edge we can choose between using interpolation to compute the cost of points along the edge, or restricting the path to travel through one of the endpoints of the edge. The latter approach is illustrated for computing a path through the left edge in (d).

70

D. Ferguson and A. Stentz / Multi-Resolution Field D*

key(s) 1 return [min(g(s), rhs(s)) + h(sstart , s); min(g(s), rhs(s))]; UpdateNode(s) 2 if s was not visited before, g(s) = ∞; 3 if (s = sgoal ) 4 rhs(s) = ComputePathCost(s); 5 if (s ∈ OPEN) remove s from OPEN; 6 if (g(s) = rhs(s)) insert s into OPEN with key(s); ComputeShortestPath() ˙ key(sstart ) OR rhs(sstart ) = g(sstart )) 7 while (mins∈OPEN (key(s))< 8 remove node s with the minimum key from OPEN; 9 if (g(s) > rhs(s)) 10 g(s) = rhs(s); 11 for all s ∈ nbrs(s) UpdateNode(s ); 12 else 13 g(s) = ∞; 14 for all s ∈ nbrs(s) ∪ {s} UpdateNode(s ); Main() 15 g(sstart ) = rhs(sstart ) = ∞; g(sgoal ) = ∞; 16 rhs(sgoal ) = 0; OPEN = ∅; 17 insert sgoal into OPEN with key(sgoal ); 18 forever 19 ComputeShortestPath(); 20 Wait for changes to grid or traversal costs; 21 for all new cells or cells with new traversal costs x 22 for each node s on an edge or corner of x 23 UpdateNode(s);

Figure 4. The Multi-resolution Field D* Algorithm (basic version)

Thus, for each node, we look at all the cells that it resides upon as either a corner or along an edge and compute the least-cost path through each of these cells using the above approximation technique. We then take the cheapest of all of these paths and use its cost as the path cost of the node. Pseudocode of this technique is presented in Figure 3. In this figure, Pe is the (infinite) set of all points on edge e, EPe is a set containing the two endpoints of edge e, g i (p) is an approximation of the path cost of point p (calculated through using linear interpolation between the endpoints of the edge p resides on), c(s, p) is the cost of a minimum-cost path from s to p, and g(p) is the current path cost of corner point p. We say an edge e is a ‘low-resolution edge’ (line 8) if both the cells on either side of e are low-resolution. An efficient solution to the minimizations in lines 5 and 9 was presented in [3].

D. Ferguson and A. Stentz / Multi-Resolution Field D*

71

Figure 5. Multi-resolution Field D* used to guide an agent through a partially-known environment. On the left is a section of the path already traversed showing the high-resolution cells. This data was taken from Fort Indiantown Gap, Pennsylvania.

3. Multi-resolution Field D* The path cost calculation discussed above enables us to plan direct paths through non-uniform resolution grids. We can couple this with any standard path planning algorithm, such as Dijkstra’s, A*, or D*. Because our motivation for this work was robotic path planning in unknown or partially-known environments, we have used it to develop an incremental replanning algorithm which we call Multi-resolution Field D*. A basic version of the algorithm is presented in Figure 4. Here, the ComputeMinPathCost function (line 4) takes a node s and computes the path cost for s using the path costs of all of its neighboring nodes and interpolation across its neighboring edges. Apart from this, notation follows the D* Lite algorithm: g(s) is the current path cost of node s, rhs(s) is the one-step lookahead path cost for s, OPEN is a priority queue containing inconsistent nodes (i.e., nodes s for which g(s) = rhs(s)) in increasing order of key values (line 1), sstart is the initial agent node, and sgoal is the goal node. h(sstart , s) is a heuristic estimate of the cost of a path from sstart to s. Because the key value of each node contains two ˙ key(s ) iff the first quantities a lexicographic ordering is used, where key(s) <  element of key(s) is less than the first element of key(s ) or the first element of key(s) equals the first element of key(s ) and the second element of key(s) is less than the second element of key(s ). For more details on the D* Lite algorithm and this terminology, see [5]. As with other members of the D* family of algorithms, significant optimizations can be made to this initial algorithm. In particular, several of the optimizations presented in [4] are applicable and were used in our implementation. Because of its use of interpolation, Multi-resolution Field D* produces extremely direct, cost-effective paths through non-uniform resolution grids. Figures 1 and 5 show example paths planned using the algorithm.

72

D. Ferguson and A. Stentz / Multi-Resolution Field D*

Figure 6. Computation time and solution cost as a function of how much of the environment is represented at a high-resolution. The x-axis of each graph depicts the percentage of the map modeled using high-resolution cells, ranging from 0 (all modeled at a low-resolution) to 100 (all modeled at a high-resolution). (top) Initial planning. A path was planned from one side to the other of 100 randomly generated environments and the results were averaged. (bottom) Replanning. 5% of each environment was randomly altered and the initial paths were repaired.

4. Results Multi-resolution Field D* was originally developed in order to extend the effective range over which unmanned ground vehicles (such as the XUV vehicle in Figure 1) could operate by orders of magnitude. We have found the algorithm to be extremely effective at reducing both the memory and computational requirements of planning over large distances. To quantify its performance, we ran several experiments comparing Multi-resolution Field D* to uniform resolution Field D*. We used uniform resolution Field D* for comparison because it produces less costly paths than regular uniform grid-based planners and far better paths than regular non-uniform grid-based planners. Our first set of experiments investigated both the quality of the solutions and the computation time required to produce these solutions as a function of the memory requirements of Multi-resolution Field D*. We began with a randomly generated 100 × 100 low-resolution environment with an agent at one side and a goal at the other. We then took some percent p of the low-resolution cells (centered around the agent) and split each into a 10 × 10 block of high-resolution cells. We varied the value of p from 0 up to 100. We then planned an initial path to the goal. Next, we randomly changed 5% of the cells around the agent and replanned a path to the goal. We focussed the change around the robot to simulate new information being gathered in its vicinity. The results from these experiments are presented in Figure 6. The x-axis of each graph represents how much of the environment was represented at a high-resolution. The left graphs show how the time required for planning changes as the percent of high-resolution cells increases, while the middle and right graphs show how the path cost changes. The y-values in the middle and right graphs are path costs relative to the path cost computed when 100% of the environment is represented at a high-resolution. The right

73

D. Ferguson and A. Stentz / Multi-Resolution Field D*

Total Planning and Replanning Time (s) Initial Planning Time (s) Average Replanning Time (s) Percent high-resolution cells

Field D*

Multi-res Field D*

0.493

0.271

0.336

0.005

0.0007

0.0012

100

13

Figure 7. Results for uniform resolution Field D* versus Multi-resolution Field D* on a simulated robot traverse through real data acquired at Fort Indiantown Gap. The robot began with a low-resolution map of the area and updated this map with a high-resolution onboard sensor as it traversed the environment. The map was 350 × 320 meters in size.

graph shows the standard error associated with the relative path costs for smaller percentages of high-resolution cells. As can be seen from these results, modeling the environment as mostly low-resolution produces paths that are only trivially more expensive than those produced using a full high-resolution representation, for a small fraction of the memory and computational requirements. This first experiment shows the advantage of Multi-resolution Field D* as we reduce the percentage of high-resolution cells in our representation. However, because there is some overhead in the multi-resolution implementation, we also ran uniform resolution Field D* over a uniform, high-resolution grid to compare the runtime of this algorithm with our Multi-resolution version. The results of uniform resolution Field D* have been overlaid on our runtime graphs. Although uniform resolution Field D* is more efficient than Multi-resolution Field D* when 100% of the grid is composed of high-resolution cells, it is far less efficient than Multi-resolution Field D* when less of the grid is made up of high-resolution cells. Our second experiment simulated the most common case for outdoor mobile robot navigation, where we have an agent with some low-resolution prior map and high-resolution onboard sensors. For this experiment, we took real data collected from Fort Indiantown Gap, Pennsylvania, and simulated a robotic traverse from one side of this 350 × 320 meter environment to the other. We blurred the data to create a low-resolution prior map (at 10 × 10 meter accuracy) that the robot updated with a simulated medium-resolution sensor (at 1×1 meter accuracy with a 10 meter range) as it traversed the environment. The results from this experiment are shown in Figure 7. Again, Multiresolution Field D* requires only a fraction of the memory of uniform resolution Field D*, and its runtime is very competitive. In fact, it is only in the replanning portion of this final experiment that Multi-resolution Field D* requires more computation time than uniform resolution Field D*, and this is only because the overhead of converting part of its map representation from low-resolution to highresolution overshadows the trivial amount of processing required for replanning. In general, our results have shown that Multi-resolution Field D* produces paths that are trivially more expensive than those returned by uniform resolution Field D*, and it provides these paths for a fraction of the memory. Further, in many cases it is also significantly more computationally efficient than uniform resolution Field D*.

74

D. Ferguson and A. Stentz / Multi-Resolution Field D*

5. Conclusions We have presented Multi-resolution Field D*, an interpolation-based path planning algorithm able to plan direct, low-cost paths through non-uniform resolution grids. This algorithm addresses two of the most significant shortcomings of grid-based path planning: solution quality and memory requirements. By using a non-uniform resolution grid to represent the environment and linear interpolation to approximate the path costs of points not sampled on the grid, Multi-resolution Field D* is able to provide solutions of comparable quality to the most effective uniform resolution grid-based algorithms for a fraction of their memory and computational requirements. It is our belief that, by combining interpolation with non-uniform representations of the environment, we can ‘have our cake and eat it too’, with a planner that is extremely efficient in terms of both memory and computational requirements while still producing high quality paths. 6. Acknowledgements This work was sponsored in part by the U.S. Army Research Laboratory, under contract “Robotics Collaborative Technology Alliance” (contract number DAAD19-01-2-0012). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies or endorsements of the U.S. Government. Dave Ferguson is supported in part by a National Science Foundation Graduate Research Fellowship. References [1] D. Chen, R. Szczerba, and J. Uhran. Planning conditional shortest paths through an unknown environment: A framed-quadtree approach. IEEE International Conference on Intelligent Robots and Systems (IROS), 1995. [2] E. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959. [3] D. Ferguson and A. Stentz. Field D*: An Interpolation-based Path Planner and Replanner. International Symposium on Robotics Research (ISRR), 2005. [4] D. Ferguson and A. Stentz. The Field D* Algorithm for Improved Path Planning and Replanning in Uniform and Non-uniform Cost Environments. Technical Report CMU-RI-TR-05-19, Carnegie Mellon School of Computer Science, 2005. [5] S. Koenig and M. Likhachev. D* Lite. National Conference on Artificial Intelligence (AAAI), 2002. [6] N. Nilsson. Principles of Artificial Intelligence. Tioga Publishing Company, 1980. [7] R. Philippsen and R. Siegwart. An Interpolated Dynamic Navigation Function. IEEE International Conference on Robotics and Automation (ICRA), 2005. [8] H. Samet. Neighbor Finding Techniques for Images Represented by Quadtrees. Computer Graphics and Image Processing, 18:37–57, 1982. [9] Anthony Stentz. The Focussed D* Algorithm for Real-Time Replanning. International Joint Conference on Artificial Intelligence (IJCAI), 1995. [10] A. Yahja, S. Singh, and A. Stentz. An efficient on-line path planner for outdoor mobile robots operating in vast environments. Robotics and Autonomous Systems, 33:129–143, 2000.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

75

Path and Observation Planning of Vision-based Mobile Robots with Multiple Sensing Strategies1 Mitsuaki Kayawake a,2 , Atsushi Yamashita a and Toru Kaneko a a Department of Mechanical Engineering Shizuoka University Abstract. In this paper, we propose a new path and viewpoint planning method for a mobile robot with multiple observation strategies. When a mobile robot works in the constructed environments such as indoor, it is very effective and reasonable to attach landmarks on the environment for the vision-based navigation. In that case, it is important for the robot to decide its motion automatically. Therefore, we propose a motion planning method that optimizes the efficiency of the task, the danger of colliding with obstacles, and the accuracy and the ease of the observation according to the situation and the performance of the robots. Keywords. Path planning, Viewpoint planning, Mobile robot, Landmark

1. Introduction In this paper, we propose a new path and viewpoint planning method for a mobile robot that has two active cameras according to the performance of the robot. The robot navigation is a very important technology to execute various tasks. The navigation is usually executed while the robots move and estimate their positions and orientations by using the information from several sensors. A dead-reckoning is a method that can estimate the robot position and orientation with internal sensor. However the error of it is accumulated in proportion to the traveling distance and the recovery of the error is impossible only with internal sensors. Therefore, external sensors are always utilized for the robot navigation. The robot can observe landmarks in the environment and measure the relationship between these points and the robot itself in the image-based navigation. When the robots observe them, the problems are how to measure the accurate position and orientation of landmarks, and where and which landmarks the robots should observe while there are multiple landmarks. As to the former problem, there are a lot of studies that improve the accuracy of 3-D measurement, i.e., [1]. However, there is a limit in accuracy when the robot always observes the same landmark regardless of the distance between the robot and the landmark. 1 Tel.:

81 53 478 1604; Fax: 81 53 478 1604; [email protected].

2 E-mail:

76

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots

Therefore, the latter problem is very important for the robot navigation. This means that the robot must decide the path and choose the observing landmarks according to its position while it moves[2]. Navigation planning methods under the uncertainty are also proposed[3]-[6]. On the other hand, the design of the optimal arrangement of artificial landmark is very important in the indoor environment[7]. Takeuchi et al. proposed a method to dispose artificial landmarks in the environment, and to navigate a mobile robot there[8]. The landmarks are disposed so that the robot can observe at least one landmark at any positions, and the robot plans the observation so that they can observe landmarks.

2. Purpose We have already proposed the landmark observation method with multiple strategies[9]. In [9], the robot used the C-obstacle (configuration obstacle) to escape the danger of the collision with the obstacle, and the only width of the expansion of C-obstacle was decided. However, when the given environment is complex, deciding the expanding width becomes difficult. If the expanding width of C-obstacle was unsuitable, the robot was not able to reach the destination, or it had to travel unnecessary long moving distance. Therefore, the robot had to look for appropriate expanding width of C-obstacle. Then, the robot in this work searches for the path of the expanding width of two or more Cobstacles, and discovers the path in appropriate width among that. In this method, a primary value of the expanding width of C-obstacle is given to the robot, and the robot continue to increase constant width about the expanding width of C-obstacle. The robot searches for the path in the expanding width of each C-obstacle. When the path that can be connected with the destination is not found, the robot ends the search for the path. Using observation technique of [9], we propose the improvement technique for path planning. We make the shape of C-obstacle better. In [9], the vicinity of the vertex of Cobstacle has been expanded more widely than original expanding width. Therefore, there was a problem to which it was not able to pass in the path that should be able to pass. In this work, the corner of C-obstacle is expressed by the polygon. The expanding width in the vicinity of the vertex approximates to original expanding width by this method, and solves problem of [9]. In [9], the n-th shortest paths were searched in each visibility graph by a simple brute force method. In our path planning, the path is planned by Dijkstra algorithm. Even if the combination of all the vertices is not confirmed, this technique can discover the shortest path. As the result, the amount of the path searching is decreased. Optimal path and viewpoint planning is necessary for the robot to execute works efficiently. At the same time, “optimal” path and viewpoints change according to the performance of the robot. Some previous works can plan the optimal paths and viewpoints of the robot [10]. However, they don’t consider the relationship between optimal planning results and the performance of the robot explicitly. For the robot with good performance, path can be planned by giving priority to moved distance more than safety. On the other hand, the robot with bad performance will be able surely to reach the goal the plan of path to which it gives priority to safety. Therefore, “optimal” path and viewpoints depend on the performance of the robot. There are multiple evaluation methods such as high accuracy, path length, and safety. Evaluation methods also change when the performance changes. Therefore, we propose a new path and viewpoint planning method for a mobile

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots

77

Environmental information

Setting primary error tolerance Replan

Shortest path planning Observation planning

Possible to reach the destination ? Yes

No

Alternative path planning Repeat

Observation planning

All vertexes of C-obstacles are selected

No Yes

Changing in error tolerance Can the path connected from the start to the goal be planned?

Yes

No

Selection of the shortest path that can reach Motion planning

Figure 1. Path and observation planning.

robot with multiple observation strategies according to the performance of the robot. We verify the effectiveness of this method through simulations. The accuracy of multiple observation strategies depends on the positions of landmarks that can be observed. Therefore, the robot must choose the optimal landmarkobservation strategy to consider the number and the configuration of visible landmarks in each place. The path and viewpoints are planned by considering not only the accuracy of the optimal observation strategy but also the dead-reckoning error of the robot also plans the path and viewpoints. Our motion planning method is designed to guarantee that the robot never collides with obstacle. The robot selects the shortest path from the safe paths. After deciding the shortest path, the number of viewpoints is minimized. This is because it is desirable that the cameras equipped with the robot are utilized for the other uses of the landmarkobservation. The motion planning begins from the input of environmental information including the landmark position, the start position, the goal position, and the extra information to the robot (Figure 1). The shortest path and the viewpoint are planned from input environmental information. As for shortest path, it is distinguished whether to reach the goal without colliding with the obstacle by the observation plan. When it is possible to reach goal position, error tolerance is changed and it plans from the shortest path plan again. When it is not possible to reach the goal, paths other than shortest path are planned and

78

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots

Landmark 1

Landmark

Landmark 1 Landmark 2 Landmark 3

Landmark 2 (a) Stereo observation.

(b) Two-landmark observation.

(c) Three-landmark observation.

Figure 2. Three observation strategies.

the viewpoint of those is planned. Shortest path that can reach the goal in searched path is assumed to be optimal path.

3. Three Observation Strategies 3.1. Problem Statement We make the assumptions for the planning of the robot navigation. The robot can move in all direction at any time and uses the information from two active cameras that can change their directions independently. The shape of the robot is expressed as the circle whose radius is R. The environment map that indicates the positions of walls, obstacles, and landmarks is also previously provided to the robot. Therefore the map is available in the robot. All landmarks whose heights are same with those of robot’s cameras are attached to the environment. The shape of the landmark is a circle and the radius of each landmark is constant. Each landmark can be distinguished with each other. The detail of each error of the observation is explained in [9]. We develop three observation strategies: (a) stereo observation, (b) two-landmark observation, and (c) three-landmark observation (Figure 2). 3.2. Stereo Observation The stereo observation can be executed when one or more landmarks are inside the robot’s field of view. The robot estimates its position and orientation with the triangulation. In this strategy, the 3-D positions of left and right ends of the observed landmark are measured with the information of disparities. The position and orientation of the robot in the world coordinate can be calculated from the coordinate value of two points. 3.3. Two-Landmark Observation The two-landmark observation can be executed when two or more landmarks are inside the robot’s field of view. Left and right ends of the nearer landmark and a center point of the distant landmark are extracted from two acquired images. This observation calculates only angle information from that image. The position and the orientation of the robot in the world coordinate can be decided as a result. 3.4. Three-Landmark Observation The three-landmark observation can be executed when three or more landmarks are inside the robot’s field of view. The relationship between three landmarks and the robot is estimated from the coordinate value of the center of three landmarks in images.

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots Accurate position of landmark

Accurate size of landmark

79

Observed size of landmark

v

u

Observed position of landmark

r

(a) Position error. (b) Size error. Figure 3. Image error.

Next position error from position error

Next viewpoint (ideal)

Next viewpoint (ideal) Robot

Angle error

Next viewpoint (actual)

Position error

Next position error from angle error

Next viewpoint (actual) Robot

(a) Position error. (b) Orientation error. Figure 4. Position and orientation error of robot.

The accuracy of the estimated position and orientation become higher as compared with two-landmark observation method. This is because the distance between the extracted points in images is larger in three-landmark observation than in two-landmark observation. In addition, the image noise at the edge of the landmark is generally larger than that at the center, because the coordinate value of the center position in the image coordinate can be obtained accurately by calculating the center of gravity of the pixels that belong to the landmark. 3.5. Optimal Observation Strategy The robot chooses the optimal landmark-observation strategy that can estimate its position and orientation precisely. The optimal strategy can be decided when the path and the position of the viewpoint is planned. At first, the visible landmarks in each place are selected to consider the robot’s field of view. The robot cannot observe landmarks when obstacles are between the robot and landmarks and cannot observe them from the back side. Incidentally, the error of image such as quantization error always occurs. Then, the theoretical estimation errors of robot’s position and orientation are calculated by considering the situation that the errors of landmark’s position and size (shape) in images occur (Figure 3). We assumed that the position error of the landmark’s center point in the image is (Δu, Δv). The size error of the landmark in the image Δr is also considered. It means that the observed landmark’s position in the image may shift (±Δu, ±Δv) from the true position at the maximum. The observed radius may also shift ±Δr from the true size.

80

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots

The robot estimates how the position and orientation errors occur in the world coordinate about all combination of visible landmarks when the errors occur in the images (image coordinates) of two cameras. However, the position error and the orientation error are not compared directly because the dimensions of them are different from each other. The position error is expressed as the dimension of length, i.e., [mm], and the orientation error is expressed as the dimension of angle, i.e., [deg]. Therefore, we transform the orientation error (the dimension of angle) into the position error (the dimension of length). The total sum of the error when the robot moves at a certain distance while the position and orientation error occur is calculated (Figure 4). This means that the total error at the next time’s position of the robot when it moves is the sum of the position error (Epos,max , Figure 4(a)) and the incorrect moving distance under the influence of the orientation error (Eang,max , Figure 4(b)). Epos,max is the distance between the ideal robot position without error and the real position with error derived from position error. Eang,max is the distance between the ideal robot position without error and the real position with error derived from orientation error. Therefore, Epos,max and Eang,max have the same dimension of length. In this way, the estimated error of the robot position in the world coordinate that is caused by the image error is calculated. The optimal observation strategy is equal to the observation method that has minimum position estimation error. The optimal observation strategy is decided as follows: Emax (p, m) = Epos,max (p, m) + Eang,max (p, m) → min,

(1)

where p is the present position, m is the moving distance, and Emax (p, m) is the total error when the robot move distance m from p. The direction of two cameras is decided by selecting the optimal observation strategy in each place. Therefore, the navigation planning of the mobile robot can be executed.

4. Path and viewpoint Planning 4.1. Path Planning The path of the robot is planned based on the visibility graph.In our motion planning method, original obstacles are expanded R + S where S is the margin for safety (Figure 5(a)). In this paper, we call S the error tolerance. When generating C-obstacles, shapes of their vertices are approximated with the polygons because of simplicity of computations. Therefore, their vertices were approximated to intersection T1 , T2 of Li (i = 1, 2, 3). The vertices of C-obstacles are connected with each other and a visibility graph is constructed (Figure 5(b)). In this step, multiple visibility graphs are generated by changing S for optimal planning. This is because it is difficult to find the best S in complex environment beforehand. The robot of bad performance can plan path that safely reaches the goal by changing S(Figure 5(c)). In each visibility graph, the shortest path from a start position to a goal one is searched by Dijkstra algorithm (Figure 5(d)). However, the robot cannot necessarily

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots

L1

T1

81

L2 T2 L3

C-Obstacle

R+S Obstacle

Obstacle

start

(a) C-obstacle.

Goal

(b) Visibility graph (S: small).

Shortest path

A Alternative path

(c) Visibility graph (S: large).

(d) Shortest path and alternative path.

Figure 5. Expanded C-obstacle and path planning.

move along the shortest path due to its performance and the configuration of obstacles and landmarks, and several candidates of path must be prepared. Therefore, to search for paths other than shortest path by Dijkstra algorithm, the vertex of C-obstacle is chosen at random as a relay point(= A) of the start position and the goal one. About the relay point, path from start position to A and path from A to goal position are searched by Dijkstra algorithm. The paths are composed by the connection of them. In this paper we call them alternative paths. Path for which it searched by choosing the relay point at random and quite different path can be planned. 4.2. Viewpoint Planning Viewpoint is planned about each path that is planned in the previous step. Viewpoint planning for arriving at the goal position safely is executed by considering the observation error of landmarks and dead-reckoning error. The robot estimates its position and orientation by dead-reckoning while moving after observing landmark(s). If it moves long distance after estimating its position with landmark(s), the error of estimated position and orientation is accumulated in proportion to the traveling distance. Therefore, the robot must estimate its position with landmark(s) frequently before position error is accumulated. Here, it is assumed that the robot observes landmark(s) where the distance between the present position ps and the start position is ms along the planned path. Let Dmax (ps , m) be the estimated maximum dead-reckoning error when the robot move distance m from ps , Emax (ps , m) be the estimated maximum error from the observation (1), and S (error tolerance) means the maximum error of the robot position for not colliding with obstacles. The maximum distance mmax that the robot can move without observing landmarks is expressed as follows (Figure 6(a)): Dmax (ps , mmax ) + Emax (ps , mmax ) ≤ R + S.

(2)

82

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots Ideal position of the future Ideal position

reach most distant point = next view point

mmax

R+S

Error

Ps

Actual position

P1

Position that comes off from range by error at first

(a) Distance that can be moved without observation.

A P2 Pg planned path path with error mPs,max (b) Selection of view points.

Figure 6. Observation planning

Here, let pg be the position whose distance from the start is ms + mps ,max . The path from ps to pg is divided into n + 1 position, and we define mi as the distance between the each divided position pi and the start position of the path (p0 = ps , pn = pg ). When the next viewpoint from ps is pi , the next viewpoint from pi must satisfy the following equation when the total number of observation becomes small (Figure 6(b)). mi + mpi ,max → max .

(3)

The robot selects the optimal landmark-observation strategy that satisfies (3). Therefore, the viewpoints can be decided one by one in the following way. 1. mps ,max and pg that satisfy (2) are calculated when the robot observes landmarks at ps . 2. The path from ps to pg is divided into n + 1 position pi . 3. pi that satisfies (3) is calculated, and pi is regarded as the next viewpoint. 4. If mi + mpi ,max is smaller than the distance between the start position and the goal position of planned path, ps is replaced with pi and go to step 1). If it is large, the viewpoint planning finishes. The optimal viewpoints, the optimal observation strategies in each viewpoint, the optimal observed landmark(s) and the direction of the cameras in each viewpoint could be planned in the above procedure.

5. Results of Motion Planning In this section, the effectiveness of the motion planning by the difference of the performance of the robot is verified. In this paper, robot performance means dead-reckoning error and image error. The dead-reckoning error shows at the rate how much error occurs about moving distance. The image error shows the position’s error of the landmark and the size error of the landmark. Figure 7(a) shows environment. The primary value of the Table 1. Robot performance. Name

Dead-reckoning error

Robot1

±10 %

±1 Pixel

Robot2

±40 %

±1 Pixel

Robot3

±40 %

±5 Pixel

Image error

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots 1m

: Landmark 1m

1m Robot3

Observation point

Obstacle Destination

83

Robot2

Robot1

Start

(a) Environment.

(b) Robot1 and Robot2 path.

(c) Robot3 path.

Figure 7. Simulation results. Table 2. Comparison of path. Name

Error torelance (cm)

Distance (cm)

Robot1

60

2359

6

Robot2

120

3246

19

Robot3

150

5140

31

The total number of observation

expanding width of C-obstacle is decided to S=0cm, and the increasing constant width is decided to 30cm. We compare Robot1 whose dead-reckoning error and image error is small, Robot2 whose dead-reckoning error is large, and Robot3 whose dead-reckoning error and image error are large, for evaluating the relationship between the planning result of the path and the viewpoints and the performance of the robots (Table 1). As the result of planning, Robot1 and Robot2 can reach the goal along the paths shown in Figure 7(b). In addition Robot3 can reach the goal along the path shown in Figure 7(c). The path of Robot1 is narrow and the landmark that can be observed is few. However, because the dead-reckoning performance of Robot1 is good, the path planned can be traced accurately. Moreover, because the observation accuracy is good, the error can be accurately corrected by a little observation frequency. On the other hand, because the dead-reckoning performance of Robot2 is worse than Robot1, it cannot trace accurately the planned path. Therefore, it cannot plan the path of Robot1. In addition, the path reaching the destination was not found at this error tolerance (=60cm). Then, error tolerance is changed, and Robot2 plans the path with wide width of the road though the distance of it is longer than the path of Robot1. Moreover, because the dead-reckoning performance of Robot2 is bad, it should correct frequently the dead-reckoning error by observing the landmark. As a result, the observation frequency of it is more than frequency of Robot1. The path of Robot3 is wide and landmarks that can be observed are many. Because the dead-reckoning performance and the observation accuracy of Robot3 are worst, it cannot be moved along the planned path. Therefore, it plans the path that safely reaches the destination though moving distance becomes long. In addition it is understood that Robot3 observed a lot of landmarks and moves along the route. From these results, it is shown that the optimal path observation points, observation strategies can be planned according to the performance of the robot. In concrete terms, the robot with high performance (small dead-reckoning and image error) can select the short path from the start to the goal, although there are few landmarks and this is a dangerous path. Contrarily, the robot with low performance selects the safe path in which a lot of landmarks can be observed and the accuracy of positioning is high, although the distance between the start and the goal is long.

84

M. Kayawake et al. / Path and Observation Planning of Vision-Based Mobile Robots

6. Conclusion In this paper, we propose a new path and viewpoint planning method for autonomous mobile robots with multiple observation strategies. The robot estimates its position by observing landmarks attached on the environments with two active cameras. It chooses the optimal landmark-observation strategy depending on the number and the configuration of visible landmarks in each place. The optimal path, viewpoints, and observation strategies that minimize the estimation error of the robot position and the number of observation can be selected properly. The effectiveness of our method is shown through simulations. As the future works, it should be better that models of sensor and motion error are based on probabilistic ones, such as Kalman filters, particle filters, SLAM.

Acknowledgment This research was partly supported by CASIO Science Promotion Foundation, Japan.

References [1] M. Okutomi and T. Kanade: “A Multiple-baseline Stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.15, No.4, pp.353–363, 1993. [2] K. Nagatani and S. Yuta: “Path and Sensing Point Planning for Mobile Robot Navigation to Minimize the Risk of Collision,” Proceedings of the 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2198–2203, 1993. [3] A. Lazanas and J.-C. Latombe: “Motion Planning with Uncertainty: A Landmark Approach,” Artificial Intelligence, Vol.76, No.1-2, pp.287–317, 1995. [4] Th. Fraichard and R. Mermond: “Integrating Uncertainty and Landmarks in Path Planning for Car-Like Robots,” Proceedings of IFAC Symposium on Intelligent Autonomous Vehicles, Vol.2, pp,397–402, 1998. [5] N. Roy and S. Thrun: “Coastal Navigation with Mobile Robots,” Advances in Neural Processing Systems 12, Vol.12, pp.1043–1049, 1999. [6] J.P. Gonzalez and A. Stents: “Planning with Uncertainty in Position: An Optimal and Efficient Planner,” Proceedings of the 2005 IEEE International Conference on Intelligent Robots and Systems, pp,607–614 2005. [7] K. Tashiro, J. Ota, Y. C. Lin and T. Arai: “Design of the Optimal Arrangement of Artificial Landmarks,” Proceedings of the 1995 IEEE International Conference on Robotics and Automation, pp.407–413, 1995. [8] K. Takeuchi, J. Ota, K. Ikeda, Y. Aiyama and T. Arai: “Mobile Robot Navigation Using Artificial Landmarks,” Intelligent Autonomous Systems 6, pp.431–438, 2000. [9] A. Yamashita, K. Fujita, T. Kaneko, and H. Asama: “Path and Viewpoint Planning of Mobile Robots with Multiple Observation Strategies,” Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.3195-3200, 2004. [10] T. Kanbara, J. Miura, Y. Shirai, A. Hayashi and S. Li: “A Method of Planning Movement and Observation for a Mobile Robot Considering Uncertainties of Movement, Visual Sensing, and a Map,” Proceedings of the 4th Japan-France / 2nd Asia-Europe Congress on Mechatronics, pp.594–599, 1998.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

85

Mobile Robot Motion Planning Considering Path Ambiguity of Moving Obstacles Hiroshi Koyasu and Jun Miura Department of Mechanical Engineering, Osaka University Suita, Osaka 565-0871, Japan Email: {koyasu,jun}@cv.mech.eng.osaka-u.ac.jp Abstract. This paper describes a motion planning method for mobile robot which considers the path ambiguity of moving obstacles. Each moving obstacle has a set of paths and their probabilities. The robot selects the motion which minimizes the expected time to reach its goal, by recursively predicting future states of each obstacle and then selecting the best motion for them. To calculate the motion for terminal nodes of the search tree, we use a randomized motion planner, which is an improved version of a previous method. Simulation results show the effectiveness of the proposed method. Keywords. motion planning, dynamic environment, mobile robot

1. Introduction Planning a path without collision with both static and dynamic obstacles is one of the fundamental function of mobile robots. Most of past research on motion planning in dynamic environments can be classified into two categories. One is for reactive methods, which do not predict the paths of moving obstacles [5]. In such a reactive method, not the optimality but the safety is more important. The other is for deliberative methods which are based on the assumption that the paths of moving obstacles are completely known; an optimal motion can be generated by employing a planning in space-time [3], or by using the concept of velocity obstacle [6,10]. However, it is difficult to predict paths of obstacles without uncertainty. We divide the uncertainty of an obstacle motion into the path ambiguity (which path to choose) and the error in following a path. The latter can be handled by considering the range of uncertainty in motion planning [2]. A similar approach is, however, difficult to apply to the former because a very inefficient motion may be planned when the possible paths of an obstacle spread in a wide area. It is, therefore, necessary to cope with multiple path candidates. Bennewitz et al. [1] proposed a motion planning method which considers multiple path candidates of obstacles. To avoid collision with an obstacle which has multiple path candidates, they use a heuristic cost function which is defined in space-time; the cost is set to be large near path candidates and proportional to the probability of each path candidate. Using this cost function, motion planning is done by using the A∗ search

86

H. Koyasu and J. Miura / Mobile Robot Motion Planning

algorithm. The method, however, does not consider the change of the environment to be recognized by future observations. Miura and Shirai [9] proposed to explicitly predict possible future states (i.e., which path a moving obstacle actually takes) of the environment and their probabilities for motion planning. By recursively searching for the optimal motion for each predicted state, they selected the optimal next motion minimizing the expected time to the goal. The method was, however, tested only for the case where there is only one moving obstacle with two path candidates and the possible motions of the robot are given in advance. In this paper, we expand our previous method [9] so that it can handle multiple moving obstacles with multiple path candidates and can plan robot motions according to the obstacle configuration and movements. We use a randomized motion planner, which is an improved version of a previous method. We show the effectiveness of the proposed method by simulation experiments. 2. Model of Moving Obstacles In the environment, there are static and moving obstacles. Positions of static obstacles are given in advance. Each moving obstacle has its own set of candidate destinations and moves to one of them with avoiding collision with static obstacles. We represent a set of possible paths by using a tangent graph [8]. The initial probability of each obstacle taking one of its paths is given. The robot periodically observes the position and the velocity of each moving obstacle, and updates the probabilities of its path candidates. When a new observation is obtained, the probability of the jth candidate path of the ith obstacle is updated by (1) P (pathij | oi (t)) = αP (pathij )P (oi (t) | pathij ), where oi (t) is the observed position of the obstacle at time t, and α is a normalization constant. P (pathij ) is given by P (pathij | oi (t − 1)). P (oi (t) | pathij ) is the likelihood of the path calculated by using the Gaussian positional uncertainty. 3. Predicting the State of Moving Obstacles We define a state of a moving obstacle as a set of possible obstacle path candidates. The states can be classified into two cases. One is the case where only one candidate is remaining. We call this case a fixed state. The other is the case where multiple candidates are still remaining. We call this case an ambiguous state. An ambiguous state will be changed into a fixed state by future observations; the probability of a fixed state thus becomes large as time elapses. 3.1. Uncertainty of obstacle motion on a path we assume that the error of a predicted position of an obstacle on a path is distributed within the so-called 3σ region of a 2D Gaussian. Its mean is the position predicted by assuming that the obstacle moves at the currently observed speed. The variance of the Gaussian perpendicular to the path is constant by supposing that an obstacle tries to follow the path as closely as possible. The variance along the path is proportional to the moving distance of the obstacle. 3.2. Calculating the probability of a state We first consider the case where there is one obstacle with two path candidates. Fig. 1 shows the relationship between the prediction of the positional uncertainty and the path ambiguity in such a case. In the figure, path1 and path2 are drawn as black arrows. Black points are predicted positions on the path candidates at time t1 and t2 . Let Ri (t) be the

H. Koyasu and J. Miura / Mobile Robot Motion Planning

87

Obstacle position and its uncertainty at time t 1 R 2 (t1 ) position and its uncertainty R 1 (t 1 ) at time t 2 path 2

R 2 (t 2 )

R 1 (t 2 ) path 1

Figure 1. Prediction of the positional uncertainty and the path ambiguity of a moving obstacle.

3σ region of the Gaussian on pathi at time t (i = 1, 2). If the obstacle is observed in the overlapped region of R1 (t) and R2 (t) (the hatched region in the figure), the state is an ambiguous state. The probability of this ambiguous state is calculated by (2) P (path1 )P (x1 (t) ∈ R2 (t)) + P (path2 )P (x2 (t) ∈ R1 (t)), where P (pathi ) is the prior probability of pathi , and xi (t) is the position at time t when the obstacle is actually on pathi . The first term of Eq. (2) is the probability that the obstacle on path1 cannot be determined to be on either of the paths. P (x1 (t) ∈ R2 (t)) is calculated by integrating the probability density function of x1 (t) in R2 (t). There are two fixed state. One is the case where the obstacle is determined to be on path1 , and the other is the case of path2 . The probability of the case of path1 is calculated by / R2 (t)). (3) P (path1 )P (x1 (t) ∈ Next, let us consider the case where there are n path candidates. The number of states of the obstacle is 2n − 1, that is, all combinations of paths. The probability of an ambiguous state where there are m remaining path candidates is calculated as follows. Let L = {l1, · · ·, lm } be a set of indices of the remaining path candidates. For the case where the obstacle is on pathli , the probability that the m paths are still possible is equal to the probability that xli is inside regions Rlj (j = 1, · · ·, i−1, i+1, · · ·, m) and is outside the other n − m regions. The weighted sum of such probabilities for the m remaining paths becomes the probability of the ambiguous state as follows: m  P (pathli )P (xli (t) ∈ ∩j=i Rlj (t), xli (t) ∈ / ∪k∈L (4) / Rk (t)). i=1

Also, the probability of a fixed state where the obstacle is on pathi is given by / ∪j=i Rj (t)). (5) P (pathi )P (xi (t) ∈ When there are multiple moving obstacles, a state of the environment is represented by a set of the states of all obstacles, and its probability is the product of the probabilities of the states of the obstacles. A state of the environment is called fixed if the paths of all obstacles are uniquely determined. Otherwise, a state is ambiguous. 4. Planning Method The robot selects the motion which minimizes the expected cost. We use the time to reach the goal as the cost. We first predict possible future states of the environment and their probabilities. Then, by recursively searching for the optimal motion for each predicted state, we select the optimal next motion which minimizes the expected cost. This search process thus composes an AND/OR tree of robot motions and predicted states of the environment, as shown in Fig. 2. Once the next motion is determined by this search process, the robot executes it and observes the environment to update the state. The robot plans the subsequent motion based on the updated state.

88

H. Koyasu and J. Miura / Mobile Robot Motion Planning

Initial State

Motion i

Motion M

Minimization

State j

State L

Summation weighetd by probability

Motion 1

State 1

Motion i

fixed state

State j’

ambiguous state

Depth limit Motion i

Figure 2. AND/OR tree search.

S mi

Ci

mj 1

Sj C 1j

l

Sj C lj

prune Cj

goal n

Sj >C i

goal Figure 3. Prune a branch of the AND/OR tree.

Figure 4. A path for calculating the lower bound.

4.1. Search for the Motion Minimizing the Expected Cost We explain the search for the motion minimizing the expectation of the cost using Fig. 2. We first pick a next motion and predict a set of states after the robot executes the motion. If a state in the set is fixed, we calculate the cost of the state by using the motion planner described in Sec. 4.3. If a state is ambiguous, we recursively search for the optimal motion (and thus the cost) for the state. We limit the search depth to avoid combinatorial explosion. When the search reaches the depth limit, we calculate a feasible robot motion which considers all possible obstacle paths, i.e., plan a robot motion by assuming that every remaining path candidate is actually taken by some obstacle. The expectation of the cost of the next motion is then obtained by calculating the sum of the expectation of the costs of all possible states weighted by their probabilities. We calculate the expectation of the cost for every candidate for the next motion, and select the one which has the minimum expectation of the cost. 4.2. Pruning of the AND/OR Tree by Using Lower Bound To reduce the computation cost of the search, we perform a pruning of the AND/OR tree. Fig. 3 shows this pruning process, when searching for the motion minimizing the expected cost for state S. Suppose that for motion mi in state S, expected total cost Ci is calculated. This cost is the sum of the cost from the initial state to S and that from S to the goal state. For another motion mj , if its lower bound Cj is larger than Ci , mj can be pruned. This lower bound changes as the total cost of a predicted state to be reached after executing mj is obtained. Let Sjk (k = 1, · · · , n) be the predicted states. After the costs Cjk for Sjk (k = 1, · · ·, l ≤ n) are calculated, the lower bound becomes

H. Koyasu and J. Miura / Mobile Robot Motion Planning

Cj l = Cj +

l 

  P (Sjk ) Cjk − Cj .

89

(6)

k=1

If Cj l > Ci , we prune branch mj without calculating the costs of remaining n − l states. Lower bound Cj is calculated as follows. We consider a motion pair which is composed of a turning motion and a straight motion to reach the goal from the current position, with neglecting any collisions with obstacles, as shown in Fig. 4. We examine the cost of such motion pair for a given set of turning radii and set the minimum cost to Cj . 4.3. Calculating a Feasible Path To calculate a feasible path for a fixed state (including the case where a robot has to make a plan by considering all possible path candidates at the depth limit), we use an improved version of the method proposed by Hsu et al. [4]. Their method is a randomized search method which considers non-holonomic constraints of the robot motion. The method picks a robot motion at random and generates a probabilistic roadmap of sampled state×time points, called milestones, connected by short admissible trajectories. The probabilistic roadmap consists of a tree of milestones whose root is the initial robot pose. This search process finishes by finding a milestone which is in the endgame region. In the method, the completeness is an important issue, but the quality of planned path is not fully discussed. Since the method finishes the search by finding only one path, a found path may be much more inefficient than the optimal one. We improve their method in the following points: • Give weights to milestones so that a wide space is explored. • Iterate the path search until a sufficiently good path is found. In this paper, the endgame region is defined as the set of robot poses from which a feasible path to the goal can be generated as the one shown in Fig. 4. 4.3.1. Weighting Milestones In sampling milestones, if the weights of milestones are uniform, the tree may grow under a set of limited milestones; this may make a path quality low. To deal with this problem, Hsu et al. [4] give a weight to a milestone inversely proportional to the number of other milestones in the neighborhood of the milestone in the configuration space. However, the distance between milestones under a specific milestone is not necessarily small when their depths are largely different. Rapidly-exploring Random Tree (RRT) [7], which uses the size of the Voronoi regions for weighting, may have the same drawback. To make the tree grow relatively uniformly, we give the same weight to subtrees if their roots are at the same depth. Therefore the weight to milestone i with depth di and with bi motions which have not been expanded is given by (bi /B)(1/B)di , where B is the number of robot motions (i.e., branching factor). However, searching by using these weights is probabilistically equal to the breadthfirst search; this may require much computation. So we put more weights on deeper milestones. The weight of the milestone is modified as (bi /B)(a/B)di , where a (> 1) is a constant. 4.3.2. Evaluating the Quality of a Path We iterate the path search several times and select the best path among the generated ones. We here consider the problem of determining when to stop the iteration. To solve this problem, we need to be able to evaluate how good the current best path is. Therefore, we estimate the proportion of the number of potential paths whose costs are greater than

90

H. Koyasu and J. Miura / Mobile Robot Motion Planning

milestones milestones that found collision

di

found path

i

D

potential path

j

2

proportion

1

Figure 6. Milestones and their potential paths.

Figure 5. Number of potential paths on a milestone tree. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

bi B

acutual path quality predicted path quality

0

1000

2000

3000

4000

5000

rank Figure 7. Predicted path qualities.

that of the current best path to the total number of all possible paths. After a feasible path is found, if the lower bound of a milestone is larger than the cost of the feasible path, no paths passing the milestone are more efficient than the feasible path. To estimate the proportion described above, we need to estimate the number of potential paths which passes a milestone. We assume that milestones at the same depth have the same number of potential paths. Under this assumption, a deeper milestone has fewer potential paths; Fig. 5 illustrates this property. In the figure, D denotes the maximum depth of the tree, which is set to that of the current best path. The total number of the potential paths is B D (B is the branching factor). Fig. 6 illustrates how to calculate the number of potential paths of a milestone. In the figure, milestone i has a child milestone j. We consider that a potential path belongs to the deepest milestone on the path. So in the figure, potential path set 1 belongs to milestone i and set 2 belongs to milestone j. A milestone which is a child of milestone i and has not been expanded has B D−(di +1) potential paths. If milestone i has bi such children, its number of potential paths is bi B D−(di +1) . If a milestone cannot be added to the tree due to collision, the potential paths passing it can never be obtained as feasible paths. So when milestone j cannot be added, the number of potential paths passing it is B D−dj , and this number should be excluded from the total number of the potental paths. Let I be the set of the indices of milestones whose lower bounds are greater than the cost of the current best path, and J be the set of the indices of milestones which cannot be added due to collision. The proportion of the number of potential paths whose costs are greater than that of the current best path to the total number of potential paths is thus

91

H. Koyasu and J. Miura / Mobile Robot Motion Planning

obstacle 1

goal obstacle 2

robot position

obstacle 1

robot position

t=0 obstacle 1

goal obstacle 2

robot position

goal obstacle 2

t=1 obstacle 2

goal obstacle 1

robot position

t=3

t=4

obstacle 1

goal obstacle 2

robot position

t=2 obstacle 2

goal obstacle 1

robot position

t=5

Figure 8. Simulation result.

estimated by



bi B D−di −1  . − j∈J B D−dj

i∈I

BD

(7)

If the estimated proportion becomes greater than 1−α, the current best path is expected to belong to the best 100α% of the entire set of potential paths. If this condition is satisfied with a small α, we consider that a sufficiently good path is obtained. Currently, we use 0.1 as α. To validate Eq. (7), we performed the following experiment. First, we generated 5000 paths from a milestone tree. Next, for each path, we calculated the estimated proportion given by Eq. (7) using the path as the current best path, and compared it with the actual proportion. Fig. 7 shows the comparison result. The estimated proportion is always less than the actual one mainly because the estimation is based on the lower bound. When the quality of a path is high enough, the estimated proportion is near to the actual; this suggests that this estimated proportion is a good measure of path quality. 5. Experimental Results Fig. 8 shows the results of a simulation experiment. In the figure, white regions are free space, dark gray lines are path candidates of moving obstacles. Gray lines are planned

92

H. Koyasu and J. Miura / Mobile Robot Motion Planning

goal obstacle 2 obstacle 1

robot position(start)

Figure 9. Probabilities of the paths of each obstacle.

Figure 10. Result of motion planning without considering ambiguity.

robot paths, among which a darker line indicates a path with a higher probability. On paths of both the robot and the obstacles, circles are drawn with a fixed time interval. In the experiment, the number of obstacles is two, the number of initial path candidates of each obstacle is three, and the depth limit of search is two. A motion of the robot is represented by a pair of the turning radius and the speed. The number of robot motions is nineteen. The prior probability of each path candidates is set equally. Obstacle 1 actually takes the right path, and obstacle 2 takes the middle. Fig. 9 shows the change of the probabilities of paths for each obstacle. For obstacle 1, the probability of taking the right path becomes dominant at t = 1, but the ambiguity of paths remains until t = 4. For obstacle 2, the probability of taking the middle path becomes dominant at t = 1, but the ambiguity of paths remains until t = 4. At time t = 0, since all paths of the obstacles are equally probable, the robot selects the motion to move a position where the robot can safely avoid collision with the obstacles even if they will take any paths. At time t = 1, the situation is still highly ambiguous and the robot selects a motion similar to the one selected at time t = 0. At time t = 2, since the path of each obstacle is almost uniquely determined, only a few robot motions in a generated plan have high probabilities. At time t = 3, the robot now knows that both obstacles will not take their left paths and generates a plan in which the probability of robot motions which will cross over the left paths. At time t = 4, there is no obstacle in front of the robot. However, since the environment is not fixed, the robot generates two slightly different paths due to the randomness of the planner. At time t = 5, the paths of the obstacles are fixed, and the robot plans a single path towards the goal. The computation time at t = 0 is 140 [sec] using an Athlon 2200+ PC. At t = 1, the time is 9.5 [sec]. After t = 1, the time does not exceed 0.5 [sec]. The computation time is decreased with time elapses, because the ambiguity of the environment is decreased. We also compared our method with the method which plans a robot motion to avoid all possible path candidates of obstacles. Fig. 10 shows the planning result at t = 0 by this method in the same situation as the one shown in Fig. 8. In the plan, the robot moves slowly until t = 2 to wait for the obstacles to go away. The expected cost of the plan is 12.90 [sec]. On the other hand, our method selects a faster motion in the same situation, and the expected cost is 10.99 [sec]. This result shows the effectiveness of the proposed method.

H. Koyasu and J. Miura / Mobile Robot Motion Planning

93

6. Conclusion This paper has described a path planning method which considers path ambiguity of moving obstacles. The method predicts possible future states and their probabilities, and searches for a robot motion for each predicted state. This prediction and search are recursively performed until every state becomes terminal (i.e., the ambiguity of a state is resolved or the search reaches the depth limit). The method finally selects the next motion which minimizes the expected time to the goal. The proposed method can deal with fairly general motion planning problems with arbitrary static obstacle configurations and multiple moving obstacles with multiple path candidates. The path planner which we use for calculating the costs of terminal nodes is an improved version of a previous randomized method; it iteratively generates a set of paths until a sufficiently good path is generated. The number of iteration is determined based on the quality evaluation of the current best path. The current method still needs too much calculation time to be used on-line. A future work is to consider the trade-off between the quality of a planned path and the computational cost in order to appropriately control the planning time. References [1] M. Bennewitz, W. Burgard, and S. Thrun. Adapting navigation strategies using motions patterns of people. In Proc. of the 2003 IEEE Int. Conf. on Robotics and Automation, pages 2000–2005, 2003. [2] A. Elnagar and A. Basu. Local Path Planning in Dynamic Environments with Uncertainty. In Proc. of IEEE Int. Conf. on Multisencor Fusion and Integration for Intelligent Systems, pages 183–190, 1994. [3] K. Fujimura. Time-Minimum Routes in Time-Dependent Networks. IEEE Tran. of Robotics and Automation, 11(3):343–351, 1995. [4] D. Hsu, R. Kindel, J.C. Latombe, and S. Rock. Randomized kinodynamic motion planning with moving obstacles. Int. J. Robotics Research, 21(3):233–255, 2002. [5] S. Ishikawa. A method of indoor mobile robot navigation by using fuzzy control. In Proc. 1991 IEEE/RSJ Int. Workshop on Intelligent Robots and Systems, pages 1013–1018, 1991. [6] F. Large, S. Sekhavat, Z. Shiller, and C. Laugier. Towards real-time global motion planning in a dynamic environment using the nlvo concept. In Proc. of the IEEE Int. Conf. on Intelligent Robots and Systems, 2002. [7] S. M. LaValle and J. J. Kuffner. Rapidly-exploring random trees: Progress and prospects. In B. R. Donald, K. M. Lynch, and D. Rus, editors, Algorithmic and Computational Robotics: New Directions, pages 293–308. A K Peters, Wellesley, MA, 2001. [8] Y.-H. Liu and S. Arimoto. Path planning using a tangent graph for mobile robots among polygonal and curved objects. Int. J. of Robotics Research, 11(4):376–382, 1992. [9] J. Miura and Y. Shirai. Probabilistic uncertainty modeling of obstacle motion for robot motion planning. J. of Robotics and Mechatronics, 14(4):349–356, 2002. [10] Z. Qu, J. Wang, and C. E. Plaisted. A new analytical solution to mobile robot trajectory generation in the presence of moving obstacles. IEEE Transactions on Robotics, 20(6):978– 993, 2004.

94

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Robotic Navigation using Harmonic Functions and Finite Elements Santiago Garrido a,1 , Luis Moreno a a Carlos III University b Madrid, Spain Abstract. The harmonic potentials have proved to be a powerful technique for path planning in a known environment. They have two important properties: Given an initial point and a objective in a connected domain, it exists a unique path between those points. This path is the maximum gradient path of the harmonic function that begins in the initial point and ends in the goal point. The second property is that the harmonic function cannot have local minima in the interior of the domain (the objective point is considered as a border). Our approach has the following advantages over the previous methods: 1) It uses the Finite Elements Method to solve the PDE problem. This method permits complicated shapes of the obstacles and walls. 2) It uses mixed border conditions, because in this way the trajectories are smooth and the potential slope is not too small and the trajectories avoid the corners of walls and obstacles. 3) It can avoid moving obstacles in real time, because it works on line and the speed is high. 4) It can be generalized to 3D or more dimensions and it can be used to move robot manipulators. Keywords. Path Planning, Harmonic Functions, Finite Elements

1. Introduction Robot motion control can be viewed as an underconstrained problem. The robot exists at a certain configuration, and must get to a certain goal configuration, using any free path. If properly specified, such underconstrained interpolation problems can be solved using Laplace’s equation. Intuitively, Laplace’s equation can be thought of as governing the shape of a thin membrane: Obstacle positions are pulled up, and goal positions are pulled down. If a ball bearing is dropped on the membrane, it will always fall into one of the goal positions, never hitting the obstacles. The movement of the ball corresponds to a change in state of a robot (e.g., the movement of joints during a reach). The trajectory of the ball is always a smooth, obstacle-avoiding path. The harmonic potentials have proved to be a powerful technique for path planning in a known environment. They have two important properties: Given an initial point and a objective in a connected domain, it exists a unique path between those points. This path is the maximum gradient path of the harmonic function that begins in the initial point 1 Correspondence to: Santiago Garrido, Carlos III University, Leganés, Madrid, Spain. Tel.: +34 91 6245990; Fax: +34 91 6249430; E-mail:[email protected]

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

95

and ends in the goal point. The second property is that the harmonic function cannot have local minima in the interior of the domain (the objective point is considered as a border). These properties means that a path can always be found if it exists and it is impossible to be stuck in a local minima. These are the most important properties that made harmonic functions extremely interesting for path planning purposes. Our approach has the following advantages over the previous methods: 1) It uses the Finite Elements Method to solve the PDE problem. This method permits complicated shapes of the obstacles and walls. 2) It uses mixed border conditions, because in this way the trajectories are smooth and the potential slope is not too small and the trajectories avoid the corners of walls and obstacles. 3) It can avoid moving obstacles in real time, because it works on line and the speed is high. 4) It can be generalized to 3D or more dimensions and it can be used to move robot manipulators. 1.1. Previous work Every method concerning robot motion planning has its own advantages and application domains as well as its disadvantages and constraints. Therefore it would be rather difficult either to compare methods or to motivate the choice of a method upon others. In contrast to many methods, robot motion planning through artificial potential fields (APF) considers simultaneously the problems of obstacle avoidance and trajectory planning (Arkin [1][2]. In addition the dynamics of the manipulator are directly taken into account, which leads in our opinion to a more natural motion of the robot. The first use of the APF concept for obstacle avoidance was presented by Khatib [8]. He proposed a Force Involving an Artificial Repulsion from the Surface (FIRAS, in French) which should be nonnegative, continuous and differentiable. However, the potential field introduced exhibits local minima other than the goal position of the robot. To solve the preceding problem, Volpe and Khosla [11] developed new elliptical potential functions called “superquadric artificial potential functions”, which do not generate local minima in the physical space. They have shown that superquadric potential fields can be constructed only for simple shapes like square or triangular figures. The problem of local minima remains, because the superquadric potential functions do not generate local minima in the workspace but the local minima can occur in the C-space of the robot. The contributions of Koditschek in [9][10][6] are worth to be mentioned because they introduce an analytic potential field in the C-space without local minima. However the topology of the application range is limited to obstacles, which have to be ball-, or star-shaped, otherwise no solution can be found. The contributions of Connolly [4] and of Kim and Khosla [7] are the most successful methods concerning robot motion planning with potential field. They used the harmonic functions to build a potential field in the C-space without local minima. The harmonic functions attain their extreme values at the boundary of the domain. Using a potential function to accomplish a certain motion implies that the trajectory of the robot is not known or calculated in advance, which means that the robot chooses autonomously the way to reach its goal. The main problem of this method is the complication of the Panel Method used that means that the obstacles have to be very simple, like rectangles. The method of harmonic functions based on the Finite Difference Method has been used for guiding robots by Connolly and Gruppen [3]. Edson Prestes [5] used this method for exploration of unknown environments. The method of harmonic functions is not very extended because up to now has been based in the Finite Difference Method, and for

96

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

this reason it is not easy to implement. It is quite slow and has to be used in static environments and the environment has to be very easy with obstacles and walls with parallel straight lines.

2. Harmonic Functions Equilibrium problems in two-dimensional, and higher, continuous give rise to elliptic partial differential equations. A prototype is the famous equation of Laplace: ∇2 φ = 0. This equation holds for the steady temperature in an isotropic medium, characterizes gravitational or electrostatic potentials at points of empty space, and describes the velocity potential of an irrotational, incompressible fluid flow. The two-dimensional counterpart of this equation lies at the foundation of the theory of analytic functions of a complex variable. An harmonic function φ on a domain Ω ∈ R is a function which satisfy the n  ∂2φ = 0 where xi is the i − th Cartesian coordinate and Laplace’s equation: ∇2 φ = ∂x2 i=1

i

n is the dimension. In the case of robot path planning, the boundary ∂Ω of Ω, is formed by all the walls, obstacles and the goal, but not the start point. Maximum Principle: A nonconstant harmonic function φ(x) defined on Ω takes its Maximum value M on the boundary and guarantees that there are no local minima in the interior of Ω.

Figure 1. Robot’s trajectories obtained to solved Laplace’s equation with the boundary conditions of Dirichlet, Neumann and Robin, respectively.

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

97

2.1. Problems using Harmonic Functions The use of harmonic functions as path planning potentials is a powerful technique but it is not very extended because : First, the previous methods based on finite differences need that all obstacles and walls of the region are composed by parallel straight lines. Second, the previous methods based on finite differences are slow and difficult to implement in a dynamic environment that changes with time.Third, some implementations of these methods that does not assure the existence of a trajectory. For example, when using Harmonic functions as a potential field it is not possible to add an attractive potential to the goal and a repulsive potential from the obstacles because the domain is not the same. In the first case, the goal is one of the borders because this is the only way of being the minimum of the function. In the second case the borders are the maximum and the goal can’t be one of them because the resulting addition function does not have the minimum at the goal. Because of that, the only way of doing a potential attractive to the goal and repulsive from the obstacles is defining the obstacles borders value at the maximum value, the goal border at the minimum value and to solve the Laplace function with this border values. Fourth, in case the obstacles are not fixed, it is necessary to recalculate the harmonic function and path continuously, at each time that a movement is registered in order to obtain an adaptive version of the method.

3. Boundary Conditions The different kinds of contour conditions imposed to the Laplace equation have a critical importance in the solution obtained of the equation and the quality of the trajectory. The directions field associated to the partial derivatives problem give the path that the robot has to track. To achieve a good quality path to goal, some convenient properties of the direction fields are: 1. the trajectories have to be smooth and as short as possible. 2. the potential slope have not to be small. 3. The trajectories have to avoid the corners of walls and obstacles. Because of that, the gradient has to be perpendicular or tangent to the borders of the work space. The solution can be considered as the potential function and the trajectories associated to the direction field of the potential, that is the path of the robot. The contour conditions can be given in three ways: by the values of the potencial function in the contour (Dirichlet), by the gradient values in the contour (Neumann), or by a linear combination of the two of them (Robin or mixed): Dirichlet φ = c(x, y) in the contour. Usually is taken as a constant function φ = K. Neumann ∇(φ) = c(x, y) in the contour. Usually is taken as a constant function ∇(φ) = K. Robin or Mixed n · a · ∇(φ) + b · φ = c(x, y) in the contour. Usually is taken as a constant function n · a · ∇(φ) + b · φ = K. If the solution of the Laplace equation is the velocity potential of an irrotational, incompressible fluid flow, then the Dirichlet conditions represent the different contour levels, the Neumann conditions correspond to the entrance or exit of fluid flow and the Robin conditions correspond to the two of them at the same time.

98

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

Figure 2. Solutions of Laplace’s equation with the boundary conditions of Dirichlet, Neumann and Robin, respectively.

Dirichlet Conditions. In the subsection 2.1 was treated some problems in the use of harmonic functions and the reasons why this method is not very extended. Another problem in the way this method is used is that in some articles (e.g. Karnik et al.) three different values are given to the initial point, to the walls and obstacles and to the objective point (value 10 to the inicial point, value 8 to the walls and obstacles and value -10 to the objective). In this situation the trajectories can go to the walls as can be seen in the figure. This situation happens because the Maximum’s Principle says that the solution of the Laplace’s equation can’t have its maximum (and Minimum) in an interior point, but it says nothing about the border. The unique way of not having the end point of a trajectory in the walls is to give them the maximum value. The reason why some authors give to the initial point a higher potential value than the walls values is because in regions far from the objective point the potential values are quite flat ( for Dirichlet conditions). In this situation the gradient is very small and computational error can be produced (round off, quantization,...). The correct way to be sure that all the trajectories end at the objective point is to have only two values of the potential in the border conditions: the maximum value in the walls and obstacles and the minimum value in the objective point. It is important to note that the objective point, the obstacles and the walls belong to the contour of the Laplace’s equation problem. It is possible too, to give the high potential value to the initial point. But, in this way the unicity of the trajectories is lost, and it will be necessary a selection criteria in order to choose the initial direction.

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

99

This high potential value for the initial point is motivated for the logarithmic dependency of the distance to the objective point. Because of this reason, points distant to the objective have a small gradient that difficult the numerical calculus of the trajectory. In order to assure that the trajectories end at the objective point and the solution gradient is not small it is a preferable to use the Robin border conditions. Proposition.- If the borders have a maximum potential value (=1) and the objective point has a minimum value (=0), then there exist a unique trajectory from the initial point to the objective point.

.

Figure 3. Contour of Laplace’s equation with the boundary conditions of Dirichlet, Neumann and Robin, respectively.

This proposition shows the convenience of choosing the contour conditions of Dirichlet or Robin, because they give fixed values to the potential conditions in the contour. When Dirichlet’s contour conditions are used the solution is quite flat for values far from the objective point. For this reason, it is easy to have computational errors in the trajectory computation. Apart from that, the trajectories are very smooth and they are not very close to the walls corners and obstacles. Neumann’s Conditions. In the Neumann’s Conditions case, the descent is smooth and continuous, with a slope not close to zero and because of that it is more difficult to have computational errors in the trajectory calculation. In the Neumann’s conditions case it is possible, too, to assure the existence of a unique trajectory from the initial point to the objective point. If the walls and obstacles are sources, with positive flow and the objective point is the only sink, with negative flow, then by similitude with the problem of th velocity potential of an irrotational, incompressible flow there is a unique trajectory from the initial point to the objective. However, the problem with Neuman’s conditions is that the trajectories are very close to the contour corners and the trajectories are so close to these corners that it is impossible to continue the calculation of the trajectory, as can be seen in the figure. Robin Conditions (or mixed). Finally, with the contour conditions of Robin have the advantages of two both and none of the disadvantages: the solution have a smooth slope, continuous and big enough, and, at the same time, the trajectories are not close to the corners. This let us an easy calculation of the trajectories.

4. Comparison between Finite Elements Method and Finite Difference Method The finite elements method (FEM) is particularly useful when a robust approximation is sought to solve partial differential equations on an inhomogeneous mesh. Solid mathematical foundations and a great deal of generality allow for different implementations.

100

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

The most important difference between Finite Elements Method and Finite Difference Method is the treatment of the geometry of the domain. Finite Differences Method uses rectangular meshes and Finite Element Method usually uses triangular meshes. J. Rossell uses a hierarchical decomposition of the domain, but it can be awkward to try to work finite differences over nonuniform meshes. Generally speaking the finite element method is more flexible with curved geometries, etc. Clearly the details of finite element method are more complex than those of the finite difference technique. The end-points of both methods are essentially the same namely a system of linear or nonlinear algebraic equations. The finite element method, however, has a number of advantages: • it is well suited to problems involving complex geometries. • it can readily handle problems where the physical parameters vary with position within the domain. • it can also be used for nonlinear and/or time-varying problems. • complex boundary conditions can be readily dealt with. • general computer programs to perform finite element calculations can be, and have been, developed conventional numerical techniques can be used to solve the equations resulting from a finite element analysis. In few words, anything that can be done nicely with finite differences can be done well with finite elements, but that the converse does not hold.

.

Figure 4. Avoidance of stationary obstacles.

5. Implementation of the FEM method The method starts by approximating the computational domain with a union of simple geometric objects, in the 2D case triangles. The triangles form a mesh and each vertex is called a node. The mesh design has to strike a balance between the ideal rounded forms of the original sketch and the limitations of his simple building-blocks, triangles or quadrilaterals. If the result does not look close enough to the original objects, we can always improve it using smaller blocks. Now using the Laplace’s equation (expressed in Ω) −∇2 u = 0. If uh is a piecewise linear approximation to u, it is not clear what the second derivative term means. Inside each triangle, ∇uh is a constant (because uh is flat) and thus the second-order term vanishes. At the edges of the triangles, ∇uh is in general discontinuous and a further derivative makes no sense. This is the best approximation of u in the class of continuous piecewise polynomials. Therefore we test the equation for uh against all possible functions v of that class. Testing means formally to multiply the residual against any function and then integrate, i.e.,

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

101

 determine uh such that Ω (∇ uh )vdx = 0 for all possible v. The functions v are usually called test functions.   Partial integration (Green’s formula) yields that uh should satisfy Ω ∇uh ∇v − n · (c∇uh )vds = 0 where ∂Ω is the boundary of Ω and n is the outward pointing ∂Ω normal on ∂Ω. Note that the integrals of this formulation are well-defined even if uh and v are piecewise linear functions. Boundary conditions are included in the following way. If uh is known at some boundary points (Dirichlet boundary conditions), we restrict the test functions to v = 0 at those points, and require uh to attain the desired value at that point. For all the other points Robin boundary conditions are used,i.e., (∇uh ) · n +   quh = g . The FEM formulation is: to find uh such that Ω ∇uh ∇v + ∂Ω1 quh vds = ∂Ω1 gvds ∀v where ∂Ω1 is the part of the boundary with Robin conditions. The test functions v must be zero on ∂Ω − ∂Ω1 . Any continuous piecewise linear uh is represented as a combination uh (x) = N i=1 Ui φi (x) where φi are some special piecewise linear basis functions and Ui are scalar coefficients. Choose φi like a tent, such that it has the "height" 1 at the node i and the height 0 at all other nodes. For any fixed v, the FEM formulation yields an algebraic equation in the unknowns Ui . To determine N unknowns, we need N different instances of v. What better candidates than v = φj , j = 1, 2, ..., N ?. We find a linear system KU = F where the matrix K and the right side F contain integrals in terms of the test functions φi , φj and the coefficients defining the problem: q, and g. The solution vector U contains the expansion coefficients of uh , which are also the values of uh at each node xi since uh (xi ) = Ui . If the exact solution u is smooth, then FEM computes uh with an error of the same size as that of the linear interpolation. It is possible to estimate the error on each triangle using only uh and the PDE coefficients (but not the exact solution u, which in general is unknown). The FEM method provides functions that assemble K and F automatically. To summarize, the FEM approach is to approximate the PDE solution u by a piecewise linear function which is expanded in a basis of test-functions φi , and the residual is tested against all the basis functions. This procedure yields a linear system KU = F . The components of U are the values of uh at the nodes. For x inside a triangle, uh (x) is found by linear interpolation from the nodal values.

6. Avoidance of stationary and moving obstacles

.

Figure 5. Avoidance of moving obstacles.

102

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

Figure 6. Local Path Planning using the trajectory done by the solution of the Laplace’s equation calculated by Finite Element Method, with sensor data.

This technique can be used at a global level to plan a free obstacles trajectory by considering the environment information contained in an a priory map. Besides, this technique can also be applied at a local level to consider sensor based information. This situation is required because in spite of the robot has an initial global map of the environment and makes an initial path planning from the initial point to the goal. This map is not perfect and can have errors such as obstacles that doesn’t appear in the map. This obstacles can be static or they can move. In order to avoid obstacles, the robot requires to consider local map information extracted from sensors. This local map is used in a reactive way to find the path avoiding the obstacles. When there are static obstacles, the followed strategy consist of the steps shown in figure 7. This kind of algorithm is possible to be performed today because of the speed of current computers and the flexibility of the resolution of the Laplace’s equation with the finite elements method that permits all kind of shapes in walls and obstacles. An important detail to be considered is that the gradient modulus is proportional to the objective distance (without obstacles). For this reason, it is better to use a constant speed in order to avoid big differences in speed of the robot.

7. Conclusions In this paper, we introduced the harmonic potential function for building potential fields for path planning and moving obstacle avoidance. We proposed the use of harmonic functions to eliminate the local minima even in very complicated environments. The effectiveness of the proposed strategy has been demonstrated using this new finite element method robot navigation, autonomous motion planning in known and unknown environments with dynamic obstacles is greatly improved. The computational efficiency of the proposed control scheme makes it particularly suitable for real-time implementation.The trajectory of the global map is done off-line and the trajectory in a local map of dimensions 40x40, with a grid composed by 3200 triangles lasts less than one second in a 1.6 Mhz laptop PC.

S. Garrido and L. Moreno / Robotic Navigation Using Harmonic Functions and Finite Elements

103

Begin

Calculation of the position of the obstacles t hat doesn’t appear in the map

Resolution of the Laplace’s equation with the border conditions given by the obstacles and walls positions . no

Calculation of the direction field given by the potential of the solution of the Laplace’s equation.

Calculation of the trajectory to the objective.

Tracking of the calculated trajectory a small distance , which depends on the distance to the next mobile object.

Has the robot found the objective? yes End

Figure 7. Flow chart of the algorithm.

References [1] R. C. Arkin. Integrating behavioral, perceptual, and world knowledge in reactive navigation. Robotics and Autonomous Systems, 6, 1990. [2] R. C. Arkin. Behavior-Based Robotics-Intelligent robots and autonomous agents. The MIT Press, 1998. [3] R.A. Grupen C.I. Connolly. The applications of harmonic functions to robotics. J. Robotic Systems, 10(7):931–946, 1993. [4] C. I. Connolly. Harmonic functions as a basis for motor control and planning. PhD thesis, University of Massachussets, Amherst, MA, USA, 1994. [5] M. Trevisan M.A.P. Idiart E. Prestes, P. M. Engel. Exploration method using harmonic functions. Rob. Aut. Systems, 40(1):2542–2558, 2002. [6] D.E. Koditschek E. Rimon. Exact robot navigation using artificial potential functions. IEEE Transactions on Robotics and Automation, 8(5):501–518, 1992. [7] P. K. Khosla J. O. Kim. Real-time obstacle avoidance using harmonic potential functions. IEEE Transactions on Robotics and Automation, 8(3):338–349, 1992. [8] O. Khatib. Commande dynamique dans l’espace opérationnel des robots manipulateurs en présence d’obstacles. PhD thesis, École Nationale Supérieure de l’Aéronautique et de l’Espace (ENSAE), Toulouse, France, 1980. [9] D. E. Koditschek. The control of natural motion in mechanical systems. Journal of Dynamic Systems, Measurement, and Control, 113:547–551, 1991. [10] D.E. Koditschek. Some applications of natural motion. Journal of Dynamic Systems, Measurement, and Control, 113:552–557, 1991. [11] P. Khosla R. Volpe. Manipulator control with superquadric artificial potential functions: Theory and experiments. IEEE Transactions on Systems, Man, and Cybernetics,, 20(6):1423– 1436, 1990.

104

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

The Hippocampal Place Cells and Fingerprints of Places: Spatial Representation Animals, Animats and Robots a.

Adriana Tapusa1, Francesco Battagliab and Roland Siegwarta Ecole Polytechnique Fédérale de Lausanne (EPFL), Autonomous Systems Lab, Switzerland b. Collège de France, LPPA, France

Abstract. In this paper we address the problem of autonomous navigation seen from the neuroscience and the robotics point of view. A new topological mapping system is presented. It combines local features (i.e. visual and distance cues) in a unique structure – the “fingerprint of a place” - that results in a consistent, compact and distinctive representation. Overall, the results suggest that a process of fingerprint matching can efficiently determine the orientation, the location within the environment, and the construction of the map, and may play a role in the emerging of spatial representations in the hippocampus

1. Introduction In all our daily behaviors, the space we are living and moving in plays a crucial role. Many neurophysiologists dedicate their work to understand how our brain can create internal representations of the physical space. Both neurobiologists and robotics specialists are interested in understanding the animal behavior and their capacity to learn and to use their knowledge of the spatial representation in order to navigate. The ability of many animals to localize themselves and to find their way back home is linked to their mapping system. Most navigation approaches require learning and consequently need to memorize information. Stored information can be organized as cognitive maps – term introduced for the first time in [31]. Tolman’s model advocates that the animals (rats) don’t learn space as a sequence of movements; instead the animal’s spatial capabilities rest on the construction of maps, which represent the spatial relationships between features in the environment. Several methods, each with its advantages and drawbacks, have been proposed to construct maps in the framework of autonomous robot navigation, from precise geometric maps based on raw data or lines to purely topological maps using symbolic descriptions. To mention only a few papers in the vast SLAM (Simultaneous Localization and Mapping) literature, Leonard and Durrant-Whyte introduced for the first time the concept of SLAM as the construction of maps while the robot moves 1

Corresponding Author: Ecole Polytechnique Fédérale de Lausanne (EPFL), Autonomous Systems Lab, 1015 Lausanne, Switzerland ; E-mail : [email protected]

A. Tapus et al. / The Hippocampal Place Cells and Fingerprints of Places

105

through the environment and the localization with respect to the partially built maps [15]. Many works in space representations are based on metric maps. The stochastic map technique to perform SLAM [3, 7, 15] and the occupancy grids approaches [28] are typical examples belonging to this kind of space representation. More recent visionbased metric approaches use SIFT features [24]. However, metric SLAM can become computationally very expensive for large environments. Thrun in [29] proposes probabilistic methods that make the metric mapping process faster and more robust. However, one of the main shortcomings of metric maps is that they are not easily extensible so as to be useable for higher level, symbolic reasoning. They contain no information about the objects and places within the environment. Topological approaches to SLAM attempt to overcome the drawbacks of geometric methods by modeling space using graphs. Significant progress has been made since the seminal papers by Kuipers [13, 14]; Kortenkamp and Weymouth in [12] have also used cognitive maps for topological navigation. They defined the concept of gateways which mark the transition between two adjacent places in the environment. Their work has been an amelioration of Mataric’s approach [17], contributing towards the reduction of the perceptual aliasing problem (i.e. observations at multiple locations are similar). They have used the data from sonars combined with vision information in order to achieve a rich sensory place-characterization. A model by Franz, Schölkopf and Mallot [8] was designed to explore open environments within a maze-like structure and to build graph-like representations. The model described in [5] represents the environment with the help of a Generalized Voronoi Graph (GVG) and localize the robot via a graph matching process. This approach has been extended to H-SLAM (i.e. Hierarchical SLAM) in [16], by combining the topological and feature-based mapping techniques. In [30], Tomatis et al. have conceived a hybrid representation, similar to the previously mentioned work, in which a global topological map with local metric maps associated to each node for precise navigation is described. Topological maps are less complex, permit more efficient planning than metric maps and they are easier to generate. Maintaining global consistency is also easier in topological maps compared to metric maps. However, topological maps suffer from perceptual aliasing (i.e. observations at multiple locations are similar) and the difficulty in automatically establish a minimal topology (nodes). Our method uses fingerprints of places to create a cognitive model of the environment. The fingerprint approach, by combining the information from all sensors available to the robot, reduces perceptual aliasing and improves the distinctiveness of places. The main objective of this work is to enable the navigation of an autonomous mobile robot in structured environments without relying on maps a priori learned and without using artificial landmarks. A new method for incremental and automatic topological mapping and global localization [26] using fingerprints of places is described. The mapping method presented in this paper uses fingerprints of places to create a cognitive model of the environment. The construction of a topological mapping system is combined with the localization technique, both relying on fingerprints of places. This fingerprint-based approach yields a consistent and distinctive representation of the environment and is extensible in that it permits spatial cognition beyond just pure navigation.

106

A. Tapus et al. / The Hippocampal Place Cells and Fingerprints of Places

2. Navigation Framework Navigation strategies are based on two complementary sources of information (available on the mobile agent: animal, robot): idiothetic and allothetic. The idiothetic source yields internal information about the mobile agent movements (e.g. speed, acceleration) and the allothetic source provides external information about the environment (e.g. the cues coming from the visual, odor, laser range finders, sonars, etc.). Idiothetic information provides a metric estimate of the agent’s motion, suffering from errors accumulation, which makes the position estimation unreliable at long-term. In contrast, the allothetic (sensory) data is stationary over the time, but is susceptible to perceptual aliasing (i.e. observations at multiple locations are similar) and requires non-trivial processing in order to extract spatial information. The map-based navigation needs map-learning and localization. Map-learning is the process of constructing a map representing the environment explored by the mobile agent and localization is the phenomenon of finding the mobile agent’s location (position) in the map. Localization and mapping are interdependent – to localize the robot, a map is necessary and to update a map the position of the mobile agent is needed. This is usually known as Simultaneous Localization and Mapping (SLAM) problem that is of a “chicken and egg“ nature. While navigating in the environment, the mobile agent first creates and then updates the map.

3. Fingerprints of Places and Space Cognition The seminal discovery of place cells, by O’Keefe and Dostrovsky [20], in the rat hippocampus – cells whose firing pattern is dependent on the location of the animal in the environment – led to the idea that the hippocampus works as a cognitive map of space [21]. It was shown in [4] (for a review see e.g. [23]) that the lesion of the hippocampus impairs the performance of rodents in a wide variety of spatial tasks indicating a role of the hippocampus in map-based navigation. The framework for topological SLAM (Simultaneous Localization and Mapping) (see Figure 2) that we propose here organizes spatial maps in cognitive graphs, whose nodes correspond to fingerprints of places, and may be seen as a possible mechanism for the emergence of place cells. The computational model describes how a mobile agent can efficiently navigate in the environment, by using an internal spatial representation (similar to some extent to hippocampal place cells). This model builds a topological (qualitative) representation of the environment from the sequence of visited places. Many visual based systems for place fields based on metric information have been extensively discussed in literature (e.g. [10], [11] and [1] are just some of them). In this work, places in the environment are characterized by fingerprints of places. This characterization of the environment is especially interesting when used within a topological framework. In this case the distinctiveness of the observed location plays an important role for reliable localization and consistent mapping. A fingerprint of a place is a circular list of features, where the ordering of the set matches the relative ordering of the features around the robot. We denote the fingerprint sequence using a list of characters, where each character is the instance of a specific feature defining the signature of a place. In this work, we choose to extract color patches and vertical edges from visual information and corners (i.e. extremity of line-segments) from laser scanner. The letter cvc is used to characterize an edge, the letters cAc,cBc,cCc,...,cPc to

A. Tapus et al. / The Hippocampal Place Cells and Fingerprints of Places

107

represent hue bins and the letter ccc to characterize a corner feature (i.e. in this work, a corner feature is define as the extremity of a line-segment extracted with the DouglasPeucker algorithm). An cempty spacec between features is also denoted by the character cnc in the sequence, providing the angular distance between the features, which is some kind of very rough metric information. Figure 1 depicts how a fingerprint of a place is generated through an example. More details about the fingerprint approach can be found in [27]. With our fingerprint based-approach, the allothetic sensors are used (e.g. this choice has been made because similarly the animals are using multimodal sensory information). The fingerprints of places are integrating the information from the omnidirectional camera and the laser range finder, characterizing different places and being used to map (model) the environment. The relative angular position of the local features is also enclosed in the fingerprint of a place. A fingerprint of a place is associated to each distinctive place within the environment and so the result given by the fingerprint matching algorithm is strongly correlated to the location of the mobile agent in the environment, giving high or the highest probability to the correct place associated to the fingerprint. The firing of place cells units can be seen as the manifestation of fingerprint matching. The closer to the center of the place field the animal is, the higher the rate of neural firing.

a)

b)

c)

Figure 1: Fingerprint generation. (a) panoramic image with the vertical edges and color patches detected, denoted by cvc and cAc…cPc, respectively ; (b) laser scan with extracted corners ccc; (c) the first three images depict the position (0 to 360°) of the colors (I-light blue, B- orange and E-light green), vertical edges and corners, respectively. The forth image describes the correspondence between the vertical edge features and the corner features. By regrouping all these results together and by adding the empty space features, the final fingerprint is: cIfvnvcvfnvvncvnncvBnvBccE

108

A. Tapus et al. / The Hippocampal Place Cells and Fingerprints of Places

Similarly, the nearer the new observation of the robot (i.e. the new observed fingerprint of a place) will be with respect to the registered (learned) place (i.e. a known fingerprint of a place), the higher the probability of the mobile agent of being in an already explored place. One of the main issues in topological map building is to detect when a new node should be added in the map. Most of the existing approaches to topological mapping place nodes periodically in either space (displacement, 'd) or time ('t) or alternatively attempt to detect important changes in the environment structure. Any of these methods cannot result in an optimal topology. In contrast, our approach is based directly on the differences in the perceived features. Instead of adding a new node in the map by following some fixed rules (e.g. distance, topology) that limit the approach to indoor or outdoor environments, our method introduces a new node into the map whenever an important change in the environment occurs. This is possible using the fingerprints of places. A heuristic is applied to compare whether a new location is similar to the last one that has been mapped.

Figure 2: The spatial representation framework encodes the topological relationships between places, by comparing the actual observation (fingerprint of a place) of the mobile agent with the previously mapped places.

The process of introducing a new node in the topological map is split into several sequences of steps as follows: 1) Start with an initial node (i.e. fingerprint f0) 2) Move and at each 't (time) or 'd (distance), take a new scan with the laser scanner and a new image with the omnidirectional camera and generate the new fingerprint fi 3) Calculate the probability of matching, prob_matching, between the fingerprints fi-1 and fi respectively. Compute the dissimilarity factor, dissimilarity. prob_matching = P (fi ¸ fi-1)

A. Tapus et al. / The Hippocampal Place Cells and Fingerprints of Places

109

dissimilarity(fi , fi-1) = 1- prob_matching 4) If dissimilarity(fi , fi-1) distnew , or underconsistent s s new if dists < dists . Like A*, D*, and Brushfire, Dynamic Brushfire keeps a priority queue OPEN of the inconsistent cells to propagate changes. A cell s’s priority is always min(dists , distnew ) s and cells are popped with increasing priority values. When a cell is popped from the OPEN queue, its new distance is propagated to its adjacent cells, and any newlyinconsistent cells are put on the OPEN queue. Thus, the propagation emanates from the source of the change and terminates when the change does not affect any more cells.

118

N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids

When a cell is set to obstacle, it reduces the new minimum distance of adjacent cells which propagate this reduction to their adjacent cells, etc. This creates an overconsistent sweep emanating from the original obstacle cell (Figure 2, panels 1 to 3 ). An overconsistent sweep terminates when cells are encountered that are equally close or closer to another obstacle and thus cannot be made overconsistent. These cells lie on the boundary between Voronoi regions and will be part of the new GVD (Figure 2, panels 4 and 5). When an obstacle cell is removed, all cells that previously used it to compute their distances are invalid and must recompute their distances. This propagation occurs in two sweeps. First, an underconsistent propagation sweeps out from the original cell and resets the affected cells (Figure 2, panels 6 and 7). This sweep terminates when cells are encountered that are closer to another obstacle and cannot be made underconsistent. Thus, at most those cells in the removed obstacle cell’s Voronoi region are invalided. Then, an overconsistent propagation sweeps inwards and uses the valid cells beyond this region to recompute new values for the invalid cells (Figure 2, panels 8 and 9). 3.2. Algorithm Pseudocode Initialize () 1 OPEN =TIES = ∅; 2 foreach cell s 3 dists =distnew =∞ s 4 parents =ties =∅ 5

obsts =∅

SetObstacle (o, O) =0 1 2 obsto = O 3 parento =o distnew o

4 validO =TRUE 5 insert(OPEN , o, distnew ) o

RemoveObstacle (o, O) =∞ 1 distnew o 2 obsto =∅ 3 if O - o =∅ 4 validO =FALSE; 5 insert(OPEN , o, disto )

Figure 3. Pseudocode for initializing the grid and adding and removing obstacle cells.

Figures 3, 4, and 5 present pseudocode for the Dynamic Brushfire algorithm. In ad, and obsts , each cell s tracks the cell parents from which its dition to dists , distnew s distance is computed and a cell ties that is adjacent to s and equidistant to a different obstacle. Additionally an obstacle is invalid (validO = FALSE) when it is entirely removed from the environment. Dynamic Brushfire first initializes all distances to infinite and pointers to unknown. Then, obstacles are added and removed using SetObstacle and RemoveObstacle. When a cell o is added to obstacle O (SetObstacle(o, O)), its distance becomes zero, its parent cell is itself, and it is placed on the queue as an overconsistent cell. When a cell o is removed from obstacle O (RemoveObstacle(o, O)), its properties are reinitialized and the cell is placed on the queue as an underconsistent cell. The GVD is reconstructed by calling the function RebuildGVD which removes cells from the OPEN queue for processing. When an overconsistent cell is removed (lines 2-4), its lookahead distance will be correct, so its current distance is updated. When an underconsistent cell is removed (lines 7-10), its old distance is reset so that it can subsequently be processed as an overconsistent cell. The GVD is marked in two steps. First, when a cell s becomes consistent (line 3 in RebuildGVD), we check if it should be considered for inclusion in the GVD. In the function ConsiderForGVD, if a cell s has an adjacent cell n in a different Voronoi region, s is placed on the TIES list of possible GVD cells and n is recorded for later use. Thus, if any two adjacent cells lie in different Voronoi regions, at least one of them will be placed on the TIES list. Once the OPEN list is empty, the TIES list is examined in function ConstructGVD to determine which cells belong on the GVD using the same criterion.

N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids

RebuildGVD()

ConsiderForGVD(s) 1 foreach n ∈ Adj(s) 2 if obsts = obstn 3 ties = n

1 while s = remove(OPEN ) 2 3

119

if distnew < dists s dists = distnew ; s

4

ProcessLower(s)

5 6

ConsiderForGVD(s) else

7 8

dists =∞ if distnew = dists s

9 insert(OPEN , s, 10 ProcessRaise(s) 11 ConstructGVD()

4

insert(TIES , s)

ConstructGVD(s) 1 while s = remove(TIES )

distnew ) s

2 3

foreach n ∈ Adj(s) if obstn = obsts

4 5

n and s are on the GVD ties = n ; tien = s

6 7

else n and s are not on the GVD

8

ties = ∅ ; tien = ∅

Figure 4. Pseudocode for rebuilding the GVD. Function RebuildGVD removes cells from the OPEN queue to be processed and functions ConsiderForGVD and ConstructGVD mark the GVD. ProcessLower(s)

ProcessRaise(s)

1 foreach n ∈ Adj(s) 2 if tien =s 3 4 5

1 foreach n ∈ Adj(s) 2 if tien =s

insert (TIES , n) > distnew if distnew n s  d = distance(n, s) + distnew ) s

3 4

6

if d < distnew n

5 6

7

distnew n

7

8 9

parentn = s obstn = obsts

10

=

d

insert(OPEN , n, distnew ) s

8 9 10 11 12 13

insert(TIES , n) if parentn =s new ; distnew =∞ distold n =distn n obstold n =obstn foreach a ∈ Adj(n) s.t. obsta is valid d =distance(n, a) + distnew a if d < distnew n

distnew = d ; parentn =a n obstn =obsta old if distnew = distold n n or obstn = obstn insert(OPEN , n, min(distnew , distn )) n

Figure 5. Pseudocode for propagating overconsistencies (ProcessLower) and underconsistencies (ProcessRaise).

The functions ProcessLower and ProcessRaise both update cells’ distances and place inconsistent cells on the queue. ProcessLower propagates overconsistencies: a cell examines its neighbors to see if it can lower their distances. Any changed neighboring cell has its lookahead distance, parent, and closest obstacle updated and is placed on the queue. ProcessRaise propagates underconsistencies: a cell examines its neighbors to see if any used its old distance to compute their own (line 4). For each such cell, the lookahead distance is reset and then recomputed using current values (lines 7-11). Any changed cell is placed back on the OPEN list. Additionally, both ProcessLower and ProcessRaise also find neighboring cells that were marked on the GVD because their Voronoi region was different from the current cell’s. Such cells are reinserted into the TIES list to be reexamined (lines 2-3). The termination and correctness of Dynamic Brushfire follow from the proof of D* Lite [11] and are shown in [8].

120

a

N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids

b

c

d

Figure 6. The maps used to generate results. From left to right, the complete correct map, the map with random errors, the map at lower resolution, and an illustration of an Unmanned Aerial Vehicle (UAV) updating the GVD to the correct map from the low resolution map.

4. Experiments and Results In this section we present and discuss statistical comparisons between Dynamic Brushfire and competing algorithms on several robotic scenarios. We also present results from applying Dynamic Brushfire to real robot data and to a multirobot planning problem. 4.1. Comparison to Other Algorithms We compared Dynamic Brushfire to the Brushfire, Euclidean Distance Transform, and quasi-Euclidean Distance Transform algorithms discussed in Section 2 on four scenarios. The first scenario is common to many domains and involves constructing the GVD once given a static environment. We use 200×200 cell environments that are 20% occupied by randomly placed obstacles ranging in size from 5×5 to 20×20 cells (see Figure 6 (a)). The remaining three scenarios are unique to and occur often in robotics: they require traversing the environment and repairing the GVD as new information is gathered from sensors. For this, we simulate an Unmanned Aerial Vehicle (UAV) sweeping this environment with a 10 cell omni-directional sensor and require that the UAV repair the GVD after every 10 cells of traverse. In the first of these three scenarios, the UAV has no prior map. In the second, it must repair the GVD given an erroneous prior map in which each obstacle has a 30% chance of having incorrect dimensions, of being placed incorrectly, or of being absent. An additional 10% of obstacles are randomly added (see Figure 6 (b)). Finally, the third involves repairing a GVD given a 20×20 cell low-resolution prior map (see Figure 6 (c)). Figure 6 (d) illustrates the UAV’s traverse as it incorporates new information into a prior low-resolution map. We ran each algorithm on each scenario on 100 different maps. The results from these experiments are shown in Table 1 and in Figure 7. The performance difference between Dynamic Brushfire and the other algorithms depends primarily on how much of the old computation can be preserved and how often repair must occur. This is true of most incremental repair algorithms (e.g. D* and D* Lite). Thus, in initial construction of the GVD (where prior computation does not exist), the extra processing that enables repair causes Dynamic Brushfire to be competitive to but slightly slower than the other algorithms. In the other three scenarios, Dynamic Brushfire outperforms its competitors by an order of magnitude. This improvement is most pronounced when updating from the low resolution and erroneous maps (Dynamic Brushfire is 20 and 17 times faster, respectively, than its closest competitor) because information about the environment is gained at every step and is localized around the robot. When constructing the GVD without a prior map,

121

N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids

Figure 7. A graph of the results from Table 1 on the three incremental update scenarios. From left to right in each group: Brushfire, EDT, Q-EDT, and Dynamic Brushfire. Table 1. A comparison of the performances of the four approaches on four GVD construction and repair scenarios: initial construction of a complete map (Initial), incremental construction without a prior map (Blank), incremental construction with an erroneous prior map (Error), and incremental construction with a low resolution prior map (Lowres). Each scenario was tested 100 times for each algorithm; the left column indicates the mean total time taken in seconds, and the right column indicates the standard deviation of this measure. Algorithm Brushfire

Initial

Blank

Error

Lowres

μ

σ

μ

σ

μ

σ

μ

σ

0.158

0.006

20.075

1.091

10.467

2.382

20.666

1.393

Q-EDT

0.091

0.003

12.161

0.061

5.662

1.256

11.356

0.065

EDT

0.104

0.003

11.264

0.078

7.227

1.669

15.212

1.276

DynamicBrushfire

0.162

0.005

1.887

0.174

0.328

0.051

0.551

0.068

large portions of the map have yet to be observed and the GVD is sparse. Thus, changes must be propagated over larger areas than in the other cases. As a result, the improvement is significant (Dynamic Brushfire is 6 times faster) but less than in the other scenarios. 4.2. GVD Construction on Real Data We have also tested Dynamic Brushfire on real environmental data obtained from traverses by a John Deere e-Gator robotic platform equipped with a SICK LMS laser in an outdoor wooded environment (Figure 1, top and bottom right). In one experiment, our robot explored a 75 × 40 meter area (Figure 1, left). We generated a GVD of the final data (Figure 1, center). We also repaired the GVD after each sensor scan from the laser. The roughly 2,400 cell updates over 90 repair episodes in a 375×200 cell environment took Dynamic Brushfire 2.7 seconds. For comparison, the same experiment took Brushfire 31.1 seconds. In a second experiment, we gathered sensor data from a traverse by the e-Gator platform in a similar but larger environment and then returned three weeks later for a second traverse over the same area. The first traverse provided a prior map for the second traverse. Here, repair on a prior map of size 680 ×480 cells with roughly 2,200 cell updates over 156 repair episodes took Dynamic Brushfire 11.4 seconds. For comparison, the same experiment took Brushfire 250.6 seconds. 4.3. Application to Multirobot Path Planning This work was originally motivated by the need for efficient path planning for multirobot teams. Specifically, we are interested in solving the constrained exploration problem in which a team of agents must navigate through an environment while maintaining com-

122

N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids

Figure 8. The path of two robots successfully completing a constrained exploration task from the left to the right of the environment. The paths are in bold blue/gray and bold black, obstacles are in light gray, and the GVD generated on this environment is in thin light gray. The coverage areas of the four communication towers are marked by dashed circles.

munication constraints. An instance of this problem is presented in Figure 8. Here, two robots must navigate through the environment to reach their goal locations, from the left to the right. Each robot must also always remain in contact with one of the communication towers in the environment at all times, either directly or indirectly via its teammate. Because the area of coverage of some neighboring communication towers do not overlap, the robots must coordinate to hop over the gaps. In such domains where teammates’ actions are highly interdependent, centrally planning for parts of the team can be beneficial [9]. However, algorithms for computing centralized plans typically have complexity exponential in the number of robots. While normally such planning might be intractable, it can be made feasible by reducing the search space (e.g. by planning only on the GVD). We compared planning on the GVD to planning over the entire space. In this instance, A* took only 0.36 seconds to determine a path when the search space was limited to the GVD, while it took 94.9 seconds when searching over the entire grid. We then repeated the same task but this time gave the robots an erroneous prior map of the environment. They constructed the GVD of the current map, planned a path, executed a portion of that plan while improving the map with sensor information, and then repaired the GVD and replanned again. Replanning after every five steps resulted in 74 planning and GVD construction episodes; the total planning time was 12.2 seconds and the GVD repair time using Dynamic Brushfire was 4.5 seconds. For comparison, GVD repair using Brushfire took 132.3 seconds.

5. Conclusions In this paper we have presented Dynamic Brushfire, a new algorithm for efficient, incremental reconstruction of GVDs on grids. Dynamic Brushfire operates analogously to the D* family of algorithms for path replanning: it restricts the propagation of changes in the environment to only those areas that could be affected. We have compared our algorithm to several leading approaches for constructing GVDs and found our algorithm to be significantly more efficient, particularly in typical robotics applications. We are currently using this algorithm for multirobot path planning in unknown and uncertain environments.

N. Kalra et al. / Incremental Reconstruction of Generalized Voronoi Diagrams on Grids

123

6. Acknowledgments This work was sponsored by the U.S. Army Research Laboratory, under contract “Robotics Collaborative Technology Alliance” (contract number DAAD19-01-2-0012). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies or endorsements of the U.S. Government. Dave Ferguson is supported in part by an NSF Graduate Research Fellowship. References [1] Jerome Barraquand and Jean-Claude Latombe. Robot motion planning: A distributed representation approach. Technical Report STAN-CS-89-1257, Computer Science Department, Stanford University, May 1989. [2] Heinz Breu, Joseph Gil, David Kirkpatrick, and Michael Werman. Linear time euclidean distance transform algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:529–533, May 1995. [3] Howie Choset and Joel Burdick. Sensor based planning, part I: The generalized voronoi graph. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1995. [4] Howie Choset, Sean Walker, Kunnayut Eiamsa-Ard, and Joel Burdick. Sensor-based exploration: Incremental construction of the hierarchical generalized voronoi graph. The International Journal of Robotics Research, 19(2):126 – 148, February 2000. [5] Mark Foskey, Maxim Garber, Ming Lin, and Dinesh Manocha. A voronoi-based hybrid motion planner. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2001. [6] Leonidas Guibas, Christopher Holleman, and Lydia Kavraki. A probabilistic roadmap planner for flexible objects with a workspace medial-axis-based sampling approach. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1999. [7] Kenneth Hoff, Tim Culver, John Keyser, Ming Lin, and Dinesh Manocha. Fast computation of generalized voronoi diagrams using graphics hardware. In SIGGRAPH ’99: Proceedings of the 26th annual conference on Computer graphics and interactive techniques, 1999. [8] Nidhi Kalra, Dave Ferguson, and Anthony Stentz. The dynamic brushfire algorithm. Technical Report CMU-RI-TR-05-37, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, September 2005. [9] Nidhi Kalra, Dave Ferguson, and Anthony Stentz. Hoplites: A market-based framework for complex tight coordination in multi-robot teams. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2005. [10] Sven Koenig and Maxim Likhachev. D* lite. In Proceedings of the National Conference on Artificial Intelligence (AAAI), 2002. [11] Maxim Likhachev and Sven Koenig. Lifelong Planning A* and Dynamic A* Lite: The proofs. Technical report, College of Computing, Georgia Institute of Technology, 2001. [12] Nageswara Rao, Neal Stoltzfus, and S. Sitharama Iyengar. A "retraction" method for learned navigation in unknown terrains for a circular robot. IEEE Transactions on Robotics and Automation, 7(5), October 1991. [13] Azriel Rosenfeld and John Pfaltz. Sequential operations in digital picture processing. Journal of the Association for Computing Machinery, 13(4), 1966. [14] Peter Forbes Rowat. Representing spatial experience and solving spatial problems in a simulated robot environment. PhD thesis, University of British Columbia, 1979. [15] Anthony Stentz. Optimal and efficient path planning for partially-known environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1994.

124

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Learning from Nature to Build Intelligent Autonomous Robots Rainer BISCHOFF and Volker GRAEFE Intelligent Robots Laboratory, Bundeswehr University Munich, Germany Abstract. Information processing within autonomous robots should follow a biomimetic approach. In contrast to traditional approaches that make intensive use of accurate measurements, numerical models and control theory, the proposed biomimetic approach favors the concepts of perception, situation, skill and behavior – concepts that are used to describe human and animal behavior as well. Sensing should primarily be based on those senses that have proved their effectiveness in nature, such as vision, tactile sensing and hearing. Furthermore, human-robot communication should mimic dialogues between humans. It should be situation-dependent, multimodal and primarily based on spoken natural language and gestures. Applying these biomimetic concepts to the design of our robots led to adaptable, dependable and human-friendly behavior, which was proved in several short- and long-term experiments. Keywords. Bionics, robot system architecture, situation, skill, behavior

1. Introduction Nature has created highly complex, efficient and dependable systems in the form of organisms since the very beginning of life on earth. Design and function of organisms have been optimized under evolutionary pressure over billions of years, a small step at a time. It is, therefore, an attractive idea to apply nature’s solutions to today’s engineering problems and, specifically, to the creation of robot intelligence. A first step is to base a robot’s sensor modalities, world modeling, control and communication with humans primarily on those principles that proved their effectiveness in nature. This biomimetic approach to the realization of robot intelligence differs from the traditional one in many respects. Table 1 gives an overview. 2. Human information processing Especially if a robot, such as a personal robot, is supposed to work closely together with humans in human-populated environments, its information processing design might benefit from understanding the human one and from employing similar concepts. An overall qualitative model of human performance was proposed by Rasmussen [1]. In his model he classified three typical levels of performance of skilled human operators: skill-, rule-, and knowledge-based performance (Figure 1). According to Rasmussen, a skill-based behavior represents „sensory-motor performance during acts or activities which [...] take place without conscious control as smooth, automated, and highly integrated patterns of behavior”. Performance is mostly based on feedforward control which depends upon a very flexible and efficient dynamic internal world model. Handwriting and sports are examples of such well coordinated rapid movements.

R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots

125

Table 1. Summary of biomimetic vs. traditional approaches to robot intelligence Category

Traditional Approach

Biomimetic Approach

Sensor modalities

often only single modality

multi-modal, combination of:

often range sensors: laser range finder, sonar, radar, depth from stereo vision, motion, …

š

exteroceptors: vision, audition, touch, olfaction, gustation

š

proprioceptors: angle encoders, strain gauges, bodily acceleration sensors, kinesthetic sensors …

š

interoceptors: internal temperature measurements, battery gauging, …

Modeling

Robot control

seemingly suitable for constructing numerical models, e.g., 3-D surface reconstruction

suitable for situation assessment

global coordinates

local coordinates (if any)

accurate geometrical maps

attributed topological maps

detailed numerical models of the robot and the external world

qualitative models of relevant aspects of a vaguely known and ever-changing world

complete knowledge of all robot and world parameters

largely qualitative internal model of the robot’s situation

accurate and complete calibration of sensors, actuators, kinematics etc.

no or intrinsic calibration

accurate measurements

largely qualitative perception

control theory

highly adaptive calibration-free control situation-dependent behavior selection and execution

Communication

exact trajectory teaching

situation- and context-depending dialogues resembling inter-personal communication

artificial, well-structured (usually written) command language

spoken natural language augmented by gestures and visual and haptic events

commands and status reports based on coordinates

commands and status reports based on current situation and perceivable objects

Goal

Knowledge-Based Behavior Symbols

Identification

Decision of Task

Planning

Recognition

Association State / Task

Stored Rules for Tasks

Rule-Based Behavior Signs

Skill-Based Behavior Feature Formation

Sensory Input

(Signs)

Automated Sensori-Motor Patterns

Signals Actions

Fig. 1. Human information processing. Simplified model illustrating the three levels of performance of skilled human operators (Rasmussen [1]).

126

R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots

At the next level – rule-based behavior – human performance is goal-oriented, but structured through stored rules (procedures). These rules may have been obtained through instructions, derived from experience or deduced by conscious problemsolving. Rules are typically defined as „if-then-else” clauses, and operators would experience them as stereotype acts. Rasmussen makes a distinction between skill- and rule-based behavior by the level of training and attention a person requires to perform a certain task. Whereas skill-based performance does not necessitate a person’s conscious attention, rule-based performance is generally based on explicit know-how. At the highest conceptual level human performance is goal-controlled and knowledge-based. It requires a mental model of the system or task enabling the person to develop and predict different plans to reach a certain goal. Rasmussen’s model is well suited to describe a typical service task, such as serving a drink, in terms of the required knowledge, rules, skills, symbols, signs and signals. It is interesting to see that both a human and a robotic servant could apply the same model to decompose a task into a set of elementary action patterns. Supposed that the servant would know how to serve a drink, he would probably apply the following rules: grasp bottle, grasp glass, fill glass from bottle, approach table, place glass onto table Each rule would in turn be composed of elementary sensorimotor skills, e.g.: grasp bottle := locate object bottle, open hand, move hand towards object, close hand around object Each sensorimotor skill would require signals (i.e., continuous sensory input) or a sign (to start some automated sensorimotor skill). The completion of a rule or sensorimotor skill is indicated by a sign representing a specific system state. The rule „grasp bottle” could thus be terminated by the sign „hand closed around the object”, the termination sign of the rule „approach table” could be, e.g., „docking position reached”. Knowledge-based operations are necessary when a set of rules does not yet exist to describe a specific task or when an unfamiliar situation occurs. For example, imagine that the servant experiences unexpected difficulties in opening a closed bottle. To handle this exception it is required to devise a plan and probably a set of rules allowing the removal of the cork (or any other stopper) before being able to fill the glass. On this knowledge-based level, sensory inputs are interpreted as symbols that are related to concepts and functional properties, i.e., problem solving requires knowledge about the process („when the stopper has been removed, the filling can begin”).

3. Robot Information Processing So far, basically three paradigms for organizing robot information processing have emerged: hierarchical (sequential), reactive (parallel), and hybrid. As a combination of hierarchical and reactive, the hybrid approach is meant to combine the best of both worlds. It allows the implementation of both purely reactive behaviors (= reflexive behaviors) and more cognitively challenging (computationally more expensive) functions to a varying degree. Although quite a number of hybrid architectures have been developed (see [2] for an overview) it is still an open research question how to organize and coordinate the reactive and deliberative parts of the system. As part of our solution to this question, we introduced a situation-oriented skill-based system architecture [3] that – according to our biomimetic approach – favors perception over measurement and situation-based calibration-free control over accurate modeling and classical control

R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots

127

theory. This architecture is based on the concepts of situation, perception, behavior and skill, which are briefly explained in the sequel and illustrated in Figure 2. According to the classical approach, robot control is model-based and depends on a continuous stream of accurately measured data. Numerical models of the kinematics and dynamics of the robot and of the external objects that the robot should interact with, as well as quantitative sensor models, are the basis for controlling the robot’s motions. The main advantage of model-based control is that it lends itself to the application of classical control theory and, thus, may be considered a straight-forward approach. The weak point of the approach is that it breaks down when there is no accurate quantitative agreement between reality and the models, which is all too often the case. Organisms, on the other hand, have the ability to continuously cope with, and to survive in, the complex, dynamic and unpredictable real world. They are robust and adapt easily to changes of their own conditions and of the environment. They never need any calibration, and they normally do not know the values of any parameters related to the characteristics of their „sensors” or „actuators”. Obviously, they do not suffer from the shortcomings of model-based control which leads us to the assumption that they use something other than measurements and numerical models for controlling their behaviors. Perhaps their behavior control is based on a qualitative and holistic assessment of situations and the selection of actions to execute from an existing repertoire on that basis. We define a robot’s (internal) situation as the set of all decisive factors that should ideally be considered by a robot in selecting the correct action at a given moment. These decisive factors include: š the goals of the robot, i.e., permanent goals, usually internal goals of the robot, such as survival, and satisfaction of desires, intentions and plans; and transient goals emerging from the actual mission; š the state of the robot (state of motion, state of actuators and sensors, presently executed behaviors, focus of attention of the perceptual system,...); š the state of the robot’s environment, i.e., perceivable objects in the environment and their suspected or recognized states; and static characteristics of the environment, even if they currently cannot be perceived by the robot’s sensors; š the repertoire of available skills and knowledge of the robot’s abilities to change the present situation in a desired way by executing appropriate skills. Main basis for the recognition of situations are perceptions that integrate sensory impressions of events in the external world. Perceptions help the robot to become aware of something through its senses. Generally speaking, perception plays a similar role in the biomimetic approach as measurement does in the traditional one. In the proposed system architecture situations provide the framework for the set of the robot skills. A skill is a goal-directed, well organized ability that is inbuilt, can be acquired and improved through learning, and is performed with economy of effort (derived from psychological literature [4]). Whereas a skill describes functional underpinnings of a robot and is implemented as a specific software module, a behavior is an observable activity. Skills may thus be the basis for behaviors. It should be noted that a robot may show a certain behavior not only based on the execution of skills, but also based on purely reactive responses to external or internal stimulations, e.g., reflexes. Figure 2 (left) illustrates our definition of the term „situation” by embedding it in the perception-action cycle of a situation-oriented skill-based robot. As defined above, the robot’s image of the situation emerges at any given moment by combining internal and external goals, knowledge and percepts.

128

R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots

Knowledge Base Storage

Situation Assessment & Skill Management

kills

Pr oc e

Com mu nic

ills Sk

skill execution

kills

rS

reflexive shortcut

&

knowledge (internal actions)

actuation

situation assessment

S ing

lls Ski

situation

ss

ive at

interaction

perception

supervision

or Sens

sensation

external goals

Data

internal goals

ROBOT

MMI Peripherals

oto

HUMAN

disturbances

skill management

M

ENVIRONMENT

Sen so

(external actions)

Sensors

r imotor Skil

ls

Actuators

disturbances

Fig. 2. Biomimetic robot information processing. Left: Perception-action cycle of a situation-oriented skillbased robot in interaction with its environment (and a human) based on the key concepts of perception, situation, and skill. Right: System architecture of a personal robotic assistant based on those key concepts.

4. Implementation Due to space limitations the actual implementation of the described fundamental concepts can only be sketched; for details cf. [3]. Figure 2 (right) shows the essence of the situation-oriented skill-based robot architecture as we have implemented it. The situation module (situation assessment & skill management) acts as the deliberator of the whole system. It is interfaced via skills in a bidirectional way with all hardware components – sensors, actuators, knowledge base storage and MMI peripherals (manmachine and machine-machine interface peripherals). Skills have direct access to the hardware components and, thus, let the robot sense, think (plan) and act. They obtain certain information, e.g., sensor readings, generate specific outputs, e.g., arm movements or speech, or plan a route based on map knowledge. Skills report to the situation module via events and messages on a cyclic or interruptive basis to enable a continuous and timely situation update and error handling. Whereas some skills require real-time connections to and from the situation module, e.g., sensor skills, non-real-time performance is sometimes allowed, e.g., for data base maintenance and route planning. The situation module fuses data and information from all system components to make situation assessment and skill management possible. As such, it provides cognitive skills, i.e., it is responsible for planning an appropriate skill sequence to reach a given goal. By activating and deactivating the inbuilt skills, the situation-dependent concatenation of skills is realized that leads to complex and elaborate robot behavior. Motor skills control simple movements of the robot’s actuators. They can be arbitrarily combined to yield a basis for more complex control commands. Encapsulating the access to groups of cooperating actuators leads to a simple interface structure and allows an easy generation of pre-programmed motion patterns. Sensor skills encapsulate the access to one or more sensors and provide the situation module with proprio-, extero- and interoceptive data. Sensorimotor skills combine both sensor and motor skills to yield sensor-guided robot motions. Data processing skills organize and access the system’s knowledge bases. They return specific information upon request and modify existing, or add newly gained, knowledge to the data bases. Various types of knowledge bases are being used, including an attributed topological map for storing the static characteristics of the environment, an object data base, a subject data base and a list of missions to accomplish. Communication takes place via various different channels (acoustic, optical, tactile) and requires a variety of Communicative skills to pre-

R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots

129

process input from humans and to generate a valuable feedback for them according to the current situation and the given application scenario. They depend on various sensor skills to receive the required information, e.g., hearing skills for understanding speech, vision skills, and even tactile skills, to allow situated dialogues with references to visible and tangible objects and subjects in the world. When comparing Rasmussen’s model of human information processing (cf. Fig. 1) with our model of robot information processing (cf. Fig. 2) the three blocks „feature formation”, „recognition” and „identification” are realized by sensor skills, and „automated sensor-motor patterns” are mostly realized by motor and sensorimotor skills. The cognitive skills provided by the situation module correspond to the three blocks „decision of task”, „planning” and „association state/task”. Part of „planning” and „stored rules for task” are realized by data processing skills. As communicative skills involve the entire information processing system they cannot be assigned to an individual block of Figure 1. In our implementation all tasks and threads of the robot’s software system run asynchronously, but can nevertheless be synchronized via messages or events. Overall control is realized as a finite state automaton that is capable of responding to prioritized interrupts and messages. A finite state automaton has been preferred over other possibilities because of its simplicity. Motor skills are mostly implemented at the microcontroller level within the actuator modules. The other skills are implemented on a network of digital signal processors that is physically embedded in a PC.

5. Real-world experiments Over the years four of our robots have been equipped with different variants of the described system architecture and brought into short- and long-term field trials: two autonomous vehicles (ATHENE I and II), a stationary manipulator and a humanoid robot (HERMES). All robots were endowed from the start with certain basic skills, and were able to incrementally extend their knowledge and skills through learning. Calibration-free navigation. Two different vision-guided mobile robots were used as experimental platforms for learning to navigate a network of corridors by concatenating simple vision- and odometry-based sensorimotor skills: HERMES, a humanoid personal robotic assistant, and the older ATHENE II, a transportation robot with tricycle wheel configuration. HERMES has a unique wheel configuration with two independently driven and steered wheels mounted at the front and the back, and two caster wheels arranged at the centers of the left and right sides of a quadratic undercarriage allowing many different drive modes (differential, car-like, fork-lift, omnidirectional) [5]. While ATHENE II is equipped with a monochrome video camera mounted on a 1-DOF platform steerable around its vertical axis, HERMES’ main sensors are two monochrome video cameras mounted on independent pan/tilt drive units, in addition to a pan/tilt unit that controls the common „head” platform. Various standard lenses (from wide angle to tele) and a Suematsu lens were used during the experiments. It should be explicitly noted that it would have been difficult and time-consuming to measure all relevant kinematic parameters of such versatile robots with any degree of accuracy and to design model-based controllers for all possible modes of operation. Our calibration-free approach makes it unnecessary. For testing our calibration-free navigation concept we have equipped both robots with reliable, robust and sufficiently fast dynamic vision systems [5]. They yield the

130

R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots

required data, such as the position of a guideline that is typically detected at the intersections of the floor with the walls despite adverse conditions, such as specular reflections and difficult illumination conditions, in general at video field rate (i.e., 50 Hz). In the beginning of the actual driving experiments HERMES and ATHENE II were placed somewhere in a corridor. Their exact orientation and position were unknown, but a section of the guideline had to be in the field of view. Corridor navigation was possible and worked fine in a tilt angle range from 0° (horizontal) to about 50° (looking downward). Negative tilt angles (i.e., looking upward) were not tested, but could have worked as long as the guideline was visible. Based on their navigation skills map building and delivery missions were successfully performed (Fig. 3. left). Calibration-free manipulation. Manipulation experiments were first carried out with the vision-guided stationary manipulator (Fig. 3, middle). The cameras are attached to the robot on metal rods at the first link so that they rotate around the axis of joint 1 together with the arm. They are mounted in a rather unstable way to make the impossibility of any calibration or precise adjustment obvious, and to allow easy random modifications of the camera arrangement. Objects of various shapes could be grasped, although no knowledge regarding the optical or kinematic parameters of the robot was used. Even arbitrary unknown rotations and shifts of the cameras were tolerated while the robot was operating, which demonstrated the robot’s extreme adaptability. Furthermore, it was shown that such an extreme adaptability does not necessarily have to degrade pick and place cycle times if appropriate learning schemes are used. With its two arms with 6 DOF and a two-finger gripper each and a bendable body, HERMES possesses an even more complex manipulation system which – besides grasping - allowed us to carry out human-robot interaction experiments. The biggest advantage of our skill-based approach is that it enormously reduces complexity in designing intelligent robot behaviors, such as taking/handing over objects from/to people, placing them onto other objects and gesturing in communicative acts. Communication and interaction. HERMES has been presented successfully at trade fairs, in television studios and in a museum, and at various demonstrations in our institute environment. The robot is able to deliver simple services within initially unknown environments for users that may be initially unknown, too. Services handled by the robot include, but are not limited to, transportation tasks, entertainment, and gathering and providing information about people and their living or working environment (Fig. 3, right). For example, many types of dialogues exist to cooperatively teach the robot new knowledge and to build a common reference frame for subsequent execution of service tasks. It is possible for the robot – besides learning an attributed topological map of the environment - to learn persons’ names, to learn how locations and objects are denominated by a specific person, where objects of personal and general interest are located, and how to grasp specific objects. This requires several databases and links between them in order to retrieve the required information [6]. One of the most promising results of our experiments is that the biomimetic approach seems to pay off. First, despite the complexity of the overall system the natureinspired system architecture allows the integration of newly developed skills very easily making the system scalable. Second, although we experienced drifting of system parameters in the long-term experiments due to temperature changes or simply wear of parts or aging, the robots’ performances were not affected by such drifts because our algorithms only rely on qualitatively (not quantitatively) correct information and adapt to parameter changes automatically. We found that the implemented skills worked very well not only around our laboratory, but also in other settings, despite the rather differ-

R. Bischoff and V. Graefe / Learning from Nature to Build Intelligent Autonomous Robots

131

ent appearance of objects of interest (e.g., corridors, docking stations). Third, according to the museum staff who ran our longest field trial for a period of more than 6 months (October 2001 – April 2002) in the Heinz Nixdorf MuseumsForum (HNF) in Paderborn, Germany, HERMES was one of the few robots at the show that could regularly be demonstrated in action, and among them it was considered the most intelligent and most dependable one. This statement is supported by the fact that the museum staff never called for advice once the initial setup was done.

gripper

camera

camera

object

Fig. 3. Experiments and Results. Left: Image taken by the HERMES' vision system during corridor navigation (tracking a guideline, indicating two forms of interruptions (door and junction)). Middle: Setup of stationary robot arm used for grasping (Mitsubishi Movemaster RV-M2, 5 DOF, two-finger gripper and a stereo vision system). Right: HERMES describing the way to a location of interest by means of voice and gestures.

6. Summary and conclusions A biomimetic approach to the realization of robot intelligence by studying mother nature’s best ideas and then imitating her designs and processes to solve robotic problems has been proposed. A robot information processing architecture was derived by studying Rasmussen’s model of human information processing. It has been implemented on various robots and has proved to endow robots with intelligent situation-oriented behavior. The ability to sense in a human-like way by means of vision, touch and hearing – the most powerful sensor modalities known – is enabling our robots to perceive their environments, to understand complex situations and to behave intelligently. While today’s robots are mostly strong with respect to a single functionality, e.g., navigation or manipulation, our results illustrate that many functionalities can be integrated within one single robot through a nature-inspired unifying situation-oriented skill-based system architecture. Furthermore, testing a robot in various environmental settings, both short- and long-term, with people having different needs and different intellectual, cultural and social backgrounds, is enormously beneficial for learning the lessons that will eventually enable us to build dependable personal robotic assistants. 7. References 1. 2. 3. 4. 5. 6.

Rasmussen, J. (1983): Skills, rules, and knowledge; Signals, signs, and symbols, and other distinctions in human performance models. IEEE Trans. on Systems, Man and Cybern. Vol. 13, No. 3, pp. 257-266. Arkin, R. C. (1998): Behavior-Based Robotics. MIT Press, Cambridge, MA, 1998. Bischoff, R.; Graefe, V. (2004): HERMES – a versatile Personal Robotic Assistant. Proc. of the IEEE, Spec. Issue on Human Interactive Robots for Psychological Enrichment, Vol. 92, No. 11, p. 1759-1779. Proctor, R. W.; Dutta, A. (1995): Skill Acquisition and Human Performance. In: Bourne, L. E. (ed.). Advanced Psychology Texts. SAGE Publications, Thousand Oaks. Graefe, V.; Bischoff, R. (2004): Robot Navigation without Calibration. Proceedings IEEE International Workshop on Intelligent Robots and Systems (IROS ‘04). Sendai, pp. 627-632 Bischoff, R.; Graefe. V. (2002): Dependable Multimodal Communication and Interaction with Robotic Assistants. 11th IEEE Int. Works. on Robot and Human Interactive Comm. Berlin, pp. 300-305.

This page intentionally left blank

Part 2 Tracking Control & Active Vision

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

135

A Laser Based Multi-Target Tracking for Mobile Robot Masafumi HASHIMOTO a1, Satoshi OGATA b, Fuminori OBA b, and Takeshi MURAYAMA c a

Faculty of Engineering, Doshisha University, Japan Faculty of Engineering, Hiroshima University, Japan c Faculty of Dentistry, Hiroshima University, Japan b

Abstract. This paper presents a method for tracking multiple moving objects with invehicle 2D laser range sensor (LRS) in a cluttered environment, where ambiguous/false measurements appear in the laser image due to observing clutters and windows, etc. Moving objects are detected from the laser image with the LRS via a heuristic rule and an occupancy grid based method. The moving objects are tracked based on Kalman filter and the assignment algorithm. A rule based track management system is embedded into the tracking system in order to improve the tracking performance. The experimental results of two people tracking validate the proposed method. Keywords. Mobile robot, Laser range sensor, Multi-target tracking, Occupancy grid method, Kalman filter, Data association, Assignment algorithm

1. Introduction Tracking (i.e., estimating position and velocity) of multiple targets (moving objects) is an important issue for achieving autonomous navigation and cooperation of mobile robots and vehicles. There has been much interest in the use of stereo vision or laser range sensor (LRS) in mobile robotics and vehicle automation [1-13]. Although a number of papers exist in tracking of multiple moving objects with the LRS in clean/crowded environments [2-4, 68, 11, 12], there are a few efforts [5, 13] in the use of the LRS in cluttered environments, where ambiguous/false measurements appear in the laser image by observing clutters, windows and so on. In this paper we present a method for tracking multiple moving objects with the LRS in cluttered environments. The tracking of multiple moving objects is achieved by two steps; target detection (finding moving objects from the laser image) and target tracking (tracking the moving objects detected). The occupancy grid method is one of effective methods for the target detection [2, 3]; another approach to find moving objects is based on the scan matching of successive laser scans [13]. In this paper we apply the occupancy grid based method for the target detection. In the occupancy grid method the cells occupied by obstacles are marked. For each new cell of occupation in the current scan the corresponding cell in the previous 1 Corresponding Author: Department of Information Systems Design, Faculty of Engineering, Doshisha University, Tatara, Kyotanabe, Kyoto, 610-0321 Japan; E-mail: [email protected]

136

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

scan is checked. If that cell is marked the cell in the current scan is considered as stationary, otherwise as moving. Most of the conventional occupancy grid based target detection are based on the measurements in a few successive scans of the LRS, and thus cluttered environments deteriorate the detection performance since ambiguous/false measurements appear in the laser image. In order to enhance the performance of the target detection in cluttered environments this paper presents an alternative method based on a heuristic rule and an occupancy grid method. The target tracking is divided into two approaches; model based approach (e.g., Kalman filter and Particle filter) [4, 6-13] and model free approach [2, 3, 5]. The model based approach makes it possible to track objects temporarily occluded by stationary/ moving objects, and thus it is more suitable than the model free approach. Data association (one-to-one matching of tracked objects and measurements) is necessary for the target tracking in multi-target environments. Many data association techniques have been addressed in the aerospace area (e.g., Joint Probabilistic Data Association Filter (JPDAF) and Multiple Hypothesis Tracker (MHT)) [15]. In this paper we apply the assignment algorithm [14] for the data association. The algorithm is simple extension of well-known Nearest Neighboring Standard Filter (NNSF) [15] for the data association in multi-target environments, and it allows the multi-target tracking in cluttered environments. Moreover in this paper a rule based track management system is presented in order to improve the tracking performance. This paper is organized as follows: In Section 2, our experimental system is overviewed. In Sections 3 and 4, methods of moving target detection and tracking, respectively, are presented. In Section 5, two experiments are conducted to validate our method, followed by conclusions.

2. Experimental System Figure 1 shows our mobile robot (a smart wheelchair) controlled with a joy-stick. It has two steered/drive wheels in its front and at rear and four passive casters at four corners. Each of the steered/drive wheels consists of a steering unit and a drive unit. A potentiometer is attached at the steering unit to sense the steering angle, and an encoder is attached at the drive unit to measure the wheel velocity. A yaw-rate gyro is attached at the robot body to sense the turn velocity accurately. These five internal sensors are exploited for determining the ego-motion of the robot with the dead reckoning. A forward-looking LRS (Sick LMS200) mounted on the robot takes the scan image, which is represented by a sequence of distance samples in the horizontal half plane; the distance samples are angularly ordered counterclockwise. The angular resolution of the LRS is 0.5deg, and then the number of distance samples are 361(=180/0.5+1) in one scan image. The scanning frequency is about 75Hz; however the performance of serial communication link imposes a restriction to about 10Hz.

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

137

Laser range sensor (LRS)

Steered/drive wheel

Figure 1.

Overview of mobile robot (smart wheelchair)

Figure 2. Samples taken with LRS within an indoor environment. The symbol of open circle shows the sample point with LRS.

3. Target Detection We describe a method for finding moving objects from the scan image of the LRS based on Fig.2, where moving objects are the objects (D), (G) and (I). Detecting moving objects are achieved by the three steps; sample clustering, exclusion of distant objects (detection of nearby objects), and exclusion of stationary objects (detection of moving objects). The distance samples coming from the same object have the similar values, while those coming from different objects are significantly difference. Thus the distance samples taken with the LRS are clustered by checking the gap between two adjacent samples. Based on the sample clustering the ten objects (A) to (J) are detected as in Fig.2. It is assumed that for collision avoidance, it has interest in moving objects located in the vicinity of the robot. In order to exclude distant objects (objects located far away from the robot) from the sample clusters as many as possible, we use a heuristic rule; as shown in Fig.2 the two edge-samples related to the object (i), where i=B to I, are denoted by bsi and bei, respectively. The edge-sample related to the right object of the object (i) is denoted by be(i-1), and that related to the left object of the object (i) is done by bs(i+1). Objects that comply with bsi  be ( i 1) t rbg ‫ޓ‬and bei  bs (i1) t rbg , where rbg is a threshold, are considered as the distant objects. Based on this heuristic rule the objects (A),(C), (F), (H) and (J) are excluded, and the objects (B), (D), (E), (G) and (I) remain. It should be noted that the distant object (E) is not excluded since our exclusion algorithm is only based on the two gaps related to three adjacent sample-clusters and the relatively distant object is excluded. In our experiments described in Section 5, the threshold rbg is set at 0.3m. As shown in Fig.3 a grid map is represented in a reference coordinate frame 6 w ( Ow ; X wYw ). The LRS measurements related to the objects (B), (D), (E), (G) and (I) are mapped in 6 w , and those objects are marked on the grid map. In order to judge whether each cell on the grid map is occupied by stationary or moving objects, we count how many

138

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

Yw

Robot

Ow

Xw

Figure3. Grid map; the gray cells show the field of view of the LRS.

successive scans each cell is marked in. The cell marked in more than ten successive scans is referred to as the stationary object cell. For the objects (B), (D), (E), (G) and (I) we obtain the value of D i b / a ; where a is the number of cells occupied by the object (i), where i= (B), (D), (E), (G) and (I); b is the number of the stationary object cell related to the object (i). The object that complies with D i  0.5 is decided as the moving object, and thus the moving objects (D), (G) and (I) are extracted. Finally the nearby objects (G) and (I) are merged. In our experiments the cell size is set at 0.3m x 0.3m. For mapping the laser image on the grid map we find the change of the robot position and orientation between the laser scans with the dead reckoning system. Although the dead reckoning causes the drift error over time, we use the change in a limited number of previous scans. Thus the effect of the drift error on the mapping is very small.

4. Target Tracking The position of the moving object in 6 w is denoted by ( x, y ) . The object is assumed to move at nearly constant velocity. A rate kinematics of the object motion is then given by xk

Fx k 1  G'xk 1

(1)

where x ( x, x , y, y )T . 'x ('x, 'y)T is an acceleration. The subscripts ( k  1 ) and ( k ) represent the ( k  1 )th scan and the k th scan, respectively, of the LRS. The system matrices F and G are constant. The LRS measurements related to the moving object (i.e., the distance zr and the bearing zT ) give the position in 6 w by ( z x , z y ) ( zr cos zT , zr sin zT ) . The measurement model related to the moving object is then

139

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

zk

H k xk  Lk pk  'z k

(2)

where z ( z x , z y )T . 'z is a measurement noise. p is the position vector of the LRS in 6 w . The system matrices are given by Hk

ª cos\ LRSk « sin\ LRSk ¬

0 sin\ LRSk 0 cos\ LRSk

0º and Lk 0»¼

ª cos\ LRSk « sin\ LRSk ¬

sin\ LRSk º cos\ LRSk »¼

\ LRS is the orientation of the LRS in 6 w . The position and orientation, p and \ LRS , are estimated with the dead reckoning system. The position and velocity of the moving object are predicted with Kalman filter [15]. As shown in Fig.4 a validation region with the radius of rVG is set around the predicted position of the tracked object. The measurement inside the validation region, which is considered to come from the tracked object, is applied for the track update with Kalman filter. In our experiments the radius of the validation region, rVG , is set at 0.5m. As in Fig.4 (a)-(c) in real world multiple measurements would exist inside a validation region; multiple tracked objects would also compete for measurements. For achieving the reliable data association (one-to-one matching of tracked objects and measurements), we exploit the assignment algorithm [14], in which a combined likelihood function relating to measurements is evaluated. We assume that the number of tracked objects, whose validation regions are overlapped partially, is N and that the number of measurements, which exist inside the validation region, is M . A binary assignment variable anm is defined such that anm =1 if the measurement m is assigned to the object n, otherwise anm =0, where m=1 to M and n=1 to N. The assignment variable is subject to the following constraints: M N

¦a

nm

m 1

N M

¦ anm n 1

½ 1‫ޓޓ‬ for ‫ޓ‬n 1,, N , N  1,, N  M ° ° ¾ 1‫ޓޓ‬ for ‫ޓ‬m 1,, M , M  1,, M  N ° ¿°

(3)

where anm =1 for n t N  1 and m d M denotes that measurements come from new moving

(a) Case 1

(b) Case 2

(c) Case 3

(d) Case 4

(e) Case 5

Figure 4. Tracking condition; the symbols of closed circle and open diamond mean the tracked object and the measurement, respectively. The circle drawn by broken line is the validation region.

140

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

objects or false alarms; anm =1 for n d N and m t M  1 denotes that tracked objects are invisible; anm =1 for n t N  1 and m t M  1 is dummy. A set of assignment variables and a set of measurements in the kth scan of the LRS are denoted by a {anm : n, m 1,, N  M } and Z k {z m ,k }mM 1 , respectively. The data association is achieved by finding the optimal assignment a * so that the combined likelihood function p>Z k | a @ relating to the measurements Z k might be maximized under the constraints in Eq.(3). Equivalently this can be rewritten as the following standard minimization problem: N M M  N

a* arg min a

¦ ¦a n 1

c

nm nm

(4)

m 1

The cost function cnm is given by [14]

cnm

­ln[ODVVG Lnm ,k|k 1 ] for n d N , m d M ° ® ln[1  OD ] ‫ ޓ‬for n d N , m t M  1‫ޓ‬ ° for n t N  1 0 ¯

(5)

where OD is the detection probability of the tracked object, and VVG is the area of the validation region. Lnm is the likelihood function related to the measurement under the assumption that the measurement m comes from the object n . In our experiments we set OD =0.9. In cluttered environments ambiguous/false measurements may cause incorrect extraction of stationary and moving objects, and then false alarms occur (i.e., false moving objects are detected). Moreover in real world moving objects appear in and disappear from the field of view of LRS. They also meet the occlusion, crossing, merging and splitting situations. In order to cope with such conditions we build a track management system based on the following rules: a) Track initiation: As shown in Fig.4 (a) and (d) the measurements that are not matched with any tracked objects are considered to come from new moving objects or false alarms. The false alarms would disappear soon. From this reason the robot tentatively initiates tracking of the measurements with Kalman filter. If the measurements are constantly visible in more than 13 successive scans of the LRS, they are decided to come from new moving objects, and the tracking is continued. If the measurements disappear within 12 scans, they are decided as the false alarms, and the tentative tracking is terminated. b) Track termination: When the tracked objects leave the field of view of the LRS or they are occluded by other objects/obstacles in environments, no measurements exist within their validation regions as shown in Fig.4 (e). If no measurements arise from the temporal occlusion the measurements would appear again. The robot thus predicts the positions of the tracked objects with Kalman filter. If the measurements appear again within 30 scans of the LRS, the robot proceeds with tracking the objects. Otherwise the tracking is terminated.

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

141

c) Merging and crossing: Merging or crossing of moving objects brings a situation as shown in Fig.4(b). In the crossing situation, however, new measurements would appear soon within the validation regions. If new measurements appear again within eight scans of the LRS the objects are considered as crossing, otherwise as merging.

5. Experimental Results

We conducted an experiment to evaluate the proposed method; two people walking around were tracked in a room environment as shown in Fig.5. They always exist in the field of view of the LRS, and they cross many times. Their waking speed is between 0.3 and 1.2m/s. The LRS is set on the robot at a height of 0.47m from the floor, and thus it observes around knees of the people. Figure 6 shows the experimental results; the subfigure (a) plots all the samples taken with the LRS in the experiment. (b) shows the result of detecting moving objects. (c) shows the tracking result, where black and gray bold lines show the tracks of two people, and the black dotted line shows the track of the robot. The LRS observes many kinds of stationary objects in the room environment (e.g., walls, desks, chairs, boxes, cables, and pipes), and ambiguous measurements appear. The measurements cause many false moving objects. However the tracking system works well, and thus only the two walking people are tracked as in Fig. 6(c). We conducted another experiment; two people walking around were tracked in room/corridor environments as shown in Fig.7. The room environment is the same as that conducted in the above experiment. The left side of the corridor is windows, while the right side is walls and doors as shown in Fig.7; the windows disturb the laser image due to the laser transmission/diffusion, and thus many spurious measurements appear at the window side of the corridor as in Fig.8 (a). However our tracking system achieves the people tracking correctly as in Fig.8(c).

6. Conclusions

In this paper we presented a laser based multi-target tracking in cluttered environments where ambiguous/spurious measurements are visible in the laser image. The moving objects were detected based on a heuristic rule and an occupancy grid based method. The detected objects were tracked based on Kalman filter and the assignment algorithm. In order to enhance the tracking performance a rule based track management method was incorporated into the tracking system. Two experimental results validated our method. The method proposed in this paper allows detecting and tracking only the small moving objects (e.g., walking people). Current research effort is directed toward detecting and tracking moving objects with a wide range of size (e.g., mobile robots and vehicles as well as walking people).

142

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

Figure 5. Experimental environment

(a) Laser image with LRS

(b) Result of detecting moving object

(c) Result of tracking moving object

Figure 6. Experimental result; in the subfigure (c) black-dotted line, black-bold line and thin-bold line show the tracks of the mobile robot and two people, respectively.

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

143

Figure 7. Experimental environment

(a) Laser image with LRS

(b) Result of detecting moving object

(c) Result of tracking moving object Figure 8. Experimental result; in the subfigure (c) black-dotted line, black-bold line and thin-bold line show the tracks of the mobile robot and two people, respectively.

144

M. Hashimoto et al. / A Laser Based Multi-Target Tracking for Mobile Robot

Acknowledgement This research was supported by the Academic Frontier Research Project on "New Frontier of Biomedical Engineering Research" of Ministry of Education, Culture, Sports, Science and Technology.

References [1] H.Koyasu, J.Miura, and Y.Shirai: Realtime Omnidirectional Stereo for Obstacle Detection and Tracking in Dynamic Environment, Proc. of the 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS2001), CD-ROM (2001) [2] E.Prassler, J.Scholz and A.Elfes: Tracking Multiple Moving Objects for Real-Time Robot Navigation, Autonomous Robots, Vol.8, pp.105-116 (2000) [3] E.Prassler, J.Scholz and P.Fiorini: A Robotic Wheelchair for Crowded Public Environment, IEEE Robotics and Automation Magazine, pp.38-45 (2001) [4] B.Kluge, C.Kohler and E.Prassler: Fast and Robust Tracking of Multiple Moving Objects with a Laser Range Finder, Proc. of the 2001 IEEE Int. Conf. on Robotics & Automation (ICRA 2001), pp.1683-1688 (2001) [5] M.Lindstrom and J.-O.Eklundh: Detecting and Tracking Moving Objects from a Mobile Platform Using a Laser Range Scanner, Proc. of the 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS2001), CD-ROM (2001) [6] D.Schulz, W.Burgard, D.Fox, and A.B.Cremers: People Tracking with Mobile Robot using Sample-based Joint Probabilistic Data Association Filters, The Int. J. of Robotics Research, pp.99-115 (2003) [7] A.Fod, A.Howard, and M.J Mataric: A Laser-Based People Tracker, Proc. of the 2002 IEEE Int. Conf. on Robotics & Automation (ICRA 2002), pp.3024-3029 (2002) [8] O.Frank, J.Nieto, J.Guivant, and S.Scheding: Multiple Target Tracking Using Sequential Monte Carlo Methods and Statistical Data Association, Proc. of the 2003 IEEE Int. Conf. on Robotics & Automation (ICRA 2003), pp.2718-2723 (2003) [9] C.Hue, J-P.Le Cadre, and P.Perez: Tracking Multiple objects with Particle Filtering, IEEE Trans. on Aerospace and Electronic Systems, Vol.38, No.3, pp.791-811 (2003) [10] D.Caveney, B.Feldman, and J.K.Hedrick: Comprehensive Framework for Multisensor Multitarget Tracking in the Adaptive Cruise Control Environment, Proc. of 6th Int. Symp on Advanced Vehicle Control (AVEC’02), pp.697-702 (2002) [11] K.Nakamura, H.Zhao, R.Shibasaki, K.Sakamoto, T.Ooga, and N.Suzukawa: Tracking Pedestrians by using Multiple Laser Range Scanners, Proc. of ISPRS Congress, vol. 35, part B4, pp.1260-1265 (2004) [12] B.Jensen, et al.: Narrative Situation Assessment for Human-Robot Interaction, Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA 2003), CD-ROM (2003) [13] C.Wang, C.Thorpe, and S.Thrun: Online simultaneous localization and mapping with detection and tracking of moving objects: theory and results from a ground vehicle in crowded urban areas, Proc. of Int. Conf. on Robotics and Automation (ICRA 2003), CD-ROM (2003) [14] K.R.Pattipati, R.L.Popp and T.Kirubarajan: Survey of Assignment Techniques for Multitarget Tracking,: Multitarget-Multisensor Tracking: Applications and Advances volume III (Edited by Y.Bar-Shalom and W.D. Blair), Artech House, Inc., pp.77-159 (2000) [15] Y.Bar-Shalom and T.E.Fortmann: Tracking and Data Association, Academic Press,Inc. (1988)

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

145

Simultaneous Environment Mapping And Mobile Target Tracking Abedallatif Baba 1 and Raja Chatila 2 LAAS-CNRS 7 avenue du Colonel Roche 31077 Toulouse Cedex 4 France Abstract. This paper presents an approach for building a dynamic environment model based on an occupancy grid using a SLAM technique, while detecting and tracking mobile objects using an Auxiliary Multiple-Model particle filter. The mobile objects, such as people, are distinguished from the fixed parts and not included in the model, and their motion is tracked. Keywords. Mobile Robot Navigation, SLAM, People tracking, Particle filter, Occupancy grid.

1. Introduction Most of the approaches addressing SLAM and environment mapping assume that the environment is static. In the case of service and personal robots interacting with people, this hypothesis is clearly incorrect. The resulting map will therefore include the moving objects, or a trace of their trajectories as phantom obstacles, and this might result in incorrect models [7]. Moreover, it is highly desirable that a robot can determines the positions and trajectories of the humans in its surroundings in order to be able to focus its attention on those which are willing to interact with it (e.g., those approaching the robot or moving in its close vicinity). For these reasons, it is necessary for the robot to detect and track of mobile objects, while it is mapping its environment. This paper presents an approach that relies on an occupancy grid to model the environment, and a particle filter to track moving objects, which are not included in the map. In the next section, we briefly overview related work. Then in section 3 we recall mapping using occupancy grids. Section 4 presents the target tracking approach and the Auxiliary Multiple-Model particle filter technique. Section 5 shows results obtained so far, then we conclude with future directions. 2. Related Work There has been recently a number of results on robot mapping and localization in populated environments. Most approaches suppose a known map, and focus on distinguishing 1 E-mail: 2 E-mail:

[email protected] [email protected]

146

A. Baba and R. Chatila / Simultaneous Environment Mapping and Mobile Target Tracking

mobile objects from the static map. For example, Burgard et al. [4] update a static map using the most recent sensory input to deal with people in the environment. Montemerlo et al. [10] present an approach to simultaneous localization and people tracking using a conditional particle filter. Arras et al. [1] use line features for localization and filter range-measurements reflected by persons. Fox et al. [5] use a probabilistic technique to filter out range measurements that do not correspond to the environment model. Our technique, in contrast, does not require a given map. On the other hand, there are a variety of techniques for tracking persons, for predicting future poses of persons, or for adopting the behavior of the robot according to the information obtained about the persons in its vicinity [9,2,11]. However, these techniques do not filter the measurements corresponding to persons in order to improve the model of the environment. They suppose a known map. Our approach is tobuild the map while tracking people circulating in the environment to avoid including them in the map. Recently, Wang and Thorpe [13] addressed this issue. Their approach segments the data to extract features. Hahnel, Schulz, and Burgard. [8] presented a probabilistic approach to filtering people out of sensor data and to incorporate the results of the filtering into the mapping process. Their approach is very sound, but it needs to use scan alignment for each two successive maps to detect the dynamic objects, which may be time consuming. The approach we use in this paper doesn’t need specific features, which makes it more robust. It’s not using scan alignment neither: the environment is represented by occupancy grids, which enables to be independent of any kind of representation or features, and the grid representation is less complex to deal with that the scans themselves. Human motion is detected by comparing the successive occupancy grid models, and tracking is accomplished by a bearing only tracking approach using a particle filter.

3. Map Building and Moving Target Detection Approaches to environment mapping usually rely on appearance representations (raw data), feature-based visual or geometrical representations (e.g., 2D segments), occupancy grids, or topological graphs. All these representations have advantages and drawbacks. It is not the purpose of this paper to discuss the representation issues. We have selected for this work occupancy grid because of the simplicity of their construction. In this representation the environment is discretized in cells, and to each cell is assigned a random variable describing its probable state of being empty or occupied. We won’t detail the building of the occupancy grid here, the algorithm being widely known. See [12] for the theoretical foundations and the algorithm. Figure 1 is an example of the grid built by the robot in our lab. Robot localization is currently done by using an edge-based representation of obstacles. The occupancy grids representation enables to compare successive maps (represented by a matrix of numbers in [0,1]) to detect changes rather easily. A sudden change in occupancy state for already perceived grid cells indicates a moving object. The probability of occupancy of a given cell is indeed going to change drastically in two successive views for a mobile object, whereas for fixed objects there will be smooth changes. A comparison of consecutively built grids followed by a thresholding of the result reveal the moving object. As a first approach we are using a grid difference and thresholding. Figure 2 shows three consecutive maps with a target moving to the left .

A. Baba and R. Chatila / Simultaneous Environment Mapping and Mobile Target Tracking

147

Figure 1. Example of occupancy grid built with a 2D scanning laser range finder. The occupancy value is coded as follows: green: unknown; white: free; blue occupied. The area is 4x4m2 ; a cell is 5x5cm2

Figure 2. Occupancy grid with an object moving towards the left. The occupancy probabilities change drastically when the object moves.

4. Target Tracking Tracking people in the robot environment is not a new problem. Stated otherwise, this is a classical target tracking problem, present in a large number of applications. The Kalman filter, quite popular for SLAM, was actually devised for this application. In our context, we however need a recursive bayesian filter, which does not have the limitations of linearity nor gaussianity. As mentioned before, particle filters are a common tool for coping with this problem. The problem can be stated as follows: given several targets defined by their state (position, velocity, acceleration) measured by noisy sensors, compute at each time step the estimate of the target state. To simplify the tracking poroblem, we consider only the bearing information. In the next section, we recall the basic technique for bearing tracking. 4.1. Bearing-Only tracking The approach for bearing only tracking as developed in [6] is as follows. In the 2D plane, we suppose a target X t defined by its position and velocity. The robot state is also defined

148

A. Baba and R. Chatila / Simultaneous Environment Mapping and Mobile Target Tracking

by its position and velocity, and the relative target state vector is their difference. This is the state to be estimated. From the target state dynamic equation Xk = fk−1 , νk−1 and sensor measurements Zk = hk (Xk , wk ), where νk−1 and wk are considered to be white noises with known probability density functions and mutually independent, we want to compute the posterior P (Xk |Zk ). The target is supposed to be changing directions (maneuvring target), which is quite natural for humans in an indoors environment. Given a straight trajectory, the behavior models of the target are defined as moving straight ahead (noted 1), turning clockwise (2), and turning anti-clockwise (3). S = {1, 2, 3} denotes the set of three models. A transition probability matrix (jump Markov Model) defines the state change probability among these three behaviors: ⎛

Πab

⎞ π11 π12 π13 = ⎝ π21 π22 π23 ⎠ π31 π32 π33

where πij ≥ 0, and the sum along the lines is equal to one. 4.2. Auxiliary Multiple-Model Particle Filter The Auxiliary Sequential Importance Resampling filter (ASIR) [6], generates particles from the sample at k − 1. Each particle is assigned a weight conditioned on the current measurement. Thus, ASIR does resampling at the previous time step based on estimates that characterize the posterior of X knowing Xk−1 . The advantage is that the generated new points are in a region of high likelihood, which means a lower sensitivity to outliers if the process noise is small, which we suppose here given the dynamics of the problem. On the other hand, the Multiple-Model Particle Filter (MMPF) is a technique for non-linear filtering with switching dynamic models, which is the case of our problem. A combination of both ASIR and MMPF is the Auxiliary Multiple-Model Particle Filter which we have implemented (see [6] for the detailed algorithm).

5. Results We show in this section simulation results for the tracking approach discussed above. We have first applied the approach in the case where the dynamic target is one person moving randomly in the vicinity of a robot which is the observer. In this simulation, we have chosen the number of particles for each model of the set S in section 4, to be 100 particles, for having an good accuracy. In this case, the computation time to estimate each step of the target trajectory was 500 ms. However, a reasonable accuracy can be obtained with 10 particles only, with a computing time of 80 ms. The sampling interval is T = 0.05 s, the frequency of the laser scan being 20 Hz. The typical maneuvre target acceleration is 0.2m.s−2 . The transition probability matrix is:

A. Baba and R. Chatila / Simultaneous Environment Mapping and Mobile Target Tracking

149

Figure 3. Person tracking. The robot moves along the piecewise straight line (in green); the red circles represent the real target trajectory, the blue points are the particles.



Πab

⎞ .32 .34 .34 = ⎝ .34 .34 .32 ⎠ .34 .32 .34

Here we have chosen the probability for the target to go straight ahead π11 , less than the others, i.e., the person prefers to change directions randomly during movement, which will be more difficult to be tracked. We have also chosen probabilities π23 and π32 to be also less than the other values, because there is less chance for one person to change suddenly directions from right to left or the opposite. The robot-person geometry is shown in figure 3, where the robot and the person move in variable speed and their movements are simulated along the axes x and y by a linear model in which one variable changes randomly. Figure 4 is a zoomed part showing the more weighted particles. Figure 5 shows another run with a more complex trajectory. As shown in the figures, the results have a good accuracy (a few cm). Tracking can be done while the robot is constructing its map with the occupancy gird, which enables to detect the moving targets and to supply the tracker technique with their coordinates, and to discard them from the map, as mentioned in section 3. With a laser scanner, the occupancy grid algorithm distinguishes between the dynamic and the static cells using the occupancy probability which is related to the hit rate in the cell. Each time the algorithm detects a dynamic cell, it starts up a tracker chain for this detected cell. To reduce the number of trackers, we must however group neighboring cells moving with the same velocity and direction, considering them to be only one target. This approach can be generalized to tracking several persons moving simultaneously in the robot vicinity. Figure 6 shows two persons (targets) being tracked. The case when a moving target hides another moving target is not addressed in this paper. This might produce additional noise if the two targets are moving together.

150

A. Baba and R. Chatila / Simultaneous Environment Mapping and Mobile Target Tracking

Figure 4. Person tracking. Zoomed from figure 3. The circles (red) represent the real target trajectory, the points (in blue) are the particles. The stars (in blue) are the more weighted particles which represent the more probable predicted state of the target.

Figure 5. Person tracking. A more complex trajectory.

6. Conclusion In this paper we have presented a Bearing-Only Tracking approach to track people in the data obtained with robot sensors, while the robot is also building a map of its environment using an uncertainty grid, itself used to detect the moving targets. We have shown simulation results using MATLAB with good accuracy. The next stage is to experiment this tracking technique on the robot. Further improvements are to label the moving targets so that grid cells occupied by moving persons can be identified as such, even if the

A. Baba and R. Chatila / Simultaneous Environment Mapping and Mobile Target Tracking

151

Figure 6. Tracking two persons. The robot moves again along the piecewise straight line (in green); the red circles represent the real target trajectory, the blue points are the particles.

person comes to a stop. The purpose is to trigger an interaction with this person, and also to keep the mapping process from including in the model.

Acknowledgements The authors wish to thank the anonymous reviewers for their valuable comments.

References [1] K.O. Arras, R. Philippsen, M. de Battista, M. Schilt, and R. Siegwart. A navigation framework for multiple, mobile robots and its applications at the Expo.02 exhibition. In Proc. of the IROS-2002 Workshop on Robots in Exhibitions, 2002. [2] M. Bennewitz,W. Burgard, and S. Thrun. Using EM to learn motion behaviors of persons with mobile robots. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002. [3] D. Beymer and Konolige K. Tracking people from a mobile platform. In IJCAI-2001 Workshop on Reasoning, with Uncertainty in Robotics, 2001. [4] W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. Experiences, with an interactive museum tour-guide robot. Artificial Intelligence, 114(1-2), 2000. [5] D. Fox, W. Burgard, and S. Thrun. Markov localization for mobile robots in dynamic environments. Journal of Artificial Intelligence Research (JAIR), 11:391-427, 1999. [6] B. Ristic, S. Arulampalam, N. Gordon. Beyond the Kalman filter, particle filter for tracking applications. Artech House 2004. [7] D. Hähnel, D. Schulz, and W. Burgard. Mapping with mobile robots in populated environments. In Proc. Of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002. [8] D. Hähnel, R. Triebel, W. Burgard, S. Thrun. Map Building with Mobile Robots in Dynamic Environments. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2003.

152

A. Baba and R. Chatila / Simultaneous Environment Mapping and Mobile Target Tracking

[9] ]B. Kluge, C. Koehler, and E. Prassler. Fast and robust tracking of multiple moving objects with a laser range finder. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2001. [10] M. Montemerlo and S. Thrun. Conditional particle filters for simultaneous mobile robot localization and people-tracking (slap). In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2002. [11] C. Stachniss and W. Burgard. An integrated Approach To Goal-directed Obstacle Avoidance Under Dynamic Constraints For Dynamic Environments. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002. [12] S Thrun. Learning occupancy grids with forward sensor models. Autonomous Robots, 15:111-127, 2003. [13] C.-C. Wang and C. E. Thorpe. Simultaneous Localization And Mapping With Detection And Tracking of Moving Objects. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2002.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

153

Omnidirectional Active Vision for Evolutionary Car Driving Mototaka Suzuki a,1 , Jacob van der Blij b , and Dario Floreano a a Laboratory of Intelligent Systems, Ecole Polytechnique F´ed´erale de Lausanne (EPFL) b Artificial Intelligence Institute, University of Groningen (RuG) Abstract. We describe a set of simulations to evolve omnidirectional active vision, an artificial retina scanning over images taken via an omnidirectional camera, being applied to a car driving task. While the retina can immediately access features in any direction, it is asked to select behaviorally-relevant features so as to drive the car on the road. Neural controllers which direct both the retinal movement and the system behavior, i.e., the speed and the steering angle of the car, are tested in three different circuits and developed through artificial evolution. We show that the evolved retina moving over the omnidirectional image successfully detects the taskrelevant visual features so as to drive the car on the road. Behavioral analysis illustrates its effective strategy in algorithmic, computational, and memory resources. Keywords. Active Vision, Omnidirectional Camera, Artificial Evolution, Neural Networks, Mobile Robots

1. Introduction The omnidirectional camera is a relatively new optic device that provides a 360 degrees field of view, and it has been widely used in many practical applications including surveillance systems and robot navigation [1,2,3]. However, in most applications visual systems uniformly process the entire image, which would be computationally expensive when detailed information is required. In other cases the focus is determined for particular uses by the designers or users. In other words, the system is not allowed to freely interact with the environment and selectively choose visual features. Contrarily, all vertebrates and several insects – even those with a very large field of view – share the steerable eyes with a foveal region [4], which means that they have been forced to choose necessary information from a vast visual field at any given time so as to survive. Such a sequential and interactive process of selecting and analyzing behaviorally-relevant parts of a visual scene is called active vision [5,6,7,8]. Our previous work has demonstrated that it can also be applied to a number of real world problems [9]. In this article we explore omnidirectional active vision applied to a car driving task: Coupled with an omnidirectional camera, a square artificial retina can immediately ac1 Correspondence

to: Mototaka Suzuki, Ecole Polytechnique F´ed´erale de Lausanne, EPFL-STI-I2S-LIS, Station 11, CH-1015 Lausanne, Switzerland. Tel.: +41 21 693 7742; Fax: +41 21 693 5859; E-mail: Mototaka.Suzuki@epfl.ch; WWW homepage: http://lis.epfl.ch.

154

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

cess any visual feature located in any direction, which is impossible for the conventional pan-tilt camera because of the mechanical constraints. It is challenging for the artificial retina to select behaviorally-relevant features in such a broad field of view so as to drive a car on the road. Omnidirectional active vision is not biologically plausible. But as claimed in [10] it is interesting to study visual systems from a broader point of view which contains those that have never been available in biology. Also, there are several engineering applications that could benefit from omnidirectional vision. Some promising applications are discussed in section 5. A 1/10 scale model car equipped with an omnidirectional camera and three different circuits are modeled in simulation. We show that the evolved retina moving over the omnidirectional image successfully detects the task-relevant visual features so as to drive the car on the road. Behavioral analysis illustrates its effective strategy in algorithmic, computational, and memory resources. In comparison to the results obtained with a pantilt camera mounted on the same car, we show that omnidirectional active vision performs the task very robustly in spite of more difficult initial conditions.

2. Experimental Setup Figure 1 shows the real and simulated model cars as well as the views through the real and simulated omnidirectional cameras. The omnidirectional camera consists of a spherical mirror and a CCD camera. It is mounted on a 1/10 scale model car (KyoshoT M ) which has four motorized wheels. We simulated the car and the circuits using Vortex1 libraries, a commercially available software package that models gravity, mass, friction, and collisions. Additionally we used a vision software for modeling the view from the omnidirectional camera, which had originally been developed in the Swarm-bots project2 . Figure 2 shows the three circuits; ellipse, banana, and eight shaped, used in the present evolutionary experiments. An artificial retina actively moves over the omnidirectional view3 . Figure 3 illustrates the unwrapping process from the polar view to the Cartesian view and the retina overlaid on each image. In order to evaluate the performance of the omnidirectional active vision system, we also simulate a pan-tilt camera mounted on the same car and compare the results obtained in the same experimental condition. The neural network is characterized by a feedforward architecture with evolvable thresholds and discrete-time, fully recurrent connections at the output layer (Figure 4). The input layer is an artificial retina of five by five visual neurons that receive input from a gray level image of 240 by 240 pixels. The activation of a visual neuron, scaled between 0 and 1, is given by the average gray level of all pixels spanned by its own receptive field or by the gray level of a single pixel located within the receptive field. The choice between these two activation methods or filtering strategies, can be dynamically changed by one output neuron at each time step. Two proprioceptive neurons provide input information about the measured position of the retina with re1 http://www.cm-labs.com 2 http://www.swarm-bots.org/ 3 A similar approach has been taken for evolving flocking behavior of three simulated robots independently in [11], inspired by our previous work [9].

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

155

Figure 1. The real 1/10 scale 4WD model car (KyoshoT M ) with an omnidirectional camera mounted on the roof of the car (top left) and the simulated one (top right). The car base is 19.5 cm (W), 43 cm (L), and 13.5 cm (H). View through the real omnidirectional camera (bottom left) and one through the simulated camera (bottom right).

Figure 2. Three circuits where the robot car is tested: from left to right, ellipse, banana, and eight shaped circuits. Each circuit is designed such that an 8 m×8 m room accommodates it. The width of the roads in all circuits is 50 cm. In the present experiments the sidelines are omitted.

spect to the chassis of the car: the radial and angular coordinates for the omnidirectional camera; or the pan and tilt degrees for the pan-tilt camera. These values are in the interval [retina size/2, radius − retina size/2] pixels (retina size = 240 pixels, radius = 448 pixels in these experiments) and [0, 360] degrees for radial and angular coordinates respectively. The values for the pan-tilt camera are in the interval [−100, 100] and [−25, 25] degrees respectively. Each value is scaled in the interval [0, 1] and encoded as a proprioceptive input. A set of memory units stores the values of the output neurons at the previous sensory motor cycle step and sends them back to the output units through a set of connections, which effectively act as the recurrent connections among output units [12]. The bias unit has a constant value of −1 and its outgoing connections represent the adaptive thresholds of the output neurons [13]. Output neurons use the sigmoid activation function f (x) = 1/(1 + exp(−x)) in the range [0, 1], where x is the weighted sum of all inputs. They encode the motor commands of the active vision system and of the car for each sensory motor cycle. One neuron determines the filtering strategy used to set the activation values of visual neurons for the

156

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

Figure 3. Left: The polar image taken by the omnidirectional camera and the overlaid retina. Right: The corresponding unwrapped image and the retina in Cartesian coordinate.

next sensory motor cycle. Two neurons control the movement of the retina (or camera), encoded as speeds relative to the current position. The remaining two neurons encode the directional and rotational speeds of the wheels of the car. Activation values above 0.5 stand for left (directional) and forward (rotational) speeds whereas activation values below 0.5 stand for right and backward speeds, respectively. Filter VangVrad D

S

Bias

Proprioceptive neurons

Memory neurons

Visual neurons

Figure 4. The architecture is composed of a grid of visual neurons with nonoverlapping receptive fields whose activation is given by the gray level of the corresponding pixels in the image; a set of proprioceptive neurons that provide information about the movement of the retina with respect to the chassis of the car; a set of output neurons that determine at each sensory motor cycle the filtering used by visual neurons, the new angular (Vang ) and radial (Vrad ) speeds of the retina (or pan and tilt speeds of the pan-tilt camera), and the directional (D) and rotational (S) speeds of the wheels of the car; a set of memory units whose outgoing connection strengths represent recurrent connections among output units; and a bias neuron whose outgoing connection weights represent the thresholds of the output neurons. Solid arrows between neurons represent fully connected layers of weights between two layers of neurons. Dashed arrows represent 1:1 copy connections (without weights) from output units to memory units, which store the values of the output neurons at the previous sensory motor cycle step.

The neural network has 165 evolvable connections that are individually encoded on five bits in the genetic string (total length= 825). A population of 100 individuals is evolved using truncated rank-based selection with a selection rate of 0.2 (the best 20 individuals make four copies each) and elitism (two randomly chosen individuals of the population are replaced by the two best individuals of the previous generation). Onepoint crossover probability is 0.1 and bit-toggling mutation probability is 0.01 per bit.

157

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

3. Evolution of Neural Controllers The fitness function was designed to select cars for their ability to move straight forward as long as possible for the evaluation time of the individual. Each individual is decoded and tested for three trials, each trial lasting 500 sensory motor cycles. A trial can be truncated earlier if the operating system detects a drop of the height of the center of the car, which corresponds to going off-road. The fitness criterion F is a function of the measured speed St of the four wheels and the steering direction Dt of the front wheels: F =

E T  1 f (St , Dt , t) t=0 e=0 E × T × Smax

f (St , Dt , t) = St ×



1−

(1)



|Dt |/Dmax

(2)

where St and Dt are in the range [−8.9, 8.9] cm/sec and [−45, 45] degrees, respectively; f (St , Dt , t) = 0 if St is smaller than 0 (backward rotation); E is the number of trials (three in these experiments), T is the maximum number of sensory motor cycles per trial (500 in these experiments), T  is the observed number of sensory motor cycles (for example, 34 for a robot whose trial is truncated after 34 steps if the car goes off-road). At the beginning of each trial the position and orientation of the car as well as the position of the retina within the image are randomly initialized. We performed these replications of the evolutionary run starting with different genetic populations. In all cases the fitness reached stable values in less than 20 generations (Figure 5) which correspond to successful on-road driving. The fitness values both of the best individuals and of the population average obtained with the pan-tilt camera were higher than those with the omnidirectional camera in all three circuits. Notice that the fitness can never be one because the car must steer in corners so as to not go off-road. Omnidirectional camera, Eight-shaped circuit 0.6

0.5

0.5

0.4

0.4

Fitness

Fitness

Pan-tilt camera, Eight-shaped circuit 0.6

0.3

0.3

0.2

0.2

0.1

0.1

0

0 0

5

10

Generations

15

20

0

5

10

15

20

Generations

Figure 5. Evolution of neural controllers of the car with the pan-tilt camera (left) and the omnidirectional camera (right) in the eight shaped circuits. Thick line shows the fitness values of the best individuals and thin line shows those of the population average. Each data point is the average of three evolutionary runs with different initializations of the population. Vertical lines show the standard error.

158

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

4. Behavioral Analysis The behavioral strategy of the best evolved car equipped with the pan-tilt camera is as follows: At the beginning of the trial the car points the camera downward and to its right (or left depending on the evolutionary run), and it steers so to maintain the edge between the road and the background within the retina. Due to the lack of space, we show only the behavioral analysis of the best individual with the omnidirectional camera evolved in the eight shaped circuit because the circuit possesses all of the geometrical characteristics which the ellipse and banana shaped circuits have. It also has an intersection which would disturb the car’s perception if simple edge detection is the strategy developed by the evolved individual, which is sufficient for driving in the banana and ellipse shaped circuits. Indeed, since the best evolved individuals with the pan-tilt camera all adopted this strategy, they were disturbed more largely at the intersection and went off-road in several trials. Our preliminary tests also confirmed that the best individuals evolved in the eight shaped circuit were general in the sense that they could drive in the other two circuits as well. Figure 6 shows the strategy of the best individual evolved in the eight shaped circuit: During the starting period the evolved car moves back and forth, and then starts going forward at full speed once the retina finds the front road arena. Another effective strategy acquired by the best individuals in other evolutionary runs is that the car moves forward very slowly until the retina finds the road area, and starts driving at full speed immediately after finding it (data not shown). Both behaviors serve to spare time for the retina to find the road area during the most critical period at the beginning of the trial. In the intersection although the perception of the car is disturbed by the crossing road, which corresponds to the slight deviation of the trajectory, the evolved car manages to find the straight road beyond the intersection by moving the retina upward in the radial direction, which corresponds to “looking farther”, and corrects its trajectory (Figure 6, right). After passing the intersection, the retina moved downward again and maintained the straight road area in sight. The rotational speeds of the wheels and the angular position of the retina did not change significantly while passing the intersection.

5. Discussion The slightly lower fitness values of the evolved individuals with the omnidirectional camera than those with the pan-tilt camera are due to two main reasons: 1) It is harder to find the front road area out of the omnidirectional camera view than out of the pantilt camera view; 2) Evolved individuals can find the area, but it takes them some time because during the road search the car does not move much. Despite this more difficult initial condition, we have shown that artificial evolution selects the individuals capable of “waiting” until the retina finds the appropriate location and of driving at full speed after that. Therefore the slightly lower fitness values of the best evolved individuals with the omnidirectional camera do not mean that those individuals are inferior to those with the pan-tilt camera. The lower fitness is caused by the waiting behavior at the beginning of each test in order to robustly start driving at full speed afterward. Such a behavior has never been observed in any evolutionary run with the pan-tilt camera. Once the road area is detected, the best evolved car with the omnidi-

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

159

Figure 6. An example of the evolved car with the omnidirectional camera driving in the eight shaped circuit. Top left: Motor activations of the wheel speed (thick line) and the steering angle (thin line) during the first 150 time steps. Speed values above – and below – 0.5 stand for forward – and backward – rotational speeds. Angle values are mapped from 0.5 to 1 for rotations to the left of the car and from 0.5 to 0 for rotations to the right. Dotted vertical line denotes the moment when the retina fixated its position on the front road area. Bottom left: The corresponding trajectory of the car (shown only for the first 70 time steps). Top right: The radial position of the retina. Shaded period corresponds to while passing the intersection. The angular position of the retina remained stable and did not change even while passing the intersection. Bottom right: The corresponding trajectory of the car around the intersection.

rectional camera perfectly drives as that with the pan-tilt camera does. The advantage is lower algorithmic, computational, and memory resources. For comparison with the results obtained with the pan-tilt camera, we did not implement the zooming function in the present experimental setup. However it enables the system to select visual features more specifically by choosing an appropriate resolution for each feature. Indeed, our previous work has demonstrated that in a non-panoramic view the active zooming function played a crucial role in their performance [9], which encourages us to further apply it to the current system. The current setup also allows us to lay multiple retinas within a single omnidirectional camera image so that they each specialize in detecting separate features simultaneously and perform behaviors. In real circuits, there are a number of features to which car drivers must pay attention (e.g., sidelines, signals, other cars). Indeed, [14] developed a multi-window system with a conventional non-panoramic camera for detecting such features during real car navigation, but the position and shape of each window were predetermined by the designers. Active feature selection by multiple retinas which are moving independently over the omnidirectional view may display significant advantages over an active single retina or the fixed multi-window system in several tasks, e.g., navigation

160

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

in dynamic, unknown environments. Further investigations must be done to validate this hypothesis. The present method may also offer a significant advantage in landmark navigation of mobile robots because of its fast, parallel searching for multiple landmarks in any direction. Instead of processing the 360 degrees field of view, the system could actively extract only small task-related regions from the visual field, which dramatically lightens the computation and the memory consumption.

6. Conclusions In this article we have explored omnidirectional active vision applied to a car driving task. The present simulations have shown that the evolved artificial retina moving over the omnidirectional view successfully detects the behaviorally-relevant feature so as to drive the car on the road. Although it costs time to find the appropriate feature in such a broad field of view during the starting period, the best evolved car overcomes the disadvantage by moving back and forth or moving slowly until the retina finds the appropriate feature. Then the car starts driving forward at full speed. The best evolved car equipped with the omnidirectional camera performs robust driving on the banana, eight, and ellipse shaped circuits in spite of the more difficult initial condition. The advantage of the present methodology is lower algorithmic, computational, and memory resources. Currently we aim to implement an active zooming function of the retinas, to generalize the neural controllers in different-featured circuits (e.g., backgrounds, sidelines, textures), and to transfer the evolved neural controller into the real car shown in Figure 1.

Acknowledgments The authors thank Giovanni C. Pettinaro for the software to model the view of the omnidirectional camera, which has originally been developed in the Swarm-bots project. Thanks also to Danesh Tarapore for enhancing the readability of this article and to two anonymous reviewers for their useful comments. MS and DF have been supported by EPFL. JB has been supported by University of Groningen.

References [1] Y. Matsumoto, K. Ikeda, M. Inaba, and H. Inoue. Visual navigation using omnidirectional view sequence. In Proceedings of International Conference on Intelligent Robots and Systems, pages 317–322, 1999. [2] L. Paletta, S. Frintrop, and J. Hertzberg. Robust localization using context in omnidirectional imaging. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2072–2077, 2001. [3] I. Stratmann. Omnidirectional imaging and optical flow. In Proceedings of the IEEE Workshop on Omnidirectional Vision, pages 104–114. IEEE Computer Society, 2002. [4] M. F. Land and D.-E. Nilsson. Animal Eyes. Oxford University Press, Oxford, 2002. [5] R. Bajcsy. Active perception. Proceedings of the IEEE, 76:996–1005, 1988.

M. Suzuki et al. / Omnidirectional Active Vision for Evolutionary Car Driving

161

[6] J. Aloimonos, I. Weiss, and A. Bandopadhay. Active vision. International Journal of Computer Vision, 1(4):333–356, 1987. [7] J. Aloimonos. Purposive and qualitative active vision. In Proceedings of International Conference on Pattern Recognition, volume 1, pages 346–360, 1990. [8] D. H. Ballard. Animate vision. Artificial Intelligence, 48(1):57–86, 1991. [9] D. Floreano, T. Kato, D. Marocco, and E. Sauser. Coevolution of active vision and feature selection. Biological Cybernetics, 90(3):218–228, 2004. [10] C. G. Langton. Artificial life. In Artificial Life, pages 1–48. Addison-Wesley, 1989. [11] S. Lanza. Active vision in a collective robotics domain. Master’s thesis, Faculty of Engineering, Technical University of Milan, 2004. [12] J. L. Elman. Finding structure in time. Cognitive Science, 14:179–211, 1990. [13] J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the theory of neural computation. Addison-Wesley, Redwood City, CA, 1991. [14] E. D. Dickmanns, B. Mysliwetz, and T. Christians. An integrated spatio-temporal approach to automatic visual guidance of autonomous vehicles. IEEE Transactions on Systems, Man, and Cybernetics, 20(6):1273–1284, 1990.

This page intentionally left blank

Part 3 Localization

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

165

Map of Color Histograms for Robot Navigation Takanobu KAWABE a , Tamio ARAI b , Yusuke MAEDA c,1 and Toshio MORIYA d a Nippon Telegraph and Telephone West Corp., Japan b The University of Tokyo c Yokohama National University d Hitachi, Ltd. Abstract. In this paper, we propose a method to estimate a robot position and orientation in a room by using a map of color histograms. The map of color histograms is a distribution map of colors measured by a mobile robot in multiple directions at various points. The size of the map should be strongly compressed because the memory capacity of the robot is restricted. Thus, the histograms are converted by the discrete cosine transform, and their high-frequency components are cut. Then, residual low-frequency components are stored in the map of color histograms. Robot localization is performed by matching between the histogram of an obtained image and reconstructed histograms out of the map by means of histogram intersection. We also introduce a series of estimations with the aid of odometry data, which leads to accuracy improvement. Keywords. Color histograms, robot navigation, mobile robots

1. Introduction The number of robots used for entertainment and as pets has increased dramatically in the past decade [1]. Self-localization is one of the most fundamental functions of pet robots so that it can live and perform some useful tasks at home and in the office. Compared with conventional AGV (Automated Guided Vehicle) or mobile robots, pet robots cannot move precisely. This is partly because their environment is not well structured and partly because their locomotive mechanism is unsteady and their sensing systems detect too much noise. Thus, several methods of self-localization should be introduced to ensure that the robots have robust navigation systems. Some of them might be good in a global environment, and some others might be accurate enough in small areas. One easy and robust method of localization is to utilize artificial or natural landmarks as studied in [2][3]. Operators, however, need to teach the features and positions of the landmarks [4]. Robots can take a route that is accurately taught in advance as a sequence of sensor data [5][6]. However, such a data sequence must be relatively large. 1 Correspondence

to: Yusuke MAEDA, Div. of Systems Research, Fac. of Eng., Yokohama National University, 79-5 Tokiwadai, Hodogaya-ku, Yokohama 240-8501, JAPAN; Tel.: +81 45 339 3918; Fax: +81 45 339 3918; E-mail: [email protected].

166

T. Kawabe et al. / Map of Color Histograms for Robot Navigation

Inverse Discrete Cosine Transform

Discrete Cosine Transform

(a) Original Image

(b) Normalized Color Histogram

(c) Low Frequency Components

(d) Compressed Color Histogram

Figure 1. The process of making a compressed color histogram

In this paper, we propose a simple and robust algorithm for a small robot to localize itself by matching an obtained image onto a map of color histograms. Here, “localization” means that both the position and orientation of a robot are estimated. The proposed algorithm is expected to be applied to a pet robot such as Sony’s AIBO [7] that has the following characteristics: 1. Vision system: The robot has a conventional low-resolution color camera installed in its forehead, whose height is close to the floor level. 2. Memory capacity: The available memory for the data for navigation is severely restricted. In this section, we discussed the requirement for pet robot navigation and its solution. Our proposed method is outlined in Section 2, and color histograms are then explained in Section 3. Section 4 deals with self-localization using the similarity between two histograms. In Section 5, maps of color histograms are generated in a room environment; experiments involving self-localization are conducted, and the results are discussed. The conclusion is presented in Section 6.

2. Outline of Map of Color Histograms Let us assume that a mobile robot moves around in a room with sets of furniture, such as chairs and desks, and items such as waste-paper baskets. Since the height of the robot’s camera is lower than that of the furniture, the images are directly affected by the position of the furniture. Therefore, the making of a map of a room requires that multiple images of the room be taken from various positions in various directions. The characteristics necessary for the image map include the following: • • • •

Robustness against changes in illumination Robustness against occlusions Small memory consumption Simple and quick matching

To meet the requirement, we propose a map of color histograms. The robot can be localized based on the similarity between color histograms of an obtained image and those stored in the map. The sequence of making a compressed color histogram for the map is illustrated in Figure 1: the obtained image (a) is converted into three sets of normalized color histograms (b); each histogram is transformed into frequency components by the discrete cosine transform and only low-frequency components are stored in the map (c); when the map is used, the stored low-frequency components are transformed inversely into histograms (d).

T. Kawabe et al. / Map of Color Histograms for Robot Navigation

167

However, color histograms of images taken at different poses can be very similar. Therefore, in order to improve the accuracy of self-localization, we also use a history of color histograms and odometry.

3. Representation of Color Histograms 3.1. Normalized Color Histograms A color histogram is obtained for each color component from an image taken by a camera. Each histogram has the number of pixels in the vertical axis against the amount of the corresponding color component in the horizontal axis, as shown in Figure 2. Then, the values of each color component are divided by the number of total pixels of the image in order to normalize the histogram in terms of the image size. As described in Section 2, the color histogram must be robust against changes of the lighting conditions in both color temperature and brightness. The robustness is strongly affected by the expression of color. Thus, we compared the color space, such as RedGreen-Blue (RGB), Hue-Saturation-Intensity (HSI, or HSV or HSL), and YUV. Generally speaking, the hue in HSI has little influence on the lighting conditions. The hue, however, is not stable when the chroma of the color is low. Thus, we adopt the RGB color space for histogram generation. The average brightness of the whole image should also be normalized for the robustness. We adjust the brightness of the image by means of gamma correction before we make the map of color histograms. Let us formulate the above procedure in more detail. A normalized color histogram on each color c out of RGB, hc , is defined as follows: hc = {hc (0), hc (1), . . . , hc (N − 1)} 1  δ(ic , n) hc (n) = NI  δ(ic , n) =

(1) (2)

i∈I

1

n 1 when N ≤ ic γ < 0 otherwise,

n+1 N ,

(3)

where N is the number of discrete steps for the amount of each color component (in this research, N = 256); i is one of the pixels in an image, I; NI is the number of pixels of I; 0 ≤ ic ≤ 1 is defined as the amount of color c of the pixel i. The gamma-value γ is adjusted so that the average of the brightness of all pixels of the image might be set to 0.5 as follows:  2 γ = argmin (Y (i, γ) − 0.5) , (4) γ

i∈I

where 1/γ

1/γ

Y (i, γ) = 0.299 ired + 0.587 i1/γ green + 0.114 iblue .

(5)

168

T. Kawabe et al. / Map of Color Histograms for Robot Navigation

Histogram

400 300 200 100 0

0.025

Normalized Histogram

Normalized Histogram

500

0.02 0.015 0.01 0.005 0

0

64

128 192 Brightness (unit)

0.015 0.01 0.005 0

0

Figure 2. Original Histogram

0.02

64

128 192 Brightness (unit)

0

64

128 192 Brightness (unit)

Figure 3. Normalized Histogram Figure 4. Compressed Histogram

3.2. Histogram Compression by Discrete Cosine Transform We use the discrete cosine transform (DCT) for compression of color histograms. It is a technique for converting a signal into elementary frequency components, and widely used in compression, especially, in image compression, such as JPEG or MPEG. The eigenspace method is another possible algorithm of compression; however, a very large amount of calculation is required in order to obtain an orthogonal basis. On the other hand, the DCT is easy to calculate. When the histogram is used to express the characteristics of an image, the outward appearance is more important than the detailed appearance. Moreover, the form of color histograms of a natural image usually slopes gently. Therefore, the high-frequency components of a color histogram can be ignored. Let us denote the DCT of a histogram h = {h(i)|i = 0, 1, . . . , N − 1} by H. Then, H is given by: H = {H(u)|u = 0, 1, . . . , N − 1}  N −1  (2x + 1)uπ 2 H(u) = k(u) h(x) cos N 2N x=0  1 √ when u = 0, 2 k(u) = 1 otherwise.

(6) (7)

(8)

The high-frequency components of the DCT may include a considerable amount of noise in the histogram. The noise is not only caused by colors on the texture of objects but is also generated as false intermediate colors in pixels; for example, pink is often found between black and white. Therefore, only low-frequency components, H(0), . . . , H(NA − 1) are stored in the map of color histograms (NA ≤ N ). ˆ can be reconstructed by applying the inverse An approximated original histogram h discrete cosine transform (IDCT) to the low-frequency components H(0), . . . , H(NA − ˆ is given by: 1). The reconstructed color histogram, h,   ˆ = h(u)|u ˆ h = 0, 1, . . . , N (9)  ˆ h(u) =

NA −1 (2u + 1)xπ 2  . k(x)H(x) cos N x=0 2N

(10)

In our experiments, NA is set to one-eighth of N (i.e., NA = 32). The three figures from Figure 2 to Figure 4 show the difference among the histograms. The original color

T. Kawabe et al. / Map of Color Histograms for Robot Navigation

(a-1)

169

(a-2)

(3)

(b-1) (1) Original Images

(b-2) (2) Filtered Histogram

(3) Histogram Intersection of (a) and (b)

Figure 5. Histogram Intersection

histogram, Figure 2, changes into Figure 3 by normalization with regard to the image size and the brightness. Figure 4 is a reconstructed histogram from the low-frequency components of the DCT; it shows that the rough form of Figure 3 is maintained, although the high-frequency components of Figure 3 were removed.

4. Self-Localization Using Map of Color Histograms 4.1. Similarity Between Two Histograms The similarity between two histograms is calculated by the algorithm of the histogram intersection [8]. The intersection between the two histograms, h1 and h2 , is given by: d∩ (h1 , h2 ) =

N −1 

min(h1 (i), h2 (i)),

(11)

i=0

which means the area of the overlapping portion of the two histograms. The process of the algorithm is explained in Figure 5. Two original images, (a-1) and (b-1), are converted separately into RGB histograms: each histogram is transformed and squeezed by the DCT and a low-pass filter, and then transformed inversely to a compressed histogram, as shown in (a-2) and (b-2). The intersection is computed as (3). The matching for localization is taken in the space of the histogram rather than that of the frequency. This is because the former produces more direct and reliable results. ˆ red1 , h ˆ green1 , h ˆ blue1 ) ˆ 1 = (h The similarity between two sets of RGB histograms, H ˆ ˆ ˆ ˆ and H2 = (hred2 , hgreen2 , hblue2 ), can be calculated by the summation of the histogram intersections of RGB as follows: ˆ red1 , h ˆ red2 ) + d∩ (h ˆ green1 , h ˆ green2 ) + d∩ (h ˆ blue1 , h ˆ blue2 ). (12) ˆ 1, H ˆ 2 ) = d∩ (h d∩ (H

170

T. Kawabe et al. / Map of Color Histograms for Robot Navigation

A robot can be localized based on the similarity between a set of color histograms of ˆ and each set of reconstructed histograms from the map; we can an obtained image, H, ˜ as follows: estimate the position and orientation of the robot, p, ˆ H(m)) ˆ m ˜ ← argmax d∩ (H,

(13)

p˜ ← p(m), ˜

(14)

m∈M

ˆ where M is the map of color histograms; m is an element of the map; H(m) is a set of reconstructed RGB color histograms at m; p(m) is the position and orientation corresponding to m. 4.2. Localization Accuracy Improvement Using Odometry Self-localization of a robot as described above may sometimes fail, because color histograms of images taken at different poses can be very similar. Thus, we use a history of color histograms and odometry in order to improve the localization accuracy. Let us assume that the robot moves stepwise. We denote the relative pose difference before and after the i-th movement by Δpi , which can be roughly estimated from odometry. Then, self-localization after the n-th movement can be performed as follows: pn−k (m) = p(m) −

k−1 

Δpn−i

(15)

i=0

  mn−k (m) ← argmin p(m ) − pn−k (m) m ∈M

m ˜ ← argmax m∈M

n 

(16)

ˆ n−k , H(m ˆ wk d∩ (H n−k (m)))

(17)

p˜ ← p(m), ˜

(18)

k=0

ˆ i is a set of RGB color histograms at the i-th step; w is a discount factor to where H alleviate the effect of odometry error and 0 ≤ w ≤ 1.

5. Experiments Localization experiments were conducted to estimate the error of the proposed method. They were made in a room 6 [m] wide by 8 [m] long in the daytime. A robot installed with a CCD color camera, XC-777A by Sony, was manually moved in the room and took images. The height of the camera was 505 [mm]. The number of pixels of the camera was 768[Horizontal] by 494[Vertical], but 160[H] by 120[V] was used for processing. 5.1. Procedure The experimental procedure was as follows:

171

T. Kawabe et al. / Map of Color Histograms for Robot Navigation 0.5

600

0.4 rotational error [rad]

translational error [mm]

700

500 400 300 200

with normalization

100

without normalization

0.3 0.2 with normalization 0.1

0

without normalization

0

0

2

4

6

8

10

12

14

16

0

2

4

6

8

10

12

14

16

step

step

Figure 6. Average Translational Error

Figure 7. Average Rotational Error

1. Image Acquisition for Map of Color Histograms: First, the images were taken in the area of 2.4 [m] by 1.2 [m] in the room. The robot pose (x, y, θ) at the sampling points was as follows: x = 305i − 1220 [mm] y = 305j − 610 [mm]

(i = 0, . . . , 8) (j = 0, . . . , 4)

θ = kπ/6 [rad] (k = 0, . . . , 11). Thus, the total number of the sampling points was 9 × 5 × 12 = 540. Then, the map was defined as a list of all the quadruplets m = (x, y, θ, V), where V = {(Hred (i), Hgreen (i), Hblue (i))|i = 0, . . . , NA − 1}. Since one pixel has one byte for each RGB, 3 × 160 × 120 = 57600 bytes are necessary for an image. 2. Generation of Map of Color Histograms: Now, each image was compressed into low-frequency components of a normalized color histogram on each color of RGB by means of the algorithm given in Section 3. As a frequency component was expressed by 8 bytes of a real number and only 32 components were stored for each color in the map, the size of the frequency components was 8 × 32 × 3 = 768 bytes. It was 1.33% of its original image size, 57600 bytes. The total size of color histograms in the map was 768[B] × 540 ≈ 0.4[MB]. 3. Localization: We made one hundred trials of self-localization. The initial position and orientation of the robot at each trial was chosen at random. For simplicity, we restrict the robot pose as follows: x = 305i/2 − 1220 [mm] y = 305j/2 − 610 [mm]

(i = 0, . . . , 16) (j = 0, . . . , 8)

θ = kπ/12 [rad] (k = 0, . . . , 23). The robot was moved at random to an adjacent pose manually for each step. 5.2. Experimental Results Average translational and rotational errors of localization in 100 trials are shown in Figure 6 and Figure 7, respectively. For comparison, results without normalization (i.e.,

172

T. Kawabe et al. / Map of Color Histograms for Robot Navigation

without gamma correction for adjustment of the brightness) are also shown. We can see that normalization leads to better localization. Each of the graphs tends to decrease as the step number increases; that is the accuracy improvement by the history of color histograms and odometry. The translational and rotational errors almost converge to a certain level. The residual translational error is larger than the distance interval of sampling points, 305 [mm], while the residual rotational error is smaller than the angle interval of the sampling points, π/6 ≈ 0.52 [rad]. This is because the color histograms are more sensitive to the rotational movement of the robot than the translational one (especially in the camera direction).

6. Conclusion In this paper, we proposed an algorithm that makes it possible for a robot to localize itself with a vision sensor by using a map of color histograms. • The normalized color histogram of an image by gamma correction is used for the robustness against changes of the lighting conditions. • The map is compressed by means of the DCT of color histograms, thus retaining only the low-frequency components of the histograms. The size of the compressed map is reduced to 1.33 % of that of original images. • In the experiments in a room environment, the localization error was about 400 [mm] in translation and 0.3 [rad] in rotation with the help of odometry, when the interval of sampling points for the map building was 305 [mm] in translation and 0.52 [rad] in rotation.

References [1] H. H. Lund (2003) “Adaptive Robotics in the Entertainment Industry,” Proc. 2003 IEEE Int. Symp. on Computational Intelligence in Robotics and Automation, pp. 595–602. [2] G. Jang, S. Kim, W. Lee, I. Kweon (2003) “Robust Self-localization of Mobile Robots using Artificial and Natural Landmarks,” Proc. 2003 IEEE Int. Symp. on Computational Intelligence in Robotics and Automation, pp. 412–417. [3] V. Ayala, J. B. Hayet, F. Lerasle, M. Devy (2000) “Visual Localization of a Mobile Robot in Indoor Environments Using Planar Landmarks,” Proc. of 2000 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 275–280. [4] R. Ueda, T. Fukase, Y. Kobayashi, T. Arai, H. Yuasa, J. Ota (2002) “Uniform Monte Carlo Localization - Fast and Robust Self-localization Method for Mobile Robots,” Proc. of 2002 IEEE Int. Conf. on Robotics and Automation, pp. 1353–1358. [5] Y. Adachi, H. Saito, Y. Matsumoto, T. Ogasawara (2003) “Memory-based Navigation Using Data Sequence of Laser Range Finder,” Proc. 2003 IEEE Int. Symp. on Computational Intelligence in Robotics and Automation, pp. 479–484. [6] T. Matsui, H. Asoh, S. Thompson (2000) “Mobile Robot Localization Using Circular Correlations of Panoramic Images,” Proc. of 2000 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 269–274. [7] M. Fujita, H. Kitano (1998) “Development of an Autonomous Quadruped Robot for Robot Entertainment,” Autonomous Robots, vol.15, no.1, pp. 7–18. [8] M. J. Swain, D. H. Ballard (1991) “Color Indexing,” Int. J. on Computer Vision, vol.7, no.1, pp. 11–32.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

173

Designing a System for Map-Based Localization in Dynamic Environments Fulvio Mastrogiovanni, Antonio Sgorbissa 1 , and Renato Zaccaria DIST - University of Genova Abstract. This paper deals with map-based self-localization in a dynamic environment. In order to detect localization features, laser rangefinders traditionally scan a plane: we hypothesize the existence of a “low frequency cross-section” of the 3D environment (informally, over people heads), where even highly dynamic environments become more “static” and “regular”. We show that, by accurately choosing the laser scanning plane, problems related to moving objects, occluded features, etc. are simplified. Experimental results showing hours of continuous work in a realworld, crowded scenario, with an accuracy of about 2.5cm and a speed of 0.6m/sec, demonstrate the feasibility of the approach. Keywords. Mobile Robotics, Map-based self-localization, System design

1. Introduction This paper deals with the self-localization subsystem of an autonomous mobile robot within a dynamic, indoor environment. It is commonly accepted that, if we want the robot to operate autonomously for a long period of time while carrying out complex navigation tasks (e.g., for autonomous trasportation, autonomous surveillance, museum tour guiding, etc. [1] [4]), self-localization is a fundamental, yet-to-be solved issue. Since incremental dead-reckoning methods are not sufficient because of cumulative errors which increase over time, a very common approach consists in collecting observations from the surrounding environment (natural or artificial features or landmarks), and to periodically update the robot pose by comparing measurements with a map of the environment. If required, each new observation can be used to update the map itself, thus estimating both the pose of the robot and each surrounding feature in the environment (this is also known as Simultaneous Localization And Mapping). In this general scenario, existing localization methods are often classified along different axes (Local Vs. Global, Active Vs. Passive, etc. [5]). In spite of their differences, almost all the existing approaches rely on the so-called Markov assumption; i.e., sensor readings are assumed to depend only on the relative configuration of the robot with respect to the surrounding, presumed static features. Obviously, this is unrealistic in realworld scenarios, where people walk all around and objects are moved frequently from place to place. However, since the assumption allows to simplify computations, most ap1 Correspondence to: Antonio Sgorbissa, DIST - University of Genova. Tel.: +39 010 353 2801; Fax: +39 010 353 2154; E-mail: [email protected].

174

F. Mastrogiovanni et al. / Designing a System for Map-Based Localization

proaches rely on post-processing techniques to filter out data originating from unmodeled, dynamically changing features. The present work is focused on a method relying on a SICK laser rangefinder as the only sensing device in the system, and an Extended Kalman Filter for position estimate. In particular, we face problems related to the dynamism and the inherent complexity of real-world environments by introducing a new dimension, i.e. elevation from the ground. Since laser rangefinders traditionally scan a plane, we hypothesizes the existence of a 2D Low Frequency Cross-Section (informally, “over people heads”), where even highly dynamic environments become more “static” and “regular”: the attribute “low frequency” thus refers both to the temporal and the spatial dimension. The localization process can be successfully managed here, with the robot standing “on the tips of its toes”, thus decoupling position tracking from obstacle avoidance: whilst features for localization should be extracted in the LFCS, which is static and has a simpler topology, sensors for obstacle avoidance should be put lower, in order to deal with the complex and ever changing topology of a dynamic and chaotic human environment. Section II clarifies the LFCS concept, and introduces an “unevenness index” to better formalize it. In Section III the feature extraction and matching algorithms are discussed. Section IV shows experimental results. Conclusions follows.

2. The Low Frequency Cross-section Most approaches in literature make assumptions about the stationarity of the environment. People walking around the robot, objects moved from place to place, or even other moving robots evidently violate this assumption, thus making feature detection harder both as a consequence of occlusions and the fact that features themselves change significantly over short periods of time. Two approaches are widely known in literature to deal with this problem. According to the first approach, the map of the world has to be continuously updated using current observations, in order to reflect the actual state of the environment. This approach is often unfeasible, because of the complexity and the large amount of relevant features which must be taken into account. Recently, important advances in the field of Simultaneous Localization and Mapping (SLAM) [7] [8] have lead to partial solutions, in particular for those features of the environment which appear/disappear at a low frequency (e.g., furniture); however, this is an open research issue. The second approach is based on techniques to filter out sensor readings which do not have a correspondence in the a priori map. For example, in [5] such a filter is used to compute the entropy value of the state estimate, thus considering only perceptions which reduce the entropy value itself. The main drawback of the approach is its inherent conservativeness, since it is not able to incorporate new information in the map. The main purposes of this work is to introduce a novel approach to deal with complex, dynamic indoor environments. We notice that the intrinsic dynamic nature of human populated environments is not a characteristic of the environment “as a whole”, but it mainly depends on elevation from the floor. In particular, it does exist a well defined height range, roughly located over our heads, which we call the Low Frequency CrossSection: experience shows that, in such 2D cross-section of the 3D Workspace, the status of the world varies at an extremely low frequency. Consider for example the displacement

F. Mastrogiovanni et al. / Designing a System for Map-Based Localization

175

Figure 1. Bottom, mid and top view.

of objects and furniture from one place to another: in a typical office-like environment, people usually move chairs, tables, wastebaskets, boxes, PCs, books, etc., from and to places which are “close” to the floor, in order to be handy for humans; these objects are moved around on a daily-basis. Shelves and closets, which are placed at higher elevation from the floor or have a bigger size which could cover an entire wall, are displaced (at most) on a yearly-basis, i.e. a time significantly longer than the life of most current robotic set-ups. We can summarize this by saying that in LFCS the environment is more “static” than in the area below. For analogous reasons, we claim that, over a given height, the environment is more “regular” and “simple” than below. This is a consequence of the fact that humans are inherently “conservative”, i.e., they tend to minimize the energetic strain when performing a particular task. Therefore, it is more common to put down a book on a messy stack on our desktop (if we use it everyday) rather than in a well ordered (but not easily reachable) shelf located over our heads, especially if we plan to take it again after a couple of minutes. Fig. 1 shows a picture of our lab, with laser scans taken at different heights: bottom view (10 cm from the floor, quite messy); middle view (90 cm, very messy); top view (200 cm, i.e. in the LFCS). The idea of a “freely sensing area” is not new. A successful attempt is described in [6], where authors deal with the localization problem by pointing a TV camera upward, with the purpose of detecting ceiling lights and matching them against a previously stored map in memory. The main difference between LFCS and this approach is straightforward: LFCS is a general method for laser based feature extraction, which opportunistically chooses the most promising laser scanning plane but can be adopted in any indoor environment; the approach in [6] is oriented towards solving a very specific problem, thus not being portable at all. To give a formal meanings to terms such as “static”, “regular” and “simple”, we introduce an “unevenness index”; unevenness measures and the fractal degree are commonly used in geography to measure the length and complexity of shore profiles, and can be applied to quantify the regularity of a given laser scan. The unevenness index U is simply provided by the following formula: U=

L C

(1)

L in Equation 1 is the length of the polyline which is built by connecting all the raw scan points, thus not relying on a particular segmentation algorithm. C – commonly referred to as the chord of the polyline – is the length of the segment located between the first and the last scan point. It is straightforward to depict the behavior of U :

176

F. Mastrogiovanni et al. / Designing a System for Map-Based Localization

1) U is always greater than 1. This comes from the fact that the link between the start and the final point of a polyline is always shorter than the polyline itself. U could theoretically be equal to 1 in the case the polyline is made up of a single segment: in other words, when all the points are collinear. 2) U is a measure of the discontinuity between contiguous scan points. High U values are detected if range measurements vary at high frequencies between contiguous scan points. On the contrary, low U values are detected if the points are mostly colinear or their range variation is very smooth. 3) U is proportional to the depth of the environment. Long corridors are mainly characterized by high values of U , while rooms by lower values. This “not explicitly modeled” effect can be reduced, but not removed, by dividing a full scan into disjoint subsets of range measurements, and considering each polyline and the corresponding chord separately. Notice that this is similar to the approach adopted in geography, where coastal profiles are divided into “mostly convex” and “mostly concave” areas before computing the fractal degree. All the data collected confirm the intuition that, at ground level, U is higher; it significantly decreases as we move the laser to a higher elevation, i.e., corresponding to the LFCS. Here, transitions between close scan points are generally smoother, since the environment is less cluttered. Finally, the same is obviously true in a dynamic environment, crowded with people moving around the robot: U substantially increases - with respect to its reference value computed in the LFCS - when scans are taken at a lower height.

3. Self-Localization Two phase are iteratively executed, namely Acquisition and EKF-Update. 3.1. Acquisition We adopt a typical Split & Merge method [9]. This family of line extraction methods exhibits a number of advantages compared to other existing techniques [10] [11] [12] [3] [2]: they exploit local scan templates, they are faster – due to the fact they act on smaller data sets – and produce more accurate results. The required steps are: Preprocessing, Scan Segmentation, Segment Line Extraction, and Feature Matching. Preprocessing and Scan Segmentation do not deserve further discussion: they basically consist in standard techniques to filter out invalid readings, and to segment range data into “almost continuous”, non overlapping, sets of points Sj to be feeded to the next computation step. Line extraction recursively iterates on each set, trying to find local subsets strictly made up of collinear points. This is accomplished by iteratively calculating the fitting value φj of each set Sj (equations are not shown here: see [9]). If φj ≤ Φ, where Φ is the fitting threshold, the points belonging to Sj are considered collinear. If not, the algorithm splits Sj into m smaller sets and computes φjm for each of them. This possibly originates several segment lines from the original set. When a segment line lj is found, ρlj and θlj are measured: in particular, by defining Lj (with capital letter) the straight (infinite) line on which lj lies, ρlj is the distance between Lj and the robot, θlj is the angle between ρlj and the robot heading. By relying on geometrical considerations, both end points of each segment line lj are found. Finally, a set of segment lines λ such that

F. Mastrogiovanni et al. / Designing a System for Map-Based Localization

177

Figure 2. Left: The plot of U versus time in a different 5 mins tests; Right: mean and variance.

Figure 3. Left: The plot of U versus time in a 3 hours test; Right: mean and variance.

  λ = lj , j = 1, ..., |λ|

(2)

is returned, ordered according to the angle θlj associated to each lj (from right to left). During Feature Matching, each line lj is compared with the segment lines which compose the environmental model. The model is a set of |μ| oriented segment lines (each defined by a start and an end point) to allow to distinguish the opposite surfaces of the same wall (or object).   μ = mk , k = 1, ..., |μ| (3) In particular, different heuristic conditions must be satisfied in order to obtain a correct match; equations are not shown here for sake of brevity. As a result, each sensed line lj can virtually match against more map lines mk , thus producing a set of matching couples:   M Cj = < lj , mk >, mk ∈ M In the next phase, some criteria are adopted to select the best matching map line for each sensed line. 3.2. EKF-Update Two steps are involved: Best Match Selection and Kalman Filtering. As usual, to select the most likely match in the set M Cj , we refer the current estimated robot position. In

178

F. Mastrogiovanni et al. / Designing a System for Map-Based Localization

particular, for every map line mk included in M Cj , we compute the measurement model ρ˜mk and θ˜mk (its complete expression is given in equation 6, when describing the EKF), which returns the estimated distance and angle given the current position estimate. Next, we assign a likely score lsjk to each couple < lj , mk >∈ M Cj according to Equation 4.



lsjk = Θρ ρlj − ρ˜mk 2 + Θθ θlj − θ˜mk 2 (4) In Equation 4, Θρ and Θθ are experimentally determined in order to differently weight errors with respect to the distance and to the angle; the j th measure (ρlj , θlj ) feeded to the EKF corresponds to the matching couple mcjk such that: (5) mcjk = arg max lsjk k

If more sensed line lj are available, the following step is iteratively invoked in order to process all the matching pairs. In the last step, position is corrected through an Extended Kalman Filter. An EKF is a linear recursive estimator for the state of a dynamic system described by a nonlinear process f and a measurement model h. Its good properties and limitations are well known in literature, thus not requiring to be discussed here. However, it is interesting to notice that, in our case, the measurement model h is linear, therefore partly allowing to ignore some of the problems which are related to linearization (unfortunately, the kinematic model of our differentially driven vehicle is not linear). In particular, if

T sˆr,t = x ˆr,t , yˆr,t , θˆr,t is the a priori estimate of the robot’s position at the tth instant,

T and the 2-dimensional vector z˜m ,t = ρ˜m ,t , θ˜m ,t is the predicted measure (i.e., k

k

k

corresponding to the map segment mk in the matching couple mcjk ), the measurement model is given by the following linear equations:   a mk x ˆr + bmk yˆr + cmk b mk ˜ − θˆr , θmk = atan (6) ρ˜mk = amk a2mk + b2mk I.e., we express segment lines mk in the map through the implicit equation amk x + bmk y + cmk = 0 of the corresponding straight line. An observability analysis have been performed, showing that the observability space, for each measurement lj , has dimension 2; i.e., the system is not completely observable. However, since the measurement model h varies depending on the equation of the line mk which is currently considered, subsequent measurements – corresponding to different, hopefully perpendicular lines – allow to correct all the component of the robot state.

4. Experimental Results Experiments allowed to validate the intuition about a Low Frequency Cross-Section. In particular, the laser scanner has been set at three different heights from the ground: 10 cm (bottom view), 90 cm (mid view) and roughly 2 meters (top view). From these views we collected about one thousand scans in different places of our department, a house, and a garage; each scan lasts about five minutes, thus requiring to average results over time. The experiments show that bottom and mid views (Fig. 2) almost exhibit a comparable behavior for the U parameter, whilst for the top view (in darker gray) U

F. Mastrogiovanni et al. / Designing a System for Map-Based Localization

179

Figure 4. Left: simulated robot (8 hours); Middle: real robot (2 hours); Right: snapshots of the experiment

is significantly smaller. Obstacle profiles, when detected in the LFCS, are much more regular than below; moreover, the variance over time of each scan is significant in the bottom and mid view, whilst it is almost zero in the top view (small fluctuations are due to sensor noise). Experiments with a longer duration of about 2 hours, shown in Fig. 3, confirm this evidence. Exploiting these results, we performed many localization experiments, both in simulation and with two real robots (a modified TRC Labmate base and the home-built robot Staffetta). For simulation, we simulate errors both in rangefinder measurements [2] and in odometry. In Fig. 4 (left), the simulated robot is asked to navigate through an environment which replicates the ground floor of our department: the figure depicts the superimposition between the ground truth and the estimated robot position after 8 hours simulation. In Fig. 4 (middle) a similar experiment is shown with a real robot; the system proves to be able to localize itself for almost 2 hours (the maximum time allowed by batteries), even it is almost continuously surrounded by students (Fig. 4 right). Moreover, it demonstrates its robustness when moved to a different scenario, i.e.: the second floor of our department (Fig. 5). During experiments the normal life of the department is not interrupted: people are walking by, furniture are displaced etc. In spite of this, positioning have an accuracy of about 2.5 cm when robots are moving at a speed of 0.6m/sec.

5. Conclusion The paper shows that map-based self-localization within a dynamic environment can be successfully performed by opportunely designing the laser to scan a 2D plane where the environment slowly changes, both temporally and spatially (i.e., which we call a Low Frequency Cross-Section). This concept is formalized through the unevenness parameter U , which attributes a score to the “roughness” of laser scans; as expected, scans taken outside the LFCS are much less stable and regular. Localization experiments – performed both in simulation and with a real robot – demonstrate that the approach is very robust and precise even in highly crowded environments.

180

F. Mastrogiovanni et al. / Designing a System for Map-Based Localization

Figure 5. Top: simulated robot (8 hours); Bottom: real robot (2 hours)

References [1] I.J. Cox, BLANCHE - An Experiment in Guidance and Navigation of an Autonomous Robot Vehicle, IEEE Int. J. of Robotics and Automation, Vol. 7 no 2 (1991), 193–204. [2] P. Jensfelt and H. I. Christensen, Pose Tracking Using Laser Scanning and Minimalistic Environmental Models, IEEE Int. J. of Robotics and Automation, Vol. 17 no 2 (2001), 138–147 [3] S. Thrun, C. Martin, Y. Liu and, Hahnel, R. Emery-Montemerlo, D. Chakrabarti and W. Burgard, A Real-Time Expectation-Maximization Algorithm for Acquiring Multilanar Maps of Indoor Environments With Mobile Robots, IEEE Int. J. of Robotics and Automation, Vol. 20 no 3 (2004), 433–442. [4] S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A.B. Creemers, F. Dellaert, D. Fox, D. Hahnel, C. Rosenberg, N. Roy, J. Schulte and D. Schulz, Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva, Int. J. of Robotics Research, Vol. 19 no 11 (2000), 972–999. [5] D. Fox, Markov Localization: A Probabilistic Framework for Mobile Robot Localization and Navigation, PhD Thesis, Institute of Computer Science III, University of Bonn (1998) [6] D. Fox, S. Thrun, W. Burgard, and F. Dellaert, Particle Filters for Mobile Robot Localization, Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, Springer-Verlag, eds. A. Doucet and N. de Freitas and N. Gordon, New York (2001). [7] J. Nieto, J. Guivant, and E. Nebot, DenseSLAM: The Unidirectional Information Flow (UIF), Proc. Fifth IFAC/EURON Symp. on Intell. Autonomous Vehicles (IAV’04), Portugal (2004). [8] J. B. Folkensson and H. Christensen, Robust SLAM, Proc. Fifth IFAC/EURON Symp. on Intell. Autonomous Vehicles (IAV’04), Portugal (2004). [9] D. Sack and W. Burgard, A Comparison of Methods for Line Extraction from Range Data, Proc. Fifth IFAC/EURON Symp. on Intell. Autonomous Vehicles (IAV’04), Portugal (2004) [10] M. Bennewitz, W. Burgard, and S. Thrun, Using EM to learn motion behaviors of persons with mobile robots, Proc. of IROS’02, Lausanne, Switzerland (2002). [11] R. O. Duda and P. E. Hart, Pattern classification and scene analysis, Wiley and Sons, New York (1973). [12] Y. Liu, R. Emery, D Chakrabati, W. Burgard, S. Thrun, Using EM to learn 3D models of indoor environments with mobile robots, Proc. of the Int. Conf. on Machine Learning (2001)

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

181

Appearance-based Localization of Mobile Robots using Local Integral Invariants Hashem Tamimi a , Alaa Halawani b Hans Burkhardt b and Andreas Zell a a Computer Science Dept., University of Tübingen, Sand 1, 72076 Tübingen, Germany. b Institute for Computer Science, University of Freiburg, 79110 Freiburg, Germany. Abstract. In appearance-based localization, the robot environment is implicitly represented as a database of features derived from a set of images collected at known positions in a training phase. For localization the features of the image, observed by the robot, are compared with the features stored in the database. In this paper we propose the application of the integral invariants to the robot localization problem on a local basis. First, our approach detects a set of interest points in the image using a Difference of Gaussian (DoG)-based interest point detector. Then, it finds a set of local features based on the integral invariants around each of the interest points. These features are invariant to similarity transformation (translation, rotation, and scale). Our approach proves to lead to significant localization rates and outperforms a previous work. Keywords. Appearance-based robot localization, integral invariants, (DoG)-based interest point detector.

1. Introduction Vision-based robot localization demands image features with many properties. On one hand the features should exhibit invariance to scale and rotation as well as robustness against noise and changes in illumination. On the other hand they should be extracted very quickly so as not to hinder other tasks that the robot plans to perform. Both global and local features are used to solve the robot localization problem. Global features are extracted from the image as a whole, such as histograms [18], Principal Component Analysis (PCA)-based features [3], and integral invariants [17]. On the other hand, local features are computed from areas of high relevance in the image under consideration such as Scale Invariant Feature Transform (SIFT) [6], kernel PCA-based features [16], and wavelet-based features [15]. Local features are more commonly employed because they can be computed efficiently, are resistant to partial occlusion, and are relatively insensitive to changes in viewpoint. There are two considerations when using local features [5]: First, the interest points should be localized in position and scale. Interest points are positioned at local peaks in a scale-space search, and filtered to preserve only those that are likely to remain stable over transformations. Second, a signature of the interest point is built. This signature should be distinctive and invariant over transformations caused by changes in camera pose as well as illumination changes. While point localization and signature aspects of interest point algorithms are often designed together, they can be considered independently [7].

182

H. Tamimi et al. / Appearance-Based Localization of Mobile Robots

In this paper we propose the application of the integral invariants to the robot localization problem on a local basis. First, our approach detects a set of interest points in the image based on a Difference of Gaussian (DoG)-based interest point detector developed by Lowe [6]. Then, it finds a set of descriptive features based on the integral invariants around each of the interest points. These features are invariant to similarity transformation (translation, rotation, and scale). Our approach proves to lead to significant localization rates and outperforms a previous work that is described in Section 4.

2. Integral Invariants Following is a brief description of the calculation of the rotation- and translationinvariant features based on integration. The idea of constructing invariant features is to apply a nonlinear kernel function f (I) to a gray-valued image, I, and to integrate the result over all possible rotations and translations (Haar integral over the Euclidean motion): T[ f ](I) =

2π 1 M−1 N−1 P−1 ∑ ∑ f (g(n0 , n1 , p P )I) PMN n∑ 0 =0 n1 =0 p=0

(1)

where T[ f ](I) is the invariant feature of the image, M, N are the dimensions of the image, and g is an element in the transformation group G (which consists here of rotations and translations). Bilinear interpolation is applied when the samples do not fall onto the image grid. The above equation suggests that invariant features are computed by applying a nonlinear function, f , on the neighborhood of each pixel in the image, then summing up all the results to get a single value representing the invariant feature. Using several different functions finally builds up a feature space. To preserve more local information we remove the summation over all translations. This results in a map T that has the same dimensions of I:     1 P−1 2π I (2) (T [ f ] I) (n0 , n1 ) = ∑ f g n0 , n1 , p P p=0 P Applying a set of different f ’s will result in a set of maps. A global multidimensional feature histogram is then constructed from the elements of these maps. The choice of the non-linear kernel function f can vary. For example, invariant features can be computed by applying the monomial kernel, which has the form: 

P−1

f (I) =



∏ I xp , yp

1

P

(3)

p=0

One disadvantage of this type of kernels is that it is sensitive to illumination changes. The work in [10] defines another kind of kernels that are robust to illumination changes. These kernels are called relational kernel functions and have the form: f (I) = rel (I (x1 , y1 ) − I (x2 , y2 )) with the ramp function

(4)

H. Tamimi et al. / Appearance-Based Localization of Mobile Robots

183

Figure 1. Calculation of f = rel (I(0, 3) − I(4, 0)), by applying Equation 2, involves applying the function rel to the difference between the grey scale value of the pixels that lie on the circumference of circle of radius 3 and pixels that lie on the circumference of another circle of radius 4 (taking into consideration a phase shift of π 2 between the corresponding points) and taking the average of the result [17].

rel (γ ) =

⎧ ⎨1 ⎩

ε −γ 2ε

0

if γ < −ε if − ε ≤ γ ≤ ε if ε < γ

(5)

centered at the origin and 0 < ε < 1 is chosen by experiment. Global integral invariants have been successfully used for many applications such as content-based image retrieval [12], texture-classification [9], object recognition [8], and robot localization [17]. Figure 1 illustrates how these features are calculated. Please refer to [12] for detailed theory.

3. DoG-based Point Detector The interest points, which are used in our work, were first proposed as a part of the Scale Invariant Feature Transform (SIFT) developed by Lowe [6]. These features have been widely used in the robot localization field [11] [14]. The advantage of this detector is its stability under similarity transformations, illumination changes and presence of noise. The interest points are found as scale-space extrema located in the Difference of Gaussians (DoG) function, D(x, y, σ ), which can be computed from the difference of two nearby scaled images separated by a multiplicative factor k: D (x, y, σ ) = (G (x, y, kσ ) − G (x, y, σ )) ∗ I (x, y) = L (x, y, kσ ) − L (x, y, σ )

(6)

where L(x, y, σ ) defines the scale space of an image, built by convolving the image I(x, y) with the Gaussian kernel G(x, y, σ ). Points in the DoG function, which are local extrema in their own scale and one scale above and below are extracted as interest points. The interest points are then filtered for more stable matches, and more accurately localized to scale and subpixel image location using methods described in [2].

184

H. Tamimi et al. / Appearance-Based Localization of Mobile Robots

4. Using Global Integral Invariants For Robot Localization In [17], integral invariants are used to extract global features for solving the robot localization problem by applying Equation 2 to each pixel (n0 , n1 ) in the image I. The calculation of the matrix T involves finding an invariant value around each pixel in the image which is time consuming. Instead of this, Monte-Carlo approximation is used to estimate the overall calculation [13]. This approximation involves applying the nonlinear kernel functions to a set of randomly chosen locations and directions rather than to all locations and directions. Global features achieve robustness mainly because of their histogram nature. On the other hand, local features, extracted from areas of high relevance in the image under consideration, are more robust in situations where the objects in images are scaled or presented in different views [4]. Such situations are often encountered by the robot during its navigation. In the next sections we modify the global approach by applying the integral invariants locally around a set of interest points.

5. Extracting Local Integral Invariants Unlike the existing approach, explained in Section 2, the features that we propose are not globally extracted; they are extracted only around a set of interest points. Our approach can be described in the following steps: 1. Interest point detection: The first stage is to apply the DoG-based detector to the image in order to identify potential interest points. The location and scale of each candidate point are determined and the interest points are selected based on measures of stability described in [6]. 2. Invariant features initial construction: For each interest point located at (n0 , n1 ) we determine the set of all points which lie on the circumference of a circle of radius r1 . We use bilinear interpolation for sub-pixel calculation. Another set of points that lie on a circumference of a circle of radius r2 are determined in the same manner. Both circles have their origin at (n0 , n1 ). To make the features invariant to scale changes, the radii are adapted linearly to the local scale of each interest point. This way the patch that is used for feature extraction always covers the same details of the image independent of the scale. 3. Nonlinear kernel application: A non-linear kernel function is applied to the values of the points of the two circles. Each point located at (x0 , y0 ) on the circumference of the first circle is tackled with another point located at (x1 , y1 ) on the circumference of the second circle, taking into consideration a phase shift θ between the corresponding points. This step is repeated together with step 2 for a set of V kernel functions fi , i = 1, 2, · · · ,V . The kernels differ from each other by changing r1 , r2 and θ . Finally we apply Equation 7 for each interest point located at (n0 , n1 ).     1 P−1 2π Fi (n0 , n1 ) = ∑ fi g n0 , n1 , p I , i = 1, 2, · · · ,V. (7) P p=0 P We end up with a V -dimensional feature vector, F, for each single interest point.

H. Tamimi et al. / Appearance-Based Localization of Mobile Robots

(a) Step (1)

(b) Step (2)

(c) Step (3)

(d) Step (4)

(e) Step (5)

(f) Step (6)

(g) Step (7)

(h) Step (8)

185

Figure 2. Eight different image samples that belong to the same reference location.

6. Experimental Results In this section we present the experimental results of our local integral invariants compared with the global integral invariants reviewed in Section 4. 6.1. Setting up the Database To simulate the robot localization we use a set of 264 gray scale images taken at 33 different reference locations. Each has a resolution of 320 × 240. In each reference location we apply the following scenario capturing an image after each step: (1) The robot stops. (2) It translates 50 cm to the right. (3) The pan-tilt unit rotates 20 degrees to the left. (4) The robot moves 50 cm ahead. (5) The pan-tilt unit rotates 40 degrees to the right. (6) The pan-tilt unit rotates 20 degrees to the left. (7) The pan-tilt unit moves five degrees up. (8) The pan-tilt unit moves 10 degrees down. Figure 2 includes eight sample images of one reference location. The database is divided into two equal parts. 132 images are selected for training and 132 for testing. This partitioning is repeated 50 times with different combinations for training and testing images. The average localization rate is computed for each combination. We assume that optimal localization results are obtained when each input image from one of the reference locations matches another image that belongs the same reference location. 6.2. Global Integral Invariants In order to compare our work with the work of [17] which involves calculating the global features, a set of 40 × 103 random samples is used for the Monte-Carlo approximation, which is also suggested in [13] for best performance. For each sample we apply a set of three different kernels. Both monomial and texture kernel functions are investigated for best localization accuracy using the above images. For each image a single

186

H. Tamimi et al. / Appearance-Based Localization of Mobile Robots

D-dimensional histogram is build with D = 3. Each dimension contains 8 bins which has experimentally led to best results. The histograms are compared using the l 2 − Norm measure. 6.3. Local Integral Invariants We use the following parameters when implementing our descriptive features: For each interest point we set V = 12 which gives us a 12-dimensional feature vector that is generated using a set of either relational kernel functions or monomial kernel functions. Best results were obtained with ε = 0.098 in Equation 5. When evaluating the localization approach we first compare each individual feature vector from the image in the query with all the other feature vectors, extracted from the training set of images, using the l 2 − Norm measure. Correspondences between the feature vectors are found based on the method described in [1] which leads to robust matching. Then we apply a voting mechanism to find the corresponding image to the one in query. The voting is basically performed by finding the image that has the maximum number of matches. Figure 3 gives an example of the correspondences found between two different images using the proposed approach.

Figure 3. Matching two different images using the proposed approach.

6.4. Results Figure 4 demonstrates the localization rate of the proposed approach and the existing one. It can be seen that the local integral invariants-based approach performers better than the global integral invariants-based approach using any of the two kernel types but gives best results using the relational kernel function. The global integral invariants-based approach has better results when using the monomial kernel function. One way to test if the results of the proposed approach are statistically significant or not, is to calculate the confidence interval for each experiment, and to check if the confidence interval of the mean values of the proposed approach does not overlap the confidence interval of the mean values of the

H. Tamimi et al. / Appearance-Based Localization of Mobile Robots

187

100 Localization rate (%)

90 80 70 60 50 40 30 20 10 0

LR

LM

GR

GM

Figure 4. The localization rate of the local integral invariants-based approach using relational kernel functions (LR) and monomial kernel functions (LM), compared with the existing global integral invariants-based approach using relational kernel functions (GR) and monomial kernel functions (GM).

existing approach. The 95%-confidence interval test is used here, which means that the real mean value of each experiment lies in this interval with the propability of 95%. On the top of each bar in Figure 4, the following information are represented from outside to inside: The maximum and minimum values of the data, the standard deviation and the 95%−confidence interval. The confidence interval in the bar (LR) does not overlap any confidence interval of the other bars. This means that the results of this local integral invariants-based approach are significantly better than the other results. The average localization time of the global approach and the local approach are 0.42 and 0.86 seconds respectively using a 3GHz Pentium 4. The additional computation time in the case of the local approach is due to the additional complexity in the feature extraction stages and the computation overhead during the comparison of the local features.

7. Conclusion In this paper we have proposed an appearance-based robot localization approach based on local integral invariants. The local features have a compact size but are capable of matching images with high accuracy. In comparison with using global features, the local features show better localization results with a moderate computational overhead. Using local integral invariants with relational kernel functions leads to significant localization rate than the monomial kernel functions.

Acknowledgment The first and second author would like to acknowledge the financial support by the German Academic Exchange Service (DAAD) of their PhD. scholarship at the Universities of Tübingen and Freiburg in Germany. The authors would also like to thank Christian Weiss for his assistance during the experimental part of this work.

188

H. Tamimi et al. / Appearance-Based Localization of Mobile Robots

References [1] P. Biber and W. Straßer. Solving the Correspondence Problem by Finding Unique Features. In 16th International Conference on Vision Interface, 2003. [2] M. Brown and D. Lowe. Invariant Features from Interest Point Groups. In British Machine Vision Conference, BMVC, pages 656–665, Cardiff, Wales, September 2002. [3] H. Bischof H, A. Leonardis A, and D. Skocaj. A Robust PCA Algorithm for Building Representations from Panoramic Images. In Proceedings 7th European Conference on Computer Vision (ECCV), pages 761–775, Copenhagen, Denmark, 2002. [4] A. Halawani and H. Burkhardt. Image retrieval by local evaluation of nonlinear kernel functions around salient points. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), volume 2, pages 955–960, Cambridge, United Kingdom, August 2004. [5] Y. Ke and R. Sukthankar. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. In CVPR (2), pages 506–513, 2004. [6] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004. [7] K. Mikolajczyk and C. Schmid. A Performance Evaluation of Local Descriptors. IEEE Transactions on Pattern Analysis & Machine Intelligence, 27(10):1615–1630, 2005. [8] O. Ronneberger, H. Burkhardt, and E. Schultz. General-purpose Object Recognition in 3D Volume Data Sets using Gray-Scale Invariants - Classification of Airborne Pollen-Grains Recorded with a Confocal Laser Scanning Microscope. In Proceedings of the International Conference on Pattern Recognition (ICPR), volume 2, Quebec, Canada, 2002. [9] M. Schael. Invariant Grey Scale Features for Texture Analysis Based on Group Averaging with Relational Kernel Functions. Technical Report 1/01, Albert-Ludwigs-Universität, Freiburg, Institut für Informatik, January 2001. [10] M. Schael. Texture Defect Detection Using Invariant Textural Features. Lecture Notes in Computer Science, 2191:17–24, 2001. [11] S. Se, D. Lowe, and J. Little. Vision-based Mobile robot localization and mapping using scale-invariant features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 2051–2058, Seoul, Korea, May 2001. [12] S. Siggelkow. Feature Histograms for Content-Based Image Retrieval. PhD thesis, AlbertLudwigs-Universität Freiburg, Fakultät für Angewandte Wissenschaften, Germany, December 2002. [13] S. Siggelkow and M. Schael. Fast Estimation of Invariant Features. In W. Förstner, J. M. Buhmann, A. Faber, and P. Faber, editors, Mustererkennung, DAGM, pages 181–188, Bonn, Germany, September 1999. [14] H. Tamimi, H. Andreasson, A. Treptow, T. Duckett, and A. Zell. Localization of Mobile Robots with Omnidirectional Vision using Particle Filter and Iterative SIFT. In Proceedings of the 2005 European Conference on Mobile Robots (ECMR05), Ancona, Italy, 2005. [15] H. Tamimi and A. Zell. Vision-based Global Localization of a Mobile Robot using Wavelet Features. In Informatik aktuell, editor, Autonome Mobile Systeme (AMS), 18. Fachgespräch, Karlsruhe, 4. - 5. December, pages 32–41, Karlsruhe, Germany, 2003. [16] H. Tamimi and A. Zell. Vision based Localization of Mobile Robots using Kernel approaches. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), pages 1896–1901, 2004. [17] J. Wolf, W. Burgard, and H. Burkhardt. Robust Vision-based Localization by Combining an Image Retrieval System with Monte Carlo Localization. IEEE Transactions on Robotics, 21(2):208–216, 2005. [18] C. Zhou, Y. Wei, and T. Tan. Mobile Robot Self-Localization Based on Global Visual Appearance Features. In Proc. IEEE Int. Conf. Robotics & Automation (ICRA), pages 1271–1276, 2003.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

189

Enhancing Self Covertness in a Hostile Environment from Expected Observers at Unknown Locations Mohamed MARZOUQI and Ray JARVIS Intelligent Robotics Research Centre, Monash University, Australia {mohamed.marzouqi, ray.jarvis}@eng.monash.edu.au

Abstract. In this paper, a new approach is introduced to the field of covert robotics. A mobile robot should minimize the risk of being exposed to observers which are expected to be operating within the same environment but at unknown locations. The robot should enhance its covertness by hiding. A proper hiding point should be chosen while taking into account the risk of the path to it. The approach depends on the assumption that the probability of an object to be observed is proportional to its exposure to the free-spaces of the environment, where a possible observer might be standing. Given that, a visibility risk map is generated that is used to find the hiding point and a covert path to it. The approach has been tested on simulated known environments. A number of test cases are presented. Keywords. Mobile robot, covert path planning, hiding.

Introduction This work is part of our research on covert robotics [1,2,3,4], where the aim is to study the strategies, behaviors and techniques that can be used to create mobile robots with the ability to accomplish different types of navigation tasks covertly (i.e. without being observed by hostile observers in the operating environment). Covert robotics systems can be useful in various applications, especially where safety and security are required (e.g. police and military fields). To date, the research on covert navigation has shown a noticeable weakness. Our previous work in this field has concentrated on planning a covert path between two nominated points (see [1,3,4]). A robot would require one or more different covert navigation abilities to accomplish a real-life covert mission successfully. Examples of such abilities are: covert navigation to a specified destination, tracking or approaching a target in a stealthy manner, and searching or exploring a hostile environment covertly. In this paper, the ability of hiding from observers whose locations within the environment are unknown is considered. Hiding is an important tool that a robot may use during its mission to enhance its covertness. For example, robot carrying out a task in a hostile environment may need to hide if it detects possible risk of being discovered. In previous work, number of strategies has been used for robots hiding. One example is to plan a path to a dark spot in the environment to hide from expected observers. It has been used in number of research problems for the purpose of monitoring an area covertly [5,6]. In such research, the robot is equipped with a light

190

M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment

intensity sensor to search for the darkest area where the robot can stand to be hidden, or at least to be less noticeable. The success of this strategy depends heavily on the environment’s lighting condition. Another strategy to minimize self exposure during navigation is to minimize the size of the robot. Insect robotics [7,8] is an obvious example that can be used for such a purpose. Dependence upon the robot’s size for covertness is realistic, since an insect robot is either difficult to notice, or to be recognized as a robot if it is observed. On the other hand, a robot’s capability decreases along with its size. Insect robots are mainly used for spying but not for heavy-duty jobs (e.g. carrying an object) which might be required in a covert mission. Other examples are pursuit-evasion and hide-and-seek games, which are known problems in autonomous robots field. The aim is to guide one or a team of robots to search for or track one or a group of robots [9,10]. The work done in this area has only focused on developing the pursuer/seeker side, while the evader/hider is controlled manually or has a random behavior to escape the opponent’s observation. In [2], we have presented hiding as a path planning strategy. The aim is to allow a mobile robot in a known stationary environment to plan a path with low exposure to observers at initially known locations. This is achieved by choosing a hiding point that has maximum guaranteed time before possible observation by any observer (i.e. the farthest point in distance from the observers’ view). In this paper, the work is extended to the case of unknown observers’ locations. The strategy used for this case is based on the assumption that the probability of an observer (who is moving randomly) to detect the robot is proportional to the robot’s position exposure to the free-spaces of the environment. Depending on this assumption, a hiding point is chosen while considering the risk of the path to it. The organization of this paper is as follows. The definition and assumptions of the problem are presented in section 1. The hiding strategy is explained section 2. The approach algorithm is presented in section 3. Simulation experiments have been conducted to evaluate the approach. In section 4, results for different cases are illustrated. The conclusion and our direction for future research are in section 5.

1. Problem Definition The robot’s objective is to enhance its covertness in the environment by moving (if necessary) to a location that would offer a higher covertness than its current location. The robot’s covertness is violated by being exposed to at least one observer. The following assumptions have been made. The operating environment is a 2-d space that is represented as a uniform grid-based map, each of its cells are either a free-space or occupied by an obstacle that no agent (i.e. the robot or an observer) can penetrate. Obstacles within the environment are the only means of maintaining covertness. They are assumed opaque and higher than the sight level of all agents. The environment is assumed to be known and stationary (neither obstacles nor free-spaces change). It can be indoor or outdoor but with specified real or virtual borders to limit the search space for a path. The robot can expect observers to be within the environment but has no knowledge about their locations or behaviour (e.g. if the observer is a sentry, the robot would not know its strategy in searching for intruders). The expected observers could be moving around with no fixed positions.

M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment

191

2. The Hiding Strategy A proper hiding point is a location that is less likely to be observed (or inspected) by the observers. Choosing a hiding point would normally be based on the environmental knowledge or information given about the observers (e.g. a room that is least visited by the observers is a good hiding place). Our hiding strategy will only depend on the environment structure as the only knowledge available. Based on human covert behavior in hiding, a good hiding place is the one with minimum exposure to the free-spaces of the environment. This is considered an important factor since the probability of a position to be exposed to an observer at an unknown location would be mostly proportional to its exposure to the free-spaces. A position that is exposed to the whole environment (e.g. a center of a star shape environment) is guaranteed to be exposed to any observer. Still, there are other factors that can affect choosing a hiding point other than exposure (e.g. a hallway is riskier as a hiding place than an isolated store room). The rest of this paper will concentrate on the exposure only as a crucial hiding factor, keeping other factors for future research. The first step in the approach is to build a visibility map (introduced previously in [1]). In the visibility map, the exposure value of each of its free-space cells is calculated. The exposure value represents the number of free-space cells visible to the corresponding cell (Each cell’s value will be in the range from 0 to the total number of free-space cells). Fig. 1 shows an example of a calculated visibility map where the exposure values are represented in grayscale. A cell gets darker as it becomes more hidden, and vice versa. The basic way to calculate the exact exposure value at each cell is to use a ray shooting method. At any cell, the free-space cells are counted at each direction from that cell, where the exposure value is the sum of the counted cells in all directions. This process is time consuming for high resolution maps as its time complexity is O(n2), where n is the number of free-space cells. A faster method is described in [1] in details. The time consumed in generating the visibility map in the case of known environments is acceptable, as all calculations will be performed only once offline.

Figure. 1. A visibility map of an environment. Obstacles are in white. The brightness level of any location in the free-space represents its exposure to the free-spaces.

192

M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment

A variation to the described case is if the robot expects the observers to be in some places than others (e.g. hallway). Such knowledge can be implemented in the visibility map by making the cells exposed to those areas have a higher calculated visibility risk value. To do that, a map is created that each of its cells has a probability value of the robot’s expectation for an observer to visit that cell. The visibility cost at any cell will be the sum of the probability values at those cells exposed to it. The rest of this paper will concentrate on uniform probability of the observers’ locations. Choosing the hiding point is only part of the solution. A covert path to the hiding point is the second part which may also affect the choice of the hiding point (e.g. a point with maximum covertness may not be considered if the path to it is risky). The next section presents our approach that is based on the described strategy.

3. The Hiding Algorithm Let t denote the hiding period that starts from the time the robot decides to hide. t is initially known (e.g. the robot can estimate that the detected risk will be cleared after a period of time). A hiding plan is evaluated by measuring the risk of both the hiding spot and the path to it. H(c) in (1) is the hiding cost that represents the total risk of choosing a cell c as a hiding point and navigating to it through p. m is the time the robot needs to follow a path from its current location to c. V(c) is the visibility cost at cell c (which is the exposure value calculated in the visibility map). pi is the robot’s position along p at moment i. A cell is ignored if the time to approach it is higher than the hiding time (i.e. m>t). To describe (1) in words, the total cost of any hiding plan is the sum of the visibility costs that encountered at each time frame of the hiding period. This is logical since the risk of being exposed is proportional to the time the robot spends in the environment and the risk of its position at any moment during that time. The chosen hiding point is the one with the lowest H(c). It should be obvious that the robot’s current position is also considered as a hiding point option with m = 0. m

H (c )

¦V ( p )  V (c).(t  m) i

,t ! m

(1)

i 0

The equation in (1) will form a balance that considers the hiding time as well as the chosen point and path. For example, if the robot needs to hide for a very short period of time, then choosing a hiding point with a low exposure value but navigating to it through a highly exposed path would be irrational. On the other hand, if the robot needs to hide for a very long time then a hiding point with the lowest visibility cost may be chosen regardless of the risk of the path to it. The chosen path to the hiding point can be the shortest path. A better approach is to follow a covert path that can minimize the visibility cost of approaching the hiding point, and hence produce a better hiding plan. For example, a good hiding point might be ignored if the short path to it is highly exposed, on the other hand, it might be considered if there is a covert path to it with low exposure. Finding a covert path will depend on the already calculated visibility map, as described next. The Dark Path (DP) algorithm has been used here to find the covert path to each cell. We have introduced this algorithm in previous work [1,3,4] as an optimal covert

M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment

193

path planner between two nominated points. The DP algorithm is an extension of the known Distance Transform (DT) algorithm [11] (a short collision-free path planning algorithm, a brief description is in [1]). An important advantage offered by the DT algorithm is its ability to accept other cost factors other than distance. In the DP algorithm, the visibility cost of the path is considered as an additional cost. Different applications would require different visibility cost factors that can satisfy the path’s requirements. In our case the visibility cost at any cell is its exposure. A generated path is a balance between shortness and covertness. Similar to the DT algorithm, a cost map is created that each of its cells holds a value that represent the minimum cost to the destination. The process of finding the DP cost map is as follows. Firstly, the robot’s cell location is given a 0 value in the created cost map, which is where the cost propagation will start from. All other cells get a high value. Secondly, a cost propagation process is conducted to assign each cell a value equals the least value amongst the n neighbors’ costs, added to that the time W to move to that neighbor and the visibility cost of the cell itself. The DP cost at each cell is represented by (2). Į is used to balance between the path shortness and covertness. It is set to 1 here. Once the cost map is generated, each cell will hold a value that represents the minimum visibility/time cost to approach it from the robot’s current location. A covert path to any cell c can be generated by tracing the steepest descent trajectory from c to the robot. i 8

Cost (c )

min Cost (ni )  W i  D u V (c )

(2)

i 1

The first part of the equation at (1) (i.e. the path visibility risk cost) can be calculated efficiently using the created DP cost map without the need to follow the path to each point and sum its visibility costs. This is done by creating a time map that the value at each of its cells is m (i.e. the time to reach that cell through the covert path). This is done during the process of generating the DP cost map. Initially, the robot location is given an m value of 0 in the time map. Then, at the time of generating the DP cost map using (2), m of c in the time map equals m of its neighbor (which it gets its visibility/time cost from) added to that the value of W . Once the time map is generated, it is subtracted from the DP cost map which will produce a map that each of its freespace cells has the value of the total visibility cost of the covert path to that cell (scaled by Į).

4. Experimental Results 4.1. Observers are Expected after a Period of Time This is a simple case scenario which is demonstrated here to concentrate on the choice of the hiding spot but not the path to it. The robot knows that the observers are currently outside the environment but expected to be inside later. It has enough time to approach any hiding point. The path to the chosen point will be safe, hence the shortest path should be followed. The described algorithm can be easily implemented here by setting Į in (2) to 0. This will make the DP algorithm generate the shortest path and make the calculated visibility cost for any path =0. t is set to a value higher than the

194

M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment

maximum m in the time map. Fig. 2 shows an example of a simulated experiment. The robot at R (which is represented as a point for simplicity) has followed a short path (the continuous line) to the point H1 which has the minimum visibility cost in the map. The visibility map is shown on the top of the environment map for evaluation purposes. If the traveled distance is a concerned to safe motion power or find a closer hiding point to where the robot is operating, then the robot can choose the nearest cell that has a visibility cost within a specified threshold to the maximum visibility cost. In Fig. 2, the robot has H2 as another hiding spot option that is closer to it than H1. The path to H2 is the dash line. 4.2. Observers are inside the Environment In this case, the robot expects the observers to be within the environment, but at unknown locations. Fig. 3 shows an example. Given that the hiding time t is set to a very high value, the robot at R has chosen H2 as it has the minimum exposure value. The robot has followed a covert path to that point. When setting t to a lower value (3 times less), the robot has chosen H1 since the path to H2 is risky compared to the remaining hiding time that the robot will stay at H2. The difference between the calculated hiding costs at H1 and H2 for the two different time settings are shown in Table 1 in its first two rows. Fig 4 represents a combination of the cases in Fig. 2 and 3. The robot expects the observers to be in the environment within a period of time (which we will call here the safe time). The safe time in this case is not long enough to allow the robot to reach any hiding point in the environment safely through the shortest path. To accommodate that, Į is set to 0 only if the time to reach a cell is less than the safe time (using the time map). Otherwise it is set to 1. The shaded area around the robot in Fig. 4 is the region which the robot can reach safely any point within it through the shortest path. t has been set here to a low value, however, H2 has been selected since the highly exposed part of its path is within the safe area. The hiding cost at H2 becomes accordingly less than H1 and hence it has been chosen as the hiding point (see Table 1).

Figure. 2. The observers are expected to be in the environment after a long period of time. The robot at R has followed a short path (continuous line) to the best hiding point at H1. H2 is another hiding option that is closer than H1 but with a little higher visibility cost (environment size is 200x120).

M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment

195

Figure. 3. The observers are inside the environment. When t is set to a high value, H2 is chosen as the best hiding point ignoring its path cost. When t is set to a low value, H1 is preferred. In both cases the robot has followed a covert path to each point.

H2

Figure. 4. The shaded area is the region which the robot has the time to reach any point within it safely through the shortest path. Observers are expected to be in the environment after the safe time. Even t has been set here to a low value, the robot chooses H2 since the highly exposed part of its path is within the safe region.

Table 1. A comparison table of the calculated H() at the points H1 and H2. The chosen hiding point for each case is in bold. H(H1)

H(H2)

(x1000)

(x1000)

0

1,098

1,338

900

0

2562

1,656

300

50

810

750

Fig.

Hiding Time

Safe Time

3

300

3 4

196

M. Marzouqi and R. Jarvis / Enhancing Self Covertness in a Hostile Environment

5. Conclusion A hiding strategy is presented for a mobile robot that needs to minimize the risk of being exposed to expected observers in the environment at unknown locations. The approach so far depends only on the exposure to free-spaces risk factor. Different factors should be studied that can be combined with the exposure factor to produce more efficient results. Moreover, hiding in unknown environments would be another interesting extension that may depend on the same presented strategy.

Acknowledgment The authors would like to thank Etisalat University College and the Emirates Telecommunication Corporation (Etisalat), United Arab Emirates, as well as the Intelligent Robotics Research Centre (IRRC), Monash University, for providing the financial support for this work.

References [1]

[2] [3]

[4]

[5] [6] [7] [8]

[9]

[10] [11]

Mohamed Marzouqi and Ray A. Jarvis, “Covert path planning for autonomous robot navigation in known environments,” Proc. Australasian Conference on Robotics and Automation ACRA, Brisbane, December 2003. Mohamed Marzouqi and Ray A. Jarvis, “Covert robotics: Hiding in known environments,” Proc. IEEE Conference on Robotics, Automation and Mechatronics RAM, Singapore, December 2004. Mohamed Marzouqi and Ray A. Jarvis, “Covert robotics: covert path planning in unknown environments,” Proc. Australasian Conference on Robotics and Automation ACRA, Canberra, December 2004. Mohamed Marzouqi and Ray A. Jarvis, “Covert path planning in unknown environment with known or suspected sentry location,” Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Canada, August 2005. B. Kratochvil, “Implementation of scout behaviors using analog sensing methods”, Technical report no. 03-044, University of Minnesota, 2003. P. E. Rybski, et al. “A team of robotic agents for surveillance”, Proceedings of the Fourth International Conference on Autonomous Agents, ACM, pp. 9-16, 2000. J.M. McMichael and M.S. Francis. “Micro air vehicles toward a new dimension in flight”, DARPA, USA, 1997. R. T. Vaughan, G. S. Sukhatme, F. J. Mesa-Martinez, and J. F. Montgomery. “Fly spy: Lightweight localization and target tracking for cooperating air and ground robots”. Distributed Autonomous Robotic Systems, vol. 4, pp. 315-324, 2000. S. M. LaValle, H. H. Gonz'alez-Ba, C. Becker, and J.-C. Latombe. “Motion strategies for maintaining visibility of a moving target”. In Proc. IEEE Int'l Conf. on Robotics and Automation, pp. 731-736, 1997. K. Mann, “Evolving robot behavior to play hide and seek”, The Journal of Computing in Small Colleges archive, vol.18, pp. 257–258, May 2003. R. A. Jarvis, ”Collision-free path trajectory using distance transforms,” Proc. National Conference and Exhibition on Robotics, Melbourne, August 1984. Also in Mechanical Engineering Transactions, Journal of the Institution of Engineers, Australia,1985.

Part 4 Multi-Agent Robots

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

199

An experimental study of distributed robot coordination a

Stefano Carpin a and Enrico Pagello b International University Bremen – Germany b University of Padova – Italy

Abstract. Coordinating the path of multiple robots along assigned paths is a difficult problem (NP-Hard) with great potential for applications. We here provide a detailed study of a randomized algorithm for scheduling priorities we have developed, also comparing it with an exact approach. It turns out that for problems of reasonable size our approach does not perform significantly worse than the optimal one, while being much faster. Keywords. Multi-robot systems, intelligent mobility, randomized algorithms

1. Introduction Planning the motion of multiple robot systems has been a task investigated since the early days of mobile robotics. While the problem is interesting in itself, because of the inherent computational complexity it exhibits, it has to be admitted that few applications have been presented up to now in the context of autonomous mobile robots. This is partly due to the fact that systems with a remarkable number of robots have not been deployed yet. Moreover, when numerous multiple robots are to operate in a shared unstructured environment, one of the holy grails of multi-robot research, their motion is commonly governed by reactive navigation modules rather than by precisely planned paths. That said, it is not implied that the problem in itself lost interest from the applicative point of view. The demand for systems capable of governing the motion of multiple objects in shared environments is instead ever increasing. Applications include for example luggage handling systems at airports, storage systems in factories, moving containers in harbors, and more. The theme of intelligent mobility is envisioned to play a major role in the foreseeable future. Maybe one of the major differences that will be seen will be a decrease of individual robots’ degree of autonomy. Sticking to the storage systems in factories example, mobile carts are not free to move wherever they want, but are rather constrained to proceed along predefined paths, usually hard wired in hardware. Given such a set of predefined paths, coordinating the motion of vehicles along these paths will be asked more and more often. And, of course, it will be asked to find solutions that optimize certain performance indices, like for example, consumed energy, time needed to complete the task, and the alike. From a computational point of view these problems have been studied since quite some time, and their inherent computational complexity has soon been outlined. It is therefore a must to look for approximated and heuristic based

200

S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination

solutions, if one is required to deal with instances with a multitude of moving objects. In this paper we offer an experimental assessment of a simple randomized approach to solve the coordination problem we have proposed in the past. Section 2 discusses related literature, while the problem is formalized in section 3. The solving algorithm is illustrated in section 4, and the experimental framework and results are shown in section 5. Conclusions are finally provided in section 6. 2. Related work The problem of multi-robot motion planning has been constantly studied in the past. The first major distinction concerns centralized versus decentralized approaches. When a centralized motion planner is used, one process has to plan the motion of whole set of robots. The obvious drawback is in the high dimensionality of the configuration space to be searched. In a decentralized approach, every robot plans its own motions and then has to undergo a stage of negotiations to solve possible collisions with other robots. Decentralized approaches are inherently incomplete, but much faster. Sánchez and Latombe [16] recently speculated that decentralized approaches are likely to show their incompleteness often when used in industrial production plants. In the case of mobile robot systems, however, the environment is likely to be less cluttered and hence these problems are less likely to occur. Efficient methods to solve the single robot motion planning are available [4][9][13] and will not be further discussed here (extensive treatments of the motion planning problem are available, like [6] and [10]). A common approach to solve the multi-robot motion planning problem consists in assigning priorities to robots and planning their motion according to the priorities [5]. When planning the motion of a robot with priority pj , it is necessary to take into consideration the already planned motions for robots with priority pi , where pi < pj . Finding a good priority schema is a hard problem in itself [2]. A related problem, which is the one we will address in this paper, consists in coordinating the path of a set of robots along given specified paths. As the paths may intersect with each other, it might be necessary to stop certain robots when approaching an intersection point in order to give way to other robots and avoid collisions. In this context one is generally interested to minimize certain parameters, like, for example, the time needed by all robots to reach their final destination. This rules out certain trivial coordination schemas, like for example the one where just one robot moves and all the other remains stationary, since the overall required time would be too high. LaValle solved the problem using a game-theoretic approach based on multi-objective optimization [11][12]. The approach allows to tune the algorithm behavior between centralized planning and complete decentralized planning. The author reports that optimal results can be found, but this required significant amount of time. Simèon et al. [17] solve the path coordination problem using a resolution complete algorithm. They show how it is possible to split a given path in segments where the robot will not collide with any other robot, and segments where its path intersects other paths. The authors report results involving up to 150 robots, but where no more than 10 robots are intersecting each other’s path. Computation time is in the order of minutes. Akella and Hutchinson [1] solve the problem of robot coordination along predefined routes by simply varying the start time of each robot. Once a robot starts to move, it never stops until it reaches its target position. Peng and Akella recently extended these ideas addressing the case of robots with kinodynamic constraints [14].

S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination

201

Figure 1. The simplest case of robot coordination, involving two robots only. R1’s path can be divided into three segments, free, occupied and free respectively. R2’s segment can be divided into two segments. If R2 is given a priority higher than R1, and both robots travel at the same speed, R1 will not be able to reach its final destination, because of R2 sitting on its path.

3. Problem formulation The problem we aim to study is the following: given n robots, assume that n paths, one for each robot, are given. We assume that each path has been subdivided into free and occupied segments. This is feasible in the light of the results presented in [14] and [17]. s(i) Hence, a path pi can be seen as a sequence p1i . . . pi , where each pji is either a free or an occupied segment, and s(i) is the number of segments composing path pi . The task is to find a coordination schema, i.e. a mapping: C : [0, T ] → {1 . . . s(1)} × {1 . . . s(2)} × . . . × {1 . . . s(n)}.

(1) s(i)

In the beginning robot i is positioned at segment p1i , and in the end has to reach pi . While moving through the different segments, a certain amount of time will be spent to traverse each segment. Let t(pji ) be the time spent by robot i to traverse segment pji (1 ≤ j ≤ s(i)). Throughout the paper we assume that robots only move forward along their path, though they can stop at certain points to give way to other robots. However, they never backtrack along their route. The goal is to find a coordination schema that minimizes the time needed by all robots to complete their motion. Formally we aim to minimized the following quantity z = max

1≤i≤n

s(i) 

t(pji ).

(2)

j=1

It can be shown that this problem is equivalent to the Job Shop Scheduling (JSS) Problem, which is known to be NP-hard [7]. The JSS problem asks how to schedule n jobs that have to be processed through m machines in way that the overall required time is minimized. The constraints are that no machine can process more than one job at the same time, and that each job has to be processed by the machine in a given order. In the path coordination problem, each robot is a job, and each free or occupied segment is a machine. Therefore, under the assumption that P = N P , the search for a coordination schema that minimizes the time needed to complete the motion task is doomed to take exponential time. This motivates the great number of approximated and heuristic approaches that have been tried.

202

S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination

4. Random rearrangements In our former work [3] we proposed a simple distributed schema to solve the multi-robot motion planning problem. Similar ideas were later used in [8]. The algorithm is slightly modified here to describe how the various robots can operate to find a valid coordination schema, and is depicted in algorithm 1. The algorithm assumes that a datastructure SpaceTime is available. SpaceTime records which part of the space is occupied or free at a given time. The algorithm picks a random priority schema (line 1), and then schedules the robot motions according to the selected priority schema. The first considered robot will be scheduled to move straight along its path, and SpaceTime will be accordingly updated. When scheduling successive robot motions, it is necessary to check whether the robot can move to its next path segment or if it is already occupied (line 6). If this is possible, the robot moves to its next segment (line 7), or a delay is inserted (line 10). In both cases SpaceTime is updated to record robots’ position (line 11) while time evolves (lines 8 and 10). We here stick to the hypothesis formulated in our former work, i.e. that Algorithm 1 Coordination of n robots 1: pick a random permutation of the set {1 . . . n} 2: for i ← 1 to n do 3: T ime ← 0 4: j←1 5: while j < s(i) do 6: if SpaceTime(Time,pji ) is free then 7: Advance robot i to its segment pji 8: T ime ← T ime + t(pji ) 9: else 10: T ime ← Delay 11: update SpaceTime each robot will apply this procedure to compute a coordination schema, and that in the end the one leading the best value for the variable z formerly defined will be used to execute the real motion. So, when n robots are involved, n independently chosen random priority schemas are tried. The coordination schema does not compute any path and does not alter any of the paths it starts with. Hence there are cases when certain priority schemas do not lead to any solution. For example, if robot i’s goal position is on the path pj given to robot j and i has a higher priority than j, it may reach its goal position before j passes through the same point. At that point robot j will not be able to complete its motion. This could be avoided if robots would be allowed to move backwards along their path, but this is not considered here. From a practical point of view, it has to be mentioned that even a naive implementation of the above algorithm can solve a coordination task in a time in the order of a fraction of a second1 . 1 the reader should not be misleaded by this statement. The algorithm just computes a random permutation and tries to schedule robots one after the other according to this schema. No paths are computed and no optimization is tried, hence the small time required to find a solution.

S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination

203

5. Experimental results While the conceptual equivalence between the path coordination problem and the JSS problem has been outlined since long time, to the best of our knowledge there have been no attempts to set up an experimental framework to compare the exact approach and heuristic solutions proposed along the years. While one can easily guess that solving the JSS problem will take long time, it is interesting to assess how much the best solution found by heuristic algorithms differ from the optimal one. With respect to the algorithm presented in section 4, it is also appealing to measure how likely it is to select a random priority schema that leads to a deadlock situation. We have then developed two experimental setups. The first one is based on the formerly described algorithm, while the second is based on a freely available implementation of Mixed Integer Linear Programming algorithms [15] and is used to solve the associated JSS problem. The operating scenario has been simplified to a square grid, with robots starting and ending at random positions and moving along randomly generated paths. It is worth observing that these hypothesis do not oversimplify the problem. We anticipated in section 2 that there exist efficient algorithms to both compute paths and decompose them into free and occupied segments. Here we are only interested in the successive step, i.e. the coordination along these provided paths. The first set of tests analyzes the possibility of not finding a valid priority schema. Figure 3 illustrates the results. Fixing three different environment sizes, we have solved 20 randomly generated problems involving a varying number of robots. The figure illustrates the percentage of generated priority schemas that are not valid. A sort of saturation effect can be observed, i.e. when the number of robots increases, so does the fraction of invalid priority schemas. The reader should however observe that this effect manifests itself when the number of robots is big. The bigger the environment, the longer the paths travelled by each robot, hence the number of collisions and the chance that a robot will terminate its run on some other path. This was decided on purpose, to test the algorithm under extreme conditions. It however never happened that none of the generated priority schemas was feasible. The algorithm always produced a solution. It is also worth noting the fact that when trying to coordinate n robots, the algorithm always tries only n of the possible n! priority schemas. In the most extreme case with 35 robots, only 35 out of the 35! possible priority schemas were tried. This is clearly a negligible fraction. Still, the system is able to find a solution. In the second set of tests we have developed, we have measured the variability of the generated solutions, in term of maximum time taken to complete the given coordination task. Results are presented in figure 3, where the relative standard deviation2 is plotted versus the number of robots in the team, for different environment sizes. It can be observed that the relative standard deviation stays most of the time bounded below a certain environment dependent threshold, and grows in the end when the number of robots is such that the number of collisions becomes big. The last point would be comparing the solution found with the proposed algorithm with the one found by the exact JSS solution. Doing this comparison is however computationally hard. For the scenario involving a 20 × 20 no differences were observed. When the number of robot grows however, one is confronted with instances of the JSS problem with tenths of machines and jobs, and hundred of constraints. Instanced of this 2 relative standard deviation is the ratio between standard deviation and mean. In this case it is more appropriate than the simple standard deviation since the average path size grows with the number of robots

204

S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination

Figure 2. Percentage of invalid priority schemas for different environment sizes and different sizes of the multi-robot team

size require few hours to be solved. This pushes execution time very high, hence making impossible for the moment to do a precis comparison for each and every case (formerly illustrate charts are the result of more than 2000 randomly generated problems)

6. Conclusions The simple randomized schema for finding priority schemas turns out to be competitive under rather general conditions. None of the testcases generated led to an instance un-

S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination

205

Figure 3. Relative standard deviation for different environment sizes and different sizes of the multi-robot team

solvable by the algorithm. This result is somehow intriguing, if one considers the number of priority schemas tried versus the number of possibilities. We have also observed that the quality of the solution, intended as the time to bring all the robots to their final destination, is often not far from the optimal one. The analyzed values for the relative standard deviation show that when problem becomes difficult, produced solutions span a wide range of possible values. The proposed approach shows that in many practical situations, suboptimal approaches are still satisfactory, and the need of guaranteed optimal solutions offers very often an unbalanced ratio between quality increase and required time.

References [1] S. Akella and S. Hutchinson. Coordinating the motions of multiple robots with specified trajectories. In Proceedings of the IEEE International Conference on Robotics and Automation,

206

S. Carpin and E. Pagello / An Experimental Study of Distributed Robot Coordination

pages 624–632, 2002. [2] M. Bennewitz, W. Burgard, and S. Thrun. Finding and optimizing solvable priority schemes for decoupled path planning techniques for teams of mobile robots. Robotics and Autonomous Systems, 41(2-3):89–99, 2002. [3] S. Carpin and E. Pagello. A distributed algorithm for multi-robot motion planning. In Proceedings of the fourth Eurpoean Conference on Advanced Mobile Robots, pages 207–214, Lund (Sweden), September 2001. [4] S. Carpin and G. Pillonetto. Motion planning using adaptive random walks. IEEE Transactions on Robotics, 21(1):129–136, 2005. [5] M. Erdman and T. Lozano-Pérez. On multiple moving objects. Algorithmica, 2:477–521, 1987. [6] H. Choset et al. Principles of robot motion. MIT Press, 2005. [7] M.R Garey and D.S. Johnson. Computers and Intractability. A guide to the theory of NPCompleteness. W.H. Freeman and Company, 1979. [8] Y. Guo and L.E. Parker. A distributed and optimal motion planning approach for multiple mobile robots. In Proceedings of the IEEE Conference on Robotics and Automation, 2002. [9] L.E. Kavraki, P. Švestka, J.C. Latombe, and M.H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566–580, 1996. [10] S.M. LaValle. Planning Algorithms. Available Online. [11] S.M. LaValle. Robot motion planning: A game-theoretic approach. Algorithmica, 26:430– 465, 2000. [12] S.M. LaValle and S.A. Hutchinson. Optimal motion planning for multiple robots having independent goals. IEEE Transactions on Robotics and Automation, 14(6), 1998. [13] S.M. LaValle and J.J. Kufner. Randomized kinodynamic planning. International Journal of Robotics Research, 20(5):378–400, 2001. [14] J. Peng and S. Akella. Coordinating multple robots with kinodynamic constraints along specified paths. International journal of robotics research, 24(4):295–310, 2005. [15] GNU project. The gnu linear programming toolkit. www.gnu.org/software/glpk/glpk.html, 2005. [16] G. Sánchez and J.C Latombe. Using a prm planner to compare centralized and decoupled planning for multi-robot systems. In Proceedings of the IEEE International Conference on Robotics and Automation, 2002. [17] T. Siméon, S. Leroy, and J.P. Laumond. Path coordination for multiple mobile robots: A resolution-complete algorithm. IEEE Transactions on Robotics and Automation, 18(1):42– 49, 2002.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

207

Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot Kenneth PAYNE, Jacob EVERIST, Feili HOU and Wei-Min SHEN1 University of Southern California, Information Sciences Institute, USA

Abstract. This paper proposes a novel method for localizing a stationary infrared source of unknown orientation relative to a static docking sensor. This method uses elliptical approximations of likely positions of the infrared source and computes the intersections to find the most probable locations. It takes only a few samples to localize, is easily computed with inexpensive microcontrollers, and is robust to sensor noise. We then compare our approach with two other methods. The first uses a Bayesian filter across a map of discrete locations in the robot’s operational workspace to determine the suspected source position. The second also uses a probability distribution map but uses the method described by Elfes in his paper on probabilistic sonar-based mapping and navigation [1]. We show that our approach localizes quickly with a single sensor and is no more computationally demanding than other methods.

Keywords. Alignment, Docking, Bayesian, Infrared Beacon, Elliptical Probabilistic Sensing, SeReS, CONRO.

Introduction and Previous Work Self-reconfigurable robots are modular systems that can change and adapt their configuration to the environment based on the needs and characteristics of their task. These robots rely heavily on their inherent ability to connect and disconnect multiple times to change the configuration of modules in the network. A system that is able to align and correct for docking offsets can change shape reliably. One of the fundamental problems encountered during reconfiguration is the issue of correcting the error in connector alignment. Traditionally, the reconfiguration process would be hand-coded into the system by either an operator or choreographed by an off-board computer [2]. However, one of the goals of self-reconfigurable robots is to achieve reconfiguration at run-time [3][4][5][6][7]. To accomplish this, a method of localization is needed to overcome noise in motors and proprioceptive sensors, as 1

Corresponding Author: Wei-Min Shen, USC/Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, USA; E-mail: [email protected].

208 K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot

well as error caused by docking friction and slippage. Many researchers have addressed this issue for both reconfigurable and general robotic docking [8][9][10][11][12]. Occasionally, those phenomena are somewhat predictable and can be approximated. For single-sensor modular systems, alignment with the infrared beacon was typically accomplished by scanning back and forth, localizing through multiple passes [10][13]. Systems with multiple sensors usually involved large amounts of computational complexity [14]. Lastly, the discretized methods discussed are based on the Bayes filter as well as Elfes’ work on sonar range finding [1]. Probabilistic methods achieve their objectives through sparse sampling and computation of probability distributions, which are robust to sensor and actuator noise. We designed our probabilistic approach for detecting error in sensor alignment and compared it to two others. For this series of experiments, we focus on the detection and localization of a single infrared light. The novel method we have developed is a parametric approximation of the two probability distribution-based methods. If we derive an accurate sensor model, the accuracy of the probabilistic methods is not significantly affected by sensor noise. The advantage of our approach over others is that it localizes quickly with a single sensor, yet uses no more computation than other methods, and can be carried out on a simple, inexpensive on-board microcontroller.

1. Experiment Characteristics 1.1. CONRO SeReS Robot Module and Testing Environment CONRO is a self-reconfigurable robot system (Fig. 1a). Each module has four interconnecting dock faces with an infrared emitter/receiver pair (for digital communication and analog sensing) as well as a pair of pin-and-latch docking systems, discussed in [12]. For this experiment, we used a CONRO SeReS module equipped with an Atmel Atmega128L processor2 and 128K flash memory.

a.

b.

Figure 1. (a) The CONRO Self-Reconfigurable Robot. (b) The SeReS conversion test environment. The operator holds a mobile infrared beacon with a plumb bob above a graduated strip mounted below the robot.

All three experiments were conducted in the same room (in Figure 1b) under the controlled lighting conditions of a typical office environment with diffuse fluorescent lighting in the ceiling, matte white walls, and celadon carpeting. This is by no means an approximation of the real world, but is instead used as an initial testing environment 2

For more information on this processor, see http://www.atmel.com, and also http://www.atmel.com/dyn/resources/prod_documents/doc2467.pdf.

K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot 209

for the proof-of-concept work done in this experiment. Further testing should involve variable light and reflectivity in the environment, such as under direct sunlight, and ambient or cloudy skies. 1.2. Sensor Models For our sensor model, we simply use an approximation of the real light characteristics of the infrared detector on the SeReS module. SeReS reads in a 10-bit brightness value that is then used to determine a suspected angular offset for both yaw θ and

pitch φ (Figure 2). We collected sensor response data in 100° vertical and horizontal sweeps to map detector sensitivity to both rotational axes. This was a simple yet appropriate approach since SeReS modules often perform side-to-side or up-and-down motions to find their neighbors in reconfiguration tasks. Pitch

% Saturation vs Relative Orientation

Yaw 3

2.5

% Saturation (at 30 cm)

2

1.5 1

0.5 0 -60

-40

-20

0

20

40

60

-0.5 Degrees from Sensor Centerline

a.

b.

Figure 2. Brightness vs. Angular Offset Sensor Model. Note that in (a) “% Saturation” refers to total sensor saturation. Since these experiments were conducted at 30 cm, a very small but sensitive range is detected, and the infrared detector never completely saturates. This is from a 10-bit value representing a 5V range. (b) shows how the orientation workspace of SeReS is mapped onto a 2D spherical section. Note also that orientation can be expressed in degrees (0°,0° center) or in discretized 4.2° x 4.2° cells (12,12 center cell).

2. The Elliptical Intersection Method The most important part of a probability distribution map is the set of peaks indicating likely regions in which the source should be found. As the sensor model can only approximate range but not specific orientation from the current position, each location has with it an associated elliptical region of likely positions. The elliptical intersection method represents the highest rings of those regions of probability as parametric ellipses. By determining the intersections of these ellipses (Figure 3), a concise set of points can be found where the infrared source may be located. In our light model, we measured the response curves of the infrared detector on both θ and φ axes. This simple model ensures that all major and minor axes of associated ellipses will be parallel. The major and minor axes of each ellipse are the hypothesized angular offset taken from the light reading in Figure 2a. Lastly, the three samples are arranged in a triangle to cover more operational space.

210 K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot

a.

b.

Figure 3: (a) The elliptical intersection method begins by calculating the sizes of three sampling ellipses, taken from different points in SeReS’ operational space, and whose a and b terms are determined from the light measured at each sample point with respect to the light model from Figure 2a. (b) shows the resulting intersections of the ellipses determined in (a). After discarding the outliers, the remaining cluster will envelope the region most likely to contain the infrared source.

Recall from elementary geometry that the intersection of two ellipses is Eq. (1) and Eq. (2) after simplification. In these equations, the variables a and b are the vertical and horizontal angular offsets taken from the light model in Figure 2a, and the position (h, k) is each point at which sampling takes place.

(θ − h1 )2 (φ − k1 )2 +

a12 § 1 ¨¨ 2 © a1

b12

=

(θ − h2 )2 (φ − k 2 )2 +

a 22

(1)

b22

§ h 2 (φ − k1 )2 · § 1 · 2 § 2h1 · ¸=¨ 2 ¸¸θ − ¨¨ 2 ¸¸θ + ¨ 12 + ¨a ¸ ¨a b12 ¹ © a1 ¹ © 1 ¹ © 2

· 2 § 2h2 ¸¸θ − ¨¨ 2 ¹ © a2

§ h 2 (φ − k 2 )2 · ¸¸θ + ¨ 22 + ¨a b22 ¹ © 2

· (2) ¸ ¸ ¹

As the current SeReS orientation (h, k) and the suspected angular offsets (a, b) will not be known until sampling at runtime, a general solution is needed. We used Sylvester’s determinant of the matrix Eq. (3) to remove the θ term. The determinant itself is a very long fourth order polynomial with thirty-four terms. This allows us to find its φ roots by simplifying it into the form of Eq. (4) and solving.

ª1 « 2 « a1 « «0 « «1 « a2 « 2 « «0 ¬

(Aφ

2h1 a12 1 a12 2h − 22 a2 1 a22 −

4

h12 (φ − k1 ) + a12 b12 2h − 21 a1 2 2 h2 (φ − k 2 ) + a22 b22 2h − 22 a2 2

º » » ( φ − k1 )2 » + » b12 » » 0 » » 2 ( φ − k2 ) » + b22 »¼ 0

h12 a12

h22 a22

)

+ Bφ 3 + Cφ 2 + Dφ + E =0 F

(3)

(4)

K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot 211

Since the intersections are represented by floating point numbers, we must use the Newton-Raphson method of approximating roots the same way it is done on modern graphing calculators. By solving Eq. (1) for θ and substituting the roots found, one can find the set of real 2D points at which the ellipses intersect. Next, by clustering the points together, we approximate the location of the source by discarding outliers and taking the center of mass of the remaining cluster. The sensor model guarantees that the major and minor axes of the ellipses will be parallel to the pitch and yaw axes of the SeReS module respectively. Also, the arrangement of sample points chosen for taking readings guarantee that there will be up to nine intersections. To simplify clustering, simply discard those points whose angular offset from the center of mass is greater than the average, and then recalculate the new center of mass. Due to the guarantee of axial symmetry and triangular arrangement, this clustering method is sufficient. If the light model were more complex, we would likely have to draw a clustering method from [15]. It should be noted that if fewer than three intersections are found, the data is insufficient to produce a solution. This is addressed later.

3. Discretized Methods In these localization methods, we separated the angular offset space into a set of discrete cells that contain the probability of the light source being in that location. We used a 24 x 24 2D map of the SeReS module’s yaw θ and pitch φ operational space that is also interpreted as the orientation of the infrared detector. This map places cell (0, 0) at the lower west corner and (23, 23) is at the upper east (Figure 2b). To update the belief functions, readings must be taken from various locations. This yields a ring or peak of high probability around SeReS cell location depending on the infrared source’s suspected location. If the new probability for a given cell in the map is lower than the current belief for that position, it will decrease. This eliminates competing peaks from the map. We chose the four cells (12, 12), (5, 12), (14, 6), and (14, 14), because together these locations cover SeReS’ entire operational workspace.

a.

b.

Figure 4: In both images, readings are taken from SeReS’ central orientation at cell (12,12). In (a), the suspected brightness is far away from cell (12, 12) so an elliptical valley is created in the probability distribution. In (b) the source is very close to SeReS’ current orientation, so a very small ring (becomes a peak) is created around the current orientation of the SeReS module.

For all sensed values b > 4.3% of complete infrared detector saturation, the function would return suspected angular offsets of 0°. This occurs mainly at perfect

212 K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot

alignment, and supports the assumption that the suspected source position was at an angular offset of 0° from SeReS’ current position. This also creates a peak at the current position in the 24 x 24 map. The next part of the curve shows that for all brightness values 0.1% < b < 4.3%, the curve can be approximated by a parabola opening downwards. Lastly, for all sensed brightness b < 0.1%, a wide distribution was used for each cell. This essentially lowers the probability of the region surrounding SeReS and leaves the oblique regions with a nearly equal chance of containing the source. Not every point is expected to see the light source, and so this method maintains probability mass for those points not in the immediate detector range. See Figure 4 for illustration.

3.1. Implementation of Bayesian Filter Recall the Bayes filter, governed by Eq. (5) for sensor observations. The robot starts with a uniform prior belief distribution of the light source. The state, s, is the position of the sensor in the 24 x 24 cell map. Due to the large size of the individual cells, (4.2° x 4.2°), Bayes’ action model tested to be P (s | s ' , action) = 1.0 for all commanded servo motions, and is therefore left out for brevity. In Eq. (6) we represent the state as the coordinate pair (θ , φ ) . The observation, o, in Eq. (5) is the detected brightness B and the suspected angular offsets (a,b) that follow from the light model given in Figure 2a. The evidence η is found in the usual way by taking the summation of all nonnormalized P and then dividing each cell by that value.

Bel ' (s | o ) = ηP (o | s ) ∗ Bel (s ) , where η =

1 . evidence

Bel ' (θ , φ | B, a, b ) = ηP(B, a, b | θ , φ ) ∗ P(θ , φ )

(5) (6)

3.2. Implementation of Elfes’ Method In pursuit of another method for updating the prior, we also tried Elfes’ approach [1]. Elfes held that the new belief of a cell in an observation is governed by Eq. (7).

P(cell) = P(cell) + P(reading) - P(cell) * P(reading)

(7)

We picked this method because sound and light dissipate in a similar fashion, and we hypothesized that what worked for Elfes’ on sonar-range finding would work on beacon location. Our infrared detector gives us magnitude but not direction just as Elfes’ sonar gave magnitude (and by virtue of positioning on the robot, direction), but did so with heavy noise [1]. As Elfes used repetition in his method to circumvent the sonar noise, we used it to overcome the noise in the infrared detector. The rest of the process is carried out in the same fashion as for our Bayesian method.

K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot 213

4. Findings

4.1. Elliptical Intersection Method For each method, we ran ten trials with the infrared source placed in various locations. For the elliptical intersection method, the results were erratic. In simulation, the system used very regular curves for brightness and always measured perfectly. In real application however, the sample points sometimes yielded 0% intensity, leaving us with the challenge of representing an “out-of-range” condition in the program. We tried two possible sets of angular offsets for this condition. The first was the bare minimum angular offset for that received value, which was a = 45°, b = 36°, taken from the model (Figure 2a). The second pair was the absolute extreme angular offset measured for 0% intensity: a = 50°, b = 50°. Some of the results are in Table 1. Table 1. Out-of-Range Elliptical Intersection trial results. Trial

Actual Position (ș, ij)

Cluster Center (ș, ij)

Out of Range Angular Offsets

EIM Trial I-a

(17°, 17°) = cell (16, 16)

(15°, 24°)

a = 50°, b = 50°

(11°, 16°)

a = 45°, b = 36°

(2°, -3°)

Both conditions.

(7°, -8°)

a = 50°, b = 50°

EIM Trial I-b

(0°, 0°) = cell (12, 12)

EIM Trial I-c

(-50°,-50°) = cell (0, 0)

EIM Trial I-d

(-30°, 25°) = cell (5, 18)

(-6°, -8°)

a = 45°, b = 36°

(-33°, -30°)

a = 50°, b = 50°

In theory, this method was able to approximate the light source in simulation to within ±2°. In actual practice, however, this was usually greater than ±10° with occasional trials such as I-d, off by ±60° on one axis. This indicates that our light source and light model were not as uniform as we had hoped, and that further study into making a clean reliable uniform light source will be needed. One benefit of the probabilistic methods was that even if no brightness was measured, the algorithm would increase the extreme areas and decrease the probability mass for the surrounding area. With the elliptical method, no reliable ellipse dimensions could be determined from a single measurement. Lastly, the Newton-Raphson part of the algorithm had to cycle many more thousands of times for each potential intersection, which yielded a solution on our 4MHz processor within 10 to 30 seconds, (depending on the points selected and their respective ellipse sizes). This is currently too slow for implementation on real-time docking problems.

4.2. Discretized Methods The discretized methods worked much more smoothly. The results in Table 2 show the discretized algorithms to be accurate usually within 2 cells as the probability increases. The Bayesian method is quick and accurate. It repeatedly localized to a single peak in short order; always within a few cells from the actual source of infrared light (it tends to converge low-west of actual position). Of particular interest were the results of Trial B-a, where the light source was placed slightly east and low-of-center. This was able to

214 K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot

converge on the fourth sample despite the fact that the first three measurements were out of range (Figure 5). Table 2. Notable Bayesian and Elfes trial results. Trial

Actual Cell (ș, ij)

Peak Cell (ș, ij)

Bayesian Trial B-a

(16, 16)

(18, 17)

Bayesian Trial B-b

(12, 12)

(14, 14)

Bayesian Trial B-c

(12, 12)

(16, 16)

Bayesian Trial B-d

(16,16)

(11,15)

Elfes’ Method Trial E-a

(12, 12)

(1, 10)

Elfes’ Method Trial E-b

(0, 0)

(3, 1)

a.

b.

Figure 5: Trial B-a. (a) SeReS starts at cell (12,12) and sees no light. It then moves to cells (5,12) and (14,6) with no detected light. Finally, it approaches (14,14) (b) near the source and begins to converge with a higher peak. Already by the fourth sample (b), the three other competing peaks were reduced.

a.

b.

c.

d.

Figure 6. Trials B-b (a) and B-c (b) show the effects of noise in the sensor. Trial B-b has half of Trial B-c’s standard deviation. The noise in the sensor yielded different probabilities, yet still converged to within 4 cells. In Trial E-a, the distribution does not readily converge. (c) SeReS begins at cell (12, 12). (d) Fourth step at (14, 14). The overall peak increased and moved closer to the source with each sampling, but a large portion of the probability ring still remains. To implement Elfes’ approach more efficiently, we may need a better sensor model, or many more readings.

K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot 215

Bayesian Trials B-b and B-c, and Elfes’ Method Trial E-a were all taken with the light at the center. Trial B-b registered the peak closer to the actual position, and Trial B-c had a wider standard deviation (Figure 6a and 6b). Next is trial E-a, which had the light source at the center (12, 12). The adaptation of Elfes’ approach is slow, but promising. To effect change in the probability distribution it relies on many readings, and had not yet converged to a single useful peak by the fourth sample. It is shown in Figure 6c and 6d.

5. Conclusions and Future Work One conclusion we can draw from these results is that the chief cause of failure or inaccuracy was that our sensor model was too brief, and did not account for anomalies that were seen at oblique angles from above and below. These anomalies are common to off-the-shelf infrared detectors like the ones used on SeReS, and are even within the specifications of the particular one we used. This is supported by the fact that the simulated models worked well based on simplified physical laws that govern light behavior. Once the methods were applied under real-world conditions, the shortcomings of the light model became apparent. Also, we found that by reducing the standard deviation for the sensor model probability curve, the peaks were sharper and converged faster and tighter than when we used a broader curve. This is only natural, as more probability mass was concentrated into a smaller volume. This accounts for the difference in Bayesian Trials B-b and B-c. By giving more benefit of the doubt to nearby cells, Trial B-c converged with a wider distribution around the peak, whereas Trial B-b was higher and tight. While this paper focused on localizing a single infrared source under semi-ideal conditions, it should be noted that the real world is full of light noise and sensor misalignments. It is therefore desirable to develop a more rigid and redundant sensor model that takes into account other factors such as slop and backlash in the servomotors, multiple light sources, and of most importance, moving lights at various ranges from the detector. One suggestion we have for further work on this topic is to use a weighted system of constant random sampling. The peaks or intersections would quickly drift across the operational space and the clusters would track the source with an error whose size was directly related to the sampling frequency and source velocity. Another suggestion would be to use a variable set of light models taken at varying distances. If peaks or intersections are either very far away from each other, or nonexistent, then the light model would be assuming that the source is too close, or too far away, respectively. One could use a wide variety of standard approaches to converge on selecting the proper model, such as rule-based lookup tables, simulated annealing, neural networks, and many others. A proper infrared sensor model can help modular systems to localize relative to opposing dock faces during self-reconfiguration. Using probabilistic methods, the approach phase of docking can be sloppier and still localize and align, assuming the control algorithm chooses appropriate points from which to take readings. In our experiments, points were chosen arbitrarily for apparent uniform distribution across the operational space, but a more variable approach could also be used.

216 K. Payne et al. / Single-Sensor Probabilistic Localization on the SeReS Self-Reconfigurable Robot

Acknowledgements We are grateful that this research is sponsored by AFOSR under award numbers F49620-01-1-0020 and F49620-01-1-0441. We also want to thank other members in the Polymorphic Robotics Laboratory for their generous intellectual support and discussions.

References [1] [2]

[3] [4]

[5] [6]

[7] [8]

[9] [10]

[11]

[12]

[13]

[14]

[15] [16]

A. Elfes, "Sonar-based Real-world Mapping and Navigation", IEEE Transactions on Robotics and Automation, 3(4), pps. 249-265, 1987. E. Yoshida, S. Murata, A. Kamimura, K. Tomita, H. Kurokawa, S. Kokaji, "A Self-Reconfigurable Modular Robot: Reconfiguration Planning and Experiments", International Journal of Robotics Research, 21(10), pps. 903-916, October 2002. M. Rubenstein, M. Krivokon, W.-M. Shen, "Robotic Enzyme-Based Autonomous Self-Replication", International Conference on Intelligent and Robotic Systems, Sendai, Japan, 2004. Z. Butler, K. Kotay, D. Rus, K. Tomita, "Generic Decentralized Control for a Class of SelfReconfigurable Robots", Proceedings of International Conference on Robotics and Automation, Washington, DC, USA, 2002. K. Stoy, R. Nagpal, "Self-Repair Through Scale Independent Self-Reconfiguration", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004. K. Kotay, D. Rus, "Generic Distributed Assembly and Repair Algorithms for Self-Reconfiguring Robots", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004. E. Klavins, R. Ghrist, D. Lipsky, "Graph Grammars for Self-Assembling Robotic Systems", Proceedings of International Conference on Robotics and Automation, New Orleans, USA, 2004. M.W. Jorgensen, E.H. Ostergaard, H.H.Lund, "Modular ATRON: Modules for a Self-Reconfigurable Robot", Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004. M. Nilsson, "Heavy-Duty Connectors for Self-Reconfiguring Robots", Proceedings of International Conference on Robotics and Automation, Washington DC, USA, 2002. M. Rubenstein, K. Payne, P. Will, W.-M. Shen, "Docking Among Independent and Autonomous CONRO Self-Reconfigurable Robots", Proceedings of International Conference on Robotics and Automation, New Orleans, USA, 2004. M. Silverman, D.M. Nies, B. Jung, G.S. Sukhatme, "Staying Alive: A Docking Station for Autonomous Robot Recharging," Proceedings of International Conference on Robotics and Automation, Washington DC, USA, 2002. B. Khoshnevis, P. Will, W.-M. Shen, "Highly Compliant and Self-Tightening Docking Modules for Precise and Fast Connection of Self-Reconfigurable Robots," Proceedings of International Conference on Robotics and Automation, Taiwan, 2003. K. Payne, B. Salemi, W.-M. Shen, and P. Will, "Sensor-Based Distributed Control for Chain-Typed Self-Reconfiguration," Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004. K. Roufas, Y. Zhang, D. Duff, M. Yim, “Six Degree of Freedom Sensing for Docking Using IR LED Emitters and Receivers,” Experimental Robotics VII, Lecture Notes in Control and Information Sciences 271, D. Rus and S. Singh Eds. Springer, 2001. A.K. Jain, M.N. Murty, P.J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, 31(4), pps. 264-323, 1999. D.M. Lane, HyperStat Online Tools Website, http://davidmlane.com/hyperstat/normal_distribution.html, Oct. 2, 1999.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

217

Mutual Localization and 3D Mapping by Cooperative Mobile Robots Julian Ryde and Huosheng Hu Department of Computer Science, University of Essex Wivenhoe Park, Colchester CO4 3SQ, England E-mail: [email protected], [email protected] Abstract. This paper describes the development of a 3D laser scanner and an approach to 3D mapping and localization. The 3D scanner consists of a standard 2D laser scanner and a rotating mirror assembly. Employing multiple robots and mutual localization local 3D maps are built. Global localization within the maps is performed by extracting a cross-section of the map just below the ceiling and then using an exhaustive search algorithm to enable the merger of multiple local 3D maps. The quality of these maps is such that the poses estimated by this method are accurate to within 0.1m and 1 degree. Keywords. Mutual, Localization, Cooperative, Mobile, Robots, 3D, Mapping

1. Introduction Extracting 3D information about the world surrounding a robot has proved difficult. The two main approaches, vision and laser range finding, have been dogged by problems. Vision is often computationally intensive and suffers from sensitivity to changes in illumination. Many of the difficulties stem from having to solve the correspondence problem which can be alleviated by structured light approaches, however the data spatial density does not come close to that provided by laser scanners. Non-visual localization and mapping has taken place in 2D, mainly due to limitations of the sensors in the case of laser range finders, or processor speed and algorithms for that of stereoscopic vision. Recently in a drive to test the benefits of 3D sensors researchers have mounted 2D laser scanners on nodding or rotating mechanisms to obtain 3D scans [1,2]. Alternatively, two laser scanners mounted with their scan planes orthogonal [3] are also popular. The main problems with nodding or rotating approaches are difficulties in hardware implementation and high power consumption as the 2D scanners are heavy. Consequently a rotating mirror prototype has been built which produces 3D scans with a field of view of 100 by 180◦ , is light, has low power consumption and is easily deployed on conventional robotics platforms. A number of groups are undertaking research into 3D laser mapping however very few groups are performing cooperative 3D laser mapping. The closest are groups using cooperative vision and laser based mapping in outdoor environments [4] and vision only [5]. The benefits of full 3D mapping are abundant and so the rapid expansion of this field is inevitable. The detection of negative and over-hanging obstacles greatly en-

218

J. Ryde and H. Hu / Mutual Localization and 3D Mapping by Cooperative Mobile Robots

hances avoidance behavior. Once 3D maps of environments have been built they can be customized for different robots. For instance various 2D occupancy grids may be built for robots of different sizes or with 2D sensors at different heights. Severely cluttered non-manifold environments such as search and rescue situations may be reliably mapped. Maps based upon the ceilings of rooms will remain accurate for longer and an unoccluded view of the ceiling is usually readily accessible to a robot even in crowded environments [6]. The disadvantages of 3D sensing technologies are slower acquisition time and the geometric increase in data that needs to be processed. In this paper, the mutual localization approach discussed in Section 2.1 coupled with the 3D laser range finder prototype pushes this research into the new area of threedimensional cooperative localization and mapping. Combining mutual localization with the data from multiple 3D laser scanners enables full 3D mapping of indoor environments. This would prove vital for a number of industries such as nuclear decommissioning, search and rescue scenarios, surveying as built structures and maps for mobile robots. The rest of this paper is organized as follows. Section 2 presents the system framework of the research carried out in this paper. Section 3 details the 3D scanner hardware. Section 4 includes the experimental results and is followed by Section 5 discussing their implications along with future research directions.

2. System Framework 2.1. Mutual Localization The premise for mutual localization is that rather than merely observing robots as beacons each robot observes and is itself observed simultaneously [7]. Additionally, ensuring that robots may observe team-mates and be observed themselves means that simultaneous mutual localization events can occur. These events allow superior relative pose determination. Firstly, the mutual localization is robust to spurious readings because simple checks on the validity of the mutual pose are available; for instance the separation of the robots should be similar as measured by both observing robots. Secondly, the accuracy in the pose does not deteriorate with separation, [7], a very useful property. Increasing separation merely decreases the frequency of mutual localization events. This is accomplished by mounting retro-reflective cylinders above the laser scanner as shown in Fig. 1(a). The 3D laser scanner prototype has a blind spot directly in front of it with an angular width of 20◦ thus higher mounted beacons will be visible from further afield. Most clutter in rooms is quite low, for example chairs and tables, so even when the line of sight between the robots is interrupted; the beacons, being some way from the ground, may still be detected by the laser scanners. Mutual localization is accomplished by first projecting the observed beacon’s 3D position onto the xy-plane which is parallel to the floor of the room. Once projected onto the horizontal plane the 2D mutual localization algorithm may be used. This method assumes that the beacons are always the same height above the floor, reasonable in the case for many indoor environments with flat horizontal floors. Given the relative pose of the mapping robot with respect to the observing robot multiple 3D laser scans may be combined to produce a local map.

J. Ryde and H. Hu / Mutual Localization and 3D Mapping by Cooperative Mobile Robots Retroreflective Cylinders

219

D d a

s1

β

C

s2

Rotating Mirror

Laser Scanner

b Robot Platform

A

α

θ d

(a)

B (b)

Figure 1. Beacon positioning and geometry used to calculate the relative pose. AB is the stationary robot and CD the mapping robot. D and B are the beacons with C and D the origins of the laser scanners.

The beacons are above the laser scanner and there is a small horizontal displacement from the origin of the laser scanner. This displacement introduces the complications evident in Fig. 1(b), where the laser scanners are represented as semi-circles with the forward part of the scanner corresponding to the curve of the semicircle and the beacons as solid circles. The robots are at A and C, with C mapping whilst A remains stationary. The pose of A may be constrained to the origin and x-axis without loss of generality. The beacons, B and D, are attached to the robots at A and C respectively. The distances AB and CD are both equal to d. The separation of the beacons is labelled as s1 and s2 because s1 is the separation as observed by the robot at A and s2 is the separation as observed by the robot at C, which in practice will not be identical. The robot at A observes the beacon D at range a and angle α whilst that at C observes the beacon B at range b and angle β. The beacon separation may be calculated from the robot observations by the cosine rule for triangles. Comparison of the two values s1 and s2 (which should be approximately equal) allows erroneous localizations to be detected and rejected. Typical errors stem from distractors in the environment or error sensitive geometric configurations. To acquire the position of the robot at C it is best to consider the geometry in the complex plane rather than the more immediately apparent geometric methods involving the sine and cosine rules. Using vectors in the complex plane gives two expressions for the position of C found by traversing the two available paths from the origin to C. A detailed explanation is given in [7]. Equating these gives the following expression for θ. eıθ =

aeıα − d b − de−ıβ

(1)

Eliminating eıθ and simplifying gives C in terms of the observed angles, ranges and the parameter d. C=

abeı(α+β) − d2 beıβ − d

(2)

The real and imaginary parts of C give the Cartesian coordinates of the second robot. The orientation is θ−β +π with θ determined from the argument of (1). The complex number calculations are done in Java using the Java Math Expression Parser (JEP) because Java does not natively handle complex numbers. The robots are able to exchange information

220

J. Ryde and H. Hu / Mutual Localization and 3D Mapping by Cooperative Mobile Robots

through wireless networking. Once the pose of robot C is acquired its scan data may be used to update the centrally held occupancy map. The map is a standard occupancy grid [8], where each cell contains the probability that the particular space it represents contains an obstacle. The grid is initialized so that all cells contain 0.5 representing complete uncertainty. Typically the resolution is 0.1m and the size is 30 by 30m and the origin is the initial position of the observing robot. New sensor data is incorporated via the Bayesian update process described in (3). Occupancy grids are an effective map representation as they are robust, handle uncertainty well and allow fusion of data from different sensors [9]. −1  (1 − evidence)(1 − prior) P (occupied) = 1 + evidence × prior

(3)

Due to the fidelity and narrow beam width of the laser scanner a straightforward raytrace model was employed. When updating the occupancy grid all cells along the path of the laser ray have their probabilities decreased whilst the cell containing the end point of the ray has its probability increased. 2.2. Global Localization Whilst producing local maps in a cooperative manner, one of the robots remains stationary acting as a reference frame for building the map. In order to map areas larger than the local map the stationary robot must move at some point. This is an opportunity for odometry errors to adulterate the global map. Although these may be vastly reduced by moving the reference robot under observation of the mapping robot, these errors included at each step, albeit small, still undergo unbounded accumulation. Without some form of reference to the global map frame, mapping of large cyclic environments would prove impossible. Thus global localization is a necessity. Usually mobile robots operate on the floor, consequently the 2D assumption may be enforced for global localization. In most typical indoor environments moveable objects tend to lie on the floor. Floor-level 2D maps (as most in previous 2D localization mapping research have been) are unreliable and quickly become out of date when furniture is moved and doors are opened or closed. To avoid these problems 2D map data is extracted as a horizontal cross-section just below the ceiling of the room. This volume of a room changes infrequently and is more easily observed especially in cluttered environments. The horizontal cross-section chosen was the volume from 2.3 to 2.9m in height and the resulting 2D occupancy grid is produced by simply summing the vertical columns within this height range and re-normalizing. A plethora of global localization algorithms given a prior map exist however in this research an exhaustive search approach was implemented. This has a number of advantages, the entire pose probability density function may be recorded and inspected, it is very robust and suffers only from its slow execution speed. The impact of this on realtime operation may be vastly reduced especially if a position estimate is available which is almost always the case. The pose search algorithm may simply spiral search out from the estimated position. It was found that the execution speed of an unoptimized algorithm meant that global localization could take place on maps of size 20m by 20m with pose resolution of 0.1m and 1◦ in the order of minutes on a 2GHz processor. For the imple-

J. Ryde and H. Hu / Mutual Localization and 3D Mapping by Cooperative Mobile Robots

221

Move A to NBV and take 3D scan

Move B to NBV and take 3D scan

No

Can A and B detect each other?

Yes

Brute force globally localise A in global map

Determine pose by mutual localisation and add scan to 3D map

No

Is local map adequate?

Yes Add local map to global map

Figure 2. The 3D mapping process for two robots A and B

mentation of the exhaustive search algorithm the 3D sensor scanner is converted into a 2D occupancy grid by extracting the cross-section near the ceiling. A correlation metric is then calculated iteratively for the sensor occupancy grid by stepping through various poses with respect to the map. Typically, pose resolutions of 0.1m and 1◦ are used. The pose with the best correlation metric is then chosen. An example position probability distribution function is displayed in Fig. 5(b). 2.3. Concurrent Processing Usually an exhaustive search algorithm of localization is unfeasible because it does not run fast enough to provide continuous real-time pose updates. However in this implementation the pose updates are supplied by the mutual localization. The global localization is run in parallel with the local map build process as depicted in Fig. 2. In Fig. 2 the first robot, A, moves to the next best view (NBV). The NBV is the next position which is most likely to yield useful observations for updating the map and in these experiment is calculated manually. Robot A is acting as the stationary robot providing a reference frame for B whilst B maps the local area around A. When the local map is sufficiently good it can be added to the global map. Whilst the local map around A’s new pose is built the pose of A within the global map is calculated. This pose along with (3) enables the merger of the local and global maps. The process is repeated until the global map is of sufficient quality.

3. 3D Scanner Design The 3D laser range scanner consists of a SICK LMS 200, rotating reflecting surface prototype (Fig. 3(a)) and software to convert the range scans into 3D data. The mirror rotation period is 40 seconds. The slow rotation speed contributes to a very low power consumption of 0.01W. Rotating at the same speed as the motor is the far blocking arm in Fig. 3(a). The purpose of this arm is to allow feed back of the angular velocity of the

222

J. Ryde and H. Hu / Mutual Localization and 3D Mapping by Cooperative Mobile Robots Right block arm Motor r co φ

φ

d

r sin θ

M irr

or

θ

sθ −

Left block arm d

(a)

(b)

Figure 3. Rotating mirror mechanism and calculating 3D coordinates from range data

mirror to the robot. This blocking arm approach circumvents problems synchronizing angular velocity with the laser data. The laser scanner scans full 180◦ scans with most of the range data from side angles discarded. The furthest right hand scan points are blocked every half a second and this information is used to calculate the angular velocity of the motor and consequently the mirror. A particular advantage of this approach is ease of deployment in that the rotating mirror mechanism can simply be placed in front of the laser scanner and no further connections are required, even the power supply is self contained. The arm blocks are easy to detect as falling edges in the side range data. The LMS was configured to scan a 100◦ arc at 1◦ intervals and mm measurement setting. These settings were chosen to minimize the number of scan points in order to increase the complete scan frequency. The scan frequency is limited by the data rate and at the moment is 0.05Hz. This data rate is 34.8kb/s, the maximum for the serial interface, resulting in 13 horizontal scans per second. The SICK LMS 200 can support a 500kb/s data rate using a USB interface. At this data rate, full 3D 1◦ by 1◦ scans should be possible at 0.5Hz. The mirror is mounted in front of the scanner rather than rotating or nodding the scanner itself. This has a number of advantages namely less power consumption and simpler hardware setup. The disadvantages are a reduced scan angle to around 100◦ , a blind spot when the mirror is edge on and a requirement for more complex geometric calculations (Fig. 3(b)) to correct for the separation between the mirror and the scanner. In Fig 3(b) the effect of the mirror is to bend part of the xy-coordinate plane to a new elevation illustrated in grey. The following equations, which reference the values indicated in Fig. 3(b), indicate the conversion between the r, θ and φ coordinates, measured by the laser scanner, and 3D cartesian coordinates. x = ((r cos θ − d) cos φ + d, r sin θ, (r cos θ − d) sin θ)

(4)

where the value d is the separation between the origin of the laser scanner and the axis of the rotating mirror. The range and bearing as measured by the laser scanner are r and θ. The angle of the plane (grey region in Fig. 3(b) to the horizontal introduced by reflecting the laser ray from the rotating mirror in front of the scanner is indicated by φ.

J. Ryde and H. Hu / Mutual Localization and 3D Mapping by Cooperative Mobile Robots

223

(a) (b) Figure 4. Photograph and typical 3D scan of mapped room

(a)

(b)

Figure 5. Global sub ceiling map and position probability density function both with a resolution of 0.1m. The map is a thresholded occupancy probability grid of the space 1-2.9m above the floor. The width of the room is approximately 12m. Two pillars and the windows at the back are evident.

4. Experimental Results and Analysis Results are visualized using a number of methods and software programs. To visualize the data a 3D scene is generated consisting of cubes or spheres representing occupancy and the results rendered using a ray tracer, Fig 4(b). This allows the production of high quality images and movies with shadowing, luminosity and perspective, visual cues that allow the brain to extract the 3D information more effectively from the 2D display media. However, despite these efforts it still remains difficult to effectively display 3D maps as images. An example of such a 3D image is displayed in Fig. 4(b) and a photograph from the corresponding view point is shown in Fig. 4(a). Each sphere in Fig. 4(b) represents a laser range return and is colored by its distance from the view point, with those closer being lighter. Fig. 5 illustrates a typical global localization scenario; the occupancy grid, Fig. 5(a), is represented by drawing a cube at every grid cell where the probability of occupancy exceeds 0.9. Fig. 5(b) shows a postion probability density surface where the probability

224

J. Ryde and H. Hu / Mutual Localization and 3D Mapping by Cooperative Mobile Robots

of each position is indicated by the height of the surface. The reliability and accuracy of the pose may be ascertained by inspection of the position probability distribution. The true position is easily discerned and the dominance and shape of the peak denotes the reliability and error respectively.

5. Conclusion and Future Work For decades mobile robots have been shackled by the 2D assumption, which has been necessary due to the absence of suitable 3D sensors. Even today 3D sensing technology is problematic and expensive. In this paper a rotating mirror mechanism has been added to a standard SICK LMS 200 laser scanner which produces 3D scans. These 3D scans coupled with mutual localization have produced full 3D environmental representations at very low cost. The fidelity of these representations has been validated by performing global localization within them. Further work would include mounting the laser scanner so that it is facing up into the mirror to allow fuller scans which might allow localization by range histograms. Results from increasing the number of cooperating robots would be interesting. The 3D occupancy grids produced by mutual localization, although interesting for humans to inspect, will mainly be used by other robots. Thus the global localization accuracy is a good indicator of their fidelity and suitability for further use by other robots. Global localization in these maps delivers poses accurate to 0.1m and 1◦ .

References [1] A. Nüchter, K. Lingemann, J. Hertzberg, and H. Surmann, “Heuristic-based laser scan matching for outdoor 6D SLAM,” in Advances in artificial intelligence. 28th annual German Conf. on AI, Sept. 2005. [2] K. Lingemann, H. Surmann, A. Nüchter, and J. Hertzberg, “Indoor and outdoor localization for fast mobile robots,” in Proceedings of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Sendai, Japan, Sept. 2004, pp. 2185–2190. [3] A. Howard, D. F. Wolf, and G. S. Sukhatme, “Towards 3D mapping in large urban environments,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Sendai, Japan, Sept. 2004, pp. 419–424. [4] R. Madhavan and K. Fregene and L. Parker, “Distributed Cooperative Outdoor Multirobot Localization and Mapping,” Autonomous Robots, vol. 17, pp. 23–39, 2004. [5] J. Little and C. Jennings and D. Murray, “Vision-based mapping with cooperative robots,” in Sensor Fusion and Decentralized Control in Robotic Systems, vol. 3523, Oct. 1998, pp. 2–12. [6] W. Burgard, A. B. Cremers, D. Fox, D. Hahnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun, “Experiences with an interactive museum tour-guide robot,” Artificial Intelligence, vol. 114, no. 1–2, pp. 3–55, 1999. [7] J. Ryde and H. Hu, “Laser based simultaneous mutual localisation for multiple mobile robots,” in Proc. of Int. Conf. Mechatronics and Automation, Niagara Falls, Canada, July 2005, pp. 404–409. [8] A. Elfes, “Using occupancy grids for mobile robot perception and navigation,” Computer, vol. 22(6), pp. 46–57, 1989. [9] P. Štˇepán, M. Kulich, and L. Pˇreuˇcil, “Robust data fusion with occupancy grid,” IEEE Transactions on Systems, Man and Cybernetics, vol. 35, no. 1, pp. 106–115, 2005.

Part 5 Network Agent Systems

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

227

Exploration of complex growth mechanics of city traffic jam for the adaptive signal control Kouhei HAMAOKA , Mitsuo WADA Dept. of Information Science and Technology, Hokkaido University Abstract. Traffic jams cause stagnation of traffic flow which leads to waste of energy and damage to city environment. Thus, relaxation of traffic jams is an important issue. The dynamics of city traffic jams are difficult to derive in a theoretical sense because of their complexities. In this research, we construct a city traffic flow model which can express these complex characteristics. Furthermore, the relations between city traffic jams and traffic signals’ indications are analyzed by the traffic simulation. In the result, we propose a new measure ”critical inflow rate”. If inflow approaches this value, the traffic jam grows exponentially. Thus, it is a feasible criterion to evaluate growth of the traffic jam. Furthermore, we discuss the effectiveness of random split indication of the traffic signal in finite jams’ area in the context of stochastic resonance. From these results, we show a new methodology which can deal with city traffic jams. Keywords. Traffic simulation, Coupled map lattice, Signal control, Queuing theory

1. Introduction Real traffic flow is a complex system, since many drivers’ decisions affect the flow. Thus, there is a need to model traffic flow for the purpose of analyzing traffic jams. In traffic systems, flow volume increases linearly with density if the density is sufficiently low. But traffic flow has the intrinsic property that the flow volume does not increase any more if the density has gotten large on some level. Put simply, flow volume can not identify the density. So traffic simulation is indispensable to analyze traffic jams generated from complex traffic flow. In this research, we construct a city traffic flow model which can express these complex characteristics. Furthermore, the relations between city traffic jams and traffic signals’ indications are analyzed by the traffic simulation. On a free way, vehicles will decelerate because of the narrow gap if vehicles’ density has became large. The backward propagation of velocity delay intercepts traffic flow. We call such stagnation ”natural jam” [1]. If there is a traffic signal in the traffic system, vehicles accumulate right in front of the signal because of the signal indication. The backward growing of vehicles’ accumulation intercepts traffic flow. We call such stagnation ”signal jam”. Kuwahara et al. discuss jams in signal set system by using queuing theory 1 Correspondence to: Hokkaido University, N14-W9, Kitaku, Sapporo, 060-0814 JAPAN. Tel.: +81 011 706 7105; Fax: +81 011 706 7105; E-mail: [email protected]

"No signal" "Signal" ρs

ρn

Density

Figure 1. Flows against densities at the traffic system under the signal indication, and at not influenced system.

Total of vehicles

K. Hamaoka and M. Wada / Exploration of Complex Growth Mechanics of City Traffic Jam

Flow

228

∫ Qin

Lq

Wq ∫ Qout

"Input flow" "Output flow"

t

Time

Figure 2. Temporal transition of the total inflow and the transition of the total outflow.

standing on macro traffic flow models [2]. In this research, micro traffic flow is simulated to discuss the details of signal jam which is emerged by interactions of individual vehicles. Herewith we suggest a new criterion to evaluate signal jam. Aperiodic boundary condition is set in the simulations since it assumed city traffic flow. However it is difficult to conserve the number of vehicles in the system in aperiodic boundary condition. Then we discuss it by using vehicles’ inflow rate as a criterion. The signal jam grows backward from the signal according to the increase of inflow rate. In these surroundings, queuing theory is applicable to signal jam. The signal jam is relaxed by inhibition of delay propagation. Hence, we evaluate signal jam by using critical inflow rate which defines the border whether jam conserves limited scale.

2. Macro properties of traffic flow In this section, we present that critical inflow rate which evaluates growing of jam is defined by vehicles’ average passing time since queuing theory is applicable to signal jam. [vehicles/m] is set by the low density side of the signal The natural jam’s critical point jam’s critical point [vehicles/m] as shown in Figure 1. The flow volume is saturated if vehicles’ density increases from this point as shown in Figure 1. This phenomenon occurs because the red indications restrict passing vehicles at the signal. We employ inflow rate as criterion since it is independent of system situation. It shall be useful to employ critical value of inflow rate as criterion of signal jam. They are related through queuing theory as an intermediary. If the signal can correspond to the counter and the passing time can correspond to service time, queuing theory is applicable to the city traffic jam when macro probability variables of waiting time and queue are averaged. [vehicles/s] and outflow volume [veIf the difference of inflow volume [vehicles] hicles/s] becomes stationary, the average number of vehicles in the queue [s] become constant. The vehicle at the end of the queue costs and their waiting time to arrive the signal if the queue consist of vehicles on average as shown in Figure 2. Some of inflow vehicles can not pass through the signal directly. Then, the intercepted vehicles form signal jam.

K. Hamaoka and M. Wada / Exploration of Complex Growth Mechanics of City Traffic Jam

229

(1) . [vehicles/s] is average inflow rate from to . We put relates to through inflow rate , regardless of the statistic distributions of arrival processes or passing times. (2) corresponds to signal jam. Then signal jam corresponds to queue. Then we discuss the limit of inflow rate which inhibit jam finite range by introducing queuing theory. If vehicles arrive in manner of Poisson process, their passing times form generic distribution and there is one signal, this situation is represented (M/G/1) by Kendall’s notation. In this case, the average number of vehicles in queue is described by Pollacczek Hinchine’s formula [3]. (3) is the average passing time, is the quadratic moment of the passing time. If , queue grows infinitely. Then the jam spread to backward afar off. We set the critical inflow rate, (4) Then the inflow rate

should satisfy

in order to maintain a finite queue.

3. Simulation model for traffic flow It is important to construct traffic simulation models which expresses actual situations. Yukawa & Kikuchi describe traffic flow model by using a Coupled Map Lattice (CML) model to simulate free way flow [4]. CML can adjust velocity for vehicles’ gaps every time steps. We can model the automatically deceleration of the velocity fluctuation and the vehicle’s smooth stopping by extending their model [5] [6]. We can describe vehicle’s accelerating action respect to previous vehicle simply.

(5)

is vehicle’s velocity at time , is goal velocity of each vehicle. Parameters and concern acceleration characteristic, and braking effect respectively. And, and concern scale of fluctuation. We call this CML model ”Neuro based CML model”. every time step by inputting its velocity The vehicle updates its position which is given by this map. (6)

230

K. Hamaoka and M. Wada / Exploration of Complex Growth Mechanics of City Traffic Jam

If vehicle is set at position , and vehicle influenced by vehicle through its gap .

is set at position

, vehicle

is (7)

is the length of a vehicle. The vehicle runs without speed reduction, when the gap is sufficient large. But, when the gap gets narrow, the vehicle should brake. It will follow the previous one with conserving safe distance. We define the distance by using Fujii’s safe gap [7]. (8) [km/h] need not decelerate if its gap is larger than A vehicle running with velocity [m]. But, it will decelerate to the goal velocity to avert collision if the gap gets more narrow than . (9) is the vehicle’s braking acceleration. If the vehicle’s gap becomes narrower than the minimum gap (=4[m]), it stops. Thus, equation (9) is used in the range of . When the vehicle comes right in front of the signal, its dynamics are changed by signal’s indications. If the signal indicates green, the velocity does not change. But, if the signal indicates red, it decelerates and stops by inserting the distance to the signal as to Eq. (7).

4. Traffic flow simulation in signal set system We research how the queue and outflow are changed by the signals’ indications. We employ the neuro based CML model which was introduced in chapter 2. Its parameters are . In addition, all vehicles’ lengths are 4[m] and goal velocities are 50[km/h]. We prepare a one way traffic lane under the aperiodic boundary condition with conserving the vehicles’ inflow rate constantly. When vehicles move, they have three procedures. First, their gap against the previous vehicle is measured. Second, their velocities are determined from preset gaps and velocities. Third, all vehicles move. All vehicles emerge at position 0[m] in manner of Poisson process. They are influenced by the signal set at 5000[m]. The signal has two parameters, cycle length (period of signal indicating time, the unit is [s]) and split (green indication’s rate). We prepare following indications. Periodic indication:Both green and red indicating rates are fixed 50% for cycle length as shown in Figure 3. Random split indication:Green indicating rate % and red indicating rate change randomly as shown in Figure 4. But, their time means are equal.

%

Random split indication can be regarded as random noise polluted cyclic indication about its changing point between green and red. The changing point can move only the range

K. Hamaoka and M. Wada / Exploration of Complex Growth Mechanics of City Traffic Jam

Figure 3. Periodic indication

231

Figure 4. Random split indication

Output Flow [vehicles/s]

which is product of the noise intensity (uniform random numbers, from 0 to 1) and cycle length. We put the noise intensity 0.2, for decreasing of flow fluctuations. In other words, both indicating times change from ”0.4 cycle length” to ”0.6 cycle length” randomly. We plot outflows which pass through the periodic and random indicating signals against cycle lengths when inflow rate is 0.167[vehicles/s] as shown in Figure 5. Random 0.167 0.1665 0.166 0.1655 0.165

"Periodic indication" "Random split indication"

20 40 60 80 100 120 140 160 180 200 Cycle length [s] Figure 5. Output flows of each indications against cycle lengths.

split indication runs something more vehicles than periodic indication until about 90[s]. If cycle length goes over any more, there are not large differences between the two indications. We plot space-time plots of both indications about each 20[s], 50[s], 100[s], 200[s] cases in Figure 6 Figure 13 to compare the features of the jams. Dots express existence of vehicles at certain times in certain place in these plots. The dense parts are jams.

Figure 6. Periodic indication : Cycle length = 20[s]. Figure 7. Random indication : Cycle length = 20[s].

When cycle length is 20[s], the jam happens only right in front of the signal with both indications. But, periodic indication’s jam is a little larger. When cycle length is 100[s], jams grow larger than in 20 [s] and they propagate to backside. There are not large differences between both indications. These two cases belong to finite jams’ area. When cycle

232

K. Hamaoka and M. Wada / Exploration of Complex Growth Mechanics of City Traffic Jam

Figure 8. Periodic indication : Cycle length = 50[s]. Figure 9. Random indication : Cycle length = 50[s].

Figure 10. Periodic indication : Cycle length = 100[s]. Figure 11. Random indication : Cycle length = 100[s].

Figure 12. Periodic indication : Cycle length = 200[s]. Figure 13. Random indication : Cycle length = 200[s].

length is 200[s], jams spread afar off in both indications similarly. This case belongs to saturated jams’ area. But, when cycle length is 50[s], there is radical difference between both indications. Periodic indication’s jam is obviously larger than the random split indication. Next chapter, we analyze the relation between the growths of jams and signals’ indications.

233

Distribution [a.u.]

Distribution [a.u.]

K. Hamaoka and M. Wada / Exploration of Complex Growth Mechanics of City Traffic Jam

0.4

0.3 0.2 0.1 0

0.4 0.3 0.2 0.1 0

0

10

20

30

40

Transit time [s]

Figure 14. Histogram of transit times in front of the periodic indicating signal.

0

10

20

30

40

Transit time [s]

Figure 15. Histogram of transit times in front of the random indicating signal.

5. Analysis of jams We plot histograms of passing times of vehicles at the signal when cycle length is 50[s] in Figure 14 and Figure 15 to know the both indications’ capacities for running vehicles. Both indications have tall peaks at about 4[s]. In periodic indication, there is a little peak at about 25 [s] which corresponds to red indicating time. But, in random split indication, the peak’s dispersion is larger than periodic indication because of its randomness. Further more, we compare numbers of vehicles in queues between both indications against inflow rates in Figure 16. The average passing time is 5.8279 [s] in periodic indication, and 5.6823 [s] in random split indication. The critical inflow rates are calculated from these values by using eq.(4). The critical inflow rate is 0.1716 [vehicles/s] in periodic indication, and is 0.1760 [vehicles/s] in random split indication. That is to say, the number of vehicles increases exponentially when it approaches to 0.1716 [vehicles/s] in periodic indication. But, the number of vehicles doesn’t increase exponentially yet in random split indication. It says that the difference of vehicles’ accumulations between Figure 8 and Figure 9 is derived from critical inflow rates of signal indications. In addition, we plot theoretical values of (M/G/1) for comparison. From this result, random noises seem to make the signal’s critical inflow rate get up, and jam’s growth is inhibited very large. Laszlo et al. reported effectiveness of noise in traffic lanes with a junction as spectral stochastic resonance [8]. The relation between the signal and stochastic resonance doesn’t become evident. But, in this research, it may appear to be related. Then, we plot output flow against noise intensity in Figure 17. If the noise ranges are 0.1 to 0.5, outflows are larger than in other ranges. But, there is not apparent local maximal value. We can’t say the effectiveness of stochastic resonance in this phenomenon completely. But, some factors are conceivable. Random indication may break the jam which grows like crystal by red indications. Or, frequently long green indication may inhibit growth of the jam by running vehicles at a breath like vibrations of a sieve.

6. Conclusion We simulate traffic flow by using inflow rate as criterion to evaluate signal jam. We employ neuro based CML model to express traffic flow model. Vehicle’s acceleration, velocity fluctuation and following action which depends safe gap is incorporated in this model.

234

K. Hamaoka and M. Wada / Exploration of Complex Growth Mechanics of City Traffic Jam

150

"Periodic (Observed)" "Periodic (M.G.1)" "Random (Observed)" "Random (M.G.1)"

Output Flow [vehicles/s]

Queue [vehicles]

200

100

50

0 0.155

0.16

0.165

Inflow rate [vehicles/s]

0.17 0.175 λc1 λc2

Figure 16. Queues of each indications against inflow rates.

0.1668 0.1664

0.166 0.1656 0.1652 0

0.2

0.4

0.6

0.8

1

Noise intensity

Figure 17. Output flow against noise intensity at random split indication.

We introduce queuing theory to analyze growths of jams caused by signals from a new perspective. It is conceivable to evaluate signal jam by using critical value of inflow rate. There is a critical value of inflow rate to conserve the queue of the visitor finite length. This critical point exists in traffic jam too. These points can be derived by the same mechanism. In this research, jams grow rapidly if inflow rate of vehicles approach to this value. We compare outflow of vehicles and growths of jams derived from periodic indicating signal and random split indicating signal. Random split indication expands little more outflow than periodic indication in lesser cycle length by using outflow as criterion. There is apparent difference between them by using critical inflow rate as a criterion. Then, the critical inflow rate shall be useful to evaluate signal jams derived from various signal indications. In random split indication, the plot of outflow against noise intensity is analogous to stochastic resonance. But, specific clarification remains as problem. Extension to multi signals systems and intersection systems are other challenges. On that basis, we will construct an adaptive traffic signal which inhibits the growths of the city traffic jams.

References [1] D Helbing, M Treiber : ”Gas-Kinetic-Based Traffic Model Explaining Observed Hysteretic Phase Transition.” Physical Review Letters, 81 : 3042–3045, (1998). [2] M. Kuwahara and T. Akamatsu : ”Dynamic user optimal assignment with physical queues for a many-to-many OD pattern” Transportation Research Part B, Vol.35B, No.5, pp.461-479, Jun. 2001. [3] H Morimura, Y Oomae : ”Application of the queues.” Union of Japanese Scientists and Engineers Press, Ltd. : 1–82, (1980) in Japanese. [4] S Yukawa, M Kikuchi : ”Density Fluctuations in Traffic Flow.” Journal of the Physical Society of Japan, 65 : 916–919, (1996). [5] K Hamaoka, M Wada, K Ishimura : ”Coupled Map Lattice Model based on Driving Strategy for City Traffic Simulation.” Proc WSTST’05. : 1173–1182, (2005). [6] M Wada : ”Cyclic Sequences Generated by Neuron Model.” Electronics and Communications in Japan. 65-2, pp.1-10(1982) [7] E Yoneya, S Watanabe, M Mouri : ”Traffic engineering.” National Science Press, Ltd. : 117– 121, (1965), in Japanese. [8] Laszlo B Kish, Sergey M Bezrukov : ”Flows of cars and neural spikes enhanced by colored noise.” Physics Letters A. 266 : 271–275, (2000).

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

235

Coordinated Control of Mobile Antennas for Ad-Hoc Networks in Cluttered Environments Gianluca Antonelli a , Filippo Arrichiello a,1 , Stefano Chiaverini a and Roberto Setola b a DAEIMI, Università degli Studi di Cassino, Italy b Lab. Sistemi Complessi e Sicurezza, Università Campus Bio-Medico di Roma, Italy Abstract. Mobile ad-hoc networks are used to extend and adapt the area covered by a static network of antennas. This is the case of an autonomous agent moving in a cluttered environment that needs to be connected to a fixed, limited-range, base station. The wireless network can be obtained by resorting to a platoon of mobile robots carrying each one antenna. Self-configuration of the robots’ platoon is achieved by a singularity-robust task-priority inverse kinematics algorithm via the definition of suitable task functions. The effectiveness of the proposed algorithm has been extensively verified via numerical simulations involving an autonomous mobile robot moving in a cluttered environment. Keywords. Mobile Ad-hoc NETworks (MANETs), Coverage Area Adaptation, Multi-Robot Systems

1. Introduction Communication among mobile agents is a crucial requirement in many applications to support cooperation of team members in fulfilling a given mission. Traditional solutions to the problem of mobile communication, e.g., cellular networks, require coverage of a wide area through a number of space-fixed antennas that serve partly overlapping local areas (the cells); an user then communicates with the others through a wireless connection to the local antenna that conveys all the traffic within the cell. While the above static placement of the antennas is cost/performance effective when there is always a large number of users to be connected within the cell (e.g., metropolitan areas or highway paths), in many applications it would be more convenient to adopt an approach where the antennas are dynamically placed to optimize the coverage needs as in survey and rescue missions to be performed inside a damaged building or in exploration of unknown areas. In this scenario, wireless Mobile Ad-hoc NETworks (MANETs) represent a promising approach to overcome the drawbacks of fixed-configuration networks. A MANET is a collection of autonomous nodes that communicate each other without using any fixed 1 Correspondence

to: Filippo Arrichiello, Dipartimento di Automazione, Elettromagnetismo, Ingegneria dell’Informazione e Matematica Industriale, Università degli Studi di Cassino, via G. Di Biasio 43, 03043 Cassino (FR), Italy. Tel.: +39 0776 2994323; Fax: +39 0776 2993707; E-mail: [email protected]

236

G. Antonelli et al. / Coordinated Control of Mobile Antennas for Ad-Hoc Networks

networking infrastructure. Each node in a MANET operates as both a host and a router. Therefore, any node can communicate directly with nodes that are within its transmission range and, in order to reach a node that is out of its range, data packets are relayed over a sequence of intermediate nodes using a store-and-forward multi-hop transmission principle. This approach is very attractive also for mobile robot applications. For this reason, a number of researchers have started proposing ad-hoc wireless solutions for mobile robot communication (see, e.g.,[8,9,12]). However, these solutions consider position and motion of each node as uncontrollable variables. Assuming instead that at least some of the nodes may be controlled, more performing MANETs can be designed. Indeed, a suitable dynamic reconfiguration of the robots’ position allows to adapt the coverage area of the communication network to better support the team’s mission, to avoid signal fading area (e.g., that induced by the presence of obstacles), and to handle possible fault of some team members. First studies on this topic were related to use mobile robots to provide adaptive sensor network in a dynamic environment. In [10] Delaunay tessellation and Voronoi diagram are used to define an algorithm able to maximize the coverage area of the network. A more general formulation for the coverage problem is proposed in [7], where a distributed asynchronous algorithm is developed to dynamically reconfigure a mobile sensors network taking into account sensor performance; specifically, in this approach the control input of each vehicle is proportional to the distance from its actual position and the centroid of the associated Voronoi region. In reference [5], to guarantee communication for a mobile robot involved into a dynamic coverage problem, a static network of markers is autonomously dispersed by the robot during its motion (these markers are also used to improve localization capability). In this paper, following the preliminary results in [1], we consider the problem of dynamically adapting the configuration of a platoon of mobile robots equipped with a wireless device (that we will call antenna) so as to realize an ad-hoc network to support communication of an autonomous agent, i.e., an autonomously driven vehicle or an human, with a fixed base station. Antennas coordination for agent coverage is achieved in the framework of kinematic control of platoon of mobile robots [3], by resorting to properly defined task functions. This approach has shown to be efficient and reliable in simultaneous handling of several control objectives, namely, agent-antennas-station connectivity and obstacle avoidance. The proposed approach has been simulated while a platoon of mobile antennas successfully performs a communication coverage mission of an autonomous robot navigating in a cluttered environment.

2. The MANET case In this section the details of the proposed solution to the MANET problem will be presented. In the considered case, a platoon of self-configuring mobile antennas is aimed to realize a communication bridge between a fixed base station (e.g., an Internet access point) and a mobile agent (a mobile robot or a human operator) performing its autonomous mission (see Figure 1). Thus, the antennas have to dynamically configure themselves so as to ensure the communication coverage of the mobile agent during the all mission, avoiding obstacles present in the environment and eventually performing secondary tasks.

G. Antonelli et al. / Coordinated Control of Mobile Antennas for Ad-Hoc Networks

237

agent

mobile antennas

base station

Figure 1. Sketch of the coverage problem to be solved; the autonomous agent needs to be connected with the base station by the use of a platoon of mobile antennas.

Let us denote with the subscript i the generic antenna as well as the base station, i.e., it is i ∈ [0, n] where i = 0 denotes the base station and i = 0 the generic antenna. Each antenna is supposed to cover in open space a circular area Ii centered in pi (the position of the i-antenna) and of radius rmax,i = rmax,i (i, pwi ) ,

(1)

where pwi is the current level of power used for transmission. It can be noticed that rmax,i depends on the characteristic of the i-th antenna and on the level of power used for transmission pwi . During the execution of a mission the level of power used for the transmission may be varied in order to preserve the battery level or, on the other side, can be decreased when the battery level is low. In order to guarantee adequate operative margins, we will constraint each antenna to dialogue in a smaller region, identified via the radius dmax,i = (1 − Δ) rmax,i

0 < Δ < 1,

(2)

where Δ is a design parameter. Moreover, in order to avoid collisions, all other bodies in the environment (i.e., base station, other antennas, agent and obstacles) must be farther than dmin,i from each antenna. Notice that this definition is similar to that of comfort zone in [11], therefore, this term will be used in the following to denote the area shown in Figure 2. Let us define as virtual chain a virtual connection that, starting from the base station, reaches all the n antennas keeping all the couple of successive antennas in their relative comfort zones. That is, defining a n-dimensional vector k that collects the indexes of the

238

G. Antonelli et al. / Coordinated Control of Mobile Antennas for Ad-Hoc Networks

rmax dmax dmin antenna

Figure 2. Comfort zone of an antenna and relevant distances.

antennas in their order in the virtual chain, then pkj ∈ Ikj+1 and pkj+1 ∈ Ikj . Thus, if the agent belongs to the comfort zone of one of the antennas, then the virtual chain represents the communication bridge between the agent and the base station. 2.1. Algorithm details The coordination control technique for multi-vehicle systems presented in [1] and inherited from [2,6] will be used to suitably coordinate the motion of the platoon of mobile antennas so as to implement an Ad-Hoc Mobile network. To solve the coverage problem by resorting to this technique, it is necessary to properly define the task functions and properly arrange them in a priority order. The task functions to be achieved by the antennas are, namely, obstacle-avoidance and keep-in-comfort-zone. The obstacle-avoidance task function is aimed to keep each antenna to a safe distance do ≥ dmin from all the obstacles (i.e. other antennas, base station or the agent). The keep-in-comfort-zone task function has to keep each couple of successive antennas in the virtual chain to a distance dc ≤ dmax . The task functions to be achieved by the antennas in order of decreasing priority, being 1 the highest and assuming that k0 is the base station, are: • for an antenna along the virtual chain kj with j ∈ {1, n−1}: 1. obstacle avoidance; 2. keep the next antenna kj+1 in the comfort zone; 3. keep the previous antenna kj−1 in the comfort zone. • for the last antenna in the virtual chain kn : 1. obstacle avoidance; 2. keep the previous antenna kj−1 in the comfort zone. Moreover, for the antenna closest to the agent (not necessarily kn ), the task of keeping the agent in the comfort zone is considered with a priority lower than the obstacle avoidance and higher than the others. With respect to the above list, thus, this task is at a priority 2 while the other tasks from priority 2 skip of one unity downward. Notice that the obstacle avoidance task function, preserving the integrity of the system, has always the highest priority. Wider discussion on the obstacle avoidance possibilities as well as implementation details will not be given here for seek of space, in the simulation Section

G. Antonelli et al. / Coordinated Control of Mobile Antennas for Ad-Hoc Networks

239

a rather complex environment will be handled with this approach. The limit of this obstacle avoidance approach, common to all the path planning algorithm relying on local information only, is the possibility to get trapped in local minima. Further details can be found in [3]. Once the task functions have been defined, the velocity contribution for each antenna due to each task is elaborated referring to a singularity robust closed loop inverse kinematics technique inherited from [2,6]. Then, the velocity contributions of each task are combined using a task priority based algorithm to compose the finale, reference, velocity for each of the antennas. More details on the proposed technique can be found in [1]. 2.2. Dynamic handling of the virtual chain As mentioned before, in order to guarantee the proper connection between the agent and the base station, the antennas dynamically self-organize to realize a virtual chain. This configuration is obtained using the follow algorithm: 1 from the base station k0 look for the closest antenna inside I0 and label it as the first in the virtual chain k1 ; make k1 the current antenna; 2 from the current antenna in the virtual chain kj look for the closest antenna inside Ikj in the remaining antennas and label it as the successive in the virtual chain kj+1 ; make kj+1 the current antenna; 3 repeat the point 2 increasing the index of the current antenna until there is no remaining antenna; if j +1 = n go to step 4, else keep the virtual chain formed at tk−1 , i.e., k(tk ) := k(tk−1 ) 4 virtually connect the agent to its closest antenna (not necessarily kn ). 3. Simulation Results The proposed algorithm, aimed at guaranteing the wireless communication bridge between the agent and the base station by means of several mobile antennas, has been extensively tested in numerical simulations. In all the simulations run the agent is a robot performing its missions independently from the antennas. In particular, the mobile robot has to reach predefined via-points, avoiding collision with obstacles present in the environment. Moreover, the robot’s movements are coordinated with the MANET so that it can eventually receive stop when in escaping the comfort zone of its nearest antenna to allow reconfiguration of the antennas. Notice that, as explained above, this behavior is done autonomously by considering the proposed kinematic control and not explicitly commanded to the robot. The proposed scenarios (see fig. 3, 4) simulate the navigation of an autonomous robot in a building while a platoon of antenna ensures the communication coverage. The robot has to reach different goals passing trough predefined via-points and, obviously, avoiding the walls. The antennas have to ensure the communication between the robot and the base station avoiding collisions with walls, robot, base station and among themselves. The simulations starts with the 10 antennas randomly distributed around the base station following an uniform distribution centered in position [0 0]T m with an interval of 50 m on the coordinates and verifying that a virtual chain is formed and the robot in the initial position [10 10]T m.

240

G. Antonelli et al. / Coordinated Control of Mobile Antennas for Ad-Hoc Networks

t = 138

t = 0.3

8 7 695412 3 10

7985412 6103

t = 561

t = 414 87 6

t = 276

9 10 351 42 7 8 6

351 4 2 9 10

541 9 103 2 6 7 8

t = 690

6897 103 5 1 4 2

Figure 3. Snapshots of a coverage mission for a platoon of mobile antennas in an indoor environment. The robot (blue dot) has to reach the several goals (marked by a ×), while the antennas (red dots) have to ensure connection between the robot and the fixed station (the star).

For the simulations reported the following parameters have been used rmax,i = 24 m dmax,i = 16 m dmin,i = 6 m T = 300 ms while the maximum velocities are 1 m/s for the robot and 1.5 m/s for the antennas. It can be noticed that dmax,i has been kept constant ∀i. The robot moves in an environment with obstacles and it is required that it stays at a minimum distance of 10 m from them. Figure 3 reports six snapshots of a mission execution while an autonomous robot reaches multiple via points in different rooms; the robot reaches the final goal in about 690 s while the antennas do maintain the coverage with the base station during the all mission and avoid the obstacles themselves. Notice that the virtual serial chain among the antennas can dynamically change during the mission, giving more flexibility to the approach; moreover, differently from [4], this implies that there is no need to impose an explicit order to the antennas. Figure 4 reports six snapshots of a mission execution while an autonomous robot reaches multiple via points in a corridor with a corner; the robot reaches the final goal in

G. Antonelli et al. / Coordinated Control of Mobile Antennas for Ad-Hoc Networks

t = 60

t = 0.3

241

t = 120

8 4 1 327 58910 46

t = 180 4 8 5 1 3 10 27 96

51 3 2 910 67

8 1 10 4 5 9327 6

t = 240 1 5 4 8 3 10 2 7 6 9

t = 300 10 3 1 5 4 2 8 7 6 9

Figure 4. Snapshots of a coverage mission for a platoon of mobile antennas in a complex environment. The robot (blue dot) has to reach the several goals (marked by a ×), while the antennas (red dots) have to ensure connection between the robot and the fixed station (the star).

about 300 s while the antennas do maintain the coverage with the base station during all the mission and avoid the obstacles themselves. During the simulation, the robot sometimes can reach the boundary of the comfort area of the nearest antenna; in these cases the robot stops and waits for reconfiguration of the network.

4. Conclusions In this paper we have proposed an algorithm to generate coordinated motion of a platoon of mobile antennas for dynamic reconfiguration of the coverage area of a mobile adhoc networks to ensure a communication link between an autonomous agent, that moves independently from them, and a fixed base station. The algorithm is based on the singularity-robust task-priority inverse kinematics approach that can simultaneously handle multiple tasks. Remarkably, the configuration of the communication chain and the relative priority of the tasks can be dynamically changed during the mission execution for increased effectiveness of the coverage. In this preliminary simulation study, navigation in a complex scenario has been considered. Future work will be devoted at considering coverage of more than one moving agent, occurrence of faults in the antennas, and non-chained link topologies.

242

G. Antonelli et al. / Coordinated Control of Mobile Antennas for Ad-Hoc Networks

References [1] G. Antonelli, F. Arrichiello, S. Chiaverini, and R. Setola. A self-configuring MANET for coverage area adaptation through kinematic control of a platoon of mobile robots. In Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1307–1312, Edmonton, CA, Aug. 2005. [2] G. Antonelli and S. Chiaverini. Kinematic control of a platoon of autonomous vehicles. In Proceedings 2003 IEEE International Conference on Robotics and Automation, pages 1464– 1469, Taipei, TW, Sept. 2003. [3] G. Antonelli and S. Chiaverini. Fault tolerant kinematic control of platoons of autonomous vehicles. In Proceedings 2004 IEEE International Conference on Robotics and Automation, pages 3313–3318, New Orleans, LA, April 2004. [4] P. Basu and J. Redi. Movement control algorithms for realization of fault-tolerant ad hoc robot networks. IEEE Network, 18(4):36–44, 2004. [5] M.A. Batalin and G.S. Sukhatme. Coverage, exploration and deployment by a mobile robot and communication network. Telecommunication Systems – Special Issue on Wireless Sensor Networks, 26(2):181–196, 2004. [6] S. Chiaverini. Singularity-robust task-priority redundancy resolution for real-time kinematic control of robot manipulators. IEEE Transactions on Robotics and Automation, 13(3):398– 410, 1997. [7] J. Cortés, S. Martínez, T. Karatas, and F. Bullo. Coverage control for mobile sensing networks. IEEE Transactions on Robotics an Automation, 20(2):243–255, 2004. [8] P.E. Rybski, N.P. Papanikolopoulos, S.A. Stoter, D.G. Krantz, K.B. Yein, M. Gini, R. Voyles, D.F. Hougen, B. Nelson, and M.D. Ericksn. Enlisting rangers and scouts for reconnaissance and surveillance. IEEE Robotics & Automation Magazine, 7(4):14–24, 2000. [9] G.T. Sibley, M.H. Rahmi, and G.S. Sukhatme. Robomote: A tiny mobile robot platform for large-scale ad-hoc sensor networks. In Proceedings IEEE International Conference on Robotics and Automation, pages 1143–1148, Washington, DC, May 2002. [10] J. Tan, O.M. Lozano, N. Xi, and W. Sheng. Multiple vehicle systems for sensor network area coverage. In Proceedings 5th World Congress on Intelligent Control and Automation, pages 4666–4670, Hangzhou, PRC, June 2004. [11] J. Vazquez and C. Malcolm. Distributed multirobot exploration maintaining a mobile netwok. In Proceedings 2nd IEEE International Conference on Intelligent Systems, volume 3, pages 113–118, Varna, Bulgaria, June 2004. [12] Z. Wang, M. Zhou, and N. Ansari. Ad-hoc robot wireless communication. In Proceedings IEEE Conference on Systems, Man and Cybernetics, pages 4045–4050, Washington, DC, Oct. 2003.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

243

An Adaptive Behavior of Mobile Ad Hoc Network Agents Masao Kubo a,1 , Chau Dan a and Hiroshi Sato a Takashi Matubara a a National Defence Academy, Japan Abstract. In this paper, the authors are interested in building a backbone network by robot collective, which can supply communication service for disaster site. At first, we show the typical characteristics of static formation of network. Then, applying the HOT&COLD theory, we proposed 3 core heuristic rules of wireless access points. With the proposed rules, the emerged networks can achieve the performance same as the static grid pattern formation. Keywords. manet, Cascade Failure, Hot&Cold

1. Introduction Mobile Ad hoc Network (MANET) is a technology, which offers communication services for users anywhere and anytime. It is robust for changing of topology and user distribution. Therefore, MANET is useful when wire network facility can’t sufficient, so that it will be more important in future in case of fire, flood, and earthquake. Fig.1 illustrates a scenario of MANET in devastated district. A group of wireless communication airships fly above the disaster area and provide backbone communication services for users (we denote as host) on the ground. In this case, if airship is not sufficient capability, they change their position according to the loads and the distribution of the communication requests from users. In this paper, we want to discuss the adaptive behaviors of the mobile agents to generate and maintain a high performance network. Basically, in MANET research, it is supposed that the behavior of mobile access points (we call it agents) and network performance is independent so that agents do not need to behave to improve network performance even though it is worse. A lot of sophisticated network protocols have been discussed in ITS field (in cellular network field, base stations are not mobile). However, recently, the wireless facility is getting smaller and more powerful so that we believe ‘mobile’ base station will be common and a key component for future swarm robotics. In this paper, a group of mobile wireless access points (agents) which generates backbone network for (hosts) who have more lower power wireless communication capability is studied. Each agent has wireless communication capability and it can open communication channel to other agents within a particular distance. Hosts can ask the nearest agent to send a fixed size message (request) to other hosts for them. Generally, 1 Masao KUBO, Dep. of Computer Science,National Defense Accademy, Hashirimizu 1, Yokosuka, Kanagawa, JAPAN,239-8686. Tel.: +81 46 841 3810; Fax: +81 46 844 5911 ; E-mail: [email protected].

244

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

Figure 1. Scenario: a group of wireless agents and users

wired/wireless network are used without discrimination in MANET. However, in this paper, for simplicity, we suppose that all communication only use wireless facilities. This paper is composed as follows: at the next section, a scenario and a model for the MANET is defined. In section 3, we show that by numerical simulation, an adequate size grid of static agents (agent doesn’t move) can generate a good network. In section 4, the emerged network from 3 proposed rules using local information can achieve almost same performance as the medium size grid pattern with non-mobile access points. But in some cases, the deviation of distribution is worse than the static formation’s and it is too large to be accepted. Then, we discuss HOT&COLD approach, which can improve the network performance in this case.

2. Scenario and Model A group of mobile wireless access points (agents) which generates backbone network for (hosts) who have more lower power wireless communication capability is given. A given workspace W ∈ R2 is rectangle. On W , a group of agent G = {1, . . . , N }, N = |G| is allocated. The communication speed cs between agent and agent, agent and user, (d is the distance between them) (d + 1)−cd .

(1)

And if cs > kcut

(2)

is satisfied, the agent opens communication channel without any other conditions [5]([1] introduced negotiation process to realize small world connection topology). cd , kcut are constants and in the next experiments cd = {2.0, 2.6},kcut = 0.002 is used. Communication requests come from Host union not Agents union. A member of U wants to send a particular size of message to others, which is in different location ∈ W . A request rq contains 2 location information: psend , the location of sender ∈ U , and pdest , the location of the destination. At first, the nearest agent ∈ G decides to accept the request or not . If the communication speed between the sender and the nearest agent is satisfied the condition 2, the request is accepted and the nearest agents will try forward that rq to an adequate neighbor agent. On the other hand, if the condition 2 is not satisfied, the

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

245

Figure 2. Left:Successful request of cd =2.0.Right: The result by ns2(FastSpeed)

request is aborted but the request location information is recorded. Also, the location information is also recorded when the forwarded requests do not reach the destination user correctly even though it was accepted. A lot of routing protocols are proposed for MANET, for example, AODV, DSR, and etc. However, there is no universal protocol that network designers select best one according to its application, frequency of topology change, mobility of agents, and the amount of overhead, for example [2]. In this paper, we are interested in MANET characteristics under more ideal condition and Dijkstra Algorithm is selected. Each request is forwarded to the neighbors by the shortest path from source to destination. The weight of arc of Dijkstra Algorithm is 1/cs,

(3)

where cs is a communication speed of a given pair of agents ∈ G. Note that the weight is not fixed. Each agent can handle ALR(Allowable Load Resistance) requests per step and can forward (ALR − 1/cs) requests at the step after it forwards a request using cs communication speed link. If ALR − 1/cs < 0,

(4)

we suppose that this agent is saturated and it can not transfer any more requests. Therefore, weight of the all arcs is set 1. A communication link’s weight is also set to 0 if the transfer via this link made this condition to be true. Also, all agents are homogeneous so that ALR is a constant. 2.1. Verification of model Fig.2 Left shows the relation between SR(Successful Request) and the load. The x-axis indicates the requests per step. We execute 10 runs using different initial formations. Fig.2 Right shows the simulation result by using network simulator ns2 [10] with the same simulation setting. As you can see, our model offers the same characteristics with ns2 simulation result. The interesting point is: the distributions are very different, depending on the agent formations. At the beginning, the SR dramatically increases because Load is still small so that all requests are delivered correctly if it is accepted. Therefore, the SR increases

246

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

Figure 3. Left:Static Formation and links of cd =2.0(From Left to Right: random,circular、large griddal formation、the middle、the small. kcut =0.002.N =16. Right:that of cd =2.6(No links are achieved under this condition by Eq.2)

linearly. However, once Load is larger than some thresholds, the network starts to fail delivering the accepted requests. Under this circumstance, the agents around the center of network works harder than other near by agents so that some of links of central agents is saturated and requests have to travel via longer paths [4]. It makes agent’s Load increase so that SR is degraded after a threshold. However, the thresholds of 10 runs are quite different. Some of them keep forwarding a particular amount of requests while it can’t delivery.

3. Static agents Formations Well, what kind of formation is good? Therefore, we evaluate static formations shown by Fig.3 Left and Right. The cd of Fig.3 Left is 2 and that of the Right is 2.6 respectively. Circles indicate agents and the lines mean wireless links which satisfy condition 2.Some of formation of the both figure is same but the topology is different because of cd . The geographical distribution of host is same as the last experiments and the destination host is also picked up from workspace W using uniform random function. What kind of network formations offers good performance? First, we evaluate the static formations shown by Fig.3 Left and Right. The cd of Fig.3 Left is 2 and that of Fig.3 Right is 2.6, respectively. Circles illustrate agents and lines illustrate wireless links, which satisfy condition (2). In both figures, the topologies are depended on cd value. The geographical distribution of hosts is the same as the last experiment and the destination host is also picked up from workspace W using uniform random function. Fig.4 Right illustrates the histogram distribution of SR of 10000 runs when the requests are constant (=100 requests per step). The x-axis means SR(successful request) and the y-axis means the logarithmic transformed histogram. Fig.2 Left and Fig.2 Right represent the result of cd =2.0, 2.6, respectively. In Fig.2 when Load is under 100 requests, the distribution of SR is almost in the “linear region”. With random formation, the distribution of SR of cd =2.0 heavily dispersed from 30% to 100% because it strongly depends on randomness

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

247

Figure 4. Left:Result of the proposed rule of cd =2.0.Right:Result of the proposed rule of cd =2.6

that agents make a connected graph. On the other hands, circle, 3 grid formations give better results and the deviations are not too large because it is secured that agents make a connected graph. Also, the 3 grid formations show that the medium size grid pattern is best one among them. When cd =2.6 (see Fig.4 Right), agents of random formation are not able to generate good network because the peak of SR curve is almost 10%. Circle formation also gives a bad distribution of SR. The reason is if slightly large amount of requests are occurred around a particular agent, the links of the agent are saturated by condition 4. If all agents connect to its neighbors, the average distance to other agents is equal while the average distance is completely different once agents without links to its neighbor are occurred (the network shape looks like ‘C’). On the other hand, even if some agents lose their links, the network can offer sub optimal path to the destination. Of course, all grid formations don’t give high distribution of SR. For example, the agents of the smallest grid are too far from senders so that the accepted requests are few. Simulation results tell us following conclusions: basically, there are 2 approaches to improve the SR performance. 1) Increasing the accepted requests. 2) The global network should be connected and robust against of load. Especially, the agents around the center of network should be given more links to its neighbors. The balance of these 2 factors is important so that the medium size of grid formation is best among them. Therefore, in the following experiments, we would like to use this formation as criteria for evaluation of mobile wireless agent’s behavior.

4. Mobile agents: behavior rules In this section, we propose fundamental rules and show a typical performance of the distribution with mobility. 4.1. proposed rules All agents are supposed to move simultaneously and no agent knows other agent’s distance and direction of next movement. Also, at first, all agents are distributed over

248

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

workspace W by uniform random distribution. We suppose that each agent can estimate direction and distance of its liked agents from the intensity of electric wave. 4.1.1. Isolated agents Agent have no link is called isolated agent. As in Fig.3, isolated agents will be occurred. Obviously, in this case, it should connect to other agents and comes closer to others. Then, we propose the following rule. Suppose the number of links of an agent Ln and the set of location of information of accepted requests but not delivered yet, Pf d = {pdest }, if

Ln Go to



thL

average

then

(5)

Pf d ,

of

where thL is a constant. By this rule, an agent goes to the average of the fails. If the agent had no link, we expect, the average of the fails indicates almost of the center of workspace W so that isolated agents have a chance to make links with each other. 4.1.2. Behavior of agents with links(1): Load-sharing & Generation of High Speed Linkage The last experiments tell us the method to improve average/deviation of SR is to degrade center agents’ load because heavy load make ‘saturated links’. Therefore, we propose the second rule as follows. Now, let the remaining ALR of the last step of agent i is ΔALRi , if ΔALRj /ΔALRi ≥ thD Go

to

agent

then

(6)

j

where j is one of agents linked to agent i and thD is a constant. By this rule, the connection speed between agent i and j is increased and we expect the load of agent j is decreased. 4.1.3. Behavior of agents with links(2): Increase of the accepted request The second approach is to increase the accepted requests. If the accepted requests are not larger than a threshold very much we denoted at section 2.1, agents accept more requests, we expect, more successful requests will be counted. Therefore, we propose third rule as follows. Now, let a set of location information of rejected requests is Pf a = {psend }, a set of location information of the un-delivered requests is Pf d , and the current position of agent i is pi . The vector from the current position to the average location of all failed and rejected requests, vp is vp = (Pf a ∪ Pf d )/(|Pf a | + |Pf d |) − pi

(7)

and let vp ’s unit vector is evp .Also, to increase the total accepted requests as whole, the overlapping region among agents should be reduced. Therefore, we introduce the

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

249

Figure 5. Left:Result of the proposed rule of cd =2.0.Right:Result of the proposed rule of cd =2.6

repulsive behavior from its nearest agent. Let j is the nearest agent of the agent i. The vector from the agent j to i is vji = pi − pj .

(8)

Also, let evij is the unit vector of vji .Using the above 2 unit vectors, agent i move to the following direction. (evp + evji )/2

(9)

4.1.4. Fragmantation caused by Simultaneous Mobile behavior Here, as described at the beginning of this section, we assume that all agents move simultaneously. Therefore, sometimes, links are broken after a movement because the distance among them is changed (Eq.2). Especially, if this flameout of linkage happens, it will make performance seriously down. Therefore, we think some agents should not move so frequently. Then we make a combination of the above core rules according this discussion. Now, let the highest speed of agent i’s links is csi , a combination is if

csi /kcut ≥ ks then execute else

execute

(10)

Eq.6&9 Eq.5

where ks is a constant. This rule pemits that only agents which are sufficiently close to other agents move but agents which has almost no links. 4.2. Result of the propose rules (1):ks Fig.5 illustrates the result of the proposed rules. At first, all agents are randomly located and change their position 40 times. We evaluate the finally produced network. As the last experiments, we execute this process 10000 times repeatedly and the histogram distri-

250

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

Figure 6. A Snapshot of the produced network with cd =2.6

butions of SR are shown in the both graphs. Fig. 5 Left and Right shows the histograms under cd = 2.0, 2.6, respectively. T hL = 1. As we can see, the set of rule can improve the performance dramatically than its initial state. When cd = 2.0, the curve of ks = 3.0 is almost same as the medium size grid formation which is the best formation in the section 3. The rule with ks = 1.5 can improve the occurrence of very low performance network and the allowable network (around SR=55% and 75%) but the occurrence around SR=100 is worse than the medium grid formation. We suppose that too many agents move simultaneously in spite of unavoidable disparity of load so that they can’t maintain as a connected graph. Under cd = 2.6 situation (see Fig.5 Right), the rule with ks = 1.5 can generate high performance networks more frequently than ks = 3.0 while Rule11 with ks = 1.5 also generate meaningless network very often. Although this results of rule11 with ks = 1.5, 3.0 seems to be a kind of trade-off, we think, it also can be explained by the above dynamics. The disparity of load is not so large because the number of accepted request is smaller so that ks = 1.5 is more adequate for this environment than ks = 3.0 but simultaneous move may their network fragmented. Summary, the combination of core rules can improve the average and distribution of SR from initial formation although ks is set adequately. However, the occurrence of useless network is still much more frequently than the static grid formation. 4.3. Result of the propose rules (2):T hL As you see, the last result based on ks can achieve high peak performance network as same as the static heuristical griddal formation but in sometimes very poor networks are emerged. We would like to improve the large disturvance of the network performacne because it is not so rare enough tha we can regare it as range of error. Highly Optimized Tolerance [6] and Constrained Optimazation with Limited Deviations [7] is optimization concept. In forest fire model[9], a spark burns trees around it and many trees catch fire. Let a farmer plants trees wall-to-wall in its cultivated area. If no sparks are happen, he get a lot of harvest while it can not get any harvest by a single spark. Hot introduces ‘firewall’, which is a intentional blank space. The firewall checks

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

251

Figure 7. Result of T hL >1

the spread of the fire but the number of harvest is down. The optimal size of firewall and several firewall patterns based on HOT concept is shown in [6]. The COLD optimization concept is the degradation of the large size of fire propagation. As you know, in a common forest fire model, the size of spreading fire is estimated by power law. The auther introduce utility and deduces optimal size of firewall based on this concept, which prevents large size fire by a ratio. However, they do not mention about the design methodology of firewall and for other application so far. Therefore, we think that it is a reasonable and important question for future network designer to ask how it should design agent’ behavior to realize meaningful network based on COLD concept. The main reason of the large disturbance is fragmentation around the center because large ks may promote network radiating from the center while rotative direction links are not considered. Fig.6 shows an example of the emerged network on the last experiments. As you see, there are several agents who has only 1 or 2 links. We think the meaningless network is occured when a breakage of long line-like formation is happen around the center area of W . HOT&COLD concept suggests that degration of large error can be prevented by firewall. In our model, excess links could be regared as the firewall cell. Therefore, we set T hL > 1 to increase rotative directions links. Fig.7 shows the result. In this graph, the results of T hL = 1, 2, 3 are depicted. As you see, T hL = 3 can strike a balance between high peak value and low divergence as same as the static griddal formation.

5. Conclusion In this paper, a group of agents, which generates network backbone for users who have less power wireless communication capability, is studied. Each agent can open com-

252

M. Kubo et al. / An Adaptive Behavior of Mobile Ad Hoc Network Agents

munication channel to other agents in a particular distance. Users can ask agents send message to other user for them. In section 3, we showed that by numerical simulation, an adequate size grid formation of static agents generate a good network. In section 4, emerged network by a combination of the proposed 3 core rules of wireless access points based on local information offers almost same performance as the best grid pattern with non-mobile access points, but the deviation of distribution is more worse. Finally, we proposed a new combination by using HOT&COLD optimization theory, which prevents the emergence of poor network, and T hL > 1 model composed by mobile agents can produce almost same performance network as the grid static formation.

References [1] D. Kurabayashi, A.Yajima, T.Funato, Emergence of smalll-world network among moving inidividual、17th SICE symposium on Decentralized Autonomous Systems,pp15-20, 2005(in japanese). [2] A.S.tanenbaum,computer network fourth japanese edition,2003. [3] T. Camp, J. Boleng and V. Davies, A Survey of Mobility Models for Ad Hoc Network Research, Dept. of Math. and Computer Sciences, Colorado School of Mines, Golden, CO, 2002. [4] Adilson E. Motter, Cascade control and defense in complex networks,arXiv:condmat/0401074v2, 2004. [5] N. Zhou, H. Wu and A. Abouzeid, Reactive Routing Overhead in Networks with Unreliable agents, Proceedings of Ninth Annual International Conference on Mobile Computing and Networking (MobiCom ’03), San Diego, CA, USA, September, 2003. [6] Jean M. Carlson and John C. Doyle, Highly optimized tolerance: a mechanism for power laws in designed systems, Phys Rev E, 60(2 Pt A): 1412-27, 1999. [7] M. E. J. Newman, Michelle Girvan and J. Doyne Farmer, Optimal design, robustness and risk aversion, Phys. Rev. Lett. 89, 028301, 2002. [8] S. Ichikawa, F. Hara,Effects of Static and Dynamic Variety in the Character of Robots on Group Intelligence of Multi-robot System. DARS 2000,pp89-98,2000. [9] Per Bak, How Nature Works, The Science of Self-Organized Criticality, Copernicus Books, 1999.   [10] The Network Simulator  - ns2, http://www.isi.edu/nsnam/ns/, 2005. [11] D. Johnson and D. Maltz, Dynamic source routing in ad hoc wireless networks, In T. Imelinsky and H. Korth, editors, Mobile Computing, pages 153-181, Kluwer Academic Publishers, 1996. [12] Christian de Waal and Michael Gerharz, BonnMotion: a mobility scenario generation and analysis tool, Communication Systems group, Institute of Computer Science IV, University of Bonn, Germany, http://web.informatik.uni-bonn.de/IV/Mitarbeiter/dewaal/BonnMotion/, 2005.

Part 6 Evolution and Learning

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

255

Learning the Cooperative Behaviors of Seesaw Balancing Agents - An Actor-Critic Approach Takashi KAWAKAMI a, Masahiro KINOSHITA a and Yukinori KAKAZU b a Hokkaido Institute of Technology Sapporo, JAPAN b Hokkaido Information University, JAPAN

Abstract This paper proposes a new approach to realize a reinforcement learning scheme for autonomous multiple agents system. In our approach, we treat the cooperative agents systems in which there are multiple autonomous mobile robots, and the seesaw balancing task is given. This problem is an example of corresponding tasks to find the appropriate locations for multiple mobile robots. Each robot agent on a seesaw keeps being balanced state. As a most useful algorithm, the Q-learning method is well known. However, feasible action values of robot agents must be categorized into some discrete action values. Therefore, in this study, the actor-critic method is applied to treat continuous values of agents’ actions. Each robot agent has a set of normal distribution, that determines a distance of the robot movement for a corresponding state of the seesaw system. Based on a result of movement in this system, the normal distribution is modified by actor-critic learning method. The simulation result shows the effectiveness of our approaching method.

Keywords Actor-critic, reinforcement learning, multiagent systems, cooperative behaviors, seesaw balancing problems

1. Introduction Today, an industrial field around robotics research is advancing rapidly. Small personal robots also are with our office and home recently. Instead of human worker, higher integrated robots are necessary for carrying out dangerous tasks. Personal robots or pet robots also should be in our house or welfare facilities. Those robots must have the higher-leveled autonomy and cooperativeness, because they have to achieve a task interacting with other robots or human in a real world where is an uncertain, complex and dynamic environment for autonomous robots. However, these autonomy and cooperativeness are hard to be designed by human designer. For this difficulty, reinforcement learning mechanisms are proposed as an effective approach, and many good results are obtained and reported. In Q-learning method as a conventional algorithm of reinforcement learning, feasible sensory input values from the environment and action values must be divided into finite discrete values. Nevertheless, there are many applications in which continuous values of sensory input and action should be represented to complete given tasks appropriately. In this case, it is well known that continuous action values of autonomous robots are applied by the

256

T. Kawakami et al. / Learning the Cooperative Behaviors of Seesaw Balancing Agents

actor-critic method. Applications of actor-critic method for multiple autonomous agents systems are hardly discussed ever. In this paper, the actor-critic method is applied to treat continuous action values of multiple agents that cooperate each other to perform a given cooperative task. A seesaw balancing task is given as a difficult cooperative task for the multiple autonomous agents system. In this task, appropriate behaviors of multiple mobile robots must be found as a solution by which a seesaw keeps its balance continuously. In the environment where autonomous robots move independently, there is a possibility that the whole system may have instability by a certain agent's movement. It is important practically to set up a control system in consideration of whole stability. Each robot agent on a seesaw keeps being balanced state by using corresponding reinforcement learning systems. Each robot agent has a set of normal distributions, that determines a distance of the robot movement for a corresponding state of the seesaw system. Based on a result of movement in this system, applied normal distribution is modified by actor-critic learning method. The simulation result shows the effectiveness of our approaching method.

2. Reinforcement Learning Systems Based on Actor-Critic Method Recently, the reinforcement learning is attracting attention as a profitable machine learning algorithm. This reinforcement learning is performed based on the given environmental rewards for executed actions. Therefore, the complete world model is not necessary to construct, and the reinforcement learning system is expected to solve the difficult problems in which external environments cannot be significantly modeled in the system, or environments are dynamically changing. To realize reinforcement learning scheme, some approaches such as Temporal Difference Algorithm [3], Q-Learning [4] and Classifier Systems [5], were proposed. Q-Learning is the most successful algorithm of reinforcement learning methods; however it is necessary to categorize continuous feasible sensory input values into finite discrete values to learn appropriate action patterns. There are many applications in which continuous values of sensory input and action should be represented to complete given tasks appropriately. In this case, it is well known that continuous action values of autonomous robots are applied by the actor-critic method. Actor-critic method is TD method that has a separate memory structure to explicitly represent the policy independent of the value function. The policy structure that is called as the actor selects suitable actions. The estimated value function is called as the critic that criticizes the actions determined by the actor. Leaning by actor-critic method is performed on-policy. The critique is treated as a TD-error. The TD-error is the scalar signal output of the critic and drives all learning in both actor and critic. Actor-critic methods are the natural extension of the idea of reinforcement comparison methods to TD learning methods. After each action selection, the critic evaluates the new state to determine whether things have become better or worse than expected. The evaluation is the TD error such as; TD-error=[rt +ǫV(st+1)㧙V(st)]

(1)

T. Kawakami et al. / Learning the Cooperative Behaviors of Seesaw Balancing Agents

257

where V is the state value function estimated by the critic. This TD error is used to evaluate the selected action at taken in state st. If the TD error is positive, the tendency to select at should be strengthened for the future. The other hand, if the TD error is negative, the tendency should be weakened. The advantages of actor-critic methods are as follows; z They require minimal computation in order to select actions. Consider a case where there are an infinite number of possible actions. For example, that is a continuous-valued action. Any method learning just action values must search through this infinite set in order to pick an action. If the policy is explicitly stored, then this extensive computation may not be needed for each action selection. z They can learn an explicitly stochastic policy; that is, they can learn the optimal probabilities of selecting various actions. This ability turns out to be useful in competitive and non-Markov cases.

3. The Seesaw Balancing Problem In this paper, we treat the seesaw balancing problem as a cooperative multiagent system in which cooperative behaviors of multiple autonomous robots are acquired by applying the actor-critic method. In our problem setting, there are autonomous mobile robots mri (i=1,2,…,n) on a flat seesaw that has no motor taking a balance. Each mobile robot independently moves along a straight line. Thus, the objective of this problem is keeping a balance of the seesaw by appropriate actions of mobile robots. An action of each robot can be represented as a continuous value of a distance of one dimensional movement. A position of a robot mri at time t is defined as xi,t , and moving distance value is named as 'xi,t. Consequently, a next position at time t+1 is presented by the following equation; xi,t+1= xi,t +'xi,t

(2)

Since each mobile robot moves along a corresponding straight rail, there is no collision between robots. Let the length of the seesaw be L, so, possible position of a robot is in a range of  L / 2 d xi ,t d L / 2 .

Figure 1. Experimental seesaw system model

Although a conventional dynamics of the seesaw system is generally represented by an angle of seesaw Tt, an angler velocity Zt, a friction of a bearing and so on, we

258

T. Kawakami et al. / Learning the Cooperative Behaviors of Seesaw Balancing Agents

exploit only a moment value of the seesaw at time t to simplify the problem in this problem setting. A moment value Mt is calculated as M t ¦ mi ˜xi ,t , where mi is a weight of robot mri . Therefore, the system has to control the seesaw such as the moment value is kept to be value 0. Each robot is assumed to have the same weight. Figure 1 illustrates the experimental seesaw system model.

4. Applying the Actor-Critic Method Based on the problem setting of seesaw balancing tasks described above, autonomous agents learn cooperative behaviors by the actor-critic method. 4.1 Determination of Behaviors by Actor Module In our approach, behaviors of each agent at a certain time are determined by the actor module in the actor-critic system. For a certain agent, other agents’ behaviors are uncertain factors, so that the agent learns its own behavior by reinforcement learning mechanisms. The i-th agent mri has respective stochastic policy Si independently. This stochastic policy Si determines a movement length 'xi according to the current moment value Mt . In detail, the stochastic policy Si contains normal distributions N(Pi, Vi) that formed by an average value Pi and a standard deviation Vi, so that the movement length 'xi is represented as a normal random value calculated based on the normal distributions N(Pi, Vi). Each robot recognizes its current state by a moment value of the seesaw system. Appropriate movement distances of mobile robots depend upon a current moment value of seesaw, so in this study a continuous moment value is translated into one of twelve discrete state values. Therefore, the policy Si of the agent mri is in detail represented as,

Si = {(Pi,1,Vi,1), (Pi,2,Vi,2), ̖, (Pi,12,Vi,12)}

(3)

where,Pi,j and Vi,j are respectively an average value and a standard deviation of the normal distribution N(Pi,j , Vi,j) which used by i-th agent when the state of seesaw is recognized in j-th state. 4.2 Learning Stochastic Policies by Critic Module To learn appropriate behaviors of mobile agents, the critic module calculates values of TD-error and revises the stochastic policies based on trial-and-error processes. In this approach, a value of TD-error is estimated based on a current moment value Mt at time t and a changed moment value Mt+1 by movements of agents at time t+1. A TD-error value is calculated by the following equation; TD-error=|Mt|  |Mt+1|

(4)

T. Kawakami et al. / Learning the Cooperative Behaviors of Seesaw Balancing Agents

259

No reward value is given from environment in our approach. Using this TD-error value, modification rules of policies are presented as follows; rule 1) if TD-error>0 then revising the average value Ǵi,j of used normal distribution according to the following equation,

Pi , j m D'xi , j  (1  D ) Pi , j

(5)

where, D (0 d D d 1) is a fraction of learning rate. Also if 'xi,t 0 means that a state of the seesaw system becomes better by the agent’s moving. rule 2) if TD-error R > P > S, T + S < 2R, defection is the best choice in a one-shot game. The result is a Nash equilibrium in which both agents defect. 2. Agent i updates its current state by comparing its total payoff with all neighbors and imitates the state of the wealthiest agent. Agent i does not update its current state when its own payoff is highest among neighbors. 3. Agent i diffuses influence field Fi (l) in the system according to the total payoff Πi . Also, l is the distance from agent i. Here, Fi (l) is defined as

466

H. Kuraoka et al. / Spatial Prisoner’s Dilemma in a Network Environment

Figure 2. Simulation model

Fi (l) = Πi {exp(−l2 /λ2 )}.

(3)

In that equation, λ is defined as the influence parameter. It can be said that the higher the total payoff Πi , the stronger the influence field Fi (l) becomes. Fi (l) is attenuated as the distance from agent i becomes longer. Agent i perceives the total influence field Ii from the system. Ii is defined as  Ii = Fj (|xi − xj |). (4) j=i

4. When the state of agent i before updating its state is D, i replaces the link with probability p between the opponent whose state before updating is also D by a new one. It can be said that the higher the probability p, the more flexible the network structure becomes. The probability Φik of agent k to be chosen as a new opponent by agent i is defined as Φik =

Fk (|xi − xk |) . Ii

(5)

An agent whose influence field is strong will be chosen as a new opponent with high probability. In this paper, the heterogeneity of information distribution is defined as the difference of Ii among agents. The degree of homogeneity of information distribution is defined as influence parameter λ. In case of large λ, Fi (l) is not attenuated so much and the

H. Kuraoka et al. / Spatial Prisoner’s Dilemma in a Network Environment

467

information distribution becomes homogeneous since the influence field Fi (l) is diffused globally among the system. On the other hand, when λ is small, Fi (l) is attenuated notebly and the information distribution becomes heterogeneous since the influence field Fi (l) is diffused locally among the system. The indicator used in the model will be (a) ρc (t): the percentage of the agent whose state is C, (b) LN : the average shortest path length of network N which measures the typical separation between any pair of elements in the network, and (c) CN : clustering coefficient of network N which meaures the clustering of an element’s neighborhood. 3. Simulation results and discussion 3.1. Simulation settings In this paper, simulation of various influence parameters λ is shown to analyze the effect of heterogeneous information distribution. The relationship between the influence field according to the influence parameter λ and the scale of the field is shown in Figure. 3.

Figure 3. The relationship between Fi (l) and the scale of the field

In this paper, λ = 10000 is defined as the situation in which information distribution is completely homogeneous. The degree of heterogeneity of information distribution becomes stronger in order of decreasing λ. Three cases (p = 0.01, 0.1, 1) are tested. The model is characterized as N = 1000, n = 2. The initial state and opponents of the agent are chosen randomly. Parameters used in the matrix of the Prisoner’s Dilemma are R = 1.0, T = 1.6, P = 0.1, S = 0. 3.2. Network emergence 3.2.1. Relation between ρc (100) and heterogeneous information distribution Figure 4 shows average ρc (100) according to the influence parameter λ in each p. Here, ρc (100) is the percentage of agents whose state is C in time step 100. In case of p = 0.1 and 1, emergence of cooperation is observed in about λ = 50 because the ρc (100) converges in about λ= 50 to ρc (100) = 0.8, 0.9. A notable difference is inferred to exist between λ = 5 and λ = 10. In case of small λ, such as λ = 5, opponents playing games will only be nearby agents because the perception of the influence field will be the one diffused by the near agent. For that reason, agents will lose the chance of updating their state to C by playing games with distant agents; no emergence of cooperation is observed. It can be said that cooperation does not emerge if information distribution is notably heterogeneous (λ = 5) among agents. In the case of p = 0.01, the value of ρc (t) is low. If the network structure is not sufficiently flexible, cooperation’s emergence will be difficult to observe because less chance exists of updating its state to C by playing games with other agents.

468

H. Kuraoka et al. / Spatial Prisoner’s Dilemma in a Network Environment

Figure 4. Transition of average ρc (100) according to λ

3.2.2. Relation between LN , CN and heterogeneous information distribution

(a) LN (b) CN Figure 5. Transition of LN and CN according to distance

Figure 5(a) shows average LN according to the influence parameter λ in each p. It is observed that LN is at its highest among all λ at λ = 5 and it is attenuated as λ becomes larger. In case of p = 1, LN converges to 4. Particularly, when λ = 10000, a situation in which information distribution is completely homogeneous, small-world phenomena are observed (LN = 3.67). There are shortcuts that link distant agents in a small-world network, but for small λ such as λ = 5, the game opponents will be nearby agents, so there will be no shortcut that makes the network small. For small p, no significant difference of LN exists among λ. It is considered that this is true because the network is not flexible and the agents cannot change opponents so often. Figure 5(b) shows average CN according to the influence parameter λ in each p. Similar to the case of LN , CN is at its highest among all λ at λ = 5 and it is attenuated as λ becomes larger because, in the case of small λ, the game opponents will be nearby agents. There will be a sufficient chance of two agents having a common neighbor. It is observed that for all λ of p = 1, the network structure is highly clustered considering the fact that the clustering coefficient of the random network crand is crand = ki /N = 0.004, and that the clustering coefficient in all λ of p = 1 is greater than crand . Smallworld phenomena emerged also from the perspective of clustering coefficients.

H. Kuraoka et al. / Spatial Prisoner’s Dilemma in a Network Environment

469

3.3. Robustness of cooperation

(a) λ = 5

(b) λ = 129 (c) λ = 198 Figure 6. Connectivity distributions

(d) λ = 10000

Figure 6 shows the distribution of degree of λ = 5, 129, 198, 10000 for p = 1. Apparently, no hub agent exists: that is, no agent has a much greater number of links than another in the case of λ = 5, but there is a hub agent in case of λ = 10000. On the other hand, power law distributions are visible in the case of λ = 129, which shows the property of a scale-free network. The influence field of an agent becomes stronger when the total payoff of the agent grows and spreads through the system homogeneously when the information is homogeneous (λ = 10000). It is considered that when information is homogeneous, a positive feedback mechanism pertains against the agent that starts to grow as a hub because this agent will be chosen as an opponent with high probability by other agents because of homogeneous information distribution. A network structure that contains a few hubs is considered to be robust against random failure, but is vulnerable to the failure of hubs[3] because hubs chiefly sustain the network structure.

Figure 7. Transition of ρc (t) according to λ

Figure 7 shows the transition of ρc (t) of λ = 5, 129, 198, 10000 in the case of p =1. This figure illustrates the sensitivity of the stationary network structure to perturbations that act on the highly connected agents, which reflects their key role in sustaining cooperation. At time step 100, the most connected agent is externally forced to change its state from C to D, triggering an avalanche. It is apparent that, in the case of λ = 10000, in which information distribution is completely homogeneous, an avalanche engenders ρc (t)=0: no cooperation exists. It is considered that a large avalanche occurs when there are only a few hub agents. In other

470

H. Kuraoka et al. / Spatial Prisoner’s Dilemma in a Network Environment

words, cooperation might emerge on a network that is structured under homogeneous information distribution, but that cooperation will not be robust. For λ = 129, 198, in which heterogeneous information distribution is considered and a couple of hubs exist, the emergent cooperation is robust and reproduces transient dynamics in which the system searches for a new stationary globally cooperative structure. These characteristics pertain because some clusters exist that are sustained by each hub; they stop the avalanche. After the avalanche settles, a new network will be structured as centering upon one hub, which subsequently engenders the reproduction of cooperation.

4. Conclusion This paper introduced heterogeneous information distribution to the spatial Prisoner’s Dilemma and analyzed the effects of heterogeneous information distribution using a computer simulation. Results demonstrated that: (a) the resulting network has characteristics of a small-world and scale-free network when information distribution is heterogeneous; (b) cooperation that emerges on a network under homogeneous information distribution is fragile; and (c) robust cooperation emerges when heterogeneous information distribution is considered. In other words, the perspective of heterogeneous information distribution is important in creating a decision-making solution for an artificial system. Further studies should explore the effect of heterogeneous information distribution on real-world problems such as manufacturing systems.

References [1] G. Abramson, M. Kuperman: Social games in a social network, Physical Review E 63 (2001), 030901. [2] A.-L. Barabasi, R. Albert, H. Jeong: Mean-field theory for scale-free random networks, Physica A 272 (1999), 173-187. [3] R. Albert, H. Jeong, A.-L. Barabasi: Error and attack tolerance of complex networks, Nature 406 (2001), 378-382. [4] M. Buchanan: Nexus: Small Worlds and the Groundbreaking Science of Networks, W. W. Norton & Company, 2002. [5] S. Kauffman: Investigation, Oxford Univ. Pr on Demand, 2003. [6] K. Klemm, V.M. Eguiluz: Highly clustered scale-free networks, Physical Review E 65 (2002), 036123. [7] R. Pfeiffer, C. Scheier: Understanding Intelligence, Massachusetts Institute of Technology, 1999. [8] B. Skyrms, R. Pemantle: A dynamic model of social network formation, Proceedings of the National Academy of Sciences 97(16) (2000), 9340-9346. [9] D.J. Watts, S.H. Strogatz: Collective dynamics of ’small-world’ networks, Nature 393 (1998), 440-442. [10] M.G. Zimmermann, V.M. Eguiluz, M.S. Miguel: Coevolution of dynamical states and interactions in dynamic networks, Physical Review E 69 (2004), 065102.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

471

Behavioral Decision for Multi-agent Systems with Dynamic Interaction Yusuke Ikemoto a,1 , and Toshio Fukuda a a Nagoya University Abstract. In this study, we discuss a behavioral decision for multi-agent systems. Merging behaviors of vehicles at non-signalized intersection are mentioned as an example of behavioral decision. The algorithm enables vehicles to pass through an intersection one by one like weaving and zipping manners without vehicle collision. These behaviors are achieved by an emerging special-temporal pattern of a coupled oscillator network, where a mutual communication between vehicles is realized by using only IVC device. The special-temporal pattern formation can be realized in the diffusion system through the mutual communication so that only locally broadcasting communication could be enough for the communication. The algorithm is therefore independent of any infrastructures at a junction such as a signal system, road-vehicle communication systems, and so on. Finally, the proposed algorithm is experimentally verified using actual autonomous mobile robots. Keywords. Local Interaction, Spatial-temporal Pattern Formation, Homogeneous System, Globally Coupled Oscillator Network, Experiment by Robots.

1. Introduction We discuss a behavioral decision for multi-agent systems with dynamic interaction.As an example, merging behavior of vehicles are treated.Automated transportations are one of the desired systems in order to improve the traffic efficiency and to reduce traffic accidents. In ITS research field, most researches of an automated vehicle have an approach to develop an intelligent system that automatically and safely drives a vehicle on behalf of human or that assists a driver. These automation technologies are not sufficient for the automated transport system, because a cooperative manner in driving becomes one of important factors for smooth and efficient traffic flow control especially in a congested area. The ideally efficient traffic flow will be realized when each driver cooperatively selects his own path, locally or globally checking other cars on its driving paths. For example, several studies on the cooperative driving using IVC (Inter-vehicle Communications)[1][6] have been reported. A vehicle that is equipped with intelligent system such as IVC and car navigation system can drive cooperatively without any advanced infrastructures. Authors have also proposed a cooperative planning algorithm, where each car revises its own path at all intersection based on other vehicles ’s destinations exchanged in a local 1 Yusuke

Ikemoto, Department of Micro-Nano System Engineering, Graduate School of Engineering, Nagoya University, Japan. Tel.: +81 52 789 4481; Fax: +81 52 789 3909; E-mail: [email protected]+81 52 789 4481

472

Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems

Figure 1. Definition of blue zone and red zone

Figure 2. Transitions of target phase-locked state. (a) There is two cars around an intersection. (b) An additional car joints the communication. (c) A car of three passed an intersection.

area. This is a planning approach to improve the traffic flow, and a local performance improvement is another approach for the same purpose. In this study, we therefore focus on a cooperative merging control for an intersection flow, using only IVC as one of local performance improvement approaches. We have proposed zipping and weaving behaviors which is observed in the merging in non-signalized intersection[7]. The merging control for the zipping and weaving manners includes more general cases. Finally, we make experimants using actual mobile robots and confirm the effectiveness of the proposal algorithm.

2. Target Behavior and Assumption A merging control algorithm at a general non-signalized intersection is dealt with as the extension of zipping and weaving manners. Figure 2 shows the target behaviors of a merging. At a general intersection. The time allocation depending on the traffic states is informative for traffic management. For example, in the case of figure 2(a), the time for passing through is allocated by two kinds of roads. Only when some vehicles which try to turn right approaches the intersection as shown in figure 2(b), the time is allocated by three kinds of roads. After all vehicles which located in right-turn lane finish passing through as show in figure 2(c), the time is allocated by two kinds of roads again. 2.1. Common Reference Signal In order to realize the synchronization among vehicles, each vehicle behaves based on the own reference signal and the signals are synchronized by an interaction. The reference signal is defined as follows: θ˙i = ωn ,

(1)

where θi is the phase of vehicle i and θ˙i is time differentiation of θi . ωn is natural frequency, which is determined by an intersection configurations and so on. Here, the blue zone and red zone are defined in figure 1, corresponding to the value of the reference signal θi . When the value of θi is in the blue zone, a vehicle,i, is allowed to enter an intersection. In the case of the red zone, a vehicle i must stop before entering. In this experiment, an even-length time is allocated for two zone as ideal state that is independent of an intersection configurations such as the number of roads, size of an intersection and so on. Considering the state shown in figure 2 as an example.

Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems

473

2.2. Synchronization of Traffic from Different Roads Reference signals, which independently oscillate, need to be synchronized by simple mutual communication. Coupled oscillator network is used for synchronization among each reference signal. There are many kinds of oscillator models and any kinds of oscillator models can be installed in our proposed algorithm. A reaction-diffusion system expressed in equation (2) is used, because of simplicity of parameter settings.

θ˙i = ωn + K

N 

a · f (θj − θi ),

j=1

where a =



(2)

−1 if j = i 1 if j = i

and N (≥ 2) and K are the number of roads or lanes and coupling strength of diffusion, respectively. The function f , that has a phase difference variable as input, means oscillators’ coupling. When two vehicles in the same road communicate, the suffix j is the same as i, that means the reference signals synchronize with zero phase difference. On the other hand, the reference signals synchronize with a certain phase difference when two vehicles in different roads communicate. Figure 3 shows the time series of reference signals in a numerical simulation, where four vehicles are supposed to approach an intersection from two difference roads. θpi and θqi are reference signals of vehicles in the road p and q, respectively. As shown in the figure, the phase difference between θp1 and θp2 converges to zero, while the phase difference between θpi and θqi converges to π. Using these coupled oscillator networks, the reference signals in equation (1) can be synchronized. The stability of synchronization in general case where an intersection consists of N roads is investigated in the next section. 3. Synchronization Stability All vehicles are supposed to communicate with all vehicles in a local area around an intersection using a broadcasting communication. The coupled oscillator network could be considerd as globally coupled oscillator[14]. To be synchronized with an even phase difference, the function, f , in the equation (2) has to be set properly. In order to stabilize the globally coupled oscillator, the function, f , should be set as the follows: 1 f (φ) = − φ + 1, π

(3)

where φi means the phase difference. Considering the symmetry, it is enough to prove only a case of φ1 = θ1 − θ2 converges. The difference of φ1 is φ˙1 = θ˙1 − θ˙2 =K

N  j=1

{f (θ1 − θj ) − f (θ2 − θj )}

474

Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems

Figure 3. Example of phase-locked transitions. The phase difference between θp1 and θp2 converges to zero, while the phase difference between θpi and θqi converges to π. It means that vehicles can generate virtual phase signal by local communications.

=K

N −1 

  f

j=1

j 

 φl

 −f

l=1

j 

φl+1

.

(4)

l=1

From the equation (3),   n n   φi = f (φi ) − (n − m) f i=m

"

(m > n).

(5)

i=m

Here, substituting the equation (5) to the equation (4), φ˙1 = K

N −1 

(N − j) {f (φj ) − f (φj+1 )}

j=1

= K {N f (φ1 ) − (N − 2)} N (φ1 = 2π − i=2 φi ) =−

KN 2π (φ1 − ). π N

As a consequence, φ1 is solved as follows:   2π KN φ1 = Cexp − t + , π N

(6) (7)

(8)

where C is a constant number determined by an initial condition. From the equation (8), it is proved that the phase converges exponentially and the phase difference becomes 2π/N after enough time passed.

4. Implementation 4.1. Behavior Rules A vehicle determines to enter or stop before an intersection according to its reference sig2 nal. A vehicle moving at maximum velocity vmax takes vmax /2a[m] for its completely 2 /2a[m] ahead stop, where a is deceleration. The decision line is supposed to drawn vmax of an intersection. A vehicle has to make a decision where it could enter an intersection

Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems

475

or wait for the next phase at the decision line. Before passing through the decision line, a vehicle can move regardless of blue zone or red zone, and then a vehicle is allowed to enter an intersection when it is in a blue zone on the limit line. On the other hand, a vehicle decelerates if it is in a red zone when it passes through the decision line. The decelerating vehicle starts to accelerate to enter an intersection if it is in the blue zone of the next cycle. 4.2. Parameter Settings The one cycle is defined as an interval from a blue zone to the next blue zone and tbi and tra are defined as the interval of the blue zone of vehicle i and a red zone that the other vehicles should decelerate or stop. The necessary condition for collision avoidance is that cars in the other lanes never enter the intersection until a car in a lane finishes passing completely once cars in a lane enter. That means the deceleration red interval tra becomes longer than the blue interval. The sum of blue zone intervals in all lanes is calculated by N  θbi i=1

ωn

=N

θb , ωn

(9)

where θb (0 ≤ θb < 2π) is the angle of the blue zone shown in figure 1. Therefore, tra is calculated by 1 N



θb 2π −N ωn ωn  # 1 2π − θb . = ωn N

tra =

#

(10)

Additionally, the geometrical condition also should be satisfied, as follows: 

tra

v(t)dt > L + l,

(11)

0

where l and L are vehicle length and road width, respectively. Assuming that a velocity of a vehicle is constant, the condition for collision avoidance is obtained as follows: 1 ωn



2π − θb N

# >

L+l . vconst.

(12)

Based on this condition, ωn and θb are designed. These parameters depends on intersection specifications that includes the number of road N , road width L and vehicle velocity vconst . These information is available since they can be embedded in car navigation systems.

476

Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems

Figure 4. Phase locked

4.3. Communication Range for Convergence The reference signals should be synchronized at least before a vehicles enters an intersection. A necessary communication range R[m] is calculated by the convergence time spent for a phase lock using the equation (8). As far as an initial condition,the necessary convergence time for phase lock becomes the maximum in the case of φi t=0 = 0. Besides, the convergence time becomes the maximum in the case of N = 2 since the phase is exponentially locked with a stability constant, KN/π. Consequently, the necessary communication distance R is calculated on the condition where the number of roads N , are two, and where stability constant C, is −π. Figure 4 shows transitions of θ1 , θ2 and θ3 where N = 3, K = 0.4 and ωn = 0.075. The phase almost converges and is locked after the seventh mutual broadcasting communication. The necessary communication distance is calculated as follows: R > 2 × v × 7vtc = 14vtc + L,

(13)

where L[m] is the width of an intersection.

5. Experiment 5.1. Experimental Settings In this experiment, a vehicle velocity is simplified as constant speed and binary with no acceleration and no deceleration. The velocity and the length of a vehicle are set as vmax = 50[mm/s] and l = 65[mm], respectively. The velocity is corresponding to about 20[km/h] of a current passenger vehicle, considering the scaling factor. Only vehicles locating in the communication area is allowed to receive information from other vehicles in other roads. Vehicle can recognize an intersection by sensing white lines drawn on the floor with infrared sensors so that it pass through a cross, turn right or left and stop before a cross. The internal value is transmitted with 250[msec] intervals by checking an internal timer interrupt of a micro processor. The internal data received from other vehicles are buffered and used to update own internal value of the oscillator. If no information is received until the next timer interrupt, the internal value is updated with K = 0 of the equation (2). Figure 6 shows the intersection model in this experiment. The dimensions of the intersection is designed according to the ratio between an actual vehicle and an experimen-

Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems

477

Experiment (a)

Experiment (b)

Figure 5. experimental view

Figure 6. Intersection model

Experiment (a)

Experiment (b)

Figure 7. Experimental result

tal vehicle. Four kinds of roads are supposed to connect at an intersection, and N , S, W and E are used to express each road that comes from north, south, west and east directions. Each road has three kinds of lanes named i, r and o, which goes into an intersection for passing through, for turning right and goes out of an intersection, respectively. For example, the vehicles which come from road Si or Er will go out an intersection from road No . In this case the vehicle routes are expressed as a route Si -No and a route Er -No , respectively. 5.2. Results There are two kinds of experiments: (a) the merging between route Ei -Wo and route Wr So and (b) the merging between route Si -No , route Ei -Wo and route Sr -Eo . On the both of the experiments, the parameters is set as ωn = 0.075, θb = 1.57 and K = 0.4 by the equation (12). In the experiment (a), two numbers of vehicles approach the intersection to pass through the route Wr -So at t = 5, 20 during six numbers of vehicles pass through the route Si -No . In the experiment (b), two numbers of vehicles approach the intersection to pass through the route Sr -Eo at t = 10, 30 during five numbers of vehicles pass through the route Si -No and five numbers of vehicles pass the route Ei -Wo . Figure 7 indicates the distance from a center of the intersection every 1 second. Both results clearly show that vehicles alternately pass through the intersection one by one. Some vehicles stop before reaching a limit line because they keep a distance from fore vehicle stopping on a limit line. The reason the velocity of some vehicles is wiggling is that they detect the line and rotate during line-trace.

478

Y. Ikemoto and T. Fukuda / Behavioral Decision for Multi-Agent Systems

6. conclusion In this study, we discuss a merging algorithm at non-signalized intersection as behavioral decision for multi-agent systems and the effectiveness of the algorithm was confirmed by mobile robots in real space. Additionally, the stability of the synchronization algorithm between vehicles using coupled oscillator network was proved by a mathematical analysis. The proposal algorithm is effective since vehicles can automatically generate a global order using only locally broadcasting in situation of not only merging but also other cooperative driving. The synchronization algorithm with a coupled oscillator network works. Widespread applications are expected for mutual concessions in general road, allocating the time for a lane change and so on.

References [1] M. Parent, “Distributed motion control of an electric vehicle,” Proc. 5th International Workshop on Advanced Motion Control, pp. 573, 1998 [2] C. Thorpe, T. Jochem, and D. Pomerleau, “The 1997 Automated Highway Free Agent Demonstration,” Proc. IEEE Conf. ITS, 1997, CD-ROM. [3] O. Gehring and H. Fritz, “Practical Results of a Longitudinal Control Concept for Truck platooing with Vehicle to Vehicle Communication,” Proc. IEEE Conf. ITS, 1997, CD-ROM. [4] A. Cochran, “AHS Communications Overview,” Proc. IEEE Conf. ITS, 1997, CD-ROM. [5] K.S. Chang, J. K. Hedrick, W. B. Zhang, P. Varaiya, M. Tomizuka, and S. Shladover, “Automated Highway System experi-ments in the PATH Program,” IVHS J., vol. 1, no. 1, pp. 63-87, 1993. [6] S. Kato, S. Tsugawa, K. Tokuda, T. Matsui and H. Fujii, “Vehicle Control Algorithms for Cooperative Driving With Automated Vehicles and Intervehicle Communications,” IEEE Trans. Intell. Transport. Syst., vol. 3, pp. 155-161, Sep. 2002. [7] Y. Ikemoto, Y. Hasegawa, T. Fukuda and K. Matsuda“Zipping and Weaving : Control of Vehicle Group Behavior in Non-signal Intersection,” Proc. IEEE Int. Conf. on RA, 2004, CD-ROM. [8] A. Bejan and R. Lawrence, “Peer-to-Peer Cooperative Driving,” 17th Int. Symp. Computer and Information Sciences, pp. 259-263. [9] A. Uno, T. Sakaguchi and S. Tsugawa, “A Merging Control Algorithm based on Inter-Vehicle Communication,” Proc. IEEE Conf. ITS, 1999, pp. 783-787. [10] Y. Kuramoto, “Chemical oscillations, waves, and turbulence,” Berlin, Springer-Verlag, 1984 [11] J. S. Kwak and J. H. Lee, “Infrared Transmission for Intervehicle Ranging and Vehicle-To Roadside Communication Systems Using Spread-Spectrum Technique,” IEEE Trans. Intell. Trans-port. Syst., vol. 5, pp. 12-19, Mar. 2004. [12] M. Akanegawa, Y. Tanaka and M. Nakagawa, “Basic Study on Traffic Infirmation System Using LED Traffic Lights,” IEEE Trans. Intell. Transport. Syst., vol. 2, pp. 197-203, Dec. 2001. [13] “Optical Transmitter and Receiver for Inter-vehicle Communi-cation,” OKI Technical Review, OKI Electric Industry co., Ltd. Homepage, http://www.oki.com/ . [14] E. Brown, P. Holmes, and J. Moehlis, “Globally coupled oscillator networks” Perspectives and Problems in Nonlinear Science: A Celebratory Volume in Honor of Larry Sirovich, E. Kaplan, J. Marsden, K. Sreenivasan, Eds., p. 183-215. Springer, New York, 2003

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

479

Co-creative Composition Using Multiagent Learning: Toward the Emergence of Musical Structure1 Shintaro Suzuki a,2 , Takeshi Takenaka a and Kanji Ueda a a Research into Artifacts, Center for Engineering (RACE), University of Tokyo Abstract. This paper presents Co-creative Composition, by which the musical note agent decides the next position while interacting with other agents in a musical sheet-like environment. The agent determines its action with reinforcement learning according to the sensory consonance theory. The generated chords are evaluated through comparison with music theory and through psychological experimentation. Results demonstrate that the generated chords were similar to those obtained through music theory, and show that the subjects felt their clarity of sound. Keywords. Co-creation, Reinforced Learning, Consonance, Multi-agent

1. Introduction >From the perspective of music cognition, we listen to the entire impression of multiple tones rather than to each tone in music. Each musical note is dependent on other notes and has a strong relation to them based on listeners’ cognitive features. Musical composition, therefore, is thought to be a design problem of these relations. Prior studies for understanding such human composition in cognitive psychology have employed analytical approaches and have represented attempts to create a human model of the composition process. Their approaches seem to be insufficient because processes of human composition have variety and depend on persons or situations. Consequently, we aim not to create a model, but to understand synthetic aspects of human composition that involve relationships between cognitive features and emergence of musical structure through solving the design problem. However, it is very difficult to design proper relations of tones because many layout methods of tones exist. A multi-agent approach has been applied to this complex system. We propose this Co-creative Composition system, in which music is generated through interaction of autonomous musical notes based on cognitive features. Musical note agents 1 Use

IOS-Book-Article.tmpl file as a template. to: Shintaro SUZUKI, Research into Artifacts, Center for Engineering (RACE), University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba. Tel.: 04 7136 4276; Fax: 04 7136 4276; E-mail: [email protected]. 2 Correspondence

480

S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning

learn suitable relationships with other agents through reinforced learning. In addition, we specifically examine sensory consonance, which is a basic feature of music cognition, as a relationship among agents. We generate chords (chord sequence) through simulation using our system and examine what kind of musical structure is emergent in them through comparison with music theory and by psychological experimentation. Some previous studies on automatic composition have mainly used a database of existing music and have reconstructed data with emergent-based calculation [3]. Prior studies have also examined automatic harmonization and automatic counterpoints [1] [2]. Such approaches seem to be unsuitable for our object – understanding synthetic aspects of human composition – because we particularly explore generation of musical phrases created by human, musical theory and composition technique themselves, which are given in advance in those approaches.

2. Multi-agent Simulation with Reinforcement Learning for Serial Chord Generation 2.1. Outline of simulation model Figure 1 shows the structure of a simulation model in the Co-creative Composition system. Musical note agents are in a musical-sheet-like environment. We use Q-learning in this study. The procedure is to repeat the following two-step loop: 1. Agents perceive the state of environment and select the action by discrete timestep 2. As a result of actions, agents are given a reward and update Q(s,a) according to the following function: Q(s, a) ← Q(s, a) + α[r + γ max  Q(s , a ) − Q(s, a)] a ∈A(s )

(1)

where s is the state, a is the action, α is the step size parameter, and γ is the discount rate. The action selection strategy is the Boltzmann strategy. The “state” is defined as the positional relationship of the other agents (how far the distance from the other agents is by half-tone). The “action” is defined as the distance from the present position to the next one of the agent (how far the agent moves up or down by half tone). The “reward” is the value of new position generated through the actions of agents. In the next section, we explain the process of evaluating chords. 2.2. Evaluation Method based on Sensory Consonance Theory We use the sensory consonance theory for evaluation of actions. This theory is the most widely supported one to explain the concept of consonance. The theory considers dissonance as roughness brought by the beat, which is decided by a pitch, a tone color, and a velocity [4]. In this study, we used the Sethares model for computation of dissonance [5]. Consonance was used originally for the relationship of two tones existing simultaneously (spatial consonance). In this study, however, we propose to extend the concept of consonance to the time dimension (temporal consonance). We consider short term memory (STM) during listening and set the range of memories that is equivalent to a

S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning

481

Figure 1. Structure of the simulation model.

certain fixed period from present to past in a time-pitch plane to realize this idea. Generated notes are located in this range. Dissonance between the present tone and the past one in the range is defined as temporal dissonance (Fig. 2). Music is art based on tempo. Therefore, it seems to be reasonable to employ STM. In this study, temporal consonance is applied only to past tones of each agent between memories because we conceptualize each agent’s motion as a melody and think of the generated music as a polyphonic music in which each melody is highly independent. Dissonance between two tones is evaluated based on the sensory consonance theory. Dissonance is converted into a consonance point using linear transformation. Spatial consonance R s or the temporal consonance R t is given by the average of the points between tones that are relevant to them. Using R s and Rt , the reward R is given as R = Rs × Rt

(2)

Figure 2. Model of short term memory during listening.

2.3. Experiment 1 2.3.1. Simulation Experiment: Outline Using the model explained in the preceding section, we conducted an experiment in four cases. In each case, we used different kinds of consonance to evaluate the actions. We verified the effects of temporal consonance through this experiment.

482

S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning

In this experiment, the model had three musical note agents. They produced decisions simultaneously. All actions had equal duration (length of crotchet). Therefore, generated chords consisted of three notes at a constant tempo. The task for the agents was to generate eight constant-paced chords. In experiment 1 on case 4, we attempted to generate consonance and melodious chords using this system. case 1 Spatial consonance and temporal consonance (the range of memories is equivalent to 16 tones (each tone is half of a crotchet)) were used. We call this case “Md” henceforth in this study. case 2 Spatial consonance and temporal consonance (the range of memories is equivalent to 1 tone (each is half of a crotchet)) was used. We call this case “Next”. case 3 Only spatial consonance was used. We call this case “notM”. case 4 Although consonance was the same as that in case 1, agents were prohibited from crossing with each other to make their actions more melodious. This additional condition to melodize chords was set as an application of gestalt theory to music. We call this case “notCross”. 2.3.2. Simulation Experiment: Results and Analyses In each case, we obtained 10 patterns (generated chords) of convergent results. An exemplary learning curve is shown in Fig. 3. It shows the average reward of three agents on the vertical axis and the number of trials on the horizontal axis. Figure 4 shows an example of the generated chords in case 1. Below, we describe the analysis of the generated chords.

Figure 3. An example of the learning curve in case 1.

Figures 6(a) and 6(b) show the frequency distribution of component tones of the generated chords in case 1 and case 3. Figure 6(a) shows that the generated chords in case 1 consist only of tones of a major scale (Fig. 5). On the other hand, Fig. 6(b) shows that generated chords in case 3 consist of all tones between 2 octaves. Component tones of the generated chords in case 2 almost correspond with tones of a major scale. Through this analysis, it turns out that we were able to obtain chords of a major scale using temporal consonance when the range of memory has appropriate length. This result suggests that humans created tonal music and certain kinds of folk music based on both spatial consonance and temporal consonance.

S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning

483

Figure 4. Score: An example of chords simulated in case 1.

Figure 5. Major scale on key of F.

(a) case 1.

(b) case 3.

Figure 6. Frequency distribution of component tones of generated chords

In case 4, component tones of the generated chords have identical characteristics to those in case 1. We successfully generated consonance and melodious chords using our system.

484

S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning

2.3.3. Psychological Experiment: Method We conducted a psychological experiment using an impression rating As stimuli, we used 25 chords generated through simulation (Md: 5, Next: 5, notM: 5, notCross: 5, random: 5). This experiment enrolled 18 participants as volunteers (17 males, 1 female, 22–26 years old). The task was to evaluate their impressions. We used a semantic differential technique in which the participants evaluated impressions according to 14 adjective pairs on a 5 point scale. 2.3.4. Psychological Experiment: Results and Analysis First, we obtained the averaged value of each stimulus type according to each evaluation item across all participants. The averaged value was defined as an evaluated value of the stimuli. We obtained them in each case (Md, Next, notM, notCross, Random) for each evaluated item. The average value was defined as the evaluated value of the generation type on the evaluated item. Figure 7(a) shows results of these data analyses.

(a) Radar of stimulus

(b) Mapping of stimulus.

Figure 7. Results of the psychological experiment

Furthermore, we did factor extraction to the evaluated value of each stimulus using factor analysis. Principal factor method was used for factor extraction. Orthogonal and varimax rotations were used for transformation of values. The two main factors were extracted from results of factor analysis. The first factor, called “sense of affinity”, comprised likability, naturalness, coherency and ease of remembrance. The second factor, called “sound clarity”, comprised lightness, brightness, beauty of sound and reduction in tension. Figure 7(b) shows that participants evaluated stimulus according to two main axes, thereby suggesting that participants had different impressions according to generation types. Chords using temporal consonance give participants the impression of "sound clarity" because the stimulus generated in the temporal consonance (Md, notCross) condition showed high scores on the vertical axis in the map.

S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning

485

2.4. Experiment 2 2.4.1. Simulation Experiment: Outline In experiment 2, we set a different rhythm pattern in each case. The model also has three musical note agents. They make decisions at different times. Actions of agents have different duration. Therefore, the generated chords consisted of three melody lines with different rhythm (time and duration). In both cases, the consonance used to evaluate the actions was identical to that in experiment 1 case 1.

Figure 8. Rhythm pattern in case 1.

case 1 The task for the agents was to generate chords according to the rhythm pattern we made (Fig. 8). We verified the effect of temporal consonance in chord generation with a rhythm pattern through this experiment. case 2 The task for the agents was to generate chords according to the rhythm pattern extracted from the opening part of Bach’s Dreistimmige Invensionen (Sinfonien) Sinfonie No. 14 B-dur. We verified the characteristics of generated chords compared with Bach’s original pattern. 2.4.2. Simulation Experiment: Result and Analysis In each case, we obtained 10 patterns (generated chords) of convergent results. In case 1, the result shows that generated chords consist of tones of a major scale. In the simulation with the rhythm pattern, the chords in a major scale were also generated as well as in the simulation with constant rhythm. This result also supports our idea that humans created tonal music based on both spatial consonance and temporal consonance. In case 2, compared with Bach’s original pattern, the generated chords have a lesssmooth melody and unnatural sound (Fig. 9). These strange characteristics are inferred to arise from the fact that we ignored some cognitive features about melody in our system. Consequently, evaluation based on some cognitive features about melody seems to be necessary to generate chords with a smooth and natural melody. 2.5. Summary This paper presented the Co-creative Composition, in which the musical note agent decides the next position interacting with other agents on the musical-sheet-like environment. Generated chords have the order that component tones were identical to a major scale from the perspective of musical theory and have the order of "sound clarity" at the aspect of human impression using temporal consonance. Therefore, we inferred the possibility that these orders had generated through structural determination based on sensory

486

S. Suzuki et al. / Co-Creative Composition Using Multiagent Learning

Figure 9. Example of generated pattern in case 2.

consonance. As shown in experiment 2 (case 2), some other evaluations were based on other cognitive features. They should be incorporated into our system to generate more musical chords that have various types of musical order.

References [1] K. Sugawara, R. Yoneda, T. Nishimoto and S. Sagayama, Automatic Harmonization for Melodies based on HMMs Including Note-chain Probability, ASJ 1-9-2 (2004), 665-666 [2] S. Nakagata, T. Nishimoto and S. Sagayama, Automatic Generation of Counterpoints Based on Dynamic Programming and Probability of Tone Series, MUS 56 (2004), 65-70 [3] K. Verbeurgt, M. Fayer and M. Dinolfo, A Hybrid Neural-Markov Approach for Learning to Compose Music by Example, Canadian AI 3060 (2004), 480-484 [4] R. Plomp and W.J.M. Levelt, Tonal Consonance and Critical Bandwidth, Journal of the Acoustical Society of America 38 (1965), 548-560 [5] W.A. Sethares, Local consonance and the relationship between timbre and scale, Journal of the Acoustical Society of America 94 Num. 3 (1993) 1218-122

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

487

Self-assembly of Mobile Robots: From Swarm-bot to Super-mechano Colony Roderich Groß a,b , Marco Dorigo a , and Masaki Yamakita b a IRIDIA, Université Libre de Bruxelles, Belgium b Yamakita Lab, Tokyo Institute of Technology, Japan Abstract. Up to now, only a few collective or modular robot systems have proven capable of letting separate and autonomous units, or groups of units, self-assemble. In each case, ad hoc control algorithms have been developed. The aim of this paper is to show that a control algorithm for autonomous self-assembly can be ported from a source multi-robot platform (i.e., the swarm-bot system) to a different target multirobot platform (i.e., a super-mechano colony system). Although there are substantial differences between the two robotic platforms, it is possible to qualitatively reproduce the functionality of the source platform on the target platform—the transfer neither requires modifications in the hardware nor an extensive redesign of the control. The results of a set of experiments demonstrate that a controller that was developed for the source platform lets robots of the target platform self-assemble with high reliability. Finally, we investigate mechanisms that control the patterns formed by autonomous self-assembly. Keywords. Self-assembly, self-reconfigurable robots, pattern formation, collective robotics, swarm robotics, super-mechano colony, swarm-bot, swarm intelligence

1. Introduction Self-assembly is a reversible processes by which discrete entities bind to each other without being directed externally. It may involve components from molecular scale (e.g., DNA strands forming a double helix) to planetary scale (e.g., weather systems) [23]. The study of self-assembling systems attracted much interest as it may lead to selfconstructing, self-repairing and self-replicating machines [21,26,11]. Self-assembly has been widely observed in social insects [1]. Insects physically connect to each other to form aggregate structures with capabilities exceeding those of an individual insect. Oecophylla longinoda worker ants, for example, form connected living bridges which other colony members traverse [15]. Understanding how insects use selfassembly to manipulate objects or to navigate all-terrain may have strong implications for robotic systems design. Self-reconfigurable robots [24,20] hold the potential to self-assemble and thus to mimic the complex behavior of social insects. In current implementations [18,24,20,

488

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

14], however, single modules usually have highly limited autonomous capabilites (when compared to an insect). Typically, they are not equipped with sensors to perceive the environment. Nor, typically, are they capable of autonomous motion. These limitations, common to most self-reconfigurable robotic systems, make it difficult to let separate modules, or groups of modules, connect autonomously. In some systems, self-assembly was demonstrated with the modules being pre-arranged at known positions [25,26]. Rare instances of less constrained self-assembly have been reported which are detailed in the following: • Fukuda et al. [9] demonstrated self-assembly among cells of the CEBOT system [10]. In an experiment, a cell approached and connected with another one. Communication among connected cells was studied to enable the group to approach and connect with additional cells [8]. • Bererton et al. [3] studied docking in the context of self-repair. One robot was equipped with a fork-lift mechanism to replace a component of its teammate. Successful docking was demonstrated for distances of up to 30 cm and angular displacements of up to 30◦ . Image processing was accomplished on an external PC. • Rubenstein et al. [19] demonstrated the ability of two modular robots to selfassemble. Each robot consisted of a chain of two linearly-linked CONRO modules [5]. The robot chains were set up at distances of 15 cm, facing each other with an angular displacement not larger than 45◦ . • White et al. [22], Griffith et al. [11], and Bishop et al. [4] studied simple, programmable parts capable of self-assembling in the context of self-replication and pattern formation. The parts slid passively on an air table and bound to each other upon random collisions. • Groß et al. [12] demonstrated self-assembly of 16 physical robots using the swarm-bot system [16,7]. The robots self-assembled from arbitrary initial positions and on different types of terrain. At present, swarm-bot is the state of the art in autonomous self-assembly. The aim of this paper is to show that a control algorithm for autonomous selfassembly can be ported from a source multi-robot platform to a different target multirobot platform. The source platform is the swarm-bot system [16,7]. The target platform is a super-mechano colony (SMC) system [6]. Although there are substantial differences between the two robotic platforms, it is possible to qualitatively reproduce the functionality of the source platform on the target platform. This requires neither modifications in the hardware nor an extensive redesign of the control. The paper is organized as follows. Section 2 overviews the hardware of the source and target platforms. Section 3 presents the control as implemented on the source platform, and details the transfer to the target platform. Section 4 presents the results of a set of experiments. Finally, Sections 5 and 6 discuss the results and conclude the paper.

2. Hardware 2.1. Source Platform (Swarm-bot) Fig. 1a shows an s-bot, the basic robotic component of the swarm-bot system [16,7, see also www.swarm-bots.org]. The s-bot has a diameter of 12 cm, a height of 19 cm

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

(a)

(b)

(c)

489

(d)

Figure 1. Swarm-bot: (a) the s-bot and (b) two s-bots connecting to each other. Super-mechano colony: (c) a child robot (top view) and (d) two child robots connecting to each other.

(including the transparent cylinder on top), and weighs approximately 700 g. The s-bot has nine degrees of freedom (DOF), all of which are rotational, including two DOF for the traction system, one DOF to rotate the s-bot’s upper part with respect to the lower part, one DOF for the grasping mechanism of the gripper (located in what we define to be the s-bot’s front), and one DOF for elevating the arm to which the gripper is attached. The robot’s traction system consists of a combination of tracks and two external wheels, called treels© . An s-bot can connect with another by grasping the connection ring (see Fig. 1b). The s-bot can receive connections on more than two thirds of its perimeter. For the purpose of robot to robot communication, the s-bot has eight RGB LEDs. The s-bot is equipped with a variety of sensors, including 15 proximity sensors distributed around the robot, two optical barriers integrated in the gripper, and a VGA omnidirectional vision system. The control is executed on an XScale board running a Linux operating system at 400 MHz. A battery provides full autonomy. 2.2. Target Platform (SMC) Super-mechano colony (SMC) [6,17] is a modular robotic concept composed of a single main body, called mother ship, and many child robots attached to it. Child robots are an integral part of the system’s locomotion. They can disband to accomplish separate, autonomous missions, and reconnect once the missions are accomplished. Furthermore, child robots have the potential to connect to each other. Fig. 1c shows the physical implementation of a child robot of an SMC system [6]. The robot has a diameter of 26 cm, a total height of 51 cm and weighs 11 kg. The child robot has five DOF, including two DOF for the traction system, one DOF to rotate the robots’ upper part with respect to the lower part, one DOF for elevating a manipulation arm (located in what we define to be the robot’s front), and one DOF to open and close a gripper that is attached to the manipulation arm. The traction system consists of two active wheels on the left and the right side, and two passive wheels in the front and the back. Each child robot is equipped with a coupling cylinder in its back that allows for receiving connections from a teammate (see Fig. 1d). A directional VGA stereo vision system is mounted on top of the robot. An additional camera is attached to the manipulation arm. The vision system can detect the relative position of the mark attached to the top of the coupling cylinder of another robot. The control is executed on an on-board Pentium MMX computer running a Microsoft Windows operating system at 233 MHz. A battery provides full autonomy. In the experiments presented in this paper we used an external power supply instead.

490

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

Algorithm 1 Self-assembly module for the source platform 1: set-color-ring(blue) 2: repeat 3: if risk-of-stagnation() then 4: set-traction-system-recovery() 5: end if 6: (i1 , i2 ) ← camera-feature-extraction() 7: (i3 , i4 ) ← proximity-sensor-readings() 8: (o1 , o2 , o3 ) ← neural-network(i1 , i2 , i3 , i4 ) 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20:

if (o3 > 0.5) ∧ grasping-requirements-fulfilled() then set-gripper(close) if successfully-connected() then set-color-ring(red) halt() else set-gripper(open) end if end if set-traction-system(o1, o2 ) until timeout-reached()

3. Control Transfer 3.1. Control of the Source Platform (Swarm-bot) Algorithm 1 details the control module for self-assembly. It is assumed that the process is seeded by the presence of a non-moving red object (e.g., the color ring of a robot). A reactive neural network (line 8) constitutes the principal control mechanism. The network takes as input the binary values i1 and i2 from the robot’s vision system (line 6) and the values i3 and i4 from the left-front and right-front robot’s proximity sensors (line 7). The network’s output (o1 , o2 , o3 ) is used to control the speed of the left and the right side of the traction system (line 19) and the connection mechanism (lines 10 to 18). Before being applied to the traction system, o1 and o2 are smoothed by a moving average function. By default, the tuple (i1 , i2 ) is assigned (0, 0). Any other assignment indicates the presence of a red object (in the front, or to the left or the right side). If an unconnected robot (i.e., a blue object) is perceived in between, i1 and i2 are set to zero. To avoid damage to the traction system and to improve the reliability of the control, a recovery move is launched if high torque is continuously present on the traction system (lines 3 to 5). During recovery, the s-bot moves backwards with a small lateral displacement. The neural network had been shaped by artificial evolution in a computer simulation [13] and subsequently transferred to the physical swarm-bot system [12]. 3.2. Transfer to the Target Platform (SMC) Algorithm 1 describes the control algorithm as it was developed for the source multirobot platform. In the following, we explain how the sensing and acting functions of

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

(a)

(b)

491

(c)

Figure 2. Self-assembly of two robots: influence of the initial orientation of the approaching robot. Examples of (a) initial and (b) final configurations. (c) Box-and-whisker plot [2] of the completion times (in s) grouped according to the initial orientation of the approaching robot (39 observations in total).

the source platform were realized on the target platform so that Algorithm 1 could be ported without any change. Some functions as, for instance, neural-network() remained identical (except for the time required for processing). Many other functions such as set-traction-system() could be transferred with minor modifications (e.g., by scaling the speed values to an appropriate range). In the following, we detail those functions which required a different implementation on the target platform to qualitatively reproduce the original function of the source platform. • risk-of-stagnation(): to detect the risk of stagnation of an s-bot, the torque on each side of the traction system was monitored. For the SMC child robot, we use the camera vision system instead. If there is the risk that the left side of the manipulation arm collides with another robot, the recovery move is executed.1 • proximity-sensor-readings(): as the target platform is not equipped with proximity sensors, we mimic virtual proximity sensors heading in the front-left and front-right directions by making use of the vision system. The reading values of the virtual sensors are computed based on the relative position to other robots. • grasping-requirements-fulfilled(), successfully-connected(): to test if the grasping requirements are fulfilled, the stereo vision system is used. The system allows for computing the relative position of the coupling cylinder. Consequently, no additional tests must be performed to validate the connection. • set-color-ring(): as the current prototype of the SMC system is not equipped with communication mechanisms other than wireless network, the robots do not signal their connection state. Therefore, each robot can receive connections at any time.

4. Experiments on the Target Platform (SMC) 4.1. Influence of Initial Orientation We examine the ability of a robot to approach and connect with a passive teammate (see Fig. 2). The two robots have identical hardware. The approaching robot is placed at a 1 Note

that the neural network lets the robot approach the object either straight, or by turning anti-clockwise. If the right side of the manipulation arm collides with the object, the neural network lets the robot retreat as a result of the high reading values from the front-right proximity sensor.

492

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

(a)

(b)

(c)

Figure 3. Self-assembly of two robots: influence of the approaching angle. Examples of (a) initial and (b) final configurations. (c) Box-and-whisker plot [2] of the completion times (in s) grouped according to the approaching angle (50 observations in total).

distance of 100 cm and orientation α with respect to its teammate. The latter is oriented so that its back with the coupling cylinder is heading towards the approaching robot. For each initial orientation α ∈ {0◦ , 90◦ , 180◦ , 270◦}, 10 repetitions are carried out, thus in total 40 trials are performed. If the robots have not established a connection within 300 s, the trial is stopped. In 39 out of 40 cases, the robots self-assembled successfully. Fig. 2c shows the observed completion times (in s). If no robot to approach is perceived, the neural network controller lets the robot turn anti-clockwise. This explains the differences in performance. Overall, it seems that the success rate does not depend on the initial orientation of the approaching robot. 4.2. Influence of Approaching Angle We examine the ability of a single robot to connect with a passive teammate when approaching it from different angles (see Fig. 3). Due to the mechanical design, the robot cannot connect with the coupling cylinder of the teammate from every angle. In fact, if the angular mismatch between the orientations of the two robots exceeds 85◦ , it is impossible to establish a connection. Therefore, potential approaching angles for a successful grasp are limited to the range [−85◦, 85◦ ]. For approaching angles in the range [−45◦ , 45◦ ], there should be no difference in the performance as the jaws of the gripper element are not likely to collide with the body of the teammate. The bigger the angular deviation, the more difficult gets the task. We study the approaching angles α ∈ {−75◦, −45◦, 0◦ , 45◦ , 75◦ }. Initially, the approaching robot is oriented towards the teammate. For each angle, ten repetitions are carried out, thus in total 50 trials are performed. If the robots have not established a connection within 300 s, the trial is stopped. In all 50 cases, the robots correctly self-assembled. Fig. 3c shows the observed completion times (in s). The fluctuations in performance are astonishingly low: all completion times are in the range [50, 63]. 4.3. Start Positions That Require to Turn Away or Retreat We examine the ability of two robots to self-assemble when their starting position and orientation are such that self-assembly is particularly difficult. To create such a situation, we take two robots forming a linear, connected chain and we generate the start positions

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

(a)

493

(b)

Figure 4. Sensor readings and actuator commands over time (in s) for the two difficult initial arrangements: translation of the approaching robot (a) to the left and (b) to the right.

from this situation via a translation of the grasping robot for 10 cm to either the left or the right side. These start positions oblige the grasping robots to turn away or retreat before approaching the target. In fact, aligning the robot on the spot in the direction of the target would result in a collision between one side of the manipulation arm and the coupling cylinder. The robots correctly self-assembled in both situations. Figs. 4a and 4b show the corresponding sensor readings and actuator commands as monitored at the end of each control cycle for the whole duration of the trial. In the first case (see Fig. 4a), the entire situation was handled by the neural network that caused the robot to retreat. In the second case (see Fig. 4b), instead, a recovery move was launched during three control cycles (at time 0, 2, and 7 s).

5. Discussion We have shown that a control algorithm for autonomous self-assembly can be ported from a source multi-robot platform (i.e., the swarm-bot system) to a different target multi-robot platform (i.e., a super-mechano colony system). A set of experiments demonstrated the validity of the approach. The robots selfassembled reliably—in 91 out of 92 trials the robots correctly established a connection. A robot can approach another and connect to it from a wide range of angles (at least 150◦ ).2 The initial orientation of the approaching robot does not affect the reliability. In addition, the control proved capable of dealing with two particular challenging arrangements in which the approaching robot is required to retreat. All experiments have been recorded on video tape (see http://iridia.ulb.ac.be/~rgross/ias2006/). The success can be partially attributed to some design choices made when developing the control for the source platform. First, the use of sensory feedback was reduced to a small set of input variables that appeared to be indispensable to perform the task. For instance, the perceptual range of the robot vision was limited substantially. Consequently, no major difficulties arose when porting the control on the target platform which 2 Note

that also for the source platform the range of possible approaching angles is limited. When starting from outside this range, the approaching robot can search for another position or the target robot may change its orientation.

494

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

(a)

(b)

Figure 5. A group of four robots self-assembling: (a) initial configuration and (b) final configuration reached after 475 s. Once a robot had completed its task, a visual mark was manually attached to the coupling cylinder.

(a)

(b)

(c)

Figure 6. Pattern formation: a group of four robots self-assembles starting from a specific initial arrangement, as shown in (a). Depending on the type of visual mark of the robot seeding the process (i.e., the robot on the right side in the figures), different patterns emerge. The final configurations shown in (b) and (c) were reached after 102 s and 207 s, respectively. During the experiments, there was no human intervention.

was equipped with directional vision only. Second, a simple neural network controller was chosen to compute the motor commands based on the sensory feedback. The neural network controller proved robust with respect to changes in the hardware as well as in the provided control functions. An open issue to address is the problem of scalability. To enable tens or more SMC child robots to self-assemble, we believe that it is beneficial if each robot can signal whether it is connected or not (as it is the case on the swarm-bot platform). Although it is possible to mimic such a function using the existing actuators of the target platform, it might be more appropriate to equip the robot with a communication mechanism (e.g., a controllable LED). To illustrate the use of such a mechanism, we conducted a preliminary set of experiments. Fig. 5 shows a group of four robots self-assembling. In this experiment, the (visual) marks on the top of the coupling cylinder of each robot were attached manually as soon as the robot completed its task. In a second experiment, we adjusted the type of visual mark of the robot seeding the process prior to experimentation. It was shown that depending on the visual mark present, distinct patterns emerged (see Fig. 6). Video recordings are available at http://iridia.ulb.ac.be/~rgross/ias2006/.

6. Conclusions The aim of this study was to show that a control algorithm for autonomous self-assembly can be ported from a source multi-robot platform (i.e., the swarm-bot system) to a different target multi-robot platform (i.e., a super-mechano colony system) if the control

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

495

algorithm is designed adequately. Although there were substantial differences between the two robotic platforms, we showed that it is possible to qualitatively reproduce the basic functionality of the source platform on the target platform. In fact, the transfer did neither require modifications in the hardware nor an extensive redesign of the control. The task to let two mobile robots of identical hardware self-assemble starting from random position had so far been demonstrated only with the CONRO system [19] and the swarm-bot [12]. The results presented in this paper demonstrate that the control ported from the swarm-bot to the super-mechano colony lets the robots accomplish the task, while preserving high reliability and low fluctuations in performance. This suggests that the control algorithm is based on some generic principles for the design of self-assembling systems. Finally, we presented a first attempt to control the patterns that are formed by autonomous self-assembly. An interesting research direction will be the study of mechanisms that lead to patterns that have a functional value for the robotic swarm.

Acknowledgments This work was accomplished during the visit of Roderich Groß to the Tokyo Institute of Technology in 2005, supported by the Japan Society for the Promotion of Science (JSPS). Marco Dorigo acknowledges support from the Belgian FNRS, of which he is a Research Director, and from the “ANTS” project, an “Action de Recherche Concertée” funded by the Scientific Research Directorate of the French Community of Belgium.

References [1] C. Anderson, G. Theraulaz, and J.-L. Deneubourg. Self-assemblages in insect societies. Insect. Soc., 49(2):99–110, 2002. [2] R. A. Becker, J. M. Chambers, and A. R. Wilks. The New S Language. A Programming Environment for Data Analysis and Graphics. Chapman & Hall, London, UK, 1988. [3] C. A. Bererton and P. K. Khosla. Towards a team of robots with repair capabilities: a visual docking system. In Proc. of the 7th Int. Symp. on Experimental Robotics, volume 271 of Lecture Notes in Control and Information Sciences, pages 333–342. Springer Verlag, Berlin, Germany, 2000. [4] J. Bishop, S. Burden, E. Klavins, R. Kreisberg, W. Malone, N. Napp, and T. Nguyen. Programmable parts: A demonstration of the grammatical approach to self-organization. In Proc. of the 2005 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 2644–2651. IEEE Computer Society Press, Los Alamitos, CA, 2005. [5] A. Castano, A. Behar, and P. M. Will. The CONRO modules for reconfigurable robots. IEEE/ASME Trans. Mechatron., 7(4):403–409, 2002. [6] R. Damoto, A. Kawakami, and S. Hirose. Study of super-mechano colony: concept and basic experimental set-up. Adv. Robots, 15(4):391–408, 2001. [7] M. Dorigo, V. Trianni, E. Sahin, ¸ R. Groß, T. H. Labella, G. Baldassarre, S. Nolfi, J.-L. Deneubourg, F. Mondada, D. Floreano, and L. M. Gambardella. Evolving self-organizing behaviors for a Swarm-Bot. Auton. Robots, 17(2–3):223–245, 2004. [8] T. Fukuda, M. Buss, H. Hosokai, and Y. Kawauchi. Cell structured robotic system CEBOT— control, planning and communication. In Proc. of the 2nd Int. Conf. on Intelligent Autonomous Systems, volume 2, pages 661–671. IOS Press, Amsterdam, The Netherlands, 1989.

496

R. Groß et al. / Self-Assembly of Mobile Robots: From Swarm-Bot to Super-Mechano Colony

[9] T. Fukuda, S. Nakagawa, Y. Kawauchi, and M. Buss. Self organizing robots based on cell structures - CEBOT. In Proc. of the 1988 IEEE/RSJ Int. Workshop on Intelligent Robots and Systems, pages 145–150. IEEE Computer Society Press, Los Alamitos, CA, 1988. [10] T. Fukuda and T. Ueyama. Cellular Robotics and Micro Robotic Systems. World Scientific Publishing, London, UK, 1994. [11] S. Griffith, D. Goldwater, and J. M. Jacobson. Self-replication from random parts. Nature, 437(7059):636, 2005. [12] R. Groß, M. Bonani, F. Mondada, and M. Dorigo. Autonomous self-assembly in a swarmbot. In Proc. of the 3rd Int. Symp. on Autonomous Minirobots for Research and Edutainment (AMiRE 2005), pages 314–322. Springer Verlag, Berlin, Germany, 2006. [13] R. Groß and M. Dorigo. Group transport of an object to a target that only some group members may sense. In Proc. of the 8th Int. Conf. on Parallel Problem Solving from Nature, volume 3242 of Lecture Notes in Computer Science, pages 852–861. Springer Verlag, Berlin, Germany, 2004. [14] M. W. Jørgensen, E. H. Østergaard, and H. H. Lund. Modular ATRON: Modules for a selfreconfigurable robot. In Proc. of the 2004 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, volume 2, pages 2068–2073. IEEE Computer Society Press, Los Alamitos, CA, 2004. [15] A. Lioni, C. Sauwens, G. Theraulaz, and J.-L. Deneubourg. Chain formation in Oecophylla longinoda. J. Insect Behav., 14(5):679–696, 2001. [16] F. Mondada, L. M. Gambardella, D. Floreano, S. Nolfi, J.-L. Deneubourg, and M. Dorigo. The cooperation of swarm-bots: Physical interactions in collective robotics. IEEE Robot. Automat. Mag., 12(2):21–28, 2005. [17] K. Motomura, A. Kawakami, and S. Hirose. Development of arm equipped single wheel rover: Effective arm-posture-based steering method. Auton. Robots, 18(2):215–229, 2005. [18] S. Murata, E. Yoshida, A. Kamimura, H. Kurokawa, K. Tomita, and S. Kokaji. M-TRAN: Self-reconfigurable modular robotic system. IEEE/ASME Trans. Mechatron., 7(4):431–441, 2002. [19] M. Rubenstein, K. Payne, P. Will, and W.-M. Shen. Docking among independent and autonomous CONRO self-reconfigurable robots. In Proc. of the 2004 IEEE Int. Conf. on Robotics and Automation, volume 3, pages 2877–2882. IEEE Computer Society Press, Los Alamitos, CA, 2004. [20] D. Rus, Z. Butler, K. Kotay, and M. Vona. Self-reconfiguring robots. Commun. ACM, 45(3):39–45, 2002. [21] J. von Neumann. Theory of Self-Reproducing Automata. Univ. Illinois Press, Urbana, IL, 1966. [22] P. J. White, K. Kopanski, and H. Lipson. Stochastic self-reconfigurable cellular robotics. In Proc. of the 2004 IEEE Int. Conf. on Robotics and Automation, volume 3, pages 2888–2893. IEEE Computer Society Press, Los Alamitos, CA, 2004. [23] G. M. Whitesides and B. Grzybowski. Self-assembly at all scales. Science, 295(5564):2418– 2421, 2002. [24] M. Yim, Y. Zhang, and D. Duff. Modular robots. IEEE Spectr., 39(2):30–34, 2002. [25] M. Yim, Y. Zhang, K. Roufas, D. Duff, and C. Eldershaw. Connecting and disconnecting for chain self-reconfiguration with PolyBot. IEEE/ASME Trans. Mechatron., 7(4):442–451, 2002. [26] V. Zykov, E. Mytilinaios, B. Adams, and H. Lipson. Self-reproducing machines. Nature, 435(7039):163, 2005.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

497

Lot Release Control Using Genetics Based Machine Learning in a Semiconductor Manufacturing System , Nobutada FUJII b Kanji UEDA b and Motohiro KOBAYASHI c a School of Engineering, The University of Tokyo b Research into Artifacts, Center for Engineering, The University of Tokyo c Sony Corporation

Ryohei TAKASU

a,1

Abstract. This paper proposes a new lot release control method employing genetics based machine learning (GBML) in a semiconductor manufacturing system. A scheduling agent with a classifier system (CS) that is a GBML system is introduced to generate suitable rules to decide the timing to release new products onto a production floor according to the distribution of work in process (WIP) number. Feasibility of the proposed lot release control method is assessed based on results of computer simulations modeling a multi-product semiconductor manufacturing system. Comparisons with existing lot release control methods in terms of throughput, turn around time (TAT) and WIP were performed. Analyses of lot release rules that were acquired by the scheduling agent were also performed. Keywords. Lot release control, Reinforcement learning, Semiconductor manufacturing system,

1. Introduction Because of diversification of consumer needs and intensification of international competition, semiconductor manufacturers must make huge investments and keep pace to meet sharp demand fluctuations. Therefore, semiconductor manufacturing systems must realize efficient facility utilization and flexibility for production of widely various products and work volumes. A semiconductor manufacturing system is a representative example of complex manufacturing systems because it usually comprises hundreds of machines. A semiconductor product involves a complex and reentrant sequence of more than hundreds of processing steps. Figure 1 shows an example of the product ’s process flow. In Figure 1, named nodes represent the kind of processing step and arcs represent the input-output relation between processing steps. The huge problem size and the complex process relationship imply vastly numerous combinations. Moreover, the objective function is dynamic, not static. It is important 1 Correspondence to: Ryohei TAKASU, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8568, Japan. Tel.: +81 04 7136 4271; Fax: +81 04 7136 4271; E-mail: [email protected]

498

R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System

Figure 1. Example of process flow

to plan a proper production schedule to achieve high productivity and low cost in the production system. However, specific properties of semiconductor manufacturing render it difficult to determine an optimal solution using traditional static optimization. This study mainly addresses lot release control problems in semiconductor manufacturing. Some lot-release control methods have been proposed: constant work in process number (CONWIP) [1], workload regulation (WR) [2], and starvation avoidance (SA) [3]. The CONWIP manages product-dispatching timing to stabilize the work in process (WIP) throughout the production floor or on a part of it. In turn, WR regulates productrelease timing depending on the WIP number for the whole production floor to maintain the workload of the bottleneck facility at a certain level. Products are released by SA so that the WIP number of a bottlenecked facility can maintain a proper level. Current proposed lot release control methods are reportedly effective and can determine a proper schedule to realize a short turn around time (TAT) of products on a static production floor. However, it is difficult to determine the premeditated static threshold of the scheduling criteria in a large-scale and complex production floor. Moreover, even if a suitable threshold value is found, the threshold value is no more effective in cases of a production situation change in a turbulent semiconductor-manufacturing environment. A new method must be developed that can produce a proper schedule without using a static predetermined threshold of criteria. This study adopts a biological manufacturing systems approach (BMS) [4], and proposes to employ genetics based machine learning (GBML) to control lot release timing and thereby realize adaptive lot release control without premeditated thresholds in the scheduling criteria.

2. Lot Release Control Using GBML 2.1. GBML Based Scheduling A lot release scheduling agent should have a learning classifier system [5], which is one GBML method, to decide the product dispatching timing that is introduced to the production floor. It is expected that the proposed scheduling agent can plan a proper lot release schedule without premeditated thresholds because the scheduling agent can find the threshold by itself through trial-and-error learning. Because of learning characteristics, it is also expected that the proposed scheduling method is applicable to large-scale scheduling problems and can adapt properly to dynamic fluctuation on the production floor.

R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System

499

Figure 2. Lot release scheduling agent and the production floor

Figure 3. Configuration of lot release scheduling agent

Figure 4. WIP conversion into a 2-bit binary string

Figure 2 depicts the proposed system using the scheduling agent with a classifier system. The input to the agent is the state of the production floor, i.e. the WIP number of each machine group that consists of alternative machines. On the other hand, the output from the agent is the action: lot release. The agent action is evaluated using the evaluation criteria. Consequently, the agent obtains a suitable state-action chain. The scheduling agent is modeled using the classifier system (see Figure 3). The classifier system consists of fashioned n-bit production rules. In this paper, the condition part of the rule consists of a 2m-bit binary string that represents the WIP number of m machine groups. Condition {10} means that WIP is larger than twice the number of machines in the machine group. Condition {11} represents that WIP is less than twice, but more than the machine number, condition {01} means that WIP is less than the machine number, and condition {00} means that no product exists in the machine group (see Figure 4). Only the condition-part classifiers consist of a "do not care" mark

500

R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System

(#) that can match both 0 and 1 so that generalization of rules occurs. The action part consists of an n-bit binary string in the case that n kinds of product are produced. Action {1} means that it dispatches one lot of each product; action {0} signifies that it dispatches no lot. The scheduling agent ’s objective is to obtain a rule to fulfill high productivity and a preset throughput rate among product kinds. The scheduling agent ’s action is evaluated as the following: the scheduling agent is rewarded if a finished product keeps its due-date; the scheduling agent is penalized if not. Credit values are apportioned among classifiers using the bucket brigade algorithm. The action is also evaluated for the throughput rate among product kinds: the scheduling agent receives a reward if the throughput rate meets or exceeds the target throughput rate; otherwise, it is penalized. The credit values are assigned among classifiers using a profit sharing plan (PSP) [6]. Genetic algorithms [7] are used to create new classifiers to maintain various classifiers and to avoid entrapment in a locally minimal solution. 2.2. Scheduling on the Production Floor The production floor scheduling problem must be solved properly. A dynamic scheduling method using self-organization [8] solves the scheduling problem. In self-organization based scheduling, production proceeds through local interaction among machines and AGVs transporting a product without global control. A potential field, which is a superposition of attraction and repulsion fields, is employed to realize interaction on the production floor.

3. Simulation Results and Discussion 3.1. Simulation Setting Computer simulations are performed to verify the feasibility of the proposed lot-release control method using the classifier system. Figure 5 illustrates the modeled production floor. One scheduling agent, which determines whether to release new lot, is introduced onto it. The production floor has 40 processing machines, which are classified into 22 kinds as alternative machine groups. The number of AGVs is set to four. Six stockers are also introduced to stock the products waiting for the next process. This system has one product dispatching point and one collection point.

Figure 5. Production floor model

R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System

501

Two product types are produced on a production floor that has approximately 1/4 length of the process steps of the actual process flow: TYPE1 has 90 production steps and its total production time is 1.95 (day); TYPE2 has 60 steps and 2.49 (day) production time. The respective due dates of products are set to about twice their production times. These data are listed in Table 1. As a consequence, classifiers that constitute lot-release rules that have been acquired by the scheduling agent have 44-bit input condition strings to detect situations of machine groups and 2-bit output action strings to dispatch each product. The target throughput rate is set as TYPE1 : TYPE2 = 1:1. The simulation runs for 15 trials; each trial has a 1000-day simulation. The trial is stopped and moved to the next situation if irrelevant dispatching fills the floor. 3.2. Simulation Results Figure reffig:wip shows the respective transitions for 50 days of (a) average WIP number, (b) daily throughput, and (c) average TAT throughout the simulation trials. The horTable 1. Process data of products: TYPE1 and TYPE2 TYPE1 Process number Process time [day] Due date [day]

TYPE2

90

60

1.95 4

2.49 5

(a) WIP

(b) Throughput

(c) TAT Figure 6. Transitions of (a) WIP, (b) throughput, and (c) TAT

502

R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System

Table 2. Average of WIP, throughput and TAT in the final 50 days TYPE1

TYPE2

WIP Throughput [lot/day]

15.5 5.46

17.2 5.50

TAT [day]

2.82

3.11

izontal axis indicates simulation steps; p_q represents qth day in pth trial. In the initial trials, the WIP number achieves wrong and fluctuating values because of overloading caused by random lot release. After the scheduling agent learns, it accomplishes a better and steady value at the 15th trial. This simulation takes about one day to run. Table 2 shows the average WIP number, daily throughput and average TAT during the last 50 days at the 15th trial. It is readily apparent that the throughput rate of two product types accomplishes almost 1:1 and that the average TAT of both products is before each due date. The availability of the machine with highest value also obtains 96.9%. The result clarifies that the obtained scheduling rules can achieve not only a target throughput rate, but also high productivity on the production floor. 3.3. Comparison with Other Lot Release Control Methods We compared this system with two other lot release control methods to verify its performance. The first one is Uniform (UNIF), which dispatches lots at certain intervals. The second one is Constant WIP (CONWIP).We maintained WIP numbers at set values. The threshold criteria for respective methods were determined a priori by preliminary simulations with consideration of the result of the proposed method and were set to indicate the best performance with each methods. Table 3 shows results of comparison in terms of average WIP number, daily throughput, and average TAT. As shown in Table 3, the proposed lot release control method can obtain approximately equal performance to UNIF and CONWIP. Whereas current methods, including UNIF and CONWIP, must be preset with appropriate criteria thresholds, the proposed method can acquire lot release rules autonomously by capturing the changes of the production floor ’s condition. Therefore, its effectiveness is expected to be related to the fluctuation of the production environment such as the change of demand throughput rate. Table 3. Comparison with other methods WIP

Throughput

TAT

TYPE1

TYPE2

TYPE1

TYPE2

TYPE1

TYPE2

Proposed

15.54

17.24

5.46

5.50

2.82

3.11

UNIF

14.52

16.12

5.56

5.48

2.61

2.91

15

17

5.56

5.72

2.71

2.97

CONWIP

3.4. Analysis of Acquired Rules s conditions are We analyzed acquired dispatching rules to verify that the machine group ’ determinative to the scheduling agent ’s action. Figure 7 shows all executed rules during

R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System

503

Figure 7. Executed classifiers in the final 5 days

Figure 8. Difference of machine group condition by action

the final 5 days of the simulation. Here (2m − 1)th and 2mth locus in the condition part of each classifier represent the WIP of machine group No.m and nth locus in the action part represent whether to release a new TYPEn lot to the production floor. In the final 5 days, we observed that 10 classifiers were executed. Their action parts were action11, which means dispatching one TYPE1 lot and one TYPE2 lot, and action00, which means dispatching no lot. The condition part include numerous "do not care" marks, so we can only slightly detect which locus dictates action according only to Figure 7. Figure 8 shows the difference of frequency caused by difference between actions on each condition of machine groups. The horizontal axis indicates the number of the machine group; the vertical axis shows the difference of frequency of each condition. We can infer that machine groups that show the largest difference of frequency are determinative to the timing to switch actions. For instance, focusing on the No. 21 machine group, action{11} tended to be executed if its condition was {11}; and action{00} tended to be executed if its condition was {10}. Therefore, it is presumable that the condition change of the No. 21 machine group engenders the action change of the scheduling agent. Consideration of the difference of frequencies in Figure 8 with the locus of a donot-care mark in Figure 7, confirms machine groups No. 1, No. 13, No. 21, and so on as determinative. These machine groups are characteristic for machines with high availability, no alternative machines, and machines with much accessed time by product. Consequently, the scheduling agent realizes the production with high productivity and fulfillment of demands through detecting the condition change of these machine groups and obtains dispatching rules that correspond to their condition changes.

504

R. Takasu et al. / Lot Release Control Using GBML in a Semiconductor Manufacturing System

4. Concluding Remarks This paper presented a new approach to lot release control method using GBML in a semiconductor manufacturing system. In this approach, a scheduling agent acquires proper dispatching rules autonomously using a classifier system without premeditated criteria thresholds, which current scheduling methods require. Computer simulations were performed using data extracted from an actual semiconductor fabrication to verify the feasibility of the proposed method. Simulation results show that the proposed scheduling agent can achieve high productivity and fulfillment of demands without criteria thresholds. Comparison with other methods showed that its performance is approximately equal to other lot release control methods such as UNIF and CONWIP in terms of WIP number, throughput and TAT. By analysis of acquired rules, we demonstrated that the scheduling agent obtained the proper dispatching rules without focusing on machine groups with specific characteristics. For future studies, the adaptability of the proposed method can be verified by dealing with more difficult problems: manufacturing systems with more product types, larger scale manufacturing systems, and manufacturing systems that introduce fluctuation of production demands.

Acknowledgements This study was supported in part by Grants-in-Aid for Young Scientists (B), 17760326, and for Scientific Research on Priority Areas (2), 16016228 from the Ministry of Education, Culture, Sports, Science and Technology.

References [1] Spearman, M.L.; Woodruff, D.L. & Hopp, W.J. (1990). CONWIP: a pull alternative to kanban, International Journal of Production Research, Vol.28, No.5, pp. 879-894 [2] Wein, L.M. (1988). Scheduling Semiconductor Wafer Fabrication, IEEE Transactions on Semiconductor Manufacturing, Vol.1, No.3, pp. 115-130 [3] Glassey, C.R. & Resende, M.G.C. (1988). Closed-Loop Job Release Control for VLSI Circuit Manufacturing, IEEE Transactions on Semiconductor Manufacturing, Vol.1, No.1, pp. 36-46 [4] Ueda, K. (1992). A concept for bionic manufacturing systems based on DNA-type information, Proceedings of 8th International PROLAMAT Conference, IFIP, pp. 853-863 [5] Holland, J.H.; Holyoak, K.J.; Nisbett, R.E. & Thagard, P.R. (1985). Induction-process of inference, learning and discovery, The MIT Press [6] Grefennstette, J.J. (1988). Credit Assignment in Rule Discovery System Based on Genetic Algorithms, Machine Learning, Vol.3, pp. 225-245 [7] Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization & Machine Learning, Addison Wesley [8] Vaario, J. & Ueda, K. (1998). An emergent modeling method for dynamic scheduling, Journal of Intelligent Manufacturing, Vol.9, pp. 129-140

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

505

Design of an AGV Transportation System by Considering Management Model in an ACT Satoshi Hoshino a,1 , Jun Ota a , Akiko Shinozaki b , and Hideki Hashimoto b a Dept. of Precision Engineering, The University of Tokyo, Japan b Mitsubishi Heavy Industries, LTD. Abstract. This paper focuses on the design of a highly efficient Automated Guided Vehicle (AGV) transportation system. Therefore, we attempt to construct detailed system management models that include agent cooperation, container storage scheduling, and container transportation planning. We then optimally design systems that are constructed with the use of these management models. The systems are compared to evaluate their operation efficiency based on their total construction cost. Finally, we describe a highly efficient management model for an AGV transportation system and discuss the need to consider an efficient system management methodology. Keywords. Design, AGV transportation system, management model, ACT,

1. Introduction The amount of container trade, especially in Asia, has increased significantly in recent years [1]. Following this trend, several studies have analyzed automation from various viewpoints [2,3]. In this paper, we consider the machines as the operational agents that operate in the automated container terminal (ACT) for unloading/loading, transportation, and storage. We then design a highly efficient automated transportation system in which the agents operate. For this purpose, it is necessary to consider the following problems: (I) optimal design of a transportation system [4], (II) comparison, evaluation, and analysis of a transportation system [5,6], and (III) efficient management of a transportation system. For problem (I), we have proposed a hybrid design methodology for an automated guided vehicle (AGV) transportation system with the use of a queuing network theory and a transportation simulator [4]. For problem (II), we have compared and evaluated the characteristics of the layouts of the transportation systems, such as vertical and horizontal ones [5]. On the other hand, as an integrated study of problems (I) and (II), we have evaluated and analyzed the total construction cost of the transportation system on the basis of the comparison of the optimal design results. A comparison with the vertical system that has been used in conventional ACTs indicated that the horizontal system is 1 Correspondence to: The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656 JAPAN. Tel.: +81 3 5841 6486; Fax: +81 3 5841 6487; E-mail: [email protected].

506

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

Figure 1. Actual Horizontal AGV Transportation System in an Automated Container Terminal (ACT)

more efficient [6]. Therefore, in this study, we focus on the horizontal AGV transportation system shown in Fig.1; we then design a highly efficient transportation system while considering an efficient management methodology to overcome the existing problem (II I). In a conventional study for ACT management, Hartmann has introduced an approach to generate scenarios of operations [7]. The generator is based on a large number of parameters. He then outlined the parameters that are important to produce realistic scenarios of high practical relevance. This study, however, focuses mainly on scenarios in a quay-side operation. As for yard-side container transportation and storage operations, he focused on the transportation destination of a container. Therefore, for this study, we constructed yard-side management models. For this purpose, we define the management models as follows. • Agent cooperation between heterogeneous multi-agent. • Container storage scheduling at a storage location. • Container transportation planning in a container storage area. After that, we implement the constructed models to a horizontal AGV transportation system and then derive the numbers of AGVs and rubber-tired gantry cranes (RTGCs) for various transportation demands by means of the proposed design methodology [4]. Based on the derived design parameters, we compare the construction costs of each transportation system and evaluate the influence of the management model on operation efficiency. Finally, we describe a highly efficient management model and discuss the need to consider an efficient management methodology.

2. AGV Transportation System in an ACT 2.1. Transportation System and Layout Settings With regard to the layout of a transportation system for an optimal design, the horizontal AGV transportation system, shown in Fig.1, is divided into three parts and four areas, namely, a quay, two means of transportation, and a container storage area. A number of deployed quay container cranes (QCCs) is given as three, and it is not a design objective due to a space constraint. Here, container transportation orders are equally generated by

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

507

the three QCCs in the quay area. In the container storage area, the container storage locations show the temporal container storage space between the sea and land. Two RTGCs are used at each location. There is one work path adjacent to each location. Each location has a container storage capacity of 320 (4 Rows × 20 Bays × 4 Tiers). Let us define the capacity of one storage container is a 20-foot equivalent unit (TEU). The operation specifications of the AGV, RTGC, and QCC are based on the cited literature [8]. 2.2. Problem Description In addition to the numbers of AGVs and RTGCs, the management models, i.e. agent cooperation, container storage scheduling, and container transportation planning, are the design objectives. The detailed design process of the number of AGVs and RTGCs are discussed in the cited literature [4]. The derived numbers of AGVs and RTGCs in the designed system are the evaluation factors. Concerning storage constraints, the final storage rate at each location must not exceed the given limits. This constraint is enacted by the actual ACT field because effective land use is a serious issue. 2.3. Transportation Procedure The AGVs circularly transport containers in the system shown in Fig.1 while cooperating with the RTGCs as follows. step 1. A QCC loads a container to an AGV from a container ship. step 2. A container storage location is assigned as the destination location of the container. step 3. The AGV begins to transport the container to the assigned location. step 4. The AGV goes into the work path and then arrives at a container transfer point. step 5. The AGV begins to transfer the container to an RTGC. step 6. The RTGC stores it at the storage point and waits for the next order. step 7. The AGV that has already transferred the container to the RTGC goes back to a QCC. Back to step 1.

3. Agent Cooperation 3.1. Framework of Agent Cooperation In conventional studies, Yoshimura et al. and Ota et al. have focused on agent cooperation, such as handing over and transporting objects with the use of homogeneous multiagent [9,10]. On the other hand, since we deal with heterogeneous multi-agent, it is impossible to realize highly efficient agent cooperation using a uniform agent-behavior strategy for the various types of agents. Therefore, we focus on and design the following agent cooperation: (i) the RTGC selection method by AGV (distance-based RTGC selection and workspace-based RTGC selection), and (ii) RTGC selection and call-out timing by AGV (immediately after the AGV arrives at the destination point on the work path; immediately after the AGV goes onto the work path).

508

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model Workspace for RTGC_A RTGC_A

Workspace for RTGC_B

Destination point of AGV_A

RTGC_B

AGV_A

Communicating/Sensing/Selecting AGV on the working path

AGV_B

Sensing/Selecting AGV at the destination point

Figure 2. Cooperation Between the AGVs and RTGCs (top view)

3.2. Construction of Agent Cooperation 3.2.1. Distance-based RTGC selection at the destination point As shown by the dotted line in Fig.2, the AGV_A measures the relative distance of each RTGC at the location when the AGV_A arrives at the destination point. If two RTGCs are idling, RTGC_B, which is nearer the AGV_A than RTGC_A, will be selected and identified as the transfer partner. However, if the RTGC_B (a) has already been selected by another AGV or (b) is in a transfer or storage operation, the RTGC_A is selected as the transfer partner. If both RTGCs are in the state of (a) or (b), the AGV_A needs to wait at the destination point. 3.2.2. Workspace-based RTGC selection at the destination point The RTGCs have their own assigned workspace and operate in each workspace. Therefore, as shown in Fig.2, the AGV_A selects and calls the RTGC_B that operates in the assigned workspace. For this study, an equal scale of workspace was given to each of the RTGCs at the location. In other words, each RTGC has the following workspace: 4 Rows × 10 Bays × 4 Tiers. 3.2.3. Distance-based RTGC selection on the work path As shown by the solid line in Fig.2, immediately after the AGV_A goes onto the work path, the AGV_A begins to communicate with each RTGC to collect situational information and measure the relative distance for the AGV_A’s destination point; it then selects the RTGC_B as the transferring partner. The selected RTGC_B begins to move while the AGV_A is transporting. As in the case described in 3.2.1, the AGV has to select the more distant RTGC even if another RTGC is near the AGV’s destination point. This occurs even if the nearby RTGC is in the state of (a) or (b). In addition, the increase in the workload on the work path causes an increase in the number of AGVs that have already been communicating with the RTGCs as the number of AGVs increases. When this increase occurs, the rear AGV does not begin to communicate with the RTGC until the lead AGV finishes communicating with it.

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

Workspace of RTGC_B

Workspace of RTGC_A RTGC_A

509

RTGC_B J

F

L

C B

D

G

A

K

H

E

I

[Initial orders] Sequence 1 2 3 4 5 6 7 8 9 10 11 12 J B D F E C L K I A H G Task [Scheduled orders] Sequence 1 2 3 4 5 6 7 8 9 10 11 12 A B C D E F G H I J K L Task [Rescheduled orders] Sequence 1 2 3 4 5 6 7 8 9 10 11 12 Task A H B C I J D K E F G L

Figure 3. Example of Container Storage Scheduling (top view)

3.2.4. Workspace-based RTGC selection on the work path As shown by the solid line in Fig.2, immediately after the AGV_A goes onto the work path, the AGV_A begins to communicate with each RTGC to identify its workspace and collect situational information. After that, the AGV_A selects the RTGC_B based on the workspace in which the container is stored; it orders the RTGC_B to move to the destination point. The selected RTGC_B begins to move while the AGV_A is transporting. The AGV maintains communication or waits on the work path until the RTGC is in the idling state if the RTGC is in the state of (a) or (b).

4. Container Storage Scheduling 4.1. Framework of Container Storage Scheduling From among conventional studies of container storage scheduling with ACTs, several studies have focused on system performance, in which container storage orders are given on the basis of certain dispatching rules [11,12]. These studies, however, have been limited to discussions of how to assign the generated transportation and storage orders to the AGVs; there have been no studies regarding the assignment considering the sequence of orders at the container storage locations. Therefore, in this study, we construct the following container storage models: random container storage, in which the sequence of orders is not considered, and container storage scheduling, in which the sequence of orders is considered. 4.2. Random Container Storage Random container storage is a container storage model that does not consider the sequence of the operation orders at the location. In Fig.3, the “initial orders” represent container transportation and storage orders that are randomly generated in the initial state.

510

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

Here, every container has information about its storage situation, including the “rows” and “bays” of available container storage space A∼L. In the case of random container storage, the AGVs achieve the orders in sequence, and the transported containers are then stored in sequence by the RTGCs. 4.3. Container Storage Scheduling With regards to container storage scheduling, we arrange the initial orders based on the following objectives: (A) minimizing the total moving distance of the RTGCs and (B) equalizing the workloads at the locations and on work paths. Here, we use the workspace for the RTGC. All orders are known a priori. First, the positions are defined as RTGC_A and RTGC_B, which are shown in Fig.3 as the initial positions. We then replace the initial orders based on (A) and define them as “scheduled orders.” These scheduled orders, however, cause a heavier workload at the workspace of RTGC_A. Therefore, we again replace the scheduled orders based on (B) as the second scheduling in each workspace and define them as “rescheduled orders.” The AGVs achieve the rescheduled orders.

5. Container Transportation Planning 5.1. Framework of Container Transportation Planning With regards to container transportation planning, we construct the following models: random, equivalent, and quay-side-weighted transportation. Based on the constructed models, the number of containers that are transported and stored at each location is determined in a few seconds under a storage constraint. With regards to the storage constraint at the location, the maximum number of stored containers, i.e., the container storage capacity, is assumed to be 320 [TEU]. 5.2. Random Transportation Under the random transportation rule, the destination locations of the containers are randomly planned. Therefore, the number of containers transported to each location is affected by random numbers. 5.3. Equivalent Transportation Under the equivalent transportation rule, the destination locations of the containers are equivalently planned. For instance, the number of containers is 600 [TEU], and the number of locations is 4. Thus, the number of containers transported to each location is planned to be 150 [TEU] (see Fig.4). As for the x-axis of Fig.4, the locations are arranged from the quay side in the order of the location number. In addition, due to the container storage capacity at one location (320 [TEU]), the final storage rate at each location reaches 46.9 [%]. Here, a final storage rate denotes (the number of stored containers) / (the container storage capacity).

511

Qu

ay

-si

76.9 [%]

246 [TEU]

de

-w

+30 [%]

eig

hte

ds

tor

ag

46.9 [%]

e

Equivalent storage 150 [TEU]

-30 [%]

Transported and stored container rate [%]

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

54 [TEU]

16.9 [%]

1

2

3

4

Location number

Figure 4. Example of the equivalent and quay-side-weighted transportation rule

5.4. Quay-side-weighted Transportation As shown in Fig.4, under the quay-side-weighted transportation rule, the destination locations of the containers are preferentially determined from the quay side. With regard to the storage constraint described in 2.2, the maximum and minimum storage rate must fall within ±30[%] at each location. Under this constraint, the number of transportation and storage containers is planned at a slant from the quay side. For instance, the total number of containers is 600 [TEU], and the number of locations is 4. Thus, the numbers of containers transported to the 1st ∼ 4th locations are 246, 198, 102, and 54 [TEU], respectively.

6. Optimal Design of the AGV Transportation System 6.1. Design Process The design process is described from the given demand, i.e., required throughput, to the design solutions, as follows. Here, the throughput is given as follows: (total number of containers) / (required transportation time). First of all, under random transportation, we construct the transportation systems based on management model 1)∼6), as shown in Table1. In order to derive the number of AGVs and RTGCs, we apply the design methodology [4]. Model 1) represents a conventional management model assumed in the literature [4]. Under this model, the AGV selects an RTGC at random with the use of the random container storage when the AGV arrives at the destination point. Process 1. Input a throughput. Process 2. Implement the management models. Process 3. Derive the deployed number of agents, evaluate operation efficiency, and derive a more efficient management model. Process 4. Apply the equivalent and quay-side-weighted transportation rules to the management model that is derived from process 3.

512

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

Table 1. System management models under random transportation RTGC selection method

RS

DS

DS

DS

DS

WS

WS

WS

WS

RTGC selection & call-out timing

DP

DP

WP

DP

WP

DP

WP

DP

WP

Container storage scheduling

×

×

×





×

×





Management model

1)

2)

3)







4)

5)

6)

DS: Distance-based Selection, WS: Workspace-based Selection, RS: Random Selection DP: at the Destination Point, WP: on the Work Path, ×: Random storage, : Storage scheduling

50

50

Model 1) Model 2) Model 3)

Random storage (Model 6)

Model 4) Model 5) Model 6)

30

Quay-side-weighted storage

40 Construction cost [point]

Construction cost [point]

40

20

10

0

Equivalent storage 30

20

10

0

10

20

30

40

50

60

70

80

90

100 110 120 130

10

20

30

Required throughput [TEU/hour]

(a) Under random container transportation

40

50

60

70

80

90

100 110 120 130

Required throughput [TEU/hour]

(b) Under all transportation rules

Figure 5. Comparison Results of the Total Construction Costs

Process 5. Derive the deployed number of agents, evaluate the operation efficiency, and design the most efficient management model. Design the numbers of AGVs and RTGCs as the design solutions. In processes 3 and 4, we use the following cost model in order to derive the construction cost. Based on the cost of equipment for the AGV and RTGC, a and b denoting the cost, the coefficients are as follows: a : b = 1 : 2. Construction cost = a × AGV s + b × RT GCs 6.2. Evaluation of the Operation Efficiency Based on the Total Construction Cost As shown in Fig.5(a), except model 1), notable differences of the total construction costs are noticeable as the required throughput becomes higher. As a result, the difference in the total construction costs between models 1) and 6) at the point of the required throughput is 120, i.e., higher by 18 [point]. This result indicates that model 6) is more efficient than the conventional management model by about 38 [%]. From the results of models 2)∼4) that were obtained in order to evaluate the validity of agent cooperation under the random container storage, model 4), that is, the model in which the AGV selects and calls out the RTGC by means of workspace-based selection at the destination point, is the most efficient one. Under container storage scheduling, model 6) is more efficient than model 5).

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

513

Therefore, model 6), that is, the management model in which the AGV selects and calls out the RTGC by means of workspace-based selection on the work path under container storage scheduling, is the most efficient one. 6.3. Discussion of a Highly Efficient Management Methodology Fig.5(b) shows the results of a comparison of the total construction costs of transportation systems designed with the use of equivalent and quay-side-weighted transportation in addition to management model 6). At the point at which the required throughput is 130, the differences in the total construction costs under equivalent and quay-side-weighted transportation are higher by 32 and 34 [point], respectively. In terms of operation efficiency, it is noticeable that the 16.3[%] system efficiency is improved under equivalent transportation and the 10.5[%] system efficiency is improved under quay-side-weighted transportation for model 6) with the use of random storage under random transportation. Even if a required throughput becomes greater, the tendencies in which equivalent transportation is more efficient will not be changed because of the operating RTGCs that are deployed equally at each location. From the results shown above, we designed the following highly efficient management model: the AGV selects and calls out the RTGC with the use of workspace-based selection on the work path under container storage scheduling after the container transportation and storage destinations are planned on the basis of equivalent transportation. Furthermore, these results clarified a need to take efficient system management into account.

7. Conclusion and Future Work In this paper, the focus was on the design of a highly efficient AGV transportation system. We constructed system management models that included agent cooperation, container storage scheduling, and container transportation planning. In the result, we designed a highly efficient management model and clarified the need to consider an efficient system management. In future works, we will address the issue of reliability and robustness of the transportation system in case of failure of one of the operations. On the other hand, the problem discussed in this paper is probably formulated from a viewpoint of mathematical programming. In that case, it will be necessary to consider the computational cost.

References [1] D. Steenken et al., Container Terminal Operation and Operations Research - A Classification and Literature Review, OR Spectrum, 26 1 (2004) 3–49. [2] P. Ioannou et al., Advanced Material Handling: Automated Guided Vehicles in Agile Ports, CCDoTT Technical Report, Center for Advanced Transportation Technologies, (2001). [3] J. Zhang et al., Automated Container Transport System between Inland Port and Terminals, 83rd TRB Annual Meeting Interactive Program, (2004). [4] S. Hoshino et al., Optimal Design Methodology for an AGV Transportation System by Using the Queuing Network Theory, 7th International Symposium on Distributed Autonomous Robotic Systems, (2004) 391–400.

514

S. Hoshino et al. / Design of an AGV Transportation System by Considering Management Model

[5] S. Hoshino et al., Comparison of an AGV Transportation System by Using the Queuing Network Theory, Proc. 2004 IEEE/RSJ Int. Conf. Intell. Robots and Systems, (2004) 3785– 3790. [6] S. Hoshino et al., Optimal Design, Evaluation, and Analysis of AGV Transportation Systems Based on Various Transportation Demands, Proc. IEEE Int. Conf. Robotics and Automation, (2005) 1412–1418. [7] S. Hartmann, Generating Scenarios for Simulation and Optimization of Container Terminal Logistics, OR Spectrum, 26 2 (2004) 171–192. [8] MITSUBISHI HEAVY INDUSTRIES, LTD., Advanced Technology Cargo Handling Systems, Products Guide, (2004). [9] Y. Yoshimura et al., Iterative Transportation Planning of Multiple Objects by Cooperative Mobile Robots, 2nd International Symposium on Distributed Autonomous Robotic Systems, (1996) 171–182. [10] J. Ota et al., Flexible Transport System by Cooperation of Conveyer-load AGVs, Proc. IEEE Int. Conf. Robotics and Automation, (2000) 1144–1150. [11] Martin Gunow et al., Dispatching Multi-load AGVs in Highly Automated Seaport Container Terminals, OR Spectrum, 26 1 (2004) 211–235. [12] Chin.I. Liu et al., A Comparison of Different AGV Dispatching Rules in an Automated Container Terminal, The IEEE 5th International Conference on Intelligent Transportation Systems, (2003) 3–6.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

515

Analysis of Purchase Decision Making: Network Externalities and Asymmetric Information Yohei Kaneko a,1 , Nariaki Nishino b Sobei H. Oda c and Kanji Ueda b a School of Engineering, The University of Tokyo b Research into Artifacts, Center for Engineering (RACE), The University of Tokyo c Faculty of Economics, Kyoto Sangyo University Abstract. Product markets in which network externalities are present have been expanding through technological development. In such markets, information about users strongly affects purchase decision-making because a consumer’s utility depends on the number of users. Although complete information is assumed in most studies of network externalities, it is difficult for consumers to obtain complete information in the real world. We construct a model of multiple-product market with network externalities. This model subsumes that each consumer has asymmetric information with respect to others’ purchase decision-making. We analyze purchase decisions using game theory, multi-agent simulation, and experiments with human subjects. Results imply the following important features: (1) a monopolistic market arises when consumers neglect their own preferences and purchase the same products that others do; (2) a market becomes inefficient because of asymmetric information; (3) irrational decision-making might increase total surplus; and (4) much information does not always increase consumers’ utilities. Keywords. Network externalities, Asymmetric information, Nash equilibrium, Multi-agent simulation, Experiments with human subjects

1. Introduction This study is intended to investigate consumers’ purchasing decision-making under conditions of asymmetric information in product markets where network externalities are present. Markets with network externalities, such as those for mobile telephones or Internet services, have been growing. Network externalities are defined as externalities by which a consumer’s utility depends on the number of users who consume the same product. To cite one example, a fax machine is worth little to a person if no other consumers use them. Church et al. [1] analyzed the equilibria of product diffusion in a market with mathematical model including the variety of software and prices. Katz et al. [2] specifically examined excess inertia and excess momentum, and investigated entering markets and product compatibility. Ruebeck et al. [3] undertook experiments with human subjects and indicated that switching costs between products prevented diffusion of superior products. 1 Correspondence

to: Yohei Kaneko, Research into Artifacts, Center for Engineering (RACE ), The University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, CHIBA, 277-8568, Japan. Tel.: +81 4 7136 4271; Fax: +81 4 7136 4242; E-mail: [email protected]

516

Y. Kaneko et al. / Analysis of Purchase Decision Making

Oda et al. [4] constructed a market model with network externalities and analyzed the product diffusion process using cellular automata. Two problems are inherent in those studies. First, they assumed that information that consumers possess about products is complete and symmetric. Consumers in the real world, however, obtain various information from different media. Accordingly, one can say that information among consumers is asymmetric. Second, most studies specifically examine only the diffusion of products. Considering asymmetric information and the variety of consumer preferences, it is necessary to examine not only product diffusion, but also consumers’ behavior. We developed a model of a market with network externalities introducing asymmetric information. We used it to analyze consumers’ behavior with game theory, multiagent simulation, and experiments with human subjects. This paper is organized as follows. Section 2 explains our model. The set of market equilibria is characterized in section 3. Results of multi-agent simulation and experiments with human subjects are illustrated in section 4 and 5. We conclude this study in section 6. 2. The Model We explain the model of a product market, a formulation of network externalities, and consumers’ behavior under asymmetric information, which is an extention of the model constructed in Davis et al.[5]. 2.1. Product Market We constructed a model of a product market with network externalities. Consumers and multiple products (A, B) exist in the market. Among them, some consumers prefer product A, others prefer product B. A consumer’s utility using a product depends on the number of users of the same product because of network externalities. Only consumers exist in the market: for simplicity, no producers are included. It is assumed that the price of product A equals that of product B; both prices are constant. 2.2. Network Externalities Each consumer chooses a product for which (U tility) = Rp N e − (P rice)

(1)

is maximized [5]. Here, Rp indicates the reservation price and N e represents the number of users of each product. Equation (1) represents that a consumer’s utility increases when more consumers purchase identical products. A set of consumer’s reservation price for each product represents his or her preference. Table 1 shows an illustration of Rp. For example, the maximum money a consumer who tastes product A can pay for product A is $8; that for product B is $3. Table 1. An exemplary reservation price Consumer who prefers product A (B) Rp

Product A

Product B

$8 ($3)

$3 ($8)

Y. Kaneko et al. / Analysis of Purchase Decision Making

517

2.3. Consumers’ Behavior under Asymmetric Information Figure 1 shows a summary of consumers’ behavior. A consumer’s behavior in the market with network externalities is modeled as follows: • A consumer has a reservation price for each product. • It is assumed that each consumer has different information concerning others’ purchase decisions. For example, one consumer can know only three consumers’ decision-making among ten consumers, another can know eight among them. • A consumer makes the purchase decision of buying product A, product B, or nothing. • A consumer derives zero utility from buying nothing.

Figure 1. Consumers’ purchase behavior model

3. Theoretical Analysis This section shows results of theoretical analyses for product diffusion in three ways: consumer’s utility maximization, social surplus maximization, and Nash equilibria. Table 2 shows the definition of parameters for theoretical analysis. We assume that parameters satisfy conditions (2) and (3). Table 2. Definitions of parameters for theoretical analysis N

Number of whole consumers

rA

Number of consumers who prefer product A

rB

Number of consumers who prefer product B

RJ a

Reservation Price for product A that consumers who prefer product J have (J = A, B)

RJ b

Reservation Price for product B that consumers who prefer product J have (J = A, B)

P

Product Price

na

Number of consumers who use product A

nb

Number of consumers who use product B

n0 P >

Number of consumers who use nothing RA a (=

N =r

A

+r

RB b ) B

>

RA b (=

RB a )

= na + nb + n0

(2) (3)

3.1. Consumer’s Utility Maximization We classified nine cases and examined product diffusion for maximizing each consumer’s utility 1 . Table 3 shows the respective users in each case. 1 Althogh the details of equations are omitted, these cases are classified by inequalities of J J RJ a , Rb , N, r and P .

518

Y. Kaneko et al. / Analysis of Purchase Decision Making

Table 3. Number of users, in equilibrium, for whom consumer’s utility is maximized

Case 1 2 3 4 5

na 0 0 rA 0 rA

nb 0 0 0 rB rB

n0 N N rB rA 0

Case 6 7 8 9

na rA N 0 N 0

nb rB 0 N 0 N

n0 0 0 0 0 0

In Cases 1 and 2, no one purchases a product because products are expensive. Particularly in Case 1, no consumer can obtain positive utility even if all consumers purchase identical products. On the other hand, in Case 2, some consumers obtain a positive utility and others obtain negative utility when all buy identical products. In Case 3, consumers who prefer product A purchase product A, consumers who prefer product B never purchase any product. Cases 3 and 4 are symmetric. In Cases 5 and 6, the same product is bought according to each consumer’s preference. In Case 5, consumers who buy the opposite product to their preferences always obtain negative utilities. On the other hand, they do not do so necessarily in Case 6. All consumers purchase one kind of product in Cases 7, 8 and 9. Accordingly, some consumers buy opposite products to their preferences. 3.2. Social Surplus Maximization This section presents examination of product diffusion for maximizing social surplus in each case that was obtained in section 3.1. Social surplus means the sum of each consumer’s utility. Table 4 shows results of analysis. Table 4. Number of users, in equilibrium, for whom social surplus is maximized

Case 1 2 3 4 5

na 0 N 0 N 0 N 0

nb 0 0 N 0 N 0 N

n0 N 0 0 0 0 0 0

Case 6 7 8 9

na N 0 N 0 N 0

nb 0 N 0 N 0 N

n0 0 0 0 0 0 0

In Case 1, the social surplus is maximum when no product is purchased for an overly expensive product price. Above all, social surplus equals zero. Except for Case 1, monopolistic diffusion of any product maximizes the social surplus. In Cases 2, 5, 6 and 9, whichever product A or B diffuses, social surplus is maximized. In Cases 3, 4, 7 and 8, however, only one kind product diffuses because bias exists about consumers’ preference distribution. Compared with diffusion in section 3.1, the results indicate that private incentives of consumers do not coincide with social incentives (Cases 2, 3, 4, 5 and 6). 3.3. Nash Equilibria We examined Nash equilibria in each case that was obtained in section 3.1. Table 5 shows those results. Our analyses revealed the following points.

519

Y. Kaneko et al. / Analysis of Purchase Decision Making

Table 5. Nash equilibria Case 1

na 0

nb 0

n0 N

Case

na N

nb 0

n0 0

2

0 rA

0 0

N rB

7

0 0

N 0

0 N

N

0

0

0 0

N 0

0 N

3 4

5

6

0

0

N

0 0

rB 0

rA N

rA

rB

0

rA 0

0 rB

rB rA

0 N

0 0

N 0

rA

rB

0

rA 0

0 N

rB 0

0

rB

rA

0

0

N

8

9

N

0

0

* *

rA rA

rB 0

0 rB

*

0 0

N rB

0 rA

0

0

N

A * only if RA ar ≥P

• Product diffusion in each case of section 3.1 shows one of the Nash equilibria. • In addition to results of section 3.1, extra patterns of product diffusion were derived. • Although some Nash equilibria maximize social surplus, product diffusion that maximizes social surplus is not always a Nash equilibrium. 4. Multi-agent Simulation We analyzed decision-making under asymmetric information with multi-agent simulation. Cases 1 and 2 were not examined because no consumers purchased any products in equilibria. Furthermore, Cases 3 and 4 and Cases 7 and 8 were symmetric and essentially identical. For those reasons, this study simulated Cases 3, 5, 6, 8 and 9. 4.1. Agents’ Decision Making A consumer agent behaves in the following manner. 1. A consumer agent prefers product A or product B. 2. The number of users that each consumer can know is different from that of others. 3. N agents sequentially make decisions and can know the results of previous agents’ selections. 4. Consumer agents calculate expected utilities in the following two linear modes: (a) Calculate expected utility using information of preference distribution, or (b) Calculate expected utility without using information of the preference distribution. 5. The agent purchases the product that maximizes expected utility if it is positive. However, the agent purchases nothing if the expected utility is negative. 4.2. Simulation Results Table 6 presents the parameters used for simulations. Table 7 summarizes the simulation results. Three features are remarkable.

520

Y. Kaneko et al. / Analysis of Purchase Decision Making

• Information Cascades: Regardless of preferences, a consumer mimics others’ decision-making [6]. • Social Surplus: Social surplus is the sum of all consumers’ utilities. • Information and Utility: The relation between the number of users a consumer can know and their average utility. Figure 2 shows the relation between information and utility (Cases 5 and 6). Table 6. Parameters in simulation Case

RA a

RA b

P

rA

rB

3

8

3

35

7

3

5 6

8 8

3 3

35 25

5 5

5 5

8 9

8 5

4 3

25 25

3 5

7 5

N

10

Table 7. Summary of simulation results Without considering preference distribution

Considering preference distribution

Case

IC

SS

Information and Utility

IC

SS

3

No

39%

information⇒utility

Yes

34%

Information and Utility nothing

5

Yes

5%

information⇒utility

Yes

3%

information⇒utility

6

Yes

68%

information⇒utility

Yes

70%

information⇒utility

8

Yes

85%

nothing

Yes

100%

9

Yes

100%

nothing Yes 38% I C : Information Cascades, S S : Social Surplus

(a) simulation, Case 5

nothing nothing

(b) simulation, Case 6

Figure 2. Relation between information and utility

First, it is apparent that information cascades are observed, except for Case 3 (without considering preference distribution). For example, a consumer who prefers product B purchases product A to increase their utility by following other consumers’ decisionmaking. Second, in almost all cases, social surplus is inefficient. Accordingly, we can say that social surplus is not optimal even if consumers try to maximize their own utilities. We infer that two reasons pertain. • Product diffusion when maximizing each private utility differs from that when maximizing social surplus (Cases 3, 5 and 6). • Agents fail to do correct decision-making because of asymmetric information. Finally, we describe the relation between information and utility. In Case 5, much information increases utility (Fig. 2 (a)). Agents tend to obtain negative utilities because products are expensive. Their utilities decrease if the products which initial agents purchase

521

Y. Kaneko et al. / Analysis of Purchase Decision Making

do not diffuse in the market. On the other hand, much information decreases utility in Case 6 (Fig. 2(b)). In this case, initial agents purchase products according to their preferences. The latter agents buy those identical products by information cascades. Consequently, initial agents’ utilities are higher than those of latter agents. 5. Experiments with Human Subjects 5.1. Settings of Experiments We conducted experiments with human subjects based on methods of experimental economics [7]. Parameters were identical for all simulation cases. Settings of experiments were as follows. • Only one human subject existed in the market; other consumers were replaced with computerized agents. • In experiments using preference distribution information, agents considering it were replaced. • In experiments without using preference distribution information, agents who did not consider it were replaced. 5.2. Results of Experiments Table 8 summarizes results of experiments with human subjects. Figure 3 shows the relation between information and utility (Cases 5 and 6). Table 9 shows private utility and social surplus. Table 8. Summary of experimental results Without considering preference distribution

Considering preference distribution

Case

IC

SS

Information and Utility

IC

SS

3

Yes

39%

information⇒utility

Yes

35%

nothing

5

Yes

5%

information⇒utility

Yes

0%

information⇒utility

6

Yes

62%

information⇒utility

Yes

66%

information⇒utility

8

Yes

80%

information⇒utility

Yes

100%

nothing

9

Yes

95%

nothing

Yes

26%

nothing

(a) experimental results in Case 5

Information and Utility

(b) experimental results in Case 6

Figure 3. Relation between information and utility Table 9. Effect to all consumers by a consumer’s decision-making Case 6 (Without Information of preference distribution)

Private Utility Agent Human Subject

96% 84%

Social Surplus 64% 99%

522

Y. Kaneko et al. / Analysis of Purchase Decision Making

First, information cascades were also observed in experiments with human subjects. Consumers in the real world buy the same products as others do. Second, social surplus is inefficient in almost all cases. Compared with simulation results, it is less inefficient because human subjects behave irrationally in experiments despite obtaining the same information as agents in simulation. Private utility of human subjects is worse than that of computerized agents. However, social surplus in the market where the human subjects exist might be efficient (Table 9), perhaps because human subjects fail to maximize their respective utilities and consequently increase other consumers’ utilities. Finally, we describe the relation between information and utility. Nearly identical results to those of the simulation were derived in experiments. As shown in Case 6 (Figure 3 (b)), the consumer whose information is 8, however, obtained more utility than computerized agent. This indicates that he or she performed purchase decision making that is more rational than linear expectation. 6. Concluding Remarks In this paper, we analyzed consumers’ purchase decisions under asymmetric information in a product market with network externalities. Introducing asymmetric information that each consumer has different information about product diffusion, we developed a model of a market in which network externalities are present. Our analyses using theory, multiagent simulations, and experiments with human subjects indicated the following salient points. • Monopoly diffusion is derived when consumers neglect their preferences and buy the same products as others do. • Social surplus is especially inefficient when a difference is apparent between private incentives of consumers and social incentives. • Irrational decision-making decreases a consumer’s utility. However, it might increase the social surplus. • In cases where products are expensive, much information increases consumers’ utility. On the other hand, in cases where products are cheap, less information increase consumers’ utility because products that are initially purchased diffuse in the market because of information cascades. References [1] J. Church and N. Gandal: Network Effects, Software Provision, and Standardization, The Journal of Industrial Economics, Vol.40 (1992), No. 1, pp.85-103. [2] M. L. Katz and C. Shapiro: Product Introduction with Network Externalities, The Journal of Industrial Economics, Vol.40 (1992), No.1, pp.55-83. [3] C. Ruebeck, S. Stafford, N. Tynan, W. Alpert, G. Ball and B. Butkevich: Network Externalities and Standardization: A Classroom Demonstration, Southern Economic Journal, Vol.69 (2002), No.4, pp.1000-1008. [4] S. H. Oda, K. Iyori, K. Miura, K. Ueda: The Application of Cellular Automata to the Consumer’s Theory, Simulating the Duopolistic Market, Simulated Evolution and Learning, Springer (1999), pp.454-461. [5] D. D. Davis and C. A. Holt: Experimental Economics, pp.249-282, Princeton University Press, Princeton, N.J., 1992. [6] L. R. Anderson and C. A. Holt: Information Cascades in the Laboratory, The American Economics Review, Vol.87 (1997), No.5, pp.847-862. [7] D. Friedman and S. Sunder: Experimental Methods: A Primer for Economists, Cambridge University Press, Cambridge, 1994.

Part 9 Dynamics, Morphology, and Materials in Intelligent Behavior

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

525

Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling Simon Bovet 1 Artificial Intelligence Laboratory, University of Zurich Abstract. When natural intelligent systems are studied with artificial agents, control architectures and body dynamics are usually designed for specific tasks. This contrasts with natural evolution, considered to have no goal or intention. An alternative approach consists in investigating how emergent behaviors can be observed with systems not designed for any particular task. This paper shows how two different homing strategies observed in insects (path integration and landmark navigation) can emerge from a general model of autonomous agent control. This recently proposed control architecture consists of homogeneous, non-hierarchical sensorimotor coupling. The synaptic coupling between each pair of modalities uses a Hebbian learning rule that captures correlations between sensory or motor activity generated by the agent itself. Activity is allowed to propagate across the different modalities, thus generating the observed behaviors. The experiments are performed both in simulation and with a real robot. Keywords. Emergent behaviors; Sensorimotor coordination; Insect navigation

1. Introduction In the endeavor of building “intelligent” agents, designers very often decide to concentrate on a specific task the agent has to perform, towards which much of the design is actually biased. For instance, if we consider an agent built by a researcher focusing on bipedal locomotion, it is very likely that no one will be surprised (nor will really argue) if the robot is equipped with two “legs”, if its control model contains some sort of central pattern generators, or if some proprioceptive information from the legs or the feet is used as feedback to modulate some parameters of the pattern generators. Also, providing the agent with an ambient temperature sensor will most likely be considered as irrelevant or off-topic. On the one hand, there is no need to argue that this scientific approach – i.e. designing agents for specific tasks – has led and continues to lead to substantial progress and understanding. On the other hand, several difficulties of this approach have been recognized [1]. To mention just a few: What kind of sensory or motor information is relevant and should be used for a given task? How to implement a particular observed behavior? Very often, designers go at least partially for ad-hoc solutions tailored in actual fact for 1 Current

address: AI Laboratory, Andreasstrasse 15, 8050 Zurich, Switzerland. E-mail: bovet@ifi.unizh.ch

526

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

Figure 1. (a) General structure of the control architecture model. Each pair of sensory of motor modalities is fully connected by synaptic coupling (represented symbolically by the arrows). (b) Implementation of the model at the neural level for any sensory or motor modality. Arrows represent full connectivity from and to all other modalities. See text for details.

the task the agent has to perform. For example, designers almost always provide the control system of autonomous robotic agents with a set of basic behaviors such as reflexes or with a general hierarchical structure in the control architecture. However, it is not clear how arbitrary the assumptions made by the designer about how the external world is perceived by the situated agent actually are. Moreover, since most “intelligent” systems that we know and from which we often draw inspiration are in fact natural systems, it is legitimate to ask the following question: how much can we understand about intelligent systems which are products of natural evolution – i.e. which were not intentionally designed for any specific task – by investigating artificial agents precisely constructed to perform given tasks? Consequently, an alternative approach consists of building a system a priori not designed for any particular task, and observing what behaviors can emerge from its interaction with the environment. The present work follows this approach with focus on the control part of robotic agents. In [2] and [3], we proposed a new model of homogeneous and non-hierarchical robot control architecture. We showed that by systematically coupling any pair of sensory or motor modalities with a structure kept as simple as possible (using only Hebbian learning), interaction of the agent with the environment can lead to the emergence of coherent behaviors, such as the agent approaching and following an object without any built-in reflex, or solving a delayed reward task with no sensory memory. The present paper extends this initial research and shows that for a situated agent using the same control architecture model, a sufficiently rich body dynamics leads to the emergence of two different navigation strategies observed in insects such as bees and ants [4]: one resorting on egocentric information (dead reckoning or path integration [5, 6]), the other using a geocentric system of reference (Snapshot model [7] or Average Landmark Vector model [8]).

2. Model The control architecture model used in this work is described in details in [2,3]. The essential aspects thereof are briefly discussed here.

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

527

This model consists essentially of a systematic and homogeneous synaptic coupling of each pair of sensory or motor modalities, as shown in Figure 1(a). The role of the synaptic coupling between each pair of sensory or motor modalities is twofold: first, it learns the correlations of simultaneous activity in the respective modalities (with a simple Hebbian learning rule); second, it allows activity to propagate from one modality to the other, according to correlations learned by the agent itself. The activity thus induced in the motor modalities generates actions, leading to observable behaviors of the agent. Note that at this level, no distinction is made between sensors and motors. Moreover, there is no hierarchical structure that could bias how the external world is perceived by the robot: the agent has no built-in knowledge about any relation between sensory input and motor action. In our model, each modality consists of five populations of real-valued linear artificial neurons (see Figure 1(b)): 1. Current State: the activity of the neurons in this population reflects the current reading of the corresponding sensor or motor. In the visual modality for instance, the activity of each neuron corresponds to the brightness of a pixel in the camera image. 2. Delayed State: like the current state, but delayed by a small amount of time, typically one time step. 3. “Desired”1 State: contains as many neurons as the current state, but is only activated by the synaptic input coming from all other sensory or motor modalities. 4. State Change: a population of neurons whose activity reflects a change in the current state, i.e. a difference between activity in the delayed and current states. For example, the activity of this population corresponds in the visual modality to the visual flow2 in the camera image. 5. “Desired” Change: like the state change, but taking this time the difference of activity between the current and “desired” states. The activity of the neurons in this population propagates through the synaptic couplings to the neurons in the “desired” state of all other sensory or motor modalities. The synaptic coupling between each pair of sensory or motor modality fully connects the neurons in the “desired” change population of the first modality to the neurons in the “desired” state population of the second modality. The weights (initially set to zero) are continually modified by a simple Hebbian learning rule using the activity of the current and state change populations in the first and second modality, respectively. A detailed mathematical description of the model is provided in the appendix.

3. Experiments 3.1. Agent and Environment Experiments are performed both in simulation and in the real world with the mobile robot shown in 2(a). The agent is situated in a flat environment similar to the ecological niche 1 The

term “desired” is purely arbitrary, chosen so only to help the reader better understand the model. visual flow is defined here as the ratio of intensity change of one pixel over the gradient of neighboring pixel intensity. If the absolute value of the gradient is below a given threshold, the flow is set to zero. 2 The

528

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

Figure 2. (a) The mobile robot “Samurai” used for the real world experiments, equipped with an omnidirectional camera and differential steering. (b) Image processing for the visual input. The image from the 360◦ camera (top) is used to extract the visual input for the system (300 × 1 pixel, bottom). (c) Motor input of the system. Each component is represented by a radial bar whose length indicates the magnitude, and whose position indicates the direction. The gray arrow represents the resulting net input. See text for details.

of a desert ant. The home location – corresponding to the “nest” – consists of a dark hole in the ground. An external light and heat source – similar to the sun – shines upon the whole surroundings. The agent is equipped with the following sensors: 1. Omnidirectional Camera: the one-dimensional 360◦ view is taken by thresholding, averaging and low-pass filtering a portion of the polar transformed image from the camera, as shown in Figure 2(b). The polar image is aligned to a geocentric coordinate system by using a compass3 . 2. Temperature Sensor: since the agent out of its nest is constantly exposed to the external heat source, its temperature constantly increases with time. 3. Light Sensor: only detects light when the agent is outside the nest hole4 . The motor modality of the agent is a crucial aspect for the following experiments and needs a more detailed description. Even though the experiments are performed in the real world with a robot equipped with a differential steering, a redundant motor representation is chosen. Instead of a two-component input indicating the left and right motor speeds, the input consists of several components indicating the magnitude of “speed vectors” applied to the agent in equally spaced azimuth directions, as shown in Figure 2(c). The different vectors applied together produce a net speed vector, which is finally transformed to the two wheel motors in the following way: if the agent is aligned in the direction of the net speed vector, its moves forward proportionally to its magnitude; otherwise, it turns towards its direction. 3 With the real robot, the compass value is obtained by triangulation, using the omnidirectional camera image and three salient landmarks of known positions. 4 Obviously – no hole were bored in the floor of the indoor office arena! – temperature and light sensors are simulated on the real robot.

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

529

Figure 3. Trajectories of the agent whose behavior corresponds to an egocentric homing strategy (path integration). The solid circle indicates the home location. The dot specifies the location where the “homing behavior” is triggered. In (d), (e) and (f), the agent was displaced to a different location (dotted arrow); when released, the agent returns to a fictive home location corresponding to the integrated path (dotted circle). (a) and (d) show trajectories obtained in simulation, (b) and (e) plot trajectories of the real robot, (c) and (f) show for comparison trajectories of a real desert ant (adapted from [9] with permission).

Figure 4. Path integration homing behavior. (a) Graphical representation of the synaptic weights connecting the temperature modality to the motor modality, corresponding to the run shown in Figure 3(e), just before the homing behavior is triggered. (b) Propagation of activity across the temperature and motor modalities. The activity in each population of neurons is illustrated symbolically. See text for details.

3.2. Emergence of Egocentric Navigation Strategy During the initial foraging excursion, the neurons in the “desired” state population of the motor modality are externally activated with random values. The agent therefore moves around randomly in the environment. At a given point, the activity of the “desired” state neuron in the temperature modality is set to an arbitrarily low value, and the agent is let to move on its own. The resulting trajectories – obtained either in simulation or with the real robot – are shown in Figure 3(a), (b), (d) and (e): the agent returns in an almost straight course to the starting location and stops there. If the agent is first displaced and released from a different location, it returns to a fictive home location corresponding to the integrated path.

530

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

This behavior thus matches quite exactly what is described in the literature as egocentric homing strategy (path integration, see Figure 3(c) and (f), or e.g. [5,6,9]). A consideration of the coupling between the motor and temperature modalities provides an explanation for the emergence of this behavior. In the described environment, the value of the temperature sensor constantly increases, meaning that the state change neuron in the corresponding modality displays a constant activity. This leads to the fact that the Hebbian learning rule modifying the synaptic weights of the connections from the temperature modality acts in fact like an integrator. Consequently, the values of the weights connecting the temperature modality to the motor modality are the timeintegrated values of the motor readings, i.e. the actual integrated path (see Figure 4(a)). Therefore, activity in the temperature “desired” change neuron (provoked by reducing the activity of the “desired” state neuron) will propagate into the motor modality according to the integrated path, thus generating the homing behavior (see Figure 4(b)). 3.3. Emergence of Geocentric Navigation Strategy In this second experiment, we observe the behavior of the agent provoked by setting the activity of the “desired” state neuron in the light modality to an arbitrarily low value. The resulting behavior is again quite surprising (see Figure 5(a) and (b)): irrespective of the release position, the agent returns to the home position and stops there. This behavior corresponds closely to what is known as geocentric homing strategies (landmark navigation, see Figure 5(c), or e.g. [7,8,10]). It is also interesting to notice that agent does not necessarily return along a straight trajectory and most of the time, the agent avoids obstacles on its route – an additional emergent behavior already observed with traditional visual homing strategies [10]. An explanation for the emergence of this behavior requires here to consider neuronal activity across three modalities, namely light, vision and motor modalities. The neuron in the light modality corresponding to the change of detected light (i.e. in the state change population) is only active at the beginning of the foraging excursion, namely when the agent exits its dark hole and surfaces into the bright external environment. The effect is that the Hebbian learning rule will thus capture a “snapshot” of the states of all the other modalities in the weights of the synapses leaving the light modality. Therefore, when the activity of the “desired” state neuron in the light modality is arbitrarily set to a low value (corresponding to darkness), the synaptic coupling will propagate activity into the “desired” state population of the visual modality, projecting so to say the view of the environment as seen from the home location (Figure 6(a), inner ring). Now, if the agent is located anywhere else, the discrepancy between neural activity corresponding to the perceived environment (current state) and the projected view (“desired” state) will lead to activity in the “desired” change of the visual modality, as shown in Figure 6(b). This activity will in turn propagate into the motor modality according to the correlation learned by the agent itself between motor activity and visual flow, thus leading to the observed homing behavior (see Figure 6(c)).

4. Discussion This investigation sheds a new light on the situated perspective of an agent performing a given behavior. It shows how two homing strategies emerged from a single control

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

531

Figure 5. Trajectories of the agent whose behavior corresponds to a geocentric homing strategy (landmark navigation). Objects are shown in dark gray and the circle marks the home location. Each trajectory corresponds to a different release position. (a) Simulation results. (b) Trajectories obtained with the real robot. (c) Trajectories of an agent using the Average Landmark Vector model (adapted from [10] with permission).

Figure 6. Visual homing behavior. (a) The difference between the visual current state (outer ring) and the “desired” state (inner ring) produces activity in the “desired” change corresponding to the visual flow indicated by the arrows. Landmarks are represented by black discs, the home location by the gray dot. (b) Propagation of activity across the light, vision and motor modalities. The activity in each population of neurons is illustrated symbolically. (c) Generated motor activity. The resulting behavior is a motion of the agent in east direction, i.e. towards the home location. See text for details.

model, each strategy triggered by only changing the activity of one single neuron in two different sensory modalities: in the temperature modality (whose sensor value continually increases with time) for the egocentric strategy; or in the light modality (whose sensor value only changes when the agent leaves or returns to the home location) for the geocentric strategy. Note that the activation of these single neurons, presented here as an external intervention, can be generated by the agent itself. Suppose that the agent were provided with an additional sensory modality measuring its hunger (as a matter of fact an internal, situated value). The nest being a source of food, the model would learn – as the agent would eat inside the nest – a correlation between decrease of hunger and darkness as well as low temperature. As soon as the hunger level of the agent would exceed a given value during the foraging behavior, activity would propagate from the “desired” change of the hunger modality into both the light and temperature modalities, generating the same kind of activation as the one presented so far as external intervention, and thus triggering the homing behaviors.

532

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

This work also demonstrates that observable behaviors do not need to rely on specifically dedicated control “modules” – an instantiation of the well-known frame-ofreference problem [1]. It contrasts with other models of these navigation strategies proposed so far (for a survey, see e.g. [8,9,11,12,13]), which – to our knowledge – all present definite designer biases5 at the control level. Firstly, only activity in specific modalities (e.g. vision for geocentric strategies) is defined as system input. Secondly, all these models contain a specific “module” responsible for converting an explicit internal state (e.g. the components of the integrated home vector) into precise motor commands. An additional conceptual difficulty is related to the symbol-grounding problem: what triggers the homing behaviors? An explicit “go home” action seems indeed quite questionable from a biological and evolutionary perspective. In our model, the homing behaviors are produced by activity generated (either externally or by the agent itself) in sensory modalities a priori quite irrelevant, which would certainly be attributed differently by an observer focusing only on the model without taking into account the agent-environment interaction.

5. Conclusion This work shows how emergent behaviors can be observed with situated agents using a control model not designed for any particular task. More specifically, it describes how two insect navigation strategies (path integration and landmark navigation) emerge from a recently proposed model of control architecture [2,3], consisting basically of homogeneous sensorimotor coupling using Hebbian-like synaptic plasticity. The presented set of experiments highlights the relevance of using an agent equipped with a number of different sensory modalities and redundant body dynamics, engaged in a sufficiently rich environment (a concept also referred to as morphological computation [15]). This also stresses the importance of taking into account a priori irrelevant sensors (such as temperature or light sensors), which turn out to play a critical role in grounding the observer-based concept of “homing” behavior.

Acknowledgment This research is supported by the IST-2000-28127 European project (AMOUSE). Thanks to Hanspeter Kunz for kindly providing the “Samurai” robot platform and to Miriam Fend for helpful comments on the paper.

Appendix: Mathematical description of the model M M is Let xM = (xM 1 , x2 , . . .) be the current state of a modality M . The state change y M M defined at any time step t by means of the delayed state x ˘ (t) := x (t − τ ):

yM (t) := δ M (˘ xM (t), xM (t)) 5 Even

if some specific parts have been shown to be learnable by the agent itself [14].

S. Bovet / Emergence of Insect Navigation Strategies from Homogeneous Sensorimotor Coupling

533

where τ > 0 is a small time delay and δ M (·, ·) a function computing a change between two states, typically: δ M (a, b) := b − a. The synaptic weight matrix WM N = (WijM N ) connecting any two different modalities M and N is initially set to zero WM N (0) := 0 and is updated with a Hebbian learning rule: N WijM N (t + 1) := WijM N (t) + η · xM i (t) · yj (t)

with the learning rate η > 0. The “desired” state x ˜M is defined as the synaptic input coming from all other modalities. Finally, the “desired” change y ˜M is defined in a similar M way as the state change y :  x ˜M (t + 1) := WM N (t) · y ˜N (t) N =M

M

M

y ˜ (t) := δ (xM (t), x ˜M (t))

References [1] R. Pfeifer and C. Scheier, Understanding Intelligence. Cambridge, Mass.: MIT Press, 1999. [2] S. Bovet and R. Pfeifer, “Emergence of coherent behaviors from homogenous sensorimotor coupling,” in Proc. 12th Int. Conf. on Adv. Robotics (ICAR), Seattle, 2005, pp. 324–330. [3] ——, “Emergence of delayed reward learning from sensorimotor coordination,” in Proc. of the IEEE/RSJ Int. Conf. on Int. Robots and Sys. (IROS), Edmonton, 2005, pp. 841–846. [4] R. Wehner, B. Michel, and P. Antonsen, “Visual navigation in insects: Coupling of egocentric and geocentric information,” Journal of Experimental Biology, vol. 199, pp. 129–140, 1996. [5] M. Collett and T. S. Collett, “How do insects use path integration for their navigation?” Biological Cybernetics, vol. 83, pp. 245–259, 2000. [6] D. Andel and R. Wehner, “Path integration in desert ants, Cataglyphis: how to make a homing ant run away from home,” Proc. R. Soc. Lond., vol. 271, pp. 1485–1489, 2004. [7] B. A. Cartwright and T. S. Collett, “Landmark learning in bees,” Journal of Comparative Physiology, vol. 151, pp. 521–543, 1983. [8] D. Lambrinos, R. Möller, T. Labhart, R. Pfeifer, and R. Wehner, “A mobile robot employing insect strategies for navigation,” Robotics and Autonomous Systems, special issue on Biomimetic Robots, vol. 30, pp. 39–64, 2000. [9] M. Müller and R. Wehner, “Path integration in desert ants, cataglyphis fortis,” Proc. Natl. Acad. Sci. USA, vol. 85, pp. 5287–5290, 1988. [10] V. V. Hafner, H. Kunz, and R. Pfeifer, “An investigation into obstacle avoidance as an ‘emergent’ behaviour from two different perspectives,” in Proc. of the EPSRC/BBSRC Int. Workshop on Biol.-Inspired Robotics: The Legacy of W. Grey Walter, Bristol, 2002, pp. 166–173. [11] M. O. Franz and H. A. Mallot, “Biomimetic robot navigation,” Robotics and Autonomous Systems, vol. 30, pp. 133–153, 2000. [12] A. S. Etienne and K. J. Jeffrey, “Path integration in mammals,” Hippocampus, vol. 14, pp. 180–192, 2004. [13] R. Möller, “Insect visual homing strategies in a robot with analog processing,” Biological Cybernetics, vol. 83, pp. 231–243, 2000. [14] V. V. Hafner and R. Möller, “Learning of visual navigation strategies,” in Proc. of the Eur. Workshop on Learning Robots (EWLR-9), M. Quoy, P. Gaussier, and J. Wyatt, Eds., Prague, 2001, pp. 47–56. [15] R. Pfeifer and F. Iida, “Morphological computation: Connecting body, brain and environment,” Japanese Scientific Monthly, vol. 58, no. 2, pp. 48–54, 2005.

534

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Active learning of local structures from Attentive and Multi-Resolution Vision 1 Maxime Cottret a and Michel Devy a,2 a LAAS-CNRS, Toulouse, France Abstract. In order to execute tasks, a robot needs a complex visual system to cope with the detection, characterization and recognition of places and objects. This paper presents on-the-way developments on detection and characterization functions, integrated on a companion robot equipped with an omnidirectional color camera and an PTZ camera with pan, tilt and zoom controls. When moving in a indoor environment, our robot executes concurrently a pre-attentive process and an attentive one; the former extracts coarsely localized regions of interest (ROI) from the omnidirectional images. The latter analyzes more accurately each ROI by focusing the PTZ camera, grabbing a succession of viewpoints and generating an appearance based model of the region. Based on discriminant and invariant patches, this model provides a set of visual features which will be used to categorize region classes and to recognize region instances in next images. Keywords. local structures, attention, active vision, multi-resolution.

1. Introduction In ten or twenty years, everyone will have a companion robot at home to perform some simple domotic tasks. The cognitive functions of such a robot are currently studied by several robotics team, in the framework of the COGNIRON european project. Several testbeds are integrated , with many sensors used to learn knowledges, to control navigation and manipulation tasks and to interact with the user. The embedded system of a companion robot, integrates functional and decisional functions, especially to learn knowledges on the environment, autonomously or interactively with a user. In order to generate motions, several spatial models must be learnt; at the more abstract level, the environment is described by places a posteriori labelled by the user according to some semantical informations (labels like kitchen, room, livingroom...) and connected in a topological graph. In order to control the execution of these motions, the robot will require other knowledges to locate itself, i.e. discriminant landmarks linked to places where they have been detected or characteristic images for every area. Classical free space representations like occupancy grids could be generated in cluttered areas. 1 This

work is funded by the FP6 european project COGNIRON: http://www.cogniron.org to: Michel Devy, LAAS-CNRS, 7 avenue du Colonel Roche, 31077 Toulouse Cedex 04, FRANCE. Tel.:+33 5 61 33 63 31 ; Fax:+33 5 61 33 64 55 ; E-mail: [email protected] 2 Correspondence

M. Cottret and M. Devy / Active Learning of Local Structures

535

A companion robot has to recognize, locate and grasp objects. First, a user could ask the robot to bring him an object (a pen, a glass, the TV control box. . . ) or to execute autonomously a more complex task that has been learnt a priori like Water plants. Moreover, the companion robot could execute by itself some simpler tasks, planned from analysis of the user behaviour to recognize its intention, e.g. Bring an object the user is looking for. Many learning functions required on a companion robot, have been studied in the robotics community, taking advantage of the Bayesian framework to deal with uncertainties and noisy measurements, e.g. the automatic construction of stochastic maps [13] to represent landmarks detected from visual or laser data [5,15], the autonomous learning of topological models from panoramic images, including the place categorization [17,14], the automatic construction of a geometrical model for a 3D object to be grasped by the robot . . . The robot could learn these models interactively with the user, e.g. it could construct navigation maps following an operator who guides the robot during the home tour and who gives a semantical label to every place [16], ot it could construct the model of an object that the operator moves so that every aspect will be viewed [4]. This paper gives preliminary results on a more autonomous strategy applied by our robot to learn local knowledges, from two different visual modalities: a color omnidirectional camera and an active camera with pan, tilt and zoom parameters that can be controlled by the system (PTZ camera). When learning the environment map from panoramic images, eventually following an operator, our robot will also detect and track regions of interest (namely, ROIs): a ROI is a salient image region, i.e. a region that can be easily distinguished in a local neighborhood. These ROIs correspond to local structures of the environment, discriminant and invariant enough to be tracked in an image sequence and to be exploited in further motions, either planare objects like quadrangles [5], or isolated 3D objects laid on the ground (chairs...) or on tables (glasses, plates...). From high resolution images acquired by the PTZ camera, the robot will build an invariant representation for such local structures, fusing data acquired from several view points and asking the operator to label this entity for further interactions. So, two processes will be executed asynchronously to extract local knowledges; the panoramic images will be analyzed in a pre-attentive process, while high resolution images will be acquired and analyzed by the attentive process. The section 2 gives a short state of the art and details our general approach. In the section 3, the pre-attentive process is described: it provides a list of salient regions detected on low resolution images. The section 4 is devoted to the attentive process, for the categorization of local structures extracted from high resolution images acquired on every ROI. The section 5 gives preliminary results on the integration of such visual processes on our experimental robot. Finally, the section 6 will summarize our contribution and our current works.

2. General approach In recent years, numerous authors in the Vision community, have provided contributions on cognitive vision, visual learning or object categorization. Some authors [1] propose neuronal approaches to analyze the whole image, but many others exploit only invariant features or patches. Numerous regions descriptors have been proposed [9]:

536

M. Cottret and M. Devy / Active Learning of Local Structures

Scale Saliency Patches [7] are mostly used in object recognition and categorization [3,2], while SIFT features [8] have become very popular in robotics [17]. J.Ponce et C.Schmid [11,12,10] build representations from patches inspired from the classical Harris points [9]. . . In these contributions, an object class is characterized by a set of invariant patches extracted from numerous images acquired from different viewpoints. The described entities correspond generally to a single aspect of a 3D object (the back side of a car, a right side of an airplane) like in [3]. Typically these authors do not address the localization problem, which is mandatory for a robot which will interact with the described object (landmark-based navigation, object grasping, visual servoing). Moreover, images are simple, with isolated objects during the learning step; only one passive visual modality is exploited to acquire images; only view-based representations are built, assuming that all patches extracted from every image belong to the same object. Here we take advantage of active and multi-focal vision to improve the learning of local structures extracted and tracked from sequences of complex images. At this step, the learning task is autonomous; interactions with the operator will be considered in a next step, in order to add semantics to the model. The robot moves along a known trajectory, and during this motion, executes concurrently two asynchronous visual processes. The pre-attentive process detects salient regions from low resolution images. It exploits quick operators [6] to create from every low resolution image, a saliency map, segmented by a clustering method, in order to generate Regions of Interest, memorized in a ROI list. The attentive process will learn only from these regions. It is executed asynchronously, taking as input, the ROI list; it controls the PTZ camera to acquire high resolution images focused successively on the salient regions. For every one, it generates an appearance based representation, more precisely a set of discriminant scale-invariant patches [7]. Then, this representation will be used to categorize local structures in object classes, and to recognize object instances in next images [3]. The pre-attentive process provides image-based information only for short-term memorization; on the contrary the attentive process generates knowledge, for long-term memorization. Such a structure could correspond to an isolated object; but generally, it will be difficult to obtain a right object segmentation from an image; a user interaction will be required later for the final interpretation.

3. Pre-attentive vision This process takes place before any decision. The robot need to detect highly salient regions in order to choose where to focus. Numerous definitions of visual saliency exist in the literature, like Jagersand, Itti or Kadir’s ones. As our robot will interact with humans, it has to focus on the same discriminant features than humans. Therefore, our preattentive vision system is based on the saliency detector developed by L. Itti [6]: according neurobiologic studies on primates, particular locations in the scene only are treated in the visual cortex, in a bottom-up process and without a priori knowledge. These studies also showed that three types of visual components take most part in the pre-attentive vision : color, intensity and orientations. Thus, after extraction of these components into three maps, they are put in competition and only highly informative locations remain leading to a saliency map of the visual field.

M. Cottret and M. Devy / Active Learning of Local Structures

537

Figure 1. Saliency map construction

Saliency map construction The figure 3 summarizes the pipeline of operations from the input image to the saliency map. Three filters are applied to extract intensity, color and orientation discontinuities at several levels of the input image resolutions. Three feature maps (one for each filter) are computed at several image resolutions, using center-surround comparisons, in order to model the sensitivity difference in the human visual system between the center and the edge of the retina. I(c, s) = |I(c)  I(s)| RG(c, s) = |(R(c) − G(c))  (G(s) − R(s))| BY (c, s) = |(B(c) − Y (c))  (Y (s) − B(s))| O(c, s, θ) = |O(c, θ)  O(s, theta)|

with c ∈ {2, 3, 4} and s = c + δ, δ ∈ {3, 4} So, the first step consists in computing gaussian pyramids, with scale σ ∈ [0..8], to detect discontinuities on the intensity I(σ), the color RG(σ) for G − R, BY (σ) for B − Y , and the orientation O(σ, θ), using Gabor filters with angles θ = 0◦ , 45◦ , 90◦ and 135◦ . Feature maps are normalized and integrated to form a single conspicuity map for each visual attributes I, G − R, B − Y and O. A non-linear normalization is applied to each conspicuity map to amplify peaks of contrasts relative to noise in the background. In the final stage, conspicuity maps are combined at a given scale (here σ = 3) to produce a single saliency map S: I=

c+4 4 $ $

N (I(c, s))

c=2 s=c+3 c+4 4

C=

$ $

|N (RG(c, s)) + N (BY (c, s))|

c=2 s=c+3

O=



(

c+4 4 $ $

θ∈{0◦ ,45◦ ,90◦ ,135◦} c=2 s=c+3

S=

N (I)+N (C)+N (O) 3

N (O(c, s, θ)))

538

M. Cottret and M. Devy / Active Learning of Local Structures

The figure 2-b presents the saliency map and intermediate intensity, color and orientation conspicuity maps, computed on a low resolution image acquired in a kitchen: let us note that the more salient local structure is the coffee machine.

(a)

(c)

(b)

(d)

(e)

Figure 2. (a) input image, (b) saliency map, (c) intensity conspicuity, (d) color conspicuity, (e) orientation conspicuity (from Gabor filters)

Regions of interest extraction. Our selection algorithm takes the saliency map as a statistical map for the purpose of Monte-Carlo selection in order to extract N from the most salient points. It uses rejection sampling, i.e. randomly reject points according a randomly biased threshold. The resulting points cloud, distributed over the most salient regions of the map, is then clustered in R3 (spatial position and saliency) using the K-means method, with K a priori fixed; we are evaluating a Mean Shift algorithm to avoid the classic al problem on the K selection. The regions of interest are stocked as the barycenter - weighted by the saliency - , the bounding box of the points from a resulting class and the mean saliency. The pre-attentive process. Whereas the robot goes through its environment, ROIs are detected using the omnidirectional camera and stored with a timestamp and a position tag. The treatment of ROIs through the attentive process is done according the mean saliency weighted by: Ð distance from the current position of the robot : the ROI must be in the field of view of the PTZ camera. Ð distance from the current timestamp : even if most of the environment is almost static, some objects can be moved, involving a change in the interest of the ROI. So, we considere that the saliency of a ROI become insignificant with time. ROIs are erased once treated or if they become too old (the time threshold is manually fixed).

M. Cottret and M. Devy / Active Learning of Local Structures

539

4. Attentive vision Once the robot "knows" where salient structures rely in its environment, it can address planning or decision making problems of higher abstraction, in order to build a longterm model for these entities. By now, our development are focussed on the learning of appearance-based models; later a geometrical ones will be built from the same image sequences.

Figure 3. Extracted features on focalized images

Our companion robot evolves in a cluttered environment, where plenty of partially occulted objects, hardly separable from the background, can be found. Therefore the robot’s model has to be incrementally built and/or updated. It appears that global object description like PCA or moments are not adapted for our purpose (it code the whole shape and appearance of the object in one description). Therefore a local structure is modelled by a constellation of features. Scale Saliency Patches T.Kadir and M.Brady [7] introduce a local invariant descriptor based on information and scale-space frameworks. Each point of the input image x, is characterized by a descriptor vector D that take values (d1 ...dr )t . A local probability density function px,s (pdf) is estimated using histogram on a circular window and for a range of scale s (radius). The entropy Hx,s of this pdf is: Hx,s = Σi px,s (di )log2 (px,s (di )) The local maximum of entropy over the scale gives the patch characteristic scale s¯ .Thus, a local feature is given by a pixel s with its scale s¯ and a score S S = Hx,¯s Σi |px,¯s (di ) − px,¯s−1 (di )| In order to reduce the feature number, extracted features are clustered in R3 (spatial positions and scale): a cluster of Scaled Salient Patches has a local support (nearby points with similar scores and scales). Each cluster must be sufficiently distant from all other in. Thus, a local structure corresponds to such a cluster, given by the barycenter of a selected local support (mean pixel, mean scale and mean score): the figure 3 presents Scaled Salient Patches extracted from focalized views acquired on the coffee machine seen on the global view in figure 2. The attentive process. The attentive process focus on the most salient ROI given by the pre-attentive process. For each ROI, Scaled Salient Patches are extracted and stored as a thumbnails clus-

540

M. Cottret and M. Devy / Active Learning of Local Structures

Figure 4. (top left) input from omnidirectional camera; (top right) saliency map (boxes mark centers of ROIs); (bottom) focalized image on the bottom right salient region

ter with a position tag and label them as "location" of the environment. As describe in [10], we use pixel correlation for thumbnails matching and so "location" recognition : there is recognition if most features in the new viewpoint are matched with the location’s one. Then, the location’s list of thumbnails is updated by merging the new viewpoint’s ones and eliminates those that never match. A coarse 3D localization using the position of the different viewpoints, can also be computed because the robot motion is known.

5. Preliminary results By now, operations involved in the pre-attentive and attentive processes have been evaluated on several images manually acquired at home. Only the pre-attentive process is integrated on a robot. Panoramic images are acquired, moving the robot with the joystick; salient regions must correspond to same areas in the environment; we are studying how to match salient regions extracted from a sequence of panoramic images. The omnidirectional camera is coarsely calibrated, so we can compute a 3D direction from each ROI’s center; on figure 4, salient ROIs are extracted from the panoramic image on the top left, and a focalized image is acquired from the PTZ camera pointed toward the more salient region, here around the clear cupboard behind a person.

6. Conclusion This paper has proposed an original approach to learn local structures that a robot could detect by itself indoor. These models will be used to understand how the environment is structured: they could be combined either with a topological map built by another learning process, to recognize places (such an entity characteristic for such a place), or with a metrical stochastic map to locate the robot. Current works are devoted to two problems:

M. Cottret and M. Devy / Active Learning of Local Structures

541

1) how to take into account contextual knowledge, mainly to make easier the segmentation of the local structures (for example, assuming that these structures are on large uniform planar faces of the environment: ground, walls ...)? and 2) how to categorize and recognize objects, from their patch representations built from several viewpoints, using the Bayesian framework proposed in [2]?

References [1] N. Dohuu, W. Paquier, and R. Chatila. Combining structural descriptions and image-based representations for image, object and scene recognition. In Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI), 2005. [2] L. Fei-Fei, R. Fergus, and P. Perona. A bayesian approach to unsupervised one-shot learning of objects categories. In Proc. Int. Conf. on Computer Vision (ICCV), 2004. [3] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scaleinvariant learning. In Proc. IEEE Int. Conf. on Vision and Pattern recognition (CVPR), 2003. [4] P. Fitzpatrick, G. Metta, L. Natale, S. Rao, and G. Sandini. Learning about objects through action: Initial steps towards artificial cognition. In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2003. [5] J.B. Hayet, F. Lerasle, and M. Devy. Visual landmarks detection and recognition for mobile robot navigation. In Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), Vol.II, pages 313–318, 2003. [6] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence(PAMI), Vol. 20, No. 11, pages 1254–1259, 1998. [7] T. Kadir and M. Brady. Scale, saliency and image description. Int. Journal on Computer Vision (IJCV), 2001. [8] D.G. Lowe. Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision, 60, 2, pages 91–110, 2004. [9] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In Proc. Int. Conf. on Computer Vision (ICCV), 2001. [10] J. Ponce, S. Lazebnik, F. Rothganger, and C.Schmid. Toward true 3d object recognition. In Proc. AFRIF/AFIA Conf. Reconnaissance de Formes et Intelligence Artificielle (RFIA), 2004. [11] F. Rothganger, S. Lazebnik, C.Schmid, and J. Ponce. 3d object modeling and recognition using affine-invariant patches and multi-view spatial constraints. In Proc. IEEE Int. Conf. on Vision and Pattern Recognition (CVPR), 2003. [12] C. Schmid. Constructing models for content-based image retrieval. In Proc. IEEE Int. Conf. on Vision and Pattern Recognition (CVPR), Vol II, pages 39–45, 2003. [13] R.C. Smith and P. Cheeseman. On the representation of spatial uncertainty. In Int.Journal on Robotics Research, 1987. [14] A. Tapus, S. Heinzer, and R. Siegwart. Bayesian programming for topological global localization with fingerprints. In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2004. [15] S. Thrun and Y. Liu. Multi-robot slam with sparse extended information filers. In Proc. Int. Symp. of Robotics Research (ISRR), 2003. [16] E.A. Topp and H.I. Christensen. Tracking for following and passing persons. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2005. [17] Z. Zivkovic, B. Bakker, and B. Krose. Hierarchical map building using visual landmarks and geometric constraints. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2005.

542

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Modular Design of Home Service Robot System with Hierarchical Colored Petri Net Guohui TIAN a,1, Feng DUAN b and Tamio ARAI b a b

School of Control Science and Engineering, Shandong University, China

Department of Precision Machinery Engineering, University of Tokyo, Japan

Abstract. A modular design procedure is proposed for the autonomous and distributed control of home service robot system with Hierarchical Colored Petri net. In order to handle complexity of robot configurations and to satisfy flexibility necessary to covering various tasks and dynamic environments for home service, each kind of devices or tasks are considered as a module, the interaction mechanisms for the coordination and cooperation of the devices are presented. By studying an example, a control structure for home service robot system is developed in detail using the simulation software—CPN Tools. Keywords. Home service robots, modular design, Hierarchical Colored Petri net

Introduction It is necessary to develop effective home service robot systems consisting of two or more robots and other important assistant devices, in order to deal with so many and different kinds of home works and increase the whole system’s flexibility, adaptability and reliability in the complicated and dynamic home environments. The complex characteristics of home service robot system can be summarized as follows: (1) Home service robots may have many different robot configurations of the kinematical and sensor modules. For example, one can have only a mobile base and vacuum. Some can have a head module in addition and others can have arms in addition. As the robots are considered to be composed of several modules of a mobile base, arms, head, sensors units, and so on, the architecture has to be able to handle the complexity of diverse robot configurations.

1

Corresponding Author: Guohui TIAN, School of Control Science and Engineering, Shandong University,

No.73, Jingshi Road, Jinan city, 250061, P.R.China; E-mail: [email protected].

G. Tian et al. / Modular Design of Home Service Robot System with Hierarchical Colored Petri Net 543

(2) Home service robots may have to perform many different kind of tasks. They include exploration, guidance, transportation, cleaning, surveillance, teaching and playing with children, etc. Since these tasks are different from each other, it is better for each task to have a corresponding module [1, 2]. (3) Home environment is dynamic. For example, the orientation of a chair is changed or even this chair is moved away from its former position, or one person suddenly crashes into the house. So an efficient developing procedure is needed to handle the complexity of robot configurations and to satisfy flexibility necessary for covering various tasks and the dynamic environments. It should be composed of upper level control for high-level intelligence to interact with human and to plan tasks, as well as lower level control to allow low-level intelligence for prompt reaction in each module [3, 4]. In this paper, modular design procedure is proposed using Hierarchical Colored Petri nets [5]. The design methodology is based on a concept of discrete states and actions, and a kind of devices or tasks is represented as one autonomous module composed of a sequence of primitive actions. Home service robot system is viewed as a collection of asynchronous and concurrent processes which execute different tasks. For cooperative or exclusive tasks of multiple modules, discrete event-driven control is implemented as a data flow network of concurrent processes communicating with each other. The procedure is developed by modeling a home service robot system using one simulation software—CPN Tools [6].

1. The Structure of a Home Service Robot System A general home service robot system is depicted as Figure 1. It is composed of a hand robot with an arm camera, a mobile robot carrying a hand with an arm camera, a

RF Tag

Internet

Figure 1. The structure of a home service robot system

544 G. Tian et al. / Modular Design of Home Service Robot System with Hierarchical Colored Petri Net

conveying mobile robot with a pallet, a set of projector, and 4 set of ceiling cameras. In addition, the RF-Tags are used for the conveying mobile robots to navigate in the home environment, QR-Codes are adopted to recognize and handle so many different kinds of goods in the house conveniently and efficiently. All the above devices are connected by the Local Area Net and controlled in a distributed and autonomous way. We will modularly develop the control structure for this system in a down-top way using Hierarchical Colored Petri Nets.

2. Petri Net Models for the Modules 2.1. Petri Net Model for Projector The projector can send out a color line onto the floor board in specified position and orientation. The mobile robots will be guided by this line as globally referential path information, as well as move autonomously with collision avoidance based on the local environmental information through their own sensors, such as laser range scanners and infrared detectors. Guided by this line, the mobile robot can navigate more robustly considering the dynamic characteristics of home environment. This module is modeled as Figure 2. When the projector receives a request command from the coordinator and if it is in idle state at this time, it will modulate its orientation, and then send out a guiding line in correct position and direction. The projector will maintain the massage-sending state till it receives an acknowledgement massage to cancel the guiding line, and then return to be idle. 2.2. Petri Net Model for Ceiling Cameras Each of the 4 set of cameras can give the local scene information respectively, and they

Figure 2. Petri net model for projector module

Figure 3. Petri net model for ceiling cameras module

G. Tian et al. / Modular Design of Home Service Robot System with Hierarchical Colored Petri Net 545

work together corporately to give the global scene information about the home environment. Based on the scene information, the conveying mobile robot can understand the home environment more efficiently. This module is modeled as Figure 3. When the cameras receive the request command from the coordinator, they will capture images, send out image massages, and maintain these massages till they receive an acknowledgement massage from the conveying robot, and then turn to be idle. 2.3. Petri Net Model for Hand Robot The hand robot carrying an arm camera can load or unload goods. Its model is composed of an arm module and an arm camera module, its global description is just depicted as Figure 4 (a). When the hand robot receives a request command, the arm will start to move near to the goods to be handled, and at the same time the arm camera will receive a command massage to start to capture images in order to detect the goal object in detail. After the goal object is detected, the arm will hold it, and move it near to the pallet installed on the conveying robot, at the same time a command request is

(a) Global Petri net model for hand robot

(b) Petri net model for arm camera module

(c) Petri net model for arm module

Figure 4. Petri net models for hand robot module

546 G. Tian et al. / Modular Design of Home Service Robot System with Hierarchical Colored Petri Net

sent to the arm camera again in order to detect the exact position of the pallet. If the pallet is available and is detected, when the arm approaches it, the hand will place the goods onto it, at the same time a massage is sent to the conveying robot. The arm will wait till it receives a massage from the conveying robot that the goods has been loaded successfully, then it will release the goods. Finally a massage is sent to conveying robot so that it can start to transfer, and also an acknowledgment massage is sent to the coordinator indicating that the loading action is over, the hand returns to home state and then becomes idle, just depicted as Figure 4 (c). When the arm camera receives a request command massage, it will start to capture images till it has detected the goal object, then it will send out the goal image massage to the arm module and return to be idle, just depicted as Figure 4 (b). 2.4. Petri Net Model for Mobile Robot The mobile robot carries an arm with an arm camera to load or unload goods just like the hand robot, it also can move autonomously in some local areas of the house. Its model is composed of a wheel module, an arm module, and an arm camera module, its global description is just depicted as Figure 5(a). When the mobile robot receives a request command, firstly it moves to one position near to the goods to be loaded or unloaded, and then it will load/unload goods onto/from the conveying robot in the same way as the hand robot does. The moving action of the wheel module is modeled as Figure 5(b). The wheel module will continue moving till it approaches to the destination, and then it will send out a request command to the arm module. Here the arm camera module acts in the same way as in the case of hand robot depicted as Figure 4(b). The loading/unloading action of the arm module is the same as that of the arm module of the hand robot depicted as Figure 4(c), except that here it

(a) Global Petri net model for mobile robot

(b) Petri net model for wheel module

Figure 5. Petri net models for mobile robot module

G. Tian et al. / Modular Design of Home Service Robot System with Hierarchical Colored Petri Net 547

receives the request command signal from the wheel module instead of that from the coordinator in case of hand robot. 2.5. Petri Net Model for Conveying Mobile Robot The conveying mobile robot carries a pallet and can move to convey goods from one position to another destination position. It can be loaded or unloaded goods by the hand robot or the mobile robot. The conveying robot can navigate in the house by detecting the local environment by its own installed sensors. In addition, it can use the image massages provided by the ceiling cameras as local and global scene information about the environment and can be guided by the line massage provided by the projector to navigate robustly in the whole house. This module is modeled as Figure 6(a). When the conveying robot receives a request command, guided by the ceiling cameras and the projector it firstly moves to a position near to the loading robot, i.e. the hand robot or the mobile robot. Then it will send an approaching signal to the loading robot and wait to be loaded. Finishing being loaded, it also will send a message to the loading robot and wait to transfer the loaded goods. When the loading robot moves its arm away, it starts to transfer the loaded goods to the destination position. After it is unloaded, it will send out an acknowledgment massage to the coordinator and return to the idle state.

(a) Global Petri net model for conveying mobile robot

(b) Petri net model for its wheel module

Figure 6. Petri net model for conveying mobile robot module

548 G. Tian et al. / Modular Design of Home Service Robot System with Hierarchical Colored Petri Net

The actions of the conveying robot to move near to the loading robot and to transfer the loaded goods to the destination position are both described by the wheel module depicted as Figure 6(b). The former is considered as the first moving request to the wheel module, as well as the latter as the second one. The places of Wheel Req and Approach Massage are typed as INT, so that two kinds of request commands can be distinguished by variable (wreq), i.e. 1 and 2, respectively. The initial mark for the Idle place is 1’e, indicating that the wheel module can only respond to one request once a time, i.e. mutual exclusion. The unloading action is similar to the loading action, it is represented just as one transition for simplicity.

3. Coordinating Mechanism for Above Devices All the above devices are connected by Local Area Net, and are scheduled by the coordinator. The Petri net model of the coordinating mechanism is depicted as Figure7. When we want to move the goods on desk to another destination position, the coordinator will send request commands to the hand robot, the conveying robot, the projector, and the ceiling cameras respectively. The hand robot loads goods onto the conveying robot, the conveying robot will transfer the loaded goods to the destination position guided by the massages provided by the projector and the ceiling cameras. In the same way, when we want to fetch some goods from the refrigerator, the coordinator will send request commands to the mobile robot, the conveying robot, the projector, and the ceiling cameras respectively. The mobile robot loads goods onto the conveying robot, the conveying robot will transfer the loaded goods to the destination position guided by the massages provided by the project and the ceiling cameras.

Figure 7. Petri net models for coordinator

G. Tian et al. / Modular Design of Home Service Robot System with Hierarchical Colored Petri Net 549

Figure 8. Top level Petri net model for the whole system

4. Top Global Level Petri Net for the Whole System The whole system is modeled as the top level net of Hierarchical Colored Petri net just depicted as Figure 8. Where the substitution transitions have their own subpages corresponding to the representations of hand robot, mobile robot, conveying robot, projector, ceiling cameras, and also the coordinator described as above respectively.

5. Conclusion Based on the modular design procedure, we could provide the flexibility needed for so many different kinematical configurations and tasks. In addition, it is very easy to add or remove robot configuration modules, and can represent additional devices belonging to the same kind using the same module model only by changing its color type conveniently. References: [1] [2] [3] [4] [5] [6]

G.Yasuda and K.Tachibana, A parallel distributed control architecture for multiple robot systems using a network of microcomputers, Computers & Industrial Engineering, 27 (1994), 63-66. G.Yasuda, K.Tachibana, A computer network based control architecture for autonomous distributed multirobot systems, Computers & Industrial Engineering, 31 (1996), 697-702. Gideon Cohen, Concurrent system to resolve real-time conflicts in multi-robot systems, Engineering Applications of Artificial Intelligence, Volume 8, Issue 2, April 1995, 169-175. John P. T. Mo and Bryan C. K. Tang, Petri net modeling and design of task oriented messaging system for robot control, Computers & Industrial Engineering, Volume 34, Issue 4, 1998, 729-742 Kurt Jensen, An Introduction to the Practical Use of Colored Petri Nets. In: Reisig, W., Rozenberg, G. (eds.): Lectures on Petri nets II. LNCS 1492. New York: Springer Verlag, 1998, pp. 237-292. CPN Tools. Online: http://wiki.daimi.au.dk/cpntools/cpntools.wiki.

550

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Auctions for task allocation to robots Maitreyi Nanjanath a and Maria Gini a,1 a Department of Computer Science and Engineering, University of Minnesota Abstract. We present an auction-based method for dynamic allocation of tasks to robots. The robots operate in a 2D environment for which they have a map. Tasks are locations in the map that have to be visited by the robots, in any order. Unexpected obstacles and other delays may prevent a robot from completing its allocated tasks. Therefore tasks not yet achieved are rebid every time a task has been completed. This provides an opportunity to improve the allocation of the remaining tasks and to reduce the overall time for task completion. We present experimental results that we have obtained in simulation using Player/Stage. Keywords. multi-robot systems, auctions, task allocation, Player/Stage

1. Introduction There are many real-world problems in which a set of tasks has to be distributed to a group of robots. We are interested in situations where, while a single robot could do all the tasks, sharing the work with other robots will reduce the time to complete the tasks and increase the success rate. Search and retrieval tasks, which have been studied extensively in robotics (see, for instance [10,14]), are examples of the types of tasks we are interested in. In our study, tasks are locations in the map that have to be visited by the robots, but we could easily add other activities the robot has to perform at each task location. What distinguishes task allocation to robots from other task allocation problems is the fact that robots have to physically move to reach the task locations, hence the cost of accomplishing a task depends highly on the current robot location. We describe an efficient method based on auctions to perform task allocation. The method does not guarantee an optimal allocation, but it is specially suited to dynamic environments, where execution time might deviate significantly from estimates, and where it is important to adapt dynamically to changing conditions. The method is totally distributed. There is no central controller and no central auctioneer, which increases robustness. The auction mechanism we propose attempts to minimize the total time to complete all the tasks. Given the simplifying assumption of constant and equal speed of travel for all the robots, this is equivalent to minimizing the sum of path costs over all the robots. We are not as much interested in obtaining a theoretically optimal solution, as in providing a method that is both simple and robust to failure during execution. If a 1 Correspondence to: Maria Gini, Dept of Computer Science and Engineering, 200 Union St SE. Minneapolis, MN 55455. Tel.: +1 612 625 5582; Fax: +1 612 625 0572; E-mail: [email protected].

M. Nanjanath and M. Gini / Auctions for Task Allocation to Robots

551

robot finds an unexpected obstacle, or experiences any other delay, or is disabled, the system continues to operate and tasks get accomplished. Our algorithm is greedy, and finds close-to-optimal solutions that are fast to compute. It is flexible, allowing robots to rebid when solutions are unobtainable, rather than forcing a costly re-computation of the entire optimal solution.

2. Proposed Algorithm In this work we assume that each robot is given a map that shows its own location and the positions of walls and rooms in the environment. No information is given about where the other robots are located. The map allows a robot to estimate its cost of traveling to the task locations, and to compute the path to reach them from its original location. Suppose a user has a set R of m robots R = {r1 , r2 , ...rm }, and a set T of n tasks T = {t1 , t2 , ...tn }. In this study tasks are simply locations in the map that have to be visited, but the algorithm can take into account additional costs of doing the task once its location has been reached. The user partitions the tasks into m disjoint subsets, such that T1 ∪ T2 ∪ ... ∪ Tm = T and Ti ∩ Tj = φ ∀i, j1 ≤ i, j ≤ m. and allocates each subset to a robot. Note that a subset can be empty. The initial task distribution done by the user might not be optimal. Some robots might have no task at all assigned to them, while others might have too many tasks, the tasks assigned to a robot might be distributed all over the environment, and might be within easy reach of another robot, some tasks may be in an unreachable location. A robot must complete all its tasks unless it can pass its commitments to other robots. Since the robots are cooperative, they will pass their commitments only if this reduces the estimated task completion time. The ability to pass tasks to other robots is specially useful when robots become disabled since it allows the group as a whole to increase the chances of completing all the tasks. This process is accomplished via single-item reverse auctions, in which the lowest bid wins, that are run independently by each robot for their tasks. Each bid is an estimate of the time it would take for that robot to reach that task location (assuming for simplicity a constant speed) from its current location. To generate paths efficiently, robots use Rapidly-expanding Random Trees (RRTs) [12]. Generation of RRTs is very fast, and scales well with large environments. An example of a RRT is shown later in Figure 2. Auctions are simultaneous, i.e. many auctioneers may put up their auctions at once, but since each bidder generates bids in each auction independently of the other auctions, the effect is the same as having the auctions done sequentially. The algorithm each robot follows is outlined in Figure 1. We assume the robots can communicate with each other, for the purpose of notifying potential bidders about auctioned tasks, for submitting their own bids, and for receiving notification when they won a bid. When estimating costs for tasks in different auctions, a robot uses only its current position, without accounting for possible movements in between task locations. A robot can choose not to bid on a particular task, based on its distance from and accessibility to that task. Once the auctioned tasks are assigned, the robots begin to move to their task locations, attempting the nearest task first (i.e. the task with the lowest cost). When a robot

552

M. Nanjanath and M. Gini / Auctions for Task Allocation to Robots

Repeat for each robot ri ∈ R: 1. 2. 3. 4. 5. 6.

Activate ri with a set of tasks Ti and a list of robots R−i = R - {ri }. Create an RRT using ri ’s start position as root. Find paths in the RRT to each task location in Ti . Assign cost estimate cj to each task tj ∈ Ti based on the path found. Order task list Ti by ascending order of cj . ri does in parallel: (a) Auction the assigned tasks: i. Create a Request For Quotes (RFQ) with tasks Ti . ii. Broadcast the RFQ to R−i and wait for bids. iii. Find the lowest bid bjk among all the bids for task tj . iv. If bjk < cj then assign tj to robot rk else keep tj . Mark tj as assigned. v. Ask rk to update its bids for the tasks left (rk has now new tasks). vi. Repeat from 6(a)iii until all tasks are assigned. (b) Bid on RFQs received from other robots: i. Find a RRT path for each task tr in the RFQ. ii. Create a cost estimate cr for each tr that the robot found a path to. iii. Send the list of costs to the auctioneer that sent the RFQ. (c) Begin execution of first assigned task: i. Start executing the first task tj by finding a path in the RRT and following it as closely as possible. ii. If new tasks are added as result of winning auctions, insert them in Ti keeping it sorted in ascending order of cost, and repeat from 6(c)i. iii. If ri is stuck, auction ri ’s tasks. iv. If tj is completed successfully, restart from 4.

until timeout. Figure 1. Task allocation algorithm.

completes its first task, it starts an auction again for its remaining tasks, in an effort to improve the task allocation. In case robots get delayed by unexpected obstacles, this redistribution of tasks allows them to change their commitments and to adapt more rapidly to the new situation. If a robot is unable to complete a task it has committed to, it can auction that task. Any task that cannot be completed by any of the robots is abandoned. We assume that there is value in accomplishing the remaining tasks. The robots are given a time limit to complete the tasks, so that they do not keep trying indefinitely. When all the achievable tasks (determined by whether at least one robot was able to find a path to that task) are completed, the robots idle until the remainder of the time given to them is over. The algorithm allows for dynamical additions of new tasks during the execution, but for simplicity, in the experiments described in Section 3, the set of tasks and of robots is known at start and does not change during the execution.

M. Nanjanath and M. Gini / Auctions for Task Allocation to Robots

553

Figure 2. The hospital environment. The top part of the figure shows the Stage simulation, with the locations of the tasks and of the robots. (The active robot has its range sensor traces shown). The lower part shows the paths generated by the RRT algorithm, with the location of the active robot on the paths indicated by a square. This is one of the single robot experimental runs, where only one robot is active.

3. Experimental setup and analysis We conducted experiments in the Player/Stage simulation environment [9]. We simulated robot deployment in complex 2-D worlds, using as our test environment the section of the hospital world from Player/Stage shown in Figure 2. The hospital world consists of several rooms with small doorways and limited accessibility, covering a total area of 33 × 14m2 . Each robot is a small differential drive vehicle placed at an arbitrary location in the world. It is equipped with 5 sonar sensors mounted at 45◦ angles across its front, which are used for obstacle avoidance. While these sensors allow the robot to avoid colliding into straight walls, robots tend to get stuck on corners where they cannot detect the corner before colliding into it. This tend to produce unexpected delays in the execution. Tasks are modeled as beacons placed at different positions in the environment. We used different experimental setups, each with 16 tasks placed in different rooms. We tested the setups with 1, 3, and 10 robots, and ran a final set of experiments with a single auction (with no rebidding) to use as a baseline. The experiments were run for 10 minutes each, to avoid long runs when robots were unable to make much progress. This also allowed us to test how often the robots could

554

M. Nanjanath and M. Gini / Auctions for Task Allocation to Robots

not accomplish all the tasks in the allocated amount of time. We ran each experiment 10 times, with the same initial conditions, but with different initial task allocations. The auction algorithm is sensitive to the order in which tasks are given to the robots. To reduce this effect we supplied the tasks to the robots in a random order each time an experiment was run. This, combined with the inherently random nature of the RRT generation algorithm, resulted in significant variations across runs both in the allocation of tasks and time taken to complete the tasks. 600

Completion Times (seconds)

500

400

300

200

Single Round Auction Single Robot

100

3 Robots, Multiple Auctions 10 Robots, Multiple Auctions

0

1

2

3

4

5 6 Experiment Number

7

8

9

10

Figure 3. Time spent trying to complete tasks in different robot-auction combinations.

Performance results are shown in Figure 3. The results show the time taken to complete all the tasks that were accomplished in each run. We can observe that a single robot takes longer, but, as expected, the speedup when using multiple robots is sublinear. A single round auction tends to perform worse than multiple auctions and has more variability in the time needed to complete the tasks. This is consistent with the observation that reallocation of tasks via additional bidding tends to produce on average a better allocation. The results are best when the number of robots and tasks is balanced. When the task are few some of the robots stay idle, when the tasks are too many with respect to the number of robots the completion time increases, since each robot has more work to do. Figure 4 shows the percentage of tasks completed for each run. Since the number of tasks was relatively large with respect to the time available and the distance the robots had to travel, very few runs had all the tasks completed. We can observe that with a single robot only a small percentage of the 16 tasks get accomplished in the time allocated. With a more balanced number of tasks and robots a much larger percentage of tasks gets done. We can see differences between runs when using a single round auction versus using multiple rounds. The performance of multiple rounds of auctions is not consistently better than when using a single round. Recall that in each experiment the initial allocation of tasks to robots was different, and some allocations were clearly better than others.

555

M. Nanjanath and M. Gini / Auctions for Task Allocation to Robots 100 90

Percentage Tasks Completed

80 70 60 50 40 30

Single Round Auction Single Robot

20

3 Robots, Multiple Auctions

10

10 Robots, Multiple Auctions

0

1

2

3

4

5 6 Experiment Number

7

8

9

10

Figure 4. Relative task completion rates for different robot-auction combinations

4. Related Work The problem we studied is a subset of the larger problem of coordination in a team. Our robots have to coordinate so that all the locations of a given set are reached by a robot, but are otherwise independent. A recent survey [5] covers in detail the state of the art in using auctions to coordinate robots for accomplishing tasks such as exploration [4,11], navigation to different locations [15], or box pushing [7]. Auction-based methods for allocation of tasks are becoming popular in robotics [4,8,15] as an alternative to other allocation methods, such as centralized scheduling [3], blackboard system [6], or application-specific methods, which do not easily generalize [1] to other domains. Combinatorial auctions have been tried as a method to allocate navigation tasks to robots [2] but are too slow to be practical and do not scale well. They allow tasks to be accomplished with maximum efficiency, but the time taken in determining whom to assign which tasks often ends up being more than the time for the tasks themselves. Single item auctions tend to miss opportunities for optimal allocation, even though they can be computed in polynomial time. Our approach tries to find a tradeoff between computational complexity and optimality of allocations. We do not use combinatorial auctions, but we reauction tasks multiple times while they are being executed, so allowing for a better allocation. Recent work [15,13] has focused on producing bidding rules for robot navigation tasks that lower computational costs while providing an optimal solution. The method uses multi-round auctions, where each robot bids in each round on the task for which its bid is the lowest. The overall lowest bid on any task is accepted, and the next round of the auction starts for the remaining tasks. Once all the tasks have been allocated, each robot plans its path to visit all the sites for the tasks it won. The bidding rules are such that there is no need for a central controller, as long as each robot receives all the bids from all the robots, each robot can determine the winner of the auction. Our approach differs in many ways. First, the auctioneer determines the winner of the

556

M. Nanjanath and M. Gini / Auctions for Task Allocation to Robots

auction, so if a robot fails to submit a bid (perhaps because of communication failure), the auction can continue. Second, our approach is designed for dynamic situations where unexpected delays during execution can prevent a robot from accomplishing its tasks, or can make task accomplishment more expensive than originally thought. By continuously rebidding and reallocating tasks among themselves during task execution, the robots react and adjust to changing situations.

5. Conclusions and Future Work We have presented an algorithm for allocation of tasks to robots. The algorithm is designed for environments that are dynamic and where failures are likely. We assume the robots are cooperative, and try to minimize the total time to complete all the tasks assigned to the group. Each robot acts as an auctioneer for its own tasks and tries to reallocate its tasks to other robots whenever this reduces the cost. Robots also reassess the current situation and attempt to improve the current task allocation by putting their remaining tasks up for bid whenever they complete a task. The process continues until all the tasks have been completed or the allocated time has expired. We removed any need for central coordination; tasks are assigned in a distributed fashion, so that the system can recover from single or even multiple points of failure. This prevents us from using any centralized system, such as a blackboard system [6], since this will create a single point of failure. Future work will include considering additional costs to do tasks over the cost of reaching the task location, and introducing heterogeneous robots having different speeds and capabilities. In addition, we have left addressing communication malfunctions for future research. It is our experience that robots can become disabled but rarely lose the ability to communicate. Disabling communication will introduce new challenges. Since each robot is initially given its own tasks, it will have to maintain separately the complete list of tasks given to the system as a whole. This can be done by having an initial communication phase that involves broadcasting the list of tasks to all the robots (assuming no communication failure at that time). Each robot will also need to track task completion by the other robots, and periodically broadcast its own state. Acknowledgments Work supported in part by NSF under grants EIA-0324864 and IIS-0414466.

References [1] W. Agassounon and A. Martinoli. Efficiency and robustness of threshold-based distributed allocation algorithms in multi-agent systems. In Proc. of the 1st Int’l Conf. on Autonomous Agents and Multi-Agent Systems, pages 1090–1097, July 2002. [2] M. Berhault, H. Huang, P. Keskinocak, S. Koenig, W. Elmaghraby, P. Griffin, and A. Kleywegt. Robot exploration with combinatorial auctions. In Proc. Int’l Conf. on Robotics and Automation, 2003. [3] S. Chien, A. Barrett, T. Estlin, and G. Rabideau. A comparison of coordinated planning methods for cooperating rovers. In Proc. of the Int’l Conf. on Autonomous Agents, pages 100–101. ACM Press, 2000.

M. Nanjanath and M. Gini / Auctions for Task Allocation to Robots

557

[4] M. B. Dias and A. Stentz. A free market architecture for distributed control of a multirobot system. In Proc. of the Int’l Conf. on Intelligent Autonomous Systems, pages 115–122, Venice, Italy, July 2000. [5] M. B. Dias, R. M. Zlot, N. Kalra, and A. T. Stentz. Market-based multirobot coordination: A survey and analysis. Technical Report CMU-RI-TR-05-13, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, April 2005. [6] R. S. Engelmore and A. Morgan, editors. Blackboard Systems. Addison-Wesley, 1988. [7] B. P. Gerkey and M. J. Matari´c. Sold!: Auction methods for multi-robot coordination. IEEE Trans. on Robotics and Automation, 18(5), Oct. 2002. [8] B. P. Gerkey and M. J. Matari´c. Multi-robot task allocation: Analyzing the complexity and optimality of key architectures. In Proc. Int’l Conf. on Robotics and Automation, Sept. 2003. [9] B. P. Gerkey, R. T. Vaughan, and A. Howard. The Player/Stage project: Tools for multi-robot and distributed sensor systems. In Proc Int’l Conf on Advanced Robotics, pages 317–323, June 2003. [10] D. Goldberg and M. J. Matari´c. Design and evaluation of robust behavior-based controllers. In T. Balch and L. E. Parker, editors, Robot Teams: From Diversity to Polymorphism. A K Peters Ltd, Natick, MA, Apr. 2002. [11] N. Kalra, D. Ferguson, and A. Stentz. Hoplites: A market-based framework for planned tight coordination in multirobot teams. In Proc. Int’l Conf. on Robotics and Automation, 2005. [12] J. J. Kuffner and S. M. LaValle. RRT-connect: An efficient approach to single-query path planning. In Proc. Int’l Conf. on Robotics and Automation, pages 995–1001, 2000. [13] M. G. Lagoudakis, E. Markakis, D. Kempe, P. Keskinocak, A. Kleywegt, S. Koenig, C. Tovey, A. Meyerson, and S. Jain. Auction-based multi-robot routing. In Robotics: Science and Systems, Cambridge, USA, June 2005. [14] K. Sugawara and T. Watanabe. Swarming robots – foraging behavior of simple multi-robot systems. In Proc. IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems, Lausanne, Switzerland, 2002. [15] C. Tovey, M. Lagoudakis, S. Jain, and S. Koenig. The generation of bidding rules for auctionbased robot coordination. In Multi-Robot Systems Workshop, Mar. 2005.

558

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Exploration of Natural Dynamics through Resonance and Chaos Alex Pitti, Max Lungarella, and Yasuo Kuniyoshi Intelligent Systems and Informatics Lab Dept. of Mechano-Informatics The University of Tokyo, Japan Email:{alex,maxl,kuniyosh}@isi.imi.i.u-tokyo.ac.jp Abstract. Self-exploration of movement possibilities and exploitation of natural dynamics are two crucial aspects of intelligent autonomous systems. We introduce a dynamical exploration strategy which combines chaotic neural activity with feedback-induced resonance. The underlying mechanism satisfies three conditions: (1) resonant patterns are discovered even in the presence of noisy sensory feedback; (2) no preprocessing of the feedback signal is required; and (3) exploration is an integral part of the mechanims. An embodied system endowed with our exploration strategy, can autonomously discover and tune into the intrinsic resonant modes most relevant to its body morphology. The mechanism not only allows to explore stable behavioral patterns, but also unstable ones. We demonstrate the role played by our exploration strategy on the emergence of movements through a simulation of a bipedal robot. We analyze with spectral and temporal methods the synchronicity of movements, spatio-temporal coordinations, and bifurcations in the couplings between neural, body, and environmental dynamics. Keywords. Natural dynamics, feedback resonance, coupled chaotic fields, morphology, neural control

1. Introduction The main question addressed in this paper is how an embodied system can autonomously discover modes of movement coordination which exploit the natural dynamics of the body. Traditional approaches to robot control [1] as well as soft-computing methodologies (e.g. reinforcement learning, neural networks, or genetic algorithms) conceive of the control of actions as a mathematical problem neglecting the importance of the interaction with the real world. The emphasis is placed more on precision, reproducibility, and planning of optimal movements guided by feedback. Although these approaches are successful in particular well-controlled application domains (e.g. industrial environments), they face considerable problems when the system’s parameters are unknown, the number of sensor and actuators is subject to changes, or the robustness and adaptivity to unforeseen situations is of the essence. The difficulties have not gone unnoticed, and there have been recent attempts at devising more adaptive control strategies grounded into dynamic systems theory (e.g. [5,10]). The focus of this research effort is on the coupling of brain, body, and en-

A. Pitti et al. / Exploration of Natural Dynamics Through Resonance and Chaos

559

vironment, and the exploitation of the intrinsic dynamics of the individual components. The "control loop" explicitly and purposively exploits factors such as morphology, material properties, natural dynamics, and, more generally, the physical interaction with the real world. In this paper, building up on these ideas and on previous work [6,11], we present and discuss a framework for exploring and exploiting the bodily capabilities of a complex embodied system. The core intuition is that small but well-timed perturbations can be used to push the system into behavioral modes characterized by high behavioral flexibility and low energetic cost. For this purpose, we investigate neural and physical mechanisms involved in the emergence of actions that are driven by the interaction of brain, body, and environment, and propose several tools to quantify them. We focus on two key components governing such emergence: (1) feedback resonance (physical component), and (2) coupled chaotic fields (neural component). In the following section, we detail how these two components affect the emergence of movements. We then describe our experimental setup and a set of methods aimed at shedding some light on the complex interplay between brain, body, and environment. We finally analyze the movement patterns produced by our system and discuss the results.

2. Exploration Mechanism 2.1. Physical component: feedback resonance Feedback resonance is a mechanism to drive a nonlinear oscillator into resonance by varying the frequency of an external perturbation as a function of the oscillation amplitude [8]. The rationale is that such a perturbation destabilizes the system and induces it to move from its current orbit to another more stable one, potentially driving the system closer to resonance. In addition to external perturbations, changes of the system’s global dynamics are also regulated through perturbations induced by the system’s internal dynamics. Modulating the internal dynamics can therefore also provide clues about which external perturbations the system is sensitive to. This approach is inspired by the OGYmethod [7] which demonstrates how a small time-dependent change of a control parameter of a chaotic system can significantly affect the behavior of the system, e.g. turn a chaotic behavior into a periodic one. It is also somewhat related to the notion of "intervention dynamics" which was introduced as a means of funneling the global dynamics of a system through particular characteristic via-points known as "focuses" [5]. Given two coupled systems A and B, we can formulate the principle underlying feedback resonance as: FiA (t + 1) = FiA (t) + γFiB (t), with FiA (t) >> γFiB (t),

(1)

where FiA (t) denotes the "force" exerted on the ith degree of freedom of system A at time t, and γ is a coupling constant. The term γFiB (t) represents the influence of system B on system A. Conversely, the same perturbation scheme can be applied to system B: FiB (t + 1) = FiB (t) + ηFiA (t), with FiB (t) >> ηFiA (t),

(2)

where FiB (t) is the value of the ith degree of freedom of B, and FiA (t) indicates the influence of system A on system B modulated by the coupling constant η. Eqs. 1 and 2 establish a coupling between the two systems which allows internal dynamics and ex-

560

A. Pitti et al. / Exploration of Natural Dynamics Through Resonance and Chaos

ternal perturbations to mutually entrain. Such mutual entrainment is typically accompanied by resonant modes. In our framework, A and B are the neuro-body and the bodyenvironment sub-systems (Fig. 1 a).

Figure 1. Our model: a) theoretical framework, and b) simulated bipedal robot.

2.2. Neural component: dynamically coupled chaotic field We model the neural part of our system as a dynamical network of decoupled chaotic units. These units are connected through afferent connections to the sensors (input), and through efferent connections to the actuators (output). The chaotic system is coupled to the mechanical system (body) through feedback resonance (Eqs. 1 and 2). The effect of entrainment is the emergence of correlations and modes of coordination accompanied by an integrative field of neural units which can not be produced by a predefined and static coupling. We suggest that the interaction between brain and body-environment dynamics couples (or decouples) the chaotic units connected to the body parts involved in the movement. Each chaotic unit is modeled as a logistic map whose chaoticity is determined by the control parameter α: xi (t + 1) = fα (xi (t)),

(3)

fα (xi (t)) = 1 − αxi (t)2 + ηFi (t), where 1 − αxi (t)2 >> ηFi (t). In all our experiments and for all units: α ∈ [0.0, 2.0]. The use of chaotic units is partially justified by findings on human locomotion showing that variations between steps are not mere noise but have a chaotic structure [2]. Some evidence also indicates that chaos might occur at higher levels of the neuromuscular control system [2,3].

3. Methods In this section, we introduce measures for quantifying movement patterns: the duty factor (DF), the spectral bifurcation diagram (SBD), the wavelet transform (WT) and a novel method derived from the wavelet transform, the wavelet bifurcation diagram (WBD). The duty factor is typically used for quantifying and classifying different types of locomotive patterns [14]. It is defined as the fraction of time the legs are on the ground versus the duration of one locomotive cycle. In walking humans, for instance, each foot is on the ground for more than half of the time (DF > 0.5). In running humans, however,

A. Pitti et al. / Exploration of Natural Dynamics Through Resonance and Chaos

γ

η

10

0.01

density

damping

stiffness

561

sampling time (s)

0.6 10 10 0.05 Table 1. Parameters used in our experiments.

it is on the ground for less than half of the time (DF < 0.5). The spectral bifurcation diagram is an attempt to analyze qualitatively the dynamics of high dimensional dynamical systems [9]. It superimposes the spectral density distribution of each variable relative to the current value of the control parameter of the system under study. The wavelet transform is a popular and powerful tool for time series analysis and signal compression [12]. It extends the Fourier representation to the temporal domain. An interesting property of the wavelet transform is that it can be used to identify short- and long-range temporal correlations in time series. We extended the wavelet transform to the analysis of correlations among the chaotic units composing the neural system and the individual body parts. In order to visualize the spatio-temporal patterns and the bifurcations at different scales, we added a dimension spanning the index of the units of the system. We call this measure the wavelet bifurcation diagram.

4. Experiments For our experiments we used a simulated bipedal (two-legged) robot moving in the sagittal plane (Fig. 1 b). Each leg was composed of two limbs linked by a damper-spring joint and had two active and two passive joints. The legs were actuated at the hip by two motors controlled by a neural system consisting of two chaotic units. The input to the chaotic units was provided by angle sensors located in the passive joints (the sensors measured the angle between upper and lower limb). The parameters used for the simulation are given in Table 4. All our simulations were performed with Matlab. By changing the control parameter α, we observed many different movement patterns. For 0 < α < 0.7, the dynamics of the chaotic system and of the body were weakly coupled and perturbed each other only slightly. The small perturbations transmitted by both systems, however, were not strong enough for any movements to occur. For α = 0.7, the system started to rock slowly back and forth without actually changing its position. By increasing α up to 0.8, the robot had enough energy to balance but not enough to lift its legs and locomote, that is, the system began to "crawl" and then from time to time returned to its initial (balanced) position (Fig. 3 a). For α = 0.85 the system had finally enough inertia to lift one of its legs and started moving. The external perturbations, and the speed increased too. This change is reflected by a drop of the duty factor from about 0.65 to 0.40 for speeds v < 0.3 m/s. For α = 0.9, the internal excitations of the chaotic system fed to the motors had enough energy so that both legs could move separately. The system coordinated its legs alternatively by hopping (Fig. 3 b; duty factor DF < 0.20 and speeds v < 0.5 m/s), walking (Fig. 3 c), and jumping (Fig. 3 d; DF < 0.20, 0.5 m/s < v < 1.0 m/s). Note that for α < 1.0 hopping was a more stable and reproducible behavior compared to walking and jumping which require a more precise synchronization between the legs. Systems with hard (k = 50) and soft (k = 10) springs displayed the same stable behaviors (Fig. 2). For α > 1.0 we observed a change of the dynamics depending on the spring stiffness. In the case of hard springs, the mechanical system moved in the

562

A. Pitti et al. / Exploration of Natural Dynamics Through Resonance and Chaos

same stable fashion quite immune to changes of the chaoticity of the neural system. The system with soft springs was able to perform faster coordinated movements but as the control parameter increased the movements were less stable. Its dynamics allowed it to run and jump at v > 1.5 m/s for DF < 0.20. For α > 1.4, the system was highly unstable and did not display any coordinated movements. The perturbations of the chaotic system strongly affected the body movements. External perturbations due to the bodyenvironment interaction had no measurable influence on the neural system.

Figure 2. Duty factor for different values of α and for joints with different stiffness. (a) Hard springs (k = 50); and (b) soft springs (k = 10). α < 1.0 (circles); α > 1.0 (crosses). 600 samples for each category.

Figure 3. Gaits for different values of the control parameter: a) crawling, b) hopping, c) walking, and d) jumping.

5. Analysis Movement patterns have spatio-temporal structure resulting from changes in the coordination dynamics of both body and neural system. The spectral bifurcation diagram illustrates the spectral distribution of the coupling between the neural system and the body as a function of the control parameter α (Fig. 4). The frequencies with high density correspond to those of synchronization between the two dynamical systems. They represent the inner resonant states for which perturbations (or moments of energy exchange) between the two coupled systems are significant. Interestingly, for α < 1.0, the body dynamics has little effect on the neural dynamics and vice versa. We can distinguish the fundamental mode at a frequency of 10 Hz, and harmonics at regular intervals from the fundamental with an harmonic mode around 256 Hz. This corresponds to the excitation of the passive dynamics of the damped springs (the low frequencies) by the chaotic system (the higher frequencies) when the mechanical system balances, crawls, and starts walking and hopping. These frequencies

A. Pitti et al. / Exploration of Natural Dynamics Through Resonance and Chaos

563

Figure 4. Spectral Bifurcation Diagram: (a) sensors, and (b) chaotic units.

represent the lowest energetic cost required by the system to move and locomote and at the same time the maximum amount of external perturbation that the system can absorb while performing the required actions. The wavelet transform for different values of α for the time series extracted from the sensor located in the knee of the rear leg is plotted in Fig. 5 (a). This figure also illustrates the stability and low complexity of the behavior when the robot is poised in balance. Note that for α < 1.0 every scale lacks long-range temporal correlations. Conversely, for α = 0.97, a qualitative change occurs in the interaction between the two dynamical systems. As a result a spectral bifurcation at frequency 128 Hz appears in the SBD indicating more powerful hopping and walking patterns. The wavelet transform in Fig. 5 (a) shows this qualitative change for s = 128 and for s < 50. Temporal correlations are formed by the appearance of long almost single-scale (i.e. periodic) temporal patterns. The wavelet bifurcation diagram (WBD) in Fig. 6 (a) allows to visualize the emergence of rhythmically stable spatio-temporal coordinations of the two legs (units 1 and 2) when the system starts to walk and hop (appearance of stripes at scales 1 and 2). When α increases a little, these dynamical patterns become more unstable (Fig. 6 b). For α > 1.13, the spectral bifurcation at 128 Hz diffuses and gradually activates all the surrounding frequencies. This new modulation affects the stability of the coordination between the chaotic system and the body producing fast walking and running behaviors in addition to hopping and crawling. These frequencies represent the amount of energy that the system can actually absorb to perform these behaviors with regard to internal/external perturbations. These irregulars harmonics are characteristic of the change of stiffness in springs as observed in [13], giving the property of the system to "harden" or "soften" its springs, and thus to have higher flexibility. As can be seen in Fig. 5 (c) shortlasting unstable motions for scales s < 25 form stable and repetitive long-range temporal patterns at higher scales (s > 25, 128, 155 and higher). The stability of the dynamics has changed scale with α implying that stability is scale-dependent. The same result is observed in the case of the WBD (Fig. 6 c). For scales s > 2 the embodied system performs stable "long-range" movements which are accompanied by short-range perturbance-like movements at lower scales. The structure of the spatio-temporal patterns is fractal as in human locomotion [2]. The patterns are formed at the lowest scales and are integrated and coordinated in time at higher scales. Long-lasting movements are thus composed of movements of smaller duration. Increasing α beyond 1.25 gives rise to a second spectral bifurcation at frequency 256 Hz leading to chaotic interaction dynamics between the coupled systems. In this

564

A. Pitti et al. / Exploration of Natural Dynamics Through Resonance and Chaos

mode, the embodied system is highly sensitive to any kind of perturbations and is unstable (almost all the perturbations give rise to resonance). This result seems to contradict the ones obtained with the wavelet transform (Fig. 5 d) and with the WBD (Fig. 6 d). The movements, although unstable and dynamic for scales s < 25, are highly correlated at the higher scales presenting stable movement patterns with long-range temporal correlations. On other words, the stability of the dynamics has changed scale.

Figure 5. Wavelet Transform. From top-left to bottom-right the intervals of the control parameter α are: [0.90; 1.00]; [1.00; 1.10]; [1.10; 1.20]; and [1.20; 1.30]. The arrow indicates a spectral bifurcation.

Figure 6. Wavelet Bifurcation Diagram. From top to bottom the intervals of the control parameter α are: [0.90; 1.00]; [1.00; 1.10]; [1.10; 1.20]; and [1.20; 1.30]. The horizontal axis denotes time, the vertical axis denotes the index of the chaotic unit. The individual graphs have different time-scale resolutions.

A. Pitti et al. / Exploration of Natural Dynamics Through Resonance and Chaos

565

6. Conclusion In this paper, we addressed the question of how an embodied system can autonomously explore its movement possibilities and exploit its natural dynamics. We proposed a general framework based on two core phenomena: resonance and chaos. We also introduced a set of quantitative measures to analyze the dynamics of coupled nonlinear system. We observed that the stability of the emergent movement patterns in a simulated bipedal robot is scale dependent and present a spatio-temporal fractal structure. Our future work will be aimed at exploring how our framework might be combined with learning, planning, and intentional behavior.

7. Acknowledgments The authors would like to thank Fumiya Iida for kindly providing the simulation of "Puppy – the dog robot", and the Japanese Society for the Promotion of Science (JSPS) for supporting this research.

References [1] Spong, M.W. and Vidyasagar, M. (1989). Robot Dynamics and Control. New York: Wiley. [2] Hausdorff, J. M., C.-K. Peng, Z. Ladin, J. Y. Wei, and A. L. Goldberger (1995). Is walking a random walk? Evidence for long-range correlations in the stride interval of human gait. J. Appl. Physiol. 78: 349-358, 1995. [3] Kurz MJ, Stergiou N. (2005). An artificial neural network that utilizes hip joint actuations to control bifurcations and chaos in a passive dynamic bipedal walking model. Biological Cybernetics 93(3):213-21. [4] Kaneko, K. and Tsuda, I. (2000). Complex Systems: Chaos and Beyond. New York: SpringerVerlag. [5] Yamamoto,T. and Kuniyoshi, Y. (2002). Global dynamics: a new concept for design of dynamical behavior. Proc. of 2nd Int. Workshop on Epigenetic Robotics, pp. 177-180. [6] Kuniyoshi, Y. and Suzuki, S. (2004). Dynamic emergence and adaptation of behavior through embodiment as coupled chaotic field. Proc. of 17th Int. Conf. on Intelligent Robots and Systems, pp. 2042-2048. [7] Ott, E., Grebogi, C. and Yorke, J. (1990). Controlling Chaos. Physical Review Letters, 64(11):1196-1199. [8] Fradkov, A.L. (1999). Investigation of physical systems by means of feedback. Autom. Remote Control, 60(3). [9] Orrel, D. and Smith, L.A. (2003). Visualising bifurcations in high dimensional systems: the spectral bifurcation diagram. Int. J. of Bifurcation and Chaos, 13(10):3015-3027. [10] Lungarella, M. and Berthouze, L. (2002). On the interplay between morphological, neural, and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3/4):223-241. [11] Pitti A., Lungarella, M. and Kuniyoshi, Y. (2005). To appear in Proc. of 2nd Australian Conf. on Artificial Life. [12] Mallat S. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 11(7):674-693. [13] Thompson, J.M.T. and Stewart, H.B. (2002). Nonlinear Dynamics and Chaos. New York: Wiley. [14] McNeill Alexander, R. (2003). Principles of Animal Locomotion. Princeton Book.

566

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

One-Legged Locomotion with a Compliant Passive Joint a

Juergen Rummel a , Fumiya Iida a,b , Andre Seyfarth a Locomotion Laboratory, University of Jena, Dornburger Str. 23, D-07743 Jena, Germany, email: [email protected], web: www.lauflabor.de b Artificial Intelligence Laboratory, University of Zurich, Andreasstrasse 15, CH-8050 Zurich, Switzerland Abstract. There is an increasing attention of exploiting compliant materials for the purpose of legged locomotion, because they provide significant advantages in locomotion performance with respect to energy efficiency and stability. Toward establishing a fundamental basis for this line of research, a minimalistic locomotion model of a single legged system is explored in this paper. By analyzing the dynamic behavior of the system in simulation and a physical robotic platform, it is shown that a stable locomotion process can be achieved without the necessity of sensory feedback. In addition, further analysis characterizes the relation between motor control and the natural body dynamics determined by morphological properties such as body mass and spring constant. Keywords. Legged locomotion, compliance, hopping, two-segmented leg, feedforward control, morphological properties,

1. Introduction While most of the legged robots are composed of rigid materials and controlled by highgain control, there has been an increasing attention to the legged locomotion exploiting elastic materials. In nature, biologists have explored a few important reasons why animals make use of elasticity to achieve rapid legged locomotion. Elastic materials can be used for storing energy which results in energy efficient locomotion [1,2]. In particular, the so-called spring-mass model has been explored for the theoretical analyses of animal running behavior [3,4]. This simple theoretical model which consists of a body represented as a point mass and a leg as a linear spring explained how the legged locomotion can be approximated. Based on these biomechanical studies, a number of legged robots have been developed to understand how elasticity could be exploited for the purpose of locomotion [5,6,7,8,9,10]. Although all of these robotic platforms are designed for the purpose of legged locomotion, there are a few different approaches. One class of locomotion mechanism makes use of prismatic actuation in the legs. In this approach, a prismatic actuator pushes off the body when it touches down the ground [5,6]. In the second approach, robots recognize stance phases of a leg but without using prismatic actuation. The leg is controlled to set a particular angle during swing phase.

567

J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint

hip joint

beam

(x1,x2)

(xl,yl)

hip joint

spring

a y

rubber cm

B

lspr

(x6,x7)

x

k

z1

z2

l2 /2

knee joint

A

spring

body

M g x5

thigh

(x3,x4) m1 I1=m1l12/12 l1 /2 shank

x8

m2 2 I2=m2l2 /12

(xr,yr)

Figure 1. A: Fujubot, a one-legged robot with two segments and a compliant passive joint. A based beam holds the robots body. B: Schematic diagram of the simplified dynamic model used in simulations.

We aim to propose another class of legged locomotion with compliant legs, which uses no sensory information. Previously, it has been shown that, if the system has a good morphological properties such as spring stiffness and body dimension, it is able to achieve rapid four legged locomotion [8]. In this model, by simply swinging the legs back and forth, the system is able to achieve hopping locomotion behavior. Moreover, the system has to exploit the body dynamics derived from morphological properties, because there is no sensory information. For a better understanding of dynamic legged locomotion, we reduced the model to a single spring-like but also segmented leg that will be introduced in the next section. In this study, we investigate the characteristics of self-organized locomotion in relation to actuation and morphological properties in which we use a physical robotic platform and a simulation model. The performance criterions speed and actuator torques will be analyzed in detail to prove the behaviors in one-legged locomotion.

2. Methods 2.1. Mechanical application The mechanical application (Fujubot) we used in this investigation is a one-legged robot with two segments shown in Figure 1A. The robot is approximately 15 cm in height and has a total weight of about 0.2 kg including additional masses behind the horizontal bar. Only one digital servomotor is implemented as actuator. This motor actuates the hip joint that is connected to the thigh segment (90 mm). A centered shank segment (80 mm) is connected to the thigh via a passive knee joint. In addition, the shank is elastically linked with the thigh by a spring. The spring stiffness k is 0.2 N/mm with a rest length of 25 mm. The rest angle of the knee joint is circa 130 ◦ . Therefore, the leg length is

568

J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint

approximately 12 cm. Two elements of rubber at both ends of shank segment serve as damper when hitting the ground. As shown in the background of Figure 1A, the body of Fujubot is held by a beam (0.34 m) and a base. This mounting, similar to Raiberts tether mechanism [5], constrains the motion of robots body leaving just two degrees of freedom. It moves forward and backward and up and down. Indeed, this robot is built imperfectly, hence, the mentioned retaining application has play in the joints. Especially the pitch axis has a play of up to ±5 ◦ . Nevertheless, with the help of this mounting the robot does not need to be stabilized. Moreover, we are able to investigate the behavior of a segmented spring-like leg without using difficult control strategies. 2.2. Motor control The hip actuator is realized by a position controlled motor. In experiments we used a simple sine oscillation as angular position signal. α(t) =

ωmax sin (2 π f t) + α0 2πf

(1)

The maximum angular velocity ωmax was held constant at 360 ◦ /s and is a motor specific property. Furthermore, we are able to use the frequency f and offset angle α0 as independent control parameters. In the experiments the frequency varied from 1.5 to 8 Hz in steps of 0.5 Hz, where the offset angle ran the gamut from −25 ◦ to +25 ◦ in increments of 5 ◦ . We have to notice that the maximum controllable position of the used motor averages at ± 64 ◦ . Therefore, we limited the amplitude of the position signal at 38 ◦ for frequencies lower than 3 Hz. According to [8], this control strategy does not need global sensory feedback. 2.3. Experimental method In the experiments the robot moved in a circle on a flat ground surface. For kinematic analysis of the robots behavior and locomotion we attached reflective markers on the application. A high-speed (240 Hz) camera system (Qualisys) with six motion capture units recorded the three-dimensional coordinates of the markers. For data analysis we converted the measured trajectories into a two-dimensional plane. 2.4. Simulation model In this study we also used a simplified simulation model as illustrated in Figure 1B. It has 4 degrees of freedom: the body coordinates x1 and x2 and the angular displacement of thigh and shank: x5 and x8 . We replaced the equation of motion for x5 while using kinematical control. The control was the same as discussed in the 2.2 except of the limitation of the amplitude at lower frequencies. The ground is modelled as nonlinear spring and damper in vertical direction [11]. For horizontal direction, our ground reaction model is more complex. Here, we implemented a state machine for describing sliding friction and adhesive force when the foot stands still. It switches from sliding to sticking when the velocity of the foot becomes lower than a specified threshold. It switches back when the horizontal force becomes higher

569

J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint 0

20

2.5

-1

1

0

A

-3 -2

10

-1

0

0 1

-10

-20

-20

5 7 2 3 4 6 8 frequency f of hip oscillation [Hz]

2 1

2.5

2

0

offset angle a0 [deg]

2.5

2

10

-10

1

0

-2

offset angle a0 [deg]

20

B

2.5

2.5

2

2 5 7 2 3 4 6 8 frequency f of hip oscillation [Hz]

Figure 2. Average speed in horizontal direction of the robot (A) and the simulation model (B). The velocity is measured in leg lengths per second, whereas the leg is 12 cm in length.

than the force of sliding friction. See appendix for a detailed description of the ground model. A difference of this model to the real existing leg is that this model has no play R 7.01 with additional toolboxes in the joints. We simulated this model using Matlab R Simulink and SimMechanics.

3. Results Locomotion behavior of the two-segmented leg explained in the previous section is tested and analyzed by using the robotic platform and the simulation model. First, we explore the relation between locomotion behavior and the control parameters, offset angle α0 and the frequency of hip oscillation f . With every set of these two parameters, we conduct experiments for 30 seconds. We repeat the experiments two times and record the kinematic data of the robot by using the motion tracking system mentioned in section 2.3. By varying these parameters, we observe four clearly distinguishable patterns of locomotion, i.e. forward locomotion with one and two ground contacts, hopping at place and backward locomotion. Figure 3B shows typical forward locomotion of this robot in which the robot touches down with the right end of lower segment. The spring is extended due to the ground reaction force. Because the robot efficiently uses the spring, this behavior generally results in faster locomotion. Another type of locomotion behavior shows ground contacts of both ends of the lower segment as shown in Figure 3A. Generally, after the right point touches down, the left end of the segment also hits the ground which results in slower but more stable locomotion behavior. This type of behavior is, however, mostly dependent on the parameter of offset angle as shown in Figure 3E. The backward locomotion shown in Figure 3C hardly uses the passive joint, which leads to unstable speed as explained later in this section. This behavior results in fast locomotion, but rarely. In order to understand underlying mechanisms of these behaviors, we investigate locomotion speed with respect to the control parameters as shown in Figure 2A. For this analysis, we split the two-dimensional trajectories into cycles defined by motor oscilla-

570

J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint 0.35

1.0

y [m] 0

10 0

0.35

0.5

-10

0.2 0.2

A

0.5

0.2

offset angle a0 [deg]

0.1

0.5

20

-20

y [m]

0.1

D

5 7 2 3 4 6 8 frequency f of hip oscillation [Hz] 20

B

offset angle a0 [deg]

0

y [m]

0.1

C

0.2

10 0

0.3

0.4

-10 -20

0

0

0.1

0.2

x [m]

0.3

E

0.5

0.6

0.5

5 7 2 3 4 6 8 frequency f of hip oscillation [Hz]

Figure 3. A-C: The stick figures shows two cycles of robots locomotion behavior. The direction of movement is described by the arrows. Parameters of motor control: A: f = 3 Hz, α0 = −15 ◦ , B: f = 5 Hz and α0 = 10 ◦ and C: f = 7.5 Hz, α0 = 25 ◦ . The landscape in D illustrates the standard deviation of the robots horizontal velocity measured in leg lengths per second. In figure E, the length of time is shown where the left foot-point touches the ground in addition to the right foot-point. The values are normalized to cycle time of motor oscillation.

tion. For every set of parameters, offset angle and frequency, we randomly selected 120 movement cycles. The average speed of each cycle was calculated and we averaged it again. It is clearly shown that there are three peaks of speed in the landscape of control parameters, i.e. one peak in the lower frequency and lower offset angle, one in the middle, and the negative peak at the higher frequency and high offset angle. These peaks are corresponding to the forward locomotion with two ground contacts, the normal forward and the backward locomotion. We conduct simulation experiments in the same manner, and it results in a similar perspective as shown in Figure 2B. Here, the limit where direction of hopping changes is displaced into lower frequencies. In contrast, the region of lower offset angles, where high speed locomotion is observed, spreads out to higher frequencies. These differences could be caused by the non-existence of compliance or play in the hip joint of the model. For further characterization of the locomotion behavior we explored the periodicity of the cyclic motion. The periodicity is analyzed using standard deviation of average speeds of each motion cycle as shown in Figure 3D. The deviation was calculated by using the previous locomotion speed data. In this analysis, the standard deviation is to interpret as opposite value to periodicity. This means, low deviation in speed of the motion cycles describes high periodicity and vice versa. The figure shows that more periodic lo-

571

J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint

f = 5 Hz

8 6

1

0

2

0

0,1

2

4

1 1

0

2

0

0 0

0,2

0,4

0,6

0,8

10

0,2 0,4 0,6 0,8 10 spring stiffness k [N/mm]

0,2

0,4

0,6

0,8

1

0

mean of hip torques [Ncm]

-1

0

2

0,2

3

1

3

1

0.5

body mass M [kg]

4

0,3

10

2

-1

f = 4 Hz

2

f = 3 Hz

0,4

Figure 4. Simulation results: Absolute values of averaged hip torques during stance phase. The contour lines illustrate the forward velocity as additional performance criterion. The velocity is measured in leg lengths per second. The offset angle α0 is held constant at 10 ◦ . Cleared regions indicate that the foot does not leave the ground.

comotion can be achieved with lower offset angles, and it becomes more unstable as the offset is increased. By considering the locomotion speed shown in Figure 2A, the best performance can be achieved by locomotion with two points in ground contact. Another region of higher performance is shown around the frequency of 5 Hz and an offset angle of approximately 10 ◦ . Here, a higher speed and a local minimum in deviation of speed was achieved where the robot shows normal forward locomotion. So far, we have investigated the influence of control on morphology of hopping behavior. An additional analysis of the locomotion performance is explored in simulation with respect to the spring stiffness k and the body mass M . Here, we investigated the effects of these mechanical properties on the torque in the hip joint and again on speed as illustrated in figure 4. Both issues provide information about energy efficiency of locomotion. We calculated the torque by using the ground reaction forces during stance in conjunction to the leg vector. The data were averaged over 10 locomotion cycles in which the simulated leg was in steady state of hopping. The figure shows that with higher mass the torque mostly increases and therefore the raised energy increases too. Indeed, there is a linear connection between body mass and stiffness that causes in higher velocities if the legs natural oscillation frequency is considered. Furthermore, we observe that with low mass, i.e. 0.2 kg, the behavior is more independent on the stiffness but the speed is lower. Therefor, an exact adjustment of spring stiffness is not essential for achieving locomotion with compliant legs.

4. Conclusion This paper presented preliminary experimental results of a single legged locomotion by using a robotic platform and simulation. Despite its simplicity, this locomotion model shows a few implications which are potentially important for our better understanding of dynamic locomotion behavior. Even though this model is still attached to a supporting beam to prevent rotational movement of the body, the locomotion behavior is simply achieved without the necessity of sensory feedback. This is possible mainly because the system can self-stabilize

572

J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint

F2 T1

F1

F1 Mg

F2

Fg2y Fg2x

Fspr Fspr

j2 F3 m2 g F4

j1 F4

m1 g F3

Fg1y Fg1x

Figure 5. Dynamics of the two-segmented leg.

itself into a stable periodic movement by exploiting the body dynamics derived from mechanical properties and the ground interaction with the environment. In particular, it is important to note that morphological properties (e.g. body mass and spring stiffness) are the important design parameters in order to make this locomotion possible. One of the direct applications of this approach could be four (or more) legged locomotion as shown in some of the previous work [6,8,9]. Although the relation of the proposed locomotion model and the control of rotational movement has to be investigated further in the future. The minimalistic model explored in this paper provides further insights of self-stabilizing character in the processes of locomotion in general.

Appendix Here, we introduce the ground reaction model in detail. The ground reaction forces Fg.. are exemplarily given for the right foot-point described as (xr , yr ). All variables correspond to Figure 1A and Figure 5. ⎧  e2   e1  ⎨  yr  x˙ r √ −p1  l0  sgn(yr ) + p2 · mt g for yr < 0, g l0 Fg1y = ⎩ 0 otherwise, ⎧    Δxr e1 ⎪ ⎪ −p  1 ⎪  l0  ⎪  e2  ⎪ ⎪ ⎪ x˙ r ⎪ √ ⎪ · sgn(Δx ) + p · mt g for yr < 0, and so long as r 3 ⎪ ⎪ g l0 ⎨ Fg1x − μ1 Fg1y > 0 Fg1x = ⎪ ⎪ ⎪ for yr < 0, and so long as −sgn (x˙ r ) μ2 Fg1y ⎪ ⎪ ⎪ ⎪ | x˙ r | − vlim > 0 ⎪ ⎪ ⎪ ⎪ ⎩0 otherwise,

J. Rummel et al. / One-Legged Locomotion with a Compliant Passive Joint

Δxr = xr − xr0

573

mt = M + m1 + m2

The moment the state machine for describing horizontal ground reaction forces changes into adhesive friction, the horizontal displacement xr is saved as reference xr0 . In our simulations we used the following constants: g = 9.81 m s−2

M = 0.2 kg

m1 = 0.01 kg

m2 = 0.01 kg

l0 = 120 mm

μ1 = 0.40

μ2 = 0.45

p1 = 2.5 · 105

p2 = 3

p3 = 1

e1 = 3

e2 = 1

vlim = 0.01 m s−1 Acknowledgements This research is supported by the German Research Foundation (DFG). References [1] Cavagna, G. A., Heglund, N. C. and Taylor, C. R., Mechanical work in terrestrial locomotion: two basic mechanisms of minimizing energy expenditure. Am. J. Physiology 233, R243R261, 1977 [2] Alexander, R. McN., Three uses for springs in legged locomotion, Int. J. Robotics Res., 9, Issue 2, 53-61, 1990 [3] Blickhan, R., The spring-mass model for running and hopping. J. Biomech. 22, 1217-1227, 1989 [4] McMahon, T. A. and Cheng, G. C., The mechanics of running: how does stiffness couple with speed? J. Biomech. 23, 65-78, 1990 [5] Raibert, M. H., Legged Robots That Balance, The MIT Press, Cambridge, 1986 [6] Cham, J. G., Bailey, S. A., Clark, J. E., Full, R. J. and Cutkosky, M. R., Fast and robust: hexapedal robots via shape deposition manufacturing, Int. J. Robotics Res. 21, No. 10, 869883, 2002 [7] Buehler, M., Saranli, U., Papadopoulos, D. and Koditschek, D., Dynamic locomotion with for and six-legged robots, Int. Symposium on Adaptive Motion of Animals and Machines, Montreal, 2000 [8] Iida, F. and Pfeifer, R., Cheap rapid locomotion of a quadruped robot: self-stabilization of bounding gait. Intelligent Autonomous Systems 8, IOS Press, 642-649, 2004 [9] Fukuoka, Y., Kimura, H. and Cohen, A. H., Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts, Int. J. Robotics Res. 22, No. 3-4, 187-202, 2003 [10] Zhang, Z. G., Fukuoka, Y. and Kimura, H., Stable quadrupedal running based on a springloaded two-segment legged model, Proc. ICRA2004, 2601-2606, 2004 [11] Guenther, M., Computersimulationen zur Synthetisierung des muskulaer erzeugten menschlichen Gehens unter Verwendung eines biomechanischen Mehrkoerpermodells. PhD Thesis, University of Jena, 1997 [12] Altendorfer, R., Moore, N., Komsuoglu, H., Buehler, M., Brown Jr., H. B., McMordie, D., Saranli, U., Full, R. J. and Koditschek,D. E., RHex: a biologically inspired hexapod runner, Autonom. Robots 11, 207-213, 2001 [13] Seyfarth, A., Geyer, H., Guenther, M. and Blickhan, R., A movement criterion for running. J. Biomech 35, 649-655, 2002

574

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Analysis of Dynamical Locomotion of Two-Link Locomotors Kojiro MATSUSHITA, Hiroshi YOKOI, Tamio ARAI The University of Tokyo, Precision Engineering Dept. Japan

Abstract: In this paper, we demonstrated simple locomotive robots, which consist of two links connecting at one joint and are actuated with simple oscillation, in both virtual and real world. Then, the locomotion of those robots was analyzed in terms of gait and exploitation of its own morphology. As results, we have firstly recognized two different gaits: kicking ground type and falling-over type. Then, the third gait has been acquired and it indicated a clue to acquire dynamically bipedal locomotion. Keywords: Legged robot, Dynamical Locomotion, Morphology, Edutainment

Introduction In the field of legged locomotion robotics, implementation of control is mainly focused so that many types of gait control and balance control are theorized and applied to legged robots [7,11]. Common characteristics of these robots in mechanism are made of rectangular-solid materials, actuated with many high-drive electronic motors, and controlled with high-speed information processing. With these configurations, the conventional robots are able to keep its balance (balance control) and their gaits are generated in the same methodology as robot arm manipulators: stance legs are regarded fixed links and swing legs are regarded as effecters during one step motion. By iteration of the manipulation, the robots achieve locomotion. Meanwhile, in recent years, there have been designed legged robots with biologically inspired knowledge such as anatomy, physiology, and neuroscience [1]. For example, passive dynamic walker [2,4,10] is a robot, which well-designed for bipedal locomotion - straight legs, curved feet with passive hip joints and latch knee joints as human body components and, then, the robot achieves dynamically stable walking on some incline by driving force from not motor actuation but gravity through its own morphology. Thus, it is known that morphological characteristics help to exploit its own physics and it can be regarded as computational functionality (morphological computation [5]) in embodied artificial intelligence [6]. In our research, we focus on morphological characteristics of legged robots. As our hypothesis, inter-dependence between morphology and controller should be an important factor [8] to achieve dynamical locomotion. We define dynamical locomotion as exploiting gravity and inertial force during its locomotion. In this paper, firstly, we introduce edutainment course as heuristic design methodology in order to acquire locomotive robots. Secondly, we show the results of analysis on the remarkable legged robots in both real and virtual world. Finally, we indicate one type of dynamical locomotion by the interdependence between morphology and controller.

K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors

575

Robot Edutainment as Heuristic Design Method We utilized edutainment course as heuristic design methodology in order to acquire various locomotive robots. The edutainment course is organized for 20 students in master course at engineering department and aimed at teaching importance of morphology. The main characteristic was that students could build their own robot structure with rapid proto-typing method as described in next section. Students designed locomotive robots with one or two motors with electric-circuit controllers and we instructed use of the controller with sample programs, which generates simple oscillation.

A.

Electric Circuit: Simple Oscillation Controller

Electric circuit Controllers are built on microchip H8/3664 (16MHz) and functions to control three RC-servo motors, to read two light sensors and two touch sensors, to communicate with a computer through serial port (fig.1, table1). The oscillation algorithm for RC servo motors is the followings: motor axis moves to two different angle-position alternately at an certain cycle. Students could modify the angle-positions and the cycle as control parameters. The students could use light detect sensor, however, we did not describe on the sensor in this paper.

Fig.1 Controller circuit

B.

Table1 Control circuit for edutainment Microchip H8/3664 (16MHz) Communication RS232 RC Servo Control 3ch Sensor Input 4ch LED 4ch Battery 9V

Rapid Proto-Typing Method: Plastic Bottles Based Robots

In the robot edutainment course, we applied rapid proto-typing method: plastic bottles are used as body parts, RC-servo motors as actuated joints, and connected with hot glue. The advantage is easy-assembling, easy-modifying, and economizing because, generally, it takes long time to build robots and difficult to change its morphology. It is sure that preciseness is less than metal materials, however, enough to realize desired behaviors as proto-type robots.

C.

Results: Locomotive Robots Designed by Students

Locomotive robots designed by students as shown in fig.2, were qualitatively categorized into three types at the criteria of locomotion gaits: shuffling, kickingground, and falling-over. The shuffling type moves forward by a high-frequently oscillation at a motor, which is approximately 4Hz. The locomotion is generated by friction forces at contact points. The kicking ground (KG) type moves forward by

576

K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors

kicking the ground with a rear leg. In other words, driving forward force is generated with the kicking motion. The falling-over (FO) type moves forward by falling over after the motor lifts up its fore leg (fig.3). In short, the FO type changes its body shape and, then, exploits gravity to fall over to forward. Table 2 indicates time traveled 1 meter forward of designed robots. The shuffling type was much slower than other types and difficult to move straight forward so that they normally could not reach the goal. Meanwhile, the best KG type and FO type reached 1 meter at approximately 10 seconds. We assumed that the shuffling type contacted the ground all the time and its locomotion was influenced by its floor condition so much. Therefore, actuation force is subtracted by friction force and driving force was relatively small and not unidirectional. It is no wander that this type is slower than KG and FO types, which efficiently utilize actuation force to driving force (we do describe on the KG and FO type in next chapter).

Fig.2 Students’ locomotors Table 2 categorization and ranking of students’ robots Locomotion Type Inversed Pendulum (falling) Shuffling Kicking Ground

Locomotive Energy Gravity Inertia force/ Friction Actuation

Number of robots 5 7 6

Ranking 1 2 3 4 5 - 20

Record Time of 1Meter Race 7.4 sec 8.1 sec 11.0 sec 1 m 30 sec More than 5 min / No goal

Locomotion Type Kicking Ground Inversed Pendulum Kicking Ground Inversed Pendulum Others

Preparation State

Moving Forward State 䋺RC Servo Motor

Kick Kick-Ground Type Move rear leg

Rear leg kicks ground Lift up

Inversed-Pendulum Type Move fore leg

Lifting up fore leg and falling over

Fig.3 Schematic snap shots of two types at each step during locomotion

K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors

577

Investigation on Morphology of Two Locomotors In this chapter, we mainly analyze on locomotion characteristic of two remarkable locomotors, which acquired in the edutainment course. Firstly, we observed their ground contact information during locomotion in real world, which represents their gaits. Secondly, those robots are modeled and simulated in three-dimensional world and, then, their ground contact information including ground reaction force are recorded. A.

Analysis of Locomotive Robots in Real World

In the previous chapter, we categorized that the KG type moves forward by kicking the ground (fig4a, fig5a) and the FO type moves forward by falling-over (fig4b, fig5b). In this chapter, we quantitatively analyze the locomotion by recording ground contact information.

Length*Width*Height =0.30m*0.25m*0.20m Weight = 3 kg

Length*Width*Height =0.35m*0.30m*0.25m Weight = 3.5 kg

(a) KG type

(b) FO type Fig.4 Locomotors designed in edutainment

(a) KG type

(b) FO type

㪪㪼㫉㫍㫆㩷㪩㫀㪾㪿㫋

㪪㪼㫉㫍㫆㩷㪩㫀㪾㪿㫋

㪪㪼㫉㫍㫆㩷㪣㪼㪽㫋

㪪㪼㫉㫍㫆㩷㪣㪼㪽㫋

㪞㪸㫀㫋㪃㩷㪪㪼㫉㫍㫆㩷㫄㫆㫋㫆㫉

㪞㪸㫀㫋㪃㩷㪪㪼㫉㫍㫆㩷㫄㫆㫋㫆㫉

Fig.5 photographic strip

㪩㪼㪸㫉㪣㪼㪾㩷㪦㫅 㪩㪼㪸㫉㪣㪼㪾㩷㪦㪽㪽

㪝㫆㫉㪼㪣㪼㪾㩷㪦㫅

㪩㪼㪸㫉㪣㪼㪾㩷㪦㪽㪽

㪝㫆㫉㪼㪣㪼㪾㩷㪦㫅 㪝㫆㫉㪼㪣㪼㪾㩷㪦㪽㪽

㪝㫆㫉㪼㪣㪼㪾㩷㪦㪽㪽

㪍㪇㪇㪇

㪩㪼㪸㫉㪣㪼㪾㩷㪦㫅

㪎㪇㪇㪇

㪏㪇㪇㪇

㪐㪇㪇㪇

㪍㪇㪇㪇

㪫㫀㫄㪼㩷㪲㫄㫊㪼㪺㪴

(a) KG type (Cycle=1 sec)

㪎㪇㪇㪇 㪏㪇㪇㪇 㪫㫀㫄㪼㩷㪲㫄㫊㪼㪺㪴

(b) FO type (Cycle=1 sec)

Fig.6 ground contact information of locomotors

㪐㪇㪇㪇

578

K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors

As results, we have macroscopically seen similar gaits between the KG type and the FO type: their rear legs were all the time on the ground and their fore legs are taken off the ground for approximately 0.5 second when the types opened their legs. However, they have microscopically indicated difference characteristics. In the case of the KG type, the fore leg took the ground off approx. 0.2 sec before and approx. 0.2 sec after the KG type started opening its legs. We assumed that the center of gravity (CoG) should be on the rear leg at the time for 0.2 sec and, then, the rear leg kick the ground for 0.2 sec. On the contrary, in the case of the FO type, the fore leg took the ground off 0.1 sec after the legs started opening. It seems that the fore leg was lifted up for 0.1 sec and, then, the rear leg fell over forward for 0.2 sec. Thus, the ground contact information implies their locomotive patterns. However, with real robots, it is difficult to acquire some data such as ground reaction force, which represents gaits. Therefore, we modeled the KG and FO types for analysis of their locomotive patterns. Motor Angle Rear Leg

Motor

Mass

Mass Rear Leg

Fore Leg

Fore Leg Angle

Mass: 0.5kg at 0.0m on F-leg Angle: PI/2 rad F-leg: 0.65*0.1*0.1m, 0.5kg R-leg: 0.35*0.1*0.1m, 0.5kg

Mass: 0.5kg at 0.35m on F-leg Angle: PI/4 rad F-leg: 0.65*0.1*0.1m, 0.5kg R-leg: 0.35*0.1*0.1m, 0.5kg

(a) KG type

(b) FO type Fig.7 Simulated and modeled locomotors

(a) KG type

(b) FO type Fig.8 photographic strip 㪲㪥㪴

㪝㫆㫉㪼㩷㪣㪼㪾 㪩㪼㪸㫉㩷㪣㪼㪾





㪦㪥 㪦㪝㪝 㪦㪥 㪦㪝㪝



㪍㪅㪌



㪎㪅㪌



㪞㫉㫆㫌㫅㪻㩷㪚㫆㫅㫋㪸㪺㫋㩷㪠㫅㪽㫆㫉㫄㪸㫋㫀㫆㫅 㪆㩷㪞㫉㫆㫌㫅㪻㩷㪩㪼㪸㪺㫋㫀㫆㫅㩷㪝㫆㫉㪺㪼

㪞㫉㫆㫌㫅㪻㩷㪚㫆㫅㫋㪸㪺㫋㩷㪠㫅㪽㫆㫉㫄㪸㫋㫀㫆㫅 㪆㩷㪞㫉㫆㫌㫅㪻㩷㪩㪼㪸㪺㫋㫀㫆㫅㩷㪝㫆㫉㪺㪼

㪲㪥㪴

㪝㫆㫉㪼㩷㪣㪼㪾 㪩㪼㪸㫉㩷㪣㪼㪾



㪇 㪍

㪍㪅㪌



㪎㪅㪌



㪦㪥 㪦㪝㪝 㪦㪥 㪦㪝㪝

㪫㫀㫄㪼㩷㪲㫊㪼㪺㪴

㪫㫀㫄㪼㪲㫊㪼㪺㪴

Maximum Amplitude=PI/2 Rotary Speed=PI/2 [rad/sec] Cycle=1.4 [sec]

Maximum Amplitude=PI/3 Rotary Speed=PI/2 [rad/sec] Cycle =1.0 [sec]

(a) KG type

(b) FO type

Fig.9 ground contact information and ground reaction force of two locomotors

㪏㪅㪌



K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors

B.

579

Analysis of Locomotive Robots in Three-Dimensional Simulation

 In the previous section, we have got clues to distinguish their locomotion patterns. In this section, we modeled the locomotive robots and simulated in three-dimensional world (implemented with three-dimensional library – Open Dynamic Engine [9]) in order to observe their internal states.  Fig. 7 shows the modeled KG type and FO type in simulation. The design is slightly different from original design in real world because of simplification. By using the same body components, which are two links connecting at a joint and a mass ball, we compared between those two locomotive robots. The morphological and controller parameters are searched heuristically, and those simulated robots showed similar locomotion patterns to real ones as shown in fig8. We observed their ground contact information including ground reaction force (GRF). As results, fig.9 shows ground contact information and transition of GRF. The fig.9a indicates the following states of the KG type: the CoG was on the rear leg (GRF at the foreleg gradually decreased and GRF at the rear leg gradually increased), the rear leg started kicking the ground (GRF at the rear leg shows its peak and decreased and, then, GRF at the rear leg is zero for 0.1 sec), stayed in the air (GRF at both legs are zero for 0.05 sec), the fore leg landed (GRF at the fore leg is peak) for one locomotion cycle. Meanwhile, the fig9b shows that the FO type is similar to crawling because the rear leg keeps relatively constant GRF and even the fore leg keeps constant GRF except its landing peak. So far, we have shown internal state during locomotion. However, it is necessary to reveal the advantages of their locomotion so that distance traveled forward at amplitude and frequency as control parameters are examined. Fig.10 shows results: both types commonly traveled most forward at approx. 0.5 [sec] on cycle axis (The maximum motor speed is PI/2 [rad/sec]). However, the area where is more than 2 [sec] on cycle axis and more than 3PI/8 [rad] on amplitude axis, shows different characteristic. This indicates that the KO type achieves moving forward by direct kicking the ground, therefore, it can kick the ground at both fast and slow cycle. Oppositely, the FO type moves forward by lifting up the fore leg so that the lifting up motion requires fast cycle because the fore leg is not lifted up and it does not move forward if the cycle is slow. Thus, it revealed that the KG type has more controllability than the FO type.

(a) KG type

(b) FO type

Fig.10 Distance traveled at different control parameters

C.

Same-Length Leg Type

In the previous section, we analyzed two robots, which were the same morphology, but actuation angle range was different. In this section, we also focused on two link

580

K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors

structure and investigate on the same length type (SL type). We implemented both simulated and real locomotors as shown in fig11. As results, the SL type achieves qualitatively similar locomotion between virtual and real. Comparing with GRF of the KG and FO types, the SL type is different because it does not have landing peaks during locomotion but more smooth transition. Furthermore, the remarkable difference is seen in fig.14. The map of distance traveled is completely different from ones of the KG and FO types. The SL traveled forward more than 6 meters at one peak on the map at the area where is 1 [sec] on cycle axis and PI/4 [rad] on amplitude axis. In the term of controllability, the SL is lower than others. However, the results indicates the interdependence between morphological and control parameters achieves smooth transition as dynamical locomotion and better locomotion ability. Motor

Rear Leg

Angle

Fore Leg

Mass: 0.5kg at 0.05m behind motor F-leg: 0.50*0.1*0.1m, 0.5kg R-leg: 0.50*0.1*0.1m, 0.5kg

(a) Simulated robot

(b) Schematic

(c) Real robot

Fig.11 Simulated and Real locomotors

(a) Simulated robot

(b) Real robot Fig.12 photographic strip 㪪㪼㫉㫍㫆㩷㪩㫀㪾㪿㫋

㪝㫆㫉㪼㩷㪣㪼㪾 㪩㪼㪸㫉㩷㪣㪼㪾

㪪㪼㫉㫍㫆㩷㪣㪼㪽㫋



㪇 㪦㪥 㪦㪝㪝 㪦㪥 㪦㪝㪝



㪍㪅㪌



㪺 㪎㪅㪌



㪏㪅㪌



㪞㪸㫀㫋㪃㩷㪪㪼㫉㫍㫆㩷㫄㫆㫋㫆㫉

㪞㫉㫆㫌㫅㪻㩷㪚㫆㫅㫋㪸㪺㫋㩷㪠㫅㪽㫆㫉㫄㪸㫋㫀㫆㫅 㪆㩷㪞㫉㫆㫌㫅㪻㩷㪩㪼㪸㪺㫋㫀㫆㫅㩷㪝㫆㫉㪺㪼

㪲㪥㪴

㪩㪼㪸㫉㪣㪼㪾㩷㪦㫅 㪩㪼㪸㫉㪣㪼㪾㩷㪦㪽㪽

㪝㫆㫉㪼㪣㪼㪾㩷㪦㫅 㪝㫆㫉㪼㪣㪼㪾㩷㪦㪽㪽

㪫㫀㫄㪼㩷㪲㫊㪼㪺㪴

(a) Simulated robot (Cycle=0.80 sec)

㪍㪇㪇㪇

㪎㪇㪇㪇 㪏㪇㪇㪇 㪫㫀㫄㪼㩷㪲㫄㫊㪼㪺㪴

㪐㪇㪇㪇

(b) Real robot (Cycle=0.70 sec)

Fig.13 Ground contact and ground reaction force

Furthermore, the SL type generates symmetrical characteristic in its locomotion at some specific control parameters as shown in fig15: the robot starts iteration of switching swing leg and support leg at one spot in simulation and its GRF is similar to M letter shape, which represents human walking gait [3]. We assume that this locomotion is close to bipedal locomotion and utilizing pitching oscillation indicates

581

K. Matsushita et al. / Analysis of Dynamical Locomotion of Two-Link Locomotors

exploiting its own physics through morphology. Therefore, we believe three link robot is appropriate design for bipedal locomotion because it should control its pitching oscillation.

㪞㫉㫆㫌㫅㪻㩷㪚㫆㫅㫋㪸㪺㫋㩷㪠㫅㪽㫆㫉㫄㪸㫋㫀㫆㫅 㪆㩷㪞㫉㫆㫌㫅㪻㩷㪩㪼㪸㪺㫋㫀㫆㫅㩷㪝㫆㫉㪺㪼

㪲㪥㪴

㪝㫆㫉㪼㩷㪣㪼㪾 㪩㪼㪸㫉㩷㪣㪼㪾



㪇 㪍

㪍㪅㪌



㪎㪅㪌



㪏㪅㪌



㪦㪥 㪦㪝㪝 㪦㪥 㪦㪝㪝 㪫㫀㫄㪼㪲㫊㪼㪺㪴

Same Length and Mass Rear Type Amplitude = Angle between legs = 6PI/64, Mass Position=-0.05 Fig.14 Distance traveled at different control parameters

Simulated robot (Cycle=0.80 sec) Fig.15 ground contact information and ground reaction force

Conclusion In this paper, we investigated on dynamical locomotion as exploiting its own physics through morphology. The locomotive robots are designed in edutainment course as heuristic method. Then, we selected remarkable robots and analyzed them in both real and virtual world. As results, two-link locomotors have shown two different gaits. Furthermore, we have observed the third gait - pitching oscillation, which switch swing leg and support leg, and the transition of its ground reaction force is similar to M letter shape, which is characterized human dynamic walking. Therefore, we found that twosame-length link robot exploits pitching oscillation during locomotion and we believe that the design can be a clue to achieve dynamical locomotion of legged robots.

Reference [1] [2]

Alexander, R. “Principles of Animal Locomotion.” Princeton University Press, New Jersey, 2002 Collins,S.H., Wisse,M. and Ruina,A. “A 3-D passive-dynamic walking robot with two legs and knees.” Int. J. of Robotics Research, 20(7):607-615, 2001. [3] Komi,P.V., Gollhofer,A., Schmidtbleicher,D. and Frick,U. “Interaction between man and shoe in running: considerations for a more comprehensive measurement approach.” Int. J. of Sports Medicine, 8:196-202, 1987. [4] McGeer,T. “Passive dynamic walking.” Int. J. Robotics Research, 9(2):62–82, 1990. [5] Paul,C. “Morphology and computation.” Proc. of Int. Conf. on the Simulation of Adaptive Behaviour, 2004. [6] Pfeifer,R. and Scheier,C. “Understanding Intelligence.” Cambridge, MA: MIT Press, 1999. [7] Raibert, M. H. “Legged Robots That Balance.” MIT Press, Cambridge, MA, 1986 [8] Sims, K. “Evolving 3D morphology and behavior by competition.” In R. Brooks and P. Maes, editors, Artificial Life IV Proceedings. MIT Press, pp 28-39, 1994 [9] Smith, R. “Open Dynamics Engine.” URL: http://ode.org/, 2000. [10] Van der Linde,R.Q. Design, “Analysis and control of a low power joint for walking robots, by phasic activation of McKibben muscles.” IEEE Trans. Robotics Automation, 15(4):599-604, 1999. [11] Vukobratovic,M. and Juricie,D. “Contribution to the synthesis of biped gait.” IEEE Tran. On BioMedical Engineering, 16(1):1-6, 1969.

This page intentionally left blank

Part 10 Mobiligence

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

585

A Modular Robot That Self-Assembles Akio Ishiguro a,1 , Hiroaki Matsuba a , Tomoki Maegawa a and Masahiro Shimizu a a Dept. of Computational Science and Engineering, Nagoya University Abstract. One of the most graceful phenomena widely observed in nature is selfassembly; living systems spontaneously form their body structure through the developmental process. While this remarkable phenomenon still leaves much to be understood in biology, the concept of self-assembly becomes undeniably indispensable also in artificial systems as they increase in size and complexity. Based on this consideration, this paper discusses the realization of self-assembly with the use of a modular robot. The main contributions of this paper are twofold: the first concerns the exploitation of emergent phenomena stemming from the interplay between the control and mechanical systems; the second is related to the implementation of different adhesiveness among the modules. Here, form generation by self-assembly is considered as the result of time evolution toward the most dynamically stable state. Preliminary simulation results show that stable and spontaneous self-assembly is achieved irrespective of the initial positional relationship among the modules. Keywords. Self-assembly and self-repair, modular robot, emergent phenomena, morphological computation, fully decentralized control

1. Introduction Self-assembly is a phenomenon where basic units gather together and form some specific configuration stably and spontaneously. In the natural world, this phenomenon is observed widely among living and non-living things. For example, hydra, a primitive coelenterate, is known to reproduce itself by recreating its original body structure even if it is artificially dissociated into cells. Amphipathic molecules organized from hydrophilic and hydrophobic groups spontaneously form a micelle or vesicle structure. Even in artificial constructions, this concept will become essential as the system to be designed increases in size and complexity. In addition, as in the hydra example, self-assembly is a concept strongly associated with self-repair. Therefore, discussion toward the realization of self-assembly can also be expected to provide useful information for artificial construction of highly fault tolerant systems. Based on this viewpoint, several studies are currently underway in the field of robotics to realize self-assembly using a modular robot — or sometimes referred to as a reconfigurable robot. A modular robot, which was originally initiated by Fukuda and his colleagues, is a robotic system that consists of multiple mechanical units called modules[1]. In contrast to a conventional robot on a fixed-morphology basis, a modular robot can reconfigure itself according to the situation encountered by actively altering 1 Corresponding

Author: Department of Computational Science and Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan. Tel.: +81 52 789 3167; Fax: +81 52 789 3166; E-mail: [email protected].

586

A. Ishiguro et al. / A Modular Robot That Self-Assembles

the relative positional relationship among the modules. What has be noticed is that, in many of these studies, the behavior of each module in the self-assembly process is individually determined in a fully algorithmic manner: a control algorithm — or sometimes an external operator — precisely specifies which modules should be connected physically as well as how each module should be moved. Under this kind of “rigorous” control method, however, the control algorithm required may become extremely complicated and intractable as the system scale increases in size and complexity. Consider the self-assembly phenomenon observed in the natural world. It should be noted that there is no entity corresponding to an algorithm that fully specifies all events in the self-assembly process behind this phenomenon; the resultant form is emerged by exploiting a physiochemical interaction between substances as well. For example, different intercellular adhesiveness are known to play a significant role in the time evolution to the final form in an individual morphogenesis. Since self-assembly is based on the presumption that the number of system components is extremely large, we believe it is essential to adopt an approach that explicitly considers the exploitation of physical interactions between modules from the initial stage of investigation. In light of these facts, this study aims at the development of a modular robot that enables stable and spontaneous self-assembly from the initial to the final configuration by exploiting emergent phenomena stemming from the interplay between the control and mechanical systems. In order to achieve this goal, this study has focused on a “functional material” and a “mutual entrainment among nonlinear oscillators”. By using the former as an inter-module spontaneous connectivity control mechanism and the latter as a core control mechanism for the generation of inter-module cohesive force, we attempt to realize self-assembly in a fully decentralized manner. Note that this study considers form generation by self-assembly as the result of the time evolution toward the most dynamically stable state, i.e., the equilibrium state. Here, the behavior of individual modules is not determined only by a control algorithm; exploitation of the inter-module physical interaction is also well considered, which allows us to significantly reduce the computational cost required for the connectivity control.

2. Previous and Related Work This section briefly introduces several prior studies related to self-assembly. Murata et al. discussed a method which enables a staged formation toward a goal structure using a modular robot called Fractum [2]. Yoshida et al. extended Fractum to a threedimensional modular robot [3]. Using a modular robot called M-TRAN, Kamimura et al. realized self-assembly to a serpentine or four-legged robot [4]. These studies, however, were based on a full algorithmic manner: the control algorithm precisely specifies the behavior of each module to realize the goal structure, particularly focusing on efficient procedures. Similar approaches are also found in [5]-[10]. On the other hand, several studies have also been reported on self-assembly achieved through fully exploiting physical interactions between modules (e.g. [11]). These studies, however, discussed self-assembly in terms of micrometers or smaller, and it still remains unclear whether they can be directly employed for centimeter-order or larger applications. Recently, inspired by a seminal study done by Hosokawa et al. [12], Bishop et al. have constructed a self-assembling robot [13]. In their study, however, since the

A. Ishiguro et al. / A Modular Robot That Self-Assembles

587

physical interaction between the modules greatly depends on random collision, efficient time evolution to the final form may be difficult.

3. The Proposed Method 3.1. The Design Strategies The design strategies for the development of a modular robot that enables self-assembly employed in this study can be summarized as followings: 1. Self-assembly is considered as an emergent phenomenon. More specifically, form generation by self-assembly is regarded as the result of the time evolution toward the most dynamically stable state, i.e., the equilibrium state. 2. In the self-assembly process, the behaviors of individual modules are not determined simply by a control algorithm. Exploitation of the physical interaction between modules is also considered in detail. In what follows, we will explain how we designed the control and mechanical systems of a modular robot having the ability to self-assemble according to the above idea.

Figure 1. A schematic of the mechanical structure of the module considered in this study. (left) top view. (right) side view. Note that no active control mechanism that precisely specifies the connection/disconnection among the modules is implemented. Instead, a spontaneous connectivity control mechanism exploiting a functional material, i.e. a genderless Velcro strap, is employed. This allows us to induce emergent properties in artificial “morphogenesis”.

3.2. The Mechanical System As an initial step of the investigation, a two-dimensional modular robot has been considered, consisting of many identical modules, each of which has a mechanical structure like the ones shown in Figure 1. Each module is equipped with telescopic arms and a ground friction control mechanism (explained later). Each module is also equipped with a mechanism which informs the module whether it is positioned as an inner module or a surface module within the entire system1 . Note that the module is covered with a functional material. More specifically, we have used a genderless Velcro strap as a practical example, since this intrinsically has interesting properties: when two halves of Velcro come into contact, they are connected easily; and when a force greater than the yield 1 This could be done by some sort of sensory mechanism or in an algorithmic manner; we do not intensively consider the detail of this mechanism in this paper.

588

A. Ishiguro et al. / A Modular Robot That Self-Assembles

Figure 2. A schematic of the active mode and the passive mode (A side view of two connected modules is shown for clarity). Note that a module in the passive mode sticks to the ground and acts as a supporting point.

strength is applied, the halves will come apart automatically. Exploiting the property of this material itself as a “spontaneous connectivity control mechanism", we can expect not only to reduce the computational cost required for the connection control significantly, but also to induce emergent properties in morphology control. The property of the connectivity control mechanism is mainly specified by the yield stress of Velcro employed: connection between the modules is established spontaneously where the arms of each module make contact; disconnection occurs if the disconnection stress exceeds the Velcro’s yield stress. We also assume that local communication between the connected modules is possible, which will be used to create the phase gradient inside the modular robot (discussed below). In this study, each module is moved by the telescopic actions of the arms and by ground friction. Note that each module itself does not have any mobility but can move only by the collaboration with other modules. 3.3. The Control System Under the above mechanical structure, now we consider a fully decentralized control method that enables the modules to move toward a particular equilibrium configuration, i.e. goal shape, stably and efficiently. It should be noted here that we do not use any active control for the connection/disconnection among the modules. Instead, we have employed a spontaneous connectivity control mechanism exploiting a functional material, i.e. a genderless Velcro strap. In order to take full advantage of this mechanism, in this study, we intend to generate a cohesive force inside the modular robot, acting like an effect stemming from the surface tension. In the following, we will show how we have achieved this requirement. 3.3.1. Active mode and passive mode Before a detailed explanation of the control algorithm developed, here we firstly define the basic operation of each module. Each module of this modular robot can take one of the following two exclusive modes at any time: active mode and passive mode. As shown in Figure 2, a module in the active mode actively contracts/extends the connected arms, and simultaneously reduces the ground friction. In contrast, a module in the passive mode increases the ground friction2 , and returns its arms to their original length. Note that a module in the passive mode does not move itself but serves as a “supporting point" for efficient movement of the modules in the active mode. 2 Slime

molds use this mechanism. This is called a pseudopod.

A. Ishiguro et al. / A Modular Robot That Self-Assembles

589

3.3.2. Basic requirements In order to generate an appropriate cohesive force that effectively drives all the modules toward a particular equilibrium configuration, the following two things have to be carefully considered in developing a control algorithm: • The mode alternation in each module. • The extension/contraction of each telescopic arm in a module. Of course, this control algorithm should be implemented in a fully decentralized manner. In addition, it should not depend on the number of the modules and the morphology of the entire system. To this end, we have focused on nonlinear oscillators since they exhibit an interesting phenomenon called the mutual entrainment when they interact with each other. In this study, we have implemented a nonlinear oscillator onto each module of the modular robot. We then create phase gradient inside the modular robot through the mutual entrainment among the locally-interacting nonlinear oscillators. This is done by local communication among the physically-connected modules. By exploiting the phase gradient created as key information for controlling the mode alternation and the arm extension/contraction in each module, we have successfully generated a cohesive force inside the modular robot in a fully decentralized manner. In what follows, we will explain this in more detail. 3.3.3. Creating phase gradient through the mutual entrainment As a model of a nonlinear oscillator, the van der Pol oscillator (hereinafter VDP oscillator) has been employed, since this oscillator model has been well-analyzed and widely used for its significant entrainment property. The equation of the VDP oscillator implemented on module i is given by αi x ¨i − βi (1 − x2i )x˙ i + xi = 0,

(1)

where the parameter αi specifies the frequency of the oscillation. βi corresponds to the convergence rate to the limit cycle. The local communication among the physically connected modules is done by the local interaction among the VDP oscillators of these modules, which is expressed as: ⎧ ⎫ i (t) ⎨ 1 N ⎬ xi (t + 1) = xi (t) + ε xj (t) − xi (t) , (2) ⎩ Ni (t) ⎭ j=1

where xj represents the state of the neighboring module physically connected to module i, which is obtained through the local communication mechanism. Ni (t) is the number of modules neighboring module i at time t, respectively. The parameter ε specifies the strength of the interaction. Note that this local interaction acts like a diffusion. When the VDP oscillators interact according to Eq. (2), significant phase distribution can be created effectively by varying the value of αi in Eq. (1) for some of the oscillators. In order to create an equiphase surface effective for generating an appropriate cohesive force, we set the value of αi as:  1.3 if the module is positioned as a surface module αi = (3) 1.0 if the module is positioned as an inner module

590

A. Ishiguro et al. / A Modular Robot That Self-Assembles

Note that the value of αi is increased when the module is positioned as a surface module in the entire system. As a result, the frequency of the VDP oscillators in the outer modules will be relatively decreased compared to the ones in the inner modules. This allows us to create the phase gradient inside the modular robot, which can be effectively exploited to endow the entire system with the cohesive force (explained later). Figure 3 shows the phase distribution created through the mutual entrainment when the modules are arranged circularly. In the figure, arrows — each of which represents the direction of the gradient vector at the corresponding point — are also depicted for clarity. It should be noticed that the direction of each gradient vector points toward the center of the entire system.

Figure 3. Phase distribution created through the mutual entrainment among the VDP oscillators in a circular arrangement. The gray scale denotes the value of the phase at the corresponding point. Each arrow represents the direction of the gradient vector at the corresponding point. Note that all the arrows point toward the center.

3.3.4. Generating cohesive force Based on the above arrangements, here we will explain how we designed the control algorithm. More specifically, we will show how the mode alternation and the arm extension/contraction in each module is controlled by exploiting the phase gradient created from the aforementioned mutual entrainment among the locally-interacting VDP oscillators. The mode alternation in each module is simply done according to the phase of its VDP oscillator, which is expressed as:  0 ≤ θi (t) < π : active mode (4) π ≤ θi (t) < 2π : passive mode where, θi (t) denotes the phase of oscillation of the VDP oscillator in module i at time t, which is written by θi (t) = arctan

x˙ i (t) . xi (t)

(5)

On the other hand, the extension/contraction of each arm mounted on module i in the active mode is determined according to the phase difference with its corresponding neighboring module. This is given by Fim (t) = −k{θjm (t) − θi (t)},

(6)

591

A. Ishiguro et al. / A Modular Robot That Self-Assembles

where, Fim (t) is the force applied for the extension/contraction of the m-th arm of module i at time t. k is the coefficient. θjm (t) represents the phase of the neighboring module physically connected to the m-th arm of module i. Due to the algorithm mentioned above, the degree of arm extension/contraction of each module will become most significant along the phase gradient (see Figure 3), and the timings of the mode alternation are propagated from the inner modules to the outer ones as traveling waves. As a result, all the modules are encouraged to move toward the center of the entire system, by which a cohesive force will be automatically generated. 3.4. Experiments 3.4.1. Simulation results In order to verify the proposed method, simulations were conducted under the condition where the number of modules was set to 100; βi = 1.0; i = 0.01; all the modules were placed initially so as to form a rectangular shape. A representative data is shown in Figure 4. These snapshots are in the order of time evolution of the self-assembly process. As the figures explain, all the module move such that the entire system forms a disk-like shape. This is because a disk-like shape is the equilibrium configuration — the most dynamically stable form — in this case, as observed in soap bubbles. Note that the spontaneous connectivity control mechanism stemming from the genderless Velcro straps is fully exploited in this process.

(a) Initial state

(b) 20000 steps

(c) 30000 steps

(d) 50000 steps

Figure 4. : Time evolution of the self-assembly (see in alphabetical order). The entire system consists of 100 modules. In this simulation, all the modules are identical. Due to a cohesive force similar to an effect stemming from surface tension, the modules converge to a disk-like form. Note that the spontaneous connectivity control stemming from the genderless Velcro straps is fully exploited in this process.

3.4.2. Changing equilibrium configuration In the abovementioned result, a disk-like form was dynamically most stable; a module group starts its self-assembly process toward this form from any initial form. One question then arises. How can we modify this dynamically most stable form? In other words, how can we change the equilibrium configuration in the self-assembly process? In the following, we will investigate this question by focusing in particular on different cell adhesiveness that are observed in the morphogenesis, i.e., the development of multicellular organisms. At the initial stage of the morphogenesis, a fertilized egg repeats cell division to become an aggregate of many cells. As this process proceeds, the cells are differentiated

592

A. Ishiguro et al. / A Modular Robot That Self-Assembles

into a variety of cell types; they are grouped together depending on the genes expressed in each cell. During this process, proteins called cell adhesion molecules on the cell surface are known to specify which cells it can adhere to. Steinberg clearly explained that this different intercellular adhesiveness seriously affects the final configuration of the organisms (known as the Steinberg’s thermodynamic model)[14][15]. Inspired by his remarkable idea, we have investigated how the equilibrium configuration could be altered by different adhesiveness. For simplicity, we have employed a modular robot consisting of two types of modules — module A and module B, each of which has its own specific inter-module adhesiveness. For convenience, let the adhesion strength between module A-A, module A-B, and module B-B be WAA , WAB and WBB . Although there are many possible relationships among WAA , WAB and WBB , as an initial step of the investigation, we consider the following two cases: Case 1 for WAA = WBB > WAB ; Case 2 for WAA > WAB > WBB . Possible structures for Case 1 and Case 2 are illustrated in Figure 5. Representative simulation results obtained under Case 1 and Case 2 are shown in Figure 6 and Figure 7, respectively. In both cases, the entire system consisted of 100 modules: 50 for module A; 50 for module B. The solid and empty circles in the figures denote module A and module B, respectively. As Figure 6 explains, over time the entire system forms into two groups, each of which consists of the same type of module (see (d)). Note that this is the equilibrium configuration in this case. In contrast, the equilibrium configuration in Case 2 is different from the one observed in Case 1: the modules with lower adhesive strength, i.e., module B envelopes the ones with higher adhesive strength, i.e., module A. These results obtained fit well with the Steinberg’s thermodynamic model.

(a) Case 1: WAA = WBB > WAB

(b) Case 2: WAA > WAB > WBB

Figure 5. : Possible structures for Case 1 and Case 2. Note that (1) the binding force between modules (corresponding to intercellular adhesiveness) is proportional to the genderless Velcro strap sticking area; (2) gendered Velcro straps in addition to genderless ones are also exploited in Case 2.

4. Conclusion and Further Work This paper has investigated the realization of self-assembly with the use of a modular robot consisting of many mechanical modules. The main contribution of this study can be summarized as follows: first, we considered form generation by self-assembly as the result of the time evolution toward the most dynamically stable state, i.e., the equilibrium state; second, the behavior of each module was not determined simply by the control algorithm. The inter-module physical interaction was also well exploited in the

A. Ishiguro et al. / A Modular Robot That Self-Assembles

(a) Initial state

(b) 10000 steps

(c) 25000 steps

593

(d) 50000 steps

Figure 6. : Time evolution of the self-assembly in Case 1 (see in alphabetical order). The entire system consists of 100 modules: 50 for module A; 50 for module B. The solid and empty circles denote module A and module B, respectively. Interestingly, over time the entire system forms into two groups, each of which consists of the same type of module (see (d)). Note that this is the equilibrium configuration in this case. The result obtained fits well with the Steinberg’s thermodynamic model.

(a) Initial state

(b) 10000 steps

(c) 60000 steps

(d) 180000 steps

Figure 7. : Time evolution of the self-assembly in Case 2 (see in alphabetical order). The entire system consists of 100 modules: 50 for module A; 50 for module B. The solid and empty circles denote module A and module B, respectively. The equilibrium configuration in this case is different from the one in Case 1: the modules with lower adhesive strength, i.e., module B envelope the ones with higher adhesive strength, i.e., module A. This result also fits well with the Steinberg’s thermodynamic model.

process of self-assembly. To this end, we focused on a functional material, i.e. a gendered/genderless Velcro strap, which was used as an inter-module spontaneous connectivity control mechanism; third, the self-assembly was done in a fully decentralized manner by using the mutual entrainment among the locally-interacting nonlinear oscillators; fourth and finally, we investigated how the equilibrium configuration could be altered, by borrowing the idea from the Steinberg’s thermodynamic model. To verify the feasibility of our proposed method, an experiment with a real physical modular robot is significantly indispensable. A prototype of the module currently under construction is represented in Figure 8 [16]. In this prototype model, the telescopic arm is implemented as an air-driven cylinder. After checking the operation of the modules, we are planning to construct more than 20 modules.

Acknowledgment This work has been partially supported by a Grant-in-Aid for Scientific Research on Priority Areas “Emergence of Adaptive Motor Function through Interaction between Body, Brain and Environment” from the Japanese Ministry of Education, Culture, Sports, Science and Technology.

594

A. Ishiguro et al. / A Modular Robot That Self-Assembles

Figure 8. : The structure of a prototype model. (left) a broken down view of the module. The telescopic arm is implemented as an air-driven cylinder. The outer walls are covered with Velcro straps, which are not shown here for clarity. (right) the real module. This module can be driven in a fully self-contained manner by using a high-pressure-air cartridge.

References [1] T. Fukuda and Y. Kawauchi, Cellular Robotic System (CEBOT) as One of the Realization of Self-Organizing Intelligent Universal Manipulators, Proc. of IEEE ICRA (1990), 662–667. [2] S. Murata, H. Kurokawa, and S. Kokaji, Self-Assembling Machine, Proc. of IEEE ICRA (1994), 441–448. [3] E. Yoshida, S. Murata, H. Kurokawa, K. Tomita and S. Kokaji, A Distributed Reconfiguration Method for 3-D Homogeneous Structure, Proc. of 1998 IEEE/RSJ IROS (1998), 852–859. [4] A. Kamimura, S. Murata, E. Yoshida, H. Kurokawa, K. Tomita, and S. Kokaji, SelfReconfigurable Modular Robot – Experiments on Reconfiguration and Locomotion –, Proc. of 2001 IEEE/RSJ IROS (2001), 590–597. [5] M. Dorigo, et al., Evolving Self-Organizing Behaviors for a Swarm-bot, Autonomous Robots (2004), 223-245. [6] M. W. Jorgensen, E. H. Ostergaard and H. H. Lund, Modular ATRON: Modules for a SelfReconfigurable Robot, Proc. of 2004 IEEE/RSJ IROS (2004), 2068–2073. [7] K. Kotay and D. Russ, Generic Distributed Assembly and Repair Algorithms for SelfReconfiguring Robots, Proc. of 2004 IEEE/RSJ IROS (2004), 2362–2369. [8] M. Yim, C. Eldershaw, Y. Zhang, and D. Duff, Self-reconfigurable Robot Systems: PolyBot, Journal of the Robotics Society of Japan, 21(8) (2003), 851-854. [9] A. Castano, W.-M. Shen, and P. Will, CONRO: Towards Miniature Self-Sufficient Metamorphic Robots, Autonomous Robots (2000), 309-324. [10] C-J. Chiang and G. Chirikjian, Modular Robot Motion Planning Using Similarity Metrics, Autonomous Robots, 10(1) (2001), 91-106. [11] K. Fujibayashi and S. Murata, A Method of Error Suppression for Self-Assembling DNA Tiles, Preliminary Proc. of 10th International Meeting on DNA Computing (DNA10) (2004), 284–293. [12] K. Hosokawa, I. Shimoyama, and H. Miura, Dynamics of Self-Assembling Systems: Analogy with Chemical Kinetics, Artificial Life 1 (1995), 413–427. [13] J. Bishop et al., Programmable Parts: A Demonstration of the Grammatical Approach to SelfOrganization, Proc. of 2005 IEEE/RSJ IROS (2005), 2644-2651. [14] M. S.Steinberg, Reconstruction of Tissues by Dissociated Cells, Science, 141(3579) (1963), 401-408. [15] F. Graner and J.A. Glazier, Simulation of Biological Cell Sorting Using a Two-Dimensional Extended Potts Model, Physical Review Letters, 69(13) (1992), 2013-2016. [16] M. Shimizu, T. Kawakatsu, and A. Ishiguro, A Modular Robot That Exploits Emergent Phenomena, Proc. of 2005 IEEE ICRA (2005), 2993-2998.

595

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Autonomous Robust Execution of Complex Robotic Missions Paul Robertson, Robert Effinger, and Brian Williams MIT CSAIL, 32 Vassar Street, Building 32-272 Cambridge, MA 02139

Abstract. A model-based system for autonomous robot health management and execution of complex robotic explorations is described that handles software, hardware, and plan execution faults robustly. Keywords: Model-Based Autonomy, Health Management, AI, Space Exploration.

1. Introduction Getting robots to perform even simple tasks can be fraught with difficulty. Sending a robot to a distant planet, such as Mars, raises a number of significant issues concerning autonomy. The intended mission itself may be quite complex, conceivably involving coordinated work with other robots. An example of such an anticipated mission is to send a team of robots to Mars where they will, among other things, construct habitat that will be used by astronauts that arrive at a later time. Quite apart form the complexity of the mission, these robots are likely to be expensive especially when considering the cost of transporting them to the planet surface. The loss of a robot would therefore be a serious loss that could jeopardize the feasibility of the astronaut arrival. For this reason robotic exploration (Figure 1) of Mars (Sojourner, Spirit, and Opportunity) have not been autonomous except for very minor navigation capability.

Figure 1 NASA Rover Spirit

Operating rovers from earth is an excruciating process because the time delay caused by the time it takes a signal to travel between Earth and Mars does not permit tele-operation of the rovers. Consequently a large team of mission planners must plan and test, in excruciating detail, plans for the next days operations. This approach will not work for more complex missions such as cooperative robots assembling an astronaut habitat.

596

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

In our rover testbed we are attempting to perform such complex cooperative missions using a team of iRobot ATRV robots (Figure 2a) with arms mounted on them for instrument placement/collection and construction tasks. The hostile nature of space and the uncertainty of the Martian environment means that we must consider repair of a mission plan as well as adapting to failed components. In modern systems these failed systems are as likely to be software systems as hardware. Indeed many of NASA’s recent problems have been software problems. In this paper we describe our approach to dealing with perturbations with a particular emphasis to the software/mission planning aspect of robot autonomy. In complex, concurrent critical systems, every component is a potential point of failure. Typical attempts to make such systems more robust and secure are both brittle and incomplete. That is, the security is easily broken, and there are many possible failure modes that are not handled. Techniques that expand to handling component level failures are very expensive to apply, yet are still quite brittle and incomplete. This is not because engineers are lazy – the sheer size and complexity of modern information systems overwhelms the attempts of engineers, and myriad methodologies, to systematically investigate, identify, and specify a response to all possible failures of a system. Adding dynamic intelligent fault awareness and recovery to running systems enables the identification of unanticipated failures and the construction of novel workarounds to these failures. Our approach is pervasive and incremental. It is pervasive in that it applies to all components of a large, complex system – not just the “firewall” services. It is incremental in that it coexists with existing faulty, unsafe systems, and it is possible to incrementally increase the safety and reliability of large systems. The approach aims to minimize the cost, in terms of hand-coded specifications with respect to how to isolate and recover from failures.

Figure 2 (a) MIT MERS Rover Testbed, (b) Deep Space 1: Fight Experiment

There are many reasons why software fails. Among the more common reasons are the following: 1. Assumptions made by the software turn out not to be true at some point. For example, if a piece of software must open a file with a given path name it will usually succeed but if the particular disk that corresponds to the path name fails the file will not be accessible. If the program assumed that the file was accessible the program will fail. In highly constrained situations it is possible to enumerate all such failures and hand code specific exception handlers – and such is the standard practice in the

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

597

industry. In many cases however, particularly in embedded applications, the number of ways that the environment can change becomes so large that the programmer cannot realistically anticipate every possible failure. 2. Software is attacked by a hostile agent. This form of failure is similar to the first one except that change in the environment is done explicitly with the intent to cause the software to fail. 3. Software changes introduce incompatibilities. Most software evolves of its lifetime. When incompatibilities are inadvertently introduced software that previously did not fail for a given situation may now fail. Whatever the reason for the software failure we would like the software to be able to recognize that it has failed and to recover from the failure. There are three steps to doing this: Noticing that the software has failed; Diagnosing exactly what software component has failed; and Finding and alternative way of achieving the intended behavior. In order for the runtime system to reason about its own behavior and intended behavior in this way certain extra information and algorithms must be present at runtime. In our system these extra pieces include models of the causal relationships between the software components, models of intended behavior, and models of correct (nominal) execution of the software. Additionally models of known failure modes can be very helpful but are not required. Finally the system needs to be able to sense, at least partially, its state, it needs to be able to reason about the difference between the expected state and the observed state and it need to be able to modify the running software such as by choosing alternative runtime methods. Building software systems in this way comes with a certain cost. Models of the software components and their causal relationships that might otherwise have existed only in the programmers head must by made explicit, the reasoning engine must be linked in to the running program, and the computational cost of the monitoring, diagnosis, and recovery must be considered. In some systems the memory footprint and processor speed prohibit this approach. More and more however memory is becoming cheap enough for memory footprint to not be an issue and processor power is similarly becoming less restrictive. While the modeling effort is an extra cost there are benefits to doing the modeling that offset its cost. Making the modeling effort explicit can often cause faults to be found earlier than would otherwise be the case. The developers can choose the fidelity of the models. More detailed models take more time to develop but allow for greater fault identification, diagnosis, and recovery. Finally our approach to recovery assumes that there is more than one way of achieving a task. The developer therefore must provide a variety of ways of achieving the intended behavior. The added costs of building robust software in this way are small when compared to the benefits. Among the benefits it allows us to: Build software systems that can operate autonomously to achieve goals in complex and changing environments; Build software that detects and works around “bugs” resulting from incompatible software changes;Build software that detects and recovers from software attacks; and Build

598

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

software that automatically improves as better software components and models are added. Prior Work: Self-adaptive software has been successfully applied to a variety of tasks ranging from robust image interpretation to automated controller synthesis [7]. Our approach, which is described below, builds upon a successful history of hardware diagnosis and repair. In May 1999 the spacecraft Deep Space 1 [1], shown in Figure 2b, ran autonomously for a period of a week. During that week faults were introduced which were detected, diagnosed, and recovered from by reconfiguring the (redundant) hardware of the spacecraft. Subsequently Earth Observer 1 has been flying autonomously planning and executing its own missions. Extending these technologies to software systems involves extending the modeling language to deal with the idiosyncrasies of software such as its inherently hierarchical structure [6]. Approach: At the heart of our system is a model-based programming language called RMPL that provides a language for specifying correct and faulty behavior of the systems software components. The novel ideas in our approach include method deprecation and method regeneration in tandem with an intelligent runtime modelbased executive that performs automated fault management from engineering models, and that utilizes decision-theoretic method dispatch. Once a system has been enhanced by abstract models of the nominal and faulty behavior of its components, the model-based executive monitors the state of the individual components according to the models. If faults in a system render some methods (procedures for accomplishing individual goals) inapplicable, method deprecation removes the methods from consideration by the decision-theoretic dispatch. Method regeneration involves repairing or reconfiguring the underlying services that are causing some method to be inapplicable. This regeneration is achieved by reasoning about the consequences of actions using the component models, and by exploiting functional redundancies in the specified methods. In addition, decision-theoretic dispatch continually monitors method performance and dynamically selects the applicable method that accomplishes the intended goals with maximum safety, timeliness, and accuracy. Beyond simply modeling existing software and hardware components, we allow the specification of high-level methods. A method defines the intended state evolution of a system in terms of goals and fundamental control constructs, such as iteration, parallelism, and conditionals. Over time, the more that a system’s behavior is specified in terms of model-based methods, the more that the system will be able to take full advantage of the benefits of model-based programming and the runtime model-based executive. Implementing functionality in terms of methods enables method prognosis, which involves proactive method deprecation and regeneration, by looking ahead in time through a temporal plan for future method invocations. Our approach has the benefit that every additional modeling task performed on an existing system makes the system more robust, resulting in substantial improvements over time. As many faults and intrusions have negative impact on system performance, our approach also improves the performance of systems under stress. Our approach provides a well-grounded technology for incrementally increasing the robustness of complex, concurrent, critical applications. When applied pervasively, model-based execution will dramatically increase the security and reliability of these systems, as well as improve overall performance, especially when the system is under stress.

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

599

2. Fault Aware Processes Through Model-based Programming Recall, to achieve robustness pervasively, fault adaptive processes must be created with minimal programming overhead. Model-based programming elevates this task to the specification of the intended state evolutions of each process. A model-based executive automatically synthesizes fault adaptive processes for achieving these state evolutions, by reasoning from models of correct and faulty behavior of supporting components. Each model-based program implements a system that provides some service, such as secure data transmission. This is used as a component within a larger system. The model-based program in turn builds upon a set of services, such as name space servers and data repositories, implemented through a set of concurrently operating components, comprised of software and hardware. 2.1. Component Services Model The service model represents the normal behavior and the known and unknown aberrant behaviors of the program’s component services. It is used by a deductive controller to map sensed variables to queried states. The service model is specified as a concurrent transition system, composed of probabilistic concurrent constraint automata 0. Each component automaton is represented by a set of component modes, a set of constraints defining the behavior within each mode, and a set of probabilistic transitions between modes. Constraints are used to represent co-temporal interactions between state variables and intercommunication between components. Constraints on continuous variables operate on qualitative abstractions of the variables, comprised of the variable’s sign (positive, negative, zero) and deviation from nominal value (high, nominal, low). Probabilistic transitions are used to model the stochastic behavior of components, such as failure and intermittency. Reward is used to assess the costs and benefits associated with particular component modes. The component automata operate concurrently and synchronously. 2.2. Self Deprecation and Regeneration Through Predictive Method Dispatch In model-based programming, the execution of a method will fail if one of the service components it relies upon irreparably fails. This in turn can cause the failure of any method that relies upon it, potentially cascading to a catastrophic and irrecoverable system-wide malfunction. The control sequencer enhances robustness by continuously searching for and deprecating any requisite method whose successful execution relies upon a component that is deemed faulty by mode estimation, and deemed irreparable by mode reconfiguration. Without additional action, a deprecated method will cause the deprecation of any method that relies upon it, potentially cascading to catastrophic system-level malfunction. Model-based programmers specify redundant methods for achieving each desired function. When a requisite method is deprecated, the control sequencer attempts to regenerate the lost function proactively, by selecting an applicable alternative method, while verifying overall safety of execution. More specifically, predictive method selection will first search until it finds a set of methods that are consistent and schedulable. It then invokes the dispatcher, which passes each activity to the deductive controller as configuration goals, according to a

600

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

schedule consistent with the timing constraints. If the deductive controller indicates failure in the activity’s execution, or the dispatcher detects that an activity’s duration bound is violated, then method selection is reinvoked. The control sequencer then updates its knowledge of any new constraints and selects an alternative set of methods that safely completes the RMPL program. 2.3. Self-Optimizing Methods Through Safe, Decision-Theoretic Dispatch In addition to failure, component performance can degrade dramatically, reducing system performance to unacceptable levels. To maintain optimal performance, predictive method dispatch utilizes decision-theoretic method dispatch, which continuously monitors performance, and selects the currently optimal available set of methods that achieve each requisite function. 2.4. A Simple Example To describe the machinery of Predictive Method Selection and Decision-Theoretic Dispatch in more detail, we present a simple example. In this example a MER (Figure 1) rover is tasked with performing science operations on the surface of Mars. Specifically, in this example, the rover is asked to visit two Martian rocks and sample their composition. The primary goal is to scratch each rock using a grinder (a.k.a. rock abrasion tool) and then analyze the rock’s composition. As a secondary goal, if for some reason the primary goal is unattainable, the rover should take a high resolution image of the rock. In Figure 3, below, we show how this simple example can be encoded as a Temporal Plan Network (TPN). A TPN [8] can be viewed as a generalization of a Simple Temporal Network (STN) that supports multiple redundant methods, method deprecation and regeneration, and optimal planning. Ask grinder = ok

Ask grinder = ok

Start

Rover1.sample( target1 ) Rover1.goto(p1) [5,10]

[2,5]

5

Rover1.image( target1) [2,5]

End

Rover1.sample( target2 ) Rover1.goto(p2) [5,10]

10

[2,5]

5

Rover1.image( target2 ) [2,5]

Rover1.goto(p3) [5,10]

10

Tell grinder = ok [0,+inf]

Figure 3: Temporal Plan Network encoding of the MER Rover Example.

In a TPN, nodes represent instantaneous time events, and arcs represent activities with duration. The duration of an activity is constrained by a lower and upper timebound, specified in brackets, [lb,ub]. By default, a blank arc has [0,0] timebounds. Redundant methods in a TPN are denoted by a double circle and a circle with two horizontal lines. One thread of execution must be chosen between each pair of double circle node and double line node in the TPN. This construct allows the autonomous system to select from multiple redundant methods at runtime. Method deprecation and regeneration is enabled through Ask and Tell operators. For example, in Figure 3, we enforce that the grinder be in working condition if we wish to sample the composition of a rock. This condition is enforced by placing an arc over each sample(Target)

601

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

activity which asks that the condition grinder = ok holds true during the entire execution of the activity. The dispatcher enforces that an Ask condition is satisfied by covering it with a Tell condition. For example, at the bottom of the plan in Figure 3, we tell the plan that grinder = ok for the duration of the plan. Additionally, a cost Step 1: Predictive Method Selection: - initially select a set of methods that are consistent, schedulable, and optimal Ask grinder = ok

Ask grinder = ok

Start

Rover1.sample( target1 ) [2,5]

Rover1.goto(p1) [5,10]

5

[2,5]

Rover1.goto(p2)

Rover1.image( target1)

[5,10]

5

Rover1.goto(p3)

Rover1.image( target2 )

10

[2,5]

End

Rover1.sample( target2 )

[5,10]

10

[2,5] Tell grinder = ok [0,+inf]

Step 2: Deductive Controller indicates failure of an activity during execution - the grinder fails and can’t sample target 1 Ask grinder = ok

Start

Failure

Rover1.sample( target1 ) [2,5]

Rover1.goto(p1) [5,10]

5

[2,5]

Ask grinder = ok

[5,10]

End

Rover1.sample( target2 ) [2,5]

Rover1.goto(p2)

Rover1.image( target1)

Catch: Type: sampleTarget Handler: Rover1.retract_arm Tell: grinder = broken

Throw Exception: Type: sampleTarget Reason: grinder = broken

5

Rover1.goto(p3)

Rover1.image( target2 )

10

[2,5]

[5,10]

10

Tell grinder = ok [0,+inf]

Step 3: Method Deprication: - update the plan with new information (grinder = broken), sample(Targets) is now depricated Ask grinder = ok

Ask grinder = ok

Start

Rover1.sample( target1 ) Rover1.goto(p1) [5,10]

[2,5]

5

Rover1.image( target1) [2,5]

End

Rover1.sample( target2 ) [2,5]

Rover1.goto(p2) [5,10]

10

[2,5] Tell gripper =

Broken

5

Rover1.image( target2 )

Rover1.goto(p3) [5,10]

10

[0,+inf]

Step 4: Reinvoke Predictive Method Dispatch: - choose the optimal set of methods, while avoiding any depricated methods Ask grinder = ok

Ask grinder = ok

Start

Rover1.sample( target1 ) Rover1.goto(p1) [5,10]

[2,5]

5

Rover1.image( target1) [2,5]

End

Rover1.sample( target2 ) [2,5]

Rover1.goto(p2) [5,10]

Rover1.image( target2 )

10

[2,5] Tell grinder = Broken

5

Rover1.goto(p3) [5,10]

10

[0,+inf]

Figure 4: A walkthrough example.

can be associated with each activity in the TPN. This allows the planner to choose the set of available methods that minimizes cost, or analogously, maximizes reward. The cost of an activity is denoted by a number, and as shown in Figure 4 we give each sample(Target) activity a cost of 5 and each image(Target) activity a cost of 10. By giving each sample(Target) activity a lower cost than the associated image(Target) activity, we are abiding by the plan requirement that sampling each rock is the primary goal, and imaging each rock is just the secondary goal.

602

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

Next, we walk through each step of the MER rover example depicted above in Figure 4. Initially, in step 1, Predictive Method Selection is invoked and an optimal consistent and schedulable plan is selected. We see that among the redundant methods of sampling and imaging each rock, the planner correctly chooses to sample each rock (since this is the option with the least cost). Now that a plan is selected, it is sent to the dispatcher. In step 2, we assume that the grinder breaks, causing a failure exception to be thrown. When the failure occurs, the plan is updated with any new information (grinder = broken) and then method selection is re-invoked. However, as shown in step 3, we see that since the grinder is now broken, the Ask condition associated with each sample(Target) activity is no longer satisfied, thus the activities must be deprecated. In step 4, predictive method selection is re-invoked, and the optimal consistent and schedulable plan is selected. As is shown, with a broken grinder, the only consistent alternative for the MER rover is to take high-resolution images of the two targets.

3. Results Initial testing of the described system has been performed by augmenting the MIT MERS rover test bed. The rover test bed consists of a fleet of ATRV robots within a simulated Martian terrain. By way of example we describe one mission whose robustness has been enhanced by the system. Two rovers must cooperatively search for science targets in the simulated Martian terrain. This is done by having the rovers go to the selected vantage points looking for science targets using the rover’s stereo cameras. The rovers divide up the space so that they can minimize the time taken in mapping the available science targets in the area. The paths of the rovers are planned in advance given existing terrain maps. The plan runs without fail. Between them the rovers successfully find all of the science targets that we have placed for them to find. The scenario is shown below in Figure 5. p4

p5 p5

1 1

2 2

p1

p2

p3 p3

Figure 5: Rover test bed experimental platform

In the test scenario two faults are introduced by placing a large rock that blocks rover #1’s view of one of the designated areas. When rover #1 gets into its initial position to look for science targets its stereo cameras detect the unexpected rock obscuring its view. This results in an exception that disqualifies the current software component from looking for targets. Since the failure is external to the rover software the plan itself is invalidated. The exception is resolved by replanning which allows the both rovers to modify their plans so that the second rover observes the obscured site from a different vantage point. The rovers continue with the new plan but when rover #2 attempts to scan the area for science targets the selected vision algorithm fails due

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

603

to the deep shadow being cast by the large rock. Again an exception is generated but in this case a redundant method is found – a vision algorithm that works well in low light conditions. With this algorithm the rover successfully scans the site for science targets. Both rovers continue to execute their plan without further failure. R o v e r 1 . f in d T a r g e t s R o v e r 1 . g o t o ( p 2 )

R o v e r1 .g o to (p 3 )

S ta rt

E nd R o v e r1 .g o to (p 1 )

R o v e r 1 . g o t o ( p 2 ) R o v e r 1 . f in d T a r g e t s R o v e r 1 . g o t o ( p 3 )

R o v e r2 .g o to (p 5 )

R o v e r 2 .fin d E g g s

R o v e r2 .g o to (p 6 )

R o v e r2 .g o to (p 6 )

R o v e r 2 .g o to ( p 3 ) R o v e r 2 .fin d T a r g e ts

F a ilu r e

R o v e r2 .g o to (p 3 )

Figure 6: The TPN for the two-rover exploration plan. Failure due to an obscuration (rock) results in automatic online replanning so that the mission can continue.

4. Comparison with Current Technology 4.1. Model-based Programming of Hidden States The reactive model-based programming language (RMPL) is similar to reactive embedded synchronous programming languages like Esterel. In particular, both languages support conditional execution, concurrency, preemption and parameter less recursion. The key difference is that in embedded synchronous languages, programs only read sensed variables and write to controlled variables. In contrast, RMPL specifies goals by allowing the programmer to read or write ``hidden'' state variables. It is then the responsibility of the language's model-based execution kernel to map between hidden states and the underlying system’s sensors and control variables. 4.2. Predictive and Decision-theoretic Dispatch RMPL supports nondeterministic or decision theoretic choice, plus flexible timing constraints. Robotic execution languages, such as RAPS, [2], ESL[4] and TDL[3], offer a form of decision theoretic choice between methods and timing constraints. In RAPS, for example, each method is assigned a priority. A method is then dispatched, which satisfies a set of applicability constraints while maximizing priority. In contrast, RMPL dispatches on a cost that is associated with a dynamically changing performance measure. In RAPS timing is specified as fixed, numerical values. In contrast, RMPL specifies timing in terms of upper and lower bound on valid execution times. The set of timing constraints of an RMPL program constitutes a Simple Temporal Network (STN). RMPL execution is unique in that it predictively selects a set of future methods whose execution are temporally feasible. 4.3. Probabilistic Concurrent Constraint Automata Probabilistic Concurrent Constraint Automata (PCCA) extend Hidden Markov Models (HMMs) by introducing four essential attributes. First, the HMM is factored into a set of concurrently operating automata. Second, probabilistic transitions are treated as conditionally independent. Third, each state is labeled with a logical

604

P. Robertson et al. / Autonomous Robust Execution of Complex Robotic Missions

constraint that holds whenever the automaton marks that state. This allows an efficient encoding of co-temporal processes, which interrelate states and map states to observables. Finally, a reward function is associated with each automaton, and is treated as additive. 4.4. Constraint-based Trellis Diagram Mode estimation encodes PHCA as a constraint-based trellis diagram, and searches this diagram in order to estimate the most likely system diagnoses. This encoding is similar in spirit to a SatPlan/Graphplan encoding in planning.

5. Conclusions We have extended a system capable of diagnosing and reconfiguring redundant hardware systems so that instrumented software systems can likewise be made robust. Modeling software components and their interconnections poses a high modeling burden. Results of our early experiments are encouraging but much work remains to extend the current experimental system to cover the full range of software practice.

6. Acknowledgements The work described in this article was supported in part by DARPA and NASA. NASA pictures courtesy of NASA.

7. References [1]

[2] [3] [4] [5] [6] [7] [8]

D. Bernard, G. Dorais, E. Gamble, B. Kanefsky, J. Kurien, G. Man, W. Millar, N. Muscettola, P. Nayak, K. Rajan, N. Rouquette, B. Smith, W. Taylor, Y. Tung, Spacecraft Autonomy Flight Experience: The DS1 Remote Agent Experiment, Proceedings of the AIAA Space Technology Conference & Exposition, Albuquerque, NM, Sept. 28-30, 1999. AIAA-99-4512. R. Firby, “The RAP language manual,” Working Note AAP-6. University of Chicago, 1995. R. Simmons, “Structured Control for Autonomous Robots,” IEEE Transactions on Robotics and Automation, 10(1), 94. E Gat, “ESL: A Language for Supporting Robust Plan Execution in Embedded Autonomous Agents,” In Proceedings of the AAAI Fall Symposium on Plan Execution, 1996. B. Williams and P. Nayak. A Reactive Planner for a Model-based Execution. In Proceedings 15th International Joint Conference AI, Nagoya, Japan, August 1997. IJCAI-97. T. Mikaelian & M. Sachenbacher. Diagnosing Mixed Hardware/Software Systems Using Constraint Optimization. In Proceedings DX-05. Robert Laddaga, Paul Robertson, Howard E. Shrobe. Introduction to Self-adaptive Software: Applications. Springer Verlag LNCS 261 Phil Kim, Brian C. Williams and Mark Abramson. 2001. "Executing Reactive, Model-based Programs through Graph-based Temporal Planning." Proceedings of the International Joint Conference on Artificial Intelligence, Seattle, Wa.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

605

Emergence of small-world in Ad-hoc communication 1 network among individual agents Daisuke Kurabayashi a,2 , Tomohiro Inoue a , Akira Yajima b and Tetsuro Funato a a Tokyo Institute of Technology b Olympus, co., ltd. Abstract. This paper describes an algorithm to realize a small-world type ad-hoc communication network among autonomous mobile agents. An ad-hoc network means a network emerged by local negotiations without central manager. We have shown procedures so that an emerged network equips small-world property, which results effective communication with low cost. We have formulated expected number of communication links and degrees of agents. By simulations we verified the proposed method. Keywords. Small world, Ad-hoc network, Local negotiation

1. Introduction Our society includes many networks. Sometime we are nodes of the communication networks. Today, we have some high-speed wireless communication systems, for example W-CDMA (UMTS) phones. However, they require heavy infrastructure, and they are not so fault-torelant when the access point is broken. In the contrast, we also have Ad-hoc network, which is constracted and maintained by distributed way. It is robust, but its performance depends on the algorithms in individuals. In this study, we formulate communication network property that is emerged by ad-hoc negotiation between autonomous agents. We examine the essential factor and conditions to emerge effective communication network that has ”smallworld” property.

2. Problem settlement 2.1. Small World (SW) Some researchers have suggested that the structure of a graph itself has functionality. Small world is a property found in human links in our society [1][2][3][4]. 1 This work has been partially supported by a Grant-in-Aid for Scientific Research 17075007 from Japanese Ministry of Education, Culture, Sports, Science and Technology. 2 Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552 Japan.

606

D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network

The property ”small world” means good connectivity in large-scale network with rather small number of edges. Some researchers have suggested that the network of the power transmission lines in USA forms a small world graph[5].

(a) Regular graph

(b) Small world

(c) Random graph

Figure 1. Typical graphs and a small world.

We employee the property ”small world” to evaluate an ad-hoc network emerged by local negotiations. A small world graph is identified by following two measures: Characteristic Path Length (CPL) and Clustering Coefficient (CC) [6][7]. Equation (1) shows CPL and CC, where d¯vi is average of the shortest paths to other nodes, kvi shows degree of a node vi , Γvi means partial graph composed of neighbors of vi , and ||E(Γvi )|| indicates number of edges in Γvi .CPL indicates radius of a graph. CC denotes connectivity of geometrically near nodes. CP L =

NV 1  d¯v NV i=1 i

CC =

NV 1  ||E(Γvi )|| NV i=1 kvi C2

(1)

Generally, both CPL and CC of a random graph become large, and those of a regular graph become small. However, a small world graph has small CPL and large CC. This means that a small world graph has high connectivity among not only geometrically near nodes but also far nodes. Notice that there is no strict threshold to distinguish a small world graph from other graphs. So we usually compare CPL and CC with those of a random graph and a regular graph (Fig. 1)[8]. We define a graph is a small-world graph if its normalized CPL is lower than 0.4 and normalized CC is higher than 0.6 (2). N CP L =

CP L − CP LRandom CP LRegular − CP LRandom

N CC =

CC − CCRandom (2) CCRegular − CCRandom

2.2. Assumptions and conditions 2.2.1. Agents Let us suppose that Nr autonomous agents walk in a certain region W (Fig. 2(a)). An agent walks straight with v per step in a certain period. Then, it changes its walking direction at random, and continues to walk. We call this motion as random walk. Initial positions of agents are generated at random.

607

D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network

2.2.2. Targets We also suppose NT static agents, called targets. A target never move around, but it changes its status between non-active and active in a certain period. All targets are synchronized. When targets are active, an agent walks toward to the position of the nearest target (Fig. 2(b)). Let us call this motion as concentration. When targets turn into non-active, an agent returns to the random walk. In this case, agents diffuse in the working field. We name this as diffusion. The targets take the role of schools, hospitals and companies in the real society. We assume that a working region W is a square, NT is a square number, and targets are located in the grid positions to simplify the problem. An agent

A target

ra

An agent

v A0 t-1

(a) Mobile agents

(b) Showing the targets

A1 t

(c) Communication radius

Figure 2. Essential motions of agents

2.2.3. Generation of a communication edge Each agent has its ID like a telephone number. An agent can communicate with agents it knows whose ID. We call this connection as an edge in a communication network. An agent can talk to other agents within a circle with radius ra directly. When it finds other agents in the circle, it negotiates them to exchange their IDs. They will reach an agreement with the probability S (0 < S ≤ 1). An agent can memory IDs up to D. When it wants to memory D + 1th ID, it deletes one of IDs in its current memory at random, then incorporates new one. It notifies a robot of the deleting ID in order to the agent can also delete the communication link. Under those assumptions, we examine the condition to emerge small-world network among mobile agents. 3. Essential factor for SW network In this section, we examine qualitative condition to emerge small-world communication network. We carry out simulations to evaluate the effects of the targets. Table 1 indicates the parameters. The graph theory says that a connected graph needs degree d >> ln(Nr ) [8]. We set the limit number of memory by the condition. In the initial state, agents are located at random and they have no connection (no edge). We have carried out each simulation for 5000 steps in following two conditions.

608

D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network

Table 1. Parameters for simulations W

Size of a region

Nr NT

Number of agents Number of targets

200 × 200

T2

Step for active

100

100 9

ra S

Communication radius Success ratio of a negotiation

10 1.0

D

Limit number of memory

v

Speed of an agent

1.0

T1

Step for non-active

100

10∼70

㪈㪅㪇 D=10

㪇㪅㪐

9KVJ6CTIGVU 9KVJQWV6CTIGVU

Normalized CC

㪇㪅㪏 㪇㪅㪎 㪇㪅㪍

D=20

㪇㪅㪌 㪇㪅㪋 㪇㪅㪊 㪇㪅㪉

D=10 D=70

㪇㪅㪈 D=20

㪇㪅㪇 㪇㪅㪇

㪇㪅㪉

㪇㪅㪋 㪇㪅㪍 㪇㪅㪏 Normalized CPL

㪈㪅㪇

Figure 3. Properties of emerged networks

A NT targets are located. A target repeats T1 step non-active → T2 step active. B No target exists. An agent walks at random. Figure 3 shows the normalized CC and CPL of generated communication networks. We have changed the limited number of memory D from 10 to 70. Under the condition B, the generated graph never satisfy the criteria to small world. However, in A, when D is in 10 < D < 20, graphs with small-world property emerge. We can say that we need targets to generate small world in a communication network. In the next section, we will formulate the property of emerged network.

4. Formunation of the emergence of a network 4.1. Estimated number of edges In this section, we formulate an estimation of edges in an emerged network. Suppose that the estimated number of agents in a certain area is subject to Poisson distribution. Let Pij (t) and Qij (t) indicate possibilities that an edge between agent i and j exist and that they negotiate each other, respectively. We denote Rij (t) as the probability so that the edge between i and j disappears. Then we can formulate estimated number of edges NE (t) by (3). Let us formulate Qij (t) and Rij (t) respectively. Pij (t) = Pij (t − 1)Qij (t)S + Pij (t − 1)Rij (t)

NE (t) = Pij (t)Nr C2

(3)

D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network

609

4.2. At the random walk Let U is a set of agents that agent i does not have communication link to. Fig. 2(c) shows transition from t − 1 step to t. Region A0 indicates an overlapped communicable region by that of step t − 1. Let ρU (t) is the density of U in W , ρ− U A (t) is that in A before negotiation, and ρ+ (t) is that after negotiation. Let N U (t) is an expected number of U after UA negotiation. NU (t − 1) Pij (t − 1)(Nr − 1) = W W   1 ρ− ρ+ (4) U A (t) = U A (t − 1)A0 + ρU (t)A1 A Nr −1 − # 1  Nr − 1 (ρU A (t)A)x −ρ− (t)A − UA e − ρ ρ+ (t) = (x − S) + ( (t))AR (t) ij UA UA A x=1 x! W ρU (t) =

Then, we can derive Qij (t) as (5). Qij (t) =

N r −1 x=1

x − {ρ− 1 U A (t)A} e−ρUA (t)A x! NU (t − 1)

(5)

4.3. At the concentration and the diffusion An agent repeats the concentration and the diffusion when targets are introduced. Let τ indicates passed steps from the latest activation or inactivation of targets. We describe t = 0∗ , T1∗ and T2∗ when τ = 0, τ = T1 and τ = T2 , respectively. From the viewpoint from agent i, other agents are separated into two categories. Let C and X show sets of agents to go toward the same target of agent i and don’t, respectively. An agent only can negotiate with C (6). C

Pij (τ ) = C P ij (τ − 1)Qij (τ )S + C Pij (τ − 1)Rij (τ )

X

Pij (τ ) = X Pij (τ − 1)Rij (τ ) C

(6)

X

Pij (τ ) = Pij (τ ) + Pij (τ ) We suppose C Pij (t) = ηC Pij (t)、X Pij (t) = (1−ηC )Pij (t)、C Pij (t) = ηX Pij (t)、 X Pij (t) = (1 − ηX )Pij (t) at t = 0∗ . ηC , ηX are derived by(10). We derive the expected number of C, NC by (7). NC =

Nr NT

(7)

We denote the expected number of U ∈ C as C NU (τ ), of which initial value is determined by (9).

610

D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network

By the concentration, the density of agents increases. We approximate a Voronoi region of a target as a circle with the same size of averaged area for targets (8). Then we formulate the density of U as (9), where B0 is a communication range that is overlapped by that in t − 1 step.  W (8) r1 = πNT  1 C + ρU (t − 1)B0 + ρU (t)B1 (9) WB  1 C − Nr − 1 C − C + − ρU (t))WB Rij (t) ρU (t) = ρU WB − Qij (t)NU (t − 1)S + ( WB W

C − ρU (t)

C

=

∗ NU (0∗ ) = C ρ+ U (0 )WB

Here, we can derive ηC and η = X by (10). ηC =

(NC − 1) − C NU (0∗ ) (Nr − 1) − NU (0∗ )

C

ηX =

NU (0∗ ) NU (0∗ )

(10)

During the concentration, the approximated Voronoi region, B(τ ), shrinks (11). When t = T1∗ , it is expressed by (12). B(τ ) = (r1 − v0 τ )2 π  (r1 − v0 T2 )2 π (r1 ≥ v0 T2 ) BC (T2∗ ) = 0 (v0 T2 ≥ r1 )

(11) (12)

When T1 is quite small, the concentration will be effected by the latest diffusion. We formulate B(τ ) by (13) for those cases. Note that B(T1∗ ) indicates an area of C just after the diffusion.  B(T1∗ )  2  B(τ ) = (r1 − v0 τ ) π, r1 = (13) π When targets turn into non-active, an agent starts the diffusion. We can approximately formulate the Voronoi region as (14)  B(T2∗ ) 2 (14) B(τ ) = (r2 + v0 τ ) π, r2 = π Let ρU B (τ ) as the density of U in B(τ ). We derive it by (15) and (16). C − 1)(Nr − 1) NU (τ − 1) = B(τ ) B(τ )  1  + ρ− ρU B (τ − 1)A0 + ρU B (τ )A1 U B (τ ) = A

ρU B (τ ) =

cP

ij (τ

(15)

D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network

ρ+ U B (τ )

1 = A 

B(τ ) =

N c −1 x=1

611

# x NC − 1 {ρ− −ρ− (τ)A − U B (τ )A} e UB − ρU B (t))ARij (t) (x − S) + ( x! B(τ )

B(τ ) (B(τ ) ≥ A) A (B(τ ) < A)

(16)

Then, we derive (17). Qij (t) =

N c −1 x=1

x − {ρ− 1 U B (τ )A} e−ρUB (τ)A C x! NU (τ − 1)

(17)

Notice that the size of a Voronoi region becomes larger than that of WB , we apply the random walk model to the system. 4.4. Formulation of Rij (t) When agent i obtains new ID although its memory has been already full, it deletes one of ID in the memory at random. Let D QiU (t) as the possibility that agent i without available memory tries to negotiate with U. We derive it when an agent moves at the random walk (18) and at the concentration (19), respectively. D

QiU (t) = (Nr − 1 − D)Qij (t)

D

QiU (t) =

Nr − 1 − D Qij (t)C NU (t − 1) NU (t − 1)

(18) (19)

Then, we can formulate Rij by (20). Notice that fn (t) indicates the possibility that an agent has n edges after negotiation. Rij (t) = fD (t − 1)D QiU (t)S

1 1 (2 − fD (t − 1)D QiU (t)S ) D D

(20)

4.5. Verification of expected degrees We carried again simulations with the parameters in Table 1. In this case, we verified expected degrees for agents with targets. Figure 4 shows the total number of edges and degrees, respectively. Table 2 illustrates the errors according to the parameters. By the results, we have successfully formulated the network within 10 % errors. The approximations for the formulations cause the errors. The most essential one may be the approximation that we regard the Voronoi region of a target as a circle with the same size. However, it is quite difficult to formulate the Voronoi diagram for general layout of the targets.

5. Conclusion In this study, we examined the essential factor and conditions to emerge smallworld (SW) communication network among autonomous mobile agents. We in-

D. Kurabayashi et al. / Emergence of SW in Ad-Hoc Communication Network 800

60

700

50

600 500 400

Formulated

300

Simulated

200

0WODGTQHCIGPVU

Total number of edges

612

100 0

Formulated

40

Simulated

30 20 10

0

0

200 400 600 800 1000 1200 1400 1600

0

STEP

(a) Total number of links

5 10 15 &GITGG PWODGTQHEQPPGEVGFCFIGU

(b) The distribution of degrees

Figure 4. Verification of the formulated model Table 2. Error of number of links D 10

15

S

Ave. Error (%)

Distribution

1.0

3.08

3.18e−4

0.5

3.77

3.25e−4

0.1

3.94

6.33e−4

1.0

5.92

3.11e−4

0.5

5.24

3.32e−4

0.1

4.27

1.34e−3

D 20

S

Ave. Error (%)

Distribution

1.0

5.00

6.82e−4

0.5

4.60

5.91e−4

0.1

4.89

2.03e−3

corporate concentration and diffusion motion for SW property. We formulate estimated property of a network emerged by ad-hoc negotiation among agents. Based on those results, we verify the derived model by simulations.

References [1] S. Milgram: The small world—world problem, Phycology Today, Vol. 2, pp. 60/67, 1967. [2] M. Marchirori, V. Latora: Harmony in the Small-world, Physica A, Vol. 285, pp.539/546, 2000. [3] N. Mathias, V. Gopal: Small Worldws: How and Why, Physical Review E, Vol. 63, No.2, 2001. [4] J.M. Montoya, R.V. Sole: Small World Patterns in Food Webs, 2000. [5] L.Adamic: The Small World Web, Proc. ECDL99, pp. 443/452, 1999. [6] D. Watts, S. Strogatz: Collective Dynamics of Small-world Networks, Nature, Vol. 393, pp. 440/442, 1998. [7] D. Watts: Small Worlds, Princeton U. Press, 1999. [8] B. Bollobas: Random Graphs, Academic Press, 1999.

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

613

Parametric Path Planning for a Car-like Robot Using CC Steers Weerakamhaeng Yossawee a,1 , Takashi Tsubouchi a , Masamitsu Kurisu b and Shigeru Sarata c a Intelligent Robot Laboratory, University of Tsukuba, Japan b Department of Mechanical Engineering, Faculty of Engineering, Tokyo Denki University, Japan c Intelligent systems Institute, AIST, Japan Abstract. The present authors have been interested in a smooth locomotion of a wheel loader between any predefined pair of a locally desired initial configuration and a locally desired destination configuration. The success comes from the smooth path generation scheme based on the design of a steering method called  Continuous-Curvature Steer( CC Steer ) which is composed of line segments. clothoid segments and circular arcs. Keywords. clothoid pair, nonholonomic path planning, upper-bounded curvature and curvature derivative path, continuous-curvature path, frame articulated steering type vehicle

1. Introduction The present authors have been interested in autonomous control of a wheel loader (WL) (Figure 1) which is considered as a frame articulated steering type vehicle. WL is one of major machine vehicles used in several fields, especially for mining. Its main operations are to shovel, carry, load and unload gravel. During the operations, a typical motion of this vehicle consists of several motions between pairs of a locally desired initial configuration, and a locally desired destination configuration. It is vigorously preferable to have the smooth locomotion in order to accommodate the precision of path tracking control and improve the vehicle’s dead reckoning ability. This paper then contributes to the generation of a smooth path as the key to preference. Dubins proved that without any obstacle, the optimal(shortest) path[1] for a car that can only go forward will take one of the six combinations of sequential curve segments. However, his idea must have sharp turns. For the case of a car-like robot that can move forward or backward, Reeds and Shepp [2] showed that the optimal path between two configurations will take one of 48 different forms. The curvature of this type of path is discontinuous. Discontinuities occur at the transitions between straight segments and sharp turns and between sharp turns with opposite direction of rotation. 1 Correspondence to: Weerakamhaeng Yossawee, Intelligent Robot Laboratory, University of Tsukuba, Japan. E-mail: [email protected], [email protected]

614

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers

Barraquand and Latombe [10] proposed a heuristic brute-force approach to motion planning for tractor-trailer robots. It consists of heuristically searching a graph whose nodes are small axis-parallel cells in configuration space. Two such cells are connected in the graph if there exists a feasible path between two configurations in the respective cells. The main drawback of their method is that when the heuristics fail, it requires an exhaustive search in the discretized configuration space. In addition, the resulting path is inexact because the solution to the nonholonomic constraints is approximated numerically; this implies that the goal configuration is never reached exactly. Laumond et al. [9] proposed a motion planning for a car-like robot taking into account of nonholonomic constraint whose turning radius is lower-bounded. They obtained a fast and exact planner for their mobile robot model(HILARE mobile robot), based upon the recursive subdivision of a collision-free path generated by a lower-level geometric planner that ignores the motion constraint. The resultant trajectory is optimized to give a path that is of near-minimal length in its homotopy class. S.M.LaValle and J.J. Kuffner [11] presented the first randomized approach to trajectory planning. The task is to determine control input to drive a robot from an initial configuration and velocity to a goal configuration and velocity while obeying physicallybased dynamical models and avoiding obstacles in the robot’s environment. They developed a randomized planning approach based upon what they called  Rapidly-exploring Random Trees , which offers benefits that are similar to those obtained by successful randomized holonomic planning methods, but apply to a much broader class of problems. Thierry et al. [3] presented the CC Steer, an algorithm planning paths in the absence of obstacles. The planned path possesses a) continuous curvature, b) upper-bounded curvature, c) upper-bounded curvature derivative for a car-like vehicle, and d) the length of the computed path is not the minimal length but close to the length of the optimal paths for the simplified car (as it is true when the curvature derivative limit tends to infinity, the paths computed become Reeds and Shepp paths). The path computed by CC Steer are made up of line segments, circular arcs and clothoid arcs[4]. Kito et al. [6] considered the application of CC Steer in the work space with the definite number of polygonal obstacles. They combined this CC Steer idea with visibility graph search. In order to gain the optimal path (shortest path), the decision of the number of tangents to the discretized circle having the center at each vertice of polygonal obstacles play a role in the algorithm as well. However, the specific criteria in the decision was not mentioned. In this paper, the present authors propose a smooth path generation scheme in the absence of obstacles based on the nonlinear programming. Novelty of this paper is a formulation of the shortest path finding problem suitable for the nonlinear programming, where the geometric relationships with combination of CC Turns [3] and line segments were considered. The objective is to find an optimal path (shortest in length) for the specific parameterized canonical path skeleton. By the optimization, for a given pair of initial and destination configurations, the values of parameters in the parameter set can be understood and used to draw a smooth planned path. The planned path consists of 2 turns, each of which is a CC Turn.

615

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers Front axle

R σst 2

σst(steering angle)

σst 2

L

R

center pin

Pwl (representative point of the wheel loader)

Figure 1. A Wheel Loader(WL)

L

Rear axle

Figure 2. Circling Behaviors of a Wheel Loader

2. Paths of continuous and upper bounded curvature and its derivative 2.1. A wheel loader kinematics issues It is natural to consider that the path to be planned should have the properties of continuous curvature, upper-bounded curvature, and upper-bounded curvature derivative. The limit property of upper-bounded curvature arises from the maximum steering angle σ st max (Figure 2). The curvature κ at any instance with respect to the steering angle σ st is represented as: κ = tan(

σst )/L 2

(1)

The fact that |

dσst ds dσst |=| · | dt ds dt

(2)

is bounded requires the continuous curvature property. 2.2. The combination among CC Turn and 2 line segments Refering to [3], in general, a CC Turn (Figure 3) is made up of three parts: 1. a clothoid arc(from lc p0 to lc p4 ) of sharpness k = ±kmax whose curvature varies from 0.0 to ±κ max where |κmax | is the upper bounded curvature. A clothoid is a curve whose curvature(κ) varies linearly with its arc length s : κ(s) = ks + κ(0.0). 2. a circular arc(from lc p4 to lc p5 ) of radius ±κ−1 max and 3. a clothoid arc(from lc p5 to lc p9 ) of sharpness −k whose curvature varies from ∓κmax to 0.0. For the local reference frame lc having lc p0 as the origin, without loss of generality, p0 (lc xp0 ,lc yp0 ,lc θp0 ,lc κp0 ) = (0.0, 0.0, 0.0, 0.0). Defining sp4 as the distance of lc p4 from lc p0 , from Figure 3, lc p4 (lc xp4 ,lc yp4 , lc θp4 ,lc κp4 ) can be calculated as follows: lc

616

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers lc

κp4 = ±κmax lc

sp4 = lc

θp4 =

nrdlimit min =

1 lc κ

(3)

p4

κp4 −lc κp0 k

k · [sp4 ]2 = 2 

lc

lc

,

u=sp4

xp4 =

(4) lc

θm

(5) 

lc

cos[ θ(u)] d u

lc

,

u=sp4

yp4 =

u=0.0

sin[lc θ(u)] d u

(6)

u=0.0

The center of the circle arc, lc ac(lc xac ,lc yac ) can be calculated as: lc

lc xac =lc xp4 −lc nrdlimit min · sin[ θp4 ] , lc

lc

lc yac =lc yp4 +lc nrdlimit min · cos[ θp4 ] (7)

nrac and lc μ can be determined as: ( lc nrdlimit lc min · nrac = lc (lc xac −lc xpo )2 + (lc yac −lc ypo )2 | nrdlimit min | (8) lc

lc xac −lc xpo μ = arctan[ lc ] yac −lc ypo

(9)

Defining the deflection of the CC Turn as lc θp9 −lc θp0 : lc

θp9 −lc θp0 = (lc θp4 −lc θp0 ) +lc θp4 p5 + (lc θp9 −lc θp5 ) = lc θm +lc θp4 p5 +lc θm

(10)

The deflection of the CC Turn whose circular arc has zero length( lc θp4 p5 = 0.0) is θp9 −lc θp0 = κ2max k −1 = 2 ·lc θm and CC Turn consists of only two clothoid arcs. It is called the double clothoid or the clothoid pair according to Kanayama et al. [4]. When the path is made up of 2 line segments connecting a CC Turn at lc p0 and lc p9 respectively, the curvature profile is continuous and looks like the one depicted in Figure 4 (assumption of CCW rotation of this CC Turn). Segment from lc pa to lc p0 is the line segment. Moving from lc p0 to lc p4 is on the clothoid arc of k = k max . The curvature is maximum and constant from lc p4 to lc p5 which is the circular arc. Finally, moving from lc p5 to lc p9 is on another clothoid arc of k = −k max . Curvature is varied down back to 0.0, then movement continues on the line segment from lc p9 to lc pb . lc

3. Specific parameterized canonical path skeletons Given a pair of 1 Init(1 xinitdsr ,1 yinitdsr ,1 θinitdsr ) (without loss of generality, set to (0, 0, 0)) and 1 Des (1 xdesdsr ,1 ydesdsr ,1 θdesdsr ) (where the leftmost superscript ”1” is refered to the local reference frame x 1 − y1 ), the authors propose a path generation scheme employing the following specific parameterized canonical path skeleton. In this section, parameterizations of the path skeleton suitable for the nonlinear programming are illustrated.

617

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers

lc

θm lcp7

lc

lc

p8

θ p9 p9

lcp6

lc

lc

cx

θ p5

lc

μ

lc

lc

μ μ

lcac

lc

p5

lc

θ p4p5

θ p4

lc

μ lc μ

ci

lc lc

p4

ylc lc

p3

xlc lcp0

lc

lc



p1

Curvature lcp2 lc

θm

κ max lc

nrdlimit min lc

lc

lc

nrac

Figure 3. A CC Turn

pa

p0

lc

p4

lc

p5

lc lc

p9

distance pb

Figure 4. Curvature profile of two line segments connecting to both ends of a CC Turn of CCW rotation

3.1. Skeleton consisting of two turns of two CC Steers In Figure 5, the vehicle will begin to run on this skeleton from 1 O1 to 1 A on line segment −−−→ 1 1 O1 A of displacement nl 1 . Then it takes a CC Turn ac 1 illustrated in Section 2. The deflection of this CC Turn ac 1 defined as 1 θD − 1 θA is equal to 1 θmac1 + 1 θBC + 1 θmac1 which is depicted as θM in Figure 5. Vehicle continues to move on line segment −−−→ 1 1 D E of displacement nl 2 to 1 E. It then takes a CC Turn ac 2 . Finally, vehicle moves −−−→ on line segment 1 H 1 J of displacement nl 3 to 1 J as the destination of this skeleton. The deflection of CC Turn ac 2 defined as 1 θH − 1 θE is equal to 1 θmac2 + 1 θF G + 1 θmac2 which is depicted as θN in Figure 5. 3.1.1. Scheme for the path generation for this skeleton The scheme for the path generation is to minimize the total moving distance of vehicle from 1 O1 to 1 J. Therefore, the following nonlinear programming can be formulated: 2

M inimize Z = 1 O1 1 A + [ 1 A1 B]2 + [ 1 B 1 C]2 + [ 1 C 1 D]2 2

+1 D1 E + [ 1 E 1 F ]2 + [ 1 F 1 G]2 + [ 1 G1 H]2 + 1 H 1 J

2

(11)

= Z(1 nl1 ,1 μac1 ,1 nrac1 ,1 θmac1 ,1 θBC ,1 nl2 ,1 μac2 ,1 nrac2 ,1 θmac2 ,1 θF G ,1 nl3 ) (12) Subject to: 1 nl1 − L ≥ 0.0 or 1 nl1 + L ≤ 0.0 , 1 nl3 − L ≥ 0.0 (13) 1

xdesdsr = 1 xJ

,

1

ydesdsr =1 yJ

,

1

θdesdsr =1 θJ

(14)

618

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers 1

μac1 = |lc μ| or 1 μac1 = −|lc μ| , 1 nrac1 = |lc nrac | or 1 nrac1 = −|lc nrac | (15) 1

θmac1 = |lc θm | or 1 θmac1 = −|lc θm | 1

μac1 · nrac1 ≥ 0.0 ,

1

μac2 = |lc μ| or 1 μac2 = −|lc μ| , 1 nrac2 = |lc nrac | or 1 nrac2 = −|lc nrac | (18)

1

θmac2 = |lc θm | or 1 θmac2 = −|lc θm | 1

1

(16)

1

1

θmac1 · nrac1 ≥ 0.0

μac2 ·1 nrac2 ≥ 0.0 ,

1

θmac2 ·1 nrac2

(17)

(19) ≥

0.0

(20)

Note that 1 nl1 , 1 θBC , 1 nl2 , 1 θF G and 1 nl3 can continuously be varied independently in the nonlinear programming to minimize Z. On the other hand, 1 μac1 and 1 μac2 , 1 nrac1 and 1 nrac2 and 1 θmac1 and 1 θmac2 are allowed to be varied between either ± lc μ, ±lc nrac and ±lc θm (fixed by calculation from Eq.(9), Eq.(8) and Eq.(5) ) consecutively. This means that these parameters are given discretely when the nonlinear optimization is performed. By the way, k in Eq.(5) is fixed constantly as k upperbound and equal to the upper bounded curvature derivative (|dκ limit max /dt|) [8] divided by the absolute value of tangential linear velocity(v = ds/dt) to a clothoid at any moment. kupperbound =

|dκlimit max /dt| |v|

=

|

dt dκlimit | · | max | ds dt

(21)

The deflection of CC Turn ac 1 whose circular arc has zero length is θ Mmin = |κlimit max | · −1 kupperbound . The deflection of CC Turn ac 2 whose circular arc has zero length is θ Nmin −1 = |κlimit max | · kupperbound . Constraints of Eq.(13) are to guarantee that there will be a line segments of lower bounded length | L | connected to a CC Turn at 1 A and the bucket attached to the front body of wheel loader must always leads the rear body toward any given desired destination position( 1 Des)(1 nl3 must be positive) and there must be a line segment 1 nl3 connecting CC Turn ac 2 at point 1 H of minimum length of L respectively. Constraints of Eq.(14) are to guarantee that end point of this planned path( 1 J) is exactly connected to the locally desired destination position, 1 Des(1 xdesdsr ,1 ydesdsr ), with the same orientation. Constraints of Eq.(15) and Eq.(16) are the results from the upper-bounded curvature and upper-bounded curvature derivative which are from the physical limits of wheel loader. Constraints of Eq.(17) are to guarantee the consistency of the signs of 3 parameters, 1 μac1 , 1 nrac1 and 1 θmac1 that characterize the direction of rotation(CW or CCW) of this CC Turn ac 1 . Constraints of Eq.(18) and Eq.(19) are the results from the upper-bounded curvature and upper-bounded curvature derivative which are from the physical limits of wheel loader. Constraints of Eq.(20) are to guarantee the consistency of the signs of 3 parameters, 1 μac2 , 1 nrac2 and 1 θmac2 that characterize the direction of rotation(CW or CCW) of this CC Turn ac 2 . 3.2. Remark on one CC Turn path Depending on the initial condition of the given pair of 1 Init and 1 Des, there is a possibility that the path is obtained by using only one CC Turn constituting the skeleton. In such a case, similar parameterization and nonlinear programming yields the path. Because of shortage of the paper length, details of this case could not be presented. For a

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers

619

realistic strategy, it must be an idea that once the one CC Turn constituting the planned path is employed, if a specific criteria could not be met, the skeleton consisting of two turns of two CC Steers is then applied. 3.3. Nonlinear programming For the actual nonlinear programming(NLP) in Section 3.1.1, the authors utilized the Mathematica command function. Cost function Z and all of the constraints are passed to the suitable parameter lists of this function. The Mathematica NLP function can accept attributes of continuous and/or discrete parameters. The author specified  differential evolution  method for NLP as the option. Desired NLP for resolving the problem formulation can be performed properly. Finally, for a given pair of 1 Init and 1 Des, the values of the parameter set of ( 1 nl1 ,1 μac1 ,1 nrac1 ,1 θmac1 ,1 θBC ,1 nl2 , 1 μac2 ,1 nrac2 ,1 θmac2 ,1 θF G ,1 nl3 ) can be obtained. Based upon the computer model of Pentium 3, 650MHz, 128MB RAM, the calculation time required for a given pair of 1 Init and 1 Des is approximately less than or equal to 78.376368 seconds.

4. Example of path generation In this section, typical examples of path generation results based on the proposed path limit limit generation scheme are illustrated. We assumed |ω st | of 20 degree/sec, |σ st | of 40 max max degree, L of 12.5 cm and |v| of 40 cm/sec. 4.1. Example 1 For a given pair of 0 Init(2.0L, 3.0L, 140◦) and 0 Des(10.0L, 10.0L, 225◦) 1. Setting 0 O1 = 0 Init and converting to 1 O1 (0, 0, 0) and 1 Des(−1.62884L, −10.5046L, 85◦ ) 2. Applying the nonlinear programming of Section 3.1.1 and the obtained values of (1 nl1 ,1 μac1 ,1 nrac1 ,1 θmac1 ,1 θBC ,1 nl2 ,1 μac2 ,1 nrac2 , 1 θmac2 , 1 θF G ,1 nl3 ) = (1.0L, −lcμ, −lc nrac , −lc θm , −19.585◦, 6.52142L , lc μ,lc nrac , lc θm , 104.585◦, 1.0L). | θM | is | −158.749◦ | which is greater than | θ Mmin |(139.164◦). | θN | is 243.749◦ which is greater than | θ Nmin |(139.164◦). Planned path can be drawn as in Figure 6. 4.2. Example 2 For a given pair of 0 Init(2.0L, 3.0L, 140◦) and 0 Des(0.0L, 10.0L, −90◦) 1. Setting 0 O1 = 0 Init and converting to 1 O1 (0, 0, 0) and 1 Des(6.0316L, −4.07674L, −230◦) 2. Applying the nonlinear programming of Section 3.1.1 and the obtained values of (1 nl1 ,1 μac1 ,1 nrac1 ,1 θmac1 ,1 θBC ,1 nl2 ,1 μac2 ,1 nrac2 , 1 θmac2 ,1 θF G ,1 nl3 ) = (1.70905L, −lcμ, −lc nrac , −lc θm , 122.148◦, 0.661746L, −lcμ, −lc nrac , −lc θm , −73.82◦, 1.44713L). | θM | is | −17.016◦ | which is less than | θMmin |(139.164◦). | θN | is | −212.984◦ | which is greater than | θ Nmin |(139.164◦). Planned path can be drawn as in Figure 7. Note that as a result of NLP, the steer-

620

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers ed 2

θN nl 3

1 1

μac2

H 1

θJ

1

1

J

μ ac2

μ ac2

1

G

θ mac2

F

μac2

1

θ mac2 1 μac2

ac2cx ac2 1

nrac2

limit

1

−μac2

ac2ci

nrd2

ed2

x3

1

1

θ FG

N

E

min

y3 nl 2

μac1

1

1

1

nrac1

D

limit

nrd1

ac1cx

min

ac1ci

μac1

1

ac1 1

y2

y1

O1

1 1

μac1

A

x1

bucket front body

θ mac1

θmac1 B

x2 1

−μac1

ed

1

μac1

1

M 1

μac1

1

θM

C

θ BC

nl 1 ed1

Figure 5. Skeleton consisting of 2 turns of CC Steer

ing angle σst of the vehicle did not come to the maximum physical limited value. Therefore, the planned path is not tangent to the inner circle of CC Turn ac 1 . This fact is provided by the NPL result merely due to the value of the given pair of 1 Init and 1 Des.

5. Conclusion In this paper, the present authors present the path generation scheme for a wheel loader during actual operations. The focus of this scheme is on the creation of the smooth path connecting any given pair of a locally desired initial configuration and a locally desired destination configuration. The specific parameterized canonical path skeleton is defined: the one with two CC Turns and line segments. Path planning is achieved by finding out necessary parameters characterizing this skeleton. The planned path might not be the

W. Yossawee et al. / Parametric Path Planning for a Car-Like Robot Using CC Steers y0-axis

G

G H D

621

y0-axis

F

J

H

C

J E

B

A

E

D

F

O1

O1 A

x0-axis x0-axis

Figure 6. Example of 2 ccpath for a given pair of 0 Init(2.0L, 3.0L, 140◦ ) and 0 Des(10.0L, 10.0L, 225◦ )

Figure 7. Example of 2 ccpath for a given pair of 0 Init(2.0L, 3.0L, 140◦ ) and 0 Des(0.0L, 10.0L, −90◦ )

shortest distance, however it guarantees the smooth locomotion. Morover, extensions can be made to the proposed skeleton in order to cover the finite number of the static obstacle existences and more number of CC Turns.

References [1] L. E. Dubins. On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. American Journal of Mathematics, 79:497-516, 1957. [2] J. A. Reeds and L. A. Shepp. Optimal paths for a car that goes both forwards and backwards. Pacific Journal of Mathematics, 145(2):367-393, 1990. [3] Thierry Fraichard, and Alexis Scheuer, From Reeds and Shepp’s to Continuous-Curvature Paths, IEEE Trans. on Robotics and Automation, Vol.20 No. 6, December 2004 [4] Yutaka Kanayama, and Norihisa Miyake, Trajectory Generation for Mobile Robots, Robotics Research, vol. 3, 333-340, MIT Press, 1986 [5] Scheuer, A., and Fraichard, Th., Collision-free and Continuous-Curvature Path Planning for Car-Like Robots. Proc. of 1997 IEEE Int.Conf.on Robotics and Automation, pp. 867-873, 1997 [6] T. Kito, J. Ota, et al, Smooth Path Planning by Using Visibility Graph-like Method. Proc. IEEE Int. Conf. on Robotics and Automation, pp.3770-3775, September 2003. [7] Weerakamhaeng Yossawee, Takashi Tsubouchi, Shigeru Sarata, and Shin’ichi Yuta., Path Generation for Articulated Steering Type Vehicle Usi ng Symmetrical Clothoid Proc. of 2002 IEEE Int. Conf. on Industrial Technology(ICIT’02), volume 1, pp.187-192, 2002. [8] Weerakamhaeng Yossawee, Takashi Tsubouchi, Masamitsu Kurisu, and Shigeru Sarata, ., A Semi-Optimal Path Generation Scheme for Frame Articulated Steering Type Vehicle submitted in July 2005 to International Journal of Advanced Robotics, Japan. [9] J.P. Laumond, P.E. Jacobs, M. Taïx and R.M. Murray., A Motion Planner for Nonholonomic Mobile Robots IEEE Transactions on Robotics and Automation 10(5), 577-593(1994). [10] J. Barraquand and J.C. Latombe., Nonholonomic multibody mobile robots: Controllability and motion planning in the presence of obstacles. Algorithmica, 10:121-155, 1993. [11] S.M. LaValle and J.J. Kuffner., Randomized kinodynamic planning. International Journal of Robotics Research, 20(5):378-400, May 2001. [12] Stephen Wolfram., The Mathematica Book, 5th edition, ISBN:1-57955-022-3.

622

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

623

624

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol observation points

border of sensing areas

rsen

virtual observation point

obstacle

ri rb

grid detect to be obstacle by sensor

ri

observation point

625

626

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

boundary of the working area uncover region

rb mobile robot

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

627

628

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

4

4 5

5

b

3

b

3 6

2

7

1 a

6

2

7

1 a

8

8

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

629

630

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

cyclic partition greedy

140

80 60 40 20

120 maximum traveling length [m]

100

100 80 60 40 20

0 b

c

simulation environments

d

100 80 60 40 20

0 a

cyclic partition greedy

140

120 maximum traveling length [m]

maximum traveling length [m]

cyclic partition greedy

140

120

0 a

b

c

simulation environments

d

a

b

c

simulation environments

d

C. Trevai et al. / Self-Organizing Planner for Multiple Mobile Robot Exploration and Patrol

631

632

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System using Communication Tomohisa Fujiki a,1 , Kuniaki Kawabata b and Hajime Asama c a School of Engineering, The University of Tokyo b RIKEN c RACE, The University of Tokyo Abstract. In multi-robot system, cooperation is needed to execute tasks efficiently. The purpose of this study is to realize cooperation among multiple robots using interactive communication. An important role of communication in multi-robot system is to make it possible to control other robots by intention transmission. We consider that multi-robot system can be more and more adaptive by treating communication as action. In this report, we adopt action adjustment function to achieve cooperation between robots. We also run some computer simulations of collision avoidance as an example of cooperative task, and discuss the results. Keywords. Q-learning, Multi-Robot System, Communication, Cooperation, Mobile Robot

1. INTRODUCTION In multi-robot systems, communication is thought as a necessary skill for robotsto cooperate, and a number of schemes have been proposed for it [1,2]. However, these studies may not be useful to adapt in a dynamic and complex environment as they set rules to communicate. To achieve cooperation effectively in such environments, we have to discuss the adaptable cooperation using communication. Yanco et al. tried to develop a method to acquire an adaptive communication for cooperation of two robots[3]. Billard et al. proposed a learning method of communication through imitation [4]. This is an interesting approach but the system needs a teacher robot. In these methods and most of robotics resesarch, the communication is treated as special function for the robotic systems. On the other hand, in developmental psychology, communication is considered as interaction between individuals [5]. Moreover, communication is the transmission of intention, and those who received have to comprehend the intention. In conventional studies on cooperation of robots based on communication as signal transmission, action is taken as a motion of its own body and they focused on decision making using sensory 1 Correspondence to: Tomohisa Fujiki, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, 277-8568, JAPAN . Tel.: +81-47-136-4260; Fax: +81-47-136-4242; E-mail: [email protected].

T. Fujiki et al. / Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System

633

information. Communication is signal transmission over wireless LAN or other devices, but it is not correct in developmental psychological sense. There should be a sort of protocol between robots to communicate, and the intention should be exchanged. Consequently transmitting one’s intention could be treated as an action and receiving other’s intention could be treated as perception in multi-robot system. By introducing this concept to their control architecture, robots can make an attempt to control other robots. This means that a robot can make an action over constraint of its own D.O.F.(bodyexpansion behavior), and multi-robot system can be more flexible and adaptable. In this study, we take in communication to robot’s model both as perception and action. It means to achieve cooperation between robots, not only robot’s own movement but also sending message to other robots is treated as an action. We have previously developed an action selection method [6] which treats communication as above, but there was a problem of how to adjust different type of actions; self generated action and a requested one by communication. It seems that most effective strategy for the whole system is to accept a request only when the situations for both robots seem to improve. In this paper, we propose an action adjustment function to achieve cooperation between mobile robots. We also have some computer simulations of collision avoidance as an example of cooperative task, and discuss the results.

2. ACTION SELECTION METHOD INCLUDING INTERACTIVE COMMUNICATION 2.1. Reinforcement Learning Reinforcement Learning(RL,[7] is widely used in robotic systems to emerge robots’ actions from the interaction between the environment. However, in multi-robot system, there is a possibility that the same action causes different state transition which misleads the learning. To avoid this problem, Q-Learning for Semi Markov Decision Process (SMDP, [8]) which can handle discrete time series is utilized generally. Q-Learning algorithm for SMDP is as follows. 1. Observe state st at time t in the environment. 2. Execute action at selected by action selection node. 3. Receive reward r and calculate the sum of discounted reward Rsum until its state changes. Rsum = rt + γrt+1 + γ 2 rt+2 + · · · + γ N −1 rt+N −1

(1)

Here, r is a discount factor (0 ≤ r ≤ 1). 4. Observe state st+N at time t + N after the state change. 5. Renew Q value by equation (2). Q(st , at ) ← (1 − α)Q(st , at ) + α[Rsum + γ N max Q(st+N , a )]  a

Here, α is a learning rate (0 ≤ α ≤ 1) and a is possible actions in state st+N . 6. Clear r. 7. Renew time step t to t + N , and return to 1.

(2)

634

T. Fujiki et al. / Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System

2.2. Basic Actions There are a variety of tasks considered as cooperative tasks, but in this paper, we are going to discuss collision avoidance problem of mobile robots. This is because that although there are a lot of the rule based schemes proposed for it, it can still see the effect of the communication and the expansion in D.O.F. directly. We suppose omni-directional mobile robots which are equipped with omnidirectional visual sensors. Considering communication as robots’ action, basic actions for robots are set as Table 1. Here, “Comminication” means intention transmission, which is a requesting action to other robot to make an asked action. This means that a robot can request any actions which the other robot can make. Robots acquire their state-action policy by RL. We also configure robot’s state space as Table 2. Numbers in Table 2 shows the number of state space for each domains. Example for the visual sensory information is shown in Figure 1. The size of the other robot on image plane is determined by the distance threshold. Both the direction of the other robot and the goal are devided into six state spaces. Direction of the wall has five state spaces, which are front, left, right, and the back of the robot, or, no walls. In Figure 1, the white wall is placed above the distace threshold, so only the grey wall is considered as an obstacle in robot’s state space. In this framework, a robot selects, evaluates and learns its action from sensory information and other robots’ intentions. Table 1. Actions of robot Move Own Body

Communication

- No changes in speed or direction

- No changes in speed or direction

- Speed down (2[mm/sec]) - Speed up (2[mm/sec])

- Speed down (2[mm/sec]) - Speed up (2[mm/sec])

- Change direction (+45 [deg]) - Change direction (-45 [deg])

- Change direction (+45 [deg]) - Change direction (-45 [deg])

2.3. Action Selection and Reward There are a lot of action selection models for Q-Learning like Max Selection or Random Selection. One of the methods to improve its adaptability gradually by RL is probabilistic Table 2. Configuration of state space Visual sensory information - Size of other robot on image plane - Direction of other robot - Direction of the goal - Wall direction inside the sensing area

2 6 6 4+1(none)

Communication - Other robot’s request

5+1(none)

Other Information - Own Speed Number of the State Space

2 4320

T. Fujiki et al. / Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System

635

distance threshold 600[mm]

front left

small large roobt on the image plane (distance threshold 600[mm])

wall

right

wall

back

direction of walls

goal

1 6

2

5

3

4

1

other robot

ex. direction of other robot = 2

direction of other robots

6

2

5

3

4

ex. direction of the goal = 2

direction of the goal

Figure 1. Visual Sensory Information

action selection using Boltzmann distribution (Boltzmann selection). It is used widely and is reported that probabilistic selection works better than deterministic policy in multiagent systems [9]. In Boltzmann Selection model, probability p(a | s) to make action a in state s is defined as equation (3). expQ(s,a)/T p(a | s) =  expQ(s,ai )/T

(3)

ai ∈A

Here, T is temperature constant. If T is near zero, action selection will be deterministic, and if T becomes large, action selection will be more random and will do aggressive search for state-action policy. Evaluation of the selected action is done by using the distance from the goal g(t) in time t. Reward rt is defined by equation (4). rt = μ(g(t) − g(t − Δt))

(4)

Here, μ is a weight value and represents effectiveness of the reward. Δt is cycle time for decision making.

3. ACTION ADJUSTMENT FUNCTION When communication is treated as an action for intention transmission, accepting all the requested actions will only to improve other robots’ situations. However, for the whole system, it is seems that most effective way is to accept the request only when the situations of both robots can be improved. To accept such requests, there is a need for action adjustment function to compare the actions which are self determined action and a requested one by communication. It makes the robots to create better situations, and will be able to cooperate efficiently. For this action adjustment, we introduce the algorithm which is illustrated in Figure 2. First, a robot decides whether to move itself or to make other robot move by communication. This is a selfish action selection which doesn’t consider the state of other robot.

636

T. Fujiki et al. / Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System

Of course there is a probability that the request will be refused, but whether to accept or reject the request is determined by the receiver. Next, a robot will determine which action to make; the selfish action that is decided at first step or a requested action by other robot. By those two steps, a robot can select an action considering a request from other robot. This adjustment algorithm can be utilized generally by giving numeric values for each actions. In this paper, we use Q-Learning algorithm for SMDP and the Q values from the RL are used as numeric values for two step action selection. The implemented algorithm for the robot is shown in Figure 3. Move Own Body

Move Other Robots (Communicate)

Execute Requested Action

Selfish Action Selection Total Action Selection

Selected Action

Figure 2. Action selection process

Observe State s

Calculate Reward r

No

State Changed?

Yes Renew Q value Selfish Action Selection Total Action Selection Execute Selected Action a

Figure 3. Algorithm for action selection including communication

4. COMPUTER SIMULATION OF COLLISION AVOIDANCE PROBLEM In this section, we are going to have some computer simulations to test our approach and discuss the results.

T. Fujiki et al. / Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System

637

4.1. Settings There are two omni-directional mobile robots in simulation field, and the task is collision avoidance as an example of cooperative task. To compare our approach to general approach of communication in robotic research field, we set three conditions. Case A is for our proposed approach and robots can use communication to move the other robot by intention transmission. In case B, robots can inform their adjacent action which they made by signal transmission. By this, we can eliminate an influence on the size of state space and robots in case B have same state number as in case A. Robots in case C cannot use communication. Common settings are as follows. Robot is cylinder shaped and its diameter is 300[mm]. Start position of each robot is set 500[mm] from longitudinal sides of the environment, symmetry in ups and downs. Goal position is the starting point of the other robot, and is set face to face in initial condition. Maximum speed of the robots is 40[mm/sec], and minimum is 10[mm/sec]. It assumes that robots can output their speed without a time lag. A trial is terminated under four conditions, which are the goal of the both robots, the collision of robots, the collision of either robot against walls or simulation area, or when the time step reached to 3000. The parameters for RL are set experimentally as Δt = 1.0[sec], μ = 0.1, α = 0.04, γ = 0.9 and T = 0.2. Reward for the robots are calculated by equation (4), but in case of any collisions, r = −5 is given as punishment value. 4.1.1. Simulation 1 In simulation 1, straightway environment in Figure 4 is utilized. Width 800 ≤ x ≤ 3000[mm] is changed by 100[mm] and computer simulation is run for four times in every situation, and the learning is episodic for each simulation. Maximum trial number is 30000 for every experiment. 4.1.2. Simulation 2 In simulation 2, crossroad environment in Figure 4 is utilized. Simulation area is 3000[mm] square, and the width of both roads are x[mm], which changes 600 ≤ x ≤ 3000[mm] by 200[mm]. Four black pieces in Figure 4 are walls (obstacles). Computer simulation is run for four times in every situation, and Maximum trial number is 100000 for every experiment. Learning is episodic for each simulation. All settings has the same distance for goal, to make it easy to compare the results. In simulation 2, when a robot moves, physical relationship against the walls change, and it affects robot’s state space. Consequently, robots’ state change frequently when x is small, and the problem will be much difficult compared to the same x in Simulation 1. 4.2. Results Figure 5 shows the number of trials for convergence. In this report, “convergence” means 100 continuous goals. Horizontal axis shows the width of the road x. Data on those graphs are the average of four trials. It shows some oscillation, but the aptitude can been comprehended.

638

T. Fujiki et al. / Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System

Figure 4. Over view of environment

Figure 5. Number of trials for convergence

4.2.1. Convergence Properties When the width x is large enough for robots and the problem can be solved easily, Case C achieves convergence faster than other cases. We believe that this occurs because the state space of Case C is one fifth of other cases and therefore it is easy to acquire the stateaction policy. The result of Case B shows large oscillation in both graphs. In this case, communication changes the state of the other robot, and it makes difficult to search stateaction policies. Communication as signal transmission doesn’t show its superiority in any case of our experiments. It only multiplies the number of states and prevents system from fast achieving of cooperation. Finally, Case A has superiority to other methods when x is small. This is the condition which the problem is hard to solve and is difficult to cooperate with others. Results show that our approach can solve the problem cooperatively even when the other approaches cannot solve it. It is a difficult situation for robots to cooperate without communication, and comparing Case A to Case B, our proposed system works better than usual usage of the communication such as information transmission. 4.2.2. Quality of the Solution Figure 6 shows the number of steps to converge, which shows the quality of the solution achieved by the system. Data on those graphs are the average of four trials. Although there are many spikes, Case A apts to generate better solutions than the other methods. We consider that intention transmission worked effectively by affectiong the other robots, only when the communication is needed. This result supports our approach that it is not only in the fastness in finding solutions but also in the quality of the solution.

T. Fujiki et al. / Adaptive Action Selection of Body Expansion Behavior in Multi-Robot System

639

Figure 6. Number of steps

5. CONCLUSIONS In this paper, we proposed a method to adjust different type of actions which include communication as intention transmission. By using this method, we enabled to treat communication as intention transmission action in multi-robot system and also examined its performance by computer simulations. The results show that our approach can find solution in difficult situation where cooperation is hardly achieved without communication, and is also excels in the quality of solution achieved by the system than ordinal way of communication or without using communication. In our future work, we will try our approach in more complex environments or other tasks.

References [1] Y. Arai, S. Suzuki, S. Kotosaka, H. Asama, H. Kaetsu and I. Endo: Collision Avoidance among Multiple Autonomous Mobile Robots using LOCISS (LOcally Communicable Infrared Sensory System), Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2091–2096, 1996. [2] N. Hutin, C. Pegard, E. Brassart, A Communication Strategy for Cooperative Robots, Proc. of IEEE/RSJ Intl.Conference on Inteligent Robots and Systems, pp. 114–119, 1998. [3] H. Yanco, L. A. Stein, An Adaptive Communication Protocol for Cooperating Mobile Robots, From Animals to Animats 2, pp. 478–485, 1993. [4] A. Billard, G. Hayes, Learning to Communicate Through Imitation in Autonomous Robots, Artificial Neural Networks - ICANN’97, pp. 763–768,1997. [5] Walburga Von Raffler-Engel (Editor): Aspects of Nonverbal Communication, Loyola Pr, 1979. [6] M. Hoshino, H. Asama, K. Kawabata, Y. Kunii and I.Endo: Communication Learning for Cooperation among Autonomous Robots, Proceedings of the IEEE International Conference on Industrial Electronics, Control & Instrumentation, pp. 2111–2116, 2000. [7] Richard S. Sutton and Andrew G. Barto: Reinforcement Learning:An Introduction (Adaptive Computation and Machine Learning), The MIT Press, 1998. [8] Steven J. Bradtke and Michael O. Duff: Reinforcement Learning Methods for ContinuousTime Markov Decision Problems, In G. Tesauro and D. Touretzky and T. Leen, editors, Advances in Neural Information Processing Systems, Vol.7, pp. 393–400, The MIT Press, 1995. [9] Satinder P. Singh and Tommi Jaakkola and Michael I. Jordan: Learning Without StateEstimation in Partially Observable Markovian Decision Processes, International Conference on Machine Learning, pp. 284–292, 1994.

640

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

A. D’Angelo and E. Pagello / From Mobility to Autopoiesis

641

642

A. D’Angelo and E. Pagello / From Mobility to Autopoiesis

A. D’Angelo and E. Pagello / From Mobility to Autopoiesis

643

644

A. D’Angelo and E. Pagello / From Mobility to Autopoiesis

Perc

δP

Kf

+

+

−Kb

E ff

Vds

A. D’Angelo and E. Pagello / From Mobility to Autopoiesis

645

A. D’Angelo and E. Pagello / From Mobility to Autopoiesis

10 8 6 distance from the object along y-axis

646

4 2 0 -2 -4 -6 -8 -10 -3

-2

-1

0

1

distance from the object along x-axis

2

3

A. D’Angelo and E. Pagello / From Mobility to Autopoiesis

4.5

4

3.5

direct gain Kf

3

2.5

2

1.5

1

0.5

0 -6

-4

-2

0 inverse gain Kb

2

4

6

647

This page intentionally left blank

Part 11 RoboCup

This page intentionally left blank

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

651

Cooperative Agent Behavior Based on Special Interaction Nets Oliver Zweigle 1 , Reinhard Lafrenz, Thorsten Buchheim, Uwe-Philipp K¨ appeler, Hamid Rajaie, Frank Schreiber and Paul Levi University of Stuttgart, Germany Abstract. An important aim of the current research effort in artificial intelligence and robotics is to achieve cooperative agent behavior for teams of robots in real-world scenarios. Especially in the RoboCup scenario, but also in other projects like Nexus, agents have to cooperate efficiently to reach certain goals. In the RoboCup project, cooperative team-play and team strategies similar to real world soccer are intended. This article describes an approach that combines cooperative aspects, role assignment algorithms and the implementation of robot behavior with Interaction Nets. Based on these single methods, a complete framework for team strategies was developed that is used in the RoboCup environment for the middle-size team CoPs Stuttgart and in the Nexus project, where a team of robots guides persons through a building. Keywords. Cooperative Robotics, RoboCup, Interaction Nets

1. Introduction Modeling and implementing group behavior in multi-agent systems, which are used to for multi-robot applications, is a complex and fault-prone task. Interdependent interactions between agents have to be considered in the individual behavior model for a well working coordination and synchronization of distributed agent actions. Especially for real-world applications of a team of autonomous mobile robots these problems become even more aggravated because situations can be interpreted in different ways by different agents. One reason for that is incomplete and fuzzy information provided by the local sensors of a robot. Therefor it is a very complex task to develop a distributed plan for coordinated group behavior. Such a complex system requires in addition to the implementation a lot of testing and tracing to identify potential problems, e.g., the occurrence of special events or deadlocks, which cannot be feasibly modeled in the team plan. In this paper, we present a framework for modeling distributed multi-agent plans by introducing XPlM Nets a special simplified form of hierarchical Interaction Nets. 1 Correspondence

to: O. Zweigle, Univerit¨ at Stuttgart, IPVS, Universit¨ atsstr. 38, 70569 Stuttgart, Germany Tel.: +49 711 7816 421; Fax: +49 711 7816 250; E-mail: [email protected].

652

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

Interaction Nets [1] are a formalism based on a specialized form of Predicate/Transition Nets [3]. The approach of Interaction Nets was introduced in [4]. In [7] they were modified to describe cooperation processes in a multi-agent environment. As this approach was too complex for the RoboCup scenario it was simplified. Motivated by the description of several team cooperation strategies in [11], we used the Shared Plans Theory as the basic idea for cooperation between the agents. The Joint Intensions Theory is currently in our opinion not useful in RoboCup, as it assumes that knowledge about the team-mates is always available. Due to instable communication in RoboCup this theory isn’t used at the moment, but it is promising for future work, using for example visual tracking [10] instead of communication. A behavior-based approach to control robots on a higher level of complexity was introduced in [12]. The attempt to use basic behaviors to reach a more abstract level of designing tactics in RoboCup provided a basis for our current work. In our implementation we have two levels of abstraction in the representation of cooperative behavior: First, modeling of the behavior in form of XPlM Nets, simplified hierarchical Interaction Nets, and secondly the execution of decision trees specified in the XML-dialect XABSL (eXtensible Agent Behavior Specification Language) [6], a specification language for agent behavior which can be directly executed by a execution engine. For easy specification of distributed behaviors, we developed the graphical editor XPlM for the XPlM Nets that generates XABSL output. The rest of the paper is organized as follows: Section 2 describes the distributed world model and the tactical role concept, the fundamentals of cooperative behavior. In section 3, Interaction Nets, the specialized XPlM Nets and the graphical modeling tool are introduced. Section 4 explains the modeling of a pass-play in robotic soccer as a complex application example. Finally, section 5 summarizes and gives an outlook.

2. Cooperative System Overview As a basis for cooperation, knowledge about the environment and the objects within is required. In complex robotic systems, this knowledge comprises environmental information locally derived from sensor data, local state information, e.g., the current role or behavior, and information transmitted by other members of the robot team. To allow for uncomplex handling of this data, we use a world model, which incorporates mechanisms for multi-thread save data representation, data processing and communication [2]. 2.1. Decentralized World Model Each robot holds several data containers, one for the locally sensed or derived data and one for each team-mate’s data, but only a (small) part of locally available information of a robot is communicated to others. The information of all robots is used to estimate abstract world states, which describe application-relevant features of the environment. In the RoboCup scenario, geometrical relations and

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

653

abstract states such as weHaveBall or iAmAttacker are used to trigger a specific behavior and are called world model flags. Due to autonomy of the robots, each one makes its decisions locally, considering all available information, including communicated status information, which can be used to model cooperative behaviors. This approach implements a cooperative system with a high degree of fault-tolerance. 2.2. Basics for Team-Play Modeling The subsystem for team play is divided into two layers. One layer, the so-called WorldModelAdapter, is derived from the world model. The WorldModelAdapter is responsible for processing the data and providing results in a format suitable for the second layer. Due to fault-tolerance, the WorldModelAdapter runs locally on each robot. The input data are local sensor readings that are processed and information communicated by other robots of the team. This way, characteristics of the current game situation are derived from a merged view of the distributed system of robotic agents. The results provided by the WorldModelAdapter are data sets, describing for example the ball position, logical information about ball possession (teamHasBall), the positions of other robots or the most recent referee event. These flags are used together with the current tactical roles of the robot to switch between behaviors of the agents. A behavior is a state of an agent in which certain drive commands are executed. The second, higher layer constitutes the control of the behaviors of an agent. This behavior control is executed in a XPlM Net (see section3). 2.3. Role Assignment Since all robots communicate with each other, it is possible to assign tactical roles and subroles to the different robot players. Every robot has a tactical role and a unique tactical subrole during the match. Although the tactical roles and subroles are assigned dynamically, a robot’s tactical role and subrole will not change unless other robots fail. We distinguish between two main tactical roles: Defender and Forward. Defender and Forward are redivided into subroles, such as Left Defender or Front Forward. This system is adopted from human soccer and can be easily extended to other main tactical roles and subroles, e.g., Midfielder. It was one aim to switch between different strategies for the team lineup like: Defender - Defender - Defender - Forward - Forward Defender - Defender - Forward - Forward - Forward At present, the Midfielder role is not used due to the limited field dimensions. According to the RoboCup rules we only change the team lineup before the match and during the halftime break at the moment. On each level of the tactical role and subrole hierarchy the different tactical subroles are designed in such a way that they do not conflict with each other and define complementary behavior, e.g., covering and defending the own goal. The advantage of the chosen tactical role hierarchy approach is a better organization of the team coordinating mechanisms.

654

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

In near future, we will try to recognize the game situation automatically and adapt the adequate tactical roles and the role behavior in a self-organized way [9]. The implementation uses two ordered lists. One list (the roleList) contains the dedicated tactical roles. The other list (robot list) contains the available robots. Robots that break down during a match are automatically removed from the second list. The tactical roles are assigned to the robots one-on-one depending on the order defined in the role list. Tactical subroles are assigned depending on the number of robots that occupy a tactical role. Every robot calculates its tactical role and subrole locally, based on a common algorithm. In addition, there are special roles that weight more heavily than the tactical roles. If a special role is assigned to a robot it will keep its tactical role and subrole, but it will execute the behavior of the the special role. Special roles are for example Keeper, Attacker, Pass Player or Pass Receiver.

3. XPlM Nets - A Simplified Version of Interaction Nets To achieve cooperative performance, we use a simplified version of Interaction Nets to map behaviors to robotic agents. This mapping process is based on the knowledge represented in the world model as described above. Because all robots in the cooperative environment have the same capabilities, the basic behaviors and therefore the decision making mechanism to switch between these behaviors should be identical for all robots of the team1 . In the context of autonomous mobile systems with unreliable communication, a centralized approach with only one deciding instance is not applicable. Hence, each agent has to make individual decisions about the current situation and the state of its team mates based on its local belief state. This local belief state is enhanced by communication, but decisions must also be taken in absence of a working inter-team communication. To achieve this, we use decision networks based on simplified Interaction Nets and call them XPlM Nets. Interaction Nets are a formalism to describe agent interactions in a multi-agent scenario, Interaction Nets show a more centralized approach than the underlying Predicate-Transition-Nets introduced in [3]. They include the possibility to run several agents (tokens) in one common net. In contrast, our XPlM Net approach allows only one agent to run through a commonly known net, which is replicated on every agent and executed locally. 3.1. XPlM Nets - Phases and Transitions A XPlM Net N et = {P h, T rans, Arc, T ok} consists of Phases, Transitions, Arcs from phases to transitions and from transitions to phases, and one Token to indicate the current phase. The phases P h = {P h1 , . . . , P hn } are equivalent to places in Petri-Nets and represent states of the agent, but imply the execution of a basic behavior. Such a basic behavior is an abstract driving command like getBall or dribble. See Fig. 1 for an example of a simple XPlM Net. Transitions T rans : P h → P h define preconditions and postconditions of a phase. To switch from P hn to P hn+1 , P hn must hold the token and the transition 1 Here,

we do not consider special roles with different behavior patterns, e.g., keeper.

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

655

IHaveBall

dribble

getBall

ballIntercepted

readyToShoot

ballLost

Figure 1. Simple XPlM Net

conditions between the two phases must be satisfied. Fig. 2 shows an example of transition conditions.

Figure 2. Transition conditions in the XPlM Net editor

3.2. Modularity Modeling the behavior of robot agents can reach a high level of complexity with a large number of phases and transitions. To keep the system manageable, hierarchical nets are introduced, where subnets SN ⊆ N et are handled like common phases. Such a subnet has a transition for its preconditions and one for its postconditions. Regardless of which phase is reached in the subnet, the execution of a subnet is immediately finished if the postcondition is valid. To leave a subnet after a certain process in the subnet was executed a starting and ending phase can be defined. As soon as the end phase is reached the execution of the the subnet is finished and the process continues on the level above with the post condition of the subnet. 3.3. Modeling Interactions Between Agents Although every agent runs a local copy of the commonly known net, cooperation between the agents is required. To implement cooperative behavior every agent gets a unique tactical subrole. Based on that subrole, different phases with different behaviors are triggered in the net. Every agent can get information about its team-mates with the help of the world model flags. With the help of that knowledge the agent can coordinate its actions with the actions of its team-mates. To map a behavior to an agent, in our implementation a subsystem consisting of two layers is used. The lower layer for agent controlling uses XABSL [5], a

656

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

XML-based behavior specification language for autonomous robots, which was originally developed for the 4-legged AIBO league in RoboCup. The XABSL files are compiled into an intermediate code to be executed by a runtime system (XABSL-engine). Writing these files manually is complex and error-prone, hence we developed a graphical modeling tool to design hierarchical decision networks, the so called XPlM editor (Fig. 3). This higher layer generates XABSL output automatically.

Figure 3. XPlM Net Editor: Editor with opened subnet showing the handling of certain referee events in RoboCup

4. Application The application of the modeling framework is currently used in two applications, the RoboCup team CoPs of the University of Stuttgart and in the Nexus project [8] for a distributed robot architecture in which multiple robots are used to guide persons through a building. In RoboCup, the aim was the modeling of an intelligent player model for coordinated and cooperative team play. There is one global multi-agent plan for the whole team that is executed on each robot player individually and designed according to real life soccer. According to a player’s tactical role the multi-agent plan specifies a behavior sequence to be executed depending on the current situation of the game and the team mates. The definition of the multi-agent plan is slightly connected to

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

657

real soccer strategies. On the main level we distinguish between Referee Events, such as Throw-In or Kick-Off, and the running game flow. The Referee Events subnet assigns unique tactical positions tuned to the current situation for every robot. Every referee event is handled uniquely in an own subnet. Furthermore, there are different strategies for some referee events like Goal-Kick or ThrowIn, depending on the current situation. The further development will include a learning algorithm to choose the right referee event strategy. During the running game we distinguish between two strategies, offensive play and defensive play. Each strategy defines a proper set of behaviors according to the situation. The advantage of such an approach is that each role can make use of different behaviors depending on the current situation in the match, which results in a situation sensitive architecture. This architecture can be used for many different player movement strategies that are modeled on real soccer movements. The Offensive and Defensive subnets are subdivided into further subnets that handle the behavior for the different tactical roles and subroles. In the subnets for tactical subroles, the different behaviors are implemented. An example for such a behavior is the pass play scenario described in the following section. 4.1. Pass-play

Figure 4. Pass Play Net

Figure 4 shows an example of a XPlM Net for a pass play implementation in form of a 2-agent interaction. Here, a pass is done between two players with a fixed position of the pass receiver. This XPlM Net requires the participation of two agents, one taking the special role of a pass player and another the special role of pass receiver.

658

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

The two participating agents start in the initial phase decideRole. Here a decision is made on which player takes which role in the pass play interaction. The function iAmNearestTo, which calculates the current position of the robot itself compared to the position of the team mates relative to the distance to a given arrival point, is part of a set of special world model functions 2.1. They are triggered regularly to gain information about the current status of the robot in connection with its team mates. The robot agent that has the closest position to the ball takes the special role pass player. The agent that has the closest position to the defined pass target position takes the special role pass receiver. If the distance of both players to one or both targets is similar a unique robot-id is considered to assign the special roles. The pass player approaches the ball oriented in the direction of the pass target. The pass receiver drives towards the pass target position. As soon as the pass player has reached the required position relative to the ball an action synchronization between the two agents takes place. Only if the pass receiver moves into a range of tolerance around the target position the execution of the pass play is continued. The position of the team mate is derived from communication over the world model. The pass receiving agent changes to the behavior interceptBall as soon as it reaches the pass target position. The behavior interceptBall lets the agent move to a position parallel to the goal line to intercept the pass. As soon as the pass receiver agent got the ball or the ball stops in a position very close to the pass receiver, it tries to shoot a goal directly. The introduced interaction contains only three transitions that result in an action coordination. The input transition of the behavior decideRole results in a complementary role assignment of the cooperating agents. The transition canPass results in a time-synchronization of the action phase. For clarity of the example not all exceptional cases are presented. The whole implementation was successfully tested in a simulation environment as well as on real robots at RoboCup 2005 in Osaka where an modified version was used to do a short pass after the referee events Throw-In and Kick-Off to avoid that the executing robot is shooting the ball directly into the goal.

5. Conclusion and Outlook We have presented a framework for modeling and executing cooperative behavior for teams of robots, based on newly introduced XPlM nets(simplified interaction nets) in combination with XABSL. We are currently using and evaluating the framework in RoboCup. In addition the approach will be used in an application for the Nexus project [8] that has the aim to develop global distributed world models. As a part of the project certain applications have to use these world models. One application will be the use of a set of robot agents that are navigating as visitor guides through a building. Here a high degree of cooperative agent behavior is necessary to coordinate the tasks of the different agents. Furthermore the presented framework will be enhanced for the use in the RoboCup project. To achieve a higher degree of autonomy for a team of cooperative robots, we are developing mechanisms for self-organized role decision. The work on a component for the analysis of the game situation is also in progress. Based on that analysis

O. Zweigle et al. / Cooperative Agent Behavior Based on Special Interaction Nets

659

the system should be able to decide autonomously which strategies are used. At the moment a certain team strategy has to be defined before the match or in the halftime break. With the help of an autonomous strategy decision component the team could react faster on the events during a match. As a part of that work we evaluate learning algorithms that allow the robots to decide about situationdependent strategies.

References [1] M. Becht, Muscholl K. M., and Paul Levi. Transformable multi-agent systems: A specification language for cooperation processes. In World Automation Congress (WAC), Sixth International Symposium on Manufacturing with Applications (ISOMA), 1998. [2] T. Buchheim, G. Kindermann, R. Lafrenz, and P. Levi. A dynamic environment modelling framework for selective attention. In Visser et al., editor, Proceedings of the IJCAI-03 Workshop on Issues in Designing Physical Agents for Dynamic Real-Time Environments, 2003. [3] H. J. Genrich and K. Lautenbach. System Modelling with High-Level Petri Nets. 1981. [4] Y. Lafont. From proof nets to interaction nets. In J.-Y. Girard, Y. Lafont, and L. Regnier, editors, Advances in Linear Logic, pages 225–247. Cambridge University Press, 1995. ungel. Designing agent behavior [5] M. L¨ otzsch, J. Bach, H.-D. Burkhard, and M. J¨ with the extensible agent behavior specification language XABSL. In Daniel Polani, Brett Browning, and Andrea Bonarini, editors, RoboCup 2003: Robot Soccer World Cup VII, volume 3020 of Lecture Notes in Artificial Intelligence, pages 114–124, Padova, Italy, 2004. Springer. [6] M. L¨ otzsch, J. Bach, H.-D. Burkhard, and M. Jngel. Designing agent behavior with the extensible agent behavior specification language xabsl. In 7.th international workshop on RoboCup 2003, 2003. [7] K.M. Muscholl. Interaktion und Koordination in Multiagentensystemen. Dissertation, Universit¨ at Stuttgart, 2000. [8] K. Rothermel, D. Fritsch, B. Mitschang, P. J. K¨ uhn, M. Bauer, C. Becker, C. Hauser, D. Nicklas, and S. Volz. SFB 627: Umgebungsmodelle f¨ ur mobile kontextbezogene Systeme. Proceedings Informatik, Innovatice Informatikanwendungen Band 1, pages 103–116, 2003. [9] M. Schanz, J. Starke, R. Lafrenz, O. Zweigle, M. Oubbati, H. Rajaie, F. Schreiber, T. Buchheim, U.-P. K¨ appeler, and P. Levi. Dynamic Task Assignment in a Team of Agents. In P. Levi et al., editor, Autonome Mobile Systeme, pages 11–18. Springer, 2005. [10] T. Schmitt, M. Beetz, R. Hanek, and S. Buck. Watch their moves: applying probabilistic multiple object tracking to autonomous robot soccer. In Eighteenth national conference on Artificial intelligence, pages 599–604, 2002. [11] M. Tambe. Towards flexible teamwork. Journal of Artificial Intelligence Research, 7:83–124, 1997. [12] Barry Brian Werger. Cooperation without deliberation: A minimal behavior-based approach to multi-robot teams. Artificial Intelligence, 110(2):293–320, 1999.

660

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

a,1 a b c

b

c

H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain

661

662

H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain

Lua Script Scripts can already access the available methods of the modules. Concrete Modules

MindModule

CameraModule

ActionModule

SensorModule1

SensorModule2

We can plug the modules that we want to use into the base program. Abstract Modules

JPMindModule

JPCameraModule JPActionModule JPSensorModule JPUDPModule

The occured events is allotted to the modules. Core Program OPEN-R Base Program

JPTCPModule

H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain

663

664

H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain

H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain

665

666

H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain

H. Kobayashi et al. / A Framework for Advanced Robot Programming in the RoboCup Domain

o¨ e´

667

668

Intelligent Autonomous Systems 9 T. Arai et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.

Major behavior definition of football agents through XML José Luis VEGA, Ma. de los Ángeles JUNCO, Jorge RAMÍREZ Instituto Tecnológico y de Estudios Superiores de Monterrey

Abstract. Implementing behaviors on robots or computational agents is a task that implies specific knowledge about the basic robot programming language. In our work we propose an approach that separates low level robot programming from behavior definition, which allows an accelerated development of both tasks. The behavior models that we use are hierarchical state machines in which every state is related with an action that can be a basic action or a complex behavior defined in XML. Transitions between states are triggered by events perceived by the agent. A simulator was built to test this approach and to develop football behaviors. This simulator accepts agents with different characteristics and also defines an environment similar to the actual field. It shows transitions, active states in the internal state machines of the agent and the events perceived. With this tool the main players of a football team (goalkeeper, defense and attacker) have been generated easily with little knowledge of programming and mainly focusing on strategies.

Introduction Robot programming is a difficult task for a person without robotics or programming skills. Even for expert programmers changing code in native language is hard and error prone. We want to avoid these inconveniences so any person can define complex task on robots. To make this possible the system must allow defining agent behaviors in an easy and intuitive way without needing any knowledge on the basic implementation of actions and perceptions [1]. The Sony Four-Legged Robot League players of the RoboCup league [2] are a particular case in which a team needs to define in a fast and easy way strategies. In our team we are looking for a platform that is simple, versatile, and capable of code reuse. In addition we wish a hierarchical and well defined structure that can be adapted easily to the developments of other modules as perception or locomotion, and independent enough so we can reuse it in future research on robotics and multi-agent systems. To fasten the development of strategies a controlled or simulation environment is built like in [3], in it the strategies can be tested and improved quickly. One of the most used techniques in defining behaviors are the Finite State Machines (FSM), in spite of their limitations the fast and always predictable response makes it a good option in defining robot behaviors. Using an intuitive language as XML (Extensive Markup Language) for modeling Finite State Machines we can hide lots of programming details and facilitate the creation of graphic tools that even a nonprogrammer user can understand. In addition we propose a hierarchical structure in

J.L. Vega et al. / Major Behavior Definition of Football Agents Through XML

669

which the behaviors can be easily reused and evolved with the characteristics of the robot. Parallel work between separate modules can be achieved and this system can also be used as an alternative in defining complex tasks in a generic way, just implementing the XML engine in the specific platform of the agent.

1. Defining Behaviors in Agents There is a lot of work related with the problem of defining behaviors in agents. There are well studied techniques that are very efficient in certain situations. RoboCup research, which focuses mainly on the robotics leagues, is very particular because of the environment and robot characteristics. Teams using decision systems based on planning must contemplate that plans need to be generated in real time. The environment is so dynamic that a plan generated lately could be inadequate or maybe harmful in pursuing the global team goals. Other complication is that although the plan has been generated on time the execution of this plan in the real world can arouse unexpected results, therefore systems capable of executing multiple parallel plans, plans with conditional cycles, timing in actions and fail planning systems have been developed. Some teams have decided not to use planning but instead they have been using fast and reactive systems, for example Finite State Machines, decision trees [4] or behavior based robotics [5]. In past works like Rudomín and Millán [6], XML scripts are used for defining low level animations and they include in the same scripts a way in which the agent can interact with the environment. As a result they obtain a simplified way to assign an agent or groups of agents a particular behavior. XML agents are associated with a “type of agent” which depending on the perception of the agent can switch to other behaviors. The “type of agent” consists of a set of states that are associated to atomic actions which are executed in a sequential manner. In each state there are transitions to other states that validate some condition and, if evaluated true, they switch the running state. Other useful features are mechanisms for defining variables and assigning values to them on execution time and, also, include them in a condition expression part of a transition. In its champion code of 2004, the “German Team” of the Four-Legged RoboCup league, uses the language XABSL [8] based on XML. All the football strategies are modeled using this language. Robot behaviors are defined through a modular state machine system in which each module is named “option”. These options are written in XML archives. This language has many advantages: it is flexible, modular and hierarchical. The decision tree is based on if else statements and the conditions are related to a big number of variables that the user must manage in order to program properly. With these tools and this behavior modeling, the team has achieved the integration of a large number of options. These flexibility allows a lot of people working specifically in strategy without necessarily knowing the basic details of the implementation in C++. Graphic documentation is another fancy characteristic using DOTML [9]. Another highly used method to consider is behavior based robotics [5], the disadvantages are that football is a game in which strategy is a key point, so hybrids methods can be used like in [7], but searching always an equilibrium between complexity and usability of the whole system.

670

J.L. Vega et al. / Major Behavior Definition of Football Agents Through XML

2. Defining behaviors through XML We developed a platform for defining robot behaviors using state machines modeled with XML as in [8]. With this any user can define a complex task even without knowing the basic programming details. These facilitate the creation of graphic interfaces even more intuitive for an inexperienced computer or robotics user. As any state machine it responses very fast to environment changes. We propose a hierarchical structure so it allows at top levels defining more complex strategies. The behaviors modeled with XML prevent programming the strategies directly in the native language of the robot (like C++ for AIBO), we only need one implementation of a generic state machine which can run any state machine defined in XML avoiding a compilation stage with every strategy change. The XML document has many advantages like: a well standardized description, easy and intuitive writing, portability and a lot of design tools available. With this module users can work directly with top level strategies encapsulating programming details. A user programming an agent only needs to know the set of permitted actions as well as all the events an agent can perceive in order to program the robot. The module allows the incorporation of new functionality, as actions or events with little changes in the source code, so it can grow as fast as other modules like vision or locomotion. The following section will explain how the XML behavior definition document is compounded. State definition States are the basic execution unit. All of these have an associated action which can be a basic action or an action defined as well by a state machine. They can be iterative or not. An iterative state causes an action to be executed repeatedly as long as the state is active. An example could be “turn till the ball is found” that will cause the robot to turn until a ball is found. A non-iterative state executes the associated action once and then waits for an event that switches the active state. Figure 1 shows an example of an state definition in XML.