This book of the SPAR series contains 39 scientific articles presented at the 16th International Symposium on Distributed Autonomous Robotic Systems (DARS 2022).
Language: English · Pages: 576 · Year: 2024
Table of contents:
Foreword
Preface
Contents
How Can We Understand Multi-Robot Systems? A User Study to Compare Implicit and Explicit Communication Modalities
1 Introduction
2 Methods
2.1 Communication Modalities
2.2 Experimental Task
2.3 Experimental Setup
2.4 Participants
2.5 Data Collection and Analysis
3 Results
3.1 Correctness of Responses
3.2 Response Time
3.3 Subjective Reporting
4 Conclusion
References
The Benefits of Interaction Constraints in Distributed Autonomous Systems
1 Introduction
2 Collective Learning in Multi-agent Systems
3 A Three-Valued Model for Collective Learning
4 Simulation Environment
5 The Networks Underlying Agent Interactions
6 Multi-agent Simulation Experiments
6.1 Convergence Results for Physical Networks
7 Constraining the Interaction Network
7.1 Convergence Results for Constrained Interaction Networks
8 Conclusion and Future Work
References
Outlining the Design Space of eXplainable Swarm (xSwarm): Experts' Perspective
1 Introduction
2 Background and Related Work
3 Methodology
3.1 Participants
3.2 Study Procedure and Design
3.3 Analysis
4 Results
4.1 Explanation Categories
4.2 Challenges Faced by Swarm Experts
5 Conclusion
References
VMAS: A Vectorized Multi-agent Simulator for Collective Robot Learning
1 Introduction
2 Related Work
3 The VMAS Platform
4 Multi-robot Scenarios
5 Comparison with MPE
6 Experiments and Benchmarks
7 Conclusion
References
FLAM: Fault Localization and Mapping
1 Introduction
2 Related Work and Background
2.1 Fault Detection
2.2 Risk Awareness
2.3 Distributed Storage
3 System Model
3.1 Risk Modelling
3.2 Feature Vectors
3.3 Fault Detection
3.4 Distributed Belief Map
3.5 Implementation
4 Simulations
4.1 Experimental Setup
4.2 Results
5 Conclusion
References
Social Exploration in Robot Swarms
1 Introduction
2 Related Work
3 Methodology
3.1 Simulation and Local Swarm Behaviours
3.2 Generating Random Environments
3.3 Agents with "Happiness"
3.4 Social Exploration Algorithm
4 Results
4.1 Performance in Randomly Generated Environments
5 Conclusions and Future Work
References
Stochastic Nonlinear Ensemble Modeling and Control for Robot Team Environmental Monitoring
1 Introduction
2 Problem Formulation
2.1 Task Topology
2.2 Ensemble Model
3 Ensemble Model Control
3.1 Feedback Controller
3.2 Distributed Microscopic Algorithm
4 Simulation and Experimental Setup
5 Results
6 Discussion
7 Conclusion
References
A Decentralized Cooperative Approach to Gentle Human Transportation with Mobile Robots Based on Tactile Feedback
1 Introduction
2 Methodology
2.1 Flexible Tactile Sensors and Methods of Acquiring Force Information
2.2 Design of Control Law
3 Modeling and Simulation
3.1 Modeling
3.2 Simulation
4 Experiments
4.1 Experiment Setup
4.2 Experiment Result
5 Conclusion and Future Work
References
Sparse Sensing in Ergodic Optimization
1 Introduction
2 Prior Work
3 Sparse Sensing in Ergodic Optimization
3.1 Sparse Ergodic Optimization
3.2 Multi-agent Sparse Ergodic Optimization
4 Results and Discussion
4.1 Experimental Details
4.2 Single Agent Sensing Distribution
4.3 Multi-agent Sparse Ergodic Optimization
5 Conclusion
References
Distributed Multi-robot Tracking of Unknown Clustered Targets with Noisy Measurements
1 Introduction
2 Problem Formulation
2.1 Lloyd's Algorithm
3 Distributed Multi-target Tracking
3.1 Instantaneous State Estimation
3.2 Cumulative State Estimation
3.3 Environment Approximation
3.4 Distributed Control
3.5 Algorithm Outline
4 Simulations
4.1 Qualitative Comparison
4.2 Quantitative Comparison
5 Conclusions
References
A Force-Mediated Controller for Cooperative Object Manipulation with Independent Autonomous Robots
1 Background
2 Related Work
3 Challenges and Assumptions
4 Control Methodology
4.1 Closed Loop Control Dynamics
4.2 Inertial Estimation and Error Compensation
4.3 Controller Stability
4.4 Task-Frame Switching
5 Experimental Validation
5.1 Implementation in Hardware
5.2 Implementation in Simulation
6 Results
6.1 Controller Performance Within Static Task Domain
6.2 Controller Performance with a Changing Task Domain
6.3 Multi-robot Performance with Changing Task Domain
7 Discussion
References
A Distributed Architecture for Onboard Tightly-Coupled Estimation and Predictive Control of Micro Aerial Vehicle Formations
1 Introduction
2 Problem Formulation
3 Methodology
3.1 Follower Estimation Scheme
3.2 Follower Controller Scheme
4 Experiments and Results
4.1 Formation Navigation
4.2 Formation Reconfiguration
4.3 Discussion
5 Conclusion
References
Search Space Illumination of Robot Swarm Parameters for Trustworthy Interaction
1 Introduction
2 Methodology
2.1 Scenario
2.2 Swarm Metrics
2.3 MAP-Elites Illumination Algorithm Methodology
3 Results
3.1 Qualitative Analysis: Case Study
4 Conclusion and Future Work
References
Collective Gradient Following with Sensory Heterogeneous UAV Swarm
1 Introduction
2 Methodology
3 Experimental Setup
4 Results and Discussion
5 Conclusion
References
DAN: Decentralized Attention-Based Neural Network for the MinMax Multiple Traveling Salesman Problem
1 Introduction
2 Prior Works
3 Problem Formulation
4 mTSP as an RL Problem
5 DAN: Decentralized Attention-Based Network
6 Training
7 Experiments
7.1 Results
7.2 Discussion
8 Conclusion
References
Receding Horizon Control on the Broadcast of Information in Stochastic Networks
1 Introduction
2 Background and Problem Formulation
2.1 Graph Theory
2.2 Information Propagation Model
3 Methodology
3.1 Control Strategy
3.2 Expected Time to Broadcast Information
3.3 Robust Moment Closure
4 Numerical Evaluation and Discussions
5 Conclusions and Future Work
References
Adaptation Strategy for a Distributed Autonomous UAV Formation in Case of Aircraft Loss
1 Introduction
2 Adaptation Algorithm for a Decentralized UAV Formation
2.1 Distributed Autonomous System Model
2.2 Adaptation Strategy for a Distributed Autonomous System
3 Simulation Results and Discussion
4 Conclusions
References
DGORL: Distributed Graph Optimization Based Relative Localization of Multi-robot Systems
1 Introduction
2 Related Work
3 Problem Formulation and the Proposed DGORL Solution
3.1 Graph Formation
3.2 Expansion Through Transition
3.3 Optimization
4 Theoretical Analysis
5 Simulation Experiments and Results
6 Conclusion
References
Characterization of the Design Space of Collective Braitenberg Vehicles
1 Introduction
1.1 Related Work
2 Model and Simulation
2.1 Agent Model
2.2 World
2.3 Metrics
3 Emergent Behaviors
3.1 Behavior Characterization
3.2 General Observations
3.3 Love
3.4 Aggression
3.5 Fear
3.6 Curiosity
4 Conclusion and Outlook
References
Decision-Making Among Bounded Rational Agents
1 Introduction
2 Related Work
3 Problem Formulation
3.1 Multi-agent Markov Decision Process
3.2 Iterative Best Response for MMDPs
4 Methodology
4.1 Bounded Rational Agents in Game-Theoretic Framework
4.2 Importance Sampling for Computing Bounded-Rational Strategies
5 Simulated Experiments
5.1 Simulation Setup
5.2 Results
6 Physical Experiments
7 Conclusion
References
Distributed Multi-robot Information Gathering Using Path-Based Sensors in Entropy-Weighted Voronoi Regions
1 Introduction
2 Problem Formulation
3 Distributed Information Gathering
3.1 Distributed Entropy Voronoi Partition and Planner (DEVPP)
3.2 Multi-agent Distributed Information-Theoretic Planner (MA-DITP)
3.3 Multi-agent Global Information-Theoretic Planner (MA-GITP)
4 Experiments and Results
5 Conclusion
References
Distributed Multiple Hypothesis Tracker for Mobile Sensor Networks
1 Introduction
2 Background
2.1 MHT Definition
2.2 Lloyd's Algorithm
3 Distributed Multiple Hypothesis Tracker
3.1 Assumptions
3.2 Track Exchange and Fusion
3.3 Importance Weighting Function
4 Results
4.1 Performance Metric
4.2 Stationary Targets
4.3 Dynamic Targets
4.4 Discussion
5 Conclusion
References
Distributed Multirobot Control for Non-cooperative Herding
1 Introduction
2 Prior Work
3 Problem Formulation
4 Controller Design
4.1 Approach 1: One Dog to One Sheep Allocation Based Approach
4.2 Approach 2: Iterative Distributed Reformulation of (12)
5 Results
5.1 Simulation Results
5.2 Robot Experiments
6 Conclusions
7 Appendix: Proof of Feasibility for Approach 1
References
On Limited-Range Coverage Control for Large-Scale Teams of Aerial Drones: Deployment and Study
1 Introduction
2 Methods
2.1 Coverage Control Strategy
2.2 Simulation Protocol
2.3 Field Test Protocol
2.4 Metrics
3 Results and Discussion
3.1 Simulation Results
3.2 Results of Field Tests
4 Conclusions
References
MARLAS: Multi Agent Reinforcement Learning for Cooperated Adaptive Sampling
1 Introduction
2 Problem Formulation
3 Approach
4 Experiments and Results
4.1 Analyzing Scalability of Learned Policy
4.2 Analyzing Robustness of the Learned Policy
4.3 Online Adaptations to the Changing Field
4.4 Robot Experiments
5 Conclusion and Future Work
References
Proportional Control for Stochastic Regulation on Allocation of Multi-robots
1 Introduction
2 Problem Formulation
2.1 Graph Theory
2.2 Ensemble Model
3 Methodology
3.1 Control Strategy
3.2 Control Analysis
4 Results
4.1 Numerical Simulations
4.2 Experimental Results
5 Discussion
6 Conclusions and Future Work
References
Comparing Stochastic Optimization Methods for Multi-robot, Multi-target Tracking
1 Introduction
1.1 Related Work
1.2 Contributions
2 Problem Formulation
2.1 PHD Filter
2.2 Lloyd's Algorithm
2.3 Distributed PHD Filter
2.4 Assumptions
3 Distributed Control
3.1 Simulated Annealing
3.2 Particle Swarm Optimization
3.3 Communication and Complexity
4 Experimental Results
4.1 Evaluation Metric
4.2 Stationary Targets
4.3 Dynamic Targets
5 Conclusion
References
Multi-agent Deep Reinforcement Learning for Countering Uncrewed Aerial Systems
1 Introduction
1.1 Related Work
2 Problem Formulation
3 Multi-agent Deep Reinforcement Learning for C-UAS
3.1 Proximal Policy Optimization Based Advantage Actor Critic
3.2 Parameter Sharing
4 Simulation Results
5 Conclusion
References
Decentralized Risk-Aware Tracking of Multiple Targets
1 Introduction
2 Related Work
3 Problem Formulation
3.1 Notation
3.2 Robot and Target Modeling
3.3 Problem Definition
4 Approach
4.1 Decentralized Goal Position Generation
4.2 Decentralized Motion Synthesis
4.3 Decentralized Target Position Estimation
5 Experiments
5.1 Decentralized Tracking
5.2 Comparison to Centralized Target Tracking
6 Conclusion and Future Work
References
Application of 3D Printed Vacuum-Actuated Module with Multi-soft Material to Support Handwork Rehabilitation
1 Introduction
2 MORI-A FleX: The Vacuum-Actuated Module with 3D Printed Flexible Connectors
2.1 Creation Procedure of MORI-A FleX
2.2 TPU Internal Construction
2.3 3D Printed Flexible Connector
3 MORI-A Flex Mechanical Properties
3.1 Load Displacement Performance
3.2 Coefficient of Friction with Texture Change Connector
4 Applications
5 Conclusion
References
Combining Hierarchical MILP-MPC and Artificial Potential Fields for Multi-robot Priority-Based Sanitization of Railway Stations
1 Introduction and Related Works
2 Architecture
3 Experiments and Comparison with MARL Technique
4 Conclusions
References
Exploration System for Distributed Swarm Robots Using Probabilistic Action Decisions
1 Introduction
2 The Proposed Exploration Swarm Robot System and Area Control Algorithm
2.1 Distributed Robot System for Wide-Area Exploration by Swarm Robots
2.2 Overview of the Region Control Algorithm with Probabilistic Action Decision
2.3 Algorithm for Probabilistic Action Decision Making
3 Evaluation by Simulation
3.1 Algorithm Verification by Simulator and Comparison of Search Efficiency by Changing Control Parameters
4 Development and Implementation of Exploration Systems
4.1 Hardware Development of the Robot for Actual Equipment Verification
4.2 Software and Exploration System Configuration
5 Exploration Field Tests and Results
5.1 Description of Experimental Environment/Conditions and Data Collected
5.2 Examination of Collected Data
6 Conclusion
References
Distributed Estimation of Scalar Fields with Implicit Coordination
1 Introduction
2 Related Work
3 Problem Statement
4 Methods
4.1 Gaussian Process Regression
4.2 Spatial Prior
4.3 Exploration
4.4 Vertex Quality Computation
5 Experimental Evaluation and Discussion
6 Conclusion
References
A Constrained-Optimization Approach to the Execution of Prioritized Stacks of Learned Multi-robot Tasks
1 Introduction
2 Background and Related Work
2.1 Multi-task Learning, Composition, and Execution
2.2 Constraint-Based Task Execution
2.3 From Dynamic Programming to Constraint-Driven Control
3 Prioritized Multi-task Execution
4 Experimental Results
4.1 Multi-robot Tasks
4.2 Discussion
5 Conclusion
A Comparison Between Optimal Control, Optimization-Based Control, and RL Policy
B Implementation Details
References
A Comparison of Two Decoupled Methods for Simultaneous Multiple Robots Path Planning
1 Introduction
2 Related Work
3 Planning Algorithms
3.1 Problem Formulation
3.2 Decoupled Prioritized Planning (PP)
3.3 Decoupled Fixed-Path Coordination (FPC)
4 Numerical Experiments
4.1 Illustrative Test Cases
4.2 Monte Carlo Simulations
5 Conclusion
References
Smarticle 2.0: Design of Scalable, Entangled Smart Matter
1 Introduction
2 Background and Related Work
3 Electro-mechanical Design
4 Fabrication
5 Characterization
6 Multi-robot Demonstrations
7 Summary
References
Hybrid Flock - Formation Control Algorithms
1 Introduction
2 Methodology
2.1 Pot2Lapl
2.2 Flock2Lapl
3 Experimental Evaluation
3.1 Robotic Platform
3.2 Evaluation Metrics
3.3 Experimental Setup
3.4 Characterization Experiment
3.5 Narrow Passage Experiment
4 Results and Discussion
5 Conclusion
References
Multi-Robot Systems Research: A Data-Driven Trend Analysis
1 Introduction
2 Exhaustive Search
2.1 Methodology
2.2 Findings from Journal-Specific Analysis
2.3 Findings from Trend Analysis
3 Expert Survey
3.1 Methodology
3.2 Findings
4 Conclusions
References
Coverage Control for Exploration of Unknown Non-convex Environments with Limited Range Multi-robot Systems
1 Introduction
2 Problem Statement
3 Distributed Control Algorithm for Exploration
4 Time-Varying Probability Density Function for Exploration
5 Simulations and Experiments
5.1 Implementation
5.2 Results and Discussion
6 Conclusions
References
Author Index
Springer Proceedings in Advanced Robotics 28 Series Editors: Bruno Siciliano · Oussama Khatib
Julien Bourgeois · Jamie Paik · Benoît Piranda · Justin Werfel · Sabine Hauert · Alyssa Pierson · Heiko Hamann · Tin Lun Lam · Fumitoshi Matsuno · Negar Mehr · Abdallah Makhoul Editors
Distributed Autonomous Robotic Systems 16th International Symposium
Series Editors
Bruno Siciliano, Dipartimento di Ingegneria Elettrica e Tecnologie dell'Informazione, Università degli Studi di Napoli Federico II, Napoli, Italy
Oussama Khatib, Robotics Laboratory, Department of Computer Science, Stanford University, Stanford, CA, USA

Advisory Editors
Gianluca Antonelli, Department of Electrical and Information Engineering, University of Cassino and Southern Lazio, Cassino, Italy
Dieter Fox, Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
Kensuke Harada, Engineering Science, Osaka University, Toyonaka, Japan
M. Ani Hsieh, GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA
Torsten Kröger, Karlsruhe Institute of Technology, Karlsruhe, Germany
Dana Kulic, University of Waterloo, Waterloo, ON, Canada
Jaeheung Park, Department of Transdisciplinary Studies, Seoul National University, Suwon, Korea (Republic of)
The Springer Proceedings in Advanced Robotics (SPAR) publishes new developments and advances in the fields of robotics research, rapidly and informally but with a high quality. The intent is to cover all the technical contents, applications, and multidisciplinary aspects of robotics, embedded in the fields of Mechanical Engineering, Computer Science, Electrical Engineering, Mechatronics, Control, and Life Sciences, as well as the methodologies behind them. The publications within the “Springer Proceedings in Advanced Robotics” are primarily proceedings and post-proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. Also considered for publication are edited monographs, contributed volumes and lecture notes of exceptionally high quality and interest. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by SCOPUS, SCIMAGO, WTI Frankfurt eG, zbMATH. All books published in the series are submitted for consideration in Web of Science.
Editors
Julien Bourgeois, University of Franche-Comté, Montbéliard, France
Jamie Paik, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Benoît Piranda, CNRS, University of Franche-Comté, Besançon, France
Justin Werfel, Harvard University, Boston, MA, USA
Sabine Hauert, University of Bristol, Bristol, UK
Alyssa Pierson, Boston University, Boston, MA, USA
Heiko Hamann, University of Lübeck, Lübeck, Germany
Tin Lun Lam, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China
Fumitoshi Matsuno, Kyoto University, Kyoto, Japan
Negar Mehr, University of Illinois System, Urbana, IL, USA
Abdallah Makhoul, University of Franche-Comté, Montbéliard, France
ISSN 2511-1256  ISSN 2511-1264 (electronic)
Springer Proceedings in Advanced Robotics
ISBN 978-3-031-51496-8  ISBN 978-3-031-51497-5 (eBook)
https://doi.org/10.1007/978-3-031-51497-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.
Foreword
At the dawn of the century's third decade, robotics is reaching an elevated level of maturity and continues to benefit from the advances and innovations in its enabling technologies. These are all contributing to an unprecedented effort to bring robots into human environments in hospitals and homes, factories and schools; in the field for robots fighting fires, making goods and products, picking fruits and watering the farmland, and saving time and lives. Robots today hold the promise of making a considerable impact in a wide range of real-world applications, from industrial manufacturing to healthcare, transportation, and the exploration of deep space and the sea. Tomorrow, robots will become pervasive and touch upon many aspects of modern life.

The Springer Tracts in Advanced Robotics (STAR) was launched in 2002 with the goal of bringing to the research community the latest advances in the robotics field based on their significance and quality. During the last fifteen years, the STAR series has featured the publication of both monographs and edited collections. Among the latter, the proceedings of thematic symposia devoted to excellence in robotics research, such as ISRR, ISER, FSR, and WAFR, have been regularly included in STAR. The expansion of our field as well as the emergence of new research areas has motivated us to enlarge the pool of proceedings in the STAR series in the past few years. This has ultimately led to launching a sister series in parallel to STAR. The Springer Proceedings in Advanced Robotics (SPAR) is dedicated to the timely dissemination of the latest research results presented in selected symposia and workshops.

This volume of the SPAR series brings the proceedings of the sixteenth edition of DARS, the International Symposium on Distributed Autonomous Robotic Systems, whose proceedings have been published within SPAR since the thirteenth edition. This symposium took place in Montbéliard, France, November 28 to 30, 2022. The volume edited by Julien Bourgeois, Justin Werfel, Jamie Paik, Benoît Piranda, Sabine Hauert, Alyssa Pierson, Fumitoshi Matsuno, Negar Mehr, Heiko Hamann, Tin Lun Lam, and Abdallah Makhoul contains 39 scientific contributions cutting across mobile sensor networks, unmanned aerial vehicles, multi-agent systems, algorithms for multi-robot systems, modular robots, swarm robotics, and reinforcement learning or deep learning, all united through the common thread of distributed robotic systems.

With its excellent technical program, DARS culminates with this unique reference on the current developments and new advances in distributed autonomous robotic systems: a genuine tribute to its contributors and organizers!

October 2023
Bruno Siciliano
Oussama Khatib
SPAR Editors
Preface
Distributed robotics is an interdisciplinary and rapidly growing area, with research and applications spanning fields including computer science, communication and control systems, electrical and mechanical engineering, life sciences, and humanities. This book brings together advanced works in this field presented at the 16th International Symposium on Distributed Autonomous Robotic Systems (DARS), November 28–30, 2022, in Montbéliard, France.

During this symposium, 39 papers were presented covering a broad scope of topics within distributed robotics, including mobile sensor networks, unmanned aerial vehicles, multi-agent systems, algorithms for multi-robot systems, modular robots, swarm robotics, and reinforcement learning or deep learning applied to multi-robot systems. Three keynote talks featured renowned experts in the field: Alcherio Martinoli (EPFL), on "Gas and Particle Sensing in Air: Challenges, Achievements, and Lessons"; Jessica Flack (SFI), on "Hourglass Emergence"; and Kirstin Petersen (Cornell), on "Designing Robotic Systems with Collective Embodied Intelligence."

The welcome reception featured an artwork made from Blinky Blocks, created by the artist duo Scenocosme. They have been working for several years with this hardware, developed at Carnegie Mellon University and the FEMTO-ST Institute, and have exhibited this work in many museums.

The program committee awarded the Best Paper Award to Shoma Abe, Jun Ogawa, Yosuke Watanabe, MD Nahin Islam Shiblee, Masaru Kawakami, and Hidemitsu Furukawa for their work on "Application of 3D Printed Vacuum-actuated Module with Multi-Soft Material to Support Handwork Rehabilitation," and the Best Student Paper Award to Yuhong Cao, Zhanhong Sun, and Guillaume Sartoretti for their work on "DAN: Decentralized Attention-based Neural Network for the MinMax Multiple Traveling Salesman Problem."

Finally, we would like to thank all those who contributed to the success of DARS 2022, including Pays de Montbéliard Agglomération, EIPHI Graduate School, MA Scène Nationale, and Région Bourgogne Franche-Comté for sponsoring the event, the DARS advisory committee for their useful advice, the program committee for their high-quality
scientific work, and the organizing committee for serving the attendees during the conference.

Summer 2023
Julien Bourgeois · Justin Werfel · Jamie Paik · Benoît Piranda · Sabine Hauert · Alyssa Pierson · Fumitoshi Matsuno · Negar Mehr · Heiko Hamann · Tin Lun Lam · Abdallah Makhoul
Contents
How Can We Understand Multi-Robot Systems? A User Study to Compare Implicit and Explicit Communication Modalities
Valeria Villani, Cristina Vercellino, and Lorenzo Sabattini

The Benefits of Interaction Constraints in Distributed Autonomous Systems
Michael Crosscombe and Jonathan Lawry

Outlining the Design Space of eXplainable Swarm (xSwarm): Experts' Perspective
Mohammad Naiseh, Mohammad D. Soorati, and Sarvapali Ramchurn

VMAS: A Vectorized Multi-agent Simulator for Collective Robot Learning
Matteo Bettini, Ryan Kortvelesy, Jan Blumenkamp, and Amanda Prorok

FLAM: Fault Localization and Mapping
Guillaume Ricard, David Vielfaure, and Giovanni Beltrame

Social Exploration in Robot Swarms
Elliott Hogg, David Harvey, Sabine Hauert, and Arthur Richards

Stochastic Nonlinear Ensemble Modeling and Control for Robot Team Environmental Monitoring
Victoria Edwards, Thales C. Silva, and M. Ani Hsieh

A Decentralized Cooperative Approach to Gentle Human Transportation with Mobile Robots Based on Tactile Feedback
Yi Zhang, Yuichiro Sueoka, Hisashi Ishihara, Yusuke Tsunoda, and Koichi Osuka

Sparse Sensing in Ergodic Optimization
Ananya Rao, Ian Abraham, Guillaume Sartoretti, and Howie Choset

Distributed Multi-robot Tracking of Unknown Clustered Targets with Noisy Measurements
Jun Chen, Philip Dames, and Shinkyu Park

A Force-Mediated Controller for Cooperative Object Manipulation with Independent Autonomous Robots
Nicole E. Carey and Justin Werfel

A Distributed Architecture for Onboard Tightly-Coupled Estimation and Predictive Control of Micro Aerial Vehicle Formations
I. Kagan Erunsal, Rodrigo Ventura, and Alcherio Martinoli

Search Space Illumination of Robot Swarm Parameters for Trustworthy Interaction
James Wilson and Sabine Hauert

Collective Gradient Following with Sensory Heterogeneous UAV Swarm
Tugay Alperen Karagüzel, Nicolas Cambier, A. E. Eiben, and Eliseo Ferrante

DAN: Decentralized Attention-Based Neural Network for the MinMax Multiple Traveling Salesman Problem
Yuhong Cao, Zhanhong Sun, and Guillaume Sartoretti

Receding Horizon Control on the Broadcast of Information in Stochastic Networks
Thales C. Silva, Li Shen, Xi Yu, and M. Ani Hsieh

Adaptation Strategy for a Distributed Autonomous UAV Formation in Case of Aircraft Loss
Tagir Muslimov

DGORL: Distributed Graph Optimization Based Relative Localization of Multi-robot Systems
Ehsan Latif and Ramviyas Parasuraman

Characterization of the Design Space of Collective Braitenberg Vehicles
Jack A. Defay, Alexandra Q. Nilles, and Kirstin Petersen

Decision-Making Among Bounded Rational Agents
Junhong Xu, Durgakant Pushp, Kai Yin, and Lantao Liu

Distributed Multi-robot Information Gathering Using Path-Based Sensors in Entropy-Weighted Voronoi Regions
Alkesh Kumar Srivastava, George P. Kontoudis, Donald Sofge, and Michael Otte

Distributed Multiple Hypothesis Tracker for Mobile Sensor Networks
Pujie Xin and Philip Dames

Distributed Multirobot Control for Non-cooperative Herding
Nishant Mohanty, Jaskaran Grover, Changliu Liu, and Katia Sycara

On Limited-Range Coverage Control for Large-Scale Teams of Aerial Drones: Deployment and Study
Filippo Bertoncelli, Mehdi Belal, Dario Albani, Federico Pratissoli, and Lorenzo Sabattini

MARLAS: Multi Agent Reinforcement Learning for Cooperated Adaptive Sampling
Lishuo Pan, Sandeep Manjanna, and M. Ani Hsieh

Proportional Control for Stochastic Regulation on Allocation of Multi-robots
Thales C. Silva, Victoria Edwards, and M. Ani Hsieh

Comparing Stochastic Optimization Methods for Multi-robot, Multi-target Tracking
Pujie Xin and Philip Dames

Multi-agent Deep Reinforcement Learning for Countering Uncrewed Aerial Systems
Jean-Elie Pierre, Xiang Sun, David Novick, and Rafael Fierro

Decentralized Risk-Aware Tracking of Multiple Targets
Jiazhen Liu, Lifeng Zhou, Ragesh Ramachandran, Gaurav S. Sukhatme, and Vijay Kumar

Application of 3D Printed Vacuum-Actuated Module with Multi-soft Material to Support Handwork Rehabilitation
Shoma Abe, Jun Ogawa, Yosuke Watanabe, MD Nahin Islam Shiblee, Masaru Kawakami, and Hidemitsu Furukawa

Combining Hierarchical MILP-MPC and Artificial Potential Fields for Multi-robot Priority-Based Sanitization of Railway Stations
Riccardo Caccavale, Mirko Ermini, Eugenio Fedeli, Alberto Finzi, Emanuele Garone, Vincenzo Lippiello, and Fabrizio Tavano

Exploration System for Distributed Swarm Robots Using Probabilistic Action Decisions
Toui Sato, Kosuke Sakamoto, Takao Maeda, and Yasuharu Kunii

Distributed Estimation of Scalar Fields with Implicit Coordination
Lorenzo Booth and Stefano Carpin

A Constrained-Optimization Approach to the Execution of Prioritized Stacks of Learned Multi-robot Tasks
Gennaro Notomista

A Comparison of Two Decoupled Methods for Simultaneous Multiple Robots Path Planning
Benjamin Bouvier and Julien Marzat

Smarticle 2.0: Design of Scalable, Entangled Smart Matter
Danna Ma, Jiahe Chen, Sadie Cutler, and Kirstin Petersen

Hybrid Flock - Formation Control Algorithms
Cyrill Baumann, Jonas Perolini, Emna Tourki, and Alcherio Martinoli

Multi-Robot Systems Research: A Data-Driven Trend Analysis
João V. Amorim Marques, María-Teresa Lorente, and Roderich Groß

Coverage Control for Exploration of Unknown Non-convex Environments with Limited Range Multi-robot Systems
Mattia Catellani, Federico Pratissoli, Filippo Bertoncelli, and Lorenzo Sabattini

Author Index
How Can We Understand Multi-Robot Systems? A User Study to Compare Implicit and Explicit Communication Modalities

Valeria Villani, Cristina Vercellino, and Lorenzo Sabattini
Department of Sciences and Methods for Engineering, University of Modena and Reggio Emilia, Reggio Emilia, Italy
{valeria.villani,lorenzo.sabattini}@unimore.it

Abstract. In this paper we investigate the use of different communication modalities for a multi-robot system. Specifically, the aim is to make a human subject aware of the goal the robots are moving towards: to this end, we compare implicit (i.e., the user infers the robots' goal from their motion pattern) and explicit (i.e., using visual or audio cues) communication modalities. We propose a user study, performed in virtual reality, aimed at assessing which communication modality is the most effective and how each is perceived by human subjects. While implicit communication has the intrinsic advantage of not requiring any specific hardware on the robots, the results of the user study show that implicit communication is comparable, from the users' perspective, to explicit communication modalities when the robots' goals are unambiguous and sufficiently distinct. In other cases, explicit communication is preferable.

This work was supported by the COLLABORATION Project through the Italian Ministry of Foreign Affairs and International Cooperation.
1 Introduction

Along with the diffusion of robotic platforms, multi-robot systems are expected to be deployed in a variety of applications in our everyday lives, including precision agriculture, logistics, environmental monitoring, and surveillance [1]. Their spread is enabled not only by the availability of efficient and safe coordination strategies, but also by that of intuitive communication modalities. Specifically, given the complexity associated with the coordination of numerous individual robots, it is important that, on the one hand, users can exert control over the multi-robot system in a straightforward and intuitive manner and, on the other hand, users can easily understand what the robots are doing or going to do. This second problem is the focus of our work and refers to the availability of communication modalities that allow robots to give information about their status in a manner that is easily perceived and understood by human users.

Feedback from the robotic platform to the user can be communicated either explicitly or implicitly [2]. Implicit communication consists in defining a robot behavior that enables the collaborator to quickly and confidently infer the goal [3–6]. This idea is related to the notion of legibility and requires the use of implicit cues, such as robot motion. On the contrary, in explicit communication, explicit cues are used to provide feedback from the robot to the user.
Classical approaches in this regard resort to audio or verbal feedback [7]. Additionally, visual feedback is typically employed [8–10]. In explicit communication, ad-hoc apparatus for communication purposes, such as LEDs, light indicators or speakers, is typically used, which limits the applicability of this communication modality in practical scenarios. Conversely, implicit communication leverages human understanding of movement and the tendency of people to discover intentionality in everything, even in robots [11].

With specific respect to multi-robot systems, few previous works have investigated how the concepts of legibility and implicit communication can be applied to multi-robot systems. In particular, the works [5, 6] aimed to identify the robot motion characteristics that are most relevant in terms of system legibility. To this end, the authors considered different trajectories leading the multi-robot system to a spatial goal, as well as other parameters such as the dispersion and stiffness of the group of robots. Furthermore, the authors in [12] also considered the effect of target difficulty, namely whether the spatial goals had the same number of neighboring goals or not, and of the initial radius, namely how much the robots were spread out at the beginning of the experiment.

Building upon these ideas, the aim of this work is to investigate the use of implicit and explicit communication modalities in the interaction with multi-robot systems, and to assess their effectiveness and users' satisfaction with them. To this end, we set up an experiment where two groups of robots move towards a goal and communicate their intentions to human subjects either with implicit cues, or with explicit color cues, arrows or verbal messages. The effectiveness of the communication modalities is measured in terms of correctness of subjects' responses and response time. The results of our study show that explicit communication modalities are more intuitive for participants and that implicit communication fails especially in the case of goals with close neighbors, thus confirming the findings of [12].

The rest of the paper is organized as follows. In Sect. 2 we describe the experimental approach and present the test protocol. Section 3 reports and discusses the achieved results, in terms of the measured effectiveness of the different communication modalities and the subjective reporting by test subjects. Finally, Sect. 4 closes with some concluding remarks.
2 Methods

In this section, we describe the experiments carried out to compare implicit and explicit communication in human-multi-robot system interaction. The experiment, which is described in detail in Sect. 2.2, consists in having two groups of mobile robots move towards a goal and asking participants to infer the goal relying on explicit or implicit cues.
2.1 Communication Modalities
As mentioned in Sect. 1, implicit communication has been compared with three explicit communication modalities: vocal commands, color cues, and arrows. Figure 1 shows an example of implicit communication (1(a)), color cues (1(b)) and arrows (1(c)). As regards implicit communication, cues about the robots' intention have been provided through their motion. The trajectory leading a group of robots to a goal has been computed according to the minimum jerk criterion: it minimizes the square of the magnitude of the jerk, which is the rate of change of the acceleration.
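The paper does not spell out the criterion; in the standard rest-to-rest formulation from the motor-control literature (an assumption on our part, consistent with [13, 14]), a point-to-point motion of duration $T$ from $x_0$ to $x_f$, starting and ending with zero velocity and acceleration, minimizes the jerk functional and admits a closed-form quintic solution:

\[
J = \int_0^T \left\lVert \dddot{x}(t) \right\rVert^2 \mathrm{d}t ,
\qquad
x^\ast(t) = x_0 + (x_f - x_0)\left(10\tau^3 - 15\tau^4 + 6\tau^5\right), \quad \tau = t/T .
\]

Under this formulation, each group's reference trajectory is a smooth curve that starts and ends at rest, with a single bell-shaped velocity profile.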
Fig. 1. Considered communication modalities.
Previous works have shown that this kind of trajectory is usually followed by humans when grasping objects and that it improves the interaction of a human with a robot manipulator [13, 14]. Moreover, in [5, 6] it has been shown that this trajectory improves the legibility of one or several multi-robot systems that move in a shared environment with the user.

As regards explicit communication, vocal commands consisted in a sentence like "The group of robots number 1 is moving towards the (e.g.,) red goal". Color cues consisted in highlighting the group of robots and their goal with the same color, as shown in Fig. 1(b). Arrows are shown in Fig. 1(c) and simply consist in an arrow starting from a group of robots and ending in the associated goal. It is worthwhile noting that the considered explicit communication modalities are very intuitive and do not require any processing or intuition by the user, as opposed to implicit communication. Nevertheless, implementing the explicit communication modalities requires specific hardware (i.e., speaker, light, projector), which is not needed in the case of implicit communication.

2.2 Experimental Task

The experimental task was inspired by those considered in [5, 6]. Specifically, two groups of mobile robots, each composed of six robots, were placed in a virtual environment, together with five goals, as shown in Fig. 1. The experiment consisted of two main parts. In the first part, each group of robots started from a common starting position and traveled towards a goal, randomly selected among those available. The two groups of robots started moving at the same time and were headed to different goals.
Fig. 2. Experimental task with puzzle solving as confounding task.
Participants were asked to select the goal for each group of robots as soon as they were confident about it. The task was repeated four times, each time with a different communication modality (either implicit or one of the explicit ones). The order in which the communication modalities were used varied among participants.

The second part of the experiment was organized as the first one, with the same objectives. In addition, participants were involved in a primary task, consisting of a puzzle. Hence, they were asked to focus on a puzzle while the robots were moving towards goals. As in the first part, they were asked to select the goal of each group of robots as soon as they were confident about it. Figure 2 shows the experimental setting in the second part of the task.

As a result, the experimental task of each participant consisted of eight trials, where all the considered communication modalities were tested twice, in random order: in the first part, focus was exclusively on inferring the robots' goal; in the second part, participants had to solve a puzzle and, at the same time, infer the robots' goal.

Before the beginning of the experiment, participants were introduced to the objectives of the experiment and its setup. Moreover, they were instructed on how to interact with the system (e.g., how to select the objects in the environment), and they were informed that clues about the goal of the groups might appear during the experiment. At the end of the experiment, they were administered a questionnaire investigating their preference for the tested communication modalities.
2.3 Experimental Setup
The experiments were carried out in virtual reality. The use of virtual reality allows for a high number of robots moving in a wide environment, thus overcoming laboratory limitations. Moreover, it guarantees repeatability of the experiments, given the lack of localization errors, slippage or other disturbing factors [6]. As regards interaction, virtual reality provides an immersive experience that faithfully reproduces interaction in a real environment [15].
The environment was developed in Unity and the experiments were performed with the Oculus Rift headset. Goals were rendered as colored boxes, while robots were rendered as elliptical cylinders (width equal to 0.05 m, length equal to 0.1 m, height equal to 0.01 m). The two groups of robots were identified by a text box "Group 1" or "Group 2", moving with the group. Next to the goals, two buttons were placed to let participants select the goal inferred for each group. Interaction was enabled through the Oculus remote controller, called Oculus Remote. Participants could select pieces of the puzzle or the goal of a group of robots through the central button of the controller.

2.4 Participants

A total of 30 volunteer subjects were enrolled in the experiment. The subjects are students and researchers working at our engineering department, in different research fields, and are categorized as follows: 12 females and 18 males, spread across an age range of 20–65 years, with a mean age of 25.5 years. All the participants were completely naïve to the experimental task and goals. Ten of them had never used virtual reality before the experiment, while eleven reported they had used it once or twice. The remaining nine participants reported they had used it more than twice before this experiment. Moreover, three reported they had never played videogames before, twelve reported playing less than once a month, five some times per month, and ten at least once a week. The study protocol followed the Helsinki declaration and the participants signed the informed consent form before starting the experiment. All the data were analyzed and reported anonymously.

2.5 Data Collection and Analysis

During each experiment, we collected data about the participants' performance, in terms of the selected goal for each robot group, the time taken from the beginning of the robots' movements to the selection of the goal for each group and, for the second experimental task, the time required to complete the puzzle. A total of 480 values were measured for these quantities (240 for the time required to complete the puzzle), since each subject participated in the two parts composing the experiment (without and with the puzzle), and in each part they experienced the four considered communication modalities with two groups of robots. Data were analyzed using MATLAB R2021a and its Statistics and Machine Learning Toolbox. The final questionnaire included four questions addressing comfort during the experiment, the effectiveness of the considered communication modalities in the first and the second part of the experiment, and overall preference for such modalities.
3 Results

3.1 Correctness of Responses

The first analysis we carried out investigated whether the use of implicit or explicit communication modalities has some influence on the legibility of robot groups.
Table 1. Correctness of participant's responses for the different communication modalities: 1 denotes correct response, 0 incorrect response.

Communication modality   Incorrect (0)   Correct (1)
Implicit                 36              84
Vocal commands           14              106
Color cues               10              110
Arrows                   10              110
Specifically, we analyzed the effect of communication modality on the correctness of the estimated goals over all trials. To this end, we tested the following null hypothesis:

H0 - Null hypothesis: there is no relation between the correctness of a test participant's response and the robots' communication modality.

Here we considered correctness of response as the dependent variable and communication modality as the independent variable. Since correctness is a categorical variable, we performed a Pearson's chi-squared test [16]. Table 1 reports the contingency table for hypothesis H0. The table shows that explicit communication modalities led to fewer errors than implicit communication. From the chi-squared test, these results proved statistically significantly different, with p-value p = 7.56 × 10⁻⁷. Hence, the null hypothesis can be rejected: the communication modality, implicit or explicit, affects the user's performance in inferring the robots' goal, with implicit communication introducing a higher number of errors. This result is not surprising, since the considered explicit communication modalities are very intuitive and straightforward and do not require any information processing or human thinking (although they do require some hardware). Nevertheless, it is still interesting that implicit cues are processed correctly in 70% of the trials.

Furthermore, we investigated the effect, if any, of the position of the goals and of the presence of a puzzle on the correctness of test participants' responses. To this end, we defined the two following null hypotheses:

H1 - Null hypothesis: there is no relation between the correctness of a test participant's response and the robot group's goal.

H2 - Null hypothesis: there is no relation between the correctness of a test participant's response and the involvement in a confounding task.

Tables 2 and 3 report the contingency tables for these hypotheses, where we considered, as before, correctness of response as the dependent variable, and goal and involvement in a confounding task as the independent variables.
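The reported p-values can be reproduced directly from the counts in the contingency tables. Below is a minimal sketch (not the authors' code, which used MATLAB) of the test for H0 using SciPy; the calls for H1 and H2 are the same, applied to the counts of Tables 2 and 3.

from scipy.stats import chi2_contingency

# Table 1 counts. Rows: implicit, vocal commands, color cues, arrows;
# columns: incorrect (0) and correct (1) responses.
table1 = [[36, 84],
          [14, 106],
          [10, 110],
          [10, 110]]

chi2, p, dof, expected = chi2_contingency(table1)
print(f"H0: chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2e}")
# -> p on the order of 7.6e-07, matching the value reported above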
Table 2. Correctness of participant's responses for the different robots goals, as shown in Fig. 1.

Goal                       Incorrect (0)   Correct (1)
Leftmost (yellow)          8               86
Second from left (green)   19              77
Middle (red)               4               74
Second from right (blue)   31              72
Rightmost (cyan)           8               101

Table 3. Correctness of participant's responses with respect to the presence of puzzle solving, meant as additional confounding task.

Confounding task   Incorrect (0)   Correct (1)
Puzzle             26              214
None               44              196
8
V. Villani et al.
Table 4. P-values for three-way ANOVA: C denotes the communication modality, G robots goal and P the presence of puzzle solving as confounding task. Variable p-value C
< 10−7
G
< 10−7
P
0.73
CG
< 10−7
CP
0.34
PG
0.76
(ANOVA) in terms of the p-value of all the main effects C, G and P and also of the interaction variables. Results in the table show that communication modality (C) and goal (G) are the main effects that influence response time in a statistically significant manner. Also the effect of their interaction (CG) results statistically significant. As a result, we can reject the null hypothesis for the variables C and G since they influence response time. Conversely, the presence of a confounding task, such as puzzle solving, did not significantly affect participant’s response time. It should be clarified that no priority was given to test participants regarding the two tasks, puzzle solving and guessing robots goal: they could stop puzzle solving as soon as they had a reliable intuition about the goal of any of the two groups of robots. To further investigate differences existing among the experimental conditions, we carried out a post-hoc analysis, based on multiple pairwise comparisons. Indeed, results in Table 4 suggest that for the variable C, G and CG, on average, at least one of the considered experimental conditions is different from the others. However, this analysis does not provide details on which conditions are different from each other. To answer this question, we performed the Tukey’s honestly significant difference test [17]. As regards the variable C, that is the effect of communication modality on participant’s response time, Fig. 3(a) shows the mean response time marginalized for the considered communication modalities. The figure shows that implicit communication caused higher response time than explicit modalities, which all required comparable response time. Multiple comparison with Tukey’s test confirmed this result, since the only pairs with statistically significant difference are those involving implicit communication (p-values p < 10−10 for all the pairs (implicit, arrows), (implicit, colors), (implicit, audio)). Conversely, for all possible pairs with two explicit communication modalities, there was no statistically significant difference between mean response times in each pair (p-value p = 0.99 for the pair (arrows, colors), p = 0.16 for (arrows, audio) and p = 0.30 for (colors, audio)). As regards the variable G, which refers to robots goal, Fig. 3(b) reports marginal mean response time. The figure shows that the green and blue target required longer response times, thus confirming that they were the most critical, as anticipated in Sect. 3.1. In particular, the green one returned statistically significantly different response time with respect to yellow (p = 0.0017), red (p = 0.0027) and cyan (p =
How Can We Understand Multi-Robot Systems?
9
Fig. 3. Mean response time for the different communication modalities and robots goals.
10−4 ) targets, while response time for the blue one was statistically significantly different for the cyan (p = 0.0162) goal only. The combined effect of communication modality and robots goal (variable CG in Table 4) is described by Fig. 4. The figure confirms that, for each goal, implicit cues were the slowest communication modality. However, within a goal, the difference between mean response time for implicit communication and response time for any explicit modality was not statistically significant only in the case of the yellow and cyan goals: for the yellow goal (first four rows in Fig. 4) p-value p = 0.9917 for (implicit, audio), p = 0.9481 for (implicit, colors) and p = 0.9139 for (implicit, arrows); for the cyan goal (last four rows in Fig. 4) p-value p = 1.000 for (implicit, audio), p = 0.9758 for (implicit, colors) and p = 1.000 for (implicit, arrows). This result is interesting since the yellow and cyan goals are the only ones having one neighbor, hence they are less
Fig. 4. Mean response time marginalized over communication modality C and robots goal G.
10
V. Villani et al.
Fig. 5. Mean response time marginalized over communication modality C and involvement in confounding task P.
ambiguous than the other goals: under these circumstances, the delay introduced by implicit communication is neglectable, as confirmed by p-values and Fig. 4. From this result, we can assume that implicit communication is a viable communication modality in the case of enough distinct and unambiguous robots goals. As a last remark, in Fig. 5 we summarize the effect of communication modality and involvement in a confounding task on response time. The figure shows that the longest response times are given in the case of implicit cues, no matter whether the subject was involved in puzzle solving or focused on robots movements. Moreover, as regards explicit communication, no significant differences are found between the presence of puzzle (P = 1) and its absence (P = 0). This is also confirmed by Tukey’s test: the only p-values that are less than 0.01 are those of pairs involving any explicit communication and implicit cues, with or without puzzle. This result might lead to the consideration that if the only reason behind the use of implicit communication is not to overwhelm users since they are busy with other tasks that are occupying their perceptive channels (i.e., need to look at something or to pay attention to sounds), then the choice of implicit communication might be not advantageous as expected. In this regard, Fig. 6 shows that the use of different communication modalities (and goals) has no effect at all on the time required to complete the confounding task, namely puzzle solving. 3.3
3.3 Subjective Reporting
At the end of the experimental task, participants were administered a questionnaire to collect their subjective feedback about the experiment and the considered communication modalities. First, they were asked whether they were at ease during the experiment and comfortable in interacting with the virtual environment. On a scale from 1 (no, never) to 4 (yes, always), 80% of them reported that they had always been at ease during the experiment and 20% almost always; 60% reported they had always been comfortable with the interaction system and 40% almost always. Then, they were asked to rate each communication modality on a scale from 1 (very bad) to 5 (very good). The average scores are as follows: 3.3 for implicit communication, 4.3 for explicit communication with color cues, 4.3 for explicit communication with vocal commands and 3.6 for explicit communication with arrows. Regarding the second part of the experiment, that is the one including puzzle solving, we asked how disturbing each communication modality was perceived to be. Scores were on a scale from 1 (not disturbing) to 5 (very disturbing). On average, implicit communication was rated 1.9, color cues 1.6, arrows 2.6 and audio messages 1.8. Arrows were found to be the most disturbing communication modality and, interestingly, implicit cues were found almost as disturbing as vocal synthesis. Finally, the following questions were asked:
– Which communication modality did you find the most effective?
– Which communication modality did you find the least effective?
– Which communication modality did you find the most effective while solving the puzzle?
– Which communication modality did you find the least effective while solving the puzzle?
– Should you need to interact with a multi-robot system, how would you prefer it to communicate with you?
– Should you need to interact with a multi-robot system, which communication modality would you definitely not want?
Replies to these questions are reported in Table 5. Participants' reports show that they did not appreciate implicit communication and, given the choice, they would prefer not to use it. However, it is noteworthy that it was not the most disturbing communication modality while they were involved in another confounding task. Among the considered explicit communication modalities, color cues and arrows were preferred over audio, which was found almost as disturbing as implicit cues.

Table 5. Replies to questionnaire for subjective reporting.

Question                        Implicit  Color   Arrows  Audio
Most effective                  10.0%     40.0%   33.3%   16.7%
Least effective                 36.7%     13.3%   13.3%   36.7%
Most effective during puzzle    10.0%     30.0%   43.3%   16.7%
Least effective during puzzle   16.7%     20.0%   30.0%   33.3%
Preferred                       6.6%      36.7%   40.0%   16.7%
Not wanted                      40.0%     3.3%    40.0%   16.7%
4 Conclusion

In this paper we investigated the use of different communication modalities for a multi-robot system. Specifically, the aim is to make a human subject aware of the goal the robots are moving towards. To this end, we considered implicit and explicit communication modalities. In implicit communication, users infer the robots' goal from their trajectory, whereas for explicit communication we considered color cues, where the robots and their goal exhibit the same color, arrows pointing from the center of the multi-robot system to the goal, and audio messages. We designed an experiment to assess which communication modality is the most effective and how each is perceived by human subjects. Results have shown that, on average, implicit cues cause the highest number of errors and require the longest response times. While this result is not surprising, the experiments have highlighted that implicit communication is comparable to explicit communication modalities when robots' goals are unambiguous and sufficiently distinct. Moreover, implicit communication does not require any specific hardware on the robots, which is a practical limitation of explicit communication. Conversely, according to our results, implicit communication does not provide any advantage with respect to users' involvement in other tasks that occupy their perceptive channels, such as looking at something requiring attention.
References

1. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors and current applications. Front. Robot. AI 7, 36 (2020)
2. Lasota, P.A., Fong, T., Shah, J.A., et al.: A survey of methods for safe human-robot interaction. Found. Trends Robot. 5(4), 261–349 (2017)
3. Dragan, A.D., Lee, K.C.T., Srinivasa, S.S.: Legibility and predictability of robot motion. In: Proceedings of 8th Annual ACM/IEEE International Conference on Human-Robot Interaction, pp. 301–308. IEEE Press (2013)
4. St-Onge, D., Levillain, F., Zibetti, E., Beltrame, G.: Collective expression: how robotic swarms convey information with group motion. Paladyn, J. Behav. Robot. 10(1), 418–435 (2019)
5. Capelli, B., Villani, V., Secchi, C., Sabattini, L.: Understanding multi-robot systems: on the concept of legibility. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7355–7361. IEEE (2019)
6. Capelli, B., Villani, V., Secchi, C., Sabattini, L.: Communication through motion: legibility of multi-robot systems. In: 2019 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pp. 126–132. IEEE (2019)
7. Clair, A.S., Mataric, M.: How robot verbal feedback can improve team performance in human-robot task collaborations. In: Proceedings of 10th Annual ACM/IEEE International Conference on Human-Robot Interaction, pp. 213–220. ACM (2015)
8. Johnson, T., Tang, G., Fletcher, S.R., Webb, P.: Investigating the effects of signal light position on human workload and reaction time in human-robot collaboration tasks. In: Advances in Ergonomics of Manufacturing: Managing the Enterprise of the Future, pp. 207–215. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41697-7_19
9. May, A.D., Dondrup, C., Hanheide, M.: Show me your moves! Conveying navigation intention of a mobile robot to humans. In: European Conference on Mobile Robots (ECMR), pp. 1–6. IEEE (2015)
10. Szafir, D., Mutlu, B., Fong, T.: Communicating directionality in flying robots. In: Proceedings of 10th Annual ACM/IEEE International Conference on Human-Robot Interaction, pp. 19–26. IEEE (2015)
11. De Graaf, M.M.A., Malle, B.F.: People's explanations of robot behavior subtly reveal mental state inferences. In: 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 239–248. IEEE (2019)
12. Kim, L.H., Follmer, S.: Generating legible and glanceable swarm robot motion through trajectory, collective behavior, and pre-attentive processing features. ACM Trans. Hum.-Robot Interact. (THRI) 10(3), 1–25 (2021)
13. Flash, T., Hogan, N.: The coordination of arm movements: an experimentally confirmed mathematical model. J. Neurosci. 5(7), 1688–1703 (1985)
14. Glasauer, S., Huber, M., Basili, P., Knoll, A., Brandt, T.: Interacting in time and space: investigating human-human and human-robot joint action. In: 19th International Symposium in Robot and Human Interactive Communication, pp. 252–257. IEEE (2010)
15. Villani, V., Capelli, B., Sabattini, L.: Use of virtual reality for the evaluation of human-robot interaction systems in complex scenarios. In: 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 422–427. IEEE (2018)
16. McHugh, M.L.: The Chi-square test of independence. Biochemia Medica 23(2), 143–149 (2013)
17. Miller, R.G.: Simultaneous Statistical Inference. Springer-Verlag, New York (1981). https://doi.org/10.1007/978-1-4613-8122-8
The Benefits of Interaction Constraints in Distributed Autonomous Systems

Michael Crosscombe¹ and Jonathan Lawry²

¹ Graduate School of Arts and Sciences, University of Tokyo, Tokyo, Japan
[email protected]
² Department of Engineering Mathematics, University of Bristol, Bristol, UK
[email protected]
Abstract. The design of distributed autonomous systems often omits consideration of the underlying network dynamics. Recent works in multi-agent systems and swarm robotics alike have highlighted the impact that the interactions between agents have on the collective behaviours exhibited by the system. In this paper, we seek to highlight the role that the underlying interaction network plays in determining the performance of the collective behaviour of a system, comparing its impact with that of the physical network. We contextualise this by defining a collective learning problem in which agents must reach a consensus about their environment in the presence of noisy information. We show that the physical connectivity of the agents plays a less important role than when an interaction network of limited connectivity is imposed on the system to constrain agent communication. Constraining agent interactions in this way drastically improves the performance of the system in a collective learning context. Additionally, we provide further evidence for the idea that ‘less is more’ when it comes to propagating information in distributed autonomous systems for the purpose of collective learning.
Keywords: collective learning · interaction networks · multi-agent systems · robot swarms

1 Introduction
During the development of autonomous systems, it is only natural that we focus our attention on the aspects of a system which are responsible for producing the desired behaviours. In distributed autonomous systems the focus is often on how the interactions between individual agents can lead to effective macro-level collective behaviours. What may be oversimplified, or even forgotten, however, are the networks which underpin those interactions. A typical assumption is that the agents' interaction network is merely a product of the spatial properties of the system, i.e. agents interact randomly with other agents based only on whether or not they are within the required distance for interaction to occur; often this takes the form of a communication range in swarm
robotics using platforms such as the Kilobots, whose infrared transceivers are limited to a physical communication radius of 10 cm [16,20]. However, in this paper we argue that we ought to consider which agents should be allowed to interact, rather than merely considering how those agents interact.

A common modelling assumption in distributed multi-agent and multi-robot systems is the "well-stirred system" assumption: in a well-stirred system each agent is equally likely to encounter any other agent in the system and, therefore, each interaction is regarded as an independent event [15]. This equates to a system in which the interaction network of the agents is totally connected and edges are selected at random to simulate the random interactions of different pairs of agents. However, this assumption does not consider that intentional constraints placed on agent interactions may yield better performance than if the interaction network is assumed to be totally connected and interaction depends only upon the physical proximity of the agents.

More recently, [18] found that "less is more" when it comes to communication in a robot swarm and that, by reducing the physical communication radius of the robots (or the density of the swarm), the system-level performance improves as a result of there being less communication overall. While this is the most recent study to highlight the problem of too much communication, [9, Sect. 3.2] previously demonstrated a similar problem in which agents became "too cooperative". This general problem can be traced further back to [14], in which the authors examined the effect of network structure on system-level performance. A general survey of recent literature that considers the connectivity of a multi-agent/robot system conducting collective exploration is provided in [13].

In previous work we have studied the impact of applying constraints to the interaction network in a non-physical multi-agent system within a collective learning setting [5]. We showed that the "less is more" hypothesis holds true for collective learning when the underlying interaction network's topology is a small-world network, and that interaction networks with low connectivity lead to optimal performance for collective learning. However, that study was limited to interaction networks in abstract simulations, without the ability to consider the physical network layer. Therefore, in this work we propose to study both the physical and interaction networks in combination, to determine which layer of connectivity is more important for performance in the development of distributed autonomous systems.

An outline of the remainder of this paper is as follows: we begin by describing collective learning as the context of our investigation, before defining a three-valued model for collective learning. We present results for a totally-connected interaction network where the physical network connectivity is varied, so as to understand the extent to which the physical network has an impact on system performance. Then, we introduce constraints over the interaction network by restricting its topology to that of a regular lattice network, before studying system performance with respect to different levels of connectivity. Finally, we conclude with some discussion, highlight our main contributions, and detail directions for future work.
2 Collective Learning in Multi-agent Systems
Social learning broadly describes the observed phenomenon in animals whereby an individual's learning is influenced by observing and interacting with other individuals or their products [10]. An example of this is the use of pheromones to facilitate teaching in the tandem-running ant Temnothorax albipennis [8]. This style of learning forms the basis of many distributed autonomous systems which, by design, rely on the interactions of individual agents to produce behaviours more complex than could be demonstrated by any one individual.

In the last decade, distributed decision-making has been identified as a foundational problem in swarm systems due to the need for large groups of agents to cooperate on a shared task by functioning as a collective [2,17]. In swarm systems, the ability of agents to make decisions in a distributed manner is crucial for real-world deployment, as robot swarms are appropriate platforms for difficult terrain and hazardous environments, such as the detection of wildfires [12]. Due to their large scale, swarm systems often cannot be supervised at the individual level and must instead rely on human supervision at the swarm level [11], while the swarm makes its own decisions and takes actions autonomously, often by reaching a consensus [19] or through quorum sensing [1].

Much of the literature on collective decision-making in robot swarms centres on the best-of-n problem: a bio-inspired problem in which the swarm must decide which is the best option of n possible alternatives [15]. However, this class of decision-making is limited to scenarios in which each option is associated with a quality value. Instead, we focus on a state-of-the-world problem whereby agents must learn the current state of the world in order to make informed decisions about their assigned task. In this paper we adopt the term collective learning to describe social learning in a population of artificial agents attempting to reach a consensus about the state of their environment [4]. In this context, collective learning is used specifically to describe the combination of two distinct processes: direct learning based on evidence obtained directly from an agent's environment, and indirect learning in the form of an agent fusing its belief with the belief of another agent. The effects of combining these two processes have been studied from an epistemological perspective by [7], who argue that communication between agents acts both to propagate information through the population and to correct for errors that occur whilst gathering evidence. The connection between these two processes has been investigated further in [3,6].
3 A Three-Valued Model for Collective Learning

We consider a collective learning problem in which a distributed system of autonomous agents attempts to learn the true state of their environment. Suppose that the environment can be described by a set of n propositional variables P = {p1, ..., pn}, with each relating to a discrete location; then an agent's belief about the environment is an allocation of truth values to each of the propositional
variables b : P → {0, ½, 1}. In our model we adopt a three-valued propositional logic with the truth values 0 and 1 denoting false and true, respectively, whilst the third truth value ½ is used to denote unknown and allows agents to express uncertainty about the environment. For notational convenience, we can instead represent an agent's belief b by the n-tuple B = ⟨B(p1), ..., B(pn)⟩. By doing so, an agent can express uncertainty about a location in the environment by assigning the truth value ½ to any of the propositional variables in P. As an example, let us suppose that an environment can be described by n = 2 propositional variables; then the belief B = ⟨0, 1⟩ would represent that the agent believes p1 to be false, while p2 is true. Now let us suppose another agent holds the belief B = ⟨½, ½⟩; then the agent is expressing uncertainty about both of the propositions p1 and p2, thereby indicating that the agent is totally uncertain about the environment. We now define the two processes of direct and indirect learning in the context of our proposed three-valued model and describe how they combine to produce collective learning at the macro level in distributed autonomous systems.

Evidential updating. The learning process by which an agent obtains evidence directly from its environment, e.g., through onboard sensory capabilities, and updates its current belief to reflect this evidence is referred to as evidential updating. In our model, an agent first decides which proposition to investigate by selecting a single proposition at random from those propositions about which the agent is uncertain, i.e. where B(pi) = ½. Then, having selected a proposition pi to investigate, the agent travels to the corresponding location in the environment where it gathers evidence. Evidence takes the form of an assertion about the truth value of the chosen proposition pi such that E = ⟨½, ..., S*(pi), ..., ½⟩, where S* : P → {0, 1} denotes the ground truth – often referred to as the true state of the world – in which the truth value of each proposition is certainly true or false. Upon obtaining evidence E the agent then updates its belief to B|E as follows:

B|E = ⟨B(p1) ⊙ E(p1), ..., B(pn) ⊙ E(pn)⟩ = B ⊙ E    (1)

where ⊙ is a binary operator defined on {0, ½, 1} and given in Table 1. We will elaborate on the reasoning behind our choice of fusion operator when we introduce the belief fusion process but, for now, notice that the operator works for evidential updating by preserving certainty over uncertainty; updating in this manner does not, therefore, alter the truth value of any proposition pj ∈ P where pj ≠ pi, because E(pj) = ½. An agent repeatedly gathers evidence about uncertain propositions until it has become totally certain about the environment, at which point it ceases to look for evidence.
Table 1. The fusion operator ⊙ applied to beliefs B and B′.

    ⊙        B′(pi) = 0   B′(pi) = ½   B′(pi) = 1
B(pi) = 0        0            0            ½
B(pi) = ½        0            ½            1
B(pi) = 1        ½            1            1
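As a concrete illustration, the operator in Table 1 can be written down in a few lines of Python. This is our own minimal sketch, not the authors' code; the function names and the use of the float 0.5 for the truth value ½ are our choices.

    def fuse(a: float, b: float) -> float:
        """Three-valued fusion operator of Table 1 on {0, 0.5, 1}.

        Certainty is preserved over uncertainty, while outright
        disagreement (0 vs 1) collapses to the unknown value 0.5.
        """
        if a == b:
            return a
        if 0.5 in (a, b):          # one side is uncertain: keep the certain value
            return a if b == 0.5 else b
        return 0.5                 # 0 fused with 1: become uncertain

    def fuse_beliefs(B, B2):
        """Element-wise fusion of two belief tuples (Eqs. (1) and (3))."""
        return tuple(fuse(x, y) for x, y in zip(B, B2))

Under this representation, evidential updating (Eq. (1)) is simply fuse_beliefs(B, E) with E equal to 0.5 everywhere except the investigated location.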
In real-world environments the act of gathering evidence is often subject to noise, stemming either from the use of noisy sensors – such as thermal sensors used by UAVs [21] – or from the environment itself. In this model evidence shall therefore take the following form:

E(pi) = { S*(pi)       with probability 1 − ε
        { 1 − S*(pi)   with probability ε          (2)
where ε ∈ [0, 0.5] is a noise parameter denoting the probability that the evidence is erroneous.

Belief Fusion. In collective learning the agents do not exist in isolation. Instead, individual agents can benefit from being able to interact with, and learn from, other agents in a shared environment. To this end, we now describe the indirect learning process by which pairs of agents fuse their beliefs in order to improve the system's ability to learn the state of the world. The belief fusion process provides two crucial benefits. Firstly, agents are able to propagate information they obtain directly from the environment through the system by fusing their beliefs with those of other agents. We propose to combine beliefs using the three-valued fusion operator ⊙ in Table 1, applied element-wise to all of the propositional variables pi ∈ P. Given two beliefs B and B′, corresponding to the beliefs of two agents, the fused belief is then given by:

B ⊙ B′ = ⟨B(p1) ⊙ B′(p1), ..., B(pn) ⊙ B′(pn)⟩    (3)

Secondly, the belief fusion operator helps to correct for errors that occur when the evidence obtained by the agents is subject to noise. This error-correcting effect arises when two agents have gathered evidence about the same proposition pi, independently of one another, and encounter a disagreement. When a pair of agents differ in their belief about the truth value of pi, the belief fusion process leads them both to become uncertain with regard to pi. As an example, suppose that two agents hold the beliefs B1(pi) = 1 and B2(pi) = 0; then, upon the agents fusing their beliefs such that B1 ⊙ B2(pi) = ½, both agents will attempt to seek additional evidence about proposition pi, either directly from the environment or indirectly via fusion with other agents, having become uncertain about the truth value of pi. Notice that the belief fusion process can lead to agents becoming less certain when the fusing agents disagree about the truth value of a given proposition, while evidential updating only leads to agents becoming more certain.
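To make the noise model of Eq. (2) and the error-correcting effect concrete, here is a small illustrative sketch, reusing fuse_beliefs from the earlier block; ε is written as eps, and everything else is again our own assumption rather than the authors' implementation.

    import random

    def sample_evidence(i, s_star, eps):
        """Noisy evidence about proposition i (Eq. (2)): the true value with
        probability 1 - eps, its negation with probability eps; every other
        proposition is reported as unknown (0.5)."""
        E = [0.5] * len(s_star)
        E[i] = s_star[i] if random.random() >= eps else 1.0 - s_star[i]
        return tuple(E)

    # Error correction: two agents that observed contradictory evidence about
    # p1 both revert to "unknown" upon fusing, prompting re-investigation.
    B1, B2 = (1.0, 0.5), (0.0, 0.5)
    print(fuse_beliefs(B1, B2))   # -> (0.5, 0.5)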
Measuring Performance. In the context of collective learning, agents are attempting to learn a representation of the true state of the world underlying their environment. As such, we can measure the accuracy of the average belief of the population with respect to the true state of the world in the following way: given a population of m agents, the normalised difference between each agent's belief B and the true state of the world S*, averaged across the population, is the average error of the system;

(1/m) Σ_{j=1}^{m} (1/n) Σ_{i=1}^{n} |Bj(pi) − S*(pi)|    (4)
This shall be our primary metric of performance.
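Continuing the representation used in the sketches above, Eq. (4) amounts to a short helper; a minimal version, again our own illustrative code:

    def average_error(beliefs, s_star):
        """Average error (Eq. (4)): the mean over all m agents of the mean
        absolute difference between a belief tuple and the ground truth."""
        m, n = len(beliefs), len(s_star)
        return sum(abs(B[i] - s_star[i]) for B in beliefs for i in range(n)) / (m * n)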
4 Simulation Environment
Fig. 1. Simulation environment depicting a noisy collective learning scenario in which m = 20 agents attempt to learn about n = 126 locations when the environment noise ε = 0.2. Each agent has a communication radius Cr = 20.
We now introduce the physical simulation environment used to study collective learning in a multi-agent system. In Fig. 1 the simulation environment is depicted with 20 agents and 126 hexagonal locations for the agents to investigate (we found that 20 agents were sufficient to complete the task, with performance similar to that of 50 agents). Agents are represented by black Unmanned Aerial Vehicles (UAVs) with either a grey or a blue dashed circle surrounding them, indicating their physical communication radius Cr. If two agents are within each other's physical communication radii then communication between those two agents is possible. If the radius is blue, then an agent is said to be in a communicating state, whilst grey indicates that it is not communicating. At initialisation (Fig. 1a), the environment is greyed out to represent the collective lack of information that the system holds about its environment. During the collective learning process (Fig. 1b), agents begin to build an internal representation of the state of the world, which can be averaged across the system to give us a general view of performance. To show this, the colours of the hexagons increase in intensity, representing the collective certainty about each location as contained in the belief of the average agent. The different colours are used to represent ocean (blue), beach (yellow), and low and high terrain (light and dark green), but are purely visual in this context. The brown hexagon is the initial launch location from which all agents begin exploring. In the case that the system collectively adopts the incorrect truth value for a given proposition, the hexagon at its associated location is coloured red, with its intensity still representing the average certainty with which the agents have (incorrectly) determined its truth value. At the end of the simulation (Fig. 1c) the agents have reached a consensus about the state of the world and the map is in full view, albeit with some errors depicted by red hexagons.

Fig. 2. A snapshot of the simulation during an experiment. In this model two different types of networks are formed which govern the agent interactions in the system. (a) A snapshot of the simulation environment during runtime. (b) A zoomed-in view of 7 agents with a system-wide communication radius Cr = 20. (c) A representation of the physical network that forms from the agents' communication radii. (d) A representation of the pre-assigned interaction network between the visible agents. (e) The resulting communication links that are available according to edges which are present in both the physical and interaction networks.
5 The Networks Underlying Agent Interactions
When we consider the design of distributed autonomous systems (e.g. multi-agent systems or robot swarms), the system-level performance is driven solely by the interactions of the constituent agents with their environment and with each other. The consequence of this fact is that we focus our effort on designing the rules and processes which define how the agents interact, but we often neglect to consider which agents ought to interact. In the past, it has often been assumed that it is beneficial for agents to be able to interact freely with any other agent in the system, but recent works have shown this to be false [5,18]. To understand the extent to which agents interact in our simulation, we now distinguish between the physical network and the interaction network as they occur in our simulation environment.

In Fig. 2 we show a snapshot of the simulation during runtime in which we overlay visual representations of the different networks which are formed during a single time step t. Assuming each agent acts as a node in both a physical network and an interaction network, the red edges connect pairs of agents which are physically capable of communicating, based on the agents being within their communication range with radius Cr. Edges in the physical network are added or removed during each time step t of the simulation according to which pairs of agents are within communication range of one another. The yellow edges, meanwhile, depict the connectivity of the interaction network as determined at the start of each experiment. Unlike in the physical network, the pairs of agents which are allowed to interact are immutable, such that no new edges are added, nor existing edges removed, after the network is initialised. The green edges depict which edges exist in both networks at a given time step t. Each green edge then corresponds to a pair of agents which are both able and allowed to interact with one another.

An interaction network can be constructed in whatever manner leads to optimal system performance, provided that the agents remain capable of interacting physically in the environment during their task. The default interaction network in many multi-agent and multi-robot systems is a totally-connected network. In this paper we seek to constrain interactions by reducing the connectivity of the network whilst improving system performance. To do this we use regular lattice networks, having discovered in previous work that such a topology outperforms even small-world networks in this setting [5,22].
6 Multi-agent Simulation Experiments
For each experimental condition, the system proceeds in the following way: we initialise m = 20 agents at the same starting location and with total uncertainty about the n = 126 locations, i.e. each agent begins with the belief B(pi) = ½ for all pi ∈ P. Every agent is also initialised with the same communication radius Cr ∈ {20, ..., 100} and a communication frequency Cf ∈ [0, 1].

After initialisation, each agent selects a proposition pi at random about which it is uncertain and begins travelling to the associated location in search of evidence about pi. When it arrives at the location, the agent observes its environment and obtains evidence E according to Eq. (2), before updating its belief by Eq. (1). The agent then selects another uncertain proposition pi to investigate and begins to travel in the direction of the new location. At this moment, the agent also enters a communicating state with probability Cf. If the agent does not enter the communicating state, then it continues to its destination to obtain evidence and the process repeats. If the agent does enter the communicating state, however, then it begins broadcasting its belief to any agents that come within its communication radius Cr as it travels to its destination. The agent remains in the communicating state until it encounters one or more other communicating agents within communication range. Upon receiving the belief(s) of the other agent(s), the agent ceases communicating and updates its belief B by selecting another belief B′ at random before adopting the fused belief B ⊙ B′ by Eq. (3). If the agent remains uncertain about the location to which it is travelling, then it continues to that location as normal. However, if the agent, having fused its belief, becomes certain about the location during travel, then it simply repeats the process of searching for new evidence by first selecting a new uncertain location from its belief. Finally, if instead the agent holds a totally certain belief, such that B(pi) ∈ {0, 1} for all pi ∈ P, then the agent repeatedly enters a communicating state in an attempt to reach a consensus with the rest of the system.

Each experiment runs for a maximum of 30,000 time steps or until the population has reached a unanimous consensus about its environment. We repeat each experiment 50 times for each parameter combination Cr and Cf and for three different noise levels ε ∈ {0, 0.1, 0.3}, representing noise-free, low- and moderate-noise scenarios, respectively. The results are then averaged across the 50 runs. We now present results for simulation experiments where the underlying interaction network is totally connected, in keeping with the "well-stirred system" assumption, in which we study how changes to the connectivity of the physical network affect the macro-level dynamics of the system.

Fig. 3. Average error at steady state against communication radius Cr for increasing communication frequency Cf = [0, 1]. Each dotted line shows the noise ε ∈ {0, 0.1, 0.3}. The shaded regions show a 95% confidence interval.
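Tying the earlier sketches together, one step of this experimental loop might look roughly as follows. The agent object, its helper methods and the environment container are hypothetical names introduced purely for illustration, not part of the authors' system.

    import random

    def agent_step(agent, env, eps, Cf):
        """One simplified decision point per Sect. 6: gather evidence on
        arrival, pick the next uncertain location, and enter the
        communicating state with probability Cf."""
        if agent.at_target():                               # arrived at location pi
            E = sample_evidence(agent.target, env.s_star, eps)
            agent.belief = fuse_beliefs(agent.belief, E)    # Eq. (1)
            agent.target = agent.pick_uncertain_location()  # next pi with B(pi) = 0.5
            agent.communicating = random.random() < Cf
        agent.move_towards(agent.target)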
6.1 Convergence Results for Physical Networks
In this section we show convergence results for a static totally-connected interaction network and focus our attention on the dynamic physical network, for which the connectivity changes based on the chosen communication radius Cr. In all figures, the shaded regions show a 95% confidence interval.

In Fig. 3 we show the average error of the population (Eq. (4)) at steady state against communication radius Cr. Each solid line shows the results for a different noise ε, while the dashed lines of the same colour show the noise level ε. Each plot within this figure represents a different communication frequency Cf, increasing from 0 (no communication) to 1 (unrestricted communication). For Cf = 0 we present a special case in which agents do not communicate with one another (referred to as "asocial learning" by [10]). As a result of this total absence of belief fusion, the agents rely solely upon evidential updating to inform their beliefs and, in the case where the evidential updating process is noisy, the agents are unable to learn an accurate representation of their environment. The result is the system reaching an average error around ε for all communication
radii Cr. Therefore, ε serves as a simple benchmark of performance, and any reduction in the average error to below ε demonstrates improved performance. As we increase Cf and agents are able to fuse their beliefs, we see an immediate and significant reduction in the average error for Cf ≥ 0.001, such that for all noisy scenarios (i.e. for ε = 0.1 and 0.3) the average error is reduced to below the level of noise in each scenario. In the noise-free scenario (green line) the populations converge to an average error above 0 when the communication frequency is too low (i.e. for Cf < 0.1), suggesting that while the population has converged to a highly accurate average belief, it does not reach a consensus across the system. Here we see that the best performance is measured for a communication frequency Cf = 0.025, but increasing Cf further results in an increased average error and, therefore, worse performance.

Perhaps surprisingly, increasing the communication radius Cr of the system, and thereby the connectivity of the physical network over which agents communicate, has little effect on the macro-level performance as measured by the average error of the system at steady state. The lines in Fig. 3 remain relatively flat across different communication radii Cr, with the performance of the system being almost entirely determined by the communication frequency Cf. This suggests that the level of connectivity of the physical network has very little impact on the performance of the system for the purpose of collective learning when agents can communicate freely provided they are in range.
7 Constraining the Interaction Network
Fig. 4. Network diagrams with different levels of connectivity. (a) Regular lattice network with k = 2. (b) Regular lattice network with k = 4. (c) Totally-connected network.
We now investigate how applying constraints to the interaction network, i.e. limiting the micro-level dynamics, affects the macro-level performance of our model for collective learning. In the remainder of this paper, we study interaction networks which take the form of regular lattice networks with different levels of connectivity. A regular lattice network can be defined by its connectivity k ∈ [2, m − 1], where k determines how many "nearest neighbours" each agent is connected to when the agents are arranged around a ring. Examples of lattice networks for k = 2 and k = 4 are shown in Figs. 4a and 4b, respectively, and are compared with a totally-connected network, equivalent to k = 19, in Fig. 4c.
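As an illustration, a ring lattice of the kind described here can be built in a few lines; this is our own minimal sketch under the stated definition, not the authors' implementation.

    def ring_lattice(m: int, k: int):
        """Adjacency sets for a regular ring lattice: each of the m agents
        is linked to its k nearest neighbours (k/2 on each side). k must
        be even; k = m - 1 recovers the totally-connected network."""
        if k == m - 1:                       # totally-connected special case
            return {i: set(range(m)) - {i} for i in range(m)}
        assert k % 2 == 0 and 2 <= k < m
        return {
            i: {(i + d) % m for d in range(1, k // 2 + 1)}
               | {(i - d) % m for d in range(1, k // 2 + 1)}
            for i in range(m)
        }

    lattice = ring_lattice(20, 4)   # e.g. the k = 4 network of Fig. 4b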
Fig. 5. Average error at steady state against communication radius Cr for increasing communication frequency Cf = [0, 1]. Each dotted line shows a different lattice network with connectivity k ∈ {2, 4, 8, 16}. The shaded regions show a 95% confidence interval.
7.1 Convergence Results for Constrained Interaction Networks
Results are presented for interaction networks with differing levels of connectivity k. We continue to vary the connectivity of the physical network via the communication radius, to observe whether the physical layer, in conjunction with the interaction layer, has some effect on system performance that was not observed previously. In Fig. 5 we show a comparison of the average error of the system at steady state against communication radius Cr for different communication frequencies Cf. Each solid line represents a different regular lattice network of connectivity k. In addition, we compare results for noise ε = 0.1 (top) against ε = 0.3 (bottom), and show the noise level as a red, dotted line.

Beginning with Fig. 5a, we can see that the low-noise scenario converges to an average error at or around 0 for communication frequencies Cf < 0.2, with the best performance observed for Cf = 0.1. We also see that the physical connectivity of the system has little impact when comparing across communication radius Cr. In the high-noise scenario of Fig. 5b, however, we see that the average error remains above 0 for all values of Cr and Cf and that, when Cf ≥ 0.1, there is a sudden increase in average error for k ≥ 4. Most importantly, we observe that the networks with the highest connectivity exhibit the largest increases in average error as Cf increases, such that k = 16 always results in worse performance than k = 8, and so forth. An exception occurs in the high-noise scenario with very low communication frequencies Cf < 0.1, where a network with k = 2 performs slightly worse than more connected networks. However, for Cf ≥ 0.1, k = 2 performs on par with or better than more connected networks. For example, at the extreme with a communication frequency Cf = 1, a network of k = 2 has 1/5 the average error of a network with k = 16 for the same communication radius Cr = 20. When we increase the radius to Cr = 100, a network of k = 2 achieves less than 1/2 the average error of a network with k = 16. Furthermore, the impact of the physical network also becomes more apparent in the high-noise scenario where, for Cf = 1, we see a clear increase in average error as the communication radius Cr is increased.

Fig. 6. Average error against time t for Cr = 20 and Cf = 0.2 for different regular lattice networks with connectivity k ∈ {2, 4, 8, 16} and a totally-connected network (i.e. k = 19). Each dotted line shows the noise ε ∈ {0, 0.1, 0.3}. The shaded regions show a 95% confidence interval.

To better understand the dynamics of our model and how the underlying connectivity affects time to convergence, we present trajectories of the average error against time in Fig. 6. Each plot represents a network of different connectivity k, and each line shows that network's performance for a different noise ε. We have fixed Cr = 20 and Cf = 0.2, the best performing parameters, for this comparison. From left to right, as connectivity increases, we see that for a high-noise scenario the time to convergence is much slower, with the k = 2 network converging after around 25,000 time steps. This is reduced to less than 15,000 time steps for k = 4, but with slightly higher average error. Then, as we continue to increase the connectivity k, the network continues to converge in less time but with greater average error. Therefore, increasing the connectivity worsens performance when the noise is high.

These results suggest that, for higher communication frequencies, the combination of both the interaction network and the physical network is an important consideration. It is also clear that as the interaction network becomes increasingly connected, the connectivity of the physical network has less of an impact, and that the inverse is true: for a less connected interaction network, the physical network has greater impact overall. Therefore, both networks are important considerations for optimum performance in distributed autonomous systems.
8 Conclusion and Future Work
In this paper we have studied the effects of both the physical and interaction networks on the collective learning performance of a multi-agent system. We have demonstrated that, when the underlying interaction network is totally connected, the physical network has minimal impact on performance in our model. However, when the connectivity of the interaction network is greatly reduced, the system performs better, achieving lower average error than highly-connected interaction networks under the same experimental conditions. Limiting the interactions between agents also results in improved robustness to noise, where increased connectivity at the physical layer only worsens performance for the same limited interaction network. Overall, performance is best when the connectivity of both the physical and interaction networks is severely reduced, e.g., for a connectivity of k = 2 and a communication radius Cr = 20. To our knowledge, this is the first study explicitly investigating the impact of both the physical and interaction networks in combination in a swarm setting. While it is difficult to draw strong, broad conclusions from this study, we hope that it will encourage other researchers working on collective behaviours to consider the importance of the underlying interaction network and its level of connectivity.

This study is limited to simulations of the interplay between the physical and interaction networks; a more theoretical study of this interplay is needed to understand the reasons behind the observed dynamics. Up to now we have only considered regular lattice networks and small-world networks as the basis for interaction networks. These networks possess certain desirable properties which may be more conducive to the processes involved in collective learning than other types of networks. Furthermore, we have only considered the effect of network connectivity in the context of collective learning; this effect may not necessarily be observed for other kinds of collective behaviours. As such, we intend to further explore how network connectivity, and information sharing more generally, impacts different kinds of collective behaviours in multi-agent systems. We are also interested in understanding how a more dynamic environment might change the optimal level of connectivity in the networks, as less connectivity may lead to outdated information being retained by the population, which is likely to affect performance over time.

Acknowledgements. This work was funded and delivered in partnership between Thales Group, University of Bristol and with the support of the UK Engineering and Physical Sciences Research Council, ref. EP/R004757/1 entitled "Thales-Bristol Partnership in Hybrid Autonomous Systems Engineering (T-B PHASE)."
References

1. Bechon, P., Slotine, J.J.: Synchronization and quorum sensing in a swarm of humanoid robots. arXiv preprint arXiv:1205.2952 (2012)
2. Brambilla, M., Ferrante, E., Birattari, M., Dorigo, M.: Swarm robotics: a review from the swarm engineering perspective. Swarm Intell. 7(1), 1–41 (2013)
3. Crosscombe, M., Lawry, J.: A model of multi-agent consensus for vague and uncertain beliefs. Adapt. Behav. 24(4), 249–260 (2016)
4. Crosscombe, M., Lawry, J.: Collective preference learning in the best-of-n problem. Swarm Intell. 15(1), 145–170 (2021)
5. Crosscombe, M., Lawry, J.: The impact of network connectivity on collective learning. In: Matsuno, F., Azuma, S., Yamamoto, M. (eds.) DARS 2021. SPAR, vol. 22, pp. 82–94. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-92790-5_7
6. Douven, I.: Optimizing group learning: an evolutionary computing approach. Artif. Intell. 275, 235–251 (2019)
7. Douven, I., Kelp, C.: Truth approximation, social epistemology, and opinion dynamics. Erkenntnis 75(2), 271–283 (2011)
8. Franks, N.R., Richardson, T.: Teaching in tandem-running ants. Nature 439(7073), 153–153 (2006)
9. Hamann, H.: Superlinear scalability in parallel computing and multi-robot systems: shared resources, collaboration, and network topology. In: Berekovic, M., Buchty, R., Hamann, H., Koch, D., Pionteck, T. (eds.) ARCS 2018. LNCS, vol. 10793, pp. 31–42. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77610-1_3
10. Heyes, C.M.: Social learning in animals: categories and mechanisms. Biol. Rev. 69(2), 207–231 (1994)
11. Hogg, E., Harvey, D., Hauert, S., Richards, A.: Evolving robust supervisors for robot swarms in uncertain complex environments. In: Matsuno, F., Azuma, S., Yamamoto, M. (eds.) DARS 2021. SPAR, vol. 22, pp. 120–133. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-92790-5_10
12. Innocente, M.S., Grasso, P.: Self-organising swarms of firefighting drones: harnessing the power of collective intelligence in decentralised multi-robot systems. J. Comput. Sci. 34, 80–101 (2019)
13. Kwa, H.L., Leong Kit, J., Bouffanais, R.: Balancing collective exploration and exploitation in multi-agent and multi-robot systems: a review. Front. Robot. AI 8, 771520 (2022)
14. Lazer, D., Friedman, A.: The network structure of exploration and exploitation. Adm. Sci. Q. 52(4), 667–694 (2007)
15. Parker, C.A., Zhang, H.: Cooperative decision-making in decentralized multiple-robot systems: the best-of-n problem. IEEE/ASME Trans. Mechatron. 14(2), 240–251 (2009)
16. Rubenstein, M., Ahler, C., Hoff, N., Cabrera, A., Nagpal, R.: Kilobot: a low cost robot with scalable operations designed for collective behaviors. Robot. Auton. Syst. 62(7), 966–975 (2014)
17. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors and current applications. Front. Robot. AI 7, 36 (2020)
18. Talamali, M.S., Saha, A., Marshall, J.A., Reina, A.: When less is more: robot swarms adapt better to changes with constrained communication. Sci. Robot. 6(56), eabf1416 (2021)
19. Valentini, G.: Achieving Consensus in Robot Swarms. SCI, vol. 706. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53609-5
20. Valentini, G., Ferrante, E., Dorigo, M.: The best-of-n problem in robot swarms: formalization, state of the art, and novel perspectives. Front. Robot. AI 4, 9 (2017)
21. de Vries, J.: Image processing and noise reduction techniques for thermographic images from large-scale industrial fires. In: Proceedings of the 12th International Conference on Quantitative InfraRed Thermography (QIRT) (2014)
22. Watts, D., Strogatz, S.: Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998)
Outlining the Design Space of eXplainable Swarm (xSwarm): Experts' Perspective

Mohammad Naiseh¹, Mohammad D. Soorati², and Sarvapali Ramchurn²

¹ Department of Computing and Informatics, Bournemouth University, Poole, UK
[email protected]
² Department of Electronics and Computer Science, University of Southampton, Southampton, UK
{m.soorati,sdr1}@soton.ac.uk
Abstract. In swarm robotics, agents interact through local rules to solve complex tasks beyond an individual's ability. Even though swarms are capable of carrying out some operations without the need for human intervention, many safety-critical applications still call for human operators to control and monitor the swarm. There are novel challenges to effective Human-Swarm Interaction (HSI) that are only beginning to be addressed. Explainability is one factor that can facilitate effective and trustworthy HSI and improve the overall performance of the Human-Swarm team. Explainability has been studied across various Human-AI domains, such as Human-Robot Interaction and Human-Centered ML. However, it is still ambiguous whether explanations studied in the Human-AI literature would be beneficial in Human-Swarm research and development. Furthermore, the literature lacks foundational research on the prerequisites for explainability requirements in swarm robotics, i.e., what kind of questions an explainable swarm is expected to answer, and what types of explanations a swarm is expected to generate. By surveying 26 swarm experts, we seek to answer these questions and identify the challenges experts have faced in generating explanations in Human-Swarm environments. Our work contributes insights into defining a new area of research, eXplainable Swarm (xSwarm), which looks at how explainability can be implemented and developed in swarm systems. This paper opens the discussion on xSwarm and paves the way for more research in the field.

Keywords: Explainable AI · Swarm Robotics · Human-Swarm Interaction

1 Introduction
Swarm robotics consists of multiple robots that interact with each other to accomplish a collective task that is beyond individual ability, or that can be performed more effectively by a group of robots [11]. Swarm robotics promises various applications in many scenarios, such as pollution control, surveillance, package delivery and firefighting. Although swarms can perform the task autonomously,
human intervention is still crucial for safe and effective deployment in real-world scenarios [1]. For example, swarm operators may need to manage swarm tasks that might be beyond the swarm's capability, or intervene in cases of failures and errors. This intervention can be more complex when it requires multiple humans to work together in a team to operate remote swarms [10]. For instance, in a search and rescue scenario where the swarm is responsible for searching for and identifying casualties, the system may consist of two operators: (a) an operator tasked with scheduling clusters of robots to search areas, and (b) an analyst tasked with analysing casualty images. Human-Swarm Interaction (HSI) is an emerging field of research that studies human factors and swarm robotics to better utilise both human and swarm capabilities in a Human-Swarm environment. An essential part of HSI is communicating, to humans, the reasons behind swarms' decisions and behaviour [25]. As swarms become more advanced, the explanation of their behaviour (individual and collective) is buried in increasingly complex communication between swarm agents and underlying black-box machine learning models. Explainable Artificial Intelligence (XAI) is a response to this direction of research. XAI is a domain of research that is interested in making the behaviour or the output of intelligent systems more intelligible to humans by providing explanations [18]. While the concept of XAI is gaining increasing attention in several research domains, such as Agent Planning [3], Machine Learning [22] and Motion Planning [6], there is a clear gap in the literature when it comes to integrating swarm robotics into the research and development of XAI.

Similar to other XAI fields of research, we argue that eXplainable Swarm (xSwarm) should recognise that an explanation is an answer to an unexpected event or a failure of the system [18]. However, compared to other XAI domains, xSwarm can be significantly different, since the decision made by a swarm depends significantly on the communication and collective contribution of its agents [25]. Further, the explainability of swarms would require explaining the emerging collective behaviour, which is not necessarily the summation of individual robot behaviours [5].

Many studies have discussed the need for XAI methods and models for Human-Swarm environments to improve trust and maintain situational awareness, e.g., [1,20,25]. However, at the current stage of research, it is still unclear what kind of explanations a swarm system should be expected to generate and what questions an explainable swarm is expected to answer. This paper seeks to conceptualise the concept of xSwarm and introduce a new taxonomy of explainability requirements for HSI. Our results can guide the requirements elicitation for explanation methods and models for Human-Swarm teaming. These insights contribute to operationalising the concept of xSwarm in practice and can serve as a resource for researchers and developers interested in designing explainable Human-Swarm interfaces. In this paper, we limit our research to swarm robotics, i.e., this paper does not address multi-robot systems that have explicit goals and whose agents execute both individual and group plans. We follow an approach similar to that proposed in recent publications to outline explainability requirements in a given problem [6,15]. Our
method uses an open-text and scenario-based questionnaire with domain experts to elicit explainability considerations for swarm systems. We foresaw a lack of established understanding of the concept and of common technical knowledge, given the early stage of xSwarm in industrial practice. Therefore, we used a novel use case developed in our recent work [10], supported with failure scenarios and explanations, to ground our investigation. Our data set was collected from an online questionnaire with 26 swarm experts. In this paper, we make the following contributions:
– We present an xSwarm taxonomy that summarises the explainability space for swarm robotics from swarm experts' perspective.
– We summarise current challenges faced by researchers and industry practitioners in creating xSwarm.
2 Background and Related Work
So far, the process of generating explanations for the users of AI systems has received a lot of attention from multiple domains, including psychology [21], artificial intelligence [16], social sciences [18] and law [4]. Furthermore, regulatory requirements and increasing customer expectations for responsible artificial intelligence raise new research questions on how best to design meaningful and understandable explanations for a wide range of AI users [23]. In response to this trend, many explainable AI (XAI) models and algorithms have been proposed to explain black-box ML models [23], agent path planning [3], multi-agent path planning [2] and scheduling [14]. These explanations can range from local explanations, which provide a rationale behind a single decision, to global explanations, which explain the overall AI logic [16].

Despite the growing interest in explainable AI, there is a limited number of studies on how to incorporate explainability in swarm robotics; only a few attempts in the literature have discussed it. For instance, a scalable human-swarm interaction system has been proposed to allow a single human operator to monitor the state of the swarm and create tasks for the swarm [11]. The associated study with 100 participants showed that users have different visualisation preferences for explanations used to observe the state of the swarm, depending on the type of the given task. The explanation in this study was limited to a visual representation of the swarm coverage using a heatmap or individual points in space, which does not capture the wider space of xSwarm. Another example is the work proposed in [20], which introduced a Human-Agent EXplainability Architecture (HAEXA) to filter explanations generated by swarm agents with the aim of reducing human cognitive load. Finally, Roundtree et al. [25] discussed design challenges and recommendations for explainable Human-Swarm interfaces; e.g., they described the delivery of explanations to human operators in swarm systems as the main challenge, due to human processing limitations caused by the large number of individual robots in the swarm. Although these studies provide steps in the right direction for xSwarm, it is still unclear whether explanations developed in other XAI domains would be useful for Human-Swarm
environments. Also, it is still undefined what kind of questions a swarm system is expected to answer, or what explanations from swarm robotics would look like. Meanwhile, recent papers have been published in multiple domains to guide the exploration of explainability requirements in a given domain. Liao et al. [15] created a list of prototypical user questions to guide the design of human-centred explainable black-box ML. Similarly, Brandao et al. [6] proposed a taxonomy of explanation types and objects for motion planning, based on interviewing motion planning experts and understanding their current work and practice. Our work follows a similar approach in outlining xSwarm's design space. More particularly, in light of the growing interest in XAI, we are interested in understanding what is required to create xSwarm and what the major practical challenges to developing xSwarm are.
3 Methodology
Ethical approval was acquired via the Ethics Committee of the School of Electronics and Computer Science at the University of Southampton [Reference No. 67669].
Participants
Thirty-two participants voluntarily accessed our online questionnaire. Participants were approached through two main mailing lists: the multi-agent systems mailing list of The Alan Turing Institute (https://www.turing.ac.uk/research/interest-groups/multi-agent-systems) and the mailing list of the UKRI Trustworthy Autonomous Systems Hub (TAS-Hub, https://www.tas.ac.uk/). Four participants were excluded because they reported less than one year of experience working in swarm robotics. Two more participants were excluded because they chose not to complete the questionnaire, leaving us with twenty-six experts. Of these, 46.2% were researchers, 7.7% were lecturers, 23.1% were PhD students, and 23.1% held industrial roles. All remaining experts had more than one year of experience working in swarm robotics: 61.5% of participants had 1–5 years of experience, 23.1% had 5–10 years, 7.7% had 10–15 years, and 7.7% had more than 15 years of experience in the field.
3.2 Study Procedure and Design
Due to the limited research in xSwarm, we chose to conduct an online questionnaire, as it allowed us to scope an early investigation into outlining a preliminary xSwarm taxonomy for future in-depth investigations. It also allowed us to gather feedback from a large number of experts in a short time compared with other data collection methods, such as interviews [17]. We designed our questionnaire to include several elicitation probes to obtain representative results and minimise cognitive heuristics and bias [13,19] (Appendix A, Table A.1). First, a draft questionnaire was developed by the first author. Then, the draft was pilot tested with
experts (researchers from inside and outside the research group) to ensure that the question formulation could achieve the research goal and be understandable to study participants. The questionnaire starts by asking participants to read the participant information sheet. The information sheet outlines the motivation of the study, compiles background material about the research goal and problem, and describes the questionnaire structure. It also contains background material about explainable AI, to minimise any availability bias [19]. Participants were then directed to the main questionnaire, which has four main sections. Table 1 provides a brief description of the questionnaire structure and the probe types used in each section.

Table 1. Questionnaire structure and elicitation probes.

Questionnaire section | Probe type | Description
Swarm experience | NA | Demographic questions
Swarm failures | Knowledge | Participants' knowledge about swarm failures, e.g., What are the main reasons for swarm failures? What behaviour sparks your concern?
xSwarm scenario | Situation assessment | Participants' assessment of a swarm failure or unexpected behaviour, e.g., What questions would you ask to debug this failure? Why do you think the swarm failed in this case?
xSwarm challenges | Experience | Participants' previous experience in generating explanations from a swarm, e.g., What kind of challenges did you face when generating an explanation from a swarm?
xSwarm challenges | Hypothetical | Participants' needs to develop eXplainable swarms, e.g., How would swarm developers be supported in integrating explainability?
Swarm Experience. In this section, we asked participants to provide four demographic factors related to their previous swarm experience: participants' area of research or experience, current role, years of experience in swarm robotics, and level of experience in swarm robotics. We excluded participants who had less than a year of experience or had identified themselves as having a low level of experience in the field.
xSwarm Scenario. This section used an industry-led, real-world use case for human-swarm interaction (HSI) in which explainability is a crucial requirement [10]. The use case presented the following scenario: "A hurricane has affected a settlement. Many were evacuated, but some did not manage to escape and have been caught in debris and fallen structures. A UAV (Unmanned Aerial Vehicle) swarm is being deployed to identify locations of survivors, to evaluate their condition, and to advise on where unmanned ground rescue vehicles should be deployed to evacuate casualties". A diagram for the use case can be found in Appendix A, Figure A.1. First, we asked participants to select when and what the swarm should explain to human operators during the task. Then, we provided two cases of swarm failures or unexpected behaviour and asked participants to explain these cases. This is a situation assessment probe to
determine the bases for explanations during failures or unexpected behaviour, rather than waiting for these incidents to happen [13]. We also asked participants what questions they would ask to debug these failures. Figures 1A and 1B show two scenarios of a failure or an unexpected behaviour during the two main stages of the swarm task: (a) Planning: the swarm has suddenly stopped while trying to reach its destination, and (b) Searching: the swarm has failed to identify all the casualties in the search area.
Fig. 1. Scenarios of swarm failures or unexpected behaviour presented to participants. (A) The swarm has suddenly stopped. (B) The swarm has failed to recognise all possible casualties in the search area.
xSwarm Challenges. The questions in this section aim to understand the challenges experts face in generating explanations from swarm systems. First, this section included open-ended questions reflecting on participants' experience in generating or obtaining explanations from swarm systems. We asked questions such as "What were the typical difficulties you encountered while trying to extract explanations from swarm systems?". The section also contained hypothetical questions to identify the key tools and techniques that swarm developers should have to build xSwarm. We asked questions such as "How would swarm developers be better supported in integrating explainability?" and "What kind of issues and considerations should developers have in mind when developing xSwarms?".
3.3 Analysis
We followed the content analysis method [12]. Our data set comprised participants' answers to the open-ended questions, from which themes were extracted across perspectives. The analysis was conducted by the first author. To increase the trustworthiness of our qualitative analysis, we followed an iterative process across the research team. This iterative procedure was implemented to check and confirm that the taxonomy is reflected in the responses of the participants. The authors met multiple times during each stage of the research to ensure the accurate interpretation of each category and the supporting evidence. These discussions and meetings
resulted in dividing, altering, removing, or adding categories to ensure that all responses and their settings were adequately represented in the data.
4 Results
Our results are divided into two parts. We start by discussing the general explanation categories that emerged in participants' answers, which outline the design space of xSwarm. We then discuss the main challenges our experts faced in integrating explainability into swarm systems. Table 2 shows a sample of participants' answers; more examples can be found in Appendix A, Table A.2.

Table 2. Explanation categories for swarm robotics. The Explain column gives participants' explanations of swarm failures or unexpected behaviour; the Debug column gives participants' questions for debugging swarm failures or unexpected behaviour; Freq. is the number of times a category appeared in the data set.

Category | Explain | Debug
Consensus | "Detecting the missing casualty caused a conflict between the agents"; "Agents were not able to agree on some features of the casualty" (Freq. = 24) | "How much of the swarm is consolidating the same view?"; "What is the agreement percentage between agents?" (Freq. = 18)
Path Planning | "Because the swarm initial plan is to visit Location L, which is a charging location"; "Because the swarm is trying to avoid obstacles" (Freq. = 22) | "Is it a collision-free route?"; "Where is the swarm right now? [the probability distribution of possible robot locations]" (Freq. = 14)
Communication | "Because the communication is limited between agents"; "Environmental condition is limiting the connection between agents" (Freq. = 19) | "What is the accuracy of information exchange between swarm agents?"; "What is the synchronisation status between swarm agents?" (Freq. = 6)
Scheduling | "Because this casualty should be detected by another cluster"; "Because there are not enough agents to detect all casualties" (Freq. = 14) | "Why is there x number of agents to search this area?"; "Why is cluster c1 assigned to task t1?" (Freq. = 8)
Hardware | "Because the casualty is out of sensor coverage distance"; "Because of low-quality input data from swarm sensors" (Freq. = 16) | "What is the communication bandwidth between swarm robots?"; "What is the limitation of the swarm sensors?" (Freq. = 5)
Architecture and design | "Because robots have limited communication links per robot"; "Because the swarm has not got enough training in such an environment" (Freq. = 5) | "What is the response threshold function?"; "How is the swarm size selected?" (Freq. = 6)
4.1 Explanation Categories
Consensus. A key distinction between swarms and other multi-agent systems is that the swarm can be seen as a single entity, rather than as multiple individual robots with multiple goals and tasks. Swarm robots are usually equipped with methods to reach a consensus on the swarm decision, based on the accumulation and
sharing of information about features of the decision encountered by individual robots. The most frequent questions from our participants concerned not only why the swarm made a given decision but also, at a higher level, which and how many robots contributed to that decision. Our participants frequently asked questions to check the percentage of individual robots that communicated the preferred decision; e.g., P18³ mentioned: "How many robots detected the casualties?", and P26 asked: "Which features of the casualty did the robots agree on?". Furthermore, when participants were asked to provide reasons for swarm failures, they frequently answered that such a failure could be related to a conflict between swarm agents. Participants sought information to further understand the opinion dynamics of the swarm. Opinion dynamics is a branch of statistical physics that studies the process of agreement in large populations of interacting agents [24]. Because of these dynamics, a population of agents might progressively transition from a situation in which they hold varying opinions to one in which they all hold the same opinion. Participants were interested in how the swarm's opinion dynamics might change and evolve. For instance, P20 reflected on the casualty-search task: "What we are also interested in is that What are the emerging decisions (fragmentation), Is there a dominant decision, how dominant is this decision, and how will this opinion evolve".
Path Planning. Participants frequently asked questions related to the path that the swarm is going to follow and its related events. For instance, P2 mentioned that remote human operators would frequently check what the next state of the swarm is. Similarly, P20 added: "automatic planning has not been fully adopted especially in high-risk domains such this scenario, what the system usually does to suggest the plan to the human operator to check that path". Further, participants were not only interested in validating the swarm plan, but also asked questions related to the path-planning algorithm to debug swarm failures; e.g., P18 and P7 asked, to debug the failure in Fig. 1A: "What metrics does the swarm use to follow this route?" and "Why did the swarm follow this route, not another one?". A similar pattern appeared in participants' answers when they were asked to explain the unexpected behaviour in Fig. 1A. Participants explained the unexpected behaviour based on path-planning events; P9 stated: "Because the swarm agents are charging their batteries according to the initial plan", and similarly P18 commented: "Probably the swarm has not stopped, it is just avoiding obstacles right now". These results support the growing interest in Explainable Planning (XAIP), as shown by many planning contributions such as Human-Aware Planning and Model Reconciliation [9] and multi-agent path planning [2]. Such explanations aim to answer human operators' questions regarding the paths of several agents or clusters of agents to their targets [2], such that the paths can be taken simultaneously without the agents colliding. However, research on explainable swarm path planning is still limited and requires further attention.
³ For anonymity, individual participants are referred to as Pn (e.g. P18) in this chapter.
Communication. The collective decision-making process is the outcome of an emergent phenomenon that follows localised interactions among agents, yielding a global information flow. As a consequence, to investigate the dynamics of collective decision-making, participants went beyond debugging consensus between agents and asked questions related to the communication between swarm agents; e.g., P7 posed the following question: "What is the synchronisation status between swarm agents?". Participants also brought up that the swarm failures in our examples could be critically related to a failure in ensuring the appropriate flow of behavioural information among agents, in terms of both information quantity and accuracy; e.g., P21 commented on the failure in Fig. 1B: "because the environmental condition is limiting the connection between agents", and P18 added: "Perhaps one of the casualties moved on/no longer is, therefore state was not updated".
Scheduling-Based. Task scheduling in swarm robotics is a collective behaviour in which robots distribute themselves over different tasks. The goal is to maximize the performance of the swarm by letting the robots dynamically choose which task to perform. Yet, in many applications, human operators are still required to intervene in the scheduling plan and adapt it accordingly. In our questionnaire, participants' understanding of a particular swarm failure was sometimes associated with the initial task scheduling. Questions and explanations related to task scheduling were mentioned in both scenarios. For instance, P2 and P21 explained the casualty-detection failure (Fig. 1B) with the following explanations: "Because this casualty should be detected by another cluster", and "Because the initial scheduling plan did not allocate enough robots". Interestingly, regardless of the failure scenario, our participants repeatedly suggested questions matching a pattern described by Miller [18]: contrastive explanations, which often implicitly ask why another scheduling plan is not feasible. For instance, P19 asked the following question: "Why is a cluster c1 better than cluster c2 in performing task t1?". Our data also showed a set of questions that require interactive scheduling-based explanations, where the human operator can understand the consequences of the scheduling plan. This pattern has been pointed out by Wang et al. [?] as what-if questions; P12 mentioned: "What if cluster c1 were to do task t1 rather than task t2?".
Hardware. Even though the quality and robustness of hardware are increasing, hardware failure is still quite common. Participants explained the failure of the swarm with a link to a hardware problem. For instance, when the swarm was not able to detect all the casualties in the area, P17 commented: "because of faulty in the sensors". Our participants also discussed that explanations of hardware are necessary to give human operators actionable explanations, to either better utilise the swarm or improve swarm performance, e.g., by sending more swarm agents to the location. There was also a typical pattern in participants' feedback of explaining swarm behaviour or failure based on its hardware boundaries and limitations, as well as environmental and training-data limita-
tions. For instance, P11 explained the unexpected behaviour in Fig. 1A: "Because some agents have limited speed capabilities, so they are waiting for each other", and similarly P14 explained the failure in Fig. 1B: "the swarm sensors have low image input".
Architecture and Design. Participants also found reasons for swarm failures in the swarm architecture or in the design decisions behind the system. This category did not appear frequently in the data: participants mentioned five explanations and six questions. Participants recognised that a potential failure or unexpected behaviour can be related to initial design decisions made by the systems engineers, e.g., P12 answered: "Because the swarm has not got enough training in such an environment" and, similarly, P22 commented: "Because the transition probability between states is fixed". Participants also took a broader view of swarm explainability when they discussed questions that can be answered through descriptive information about the swarm system design. For instance, P1 and P4 asked: "How is the swarm size selected?" and "What is the design method? Is it behaviour-based or automatic design?". Currently, these kinds of explanations are discussed in the XAI literature as an onboarding activity, where human operators are introduced to the system design and architecture to understand its capabilities and limitations [8]. These global properties of the system could help inform the interpretation of swarm behaviour and predictions during critical decision-making scenarios.
4.2 Challenges Faced by Swarm Experts
In this section, we discuss three main challenges participants faced when designing or developing explanations for swarm robotics.
A Trade-Off Between Explainability and Human-Swarm Performance. Participants discussed that developing explainable swarms comes with a critical analysis of performance-explainability trade-offs that swarm developers have to face. Participants frequently pointed out that generating explanations does not come without cost; this can be a computational or a performance cost, e.g., an interruption to the human-swarm task. P18 mentioned: "asking the operator to read an explanation at an emergent situation can be vital and the explanation might delay the reaction time and distract the operator". These observations are consistent with earlier research [22] showing that explanations can be ignored by human operators when they are perceived as an impediment to the task goal. It has been shown that humans in human-AI systems focus on completing the task rather than engaging cognitively with AI systems to understand their behaviour [7]. Such effects can be amplified in human-swarm environments, where swarm operators might face time constraints and multitasking, as well as large numbers of agents to monitor. In summary, participants argued that integrating explainability into the human-swarm task is difficult: numerous design decisions require trade-offs involving explainability, workload, and usability.
Increasing the Number of Agents Would Increase the Complexity of Explanation Generation. The xSwarm challenge is significantly more difficult than other XAI areas, since a swarm's decision depends on numerous parameters linked to numerous agents [?]. Participants pointed out that there is a scalability problem in swarm explainability, i.e., increasing the number of agents in a swarm will exponentially increase the complexity of explanation generation. P12 mentioned: "explainable swarm can have an extreme cost when the swarm size is 100". Similarly, P2 added: "some of the technical reasons that led to the decision are relevant to a given user; as the number of agents increases, the non-relevant information increases exponentially". For these reasons, simplicity as a usability feature of explanations can be essential in swarm environments. Simplicity states that explanations generated by AI systems should be selective and succinct enough to avoid overwhelming the explainee with unnecessary information [26]. In response to this challenge, participants suggested that future swarm systems should include methods and tools to help summarise the explanations generated by swarm agents; e.g., P20 suggested: "Human operators cannot deal with loads of explanations, there should be some tools to provide a summary of the swarm decision". In related research, Mualla et al. [20] addressed this issue and presented an explainable swarm architecture with a filtering feature for the explanations generated by swarm agents. They conducted a user evaluation with three experimental groups: no explanation (the swarm presents no explanation), detailed explanation (the swarm presents the explanations generated by each agent), and filtered explanation (the swarm presents a filtered explanation from each agent). They showed that the filtered explanation was the option participants preferred, and that it led to more trust in, and understanding of, the swarm behaviour.
The Soundness of Swarm Explanations is Difficult to Achieve. The soundness of an explanation measures how truthful the explanation is with respect to the underlying system state [26]. Participants stated that they faced difficulties in generating explanations that are sound and complete because, in some contexts, information might be unavailable or outdated. P7 mentioned: "A variety of technical factors like bandwidth limitations and swarm size, an environmental condition factors associated with swarms diminish the likelihood of having perfect communication, which negatively impacts generating explanations that are complete and sound". To address this challenge, participants suggested that explanation generation tools in swarm robotics should take into consideration the validity of the available information when generating an explanation for a swarm action. P16 mentioned: "swarm explanation algorithms should have metrics to measure the soundness of their explanations before presenting it to operators". Participants also suggested that outdated information about the swarm may result in an outdated explanation, which may degrade human-swarm performance and trust. P4 raised this issue: "Sometimes the swarm explanation might be not synchronised with the current state of the swarm, in these cases, it might be better to avoid showing this information to human operators". Previous empir-
ical studies discussed this issue and showed that erroneous actions taken by the human operator to progress toward the overall goal will accumulate if those actions are based on outdated swarm explanations or information [27]. Future xSwarm research may need to explore mitigation strategies to break the error-propagation cycle that may emerge from outdated explanations.
5 Conclusion
Although research in eXplainable AI is attracting substantial interest from industry and academia, research on explainability for swarm robotics is still limited. In this paper, we presented results from an online questionnaire with 26 swarm experts to outline the design space of eXplainable Swarm (xSwarm). In particular, we presented a list of explanations given by swarm experts for swarm behaviours or failures, and the kinds of questions they ask to debug the swarm. Our work provides a concrete taxonomy of what xSwarm could look like from swarm experts' perspective, serving as an initial artefact to outline the design space of xSwarm. Our work suggests opportunities for the HCI and swarm robotics communities, as well as industry practitioners and academics, to work together to advance the field of xSwarm through translational work and a shared knowledge repository that represents the xSwarm design space. Finally, we cannot claim this is a complete analysis of the xSwarm design space; the results only reflect swarm experts' views gathered through an online questionnaire. Future work could use our taxonomy as a study probe for in-depth investigation with multiple stakeholders in swarm systems to further enhance the taxonomy.
Acknowledgments. This work was supported by the Engineering and Physical Sciences Research Council (EP/V00784X/1).
References
1. Agrawal, A., Cleland-Huang, J.: Explaining autonomous decisions in swarms of human-on-the-loop small unmanned aerial systems. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 9, pp. 15–26 (2021)
2. Almagor, S., Lahijanian, M.: Explainable multi agent path finding. In: AAMAS (2020)
3. Anjomshoae, S., Najjar, A., Calvaresi, D., Främling, K.: Explainable agents and robots: results from a systematic literature review. In: 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Montreal, Canada, 13–17 May 2019, pp. 1078–1088. International Foundation for Autonomous Agents and Multiagent Systems (2019)
4. Atkinson, K., Bench-Capon, T., Bollegala, D.: Explanation in AI and law: past, present and future. Artif. Intell. 289, 103387 (2020)
5. Brambilla, M., Ferrante, E., Birattari, M., Dorigo, M.: Swarm robotics: a review from the swarm engineering perspective. Swarm Intell. 7(1), 1–41 (2013)
6. Brandao, M., Canal, G., Krivić, S., Luff, P., Coles, A.: How experts explain motion planner output: a preliminary user-study to inform the design of explainable planners. In: 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), pp. 299–306. IEEE (2021)
7. Buçinca, Z., Malaya, M.B., Gajos, K.Z.: To trust or to think: cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proc. ACM Hum.-Comput. Interact. 5(CSCW1), 1–21 (2021)
8. Cai, C.J., Winter, S., Steiner, D., Wilcox, L., Terry, M.: "Hello AI": uncovering the onboarding needs of medical practitioners for human-AI collaborative decision-making. Proc. ACM Hum.-Comput. Interact. 3(CSCW), 1–24 (2019)
9. Chakraborti, T., Sreedharan, S., Zhang, Y., Kambhampati, S.: Plan explanations as model reconciliation: moving beyond explanation as soliloquy. arXiv preprint arXiv:1701.08317 (2017)
10. Clark, J.R., et al.: Industry led use-case development for human-swarm operations. arXiv preprint arXiv:2207.09543 (2022)
11. Divband Soorati, M., Clark, J., Ghofrani, J., Tarapore, D., Ramchurn, S.D.: Designing a user-centered interaction interface for human-swarm teaming. Drones 5(4), 131 (2021)
12. Elo, S., Kyngäs, H.: The qualitative content analysis process. J. Adv. Nurs. 62(1), 107–115 (2008)
13. Klein, G.A., Calderwood, R., Macgregor, D.: Critical decision method for eliciting knowledge. IEEE Trans. Syst. Man Cybern. 19(3), 462–472 (1989)
14. Kraus, S., et al.: AI for explaining decisions in multi-agent environments. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 13534–13538 (2020)
15. Liao, Q.V., Gruen, D., Miller, S.: Questioning the AI: informing design practices for explainable AI user experiences. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–15 (2020)
16. Lundberg, S.M., et al.: From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2(1), 56–67 (2020)
17. McGuirk, P.M., O'Neill, P.: Using questionnaires in qualitative human geography (2016)
18. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
19. Morgan, M.G.: Use (and abuse) of expert elicitation in support of decision making for public policy. Proc. Natl. Acad. Sci. 111(20), 7176–7184 (2014)
20. Mualla, Y., et al.: The quest of parsimonious XAI: a human-agent architecture for explanation formulation. Artif. Intell. 302, 103573 (2022)
21. Mueller, S.T., Hoffman, R.R., Clancey, W., Emrey, A., Klein, G.: Explanation in human-AI systems: a literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI. arXiv preprint arXiv:1902.01876 (2019)
22. Naiseh, M., Cemiloglu, D., Al Thani, D., Jiang, N., Ali, R.: Explainable recommendations and calibrated trust: two systematic user errors. Computer 54(10), 28–37 (2021)
23. Naiseh, M., Jiang, N., Ma, J., Ali, R.: Personalising explainable recommendations: literature and conceptualisation. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S., Orovic, I., Moreira, F. (eds.) WorldCIST 2020. AISC, vol. 1160, pp. 518–533. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45691-7_49
24. Poli, R., Kennedy, J., Blackwell, T.: Particle swarm optimization. Swarm Intell. 1(1), 33–57 (2007)
25. Roundtree, K.A., Goodrich, M.A., Adams, J.A.: Transparency: transitioning from human-machine systems to human-swarm systems. J. Cogn. Eng. Decis. Making 13(3), 171–195 (2019)
26. Sokol, K., Flach, P.: Explainability fact sheets: a framework for systematic assessment of explainable approaches. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 56–67 (2020)
27. Walker, P., Nunnally, S., Lewis, M., Kolling, A., Chakraborty, N., Sycara, K.: Neglect benevolence in human control of swarms in the presence of latency. In: 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3009–3014. IEEE (2012)
VMAS: A Vectorized Multi-agent Simulator for Collective Robot Learning

Matteo Bettini(B), Ryan Kortvelesy, Jan Blumenkamp, and Amanda Prorok
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
{mb2389,rk627,jb2270,asp45}@cl.cam.ac.uk

Abstract. While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It comprises a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations, while VMAS is able to execute 30,000 parallel simulations in under 10 s, proving more than 100× faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at https://github.com/proroklab/VectorizedMultiAgentSimulator. A video of VMAS scenarios and experiments is available at https://youtu.be/aaDRYfiesAY.

Keywords: simulator · multi-robot learning · vectorization

1 Introduction
Many real-world problems require the coordination of multiple robots to be solved. However, coordination problems are commonly computationally hard. Examples include path planning [14], task assignment [25], and area coverage [39]. While exact solutions exist, their complexity grows exponentially in the number of robots [3]. Metaheuristics [6] provide fast and scalable solutions, but lack optimality. Multi-Agent Reinforcement Learning (MARL) can be used as a scalable approach to find near-optimal solutions to these problems [34]. In MARL, agents trained in simulation collect experiences by interacting with the environment, and train their policies (typically represented by deep neural networks) through a reward signal.
Fig. 1. Multi-robot scenarios introduced in VMAS. Robots (blue shapes) interact among each other and with landmarks (green, red, and black shapes) to solve a task.
However, current MARL approaches present several issues. Firstly, the training phase can require significant time to converge to optimal behavior. This is due partially to the sample efficiency of the algorithm, and partially to the computational complexity of the simulator. Secondly, current benchmarks are specific to predefined tasks and mostly tackle unrealistic videogame-like scenarios [28,31], far from real-world multi-robot problems. This makes research in this area fragmented, with a new simulation framework being implemented for each new task introduced. Multi-robot simulators, on the other hand, prove to be more general, but their high fidelity and full-stack simulation result in slow performance, preventing their applicability to MARL. Full-stack learning can significantly hinder training performance. Learning can be made more sample-efficient if simulation is used to solve high-level multi-robot coordination problems, while leaving low-level robotic control to first-principles-based methods. Motivated by these reasons, we introduce VMAS, a vectorized multi-agent simulator. VMAS is a vectorized 2D physics simulator written in PyTorch [22], designed for efficient MARL benchmarking. It simulates agents and landmarks
of different shapes and supports torque, elastic collisions, and custom gravity. Holonomic motion models are used for the agents to simplify simulation. Vectorization in PyTorch allows VMAS to perform simulations in a batch, seamlessly scaling to tens of thousands of parallel environments on accelerated hardware. With the term GPU vectorization, we refer to the Single Instruction Multiple Data (SIMD) execution paradigm available inside a GPU warp. This paradigm permits executing the same instruction on a set of parallel simulations in a batch. VMAS has an interface compatible with OpenAI Gym [7] and with the RLlib library [15], enabling out-of-the-box integration with a wide range of RL algorithms. VMAS also provides a framework to easily implement custom multi-robot scenarios. Using this framework, we introduce a set of 12 multi-robot scenarios representing difficult learning problems. Additional scenarios can be implemented through a simple and modular interface. We vectorize and port all scenarios from OpenAI MPE [16] to VMAS. We benchmark four of VMAS's new scenarios using three MARL algorithms based on Proximal Policy Optimization (PPO) [29]. We show the benefits of vectorization by benchmarking our scenarios in the RLlib [15] library. Our scenarios prove to challenge state-of-the-art MARL algorithms in complementary ways.
Contributions. We now list the main contributions of this work:
– We introduce the VMAS framework, a vectorized multi-agent simulator which enables MARL training at scale. VMAS supports inter-agent communication and customizable sensors, such as LIDARs.
– We implement a set of twelve multi-robot scenarios in VMAS, which focus on testing different collective learning challenges, including behavioural heterogeneity, coordination through communication, and adversarial interaction.
– We port and vectorize all scenarios from OpenAI MPE [16] into VMAS and run a performance comparison between the two simulators. We demonstrate the benefits of vectorization in terms of simulation speed, showing that VMAS is up to 100× faster than MPE.
The VMAS codebase is available at https://github.com/proroklab/VectorizedMultiAgentParticleSimulator.
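To make the vectorized interface concrete, the following is a minimal Python usage sketch. It is a hedged illustration: the scenario name, keyword arguments, and return conventions follow the repository's documented interface, but exact signatures may differ across VMAS versions.

import torch
from vmas import make_env

# Create 1,000 parallel "transport" environments in a single batch on the GPU.
env = make_env(scenario="transport", num_envs=1000, device="cuda", continuous_actions=True)
obs = env.reset()  # per-agent observations; every tensor carries the batch dimension

# One continuous physical action (a 2D force) per agent, batched over all environments.
actions = [torch.zeros((1000, 2), device="cuda") for _ in env.agents]
obs, rews, dones, info = env.step(actions)  # all 1,000 environments step at once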
2 Related Work
In this section, we review the related literature in the fields of multi-agent and multi-robot simulation, highlighting the core gaps of each field. Furthermore, we compare the most relevant simulation frameworks with VMAS in Table 1. Multi-agent Reinforcement Learning Environments. A significant amount of work exists in the context of MARL to address the issues of multi-robot simulation for learning hard coordination strategies. Realistic GPUaccelerated simulators and engines have been proposed. Isaac [17] is a proprietary NVIDIA simulator used for realistic robotic simulation in reinforcement learning. Instead of using environment vectorization to accelerate learning, it uses concurrent execution of multiple training environments in the same simulation instance.
Despite this, its high-fidelity simulation makes it computationally expensive for high-level MARL problems. Brax [8] is a vectorized 3D physics engine introduced by Google. It uses the Jax [5] library to achieve environment batching and full differentiability. However, computational issues occur when scaling the number of simulated agents, leading to stalled environments with just 20 agents. There also exist projects for single-agent vectorized environments [13,35], but the complexity of extending these to the multi-agent domain is non-trivial. The core benchmark environments of the MARL literature focus on high-level inter-robot learning. Multi-agent Particle Environments (MPE) [16] are a set of environments created by OpenAI. They share VMAS's principles of modularity and ease of new scenario creation, without providing environment vectorization. MAgent [38] is a discrete-world environment supporting a high number of agents. Multi-Agent-Learning-Environments [10] is another simplified discrete-world set of environments with a range of different multi-robot tasks. Multi-Agent-Emergence-Environments [2] is a customizable OpenAI 3D simulator for hide-and-seek style games. Pommerman [26] is a discretized playground for learning multi-agent competitive strategies. SMAC [28] is a very popular MARL benchmark based on the Starcraft 2 videogame. Neural-MMO [31] is another videogame-like set of environments where agents learn to survive in large populations. Google Research Football [12] is a football simulation with a suite of scenarios that test different aspects of the game. Gym-pybullet-drones [21] is a realistic PyBullet simulator for multi-quadcopter control. Particle Robots Simulator [30] is a simulator for particle robots, which require high coordination strategies to overcome actuation limitations and achieve high-level tasks. Multi-Agent Mujoco [23] consists of multiple agents controlling different body parts of a single Mujoco [32] agent. While all these environments provide interesting MARL benchmarks, most of them focus on specific tasks. Furthermore, none of these environments provide GPU vectorization, which is key for efficient MARL training. We present a comparison between VMAS and all the aforementioned environments in Table 1.
Multi-robot Simulators. Video-game physics engines such as Unity and Unreal Engine grant realistic simulation that can be leveraged for multi-agent robotics. Both make use of the GPU-accelerated NVIDIA PhysX. However, their generality causes high overheads when using them for robotics research. Other popular physics engines are Bullet, Chipmunk, Box2D, and ODE. These engines are all similar in their capabilities and prove easier to adopt due to the availability of Python APIs. Thus, they are often the tools of choice for realistic robotic simulation. However, because they do not leverage GPU-accelerated batched simulation, these tools lead to performance bottlenecks in MARL training. The most widely known robotic simulators are Gazebo [11] and Webots [18]. Their engines are based on the ODE 3D dynamics library. These simulators support a wide range of robot models, sensors, and actuators, but suffer from significant performance loss when scaling in the number of agents. Complete simulation stall is shown to occur with as few as 12 robots [20]. For this reason, Argos [24] has been proposed as a scalable multi-robot simulator.
Table 1. Comparison of multi-agent and multi-robot simulators and environments. Vector: vectorized. State: continuous state (C) or discrete state/grid world (D). Comm: continuous (C) or discrete (D) communication inside the simulator. Action: continuous (C) or discrete (D) actions. PhysEng: type of physics engine. #Agents: number of agents supported. Gen: general-purpose simulator (any type of task can be created). Ext: extensibility (API for creating new scenarios). MRob: contains multi-robot tasks. MARL: made for Multi-Agent Reinforcement Learning. RLlib: compatible with the RLlib framework [15].

Simulator | Vector | State | Comm | Action | PhysEng | #Agents | Gen | Ext | MRob | MARL | RLlib
Brax [8] | ✓ | C | ✗ | C | 3D | <10 | ✓ | ✓ | ✗ | ✗ | ✗
MPE [16] | ✗ | C | C+D | C+D | 2D | <100 | ✓ | ✓ | ✗ | ✓ | ✓
MAgent [38] | ✗ | D | ✗ | D | ✗ | >1000 | ✗ | ✗ | ✗ | ✓ | ✓
MA-Learning-Environments [10] | ✗ | D | ✗ | D | ✗ | <10 | ✓ | ✗ | ✓ | ✓ | ✗
MA-Emergence-Environments [2] | ✗ | C | ✗ | C+D | 3D | <10 | ✗ | ✗ | ✗ | ✓ | ✗
Pommerman [26] | ✗ | D | ✗ | D | ✗ | <10 | ✗ | ✗ | ✗ | ✓ | ✗
SMAC [28] | ✗ | C | ✗ | D | ✗ | <100 | ✗ | ✓ | ✗ | ✓ | ✓
Neural-MMO [31] | ✗ | C | ✗ | C+D | ✗ | <1000 | ✗ | ✓ | ✗ | ✓ | ✓
Google research football [12] | ✗ | C | ✗ | D | 2D | <100 | ✗ | ✓ | ✗ | ✓ | ✓
gym-pybullet-drones [21] | ✗ | C | ✗ | C | 3D | <100 | ✗ | ✓ | ✓ | ✓ | ✓
Particle robots simulator [30] | ✗ | C | ✗ | C+D | 2D | <100 | ✗ | ✓ | ✓ | ✓ | ✗
MAMujoco [23] | ✗ | C | ✗ | C | 3D | <10 | ✗ | ✗ | ✗ | ✓ | ✗
Gazebo [11] | ✗ | C | C+D | C+D | 3D | <10 | ✓ | ✓ | ✓ | ✗ | ✗
Webots [18] | ✗ | C | C+D | C+D | 3D | <10 | ✓ | ✓ | ✓ | ✗ | ✗
ARGOS [24] | ✗ | C | C+D | C+D | 2D&3D | <1000 | ✓ | ✓ | ✓ | ✗ | ✗
VMAS | ✓ | C | C+D | C+D | 2D | <100 | ✓ | ✓ | ✓ | ✓ | ✓
Argos is able to simulate swarms in the thousands of agents by assigning parts of the simulation space to different physics engines with different simulation goals and fidelity. Furthermore, it uses CPU parallelization through multi-threading. Despite these features, none of the simulators described are fast enough to be usable in MARL training. This is because they prioritize realistic full-stack multi-robot simulation over speed, and they do not leverage GPU acceleration for parallel simulations. This focus on realism is not always necessary in MARL. In fact, most collective coordination problems can be decoupled from low-level problems relating to sensing and control. When these problems can be efficiently solved independently without loss of generality, fast high-level simulation provides an important tool. This insight is the key factor motivating the holonomicity assumption in VMAS.
3 The VMAS Platform
The unique characteristic that distinguishes VMAS from the related works compared in Table 1 is that our platform brings together multi-agent learning and environment vectorization. Vectorization is a key component for speeding up MARL training. In fact, an on-policy training iteration (we illustrate an on-policy training iteration here, but simulation is a key component of any type of MARL algorithm) is comprised
of simulated team rollouts and a policy update. During the rollout phase of iteration k, simulations are performed to collect experiences from the agents' interactions with the environment according to their policy πk. The collected experiences are then used to update the team policy. The new policy πk+1 is employed in the rollout phase of the next training iteration. The rollout phase usually constitutes the bottleneck of this process. Vectorization allows parallel simulation and helps alleviate this issue. Inspired by the modularity of some existing solutions, like MPE [16], we created our framework as a new scalable platform for running and creating MARL benchmarks. With this goal in mind, we developed VMAS following a set of tenets:
– Vectorized. VMAS can step any number of environments in parallel. This significantly reduces the time needed to collect rollouts for training in MARL.
– Simple. Complex vectorized physics engines exist (e.g., Brax [8]), but they do not scale efficiently when dealing with multiple agents, defeating the computational speed goal set by vectorization. VMAS uses a simple custom 2D dynamics engine written in PyTorch to provide fast simulation.
– General. The core of VMAS is structured so that it can be used to implement general high-level multi-robot problems in 2D. It supports adversarial as well as cooperative scenarios. Holonomic robot simulation shifts the focus to high-level coordination, obviating the need to learn low-level controls using MARL.
– Extensible. VMAS is not just a simulator with a set of environments. It is a framework that can be used to create new multi-agent scenarios in a format usable by the whole MARL community. For this purpose, we have modularized our framework to enable new task creation and introduced interactive rendering to debug scenarios.
– Compatible. VMAS has multiple wrappers that make it directly compatible with different MARL interfaces, including RLlib [15] and Gym [7]. RLlib provides a large number of already-implemented RL algorithms.
Let us break down VMAS's structure in depth.
Interface. The structure of VMAS is illustrated in Fig. 2. It has a vectorized interface, which means that an arbitrary number of environments can be stepped in parallel in a batch. In Sect. 5, we demonstrate how vectorization grants important speed-ups on the CPU and seamless scaling on the GPU. While the standard simulator interface uses PyTorch [22] to enable feeding tensors directly as input/output, we provide wrappers for the standard non-vectorized OpenAI Gym [7] interface and for the vectorized interface of the RLlib [15] framework. This enables users to effortlessly access the range of RL training algorithms already available in RLlib. Actions for all environments and agents are fed to VMAS at every simulation step. VMAS supports movement and inter-agent communication actions, both of which can be either continuous or discrete. The interface of VMAS provides rendering through Pyglet [1].
Scenario. Scenarios encode the multi-agent task that the team is trying to solve. Custom scenarios can be implemented in a few hours and debugged using
Fig. 2. VMAS structure. VMAS has a vectorized MARL interface (left) with wrappers for compatibility with OpenAI Gym [7] and the RLlib RL library [15]. The default VMAS interface uses PyTorch [22] and can be used for feeding input already on the GPU. Multi-agent tasks in VMAS are defined as scenarios (center). To define a scenario, it is sufficient to implement the listed functions. Scenarios access the VMAS core (right), where agents and landmarks are simulated in the world using a 2D custom written physics module.
interactive rendering. Interactive rendering is a feature where agents in scenarios can be controlled by users in a videogame-like fashion and all environment-related data is printed on screen. To implement a scenario, it is sufficient to define a few functions: make_world creates the agents and landmarks for the scenario and spawns them in the world; reset_world_at resets a specific environment in the batch or all environments at the same time; reward returns the reward of one agent for all environments; observation returns the agent's observations for all environments. Optionally, done and info can be implemented to provide an ending condition and extra information. Further documentation on how to create new scenarios is available in the repository and in the code.
Core. Scenarios interact with the core. This is where the world simulation is stepped. The world contains $n$ entities, which can be agents or landmarks. Entities have a shape (sphere, box, or line) and a vectorized state $(x_i, \dot{x}_i, \theta_i, \dot{\theta}_i)$, $\forall i \in [1..n] \equiv N$, which contains their position $x_i \in \mathbb{R}^2$, velocity $\dot{x}_i \in \mathbb{R}^2$, rotation $\theta_i \in \mathbb{R}$, and angular velocity $\dot{\theta}_i \in \mathbb{R}$ for all environments. Entities have a mass $m_i \in \mathbb{R}$ and a maximum speed, and can be customized to be movable, rotatable, and collidable. Agents' actions consist of physical actions, represented as forces $f_i^a \in \mathbb{R}^2$, and optional communication actions. In the current state of the simulator, agents cannot control their orientation. Agents can either be controlled from the interface or by an "action script" defined in the scenario. Optionally, the simulator can introduce noise to the actions and observations. Custom sensors can be added to agents; we currently support LIDARs. The world has a simulation step $\delta t$, a velocity damping coefficient $\zeta$, and customizable gravity $g \in \mathbb{R}^2$.
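As an illustration of this scenario interface, the sketch below shows the skeleton of a hypothetical single-agent navigation scenario. The function names match those listed above; the class names, constructor arguments, and helper signatures are assumptions that may vary between VMAS versions.

import torch
from vmas.simulator.core import Agent, Landmark, Sphere, World
from vmas.simulator.scenario import BaseScenario

class NavigateScenario(BaseScenario):  # hypothetical example scenario
    def make_world(self, batch_dim: int, device: torch.device, **kwargs):
        # A single World object holds the state of all batch_dim parallel environments.
        world = World(batch_dim, device)
        self.agent = Agent(name="agent_0", shape=Sphere(radius=0.05))
        world.add_agent(self.agent)
        self.goal = Landmark(name="goal", collide=False)
        world.add_landmark(self.goal)
        return world

    def reset_world_at(self, env_index=None):
        # env_index=None resets every environment in the batch at once.
        for entity in self.world.agents + self.world.landmarks:
            n = self.world.batch_dim if env_index is None else 1
            pos = torch.empty((n, 2), device=self.world.device).uniform_(-1.0, 1.0)
            entity.set_pos(pos, batch_index=env_index)

    def reward(self, agent):
        # Vectorized reward: one value per parallel environment.
        return -torch.linalg.vector_norm(agent.state.pos - self.goal.state.pos, dim=-1)

    def observation(self, agent):
        # Vectorized observation: one row per parallel environment.
        return torch.cat(
            [agent.state.pos, agent.state.vel, self.goal.state.pos - agent.state.pos],
            dim=-1,
        )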
VMAS has a force-based physics engine. The simulation step therefore uses the forces at time $t$ to integrate the state using a semi-implicit Euler method [19]:

\[
\begin{cases}
f_i(t) = f_i^a(t) + f_i^g + \sum_{j \in N \setminus \{i\}} f_{ij}^e(t) \\
\dot{x}_i(t+1) = (1-\zeta)\,\dot{x}_i(t) + \dfrac{f_i(t)}{m_i}\,\delta t \\
x_i(t+1) = x_i(t) + \dot{x}_i(t+1)\,\delta t
\end{cases}
\tag{1}
\]

where $f_i^a$ is the agent action force, $f_i^g = m_i g$ is the force deriving from gravity, and $f_{ij}^e$ is the environmental force used to simulate collisions between entities $i$ and $j$. It has the following form:

\[
f_{ij}^e(t) =
\begin{cases}
c \, \dfrac{x_{ij}(t)}{\lVert x_{ij}(t) \rVert} \, k \log\!\left(1 + e^{-(\lVert x_{ij}(t) \rVert - d_{\min})/k}\right) & \text{if } \lVert x_{ij}(t) \rVert \le d_{\min} \\
0 & \text{otherwise}
\end{cases}
\tag{2}
\]
Here, $c$ is a parameter regulating the force intensity, $x_{ij}$ is the relative position between the closest points on the shapes of the two entities, and $d_{\min}$ is the minimum distance allowable between them. The term inside the logarithm computes a scalar proportional to the penetration of the two entities, parameterized by a coefficient $k$; this term is then multiplied by the normalized relative position vector. Collision intensity and penetration can be tuned by regulating $c$ and $k$. This is the same collision system used in OpenAI MPE [16]. The simulation step used for the linear state is also applied to the angular state:

\[
\begin{cases}
\tau_i(t) = \sum_{j \in N \setminus \{i\}} r_{ij}(t) \times f_{ij}^e(t) \\
\dot{\theta}_i(t+1) = (1-\zeta)\,\dot{\theta}_i(t) + \dfrac{\tau_i(t)}{I_i}\,\delta t \\
\theta_i(t+1) = \theta_i(t) + \dot{\theta}_i(t+1)\,\delta t
\end{cases}
\tag{3}
\]

Here, $r_{ij} \in \mathbb{R}^2$ is the vector from the center of the entity to the colliding point, $\tau_i$ is the torque, and $I_i$ is the moment of inertia of the entity. The rules regulating the physics simulation in the core are basic 2D dynamics implemented in a vectorized manner using PyTorch. They simulate holonomic (unconstrained-motion) entities only.
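For illustration, the updates in Eqs. (1) and (2) can be expressed over the whole batch in a few lines of PyTorch. The sketch below is not the simulator's actual code; parameter values and the clamping of the distance are our own assumptions.

import torch

def collision_force(x_ij, d_min=0.05, c=1.0, k=0.02):
    # x_ij: (num_envs, 2) relative positions between closest points, as in Eq. (2).
    dist = torch.linalg.vector_norm(x_ij, dim=-1, keepdim=True).clamp_min(1e-6)
    # Softplus penetration term: k * log(1 + exp(-(||x_ij|| - d_min) / k)).
    penetration = k * torch.nn.functional.softplus(-(dist - d_min) / k)
    force = c * penetration * x_ij / dist  # scaled, normalized relative position
    return torch.where(dist <= d_min, force, torch.zeros_like(force))

def semi_implicit_euler(x, x_dot, f, m, dt=0.1, zeta=0.25):
    # Eq. (1), batched over environments: update velocity first, then position.
    x_dot = (1 - zeta) * x_dot + (f / m) * dt
    x = x + x_dot * dt
    return x, x_dot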
4 Multi-robot Scenarios
Alongside VMAS, we introduce a set of 12 multi-robot scenarios. These scenarios contain various multi-robot problems, which require complex coordination, such as leveraging heterogeneous behaviour and inter-agent communication, to be solved. While the ability to send communication actions is not used in these scenarios, communication can be used in the policy to improve performance. For example, Graph Neural Networks (GNNs) can be used to overcome partial observability through information sharing [4].
Each scenario delimits the agents' input by defining the set of their observations. This set typically contains the minimum observations needed to solve the task (e.g., position, velocity, sensory input, goal position). Scenarios can be made arbitrarily harder or easier by modifying these observations. For example, if the agents are trying to transport a package, the precise relative distance to the package can be removed from the agent inputs and replaced with LIDAR measurements. Removing global observations from a scenario is a good incentive for inter-agent communication. All tasks contain numerous parametrizable components. Every scenario comes with a set of tests, which run a local heuristic on all agents. Furthermore, we vectorize and port all 9 scenarios from MPE [16] to VMAS. In this section, we give a brief overview of our new scenarios. For more details (e.g., observation space, reward), in-depth descriptions are available in the VMAS repository.
Transport (Fig. 1a). N agents have to push M packages to a goal. Packages have a customizable mass and shape. Single agents are not able to move a high-mass package by themselves. Cooperation with teammates is thus needed to solve the task.
Wheel (Fig. 1b). N agents have to collectively rotate a line. The line is anchored to the origin and has a parametrizable mass and length. The team's goal is to bring the line to a desired angular velocity. Lines with a high mass are impossible for single agents to push. Therefore, the team has to organize itself with agents on both sides to increase and reduce the line's velocity.
Balance (Fig. 1c). N agents are spawned at the bottom of a world with vertical gravity. A line is spawned on top of them. The agents have to transport a spherical package, positioned randomly on top of the line, to a given goal at the top. The package has a parametrizable mass and the line can rotate.
Give Way (Fig. 1d). Two agents start in front of each other's goals in a symmetric environment. To solve the task, one agent has to give way to the other by using a narrow space in the middle of the environment.
Football (Fig. 1e). A team of N blue agents competes against a team of M red agents to score a goal. By default, red agents are controlled by a heuristic AI, but self-play is also possible. Cooperation among teammates is required to coordinate attacking and defensive maneuvers. Agents need to communicate and assume different behavioural roles in order to solve the task.
Passage (Fig. 1f). 5 agents, starting in a cross formation, have to reproduce the same formation on the other side of a barrier. The barrier has M passages (M = 1 in the figure). Agents are penalized for colliding with each other and with the barrier. This scenario is a generalization of the one considered in [4].
Reverse Transport (Fig. 1g). This task is the same as Transport, except only one package is present. Agents are spawned inside of it and need to push it to the goal.
Dispersion (Fig. 1h). There are N agents and N food particles. Agents start in the same position and need to cooperatively eat all the food. Most MARL algorithms cannot solve this task (without communication or observations from other agents) as they are constrained by the behavioural homogeneity deriving from
parameter sharing. Heterogeneous behaviour is thus needed for each agent to tackle a different food particle.
Dropout (Fig. 1i). N agents have to collectively reach one goal. To complete the task, it is enough for only one agent to reach the goal. The team receives an energy penalty proportional to the sum of all the agents' controls. Therefore, agents need to organize themselves to send only the closest robot to the goal, saving as much energy as possible.
Flocking (Fig. 1j). N agents have to flock around a target without colliding with each other or with M obstacles. Flocking has been an important benchmark in multi-robot coordination for years, with early solutions simulating behaviour according to local rules [27] and more recent work using learning-based approaches [33]. In contrast to related work, our flocking environment contains static obstacles.
Discovery (Fig. 1k). N agents have to coordinate to cover M targets as quickly as possible while avoiding collisions. A target is considered covered if K agents have approached it within a distance D. After a target is covered, the K covering agents each receive a reward, and the target is re-spawned at a random position. This scenario is a variation of the Stick Pulling Experiment [9]; while it can be solved without communication, it has been shown that communication significantly improves performance for N < M.
Waterfall (Fig. 1l). N agents move from top to bottom through a series of obstacles. This is a testing scenario that can be used to discover VMAS's functionalities.
5 Comparison with MPE
In this section, we compare the scalability of VMAS and MPE [16]. Given that we vectorize and port all the MPE scenarios to VMAS, we can compare the two simulators on the same MPE task. The task chosen is "simple spread", as it contains multiple collidable agents in the same environment. VMAS and MPE use two completely different execution paradigms: VMAS, being vectorized, leverages the Single Instruction Multiple Data (SIMD) paradigm, while MPE uses the Single Instruction Single Data (SISD) paradigm. Therefore, it is sufficient to report the benefits of this paradigm shift on only one task, as the benefits are task-independent. In Fig. 3, we show the growth in execution time with respect to the number of environments stepped in parallel for the two simulators. MPE runs only on the CPU, while VMAS, using PyTorch, runs both on the CPU and on the GPU. In this experiment, we compare the two simulators on an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00 GHz, and we also run VMAS on an NVIDIA GeForce RTX 2080 Ti. The results show the impact of vectorization on simulation speed. On the CPU, VMAS is up to 5× faster than MPE. On the GPU, the simulation time for VMAS is independent of the number of environments, running up to 100× faster. The same results can be reproduced on different hardware; in the VMAS repository, we provide a script to repeat this experiment.
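The essence of such a scaling experiment can be sketched as follows. This is a hypothetical reconstruction, not the script shipped in the repository; the scenario name and API details are assumptions.

import time
import torch
from vmas import make_env

def time_100_steps(num_envs: int, device: str = "cuda") -> float:
    # Wall-clock time for 100 batched steps of the ported "simple spread" scenario.
    env = make_env(scenario="simple_spread", num_envs=num_envs, device=device, continuous_actions=True)
    env.reset()
    actions = [torch.zeros((num_envs, 2), device=device) for _ in env.agents]
    start = time.perf_counter()
    for _ in range(100):
        env.step(actions)
    return time.perf_counter() - start

for n in (1, 100, 1000, 10000, 30000):
    print(f"{n} environments: {time_100_steps(n):.2f} s")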
6 Experiments and Benchmarks
We run a set of training experiments to benchmark the performance of MARL algorithms on four VMAS scenarios. Thanks to VMAS’s vectorization, we are able to perform a training iteration (comprising 60,000 environment interactions and deep neural network training) in 25 s on average. The runs reported in this section all took under 3 h to complete. The models compared are all based on Proximal Policy Optimization [29], an actor-critic RL algorithm. The actor is a Deep Neural Network (DNN) which outputs actions given the observations, and the critic is a DNN (used only during training) which, given the observations, outputs a value representing the goodness of the current state and action. We refer to the actor and critic as centralized when they have access to all the agents’ observations and output all the agents’ actions/values, and we call them decentralized when they only map one agent’s observations to its action/value. The models compared are:

– CPPO: This model uses a centralized critic and actor. It treats the multi-agent problem as a single-agent problem with one super-agent.
– MAPPO [37]: This model uses a centralized critic and a decentralized actor. Therefore, the agents act independently, with local decentralized policies, but are trained with centralized information.
– IPPO [36]: This model uses a decentralized critic and actor. Every agent learns and acts independently. Model parameters are shared among agents so they can benefit from each other’s experiences.
– HetIPPO: We customize IPPO to disable parameter sharing, making each agent’s model unique.
– Heuristic: This is a hand-designed decentralized heuristic, different for each task.
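A sketch of what these centralization choices mean for network input/output shapes, assuming simple MLPs; all dimensions are illustrative and not the paper’s.

import torch
import torch.nn as nn

n_agents, obs_dim = 4, 8

# Decentralized critic (IPPO): maps one agent's observation to its value.
dec_critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

# Centralized critic (CPPO/MAPPO): the input is the concatenation of all
# the agents' observations; here it outputs one value per agent jointly.
cen_critic = nn.Sequential(
    nn.Linear(n_agents * obs_dim, 64), nn.Tanh(), nn.Linear(64, n_agents)
)

obs = torch.rand(n_agents, obs_dim)             # one observation per agent
v_dec = dec_critic(obs)                         # (n_agents, 1): independent values
v_cen = cen_critic(obs.flatten().unsqueeze(0))  # (1, n_agents): joint values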
Fig. 3. Comparison of the scalability of VMAS and MPE [16] with the number of parallel environments. In this plot, we show the execution time of the “simple spread” scenario for 100 steps. MPE does not support vectorization and thus cannot be run on a GPU.
Experiments are run in RLlib [15] using the vectorized interface. We run all algorithms for 400 training iterations. Each training iteration is performed over 60,000 environment interactions. We plot the mean and standard deviation of the mean episode reward² over 10 runs with different seeds. The model used for all critics and actors is a two-layer Multi-Layer Perceptron (MLP) with hyperbolic tangent activations. A video of the learned policies is available at https://youtu.be/aaDRYfiesAY. In the following, we discuss the results for the trained scenarios.

Transport (Fig. 4a). In the Transport environment, only IPPO is able to learn the optimal policy. This is because the other models, which have centralized components, have an input space consisting of the concatenation of all the agents’ observations. Consequently, centralized architectures fail to generalize in environments requiring high initial exploration like this one, where there is high variance in the possible joint states (and therefore a low probability that a similar state will be encountered).
Fig. 4. Benchmark performance of different PPO-based MARL algorithms in four VMAS scenarios. Experiments are run in RLlib [15]. Each training iteration is performed over 60,000 environment interactions. We plot the mean and standard deviation of the mean episode reward² over 10 runs with different seeds.

² The episode reward mean is the mean of the total rewards of the episodes contained in the training iteration.
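To make the plotted statistic concrete, here is a small sketch of how it could be computed; function names and array shapes are illustrative, not the authors’ code.

import numpy as np

def episode_reward_mean(episode_total_rewards):
    # Mean of the total rewards of the episodes in one training iteration.
    return float(np.mean(episode_total_rewards))

def curve_stats(per_seed_curves):
    # per_seed_curves: (n_seeds, n_iterations) array of episode reward means;
    # returns the mean and standard deviation across seeds, as plotted.
    arr = np.asarray(per_seed_curves)
    return arr.mean(axis=0), arr.std(axis=0)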
Wheel (Fig. 4b). The Wheel environment proved to be a hard task for MARL algorithms. Here, none of the models were able to solve the task, and all performed worse than the heuristic.

Balance (Fig. 4c). In Balance, all models were able to solve the task and outperform the heuristic. However, this is largely due to the use of a large observation space containing global information. The task can be made arbitrarily harder by removing part of the observation space and thus increasing partial observability.

Give Way (Fig. 4d). The Give Way scenario shows that only algorithms able to develop heterogeneous agent behaviour can solve the environment. In fact, IPPO and MAPPO, which use parameter sharing and decentralized actors, fail this scenario. On the other hand, the scenario can be solved either through a centralized actor (CPPO) or by disabling parameter sharing and allowing agent policies to be heterogeneous (HetIPPO).

The experimental results confirm that VMAS offers a selection of scenarios which prove challenging in orthogonal ways for current state-of-the-art MARL algorithms. We show that there is no one-size-fits-all solution and that our scenarios can provide a valuable benchmark for new MARL algorithms. In addition, vectorization enables faster training, which is key to wider adoption of multi-agent learning in the robotics community.
7 Conclusion
In this work, we introduced VMAS, an open-source vectorized simulator for multi-robot learning. VMAS uses PyTorch and is composed of a core vectorized 2D physics simulator and a set of multi-robot scenarios, which encode hard collective robotic tasks. The focus of this framework is to act as a platform for MARL benchmarking. Therefore, to incentivize contributions from the community, we made implementing new scenarios as simple and modular as possible. We showed the computational benefits of vectorization with up to 30,000 parallel simulations executed in under 10 s on a GPU. We benchmarked the performance of MARL algorithms on our scenarios. During our training experiments, we were able to collect 60,000 environment steps and perform a training iteration in under 25 s. Experiments also showed how VMAS scenarios prove difficult in orthogonal ways for state-of-the-art MARL algorithms. In the future, we plan to extend the features of VMAS to widen its adoption, continuing to implement new scenarios and benchmarks. We are also interested in modularizing the physics engine, enabling users to swap vectorized engines with different fidelities and computational demands. Acknowledgements. This work was supported by ARL DCIST CRA W911NF-17-2-0181 and European Research Council (ERC) Project 949940 (gAIa). R. Kortvelesy was supported by Nokia Bell Labs through their donation for the Centre of Mobile, Wearable Systems and Augmented Intelligence to the University of Cambridge. J. Blumenkamp acknowledges the support of the ‘Studienstiftung des deutschen Volkes’ and an EPSRC tuition fee grant.
References
1. Pyglet. https://pyglet.org/
2. Baker, B., et al.: Emergent tool use from multi-agent autocurricula. In: International Conference on Learning Representations (2019)
3. Bernstein, D.S., Givan, R., Immerman, N., Zilberstein, S.: The complexity of decentralized control of Markov decision processes. Math. Oper. Res. 27(4), 819–840 (2002)
4. Blumenkamp, J., Morad, S., Gielis, J., Li, Q., Prorok, A.: A framework for real-world multi-robot systems running decentralized GNN-based policies. arXiv preprint arXiv:2111.01777 (2021)
5. Bradbury, J., et al.: JAX: composable transformations of Python+NumPy programs (2018). http://github.com/google/jax
6. Bräysy, O., Gendreau, M.: Vehicle routing problem with time windows, Part II: metaheuristics. Transp. Sci. 39(1), 119–139 (2005)
7. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
8. Freeman, C.D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., Bachem, O.: Brax - a differentiable physics engine for large scale rigid body simulation (2021). http://github.com/google/brax
9. Ijspeert, A.J., Martinoli, A., Billard, A., Gambardella, L.M.: Collaboration through the exploitation of local interactions in autonomous collective robotics: the stick pulling experiment. Auton. Robot. 11(2), 149–171 (2001)
10. Jiang, S., Amato, C.: Multi-agent reinforcement learning with directed exploration and selective memory reuse. In: Proceedings of the 36th Annual ACM Symposium on Applied Computing, pp. 777–784 (2021)
11. Koenig, N., Howard, A.: Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), pp. 2149–2154. IEEE (2004)
12. Kurach, K., et al.: Google research football: a novel reinforcement learning environment. Proc. AAAI Conf. Artif. Intell. 34, 4501–4510 (2020)
13. Lange, R.T.: gymnax: a JAX-based reinforcement learning environment library (2022). http://github.com/RobertTLange/gymnax
14. Li, Q., Gama, F., Ribeiro, A., Prorok, A.: Graph neural networks for decentralized multi-robot path planning. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11785–11792. IEEE (2020)
15. Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. In: International Conference on Machine Learning, pp. 3053–3062. PMLR (2018)
16. Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Pieter Abbeel, O., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. (2017)
17. Makoviychuk, V., et al.: Isaac Gym: high performance GPU based physics simulation for robot learning. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) (2021)
18. Michel, O.: Cyberbotics Ltd. Webots™: professional mobile robot simulation. Int. J. Adv. Robot. Syst. 1, 5 (2004)
19. Niiranen, J.: Fast and accurate symmetric Euler algorithm for electromechanical simulations. In: Electrimacs 99 (Modelling and Simulation of Electric Machines, Converters and Systems), pp. I–71 (1999)
20. Noori, F.M., Portugal, D., Rocha, R.P., Couceiro, M.S.: On 3D simulators for multi-robot systems in ROS: MORSE or Gazebo? In: 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), pp. 19–24 (2017)
21. Panerati, J., Zheng, H., Zhou, S., Xu, J., Prorok, A., Schoellig, A.P.: Learning to fly - a Gym environment with PyBullet physics for reinforcement learning of multi-agent quadcopter control. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7512–7519. IEEE (2021)
22. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. (2019)
23. Peng, B., et al.: FACMAC: factored multi-agent centralised policy gradients. Adv. Neural Inf. Process. Syst. 12208–12221 (2021)
24. Pinciroli, C., et al.: ARGoS: a modular, parallel, multi-engine simulator for multi-robot systems. Swarm Intell. 6, 271–295 (2012)
25. Prorok, A.: Robust assignment using redundant robots on transport networks with uncertain travel time. IEEE Trans. Automat. Sci. Eng. 17, 2025–2037 (2020)
26. Resnick, C., et al.: Pommerman: a multi-agent playground. CoRR (2018)
27. Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25–34 (1987)
28. Samvelyan, M., et al.: The StarCraft multi-agent challenge. CoRR (2019)
29. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
30. Shen, J., Xiao, E., Liu, Y., Feng, C.: A deep reinforcement learning environment for particle robot navigation and object manipulation. arXiv preprint arXiv:2203.06464 (2022)
31. Suarez, J., Du, Y., Isola, P., Mordatch, I.: Neural MMO: a massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint arXiv:1903.00784 (2019)
32. Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE (2012)
33. Tolstaya, E., Gama, F., Paulos, J., Pappas, G., Kumar, V., Ribeiro, A.: Learning decentralized controllers for robot swarms with graph neural networks. In: Kaelbling, L.P., Kragic, D., Sugiura, K. (eds.) Proceedings of the Conference on Robot Learning, Proceedings of Machine Learning Research, vol. 100, pp. 671–682. PMLR (2020). https://proceedings.mlr.press/v100/tolstaya20a.html
34. Wang, B., Liu, Z., Li, Q., Prorok, A.: Mobile robot path planning in dynamic environments through globally guided reinforcement learning. IEEE Robot. Autom. Lett. 5, 6932–6939 (2020)
35. Weng, J., et al.: EnvPool: a highly parallel reinforcement learning environment execution engine. arXiv preprint arXiv:2206.10558 (2022)
36. de Witt, C.S., et al.: Is independent learning all you need in the StarCraft multi-agent challenge? arXiv preprint arXiv:2011.09533 (2020)
37. Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A., Wu, Y.: The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)
38. Zheng, L., et al.: MAgent: a many-agent reinforcement learning platform for artificial collective intelligence. In: Proceedings of the AAAI Conference on Artificial Intelligence (2018)
39. Zheng, X., Koenig, S., Kempe, D., Jain, S.: Multirobot forest coverage for weighted and unweighted terrain. IEEE Trans. Robot. 26, 1018–1031 (2010)
FLAM: Fault Localization and Mapping

Guillaume Ricard(B), David Vielfaure, and Giovanni Beltrame

Polytechnique Montréal, Montreal, Canada
[email protected]
Abstract. Multi-robot systems are increasingly used in several industry automation and warehouse management applications, mostly with a centralized hub for coordination. Several decentralized infrastructures have been studied for using multi-robot systems in real mission scenarios like search-and-rescue, area coverage and exploration. However, despite rigorous methods for using multi-robot systems in a decentralized setting, long-term field deployments still seem infeasible. The lack of proper infrastructure for tackling fault detection is one of the great challenges in this regard. We propose FLAM (https://github.com/MISTLab/FLAM), a fault localization and mapping algorithm that detects faults in a robotic system and uses them to build a map of the environmental hazards, effectively providing risk-awareness to the robotic team.
Keywords: Swarm Robotics · Robot Safety · Failure Detection and Recovery

1 Introduction
Using a team of robots, as opposed to a single one, to carry out a given task can result in significant performance gains, especially if the task requires terrain coverage. Indeed, with proper coordination between the robots, the speed at which the terrain is covered should increase proportionally with the number of robots in the team [1]. Time-critical applications like search-and-rescue scenarios and forest fire reckoning are therefore well suited for multi-robot applications. However, in most cases, the environment in which the task is being carried out is unknown and, unfortunately, the use of robotic systems in such settings remains a challenge. Robots are prone to exhibiting failures when facing the random hazards and constantly changing conditions of such environments. Additionally, the agents forming a robot swarm are generally simple and inefficient on their own [2] and, as a result, are not particularly tolerant to faults. If in theory the inherent redundancy of robot swarms should provide increased robustness, it has been demonstrated that, in practice, robot swarms can even exhibit lower reliability levels than traditional centralized robot systems [3]. Individual failures often lead to a complete system failure, challenging the assumption that swarm systems are robust by nature [4]. Developing a tool
that tackles failures in decentralized robotic systems is therefore of great importance. The Science Robotics journal even sees it as one of the grand challenges of the next decade [5]. Furthermore, a robot could itself be fully operational but, under certain external factors, experience temporary sensor or actuator failures. These environmental factors are, in a number of cases, location-based (for instance, a mud pit could cause wheel slippage in ground robots), and information about these locations could provide increased situational awareness for robots to effectively perform a given mission. In this work, we design a back-end and a front-end that can be used to determine these environmental hindrances and create a map of these factors that can be used by robots to gain risk awareness. In the front-end, we design feature vectors that are periodically shared between neighbors in the swarm. The feature vectors are then used by an outlier detector to determine if an agent’s behavior is normal or abnormal. An agent is flagged as faulty if its behavior is abnormal; the abnormality is presumed to be the result of a fault. The back-end obtains the identified faults along with their locations and creates a distributed belief map of the environmental hazards. Higher levels of risk are associated with areas where failures have been detected in the past. The belief map is updated and shared periodically across the swarm, which in turn provides risk-awareness to the agents to perform their given task. In this paper, we make the following contribution to the field of swarm robotics: a decentralized fault detection and fault mapping mechanism that provides risk-awareness to autonomous robot swarms. Based on this contribution, we name our algorithm FLAM: Fault Localization And Mapping. The paper is structured as follows: in Sect. 2 some relevant related works are presented; in Sect. 3 we describe the workings of our system; in Sect. 4 we present results obtained from simulations; and finally, in Sect. 5 we draw conclusions.
2 Related Work and Background

2.1 Fault Detection
Faults are inevitable in robotic systems; according to Khalastchi et al. [6], the key is to identify and recover from them as quickly as possible. In a swarm of robots, the problem is slightly different. Because there is usually a large number of agents forming the swarm, losing one of them to a fault should not lead to a global system failure. However, it is critically important to gain insights from previous faults so as to avoid having the same fault propagate to the entire swarm. A taxonomy of the faults affecting robotic systems is presented in [6]. Hardware faults affect physical components of the robots and compromise their sensing and acting abilities. Software faults affect the behavior of the robots and are caused by faulty algorithms and/or faulty implementations. Finally, interaction faults affect the dynamics of the robotic system and are caused by exogenous events. In this work, we aim at detecting faults caused by location-based hazards; consequently, FLAM focuses on interaction faults. Fault detection methods have
been proposed in the past, and the three main families are presented in the same survey [6]. (i) Data-driven approaches use sensor data and compare it to past normal/abnormal behaviors or to the behavior of neighboring agents to detect faults. (ii) Model-based approaches use an a priori explicit model of the system to identify faults. (iii) Knowledge-based approaches are similar to the way a human would perform fault detection. Faults are typically represented in a tree structure where the fault can be traced backwards to where it originated. The main drawbacks of the last two approaches are the need for an a priori model of the system and their inefficiency in detecting faults in dynamic environments. Additionally, two strategies can be applied to fault detection: endogenous fault detection and exogenous fault detection [7,8]. Endogenous fault detection refers to the actions taken by an individual to perform fault detection on itself and on its own. It is the robot’s responsibility to determine if it is in a faulty state. In contrast, exogenous fault detection refers to the actions taken by neighboring robots to detect faults on an observed entity. This approach relies on the observations of external robots and requires the collaboration of neighboring agents. Exogenous data-driven fault detection methods are the best suited for robot swarms. Exogenous strategies leverage the power of swarms: the collaboration of all team members. Data-driven methods allow the detection of previously unseen faults and, as a result, enable the deployment of the swarm in unknown and dynamic environments. An exogenous data-driven fault detection method inspired by the human immune system has been proposed by Tarapore et al. [9–11]. Binary feature vectors that characterize agents’ behavior are periodically shared among robots of the swarm. Then, using the mathematical formulation of the cross-regulation model along with the feature vectors, outliers are detected. The method detects faults by assuming that a behavioral abnormality is the result of a fault. Non-binary feature vectors could be used to perform fault detection using more common outlier detection methods like K-nearest neighbors (KNN) and local outlier factor [12]. A neural network fault detection approach was proposed in [13], where faults are presumed to cause changes in the data flow of the system. By monitoring the flow, the presence of a fault can be inferred. Of course, such a method requires learning, which can be challenging where failures are concerned. However, because outlier detection identifies abnormal behaviors and not faults directly, it is hard to perform diagnosis and identify the adequate recovery procedure. In [14,15], the authors address the problem by adding a fault diagnosis layer, using feature vectors, after the fault detection layer.
2.2 Risk Awareness
Several robot swarm applications considering risk have been proposed in the past. In SPIDER [16], agents are deployed into a dangerous environment and maintain connectivity through chain formation. Agents display shy behaviors when the frequency of social interactions is low and bold behaviors when interactions are numerous. In that work, danger is represented through connectivity, where being disconnected from the other agents is the risk the swarm faces.
Ono et al. and Vitus et al. introduced the concept of a “risk budget” [17,18] and leverage it both to maximize reward and to optimize the allocation of risk. Unfortunately, global knowledge of the environment is required, and the approach is therefore ill-suited for tasks performed in unknown environments. Vielfaure et al. proposed DORA-Explorer [19], a risk-aware exploration algorithm that maximizes terrain coverage while minimizing exposure to risk in a fully decentralized fashion. The method assumes that the risk associated with the environment can be directly sensed by the agents using an on-board sensor. Risk levels are then used to compute a direction that will provide meaningful exploration gains while minimizing exposure to risk. In all these methods the nature of the risk is known by the robots: in SPIDER [16] risk is connectivity, in Ono et al. and Vitus et al. [17,18] a global state of the environment is needed, and in DORA-Explorer [19] risk can be measured by the agents. However, in some cases the nature of the risk might not be known, especially if the mission is being carried out in dynamic and unknown environments. In FLAM we instead use a fault detector and leverage the identified failures to build a map of the environmental hazards. Risk is never directly sensed; it is inferred from the locations of previous failures.
2.3 Distributed Storage
Distributed information sharing remains a challenge in multi-robot applications. Limitations in the communication capabilities of robots generally complicate achieving consistency among the robotic team [20,21]. The virtual stigmergy presented by Pinciroli et al. [22] tackles the problem and achieves an eventually coherent set of data across the swarm. It is specifically designed for decentralized robotic systems operating in environments with heavy packet loss and partial connectivity. Using Conflict-Free Replicated Data Types (CRDTs), the virtual stigmergy allows robots to share and agree on sets of (key, value) pairs and enables robust decentralized information sharing. The Buzz programming language [23] directly integrates the virtual stigmergy and allows straightforward robot swarm software development. Another decentralized data storage mechanism has recently been proposed by Majcherczyk et al. [24], where agents each store some amount of information using a fitness-based partitioning scheme. Data is not fully replicated across the agents of the swarm, allowing the swarm to store more information system-wide. However, because agents only store a portion of the swarm’s information, they are less likely to have access to the information they need. Occupancy maps and belief maps have been extensively used for robotic exploration purposes [25–27]. Occupancy maps discretize the environment into cells and record the presence or absence of a feature, generally an obstacle. Whereas occupancy maps are binary, belief maps store probabilities and are proven to offer better exploration performance [28]. The previously mentioned DORA-Explorer algorithm [19] actually leverages a belief map when determining the direction that both optimizes terrain coverage and minimizes exposure to risk.
Fig. 1. FLAM intuition. Figure 1a: Robots of the swarm start exploring an unknown and hazardous environment. Figure 1b: Some robots start experiencing failures due to environmental hazards (in this example a mud pit causing wheel slippage and mountains causing falling damage). Failures are detected using a data-driven exogenous fault detector and their respective locations are recorded in the distributed belief map. Figure 1c: A map of the environmental hazards that embodies the “risk beliefs” is built and can be leveraged to provide risk-awareness to the swarm to perform the given mission.
3 System Model
Taking inspiration from previous works, we build FLAM, an exogenous data-driven fault detection mechanism that is able to identify interaction faults, in other words, faults caused by exogenous events in the environment, modeled as a grid E ⊂ Z². Then, using the detected faults and their respective locations, FLAM builds a belief map of where the hazards are located in the environment and shares it among the swarm using the virtual stigmergy. The hazard map provides risk-awareness to the agents of the swarm and could, for example, later be used for planning purposes to help avoid environmental pitfalls during subsequent exploration efforts. This intuition is summarized in Fig. 1.
3.1 Risk Modelling
Several triggers are taken into account as risk sources. Some hazards pertain to fixed areas matching specific terrain or obstacles in the environment: robots are subject to such faults immediately as they enter the corresponding zone (e.g. slippage). Conversely, more diffuse areas of danger may affect the agents with a given probability as they draw closer, causing interference that may result in robot death or communication jamming; we model these as point radiation sources of varying intensities $(I_j)_{j \in S} \sim \mathcal{U}(0, 1)$, akin to those used in DORA-Explorer [19]. The radiation induced for robot i at position $x_i \in E$ by a single source located at $s_j \in S$ is expressed as:

$$ r_{s_j}(x_i) = \frac{I_j}{1 + \lambda \lVert x_i - s_j \rVert^2} \qquad (1) $$
with λ as a decay constant. To determine whether a robot fails due to its radiation exposure, we sum all individual source radiation levels and add Gaussian noise $b \sim \mathcal{N}(0, 0.005)$ to account for environment and source variability:

$$ r(x_i) = b + \sum_{s_j \in S} r_{s_j}(x_i) \qquad (2) $$
A failure is then triggered with probability $\min\{1, r(x_i)\}$.
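As a concrete illustration, the following sketch implements the risk model of Eqs. (1) and (2) and the failure trigger. The use of numpy positions and the zero-mean noise parameterization are our assumptions, and the decay constant value is illustrative.

import numpy as np

rng = np.random.default_rng(0)

def radiation(x, sources, intensities, lam=1.0):
    # Total radiation r(x_i): sum of per-source terms (Eq. 1) plus noise b.
    r = sum(I / (1.0 + lam * np.sum((x - s) ** 2))
            for s, I in zip(sources, intensities))
    b = rng.normal(0.0, 0.005)  # assumed zero-mean Gaussian noise
    return r + b

def failure_triggered(x, sources, intensities):
    # Failure occurs with probability min{1, r(x_i)}.
    p = min(1.0, radiation(x, sources, intensities))
    return rng.random() < p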
3.2 Feature Vectors
Tarapore et al. [10] introduced a set of boolean feature vectors describing neighbor density, robot motion and obstacle avoidance performance. We extend these with a seventh binary feature-vector component to take into account received network packets, in a similar fashion to the motion speed: the data transfer speed is sampled over the sliding observation window W, and the additional component is set to 1 as long as the current network speed is at least 10% of the last maximum speed. Counting the values added by each agent to the stigmergy preserves the exogenous nature of the approach while still estimating nearby robots’ feature vectors locally.
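A sketch of the added network-speed component, under our reading of the update rule; the class and variable names are illustrative, not from the FLAM implementation.

from collections import deque

class NetworkFeature:
    def __init__(self, window_len):
        # Sliding observation window W of data-transfer speed samples.
        self.window = deque(maxlen=window_len)

    def update(self, current_speed):
        self.window.append(current_speed)
        max_speed = max(self.window)
        # Bit is 1 while the current speed is at least 10% of the last
        # maximum speed observed over the window.
        return 1 if max_speed > 0 and current_speed >= 0.1 * max_speed else 0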
3.3 Fault Detection
We compare two approaches for outlier detection: (i) KNN [12]. Erroneous behavior detection is achieved by comparing the Hamming distance of each agent’s feature vector with its 3 closest neighbors in the stigmergy; the predicted class (faulty or normal) is decided with a majority rule. To perform well early on, in the absence of robots identified as faulty, KNN must be seeded with an inkling of which values may be abnormal. Fortunately, the definition of the feature vector components outlines impossible values for a robot in normal operation (e.g. no movement command while reporting performing obstacle avoidance). (ii) Cross-regulatory model (CRM) [10]. Used in conjunction with the base set of feature vectors in their inception paper by Tarapore et al. [10]. Feature vector values are treated as antigens displayed by agent cells, and their classification result is given by integrating the modeled dynamics of the immune system. Over time, more often encountered values, which are common in the agent population, are more tolerated than outliers, which we assume are the result of a fault. The original C++ implementation is used directly, with feature vector values supplied from Buzz. The local classification results are shared through the stigmergy, and a robot is deemed faulty when the majority of the swarm classifies it as abnormal.
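A minimal sketch of the KNN variant described in (i): Hamming distance to the 3 nearest feature vectors in the stigmergy and a majority rule. The seeding with known-impossible values is omitted, and data structures are illustrative.

def hamming(a, b):
    # Number of differing bits between two binary feature vectors.
    return sum(x != y for x, y in zip(a, b))

def knn_classify(fv, others, k=3):
    # others: list of (feature_vector, label) pairs from the stigmergy;
    # label is True for 'faulty', False for 'normal'.
    nearest = sorted(others, key=lambda o: hamming(fv, o[0]))[:k]
    votes = sum(1 for _, faulty in nearest if faulty)
    return votes > k // 2  # majority rule: True -> classified faulty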
3.4 Distributed Belief Map
Using feature vectors to characterize robot behavior may not allow direct diagnosis but it encompasses various kinds of possible faults: without loss of generality,
we may consider all types of fault equal in terms of risk, as FLAM leaves the task of leveraging these maps to the exploration control software. Thus, we store the detected failure count N_f for each discrete cell in the environment, modeled as a grid. This information can be used with the current time step t to compute an estimate of the failure probability at position (x, y):

$$ p_t(x, y) = \frac{N_f}{t} \qquad (3) $$

As the fault likelihood decreases with time, entries in the stigmergy older than 5 epochs are evicted after each update. In a more realistic application, performing fault diagnosis would let us prioritize which risks to avoid by applying additional weights when constructing the global risk belief map.
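A sketch of the belief-map bookkeeping implied by Eq. (3) and the eviction rule; in the actual system these entries live in the virtual stigmergy rather than in a local dictionary as here.

failures = {}  # (x, y) cell -> list of epochs at which a failure was recorded

def record_failure(cell, epoch):
    failures.setdefault(cell, []).append(epoch)

def evict_stale(epoch, max_age=5):
    # Entries older than 5 epochs are evicted after each update.
    for cell in list(failures):
        failures[cell] = [e for e in failures[cell] if epoch - e <= max_age]
        if not failures[cell]:
            del failures[cell]

def failure_probability(cell, t):
    n_f = len(failures.get(cell, []))
    return n_f / t  # Eq. (3): p_t(x, y) = N_f / t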
3.5 Implementation
With the steps put forward, the FLAM implementation may be summarized as the following algorithm:
Algorithm 1: FLAM implementation

step ← 1, S ← random radiation sources
while True do
    if Failure(S) or consensus against(self) then
        Add failure to map at (self.x, self.y)
    else if step % W = 0 then
        compute features()
    else if step % W = 1 then
        update votes()
    else if step % W = 2 then
        update consensus()
    random search()
    step ← step + 1
Failed robots or robots that were voted against by the majority of the swarm become inactive and stop running FLAM as soon as they are aware that they no longer seem fit for duty. Updates to the stigmergy are one step apart to let the information spread through the shared storage.
4 Simulations

4.1 Experimental Setup
We evaluated FLAM with different kinds of faults in the open-source ARGoS robot swarm simulator [29], performing 50 simulation instances over 1200 time steps.
N = 20 wheeled Khepera IV [30] robots start at random positions on a 20 × 20 grid and perform a random search in an environment with two localized error sources (see Fig. 2): terrain difficulty is modeled by fault-causing areas slowing down robot wheel speed or saturating sensor readings, respectively. Additionally, four radiation sources may trigger robot termination or communication jamming/packet loss. The latter are more challenging as they effectively disconnect a robot from the stigmergy entirely, preventing further voting and consensus updates. The Khepera IV is equipped with an additional near-range communication system using its sensors, which is used to shut down robots cut off from the network that the rest of the swarm has identified as faulty.
Fig. 2. Simulation setup. Figure 2a: Fault-generating areas are present in the top-left (slippage) and bottom-right (sensor saturation) corners, highlighted with rectangles. Red stars represent radiation sources fixed across simulation instances to achieve consistent belief maps. Figure 2b: Robots inside ARGoS; cylinders indicate radiation sources.
Simulation logs are collected for each robot and combined to assess fault detection performance, with simple counting rules for determining true positives (TP), false positives (FP) and false negatives (FN):
– TP: fault detection occurs for a robot subject to a fault
– FP: fault detection occurs for a robot before it ever fails or when it is no longer subject to a fault
– FN: no fault detection occurs for a failed robot
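The scores reported in Table 1 below follow directly from these counts; a minimal sketch:

def scores(tp, fp, fn):
    # Precision, recall and F1 (in percent) from the TP/FP/FN counting rules.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return 100 * precision, 100 * recall, 100 * f1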
4.2 Results
(See Table 1.)

Table 1. Overall fault detection performance by method. The scores capture the retained counting rules for TP, FP, FN.

Left (N = 10):
Method     Precision  Recall  F1
KNN [12]   38.9       59.8    47.0
CRM [10]   68.4       57.9    62.7

Right (N = 20):
Method     Precision  Recall  F1
KNN [12]   82.8       43.8    57.3
CRM [10]   91.9       57.5    70.7
Fig. 3. Average fault detections over time. Left: N = 10, Right: N = 20
Fig. 4. Generated risk-belief map with slippage (blue rectangle) and sensor saturation (green rectangle) areas as well as radiation sources (red stars). Darker intensity indicates more robot failures at that discrete location. Left: N = 10, Right: N = 20
5 Conclusion
We introduced FLAM, a multiclass bio-inspired distributed fault detection method that builds risk belief maps from identified robot failures. We had expected that successfully applying CRM [10] would outperform basic KNN [12] classification for fault detection, and we verified that KNN [12] outputs more false positives. CRM [10] avoids this sensitivity pitfall via its inherently conservative design, though precision is not as high in sparse swarms (N = 10) as in denser swarms (N = 20) because of the challenging small-scale swarm context FLAM operates in to reflect practical applications. Both approaches cease to detect faults when too few robots are left (stagnation in reported faults, Fig. 3). Independently of fault detection accuracy, the obtained failure maps are consistent with the simulation setups, as they highlight the frontier of the error zones and the surroundings of the radiation sources (Fig. 4). The generated risk belief maps may be used to guide navigation for subsequent tasks in the environment. Future work may involve building risk-aware exploration at the same time as FLAM-based risk assessment is conducted, taking immediate action to alleviate faults (e.g. by increasing obstacle avoidance sensitivity and radius). Such robot responses can be explored either as automatic countermeasures or as an interactive visualization presenting risk and terrain information to a remote operator as it is gathered.
References
1. Burgard, W., Moors, M., Stachniss, C., Schneider, F.E.: Coordinated multi-robot exploration. IEEE Trans. Robot. 21(3), 376–386 (2005). https://ieeexplore.ieee.org/abstract/document/1435481
2. Şahin, E.: Swarm robotics: from sources of inspiration to domains of application. In: Şahin, E., Spears, W.M. (eds.) SR 2004. LNCS, vol. 3342, pp. 10–20. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30552-1_2
3. Winfield, A.F., Nembrini, J.: Safety in numbers: fault-tolerance in robot swarms. Int. J. Model. Identif. Control 1(1), 30–37 (2006). https://www.inderscienceonline.com/doi/abs/10.1504/IJMIC.2006.008645
4. Bjerknes, J.D., Winfield, A.F.T.: On fault tolerance and scalability of swarm robotic systems. In: Martinoli, A., et al. (eds.) Distributed Autonomous Robotic Systems. STAR, vol. 83, pp. 431–444. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32723-0_31
5. Yang, G.-Z., et al.: The grand challenges of Science Robotics. Sci. Robot. 3(14), eaar7650 (2018). https://www.science.org/doi/abs/10.1126/scirobotics.aar7650
6. Khalastchi, E., Kalech, M.: On fault detection and diagnosis in robotic systems. ACM Comput. Surv. 51(1), 1–24 (2018). https://doi.org/10.1145/3146389
7. Christensen, A.L.: Fault detection in autonomous robots. Ph.D. thesis, Université Libre de Bruxelles (2008)
8. Lau, H.K.: Error detection in swarm robotics: a focus on adaptivity to dynamic environments. Ph.D. dissertation, University of York (2012)
9. Tarapore, D., Lima, P.U., Carneiro, J., Christensen, A.L.: To err is robotic, to tolerate immunological: fault detection in multirobot systems. Bioinspiration Biomimetics 10(1), 016014 (2015)
10. Tarapore, D., Christensen, A.L., Timmis, J.: Generic, scalable and decentralized fault detection for robot swarms. PLoS ONE 12(8), e0182058 (2017)
11. Tarapore, D., Timmis, J., Christensen, A.L.: Fault detection in a swarm of physical robots based on behavioral outlier detection. IEEE Trans. Robot. 35(6), 1516–1522 (2019)
12. Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104 (2000)
13. Christensen, A.L., O’Grady, R., Birattari, M., Dorigo, M.: Fault detection in autonomous robots based on fault injection and learning. Auton. Robot. 24(1), 49–67 (2008). https://doi.org/10.1007/s10514-007-9060-9
14. O’Keeffe, J., Tarapore, D., Millard, A.G., Timmis, J.: Towards fault diagnosis in robot swarms: an online behaviour characterisation approach. In: Gao, Y., Fallah, S., Jin, Y., Lekakou, C. (eds.) TAROS 2017. LNCS (LNAI), vol. 10454, pp. 393–407. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64107-2_31
15. O’Keeffe, J., Tarapore, D., Millard, A.G., Timmis, J.: Adaptive online fault diagnosis in autonomous robot swarms. Front. Robot. AI 5, 131 (2018)
16. Hunt, E.R., Jenkinson, G., Wilsher, M., Dettmann, C.P., Hauert, S.: SPIDER: a bioinspired swarm algorithm for adaptive risk-taking. In: Artificial Life Conference Proceedings, pp. 44–51. MIT Press, Cambridge (2020)
17. Ono, M., Williams, B.C.: An efficient motion planning algorithm for stochastic dynamic systems with constraints on probability of failure. In: AAAI, pp. 1376–1382 (2008)
18. Vitus, M.P., Tomlin, C.J.: On feedback design and risk allocation in chance constrained control. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 734–739. IEEE (2011)
19. Vielfaure, D., Arseneault, S., Lajoie, P.-Y., Beltrame, G.: DORA: distributed online risk-aware explorer (2021)
20. Amigoni, F., Banfi, J., Basilico, N.: Multirobot exploration of communication-restricted environments: a survey. IEEE Intell. Syst. 32(6), 48–57 (2017). https://ieeexplore.ieee.org/abstract/document/8267592
21. Otte, M.: An emergent group mind across a swarm of robots: collective cognition and distributed sensing via a shared wireless neural network. Int. J. Robot. Res. 37(9), 1017–1061 (2018)
22. Pinciroli, C., Lee-Brown, A., Beltrame, G.: A tuple space for data sharing in robot swarms. In: Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies (Formerly BIONETICS), pp. 287–294 (2016). https://carlo.pinciroli.net/pdf/Pinciroli:BICT2015.pdf
23. Pinciroli, C., Beltrame, G.: Buzz: a programming language for robot swarms. IEEE Softw. 33(4), 97–100 (2016)
24. Majcherczyk, N., Pinciroli, C.: SwarmMesh: a distributed data structure for cooperative multi-robot applications. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4059–4065. IEEE (2020). https://ieeexplore.ieee.org/abstract/document/9197403
25. Kobayashi, F., Sakai, S., Kojima, F.: Sharing of exploring information using belief measure for multi robot exploration. In: 2002 IEEE World Congress on Computational Intelligence. 2002 IEEE International Conference on Fuzzy Systems. FUZZ-IEEE 2002. Proceedings (Cat. No. 02CH37291), vol. 2, pp. 1544–1549 (2002)
26. Kobayashi, F., Sakai, S., Kojima, F.: Determination of exploration target based on belief measure in multi-robot exploration. In: Proceedings 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation. Computational Intelligence in Robotics and Automation for the New Millennium (Cat. No. 03EX694), vol. 3, pp. 1545–1550 (2003)
27. Indelman, V.: Cooperative multi-robot belief space planning for autonomous navigation in unknown environments. Auton. Robot. 42(2), 353–373 (2018). https://doi.org/10.1007/s10514-017-9620-6
28. Stachniss, C., Burgard, W.: Mapping and exploration with mobile robots using coverage maps. In: Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Las Vegas, Nevada, USA, vol. 1, pp. 467–472. IEEE (2003)
29. Pinciroli, C., et al.: ARGoS: a modular, multi-engine simulator for heterogeneous swarm robotics. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5027–5034 (2011)
30. K-Team: Khepera IV (2021). https://www.k-team.com/khepera-iv
Social Exploration in Robot Swarms

Elliott Hogg¹(B), David Harvey², Sabine Hauert¹, and Arthur Richards¹

¹ Bristol Robotics Laboratory, University of Bristol, Bristol, UK
[email protected]
² Thales UK Ltd, Reading, UK
Abstract. Robot swarms have shown great potential for the exploration of unknown environments, utilizing simple robots with local interaction and limited sensing. Despite this, complex indoor environments can create issues for reactive swarm behaviours where specific paths need to be travelled and bottlenecks are present. In this paper we present our social exploration algorithm, which allows the swarm to decide between different options of swarm behaviours to search randomly generated environments. Using a “happiness” measure, agents can reason over the performance of different swarm behaviours, aiming to promote free movement. Agents collaborate to share opinions of different behaviours, forming teams which are capable of adapting their exploration to any given environment. We demonstrate the ability of the swarm to explore complex environments with minimal information and highlight increased performance relative to other swarm behaviours across 250 randomly generated environments.
Keywords: Swarm robotics · Collective decision making · Search and Rescue

1 Introduction
Swarm robotics studies how to design large teams of robots which can work together to achieve some goal. This takes inspiration from swarms found in nature, particularly their ability to produce complex emergent behaviours from simple individual swarm agents reacting to their local environment [1,2]. Swarm robotics aims to understand how to design the local interactions between neighbours in the swarm to generate these complex behaviours, without any defined leader [3]. Some of the key benefits of swarm systems are their fully decentralized nature, which reduces the risk of a single point of failure, and the redundancy they provide in the case of lost agents [4]. Swarm robotics has been investigated and applied to a large range of applications, including exploration, collective assembly, and collective decision making [5,6]. Here, we focus on the problem of exploration of unknown environments with swarms. Exploration in robot swarms has been investigated widely, considering how to design swarm systems to explore different types of environments. Swarms
are well suited to exploration as we can use large teams of simple robots which can explore different areas in parallel. There are many existing dispersion-based algorithms, where agents simply repel from their local neighbours, allowing swarms to cover large areas of different shapes [7,8]. This has also been demonstrated with minimal sensory requirements for real-world applications [9,10]. Other approaches such as random walk behaviours have also been shown to be effective solutions [11,12]. Exploration is also a fundamental requirement in foraging-based problems [13]. Foraging algorithms can use virtual pheromones placed by agents to guide other swarm agents in the correct direction [14]. Whilst each of these proposed swarm algorithms has been shown to be effective, each is suited to different types of environments. When faced with uncertainty over the nature of the environment, it is not clear how to select suitable behaviours. Therefore, there is a need for a universal swarm search strategy. In this paper we explore the design of a social exploration algorithm which allows swarms to explore complex, uncertain environments by giving agents the option to individually choose between different swarm behaviours. Without any prior knowledge, agents trial different behaviours to try to find the best current option. Agents use a “happiness” measure to assess their degree of free movement when exploring, to determine whether a behaviour is effective. Crucially, agents work together by sharing opinions of different behaviours to discover the most suitable behaviours to use (Fig. 1). This naturally allows the swarm to adapt its search to the configuration of the environment. We discuss our algorithm in greater detail in Sect. 3. In the following section we highlight work similar to our presented algorithm, and finally we discuss the performance of our algorithm when compared to other swarm behaviours in Sect. 4.
Fig. 1. Swarm agents reason over suitable swarm behaviours to explore environments using a happiness measure, indicating their ability to explore freely. Agents locally compare happiness to find suitable behaviours and improve exploration.
2 Related Work
The social exploration algorithm discussed in this paper is inspired by other examples of collective decision making in robot swarms [15]. Examples of collective decision making can be found in nature, such as site selection in ant swarms [16]. Some examples applied to robotics include the “best of n” problem, tasking the swarm with comparing the qualities of different sites, aiming to reach consensus on the best option [17]. In these problems, agents develop opinions over different options and, through different communication mechanisms, share this evidence with their neighbours. Through information sharing, agents compare beliefs until the swarm can form consensus on one shared belief or decision [18]. Our work relates to the decision making mechanisms used within this field but crucially differs in that we do not want to achieve global consensus in the swarm. Agents share their opinions of different options of swarm behaviours, and whilst we may observe small groups of agents which adopt the same behaviour, this should remain highly localized. Talamali et al. [19] have also explored collective decision making with short communication ranges, highlighting the benefit of localized information sharing. Global consensus is not a priority in our work, but we find a similar example where swarm agents reason over possible actions in a foraging task [20]. There it is shown that the swarm can form consensus over the better of two actions to find the shortest path between a food and a nest site. Whilst agents don’t need to directly measure the reward of each action, only two options need to be considered given two possible paths to take. In exploration of complex environments, agents are required to make many correct decisions in sequence when many possible complex paths are present. We also highlight themes in Particle Swarm Optimization (PSO) which overlap with our algorithm. In PSO, swarm agents search an N-dimensional parameter space for optimal solutions, recording their best individual state and the states of their neighbours [21]. The particles use this shared historical information to determine their velocity headings, aiming to search for optimal solutions [22,23]. There are similarities here in that agents share their states with some value associated to each state. Our problem differs from PSO as there is no sense of an optimal state that we want to reach; rather, we promote continuous exploration. Additionally, agents do not use a sense of a best previous position or state; instead, agents use an internalized sense of performance based only on their choice of a certain option. Our implementation of agent happiness also links to themes in reinforcement learning (RL) using reward functions. In multi-agent RL, agents aim to determine optimal policies to solve some problem, consisting of a specific sequence of actions. This is achieved using a reward function which is associated with the different actions an agent can take in different possible states [24,25]. Whilst in RL the reward function is used to construct policies over many iterations, here we employ a real-time sense of reward to make a decision at a given point in time. Our agents do not construct policies or decide upon sequences of actions to perform.
3 Methodology

3.1 Simulation and Local Swarm Behaviours
We investigate the exploration of complex environments in a custom simulator built in Python. Swarm agents are modelled as particles which can move in any direction with a degree of random motion. Agents aim to represent either quadcopters or omni-directional wheel-based robots. Potential field forces allow agents to avoid obstacles and walls in the environment. The agents travel at a constant speed of 0.5 m/s. In our investigations we aim to maximise coverage of environments, which we measure based on the detection of a set of objectives evenly distributed over the environment. An objective is detected when the Euclidean distance between an agent and the objective is less than 2.5 m. Objectives are distributed evenly 1 m apart over the environment, giving a granular measurement of coverage; a sketch of this measure is given below. The swarm agents have no access to coverage information. In this investigation, individual agents can choose to follow different local rules or swarm behaviours. In this case we implement a directed field behaviour which allows the swarm to disperse in a specific direction. We separate this into eight discrete behaviours allowing the swarm to disperse North, East, South, West, Northeast, Northwest, Southeast, and Southwest. We explored the application of these behaviours in a previous study using centralized behaviour selection [26]. These behaviours allow the swarm to disperse in specific directions, and in this case, individual agents can choose which direction to disperse in rather than all agents following the same global rules. Agents should then work together to determine which direction is best to travel to explore an environment. We also give agents the option of selecting a random walk behaviour in addition to the directed field behaviours. The process of selecting between behaviours is discussed in detail in Sect. 3.4.
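A minimal sketch of this coverage measure, assuming numpy arrays of 2D positions. Note that the study accumulates detections over a run, while this checks a single time step.

import numpy as np

def coverage(agent_positions, objectives, radius=2.5):
    # Fraction of objectives with at least one agent within `radius` metres.
    agents = np.asarray(agent_positions)  # shape (n_agents, 2)
    objs = np.asarray(objectives)         # shape (n_objectives, 2)
    dists = np.linalg.norm(objs[:, None, :] - agents[None, :, :], axis=-1)
    return float((dists.min(axis=1) <= radius).mean())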
3.2 Generating Random Environments
In order to thoroughly test the performance of our proposed algorithm, we designed an algorithm to randomly generate environments which resemble possible building layouts. We focus on these types of environments which have complex structures, multiple paths, and features including rooms and doorways which can make exploration more difficult. The environment generation process follows three key stages: 1. Corridor generation - From a random starting point at the bounds of the environment, we randomly grow a spanning tree which forms the core corridor network in the environment. Segments of the corridors are created in sequence with random probabilities to change directions by 90 degrees. Sub-corridors are also added during this process. 2. Room Creation - With a generated corridor network, the remainder of the environment is naturally split into sections. Each section is given randomly placed doorways which connect to the corridor network.
3. Corridor doorways - With some probability, we add additional doorways at junctions in the corridor network.

Key parameters which control the types of environments that are generated include bounds for the segment lengths within corridors, clen, the probability of creating sub-corridors, psub, and the minimum area which corridors cover in the environment, cprop. All environments in this study were generated with the same bounding dimensions of 80 × 80 m. A sketch of stage 1 is given below.
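As a rough illustration of stage 1, the following runnable sketch grows a random tree of axis-aligned corridor segments with random 90-degree turns and sub-corridor branches. Parameter values and the data layout are our own assumptions, not the authors’ implementation.

import random

def grow_corridors(w=80, h=80, n_segments=20, seg_len=(5, 15), p_sub=0.3):
    corridors = []  # list of ((x0, y0), (x1, y1)) corridor segments
    # Start from a random point on one bound of the environment.
    frontier = [((random.randrange(w), 0), (0, 1))]  # (point, direction)
    while frontier and len(corridors) < n_segments:
        (x, y), (dx, dy) = frontier.pop()
        length = random.randint(*seg_len)
        # Clamp the segment end to the environment bounds.
        x1 = min(max(x + dx * length, 0), w)
        y1 = min(max(y + dy * length, 0), h)
        corridors.append((((x, y)), (x1, y1)))
        # With some probability, turn 90 degrees; otherwise continue straight.
        nd = (dy, dx) if random.random() < 0.5 else (dx, dy)
        frontier.append(((x1, y1), nd))
        if random.random() < p_sub:  # spawn a perpendicular sub-corridor
            frontier.append(((x1, y1), (-dy, -dx)))
    return corridors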
3.3 Agents with “Happiness”
For agents to decide between using different behaviours, they need a way of measuring their own performance. In this case, we implement a “happiness” measure which agents use to assess their degree of free movement, and which they update at given intervals. Here, agents monitor how far they have travelled and their density relative to other agents to determine their happiness and promote exploration. Agents do not use any complex sensing or reasoning to understand their surroundings, nor do they have any direct knowledge of global or local coverage. Instead, agents compare happiness locally to find suitable behaviours (discussed further in Sect. 3.4). We include distance travelled such that agents check how far they have moved within a certain time period. When performing a certain behaviour, agents record their initial position and, after a short period of time, check how far they have moved from it. We define the travel rate α as

$$ \alpha = \frac{\lVert x_t - x_{t-\tau} \rVert}{\tau v_c} \qquad (1) $$

where $x_t$ is the position of the agent at time t, τ is the period between two states, and $v_c$ is the constant speed at which agents travel. Here, agents only have memory of one previous position. By selecting behaviours which allow agents to maximize distance travelled, this should in turn promote coverage and allow agents to detect when they become stuck.
The second factor included is agent density. When agents update their happiness measure, they check at that moment how many other agents are within their communication range. To promote coverage, agents should not be densely packed together. Therefore we define β, where n is how many neighbours an agent has and $n_{max}$ is the limit on the maximum number of neighbours an agent should have (in this case, $n_{max} = 10$):

$$ \beta = \begin{cases} \dfrac{n_{max} - n}{n_{max}} & \text{if } n < n_{max} \\ \dfrac{1}{n_{max}} & \text{if } n \ge n_{max} \end{cases} \qquad (2) $$

This means agents with fewer neighbours are happier. Happiness is then given by

$$ \text{happiness} = \alpha \cdot \beta \qquad (3) $$
Equation 3 gives a value between 0 (lowest happiness) and 1 (highest happiness). Given a defined happiness measure with which agents can assess different behaviours, agents can then reason over which behaviours are best to use.
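A direct transcription of Eqs. (1)-(3) as a sketch, assuming 2D numpy positions, with v_c = 0.5 m/s and n_max = 10 as stated above.

import numpy as np

V_C, N_MAX = 0.5, 10

def happiness(pos_now, pos_prev, tau, n_neighbours):
    # Eq. (1): travel rate, normalized by the maximum possible distance.
    alpha = np.linalg.norm(pos_now - pos_prev) / (tau * V_C)
    # Eq. (2): density factor; fewer neighbours means a happier agent.
    if n_neighbours < N_MAX:
        beta = (N_MAX - n_neighbours) / N_MAX
    else:
        beta = 1.0 / N_MAX
    return alpha * beta  # Eq. (3)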
Social Exploration Algorithm
Using the happiness measure defined above, we define a set of rules which allow the swarm agents to work together to discover suitable behaviours to search an environment. The local decision making rules that are implemented are summarised in Algorithm 1. Agents continually update their happiness each timestep but may only make decisions at certain points based on the update rate τ . This gives agents time to trial different behaviours for small periods of time before checking whether some decision needs to be made. When agents check their happiness, they enter either happy or unhappy states in which they behave differently.
Algorithm 1. Social exploration algorithm using happiness measure. At each time step, agents update their state in either happy, or unhappy modes. for agent in swarm do update happiness if happiness > 0.5 or happiness increased then for neighbour in neighbours do if happiness > neighbour happiness then neighbour behaviour ← agents behaviour if agent timer > update limit then if happiness < 0.5 and not increased then if neighbour has higher happiness then agent behaviour ← neighbour behaviour else if agent has highest happiness then pick random behaviour = neighbours behaviours if probability < 0.5 then broadcast behaviour change to neighbours previous position ← current position agent timer ← 0 update limit ← happiness ∗ update rate agent timer ← agent timer +1
Unhappy Agents. If the agent’s happiness is low and has not improved since the last state check, then the agent needs to switch to some other behaviour. When agents are unhappy, they need to communicate with their neighbours to find a better option. In this case the agent will first look for agents with higher
happiness than its own. If a neighbour does have a higher happiness, the agent will switch its behaviour to match that neighbour's. This allows unhappy agents to use positive evidence from their neighbours to adopt suitable behaviours. However, if a neighbour is using the same behaviour, the unhappy agent will not switch its behaviour, as this conflicts with its own evidence. When all an agent's neighbours have equal or lower happiness, the agent will instead select a behaviour which none of its neighbours are using. Here the agent uses the negative evidence of its neighbours to eliminate bad options. In addition, there is a probability that, upon selecting a new behaviour, the agent will in turn signal to each neighbour to also switch to this new behaviour. This allows agents to help unhappy neighbours, promoting co-operation rather than individual gain. If an unhappy agent has no neighbours, it will randomly switch to a new behaviour. With no external evidence, agents use trial and error to discover more suitable behaviours. Happy Agents. When an agent has high happiness, it does not make any changes to its chosen behaviour, but it can influence agents with low happiness. At each time-step, happy agents have a probability of broadcasting their chosen behaviour to their neighbours. This probability is set equal to their happiness level. If a broadcast is successful, a neighbour which receives it will switch to the happy agent's behaviour, but only if the broadcaster's happiness is significantly higher than its own. It is important that happy agents help other agents, to improve the performance of the swarm as a whole.
4 Results
In this section we discuss the emergent behaviour produced by the social exploration algorithm and its performance across large sets of random environments (videos of simulations are presented here). For all experiments shown here we selected a short update interval of 20 s to allow agents to frequently check their happiness. We also set the communication range between agents to 5 m such that evidence is only shared between agents in close proximity. Initial Experiments. We implemented the algorithm discussed above using the happiness measure based on distance travelled and agent density. We first highlight the behaviour of a single agent and its ability to adapt to its surroundings by selecting different swarm behaviours. Figure 2a presents a single agent and its path through the environment. The path is plotted in different colours which relate to the options of swarm behaviour available. In addition, we plot the agent's happiness over time, also indicating the behaviour used at different points in time. What we observe is that the agent is capable of selecting new swarm behaviours when it becomes stuck and subsequently unhappy. We observe that initially the agent attempts to head West, but quickly
Fig. 2. Emergent behaviour of the swarm using happiness to select between different options of exploration.
hits a wall and becomes unhappy. Without any neighbours present, the agent simply picks a random behaviour to try. The agent briefly attempts to head Southwest, with no success, before switching to heading Northeast. This allows the agent to head away from the corner, and we observe a sharp improvement in happiness. Whilst the agent has high happiness, no change of behaviour is needed. As navigation continues, we see the agent repeat this process each time its happiness drops. This trial-and-error approach lets agents fix problems when they become unhappy, navigating out of positions where they become stuck. This creates agents which can adapt to their surroundings without complex reasoning over why they have become stuck. Social Exploration. Whilst a single agent has to rely on trial and error, with many agents this approach becomes more powerful, as agents can collaborate and share knowledge. Figure 2b shows another environment with 30 deployed agents. By utilizing the mechanisms discussed in the previous section, happy and unhappy agents work together to promote the selection of effective behaviours in their surroundings. Through the use of happy agents which broadcast their behaviours, and unhappy agents which collaborate with
other agents to discover new behaviours to try, emergent sub-teams form in the swarm, where small groups of agents converge to the same behaviour. These flock-like groups allow agents to share solutions to local problems and help each other navigate with minimal resistance. We found that these emergent behaviours allow the swarm to effectively navigate complex environments.
Fig. 3. Trails of 50 agents deployed in a maze environment. The colour of the trails indicates the level of happiness of agents: red trails show positions where agents had happiness lower than 0.5, and green where it was greater than 0.5. The coverage achieved over time is presented in comparison to 50 agents performing dispersion or random walk.
We also investigated the behaviour of the swarm in a hand-designed maze environment, as shown in Fig. 3. Coverage is measured based on the detection of objectives covering the environment, distributed in increments of 1 m in both directions. When the Euclidean distance between any agent and an objective is less than 1.5 m, the objective is classified as detected. The total coverage is the proportion of detected objectives out of the total number of objectives. In this case there is a complex path that needs to be navigated, relying on a long sequence of actions to travel in the correct direction. We find that, with social exploration deployed, agents are capable of navigating this environment quickly and achieving high coverage. The swarm forms small teams which work together to select the appropriate behaviours to explore the maze. This is achieved without any prior knowledge or a direct goal to maximise coverage. Whilst the swarm has some difficulty travelling away from the bottom-left region, we highlight the performance against the random walk and dispersion swarm behaviours, which struggle greatly in this environment. With few agents, dispersion is not effective at pushing agents around many sequences of corners,
and similarly, random walk is not suited to the specific paths that need to be travelled.
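The coverage metric described above is simple to state in code (a Python sketch; the array layout is our assumption, not from the original):

```python
import numpy as np

def coverage(agent_positions, objectives, detect_radius=1.5):
    """Fraction of grid objectives ever detected, per the metric above."""
    agents = np.asarray(agent_positions)   # shape (N, 2)
    objs = np.asarray(objectives)          # shape (K, 2), objectives on a 1 m grid
    # An objective counts as detected if any agent comes within 1.5 m of it.
    d = np.linalg.norm(objs[:, None, :] - agents[None, :, :], axis=-1)
    return np.mean(d.min(axis=1) < detect_radius)
```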
Fig. 4. Performance comparison of the social exploration algorithm against random walk and dispersion behaviours with 25 agents. Each data point compares the performance of two algorithms in one environment. Each data set contains 250 environments. Points above the red line indicate higher performance with the social exploration algorithm.
4.1 Performance in Randomly Generated Environments
To systematically test the efficacy of our algorithm, we test performance over a large set of randomly generated environments. The developed algorithm should enable the swarm to adapt to search a variety of environments, with agents working together to promote exploration. We test the performance of a swarm of 25 agents over a set of 250 randomly generated environments. For each environment, the swarm has 1000 s to explore, and we take the mean performance of each algorithm over 10 separate trials. The input parameters for the generated environments (as discussed in Sect. 3.2) are also randomly selected within controlled bounds to generate a broad spectrum of environments. We measure performance as the time-normalised integral of coverage, λ, as shown by Eq. (4), where C(t) is the coverage achieved at time t and T is the total search duration. Rather than only measuring overall coverage, we also want to measure the rate of coverage: λ is higher for faster exploration. Fast exploration is important in time-critical scenarios such as search and rescue.

\[ \lambda = \frac{1}{T}\int_0^T C(t)\,dt \tag{4} \]
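Numerically, Eq. (4) reduces to a time-normalised trapezoidal sum over the sampled coverage curve (a minimal Python sketch; the sampling interval dt is our assumption):

```python
import numpy as np

def exploration_performance(coverage_series, dt, T):
    """Eq. (4): time-normalised integral of coverage C(t), approximated
    with the trapezoidal rule from samples taken every dt seconds."""
    c = np.asarray(coverage_series, dtype=float)
    integral = dt * (0.5 * c[0] + c[1:-1].sum() + 0.5 * c[-1])
    return integral / T
```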
Fig. 5. Simulation of the social exploration algorithm at different stages with 30 agents. Agents shown in green have happiness greater than 0.5; red agents have happiness less than 0.5. The swarm is shown to adapt its behaviour at different points to navigate through the environment with low resistance. The coverage over time is compared to random walk and dispersion.
We compare the performance of the dispersion and random walk behaviours across the same set of generated environments, as shown in Fig. 4. Whilst more sophisticated swarm behaviours exist for exploration, such as those using pheromones, these require either physical beacons or access to a shared virtual pheromone map [27,28]. The method presented does not require any global information, relying only on local information sharing. Each plot presents the difference in performance between two algorithms, each data point showing the performance in a specific environment. Any point plotted above the line indicates an environment where social exploration performed better. This highlights that the social algorithm is capable of consistently outperforming each of the alternative algorithms. In addition, given the distribution of performance across the data, we can also infer that testing covered a broad sample of both difficult and easy environments. Whilst in some cases the difference is small, in the majority of environments the difference is substantial. When comparing to dispersion, we see many cases where the performance difference is large, highlighting that dispersion struggled in certain
environments. Social exploration is capable of adapting broadly to difficult environments where random walk and dispersion struggle. This is also achieved with relatively few agents. We assess one example environment in detail which highlights how this algorithm can adapt to difficult environment configurations. Figure 5 presents a simulation of the social exploration algorithm in one environment where dispersion and random walk performed poorly. Initially, agents are packed together and find behaviours to quickly spread out, with a portion of agents dispersing into the large room (Fig. 5a) and others travelling up the corridor (Fig. 5b). When reaching the top of the corridor, agents become unhappy and trial new behaviours. They discover that moving west allows them to increase happiness again, and a team of agents moves through to the left region of the environment (Fig. 5c). Agents then continue to explore both left and right regions, adjusting behaviours to navigate in and out of rooms (Fig. 5d and 5e). We highlight in Fig. 5f that the swarm is able to achieve nearly 100% coverage and can explore the environment more quickly than random walk and dispersion. In environments such as this, where specific paths must be travelled to fully explore the environment, the social exploration algorithm is much more effective at navigating these paths. This is further supported by the performance of the swarm in the maze environment (Fig. 3).
5 Conclusions and Future Work
In this investigation we demonstrated the ability of a swarm to explore complex random environments by using different possible swarm behaviours. With a simple happiness measure, agents can reason over different options and locally share opinions of these behaviours to adapt to their immediate surroundings. This created the emergent formation of small teams that worked together to discover behaviours which react appropriately to their surroundings. In addition, we highlighted the ability of the swarm to solve a complex maze environment without any external information. This was achieved even with a relatively large set of nine possible swarm behaviours for agents to choose between, and without the need for complex sensing or reasoning at the individual level. The findings showed that the social exploration algorithm allowed the swarm to achieve higher coverage and faster exploration than random walk and dispersion over a set of 250 environments. This was possible through the swarm's capability to adapt its behaviour in real time to suit the needs of each individual environment. Future work will explore how agent happiness could be defined with different information, such as average speed and number of collisions, to promote free movement, and the subsequent effect on the emergent behaviour of the swarm. We also consider how giving agents greater memory of previous states could change performance. In addition, we aim to implement our algorithm with real robots to verify the ability to produce the same emergent properties shown here.
Acknowledgement. This work was funded and delivered in partnership between the Thales Group and the University of Bristol, and with the support of the UK Engineering and Physical Sciences Research Council Grant Award EP/R004757/1 entitled “Thales-Bristol Partnership in Hybrid Autonomous Systems Engineering (T-B PHASE)”.
References

1. Meng, X.B., Gao, X.Z., Lu, L., Liu, Y., Zhang, H.: A new bio-inspired optimisation algorithm: bird swarm algorithm. J. Exp. Theor. Artif. Intell. 28(4), 673–687 (2016)
2. Mirjalili, S., Gandomi, A.H., Mirjalili, S.Z., Saremi, S., Faris, H., Mirjalili, S.M.: Salp swarm algorithm: a bio-inspired optimizer for engineering design problems. Adv. Eng. Softw. 114, 163–191 (2017)
3. Dorigo, M., Theraulaz, G., Trianni, V.: Swarm robotics: past, present, and future. Proc. IEEE 109(7), 1152–1165 (2021)
4. Zhu, B., Xie, L., Han, D., Meng, X., Teo, R.: A survey on recent progress in control of swarm systems. Sci. China Inf. Sci. 60(7), 1–24 (2017)
5. Hanay, Y.S., Ilter, M.: Aggregation, foraging, and formation control of swarms with non-holonomic agents using potential functions and sliding mode techniques, vol. 15, no. 2, pp. 149–168 (2007)
6. Cardona, G.A., Calderon, J.M.: Robot swarm navigation and victim detection using rendezvous consensus in search and rescue operations. Appl. Sci. (Switzerland) 9(8) (2019)
7. McLurkin, J., Smith, J.: Distributed algorithms for dispersion in indoor environments using a swarm of autonomous mobile robots. Distrib. Auton. Robot. Syst. 6, 399–408 (2008)
8. Novischi, D.M., Florea, A.M.: Decentralized swarm aggregation and dispersion with inter-member collision avoidance for non-holonomic multi-robot systems. In: Proceedings - 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing, ICCP 2018, pp. 89–95 (2018)
9. Bayert, J., Khorbotly, S.: Robotic swarm dispersion using gradient descent algorithm. In: IEEE International Symposium on Robotic and Sensors Environments, ROSE 2019 - Proceedings (2019)
10. Ludwig, L., Gini, M.: Robotic swarm dispersion using wireless intensity signals. Distrib. Auton. Robot. Syst. 7, 135–144 (2006)
11. Kegeleirs, M., Garzón Ramos, D., Birattari, M.: Random walk exploration for swarm mapping. In: Althoefer, K., Konstantinova, J., Zhang, K. (eds.) TAROS 2019. LNCS (LNAI), vol. 11650, pp. 211–222. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25332-5_19
12. Pang, B., Song, Y., Zhang, C., Wang, H., Yang, R.: A swarm robotic exploration strategy based on an improved random walk method. J. Robot. 2019 (2019)
13. Lu, Q., Fricke, G.M., Ericksen, J.C., Moses, M.E.: Swarm foraging review: closing the gap between proof and practice. Curr. Robot. Rep. 1(4), 215–225 (2020)
14. Song, Y., Fang, X., Liu, B., Li, C., Li, Y., Yang, S.X.: A novel foraging algorithm for swarm robotics based on virtual pheromones and neural network. Appl. Soft Comput. J. 90, 106156 (2020)
15. Valentini, G., Brambilla, D., Hamann, H., Dorigo, M.: Collective perception of environmental features in a robot swarm. In: Dorigo, M., et al. (eds.) ANTS 2016. LNCS, vol. 9882, pp. 65–76. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44427-7_6
16. Mallon, E.B., Pratt, S.C., Franks, N.R.: Individual and collective decision-making during nest site selection by the ant Leptothorax albipennis. Behav. Ecol. Sociobiol. 50(4), 352–359 (2001)
17. Valentini, G., Ferrante, E., Dorigo, M.: The best-of-n problem in robot swarms: formalization, state of the art, and novel perspectives. Front. Robot. AI 4 (2017)
18. Valentini, G., Hamann, H., Dorigo, M.: Efficient decision-making in a self-organizing swarm of simple robots: on the speed versus accuracy trade-off. Technical report, pp. 1305–1314 (2014)
19. Talamali, M.S., Saha, A., Marshall, J.A., Reina, A.: When less is more: robot swarms adapt better to changes with constrained communication. Sci. Robot. 6(56) (2021)
20. Scheidler, A., Brutschy, A., Ferrante, E., Dorigo, M.: The k-unanimity rule for self-organized decision-making in swarms of robots. IEEE Trans. Cybern. 46(5), 1175–1188 (2016)
21. Zhao, S.Z., Liang, J.J., Suganthan, P.N., Tasgetiren, M.F.: Dynamic multi-swarm particle swarm optimizer with local search for large scale global optimization. In: 2008 IEEE Congress on Evolutionary Computation, CEC 2008, pp. 3845–3852 (2008)
22. Poli, R., Kennedy, J., Blackwell, T.: Particle swarm optimization: an overview. Swarm Intell. 1(1), 33–57 (2007)
23. Tian, D., Shi, Z.: MPSO: modified particle swarm optimization and its applications. Swarm Evol. Comput. 41, 49–68 (2018)
24. Althamary, I., Huang, C.W., Lin, P.: A survey on multi-agent reinforcement learning methods for vehicular networks. In: 2019 15th International Wireless Communications and Mobile Computing Conference, IWCMC 2019, pp. 1154–1159 (2019)
25. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017)
26. Hogg, E., Hauert, S., Harvey, D., Richards, A.: Evolving behaviour trees for supervisory control of robot swarms. Artif. Life Robot. 25(4), 569–577 (2020)
27. Cimino, M.G.C.A., Lazzeri, A., Vaglini, G.: Combining stigmergic and flocking behaviors to coordinate swarms of drones performing target search. In: 2015 6th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1–6. IEEE (2015)
28. Hunt, E.R., Jones, S., Hauert, S.: Testing the limits of pheromone stigmergy in high-density robot swarms. R. Soc. Open Sci. 6(11) (2019)
Stochastic Nonlinear Ensemble Modeling and Control for Robot Team Environmental Monitoring

Victoria Edwards(B), Thales C. Silva, and M. Ani Hsieh

Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, USA
[email protected]

Abstract. We seek methods to model, control, and analyze robot teams performing environmental monitoring tasks. During environmental monitoring, the goal is to have teams of robots collect various data throughout a fixed region for extended periods of time. Standard bottom-up task assignment methods do not scale as the number of robots and task locations increases and require computationally expensive replanning. Alternatively, top-down methods have been used to combat computational complexity, but most have been limited to the analysis of methods which focus on transition times between tasks. In this work, we study a class of nonlinear macroscopic models which we use to control a time-varying distribution of robots performing different tasks throughout an environment. Our proposed ensemble model and control maintains desired time-varying populations of robots by leveraging naturally occurring interactions between robots performing tasks. We validate our approach at multiple fidelity levels including experimental results, suggesting the effectiveness of our approach to perform environmental monitoring.
Keywords: multi-robot system · control · environmental-monitoring

1 Introduction
Robot teams performing long-term environmental monitoring require methods to manage different data collection across the team's varied sensing modalities. Since environments can be dynamic and uncertain, it is often necessary to vary the distribution of robots within the workspace over time as they perform their monitoring tasks. Existing strategies are mostly bottom-up, where the objective is to synthesize individual robot strategies [14], but these make it difficult to ensure the desired global distribution of the team over time. We present an approach to model and control time-varying distributions of robots with multi-modal sensing capabilities that extends existing top-down methods [6] by modeling the robot interactions that govern changes in sensing behavior. To illustrate the importance of our approach, consider the long-term deployment of a robot team to monitor marine life in a littoral ocean environment.
In these settings, autonomous surface vehicles (ASVs) may be tasked to measure the water depth and take images of the ocean floor, but have limited storage capacity. While the ASVs can simultaneously collect both types of data, the limited onboard storage capacity may render it undesirable to collect image data during high tide, when ocean floor visibility is reduced. Instead, at high tide some robots should focus on collecting water depth data while others focus on collecting images, and during low tide more should switch to collecting both depth and image data. This problem can be thought of as a planning and coverage problem, which couples the navigation of the robot to where the robots should sense [1]. One approach to the problem of managing teams of robots with multiple capabilities is to model, plan, and control each individual robot. Microscopic modeling of teams of robots considers each robot and assigns robots to tasks as some variant of the resource allocation problem [7]. Many algorithms exist to solve the task allocation problem [14]. These methods often rely on heuristics to measure how well-suited a robot is for a specific task, and as such, they do not scale as the number of robots, number of sensing capabilities, and number of task locations increase. Furthermore, time-varying microscopic solutions require computationally expensive replanning [20], and limited analytical tools exist to study the global properties of the team. As a result of these limitations, microscopic modeling restricts our ability to achieve the desired time-varying distributions of robots to collect the desired data. Alternatively, macroscopic models represent global team behaviors and are most commonly derived through some mean field approximation with the assumption that the microscopic system is fundamentally stochastic [12,18]. Macroscopic models have been employed to study and control the steady-state behavior of both homogeneous [2] and heterogeneous cooperative swarms [13]. These models have also been used to synthesize control strategies to reduce the uncertainty in the microscopic systems [4,19]. This is accomplished in [19] by explicitly modeling the first and second order moments of the team distribution in the workspace. Since macroscopic models are agnostic to team size and scale in the number of tasks, these approaches have been employed for homogeneous as well as heterogeneous robot teams [22,23]. Nevertheless, these top-down approaches, similar to their bottom-up counterparts, require replanning when there is a new desired robot distribution, because the macroscopic models employed are fundamentally linear. Various replanning methods for linear ensemble models have been proposed [15,22]; however, these linear mean field models are fundamentally limited in terms of the complexity of microscopic behaviors that they can accurately represent [6,10]. Different from recent work [3], we propose a nonlinear ensemble model that can describe collaborative robot behaviors at the microscopic level and model the dynamics of time-varying robot distributions in the workspace. The proposed class of nonlinear ensemble models can explicitly account for robot-robot interactions and opens up new avenues for analyzing robot team behaviors. In this work, collaboration during an environmental monitoring task means that robots probe the same event in space simultaneously in a distributed fashion. Since
robots can also change their behaviors according to the overall environmental conditions, e.g., tidal changes, collaboration also refers to a robot's ability to adjust its short-term goal to attain a global objective. To achieve a representation that exhibits those complex behavioral modes, we model the macroscopic continuous dynamics of the team using the Replicator Equations, which are commonly used in biology to study natural selection and evolutionary dynamics [25]. Alternative models have been considered to understand evolutionary dynamics with potential applications to robotics, but existing work primarily focuses on the continuous analysis of discrete solutions, overlooking the fundamentally stochastic nature of the microscopic multi-robot system [5,16,21]. Instead, we use the Replicator Equations to model a population of robots interacting in a collaborative fashion, where the collaboration necessitates a time-varying distribution of robots across possible behavioral modes and/or specific regions in the workspace. The contributions in this paper are: 1) the development of a nonlinear macroscopic model for a collaborative robot team based on the Replicator Equations, 2) the design of a feedback control strategy, based on the proposed nonlinear macroscopic model, to handle uncertainties during team deployment, and 3) a method to distribute the macroscopic-model-derived feedback control strategy across the team. We show how our proposed framework works well for environmental monitoring tasks that require robots with several sensing capabilities to work collaboratively over time, through our multi-level fidelity simulation and experimental results.
2 Problem Formulation
We are interested in problems that require robot teams to exhibit time-varying behavior distributions. Unlike existing linear macroscopic ensemble methods which consider transition times [6], our approach uses collaboration rates, a measure of the interaction between robots that share spatial proximity, in a class of nonlinear models. We illustrate our method with two example time-varying solutions which are not possible with existing linear macroscopic methods.
2.1 Task Topology
Consider a set of M tasks where, for each task, a robot must exhibit a specific set of behaviors, for example sensing a specific phenomenon. We build an undirected graph to represent the tasks and the potential switching between tasks. The graph G = (V, E) has a node for each task, v_j ∈ V where j = 1, ..., M, and edges e_ij ∈ E. If an edge e_ij = 0 then it is not possible for robots to switch from task i to task j; likewise, when e_ij = 1 it is possible for robots to switch from task i to task j. We assume the graph G is strongly connected, which means that there is a path from each task to any other task in the graph. Note that the existence of an edge is not reflective of distance to a spatially distributed task as in [19,22], but instead reflective of the potential for robot collaboration to occur. A minimal encoding of such a graph is sketched below.
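The sketch below (Python; the three-task layout is our illustrative example, not from the original) encodes which task switches are possible and checks connectivity:

```python
import numpy as np

# Adjacency matrix E of the undirected task graph G for M = 3 tasks:
# e_ij = 1 means robots may switch between tasks i and j, e_ij = 0 forbids it.
E = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
assert (E == E.T).all()  # undirected graph
# Connectivity (a path between every pair of tasks) can be checked by
# verifying that a power of (E + I) has no zero entries.
reach = np.linalg.matrix_power(E + np.eye(3, dtype=int), 3)
assert (reach > 0).all()
```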
2.2 Ensemble Model
Given N robots and M collaborative tasks, let Y_i be a random variable which represents the population of robots at task i, and let the vector of random variables for all tasks be Y = [Y_1 ... Y_M]^T. We present a class of models that describe the stochastic jump process for how the robot ensemble evolves in time. Consider two collaborating robot populations, where robots performing task i and robots performing task j are expected to have spatially close interactions; then

\[ Y_i + Y_j \xrightarrow{k_{ij}} 2Y_i, \tag{1} \]
where k_ij describes the Poisson collaboration rate between the two tasks i and j, which exists for every nonzero edge in the task graph G. The physical intuition for a nonzero collaboration rate at the microscopic level is that when two agents performing different tasks have physically close interactions with one another, the result will be one agent changing task. Each collaboration rate k_ij can be put into a matrix K ∈ R^{M×M}, commonly called the payoff matrix. An entry of the payoff matrix, k_ij, maps what will happen when an interaction occurs between two agents performing task i and task j. Considering a row i of the payoff matrix, anywhere that k_ij = 0 means that when a robot performing task i interacts with a robot performing task j, nothing happens. We abuse notation by letting the sign of k_ij be negative or positive to express two types of switches. Specifically, if k_ij > 0 then the process results in a robot changing from task Y_j and starting to perform task Y_i, whereas if k_ij < 0 then the process results in a robot changing from task Y_i and starting to perform task Y_j. The Replicator Equations [25] are the dynamical systems which use the provided collaboration rates to describe the evolution of different agent behaviors, and can be written in vector form as

\[ \dot{Y}_i = Y_i\left[\sum_{j}^{M} k_{ij} Y_j - \sum_{p}^{M} Y_p \sum_{l}^{M} k_{pl} Y_l\right]. \tag{2} \]
The Replicator Equations in (2) can be constrained to satisfy the assumption that \(\sum_i^M Y_i = 1\), such that, if the initial state Y(0) satisfies this constraint, then all future states will satisfy this condition, maintaining a fixed population of N robots [25]. Note that Eq. (1) only considers collaborations between two task types, while there are third-order terms accounted for in the Replicator Equations. This simplification is justified because it is unlikely that three robots of the appropriate types collaborate simultaneously. This formulation allows us to consider the collaborations between robot populations and study how spatial robot-robot interactions change the ensemble dynamics. The Replicator Equations in (2) will behave differently for different structures of the payoff matrix, K, because each entry can be 0 or nonzero depending on the allowed collaborations specified by G. We will consider two relevant examples throughout the paper which have equilibria that guarantee time-varying distributions of robots that could be used in the littoral ocean monitoring scenario.
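To make the model concrete, Eq. (2) can be integrated numerically for any payoff matrix; a minimal sketch follows (Python with SciPy; the matrix shown is that of Example 1 below, with the reference rates used later in Sect. 5):

```python
import numpy as np
from scipy.integrate import solve_ivp

def replicator(t, Y, K):
    """Replicator dynamics, Eq. (2): dY_i/dt = Y_i * ((K Y)_i - Y.K.Y)."""
    fitness = K @ Y                      # sum_j k_ij Y_j
    return Y * (fitness - Y @ fitness)   # subtract the mean fitness

# Payoff matrix of Example 1 (Eq. 3) with the Sect. 5 reference rates.
k10, k12, k20, k21 = 2.0, 0.2, 1.5, 0.4
K = np.array([[0.0,  0.0,  0.0],
              [k10,  0.0, -k12],
              [-k20, k21,  0.0]])
Y0 = np.array([0.2, 0.2, 0.6])           # population fractions summing to 1
sol = solve_ivp(replicator, (0.0, 50.0), Y0, args=(K,), max_step=0.01)
```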
Example 1: Consider the canonical example of the Lotka-Volterra model, written as the Replicator Equations in (2) as detailed by Hofbauer [11]. This example has three tasks, M = 3, and the payoff matrix is

\[ K = \begin{bmatrix} 0 & 0 & 0 \\ k_{10} & 0 & -k_{12} \\ -k_{20} & k_{21} & 0 \end{bmatrix}. \tag{3} \]

The rules from potential interactions to collaboration are

\[ Y_0 + Y_1 \xrightarrow{k_{10}} 2Y_1, \quad Y_1 + Y_2 \xrightarrow{k_{12}} 2Y_2, \quad Y_0 + Y_2 \xrightarrow{k_{20}} 2Y_0, \quad Y_1 + Y_2 \xrightarrow{k_{21}} 2Y_2, \tag{4} \]

which correspond to the following differential equations that describe how the processes evolve in time,

\[ \begin{aligned} \dot{Y}_0 &= Y_0\big(k_{20}Y_0Y_2 - k_{10}Y_0Y_1 + (k_{12}-k_{21})Y_1Y_2\big), \\ \dot{Y}_1 &= Y_1\big(k_{10}Y_0 - k_{12}Y_2 + k_{20}Y_0Y_2 - k_{10}Y_0Y_1 + (k_{12}-k_{21})Y_1Y_2\big), \\ \dot{Y}_2 &= Y_2\big(k_{21}Y_1 - k_{20}Y_0 + k_{20}Y_0Y_2 - k_{10}Y_0Y_1 + (k_{12}-k_{21})Y_1Y_2\big). \end{aligned} \tag{5} \]
The parameters k_10, k_12, k_20, and k_21 are the collaboration rates, and define the result of an interaction between two robots performing task i or task j. The possible outcomes based on the collaboration rate are: keep doing the same tasks, or have one of the robots change task. This model is of interest to us because it exhibits time-varying population distributions. Consider the following equilibrium for positive robot populations,

\[ Y^d = \left[0,\; \frac{k_{21}}{k_{21}-k_{12}},\; \frac{k_{12}}{k_{12}-k_{21}}\right]^T. \]

If we linearize Eq. (5) about Y^d and compute the eigenvalues of the linearization, we get \(\lambda_0 = 0\), \(\lambda_1 = -ik_{21}\sqrt{k_{10}/k_{20}}\), and \(\lambda_2 = ik_{21}\sqrt{k_{10}/k_{20}}\). From dynamical systems theory we know that the system is classified as a center (see Guckenheimer and Holmes, Ch. 1 [9], for a detailed analysis). This means that for the same set of parameters, K, and different initial positions, Y(0), a unique non-intersecting orbit will occur around the fixed point. The oscillations that result from a center equilibrium produce time-varying distributions of robots. In our littoral ocean monitoring case, we can select parameters which match the tidal charts so that the desired sensing measurements are performed for representative amounts of time, without wasting storage resources on unnecessary data or compute resources on replanning.
Example 2: Consider a four-task example, M = 4, discussed in detail by Schuster et al. [24]. The payoff matrix is written as follows,

\[ K = \begin{bmatrix} 0 & 1 & -\mu & 0 \\ 0 & 0 & 1 & -\mu \\ -\mu & 0 & 0 & 1 \\ 1 & -\mu & 0 & 0 \end{bmatrix}. \tag{6} \]

The collaboration definitions follow directly from Eq. (1), and the resulting dynamical system, using Eq. (2), comes out to

\[ \begin{aligned} \dot{Y}_1 &= Y_1\big(Y_2 - \mu Y_3 + 2\mu Y_1Y_3 + 2\mu Y_2Y_4 - Y_1Y_2 - Y_2Y_3 - Y_3Y_4 - Y_4Y_1\big), \\ \dot{Y}_2 &= Y_2\big(Y_3 - \mu Y_4 + 2\mu Y_1Y_3 + 2\mu Y_2Y_4 - Y_1Y_2 - Y_2Y_3 - Y_3Y_4 - Y_4Y_1\big), \\ \dot{Y}_3 &= Y_3\big(Y_4 - \mu Y_1 + 2\mu Y_1Y_3 + 2\mu Y_2Y_4 - Y_1Y_2 - Y_2Y_3 - Y_3Y_4 - Y_4Y_1\big), \\ \dot{Y}_4 &= Y_4\big(Y_1 - \mu Y_2 + 2\mu Y_1Y_3 + 2\mu Y_2Y_4 - Y_1Y_2 - Y_2Y_3 - Y_3Y_4 - Y_4Y_1\big). \end{aligned} \tag{7} \]
Notice that for k_ij = 1 an interaction results in a change from task j to task i, and in the case that k_ij = −μ the result is a change from task i to task j. Again, this model is of interest because of equilibrium behavior which provides alternative time-varying population distributions. Schuster et al. proposed the following equilibrium,

\[ Y^d = \left[\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right]^T, \]

and if we linearize Eq. (7) about Y^d and compute the eigenvalues of the linearization, we get \(\lambda_0 = \frac{-1+\mu}{4}\), \(\lambda_1 = \frac{-i+\mu}{4}\), \(\lambda_2 = \frac{i+\mu}{4}\), and \(\lambda_3 = \frac{-1-\mu}{4}\). The analysis performed by Schuster et al. concluded that at the equilibrium point Y^d, μ is a bifurcation parameter. When μ < 0 the system is stable because the real parts of all eigenvalues are negative. At μ = 0 the system is at the bifurcation point, where the imaginary axis is crossed and the negative real parts of the complex pair λ_1 and λ_2 vanish. Finally, when μ > 0 the system exhibits a stable Hopf bifurcation for small enough μ (see Schuster et al. [24], Sect. 4, for further details on the analysis). The existence of a stable Hopf bifurcation means that an attracting limit cycle exists in some neighborhood of the equilibrium, so the resulting robot populations will fluctuate at regulated intervals governed by the limit cycle. In the littoral ocean monitoring task, we can break the tides down into four periods of interest and pick parameters to match, such that different sensing happens based on local interactions during the corresponding time, with minimal replanning or information exchanged.
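This linearization is easy to check numerically; a minimal sketch (Python with NumPy; the finite-difference Jacobian is our own convenience, not the authors' method):

```python
import numpy as np

def replicator(Y, K):
    """Replicator dynamics of Eq. (2)."""
    fit = K @ Y
    return Y * (fit - Y @ fit)

mu = 0.01
K = np.array([[0, 1, -mu, 0],
              [0, 0, 1, -mu],
              [-mu, 0, 0, 1],
              [1, -mu, 0, 0]])

# Finite-difference Jacobian at the uniform equilibrium Y^d = [1/4, ..., 1/4].
Yd, eps = np.full(4, 0.25), 1e-7
J = np.column_stack([(replicator(Yd + eps * e, K) - replicator(Yd, K)) / eps
                     for e in np.eye(4)])
print(np.linalg.eigvals(J))  # expect (-1+mu)/4, (mu-i)/4, (mu+i)/4, (-1-mu)/4
```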
3 Ensemble Model Control
Our goal is to maintain proximity to an equilibrium of interest which provides time-fluctuating distributions of robots performing different behaviors. Previous work has proposed controlling different aspects of the linear mean field model by considering different nonlinear feedback strategies [4,19]. We aim to maintain a trajectory by actuating on the collaboration rates, to control how the interaction
with other agents influences the switch to different behaviors. The macroscopic continuous solutions to our proposed models give deterministic behavior, and will result in constant oscillations, either around the center equilibrium in Example 1 or the limit cycle in Example 2. However, because our models are representations of stochastic processes, we know that, due to variability in the robot-robot and robot-environment interactions, our desired distributions may not be achieved over time. This is important because, if the environment we are monitoring has specific windows in which phenomena can be observed, it is necessary that we stay near the desired distributions.
3.1 Feedback Controller
We propose a modification to the collaboration rate, k_ij, to take as feedback a desired trajectory, Y_i^*(t). The evaluation of Y_i^* is obtained from the numerical solution of the continuous macroscopic differential equations, Eq. (2), with the desired parameters. Consider the following trajectory-tracking controller,

\[ k_{ij} = \alpha_{ij}\left(\frac{Y_i^*(t)}{Y_i(t)} - 1\right), \tag{8} \]
where α_ij is a tunable parameter, Y_i^*(t) is the desired trajectory, and Y_i(t) is the current population fraction at time t. If we plug this new collaboration rate into Eq. (1) we get

\[ Y_i + Y_j \xrightarrow{\alpha_{ij}\left(\frac{Y_i^*}{Y_i} - 1\right)} 2Y_i. \tag{9} \]
After substituting Eq. (8) into the Replicator Equations in (2) we get

\[ \begin{aligned} \dot{Y}_i &= Y_i\left[\sum_{j}^{M} \alpha_{ij}\left(\frac{Y_i^*}{Y_i} - 1\right)Y_j - \sum_{p}^{M} Y_p \sum_{l}^{M} \alpha_{pl}\left(\frac{Y_p^*}{Y_p} - 1\right)Y_l\right] \\ &= \sum_{j}^{M} \alpha_{ij}\left(Y_i^* - Y_i\right)Y_j - Y_i\sum_{p}^{M}\sum_{l}^{M} \alpha_{pl} Y_l \left(Y_p^* - Y_p\right). \end{aligned} \tag{10} \]
The resulting (Y_i^* − Y_i) acts as a proportional error term which regulates the population based on the desired trajectory. Notice that, as before, we will have collaboration rates that are negative, which will switch the resulting collaboration outcome. The same substitutions can be replicated for both Example 1 and Example 2.
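In code, Eq. (8) amounts to a one-line proportional rate update (a Python sketch; the guard against a vanishing population fraction is our addition):

```python
def feedback_rate(alpha_ij, Y_star_i, Y_i, eps=1e-9):
    """Trajectory-tracking collaboration rate, Eq. (8).

    Positive when the population at task i is below its desired
    trajectory (recruit into i), negative when above (shed from i).
    """
    return alpha_ij * (Y_star_i / max(Y_i, eps) - 1.0)
```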
3.2 Distributed Microscopic Algorithm
To demonstrate that the proposed nonlinear stochastic representations are beneficial for modeling and controlling robot teams we will outline an idealized microscopic scenario, which acts as a proxy for teams performing environmental
monitoring. The goal is to find the area of interaction, a_ij, and the corresponding radius of interaction, r_ij, which are microscopic instantiations of the macroscopic collaboration rate k_ij. If there is a nonzero collaboration rate, k_ij ≠ 0, then we know that when robots interact within a certain area a task switch will occur. Notice that if the collaboration rate is zero, then the robots may interact but this interaction will not result in collaboration. The area of interaction is directly related to the collaboration rate, which is a measure of how close two agents need to be to one another for a task switch to happen. Let Ŷ_i be the population count for robots performing task i; then the area of interaction, a_ij, is computed as follows,

\[ a_{ij} = \frac{A k_{ij}}{v_{ij} \hat{Y}_i \hat{Y}_j}, \tag{11} \]
where A is the area of the environment to monitor, k_ij is the collaboration rate, v_ij is the relative speed of robots i and j, and Ŷ_i Ŷ_j counts all disjoint pairs of possible collaborations given the populations of robots performing these tasks. The robot radius of interaction is r_ij = √(a_ij/π). Incorporating the proposed feedback controller, Eq. (8), consider

\[ a_{ij} = \frac{A k_{ij}}{v_{ij} \hat{Y}_j \frac{1}{\hat{Y}_i^* - \hat{Y}_i}}, \tag{12} \]
where Ŷ_i^* = Y_i^* N is the re-scaling of the desired trajectory population fraction by the total number of robots in the system. The intuition for the feedback term is that when Ŷ_i is far from Ŷ_i^*, the resulting area of interaction increases; likewise, when Ŷ_i is close to Ŷ_i^*, the area of interaction decreases. The interaction area is updated in time as the microscopic simulations run. We extend our controller to consider distributed estimation of the population dynamics. Similar to other methods [3,15], we propose a local estimation method for robots performing tasks. Consider that each robot has a fixed sensing radius, R_i, and the ability to determine which robots within its radius are performing which behaviors. The neighborhood of the robot, N_i, is every robot within R_i, and the population estimate for robots performing different tasks within N_i is Ỹ_i. This method assumes that within the sensing radius, R_i, there will be a sufficient density of robots to act as a proxy for the overall distribution of robot behaviors. Substituting Ỹ_i for Ŷ_i in Eq. (12) gives a new locally computed interaction area. Note that to scale the desired trajectory, Ỹ_i^* = Y_i^* N_i. Using local estimates and an appropriately scaled desired trajectory, we have eliminated the need for global information in our microscopic experiments.
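A sketch of the resulting radius computation (Python; written under our reading of Eq. (12), in which the area grows as the population falls behind its desired value, and with our own clamping for robustness):

```python
import math

def interaction_radius(A, k_ij, v_ij, Y_j_hat, Y_i_star_hat, Y_i_hat, eps=1e-9):
    """Interaction radius from the feedback interaction area of Eq. (12)."""
    # Area grows when Y_i_hat lags its desired value Y_i_star_hat.
    a_ij = A * k_ij * (Y_i_star_hat - Y_i_hat) / max(v_ij * Y_j_hat, eps)
    return math.sqrt(max(a_ij, 0.0) / math.pi)  # r = sqrt(a / pi)
```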
4 Simulation and Experimental Setup
We used three levels of fidelity to evaluate our proposed approach: macro-discrete simulation trials, microscopic simulation trials, and mixed reality experimental
trials. Parameters for all results were selected to be near our equilibria of interest for both Example 1 and Example 2, which ensures that we see time-varying distributions of robots performing tasks. Our macro-discrete simulations used the Stochastic Simulation Algorithm (SSA) from Gillespie [8]. SSA takes in initial populations of robots Ŷ(0) and collaboration rates K. To determine the next time and type of event, SSA uses a pseudo-random number generator. While the total time and total number of transitions are not exceeded, the following two steps are taken iteratively. The next event is determined by considering the likelihood of a certain task switch based on the current population counts Ŷ_i and the collaboration rates k_ij. Consider that each collaboration type has the following scaled collaboration rate,

\[ b_{ij} = k_{ij} \hat{Y}_i \hat{Y}_j. \tag{13} \]
Select a random number r_1 and scale it by \(b_n = \sum_i^M \sum_j^M b_{ij}\), the total of all possible collaborations. The next event is whichever b_ij is closest to the scaled random number. Note that the resulting switch is defined by the reactions outlined in Eq. (1), and the population counts Ŷ_i and Ŷ_j are updated accordingly. Next, the time t_p is updated to the next time t_{p+1} by generating a random number r_2 such that
Fig. 1. The multi-robot Coherent Structure Testbed (mCoSTe) water tank where the miniature Autonomous Surface Vehicles (mASV) perform different tasks.
\[ t_{p+1} = t_p + \log\!\left(\frac{1}{r_2}\right)\frac{1}{b_n}. \tag{14} \]
The process continues until the maximum time or the maximum number of transitions is met. An important observation is that SSA ignores robot dynamics and considers only the stochastic jump process, making it mathematically equivalent to microscopic simulations.
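A minimal Gillespie-style SSA loop for this jump process might look as follows (a Python sketch under our own structural assumptions; it omits the transition cap and the extinction guard described below, and uses the sign convention of Eq. (1)):

```python
import numpy as np

def ssa(Y0, K, t_max, rng=np.random.default_rng()):
    """Gillespie SSA over the collaboration reactions of Eq. (1)."""
    Y, t, history = np.array(Y0, dtype=float), 0.0, []
    M = len(Y)
    while t < t_max:
        # Scaled rates b_ij = k_ij * Y_i * Y_j (Eq. 13); use magnitudes,
        # the sign of k_ij only decides the direction of the switch.
        b = np.abs(K) * np.outer(Y, Y)
        b_n = b.sum()
        if b_n == 0:
            break
        # Next event: pick the pair (i, j) with probability b_ij / b_n.
        idx = rng.choice(M * M, p=(b / b_n).ravel())
        i, j = divmod(idx, M)
        if K[i, j] > 0:      # a robot leaves task j and joins task i
            Y[i] += 1; Y[j] -= 1
        elif K[i, j] < 0:    # negative rate flips the switch direction
            Y[i] -= 1; Y[j] += 1
        # Time to the next event (Eq. 14).
        t += np.log(1.0 / rng.random()) / b_n
        history.append((t, Y.copy()))
    return history
```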
Specific to our implementation, the original Replicator Equations in (2) require initial conditions that sum to 1, which is not compatible with SSA. To deal with this, we scale the population fractions by N; the result is a shift in the magnitude of the parameters used for the macroscopic continuous solutions and the macro-discrete trials. In addition, to prevent behavior extinction, we enforce that each behavior is performed by at least one agent at any given time. To further validate our method, microscopic simulation trials were introduced to incorporate robot dynamics. Robots were simulated as ideal point particles with unit velocity, colliding with the boundary of the workspace with an angle of deflection of π/2. A difference from the macro-discrete trials is that robots now compute a direct interaction radius r_ij based on the collaboration rates. This means that when an encounter with another robot occurs, if there is a nonzero collaboration rate and the agents are spatially within r_ij, then the predetermined interaction task switch ensues. Trials were run for 100 s, with a fixed number of robots represented by point particles in an 18 × 18 unit area. Experimental results were obtained in the multi-robot Coherent Structure Testbed (mCoSTe), Fig. 1, which includes the miniature Autonomous Surface Vehicles (mASVs) and a Multi Robot Tank (MR tank). The MR Tank is a 4.5 m × 3.0 m × 1.2 m water tank equipped with an OptiTrack motion capture system. The mASVs are a differential drive platform; they localize using 120 Hz motion capture data, communicate via XBee, and perform onboard processing with an Arduino Fio. Mixed reality experiments were performed to increase the robot density, due to the limited number of mASV platforms available. Experiments use 4 mASVs and 6 idealized point particles with the same configuration as the microscopic simulation trials. To implement our method on the mASVs, we assigned waypoints to robots by randomly selecting positions on the environmental boundary, which enclosed an area of 2.3 m × 1.3 m. During the mixed reality experiments, robots react to simulated agents as if they were in the environment. Behavior switches occur based on the interaction radius r_ij: when any agent, robot or simulated, is within r_ij, the predetermined collaboration switch happens. The interaction radius comes directly from the calculation of the area of interaction, Eq. (12), which is the connection between the macroscopic model and the microscopic implementation.
5 Results
Fig. 2. Example 1 has three time-varying tasks; plots include five SSA trials. Figure (a) parameters: Ŷ(0) = [2 50 50]^T, with k10 = 0.004, k12 = 0.0004, k20 = 0.003, and k21 = 0.0008. Figure (b) parameters: Ŷ(0) = [2 50 50]^T, with α10 = 0.03, α12 = 0.003, α20 = 0.0225, and α21 = 0.006.
Our results use multiple levels of fidelity to demonstrate how interactions can induce time-varying distributions of robot behaviors. When possible, the same initial conditions were used for both Example 1 and Example 2 to help draw comparisons between different types of results. Specifically, the desired trajectories for Example 1 and Example 2 were the same for all fidelity results. The initial conditions for Example 1 were a reference trajectory of Y*(0) = [0.2 0.2 0.6]^T with parameters k10 = 2.0, k12 = 0.2, k20 = 1.5, and k21 = 0.4. The initial conditions for Example 2 were a reference trajectory of Y*(0) = [0.1 0.2 0.4 0.3]^T and parameter μ = 0.01. Figures contain reference trajectories Y* as solid lines, and resulting population fractions as dots. Further details of the parameters used for each fidelity level of simulation and experiment are in the corresponding figure captions. For both Example 1 and Example 2 we selected parameters near the equilibria presented in Sect. 2.2 so that time-varying distributions of robot behaviors were observed.
Fig. 3. Example 2 has four time-varying tasks; plots include five SSA trials. Figure (a) parameters: Ŷ(0) = [5 10 15 20]^T, with k12 = k23 = k34 = k41 = 0.01 and μ = 0.0001. Figure (b) parameters: Ŷ(0) = [5 10 15 20]^T, with α12 = α23 = α34 = α41 = 0.1, and μ = 0.001.
Five macro-discrete SSA trial results are shown in Figs. 2 and 3. In the case without control, Figs. 2a and 3a, each trial roughly follows the desired time-varying trajectory, but the variability is significant. This variability can be attributed to microscopic behaviors not being accounted for during SSA. In the case with control, Figs. 2b and 3b, the trials follow the trajectory closely, as we anticipated. The gains α_ij were tuned so that the rate of interactions was sufficiently high; with an insufficient interaction rate, the SSA trials would lag behind the reference trajectory.
Fig. 4. Controlled microscopic results of Example 1. Figure (a) uses global population estimation, and Figure (b) uses distributed population estimation where Ri = 5. Both trials use Ŷ(0) = [2 25 25]^T, α10 = 2.0, α12 = 0.2, α20 = 1.5, and α21 = 0.4.
Fig. 5. Controlled microscopic results of Example 2 for two distributed estimation cases. Figure (a) uses sensing radius Ri = 10, and Figure (b) uses sensing radius Ri = 5. Both trials used Ŷ(0) = [5 10 20 15]^T, μ = 0.05, α12 = α23 = α34 = α41 = 0.5.
Microscopic simulation trials are shown in Fig. 4, which provides a comparison between the centralized and distributed approaches. The distributed method shown in Fig. 4b is not as accurate, but roughly follows the trajectory, which is to be expected. In Fig. 5 a direct comparison between different sensing radii is studied: as R_i decreases, the number of neighbors decreases, giving less accurate estimates of the robot task distributions. The rate at which accuracy is lost as R_i shrinks is related to the density of robots in the environment. These microscopic scenarios are the most idealized version of environmental monitoring task assignment.
Fig. 6. Figure (a) shows a centralized mixed reality experiment, and Figure (b) shows a distributed mixed reality experiment with Ri = 0.25. The parameters used are the same as those used in Fig. 4. The environment area is 2.3 m × 1.3 m and Ŷ(0) = [2 4 4]^T.
Our final evaluation demonstrates that using physical robots in a mixed reality setting produces comparable results. In the mixed reality experimental trials, robots and simulated agents respond to one another in the environment, and the desired distributions of behaviors are achieved in Fig. 6. Notice that in the centralized case, Fig. 6a, the populations closely follow the desired trajectory. The variability in the distributed estimation plot, Fig. 6b, is due to the breakdown of the underlying assumption that, within the local sensing radius R_i, the robot has a representative sample of other robots performing tasks. Video demonstrations of the mixed reality experiments can be found at https://youtu.be/X0QHgYM1s-g.
6 Discussion
Across multiple types of experimental results, we have demonstrated that we can control time-varying distributions of robots. Our results consider the most ideal case, where robots move randomly and consistently throughout the environment, with parameters selected near equilibria of interest to ensure that our desired behavior was observed. There are several directions of future work which need to be addressed to further enhance this approach. For example, more realistic robot behavior should be considered, where there may be prolonged stops to complete a task or complex non-random behavior which would change the reaction dynamics of our macroscopic model. Likewise, we have so far selected parameters to show that we can achieve time-varying distributions of robots; in practice, however, we want the team to evolve based on information sensed in the environment. One way to address these non-idealities is to consider feedback from the microscopic team to the macroscopic model. This would require taking data collected from the individual robots and transforming it into a form that is compatible with the macroscopic model, for example by determining the collaboration rate parameters k_ij. A direction of great interest to the authors is to consider more broadly the classes of replicator dynamics which achieve our task objective, and potentially other, more diverse task objectives. An important property of the replicator dynamics which we have identified with our two examples is the need for equilibria with either limit-cycle or center behavior. For example, the resulting oscillations would not exist for parameters or equilibria which resulted in a payoff matrix with only real eigenvalues, which we know from dynamical systems theory results in stable solutions [9]. Currently, we enforce that at least one agent must be performing each task at any given time to avoid extinction events. Extinction can occur when the selected parameters drive the populations to extreme shifts in size, for example, all agents performing one task. Without our stated assumption, there is no way for the team to recover the other tasks if extinction happens. A solution would be to consider equilibria which do not cause such extreme population fluctuations. Alternatively, there are different nonlinear systems, for example the Replicator-Mutator equations [21], which introduce variability to the model to help ensure that only traits which are strong persist.
Finally, during the mixed reality experimental trials, robot collisions can occur and are non-catastrophic due to the small size and low speed of the mASVs. The addition of collision avoidance requires an additional scaling term in the computation of the interaction radius and is a direction for future work. An additional direction of future work is to consider how to expand the experimental results to further improve the distributed estimation using robot-robot communication. A potential approach, similar to [17], would produce more stable results by considering a history of observations made about neighbors.
7 Conclusion
In this work, we address the limitations of existing macroscopic modeling techniques in modeling and controlling time-varying distributions of robot behaviors. This is of particular importance for environmental monitoring tasks, where natural phenomena, for example tidal shifts, might require the populations of robots to change behavior. Our proposed methods use robot-robot interactions, which are explicitly ignored by existing linear mean field models. The result is a rich set of time-varying robot behavior distributions that were not previously achievable. In addition, to counteract known uncertainty during robot team deployment, we propose a feedback control strategy which follows a specified desired distribution. Our macro-discrete results show that we get improved control over the model, and the idealized microscopic examples support that this is a fruitful avenue for further study. Future work includes understanding different types of nonlinear models, and further characterizing the uncertainty by seeking upper and lower bounds on the macroscopic model.

Acknowledgement. We gratefully acknowledge the support of ARL DCIST CRA W911NF-17-2-0181, Office of Naval Research (ONR) Award No. N00014-22-1-2157, and the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program.
References

1. Almadhoun, R., Taha, T., Seneviratne, L., Zweiri, Y.: A survey on multi-robot coverage path planning for model reconstruction and mapping. SN Appl. Sci. 1(8), 1–24 (2019)
2. Berman, S., Halász, A., Hsieh, M.A., Kumar, V.: Optimized stochastic policies for task allocation in swarms of robots. IEEE Trans. Rob. 25(4), 927–937 (2009)
3. Biswal, S., Elamvazhuthi, K., Berman, S.: Decentralized control of multi-agent systems using local density feedback. IEEE Trans. Autom. Control 67(8), 3920–3932 (2021)
4. Deshmukh, V., Elamvazhuthi, K., Biswal, S., Kakish, Z., Berman, S.: Mean-field stabilization of Markov chain models for robotic swarms: computational approaches and experimental results. IEEE Robot. Autom. Lett. 3(3), 1985–1992 (2018). https://doi.org/10.1109/LRA.2018.2792696
5. Dey, B., Franci, A., Özcimder, K., Leonard, N.E.: Feedback controlled bifurcation of evolutionary dynamics with generalized fitness. In: 2018 Annual American Control Conference (ACC), pp. 6049–6054. IEEE (2018)
6. Elamvazhuthi, K., Berman, S.: Mean-field models in swarm robotics: a survey. Bioinspiration Biomimetics 15(1), 015001 (2019). https://doi.org/10.1088/1748-3190/ab49a4
7. Gerkey, B.P., Mataric, M.J.: Multi-robot task allocation: analyzing the complexity and optimality of key architectures. In: 2003 IEEE International Conference on Robotics and Automation (ICRA), vol. 3, pp. 3862–3868. IEEE (2003)
8. Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977)
9. Guckenheimer, J., Holmes, P.: Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, vol. 42. Springer, Cham (2013). https://doi.org/10.1007/978-1-4612-1140-2
10. Harwell, J., Sylvester, A., Gini, M.: Characterizing the limits of linear modeling of non-linear swarm behaviors. arXiv preprint arXiv:2110.12307 (2021)
11. Hofbauer, J.: On the occurrence of limit cycles in the Volterra-Lotka equation. Nonlinear Anal. Theory Methods Appl. 5(9), 1003–1007 (1981)
12. Hsieh, M.A., Halász, Á., Berman, S., Kumar, V.: Biologically inspired redistribution of a swarm of robots among multiple sites. Swarm Intell. 2(2), 121–141 (2008)
13. Hsieh, M.A., Halasz, A., Cubuk, E.D., Schoenholz, S., Martinoli, A.: Specialization as an optimal strategy under varying external conditions. In: 2009 IEEE International Conference on Robotics and Automation (ICRA), pp. 1941–1946 (2009). https://doi.org/10.1109/ROBOT.2009.5152798
14. Khamis, A., Hussein, A., Elmogy, A.: Multi-robot task allocation: a review of the state-of-the-art. Coop. Robots Sens. Netw. 2015, 31–51 (2015)
15. Lee, W., Kim, D.: Adaptive approach to regulate task distribution in swarm robotic systems. Swarm Evol. Comput. 44, 1108–1118 (2019)
16. Leonard, N.E.: Multi-agent system dynamics: bifurcation and behavior of animal groups. Annu. Rev. Control. 38(2), 171–183 (2014)
17. Lerman, K., Jones, C., Galstyan, A., Matarić, M.J.: Analysis of dynamic task allocation in multi-robot systems. Int. J. Robot. Res. 25(3), 225–241 (2006). https://doi.org/10.1177/0278364906063426
18. Lerman, K., Martinoli, A., Galstyan, A.: A review of probabilistic macroscopic models for swarm robotic systems. In: Şahin, E., Spears, W.M. (eds.) Swarm Robotics, pp. 143–152. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30552-1_12
19. Mather, T.W., Hsieh, M.A.: Distributed robot ensemble control for deployment to multiple sites. In: Robotics: Science and Systems VII (2011)
20. Nam, C., Shell, D.A.: Analyzing the sensitivity of the optimal assignment in probabilistic multi-robot task allocation. IEEE Robot. Autom. Lett. 2(1), 193–200 (2017). https://doi.org/10.1109/LRA.2016.2588138
21. Pais, D., Caicedo-Nunez, C.H., Leonard, N.E.: Hopf bifurcations and limit cycles in evolutionary network dynamics. SIAM J. Appl. Dyn. Syst. 11(4), 1754–1784 (2012)
22. Prorok, A., Hsieh, M.A., Kumar, V.: The impact of diversity on optimal control policies for heterogeneous robot swarms. IEEE Trans. Rob. 33(2), 346–358 (2017)
23. Ravichandar, H., Shaw, K., Chernova, S.: STRATA: unified framework for task assignments in large teams of heterogeneous agents. Auton. Agents Multi Agent Syst. 34(2), 38 (2020)
Stochastic Nonlinear Ensemble Modeling and Control
99
24. Schuster, P., Sigmund, K., Hofbauer, J., Gottlieb, R., Merz, P.: Selfregulation of behaviour in animal societies. Biol. Cybern. 40(1), 17–25 (1981) 25. Sigmund, K.: A survey of replicator equations. In: Casti, J.L., Karlqvist, A. (eds.) Complexity, Language, and Life: Mathematical Approaches. Biomathematics, vol. 16, pp. 88–104. Springer, Heidelberg (1986). https://doi.org/10.1007/978-3-64270953-1 4
A Decentralized Cooperative Approach to Gentle Human Transportation with Mobile Robots Based on Tactile Feedback

Yi Zhang(B), Yuichiro Sueoka(B), Hisashi Ishihara, Yusuke Tsunoda, and Koichi Osuka

Osaka University, Suita, Osaka 565-0871, Japan
{zhang.y,sueoka,y.tsunoda,osuka}@mech.eng.osaka-u.ac.jp, [email protected]
Abstract. Cooperative transportation of objects by a group of small mobile robots is expected to be useful at disaster sites. In this study, we aim to transport fragile objects, including humans, which may move during transport, with as little burden on them as possible. We propose mounting a thick, flexible tri-axis tactile sensor on top of each robot, where the object rests, for safe support and state monitoring. We improve leader-follower based control by adding force feedback to the leader's control law, preventing the excessive force on the object that is the disadvantage of the typical leader-follower method. We verified in a dynamical simulation and in physical experiments that the robots can transport a rigid body gently with the proposed control law, with their speeds converging to a common value. Additionally, we found that multijoint objects can also be transported with the proposed method, whereas stable support of the object by each robot is reserved for future work.
Keywords: cooperative transportation · flexible tactile sensor · human transportation

1 Introduction
Recently, cooperative transportation of objects by a swarm of small mobile robots has attracted much attention [1-3]. Because of their scalability, flexibility, and robustness [4-7], swarm robots are expected to be applied to work at disaster sites [8]. Examples of disaster-site work include the cooperative removal of rubble and fallen trees and the transportation of supplies and injured people. Since a single robot is not versatile enough to perform such tasks, methods have been proposed to realize them through the cooperation of multiple robots. Recently, many studies on cooperative multi-robot transportation systems [1-3] have been conducted. Cooperative transportation methods can be roughly divided into pushing-only, grasping, and caging. In particular, many studies
have adopted the method of pushing and pulling on the ground for pushing-only and caging. Among them, there are studies dealing with human transportation [9,10]; in [10], however, the human is pulled and transported along the ground by swarm robots, which is not gentle. Among systems that carry adult-sized people on the robots themselves, the design in [9] uses about three modular robots, and the large size of each module makes it difficult to accommodate different body sizes. Thus, though there is a demand for robots that transport people at disaster sites, there are few studies of cooperative transport systems in which multiple robots carry a human on their tops and are suitable for various body types.

When small robots carry a fragile object on their tops, the materials covering the contact surfaces should be soft. If the robot parts are made of hard materials, a substantial force will occur on the contact surface between the robot and the object to be transported, which may damage the object. Safe interaction is necessary, and objects such as humans need to be treated gently, which can be achieved by using flexible materials [14,15]. Another advantage of using soft materials as a contact surface is that, by introducing a sensor that measures the deformation of the contact surface, each robot can determine its movement so that no extra force is applied to the object. According to the action-reaction law, if we move the robot so that the deformation of the contact surface is reduced, the force applied to the object is also reduced. In most previous studies, the force information is measured at the joints rather than at the surface that handles the object.

Here, we consider the implementation of flexible tactile sensors that measure force through deformation on the contact surfaces of all robots in a leader-follower system. Leader-follower is a typical cooperative transportation method in which the leader decides the direction of transport, and the followers move the object to assist the leader's effort [16-20]. We consider the method suitable for transporting humans, who may change position during transport, because the followers can keep following each body part based on the force information. However, the transportation is not gentle if the leader robot runs forward without paying attention to the overall transportation situation. In [19,20], the control law is designed so that the leader tries to run at the desired velocity at all times. This lets the leader exert excessive force on the object when the desired velocity changes suddenly. Such situations include the beginning of the transport and emergencies where the leader detects a danger, such as a soon-to-collapse house, and needs to evacuate as soon as possible. This is why the control of the leader robot also needs to include feedback from its force sensor. Therefore, in this study, we introduce force feedback into the control law for the leader, as a major difference from the conventional leader-follower based method. Using the improved leader-follower method, when the leader issues a speed command far from the current system speed, the designed control law restricts the force the leader applies to the object by moving toward the direction in which force is applied, i.e., slowing down when force
is applied in the direction opposite to the desired velocity. Since each robot's control is completed by its own sensors and controller, there is no need for communication with other robots; the system is fully autonomous and decentralized. Thus, this study adopts an approach to gentle transportation and safe interaction, without applying excessive force, by improving both the hardware and the software. We define "gentle" as the characteristic of a device that has a surface enabling safe interaction with humans and that does not apply undue force to the transported subject. This is achieved by adopting a robot equipped with a thick, flexible tri-axis tactile sensor and an improved leader-follower based control that includes force feedback in the leader's control. In this paper, firstly, an autonomous decentralized controller for cooperative transport is designed based on the force information from the flexible sensors. Then, a model including the flexible sensor's dynamics is constructed. Speed convergence, consensus speed characteristics, and gentleness to the object of the designed autonomous decentralized control are analyzed through simulations and physical experiments using a rigid object. In addition, we discuss the performance of the designed controller extended to multijoint transport, through transport experiments with a replica of a half-size skeleton.

Contributions:
I. An approach to gentle transport without applying excessive force by improving both the hardware and the control law;
II. A modeling method for the system that accounts for the flexible tactile sensor;
III. Verification of the extensibility of the controller to multijoint objects.
2 Methodology
In this section, we discuss the mechanism of the flexible tactile sensor and the control law for gentle cooperative transportation, which uses the flexible tactile sensor's readings.

2.1 Flexible Tactile Sensors and Methods of Acquiring Force Information
In recent years, various flexible tactile sensors have been developed [11-13] and their design methods have been established [11]. In this study, a force sensor developed by Kawasetsu et al. [12], which has a flexible, thick contact layer and can measure triaxial force, was adopted for the part that interacts with the object. The force exerted on the object during transport is measured at the contact area, enabling transport without exerting excessive force on the object. The flexible tactile sensor consists of two main parts: the flexible contact part and the circuit board, which includes a 4-channel inductance-to-digital converter (LDC1614EVM, Texas Instruments). The flexible contact part has a disk-shaped magnetorheological elastomer (a ferromagnetic marker) in the center that deforms according to the applied force.
Fig. 1. (a) The measurement mechanism based on the flexible tactile sensor's deformation. (b) Coil position indexing for the calculation of the force-related values.
In [12], the silicone rubber ECOFLEX 00-30 (Smooth-On) was used, whereas in this study ECOFLEX 00-20 (Smooth-On), a softer silicone rubber, was used to handle objects more gently and to increase sensor sensitivity. The magnetorheological elastomer in the center is made of elastomer containing iron powder. Deformation of the magnetorheological elastomer changes the inductances $L_i \in \mathbb{R}$, $i \in \{1, 2, 3, 4\}$, measured by the four coils shown in Fig. 1, from which $L_x$, $L_y$ and $L_z$ are determined as in Equations (1)-(3); these are known to correspond linearly to the forces along the three axes. We calibrate the sensor by centering each $L_i$ at zero when no load is on the sensor. During the experiment, when force is applied to the sensor, we obtain three values proportional to the force as follows:

$$L_x = (L_1 + L_4) - (L_2 + L_3), \qquad (1)$$
$$L_y = (L_1 + L_2) - (L_3 + L_4), \qquad (2)$$
$$L_z = L_1 + L_2 + L_3 + L_4. \qquad (3)$$

Force can be calculated from $L_x$, $L_y$ and $L_z$ for each corresponding direction after calibration. However, we use $L_x$ and $L_y$ in the control law directly because of their linear relationship with the forces [12].
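The mapping from coil inductances to the force-related values is simple enough to state in code. The following Python sketch is our illustration of Equations (1)-(3) with zero-load calibration; it is not the authors' implementation, and the names (`FlexibleTactileSensor`, `calibrate`, `read`) are hypothetical:

```python
import numpy as np

class FlexibleTactileSensor:
    """Converts the four coil inductances of the tri-axis tactile
    sensor into the force-proportional values Lx, Ly, Lz."""

    def __init__(self):
        self.offset = np.zeros(4)  # zero-load inductance baseline

    def calibrate(self, unloaded_readings):
        # Center each coil inductance L_i at zero when no load is
        # on the sensor, as described in Sect. 2.1.
        self.offset = np.asarray(unloaded_readings, dtype=float)

    def read(self, raw_readings):
        L1, L2, L3, L4 = np.asarray(raw_readings, dtype=float) - self.offset
        Lx = (L1 + L4) - (L2 + L3)   # Eq. (1): shear along x
        Ly = (L1 + L2) - (L3 + L4)   # Eq. (2): shear along y
        Lz = L1 + L2 + L3 + L4       # Eq. (3): normal load
        return Lx, Ly, Lz
```

Scaling the returned Lx and Ly by the gains KF or KL then yields the speed inputs used in the control laws of Sect. 2.2.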
2.2 Design of Control Law
Consider a situation where not all robots know the path along which the object should move. In this case, the robot that knows the path to the goal can act as the leader and move the object forward. The remaining follower robots can sense the direction the leader wants to go from the force information transmitted through the object, and then follow the leader. There is no need for communication between robots. However, the transportation is not gentle if the leader runs forward without paying attention to the overall transportation situation or to movements made by the human, such as stretching a limb. Therefore, the control of the leader robot also needs to include feedback from its force sensor. The system schema is shown in Fig. 2.
Fig. 2. Cooperative transportation using the leader-follower model.
For the follower robot's control law, $[L_x^F, L_y^F]^T$ obtained from the flexible tactile sensor is multiplied by a gain $K_F \in \mathbb{R}$ to determine the follower's speed input $[v_{in,x}^F, v_{in,y}^F]^T$, as shown in Equation (4):

$$\begin{bmatrix} v_{in,x}^F \\ v_{in,y}^F \end{bmatrix} = K_F \begin{bmatrix} L_x^F \\ L_y^F \end{bmatrix} \qquad (4)$$

For the leader robot's control law, in addition to the desired velocity $v_d \in \mathbb{R}^2$ for the leader along the predetermined path, $[L_x^L, L_y^L]^T$ obtained from the flexible tactile sensor is multiplied by a gain $K_L \in \mathbb{R}$ to determine the leader's speed input $[v_{in,x}^L, v_{in,y}^L]^T$, as shown in Equation (5). Note that $K_F$ and $K_L$ are constants with the dimension of $M^{-1}T$.

$$\begin{bmatrix} v_{in,x}^L \\ v_{in,y}^L \end{bmatrix} = K_L \begin{bmatrix} L_x^L \\ L_y^L \end{bmatrix} + v_d \qquad (5)$$

The control law is designed so that the leader decreases its speed when force is applied in the direction opposite to the desired velocity. Due to this feature, the system's velocity does not converge to $v_d$ but to some velocity $v_c$ with $|v_c| < |v_d|$.
3 Modeling and Simulation
Firstly, we need to construct a model of the robots' behavior that considers the dynamics of the system, including the flexible tactile sensor. Since the leader robot's control also contains feedback of the force it receives, the system does not simply converge to the leader robot's desired speed. The effects of various parameters on convergence and on the value of the consensus speed were investigated through simulations.

3.1 Modeling
In this model, we consider the transportation of a rigid body along a one-dimensional path for simplicity. Since silicone rubber generally has elastic and damping elements, we model the flexible tactile sensor as a spring and a damper, with the modulus of elasticity and damping coefficient denoted by k and c, respectively.
Fig. 3. Modeling schematic diagram. All the flexible sensors mounted on the robots are assumed to have the same elasticity and damping coefficient. The initial value of each robot’s and object’s position and speed is set to be 0.
When the object is too heavy for the leader to move, the followers cannot sense any force from the object in the first place, so we allow multiple leaders in our model. Thus, consider a system with p leaders, q followers and an object, where the positions of the leaders, the object and the followers are denoted by $\{x_0, x_1, \cdots, x_{p-1}\}$, $x_p$ and $\{x_{p+1}, x_{p+2}, \cdots, x_{p+q}\}$, respectively. The mass of each robot is m and that of the object is M. In the initial state, the object is placed on the robots so that the forces are balanced among all robots. Let $x_i \in \mathbb{R}$, $\{i \in \mathbb{Z} \mid 0 \le i \le p+q\}$, be defined as zero when all robots are in the initial state, i.e., at the position where the sensor springs are at their initial length. The model schema is shown in Fig. 3. There is no direct interaction between the robots, and we assume there is no sliding between the object and each robot. For the dynamics model, we only consider the dynamics of the object, for simplicity. For wheel resistance we consider rolling resistance, which is the main resistance in the low-speed region. The rolling resistance is a force directly applied to the robots, but we treat it as hypothetically applied to the object, since no sliding between the object and the robots is assumed. $U_r$ denotes the resistance coefficient. Thus, the dynamics of the leaders, the object and the followers can be expressed as in Equations (6)-(8): the leaders' and followers' speeds change according to the control law, and the object moves according to the sum of the forces it receives from all robots.

$$\dot{x}_i = v_d + K_L\left(-k(x_i - x_p) - c(\dot{x}_i - \dot{x}_p)\right), \quad 0 \le i \le p-1, \qquad (6)$$

$$M\ddot{x}_p = \sum_{i=0}^{p-1}\left(-k(x_p - x_i) - c(\dot{x}_p - \dot{x}_i) - \frac{\dot{x}_i}{|\dot{x}_i|}\,U_r\Big(\frac{M}{p+q} + m\Big)g\right) + \sum_{j=p+1}^{p+q}\left(-k(x_p - x_j) - c(\dot{x}_p - \dot{x}_j) - \frac{\dot{x}_j}{|\dot{x}_j|}\,U_r\Big(\frac{M}{p+q} + m\Big)g\right), \qquad (7)$$

$$\dot{x}_j = K_F\left(-k(x_j - x_p) - c(\dot{x}_j - \dot{x}_p)\right), \quad p+1 \le j \le p+q. \qquad (8)$$
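As a sanity check of the model, Equations (6)-(8) can be integrated numerically. The sketch below is an illustrative re-implementation under our own assumptions: forward-Euler stepping (the paper does not specify its integrator), `np.sign` for the rolling-resistance direction, and the base-case parameters of Table 1; the robot mass `m` is our guess, as Table 1 does not list it.

```python
import numpy as np

def simulate(p=1, q=2, M=5.0, m=0.5, K_L=0.1, K_F=0.2,
             k=20.0, c=0.03, v_d=3.0, U_r=0.01, g=9.81,
             dt=1e-3, T=10.0):
    n = p + q + 1                      # p leaders, the object, q followers
    x = np.zeros(n)                    # positions (object at index p)
    v = np.zeros(n)                    # velocities, all zero initially
    robots = np.r_[0:p, p + 1:n]       # indices of all robots
    rolling = U_r * (M / (p + q) + m) * g
    for _ in range(int(T / dt)):
        # Sensor force felt by robot i: -k(x_i - x_p) - c(v_i - v_p).
        spring = -k * (x - x[p]) - c * (v - v[p])
        # Eq. (7): the object is accelerated by the reaction of every
        # sensor force plus the hypothetically transferred rolling
        # resistance of each robot.
        F = -spring[robots] - rolling * np.sign(v[robots])
        v[p] += F.sum() / M * dt
        # Eq. (6): leaders track v_d with force feedback.
        v[:p] = v_d + K_L * spring[:p]
        # Eq. (8): followers move with the sensed force only.
        v[p + 1:] = K_F * spring[p + 1:]
        x += v * dt
    return x, v
```

Running this with the base-case settings should show the leader and follower speeds converging to a common consensus value below v_d, mirroring Fig. 4.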
Fig. 4. (a) The force applied to each robot and the object; (b) the speeds of the leader, follower and object during the base-case simulation.
3.2 Simulation
In this section, we use the model constructed in the previous section to investigate the convergence and trends of the consensus speed for different parameters. We define the consensus speed as the speed after the leaders and followers reach consensus and their speeds become the same value. The governing equations and initial conditions for all leader robots are the same, so their solutions are identical; the same holds for the followers. Therefore, only one leader and one follower are considered in the simulation and in the following graphs: the leader's state represents the state of all leaders, and likewise for the followers. We consider a base case with a fixed configuration and investigate the influence of each parameter by comparing other cases against it. The parameters include the object's weight, the numbers of leaders and followers, the leader and follower gains in the control law, the sensor's elasticity and damping effect, and the leader's desired speed. The parameter settings for each case are described in Table 1. We set Ur to 0.01 in all cases. The graphs of speed and force are shown in Fig. 4. The speeds of the leader and follower reached consensus, becoming the same value. In both the speed and force graphs, the gentleness to the object can be observed in the first two seconds: when the leader tries to run too fast or applies too much force to the object, it restricts its speed according to the designed control law. The remaining cases also showed speed consensus. The speed after consensus in each case is shown in Table 1. When the number of leaders, the follower gain, or the leader's desired speed increases, the speed after consensus is higher. Conversely, if the weight of the object, the number of followers, or the leader gain increases, the consensus speed becomes lower. The elasticity and damping effect of the sensor do not affect the consensus speed. More specifically, elasticities of 1 to 400 and damping effects of 0.001 to 5 resulted in the same consensus speed with the other parameters fixed, though they affect the length of
Table 1. Simulation settings and results.

Cases                              p  q  M   KL    KF   k   c     vd   Speed after consensus
Base case                          1  2  5   0.1   0.2  20  0.03  3    1.48
Heavier object                     1  2  9   0.1   0.2  20  0.03  3    1.46
More leaders                       2  2  5   0.1   0.2  20  0.03  3    1.98
More followers                     1  3  5   0.1   0.2  20  0.03  3    1.18
Higher KL                          1  2  5   0.15  0.2  20  0.03  3    1.17
Higher KF                          1  2  5   0.1   0.4  20  0.03  3    1.97
Sensor with more elasticity        1  2  5   0.1   0.2  40  0.03  3    1.48
Sensor with more damping effect    1  2  5   0.1   0.2  20  0.08  3    1.48
Higher leader desired speed        1  2  5   0.1   0.2  20  0.03  10   4.98
time for the system’s speed to converge. With elasticity larger than the range, convergence still can be achieved by adjusting the gain. Conditions for stability will be explored in future work.
4 Experiments

4.1 Experiment Setup
A three-wheeled omnidirectional robot platform manufactured by Nexus Robot was used to test the performance of the control law. The omnidirectional mobile robot equipped with the flexible contact sensor described in the previous section is shown in Fig. 5(a),(b). Although the simulation was in one dimension for simplicity, to show the convergence of speed, the control is in two dimensions, as stated in Sect. 2.2. For the control of the robots, the relationship between the velocity of each wheel $v_1, v_2, v_3$ and the velocity of the robot $v_x, v_y, v_\theta$ with respect to the x, y, and θ axes is obtained as in Equation (9) using the inverse kinematics model. Positive directions of the velocities are defined as in Fig. 5(c). The distance from the center of the robot to each wheel is denoted by r. The sampling period in the experiments is about 0.23 s.

$$\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 & 1 & r \\ -1/2 & -\sqrt{3}/2 & r \\ -1/2 & \sqrt{3}/2 & r \end{pmatrix} \begin{pmatrix} v_x \\ v_y \\ v_\theta \end{pmatrix} \qquad (9)$$
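Equation (9) translates directly into a wheel-speed command. A small illustrative sketch follows; the function name and the column ordering (vx, vy, vθ) are our assumptions:

```python
import numpy as np

def wheel_speeds(vx, vy, vtheta, r):
    # Eq. (9): inverse kinematics of the three-wheeled
    # omnidirectional platform; rows correspond to wheels 1-3.
    J = np.array([[ 0.0,  1.0,          r],
                  [-0.5, -np.sqrt(3)/2, r],
                  [-0.5,  np.sqrt(3)/2, r]])
    return J @ np.array([vx, vy, vtheta])
```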
In the experiments, the validity of the system is first verified through the transportation of a rigid object. We show that the system is able to transport an object cooperatively even if the weight of the object and the number of leaders
Fig. 5. (a) The mechanical structure of the developed robot. Vstone's omni-robot platform, which comes with three omni-wheels, 12 V DC coreless motors (gearbox ratio 80:1), and optical encoders (4 CPR), is used for the experiments. (b) The microcontroller (ESP32-DevKitC V4), motor driver (Cytron MDD3A), bi-directional logic level converter (TXB0108), and a LiPo battery are mounted. (c) Velocity indexing of the omnidirectional robot.

Table 2. Experiments' conditions.

Cases            p  q  M (kg)  KL (M⁻¹T)  KF (M⁻¹T)
Base case        1  2  5       15         30
Heavier object   1  2  9       15         30
More leaders     2  2  5       15         30
More followers   1  3  5       15         30
and followers are changed. Secondly, a half-size mannequin is transported to verify the scalability of the system for transporting multijoint objects.

4.2 Experiment Result
Transportation of Rigid Body. The experiment conditions are described in Table 2. Note that in the experiments we feed back values measured with the flexible tactile sensor, which are proportional to the force but are not the exact force. For the base case, the flexible tactile sensor's readings and the speeds calculated from encoder values are shown in Fig. 6. A video recording of the experiment is available at [21]. It can be confirmed that the speed changes according to the control law and the sensor values. During the experiment, slipping of the leader's tires is observed; thus, although the consensus on speed is confirmed, there is a difference between the speed of the leader and that of the followers. At 44 s, the object is removed from the robots and the sensor readings return to nearly zero. The followers stop moving and the leader starts moving at its desired speed vd. Follower 2's sensor reading did not return exactly to zero because of hysteresis in the flexible sensor's shape. For Follower 1, the flexible contact part of the sensor, which includes the magnetorheological
Fig. 6. (a-c) The speeds of the robots during the base-case experiment, calculated from encoder values. (d-f) The flexible force sensor's readings in the direction of movement.
elastomer, was removed along with the object, leaving a residual sensor reading. In the other cases, we also observed speed consensus during the experiments; time-series graphs of those experiments are omitted. The averaged speed after the observation of speed consensus in each case is shown in Table 3. The consensus speed is calculated using the motion-capture software Kinovea. The trend of consensus speed under parameter changes found in the simulation also appears in the experiments, except in the case where a heavier object is used. This might be due to differences in the actual vd in each experiment: in all cases the leader's desired input was set to the same value, but the actual desired speeds differ slightly due to differences in LiPo battery voltage.

Transportation of Multijoint Object - Mannequin. In the cooperative transportation experiment, a half-sized human skeleton model (85 cm, 4.9 kg) was placed on six robots positioned under the back, waist, arms and legs, as shown in Fig. 7. The skeleton model is fleshed out with sandbags to mimic a human, and its joints are free to move. The leader was placed under the waist, and the other five robots acted as followers. The leader was given a straight path in advance.
Table 3. Experiments' speed result.

Cases            Speed after consensus (Kinovea, m/s)
Base case        0.0507
Heavier object   0.0523
More leaders     0.0778
More followers   0.0485
Fig. 7. The scene during the mannequin transportation experiment.

Table 4. Experiment consensus speed result calculated using Kinovea, and the leader's desired speed in each case measured with the encoder.

Body part    Role      Average speed after consensus (m/s)
Upper body   Leader    0.014
Head         Follower  0.012
Left arm     Follower  0.014
Right arm    Follower  0.014
Left leg     Follower  0.016
Right leg    Follower  0.014
The result of each robot’s average speed after consensus was calculated using Kinovea is shown in Table 4. We observed the consensus on speed and success in transportation. A video of the experiment is shown in [22]. In another experiment, we applied force to arms and legs in which the direction is in the plane that is being controlled to mimic the situation where a human moves its limbs. A video of the experiment is shown in [23]. The followers moved in the direction of applied force following the designed control law and the robots successfully tracked the limb’s movement. The compatibility of the leader-follower based control and multijoint objects’ transportation is demonstrated.
5 Conclusion and Future Work
We have confirmed the feasibility of transporting objects with flexible tactile sensors in both simulated and real environments, including a real-world skeleton-model transportation experiment. We confirmed the consensus of speed, and its trend, with the improved leader-follower based control, which applies force feedback to the leader in order to achieve gentle transport. The gentleness to the object is verified by the force data in the simulation. Until now,
little thought has been given to the transport of multijoint objects, or to damage and discomfort to the objects being transported. The system in this paper is an example of research that considers extensions to multijoint object transport and gentleness to the objects being transported. The control law designed in this study includes feedback from a force sensor to the leader, so the consensus speed is not simply the desired speed given to the leader. Modeling the system allowed us to qualitatively confirm the differences in consensus speed. In the future, analytically determining the consensus speed will make it possible to identify which parameters affect its value, and how, so that the consensus speed can be designed. In addition, we plan to decrease the instability due to the roundness of the body by making the current robots smaller, increasing the number of robots, and increasing the contact area.

Acknowledgement. This work was supported by JSPS KAKENHI 20H04473, T21K141830, A21H05104a and JST [Moonshot R&D] [Grant Number JPMJPS2032], and in part by grants-in-aid for JSPS KAKENHI Grant Number JP22K14277. The authors would like to thank Dr. Kawasetsu for his valuable advice on the flexible tactile sensor and Mr. Ito for discussions on the hardware design.
References

1. Cao, Y., Fukunaga, A., Kahng, A.: Cooperative mobile robotics: antecedents and directions. Auton. Robot. 4, 7-27 (1997). https://doi.org/10.1023/A:1008855018923
2. Tuci, E., Alkilabi, M.H., Akanyeti, O.: Cooperative object transport in multi-robot systems: a review of the state-of-the-art. Front. Robot. AI 5 (2018). https://doi.org/10.3389/frobt.2018.00059
3. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors and current applications. Front. Robot. AI (2020)
4. Fukuda, T., Ueyama, T., Kawauchi, Y., Arai, F.: Concept of cellular robotic system (CEBOT) and basic strategies for its realization. Comput. Electr. Eng. 18, 11-39 (1992). https://www.sciencedirect.com/science/article/pii/004579069290029D
5. Parker, L.: Current state of the art in distributed autonomous mobile robotics. Distrib. Auton. Robot. Syst. 4, 3-12 (2000)
6. Yan, Z., Jouandeau, N., Cherif, A.: A survey and analysis of multi-robot coordination. Int. J. Adv. Robot. Syst. 10 (2013). https://doi.org/10.5772/57313
7. Brambilla, M., Ferrante, E., Birattari, M., Dorigo, M.: Swarm robotics: a review from the swarm engineering perspective. Swarm Intell. 7, 1-41 (2013)
8. Nazarova, A., Zhai, M.: The application of multi-agent robotic systems for earthquake rescue. In: Robotics: Industry 4.0 Issues & New Intelligent Control Paradigms, pp. 133-146 (2020). https://doi.org/10.1007/978-3-030-37841-7_11
9. Osuka, K., Isayama, S.: Motion control of multi-linked stretcher robot DUCKS. In: Proceedings of SICE Annual Conference 2010, pp. 873-874 (2010)
10. Dorigo, M., et al.: The SWARM-BOTS project
11. Wang, H., et al.: Design methodology for magnetic field-based soft tri-axis tactile sensors. Sensors (Switzerland) 16 (2016)
12. Kawasetsu, T., Horii, T., Ishihara, H., Asada, M.: Flexible tri-axis tactile sensor using spiral inductor and magnetorheological elastomer. IEEE Sens. J. 18, 5834-5841 (2018)
13. Wang, H., et al.: Design and characterization of tri-axis soft inductive tactile sensors. IEEE Sens. J. 18, 7793-7801 (2018)
14. Hughes, J., Culha, U., Giardina, F., Guenther, F., Rosendo, A., Iida, F.: Soft manipulators and grippers: a review. Front. Robot. AI 3 (2016)
15. Kim, S., Laschi, C., Trimmer, B.: Soft robotics: a bioinspired evolution in robotics. Trends Biotechnol. 31, 287-294 (2013)
16. Asama, H., Matsumoto, A., Ishida, Y.: Design of an autonomous and distributed robot system: ACTRESS. In: Proceedings IEEE/RSJ International Workshop on Intelligent Robots and Systems (IROS '89), pp. 283-290 (1989)
17. Kosuge, K., Oosumi, T., Chiba, K.: Decentralized control of multiple mobile robots handling a single object in coordination. J. Robot. Soc. Jpn. 16, 87-95 (1998)
18. Osumi, H., Nojiri, H., Kuribayashi, Y., Okazaki, T.: Cooperative control for three mobile robots transporting a large object. J. Robot. Soc. Jpn. 19, 744-752 (2001)
19. Wang, Z., Schwager, M.: Kinematic multi-robot manipulation with no communication using force feedback. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 427-432 (2016)
20. Wang, Z., Yang, G., Su, X., Schwager, M.: OuijaBots: omnidirectional robots for cooperative object transport with rotation control using no communication. In: Groß, R., et al. (eds.) Distributed Autonomous Robotic Systems. SPAR, vol. 6, pp. 117-131. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73008-0_9
21. Zhang, Y.: Transportation experiment with mobile robots based on tactile feedback - object. https://youtu.be/i5yjvAJ5VRE
22. Zhang, Y.: Transportation experiment with mobile robots based on tactile feedback - half-sized skeleton. https://youtu.be/AtM5e7bqSZQ
23. Zhang, Y.: Transportation experiment with mobile robots based on tactile feedback - moving limb. https://youtu.be/NBYaup4oivA
Sparse Sensing in Ergodic Optimization

Ananya Rao1(B), Ian Abraham2, Guillaume Sartoretti3, and Howie Choset1

1 Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15203, USA
[email protected]
2 Department of Mechanical Engineering, Yale University, New Haven, CT 06520-8292, USA
3 Mechanical Engineering Department, National University of Singapore, Singapore 117575, Singapore
https://biorobotics.org/

Abstract. This paper presents a novel, sparse sensing motion planning algorithm for autonomous mobile robots in resource-limited coverage problems. Optimizing usage of limited resources while effectively exploring an area is vital in scenarios where sensing is expensive, has adverse effects, or is exhaustive. We approach this problem using ergodic search techniques, which optimize how long a robot spends in a region based on the likelihood of obtaining informative measurements, and which guarantee coverage of a space. We recast the ergodic search problem to take into account when to take sensing measurements. This amounts to a mixed-integer program that optimizes when and where a sensor measurement should be taken while optimizing the agent's paths for coverage. Using a continuous relaxation, we show that our formulation performs comparably to dense sampling methods, collecting information-rich measurements while adhering to a limited sensing budget. Multi-agent examples demonstrate the capability of our approach to automatically distribute sensor resources across the team. Further comparisons show performance comparable to the mixed-integer program while reducing computational resources.
Keywords: Search and coverage · Control · Ergodic search

1 Introduction
Current robotic systems lack the capability to explore and search for information in resource-limited settings. Such a problem occurs in several applications, including planetary science robots with expensive sensors, agricultural robots with limited available measurements, etc. Currently, arbitrary heuristics are used to prioritize resources, but this approach does not consider what information the robot has already acquired or its mobility capabilities. There is a need for methods that can jointly optimize how robots search and when they take measurements.

Prior works in coverage, especially those based on ergodic search processes, assume that the sensor being used is consistently taking measurements, and
therefore, constantly uses energy. In robotic applications, taking a sensor measurement incurs a cost, which this work strives to minimize without compromising the efficacy of autonomous coverage in terms of the ergodic metric. To improve the applicability and efficacy of robots, especially in resource-limited environments, we need to consider optimizing over when to record and acquire sensor measurements (for example, in Fig. 1). In this paper, we pose the problem of acquiring a set of optimal measurements as a sparse ergodic optimization problem, where the decision to take a measurement is optimized as a vector of decision variables, and show that placing an L1 norm on this vector can promote sparsity. Much prior work in resource-limited robotics focuses on using sparse data to achieve goals like localization [1] and depth reconstruction [2]. However, these formulations pose complex optimizations, which are hard to solve or require forming heuristics that neglect optimality of sensor measurement placement.

Fig. 1. Proposed Approach for Sparse Ergodic Optimization: Our approach automates how a robot explores an area and what informative measurements to collect in a joint optimization problem. Illustrated is an example sparse sensing solution for covering a one-dimensional information distribution (gray distribution), where peaks correspond to areas of high expected information, and the green colored bar represents when the robot takes a measurement.

Unlike many other information-based search methods, trajectories planned using ergodic search processes can be optimized over longer time horizons, and drive the robot to spend time in areas of the search domain in proportion to the amount of expected information at those locations, thus efficiently balancing exploration and exploitation. Optimizing the ergodic metric translates to better coverage of the a priori information distribution. Furthermore, ergodic search processes yield special structure in the formulation of the optimization problem, which we leverage to formulate sampling decision times that are solutions to a convex optimization problem. The sampling solutions result in a near-optimal set of sensing measurements along the robot's trajectory.

The organization of this paper is as follows. In Sect. 2 we discuss existing search and coverage methods, as well as recent advances in sparse sensing; a brief background on ergodic search processes is also presented there. Details on our sparse ergodic optimization approach, and its application to distributing sensing resources across a multi-agent team, are discussed in Sect. 3. In Sect. 4 we present and discuss the results of our systematic set of experiments, in which we observe that our approach can maintain coverage performance while significantly decreasing sensing costs, in both single- and multi-agent settings. Finally, we present concluding remarks in Sect. 5.
2 Prior Work
Search and Coverage Methods: Search and coverage methods have widespread uses in a number of important applications, including search and rescue, cleaning robots, and automated agriculture. Both object search and coverage involve motion planning, typically based on an a priori information distribution in a domain. Most search and coverage techniques involve the construction and use of an information map which encodes the utility of the robot exploring specific regions of the search space. Broadly, area search and coverage methods fall into three categories: geometric, gradient-based, and trajectory optimization-based approaches. Geometric methods, like lawnmower search, are good techniques for uniformly covering a search domain [3,4]. Such approaches are useful when no a priori information is provided, or when the a priori information is presumed to be a uniform probability distribution over the search domain. On the other hand, since these approaches do not utilize any a priori information about regions of interest, the efficacy of exploration and object search is upper bounded. When a priori information is available, a representation of how information is distributed across a search domain (i.e. an a priori information map or information distribution) can be leveraged to develop better search and coverage techniques. In gradient-based methods [5-7], the short-term information gain is greedily maximized by guiding the search agent in the direction of the derivative of the information map around its position. Since the agent is always driven in the direction of greatest information gain, it is led to areas with a higher likelihood of finding objects. However, these approaches can leave areas unexplored, because they generally do not take into account the uncertainty associated with the a priori information distribution, which can help differentiate between unexplored areas with low information and areas with no information. These approaches are also very sensitive to noise in the information map, because noise leads to inaccurate gradient estimates, which in turn lead agents to greedily over-exploit local information maxima. Trajectory optimization-based methods formulate search as an information-gathering maximization problem, which is then solved by planning a path for the search agent [8]. The cost function that drives the optimization in many of these approaches combines both the information distribution and its associated uncertainty, thereby reducing the likelihood of leaving areas unexplored.

Sparse Sensing: Many real-world robotic applications are plagued by resource limitations, and sparse sensing techniques are useful in these scenarios. Most prior work involving sparse sensing focuses on using sparse sensor measurements (and therefore sparse data) to accomplish tasks like localization and SLAM [1], depth reconstruction [2] and wall-following [9]. These works mostly explore methods to better use limited data that has already been acquired, or that is actively being acquired. However, in all of these approaches, the robot still has to acquire the measurements, and post-optimizing for sparse data points does not help reduce costs for limited onboard resources. While intelligently using
limited data does help improve the performance of resource-limited robotic systems, further improvements can be made by deciding where to take these limited measurements.

Ergodic Search: Most information-based search and coverage methods view the information gathering problem through one of two lenses: exploration or exploitation. Through the exploratory lens, the information acquisition problem is framed as the wide-space search for diffused information, for applications like localization or coverage [10,11]. Through the exploitative lens, the information gathering problem is framed as the direct search for highly structured information peaks, such as in object search [11,12]. Ergodic search is able to balance both exploration and exploitation goals by accounting for both diffused information densities and highly focused points of information [8,13,14]. By doing this, ergodic control trajectories are able to conduct both broad-stroke coverage for diffused information densities and localized search for more focused high-information points, thereby balancing exploration and exploitation. Specifically, an ergodic path will drive a robot to spend more time in regions of higher expected information in an a priori information map, and less time in regions of low information.

Ergodic coverage [8] produces trajectories for agents such that they spend time in areas of the domain proportional to the expected amount of information present in that area. In order to control agents to accomplish this behavior, we pose an optimization problem that minimizes the distance between the time-average statistics of the agent (Eq. 1) and the underlying information map:

$$C^t(x, \gamma_t) = \frac{1}{t}\sum_{\tau=0}^{t-1} \delta(x - \gamma(\tau)), \qquad (1)$$
where γ is the agent’s trajectory, defined as γ : (0, t] → X , t is the discrete time horizon, and δ is the Dirac delta function, with X ⊂ IRd in the d-dimensional search domain. The spatial time-average statistics of an agent’s trajectory quantifies the fraction of time spent at a position x ∈ X . The expected information distribution, or information map, over the domain to be explored and searched is determined by a target distribution which defines the likelihood of generating informative measurements at any given location in the search domain. Formally, the agent’s time-averaged trajectory statistics are optimized against this expected information distribution over the whole domain, by minimizing the distance between the Fourier spectral decomposition of each distribution. This is obtained by minimizing the ergodic metric Φ(·), expressed as the weighted sum of the difference between the spectral coefficients of these two distributions [8]: Φ(γ(t)) =
m
2
αk |ck (γt ) − ξk | ,
(2)
k=0
where ck and ξk are the Fourier coefficients of the time-average statistics of an agent’s trajectory γ(t) and the desired spatial distribution respectively,
and $\alpha_k$ are the weights of each coefficient difference. In practice, $\alpha_k = (1 + \|k\|^2)^{-(d+1)}$ is usually defined to place higher weights on lower-frequency components, which correspond to larger spatial-scale variations in the information distribution. The goal of ergodic coverage is to generate optimal controls $u^*(t)$ for an agent, whose dynamics are described by a function $f : Q \times U \to TQ$, such that

$$u^*(t) = \arg\min_{u} \Phi(\gamma(t)), \quad \text{subject to} \quad \dot{q} = f(q(t), u(t)),\; \|u(t)\| \le u_{max}, \qquad (3)$$
where q ∈ Q is the state and u ∈ U denotes the set of controls. Equation 3 is solved directly for the optimal control input at each time-step, by trajectory optimization to plan feed-forward trajectories over a specified time horizon [14], or by using sampling-based motion planners [15], where it is straightforward to pose additional constraints such as obstacle avoidance. Note that in this optimization, each point x(t) in the robot’s trajectory is considered a sample point during the exploration of the search domain, which can be detrimental in scenarios where the robot’s trajectory traverses over large regions of low expected information. In the following section, we describe our approach to selecting the sampling decision times as a part of the optimization.
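To make the metric concrete, the following Python sketch evaluates Eq. (2) for a 2-D trajectory on [0, L]², using separable cosine basis functions, a standard choice in the ergodic search literature [8]; the paper does not spell out its basis, and the normalization of the basis functions is simplified here, so treat this as an illustrative approximation:

```python
import numpy as np

def ergodic_metric(traj, xi, L=1.0, m=10, d=2):
    """Eq. (2): weighted distance between the Fourier coefficients of
    a trajectory's time-average statistics and those of the target map.
    traj: (T, 2) array of positions in [0, L]^2
    xi:   (m, m) array of target-distribution coefficients"""
    phi = 0.0
    for k1 in range(m):
        for k2 in range(m):
            # c_k: basis function cos(k1*pi*x/L) cos(k2*pi*y/L)
            # averaged along the trajectory.
            fk = (np.cos(k1 * np.pi * traj[:, 0] / L)
                  * np.cos(k2 * np.pi * traj[:, 1] / L))
            ck = fk.mean()
            # Weight alpha_k emphasizing low spatial frequencies.
            alpha = (1.0 + k1**2 + k2**2) ** (-(d + 1))
            phi += alpha * abs(ck - xi[k1, k2]) ** 2
    return phi
```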
3 Sparse Sensing in Ergodic Optimization
We approach sparse sensing in ergodic coverage in two steps. First, we extend ergodic optimization to formulate a sparse ergodic optimization problem that jointly optimizes for both an agent's trajectory and the importance of taking a sensor measurement at each point along the trajectory. Second, we apply this formulation to ergodic coverage using multi-agent teams, and jointly optimize the trajectories and sensing decisions of each agent in the team, resulting in the optimal distribution of limited sensing resources amongst agents.

3.1 Sparse Ergodic Optimization

In many search and coverage applications, only a few sensor measurements or data points are required to fully
Fig. 2. Sparse Ergodic Trajectory Example: Trajectories are generated with a decision variable for when to take a sensor measurement. This approach results in trajectories which provide maximal information gathering with minimal sensing resources.
comprehend and optimize a model. Note that in the optimization in Eq. 3, the locations at which sensor measurements are obtained are uniform in time; that is, each measurement along the trajectory is treated as being equally important and as contributing equally to minimizing the ergodic metric. In many scenarios, due to both the non-uniform distribution of information in the region being covered and the dynamic constraints of the robotic agents being used, this can lead to extraneous measurements that do not contribute to reconstructing the information prior. For example, if there are few regions of high information, separated by larger regions of low expected information, uniform sensor measurements would result in many measurements that provide low information gain (see Fig. 2). In this case, ergodic performance would be improved by only taking sensor measurements near the areas of higher information. In this work we extend ergodic optimization in order to find this set of optimal measurements for a given search scenario. We optimize for the set of optimal sensor measurements by posing the following ergodic optimization:

$$u^*(t),\, \lambda^*(t) = \arg\min_{u,\lambda} \bar{\Phi}(\gamma(t)), \quad \text{subject to} \quad \dot{q} = f(q(t), u(t)),\; \|u(t)\| \le u_{max}, \qquad (4)$$
where $q \in Q$ is the state, $u \in U$ denotes the set of controls, and $\lambda(t) \in \{0, 1\}$. $\lambda(t)$ represents the decision variable for choosing whether or not to take a sensor measurement at a given location in the search domain. We regularize λ with an L1 norm, which promotes sparsity in the sample measurements [16]. We augment the ergodic metric in Eq. 2 in the following manner:
$$\bar{\Phi}(\gamma(t)) = \sum_{k=0}^{m} \alpha_k \,\big|c_k(\gamma(t), \lambda(t)) - \xi_k\big|^2 + \sum_{k} |\lambda_k|, \qquad (5)$$
where $c_k$ and $\xi_k$ are the Fourier coefficients of the time-average statistics of the set of agent trajectories γ(t) and of the desired spatial distribution of agents, respectively, and $\alpha_k$ are the weights of each coefficient difference. The spatial time-average statistics of the agent's trajectory (from Eq. 1) are also modified to be

$$C^t(x, \gamma(t)) = \frac{1}{\sum_{\tau=0}^{t}\lambda(\tau)} \sum_{\tau=0}^{t} \lambda(\tau)\,\delta(x - \gamma(\tau)), \qquad (6)$$
where λ(t) ∈ {0, 1}. Thus far, λ(t) is defined to be an integer (i.e. λ(t) ∈ {0, 1}), resulting in Eq. 4 being a mixed integer programming problem. However, such mixed integer programming problems are difficult to solve, due to a lack of gradient information from the integer variables [17], and due to requiring direct search methods that do not scale with longer time horizons. For this reason, we employ a relaxation of the problem Eq. 4 by defining λ(t) to be a bounded continuous variable
Fig. 3. Mixed-Integer Continuous Relaxation for Sparse Ergodic Optimization: Illustrated is the process with which we jointly optimize for trajectories and for when to sample. The decision variable λ is relaxed to be continuous on [0, 1], and solutions are projected into the integer {0, 1} space. As input, our approach takes information prior distributions which guide the planning and sensing. Output trajectory solutions (shown on the right) show concentrated samples over areas of high information with minimal use of sensor resources.
$\lambda(t) \in [0, 1]$ and optimizing over the new domain. After optimization, we project λ from the continuous domain to the nearest integer value, while adhering to the sensing budget. This allows us to continuously optimize Eq. 4, and then map the resultant continuous values of λ(t) to discrete values in {0, 1}. The procedure for optimizing the sparse ergodic problem (Eq. 4) is depicted in Fig. 3. When we investigate the numerical values of the decision variables λ(t) calculated in the sparse ergodic optimization (Eq. 4), we see that peaks form that correspond to peaks of information in the a priori information distribution (Fig. 3). When these continuous values are mapped back to $\lambda(t) \in \{0, 1\}$, λ(t) = 1 is more likely in areas of higher information, and λ(t) = 0 is more likely in areas of lower information. This shows that the sparse ergodic optimization drives the likelihood of taking a sensor measurement in areas of higher expected information to be higher, which follows intuition. Additionally, we also see some peaks in regions of lower information, which typically correspond to areas between closely placed areas of high information, or to regions of low information where no sensor measurement has been taken for a considerable amount of time.
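A compact sketch of the projection step is given below. Mapping the relaxed λ back to {0, 1} "while adhering to the sensing budget" admits more than one reading; keeping the `budget` largest entries is one plausible interpretation (our assumption, with hypothetical names, and the optimizer itself abstracted away):

```python
import numpy as np

def project_to_budget(lam, budget):
    """Map relaxed decision variables lam in [0, 1] to {0, 1},
    keeping only the `budget` entries the optimizer weighted
    most heavily (the strongest sampling peaks)."""
    lam = np.clip(np.asarray(lam, dtype=float), 0.0, 1.0)
    decisions = np.zeros_like(lam)
    decisions[np.argsort(lam)[-budget:]] = 1.0
    return decisions
```

For the multi-agent case of Sect. 3.2, the same projection can be applied to the concatenation of all agents' λ_i, so that a single budget is shared across the team.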
3.2 Multi-agent Sparse Ergodic Optimization
For a multi-agent team covering a given information prior, the limited number of measurements required to fully cover the region can be distributed among the
different agents. We apply sparse ergodic optimization (Eq. 4), as described in Sect. 3.1, to distribute sensing resources amongst a multi-agent team. A sample result is shown in Fig. 4. For N agents, the modified joint spatial time-average statistics of the set of agent trajectories $\{\gamma_i\}_{i=1}^{N}$ are defined as

$$C^t(x, \gamma(t)) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{\sum_{\tau=0}^{t}\lambda_i(\tau)} \sum_{\tau=0}^{t} \lambda_i(\tau)\,\delta(x - \gamma_i(\tau)), \qquad (7)$$

where $\lambda_i(t) \in \{0, 1\}$ for all integers $i \in [0, N)$. We use the augmented ergodic metric defined in Eq. 5 to drive the joint optimization of the set of agent trajectories and sensing decision variables. The ergodic optimization posed in Eq. 4 is used to generate optimal controls and the set of optimal sensor measurements for each agent. We again employ a relaxation of the problem described by Eq. 4, defining each $\lambda_i(t)$ to be a bounded continuous variable $\lambda_i(t) \in [0, 1]$ for all integers $i \in [0, N)$, and optimize over the new domain. After optimization, we project each agent's $\lambda_i$ from the continuous domain to the nearest integer value, while adhering to the sensing budget. We take into account all of the agent trajectories when mapping from the continuous domain to discrete integers, in order to distribute sensing measurements across all of the agents.

Fig. 4. Multi-Agent Sparse Ergodic Trajectory Example: Trajectories for a multi-agent team are generated with a decision variable for when to take a sensor measurement. This approach automatically updates trajectories which provide maximal information gathering with minimal sensing resources.
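For illustration, Eq. (7) can be evaluated on a grid of query points by smoothing the Dirac delta into a narrow Gaussian; this smoothing is purely our assumption for visualization, as the optimization itself works with Fourier coefficients of these statistics:

```python
import numpy as np

def joint_time_average(trajs, lams, grid, sigma=0.02):
    """Eq. (7): joint spatial time-average statistics of N agents,
    counting only the time steps where lambda_i(t) = 1.
    trajs: list of (T, 2) arrays; lams: list of (T,) 0/1 arrays;
    grid: (G, 2) array of query points."""
    C = np.zeros(grid.shape[0])
    N = len(trajs)
    for traj, lam in zip(trajs, lams):      # one term per agent
        total = lam.sum()
        if total == 0:
            continue                        # agent never samples
        acc = np.zeros(grid.shape[0])
        for x_t, l_t in zip(traj, lam):
            if l_t > 0:
                d2 = np.sum((grid - x_t) ** 2, axis=1)
                acc += l_t * np.exp(-d2 / (2 * sigma**2))
        C += acc / total
    return C / N
```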
4 Results and Discussion
In this section we empirically show that for a specific ergodic coverage problem, there exists a minimal set of sensor measurements to take in order to gain enough information to fully characterize and solve the coverage task. Through our experiments we illustrate that optimizing for this set of samples, or sensor measurements, leads to improvements in ergodicity for coverage problems. This supports our intuition that in many coverage scenarios, extraneous samples are taken that either do not contribute to or negatively impact the optimization. Further, we show that our sparse ergodic optimization approach can be applied to multi-agent coverage tasks in order to optimally distribute a given sensing
budget amongst all of the agents. In multi-agent coverage scenarios as well, we empirically show that for a specific ergodic coverage problem, there is a minimal set of sensor measurements required to cover the information prior.

4.1 Experimental Details
In our set of systematic experiments, the dynamics of the agent considered are defined by the constrained discrete-time dynamical system

$$x(t+1) = f(x, u) = x(t) + \tanh(u), \qquad (8)$$

where $x(t) \in \mathbb{R}^2$ and $u \in \mathbb{R}^2$ is constrained by the tanh function, bounded in [0, 1]. The sensor footprint of the agent is modeled as a Gaussian distribution centered at the agent's position, whose variance prescribes a circular observation range ρ > 0. The information maps are built using information distributions over the continuous search space $\mathcal{X} \in [0, L]^2 \subset \mathbb{R}^2$, defined as

$$p(x) = \sum_{i=1}^{3} \eta_i \exp\!\left(-\|x - c_i\|^2_{\Sigma_i^{-1}}\right), \qquad (9)$$

where $p(x) : \mathcal{X} \to \mathbb{R}_+$, and $\eta_i$, $c_i$, $\Sigma_i$ are the normalizing factor, the Gaussian center, and the Gaussian variance, respectively. As described in Sect. 3.1, the sparse ergodic optimization problem (Eq. 4) is a mixed integer programming problem. Since mixed integer programming problems are difficult to solve, we relax this formulation by defining λ(t) to be a bounded continuous variable $\lambda(t) \in [0, 1]$, and map the resultant values to discrete values in {0, 1}. We use a solver to compare the results of the mixed integer programming formulation to our relaxed problem, to show that this relaxation greatly improves the computational cost of the optimization without negatively impacting performance.

We investigate the performance of the sparse ergodic optimization described in Sect. 3.1 in terms of coverage performance, using ergodicity as the metric. We compare our coverage results to two different approaches. The first is standard ergodic optimization (Eq. 2), where sensor measurements are uniformly distributed along the optimized trajectory. The second is a probabilistic heuristic with a two-step process: first we optimize an ergodic trajectory, then we sample measurement locations from the distribution of information under the optimized trajectory. Further, we investigate the performance of the multi-agent sparse ergodic optimization described in Sect. 3.2 in terms of coverage performance, with ergodicity as the metric. We compare our results to the two baselines described above, averaged across different team sizes (ranging from 3 to 10 agents).

The performance statistics for each method and sensing budget are averaged across 25 randomized experiment setups, where the initial information map is varied between experiments. For each method, 10 different sensing budgets were explored. Agents' starting positions, initial information maps, and sensing budgets are kept identical among experiments with different controllers to ensure that our results are comparable.
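The experimental setup is easy to reproduce in outline. Below is a hedged sketch of the agent dynamics (Eq. 8) and the three-mode information map (Eq. 9), using isotropic variances as a simplifying assumption of ours:

```python
import numpy as np

def step(x, u):
    # Eq. (8): single-integrator dynamics with tanh-saturated input.
    return x + np.tanh(u)

def information_map(x, etas, centers, variances):
    # Eq. (9): expected information p(x) as a sum of three Gaussian
    # modes; eta_i, c_i and the (here scalar) variances play the
    # roles of normalizing factor, center and variance in the paper.
    return sum(eta * np.exp(-np.sum((x - c) ** 2) / var)
               for eta, c, var in zip(etas, centers, variances))
```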
4.2 Single Agent Sensing Distribution
When looking at results of sparse ergodic optimization in terms of overall coverage performance, measured by the ergodic metric, we observe that there is a minimal number of sensor measurements to be taken to minimize ergodicity (see Fig. 6). In our experiments, this number of sensor measurements varies with changes in information map being covered, sample rate, time horizon and initial sample weights. However, for a set of fixed experiment hyperparameters, the minimal number of sam- Fig. 5. Standard Ergodic and Sparse ples required is consistent. For any Ergodic Trajectory Comparison: Stanfixed experiment setup, when we take dard ergodic optimization results in a diffewer than this minimal number of ferent final trajectory (orange trajectory) sensor measurements, the optimiza- than jointly optimizing for both trajectory and sensor weight (blue and yellow trajection lacks relevant information, and tory). This implies a cross-effect between so, we see a decrease in coverage per- agent position and if a sensing measurement formance. On the other hand, when is taken, due to which joint optimization we take more sensor measurements becomes more important as the coverage than the minimal required number, problem scales. the ergodic value increases, due to extraneous measurements negatively impacting coverage. We also experimentally demonstrate that it is better to selecSparse Ergodic Optimization Uniform Sampling tively choose measurements along Probabilistic Heuristic the standard ergodic trajectory, since in all cases, the coverage performance of the probabilistic heuristic is much better than that of standard ergodic optimization with uniformly distributed meaNumber of Sensing Measurements surements. On the other hand, using the probabilistic heuristic Fig. 6. Comparison of Baseline Appro- on a standard ergodic trajectory aches and Sparse Optimization: Results has very similar coverage perforshow that sparse ergodic optimization has betmance to sparse ergodic optimizater coverage performance in terms of the ergodic metric when compared to standard ergodic opti- tion for a single agent. We see that mization with uniformly distributed sparse mea- optimizing for trajectory alone, surements. The probabilistic heuristic results in and jointly optimizing for trajectory and measurement placement comparable performance. creates different resultant trajec0.09
0.085
Ergodic Value
0.08
0.075 0.07
0.065 0.06
0.055 0.05
0.045
0
10
20
30
40
50
60
70
80
90
100
Sparse Sensing in Ergodic Optimization
123
tories (Fig. 5), implying that trajectory optimization and measurement choice impact each other. Jointly optimizing for trajectory and sensing measurements leads to lower control cost, as you are directly taking into account the cost of moving between chosen sensing measurements. For a single agent in the simple coverage problems being considered, there aren’t large differences in control cost, leading to very similar performance of the probabilistic heuristic and sparse ergodic optimization. Further, we compare the performance statistics and computation cost of our approach to that of solving the mixed integer programming formulation of the sparse ergodic optimization problem. We see in Fig. 7 that our relaxation of the sparse ergodic optimization problems leads to much lower computational costs (i.e. lower run times), while retaining comparable coverage performance in terms of the ergodic metric. 0.08
Fig. 7. Comparison of Mixed Integer Optimization Problem and Relaxed Sparse Optimization Problem: Solving the mixed integer formulation of the sparse sensing problem leads to slightly better coverage performance (a), in terms of the ergodic metric, but has a much higher computational cost (b).
4.3 Multi-agent Sparse Ergodic Optimization
When optimizing limited sensing resources (specifically a sensing budget in terms of a restricted number of sensing measurements) for a multi-agent team using the sparse ergodic optimization approach, we see that the 'workload' of covering different peaks in a given a priori information map is distributed among the agents, and the sensing measurements are distributed in order to support the requirements of each coverage workload.
Similar to the single-agent sparse ergodic optimization results, we see that using sparse ergodic optimization to distribute a limited sensing budget across a multi-agent team results in improved coverage performance in terms of the ergodic metric with fewer sensing measurements (i.e., for lower sensing budgets) (see Fig. 8). We also observe that there is a minimal number of sensor measurements required in order to minimize ergodicity, and this minimal number varies with experiment hyperparameters such as the information map being covered, sample rate, time horizon, and initial sample weights. For a fixed set of experiment hyperparameters, ergodicity increases when the sensing budget is increased past this minimum required number, since extraneous measurements are being taken, while ergodicity also increases as the sensing budget decreases below the minimum required number, since the optimization is missing information.
For multi-agent teams we see that the probabilistic heuristic has worse coverage performance compared to sparse ergodic optimization (see Fig. 8). As explained in Sect. 4.2, jointly optimizing for trajectory and sensing measurements leads to lower control cost. As we scale up the optimization problem, control cost becomes more substantial, since we need to optimize multiple trajectories. Further, the cross-effect between the agents' trajectories and whether they take sensing measurements increases for multiple agents, since coverage performance is now impacted by several trajectories with potential areas of overlap. Thus, it becomes more necessary to jointly optimize for trajectory and the choice of sensing measurements in multi-agent sparse sensing coverage problems.
Fig. 8. Comparison of Baseline Approaches and Sparse Optimization for Multi-Agent Coverage: Results show that multi-agent sparse ergodic optimization has better coverage performance in terms of the ergodic metric when compared to standard ergodic optimization with uniformly distributed sparse measurements and with sparse measurements distributed using a probabilistic heuristic.
5 Conclusion
In this paper we investigate the idea that there is an optimal set of sensing measurements that can be taken during coverage, in order to fully characterize the problem, reduce sensing costs, and avoid local sensing optima. To this end, we formulate a novel approach to sensing-resource-limited coverage by modifying ergodic optimization to jointly optimize for both the sensing trajectory and the decision of where to take sensing measurements. Specifically, the set of optimal sensor measurements is posed as a sparse ergodic optimization problem, where the choice to take a measurement is encoded in a vector of sample weights. Our experiments show that there exists an optimal set of sensing measurements for a given coverage scenario, in both single- and multi-agent cases, using ergodicity, a state-of-the-art coverage metric. We also infer that there exists a cross-effect between an agent's trajectory and its sensing decisions, which makes it important to jointly optimize these in cases with limited sensing measurements. This effect is stronger in multi-agent scenarios.
This work experimentally shows that there are improvements to be made in resource-limited coverage scenarios through sparse sensing. Specifically, we show that there exists an optimal set of measurements required to solve a coverage problem. Future work will focus on proving theoretical guarantees regarding the existence of this optimal set of sensing measurements. This work assumes the availability of accurate a priori information maps, which is not the case for many real-world coverage applications. Future work will seek to use sparse ergodic optimization to identify and account for inaccurate information priors with a minimum of sensor measurements, and to take sensor noise into account.
Distributed Multi-robot Tracking of Unknown Clustered Targets with Noisy Measurements
Jun Chen, Philip Dames, and Shinkyu Park
King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia {jun.chen.1,shinkyu.park}@kaust.edu.sa
Temple University, Philadelphia, PA 19122, USA [email protected]
Abstract. Distributed multi-target tracking is a canonical task for multi-robot systems, encompassing applications from environmental monitoring to disaster response to surveillance. In many situations, the distribution of unknown objects in a search area is irregular, with objects likely to be distributed in clusters rather than evenly spread. In this paper, we develop a novel distributed multi-robot multi-target tracking algorithm for effectively tracking clustered targets from noisy measurements. Our algorithm contains two major components. First, both the instantaneous and cumulative target densities are estimated, providing the best guess of current target states and the long-term coarse distribution of clusters, respectively. Second, the power diagram is implemented in Lloyd's algorithm to optimize the task space assignment for each robot, trading off between tracking detected targets in clusters and searching for potential targets outside clusters. We demonstrate the efficacy of our proposed method and show that it outperforms other candidate methods in tracking accuracy through a set of simulations.
Keywords: Multiple target tracking · sensor-based control · distributed sensor network
1 Introduction
Multi-target tracking using distributed multi-robot systems (MRSs) has drawn increasing attention over the past decades as robots become more powerful and low-cost. In a large number of real-world scenarios, targets are likely to be distributed in clusters despite a lack of prior knowledge about their exact locations; examples include social animals, certain species of plants, and trash concentrated in inhabited places. In such cases, detecting a target indicates that other targets are likely to appear nearby. Most existing multi-robot multi-target tracking (MR-MTT) algorithms underperform in trading off between having robots search for undetected targets and track detected targets when targets are not evenly distributed across the search space and no prior knowledge is given. This paper aims at developing effective distributed tracking algorithms for unknown targets that
are likely to be distributed in clusters. Such applications include the acquisition of image data from coral reefs for high-precision 3D reconstruction of their habitats, trash collection in a desert area, flaw inspection and repair on the surfaces of buildings and large machinery, detection and sample collection of vegetation in a nature reserve, and so on.
There are two key components to a multi-robot multi-target tracking (MR-MTT) system: an estimation system to model and track objects as they are detected, and a control system to drive the motion of individual robots in the team towards areas that are likely to contain useful information. Different from single-target tracking, the main challenge of MTT is matching detections to target tracks, a process known as data association, especially in the presence of false negative and false positive detections. There are a number of standard MTT algorithms, each of which solves data association in a different way: global nearest neighbor (GNN) [9], joint probabilistic data association (JPDA) [8], multiple hypothesis tracking (MHT) [2], and particle filters [7]. Each of these trackers propagates the posterior of target states over time and solves the data association problem prior to tracking. Another class of MTT techniques is derived from random finite set (RFS) statistics [13] and simultaneously solves data association and tracking. We use the probability hypothesis density (PHD) filter [12], which tracks the spatial density of targets, making it best suited to situations where each target is not required to have a unique identity. We recently developed a distributed PHD filter that is provably equivalent to the centralized solution [6].
Lloyd's algorithm [11] is one of the best-known control algorithms for distributed target tracking; the idea is to represent target states by a weighting function over the task space and to drive each robot to the weighted centroid of its Voronoi cell [5]. Compared to other MR-MTT algorithms such as graph-based methods [1,16], Lloyd's algorithm requires no prior knowledge about targets, and the number of targets can be unknown and time-varying. It drives more robots to areas where targets are more likely to appear, while allowing fewer robots to search for targets in the rest of the areas based on the weighting function. The weighting function can be estimated online using on-board sensors [15], and Dames in his work [6] uses the PHD as the weighting function, driving robots to actively track targets. However, when no target is within a robot's Voronoi cell, the robot moves erratically, reacting to any false positive detections as well as the dynamically changing shape of its Voronoi cell. As a result, robots often stay within empty sub-regions instead of purposefully seeking out un-tracked targets, slowing down the rate at which they find targets. This problem is further exacerbated when a majority of targets gather within some small subsets of the environment.
To improve the tracking accuracy of clustered targets, this paper develops a novel estimation and control policy. We have three primary contributions: 1) we introduce a state estimation strategy incorporating both instantaneous and cumulative target states, which allows robots to track detected targets precisely through noisy measurements while learning the coarse distribution of targets from
historical detection outcomes, 2) we implement the power diagram in Lloyd's algorithm to dynamically assign an optimized task space to each robot based on its current workload and drive the robots to explore or track targets more effectively, and 3) we demonstrate in a series of simulated experiments that with our proposed method, the team finds and tracks targets more effectively than with the previous algorithms in [6].
2 Problem Formulation
A set of nt targets with states X = {x1, . . . , xnt} are located within a convex open task space denoted by E ⊂ R². A team of nr (possibly heterogeneous) robots R = {r1, . . . , rnr} is tasked with determining nt and X, both of which are unknown and may vary over time. We assume that each robot ri knows its location qi in a global reference frame (e.g., from GPS), though our proposed method can be immediately extended to handle localization uncertainty using the algorithms from our previous work [3]. At each time step, a robot ri receives a set of |Zi| noisy measurements Zi = {z1, z2, . . .} of targets within the field of view (FoV) Fi of its onboard sensor. Note that the sensor may experience false negative or false positive detections, so the number of detections may not match the true number of targets.
2.1 Lloyd's Algorithm
The objective of Lloyd's algorithm is to minimize the following functional:

H(Q, \mathcal{W}) = \sum_{i=1}^{n} \int_{W_i} f(\|x - q_i\|)\, \phi(x)\, dx, \qquad (1)
where Wi is the dominance region of robot ri (i.e., the region that robot ri is responsible for), \|\cdot\| is the Euclidean norm, x ∈ E, φ(x) is the weighting function for all x ∈ E, and f(·) is a monotonically increasing function. The role of f is to quantify the cost of sensing due to the degradation of a robot's ability to measure events with increasing distance. The dominance regions Wi form a partition over E, meaning the regions have disjoint interiors and the union of all regions is E [5]. The goal is for the team to minimize the functional (1), both with respect to the partition set W and the robot positions Q.
Minimizing H with respect to W induces a partition on the environment V_i = \{x \mid i = \arg\min_{k=1,\dots,n} \|x - q_k\|\}. In other words, Vi is the collection of all points that are the nearest neighbors of ri. This is the Voronoi partition, and these Vi are the Voronoi cells, which are convex by construction. We call qi the generator point of Vi. Minimizing H with respect to Q leads each sensor to the weighted centroid of its Voronoi cell [5], that is
q_i^* = \frac{\int_{V_i} x\, \phi(x)\, dx}{\int_{V_i} \phi(x)\, dx}. \qquad (2)
Lloyd's algorithm sets the control input for robot ri to u_i(q_i^*), where

u_i(g) = \min\left(d_{step}, \|g - q_i\|\right) \frac{g - q_i}{\|g - q_i\|}, \qquad (3)
g is an arbitrary goal location, and dstep > 0 is the distance a robot can move during one time step. Robots follow this control law online, i.e., they recursively move to the temporary weighted centroids of their Voronoi cells, re-construct their cells based on their new positions, and compute the new weighted centroids to move to. As a result, the robots asymptotically converge to the weighted centroids of their Voronoi cells, causing the team to reach a local minimum of Eq. (1). This still holds true when φ(x) varies with time. Note that the original Lloyd's algorithm works in a centralized manner.
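The following is a minimal, self-contained sketch of one Lloyd iteration on a discretized environment, following Eqs. (1)-(3): each robot is assigned the grid points nearest to it (its Voronoi cell), then moves toward the weighted centroid of that cell, with the step limited to d_step. The grid and weighting function are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lloyd_step(points, phi, positions, d_step=0.5):
    """points: (P,2) grid; phi: (P,) weights; positions: (R,2) robots."""
    d = np.linalg.norm(points[:, None, :] - positions[None, :, :], axis=2)
    owner = np.argmin(d, axis=1)                     # Voronoi cell labels
    new_positions = positions.copy()
    for i in range(len(positions)):
        w = phi[owner == i]
        if w.sum() > 0:
            # Weighted centroid of cell V_i, Eq. (2)
            centroid = (points[owner == i] * w[:, None]).sum(0) / w.sum()
            step = centroid - positions[i]           # goal g = q_i*
            dist = np.linalg.norm(step)
            if dist > 1e-12:                         # step-limited move, Eq. (3)
                new_positions[i] += min(d_step, dist) * step / dist
    return new_positions
```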
3 Distributed Multi-target Tracking
3.1 Instantaneous State Estimation
The sets X and Zi from above contain a random number of random elements, and thus are realizations of random finite sets (RFSs) [13]. The first-order moment of an RFS is known as the Probability Hypothesis Density (PHD) (which we denote v(x)) and takes the form of a density function over the state space of a single target or measurement. By assuming that the RFSs are Poisson distributed, where the number of targets follows a Poisson distribution and the spatial distribution of targets is i.i.d., the PHD filter recursively updates this target density function in order to track the distribution over target sets [12].
The PHD filter uses three models to describe the motion of targets: 1) The motion model, f(x | ξ), describes the likelihood of an individual target transitioning from an initial state ξ to a new state x. 2) The survival probability model, ps(x), describes the likelihood that a target with state x will continue to exist from one time step to the next. 3) The birth PHD, b(x), encodes both the number and locations of the new targets that may appear in the environment.
The PHD filter also uses three models to describe the ability of robots to detect targets: 1) The detection model, pd(x | q), gives the probability of a robot with state q successfully detecting a target with state x. Note that the probability of detection is identically zero for all x outside the sensor FoV. 2) The measurement model, g(z | x, q), gives the likelihood of a robot with state q receiving a measurement z from a target with state x. 3) The false positive (or clutter) PHD, c(z | q), describes both the number and locations of the clutter measurements.
Using these target and sensor models, the PHD filter prediction and update equations are:

\bar{v}_t(x) = b(x) + \int_E f(x \mid \xi)\, p_s(\xi)\, v_{t-1}(\xi)\, d\xi \qquad (4)

v_t(x) = (1 - p_d(x \mid q))\, \bar{v}_t(x) + \sum_{z \in Z_t} \frac{\psi_{z,q}(x)\, \bar{v}_t(x)}{\eta_z(\bar{v}_t)} \qquad (5)

\eta_z(v) = c(z \mid q) + \int_E \psi_{z,q}(x)\, v(x)\, dx \qquad (6)

\psi_{z,q}(x) = g(z \mid x, q)\, p_d(x \mid q), \qquad (7)
where ψz,q(x) is the probability of a sensor at q receiving measurement z from a target with state x. The PHD represents the best guess of target states at the current time step and is utilized as the instantaneous estimate of the target density, denoted by wi(x), i.e., wi(x) = v(x), x ∈ E.
In [6] a distributed PHD filter is formulated, with each robot maintaining the PHD within a unique subset, Vi, of the environment. Three algorithms then account for motion of the robots (to update the subsets Vi), motion of the targets (in (4)), and measurement updates (in (5)). As a result, each robot recursively re-constructs its Voronoi cell online based on the current relative locations of neighboring robots and maintains the PHD locally by communicating with neighbors to estimate target states, yielding identical results to running a centralized PHD filter over the task space. In this paper, the same strategies are applied for propagating wi(x) over time in a distributed manner.
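As a concrete illustration of Eqs. (5)-(7), the following is a hypothetical grid-based sketch of one PHD measurement update (not the authors' code). All model terms are assumed given as precomputed arrays, and the integrals are approximated by Riemann sums over the grid.

```python
import numpy as np

def phd_update(phd, cell_area, p_d, clutter, likelihoods):
    """One measurement update of the PHD filter, Eqs. (5)-(7).

    phd:         (N,) predicted target density at N grid points
    cell_area:   area of one grid cell (for integral approximation)
    p_d:         (N,) detection probability p_d(x | q) at each grid point
    clutter:     (M,) clutter PHD c(z | q) evaluated at each measurement
    likelihoods: (M, N) g(z | x, q) for each of M measurements
    """
    psi = likelihoods * p_d            # psi_{z,q}(x), Eq. (7)
    # eta_z = c(z|q) + integral of psi * phd, Eq. (6)
    eta = clutter + (psi * phd).sum(axis=1) * cell_area
    # Missed-detection term plus one ratio term per measurement, Eq. (5)
    return (1.0 - p_d) * phd + (psi * phd / eta[:, None]).sum(axis=0)
```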
3.2 Cumulative State Estimation
Since no prior for the birth model is known to the PHD filter, we typically assume that the target birth rate is uniformly distributed. As a result, individuals do not actively search for a target when no target is detected, and could often spend a long time locating targets that appeared in underexplored regions of the environment, degrading tracking performance especially when targets are clustered. To improve this, we need to estimate online the probability of target appearance over a long period of time, i.e., the cumulative state wc. In a variety of scenarios, targets move randomly within some relatively fixed clusters, e.g., animals cluster around water sources. In such cases, the frequency of target appearance at each point x ∈ E can be regarded as time-invariant, and the cumulative state estimation is a density distribution that quantifies the best guess of the number of expected targets at each location based on accumulated observation.
The PHD filter assumes that the number of targets follows a Poisson distribution. The conjugate prior of this is a Gamma distribution, which describes the distribution f(x) of the expected number of targets found at x. This is given by

\Gamma(\alpha, \beta) = \frac{x^{\alpha - 1} e^{-\beta x} \beta^{\alpha}}{(\alpha - 1)!}, \quad x > 0,\ \alpha, \beta > 0, \qquad (8)
where α(x) is the shape parameter, which describes the total reward obtained from the observations at location x, and β(x) is the inverse rate parameter, which describes the inverse of the number of historical observations sensors conduct at location x. Initially, α0(x) = 0 and β0(x) = ∞. When a robot ri receives a measurement set Zi at time t > 0, it updates α(x), β(x) for all x ∈ Fi by

\alpha(x) = \alpha(x) + r_w, \quad \beta(x) = (\beta(x)^{-1} + 1)^{-1}, \qquad (9)

where

r_w = \begin{cases} 0 & |Z_i| = 0 \\ 1 & \text{else.} \end{cases} \qquad (10)
Then, for each x ∈ E, the cumulative state estimation is given by the mode of the distribution Γ(α(x), β(x)):

wc(x) = \begin{cases} (\alpha(x) - 1)\, \beta(x), & \alpha(x) \ge 1 \\ 0, & \alpha(x) < 1. \end{cases} \qquad (11)
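A minimal sketch of this update follows, assuming per-cell parameter arrays on a grid (the grid representation and initialization details are assumptions for illustration).

```python
import numpy as np

def update_cumulative(alpha, beta, fov_mask, num_detections):
    """Update per-cell Gamma parameters and return wc, Eqs. (9)-(11).

    alpha, beta:    per-cell Gamma parameters (beta starts at np.inf)
    fov_mask:       boolean mask of grid cells inside the sensor FoV
    num_detections: |Z_i| for this time step
    """
    rw = 0.0 if num_detections == 0 else 1.0       # reward, Eq. (10)
    alpha[fov_mask] += rw                          # Eq. (9)
    beta[fov_mask] = 1.0 / (1.0 / beta[fov_mask] + 1.0)
    # Mode of Gamma(alpha, beta) as cumulative density estimate, Eq. (11);
    # unobserved cells keep alpha near 0 and therefore map to 0.
    return np.where(alpha >= 1.0, (alpha - 1.0) * beta, 0.0)
```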
3.3 Environment Approximation
Since it is neither feasible nor physically meaningful to have a different distribution for each point in space, it makes sense to use a subset of points to represent both wi and wc. In this paper, we use uniform grids to represent wi and wc over the task space. Therefore, the integral of wi and wc within a grid cell is equal to the instantaneous and cumulative expected number of targets in that cell, respectively.
3.4 Distributed Control
Optimized Space Assignment. In order to optimize the assigned coverage area, each robot is endowed with a weight ρ² in constructing its Voronoi cell, and the monotonically increasing function in Eq. (1) becomes

f(\|x - q_i\|) = \|x - q_i\|^2 - \rho_i^2, \qquad (12)

where ρi is the power weight of robot ri. The resulting W is an additively weighted Voronoi diagram, or power diagram, a variant of the standard Voronoi diagram, and Wi is the power cell. Recall that \sum_{x \in W_i} wc(x) quantifies the expected number of targets to be found in Wi and that the optimal partition is W_i = \{x \mid i = \arg\min_{k=1,\dots,n} f(\|x - q_k\|)\}. Therefore, for each robot ri, we set ρi as

\rho_i = \left( \sum_{x \in W_i} wc(x) \right)^{-\frac{1}{2}}, \qquad (13)
so that the area of Wi is approximately inversely proportional to the estimated number of targets that can be found in Wi. Thereby, robots in areas with lower expected target density are assigned larger areas to explore, which takes full advantage of the search and tracking capability of the team.
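A minimal sketch of the power-diagram assignment of Eqs. (12)-(13) on a grid follows; the grid representation and the small regularizer `eps` (to guard against empty cells) are illustrative assumptions.

```python
import numpy as np

def power_cells(points, positions, rho2):
    """points: (P,2) grid; positions: (R,2) robots; rho2: (R,) squared weights."""
    d2 = ((points[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2 - rho2[None, :], axis=1)   # power cell index per point

def update_weights(points, positions, wc, rho2, eps=1e-9):
    owner = power_cells(points, positions, rho2)
    new_rho2 = np.empty(len(positions))
    for i in range(len(positions)):
        mass = wc[owner == i].sum()                # expected targets in W_i
        # rho_i = mass^(-1/2), Eq. (13), so rho_i^2 = 1 / mass
        new_rho2[i] = 1.0 / (mass + eps)
    return new_rho2
```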
3.5 Algorithm Outline
The distributed target search and tracking algorithm is outlined in Algorithm 1. For each robot, both the instantaneous and cumulative states are initialized to null, as there is no prior knowledge about target states, and α(x) and β(x) are initialized. As robots start to explore and receive sensor measurements, each of them exchanges its location with neighbors to compute its power cell Wi, as outlined in lines 5–8. Then each robot updates and broadcasts the tuple {α(x), β(x), wii(x), wci(x)}. Lines 12–18 outline the control strategy, which separates the team into idle robots (i.e., robots that find no target) and busy robots (i.e., robots that find targets), and causes the robots to move according to the following rules: when targets are detected, a robot is driven to locations with a higher instantaneous estimate of target density to better track the detected targets; otherwise, a robot is driven to search for targets in areas with a higher cumulative estimate of target density, where targets are more likely to be found based on accumulated experience. As a result, the team is able to optimize robot locations by taking advantage of historical measurements and further improve tracking accuracy.
Algorithm 1 boosts the performance of our previous method [6] in that it allows the team to actively explore the environment and learn the characteristics of the target distribution. In particular, robots are now able to use a combination of detailed local information (coming from the PHD) and coarse global information (coming from α, β) to inform their actions. For evenly distributed targets, Algorithm 1 yields a similar tracking performance compared
Algorithm 1: Distributed Search and Tracking
1: for ri ∈ R do
2:     wc(x) ← 0, wi(x) ← 0, x ∈ E
3:     α(x) ← 1/∞, β(x) ← ∞, x ∈ E
4: while true do
5:     for ri ∈ R do
6:         Receive measurement set Zi
7:         Exchange state qi with neighbors Ni
8:         Compute power cell Wi
9:         Update α(x), β(x), x ∈ Wi using (9)
10:        Update wii(x), wci(x), x ∈ Wi using (5) and (11), respectively
11:        Broadcast {α(x), β(x), wii(x), wci(x)}
12:        if Zi = ∅ then                          // Idle robots
13:            φi(x) ← wci(x), x ∈ Wi
14:        else                                    // Busy robots
15:            φi(x) ← wii(x), x ∈ Wi
16:        Set goal gi = qi* using (2)
17:        Compute ui(gi) using (3)
18:        qi ← qi + ui                            // Move towards goal
with Dames' method [6], since wc is close to uniform across the task space and W is similar to a standard Voronoi diagram in that case. However, the advantages of our method are pronounced when targets are not uniformly distributed in the space but are instead grouped together within small regions. Under these circumstances, idle robots are especially helpful in learning the difference in target density among sub-regions and optimizing the assignment of tracking effort in different sub-regions.
4 Simulations
We test our proposed algorithms via Matlab simulations. The task space is a 100 m × 100 m square. Targets are distributed in clusters: 30 are located in a 33 m × 33 m square sub-region at the lower-left corner of E, and another 30 targets in a 33 m × 33 m square sub-region at the top-right corner. Targets move within the sub-regions following a Gaussian random walk at a maximum speed of 3 m/s. All robots begin each trial at randomized locations within a 20 m × 10 m box at the bottom center of the environment. Robots have a maximum speed of 5 m/s and are equipped with isotropic sensors with a sensing radius of 5 m. Both the instantaneous and cumulative estimations are approximated by a uniform grid implementation, with a grid size of 0.1 m × 0.1 m. In the PHD filter, the robots use a Gaussian random walk (with σ = 0.35 m/s) for the motion model f, set the survival probability to 1, and set the birth PHD to 0, since no prior knowledge of targets is available. We use the same measurement model for homogeneous noisy sensors as [6]. Note that our proposed method is compatible with heterogeneous sensing networks [4]; we make this choice to simplify the tests.
Fig. 1. The figures show the distribution of 70 robots and 60 targets in a 100 m × 100 m square task space after 300 s of tracking using Dames' method and our method, respectively. Orange diamonds plot the locations of targets. Green squares and circles show the locations and fields of view of robots, respectively. Red crosses plot robots' temporary goals. Dashed lines plot the boundaries of robots' currently assigned spaces.
4.1 Qualitative Comparison
We first show how our proposed algorithms improve multi-target tracking with a single trial using 70 robots. Figure 1 shows the locations of robots and targets after 300 s using both Dames' method [6] and our method. When targets gather within only a small portion of the environment, the previous method can hardly optimize robot locations for more accurate tracking of targets, since only a few robots have found targets and are tracking them. As a result, most of the robots are idle and move erratically to cover areas where no targets are distributed, which demonstrates the weakness of the previous algorithm that idle robots do
Fig. 2. The figures plot the surfaces of wi and wc over the 100 m × 100 m task space after 300 s, respectively.
not actively search for targets, causing an inefficient use of the total sensing capability of the team while searching for un-tracked targets. This behavior is improved by our method, which drives a larger number of robots to gather at the clusters of targets while a handful of other robots continue to search unexplored areas, as the exact locations and motion models of targets are unknown. The result indicates the efficacy of our method in that the distribution of the clusters is learned over time and the power diagram distributes a more balanced workload to individuals.
To illustrate the different characteristics of the instantaneous estimation and the cumulative estimation of targets, we plot in Fig. 2 the values of wi(x) and wc(x) for all x ∈ E after 300 s of the task using our method. The instantaneous estimation, i.e., the PHD, provides the best guess of the exact locations of targets at the current time through a set of sharp peaks, as shown in Fig. 2a. The PHD wi(x) at x ∈ E returns to near zero rapidly once targets are no longer found at x, given the low expected target birth rate, exhibiting high accuracy in estimating exact target locations but a short "memory" of the historical target distribution. On the contrary, the cumulative estimation wc presents a relatively smooth and continuous distribution over the task space, with higher values distributed over the entire target clusters and near-zero values over the rest of the space, as shown in Fig. 2b. Therefore, the instantaneous and cumulative estimations are used to drive busy and idle robots respectively, as the former require exact target locations for accurate tracking while the latter need the coarse distribution of clusters for optimized deployment.
4.2 Quantitative Comparison
To test the efficacy of our proposed approach, we conduct a series of trials using a range of team sizes (from 60 to 90 robots). Three tracking strategies are compared: 1) Dames' method ("D" method) in [6]; 2) Dames' method with power diagram ("P" method), which uses the instantaneous estimation only for tracking (similar to the power diagram implementation in [4]), with the power weight in Eq. (14) depending on the PHD instead, given by

\rho_i = \left( \sum_{x \in W_i} wi(x) \right)^{-\frac{1}{2}}; \qquad (14)
and 3) our method ("O" method), which uses both instantaneous and cumulative estimation, outlined in Algorithm 1. For each team size we run 10 trials.
OSPA. To quantify the performance, we use the first-order Optimal Sub-Pattern Assignment (OSPA) metric [14], a commonly used approach in MTT. The error between two sets X, Y, where |X| = m ≤ |Y| = n without loss of
generality, is

d(X, Y) = \left( \frac{1}{n} \min_{\pi \in \Pi_n} \left[ \sum_{i=1}^{m} d_c(x_i, y_{\pi(i)})^p + c^p (n - m) \right] \right)^{1/p}, \qquad (15)
where c is a cutoff distance, dc (x, y) = min(c, x − y), and Πn is the set of all permutations of the set {1, 2, . . . , n}. This gives the average error in matched targets, where OSPA considers all possible assignments between elements x ∈ X and y ∈ Y that are within distance c of each other. This can be efficiently computed in polynomial time using the Hungarian algorithm [10]. We use c = 10 m, p = 1, and measure the error between the true and estimated target sets. Note that a lower OSPA value indicates a more accurate tracking of the target set. Results. In Fig. 3a, we report the median OSPA value to measure tracking accuracy over the final 700 s of each trial, allowing the team to reach or be close to a steady state and smoothing out the effects of spurious measurements that cause the OSPA to fluctuate, and the results are aggregated into boxplots. “D” method shows the worst performance in tracking clustered targets regardless of team sizes. As depicted by the results of “P” method, such performance is improved by assigning optimized task spaces that contain equivalent instantaneous estimated number of targets to robots in the team, which drives the robots to gather at where targets are currently tracked. However, robots are still not
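The following is a minimal sketch of computing Eq. (15) for p = 1 via the Hungarian algorithm, as the text describes; the use of SciPy's assignment solver is an implementation assumption, not the authors' code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=10.0):
    """First-order OSPA, Eq. (15) with p = 1.

    X: (m,2) and Y: (n,2) target positions; returns the OSPA error.
    """
    if len(X) > len(Y):
        X, Y = Y, X                        # enforce |X| = m <= |Y| = n
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0
    # Pairwise cutoff distances d_c(x, y) = min(c, ||x - y||)
    d = np.minimum(c, np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2))
    rows, cols = linear_sum_assignment(d)  # optimal assignment over matchings
    return (d[rows, cols].sum() + c * (n - m)) / n
```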
Fig. 3. Figures show OSPA errors of group of trials. Figure 3a displays boxplots of average OSPA errors from 300 s to 1000 s using the three methods (“D”, “P”, and “O”) with four team sizes, each over ten trials. |R| denotes the number of robots. Figure 3b plots average OSPA errors of the total of 40 trials (10 for each team size) using the three methods over the entire 1000 s.
138
J. Chen et al.
tend to move to where targets are likely to clustered by no target is tracked instantly. Our proposed method further improves this flaw and shows the best tracking performance of clustered targets, as suggested by the OSPA values of “O” method. In Fig. 3b, we plot the average OSPA values of all 40 trials of four different team sizes using the three methods over the entire 1000 s. It is shown that the OSPA error drops at similar rates over the first 200 s as robots start moving from the starting area and explore the entire search space, despite the applied tracking algorithm. After that, “D” method no longer improves the tracking accuracy and the team reaches a steady state, while the other two algorithms continues to result in lower down of the OSPA error. “P” method reaches a steady state at around 400 s, after which the team has completed updating a steady-state PHD and propagating optimized cell assignment. Meanwhile, the OSPA error continuously drops by using “O” method until approximately 600 s. This is due to the fact that the cumulative estimation of target number continuously grows as more observations are gain, leading to an enhanced congregation of robots in target clusters. It is illustrated that our proposed method results in a continuous learning behavior of cluster distribution as robots explore and contributes to the outperformance of tracking accuracy over the other two algorithms.
5
Conclusions
The tracking accuracy of existing distributed MR-MTT algorithms significantly decays when targets are clustered instead of evenly distributed across the task space. In this paper, we propose a novel distributed multi-target tracking algorithm that allows a team of robots to effectively track clustered targets, despite given no prior knowledge of target states. Each robot estimates both instantaneous and cumulative target density and dynamically optimizes its space assignment using a power diagram implementation of Lloyd’s algorithm. As a result, robots are able to track detected targets precisely while congregate at target clusters by learning the coarse cluster distribution from past observations. Simulation results suggest that our algorithm is superior to other candidates in effective tracking of clustered targets. Acknowledgements. This work was supported by funding from King Abdullah University of Science and Technology (KAUST).
References 1. Ahmad, A., Tipaldi, G.D., Lima, P., Burgard, W.: Cooperative robot localization and target tracking based on least squares minimization. In: 2013 IEEE International Conference on Robotics and Automation, pp. 5696–5701. IEEE (2013) 2. Blackman, S.S.: Multiple hypothesis tracking for multiple target tracking. IEEE Aerosp. Electron. Syst. Mag. 19(1), 5–18 (2004)
Distributed Tracking of Clustered Targets
139
3. Chen, J., Dames, P.: Collision-free distributed multi-target tracking using teams of mobile robot with localization uncertainty. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2020) 4. Chen, J., Dames, P.: Distributed multi-target tracking for heterogeneous mobile sensing networks with limited field of views. In: IEEE International Conference Robotics and Automation (2021), Under review 5. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004) 6. Dames, P.M.: Distributed multi-target search and tracking using the phd filter. Auton. Robot. 44(3), 673–689 (2020) 7. Doucet, A., Vo, B.N., Andrieu, C., Davy, M.: Particle filtering for multi-target tracking and sensor management. In: Proceedings of the Fifth International Conference on Information Fusion, vol. 1, pp. 474–481. IEEE (2002) 8. Hamid Rezatofighi, S., Milan, A., Zhang, Z., Shi, Q., Dick, A., Reid, I.: Joint probabilistic data association revisited. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3047–3055 (2015) 9. Konstantinova, P., Udvarev, A., Semerdjiev, T.: A study of a target tracking algorithm using global nearest neighbor approach. In: Proceedings of the International Conference on Computer Systems and Technologies (CompSysTech 2003), pp. 290– 295 (2003) 10. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2(1–2), 83–97 (1955) 11. Lloyd, S.: Least squares quantization in pcm. IEEE Trans. Inf. Theory 28(2), 129–137 (1982) 12. Mahler, R.P.: Multitarget bayes filtering via first-order multitarget moments. IEEE Trans. Aerosp. Electron. Syst. 39(4), 1152–1178 (2003) 13. Mahler, R.P.: Statistical multisource-multitarget information fusion, vol. 685. Artech House Norwood, MA (2007) 14. Schuhmacher, D., Vo, B.T., Vo, B.N.: A consistent metric for performance evaluation of multi-object filters. IEEE Trans. Signal Process. 56(8), 3447–3457 (2008) 15. Schwager, M., Rus, D., Slotine, J.J.: Decentralized, adaptive coverage control for networked robots. Inter. J. Robot. Res. 28(3), 357–375 (2009) 16. Sung, Y., Budhiraja, A.K., Williams, R.K., Tokekar, P.: Distributed assignment with limited communication for multi-robot multi-target tracking. Auton. Robot. 44(1), 57–73 (2020)
A Force-Mediated Controller for Cooperative Object Manipulation with Independent Autonomous Robots Nicole E. Carey1(B) and Justin Werfel2 1
2
Autodesk Robotics Lab, Autodesk Inc., San Francisco, USA [email protected] Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, USA [email protected]
Abstract. We consider cooperative manipulation by multiple robots assisting a leader, when information about the manipulation task, environment, and team of helpers is unavailable, and without the use of explicit communication. The shared object being manipulated serves as a physical channel for coordination, with robots sensing forces associated with its movement. Robots minimize force conflicts, which are unavoidable under these restrictions, by inferring an intended context: decomposing the object’s motion into a task space of allowed motion and a null space in which perturbations are rejected. The leader can signal a change in context by applying a sustained strong force in an intended new direction. We present a controller, prove its stability, and demonstrate its utility through experiments with (a) an in-lab force-sensitive robot assisting a human operator and (b) a multi-robot collective in simulation. Keywords: robot cooperation force-based control
1
· distributed robotics · manipulation ·
Background
Cooperative (or collaborative) manipulation is the task where two or more agents, which may be any combination of robots and humans, must work together to manipulate an object too large or unwieldy for one agent to handle alone. Often one agent may act as a leader, seeking to move the object along a privately known trajectory through space, aided by an arbitrary number of independent helper agents who have no direct knowledge of the path or destination. Challenges associated with this task can include the following: (1) The details of the environment (topography of the ground, locations of obstacles, etc.) may not be known by some or all agents, and may change dynamically. (2) The number and locations of the agents participating in the task may not be known. (3) The properties of the object (mass, geometry, etc.) may not be known. (4) c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 140–155, 2024. https://doi.org/10.1007/978-3-031-51497-5_11
A Force-Mediated Controller for Cooperative Object Manipulation
141
There may be unexpected disruptions (e.g., the object or agents may bump into obstacles). (5) Communication may be problematic, e.g., unreliable, undesirable, or unscalable. We are interested in the case of an unanticipated need to move an object, with no opportunity for advance preparation. That is, the properties of the environment, object, set of helpers, and desired manipulation trajectory are all unknown, and some of these elements may also be changeable and dynamic. Further, if the leader is a human, then communicating the details of a desired trajectory to the helper robots is unlikely to be quick, easy, or detailed and accurate. From a usability standpoint, the ideal way for the robots to give help would be for the human to try to move the object in the desired way directly, by exerting forces on it, and for the robots to supplement the leader’s force to enable the manipulation, without requiring any explicit instruction. An approach that works under these restrictive conditions could also be used in a scenario where more information or communication is available.
2
Related Work
Work on collective manipulation typically assumes that all robots share knowledge about the intended trajectory, inertial parameters of the payload, and/or number and locations of teammates, and that explicit robot-to-robot communication is available [1,7,15,17,19]. Some work has considered the problem of obtaining such information in partially restricted cases, e.g., estimating inertial parameters using carefully coordinated actions and knowledge about the team [15]. Some studies rely on specialized gripper hardware that constrains freedom of movement in specific ways [10]. A subcategory of cooperative manipulation is collective transport, where movement is limited to a plane [4,9,10]. Here we consider cases where only limited payload information (local mass and relative center of mass (CM) location) can be obtained, no other information or explicit communication is available, movement in any dimension may be desired, and off-the-shelf hardware can be used.
3
Challenges and Assumptions
In general, a helper robot cannot disambiguate wrenches that are due to guidance from a leader, collisions, other agents, or the load itself (Fig. 1B). Thus to successfully follow guidance forces and reject all others, each robot must have a contextual framework through which it can observe, filter, and classify sensed forces, and apply an appropriately calculated control torque to smoothly accomplish the task goal. In previous work [4], we presented an adaptive control framework for collective transport, a special case of the more general cooperative manipulation problem, in which an unknown object is carried by an arbitrary number of robots with movement strictly in a horizontal plane. That work followed an operational space control paradigm [12] in which the robots allow movement in the plane
142
N. E. Carey and J. Werfel
Fig. 1. A: A robot collective transports an unknown object following a leader’s guidance. B: Two helper robots handle a small rigid object, guided by a human leader’s force applied to one corner (blue arrow). The left-hand robot experiences a single multi-dimensional wrench at its end-effector, with no disambiguation of components resulting from the leader, the object’s inertial properties, and forces due to the other agent. C: Physical testing of cooperative manipulation of a basket, using a Franka Emika Panda. D: An example application scenario: a robot helps a human manipulate a load in a challenging field situation of installing solar panels.
and prevent movement out of the plane. That is, the contextual framework that robots used to interpret forces was the a priori knowledge that only movement in the plane would be intended. In this paper, we modify our previously presented controller to permit more general collaborative manipulation tasks, encompassing movement in any direction. The central principle robots use to determine context is to interpret transient or weak forces as unintended (e.g., perturbations due to collisions), and sustained stronger forces as intentional control signals. We show that if each robot can estimate the payload’s mass and relative CM location, then it can respond to control signals in any direction, switching between dimensional contexts without loss of controller stability or compliance. We demonstrate this framework in physical collaborative experiments using a compliant 7-DoF manipulator. The key assumptions of the approach are: (1) Each agent can obtain an estimate of the load’s relative CM location and its share of the mass (the latter may typically be m/na , for a system of na agents and total mass m, with both m and na unknown to the agents). (See §4.2 for a discussion of how this assumption can be realized.) (2) There are enough agents to handle the load (na is large enough that m/na is significantly less than the carrying capacity of any single agent). (3) The grasp or connection between each agent and the shared load is
A Force-Mediated Controller for Cooperative Object Manipulation
143
rigid, with no slippage or in-grasp rotation. (4) Agents have some compliance, permitting a degree of conflict between forces exerted by different agents without immediately causing instability. (5) Each agent knows the orientation of its base with respect to a shared reference vector (e.g., robots can sense the direction of gravity while moving over uneven terrain). We do not assume other knowledge about the environment, task, agent numbers or locations, nor direct explicit communication. We note that compliance can be helpful in multiple ways. For instance, it facilitates movement of the object in response to the guiding force, providing displacement as well as force cues that can be used by helper robots to infer the intended motion (Table 1). Table 1. Control variables and constants. d Db Λx , Λν p μx , μ ν x J ν JE Jt η Zν q wt Ke , K ν , γ Bn , Dν , Dx T f
4 4.1
Binary 6 × 1 dimensional constraint vector, [x, y, z, roll, pitch, yaw]T Binary diagonal matrix s.t. diag(Db ) = d. Inertial matrices associated with task and null spaces, respectively The end-effector pose of the robot in cartesian space Coriolis matrices associated with task and null spaces, respectively An n × 1 subvector of p consisting of the task-related pose dimensions the (non-square) standard manipulator Jacobian An m × 1 null-space velocity vector, where the total degrees of freedom of the manipulator can be written n + m The extended (n + m) × (n + m) manipulator Jacobian The task-space Jacobian A 6 × 1 vector describing the pose in non-task-related dimensions, with task dimensions (from x) set to zero. The null-space Jacobian Joint positions An n × 1 vector of estimated external wrench along task-related dimensions Controller gain constants (stiffness) Controller gain constants (damping) Joint torque Force (from force/torque sensors) expressed in the robot’s base frame
Control Methodology Closed Loop Control Dynamics
The system dynamics for each agent take the standard form ¨ + C(q, q) ˙ q˙ + g(q) + Text T = Mq
(1)
144
N. E. Carey and J. Werfel
where the load mass and inertia are modeled as connected rigid components of the end-effector, incorporated into the mass/Coriolis/gravitational matrices accordingly. Each agent’s (dynamic) context for the intended manipulation is encoded as a binary 6 × 1 vector d, expressed in the robot’s base frame. di > 0 indicates the agent should follow a guiding force along dimension i; di = 0 indicates a dimension where any external force is likely to be disturbance-related. We can use d to decouple the instantaneous control torque applied in the robot’s joint space into two components: a task controller, and a null or residual controller. Using d as a dimensional filter on the full manipulator Jacobian J ˙ The null space Jacobian Zν can gives a task Jacobian Jt such that x˙ = Jt q. be derived as in [18]. This allows us to develop a task-specific control schema, and use the complementary space to enact a different form of control on the remaining null dimensions. In a previous study [4], we considered a framework in which the task space was the set of dimensions in which external forces should be rejected, and the null space allowed free movement along other dimensions. Here we invert that division, so that instead of assigning a high priority task of stabilisation against unknown inertial dynamics (with the force-following component relegated to a lower priority null space), we assign the collaborative force-following goal to the high priority task space, while disturbance rejection of out-of-task forces is accomplished in the null space. This assignment allows the force-following behaviour of each agent (and hence the collective) to be more sensitive and responsive. The task space controller takes the following form (a task passivity-based control): ˙ − γKe x ˜) ¨ c = Λ−1 (2) x x (wt − (μx + Dx )x where wt is an estimate of the guiding external wrench in the chosen task domain, expressed in the robot’s base frame, and measured by the robot using either ˙ describes the joint torque sensors or a wrist-based force/torque sensor; (x, x) trajectory of the end-effector in the task-space dimensions only (i.e., x is a subset of the full 6×1 end-effector pose vector p); and Λx , μx are the task-space dynamic inertia and Coriolis matrices, respectively [20]. The error-correcting stiffness term Ke is explored in Sect. 4.2. In a cluttered dynamic environment, it is likely that disturbance forces will be due to environmental collisions. Rigidly maintaining null-space target positioning during a collision risks potential damage to the robot or the colliding object/person. However, the control space of the null component is defined by the residual degrees of freedom that remain unused by the task controller (i.e., it is thus both task- and robot-dependent). Noting this, we can take advantage of the greater number of degrees of freedom offered by redundant manipulators, and use a redundant joint space to accommodate collisions compliantly while minimally impacting the task space behavior. (If, instead, a high rigidity null space is desired, or the robot has insufficient DOFs to accommodate all possible task space configurations while rejecting error, a high stiffness impedance controller based on joint space error could be used.)
A Force-Mediated Controller for Cooperative Object Manipulation
145
We define ν as a null space velocity vector [4,20] and create an impedance control-based input torque of the following form: (3) Tν = −Zν#T Z T Kν JE η˜ − Bν (Λν Z˙ # q˙ + (μν + Dν )ν) where μν , Λν are the null Coriolis and inertial matrices [20], Kν is a stiffness term acting on offsets in non-task-space end-effector dimensions, and Bν , Dν represent damping gains. The null position error state variable η˜ can be calculated using D¯b , the binary matrix derived from the negation of d, assuming that for small changes in η and q, we can use the expanded (full-rank) Jacobian JE to transform errors in joint space to the (non-task-constrained) end-effector frame dimensions: δη = D¯b JE δq
(4)
(In fact for x = 0, where the goal joint configuration qd is governed solely by null space control input, η˜ is the 6 × 1 end-effector pose projection of the joint ˜ .) error q The input torque corresponding to the task control components can be written ˜) ˙ + (wt − (μx + D)x˙ − γKe x (5) Tt = −JtT Λx (J˙q) and our total input torque becomes ˙ + g(q). T = Tt + Tν + C(q, q) 4.2
(6)
Inertial Estimation and Error Compensation
Here we discuss the need for each agent to have an estimate of the local load mass (mi) and CM location (ri), the relationship between mass inertia matrix error and controller responsiveness, and how these estimates can be achieved.
A robot can calculate an estimated mass inertia matrix M^i_{est} using q, mi, and ri. The true (unknown) mass inertia matrix of the (local) system is M^i. We define the error in this inertial estimation as follows:

\Delta M^i = M^i - M^i_{est}. \qquad (7)
When a follower agent senses an external force, we can assume that this force is predominantly composed of a wrench imparted by the guiding agent, and any residual uncompensated inertial elements of the load:

F_{ext} = W_t + J^T \Delta M^i(q)\, \ddot{q}. \qquad (8)
Assuming a constant load, M^i, and hence ΔM^i, are bounded above [21]. In fact, given our assumption of enough agents to handle the load, M^i ≤ m_{max}, where m_{max} is the maximum load each agent can support. (Ideally it would be significantly less.) Then an absolute lower bound for the external guiding wrench can be written

W_{t,min} = f_{ext} - J^T \left( M_{max}(q) - M^i_{est}(q) \right) \ddot{q}, \qquad (9)
where Mmax(q) is the mass matrix for a load of mass mmax at joint configuration q. Based on this minimum guiding wrench input, and the current system state, we can project a minimum expected end-effector pose estimate xg, given a discrete time window Δt. The difference between the actual pose and this forward projection can be written as an error term in the task domain, \tilde{x} = x_g - x. This allows us to introduce an error correction term into our control acceleration equation, using a spring stiffness gain matrix Ke (see Eq. 2), and our challenge then becomes choosing a suitable Ke so that it fully compensates for model error.
From Eq. 9, suppose f(Mmax, q) is the inertial force applied by the maximum load the robot could support (calculated given the joint configuration q). Then a possible choice of the elements ke of Ke is |k_{e_j}| ≥ max(f(M_{L,max}, q))_j, i.e., we estimate the maximum possible force exerted by the inertial mass matrix upper bound Mmax in each task dimension {j}, and choose Ke to guarantee a sufficient compensation force. Note that given Eq. 9, the ratio of {maximum force it is reasonable for the guiding agent to exert} to {upper bound of the load estimate} represents a metric for the system responsiveness.
Ke need not be constant: an adaptive stiffness term can be used without loss of stability [14] (provided \dot{K}_e is bounded), as shown in [4], to create regions or dimensions of high and low responsiveness based on the expected impact of errors in the inertial estimate. Adaptive stiffness can also be increased temporarily in response to unexpected motions of the load, to prevent instability that could otherwise result if load inertia estimates are poor [4]. Other work has shown approaches for multiple robots to collaboratively estimate load parameters [8,15]. In future work we plan to extend these approaches to the case without explicit communication.
In the reductive case of a single guiding agent and a single follower (as in §5.1), an upper bound on ΔM^i can be found by examining the force residuals Δf(q) at the extremes of inertial offset, while only the follower agent is carrying the load (call this a stiffness tuning phase). Since in this case it is also relatively trivial to obtain a high degree of accuracy in inertial load estimation [16], in practice these residual force errors are minimal, and a small constant Ke suffices to fully compensate for model error.
4.3 Controller Stability
In this section, we consider the stability of the full operational space controller, including the model-error correction component described above. Let our state variable be z = \{\tilde{q}, \nu, \tilde{T}, \tilde{f}, \tilde{x}\}. From [11], we can demonstrate asymptotic stability for the system if there exists a function V(z) such that, in a neighborhood Ω of an equilibrium point z = 0, the following three conditions are satisfied:
1. V(z) ≥ 0 ∀z ∈ Ω, and V(0) = 0;
2. \dot{V}(z) ≤ 0 ∀z ∈ Ω;
3. the system is asymptotically stable for the largest positive invariant set L in \{z ∈ Ω \mid \dot{V}(z) = 0\}.
The total wrench acting on the system along the task dimensions can be written $f = w_t + \tilde{f}$, where $w_t$ is the estimated guiding input wrench as before, and $\tilde{f}$ is the total force estimation error, now including not only inaccuracies in the system model as above, but also other sources of error such as sensor noise. (Note that we are not considering the constrained null dimensions, so this error term is not confounded by disturbance rejection.) At equilibrium, $f = \tilde{f}$, and we can introduce a disturbance observer to track error in the force domain. Using the task state error defined previously, we desire $\dot{x}_g = 0$, so $\dot{\tilde{f}} = -\Gamma_f^{-1}\dot{x}$, where $\Gamma_f$ is symmetric and positive definite.

Let $V(x, f)$ be a candidate Lyapunov sub-function that considers only task-related energy terms:

$$V(x, f) = \tilde{x}^T K_e \tilde{x} + \frac{1}{2}\dot{x}^T \Lambda_x \dot{x} + \frac{1}{2}\tilde{f}^T \Gamma_f \tilde{f} \tag{10}$$
This function is positive definite along $(x, f)$ and positive semi-definite on the full state $z$ (condition 1 above). Differentiating, we find

$$\dot{V} = 2\tilde{x}^T K_e \dot{x} + \dot{x}^T \Lambda_x \ddot{x} + \frac{1}{2}\dot{x}^T \dot{\Lambda}_x \dot{x} + \tilde{f}^T \Gamma_f \dot{\tilde{f}} \tag{11}$$
Substituting the closed-form dynamics from Eq. 2, we obtain

$$\dot{V} = 2\tilde{x}^T K_e \dot{x} + \dot{x}^T \Lambda_x \Lambda_x^{-1}\big(w_t - (\mu_x + D)\dot{x} - \gamma K_e \tilde{x}\big) + \frac{1}{2}\dot{x}^T \dot{\Lambda}_x \dot{x} + \tilde{f}^T \Gamma_f \dot{\tilde{f}} \tag{12}$$

We can substitute the disturbance observer defined above and obtain an estimate of the task-dimension force error by examining the position error in combination with the environmental stiffness. Then, considering the skew-symmetry of $\dot{\Lambda}_x - 2\mu_x$ [20], we find that condition 2 above on $\dot{V}$ is satisfied provided

$$2\tilde{x}^T K_e \dot{x} - \gamma \tilde{x}^T K_e \dot{x} - \dot{x}^T D \dot{x} < 0 \tag{13}$$
and since $D$ is constant and positive definite, we require only that the controller gain $\gamma > 2$ to ensure conditional stability.

Furthermore, under the set condition $x = 0$, our null state error term $Z^T K_n J \tilde{\eta}$ reduces to $Z^T K_n \tilde{q}$, and the null-space control torque is analogous to that used in [20]. By using a similar subset function candidate $V_L$ on the set $L = \{\tilde{q}, \nu, \tilde{T}, \tilde{f} = 0, \tilde{x} = 0\}$, it can be shown that the system is asymptotically stable conditional to $L$ (condition 3 above). Hence all conditions for overall system stability are fulfilled.

4.4 Task-Frame Switching
The guiding agent can indicate a change in $d$ (e.g., switching from lifting to horizontal carrying) to the other agents using the payload as the medium for a physical signal. For this paper, we note that in the absence of slow changes in uncompensated force driven by unknown load inertia (and with the assumption of a joint-compliant null-space control), disturbance forces are likely to be abrupt in onset and of limited duration. We can use the dynamic characteristics of such disturbances to define an admittance/rejection filter, establishing a set of time- and magnitude-based characteristics that signify a force imposed by the knowledgeable agent intended to change the dimensional domain of the task, and thus adjust the vector $d$ on the fly. §6.2 discusses examples.
5 Experimental Validation
The approach requires a force-sensitive manipulator with (ideally) more than 6 degrees of joint freedom. The Franka Emika Panda cobot meets these needs (7 DOF) and is readily commercially available, making it an ideal platform with which to undertake proof-of-concept experiments. In §5.1 we discuss hardware testing/demonstration of the approach, using the single physical Panda available to us to assist a human leader. In §5.2 we discuss simulations with four Pandas assisting a leader. All code is available at [2].

5.1 Implementation in Hardware
We implemented an initial parameter estimation phase without a human in the loop, using a two-stage recursive least squares method [13] to estimate the approximate mass and CM location of the load, without requiring any foreknowledge of the load parameters or grasp point. Although the inclusion of off-diagonal inertias will in theory improve the stability region [14], we are interested in exploring the case where inertial estimation is imprecise or not all parameters can be determined. The estimation trajectory comprised a series of sinusoidal joint velocity commands (phase-offset to each other to minimize inversion errors due to singularity points), operating symmetrically around the robot's default neutral pose. The joint velocity during this inertial estimation phase was defined as $\dot{q}_i(t) = a_i \sin\left(2\pi \frac{t - t_L}{\omega_i}\right)$, with amplitudes $a = [0.3265, -0.3265, 0.225, -0.3, -0.4435, 0.3, -0.3625]$, cycle period scaling factors $\omega = [3.68, 2.04, 2.98, 1.75, 4.43, 2.749, 1.4]$, and $t_L$ a phase offset constant. The parametric convergence of the overall mass and CM location (relative to the end-effector grasp centroid) was usually accomplished in under 5 s, with a mass estimation accuracy within ±5% and centre of mass accuracy within ±10%. We did not seek to ensure highly accurate parameter estimation, because we anticipate precise inertial information to be difficult to obtain in the multi-agent case, and did not want this to be a necessary precondition when assessing the success of this proof of concept.
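As an illustration of this excitation trajectory, the sketch below streams the sinusoidal joint-velocity commands with the amplitude and period values quoted above; the phase offset value, the 1 kHz command rate, and the `send_joint_velocity` hook are illustrative assumptions, not part of the published code.

```python
import numpy as np

# Amplitudes and cycle-period scaling factors reported in the text
a = np.array([0.3265, -0.3265, 0.225, -0.3, -0.4435, 0.3, -0.3625])
omega = np.array([3.68, 2.04, 2.98, 1.75, 4.43, 2.749, 1.4])
t_L = 0.5  # phase offset constant [s]; illustrative value

def estimation_velocity_command(t):
    """Joint-velocity command (7 joints) during the inertial estimation phase."""
    return a * np.sin(2.0 * np.pi * (t - t_L) / omega)

# Stream commands for 5 s (mass/CM estimates typically converged within this
# window in the authors' tests); send_joint_velocity is a placeholder for the
# robot's joint-velocity interface.
for t in np.arange(0.0, 5.0, 1e-3):
    qdot_cmd = estimation_velocity_command(t)
    # send_joint_velocity(qdot_cmd)
```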
When calculating the null-space Jacobian projection $Z$, we must take particular care to ensure the devolved sub-Jacobian $J_m$ [18] is well-conditioned, by using a robust inversion function; this necessarily increases the computation time required. The joint control torque is thus updated asynchronously, leading to non-zero errors in torque and projected position estimation; however, we find that the small correction stiffness $K_e$ (described in §4.2) serves to accommodate these errors and ensure stability without impacting performance.

The Franka Panda is extremely sensitive to discontinuities in control inputs. Abrupt changes in torque input can result in a safety system lock, and aside from the need to low-pass filter the raw torque sensor data due to a high noise level, there is also the potential for such discontinuities during task switching. Therefore, when a request for a task dimension change has been registered, the real-time controller (a) sends a short burst of torque commands (∼50 ms) comprising mass and Coriolis compensating elements only, in order to bring the system smoothly to equilibrium, and (b) re-establishes a new null-space origin at the current joint and end-effector positions. This ensures a relatively smooth transition between task contexts.

As discussed in Sect. 4.4, we use a force input from the guiding agent to signal task dimension switches (changes in $d$). For each test case discussed in the next section, we established a set $d_i$ of pre-coded potential task domain vectors, and time and wrench magnitude thresholds $(t_{th}, w_{i,th})$ such that any external force or torque greater than $w_{i,th}$ applied for longer than $t_{th}$ along a non-task dimension would signal a switch in the task dimension vector; a minimal sketch of this filter follows below. In our case, the number of pre-coded states $d_i$ is small; however, with a sufficiently granular set of $d_i$, we can attain as much dimensional control flexibility as desired.
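A minimal sketch of that admittance/rejection filter is given below, assuming a simple hold-timer; the class name, the cyclic ordering of the pre-coded vectors, and the per-dimension threshold vector are our own illustrative choices.

```python
import numpy as np

class TaskSwitchFilter:
    """Time/magnitude admittance filter (sketch): a wrench above w_th that is
    sustained for longer than t_th along a currently rejected dimension
    triggers a switch to the next pre-coded task vector d_i."""

    def __init__(self, d_set, w_th, t_th):
        self.d_set = d_set   # list of pre-coded 6D task vectors (numpy, 0/1)
        self.idx = 0         # current task-context index
        self.w_th = w_th     # per-dimension wrench magnitude thresholds (6,)
        self.t_th = t_th     # required hold duration [s]
        self.hold = 0.0      # time the threshold has been exceeded so far

    def update(self, wrench, dt):
        d = self.d_set[self.idx]
        off_task = np.abs(wrench) * (1 - d)   # components along rejected dims
        if np.any(off_task > self.w_th):
            self.hold += dt
            if self.hold >= self.t_th:        # sustained: switch context
                self.idx = (self.idx + 1) % len(self.d_set)
                self.hold = 0.0
        else:
            self.hold = 0.0                   # transient disturbance: reset
        return self.d_set[self.idx]
```

With the experimental values of §6.2, one would instantiate it with two task vectors and thresholds around $t_{th} = 1.5$ s and $w_{th} = (5\,\mathrm{N}, 0.5\,\mathrm{N\,m})$.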
Fig. 2. Hardware implementation: behaviour in null-space (constrained; left) vs. task-space (unconstrained; right) dimensions. Top left: torque was applied around the pitch axis (relative to the robot's base). Bottom left: a force was applied in the Z (vertical) dimension. Top right: torque was applied around the yaw axis (relative to the robot's base). Bottom right: a force was applied in the Y (lateral) dimension. In all cases, blue indicates the applied force or torque, orange the system response along or around the axis in question. Controller gains: $D_x = 5.0 I_3$, $K_e = 0.2 I_3$, $\gamma = 2.5$, $K_\nu = \mathrm{diag}(240, 120, 120, 120, 120, 120, 120)$, $B_\nu = 0.2$, $D_\nu = 4.0 I_4$.
5.2 Implementation in Simulation
To demonstrate the efficacy of this method for multi-robot cooperation without direct communication, we used a simulated Franka Emika Panda model created in the Unity platform [3], and implemented the controller described above on four simulated robots manipulating a large shared load. A separate simulated leader applies forces and torques to the load. Each Panda is mounted on an omnidirectional wheeled base; it uses lateral motion of its end-effector in the task space as a control input to its base, thereby moving along with the leader’s guidance in the plane, and extending its effective workspace.
Fig. 3. Robot guidance without and with contextual information. The human guide applies a torque to rotate the end-effector through 90 degrees around the roll axis. In the first sequence of images (top), the robot control torque compensates for gravitational load, but otherwise is compliant in all dimensions. In the second sequence (bottom), we implement our decoupled force/impedance algorithm with the task constraint vector $d = \{0, 0, 0, 1, 1, 0\}$, rejecting wrenches in any direction other than roll and pitch. X's show the position of the end of the object across successive frames. Video: [5].
6 Results

6.1 Controller Performance Within Static Task Domain
To investigate the performance of the two control elements (force following vs. impedance control), we defined our task domain as $d = \{1, 1, 0, 0, 0, 1\}$, applied a set of short disturbance forces to a load held by a single robot, and examined the resulting end-effector trajectory. A sample output is shown in Fig. 2, with motion and force expressed relative to the robot's base frame of reference. Disturbances in the pitch or vertical motion are rejected by the null controller, while those around the yaw axis or along the lateral Y axis result in motion for as long as the wrench is applied, and do not return to the origin position afterwards.

To demonstrate the behavioural difference between our chosen decoupled control schema and a more conventional compliant control that might be used for, e.g., a learning-by-demonstration context, we consider a task whose goal is a 90-degree rotation of a manipulated object around the base's roll axis. Figure 3 compares the motion of the robot and object for each of the two control schemes in response to the same user input wrench. In the top set of images, the robot is under a free-floating (mass-compensated) control without contextual information about which wrench dimensions should be followed. Since the user input, as expressed through the shared load, has wrench components in multiple dimensions, the robot's end-effector position not only rotates, but also moves in the Y and Z dimensions. In the bottom image series, the robot is controlled by our decoupled controller and supplied with a dimensional context which restricts motion in Y and Z ($d = \{0, 0, 0, 1, 1, 0\}$). This controller decouples the wrenches imparted by human manipulation of the load and responds only to those along or around the task dimensions, resulting in an end-effector pose that changes only along the roll axis, as desired. In other words, with this control framework and given a suitable context, we can shape the robot's resultant trajectory with much greater precision. See video at [5] (part 1).
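The decoupling itself can be pictured as a diagonal selection of wrench components; the sketch below is a simplified illustration of that idea (the actual controller applies the split through the dynamics of Eq. 2 rather than as a direct force split, and the function and variable names are ours).

```python
import numpy as np

def decouple_wrench(wrench, d):
    """Split a sensed 6D wrench into a followed task component and a
    rejected component using the task vector d.
    d = (x, y, z, roll, pitch, yaw) flags, 1 = follow, 0 = reject."""
    S = np.diag(d)                        # task-dimension selection matrix
    w_follow = S @ wrench                 # drives compliant task-space motion
    w_reject = (np.eye(6) - S) @ wrench   # handled by the stiff null controller
    return w_follow, w_reject

# Example from the rotation task above: follow roll/pitch only
d = np.array([0, 0, 0, 1, 1, 0])
w_follow, w_reject = decouple_wrench(
    np.array([1.0, -2.0, 0.5, 0.8, 0.1, -0.3]), d)
```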
Fig. 4. Performing a collaborative task where force-following and rejection dimensions change, under the guidance of a human operator. The human guides the robot to first insert a key into a slot, then turn it 90 degrees and slide it along the slot. Left: still image after insertion, during the turning phase. Right: the force and motion mapping for one dimension (roll). Force-following vs. force-rejection phases are indicated through colour on the torque input (blue = force follow, red = force reject). Controller gains: $D_x = 5.0 I_3$, $K_e = 0.2 I_3$, $\gamma = 2.5$, $K_\nu = \mathrm{diag}(240, 120, 120, 120, 120, 120, 120)$, $B_\nu = 0.2$, $D_\nu = 4.0 I_4$. Task-switch thresholding: $(t_{th} = 1.5\,\mathrm{s}, w_{th} = (5\,\mathrm{N}, 0.5\,\mathrm{N\,m}))$. Video: [5].
6.2 Controller Performance with a Changing Task Domain
To demonstrate switching between contexts when the motion requirements of a task have a clearly defined dimensional change, we consider a collaborative key/lock positioning/insertion and rotation task. We pre-code a set of dimensional constraints $d_i$ corresponding to the key insertion ($i = 0$) and key turn/slide ($i = 1$):

$$d_i = \begin{cases} \{1, 1, 1, 0, 0, 1\} & i = 0 \\ \{0, 1, 0, 1, 1, 0\} & i = 1 \end{cases} \tag{14}$$

Figure 4 shows a snapshot during the process, and the roll input torque and end-effector angle over the interaction. Initially, the robot is free to move in $(x, y, z, \mathrm{yaw})$ but, in order to maintain the key in the correct orientation for insertion, constrained in $(\mathrm{roll}, \mathrm{pitch})$. Once the key has been inserted into the slot, the human seeks to shift the robot's context to allow a 90-degree turn, and does this by applying a firm roll torque for 2 s. The robot then shifts its constraints to a new dimensional set where it is free in $(y, \mathrm{roll}, \mathrm{pitch})$ but constrained in $(x, z, \mathrm{yaw})$. See video at [5] (part 2).

The plot on the right of Fig. 4 shows the roll torque applied to the robot through the shared load, and its corresponding roll angle. During the first phase (indicated by a red color on the input torque), the robot successfully rejects motion around its roll axis. Once the time/magnitude threshold for context switching has been passed, the contextual control dimension changes to $d_1$, and the roll input torque (now shown in blue) results in a corresponding angular rotation.
Fig. 5. Behaviour of a four-robot collective using our controller, following a two-stage guiding force. Top left: stable state counteracting inertial load, before external force application. Top right: an external force is applied along the global X axis. Bottom left: an external torque is applied around the global X axis. Bottom right: the corresponding local Y and local pitch pose of one robot during this process. Video: [5].
6.3 Multi-robot Performance with Changing Task Domain
To demonstrate that this control methodology can be extended to multiple robots with no direct communication, knowing only a (local) estimate of the load inertia, we implemented a simulated trial with four Franka Emika Panda robots using the Unity platform. After grasping a shared load, each robot applies a joint torque sufficient to counteract the gravitational load of its portion of the manipulation object (assumed for this proof of concept to be known ahead of time); a sketch of this compensation appears below. An external guiding force is then applied in two stages, first horizontally in the plane, then rotationally around the long axis of the load object. The corresponding dimensional control constraints for each stage are:

$$d_i = \begin{cases} \{1, 1, 0, 0, 0, 1\} & i = 0 \\ \{1, 0, 1, 0, 1, 0\} & i = 1 \end{cases} \tag{15}$$

Figure 5 shows that using only an external force input and pre-calculated knowledge of the load inertial parameters (note that the robots are not aware of the number, locations, or behaviour of other agents co-manipulating the object), the robot collective can ensure the load follows the guidance force. Video: [5] (part 3).
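The per-robot gravity compensation can be sketched as follows; the mass share `m_share` and the purely vertical support wrench are simplifying assumptions made for illustration.

```python
import numpy as np

def gravity_share_torque(J, m_share, g=9.81):
    """Joint torques counteracting one robot's share of the load weight
    (a sketch; m_share, the load portion assigned to this robot, is assumed
    known ahead of time as in the proof of concept above)."""
    w_g = np.array([0.0, 0.0, m_share * g, 0.0, 0.0, 0.0])  # upward support wrench
    return J.T @ w_g   # map the end-effector wrench to joint torques

# e.g. four robots each supporting a quarter of a 2 kg load:
# tau = gravity_share_torque(J_current, m_share=0.5)
```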
7 Discussion
A fundamental reason for deploying decoupled operational space control in this task of robots performing collective manipulation with limited information is to address disturbances imposed by the lack of shared knowledge. Without awareness of the others' poses or intentions, agents cannot disambiguate between interpretations of sensed forces, and thus in general will apply conflicting torques. This research demonstrates that (a) given a task context, these kinds of conflicts between agents can be resolved by using a compliant error-rejection framework; (b) if the agents are further provided with an estimation of load inertia, this task context can incorporate out-of-plane motion; and (c) force can be used as a control signal to enable contextual switching between control dimension subsets.

The use case for such a system is likely to be in collaborative transport or manipulation in complex, unknown, or dynamic environments. As §6.3 demonstrates, this kind of spontaneous collaboration can be effectively deployed with very little foreknowledge or preparation, requiring only one knowledgeable guiding agent. This simplifies the navigational challenges of collaborative transport considerably, and minimizes the shared knowledge base required. In particular, using compliant control for both elements of the decoupled controller eliminates any need for strict global kinematic alignment between agents; hence such systems should be robust to perturbations such as rough terrain or minor collisions.

To fully realize the goal of collective manipulation without prior knowledge or communication, further work is needed to let agents estimate inertial parameters online, potentially during an interaction [6]. Since direct communication is not necessary for localised mass estimation [15], it may be feasible, e.g., to use an adaptive high-gradient stiffness controller [4] to implement a CM estimation routine within a shallow 6-DoF low-stiffness basin common to all agents, solving the dynamic equations iteratively to allow each agent to calculate an individual estimate of the approximate CM location. Future work will focus on investigating such methods.

Acknowledgements. This work was supported by a Space Technology Research Institutes grant (number 80NSSC19K1076) from NASA's Space Technology Research Grants Program.
References

1. Bai, H., Wen, J.T.: Cooperative load transport: a formation-control perspective. IEEE Trans. Rob. 26(4), 742–750 (2010)
2. Carey, N.: Decoupled Controller for the Franka Panda (Version 1.08) [Computer software] (2022). https://github.com/niccarey/panda_decoupled_controller
3. Carey, N.: Unity implementation of the Franka Panda platform (Version 1.0) [Computer software] (2020). https://github.com/niccarey/FrankaPanda_Unity
4. Carey, N.E., Werfel, J.: Collective transport of unconstrained objects via implicit coordination and adaptive compliance. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 12603–12609. IEEE (2021)
5. Carey, N.E., Werfel, J.: A force-mediated controller for cooperative object manipulation [Media file, video] (2022). https://youtu.be/VIoj6v_a-Gw
6. Ćehajić, D., Hirche, S.: Estimating unknown object dynamics in human-robot manipulation tasks. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1730–1737. IEEE (2017)
7. Elwin, M.L., Strong, B., Freeman, R.A., Lynch, K.M.: Human-multirobot collaborative mobile manipulation: the Omnid Mocobots. arXiv preprint arXiv:2206.14293 (2022)
8. Franchi, A., Petitti, A., Rizzo, A.: Decentralized parameter estimation and observation for cooperative mobile manipulation of an unknown load using noisy measurements. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 5517–5522. IEEE (2015)
9. Habibi, G., Kingston, Z., Xie, W., Jellins, M., McLurkin, J.: Distributed centroid estimation and motion controllers for collective transport by multi-robot systems. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1282–1288. IEEE (May 2015)
10. Hichri, B., Fauroux, J.C., Adouane, L., Doroftei, I., Mezouar, Y.: Design of cooperative mobile robots for co-manipulation and transportation tasks. Robot. Comput.-Integr. Manufact. 57, 412–421 (2019)
11. Iggidr, A., Sallet, G.: On the stability of nonautonomous systems. Automatica 39(1), 167–171 (2003)
12. Khatib, O.: A unified approach for motion and force control of robot manipulators: the operational space formulation. IEEE J. Robot. Autom. 3(1), 43–53 (1987)
13. Kubus, D., Kröger, T., Wahl, F.M.: On-line rigid object recognition and pose estimation based on inertial parameters. In: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1402–1408. IEEE (Oct 2007)
14. Liang, X., et al.: An adaptive time-varying impedance controller for manipulators. Front. Neurorobot. 16 (2022)
15. Marino, A., Muscio, G., Pierri, F.: Distributed cooperative object parameter estimation and manipulation without explicit communication. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2110–2116. IEEE (May 2017)
16. Mavrakis, N., Stolkin, R.: Estimation and exploitation of objects' inertial parameters in robotic grasping and manipulation: a survey. Robot. Auton. Syst. 124, 103374 (2020)
17. Montemayor, G., Wen, J.T.: Decentralized collaborative load transport by multiple robots. In: Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 372–377. IEEE (Apr 2005)
18. Ott, C.: Cartesian Impedance Control of Redundant and Flexible-Joint Robots. Springer (2008)
19. Petitti, A., Franchi, A., Di Paola, D., Rizzo, A.: Decentralized motion control for cooperative manipulation with a team of networked mobile manipulators. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 441–446. IEEE (May 2016)
20. Sadeghian, H., Villani, L., Keshmiri, M., Siciliano, B.: Task-space control of robot manipulators with null-space compliance. IEEE Trans. Rob. 30(2), 493–506 (2013)
21. Sariyildiz, E., Sekiguchi, H., Nozaki, T., Ugurlu, B., Ohnishi, K.: A stability analysis for the acceleration-based robust position control of robot manipulators via disturbance observer. IEEE/ASME Trans. Mechatron. 23(5), 2369–2378 (2018)
22. Wang, Z., Schwager, M.: Kinematic multi-robot manipulation with no communication using force feedback. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 427–432. IEEE (May 2016)
A Distributed Architecture for Onboard Tightly-Coupled Estimation and Predictive Control of Micro Aerial Vehicle Formations

I. Kagan Erunsal¹,², Rodrigo Ventura², and Alcherio Martinoli¹

¹ Distributed Intelligent Systems and Algorithms Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
{kagan.erunsal,alcherio.martinoli}@epfl.ch
² Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
[email protected]
Abstract. This paper presents a distributed estimation and control architecture for leader-follower formations of multi-rotor micro aerial vehicles. The architecture involves multi-rate extended Kalman filtering and nonlinear model predictive control in order to optimize the system performance while satisfying various physical constraints of the vehicles, such as actuation limits, safety thresholds, and perceptual restrictions. The architecture leverages exclusively onboard sensing, computation, and communication resources, and it has been designed for enhanced robustness to perturbations thanks to its tightly-coupled components. The architecture has initially been tested and calibrated in a high-fidelity robotic simulator and then validated with a real two-vehicle system engaged in formation navigation and reconfiguration tasks. The results not only show the high formation performance of the architecture while satisfying numerous constraints, but also indicate that it is possible to achieve full navigation and coordination autonomy in the presence of severe resource constraints such as those characterizing micro aerial vehicles.

Keywords: Formation control · micro aerial vehicles · distributed nonlinear model predictive control · relative and onboard localization
1 Introduction
Recent years have been characterized by various improvements in the coordination and cooperation strategies of autonomous robots. This is because loosely connected systems controlled by distributed or decentralized strategies have inherent advantages over centrally governed systems, such as additional flexibility, robustness, and scalability [1]. Among these strategies, formation control, usually aimed at achieving prescribed geometric constraints between vehicles, is one of the most actively investigated topics [2]. A promising method to optimally achieve this goal while satisfying diverse constraints is Nonlinear Model Predictive Control (NMPC), due to its architectural heterogeneity, simple reconfigurability, and hierarchical flexibility [3]. Furthermore, to improve performance, the distributed version of this algorithm is a prominent candidate, because it approximates the global objective more accurately than its decentralized counterpart by including inter-robot communication. However, NMPC-based algorithms require an accurate model of the system, correct state estimates, and sufficient computational resources. Furthermore, the purely distributed version of the algorithm requires a robust communication framework.

These requirements might be challenging to satisfy, primarily on resource-constrained Micro Aerial Vehicles (MAVs). For example, system identification techniques might not yield a faithful model, the system might be time-varying due to the intrinsic nature of multiple vehicles, and only partial state information and limited computational power might be available because of hardware and software constraints. As a result, these problems should be addressed in a tightly-coupled estimation and control architecture that considers the various constraints, in order to be deployed successfully on real robots. Another critical challenge for deploying such systems is the dependency of robotic platforms on global localization systems, such as Motion Capture Systems (MCSs) or Global Navigation Satellite Systems (GNSSs), for indoor and outdoor scenarios, respectively.

There are various successful real-world implementations of model predictive approaches for formations of multi-robot systems. Among those, we focus on decentralized and distributed approaches with real-world implementations. In [4], the authors propose a trajectory planner for a swarm of flying robots that considers various task requirements, such as swarm coordination, flight efficiency, and obstacle avoidance, and demonstrate the method in cluttered environments. Although the strategy seems highly effective and scalable for flocking navigation, visual-inertial odometry (VIO) coupled with Ultra-Wide Band (UWB) distance measurement (without orientation information) might not be accurate enough for formation control applications. In [5], the authors present a decentralized strategy for compact swarms of unmanned aerial vehicles without communication and external localization. The approach is well-constructed for outdoor flocking applications; however, the accuracy of the ultra-violet (UV) direction and ranging technique is limited for indoor formation applications. Furthermore, since predictive information is not shared among vehicles, it might yield low performance in dynamic scenarios. A distributed methodology is studied for the formation control and obstacle avoidance of homogeneous robots employing multi-robot consensus in [6]. The method is scalable and efficient with the number of robots; however, the implementation is not carried out onboard, and a motion capture system is used to obtain state feedback. In [7], a distributed predictive approach is presented for drone swarms in cluttered environments and with high sensor noise. Although the real-world experiments with sixteen quadrotors show the scalability of the strategy, the computations are not carried out onboard, and the localization solution is global. In [8], the authors introduce a
decentralized robotic front-end for autonomous aerial motion capture in an outdoor environment and integrate perception constraints into the optimization problem. Although computations are performed onboard, GNSS information is collected and shared among vehicles to generate relative position information, which is not always reliable. Other works with real-world implementations include dynamic encirclement [9], predictive outdoor flocking [10], formation in motion [11], formation with communication constraints [12], and indoor formation with visual-inertial odometry [14]. However, the papers listed previously rely on global localization systems and/or offboard computation and/or shared coordinate frames and/or a reliable communication network.

This paper presents a distributed estimation and control architecture based on a multi-rate Extended Kalman Filter (EKF) and NMPC to perform leader-follower formations of multi-rotor MAVs under state, actuation, and perception constraints and in the presence of environmental disturbances and model uncertainties. Our focus here is mainly on the follower subsystem, since it constitutes the central part of leader-follower formations. We validate the performance and applicability of this architecture, leveraging exclusively onboard resources and relative sensing, in a high-fidelity robotic simulator and in a real-world setup with diverse experiments. In particular, followers obtain relative pose and body-fixed velocity information by employing their coupled optical flow and distance sensor, and an RGB camera tracking April bundles [15].

The main contribution of this paper is to meet the following requirements simultaneously. First, the formation subsystems exclusively employ onboard resources for localization, computation, communication, and sensing. This brings the multi-robot system closer to practical applications. Second, the components of the overall architecture are designed to be scalable, performant, and robust at the network level. The main ingredient that allows such a characteristic is the tight coupling of perception, communication, estimation, and control at the individual robot level. This unified architecture leads each component not only to exploit its full potential, but also to assist the other components in maintaining robust operation. In particular, the distributed NMPC exhibits high performance with an accurate model, state estimates, information from neighboring vehicles, and enough computational resources. For this purpose, the computationally efficient distributed EKF in each vehicle employs both the information received over communication and perception assets to provide uninterrupted state and uncertainty estimates, even if the communication channel has flaws. On the other hand, the individual EKFs require sufficient perception and communication update rates to perform successfully. This is where the distributed NMPC helps the perception and communication components satisfy their various constraints, such as the field of view (FOV) and range. Furthermore, the simple but efficient prediction model selected for the NMPC not only minimizes communication requirements, but also maximizes computational efficiency. Finally, the distributed architecture can be scaled linearly by increasing the computational power, up to a maximum number of neighbors.
2 Problem Formulation
Similar to [16], the following notation for vectors and rotation matrices will be adopted, assuming that {a}, {b}, {c} are coordinate frames and $x$ is the name of the vector: $x^c_{a/b}$ denotes the vector quantity $x \in \mathbb{R}^n$ of the origin of {a} with respect to the origin of {b}, expressed in {c}.
Fig. 1. An example of a formation network consisting of one leader ($L_0$) and five followers ($F_i$, $i = 1, ..., 5$); two sub-networks (for $F_1$ and $F_4$), the corresponding relative positions, and the sensing and communication ranges are also indicated.
Fig. 2. MPC coordinate frames in a prediction horizon of length N: {m} (blue), {b} (yellow), {d} (red), {c} (green). (Color figure online)
$R^b_a$ denotes the rotation matrix $R \in SO(3)$ that transforms a vector expression from {a} to {b}, with the Z-Y-X convention. $\|\cdot\|$ denotes the Euclidean norm of a vector or the spectral norm of a real matrix, and $\|x\|_P := \sqrt{x^T P x}$ (with $P \in \mathbb{R}^{n \times n}$ and $P \succ 0$) stands for the weighted Euclidean norm of $x \in \mathbb{R}^n$.

Consider a formation network $\mathcal{N}$ that consists of $K+1$ aerial robots. Assume that one of them is assigned as a leader $L$ and the rest are followers $F_i$, so that $\mathcal{N} = \{L, F_1, ..., F_K\}$. All robots operate in a workspace $\mathcal{W} \subset \mathbb{R}^3$. The state of each robot $i \in \mathcal{N}$ is defined as follows:

$$\chi_i = [x^{n\,T}_{b/n} \;\; v^{n\,T}_{b/n} \;\; t^{n\,T}_{b/n} \;\; w^{b\,T}_{b/n}]^T \tag{1}$$

where $x^n_{b/n}$ and $v^n_{b/n}$ represent the position and linear velocity, $t^n_{b/n}$ is the vector of Euler angles, and $w^b_{b/n}$ denotes the angular velocities. Here, {n} is the Earth-fixed inertial frame and {b} is the body-fixed frame. Then, for instance, the vector $x^n_{b/n}$ would represent the position of the origin of {b} with respect to {n}, expressed in the latter coordinate system.
We adopt the full dynamics of the multi-rotor MAV, including dominant (propeller) and auxiliary (drag) aerodynamic effects, based on [17,18]. Note that although this model can be used as a plant model for any simulation, it is too complex for optimal control purposes. We will explain the simplified model used for MPC in Sect. 3. Finally, due to the physical limits of the propellers and electric motors of the aerial vehicle, the inputs should be within a set of constraints, that is, $u^b_{b/n} \in U_c$. Additionally, due to vehicle safety limits, we have $\chi_i \in X_c$.

Consider that a robot $i \in \mathcal{N}$ has a sensing area $A_s \subset B_s(x_i, r_s)$ and a communication area $A_c \subset B_c(x_i, r_c)$, where $x_i$ is the center and $r_s$ and $r_c$ are the radii of the norm balls $B_s$ and $B_c$, respectively. Assume $\Delta x_{ij} = x_j - x_i$ is the relative position of the robot $j \in \mathcal{N}$ with respect to the robot $i \in \mathcal{N} \setminus \{L\}$. Then the sub-network for the follower $i$ is defined as follows: $\mathcal{N}_i := \{i, j \in \mathcal{N}, j \neq i : \|\Delta x_{ij}\| < \min(r_s, r_c)\}$; a small sketch of this membership test follows below. Figure 1 represents this network structure.

In this work, all robots, including the leader, have an output type of feedback (i.e., no state-based feedback across vehicles is possible). The only vehicle that has access to its absolute position, via a global localization system, is the leader. As a result, the leader is responsible for the absolute navigation of the robot ensemble. All vehicles are equipped with a coupled optical flow and distance sensor to obtain linear velocities and ground distance, and with an Inertial Measurement Unit (IMU) and magnetometer to acquire linear accelerations, rotational velocities, and Euler angles. All follower vehicles are endowed with an onboard, limited-range and limited-FOV, 3D relative localization system for measuring inter-vehicle poses of neighboring vehicles (i.e., relative range, bearing and orientation). We selected an RGB camera and April bundles to obtain such 3D relative pose information of the neighbor vehicles, due to the accuracy and implementation simplicity of this solution. We do not assume any reliable communication network among vehicles, but we benefit from it when it is available. Furthermore, all vehicles are under the disturbing effect of model uncertainty and slowly varying airflow disturbance due to the wake effect of neighboring vehicles. Finally, all sensors are characterized by zero-mean Gaussian noise.

Problem 1. Consider one leader and $K$ follower robots under the input, perception, communication, and state constraints $u^b_{b/n} \in U_c$, $z^b_{b/n} \in Z_c$ and $\chi \in X_c$. Here, the leader can be considered as a separate and independent trajectory-tracking robot. Under the assumptions explained in the previous paragraph, design a control and estimation strategy for the followers so that the robotic network will follow a predefined reference trajectory $r_p \in \mathbb{R}^3$ while maintaining the formation and connectivity defined by, possibly time-varying, relative position references $\Delta x^*_{ij}$ and relative yaw angle references $\Delta\psi^*_i$, where $i \in \mathcal{N}_i \setminus \{L\}$ and $j \in \mathcal{N}_i$.
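The sub-network membership test referenced above is a direct transcription of the set definition; in the sketch below, the assumption that all positions are available in one place is for illustration only (in the actual system each follower senses its neighbors onboard).

```python
import numpy as np

def subnetwork(i, positions, r_s, r_c):
    """Follower i's sub-network N_i: all robots j whose relative position
    norm is within both the sensing radius r_s and communication radius r_c."""
    r = min(r_s, r_c)
    xi = positions[i]
    return [j for j, xj in enumerate(positions)
            if j != i and np.linalg.norm(xj - xi) < r]

# e.g. positions as a list of 3D numpy arrays, with r_s = 6.0 m
# (the camera detection range reported in Sect. 4) and r_c the radio range.
```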
3 Methodology
The components of our distributed architecture are selected and designed taking resource constraints into account. In particular, we have paid attention to distributing the architecture over the network while preserving its strong integration at the robot level. The former feature allows us to obtain scalable behavior, while the latter brings high performance and robustness due to the tight coupling of the perception, communication, estimation, and control components. The proposed architecture is represented in Fig. 3. Here, the neighborhood consists exclusively of followers, but the leader could also be part of the sub-network. One can observe the tight coupling of components at the vehicle level and their distribution over the network. The details of the estimation and formation control components and their relationship with perception and communication will be explained in the next section.

To comprehend the details of the estimation and control architecture, we should define the coordinate frames employed here. The first is the body-fixed reference frame {b}, defined for the real-time horizon; the second is the body-fixed MPC reference frame {d}, anchored to the body throughout any MPC prediction horizon. As a result, {b} and {d} coincide at the beginning of each prediction horizon. Furthermore, {m} is defined as the inertial MPC-EKF frame, which corresponds to the zero vector at the beginning of the motion. The final reference frame is the MPC control frame {c}, a roll- and pitch-free frame anchored to {d} and adopted to define the control and reference variables. These frames are represented in Fig. 2.
Fig. 3. Distributed architecture for a sub-network; the tight coupling of components at the agent level is observable.
3.1 Follower Estimation Scheme
We adopt a distributed multi-rate EKF approach to carry out the estimation function, due to its computational efficiency. The related literature is vast, and we refer the reader to [19,20] and the references therein. For the sake of clarity, we will present the process and perception models and highlight the important points. Consider a new extended state definition for each robot $i \in \mathcal{N} \setminus \{L\}$:

$$\xi_{ii} = [v_i^T \;\; \dot{\psi}_i \;\; d_{i,a}^T]^T, \qquad \xi_{ij} = [\Delta x_{ij}^T \;\; v_j^T \;\; \Delta\psi_{ij} \;\; \dot{\psi}_j]^T \tag{2}$$
where $\Delta x_{ij}$, $\Delta v_{ij}$ and $\Delta\psi_{ij}$ are the relative positions, velocities, and yaw angles of the robot $i$ with respect to $j \in \mathcal{N}_i$; $v_i$, $v_j$, $\dot{\psi}_i$ and $\dot{\psi}_j$ are the linear and yaw velocities of the robots $i$ and $j \in \mathcal{N}_i$, respectively. Finally, $d_{i,a}$ stands for the slowly varying disturbances in the acceleration dynamics of the agent $i$. Note that all states are expressed in the inertial frame {m}. Let us concatenate the defined states so that we obtain an extended state for an individual robot $i \in \mathcal{N}_i \setminus \{L\}$:

$$\xi_i = [\xi_{ii}^T \;\; \xi_{ij_1}^T \;\; \xi_{ij_2}^T \;\; ... \;\; \xi_{ij_N}^T]^T \tag{3}$$

where $j_1, j_2, ..., j_N \in \mathcal{N}_i$. Consider a generic nonlinear time-discrete model of an individual agent $i$:

$$\xi_i[k+1] = f_i(\xi_i[k], u_i[k], w_i[k]), \qquad z_i[k] = h_i(\xi_i[k], \nu_i[k]) \tag{4}$$
where $f_i$ is the process map, $h_i$ is the perception map, and $w_i \sim N(0, \sigma_w)$ and $\nu_i \sim N(0, \sigma_\nu)$ are the zero-mean Gaussian noise vectors in the process and perception models, respectively. Then, inspired by [21], for all $j \in \mathcal{N}$, the process model can be written explicitly as follows:

$$v_i[k+1] = v_i[k] + \Delta t \left( R_i[k]\,\frac{F_i[k]}{m} - g + d_{i,a}[k] + w_{i,a}[k] \right) \tag{5}$$
$$\psi_i[k+1] = \psi_i[k] + \Delta t \,(\dot{\psi}_i[k] + w_{i,\psi}[k]) \tag{6}$$
$$\dot{\psi}_i[k+1] = \dot{\psi}_i[k] + w_{i,\dot{\psi}}[k] \tag{7}$$
$$\Delta x_{ij}[k+1] = \Delta x_{ij}[k] + \Delta t\,(v_j[k] - v_i[k]) - \frac{\Delta t^2}{2}\left( R_i[k]\,\frac{F_i[k]}{m} - g - d_{i,a}[k] - w_{ij,\Delta a}[k] \right) \tag{8-9}$$
$$v_j[k+1] = v_j[k] + \Delta t \,(w_{j,a}[k]) \tag{10}$$
$$\Delta\psi_{ij}[k+1] = \Delta\psi_{ij}[k] + \Delta t\,(\Delta\dot{\psi}_{ij}[k] + w_{ij,\Delta\psi}[k]) \tag{11}$$
$$\Delta\dot{\psi}_{ij}[k+1] = \Delta\dot{\psi}_{ij}[k] + w_{ij,\Delta\dot{\psi}}[k] \tag{12}$$
$$d_{i,a}[k+1] = d_{i,a}[k] + w_{i,d}[k] \tag{13}$$
where $F_i$ is the input, $R_i$ stands for the rotation matrix, $g$ is the gravitational acceleration vector, $\Delta t$ is the sample time, the $w$'s represent the Gaussian noise terms, and the $d$'s are the slowly varying disturbances (or uncertainties) on the corresponding states. Note that these lumped disturbance estimates, accounting for model uncertainty, environmental disturbance, and the time-varying characteristics of the vehicle, are crucial for the robust behavior of the predictive formation controller. In addition, some definitions in the model can be written as follows: $w_{ij,\Delta a} := w_{j,a} - w_{i,a}$ and $w_{ij,\Delta\psi} := w_{j,\psi} - w_{i,\psi}$. Note that all equations are written with respect to the {m} frame and hold $\forall j \in \mathcal{N}_i$. Furthermore, $\forall j \in \mathcal{N}_i$, the perception model is given as follows:

$$z_{i,\Delta x}[k] = g(R_i[k]\,\Delta x_{ij}[k] + \nu_{i,\Delta x}[k]) \tag{14}$$
$$z_{i,v}[k] = R_i[k]\,v_i[k] + \nu_{i,v}[k] \tag{15}$$
$$z_{i,\Delta\psi}[k] = \Delta\psi_{ij}[k] + \nu_{i,\Delta\psi}[k] \tag{16}$$
$$z_{i,\psi}[k] = \psi_i[k] + \nu_{i,\psi}[k] \tag{17}$$
$$z_{i,\dot{\psi}}[k] = \dot{\psi}_i[k] + \nu_{i,\dot{\psi}}[k] \tag{18}$$
$$z_{i,v_j}[k] = R_j[k]\,v_j[k] + \nu_{i,v_j}[k] \tag{19}$$
where $z$ represents a measurement, $\nu$ stands for perception noise, and the function $g(\cdot)$ maps the relative position to the range and bearing information. Furthermore, it is assumed that the robot has an accurate attitude estimator that generates the roll and pitch estimates $\phi_i[k]$ and $\theta_i[k]$, so that the rotation matrix $R_i$ is fully defined.

Remark 1 (Neighboring vehicle states): Eq. (19) is critical in the perception model. $v_j$ represents the velocity of any neighboring vehicle $j \in \mathcal{N}_i$ received by communication, and must be assigned to the reference frame of vehicle $i$ by $R_j$. However, if it is not available due to various communication flaws, the EKF provides a continuous estimation and tracking of the velocity of neighboring vehicles. A minimal sketch of the resulting prediction step appears below.
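The sketch below propagates the follower's extended state for a single neighbor through the noise-free process model of Eqs. (5)-(13); it is an illustration only: the actual multi-rate EKF also propagates the covariance via the Jacobians of these equations and stacks one block per neighbor, and the variable names are our own.

```python
import numpy as np

def ekf_predict(state, F_i, R_i, m, dt, g=np.array([0.0, 0.0, 9.81])):
    """One noise-free prediction step per Eqs. (5)-(13), single neighbor j.
    `state` is a dict with keys v_i, dx_ij, v_j, dpsi_ij, dpsi_dot_ij, d_ia;
    g is the gravitational acceleration vector (frame convention per paper)."""
    acc5 = R_i @ F_i / m - g + state['d_ia']    # bracket term of Eq. (5)
    acc89 = R_i @ F_i / m - g - state['d_ia']   # bracket term of Eqs. (8)-(9)
    return {
        'v_i': state['v_i'] + dt * acc5,                          # Eq. (5)
        'dx_ij': state['dx_ij'] + dt * (state['v_j'] - state['v_i'])
                 - 0.5 * dt**2 * acc89,                           # Eqs. (8)-(9)
        'v_j': state['v_j'],               # Eq. (10): zero-mean drift only
        'dpsi_ij': state['dpsi_ij'] + dt * state['dpsi_dot_ij'],  # Eq. (11)
        'dpsi_dot_ij': state['dpsi_dot_ij'],                      # Eq. (12)
        'd_ia': state['d_ia'],             # Eq. (13): random-walk disturbance
    }
```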
3.2 Follower Controller Scheme
Reliable and high-performance attitude controllers are commonly available for today's aerial vehicles. In order to separate high-frequency tasks (attitude control) from low-frequency ones (formation control), to exploit the full potential of the autopilot and onboard computer, and to simplify the prediction model, we leverage a cascaded architecture, as shown in Fig. 3. The attitude controller is the PID-based controller proposed by [22], and the formation controller is a Distributed Nonlinear Model Predictive Controller (D-NMPC). The reason behind the latter choice is that NMPC can simultaneously handle constraints and optimize performance in a systematic manner, and the distributed version inherently has higher performance since it approximates the centralized problem more accurately than its decentralized counterpart.

Here, we use the following notation to distinguish predicted values from actual ones: $v[m|n]$ is the value of the variable $v$ at the discrete instant $k = m$, predicted at $k = n$, where $m \geq n$. Furthermore, $N$ is defined as the prediction (and control) horizon and $\Delta t$ is the sampling time. Now, let us write the Open-Loop Control Problem (OCP) $\mathcal{P}(\hat{\xi}[0], k)$ (at time $k$, with initial estimated state $\hat{\xi}[0]$), inspired by [23]:
$$\min_{\substack{\xi_i[k|k], ..., \xi_i[k+N|k] \\ u_i[k|k], ..., u_i[k+N-1|k]}} \;\; \sum_{n=0}^{N-1} J_k(\xi_i[k+n|k], u_i[k+n|k], r_i[k+n|k]) + J_N(\xi_i[k+N|k], u_i[k+N|k], r_i[k+N|k]) \tag{20}$$

subject to the following constraints for $n = 1, 2, ..., N$:

$$\xi_i[k+n+1|k] = \hat{f}(\xi_i[k+n|k], u_i[k+n|k]) \tag{21}$$
$$g_n(\xi_i[k+n|k], u_i[k+n|k]) \leq 0 \tag{22}$$
$$h_n(\xi_i[k+n|k], u_i[k+n|k]) = 0 \tag{23}$$
$$\xi_i[k|k] = \hat{\xi}_i[0] \tag{24}$$
Here, $J_k$ is the stage cost, $J_N$ is the terminal cost, and $\xi_i$, $u_i$ and $r_i$ represent the state, input and reference of the vehicle $i$, respectively. Additionally, $\hat{f}$ is the prediction function, and $g_n$ and $h_n$ represent the inequality and equality constraints. Before explaining the individual terms of the OCP, it is worth visiting the pinhole camera model [24], so that we can extract the coordinates of the detected vehicle in the image plane and incorporate this information into the OCP to obtain perception-aware behavior. Consider that $\Delta x_{ij}$ is a representation of the relative 3D coordinates of the neighbor vehicle $j$, and $p_j$ are the image coordinates of the same vehicle, assuming that the origin is the image center. Then we can obtain the following relationship:

$$p_j = K C \Delta x_{ij} \tag{25}$$
where $C$ is the camera matrix, which is a function of the focal length $f$ of the camera, and $K$ is the gain. Now, the stage and terminal cost functions can be expressed as follows:

$$J_k(\xi_i[k+n|k], u_i[k+n|k], r_i[k+n|k]) = \sum_{j \in \mathcal{N}_i} \|\Delta x^*_{ij}[k+n|k] - \Delta x_{ij}[k+n|k]\|_{Q_{\Delta x}} + \sum_{j \in \mathcal{N}_i} \|v_j[k+n|k] - v_i[k+n|k]\|_{Q_v} + \|u_i[k+n|k]\|_{Q_u} + \|\Delta u_i[k+n|k]\|_{Q_{\Delta u}} + \sum_{j \in \mathcal{N}_i} \|p_j\|_{Q_p} \tag{26}$$

$$J_N(\xi_i[k+N|k], u_i[k+N|k], r_i[k+N|k]) = \sum_{j \in \mathcal{N}_i} \|\Delta x^*_{ij}[k+N|k] - \Delta x_{ij}[k+N|k]\|_{Q_{\Delta x N}} + \sum_{j \in \mathcal{N}_i} \|v_j[k+N|k] - v_i[k+N|k]\|_{Q_{vN}} + \sum_{j \in \mathcal{N}_i} \|p_j\|_{Q_{pN}} \tag{27}$$
where $\Delta x^*_{ij}$ represents the geometric formation reference and $\Delta u_i$ stands for the rate of change of the inputs. As a result, in the stage cost function, the first term minimizes the formation errors, the second term stabilizes the relative velocities, the third and fourth terms optimize the control effort, and the last one is responsible for satisfying the camera FOV constraints. The continuous version of the prediction function $\hat{f}$ in Eq. (21), which includes not only ego but also neighboring vehicle motion characteristics, and which can be discretized with any efficient method such as an implicit Runge-Kutta (IRK) scheme, can be written as follows:

$$\Delta\dot{x}^m_{d/m,ij} = v^m_{d/m,j} - v^m_{d/m,i}$$
$$\dot{v}^m_{d/m,i} = \frac{R^m_d}{m_b} F^d_{d/m,i} + g + C_d\, v^m_{d/m,i} + d_{i,a}$$
$$\dot{t}^m_{d/m,i} = \frac{1}{\tau_i}(k_i\, t_{ref,i} - t^m_{d/m,i})$$
$$\dot{\psi}^m_{d/m,j} = 0, \quad \forall j \in \mathcal{N}_i \tag{28}$$
where the inputs are the thrust $F^d_{d/m,i}$ and the attitude angle references $t_{ref,i}$. Finally, $g_n$ and $h_n$ include constraints to impose the safety conditions/limits, such as the maximum velocity, thrust, and Euler angles that the robotic agent is allowed to achieve. The following remarks will be helpful to gain more insight into the OCP:

Remark 2 (Information sharing): At each sample instant, $v^m_{d/m,j}$, the predicted velocity of neighboring vehicles, should be received, and $v^m_{d/m,i}$, the vehicle's own predicted velocity, should be transmitted over the communication channel for the whole horizon. If the received information is not available for some reason, the D-NMPC leverages the estimates sent by the local EKF for the prediction. In this case, the formation controller assumes a constant velocity over the prediction horizon.
Remark 3 (Prediction model): The prediction model includes the closed-loop attitude subsystem as three first-order linear differential equations. This simplifies the prediction model and allows a seamless separation of tasks. Their parameters can be easily identified by conducting attitude-tracking tests.

Remark 4 (Disturbance estimates): The disturbances acting on the prediction model, $d_{i,a}$, are received from the estimator and assumed constant over the horizon. This is a valid assumption if the horizon length is not very long and/or the main disturbance source is model uncertainty.

Remark 5 (Perception requirements): The horizontal and vertical field-of-view requirements are not added to $h_n$, to avoid increasing the number of constraints in the OCP. Although it does not provide guarantees, the inclusion of the last terms in the cost function ensures reliable perception operation.
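As an illustration of how the stage cost of Eq. (26) can be assembled for one horizon step, consider the sketch below; the dictionary of weight matrices and the argument layout are our own choices, and $p_j$ is assumed to be precomputed from Eq. (25).

```python
import numpy as np

def stage_cost(dx_refs, dxs, v_i, v_js, u, du, p_js, Q):
    """Stage cost J_k of Eq. (26) for one horizon step (sketch).
    Q is a dict of weight matrices; ||e||_Q = sqrt(e^T Q e) as in Sect. 2."""
    wnorm = lambda e, P: float(np.sqrt(e @ P @ e))
    J = wnorm(u, Q['u']) + wnorm(du, Q['du'])      # control effort terms
    for dx_ref, dx, v_j, p_j in zip(dx_refs, dxs, v_js, p_js):
        J += wnorm(dx_ref - dx, Q['dx'])           # formation error, neighbor j
        J += wnorm(v_j - v_i, Q['v'])              # relative-velocity term
        J += wnorm(p_j, Q['p'])                    # keep neighbor near image center
    return J
```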
Real-Time Iteration Scheme. We adopt a modified Real-Time Iteration (RTI) strategy combined with estimation. This is one of the state-of-the-art solution methods applied to real-time systems, as explained by [25]. In this method, discretization and then linearization are applied to the OCP around operating points, and Quadratic Programming (QP) solvers are used successively to solve the generated problem. The optimality of the solution is determined by the Karush-Kuhn-Tucker (KKT) value. The computational constraints can be satisfied by monitoring the online solver time. Please refer to [13] for details of the RTI scheme repeated for each control iteration. Note that to realize the algorithm, the nonlinear program solver ACADO [26] is integrated with the necessary modifications; ACADO employs qpOASES as its convex solver.
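To convey the timing structure of RTI (shift the previous solution, one linearization pass, one cheap quadratic solve, apply the first input), the following deliberately simplified, runnable one-dimensional caricature may help; it replaces the full-horizon QP of the actual ACADO/qpOASES implementation with a per-step closed-form minimization, and every model and weight in it is invented for illustration.

```python
import numpy as np

# Toy nonlinear system x+ = x + dt*(u - x**3); at each control step we
# linearize once around the warm-started trajectory instead of iterating
# an SQP to convergence, which is the essence of the RTI idea.
dt, N, r = 0.05, 20, 0.01
x_traj = np.zeros(N + 1)   # previous state trajectory (warm start)
u_traj = np.zeros(N)       # previous input trajectory

def rti_step(x0, x_ref):
    global x_traj, u_traj
    x_traj = np.roll(x_traj, -1)     # shift last solution by one step
    u_traj = np.roll(u_traj, -1)
    x_traj[0] = x0                   # embed the new state estimate
    for n in range(N):               # single linearization/update sweep
        A = 1.0 - 3.0 * dt * x_traj[n]**2   # df/dx at linearization point
        B = dt                               # df/du
        # closed-form minimizer of (A*x + B*u - x_ref)^2 + r*u^2
        u_traj[n] = B * (x_ref - A * x_traj[n]) / (B * B + r)
        x_traj[n + 1] = x_traj[n] + dt * (u_traj[n] - x_traj[n]**3)
    return u_traj[0]                 # apply only the first input

u0 = rti_step(x0=1.0, x_ref=0.5)
```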
4 Experiments and Results
The proposed architecture has first been tested in simulation, leveraging the high-fidelity robotics simulator Webots [27] interfaced with the Robot Operating System (ROS). A screenshot of a formation experiment with three MAVs is shown in Fig. 5. The simulator employs a realistic quadrotor model obtained and identified in our previous work [13], where we tuned the parameters of both the estimator and the controller to faithfully match reality. Since the interfaces of the Webots simulator with ROS are exactly the same as those of the physical quadrotors, the transition to physical reality was quite smooth.
Fig. 4. The Helipal Quadrotor and its components.
Fig. 5. Webots simulation consisting of three drones equipped with April bundles and an RGB camera.
The quadrotor used in our experiments is a modified Helipal Storm Drone-4 v3, endowed with a Hex Cube Orange autopilot, a Jetson Xavier NX onboard computer, an IMU, a coupled optical flow and distance sensor, a wide-FOV global shutter RGB camera, and a bundle consisting of April tags, as shown in Fig. 4. The instrumented quadrotor weighs 1.71 kg and has a center-to-propeller distance of 21 cm. All computations are performed onboard by leveraging ROS. The Jetson Xavier NX computer has six parallel NVIDIA Carmel ARM v8.2 (max. 1.9 GHz) cores and one dedicated GPU (384 NVIDIA CUDA cores and 48 Tensor cores). For the experiments, all CPUs are activated at maximum power. Only the leader has access to 3D position information, generated by an MCS with millimeter accuracy. The communication among the robots is realized using a Wide Area Network (WAN) and the multimaster_fkie ROS package. Trajectories are generated inside an indoor volume of 27 m³. This arena, which includes two quadrotors, can be seen in Fig. 6.

The measurement update rates of the 3D relative localization solution, optical flow sensor, distance sensor, IMU, and magnetometer are 10 Hz, 20 Hz, 60 Hz, 100 Hz, and 40 Hz, respectively. The control and estimation frequency is 40 Hz. The NMPC of the follower solves the OCP with 40 horizon steps, which corresponds to a 1 s prediction window. The optical flow and distance sensors are selected and calibrated for indoor use. As can be seen in Fig. 6, various white stripes are placed on the floor to improve the quality of the optical flow. The appropriate selection and calibration of the camera and tag for the mission, and the measures taken against vibration and blur, are the critical parts of the integration. With this solution, we have a detection range of 6 m and a field of view of 120°. The magnetometer is not fully reliable indoors; however, on-site and on-the-fly calibration significantly improves performance. Lastly, the computational resources of both the onboard computer and the autopilot are allocated to ensure efficient perception, communication, estimation, and control.
Fig. 6. Experiment arena with two drones.
Fig. 7. Leader’s trajectory for the navigation experiment.
We will present the results of two different types of experiments, formation navigation and reconfiguration, which are two common tasks for multi-robot systems performing formations. Note that the video of the experiments can be viewed on the following research page: https://www.epfl.ch/labs/disal/research/quadrotorformationmpc/.
Fig. 8. Formation errors for the navigation experiment: x, y, z and norm error, dashed red lines indicate the averages over time.
Fig. 9. Solver metrics: cost function value, KKT value, solver time and the number of iterations, dashed red lines indicate the averages over time.
4.1 Formation Navigation
For this section, after the follower engaged in the formation, the leader follows a predefined 3D trajectory, and the follower tries to maintain the formation geometry. The formation reference is given as [2, 2, 0] m, and the leader's path can be seen in Fig. 7. The total duration of the experiment is nearly 90 s, and the same scenario is executed five times. Figure 8 presents the means and standard deviations of the formation errors for each axis and their norm over the five experiments. For this calculation, the MCS is used as ground truth. As can be seen in the figure, the average norm error is less than 0.16 m (≈5% of the total range), while the maximum norm error is less than 0.27 m (≈9% of the total range). While the mean errors on the x and y axes are different from zero, the one for the z axis is very close to zero. This is possibly due to the fact that the modeling uncertainty for the altitude dynamics is less time-varying and is compensated well with higher gains.

Figure 9 shows the means and standard deviations of the solver metrics over the five experiments: the cost function value, the KKT value, the solver time, and the number of iterations. Despite some occasional peaks, the cost value stays below an upper limit, and the KKT value is lower than 0.005, which is a solid indication that the Nonlinear Program (NP) is converging under its Quadratic Program (QP) approximations. Furthermore, the solver time remains below 0.025 s, with at most three SQP iterations, which in turn defines the sample time of the controller. Finally, Fig. 10 shows the estimation errors, again calculated with respect to the ground truth, and the disturbance estimates. As can be observed from the figure, the average error is around 0.1 m and follows a trend similar to that of the formation errors. The absence of sudden peaks indicates that the FOV is maintained, so that the 3D relative pose measurement is received steadily. Furthermore, the velocity estimation error in the body frame is less than 0.1 m/s on average, and the disturbance estimates are mostly steady, indicating that the main disturbance source in this experiment is time-invariant model uncertainty.
Fig. 10. Estimation errors (relative position, velocity) and disturbance estimations in 3D, dashed red lines indicate the averages over time.
Fig. 11. Follower's reference during the reconfiguration experiment. 15-30-30 corresponds to the angles of the relative position vectors.
4.2 Formation Reconfiguration
For these experiments, the position of the leader is kept fixed, and the formation reference of the follower is changed over a prescribed duration to reconfigure the formation geometry. The three distinct relative position references for the follower are displayed in Fig. 11. The total duration of the experiment is around 23 s, and the scenario is realized five times. The means and standard deviations of the formation errors are given in Fig. 12. We can see that the oscillatory behavior of the errors is due to the fact that the reference commands are sent separately over the course of the experiment. As a result, the mean error is around 0.21 m (≈7% of the total range). However, the errors reach steady state after some time and, on average, stabilize around 0.17 m. Solver metrics are shown in Fig. 13. We can make observations similar to those for the navigation experiment, except for the cost value. It can be seen that the cost value is overall higher, but reaches the same low levels at the end of the experiment. This is because the experiment is more dynamic than the formation navigation experiment. Furthermore, the observed plateaus are possibly due to the transient compromise between the individual terms in the cost function.
Fig. 12. The formation errors for the reconfiguration experiment: x, y, z and norm errors, dashed red lines indicate the averages over time
Fig. 13. Solver metrics: cost function value, KKT value, solver time and the number of iterations; dashed red lines indicate the averages over time.
4.3 Discussion
In order for the reader to compare the quality of the results, we present a brief benchmark on relative position estimation and formation errors. Recently, [28] introduced a decentralized VIO-UWB perception and estimation scheme for aerial swarms. For an indoor trajectory of 27.8 m, they obtained about 0.08 m average relative position estimation error, which is comparable to our visionbased solution. Another state-of-the-art localization method presented by [5] targets outdoor applications with their ultra-violent (UV) markers and camera, and they achieved 1.16 m relative localization RMSE. Finally, [21] employed a distributed infrared-based relative localization system for the formations of quadrotors. The system achieved an error of less than 0.20 m in range, 5◦ in bearing and 10◦ in elevation for the sensor-to-target distance of 3 m. Considering formation control errors, [21] adopted a distributed graph-based formation control method that obtained a 0.27 m range and 5◦ elevation error for the desired range of 2.26 m, which corresponds to 12% of the full range. In our previous work [13], we proposed a decentralized predictive approach and achieved an average formation error of 0.44 m. The high error observed in this work was due to the exclusion of aerodynamic disturbances occurring between vehicles and the assumption of constant neighbor velocity in the prediction model. The low average estimation (≤0.1 m) and formation (≤0.17 m) errors obtained in two different formation control scenarios reveal the high performance of our strategy. By leveraging the tightly-coupled distributed MPC and EKF scheme, we could simultaneously comply with perception, communication, and vehicle constraints. Although the method is scalable in terms of algorithmic properties, a couple of issues should be addressed for real-world scalability. First, at least three wide-lens cameras should be used to perceive the neighbor vehicles in all directions. Next, the April-bundle pose detection algorithm is not lightweight, and in our current implementation, it runs on the CPU. There are
efficient implementations of this algorithm using a GPU [29]. Finally, obtaining the SQP solution of NMPC is also costly, which limits the maximum number of neighbors each vehicle can have; however, there is no theoretical limit on the total swarm size.
5 Conclusion
Obtaining a robust, autonomous, performant and scalable control and estimation architecture for multi-robot systems is a significant challenge. In this work, we present a unified, predictive, and distributed architecture to carry out leader-follower formations with MAVs. Our approach employs onboard resources and integrates various physical constraints, such as camera FOV, actuation, velocity, and acceleration limits, into the architecture. After a parameter tuning process in a high-fidelity simulator with three vehicles, we validated and showed the performance of our approach with two real quadrotors, one serving as leader and the other as follower, for two common formation tasks. In the future, we intend to validate the approach with more following vehicles, improve the autonomy of the leader, analyze the robustness in the presence of various communication problems, and compare our solution with alternative approaches in detail.

Acknowledgment. This work has been partially sponsored by the FCT grant [PD/BD/135151/2017], the FCT doctoral program RBCog and the FCT project [UIDB/50009/2013].
References

1. Ebel, H., Ardakani, E., Eberhard, P.: Distributed model predictive formation control with discretization-free path planning for transporting a load. Robot. Auton. Syst. 96, 211–223 (2017)
2. Oh, K., Park, M., Ahn, H.: A survey of multi-agent formation control. Automatica 53, 424–440 (2015)
3. Eren, U., Prach, A., Koçer, B., Raković, S., Kayacan, E., Açıkmeşe, B.: MPC in aerospace systems: current state and opportunities. J. Guid. Control. Dyn. 40, 1541–1566 (2017)
4. Zhou, X., et al.: Swarm of micro flying robots in the wild. Sci. Robot. 7, eabm5954 (2022)
5. Petráček, P., Walter, V., Báča, T., Saska, M.: Bio-inspired compact swarms of unmanned aerial vehicles without communication and external localization. Bioinspir. Biomim. 16, 026009 (2020)
6. Alonso-Mora, J., Montijano, E., Nägeli, T., Hilliges, O., Schwager, M., Rus, D.: Distributed multi-robot formation control in dynamic environments. Auton. Robot. 43, 1079–1100 (2019)
7. Soria, E., Schiano, F., Floreano, D.: Distributed predictive drone swarms in cluttered environments. IEEE Robot. Autom. Lett. 7, 73–80 (2022)
8. Allamraju, R., et al.: Active perception based formation control for multiple aerial vehicles. IEEE Robot. Autom. Lett. 4, 4491–4498 (2019)
9. Hafez, A., Marasco, A., Givigi, S., Iskandarani, M., Yousefi, S., Rabbath, C.: Solving multi-UAV dynamic encirclement via model predictive control. IEEE Trans. Control Syst. Technol. 23, 2251–2265 (2015)
10. Yuan, Q., Zhan, J., Li, X.: Outdoor flocking of quadcopter drones with decentralized model predictive control. Int. Soc. Autom. Trans. 71, 84–92 (2017)
11. Van Parys, R., Pipeleers, G.: Distributed MPC for multi-vehicle systems moving in formation. Robot. Auton. Syst. 97, 144–152 (2017)
12. Abichandani, P., Levin, K., Bucci, D.: Decentralized formation coordination of multiple quadcopters under communication constraints. In: IEEE International Conference on Robotics and Automation, pp. 3326–3332 (2019)
13. Erunsal, I., Ventura, R., Martinoli, A.: NMPC for formations of multi-rotor micro aerial vehicles: an experimental approach. In: International Symposium on Experimental Robotics, pp. 449–461 (2020)
14. Thakur, D., Tao, Y., Li, R., Zhou, A., Kushleyev, A., Kumar, V.: Swarm of inexpensive heterogeneous micro aerial vehicles. In: International Symposium on Experimental Robotics, pp. 413–423 (2020)
15. Olson, E.: AprilTag: a robust and flexible visual fiducial system. In: IEEE International Conference on Robotics and Automation, pp. 3400–3407 (2011)
16. Fossen, T.: Handbook of Marine Craft Hydrodynamics and Motion Control. John Wiley and Sons (2011)
17. Mahony, R., Kumar, V., Corke, P.: Multirotor aerial vehicles: modeling, estimation, and control of quadrotor. IEEE Robot. Autom. Mag. 19, 20–32 (2012)
18. Omari, S., Hua, M., Ducard, G., Hamel, T.: Nonlinear control of VTOL UAVs incorporating flapping dynamics. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2419–2425 (2013)
19. Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Academic Press (1970)
20. Quan, Q.: Introduction to Multicopter Design and Control. Springer (2017)
21. Dias, D.: Distributed State Estimation and Control of Autonomous Quadrotor Formations Using Exclusively Onboard Resources. EPFL-IST PhD thesis, no. 9224 (2019)
22. PX4 autopilot documentation. https://docs.px4.io/. Accessed July 2022
23. Magni, L., Scattolini, R.: Stabilizing decentralized model predictive control of nonlinear systems. Automatica 42, 1231–1236 (2006)
24. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer Science & Business Media (2010)
25. Gros, S., Zanon, M., Quirynen, R., Bemporad, A., Diehl, M.: From linear to nonlinear MPC: bridging the gap via the real-time iteration. Int. J. Control 93, 62–80 (2020)
26. Ferreau, H., Kraus, T., Vukov, M., Saeys, W., Diehl, M.: High-speed moving horizon estimation based on automatic code generation. In: IEEE Conference on Decision and Control, pp. 687–692 (2012)
27. Michel, O.: Webots: professional mobile robot simulation. Int. J. Adv. Rob. Syst. 1, 39–42 (2004)
28. Xu, H., et al.: Omni-swarm: a decentralized omnidirectional visual-inertial-UWB state estimation system for aerial swarms. IEEE Trans. Robot. (2022)
29. https://github.com/NVIDIA-AI-IOT/isaac_ros_apriltag. Accessed Sept 2022
Search Space Illumination of Robot Swarm Parameters for Trustworthy Interaction

James Wilson1,2(B) and Sabine Hauert1,2

1 Bristol Robotics Laboratory, Bristol, UK
[email protected]
2 Department of Engineering Mathematics, University of Bristol, Bristol, UK
[email protected]
Abstract. For swarms of robots to become commonly adopted, operators will need to be able to adjust swarm behaviour in an intuitive and simple manner. To achieve this, we produce a feature space of swarm characteristic sets using Map-Elites. These sets provide users with the ability to explore specific combinations of characteristics known to achieve good results for the specified requirements. This allows operators to pass preferential parameter sets to swarm agents without requiring a deep understanding of how parameters affect swarm behaviour. The resultant system gives the user reassurance that the swarm will operate to their needs while maintaining the benefits of distributed swarm technology. We demonstrate how this system might be used with a swarm logistics case study where an operator requires two swarm behaviour modes with differing characteristic needs. In this example we illustrate how an operator might express preference, receive recommendations, and select their final swarm characteristics.

Keywords: Search space illumination · Swarm robotics · Trustworthiness

1 Introduction
Robot swarms create promising opportunities within a wide range of applications, from intralogistics for small businesses to space exploration [24]. However, to create a swarm robotic system that can be deemed trustworthy, and thus enable widespread use, swarm operators will need to rapidly simulate, deploy, monitor, and control swarms through intuitive cyber-physical infrastructure, making use of methodologies that do not detract from the swarm's inherent beneficial qualities. Swarms of robots present an interesting challenge when it comes to human control. Human input can benefit the swarm's performance by providing context or instructions from the human operator that would not typically be available to the swarm [13]. However, the high number of agents within a swarm can present
difficulty: the more agents acting within a system, the more difficult it becomes to manually control the entire swarm, potentially surpassing the cognitive limit of the operator [15]. Additionally, by centralising the swarm around a human operator it is possible to create a single point of failure in the system. As a result, the robot system may lose its swarm-like properties, i.e. the adaptability, scalability and robustness swarms gain from their innate redundancy and distributed operation [9]. The level of control an operator has over a swarm is a key consideration when producing an effective swarm system [1]. Too much influence from a human, especially from an inexperienced operator, can lead to significantly reduced swarm performance as the collective behaviours of the swarm are overridden by human control. This effect has been referred to as 'Neglect Benevolence' [28], whereby swarms can, in some cases, be observed to operate more effectively as interference from a human operator reduces. Nonetheless, there is currently a demand for human operator supervision of autonomous systems [20] and swarms [4]. Not only can operators provide the aforementioned context to swarm tasks, but they can also provide an element of trust in the swarm system, giving the public reassurance in their interactions with the swarm. Operators are able to make ethical considerations when presented with scenarios for which current levels of autonomy would struggle to provide adequate resolutions, for example directing agents to areas of importance in a wildfire scenario, or selecting a preferable behaviour in a shared human-robot warehouse as humans enter or exit the environment. Even in less critical scenarios, operators themselves may exhibit more trust in a system they are able to influence towards their preferred mode of operation. To address the issue of effective human-swarm interaction, many techniques have been investigated. Some studies have explored approaches for direct control of agents, including gesture, voice and touch [14,22], while others have presented approaches for monitoring and providing information on many agents to an operator [6,29]. In addition to this, there have also been investigations into the appropriate levels of control over swarms [1,27]. Recently, evolutionary techniques have been deployed to automatically generate swarm behaviour, to overcome the difficulty of collective behaviour design [2]. Within this field, methods have been explored that generate several swarm behaviour options that are human-understandable and human-readable [8]. Operators with contextual awareness can then use these options to send commands to a swarm, which then behaves autonomously and in a distributed manner to execute the desired swarm behaviour. This allows the operator to modify the behaviour of the swarm as a task or environment varies, while still preserving swarm qualities. We look to expand upon the idea of operator behaviour selection and hope to add an element of trustworthiness to swarm systems by providing operators with characteristic options defined in simple terms: options which have been measured and tested based on sets of generated, low-level swarm parameters. When it comes to deployment, this means that an operator can simply express their use-case's characteristic priorities, and select a set of parameters for their
swarm which represents an intuitive set of trade-offs. For example, an operator managing a swarm logistics warehouse may express that they require a high-performing, robust swarm but have little requirement for energy efficiency. Our system could subsequently recommend multiple options, displaying the trade-offs between performance, robustness and energy efficiency. The operator could then select the option that best represents their requirements, uploading the parameters associated with their selected characteristic set to their swarm. This method should allow operators, even those who are inexperienced, to beneficially adjust the behaviour of the swarm, meeting user requirements for trust while preserving swarm capabilities. We achieve this method of interaction by creating a large feature space of swarm characteristics representing high-performing sets of swarm parameters. This should provide users with the ability to explore combinations of trustworthiness characteristics and, as a result, receive suggestions for relevant parameter sets. Consequently, operators can request swarm behaviours in user-friendly terms without needing in-depth knowledge of swarm behaviour or having to micromanage swarm agents. We will produce the aforementioned feature space through the use of Map-Elites, a search space illumination algorithm first introduced by Mouret and Clune [19]. Map-Elites has since been used to develop varieties of robotic behaviours to create adaptive methods of control [5] and swarm behaviour generation [3,7,12], and has also been proposed as a method for building trust in optimisation algorithms by displaying multiple solutions to users at once [26]. From the characteristic sets (or 'genomes') generated through Map-Elites, we will demonstrate how a user might request and interact with the feature space to effectively and intuitively influence the behaviour of a swarm. An illustration of our target system can be seen in Fig. 1.
Fig. 1. Illustration of target interface displaying the behaviour of the swarm within an arena (left side of interface). Users are able to list their priority of characteristic traits. Based on this priority, the top sets of available characteristic combinations are displayed in a radar chart to the right. Operators can then select the suggestion they feel best fits their use case or can re-specify their preference order to receive additional recommendations.
2 Methodology

2.1 Scenario
In our scenario we mimic a swarm logistics warehouse. The swarm executes a simple box delivery task, moving items from one end of a 20 × 20 m environment (displayed in Fig. 1 as an element within the target interface) to the other. In this task, the swarm has 30 boxes to deliver within a time limit of 500 s. If the swarm delivers all of the boxes before the specified time limit, the simulation ends early and the scores are recorded.

Simulated Agents. The agents within our simulation are based on DOTS, an open-access swarm robot solution designed for industrial swarm experimentation [10]. While a high-fidelity simulation platform is available for this experimentation platform, we chose to create a simple 2D simulation for the purpose of executing the illumination algorithm, owing to the large amount of computing power that would be required to simulate 5–40 agents in high fidelity across several thousand iterations. In our simulation, agents measure 30 cm in diameter and are able to pick up boxes instantaneously once within 25 cm of the box's centre coordinate. Agents have been simulated with simple 2D physics and, should their avoidance systems fail, collide elastically with other agents, boxes, and the walls of the environment. The maximum velocity, the size of the swarm, the maximum amount of time an agent will carry a box, and the distance at which agents begin to avoid one another are all controlled by the Map-Elites mutations. The limits of our swarm parameters and the magnitude of the mutations applied on each iteration of Map-Elites are shown in Table 1.

Table 1. Table of swarm parameter value constraints.

Parameters            | Minimum   | Maximum       | ± Maximum Mutation
Maximum Velocity      | 1 cm/s    | 200 cm/s      | 20 cm/s
Maximum Acceleration  | 200 cm/s² | 200 cm/s²     | N/A
Avoidance Range       | 1 cm      | Unconstrained | 25 cm
Swarm Size            | 3 agents  | 50 agents     | 5 agents
Maximum Box Hold Time | 5 s       | Unconstrained | 2.5 s

Robot Controller. The controller present on each agent executes a simple set of instructions (illustrated in Fig. 2). Agents first search for boxes, moving randomly through the environment and gaining a randomly generated heading once every 10 s to modify their path. Simultaneously, the agents attempt to avoid one another, generating avoidance vectors which navigate them away from the centroid of neighbouring agents (agents within the distance specified by the 'Avoidance Range' parameter).
Search Space Illumination for Trustworthy Swarms
Check timer. Have 10 seconds passed?
Start.
No
Generate new random heading.
No
Is the box in range being held?
No
Are you carrying a box?
Apply movement bias towards collection zone.
Agent within emergency avoidance range?
No
Yes
Override movement with evasive vector away from closest robot.
Activate avoidance.
Yes
Are you in the drop off zone?
Yes
No
Yes
Apply movement bias towards drop off zone.
No
Yes
Yes
Pick up the box.
Agent within in avoidance range?
177
Is time holding box > max box hold
Drop the box.
Fig. 2. Flow chart illustrating the behaviour executed by every swarm agent.
In addition to this, we incorporate an emergency avoidance system which overrides the current heading of an agent and directs it away from any agent it senses less than 20 cm away. Once an agent encounters a box by chance, it is able to pick it up and attempts to move the box to the delivery zone, once again moving in the same random fashion while attempting to execute the avoidance behaviour. To assist in the delivery of boxes, a light bias is applied to the movement of agents. When searching for boxes, a vector of random magnitude (between zero and half the maximum acceleration of the agent) is added in the direction of the box pick-up zone. Once a box is collected, a similar vector is added away from the box pick-up zone to assist in delivery. This is similar to the behaviour deployed by Milner et al. [16,17], with this bias acting as a proxy for the simple directional awareness agents may have access to, such as an environmental gradient sensor (e.g. a light or wireless beacon).
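The following sketch illustrates one control step of the behaviour just described. It is our own reading of the flow chart in Fig. 2, not the authors' implementation; the agent interface and all function and field names are assumptions.

```python
import numpy as np

# One control step of the box-delivery behaviour described above (our own
# sketch; the Agent fields and names are assumptions). Returns the velocity
# to apply this step.

EMERGENCY_RANGE = 0.20  # m: heading is overridden below this separation

def control_step(agent, neighbours, t):
    # Re-randomise the wandering heading once every 10 s.
    if t - agent.last_reheading >= 10.0:
        agent.heading = np.random.uniform(0.0, 2.0 * np.pi)
        agent.last_reheading = t
    velocity = agent.max_velocity * np.array([np.cos(agent.heading),
                                              np.sin(agent.heading)])

    # Soft avoidance: steer away from the centroid of in-range neighbours.
    in_range = [n.pos for n in neighbours
                if np.linalg.norm(n.pos - agent.pos) < agent.avoidance_range]
    if in_range:
        velocity += agent.pos - np.mean(in_range, axis=0)

    # Movement bias: towards the pick-up zone when searching, away when loaded.
    direction = agent.pickup_zone - agent.pos
    direction /= np.linalg.norm(direction)
    bias = np.random.uniform(0.0, 0.5 * agent.max_accel)
    velocity += bias * (-direction if agent.carrying else direction)

    # Emergency avoidance overrides everything for very close neighbours.
    for n in neighbours:
        offset = agent.pos - n.pos
        if np.linalg.norm(offset) < EMERGENCY_RANGE:
            velocity = agent.max_velocity * offset / np.linalg.norm(offset)
            break
    return velocity
```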
2.2 Swarm Metrics
In order to create a feature space of elite parameter sets, multiple characteristics relating to the swarm must be produced. Our focus when implementing Map-Elites in this paper was to create a method of swarm interaction which enables trustworthiness: not only should the method of interaction be intuitive, fostering trust between operator and swarm, but the characteristics for which the operator is able to express preference should also have a strong relation to trust. With this in mind, we produced three characteristics, energy consumption, robot danger, and robustness, as well as a metric for overall swarm performance.
Performance. To measure performance (P), we used the number of boxes collected by the swarm (nB) within the maximum available time (T). In some instances, agents may collect all of the available boxes before the maximum time limit; in such cases, to avoid using unnecessary computational resources, we truncated the experiment. To compensate for this, and to promote controllers that perform the task quickly, the number of boxes collected is multiplied by T/t (where t is the time actually taken to complete the task). This produces a score representing how many boxes the swarm would have collected had agents continued moving boxes at the average rate of collection for the full experiment length (see Eq. 1).

$$P = \frac{n_B T}{t} \quad (1)$$
Energy Consumption. Our first swarm characteristic is the total energy consumed by the swarm (E), given in Eq. 2. Energy consumption is a key ethical consideration when creating modern products and, depending on the use case, may be a factor in the trustworthiness of a system [21,25]. E was produced by summing the energy used by the robots (Er) across the experiment length. Energy consumption in this case was approximated through the magnitude of acceleration each agent experienced at any given time step. This creates a score capturing where the majority of energy would be consumed in the equivalent real-world system. As with the performance metric, to compensate for early completion of the task and to make parameter set comparison fair, the total score was multiplied by the maximum experiment length (T) over the actual time taken to complete the task (t).

$$E = \frac{T}{t} \sum_{r=1}^{n_r} E_r \quad (2)$$
Robot Danger. The next characteristic is the average amount of time each agent spent in the emergency avoidance mode (shown in Fig. 2). This characteristic was chosen to give an indication of how safe a system might be. As Eq. 3 shows, it was calculated by summing the amount of time each agent spent performing emergency avoidance (Dt) throughout the experiment and dividing by the total number of agents in the swarm (nr). This was once more multiplied by a factor of T/t to give a value representative of how long agents would be expected to have spent in emergency avoidance had the simulation extended to the maximum time limit.

$$D = \frac{T}{t} \, \frac{\sum_{r=1}^{n_r} D_t}{n_r} \quad (3)$$
Robustness. Our final characteristic, robustness, comes from the recent work on quantifying swarm traits conducted by Milner et al. [18]. This metric is produced from the deviation from optimal performance due to faults (dP) over the deviation from optimal swarm size due to loss of agents (dN). Using this metric, values where R ≥ 1 are considered robust while R < 1 are not. This means that a system is considered robust when, despite failures or faults, the swarm produces a performance equal to or greater than the same system with the faulty agents entirely removed from the swarm. We show the equations comprising this metric in Eqs. 4–6.

$$R = \frac{dP}{dN} \quad (4)$$

$$dP = 1 - \frac{P_o - P}{P_o} \quad (5)$$

$$dN = 1 - \frac{N_f}{N} \quad (6)$$

To calculate robustness in these experiments, faults must be injected into the system. We inject two faults into comparison trials to gauge the performance of the system when compromised. To simulate these robot faults, we recreated a common wheel fault in which faulty agents have a constant directional bias. We achieved this in simulation by creating an orthogonal vector to the right of the agent at one tenth the magnitude of the current velocity. This causes faulty agents to consistently spiral to the right, compromising avoidance and direction of travel. As a result of fault injection, we are able to obtain values for the swarm's performance with no faults injected (Po) and its performance with faults injected (P). As mentioned before, across these trials the number of faults injected into the swarm for testing (Nf) was always 2, while N, the number of agents in the swarm, acted as a variable parameter in our search space illumination.
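To make the metric computations concrete, the following sketch evaluates the four scores from logged trial data. It is our own illustration of Eqs. 1–6, not the authors' code; the function and argument names are assumptions.

```python
# Illustrative computation of the swarm metrics from Eqs. 1-6. This is our own
# sketch; the trial-log fields and helper names are assumptions.

def performance(n_boxes, T, t):
    # Eq. 1: boxes collected, extrapolated to the full experiment length.
    return n_boxes * T / t

def energy(per_robot_energy, T, t):
    # Eq. 2: summed per-robot energy, extrapolated as for performance.
    return (T / t) * sum(per_robot_energy)

def danger(emergency_times, T, t):
    # Eq. 3: mean time per agent spent in emergency avoidance.
    return (T / t) * sum(emergency_times) / len(emergency_times)

def robustness(P_faulty, P_nominal, n_faults, n_agents):
    # Eqs. 4-6: performance deviation over swarm-size deviation.
    dP = 1 - (P_nominal - P_faulty) / P_nominal
    dN = 1 - n_faults / n_agents
    return dP / dN

# Example: 30 boxes in 400 s of a 500 s budget extrapolates to P = 37.5;
# a two-fault trial of a 20-agent swarm is robust (R >= 1) whenever the
# faulty performance stays above 90% of the nominal performance.
print(performance(30, 500, 400))       # 37.5
print(robustness(34.0, 37.5, 2, 20))   # ~1.007 -> robust
```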
2.3 Map-Elites Illumination Algorithm Methodology
Having defined our simulation, performance metric, swarm parameters, and swarm characteristics, we deployed the Map-Elites illumination algorithm to generate characteristic subsets. This algorithm iterates through genomes (in our case, the swarm parameter sets), selecting and mutating existing genomes to create offspring. To evaluate a new genome's characteristics and performance, an experimental trial (in our case, the previously described swarm simulation) is executed using the parameters of the offspring. If the evaluated genome outperforms the current occupant of a cell, or identifies a new characteristic niche, the genome is placed within that cell. The algorithm then iterates as new genomes are selected from the occupied cells, mutated, and their offspring evaluated. An illustration of this process can be seen in Fig. 3.
Fig. 3. Illustration of the Map-Elites algorithm process with example parameters, characteristic values and mutations.
In our use of Map-Elites we began with an empty feature space, with each characteristic divided equally into 10 bins using maximum and minimum values identified through initial testing of the simulation. Starting with a single unevaluated parameter set, on each iteration thereafter we randomly selected 7 genomes currently filling the feature space to mutate. We then tested all 7 offspring in parallel, repeating each test 4 times and taking the mean results from the repetitions as the genome's final characteristic and performance evaluation scores. A minimal sketch of this loop is given below.
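The following sketch illustrates the loop under the configuration above (10 bins per characteristic, batches of 7 mutants, 4 evaluation repetitions). It is our own illustration rather than the authors' code: the evaluate and mutate helpers stand in for the swarm simulation and the Table 1 mutation rules, and all names are assumptions.

```python
import random

N_BINS, BATCH, REPEATS = 10, 7, 4
archive = {}  # maps a (energy_bin, danger_bin, robustness_bin) cell to (genome, performance)

def cell_of(characteristics, bounds):
    # Discretise each characteristic into one of N_BINS equal-width bins.
    return tuple(
        min(int(N_BINS * (c - lo) / (hi - lo)), N_BINS - 1)
        for c, (lo, hi) in zip(characteristics, bounds)
    )

def run_map_elites(evaluate, mutate, seed_genome, bounds, iterations):
    # evaluate(genome) -> (characteristics_triple, performance)
    pending = [seed_genome]
    for _ in range(iterations):
        for genome in pending:
            # Average characteristics and performance over repeated trials.
            trials = [evaluate(genome) for _ in range(REPEATS)]
            chars = [sum(t[0][i] for t in trials) / REPEATS for i in range(3)]
            perf = sum(t[1] for t in trials) / REPEATS
            cell = cell_of(chars, bounds)
            # Keep the genome if the cell is empty or it beats the occupant.
            if cell not in archive or perf > archive[cell][1]:
                archive[cell] = (genome, perf)
        # Next batch: mutate genomes sampled from the occupied cells.
        elites = [g for g, _ in archive.values()]
        pending = [mutate(random.choice(elites)) for _ in range(BATCH)]
    return archive
```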
3 Results
The results of our application of Map-Elites can be seen in Fig. 4. These results, displaying 28000 total parameter mutations (4000 repetitions of our 7 parallel parameter mutation batches, executed until coverage of the feature space was deemed adequate), show patterns forming in the relationships between characteristics and performance. To begin with, there is a clear gradient towards the top right of each graph. This shows, somewhat intuitively, that there is a link between power consumption, safety and performance: fast-moving agents, or large numbers of agents, consume large amounts of energy to quickly move boxes from one side of the arena to the other. At these higher speeds, agents also struggle to effectively maintain distance from one another, relying more heavily on emergency avoidance to prevent collisions. Interestingly, robustness does not seem to have
a linear relationship with performance. Instead, performance peaks at a sweet spot with robustness scores between 1 and 2, reflecting the inherent robustness within the swarm algorithm. Further tests and a wider variety of fault injections would likely produce greater variance in the scores seen here; however, this was outside the scope of this initial proof-of-concept experimentation. The patterns and trade-offs shown here demonstrate the variety of performances that can be achieved by a swarm given different user requirements. The ability to visualise and intuitively select characteristics based on requirements could prove valuable in creating trustworthy and adoptable swarm systems. In the next section we explore the generated genomes in greater detail, demonstrating how feature spaces of swarm characteristics could be implemented in reality for user-focused applications.

3.1 Qualitative Analysis: Case Study
Looking more closely at the results from our application of Map-Elites, we consider a case study in which users have specified a priority list for swarm characteristics. We then take a closer look at the parameters that could be selected for the representative genomes, highlighting how Map-Elites-generated characteristics can create an effective tool for swarm control. Taking the feature space generated through our implementation of Map-Elites, we produce radar charts to illustrate characteristic values. To display the information intuitively, characteristics have been normalised to take values between 0 and 1. Energy consumption and danger have been used to produce values for energy efficiency (1 − E) and safety (1 − D) respectively. The resulting characteristics thus attain their greatest fitness at a value of 1, in line with performance and robustness. In our example case study, a user requires a swarm for a cluttered warehouse in which humans will frequently interact with swarm agents during the day. At night there will be no humans present in the warehouse and agents will rely on energy stored via solar panels during the day. As a result, the operator specifies priority lists 1 and 2 for day and night operation respectively:

Priority List 1
1. Safety
2. Performance
3. Energy Efficiency
4. Robustness

Priority List 2
1. Energy Efficiency
2. Performance
3. Safety
4. Robustness

To begin observing the trade-offs between characteristics, the feature space is ranked by characteristic in the order specified by the operator. The genome at the top of the list can then be selected as the first characteristic recommendation (examples 1 and 4 in this case study). Then, by making concessions to the highest-priority characteristic, genomes from lower down the feature space ranking can be
Fig. 4. Heat maps of performance based on characteristics extracted using Map-Elites. Coloured cells in these heat maps display the performance scores across the characteristic categories for robot danger, energy consumed and robustness. Black cells show characteristic combinations for which no parameter set was found across the Map-Elites iterations performed in this instance.
selected, providing recommendations that demonstrate how reducing the highest-priority feature might affect the rest of the swarm's features. This method for selecting the top characteristic examples does create some issues for the operator should they want a balance between multiple characteristic features. However, it should be possible to address this by adding the ability to specify minimum possible values for certain characteristics. This would reduce the size of the feature space, possibly removing genomes which perform highly in one particular characteristic. By restricting elements of the feature space, the resultant characteristic examples would be more representative of genomes that perform more evenly across multiple characteristic types.
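As an illustration of this recommendation step, the sketch below normalises the archived characteristics and ranks genomes lexicographically by the operator's priority list. The record layout, names, and top-k selection are our own assumptions rather than the authors' implementation.

```python
# Rank archived genomes by an operator priority list, as described above.
# Our own sketch: the record layout and helper names are assumptions.

def normalise(value, lo, hi):
    return (value - lo) / (hi - lo)

def recommend(archive, priorities, bounds, top_k=3):
    """archive: list of dicts with keys 'genome', 'performance', 'energy',
    'danger', 'robustness'. priorities: characteristic names, highest first."""
    scored = []
    for rec in archive:
        traits = {
            "performance": normalise(rec["performance"], *bounds["performance"]),
            # Energy efficiency and safety invert consumption and danger.
            "energy efficiency": 1 - normalise(rec["energy"], *bounds["energy"]),
            "safety": 1 - normalise(rec["danger"], *bounds["danger"]),
            "robustness": normalise(rec["robustness"], *bounds["robustness"]),
        }
        # Lexicographic key: sort by the first priority, break ties by the next.
        key = tuple(-traits[p] for p in priorities)
        scored.append((key, rec["genome"], traits))
    scored.sort(key=lambda s: s[0])
    return scored[:top_k]
```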
In Fig. 5 we illustrate two radar charts showing the genomes that would be presented to the operator for each of the stated priority lists. In the illustration for priority list 1, we can see that example 1 shows the option with the highest possible safety rating. However, this safety comes at a cost to all of the other swarm features; by selecting other examples with slightly lower safety scores, we see improvements to both robustness and performance. Similarly, in the radar chart given for priority list 2, we can see that example 4, the genome best representing the operator's priorities, gives very good results for energy efficiency, safety and robustness but lacks performance. As energy efficiency is reduced in examples 5 and 6, we see a notable increase in performance; however, we also see a drop in robustness and safety. Using these graphs to display swarm information in simple terms, we can see how search space illumination can be beneficial to a user. Operators can make their selection, clearly see the trade-offs between the features they have a primary interest in, and use this information to make an informed swarm characteristic selection.
Fig. 5. Radar diagrams each displaying the characteristics and performance of the three example parameter sets that may be recommended for the specified operator preferences.
4 Conclusion and Future Work
In this paper we show how search space illumination techniques can be deployed to create a method of swarm robot interaction with a focus on trustworthiness. This is achieved by presenting human-centric swarm characteristics to a user in a simple, consumable manner. We have illustrated how this system might be used through a proof-of-concept case study, and identified the benefits such a methodology could provide to swarm operators. In the future we hope to take this work further by implementing the Map-Elites algorithm on a wider scale, evolving full swarm behaviours using behaviour
trees [11], instead of simply mutating swarm parameters. Through the use of this system, swarms could be equipped with vast repertoires of behaviour types, deployed based on human understanding of characteristic requirements. Another interesting implementation for this system would be to create diversity amongst a swarm, taking clusters of neighbouring genomes to be spread amongst agents. These selections may lead to more robust and effective behaviours with agents specialising and dividing labour to complement each other’s strengths and weaknesses. This would reflect the benefits demonstrated by natural examples of increased swarm diversity in social insects [23]. We also hope to investigate the viability of this system for user interaction by performing a human study. This study would examine whether a system similar to that presented in this paper would be perceived as beneficial and more trustworthy than other high-level command techniques, or direct agent control. Finally, the generation of discrete sets of parameters seen in this paper could enable rigorous validation of swarms. After the initial search space illumination, the resultant parameter sets or behaviours could be taken through a series of tests, evaluating the viability of the genomes, removing genomes from the feature space that are deemed unacceptable. Such a process would increase user confidence in the system, adding an additional layer of safety to enforce the trustworthiness of any behaviours or parameter sets selected via the user’s characteristic requirements.
References

1. Ashcraft, C.C., Goodrich, M.A., Crandall, J.W.: Moderating operator influence in human-swarm systems. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 4275–4282. IEEE (2019)
2. Birattari, M., et al.: Automatic off-line design of robot swarms: a manifesto. Front. Robot. AI 6, 59 (2019)
3. Bossens, D.M., Tarapore, D.: Rapidly adapting robot swarms with swarm map-based Bayesian optimisation. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 9848–9854. IEEE (2021)
4. Carrillo-Zapata, D., et al.: Mutual shaping in swarm robotics: user studies in fire and rescue, storage organization, and bridge inspection. Front. Robot. AI 7, 53 (2020)
5. Cully, A., Clune, J., Tarapore, D., Mouret, J.B.: Robots that can adapt like animals. Nature 521(7553), 503–507 (2015)
6. Divband Soorati, M., Clark, J., Ghofrani, J., Tarapore, D., Ramchurn, S.D.: Designing a user-centered interaction interface for human-swarm teaming. Drones 5(4), 131 (2021)
7. Engebraaten, S.A., Moen, J., Yakimenko, O.A., Glette, K.: A framework for automatic behavior generation in multi-function swarms. Front. Robot. AI, 175 (2020)
8. Hogg, E., Hauert, S., Harvey, D., Richards, A.: Evolving behaviour trees for supervisory control of robot swarms. Artif. Life Robot. 25(4), 569–577 (2020)
9. Jones, S., Milner, E., Sooriyabandara, M., Hauert, S.: Distributed situational awareness in robot swarms. Adv. Intell. Syst. 2(11), 2000110 (2020)
10. Jones, S., Milner, E., Sooriyabandara, M., Hauert, S.: DOTS: an open testbed for industrial swarm robotic solutions. arXiv preprint arXiv:2203.13809 (2022)
11. Jones, S., Studley, M., Hauert, S., Winfield, A.: Evolving behaviour trees for swarm robotics. In: Groß, R., et al. (eds.) Distributed Autonomous Robotic Systems. SPAR, vol. 6, pp. 487–501. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73008-0_34
12. Kaiser, T.K., Hamann, H.: Evolution of diverse swarm behaviors with minimal surprise. In: ALIFE 2020: The 2020 Conference on Artificial Life, pp. 384–392. MIT Press (2020)
13. Kapellmann-Zafra, G., Salomons, N., Kolling, A., Groß, R.: Human-robot swarm interaction with limited situational awareness. In: Dorigo, M., et al. (eds.) ANTS 2016. LNCS, vol. 9882, pp. 125–136. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44427-7_11
14. Kim, L.H., Drew, D.S., Domova, V., Follmer, S.: User-defined swarm robot control. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–13 (2020)
15. Lewis, M.: Human interaction with multiple remote robots. Rev. Hum. Factors Ergon. 9(1), 131–174 (2013)
16. Milner, E., Sooriyabandara, M., Hauert, S.: Stochastic behaviours for retrieval of storage items using simulated robot swarms. Artif. Life Robot. 27, 1–8 (2022)
17. Milner, E., Sooriyabandara, M., Hauert, S.: Swarm diffusion-taxis: transport of spatial information for cooperative gradient-based navigation. In: AROB SWARM (2022)
18. Milner, E., Sooriyabandara, M., Hauert, S.: Swarm performance indicators (SPIs). In preparation (2022)
19. Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 (2015)
20. Nahavandi, S.: Trusted autonomy between humans and robots: toward human-on-the-loop in robotics and autonomous systems. IEEE Syst. Man Cybern. Mag. 3(1), 10–17 (2017)
21. Partridge, A., et al.: ReRobot: recycled materials for trustworthy soft robots. In: 5th IEEE-RAS International Conference on Soft Robotics (2022)
22. Podevijn, G., O'Grady, R., Nashed, Y.S.G., Dorigo, M.: Gesturing at subswarms: towards direct human control of robot swarms. In: Natraj, A., Cameron, S., Melhuish, C., Witkowski, M. (eds.) TAROS 2013. LNCS (LNAI), vol. 8069, pp. 390–403. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43645-5_41
23. Psalti, M.N., Gohlke, D., Libbrecht, R.: Experimental increase of worker diversity benefits brood production in ants. BMC Ecol. Evol. 21(1), 1–10 (2021)
24. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors and current applications. Front. Robot. AI 7, 36 (2020)
25. Thiebes, S., Lins, S., Sunyaev, A.: Trustworthy artificial intelligence. Electron. Mark. 31(2), 447–464 (2021)
26. Urquhart, N., Guckert, M., Powers, S.: Increasing trust in meta-heuristics by using MAP-Elites. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1345–1348 (2019)
27. Walker, P., Nunnally, S., Lewis, M., Chakraborty, N., Sycara, K.: Levels of automation for human influence of robot swarms. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 57, pp. 429–433. SAGE Publications, Los Angeles (2013)
28. Walker, P., Nunnally, S., Lewis, M., Kolling, A., Chakraborty, N., Sycara, K.: Neglect benevolence in human control of swarms in the presence of latency. In: 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3009–3014. IEEE (2012)
29. Wilson, J., Hauert, S.: Information transport in communication limited swarms. In: 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (2022)
Collective Gradient Following with Sensory Heterogeneous UAV Swarm

Tugay Alperen Karagüzel1(B), Nicolas Cambier1, A.E. Eiben1, and Eliseo Ferrante1,2

1 Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, The Netherlands
{t.a.karaguzel,n.p.a.cambier,a.e.eiben,e.ferrante}@vu.nl
2 Technology Innovation Institute, Abu Dhabi, United Arab Emirates
Abstract. In this paper, we present a new method for a swarm to collectively sense and follow a gradient in the environment. The agents in the swarm rely only on relative distance and bearing measurements of neighbors. Additionally, only a minority of agents in the swarm perceive the scalar value of the gradient at their location. We test the method with an incrementally changing ratio of sensory agents for various swarm sizes. In addition to repeated simulation experiments, we also test the performance with a real nano-drone swarm. The results show that, using the new method, the swarm successfully follows the gradient in the environment even when only a small portion of the swarm carries sensors, across various swarm sizes. A real nano-drone swarm also demonstrates good performance in our tests, even with members having disabled sensors.

Keywords: collective sensing · sensor heterogeneity · nano-drone swarm

1 Introduction
Coordinated and ordered collective motion of entities can be encountered in different species, from bacteria to large mammals [17]. They gain various advantages from this distributed organization. For some species, the advantage is an improved search for food sources [6]. For others, it can be safety and enhanced survival [11]. Among these advantages, an important one is emergent sensing. Emergent sensing can be defined as the collective behavior, arising from social interactions, that enables a swarm to perceive and react to a global feature in the environment while individuals alone perceive insufficient information about this feature and would not be able to react alone. A special case, considered in this paper, is the ability to achieve collective gradient following with individuals whose measurements are limited to the scalar values of the gradient [3]. When individuals have the right decision and evaluation mechanisms, emergent sensing can be achieved without direct information exchange about the environmental feature. Fish schools couple this behavior with collective motion [2,13]. An individual fish cannot estimate the light gradient in the environment and, hence, cannot follow
it to migrate towards darkness. Nevertheless, a collective composed of individuals that can perceive neighbors' relative positions can successfully estimate the light gradient and migrate towards darkness, even when the light gradient changes with time. Researchers have analyzed collective and emergent sensing in other species as well. For instance, bacteria evaluate and modulate their interactions with others according to the chemical concentration of a certain substance [5]. As a result, the swarm achieves a collective computation of the chemical gradient. A similar behavior has also been observed in honeybees [10]. Bees achieve an emergent estimation of temperature and aggregate without any explicit information exchange. They do this by wandering around until colliding with another bee and then modulating their waiting time according to the temperature of their current position. Such examples from nature have influenced the swarm robotics field. The behavior of honeybees was reproduced in wheeled robots in [18], whereby robots aggregate in particular regions of the environment depending on the light gradient. In another work [4], emergent sensing was achieved by relying only on inhibiting the sociality of agents. In [14], simulated robots modulate their messaging frequency depending on their local measurements to achieve collective taxis. In another example, robots in a tiled arena estimate the most frequent color of tiles [16]; successful estimation of the main color gives a valid example of collective sensing without any coherence in the swarm. In addition, [7] uses a swarm of nano-drones to localize a gas source. Despite their breadth, none of the above-mentioned methods guarantees an ordered motion of the collective. Moreover, [7,14,16] all rely on peer-to-peer communication, which adds a possible point of failure to the system and threatens the robustness of the swarm. Emergent sensing and collective motion without any information exchange are, however, possible, but existing works [15] rely on multiple sensors to estimate a desirable direction before modulating social behaviors. In [9], we introduced a gradient following method for a flocking swarm. This method achieves collective gradient following without any information exchange, with all agents equipped with on-board peer sensing and local scalar environmental measurements. However, it is still possible to further improve the robustness of the swarm by considering cases where some of the agents do not have sensors for environmental measurements. Having non-sensory agents in the swarm corresponds to cases where a portion of the swarm has damaged sensors or is unable to use them for some other reason. Moreover, having non-sensory agents could be a design choice depending on sensor cost. In this paper, we build on [9] to enable a swarm of largely non-sensory agents to successfully follow a gradient. For this purpose, we increase the attraction between neighboring agents to produce more drastic reactions to higher gradient values. Consequently, we expect the sensory agents to exert more influence on the non-sensory ones and to become sufficient for steering the whole swarm towards the increasing gradient while maintaining collective motion. With the results, we show that gradient following is robust to a considerable
proportion of non-sensory agents in the swarm, for a wide range of swarm sizes. In real-life experiments, we emulate peer and environmental sensing to show that similar behaviors are achievable with real aerial vehicle dynamics, in a way that does not violate the aforementioned constraints.
2 Methodology
Members of the swarm move in a 2D environment. Each member is individually controlled: it has a direction of motion (heading) and can only translate in this direction. In other words, the axis of linear speed is always aligned with the heading direction, while the angular speed of an agent determines the change of heading. These constraints are similar to the non-holonomic constraints of a differential drive robot; although they are not natural for drones, they are imposed on the drones (as explained later) in order to build our method on top of others presented recently [1]. All agents can measure the relative distance and bearing of others within a given range, which we set to 3 m in this paper. The range is chosen by considering the trade-off between ensuring the locality of perception within the swarm and giving agents sufficient space to react to other agents and maneuver without losing sight of them. The linear and angular speeds of a focal agent i are calculated by combining the distances and bearings of the perceived peers. For this calculation, a proximal control vector [8], designated p_i, is used. This vector generates a spring-like effect between the focal agent i and each perceived peer, which oscillates around the desired distance d_i^des: it becomes a pulling effect if the perceived peer is too far (distance > d_i^des) and a pushing effect if the peer is too close (distance < d_i^des). Some agents, which we call "sensory agents", can also measure the gradient's local value at their position, but they have no memory of previous measurements, so they have no information about the gradient's direction. Non-sensory agents have no information about the gradient at all and can only measure the relative distances and bearings of their neighbours. We call our new method Attractive Desired Distance Modulation (ADM), as opposed to the previously presented one, which we call Desired Distance Modulation (DM). In ADM, the proximal control vector produces more attraction between agents compared to DM [8,9]. ADM's proximal control vector is calculated as follows for N perceived peers:

$$\mathbf{p}_i = \alpha \sum_{m \in N} p_i^m(d_i^m, \sigma_i)\, \angle e^{j\phi_i^m} \quad (1)$$

In the equation above, the relative distance of a neighbor peer m is denoted by d_i^m, and the bearing angle of neighbor m is represented by ∠e^{jφ_i^m}. In ADM, the desired distance, i.e. the distance at which neighbor m produces no virtual force on the focal agent, is given by d_i^des = σ_i. Finally, α stands for the proximal weight coefficient. The contribution of each perceived peer to the magnitude of p_i is:
$$p_i^m(d_i^m, \sigma_i) = -\epsilon \left[ \left( \frac{\sigma_i}{d_i^m} \right)^{2} - \left( \frac{\sigma_i}{d_i^m} \right)^{0.5} \right] \quad (2)$$
In the above equation, ε is the coefficient that tunes the strength of the proximal vector. In addition to the peers' positions, sensory agents can use the gradient's local value to compute the proximal vector. These scalar measurements modulate d_i^des through σ_i. In ADM, the desired distance of the focal agent (d_i^des) takes the same value as the coefficient σ_i. The value of σ_i of a sensory agent i is computed according to the scalar perception of the gradient as follows:

$$\sigma_i = \sigma_c + \frac{G^{\circ}}{G_{max} - G^{\circ}} \, 0.2\, \sigma_c \quad (3)$$
The scalar gradient perception of the focal agent is denoted by the term G° and the sensor threshold by G_max. σ_c is a parameter which is kept constant and simply determines the upper and lower boundaries of σ_i for sensory agents. For non-sensory agents, σ_i is always equal to σ_c; in other words, non-sensory agents always sense G° as zero. To calculate the linear (U_i) and angular (ω_i) speeds, the proximal control vector p_i is first projected onto the agent's local frame. The local frame of the agent is placed such that the x-axis of this orthogonal, right-handed frame always coincides with the heading direction. Next, the speeds are calculated as follows, where K_1 and K_2 are the respective speed gains and p_x and p_y are the components of p_i in the agent's local frame:

$$U_i = K_1 p_x, \qquad \omega_i = K_2 p_y \quad (4)$$
Finally, the linear speed U_i is clamped between 0 and U_max, whereas the angular speed ω_i is clamped between −ω_max and ω_max. This clamping implies that agents cannot move backwards: in case of a sharp repulsion or attraction, an agent first changes its heading and only increases its linear speed once the heading has diverged enough. In addition, the finite rate of change of heading implies that an agent changes heading over time, not instantaneously. Our previously proposed method is called Desired Distance Modulation (DM) [9]. The major differences between DM and ADM are the calculation of the proximal control vector magnitude and the modulation of the desired distance. In DM, the contribution of each neighbor peer to the magnitude of the proximal control vector of agent i is calculated as follows:

$$p_i^m(d_i^m, \sigma_i) = -\epsilon \left[ 2\, \frac{\sigma_i^4}{(d_i^m)^5} - \frac{\sigma_i^2}{(d_i^m)^3} \right] \quad (5)$$

The desired distance coefficient σ_i is obtained as in:

$$\sigma_i = \sigma_c + \sigma_c\, \frac{G^{\circ}}{G_{max}} \quad (6)$$
Fig. 1. Proximal control vector magnitude vs neighbor distance. Positive values represent attraction, while negative repulsion.
Differently from ADM, σ_i correlates with the desired distance by a factor of √2 (d_i^des = √2 σ_i) in DM. Figures 1a and 1b show the difference between the two methods in calculating, respectively, p_i^m and σ_i. In Fig. 1a, positive values produce attraction and negative ones repulsion. In addition, the points where the curves intersect the red dashed line (no force) indicate the desired distances (d_i^des). As seen in Fig. 1a, ADM produces attraction for neighbors which are further away than d_i^des, up to the sensing range, which is set to 3.0. In contrast, DM produces attraction only for a short distance beyond d_i^des, after which the attraction starts to decrease. Although the form of DM's proximal control vector helps maintain a stable collective motion in the swarm [8], the form of ADM becomes advantageous when there are non-sensory agents, since sensory agents can sufficiently influence the whole swarm. To maintain stable collective motion, the curve of ADM intersects the red dashed line at a shorter distance than DM, implying a shorter desired distance. In Fig. 1b, higher σ_i stands for larger d_i^des in both methods. In ADM, σ_i changes drastically at high local perceived values to produce an impulse within the swarm through the sensory agents, which guides the swarm towards the increasing gradient more successfully.
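As a concrete illustration, a minimal sketch of one ADM control update (Eqs. 1–4) is given below, assuming our reading of the reconstructed Eq. 2 magnitude and using the simulation parameter values reported in Sect. 3; all function and variable names are our own.

```python
import numpy as np

# One ADM control update (Eqs. 1-4) for a focal agent. Our own sketch: the
# exponents in the Eq. 2 magnitude follow our reading of the formula, and all
# names are assumptions.

ALPHA, EPS = 3.0, 12.0        # proximal weight and strength (values from Sect. 3)
SIGMA_C, G_MAX = 0.6, 255.0   # constant sigma and sensor threshold
K1, K2 = 0.2, 0.1             # linear and angular speed gains
U_MAX, W_MAX = 0.08, 3.14     # speed limits (m/s, rad/s)

def sigma(g_local):
    # Eq. 3: sensory agents enlarge their desired distance with the local
    # value; non-sensory agents pass g_local = 0 and keep sigma = SIGMA_C.
    return SIGMA_C + 0.2 * SIGMA_C * g_local / (G_MAX - g_local)

def adm_speeds(neighbors, g_local):
    """neighbors: list of (distance, bearing) pairs in the agent's local frame."""
    s = sigma(g_local)
    p = np.zeros(2)
    for d, phi in neighbors:
        # Eq. 2: zero force at d = sigma, repulsion closer, attraction farther.
        mag = -EPS * ((s / d) ** 2 - (s / d) ** 0.5)
        p += mag * np.array([np.cos(phi), np.sin(phi)])
    p *= ALPHA
    # Eq. 4 with clamping: forward speed from p_x, turn rate from p_y.
    u = np.clip(K1 * p[0], 0.0, U_MAX)
    w = np.clip(K2 * p[1], -W_MAX, W_MAX)
    return u, w

# Example: a lone neighbor dead ahead at 2 m pulls the agent forward at the
# clamped maximum linear speed, with no turning.
print(adm_speeds([(2.0, 0.0)], g_local=0.0))
```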
3 Experimental Setup
We perform both simulations and validation experiments on real nano-drones. Simulations involve N agents with no inertial properties. The state of each agent is its position and heading, which are updated by discrete integration using a time step dt, set to 0.05 simulated seconds. The gradient in the environment is modeled as a grid carrying values between 0 and 255. The gradient is designed such that it carries the maximum value (255) at its center, with values decreasing radially outward from there. Sensory agents can only perceive the value of the cell they are currently located in,
with a random error in the range [−e_g, e_g]. All agents, including non-sensory ones, can perceive relative peer distances and bearings within a limited interaction range (D_i = 3.0 m). Random errors are added to the relative range and bearing measurements within the ranges [−e_r, e_r] and [−e_b, e_b], respectively. These ranges are set to e_r = 0.025 m, e_b = 0.09 rad and e_g = 2.5. We tested the following swarm sizes: 10, 20, 50, 100, 150, 200, 250. For a fair comparison, the grid is modeled such that each cell edge measures 0.04 m. In addition, larger swarms require a larger gradient grid to counterbalance the effect of covering a larger area; to achieve this, the grid is scaled in proportion to the swarm size while preserving the cell edge length. To assess the influence of non-sensory agents on the swarm, the ratio of sensory agents is incremented in 10% steps between 10% and 100% for each swarm size. Sensory agents are selected randomly. For each run of each setting, the swarm is initialized around a random point chosen on the isoline (which is a circle because of our gradient design) carrying values that are 25% of the maximum. This value was chosen to be challenging while starting with the whole swarm inside the gradient, since starting outside of the gradient would make our methods unusable. The initialization is done within a circle large enough to host all agents with a safe distance among them when uniformly scattered. Random perturbations are applied to the positions before each run. The heading is chosen randomly for each agent. For each setting, 24 experiments are conducted. To assess the gradient following performance (GFP) of a swarm, the local values perceived by all agents are averaged over the swarm and recorded every 2 simulated seconds, together with the agent states. Since larger swarms move in larger gradients, they require correspondingly longer times; hence, the experiment duration increases in proportion to the swarm size (1620, 2292, 3623, 5123, 6275, 7245, 8100 simulated seconds). In all experiments of ADM and DM, the speed gains were chosen as K_1 = 0.2 and K_2 = 0.1, whereas σ_c was 0.6 for ADM and 0.5 for DM. ε and α are chosen as 12.0 and 3.0, respectively, for both methods. The maximum speed was 0.08 m/s and the maximum angular speed 3.14 rad/s. To test our methods on agents which are under the constraints of aerial vehicle dynamics, we chose Crazyflie 2.1 nano-drones1. A swarm of 7 Crazyflies is used for the experiments. In our setup, we use an ultra-wideband positioning system developed for the Crazyflie2. Each drone calculates its position onboard in 3D. Since an off-the-shelf framework for onboard peer sensing on the Crazyflie does not yet exist, we use the Crazyswarm project [12] and parallelized controllers on a central computer. Nevertheless, on the central computer we guarantee that the constraints on peer (D_i) and environmental sensing (defined in Sect. 2) are never violated. During the experiment, each drone sends its position to the central computer. Then, the control software calculates linear and angular speeds on parallel controllers. Local value sensing is achieved in the same way
1 https://www.bitcraze.io/products/crazyflie-2-1/
2 https://www.bitcraze.io/documentation/system/positioning/loco-positioning-system/
with a simulated gradient grid: the control software detects which virtual cell the real agent is in, and the value of that cell is used in the velocity command calculations. Finally, linear and angular speed commands are sent back to each drone. The drones and the control software exchange information 20 times per second. Here, it is important to note that the heading does not correspond to the yaw angle of the drone in our setup. Instead, the yaw is kept constant; the heading only updates the instantaneous direction of motion. By doing so, we remove the extra control overhead of the yaw angle while still having an effective notion of heading.
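For illustration, the sketch below builds a radial gradient grid and a noisy scalar sensing function of the kind described above. The linear radial falloff, the grid size and all names are our own assumptions; the text specifies only a 255-valued center with radially decreasing values, 0.04 m cells, and a sensing error bound of e_g = 2.5.

```python
import numpy as np

# Sketch of the simulated gradient grid and noisy local sensing described
# above. The linear radial falloff and all names are our assumptions.

CELL = 0.04  # m per cell edge
E_G = 2.5    # gradient sensing error bound

def make_gradient(n_cells):
    # Values fall off linearly from 255 at the center to 0 at the edge.
    idx = np.arange(n_cells) - (n_cells - 1) / 2
    r = np.hypot(*np.meshgrid(idx, idx))
    return np.clip(255.0 * (1.0 - r / (n_cells / 2)), 0.0, 255.0)

def sense(grid, pos, sensory=True):
    # Sensory agents read their current cell with uniform noise; non-sensory
    # agents effectively perceive zero.
    if not sensory:
        return 0.0
    i, j = int(pos[1] // CELL), int(pos[0] // CELL)
    return float(grid[i, j]) + np.random.uniform(-E_G, E_G)

grid = make_gradient(500)              # a 20 m x 20 m arena at 0.04 m cells
print(sense(grid, pos=(10.0, 10.0)))   # near the 255 peak at the center
```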
Fig. 2. Performance comparisons of DM and ADM with 50 agents and 50% sensory agent ratio.
For each method, we tested the GFP in two cases: all agents being sensory, and only 4 out of 7 agents being sensory. For each setting, flights were repeated 5 times, once for each of the regions around the 4 corners and the center point of our flight arena. The initial positions of the drones are chosen randomly in the selected region. The dimensions of the flight arena are 6.5 m by 4.0 m, and each flight test has a duration of 270 s. Finally, the variable parameters of ADM in the flight tests were K_1 = 0.4, K_2 = 0.2 and σ_c = 0.7; for DM, they were K_1 = 0.2, K_2 = 0.1 and σ_c = 0.5. The maximum speeds, α and ε were exactly the same as in the simulations. The parameter values for the simulation and flight experiments were manually tuned for the best performance, considering sensor ranges and maximum allowed speeds at the same time. Some parameter values differ between flight and simulation experiments since the motion dynamics of point agents and nano-drones differ from each other.
4 Results and Discussion
In this section, we present simulation results followed by results from the nano-drone swarm experiments. Finally, we discuss our findings.
First, we compare the performances of ADM and DM in following the gradient, and observe the behaviours by looking at the interactions between agents and the swarm trajectories. In Fig. 2a, the time evolution of the GFP over 24 repetitions is presented for 50 agents and a 50% sensory agent ratio. We can observe that ADM reaches near-maximum values faster than DM and remains stable afterward, while DM shows a wider interquartile range and a median oscillating around lower values than ADM. In Fig. 2b, we present the median and quartiles of the GFP of both methods averaged over the experiment duration, for varying sensory agent ratios. We also show whether there was any separation in the group, determined by calculating the total number of groups (NoG) that are completely out of each other's sensing range at any instant. The performance of DM is significantly worse than ADM until the sensory agent ratio reaches 50%. After this point, although DM clearly improves, its GFP never reaches values close to ADM's. In addition, we observe more than one group in DM for sensory agent ratios between 40% and 90%. An NoG higher than 1 indicates that, although some portion of the swarm reaches the high-value region, other portions can be elsewhere, even outside of the defined gradient. To understand the reasons for the performance differences between methods, the distances between agents can be recorded and observed. This observation is important because of the main intuitive idea behind our methodology: the heterogeneity of the desired distance is the only possible cause of the emergent collective motion. In Fig. 3a and 3b (for DM and ADM respectively), we present all distances between neighboring agents and the desired distances of all agents, both as a distribution over the experiment duration and at its mid-point. We observe that the desired distance and the median neighbor distance are closer to each other in ADM than in DM. As expected from our methodology (see Eq. 3), sensory agents have larger desired distances on a gradient than non-sensory ones for both methods. Hence, we can understand the performances by observing the desired distances. Figure 3b shows a clear increase in desired distance for ADM, which indicates that perceived local values are getting higher and approaching the maximum; this increase is not present for DM in Fig. 3a. Another difference between the two methods lies in the minimum neighbor distance, i.e., the distance between each agent and its closest neighbor. Unlike the desired distance, the minimum neighbor distance is a result rather than a variable we control with our methodology. In ADM, the minimum distance is significantly lower than the desired one. Nevertheless, for 50 agents, it never reaches near-zero values (which could be dangerous for physical agents). In DM, it changes in close correlation with the desired distance, which means that agents keep a safe distance from each other. The reason behind this difference in minimum distances between DM and ADM is that, in ADM, attraction is produced even for the furthest perceived neighbors. Thus, some portion of the repulsion that would keep the focal agent away from its nearest neighbor is cancelled out by that increased attraction. As a result,
Fig. 3. Distributions of neighbor distances for sensory and non-sensory agents of 50 agents with 50% sensory ratio.
the minimum distance at which the balance is achieved is smaller compared to DM. When the violins (which show the distribution at a time instant) are observed, we naturally see properties very similar to the over-time distributions. Yet, it can be seen that distances cluster around several values, in a multimodal distribution. This indicates that neighbors are located in "neighborhood rings" around the focal agents, with empty spaces between the "rings". In Fig. 4, we present the trajectories of the swarm centroids (i.e., the center of mass of the swarm in each experiment, not the individual agent trajectories) and snapshots from simulations. For both methods, we can clearly see "neighborhood rings" with aligned directions of motion when the swarm is following the gradient. The centroid trajectories also indicate that ADM ascends the gradient successfully with a decisive direction of motion, while the DM trajectories show disturbances and curves. Also, in the top center snapshot, ADM forms a unique emergent and dynamic formation whereby the swarm rotates around the maximum; in DM, the swarm wanders around the maximum, covering a larger region without any emergent formation. Next, we focus on scalability and look at the performance of the methods for changing swarm sizes. For each swarm size, all possible sensory agent ratios are observed, and the interactions between agents within a larger swarm are also analyzed. The medians of the time-averaged GFP distributions are presented in Fig. 5. With this new metric, we get an overall look at the performances for all swarm sizes and sensory agent ratios.
Fig. 4. Centroid trajectories of the swarms for 24 repetitions and snapshots from phases of a simulation run. DM in top row, ADM in bottom. Non-sensory agents are colored in blue and sensory agents are colored in orange. (Color figure online)
Fig. 5. Median of time-averaged GFP of the two methods over 24 repetitions. The left block shows values for DM and the right block for ADM. Blue box boundaries indicate a group separation during at least one of the experiments.
We observe that the performance of ADM is consistently better than DM. Still, we can pinpoint the cases where ADM is not at its best, specifically a low sensory agent ratio combined with either a small or a very large swarm size. When the swarm size is small, the sensory agents' total influence becomes insufficient to guide the whole swarm. On the other hand, when the swarm size is larger, visual inspection has shown reduced order. This results in a lower speed of the swarm centroid, which makes the given time limit insufficient for the swarm to reach higher values of the gradient. For DM, performance suffers comparatively more from a decreasing sensory agent ratio and never reaches high values for ratios lower than 50%. In addition, the blue box boundaries indicate a group separation (NoG > 1), another indicator of low performance for DM, especially for larger swarm sizes. Experiments with increased swarm size give us the opportunity to analyze the scalability of the neighbor distances, since we observed potential issues with smaller swarms. Figures 6a and 6b show this information for a 50% sensory agent ratio. For ADM with 250 agents, the desired and median neighbor distances are very close to each other, as with 50 agents. Yet the minimum neighbor distance is clearly shorter; it can even be considered dangerous, since it reaches near-zero values, which indicates possible collisions for real robots. When those distances are observed for DM, we can see that the minimum neighbor distance never falls below 0.5 m. Despite that, sensory agents with DM still show a difference between 50 and 250 agents: the difference between desired distance and minimum distance is significantly higher for 250 agents. Especially in larger swarms with DM, agents fail to obtain the emergent isolation between sensory and non-sensory agents: non-sensory agents, with their low desired distances, become trapped among sensory ones, and hence produce agent-to-agent minimum distances for sensory agents similar to those of non-sensory ones. Finally, we assess the performances of ADM and DM under aerial vehicle dynamics. We observe the centroid trajectory of the nano-drone swarm (each trajectory corresponds to the swarm centroid in one experiment) in addition to calculating the GFP. Figure 7 presents these centroid trajectories together with the mean time-averaged GFP of the 5 experiments. Start points are labeled with S, while last recorded points are labeled with F. What we can observe from the centroid trajectories is that, when all agents are sensory, both methods manage to follow the ascending gradient and stay around the maximum for the rest of the experiment. However, when only 4 agents out of 7 are sensory, DM struggles to follow the gradient: even if the swarm manages to reach the maximum, it does not stay there. In addition, since we did not introduce any boundaries, swarms with DM left the gradient (and, hence, our flight arena) in 2 out of 5 experiments. For safety reasons, these experiments were terminated before the time limit, and the last recorded point is labeled with E. On the other hand, swarms with ADM show satisfactory performance with only 4 sensory agents. Although the centroid wanders around the maximum instead of staying there, we do not observe any behavior where the swarm abandons it or outright leaves the gradient. In addition, the formation we observed in simulation experiments also emerges in real flights with ADM.
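The NoG metric used above can be computed as the number of connected components of the interaction graph, with an edge between two agents whenever they are within the interaction range Di = 3.0 m of each other. A sketch using a simple union-find is given below; the authors' exact implementation is not specified, so the structure here is our assumption.

```python
import numpy as np

def number_of_groups(positions, d_i=3.0):
    """Count swarm groups: connected components of the graph whose edges
    link agents closer than the interaction range d_i (in meters)."""
    positions = np.asarray(positions)  # (n, 2) agent positions
    n = len(positions)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) <= d_i:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(n)})
```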
Fig. 6. Distributions of neighbor distances for sensory and non-sensory agents of 250 agents with 50% sensory ratio.
As a final remark, for both methods and all settings there were no crashes or group separations: the drones always maintained a safe distance from each other and remained a cohesive group. Shots of the drone swarms can be seen in Fig. 7, and videos can be accessed at https://youtu.be/migbwjt_EW0. We now discuss the reported results at a higher level. Altogether, when the overall performances of both methods are compared across swarm sizes and sensory agent ratios, ADM clearly stands out. It shows better performance with larger swarms and lower proportions of sensory agents, even down to 10% of the whole swarm. Yet, for both methods, when the swarm is too small, the influence of the sensory agents becomes insufficient to guide the rest of the swarm. Conversely, when the swarm becomes too large with a low proportion of sensory agents, order cannot be obtained and maintained; thus, although the swarm moves up the ascending gradient, the performance ends up unsatisfactory within the given time limit. In the neighbor distance data, we can observe how the methods differ in the neighbor distances of sensory and non-sensory agents: in ADM the gap is larger, and hence its performance is better. To understand this, we need to recall how both methods work in the first place. When sensory agents perceive higher values, they increase their desired distances to others; in other words, they become more "asocial". Since non-sensory agents still have shorter desired
Fig. 7. The swarm centroid trajectories of the nano-drone swarm and shots from flights.
distances, they move to be closer to the others; in other words, they become more "social". The "social" agents chasing the "asocial" ones results in a collective motion that follows the increasing gradient. When the distance gap between sensory and non-sensory agents is more significant, gradient following improves. Yet, we could not manually optimize DM to obtain a larger difference in distances; in that case, group separation becomes a serious problem. This is expected, because DM has a smaller attractive region in its proximal control vector (see Fig. 1a). ADM does not suffer from that but, as a drawback, it cannot guarantee a safe distance between agents, especially in larger swarms. DM guarantees a safe distance
but suffers from low performance. An optimal compromise likely lies between the two methods, but remains to be discovered. We do not observe the drawback of ADM in real flights, since the number of agents is lower than in simulations; hence, ADM performs better than DM in the nano-drone experiments, as it does in the simulation experiments.
5 Conclusion
In this paper, we proposed an efficient gradient-following method that does not require communication of information and relies on scalar sensing only. We compared its gradient following performance with our previous effort and tested the method over changing swarm sizes and sensory agent ratios. Results show that this new method's gradient following ability compares favorably to our previous effort, which was itself comparable to the ideal case of a single agent able to sense the ascending direction of the gradient at each step. Moreover, our proposed method is robust to disabled or absent sensors in large portions of the swarm, over a wide range of swarm sizes. The possible search applications for the gradient following behaviour are numerous: wildfires, dangerous radioactive sources outdoors, or gas leaks indoors are only some of them. The robustness of the proposed method to disabled sensors becomes an effective advantage in these applications: even if a large portion of the swarm has disabled sensors for some reason, we can still expect the swarm to keep conducting the search task. Implementing the control, peer sensing, and scalar measurements onboard is an important step to take in the future. Meanwhile, designing a gradient model with a more realistic form (a plume or fluctuating local values) would be a complementary step. In addition, our results also constitute building blocks for alternative functionalities, which opens the door for further research. For instance, if the swarm is tasked to act as a predator entity chasing some prey, we expect that it will be able to do so even if the latter is in the sensing range of only a portion of the swarm. For these reasons, we believe further research and real-life trials with relevant sensors will reveal exciting prospects for everyday use.
References

1. Amorim, T., Nascimento, T., Petracek, P., De Masi, G., Ferrante, E., Saska, M.: Self-organized UAV flocking based on proximal control. In: 2021 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 1374–1382. IEEE (2021)
2. Berdahl, A., Torney, C.J., Ioannou, C.C., Faria, J.J., Couzin, I.D.: Emergent sensing of complex environments by mobile animal groups. Science 339(6119), 574–576 (2013)
3. Berdahl, A.M., et al.: Collective animal navigation and migratory culture: from theoretical models to empirical evidence. Philos. Trans. Roy. Soc. B Biol. Sci. 373(1746), 20170009 (2018)
4. Bjerknes, J.D., Winfield, A.F., Melhuish, C.: An analysis of emergent taxis in a wireless connected swarm of mobile robots. In: 2007 IEEE Swarm Intelligence Symposium, pp. 45–52. IEEE (2007)
5. Camley, B.A.: Collective gradient sensing and chemotaxis: modeling and recent developments. J. Phys.: Condens. Matter 30(22), 223001 (2018)
6. Couzin, I.D., Krause, J., Franks, N.R., Levin, S.A.: Effective leadership and decision-making in animal groups on the move. Nature 433(7025), 513–516 (2005). https://doi.org/10.1038/nature03236
7. Duisterhof, B.P., Li, S., Burgués, J., Reddi, V.J., de Croon, G.C.: Sniffy bug: a fully autonomous swarm of gas-seeking nano quadcopters in cluttered environments. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 9099–9106. IEEE (2021)
8. Ferrante, E., Turgut, A.E., Stranieri, A., Pinciroli, C., Birattari, M., Dorigo, M.: A self-adaptive communication strategy for flocking in stationary and non-stationary environments. Nat. Comput. 13(2), 225–245 (2014). https://doi.org/10.1007/s11047-013-9390-9
9. Karagüzel, T.A., Turgut, A.E., Eiben, A.E., Ferrante, E.: Collective gradient perception with a flying robot swarm. Swarm Intell. 17, 117–146 (2023). https://doi.org/10.1007/s11721-022-00220-1
10. Kernbach, S., Thenius, R., Kernbach, O., Schmickl, T.: Re-embodiment of honeybee aggregation behavior in an artificial micro-robotic system. Adapt. Behav. 17(3), 237–259 (2009)
11. Olson, R.S., Hintze, A., Dyer, F.C., Knoester, D.B., Adami, C.: Predator confusion is sufficient to evolve swarming behaviour. J. Roy. Soc. Interface 10(85), 20130305 (2013). https://doi.org/10.1098/rsif.2013.0305
12. Preiss, J.A., Hönig, W., Sukhatme, G.S., Ayanian, N.: Crazyswarm: a large nano-quadcopter swarm. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 3299–3304. IEEE (2017)
13. Puckett, J.G., Pokhrel, A.R., Giannini, J.A.: Collective gradient sensing in fish schools. Sci. Rep. 8(1), 1–11 (2018)
14. Schmickl, T., Wotawa, F., Thenius, R., Varughese, J.C.: FSTaxis algorithm: bio-inspired emergent gradient taxis. In: ALIFE 2016, The Fifteenth International Conference on the Synthesis and Simulation of Living Systems, pp. 330–337. MIT Press (2016)
15. Shaukat, M., Chitre, M.: Adaptive behaviors in multi-agent source localization using passive sensing. Adapt. Behav. 24(6), 446–463 (2016)
16. Valentini, G., Brambilla, D., Hamann, H., Dorigo, M.: Collective perception of environmental features in a robot swarm. In: Dorigo, M., et al. (eds.) ANTS 2016. LNCS, vol. 9882, pp. 65–76. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44427-7_6
17. Vicsek, T., Zafeiris, A.: Collective motion. Phys. Rep. 517(3–4), 71–140 (2012)
18. Wahby, M., Petzold, J., Eschke, C., Schmickl, T., Hamann, H.: Collective change detection: adaptivity to dynamic swarm densities and light conditions in robot swarms. In: Artificial Life Conference Proceedings, pp. 642–649. MIT Press (2019)
DAN: Decentralized Attention-Based Neural Network for the MinMax Multiple Traveling Salesman Problem

Yuhong Cao, Zhanhong Sun, and Guillaume Sartoretti(B)

National University of Singapore, Mechanical Engineering Department, Singapore, Singapore
{caoyuhong,sun_z}@u.nus.edu, [email protected]
Abstract. The multiple traveling salesman problem (mTSP) is a well-known NP-hard problem with numerous real-world applications. In particular, this work addresses MinMax mTSP, where the objective is to minimize the max tour length among all agents. Many robotic deployments require recomputing potentially large mTSP instances frequently, making the natural trade-off between computing time and solution quality of great importance. However, exact and heuristic algorithms become inefficient as the number of cities increases, due to their computational complexity. Encouraged by the recent developments in deep reinforcement learning (dRL), this work approaches the mTSP as a cooperative task and introduces DAN, a decentralized attention-based neural method that aims at tackling this key trade-off. In DAN, agents learn fully decentralized policies to collaboratively construct a tour, by predicting each other's future decisions. Our model relies on attention mechanisms and is trained using multi-agent RL with parameter sharing, providing natural scalability to the numbers of agents and cities. Our experimental results on small- to large-scale mTSP instances (50 to 1000 cities and 5 to 20 agents) show that DAN is able to match or outperform state-of-the-art solvers while keeping planning times low. In particular, given the same computation time budget, DAN outperforms all conventional and dRL-based baselines on larger-scale instances (more than 100 cities, more than 5 agents), and exhibits enhanced agent collaboration. A video explaining our approach and presenting our results is available at https://youtu.be/xi3cLsDsLvs.
Keywords: multiple traveling salesman problem · decentralized planning · deep reinforcement learning

1 Introduction
The traveling salesman problem (TSP) is a challenging NP-hard problem where, given a group of cities (i.e., nodes) of a given complete graph, an agent needs to find a complete tour of this graph, i.e., a closed path from a given starting node
that visits all other nodes exactly once with minimal path length. The TSP can be further extended to the multiple traveling salesman problem (mTSP), where multiple agents collaborate with each other to visit all cities from a common starting node with minimal cost. In particular, MinMax (minimizing the max tour length among agents, i.e., the total task duration) and MinSum (minimizing the total tour length) are the two most common objectives for mTSP [1–3]. Although simple in nature, the TSP and mTSP are ubiquitous in robotics problems that need to address agent distribution, task allocation, and/or path planning, such as multi-robot patrolling, last mile delivery, or distributed search/coverage. For example, Faigl et al. [4], Oßwald et al. [5], and Cao et al. [6] proposed methods for robot/multi-robot exploration tasks, at the core of which is such a TSP/mTSP instance, whose solution quality influences the overall performance. More generally, many robotic deployments involve dynamic environments, where the underlying graph may change with time and thus require the TSP/mTSP solution to be recomputed frequently and rapidly (within seconds or tens of seconds if possible). While state-of-the-art exact algorithms (e.g., CPLEX [7], LKH3 [8], and Gurobi [9]) can find near-optimal solutions to TSP instances in a few seconds, exact algorithms become unusable for mTSP instances if computing time is limited [2]. As an alternative, meta-heuristic algorithms like OR Tools [10] were proposed to balance solution quality and computing time. In recent years, neural-based methods have been developed to solve TSP instances [11–13] and have shown promising advantages over heuristic algorithms both in terms of computing time and solution quality. However, neural-based methods for mTSP are scarce [2,3]. In this work, we introduce DAN, a decentralized attention-based neural approach for the MinMax mTSP, which is able to quickly and distributedly generate tours that minimize the time needed for the whole team to visit all cities and return to the depot (i.e., the makespan). We focus on solving mTSP as a decentralized cooperation problem, where agents each construct their own tour towards a common objective. To this end, we rely on a threefold approach: first, we formulate mTSP as a sequential decision-making problem where agents make decisions asynchronously towards enhanced collaboration. Second, we propose an attention-based neural network that allows agents to make individual decisions according to their own observations, which provides agents with the ability to implicitly predict other agents' future decisions, by modeling the dependencies of all the agents and cities. Third, we train our model using multi-agent reinforcement learning with parameter sharing, which provides our model with natural scalability to arbitrary team sizes (Fig. 1).

Fig. 1. DAN's final solution to an example mTSP problem.

We present test results on randomized mTSP instances involving 50 to 1000 cities and 5 to 20 agents, and compare DAN's performance with that of exact, meta-heuristic, and dRL methods. Our results
highlight that DAN achieves performance close to OR Tools, a highly optimized meta-heuristic baseline [10], on relatively small-scale mTSP instances (fewer than 100 cities). On relatively large-scale mTSP instances, our model significantly outperforms OR Tools both in terms of solution quality and computing time. We believe this advantage makes DAN more reliable than non-learning approaches in dynamic robotic tasks. DAN also outperforms two recent dRL-based methods in terms of solution quality on nearly all instances.
2 Prior Works
For TSP, exact algorithms like dynamic programming and integer programming can theoretically guarantee optimal solutions. However, these algorithms do not scale well. Nevertheless, exact algorithms with handcrafted heuristics (e.g., CPLEX [7]) remain state-of-the-art, since they can reduce the search space efficiently. Neural network methods for TSP became competitive after the recent advancements in dRL. Vinyals et al. [11] first built the connection between deep learning and TSP by proposing the Pointer network, a sequence-to-sequence model with a modified attention mechanism, which allows one neural network to solve TSP instances composed of an arbitrary number of cities. Kool et al. [13] replaced the recurrent unit in the Pointer network and proposed a model based on a Transformer unit [14]. Taking advantage of self-attention to model the dependencies among cities, Kool et al. achieved a significant improvement in terms of solution quality. In mTSP, cities need to be partitioned and allocated to each agent in addition to finding optimal sequences/tours. This added complexity makes state-of-the-art exact TSP solvers impractical for larger mTSP instances. OR Tools [10], developed by Google, is a highly optimized meta-heuristic algorithm. Although it does not guarantee optimal solutions, OR Tools is one of the most popular mTSP solvers because it effectively balances the trade-off between solution quality and computing time. A few recent neural-based methods have also approached the mTSP in a decentralized manner. Notably, Hu et al. [2] proposed a model based on a shared graph neural network and distributed attention mechanism networks to first allocate cities to agents; OR Tools can then be used to quickly generate the tour associated with each agent. Park et al. [3] presented an end-to-end decentralized model based on a graph attention network, which can solve mTSP instances with arbitrary numbers of agents and cities. They used a type-aware graph attention mechanism to learn the dependencies between cities and agents, where the extracted agent and city features are concatenated during the embedding procedure before outputting the final policy.
3 Problem Formulation
The mTSP is defined on a graph G = (V, E), where V = {1, ..., n} is a set of n nodes (cities), and E is a set of edges. In this work, we consider G to be complete, i.e., $(i, j) \in E$ for all $i \neq j$. Node 1 is defined as the depot, where all m agents
are initially placed, and the remaining nodes as cities to be visited. The cities must be visited exactly once by any agent. After all cities are visited, all agents return to the depot to finish their tours. Following the usual mTSP statement in robotics [15], we use the Euclidean distance between cities as edge weights, i.e., this work addresses the Euclidean mTSP. We define a solution to mTSP $\pi = \{\pi^1, ..., \pi^m\}$ as a set of agent tours. Each agent tour $\pi^i = (\pi_1^i, \pi_2^i, ..., \pi_{n_i}^i)$ is an ordered set of the cities visited by this agent, where $\pi_t^i \in V$ and $\pi_1^i = \pi_{n_i}^i$ is the depot. $n_i$ denotes the number of cities in this agent tour, so $\sum_{i=1}^{m} n_i = n + 2m - 1$ since all agent tours involve the depot twice. Denoting the Euclidean distance between cities $i$ and $j$ as $c_{ij}$, the cost of agent $i$'s tour reads $L(\pi^i) = \sum_{j=1}^{n_i - 1} c_{\pi_j^i \pi_{j+1}^i}$. As discussed in Sect. 1, we consider MinMax, i.e., minimizing $\max_{i \in \{1,...,m\}} L(\pi^i)$, as the objective of our model.
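For concreteness, the two quantities above translate directly into code; the following sketch (variable names are ours) computes $L(\pi^i)$ and the MinMax objective for a candidate solution:

```python
import math

def tour_length(tour, coords):
    """L(pi^i): sum of Euclidean edge lengths along one agent's tour.

    `tour` is a list of node indices starting and ending at the depot;
    `coords` maps each node index to its (x, y) coordinates.
    """
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))

def minmax_cost(solution, coords):
    """MinMax objective: the longest tour among all agents."""
    return max(tour_length(tour, coords) for tour in solution)
```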
4 mTSP as an RL Problem
In this section, we cast mTSP into a decentralized multi-agent reinforcement learning (MARL) framework: we consider mTSP as a cooperative task instead of an optimization task. We first formulate mTSP as a sequential decision-making problem, which allows RL to tackle mTSP. We then detail the agents' state and action spaces, and the reward structure used in our work.

Sequential Decision Making. Building upon recent works on neural-based TSP solvers, we consider mTSP as a sequential decision-making problem. That is, we let agents interact with the environment by performing a sequence of decisions, which in mTSP are to select the next city to visit. These decisions are made sequentially and asynchronously by each agent based on its own observation, upon arriving at the next city along its tour, thus constructing a global solution collaboratively and iteratively. Each decision (i.e., RL action) transfers the agent from the current city to the next city it selects. Assuming all agents move at the same, uniform velocity, each transition takes time directly proportional to the Euclidean distance between the current city and the next city. We denote the remaining time until the next decision of agent $i$ as $g_i$ (i.e., its remaining travel time). In practice, we discretize time into steps and let $g_i$ decrease by $\Delta g$ at each time step, which defines the agents' velocity. This assumption respects the actual time needed by agents to move about the domain, but also lets agents make decisions asynchronously (i.e., only when they reach the next city on their tour). We note that staggering the agents' decisions in time naturally helps avoid potential conflicts in the city selection process, by allowing agents to select cities in a sequentially-conditional manner (i.e., agents select one after the other, each having access to the decisions already made by agents before them in this sequence). We empirically found that this significantly improves collaboration and performance. Algorithm 1 summarizes this procedure.
Algorithm 1. Sequential decision making to solve mTSP.
Input: number of agents m, graph G = (V, E)
Output: solution π
Initialize mask M = {0}, remaining travel times g = {0}, and empty tours starting at the depot $\pi^i = \{1\}$ (1 ≤ i ≤ m)
while sum(M) < n do
    for i = 1, ..., m do
        if $g_i \leq 0$ then
            Observe $s_i^c$, $s_i^a$ of agent i and output p
            Select next city $\pi_t^i$ from p (t = Length($\pi^i$) + 1)
            Append $\pi_t^i$ to $\pi^i$; $M[\pi_t^i] \leftarrow 1$; $g_i \leftarrow c_{\pi_{t-1}^i \pi_t^i}$
        end if
        $g_i \leftarrow g_i - \Delta g$
    end for
end while
return $\pi = \{\pi^1, ..., \pi^m\}$
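A Python sketch of Algorithm 1 is given below. The `policy` call stands in for the network described in Sect. 5, and `build_observation` is a hypothetical helper (a matching sketch follows the Observation paragraph below); both names and the exact data layout are assumptions of ours.

```python
import math

def solve_mtsp(policy, coords, m, delta_g=0.1):
    """Asynchronous tour construction following Algorithm 1.

    `policy` maps an observation to the index of the next city; it must
    only return unvisited cities (or the depot, index 0, which is never
    masked). `coords` is a list of (x, y) tuples; node 0 is the depot.
    """
    n = len(coords)
    visited = [False] * n
    visited[0] = True                       # the depot needs no visit
    tours = [[0] for _ in range(m)]         # every tour starts at the depot
    g = [0.0] * m                           # remaining travel times g_i
    while sum(visited) < n:
        for i in range(m):
            if g[i] <= 0:                   # agent i has arrived: decide
                obs = build_observation(i, tours, g, coords, visited)
                nxt = policy(obs)
                tours[i].append(nxt)
                visited[nxt] = True
                g[i] = math.dist(coords[tours[i][-2]], coords[nxt])
            g[i] -= delta_g                 # one simulated time step elapses
    return [t + [0] for t in tours]         # all agents return to the depot
```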
Observation. We consider a fully observable world where each agent can access the states of all cities and all agents. Although a partial observation is more common in decentralized MARL [16], a global observation is necessary to make our model comparable to baseline algorithms; partial observability will be considered in future works. Each agent's observation consists of three parts: the cities state, the agents state, and a global mask. The cities state $s_i^c = (x_i^c, y_i^c)$, $i \in \{1, ..., n\}$, contains the Euclidean coordinates of all cities relative to the observing agent. Compared to absolute information, we empirically found that relative coordinates help prevent premature convergence and lead to a better final policy. The agents state $s_i^a = (x_i^a, y_i^a, g_i)$, $i \in \{1, ..., m\}$, contains the Euclidean coordinates of all agents relative to the observing agent, and the agents' remaining travel times $g_i$. As mTSP is a cooperative task, an agent can benefit from observing the state of other agents, e.g., to predict their future decisions. Finally, agents can observe a global mask M: an n-dimensional binary vector containing the visit history of all n cities. Each entry of M is initially 0 and is set to 1 after any agent has visited the corresponding city. Note that the depot is always unmasked during the task; this helps agents avoid being forced to visit the remaining cities even when doing so would lead to worse solutions.

Action. At each decision step of agent $i$, based on its current observation $(s^c, s^a, M)$, our decentralized attention-based neural network outputs a stochastic policy parameterized by the set of weights $\theta$: $p_\theta(\pi_t^i = j \mid s^c, s^a, M)$, where $j$ denotes an unvisited city. Agent $i$ takes an action based on this policy to select the next city $\pi_t^i$. By performing such actions iteratively, agent $i$ constructs its tour $\pi^i$.
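The observation assembly referenced in the loop sketch above could look like this; the array layout is our choice (the paper specifies only the content of each part), and agent positions are approximated here by each agent's last selected city:

```python
import numpy as np

def build_observation(i, tours, g, coords, visited):
    """Assemble agent i's observation (s^c, s^a, M)."""
    city_xy = np.asarray(coords)                            # (n, 2)
    agent_xy = np.asarray([coords[t[-1]] for t in tours])   # (m, 2)
    rel_cities = city_xy - agent_xy[i]                      # cities state: relative coords
    rel_agents = agent_xy - agent_xy[i]                     # agents state: relative coords
    agents_state = np.hstack([rel_agents, np.asarray(g)[:, None]])  # append g_i
    mask = np.asarray(visited, dtype=np.int8)               # global visit-history mask M
    mask[0] = 0                                             # the depot is always unmasked
    return rel_cities, agents_state, mask
```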
Reward Structure. To show the advantage of reinforcement learning, we try to minimize the amount of domain knowledge introduced into our approach. In this work, the reward is simply the negative of the max tour length among agents, $R(\pi) = -\max_{i \in \{1,...,m\}} L(\pi^i)$, and all agents share it as a global reward.
This reward structure is sparse, i.e., agents only get rewarded after all agents finish their tours.
5 DAN: Decentralized Attention-Based Network
We propose an attention-based neural network composed of a city encoder, an agent encoder, a city-agent encoder, and a decoder. Its structure is used to model three kinds of dependencies in mTSP: the agent-agent dependencies, the city-city dependencies, and the agent-city dependencies. To achieve good collaboration in mTSP, it is important for agents to learn all of these dependencies to make decisions that benefit the whole team. Each agent uses its local DAN network to select the next city based on its own observation. Compared to existing attention-based TSP solvers, which only learn dependencies among cities and find good individual tours, DAN further endows agents with the ability to predict each other's future decisions to improve agent-city allocation, by adding the agent and city-agent encoders. Figure 2 shows the structure of DAN. Based on the observations of the deciding agent, we first use the city encoder and the agent encoder to model the dependencies among cities and among agents, respectively. In the city-agent encoder, we then update the city features by considering other agents' potential decisions according to their features. Finally, in the decoder, based on the deciding agent's current state and the updated city features, we allocate attention weights to each city, which we directly use as its policy.

Fig. 2. DAN consists of a city encoder, an agent encoder, a city-agent encoder and a final decoder, which allows each agent to individually process its inputs (the cities states, and the agents states), to finally obtain its own city selection policy. In particular, the agent and city-agent encoders are introduced in this work to endow agents with the ability to predict each other's future decisions and improve the decentralized distribution of agents.

Attention Layer. The Transformer attention layer [14] is used as the fundamental building block of our model. The input of such an attention layer consists of the query source $h^q$ and the key-and-value source $h^{k,v}$, which are both vectors of the same dimension. The attention layer updates the query source using the weighted sum of the values, where the attention weights depend on the similarity between query and key. We compute the query $q_i$, key $k_i$, and value $v_i$ as $q_i = W^Q h_i^q$, $k_i = W^K h_i^{k,v}$, $v_i = W^V h_i^{k,v}$, where $W^Q$, $W^K$, $W^V$ are all learnable matrices of size $d \times d$. Next, we compute the similarity $u_{ij}$ between the query $q_i$ and the key $k_j$ using a scaled dot product: $u_{ij} = \frac{q_i^T \cdot k_j}{\sqrt{d}}$. Then we calculate the attention weights $a_{ij}$ using a softmax: $a_{ij} = \frac{e^{u_{ij}}}{\sum_{j=1}^{n} e^{u_{ij}}}$. Finally, we compute a weighted sum of these values as the output embedding of the attention layer: $h_i' = \sum_{j=1}^{n} a_{ij} v_j$. The embedding content is then passed through a feed-forward sublayer (containing two linear layers and a ReLU activation). Layer normalization and residual connections are used within these two sublayers as in [14].

City Encoder. The city encoder is used to extract features from the cities state $s^c$ and model their dependencies. The city encoder first embeds the relative Euclidean coordinates $x_i^c$ of city $i$, $i \in \{2, ..., n\}$, into a $d$-dimensional ($d = 128$ in practice) initial city embedding $h_i^c$ using a linear layer. Similarly, the depot's Euclidean coordinates $x_1^c$ are embedded by another linear layer into $h_1^c$. The initial city embedding is then passed through an attention layer with $h^q = h^{k,v} = h^c$, as is commonly done in self-attention mechanisms. Self-attention achieved good performance in modeling the dependencies of cities in single-TSP approaches [13], and we rely on the same fundamental idea to model the dependencies in mTSP. We term the output of the city encoder, $h^{c'}$, the city embedding. It contains the dependencies between each city $i$ and all other cities.

Agent Encoder. The agent encoder is used to extract features from the agents state $s^a$ and model their dependencies. A linear layer is used to separately embed each (3-dimensional) component of $s^a$ into the initial agent embedding $h_i^a$. This embedding is then passed through an attention layer with $h^q = h^{k,v} = h^a$. We term the output of this encoder, $h^{a'}$, the agent embedding. It contains the dependencies between each agent $i$ and all other agents.

City-agent Encoder. The city-agent encoder is used to model the dependencies between cities and agents. It applies an attention layer with cross-attention, where $h^q = h^{c'}$ and $h^{k,v} = h^{a'}$. We term the output $h^{ca}$ of this encoder the city-agent embedding. It contains the relationship between each city $i$ and each agent $j$, and implicitly predicts whether city $i$ is likely to be selected by another agent $j$, which is one of the keys to the improved performance of our model.
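A minimal PyTorch sketch of this attention layer is given below (single-head, $d = 128$). The feed-forward hidden width and the exact arrangement of residuals and layer norms are assumptions of ours, following the standard Transformer recipe of [14]:

```python
import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    """Single-head Transformer attention layer (d = 128 in the paper)."""

    def __init__(self, d=128, ff_dim=512):
        super().__init__()
        self.w_q = nn.Linear(d, d, bias=False)   # W^Q
        self.w_k = nn.Linear(d, d, bias=False)   # W^K
        self.w_v = nn.Linear(d, d, bias=False)   # W^V
        self.norm1 = nn.LayerNorm(d)
        self.norm2 = nn.LayerNorm(d)
        self.ff = nn.Sequential(nn.Linear(d, ff_dim), nn.ReLU(), nn.Linear(ff_dim, d))
        self.scale = d ** 0.5

    def forward(self, h_q, h_kv):
        q, k, v = self.w_q(h_q), self.w_k(h_kv), self.w_v(h_kv)
        u = q @ k.transpose(-2, -1) / self.scale  # u_ij: scaled dot-product similarity
        h = torch.softmax(u, dim=-1) @ v          # a_ij-weighted sum of the values
        h = self.norm1(h_q + h)                   # residual connection + layer norm
        return self.norm2(h + self.ff(h))         # feed-forward sublayer
```

Self-attention (as in the city and agent encoders) corresponds to calling `layer(h, h)`, while cross-attention (as in the city-agent encoder) passes two different sources.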
Decoder. The decoder is used to decode the different embeddings into a policy for selecting the next city to visit. The decoder starts by encoding the deciding agent's current state: we choose to express the current agent state implicitly by computing an aggregated embedding $h^s$, which is the mean of the city embedding. This operation is similar to the graph embedding used in [13]. The first attention layer then adds the agent embedding to the aggregated embedding; in doing so, it relates the state of the deciding agent to that of all other agents. Here $h^q = h^s$ and $h^{k,v} = h^{a'}$. This layer outputs the current state embedding $h^{s'}$. After that, a second attention layer is used to compute the final candidate embedding $h^f$, where $h^q = h^{s'}$ and $h^{k,v} = h^{ca}$. This layer serves as a glimpse, a common technique to improve attention mechanisms [12]. There, when computing the similarity, we rely on the global mask M to manually set the similarity $u_i = -\infty$ if the corresponding city $i$ has already been visited, ensuring that the attention weights of visited cities are 0. The final candidate embedding $h^f$ then passes through a third and final attention layer, whose query source is the final candidate embedding $h^f$ and whose key source is the city-agent embedding $h^{ca}$. For this final layer only, following [11], we directly use the vector of attention weights as the final policy of the deciding agent. The same masking operation is also applied in this layer to satisfy the mTSP constraint. The similarities are normalized using a softmax, to finally yield the probability distribution $p$ over the next city to visit: $p_i = p_\theta(\pi_t^j = i \mid s^c, s^a, M) = e^{u_i} / \sum_{i=1}^{n} e^{u_i}$.
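The masked softmax used in these last two layers can be sketched as follows. The optional tanh clipping with constant C follows [12] and is an assumption on our part; the paper only mentions setting C = 100 in the third decoder layer during testing (Sect. 7):

```python
import torch

def masked_policy(u, visited_mask, c=None):
    """Turn final-layer similarities u into a policy over cities.

    u            : (n,) attention similarities from the decoder layer.
    visited_mask : (n,) binary mask M; visited cities get probability 0
                   (the depot entry should stay 0, i.e., unmasked).
    c            : optional clipping constant (C = 100 in the paper's tests).
    """
    if c is not None:
        u = c * torch.tanh(u)                          # bounded similarities
    u = u.masked_fill(visited_mask.bool(), float('-inf'))  # u_i = -inf if visited
    return torch.softmax(u, dim=-1)                    # p_i = e^{u_i} / sum_i e^{u_i}
```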
6 Training
In this section, we describe how DAN is trained, including the choice of hyperparameters and hardware used.

REINFORCE with Rollout Baseline. In order to train our model, we define the policy loss $\mathcal{L} = -\mathbb{E}_{p_\theta(\pi^i)}[R(\pi)]$, where $p_\theta(\pi^i) = \prod_{t=1}^{n_i} p_\theta(\pi_t^i \mid s^c, s^a, M)$. The
policy loss is the expectation of the negative of the max tour length among agents. The loss is optimized by gradient descent using the REINFORCE algorithm with a greedy rollout baseline [13]. That is, we re-run the same exact episode from the start a second time and let all agents take decisions by greedily exploiting the best policy so far (i.e., the "baseline model" explained below). The cumulative reward $b(\pi)$ of this baseline episode is then used to estimate the advantage function $A(\pi) = R(\pi) - b(\pi)$ (with $R(\pi)$ the cumulative reward of the RL episode). This helps reduce the gradient variance and avoids the burden of training the model to explicitly estimate the state value, as in traditional actor-critic algorithms. The final gradient estimator for the policy loss reads $\nabla\mathcal{L} = -\mathbb{E}_{p_\theta(\pi^i)}[(R(\pi) - b(\pi))\nabla \log p_\theta(\pi^i)]$.

Distributed Training. We train our model using parameter sharing, a general method for MARL [17]. That is, we allow agents to share the parameters
of a common neural network, thus making the training more efficient by relying on the sum of experience from all agents. Our model is trained on a workstation equipped with an i9-10980XE CPU and four NVIDIA GeForce RTX 3090 GPUs. We accelerate training by parallelizing the code with Ray, a distributed framework for machine learning [18]. With Ray, we run 8 mTSP instances in parallel and pass gradients to be applied to the global network under the A2C structure [19]. At each training episode, the positions of the cities are generated uniformly at random in the unit square [0, 1]² and the velocity of the agents is set to Δg = 0.1. The number of agents is randomized within [5, 10] and the number of cities within [20, 100] during early training, which needs 96 h to converge. After initial convergence of the policy, the number of cities is randomized within [20, 200] for further refinement, requiring 24 h of additional training. We form one training batch after 8 mTSP instances are solved and perform one gradient update for each agent. We train the model with the Adam optimizer [20], using an initial learning rate of 10⁻⁵ decayed every 1024 steps by a factor of 0.96. Every 2048 steps we update the baseline model if the current training model is significantly better than the baseline model according to a t-test. Our full training and testing code is available at https://bit.ly/DAN_mTSP.
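The resulting per-agent gradient step can be sketched as below; log-probabilities are summed along the agent's tour, and the advantage is treated as a constant (no gradient flows through it):

```python
def reinforce_loss(tour_log_prob, reward, baseline_reward):
    """REINFORCE with a greedy rollout baseline (one agent's contribution).

    tour_log_prob   : sum over t of log p_theta(pi_t^i | s^c, s^a, M),
                      a differentiable scalar (e.g., a torch tensor).
    reward          : R(pi) = -max_i L(pi^i), shared by all agents.
    baseline_reward : b(pi), obtained by re-running the same instance
                      greedily with the current baseline network.
    """
    advantage = reward - baseline_reward   # A(pi) = R(pi) - b(pi), a plain float
    return -advantage * tour_log_prob      # minimized by gradient descent
```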
7 Experiments
We test our decentralized attention-based neural network (DAN) on numerous sets of 500 mTSP instances each, generated uniformly at random in the unit square [0, 1]². We test two different variants of our model, which utilize the same trained policy differently:

– Greedy: each agent always selects the action with the highest activation in its policy.
– Sampling: each agent selects the next city stochastically according to its policy.

For our sampling variant, we run our model multiple times on the same instance and report the best solution found, as sketched below. While [13] sampled 1280 solutions for the single TSP, we only sample 64 solutions (denoted as s.64) for each mTSP instance to balance the trade-off between computing time and solution quality. In the tests, the velocity of the agents is set to Δg = 0.01, which improves the performance of our model by allowing more finely-grained asynchronous action selection.
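The sampling variant reduces to a best-of-k loop over the stochastic policy; `solve_mtsp` and `minmax_cost` are the helpers sketched in the earlier sections:

```python
def best_of_k(policy, coords, m, k=64):
    """Sample k stochastic solutions and keep the lowest MinMax cost (s.64 for k=64)."""
    solutions = [solve_mtsp(policy, coords, m) for _ in range(k)]
    return min(solutions, key=lambda s: minmax_cost(s, coords))
```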
7.1 Results
We compare the performance of our model with both conventional and neural-based methods. For conventional methods, we test OR Tools, an evolutionary algorithm (EA), and self-organizing maps (SOM) [21] on the same test set. OR Tools initially obtains a solution using meta-heuristic algorithms (path-cheapest-arc) and then further improves it by local search (2-opt) [22] (denoted as OR). We allow OR Tools to run for a similar amount of time as our sampling variant, for fair comparison.
Table 1. Results on random mTSP set (500 instances each). n denotes the number of cities and m denotes the number of agents.

Method      | n=50 m=5     | n=50 m=7     | n=50 m=10    | n=100 m=5    | n=100 m=10
            | Max.   T(s)  | Max.   T(s)  | Max.   T(s)  | Max.   T(s)  | Max.   T(s)
EA          | 2.35   7.82  | 2.08   9.58  | 1.96   11.50 | 3.55   12.80 | 2.75   17.52
SOM         | 2.57   0.76  | 2.30   0.78  | 2.16   0.76  | 3.10   1.58  | 2.41   1.58
OR          | 2.04   12.00 | 1.96   12.00 | 1.96   12.00 | 2.36   18.00 | 2.29   18.00
Gurobi      | 2.54   3600  | –      –     | 2.42   3600  | 7.29   3600  | 7.17   3600
Kool et al. | 2.84   0.20  | 2.64   0.22  | 2.52   0.25  | 3.28   0.37  | 2.77   0.41
Park et al. | 2.37   –     | 2.18   –     | 2.10   –     | 2.88   –     | 2.23   –
Hu et al.   | 2.12   0.01  | –      –     | 1.95   0.02  | 2.48   0.04  | 2.09   0.04
DAN(g.)     | 2.29   0.25  | 2.11   0.26  | 2.03   0.30  | 2.72   0.43  | 2.17   0.48
DAN(s.64)   | 2.12   7.87  | 1.99   9.38  | 1.95   11.26 | 2.55   12.18 | 2.05   14.81

Method      | n=100 m=15   | n=200 m=10   | n=200 m=15   | n=200 m=20
            | Max.   T(s)  | Max.   T(s)  | Max.   T(s)  | Max.   T(s)
EA          | 2.51   21.63 | 4.07   29.91 | 3.62   34.33 | 3.37   39.34
SOM         | 2.22   1.57  | 2.81   3.01  | 2.50   3.04  | 2.34   3.04
OR          | 2.25   18.00 | 2.57   63.70 | 2.59   60.29 | 2.59   61.74
Kool et al. | 2.64   0.46  | 3.27   0.78  | 2.92   0.83  | 2.77   0.89
Park et al. | 2.16   –     | 2.50   –     | 2.38   –     | 2.44   –
DAN(g.)     | 2.09   0.58  | 2.40   0.93  | 2.20   0.98  | 2.15   1.07
DAN(s.64)   | 2.00   19.13 | 2.29   23.49 | 2.13   26.27 | 2.07   29.83
Note that OR Tools is always allowed to perform local search if there is computing time left after finding an initial solution. Exact algorithms need hours to solve a single mTSP instance; here, we report the results of Gurobi [9], one of the state-of-the-art integer linear programming solvers, from Hu et al.'s paper [2], where 1 h of computation was allowed for each instance. Table 1 reports the average MinMax cost (lower is better) for small-scale mTSP instances (from 50 to 200 cities), as well as the average computing time per instance for each solver. For neural-based methods, we report Park et al.'s results [3] and Hu et al.'s results [2] from their papers, since they did not make their code publicly available. Since Park et al.'s paper does not report the computing time of their approach, we leave the corresponding cells blank in Table 1. Similarly, since Hu et al. did not provide any results for cases involving more than 100 cities or more than 10 agents, these cells are also left blank. Note that the test sets used by Park et al. and Hu et al. are likely different from ours, since they have not been made public; however, the values reported here from their papers are also averaged over 500 instances under the same exact conditions as the ones used in this work. Finally, for completeness, we also test Kool et al.'s (TSP) model on our mTSP instances,
Table 2. Results on the large-scale mTSP set (500 instances each), where the number of agents is fixed to 10.

Method        | n=500        | n=600        | n=700        | n=800        | n=900        | n=1000
              | Max    T(s)  | Max    T(s)  | Max    T(s)  | Max    T(s)  | Max    T(s)  | Max    T(s)
OR [2]        | 7.75   1800  | 9.64   1800  | 11.24  1800  | 12.34  1800  | 13.71  1800  | 14.84  1800
SOM [21]      | 3.86   7.63  | 4.24   9.39  | 4.54   10.86 | 4.93   14.28 | 5.21   16.65 | 5.53   17.89
Hu et al. [2] | 3.32   0.56  | 3.65   0.81  | 3.95   1.22  | 4.20   1.69  | 4.59   2.21  | 4.81   2.87
DAN(g.)       | 3.29   2.15  | 3.60   2.58  | 3.91   3.03  | 4.23   3.36  | 4.55   3.81  | 4.84   4.21
DAN(s.64)     | 3.14   48.91 | 3.46   57.81 | 3.75   67.69 | 4.10   77.08 | 4.42   87.03 | 4.75   97.26
Table 3. Results on TSPlib instances where longer computing time is allowed.

Method      | eil51       | eil76       | eil101      | kroa150     | tsp225
            | m=5   m=10  | m=5   m=10  | m=5   m=10  | m=10  m=20  | m=10  m=20
LKH3 [8]    | 119   112   | 142   126   | 147   113   | 1554  1554  | 998   999
OR (600 s)  | 119   114   | 145   130   | 153   116   | 1580  1560  | 1068  1143
DAN (s.256) | 126   113   | 160   128   | 168   116   | 1610  1571  | 1111  1032
by simply replacing our neural network structure with theirs in our distributed RL framework. Table 2 shows the average MinMax cost for large-scale mTSP instances (from 500 to 1000 cities), where the number of agents is fixed to 10 (due to the limitations of Hu et al.'s model). When testing our sampling variant, we set C = 100 in the third decoder layer for efficient exploration (since the tours are much longer). Except for DAN and Hu et al.'s model, no method can handle such large-scale mTSP instances within a reasonable time, but we still report the results of OR Tools from Hu et al.'s paper, as well as SOM results as the best-performing meta-heuristic algorithm, for completeness. Table 3 shows test results on TSPlib [23] instances, a well-known benchmark library whose city distributions come from real-world data. There, we extend the computing time of OR Tools to 600 s and increase the sample size of DAN to 256, to yield solutions as close to optimal as possible; note that the computing time of DAN never exceeds 100 s. LKH3 results are reported from [8], where long enough computing time was allowed to yield exact solutions.
7.2 Discussion
We first notice that DAN significantly outperforms OR Tools on larger-scale mTSP instances (m > 5, n ≥ 50), but is outperformed by OR Tools on smaller-scale instances (as can be expected). On smaller-scale mTSP instances, OR Tools can explore the search space sufficiently to produce near-optimal solutions; in this situation, DAN and all other decentralized methods considered find it difficult to achieve the same level of performance.
However, DAN outperforms OR Tools on larger-scale mTSP instances, where OR Tools can only yield sub-optimal solutions within the same time budget. On mTSP100 (m ≥ 10), our sampling variant is 10% better than OR Tools. The advantage of our model becomes more significant as the scale of the instance grows, even when using our greedy variant. For instances involving 500 or more cities, OR Tools becomes impractical even when allowed up to 1800 s per instance, while our greedy variant still outputs solutions of good quality in a matter of seconds. In general, the computing time of our model increases linearly with the scale of the mTSP instance, as shown in Fig. 3.

Fig. 3. Planning time for the different solvers from n = 20 to n = 140 with m = 5. The computing time of our model only increases linearly with respect to the number of cities, while the computing time of OR Tools (without local search) increases exponentially.

Second, we notice that DAN's structure helps achieve better agent collaboration than the other decentralized dRL methods, thus yielding better overall results. In particular, we note that simply applying a generic encoder-decoder structure (Kool et al.'s model) only provides mediocre- to low-quality solutions across mTSP instances. This supports our claim that DAN's extended network structure is key to yielding drastically improved performance over this original work. More generally, all four dRL methods tested in our work (i.e., DAN, Hu et al.'s model, Park et al.'s model, and Kool et al.'s model) contain network structures similar to our city encoder and decoder; our overall better performance therefore seems to indicate that our additional agent encoder and city-agent encoder are responsible for most of DAN's increased performance. Finally, we notice that DAN can provide near-optimal solutions for larger teams on medium-scale mTSP instances, while keeping computing times much lower than non-learning baselines (by at least one order of magnitude). As shown in Tables 1 and 3, although exact solvers cannot find reasonable mTSP solutions in seconds or minutes like OR Tools and DAN, they can still guarantee near-optimal solutions by running for long enough. However, DAN's performance remains close (within 4%) to LKH3 on problems involving 10 agents.
8 Conclusion
This work introduced DAN, a decentralized attention-based neural network to solve the MinMax multiple traveling salesman problem (mTSP). We approach mTSP as a cooperative task and formulate it as a sequential decision-making problem, where agents distributedly construct a collaborative mTSP solution iteratively and asynchronously. In doing so, our attention-based neural model allows agents to achieve implicit coordination to solve the mTSP
instance together, in a fully decentralized manner. Through our results, we showed that our model exhibits excellent performance on small- to large-scale mTSP instances involving 50 to 1000 cities and 5 to 20 agents. Compared to state-of-the-art conventional baselines, our model achieves better performance both in terms of solution quality and computing time on large-scale mTSP instances, while achieving comparable performance on small-scale instances. This feature may make DAN particularly interesting for robotic tasks where mTSP needs to be solved frequently and within seconds or a few minutes. We believe that the developments made in the design of DAN can extend to more general robotic problems where agent allocation/distribution is key, such as multi-robot patrolling, distributed search/coverage, or collaborative manufacturing. We also acknowledge that robotic applications of mTSP may benefit from the consideration of real-life deployment constraints directly at the core of planning. We note that constraints such as agent capacity, time windows, and city demand have been added to learning-based TSP solvers with minimal changes in network structure [13]. We believe that the same should hold true for DAN, and such developments will be the subject of future works as well.

Acknowledgements. This work was supported by Temasek Laboratories (TL@NUS) under grants TL/SRP/20/03 and TL/SRP/21/19. We thank colleagues at TL@NUS and DSO for useful discussions, and Mehul Damani for his help with the initial manuscript. Detailed comments from anonymous referees contributed to the quality of this paper.
References

1. Kaempfer, Y., Wolf, L.: Learning the multiple traveling salesmen problem with permutation invariant pooling networks. arXiv preprint arXiv:1803.09621 (2018)
2. Hu, Y., Yao, Y., Lee, W.S.: A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs. Knowl.-Based Syst. 204, 106244 (2020)
3. Park, J., Bakhtiyar, S., Park, J.: ScheduleNet: learn to solve multi-agent scheduling problems with reinforcement learning. arXiv preprint arXiv:2106.03051 (2021)
4. Faigl, J., Kulich, M., Přeučil, L.: Goal assignment using distance cost in multi-robot exploration. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3741–3746. IEEE (2012)
5. Oßwald, S., Bennewitz, M., Burgard, W., Stachniss, C.: Speeding-up robot exploration by exploiting background information. IEEE Robot. Autom. Lett. 1(2), 716–723 (2016)
6. Chao, C., Hongbiao, Z., Howie, C., Ji, Z.: TARE: a hierarchical framework for efficiently exploring complex 3D environments. In: Robotics: Science and Systems Conference (RSS), Virtual (2021)
7. IBM: CPLEX Optimizer (2018). https://www.ibm.com/analytics/cplex-optimizer
8. Helsgaun, K.: An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Roskilde University, Roskilde (2017)
9. Gurobi Optimizer (2020). https://www.gurobi.com
10. Google: OR Tools (2012). https://developers.google.com/optimization/routing/vrp
11. Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. arXiv preprint arXiv:1506.03134 (2015)
12. Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940 (2016)
13. Kool, W., Van Hoof, H., Welling, M.: Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475 (2018)
14. Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS, pp. 5998–6008 (2017)
15. Bektas, T.: The multiple traveling salesman problem: an overview of formulations and solution procedures. Omega 34(3), 209–219 (2006). https://doi.org/10.1016/j.omega.2004.10.004
16. Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: a selective overview of theories and algorithms. arXiv:1911.10635 (2021)
17. Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Proceedings of AAMAS, pp. 66–83 (2017)
18. Moritz, P., et al.: Ray: a distributed framework for emerging AI applications. In: Proceedings of OSDI, pp. 561–577 (2018)
19. OpenAI: OpenAI Baselines: ACKTR & A2C (2017). https://openai.com/blog/baselines-acktr-a2c/
20. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2017)
21. Lupoaie, V.I., Chili, I.A., Breaban, M.E., Raschip, M.: SOM-guided evolutionary search for solving MinMax multiple-TSP. arXiv:1907.11910 (2019)
22. Voudouris, C., Tsang, E.P., Alsheddy, A.: Guided local search. In: Gendreau, M., Potvin, J.Y. (eds.) Handbook of Metaheuristics, vol. 146, pp. 321–361. Springer, Boston, MA (2010). https://doi.org/10.1007/978-1-4419-1665-5_11
23. Reinelt, G.: TSPLIB-A traveling salesman problem library. INFORMS J. Comput. 3(4), 376–384 (1991)
Receding Horizon Control on the Broadcast of Information in Stochastic Networks

Thales C. Silva¹(B), Li Shen¹, Xi Yu², and M. Ani Hsieh¹

¹ Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA 19104, USA
[email protected]
² Department of Mechanical and Aerospace Engineering, West Virginia University, Morgantown, WV 26501, USA
[email protected]
Abstract. This paper focuses on the broadcast of information on robot networks with stochastic network interconnection topologies. Problematic communication networks are almost unavoidable in areas where we wish to deploy multi-robotic systems, usually due to a lack of environmental consistency, accessibility, and structure. We tackle this problem by modeling the broadcast of information in a multi-robot communication network as a stochastic process with random arrival times, which can be produced by irregular robot movements, wireless attenuation, and other environmental factors. Using this model, we provide and analyze a receding horizon control strategy to control the statistics of the information broadcast. The resulting strategy compels the robots to redirect their communication resources to different neighbors according to the current propagation process to fulfill global broadcast requirements. Based on this method, we provide an approach to compute the expected time to broadcast the message to all nodes. Numerical examples are provided to illustrate the results.
Keywords: multi-robot system · receding horizon control · broadcast of information

1 Introduction
We are interested in leveraging the spatio-temporal sensing capabilities of robotic teams to explore or monitor large-scale complex environments (e.g., forests, oceans, and underground caves and tunnels [3,4,7,27,29]). While individual robots can each monitor or explore limited regions, a team of them can efficiently cover larger areas. Naturally, individual robots in swarms are expected to exchange information and make certain decisions based on the information gathered by themselves and received from other robots [9,14]. It is, therefore, essential for any robot to be able to transmit its gathered data to others in the team.
As the demand for information transmission can emerge between any pair of nodes, we use a propagation model to study the efficiency of the connectivity of the whole team. In such models, robots are represented as nodes of a graph, and interactions between nodes are represented by edges. At the beginning of a propagation, the nodes are considered 'non-informed', and will be 'informed' by the information propagated through the graph. A faster convergence of all nodes to the 'informed' status indicates a more efficiently connected network. Existing works abstract the propagation of information from any individual agent to the rest of the team as compartmental models [10,11,31], which are also broadly used in modeling the spread of infectious diseases [2]. The flow of the nodes from one compartment to another occurs when active links exist between robots that are 'informed' and 'non-informed' (i.e., only 'informed' robots can transmit the information to 'non-informed' ones). However, compartmental models often focus on the statistical behavior of the nodes, overlooking the individual-level decision making that may impact the network's performance.

Recent works in robotics have noticed and embraced the potential of controlling individual robots so that the performance of the whole network (i.e., the transmission rates, if a propagation model is considered) can be impacted or improved. The topology of networks was analyzed in [18], and control strategies were proposed to identify and isolate the nodes that have a higher impact on the propagation process. The design of time-varying networks, where robots leverage their mobility to move into each other's communication ranges to form and maintain temporal communication links, has been addressed in [8,12,28]. Information can be propagated in such networks by relying on time-respecting routing paths formed by a sequence of chronologically ordered temporal links [28]. Nonetheless, such approaches require thorough and perfect knowledge of the network's topology and the robots' ability to maintain active links, and they require robots to execute their motion plans precisely both in time and space. As such, the resulting networks lack robustness to uncertainties in link formation timings that may result from errors in motion plan execution due to localization errors and/or actuation uncertainties.

Existing stochastic graph models focus on random changes in a network's topology centered around the random creation and removal of edges within a graph. For example, the Canadian Traveller Problem assumes partially visible graphs where edges randomly appear and disappear [1,15]. For time-varying networks, [13,21] assumed that any temporal link may suffer from deviations from its scheduled timing due to the uncertain arrival time of one robot into another robot's communication range. Information routing or propagation planned for such networks may experience severe delays if subsequent links along a routing path appear out of chronological order in the presence of link formation uncertainties resulting from uncertainties in robot motions [28]. These challenges are addressed in [21] for networks with periodically time-varying interconnection topologies. Control strategies to 'fix' nodes with a higher impact on the whole network's performance were also proposed in [21], similar in spirit to those presented in [18].
In the Canadian Traveller Problem and in [21], messages are assumed to be transmitted instantaneously between nodes whenever a link appears. As such, the resulting strategies are based solely on the existence of an available link between a certain pair of robots at a certain time point. When the time to transmit a message is non-trivial, the question goes beyond whether the information can be transmitted or not and must consider the quality of the transmission (e.g., safe, fast, confident). Consider a pair of robots flying through a Poisson forest maintaining a communication link that requires line-of-sight (LOS). The randomly distributed obstacles may intermittently disrupt the LOS between the robots, causing randomized delays in the completion of the information transmission task. In these situations, it becomes difficult to determine whether a robot is informed or not at a given time point, which then impacts the robots' ability to plan subsequent actions.

In this paper, we aim to develop receding horizon control schemes that allow individual robots to re-direct their transmission resources (e.g., time, power) to different neighboring robots on a communication network with stochastic links, so as to guarantee exponentially fast convergence of the information propagation. We model the completion of a message transmission across a link as a Poisson point process. The density parameter of this process is determined by the transmission resources invested by the nodes. Each node carries limited transmission resources that are distributed among all the links connected to it. The completion of the transmission of a message from one node to another is then modeled as a Markov process. All robots are then able to update the distribution of their own transmission resources according to their neighbors' current states and follow the control directions. Therefore, the control strategy takes into account the transmission resource capabilities of the robots and acts on them to fulfill performance requirements. Such a set of resources changes according to the application; for example, in the Poisson forest we might require that the robots stay close together at the cost of decreasing the covered area, while in a situation in which the robots need to move back and forth to carry information (e.g., surveillance around buildings or in tunnels), we might require the robots to increase the frequency of their visits to each other.

The paper is organized as follows. Sections 2.1 and 2.2 provide the theoretical background on the graph structure and the propagation model. The problem is formally stated at the end of Sect. 2. Section 3 introduces our approach to developing the receding horizon control scheme for the stochastic network. Section 4 validates our proposed control schemes with numerical examples. Section 5 concludes the paper and proposes future directions.
2 Background and Problem Formulation

2.1 Graph Theory
We represent the communication network as a graph G(V, E), in which V = {1, ..., n} is the node set and E ⊂ V × V is the edge set. Each element of the edge set represents a directed link from i to j if (i, j) ∈ E. The set constituted by the nodes that have the ith node as a child node is denoted by N_i = {j ∈ V : (j, i) ∈ E} and is called the neighborhood of the ith node. In accordance with this nomenclature, we call a node j in N_i a neighbor of node i. The adjacency matrix A = [a_ij] associated with the graph G maps the edge set into a matrix as

$$a_{ij} = \begin{cases} 0, & \text{if } i = j \text{ or } (j, i) \notin E, \\ \omega_{ij}(t), & \text{if } (j, i) \in E, \end{cases} \qquad (1)$$

where ω_ij(t) > 0 is a positive number that maps the transmission rate of information from node j to node i, which we will examine in more detail below.

2.2 Information Propagation Model
We study the stochastic broadcast of information in a networked system represented by a graph G(V, E), in which the edges represent communication channels and the nodes represent the robots. We assume that each robot has limited communication capabilities; in such scenarios, the system may have to operate with intermittent network connectivity due to wireless signal attenuation resulting from obstacle occlusions and other environmental factors [23]. We borrow an epidemiological model from Van Mieghem and Cator [24], who defined an exact 2^n-state joint Markov chain to study the stochastic process of disease propagation using a Susceptible-Infected (SI) representation. We investigate a similar model to represent the broadcast of information. Naturally, using 2^n states to represent the broadcast of information can quickly lead to computational infeasibility. To circumvent this problem we apply a moment closure to the model [5,20,25], which approximates the joint Markov chain using a lower-dimensional representation. Afterward, we compare the performance of both representations.

In an SI model, each individual can be either "susceptible" or "infected" by a disease, and a susceptible agent can only become infected if there is at least one infected agent in its neighborhood. We adopt this idea to model the transmission of information in a robot network, such that each agent can be either informed or non-informed about a message to be transmitted over the network. We assume that the transmission process between any given pair of agents (j, i) follows a stochastic jump process with Poisson distribution and transmission rate ω_ij(t) [16]. This type of modeling can represent scenarios where the information is carried by mobile robots with short-range and limited communication capabilities, which usually leads to intermittent interaction between agents, such as in [23,30].

To be able to affect the transmission statistics over the whole network while satisfying a team performance criterion, we assume that the robots have access to the states of their neighborhood at predefined sampling times, and that each node can carry out the computation with that information. Even though this assumption can be conservative in general, there are scenarios in which longer-range but low-rate transmissions can support general and simple coordination, while agents need to perform more demanding local
data transactions, for example in data fusion tasks. Also, estimation techniques might help to alleviate such an assumption.

We model the current state of a robot as an indicator random variable X_i(t) ∈ {0, 1}, such that X_i(t) = 1 if the ith robot is informed at time t, and X_i(t) = 0 otherwise. Note that X_i(t) is a Bernoulli random variable; therefore the evolution of the network expectation of the broadcast of information can be determined by the evolution of the probability Pr{X_i(t) = 1 : X_j(0) = 1} for any pair (i, j), over time t > 0. The joint state of the network maps the set of agents that hold the information at a given instant. Since X_i(t) ∈ {0, 1} for each i ∈ V, the joint states of the network evolve in a 2^n state-space. Inspired by [19,24] in epidemiological studies, we define a time-dependent state-space vector for the whole network as Y(t) = [Y_0(t) Y_1(t) ··· Y_{2^n−1}(t)]^T, with

$$Y_i(t) = \begin{cases} 1, & \text{if } i = \sum_{k=1}^{n} X_k\, 2^{k-1}, \\ 0, & \text{otherwise}, \end{cases}$$

for all X_k ∈ {0, 1}. Observe that Y(t) ∈ R^{2^n} corresponds to a binary-valued vector that maps the current state of each node into a specific element Y_i(t) (i.e., we have a bijective map between each possible state of the network and the vector Y(t)). Figure 1 illustrates the relation between all possible states of the network and Y(t) for a network with three nodes. In addition, note that the elements of Y(t) are also Bernoulli random variables, hence E[Y_i(t)] = Pr{Y_i(t) = 1}.
Fig. 1. Representation of all possible states of a network with three agents. The arrows represent possible switching between states.
One could try to model the states of the whole network directly as X(t) = [X_1(t) X_2(t) ··· X_n(t)]^T, since that would reduce the dimension of the representation from R^{2^n} to R^n. However, as shown in [19], the evolution of E[X(t)] represents only a marginal probability distribution of the states of the network, and this information is not enough to describe the evolution of Pr{X_i(t) = 1 : X_j(0) = 1} for any t > 0. Therefore, if we are interested in computing the expectation of future states given a current distribution, using X(t) alone is not enough. Alternative methods to reduce the dimension of the state-space include mean-field-type approximations and moment-closure methods (see [19] and references therein). Typically those methods are valid under the assumption of large
numbers and have reduced precision with respect to the underlying stochastic process. Indeed, as pointed out in [25], in general, mean-field approximations do not consistently provide either an upper- or a lower-bound for general spreading processes.

Since Y(t) is an exact characterization of the states of the network, an expression for Pr{Y_i(t) = 1 : Y(0)} can be obtained using a continuous-time Markov chain with 2^n states, determined by the following infinitesimal generator Q(t) = [q_ij(t)] ∈ R^{2^n × 2^n} [24]:

$$q_{ij}(t) = \begin{cases} \sum_{k=1}^{n} \omega_{mk} X_k, & \text{if } i = j - 2^{m-1}, \text{ for } m = 1, \dots, n, \; X_m = 1, \\ -\sum_{k=1}^{n} q_{kj}, & \text{if } i = j, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$

for all X_k ∈ {0, 1}, in which the corresponding Markov chain evolves according to the following Kolmogorov forward equation

$$\frac{dv(t)}{dt} = v(t)^{\top} Q(t), \qquad (3)$$

where the state vector is defined as v(t) = [v_0(t) ··· v_{2^n−1}(t)]^T with elements v_i(t) = Pr{Y_i(t) = 1}, and it satisfies Σ_i v_i = 1 for all t > 0.

In this paper, we are interested in enforcing a desired evolution for the propagation of information on the network (i.e., Pr{X_i(t) → 1 : X_j(0) = 1}) exponentially fast, by closing the loop of the network with a feedback control strategy, which chooses the control action from a finite set of possible actions and minimizes a cost of available resources. Specifically, the problem investigated can be stated as:

Problem 1. Given a network with stochastic propagation of information represented by the topology in (1), define a feedback control that actuates on the transmission rates ω_ij(t) such that the evolution of the probability of all nodes in the network becoming informed follows the desired convergence criteria.
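To make the exact representation concrete, the following minimal sketch (our own naming and illustrative rates, not code from the paper) builds the 2^n-state generator of (2) for a small network and integrates the forward equation (3):

```python
import numpy as np
from scipy.linalg import expm

def si_generator(W):
    """Generator Q of the exact SI broadcast chain in (2).
    W[m, k] = omega_{mk}: rate at which node k transmits to node m."""
    n = W.shape[0]
    Q = np.zeros((2 ** n, 2 ** n))
    for s in range(2 ** n):                      # s encodes X via its bits
        informed = [(s >> k) & 1 for k in range(n)]
        for m in range(n):
            if not informed[m]:                  # node m may become informed
                rate = sum(W[m, k] for k in range(n) if informed[k])
                Q[s, s + (1 << m)] = rate        # jump to state with bit m set
        Q[s, s] = -Q[s].sum()                    # conservation of probability
    return Q

# hypothetical 3-node chain 1 -> 2 -> 3 with unit transmission rates
W = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
v0 = np.zeros(8); v0[0b001] = 1.0                # node 1 informed at t = 0
v_t = v0 @ expm(si_generator(W) * 2.0)           # solve (3) at t = 2
print(v_t[0b111])                                # Pr{all nodes informed}
```

The exponential state-space growth is visible directly in this sketch: the generator is dense in memory for even a few tens of nodes, which motivates the moment closure of Sect. 3.3.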
3 Methodology

3.1 Control Strategy
To focus on the transmission of information over the network and understand the ability to shape the broadcast of information, we make the following assumption:

Assumption 1. The communication graph has a directed spanning tree with a root corresponding to the robot that is the origin of the information to be transmitted.

This assumption implies that each edge of the spanning tree has a positive transmission rate (i.e., the information is propagated with some positive baseline transmission). In this scenario, it is clear that the broadcast of information will eventually reach every robot on the network with probability 1. Hence, the
main objective of our control problem can be viewed as an attempt to shape the statistics of the broadcast of information on the network to satisfy a given convergence criterion. It is worth noticing that most works on contact processes are concerned with regular lattices or mean-field models [17,18]. Also, the desired convergence rate is usually enforced through static strategies, targeting central nodes, and using heuristics [18,21]. With our control strategy, the network can adapt to the stochastic realization of the propagation of information and only expend resources to actuate during realizations that might interfere with the team's desired behavior.

With the broadcast model in (3), we control the transmission rates ω_ij(t) by applying allowable actions based on the current state of the network and on predicted future states, sampled at predefined times, following a sampled-data predictive controller [22]. The controller we study in this paper is defined by the following optimization problem:

$$\min_{\omega_{ij} \in \mathcal{U}} \; C(X_i(t)) \quad \text{s.t.} \quad 1 - \mathbb{E}\big[X_i(t + \Delta t) : X(t)\big] - \big(1 - X_i(t)\big)\, e^{-r \Delta t} \le 0, \qquad (4)$$

where E[X_i(t + Δt) : X(t)] is the prediction stage, which is computed using (3) (later we utilize a moment closure to compute its upper bounds), C(X_i(t)) is the cost of applying the control action ω_ij on the transmission between nodes i and j, Δt > 0 defines a window of prediction from the current instant t, the positive constant r defines a desired exponential convergence rate, and U represents the set of admissible control actions.

In practice, the set of admissible control actions should represent the operations that robots can perform to adjust their transmission rates among neighbors. For example, if the team of robots operates in geographically disjoint locations and the robots travel back and forth regularly to neighbors' sites carrying information with them, one could define an increase in the transmission rate as an increase in the frequency with which the agents visit each other. Another example is the case of periodic rendezvous, in which admissible actions could map the range of possible changes in the agents' periods.

We draw attention to the fact that ensuring the constraint in (4) plays a central role in guaranteeing the fulfillment of the desired performance. We notice that in the general case it might be difficult to guarantee feasibility of problem (4). Therefore, in this work we assume that there is an auxiliary control law that provides feasibility and that the controller has access to it. This assumption is not uncommon in the nonlinear model predictive control literature (see [6] and references therein); typically such an auxiliary controller is provided ad hoc. Nonetheless, in several applications of interest it is possible to find feasible, but costly, solutions to the problem. For example, in tasks where robots carry the message from site to site intermittently, we might satisfy the constraints by bringing all robots closer to each other; in the case of robots performing tasks on cyclic paths, we could make all robots cover overlapping paths. Hence, even though it is necessary to provide an auxiliary control strategy for each task, we notice that in the problem of broadcasting information
simply defining a feasible strategy is not necessarily interesting. Therefore, we seek to apply admissible control actions that do not lead to trivial connectivity on the network and that preserve satisfactory performance.
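As an illustration of how a single receding horizon step of (4) might be realized over a finite action set, consider the sketch below (our own naming; the action set and cost are application-dependent placeholders, and `generator` can be the `si_generator` sketched in Sect. 2.2):

```python
import numpy as np
from scipy.linalg import expm

def receding_horizon_step(actions, x, dt, r, cost, generator):
    """One step of problem (4): among a finite admissible set of rate
    matrices, keep those whose exact CTMC prediction satisfies the
    exponential-decay constraint and return the cheapest one."""
    n = len(x)
    s0 = sum(int(xi) << k for k, xi in enumerate(x))   # joint state index
    best, best_cost = None, np.inf
    for W in actions:
        v = np.zeros(2 ** n); v[s0] = 1.0
        v_dt = v @ expm(generator(W) * dt)             # prediction via (3)
        # E[X_i(t + dt) : X(t)]: mass of joint states with bit i set
        Ex = np.array([v_dt[[s for s in range(2 ** n) if (s >> i) & 1]].sum()
                       for i in range(n)])
        feasible = np.all(1 - Ex - (1 - np.asarray(x)) * np.exp(-r * dt) <= 1e-9)
        if feasible and cost(W) < best_cost:
            best, best_cost = W, cost(W)
    return best
```

For larger networks, the exact prediction above would be replaced by the moment-closure bounds of Sect. 3.3.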
3.2 Expected Time to Broadcast Information
The feasibility of the optimization in (4) guarantees that the information is transmitted throughout the network exponentially fast. However, a remaining open question is how long it will take until all robots in the network receive the message of interest, i.e., what is the hitting time for the stochastic process to reach the stationary state. In this section, we analyze this question and provide the expectation of the propagation time.

Given that we have the exact representation of the Markov chain that describes the underlying stochastic process, we could, in principle, compute the hitting times directly. However, since we solve the receding optimization problem (4), our Markov chain is non-homogeneous and its transition rates might change according to each realization. Therefore, we take advantage of the constraint in problem (4), relying on the previously discussed assumption that a feasible solution can always be found, and compute the expected time to broadcast information based on Watkins et al. (note that the method shown below is similar to Theorem 5 in [26]). Summing the constraint in (4) over the n nodes gives

$$n - \mathbb{E}\big[S(X(t + \Delta t))\big] \le \big(n - S(X(t))\big)\, e^{-r \Delta t},$$

where S(X(t)) = Σ_i X_i(t) is the number of informed robots at time t. The feasibility of the problem ensures that

$$n - \mathbb{E}\big[S(X(t))\big] \le \big(n - S(X(0))\big)\, e^{-r t}. \qquad (5)$$

Define the broadcast time as t_b = inf{t ≥ 0 : S(X(t)) = n}; then its expectation can be computed as

$$\mathbb{E}[t_b : X(0)] = \int_0^{\infty} \big(1 - \Pr(t_b \le \sigma)\big)\, d\sigma. \qquad (6)$$
Note that Pr{t_b ≤ t} = Pr{S(X(t)) = n}; therefore the equivalence 1 − Pr{t_b ≤ t} = Pr{S(X(t)) < n} holds, where Pr{S(X(t)) < n} is the probability of having any robot non-informed at time t. We use this equality and (5) to compute the expected time for the broadcast of information. Note that the right-hand side of (5) bounds the expected number of non-informed robots, therefore

$$\Pr\big\{S(X(t)) < n : X(0)\big\} \le \min\big\{1,\, (n - S(X(0)))\, e^{-r t}\big\},$$
hence, (6) yields

$$\mathbb{E}[t_b : X(0)] \le \int_0^{\infty} \min\Big\{1,\, (n - S(X(0)))\, e^{-r \lfloor \sigma / \Delta t \rfloor \Delta t}\Big\}\, d\sigma \le \tau_1 + \sum_{i = \tau_1 / \Delta t}^{\infty} (n - S(X(0)))\, \Delta t\, e^{-r i \Delta t} \le \tau_1 + \frac{e^{-r \tau_1}}{1 - e^{-r \Delta t}}\, (n - S(X(0)))\, \Delta t, \qquad (7)$$

with τ_1 = inf{t ≥ 0 : (n − S(X(t))) e^{−r ⌊t/Δt⌋ Δt} ≤ 1}.
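For intuition, the bound in (7) is cheap to evaluate numerically. The sketch below uses our own naming and, when locating τ_1, substitutes the worst case S(X(t)) = S(X(0)) as a deterministic surrogate:

```python
import numpy as np

def broadcast_time_bound(n, s0, r, dt):
    """Evaluate the upper bound (7) on E[t_b], assuming feasibility of
    (4) so that (5) holds; s0 = S(X(0)) is the initial informed count."""
    t = 0.0
    while (n - s0) * np.exp(-r * np.floor(t / dt) * dt) > 1.0:
        t += dt                                   # search for tau_1
    tau1 = t
    return tau1 + np.exp(-r * tau1) / (1.0 - np.exp(-r * dt)) * (n - s0) * dt

# e.g. the 15-robot scenario of Sect. 4 with r = 0.22, dt = 1, one seed node
print(broadcast_time_bound(n=15, s0=1, r=0.22, dt=1.0))
```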
3.3 Robust Moment Closure
In this section we consider a moment-closure technique to solve Problem 1 with a computationally amenable method. We reiterate that, while (3) is a theoretically exact representation of our problem, it is not computationally tractable even for medium-size networks, due to the need to use 2^n states to capture the evolution of the probabilities in the network. The technique considered in this section was introduced in [25] for the problem of continuous-time epidemics. The approach is based on the Fréchet inequalities, defined as

$$F_{\mathrm{lwr}}(\Pr\{A\}, \Pr\{B\}) = \max\{0, \Pr\{A\} + \Pr\{B\} - 1\}, \qquad (8)$$

$$F_{\mathrm{upr}}(\Pr\{A\}, \Pr\{B\}) = \min\{\Pr\{A\}, \Pr\{B\}\}, \qquad (9)$$

which are lower and upper bounds for the joint probability Pr{A, B}, respectively. Their main advantage over mean-field approximations is that they give meaningful bounds for the evolution of the true mean of the underlying stochastic process (see [25] for a detailed discussion).

Initially, we need to reformulate our representation to consider the marginal probabilities of each robot receiving the information, since we are interested in drawing bounds on their evolution. For our problem, the marginal probability can be written as the expectation of a robot becoming informed as a function of its neighbors:

$$\frac{d\,\mathbb{E}[X_i(t)]}{dt} = \sum_{j \in N_i} \mathbb{E}\big[\bar{X}_i(t)\, X_j(t)\big]\, \omega_{ij}(t), \qquad (10)$$

for all i ∈ V, where X̄_i(t) is the complement of X_i(t) (i.e., X̄_i(t) = 1 if X_i(t) = 0). This equation maps an infinitesimal probability of X_i receiving the information given that some neighbors X_j, j ∈ N_i, have the message. Notice that (10) is not closed; therefore we cannot solve it for E[X_i(t)] without equations for E[X̄_i(t) X_j(t)]. Central to the result in [25] is bounding the joint probabilities of the form E[X̄_i X_j]. Next, we show the optimal approximations for the expectations we employ, which can be expressed as closed-form functions.
Lemma 1 (Watkins et al. [25]). Let x̲(0) = X(0) = x̄(0), and define S(X) as the number of informed nodes in X. It holds that

$$\mathbb{E}[S(X(t)) : X(0)] \le \sum_{i \in V} \min\big\{\bar{x}_i^{\mathrm{inf}}(t),\; 1 - \underline{x}_i^{\mathrm{non}}(t)\big\}, \qquad (11)$$

and

$$\mathbb{E}[S(X(t)) : X(0)] \ge \sum_{i \in V} \max\big\{\underline{x}_i^{\mathrm{inf}}(t),\; 1 - \bar{x}_i^{\mathrm{non}}(t)\big\}, \qquad (12)$$

where the variables x̄_i^ℓ(t) and x̲_i^ℓ(t), for each ℓ ∈ L = {inf, non}, are the solutions of the following ordinary differential equations:

$$\dot{\bar{x}}_i^{\ell} = \sum_{j \in N_i,\, \ell \in L} F_{\mathrm{upr}}\big(B_{\bar{x}_i}(\bar{x}_i^{\ell}),\, \bar{x}_j^{\ell}\big)\, \omega_{ij}(t), \qquad (13)$$

$$\dot{\underline{x}}_i^{\ell} = \sum_{j \in N_i,\, \ell \in L} F_{\mathrm{lwr}}\big(\underline{x}_i^{\ell},\, \underline{x}_j^{\ell}\big)\, \omega_{ij}(t), \qquad (14)$$

and B_{x̄_i}(x̄_i) = min{1 − x̄_i, x̄_i} is the upper-bounding operator. In addition, the following inclusion is always satisfied: E[X_i(t) : X(0)] ∈ [x̲_i(t), x̄_i(t)] ⊆ [0, 1] for all t ≥ 0.

The result in Lemma 1 allows us to compute bounds for the evolution of the probabilities of the broadcast of information in the network using 4n differential equations (equations (13) and (14) for each informed and non-informed node), instead of 2^n. This enables us to apply the receding horizon strategy in equation (4) to larger networks.
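The mechanics of these bounding trajectories can be sketched in a few lines. The simplified code below uses our own names and collapses the inf/non compartment bookkeeping of [25] into a single pair of per-node bounds, replacing the joint term E[X̄_i X_j] in (10) by its Fréchet bounds:

```python
import numpy as np

def frechet_lwr(a, b):
    return max(0.0, a + b - 1.0)

def frechet_upr(a, b):
    return min(a, b)

def rmc_euler_step(x_lo, x_up, W, dt):
    """One Euler step of 2n lower / 2n upper bounding ODEs in the spirit
    of (13)-(14): x_lo[i] <= E[X_i(t)] <= x_up[i].  W[i, j] = omega_{ij}."""
    n = len(x_lo)
    lo, up = x_lo.copy(), x_up.copy()
    for i in range(n):
        # bound E[Xbar_i X_j]: Pr{Xbar_i} lies in [1 - x_up[i], 1 - x_lo[i]]
        d_up = sum(frechet_upr(1.0 - x_lo[i], x_up[j]) * W[i, j]
                   for j in range(n) if W[i, j] > 0.0)
        d_lo = sum(frechet_lwr(1.0 - x_up[i], x_lo[j]) * W[i, j]
                   for j in range(n) if W[i, j] > 0.0)
        up[i] = min(1.0, x_up[i] + dt * d_up)
        lo[i] = min(1.0, x_lo[i] + dt * d_lo)
    return lo, up
```

The cost per step is linear in the number of edges, which is what makes the 100-node experiment of Sect. 4 feasible.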
4 Numerical Evaluation and Discussions
In this section, we illustrate the approach proposed in Sect. 3 with two simulation results. The first simulation was carried out on a static topology of 15 robots. The topology of the network was generated using the Erdős–Rényi model with a connection probability of 0.3. We assumed that each robot has a uniform total transmission rate μ to distribute among its connected neighbors. In other words, an informed robot can transmit to its neighbors with different transmission rates ω_ij(t) such that the sum of its transmission rates over neighbors is limited by a fixed value μ. This parameter, along with the connection probability of the Erdős–Rényi model, has a large impact on the information propagation speed of the whole network. In this simulation, we set the total transmission rate μ to 2 for better comparison of the results. We applied the control strategy (4) proposed in Sect. 3.1 with the Continuous-Time Markov Chain (CTMC) representation and with the Robust Moment Closure (RMC) model, as described in Sect. 3.3. We can choose from a
myriad of cost functions that align with different design purposes. Here we chose C(X_i(t)) = Σ_{j∈N_i} ω_ij². If we were not seeking an exponential propagation, this objective function, together with the constraints on the transmission resources ω_ij(t) ∈ (0, μ) and Σ_{j∈N_i} ω_ij = μ for each i ∈ V, would push for an even distribution of every robot's resources among its neighbors. We chose Δt = 1 and r = 0.22 for both the CTMC and RMC models, and simulated our proposed controller, which guarantees an exponential propagation while minimizing the chosen cost. The original exact model was also simulated without any control method in the loop. We ran all three models 10 times each and took their averages as the performance results.

The performance comparison is shown in Fig. 2, and the numerical results at different stages of the propagation process are in Table 1. From Fig. 2a, the CTMC model with control outperforms the two other models; the RMC model with control is slightly slower than the CTMC, but still faster than the original model without control. The CTMC model was on average 39.6% faster than the open-loop model in informing all robots, and the RMC model was 27.3% faster than the open-loop model's completion time. Notice that the two patterns shown in the black and red lines in Fig. 2b are similar; the biggest difference between CTMC and RMC is 3 robots, which occurs around 2.5 s.
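As a side note on implementation, the admissible set U used above has to respect the budget constraints ω_ij(t) ∈ (0, μ) and Σ_{j∈N_i} ω_ij = μ. One simple way to build a finite U per node is to enumerate normalized grid shares, as in the following sketch (our own construction; the grid resolution is arbitrary):

```python
import numpy as np
from itertools import product

def candidate_allocations(num_neighbors, mu=2.0, levels=3):
    """Enumerate a finite admissible set of per-node rate assignments:
    each neighbor receives a grid share, normalized to the budget mu."""
    grid = np.linspace(0.1, 1.0, levels)
    seen = set()
    for shares in product(grid, repeat=num_neighbors):
        w = tuple(np.round(mu * np.array(shares) / np.sum(shares), 6))
        if w not in seen:
            seen.add(w)
            yield np.array(w)
```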
Fig. 2. Numerical results comparison of the 15-node network's broadcasting speed. In (a), the bar chart shows the total number of robots that received the information by the end of each 1.0833 s interval. The error bars represent the maximum and minimum numbers of informed robots over our 10 different datasets. In (b), the line chart shows the evolution of the difference in informed robots with respect to time. The red plot shows the difference between CTMC and the exact model, and the black plot shows the difference between RMC and the exact model.
We conducted the second simulation on a larger static topology network with 100 nodes. The CTMC model is not appropriate in this setting, as the generator evolves in R^{2^100 × 2^100}. The topology of the network is again generated using the Erdős–Rényi model, with a connection probability of 0.05 and total transmission rate μ = 4. We compared the performance between the open-loop model and the RMC model in broadcasting speed. The control strategy is implemented with Δt = 0.1 and exponential convergence rate r = 2.8. We ran 10 different trials for each model using random seeds and used their mean values as the numerical results.

Figure 3 shows the performance in information broadcasting for those two models. Notice that at the beginning, in Fig. 3a, before 0.3 s, there was not much difference between the two methods. This is because the constraint of the optimization problem was satisfied with the initial trivial solution, and there were not many actions in U that could further improve the information propagation during that period. This might be why there is a fluctuating pattern at the beginning in Fig. 3b. After that, the control strategy starts to show its advantage, and the average time to broadcast the information to all robots is 0.93 s, on average 218.7% faster than the open-loop network, which takes 2.98 s on average.

Fig. 3. Performance comparison of the 100-node network's broadcasting speed. In (a), the bar chart depicts the total number of informed robots after every 0.2609 s for both the RMC and exact models. The error bars indicate the variation in informed robots across 10 different simulations. In (b), the line plot represents the evolution of the difference in informed robots between RMC and the open-loop method.

Table 1. Average completion time (s) at each propagation stage

Topology   Model      20%     40%     60%     80%     100%
15 nodes   CTMC       2.0571  2.4084  2.7997  3.7673  4.5902
15 nodes   RMC        2.2174  2.7436  3.3202  4.1394  5.0324
15 nodes   Open-Loop  2.6085  3.1619  3.7573  5.1964  6.4092
100 nodes  RMC        0.2325  0.2849  0.3131  0.3525  0.9364
100 nodes  Open-Loop  0.2515  0.3343  0.4603  0.6428  2.9845
Both numerical examples show significantly better performance of our proposed methods in scenarios where a random node propagates information to the rest of the team. Notice that such scenarios are not usually seen in real-world applications. A faster propagation, however, indicates a more efficiently connected network, providing a tighter upper bound on the information transmission time between any pair of nodes in the network. The propagation models were therefore chosen to demonstrate that a more efficient transmission is expected along any shortest path connecting two nodes in the network. A limitation of this propagation model is that all shortest paths are considered equally important, while in real-world applications certain paths may have higher priority due to actual variations in demand. Our method can easily be tailored to fit different objective functions.
5 Conclusions and Future Work
We have investigated a control structure for shaping the transmission rates in networked systems, in order to increase connectivity in networks where the broadcast of information can be modeled as a stochastic jump process. As a first result, we have shown that by using an epidemiological model to represent the transmission of information among robots, we can describe the probability of future transmissions based on the current joint state of the network and its topological structure. Then, based on this finding, we examined the applicability of a receding horizon control strategy that accounts for possible drifts in performance in the current realization and, subsequently, actuates on transmission rates whenever necessary. This approach provides efficient adjustments to the network. Finally, numerical experiments were conducted, illustrating the effectiveness of the method.

Possible future work includes the extension of the strategy to distributed optimization. It is also essential to connect the robot dynamics with the changing transition rates. One possible approach to bridging those two models is through maps defined a priori, then accounted for in the admissible set of actions. Finally, our methodology only accounts for the expectation of the transmission of information, which can be a weak decision parameter; an extension considering the variance would be appealing.

Acknowledgements. We gratefully acknowledge the support of ARL DCIST CRA W911NF-17-2-0181 and Office of Naval Research (ONR) Award No. N00014-22-1-2157.
References

1. Bar-Noy, A., Schieber, B.: The Canadian traveller problem. In: Proceedings of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 261–270 (1991)
2. Brauer, F.: Compartmental models in epidemiology. In: Brauer, F., van den Driessche, P., Wu, J. (eds.) Mathematical Epidemiology, pp. 19–79. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78911-6_2
3. Breitenmoser, A., Schwager, M., Metzger, J.C., Siegwart, R., Rus, D.: Voronoi coverage of non-convex environments with a group of networked robots. In: 2010 IEEE International Conference on Robotics and Automation, pp. 4982–4989 (2010). https://doi.org/10.1109/ROBOT.2010.5509696
4. Cassandras, C.G.: Smart cities as cyber-physical social systems. Engineering 2(2), 156–158 (2016)
5. Cator, E., Van Mieghem, P.: Second-order mean-field susceptible-infected-susceptible epidemic threshold. Phys. Rev. E 85, 056111 (2012). https://doi.org/10.1103/PhysRevE.85.056111
6. Ellis, M., Durand, H., Christofides, P.D.: A tutorial review of economic model predictive control methods. J. Process Control 24(8), 1156–1178 (2014)
7. Gusrialdi, A., Hirche, S., Hatanaka, T., Fujita, M.: Voronoi based coverage control with anisotropic sensors. In: Proceedings of the IEEE American Control Conference, pp. 736–741 (2008)
8. Hollinger, G.A., Singh, S.: Multirobot coordination with periodic connectivity: theory and experiments. IEEE Trans. Rob. 28(4), 967–973 (2012)
9. Hsieh, M.A., Kumar, V., Chaimowicz, L.: Decentralized controllers for shape generation with robotic swarms. Robotica 26(5), 691–701 (2008)
10. Huang, L., Park, K., Lai, Y.C.: Information propagation on modular networks. Phys. Rev. E 73(3), 035103 (2006)
11. Khelil, A., Becker, C., Tian, J., Rothermel, K.: An epidemic model for information diffusion in MANETs. In: Proceedings of the 5th ACM International Workshop on Modeling Analysis and Simulation of Wireless and Mobile Systems, pp. 54–60 (2002)
12. Khodayi-Mehr, R., Kantaros, Y., Zavlanos, M.M.: Distributed state estimation using intermittently connected robot networks. IEEE Trans. Robot. 35(3), 709–724 (2019). https://doi.org/10.1109/TRO.2019.2897865
13. Knizhnik, G., Li, P., Yu, X., Hsieh, M.A.: Flow-based control of marine robots in gyre-like environments. In: 2022 IEEE International Conference on Robotics and Automation (ICRA). IEEE (2022)
14. Moarref, S., Kress-Gazit, H.: Automated synthesis of decentralized controllers for robot swarms from high-level temporal logic specifications. Auton. Robot. 44(3), 585–600 (2020)
15. Nikolova, E., Karger, D.R.: Route planning under uncertainty: the Canadian traveller problem. In: Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI 2008), pp. 969–974. AAAI Press (2008)
16. Øksendal, B., Sulem, A.: Applied Stochastic Control of Jump Diffusions, 2nd edn. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-69826-5
17. Paré, P.E., Beck, C.L., Nedić, A.: Epidemic processes over time-varying networks. IEEE Trans. Control Netw. Syst. 5(3), 1322–1334 (2018). https://doi.org/10.1109/TCNS.2017.2706138
18. Preciado, V.M., Zargham, M., Enyioha, C., Jadbabaie, A., Pappas, G.J.: Optimal resource allocation for network protection against spreading processes. IEEE Trans. Control Netw. Syst. 1(1), 99–108 (2014). https://doi.org/10.1109/TCNS.2014.2310911
19. Sahneh, D.F., Scoglio, C., Van Mieghem, P.: Generalized epidemic mean-field model for spreading processes over multilayer complex networks. IEEE/ACM Trans. Netw. 21(5), 1609–1620 (2013). https://doi.org/10.1109/TNET.2013.2239658
20. Schnoerr, D., Sanguinetti, G., Grima, R.: Comparison of different moment-closure approximations for stochastic chemical kinetics. J. Chem. Phys. 143(18), 185101 (2015). https://doi.org/10.1063/1.4934990
21. Shen, L., Yu, X., Hsieh, M.A.: Topology control of a periodic time-varying communication network with stochastic temporal links. In: 2022 American Control Conference (ACC), pp. 4211–4217. IEEE (2022)
22. Sopasakis, P., Patrinos, P., Sarimveis, H.: MPC for sampled-data linear systems: guaranteeing constraint satisfaction in continuous-time. IEEE Trans. Autom. Control 59(4), 1088–1093 (2014). https://doi.org/10.1109/TAC.2013.2285786
23. Twigg, J., Dagefu, F., Chopra, N., Sadler, B.M.: Robotic parasitic array optimization in outdoor environments. In: 2019 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pp. 1–8 (2019). https://doi.org/10.1109/SSRR.2019.8848974
24. Van Mieghem, P., Cator, E.: Epidemics in networks with nodal self-infection and the epidemic threshold. Phys. Rev. E 86, 016116 (2012). https://doi.org/10.1103/PhysRevE.86.016116
25. Watkins, N.J., Nowzari, C., Pappas, G.J.: A robust moment closure for general continuous-time epidemic processes. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 244–249 (2018). https://doi.org/10.1109/CDC.2018.8618664
26. Watkins, N.J., Nowzari, C., Pappas, G.J.: Robust economic model predictive control of continuous-time epidemic processes. IEEE Trans. Autom. Control 65(3), 1116–1131 (2020). https://doi.org/10.1109/TAC.2019.2919136
27. Wei, C., Yu, X., Tanner, H.G., Hsieh, M.A.: Synchronous rendezvous for networks of active drifters in gyre flows. In: Distributed Autonomous Robotic Systems, pp. 413–425. Springer, Cham (2019)
28. Yu, X., Hsieh, M.A.: Synthesis of a time-varying communication network by robot teams with information propagation guarantees. IEEE Robot. Autom. Lett. 5(2), 1413–1420 (2020)
29. Yu, X., Hsieh, M.A., Wei, C., Tanner, H.G.: Synchronous rendezvous for networks of marine robots in large scale ocean monitoring. Front. Robot. AI 6, 76 (2019)
30. Yu, X., Saldaña, D., Shishika, D., Hsieh, M.A.: Resilient consensus in robot swarms with periodic motion and intermittent communication. IEEE Trans. Robot. 38, 110–125 (2021). https://doi.org/10.1109/TRO.2021.3088765
31. Zanette, D.H.: Dynamics of rumor propagation on small-world networks. Phys. Rev. E 65(4), 041908 (2002)
Adaptation Strategy for a Distributed Autonomous UAV Formation in Case of Aircraft Loss

Tagir Muslimov(B)

Ufa State Aviation Technical University, Ufa, Russian Federation
[email protected]
Abstract. Controlling a distributed autonomous unmanned aerial vehicle (UAV) formation is usually considered in the context of recovering the connectivity graph should a single UAV agent be lost. At the same time, little attention is paid to how such a loss affects the dynamics of the formation as a system. To compensate for the negative effects, we propose an adaptation algorithm that reduces the increased interaction between the UAV agents that remain in the formation. This algorithm enables the autonomous system to adjust to the new equilibrium state. The algorithm has been tested by computer simulation on full nonlinear UAV models. Simulation results show that the negative effect (the increased final cruising speed of the formation) is completely eliminated.

Keywords: UAV Formation Flight · Fault-tolerant Formation Control · Drone Flocking · Formation Reconfiguration
1 Introduction
For a decentralized formation of unmanned aerial vehicles (UAVs), the formation needs to be capable of reconfiguration should one of the UAVs fail. Some papers approach this problem by developing algorithms to optimize the recovery or maintenance of connectivity. For instance, paper [1] investigates reconfiguring the communication topology to minimize the communication cost in case of the UAVs in the formation losing their links. When considering a UAV formation as a multiagent system, existing approaches become applicable, see, e.g., [2] for an overview. Paper [3] investigates the possibility of reaching consensus should agents stop communicating. Work [4] covers the effectiveness of a swarm intelligence in a multiagent system that loses its agents. Article [5] investigates controlling a multiagent system when links between agents are lost. Paper [6] proposes a novel approach that relies on a Negative Imaginary (NI) theory-based control algorithm that keeps a decentralized system of single integrators stable when several agents are lost. Work [7] covers an agent loss-tolerant multiagent system. One of the approaches covered therein consists in adding redundant links to make the group robust to agent loss.
Being a complex system, a UAV formation shapes the reconfiguration algorithms being developed for it. How it does so depends, for instance, on the mission being planned or on the dynamics of the aircraft themselves. Paper [8] presents a concept of fault-tolerant formation control for cases of UAV loss; the control strategy consists in adjusting the objective function so that the entire formation remains functional. Work [9] covers a situation of UAV loss and proposes an algorithm that performs priority ranking in order to choose a UAV to replace the lost one. Article [10] presents an approach where lost UAV agents are replaced on a priority basis in a formation caught in a natural disaster. Paper [11] covers operating a UAV formation in a wildfire; should a single UAV be lost, the proposed algorithm reconfigures the formation by adjusting for the lost links.

Numerous papers cover UAV formation control strategies in cases of actuator faults. For instance, work [12] demonstrates the use of UAV formations to tackle wildfires; it shows that cooperative tracking errors can be bounded even in the case of actuator faults. Article [13] investigates UAV formation stability by applying a Lyapunov function, assuming both actuator faults and communication interruptions. Adaptive control enabled the authors to compensate for actuator faults.

The above overview leads to the conclusion that existing publications pay little attention to how losing a UAV agent affects the dynamics of the entire formation. Of particular interest are cases where the system has completely decentralized control, i.e., each UAV agent relies exclusively on readings of the adjacent aircraft to steer itself. Yamaguchi et al. have developed one of the most effective control methods for distributed autonomous systems. Paper [14] was the first to propose a model that enabled completely decentralized control. The approach was further enhanced in a series of subsequent papers that showed this method enabling a group of autonomous robots to generate a variety of shapes [15] as well as to encircle a target [16]. Yamaguchi states this approach is based on formation vectors: each agent in a formation has its own formation vector that is calculated from the adjacent agents' readings. Work [17] also showed that when autonomous robotic agents are configured to maintain different distances to their neighbors, the overall stability of the formation can be maintained by running a special adaptation algorithm. However, this cycle of papers did not focus on losing an agent from the formation. Besides, they only covered linear integrators as agents. Paper [18] proposed an adaptation algorithm for a decentralized system of linear oscillators, which also covered the case of losing a single oscillator. This approach can be modified for use in a system of autonomous robots, including flying ones (UAVs).

Thus, the novelty hereof lies in that we (i) investigate the feasibility of implementing an agent-loss adaptation algorithm in a completely decentralized/distributed autonomous system, from the standpoint of its dynamics; and (ii) test the proposed algorithm on full nonlinear UAV models engaged in cooperative target tracking in a circular formation.
2 Adaptation Algorithm for a Decentralized UAV Formation

2.1 Distributed Autonomous System Model
A decentralized system usually consists of interchangeable, locally interconnected systems. The structural similarity of such systems to their biological counterparts is what makes researchers believe these systems could be flexible and adaptable. This gives rise to two problems: controlling the entire system with only local data at hand, and implementing adaptation mechanisms. Of particular interest is the development of a fault-tolerant algorithm to adapt the UAV formation to the loss of a faulty UAV agent when tracking a ground target.

This model is based on the work of Japanese researchers [18–20], who developed an adaptation algorithm for the locomotion model of an oscillatory network of motor neurons in a biological organism; they also developed a multi-cylinder engine model [21]. These papers consider locally interconnected oscillators that maintain relative phase angles to create the locomotion rhythms that control walking, running, swimming, and other types of organism locomotion. In a target-tracking decentralized UAV formation, UAV agents, too, communicate locally and maintain the preset relative phase angles (phase shifts) of orbiting the target. This is why this model, when further modified and developed, could become the foundation for an adaptive UAV formation reconfiguration algorithm for fault-tolerant control.
Fig. 1. Losing one UAV in a formation due to failure
We further describe a distributed autonomous system per [18]. The system consists of m interchangeable subsystems {S_i}_{i=1,...,m}. Interchangeability means that the dynamics of such subsystems can be described with differential equations of the same form. For analysis, consider a subsystem state written as the state variable q_i. This could be the phase angle of orbiting the target in a formation. Consider the vector Q = [q_1, ..., q_m]^T. This vector will be further used as the vector of states. For example, in the case of a UAV formation, the vector Q can be a vector of the phase angles mentioned earlier. In a decentralized formation, UAVs do not have one-to-all communication, i.e., a single UAV agent will not be able to communicate with all other UAV
agents in the formation. In other words, the dynamics of a UAV agent contains the state variables of a limited set of adjacent UAV agents. Interaction refers herein to receiving the state readings of the adjacent UAV agents. Let N(q_i) = {q_l | q_l is interacting with q_i} be the set of such adjacent UAVs. Each pair of interacting UAVs, e.g., S_i and S_j, has a linear functional dependency relationship p_k = p_k(q_i, q_j), where k = 1, ..., n are the ordinal numbers of the interactions. The difference between the respective subsystem states is an example of such a functional dependency. Consider the vector P = [p_1, ..., p_n]^T. In the case of a decentralized UAV formation, this vector can be regarded as a vector of current phase shifts in circular motion around the target object. The functional dependencies in the formation can be written as the equation P = LQ, where L ∈ R^{n×m}.

Control consists in finding subsystem dynamics q̇_i = f_i(N(q_i)) such that the functional dependencies P converge to the desired values P_d = [p_{d1}, ..., p_{dn}]^T. In the case of a decentralized UAV formation, the vector P_d can be considered a vector of desired phase shifts in circular motion around the target object. For brevity, let us refer to the desired geometric shape of the UAV formation (which depends on P_d) as the pattern. An important assumption here is that the desired values of the functional dependencies P_d are attainable, i.e., there is a vector Q such that the condition LQ = P_d holds. Write this condition as (I − LL^+) P_d = 0, where L^+ is the pseudoinverse of the matrix L and I is the identity matrix.

UAV formation control applies when the vector P converges to the vector P_d, i.e., each element of the vector P converges to the correspondingly numbered element of the vector P_d. However, the vector P_d is not always attainable, for a variety of reasons. For instance, it will not be attainable if one or more UAVs in the formation fail. Thus, creating an adaptation (or adaptive reconfiguration) algorithm boils down to developing a P_d tuning algorithm.
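As a concrete check of the attainability condition, consider the minimal sketch below (our own naming), using for illustration the open-chain interaction matrix of the 4-UAV example considered later in this paper:

```python
import numpy as np

# Open-chain interaction matrix of the 4-UAV example (pairs 1-2, 2-3, 3-4),
# mapping states Q to functional dependencies P = L Q.
L = np.array([[-1., 1., 0., 0.],
              [ 0., -1., 1., 0.],
              [ 0., 0., -1., 1.]])

def pattern_attainable(L, Pd, tol=1e-9):
    """Check the realizability condition (I - L L^+) Pd = 0."""
    residual = (np.eye(L.shape[0]) - L @ np.linalg.pinv(L)) @ Pd
    return np.linalg.norm(residual) < tol

print(pattern_attainable(L, np.array([2*np.pi/3, 9*np.pi/13, 18*np.pi/29])))
```

For this full-row-rank L, any P_d is attainable and the check prints True; losing a UAV changes L, and a P_d designed for the original team size can then fail the check, which is exactly the situation the adaptation algorithm addresses.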
Fig. 2. Reconfiguring a decentralized system by means of an adaptation algorithm. Figure partly adapted from [18]
Yuasa and Ito published a series of papers [19,20] in which they solved the above-mentioned control problem of determining the subsystem dynamics q̇_i = f_i(N(q_i)) for oscillators by proving a number of theorems. However, their solution was developed with linear agents in mind and would require substantial modification for use on nonlinear agents. To begin with, the equation q̇_i = f_i(N(q_i)) can be rewritten in vector form as Q̇ = F(Q), where F = [f_1, ..., f_m]^T. The equation can be transformed as Ṗ = LQ̇ = LF(Q).

Theorem 1 (Yuasa and Ito) [19]. The dynamics of P in the equation Ṗ = LQ̇ = LF(Q) can be described as an autonomous system if and only if F in the equation Q̇ = F(Q) satisfies the condition

$$L\, \frac{\partial F}{\partial Q}\, (I - L^{+} L) = 0.$$

Theorem 2 (Yuasa and Ito) [19]. If the dynamics of P can be described as a gradient system with potential function V(P), i.e., Ṗ = −∂V/∂P, then F can be written as

$$F = \frac{\partial V_X(X)}{\partial X} + (I - L^{+} L)\, Q',$$

where X = −L^T P, V_X(X) = V_X(−L^T P) = V(P), and Q' is an arbitrary vector with the same dimensionality as Q. Conversely, if F can be described by the equation above, there exists a potential function V(P) such that the dynamics of P is described by Ṗ = −∂V/∂P.

Suppose that the subsystem dynamics can be described as q̇_i = f̄_i + f̃_i(x_i), where [f̄_1, ..., f̄_m]^T = F̄ ∈ ker L and [f̃_1, ..., f̃_m]^T = F̃ ∈ (ker L)^⊥. The dynamics of P is determined only by the summand f̃_i(x_i), since LF̄ = 0. Given this, write the dynamics of P as Ṗ = LF̃. Paper [18] also proves the following theorem.

Theorem 3 [18]. If f̃_i(x_i) satisfies the following conditions: 1. f̃_i(x_i) = 0 at the point x_i = x_{di}; 2. ∂f̃_i(x_i)/∂x_i |_{x_i = x_{di}} > 0, then X = X_d (here X_d = −L^T P_d) will be one of the equilibrium states. Besides, if the following also holds: 3. f̃_i(x_i)·(x_i − x_{di}) > 0, then X = X_d becomes the only equilibrium state.

The authors of [19] prove this theorem under the assumption that the dynamics of P forms a gradient system. When all three conditions of this theorem are satisfied, X = X_d is the global minimum of the potential function V(X) = Σ_{i=1}^{m} ∫ f̃_i(x_i) dx_i. Below is a variant of dynamics satisfying this theorem:

$$\dot{q}_i = \bar{f}_i + f_i^{+}(x_i - x_{di}), \qquad (1)$$

where f_i^{+}(·) is an odd increasing function.

2.2 Adaptation Strategy for a Distributed Autonomous System
In paper [18], the authors refer to losing one or more oscillators as an environment variation. In a UAV formation, it can be referred to as a structural variation in the formation. Should the formation lose a single UAV as shown in Fig. 1, the desired phase shifts between the UAVs will be altered, as they are assumed to depend on the total number of UAVs in the formation. In other words, the pattern Pd configured before losing a faulty UAV becomes unattainable. Pattern
P_d being unattainable means there is no vector Q that would satisfy the equation LQ = P_d. From the above relations X_d = −L^T P_d and X = −L^T P, we derive X − X_d = −L^T (P − P_d). Apparently, X − X_d = 0 is necessary but insufficient for P − P_d = 0. In other words, if P − P_d ≠ 0 and also P − P_d ∈ ker L^T, then X − X_d = 0. Thus, P_d needs to be adaptively tuned. Such tuning involves estimating the current state of the system. From the engineering standpoint, it is reasonable to use the function

$$E = \frac{1}{2} \sum_{i=1}^{n} (p_i - p_{di})^2. \qquad (2)$$
(3)
Here, k is the ordinal number of the interaction, i.e., assuming that in the matrix L, its elements Lki and Lkj are non-zero, that means the UAVs numbered i and j interact as subsystems in the formation. Based on (3), the second term of the Eq. (1) can be written as [18] n fi+ (xi − xdi ) = fi+ Iki . k=1
Using (3), the objective function (2) can be transformed as follows [18]: E=
m n 1 1 i 2 I . 4 i=1 L2ki k k=1 Lki =0
Apparently, minimizing the function E, we minimize the interactions Iki between the subsystems. The key point of such adaptive reconfiguration is the separation of dynamics on the time scale, i.e., the system dynamics needs to be much faster than the adaptive tuning dynamics. Reason being, adaptation requires estimating the current state of the system. Before a pattern can be adjusted and a new variation thereof can be made, the current pattern (variation) needs to be suitabilitytested. Let us show how this approach could be used on a decentralized UAV formation when tracking a target. Consider a 4-UAV formation orbiting the target at a preset radius. Control aims at maintaining a predefined phase shift between UAVs numbered i and i + 1. The UAVs are engaged in an “open-chain” interaction, i.e., 1–2, 2–3, 3–4. This phase shift is denoted as pi,i+1 = ϕi+1 − ϕi ,
Let us show how this approach could be used on a decentralized UAV formation when tracking a target. Consider a 4-UAV formation orbiting the target at a preset radius. Control aims at maintaining a predefined phase shift between UAVs numbered i and i + 1. The UAVs are engaged in an "open-chain" interaction, i.e., 1–2, 2–3, 3–4. This phase shift is denoted as p_{i,i+1} = ϕ_{i+1} − ϕ_i, where ϕ_* is the phase angle of a UAV orbiting the target, with the subscript for the UAV's ordinal number in the formation. As this is a four-UAV group, i = 1, 2, 3, 4. Phase shifts should converge to the desired values determined by the pattern vector P_d = [D_1, D_2, D_3]^T = [2π/3, 9π/13, 18π/29]^T. Then we can obtain P = LQ with

$$L = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix},$$

where P = [p_{12}, p_{23}, p_{34}]^T and Q = [ϕ_1, ϕ_2, ϕ_3, ϕ_4]^T. The kernel of the matrix L in this case is spanned by [1, 1, 1, 1]^T, and from the relation X = −L^T P = −L^T LQ it can be found that

$$X = \begin{bmatrix} p_{12} \\ -p_{12} + p_{23} \\ -p_{23} + p_{34} \\ -p_{34} \end{bmatrix} = \begin{bmatrix} -\phi_1 + \phi_2 \\ \phi_1 - 2\phi_2 + \phi_3 \\ \phi_2 - 2\phi_3 + \phi_4 \\ \phi_3 - \phi_4 \end{bmatrix}.$$

The UAV formation dynamics in the case of encircling a target is defined as

$$\dot{Q} = \begin{bmatrix} \dot\phi_1 \\ \dot\phi_2 \\ \dot\phi_3 \\ \dot\phi_4 \end{bmatrix} = \begin{bmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \\ \omega_4 \end{bmatrix} + \begin{bmatrix} v_f \frac{2}{\pi\rho_1} \arctan\big(k_\theta (p_{12} - D_1)\big) \\ v_f \frac{2}{\pi\rho_2} \arctan\big(k_\theta (-p_{12} + p_{23} + D_1 - D_2)\big) \\ v_f \frac{2}{\pi\rho_3} \arctan\big(k_\theta (-p_{23} + p_{34} + D_2 - D_3)\big) \\ v_f \frac{2}{\pi\rho_4} \arctan\big(k_\theta (-p_{34} + D_3)\big) \end{bmatrix},$$

where ω_* and ρ_* are the angular velocity of a UAV orbiting the target and the radius of such an orbit, respectively; subscripts match the UAVs' ordinal numbers in the formation. Applying the nonlinear arctan(·) function limits the minima and maxima of the speed control signal; thus, it avoids the undesirable integral windup that might cause the UAV formation to go unstable. Here v_f is a positive constant that determines the maximum added speed, and k_θ is a positive constant that determines the smoothness with which a UAV reaches its orbit.

The dynamics of the pattern, Ṗ, once the uniform radius ρ is attained, is written as follows, since ω_1 = ω_2 = ω_3 = ω_4 = v/ρ, where v is the final cruising linear speed of the formation:

$$\dot{P} = \begin{bmatrix} \dot{p}_{12} \\ \dot{p}_{23} \\ \dot{p}_{34} \end{bmatrix} = \frac{2 v_f}{\pi\rho} \begin{bmatrix} -\arctan\big(k_\theta (p_{12} - D_1)\big) + \arctan\big(k_\theta (-p_{12} + p_{23} + D_1 - D_2)\big) \\ -\arctan\big(k_\theta (-p_{12} + p_{23} + D_1 - D_2)\big) + \arctan\big(k_\theta (-p_{23} + p_{34} + D_2 - D_3)\big) \\ -\arctan\big(k_\theta (-p_{23} + p_{34} + D_2 - D_3)\big) + \arctan\big(k_\theta (-p_{34} + D_3)\big) \end{bmatrix}.$$

We propose the following adaptation algorithm:

$$\dot{p}_{di} = a_s\, f_{\mathrm{sigm}}\big(\tau_p (p_i - p_{di})\big), \qquad (4)$$
where 0 < τ_p < 1; f_sigm is a sigmoid odd function, e.g., an arctangent or hyperbolic tangent; and a_s is a positive coefficient that sets the maxima and minima of a_s f_sigm. For example, if an arctangent is picked for f_sigm, one can take a_s = 2/π.
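To illustrate how (4) interacts with the phase dynamics, here is a minimal Euler simulation (a sketch only: the gains v_f and k_θ, the time step, and the initial phases are illustrative choices of ours, not the paper's MATLAB/Simulink setup):

```python
import numpy as np

def simulate_formation(T=60.0, dt=0.01, v=12.0, rho=100.0,
                       vf=2.0, k_theta=5.0, a_s=2/np.pi, tau_p=0.1,
                       adapt=True):
    """Euler simulation of the 4-UAV phase dynamics (after the common
    orbit radius rho is reached) together with the adaptation law (4)."""
    L = np.array([[-1., 1., 0., 0.],
                  [0., -1., 1., 0.],
                  [0., 0., -1., 1.]])
    phi = np.array([0.0, 0.4, 1.1, 1.9])             # initial phase angles
    pd = np.array([2*np.pi/3, 9*np.pi/13, 18*np.pi/29])
    omega = v / rho                                   # common angular rate
    for _ in range(int(T / dt)):
        p = L @ phi                                   # current phase shifts
        corr = -L.T @ (p - pd)                        # x - x_d, per UAV
        phi = phi + dt * (omega + vf * (2.0 / (np.pi * rho))
                          * np.arctan(k_theta * corr))
        if adapt:                                     # adaptation law (4)
            pd = pd + dt * a_s * np.arctan(tau_p * (p - pd))
    return L @ phi, pd
```

Losing a UAV corresponds to deleting the corresponding row and column of L; rerunning with adapt=True then retunes P_d toward the new equilibrium, which is what removes the residual cruising-speed offset reported in the simulation section below.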
3 Simulation Results and Discussion
A 4-UAV simulation model based on full nonlinear UAV models has been developed in MATLAB/Simulink. For details on the models, refer to the monograph [22]. The simulation parameters are the same as in paper [23]. Below are the results of the simulation modeling. Figure 3a shows the trajectories of a 4-UAV formation in the case of a stationary target. Figure 3b shows the changes in phase shift errors. Apparently, these errors reach zero over time. There are three phase shift errors, as the UAVs in the formation use open-chain interaction.

Stability can be proved as follows, similarly to [18]. Let the function E in (2) be the Lyapunov function. Then the derivative of the Lyapunov function E along the trajectories of the system (4), with an arctangent for f_sigm, is written as

$$\dot{E} = -\sum_{i=1}^{n} \frac{2}{\pi}\, (p_i - p_{di}) \arctan\big(\tau_p (p_i - p_{di})\big) < 0, \quad \forall\, (p_i - p_{di}) \neq 0.$$
Here the quasi-stationarity condition is adopted, i.e., $p_i \approx \mathrm{const}$, and the system manages to attain a new state of equilibrium after losing one or more UAVs. As mentioned earlier, the adaptation algorithm has much slower dynamics than the system. Figure 4a shows how the phase shift errors change; they do not reach zero because losing UAV No. 3 makes the pattern unattainable. However, the graphs also show that the system reaches a new state of equilibrium. There are two phase shift errors here because the UAVs use open-chain interaction, and losing one UAV makes it a 3-UAV formation. As noted earlier, the new equilibrium state exists as a result of interaction balancing. This can be seen in the graph of UAV speeds when the formation loses one unit (UAV No. 3), see Fig. 4b. The remaining UAVs, namely UAV No. 1 and UAVs No. 2 and No. 4 (the lost UAV flew between the latter two), fail to attain the initially preconfigured cruising speed of 12 m/s. This failure to reach the required cruising speed is due to the continuous inter-UAV interaction that keeps the system in the new equilibrium state. Cruising at a speed other than the initially configured value is strongly undesirable for two reasons: (i) it may jeopardize the mission that the initial cruising speed was chosen for; (ii) it may be resource-suboptimal, as the initial cruising speed was optimized for the UAV's aerodynamics to maximize fuel efficiency. Thus, reducing interaction within the UAV formation during adaptive reconfiguration is directly applicable to regaining the initially configured cruising speed. We further implemented the adaptation algorithm (4) with the coefficient $\tau_p = 0.1$. As shown in Fig. 5a, this made every UAV in the formation reach its preconfigured final cruising speed of 12 m/s.
Figure 5b shows the change in the phase shifts between UAV No. 1 and UAV No. 2, and between UAV No. 2 and UAV No. 4. The new state of equilibrium differs from that of the 4-UAV formation. To further illustrate the performance of the adaptation algorithm, we conducted simulations with other initial positions of the UAV agents in the formation. This time, we used the following initial positions in the inertial coordinate system: $[230, 750, 100]^T$, $[150, 360, 100]^T$, $[810, 770, 100]^T$. Here, the order in which the initial position vectors are listed corresponds to the ordinal numbers of the UAVs in the formation. The desired relative phase shift angles and other simulation parameters were chosen the same as in the first scenario described above. Figure 6a shows how the UAV speeds changed in the absence of the adaptation algorithm, while Fig. 6b shows how they changed with the adaptation algorithm running. The graphs show that the adaptation eliminates the increase in the final cruising speed of the formation.
Fig. 3. (a) 4-UAV formation trajectories; (b) relative phase errors in a 4-UAV formation
Fig. 4. (a) Phase shift errors in the resulting 3-UAV formation after losing 3rd UAV; (b) UAV speeds in the resulting 3-UAV formation after losing 3rd UAV
Fig. 5. (a) Adaptation algorithm performance in case of adaptively reconfiguring a UAV formation after losing a single UAV; (b) changes in phase shift during post-loss adaptive reconfiguration
Fig. 6. (a) UAV speeds in the resulting 3-UAV formation after losing 3rd UAV; (b) adaptation algorithm performance in case of adaptively reconfiguring a UAV formation after losing a single UAV
4 Conclusions
The adaptation strategy presented herein eliminates a negative effect that completely decentralized UAV formations tend to exhibit when losing one or more UAVs: in such a situation, the final cruising speed of the formation increases. If a faulty UAV can be taken out of the formation in advance by means of fault diagnosis, the proposed approach allows the overall mission to continue while maintaining high safety. The formation reconfiguration approach shown here could be applied in more diverse UAV formation control scenarios, and the adaptation algorithms can be modified further. This adaptation scenario also necessitates designing a fault diagnostics module for UAVs; future work will present its models.
Acknowledgements. This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Agreement No. 075-15-2021-1016).
References
1. Wang, G., Luo, H., Hu, X., Ma, H., Yang, S.: Fault-tolerant communication topology management based on minimum cost arborescence for leader-follower UAV formation under communication faults. Int. J. Adv. Robot. Syst. 14, 1–17 (2017)
2. Yang, H., et al.: Fault-tolerant cooperative control of multiagent systems: a survey of trends and methodologies. IEEE Trans. Ind. Inform. 16, 4–17 (2020)
3. Rezaee, H., Parisini, T., Polycarpou, M.M.: Almost sure resilient consensus under stochastic interaction: links failure and noisy channels. IEEE Trans. Autom. Control 66, 5727–5741 (2021)
4. Mann, V., Sivaram, A., Das, L., Venkatasubramanian, V.: Robust and efficient swarm communication topologies for hostile environments. Swarm Evol. Comput. 62, 100848 (2021)
5. Abel, R.O., Dasgupta, S., Kuhl, J.G.: The relation between redundancy and convergence rate in distributed multi-agent formation control. In: 2008 47th IEEE Conference on Decision and Control, pp. 3977–3982. IEEE (2008)
6. Hu, J., Lennox, B., Arvin, F.: Robust formation control for networked robotic systems using negative imaginary dynamics. Automatica 140, 110235 (2022)
7. Alireza Motevallian, S., Yu, C., Anderson, B.D.O.: On the robustness to multiple agent losses in 2D and 3D formations. Int. J. Robust Nonlinear Control 25, 1654–1687 (2015)
8. Slim, M., Saied, M., Mazeh, H., Shraim, H., Francis, C.: Fault-tolerant control design for multirotor UAVs formation flight. Gyroscopy Navig. 12, 166–177 (2021)
9. Huang, S., Teo, R.S.H., Kwan, J.L.P., Liu, W., Dymkou, S.M.: Distributed UAV loss detection and auto-replacement protocol with guaranteed properties. J. Intell. Robot. Syst. 93, 303–316 (2019)
10. Aminzadeh, A., Khoshnood, A.: A novel distributed fault-tolerant cooperative control approach for auto-reconfiguration of a multi agent system in natural disasters. In: 2021 9th RSI International Conference on Robotics and Mechatronics (ICRoM), pp. 163–170. IEEE (2021)
11. Ghamry, K.A., Zhang, Y.: Fault-tolerant cooperative control of multiple UAVs for forest fire detection and tracking mission. In: 2016 3rd Conference on Control and Fault-Tolerant Systems (SysTol), pp. 133–138 (2016)
12. Yu, Z., Zhang, Y., Jiang, B., Yu, X.: Fault-tolerant time-varying elliptical formation control of multiple fixed-wing UAVs for cooperative forest fire monitoring. J. Intell. Robot. Syst. 101, 48 (2021)
13. Han, B., Jiang, J., Yu, C.: Distributed fault-tolerant formation control for multiple unmanned aerial vehicles under actuator fault and intermittent communication interrupt. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 235, 1064–1083 (2021)
14. Yamaguchi, H., Arai, T.: Distributed and autonomous control method for generating shape of multiple mobile robot group. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 2, pp. 800–807 (1994)
15. Yamaguchi, H., Beni, G.: Distributed autonomous formation control of mobile robot groups by swarm-based pattern generation. Distrib. Auton. Robot. Syst. 2, 141–155 (1996)
16. Yamaguchi, H.: A distributed motion coordination strategy for multiple nonholonomic mobile robots in cooperative hunting operations. Robot. Auton. Syst. 43, 257–282 (2003)
17. Yamaguchi, H., Arai, T., Beni, G.: A distributed control scheme for multiple robotic vehicles to make group formations. Robot. Auton. Syst. 36, 125–147 (2001)
18. Ito, S., Yuasa, H., Ito, M., Hosoe, S.: On an adaptation in distributed system based on a gradient dynamics. In: IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No. 99CH37028), pp. 200–205. IEEE (1999)
19. Yuasa, H., Ito, M.: Coordination of many oscillators and generation of locomotory patterns. Biol. Cybern. 63, 177–184 (1990)
20. Ito, M.: Trends and perspective of researches on control system theory. Distrib. Auton. Robot. Syst. 2, 3–11 (1996)
21. Ito, S., Sasaki, M., Fujita, Y., Yuasa, H.: A circularly coupled oscillator system for relative phase regulation and its application to timing control of a multicylinder engine. Commun. Nonlinear Sci. Numer. Simul. 15, 3100–3112 (2010)
22. Beard, R.W., McLain, T.W.: Small Unmanned Aircraft: Theory and Practice. Princeton University Press, Princeton (2012)
23. Muslimov, T.Z., Munasypov, R.A.: Coordinated UAV standoff tracking of moving target based on Lyapunov vector fields. In: 2020 International Conference Nonlinearity, Information and Robotics (NIR), pp. 1–5. IEEE (2020)
DGORL: Distributed Graph Optimization Based Relative Localization of Multi-robot Systems
Ehsan Latif and Ramviyas Parasuraman(B)
School of Computing, University of Georgia, Athens, GA 30602, USA
{ehsan.latif,ramviyas}@uga.edu
https://github.com/herolab-uga/DGORL
Abstract. An optimization problem is at the heart of many robotics estimation, planning, and optimal control problems. Several attempts have been made at model-based multi-robot localization, and a few have formulated the multi-robot collaborative localization problem as a factor graph problem to be solved through graph optimization. Here, the optimization objective is to minimize the errors of the relative location estimates in a distributed manner. Our novel graph-theoretic approach to solving this problem consists of three major components: (connectivity) graph formation, expansion through a transition model, and optimization of relative poses. First, we estimate the relative pose-connectivity graph using the received signal strength between connected robots, indicating the relative ranges between them. Then, we apply a motion model to formulate graph expansion and optimize the graphs using g2o graph optimization as a distributed solver over dynamic networks. Finally, we theoretically analyze the algorithm and numerically validate its optimality and performance through extensive simulations. The results demonstrate the practicality of the proposed solution compared to a state-of-the-art algorithm for collaborative localization in multi-robot systems. Keywords: Multi-Robot · Localization · Graph Theory · Distributed Optimization
1 Introduction
The estimation of relative pose, including position and orientation [1], for multi-robot systems (MRS) [2] is the foundation for higher-level tasks like collision avoidance, cooperative transportation, and object manipulation. Motion capture systems [3], ultra-wideband (UWB) systems with anchors, and RTK-GPS systems are a few examples of established multi-robot relative positioning solutions that currently rely on the deployment of physical anchors or base stations in the application. These solutions, however, are not suitable for large areas or indoor settings where it is difficult to install such infrastructure, which limits the overall performance and application possibilities of multi-robot systems and makes their use more difficult and expensive. Furthermore, extraction of range and bearing measurements from cameras and visual markers, while
another practical approach, has the drawbacks of a small field of view, short range, occlusion by nearby objects, and potentially high computational cost. The use of distance measurements from sensors like radars, Lidars, and UWB to achieve relative localization, on the other hand, has recently attracted significant interest. The multi-robot relative localization (MRL) problem, which refers to detecting and locating the relative configurations of mobile agents (typically with sparse sensor data such as relative range or bearing) with respect to other agents or landmarks, is critical in MRS because it is required for robot teaming and swarming [4, 5]. As a result, many applications are frequently confronted with the relative localization problem, including formation control, cooperative transportation, perimeter surveillance, area coverage, and situational awareness. Relative localization and mapping (aka multi-robot SLAM) is an extension of the MRL problem. While several researchers have proposed novel solutions to the multi-robot map merging problem using pose graph matching and optimization techniques, they rely on extensive sensor data inputs (such as point clouds or Lidar scans) [6–8]. Therefore, solving the MRL problem with relative range or bearing in a distributed manner is desirable and scalable in MRS [9].
Fig. 1. Overview of the configuration space of a multi-robot system, with robots sharing their pose (x_i) and relative range (R_ij) measurements in our DGORL solution.
Distributed optimization is the problem of minimizing a joint objective function that is the sum of many local objective functions, each corresponding to a computational node. Many fundamental activities in this area can be modeled as distributed optimization problems, which has significant implications for multi-robot systems. Examples include cooperative estimation [10], multiagent learning [11], and collaborative motion planning. The distributed optimization formulation provides a versatile and effective paradigm for creating distributed algorithms for numerous multi-robot problems. In consumer electronics, Wi-Fi is one of the most extensively utilized wireless technologies for indoor wireless networks. The Received Signal Strength Indicator (RSSI), measured from an Access Point (AP) or a wireless sensor node (WSN), is ubiquitously available on such inexpensive commercial devices. The RSSI value can be used in various applications, including relative localization [12–14], cooperative control [15, 16], and communication optimization [17]. In this paper, we formulate the MRL problem as a graph optimization problem and solve it in a distributed manner using a polynomial-time optimizer called the General
Graph Optimization framework (g2o [18]). g2o is an open-source graph-based framework for handling nonlinear error problems; it is used to optimize global measurement poses using the initial global measurement poses and local relative pose constraints. Our solution, termed DGORL, aims to achieve high localization accuracy efficiently in a distributed fashion. DGORL forms relative position-weighted connectivity graphs using RSSI as local sensor data, then expands the graphs based on the possible positions at an instant, and optimizes them to obtain relative position estimates for all connected robots. See Fig. 1 for an overview of the configuration space of DGORL. The main contributions of this paper are listed below.
1. A novel distributed, efficient, and precise relative localization system based on shared inertial measurements and RSSI inputs from connected robots.
2. A position-weighted connectivity graph construction and optimization strategy tailored specifically for obtaining reliable relative pose estimates.
3. Theoretical and numerical analysis to evaluate the performance of the algorithm.
4. Validation of the accuracy and efficiency of DGORL compared to a recent collaborative multi-robot localization algorithm [19], which used the covariance intersection technique to address the temporal correlation between received signals.
5. Open-sourcing of the code (http://github.com/herolab-uga/DGORL) for use and improvement by the research community.
2 Related Work
Most recent solutions to the simultaneous localization and mapping (SLAM) and MRL problems are based on graph optimization (i.e., all robot poses and landmark positions compose the graph's nodes, while each edge encodes a measurement constraint) [18]. A conventional graph formulation, however, may suffer from unbounded processing and memory complexity, which can grow continually over time. This is because new robot poses (and new landmarks, in the case of feature-based SLAM) are constantly being added to the graph, increasing the number of nodes over time; additionally, if frequent loop-closing events occur in SLAM, loop-closure constraints (edges) can significantly increase the graph density [20]. For example, this could be the case if a service robot works for an extended time inside an office building. In particular, graph optimization and factoring have recently been proposed in the literature to solve different variants of the MRL problem [21–23]. Even though the issue of reducing the complexity of graph optimization has recently been addressed [24, 25], to the best of our knowledge, little work has yet explicitly taken into account estimation consistency (i.e., unbiased estimates and an estimated covariance greater than or equal to the actual covariance [26]) in the design of graph reduction (sparsification) schemes. This is a critical flaw: if an estimator is inconsistent, the accuracy of the derived state estimates is unclear, making it untrustworthy [27]. Moreover, the performance and efficiency of approaches to the localization problem in dynamic environments are significantly traded off. Most cooperative localization methods entail robot communication and observation, which makes any step prone to inaccuracy. In a recent attempt at multi-robot localization, many robots can locate themselves jointly using Terrain Relative Navigation
(TRN) [19]. The localization estimate that fuses shared information has been improved by an estimator structure that takes advantage of covariance intersection (CI) to reduce one source of measurement correlation while properly including others. Similarly, [28] developed a CI-based localization method with an explicit communication update that simultaneously guarantees estimation consistency, increasing the robustness of multi-robot cooperative localization in a dispersed setting. However, the robots keep a system-wide estimate in their algorithm, which relative observations can instantly update. Consequently, this increases the computational complexity at a centralized server and overburdens the robots with accurately tracking the dynamics for global positioning. Unlike these explicit CI modeling methods, our objective is to localize each robot relative to the other robots in a distributed fashion by utilizing a polynomial-time graph optimization technique with high accuracy and efficiency. Therefore, this paper proposes a graph-based optimization algorithm to address the gaps mentioned above. To form a graph using the RSSI information shared among robots, we employ a Relative Pose Measurement Graph (RPMG) using observability analysis [22]. Once we have created a connected, reliable graph, we exploit particle filtering over the motion model to expand the graph based on mobility constraints. Finally, we construct k-possible combinations of graphs for each robot, which need to be optimized. We use the polynomial-time graph optimizer (g2o [18]) for graph optimization with distributed constraint optimization.
3 Problem Formulation and the Proposed DGORL Solution
An overview of the typical MRS components is given in this section, along with a thorough explanation of the proposed distributed graph optimization formulation.
Multi-robot System. Robotic members of an MRS are divided into disjoint (isolated/disconnected from others) and connected (operating collaboratively) robots, which can be either ground-based or aerial. A robot that enters the measuring range of a monitoring robot at any given moment is referred to as the observed robot. A robot that takes measurements of another robot is the observing robot. The following assumptions are made about the robotic members of the considered MRS:
– Wireless communication is used to share information across the MRS.
– The observing robot can extract a neighboring robot's relative range (e.g., through RSSI measurements) and can uniquely identify each robot in its field of view.
– While the observed robots have limited sensory and computational capabilities, the observing robot may use high-precision sensors to carry out its self-localization.
– We restrict the movement of the robots to a two-dimensional planar space.
Assume that at a given time t, a team of $n \in \mathbb{N}$ connected robots forms a weighted undirected graph, denoted by G = (V, E, A), of order n, consisting of a vertex set $V = \{v_1, ..., v_n\}$, an undirected edge set $E \subseteq V \times V$ given by the ranges between connected robots, and an adjacency matrix $A = \{a_{ij}\}_{n \times n}$ with non-negative elements
$a_{ij} > 0$ if $(v_i, v_j) \in E$ and $a_{ij} = 0$ otherwise. An undirected edge $e_{ij}$ in the graph G is denoted by the unordered pair of robots $(v_i, v_j)$, which means that robots $v_i$ and $v_j$ can exchange information with each other. Here, we only consider undirected graphs, indicating that the robots' communications are all bidirectional. Then, the connection weight between robots $v_i$ and $v_j$ in graph G satisfies $a_{ij} = a_{ji} > 0$ if they are connected; otherwise, $a_{ij} = a_{ji} = 0$. Without loss of generality, it is noted that $a_{ii} = 0$, which indicates no self-connection in the graph. The degree of robot $v_i$ is defined by $d(v_i) = \sum_{j=1}^{n} a_{ij}$, where $j \neq i$ and $i = 1, 2, ..., n$. The Laplacian matrix of the graph G is defined as $L_n = D - A$, where $D = \mathrm{diag}\{d(v_1), d(v_2), ..., d(v_n)\}$ is the diagonal degree matrix. If the graph G is undirected, $L_n$ is symmetric and positive semi-definite. A path between robots $v_i$ and $v_j$ in a graph G is a sequence of edges $(v_i, v_{i_1}), (v_{i_1}, v_{i_2}), ..., (v_{i_k}, v_j)$ in the graph with distinct robots $v_{i_l} \in V$. An undirected graph G is connected if a path exists between any pair of distinct robots $v_i$ and $v_j$, where $i, j = 1, ..., n$.
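To make these definitions concrete, the short sketch below builds the adjacency, degree, and Laplacian matrices for a hypothetical four-robot graph with arbitrary example weights, and checks connectivity using the standard fact that rank($L_n$) equals n minus the number of connected components.

```python
import numpy as np

# Adjacency matrix for a hypothetical 4-robot graph: symmetric, zero diagonal.
A = np.array([[0.0, 1.0, 0.0, 0.5],
              [1.0, 0.0, 2.0, 0.0],
              [0.0, 2.0, 0.0, 1.0],
              [0.5, 0.0, 1.0, 0.0]])

D = np.diag(A.sum(axis=1))   # degree matrix, d(v_i) = sum_j a_ij
L = D - A                    # Laplacian L_n = D - A

n = A.shape[0]
# rank(L) = n - (number of connected components), so connected iff rank = n - 1.
print("connected:", np.linalg.matrix_rank(L) == n - 1)
# L is symmetric positive semi-definite: all eigenvalues are >= 0.
print("eigenvalues:", np.round(np.linalg.eigvalsh(L), 3))
```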
Fig. 2. The Distributed Graph Optimization for Relative Localization of Multi-Robot Systems (DGORL) system architecture, showing the inputs (received motion information and RSSI of connected robots) and output (relative pose estimates) for robot i. The robot generates and expands graphs based on observability, along with the constraints set up through local and received inertial measurements, which are then fed into the optimization function. The optimized graph yields relative pose estimates for all connected robots.
In this paper, we formulate the multi-robot system as a graph problem to find an optimized solution for relative localization based on inequality constraints with a time-varying domain. Furthermore, we segment the solution into three components, which can be solved simultaneously at every iteration over distributed constraints: 1) graph formation, 2) expansion through transition, and 3) optimization. See Fig. 2 for the sequential procedure of DGORL.
3.1 Graph Formation
An undirected graph G = (V, E), in which (i, j) ∈ E if and only if (j, i) ∈ E, is called the underlying undirected graph of G. G is connected if its underlying undirected graph is connected. An isolated, connected subgraph of G is called a component. Suppose that an undirected graph G has n nodes and m edges.
The incidence matrix of G, denoted by A(G), is an m × n matrix whose m rows and n columns correspond to the m edges and n nodes of G, respectively. Each element $a_{ij}$ of A(G) is defined as

$$a_{ij} = \begin{cases} 1 & \text{if node } v_j \text{ is the tail/head of edge } e_i, \\ 0 & \text{otherwise.} \end{cases}$$

The following lemma describes the relation between the rank of A(G) and the connectivity of G.
Lemma 1. Let G be an undirected graph with n nodes and let A(G) be its incidence matrix. If G is composed of λ components, the rank of A(G) satisfies rank(A(G)) = n − λ ≤ n − 1.
The overall algorithm is composed of two main steps, depicted in Algorithms 1 and 2, respectively. The first step is to generate the Estimated Relative Position Measurement Graph (ERPMG), $G_E = (V_E, E_E, A_E)$, based on the Relative Position Measurement Graph (RPMG), G = (V, E, A), which can be built from the range information received from the connected robots of an n-robot system and describes the relative position measurements among the robots. In this step, the node set $V_E$ of the ERPMG is initialized to be the same as that of the RPMG, and the edge set $E_E$ of the ERPMG is initialized to be empty. Then, every relative position measurement edge of E is added to the edge set $E_E$ if the edge $e_{ij}$ satisfies the motion constraint $J F_j V_{B_j} = 0$. A concise description of this step is illustrated in Algorithm 1.
Algorithm 1: ERPMG Formation
Input: state x, input u, RPMG G = (V, E, A)
Output: ERPMG G_E
Initialize ERPMG G_E = (V_E, E_E) with V_E = V, E_E = ∅
for each node v_i ∈ V do
    for each edge e_ij ∈ E do
        if J F_j V_{B_j} = 0 then add edge e_ij to E_E
    end
end
The second step (see Algorithm 2) is to examine the observability of the n-robot system according to the ERPMG $G_E$ generated by Algorithm 1. We initialize the incidence matrix $A(G_E)$ and the diagonal matrix $L_n$ to be zero matrices, with their dimensions determined by the numbers of nodes and edges in $G_E$. Then we construct the incidence matrix $A(G_E)$ and the diagonal matrix $L_n$. The incidence matrix $A(G_E)$ describes which nodes are connected by which edges and is constructed based on the topology of $G_E$. The diagonal matrix $L_n$ describes the weights of the edges in $G_E$ with
the weight vector of each edge. The edges in $A(G_E)$ and $L_n$ follow the same order. Then, we can obtain the spectral matrix $C(G_E)$ based on $A(G_E)$ and $L_n$. The observability of the n-robot system can be determined by evaluating the rank of $C(G_E)$.
3.2 Expansion Through Transition
Algorithm 2: Observability Checking
Input: n = |V_E|, m = |E_E|
Initialize the incidence matrix A(G_E) as an m × n zero matrix
Initialize the diagonal matrix L_n as an n × n zero matrix
k = 0
for each node v_i ∈ V_E do
    for each edge (i, j) ∈ E_E do
        k = k + 1
        A(G_E)[k, i] = 1
        W[3k − 2 : 3k, 3k − 2 : 3k] = diag(L_n((i, j)))
    end
end
if rank(C(G_E)) is equal to 4(n − 1) then
    return True
else
    return False
end
Initial position and velocity constraints are known to each robot; hence the locally constructed ERPMG tends to change based on the relative motion in the network. We limit the number of possible positions of an individual robot to k by exploiting the concept of particle filtering. The motion process is carried out in the 2-D state space: $X_{n,t}$ includes the position $(x_{n,t}, y_{n,t})$ and orientation $\phi_{n,t}$. The robot model $f(*)$ can be written as:

$$\begin{aligned} x_{n,t+1} &= x_{n,t} + v_{n,t} \Delta t \cos(\phi_{n,t}) \\ y_{n,t+1} &= y_{n,t} + v_{n,t} \Delta t \sin(\phi_{n,t}) \\ \phi_{n,t+1} &= \phi_{n,t} + \omega_{n,t} \Delta t \end{aligned} \tag{1}$$

In Eq. (1), $v_{n,t}$ and $\omega_{n,t}$ are the linear and angular velocity of robot n at time t, respectively, and $\Delta t$ represents the time interval between two control inputs. Based on the robot model $f(*)$, the control input of robot n at time t is defined as:

$$u_{n,t} = [v_{n,t}, \omega_{n,t}] \tag{2}$$
It is worth mentioning that Eq. (1) is the ideal kinematic model of the robot. Under pure rolling conditions, the robot's motion follows this equation. However, imperfections are unavoidable in real devices. Complex wheel-ground interactions and system
noises cause disturbances to the robots. Moreover, these disturbances are modeled as Gaussian random variables, characterized by their mean and covariance matrices. Thus, the model of a given robot n at time t is defined as:

$$x_{n,t+1} = f(x_{n,t}, u_{n,t}) + N_{\mathrm{noise},n,t} \tag{3}$$
Here, $x_{n,t} \in \mathbb{R}^{n_x}$ and $u_{n,t} \in \mathbb{R}^{n_u}$ are the state and control input of robot n at time t, respectively. $N_{\mathrm{noise}} \in \mathbb{R}^{n_x}$ is an unknown disturbance with a Gaussian probability distribution. Considering the disturbance level in a real environment, $N_{\mathrm{noise},n,t}$ in the simulations is set to diag(0.1 m, 0.1 m, 0.5°). As each robot also receives RSSI from the other robots, we can find the intersecting region as an area of interest onto which to map the estimated relative position of each robot using the model, and select the k best (soft-max) candidates among them. Once we have k possible positions for each robot, we can generate up to $n^k$ solvable graphs for optimization.
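The following sketch shows how the transition model (1)–(3) can be used to sample k candidate next positions for one robot, particle-filter style. The control input, time step, and choice of k are illustrative, and the noise standard deviations follow the diag(0.1 m, 0.1 m, 0.5°) setting quoted above, interpreted here as per-axis standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, u, dt=0.1):
    """Unicycle transition model (1) with additive Gaussian disturbance (3)."""
    x, y, phi = state
    v, w = u
    noise = rng.normal(0.0, [0.1, 0.1, np.deg2rad(0.5)])
    return np.array([x + v * dt * np.cos(phi),
                     y + v * dt * np.sin(phi),
                     phi + w * dt]) + noise

# Sample k candidate next poses for one robot; k and u are example values.
k, state, u = 5, np.array([0.0, 0.0, 0.0]), (1.0, 0.2)
candidates = np.array([step(state, u) for _ in range(k)])
print(candidates)
```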
3.3 Optimization
Constraints. We are interested in solving the constrained convex optimization problem over a multi-robot network in a distributed fashion. More specifically, we consider a network with n robots, labeled by V = {1, 2, ..., n}, and k possible connections to other robots. Every robot i has a local convex objective function and a global constraint set. The network cost function is given by:

$$\text{minimize } f(x) = \sum_{i=1}^{N} f_i(x) \quad \text{subject to } x \in D = \{x \in \mathbb{R}^k : c(x) \le 0\} \tag{4}$$

Here, $x \in \mathbb{R}^k$ is a global decision vector; $f_i : \mathbb{R}^k \to \mathbb{R}$ is the convex objective function of robot i, known only by robot i; and D is a bounded convex domain, which is (without loss of generality) characterized by an inequality constraint, i.e., $c(x) \le 0$, where $c : \mathbb{R}^k \to \mathbb{R}$ is a convex constraint function known by all the robots. We assume that D is contained in a Euclidean ball of radius R, that is:

$$D \subseteq B = \{x \in \mathbb{R}^k : \|x\|_2 \le R\}. \tag{5}$$

We also assume that there exists a point $\hat{x}$ such that the inequality constraint is strictly feasible, i.e., $c(\hat{x}) < 0$. We introduce a regularized Lagrangian function to deal with the inequality constraint c(x):

$$L(x, \lambda) = \sum_{i=1}^{N} f_i(x) + \lambda N c(x) - \frac{\gamma N}{2} \lambda^2 = \sum_{i=1}^{N} \left( f_i(x) + \lambda c(x) - \frac{\gamma}{2} \lambda^2 \right) = \sum_{i=1}^{N} L_i(x, \lambda), \tag{6}$$

where we have replaced the inequality constraint $c(x) \le 0$ with $N c(x) \le 0$, and $\gamma > 0$ is some parameter. It is noted that $L_i(x, \lambda)$ is known only by robot i.
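A minimal sketch of how one robot might take primal-dual steps on its local regularized Lagrangian $L_i(x, \lambda)$ is given below. The local objective, constraint, step size, and the numerical gradient are all illustrative choices; a fully distributed implementation would additionally average the decision vector with neighbors according to the weights $[A(t)]_{ij}$.

```python
import numpy as np

gamma, eta = 0.1, 0.05                      # regularization and step size (assumed)
f_i = lambda x: np.sum((x - 1.0) ** 2)      # example local convex objective
c = lambda x: np.sum(x ** 2) - 4.0          # example constraint c(x) <= 0

def grad(h, x, eps=1e-6):
    """Central-difference gradient, adequate for this small sketch."""
    e = np.eye(x.size) * eps
    return np.array([(h(x + e[k]) - h(x - e[k])) / (2 * eps) for k in range(x.size)])

x, lam = np.zeros(2), 0.0
for _ in range(500):
    # Descend in x and ascend in lambda on L_i = f_i + lam*c - (gamma/2)*lam^2.
    x = x - eta * (grad(f_i, x) + lam * grad(c, x))
    lam = max(0.0, lam + eta * (c(x) - gamma * lam))
print(x, lam)
```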
Algorithm 3: Relative Localization Based on Graph Optimization
At a given time interval t, for robot i:
Observe the n connected robots
Receive motion and observability information from connected robots
Generate the ERPMG from the RPMG using Algorithm 1
Check the observability of the n-robot system using Algorithm 2
for every sampling instance t_k, k = 0, 1, 2, ... do
    for each connected robot R_j, j ≠ i and j = 1, 2, ..., n do
        Get previous odometry information
        Get previous relative positioning information
        Generate the n^k possible graphs over the estimation horizon
    end
end
Set constraints as in Eq. (4)
Generate a time-varying network model for the n connected robots
Solve the graph optimization problem using the distributed solver [18] over a predicted horizon
for each connected robot R_j, j ≠ i and j = 1, 2, ..., n do
    Extract the position estimate from the optimized graph: X_{j,t+1} = (x_{j,t+1}, y_{j,t+1}, φ_{j,t+1})
end
Model. We consider a time-varying network model that has been widely considered in the literature [18, 20, 29]. The robots' connectivity at time t can be represented by an undirected graph G(t) = (V, E(t), A(t)), where E(t) is the set of activated edges at time t, i.e., edge (i, j) ∈ E(t) if and only if robot i can receive data from robot j, and we assign a weight $[A(t)]_{ij} > 0$ to the data on edge (i, j) at time t. Note that the set E(t) includes self-edges (i, i) for all i. We make the following standard assumptions on the graph G(t):
1. For all t ≥ 0, the weight matrix A(t) is doubly stochastic.
2. There exists a positive scalar ξ such that $[A(t)]_{ii} \ge \xi$ for all i and t ≥ 0, and $[A(t)]_{ij} \ge \xi$ if $[A(t)]_{ij} > 0$.
3. There exists an integer T ≥ 1 such that the graph (V, E(sT) ∪ ... ∪ E((s + 1)T − 1)) is strongly connected for all s ≥ 0.
Once robot i has modeled the network and the constraints, g2o [18] optimizes the provided k-ERPMG to find the best possible position estimates for the n connected robots in the form of an optimized graph $G_o = (V_o, E_o, A_o)$. $V_o$ contains the possible node positions with respect to i, $E_o$ contains the possible distance ranges to the other nodes in terms of weighted edges, and $A_o$ is an optimized adjacency matrix from i to every other node. Algorithm 3 describes the entire localization strategy for a given MRS, with the graph-based optimizer, denoting the algorithm's three main iterative steps.
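To illustrate the kind of nonlinear least-squares problem the g2o step solves, the sketch below recovers planar robot positions from noisy pairwise ranges using SciPy's generic solver in place of g2o. The edge list and measurements are made up, and the rigid-body (gauge) ambiguity is left unresolved; a real implementation would fix one robot's pose as the reference frame.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical range measurements R_ij between connected robots.
edges = [(0, 1, 5.0), (1, 2, 4.0), (0, 2, 6.5), (2, 3, 3.0)]
n = 4

def residuals(flat):
    p = flat.reshape(n, 2)
    return [np.linalg.norm(p[i] - p[j]) - r for i, j, r in edges]

x0 = np.random.default_rng(1).normal(scale=5.0, size=2 * n)  # initial guess
sol = least_squares(residuals, x0)
print(sol.x.reshape(n, 2))
```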
4 Theoretical Analysis
Graph optimization-based relative pose localization has three submodules: formation, expansion, and optimization. The formation depends on the number of robots n, which drives the generation of the ERPMG G = (V, E, A), while the expansion relies on the number of possible position instances k, which produces $n^k$ potential graphs to be optimized. As both initial steps can be carried out serially in polynomial time, one can deduce that these stages are polynomial and scalable with the number of robots. However, the optimization of the $n^k$ possible graphs using the g2o optimizer needs to be analyzed for its NP-hardness.
NP-hardness: We will show that the problem of finding optimizers is NP-hard in general. An example shows that L1-optimality is NP-hard for non-submodular energies. Recall that a set U of vertices of a graph G = (V, E, A) is independent if no two vertices in U are connected by an edge. The problem of finding a maximum independent set of vertices of an arbitrary graph is known to be NP-hard [30]. Consider the following local costs as an example:
– Give each vertex v with label l the cost 1 − l.
– For each edge with both vertices of label l, let the cost be N = |V| + l.
– Assign the cost 0 to any other edge.
It is worth noting that the maximum cost of any labeling l is N if and only if the set $U = l^{-1}(1)$ is not independent. All labelings l associated with an independent U have a maximum cost of 1. Furthermore, the labeling l is a strict minimizer when the number of cost-1 atoms for U, which is |V| − |U|, is minimal, i.e., when the size of U is maximal. To put it another way, if we use the previously mentioned local cost assignments for a graph G, then l is a strict minimizer if and only if $U := l^{-1}(1)$ is a maximum independent set of vertices. As a result, our problem, like the problem of finding the largest independent set of vertices, is NP-hard. Thus, the optimization problem is proved to be NP-hard; since such graph problems can be verified in polynomial time [30], the problem is NP-complete.
Convergence: To control convergence, the g2o method adds a damping factor and backup operations to Gauss-Newton, so that g2o solves a damped version of the optimization function. A damping factor λ is present here: the larger it is, the smaller the step size. This helps control the step size in the case of nonlinear surfaces. The g2o algorithm's goal is to regulate the damping factor dynamically: the error of the new configuration is tracked in each iteration; λ is reduced for the following iteration if the new error is smaller than the prior one; if not, λ is raised and the step is reverted. We refer to [18] for a more thorough discussion of how the g2o algorithm guarantees convergence.
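The damping-control loop described above can be illustrated on a toy scalar least-squares problem; the problem, the initial $\lambda$, and the halving/doubling schedule are illustrative, not g2o's actual internals.

```python
import numpy as np

def residual(x):
    return np.array([x[0] ** 2 - 2.0])   # root at sqrt(2)

def jacobian(x):
    return np.array([[2.0 * x[0]]])

x, lam = np.array([3.0]), 1.0
err = np.sum(residual(x) ** 2)
for _ in range(50):
    J, r = jacobian(x), residual(x)
    # Damped Gauss-Newton step: (J^T J + lam I) dx = -J^T r.
    dx = np.linalg.solve(J.T @ J + lam * np.eye(1), -J.T @ r)
    new_err = np.sum(residual(x + dx) ** 2)
    if new_err < err:            # error decreased: accept step, relax damping
        x, err, lam = x + dx, new_err, lam / 2.0
    else:                        # error increased: revert step, raise damping
        lam *= 2.0
print(x)  # approximately [1.41421356]
```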
5 Simulation Experiments and Results
We performed extensive simulation experiments in a 60 × 60 m bounded region under controlled conditions to analyze the algorithm. We set up the MRS as discussed in
Fig. 3. Trajectories and the Optimized Graph. Left: Simulation with the ground truth (green) trajectory along with the colored predicted trajectory of a robot; Right: Robots as colored vertices and colored edges between connected robots with the respective color for optimized graphs.
Fig. 4. DGORL performance of five robots in terms of Root Mean Squared Error (RMSE) in the 60 m × 60 m simulation workspace (Left). Comparison with Collaborative Terrain Relative Navigation (TRN) based on Covariance Intersection (Right).
Sect. 3, where each robot shares inertial measurements and RSSI with the connected robots. Each robot in the MRS performs relative localization in its own frame of reference. We can calculate the weight of an edge $e_{ij}$ in G = (V, E) as the range d between $R_i$ and $R_j$ using the path loss model on the perceived RSSI:

$$d = 10^{\frac{A - RSSI}{10n}} \tag{7}$$
Here, n denotes the signal propagation exponent, which varies between 2 (free space) and 6 (complex indoor environment); d denotes the distance between robots i and j; and A denotes the received signal strength at a reference distance of one meter. In every iteration, $r_i$ generates the ERPMG based on its observations and performs the observability analysis (at least four nodes connected, based on the observability check in Algorithm 2). It then expands the ERPMG and generates the input for the g2o optimizer. The initial global measurement poses, the local relative pose constraints, and the corresponding information matrix make up g2o's input.
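Putting Eq. (7) into code is straightforward; the reference strength A and propagation exponent n below are assumed example values, not calibrated ones.

```python
import numpy as np

def rssi_to_distance(rssi, A=-40.0, n=2.5):
    """Path-loss model (7): d = 10 ** ((A - RSSI) / (10 * n))."""
    return 10 ** ((A - rssi) / (10 * n))

# RSSI of -40 dBm at the 1 m reference gives d = 1 m; weaker signals map
# to larger ranges.
print(rssi_to_distance(np.array([-40.0, -55.0, -70.0])))
```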
Once an optimized graph is obtained, $r_i$ estimates the relative positions from the vertex poses. In the MRS, each robot performs this localization in a distributed manner. To measure the error, we captured the initial positions on a global scale and compared the predicted trajectories with the ground truth as RMSE in meters. We used range sensors to compare our strategy to Terrain Relative Navigation (TRN) from [19].
Localization Accuracy. We performed experiments with five robots driving on a random walk for 100 iterations, and obtained consistent results through 10 repeated trials under identical conditions. Figure 3 visualizes the predicted trajectories and connected graphs for the experiments. The results in Fig. 4 validate the claim about the accuracy of DGORL in a larger space, with an 8% localization error. Furthermore, experiments were carried out in the same simulated workspace to evaluate the localization accuracy of DGORL in contrast to TRN and to confirm its applicability in larger scenarios. We performed ten trials using identical simulation conditions and obtained the RMSE in meters to assess the localization accuracy. The results in Fig. 4, which demonstrate a 23% improvement in localization accuracy over TRN, support the assertion that DGORL is viable in larger contexts. Furthermore, the absolute localization error of 4.2 m for a 3600 m² zone with five robots in a standalone view of DGORL's localization is highly encouraging. Still, further analysis is needed to understand the limitations and sufficient conditions. Nevertheless, the results showcase the viability of DGORL for MRS applications.
Computational Demand. The mean optimization time (the time taken by the solver) and the mean CPU usage of the whole process are 10.2 ± 2.7 ms and 143 ± 49%, respectively. The results are evaluated for five robots over 10 runs of 100 iterations each. The computational complexity of covariance intersection-based TRN is close to that of DGORL, which further highlights the efficiency of DGORL. It is worth noting that this performance provides evidence for DGORL's practical use on small resource-constrained robots (i.e., equipped with tiny computers, e.g., Raspberry Pi Zero-class boards with a quad-core CPU running at 1 GHz). Hence, DGORL can be used as an efficient relative localization algorithm for most multi-robot systems, including swarm robotics.
6 Conclusion
Many estimation, planning, and optimal control challenges in robotics are built on optimization problems. In most optimization problems, the objective to be maximized or minimized is made up of multiple local components or terms; that is, each term relies only on a small portion of the variables. We use graph theory to solve a well-defined multi-robot relative localization problem. Graph generation, expansion through transition, and optimization are the three fundamental components of our novel technique. We first estimated a relative pose measurement graph using the received signal strength between connected robots. Later, we used a motion model to build graph expansions and a polynomial-time graph optimizer to find an optimized pose estimation graph from which a robot extracts relative positions for the connected robots. Finally, we analyzed the algorithm through simulations to validate its practicality and accuracy in a large workspace. Furthermore, DGORL delivers higher localization accuracy than comparable methods while maintaining a comparable optimization time, which further supports
the efficiency and accuracy of DGORL. In the future, we will perform hardware experiments to substantiate the applicability of graph optimization in the real world.
References
1. Islam, M.J., Mo, J., Sattar, J.: Robot-to-robot relative pose estimation using humans as markers. Auton. Robot. 45(4), 579–593 (2021)
2. Xianjia, Y., Qingqing, L., Queralta, J.P., Heikkonen, J., Westerlund, T.: Applications of UWB networks and positioning to autonomous robots and industrial systems. In: 2021 10th Mediterranean Conference on Embedded Computing (MECO), pp. 1–6. IEEE (2021)
3. Najafi, M., Nadealian, Z., Rahmanian, S., Ghafarinia, V.: An adaptive distributed approach for the real-time vision-based navigation system. Measurement 145, 14–21 (2019)
4. Guo, K., Qiu, Z., Meng, W., Xie, L., Teo, R.: Ultra-wideband based cooperative relative localization algorithm and experiments for multiple unmanned aerial vehicles in GPS denied environments. Int. J. Micro Air Veh. 9(3), 169–186 (2017)
5. Fink, J., Michael, N., Kim, S., Kumar, V.: Distributed pursuit-evasion without global localization via local frontiers. Auton. Robot. 32(1), 81–95 (2012)
6. Dubé, R., Gawel, A., Sommer, H., Nieto, J., Siegwart, R., Cadena, C.: An online multi-robot SLAM system for 3D lidars. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1004–1011. IEEE (2017)
7. Mangelson, J.G., Dominic, D., Eustice, R.M., Vasudevan, R.: Pairwise consistent measurement set maximization for robust multi-robot map merging. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 2916–2923. IEEE (2018)
8. Tian, Y., Chang, Y., Arias, F.H., Nieto-Granda, C., How, J.P., Carlone, L.: Kimera-Multi: robust, distributed, dense metric-semantic SLAM for multi-robot systems. IEEE Trans. Robot. 38, 2022–2038 (2022)
9. Latif, E., Parasuraman, R.: Multi-robot synergistic localization in dynamic environments. In: ISR Europe 2022; 54th International Symposium on Robotics, pp. 1–8 (2022)
10. Shorinwa, O., Yu, J., Halsted, T., Koufos, A., Schwager, M.: Distributed multi-target tracking for autonomous vehicle fleets. In: IEEE International Conference on Robotics and Automation (ICRA) 2020, pp. 3495–3501 (2020)
11. Wai, H.-T., Yang, Z., Wang, Z., Hong, M.: Multi-agent reinforcement learning via double averaging primal-dual optimization. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
12. Wang, W., Jadhav, N., Vohs, P., Hughes, N., Mazumder, M., Gil, S.: Active rendezvous for multi-robot pose graph optimization using sensing over Wi-Fi. In: Asfour, T., Yoshida, E., Park, J., Christensen, H., Khatib, O. (eds.) Robotics Research. ISRR 2019. Springer Proceedings in Advanced Robotics, vol. 20, pp. 832–849. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-95459-8_51
13. Parashar, R., Parasuraman, R.: Particle filter based localization of access points using direction of arrival on mobile robots. In: IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), pp. 1–6. IEEE (2020)
14. Latif, E., Parasuraman, R.: Online indoor localization using DOA of wireless signals. arXiv preprint arXiv:2201.05105 (2022)
15. Luo, S., Kim, J., Parasuraman, R., Bae, J.H., Matson, E.T., Min, B.-C.: Multi-robot rendezvous based on bearing-aided hierarchical tracking of network topology. Ad Hoc Netw. 86, 131–143 (2019)
16. Parasuraman, R., Min, B.-C.: Consensus control of distributed robots using direction of arrival of wireless signals. In: Correll, N., Schwager, M., Otte, M. (eds.) Distributed Autonomous Robotic Systems. SPAR, vol. 9, pp. 17–34. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05816-6_2
17. Parasuraman, R., Oegren, P., Min, B.-C.: Kalman filter based spatial prediction of wireless connectivity for autonomous robots and connected vehicles. In: IEEE 88th Vehicular Technology Conference (VTC-Fall), pp. 1–5. IEEE (2018)
18. Kümmerle, R., Grisetti, G., Strasdat, H., Konolige, K., Burgard, W.: g2o: a general framework for graph optimization. In: 2011 IEEE International Conference on Robotics and Automation, pp. 3607–3613. IEEE (2011)
19. Wiktor, A., Rock, S.: Collaborative multi-robot localization in natural terrain. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 4529–4535 (2020)
20. Huang, G., Kaess, M., Leonard, J.J.: Consistent sparsification for graph optimization. In: European Conference on Mobile Robots, pp. 150–157 (2013)
21. Zheng, S., et al.: Multi-robot relative positioning and orientation system based on UWB range and graph optimization. Measurement 195, 111068 (2022)
22. Hao, N., He, F., Hou, Y., Yao, Y.: Graph-based observability analysis for mutual localization in multi-robot systems. Syst. Control Lett. 161, 105152 (2022)
23. Sahawneh, L.R., Brink, K.M.: Factor graphs-based multi-robot cooperative localization: a study of shared information influence on optimization accuracy and consistency. In: Proceedings of the 2017 International Technical Meeting of the Institute of Navigation (ION ITM 2017), Monterey, California, pp. 819–838 (2017)
24. Carlevaris-Bianco, N., Eustice, R.M.: Generic factor-based node marginalization and edge sparsification for pose-graph SLAM. In: 2013 IEEE International Conference on Robotics and Automation, pp. 5748–5755. IEEE (2013)
25. Johannsson, H., Kaess, M., Fallon, M., Leonard, J.J.: Temporally scalable visual SLAM using a reduced pose graph. In: 2013 IEEE International Conference on Robotics and Automation, pp. 54–61. IEEE (2013)
26. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software. Wiley, New York (2004)
27. Indelman, V., Nelson, E., Michael, N., Dellaert, F.: Multi-robot pose graph localization and data association from unknown initial relative poses via expectation maximization. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 593–600. IEEE (2014)
28. Chang, T.-K., Chen, K., Mehta, A.: Resilient and consistent multirobot cooperative localization with covariance intersection. IEEE Trans. Rob. 38(1), 197–208 (2021)
29. Jiang, X., Zeng, X., Sun, J., Chen, J.: Distributed solver for discrete-time Lyapunov equations over dynamic networks with linear convergence rate. IEEE Trans. Cybern. 52(2), 937–946 (2022)
30. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2022)
Characterization of the Design Space of Collective Braitenberg Vehicles
Jack A. Defay, Alexandra Q. Nilles(B), and Kirstin Petersen
School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
{jad452,aqn3,kirstin}@cornell.edu
Abstract. Large collectives of artificial agents are quickly becoming a reality at the micro-scale for healthcare and biological research, and at the macro-scale for personal care, transportation, and environmental monitoring. However, the design space of reactive collectives and the resulting emergent behaviors are not well understood, especially with respect to different sensing models. Our work presents a well-defined model and simulation for study of such collectives, extending the Braitenberg Vehicle model to multi-agent systems with on-board stimulus. We define omnidirectional and directional sensing and stimulus models, and examine the impact of the modelling choices. We characterize the resulting behaviors with respect to spatial and kinetic energy metrics over the collective, and identify several behaviors that are robust to changes in the sensor model and other parameters. Finally, we provide a demonstration of how this approach can be used for control of a swarm using a single controllable agent and global mode switching.
1 Introduction
Valentino Braitenberg introduced his vehicles [2] as a thought experiment in synthetic psychology: a light-hearted demonstration of the human tendency to see order and intelligence in the emergent behaviors of simple reactive agents. Sensors are "wired" directly to actuators, so the agents do not use on-board stateful representations or memory (Fig. 1(a)). Seemingly intelligent behaviors such as collision avoidance or aggregation easily arise from memoryless reactions between agents and environmental stimuli, along with the human tendency to see agency in expressive motion [18]. In this work, we introduce a simulated model of simple Braitenberg Vehicles (BVs) in a collective. Most demonstrations of BVs implement off-board stimuli sources; here, we extend the model to on-board rear-facing stimuli sources, so the environment of interest is the collective itself and its dynamics. We analyze the emergent behaviors for robustness, and cast the behaviors in light of potential robotics applications. From social organisms such as the cellular slime mold, we know that collective intelligence is possible without highly complex nervous systems. At the micrometer scale, researchers are developing engineered collectives for drug
Fig. 1. (a) Overview of how the original BV reacts to environmental stimulus. (b) Overview of the local and global coordinates referenced in this article.
delivery and minimally invasive surgery [13]. Robotics techniques at this scale are rapidly becoming a reality, with memristor-based circuits capable of sensing chemical and light signals [5] and of read/write memory of temporal events [34]. However, the agents are too small to use traditional onboard state-estimation and control algorithms, instead requiring careful design of primarily reactive strategies, uniform off-board controls for the entire collective [30], or a combination of the two approaches [9]. Reactive approaches that use local information and embodied intelligence are crucial for success at the micro-scale, as biological collectives show [10]. At the human scale, as robotic collectives begin assisting us with personal care, transportation, and environmental monitoring, the design of reactive collectives is appealing. Decentralized methods are more scalable, and could even help side-step some privacy and surveillance concerns if we can implement systems that do not require cameras, full scene reconstruction, or global positioning [8, 29]. To our knowledge, the dynamics of multiple BVs with onboard stimuli have not been a first-class object of study in the robotics, control, or autonomous systems literature. Our contributions include a more formal definition of the BV model, an open-source Python implementation for simulating and characterizing the system, and an interpretation of the results and next steps toward application. We extend the BV model to include on-board stimuli and define two stimulus models: a symmetric omnidirectional stimulus such as that enabled by radio frequency localization [17], and a directional stimulus modeled on a generic LED's spatial intensity profile. We also define two sensor models: an omnidirectional sensor model and a sensor model with an angular dependency, such that the observed stimulus intensity decreases with misalignment. In this paper, we examine three combinations of these models, and characterize the emergent behaviors with respect to metrics over the nearest-neighbor distances and kinetic energy of the collective. We examine homogeneous collectives while varying three main design parameters:
– Influence, γ: scale of the reactive control law; updates heading and velocity
– Influence angle, θmax: angular extent of constrained sensors and/or stimuli
– Actuation noise: Gaussian noise on heading and velocity execution
259
Related Work
The premise that collectives of simple, reactive agents exhibit robust behaviors has been extensively validated in simulated and empirical work, from boid models of swarms [25] and particle-based optimization algorithms [7], to consensus in biological collectives [28]. However, the effort to design robust and efficient artificial collectives is still very much underway, with current systems falling far short of the performance seen in biological collectives [6]. It is still difficult to design for robust behavior, especially if the behaviors are learned or evolved, or if environmental disturbances are present. Recent works in formalizing reactive plans or controllers have focused on problems such as co-design of individual robotic hardware and software with dynamic effects [36]; task-driven single-robot planning and design problems [22]; or top-down synthesis problems whose solutions require significant sensing, computation, and communication between agents [3]. Biological systems, such as schooling fish, are a proof-of-concept that parsimonious local strategies can lead to robust behavior, especially when combined with embodied agent-environment interactions [19]. In this work we contribute a dynamical system definition and characterization of the BV model as a step toward more formal understanding of the dynamics and design of reactive collectives. The BV model has not often been a direct object of study by engineers, apart from use as a demonstration in education [14] and graphics [33]. Similar underlying themes can be found in the reactive multi-robot systems literature, often focused on finding minimal requirements of a single behavior, while we aim to generalize many behaviors in a single model. One fruitful line of work shows that local tactile information, such as the frequency of robot-robot encounters, is sufficient for tasks such as localization and density regulation in structured environments [21]. In [11], the authors perform a parameter sweep over the control space to find an optimal reactive controller for aggregation with a one-bit laser-type occupancy sensor. Similarly, segregative behavior is achieved by a reactive controller on a sensor reporting only whether another agent is nearby and if it is the same or a different type [23]. Interestingly, neither of these systems [11, 23] includes the ability to sense whether nearby agents are to the left or right, as is fundamental to the BV model. However, in both systems the optimal motion pattern in the absence of stimulus is a circular arc, and the agents are able to effectively search the space and aggregate when they detect nearby agents. This "sweep-until-find" strategy seems to simulate the taxis-kinesis strategy of "love" with a more minimalist active sensing strategy that avoids the need to compute the direction of taxis. However, these systems are not designed to switch modes to other flocking or spatial coverage behaviors, as the BV model allows. Another promising approach for safe emergent behavior design is the Gibbs Random Fields (GRF) model, demonstrating segregative and flocking behaviors [26] as well as static shape formation [27]. The GRF approach achieves stronger guarantees on performance for specific behaviors, but assumes the ability to estimate neighboring agent positions, velocities, and type (in the heterogeneous swarm case). In this work, we focus our efforts on the BV model with onboard
stimulus for homogeneous collectives and characterize the behavior space. The BV model has minimal sensing requirements and yet still demonstrates multiple robust, switchable behavior modes. Researchers in control theory and applied mathematics have also produced formal stability proofs for emergent collective motion. The Vicsek model of constant-speed agents in the plane with a nearest-neighbor alignment feedback rule is a stable switched linear system [16, 32]; the Dubins car model has been shown to reliably rendezvous given a simple reactive steering control law [35]. Discrete stochastic methods can give formal guarantees on the performance of collectives of minimal agents tasked with static shape formation, aggregation, and dispersal [4, 20], and the strategies show promising applicability to continuous systems. So while we have strong mathematical intuition that reactive swarms can perform robust behaviors, there is a lack of understanding of dynamic system behavior, especially considering different sensor models and sensor-actuator couplings. One exception is [12], where the dynamics of schooling fish were investigated as a function of both the perception angle and manoeuvrability of the fish, identifying three possible stable states (schooling, swarming, and milling). In a similar spirit, we address the BV model as a starting point to unify these insights and advance the understanding of minimalist, mode-switching autonomous collectives. Here, we present a thorough characterization of the BV model as a first step toward a more formal understanding of the system's stable states. A similar and more developed effort is the swarmalator model [24]; that model leverages the physics of oscillators to formally show five stable states in collective phase and position, but is limited in its applicability to systems with directional sensing constraints or the inability to sense neighboring phase. BVs have recently been used as a testbed for embodied artificial intelligence research in developmental neurosimulation [1] and structured Bayesian techniques [31]. However, these approaches generally lack a formal model or an accessible implementation. To fill this gap, we implement an open-source BV model and swarm characterization metrics, allowing for comparison between approaches to reactive collective design.
2 Model and Simulation
First, we detail the modelling choices made in the adaptation of the BV to our simulated implementation. We have emphasized directional sensing and stimulus production, motivated by practical constraints in robotic systems. Our interactive, Python-based implementation is available at github.com/CEIlab/DARS2022-BVcollectives. An overview of the model and results is also available in video format at https://youtu.be/9ngfH0vkOfc.

2.1 Agent Model
Each agent i has a position on the plane, $x_i$, orientation $\theta_i$, and speed $v_i \in [V_{\min} = 0, V_{\max}]$. Agents are considered to be points in the plane; we leave most
considerations of morphology and collisions to future work, and assume that each agent is small compared to the workspace. Time progresses in discrete stages, where agents update their heading and speed at each time step as a memoryless, reactive function of their observations in the previous time step. We omit the time variable in our notation where operations occur within a single time step. We implement directional and omnidirectional sensing and stimulus models, inspired by common geometric sensing constraints for multi-robot systems, such as those found with radio communications or LED-photodiode implementations. Consider two agents (a leader and a follower) with positions and orientations $x_L, \theta_L$ and $x_F, \theta_F$, respectively. Let $\theta_C$ be the global orientation of the vector from $x_F$ to $x_L$. Let $\alpha \in [-\pi, \pi]$ be the angular coordinate of the follower in the leader's reference frame. If the follower is to the right as viewed by the leader, $\alpha$ is negative. Let $\beta$ be the angular coordinate of the leader in the follower's reference frame ($\beta = 0$ when $\theta_C = \theta_F$). If the leader is to the right as viewed by the follower, $\beta$ is negative. In both stimulus models, intensity drops off proportional to the square of the distance from the stimulus source, as would be the case with light-based stimulus. If an observer is distance d from a stimulus source, the fraction of stimulus intensity remaining after distance is taken into account is

$$ I_d = 1 - \left( \frac{d}{d_{\max}} \right)^2, \qquad (1) $$
where $d_{\max}$ is the distance cutoff beyond which the stimulus is undetectable. The directional source is modelled off a generic LED intensity profile, with the stimulus pointing "behind" the agent and visible if the follower is within the stimulus "cone" ($\alpha > \pi - \theta_{\max}$ or $\alpha < -\pi + \theta_{\max}$). We assume perceived intensity does not depend on $\alpha$ within this angular range. In the omnidirectional sensing case, the sensed area is the entire circle around an agent defined by radius $d_{\max}$. In the directional sensing case, the field of view is limited by the angle constraint $\theta_{\max}$ such that $|\beta| < \theta_{\max}$. In addition, we scale the apparent intensity by a multiplicative factor $I_\beta$, as photodiodes are often flat and thus reflect more incoming photons when misaligned. We use the relation

$$ I_\beta = 1 - \left( \frac{\beta}{\theta_{\max}} \right)^2 \qquad (2) $$
to model a smooth dropoff in sensor sensitivity that reaches zero at a misalignment of β = θmax . See Fig. 1(B) for an illustration of the directional stimulus intensity profile for θmax = π/3, θF = θL = π/2. Inspired by Braitenberg’s two “forward-facing” sensors, we define a “windshield” sensor model, splitting the field of view into the left and right half of the “windshield,” centered on the agent’s orientation. Each agent then has two high level sensors, each of which reports the total intensity of stimuli observed in that half. We let yi,r and yi,l denote the right-hand and left-hand sensor readings for agent i. To compute each of these sensor readings, we compute the sum of each
visible agent's intensity ($I_{\max} = 1$ for all agents in our simulations), scaled by $I_d$ and $I_\beta$ as applicable. Next, we define reactive, proportional control laws according to Braitenberg's schema, where we vary kinesis (k) to move faster or slower in response to stimuli, and vary taxis (t) to turn toward or away from stimuli. For agent i, we control the speed $v_i$ and the orientation $\theta_i$ according to

$$ \dot{v}_i = k(y_{i,l}, y_{i,r}), \qquad (3) $$
$$ \dot{\theta}_i = t(y_{i,l}, y_{i,r}), \qquad (4) $$
where $y_{i,l}$ and $y_{i,r}$ are the left and right sensor response magnitudes, respectively. For readability, we define $\sigma_i = y_{i,l} + y_{i,r}$ for the total stimulus observed by an agent, and $\delta_i = y_{i,l} - y_{i,r}$ for the difference between the two sensor readings, with a positive $\delta_i$ indicating more stimulus on the left-hand side of the agent than the right. The basic reactive "behaviors" that we consider here are love, fear, aggression, and curiosity, and each corresponds to a different simple control law implementation for k and t. For example, in the love behavior, agents turn toward stimuli and slow down as intensity increases, leading to reliable aggregations. The discretized control laws for love are then

$$ v_i(t+1) = v_i(t) - \gamma \sigma_i(t), \qquad (5) $$
$$ \theta_i(t+1) = \theta_i(t) + \gamma \delta_i(t), \qquad (6) $$
where $\gamma$ is a scaling factor and design parameter we call the influence, governing how strongly the agents actuate in each time step. In our discrete model and simulation, it combines the gain of the control law and the discretization of time. Likewise, we define fear as the emergent behavior when the agents turn away from stimuli and slow down in response to stimuli; agents "fear" the stimuli and wish to avoid them. Aggression emerges when agents turn toward stimuli and speed up ("attacking"), and curiosity emerges when agents turn away from stimuli and speed up (on their way to find the next stimulus signal). The control laws are summarized in Table 1. We implement two independent sources of noise, on the actuation direction and speed. During each timestep, the agent updates its control inputs (heading and speed) according to the rules in Table 1. To simulate actuation noise, an additive Gaussian noise term is applied to both control inputs, independently for each agent, before the simulation takes a step. The default noise has zero mean for both heading and speed, with standard deviations $\sigma_\theta = \pi/72 = 5°$ for the heading and $\sigma_v = v_{\max}/72$ for the speed. In later sections, we increase these standard deviations to study the system's response to actuation noise.
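Putting the sensing model, the control laws of Table 1, and the actuation noise together, one simulation step for a single agent might look like the following Python sketch. This is illustrative only: all names are ours, the torus wrap-around of sensing distances is omitted for brevity, and clamping the speed to $[0, v_{\max}]$ is an assumption.

```python
import numpy as np

def wrap_angle(a):
    """Map an angle to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def sensor_readings(i, pos, theta, d_max, theta_max):
    """Left/right 'windshield' readings for agent i under directional
    sensing and directional stimulus, following Eqs. (1)-(2)."""
    y_l = y_r = 0.0
    for j in range(len(pos)):
        if j == i:
            continue
        d = np.linalg.norm(pos[j] - pos[i])  # plain Euclidean; torus wrap omitted
        if d >= d_max:
            continue
        bearing = np.arctan2(pos[j][1] - pos[i][1], pos[j][0] - pos[i][0])
        beta = wrap_angle(bearing - theta[i])            # source in observer's frame
        if abs(beta) >= theta_max:
            continue                                     # outside the field of view
        alpha = wrap_angle(bearing + np.pi - theta[j])   # observer in source's frame
        if abs(alpha) <= np.pi - theta_max:
            continue                                     # outside the rear stimulus cone
        I = (1 - (d / d_max) ** 2) * (1 - (beta / theta_max) ** 2)  # Id * Ibeta, Imax = 1
        if beta > 0:          # beta > 0: source on the observer's left
            y_l += I
        else:
            y_r += I
    return y_l, y_r

def step(i, pos, theta, v, gamma, taxis, kinesis, rng,
         d_max=200.0, theta_max=np.pi / 2, v_max=4.0):
    """One reactive update for agent i; taxis, kinesis in {+1, -1} per Table 1."""
    y_l, y_r = sensor_readings(i, pos, theta, d_max, theta_max)
    sigma, delta = y_l + y_r, y_l - y_r
    v_new = v[i] + kinesis * gamma * sigma
    th_new = theta[i] + taxis * gamma * delta
    # additive Gaussian actuation noise (defaults from the text)
    th_new += rng.normal(0.0, np.pi / 72)
    v_new += rng.normal(0.0, v_max / 72)
    return np.clip(v_new, 0.0, v_max), wrap_angle(th_new)  # speed clamp assumed
```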
Table 1. The reaction rules for each of the four regimes of interest.

Love (positive taxis, negative kinesis: turn toward, slow down):
  $v_i(t+1) = v_i(t) - \gamma \sigma_i(t)$, $\theta_i(t+1) = \theta_i(t) + \gamma \delta_i(t)$
Aggression (positive taxis, positive kinesis: turn toward, speed up):
  $v_i(t+1) = v_i(t) + \gamma \sigma_i(t)$, $\theta_i(t+1) = \theta_i(t) + \gamma \delta_i(t)$
Fear (negative taxis, negative kinesis: turn away, slow down):
  $v_i(t+1) = v_i(t) - \gamma \sigma_i(t)$, $\theta_i(t+1) = \theta_i(t) - \gamma \delta_i(t)$
Curiosity (negative taxis, positive kinesis: turn away, speed up):
  $v_i(t+1) = v_i(t) + \gamma \sigma_i(t)$, $\theta_i(t+1) = \theta_i(t) - \gamma \delta_i(t)$
2.2 World
We define a two-dimensional 500 × 500 unit environment and give the agents a maximum velocity of 4 units per second. To study collective behavior without the influence of boundary conditions, we identify the edges [0, y] ∼ [500, y] and [x, 0] ∼ [x, 500], creating a torus environment. Agents are able to sense across these identified edges. We include the possibility for obstacles to be present, with a default behavior of an elastic collision when an agent crosses an obstacle boundary. However, the computation of occlusions and shading is beyond the scope of this work, so agents are able to detect each other through obstacles if the angle and distance constraints are met. For the radius of the circle denoting the maximum distance of stimulus detection, we use $d_{\max} = 200$ for all simulations, regardless of other geometric constraints on stimulus and sensing.
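A minimal sketch of the edge identification and the induced shortest-displacement ("minimum image") metric on the torus, assuming numpy; the names are ours:

```python
import numpy as np

SIZE = 500.0  # side length of the square world

def wrap_position(x):
    """Identified edges: positions leaving one side re-enter on the other."""
    return x % SIZE

def torus_displacement(a, b):
    """Shortest displacement from a to b on the torus (minimum-image rule);
    each component lies in [-SIZE/2, SIZE/2)."""
    return (b - a + SIZE / 2) % SIZE - SIZE / 2

def torus_distance(a, b):
    """Shortest Euclidean distance along the surface of the torus."""
    return np.linalg.norm(torus_displacement(np.asarray(a), np.asarray(b)))
```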
2.3 Metrics
We define two metrics for evaluating the emergent behavior in terms of spatial and velocity distribution. The Nearest Neighbor Distance (NND) metric is defined as the population average of the distance from each agent to its closest neighbor, or

$$ NND_{pop} = \frac{1}{N} \sum_{i=1}^{N} \min_{j \neq i} d(a_i, a_j), \qquad (7) $$
where $d(a_i, a_j)$ is the minimum Euclidean distance along the surface of the torus world between agents i and j. Using an approximation based on uniformly distributed points in a unit square [15], we estimate $NND_{pop} \approx 60$ in the case where 20 agents are distributed uniformly; however, this is an underestimate, due both to the approximation and to our torus geometry. In practice, we see a maximum $NND_{pop} \approx 75$ in cases where the behavior is observed to create a nearly uniform spatial distribution.
We also compute the population average for a kinetic energy metric, such that

$$ KE_{pop} = \frac{1}{N} \sum_{i=1}^{N} v_i^2, \qquad (8) $$
normalizing the mass of each agent to 1.
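Both metrics are straightforward to compute; the sketch below reuses the hypothetical torus_distance helper from the World section and is our own naming, not the released implementation.

```python
import numpy as np

def nnd_pop(pos):
    """Population-average nearest-neighbor distance, Eq. (7),
    measured with the torus metric."""
    N = len(pos)
    total = 0.0
    for i in range(N):
        total += min(torus_distance(pos[i], pos[j])
                     for j in range(N) if j != i)
    return total / N

def ke_pop(v):
    """Population-average kinetic energy, Eq. (8), with unit masses."""
    return float(np.mean(np.asarray(v) ** 2))
```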
3 Emergent Behaviors

3.1 Behavior Characterization
For the models with directional sensing, we characterize each of the four emergent behaviors, with respect to the nearest neighbor distance and energy metrics, by performing parameter sweeps across the influence angle $\theta_{\max}$ and the influence magnitude $\gamma$. For the influence angle, we vary $\theta_{\max} \in [\pi/8, \pi/2]$, and we vary the influence scale in [0.25, 2.5], with each range discretized into ten evenly spaced values. For each pair of angle/influence variables, we simulated 20 randomly initialized collectives of 20 agents, collecting the metrics at each timestep after a convergence period $T_{conv} = 990$ for a duration of $T_{record} = 10$ timesteps, for a total simulation runtime of 1000 timesteps. We then average each metric over time during the converged behavior, and report the value on the heat maps in Figs. 2, 3, 4, and 5. We also show a representative time-lapse of agent positions for the two directional models over the last 100 frames of the simulation. Finally, we show the system response to noise for all three models. For each model, we fix $\gamma = 1.5$ and $\theta_{\max} = \pi/2$, and compute the mean and standard deviation of NND and KE over 20 iterations with $T_{conv} = 990$ and $T_{record} = 10$, as in the heat maps.
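The sweep protocol can be summarized in a few lines of Python; Simulation here is a hypothetical stand-in for the simulator, and nnd_pop/ke_pop are the metric sketches above.

```python
import numpy as np
from itertools import product

theta_vals = np.linspace(np.pi / 8, np.pi / 2, 10)
gamma_vals = np.linspace(0.25, 2.5, 10)
T_CONV, T_RECORD, RUNS, N_AGENTS = 990, 10, 20, 20

results = {}
for theta_max, gamma in product(theta_vals, gamma_vals):
    nnd_samples, ke_samples = [], []
    for _ in range(RUNS):
        sim = Simulation(n=N_AGENTS, theta_max=theta_max, gamma=gamma)  # hypothetical
        for _ in range(T_CONV):      # discard the convergence period
            sim.step()
        for _ in range(T_RECORD):    # record the converged behavior
            sim.step()
            nnd_samples.append(nnd_pop(sim.pos))
            ke_samples.append(ke_pop(sim.v))
    results[(theta_max, gamma)] = (np.mean(nnd_samples), np.mean(ke_samples))
```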
3.2 General Observations
While all four taxis-kinesis combinations produce distinct emergent behaviors, themes emerge in our characterization of the design space. Firstly, most behaviors are robust to noise, suggesting several stable states in the system phase space. In fact, without adequate noise, the system may get stuck in a "local minimum" where connectivity is too low to activate a steady state with higher kinetic energy, especially in cases of limited sensing, stimulus, and influence. If influence is too low, or the angular extent of directional sensors/stimulus is too narrow, the agents may be "under-stimulated" and will not necessarily produce the expected emergent behaviors. However, they may still form coherent flows or other temporary or local dynamical structures, as can be seen in the supplementary video. In a second case, if influence is too high or sensing and stimulus production are overpowered, the agents are "over-stimulated" and may produce an unexpected emergent behavior. Of course, in between these extremes there is a regime where the system produces emergent behaviors that are expected and indicate a balance between too much and too little reactivity;
but all behaviors are of technical interest for multi-modal artificial collectives. In the following sections, we will first characterize the balanced behavior regime, then identify under- and over-stimulated modes if they exist.
3.3 Love
“Loving” agents turn toward stimulus and slow down as local stimulus increases. For artificial collectives, this behavior is primarily useful for aggregation into a low-velocity collective, perhaps to await a new task or to establish a static network for exchange of information. Figure 2 presents an overview of the emergent behaviors of this taxis-kinesis combination.
Fig. 2. Characterization of love behavior (positive taxis, negative kinesis). (a) First row: NND and kinetic energy profiles for the directional stimulus and sensing model, varying the angle $\theta_{\max}$ identically for both stimulus and sensing, and the influence $\gamma$. The right-most figure is a snapshot of the behavior for the last 100 frames of a random simulation with $\gamma = 1$ and $\theta_{\max} = \pi/2$. Second row, (b) results for the omnidirectional stimulus and directional sensing model, varying the angle $\theta_{\max}$ (now only affecting angular sensing range) and the influence $\gamma$. Third row, (c) results for each sensor-stimulus model of NND and kinetic energy, while varying the noise scaling factor as a percent of the maximum value $\pi$ for rotational noise and $v_{\max} = 4$px for linear actuation noise. All data points have $\theta_{\max} = \pi/2$ where applicable and $\gamma = 1.5$.
Overall, this is one of the most consistent quadrants, with almost all parameter choices leading to reliable aggregative behaviors. Aggregation is not a new behavior in the decentralized reactive agent literature, and our results confirm that minimal sensing and reactive strategies lead to stable, low kinetic energy, low NND states regardless of noise. Of interest is that the three sensor/stimulus models have notably different mean NND as the noise magnitude increases, but the mean kinetic energy of all three implementations is unaffected by noise. In the case of directional stimulus and sensing, the converged steady state has higher NND because the agents cannot see nearby agents behind them, and thus an aligned following behavior is stable in the torus environment. In the cases with omnidirectional stimulus and/or sensing, the agents have sufficient mutual visibility such that some agents will halt completely and form more traditional aggregates.
3.4 Aggression
Aggressive agents turn toward stimuli and speed up. This behavior is useful for clustering and flocking in spaces where agents wish to keep a high momentum for quick dispersion after aggregation, or when agents must come together quickly or follow stimulus sources closely. As seen in Fig. 3(a) and (b), the traditional behavior of aggressive following is found to emerge for higher values of influence and $\theta_{\max}$, and the behavior is not as robust as love. At lower levels of stimulation, the agents tend to become more uniformly distributed, and qualitatively their behavior consists of short sprints toward stimulus before losing the signal and slowing down. Quantitatively, we also observe a lower average kinetic energy. As we increase the power of the sensing and stimulus models, mean kinetic energy increases, but mean NND stays relatively constant. We observe that for narrow $\theta_{\max}$, agents form small, slow clusters, but as $\theta_{\max}$ and influence increase, the collective forms chains of agents following each other around the torus. Agents closely follow each other, so mean NND stays low while collective kinetic energy increases. Noise has relatively little effect on the behavior of the system, apart from preventing a deadlocked stable state at zero noise for directional stimulus and sensing. Increasing noise also tends to increase the kinetic energy of the system for directional constraints, increasing the chance of sighting another agent through random-walk coverage and re-orientations. Finally, we observe that the standard deviation for kinetic energy is high in the directional cases, supporting our observation of bi-modal velocity distributions with this taxis-kinesis combination.
3.5 Fear
Fearful agents turn away from stimuli and slow down, and the resulting emergent behavior is a space-filling formation. This behavior lends itself well to applications that need automatic organization of a collective into a mostly static network of agents monitoring the environment or passing messages.
Fig. 3. Characterization of aggression behavior (positive taxis, positive kinesis). (a) First row: NND and kinetic energy profiles for the directional stimulus and sensing model, varying the angle $\theta_{\max}$ identically for both stimulus and sensing, and the influence $\gamma$. The right-most figure is a snapshot of the behavior for the last 100 frames of a random simulation with $\gamma = 1$ and $\theta_{\max} = \pi/2$. Second row, (b) results for the omnidirectional stimulus and directional sensing model, varying the angle $\theta_{\max}$ (now only affecting angular sensing range) and the influence $\gamma$. Third row, (c) results for each sensor-stimulus model of NND and kinetic energy, while varying the noise scaling factor as a percent of the maximum value $\pi$ for rotational noise and $v_{\max} = 4$px for linear actuation noise. All data points have $\theta_{\max} = \pi/2$ where applicable and $\gamma = 1.5$.
We see from Fig. 4 that overall, the NND metric is consistently high with this behavior regardless of sensor/stimulus model, nearly maximized in much of the design space. As sensing and stimulus production become more powerful, each agent is able to detect more stimulus around it, and becomes more “afraid” and moves more slowly. The under-stimulated case is apparent, dramatically so in the case of directional stimulus and sensing, where kinetic energy is consistently high across all noise values. The system is in a stable state of agents detecting a low amount of stimulus most of the time, and stopping and reorienting when they happen to cross paths. The resulting behavior is reminiscent of run-and-tumble models in active matter systems. The taxis-kinesis combination of fear is notably more robust with respect to noise than the previous two behaviors.
Fig. 4. Characterization of fear behavior (negative taxis, negative kinesis). (a) First row: NND and kinetic energy profiles for the directional stimulus and sensing model, varying the angle $\theta_{\max}$ identically for both stimulus and sensing, and the influence $\gamma$. The right-most figure is a snapshot of the behavior for the last 100 frames of a random simulation with $\gamma = 1$ and $\theta_{\max} = \pi/2$. Second row, (b) results for the omnidirectional stimulus and directional sensing model, varying the angle $\theta_{\max}$ (now only affecting angular sensing range) and the influence $\gamma$. Third row, (c) results for each sensor-stimulus model of NND and kinetic energy, while varying the noise scaling factor as a percent of the maximum value $\pi$ for rotational noise and $v_{\max} = 4$px for linear actuation noise. All data points have $\theta_{\max} = \pi/2$ where applicable and $\gamma = 1.5$.
3.6 Curiosity
Finally, in the curiosity mode, agents turn away from stimulus and speed up when more stimulus is detected. Intuitively, they are “seeking” regions with no stimulus in which to stop. The agents tend to disperse, similarly to fear, but display a greater sensitivity to noise and system design parameters. The curiosity behavior is applicable to the formation of space-filling static formations, as well as dynamic space-filling trajectories that have a higher average energy, depending on the sensing and stimulus model and design parameters. In Fig. 5, we see that counterintuitively, more directional models are better at maintaining a uniform spatial distribution. As θmax increases, agents are able to see more neighbors, and a stable state occurs where agents align and form a “search party” style formation, as seen in Fig. 5(a).
Fig. 5. Characterization of curiosity behavior (negative taxis, positive kinesis). (a) First row: NND and kinetic energy profiles for the directional stimulus and sensing model, varying the angle $\theta_{\max}$ identically for both stimulus and sensing, and the influence $\gamma$. The right-most figure is a snapshot of the behavior for the last 100 frames of a random simulation with $\gamma = 1$ and $\theta_{\max} = \pi/2$. Second row, (b) results for the omnidirectional stimulus and directional sensing model, varying the angle $\theta_{\max}$ (now only affecting angular sensing range) and the influence $\gamma$. Third row, (c) results for each sensor-stimulus model of NND and kinetic energy, while varying the noise scaling factor as a percent of the maximum value $\pi$ for rotational noise and $v_{\max} = 4$px for linear actuation noise. All data points have $\theta_{\max} = \pi/2$ where applicable and $\gamma = 1.5$.
At high $\theta_{\max}$ in the omnidirectional stimulus and directional sensing case, the system becomes overstimulated, as it also does in the fully omnidirectional case. In this mode, agents cannot escape stimuli, and maintain a consistently high speed. Interestingly, this behavior does not maximize the NND metric, indicating some additional collective structure, especially at lower noise values.
4 Conclusion and Outlook
We have presented a novel model of Braitenberg Vehicles with onboard stimulus, and characterized the behavior of the system with respect to nearest-neighbor distance and kinetic energy, under three sensing and stimulus models inspired by robotics applications. We identified several unexpected modes, such as both low
and high kinetic energy modes for curiosity. We also analyzed the system's response to noise, and found the emergent collective dynamics to be quite stable, giving hope for continued formalization of multi-agent system design. It is also interesting to note that noise often aids stability or can be used to tune the emergent behavior. We also identify certain "deadlock" states, such as aggression with directional stimulus and sensing; when agents cannot adequately detect others, the system can become "stuck" in a motionless state without noise. We identified that constraints on the angular range of sensing and stimulus have a key effect on behavior in much of the design space, supporting findings in other works on embodied system dynamics [12]. Other interesting design parameters that we have not considered here include agent density in the space, environment geometries and strategies for agent-environment interactions, heterogeneous collectives, and the influence of controllable or unreliable agents. In the supplemental video, available at https://youtu.be/9ngfH0vkOfc, we show a demonstration of how a single controllable agent can be used along with global mode-switching to guide a swarm through narrow passageways and explore a multi-room environment. Many of the behaviors that we have shown are amenable to more formal analysis. We also aim to use this platform for comparison of reactive, stateful, and learned strategies under minimal sensing, as well as develop an embodied representation of the agents that allows for design of agent-agent and agent-environment interactions. The Braitenberg Vehicle model's strength lies in leveraging the ability to switch a collective between several useful modes with minimal requirements in hardware and software. Acknowledgments. This project was funded by the Packard Fellowship for Science and Engineering, GETTYLABS, and the National Science Foundation (NSF) Grant #2042411.
References
1. Alicea, B., Dvoretskii, S., Felder, S., Gong, Z., Gupta, A., Parent, J.: Developmental embodied agents as meta-brain models. In: DevoNN Workshop (2020)
2. Braitenberg, V.: Vehicles: Experiments in Synthetic Psychology. MIT Press, Cambridge (1986)
3. Chen, J., Sun, R., Kress-Gazit, H.: Distributed control of robotic swarms from reactive high-level specifications. In: 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), pp. 1247–1254. IEEE (2021)
4. Coppola, M., Guo, J., Gill, E., de Croon, G.C.: Provable self-organizing pattern formation by a swarm of robots with limited knowledge. Swarm Intell. 13(1), 59–94 (2019)
5. Ding, T., et al.: Light-induced actuating nanotransducers. Proc. Natl. Acad. Sci. 113(20), 5503–5507 (2016)
6. Dorigo, M., Theraulaz, G., Trianni, V.: Swarm robotics: past, present, and future [point of view]. Proc. IEEE 109(7), 1152–1165 (2021)
7. Eberhart, R.C., Shi, Y., Kennedy, J.: Swarm Intelligence. Elsevier, Amsterdam (2001)
8. Frank, S., Kuijper, A.: Privacy by design: survey on capacitive proximity sensing as system of choice for driver vehicle interfaces. In: Computer Science in Cars Symposium, pp. 1–9 (2020)
9. Gardi, G., Ceron, S., Wang, W., Petersen, K., Sitti, M.: Microrobot collectives with reconfigurable morphologies, behaviors, and functions. Nat. Commun. 13(1), 1–14 (2022)
10. Garnier, S., et al.: The embodiment of cockroach aggregation behavior in a group of micro-robots. Artif. Life 14(4), 387–408 (2008)
11. Gauci, M., Chen, J., Li, W., Dodd, T.J., Groß, R.: Self-organized aggregation without computation. Int. J. Robot. Res. 33(8), 1145–1161 (2014)
12. Gautrais, J., Jost, C., Theraulaz, G.: Key behavioural factors in a self-organised fish school model. In: Annales Zoologici Fennici, vol. 45, pp. 415–428. BioOne (2008)
13. Hauert, S.: Swarm engineering across scales: from robots to nanomedicine. In: ECAL 2017, The Fourteenth European Conference on Artificial Life, pp. 11–12. MIT Press (2017)
14. Hogg, D.W., Martin, F., Resnick, M.: Braitenberg Creatures. Epistemology and Learning Group, MIT Media Laboratory, Cambridge (1991)
15. achille hui (https://math.stackexchange.com/users/59379/achille hui): Average distance between n randomly distributed points on a square with their nearest neighbors. Mathematics Stack Exchange. https://math.stackexchange.com/q/2565546. Accessed 11 Nov 2018
16. Jadbabaie, A., Lin, J., Morse, A.S.: Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans. Autom. Control 48(6), 988–1001 (2003)
17. Lanzisera, S., Zats, D., Pister, K.S.: Radio frequency time-of-flight distance measurement for low-cost wireless sensor localization. IEEE Sens. J. 11(3), 837–845 (2011)
18. LaViers, A., et al.: Choreographic and somatic approaches for the development of expressive robotic systems. In: Arts, vol. 7, p. 11. MDPI (2018)
19. Lei, L., Escobedo, R., Sire, C., Theraulaz, G.: Computational and robotic modeling reveal parsimonious combinations of interactions between individuals in schooling fish. PLoS Comput. Biol. 16(3), e1007194 (2020)
20. Li, S., et al.: Programming active cohesive granular matter with mechanically induced phase changes. Sci. Adv. 7(17), eabe8494 (2021)
21. Mayya, S.: Local encounters in robot swarms: from localization to density regulation. Ph.D. thesis, Georgia Institute of Technology (2019)
22. McFassel, G., Shell, D.A.: Reactivity and statefulness: action-based sensors, plans, and necessary state. Int. J. Robot. Res., 02783649221078874 (2021)
23. Mitrano, P., Burklund, J., Giancola, M., Pinciroli, C.: A minimalistic approach to segregation in robot swarms. In: 2019 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pp. 105–111. IEEE (2019)
24. O'Keeffe, K.P., Hong, H., Strogatz, S.H.: Oscillators that sync and swarm. Nat. Commun. 8(1), 1–13 (2017)
25. Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25–34 (1987)
26. Rezeck, P., Assunção, R.M., Chaimowicz, L.: Flocking-segregative swarming behaviors using Gibbs random fields. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 8757–8763. IEEE (2021)
27. Rezeck, P., Chaimowicz, L.: Chemistry-inspired pattern formation with robotic swarms. arXiv preprint arXiv:2206.03388 (2022)
28. Rosenthal, S.B., Twomey, C.R., Hartnett, A.T., Wu, H.S., Couzin, I.D.: Revealing the hidden networks of interaction in mobile animal groups allows prediction of complex behavioral contagion. Proc. Natl. Acad. Sci. 112(15), 4690–4695 (2015)
29. Rueben, M., et al.: Themes and research directions in privacy-sensitive robotics. In: 2018 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO), pp. 77–84. IEEE (2018)
30. Shahrokhi, S., Lin, L., Ertel, C., Wan, M., Becker, A.T.: Steering a swarm of particles using global inputs and swarm statistics. IEEE Trans. Rob. 34(1), 207–219 (2017)
31. Thornton, C.: Predictive processing simplified: the infotropic machine. Brain Cogn. 112, 13–24 (2017)
32. Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., Shochet, O.: Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett. 75(6), 1226 (1995)
33. Wiseman, J.: Braitenberg vehicles (1999). http://people.cs.uchicago.edu/wiseman/vehicles/
34. Yang, J.F., et al.: Memristor circuits for colloidal robotics: temporal access to memory, sensing, and actuation. Adv. Intell. Syst. 4(4), 2100205 (2022)
35. Yu, J., LaValle, S.M., Liberzon, D.: Rendezvous without coordinates. IEEE Trans. Autom. Control 57(2), 421–434 (2011)
36. Zardini, G., Censi, A., Frazzoli, E.: Co-design of autonomous systems: from hardware selection to control synthesis. In: 2021 European Control Conference (ECC), pp. 682–689. IEEE (2021)
Decision-Making Among Bounded Rational Agents

Junhong Xu¹, Durgakant Pushp¹, Kai Yin², and Lantao Liu¹

¹ Indiana University, Bloomington, IN 47408, USA
{xu14,dpushp,lantao}@iu.edu
² Expedia Group, Seattle, USA
Abstract. When robots share the same workspace with other intelligent agents (e.g., other robots or humans), they must be able to reason about the behaviors of their neighboring agents while accomplishing the designated tasks. In practice, agents frequently do not exhibit absolutely rational behavior because their computational resources are limited. Thus, predicting the optimal agent behaviors is both impractical (because it demands prohibitive computational resources) and undesirable (because the prediction may be wrong). Motivated by this observation, we remove the assumption of perfectly rational agents and propose incorporating the concept of bounded rationality from an information-theoretic view into the game-theoretic framework. This allows the robots to reason about other agents' sub-optimal behaviors and act accordingly under their computational constraints. Specifically, bounded rationality directly models the agent's information processing ability, which is represented as the KL-divergence between nominal and optimized stochastic policies, and the solution to the bounded-optimal policy can be obtained by an efficient importance sampling approach. Using both simulated and real-world experiments in multi-robot navigation tasks, we demonstrate that the resulting framework allows the robots to reason about different levels of rational behaviors of other agents and compute a reasonable strategy under their computational constraints. (A preliminary version of this work appeared as a poster at the 2021 NeurIPS Workshop on Learning and Decision-Making with Strategic Feedback. The video of the real-world experiments can be found at https://youtu.be/hzCitSSuWiI. We gratefully acknowledge the support of NSF grants No. 2006886 and 2047169.)
Keywords: Bounded Rationality · Game Theory · Multi-Robot System

1 Introduction
We consider the problem of generating reasonable decisions for robots in multi-agent environments. This decision-making problem is complex because each
robot's motion trajectory depends on and affects the trajectories of others. Thus, each robot needs to anticipate how others will respond to its decisions. The game-theoretic framework provides an appealing model choice to describe this complex decision-making problem among agents [16] and has been applied to various robotics applications, e.g., drone racing [24] and swarm coordination [1]. In an ideal situation, where all the agents are perfectly rational (i.e., they have unlimited computational resources), they can select the motion trajectories that reach the Nash equilibrium (if one exists). However, since these trajectories live in a continuous space, agents need to evaluate infinitely many trajectories and the interactions among them, which is intractable. As a remedy, most works constrain the search space of the multi-robot problem via sampling [25] or by locally perturbing the solution [24] to find a good trajectory within a reasonable amount of time. Most game-theoretic planners mentioned above do not explicitly consider the agents' computational limitations in their modeling process, i.e., they assume each agent is perfectly rational. They consider these limitations only externally, e.g., by truncating the number of iterations during optimization. In contrast, we propose a novel and more principled treatment for modeling agents with limited computational resources, by directly modeling the agents as only bounded-rational [6] in the game-theoretic framework. Bounded Rationality (BR) has been developed in economics [22] and cognitive science [7] to describe behaviors of humans (or other intelligent agents), who have limited computational resources and partial information about the world but still need to make decisions from an enormous number of choices, and has been applied to analyze the robustness of controllers in the single-agent setting [17]. In this work, we use the information-theoretic view of BR [6], which states that the agent optimizes its strategy under an information-theoretic constraint (e.g., KL-divergence) describing the cost of transforming its a-priori strategy into an optimized one. This problem can be solved efficiently by evaluating a finite number of trajectory choices from the prior trajectory distribution. Since BR explicitly considers the computational constraints during the modeling process, the resulting solution provides an explainable way to trade off computational efficiency and performance. Furthermore, by incorporating BR into the game-theoretic framework, robots can naturally reason about other agents' sub-optimal behaviors and use these predictions to take advantage of (or avoid) other agents with less (or higher) computational resources.
2 Related Work
Our work is related to motion planning in multi-agent systems and game-theoretic frameworks. Here we provide a brief overview of these topics. The game-theoretic approach in robotics has gained increasing popularity recently. For example, the authors in [23,24] combine Model-Predictive Control (MPC) and Iterated Best Response (IBR) to approximate the Nash Equilibrium solution of a two-player vehicle racing game. Recently, a game-theoretic iterative linear
quadratic regulator (iLQR) has been proposed to solve a general-sum stochastic game [21]. In addition, the game-theoretic framework is also used in self-driving vehicles for trajectory prediction [5,20] and motion planning among pedestrians [4,12]. The above works are all based on the concept of rational agents, who are assumed to be able to maximize their utility. In contrast, the proposed bounded-rational framework explicitly considers the information-processing constraints of intelligent agents. Although bounded-rational solutions are used in almost all robotic systems in practice, e.g., anytime planners terminate the computation if the time runs out [8], only a few works attempt to model this bounded-rational assumption explicitly. Recently, the authors in [17] analyzed single-agent robust control performance using information-theoretic bounded rationality [6,15]. Another closely related but independently developed literature is KL-Control [3,9]. When computing the optimal policy, it includes an additional information-theoretic cost measured by the KL-divergence between a prior policy distribution and a posterior after optimization. This is similar to the effect of the information-theoretic constraints of bounded rationality, where the final optimal control distribution can be sampled from the exponential family using approximate Bayesian inference [10,26]. Although these methods have similar traits, they generally focus only on single-agent problems. In contrast, our work integrates the bounded rationality idea into the game-theoretic framework and provides a method to compute agents' strategies under computational limits.
3 Problem Formulation
In this section, we define multi-agent decision-making using the formulation of Multi-Agent Markov Decision Processes (MMDPs) or Markov Games (MGs) [11], and provide the resulting Nash Equilibrium (NE) solution concept under the perfect rationality assumption. In the subsequent sections, we show how this assumption can be relaxed using the BR concept and derive a sampling-based method to find a Nash Equilibrium strategy profile under bounded rationality.
3.1 Multi-agent Markov Decision Process
In an MMDP with N agents, each agent i has its own state space $s^i \in \mathcal{S}^i$ and action space $a^i \in \mathcal{A}^i$, where $s^i$ and $a^i$ denote the state and action of agent i, and $\mathcal{S}^i$ and $\mathcal{A}^i$ denote the corresponding spaces. We denote the joint states and actions of all the agents as $S = [s^1, \ldots, s^N]$ and $A = [a^1, \ldots, a^N]$. The agents progress in the environment as follows. At every timestep t, all the agents simultaneously execute their actions $A_t$ to advance to the next states $S_{t+1}$ according to their stochastic transition functions $s^i_{t+1} \sim p^i(s^i_{t+1} \mid S_t, A_t)$. At the same time, they receive rewards $R_t = [r^1_t, \ldots, r^N_t]$, where $r^i_t = f^i_t(S_t, A_t)$ is the reward function for agent i. Each agent's stochastic transition and reward functions depend on all agents' states and actions in the system. Under the perfect rationality assumption, the goal of agent i is to find a strategy that maximizes an expected utility function,

$$ \pi^{i,*}_t = \arg\max_{\pi^i_t} U^i(S_t, \Pi_t), \qquad (1) $$
where $U^i(S_t, \Pi_t) = \mathbb{E}\left[\sum_{k=t}^{H+t} r^i_k(S_k, A_k)\right]$ is agent i's utility function at t, H is the planning horizon, and $\Pi_t = [\pi^1_t, \ldots, \pi^N_t]$ denotes the strategy profile of all agents. In this work, we assume that the agents' strategies take a specific form: a distribution over the action sequence, $\mathbf{a}^i_t \sim \pi^i_t(\mathbf{a}^i_t \mid S_t, \Pi^{-i}_t)$, where $\mathbf{a}^i_t = [a^i_t, \ldots, a^i_{t+H}]$ is the action sequence up to horizon H and $\Pi^{-i}_t$ is the strategy profile without agent i. This policy form is well-suited for most robotics problems because the trajectory induced by the action sequence can be tracked by a low-level controller.
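As an illustration of these definitions, the following sketch rolls out joint open-loop action sequences through the MMDP to evaluate an agent's utility. It is schematic, not the authors' implementation: Agent, transition, and reward are hypothetical names, and a single rollout stands in for the expectation over stochastic transitions.

```python
class Agent:
    """One agent of the MMDP: transition s_i' ~ p_i(.|S, A), reward f_i(S, A)."""
    def __init__(self, transition, reward):
        self.transition = transition
        self.reward = reward

def utility(i, agents, S0, action_seqs, H):
    """Single-rollout estimate of U^i(S_t, Pi_t): sum of agent i's rewards
    over horizon H when every agent plays its open-loop action sequence."""
    S, total = list(S0), 0.0
    for t in range(H):
        A = [seq[t] for seq in action_seqs]         # joint action at step t
        total += agents[i].reward(S, A)
        S = [ag.transition(S, A) for ag in agents]  # simultaneous transitions
    return total
```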
3.2 Iterative Best Response for MMDPs
To solve the problem defined in Eq. (1), each agent needs to predict how other agents will behave and respond to each other. For brevity, we will write $\pi^i(\mathbf{a}^i_t \mid S_t)$ as agent i's strategy and omit the dependency on $\Pi^{-i}$. One possible and common way to predict other agents' behaviors is by assuming all other agents are perfectly rational, and thus the strategy profile of all the agents reaches the Nash Equilibrium (NE) (if one exists) [16], which satisfies the following relationship:

$$ U^i(S_t, \Pi^{-i,*}_t, \pi^{i,*}_t) \ge U^i(S_t, \Pi^{-i,*}_t, \pi^i_t), \quad \forall i \in \{1, \ldots, N\}, \text{ for any } \pi^i_t. \qquad (2) $$
Intuitively, if the agents satisfy the NE, no agent can improve its utility by unilaterally changing its strategy. To compute a NE, we can apply a numerical procedure called Iterative Best Response (IBR) [19]. Starting from an initial guess of the strategy profile of all the agents, we update each agent's strategy to the best response to the current strategies of all other agents. This procedure is applied iteratively for each agent until the strategy profile no longer changes. If every agent is perfectly rational, the robot can use the NE strategy profile to predict other agents' behaviors and act accordingly (we make the simplifying assumption that there is only one NE in the game). However, there is a gap between this perfectly rational assumption and the real world, as most existing methods can only search the strategy profile in a neighborhood of the initial guess [23,24] due to computational limits. In the following section, we fill this gap by explicitly formulating a bounded-rational solution.
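The IBR loop itself is simple; a schematic Python version follows. Here best_response is a stand-in for whatever optimizer computes the inner maximization in Eq. (2); exact best responses over continuous action sequences are generally intractable, which is the gap the next section addresses.

```python
def iterated_best_response(agents, S, init_profile, n_iters=10):
    """Plain IBR under perfect rationality (illustrative sketch):
    repeatedly replace each agent's strategy with a best response
    to the others. `best_response` is a hypothetical optimizer."""
    profile = list(init_profile)
    for _ in range(n_iters):
        changed = False
        for i in range(len(agents)):
            new_pi = best_response(i, agents, S, profile)
            if new_pi != profile[i]:   # equality check is schematic
                profile[i] = new_pi
                changed = True
        if not changed:
            break  # no agent can improve unilaterally: a (local) NE
    return profile
```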
4 Methodology
This section first provides details on the formulation of bounded rationality and integrates it into the game-theoretic framework. Then, we propose a sampling-based approach to generate bounded-rational stochastic policies.
4.1 Bounded Rational Agents in Game-Theoretic Framework
In the standard game-theoretic frameworks, agents are rational, i.e., they optimize their decisions by evaluating an infinite number of action sequences without regard to computational resources. In contrast, we model each agent
as bounded rational: it makes decisions that maximize its utility subject to a certain computational constraint. Following the work in information-theoretic bounded rationality [6,14], this constraint is explicitly defined by the neighborhood of a default policy $q^i$. Intuitively, this default policy describes the nominal behavior of the agent. For example, in a driving scenario, the default policy of an aggressive driver may be more likely to drive at a high speed. The agent's goal is to search for an optimized posterior policy bounded within the neighborhood of $q^i$. The size of this neighborhood may reflect the limited computational resources or other practical considerations. In the following, we omit the time subscript t for clarity. We use the KL-divergence to characterize this neighborhood:

$$ \pi^{i,*} = \arg\max_{\pi^i} U^i(S, \Pi), \quad \text{s.t. } \mathrm{KL}(\pi^i \,\|\, q^i) \le K_i. \qquad (3) $$
$K_i$ is a constant denoting the amount of computation (measured in bits) by which agent i may deviate from the default policy. Using Lagrange multipliers, we can rewrite the constrained optimization problem in Eq. (3) as the unconstrained problem $\pi^{i,*} = \arg\max_{\pi^i} U^i(S, \Pi) - \frac{1}{\beta_i}\mathrm{KL}(\pi^i \| q^i)$, where $\beta_i > 0$ indicates the rationality level. To see how this bounded-optimal stochastic policy can be computed, observe that the unconstrained objective can be written as

$$
U^i(S, \Pi) - \frac{1}{\beta_i}\,\mathrm{KL}(\pi^i \| q^i)
= -\frac{1}{\beta_i}\left[\mathrm{KL}(\pi^i \| q^i) - \beta_i\, U^i(S, \Pi)\right]
= -\frac{1}{\beta_i}\int \pi^i(a^i \mid S)\,\log\frac{\pi^i(a^i \mid S)}{q^i(a^i)\,e^{\beta_i U^i(S, \Pi)}}\,da^i
= -\frac{1}{\beta_i}\,\mathrm{KL}(\pi^i \| \psi^i), \qquad (4)
$$

where $\psi^i(a^i \mid S) \propto q^i(a^i)\,e^{\beta_i U^i(S, \Pi)}$. Since the KL-divergence is non-negative, the maximum of $-\frac{1}{\beta_i}\mathrm{KL}(\pi^i \| \psi^i)$ is attained only when $\mathrm{KL}(\pi^i \| \psi^i) = 0$, which means $\pi^i = \psi^i$. Therefore, the optimal action-sequence distribution of agent i (while keeping other agents' strategies fixed) under the bounded rationality constraint is

$$ \pi^{i,*}(a^i \mid S, \Pi^{-i}) = \frac{1}{Z}\, q^i(a^i)\, e^{\beta_i U^i(S, \Pi)}, \qquad (5) $$

where $Z = \int q^i(a^i)\, e^{\beta_i U^i(S, \Pi)}\, da^i$ is a normalization constant. This bounded-optimal strategy provides an intuitive and explainable way to trade off between computation and performance. As $\beta_i$ increases from 0 toward infinity, the agent becomes more rational and requires more time to compute the optimal behavior. When $\beta_i = 0$, the bounded-rational policy reduces to the prior, meaning agent i has no computational resources to leverage. On the other hand, as $\beta_i \to \infty$, the agent becomes entirely rational and selects the optimal action sequence deterministically. The rationality parameter $\beta_i$ allows us to model agents with different amounts of computational resources.
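A small numerical sketch of Eq. (5) over a finite candidate set makes the two limits concrete. Here the prior q is uniform, so it cancels in the normalization; with a non-uniform prior the weights would be multiplied by q.

```python
import numpy as np

def bounded_rational_weights(utilities, beta):
    """Normalized weights proportional to q * exp(beta * U) for a finite
    candidate set under a uniform prior q (which therefore cancels)."""
    w = np.exp(beta * (utilities - np.max(utilities)))  # shift for stability
    return w / w.sum()

U = np.array([1.0, 2.0, 3.0])
print(bounded_rational_weights(U, 0.0))    # beta -> 0: recovers the (uniform) prior
print(bounded_rational_weights(U, 100.0))  # large beta: mass concentrates on argmax
```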
Similar to the rational case, our goal is to find the Nash Equilibrium strategy profile for a group of bounded-rational agents whose bounded-optimal policies are defined in Eq. (5). This can be done using an IBR procedure analogous to Sect. 3: instead of optimizing the policy in Eq. (1), each bounded-rational agent finds the optimal strategy distribution defined in Eq. (5) while keeping other agents' strategies fixed. This procedure is carried out for each agent for a fixed number of iterations or until no agent's stochastic policy changes.
4.2 Importance Sampling for Computing Bounded-Rational Strategies
The previous section described the bounded rationality concept, its integration with the game-theoretic framework, and the IBR numerical method used to solve for a bounded-rational Nash Equilibrium strategy profile. To actually compute the bounded-rational strategies, we need an efficient way to query samples from the distribution in Eq. (5) for each agent. Since it is relatively easy to sample actions from the default policy $q^i(\mathbf{a}^i)$, we can use importance sampling to estimate the expectation of the optimal action sequence as the best response for agent i while keeping others' actions fixed [2]:

$$ \mathbb{E}_{\mathbf{a}^{i,*}_t \sim \pi^{i,*}_t}\!\left[\mathbf{a}^{i,*}_t \mid S_t, \Pi^{-i}_t\right] = \frac{1}{Z}\int \mathbf{a}^i_t\, q^i_t(\mathbf{a}^i_t)\, e^{\beta U^i(S_t, \Pi_t)}\, d\mathbf{a}^i_t = \frac{1}{Z}\,\mathbb{E}_{\mathbf{a}^i_t \sim q^i_t(\mathbf{a}^i_t)}\!\left[w(\mathbf{a}^i_t)\,\mathbf{a}^i_t\right] \approx \frac{1}{Z}\frac{1}{K}\sum_{k=1}^{K} w(\mathbf{a}^i_{t,k})\,\mathbf{a}^i_{t,k}, \qquad (6) $$

where $w(\mathbf{a}^i_t) = \exp\{\beta \sum_{k=t}^{H+t} r^i_k(S_k, A_k)\}$ and $\mathbf{a}^i_{t,k}$ denotes the k-th of K samples drawn from the default policy. Similarly, the normalization constant can be approximated as

$$ Z = \int q^i(\mathbf{a}^i_t)\, e^{\beta U^i(S_t, \Pi_t)}\, d\mathbf{a}^i_t = \mathbb{E}_{\mathbf{a}^i_t \sim q^i(\mathbf{a}^i_t)}\!\left[w(\mathbf{a}^i_t)\right] \approx \frac{1}{K}\sum_{k=1}^{K} w(\mathbf{a}^i_{t,k}). \qquad (7) $$

At a high level, the importance sampling procedure proceeds as follows. The agents first propose action-sequence samples from their default policies $q^i_t(\mathbf{a}^i)$ and then assign each sequence a weight $w(\mathbf{a}^i_t)$ indicating its value based on the agents' utilities and rationality levels. Finally, by combining Eq. (6) and Eq. (7), we can use the weighted average to compute the expected optimal action sequence,

$$ \mathbb{E}_{\mathbf{a}^{i,*}_t \sim \pi^{i,*}_t}\!\left[\mathbf{a}^{i,*}_t \mid S_t, \Pi^{-i}_t\right] \approx \frac{\sum_{k=1}^{K} w(\mathbf{a}^i_{t,k})\,\mathbf{a}^i_{t,k}}{\sum_{k=1}^{K} w(\mathbf{a}^i_{t,k})}. \qquad (8) $$
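A compact sketch of this estimator, with the self-normalized weights computed in log space for numerical stability. All function and parameter names (default_policy, utility, K) are placeholders of ours, not the authors' API.

```python
import numpy as np

def expected_best_response(i, S_t, others_seqs, default_policy, utility, beta, K):
    """Self-normalized importance sampling for Eqs. (6)-(8): draw K action
    sequences from the default policy q^i, weight each by exp(beta * U^i)
    with the other agents' sequences held fixed, and return the
    weighted-average action sequence."""
    samples = np.stack([default_policy() for _ in range(K)])  # (K, H, action_dim)
    utils = np.array([utility(i, S_t, seq, others_seqs) for seq in samples])
    log_w = beta * utils
    log_w -= log_w.max()               # stabilize the exponentials
    w = np.exp(log_w)
    w /= w.sum()                       # Eq. (8): weights normalized per Eq. (7)
    return np.tensordot(w, samples, axes=1)  # sum_k w_k * a_{t,k}
```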
To find the bounded-rational Nash Equilibrium strategy profile, we replace the optimization step of the IBR procedure in Eq. (2) with the above importance sampling. One important observation is that the shape of the prior distribution $q^i$, the number of evaluated samples K, and the rationality level $\beta$ play essential roles in the final performance. Their relationships are explored in the experimental section.
5 Simulated Experiments
We conduct extensive simulation experiments to demonstrate that integrating bounded rationality with the game-theoretic framework (1) allows each agent to reason about other agents' rationality levels so as to exploit (or avoid) others with less (or more) computational capability, and (2) explicitly trades off performance against computation by varying the rationality level $\beta$ and the number of sampled trajectories. We also show qualitatively that our method can generate group behaviors that approximately reach a bounded-rational Nash Equilibrium strategy profile for a varying number of agents.
5.1 Simulation Setup
In this experiment, we consider the task of navigating a group of aerial vehicles in a 3D space. Each agent's goal is to swap its position with the diametrically opposite one while avoiding collisions. The distance between each agent and its goal is 6 m. This environment is neither fully cooperative, as each agent has a different goal location, nor fully competitive, because the objectives are not totally opposed (zero-sum). Thus, the agents need to collaborate with each other to avoid blockage in the center and at the same time compete with each other to follow their shortest paths to the goals. The agents are homogeneous in terms of their sizes and physical capabilities. Each agent has a size of 0.125 m in the x, y, z directions (similar to the dimensions of the Crazyflie drones used in the next section). Similar to [24], we set their transition functions using a deterministic single-integrator model with minimum and maximum speeds $a_{min} = 0$ m/s and $a_{max} = 1$ m/s. We set a uniform prior $q^i(a) = \mathrm{Uniform}(a_{min}, a_{max})$ as the default policy for all the agents. The number of IBR iterations is set to 10, as we found that, under varying parameters, IBR usually converges to a bounded-rational NE strategy within 10 iterations. In all the simulations, we assume that the rationality levels of all the agents are known to each other. The one-step reward function is the same for each agent and is set to penalize collisions and large distances to the goal. We run each simulation 50 times with T = 80 timesteps.
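For concreteness, the sketch below shows one way to realize the uniform default policy and the deterministic single-integrator rollout described above. The timestep DT and horizon H are placeholders (the paper does not state them), and drawing uniformly random 3D directions is our assumption about how the speed prior is turned into velocity commands.

```python
import numpy as np

A_MIN, A_MAX = 0.0, 1.0   # speed bounds (m/s), from the setup
DT, H = 0.1, 10           # placeholder timestep and horizon

def sample_action_sequence(rng):
    """Default policy q^i: uniform speeds in [A_MIN, A_MAX] with
    uniformly random 3D directions, one velocity command per step."""
    speeds = rng.uniform(A_MIN, A_MAX, size=(H, 1))
    dirs = rng.normal(size=(H, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return speeds * dirs

def rollout(x0, actions):
    """Deterministic single-integrator transitions: x_{t+1} = x_t + v_t * DT."""
    return x0 + DT * np.cumsum(actions, axis=0)
```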
5.2 Results
We first show that, using the proposed framework, agents can naturally reason about the behaviors of others with the same physical capabilities but different
Fig. 1. (a) Comparison of the performance when increasing the ego's rationality level from $\beta = 0.01$ to $\beta = 0.13$ while keeping other agents' rationality levels fixed, in six-agent (solid lines) and eight-agent (dashed lines) environments. The x-axis and y-axis indicate the ego's $\beta$ values and traveled distances. The green lines are the average travel distance of the other agents, and the red lines indicate the ego's travel distance. (b), (c) The agents' trajectories in the six-agent environment with $\beta = 0.01$ and $\beta = 0.13$, respectively. The ego's trajectories are depicted in red.
rationality levels $\beta$. Since we want to examine how varying $\beta$ affects the performance, we need to ensure that the policy converges to the "optimal" one under the computational constraints. Thus, we sample a large number of trajectories, $5 \times 10^5$, for policy computation for each agent. In this simulation, we fix the other agents' $\beta = 0.05$, vary one agent's (the ego agent's) $\beta$ from 0.01 to 0.13, and compare the travel distances between the ego and the other agents (averaged over the other agents). Figure 1(a) shows the performance comparison in six- and eight-agent environments. In general, when the ego has a lower rationality level $\beta$ than the other agents, it avoids them by taking a longer path. When all the agents have the same $\beta$, the ego and the other agents perform similarly. As $\beta$ increases, the ego begins to exploit its advantage in computing more rational decisions and takes shorter paths. We also notice that the ego generally performs better with a large $\beta$ when there are more agents in the scene. The trajectories of the six-agent environment for $\beta = 0.01$ and $\beta = 0.13$ are plotted in Fig. 1(b) and Fig. 1(c), respectively. When the ego's $\beta = 0.01$ (red trajectory), it takes a detour by elevating to a higher altitude to avoid the other agents. In contrast, when its $\beta = 0.13$, it pushes the other agents aside and takes an almost straight path. These results are aligned with Fig. 1(a). We omit the trajectories of the eight-agent environment to avoid clutter; the readers are encouraged to watch the videos at https://youtu.be/hzCitSSuWiI. Next, we evaluate the performance of a group of agents with the same $\beta$ to show that the trade-off between performance and computation can be directly controlled by the rationality level. Since most of the computation occurs when evaluating a large number of sampled trajectories in the proposed importance-sampling-based method, we can use the number of evaluated trajectories as a proxy for the amount of computation an agent possesses. In Fig. 2(a), we analyze the relationship between $\beta$ and the computation constraint in the six-agent environment. We can observe that when the computation is limited,
Fig. 2. (a) Compares the performance (average traveled distance of the group) for different $\beta$ values using different numbers of trajectories in the six-agent environment. The x and y axes are the number of trajectories and the average traveled distance of the group. (b) Shows the number of trajectories required to converge for different $\beta$ in four-, six-, and eight-agent environments.
Fig. 3. Minimum distances of each agent to the other agents at every timestep in (a) four-agent, (b) six-agent, and (c) ten-agent environments. The x-axis and y-axis are the timesteps and agents' minimum distances, respectively. The colored solid lines represent the statistics of each agent and match the trajectory colors in Fig. 4. The dashed grey line shows the minimum safety distance (0.25 m) that must be maintained to avoid collisions.
i.e., only a small number of action sequences can be evaluated, a larger $\beta$ (more rational) actually hurts performance. When the more rational agents have the resources to evaluate more trajectories, they travel less distance on average than the less rational groups. This result demonstrates that, by controlling the rationality parameter, the bounded-rational framework can effectively trade off between optimality and limited computation. In Fig. 2(b), we also evaluate the number of trajectories that must be sampled for the method to converge at different rationality levels in four-, six-, and eight-agent environments. The result aligns with the previous observation: in general, when $\beta$ is larger, more trajectories must be evaluated to converge to the "optimal" behavior under the bounded-rational constraints. Furthermore, when more agents are present in the environment, the method requires more trajectories to converge.
Fig. 4. Agents’ trajectories in environments with four, six, and ten agents. The three columns show the snapshots at t = 20, 60, 80, respectively. Each agent, its trajectory, and its goal are assigned the same unique color. We only show the last 10 steps of trajectories to avoid clutter.
Finally, we show qualitative trajectory results of the group behaviors under bounded rationality, using a fixed $\beta = 0.1$ and number of samples $K = 5 \times 10^5$ for each agent, in Fig. 4. Additionally, we provide the minimum distance of each agent to the other agents in Fig. 3. The result shows that in each environment, the agents can maintain a safe distance > 0.25 m to avoid collisions.
6 Physical Experiments
We use the Crazyflie 2.1 nano-drones under a motion capture system to validate our method in the real world. For this hardware experiment, we consider two types of tasks with a varying number of agents. The first task is to navigate a group of drones to a designated goal region while avoiding static obstacles and inter-drone collisions. The second task is position swapping, similar to the previous section. The size of the workspace considered in all the experiments is 4.2 m × 5.4 m × 2 m in the x, y, and z axes, and the size of the drone is
Fig. 5. (a) Shows the experimental setup with 4 Crazyflie drones. The green arrow points to the goal position; all the agents navigate to the same goal point. (b) Snapshot of the experiment, showing the paths followed by the agents to avoid the obstacles.
Fig. 6. Shows the experimental setup with 6 Crazyflie drones. The drones are divided into two groups, red and green. (a) Shows the initial positions of the drones; the task is to swap positions. Black lines show the assigned swapping tasks among agents. (b) Snapshot of the experiment during swapping, showing the paths followed by the agents to avoid collisions with each other.
92 mm × 92 mm × 29 mm. To mitigate the downwash effect and control inaccuracy, we buffer the drone's collision size to 0.5 m × 0.5 m × 0.5 m. We use the Crazyswarm [18] platform to control the drones. For each run, we generate trajectories with the proposed method, using $K = 3 \times 10^5$ samples and $\beta = 0.1$ for all the agents. These trajectories contain a sequence of (x, y, z) coordinates. We use minimum-snap trajectory optimization and control [13] to track the trajectories generated by the proposed planner. We show two representative scenarios for each task; for complete experimental videos, please refer to https://youtu.be/hzCitSSuWiI. Figure 5 shows that a group of four drones must go from the red zone to the green zone while avoiding four obstacles of various sizes distributed around the center of the workspace. Note that one of the drones opted to go over the obstacle of height 1.5 m, which shows that it finds a path through a space as narrow as the size
of the drone (0.5 m) in the z-axis. This event is captured in the snapshot shown in Fig. 5(b). Figure 6 shows the position-swapping scenario. We use the same dynamics model as in the previous section to generate the trajectories. We observe that the outcomes of the physical experiments are consistent with the results obtained in simulation.
7 Conclusion
This paper considers the problem of making sequential decisions for agents with finite computational resources, where they need to interact with each other to complete their designated tasks. This problem is challenging because each agent needs to evaluate an infinite number of possible decisions (e.g., waypoints or actuator commands) and reason about how others will respond to its behavior. While the game-theoretic formulation provides an elegant way to describe this problem, it is based on the unrealistic assumption that agents are perfectly rational and have the ability to evaluate the entire decision space. We propose a formulation that replaces this rationality assumption with the bounded rationality concept, and present a sampling-based approach to computing agents' policies under their computational constraints. As shown in the experiments, by removing the perfect rationality assumption, the proposed formulation allows the agents to take advantage of those with less computational power or to avoid those who are more computationally capable. Additionally, when all the agents are similarly capable, they exhibit behaviors that avoid being taken advantage of by others.
References
1. Abdelkader, M., Güler, S., Jaleel, H., Shamma, J.S.: Aerial swarms: recent applications and challenges. Current Rob. Reports 2(3), 309–320 (2021)
2. Bishop, C.M., Nasrabadi, N.M.: Pattern Recognition and Machine Learning, vol. 4. Springer (2006)
3. Botvinick, M., Toussaint, M.: Planning as inference. Trends Cogn. Sci. 16(10), 485–488 (2012)
4. Chen, Y.F., Everett, M., Liu, M., How, J.P.: Socially aware motion planning with deep reinforcement learning. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1343–1350. IEEE (2017)
5. Fisac, J.F., Bronstein, E., Stefansson, E., Sadigh, D., Sastry, S.S., Dragan, A.D.: Hierarchical game-theoretic planning for autonomous vehicles. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 9590–9596. IEEE (2019)
6. Genewein, T., Leibfried, F., Grau-Moya, J., Braun, D.A.: Bounded rationality, abstraction, and hierarchical decision-making: an information-theoretic optimality principle. Front. Robot. AI 2, 27 (2015)
7. Gigerenzer, G., Brighton, H.: Homo heuristicus: why biased minds make better inferences. Top. Cogn. Sci. 1(1), 107–143 (2009)
8. Ingrand, F., Ghallab, M.: Deliberation for autonomous robots: a survey. Artif. Intell. 247, 10–44 (2017)
Decision-Making Among Bounded Rational Agents
285
9. Kappen, H.J., G´ omez, V., Opper, M.: Optimal control as a graphical model inference problem. Mach. Learn. 87(2), 159–182 (2012) 10. Lambert, A., Fishman, A., Fox, D., Boots, B., Ramos, F.: Stein variational model predictive control. arXiv preprint arXiv:2011.07641 (2020) 11. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Machine Learning Proceedings 1994, pp. 157–163. Elsevier (1994) 12. L¨ utjens, B., Everett, M., How, J.P.: Safe reinforcement learning with model uncertainty estimates. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 8662–8668. IEEE (2019) 13. Mellinger, D., Kumar, V.: Minimum snap trajectory generation and control for quadrotors. In: 2011 IEEE International Conference on Robotics and Automation, pp. 2520–2525 (2011) 14. Ortega, P.A., Braun, D.A.: Thermodynamics as a theory of decision-making with information-processing costs. Proc. Royal Soc. A: Mathem. Phys. Eng. Sci. 469(2153), 20120683 (2013) 15. Ortega, P.A., Braun, D.A., Dyer, J., Kim, K.-E., Tishby, N.: Information-theoretic bounded rationality. arXiv preprint arXiv:1512.06789 (2015) 16. Osborne, M.J., et al.: An introduction to game theory, vol. 3. Oxford University Press New York (2004) 17. Pacelli, V., Majumdar, A.: Robust control under uncertainty via bounded rationality and differential privacy. arXiv preprint arXiv:2109.08262 (2021) 18. Preiss, J.A., H¨ onig, W., Sukhatme, G.S., Ayanian, N.: Crazyswarm: a large nanoquadcopter swarm. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 3299–3304. IEEE (2017). https://github.com/USC-ACTLab/ crazyswarm 19. Reeves, D., Wellman, M.P.: Computing best-response strategies in infinite games of incomplete information. arXiv preprint arXiv:1207.4171 (2012) 20. Schwarting, W., Alonso-Mora, J., Rus, D.: Planning and decision-making for autonomous vehicles. Annual Rev. Control Robot. Autonom. Syst. 1, 187–210 (2018) 21. Schwarting, W., Pierson, A., Karaman, S., Rus, D.: Stochastic dynamic games in belief space. IEEE Trans. Robot. (2021) 22. Simon, H.A.: A behavioral model of rational choice. Q. J. Econ. 69(1), 99–118 (1955) 23. Spica, R., Cristofalo, E., Wang, Z., Montijano, E., Schwager, M.: A real-time game theoretic planner for autonomous two-player drone racing. IEEE Trans. Rob. 36(5), 1389–1403 (2020) 24. Wang, M., Wang, Z., Talbot, J., Gerdes, J.S., Schwager, M.: Game theoretic planning for self-driving cars in competitive scenarios. In: Robotics: Science and Systems (2019) 25. Williams, G., Goldfain, B., Drews, P., Rehg, J.M., Theodorou, E.A.: Best response model predictive control for agile interactions between autonomous ground vehicles. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 2403–2410. IEEE (2018) 26. Williams, G., et al.: Information theoretic mpc for model-based reinforcement learning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1714–1721. IEEE (2017)
Distributed Multi-robot Information Gathering Using Path-Based Sensors in Entropy-Weighted Voronoi Regions Alkesh Kumar Srivastava1(B) , George P. Kontoudis1 , Donald Sofge2 , and Michael Otte1 1
2
University of Maryland, College Park, MD 20742, USA {alkesh,kont,otte}@umd.edu U.S. Naval Research Laboratory, Washington DC 20375, USA [email protected]
Abstract. In this paper, we present a distributed information-gathering algorithm for multi-robot systems that use multiple path-based sensors to infer the locations of hazards within the environment. Path-based sensors output binary observations, reporting whether or not an event (like robot destruction) has occurred somewhere along a path, but without the ability to discern where along a path an event has occurred. Prior work has shown that path-based sensors can be used for search and rescue in hazardous communication-denied environments—sending robots into the environment one-at-a-time. We extend this idea to enable multiple robots to search the environment simultaneously. The search space contains targets (human survivors) amidst hazards that can destroy robots (triggering a path-based hazard sensor). We consider a case where communication from the unknown field is prohibited due to communication loss, jamming, or stealth. The search effort is distributed among multiple robots using an entropy-weighted Voronoi partitioning of the environment, such that during each search round all regions have approximately equal information entropy. In each round, every robot is assigned a region in which its search path is calculated. Numerical Monte Carlo simulations are used to compare this idea to other ways of using path-based sensors on multiple robots. The experiments show that dividing search effort using entropy-weighted Voronoi partitioning outperforms the other methods in terms of the information gathered and computational cost. Keywords: path planning · multi-robot exploration · distributed decision making · Shannon Information Theory · information-driven partitioning · Bayesian Probability
1
Introduction
Autonomous agents are increasingly used in scientific, commercial, and military applications. In critical missions of locating human survivors after disasters, c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 286–299, 2024. https://doi.org/10.1007/978-3-031-51497-5_21
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions
287
Fig. 1. Information gathering with two robots where the search space is decomposed in two entropy-weighted Voronoi partitions. The dotted lines represent the path that the agent traverses in the three search rounds. Robot 1 efficiently traverses the path of round 1, reporting a potential target at the location shown in green. However, it gets destroyed on round 2 and 3, and thus we only know that a hazard exist along these two paths. The overlapping areas of the round 2 and 3 for robot 1 have higher probability of hazard existence (darker gray). Robot 2 gets destroyed at round 1, and survives round 2 and 3. Since robot 2 reports a target at the same location at round 2 and 3, the probability of target existence is higher (darker green).
the deployment of multi-robot systems may be more efficient than deploying a single agent. In this work, we are primarily motivated by search and rescue missions of human survivors in lethally hostile environments. We consider a general case where information exchange between agents in the field is prohibited. Yet, the locations of base stations—where robots start and end each search round’s journey through the environment—provide connectivity to a shared centralized server (i.e., before and after each journey). The environment is completely unknown and lethally hostile such that agents may be destroyed by a stationary hazards during each search round, and the locations of hazards are initially unknown. Due to the lack of communication in the field, agent destruction happens at an unknown location along the agent’s path and leads to loss of any information collected about targets along that path. However, agent loss also provides information about the location of hazards within the environment (see Fig. 1). Even though we do not know where along the path the event of robot destruction has occurred, the shape of the path constrains the set of possible locations at which the hazard may have destroyed the agent. The idea that a path’s shape provides information that can be used to update a belief map (e.g., of hazard presence) given that an event (e.g., robot destruction) has occurred somewhere along a path is called a path-based sensor [1]. The objective of the work described in the current paper is to extend the use of path-based sensors to the case where multiple robots explore the environment simultaneously and in a distributed fashion, to improve the computational efficiency and accelerate the convergence speed of the hazard and target belief maps. The proposed distributed methodology decomposes this problem into two distinct sub-problems – decomposition of search space, and path planning in the
288
A. K. Srivastava et al.
decomposed space. Thus, this technique is a decoupled approach consisting of two steps: (i) centralized Shannon entropy-driven partitioning of the environment into a set of disjoint regions (one region per agent); and (ii) distributed information theoretic path planning of each agent in its region. The first step assigns each agent a particular partitioned region in the search space. To ensure that each point in the partitioned space is given to the nearest base station and affects the entropy of the partitioned space the least, we use entropy-weighted Voronoi partitioning. This helps distribute the workload among the agents for the next step, where each agent simultaneously and locally plans an information-driven path in its corresponding search space, as described in Fig. 1. Thereby helping the multi-robot system to efficiently and robustly infer the hazard and target belief map of the environment. Related Work: The concept of path-based sensors was introduced in [1], which focused on the scenario where multiple agents explored the environment sequentially. The current paper compares three different extensions of the path-based sensor idea in which m agents explore the environment in parallel during each search round. We now discuss two different bodies of related work. We begin by reviewing the literature that studied the passive cooperation problem for coverage path planning (CPP) algorithms, and then we review work on information gathering. CPP algorithms use a combination of environmental partitioning and passive communication [3]. In [4], Nair and Guruprasad use a partition-and-cover approach based on Voronoi partitioning to achieve a passive cooperation between agents to avoid task duplicity. Distributed strategies are used in [5–12] to keep track of regions covered. The use of geodesic or Manhattan distance as a distance metric to improve exploration is proposed in [13]. Cooperative sweeping of the environment by multiple robots is studied in [14] and the use of entropy as a weight for Voronoi tessellation was discussed in [15]. In our work, we employ a divide-and-conquer approach to environmental partitioning using entropy-weighted Voronoi decomposition. Our work assumes that base stations have a centralized communication topology, and we focus on using path-based sensors to gather information about hazards when communication is denied in the field. Recursive Bayesian filters are used in the literature to update belief maps of the environment [16,17]. However, [17] assumed knowledge of exact location of malfunctioning agents, i.e., false positive location is known. A formal derivation of mutual information gradient for information gathering is presented in [18]. The latter introduced information surfing, the idea of using the gradient of mutual information for information gathering, and the work is extended in [19,20], but with the assumption that a hazard could not destroy the agents. The focus of this work is on combining path-based sensors and information gathering in environments with lethal hazards and targets, using a team of imperfect robots that may malfunction (false positive) or report false negative observations. Contribution: The contribution of this paper is twofold. First, we present a distributed methodology for information gathering in a multi-robot system using
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions
289
path-based sensors. The proposed technique is a decoupled two-step scheme that uses a distributed information-theoretic mechanism for partitioning the environment and local information-theoretic planner to maximize information gathering. Second, we also synthesize a centralized global planner for information gathering in a multi-robot system that uses sequential Bayesian filter to calculate estimated belief maps of the environment.
2
Problem Formulation
Consider a spatial environment which is divided into a×b discrete cells composing the search space S ∈ R2 . A team of M agents is deployed in the environment, where each agent is denoted by i = 1, . . . , M . The search space includes n base stations D = {d1 , d2 , . . . , dn } ⊂ S, where n ≤ M . Note that in this paper we consider the case of n = M . Each agent i is located at its corresponding base station di to explore a specific bounded region mSi ⊂ S. The union of the regions of exploration Si constitute the search space i=1 Si = S, with no overlap ∀i = j, Si ∩ Sj = ∅, where i, j ∈ {1, 2, . . . , m}. Each agent i can visit up to li cells c in its search space forming the path ζi = c1 , c2 , . . . , cli ⊆ Si such that the exploration starts and ends at the agent’s corresponding base station di , i.e., c1 = cli = di . When all the surviving agents reach their respective base stations at the end of their exploration, we terminate one search round r ∈ Z>0 . We define the time taken by an agent i to move from cell cj to cj+1 as one timestep t. Thus, the duration of a search round can be defined as tr := maxi∈{1,...,M } li t. Definition 1. Path-based sensor [1] A sensor that reports whether or not an event has occurred somewhere along a path, but has no conception of where along the path that event has taken place. The search space includes hazardous elements Z = {0, 1} that may destroy the agent, where Z = 1 indicates the presence of a hazard, while Z = 0 denotes the absence of a hazard in a particular cell. If the agent i is destroyed anywhere along the path ζi then the path-based sensor (Definition 1) is triggered (Θ = 1), and if the agent i survives the path ζi then we consider that the path-based sensor is not triggered (Θ = 0). We assume that the path-based sensor may report false positive and false negative triggering. False-positive accounts for faulty or malfunctioning robots that get destroyed regardless of the presence of a hazard, whereas false-negative accounts for the chance of the robot surviving a cell despite having a hazard. The search space also includes some elements of interest, hereafter referred to as targets, X = {0, 1}, where X = 1 indicates the presence of a target, while X = 0 denotes the absence of a target in a particular cell. The presence of a target is recorded by a noisy sensor that may also report a false positive or a false negative observation of the target. The sensor used for the detection of targets is not a path-based sensor. In other words, if a robot survives a path, the target sensor reports the exact whereabouts of the target observation along the survived path.
290
A. K. Srivastava et al.
All agents start their exploration simultaneously at the beginning of a search round. Inter-agent communication is prohibited at all times. However, the successful traverse of a path by an agent indicates information about absence of hazards, and each survived agent also transmits information about targets along their path to its base station. In other words, each agent i reports its observation only to its base station di . A central server σ can obtain this information from each base station to update the global belief map. Problem 1. Given a team of M agents, with inter-agent communication prohibited at all times, in an environment with multiple available base stations D that communicate with a central server σ, the task of the agents is to explore a completely unknown environment efficiently and gather information about targets X and hazards Z. Remark 1. The use of the centralized server to partition the environment and update shared belief maps is convenient, but not required. If a completely decentralized and distributed algorithm is desired, then the use of the centralized server can be eliminated, assuming that base stations are able to synchronize their data after each search round. Each base station can perform the same (deterministic) weighted Voronoi partitioning algorithm in parallel such that all base stations achieve the same partitioning.
3
Distributed Information Gathering
In this section, we propose three methodologies to address Problem 1. The first methodology consists of a decoupled two-step scheme: i) entropy-weighted Voronoi partitioning as a centralized method to decompose the environment in Si spaces; and ii) mutual information-based planner from each base station di that maximizes local information gathering. In the second method, we assume no partitioning of the environment, thus information-based planning is executed in the search space S. However, we deploy a team of robots simultaneously from various base stations di . Lastly, we discuss a global planning method that calculates expected belief maps for each agent i before deployment from the same base station d and then plans an information-driven path for all agents. 3.1
Distributed Entropy Voronoi Partition and Planner (DEVPP)
To address the obscure information obtained from a path-based sensor, [1] proposed integration of all potential location for a path-based sensor triggering. However, this increases the computational complexity of optimal path solutions beyond short path lengths. Therefore, we propose a distributed strategy with multiple agents that partitions the environment in a way that maximizes the expected information gain of the entire search space S without task duplicity. The structure of the proposed methodology is presented in Fig. 2.
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions
291
Fig. 2. The block diagram illustrates a decoupled two-step scheme of distributed local planning with centralized partitioning. In the leftmost block diagram base stations di (r) (r) communicate their belief maps Xi , Zi to the central server σ. Next, the central server partitions the environment S into sub-search spaces Si and then transmits this information to each base station di . The sub-search spaces Si are used locally for information theoretic planning.
Centralized Entropy Decomposition. For centralized decomposition we use weighted Voronoi partitioning to decompose the search space S into M smaller search spaces {S1 , S2 , . . . , SM }, where each robot i is assigned Si . This partitioning uses base stations di as generators, assigning each agent i to a specific search space Si . Let pi = [pi,x , pi,y ] ∈ S denote the location of a cell in the search space and pdj ∈ S denote the location of base station dj , where j = 2, 3, . . . , card(S) with card(S) = ab. The geodesic Voronoi partitioning of the search space is computed by, Si = { pi ∈ S | g(pi , pdj ) ≤ g(pi , pdk ), ∀ j = k},
(1)
where g(pi , pd ) = pi − pd is the distance between pi and pd . Next, we redefine g(pi , pd ) to incorporate the Shannon entropy of the partition, gw (pi , pd ; Xp i , XSp d ) = w(Xp i , XSp d )g(pi , pd ),
(2)
where w(Xp i , XSp d ) = (H(Xp i ) + H(XSp d ))/(card(XSp d ) + 1) is a weight representing the average entropy of the expected partition with entropy given as H(Xp i ) = − X P(Xp i ) log P(Xp i ) dx. Thus, by substituting (2) to (1) the entropy-weighted Voronoi partitioning yields, Si = { pi ∈ S | w(Xp i , XSj )g(pi , pdj ) ≤ w(Xp i , XSk )g(pi , pdk ), ∀ j = k}. (3) Note that for the initial decomposition of the environment with no prior information, the entropy-weighted Voronoi partitioning (3) is identical to the geodesic Voronoi partitioning (1), i.e., w(Xp i , XSj ) = w(Xp i , XSk ) for all j = k. Local Information-Theoretic Planning. Centralized entropy-weighted Voronoi partitioning decomposes the problem into single-agent sub-problems of information gathering. Here, each agent i stationed at its base station di is
292
A. K. Srivastava et al.
Algorithm 1. Distributed Entropy Voronoi Partitioning and Planner (DEVPP) (0)
(0)
M Inputs: {Xi }M i=1 , {Zi }i=1 , S, M , D, rmax (r)∗ M Output: {ζi }i=1 1: for i = 1, . . . , M do (0) 2: Si ← geodesicPartition(p, pdi ) (1) (0)∗
(0)
(0)
Initial decomposition and planning
(0)
3: ζi ← calculatePath(Xi , Zi , Si ) (1) (1) (0) (0) (0) (0)∗ 4: Xi , Zi ← beliefUpdate(Xi , Y i , Zi , ζi ) 5: end for 6: for r = 2 to rmax do 7: for i = 1, . . . , M do Communication to server (r−1) (r−1) , Zi from base stations to central server 8: communicate Xi 9: end for (r) 10: Si ← {∅} 11: for p ∈ S do Centralized Entropy Decomposition 12: for i = 1, . . . , M do 13: gw ← EntropyWeight(p, pdi ; Xp , XSp d ) (2) i 14: end for 15: pdi = arg minp di ∈D {gw } (r)
(r)
i ← index(pdi ); Si = Si ∪ p end for (r) Communication from broadcast Si from central server to base stations server 19: for i = 1, . . . , M do Local Information Theoretic Planning (r)∗ (r−1) (r−1) (r) ← calculatePath(Xi , Zi , Si ) 20: ζi (r) (r) (r)∗ 21: Θζi , Y i ← traversePath(ζi )
16: 17: 18:
(r)
(r)
22: Xi , Z i 23: end for 24: end for
(r−1)
← beliefUpdate(Xi
(r)
(r−1)
, Y i , Zi
(r)
(r)∗
, Θζi , ζi
)
now assigned to find a path ζi∗ in its corresponding search space Si that maximizes the expected information gained about targets X and hazards Z, given the observed target sensor measurement Y and path-based sensor measurements Θζi along a path ζi at the exploration round r, (r)∗
ζi
(r−1) (r) (r) (r) (r−1) (r) (r) = arg maxζi ∈Ωi I(Xi ; Y i | Θζi , Si ) + I(Zi ; Θζi | Si ) ,
(4) (r) (r) where Ωi is the space all possible paths of agent i in its search space Si , Y i = (r) (r) [Yi,1 , . . . , Yi,l ] ∈ Rl denotes all l observations collected at search round r by agent i, and Θζi is the space of observation of agent i about the path-based sensor triggering. This local path planning (4) is addressed using the methodology in [1]. We illustrate the proposed methodology for partitioning the environment with M agents and M base stations in Fig. 2. DEVPP (Algorithm 1) describes the centralized partitioning and local planning methodology, where brown color
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions
293
Fig. 3. The block diagram illustrates a distributed local planning with centralized sequential Bayesian filter (SBF). In the leftmost block all base stations di communicate (r) (r) their local observations Y i , Θζi to the central server σ. Next, the central server updates the belief map sequentially, where the posterior belief map of agent i is used as prior for the i + 1 update. The posterior belief map is broadcasted to the base stations which use them for local information theoretic planning.
indicates action from central server σ, and blue color local action from a base station di . After the environment has been partitioned successfully at the central entity, each agent i plans a path using calculatePath algorithm introduced in [1]. DEVPP terminates after a number of rounds rmax . 3.2
Multi-agent Distributed Information-Theoretic Planner (MA-DITP)
Unlike DEVPP, this approach does not perform any partitioning of the search space. Thus, each agent i explores the entire search space S and plan its path based on the prior belief of hazards and targets in the search space S at every search round r. In other words, the planning is global in the entire search space S and not local as in Sect. 3.1. After a search round, each agent i transmits the (r) (r) target observations Y i and the path-based sensor measurement Θζi to its base station di which subsequently uploads this information to the central server σ. (r) The central server then receives observations from all base stations {Θζi }M i=1 , (r)
{Y i }M i=1 and updates the belief maps using sequential Bayesian filtering (SBF). Next, the central server broadcasts the posterior belief maps to the base stations. (r+1)∗ in Lastly, each base station di computes an information theoretic path ζi the search space S and the robots are assigned to traverse the paths and collect measurements. The proposed MA-DITP method is presented in Algorithm 2, where brown color indicates action from the central entity, and blue color local action from a base station di . The block diagram is illustrated in Fig. 3. 3.3
Multi-agent Global Information-Theoretic Planner (MA-GITP)
In this section, we discuss a global planning method where a team of robots is deployed simultaneously from the same base station d instead of multiple base stations as in DEVPP and MA-DITP. Using MA-DITP in a single station with
294
A. K. Srivastava et al.
Algorithm 2 . Multi-Agent Distributed Information Theoretic Planner (MADITP) (0)
(0)
M Input: {Xi }M i=1 , {Zi }i=1 , S, M , rmax (r)∗ M Output:{ζi }i=1 1: for r = 1 to rmax do 2: for i = 1, . . . , M do Local information-theoretic planning (r)∗ (r−1) (r−1) ← calculatePath(Xi , Zi , S) 3: ζi (r) (r) (r)∗ 4: Θζi , Y i ← traversePath(ζi ) (r)
(r)
communicate Θζi , Y i from base station to central server end for (r) (r) (r−1) (r) (r−1) (r) (r)∗ , Y i , Zi , Θζi , ζi ) Xi , Zi ← beliefUpdate(Xi SBF (r) (r) 8: broadcast Xi , Zi from central server to base stations 9: end for
5: 6: 7:
Centralized
multiple-agent scenario would result in redundant and repeated exploration of search space S, hampering the information gathering process. To address this problem, we introduce the idea of using an expected belief map for each agent. The expected belief map for the first agent is the true belief map computed by the sequential Bayesian filter (SBF) using observations from the previous search round r − 1. Next, the base station makes a prediction for each possible path(r) (r) based sensor observation of agent 1 (Θζ1 = 1 for destruction or Θζ1 = 0 for survival) and computes the weighted average belief map which generalizes to, (r)
(r)
(r)
(r)
(r)
(r)
(r)
Zi+1 = p1 Zi (Θζi = 1) + p0 Zi (Θζi = 0), ∀i = 2, . . . , M, (r)
(r)
(r)
(5)
(r)
where p1 = P(Θζi = 1) is the probability of destruction, p0 = P(Θζi = 0) is (r)
(r)
the probability of survival, Zi (Θζi = 1) is the belief map in case of destruction, (r)
(r)
and Zi (Θζi = 0) is the belief map in case of survival of the previous agent (r)
i. Each expected belief map Zi+1 is used to compute an information-theoretic (r)
path ζi+1 . The multi-agent global information-theoretic planning method (MAGITP) from a single base station is presented in Algorithm 3. Note that Algorithm 3 includes only local updates, thus the all actions are illustrated in blue color. The multi-agent global planner is inherently resilient to communication channel attacks, because there is no communication and all the processing is performed at a base station for all agents. In addition, this approach is practical as in most search and rescue missions we typically have access to a single station for exploration.
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions
295
Algorithm 3. Multi-Agent Global Information Theoretic Planner (MA-GITP) (0)
Inputs: X (0) , {Zi }M i=1 , S, M , rmax (r)∗ M Output: {ζi }i=1 1: for r = 1 to rmax do 2: for i = 1, . . . , M do Weighted Average Belief Map (r)∗ (r−1) ← calculatePath(X (r−1) , Zi ) 3: ζi (r−1) (r−1) ) (5) 4: Zi+1 ← weightedAvgBelief(Zi 5: end for 6: for i = 1, . . . , M do Sequential Bayesian Filter (r) (r) (r)∗ 7: Θζi , Y i ← traversePath(ζi ) (r)
8: X (r) , Zi 9: end for 10: end for
4
(r)
(r−1)
← beliefUpdate(X (r−1) , Y i , Zi
(r)
(r)∗
, Θζi , ζi
)
Experiments and Results
Experimental Setup: To evaluate the performance of the proposed methodologies we implement 30 Monte Carlo (MC) replications for all experiments, where each MC replication has r = 150 search rounds. The spatial environment is of dimensions a × b = 15 × 15 cells. A cell in the environment can be either empty (contain no hazard and no target), or contain a hazard, or a target, or both. The movement of each agent in the environment is determined by a 9-grid connectivity, i.e., an agent can move anywhere to the 8-neighboring cells or decide to stay at its current cell for the next timestep. Let each agent to have a malfunction probability of 5% per timestep and the target sensor to have a false positive and false negative rate of 10% per timestep. We consider environments with different hazard lethalities of 50%, 70%, and 90%. Lethality refers to the probability of an agent destruction when it visits a cell that contains a hazard. We evaluate the proposed methodologies for various fleet sizes, ranging from M = 1 to M = 5 agents. In all numerical experiments, we set the maximum number of timesteps for each path to be twice the Manhattan distance between its base station and the furthest corner of its search space Si , i.e., li = 2 maxp i ∈Si pdi − pi 1 . Experiment 1: In this set of experiments, we compare the efficiency of DEVPP against [1] with different fleet sizes. Note that for the case of M = 1 agent, DEVPP is identical to [1]. We deploy up to M = 5 robots at card(D) = 5 distinct base stations. In Fig. 4, we demonstrate the progress of information entropy for the environment during 150 search rounds. As the number of agents increases, the information entropy reduces faster for all adversary lethalities cases. Contrary to the notion that adding more agents may increase the computational cost of the experiments, Fig. 5 (Left) presents that the time elapsed per agent to complete 150 search rounds decreases with an increasing number of agents. This significant reduction in computational time is attributed to the smaller search sub-spaces for planning with multiple robots. Although the time elapsed for the entire planning and execution decreases with increasing number of robots, the time taken by the
296
A. K. Srivastava et al.
Fig. 4. Information entropy of the environment with hazard lethality of 50%, 70%, 90% using the DEVPP method. As the fleet size increases the information entropy of the environment decreases faster.
Fig. 5. (Left) Computational time required for each agent to perform local computations for r = 150 search rounds using DEVPP for M ≥ 2 and Fig. 5 for M = 1. As the fleet size increases the computation is distributed among agents. (Right) Computational time required by the central server σ to partition the environment using (3). As the size of the fleet increases, the time required to compute the partitioned search spaces Si ∈ S increases.
central server σ to compute the partitioning increases with addition of robots in the search space S, as shown in Fig. 5 (Right). The computations of the central server σ are insignificant compared to the local computations of the agents for small fleet size (M = 2, 3), but increases at similar level for local computations with higher fleet size (M = 4, 5). In any case, even if we account the central computations for all fleet sizes, the proposed DEVPP method outperforms [1]. Experiment 2: Next we compare all methods with each other and with [1]. The lethality rate is fixed to 90% and we consider three fleet sizes M = 2, M = 4, and M = 5 robots. Note that when M = 1 agent all methods are identical to [1]. In Fig. 6, we present the progress of information entropy through r = 150 search rounds. The results show that DEVPP outperforms all methods; MADITP gathers competitive information to DEVPP; while MA-GITP performs slightly better to [1] for all fleet sizes. In Fig. 7, we demonstrate the execution time required per agent after r = 150 search rounds. The execution time of DEVPP is significantly lower from all other methods and with minimal variation.
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions
297
Fig. 6. Information entropy of the environment with hazard lethality of 90% for all proposed methods. We consider three fleet sizes M = 2, M = 4, and M = 5 agents, while all cases are identical to [1] when M = 1 agent.
Fig. 7. The boxplots illustrate the execution time per agent in seconds with hazard lethality of 90% for fleet sizes M = 2, M = 4, and M = 5 agents. When of M = 1 agent all methods report identical execution time to [1].
To this end, not only information gathered using DEVPP is more efficient, but also it is executed significantly faster. The other proposed methods (MA-DITP and MA-GITP) are executed in similar or slightly more time than [1]. Main Result: The numerical experiments reveal that the proposed DEVPP method with multiple robots is more efficient in terms of information gathered compared to the case of a single robot [1]. This result is attributed to the shorter paths produced for each agent on the assigned subspaces, which subsequently reduces the likelihood of robot destruction and significantly reduces execution time. To this end, simultaneous deployment of multiple robots leads to efficient information gathering in all proposed methodologies.
5
Conclusion
This paper introduces DEVPP, a distributed information gathering approach in an environment with path-based sensors using a team of robots simultaneously. We demonstrate the efficacy of this approach in various experiments with Monte Carlo replications. We also compare this approach against two informationgathering methods that employ multiple robots simultaneously (MA-DITP and MA-GITP) and [1] in the same environment. We experimentally show that the
298
A. K. Srivastava et al.
information gathered for hazards and targets using DEVPP outperforms the rest approaches in all cases. In addition, DEVPP reduces the computational cost, thereby accelerates the overall execution time of the information-theoretic planner. Although MA-GITP performs slower than the rest methods, no communication is required, and thus it is resilient to malicious attacks in the communication system. All proposed multi-robot methodologies, reduce the entropy of the environment faster than the case where only a single robot is deployed. Acknowledgments. This study was conducted at the Motion and Teaming Laboratory, University of Maryland (UMD), and was funded by the Maryland Robotics Center (MRC) and the Office of Naval Research (ONR). Alkesh K. Srivastava and George P. Kontoudis were partially supported by the MRC to conduct this research with “Pathway to the Ph.D.” and “MRC Postdoctoral Fellowship” programs, respectively. This work has been possible with the support of ONR under the grant “Experimentally Verified Autonomous Path Planning for Information Gathering in Lethally Hostile Environments with Severely Limited Communication” (N0001420WX01827). The views, positions, and conclusions contained in this document are solely those of the authors and do not explicitly represent those of ONR.
References 1. Otte, M., Sofge, D.: Path-based sensors: paths as Sensors, Bayesian updates, and Shannon information gathering. IEEE Trans. Autom. Sci. Eng. 18, 946–967 (2021) 2. Okabe, A., Boots, B., Sugihara, K.: Nearest neighbourhood operations with generalized Voronoi diagrams: a review. Int. J. Geogr. Inf. Syst. 8(1), 43–71 (1994) 3. Choset, H.: Coverage for robotics - a survey of recent results. Ann. Math. Artif. Intell. 31, 113–126 (2001) 4. Nair, V.G., Guruprasad, K.R.: GM-VPC: an algorithm for multi-robot coverage of known spaces using generalized Voronoi partition. Robotica 38(5), 845–860 (2020) 5. Jager, M., Nebel, B.: Dynamic decentralized area partitioning for cooperating cleaning robots. In: IEEE International Conference on Robotics and Automation, vol. 4, pp. 3577–3582 (2002) 6. Rankin, E.S., Rekleitis, I., New, A.P., Choset, H.: Efficient Boustrophedon multirobot coverage: an algorithmic approach. Ann. Math. Artif. Intell. 52, 109–142 (2008) 7. Hazon, N., Kaminka, G.A.: On redundancy, efficiency, and robustness in coverage for multiple robots. Robot. Auton. Syst. 56, 1102–1114 (2008) 8. Agmon, N., Hazon, N., Kaminka, G.A.: The giving tree: constructing trees for efficient offline and online multi-robot coverage. Ann. Math. Artif. Intell. 52(2–4), 43–168 (2009) 9. Zheng, X., Koenig, S., Kempe, D., Jain, S.: Multirobot forest coverage for weighted and unweighted terrain. IEEE Trans. Rob. 26(6), 1018–1031 (2010) 10. Wilson, Z., Whipple, T., Dasgupta, P.: Multi-robot coverage with dynamic coverage information compression. In: International Conference on Informatics in Control, Automation and Robotics, pp. 236–241 (2011) 11. Michel, D., McIsaac, K.: New path planning scheme for complete coverage of mapped areas by single and multiple robots. In: IEEE International Conference on Mechatronics and Automation, pp. 1233–1240 (2012)
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions
299
12. Macharet, D., Azpurua, H., Freitas, G., Campos, M.: Multi-robot coverage path planning using hexagonal segmentation for geophysical surveys. Robotica 36(8), 1144–1166 (2018) 13. Guruprasad, K.R., Wilson, Z., Dasgupta, P.: Complete coverage of an initially unknown environment by multiple robots using Voronoi partition. In: International Conference on Advances in Control and Optimization in Dynamical Systems, Bangalore, India (2012) 14. Kurabayashi, D., Ota, J., Arai, T., Yoshida, E.: Cooperative sweeping by multiple mobile robots. In: IEEE International Conference on Robotics and Automation, vol. 2, pp. 1744–1749 (1996) 15. Bhattacharya, S., Michael, N., Kumar, V.: Distributed coverage and exploration in unknown non-convex environments. In: Martinoli, A., et al. (eds.) Distributed Autonomous Robotic Systems. Springer Tracts in Advanced Robotics, vol. 83, pp. 61–75. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32723-0 5 16. Otte, M., Sofge, D.: Path planning for information gathering with lethal hazards and no communication. In: International Workshop on the Algorithmic Foundations of Robotics, pp. 389–405 (2018) 17. Schwager, M., Dames, P., Rus, D., Kumar, V.: A multi-robot control policy for information gathering in the presence of unknown hazards. In: Christensen, H.I., Khatib, O. (eds.) Robotics Research. STAR, vol. 100, pp. 455–472. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-29363-9 26 18. Julian, B.J., Angermann, M., Schwager, M., Rus, D.: Distributed robotic sensor networks: an information-theoretic approach. Int. J. Robot. Res. 31(10), 1134–1154 (2012) 19. Dames, P., Schwager, M., Kumar, V., Rus, D.: A decentralized control policy for adaptive information gathering in hazardous environments. In: IEEE Conference on Decision and Control, pp. 2807–2813 (2012) 20. Dames, P., Schwager, M., Rus, D., Kumar, V.: Active magnetic anomaly detection using multiple micro aerial vehicles. IEEE Robot. Autom. Lett. 1, 153–160 (2016)
Distributed Multiple Hypothesis Tracker for Mobile Sensor Networks Pujie Xin and Philip Dames(B) Temple University, Philadelphia, PA 19122, USA {pujie.xin,pdames}@temple.edu https://sites.temple.edu/trail/
Abstract. This paper proposes a distributed estimation and control algorithm to allow a team of robots to search for and track an unknown number of targets. The number of targets in the area of interest varies over time as targets enter or leave, and there are many sources of sensing uncertainty, including false positive detections, false negative detections, and measurement noise. The robots use a novel distributed Multiple Hypothesis Tracker (MHT) to estimate both the number of targets and the states of each target. A key contribution is a new data association method that reallocates target tracks across the team. The distributed MHT is compared against another distributed multi-target tracker to test its utility for multi-robot, multi-target tracking.
Keywords: Distributed estimation Multi-target tracking
1
· Multiple Hypothesis Tracker ·
Introduction
The paper addresses the problem of multi-robot, multi-target tracking (MRMTT), a canonical task in robotics, which includes problems such as search and rescue [21], surveillance [11], and mapping [10]. Additionally, we assume that the robots operate in a known area that contains an unknown number of targets (e.g., the robots know the blueprint map of a building but not how many people are inside of it). The targets of interest can be stationary or dynamic, and they might enter or leave current environment. Thus the number of the targets varies with time. A solution to this problem contains two primary components: (I) a estimation algorithm to track the targets and (II) a control algorithm which leads robots to search for new targets (exploration) while also keeping track of detected targets (exploitation). Multiple Target Tracking (MTT) is a well studied problem, with the primary distinction between methods being how they address the problem of data association [30]. Common approaches include the Multiple Hypothesis Tracker (MHT) [2], Joint Probabilistic Data Association (JPDA) [13], Probability Hypothesis Density (PHD) filter [20], and Multi-target Multi-Bernoulli (MeMBer) Filter c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 300–316, 2024. https://doi.org/10.1007/978-3-031-51497-5_22
Distributed Multiple Hypothesis Tracker
301
[33]. We base our solution on the track-oriented MHT (TOMHT) [16] as this offers a conceptually simple and computationally efficient solution. Active search with teams of robots is another well studied problem. The primary distinction in this space is the use of centralized [27,34] versus distributed strategies [11,19,29]. One of the most successful distributed algorithms is Voronoi-based control [7], where the basic idea is to divide the search area with a Voronoi partition [12] and iterative move each robot to the centroid of its Voronoi cell, a process known as Lloyd’s algorithm. In recent years, Multi-Agent Deep Reinforcement Learning (MADRL) has offered a new method to generate search policies through repeated interactions between other agents and the environment in an attempt to maximize a reward function [15]. However, much of the work based on MADRL requires global information and uses a centralized architecture [14,25]. Zhou et al. [35] propose a decentralized MADRL method using the maximum reciprocal reward to track cooperatively with UAV swarms. However, one downside of learning-based frameworks is that they often have trouble generalizing to unseen scenarios. There are few works that specifically address the MR-MTT problem. One of the first solutions was Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT) from Parker [22], which assigns robots to targets to maximize the time such that each target is observed by at least one robot. Related to CMOMMT, Pierson and Rus [23] proposed a distributed method for capturing multiple evaders with multiple pursuers in 2D and 3D environment by assigning pursuers to evaders and then using gradient descent. Pimenta et al. [24] propose Simultaneous Coverage and Tracking (SCAT) where they use Lloyd’s algorithm to create a control law with guaranteed exponential convergence to a local minimum of the objective function. Our previous work [9] built upon SCAT by allowing the robots to learn the target distribution online rather than needing that information a priori. We did this by formulating a distributed PHD filter and using this as the importance weighting function in Lloyd’s algorithm to drive target search. This paper will use the same control framework but formulate a new distributed MHT. The primary contribution of this work is creating a new distributed TOMHT algorithm. Our approach leverages information about the relative locations of sensors (i.e., robots) to decrease the memory and computational requirements relative to a na¨ıve implementation. We then demonstrate the utility of this formulation to solve the MR-MTT problem by using the resulting target estimates to drive robot search. Our search strategy naturally and effectively drives the robots to follow previously detected targets and to explore unknown areas that may contain new targets. To demonstrate our result is correct and efficient, we compare our result to that of the PHD filter [9] through a series of simulated experiments with static and moving targets.
302
2
P. Xin and P. Dames
Background
In this section we provide the mathematical formulation of the standard Multiple Hypothesis Tracker (MHT) for multi-target tracking and of Lloyd’s algorithm for distributed coverage control.
Fig. 1. Illustration of MHT. (a) Example of at t = 1, where observations 2 and 3 are in the gate and could be associated with the current track 1 while observation 4 is out of the gate and can not be associated to existing tracks. (b) 1–6 are observations at different time step. With the gating process, initial track hypothesis are built based on gating test. (c) The corresponding track trees. Each tree node is associated to an observation in (b).
2.1
MHT Definition
Let the set of objects be Xt = {x1t , x2t , . . .}, where xit is the state of the ith object at time t. The MHT uses an extended Kalman filter to estimate the state ˆ it ∼ N (μit , Σit ). Since the number of objects is unknown, the of each object, so x MHT uses the concept of a track to represent a potential object. Therefore, the number of tracks represents the estimated number of objects within the search space and the state of each track represents the state of that individual object. At each time step, a robot receives a set of measurements Zt = {z1t , z2t , . . .}. The number of measurements depends upon the number of targets within the robot’s field of view (FoV) as well as the probability of receiving a false negative pfn or false positive detection pfp . The MHT uses a two-step approach to solve the MTT, first solving the data association problem to match measurements to tracks and then using the associated measurements to update each track. Data Association. The MHT uses a gating method to determine if an indiˆ it : vidual measurement zjt should be associated with a track x ˆ it − zjt )T (Σ it )−1 (μ ˆ it − zjt ) ≤ 2d , xit , zjt ) = (μ d2t (ˆ
(1)
ˆ it and d where μit , Σ it are the mean and covariance of the estimated track x is the allowable distance threshold. Figure 1a shows the gating process for one
Distributed Multiple Hypothesis Tracker
303
time step. This comparison is to determine the full set of potential associations ˆ t at time t. between all measurements Zt and all tracks X The TOMHT then considers a set of potential associations, as Fig. 1b shows. ˆ 1t , For example, if measurements z2t and z3t could be associated with a track x then two copies of that track are propagated into the future, one using each hypothetical association, as Fig. 1c shows. All these distinct hypotheses for a track are maintained using a tree structure, where each branch is a unique series of associations and each node has a distinct mean and covariance. Additionally, the TOMHT considers the possibility of starting a new tree using each measurement, which represents the possibility that a detection represents an entirely new object. Track Scoring. Based on the above structure, there will be many distinct hypotheses for an individual track based on different association histories. Therefore, the MHT uses a scoring system to determine the likelihood of individual tracks. This score can be used both to find the maximum likelihood track as well as to prune branches within the tree to avoid an exponential growth in the set of hypotheses over time. A standard way to formulate the score S uses the log likelihood ratio [3,16] S(Ai1:t )
ki−1 ki ki 1:t | zki i ⊆ Zi ∀i) p(zk1:t i=1:t p(zi | z1:i−1 , zi ⊆ Zi ) = ln = ln ki−1 ki ki 1:t p(zk1:t | zki i ⊆ ∅) i=1:t p(zi | z1:i−1 , zi ⊆ ∅)
(2)
where Ai1:t = {k1 , . . . , kt } is a history of associations for track i, p(zki i | ki−1 ˆ ki i ) represents the track being updated with a spez1:i−1 , zki i ⊆ Zi ) = N (zki i ; x ki−1 cific measurement and p(zki i | z1:i−1 , zki i ⊆ ∅) = 1/AFoV represents a missed detection (i.e., false negative) and AFoV is the area of the sensor field of view. This score can be easily updated at each time p fn , zkt t ⊆ ∅ i i (3) S(A1:t ) = S(A1:t−1 ) + 1−pAfp 2 FoV ln 2π − 12 ln |Σ tkt | + d2 , zkt t ⊆ Zt where d(= 2) is the dimension of the space. To get the maximum likelihood target set at time t, the TOMHT selects the best association history using the combination assignment problem: S(Ai1:t ) max i (4) j i s.t. A1:t ∩ A1:t = ∅, ∀i = j In other words, we want to find the maximum total store such that each measurement is associated with at most one target. This optimization problem is solved in [16]. Then the N-scan pruning approach will trace back to t − N and prune the subtrees which diverge from the global optimal hypothesis, which is the solution to (4).
304
P. Xin and P. Dames
2.2
Lloyd’s Algorithm
Lloyds’ algorithm [7] locally minimizes the function: H(q1 , q2 , ...qR ) = min f (d(x, qr ))φ(x)dx, E r∈{1,...,R}
(5)
where d(x, q) measures the distance between elements in E, f (·) is a monotonically increasing function, and φ(x) is a non-negative weighting function. φ(x) can be a time-invariant function in coverage problem and it can also be a time-variant function in tracking problem. A general choice is f (x) = x2 . The minimum value of the inside the integral is reached when we divide the environment using the Voronoi partition, Vr = {x | d(x, qi ) ≤ d(x, qj ), ∀i = j}, where Vr are the Voronoi cells. The minimum with respect to robot positions is xφ(x) dx r ∗ (q ) = Vr , (6) φ(x) dx Vr This is a distributed control algorithm because as long as each robot knows the positions of its neighbors, a robot is capable of computing its Voronoi cell and moving forward to its weighted centroid of Voronoi cell. Note that in practice, Voronoi neighbors can potentially be far apart. We make the assumption that the density of robots and/or the communication range of robots is sufficiently large to allow robots to communicate with all of their neighbors. The problem of formulating a consistent partition subject to communication range limits will the subject of future work. This simple iterative, myopic control scheme is widely used in coverage problems. For example, Lee and Egerstedt [18] use Lloyd’s algorithm to track a timevarying density function with a teams of robots. Bhattacharya et al. [1] expand the search space to non-convex and non-Euclidean space. Suresh et al. [31] propose a method with low complexity and communication workload in distributed case using Lloyd’s algorithm. Our previous work [9] repurposes this coverage controller to perform target tracking by choosing the estimated target density as the time-varying importance weighting function φ(x).
3
Distributed Multiple Hypothesis Tracker
The formulation above was for the standard single-sensor case. However, two new considerations arise when there are multiple robots. First, multiple robots may observe the same target at the same time (and consequently may each have a local copy of a track for that target). Second, targets may leave the FoV of one robot and enter the FoV of another robot. So long as those two robots are within communication range (which is typically much larger than the sensing range), then the first robot can provide some information to the second robot to improve its initial estimation. This section outlines our novel formulation to address these issues.
Distributed Multiple Hypothesis Tracker
305
Note that there are existing distributed MHT solutions for multi-sensor surveillance in radar systems [5,6]. However, in that case, the sensors are static and the topology of the communication network is fixed. In our case, the sensors are mobile, have a smaller FoV than that of radar, and the communication topology changes online. Our new formulation accounts for all of these effects. 3.1
Assumptions
We make three main assumptions. First, that robots know their pose. This is a strong assumption, but robots operating in indoor environments with a highquality prior map [26] or in outdoor environments with GPS receivers are able to navigate for long periods of time with minimal drift in their pose estimates. We also showed in our past work that sufficiently small drift has negligible effects on distributed target tracking accuracy [9]. Second, we assume that each robot is capable of communicating with its Voronoi neighbors and all the robots with overlapping sensor FoVs. This assumption was also discussed above in Sect. 2.2. Third, each robot has a unique ID, which they will use to determine precedence to avoid conflicts.
Fig. 2. Example of track fusion. There are three robots, whose FoV are shown by the dashed circles. Each robot has a unique color that is shared by its tracks. There are 4 targets, located at (10.5, 11), (5.5, 6.5), (5.5, 9), and (16, 16), and 6 tracks (colored ellipses), 2 from each robot. Here r2 is the only robot with a nonempty set of private tracks, P 2 = (16, 16). All other tracks are in the union of multiple areas. Since r3 has the largest ID, then R3 = (5.5, 9), (10.5, 11) and F 3 = (10, 11), (11, 11), (5.5, 9), while R1 , R2 , F 1 , F 2 = ∅. Then, r3 calculates the KL-Divergence between tracks in F 3 and R3 and fuse tracks with low KL-Divergence using Algorithm 1.
3.2
Track Exchange and Fusion
Similar to our past work creating a distributed PHD filter [9], our distributed MHT solution utilizes the Voronoi partition from Lloyd’s algorithm to spatially
306
P. Xin and P. Dames
distribute information across the team, i.e., robot r maintains the tracks that are located within its Voronoi cell Vr . The key is to create a set of rules for data exchange to ensure consistency across the team as cell boundaries shift due to robot motion and as targets move. When robots are well spaced out with their FoVs not overlapping, then each robot can run its own separate MHT. However, when multiple robots view the same region then each may have its own separate track for the same target. These repeated tracks not only drive several robots track one target, which is a waste of system resource, but also increase false positive rate. Algorithm 1 outlines our approach, which is also illustrated in Fig. 2. The process starts by inflating a robot’s FoV to account for effects such as measurement noise (line 1), typically using the standard deviation of the measurement model 2σ. Robots then exchange these inflated regions to determine their local grouping NF (line 2). Robots then exchange track estimates amongst these local clusters (lines 3–14), using the relative positions of the track and other robots as well as the IDs of each robot to sort each track into one of three categories: 1) private (i.e., only in Fˆr ), 2) fused (i.e., in multiple regions but r has the highest ID amongst these), or 3) received (i.e., in multiple regions but r is not the highest ID). After this data exchange process, robot r measures the Kullback-Leibler divergence between all tracks in its fused and received sets F and R (lines 15–20). Robot r then uses the Munkres algorithm [17] to find the optimal assignment of tracks (line 21), where a valid assignment (i.e., with cost less than dmax ) means that a fused and received track are the same object and the estimates are combined and added to the private list (lines 26–27). Unassigned tracks are added directly to the private list (lines 23–24 and 30–34). 3.3
Importance Weighting Function
Then we need to use the resulting target estimation from the MHT to drive exploration using Lloyd’s algorithm. The key choice in Lloyd’s algorithm is to set the importance weighting function φ(x). In our previous work [9], we used the PHD as the weighting function. We will take a similar approach, using φ(x) = v(x) +
n
N (ˆ xi | μi , Σi2 )
(7)
i=1
where v(x) accounts for undetected targets (i.e., driving exploration) and the right terms accounts for detected targets (i.e., exploitation). We create a simple ad hoc rule to approximate the number of undetected targets (1 − pd (x))v(x) x ∈ FoV v(x) = (8) max(v + Δv, δ) else, where pd (x) is the probability of the robot detecting a target at x, Δv is the number of new targets potentially appearing per time step, and δ is a maximum target density to prevent “windup” in infrequently observed areas. The
Distributed Multiple Hypothesis Tracker
307
Algorithm 1. Track Exchange and Fusion for Robot r 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34:
Fˆr = Fr ⊕ B(2σ) Expand the FoV NF = {j = r | Fˆr ∩ Fˆj = ∅} P, F , R = ∅ Private, fused, and received track lists for t ∈ T do Nt = {j ∈ NF | t ∈ Fˆj } if Nt is empty then P = P ∪ {t} else if r > maxj∈Nt j then F = F ∪ {t} else Send t to robot maxj∈Nt j end if end for Add all tracks received from other robots to R Create assignment distance matrix D for tR i ∈ R do ∈ F do for tF j det Σ F F F R 2 F −1 R 1 Σi ) − 2 + ln det ΣjR DKL (tR i tj ) = 2 μj − μi (Σ F )−1 + tr((Σj ) j i F F DKL (tR DKL (tR i tj ), i tj ) < KDL di,j = dmax , else end for end for Find assignment A : R → F for tR i ∈ R do if A(i) = ∅ then Track tR i not assigned R P = P ∪ {ti } else μR +μF Σ R +Σ F Make new track t with μ = i 2 A(i) , Σ = i 2 A(i) P = P ∪ {t} end if end for for tF j ∈ F do Track tF if (i s.t. A(i) = j) AND (di,j = dmax , ∀i) then j not assigned F P = P ∪ {tj } end if end for
The motivation behind this rule is to decrease the expected number of undetected targets at all points that are observed and to increase it outside of the field of view (with Δv = 0 when targets are known to be stationary). We set v(x) = δ at the start of exploration. Typically, we set δ = 1/A, where A is the area of the environment; this assumes there is only one target in the environment. To make the problem tractable, we approximate the weighting function by a set of key points along a uniform grid in the environment, a common choice in robotics [32]. We then calculate the weight of each key point using (7) and
use this to approximate the integrals in (6) by summations. We use the same algorithms as our previous work [9, Algorithm 1] to ensure the data is properly exchanged as Voronoi boundaries shift.
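As a concrete illustration of this grid-based approximation, the sketch below evaluates (7) at uniform grid key points and applies the exploration update (8). The grid layout, the scalar detection probability, and the SciPy Gaussian evaluation are simplifying assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def weight_grid(xs, ys, v, tracks, in_fov, p_d, dv, delta):
    """Evaluate phi (7) at grid key points and update the undetected-target
    density v per (8). xs, ys: 1-D grid coordinates; v: current density grid;
    tracks: list of (mean, cov) pairs; in_fov: boolean grid mask; p_d: scalar
    detection probability inside the FoV."""
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)
    # Exploration update (8): decay observed cells, saturated growth elsewhere
    v = np.where(in_fov, (1.0 - p_d) * v, np.minimum(v + dv, delta))
    # Exploitation term of (7): sum of Gaussian track densities at key points
    phi = v.ravel().copy()
    for mu, cov in tracks:
        phi += multivariate_normal(mean=mu, cov=cov).pdf(pts)
    return v, phi.reshape(v.shape)
```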
4 Results
In this section, we test our approach against our previous results using the distributed PHD filter [9], as this is the most closely related MR-MTT solution in the literature. First, we isolate the differences due to the tracker by using the same measurement and pose data from [9] and running these data through our new MHT. Then, we examine the benefits of the new weighting function (7) to show that unifying the estimation and control is helpful. All trials take place in an open 100 × 100 m area. We run trials with different numbers of targets, with target positions drawn uniformly from a 120 × 120 m area; any targets initialized outside the map are discarded. The robots begin at random locations within a small box at the bottom center of the map. The robots are holonomic with a maximum velocity of 2 m/s. Each robot is equipped with an onboard sensor with a circular FoV of radius 5 m, p_fn = 1 − p_d = 0.2, and p_fp = 3.66 × 10⁻³. Robots measure the relative position of a target, z ∼ N(x − q, 0.25 I₂), and the sensors collect measurements at 2 Hz. Note that these values do not represent any specific real-world situation, but could represent a team of drones equipped with downward-facing cameras. For a real situation, one could follow the procedures outlined in our other work to carefully characterize the sensor models [4,8].
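For readers reproducing this setup, the sketch below simulates one scan of such a sensor. Interpreting p_fp as a Poisson clutter rate per scan and drawing clutter uniformly over the FoV are our assumptions; the paper does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(0)

def sense(q, targets, fov=5.0, p_d=0.8, fp_rate=3.66e-3, noise_var=0.25):
    """Simulate one scan: relative-position measurements z ~ N(x - q, 0.25 I2)
    for targets inside the circular FoV, with missed detections and clutter."""
    Z = []
    for x in targets:
        if np.linalg.norm(x - q) <= fov and rng.random() < p_d:
            Z.append(x - q + rng.normal(0.0, np.sqrt(noise_var), size=2))
    for _ in range(rng.poisson(fp_rate)):      # false positives (assumed Poisson)
        ang = rng.uniform(0.0, 2.0 * np.pi)
        rad = fov * np.sqrt(rng.random())      # uniform over the FoV disc
        Z.append(np.array([rad * np.cos(ang), rad * np.sin(ang)]))
    return Z
```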
4.1 Performance Metric
We measure tracking error using the Optimal SubPattern Assignment (OSPA) metric [28]. The error between sets X and Y, where |X| = m ≤ |Y| = n without loss of generality, is

$$d(X, Y) = \left( \frac{1}{n} \left( \min_{\pi \in \Pi_n} \sum_{i=1}^{m} d_c\left(x_i, y_{\pi(i)}\right)^p + c^p (n - m) \right) \right)^{1/p} \qquad (9)$$
where c is the cutoff distance (we use c = 10 m), d_c(x, y) = min(c, ‖x − y‖), Π_n is the set of permutations of {1, 2, 3, ..., n}, and p = 1 is our choice of p-norm. Since the OSPA can fluctuate due to false negative and false positive detections, we calculate the median of the OSPA over the last 100 s to obtain a steady estimate, and we measure the variance of the OSPA over this same time window to measure the stability of the tracker.
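A minimal implementation of (9) for these settings is sketched below. Using SciPy's linear_sum_assignment to find the optimal permutation is our choice, since only the optimal assignment cost is needed.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=10.0, p=1):
    """OSPA distance (9) between point sets X and Y (arrays of shape (k, 2))."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    if len(X) > len(Y):
        X, Y = Y, X                 # enforce |X| = m <= |Y| = n w.l.o.g.
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0
    if m == 0:
        return c                    # all n points are unassigned
    # cutoff distance matrix d_c(x, y)^p
    D = np.minimum(c, np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)) ** p
    rows, cols = linear_sum_assignment(D)   # optimal sub-pattern assignment
    return ((D[rows, cols].sum() + (c ** p) * (n - m)) / n) ** (1.0 / p)
```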
Fig. 3. Robot trajectories. (a) shows the trajectories of robots during a single trial with 20 robots and 5 static targets. (b) shows the trajectories of robots during a single trial with 20 robots and dynamic targets.
Fig. 4. Boxplots showing the median OSPA statistics over 10 runs for teams of 10–100 robots and 10, 30, or 50 static targets, for the distributed MHT, the MHT running on data collected under PHD-based control (MHT-PHD), and the distributed PHD filter, respectively.
4.2 Stationary Targets
In the static case, the number of targets and their positions remain constant. We run batches of trials with 10, 30, and 50 targets, respectively. The robots begin by distributing themselves over the environment. Once a robot detects a target, it will typically continue to track that target unless the track gets assigned to another robot, as Fig. 3a shows. As the robots explore the environment, v(x) decreases in the detected areas, so the weighted centroid of each Voronoi cell keeps shifting away from regions without targets.
Fig. 5. Boxplots showing the median standard deviation of the OSPA statistics over 10 runs for teams of 10–100 robots and 10, 30, or 50 static targets, for the distributed MHT, the MHT using detection data collected under PHD-based control (MHT-PHD), and the distributed PHD filter, respectively.
Figure 4 shows the median OSPA over the last 100 s of a 550 s experiment (long enough to reach a steady state). We see that the median OSPA decreases significantly as the number of robots increases for all three methods, leveling out when the number of robots nears the number of targets. In general, the performance is about the same for all three methods, with the mixed MHT-PHD case having a higher average OSPA because the control was not directly tied to the tracker. This is also borne out in the standard deviation of the OSPA data in Fig. 5, where the MHT-PHD case tends to have the highest standard deviation; it also explains the increase in the standard deviation as the number of robots increases. We also see that the MHT data tends to lie between the mixed case and the PHD. This is likely due to the MHT making hard decisions about the presence of a target (i.e., a track exists or is pruned) compared to the PHD, which uses a target density function that degrades more gracefully. We expect the MHT to perform better when there are fewer false negative detections, as repeated missed detections often cause tracks to be pruned.
4.3 Dynamic Targets
In the dynamic case, targets move around the environment and the number of targets varies over time as targets enter or leave the map area. We therefore set a birth rate b(x) near the map boundary ∂E to simulate the appearance of new targets:

$$b(x) = \begin{cases} 5.26 \times 10^{-5}, & \|x - \partial E\| \le 5\,\text{m} \\ 0, & \text{else} \end{cases} \qquad (10)$$
$$v(x) := v(x) + b(x)$$

Note that this birth model is incorporated into the v term in (8). With this birth model, the expected number of new targets is 1 per 10 s. Because of this flux, the final number of targets is always similar regardless of the initial number.
Fig. 6. Boxplots showing the median OSPA error statistics over 10 runs for teams of 10–100 robots and dynamic targets, for the distributed MHT, the MHT running on data collected under PHD-based control (MHT-PHD), and the distributed PHD filter, respectively.
The emergent behavior in this case differs from the static case: some robots spread out along the map boundary to detect new targets, as Fig. 3b shows. To measure the steady-state tracking performance, we extend the experiment to 800 s and evaluate the OSPA error over the last 200 s. Figure 6 shows the median OSPA of the distributed MHT, MHT-PHD, and PHD. We see that the OSPA decreases as the number of robots increases, and the variance of the OSPA has the same trend as in the static case. The decoupled MHT estimation, namely the MHT-PHD, still performs a little worse than the distributed PHD and has the largest variance among the three methods. The most likely reason is similar to the static case: the control policy based on the distributed PHD disturbs the track assignment in the MHT estimation. However, unlike the static case, the MHT has a consistently lower OSPA value than the PHD, especially in smaller teams. It also has a smaller variation in larger teams, meaning it provides a more consistent result. We do not analyze the standard deviation of the OSPA in the last 200 s of each trial because the appearance and disappearance of targets is the principal component of the standard deviation.
4.4 Discussion
Beyond the qualitative and quantitative tracking performance in the above scenario, there are a number of other factors to consider with MR-MTT algorithms. Failure Cases. The most challenging scenario for MTT is when targets are tightly clustered together (relative to the measurement covariance), as this is when data association is most difficult. Due to their different approaches, the MHT and PHD filter will fail in different ways. The MHT makes multiple hard
decisions about data association. Thus, tightly clustered targets will lead to a large branching factor at each time step, resulting in a large number of branches that are difficult to prune. On the other hand, the PHD filter represents targets using one target density function. Thus, the PHD filter will have a single large peak in the density function, making it difficult to extract the precise location of any individual object within the cluster. Another challenging scenario is when there are multiple missed detections in a row. In the MHT, this causes a track to be pruned prematurely, as there is typically a maximum timeout for a track to ensure that old branches get pruned. Increasing this timeout makes the MHT more robust to missed detections, but also increases computation by increasing the number of tracks. The PHD filter fails in a different manner. In the PHD filter update equation [9, Eq. (4)], the target density function decreases in areas where the robot expects to detect a target and increases in areas where the robot actually detects a target. Thus, repeated missed detections lower the density and make it difficult to recover quickly. Adding multiple robots adds another challenge. Due to the design of our distributed MTT algorithms, when a target lies right at the boundary between two Voronoi cells, the ownership of that track is ambiguous. Thus, two robots may repeatedly exchange ownership of a track simply because measurement noise pushes the mean position across the Voronoi boundary. This does not affect tracking quality but increases the communication load.

Computational Complexity. Analyzing the computational complexity of the MHT filter is challenging, as the number of computations depends on the specific scenario under consideration. As mentioned above, the worst case for the MHT is when all the targets are tightly clustered together (i.e., within the gate radius of one another). In this case, the number of branches grows exponentially over time, where the base of the exponential is the number of targets in the cluster. At the other extreme, when targets are widely spaced, there will only be one branch per target, essentially just running an extended Kalman filter for each target in parallel. The complexity of the PHD filter scales differently than that of the MHT filter in terms of the number of targets. As detailed in [9, Eq. (4)], the PHD filter features a summation over the number of measurements (which is proportional to the number of targets). This superior scaling law relative to the MHT is a result of the choices made during the data association step of the MTT algorithm design. The distributed MHT or PHD filter will generally scale better in the number of targets than a centralized MTT, as the computations are distributed across the team, with each robot taking ownership of one portion of the full environment. The worst case would be if all targets were in the cell of a single robot, but this is almost guaranteed never to occur by construction of the control method, as robots are drawn towards areas of high target density.
Communication Load. Like the complexity, the communication load is situationally dependent. There are three main stages to the distributed MHT or PHD filter algorithms: 1) Voronoi calculations, 2) environment reassignment, and 3) the MTT update. The Voronoi calculations are exactly the same between the distributed MHT and PHD filter and only require robots to exchange pose information with their neighbors, a low-bandwidth operation. The environment reassignment step in the MHT filter is where robots exchange tracks that have crossed the boundaries of Voronoi cells. The communication load depends upon the number of targets, the length of the history, and the number of branches in the track tree. In the PHD filter, the robots exchange portions of the target density function (in our case, sets of weighted particles). Thus, with low target density the MHT will be more efficient, but with high target density the PHD filter will be more efficient. The MTT update step requires robots to exchange measurements if and only if they can see into the cell of another robot (i.e., when the robot-robot distance is smaller than the field of view of a robot). In this case, the robots simply need to exchange measurement sets, which is a relatively low-bandwidth operation as this is typically just a numerical array (e.g., a list of range-bearing measurements). In the case of the PHD filter, the robots need to perform a second round of communication in order to compute a consistent normalization term in the PHD update step [9, Eq. (5)].
5 Conclusion
In this paper, we propose a distributed algorithm to search for and track an unknown number of targets in a search area. There are two main components: 1) a novel, distributed MHT formulation for mobile robot teams, and 2) a Voronoi-based control strategy. We compare the results of the distributed MHT to our previous work, which uses an alternative MTT solution, a distributed PHD filter. We found that both filters perform similarly well on the MR-MTT task. The PHD filter was slightly better in the case of static targets, yielding more accurate and consistent tracking performance. However, the MHT performed slightly better when the targets were dynamic, particularly for smaller teams. We also demonstrated the importance of tying the control directly to the tracking, with the MHT performing better on live data compared to running on the prior data collected by robots using the PHD filter.

Acknowledgment. This work was funded by NSF grants IIS-1830419 and CNS-2143312.
References
1. Bhattacharya, S., Ghrist, R., Kumar, V.: Multi-robot coverage and exploration on Riemannian manifolds with boundaries. Int. J. Robot. Res. 33(1), 113–137 (2014). https://doi.org/10.1177/0278364913507324
2. Blackman, S.: Multiple hypothesis tracking for multiple target tracking. IEEE Aerosp. Electron. Syst. Mag. 19(1), 5–18 (2004). https://doi.org/10.1109/MAES.2004.1263228
3. Blackman, S., Popoli, R.: Design and Analysis of Modern Tracking Systems. Artech House Radar Library. Artech House (1999)
4. Chen, J., Xie, Z., Dames, P.: The semantic PHD filter for multi-class target tracking: from theory to practice. Robot. Auton. Syst. 149 (2022). https://doi.org/10.1016/j.robot.2021.103947
5. Chong, C.Y., Mori, S., Reid, D.B.: Forty years of multiple hypothesis tracking: a review of key developments. In: 2018 21st International Conference on Information Fusion (FUSION), pp. 452–459 (2018). https://doi.org/10.23919/ICIF.2018.8455386
6. Coraluppi, S., et al.: Distributed MHT with active and passive sensors. In: 2015 18th International Conference on Information Fusion (Fusion), pp. 2065–2072 (2015)
7. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004). https://doi.org/10.1109/TRA.2004.824698
8. Dames, P., Kumar, V.: Experimental characterization of a bearing-only sensor for use with the PHD filter (2015)
9. Dames, P.M.: Distributed multi-target search and tracking using the PHD filter. Auton. Robot. 44(3), 673–689 (2020). https://doi.org/10.1007/s10514-019-09840-9
10. Deutsch, I., Liu, M., Siegwart, R.: A framework for multi-robot pose graph SLAM. In: 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 567–572 (2016). https://doi.org/10.1109/RCAR.2016.7784092
11. Doitsidis, L., et al.: Optimal surveillance coverage for teams of micro aerial vehicles in GPS-denied environments using onboard vision. Auton. Robot. 33(1), 173–188 (2012). https://doi.org/10.1007/s10514-012-9292-1
12. Du, Q., Faber, V., Gunzburger, M.: Centroidal Voronoi tessellations: applications and algorithms. SIAM Rev. 41(4), 637–676 (1999). https://doi.org/10.1137/S0036144599352836
13. Fortmann, T., Bar-Shalom, Y., Scheffe, M.: Sonar tracking of multiple targets using joint probabilistic data association. IEEE J. Oceanic Eng. 8(3), 173–184 (1983). https://doi.org/10.1109/JOE.1983.1145560
14. Goldhoorn, A., Garrell, A., Alquézar, R., Sanfeliu, A.: Searching and tracking people with cooperative mobile robots. Auton. Robot. 42(4), 739–759 (2018). https://doi.org/10.1007/s10514-017-9681-6
15. Gronauer, S., Diepold, K.: Multi-agent deep reinforcement learning: a survey. Artif. Intell. Rev. 55(2), 895–943 (2022). https://doi.org/10.1007/s10462-021-09996-w
16. Kim, C., Li, F., Ciptadi, A., Rehg, J.M.: Multiple hypothesis tracking revisited. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4696–4704 (2015)
17. Konstantinova, P., Udvarev, A., Semerdjiev, T.: A study of a target tracking algorithm using global nearest neighbor approach. In: Proceedings of the 4th International Conference on Computer Systems and Technologies: E-Learning, CompSysTech 2003, pp. 290–295 (2003). https://doi.org/10.1145/973620.973668
18. Lee, S.G., Egerstedt, M.: Controlled coverage using time-varying density functions. IFAC Proc. Vol. 46(27), 220–226 (2013). https://doi.org/10.3182/20130925-2-DE-4044.00030
19. Leonard, M.R., Zoubir, A.M.: Multi-target tracking in distributed sensor networks using particle PHD filters. Signal Process. 159, 130–146 (2019)
20. Mahler, R.: Multitarget Bayes filtering via first-order multitarget moments. IEEE Trans. Aerosp. Electron. Syst. 39(4), 1152–1178 (2003). https://doi.org/10.1109/TAES.2003.1261119
21. Murphy, R.: Human-robot interaction in rescue robotics. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 34(2), 138–153 (2004). https://doi.org/10.1109/TSMCC.2004.826267
22. Parker, L.E.: Cooperative robotics for multi-target observation. Intell. Autom. Soft Comput. 5(1), 5–19 (1999)
23. Pierson, A., Rus, D.: Distributed target tracking in cluttered environments with guaranteed collision avoidance. In: 2017 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pp. 83–89 (2017). https://doi.org/10.1109/MRS.2017.8250935
24. Pimenta, L.C.A., et al.: Simultaneous coverage and tracking (SCAT) of moving targets with robot networks. In: Chirikjian, G.S., Choset, H., Morales, M., Murphey, T. (eds.) Algorithmic Foundation of Robotics VIII. Springer Tracts in Advanced Robotics, vol. 57, pp. 85–99. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-00312-7_6
25. Qie, H., Shi, D., Shen, T., Xu, X., Li, Y., Wang, L.: Joint optimization of multi-UAV target assignment and path planning based on multi-agent reinforcement learning. IEEE Access 7, 146264–146272 (2019). https://doi.org/10.1109/ACCESS.2019.2943253
26. Ribas, D., Ridao, P., Neira, J.: Localization with an a priori map. In: Ribas, D., Ridao, P., Neira, J. (eds.) Underwater SLAM for Structured Environments Using an Imaging Sonar. Springer Tracts in Advanced Robotics, vol. 65, pp. 47–75. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14040-2_5
27. Robin, C., Lacroix, S.: Multi-robot target detection and tracking: taxonomy and survey. Auton. Robot. 40(4), 729–760 (2016)
28. Schuhmacher, D., Vo, B.T., Vo, B.N.: A consistent metric for performance evaluation of multi-object filters. IEEE Trans. Signal Process. 56(8), 3447–3457 (2008). https://doi.org/10.1109/TSP.2008.920469
29. Schwager, M., Dames, P., Rus, D., Kumar, V.: A multi-robot control policy for information gathering in the presence of unknown hazards. In: Christensen, H.I., Khatib, O. (eds.) Robotics Research. STAR, vol. 100, pp. 455–472. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-29363-9_26
30. Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian Multiple Target Tracking. Artech House (2013)
31. Suresh, A.T., Yu, F.X., McMahan, H.B., Kumar, S.: Distributed mean estimation with limited communication. CoRR abs/1611.00429 (2016)
32. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. MIT Press, Cambridge (2005)
33. Vo, B.T., Vo, B.N., Cantoni, A.: The cardinality balanced multi-target multi-Bernoulli filter and its implementations. IEEE Trans. Signal Process. 57(2), 409–423 (2009). https://doi.org/10.1109/TSP.2008.2007924
34. Yan, Z., Jouandeau, N., Cherif, A.A.: A survey and analysis of multi-robot coordination. Int. J. Adv. Rob. Syst. 10(12), 399 (2013). https://doi.org/10.5772/57313
35. Zhou, W., Li, J., Liu, Z., Shen, L.: Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning. Chin. J. Aeronaut. 35(7), 100–112 (2022). https://doi.org/10.1016/j.cja.2021.09.008
Distributed Multirobot Control for Non-cooperative Herding

Nishant Mohanty(B), Jaskaran Grover, Changliu Liu, and Katia Sycara

The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
{nishantm,jaskarag,cliu6,sycara}@andrew.cmu.edu

N. Mohanty and J. Grover contributed equally to this work. This research is supported by AFOSR FA9550-18-1-0097 and AFRL/AFOSR FA9550-18-1-0251.
Abstract. In this paper, we consider the problem of protecting a high-value area from being breached by sheep agents by crafting motions for dog robots. We use control barrier functions to pose constraints on the dogs' velocities that induce repulsions in the sheep relative to the high-value area. This paper extends the results developed in our prior work on the same topic in three ways. Firstly, we implement and validate our previously developed centralized herding algorithm on many robots. We show herding of up to five sheep agents using three dog robots. Secondly, as an extension to the centralized approach, we develop two distributed herding algorithms, one favoring feasibility and the other favoring optimality. In the first algorithm, we allocate a unique sheep to a unique dog, making that dog responsible for herding its allocated sheep away from the protected zone. We provide a feasibility proof for this approach, along with numerical simulations. In the second algorithm, we develop an iterative distributed reformulation of the centralized algorithm, which inherits the optimality (i.e., budget efficiency) from the centralized approach. Lastly, we conduct real-world experiments of these distributed algorithms and demonstrate herding of up to five sheep agents using five dog robots. Videos of these results are available at https://bit.ly/3bZq0dB.

Keywords: Herding · Barrier Functions · Quadratic Programming
1 Introduction
Recent developments in robotics and sensing have created significant interest among researchers in deploying multiple robots to operate cooperatively towards a common goal. Many works have developed techniques to tackle real-world problems using multi-robot systems (MRS), like conducting surveys or automating warehouses [1–3]. The major developments in MRS for enabling multiple robots to behave cooperatively have been based on interactions within a single team, i.e., a robot interacts with other robots in its group to achieve a given objective [4,5]. The main features of these types of algorithms are a) local
interaction, b) collision-free motion within the group, and c) achieving collective behavior using local interaction [6]. In the literature, there are also studies on MRS that involve interaction between multiple groups of agents. Here, along with the local interaction with group members, the individuals also interact with an external agent from another group. An example is a scenario where a group of adversarial robots has a goal of its own that might damage a given high-value unit; a group of defenders must then interact with the adversarial robots to ensure the safety of the unit [7,8]. In this paper, we propose a provably correct controller for the group of defenders ("dog robots") to prevent an adversarial group (the "sheep robots") from breaching a protected zone. This is challenging because the dog robots do not control the sheep robots directly; rather, they have to rely on the interaction dynamics between the dogs and sheep to influence the sheep's behavior. In our prior work [9] (accepted at the IEEE Conference on Decision and Control, 2022), we developed a centralized algorithm to solve this problem using control barrier functions. In this work, a) we provide more experimental validation of the centralized algorithm, b) propose two distributed algorithms, and c) provide simulations and experiments to validate these algorithms. Our formulation computes the velocity of each dog locally to prevent sheep from breaching the protected zone(s). In the first distributed algorithm, we allocate each sheep to a unique dog and pose a constraint on that dog's velocity to herd its allocated sheep away from the protected zone. We provide a proof of feasibility of this approach, thus showing that whenever the numbers of sheep and dogs are equal, the herding problem is well-posed. Our previously proposed centralized algorithm lacked this feasibility guarantee; however, it did not necessitate equal numbers of dogs and sheep, and in fact, in many experiments, fewer dogs than sheep were sufficient to herd all the sheep away. This observation led us to develop the second algorithm, in which we construct an iterative distributed approach that asymptotically attains the same velocities as computed by the centralized approach, thereby attaining the same total optimality (measured in terms of the total movement the dogs exhibit) and obviating the need for equal numbers of dogs and sheep. We build on the dual-decomposition algorithms proposed in [10,11] to develop this distributed algorithm. Both of our proposed distributed algorithms are compositional in nature, i.e., we can protect multiple zones by including more constraints, as shown in Fig. 1(c). To highlight the performance of our formulation, we provide results from numerical simulations showing the success of our approach for multiple dogs against multiple sheep. Finally, we demonstrate our algorithm on real robots and show multiple dog robots successfully preventing the breaching of protected zones against multiple sheep robots. The outline of this paper is as follows: in Sect. 2, we give a brief review of prior work in this area. In Sect. 3, we provide a mathematical formulation of the problem statement. In Sect. 4, we show how to use control barrier functions to pose constraints on dog velocities. Section 5 provides simulation and exper
imental results to demonstrate the proposed approach. Finally, we summarize our work in Sect. 6 along with directions for future work.
2 Prior Work
The framework of multi-group interaction within MRS has many applications beyond adversarial problem statements. The shepherding problem is an example of such a category. In [12,13], the authors have proposed methods to enable multiple shepherd agents to influence a flock of sheep by modeling the interaction as repulsion forces. The Robot Sheepdog Project [14,15] conducted a real-world demonstration of a shepherding algorithm where a group of mobile ground robots cooperatively herded a flock of ducks to a given goal location. In the literature, there are several works on non-cooperative shepherding as an example of a multi-group interaction problem. Works such as [13,16–20] deal with problems where the sheep robots do not exhibit adversarial behavior: they do not have any goals of their own, but they experience a repulsive force from the dog robots, which is exploited to produce the desired behavior in the sheep, for example, collecting all the sheep at some location and then driving them to a target goal. Differently from prior work, our sheep may or may not be adversarial. We call them adversarial if their goal lies inside the protected zone and non-adversarial otherwise; our safe control synthesis approach remains the same regardless. The dog robots observe the sheep and generate their control commands considering the cohesion between the sheep robots, the sheep's attraction to their goal location, and the repulsion the sheep experience from the dog robots. Since we use control barrier functions to generate the constraints on the velocities of the dog robots, our method only requires the dynamics of the sheep to be represented as a symbolic function, thus allowing the sheep to experience any kind of attractive or repulsive forces.
3 Problem Formulation
Consider a scenario with m sheep agents flocking towards a common goal location. One commonly assumed model for flocking is the Reynolds-Boids dynamics [6], which considers inter-sheep cohesive forces, inter-sheep repulsive forces, and attraction to a common goal. In the presence of dog agents, each sheep's dynamics also includes repulsive forces from each dog robot. While en route to their goal, the sheep, having no knowledge about high-value regions in the workspace (protected zones), pose a risk of breaching them. Thus, our problem is to orchestrate the motions of the dog robots by capitalizing on the repulsions that the sheep experience from the dogs to prevent this breaching. Next, we pose this problem in formal terms. Consider the protected zone P ⊂ R² as a disc centered at x_P with radius R_p, i.e., P := {x ∈ R² | ‖x − x_P‖ ≤ R_p}. We denote the flock of sheep as S and the position of the i-th sheep as x_{S_i} ∈ R². The collective positions of all sheep are denoted as x_S^all := (x_{S_1}, x_{S_2}, ..., x_{S_m}). Similarly, we denote the set of all dogs
using D. The position of the k-th dog is x_{D_k} ∈ R², and the positions of all dogs collectively are x_D^all := (x_{D_1}, x_{D_2}, ..., x_{D_n}). Each sheep follows single-integrator dynamics ẋ_{S_i} := f_i(x_{S_1}, ..., x_{S_n}, x_{D_1}, ..., x_{D_n}), given by

$$\dot{x}_{S_i} = u_{S_i} = \underbrace{k_S \sum_{j \in S \setminus i} \left(1 - \frac{R_S^3}{\|x_{S_j} - x_{S_i}\|^3}\right)(x_{S_j} - x_{S_i})}_{\text{inter-sheep cohesion and repulsion}} + \underbrace{k_G (x_G - x_{S_i})}_{\text{attraction to goal}} + \underbrace{k_D \sum_{l \in D} \frac{x_{S_i} - x_{D_l}}{\|x_{S_i} - x_{D_l}\|^3}}_{\text{repulsion from dogs}} \qquad (1)$$

Here, R_S is a safety margin that each sheep tends to maintain with every other sheep, x_G is the sheep's desired goal, and k_S, k_G, and k_D are proportional gains corresponding to the attractive and repulsive forces. We model each dog as a velocity-controlled robot with the following dynamics:

$$\dot{x}_{D_k} = u_{D_k} \quad \forall k \in \{1, 2, \ldots, n\} \qquad (2)$$

Before posing the problem, we state some assumptions on the dogs' knowledge:

Assumption 1. The dog robots have knowledge of the sheep's dynamics, i.e., (1), and can measure the sheep's positions accurately.

Assumption 2. Each dog robot can measure the velocities of other dog robots (by using numerical differentiation, for example).

Problem 1. Assuming that the initial positions of the sheep satisfy x_{S_i}(0) ∉ P ∀i ∈ S, the dog robots' problem is to synthesize controls {u_{D_1}, ..., u_{D_n}} such that x_{S_i}(t) ∉ P ∀t ≥ 0, ∀i ∈ S.
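For intuition, a minimal Python sketch of the sheep dynamics (1) follows. The function signature is our own illustrative choice; the default gains mirror those used in the simulations of Sect. 5.

```python
import numpy as np

def sheep_velocity(i, xS, xD, xG, kS=0.5, kG=1.0, kD=0.1, RS=0.3):
    """Reynolds-Boids sheep dynamics f_i from (1): cohesion/repulsion among
    sheep, attraction to the goal xG, and repulsion from the dogs.
    xS: (m,2) sheep positions; xD: (n,2) dog positions; xG: (2,) goal."""
    u = kG * (xG - xS[i])                                   # attraction to goal
    for j in range(len(xS)):
        if j == i:
            continue
        d = xS[j] - xS[i]
        u += kS * (1.0 - RS**3 / np.linalg.norm(d)**3) * d  # cohesion/repulsion
    for l in range(len(xD)):
        d = xS[i] - xD[l]
        u += kD * d / np.linalg.norm(d)**3                  # repulsion from dogs
    return u
```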
4 Controller Design
In this section, we show two approaches to solve Problem 1, building on our previously proposed centralized algorithm [9]. Define a safety index h(·) : R² → R that quantifies the distance of S_i from P:

$$h(x_{S_i}) = \|x_{S_i} - x_P\|^2 - (r + R_p)^2 \qquad (3)$$

Here r is a safety buffer distance. Thus, we require h(x_{S_i}(t)) ≥ 0 ∀t ≥ 0. We define x = (x_S^all, x_D^all) as the aggregated state of all sheep and all dogs. To ensure h(x_{S_i}(t)) ≥ 0 ∀t ≥ 0, we treat h(·) as a control barrier function and require its derivative to satisfy

$$\dot{h}(x) + p_1 h(x_{S_i}) \ge 0. \qquad (4)$$
Here p₁ is a design parameter chosen such that it satisfies:

$$p_1 > 0 \quad \text{and} \quad p_1 > -\frac{\dot{h}(x(0))}{h(x_{S_i}(0))}. \qquad (5)$$
The first condition on p₁ ensures the pole is real and negative. The second depends on the initial positions x(0) of all the sheep and dogs relative to the protected zone. Note that the constraint in (4) does not contain any dog velocity terms, which are what we require in order to control each dog. Therefore, we define the LHS of (4) as another control barrier function v(x) : R^{4n} → R:

$$v = \dot{h} + p_1 h, \qquad (6)$$

and require its derivative to satisfy the constraint v̇(x) + p₂ v(x) ≥ 0. Here p₂ is another design parameter, which must satisfy

$$p_2 > 0 \quad \text{and} \quad p_2 > -\frac{\ddot{h}(x(0)) + p_1 \dot{h}(x(0))}{\dot{h}(x(0)) + p_1 h(x_{S_i}(0))} \qquad (7)$$
Using (3), (6), and the constraint on the derivative, we get

$$\ddot{h}(x) + \alpha \dot{h}(x) + \beta h(x_{S_i}) \ge 0 \qquad (8)$$

where α := p₁ + p₂ and β := p₁p₂. The derivatives of h(·) are:

$$\dot{h}(x) = 2(x_{S_i} - x_P)^T \dot{x}_{S_i} = 2(x_{S_i} - x_P)^T f_i(x) \qquad (9)$$

$$\ddot{h}(x) = 2 f_i^T f_i + 2(x_{S_i} - x_P)^T \left( \sum_{j \in S} J_{ji}^S f_j + \sum_{l \in D} J_{li}^D u_{D_l} \right) \qquad (10)$$

where the Jacobians are defined as J_{ji}^S := ∇_{x_{S_j}} f_i(x) and J_{li}^D := ∇_{x_{D_l}} f_i(x). Note that (10) contains the velocity terms of all dogs. In [9], we leveraged this observation to obtain a linear constraint on the velocities of all dogs collectively for preventing sheep S_i from breaching P:

$$A_i^C u_D^{all} \le b_i^C, \qquad (11)$$

where

$$A_i^C := (x_P - x_{S_i})^T \left[ J_{1i}^D, J_{2i}^D, \cdots, J_{ni}^D \right]$$
$$b_i^C := f_i^T f_i + (x_{S_i} - x_P)^T \sum_{j \in S} J_{ji}^S f_j + \alpha (x_{S_i} - x_P)^T f_i + \beta \frac{h}{2}$$
This gives us a centralized algorithm that collectively computes the velocities of all dogs using the following QP:

$$u_D^{all} = \arg\min_{u_D^{all}} \|u_D^{all}\|^2 \quad \text{subject to} \quad A_i^C u_D^{all} \le b_i^C \ \forall i \in S. \qquad (12)$$
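A minimal sketch of solving (12) with an off-the-shelf solver follows; the choice of cvxpy is our assumption, as the paper does not name a QP solver. Here A stacks the rows A_i^C and b stacks the bounds b_i^C.

```python
import cvxpy as cp

def centralized_dog_velocities(A, b):
    """Solve the min-norm QP (12): collective dog velocities u (a 2n-vector)
    subject to one herding constraint A[i] @ u <= b[i] per sheep.
    A: (m, 2n) array of constraint rows A_i^C; b: (m,) array of bounds b_i^C."""
    u = cp.Variable(A.shape[1])
    problem = cp.Problem(cp.Minimize(cp.sum_squares(u)), [A @ u <= b])
    problem.solve()
    return u.value
```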
Building on this centralized approach, in this paper, we develop two distributed approaches wherein we allow each dog to compute its velocity locally. The computed velocities will make the dog herd the sheep away from P.
4.1 Approach 1: One Dog to One Sheep Allocation Based Approach
In this approach, we assume that we have an equal number of dogs and sheep. By exploiting this equality, we assign a unique sheep S_i for i ∈ {1, ..., n} to a unique dog D_k for k ∈ {1, ..., n} and make D_k responsible for herding S_i away from P. In other words, D_k computes a velocity u_{D_k} that repels S_i from P, thereby ensuring that x_{S_i}(t) ∉ P ∀t ≥ 0. The premise is that, owing to the equality, each sheep will end up being herded by a unique dog, and therefore no sheep will breach the protected zone². Now, while this strategy necessitates having an equal number of dogs and sheep, the benefit of this approach stems from the feasibility guarantee (which we prove shortly), which the centralized approach lacks. Simple algebraic manipulation of constraint (11) yields a constraint on the velocity of D_k as follows:

$$A_i^H u_{D_k} \le b_i^H, \qquad (13)$$

where

$$A_i^H := (x_P - x_{S_i})^T J_{ki}^D$$
$$b_i^H := f_i^T f_i + (x_{S_i} - x_P)^T \left( \sum_{j \in S} J_{ji}^S f_j + \alpha f_i + \sum_{l \in D \setminus k} J_{li}^D u_{D_l} \right) + \beta \frac{h}{2}$$

² Note that although S_i is assigned to D_k, the positions of the remaining dogs {1, ..., n}\k and the remaining sheep {1, ..., n}\i do influence D_k's constraint parameters (A_i^H, b_i^H) and, in turn, its computed velocity u*_{D_k}.
Here A_i^H ∈ R^{1×2} and b_i^H ∈ R. The term u_{D_l} in the expression of b_i^H is computed by numerical differentiation of the positions x_{D_l}. We pose a QP to obtain the min-norm velocity for D_k as follows:

$$u^*_{D_k} = \arg\min_{u_{D_k}} \|u_{D_k}\|^2 \quad \text{subject to} \quad A_i^H u_{D_k} \le b_i^H \qquad (14)$$
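Because (14) has a single linear inequality constraint, it admits a closed-form solution, sketched below under the assumption that A_i^H is nonzero (which the feasibility proof in Sect. 7 establishes).

```python
import numpy as np

def min_norm_velocity(A, b):
    """Closed-form solution of the single-constraint min-norm QP (14):
    minimize ||u||^2 subject to A @ u <= b, with A a 1x2 row vector."""
    A = np.asarray(A, float).ravel()
    if b >= 0.0:
        return np.zeros(2)          # u = 0 already satisfies A @ u <= b
    return (b / (A @ A)) * A        # projection of 0 onto {u : A @ u = b}
```

If b_i^H ≥ 0, doing nothing already satisfies the constraint, so the optimal dog velocity is zero; otherwise the minimizer is the projection of the origin onto the constraint boundary.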
The obtained velocity u*_{D_k} guarantees that the protected zone P will not be breached by sheep S_i by ensuring that h(x_{S_i}(t)) ≥ 0 ∀t ≥ 0. Since each dog in D is in charge of herding exactly one sheep in S, feasibility of (13) ∀k ∈ D ensures that no sheep breaches P. Next, we show the conditions under which (14) remains feasible, but first state some assumptions.

Assumption 3. We make the following assumptions on the distances between pairs of agents:
1. There exist a lower bound and an upper bound on the distance between any pair of sheep, i.e., L_S ≤ ‖x_{S_i} − x_{S_j}‖ ≤ M_S, ∀i, j ∈ S and i ≠ j.
2. There exists a lower bound on the distance between every sheep and dog, i.e., ‖x_{S_i} − x_{D_k}‖ ≥ L_D ∀i ∈ S and k ∈ D.
3. There exists an upper bound on the distance between each sheep and its goal, i.e., ‖x_{S_i} − x_G‖ ≤ M_G, and between each sheep and the center of the protected zone, i.e., ‖x_{S_i} − x_P‖ ≤ M_P.

Theorem 1. In a scenario with n dogs and n sheep, with each dog assigned a unique sheep, the herding constraint (13) for a given dog is always feasible, provided Assumption 3 is met.

Proof. See appendix (Sect. 7).
4.2 Approach 2: Iterative Distributed Reformulation of (12)
The distributed formulation proposed in (14) comes with a feasibility guarantee ensuring that all sheep will be herded away from P. While vital, this comes at the cost of requiring as many dog robots as sheep agents. This is because, in a way, this equality ensures that controlling the sheep is not an underactuated problem from the perspective of the dog robots. Be that as it may, in our simulations and experiments involving the centralized approach with an equal number of dogs and sheep, we frequently observed that not all dog robots needed to move to repel the sheep away from P, i.e., the equality may have been overkill. Thus, in terms of budget efficiency, at least empirically, the centralized approach outweighs the distributed approach. This raises the question: can we convert the centralized algorithm of (12) into a distributed version that inherits the budget efficiency (optimality) promised by (12)? Indeed, [10,11] propose algorithms to convert constraint-coupled convex optimization problems (such as (12)) into distributed counterparts. They combine techniques called dual decomposition and proximal minimization and develop iterative distributed schemes consisting of local optimization problems. The solutions to these local problems asymptotically converge to the solution of the centralized optimization under mild convexity assumptions and connectivity properties of the communication network; in our case, this network refers to the communication between dog robots. Below, we present the distributed dual subgradient method of [10,11] adapted to the costs and constraints of (12). This algorithm calculates an estimate û_{D_k} of dog D_k's velocity which, given a large enough number of iterations K_max, matches the k-th component of the optimal velocities u*_D^all returned by (12). A_k ∈ R^{n_S×2} refers to those columns of the stacked constraint matrix of (12) that correspond to u_{D_k} in u_D^all, and b ∈ R^{n_S} stacks the bounds b_i^C.
Algorithm 1. Distributed Dual Subgradient for (12) (based on Sec. 3.4.2 in [11])
Initialize Lagrange multiplier: μ_k^0 = 0 ∈ R^{n_S}
Evolution: for t = 1, 2, ..., K_max:
  Gather multipliers μ_r^t from D_r, ∀r ∈ {1, ..., n_D}\k
  Average multipliers: v_k^{t+1} = (1/n_D) Σ_{r ∈ {1,...,n_D}\k} μ_r^t
  Local solution: u_{D_k}^{t+1} = arg min_u ‖u‖² + (v_k^{t+1})^T (A_k u − b/n_D) = −(1/2) A_k^T v_k^{t+1}
  Update multiplier: μ_k^{t+1} = [ v_k^{t+1} + γ_t (A_k u_{D_k}^{t+1} − b/n_D) ]_+
Return average: û_{D_k} = (1/K_max) Σ_{t=1}^{K_max} u_{D_k}^t
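A minimal Python sketch of this scheme for one dog follows. The communication primitive, the diminishing step size γ_t = γ₀/t, and the iteration count are our assumptions; [10,11] discuss admissible step-size rules.

```python
import numpy as np

def dual_subgradient_velocity(A_cols, b, n_dogs, exchange, K_max=500, gamma0=1.0):
    """Distributed dual subgradient for one dog (cf. Algorithm 1).
    A_cols: (n_sheep, 2) columns of the stacked constraint matrix for this dog;
    b: (n_sheep,) stacked bounds b_i^C; exchange: callable that broadcasts this
    dog's multiplier and returns the other dogs' multipliers (communication)."""
    mu = np.zeros(b.shape[0])          # local Lagrange multiplier
    u_avg = np.zeros(2)
    for t in range(1, K_max + 1):
        others = exchange(mu)          # gather mu_r from all other dogs
        v = sum(others) / n_dogs       # average the multipliers
        u = -0.5 * A_cols.T @ v        # closed-form local minimizer
        grad = A_cols @ u - b / n_dogs
        mu = np.maximum(v + (gamma0 / t) * grad, 0.0)  # projected subgradient step
        u_avg += u
    return u_avg / K_max               # running average approximates u*_Dk
```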
5 Results
In this section, we provide simulation and real-world experimental results demonstrating our proposed distributed algorithms.
5.1 Simulation Results
We first validate the first distributed algorithm and the feasibility proof given in Sect. 4.1. For this, we model the sheep with the Reynolds-Boids dynamics (1) with gains k_S = 0.5, k_G = 1, and k_D = 0.1. The dogs use (14) to compute their velocities, where the hyperparameters α and β are computed following (5) and (7). We chose a circular protected zone of radius R_p = 0.6 m centered at the origin. The sheep are initialized outside of the protected zone, and their goal location x_G is chosen such that their nominal trajectory would make them breach the zone, thus necessitating intervention from the dogs. The positions of the dogs are initialized randomly within a certain range of the protected zone. In Figs. 1(a) and 1(b),
Fig. 1. Preventing the breaching of the protected zone using our proposed distributed algorithm from Sect. 4.1. Here dogs are shown in blue and sheep in red. The green disc represents the protected zone. The nominal task of the sheep is to go straight towards the goal x_G; however, since this would result in infiltration of the protected zone, the dogs intervene using the control algorithm presented in (14). In Fig. 1(c), we defend two protected zones from four sheep.
we show two examples involving a) two dog robots vs. two sheep robots and b) three dog robots vs. three sheep robots. To demonstrate the compositionality of our approach, we consider two protected zones in Fig. 1(c), where four dogs defend both zones from four sheep. In all these simulations, none of the sheep breach any zone, demonstrating the correctness of our approach. In the interest of space, we skip the simulation results for the algorithm in Sect. 4.2 but do provide experimental results.
5.2 Robot Experiments
In this section, we show the results obtained from robot experiments implementing the distributed algorithms of Sect. 4.1 and Sect. 4.2. Additionally, we present more experimental results for our prior centralized algorithm from [9] (because at the time, we did not have as many robots). We conduct these experiments in our lab's multi-robot arena, which consists of a 14 ft × 7 ft platform with multiple Khepera IV robots and eight Vicon cameras for motion tracking. Although the Khepera robots have unicycle dynamics, [9] provides a technique to convert the single-integrator dynamics (assumed for dogs and sheep) into linear and angular velocity commands for the robots. First, to build upon our previous work, we show additional experiments using the centralized velocity computation for the dog robots (12). Figure 2 shows a case with 2 dog and 4 sheep robots. The dog robots have a green tail, and the sheep robots have an orange tail; the tails point opposite to the robot's heading. The protected zone is the green circular region. This figure shows the performance in the case of an underactuated system, i.e., more sheep than dogs. Another example is shown in Fig. 3, where 3 dogs successfully prevent breaching against 5 sheep robots. Following that, multiple experiments were conducted using the distributed algorithm presented in Sect. 4.1, which requires equal numbers of dogs and sheep. Figure 4 shows a scenario with 4 dog robots against 4 sheep robots. Here we use two protected zones and show that the dogs can protect both of them, highlighting the compositional nature of our proposed algorithm. We also conducted experiments with 5 dog robots and 5 sheep robots, as shown in Fig. 5. Here, some dog robots did not need to move, as their assigned sheep were already prevented from entering the protected zone by the configuration of the flock itself. Finally, we test our distributed algorithm presented in Sect. 4.2. Figure 6 shows a case where 2 dogs prevent the breaching of the protected zone against three sheep, highlighting that our distributed approach can handle underactuated scenarios. Figure 7 and Fig. 2 can be compared to see both the centralized and distributed algorithms handling a similar scenario of 2 dogs against 4 sheep.
Fig. 2. Experiments for Centralized Control: Two dogs defending the protected zone from four sheep using the centralized control algorithm (12) from our prior work [9]. Video at https://bit.ly/3OTAnOu.
Fig. 3. Experiment for Centralized Control: Three dogs (green-tailed robots) defending a protected zone from five sheep (orange-tailed robots) using centralized control (12) from our prior work [9]. Video at https://youtu.be/2_Xuxnd9jZw.
Fig. 4. Experiment for the distributed algorithm in Sect. 4.1: Four dogs (green-tailed robots) defending two protected zones from four sheep (orange-tailed robots). The goal position x_G (red disc) is at the extreme left, which would encourage the sheep to breach both zones. However, our proposed algorithm moves the dogs so that neither zone gets breached. Video at https://bit.ly/3yo9ziC.
Fig. 5. Experiment for the distributed algorithm in Sect. 4.1: Five dogs (green-tailed robots) defending the protected zone from five sheep (orange-tailed robots). The sheep's goal (red disc) is at the center of the protected zone. Eventually, a deadlock occurs in this scenario, where all sheep come to a stop outside the protected zone. Video at https://bit.ly/3o51Cu1.
Fig. 6. Experiment for the distributed algorithm in Sect. 4.2: Two dogs (green-tailed robots) defending the protected zone from three sheep (orange-tailed robots). The goal position x_G (red disc) is at the center of the zone. Video at https://youtu.be/IbCjkR1ye0c.
Fig. 7. Experiment for the distributed algorithm in Sect. 4.2: Two dogs (green-tailed robots) defending the protected zone from four sheep (orange-tailed robots). This case is similar to the one shown in Fig. 2. Video at https://youtu.be/51FoHZWFYC4.
6 Conclusions
In this paper, we developed novel optimization-based distributed control techniques to enable multiple dog robots to prevent the breaching of protected zones by sheep agents. We provided a proof of feasibility of the controller when n dog robots face an equal number of sheep robots. Additionally, we developed another distributed algorithm that iteratively computes a solution agreeing with the solution returned by the centralized problem, without requiring equal numbers of dogs and sheep. We experimentally validated both distributed algorithms in addition to validating our previously developed centralized control. We showed that multiple dog robots can prevent the breaching of protected zones in both simulation and real-world experiments. In future work, we aim for the dog robots to learn the dynamics of the sheep robots online while preventing them from breaching.
7 Appendix: Proof of Feasibility for Approach 1
Theorem 1. In a scenario with n dogs and n sheep, with each dog assigned a unique sheep, the herding constraint (13) for a given dog is always feasible, provided Assumption 3 is met.

Proof. Our strategy to guarantee feasibility of constraint (13) relies on ruling out the situations in which it is infeasible. (13) can become infeasible
– either when A_i^H = 0 and b_i^H < 0 (possibility 1),
– or when b_i^H = −∞ (possibility 2).
To determine the conditions under which possibility 1 occurs, we calculate the determinant of J_{ki}^D as

$$\det(J_{ki}^D) = \frac{-2 k_D^2}{\|x_{D_k} - x_{S_i}\|^6}$$

The determinant det(J_{ki}^D) is non-zero as long as the distance between dog D_k and sheep S_i is finite. Therefore, J_{ki}^D has no null space, implying that A_i^H ≠ 0 ∀x_{S_i} ∈ R², x_{D_k} ∈ R². This rules out possibility 1 for infeasibility. To rule out possibility 2, we need to check the conditions under which b_i^H → −∞. Given b_i^H in (13), we find its worst-case lower bound. Here f_i^T f_i ≥ 0, and since we assume that at the current time step the sheep is outside P, we have β h/2 ≥ 0. Removing these terms, a lower bound on b_i^H is given by

$$b_i^H \ge \sum_{j \in S \setminus i} (x_{S_i} - x_P)^T J_{ji}^S f_j + (x_{S_i} - x_P)^T J_{ii}^S f_i + \sum_{l \in D \setminus k} (x_{S_i} - x_P)^T J_{li}^D u_{D_l} + \alpha (x_{S_i} - x_P)^T f_i \qquad (1)$$
Using the triangle inequality on the RHS and the Cauchy-Schwarz inequality on the individual terms, we get

$$b_i^H \ge -\sum_{j \in S \setminus i} \sigma_{\max}(J_{ji}^S) \|x_{S_i} - x_P\| \|f_j\| - \sigma_{\max}(J_{ii}^S) \|x_{S_i} - x_P\| \|f_i\| - \sum_{l \in D \setminus k} \sigma_{\max}(J_{li}^D) \|x_{S_i} - x_P\| \|u_{D_l}\| - \alpha \|x_{S_i} - x_P\| \|f_i\| \qquad (2)$$

where σ_max is the largest singular value of a matrix. Further, using the fact that the largest singular value of a matrix (σ_max) is upper bounded by its Frobenius norm (σ_F), we obtain

$$b_i^H \ge -\sum_{j \in S \setminus i} \sigma_F(J_{ji}^S) \|x_{S_i} - x_P\| \|f_j\| - \sigma_F(J_{ii}^S) \|x_{S_i} - x_P\| \|f_i\| - \sum_{l \in D \setminus k} \sigma_F(J_{li}^D) \|x_{S_i} - x_P\| \|u_{D_l}\| - \alpha \|x_{S_i} - x_P\| \|f_i\| \qquad (3)$$
Now, to compute this lower bound, we make use of Assumption 3. We use the dynamics in (1) to compute J_{ii}^S, obtain an upper bound on σ_F(J_{ii}^S), and use the bounds on distances from Assumption 3 to get the following upper bound:

$$\sigma_F(J_{ii}^S) \le \sum_{j \in S \setminus i} k_S \left( \sqrt{2} + \frac{(3 + \sqrt{2}) R^3}{\|x_{S_i} - x_{S_j}\|^3} \right) + \sqrt{2}\, k_G + \sum_{l \in D} \frac{(3 + \sqrt{2}) k_D}{\|x_{S_i} - x_{D_l}\|^3} \le (n-1) k_S \left( \sqrt{2} + \frac{(3 + \sqrt{2}) R^3}{L_S^3} \right) + \sqrt{2}\, k_G + n \frac{(3 + \sqrt{2}) k_D}{L_D^3} := \lambda_M$$

We omit the proof of this computation in the interest of space. Similarly, using the dynamics in (1), we compute an expression for J_{ji}^S and obtain an upper bound on σ_F(J_{ji}^S) as follows:

$$\sigma_F(J_{ji}^S) \le \sqrt{2}\, k_S + \frac{(3 + \sqrt{2}) k_S R^3}{\|x_{S_i} - x_{S_j}\|^3} \le \sqrt{2}\, k_S + \frac{(3 + \sqrt{2}) k_S R^3}{L_S^3} := \lambda_S$$

Likewise, an upper bound on σ_F(J_{li}^D) is given by

$$\sigma_F(J_{li}^D) \le \frac{(3 + \sqrt{2}) k_D}{\|x_{S_i} - x_{D_l}\|^3} \le \frac{(3 + \sqrt{2}) k_D}{L_D^3} := \lambda_D$$
Lastly, we obtain an upper bound on the dynamics of each sheep, ‖f_i‖, as:

$$\|f_i\| \le \sum_{j \in S \setminus i} k_S \left( \|x_{S_i} - x_{S_j}\| + \frac{R^3}{\|x_{S_i} - x_{S_j}\|^2} \right) + k_G \|x_G - x_{S_i}\| + \sum_{l \in D} k_D \frac{\|x_{S_i} - x_{D_l}\|}{\|x_{S_i} - x_{D_l}\|^3} \qquad (4)$$
Now we need to compute the maximum possible value of the RHS to get the upper bound on the sheep dynamics. The first term has a local minimum at ‖x_{S_i} − x_{S_j}‖ = 2^{1/3} R; therefore, its maximum occurs at either the lower or the upper bound of ‖x_{S_i} − x_{S_j}‖. Thus, the maximum value of the first term can be given as F_max := max(k_S L_S + k_S R³/L_S², k_S M_S + k_S R³/M_S²). The second term is maximal when ‖x_G − x_{S_i}‖ = M_G. The last term is maximal when the distance from the sheep to the dogs is minimal, ‖x_{S_i} − x_{D_k}‖ = L_D. Using these, the upper bound on the sheep dynamics is computed as:

$$\|f_i\| \le (n-1) F_{\max} + k_G M_G + n k_D \frac{1}{L_D^2}$$

Assuming that the velocities of the dog robots have an upper bound ‖u_D‖_max, and taking the upper bound on the dynamics of all the sheep to be equal, the lower bound on b_i^H from (3) is (taking γ = −(α + λ_M + (n − 1)λ_S) M_P)

$$b_i^H \ge \gamma \left( (n-1) F_{\max} + k_G M_G + \frac{n k_D}{L_D^2} \right) - (n-1) \lambda_D M_P \|u_D\|_{\max}$$

This shows that b_i^H has a finite lower bound, thus ruling out possibility 2. Thus, the herding constraint (13) for one dog to repel one sheep from the protected zone is always feasible. Since each sheep in S is allocated to one unique dog in D, extension of this feasibility result to all sheep ensures that none of them will breach the protected zone. ∎
References
1. D'Andrea, R.: Guest editorial: a revolution in the warehouse: a retrospective on Kiva Systems and the grand challenges ahead. IEEE Trans. Autom. Sci. Eng. 9(4), 638–639 (2012)
2. D'Andrea, R., Dullerud, G.E.: Distributed control design for spatially interconnected systems. IEEE Trans. Autom. Control 48(9), 1478–1495 (2003)
3. Kazmi, W., Bisgaard, M., Garcia-Ruiz, F., Hansen, K.D., la Cour-Harbo, A.: Adaptive surveying and early treatment of crops with a team of autonomous vehicles. In: Proceedings of the 5th European Conference on Mobile Robots ECMR 2011, pp. 253–258 (2011)
4. Ji, M., Egerstedt, M.: Distributed coordination control of multiagent systems while preserving connectedness. IEEE Trans. Rob. 23(4), 693–703 (2007)
5. Lin, J., Morse, A.S., Anderson, B.D.: The multi-agent rendezvous problem - the asynchronous case. In: 2004 43rd IEEE Conference on Decision and Control (CDC), vol. 2, pp. 1926–1931. IEEE (2004)
6. Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25–34 (1987)
7. Walton, C., Kaminer, I., Gong, Q., Clark, A., Tsatsanifos, T., et al.: Defense against adversarial swarms with parameter uncertainty. arXiv preprint arXiv:2108.04205 (2021)
8. Tsatsanifos, T., Clark, A.H., Walton, C., Kaminer, I., Gong, Q.: Modeling and control of large-scale adversarial swarm engagements. arXiv preprint arXiv:2108.02311 (2021)
9. Grover, J., Mohanty, N., Luo, W., Liu, C., Sycara, K.: Noncooperative herding with control barrier functions: theory and experiments. arXiv preprint arXiv:2204.10945 (2022)
10. Falsone, A., Margellos, K., Garatti, S., Prandini, M.: Dual decomposition for multi-agent distributed optimization with coupling constraints. Automatica 84, 149–158 (2017)
11. Notarstefano, G., Notarnicola, I., Camisa, A., et al.: Distributed optimization for smart cyber-physical networks. Found. Trends Syst. Control 7(3), 253–383 (2019)
12. Lien, J.-M., Bayazit, O.B., Sowell, R.T., Rodriguez, S., Amato, N.M.: Shepherding behaviors. In: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2004, vol. 4, pp. 4159–4164. IEEE (2004)
13. Pierson, A., Schwager, M.: Controlling noncooperative herds with robotic herders. IEEE Trans. Rob. 34(2), 517–525 (2017)
14. Vaughan, R., Sumpter, N., Henderson, J., Frost, A., Cameron, S.: Robot control of animal flocks. In: Proceedings of the 1998 IEEE International Symposium on Intelligent Control (ISIC) held jointly with the IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), pp. 277–282. IEEE (1998)
15. Vaughan, R., Sumpter, N., Henderson, J., Frost, A., Cameron, S.: Experiments in automatic flock control. Robot. Auton. Syst. 31(1-2), 109–117 (2000)
16. Pierson, A., Schwager, M.: Bio-inspired non-cooperative multi-robot herding. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1843–1849. IEEE (2015)
17. Licitra, R.A., Bell, Z.I., Doucette, E.A., Dixon, W.E.: Single agent indirect herding of multiple targets: a switched adaptive control approach. IEEE Control Syst. Lett. 2(1), 127–132 (2017)
18. Licitra, R.A., Hutcheson, Z.D., Doucette, E.A., Dixon, W.E.: Single agent herding of n-agents: a switched systems approach. IFAC-PapersOnLine 50(1), 14374–14379 (2017)
19. Sebastián, E., Montijano, E.: Multi-robot implicit control of herds. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1601–1607. IEEE (2021)
20. Bacon, M., Olgac, N.: Swarm herding using a region holding sliding mode controller. J. Vib. Control 18(7), 1056–1066 (2012)
On Limited-Range Coverage Control for Large-Scale Teams of Aerial Drones: Deployment and Study

Filippo Bertoncelli¹, Mehdi Belal²(B), Dario Albani², Federico Pratissoli¹, and Lorenzo Sabattini¹

¹ Department of Sciences and Methods for Engineering (DISMI), University of Modena and Reggio Emilia, Reggio Emilia, Italy
{filippo.bertoncelli,federico.pratissoli,lorenzo.sabattini}@unimore.it
² Autonomous Robotics Research Center, Technology Innovation Institute (TII), Abu Dhabi, UAE
{mehdi.belal,dario.albani}@tii.ae
Abstract. When considering a multi-robot aerial system, coverage control can be seen as a basic behavior serving as a building block for complex missions. However, when the number of units in play becomes too high, centralized and globally optimal solutions might not scale well, causing a reduction in efficiency and reactiveness. To this end, in this manuscript we propose and analyze a control methodology for area coverage that is fully distributed and able to mitigate the above factors while still providing high performance. The approach we propose relies on local Voronoi partitioning constrained by limited robot sensing and communication capabilities. To validate our approach, we perform an ablation study over a large set of simulations that we evaluate with a set of proposed metrics. At the same time, the algorithm has been tested in the field, with real quad-copters, in a set of proof-of-concept experiments deploying up to seven aerial units.
1 Introduction
Addressing a persistent and unpredictable service demand with a multi-robot system can provide higher efficiency, thanks to parallel operation, and larger flexibility, by modulating the operational costs in terms of robots employed, as well as by deploying robots only where needed. Concrete solutions can be obtained by addressing two concurrent problems. In recent years, significant improvements have been made by researchers in the field of multi-robot systems [4]. In particular, these systems are gaining relevance in real-world applications where, due to the unpredictability of the environments and the presence of external factors unknown a priori, they often require the distributed capabilities provided by teams of cooperating robots [13]. At the same time, a distributed and decentralized robotic system proves scalable and fault-tolerant and, due to the absence of a central point of failure, ensures robust, continuous, and efficient operation. Among the many applications that could benefit from
the above properties we find precision agriculture, emergency response and rescue, environmental monitoring, search and rescue operations, and outdoor industrial operations such as fault diagnosis [4]. These applications require control and coordination strategies, and in recent years several have been studied in depth that aim at organizing the robots to enable solving global tasks using local information. The main representative classes of control strategies are discussed in [4]; among them, a remarkable paradigm is represented by coverage control, which entails controlling a group of robots deployed in a given environment with a distribution that allows the units to cover it in an optimal way. The literature is flourishing with strategies proposing multiple implementations of coverage control performed by a group of mobile robots (either ground, aerial, marine, or hybrid). Amongst these, well-known methods [5,6] propose decentralized coordination algorithms for groups of mobile agents based on Voronoi diagrams [7,9,10] and proximity graphs [2,11]. This control strategy provides a methodology to guarantee convergence of the positions of the robots in the team to a configuration that maximizes the area covered in the environment. The main drawback is that the computation of each robot's control input requires the computation of the Voronoi partitioning of the entire environment, which requires global knowledge and/or global communication and information exchange [15,17]. However, in many practical application scenarios, robots may not have access to global information or to a reliable all-to-all communication network; therefore, methods were proposed to define the Voronoi partitioning in a range-constrained scenario. A distributed methodology was introduced in [5,6] to calculate the Voronoi partition over a limited-sensing network, exploiting the algorithm presented in [3], which consists of gradually refining the globally computed partition by utilizing an adjustable sensing range that is incrementally increased. However, in practical applications, limited sensing capabilities are typically a hard constraint, and rarely can the sensing radius be adjusted to detect sufficient information to allow computation of the Voronoi partition. To overcome these issues, a few methodologies were recently proposed to implement coverage control in a totally distributed manner, letting each robot compute its own control input based on locally available information only. In particular, a methodology was proposed in [14] that allows a team of mobile robots, moving in a two-dimensional environment, to optimally cover a bounded region with unknown features. Robots are considered with limited sensing capabilities and without access to reliable high-bandwidth communication infrastructures. The proposed methodology was validated by means of kinematic simulations and preliminary experiments on ground robots.

Contribution. The aim of this paper is to experimentally evaluate the performance of the coverage control architecture proposed in [14] deployed on a large team of drones for outdoor operations. In the manuscript, we perform a deep and extensive analysis of both realistic simulations and real field tests with a fleet of quad-rotor drones in an unstructured environment, namely the Abu Dhabi desert. Thanks to these, we assess the scalability of the control methodology and its
performance in the presence of real-world constraints and limitations, such as localization errors coming from noisy GPS readings. In order to guarantee the statistical relevance of the simulated results, we perform an ablation study of the main parameters with randomized initialization conditions repeated over multiple runs. We note that, to the best of the authors' knowledge, this represents the first large-scale deployment of a team of drones performing fully decentralized and distributed coverage control in an outdoor environment. Organization of the Paper. The rest of the paper is organized as follows. The methodology considered for our experiments is described in Sect. 2. The analysis of the experiments is extensively presented in Sect. 3, where quantitative measurements are reported along with a description of the simulator and the platform used for the real-world experiments, continuing with a discussion of the observed results. Finally, Sect. 4 closes the manuscript with some concluding remarks.
2 Methods
The aim of our experiments is to evaluate the performance of a team of drones controlled by a decentralized, limited-range coverage control strategy, deployed in a large outdoor environment. For this purpose, field trials and high-fidelity dynamic simulations were set up to achieve a statistically relevant characterization. Details of the methods are provided in the following paragraphs.
2.1 Coverage Control Strategy
We consider a swarm composed of n drones, modeled as holonomic units able to move in two dimensions. All the robots possess the same limited sensing and communication capabilities. The objective of the coverage control strategy considered here is to optimize the distribution of the agents over a bounded environment $Q \subset \mathbb{R}^2$. Let $p_i \in \mathbb{R}^2$ be the position of the i-th drone, let $P = \{p_1, \ldots, p_n\} \subset \mathbb{R}^2$ be the set of positions of all the robots, and let the velocity $\dot{p}_i$ of each drone be directly controllable. By defining $u_i \in \mathbb{R}^2$ as the control input of the i-th drone, $\forall i = 1, \ldots, n$, the kinematic model of the robot can be expressed as

$$\dot{p}_i = u_i. \quad (1)$$

As discussed in [5,6], Lloyd's algorithm can be used to define a control strategy to optimally cover a desired environment with a team of holonomic robots. This is achieved by partitioning the environment into Voronoi cells $V_i$, adjusted over time using the positions $p_i \in \mathbb{R}^2$ of the robots as seeds. A probability density function (pdf) $\varphi(\cdot) : Q \to \mathbb{R}^+$ can be defined to identify regions of interest in Q; the centroid $C_{V_i}$ of each cell is then computed considering the pdf as

$$C_{V_i} = \frac{\int_{V_i} q\,\varphi(q)\,dq}{\int_{V_i} \varphi(q)\,dq}. \quad (2)$$
Each robot is then controlled to move toward the centroid of its cell through a simple proportional controller:

$$u_i = -k_{prop}\,(p_i - C_{V_i}), \quad (3)$$

with $k_{prop} > 0$ being a proportional gain. To account for the limited sensing and communication capabilities of the robots, as discussed in [14], the partition of the environment is defined only locally, with non-overlapping Voronoi cells computed by each robot i as:

$$V_i = \{\, q \in B(p_i, r) \;\mid\; \|q - p_i\| \le \|q - p_j\|, \;\forall j \neq i, \; j = 1, \ldots, n \,\}. \quad (4)$$

In Eq. (4), the term $B(p_i, r)$ represents a circle of radius r centered in $p_i$, with r associated with the range, either for communication or for sensing, of the robot. This limited-range definition allows each robot to compute, in a decentralized fashion, its own control action (3), based on its current position and on the positions of its neighbors within $B(p_i, r)$.
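To make the per-robot computation concrete, the following sketch illustrates one limited-range Lloyd iteration on a discretized environment, combining Eqs. (2)-(4). It is a minimal illustration under stated assumptions (grid discretization, global position arrays kept for readability), not the implementation used in [14]; function and parameter names are hypothetical. In a real deployment, robot i would only hold the positions of neighbors inside $B(p_i, r)$.

    import numpy as np

    def lloyd_step(i, P, phi, xs, ys, r, k_prop):
        """One limited-range Lloyd iteration for robot i (sketch of Eqs. (2)-(4)).

        P      : (n, 2) array of robot positions (only neighbors within r matter)
        phi    : (H, W) density phi sampled on the grid defined by xs, ys
        xs, ys : 1D arrays of grid-cell coordinates (W and H entries)
        r      : sensing/communication radius
        k_prop : proportional gain of Eq. (3)
        """
        X, Y = np.meshgrid(xs, ys)                      # grid point coordinates
        pts = np.stack([X.ravel(), Y.ravel()], axis=1)  # (H*W, 2)
        d = np.linalg.norm(pts[:, None, :] - P[None, :, :], axis=2)
        # Eq. (4): points inside B(p_i, r) that are closer to robot i than to others
        in_cell = (d[:, i] <= r) & (d[:, i] <= d.min(axis=1))
        w = phi.ravel() * in_cell                       # pdf restricted to V_i
        if w.sum() == 0.0:                              # empty cell: hold position
            return np.zeros(2)
        centroid = (pts * w[:, None]).sum(axis=0) / w.sum()  # Eq. (2), discretized
        return -k_prop * (P[i] - centroid)              # Eq. (3): velocity command

The cell areas cancel in the centroid ratio, so a plain weighted average over grid points suffices at any grid resolution.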
2.2 Simulation Protocol
Simulated experiments were performed using the Flightmare quadrotor simulator [16], a photo-realistic simulator with integrated physics developed by the Robotics and Perception Group at UZH. The control algorithm described in Sect. 2.1 was implemented using the Robot Operating System (ROS) on separate threads, one for each drone. Each robot thread also handles the low-level controller for quad-rotor flight, the initial take-off, and the sharing of information. A Python script acting as a parallel daemon manages the execution of the simulation batches and collects the data needed to assess performance. The simulations were performed on an Intel Xeon Platinum 8280M CPU at 2.70 GHz with 4 processors, for a total of 112 cores and 512 GB of RAM. Each simulation collects the global position of each drone over a time interval of 120 s, empirically set as the end of the simulation. To guarantee the independence of the proposed methodology from the initial conditions, drones are spawned inside the bounded environment at randomized positions: for every simulation run, the vehicles are spawned within the simulation environment in a neighborhood of the center of the coverage area with a random displacement. The final position is constrained to be at least 0.5 m from any of the already positioned vehicles. After arming and take-off of all the vehicles at the same altitude, the coverage control algorithm takes over and steers the drones toward their desired positions through the velocities given by Eq. (3). The simulator makes use of the RPG Quadrotor Control [8], which generates the desired orientation, collective thrust command, body rates, angular accelerations, and angular jerk, producing the control values of the four motors from the requested velocity command. The measurements of the positions of the other vehicles, used to feed the control function, are corrupted by additive noise drawn from a normal distribution N(0, 1) to simulate realistic measurement error. Ten simulation runs were performed by combining different parameters, expressed as a tuple (n, a, r),
respectively representing: the number of drones n ∈ N = {8, 32, 64}; the size of the operational area a ∈ A = {50 m, 100 m, 200 m}; and the sensing range r ∈ R = {10 m, 50 m, 100 m, 500 m}.
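For concreteness, the batch sweep over (n, a, r) tuples can be expressed as a Cartesian product. The run_simulation helper below is hypothetical, standing in for one invocation of the ROS/Flightmare batch daemon described above; the repetition count mirrors the randomized-initialization protocol.

    from itertools import product

    N_DRONES = [8, 32, 64]          # number of drones n
    AREA_M = [50, 100, 200]         # side a of the operational area, in meters
    RANGE_M = [10, 50, 100, 500]    # sensing range r, in meters
    RUNS = 10                       # randomized-initialization repetitions

    for n, a, r in product(N_DRONES, AREA_M, RANGE_M):
        for seed in range(RUNS):
            # run_simulation is a hypothetical wrapper around one simulation batch;
            # each run randomizes the spawn positions near the area center.
            run_simulation(n_drones=n, area_side_m=a, sensing_range_m=r, seed=seed)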
2.3 Field Test Protocol
Real-world experiments were performed in a remote location in the Al Khaznah region of the Abu Dhabi emirate in the UAE. The platform selected as the flying unit is depicted in Fig. 1. It is a custom-built quad-rotor developed by the company Fly4Future, consisting of a standard 450 mm DJI frame, a Pixhawk4 autopilot with custom firmware, and an Intel NUC with an i7 processor and 16 GB of RAM as the on-board computer. The latter runs the high-level software control stack which, in this particular case, is the MRS UAV system, openly available for research purposes and released by the Multi-robot Systems Group at the Czech Technical University in Prague [1].
Fig. 1. The MRS F450, the quad-rotor platform selected for the experiments
The robotic unit is equipped with several sensors among which, for the specific use case analyzed here, we highlight a Global Positioning System module for self-localization, such as the one coming with the Pixhawk autopilot bundle, and an XBee device for communication and information sharing among the units. The latter makes use of an ad-hoc peer-to-peer proprietary mesh network called DigiMesh, which we limit to single-hop operation to enforce the communication range constraint. We employed omni-directional low-power antennas on board all drones, operating at 900 MHz, a band empirically estimated to guarantee good connectivity among all the units in the group. The vehicle controller [12] generates the desired orientation, collective thrust command, body rates, angular accelerations, and angular jerk based on the coverage control. The experiments were carried out using 7 vehicles in an area of 50 × 50 m; as previously described, the communication devices used offer full connectivity but, to evaluate the behavior of the control strategy, we manually limit the sensing range r
with the same values used for the simulations, i.e., r ∈ {10 m, 50 m, 100 m}. Three runs were performed for each value of the sensing range. As in the simulations, the control algorithm proposed in Sect. 2 runs in a distributed way on board each drone. The position information of neighboring units needed by the coverage control is provided solely through radio communication and only by the aerial units, i.e., no external or central computer is used. Fig. 2 shows the trajectories of the UAVs in one of the experiments carried out; following the takeoff procedure that brings the vehicles to the same altitude, the coverage control function is performed to determine the new positions of the vehicles. The speed command of Eq. (1) is limited to 2 m/s for safety reasons, taking into account the number of vehicles involved in the experiment and the freedom that the drones have to move in various directions.
2.4 Metrics
In order to evaluate the overall performance of the system, the same metrics were collected both in simulations and in field tests. In particular, we consider the following three main aspects: (i) optimality of the final configuration, (ii) efficiency in the coverage of the environment, and (iii) effectiveness in the use of the robots.
Fig. 2. Vehicle trajectories from a real experiment with 7 drones
Optimality of the Configuration. As is typically done in the literature, the performance of the coverage control algorithm is evaluated by computing the value of an optimization function H(P), whose value indicates how close the configuration of the robots is to the optimal one, that is, the centroidal Voronoi configuration. Considering the positions P of the robots, it is possible to partition the environment Q into n range-unlimited Voronoi cells $\{\bar{V}_1, \ldots, \bar{V}_n\}$, where the i-th cell is computed using position $p_i$ as a seed. The optimization function is then defined as follows:

$$\mathcal{H}(P) = -\sum_{i=1}^{n} \int_{\bar{V}_i} \|q - p_i\|^2\, \varphi(q)\, dq. \quad (5)$$
As discussed in [6,15], this definition quantifies how close the configuration of the robots is to the centroidal Voronoi configuration, defined considering, for each point $q \in Q$, the value of the pdf $\varphi(q)$.

Coverage Effectiveness. Considering the limited-range sensing capabilities of the robots, in order to assess how well the environment Q is covered by the set of individual, non-overlapping limited-range cells $\{V_1, \ldots, V_n\}$, we propose to evaluate the ratio A(P) between the integral of the pdf $\varphi(\cdot)$ computed over the union of the cells $V_i$ and the same integral computed over Q, namely

$$A(P) = \frac{\sum_{i=1}^{n} \int_{V_i} \varphi(q)\, dq}{\int_{Q} \varphi(q)\, dq}. \quad (6)$$
Overall Efficiency. Considering that each robot is able to individually cover a circular area of radius r, we measure the effectiveness in the use of resources by computing the ratio between the actual area covered by the team of drones and the upper bound on the area that could theoretically be covered, namely:

$$E(P) = \frac{\sum_{i=1}^{n} \int_{V_i} dq}{n\, \pi\, r^2}. \quad (7)$$
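On a discretized environment the three metrics reduce to weighted sums over grid cells. The following is a minimal illustration of Eqs. (5)-(7) under that discretization; function and parameter names are hypothetical, and the grid representation matches the Lloyd-step sketch above.

    import numpy as np

    def coverage_metrics(P, phi, pts, cell_area, r):
        """Discretized versions of Eqs. (5)-(7) (sketch).

        P         : (n, 2) robot positions
        phi       : (H*W,) density values at the grid points pts
        pts       : (H*W, 2) grid point coordinates covering Q
        cell_area : area of one grid cell
        r         : sensing radius of each robot
        """
        n = len(P)
        d = np.linalg.norm(pts[:, None, :] - P[None, :, :], axis=2)
        owner = d.argmin(axis=1)                      # unlimited Voronoi assignment
        nearest = d[np.arange(len(pts)), owner]
        H = -np.sum(nearest ** 2 * phi) * cell_area   # Eq. (5) over the cells V-bar_i
        in_range = nearest <= r                       # limited-range cells V_i
        A = phi[in_range].sum() / phi.sum()           # Eq. (6): cell_area cancels
        E = in_range.sum() * cell_area / (n * np.pi * r ** 2)  # Eq. (7)
        return H, A, E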
3 Results and Discussion
In this section, we report the results of both the simulations and the field tests performed according to the presented methodology. Quantitative results relate to the value of the optimization function defined in Eq. (5), with larger values corresponding to a configuration that is closer to the optimal one; to the integral of the pdf defined in Eq. (6), where again larger values reflect better coverage effectiveness; and, last, to the measurement of the overall covered area as defined in Eq. (7), where greater values indicate a more effective use of the robots.
3.1 Simulation Results
We hereby present the results of the simulations. Figure 3 displays the average value of H(P) across the set of repetitions. The blue line shows the value for 8 robots, the red line for 32 robots, and the yellow line for 64 robots. Plots are grouped according to the size of the area to cover and the sensing range. Following the same color scheme, Figs. 4 and 5 display the average values of A(P) and E(P), respectively. Figure 3 clearly shows that increasing the number of robots improves the performance with respect to the coverage metric. This effect is intuitively attributable to a greater cumulative covered area. It is also possible to note that the rate of convergence decreases as the area size increases, increases as the sensing range grows, and decreases the more robots are employed. The oscillations present in the blue line for the cases with a large Q and a large sensing range are caused by the dynamics of the drones. Inherently, each drone possesses a motion dynamic different from $\dot{p}_i = u_i$. The velocity controller that generates
Fig. 3. Average value of H(P) across all repetitions. The plots on the same line refer to the same total area to cover, while the plots on the same column refer to cases with the same sensing range. The values for 8, 32 and 64 robots are displayed with blue, red and yellow lines respectively.
Fig. 4. Average value of A(P) across all repetitions. The plots on the same line refer to the same total area to cover, while the plots on the same column refer to cases with same sensing range. The values for 8, 32 and 64 robots are displayed with blue, red and yellow lines respectively.
Fig. 5. Average value of E(P) across all repetitions. The plots on the same line refer to the same total area to cover, while the plots on the same column refer to cases with same sensing range. The values for 8, 32 and 64 robots are displayed with blue, red and yellow lines respectively.
the motor commands drives the drone so that its motion is as close to (1) as possible. However, if the requested velocity is too different from the current velocity, the controller encounters acceleration limits that make it impossible to closely track the requested velocity profile. With a large Q and a large sensing range, the resulting cells $V_i$ can be very large and their centroids can be far away from the drone. Therefore, the input generated by the control law (3) is very large, requiring accelerations outside the limitations of the drone and causing the oscillating behavior depicted by the blue line. With a larger number of drones, this behavior is limited or totally absent, since each drone has a larger number of neighbors and therefore its resulting cell $V_i$ is smaller. This effect can be reduced by reducing the value of $k_{prop}$ in (3) or by applying a saturation to the value of $u_i$. Nevertheless, the coverage algorithm is still capable of spreading the drones across the area. Figure 4 shows how well the area is covered by the drones. When the sensing range is small, the robots often struggle to cover the totality of Q. The covered area increases with the number of drones employed. This is expected since more robots imply more cells $V_i$, and the behavior of each drone is to distance itself from its neighbors, enlarging its cell $V_i$. Figure 5 depicts the efficiency of the coverage. The plots clearly show that fewer robots are more efficient. A value of 1 for E means that each robot has reached the configuration where its cell $V_i$ is the largest possible. With few robots there are fewer neighbors to interact with, ultimately allowing for the maximum enlargement of each cell. The condition under which a team of robots does not reach maximum efficiency is that the total area Q cannot accommodate all the cells of the robots. These considerations ultimately allow us to identify some design trade-offs to consider when deploying a coverage application. A larger number of robots is not necessarily better in all scenarios, even if it increases the performance. Depending on the sensing range, which is often dictated by external factors, a larger number of robots can imply a better area coverage, but a trade-off has to be considered in order not to overly reduce efficiency.
3.2 Results of Field Tests
We continue our analysis of the results with the real field experiments involving a group of seven drones. Results are presented in Fig. 6, where all the metrics detailed in Sect. 2.4 are used. The first row of graphs describes the metric H(P) for the different sensing range values taken into consideration. The second and third rows describe A(P) and E(P), respectively.
Fig. 6. Average value of Eqs. (5), (6), (7) across all repetitions. The plots on the same line display the same metric, while the plots on the same column refer to cases with same sensing range.
In parallel with the simulated case, it can be seen how the metric H(P), first row, blue line, increases as the value of the range increases. Unlike Fig. 5, it can be seen that the convergence times of the optimization function in Eq. (5) are longer. This is due to the cap on the maximum velocity of 2 m/s, introduced for safety reasons and to avoid large overshoots with the real platform. Results for values of r equal to 10, 50 and 100 show different convergence times. For r = 50 and r = 100, we note how the velocity cap is particularly detrimental, since the vehicles are unable to reach the speed output by the proportional control law and the optimization function requires more time to reach the steady state. Figure 7 shows the initial and final conditions of the whole control process. Coloured dots show the position of each drone in the group at the start (leftmost figure) and end (rightmost figure) of the experiment. The circles surrounding the dots represent the sensing range of the robots. In particular, by visual analysis of Fig. 7b, it can be noted that the final group configuration is not a compact, close formation. Indeed, while in ideal conditions the final configuration should have adjacent but non-overlapping sensing ranges (circles), this is not the case here. By analysis of the data, it has been noticed that, during the experiment, the GPS covariance terms $\sigma^2_{x,x}$ and $\sigma^2_{y,y}$ ranged in the interval [1.0, 1.4] m, while the values of the EPH, the standard deviation of the horizontal
position error, and of the EPV, the standard deviation of the vertical position error, oscillated between 0.6 and 0.95 m. These values, and the position uncertainty that derives from them, justify the distance between the sensing ranges.
Fig. 7. In image (a), the initial configuration of the vehicles during a real experiment with sensing range r = 50 m; in image (b), the final configuration of the drones in the same experiment. In image (c), an aerial photo of the convergence process is shown, with a different orientation of the axes.
4 Conclusions
In this paper, we presented a distributed coverage control algorithm capable of coordinating a group of autonomous vehicles with limited sensing capabilities. Through an extensive set of simulations and real field experiments, the strategy has been shown to be robust and efficient. It is indeed capable of achieving close-to-optimal performance both in simulation and in reality and, as shown, it is also capable of orchestrating a significant number of drones, marking the approach as scalable. On the other hand, real experiments have shown that the presence of non-ideal conditions, such as noisy GPS readings, has an
effect on the final configuration but does not, overall, compromise the behavior of the coverage control function. Future work includes a more extensive study of the real experiments and the investigation of an alternative control law. Indeed, in this work, the final output velocity is generated by the proportional law in Eq. (3). In some situations, as previously discussed, the generated velocity is too high, e.g., when the point to reach is far from the drone's actual position, and does not respect the physical constraints of the platform. This motivates a future investigation into control strategies that take into account the dynamics of the drone, through a more elaborate kinematic model.
References
1. Baca, T., et al.: The MRS UAV system: pushing the frontiers of reproducible research, real-world deployment, and education with autonomous unmanned aerial vehicles. J. Intell. Robot. Syst. 102(1), 1–28 (2021)
2. Bullo, F., Cortés, J., Martínez, S.: Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms. Princeton University Press, Princeton (2009)
3. Cao, M., Hadjicostis, C.: Distributed algorithms for Voronoi diagrams and application in ad-hoc networks. Technical report UILU-ENG-03-2222, DC-210, UIUC Coordinated Science Laboratory (2003)
4. Cortés, J., Egerstedt, M.: Coordinated control of multi-robot systems: a survey. SICE J. Control Meas. Syst. Integr. 10(6), 495–503 (2017)
5. Cortes, J., Martinez, S., Bullo, F.: Spatially-distributed coverage optimization and control with limited-range interactions. ESAIM Control Optim. Calculus Variations 11(4), 691–719 (2005)
6. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004)
7. Du, Q., Emelianenko, M., Ju, L.: Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations. SIAM J. Numer. Anal. 44(1), 102–119 (2006)
8. Faessler, M., Franchi, A., Scaramuzza, D.: Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories. IEEE Robot. Autom. Lett. 3(2), 620–626 (2018). https://doi.org/10.1109/LRA.2017.2776353
9. Guruprasad, K., Dasgupta, P.: Distributed Voronoi partitioning for multi-robot systems with limited range sensors. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3546–3552. IEEE (2012)
10. He, C., Feng, Z., Ren, Z.: Distributed algorithm for Voronoi partition of wireless sensor networks with a limited sensing range. Sensors 18(2), 446 (2018)
11. Laventall, K., Cortés, J.: Coverage control by multi-robot networks with limited-range anisotropic sensory. Int. J. Control 82(6), 1113–1121 (2009)
12. Lee, T., Leok, M., McClamroch, N.H.: Geometric tracking control of a quadrotor UAV on SE(3). In: 49th IEEE Conference on Decision and Control (CDC), pp. 5420–5425. IEEE (2010)
13. Parker, L.E.: Multi-robot team design for real-world applications. In: Asama, H., Fukuda, T., Arai, T., Endo, I. (eds.) Distributed Autonomous Robotic Systems 2, pp. 91–102. Springer, Tokyo (1996). https://doi.org/10.1007/978-4-431-66942-5_10
14. Pratissoli, F., Capelli, B., Sabattini, L.: On coverage control for limited range multi-robot systems. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2022)
15. Schwager, M., McLurkin, J., Rus, D.: Distributed coverage control with sensory feedback for networked robots. In: Robotics: Science and Systems, pp. 49–56 (2006)
16. Song, Y., Naji, S., Kaufmann, E., Loquercio, A., Scaramuzza, D.: Flightmare: a flexible quadrotor simulator. In: Conference on Robot Learning (2020)
17. Teruel, E., Aragues, R., López-Nicolás, G.: A practical method to cover evenly a dynamic region with a swarm. IEEE Robot. Autom. Lett. 6(2), 1359–1366 (2021)
MARLAS: Multi Agent Reinforcement Learning for Cooperated Adaptive Sampling

Lishuo Pan1,2(B), Sandeep Manjanna1,3(B), and M. Ani Hsieh1(B)

1 GRASP Laboratory at the University of Pennsylvania, Philadelphia, USA
[email protected], [email protected]
2 Brown University, Providence, USA
lishuo [email protected]
3 Plaksha University, Mohali, India
Abstract. The multi-robot adaptive sampling problem aims at finding trajectories for a team of robots to efficiently sample the phenomenon of interest within a given endurance budget of the robots. In this paper, we propose a robust and scalable approach using Multi-Agent Reinforcement Learning for cooperated Adaptive Sampling (MARLAS) of quasi-static environmental processes. Given a prior on the field being sampled, the proposed method learns decentralized policies for a team of robots to sample high-utility regions within a fixed budget. The multi-robot adaptive sampling problem requires the robots to coordinate with each other to avoid overlapping sampling trajectories. Therefore, we encode the estimates of neighbor positions and intermittent communication between robots into the learning process. We evaluated MARLAS over multiple performance metrics and found it to outperform other baseline multi-robot sampling techniques. Additionally, we demonstrate scalability with both the size of the robot team and the size of the region being sampled. We further demonstrate robustness to communication failures and robot failures. The experimental evaluations are conducted both in simulations on real data and in real robot experiments on a demo environmental setup (the demo video can be accessed at: https://youtu.be/qRRpNC60KL4).
Keywords: multi-robot systems · adaptive sampling · reinforcement learning

1 Introduction
In this paper, we propose a decentralized multi-robot planning strategy (Fig. 1(a) presents the framework of our method) to cooperatively sample data from two-dimensional spatial fields so that the underlying physical phenomenon can be
L. Pan and S. Manjanna—Equal contribution.
modeled accurately. We consider sampling from quasi-static spatial processes, that is, we can reasonably assume the field to be temporally static for the sampling duration. Examples of such spatial fields include algal blooms, coral reefs, the distribution of flora and fauna, and aerosol concentration. Building high-resolution representations of environmental physical phenomena can help better understand the effects of global warming and climate change on our environment. Observing and modeling the spatial evolution of such fields using multiple data sampling robots plays a key role in applications such as environmental monitoring, search and rescue, anomaly detection, and geological surveys. We present a planning algorithm for multiple robotic platforms to sample these spatial fields efficiently and adaptively. In this context, sampling refers to collecting sensor measurements of a phenomenon. Adaptive sampling of spatial fields with hotspots has gained a lot of momentum in the field of robotic exploration and mapping [1–4]. Adaptive sampling refers to strategic online path planning for robots based on the observations made until the current time step. Exhaustively sampling each point of an unknown survey region can be tedious, inefficient, and impractical if the survey space is large and/or the phenomenon of interest has only a few regions (hotspots) with important information [1,5]. It has been observed that for low-pass multiband signals, uniform sampling can be inefficient, and sampling rates far below the Nyquist rate can still preserve information [6]. This is the key guiding principle behind active and non-uniform sampling [7]. Although the approaches solving coverage control problems [8,9] seem suitable for solving sampling problems, they differ in one important aspect. The coverage control or area division problem [10,11] focuses on finding the most preferable positions for the robots such that a non-occupied space is covered with their sensors, or such that the corresponding assigned areas are physically visited by the respective robot, as depicted in Fig. 1(b). On the contrary, in this paper we are looking at designing robot paths that visit high-utility regions within a fixed budget, as illustrated in an example run of our method in Fig. 1(c). Recently, multi-agent reinforcement learning (MARL) methods have proven to be effective for searching cooperative policies in complex and large-scale multi-robot tasks, and they have been explored for information gathering and coverage problems. Persistent monitoring and coverage with MARL involve defining equal utility for every state [12,13]. In the sampling problem, utilities are distributed unevenly, thus requiring nonuniform exploration. Some of the multi-robot information gathering techniques are limited to offline planning [14] or are incapable of handling robot failures, as they function by dividing the region of interest among the robots in the team [15]. A scalable multi-agent reinforcement learning policy is proposed in [16] to address the large-scale active information gathering problem; however, it uses a central node for target localization, which is fragile to single-node failure. The key contribution of this work is a fully decentralized multi-robot control strategy, MARLAS, to learn multi-robot cooperation and achieve efficient online sampling of an environmental field. From comprehensive experimental results,
we show that our method is scalable with both the size of the robot team and the size of the region being sampled, and achieves equitable task distribution among robots. We further demonstrate that our approach is robust to both communication failures and individual robot failures.
2 Problem Formulation
Consider the multi-robot adaptive sampling problem where N homogeneous robots aim to maximize accumulated utilities over a region of interest in a given time horizon H. The sampling region $E \subset \mathbb{R}^2$ is a bounded continuous 2D space. In this work, E is discretized into uniform grid cells forming a workspace with length h and width w. Thus, the position $x^i$ of robot $r^i$ is an element of $\mathbb{Z}^2$. Each grid cell is assigned a prior value $F(x_1, x_2)$, which represents the amount of data that robots can collect at that location. In this work, we assume that once a grid cell is visited by any robot, all the data in that cell is collected. We further assume that a low-quality estimate of the reward map $F \in \mathbb{R}^{h \times w}$ of the region E can be obtained through pilot surveys, satellite data, or historical knowledge of the field in the region of interest.
Fig. 1. (a) Overview of MARLAS algorithm, (b) coverage paths generated by one of the area division approaches (DARP [15]), and (c) an example sampling run of the proposed approach (MARLAS).
The objective is to obtain a set of decentralized policies π for all robots in the multi-robot team that maximizes the averaged expected return of the N robots, $J_{tot}(\pi) = \frac{1}{N}\sum_{i=1}^{N} J^i(\pi^i)$. Here, $J^i$ is the expected return of robot $r^i$ along its trajectory $\tau^i$. In the homogeneous team setting, we can assume that all robots have the same control strategy; therefore, we drop the superscripts and write the policy as $\pi = \pi^i$ for each robot $r^i$. We use a neural network with parameters θ to approximate the underlying optimal control policy as a conditional probability $\pi_\theta = P(a^i \mid s^i; \theta)$, where $a^i$, $s^i$ are the action and state of robot $r^i$, respectively. Each robot locally maintains a belief $\hat{x}^i_j \in \mathbb{R}^{h \times w}$ of the positions of the other robots and a belief $\hat{F}^i \in \mathbb{R}^{h \times w}$ of the reward map throughout the mission. Our learning
task is to solve the optimization problem of finding a set of parameters θ∗ for the neural network.

Multi-robot Learning Formulation. In our decentralized multi-agent system setting, individual agents do not have direct access to ground truth information and need to trade off exploiting the current information against communicating with neighbors to solve the task in a cooperative manner. We frame the task as a decentralized partially observable Markov decision process (Dec-POMDP) represented by the tuple $(I, S, \{A^i\}, \{Z^i\}, T, R, \gamma)$. Here, I is the set of robots, S is a set of states, and $A^i$ is the set of discrete actions. $Z^i$ is a set of possible observations by robot $r^i$. T models the transition function. R and γ are the reward function and the discount factor. Optimally solving the Dec-POMDP using dynamic programming is provably intractable as the time horizon, the number of robots, and the size of the region increase [17]. In this work, we train a neural network in a multi-agent reinforcement learning (MARL) setting to approximate the underlying optimal decentralized policy.
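For readers who prefer code, the tuple can be mirrored as a simple container of the quantities each robot reasons over. This is only an illustrative sketch with hypothetical field names, not the authors' implementation.

    from dataclasses import dataclass
    from typing import Callable, List, Sequence, Tuple

    @dataclass
    class DecPOMDP:
        """Illustrative container for the tuple (I, S, {A^i}, {Z^i}, T, R, gamma)."""
        robots: Sequence[str]           # I, the set of robots
        actions: List[Tuple[int, int]]  # A^i, shared by the homogeneous robots
        transition: Callable            # T(s, a) -> s', deterministic in this work
        reward: Callable                # R(s, a) -> float, given later in Eq. (4)
        gamma: float                    # discount factor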
3 Approach
An overview of our proposed MARLAS framework is presented in Fig. 1(a). Let $\{r^1, r^2, \dots, r^N\}$ be the set of robots, and let $x^i_{(t-l+1):t}$, $\{x^j_{(t-l+1):t}\}$, and $F_{(t-l+1):t}$ represent the state of each robot, where $x^i_{(t-l+1):t}$ are the robot's historical positions up to l steps in the past, $\{x^j_{(t-l+1):t}, \forall j \neq i\}$ is the set of positions of the neighboring robots with a history of up to l steps in the past, and $F_{(t-l+1):t}$ are the reward maps up to l steps in the past. Thus, in our formulation each robot stores a trajectory history of length l and uses intermittent communication with other robots to update the neighbor positions and reward maps. An individual robot has no direct access to the neighbor positions when they are not in communication. Thus, the ground truth of the reward maps quickly becomes unavailable to individual robots. As a substitute, each robot $r^i$ estimates and maintains belief distributions over the neighbor positions and the reward maps. Hence, the new state of robot $r^i$ at time t is defined as $s^i_t := \left( x^i_{(t-l+1):t}, \{\hat{x}^i_{j,(t-l+1):t}\}, \hat{F}^i_{(t-l+1):t} \right)$. We denote by $\hat{x}^i_{j,(t-l+1):t}$ the belief distributions over the neighbor positions and by $\hat{F}^i_{(t-l+1):t}$ the belief distributions over the reward maps. A discrete action set is defined as $A = \{(i, j) \mid i, j \in \{-1, 0, 1\}, (i, j) \neq (0, 0)\}$; that is, our action space consists of actions leading to the 8-connected neighboring grid cells of the current position. At each time step t, each robot $r^i$ takes an action $a^i_t \in A$. We assume that robots can uniquely identify other robots, and thus $z^i_{j,t}$ represents the observation of robot $r^j$'s position by robot $r^i$ at time t. More concretely, we define $z^i_{j,t}$ as the occupancy $\{0, 1\}^{h \times w}$ of grid cells within the sensing radius of robot $r^i$. We assume a deterministic transition function T, but our method can easily be extended to handle non-deterministic transitions.
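A small sketch of the action set and of how a state could be assembled from the length-l histories is given below; the dictionary layout is a hypothetical illustration of the definition above, not the exact data structure used by the authors.

    import numpy as np

    # Discrete action set A: moves to the 8-connected neighboring grid cells.
    ACTIONS = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0)]

    def make_state(own_traj, neighbor_beliefs, reward_beliefs, l):
        """Assemble s_t^i from the length-l histories (illustrative layout).

        own_traj         : list of (x1, x2) positions of robot i, most recent last
        neighbor_beliefs : dict mapping neighbor index j to its list of (h, w)
                           position-belief maps
        reward_beliefs   : list of (h, w) reward-map belief maps
        """
        return {
            "x_hist": np.array(own_traj[-l:]),                     # x^i_{(t-l+1):t}
            "neighbor_beliefs": {j: b[-l:] for j, b in neighbor_beliefs.items()},
            "reward_beliefs": reward_beliefs[-l:],                 # F-hat^i_{(t-l+1):t}
        }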
Distributed Belief State Estimate. We assume that a robot's estimates of its neighbor positions are limited by its sensing radius and that these estimates are noisy due to hardware limitations and environmental uncertainties. Direct communication between robots is limited by the communication radius. Since knowledge of the global robot states is necessary to achieve multi-robot coordination, we assume each robot $r^i$ maintains local beliefs $\hat{x}^i_j$ of all teammates' positions, where j is the index of its teammates. Without loss of generality, we choose the Bayes filter to estimate neighbor positions; it can be replaced with other filters, such as Kalman or learning-based filters. For simplicity, we assume that the sensing radius is equal to the communication radius, both denoted as $d_{cr}$, but this is not a strict requirement. When $z^i_j(x_1, x_2) = 1$, a neighboring robot is detected at location $(x_1, x_2)$; otherwise $z^i_j(x_1, x_2) = 0$. Due to sensing noise, we consider false positives and false negatives in the discrete setting. Grid cells outside of the robot's sensing radius have $z^i_j = 0$. In this work, we assume that the robot has no knowledge of its neighbors' control policies. In other words, in the action update of a neighbor's position estimate, an unobserved neighbor has equal transition probabilities to its 8-connected neighboring grid cells. Thus, the distribution $\hat{x}^i_j$ is dictated by a 2D random walk in the action update. The action and sensor updates are given by

action update: $\quad p(\hat{x}^i_{j,t+1} \mid \hat{x}^i_{j,t}, a^j_t) = p(\hat{x}^i_{j,t+1} \mid \hat{x}^i_{j,t}), \quad (1)$

sensor update: $\quad p(\hat{x}^i_{j,t+1} \mid z^i_{j,t}) = \dfrac{p(z^i_{j,t} \mid \hat{x}^i_{j,t}) \cdot p(\hat{x}^i_{j,t})}{\sum_{\hat{x}^i_{j,t}} p(z^i_{j,t} \mid \hat{x}^i_{j,t}) \cdot p(\hat{x}^i_{j,t})} \quad (2)$
352
L. Pan et al.
work. Policy parameterization is given by πθ (ait |sit ) = Softmax( f (φ(sit ); θ))
(3)
where φ(sit ) represents the spatial feature aggregation vector. At each time step, robots sample the actions from stochastic policy and move to their next positions following transition function T . To avoid collisions with other robots in their neighborhood, robots employ a potential field based obstacle avoidance strategy similar to [18]. It is assumed that collision avoidance will only be activated within d0 distance from the closest neighboring robot. We further quantized repulsive force synthesised from collision avoidance strategy, by maximizing its alignment with the action from the discrete set A. For robots whose collision avoidance is active, their actions generated from learned policy are replaced with a repulsive force. Additionally, robots generate random actions when the collision occurs. Spatial Feature Aggregation. In our previous work [3,19], we proposed and evaluated various feature aggregation methods for adaptive sampling. Multiresolution feature aggregation [3] has been empirically proven to outperform other methods, as it reduces the state dimension and utilizes the geometry induced bias for the sampling task. In multiresolution feature aggregation, we divided the current belief distribution of reward map Fˆti into regions with different resolutions according to their distance to each robot. Feature regions closer to a robot have higher resolution, and resolution decreases farther away from the robot. Since rewards at locations visited by other robots are removed, each robot must estimate its neighbors’ positions at any given time. We utilize each robot’s ˆij and the belief estimate for prebelief estimates for other robot locations, x i vious reward map Fˆt−l when updating the belief of current reward map Fˆti in i t i (x , x ) J ˆ j,k (x1, x2 ) , for all (x1, x2 ). the form Fˆti (x1, x2 ) = Fˆt−l 1 2 h,w − k=t−l+1 j x +
ˆij,k is the discrete Here, Jh,w is a all-ones matrix of size of the reward map, the x belief distribution of the neighbor positions with its value between 0 and 1 in each grid cell, the operation [·]+ converts the negative value to 0. Notice that the ˆij,t−l+1:t , thus we recompute direct communication between robots updates the x the summation over l time steps of the belief distributions of the neighbor posii follow the Markov tions. Here, the belief distributions of reward maps Fˆ(t−l+ 1):t property.
Multi-robot Learning in Adaptive Sampling. As mentioned in Sect. 2, solving a Dec-POMDP optimally is provably intractable for our task. Therefore, we use the belief model to maintain belief distributions over the ground truth and employ a neural network to approximate the optimal decentralized policy π∗, using policy gradient methods to search for the optimal policy in the neural network parameter space. To encourage cooperation, we adopt the centralized training with decentralized execution (CTDE) technique commonly used for multi-agent policy training. More specifically, we compute a centralized reward as the average of all the robots' individual rewards, using the global reward function R(·) and the state s and action a information. Thus, the centralized expected return
is given by $J_{tot}(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \frac{1}{N} \sum_{t=1}^{H} \gamma^t \sum_{i=1}^{N} R(s^i_t, a^i_t) \mid s_0 \right]$. The control policy $\pi_\theta$ is decentralized in both training and execution. To optimize θ directly, we use the policy gradient, which gives the gradient of the expected return in the form $\nabla J(\pi_\theta) \propto \mathbb{E}_{\tau \sim \pi_\theta}\left[ G_t \cdot \nabla \log \pi_\theta(a_t \mid s_t) \right]$, where $G_t = \sum_{k=t+1}^{T} \gamma^{k-t-1} R(s^i_k, a^i_k)$. The action $a_t$ and state $s_t$ are drawn from τ. To estimate $G_t$, we adapt the Monte Carlo based REINFORCE method by generating trajectories τ following $\pi_\theta$. For each training episode of MARLAS, we initialize robots randomly in E. We assume each robot knows the initial positions of all the other robots. Each robot is given the control policy with the most recently updated parameters θ. At time step t, each robot $r^i$ randomly samples an action $a^i_t$ from $\pi_\theta(a^i_t \mid s^i_t)$. Robot $r^i$ receives a reward $F(x^i)$ by sampling data at location $x^i$. Once the grid cell $x^i$ is visited, $F(x^i)$ is set to a negative value, indicating that the data at this location has been consumed. If multiple robots enter the same grid cell simultaneously, the reward is evenly distributed among the robots in the cell, and each robot in the same cell incurs a collision penalty. The reward function is given by

$$R(s^i, a^i) = \frac{F(x^i)}{c} + \beta_{col} \cdot \mathbb{I}(x^i = x^j), \quad \forall j \in I, \quad (4)$$
where c is the total number of robots in the grid cell of $x^i$, $\mathbb{I}$ is the indicator function, which indicates a collision between robots, and $\beta_{col} < 0$ is the collision penalty. Each robot generates τ by executing the control policy up to the time horizon H. Furthermore, we generate M trajectories for each robot as one training batch. We use REINFORCE with baseline to update the parameters θ as follows:

$$\theta_{k+1} = \theta_k + \alpha \cdot \frac{1}{N \cdot M} \sum_{i=1}^{N} \sum_{m=1}^{M} \sum_{t=0}^{H-1} \left( G^{i,m}_t - b^i_t \right) \nabla \log \pi_\theta(a^{i,m}_t \mid s^{i,m}_t), \quad (5)$$
with the baseline defined as $b^i_t = \frac{1}{M} \sum_{m=1}^{M} G^{i,m}_t$. The superscripts i and m are the indices of the robots and of the trajectories, respectively. Note that we draw actions according to the policy distribution during training to enhance exploration. During deployment, we draw the action with the maximum probability.
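As a minimal illustration of the update in Eq. (5), assuming the per-step gradients and Monte Carlo returns have already been collected into arrays, a plain NumPy sketch could look as follows; the names and shapes are hypothetical.

    import numpy as np

    def reinforce_update(theta, grad_log_pi, returns, alpha):
        """One REINFORCE-with-baseline step mirroring Eq. (5) (sketch).

        theta       : (D,) current policy parameters
        grad_log_pi : (N, M, H, D) per-step gradients of log pi_theta(a | s)
        returns     : (N, M, H) Monte Carlo returns G_t^{i,m}
        alpha       : learning rate
        """
        N, M, H = returns.shape
        baseline = returns.mean(axis=1, keepdims=True)  # b_t^i, mean over M rollouts
        advantage = returns - baseline                  # G_t^{i,m} - b_t^i
        grad = (advantage[..., None] * grad_log_pi).sum(axis=(0, 1, 2)) / (N * M)
        return theta + alpha * grad

Subtracting the per-robot, per-timestep baseline leaves the gradient estimate unbiased while reducing its variance, which is why the Monte Carlo returns can be used directly without a learned critic.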
4 Experiments and Results
We train the multi-robot team on a region E synthesized by a mixture of Gaussians. In the training phase, we use a robot team size of N = 5, a sensing and communication radius of $d_{cr}$ = 10 grid cells, a collision avoidance range of $d_0$ = 2 grid cells, and a collision penalty coefficient $\beta_{col}$ = −2. Other hyperparameters used for training include a discount factor γ = 0.9 and a training time horizon H = 200, and we set the number of trajectories to M = 40. We use the ADAM stochastic optimizer [20] in the training phase, and the control policy is trained on a single map of size 30 × 30 by changing the starting locations of the robots every epoch. We use only one trained policy to generate all the results in this paper. We found that the
Fig. 2. Test datasets we used in evaluation: (a) real bathymetry data from Barbados, (b) a synthetic c-shape distribution, (c) a reaction diffusion simulation [21], and (d) a satellite image of a coral reef.
learned policies can adapt to diverse fields during the test phase, as presented in our results. The results presented in this section show that the learned control policies outperform existing baseline methods. We further evaluate the scalability of MARLAS for different robot team and workspace sizes, and its robustness to communication failures and robot failures.

Testing Datasets: For our testing phase, we consider a set of diverse environments generated both from real-world data and from synthetic simulations, as illustrated in Fig. 2.

Performance Metrics: Many applications in the domain of robotic sampling need to sample information-rich locations at the earliest opportunity, due to the limited endurance of the robots carrying out the survey, and the environmental processes being sampled are quasi-static. Hence, we measure the discounted accumulated reward as one of the performance metrics. The standard deviation of the discounted accumulated reward provides a measure of task sharing between the robots: a lower value implies that the robots contribute more equally to the overall task. The average pair-wise overlap measures the total pair-wise overlap between the robot trajectories, averaged by the number of all possible neighbors; it reflects the performance of a multi-robot sampling approach in terms of coordination and task allocation. The average overlap percentage measures the percentage of averaged pair-wise overlap in the robot trajectories, computed as the ratio between the average pair-wise overlap and the number of steps the robots have traveled. Coverage measures the completeness of the active sampling mission, calculated as the ratio between the accumulated reward and the maximum total reward obtainable in the given time horizon H. The average communication volume is a measure of the amount of communication between robots, given by $\frac{1}{N} \sum_i \sum_t |N^i_t|$, where $N^i_t$ is the set of neighboring robots of robot $r^i$ at time t.
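These metrics can be computed directly from logged trajectories. The sketch below illustrates the average pair-wise overlap and the average communication volume, assuming each trajectory is a list of visited grid cells; the helper names are hypothetical.

    from itertools import combinations

    def avg_pairwise_overlap(trajs):
        """Total pair-wise overlap of visited cells, averaged over possible neighbors.

        trajs: one list of visited grid cells (as (x1, x2) tuples) per robot.
        """
        overlap = sum(len(set(a) & set(b)) for a, b in combinations(trajs, 2))
        return overlap / max(len(trajs) - 1, 1)

    def avg_communication_volume(neighbor_counts):
        """(1/N) * sum_i sum_t |N_t^i|, from per-robot lists of per-step neighbor counts."""
        return sum(sum(counts) for counts in neighbor_counts) / len(neighbor_counts)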
Fig. 3. Comparison with baseline methods: Column (a) illustrates trajectories of the deployed robot team. Circles are the deployment location, and triangles are the stop location. Column (b) presents comparisons of discounted accumulated reward. Column (c) presents comparisons of average pair-wise overlap. The 95% confidence intervals are over 40 trials.
In Fig. 3, we compare MARLAS with multiple baseline sampling methods, including an independently trained multi-robot sampler [19], the DARP (Divide Areas based on the Robots' initial Positions) algorithm [15], and a maxima search algorithm [22]. We use the discounted accumulated reward and average pair-wise overlap metrics for these comparisons. Column (a) of Fig. 3 presents the trajectories followed by two robots sampling the underlying distribution using MARLAS. The communication radius is fixed at $d_{cr} = 20\%D_{max}$, where $D_{max}$ is the maximum possible distance in the given region of interest. Column (b) of Fig. 3 presents a quantitative comparison of the discounted accumulated rewards. It can be observed that MARLAS outperforms the baseline methods. Column (c) of Fig. 3 compares the pair-wise overlap between paths generated by MARLAS and the independently trained multi-robot sampling technique, which keeps track of the full history for inter-robot communication. In these experiments, the robots using MARLAS only communicate trajectory data of history length l = 50; compared to the baseline methods, where the full trajectory history is sent, MARLAS outperforms the other methods in generating
non-overlapping, utility-maximizing paths. Even though the average overlap for MARLAS with l = 2 is high, the learning embedded with neighbor position estimates helps it achieve higher discounted rewards than the baselines. MARLAS is stochastic because the robots are initialized in the same grid cell and select random actions to avoid collisions; nevertheless, its metrics have a tight confidence interval.
4.1 Analyzing Scalability of Learned Policy
Scalability is an important property for a learning-based multi-robot decentralized policy. To evaluate the scalability of MARLAS, we conducted experiments with different robot team sizes, different workspace sizes, and varying sensing and communication radii. We trained a policy with 5 robots (N = 5) and a sensing and communication radius of $d_{cr}$ = 10 grid cells, and generalized it to all the different settings.
Fig. 4. Deployment of robot teams over reef data for a duration of 150 time steps. The first row presents robot trajectories (in different colors) as the team size is increased from 2 to 15. Circles are the deployment location, and triangles are the stop location. The second row presents the sequential progress of a sampling run with 15 robots at time steps 10, 40, 80 and 150. Circles represent the robots' current positions.
Scalability with the Team Size: In this scenario, we increase the number of robots from 2 to 20 and fix the sensing and communication radius to 10 grid cells. We scale the map by a factor of $\sqrt{n}$ to maintain a constant density of robots across different team sizes. For example, a map of size 26 × 26 is used for a 2-robot experiment, while a map of size 40 × 40 is used for a 5-robot experiment.
Robots start exploring from the upper left corner of the map, as illustrated in Fig. 4, so that they have the same opportunity to accumulate rewards. Qualitative results presented in the first row of Fig. 4 show the robot trajectories as the team size is increased from 2 to 15. The plots in the second row of Fig. 4 illustrate the sequential progress of a sampling run with 15 robots at time steps 10, 40, 80 and 150. We observe that the robots quickly spread out, identify, and cover the dense reward regions. Furthermore, the robots spread to different regions and are able to perform the sampling task in a coordinated manner. The quantitative results in Fig. 5 illustrate that the standard deviation of the discounted accumulated rewards starts close to 0 and increases sub-linearly as the robot team size increases. The increase in the standard deviation values is expected, as the discounted accumulated rewards also increase with team size. We further notice that all the standard deviation values are negligible compared to the discounted accumulated rewards. This indicates that the collected rewards are distributed evenly among the individual robots. Together with the qualitative results, we conclude that each robot continues to contribute equally to the sampling task as we scale up the team size. The majority of the coverage metrics in Fig. 5(b) remain above 70% even when scaling up to 20 robots on an 80 × 80 map, which is far bigger than the training setup. The decrease in coverage is not surprising, as the robots need to travel longer distances between the dense reward regions as the map size increases.
Fig. 5. Quantitative results for scaling the team size. The dots represent each of the 40 trials.
Scalability with Sensing and Communication Radius: To evaluate scalability with the sensing and communication radius, we vary $d_{cr}$ during deployment from $0\%D_{max}$ to $100\%D_{max}$. In our simulations, we down-sampled the reef data to 30 × 30. Figures 6(a) and (b) present 5 robots deployed at the upper left corner of the map carrying out the sampling task with $d_{cr} = 0\%D_{max}$ and $100\%D_{max}$, respectively. As expected, with a larger sensing and communication radius, the robots explore cooperatively and do not sample the same region repeatedly.
Fig. 6. (a) and (b) Robot team with 5 robots carrying out the sampling task with dcr = 0%Dmax and dcr = 100%Dmax using the learned policy. Circles are the deployment location, and triangles are the stop location. (c) Quantitative results include plots of discounted accumulated reward, average pair-wise overlap, and average communication volume over different communication radii. The 95% confidence intervals are computed over 40 trials.
The plots in Fig. 6(c) illustrate that the discounted accumulated reward increases and the average pair-wise overlap decreases, with both quickly plateauing at $d_{cr} = 30\%D_{max}$. The average communication volume increases linearly and slows its rate after $d_{cr} = 60\%D_{max}$. We observed consistent behavior across all the other test datasets. We conclude that, with a moderate sensing and communication range, cooperative behavior is achieved by the team, and further increasing the sensing and communication radius yields no significant contribution to performance.
4.2 Analyzing Robustness of the Learned Policy
In this section, we show how the ability to estimate the positions of other robots makes the system robust against communication failures and robot failures.

Communication Failure: We first consider a communication failure scenario where each robot can rely only on sensing to localize neighbor positions. The robot team and workspace setup are kept the same as in the communication scalability experiment. We created four scenarios by changing the neighbor position estimation and communication capability of the learned model: in the first, the estimates of the neighbor positions and the communication are both enabled; in the second, communication fails at time step 20 and robots localize their neighbors only via filtering; in the third, both the estimates of the neighbor positions and the communication fail at time step 20. Lastly, we use the control policy with global communication as a benchmark. To better reflect the change in overlap with respect to the robots' travel distance, we use the average overlap percentage as a metric. The results in Fig. 7 (a) and (b) illustrate that, before the failure, there is no significant difference between the models' performance in any of the scenarios. Once the failure is introduced, the second scenario, with the communication failure but active position estimation, performs very close to the benchmark
scenario with global communication. This means that the estimates of the neighbor positions serve as good substitutes for the ground truth information. The scenario with both the communication failure and the neighbor position estimation failure performs poorly compared to all the other scenarios, showing that the proposed belief model compensates for complete communication failures and thus increases the robustness of MARLAS. We observed consistent results over all the test datasets.
Fig. 7. Robustness to failures: The failures happen at time step 20, denoted as the dashed line. The 95% confidence intervals are computed over 40 trials. (c) Snapshots of robot failure experiments.
Robot Failure: In Fig. 7 (c), we present qualitative experimental results showing the robustness of MARLAS to robot failures. As shown in the left image of Fig. 7 (c), when there are no failures, all three robots share and cover the hotspot regions. When two of the robots fail during a survey, the remaining robot samples from the hotspot regions and achieves good overall performance for the team, as depicted in the right image of Fig. 7 (c). These preliminary results display the fault-tolerant nature of MARLAS, and we would like to investigate this behavior further in the near future.
Fig. 8. Online adaptation: Three sequential time steps of a changing field and MARLAS generating robot paths that adaptively cover the underlying distribution. Circles represent the start locations, and triangles represent the stop locations.
4.3 Online Adaptations to the Changing Field
To demonstrate the online adaptation capability of MARLAS, we deploy the robots to sample a stochastic field. We synthesize a dynamic field as a mixture of Gaussians with a random motion added to the means of the Gaussians. Figure 8 illustrates three sequential time steps of a changing field, with MARLAS generating robot paths that adaptively cover the underlying distribution. An updated low-resolution prior is provided to the robots every 100 time steps, which can be achieved by either an aerial vehicle or a sparse network of stationary sensors. The MARLAS algorithm is able to generate updated plans for the robot team on the fly.
4.4 Robot Experiments
To demonstrate the feasibility of the proposed strategy in the physical world, we conducted experiments in an indoor water tank using centimeter-scale robotic boats, as presented in Fig. 9(a) and (b). We pre-plan the robot trajectories using the MARLAS algorithm, and each mini boat is controlled by base station commands to execute its assigned trajectory. Plots in Fig. 9(c), (d), and (e) illustrate the difference between the planned paths and the paths executed by the physical robots. These results indicate that the action space used for our policy generation needs further investigation to include the dynamics of the autonomous vehicles. In the future, we plan to design policies over motion primitives to achieve smooth trajectories for physical robots.
Fig. 9. (a) Indoor water tank experimental setup; the embedded picture shows the internals of a mini robotic boat. (b) MARLAS policies run on mini robotic boats to sample from the spatial field that is projected over the water surface.
5 Conclusion and Future Work
We proposed an online and fully decentralized multi-robot sampling algorithm that is robust to communication failures and robot failures, and that scales with both the size of the robot team and the size of the workspace. Our proposed method, MARLAS, outperforms other baseline multi-robot sampling techniques. We analyzed the key features of the MARLAS algorithm and presented qualitative and quantitative results to support our claims. In the future, we would like to investigate the application of MARLAS to sample dynamic spatiotemporal processes. We would like to study the effect of reducing the length l of the trajectory history that needs to be communicated between teammates, thus enhancing memory efficiency and system robustness. We would also like to further investigate distributed learning with map exploration using local measurements of the environment.
Acknowledgements. We gratefully acknowledge the support of NSF IIS 1812319 and ARL DCIST CRA W911NF-17-2-0181.
Proportional Control for Stochastic Regulation on Allocation of Multi-robots
Thales C. Silva(B), Victoria Edwards, and M. Ani Hsieh
Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA 19104, USA
[email protected]
Abstract. Any strategy used to distribute a robot ensemble over a set of sequential tasks is subject to inaccuracy due to robot-level uncertainties and environmental influences on the robots' behavior. We approach the problem of inaccuracy during task allocation by modeling and controlling the overall ensemble behavior. Our model represents the allocation problem as a stochastic jump process, and we regulate the mean and variance of such a process. The main contributions of this paper are establishing a structure for the transition rates of the equivalent stochastic jump process and formally showing that this approach leads to decoupled parameters that allow us to adjust the first- and second-order moments of the ensemble distribution over tasks, which gives us the flexibility to decrease the variance of the desired final distribution. This allows us to directly shape the impact of uncertainties on the group allocation over tasks. We introduce a detailed procedure to design the gains to achieve the desired mean and show how the additional parameters impact the covariance matrix, which is directly associated with the degree of task allocation precision. Our simulation and experimental results illustrate the successful control of several robot ensembles during task allocation.
Keywords: multi-robot system · control · task-allocation
1 Introduction
Modeling an ensemble of robots as an aggregate dynamical system offers flexibility in the task assignment and is an alternative to traditional bottom-up task allocation methods, which are usually computationally expensive (see [3,5] and references therein). Given a desired allocation of robots to tasks, each robot must navigate, handle dynamic constraints, and interact with the environment to achieve the desired allocation while meeting some desired team-level performance specifications. It is well understood that uncertainties resulting from the robot's interactions with the environment, execution of its assigned tasks, and noise from its own sensors and actuators might lead to several issues during task allocation (e.g., performance loss and inaccuracy) [13,14]. This is because allocations are often computed prior to execution and do not account for uncertainties
that arise during runtime. In addition, analysis of the team-level performance of swarms has shown that local (and sometimes small) deviations from the required performance at the robot level can combine and propagate, leading to substantial changes in the behavior of the whole group, which might result in loss of performance for the entire team (see Sect. 2 in [7]). Consequently, addressing the fundamental question of how to properly represent and design robot ensemble behaviors to meet the desired performance requirements can help us to improve our understanding of such complex systems.
An existing category of ensemble models uses stochastic processes to represent robot teams [4,8,11]. These models use random variables to describe the population of robots performing tasks, whereas transitions between tasks are described by jump processes with random arrival times. Using these representations, it is possible to design stable equilibria that reflect desired distributions of the robots over tasks [1,2]. In addition, it is also possible to account for and incorporate heterogeneous sensing and mobility capabilities in a robot team using these models [17,18]. One of the major advantages of these macroscopic representations is their ability to scale to larger team sizes, since their complexity is solely a function of the number of tasks. In contrast, the complexity of traditional task allocation methods grows as the number of agents and tasks increases. Nevertheless, most macroscopic approaches are accurate only for an asymptotically large number of agents, i.e., when N → ∞, where N represents the number of robots in the team. Fundamental to this assumption is the notion that no individual robot's deviations from its desired performance will greatly impact the team's performance as a whole. And yet Tarapore et al. [19] recently discussed the challenges associated with employing large teams of robots in real-world applications. Considering the impracticability of dense swarms in several applications, they propose sparse swarms, arguing that guiding research toward low-density swarms is more relevant to real-world applications.
One of the goals of this paper is to address the large-number assumption by providing a method to regulate the variance of a robot distribution in the task allocation problem. Specifically, by explicitly controlling the mean of the robot distribution and independently controlling its variance, we can employ our methodology in teams with a relatively small number of agents; that is, we decrease the impact of individual robot deviations on the team's performance. Hence, our methodology is a stride towards accurately modeling the allocation dynamics of these sparse swarms. While similar techniques exist [12], that strategy lacks the ability to simultaneously control both the first- and second-order moments. Instead, we leverage the work of Klavins [10], which considers a stochastic hybrid system to model and control the mean and variance of molecular species concentrations. We extend this result to robot swarm applications, which fundamentally alters some of the assumptions in [10]; for example, in our setting we are concerned with teams of constant size without a set of continuous variables. In robotics, Napp et al. [15] presented an integral set-point regulation technique, employed to regulate the mean of the robot population performing specific tasks.
While they highlighted the importance of considering variance control, since the mean can be a sensitive control objective, they did not propose a methodology to adjust the variance. In this paper, we present a strategy to govern the mean and variance of the distribution of robots across a set of tasks. The contributions of this work include 1) a decoupled formulation of the first- and second-order moments for an ensemble model of robot teams that allows them to be adjusted individually, and 2) systematic methods to determine the control gains that achieve the desired distributions of robots over tasks. Our experimental results indicate that our proposed macroscopic techniques allow for improved control of teams with small numbers of robots.
2 Problem Formulation
2.1 Graph Theory
We are interested in the stochastic behavior of a robot ensemble generated by robot-level uncertainties. We assume a group of N ∈ ℕ robots executing M ∈ ℕ spatially distributed tasks. The relationship among tasks is represented by a graph G(V, E), in which the elements of the node set V = {1, ..., M} represent the tasks, and the edge set E ⊂ V × V maps allowable switching between adjacent task locations. Each element of the edge set represents a directed path from i to j if (i, j) ∈ E. A graph G is called undirected if (i, j) ∈ E ⟺ (j, i) ∈ E. We define the neighborhood of the i-th task node as the set of all tasks that have the i-th task as a child node, denoted N_i = {j ∈ V : (j, i) ∈ E}. We call a neighbor of a task i an adjacent task j that belongs to N_i. In this paper we assume the following:
Assumption 1. The graph that maps the relationship between tasks, G(V, E), is connected and undirected.
Assumption 1 allows the robots to sequentially move back and forth between tasks. For some task topologies, it is possible to adjust the transition parameters such that directionality in the task graph is obtained [2].
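As a concrete illustration, the short sketch below computes the neighborhood N_i from an undirected edge set; the four-task topology is hypothetical.

# Undirected task graph stored as a set of directed edge pairs (hypothetical topology).
edges = {(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (1, 4), (4, 1)}

def neighborhood(i, edges):
    """N_i = { j in V : (j, i) in E }."""
    return {j for (j, k) in edges if k == i}

assert neighborhood(1, edges) == {2, 4}  # tasks adjacent to task 1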
2.2 Ensemble Model
To derive the macroscopic ensemble model for the N-robot team simultaneously executing M tasks, we assume each robot can execute only one task at a given time and must switch between adjacent tasks to complete the team-wide objective. The model needs to capture the global stochastic characteristics that emerge from a robot's interaction with other robots and the environment, such as variations in terrain [12,17], collision avoidance [11], and delays [1]. Thus, the state of our system is given by a vector of random variables X(t) = [X_1(t) ··· X_M(t)]′, where the prime symbol on a vector, a′, denotes its
transpose, and each element of X(t) describes the number of robots executing the respective task at time t; therefore, $\sum_{i=1}^{M} X_i(t) = N$ for any t ≥ 0.
We aim to characterize the evolution of the random variable X_i(t), for each i ∈ {1, ..., M}, and, ultimately, regulate its value. However, the stochastic nature of these variables might lead to convergence only to a vicinity of the desired value, where the size of the convergence region depends on robot-level deviations from the desired performance [9]. In principle, the control of systems subject to random variations could be approached by applying robust control techniques and defining input-to-state stability criteria. However, in our scenario, defining bounds for variations of robot-level behavior for a general number of tasks and robots can be difficult. Therefore, we approach the problem by controlling the mean value of the state variables X_i(t) and the dynamics of their second-order moments. Observe that the vicinity size of the desired objective is directly associated with the second-order moment.
The states of our system characterize pure jump processes X_i(t) : [0, ∞) → D, where D is a finite discrete set. In particular, let ψ : D × [0, ∞) → ℝ be a real-valued function; it is a known result that the dynamics of the expected value of ψ is described by the following relation (see Chapter 1 in [16]),
$$\frac{d}{dt}\mathbb{E}[\psi(X_i(t))] = \mathbb{E}[\mathcal{L}\psi(X_i(t))],$$
in which $\mathcal{L}$ is the infinitesimal generator of the stochastic process, defined as
$$\mathcal{L}\psi(X_i(t)) = \sum_{\ell} \big[\psi(\phi_\ell(X_i(t))) - \psi(X_i(t))\big]\,\lambda_\ell(X_i(t)), \quad (1)$$
where $\lambda_\ell(X_i(t))$ is a function of the number of robots at task i and in its neighborhood that describes the rate of transitions, i.e., $\lambda_\ell(X_i(t))\,dt$ represents the probability of a transition of type ℓ in X_i(t) occurring within dt time, and $\phi_\ell(X_i(t))$ maps the size of the change in X_i(t) given that a transition of type ℓ happened. In our case, we assume two possible unitary changes in dt: a transition of type ℓ = 1, in which one agent leaves X_i(t), defined by $\phi_1(X_i(t)) = X_i(t) - 1$, and of type ℓ = 2, where one agent arrives at X_i(t), defined by $\phi_2(X_i(t)) = X_i(t) + 1$. Hence, ℓ ∈ {1, 2} maps the type of transition, i.e., one robot leaving or one robot arriving at X_i(t), respectively.
In this paper, we are interested in enforcing desired first- and second-order moments of the distribution of robots across tasks. Specifically, the problem investigated can be stated as:
Problem 1. Given M tasks to be dynamically executed by N robots, in which the switching between adjacent tasks is represented by the topology of the task graph G(V, E), define a feedback controller that changes the transition rates $\lambda_\ell(X_i)$, for all i ∈ {1, ..., M}, such that the first- and second-order moments of the distribution converge to the desired values.
It is important to note that a solution to a similar problem was previously proposed in [11,12]. Even though they were motivated by the study from Klavins
[10], the proposed control strategy relies on a feedback linearization action, which leads to a closed-loop system of the form (see equation (9) in [11])
$$\frac{d}{dt}\mathbb{E}[X(t)] = (K_\alpha + K_\beta)\,\mathbb{E}[X(t)],$$
where $K_\alpha$ is a matrix that maps the transition rates of the system and $K_\beta$ is a matrix with new parameters to attain the desired variance. The main drawback of this approach is twofold: i) by changing $K_\beta$ the stationary distribution of the equivalent Markov chain also changes, which makes it necessary to readjust the gains for the whole system, and ii) addressing the regulation of second-order moments using $K_\beta$ is equivalent to changing the transition rate of the whole process during the entire evolution. In comparison, our proposed strategy overcomes these limitations by simply manipulating the graph structure and defining a new structure for $\lambda_\ell(\cdot)$, in such a way that the new feedback strategy does not directly modify the stationary distribution of the equivalent Markov chain and provides changes to the transition rates according to the current state of the system. We describe our methodology in the following section.
3 Methodology
3.1 Control Strategy
We model the arrival and departure of robots as pure stochastic jump processes. For example, for two tasks:
$$X_1(t) \underset{\lambda_\ell(\cdot)}{\longleftrightarrow} X_2(t), \quad (2)$$
the random variables X_1(t) and X_2(t) represent the number of robots in each task at time t, and $\lambda_\ell(X_i)$ for ℓ ∈ {1, 2} maps the transition rate of agents between tasks. Motivated by Klavins [10] on gene regulatory networks, we approach Problem 1 by defining the transition rate functions with a term that is proportional to the number of agents at a given site and in its neighborhood. In Eq. (2) this choice means that switching from task 1 to 2 and from 2 to 1 will depend on the number of agents at both tasks. In particular, for Eq. (1) we define the transition rates as
$$\lambda_1(X_i) = \sum_{j \in N_i} k_{ij} X_i - \beta_i X_i X_j, \quad (3)$$
$$\lambda_2(X_i) = \sum_{j \in N_i} k_{ji} X_j - \beta_i X_i X_j. \quad (4)$$
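A minimal sketch of these rate functions follows; the gain tables k and beta and the neighbor lists nbrs are hypothetical containers, not part of the method itself.

# Transition rates (3) and (4) for task i; k[i][j] and beta[i] are hypothetical gains.
def lambda_1(i, X, k, beta, nbrs):
    """Rate at which one robot leaves task i, per (3)."""
    return sum(k[i][j] * X[i] - beta[i] * X[i] * X[j] for j in nbrs[i])

def lambda_2(i, X, k, beta, nbrs):
    """Rate at which one robot arrives at task i, per (4)."""
    return sum(k[j][i] * X[j] - beta[i] * X[i] * X[j] for j in nbrs[i])

As Remark 1 below notes, these expressions must remain non-negative to be valid rates.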
Remark 1. A necessary condition for the expressions in (3) and (4) to be valid transition rates is that they be non-negative. While it is possible to guarantee positiveness during the whole evolution for some well-designed initial conditions with careful choices of gains, we instead let robots transition back to previous tasks before the team reaches steady state by mapping the equivalent non-negative transition with reversed direction. This flexibility is the justification for Assumption 1.
It is worth noticing that our control strategy is distinct from the one presented in [10]. Here, we only require the transition rate to be adjusted according to the current number of agents at the tasks and in their neighborhoods. In [10], by contrast, it is assumed that each node in the network has a source and a sink, which implies that the value of the random variable X_i can increase or decrease independently of the variation in its neighbors, in addition to the switching between adjacent nodes. In the following, we show that with our choice for the transition rates $\lambda_\ell(X_i)$, the first-order moment depends only on the parameters $k_{ij}$, while the variables $\beta_i$ manifest in the second-order moments. This introduces a free variable, allowing the variance to be properly adjusted.
Defining ψ(X_i) = X_i and applying the infinitesimal generator (1) with the transition rates in (3) and (4), we get the following first-order moment for a general arrangement,
$$\frac{d}{dt}\mathbb{E}[X_i] = \mathbb{E}\Big[\sum_{j \in N_i} \big(k_{ji} X_j - \beta_i X_i X_j\big) - \sum_{j \in N_i} \big(k_{ij} X_i - \beta_i X_j X_i\big)\Big] = \sum_{j \in N_i} \big(k_{ji}\,\mathbb{E}[X_j] - k_{ij}\,\mathbb{E}[X_i]\big).$$
For the second-order moments we define ψ(X_i) = X_i²,
$$\frac{d}{dt}\mathbb{E}[X_i^2] = \mathbb{E}\Big[\big((X_i+1)^2 - X_i^2\big)\sum_{j \in N_i}\big(k_{ji} X_j - \beta_i X_i X_j\big) + \big((X_i-1)^2 - X_i^2\big)\sum_{j \in N_i}\big(k_{ij} X_i - \beta_i X_i X_j\big)\Big]$$
$$= \mathbb{E}\Big[\sum_{j \in N_i}\big(2 k_{ji} X_i X_j - 2 k_{ij} X_i^2 + k_{ij} X_i + k_{ji} X_j - 2\beta_i X_i X_j\big)\Big],$$
and for the off-diagonal terms,
$$\frac{d}{dt}\mathbb{E}[X_i X_q] = \mathbb{E}\Big[\big((X_i+1)X_q - X_i X_q\big)\sum_{j \in N_i}\big(k_{ji} X_j - \beta_i X_i X_j\big) + \big((X_q+1)X_i - X_i X_q\big)\sum_{j \in N_q}\big(k_{jq} X_j - \beta_q X_q X_j\big)$$
$$\qquad + \big((X_i-1)X_q - X_i X_q\big)\sum_{j \in N_i}\big(k_{ij} X_i - \beta_i X_i X_j\big) + \big((X_q-1)X_i - X_i X_q\big)\sum_{j \in N_q}\big(k_{qj} X_q - \beta_q X_q X_j\big)\Big]$$
$$= \mathbb{E}\Big[\sum_{j \in N_i}\big(k_{ji} X_q X_j - k_{ij} X_i X_q\big) + \sum_{j \in N_q}\big(k_{jq} X_i X_j - k_{qj} X_q X_i\big)\Big].$$
An important property of these equations is that they are in closed form, which allows us to use them to design gains to attain the desired distribution. For a compact vectorized form, let K be an M × M matrix defined as follows,
$$K_{ij} = \begin{cases} k_{ji} & \text{if } (j, i) \in E, \\ -\sum_{q=1}^{M} k_{iq} & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases} \quad (5)$$
then the first- and second-order moments can be written as
$$\frac{d}{dt}\mathbb{E}[X] = K\,\mathbb{E}[X], \quad (6)$$
$$\frac{d}{dt}\mathbb{E}[X X'] = K\,\mathbb{E}[X X'] + \mathbb{E}[X X']\,K' + \sum_{i=1}^{M} e_i e_i' \otimes \sum_{j \in N_i} \big(k_{ij}\,\mathbb{E}[X_i] + k_{ji}\,\mathbb{E}[X_j] - 2\beta_i\,\mathbb{E}[X_i X_j]\big), \quad (7)$$
where e_i, for i = 1, ..., M, are the canonical basis vectors of ℝ^M and ⊗ denotes the Kronecker product. Equations (6) and (7) model the dynamics of the first- and second-order moments of a group of N robots executing M tasks. There are two important underlying assumptions in this representation. First, we assume the timing for robots to leave and arrive at a task follows a Poisson process. We believe this assumption is not too restrictive since, as discussed above, robot-level deviations from desired performance will impact the scheduling, leading to stochastic jump processes. The second assumption, and arguably the hardest to justify for general systems, is that we can actuate on the system's transition rates. To do this in our case, note the structure of $\lambda_\ell(\cdot)$ in (3) and (4): we need to monitor the current number of agents at each task and its neighborhood, and then communicate this value to each agent at these sites to compute the switching time. Dynamically defining the transition rates has an intricate relationship with the allocated time and microscopic deviations. We expect that the feedback nature of our approach accounts for microscopic deviations from the desired performance in the long run.
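For intuition, the sketch below assembles K per (5) and integrates the mean dynamics (6) in closed form; the gains are the values reported in Example 1 below (re-indexed from 0), so the snippet is purely illustrative, not part of the design procedure.

import numpy as np
from scipy.linalg import expm

def build_K(M, gains):
    """gains[(i, j)] = k_ij, the rate gain for switching from task i to task j; K follows (5)."""
    K = np.zeros((M, M))
    for (i, j), kij in gains.items():
        K[j, i] += kij   # off-diagonal entry K_ji = k_ij
        K[i, i] -= kij   # diagonal entry accumulates -sum_q k_iq
    return K

gains = {(0, 1): 2.1, (0, 3): 1.4, (1, 0): 1.5, (1, 2): 1.3,
         (2, 1): 0.9, (2, 3): 1.2, (3, 0): 0.1, (3, 2): 0.6}
K = build_K(4, gains)
X0 = np.array([5.0, 15.0, 5.0, 5.0])
EX = expm(K * 2.0) @ X0   # E[X](t) = e^{Kt} X(0), the solution of (6) at t = 2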
3.2 Control Analysis
In this section we analyze the convergence of the mean to the desired distribution and show a simple method to obtain the parameters of the gain matrix K.
Theorem 1. Let N agents execute M tasks organized according to an undirected graph G(V, E). Given a desired stationary distribution E[X^d] such that $\sum_{i=1}^{M} X_i^d = N$, the robot ensemble converges asymptotically to E[X^d] from any initial distribution, with transition rates defined in (3) and (4), if, and only if, K E[X^d] = 0.
Proof. The demonstration follows from standard Lyapunov analysis. Note that, by definition, the eigenvalues of the matrix K have negative real parts, except for one which is zero; that is, $0 = \sigma_1(K) > \sigma_2(K) \geq \cdots \geq \sigma_M(K)$, where $\sigma_i(K)$ denotes the i-th eigenvalue of K. Hence, define a simple Lyapunov function candidate $V = \mathbb{E}[X]'\,P\,\mathbb{E}[X]$, where P is a symmetric positive definite matrix. The time derivative of V along the trajectories of the first-order moments yields
$$\frac{dV}{dt} = \mathbb{E}[X]'\big(K'P + PK\big)\,\mathbb{E}[X];$$
because every eigenvalue of K has negative or zero real part, we have that $\frac{dV}{dt} \leq 0$. Consider the invariant set $S = \{\mathbb{E}[X] \in \mathbb{R}^M : \frac{dV}{dt} = 0\}$. Since the robots will always be at one of the graph's nodes, i.e., $\sum_{i=1}^{M} X_i = N$, we have that $\mathbb{E}[X(t)] \neq 0$ for all t > 0. Therefore, no solution can stay in S other than E[X^d]; in other words, the equilibrium point E[X^d] is globally asymptotically stable.
Notice that we assumed that each robot is always assigned to one task at any instant; to consider robots in transit between nodes, it is possible to extend the task graph as in [1]. In addition, by construction, K results in a stable ensemble. The compelling aspect of Theorem 1 with respect to our model for the first- and second-order moments, Eqs. (6) and (7), is that we can use the condition provided in the theorem to design the gains to attain a desired distribution, while still having free variables $\beta_i$ to adjust the second-order terms. We compute the gain matrix K by solving the following optimization:
$$K = \underset{Q}{\arg\min}\; \|Q\,\mathbb{E}[X^d]\| \quad \text{s.t.}\;\; \mathbf{1}'Q = 0;\;\; Q \in \mathcal{K};\;\; \text{structure in (5)}, \quad (8)$$
where the set $\mathcal{K}$ accounts for the set of possible rate matrices for a given set of tasks and $\mathbb{E}[X^d] \in \text{Null}(Q)$ is the desired final distribution.
At the moment, although we have shown additional tuning variables in our approach, we do not have a methodology to compute appropriate gains $\beta_i$ for a desired covariance matrix. Nonetheless, we give an intuition of the mechanism by which they impact the steady-state covariance matrix. Let $C = \mathbb{E}[(X - \mathbb{E}[X])(X - \mathbb{E}[X])']$ be the covariance matrix. Replacing $\mathbb{E}[XX']$ in (7) and recalling that $K\,\mathbb{E}[X] = 0$ at steady state, analyzing the equilibrium point $\frac{d}{dt}\mathbb{E}[XX'] = 0$ gives us
$$K C + C K' = -\sum_{i=1}^{M} e_i e_i' \otimes \sum_{j \in N_i} \big(k_{ij}\,\mathbb{E}[X_i^d] + k_{ji}\,\mathbb{E}[X_j^d] - 2\beta_i\,\mathbb{E}[X_i^d X_j^d]\big).$$
Since every eigenvalue of K has negative real part except for one which is equal to zero, and E[X^d] ≠ 0, we have a unique solution for C [9,10]. Therefore, the additional tuning variables $\beta_i$, for i ∈ V, are inversely proportional to the covariance at steady state: the bigger the values of $\beta_i$, while (3) and (4) remain positive during steady state (recalling Remark 1), the smaller the covariance.
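The stationarity condition of Theorem 1 is easy to check numerically. The toy two-task example below uses hypothetical gains chosen to satisfy detailed balance, so it is only an illustration of the condition, not a design recipe.

import numpy as np

# Two tasks, desired distribution X^d = [2, 1]'; pick k_12 = 1, k_21 = 2 so that
# k_12 * X_1^d = k_21 * X_2^d, which makes K E[X^d] = 0 under the structure (5).
K = np.array([[-1.0, 2.0],
              [ 1.0, -2.0]])
xd = np.array([2.0, 1.0])
assert np.allclose(K @ xd, 0.0)        # condition of Theorem 1
eigvals = np.linalg.eigvals(K)         # one zero eigenvalue, the rest negative
assert np.isclose(sorted(eigvals.real)[-1], 0.0)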
4 Results
4.1 Numerical Simulations
Example 1. We use the stochastic simulation algorithm proposed by Gillespie [6] to evaluate the proposed control strategy. We present a four-task example, M = 4, with the topology shown in Fig. 1 (a). We run two different tests to evaluate the effectiveness of our method: one with parameters βi = 0, for i = 1, ..., 4, and another with nonzero βi parameters (values given below). In both cases, we aim to achieve the same final distribution. The initial population at each task is given by X^0 = [5 15 5 5]′, and the desired distribution is E[X^d] = [13 9 6 2]′. The gain matrix K used to reach this distribution was computed by solving the optimization problem (8) with 𝒦 = {K ∈ ℝ^{4×4} : K ⪯ −1.5 I}; this constraint was imposed to avoid excessively long convergence times. The resulting parameters from the optimization are k12 = 2.1, k14 = 1.4, k21 = 1.5, k23 = 1.3, k32 = 0.9, k34 = 1.2, k41 = 0.1, k43 = 0.6, and 0 elsewhere off the main diagonal. We computed the mean and variance from the simulation considering 130 data points sampled for t > 2.0 s in each simulation; this number was chosen to match the sample size between the two experiments. The simulation with βi = 0 for all i = 1, ..., 4 in Fig. 1 (b) has mean E[X] = [12.16 9.76 5.18 2.77]′ and variances diag(E[XX′]) = [5.78 6.83 4.20 1.44]. The variances were improved by greedily modifying the βi values: if the variance was reduced and did not actively disturb another task's population variance, then that value was kept. Figure 1 (c) used parameters β1 = 0.05, β2 = 0.20, β3 = 0.11, and β4 = 0.052, where the resulting means are E[X] = [12.26 9.30 5.83 2.60]′ and the variances are diag(E[XX′]) = [1.06 1.12 1.15 0.45]. These results highlight the impact of variability on small teams (N = 30). Visually, the control reduces the variance throughout the experiment, especially once steady state is reached. It is worth mentioning that we have tested our strategy on larger teams and the results were in agreement with our hypothesis that it is possible to shape the variance using (3) and (4); however, due to space limitations of the venue, we chose not to show those results.
Fig. 1. The graph topology is depicted in (a). In (b) and (c), a realization of each of the stochastic simulations is shown, where the dots represent the number of agents and the solid lines represent the numerical solution of (6). The tasks are identified by the color caption. The realization in (b) uses βi = 0, ∀i ∈ V, while (c) uses β1 = 0.05, β2 = 0.20, β3 = 0.11, and β4 = 0.052.
Example 2. In this example, we run six different trials with varying team sizes to numerically evaluate the variation in accuracy for teams of different sizes. To this end, we consider four tasks with the topology shown in Fig. 1 (a), and we computed the variance considering 160 data points sampled from each simulation for t > 2.0 s. In addition, we vary the total number of robots between simulations; hence, the relative initial and desired distributions for each simulation are X^0 = [25% 25% 0% 50%]′ and E[X^d] = [50% 50% 0% 0%]′, respectively, given as percentages of the total number of robots in the team. In these simulations, we used the same values for the parameters βi as in Example 1, and we do not change them between simulations. To provide an intuition of the level of spread in each scenario, we compute the Relative Variance (RV) as the quotient of the mean by the variance of the respective task. The results are presented in Table 1.
Table 1. Expectation and Relative Variance (RV) of a series of numerical experiments considering different team sizes, with the task graph in Fig. 1 (a) and initial and desired final distributions X^0 = [25% 25% 0% 50%]′ and E[X^d] = [50% 50% 0% 0%]′, respectively; when the βi parameters are used, their values are β1 = 0.05, β2 = 0.20, β3 = 0.11, and β4 = 0.052.
βi 0 for i = 1, . . . , 4
N = 52 N = 26 N = 16 N = 52 N = 26 N = 16 E[X 1 ] E[X 2 ] E[X 3 ] E[X 4 ]
24.8 26.5 0.5 0.2
12.8 12.7 0.0 0.0
8.2 7.7 0.0 0.0
25.2 25.6 0.6 0.4
13.3 14.6 0.0 0.0
7.5 8.3 0.1 0.0
RV(X 1 ) RV(X 2 ) RV(X 3 ) RV(X 4 )
0.49 0.54 0.97 0.75
0.62 0.63 0.1 0.1
0.61 0.69 0.0 0.0
0.21 0.21 0.88 1.38
0.36 0.23 0.00 0.00
0.53 0.47 1.56 0.00
Control for Stochastic Regulation on Allocation of Multi-robots
373
without βi and with 52 robots is similar to the RV of a team with only 16 robots and βi values. This suggests that our methodology improves the accuracy of the allocation of relatively small teams. Figures 2 (a) and (b) provide a realization for an instance with βi = 0 and N = 52 and with βi 0 and N = 16. 4.2
Experimental Results
Experimental tests were performed in the multi-robot Coherent Structure Testbed (mCoSTe). Figure 3 depicts the miniature autonomous surface vehicles (mASV) in Fig. 3 (b), and the Multi-Robot Tank (MR tank) in Fig. 3 (a). The MR Tank is a 4.5 m × 3.0 m × 1.2 m water tank equipped with an OptiTrack motion capture system. The mASVs are a differential drive platform, localized using 120 Hz motion capture data, communication link via XBee, and onboard processing with an Arduino Fio. Experiments were run for 4 minutes each, with X 0 = [4 0 0 0], and E[X d ] = [1 1 1 1], following the task graph outlined in Fig. 1 (a). The parameters used for experimental trials were ki j = 0.01 for each nonzero and out-of-diagonal element of the matrix K, and when variance control was used it was all βi = 0.005 for i = 1, . . . , 4. Those parameters were chosen to take into consideration the time necessary for the mASVs to travel among tasks. The parameters were converted to individual robot transitions by computing the likelihood of transition using Eqs. (3) and (4), as well as which task the robot should transition to. At each task the mASVs were required to circle the location with a radius of 0.25m until a new transition was required.
(a) Realization without
and
52
(b) Realization with
and
16
Fig. 2. In (b) and (c) it is shown a realization of each of the stochastic simulations with N = 52 and N = 16, respectively, where the dots represent the number of agents and the solid lines represent the numerical solution of (6). The figure axis are the same to help the visualization of the relative spread. The tasks are identified by the color caption and the graph topology is depicted in Fig. 1 (a). The realization in (b) uses βi = 0, ∀i ∈ V, while in (c) is with β1 = 0.05, β2 = 0.20, β3 = 0.11, and β4 = 0.052.
374
T. C. Silva et al.
An instance of an experimental trial for both, with and without βi , are included in Fig. 4. The trajectory of each robot is represented by colors, the intensity of the colors is associated with its time occurrence–when the color is lighter that path was traveled by the respective mASV earlier in the experimental trial. Figure 4 (a) has a total of 28 transitions among tasks, those transitions take place throughout the experiment as indicated by darker lines between tasks. In the case of variance control, Fig. 4 (b), there are 13 transitions among tasks, notice that many of these transitions happened earlier in the experiment. From five experimental trials without variance control the average number of task switches was 25.4, and from five experimental trials with variance control the average number of task switches was 15.6. This confirms that in the case where feedback is provided there is a reduced number of switches. A video illustrating our results can be found at https://youtu.be/fTsX-5z6BUw. During experimentation we observed that sometimes the mASVs would transition before it could physically reach the assigned task, traveling diagonally among tasks. This was the case in Fig. 4 (b) where Robot 4 quickly switched to another task while in transit. While this did not impact the overall results, it is an area of open interest to achieve desired parameters which leads to the desired task distribution within the capability of the robots.
5
Discussion
In the preceding sections, we have investigated the problem of stochastic allocation of multi-robot teams. For our main result, we formally demonstrated that through a particular structure for the transition rates of the stochastic jump process model, we can decouple the first-order moments from the second-order moments. Such a decoupling allows us to introduce additional tuning variables to regulate the variance of the desired distribution of robots over tasks. The additional degree of freedom helps to reduce the impact of individual robots on the overall team’s performance. Therefore, the result of this contribution is to expand the viability of top-down stochastic models to include reduced-size
Fig. 3. The mCoSTe environment and mASV platform used for experimental results.
Control for Stochastic Regulation on Allocation of Multi-robots
375
Fig. 4. Experimental results from a 4 minute and 4 mASV run, with task graph in Fig. 1 (a). The desired distribution is one agent at each task. The test in Fig. 4 (a) had 28 switches, while in Fig. 4 (b) it had 13 switches. The colors’ intensity reflects time–the lighter the color the earlier that path was cruised in the trial.
robot teams. In general, the intuition for these models is that they are more accurate as the team size increases. However, when using our proposed method and directly adjusting the team variance it is possible to increase the top-down stochastic model accuracy for smaller teams. We argue that such refinement is a stride towards sparse swarms–even though we are technically approaching the team size problem and ignoring the task space size. Although we have formally shown the impact of the additional tuning variables as well as the decoupling between first- and second-order moments, and also experimentally and numerically investigated the influence of such variables, we did not draw general mathematical expressions to compute their values during the design stage. In our investigations, we used a greedy algorithm that increased the variable number based on the desired values of variance and final distribution. In addition, our optimization problem to compute the gain matrix K for a desired distribution incorporates designer knowledge about the robot capabilities through the set K, which will directly affect the robots’ mean transition time. We plan to formalize this unsolved concern in future investigations.
6
Conclusions and Future Work
We have provided a new structure for the transition rates for ensemble robot distributions modeled from a top-down perspective. As a first result, we have demonstrated that such a structure leads to uncoupled parameters to adjust the mean and the variance of desired team’s distribution. Then, based on this finding, we examined simple design strategies to compute the necessary gains. This approach provides an efficient ensemble behavior for relatively small groups. Finally, physical and numerical experiments were implemented, illustrating the effectiveness of the method. Possible future work includes the extension of the
376
T. C. Silva et al.
strategy for distributed regulation. A potential strategy is to perform distributed estimates of the number of agents performing each task. It is also of interest to connect the robot dynamics with the changing transition rates. One possible approach to bridging those two models is through hybrid switching systems. A formal methodology to design the network structure for a given covariance bound will be considered in the future. Acknowledgements. We gratefully acknowledge the support of ARL DCIST CRA W911NF-17-2-0181, Office of Naval Research (ONR) Award No. N00014-22-1-2157, and the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program.
References 1. Berman, S., Hal´ asz, A., Hsieh, M.A., Kumar, V.: Optimized stochastic policies for task allocation in swarms of robots. IEEE Trans. Rob. 25(4), 927–937 (2009) 2. Deshmukh, V., Elamvazhuthi, K., Biswal, S., Kakish, Z., Berman, S.: Meanfield stabilization of Markov chain models for robotic swarms: computational approaches and experimental results. IEEE Rob. Autom. Lett. 3(3), 1985–1992 (2018). https://doi.org/10.1109/LRA.2018.2792696 3. Elamvazhuthi, K., Berman, S.: Mean-field models in swarm robotics: a survey. Bioinspiration Biomimetics 15(1), 015001 (2019). https://doi.org/10.1088/17483190/ab49a4 4. Elamvazhuthi, K., Biswal, S., Berman, S.: Controllability and decentralized stabilization of the Kolmogorov forward equation for Markov chains. Automatica 124, 109351 (2021). https://doi.org/10.1016/j.automatica.2020.109351 5. Gerkey, B.P., Matari´c, M.J.: A formal analysis and taxonomy of task allocation in multi-robot systems. Int. J. Rob. Res. 23(9), 939–954 (2004) 6. Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977) 7. Hamann, H.: Towards swarm calculus: universal properties of swarm performance and collective decisions. In: Dorigo, M., et al. (eds.) ANTS 2012. LNCS, vol. 7461, pp. 168–179. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3642-32650-9 15 ´ Berman, S., Kumar, V.: Biologically inspired redistri8. Hsieh, M.A., Hal´ asz, A., bution of a swarm of robots among multiple sites. Swarm Intell. 2(2), 121–141 (2008) 9. Khalil, H.K.: Nonlinear Systems, vol. 10, 3rd edn. Prentice Hall, Hoboken (2002) 10. Klavins, E.: Proportional-integral control of stochastic gene regulatory networks. In: 49th IEEE Conference on Decision and Control (CDC), pp. 2547–2553. IEEE (2010) 11. Mather, T.W., Hsieh, M.A.: Distributed robot ensemble control for deployment to multiple sites. In: Robotics: Science and Systems VII (2011) 12. Mather, T.W., Hsieh, M.A.: Ensemble modeling and control for congestion management in automated warehouses. In: 2012 IEEE International Conference on Automation Science and Engineering (CASE), pp. 390–395 (2012). https://doi. org/10.1109/CoASE.2012.6386498 13. Nam, C., Shell, D.A.: When to do your own thing: analysis of cost uncertainties in multi-robot task allocation at run-time. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1249–1254 (2015). https://doi.org/10. 1109/ICRA.2015.7139351
Control for Stochastic Regulation on Allocation of Multi-robots
377
14. Nam, C., Shell, D.A.: Analyzing the sensitivity of the optimal assignment in probabilistic multi-robot task allocation. IEEE Rob. Autom. Lett. 2(1), 193–200 (2017). https://doi.org/10.1109/LRA.2016.2588138 15. Napp, N., Burden, S., Klavins, E.: Setpoint regulation for stochastically interacting robots. Auton. Robot. 30(1), 57–71 (2011). https://doi.org/10.1007/s10514-0109203-2 16. Øksendal, B., Sulem, A.: Applied Stochastic Control of Jump Diffusions, 2nd edn. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-69826-5 17. Prorok, A., Hsieh, M.A., Kumar, V.: The impact of diversity on optimal control policies for heterogeneous robot swarms. IEEE Trans. Rob. 33(2), 346–358 (2017) 18. Ravichandar, H., Shaw, K., Chernova, S.: Strata: unified framework for task assignments in large teams of heterogeneous agents. Auton. Agents Multi Agent Syst. 34(2), 38 (2020) 19. Tarapore, D., Groß, R., Zauner, K.P.: Sparse robot swarms: moving swarms to real-world applications. Front. Robot. AI 7 (2020). https://doi.org/10.3389/frobt. 2020.00083
Comparing Stochastic Optimization Methods for Multi-robot, Multi-target Tracking Pujie Xin and Philip Dames(B) Temple University, Philadelphia, PA 19122, USA {pujie.xin,pdames}@temple.edu https://sites.temple.edu/trail/ Abstract. This paper compares different distributed control approaches which enable a team of robots search for and track an unknown number of targets. The robots are equipped with sensors which have a limited field of view (FoV) and they are required to explore the environment. The team uses a distributed formulation of the Probability Hypothesis Density (PHD) filter to estimate the number and the position of the targets. The resulting target estimate is used to select the subsequent search locations for each robot. This paper compares Lloyd’s algorithm, a traditional method for distributed search, with two typical stochastic optimization methods: Particle Swarm Optimization (PSO) and Simulated Annealing (SA). This paper presents novel formulations of PSO and SA to solve the multi-target tracking problem, which more effectively trade off between exploration and exploitation. Simulations demonstrate that the use of these stochastic optimization techniques improves coverage of the search space and reduces the error in the target estimates compared to the baseline approach. Keywords: Multi-target tracking Distributed control
1
· Stochastic optimization ·
Introduction
Multi-robot, multi-target tracking (MR-MTT) encompasses many traditional robotics problems, including search and rescue [19], surveillance [10], and mapping [9]. All of these scenarios require a team of robots to search across an environment to detect, identify, and track individual objects. A generic solution to the MR-MTT problem consists of two parts: 1) an estimation algorithm to determine the number and states of all targets and 2) a control algorithm to drive the robots to seek out undetected targets as well as track detected targets. These problems are extensively studied in multi-robot systems [29]. 1.1
Related Work
Multi-target tracking (MTT) is a problem wherein one or more sensors seek to determine the number of targets in a space and the state of each individual c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 378–393, 2024. https://doi.org/10.1007/978-3-031-51497-5_27
Stochastic Optimization for Multi-robot Tracking
379
target. Often the number of targets is unknown a priori and in many cases the targets are not uniquely identifiable. One of the main challenges in MTT is the data association problem, i.e., matching measurements to potential targets. Stone et al. [25] discuss this problem in their book, and survey existing solutions, such as the Multiple Hypothesis Tracker (MHT) [1], Joint Probabilistic Data Association (JPDA) [12], and the Probability Hypothesis Density (PHD) filter [17]. Each of these different methods solve data association in a different way. We pick the PHD filter due to its relative simplicity and good performance. Multi-robot search is another well studied problem in the literature, with many methods existing methods, as discussed in the survey [22]. In recent years, coordination policies using Multi-Agent Deep Reinforcement Learning (MADRL) have become popular [21,30], with some examples use partially observable Monte-Carlo planning [14] while others perform a joint optimization of target assignment and path planning [20]. However, many of the methods are based on a centralized policy and global information, which is a main challenge in MADRL. One of the most widely used classical techniques is Lloyd’s algorithm [11], a decentralized algorithm which iterative partitions the environment and drives the robots using a gradient descent controller. Due to its wide popularity, we will use Lloyd’s algorithm as our baseline approach for comparison. In this paper, we will examine the use of different numerical optimization algorithms to drive MR-MTT. These algorithms fall into two broad categories: deterministic and stochastic. Deterministic methods often suffer from local optima entrapment, particularly for the gradient-based methods which require the knowledge of the entire search space [18]. However, in our setting, the target locations are unknown a priori, which makes the use of gradient-based methods challenging. In contrast, stochastic optimization methods do not rely on the gradient descent which makes these particularly fitted for solving problems with unknown search spaces. Based on the number of candidates solutions, there are individual-based and population-based algorithms. In the context of numerical optimization, these techniques are typically used to find a single global extremum of a function. However, in the MR-MTT problem, we aim to find all local maxima (i.e., all target locations), which requires us to reformulate these techniques to suit our problem. In the family of the individual-based methods, there are Tabu Search [13], Hill Climbing [7], and simulated annealing (SA) [16]. The SA algorithm is widely used in solving unconstrained and bound-constrained optimization problems. This method models the physical activity of heating and tempering in material process. It is likely to accept worse solutions at a rate proportional to the temperature. This enables SA explore broad search space and escape from local optima, which will allow the robots to better explore the environment. People proposed different versions of distributed control with SA based on a variety of objective functions. One similar work is to minimize the total travel distance under Boolean specification using SA [24]. Their scenario is similar to that in Travelling salesman problem (TSP), which enumerates the trajectories. The trajectories are the combination of different nodes.
380
P. Xin and P. Dames
PSO [15] is one of the most successful in population-based algorithms. Derr et al. [8] proposed a decentralized version of PSO for MR-MTT, but in their experiment, the targets are mobile phones. This makes the problem much simpler since the sensors have unlimited range (i.e., are able to detect the signal strength everywhere in the environment) and there is no ambiguity in data association. Tang et al. [26] proposed methods in the environment with obstacles, but the optimization function is still predefined, which means the global importance weight is known to each robot (i.e., target locations are known). Our solution differs from these other works in two key ways: 1) the objective function or the global importance weight of the map is unknown and 2) the robots have a limited field of view. 1.2
Contributions
In this paper, we propose two novel reformulations of traditional stochastic optimization algorithms, SA and PSO, to solve the MR-MTT problem. We compare these against Lloyd’s algorithm, a traditional method for multi-robot search. In all cases we keep the distributed MTT algorithm constant, so the only difference in performance is due to the search strategy. Our results show that the new techniques result in more complete coverage of the search space and lower errors in the target estimate. We also see that SA and PSO yield qualitatively different search patterns as a result of their different structures.
2
Problem Formulation
We have a team of R robots exploring a convex environment E ⊆ R2 to search for an unknown number of targets. The set of targets are Xt = {xt1 , xt2 , . . .}, where xti is the state of the ith target at time t. At each time step, robot r receives a set of measurements Ztr = {zt1,r , zt2,r , . . .}. The number of measurements depends upon the number of targets within the robot’s field of view (FoV) as well as the chance of receiving a false negative or false positive detection. 2.1
PHD Filter
The PHD filter [17] models the sets X and Z above as realizations of Poisson random finite sets (RFSs), which are sets composed of independently and identically distributed (i.i.d.) elements where the number of elements follows a Poisson distribution. The likelihood of such an RFS X is v(x) (1) p(X) = e−λ x∈X
where v(x) is the PHD and λ = E v(x)dx. The PHD is a density function over the state space of the targets, and the expected cardinality of X is the integral of the PHD over E.
Stochastic Optimization for Multi-robot Tracking
381
The PHD filter recursively updates v(x), which completely encodes the distribution over sets, using the measurement sets collected by the robots and the motion model of targets. The PHD filter uses six models. 1) The target motion model f (xt | xt−1 ) characterizes the state transition model of x. 2) The birth PHD b(xt ) describes both of the number of targets and their location at t. 3) The survival probability, ps (xt−1 ), models the existence of a target with state xt−1 . 4) The detection model is pd (xt | q) describes the targets’ detected states xt with a sensor state q. 5) The measurement model, g(z | xt ; q), describes the probability of a robot with state q receiving measurement z from a detected target with state xt . Combining this with the detection model yields ψz,q (xt ) = g(z | xt , q)pd (xt | q), which characterizes the probability of receiving measurement z from a target with state xt by a sensor locating at q. 6) Finally, the clutter model c(z) is another PHD describing the number and states of false positive detections. With these targets and sensor models, the prediction and update of the PHD filter are: t t t f (xt | xt−1 )ps (xt−1 )v t−1 (xt−1 ) dxt−1 (2a) v¯ (x ) = b(x ) + E
v t (xt ) = (1 − pd (xt | q))¯ v t (xt ) + ηz (v) = c(z | q) +
ψz,q (xt )¯ v t (xt ) ηz (¯ v t (xt )) t
(2b)
z∈Z
ψz,q (xt )v t (xt ) dxt
(2c)
E
Here, ηz (v) is a normalization factor, though note that E v(x) dx = λ, i.e., the expected number of targets in the search space, rather than 1, which is standard in single-target tracking. Since the PHD v(x) is a density function in the state space of targets, we need to make an additional assumption to parametrically represent it in a manner that is amenable to computations. The two traditional methods are as a set of weighted particles [28] or a mixture of Gaussians [27]. We use the former, as this allows for arbitrarily non-linear sensing and target models. 2.2
Lloyd’s Algorithm
2.2 Lloyd's Algorithm
where d(x, q) measures the distance between elements in E, f (·) is a monotonically increasing function, and φ(x) is a non-negative weighting function. A general choice is f (x) = x2 . The inside of the integral approaches the minimum when the environment is continuously divided using the Voronoi partition.
382
P. Xin and P. Dames
Vi = {x | d(x, qi ) ≤ d(x, qj ), ∀i = j} where Vi is the Voronoi cell for robot i. The minimum with respect to robot’s position is xφ(x) dx i ∗ (q ) = Vi , (4) φ(x) dx Vi This is a distributed control algorithm since if each robot knows the positions of its neighbors, the robot is able to compute its Voronoi cell and move to the weighted centroid of its Voronoi cell. In our case, the PHD serves as weighting function φ(x), converting the traditional coverage problem into a search and tracking problem. 2.3
Distributed PHD Filter
We use the distributed PHD filter that we developed in our previous work [6]. The key to this is that each robot maintains the portion of the PHD within its own Voronoi cell and exchanging the data with its Voronoi neighbors. There are then three algorithms that account for the motion of Voronoi cells (i.e., motion of robots), motion of targets across cell boundaries, and measurements within other robots’ cells. This estimation process is described and the convergence is proved in the previous work [6]. Since the distributed estimation method is same, we will focus on the distributed control method. Since all teams use the same distributed PHD filter, they all employ the same Voronoi-based partitioning scheme regardless of their chosen search strategy 2.4
Assumptions
The first assumption we make is that each robot knows its own position. While this is a strong assumption, it can be easily achieved using a localization algorithm in indoor environments [4] or GPS receivers in outdoor environments. Additionally, we have shown that the distributed PHD filter can perform well even with some bounded uncertainty in the pose estimates [6]. Another assumption we make is that all robots can communicate with all of their Voronoi neighbors, this is a necessary assumption under Lloyd’s algorithm (and for our distributed PHD filter formulation). In a practical setting, this problem could be addressed by using robots as communication relay or using multi-hop communication strategies.
3
Distributed Control
As is mentioned in the introduction, stochastic optimization algorithm can be divided into population-based algorithm and individual-based algorithm according the number of candidate solutions. In this paper, we pick one typical method from each family and compare these methods with the standard baseline of Lloyd’s algorithm. Specifically, we use SA as an example of individual-based algorithms and PSO as an example of population-based algorithms.
Stochastic Optimization for Multi-robot Tracking
383
Traditionally, optimization algorithms aim to find a single global maximum of some function. However, in the multi-target tracking case we instead wish to find all the local maxima. In the context of PHD, these local maxima correspond to the peaks of the PHD in search space, which represent the location of the targets. 3.1
Simulated Annealing
Simulated annealing takes inspiration from the annealing process in manufacturing, where materials are slowly cooled to allow them to reach a stable configuraˆ t+1 is randomly tion. In SA, at each iteration, a new candidate point locating at q t qt+1 − qt | is generated near the last solution point q , where the distance δ = |ˆ proportional to the “temperature” T . Suppose we want to maximize the objective function. The solution then moves to the new location if the value is better at the new location, or with some probability if the value is worse. This probability of accepting a suboptimal solution is given by: 1 (5) Paccept ∼ Bernoulli δ 1 + exp ( max(T )) which allows the solution to escape from local optima and hopefully reach a global optimum, with the probability of a jump decreasing over time as the temperature cools. Algorithm 1. Distributed control with SA for Robot r 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21:
Vˆr = Vr ⊕ B(d) Expand the Voronoi Cell for i ∈ N (r) do This step can be combined with particle exchange in estimation Compute ΔVi,r = Vi ∩ Vr Send particles in ΔVi,r to robot i end for Compute cold = Vˆr ∩F oV r (qr ) v(x) for θ ∈ [0 : Δθ : 2π] do ˆ t+1 (θ) =qr + T R(θ)d q Δc(θ) = Vˆr ∩F oV r (ˆqt+1 (θ)) v(x) − cold end for θ∗ = arg maxθ c(θ) if Δc(θ∗ ) > 0 then ˆ t+1 (θ∗ ) =q qt+1 r Tinit = 1 in experiment T = Tinit Paccept from (5) else if rand(0, 1) < Paccept then ˆ t+1 (θ∗ ) =q qt+1 r T := 0.95T 0.95 is a typical cooling factor else according to (4). Set qt+1 r T := 0.95T end if
The rules governing simulated annealing correspond to a single searching agent. Thus, in a multi-robot setting, each agent selects actions on its own based on the information stored in the shared, distributed PHD. Our approach is outlined in Algorithm 1. First, robot r expands its Voronoi cell V̂_r and exchanges the PHD particles with its neighbors (lines 1-5). The weight c_old is the sum of the PHD of the particles in the overlap between V̂_r and the robot's FoV. The robot then searches for the best action from a finite set of actions composed of angles θ, spaced uniformly with a fixed step size Δθ (line 7). For each angle, robot r computes a potential new location (line 8), where T is the temperature, R(θ) is a rotation matrix, and d is a standard step. For each potential location, the robot predicts the weight it would obtain if it moved there by summing the PHD of the particles in the overlap between V̂_r and the potential future FoV (line 9). The movement θ* leading to the maximum Δc is selected as the candidate next step (line 11). Robot r will move to the new position q̂^{t+1} only if Δc > 0 or rand(0, 1) < P_accept (lines 12-17). Otherwise, the robot falls back to the standard Lloyd's algorithm (line 19).
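To make the action-selection rule concrete, the following minimal Python sketch implements the accept/reject logic of Algorithm 1. It is our own rendering, not the authors' code: the gains dictionary stands in for the particle-based Δc computation, the goal_at helper is a hypothetical placeholder, and the logistic acceptance probability follows our reading of Eq. (5).

import math
import random

def goal_at(q_r, theta, T, d):
    # Candidate location q_r + T * R(theta) * d, with d taken along the x-axis
    return (q_r[0] + T * d * math.cos(theta), q_r[1] + T * d * math.sin(theta))

def sa_select_goal(q_r, T, d, gains, lloyd_goal, T_init=1.0, cooling=0.95):
    """One SA step of Algorithm 1: `gains` maps each candidate angle theta
    to its predicted weight change dc(theta) from the local PHD particles.
    Returns the next goal position and the updated temperature."""
    theta_star = max(gains, key=gains.get)                    # line 11
    delta = T * d                                             # step length ~ temperature
    p_accept = 1.0 / (1.0 + math.exp(delta / max(T, 1e-9)))   # Eq. (5)
    if gains[theta_star] > 0:                                 # improving move: take it, reset T
        return goal_at(q_r, theta_star, T, d), T_init
    if random.random() < p_accept:                            # occasionally accept a worse move
        return goal_at(q_r, theta_star, T, d), cooling * T
    return lloyd_goal, cooling * T                            # fall back to Lloyd's goal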
3.2 Particle Swarm Optimization
In PSO, a collection of particles P moves through the search space, and the algorithm evaluates the objective value at each particle. These particles are the candidate solutions, with particle i located at q_i^t at time t. The position and velocity v_i^t of particle i are updated as

$$q_i^t = q_i^{t-1} + v_i^t, \qquad v_i^t = w\, v_i^{t-1} + c_1 r_1 \left(q^*_{i,b} - q_i^t\right) + c_2 r_2 \left(q^*_{i,g} - q_i^t\right), \quad (6)$$
where r_1 and r_2 are random numbers between 0 and 1, c_1 and c_2 are predefined constants in PSO, and w is the inertia term. Here, q^*_{i,b} is the position of particle i where the objective function achieves its best value (i.e., a local best), and q^*_{i,g} is the position with the best objective function value in the history (i.e., a global best). PSO is a metaheuristic method that generates multiple candidate solutions in the search space at the same time. This has a natural analog in multi-robot search, where each robot represents a single particle. However, slightly different from standard PSO, a robot can measure the value of points everywhere in its FoV instead of at a single point. Additionally, our objective is not to find a single global best value of the objective function, but rather to find all local maxima (i.e., all targets). Thus, we need to redefine the meaning of q^*_{·,g} and q^*_{·,b}. In Algorithm 2, we set q^*_{·,g} to be the weighted center of the Voronoi cell (i.e., the location from Lloyd's algorithm). And if a robot obtains the highest weight in its history at its current location, it sets its current position q as q^*_{·,b}. The highest weight in the history, denoted as w_history, is initialized as the weight obtained at t = 1, and it is updated with the current weight w_t if w_t ≥ w_history. Since r_1, r_2 ∼ rand(0, 1), the robot moves with a random trade-off between q^*_{·,b} and q^*_{·,g}.
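As a concrete illustration, here is a minimal sketch of the per-robot PSO update of Eq. (6); the variable names are ours, and q_best_global would be the weighted Voronoi centroid from Lloyd's algorithm as described above.

import random

def pso_update(q, v, q_best_local, q_best_global, w=1.0, c1=1.0, c2=1.0):
    """One PSO step (Eq. (6)) for a single robot in 2D; all points are (x, y)."""
    r1, r2 = random.random(), random.random()
    v_new = tuple(
        w * v[k]
        + c1 * r1 * (q_best_local[k] - q[k])
        + c2 * r2 * (q_best_global[k] - q[k])
        for k in range(2)
    )
    q_new = tuple(q[k] + v_new[k] for k in range(2))
    return q_new, v_new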
Algorithm 2. Distributed control with PSO for Robot r
1: Compute local w_t = Σ_{x ∈ FoV} v(x) at time t
2: Set q*_g according to (4)
3: if w_t ≥ w_history then
4:     q*_{r,b} = q_r                                  ▷ Current robot location
5:     w_history = w_t
6: end if
7: v_r^t = w v_r^{t−1} + c_1 r_1 (q*_{r,b} − q_r^t) + c_2 r_2 (q*_{r,g} − q_r^t)   ▷ Experiments set w, c_1, c_2 = 1
8: q_r^{t+1} = q_r^t + v_r^t                           ▷ Robot moves to q^{t+1}

3.3 Communication and Complexity
All teams use the distributed PHD filter for MTT regardless of the search strategy. This is a limiting factor in terms of both computational complexity and communication bandwidth. It requires robots to exchange ownership of portions of the PHD density function as the Voronoi partitions shift [6, Algorithm 1], to exchange information about targets that may move across the boundaries of Voronoi cells [6, Algorithm 2], and to exchange measurements about targets viewed in neighboring Voronoi cells [6, Algorithm 3]. The standard PHD filter equations scale linearly in complexity with the number of targets (i.e., measurements) [6, eq. (4)]. The complexity and communication load of the distributed PHD filter depend on the situation. For example, the particle exchange step [6, Algorithm 1] depends on the amount of area changing ownership, which can be large when the entire team is tightly packed into one area (yielding very large Voronoi cells for boundary robots). The PHD update [6, Algorithm 3] scales linearly with the number of robots that can see into one another's Voronoi cells, again making the tightly packed case difficult. When the robots are evenly distributed in the space, the complexity and communication bandwidth are minimal. Note also that the distributed PHD filter assumes that all robots can communicate with their Voronoi neighbors, assumes that communication channels are lossless, and ignores the effects of latency. Addressing these real-world factors will be the subject of future work.
The different control policies (Lloyd's algorithm, SA, and PSO) all utilize the resulting PHD estimate to find the goal location. Since robots compute their goal locations using only the PHD within their own Voronoi cell, no additional communication is required for the control computations.¹ Additionally, the control computations are all quite straightforward and have constant complexity with respect to team size and target count. Lastly, it should be noted that the "particles" used within our SA and PSO algorithms are actually the individual robots in the team. Thus, while it is well known that the performance of SA and PSO depends on the number of particles, in our case this simply means it is a function of team size.
¹ This assumes that, per the note in Algorithm 1, line 2, the Voronoi computations are combined with the distributed PHD filter step.
4 Experimental Results
In this section, we conduct a set of simulated experiments in Matlab to 1) evaluate the correctness and 2) demonstrate the efficacy of the proposed distributed estimation and control algorithms by comparing them with the distributed method used in our previous work [6]. The environment is an open 100 × 100 m area with no obstacles. For simplicity, we represent the PHD using a uniform grid of particles with 1 m spacing. The initial weight for each grid point is w_i = 1 · 10⁻⁴, so the initial expected number of targets in the environment is 1. The robots have a maximum velocity of 2 m/s and are holonomic. The sensor collects data at 2 Hz, and the sensor specifications are

$$p_d(x \mid q) = \begin{cases} 0.8 & \|x - q\| \le 5\,\text{m} \\ 0 & \text{else} \end{cases}, \qquad g(z \mid x, q) = \mathcal{N}\left(z \mid x - q,\ 0.25 I_2\right), \qquad c(z \mid q) = 3.66 \cdot 10^{-3}. \quad (7)$$

The expected number of clutter detections per scan is $\int_{FoV} c(z \mid q)\, dz = 0.287$. Note that these values do not represent any specific real-world situation, but could represent a team of drones equipped with downward-facing cameras. For a real situation, one could follow the procedures outlined in our other work to carefully characterize the sensor models [2,5].
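For intuition, the following sketch draws one simulated scan from the sensor model in Eq. (7). This is our own illustration: the uniform placement of clutter over the circular field of view is an assumption consistent with the constant clutter density, not a detail stated in the paper.

import numpy as np

def simulate_scan(q, targets, rng, r_fov=5.0, p_d=0.8, clutter_rate=0.287):
    """Sample one scan from the sensor at position q using the model in Eq. (7).
    Detections are target positions relative to q with N(0, 0.25*I2) noise;
    clutter is placed uniformly over the circular field of view."""
    z = []
    for x in targets:
        if np.linalg.norm(x - q) <= r_fov and rng.random() < p_d:
            z.append(x - q + rng.normal(scale=0.5, size=2))  # variance 0.25 -> std 0.5
    for _ in range(rng.poisson(clutter_rate)):
        r, phi = r_fov * np.sqrt(rng.random()), 2 * np.pi * rng.random()
        z.append(np.array([r * np.cos(phi), r * np.sin(phi)]))
    return z

# Example: one scan from a robot at the origin observing two targets
rng = np.random.default_rng(0)
print(simulate_scan(np.zeros(2), [np.array([3.0, 1.0]), np.array([40.0, 2.0])], rng))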
4.1 Evaluation Metric
To evaluate the results we use the Optimal SubPattern Assignment (OSPA) metric [23], which is widely used in the PHD literature. The OSPA between two sets X, Y, where |X| = m ≤ |Y| = n without loss of generality, is

$$d(X, Y) = \left( \frac{1}{n} \left[ \min_{\pi \in \Pi_n} \sum_{i=1}^{m} d_c\left(x_i, y_{\pi(i)}\right)^p + c^p (n - m) \right] \right)^{1/p}, \quad (8)$$
where c is the cutoff distance, which we set to c = 10 m, d_c(x, y) = min(c, ‖x − y‖), and Π_n is the set of permutations of {1, 2, 3, ..., n}. We set p = 1 as the p-norm. Since the OSPA can fluctuate due to false negative and false positive detections, we use the median of the OSPA over the last 100 s of each trial to obtain a steady estimate in each experiment, and we measure the variance of the OSPA to evaluate the stability of the tracker.
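A compact reference implementation of Eq. (8) is given below, using the Hungarian algorithm for the optimal assignment. This is a sketch of the standard metric, not code from the paper.

import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=10.0, p=1):
    """OSPA distance (Eq. (8)) between point sets X and Y, arrays of shape (k, d)."""
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    if len(X) > len(Y):
        X, Y = Y, X                                  # enforce m = |X| <= n = |Y|
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0
    D = np.minimum(c, np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2))
    rows, cols = linear_sum_assignment(D ** p)       # optimal assignment over m tracks
    cost = (D[rows, cols] ** p).sum() + c ** p * (n - m)
    return (cost / n) ** (1.0 / p)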
4.2 Stationary Targets
In our first set of experiments we assume all targets are static, so the transition model is the identity map, the survival model is 1, and the birth model is 0. We use teams of 10 to 100 robots (in steps of 10) searching for 10, 30, or 50 targets and compare the new SA and PSO control methods against our previous work that
uses Lloyd's algorithm alone [6]. The target positions are drawn uniformly at random, while the robots all begin in a 20 × 10 m rectangle at the bottom center of the environment. For each configuration of target number, robot number, and control algorithm, we run 10 trials. Note that to ensure consistency in the results, we use identical target distributions across all team/control methods, i.e., there are 10 different randomized target distributions for each configuration.
Tracking Accuracy. Figure 1 shows the statistics of the median OSPA error over the ten trials using our original method (PHD), PSO, and SA, respectively. We can see that in all cases, as the size of the team increases and surpasses the number of targets, the OSPA error reaches a minimum and does not decrease further, even with the addition of more robots. An OSPA error of around 0.8 indicates that almost all targets have been detected and tracked with accuracy finer than the PHD grid (1 m). This empirical minimum is determined by the accuracy of the sensors, the particle spacing, and our method for extracting target locations (i.e., finding peaks in the PHD density function [6]). We see that the PSO- and SA-based algorithms both have a much lower OSPA error when the number of robots is smaller than the number of targets, with the PSO algorithm having a lower average value than SA or PHD. The difference between the new methods and PHD is especially evident in the 10-robot, 10-target case, where the original method fails to find about half of the targets (i.e., the OSPA is about halfway between the minimum of 0.8 and the maximum of c = 10). We also see that the spread in the OSPA values is smallest for the SA case, indicating that it offers the most reliable performance of the three approaches. In summary, PSO offers the lowest average OSPA, SA offers the most consistent results, and the original method is not the best in any category.
Coverage. We also wish to examine how well each algorithm explores the environment, as this is another indicator of success. Figure 2 shows the robots' trajectories in the 10-robot, 10-target case, where the orange diamonds are the target locations and the colored lines are the robots' trajectories. We can observe that in all cases most of the robots eventually find targets and oscillate around them. However, the paths they take to get there differ. Lloyd's algorithm, as shown in Fig. 2a, spends the least time exploring the space, which contributes to it having the highest OSPA error. In this figure, two targets are not detected. The robots move directly to their weighted Voronoi centers and become idle once they have arrived at the goal point, which wastes search resources. We see in Fig. 2b that the robots cover much more of the search space in the same amount of time. Thus, the addition of the new local best term (since the global best is identical to Lloyd's) causes the robots to explore more. We also see that the robot paths are relatively smooth, a result of the inertia term in (6) keeping the heading direction somewhat consistent. We see in Fig. 2c that these robots follow more of a random walk. This is because the annealing process drives the robots to oscillate between the best point
Fig. 1. Boxplots show the median OSPA statistics over 10 runs for teams of 10–100 robots and 10, 30, and 50 static targets for the original PHD, PSO, and SA methods, respectively.
Fig. 2. Robots’ trajectories using different control methods. Robots all begin in the dashed box at the bottom of the environment.
Fig. 3. Percent of undetected area over time over 10 runs for 10-robot, 10-target case, where the solid curves show the mean value and the shaded areas show the range.
Fig. 4. Boxplots show the percent of undetected area at the end of the experiment over 10 runs for teams of 10–100 robots and 50 static targets for the original PHD, PSO, and SA search strategies.
and the potential point. However, since the temperature keeps decreasing, the oscillation range is smaller than that of PSO. We can examine these trends quantitatively by plotting the percentage of unexplored area over time for each method in Fig. 3. We observe that the average exploration speed is similar across all methods over the first 100 s, with SA showing a larger range. After that, the exploration rate of Lloyd's algorithm levels off around 0.35 as robots become trapped. The other methods both have ways of escaping local minima, with SA yielding the highest and most consistent exploration rate as time increases. To further explain, the unexplored area metric in Fig. 4 measures the percentage of the total area that was not explored by any robot at the end of the search period. We see that in the 50-target case, all methods are able to cover a significant portion of the environment, but SA has the highest median and maximum unexplored area. This is likely because the robots frequently get stuck in local minima due to the higher density of targets in the environment. Once the number of robots exceeds the number of targets, all methods yield similar results and cover more than 99% of the environment for ≥60 robots in the allotted time. This is because there are free robots that continue to explore the remainder of the space even when each target has a robot assigned to track it.
Fig. 5. Boxplots show the median OSPA statistics over 10 runs for teams of 10–100 robots tracking moving targets for the original PHD, PSO, and SA methods, respectively.
4.3 Dynamic Targets
We also conduct a set of experiments with moving targets. Targets have a maximum velocity of 1 m/s and may enter or leave the environment by moving across the boundary. When a new target crosses the boundary, it moves along a random
direction. Targets become inactive if they move out of the environment. To model the birth of targets, we set the birth PHD to be non-zero near the map boundary (∂E):

$$b(x) = \begin{cases} 5.26 \times 10^{-5} & \|x - \partial E\| \le 5\,\text{m} \\ 0 & \text{else.} \end{cases} \quad (9)$$

This model expects one new target to appear per 10 s. Based on the appearance and disappearance of targets, the number of targets reaches a steady state of around 30 regardless of the initial number of targets. Figure 5 shows the median steady-state OSPA value, as measured by the average over the final 200 of 1000 time steps. We see that in all cases the OSPA decreases as the number of robots increases and appears to reach an asymptote. However, the PHD method yields the highest OSPA error when the number of robots is small, but has the lowest asymptote when the number of robots is high. We see that the PSO and SA methods are quite similar with 10–40 robots. Above that, however, PSO reaches a higher asymptote and has a higher variance in the OSPA, meaning it is both less accurate and less reliable than SA. This is similar to the static target case, where Lloyd's algorithm gets stuck tracking one target while PSO and SA can both escape these local minima to switch to another target. This escape behavior is very beneficial when there are fewer robots than targets but slightly harmful to tracking performance when there are more robots than targets.
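As a minimal sketch of the target dynamics described above (our own rendering, with the 100 × 100 m square environment assumed and births placed on the boundary, consistent with the birth PHD of Eq. (9)):

import numpy as np

def step_targets(targets, rng, dt=0.5, v_max=1.0, size=100.0, birth_rate=0.1):
    """Advance (position, heading) target states: move along fixed headings,
    retire targets that leave the square environment, and spawn roughly one
    new boundary target per 10 s (birth_rate is per second)."""
    out = []
    for x, heading in targets:
        x = x + dt * v_max * np.array([np.cos(heading), np.sin(heading)])
        if (x >= 0).all() and (x <= size).all():
            out.append((x, heading))                 # still active
    if rng.random() < birth_rate * dt:               # new target on the boundary
        s = rng.uniform(0, size)
        edge = int(rng.integers(4))
        x = np.array([[s, 0.0], [s, size], [0.0, s], [size, s]][edge])
        out.append((x, rng.uniform(0, 2 * np.pi)))
    return out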
5 Conclusion
In this paper, we examined the performance of different search strategies to solve the MR-MTT problem. All methods share a common structure with two components: 1) an estimation algorithm based on the distributed PHD filter and 2) a control strategy combining a stochastic optimization algorithm with Voronoi-based control. The robots use the estimate from the distributed PHD filter to weight the relative importance of the area within their Voronoi cells and select an action. We compared the standard Lloyd's algorithm with two new methods based on standard stochastic optimization algorithms, namely simulated annealing (SA) and particle swarm optimization (PSO). Both of these new methods allow robots to trade off between a global goal and other local areas known to contain targets, unlike Lloyd's algorithm, which only uses local information. We see that the new stochastic methods enable the robot team to explore more of the search area and yield better tracking results when there are fewer robots than targets. In other words, the robots make a better trade-off between exploration and exploitation.
Acknowledgement. This work was funded by NSF grants IIS-1830419 and CNS-2143312.
References
1. Blackman, S.: Multiple hypothesis tracking for multiple target tracking. IEEE Aerosp. Electron. Syst. Mag. 19(1), 5–18 (2004). https://doi.org/10.1109/MAES.2004.1263228
2. Chen, J., Xie, Z., Dames, P.: The semantic PHD filter for multi-class target tracking: from theory to practice. Robot. Auton. Syst. 149 (2022). https://doi.org/10.1016/j.robot.2021.103947
3. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004). https://doi.org/10.1109/TRA.2004.824698
4. Dames, P., Kumar, V.: Autonomous localization of an unknown number of targets without data association using teams of mobile sensors. IEEE Trans. Autom. Sci. Eng. 12(3), 850–864 (2015). https://doi.org/10.1109/tase.2015.2425212
5. Dames, P., Kumar, V.: Experimental characterization of a bearing-only sensor for use with the PHD filter (2015). https://doi.org/10.48550/arXiv.1502.04661
6. Dames, P.M.: Distributed multi-target search and tracking using the PHD filter. Auton. Robot. 44, 673–689 (2020). https://doi.org/10.1007/s10514-019-09840-9
7. Davis, L.: Bit-climbing, representational bias, and test suite design. In: Proc. Intl. Conf. Genetic Algorithms, pp. 18–23 (1991)
8. Derr, K., Manic, M.: Multi-robot, multi-target particle swarm optimization search in noisy wireless environments. In: 2009 2nd Conference on Human System Interactions, pp. 81–86 (2009). https://doi.org/10.1109/HSI.2009.5090958
9. Deutsch, I., Liu, M., Siegwart, R.: A framework for multi-robot pose graph SLAM. In: 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 567–572 (2016). https://doi.org/10.1109/RCAR.2016.7784092
10. Doitsidis, L., et al.: Optimal surveillance coverage for teams of micro aerial vehicles in GPS-denied environments using onboard vision. Auton. Robot. 33(1), 173–188 (2012). https://doi.org/10.1007/s10514-012-9292-1
11. Du, Q., Faber, V., Gunzburger, M.: Centroidal Voronoi tessellations: applications and algorithms. SIAM Rev. 41(4), 637–676 (1999). https://doi.org/10.1137/S0036144599352836
12. Fortmann, T., Bar-Shalom, Y., Scheffe, M.: Sonar tracking of multiple targets using joint probabilistic data association. IEEE J. Oceanic Eng. 8(3), 173–184 (1983). https://doi.org/10.1109/JOE.1983.1145560
13. Glover, F.: Tabu search, part I. ORSA J. Comput. 1(3), 190–206 (1989)
14. Goldhoorn, A., Garrell, A., Alquézar, R., Sanfeliu, A.: Searching and tracking people with cooperative mobile robots. Auton. Robot. 42(4), 739–759 (2018). https://doi.org/10.1007/s10514-017-9681-6
15. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995). https://doi.org/10.1109/ICNN.1995.488968
16. Kirkpatrick, S., Gelatt, C.D., Jr., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
17. Mahler, R.: Multitarget Bayes filtering via first-order multitarget moments. IEEE Trans. Aerosp. Electron. Syst. 39(4), 1152–1178 (2003). https://doi.org/10.1109/TAES.2003.1261119
18. Mirjalili, S.: Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 89, 228–249 (2015). https://doi.org/10.1016/j.knosys.2015.07.006
19. Murphy, R.: Human-robot interaction in rescue robotics. IEEE Trans. Syst. Man Cybern. Part C (Applications and Reviews) 34(2), 138–153 (2004). https://doi.org/10.1109/TSMCC.2004.826267
20. Qie, H., Shi, D., Shen, T., Xu, X., Li, Y., Wang, L.: Joint optimization of multi-UAV target assignment and path planning based on multi-agent reinforcement learning. IEEE Access 7, 146264–146272 (2019). https://doi.org/10.1109/ACCESS.2019.2943253
21. Rizk, Y., Awad, M., Tunstel, E.W.: Decision making in multiagent systems: a survey. IEEE Trans. Cogn. Dev. Syst. 10(3), 514–529 (2018). https://doi.org/10.1109/TCDS.2018.2840971
22. Robin, C., Lacroix, S.: Multi-robot target detection and tracking: taxonomy and survey. Auton. Robot. 40(4), 729–760 (2016)
23. Schuhmacher, D., Vo, B.T., Vo, B.N.: A consistent metric for performance evaluation of multi-object filters. IEEE Trans. Signal Process. 56(8), 3447–3457 (2008). https://doi.org/10.1109/TSP.2008.920469
24. Shi, W., He, Z., Tang, W., Liu, W., Ma, Z.: Path planning of multi-robot systems with Boolean specifications based on simulated annealing. IEEE Robot. Autom. Lett. 7, 6091–6098 (2022). https://doi.org/10.1109/LRA.2022.3165184
25. Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian Multiple Target Tracking. Artech House (2013)
26. Tang, Q., Yu, F., Xu, Z., Eberhard, P.: Swarm robots search for multiple targets. IEEE Access 8, 92814–92826 (2020). https://doi.org/10.1109/ACCESS.2020.2994151
27. Vo, B.N., Ma, W.K.: The Gaussian mixture probability hypothesis density filter. IEEE Trans. Signal Process. 54(11), 4091–4104 (2006)
28. Vo, B.N., Singh, S., Doucet, A.: Sequential Monte Carlo methods for multitarget filtering with random finite sets. IEEE Trans. Aerosp. Electron. Syst. 41(4), 1224–1245 (2005)
29. Wang, B.: Coverage problems in sensor networks: a survey. ACM Comput. Surv. 43(4) (2011). https://doi.org/10.1145/1978802.1978811
30. Zhang, C., Lesser, V.: Coordinating multi-agent reinforcement learning with limited communication. In: Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS '13, pp. 1101–1108. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2013)
Multi-agent Deep Reinforcement Learning for Countering Uncrewed Aerial Systems
Jean-Elie Pierre¹, Xiang Sun¹, David Novick², and Rafael Fierro¹
¹ Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM 87131, USA, [email protected]
² Sandia National Laboratories, Albuquerque, New Mexico 87185, USA
Abstract. The proliferation of small uncrewed aerial systems (UAS) poses many threats to airspace systems and critical infrastructure. In this paper, we apply deep reinforcement learning (DRL) to intercept rogue UAS in urban airspaces. We train a group of homogeneous friendly UAS, referred to in this paper as agents, to pursue and intercept a faster UAS evading capture while navigating through crowded airspace with several moving non-cooperating interacting entities (NCIEs). The problem is formulated as a multi-agent Markov Decision Process, and we develop the Proximal Policy Optimization based Advantage Actor-Critic (PPO-A2C) method to solve it, where the actor and critic networks are trained on a centralized server and the derived actor network is distributed to the agents to generate the optimal actions based on their observations. The simulation results show that, compared to the traditional method, PPO-A2C fosters collaboration among agents to achieve the highest probability of capturing the evader while maintaining a low collision rate with other agents and NCIEs in the environment.
Keywords: Multi-agent systems · deep reinforcement learning · counter uncrewed aerial systems (C-UAS) · machine learning
1 Introduction
Counter Uncrewed Aerial Systems (C-UAS) is an active area of research. A paper by the Center for the Study of the Drone identified 235 counter-drone products on the market or in development as of 2018 [1]. The rise in C-UAS research is largely due to the proliferation of uncrewed aerial systems (UAS). Although UAS have shown great potential in solving various critical and difficult tasks, the growing autonomy of UAS poses a threat to the security of individuals and organizations alike. A growing concern is the intrusion of UAS into guarded or protected spaces, as any mishandled or malicious UAS in a protected space poses a safety risk to the related entities, such as aircraft near or in airports and groups of people in large gatherings. In addition, malicious UAS can be
Fig. 1. C-UAS scenario.
used to attack critical infrastructure, such as airport control towers, or to perform unwanted surveillance of a protected space [19]. It is necessary to design a C-UAS method that can intercept a threat UAS before it causes damage in protected or target areas. Instead of having one UAS (the pursuer) intercept a threat UAS (the evader), we consider a group of pursuers, whose original task is to provide surveillance, that intercept evaders and thus protect the target area. Normally, the C-UAS problem can be formulated as a pursuit-evasion problem, where an evader moves around and avoids capture by a pursuer [10,23]. The traditional way to solve the problem is to apply control techniques on each pursuer, which, however, has high complexity and requires each agent to have global knowledge of the network, and is thus impractical in most cases. In recent years there has been increasing research on solving the pursuit-evasion problem using multi-agent deep reinforcement learning (DRL) [4,9], where deep neural networks are used as policies that map the pursuers' observations to actions, i.e., their velocities and angular velocities. Here, each pursuer may apply the same or a different policy to derive its actions. Compared to traditional control techniques, the multi-agent DRL method has a low response time for deriving the optimal actions of a pursuer based on its current observation, as long as the deep neural networks are well trained via extensive simulations. On the other hand, although DRL methods have been proposed to solve the C-UAS problem, some do not consider collaboration amongst multiple pursuers [2]. Some works apply the multi-agent DRL method to enable collaborations
among pursuers, but they do not consider, for example, the existence of moving obstacles or collisions between pursuers [7], which may significantly reduce the state space and complexity of the problem but makes the methods impractical. Compared to the existing works, we study a more complicated but realistic C-UAS scenario with one evader and multiple pursuers in an alarmed area. As shown in Fig. 1, the pursuers have to work together to prevent the evader from entering the target area, while avoiding collisions with other pursuers and non-cooperating interacting entities (NCIEs) in the alarmed area. We design a novel multi-agent DRL algorithm inspired by [16,18], where each pursuer derives its actions from its local actor network, which is trained with the shared experiences of all the pursuers simultaneously based on Proximal Policy Optimization (PPO). The contributions of the paper are summarized as follows. 1) We consider a complicated C-UAS scenario, where multiple pursuers, NCIEs, and one evader exist in an alarmed area. We tackle this challenging problem by designing a Proximal Policy Optimization based Advantage Actor-Critic for parameter-sharing multi-agent DRL. More specifically, the pursuers share their observations with a centralized controller, which trains the actor and critic networks and then distributes the actor network to the pursuers to locally derive their actions. This design allows the pursuers to quickly intercept the evader while avoiding collisions and model divergence. 2) We conduct extensive simulations to show the convergence of the method and demonstrate its performance by comparing it with existing baseline methods. The rest of the paper is structured as follows. We formulate the C-UAS problem in Sect. 2 as a multi-agent Markov Decision Process. Section 3 describes our multi-agent DRL algorithm, Sect. 4 discusses simulation results, and Sect. 5 provides concluding remarks and future research directions.
1.1 Related Work
The multi-agent pursuit-evasion problem has been studied for many years. In traditional pursuit-evasion games, pursuers try to capture evaders, while evaders try to avoid capture. The authors in [14] used an area-minimization strategy to intercept rogue robots, seeking the coordination that best minimizes the Voronoi cells of the evaders. Similarly, the authors in [24] solve the cooperative pursuit problem by partitioning the problem space into Voronoi regions and developing a control strategy that allows each pursuer to occupy the regions, thus increasing the probability of capturing the evaders. One variation of the pursuit-evasion problem takes the form of a perimeter defense game, as described in [17]. The authors developed an algorithm to intercept a group of evaders trying to reach a target area. In their problem formulation, the pursuers are restricted to the perimeter of the target, whereas in our paper, the pursuers are free to move outside the perimeter of the target area, which is more practical. Another variation of the pursuit-evasion problem is described in [23], where policies are developed for two pursuers to intercept any incoming threat. The authors described several intercept strategies for
catching an evader. However, that work focuses on using only two pursuers, which may not be adequate to capture a superior evader. Another variation is described in [3], where the designed policy herds a swarm of evaders to a safe area: the authors developed a StringNet algorithm to herd the attackers to a safe area away from a target space. Apart from traditional control theory, multi-agent DRL is another way to solve the pursuit-evasion problem. Research in multi-agent reinforcement learning has been plentiful over the years. The authors in [21] highlighted the current state of multi-agent DRL and its challenges. Multi-agent DRL can solve not only pursuit-evasion games, but also, for example, swarm control [9,21] and cooperation tasks. Some of the challenges in applying multi-agent DRL include non-stationarity and scalability. The non-stationarity issue arises because an agent's action derived from its policy may affect other agents' actions, potentially leading to divergence during model training. The scalability issue arises because the action and state spaces increase exponentially with the number of agents, and thus do not scale to larger groups of agents. To address the scalability issue, the authors in [9] propose mean embeddings for defining the agent's observation so that training is invariant to the number of agents, whereas [8] used a self-attention-based model to aid scalability. In [22], a target prediction network (TP Net) is developed to predict an evader's location using long short-term memory (LSTM). In [4], which is the most relevant work compared to our paper, the authors provided a framework for training eight pursuers to capture one evader. The evader's goal is to avoid capture while moving around within a specific area. This differs from our scenario, where the evader's goal is to reach a predefined target area. Their main contribution is the evaluation of different factors that affect the pursuers' probability of capture, such as the evader's relative speed, the number of pursuers, and the radii of the pursuers and evader. They also used curriculum learning to train their agents. With curriculum learning, the agents learn to achieve successive tasks by first completing and building on easier tasks. This method has been shown to have good stability but does not address collisions between agents or obstacle avoidance. Our approach builds on this research and addresses inter-agent collision and obstacle avoidance. To train the deep neural networks serving as policies in multi-agent reinforcement learning, several methods have been proposed. For example, [6] applied trust region policy optimization to train the deep neural networks, whose parameters are shared among agents. In [12], the authors created a variant of DDPG for multi-agent systems, where the agents share the action-value function for learning and act in a decentralized fashion during execution. To achieve good training performance, a reward function mapping agents' states to rewards must be provided; [20] uses a sparse reward to enable pursuers to reach evaders, where at each successful capture the pursuers receive a reward based on the number of pursuers within the capture distance of the evader.
2 Problem Formulation
As shown in Fig. 1, there is a group of N pursuers tasked with protecting a target area T from a group of M evaders in an alarmed area W, where T ⊂ W. The positions of the pursuers and evaders at each time step are denoted as P_p^i(t) ∈ R² (i = 1, ..., N) and P_e^j(t) ∈ R² (j = 1, ..., M), respectively, where i and j are the indices of the pursuers and evaders, t is the current time step, P_p^i(t) and θ_p^i(t) are the 2D position and heading of pursuer i, and P_e^j(t) and θ_e^j(t) are the 2D position and heading of evader j. Define x_p^i(t) and x_e^j(t) as the state vectors of pursuer i and evader j, where x_p^i(t) = (P_p^i(t), θ_p^i(t)) and x_e^j(t) = (P_e^j(t), θ_e^j(t)). If the unicycle model is applied, then the dynamics of the pursuer and evader are given by

$$\frac{d}{dt} x_p^i(t) = \begin{bmatrix} \cos\theta_p^i(t) & 0 \\ \sin\theta_p^i(t) & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_p^i(t) \\ \omega_p^i(t) \end{bmatrix} \quad \text{and} \quad \frac{d}{dt} x_e^j(t) = \begin{bmatrix} \cos\theta_e^j(t) & 0 \\ \sin\theta_e^j(t) & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_e^j(t) \\ \omega_e^j(t) \end{bmatrix}, \quad (1)$$
respectively, where v_p^i(t) and ω_p^i(t) are the wheel velocity and angular velocity of pursuer i at time step t, and v_e^j(t) and ω_e^j(t) are the wheel velocity and angular velocity of evader j at time step t. The feasible values are v_min ≤ v_p^i(t), v_e^j(t) ≤ v_max and ω_min ≤ ω_p^i(t), ω_e^j(t) ≤ ω_max. In addition, there are K non-cooperating interacting entities (NCIEs) moving in the area W; denote P_o^k(t) ∈ R² (k = 1, ..., K) as the position of NCIE k at time step t. The NCIEs move throughout area W with constant speeds and no rotational components, i.e.,

$$\dot{x}_o^k(t) = c_x^k, \quad \dot{y}_o^k(t) = c_y^k, \quad \theta_o^k(t) = \arctan\left(c_y^k, c_x^k\right), \quad (2)$$

where c_x^k and c_y^k represent the constant velocity of NCIE k in the x and y directions, respectively. Here, the constant-speed setup for an NCIE is used to emulate the trajectory of the NCIE. Although these NCIEs, such as aircraft flying near busy airports or a flock of birds, may have variable speeds that would require much more sophisticated control of the pursuers to avoid collisions, we adopt a simplified but reasonable strategy in which each pursuer ensures a safe distance to each NCIE to avoid collision. Here, the safety distance is calculated based on the worst-case scenario in which two entities fly toward each other at their maximum velocities. Hence, the constant/variable speed setup for an NCIE does not affect our proposed method, described later. Note that for future reference we will drop the indices (i, j, k) and t when not applicable. In this paper, we assume that all entities are at the same height and therefore focus only on 2-dimensional space, leaving 3-dimensional C-UAS for future research. Also, restricting the UAS to nonholonomic constraints is feasible for fixed-wing UAVs operating in a planar space W.
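For concreteness, here is a forward-Euler integration of the unicycle dynamics in Eq. (1); this is a sketch, and the paper does not specify its integration scheme.

import math

def unicycle_step(x, y, theta, v, omega, dt):
    """Advance a unicycle state (Eq. (1)) by one step of length dt."""
    return (
        x + dt * v * math.cos(theta),
        y + dt * v * math.sin(theta),
        theta + dt * omega,
    )

# Example: a pursuer at the origin turning left at 0.5 rad/s while moving at 2 m/s
print(unicycle_step(0.0, 0.0, 0.0, 2.0, 0.5, 0.1))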
The goal of the evaders is to enter the target area T while evading capture. Evader j has successfully penetrated the target space if ‖P_e^j(t) − P_T‖ ≤ δ_T, where P_T and δ_T are the center position and radius of the target area T, respectively, and ‖·‖ is the Euclidean norm. The goal of the pursuers is to capture the evaders before they enter the target area. Evader j is successfully captured by pursuer i if ‖P_p^i(t) − P_e^j(t)‖ ≤ δ_c, where δ_c is a capture radius around the evader. Moreover, the pursuers cannot enter the target area T, which is motivated by the fact that in the real world some agents cannot enter crowded, closed areas in order to avoid complicated events. In addition, a collision happens between two pursuers i and i′ once ‖P_p^i(t) − P_p^{i′}(t)‖ ≤ δ_p, where δ_p is the safety distance to avoid a collision between two pursuers. Similarly, a collision between pursuer i and NCIE k occurs if ‖P_p^i(t) − P_o^k(t)‖ ≤ δ_o, where δ_o is the safety distance between a pursuer and an NCIE. We disregard collisions between two NCIEs as well as between two evaders, as we assume that the evaders/NCIEs are well-coordinated to prevent collisions. Specifically, the initial positions of pursuers and evaders are always within and outside of the alarmed area W, respectively. In our scenario, the evader's action, i.e., a_e^j(t) = (v_e^j(t), ω_e^j(t)), is dictated by a virtual attractive force defined in [23]. To define the virtual attractive force, we first convert the action of the evader from a unicycle model to a single integrator using [5]. An integrator model takes as input

$$\mathbf{v}_e^j(t) = \left( v_{x,e}^j(t),\ v_{y,e}^j(t) \right), \quad (3)$$

with v_{x,e}^j, v_{y,e}^j being the current speed of the evader in the x and y directions, respectively. Alternatively, the velocity vector for the evader can be defined as

$$\mathbf{v}_e^j(t) = \bar{v}_e^j\, \hat{\mathbf{d}}_e^j(t), \quad (4)$$
j de (t)
= −kr
N
1
i=0
Pip (t) − Pe (t) α
j
i (t) + Pˆp,e
K
1 j
k=0
k (t) Pˆo,e
Pko (t) − Pe (t) α 1 + ka j PˆT,e (t) , Pe (t) − P T
(5)
where the first two summations represent the repulsive force pushing the evader away from the pursuers and NCIEs, and the last summation represents an attractive force that pulls the evader towards the target space area T . Here, kr and k a are the repulsive and attractive force gains, respectively, and α determines if the j forces are distance dependent. In our simulation, we set α equal to 1. Since ve (t)
400
J.-E. Pierre et al.
is the action for the single integrator model, we use [5] to convert it to action j ae (t) of the unicycle model. The values kr and k a are optimized by extensive evaluation to maximize the evader entering the alarmed space while avoiding collision with the pursuers and NCIEs.
3
Multi-agent Deep Reinforcement Learning for C-UAS
As described in [7], the C-UAS problem can be described as a multi-agent Markov Decision Process (MAMDP) formally given as a tuple (N, S, A, R, F , γ). Here, 1. N represents the number of agents in terms of the pursuers in an environment. 2. S is the state space of the environment. Here, a state observed by pursuer i at time step t, denoted as sit ∈ S, comprises 1) the states with respect to pursuer i itself, i.e., v pi (t) , ωip (t) , di, T (t) , θ i, T (t) , where v pi (t) and ωip (t) are the wheel velocity and angular velocity of pursuer i at time step t, and di, T (t) and θ i, T (t) are the distance and angle between pursuer i and the center of the target space T , respectively; 2) the states of other evaders with respect to pursuer i, i.e., di, j , θ i, j , di, T , θ j, T , where di, j is the distance between pursuer i and evader j, and θ i, j and θ j, T are the angles between evader j and pursuer i and between evader j and center of the target space T, respectively; 3) the states of other NCIEs with respect to pursuer i, i.e., di,k , θ i,k , where di,k and θ i,k are the distance and angle between pursuer i and NCIE k. Here, we assume each pursuer knows the states of the other pursuers, evaders, and NCIEs. This may not be realistic in the real world but can be obtained by using lidar or vision-based sensors. 3. A is the joint action of all the N agents, i.e., A = A1 × . . . × A N . Let ati be i the action of agent i at time step t (where i at i∈ Ai ), which is the velocity and i angular velocity of pursuer i, i.e., at = v p, ω p . 4. R is the reward function. The goal of each agent is to maximize its cumulative reward Ri = Tt=0 γ t rti , where γ ∈ [0, 1] is the discount factor, T is the length of one episode, and rti is the reward function of agent i at time slot t, i.e., ⎧ ⎪ 1, ⎪ ⎪ ⎪ ⎪ ⎪ −1, ⎪ ⎪ ⎪ ⎪ ⎨ −10, ⎪ rti = −0.10, ⎪ ⎪ ⎪ ⎪ ⎪ −0.15, ⎪ ⎪ ⎪ ⎪ di, j ⎪ , ⎪ −β dmax ⎩
j
||Pip (t) − Pe (t)|| ≤ δc, j ||Pe (t) − P T || ≤ δ T , ||Pip (t) − P T || ≤ δ T , Pip (t) − Pip (t) ≤ δp, Pip (t) − Pko (t) ≤ δo,
(6)
otherwise,
where rti = 1 if the evader is captured, rti = −1 if the evader reaches the target space, rti = −10 if it pursuer j enters the target space, rti = −0.10 if i the pursuer collides with each other, rt = −0.15 if the pursuer collides with d
i, j an NCIE in the environment. β dmax is a negative reward calculated based on the distance between the pursuer and the evader. β determines the weight
Multi-agent Deep RL for C-UAS
401
of the penalty, di, j is the Euclidean distance to the evader and dmax is the maximum possible distance of the environment. 5. F : S × A → S defines the state transition probability density function that maps the current states and actions into the next states. 3.1
Proximal Policy Optimization Based Advantage Actor Critic
In our multi-agent C-UAS environment, each pursuer interacts with the environment, receives a reward rti , and proceeds into a new state st+1 . During each time step t, the pursuer takes an action ati ∈ A based on a state st ∈ S. The goal of the pursuer, denoted as G, is to maximize its cumulative discounted reward in the environment following a policy π, which maps state s to action ati [13], i.e., ∞ t (7) γ r (st , at , st+1 ) |at = π (·|st ) , G = Eπ,s0 t=0
where γ ∈ [0, 1] is the discount factor and s0 is the initial state of the environment. In policy-based RL algorithms, we can choose a parameterized πφ (at |st) to maximize the cumulative discounted reward G(φ) = policy t r(s , a ; φ) . Many solutions have been designed to efficiently solve the γ E ∞ t t t=0 MAMDP problem [13], and motivated by [16], we apply the Advantage ActorCritic (A2C) method to the problem. A2C is a DRL method combining policybased and value-based reinforcement learning. In A2C, there are two neural networks, i.e., the actor and critic networks. The actor network provides the stochastic policy to choose the actions such that the expected cumulative reward G is maximized. Here, we apply Proximal Policy Optimization (PPO) [16], which claims to outperform other policy-based methods in continuous control environments [16]. Denote πφ (at |st ) as the parameterized policy for PPO, where φ is the set of weights. The loss function in PPO is ˆ t min(rt (φ) Aˆt , clip(rt (φ), 1 − , 1 + ) Aˆt ) , L CLIP (φ) = E (8) where is the clipping ratio, rt (φ) is the ratio between current policy πφ (at |st ) π (a |s ) and πφold (at |st ) i.e., rt (φ) = πφ φ (at t |st t ) , Aˆt in Eq. (8) is the Advantage function. In old our implementation, Aˆt is calculated using Generalized Advantage Estimation [15], T AˆGAE(γ,λ) = (γλ)l δV (9) t+l . l=0
is the temporal difference (TD) residual error at time step t + l, i.e., Here, δV t+l δV t+l = −Vν (st+l ) + rt+l + γVν (st+l+1 ),
(10)
where Vν is the state value, which is estimated by the critic network, and ν indicates the set of the weights in the critic network. The objective of the critic
402
J.-E. Pierre et al.
network is to minimize the loss function L(ν), which is defined as the mean square error between the estimated state-value and the expected cumulative reward, i.e., L (ν) = (rt + γVν (s t+1 ) − Vν (s t ))2 .
(11)
Note that λ in Eq. (9) (where 0 ≤ λ ≤ 1) is a hyperparameter that affects the bias and variance of Vν [15]. 3.2
Parameter Sharing
One of the major challenges of multi-agent RL is non-stationarity, where each agent trains its actor and critic networks based on its observations, thus causing the models to diverge. To overcome this issue, [6] proposes parameter sharing for cooperative model training in multi-agent systems. Specifically, the observations of each agent are shared to a centralized server as shown in Fig. 2, which is used to train and update the actor and critic networks based on Eq. (12). φ := φ − βa ∇φ L CLIP (φ), (12) ν := ν − βc ∇ν L (ν) ,
Fig. 2. The architecture of parameter sharing in multi-agent reinforcement learning.
Multi-agent Deep RL for C-UAS
403
Here, βa and βc are the learning rates for the actor and critic networks, respectively. The updated actor network will then be forwarded to all the agents to derive their actions based on their observations. Algorithm 1 summarizes Proximal Policy Optimization based Advantage Actor-Critic (PPO-A2C), which is implemented in RlLib [11], a RL application programming interface (API) for training multi-agent systems using reinforcement learning.
4
Simulation Results
We will conduct extensive simulations to validate the proposed PPO-A2C algorithm. The environment has an arena size of 80 m × 60 m, where there are four pursuers to capture one evader with five NCIEs navigating. The captured and collision radii for pursuers/evader/NCIEs (i.e., δc /δ p /δo ) are set to 1 m. The maximum speeds of the pursuers and evader are the same 10 m/s, while the NCIEs move with a constant speed of 1 m/s. At each episode, the pursuers are initialized in a random location inside the circular alarmed area with a radius of 30 m but outside the circular target area with a radius of 5 m. Meanwhile, the evader is randomly initialized outside the alarmed area and the NCIEs are initialized with a random location inside the environment and heading θ. The length of each episode is 40 s of simulation time, and the episode ends if the evader is captured by any pursuers or if any pursuers enter the protected space. Figure 3 shows a still frame of the pursuers capturing the evader. For additional full-length videos, please visit our shared folder on Google Drive1 . The trace blue lines show the trajectory used by the pursuers to reach the evader. The hyperparameters for PPO-A2C are listed in Table 1. Also, the simulation is conducted on a computer with an Intel Core i7 processor with 12 cores, 16 GiB memory, and GPU Nvidia GeForce RTX 2070 graphics card for 20 hours. Figure 4 shows the learning curves for the pursuers by applying the proposed PPO-A2C and the evader based on Eq. (5), respectively. From Fig. 4 we can derive that the A2C
Algorithm 1. PPO-A2C Input: initial policy parameters φ0 , initial value function parameters ν0 for iteration = 1, 2, · · · do for each agent i = 1, 2, · · · , N do Run actor network to derive the related action ati based on the observation; i ; Interact with the environment to obtain rti and st+ 1 i i i i Transmit st , at , rt , st+1 to the centralized server; end for Update φ and ν based on Eq. (12) via mini-batch SGD; Forward the updated actor network to all the agents; end for 1
https://drive.google.com/drive/folders/1GNMzj1GrD3Ov8oOhKUZCorw5D4Xpg2ul? usp=sharing.
404
J.-E. Pierre et al.
models in the pursuers are finally converged. In contrast, the evader’s normalized reward decreases as the pursuers learn an optimal policy. We compare the performance of the proposed PPO-A2C to a baseline method, i.e., artificial Table 1. Hyperparameters potential function pursuers (APFP), where each pursuer uses the artificial potential field function Parameters Values to determine its action to capture the evader. γ 0.99 Also, we apply the Multi-Agent Reinforcement λ 0.95 Learning Pursuers (MARLP), which also uses Minibatch size 4096 PPO-A2C, but pursuers neglect the existence of
0.2 NCIEs and there is no penalty for the pursuer to β a , βc 5 × 10−5 pursuer collisions. Note that for all the methods, the evader uses APF (i.e., Eq. (5)) to reach the target area, while trying to avoid the collisions. The metrics to evaluate the policy performance of PPO-A2C, APFP and MARLP are the capture probability of evaders, target collision rate, NCIE collision rate, and pursuer collision rate. The capture probability of evaders is defined as the number of episodes when the evader was captured divided by all the episodes. The target collision rate is the number of episodes when the pursuer enters the protected space divided by the total number of episodes. The NCIE collision rate is the number of collisions between pursuers and NCIEs divided by the total number of time steps. Lastly, the pursuer collision rate is defined as the number of collisions between pursuers divided by the total number of time steps.
Fig. 3. An example showing the trajectories of the evader and pursuers.
Fig. 4. Normalized rewards for the pursuer and evader during training.
Figure 5 shows the mean capture probability of the three methods over episodes: PPO-A2C has the highest mean capture probability, which demonstrates its effectiveness. MARLP has the lowest mean capture probability because it is trained in an environment without NCIEs and is thus unable to adapt to the environment with NCIEs, leading to many collisions. Figure 6
Fig. 5. Probability of capturing the evader over 400 episodes.
Fig. 6. Probability of capturing the evader as we vary the number of pursuers.
illustrates the mean capture probability of the three methods as the number of pursuers increases. From the figure, we can see that PPO-A2C generates efficient policies for the pursuers, achieving the highest capture probability when the number of pursuers is less than seven and a capture probability similar to APFP when the number of pursuers is seven or more. We also observe the target collision rate, pursuer collision rate, and NCIE collision rate for the three methods as we vary the number of pursuers. As shown in Fig. 7a, PPO-A2C always maintains the lowest target collision rate compared to the other two methods. In Fig. 7b, we find that PPO-A2C has a pursuer collision rate similar to APFP; interestingly, MARLP has the lowest pursuer collision rate. Lastly, in Fig. 7c, APFP maintains the lowest NCIE collision rate even as the number of pursuers increases, and PPO-A2C performs slightly worse, but not as badly as MARLP. Based on the above figures, we conclude that PPO-A2C achieves a comparably low collision rate to the repulsive algorithm while deriving a better policy that captures the evaders with a higher probability.
Fig. 7. Probability of different collisions. (a) Target collision rate. (b) Pursuer collision rate. (c) NCIE collision rate.
5 Conclusion
In this paper, we proposed PPO-A2C to tackle the counter-UAS scenario. PPO-A2C has been demonstrated to outperform the traditional APFP method against a smart evader. In addition, each agent can avoid obstacles and maintain a safe distance around the protected space. One limitation of our method is the lack of cooperative behaviors among the pursuers: during the simulations, the pursuers were observed to behave greedily when catching the evader. One possible way to overcome this issue is to design reward values that foster collaboration among the pursuers; we will tackle this limitation in future research. Furthermore, we plan to improve the proposed method so that each pursuer does not need the complete state information of the environment to train the models and derive its actions. Lastly, we will extend the proposed PPO-A2C method to capture multiple evaders.
Acknowledgements. This work is supported by Sandia National Laboratories, the North Atlantic Treaty Organization (NATO), and the National Science Foundation under Award CNS-2148178. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
References
1. Michel, A.H.: Counter-drone systems. http://dronecenter.bard.edu/counter-drone-systems/ (2018)
2. Çetin, E., Barrado, C., Pastor, E.: Counter a drone in a complex neighborhood area by deep reinforcement learning. Sensors 20(8), 2320 (2020)
3. Chipade, V.S., Marella, V.S.A., Panagou, D.: Aerial swarm defense by StringNet herding: theory and experiments. Front. Robot. AI 8, 81 (2021)
4. De Souza, C., Newbury, R., Cosgun, A., Castillo, P., Vidolov, B., Kulić, D.: Decentralized multi-agent pursuit using deep reinforcement learning. IEEE Robot. Autom. Lett. 6(3), 4552–4559 (2021)
5. Glotfelter, P., Egerstedt, M.: A parametric MPC approach to balancing the cost of abstraction for differential-drive mobile robots. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 732–737. IEEE (2018)
6. Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Sukthankar, G., Rodriguez-Aguilar, J.A. (eds.) AAMAS 2017. LNCS (LNAI), vol. 10642, pp. 66–83. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71682-4_5
7. Hasan, Y.A., Garg, A., Sugaya, S., Tapia, L.: Defensive escort teams for navigation in crowds via multi-agent deep reinforcement learning. IEEE Robot. Autom. Lett. 5(4), 5645–5652 (2020)
8. Hasan, Y.A., Garg, A., Sugaya, S., Tapia, L.: Scalable reinforcement learning policies for multi-agent control. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
9. Hüttenrauch, M., Šošić, A., Neumann, G.: Deep reinforcement learning for swarm systems. J. Mach. Learn. Res. 20(54), 1–31 (2019)
10. Kamimura, A., Ohira, T.: Group chase and escape. New J. Phys. 12(5), 053013 (2010)
11. Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning (2018)
12. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv:1706.02275 (2017)
13. Oroojlooy, A., Hajinezhad, D.: A review of cooperative multi-agent deep reinforcement learning. arXiv:1908.03963 (2019). https://doi.org/10.1007/s10489-022-04105-y
14. Pierson, A., Wang, Z., Schwager, M.: Intercepting rogue robots: an algorithm for capturing multiple evaders with multiple pursuers. IEEE Robot. Autom. Lett. 2(2), 530–537 (2016)
15. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438 (2015)
16. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017)
17. Shishika, D., Paulos, J., Kumar, V.: Cooperative team strategies for multi-player perimeter-defense games. IEEE Robot. Autom. Lett. 5(2), 2738–2745 (2020)
18. Terry, J.K., Grammel, N., Hari, A., Santos, L., Black, B.: Revisiting parameter sharing in multi-agent deep reinforcement learning (2021)
19. Wang, J., Liu, Y., Song, H.: Counter-unmanned aircraft system(s) (C-UAS): state of the art, challenges, and future trends. IEEE Aerosp. Electron. Syst. Mag. 36(3), 4–29 (2021)
20. Yu, C., Dong, Y., Li, Y., Chen, Y.: Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit. J. Eng. 2020(13), 499–504 (2020)
21. Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: a selective overview of theories and algorithms. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds.) Handbook of Reinforcement Learning and Control. SSDC, vol. 325, pp. 321–384. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-60990-0_12
22. Zhang, R., Zong, Q., Zhang, X., Dou, L., Tian, B.: Game of drones: multi-UAV pursuit-evasion game with online motion planning by deep reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems (2022)
23. Zhang, S., Liu, M., Lei, X., Yang, P., Huang, Y., Clark, R.: Synchronous intercept strategies for a robotic defense-intrusion game with two pursuers. Auton. Robot. 45(1), 15–30 (2021)
24. Zhou, Z., et al.: Cooperative pursuit with Voronoi partitions. Automatica 72, 64–72 (2016)
Decentralized Risk-Aware Tracking of Multiple Targets

Jiazhen Liu1(B), Lifeng Zhou1, Ragesh Ramachandran2, Gaurav S. Sukhatme2, and Vijay Kumar1

1 University of Pennsylvania, Philadelphia, PA 19104, USA
{jzliu,lfzhou,kumar}@seas.upenn.edu
2 University of Southern California, Los Angeles, CA 90007, USA
{rageshku,gaurav}@usc.edu
Abstract. We consider the setting where a team of robots is tasked with tracking multiple targets with the following property: approaching the targets enables more accurate target position estimation, but also increases the risk of sensor failures. Therefore, it is essential to address the trade-off between tracking quality maximization and risk minimization. In previous work [1], a centralized controller was developed to plan motions for all the robots – however, this is not a scalable approach. Here, we present a decentralized and risk-aware multi-target tracking framework, in which each robot plans its motion trading off tracking accuracy maximization and aversion to risk, while relying only on its own information and information exchanged with its neighbors. We use control barrier functions to guarantee network connectivity throughout the tracking process. Extensive numerical experiments demonstrate that our system can achieve tracking accuracy and risk-awareness similar to its centralized counterpart.

Keywords: distributed robotics · multi-target tracking · risk-awareness · connectivity maintenance

1 Introduction
Tracking multiple targets using a decentralized robot team finds applications in many settings, e.g., tracking biochemical or nuclear hazards [2] or hot-spots in a fire [3]. In these settings, approaching the targets closely generally produces more accurate state estimates, but this comes at a cost – the closer the team of robots comes to the targets, the more likely it is that they experience sensor failures or other system faults due to, e.g., radiation or heat emitted by the targets. In certain cases, the targets may even be actively hostile. The challenge is to maximize tracking accuracy while avoiding risky behavior on the part of the robot team. A common strategy for achieving team coordination is to devise a central controller to make team-level decisions for all robots [4]. This relies on the assumption that all robots have a sufficiently large communication range to stay connected with a central station. In real applications, robots may have a limited communication range. A decentralized approach – where robots only communicate with nearby team members – is a natural choice [5,6]. In a decentralized strategy, each robot makes its own decisions according to local information gathered by itself and its peers within a certain communication range. If such a decentralized strategy is to achieve the same level of task execution as a centralized system, it is essential for it to maintain network connectivity, so that robots can share information mutually and make the most use of their resources.

(G.S. Sukhatme holds concurrent appointments as a Professor at USC and as an Amazon Scholar. This paper describes work performed at USC and is not associated with Amazon.)

Motivated by these issues, we present a decentralized risk-aware multi-target tracking framework. Our contributions are as follows:

1. Our system is risk aware – it automatically trades off tracking quality and safety;
2. Our system is fully decentralized – each robot makes decisions relying on information sharing between neighbors within a communication range. We apply the control barrier function (CBF) in a decentralized manner to guarantee network connectivity maintenance and collision avoidance;
3. Our system produces comparable results to its centralized counterpart – we extensively evaluate the performance of our system through simulated experiments in the Gazebo environment under various settings, and compare it to its centralized counterpart from our previous work [1]. We present qualitative demonstrations to provide an intuitive illustration. Quantitative results on metrics including tracking error and risk level are also exhibited.
2 Related Work
Multi-robot multi-target tracking is usually formulated as an iterative state estimation problem [7–12]. Various methods have been proposed for this; most of them design different flavors of filters to reduce the uncertainty in target pose estimation during the tracking process [1,7,13,14]. In the more challenging scenario where the targets may adversarially cause damage to the robots, safety becomes a major concern when the robot team plans its motion. To incorporate risk-awareness into a multi-robot system, [15] uses safety barrier certificates to ensure safety. With a special emphasis on tracking and covering wildfire, [3] encodes hazards from fire using an artificial potential field. Mutual information is used in [2] to build a probabilistic model of the distribution of hazards in the environment. Considering that a multi-robot system controlled by a central planner does not scale well [1], efforts have been made to decentralize the multi-target tracking task. For example, [16] adopts the local algorithm for solving the max-min problem to solve multi-target tracking in a distributed manner. In [17], a decentralized MPC-based strategy is proposed that generates collision-free trajectories. In [18], the general problem of multi-robot coordination while preserving connectivity is
addressed. As it points out, a frequently adopted method to coordinate multiple robots in order to achieve a common goal is to use a consensus-based estimation algorithm. To enable multi-robot multi-target tracking in a decentralized framework, it is typically required that the connectivity of the underlying network graph is maintained. To do this, [19] formulates the connected region as a safety set and uses a CBF to render it forward invariant such that the network will always remain connected as long as the robots are initialized properly. For a systematic elaboration of the theory of CBF and its relationship with Lyapunov control, see [20]. In [21], a discussion of using graph spectral properties such as the algebraic connectivity to ensure that the network is connected is given. In [18], the connectivity maintenance problem is approached by assigning weights to the edges and designing control strategies accordingly.
3 Problem Formulation

3.1 Notation
We use capital letters in boldface (e.g., A) to denote matrices, lowercase letters in boldface (e.g., x) to denote vectors, and lowercase letters in regular font (e.g., q) to denote scalars. $\mathbf{1}_{M \times N}$ denotes an all-one matrix of shape $M \times N$. $\mathbf{I}_M$ is the identity matrix of shape $M \times M$. $\mathbf{M}_1 \otimes \mathbf{M}_2$ gives the Kronecker product of two matrices $\mathbf{M}_1$ and $\mathbf{M}_2$. $[\mathbf{v}_1; \mathbf{v}_2; \cdots; \mathbf{v}_n]$ represents vertical vector concatenation; the vertical concatenation of matrices or scalars is defined in the same fashion. $\mathrm{Diag}([\mathbf{M}_1, \mathbf{M}_2, \cdots, \mathbf{M}_n])$ represents diagonal matrix construction with constituents $\mathbf{M}_1, \mathbf{M}_2, \cdots, \mathbf{M}_n$ placed along the diagonal. $\mathrm{Tr}(\mathbf{M})$ is the trace of matrix $\mathbf{M}$. We use $\|\cdot\|$ to denote the 2-norm for vectors. For a set $S$, $|S|$ returns its cardinality. $[z]$ denotes the set $\{1, 2, \cdots, z\}$ for an arbitrary integer $z \in \mathbb{Z}^+$. $\mathcal{N}(0, \mathbf{Q})$ represents the Gaussian distribution with zero mean and covariance matrix $\mathbf{Q}$.

3.2 Robot and Target Modeling
The multi-target tracking task is assigned to a team of $N$ robots, indexed by $R := [N]$. For each robot $i \in [N]$ with state $x_i \in \mathcal{X} \subseteq \mathbb{R}^p$ and control input $u_i \in \mathcal{U} \subseteq \mathbb{R}^q$, its motion is modeled by the control-affine dynamics:

$$\dot{x}_i = f(x_i) + g(x_i)u_i \tag{1}$$

where $f: \mathbb{R}^p \to \mathbb{R}^p$ and $g: \mathbb{R}^p \to \mathbb{R}^{p \times q}$ are continuously differentiable vector fields. The robot team is equipped with a set of sensors that are heterogeneous in terms of their sensing ability. Suppose there are in total $U$ different sensors indexed by $[U]$. We encode the measurement characteristic of sensor $k \in [U]$ using a vector $h_k$. At time step $t$, we denote the sensor status, i.e., whether each sensor is functioning or damaged, for robot $i$ as:

$$\Gamma_k^{i,t} = \begin{cases} 1, & \text{if sensor } k \text{ on robot } i \text{ is functioning well at time } t; \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$
Based on this, $\Gamma_t$ denotes the sensor status of the whole robot team at time $t$. The set of sensors available at time $t$ for robot $i$ is denoted as $\gamma_{i,t} = \{k \in [U] \mid \Gamma_k^{i,t} = 1\}$. Combining the sensor status with the measurement characteristic of each sensor, the measurement matrix for robot $i$ taking measurements of target $j$ is $H_{ij,t} = [h_{\gamma_{i,t}(1)}; h_{\gamma_{i,t}(2)}; \cdots; h_{\gamma_{i,t}(|\gamma_{i,t}|)}]$, and the measurement matrix for robot $i$ measuring all $M$ targets is $H_{i,t} = \mathbf{I}_M \otimes H_{ij,t}$. Without loss of generality, we assume that all the robots have a homogeneous maximum communication range $R_{comm}$, as heterogeneity in the robots' communication ability is beyond the scope of this paper and is left for future exploration. For robot $i$, we define the region where other robots can exchange information with it as $\chi_i := \{x \in \mathcal{X} \mid \|x - x_i\| \le R_{comm}\}$. We refer to the robots that are within $\chi_i$ as robot $i$'s neighbors. A robot is only able to communicate with its neighbors. The list of indices of robot $i$'s neighbors is defined as $\mathcal{N}_i := \{j \in [N], j \ne i \mid x_j \in \chi_i\}$. Note that $\mathcal{N}_i$ is time-varying for each robot $i \in R$ as the robots actively reconfigure their positions to track the targets.

Let the $M$ targets be indexed by $T := [M]$. Denote the true state of target $j$ as $z_{j,t} \in \mathcal{X} \subseteq \mathbb{R}^p$, and its control input as $v_{j,t} \in \mathcal{V} \subseteq \mathbb{R}^q$, both at time step $t$. The dynamics of the targets follow:

$$z_{t+1} = A z_t + B v_t + w_t \tag{3}$$

where $A$ and $B$ are the process and control matrices, respectively. $z_t$ and $v_t$ are obtained by stacking $z_{j,t}$ and $v_{j,t}$, $\forall j \in T$ vertically. $w_t \sim \mathcal{N}(0, Q)$ is the Gaussian process noise with covariance matrix $Q$. $A$, $B$ and $Q$ are assumed to be known to the robots. Robot $i$'s measurement of the targets with noise $\zeta_{i,t} = [\zeta_{i1,t}; \zeta_{i2,t}; \cdots; \zeta_{iM,t}]$ is modeled by:

$$y_{i,t} = H_{i,t} z_t + \zeta_{i,t}. \tag{4}$$
We let $\zeta_{ij,t} \sim \mathcal{N}(0, R_{ij,t})$, $\forall j \in T$, where the measurement noise covariance matrix $R_{ij,t}$ grows exponentially as the distance between robot $i$ and target $j$ increases,

$$R_{ij,t}^{-1} = \mathrm{Diag}([R_{ij1,t}^{-1},\ R_{ij2,t}^{-1},\ \cdots,\ R_{ij|\gamma_{i,t}|,t}^{-1}]) \tag{5}$$

with the term $R_{ijk,t}^{-1}$ relying on a particular sensor $k \in \gamma_{i,t}$ being:

$$R_{ijk,t}^{-1} = w_k \exp(-\lambda_k \|x_{i,t} - z_{j,t}\|) \tag{6}$$

where $w_k$ and $\lambda_k$ are parameters characterizing the noise of sensor $k \in \gamma_{i,t}$. The target risk field $\phi$, which encodes the probability that a robot experiences sensor failures induced by the targets, inherits the same definition as our previous work [1, Sect. III-E, V]. In particular, the risk from target $j$ acting on all robots is:

$$\phi_j(x) = \frac{c_j}{2\pi \sqrt{|\Sigma_j|}} \exp\left(-\frac{1}{2}(x - z_j)^T \Sigma_j^{-1} (x - z_j)\right) \tag{7}$$

where $c_j$ and $\Sigma_j$ denote the peak value and covariance for the risk field of target $j$, respectively. Correspondingly, the safety field is defined as $\pi_j(x) = 1 - \phi_j(x)$, $\forall j \in T$.
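To make the sensing model concrete, the following minimal Python sketch assembles a robot's measurement matrix from its surviving sensors and evaluates the distance-dependent noise precision and risk field of Eqs. (2) and (5)–(7). All numerical values (the sensor characteristics $h_k$, the noise parameters $w_k, \lambda_k$, and the positions) are illustrative assumptions, not values from the paper.

```python
import numpy as np

h = {1: np.array([[1.0, 0.0]]),      # sensor measurement characteristics
     2: np.array([[0.0, 1.0]]),
     3: np.array([[1.0, 1.0]])}
Gamma_i = {1: 1, 2: 0, 3: 1}         # sensor 2 has failed (Eq. (2))
gamma_i = [k for k, ok in Gamma_i.items() if ok]

H_ij = np.vstack([h[k] for k in gamma_i])   # per-target measurement matrix
M = 2                                       # number of targets
H_i = np.kron(np.eye(M), H_ij)              # H_{i,t} = I_M (x) H_{ij,t}

w = {1: 1.0, 2: 1.0, 3: 0.5}                # noise parameters per sensor
lam = {1: 0.3, 2: 0.3, 3: 0.5}
x_i = np.array([0.0, 0.0])
z_j = np.array([2.0, 1.0])
# Eq. (6): precision of sensor k decays with distance to target j.
prec = [w[k] * np.exp(-lam[k] * np.linalg.norm(x_i - z_j)) for k in gamma_i]
R_inv_ij = np.diag(prec)                    # Eq. (5)

# Eq. (7): Gaussian risk field of target j (c_j: peak scale, Sigma_j: cov).
c_j, Sigma_j = 1.0, np.eye(2)
def phi_j(x):
    diff = x - z_j
    return (c_j / (2 * np.pi * np.sqrt(np.linalg.det(Sigma_j)))
            * np.exp(-0.5 * diff @ np.linalg.solve(Sigma_j, diff)))

print("H_i shape:", H_i.shape, " risk at origin:", round(phi_j(x_i), 4))
```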
3.3 Problem Definition
We focus on planning the motion of each robot in the team to obtain an accurate estimate of the targets' positions while at the same time avoiding risk, in a decentralized manner at each time step $t$ throughout the tracking process. Specifically, for each robot $i \in R$, the goal is to solve the following optimization program:

$$\begin{aligned}
\min_{x_i,\, \delta_{i1},\, \delta_{i2}} \quad & q_1 \eta_i \|\delta_{i1}\|^2 + q_2 \frac{1}{1 + \eta_i} \delta_{i2}^2 && \text{(8a)} \\
\text{s.t.} \quad & \mathrm{Tr}(P_{ij}) \le \rho_{i1,j} + \delta_{i1,j},\ \delta_{i1,j} \ge 0,\ \forall j \in T; && \text{(8b)} \\
& \mathrm{Tr}(O_{i,\Pi}^{-1}) \le \rho_{i2} + \delta_{i2},\ \delta_{i2} \ge 0; && \text{(8c)} \\
& |\mathcal{N}_i| > 0; && \text{(8d)} \\
& \|x_i - \bar{x}_i\| \le d_{max}; && \text{(8e)} \\
& \|x_i - x_l\| \ge d_{min},\ \forall l \in R,\ l \ne i. && \text{(8f)}
\end{aligned}$$
The time index $t$ is dropped here for brevity of notation. $x_i$ is the optimization variable representing robot $i$'s position at the next step, while $\bar{x}_i$ refers to robot $i$'s current position. The cost function Eq. (8a) balances the competing objectives of high tracking quality (first term) and safety (second term), using the sensor margin $\eta_i$, which encodes the abundance of sensors available for robot $i$. The larger the value of $\eta_i$, the more powerful the sensor suite on robot $i$ is for observing the targets, and therefore the first term in the objective is stressed more. On the contrary, when the sensors on robot $i$ are limited, $\eta_i$ becomes small, enforcing the second term in Eq. (8a) to be stressed, prioritizing safety. $q_1, q_2$ are constants weighing the two objectives. The tracking error is represented using the trace of the target estimation covariance matrix $P_{ij}$ and is upper bounded by threshold $\rho_{i1,j}$, $j \in T$ to ensure tracking accuracy, as in Eq. (8b). Meanwhile, the risk of sensor failure is encoded by $\mathrm{Tr}(O_{i,\Pi}^{-1})$, where $O_{i,\Pi}$ is the risk-aware observability Gramian for robot $i$, defined similarly to the ensemble form in [1, Eq. (14)]. The risk level is upper bounded by a pre-specified threshold $\rho_{i2}$ to discourage highly risky behavior, as in Eq. (8c). $\delta_{i1,j}$, $\forall j \in T$ and $\delta_{i2}$ are slack variables, which enable "bend but not break" behavior when sensor failures render it impossible to meet the strict constraints of $\rho_{i1,j}$, $\forall j \in T$ and $\rho_{i2}$. See [1, Sect. IV-B, C, D] for more details on the design of the sensor margin and observability Gramian. Moreover, according to Eq. (8d), each robot should have at least one neighbor, so that the communication network formed by the robots remains connected. Note that the connectivity relationship is determined using $x_i$ rather than $\bar{x}_i$, to ensure that the robots are still connected at the next time step. Constraint Eq. (8e) specifies the maximum distance $d_{max}$ that robot $i$ can travel between two consecutive steps. Finally, Eq. (8f) is the collision avoidance constraint guaranteeing the minimum safety distance $d_{min}$ between two robots. Notice that the optimization program in Eq. (8) is a non-convex problem that is computationally difficult to solve. We discuss an approach to solve it in two stages in the next section.
4 Approach
In the proposed decentralized system, every robot relies on four major modules to perform the target tracking task, as illustrated in Fig. 1. At each time step t, every robot first generates its ideal goal state that it wants to reach without considering constraints including network connectivity and collision avoidance. Its decisions are only based on trading off tracking quality and risk aversion. This is performed by the Ideal Action Generation module, as introduced in Sect. 4.1. Sect. 4.2 corresponds to the Connectivity and Collision Avoidance module, which employs CBF to ensure network connectivity and to avoid inter-robot collisions when the robots move towards their respective goal positions. After each robot moves to its new position, the Decentralized Estimation module generates a new estimate of target positions and maintains common knowledge of such estimate among neighbors using the Kalman-Consensus Filtering algorithm in [22]. We introduce it in Sect. 4.3. Note that the Local Information Exchange module is responsible for information sharing within the neighborhood and all the other three modules rely on it.
Fig. 1. The proposed decentralized framework. Each robot performs the decentralized target tracking task using 4 modules: Local Information Exchange, Decentralized Ideal Action Generation, Decentralized Connectivity and Collision Avoidance, and Decentralized Estimation. The Local Information Exchange module takes charge of all communications between neighbors.
4.1 Decentralized Goal Position Generation
In this section, we focus on computing the “ideal” positions that robots want to reach at every time step, only out of the desire for higher tracking accuracy and aversion to overly risky behavior, without considering the connectivity maintenance and collision avoidance constraints. To this end, each robot i ∈ R solves a non-convex optimization program as follows:
$$\begin{aligned}
\min_{x_i,\, \delta_{i1},\, \delta_{i2}} \quad & q_1 \eta_i \|\delta_{i1}\|^2 + q_2 \frac{1}{1 + \eta_i} \delta_{i2}^2 && \text{(9a)} \\
\text{s.t.} \quad & \mathrm{Tr}(P_{ij}) \le \rho_{i1,j} + \delta_{i1,j},\ \delta_{i1,j} \ge 0,\ \forall j \in T; && \text{(9b)} \\
& \mathrm{Tr}(O_{i,\Pi}^{-1}) \le \rho_{i2} + \delta_{i2},\ \delta_{i2} \ge 0. && \text{(9c)}
\end{aligned}$$
This is a less constrained version of the optimization program in Eq. (8). Note that to achieve team-level coordination, the computation of $P_{ij}$ and $O_{i,\Pi}^{-1}$ requires not only robot $i$'s information, but also that of its neighbors, acquired through the Local Information Exchange module. Since each robot runs its own optimization program, all robots in the team are required to solve the optimization program in Eq. (9) over multiple rounds, until their ideal goal positions converge.
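The following sketch illustrates the structure of program (9) for a single robot, using scipy. Since the real $\mathrm{Tr}(P_{ij})$ and $\mathrm{Tr}(O_{i,\Pi}^{-1})$ come from the Kalman filter and the risk-aware observability Gramian of [1], they are replaced here by simple distance-based surrogates; those surrogates, and all parameter values, are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

targets = np.array([[2.0, 0.0], [0.0, 2.0]])  # true target positions (toy)
q1, q2, eta = 1.0, 10.0, 2.0                  # weights and sensor margin
rho1, rho2 = 0.5, 0.33                        # accuracy / risk thresholds

def track_err(x, z):          # grows with distance, mimicking Eq. (6)
    return 0.1 * np.exp(0.5 * np.linalg.norm(x - z))

def risk(x):                  # Gaussian risk field, mimicking Eq. (7)
    return sum(np.exp(-0.5 * np.linalg.norm(x - z)**2) for z in targets)

M = len(targets)
def unpack(v):                # v = [x(2), delta1(M), delta2(1)]
    return v[:2], v[2:2 + M], v[2 + M]

def cost(v):                  # Eq. (9a)
    _, d1, d2 = unpack(v)
    return q1 * eta * (d1 @ d1) + q2 * d2**2 / (1.0 + eta)

cons = []
for j in range(M):            # Eq. (9b): tracking error <= rho1 + delta1_j
    cons.append({"type": "ineq",
                 "fun": lambda v, j=j: rho1 + unpack(v)[1][j]
                        - track_err(unpack(v)[0], targets[j])})
cons.append({"type": "ineq",  # Eq. (9c): risk <= rho2 + delta2
             "fun": lambda v: rho2 + unpack(v)[2] - risk(unpack(v)[0])})

v0 = np.zeros(2 + M + 1)
bounds = [(None, None)] * 2 + [(0, None)] * (M + 1)   # slacks >= 0
res = minimize(cost, v0, bounds=bounds, constraints=cons, method="SLSQP")
print("ideal goal position:", res.x[:2])
```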
4.2 Decentralized Motion Synthesis
Given the generated ideal goal positions, we aim to steer the robots towards their goal positions while preserving the network connectivity and inter-robot safety distance. We achieve this by using control barrier functions. We consider the robots with control-affine dynamics $\dot{x} = f(x) + g(x)u$ as defined in Eq. (1), where $f(x), g(x)$ are locally Lipschitz functions. We let $C$ be the region within which we want the robot team to stay. Concretely, we define $C$ as the superlevel set of a continuously differentiable function $h(X)$: $C = \{X \in \mathbb{R}^p \mid h(X) \ge 0\}$. From [20], we know that if we can select a feasible $u \in \mathbb{R}^q$ such that

$$L_f h(X) + L_g h(X) u + \alpha(h(X)) \ge 0 \tag{10}$$

is satisfied, we can render $C$ forward invariant and ensure that if the robots initially start from an arbitrary state inside $C$, they will always remain within $C$. Note that $L_f$ and $L_g$ represent the Lie derivatives of $h(\cdot)$ along the vector fields $f$ and $g$, respectively. $\alpha(\cdot)$ is an extended class $\mathcal{K}$ function as introduced in [19]. Back to our case, to ensure network connectivity, we define $h(\cdot)$ as:

$$h(x) = \lambda_2 - \epsilon \tag{11}$$

where $\lambda_2$ is the algebraic connectivity, which is the second smallest eigenvalue of the weighted Laplacian matrix of the robots' communication graph, and $\epsilon$ is the connectivity threshold adjusting the extent to which we want to enforce this connectivity constraint. A larger $\epsilon$ enforces network connectivity more strictly. The weighted Laplacian matrix is $L_t = D_t - A_t$, where $D_t$ and $A_t$ represent the weighted degree matrix and weighted adjacency matrix, respectively. Specifically, we adopt the same definition from [19, Eq. (5)] to define the adjacency $a_{il,t}$ between robots $i$ and $l$ at time $t$, using the distance $d_{il,t}$ between them. That is,

$$a_{il,t} = \begin{cases} e^{(R_{comm}^2 - d_{il,t}^2)/\sigma^2} - 1, & \text{if } d_{il,t} \le R_{comm}; \\ 0, & \text{otherwise,} \end{cases} \tag{12}$$

where $\sigma \in \mathbb{R}$ is a normalization constant. Based on these, we formulate the connectivity maintenance constraint using the CBF in Eq. (13c) and similarly formulate the collision avoidance constraint in Eq. (13d). With these constraints,
each robot $i \in R$ is steered towards its desired position by solving the following quadratically constrained quadratic program:

$$\begin{aligned}
\min_{u_i} \quad & \frac{1}{2}\|u_i - u_{des,i}\|^2 && \text{(13a)} \\
\text{s.t.} \quad & \|u_i \Delta t\| \le d_{max}; && \text{(13b)} \\
& \frac{\partial \lambda_2}{\partial \bar{x}_i} u_i + \lambda_2 - \epsilon \ge 0; && \text{(13c)} \\
& \frac{\partial d_{il}^2}{\partial \bar{x}_i} u_i + d_{il}^2 - d_{min}^2 \ge 0,\ \forall l \in \mathcal{N}_i, && \text{(13d)}
\end{aligned}$$

where $u_{des,i}$ denotes the ideal control input, which can be computed easily from the ideal goal position generated in Sect. 4.1. $\Delta t$ is the time interval between two consecutive steps. $\bar{x}_i$ is robot $i$'s state at the current step. $d_{il}$ is the distance between robots $i$ and $l$ calculated using their current states $\bar{x}_i$ and $\bar{x}_l$. The cost function in Eq. (13a) minimizes the discrepancy between robot $i$'s actual control input and its ideal one. Constraint Eq. (13b) specifies the maximum distance $d_{max}$ that each robot can move between consecutive steps. Moreover, Eqs. (13c) and (13d) are the connectivity maintenance and collision avoidance constraints, respectively, as introduced above. In the decentralized system, every robot solves the optimization program in Eq. (13) to generate its actual control input. Similar to Sect. 4.1, we adopt an iterative procedure where all the robots exchange information and solve the optimization program in Eq. (13) repeatedly, until their solutions $u_i, \forall i \in R$ converge.

From the computational perspective, each robot $i \in R$ needs to know $\lambda_2$ and $\frac{\partial \lambda_2}{\partial \bar{x}_i}$ for calculating Eq. (13c). In a centralized system, a central controller has
access to the information of every robot and can directly compute $\lambda_2$ and $\frac{\partial \lambda_2}{\partial \bar{x}_i}$. However, in the decentralized system, the robots need to compute them using local information. Inspired by [23, Sect. III-IV], we utilize the Decentralized Power Iteration (PI) algorithm, which enables each robot to compute a local approximation of $\lambda_2$ and $\frac{\partial \lambda_2}{\partial \bar{x}_i}$. Such an approximation is achieved through repeated local information exchange. Specifically, in each round of the information exchange, the message passed between neighbors is:
$$E_i = \{\nu_i,\ y_{i,1},\ w_{i,1},\ y_{i,2},\ w_{i,2}\} \tag{14}$$
where $\nu_i$ approximates the $i$-th element of the eigenvector $\nu$ corresponding to $\lambda_2$ of the weighted Laplacian matrix $L$. $y_{i,1}$ is robot $i$'s estimate of $\mathrm{Ave}(\{\nu_i\})$, $\forall i \in R$, i.e., the average of all entries in $\nu$. Similarly, $y_{i,2}$ is robot $i$'s estimate of $\mathrm{Ave}(\{\nu_i^2\})$, $\forall i \in R$, i.e., the average of the squares of all entries in $\nu$. $w_{i,1}$ and $w_{i,2}$ are variables introduced for computation purposes. [23] shows that the decentralized PI algorithm converges in a few iterations. Once it converges, every robot computes its local estimate of the algebraic connectivity $\lambda_2$ as:

$$\lambda_2^i = -\frac{\sum_{l \in \mathcal{N}_i} L_{il} \nu_l}{\nu_i} \tag{15}$$
Additionally, from [19, Eq. (8)], $\frac{\partial \lambda_2}{\partial \bar{x}_i}$ is computed as:

$$\frac{\partial \lambda_2}{\partial \bar{x}_i} = \sum_{l \in \mathcal{N}_i} \frac{\partial a_{il}}{\partial \bar{x}_i} (\nu_i - \nu_l)^2 \tag{16}$$

where $a_{il}$ is the weighted adjacency introduced in Eq. (12). In this way, $\frac{\partial \lambda_2}{\partial \bar{x}_i}$ is computed locally using the values of $\nu_i, \nu_l$ obtained from the decentralized PI algorithm. Substituting the approximations of $\lambda_2$ and $\frac{\partial \lambda_2}{\partial \bar{x}_i}$ into the optimization program in Eq. (13), each robot $i \in R$ can solve Eq. (13) to obtain its actual control input.
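Putting Sect. 4.2 together, the sketch below computes the weighted Laplacian from Eq. (12), the connectivity terms $\lambda_2$ and $\partial \lambda_2 / \partial \bar{x}_i$ (Eq. (16), with a finite-difference estimate of $\partial a_{il} / \partial \bar{x}_i$), and then solves the QP of Eq. (13) for one robot using cvxpy. For brevity, $\lambda_2$ is obtained by a centralized eigendecomposition – the quantity the decentralized PI algorithm approximates – and all positions and parameter values are hypothetical.

```python
import numpy as np
import cvxpy as cp

R_comm, sigma, eps = 7.0, 5.0, 0.25
d_max, d_min, dt = 0.5, 1.0, 0.1

def adjacency(xs):
    """Weighted adjacency per Eq. (12)."""
    N = len(xs)
    A = np.zeros((N, N))
    for a in range(N):
        for b in range(N):
            d = np.linalg.norm(xs[a] - xs[b])
            if a != b and d <= R_comm:
                A[a, b] = np.exp((R_comm**2 - d**2) / sigma**2) - 1.0
    return A

xs = np.array([[0.0, 0.0], [3.0, 0.0], [1.5, 2.5]])
A = adjacency(xs)
L = np.diag(A.sum(axis=1)) - A                # weighted Laplacian
eigvals, V = np.linalg.eigh(L)
lam2, nu = eigvals[1], V[:, 1]                # second-smallest pair

i = 0                                         # the robot solving its own QP
def grad_lambda2(i, h=1e-6):                  # Eq. (16), finite-diff of a_il
    g = np.zeros(2)
    for l in range(len(xs)):
        if A[i, l] > 0:
            for k in range(2):
                xp = xs.copy(); xp[i, k] += h
                da = (adjacency(xp)[i, l] - A[i, l]) / h
                g[k] += da * (nu[i] - nu[l])**2
    return g

u = cp.Variable(2)
u_des = np.array([0.3, -0.2])                 # from the ideal goal (Sect. 4.1)
cons = [cp.norm(u * dt) <= d_max,                       # Eq. (13b)
        grad_lambda2(i) @ u + lam2 - eps >= 0]          # Eq. (13c)
for l in range(len(xs)):                                # Eq. (13d)
    if l != i and A[i, l] > 0:
        diff = xs[i] - xs[l]
        cons.append(2 * diff @ u + diff @ diff - d_min**2 >= 0)

prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u - u_des)), cons)
prob.solve()
print("actual control input:", u.value)
```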
4.3 Decentralized Target Position Estimation
Whenever the robot team reaches a new state, each robot obtains new measurements of the targets and runs a Kalman Filter (KF) independently to generate new estimates of all targets' positions. Specifically, each robot $i \in R$ computes the a priori estimate $\hat{z}_{i,t|t-1}$ and covariance matrix $P_{i,t|t-1}$ during the prediction step as:

$$\hat{z}_{i,t|t-1} = A \hat{z}_{i,t-1|t-1}, \qquad P_{i,t|t-1} = A P_{i,t-1|t-1} A^T + Q, \tag{17}$$

where $\hat{z}_{i,t-1|t-1}$ and $P_{i,t-1|t-1}$ represent robot $i$'s estimate of the targets' positions and the corresponding covariance matrix at the previous time step, respectively. $A$ is the process matrix for targets, and $Q$ is the targets' process noise covariance matrix, both introduced in Sect. 3.2.

After robot $i \in R$ obtains a new measurement $y_{i,t}$ of the targets, the measurement residual $\tilde{y}_{i,t}$ and Kalman gain $K_{i,t}$ are calculated as:

$$\tilde{y}_{i,t} = y_{i,t} - H_{i,t} \hat{z}_{i,t|t-1}, \qquad K_{i,t} = P_{i,t|t-1} H_{i,t}^T S_{i,t}^{-1}, \tag{18}$$

where $S_{i,t} = H_{i,t} P_{i,t|t-1} H_{i,t}^T + R_{i,t}$, with $R_{i,t}$ being the measurement noise covariance matrix as introduced in Sect. 3.2. The new measurement is incorporated into robot $i$'s estimate during the update step of the KF:

$$\hat{z}_{i,t} = \hat{z}_{i,t|t-1} + K_{i,t} \tilde{y}_{i,t}, \qquad P_{i,t} = P_{i,t|t-1} - K_{i,t} S_{i,t} K_{i,t}^T. \tag{19}$$
l ∈Ni
Decentralized Risk-Aware Tracking of Multiple Targets
417
until the values of Ωit and qit converge. Note that π˜i,l is the weight between robot i and l, which is assigned using the method in [24]. Then, each robot uses Ωit and qit from the last round of the consensus algorithm to compute the updated estimates of target positions and the covariance matrix. Based on [24], it is guaranteed that once the consensus algorithm converges, each robot has the same estimates of targets’ positions as its neighbors.
5
Experiments
Through simulated experiments in the Gazebo environment, we show that the proposed system can track multiple targets in a decentralized, risk-aware, and adaptive fashion, while achieving performance comparable to its centralized counterpart. 5.1
Decentralized Tracking
We demonstrate that our decentralized target tracking system is: 1. Able to track multiple targets simultaneously, and has a reasonable division of labor among the robots; 2. Risk-aware and adaptive to the sensor configuration; 3. Able to remain connected. At the start of the target tracking process, each robot is equipped with three heterogeneous sensors which have the following measurement vectors: (21) h1 = 1 0 , h2 = 0 1 , h3 = 1 1 i.e., the first type of sensor has sensing ability in the x direction, the second type of sensor has sensing ability in the y direction, and the third type has both. Sensor failures are simulated randomly by flipping a biased coin, where the probability of failure is proportional to the total risk at each robot’s position induced by all the targets. Each robot’s random sensor failure is generated independently. For an illustration of the system performance, a demo of Gazebo simulations is available online1 , where five robots track three targets. Here, we present qualitative results for the setting where five robots track four targets. The initial sensor status as described in Eq. (2) is Γ0 = 15×3 . For each robot i ∈ {1, 2, · · · , 5}, we let the parameters to trade off between tracking quality maximization and risk minimization in Eq. (9a) be q1 = 3.0, q2 = 20.0. The maximum tracking error in Eq. (9b) is chosen as ρ i1 = [0.001; 0.001; 0.001; 0.001], and the maximum risk in Eq. (9c) is ρi2 = 0.15. The connectivity threshold in Eq. (13c) is chosen to be = 0.25. Figure 2 compares the behavior of robots under three different settings: riskagnostic with communication range Rcomm = 11.0 (Fig. 2a), risk-aware with 1
https://www.youtube.com/watch?v=AAOKDGqkuz8.
418
J. Liu et al.
Rcomm = 11.0 (Fig. 2b), and risk-aware with Rcomm = 7.0 (Fig. 2c). To simulate the risk-agnostic case, we remove constraint Eq. (9c) and the second term in the cost function Eq. (9a) for all robots. All robots are initially placed around the origin, and the targets move in concentric circles. In all three cases, the robots can move closer to the targets, so that they could obtain more accurate estimates of target positions. Meanwhile, these five robots spread out to track different targets. Figure 2a shows that without risk awareness, the robots directly approach the targets to get better estimates of the targets’ positions. In contrast, the robots keep a certain distance from the targets if the risk-aware constraint is added, as shown in Fig. 2b. Such risk-aware behavior helps the robot team preserve its sensor suite for a longer time. Figure 2c shows the behavior of the robot team when the communication range is relatively small. Compared to Fig. 2b, the robot team is more sparsely connected in this case.
Fig. 2. Comparison of the target tracking behaviors under three different settings. Grey circles represent the robots. We draw true targets’ positions and their trajectories in the latest 20 steps using red squares and red dotted lines. Correspondingly, blue squares and blue dotted lines represent the estimated targets’ positions and trajectories. The purple-shaded regions around the targets indicate the distance-dependent risk field. The green edge denotes that the robots on its two endpoints are connected. (Color figure online)
We further present quantitative results to show the effectiveness of risk awareness in the proposed decentralized system. We utilize two metrics, Tr(P) and η, which are the trace of the target state estimation covariance matrix and the sensor margin, respectively, both averaged across all robots. For both risk-aware and risk-agnostic cases, we run five trials and generate sensor failures randomly. Figure 3 shows the comparison for a 3-vs-3 case with ρ i1 = [0.01; 0.01; 0.01], ρi2 = 0.33, q1 = 1.0, q2 = 10.0, ∀i ∈ {1, 2, 3} in program (9), and = 0.25 in Eq. (13c). The initial sensor configuration is Γ0 = 13×3 . The communication range is Rcomm = 18.0 so that the robots can easily maintain connectivity. It can be observed that if we remove the risk-aware constraint Eq. (9c), the sensor margin of the system will decrease dramatically because of more sensor failures. As a result, the tracking error upsurges sharply as the increasing number of failures quickly renders the system non-observable.
Decentralized Risk-Aware Tracking of Multiple Targets
419
To demonstrate the system’s ability to trade off between tracking quality and risk aversion, we compare Tr(P) and Tr(O−Π1 ) with respect to the sensor margin η. The result is presented in Fig. 4. Note that, different from the above setting where the sensor margin is computed based on the available suite of sensors at each time step, here η is specified as a sequence of fixed values to observe the system’s behavior when the abundance of sensors is varying. As is expected, when the sensor margin η is large, there are abundant sensors in the team and the system prioritizes tracking accuracy; conversely, as η decreases, the system settles for a relatively higher tracking error to preserve its sensor suite. 5.2
Comparison to Centralized Target Tracking
We compare the proposed decentralized target tracking system with its centralized counterpart in terms of tracking error and sensor margin. The centralized tracking system used for comparison is from our previous work [1]. For a fair comparison, we use the same CBFs (Eq. (13c) and (13d)) for network connectivity maintenance and collision avoidance.
Fig. 3. Comparison of tracking error in Fig. 3a and sensor margin in Fig. 3b when the risk-aware constraint is vs is-NOT added to our decentralized target tracking framework with Rcomm = 18.0.
Fig. 4. The trade-off between tracking error and risk level with respect to the sensor margin. When η is small, the system prioritizes safety; as η increases, the system prioritizes higher tracking accuracy.
Fig. 5. Comparison of the centralized and decentralized systems under the same setting. We enforce the same sequence of sensor failures on both of them as denoted by the grey crosses. For the decentralized system, the Tr(P) and η shown are both obtained from averaging across all robots.
We consider the scenario where 4 robots are tasked to track 4 targets. In both the centralized and decentralized systems, we let $R_{comm} = 18.0$, $q_1 = 1.0$, $q_2 = 10.0$ in Eq. (9a), and the connectivity threshold in Eq. (13c) be $\epsilon = 0.25$. The maximum tracking error in Eq. (9b) is chosen as $\rho_{i1,j} = 0.01$, $\forall j \in \{1, 2, 3, 4\}$, and the maximum risk level in Eq. (9c) is chosen as $\rho_{i2} = 0.33$, for each robot indexed $i \in \{1, 2, 3, 4\}$. Initially, each robot is equipped with three sensors with measurement matrices $h_1, h_2, h_3$, respectively, i.e., the initial sensor configuration as described in Eq. (2) is $\Gamma_0 = \mathbf{1}_{4 \times 3}$. To conveniently compare their performance, we enforce the same sequence of pre-specified sensor failures instead of inducing sensor failures randomly. Over a total of 50 time steps, a sensor whose measurement matrix is equal to $h_3$ is damaged at time step 20; another sensor whose measurement matrix is $h_2$ is damaged at step 35. The comparison of the tracking error and sensor margin is illustrated in Fig. 5. When both systems have the same changes in their sensor margin, as shown in Fig. 5b, the tracking error of the decentralized system is similar to that of the centralized one, as shown in Fig. 5a.
6 Conclusion and Future Work
In this paper, we designed a decentralized and risk-aware multi-target tracking system. This is motivated by the setting where a team of robots is tasked with tracking multiple adversarial targets, and approaching the targets for more accurate target position estimation increases the risk of sensor failures. We developed a decentralized approach to address the trade-off between tracking quality and safety. Each robot plans its motion to balance the conflicting objectives of tracking accuracy and safety, relying on its own information and communication with its neighbors. We utilize control barrier functions to guarantee network connectivity throughout the tracking process. We show via simulated experiments that our
system achieves tracking accuracy and risk-awareness similar to its centralized counterpart. Since our focus is the framework design, we adopted a simplified linear measurement model. Therefore, one future research direction is to utilize more realistic observation models, e.g., range and bearing sensors [8], which require linearized approximations in the optimization program. A second future direction relates to the computation of goal positions. In this work, the robots generate ideal goal positions by solving a non-convex optimization program. This gradient-based approach is in general computationally time-consuming. When a very strict requirement is specified for the tracking accuracy, it is possible that the system fails to find the globally optimal solution and gets stuck in local optima. To this end, we plan to design more efficient distributed algorithms to compute the ideal goal positions [25]. Additionally, we plan to extend the proposed framework to deal with scenarios where the targets can act strategically to compromise the sensors and communications of the robots [26–31].
References

1. Mayya, S., et al.: Adaptive and risk-aware target tracking for robot teams with heterogeneous sensors. IEEE Robot. Autom. Lett. 7(2), 5615–5622 (2022)
2. Schwager, M., Dames, P., Rus, D., Kumar, V.: A multi-robot control policy for information gathering in the presence of unknown hazards. In: Christensen, H.I., Khatib, O. (eds.) Robotics Research, pp. 455–472. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-29363-9_26
3. Pham, H.X., La, H.M., Feil-Seifer, D., Deans, M.: A distributed control framework for a team of unmanned aerial vehicles for dynamic wildfire tracking. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6648–6653. IEEE (2017)
4. Robin, C., Lacroix, S.: Multi-robot target detection and tracking: taxonomy and survey. Auton. Robot. 40(4), 729–760 (2016)
5. Sung, Y., Budhiraja, A.K., Williams, R.K., Tokekar, P.: Distributed assignment with limited communication for multi-robot multi-target tracking. Auton. Robots 44(1), 57–73 (2020)
6. Zhao, Y., Wang, X., Wang, C., Cong, Y., Shen, L.: Systemic design of distributed multi-UAV cooperative decision-making for multi-target tracking. Auton. Agent. Multi-Agent Syst. 33(1), 132–158 (2019)
7. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software. John Wiley & Sons (2004)
8. Zhou, K., Roumeliotis, S.I.: Multirobot active target tracking with combinations of relative observations. IEEE Trans. Robot. 27(4), 678–695 (2011)
9. Dames, P., Tokekar, P., Kumar, V.: Detecting, localizing, and tracking an unknown number of moving targets using a team of mobile robots. Int. J. Robot. Res. 36(13–14), 1540–1553 (2017)
10. Zhou, L., Tokekar, P.: Active target tracking with self-triggered communications in multi-robot teams. IEEE Trans. Autom. Sci. Eng. 16(3), 1085–1096 (2018)
11. Zhou, L., Tokekar, P.: Sensor assignment algorithms to improve observability while tracking targets. IEEE Trans. Robot. 35(5), 1206–1219 (2019)
12. Dames, P.M.: Distributed multi-target search and tracking using the PHD filter. Auton. Robots 44(3), 673–689 (2020)
13. Dias, S.S., Bruno, M.G.S.: Cooperative target tracking using decentralized particle filtering and RSS sensors. IEEE Trans. Signal Process. 61(14), 3632–3646 (2013)
14. Wakulicz, J., Kong, H., Sukkarieh, S.: Active information acquisition under arbitrary unknown disturbances. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 8429–8435. IEEE (2021)
15. Wang, L., Ames, A.D., Egerstedt, M.: Safety barrier certificates for collisions-free multirobot systems. IEEE Trans. Robot. 33(3), 661–674 (2017)
16. Sung, Y., Budhiraja, A.K., Williams, R.K., Tokekar, P.: Distributed simultaneous action and target assignment for multi-robot multi-target tracking. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 3724–3729. IEEE (2018)
17. Tallamraju, R., Rajappa, S., Black, M.J., Karlapalem, K., Ahmad, A.: Decentralized MPC based obstacle avoidance for multi-robot target tracking scenarios. In: 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pp. 1–8. IEEE (2018)
18. Ji, M., Egerstedt, M.: Distributed coordination control of multiagent systems while preserving connectedness. IEEE Trans. Robot. 23(4), 693–703 (2007)
19. Capelli, B., Fouad, H., Beltrame, G., Sabattini, L.: Decentralized connectivity maintenance with time delays using control barrier functions. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1586–1592. IEEE (2021)
20. Ames, A.D., Coogan, S., Egerstedt, M., Notomista, G., Sreenath, K., Tabuada, P.: Control barrier functions: theory and applications. In: 2019 18th European Control Conference (ECC), pp. 3420–3431. IEEE (2019)
21. Sabattini, L., Secchi, C., Chopra, N., Gasparri, A.: Distributed control of multirobot systems with global connectivity maintenance. IEEE Trans. Robot. 29(5), 1326–1332 (2013)
22. Liu, Q., Wang, Z., He, X., Zhou, D.H.: On Kalman-consensus filtering with random link failures over sensor networks. IEEE Trans. Autom. Control 63(8), 2701–2708 (2017)
23. Yang, P., Freeman, R.A., Gordon, G.J., Lynch, K.M., Srinivasa, S.S., Sukthankar, R.: Decentralized estimation and control of graph connectivity for mobile sensor networks. Automatica 46(2), 390–396 (2010)
24. Boyd, S., Diaconis, P., Xiao, L.: Fastest mixing Markov chain on a graph. SIAM Rev. 46(4), 667–689 (2004)
25. Atanasov, N., Ny, J.L., Daniilidis, K., Pappas, G.J.: Decentralized active information acquisition: theory and application to multi-robot SLAM. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4775–4782. IEEE (2015)
26. Zhou, L., Tzoumas, V., Pappas, G.J., Tokekar, P.: Resilient active target tracking with multiple robots. IEEE Robot. Autom. Lett. 4(1), 129–136 (2018)
27. Shi, G., Zhou, L., Tokekar, P.: Robust multiple-path orienteering problem: securing against adversarial attacks. In: 2020 Robotics: Science and Systems (RSS) (2020)
28. Ramachandran, R.K., Zhou, L., Preiss, J.A., Sukhatme, G.S.: Resilient coverage: exploring the local-to-global trade-off. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11740–11747. IEEE (2020)
29. Liu, J., Zhou, L., Tokekar, P., Williams, R.K.: Distributed resilient submodular action selection in adversarial environments. IEEE Robot. Autom. Lett. 6(3), 5832–5839 (2021)
30. Zhou, L., Tzoumas, V., Pappas, G.J., Tokekar, P.: Distributed attack-robust submodular maximization for multirobot planning. IEEE Trans. Robot. (2022)
31. Zhou, L., Kumar, V.: Robust multi-robot active target tracking against sensing and communication attacks. In: 2022 American Control Conference (ACC), pp. 4443–4450. IEEE (2022)
Application of 3D Printed Vacuum-Actuated Module with Multi-soft Material to Support Handwork Rehabilitation

Shoma Abe, Jun Ogawa(B), Yosuke Watanabe, MD Nahin Islam Shiblee, Masaru Kawakami, and Hidemitsu Furukawa

Yamagata University, Jonan-4-3-16, Yonezawa, Yamagata 992-8510, Japan
{jun.ogawa,yosuke.w,nahin,kmasaru,furukawa}@yz.yamagata-u.ac.jp
Abstract. Soft modular robotics is a research field that pursues body flexibility while maintaining reconfigurability by employing flowable or soft materials as the main constituent materials of modular robots. In soft modular robotics, a hollow structure made of silicone is created, and the inside of the structure is pressurized to generate stretching and bending motions as an actuator module. In general, it is important to design modular robots with reconfigurability, various deformability, and environmental adaptability. Modularization of soft actuators with reconfigurability and diverse deformability allows the morphology to be tailored to the desired task. We have developed a soft actuator module, MORI-A, that combines hollow silicone and 3D printable deformation-defining structures (uniaxial shrinking, bending, shearing, uniform contracting, and no deformation). This study proposes a functional extension of this module, "MORI-A FleX", for application as a wearable device for physical rehabilitation. Our MORI-A FleX connector is a thermoplastic polyurethane flexible connector that can be coated with materials offering different textures to achieve excellent connectivity and texture variation. We have assembled MORI-A FleX into a rehabilitation device for handwork, assisting the fingers to move harmoniously and enabling a hand in a deactivated state to grasp a slippery, brittle half-boiled egg and a PET bottle containing 200 ml of water.

Keywords: 3D printing · soft robotics · pneumatic actuator · modular robotics · wearable device

1 Introduction
Soft modular robotics is a research field that pursues body flexibility while maintaining reconfigurability by employing flowable or soft materials as the main constituent materials of modular robots. In soft modular robotics, a hollow structure made of silicone is created, and the inside of the structure is pressurized to generate stretching and bending motions as an actuator module [1,2]. In general,
it is important to design modular robots with (1) reconfigurability, (2) various deformability, and (3) environmental adaptability. Conventional soft modular robots achieve reconfigurability by forming a large number of microchannels with connecting devices such as magnets and silicone tubes [3–8], and create various deformabilities via a local hardness bias on the surface of the shell structure [9,10]. However, in the highly reconfigurable methods, connection is easy and the flow paths are independent, making the system highly controllable, but the deformation of the connected modules is suppressed by the hard material and the complexity of the connection. On the other hand, direct processing of silicone surfaces increases deformability but reduces versatility. We have developed a vacuum-driven soft module, "MORI-A" [11], that achieves these three functionalities by combining a cube-shaped hollow silicone case made of a soft polymer material with a deformation-controlling structure that can be easily created with a 3D printer. The MORI-A module has ventilation holes on six sides of the cube, and each hole is sealed by a connector or sealing plug made by a 3D printer. Figure 1 shows a schematic diagram of the MORI-A module. Since neither electrical devices nor water-soluble resins are used in the construction materials, the module can operate in different environments, whether on land or in a fluid, and can even adjust its buoyancy. In addition, we propose the MORI-A FleX, which extends the surface texture and extensibility of the MORI-A module connector by modifying its materials for application to physical rehabilitation.

The complex geometry and intended use of this actuator have potential applications in physical rehabilitation. As examples of research on hand rehabilitation, two pneumatic soft actuators have been reported to assist ROM training and muscle stretching [12], and the soft actuator "pneumagami" with an origami structure has been reported to support human physical activity through deformation [13]. Soft machines have also been developed to assist the grip force, joints, and posture of the legs and hips under pressure [14–16]. In the case of artificial muscles, exoskeletal gloves have been developed for hand rehabilitation, and motorized ankle-foot orthoses have been developed for foot rehabilitation [17–20]. Soft machines for rehabilitation are mostly used for adults, but in recent years, small soft rehabilitation machines for infants have been developed [21]. While these methods are effective as assistive tools, they require electrical components in the directly attached actuators and lack reconfigurability. In addition, their size creates limitations on the age and body size at which they can be used. Therefore, this study aims to apply MORI-A FleX to wearable devices that support hand and finger movements, using actuators with various deformability, reconfigurability, and scale variation that do not require electrical wiring.
2 MORI-A FleX: The Vacuum-Actuated Module with 3D Printed Flexible Connectors
This section describes the procedure for creating the MORI-A FleX module.
2.1 Creation Procedure of MORI-A FleX
The silicone case is made of Ecoflex™ 00-10 (Smooth-On, Inc.), a stretchable silicone. The shape is a hollow cube of 19.0 mm per side with a wall thickness of 2.0 mm. Since the goal here is to develop a wearable device for handwork rehabilitation, the dimensions are adjusted to the average finger width of an adult. The scalability of the MORI-A FleX module is not limited to this size, as it can be scaled
Fig. 1. Schematic diagram of the vacuum-actuated module “MORI-A FleX”
Fig. 2. Procedure for creating a MORI-A FleX module
Fig. 3. Basic components of the internal structure and parametric design
up or down. In addition, modules with different dimensions are designed to be connectable. Figure 2 shows the process of creating the MORI-A FleX module. First, laser-cut acrylic panels are assembled to form a lattice of partitions. A core consisting of a thermoplastic polyurethane (TPU) inner structure and a water-soluble polyvinyl alcohol (PVA) outer shell and lid is placed in the acrylic partition. After installation, silicone is poured into the core and left to cure for at least 4 h in a vacuum de-aerator to prevent the formation of air bubbles on the silicone surface. The cured silicone case is then removed from the acrylic partition and immersed in water for one hour to dissolve the PVA core. This process produces MORI-A FleX, a silicone case with an embedded TPU inner structure.

2.2 TPU Internal Construction
This internal structure is embedded in the module so that vacuum compression produces the desired deformation through the case. The internal structure adopts a 3D printable design called a parallel-cross structure [22], in which layers of parallel line objects, rotated by an intersection angle θ, are stacked alternately. The basic structure is shown in Fig. 3. As shown in Fig. 3, this structure is edited by seven variables: intersection angle θ, maximum grid width sx, number of cells m and n, line width t, stacking height tz, and maximum stacking height z. The number of cells can be arbitrarily determined by n, and the value of m is determined by the formula m = 2n − 1. The deformation-defining structure of the MORI-A FleX module is composed of "no deformation", "shrinking", "bending", and "shearing". The parameters set for this structure and the resulting deformations are shown in Fig. 4 and Fig. 5.

Fig. 4. 3D printed deformation-defining structures

Fig. 5. Appearance of MORI-A FleX deformation
Fig. 6. Connectors used in MORI-A FleX module and their roles
The shrinking, bending, and shearing structures are 3D printed with TPU filaments because they require flexibility to allow deformation. The no-deformation structure is achieved by using poly-lactic acid (PLA) as the material, which has a stiffness that prevents bending even when the silicone case is vacuumed; the parallel-cross form is therefore not essential for the "no deformation" module, since the material itself is the main deformation-limiting element. These parallel-cross structures were fabricated using an FDM 3D printer (X-Pro, QIDI Tech, Inc.).
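As a rough illustration of the parametric design in Fig. 3, the sketch below generates the line segments of a parallel-cross lattice layer by layer, alternating the orientation by the intersection angle θ and using m = 2n − 1 lines in the rotated layers. The parameter values are hypothetical, and a real print job would additionally require slicing into G-code.

```python
import numpy as np

theta = np.deg2rad(60.0)   # intersection angle
sx = 19.0                  # maximum grid width [mm]
n = 4                      # cells along one direction
m = 2 * n - 1              # cells along the other (m = 2n - 1)
t, tz, z = 1.0, 0.4, 15.0  # line width, layer height, max height [mm]

def layer_lines(angle, count, width):
    """Endpoints of `count` parallel lines spanning a `width` square,
    rotated by `angle` about the square's center."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    ys = np.linspace(-width / 2, width / 2, count)
    return [np.stack([R @ np.array([-width / 2, y]),
                      R @ np.array([ width / 2, y])]) for y in ys]

layers = []
for k in range(int(z / tz)):               # stack layers up to height z
    angle = 0.0 if k % 2 == 0 else theta   # alternate orientation
    count = n if k % 2 == 0 else m
    layers.append(layer_lines(angle, count, sx))

print(f"{len(layers)} layers, alternating between {n} and {m} lines")
```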
2.3 3D Printed Flexible Connector
The MORI-A FleX module also has a parametric design for the connector that closes the opening. The basic shape of the connector is a cylinder with a diameter slightly larger than the silicone case opening. These connectors can be fabricated on a 3D printer using NinjaFlex (NinjaTek) TPU filament or a UV light-curable resin (Clear/Black Resin, Wanhao). The list of connectors to be molded with these materials and their roles is shown in Fig. 6. There are four types of MORI-A FleX connectors: (1) the module-to-module connector; (2) the module air plug; (3) the texture plug; and (4) the tube connector. (1) The module-to-module connectors are used to connect two vacuum-actuated modules with flow paths secured. (2) The module air plug is a connector that seals the opening of a module without connecting it to another module; the thickness of its cylindrical part is the same as the wall thickness of the silicone case, so it can seal without allowing air to enter. (3) The texture plug is used to change the surface texture of the module. (4) The tube connector connects a silicone tube to a module to allow air to flow from the tube into the module; it is mainly used to connect an air compressor.
3 MORI-A FleX Mechanical Properties

3.1 Load-Displacement Performance
Since MORI-A FleX is designed to be vacuum-actuated with many modules connected, the load capacity in the tensile direction is an important parameter for applications such as grippers. Here, load-displacement tests are performed on chains of 1 to 8 uniaxial-shrinkage MORI-A FleX modules. The tests were performed using an STA-1150 table-top material testing machine manufactured by A&D Company. The measurements and results are shown in Fig. 7 and Fig. 8. The inside of the uniaxially contracting MORI-A FleX is vacuumed, and the displacement and load are measured when tension is applied after the specimen is mounted in the chuck of the tensile testing machine. The pressure condition from the compressor was set to 500 [kPa]. From the results in Fig. 8, the load carrying capacity is about 7.3 [mm] of displacement at a maximum load of about 15.1 [N] when a single MORI-A FleX module is driven. This means that a single MORI-A FleX module can lift an object with a mass of about 1.5 kg. As the number of connected modules increases, the maximum load decreases while the displacement increases. The load values are equivalent for four or more connections. The actual displacement depends on the stiffness of the internal structure.
Fig. 7. MORI-A FleX load displacement test in tensile direction
Fig. 8. Comparison of load carrying capacity of MORI-A FleX in the tensile direction depending on the number of connections

3.2 Coefficient of Friction with Texture Change Connector

Connectors with variable surface properties can be coated with materials such as UV-curable resin (Clear Resin, Wanhao), shape memory gel (SMG) [23], and pile to give the desired texture to the module. The results of a comparison of the processed surface textures are shown in Fig. 9. Connectors coated with UV-curable resin retain the flexibility of the NinjaFlex (NinjaTek) base but have a smooth, glossy surface texture. Next, connectors coated with SMG developed a temperature-dependent tackiness that is characteristic of this material; by taking advantage of this tackiness, it is possible to hold objects that are difficult for the conventional MORI-A to grip. Finally, a simple flocking experiment device (Green Techno Co., Ltd.) was used to apply flocking pile to the connectors. Connectors with flocking pile attached have a fluffy, feather-like texture, which can give the module a texture unique to fibrous fabrics.

Fig. 9. Comparison of MORI-A FleX texture by machined surface of flexible connector (texture plug)

In a friction test, the effects of the UV resin, pile, and SMG surface coatings fabricated in this study are compared. A velocity friction tester Type: μV1000 (Trinity-Lab., Inc.) is used for the friction test. Soda-lime-silica glass substrates of 100 [mm] × 100 [mm] × 2 [mm] are used as friction surfaces for the test. Figure 10 shows the results of the friction test. The pile showed the lowest friction, with an average friction coefficient of 0.37 [-]. This coefficient of friction is closest to that of magnesium, 0.34 [-], which is used in wheelchairs, canes, and other assistive devices, as well as in electronic components. The friction of the resin coating is slightly higher than that of the untreated resin (TPU). Compared to the other three, the friction value for resin converges to a constant value; the average friction coefficient of resin is 0.48 [-], and the closest material is cerium, which is used as an abrasive for glass and automotive glass. The average friction coefficient of SMG is approximately 1.2 [-].

Fig. 10. Comparison of friction coefficients due to materials coated on texture plugs
Fig. 11. Schematic of wearable device with MORI-A FleX assembly supported by handwork
Fig. 12. Grasping a PET bottle of 200 ml water and a half-boiled egg (subject’s hand is in a state of weakness)
4 Applications
MORI-A FleX is applied to a wearable device for the rehabilitation of human hand grasping movements. A schematic diagram of the wearable device to be developed is shown in Fig. 11. To secure MORI-A FleX, which is scaled to the width of a human finger, to the finger, a ring connector is formed by gluing a binding band to one side of a UV-curable resin sealing plug. The modules on the five fingers are connected to each other by silicone tubes and a six-port connector made of UV-curable resin developed in-house. To reproduce movement along the fingers with MORI-A FleX, a shear deformation module is attached to the tip of each finger, while the rest of the finger area is fitted with modules that can stretch and deform. Figure 12 shows the actual grasping of a half-boiled egg and a PET bottle containing 200 ml of water with this wearable device.

The results are evaluated from two points of view: rehabilitation and assistance. First, in terms of rehabilitation, since the finger flexes softly and without strain, repeatedly applying and releasing pressure is expected to gradually restore the sense of finger bending for those who have peripheral nerve problems. From the viewpoint of power assist, even when the subject is in a state of weakness, attaching a highly tacky SMG texture-change plug to the tip of the little finger of MORI-A FleX makes it easier to grasp a PET bottle and to successfully carry an object with a load of 200 g. Next, in the grasping of a half-boiled egg, which was conducted to verify whether an object with a smooth texture could be grasped, the egg was wrapped in the palm of the hand, and the hand was able to carry it stably without slipping under reduced pressure. The deformability, contact texture, and changeable shape of the wearable device thus enable simple power assistance even when the finger muscles are completely inoperable. However, the MORI-A FleX attached to the fingers was unable to withstand the weight of objects heavier than 200 g. This was due to misalignment between the module and the finger, air leakage inside the module, and misalignment between the adhesive surface and the object, which made it impossible to grasp the object properly. Although the PET bottle was grasped from the side, for objects that can be wrapped by the hand, such as a half-boiled egg, grasping objects weighing 200 g or more can be expected. Adding two shear modules next to the fingertips would likely allow objects to be carried more stably. In addition, as shown in Fig. 13, a "sixth finger" composed only of bending modules is expected to assist in grasping. Furthermore, if it can be combined with the wearable device developed here, it should become possible to grasp heavier and more difficult-to-grasp objects.

Fig. 13. Sixth finger with bending module

Figure 14 shows a block diagram representing the wearable device developed in this study and the control of the sixth finger. Compressed air from the compressor is connected to a vacuum generator to reduce pressure. A solenoid valve is inserted just before the vacuum generator; when the valve is open, the modules attached to the fingers are depressurized and driven. On the other hand, when the valve is closed, air flows back and the modules return to their original shape. Pneumatic pressure can thus be controlled by opening and closing the valve with a relay circuit controlled by an Arduino.

Fig. 14. Block diagram showing the wearable device developed in this study and the control of the sixth finger
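As a sketch of how the control loop in Fig. 14 might be driven from a host computer, the Python snippet below toggles the solenoid valve through an assumed Arduino firmware over a serial link. The port name, baud rate, and one-byte command protocol are purely hypothetical; the paper only specifies that a relay circuit controlled by an Arduino opens and closes the valve.

```python
import time
import serial  # pyserial

# Hypothetical: Arduino firmware toggles the relay on receiving b'1'
# (open valve -> vacuum line active, fingers flex) or b'0' (close valve
# -> air flows back, modules recover their shape).
PORT = "/dev/ttyACM0"

def pulse(ser, hold_s=2.0, rest_s=2.0, cycles=3):
    """Repeatedly actuate and release the finger modules."""
    for _ in range(cycles):
        ser.write(b"1")          # open solenoid valve: depressurize
        time.sleep(hold_s)       # fingers flex while under vacuum
        ser.write(b"0")          # close valve: shape recovery
        time.sleep(rest_s)

if __name__ == "__main__":
    with serial.Serial(PORT, 9600, timeout=1) as ser:
        time.sleep(2.0)          # wait for the Arduino to reset
        pulse(ser)
```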
5
Conclusion
We have developed a versatile vacuum-actuated module, MORI-A FleX, which allows high reconfigurability, diverse deformability, and changes in contact-surface texture. By applying the morphological adaptability and texture-adjustment capabilities of this module, we have succeeded in developing a soft wearable device for simple grasp rehabilitation and assistance that supports soft movements along with hand and finger motion. Since this module does not require electrical wiring for actuation, there is no risk of wire breakage or electric shock,
and it can be viewed as a safe rehabilitation tool. Future prospects include the development of MORI-A FleX wearable devices for infants and toddlers through morphological changes, scaling, various combinations and refinements of different textures, automatic calculation of module configurations to fit each individual, and motion assistance for different body parts. Acknowledgement. This work was supported in part by JSPS KAKENHI Grant Numbers JP18H05471, JP19H01122, JP21H04936, JP21K14040, and JP22K17972; JST OPERA Program Grant Number JPMJOP1844; the Moonshot Agriculture, Forestry and Fisheries Research and Development Program (MS508, Grant Number JPJ009237); and the Cabinet Office (CAO) Cross-ministerial Strategic Innovation Promotion Program (SIP), "An intelligent knowledge processing infrastructure, integrating physical and virtual domains", "Intensive Support for Young Promising Researchers" (funding agency: NEDO).
Combining Hierarchical MILP-MPC and Artificial Potential Fields for Multi-robot Priority-Based Sanitization of Railway Stations Riccardo Caccavale1 , Mirko Ermini2 , Eugenio Fedeli2 , Alberto Finzi1 , Emanuele Garone3 , Vincenzo Lippiello1 , and Fabrizio Tavano1,2(B) 1
Università degli Studi di Napoli "Federico II", via Claudio 21, 80125 Naples, Italy {riccardo.caccavale,alberto.finzi,vincenzo.lippiello,fabrizio.tavano}@unina.it 2 Rete Ferroviaria Italiana, Piazza della Croce Rossa 1, 00161 Rome, Italy [email protected], [email protected] 3 ULB, Ecole polytechnique de Bruxelles, Campus du Solbosch - CP 165/55, Avenue F.D. Roosevelt 50, Bruxelles, Belgium [email protected]
Abstract. We propose a distributed framework driving a team of robots for the sanitization of very large, dynamic indoor environments such as railway stations. A centralized server uses Hierarchical Mixed-Integer Linear Programming to coordinate the robots, assigning the different zones where cleaning is a priority; thanks to the Model Predictive Control approach, we use historical data about the distribution of people and knowledge of the station's transportation service to predict the future dynamic evolution of the positions of people in the environment and the spreading of contaminants. Each robot navigates the large environment, represented as a gridmap, exploiting the Artificial Potential Fields technique to reach and clean the assigned areas. We tested our solution on real data collected by the WiFi network of the main Italian railway station, Roma Termini, and compared our results with a decentralized multi-robot deep reinforcement learning approach. Keywords: Hierarchical Mixed-Integer Linear Programming · Model Predictive Control · Artificial Potential Fields · distributed · multi-agent · Priority-Based Sanitization
1
Introduction and Related Works
The pandemic caused by SARS-CoV-2 has spawned a crisis that has affected the railway sector in a significant way [29], for example by inducing people to
prefer cars instead of trains [6]. In this regard, disinfection robot technologies have proven very useful in fighting global pandemics [30], by reducing the number of people involved in the cleaning process and by optimizing sterilization. In this work, we propose a multi-robot sanitizing framework for teams of robots capable of cleaning human-populated environments [16] such as railway stations. Our main goal is to design a framework in which a team of autonomous indoor cleaning robots executes sanitizing activities, exploiting information about human presence retrieved from the WiFi infrastructure, to maximize the cleaning performance in large indoor environments such as railway stations without stopping the railway/commercial activities. In particular, we exploit the station's WiFi network to identify the most populated areas of the environment by retrieving information about the positions of the users' smartphones [24]. In our solution, a server (the MPC-Server) receives from the WiFi network the information about the presence of people in the station and estimates the positions of the contaminated areas, which are mapped into a heatmap. Here, we deploy a distributed MPC approach that adapts the sanitization strategy to the evolution of the contamination due to variations in the position and number of people in the station. Several approaches have been proposed for multi-robot cleaning of indoor environments. One possibility is to formulate the cleaning of the environment as a coverage path planning (CPP) problem, where the objective is to find a strategy to reach every part of the environment while minimizing execution time. An early work relies on the random-walk method [4], where robots perform random movements inside the environment; in this case, the execution time suffers from position overlapping and path repetitions. Other solutions consider the environment as a grid map that is partitioned into several parts and assigned to different robots executing fixed-shape cleaning paths (spirals, triangles, rectangles, etc.) [9–12,14,17,18,20]. These methods are effective at providing a continuous cleaning service that maximizes coverage and minimizes the idleness of the agents. Additional CPP works consider the presence of zones with persistent, static, higher priority that need to be cleaned/sanitized with a pre-specified higher frequency [5,13,15,21,28]; those approaches often use a graph-based representation of the environment with only a limited number of nodes. In contrast, we are interested in a solution with high resolution and dynamic priorities in very large railway stations. An interesting work is described in [5], where the authors solve the multi-robot persistent coverage control problem over a grid environment using Mixed-Integer Linear Programming (MILP) to maximize the coverage level of the cells over a finite horizon; they define terminal constraints to ensure that the agents are always able to terminate their plans in predetermined closed trajectories. In our framework we also consider persistent priorities, derived from the historical data recorded by the Meraki WiFi network, but in contrast we seek a solution that can adapt to different circumstances, following unexpected changes in the priorities of the environment, without fixed trajectories or fixed nodes.
Moreover, we chose to design a Hierarchical MILP (HMILP) [3,7] in which the problem of finding
different paths for different robots in a common graph is decomposed into a cascade of MILP problems, one for each robot, where each selected path is used to constrain the selection of the following ones. This way, robots starting from the same initial conditions are forced to select different paths, increasing coverage. In this context, Model Predictive Control (MPC) has often been used to find local solutions in open environments with a small number of obstacles [2]. Unfortunately, including geometric constraints for complex and very large indoor environments with static obstacles makes the optimization problem very expensive to solve in real time [1]: the added obstacle constraints increase the iteration cost of MPC solvers, and the extra non-convexity makes it hard to find good local solutions. For this reason, we propose an approach that avoids geometric constraints in the control layer by delegating collision checking and obstacle avoidance to a lower-level Artificial Potential Fields (APF) navigation module. Similarly to our work, [27] proposes an APF framework controlling and coordinating a group of robots for cooperative manipulation tasks, and in [23] the APF method and optimal controllers are used for path planning of autonomous vehicles. Following these perspectives, in this paper we propose a MILP-MPC approach combined with APF that produces high-level strategies for the cleaning/sanitizing of crowded environments by adapting the agents' behavior on-line to the current distribution of people. In particular, the proposed system relies on anonymous information from the pre-existing WiFi infrastructure, which is often available in public environments, to estimate the positions of people [24] and to prioritize the sanitizing process toward the more crowded areas of the environment. Moreover, we defined a simple parametric model estimating the spreading of contaminants (bacteria or viruses) due to the presence of people, in order to better estimate the risky areas to be cleaned with higher priority. Our main contributions can be summarized as follows: – We propose a scalable MPC-MILP framework combined with APF, where multiple mobile robots cooperate in the execution of cleaning tasks in a very large crowded environment, exploiting WiFi information about the distribution of people, historical data, and knowledge about the transportation service of the station. – We tested the behavior of a team of four robots on two weeks of real data recorded by the Cisco Meraki WiFi network of the Roma Termini station. – We compared the behavior of our solution with the Multi-Agent Reinforcement Learning (MARL) framework [25]. For this experiment we have shared the video MARLvsHMILP.mp4 at the link: http://wpage.unina.it/fabrizio.tavano/video/MARLvsHMILP.mp4. – We provide an additional experiment showing how the proposed system can be adapted to consider on-line requests for the sanitization of specific areas. For this experiment we have shared the video CleaningSpecificArea.mp4 at the link: http://wpage.unina.it/fabrizio.tavano/video/CleaningSpecificArea.mp4.
The rest of the paper is structured as follows. In Sect. 2 we describe the architecture of the proposed framework; in Sect. 3 we focus on the case studies comparing with the MARL technique presented in [25]; finally, Sect. 4 concludes the paper and discusses future work.
2
Architecture
The system is based on a decentralized client-server architecture, represented in Fig. 1, where a team of robots is connected on-line to a single server. We made this choice because our aim is a solution that responds to the demands of Rete Ferroviaria Italiana (RFI) using resources and technologies already present in Italian railway stations, such as the Cisco Meraki WiFi infrastructure, which records information on the passenger foot traffic flowing through the station in the form of a Meraki heatmap.
Fig. 1. Distributed client-server architecture, where the MILP server sends the MPC-Heatmap and the destination nodes to the robots, which decide their route using the APF motion-path technique.
As shown in Fig. 1, the MPC-Server receives the Meraki heatmap, as visualized in the Meraki Dashboard, with an update period of 1 h. In the Meraki heatmap, the colors represent the current distribution of people in the station. This image is converted into a new MPC-Heatmap formatted as a gridmap of 100 × 172 pixels, in which the colors now indicate the level of contamination of the areas in the environment, as shown in Fig. 2 (a). More formally, we define M as the gridmap of the station, X as the set of all obstacle-free cells in the map, S as the set of possible heatmaps (i.e., priority distributions) on the map M, and A as the set of actions available to a single agent, where a_i ∈ A drives agent i from the current cell to an adjacent one (Fig. 2 (b)). Each element of the MPC-Heatmap is a real number in the interval [0, 1], where 1 is the maximum priority and 0 means that no cleaning is needed. This matrix can be displayed as a color-coded image (see the MPC-Heatmap in Fig. 2 (b)) where black pixels are associated with 0-priority areas, while colors from red to
yellow indicate increasingly higher priorities. We also model the tendency of contaminants to spread and attenuate over time. The update of the heatmap (cleaning priority) is computed from the positions of people (clusters) by modeling the possible spreading of viruses or bacteria with a Gaussian dispersion model, as in [8]. Specifically, we apply a periodic convolution with a Gaussian filter at every step, where μ and σ are parameters that can be tuned depending on the meters/pixel ratio, the timestep, and the type of spreading considered (in this work we assume a setting inspired by the aerial diffusion of Covid-19 [26]); we use the values of μ and σ from [25]. The effect of spreading and attenuation of the priorities at every timestep is shown in Fig. 2 (c) after 100 steps. This spreading process also serves as the model used by our MPC method to estimate the evolution of the priorities over time.
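As a concrete illustration of this update, the following Python sketch convolves a priority heatmap with a Gaussian filter and attenuates it at every timestep. The grid size matches the paper's 100 × 172 MPC-Heatmap, but the σ and attenuation values below are illustrative assumptions (the paper takes μ and σ from [25]).

```python
# Minimal sketch of the contamination-spreading model: each timestep the
# priority heatmap is convolved with a Gaussian filter and attenuated.
import numpy as np
from scipy.ndimage import gaussian_filter

H, W = 100, 172
heatmap = np.zeros((H, W))
heatmap[40, 80] = 1.0   # a cluster of people deposits maximum priority

SIGMA = 1.0             # assumed spreading parameter per timestep
ATTENUATION = 0.99      # assumed per-step decay of contaminant priority

def step(hm: np.ndarray) -> np.ndarray:
    """One timestep (1 minute in the paper): spread, attenuate, clip to [0, 1]."""
    return np.clip(gaussian_filter(hm, sigma=SIGMA) * ATTENUATION, 0.0, 1.0)

for _ in range(100):    # Fig. 2 (c) shows the map after 100 steps
    heatmap = step(heatmap)
```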
Fig. 2. Heatmap representation of the Roma Termini station: (a) Meraki heatmap, (b) MPC-Heatmap, (c) MPC-Heatmap after 100 timesteps, (d) MPC-Heatmap with the associated graph.
Our sanitization strategy has four main steps, as shown in Fig. 3. Cleaning starts with a delay of 30 minutes from the reception of the first Meraki heatmap in the morning. Following the timeline, in the first step the MPC-Server receives the update of the Meraki heatmap from the Meraki server (step A). The MPC-Server then applies the HMILP as in [3] to decide the destinations of every robot (step B); in this phase the MPC-Server uses historical data about the distribution of people in the station to predict the conditions over the next 30 min. Afterwards (step C) there is a new update of the Meraki heatmap, and finally (step D) a new HMILP decision is executed to correct the first prediction based on the new Meraki heatmap update, without the contribution of historical knowledge. The latter correction is needed to compensate for the unpredictable behavior of people. The server is also responsible for verifying the positions of people in the environment through the Meraki heatmap and for receiving the positions of every robot inside the station. This information is used to build the MPC-Heatmap
(Fig. 2) with differently colored zones. The MPC-Server extracts information about the positions of people in the station from the current Meraki heatmap, receives the positions of every robot and their chosen paths, and clears the corresponding zones in the MPC-Heatmap to prepare the new updated map. The MPC-Server partitions the current MPC-Heatmap into a discrete, limited number of areas with a fixed sectioning step (in our case 20 pixels); every resulting subarea is represented by a single value in an ordered list of partitions, G1. Considering i ∈ {0, . . . , n}, where n is the total number of partitions of the MPC-Heatmap, the value of element i of the list G1 is q_i, defined as the sum of the priority risk over the corresponding portion of the MPC-Heatmap (a minimal sketch of this partitioning is given below). The MPC-Server applies the same procedure to build a second ordered list, G2, from a second heatmap, MPC-Heatmap 2, which is used to estimate the future expected distribution of people. MPC-Heatmap 2 is built from knowledge of the station's transportation service and the average behavior of people in the historical data. RFI defines four categories of day to organize the transportation service in the Roma Termini station: working days, from Tuesday to Thursday; Fridays and days before holidays, including Saturdays; holidays, including Sundays; and days after holidays, such as Mondays.
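Under the assumption that the MPC-Heatmap is a NumPy array, the partition list G1 can be computed with a block-sum as in the sketch below; padding the map so both dimensions divide the 20-pixel step is an implementation detail not specified in the paper.

```python
# Sketch of how the ordered list G1 can be computed: the 100 x 172
# MPC-Heatmap is partitioned with a fixed 20-pixel step and each element
# q_i is the sum of the priorities inside its block.
import numpy as np

def partition_priorities(heatmap: np.ndarray, step: int = 20) -> np.ndarray:
    h, w = heatmap.shape
    ph, pw = -h % step, -w % step               # padding needed on each axis
    padded = np.pad(heatmap, ((0, ph), (0, pw)))
    blocks = padded.reshape(padded.shape[0] // step, step,
                            padded.shape[1] // step, step)
    return blocks.sum(axis=(1, 3)).ravel()      # q_i for every partition i

G1 = partition_priorities(np.random.rand(100, 172))
print(len(G1), "partitions")  # ceil(100/20) * ceil(172/20) = 5 * 9 = 45
```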
Fig. 3. Four main steps of execution in the sanitization strategy presented in this study.
Figure 4 shows the number of departures in Roma Termini, grouped in 1-h steps, where every color is associated with a different category; the number of train services differs between the four categories. RFI defines four types of transportation service: the high-speed passenger service of the Trenitalia company (TI Pax HSS), the high-speed passenger service of other private companies (OC-HSS), the long-haul standard low-speed service (TI-PAX-SU), and local transportation (TI-PAX-REG), as shown in Fig. 4 for the working-day and holiday cases. MPC-Heatmap 2 is realized by considering the average of
historical Meraki heatmaps representing the conditions of the station on days belonging to the same category as the current MPC-Heatmap used to build G1, but one hour later. Considering i ∈ {0, . . . , n}, where n is the total number of partitions of MPC-Heatmap 2, the value of element i of the list G2 is p_i, defined as the sum of the priority risk over the corresponding portion of MPC-Heatmap 2. Starting from the last received position of every robot in the MPC-Heatmap, the MPC-Server finds each robot's current starting node in the graph. It then calculates the next nodes for every robot, considering the contribution of the node values from the two ordered lists G1 and G2. Every robot receives from the MPC-Server the updated MPC-Heatmap and a list of destination nodes to visit, following the HMILP results. Every node represents a specific portion of the MPC-Heatmap. Each robot, acting independently and autonomously, locates the priority peaks in the portion of the MPC-Heatmap covered by the nodes of its path in the graph. It sorts the peaks from the highest value to the lowest; peaks of equal value are ordered from the closest to the robot's current position to the farthest (a sketch of this ordering rule is given below). Finally, every robot autonomously uses the APF method to reach the peaks in the list. We assume that the team of robots is constantly connected to the common MPC-Server through the station's WiFi railway service. Every robot communicates with the others and knows their APF paths; in this way we avoid path repetition. The robots, acting independently, are also responsible for obstacle avoidance: when a robot meets an obstacle, it counts the number of steps necessary to avoid the obstacle in both directions and then chooses the shorter route around it. When a priority peak is reached, the robot cleans by following a spiral-shaped path centered at the coordinates of the peak in the gridmap, as in [11,12]. Each robot sanitizes an area of the gridmap of 36 pixels at every timestep, where a timestep lasts 1 min as in [25]. Model Predictive Control (MPC) consists of solving an on-line, finite-horizon, open-loop optimization problem using the current state as the initial state for the team of robots. The solution gives a sequence of control inputs for the entire control horizon; however, only some elements of the optimal input sequence are applied to the team. In our case, the number of applied inputs depends on the number of priority peaks present in the MPC-Heatmap and on the horizon time. The Meraki server sends an update of the Meraki heatmap with a periodicity of 1 h, so our control horizon is also 1 h.
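The peak-ordering rule just described (descending priority, ties broken by proximity) can be expressed as a single sort key; the sketch below is an illustrative Python fragment in which the `(value, (x, y))` peak representation is an assumption.

```python
# Sketch of the peak-ordering rule each robot applies to its assigned map
# portion: peaks sorted by descending priority, ties broken by distance
# from the robot's current position (closest first).
import math

def order_peaks(peaks, robot_xy):
    def key(peak):
        value, (px, py) = peak
        dist = math.hypot(px - robot_xy[0], py - robot_xy[1])
        return (-value, dist)
    return sorted(peaks, key=key)

peaks = [(0.9, (5, 5)), (0.9, (2, 2)), (0.4, (1, 1))]
print(order_peaks(peaks, robot_xy=(0, 0)))
# -> the two 0.9 peaks first, (2, 2) before (5, 5) because it is closer
```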
Fig. 4. Comparison between the RFI categories of day for the transportation service in the Roma Termini station. Left: number of train departures at different hours for the different categories. Right: typologies of service for two categories, holiday and working day.
The graph representation used to describe the problem in this study is similar to those proposed in [19,31]. We define the graph G = (N0, A), where N0 = {0, 1, . . . , n} is the set of vertexes and A = {(i, j) : i, j ∈ N0, i ≠ j} is the set of arcs. Vertex 0 denotes the depot of the robots, while the other vertexes correspond to the different areas of the partitioned MPC-Heatmap gridmap. Each area i must be sanitized, so every node has a non-negative priority number q_i + p_i representing the total risk priority of the corresponding area of the MPC-Heatmap plus the predicted value from MPC-Heatmap 2. Each arc between two vertexes i and j is associated with a value d_{i,j} (in our solution it represents the reward for a robot traveling between nodes i and j). The value associated with an arc also accounts for the length of the path in the gridmap needed to reach the destination node and for the presence of fixed obstacles: the reward is lower for movements diagonal to the axes of the gridmap (as shown in Fig. 2 (d)), or when the robot may meet a wall or other fixed obstacles to be avoided while traversing the arc. The base weight of an edge is given by d_{i,j} = q_j + p_j, (i, j) ∈ A. In the HMILP path decision it is important to also take into account the cost of the distance a robot covers to reach a node: a greater distance, due for example to the presence of an obstacle, may reduce the total cleaning percentage achieved within the horizon time, because our robots also clean the path to their target, as in [25]. For this reason, with reference to Fig. 2 (d), we define three kinds of arcs. Straight arcs connect two nodes with the same x- or y-coordinate; examples in Fig. 2 (d) are the connections between nodes 0 and 1, or 0 and 3.
Table 1. Notations used in the mathematical model

  Notation   Description
  N          set of all nodes, N = {1, . . . , n}
  N0         N and the depot, N0 = N ∪ {0}
  K          total number of robots
  Q          maximum sanitization value of a single robot
  d_{i,j}    benefit assigned to the arc between nodes i, j ∈ N0
  x_{i,j}    1 if the k-th robot travels the arc (i, j)
  u_i        total priority sanitized at node i
  q_i        priority of partition i in G1
  p_i        priority of partition i in G2
Diagonal arcs connect nodes such as 0 and 4, or 1 and 5. The third type comprises arcs along which the robot meets an obstacle in the gridmap while following the connection traced by that arc. Since d represents a benefit for the robot running the corresponding arc, arcs associated with a greater distance, as in the presence of obstacles, are associated with a lower benefit. In particular, we use the following weights for the three typologies of arcs:
$$d_{i,j} = \begin{cases} q_j + p_j, & \text{straight arcs} \\ (q_j + p_j)/2, & \text{diagonal arcs} \\ (q_j + p_j)/2, & \text{obstacles in the path} \end{cases} \qquad (1)$$
At the beginning of the horizon time, each robot starts its new path in the graph from the node last reached in the previous hour. The first action of the MPC-Server is to position each robot from the depot to the current starting node from which the robot has reported its actual position; this is done through a dummy arc with a very high gain. This first step, from the depot to the starting node, is removed from the final cleaning path sent to every robot at the beginning of its decentralized operations. The notation, including sets, parameters, and variables used in the model, is listed in Table 1. The MILP formulation of our multi-robot sanitization problem is the following, as in [22]:
$$\max z \qquad (2)$$

$$\sum_{i \in N_0} \sum_{j \in N_0} x_{i,j}\, d_{i,j} \ge z \qquad (3)$$

$$\sum_{j \in N_0} x_{i,j} = 1, \quad i \in N \qquad (4)$$

$$\sum_{i \in N_0} x_{i,l} - \sum_{j \in N_0} x_{l,j} = 0, \quad l \in N,\ i \ne j,\ i \ne l,\ j \ne l \qquad (5)$$

$$p_i + q_i \le u_i, \quad i \in N \qquad (6)$$

$$x_{i,j} \in \{0, 1\} \qquad (7)$$
The objective function of the model is given in (2), which maximizes the quantity of priority removed from the graph G. Here, z represents the sum of the benefits d along a path selected on the graph G. Constraint (3) indicates that the objective may not exceed the sum of the benefits of the selected arcs of G. Constraint (4) ensures that each section of the MPC-Heatmap is visited exactly once. Constraint (5) implies that every robot that leaves a node will reach a new, different node. Constraint (6) bounds the quantity of sanitization done at a node i, which cannot be less than the priority assigned to node i of the graph G. Our approach to solving the multi-robot optimization problem is hierarchical MILP, as in [3,7]. The MPC-Server repeats the MILP computation for every single robot, in sequence: at every step it finds the path for one robot, sets to zero the nodes of the graph G belonging to the path of the previous robot, and computes the path for the next robot, repeating this for every following robot. In this way, the MPC-Server considers a different graph for every robot, preventing two robots from selecting the same path even when they start from the same node. The MPC-Server computes the total reward of the whole team as the sum of the individual rewards gained by every robot, for each permutation of the sequence of the four robots, and chooses the sequence with the highest result. The MPC-Server also assigns a different depot to every robot, positioned in different places of the graph G and fixed at the beginning of the simulation; depot assignments can be changed on-line by human intervention from a remote control room to correct the cleaning operation. Figure 5 shows the cleaning action of four robots, represented by the white square dots. The cold zones, in black, are the paths they leave as they clean a zone of the environment. The environment is represented by the MPC-Heatmap of Roma Termini, where 1 pixel corresponds to 1 square meter.
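For concreteness, the following sketch models one single-robot step of the hierarchical cascade with the PuLP library, following the structure of Eqs. (2)–(7) on a made-up 4-node graph; the benefits, the depot constraint, and the omission of subtour elimination are simplifications of this sketch, not details from the paper.

```python
# Illustrative single-robot step of the hierarchical MILP cascade.
import pulp

nodes = [0, 1, 2, 3]                       # node 0 is the robot's depot
d = {(i, j): abs(i - j) for i in nodes for j in nodes if i != j}  # assumed benefits

prob = pulp.LpProblem("single_robot_path", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", d.keys(), cat="Binary")
z = pulp.LpVariable("z", lowBound=0)

prob += z                                                   # Eq. (2): maximize z
prob += pulp.lpSum(x[a] * d[a] for a in d) >= z             # Eq. (3)
for i in nodes[1:]:                                         # Eq. (4): each area left once
    prob += pulp.lpSum(x[i, j] for j in nodes if j != i) == 1
for l in nodes[1:]:                                         # Eq. (5): flow conservation
    prob += (pulp.lpSum(x[i, l] for i in nodes if i != l)
             == pulp.lpSum(x[l, j] for j in nodes if j != l))
prob += pulp.lpSum(x[0, j] for j in nodes[1:]) == 1         # leave the depot exactly once
prob.solve(pulp.PULP_CBC_CMD(msg=0))

path_arcs = [a for a in d if x[a].value() == 1]
print(path_arcs)
# Hierarchical step: zero the visited nodes' benefits in d, then re-solve
# for the next robot so it is forced onto a different path.
```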
3
Experiments and Comparison with MARL Technique
Fig. 5. Left: the distribution of priorities at 6:00 a.m. on 6 September 2021 in our simulation. Right: the cleaning action of 4 robots driven by HMILP-MPC control and moving, via the APF method, in the environment representing the Roma Termini station.
In order to assess the performance of the system in a realistic dynamic environment, we propose a case study in which the presented method is compared with the solution proposed in [25]. Thanks to the collaboration with RFI, we received a Meraki heatmap for every hour of the day, representing the whole planimetry of the station, from 30 August 2021 to 12 September 2021. We verified that no exceptional events occurred in the considered period that might change the ordinary behavior of the station (for example, national or religious holidays, or the opening of schools). Moreover, the two weeks are temporally consecutive; for these reasons there are no changes in the behavior of the passengers who frequent the station. We calculated the Euclidean distance between the MPC-Heatmaps built at every hour of the available days, and verified that the average Euclidean distance between days belonging to the same category is lower than that between days of different categories. On the available dataset we can therefore predict the distribution and density of people in the station on a working day of the second week using the average of the working days of the first week at the same hour. In order to assess the performance of the team of robots, we define a simple metric c_perc, as in [25], representing the cleaning rate (in percentage) of the map:

$$c_{perc} = \frac{x_{tot} - s_{curr}}{x_{tot}} \cdot 100 \qquad (8)$$
where x_tot = |X| is the number of obstacle-free cells of the map, while s_curr = Σ_{(i,j)} s(i,j) is the sum of the current priorities over all cells of the heatmap. The value c_perc is thus 100% if the map is totally clean (i.e., each cell has 0 priority) and 0% if every cell of the map has maximum priority. We compared the quality of cleaning for 6 September 2021, using for the HMILP the
mean values of the heatmaps of 31 August, 1 September, and 2 September 2021, which are working days in the RFI category definitions. As shown in Fig. 6, the two methods have comparable results and approximately the same curve shape; both may be considered valid alternatives for the proposed problem.
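Equation (8) is immediate to evaluate on the gridmap; the minimal sketch below assumes (as an encoding choice of the sketch, not of the paper) that obstacle cells are marked with NaN.

```python
import numpy as np

def c_perc(heatmap: np.ndarray) -> float:
    """Cleaning rate of Eq. (8): 100% when every free cell has 0 priority."""
    free = heatmap[~np.isnan(heatmap)]  # priorities of the obstacle-free cells
    x_tot = free.size
    s_curr = float(free.sum())
    return (x_tot - s_curr) / x_tot * 100.0

print(c_perc(np.zeros((100, 172))))  # fully clean map -> 100.0
```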
Fig. 6. Simulation of the sanitization of 6 September 2021. The first two curves represent c_perc for the two cases, MARL and HMILP-MPC. The yellow curve represents the spreading of the contaminants without any robot action.
Table 2. Comparison between case 1 and case 2

  Time    Case 1    Case 2
  17:00   48.895    56.871
  17:30   32.733    61.778
  18:00   47.469    79.425
  18:30   33.945    54.611
Our solution has an important advantage with respect to the MARL method presented in [25]: it can be controlled remotely by technical staff in a control room. This represents an important innovation for the problem posed by RFI, because Roma Termini is a challenging environment where exceptional events may occur and form new aggregations of people. For this reason, a solution is required that permits correcting the robots' cleaning activity by assigning greater priority to a specific area. This feature is enabled by the graph-based representation of the environment: we can select on-line, for every robot, a different depot where it concludes its mission, chosen among those available in the station, and we can assign a gain to the nodes inside the subarea we want to select. We verified this possibility by simulating a situation in which a rapid increase of priority risk occurs in a zone Z involving nodes 7, 10, and 13. We selected zone Z by multiplying the weights of these nodes by a gain equal to 10. We also assigned the robots' depots to nodes 10 and 13, while in normal conditions their depots are assigned to nodes 2, 7, 9,
11. In order to assess the performance on Z, we defined a specific value c_perc_Z, similar to the one in Eq. (8): in this case x_tot = |Z| is the number of obstacle-free cells of the map in zone Z, while s_curr = Σ_{(i,j)∈Z} s(i,j) is the sum of the current priorities over the cells of the heatmap in Z. The cleaning results are shown in Table 2, which compares the case in which no subarea is selected (case 1) with the case in which zone Z is selected (case 2). Columns 2 and 3 report the c_perc_Z results for the two cases at different times of day in the simulation. The team of case 2 sanitizes zone Z better, reaching a cleaning peak at 18:00, where c_perc_Z equals 79.425%, in contrast to 47.469% for the team of case 1.
4
Conclusions
In this work we presented a new technique, based on the hierarchical MILP-MPC method with APF, for the on-line cleaning of very large, dynamic environments such as railway stations. We verified the possibility of using historical data about the positions of people recorded by the Meraki WiFi network infrastructure, together with knowledge of the transportation service of the considered railway station, to predict the movement of clusters of passengers with a horizon time of 1 h. We compared our solution with the MARL technique proposed in [25], highlighting the advantage deriving from the use of a graph and MILP: the on-line control of the sanitization strategy by remote human staff. We also tested the quality of the approach in a realistic simulation with real data from the Roma Termini station, the largest and most populated station in Italy. As future work, we plan to test the solution with real robots and to investigate how the proposed MPC approach can be integrated into a MARL framework to improve the cleaning performance. It would also be interesting to verify the possibility of using the MPC method in the training process of the MARL solution. Declarations Acknowledgments. The research leading to these results has been supported by the Italian Infrastructure Manager Rete Ferroviaria Italiana S.p.A. and by the HARMONY project (Horizon 2020 Grant Agreement No. 101017008). The authors are solely responsible for its content. Conflict of Interest. The authors declare that they have no conflict of interest regarding this work.
References
1. Andersson, O., Ljungqvist, O., Tiger, M., Axehill, D., Heintz, F.: Receding-horizon lattice-based motion planning with dynamic obstacle avoidance. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 4467–4474 (2018)
2. Andersson, O., Wzorek, M., Rudol, P., Doherty, P.: Model-predictive control with stochastic collision avoidance using Bayesian policy optimization. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 4597–4604 (2016)
3. Branca, C., Fierro, R.: A hierarchical optimization algorithm for cooperative vehicle networks. In: 2006 American Control Conference, p. 6 (2006)
4. Cao, Z.L., Huang, Y., Hall, E.L.: Region filling operations with random obstacle avoidance for mobile robots. J. Field Robot. 5, 87–102 (1988)
5. Charitidou, M., Keviczky, T.: An MILP approach for persistent coverage tasks with multiple robots and performance guarantees. Eur. J. Control 64, 100610 (2022)
6. Ciuffini, F., Tengattini, S., Bigazzi, A.Y.: Mitigating increased driving after the Covid-19 pandemic: an analysis on mode share, travel demand, and public transport capacity. Transp. Res. Rec. 03611981211037884 (2023)
7. Earl, M.G., D'Andrea, R.: A decomposition approach to multi-vehicle cooperative control. Robot. Auton. Syst. 55(4), 276–291 (2007)
8. Hanna, S.: Transport and dispersion of tracers simulating Covid-19 aerosols in passenger aircraft. Indoor Air 32(1), e12974 (2022)
9. Lakshmanan, A.K., et al.: Complete coverage path planning using reinforcement learning for tetromino based cleaning and maintenance robot. Autom. Constr. 112, 103078 (2020)
10. Lee, T.-K., Baek, S., Oh, S.-Y.: Sector-based maximal online coverage of unknown environments for cleaning robots with limited sensing. Robot. Auton. Syst. 59(10), 698–710 (2011)
11. Lee, T.-K., Baek, S.-H., Choi, Y.-H., Oh, S.-Y.: Smooth coverage path planning and control of mobile robots based on high-resolution grid map representation. Robot. Auton. Syst. 59(10), 801–812 (2011)
12. Lee, T.-K., Baek, S.-H., Oh, S.-Y., Choi, Y.-H.: Complete coverage algorithm based on linked smooth spiral paths for mobile robots. In: 2010 11th International Conference on Control, Automation, Robotics and Vision, pp. 609–614 (2010)
13. Mallya, D., Kandala, S., Vachhani, L., Sinha, A.: Priority patrolling using multiple agents. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 8692–8698 (2021)
14. Miao, X., Lee, J., Kang, B.-Y.: Scalable coverage path planning for cleaning robots using rectangular map decomposition on large environments. IEEE Access 6, 38200–38215 (2018)
15. Murtaza, G., Kanhere, S., Jha, S.: Priority-based coverage path planning for aerial wireless sensor networks. In: 2013 IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing, pp. 219–224 (2013)
16. Narang, M., et al.: Fighting Covid: an autonomous indoor cleaning robot (AICR) supported by artificial intelligence and vision for dynamic air disinfection. In: 2021 14th IEEE International Conference on Industry Applications (INDUSCON), pp. 1146–1153 (2021)
17. Nasirian, B., Mehrandezh, M., Janabi-Sharifi, F.: Efficient coverage path planning for mobile disinfecting robots using graph-based representation of environment. Front. Robot. AI 8, 4 (2021)
18. Nasirian, B., Mehrandezh, M., Janabi-Sharifi, F.: Efficient coverage path planning for mobile disinfecting robots using graph-based representation of environment. Front. Robot. AI 8 (2021)
19. Madridano, Á., Al-Kaff, A., Martín, D., de la Escalera, A.: Trajectory planning for multi-robot systems: methods and applications. Expert Syst. Appl. 173, 114660 (2021)
20. Oh, J.S., Choi, Y.H., Park, J.B., Zheng, Y.: Complete coverage navigation of cleaning robots using triangular-cell-based map. IEEE Trans. Industr. Electron. 51(3), 718–726 (2004)
21. Pasqualetti, F., Durham, J.W., Bullo, F.: Cooperative patrolling via weighted tours: performance analysis and distributed algorithms. IEEE Trans. Robot. 28(5), 1181–1188 (2012)
22. Qin, W., Zhuang, Z., Huang, Z., Huang, H.: A novel reinforcement learning-based hyper-heuristic for heterogeneous vehicle routing problem. Comput. Ind. Eng. 156, 107252 (2021)
23. Rasekhipour, Y., Khajepour, A., Chen, S.-K., Litkouhi, B.: A potential field-based model predictive path-planning controller for autonomous road vehicles. IEEE Trans. Intell. Transp. Syst. 18(5), 1255–1267 (2017)
24. Ren, Y., et al.: D-Log: a WiFi log-based differential scheme for enhanced indoor localization with single RSSI source and infrequent sampling rate. Pervasive Mob. Comput. 37, 94–114 (2017)
25. Caccavale, R., Calà, V., Ermini, M., Finzi, A., Lippiello, V., Tavano, F.: Multi-robot sanitization of railway stations based on deep Q-learning. In: AIRO 2021: 8th Italian Workshop on Artificial Intelligence and Robotics, 20th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2021) (2021)
26. Setti, L., et al.: Airborne transmission route of Covid-19: why 2 meters/6 feet of inter-personal distance could not be enough, April 2020
27. Song, P., Kumar, V.: A potential field based approach to multi-robot manipulation. In: Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), vol. 2, pp. 1217–1222 (2002)
28. Stump, E., Michael, N.: Multi-robot persistent surveillance planning as a vehicle routing problem. In: 2011 IEEE International Conference on Automation Science and Engineering, pp. 569–575 (2011)
29. Tardivo, A., Zanuy, A.C., Martín, C.S.: Covid-19 impact on transport: a paper from the railways' systems research perspective. Transp. Res. Rec. 2675(5), 367–378 (2021)
30. Tavakoli, M., Carriere, J., Torabi, A.: Robotics, smart wearable technologies, and autonomous intelligent systems for healthcare during the Covid-19 pandemic: an analysis of the state of the art and future vision. Adv. Intell. Syst. 2(7), 2000071 (2020)
31. Zhou, X., Wang, H., Ding, B., Hu, T., Shang, S.: Balanced connected task allocations for multi-robot systems: an exact flow-based integer program and an approximate tree-based genetic algorithm. Expert Syst. Appl. 116, 10–20 (2019)
Exploration System for Distributed Swarm Robots Using Probabilistic Action Decisions Toui Sato1(B) , Kosuke Sakamoto1 , Takao Maeda2 , and Yasuharu Kunii1 1
Chuo University, Tokyo, Japan {a14.pw6d,ksakamoto605}@g.chuo-u.ac.jp, [email protected] 2 Tokyo University of Agriculture and Technology, Tokyo, Japan [email protected]
Abstract. In this paper, we propose a new distributed exploration system consisting of a parent robot and a group of ground-mobile child robots that search the surrounding area, addressing the low efficiency of planetary exploration by relatively large robots such as NASA's Curiosity. The proposed system uses a probabilistic algorithm in which robots act based on a specified probability distribution. A distributed swarm robot system based on the proposed algorithm was constructed in real space and validated by developing and testing robots named RED for experimental demonstration. The effectiveness of the proposed system was evaluated through experiments, whose results showed that the robots explored the region according to the specified probability distribution.
1
Introduction
In recent years, various studies have been conducted on multi-robot agent systems and their hierarchical forms, such as swarm robots, parent-child robots, and so on [1–4]. These studies have considered applications such as environmental measurement on the ground, disaster relief, and planetary exploration. However, the size of these robots, such as planetary rovers, is relatively large because a single robot must perform a variety of tasks [5]. As a result, the robot is subject to strict safety requirements, the development period and cost become large, and the risk of loss due to failure is high. In addition, the traveling speed is extremely slow because careful operation is required to maintain safety, and only limited environments, such as along the path, can be measured. Therefore, the efficiency of exploration by large robots is low. To solve these problems, small robots have been developed by many institutes, such as JAXA [6–10]. The advantages of small robots are as follows: 1. the development period and cost can be reduced; 2. the robots can be mounted alongside other payloads. The details of extreme environments, such as a planetary surface, are almost unknown; it is therefore hard for the robots to localize themselves.
2
The Proposed Exploration Swarm Robot System and Area Control Algorithm
In this section, we present the proposed exploration system for wide areas, together with the concept and a description of our proposed algorithm. 2.1
Distributed Robot System for Wide-Area Exploration by Swarm Robots
This study focuses on a small-swarm-robot system without path planning. The schematic diagram is shown in Fig. 1 and the system structure in Fig. 2. We consider a simple configuration for the exploration system that does not assume the ground robots have abundant sensing and communication capabilities. The proposed exploration system assumes a heterogeneous small-robot system in which a small hopping leader robot occupies the upper layer and small wheeled swarm robots occupy the lower layer. We do not consider a large robot as the leader, since this study focuses on small swarm robots. The wheeled swarm robots in the lower tier are engaged with a single leader robot and explore the surrounding area widely, with the leader as the dispersion center. It is generally difficult for small ground robots to observe the environment and create a map by their own sensing, due to their constricted field of view; in addition, small ground robots do not plan paths, for the reasons mentioned above. The ground robots need to be aware of their positional relationship with the leader robot, and for this purpose we employ a previous study [11]. Since there are studies on global and local path planning by leader robots, this paper does not focus on the motion planning or control of the leader robot [12]. The leader robot's role in this study is to serve as a reference point for the ground robots, which cannot sense the environment. The ground robots periodically observe the leader robot, which sits at a higher altitude than their restricted field of view; the leader robot, for example, leaps directly above the ground robots at the same position until they finish their exploration. At this time, the leader robot also repeats its own observations to determine and search for the next position to move to. This paper focuses only on the exploration method around this distributed reference point and shows how to design an algorithm for ground robots that act independently and autonomously based only on their position relative to the leader robot. In addition, the loss of some robots does not significantly affect this system, because the total number of robots is much larger than the number lost; thus, it has an advantage in terms of risk diversification [13].
Fig. 1. The overview of the proposed system
2.2
Fig. 2. The structure of the proposed system
Overview of the Region Control Algorithm with Probabilistic Action Decision
In this method, we adopt a random walk as the basic behavioral model of the robots and suppress their actions in a stochastic manner [14]. This section explains the proposed stochastic distributed exploration algorithm for wide-area observation. In the assumed exploration system, the objective is for the lower-layer robots to perform wide-area distributed exploration around a dispersion center, such as the leader, without commands from the upper layer. There are many sources of randomness, such as obstacles and topographic variation in natural environments or disaster areas, so it is difficult to predict the robot positions accurately. Considering the low intelligence, functionality, and performance of the robots, the algorithm should be as simple as possible. Formation control by animal mimicry has been the mainstay of conventional swarm-intelligence studies; however, these methods are not suited to the kind of exploration this study focuses on [15]. Therefore, we adopt a stochastic and simple random-walk model as the basic algorithm. The random-walk model is generally used to describe the dispersion of particles from an arbitrary point, which is known to gradually expand spatially with the passage of time. We therefore consider the random-walk-based method suitable for wide-area exploration. However, if the spread distance becomes too large, many robots move away from the dispersion center and become isolated. It is therefore necessary to moderately restrain probabilistic movement away from the center of dispersion.
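To make the diffusion claim concrete, here is a standard result (stated as background, not taken from the paper): for a planar random walk with i.i.d. zero-mean steps $s_k$ of fixed length $\ell$, the cross terms vanish in expectation and the mean squared displacement after $n$ steps grows linearly in $n$:

$$\mathbb{E}\big[\lVert X_n \rVert^2\big] = \mathbb{E}\Big[\big\lVert \textstyle\sum_{k=1}^{n} s_k \big\rVert^2\Big] = \sum_{k=1}^{n} \mathbb{E}\big[\lVert s_k \rVert^2\big] = n\ell^2,$$

so the typical distance from the start grows as $\ell\sqrt{n}$, which is why an unconstrained swarm slowly but unboundedly drifts away from the dispersion center.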
2.3
Algorithm for Probabilistic Action Decision Making
For this purpose, we propose the following stochastic action control method using the Metropolis method. The basic behavior of the robot is a repetition of turning and walking straight ahead, with the turning angle selected at random. In a general random walk, since the variance is proportional to time, the robots gradually diffuse as time goes by; it is thus necessary to control the region where the followers stay. A related study controls the distribution of robot presence in a given region with a probabilistic approach in which the distribution of agents over the state space is modeled [16]. In this study, the robot presence region is defined based on the distance from the dispersion center, and each robot makes probabilistic action decisions based on that region. The stochastic action decision method is as follows.

(1) Decide the destination (x′, y′). Determine the direction of movement by random walk and estimate the amount of movement from the current position. The estimate of the amount of movement is based on information obtained internally by the robot, such as rotary-encoder readings. Let dx and dy be the displacements along the coordinate axes, respectively:

$$(x', y') = (x + dx, y + dy) \qquad (1)$$

(2) Substitute the distance between the current location and the destination into the proposal distribution. We prepare a proposal distribution with the distance between the robot and the dispersion center on the horizontal axis and the probability on the vertical axis. For the proposal distribution, we use the normal distribution: its probability is highest at the center and decreases outward, which realizes an algorithm that always adopts actions toward the center and judges actions toward the outside probabilistically, as described in the previous section. The normal distribution has internal parameters, the standard deviation σ and the mean μ, and can be expressed as

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (2)$$

(3) Calculate the ratio of probabilities. The probability p of a follower at each location is calculated by substituting into the proposal distribution the distance α between the leader and the follower before moving, and the estimated distance α′ between the leader and the follower at the destination. The ratio of p(α) at the current distance to p(α′) after the move is denoted r:

$$r = \frac{p(\alpha')}{p(\alpha)} \qquad (3)$$
(4) Generate a uniform random number R (0 < R < 1) and compare it with r to evaluate the behavior. If R < r, the destination (x′, y′) is adopted and the action is actually taken. If R > r, the generated move is rejected and the robot takes a rejection action. Various patterns of rejection action could be considered, such as re-selecting the direction of movement or moving toward the center of dispersion; in this study, a lateral movement toward the dispersion center is used, which allows wide observation within the region. The rejection action induces the robots to rotate within the dispersed area and reduces observation bias. With the stochastic control method described above, a move toward the dispersion center is always accepted, since r > 1 in that case, while a move away from the dispersion center is probabilistically rejected. The proposed method can keep the group of robots within 3σ of the proposal distribution and perform diffuse exploration. The internal parameters of the proposal distribution can be used for stochastic control of the robots and to control the size of the search area around the center of dispersion.
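The four steps above translate almost line-for-line into code. The Python sketch below is a minimal illustration under simplifying assumptions (fixed step length, and a plain "stay in place" rejection instead of the lateral rejection action used in the paper):

```python
# Minimal sketch of the four-step probabilistic action decision (Eqs. 1-3):
# a random-walk proposal is always accepted when it moves toward the
# dispersion center (r >= 1) and accepted only with probability r otherwise.
import math
import random

def p(alpha: float, mu: float, sigma: float) -> float:
    """Proposal distribution of Eq. (2): normal density over the distance."""
    return math.exp(-(alpha - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def metropolis_step(x, y, center, mu=0.0, sigma=2.0, step_len=1.0):
    theta = random.uniform(0.0, 2.0 * math.pi)            # (1) random turning angle
    nx, ny = x + step_len * math.cos(theta), y + step_len * math.sin(theta)
    alpha = math.hypot(x - center[0], y - center[1])      # (2) current distance
    alpha_new = math.hypot(nx - center[0], ny - center[1])
    r = p(alpha_new, mu, sigma) / p(alpha, mu, sigma)     # (3) probability ratio
    if random.random() < r:                               # (4) accept or reject
        return nx, ny
    return x, y  # rejection; the paper instead moves laterally toward the center

# 150 steps for one robot around a dispersion center at (10, 10)
pos = (10.0, 10.0)
for _ in range(150):
    pos = metropolis_step(*pos, center=(10.0, 10.0))
print(pos)
```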
3
Evaluation by Simulation
In this section, we show how the control algorithm with stochastic action decisions proposed in the previous section behaves in simulation. 3.1
Algorithm Verification by Simulator and Comparison of Search Efficiency by Changing Control Parameters
For the simulation of the stochastic domain-control algorithm, the number of robots was set to 10, the straight-ahead distance per move to 1.0 [m], and the total number of steps to 150. The internal parameters of the proposal distribution were set to mean μ0 = 0 and standard deviation σ0 = 2.0. The dispersion center was set at (10, 10), and the robots' initial positions were set to the dispersion center. Figure 3 shows the results of exploration around the dispersion center using the stochastic dispersion algorithm, and Fig. 4 shows the ideal distribution for the same control parameters. The values on the color bars indicate the number of overlapping observations for each square of the grid map. From Fig. 3, it is confirmed that the stochastic region-density control algorithm keeps the robots within a region of radius 6.0 [m] from the dispersion center, suppressing the diffusion phenomenon over time. Since the robots do not deviate significantly from the radius 3σ = 6.0 [m], the extent of the tail of the proposal distribution prepared in advance,
it can be said that the robots are controlled within the range dictated by the control parameters. This also holds in comparison with the ideal distribution shown in Fig. 4. In Fig. 3, all cells are colored except unexplored ones, and the blue cells toward the outside are lighter-colored, as specified; this area lies within a radius of 6.0 [m] from the dispersion center. We also examined whether the proposed method produced exploration results in accordance with the proposal distribution. A histogram along the x-axis of Fig. 3 is shown in Fig. 5: a high density of robots forms around the center of dispersion. The histogram was fitted, and the standard deviation obtained for the normal distribution was 1.7. Although this value is not exactly equal to the set value σ0 = 2.0, the results are close in tendency and in value, confirming that the control follows the proposal distribution.
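This check is easy to reproduce; in the sketch below, the placeholder samples stand in for the robots' logged x-positions, which are what the histogram analysis would actually use.

```python
import numpy as np

xs = np.random.normal(loc=10.0, scale=2.0, size=1500)  # placeholder for logged x-positions
sigma_est = xs.std(ddof=1)
print(f"estimated sigma = {sigma_est:.2f}  (set value 2.0; the paper obtained 1.7)")
```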
Fig. 3. Stochastic distributed control
Fig. 4. Ideal distribution for stochastic distributed control
When the mean value of the control parameters is μ0 = 0, as shown in Fig. 3, movement concentrates around the dispersion center; exploration progress toward the outside of the area is delayed, and exploration efficiency is lost. To improve exploration efficiency, it is desirable to eliminate this concentration of movement around the center of dispersion and allow the robots to actively move to more distant locations. Our method therefore mitigates the concentration of robot movement around the dispersion center by changing the mean value μ0 of the proposal distribution, a control parameter, away from 0. In the case of μ0 = 1.0, the value assigned to the proposal distribution is the distance between the robot and the dispersion center, which means that
Fig. 5. Exploration density by histogram
the frequency of exploration behavior of the robot at a distance of 1 [m] from the dispersion center increases. This is expected to alleviate the concentration of exploration frequency at the center of dispersion, thereby increasing the efficiency of exploration. The results of the simulation obtained by changing the mean value, an internal parameter of the proposal distribution, are shown in Fig. 6. In this case, the internal parameters were mean μ0 = 5.0 and standard deviation σ0 = 2.0. The ideal distribution for the same parameters is shown in Fig. 7. The number of robots was set to 10 and the number of steps to 150. The results show that the search density is not concentrated at the center of dispersion but is distributed over a wide area around it, and the bias of the search density is suppressed compared to the case with internal parameters μ0 = 0 and σ0 = 2.0.
Fig. 6. Torus probability distribution density
Fig. 7. Ideal distribution for the torus probability distribution density
4 Development and Implementation of Exploration Systems
In this section we describe the exploration system we have built around the proposed algorithm. We describe the hardware and software of the entire exploration system and the robot, called RED, that we developed to validate the proposed algorithm.
4.1 Hardware Development of the Robot for Actual Equipment Verification
In order to demonstrate the stochastic region control algorithm described in the previous section, we developed the tubular ground robot shown in Fig. 8 for experimental verification; it corresponds to the lower level of the exploration system shown in Fig. 2. We named this robot RED and will refer to it as the RED robot from now on. The robot is 20.0 [cm] wide, 17.0 [cm] deep, and 13.0 [cm] high, and weighs 600 [g]. It moves by repeatedly turning and going straight. The wheels of the RED robot are uneven to improve its ability to overcome obstacles, and their sides have protrusions that make it easier for the robot to right itself if it gets stuck. Figure ?? shows how the RED robot observes the floor with its camera. This floor camera allows the robots to observe and cover the space when turning and traveling. The hardware configuration of the robot is shown in Fig. 9. The RED robot consists of a Raspberry Pi equipped with a motor unit for driving and a gyro sensor for posture assistance. The robot carries two cameras, one on the front and one on the top. The front camera takes pictures of the floor to detect scratches. In this system, an LED marker is used in place of the upper-layer robots in order to validate the probabilistic exploration algorithm for the lower-layer ground robots. The top camera is used to detect the LED marker, which is treated as a dispersion center. The RED robot measures
Fig. 8. A robot named RED: a hole in the top surface exposes a camera that detects LED markers in the ceiling direction. The robot is equipped with a distance sensor, a camera, and LED lights at the front.
Fig. 9. Structure of robot system
Fig. 10. The RED robot explores the area around one or more LED markers serving as dispersion centers.
in which direction the LED markers are located and how far away they are. These measurements are used to run the proposed algorithm. The LED markers are placed in the exploration space as shown in Fig. 10, and the RED robots are allowed to explore around them. The height (z-component) of the LED markers is given to the robots as a parameter in advance; by detecting a marker with the top camera, a robot can then calculate its distance to the marker in the x-y plane.
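For illustration, the x-y distance computation can be sketched as follows; a simple pinhole model for the upward camera is our assumption, as the paper does not detail the geometry:

```python
import math

def marker_xy_distance(marker_px, center_px, focal_px, marker_height_m):
    """Estimate horizontal distance and bearing to an LED marker seen by the
    upward-facing camera, given the marker height provided in advance."""
    dx = marker_px[0] - center_px[0]
    dy = marker_px[1] - center_px[1]
    radial_px = math.hypot(dx, dy)           # offset from the optical axis
    # tan(angle from vertical) = radial_px / focal_px for a pinhole camera
    distance_xy = marker_height_m * radial_px / focal_px
    bearing = math.atan2(dy, dx)             # direction of the marker
    return distance_xy, bearing
```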
Fig. 11. Structure of software system
4.2 Software and Exploration System Configuration
The software configuration is shown in Fig. 11; information is integrated by a Git server and a UDP server running on a PC. Each RED robot keeps its software up to date by pulling from the Git server. In addition, we developed the GUI application shown in Fig. 12, which integrates the UDP server. Each RED robot sends its individual address, acquired images, and its own state to
the application, and the application sends the parameters each robot needs to operate. The application sends the following parameters to the robot:
1. Height of the LED marker (measured in advance)
2. Standard deviation σ and mean value μ of the proposal distribution
3. The robot's straight-line speed
The RED robots can act autonomously with the proposed algorithm based on these parameters and on their positional relationship with the LED markers, which are the dispersion centers. They send the following information to the application:
1. Estimated distance and direction from the center of dispersion at each step
2. Camera pictures in the floor and ceiling directions at each step
3. Status of each robot, including connection status
The application analyzes the information received from the robots to estimate their distribution and parameters. In addition, by observing the robots' exploration with an external camera, we can identify explored and unexplored locations. By determining the percentage of explored and unexplored locations within the exploration area, we can estimate the coverage of space relative to time. A sketch of the status exchange is given below.
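The wire format below is hypothetical: the paper lists the exchanged fields but not how they are serialized, so JSON over UDP (and the address/port) are our assumptions.

```python
import json
import socket

# Hypothetical robot-to-application status packet (field names illustrative)
status_packet = {
    "robot_id": 3,
    "step": 42,
    "dist_to_center_m": 4.8,   # estimated distance from the dispersion center
    "bearing_rad": 1.2,        # estimated direction of the LED marker
    "status": "connected",
}
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(status_packet).encode("utf-8"), ("192.168.0.10", 5005))
```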
5 Exploration Field Tests and Results
In this section, we present experimental results from a real robot system to validate the proposed exploration algorithm.
Fig. 12. Application for management and analysis of the swarm robot system
5.1 Description of Experimental Environment/Conditions and Data Collected
The exploration field was a concrete area within a black circle, as shown in Fig. 13. Ten robots were operated in the exploration area. The control parameters of the proposed algorithm were set to mean μ = 5.0 and standard deviation σ = 2.0. The radius of the black circle was 8.0 [m], so the exploration area measures 64π [m²]. The exploration was performed for 1200 [s]. The RED robots were observed by an external camera; the trajectories of their movements were extracted from the video and are highlighted in white in Fig. 13. The LED marker posts were placed at the center of the circular area in Fig. 13 but were excluded from the figure for the sake of analysis. In surface exploration, the size of the covered area and the number of overlaps (uniformity) affect the efficiency of exploration. To validate exploration efficiency, the experiments are evaluated in terms of the "exploration time" required to reach a certain level of coverage of the area. We created a heat map to evaluate exploration efficiency, shown in Fig. 14. The numbers on the color map indicate the number of overlapping explorations of each location by the robot swarm; the number of overlapping visits increases as the color changes from blue to red. The robots estimate the direction and distance of the dispersion center at each step, and a graph summarizing the estimated distances is shown in Fig. 15. A graph of the coverage of the exploration area versus time was created from the information on unexplored and explored squares obtained from the heat map. This graph is shown in Fig. 16.
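The coverage value plotted in Fig. 16 can be computed directly from such a heat map; a minimal sketch follows, in which the grid resolution and the indexing of the center cell are assumptions:

```python
import numpy as np

def coverage(heat_map, cell_size_m, center_rc, radius_m=8.0):
    """Fraction of grid cells inside the circular exploration area that were
    visited at least once; `heat_map` holds per-cell observation counts."""
    rows, cols = np.indices(heat_map.shape)
    dist_m = cell_size_m * np.hypot(rows - center_rc[0], cols - center_rc[1])
    in_area = dist_m <= radius_m
    explored = (heat_map > 0) & in_area
    return explored.sum() / in_area.sum()   # e.g. 0.78 after 1200 s
```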
Fig. 13. Robot movement trajectories obtained from video analysis of the wide-area exploration experiment (trajectories are painted in white).
Fig. 14. Heat map of the robot’s movement trajectory
Fig. 15. Histogram of estimated distance between robot and dispersion center
Fig. 16. Graph of spatial coverage of the exploration area of radius 8.0 [m] versus exploration time (in seconds)
5.2 Examination of Collected Data
From Fig. 6 and Fig. 14, it is confirmed that the on-torus region control works in the actual experiment as well as in simulation, depending on the control parameters. Torus-like region control arises when the control parameter mean μ is set larger than 0 to prevent the robots' stochastic behavior from concentrating around the center of dispersion. Figure 15 shows that the frequency of robot movement is highest around a distance of 5.0 [m] from the dispersion center; the distribution of robots thus follows the control parameter mean μ = 5.0. This data confirms that torus-like area control was achieved in the actual experiment. Figure 16 shows that in 1200 [s], the robots were able to explore 78% of the area within a radius of 8.0 [m] from the LED marker, which is the center of dispersion.
6 Conclusion
In this paper, we proposed a distributed exploration system using swarm robots that can maintain swarm formation in a given exploration area and observe the entire area while the robots act autonomously and flexibly. A stochastic exploration algorithm for area control based on a random walk model of swarm robots was proposed. Simulation results show that the proposed algorithm can observe the area around an arbitrary center of dispersion and control swarms within a given area while maintaining autonomy and diffuseness. An actual wide-area exploration experiment was conducted using the RED robot swarm developed for algorithm verification. The experimental results showed that the actual exploration system responds to the control parameters as effectively as the simulation, and 78% of an exploration area of radius 8.0 [m] was observed in 1200 [s]. Therefore, the proposed exploration system and control algorithm are practical for wide-area distributed exploration.
Acknowledgment. The results of this research were carried out in collaboration with the Space Exploration Innovation Hub of the National Aerospace Laboratory of Japan Aerospace Exploration Agency (JAXA), Takenaka Corporation, and Chuo University. We would like to take this opportunity to express our deepest gratitude.
Distributed Estimation of Scalar Fields with Implicit Coordination
Lorenzo Booth and Stefano Carpin
Department of Computer Science and Engineering, University of California, Merced, 5200 N Lake Road, Merced, CA 95343, USA
{lbooth,scarpin}@ucmerced.edu
Abstract. Motivated by our ongoing work in robotics for precision agriculture, in this work we consider the problem of estimating a scalar field using a team of robots collecting samples and subject to a travel budget. Our fully distributed method leverages the underlying properties of Gaussian Process regression to promote dispersion using minimal information sharing. Extensive simulations demonstrate that our proposed solution outperforms alternative approaches.
Keywords: Coordination · Applications in Agriculture · Estimation
1 Introduction
In this work we consider the problem of estimating a scalar field using a team of collaborating robots. This problem is motivated by our ongoing research in precision agriculture, where it is often necessary to estimate the spatial distribution of parameters such as soil moisture, nitrates, or carbon dioxide flux that can be modeled as spatially-varying scalars. To model this underlying field, we use a Gaussian process (GP) [6]. GPs are not only elegant and efficient computational tools for solving regression problems, but are also the model of choice in geostatistics and agroecological modeling, where GP regression is known as kriging [9]. A key feature of GP regression is that the model makes it easy to estimate the uncertainty of the predictions, thus enabling iterative methods where new sensor measurements are collected in regions of high uncertainty to refine accuracy. In agricultural applications one is often faced with the problem of estimating quantities over very large domains; the use of multiple robots therefore allows for quicker coverage of the region of interest. In these conditions, robots have to plan their motions with multiple objectives.
Footnote: L. Booth is supported by the Labor & Automation in California Agriculture (LACA) project (UC-MRPI initiative). S. Carpin is partially supported by the National Science Foundation (NSF) under NSF Cooperative Agreement Number EEC-1941529 (IoT4Ag) and by USDA/NIFA under award #2021-67022-33452. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.
When an exhaustive, systematic coverage
of the entire region is not feasible, it is important to collect samples at the most informative places. It is also necessary to account for the limited distance that can be covered before a robot must be refueled or recharged: robots have to plan paths that terminate at a designated end point before they run out of energy or fuel. Likewise, in real-world applications the energy consumed to move between two locations is not deterministically known upfront; rather, it is a random variable whose realization is only known at run time. For example, a robot may need to take a detour to reach a certain place, or it may move through a muddy area causing wheel slippage. Finally, in rural regions communication infrastructure is often lacking or limited, and robots therefore cannot assume the availability of broadband communication channels. With these motivations in mind, in this paper we present an algorithm to solve this estimation problem with a team of robots. Coordination between the agents is obtained with minimal information exchange by leveraging the mathematical properties of GPs to promote dispersion. The approach is fully distributed: each robot just broadcasts to the rest of the team the locations where it has collected data, the values it measured, and its unique identifier (no more than a handful of bytes at a very low frequency). No other communication is required, and robots never exchange their individual plans or models. In addition, each robot uses a refinement of our recently developed planner for stochastic orienteering to ensure that it reaches the final location before it runs out of energy. Through extensive simulations we demonstrate that this approach leads robots to collect samples in areas yielding a more accurate reconstruction of the underlying unknown scalar field. The rest of the paper is organized as follows. Selected related work is presented in Sect. 2. The problem formulation is introduced in Sect. 3 and our methods are discussed in Sect. 4. In Sect. 5 we present extensive simulations to evaluate our proposal, and in Sect. 6 we draw conclusions.
2 Related Work
The problem considered in this contribution is related to the orienteering combinatorial optimization problem where one has to plan a path through a weighted graph to gather the maximum sum of vertex rewards while ensuring that the length of the path does not exceed a preassigned travel budget. Recently, we have extensively studied this problem in agricultural settings, both for single agents [7,13] and multiple agents [14], and we also considered its stochastic variants [12]. In all these works, however, rewards associated with vertices were static and assigned upfront, while in this work rewards associated with vertices are iteratively re-estimated based on the gathered samples. Moreover, our former multi-robot solution [14] was centralized, while we here propose a fully distributed approach. The use of Gaussian Processes for estimating scalar fields with robots has also been explored in the past. Suryan and Tokekar [11] proposed an algorithm
to reduce the variance of a scalar field estimated with a GP in minimal time. Their solution uses a team of robots but does not consider travel budgets, i.e., robots can travel as much as needed. Similarly, Di Caro et al. [2] propose an estimation algorithm for GPs aiming at reducing uncertainty under time and communication constraints; however, their solution does not consider a travel budget either. Our work is also related to informative path planning (IPP), where the emphasis is on planning paths to collect the most informative samples [3,4]. In IPP, however, the set of candidate sample points is not given but is determined by the algorithm, and the travel budget is typically not explicitly considered.
3 Problem Statement
We consider the problem of estimating a scalar function f : X → R defined over a bounded region of interest X, given a limited number of observations collected by multiple robots. The goal is to determine where the robots should collect these samples. As commonly done in this domain, a graph structure is used to model the navigable environment. We assume that observations of the underlying scalar function can be collected at a limited set of sampling locations, the set of vertices V of the graph. This assumption holds in a variety of real-world agricultural applications, where specific points of interest (e.g., sentinel trees that serve as early indicators of ecosystem health) have been pre-identified, or where the robots sense the environment by leveraging pre-deployed infrastructure (such as soil sensors implanted in the ground). The assumption is not restrictive; when prior sensing locations are not specified, one can choose V arbitrarily, e.g., as a set of equally-spaced points covering the region of interest, as is typical in naïve surveying schemes. We consider this navigable environment as a complete graph with edge set E = V × V. To each edge e ∈ E we assign a random variable c(e) representing the movement cost (e.g., energy spent) when a robot traverses the edge. We assume that the density functions characterizing these random variables are known. All robots must begin at an assigned start vertex v_s and end at an assigned goal vertex v_g before they run out of energy. Each robot r_i starts with a travel budget B_i. When the robot traverses an edge e, its budget decreases by a random value drawn from the random variable c(e). For simplicity, we assume that all B_i are the same, but this is not a strict requirement. Each time a robot visits a location v ∈ V, it uses its onboard sensor(s) to collect a sample of the underlying function f, obtaining a noisy observation y_v = f(v) + ε, where ε ∼ N(0, σ_m²) is measurement noise, Gaussian-distributed with zero mean and variance σ_m². With regard to communication, we make the following two assumptions. While robots are collecting data, they can only anonymously broadcast packets of the type (v, y_v, n), indicating that they collected observation y_v at vertex v. The last component n is a unique id assigned to each robot; its use will be presented
in Sect. 4. At the end of the mission, after the robots have converged at the goal vertex v_g and are in proximity, they can exchange all the data they have gathered during the mission. However, at that point data collection is concluded, and they cannot return to the field to acquire more data to improve the estimate. These communication assumptions are consistent with contemporary technology used by robots in the agricultural domain. In particular, LoRa [10] offers the capability of streaming limited amounts of data over long distances and is compatible with the assumptions we make while robots are out in the field. When robots terminate their mission and are in close proximity, data can instead be exchanged using onboard WiFi. To reconstruct the scalar field we use Gaussian process (GP) regression, as detailed in Sect. 4. Throughout the mission, using the available data (either collected or communicated), robots can make predictions about the value of f at arbitrary locations in X. We indicate such predictions as f̂. The overall objective is to collect the set of observations providing the most accurate reconstruction of the underlying scalar field. As common in the estimation literature, our metric for accuracy is the mean squared error (MSE), defined as

$$\mathrm{MSE} = \frac{1}{|\mathcal{X}|} \int_{\mathcal{X}} \left( f(\psi) - \hat{f}(\psi) \right)^2 d\psi.$$
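In practice the integral is approximated numerically; a minimal sketch, assuming the region is represented by a dense set of evaluation points:

```python
import numpy as np

def mse(f_true, f_hat, points):
    """Grid/Monte Carlo approximation of the MSE integral over the region X.
    `points` is an (N, 2) array of evaluation locations covering X."""
    err = np.array([f_true(p) - f_hat(p) for p in points])
    return float(np.mean(err ** 2))
```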
4 Methods
4.1 Gaussian Process Regression
We provide the necessary notation and background for GP regression and refer the reader to [6] for a more comprehensive introduction. As per our problem definition, we describe the spatial distribution of an unknown parameter (moisture, nitrates, etc.) as a function f : X → R that is continuous over the 2-D environment X ⊂ R² where measurements are taken. Measurements of the environmental field f are corrupted by i.i.d. Gaussian noise with zero mean and variance σ_m². The Gaussian process assumption is to model f as a random probability distribution over a set of functions such that the values of f at arbitrary inputs x and x′ (f(x) and f(x′), respectively) have a jointly-Gaussian distribution. We assume that f(x) is a realization of a Gaussian process, which is completely defined by a mean function m(x) and a covariance function k(x, x′):

$$f(x) \sim \mathcal{GP}(m(x), k(x, x')) \quad (1)$$
The joint distribution of the observations y = {f(x₁)+ε₁, ..., f(x_n)+ε_n} (the explanatory variable) at inputs X = {x₁, ..., x_n} and the function value f(x_*) (the response variable) at a query point x_* can be written as:

$$\begin{bmatrix} y \\ f(x_*) \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} k(X, X) + \sigma_n^2 I_N & k(X, x_*) \\ k(x_*, X) & k(x_*, x_*) \end{bmatrix} \right) \quad (2)$$
where y is a column vector of scalar outputs from a training set D of n observations, D = (X, y) = {(x_i, y_i) | i = 1, ..., n}, k is the covariance function (or kernel), σ_n² is the variance of the observation noise, and the input vectors x and query points x_* of dimension d are aggregated in the d × n design matrices X and X_*, respectively. Through the marginalization of jointly Gaussian distributions, we can derive the following predictive conditional distribution at a single query point, f_* | D, x_* ∼ N(E[f_*], V[f_*]), as [6]:

$$\mu_* = \mathbb{E}[f_*] = k(x_*, X) \left[ k(X, X) + \sigma_n^2 I_n \right]^{-1} y \quad (3)$$

$$\sigma_*^2 = \mathbb{V}[f_*] = k(x_*, x_*) - k(x_*, X) \left[ k(X, X) + \sigma_n^2 I_n \right]^{-1} k(X, x_*) \quad (4)$$
where k(X, X) is the matrix of prior covariances of the function f at the inputs X, and k(x_*, X) is the matrix of covariances between the function at the query points and the training inputs. The covariance function k (or kernel) captures prior knowledge about the function of interest, including stationarity and smoothness. Often, it is assumed that the covariance of any two random variables depends only on their distance (isotropy), independent of their location (stationarity) [6]. Note that the variance in Eq. (4) can be computed for any point, not only for the observed locations. This serves as a means to reason about overall map uncertainty at unobserved locations and is key for the robots to decide where to sample next to decrease the MSE at the end of the surveying task.
4.2 Spatial Prior
The kernel k establishes a prior likelihood over the space of functions that can fit the observed data in the regression task. Kernel selection and tuning are key components of GP regression. In machine learning, the radial basis function (RBF) kernel is often used; in this paper, however, we use the Matérn kernel with ν = 3/2, which is a finitely-differentiable function. Our choice of this kernel is motivated by its broad use in the geostatistical literature for modeling physical processes [9] like those motivating this research. The Matérn covariance function takes the form:

$$K_{\mathrm{Matern}}(X, X') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}}{l} r \right)^{\nu} K_{\nu}\!\left( \frac{\sqrt{2\nu}}{l} r \right) \quad (5)$$

where K_ν is a modified Bessel function, Γ(·) is the Gamma function, and r is the Euclidean distance between input points X and X′. The hyperparameters ν > 0, l > 0, and σ² > 0 represent smoothness, lengthscale, and observation variance, respectively. As is common in GP inference, to account for the measurement noise,
the kernel we use is the sum of two kernels, namely the Matérn kernel and a noise term, i.e., K(X, X′) = K_Matern(X, X′) + σ_n² I, where I is the identity matrix and the term σ_n² models the measurement noise σ_m². While we keep ν fixed at 3/2, the other hyperparameters θ = {σ², σ_n², l} can be trained using various optimization methods based on the marginal likelihood, to match the properties of the environment and the sampled data [6]. In particular, the length scale l is related to the lag parameter of the variogram, a function used in geostatistics that establishes how quickly the variance increases as a function of the separation distance between pairs of observations in space [9]. In the GP kernel, smaller values of l imply that variance grows quickly with distance, while with larger values the variance grows more slowly. As we will see in the next subsection, by putting constraints on the range of possible values of l, one can implicitly encourage dispersion between the robots, thus promoting the collection of samples in different areas of the environment.
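Equations (3)-(5) are compact enough to implement directly; below is a minimal, non-optimized sketch with a zero prior mean and fixed hyperparameters (both assumptions):

```python
import numpy as np

def matern32(A, B, sigma2=1.0, l=1.0):
    """Matern kernel with nu = 3/2 in closed form (a special case of Eq. (5))."""
    r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    s = np.sqrt(3.0) * r / l
    return sigma2 * (1.0 + s) * np.exp(-s)

def gp_posterior(X, y, Xq, sigma_n2=1e-2, **kern):
    """Predictive mean (Eq. (3)) and variance (Eq. (4)) at query points Xq."""
    K = matern32(X, X, **kern) + sigma_n2 * np.eye(len(X))
    Kq = matern32(Xq, X, **kern)                       # k(x_*, X)
    mu = Kq @ np.linalg.solve(K, y)                    # Eq. (3)
    var = (matern32(Xq, Xq, **kern).diagonal()
           - np.einsum("ij,ji->i", Kq, np.linalg.solve(K, Kq.T)))  # Eq. (4)
    return mu, var
```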
4.3 Exploration
In this section we present the planning algorithm executed by each robot in the team. No global data structure is shared among the agents, and all quantities described in the following are local to each robot. Let G = (V, E) be the graph of possible sampling locations and let D = {(v_i, y_i), i = 1, ..., n} be the set of collected samples (vertices and values). All robots start from the same start vertex v_s and must end at the same goal vertex v_g. D is initialized as an empty set and grows as the robot collects more data or receives data from other agents. Each robot is given a unique numerical identifier n_i, but the robots need not know how many agents are in the team. At each iteration the robot assigns a reward to each of the vertices in V, i.e., it computes a function r : V → R assigning a value to each possible sampling location. Different options for the function r will be discussed in the next subsection. Once the function r has been computed, the robot faces an instance of the stochastic orienteering problem: it has a graph G = (V, E) with known deterministic rewards r associated with vertices and stochastic costs c associated with edges, as well as a residual budget B. At this point the robot executes the algorithm presented in [12] to solve the stochastic orienteering problem (SOP). Because of the intrinsic computational complexity of the orienteering problem, the SOP algorithm uses a Monte Carlo tree search informed by a heuristic aiming at identifying vertices with high value r, low travel cost, and from which the robot can still reach v_g with high probability (the reader is referred to [12] for details). The SOP algorithm returns the next vertex v_a to visit. The robot then moves to v_a, collects an observation y_a, updates D, and broadcasts the packet (v_a, y_a, n_i) to all other agents, where n_i is the unique id of the robot. This process continues until the SOP algorithm returns v_g, in which case the agent moves to the goal vertex v_g and terminates. Throughout the process the
robot keeps listening for packets broadcast by other agents; when they are received, the locations and sampled values are added to D. As the SOP algorithm was developed for the single-robot case, in this work we added a minor modification to account for the presence of multiple robots. The change is as follows: when considering the reward of a vertex r(v), the robot considers, for each other agent, the last packet it transmitted (if one exists). If it determines that another agent is closer to v than itself, it discounts the reward r(v). More precisely, let v be the vertex whose utility is being evaluated and r(v) its reward. Let v_c be the location of the current robot, and assume that robot j has broadcast a packet indicating it collected a sample at vertex v_j. If v_j is closer to v than v_c, then r(v) is updated as r′(v) = r(v) d(v_j, v)/d(v_c, v), where d is the Euclidean distance between the vertices. The rationale for this update is that if another robot is closer to v, it is more likely to reach v than the former robot, so the utility of v is decreased for the former robot to prevent both robots from visiting v, which would be a replicated effort wasting resources. However, the utility is not set to zero, because robots do not communicate their plans and therefore do not know each other's intentions. Also, since each robot maintains its own set of GP hyperparameters (see discussion below) and these will differ, robots cannot make absolute predictions about the intentions of other team members. A sketch of the reward discount is given below. Remark: one could imagine that after a robot has determined which vertex v it will visit next, it broadcasts this information to the other agents so that they no longer consider it among their choices. However, this is not done for two reasons. First, such additional communication would almost double the amount of transmitted data, going against our effort to keep exchanged information at a minimum. Second, because of the stochastic nature of the environment there is no guarantee that a robot electing to visit a certain vertex will eventually reach it and collect a sample. Hence we opt for the current approach, where robots share measured data only after they have reached and sampled a location.
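A minimal sketch of the discount rule follows; applying it once per closer teammate is our reading of the pairwise rule described above:

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D vertex coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def discounted_reward(r_v, v, v_c, last_broadcast_vertices):
    """Apply r'(v) = r(v) * d(v_j, v) / d(v_c, v) whenever another robot's
    last broadcast vertex v_j is closer to v than this robot's location v_c."""
    for v_j in last_broadcast_vertices:
        if dist(v_j, v) < dist(v_c, v):
            r_v *= dist(v_j, v) / dist(v_c, v)
    return r_v
```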
4.4 Vertex Quality Computation
Key to the presented approach is the reward function r : V → R used by the SOP algorithm to decide which vertex to visit next. Ideally, the function should identify instrumentally good vertices to visit, where good in this case means vertices whose visit will yield a reduction of the MSE metric. Different metrics have been proposed in the literature. One obvious choice is to use Eq. (4) to predict the variance at vertices in V and set r(v) = σ²(v); in this case, the objective is to assign high values to vertices whose estimate has high uncertainty. In [8] the authors instead propose a linear combination of the mean and standard deviation predicted by Eqs. (3) and (4). Their approach aims at discovering the extrema of an unknown function; as in our application we are interested in the entire function, and not just its peaks, we could set r(v) = |μ(v)| + βσ(v). Finally, in [1] the authors propose an algorithm to compute the mutual information for vertices (prior to and after being added to the movement graph) using predictions for mean and variance. After having implemented these three alternatives,
preliminary experiments did not reveal significant differences between them. However, setting r(v) = σ²(v) has the advantage of not requiring the tuning of additional parameters, as is instead necessary for the other two methods. Therefore, informed by these preliminary findings, in our implementation each robot assigns the predicted variance as the value of a vertex. Note that for vertices already in D the algorithm sets r(v) = 0, so that robots never reconsider vertices that have already been sampled at least once. The kernel we use to make predictions about the variance depends on three hyperparameters θ = {σ², σ_n², l} that can be tuned to best fit the data in D. As pointed out in [6], Ch. 5, better results can be obtained by repeating the optimization process multiple times with random restarts, to avoid getting stuck in suboptimal local minima. In our approach, before assigning values to the vertices, each robot executes the optimization locally with ten restarts but never communicates the hyperparameters θ of its internal model to the other team members. Each agent thus operates with its own set of hyperparameters θ, unlikely to match the others' due to the random restarts of the optimizer. This difference further decreases the likelihood that multiple agents select the same vertices to sample, because even with identical sets D the variance predicted by the GP will differ. However, during the optimization process each agent uses the same lower bound l₀ for the length scale l. This choice encourages the robots to disperse, because the variance of vertices in V near vertices already inserted in D is lower than the variance of vertices far from D, and thereby the reward associated with vertices near already-sampled locations is lower.
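Since the experiments use scikit-learn [5], the reward computation can be sketched as follows; the data, bounds, and noise level below are illustrative, but the lower bound l₀ on the length scale and the ten optimizer restarts mirror the description above:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(0)
X_D = rng.uniform(0.0, 5.0, size=(12, 2))       # sampled locations in D
y_D = np.sin(X_D[:, 0]) + 0.1 * rng.standard_normal(12)
V = rng.uniform(0.0, 5.0, size=(200, 2))        # candidate sampling locations

l0 = 0.5                                        # shared lower bound on l
kernel = (ConstantKernel(1.0)
          * Matern(length_scale=1.0, length_scale_bounds=(l0, 1e3), nu=1.5)
          + WhiteKernel(noise_level=1e-2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp.fit(X_D, y_D)                                # local hyperparameter fit

_, std = gp.predict(V, return_std=True)
reward = std ** 2                               # r(v) = predicted variance
# vertices already in D would additionally get r(v) = 0
```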
5 Experimental Evaluation and Discussion
To assess the merits and limitations of the proposed approach, we perform simulations on test cases while varying the parameters related to the planning and surveying objectives. Due to space limitations, we examine the task of reconstructing two scalar fields. The first is a synthetic scalar field with a periodic trend, depicted in Fig. 1a. The second, displayed in Fig. 1b, shows the soil moisture distribution measured in Summer 2018 in a commercial vineyard located in Central California. This second scalar field was used as a benchmark in some of our former publications [14]. To ease comparison between the two cases, both fields were rescaled to the same size, although the amplitudes of the underlying values are different. GP predictions of each field were made with a Matérn kernel with ν = 3/2. It should be noted that this kernel is commonly used in geostatistical applications and is more appropriate for the soil moisture dataset; the periodic synthetic field serves as a pathological example with a mismatched spatial prior. In fact, the use of periodic kernels could lead to better results for the synthetic field. Future work will examine online adaptive kernel selection through Bayesian optimization. Our algorithm, indicated as Coord in the following discussion, is compared with two baseline alternatives:
Fig. 1. Benchmark scalar fields to be estimated. For scale relative to the travel budget B, the side length of each field is 5. In both instances the start vertex v_s is in the lower left corner and the goal vertex v_g is in the top right corner.
– The random waypoint selection algorithm (RWP), which selects the next vertex to visit at random among those still to be visited. Due to the nature of the selection process, the ability to communicate during the sampling process is immaterial. The RWP algorithm is often used as a baseline in this type of task (see, e.g., [2]).
– A non-coordinated (NC) approach, which selects the next sampling point using the same criteria as our proposed algorithm but does not exchange any information during the sampling process, i.e., during the selection process each agent considers only the samples it collected itself, not those collected by the other agents.
At the end of the mission, when all robots have reached v_g, both RWP and NC share all collected sensor observations, and the MSE is computed after fitting the GP using all data collected by all robots. This step is not necessary for Coord, because data is exchanged on the go, but it ensures that the MSE evaluation is fair across the three algorithms. After the next point to visit has been selected, all algorithms, including Coord, use the same planning engine, i.e., the one we proposed in [12]. Finally, both NC and Coord refit the GP and update the parameters θ before computing r, while RWP skips this step because it does not use the current estimate to select the next point to visit. All procedures were executed single-threaded in Python running on an Apple M1 processor. All computations related to GP fitting and processing use the scikit-learn library [5]. For both scalar fields, we varied the number of agents (3, 5, 7, 9), the budget B (10, 15, 20), and the parameter l₀ (0.1, 0.5, 1). For each combination of parameters, twenty independent simulations were performed, for a total of about 3500 executions. Figure 2 shows the average MSE as a function of the number of robots for all algorithms. As expected, the trend is decreasing (i.e., improved prediction accuracy) with diminishing returns as the number of robots grows. We can observe that the proposed algorithm outperforms the others. Note that the ranges of MSE values in the two test cases differ because of the different values in
Fig. 2. Average final map MSE for n=20 trials per algorithm. Error bars show ± one standard deviation.
the underlying scalar fields. Next, in Fig. 3 we show the reconstructed scalar field for the three algorithms with a budget of 20 and 5 robots. The red dots show the locations where samples were collected. Due to the random selection process, RWP ends up collecting fewer samples before exhausting the budget, which leads to an inferior estimate. NC and Coord, instead, collect more samples, but Coord spreads them more uniformly and ultimately produces a more accurate estimate (see Fig. 1b for the ground truth). Similar results are observed for the synthetic map but are not displayed for lack of space. Table 1 provides a more detailed numeric analysis of the performance of the three algorithms. Specifically, we look at the number of unvisited locations as well as the number of locations visited by more than one robot. These are two proxies for the MSE metric, and lower values are desired in both cases. When the travel budget is 10, the number of unvisited locations is similar for the three algorithms, because with a limited budget the set of available choices before the budget is exhausted is small. As the budget grows, Coord emerges as the algorithm with the fewest unvisited vertices, substantiating the claim that agents spread through the environment in a more coordinated fashion. For the number of revisited locations, RWP (as expected) always has the lowest count, due to the completely random nature of the selection. However, comparing NC with Coord, the latter always has a lower count, again showing that the agents spread better through the environment and avoid revisiting the same places, thus ensuring that coordination leads to a better use of the robots as mobile sensors. Finally, in Fig. 4 we display how the choice of l₀, the lower bound for the length scale parameter l, impacts the MSE metric. The two panels correspond to budgets of 15 and 20, respectively, and group the results by the number of surveying robots. For 9 robots the impact is marginal, which is explained by the fact that with this many agents the team manages to cover most of the environment during the mission. However, for a budget of 15 and
Fig. 3. Reconstructed soil moisture map with 5 robots and B = 20.

Table 1. Average number of unvisited and re-visited waypoints, from an experimental setting of 200 candidate sampling locations. X/Y means that there were on average X unvisited vertices and Y revisited vertices.

map     | synthetic dataset        | soil moisture dataset
budget  | 10      15      20       | 10      15      20
RWP     | 179/6   164/8   155/10   | 179/6   164/9   155/10
NC      | 164/17  139/31  122/42   | 166/16  140/30  122/40
Coord   | 163/12  129/18  104/21   | 166/11  133/17  106/21
Fig. 4. Average MSE for different values of l0 and number of robots.
3 robots, a value of l₀ = 1 gives a clearly better result. Likewise, for a budget of 20 and 5 robots, a value of l₀ = 0.5 is best. These results show that by tuning l₀ it is possible to implicitly promote better dispersion in the team and thus lower values of the MSE. An outstanding question is how to select this value in a general setting. Nevertheless, these results confirm our hypothesis that by constraining the GP kernel parameters being optimized, one can enforce different behaviors on the team members.
6 Conclusion
We have presented an approach to reconstruct a scalar field using a team of robots performing distributed GP estimation. Through limited communication, and by exploiting the underlying properties of GPs, robots implicitly manage to disperse throughout the domain and collect samples at locations leading to a more accurate estimate. Our proposed approach has been extensively evaluated in simulation and outperforms competing approaches.
References
1. Contal, E., Perchet, V., Vayatis, N.: Gaussian process optimization with mutual information. In: Proceedings of the 31st International Conference on Machine Learning, pp. 253–261 (2014)
2. Di Caro, G.A., Yousaf, Z., Wahab, A.: Map learning via adaptive region-based sampling in multi-robot systems. In: Matsuno, F., Azuma, S.-I., Yamamoto, M. (eds.) Distributed Autonomous Robotic Systems: 15th International Symposium, pp. 335–348. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-92790-5_26
3. Hollinger, G.A., Sukhatme, G.S.: Sampling-based robotic information gathering algorithms. Int. J. Robot. Res. 33(9), 1271–1287 (2014)
4. Jadidi, M.G., Miro, J.V., Dissanayake, G.: Sampling-based incremental information gathering with applications to robotic exploration and environmental monitoring. Int. J. Robot. Res. 38(6), 658–685 (2019)
5. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
6. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA (2006)
7. Shamshirgaran, A., Carpin, S.: Reconstructing a spatial field with an autonomous robot under a budget constraint. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 8963–8970 (2022)
8. Srinivas, N., Krause, A., Kakade, S.M., Seeger, M.W.: Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inf. Theory 58(5), 3250–3265 (2012)
9. Stein, M.L.: Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics. Springer, New York (1999)
10. Sundaram, J.P.S., Du, W., Zhao, Z.: A survey on LoRa networking: research problems, current solutions, and open issues. IEEE Commun. Surv. Tutor. 22(1), 371–388 (2019)
11. Suryan, V., Tokekar, P.: Learning a spatial field in minimum time with a team of robots. IEEE Trans. Rob. 36(5), 1562–1576 (2020)
12. Thayer, T.C., Carpin, S.: Solving stochastic orienteering problems with chance constraints using Monte Carlo tree search. In: Proceedings of the IEEE International Conference on Automation Science and Engineering, pp. 1170–1177 (2022)
13. Thayer, T.C., Vougioukas, S., Goldberg, K., Carpin, S.: Routing algorithms for robot assisted precision irrigation. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2221–2228 (2018)
14. Thayer, T.C., Vougioukas, S., Goldberg, K., Carpin, S.: Multi-robot routing algorithms for robots operating in vineyards. IEEE Trans. Autom. Sci. Eng. 17(3), 1184–1194 (2020)
A Constrained-Optimization Approach to the Execution of Prioritized Stacks of Learned Multi-robot Tasks
Gennaro Notomista
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
[email protected]
Abstract. This paper presents a constrained-optimization formulation for the prioritized execution of learned robot tasks. The framework lends itself to the execution of tasks encoded by value functions, such as tasks learned using the reinforcement learning paradigm. The tasks are encoded as constraints of a convex optimization program by using control Lyapunov functions. Moreover, an additional constraint is enforced in order to specify relative priorities between the tasks. The proposed approach is showcased in simulation using a team of mobile robots executing coordinated multi-robot tasks.

Keywords: Multi-robot motion coordination · Distributed control and planning · Learning and adaptation in teams of robots
1 Introduction
Learning complex robotic tasks can be challenging for several reasons. The nature of compound tasks, made up of several simpler subtasks, makes it difficult to simultaneously capture and combine all features of the subtasks to be learned. Another limiting factor in the learning of compound tasks is the computational complexity of the machine learning algorithms employed in the learning phase. This can make the training phase prohibitive, especially when the representation of the tasks comprises a large number of parameters, as is generally the case when dealing with complex tasks made up of several subtasks, or with high-dimensional state space representations. For these reasons, when there is an effective way of combining the execution of multiple subtasks, it is useful to break complex tasks down into building blocks that can be independently learned in a more efficient fashion. Besides the reduced computational complexity stemming from the simpler nature of the subtasks to be learned, this approach has the benefit of increasing the modularity of the task execution framework, by allowing subtasks to be reused as building blocks for the execution of different complex tasks. Discussions and analyses of these advantages can be found, for instance, in [9,16,26,32].
Along these lines, in [13], compositionality and incrementality are recognized as two fundamental features of robot learning algorithms. Compositionality, in the context of learning to execute multiple tasks, is the property of learning strategies to be in a form that allows them to be combined with previous knowledge. Incrementality guarantees the possibility of adding new knowledge and abilities over time, by, for instance, incorporating new tasks. Several approaches exhibiting these two properties have been proposed; nevertheless, challenges remain regarding task prioritization and stability guarantees [6,21,25,28,34]. The possibility of prioritizing tasks, together with stability guarantees, allows us to characterize the behavior resulting from the composition of multiple tasks. In fact, when dealing with redundant robotic systems (i.e., systems which possess more degrees of freedom than the minimum number required to execute a given task, as, for example, multi-robot systems), it is often useful to allow for the execution of multiple subtasks in a prioritized stack. Task priorities may allow robots to adapt to the different scenarios in which they are employed by exhibiting structurally different behaviors. Therefore, it is desirable that a multi-task execution framework allows for the prioritized execution of multiple tasks. In this paper, we present a constrained-optimization robot-control framework suitable for the stable execution of multiple tasks in a prioritized fashion. The approach leverages the reinforcement learning (RL) paradigm to obtain approximations of the value functions that will be used to encode the tasks as constraints of a convex quadratic program (QP). Owing to its convexity, the latter can be solved in polynomial time [3], and it is therefore suitable for a large variety of robotic applications, in online settings, even under real-time constraints. The proposed framework shares its optimization-based nature with the one proposed in [18] for redundant robotic manipulators, where, however, it is assumed that a representation of all tasks to be executed is known a priori. As will be discussed later in the paper, this framework indeed combines compositionality and incrementality (i.e., the abilities of combining and adding subtasks to build up compound tasks, respectively) with stable and prioritized task execution in a computationally efficient optimization-based algorithm. Figure 1 pictorially shows the strategy adopted in this paper to allow robots to execute multiple prioritized tasks learned using the RL paradigm. Once a value function is learned using the RL paradigm (using, e.g., the value iteration algorithm [2]), it is used to construct a control Lyapunov function [30] in such a way that a controller synthesized via a min-norm optimization program is equivalent to the optimal policy corresponding to the value function [20]. Then, multiple tasks encoded by constraints in a min-norm controller are combined in a prioritized stack as in [17]. To summarize, the contributions of this paper are the following: (i) We present a compositional and incremental framework for the execution of multiple tasks encoded by value functions; (ii) We show how priorities among tasks
Fig. 1. Pictorial representation of the strategy adopted in this paper for the execution of prioritized stacks of learned tasks.
can be enforced in a constrained-optimization-based formulation; (iii) We frame the prioritized multi-task execution as a convex QP which can be efficiently solved in online settings; (iv) We demonstrate how the proposed framework can be employed to control robot teams to execute coordinated tasks.
2 Background and Related Work
2.1 Multi-task Learning, Composition, and Execution
The prioritized execution framework for learned tasks proposed in this paper is related to approaches devised for multi-task learning, a machine learning paradigm which aims at leveraging useful information contained in multiple related tasks to help improve the generalization performance of all the tasks [35]. The learning of multiple tasks can happen in parallel (independently) or in sequence for naturally sequential tasks [10,29], and a number of computational frameworks have been proposed to learn multiple tasks (see, e.g., [14,24,35], and references therein). It is worth noticing how, owing to its constrained-optimization nature, the approach proposed in this paper is dual to multi-objective optimization frameworks, such as [5,27], and to approaches based on Riemannian motion policies [15,22,23]. Several works have focused on the composition and hierarchy of deep reinforcement learning policies. The seminal work [33] shows compositionality for a specific class of value functions. More general value functions are considered in [12], where, however, there are no guarantees on the policy resulting from the multi-task learning process. Boolean and weighted compositions of reward, (Q-)value functions, or policies are considered in [11,19,34]. While these works have shown their effectiveness on complex systems and tasks, our proposed approach differs from them in two main aspects: (i) it separates the task learning from the task composition; (ii) it allows for (possibly time-varying and state-dependent) task prioritization, with task stacks that are enforced at runtime.
2.2 Constraint-Based Task Execution
In this paper, we adopt a constrained-optimization approach to the prioritized execution of multiple tasks learned using the RL paradigm. In [17], a constraint-based task execution framework is presented for a robotic system with control affine dynamics

$$\dot{x} = f_0(x) + f_1(x) u, \quad (1)$$

where x ∈ X ⊆ Rⁿ and u ∈ U ⊆ Rᵐ denote state and control input, respectively. The M tasks to be executed are encoded by continuously differentiable, positive definite cost functions V_i : X → R₊, i = 1, ..., M. With the notation adopted in this paper, the constraint-based task execution framework in [17] can be expressed as follows:

$$\begin{aligned} \underset{u,\,\delta}{\text{minimize}} \quad & \|u\|^2 + \kappa \|\delta\|^2 \\ \text{subject to} \quad & L_{f_0} V_i(x) + L_{f_1} V_i(x)\, u \le -\gamma(V_i(x)) + \delta_i, \quad i = 1, \ldots, M \\ & K \delta \ge 0, \end{aligned} \quad (2)$$

where L_{f_0} V_i(x) and L_{f_1} V_i(x) are the Lie derivatives of V_i along the vector fields f_0 and f_1, respectively. The components of δ = [δ_1, ..., δ_M]ᵀ are slack variables employed to prioritize the different tasks; γ : R → R is a Lipschitz continuous extended class K function (i.e., a continuous, monotonically increasing function with γ(0) = 0); κ > 0 is an optimization parameter; and K is the prioritization matrix, known a priori, which enforces relative constraints between components of δ of the following type: δ_i ≤ l δ_j, with l ≪ 1, encoding the fact that task i is executed at higher priority than task j. A sketch of this program is given below. In the following, Sect. 2.3 shows the connection between dynamic programming and optimization-based controllers. In Sect. 3, this connection will allow us to execute tasks learned using the RL paradigm by means of a formulation akin to (2).
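A minimal sketch of program (2) follows, using CVXPY (a solver interface of our choosing, not the paper's); the single-integrator dynamics and all numeric values are illustrative placeholders:

```python
import cvxpy as cp
import numpy as np

# Two tasks on a single integrator (f0 = 0, f1 = I), so L_{f0}V_i = 0 and
# L_{f1}V_i is the gradient of V_i; all numbers below are placeholders.
kappa = 10.0
Lf0V = np.zeros(2)
Lf1V = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
V = np.array([2.0, 1.0])              # current task values V_i(x)
K = np.array([[-1.0, 0.01]])          # K @ delta >= 0  <=>  delta_1 <= 0.01 delta_2

u = cp.Variable(2)
delta = cp.Variable(2)
constraints = [Lf1V @ u + Lf0V <= delta - V,   # gamma chosen as the identity
               K @ delta >= 0]
prob = cp.Problem(cp.Minimize(cp.sum_squares(u) + kappa * cp.sum_squares(delta)),
                  constraints)
prob.solve()
print(u.value, delta.value)           # task 1 is tracked at higher priority
```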
2.3 From Dynamic Programming to Constraint-Driven Control
To illustrate how controllers obtained using dynamic programming can be synthesized as the solution of an optimization program, consider a system with the following discrete-time dynamics:

$$x_{k+1} = f(x_k, u_k). \qquad (3)$$

These dynamics can be obtained, for instance, from (1) through a discretization process. In (3), $x_k$ denotes the state, $u_k \in U_k(x_k)$ the input, and the input set $U_k(x_k)$ may depend in general on the time $k$ and the state $x_k$. The value iteration algorithm to solve a deterministic dynamic programming problem with no terminal cost can be stated as follows [2]:

$$J_{k+1}(x_k) = \min_{u_k \in U_k(x_k)} \big[ g_k(x_k, u_k) + J_k(f_k(x_k, u_k)) \big], \qquad (4)$$
with $J_0(x_0) = 0$, where $x_0$ is the initial state, and $g_k(x_k, u_k)$ is the cost incurred at time $k$. The total cost accumulated along the system trajectory is given by

$$J(x_0) = \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k g_k(x_k, u_k). \qquad (5)$$
In this paper, we will consider $\alpha = 1$ and we will assume there exists a cost-free termination state.¹ By Proposition 4.2.1 in [2], the value iteration algorithm (4) converges to $J^*$ satisfying

$$J^*(x) = \min_{u \in U(x)} \big[ g(x, u) + J^*(f(x, u)) \big]. \qquad (6)$$
Adopting an approximation scheme in value space, $J^*$ can be replaced by its approximation $\tilde{J}^*$ by solving the following approximate dynamic programming algorithm:

$$\tilde{J}_{k+1}(x_k) = \min_{u_k \in U_k(x_k)} \big[ g_k(x_k, u_k) + \tilde{J}_k(f_k(x_k, u_k)) \big].$$
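As a concrete illustration of (4), a tabular version of value iteration on a finite discretization of states and inputs could look as follows; the data structures (hashable states, a state-independent action set) are simplifying assumptions made for brevity.

    def value_iteration(states, actions, f, g, sweeps=100):
        """Tabular value iteration (4) on a finite discretization.

        states  : list of hashable discretized states x_k
        actions : list of admissible inputs u_k
        f       : transition function f(x, u) -> next state (must land in `states`)
        g       : instantaneous cost g(x, u)
        """
        J = {x: 0.0 for x in states}   # J_0 = 0
        for _ in range(sweeps):
            J = {x: min(g(x, u) + J[f(x, u)] for u in actions)
                 for x in states}
        return J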
In these settings, deep RL algorithms can be leveraged to find parametric approximations, $\tilde{J}^*$, of the value function using neural networks. This will be the paradigm considered in this paper in order to approximate value functions encoding the tasks to be executed in a prioritized fashion.

The bridge between dynamic programming and constraint-driven control is optimal control. In fact, the cost in (5) is typically considered in optimal control problems, recalled, in the following, for the continuous-time control affine system (1):

$$\begin{aligned} \underset{u(\cdot)}{\text{minimize}} \quad & \int_0^\infty \big( q(x(t)) + u(t)^T u(t) \big)\, dt \\ \text{subject to} \quad & \dot{x} = f_0(x) + f_1(x)u. \end{aligned} \qquad (7)$$

Comparing (7) with (5), we recognize that the instantaneous cost $g(x, u)$ in (5), in the context of the optimal control problem (7), corresponds to $q(x) + u^T u$, where $q \colon X \to \mathbb{R}$ is a continuously differentiable and positive definite function. A dynamic programming argument on (7) leads to the following Hamilton-Jacobi-Bellman equation:

$$L_{f_0} J^*(x) - \frac{1}{4} L_{f_1} J^*(x) \big( L_{f_1} J^*(x) \big)^T + q(x) = 0,$$

where $J^*$ is the value function—similar to (6) for continuous-time problems—representing the minimum cost-to-go from state $x$, defined as

$$J^*(x) = \min_{u(\cdot)} \int_t^\infty \big( q(x(\tau)) + u(\tau)^T u(\tau) \big)\, d\tau. \qquad (8)$$

¹ Problems of this class are referred to as shortest path problems in [2].
The optimal policy corresponding to the optimal value function (8) can be evaluated as follows [4]:

$$u^* = -\frac{1}{2} \big( L_{f_1} J^*(x) \big)^T. \qquad (9)$$

In order to show how the optimal policy $u^*$ in (9) can be obtained using an optimization-based formulation, we now recall the concept of control Lyapunov functions.

Definition 1 (Control Lyapunov function [30]). A continuously differentiable, positive definite function $V \colon \mathbb{R}^n \to \mathbb{R}$ is a control Lyapunov function (CLF) for the system (1) if, for all $x \ne 0$,

$$\inf_u \big\{ L_{f_0} V(x) + L_{f_1} V(x) u \big\} < 0. \qquad (10)$$
To select a control input $u$ which satisfies the inequality (10), a universal expression—known as Sontag's formula [31]—can be employed. With the aim of encoding the optimal control input $u^*$ by means of a CLF, we will consider the following modified Sontag's formula originally proposed in [7]:

$$u(x) = \begin{cases} -v(x) \big( L_{f_1} V(x) \big)^T & \text{if } L_{f_1} V(x) \ne 0 \\ 0 & \text{otherwise,} \end{cases} \qquad (11)$$

where

$$v(x) = \frac{L_{f_0} V(x) + \sqrt{\big( L_{f_0} V(x) \big)^2 + q(x)\, L_{f_1} V(x) \big( L_{f_1} V(x) \big)^T}}{L_{f_1} V(x) \big( L_{f_1} V(x) \big)^T}.$$
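For reference, a direct numerical transcription of the modified Sontag's formula (11) might read as follows; the function signature is an illustrative assumption.

    import numpy as np

    def modified_sontag(Lf0V, Lf1V, q):
        """Modified Sontag's formula (11): u = -v(x) (L_{f1}V)^T when L_{f1}V != 0.

        Lf0V : scalar L_{f0} V(x)
        Lf1V : (m,) row vector L_{f1} V(x)
        q    : scalar q(x)
        """
        b = Lf1V @ Lf1V                      # L_{f1}V (L_{f1}V)^T, a scalar
        if np.isclose(b, 0.0):
            return np.zeros_like(Lf1V)       # second branch of (11)
        v = (Lf0V + np.sqrt(Lf0V**2 + q * b)) / b
        return -v * Lf1V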
As shown in [20], the modified Sontag's formula (11) is equivalent to the solution of the optimal control problem (7) if the following relation between the CLF $V$ and the value function $J^*$ holds:

$$\frac{\partial J^*}{\partial x} = \lambda(x) \frac{\partial V}{\partial x}, \qquad (12)$$

where $\lambda(x) = 2 v(x)$. The relation in (12) corresponds to the fact that the level sets of the CLF $V$ and those of the value function $J^*$ are parallel.

The last step towards the constrained-optimization-based approach to generate optimal control policies is to recognize the fact that, owing to its inverse optimality property [7], the modified Sontag's formula (11) can be obtained using the following constrained-optimization formulation, also known as the pointwise min-norm controller:

$$\begin{aligned} \underset{u}{\text{minimize}} \quad & \|u\|^2 \\ \text{subject to} \quad & L_{f_0} V(x) + L_{f_1} V(x) u \le -\sigma(x), \end{aligned} \qquad (13)$$

where $\sigma(x) = \sqrt{\big( L_{f_0} V(x) \big)^2 + q(x)\, L_{f_1} V(x) \big( L_{f_1} V(x) \big)^T}$. This formulation shares the same optimization structure as the one introduced in (2) in Sect. 2, and in the next section we will provide a formulation which strengthens the connection with approximate dynamic programming.
In Appendix A, additional results are reported, which further illustrate the theoretical equivalence discussed in this section, by comparing the optimal controller, the optimization-based controller, and a policy learned using the RL framework for a simple dynamical system.
3 Prioritized Multi-task Execution
When $V = \tilde{J}^*$, the min-norm controller solution of (13) is the optimal policy which would be learned using a deep RL algorithm. This is what allows us to bridge the gap between constraint-driven control and RL, and it is the key to executing tasks learned using the RL paradigm in a compositional, incremental, prioritized, and computationally efficient fashion.

Following the formulation given in (2), the multi-task prioritized execution of tasks learned using RL can be implemented by executing the control input solution of the following optimization program:

$$\begin{aligned} \underset{u,\,\delta}{\text{minimize}} \quad & \|u\|^2 + \kappa\|\delta\|^2 \\ \text{subject to} \quad & \frac{1}{\lambda_i(x)} \Big( L_{f_0} \tilde{J}^*_i(x) + L_{f_1} \tilde{J}^*_i(x)\, u \Big) \le -\sigma_i(x) + \delta_i, \quad i = 1, \ldots, M, \\ & K\delta \ge 0, \end{aligned} \qquad (14)$$

where $\tilde{J}^*_1, \ldots, \tilde{J}^*_M$ are the approximated value functions encoding the tasks learned using the RL paradigm (e.g., value iteration). In summary, with the RL paradigm, one can get the approximate value functions $\tilde{J}^*_1, \ldots, \tilde{J}^*_M$; the robotic system is then controlled using the control input solution of (14) in order to execute these tasks in a prioritized fashion.

Remark 1. The Lie derivatives $L_{f_0} \tilde{J}^*_1(x), \ldots, L_{f_0} \tilde{J}^*_M(x)$ contain the gradients $\frac{\partial \tilde{J}^*_1}{\partial x}, \ldots, \frac{\partial \tilde{J}^*_M}{\partial x}$. When $\tilde{J}^*_1(x), \ldots, \tilde{J}^*_M(x)$ are approximated using neural networks, these gradients can be efficiently computed using back propagation.
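As a sketch of Remark 1—assuming PyTorch and a hypothetical network value_net approximating one $\tilde{J}^*_i$—the gradient and the Lie derivatives entering (14) could be obtained by back propagation as follows.

    import torch

    def lie_derivatives(value_net, x, f0, f1):
        """Back-propagation computation of L_{f0} J(x) and L_{f1} J(x) for a
        neural approximation J of a value function (cf. Remark 1).

        value_net : torch.nn.Module mapping a state tensor (n,) to a scalar
        f0, f1    : vector fields, callables returning tensors of shape (n,)
                    and (n, m), respectively
        """
        x = x.clone().detach().requires_grad_(True)
        J = value_net(x).squeeze()
        (grad,) = torch.autograd.grad(J, x)    # dJ/dx via back propagation
        xd = x.detach()
        return grad @ f0(xd), grad @ f1(xd)    # L_{f0} J(x), L_{f1} J(x)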
We conclude this section with the following Proposition 1, which ensures the stability of the prioritized execution of multiple tasks encoded through the value functions $\tilde{J}^*_i$ by a robotic system modeled by the dynamics (1) and controlled with the control input solution of (14).

Proposition 1 (Stability of multiple prioritized learned tasks). Consider executing a set of $M$ prioritized tasks encoded by approximate value functions $\tilde{J}^*_i$, $i = 1, \ldots, M$, by solving the optimization problem in (14). Assume the following:

1. All constraints in (14) are active;
2. The robotic system can be modeled by a driftless control affine dynamical system, i.e., $f_0(x) = 0$ for all $x \in X$;
3. The instantaneous cost function $g$ used to learn the tasks is positive for all $x \in X$ and $u \in U$.

Then,
$$\begin{bmatrix} \tilde{J}^*_1(x(t)) \\ \vdots \\ \tilde{J}^*_M(x(t)) \end{bmatrix} \to \mathcal{N}(K)$$

as $t \to \infty$, where $\mathcal{N}(K)$ denotes the null space of the prioritization matrix $K$. That is, the tasks will be executed according to the priorities specified by the prioritization matrix $K$ in (2).

Proof. The Lagrangian associated with the optimization problem (14) is given by

$$\mathcal{L}(u, \delta) = \|u\|^2 + \kappa\|\delta\|^2 + \eta_1^T \big( \hat{f}_0(x) + \hat{f}_1(x) u + \sigma(x) - \delta \big) + \eta_2^T (-K\delta),$$

where $\hat{f}_0(x) \in \mathbb{R}^M$ and $\hat{f}_1(x) \in \mathbb{R}^{M \times m}$ are defined as follows: the $i$-th component of $\hat{f}_0(x)$ is equal to $\frac{1}{\lambda_i(x)} L_{f_0} \tilde{J}^*_i(x)$, while the $i$-th row of $\hat{f}_1(x)$ is equal to $\frac{1}{\lambda_i(x)} L_{f_1} \tilde{J}^*_i(x)$; $\eta_1$ and $\eta_2$ are the Lagrange multipliers corresponding to the task and prioritization constraints, respectively. From the KKT conditions, we obtain:

$$u = -\frac{1}{2} \hat{f}_1(x)^T \begin{bmatrix} I & 0 \end{bmatrix} \eta, \qquad \delta = \frac{1}{2\kappa} \begin{bmatrix} I & -K^T \end{bmatrix} \eta, \qquad (15)$$

where $\eta = [\eta_1^T, \eta_2^T]^T$. By resorting to the Lagrange dual problem, and by using assumption 1, we get the following expression for $\eta$:

$$\eta = 2 \underbrace{\begin{bmatrix} \frac{1}{\kappa} I + \hat{f}_1(x)\hat{f}_1(x)^T & -\frac{1}{\kappa} K^T \\ K & -K K^T \end{bmatrix}^{-1}}_{A_1^{-1}} \underbrace{\begin{bmatrix} \hat{f}_0(x) + \sigma(x) \\ 0 \end{bmatrix}}_{b_0}, \qquad (16)$$

where $I$ denotes an identity matrix of appropriate size. Substituting (16) in (15), we get $u = -\hat{f}_1(x)^T \begin{bmatrix} I & 0 \end{bmatrix} A_1^{-1} b_0$ and $\delta = \frac{1}{\kappa} \begin{bmatrix} I & -K^T \end{bmatrix} A_1^{-1} b_0$.

To show the claimed stability property, we will proceed by a Lyapunov argument. Let us consider the Lyapunov function candidate $V(x) = \frac{1}{2} \tilde{J}^*(x)^T K^T K \tilde{J}^*(x)$, where $\tilde{J}^*(x) = \big[ \tilde{J}^*_1(x) \ \ldots \ \tilde{J}^*_M(x) \big]^T$. The time derivative of $V$ evaluates to:

$$\dot{V} = \frac{\partial V}{\partial x} \dot{x} = \tilde{J}^*(x)^T K^T K \frac{\partial \tilde{J}^*}{\partial x} \dot{x} = \tilde{J}^*(x)^T K^T K \underbrace{\frac{\partial \tilde{J}^*}{\partial x} f_1(x)}_{\hat{f}_{1,\lambda}(x)} u \quad \text{(by assumption 2)},$$

where, notice, $\hat{f}_{1,\lambda}(x) = \Lambda(x) \hat{f}_1(x) \in \mathbb{R}^{M \times m}$ and $\Lambda(x) = \mathrm{diag}([\lambda_1(x), \ldots, \lambda_M(x)])$. By assumption 2, $\lambda_i(x) \ge 0$ for all $i$, and therefore $\Lambda(x) \succeq 0$, i.e., $\Lambda(x)$ is positive semidefinite. Then, $\dot{V} = \tilde{J}^*(x)^T K^T K \Lambda(x) \hat{f}_1(x) u = -\tilde{J}^*(x)^T K^T K \Lambda(x) \hat{A} \begin{bmatrix} \sigma(x) \\ 0 \end{bmatrix}$, where

$$\hat{A} = \hat{f}_1(x) \hat{f}_1(x)^T \begin{bmatrix} I & 0 \end{bmatrix} A_1^{-1}, \qquad (17)$$
as in Proposition 3 in [17], and we used assumption 2 to simplify the expression of $b_0$. By assumption 3, it follows that the value functions $\tilde{J}^*_i$ are positive definite. Therefore, from the definition of $\sigma$, in a neighborhood of $0 \in \mathbb{R}^M$, we can bound $\sigma(x)$—defined by the gradients of $\tilde{J}^*$—by the value of $\tilde{J}^*$ as $\sigma(x) = \gamma_J(\tilde{J}^*(x))$, where $\gamma_J$ is a class $\mathcal{K}$ function. Then, proceeding similarly to Proposition 3 in [17], we can bound $\dot{V}$ as follows:

$$\dot{V} = -\tilde{J}^*(x)^T K^T K \Lambda(x) \hat{A}\, \gamma_J(\tilde{J}^*(x)) \le -\tilde{J}^*(x)^T K^T K \Lambda(x) \tilde{J}^*(x) \le -\bar{\lambda} V(x),$$

where $\bar{\lambda} = \min\{\lambda_1(x), \ldots, \lambda_M(x)\}$. Hence, $K \tilde{J}^*(x(t)) \to 0$ as $t \to \infty$, and $\tilde{J}^*(x(t)) \to \mathcal{N}(K)$ as $t \to \infty$. □

Remark 2. The proof of Proposition 1 can be carried out even in the case of a time-varying and state-dependent prioritization matrix $K$. Under the assumption that $K$ is bounded and continuously differentiable for all $x$ and uniformly in time, the norm and the gradient of $K$ can be bounded in order to obtain an upper bound for $\dot{V}$.

Remark 3. Even when the prioritization stack specified through the matrix $K$ in (14) is not physically realizable—due to the fact that, for instance, the functions encoding the tasks cannot achieve the relative values prescribed by the prioritization matrix—the optimization program will still be feasible. Nevertheless, the tasks will not be executed with the desired priorities, and even the execution of high-priority tasks might be degraded.
4 Experimental Results
In this section, the proposed framework for the execution of prioritized stacks of tasks is showcased in simulation using a team of mobile robots. Owing to the multitude of robotic units of which they are comprised, multi-robot systems are often highly redundant with respect to the tasks they have to execute. Therefore, they perfectly lend themselves to the concurrent execution of multiple prioritized tasks.

4.1 Multi-robot Tasks
For multi-robot systems, the redundancy stems from the multiplicity of robotic units of which the system is comprised. In this section, we will showcase the execution of dependent tasks—two tasks are dependent if executing one prevents the execution of the other [1]—in different orders of priority. The multi-robot system is comprised of 6 planar robots modeled with single integrator dynamics and controlled to execute the following 4 tasks: all robots assemble a hexagonal formation (task T1), robot 1 goes to goal point 1 (task T2), robot 2 goes to goal point 2 (task T3), and robot 3 goes to goal point 3 (task T4). While task T1 is independent of each of the other tasks taken singly, it is not independent of any pair of tasks T2, T3, and T4. This intuitively corresponds to the fact that it is possible to form a hexagonal formation at different points in space, but it might not be feasible to form a hexagonal formation while two robots are constrained to be in two pre-specified arbitrary locations.

Fig. 2. Snapshots (a–l) and plot of $\tilde{J}^*_1$, $\tilde{J}^*_2$, $\tilde{J}^*_3$, and $\tilde{J}^*_4$ (m) corresponding to the hexagonal formation control task and 3 go-to-goal tasks for robots 1, 2, and 3, respectively, recorded during the course of a simulated experiment with a multi-robot system. Robots are gray dots, connection edges between the robots used to assemble the desired formation are depicted in blue, goal points are shown as red dots.

Figure 2 reports a sequence of snapshots and the graph of the value functions encoding the four tasks recorded during the course of the experiment. Denoting by $T_i \prec T_j$ the condition under which task $T_i$ has priority higher than $T_j$, the sequence of prioritized stacks tested in the experiment is the following:

$$\begin{cases} T_2, T_3, T_4 \prec T_1 & 0\,\mathrm{s} \le t < 15\,\mathrm{s} \\ T_1 \prec T_2, T_3, T_4 & 15\,\mathrm{s} \le t < 30\,\mathrm{s} \\ T_1 \prec T_2 \prec T_3, T_4 & 30\,\mathrm{s} \le t \le 45\,\mathrm{s}. \end{cases} \qquad (18)$$
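One possible way to encode such stacks in code, following the relative constraints $\delta_i \le l\,\delta_j$ with $l \ll 1$ recalled in Sect. 2.2, is sketched below; the row construction and the helper name are illustrative assumptions, since the paper does not spell out how $K$ is assembled.

    import numpy as np

    def build_K(priority_pairs, M, l=1e-3):
        """Build a prioritization matrix K such that K @ delta >= 0 encodes
        delta_i <= l * delta_j (task i at higher priority than task j, l << 1),
        i.e., one row per pair with l * delta_j - delta_i >= 0.

        priority_pairs : list of (i, j) with task i having priority over task j
        M              : number of tasks
        """
        K = np.zeros((len(priority_pairs), M))
        for r, (i, j) in enumerate(priority_pairs):
            K[r, i] = -1.0
            K[r, j] = l
        return K

    # Hypothetical example: the stack T1 < T2 < {T3, T4} used for 30 s <= t <= 45 s
    # (tasks indexed from 0)
    K = build_K([(0, 1), (1, 2), (1, 3)], M=4)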
The plot of the value functions in Fig. 2m shows how, for 0 s ≤ t < 15 s, since the hexagonal formation control task has lower priority compared to the three go-to-goal tasks, its value function $\tilde{J}^*_1$ is allowed to grow, while the other three value functions are driven to 0 by the velocity control input solution of (14) supplied to the robots. For 15 s ≤ t < 30 s, the situation is reversed: the hexagonal formation control is executed with highest priority, while the value functions encoding the three go-to-goal tasks are allowed to grow—a condition which corresponds to the non-execution of the tasks. Finally, for 30 s ≤ t ≤ 45 s, task T2, i.e., the go-to-goal task for robot 1 to goal point 1, is added at higher priority with respect to tasks T3 and T4. Since it is independent of task T1, it can be executed at the same time. As a result, as can be seen from the snapshots, the formation translates towards the red point marked with 1. Tasks T1 and T2 are successfully executed, while tasks T3 and T4 are not executed, since they are not independent of the first two and they have lower priority.

Remark 4. The optimization program responsible for the execution of multiple prioritized tasks encoded by value functions is solved at each iteration of the robot control loop. This illustrates how the convex optimization formulation of the developed framework is computationally efficient and therefore amenable to being employed in online settings. Alternative approaches for task prioritization and allocation in the context of multi-robot systems generally result in (mixed-)integer optimization programs, which are often characterized by a combinatorial nature and are not always suitable for an online implementation [8].

4.2 Discussion
The experiments of the previous section highlight several desirable properties of the framework developed in this paper for the prioritized execution of tasks encoded by a value function. First of all, its compositionality is given by the fact that tasks can easily be inserted and removed by adding and removing constraints from the optimization program (14). For the same reason, the framework is incremental and modular, as it allows for building a complex task out of a number of subtasks which can be incrementally added to the constraints of an optimization-based controller. Moreover, it allows for the seamless incorporation of priorities among tasks, and, as we showcased in Sect. 4.1, these priorities can also be switched in an online fashion, in particular without the need to stop and restart the motion of the robots. Furthermore, Proposition 1 shows that the execution of multiple tasks using constraint-driven control is stable and the robotic system will indeed execute the given tasks according to the specified priorities. Finally, as the developed optimization program is a convex QP, its low computational complexity allows for an efficient implementation in online settings, even under real-time constraints on computationally limited robotic platforms.
5 Conclusion
In this paper, we presented an optimization-based framework for the prioritized execution of multiple tasks encoded by value functions. The approach combines control-theoretic and learning techniques in order to exhibit properties of compositionality, incrementality, stability, and low computational complexity. These properties render the proposed framework suitable for online and real-time robotic implementations. A simulated multi-robot scenario illustrated its effectiveness in the control of a redundant robotic system executing a prioritized stack of tasks.
A Comparison Between Optimal Control, Optimization-Based Control, and RL Policy
To compare the optimal controller, the optimization-based controller, and the RL policy, in this section we consider the stabilization of a double integrator system to the origin. The system dynamics are given by

$$\dot{x} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u,$$

where $x = [x_1, x_2]^T \in \mathbb{R}^2$ and $u \in \mathbb{R}$. The instantaneous cost considered in the optimal control problem (7) is given by $q(x) + u^2$, where $q(x) = x^T x$. The reward function of the value iteration algorithm employed to learn an approximate representation of the value function has been set to $g(x, u) = -q(x) - u^2$, and the resulting value function $\tilde{J}^*$ has been shifted so that $\tilde{J}^*(0) = 0$.

The results of the comparison are reported in Fig. 3. Here, the optimization-based controller solution of (13) with $V = \tilde{J}^*$ is compared to the optimal controller given in (9) and to the RL policy corresponding to the approximate value function $\tilde{J}^*$. As can be seen, the optimization-based controller and the optimal controller coincide, while the RL policy gets closer and closer to them as the number of training epochs increases.
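For this linear-quadratic instance, the value function is quadratic, so the optimal controller (9) can be cross-checked against the classical LQR solution; the following sketch, assuming SciPy, computes the gain from the continuous-time algebraic Riccati equation.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Double integrator with cost q(x) + u^2, q(x) = x^T x: the value function
    # is J*(x) = x^T P x, and (9) reduces to the LQR law u* = -B^T P x.
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)    # from q(x) = x^T x
    R = np.eye(1)    # from the u^T u penalty

    P = solve_continuous_are(A, B, Q, R)
    u_star = lambda x: -(B.T @ P @ x)   # optimal controller (9)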
B Implementation Details
The results reported in Sect. 4 have been obtained using a custom value function learning algorithm written in Python. The details of each multi-robot task are given in the following.
Fig. 3. Comparison between the optimal controller (given in (9)), the RL policy (based on the approximate value function $\tilde{J}^*$ in (8)), and the optimization-based controller (solution of (13) with $V = \tilde{J}^*$), employed to stabilize a double-integrator system to the origin of its state space, i.e., driving both $x_1$ and $x_2$ to 0. As can be seen, when trained for a sufficiently long time, the RL policy results in the optimal controller, which is also equivalent to the optimization-based controller.
Each robot in the team of $N$ robots is modeled using single integrator dynamics $\dot{x}_i = u_i$, where $x_i, u_i \in \mathbb{R}^2$ are the position and velocity input of robot $i$. The ensemble state and input will be denoted by $x$ and $u$, respectively.

For the formation control task, the expression of the cost $g$ is given by $g(x, u) = 1000 - 0.01(-E(x) - 10\|u\|^2)$, where $E(x)$ is the formation energy defined as

$$E(x) = \sum_{i=1}^{N} \sum_{j \in \mathcal{N}_i} \big( \|x_i - x_j\|^2 - W_{ij}^2 \big)^2,$$

$\mathcal{N}_i$ being the neighborhood of robot $i$, i.e., the set of robots with which robot $i$ shares an edge, and

$$W = \begin{bmatrix} 0 & l & \sqrt{3}l & 2l & 0 & l \\ l & 0 & l & 0 & 2l & 0 \\ \sqrt{3}l & l & 0 & l & 0 & 2l \\ 2l & 0 & l & 0 & l & 0 \\ 0 & 2l & 0 & l & 0 & l \\ l & 0 & 2l & 0 & l & 0 \end{bmatrix} \qquad (19)$$

with $l = 1$. The entry $W_{ij}$ of the matrix $W$ corresponds to the desired distance to be maintained between robots $i$ and $j$. The cost function $g$ for the go-to-goal tasks is given by $g(x, u) = 100 - 0.01(-\|x - \hat{x}\|^2 - \|u\|^2)$, where $\hat{x} \in \mathbb{R}^2$ is the desired goal point.

Remark 5 (Combination of single-robot and multi-robot tasks). Single-robot tasks (e.g., the go-to-goal tasks considered in this paper) are combined with multi-robot tasks (e.g., the formation control task) by defining the task gradient
required to compute $L_{f_0} \tilde{J}^*_i(x)$ and $L_{f_1} \tilde{J}^*_i(x)$ in the optimization program (14) in the following way:

$$\frac{\partial \tilde{J}^*_i}{\partial x} = \begin{bmatrix} 0 & \cdots & 0 & \dfrac{\partial \tilde{J}^*_{ij}}{\partial x_j} & 0 & \cdots & 0 \end{bmatrix},$$

where $\tilde{J}^*_{ij}$ is the approximate value function for task $i$ and robot $j$.
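A minimal sketch of these implementation ingredients—the formation energy $E(x)$ and the gradient embedding of Remark 5—could be written as follows; the data structures (a neighborhood dictionary, a flat ensemble state) are hypothetical choices made for this example.

    import numpy as np

    def formation_energy(x, W, neighbors):
        """E(x) = sum_i sum_{j in N_i} (||x_i - x_j||^2 - W_ij^2)^2.

        x         : (N, 2) array of robot positions
        W         : (N, N) matrix of desired inter-robot distances, cf. (19)
        neighbors : dict mapping robot i to its neighbor set N_i
        """
        E = 0.0
        for i, Ni in neighbors.items():
            for j in Ni:
                d2 = np.sum((x[i] - x[j]) ** 2)
                E += (d2 - W[i, j] ** 2) ** 2
        return E

    def embed_single_robot_gradient(grad_ij, j, N):
        """Place the gradient of a single-robot task (Remark 5) into the
        ensemble-state gradient: zeros everywhere except the block of robot j."""
        grad = np.zeros(2 * N)
        grad[2 * j:2 * j + 2] = grad_ij
        return grad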
References

1. Antonelli, G.: Stability analysis for prioritized closed-loop inverse kinematic algorithms for redundant robotic systems. IEEE Trans. Rob. 25(5), 985–994 (2009). https://doi.org/10.1109/TRO.2009.2017135
2. Bertsekas, D.P.: Reinforcement Learning and Optimal Control. Athena Scientific, Belmont, MA (2019)
3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
4. Bryson, A.E., Ho, Y.C.: Applied Optimal Control: Optimization, Estimation, and Control. Routledge, New York (2018)
5. Bylard, A., Bonalli, R., Pavone, M.: Composable geometric motion policies using multi-task pullback bundle dynamical systems. arXiv preprint arXiv:2101.01297 (2021)
6. Dulac-Arnold, G., Mankowitz, D., Hester, T.: Challenges of real-world reinforcement learning. arXiv preprint arXiv:1904.12901 (2019)
7. Freeman, R.A., Primbs, J.A.: Control Lyapunov functions: new ideas from an old source. In: Proceedings of 35th IEEE Conference on Decision and Control, vol. 4, pp. 3926–3931. IEEE (1996)
8. Gerkey, B.P., Matarić, M.J.: A formal analysis and taxonomy of task allocation in multi-robot systems. Int. J. Robot. Res. 23(9), 939–954 (2004)
9. Ghosh, D., Singh, A., Rajeswaran, A., Kumar, V., Levine, S.: Divide-and-conquer reinforcement learning. arXiv preprint arXiv:1711.09874 (2017)
10. Gupta, A., et al.: Reset-free reinforcement learning via multi-task learning: learning dexterous manipulation behaviors without human intervention. arXiv preprint arXiv:2104.11203 (2021)
11. Haarnoja, T., Pong, V., Zhou, A., Dalal, M., Abbeel, P., Levine, S.: Composable deep reinforcement learning for robotic manipulation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6244–6251. IEEE (2018)
12. Haarnoja, T., Tang, H., Abbeel, P., Levine, S.: Reinforcement learning with deep energy-based policies. In: International Conference on Machine Learning, pp. 1352–1361. PMLR (2017)
13. Kaelbling, L.P.: The foundation of efficient robot learning. Science 369(6506), 915–916 (2020)
14. Micchelli, C.A., Pontil, M.: Kernels for multi-task learning. In: NIPS, vol. 86, p. 89. Citeseer (2004)
15. Mukadam, M., Cheng, C.A., Fox, D., Boots, B., Ratliff, N.: Riemannian motion policy fusion through learnable Lyapunov function reshaping. In: Conference on Robot Learning, pp. 204–219. PMLR (2020)
16. Nachum, O., Gu, S., Lee, H., Levine, S.: Data-efficient hierarchical reinforcement learning. arXiv preprint arXiv:1805.08296 (2018)
17. Notomista, G., Mayya, S., Hutchinson, S., Egerstedt, M.: An optimal task allocation strategy for heterogeneous multi-robot systems. In: 2019 18th European Control Conference (ECC), pp. 2071–2076. IEEE (2019)
18. Notomista, G., Mayya, S., Selvaggio, M., Santos, M., Secchi, C.: A set-theoretic approach to multi-task execution and prioritization. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 9873–9879. IEEE (2020)
19. Peng, X.B., Chang, M., Zhang, G., Abbeel, P., Levine, S.: MCP: learning composable hierarchical control with multiplicative compositional policies. arXiv preprint arXiv:1905.09808 (2019)
20. Primbs, J.A., Nevistić, V., Doyle, J.C.: Nonlinear optimal control: a control Lyapunov function and receding horizon perspective. Asian J. Control 1(1), 14–24 (1999)
21. Qureshi, A.H., Johnson, J.J., Qin, Y., Henderson, T., Boots, B., Yip, M.C.: Composing task-agnostic policies with deep reinforcement learning. arXiv preprint arXiv:1905.10681 (2019)
22. Rana, M.A., et al.: Learning reactive motion policies in multiple task spaces from human demonstrations. In: Conference on Robot Learning, pp. 1457–1468. PMLR (2020)
23. Ratliff, N.D., Issac, J., Kappler, D., Birchfield, S., Fox, D.: Riemannian motion policies. arXiv preprint arXiv:1801.02854 (2018)
24. Ruder, S.: An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098 (2017)
25. Sahni, H., Kumar, S., Tejani, F., Isbell, C.: Learning to compose skills. arXiv preprint arXiv:1711.11289 (2017)
26. Schwartz, A., Thrun, S.: Finding structure in reinforcement learning. Adv. Neural Inf. Process. Syst. 7, 385–392 (1995)
27. Sener, O., Koltun, V.: Multi-task learning as multi-objective optimization. arXiv preprint arXiv:1810.04650 (2018)
28. Singh, S.P.: Transfer of learning by composing solutions of elemental sequential tasks. Mach. Learn. 8(3), 323–339 (1992)
29. Smith, V., Chiang, C.K., Sanjabi, M., Talwalkar, A.: Federated multi-task learning. arXiv preprint arXiv:1705.10467 (2017)
30. Sontag, E.D.: A Lyapunov-like characterization of asymptotic controllability. SIAM J. Control. Optim. 21(3), 462–471 (1983)
31. Sontag, E.D.: A "universal" construction of Artstein's theorem on nonlinear stabilization. Syst. Control Lett. 13(2), 117–123 (1989)
32. Teh, Y.W., et al.: Distral: robust multitask reinforcement learning. arXiv preprint arXiv:1707.04175 (2017)
33. Todorov, E.: Compositionality of optimal control laws. Adv. Neural Inf. Process. Syst. 22, 1856–1864 (2009)
34. Van Niekerk, B., James, S., Earle, A., Rosman, B.: Composing value functions in reinforcement learning. In: International Conference on Machine Learning, pp. 6401–6409. PMLR (2019)
35. Zhang, Y., Yang, Q.: A survey on multi-task learning. IEEE Trans. Knowl. Data Eng. 34, 5586–5609 (2021)
A Comparison of Two Decoupled Methods for Simultaneous Multiple Robots Path Planning

Benjamin Bouvier and Julien Marzat
DTIS, ONERA, Université Paris Saclay, 91123 Palaiseau, France
{benjamin.bouvier,julien.marzat}@onera.fr

Abstract. Two path planning algorithms dedicated to multiple robots driving simultaneously in a cluttered workspace are defined and compared in this paper. Each robot is assigned an initial position, a goal position, and a constant reference speed, and has to drive without colliding with its teammates or the environment. Both approaches are based on an implementation of the widely known A* algorithm and belong to the family of decoupled path planning methods, since robots are considered sequentially and with a predefined priority ranking. The first algorithm is called Prioritized Planning (PP), where the path of each robot is computed sequentially while taking into account static obstacles as well as the previously-planned robot positions. The second algorithm, called Fixed-Path Coordination (FPC), follows a two-step approach: (i) obtaining A* paths of all robots taking into account static obstacles only; (ii) applying a velocity-tuning procedure so that lower-priority robots can stop and restart along their paths to let the higher-priority robots move freely. Both algorithms have been applied on a variety of test cases through a Monte Carlo setup to evaluate and compare their performances.

Keywords: Multi-robot motion coordination · Decoupled path planning · Prioritized Planning · Fixed-Path Coordination
1 Introduction
Coordinated path planning of a group of mobile robots is a major requirement for many cooperative tasks in interaction with a complex environment (e.g., unknown, cluttered, dynamical). A high-level description of the problems related to either collaborative or competitive robot coordination can be found in [1]. This coordination implies an inherent conflict of resources: in path planning the conflict is mostly a space conflict because the robots move around each other, but it may also be a conflict due to limited communication channels or a conflict of payload if various robots manipulate the same one. Task allocation is a problem in itself too, as well as the option of leaving the fleet management to a centralized authority or allowing every robot to take decisions on their own. It is considered here that the allocation of goal positions has already been performed and that the problem consists in coordinating the robots' simultaneous motion in the workspace.
When managing more than one robot, a solution can consist in applying motion coordination [2], i.e., coordinating the robots online, as they are moving, to prevent collisions from occurring. This can be achieved either through a centralized authority sending commands to the robots (centralized approach) or with the robots communicating with each other based on their future displacements (decentralized approach). Motion coordination can result in various solutions: (i) a system of traffic rules; (ii) reactive approaches, which can be based on potential fields with attractive forces towards goals and repulsive ones to stay away from obstacles; (iii) the management of a swarm or fleet with a common objective. For example, coordinated control approaches have been previously used for several applications that can be addressed by a fleet of autonomous aerial or terrestrial robots [3]. To limit computational complexity, the online search of a model predictive control input for each vehicle can be reduced to a discretized set while still taking into account the predicted motion of the other robots, as in [4]. However, this strategy can be limited for autonomous robots moving freely in a very cluttered environment, where in some configurations no feasible solution can be found. This is why graph-based and sampling-based path planning methods should also be considered for simultaneous robot coordination [5,6]. In the latter framework, a simple approach could consist in considering only the current positions of the other robots when computing the path of each one, but collision avoidance with the other robots is then not guaranteed. The objective of the present work is to study path planning methods that can be applied to multiple robots moving simultaneously in the same cluttered environment, with an explicit handling of their motion interaction. For this purpose, two decoupled sampling-based path planning algorithms derived from the A* procedure are described in Sect. 3, and their performances are evaluated in a Monte Carlo setup with a combination of map sizes, rates of static obstacles and numbers of robots (see Sect. 4).
2 Related Work
Simultaneous motion planning for multiple robots comes with various difficulties, such as the numerical capacity of planning for a high number of robots or large workspaces, the consideration of robot dynamics as well as sensing constraints, or the transition from numerical simulation to real-robot experiments [2]. To deal with this variety of possible issues, several motion planning paradigms have been derived, as described in the reference textbook [5], where in particular algorithms can be classified as either sampling-based or combinatorial methods. In sampling-based motion planning, the idea is to sample the free space to conduct the search for a path, relying on a collision detection module; hence the explicit construction of the obstacle space is avoided. In combinatorial motion planning, also referred to as exact methods, paths are computed in the continuous configuration space without resorting to sampling or discretization. Two main categories emerge from the literature to address simultaneous path planning, namely coupled approaches [6–8] and decoupled approaches [8–11]. In coupled
planning, the problem can be seen as handling a single robot which concatenates the states of the actually different robots. This increases the dimension of the configuration space, and complexity is known to rise exponentially in this number. Classical algorithms may then fail to find a solution or require too much time (this has been verified in our evaluations, see Sect. 4.1). In decoupled path planning, two subcategories can be considered [12]:

1. Robots are treated one by one. For every one of them, a path is calculated considering all other robots as moving obstacles on top of the already-existing static obstacles. Two steps are required, namely a planning algorithm and a prioritization rule. The planning algorithm is applied to obtain the path of each robot to its respective goal, and then prioritization should favor those with the highest cost (depending on time, distance, energy, etc.). For instance, robots with the least cost could be penalized by having to bypass or make detours.

2. Each robot has its own path, calculated independently of the others with only the static obstacles taken into account. Then a coordination diagram should plan the displacements to avoid collisions. Some authors mention a second phase of velocity tuning (e.g., in [13]) to coordinate robots in parallel: the relative speed between two robots can thus be tuned in the portion of path subject to a risk of collision.

Finally, it is possible to build hybrid approaches [12] which rely on both coupled and decoupled search. The idea is to divide the robots into subgroups depending on their level of dependency or degree of priority (the constitution of such groups is in itself a task). For each subgroup, which has a smaller configuration space, a coupled approach is applied. The groups are treated in a decoupled way with respect to one another. The authors in [14] proposed a planning algorithm based on satisfiability modulo theory, where suboptimal paths can be accepted if, in turn, this makes the problem solvable. Coupled or hybrid approaches, however, still report computational loads that are prohibitive for embedded applications for realistic numbers of robots and workspace dimensions. It has thus been chosen to focus on decoupled planning methods in this work, since this category of approaches has the capacity of decomposing the problem into simpler ones, and the literature shows that feasible solutions can be found even when the number of robots, the workspace dimension or the degrees of freedom increase. In the context of managing intersections for autonomous vehicles, a priority-based coordination framework which combines a high-level priority graph with a feedback control law has been proposed in [15]. Under this framework, the proposed overall coordination system is proven to be collision-free and deadlock-free. In [16], a prioritized planning approach has been defined with a formal guarantee to provide a solution under strict conditions regarding the workspace and the robots' initial conditions. An asynchronous decentralized implementation of this strategy has also been proposed and demonstrated reliable results via a simulation evaluation on real-world maps. Similar approaches relying on priority-based planning have also been proposed
in [9,10]. The main ideas for coordinating the speed of robots having independent goals along pre-defined paths have been presented in [8]. The so-called coordination diagram, based on a bounding box representation of the obstacles, was then introduced in [17] to tackle the motion coordination of several robots moving along fixed independent paths while avoiding mutual collisions. Multi-robot path planning continues to be an active area of research, with many different approaches proposed in the past few years, based e.g. on reinforcement learning [18,19], particle swarm optimization [20,21], graph neural networks [22], and on the design of novel heuristics [23].

Several decoupled multi-robot coordination strategies have been defined and evaluated in these previous works, relying either on priority-based successive planning or on velocity adjustment along fixed paths. Two such decoupled algorithms are defined in the present paper, one from each category, namely Prioritized Planning (PP) and Fixed-Path Coordination (FPC). They are both derived from the same A* framework with practical implementations, which are detailed in pseudocode. This allows a fair comparison of the two approaches in the same simulation setup using extensive Monte Carlo evaluations on randomly-defined workspace configurations, and as a result provides recommendations for the motion coordination of multiple robots having independent goals in cluttered environments.
3 Planning Algorithms

3.1 Problem Formulation
The problem is illustrated in Fig. 1. A two-dimensional workspace $W = \mathbb{R}^2$ is considered (but the methods can easily generalize to higher-dimension workspaces). This workspace can be separated into two parts: a first part where robots are free to move, $C_{free}$, and a second one occupied by static obstacles, $C_{obs}$, where robots are not allowed to drive. A robot $A$ can undergo various transformations which result in reaching a particular configuration $q \in W$. An obstacle-free configuration such that $q \in C_{free}$ is considered valid, while a collision configuration $q \in C_{obs}$ is invalid. Let us now consider we have $m$ robots, $m \in [[2; +\infty[[$. Every robot $A_i$, $i \in [[1; m]]$, is assigned an initial configuration, $q_I^i$, and a final configuration or goal, $q_G^i$, defined in $C_{free}$. Contrary to the single-robot case, where it is sufficient to consider robot-obstacle collisions, robot-robot collisions should be taken into account. Robots are moving with respect to each other, therefore a given robot $A_i$ should consider all the other robots $A_j$, $j \ne i$, as moving obstacles. The set of configurations such that robot $A_i$ collides with robot $A_j$, $j \ne i$, is defined as $C_{mov\,obs}^{i,j}$.
Fig. 1. Multiple robots path planning: robots A1 and A2 drive from their initial to final configurations while avoiding two static obstacles and ensuring they do not collide.
The path planning problem consists in finding, for every robot $A_i$, a continuous path $\tau_i \colon \{[0;1] \to C_{free};\ s \mapsto \tau_i(s)\}$, where $s$ parameterizes the path, such that $\tau_i(0) = q_I^i$ and $\tau_i(1) = q_G^i$, and with an empty intersection with both $C_{obs}$ and $C_{mov\,obs}^{i,j}$ for all $j \ne i$. The notion of moving obstacles requires a parameter that can be used to determine where a given robot is located at some instant. If all robots had the same speed, the cumulative distance could be used and $s$ would be sufficient. However, for robots with different speeds $v_i$, time should be considered explicitly in the problem definition. Therefore, a date is associated with every configuration of the path. Instead of looking for the parameterized path $\tau_i$ defined previously, we now look for a trajectory $\varphi_i = \tau_i \circ \sigma_i$ considering a timing function defined as $\sigma_i \colon \{T \to [0;1];\ t \mapsto s\}$, where $t$ denotes time.

3.2 Decoupled Prioritized Planning (PP)
The first algorithm is a decoupled two-dimensional algorithm derived from A*, using prioritization to sequentially obtain the path of every robot without collisions. The method requires that the m robots have been ranked from highest to lowest priority. For the first one, A1, a standard A* path is calculated from its initial position to its goal position, taking into account the static obstacles present in the workspace. Then, the second robot is treated: besides static obstacles, A2 has to consider the previously computed motion of robot A1 along its path with a given constant speed v1. In case of collision with A1 at a given node, A2 will have to reach another configuration, thereby bypassing A1, which has priority. This procedure is applied iteratively to the next robots until all have been treated. The idea is illustrated in Fig. 2. In this algorithm, every robot is assumed to drive at its own constant nonzero speed, meaning that a robot is unable to stop. All it can do is bypass, which requires sufficient free space around the static and moving obstacles.
Fig. 2. PP Algorithm: A2 ’s path is obtained by bypassing the higher-priority robot A1 .
Each robot is assigned a vector of dates associated with its A* path, given its nominal velocity $v_i$ specified by the user. For robot $A_i$, this allows interpolating the positions of all higher-priority robots $A_j$, $1 \le j < i$, to detect possible collisions. Collisions are defined as follows: the current robot $A_i$ is considered as a point, while all the robots with higher priorities $A_j$, $1 \le j < i$, are considered to occupy a disk of parameterized safety radius centered at the robot position at the current date. If $A_i$ finds itself inside or on the outer circle of any disk representing $A_j$, a collision is detected and the corresponding node is declared unfeasible.

The pseudocode of the algorithm is given in Algorithm 1. It globally consists in a loop with a modified A* procedure applied for each robot (modifications are indicated in blue). The dates in the list time_vec (which is of the same length as the heuristic scores gscores) are defined for the nodes present either in the closedSet (which progressively contains the path) or in the openSet (which contains nodes considered to be part of the path but eventually not selected). The considered heuristic dist(ngb, n) is based on the Euclidean distance. In the end, a trajectory (path_i; dates_i) is obtained for each robot i. If the while loop is exited with success = false, the path is extracted from the node of minimal heuristic value instead of $q_G^i$; therefore, an incomplete path is obtained to determine how far the robot can go before having to stop.

The core of the multi-robot layer lies in the robot-robot collision detection. When managing static obstacles, it is sufficient and straightforward to check if the intended configuration is inside or outside $C_{obs}$. When it comes to moving obstacles (higher-priority robots), one has to consider the relative speed between the robot and the obstacle as well as the obstacle width. Collision checks are performed every dT seconds, given the relative speed vector and the obstacle width. When moving from a current node position n to the intended one ngb, the robot does not change its direction, but the moving obstacle might. As a consequence, care should be taken in selecting the dates and in calculating the relative speed vector. Moreover, collision checks are performed with all robots of higher priority (the sweeping is nevertheless stopped as soon as a collision is detected, because one is sufficient). All these additional calculations can be costly, so they are run only if a static obstacle has not already been detected. Finally, a post-check is conducted on the obtained paths and dates, such that the coordination is considered to have failed in the particular case where a lower-priority robot has reached its destination and is blocking the path of a higher-priority robot.
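A minimal sketch of this robot-robot collision check, under the simplifying assumption that positions are linearly interpolated between waypoints (via np.interp) and that dT is given, could read as follows; names and signatures are illustrative.

    import numpy as np

    def collides_with_higher_priority(p_start, p_end, t_start, t_end,
                                      paths, dates, safety_radius, dT):
        """Check robot-robot collisions while the current robot (a point)
        travels from p_start to p_end between t_start and t_end; every
        higher-priority robot is a disk of radius safety_radius centered at
        its interpolated position. One collision is enough to declare the
        intended node unfeasible."""
        p_start, p_end = np.asarray(p_start), np.asarray(p_end)
        for path, date in zip(paths, dates):
            for t in np.arange(t_start, t_end + dT, dT):
                lam = min((t - t_start) / max(t_end - t_start, 1e-9), 1.0)
                p = p_start + lam * (p_end - p_start)
                # higher-priority robot position at date t (holds the last
                # waypoint once it has reached its destination)
                q = np.array([np.interp(t, date, path[:, 0]),
                              np.interp(t, date, path[:, 1])])
                if np.linalg.norm(p - q) <= safety_radius:
                    return True
        return False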
foreach robot i ∈ [[1; m]] (ranked from highest to lowest priority) do
    Initialize closedSet to empty and openSet to the node containing q_I^i;
    Initialize lists gscores (cumulated distances from q_I^i) and time_vec (time instants);
    while openSet not empty do
        Find node n in openSet with minimal heuristic value mhv;
        if mhv = ∞ then Assign success := false; Break Loop; end
        if n = q_G^i then Assign success := true; Break Loop; end
        Add n to closedSet; Remove n from openSet;
        for all neighbors ngb of n inside the workspace do
            if ngb ∉ closedSet then
                Compute candidate cost alt := gscores(n) + dist(ngb, n);
                if ngb collides with a static obstacle or with previous robots on {path_j, dates_j, ∀j < i} then alt := ∞; end
                if alt < gscores(ngb) then Update gscores, time_vec; end
            end
        end
    end
    if success = true then
        Get full path_i from start q_I^i to goal q_G^i and vector dates_i from time_vec;
    else
        Get incomplete path_i and vector dates_i from start to best reachable node;
    end
    Return path_i and dates_i;
end

Algorithm 1: Prioritized Planning (PP) pseudocode
3.3 Decoupled Fixed-Path Coordination (FPC)
The second algorithm is also decoupled, two-dimensional and A*-based. It differs from the PP algorithm in that it allows robots to stop on their paths and resume their motions. Therefore, robot-robot collisions are not avoided by bypassing but simply by stopping and starting again (see Fig. 3a), which is one possible choice for modulating the robot velocities along their trajectories [5].
Fig. 3. A2 ’s trajectory is obtained by pausing until higher-priority robot A1 has gone by.
First, an A* path is obtained independently for each of the m robots by considering static obstacles only (referred to as single-robot planning). The input requirements are the same as for the previous algorithm: the robots should be ranked from highest to lowest priority, and each is assigned a nominal velocity $v_i$. A speed profile is then initialized for each robot along the A* paths. The next step consists in tuning these velocity profiles to ensure collision-free paths, which is achieved by inserting pauses every time a robot-robot collision is detected. Note that these breaks can be inserted anywhere in the path, which means not necessarily on the nodes of the A* grid. If a collision is detected (with the same mechanism as in PP), the robot with the lowest priority will pause as long as necessary for the higher-priority robots to go past. Another, less conservative option would be to optimize the robot motions given some global criterion to select which robot should stop for given collision conditions, which might result in different prioritization rules for similar collision configurations at different time instants.

In the PP algorithm, a list of dates has been added as additional information to handle collision checking. The same applies in this second algorithm, with the addition of a binary motion command state associated with each date, to determine if the robot is driving or waiting (see Fig. 3b). Another difference with Algorithm 1 is the calculation of dT. In PP it is calculated from the relative speed vector between two given robots at a certain date, while a unique dT is now used in FPC for all couples of robots on the basis of the highest possible relative speed. The pseudocode is given in Algorithm 2. Every time a collision is detected, a pause is inserted, which requires checking again the collisions of all pairs of robots at the date of collision, in case the pause insertion would induce a collision that did not exist before. Also, when two robots are moving exactly in opposite directions, the stop may need to be inserted several times dT before the collision date, which requires going backwards in time to check all pairs and can turn out to be costly. Although not explicit in Algorithm 2, the main while loop (line 7) can be interrupted on a maximum number of collision checks in the sub-routines, which indicates a likely absence of solution and is thus considered as a failure to find coordinated paths.
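The two FPC ingredients just described—a single dT for all robot pairs and the insertion of pauses in a robot's schedule—can be sketched as follows; the dT rule shown (safety radius over twice the maximum speed) is an assumption, since the exact expression is not given here, and the schedule representation is hypothetical.

    import numpy as np

    def compute_dT(speeds, safety_radius):
        # Highest possible relative speed: two robots driving head-on close
        # the gap at 2 * v_max; sampling every dT then cannot step over a
        # safety disk. (Assumed rule, used for illustration only.)
        return safety_radius / (2.0 * max(speeds))

    def insert_pause(dates, motion, t_stop, duration):
        """Insert a pause at date t_stop in a robot's (dates, motion) schedule:
        every later event is delayed by `duration` and a waiting command
        (motion = 0) is recorded at t_stop."""
        dates = np.asarray(dates, dtype=float)
        motion = np.asarray(motion, dtype=int)
        shifted = np.where(dates >= t_stop, dates + duration, dates)
        idx = int(np.searchsorted(shifted, t_stop))
        return np.insert(shifted, idx, t_stop), np.insert(motion, idx, 0)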
1  for all robots i ∈ [[1; m]] do
2      get A* path_i (with static obstacles only), initial dates_i and motion_i vectors;
3  end
4  Tmax := maximum final date among all robots;
5  Calculate dT given nominal speeds v_i, i ∈ [[1; m]];
6  Initialize K := ceil(Tmax/dT) and time index k ← 1;
7  while k ≤ K do
8      Initialize boolean any_collision_detected with value true;
9      Initialize integer k_candidate with value k;
10     while any_collision_detected do
11         Set collision_counter to 0;
12         for robot r from 1 to m − 1 [loop over priority robots] do
13             for robot s from r to m [loop over subordinate robots] do
14                 if robot s collides with robot r then
15                     collision_counter ← collision_counter + 1;
16                     Insert a stop at the last date dstop when a motion command had been applied before the collision, and update dates_s and motion_s;
17                     k_candidate ← min{k_candidate; ceil(dstop/dT)};
18                     if final date of robot s > final date of all other robots then K ← K + 1; end
19                 end
20             end
21         end
22         if collision_counter == 0 then any_collision_detected := false; end
23     end
24     if k_candidate < k then
25         k ← k_candidate; [start over the collision check in the subinterval containing or ending by the smallest dstop used in the latest collision while loop]
26     else
27         k ← k + 1; [go to next time subinterval]
28     end
29 end
30 For all robots i ∈ [[1; m]], return dates_i, motion_i (path_i is fixed by design);

Algorithm 2: Fixed-Path Coordination (FPC) pseudocode
4 Numerical Experiments
Extensive simulations have been run based on the following setup. The workspace is a bounded subset of R2 with translational degrees of freedom. As shown in Fig. 4a, the free space is sampled at regularly-spaced points (in green) which form a grid with at most eight neighbors per node. Robots move along the edges with the aim of traveling from their starting positions (black) to their final ones (red). The number indicated next to every node is the date at which the robot reaches the corresponding position.
Fig. 4. Illustrative test case with 3 robots successfully solved by PP (b) and FPC (c-d). Arrival times are indicated around the nodes and motion command profiles (d) show the stop-and-go behaviour of FPC.
4.1 Illustrative Test Cases
Coupled Planning. A coupled A∗ strategy has been considered as a baseline, where all the positions of the robots in the fleet were aggregated into a single state vector. However the combinatorial complexity grows exponentially with the number of nodes and the number of robots, making it impractical for the simultaneous path planning of teams with more than two or three robots. This has been verified on a simple example with around 100 nodes where computing the paths takes a few milliseconds for one robot, several seconds for two robots, and several hours for three robots. Therefore, the focus has been put on the comparison of the proposed decoupled algorithms.
Decoupled Methods. As an illustrative example, a simple test case with three robots having different velocities is presented in Figs. 4b–4d. Both decoupled algorithms are successful in coordinating the three agents. With PP, robot A2 finds an alternative node to avoid a collision with A1 at node (45; 25). With FPC, robots A2 and A3 stop once and twice, respectively, increasing their motion durations by 16% and 89% compared to the constant-speed travel time of the initial independent A* paths.

4.2 Monte Carlo Simulations
In order to evaluate and compare the algorithms' performances, a Monte Carlo simulation was set up. It consisted in (i) creating random scenarios (following uniform distributions) as two-dimensional maps with randomly-placed static obstacles and random initial and final configurations of robots, and (ii) applying the two algorithms. The following parameters are selected for each experiment: (a) a number of nodes (related to the map size), (b) a rate of static obstacles, and (c) a number of robots. The static obstacles are placed so that a particular percentage of the map (called occupancy rate) is covered with them, e.g., 10% means that 10% of the nodes are occupied by static obstacles. Even though both algorithms allow the robots to have different nominal speeds, they were all set to the same value in these evaluations for a fair comparison, since path lengths and travel times are handled differently by PP and FPC. Two values were respectively chosen for the map size (about 10^3 and 10^4 nodes), the occupancy rate (10 and 30%) and the number of robots (5 and 10), and all eight combinations were evaluated. Any map where single-robot A* planning failed was dismissed, since that means at least one destination cannot be reached. The expression single-robot planning refers to the superposition of the paths obtained for every robot considering only the static obstacles and ignoring the other agents. Two main metrics were computed for all the test cases and algorithms: the success rate and the increase of travel duration with respect to the single-robot reference (see Table 1).

Table 1. Monte Carlo comparison (mean results on 1000 runs) for PP and FPC algorithms
Fig. 5. PP and FPC success rates show little dependency on occupancy rate
Rate of Planning Success. Single-robot success refers to cases where independent single-robot planning happens to generate no collisions with constant-speed travel. As can be seen in Table 1 (line 1), the success rates, ranging from 10 to 90%, decrease by 4 to 15 points when the occupancy rate is tripled and by 26 to 67 points when the number of robots is doubled. Available nodes become progressively less numerous, so the probability of path overlap and/or crowded areas rises, which produces more collisions. On the contrary, growing the number of nodes has a positive impact on the success rate because more free space becomes available: paths are less likely to overlap and, even if they do, robots are less likely to take these common portions at the same dates. Considering now the cases where single-robot planning fails and multi-robot planning becomes necessary, we may compare the efficiency of PP and FPC (lines 2-3 in Table 1, also shown in Fig. 5). It should be noted that this metric is calculated only on the subset of single-robot failures. PP has a success rate between 64 and 96%, and FPC from 51 to 81%. PP always performs better than FPC because FPC fails when an initial or final position is on another robot's path. A goal position acts as an additional static obstacle from the perspective of a lower-priority robot; an initial position is also seen as a static obstacle, but by a higher-priority robot, because this is the only position that cannot be freed by holding up the robot somewhere else in the map. In a nutshell, FPC handles path overlaps less efficiently. A check on initial and final positions as well as priority reordering could help avoid such cases. Finally, the number of nodes has a positive impact on the planning status because it offers more possibilities for PP to find alternative paths and because FPC is less affected by the previously described issues.

Travel Duration Increase. It is worth recalling the difference between PP and FPC path alteration. PP will change portions of paths to bypass higher-priority robots while driving continuously, so both travel duration and distance may increase. FPC, however, can make lower-priority robots wait but cannot make them divert from single-robot paths, which means distance remains
unchanged. This is why Table 1 only mentions travel duration increase. Path lengthening would be zero for FPC and equal to the duration increase for PP, because all speeds are equal. The metric is computed only on multi-robot configurations for which the paths were successfully modified by the corresponding algorithm. The results show that FPC delays the robots much more than PP, with 18 to 57% of time increase on average, compared to less than 6.1% for the smaller map and always less than 1% on the bigger one. In many cases, bypassing a robot has a limited impact on the arrival time because various slightly different paths (and sometimes ones of exactly the same length) lead to the goal. If two robots drive in opposite directions, one may simply shift to a parallel lane with PP, whereas in FPC the lower-priority agent will have to stop to give way.
5 Conclusion
Two path planning algorithms dedicated to multiple robots driving simultaneously in a cluttered workspace have been defined as variations of A* procedures and compared in this paper. Both are based on a predefined order of priority and assumptions of constant nominal velocities. The first one, named Prioritized Planning (PP), consists in bypassing higher-priority agents while continuously driving. The second one, named Fixed-Path Coordination (FPC), handles collision risks along fixed paths by inserting pauses in order to let higher-priority robots move past. The Monte Carlo simulation setup showed that directly superimposing single-robot planning paths can be successful in a number of configurations with limited occupancy rates and fleet sizes. This could thus be a first step when addressing a multi-robot coordination problem, where the new strategies can then be called upon when this straightforward approach has failed. PP showed good performance thanks to its flexibility in adapting the paths by following alternative nodes. Satisfactory success rates and very limited travel duration and distance increases were obtained. The proposed FPC version proved less efficient because of its inability to divert from the initial single-robot paths. This strategy may only prove more efficient in extremely cluttered environments where alternative nodes would be harder to find for PP. Most of the FPC failed cases were due to initial and final position conflicts, as these may prevent other robots from reaching their goals. Moreover, although distance remains minimal with FPC, travel duration grows significantly more than with PP. Free-space sampling maximization is of interest to improve the success rate of both single- and multi-robot algorithms.

Future work directions include the evaluation of priority shuffles for both PP and FPC algorithms and their extension to higher-dimensional workspaces, which will make it possible to handle heterogeneous teams of robots. Additional comparison criteria can also be considered, such as energy efficiency. Finally, the deployment on embedded computers of real-world robots and the interaction with actual perception and control modules would also make it possible to estimate the computational load and performances in full autonomy loops on the field.

Acknowledgement. This work was supported by ONERA project PR PARADIS.
Smarticle 2.0: Design of Scalable, Entangled Smart Matter

Danna Ma¹, Jiahe Chen¹, Sadie Cutler², and Kirstin Petersen¹

¹ School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
{dm797,jc3472,kirstin}@cornell.edu
² Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853, USA
[email protected]
Abstract. We present a new iteration of smart active matter modules capable of unprecedented 3D entanglement, designed specifically for fabrication and operation at large scales by a range of scientific users. We discuss the benefits of entanglement compared to traditional rigid, lattice formations in active matter and modular robots, and the design which supports low cost, a small and appropriate form factor, low weight, low barrier-of-entry, and ease of operation. We characterize the platform in terms of actuation repeatability and longevity, lifting and holding strength, a number of sensing modalities, and battery life. We demonstrate short and (relatively) long range communication using tactile and acoustic transceivers. We further show exploratory collective behaviors with up to 10 modules, including static entanglement and self disassembly. We hope that this open-source 'robo-physical' platform can pave the way for new innovations across the fields of modular robots and active and soft matter.

Keywords: scalable collective · modular robot · active matter

1 Introduction
Smart robotic matter consists of large aggregates of programmable units capable of actuation, sensing, and intelligent response to their surroundings [1]. It promises applications both as a physical test platform to gather further insights on materials, colloids, and biological swarms [2,3], and as a taskable, modular robot that can morph between shapes and change properties as needed [4-6]. Demonstrations currently span swarms of carefully linked, individually-capable modules [1,5], to loose aggregates of modules with little-to-no individual mobility, sensing, or computation [2,7,8]. The latter is of special interest in this article because system-wide complexity and functionality can emerge from local interactions between large numbers of individually limited components. However, it also involves significant challenges related to scalable fabrication and operation, and methods for coordination and reliable autonomy.
Fig. 1. A) Smarticle 2.0 modules, capable of transitioning between “I” and “U” shapes. B) Wireless charging circuit. C) Entangled staples hanging from a ruler. D) 20 entangled modules.
This paper presents the newest iteration of the Smarticle (smart active matter) robot platform [2], specifically designed to support affordable and large-scale operation, 3D entanglement, a variety of sensing modalities, and a low barrier of entry for novice users across scientific fields (Fig. 1). A Smarticle 2.0 module has a long slender profile that can alternate between a staple-like shape ("U") and an oblong shape ("I") using a single actuator, and has the ability to sense and communicate acoustically, optically, and mechanically. This article focuses on the details of the Smarticle 2.0 design (Sect. 3), fabrication (Sect. 4), and mechanical and sensor characterization (Sect. 5). However, we also show some exploratory multi-robot experiments with up to 10 robots (Sect. 6). To clarify the results presented, we recommend that the reader watch the accompanying video (https://youtu.be/KrcyYz2ccdg). To the best of our knowledge, this is the first demonstration of a smart matter platform capable of large-scale, 3D entangled operation.
2 Background and Related Work
The typical trend in smart matter is to rely on modules with sufficient locomotion, sensing, communication, and reasoning skills to function on their own or in small numbers [1,5]. However, more recent works have suggested that smart matter can be composed of much less capable modules that are largely agnostic to each other and their surroundings. Examples include modules that are individually immobile but collectively mobile [9-11], and, perhaps more impressively, modules that exhibit emergent intelligent behaviors in loose aggregates with only very coarse perception and no explicit communication [8,10,12,13].

One of the major practical issues with smart matter and modular robots is the design of couplings/connection points. Beyond supporting the weight of adjoining modules, these connection points are often used to transfer drive forces, communication signals, and power. In reconfigurable robotic matter, these mechanisms must dock and undock repeatedly and with high accuracy. Hence, these coupling mechanisms are often responsible for a significant fraction of the weight, volume, price, and wear. To overcome this issue, many platforms rely on active or passive magnetic coupling [1,5,14], with the explicit trade-off of low force transfer, which limits configurations to small overhangs or planar operation. Another interesting tangent is modules with compliant magnetic couplings, which permit loosely coupled aggregates in non-lattice formations [8-10]. A final promising option is the reversible fusion of surface material, as demonstrated in 3D modular robots [15].

The Smarticle platform, first presented in 2015 [2], breaks with these discrete on/off unit couplings by leveraging entanglement of non-convex modules. Entanglement is advantageous because 1) it does not require precise module-module alignment; 2) it does not require expensive, heavy, and bulky coupling mechanisms; and 3) although it cannot produce particular formations, it permits modules to couple loosely to generate materials and conglomerates with different shapes and with rheological and structural properties like viscosity, yield stress, and packing density. The Smarticles entangle simply by leveraging two rotating appendages, causing units to alternate between "U", "I", and, in some cases, "Z" shapes. In spite of their simplicity, Smarticle aggregates have been shown to move collectively when many are confined in a small space [11], and to perform phototaxis using only binary sensor inputs and physical interactions [12,16]. Further demonstrating their versatility, they have also been used for physics studies, e.g. pertaining to non-equilibrium ordering phenomena [3], novel phase dynamics in active matter [17], and biological model systems [18].

While the original inspiration for the Smarticles platform stems from 3D entanglement of "U"-shaped particles [19], all prior demonstrations have been restricted to planar operation. The reason is two-fold. Most importantly, the Smarticle 1.0 prototype does not have a profile that supports effective 3D entanglement: due to its aspect ratio, each module can only entangle with one other. Furthermore, that prototype was not optimized for operation in the large robot swarms which are needed for stochastic effects to average out. In this paper, we present a new design which enables full 3D entanglement and large-scale deployment, and may serve as an easily accessible platform to support novel studies on smart and active matter systems.
3 Electro-mechanical Design
Compared to other modular and swarming robots in the literature (Table 1), the Smarticles 2.0 have a relatively small form factor (200 × 20 × 17 mm³), very light weight (20 g), low cost (<71 USD per module), and short assembly time (∼15 min), making it feasible to acquire and operate many in traditional research labs. Equally important, the modules have an optimal aspect ratio, which is the key to 3D entanglement (l/w = 0.4–0.75 [19]). Smarticle 2.0 can translate between "I" and "U" shapes leveraging a non-backdrivable 4-bar crank and slider mechanism. Modules also have the ability to sense and communicate optically, acoustically, and mechanically. Finally, modules are powered by a Venom Fly 180 mAh 3.7 V Lithium Polymer battery which can support reasonably long experiments (up to 200 appendage cycles). The cost of a module breaks down as follows for a 15-piece order: the raw PCB costs 22.1 USD, the electronic components 25.53 USD, the motor 14.99 USD, and the battery 6.49 USD. (Note that the price of 3D printed parts is negligible, as they total 2.24 g of material, and that the module price will drop with bulk ordering.)

The crank and slider mechanism dominates the module design, volume, weight, and cost. It works by rotating a custom worm gear with two opposite thread angles, whose rotation is translated to linear motion through two gear racks that attach to raise or lower the appendages. To support this mechanism we chose a commercially available, affordable, strong, small, and lightweight motor; specifically the 700:1 Sub-Micro Plastic Planetary Gearmotor (6 mm diameter, 21 mm length, 1.3 g) from Pololu. Beyond the off-the-shelf motor and the battery, modules consist almost entirely of FR4 and polyimide material, commonly known as rigid and flexible printed circuit boards (PCBs). We order these pre-soldered by Sandnwave for ease; because these fabrication houses are located in China, we also circumvent long lead times due to the ongoing shipping crisis. Beyond the PCBs, the module consists of a 3D printed bearing and motor holder which are snap-fit into holes in the PCB. We use the Formlabs 2.0 and the Carbon 3D printers, with Tough1500 and UMA90 material respectively, to produce the bearing, motor holder, and worm gear. While the resolution of the prints is important, the material itself only affects how quickly the parts wear down.

Figure 2 shows the breakdown of the mechanical parts as well as the top and side view of a robot. The top of the PCB mounts the worm gear and motor pair, the bearing and motor holder, as well as the left and right sliders. The bearing and motor holder not only secure the worm gear and motor during operation, but also help guide the sliding motion of the gear racks. These parts are fitted on the center body, which is connected to the left and right appendages by 4 flexible PCB link pieces. The battery is mounted on the back of the module.
Table 1. Examples of self-re-configurable modular and swarm robotic platforms, characterized in terms of properties related to scalable fabrication and operation.

Platform            | Form factor [mm]         | Weight [g] | Cost [USD] | Assembly time [min]
Jasmine robots [20] | ∼30×30×30                | –          | 130        | –
Donuts [9]          | 46 (diameter), 70 (tall) | 25.4       | 587        | 48
Kilobots [21]       | 33 (diameter), 40 (tall) | –          | 14*        | 15
Smarticles 1.0 [2]  | 151×53×25                | –          | 40.8*      | –
Smarticles 2.0      | 200×20×17                | 20         | 71         | 15

– no information; * parts cost only
Fig. 2. Component overview. Mechanical Components: 1. Worm gear; 2. Planetary DC motor; 3. Motor bearing; 4. Motor holder; 5. Left slider; 6. Right slider; 7. Left appendage; 8. Flexible PCB link; 9. Center body; 10. Right appendage; 11. Battery. Electronics: 1. Light sensor; 2. IR encoders; 3. Motor current sensor; 4. Buzzer; 5. RGB LED; 6. Atmega328pb microcontroller; 7. Battery state sensor; 8. Accelerometer; 9. Microphone.
Each module is controlled by an 8-bit ATmega328pb microprocessor, making it compatible with the popular do-it-yourself (DIY) Arduino IDE programming framework. Modules have six sensing modalities. In terms of proprioception, they can measure their battery state, appendage load (motor current), appendage position through IR encoders, and orientation through a 3-axis accelerometer. In terms of exteroception, they can measure ambient light using an IR sensor, audio using a buzzer-microphone pair, and vibrations using the accelerometer. These sensing modalities complement each other: acoustic signals can penetrate an entangled collective, light can address only perimeter modules, and, because the robots are typically loosely entangled, mechanical vibrations transfer only between neighboring modules. To lower maintenance, the battery is connected to a coil embedded in the left appendage, such that the modules can be charged by loose placement on a wireless charger (Fig. 1B). The current charging circuit costs 85 USD and simultaneously charges 5 robots fully over 6 hrs. We have yet to optimize it for cost and ease of use, i.e. the amount of misalignment the charger can handle for easier placement.
4 Fabrication
Fig. 3. Smarticle 2.0 assembly steps.
Fabrication time and ease are both critical to support user adoption of swarm and modular robot platforms. All electronic components can be mounted on the rigid PCB either in-house or out-of-house. The latter requires less in-house expertise and labor, and is in fact economically advantageous due to the current shipping crisis.
Excluding component soldering and 3D printing of components, the in-house assembly takes less than 15 min per robot and requires little prior knowledge of soldering techniques. The fabrication process involves six main steps (Fig. 3):

1. Assemble slider pieces: To support accurate, rapid, and inexpensive assembly of the flexible and rigid PCBs that make up the sliders, we added solder pads such that they can be laminated together in a re-flow oven. This process enables the pieces to self-align due to the surface tension created by the solder paste. Depending on the size of the oven, a few dozen sliders can be produced in each batch. Alternatively, this step can be done with a carefully pointed heat gun.
2. Connect body and appendages: We connect the two appendages and the center body by soldering the four flexible link pieces onto each side. To assist in the alignment of the PCBs, we perform this step on a 3D printed holder.
3. Mount the actuator: We snap-fit the bearing and motor holder to the center body PCB, to hold together the two sliders and the worm gear.
4. Assemble crank-and-slider: We solder the sliders to the appendages.
5. Attach motor: We glue the motor into the motor holder and solder the motor connections onto the center body PCB.
6. Finalize: We plug in the battery and apply a thin layer of liquid electrical insulation paint on the surface of the robot.
5 Characterization
In this section we characterize essential properties of the Smarticle 2.0 modules, including their actuation profile (accuracy/longevity, lifting strength, and holding strength) and sensor specifications (acoustic, orientation).

First, to test the repeatability and durability of the mechanical components, a module was programmed to raise, pause, lower, and pause the appendages open-loop for 2 s, 3 s, 2 s, and 3 s, respectively. We repeated this behavior continuously for 67 min, amounting to more than 400 cycles. We found that even without sensor feedback, the appendage motion is fairly symmetrical and accurate within a standard deviation of less than 2°, and persists over many cycles. Specifically, the left and right appendage minimum angles were found to be 30.2 ± 1.0° and 35.8 ± 1.2°, respectively; the left and right appendage maximum angles were found to be 88.0 ± 0.9° and 85.4 ± 2.0°, respectively.

Second, we analyzed the appendage lifting force by combining the governing equations for the crank and slider mechanism [22] and the worm gear force translation [23]. The relevant parameters are shown in Fig. 4A. Geometrically, the distance from the end of the connecting rod to the crank axle, x, and the angle between the slider and the connecting rod, β, can be calculated as follows:

$$x = r\cos(\alpha) + \sqrt{l^2 - r^2\sin^2(\alpha)} \quad (1)$$

$$\beta = \cos^{-1}\left(\frac{x + r\cos(\pi - \alpha)}{l}\right) \quad (2)$$
where α is the angle of the crank, r is the rotating crank length, and l is the length of the connecting rod. The torque from the crank and slider on the appendage, τ, depends on F, the linear force pulling on the slider:

$$\tau = F\,r\sin(\alpha + \beta) \quad (3)$$

Conversely, the motor stall torque, τ_stall, relates to the maximum linear force the crank and slider mechanism is capable of delivering, estimated as:

$$F = \frac{2\tau_{stall}}{d_{mean}}\left(\frac{\pi d_{mean} - f\,p\sec(\zeta)}{p + \pi f\,d_{mean}\sec(\zeta)}\right) \quad (4)$$
Fig. 4. A) Left: Kinematic forces acting on the worm gear; Right: Single-appendage view of the crank and slider mechanism, with design parameters. B) Lifting force of a single appendage. The blue line is the theoretical prediction, red data points show experimentally obtained values. C) Alternative design with a stronger motor. (Color figure online)
where d_mean and p are the mean thread diameter and pitch of the worm gear, and ζ and f are the angle and amplitude of the normal force between the teeth and the grooves in the (FR4-based) rack, respectively. We now combine Eqs. 4 and 3 and iterate over all appendage angles to estimate the lifting force (Fig. 4B). Note that the model is inaccurate largely due to the loss of efficiency in the worm-rack transmission and friction between the sliders and center body PCBs. We partially make up for this inaccuracy by tuning the value of f to two sets of real data points, at low and high appendage angles, gathered from the real modules. We see that the mechanism is strongest (can lift the highest load) when the appendage is near vertical, and vice versa. Practically, this means that when the appendage is at a 90° angle, it can lift 3.42 modules placed 40 mm out from the center body PCB; at 45° it can lift 1.7 modules; and at 10° it can lift 0.33 modules.

The real strength of the chosen mechanism is that it is non-backdrivable. This feature allows robots to maintain a joint angle without energy consumption, independent of load size (provided that it does not exceed the breaking point). As a rule of thumb, this property holds for worm gears with a lead angle smaller than arctan(μ), where μ is the coefficient of friction. Estimating that μ is comparable to two glass plates sliding against each other (0.4), we get that the lead angle λ must be smaller than 21°, well above the 5.6° of our design. Practically, this means that the module's holding force is only limited by the bonding strength of the solder joints that attach the appendages to the main PCB. In other words, once in position, the module can hold a significant amount of weight without necessitating continuous power. In the spirit of keeping our robots intact, we have yet to perform destructive testing, but in preliminary tests we have loaded appendages with over 15 modules without mechanical failure.

The Smarticle 2.0 design is also compatible with other off-the-shelf motors such as the Micro Metal Gearmotors (cross-section 10 × 12 mm) from Pololu. Taking a Pololu micro motor with gear ratio 298:1 (stall torque 20 oz·in) as an example, it uses the same PCB design and only requires some adjustment of the 3D printed parts to fit the motor, as shown in Fig. 4C. Assembling a Smarticle 2.0 with this motor follows the same steps as the Plastic Planetary Gearmotor design and takes similar assembly time. The overall weight of such a Smarticle 2.0 is 25 g and its total cost is slightly higher at 72.47 USD. With this motor, when the appendage is at a 90° angle, it can lift 41.6 modules placed 40 mm out from the center body PCB; at a 10° angle, it can lift 2.04 modules. We will explore the functionalities of heterogeneous collectives in the future.
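As a sanity check on Eqs. (1)-(4), the short script below sketches how the appendage torque can be computed as a function of crank angle. All numerical values are illustrative assumptions chosen for readability, not the exact dimensions of the Smarticle 2.0 mechanism.

```python
import math

# Assumed, illustrative parameters (not the exact Smarticle 2.0 dimensions).
R_CRANK = 0.003          # rotating crank length r [m]
L_ROD = 0.008            # connecting rod length l [m]
TAU_STALL = 0.02         # motor stall torque [N*m]
D_MEAN = 0.004           # mean worm thread diameter d_mean [m]
PITCH = 0.001            # worm thread pitch p [m]
F_COEF = 0.4             # friction-related amplitude f, tuned to data in the paper
ZETA = math.radians(20)  # normal-force angle between teeth and rack grooves

def slider_geometry(alpha):
    """Eqs. (1)-(2): slider position x and rod angle beta at crank angle alpha."""
    x = R_CRANK * math.cos(alpha) + math.sqrt(L_ROD**2 - R_CRANK**2 * math.sin(alpha)**2)
    beta = math.acos((x + R_CRANK * math.cos(math.pi - alpha)) / L_ROD)
    return x, beta

def max_linear_force():
    """Eq. (4): maximum linear slider force available at motor stall torque."""
    sec_zeta = 1.0 / math.cos(ZETA)
    num = math.pi * D_MEAN - F_COEF * PITCH * sec_zeta
    den = PITCH + math.pi * F_COEF * D_MEAN * sec_zeta
    return (2 * TAU_STALL / D_MEAN) * (num / den)

def appendage_torque(alpha):
    """Eq. (3): torque delivered to the appendage by the crank and slider."""
    _, beta = slider_geometry(alpha)
    return max_linear_force() * R_CRANK * math.sin(alpha + beta)

for deg in (10, 45, 90):
    print(f"crank angle {deg:3d} deg -> torque {appendage_torque(math.radians(deg)):.4f} N*m")
```

As in Fig. 4B, the computed torque, and hence the liftable load, peaks when the appendage approaches the vertical position, where sin(α + β) is largest.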
Fig. 5. A) Characterization of accelerometer accuracy: mean and standard deviation of ten measurements at each rotation. B) Microphone-buzzer detection between two modules.
Finally, Fig. 5A-B shows a characterization of the onboard accelerometer and buzzer-microphone pair. We see that rotation around the accelerometer x-axis (roll) and y-axis (pitch) is accurately detected. We further see that, at 50% power, two modules can signal over a 300 mm range using the onboard microphone and buzzer pair, equivalent to 1.5 times the module body length. This value can be tuned by adjusting the average buzzer power (the pulse width) of the carrier signal.
6 Multi-robot Demonstrations
While this paper is focused mainly on describing the design of, and characterizing, individual modules, there are many avenues for extension of this work. In this section, we show exploratory multi-robot experiments to illustrate the platform's potential to support both active matter and robotic functionality in the future.
Fig. 6. A) Three-module tensile test at different appendage angles. B) Three-module compression tests with appendages located at 90◦ . C) Recorded audio signal of acoustic synchronization tests. 0–10 s shows the beeps from three synchronized modules; at 10 s an unsynchronized module is added to the collective; past 25 s all four modules have synchronized again. D) Active entanglement by hanging multiple modules. E) Before and after of disassembly test with 7 modules.
The Smarticle 2.0 platform can be used for both taskable robot and smart matter experiments. As a simple demonstration of this concept, Fig. 6A-B shows tensile and compression tests of 3 modules arranged in a 2D chain. Note that to calculate stress in Fig. 6A, we assume that the cross-sectional area of the "Smarticle 2.0 material" remains constant, i.e. only the force changes. The experiment shows how the material, similar to most hyperelastic polymers, first exhibits a roughly linear profile due to linkage deflection, then rapidly declines in stress because the modules start to slide against each other. As one would expect, sliding occurs earlier with lower appendage angles; i.e. the higher the appendage angle, the higher the elastic modulus. The compression test was limited to Smarticles with 90° appendages because, at lower angles, the center module would immediately start sliding. In the three repeated trials, we again see a roughly linear profile caused by link deflection, followed by a rapid decline in compression strength due to module sliding.

To demonstrate that, mechanically, the Smarticle 2.0 modules have a form factor and weight that permit entanglement, we attached a single module to a ruler and manually hung as many other modules off it as possible (Fig. 6D). In five repeated trials, we achieved 7, 8, 9, 8, and 9 modules. In the near future, we plan to repeat this experiment with active entanglement, i.e. by placing modules in a bin, having them actively entangle, and then pulling out globs of "Smarticle 2.0 material" with a stick, similar to past papers demonstrating entanglement of passive staples [19], fire ants, and worm blobs [24].

To demonstrate that a material composed of entangled Smarticle 2.0 modules can actively undergo a phase change, we manually assembled a glob of 7 modules with appendages at 90°, and programmed them to unfold upon hearing an acoustic signal in close proximity (Fig. 6E). In doing so, the glob went from a volume of 1853.8 cm³ to 3745.4 cm³, i.e. roughly a twofold expansion, similar to a phase change from solid to liquid.

As a building block for more complex and distributed coordination schemes, we demonstrated acoustic synchronization of four modules using the Kuramoto model [25]. Figure 6C shows how this works: we place a non-synchronized module next to three previously synchronized modules, while recording their buzzes with a separate module. In this case the modules synchronized over 9 cycles. In longer experiments where all four modules started unsynchronized, full synchronization took 17 cycles. These characteristics can be optimized with better coupling factors, but the general concept of module synchronization is promising for future work on consensus and collective motions.
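For readers unfamiliar with the Kuramoto model, the sketch below simulates the phase update underlying this kind of acoustic synchronization. The coupling gain, frequency, and update rate are assumptions for illustration, not the values used on the Smarticle 2.0 firmware.

```python
import math
import random

N = 4                 # number of buzzing modules
K = 0.8               # coupling strength (assumed)
OMEGA = 2 * math.pi   # common natural frequency, 1 Hz (assumed)
DT = 0.05             # update period [s] (assumed)

def kuramoto_step(phases):
    """One step: each oscillator's phase is pulled toward its neighbors' phases."""
    new = []
    for th_i in phases:
        coupling = sum(math.sin(th_j - th_i) for th_j in phases) / N
        new.append((th_i + DT * (OMEGA + K * coupling)) % (2 * math.pi))
    return new

phases = [random.uniform(0, 2 * math.pi) for _ in range(N)]
for _ in range(600):  # roughly 30 s of simulated time
    phases = kuramoto_step(phases)

# Order parameter r in [0, 1]: r close to 1 means the buzzes coincide.
r = abs(sum(complex(math.cos(t), math.sin(t)) for t in phases)) / N
print(f"order parameter after run: {r:.3f}")
```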
Fig. 7. A-B) Smarticles sequentially activate each other through tactile cues.
In Fig. 7A, we demonstrate how the onboard accelerometer can be used for tactile communication, i.e. by sensing changes in orientation. We manually poke a module, causing it to activate by flapping its appendages, which in turn causes the signal to propagate to other nearby robots. It is worth noting that the signal amplitude depends heavily on where along the center body PCB the modules connect, and that the flapping action can cause unwanted accelerometer readouts that self-activate the module. In Fig. 7B we therefore included a small inhibitory period, thereby avoiding self-activation and flapping the appendages only once upon external stimulus.
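A minimal sketch of such a tactile relay with a refractory window is shown below; the threshold and inhibitory duration are assumed values, and flap() stands in for the module's appendage routine.

```python
import time

TILT_THRESHOLD = 0.3  # change in sensed gravity vector treated as a poke (assumed)
INHIBIT_S = 2.0       # refractory period after flapping [s] (assumed)

last_flap = -float("inf")

def on_accel_sample(delta_tilt, flap):
    """Flap once per external stimulus; ignore stimuli in the refractory window."""
    global last_flap
    now = time.monotonic()
    if delta_tilt > TILT_THRESHOLD and now - last_flap > INHIBIT_S:
        last_flap = now
        flap()  # raise and lower the appendages, mechanically nudging neighbors
```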
7 Summary
We presented and characterized a module design optimized for scalable fabrication and operation by a range of scientific users, capable of 3D entanglement. This relatively new subfield of 3D entangled robots presents unique opportunities related to active matter and taskable modular robots, as evidenced by entangled social organisms in nature, but also unique research challenges related to communication and consensus algorithms, reconfiguration, and mobility. In future work, we aim to scale up the size of our collective and demonstrate more complex multi-agent behaviors in both hardware and simulation. The design files for these robots are available upon request from the authors.

Acknowledgements. This project was funded by NSF awards #1933284 and #2042411, and the Packard Fellowship for Science and Engineering. The authors would like to acknowledge many fruitful discussions with Prof. Goldman at the Georgia Institute of Technology.
References

1. Goldstein, S.C., Campbell, J.D., Mowry, T.C.: Programmable matter. Computer 38(6), 99–101 (2005)
2. Savoie, W., Pazouki, A., Negrut, D., Goldman, D.: Smarticles: design and construction of smart particles to aid discovery of principles of smart, active granular matter. In: The First International Symposium on Swarm Behavior and Bio-Inspired Robotics (2015)
3. Chvykov, P., et al.: Low rattling: a predictive principle for self-organization in active collectives. Science 371(6524), 90–95 (2021)
4. Gardi, G., Ceron, S., Wang, W., Petersen, K., Sitti, M.: Microrobot collectives with reconfigurable morphologies, behaviors, and functions. Nat. Commun. 13(1), 1–14 (2022)
5. Daudelin, J., Jing, G., Tosun, T., Yim, M., Kress-Gazit, H., Campbell, M.: An integrated system for perception-driven autonomy with modular robots. Sci. Robot. 3(23), eaat4983 (2018)
6. Gilpin, K., Rus, D.: Modular robot systems. IEEE Robot. Autom. Mag. 17(3), 38–55 (2010)
7. Haghighat, B., Droz, E., Martinoli, A.: Lily: a miniature floating robotic platform for programmable stochastic self-assembly. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1941–1948. IEEE (2015)
8. Ceron, S., Kimmel, M.A., Nilles, A., Petersen, K.: Soft robotic oscillators with strain-based coordination. IEEE Robot. Autom. Lett. 6(4), 7557–7563 (2021)
9. Wilson, N.J., Ceron, S., Horowitz, L., Petersen, K.: Scalable and robust fabrication, operation, and control of compliant modular robots. Front. Robot. AI 7, 44 (2020)
10. Li, S., et al.: Particle robotics based on statistical mechanics of loosely coupled components. Nature 567(7748), 361–365 (2019)
11. Warkentin, R., Savoie, W., Goldman, D.I.: Locomoting robots composed of immobile robots. In: 2018 Second IEEE International Conference on Robotic Computing (IRC), pp. 224–227. IEEE (2018)
12. Savoie, W., et al.: A robot made of robots: emergent transport and control of a smarticle ensemble. Sci. Robot. 4(34), eaax4316 (2019)
13. Nilles, A., Ceron, S., Napp, N., Petersen, K.: Strain-based consensus in soft, inflatable robots. In: 2022 IEEE 5th International Conference on Soft Robotics (RoboSoft), pp. 789–794. IEEE (2022)
14. Gilpin, K., Knaian, A., Rus, D.: Robot pebbles: one centimeter modules for programmable matter through self-disassembly. In: 2010 IEEE International Conference on Robotics and Automation, pp. 2485–2492. IEEE (2010)
15. Swissler, P., Rubenstein, M.: FireAnt3D: a 3D self-climbing robot towards non-latticed robotic self-assembly. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3340–3347. IEEE (2020)
16. Savoie, W., et al.: Phototactic supersmarticles. Artif. Life Robot. 23(4), 459–468 (2018)
17. Zhou, W., Gravish, N.: Density dependent synchronization in contact-coupled oscillators. arXiv preprint arXiv:2012.07124 (2020)
18. Ozkan-Aydin, Y., Goldman, D.I., Bhamla, M.S.: Collective dynamics in entangled worm and robot blobs. Proc. Natl. Acad. Sci. 118(6), e2010542118 (2021)
19. Gravish, N., Franklin, S.V., Hu, D.L., Goldman, D.I.: Entangled granular media. Phys. Rev. Lett. 108(20), 208001 (2012)
20. Woern, H., Szymanski, M., Seyfried, J.: The I-SWARM project. In: ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication, pp. 492–496. IEEE (2006)
21. Rubenstein, M., Ahler, C., Nagpal, R.: Kilobot: a low cost scalable robot system for collective behaviors. In: 2012 IEEE International Conference on Robotics and Automation, pp. 3293–3298. IEEE (2012)
22. Hartenberg, R., Denavit, J.: Kinematic Synthesis of Linkages. McGraw-Hill, New York (1964)
23. Budynas, R., Nisbett, K.: Shigley's Mechanical Engineering Design, 11th edn. in SI Units. McGraw-Hill Education (Asia) (2020)
24. Hu, D., Phonekeo, S., Altshuler, E., Brochard-Wyart, F.: Entangled active matter: from cells to ants. Eur. Phys. J. Spec. Top. 225(4), 629–649 (2016)
25. Acebrón, J.A., Bonilla, L.L., Vicente, C.J.P., Ritort, F., Spigler, R.: The Kuramoto model: a simple paradigm for synchronization phenomena. Rev. Mod. Phys. 77(1), 137 (2005)
Hybrid Flock - Formation Control Algorithms

Cyrill Baumann, Jonas Perolini, Emna Tourki, and Alcherio Martinoli

Distributed Intelligent Systems and Algorithms Laboratory, School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
{cyrill.baumann,jonas.perolini,emna.tourki,alcherio.martinoli}@epfl.ch
Abstract. Two prominent categories of algorithms for achieving coordinated multi-robot displacement are flocking and navigation in formation. Both categories have their own body of literature and characteristics, including their respective advantages and disadvantages. Although typically they are treated separately, we believe that a combination of flock and formation control represents a promising algorithmic solution. Such an algorithm could take advantage of a combination of characteristics of both categories that are best suited for a given situation. Therefore, in this work, we propose two distributed algorithms capable of gradually and reversibly shifting between flocking and formation behaviors using a single parameter W. We evaluate them using both simulated and real robots with and without the presence of obstacles. We find that both algorithms successfully trade off flock density for formation error. Furthermore, leveraging a narrow passage experiment as an application case study, we demonstrate that an adaptive shift between flock and formation behavior, adopting a simple method to define W in real-time using exclusively on-board resources, results in a statistically relevant reduction of traversing time in comparison to a non-adaptive formation control algorithm.

Keywords: flocking · formation control · multi-robot coordination

1 Introduction
In recent years, Multi-Robot Systems (MRS) have gained importance, as coordinated teams are able to perform tasks which would be unfeasible for single robots. Additionally, their inherent redundancy introduces robustness to the system, allowing it to successfully complete the overall mission objectives even when some individual agents fail. Spatial coordination of MRS is crucial in a wide range of applications, ranging from search and rescue missions over surveillance or escorting missions to spacecraft formations. Existing algorithms that tackle this challenge can be split into two main categories: those that manage tight spatial topologies, most commonly referred to as formation control, and those that manage loose ones, usually called flocking.
Flocking has its origin in nature, where such behavior achieves multiple functionalities (e.g., energy benefits, navigation accuracy, protection from predators) and is present in all media (e.g., schools of fish and flocks of birds). This behavior was first engineered in [1] for the purpose of modeling the behavior of birds, fish, etc. in computer animation. It has since been improved upon, but the main principles have remained the same. The behavior of each individual agent can be expressed as the vector sum of three complementary components: separation, cohesion, and velocity matching. The result is a swarm-like behavior with (in steady state) almost equal distances among the flock members, but without any precise spatial formation. Additionally, formal methods for designing flocking behavior have also been proposed, for example in [2], where three different kinds of lattices are leveraged to form triangle-like flocks, allowing for splitting and rejoining, as well as squeezing maneuvers to avoid obstacles.

Formation control is an actively studied topic with a large amount of prior work, including recent surveys such as [3]. There exist three main approaches for addressing formation control: Leader-Follower (LF), Virtual Structure (VS), and Behavior-Based (BB) methods. In the LF approach, one agent is designated as leader. All other agents, called followers, are then tasked with maintaining a predefined relative position to the leader, resulting in formation shape retention [4]. The VS approach [5] consists of three steps. First, the virtual structure (corresponding to the predefined formation) is aligned to the current positions of the agents in order to minimize the distance from every agent to its corresponding spot within the formation. Then the virtual structure is moved in the desired direction of movement. In the third step, the agents are controlled towards their corresponding positions in the shifted virtual structure. The BB approach is based on defining several desired behaviors for every agent. The control is then either a weighted average of the resulting motion vectors created by the individual behaviors [6], a Finite State Machine (FSM) switching between them [7], or a more elaborate combination of behaviors [8]. A BB approach results in a collective response that is fairly adaptive to different environmental situations. While this represents the main advantage of this approach, one major drawback is that the resulting spatial topology emerges from local interactions and is therefore difficult to control precisely.

In multiple related works, formation control has subsequently been "loosened" up, usually to account for obstacle avoidance. In [9] the formation is capable of splitting and rejoining. In [10] and [11] dynamical formation changes are leveraged. In [12] a potential field augmented with formation transformation is used. Deformable formations are also leveraged in [13]. On the flocking side, it is worth noting that, in comparison to flocking based on Reynolds' rules, flocking based on lattices (as in [2]) can be considered significantly more rigid and thus closer to formation control. However, the above approaches all remain faithful to their respective categories of algorithms. More specifically, to the best of our knowledge, no work exists which is capable of shifting gradually from flocking to formation control or vice-versa.
Yet, this gradual shift would enable a smooth transition between the characteristics of both types of algorithm, with an infinite number of intermediate states in between. Such a continuous change could then be leveraged in numerous scenarios. For example, it could be used to optimize multi-robot navigation by trading off formation precision versus the ability to traverse narrow passages or areas with obstacles. As the shift between formation control and flocking would only be performed to the degree required, this would result in reduced change of desired positions and could thus even reduce the overall energy consumption.

In this work, we propose and experimentally validate two such algorithms, designed to gradually shift between a flock and a formation behavior. For both algorithms proposed, the formation control is based on a graph-based, Laplacian control algorithm [14], which can be seen as part of the VS category. The corresponding flocking algorithm used differs between the two proposed solutions: Pot2Lapl leverages a potential-based flocking algorithm based on [2], while Flock2Lapl uses the original Reynolds rules [1], more specifically the version introduced in [15].

The remainder of this paper is structured as follows: we start by introducing the two algorithms Flock2Lapl and Pot2Lapl. We then present our two experimental evaluation setups, and report and discuss the algorithms' performances, before ending with some conclusive remarks.
2 Methodology
We consider a homogeneous group of N differential wheeled robots, tasked with following a predefined trajectory while forming and maintaining spatial coordination between them. This spatial coordination varies from flock to formation control depending on the scalar W. The robot control is designed following the framework introduced and described in detail in [16]. The control algorithm consists of the following seven functional elements:

1. Pose estimation: each robot is equipped with both a local range-and-bearing system and a receiver exploiting data from an emulated Global Navigation Satellite System (GNSS). Global communication is then used to transmit (and receive) GNSS-based position data. This effectively results in a global, occlusion-less sensing of neighboring robots. All information is fused using an Extended Kalman Filter.
2. Consensus on orientation: the formation orientation is not established a priori but obtained online using Laplacian consensus.
3. Position allocation: a Hungarian algorithm [17] is used to assign each robot the optimal position within the formation (minimizing the overall distances between the robots and their respective target positions).
4. Coordination algorithm: the estimated relative positions of neighboring robots (X_j with j ∈ [1, N]) are used by both coordination algorithms. They are specified in Sects. 2.1 and 2.2, respectively.
5. Group movement: the common goal is given as a temporal trajectory in a global reference frame.
6. Collision avoidance: the commonly used Braitenberg algorithm is applied, using the modifications introduced in [16] to cast it in the motion vector framework.
7. Robot control: a single integrator control law that includes a damping term as in [16] is used.

It is further worth noting that the motion vectors of elements 4, 5 and 6 (X^coord, X^traj and X^col, respectively) are summed to obtain the final motion vector.

Both coordination algorithms Pot2Lapl and Flock2Lapl use, as the formation control part of their respective algorithms, a graph-based, Laplacian control algorithm. Using the standard feedback method for a single integrator kinematic model [18], we obtain for robot i:

$$\dot{z}_i = \frac{1}{N}\sum_{j=1}^{N}\left(-L_{i,j}\,z_{i,j} + b_{i,j}\right) \quad (1)$$

with L the Laplacian matrix based on the underlying connectivity graph (which is fully connected in this work), z_{i,j} the position of robot j and b_{i,j} the desired position of robot j, both defined in the local coordinates of robot i. b_{i,j} can be calculated from the desired positions in global coordinates B_N for a given number of robots N. Following the reasoning in [16], we can express the motion vector of the Laplacian control algorithm as:

$$X_i^{lapl} = -\dot{z}_i\,dt \quad (2)$$
with B_N, dt numerical parameters. The numerical values used in this work are summarized in Table 1. The resulting V-shape formation for N = 5 is depicted in Fig. 3b.
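To make the Laplacian controller of Eqs. (1)-(2) concrete, the sketch below iterates it in a global frame; note that the paper expresses z and b in each robot's local frame, so this is a simplified illustration under that assumption rather than the authors' implementation.

```python
import numpy as np

N = 5
DT = 0.05  # integration step, matching dt = 50 ms in Table 1

# Desired shape B5 from Table 1 [m].
B = np.array([[0.0, 0.0], [0.0, 1.2], [0.4, 0.3], [0.4, 0.9], [0.8, 0.6]])
L = N * np.eye(N) - np.ones((N, N))  # Laplacian of a fully connected graph

def laplacian_motion_vectors(Z):
    """One consensus step: drives Z - B toward a common offset, i.e. the
    group converges to the shape B up to a rigid translation."""
    Z_dot = -(L @ (Z - B)) / N
    return Z_dot * DT

Z = np.random.uniform(0.0, 2.0, size=(N, 2))  # random initial positions
for _ in range(2000):
    Z += laplacian_motion_vectors(Z)

# Residual deviation from the target shape (up to translation) should be ~0.
E = (Z - B) - (Z - B).mean(axis=0)
print("residual shape error:", np.linalg.norm(E))
```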
2.1 Pot2Lapl
Intuitively, Pot2Lapl corresponds to a potential-based flocking algorithm (cf. [2]), with an additional attraction towards holonomic, massless virtual agents following Laplacian control. Varying a single attractive weight W towards the virtual agents then allows the group of robots to gradually shift from pure flocking (i.e. the virtual agents are ignored) to pure formation control (i.e. the attraction to the virtual agents dominates flocking). More specifically, the target motion vector X^coord is given by

$$X_i^{coord} = \sum_{\substack{j=1 \\ j \neq i}}^{N} \frac{d\psi}{dz}\left(\|z_{i,j}\| - d_{ref}\right)\,\frac{z_{i,j}}{\|z_{i,j}\|} + W X_i^{virt} \quad (3)$$
with ψ the potential function as defined below, d_ref the desired inter-robot distance, and X_i^virt the motion vector towards the virtual agent associated with robot i, which follows a Laplacian-based control. That is, X_i^virt = X_i^lapl.
To define ψ, we first define a base potential function ψ_base(z) similar to [2]:

$$\psi_{base}(z) = \frac{a+b}{2}\left(\sqrt{1+(z+c)^2} - \sqrt{1+c^2}\right) + \frac{a-b}{2}\,z \quad (4)$$

using $c = \frac{|a-b|}{2\sqrt{ab}}$ and a, b numerical parameters.

Similarly to [2], we further define a cut-off function ρ(z) in order to limit the influence of distant neighbors:

$$\rho(z) = \begin{cases} 0, & \text{if } z < \frac{d_{ref}}{r_{cut}} + \delta \\[4pt] 1 - \frac{1}{2}\left(1 + \cos\left(\pi\,\frac{z - \left(\frac{d_{ref}}{r_{cut}} + \delta\right)}{1 - \frac{d_{ref}}{r_{cut}} - \delta}\right)\right), & \text{if } \frac{d_{ref}}{r_{cut}} + \delta \le z \le 1 \\[4pt] 1, & \text{otherwise} \end{cases} \quad (5)$$

Additionally, we introduce a piece-wise constant function B(z) to increase the repulsive effect of the potential function:

$$B(z) = \begin{cases} K_{rep}, & \text{if } z < 0 \\ K_{att}, & \text{otherwise} \end{cases} \quad (6)$$

We can then define a piece-wise derivable, cut-off potential function ψ:

$$\psi(z) = B(z)\left[\psi_{base}(z) - \rho\!\left(\frac{z + d_{ref}}{r_{cut}}\right)\left(\psi_{base}(z) - \psi_{base}(r_{cut})\right)\right] \quad (7)$$

with d_ref, δ, K_rep, K_att and r_cut numerical parameters. Figure 1 shows the resulting ψ(z − d_ref) and the corresponding gradient dψ/dz(z − d_ref) with the numerical parameters used in this work.
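As a numerical illustration of Eqs. (4)-(7), a minimal sketch using the Table 1 parameters is given below; the gradient is taken numerically here for brevity, whereas a closed form could of course be derived.

```python
import numpy as np

# Parameters a, b, d_ref, delta, r_cut, K_rep, K_att from Table 1.
A_P, B_P = 1.0, 7.0
D_REF, DELTA, R_CUT = 0.4, 0.0, 2.0
K_REP, K_ATT = 10.0, 0.9
C = abs(A_P - B_P) / (2 * np.sqrt(A_P * B_P))

def psi_base(z):
    """Eq. (4): smooth, uneven sigmoidal base potential."""
    return (A_P + B_P) / 2 * (np.sqrt(1 + (z + C) ** 2) - np.sqrt(1 + C ** 2)) \
        + (A_P - B_P) / 2 * z

def rho(z):
    """Eq. (5): cut-off ramp from 0 to 1 that mutes distant neighbors."""
    lo = D_REF / R_CUT + DELTA
    if z < lo:
        return 0.0
    if z <= 1.0:
        return 1 - 0.5 * (1 + np.cos(np.pi * (z - lo) / (1 - lo)))
    return 1.0

def psi(z):
    """Eq. (7), with Eq. (6) boosting repulsion for z < 0."""
    gain = K_REP if z < 0 else K_ATT
    return gain * (psi_base(z) - rho((z + D_REF) / R_CUT) * (psi_base(z) - psi_base(R_CUT)))

def dpsi_dz(z, h=1e-6):
    """Numerical gradient, as used in the Eq. (3) motion vector."""
    return (psi(z + h) - psi(z - h)) / (2 * h)

for d in (0.2, 0.4, 0.8, 1.6):  # inter-robot distances [m]
    print(f"z = {d:.1f} m -> dpsi/dz(z - d_ref) = {dpsi_dz(d - D_REF):+.3f}")
```

Consistent with Fig. 1, the gradient is strongly negative (repulsive) below d_ref, zero at d_ref, and fades out as the cut-off mutes distant neighbors.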
Fig. 1. Potential function ψ and the corresponding gradient dψ/dz(z − d_ref), with the parameters used in this work, as a function of the inter-robot distance z.
2.2 Flock2Lapl
Flock2Lapl is based on the original Reynolds rules [1]. Intuitively, it uses nonlinear cohesion and separation vectors, extended with an attraction to a virtual agent following a Laplacian formation control. Formally, we can write the target motion vector X^coord of robot i as

$$X_i^{coord} = K_{coh} X_i^{coh} + K_{sep} X_i^{sep} + W X_i^{virt} \quad (8)$$

using again X_i^virt = X_i^lapl, with the cohesion vector defined as

$$X_i^{coh} = \frac{1}{N}\sum_{j=1}^{N} z_{i,j} \quad (9)$$

and the separation vector defined as

$$X_i^{sep} = \frac{1}{N}\sum_{j=1}^{N} \begin{cases} \dfrac{-z_{i,j}}{\|z_{i,j}\|^2}, & \text{if } \|z_{i,j}\| < s \\ 0, & \text{otherwise} \end{cases} \quad (10)$$

with K_coh, K_sep and s numerical parameters.
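The following sketch assembles the Flock2Lapl motion vector of Eqs. (8)-(10) for one robot with the Table 1 gains; the neighbor positions and the virtual-agent vector are illustrative inputs.

```python
import numpy as np

K_COH, K_SEP, S = 1.0, 0.15, 0.45  # gains and separation radius from Table 1
W = 2.0  # hybridization weight; 0 = pure flocking, large = formation dominates

def flock2lapl(z_rel, x_virt, w=W):
    """z_rel: (N-1) x 2 relative positions of neighbors in the robot's frame;
    x_virt: motion vector toward the robot's virtual Laplacian agent."""
    n = len(z_rel) + 1
    cohesion = z_rel.sum(axis=0) / n                           # Eq. (9)
    dist = np.linalg.norm(z_rel, axis=1)
    close = dist < S
    separation = np.zeros(2)
    if close.any():                                            # Eq. (10)
        separation = (-z_rel[close] / dist[close, None] ** 2).sum(axis=0) / n
    return K_COH * cohesion + K_SEP * separation + w * x_virt  # Eq. (8)

# Example: two neighbors, one closer than s; virtual agent pulls to the right.
z = np.array([[0.3, 0.0], [0.0, 0.6]])
print(flock2lapl(z, x_virt=np.array([0.05, 0.0])))
```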
Table 1. Numerical parameters used in this work, listed in order of appearance.

Parameter | Value
B5        | {(0, 0), (0, 1.2), (0.4, 0.3), (0.4, 0.9), (0.8, 0.6)} [m]
B10       | {(0.8, 0), (0.8, 0.4), (0.8, 0.8), (0.8, 1.2), (1.2, 0.2), (1.2, 0.6), (1.2, 1), (1.6, 0.4), (1.6, 0.8), (2, 0.6)} [m]
dt        | 50 [ms]
a         | 1 [-]
b         | 7 [-]
dref      | 0.4 [m]
δ         | 0 [m]
rcut      | 2 [m]
Krep      | 10 [-]
Katt      | 0.9 [-]
Kcoh      | 1 [-]
Ksep      | 0.15 [-]
s         | 0.45 [m]
rs        | 0.3 [m]
W0        | 4 [-]
3 Experimental Evaluation
This section introduces the robotic platform used to demonstrate the algorithms, the experimental setup for both simulated and real robot experiments, as well as the evaluation metrics used.

3.1 Robotic Platform
All experiments were conducted using both simulated and real Khepera IV robots [19]. Khepera IV robots are differentially driven vehicles with a diameter of 14 cm and a maximum speed of ∼81 cm/s. They are further equipped with eight infrared (IR) proximity sensors with a range of 20 cm that are used for obstacle avoidance. As illustrated in Fig. 2, the robots are enhanced with a custom range-and-bearing module [20] for relative localization and an active marker module featuring two LEDs that allow accurate tracking through the SwisTrack software [21]. Communication between the robots leverages UDP/IP and is assumed to be error-free and to have a range larger than the experimental arena.
Fig. 2. Khepera IV robot with a custom range-and-bearing module as well as a tracking module. In this work, neither the camera nor the US sensors are used; they are thus not further mentioned.
3.2 Evaluation Metrics
The performance of the coordination algorithms is evaluated using four metrics, computed at every temporal instant t given the robots' positions Z_i with i ∈ [1, N] in a global reference frame.

– Φ_density evaluates the density of the group with respect to an ideal, lattice-like flock with distance d_ref between all robots. Based on circle packing, it compares A_r, the enclosing circle of all robots, with A_o, the optimal enclosing circle for the given number of robots:

$$\Phi_{density} = \frac{1}{1 + \frac{1}{N}\|A_o - A_r\|} \quad (11)$$

– Φ_form captures the error between the positions of the robots and their respective attributed positions (Z_A) within the ideal formation (of shape as defined by B_N), centered on Z_F and using optimal position assignments:

$$\Phi_{form} = \max_{Z_F} \frac{1}{1 + \frac{1}{N}\sum_{i=1}^{N}\left\|Z_i - (Z_A \mid Z_F)\right\|} \quad (12)$$

– Φ_order (adopted from [15]) evaluates the heading alignment of the robots:

$$\Phi_{order} = \frac{1}{N(N-1)}\sum_{i \neq j} \frac{\dot{Z}_i \cdot \dot{Z}_j}{\|\dot{Z}_i\|\,\|\dot{Z}_j\|} \quad (13)$$

– Φ_safe (adopted from [15]) measures the risk of collision among robots:

$$\Phi_{safe} = 1 - \frac{n_s}{N(N-1)} \quad (14)$$

with n_s the number of occurrences where two robots are "too close", that is, closer than r_s, a numerical parameter: n_s = |{(i, j) s.t. j ≠ i and ‖Z_i − Z_j‖ < r_s}|.

We further denote by Φ̄ the respective averages over the entire duration of the experiment.
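A compact sketch of the two simplest metrics, Eqs. (13)-(14), is given below; it assumes global positions and velocities are available, as they are from the tracking setup.

```python
import numpy as np

R_S = 0.3  # safety distance r_s [m], from Table 1

def phi_order(Z_dot):
    """Eq. (13): mean pairwise heading alignment; 1 = perfectly aligned."""
    n = len(Z_dot)
    U = Z_dot / np.linalg.norm(Z_dot, axis=1, keepdims=True)
    dots = U @ U.T
    return (dots.sum() - n) / (n * (n - 1))  # exclude the i == j diagonal

def phi_safe(Z):
    """Eq. (14): fraction of ordered robot pairs farther apart than r_s."""
    n = len(Z)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    ns = (D < R_S).sum() - n  # count ordered pairs; remove the zero diagonal
    return 1 - ns / (n * (n - 1))

Z = np.array([[0.0, 0.0], [0.5, 0.0], [0.2, 0.4], [0.7, 0.5], [0.4, 0.9]])
V = np.tile([1.0, 0.2], (5, 1))  # all robots heading roughly the same way
print(phi_order(V), phi_safe(Z))
```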
3.3 Experimental Setup
Webots [22], an open-source, high-fidelity robotics simulation platform, has been used for the simulated experiments. All sensors and actuators of the simulated robots were calibrated to match those of the real robots. For the physical experiments, SwisTrack [21] is used in combination with a GigE color camera both to track the robots in order to calculate the evaluation metrics, and to emulate absolute localization such as that provided by a GNSS module in indoor settings. SwisTrack is calibrated using the algorithm proposed by Tsai et al. [23], which results in a spatial error of 1.06 ± 0.65 cm and a negligible temporal error of less than 1 mm amplitude. Emulated GNSS information is sent to the robots via UDP/IP at a rate of 1 Hz.

3.4 Characterization Experiment
Characterization experiments are conducted similarly in simulation and reality: five Khepera IV robots are randomly placed in a 3 × 2 m² arena, as depicted in Fig. 3a. Using the algorithmic framework described in Sect. 2, both algorithms are then evaluated for different parameters, as summarized in Table 2. It should be noted that the main purpose of these experiments is to evaluate the performance of the algorithms as a function of W, both in simulation and in reality, and both in ideal and more realistic conditions, that is, without and with the presence of obstacles, respectively. An additional set of experiments involving ten robots without obstacles has been added in simulation to give additional insights into the scalability of the algorithms. However, for a complete scalability analysis, further, more systematic experiments will be required in future work.
An 8-shaped trajectory has been chosen as the group movement for all experiments. For the experiments with obstacles, six square obstacles of 20 × 20 cm are placed arbitrarily in the arena. We note that the positions of the obstacles are kept constant over all experiments to increase comparability among them. Figure 3a shows the corresponding obstacle positions used for both real and simulated experiments.
Fig. 3. Experimental arenas used in this work. The required V-shaped formation is identical for both sets of experiments.

Table 2. Experimental conditions evaluated in this work

Environment               | W            | N
Webots without obstacles  | [0, 1, 2, 5] | [5, 10]
Webots with obstacles     | [0, 1, 2, 5] | 5
Real robots               | [0, 1, 2, 5] | 5

3.5 Narrow Passage Experiment
A narrow passage experiment, similar to the one in [16], is used as a case study to demonstrate the performance of the algorithms. Five robots are placed randomly within a starting zone (see Fig. 3b). The robots are then tasked to pass through the narrow passage and cross the finish line, while maintaining spatial coordination. The migration urge is given as a temporal trajectory in the global reference frame. Real robot experiments were repeated 10 times, simulated ones 20 times. In addition to the four metrics introduced previously, we record for every experimental run the time taken until the last robot crosses the finish line. In contrast to the characterization experiments of Sect. 3.4, W is calculated dynamically on-board for every robot and timestep:

$$W = \max(W_0 - N_{act},\, 0)$$
with W_0 a numerical parameter corresponding to the desired "rigidness" of the formation, and N_act the number of infrared sensors detecting an obstacle, that is, returning a sensor reading above a given threshold (20% of the maximum value in this work). The performance of this "adaptive" algorithm is compared to a "non-adaptive" version which uses W = W_0.
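A minimal sketch of this adaptive weight is shown below; W_0 and the 20% threshold follow the text, while the full-scale IR reading is an assumed value for illustration.

```python
W0 = 4                      # desired formation rigidness, from Table 1
IR_MAX = 1023               # assumed full-scale IR proximity reading
THRESHOLD = 0.2 * IR_MAX    # 20% of the maximum value, as in the text

def adaptive_w(ir_readings):
    """W = max(W0 - N_act, 0): soften the formation as obstacles crowd in."""
    n_act = sum(1 for r in ir_readings if r > THRESHOLD)
    return max(W0 - n_act, 0)

print(adaptive_w([50, 300, 700, 10, 0, 0, 0, 90]))  # two active sensors -> W = 2
```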
4 Results and Discussion
Figure 4 shows the resulting performance of the Flock2Lapl algorithm on the four metrics presented before, for different W. As intended, with increasing W, Φ̄_form increases and Φ̄_density decreases. It is also worth noting that Φ̄_safe and Φ̄_order are not affected by W.
Fig. 4. Experimental results for the Flock2Lapl algorithm, using both 5 and 10 robots in simulation without obstacles (WB), using 5 robots in simulation with obstacles (WB) and using 5 robots in reality without obstacles (RR). Lines show mean values with the shaded areas representing standard deviation.
Figure 5 shows the corresponding performances for the Pot2Lapl algorithm, which are similar to the results of Flock2Lapl. Figures 4 and 5 further show the resulting performance of both algorithms for 10 robots. The most noticeable impact of this increase in N can be observed for Φ̄_form at low W, that is, when almost purely in the flocking configuration. This can be explained by the fact that, for 10 robots, the difference between a V-shaped formation and a flock is larger than for 5 robots.
Fig. 5. Experimental results for the Pot2Lapl algorithm, using both 5 and 10 robots in simulation without obstacles (WB), using 5 robots in simulation with obstacles (WB) and using 5 robots in reality without obstacles (RR). Lines show mean values with the shaded areas representing standard deviation.
Fig. 6. Experimental results for the narrow passage experiment using five robots in simulation (WB) and reality (RR).
Figure 6 shows the results of the narrow passage experiments. For all metrics, a one-way ANOVA revealed a statistically significant difference in the respective metric between at least two groups. However, Tukey's HSD test for multiple comparisons found that the mean value of the metrics differs only between the simulated and real experiments, with the following exceptions: for Flock2Lapl, statistical differences (p < 0.05) can be observed for Φ̄_form in real and Φ̄_order in simulated experiments. For Pot2Lapl, Φ̄_density is statistically different in real experiments (p < 0.01). For both algorithms, the traversing time is statistically different in both simulated and real experiments (p < 0.05). This indicates that our proposed algorithms successfully decreased traversing time, demonstrating one possible use case. We note that the impact on Φ̄_form is limited by the averaging over the entire experiment. However, given a more sensitive metric, we would expect a trade-off between traversing time and Φ̄_form.

For all metrics in Fig. 6, relatively high standard deviations can be observed for both simulated and real experiments. This is due to the high amount of stochasticity present, mostly introduced by the random initialization of the robots within the starting area, but also by sensor noise. We also acknowledge that, since this work focuses on the coordination algorithms, the sensing capabilities of the robots are maximized. Naturally, further work will be required to analyze the performance of the algorithms in more realistic situations considering exclusively local sensing affected by occlusions.
5 Conclusion
In this work we introduced two hybrid flock - formation control algorithms. By varying a single parameter W, both algorithms are able to change gradually from a pure flocking behavior to a formation behavior and vice-versa. We then characterized the behavior of the two algorithms using four dedicated metrics for different numerical values of W in both simulated and real robot experiments. We further introduced a simple method to compute W on-board, based on the number of infrared sensors that detect an obstacle. By comparing this adaptive algorithm to a non-adaptive formation control algorithm for a narrow passage scenario, we find that the adaptive algorithm achieves a significant traversing time reduction.

Acknowledgements. We would like to thank Masaki Haruna, Masahiko Kurishige and Tomoki Emmei of the Advanced Technology R&D Center, Mitsubishi Electric Corporation, Amagasaki City, Japan for the fruitful technical discussions all along this work. This work is financially supported by Mitsubishi Electric Corporation, Japan.
Appendix. A video of the experiments performed for this contribution can be found on our research web page: www.epfl.ch/labs/disal/research/mrsmodelingcontrol.
Multi-Robot Systems Research: A Data-Driven Trend Analysis

João V. Amorim Marques¹,², María-Teresa Lorente¹,³, and Roderich Groß¹

¹ Sheffield Robotics, The University of Sheffield, Sheffield, UK
[email protected]
² INESC Science and Technology, Faculty of Engineering, University of Porto, Porto, Portugal
³ Instituto Tecnológico de Aragón, Zaragoza, Spain

(All authors contributed equally to this work.)
Abstract. This paper provides a data-driven analysis of the trends of research on multi-robot systems (MRS). First, it reports the findings of an exhaustive search of the MRS studies published from 2010 to 2020 in 27 leading robotics journals, including a quantitative analysis of trends. Second, it reports the findings of a survey capturing the views of 68 leading experts in the field of MRSs. Finally, it summarises the findings.
1 Introduction
In the late 1940s, Walter [1] built two autonomous robot tortoises that displayed not only impressive individual capabilities, including moving towards a light source or into a station for self-charging, but also collective capabilities, described by the author as "mutual recognition" or forming a "sort of community". These robots are one of the first physical realisations of multi-robot systems (MRSs). Despite significant developments, especially over the past 35 years, MRSs are only beginning to become commonplace in real-world applications. Numerous surveys have been published on the topic of MRSs [2–5], and the reader is referred to these for a comprehensive overview of the field. This paper seeks to complement previous studies by providing the first data-driven analysis of the trends in the field. It reports: 1. the quantitative findings of an exhaustive search of the MRS studies published in leading robotics journals (between 2010 and 2020); and 2. an analysis of the views of leading MRS experts. The findings may provide insights to guide future research, for example, by determining areas of development that require significant effort and the short-term to long-term prospects for specific applications.
2 Exhaustive Search
This section presents the methods and findings from the exhaustive search.
2.1 Methodology
We use robot to refer to a machine that perceives, and acts in, an environment in physically plausible ways. The environment can be understood as the set of characteristics that define the space in which the robot resides. Studies employing abstract environment/robot models are considered within scope where clearly of relevance to real-world MRSs. For the purpose of the exhaustive search, we restrict the scope to systems composed of mobile robots, that is, robots that are able to change their position and orientation in their environment. A multi-robot system can be defined as a group of robots that operate in the same environment while exhibiting some form of interaction. When part of an MRS, robots usually work collaboratively towards a common goal. This allows them to address a more complex task, or perform a task more efficiently, than when on their own. MRSs may have different levels of autonomy when deployed, from being teleoperated to being fully autonomous. We use an exhaustive search to target all documents published in the years 2010–2020 in 27 of the 28 "robotics" journals that are included in the 2020 Journal Citations Report of the Web of Science Group.¹ The documents include original research articles, perspectives, and surveys, but exclude editorials, corrections and other communications. Conference papers, although valuable to the robotics field, were not considered due to the sheer volume of works and the lack of clear criteria for which conferences to consider. For each document, we determine whether it is within scope. Where a document is not within scope, no further analysis is performed. Each document within scope is first rated in terms of (i) the originality (up to 6 points) of the work that is reported, (ii) its rigour (up to 4 points) and (iii) its significance (up to 5 points). The individual ratings are then added to produce a relevance score. Although detailed criteria and training are used to guide the assessors (two of the authors) and improve consistency in evaluation, the ratings are ultimately subjective. To reduce the impact of possible assessor bias (i.e. one assessor tending to provide higher values than the other), samples of documents were marked by both of the assessors—once at the beginning and twice during the exhaustive search. From the results of these double-marking rounds, a consistent but small bias can be observed. To counter this bias, the relevance scores that we report have been adjusted by −0.15 and +0.20, respectively, such that the top 25% of documents within scope for either assessor have a score of 6.00 or above. In the following, these works are referred to as highly relevant.
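As an aside, the scoring arithmetic above can be made concrete with a minimal sketch; the ratings and the assignment of the two bias offsets to assessors are hypothetical.

```python
# Bias corrections estimated from the double-marking rounds (per assessor).
ASSESSOR_OFFSET = {"A": -0.15, "B": +0.20}

def relevance_score(originality: int, rigour: int, significance: int,
                    assessor: str) -> float:
    """Sum the three ratings (max 6 + 4 + 5 = 15) and apply the offset."""
    assert 0 <= originality <= 6 and 0 <= rigour <= 4 and 0 <= significance <= 5
    return originality + rigour + significance + ASSESSOR_OFFSET[assessor]

score = relevance_score(originality=3, rigour=2, significance=1, assessor="B")
print(score, "highly relevant" if score >= 6.00 else "within scope only")
```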
2.2 Findings from Journal-Specific Analysis
Overall, 24 250 documents were examined in the exhaustive search. Of these, 1487 (6.1%) are within scope, and hence analysed in detail. Figure 1 shows the percentage of documents that are within scope for each journal. For eight journals, more than 10% of documents are within scope. The journals Swarm Intelligence (SI) and Autonomous Robots (AURO) have the strongest focus on MRS research.

Fig. 1. Percentage of documents within scope (per journal). Portions in red correspond to papers regarded as highly relevant. The numbers represent the total number of papers identified to be within scope.

¹ The documents of one of the journals (International Journal of Robotics & Automation) could not be accessed in full and were hence excluded.
Fig. 2. Distribution of the technology readiness levels for the documents within scope (per journal).
Figure 2 shows the distribution of the technology readiness levels (TRL) that we estimated for the systems reported in all documents within scope for each journal (for a definition, see [6]). Most of the journals primarily report findings with a TRL of up to 4, considered as laboratory validation by the assessors. The Journal of Field Robotics (JFR) and Revista Iberoamericana de Automática e Informática Industrial (RIAI) stand out: 75% and 46% of their documents report work of TRL 5 or above, owing to these journals primarily focusing on studies that have been validated via field experiments. It should be noted that the journals RIAI, Intelligent Service Robotics (ISR), JFR, The International Journal of Robotics Research (IJRR), AURO and Robotica include some documents evaluated with a TRL of 6, where the system was demonstrated in a relevant environment.
Fig. 3. Distribution of documents within scope by type of validation (laboratory only, real-world only, or both).
Figure 3 shows the percentage of documents within scope that validated their research via laboratory and/or real-world experiments, as well as the percentage of documents that report formal, theoretical results. As expected, the JFR presents the greatest proportion of documents with real-world validation. In terms of theoretical validation, the journals IEEE Transactions on Robotics (TRO), IJRR, AURO, IEEE Robotics and Automation Letters (RA-L), Robotica, and Journal of Intelligent & Robotic Systems (JINT) contain the largest proportions of works reporting theoretical findings (journals with fewer than 10 works being omitted from consideration of proportions).

2.3 Findings from Trend Analysis
This section presents a quantitative analysis of the trends observed based on the data collected in the exhaustive search. Figure 4 presents the number and percentage of documents that were within scope for each year. Dashed lines show the trends, obtained through linear regression using the least squares method. The trend suggests substantial growth in the number of documents on MRSs, whereas the trend in the percentage of documents suggests that the fraction of documents devoted to MRSs has decreased. A possible explanation is that robotics journals tend to report on an increasingly diverse range of topics, some of which have attracted significant attention in recent years (e.g. soft-bodied and learning robots). Figure 5 presents the evolution of the percentage of studies within scope that perform theoretical, laboratory and real-world validation. Note that it is possible for a single study to feature multiple types of validation. Overall, the trends suggest that the percentage of theoretical studies is increasing, while laboratory validation is decreasing and real-world validation remains relatively constant over the years. Note, however, that the number of documents within scope is increasing over the years. As a result, the total amount of validation is increasing as well, for both laboratory and real-world experiments.
Fig. 4. Number and percentage of documents within scope (per year), as identified by the exhaustive search. The dashed lines represent the trends obtained by linear regression.
Fig. 5. Percentage of documents within scope using theoretical, laboratory and realworld validation (per year), as identified by the exhaustive search. The dashed lines represent the trends obtained by linear regression.
Although theoretical validation is often based on strong assumptions, it can offer a detailed understanding of the underlying mechanisms and allow for a basic demonstration of the validity of the proposed solutions. The decreased focus on laboratory validation may be attributed to increases in computational capabilities, allowing more trustworthy and realistic simulations to be performed, which may help substitute for laboratory validations.
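The trend lines in Figs. 4 and 5 are obtained by ordinary least squares; a minimal sketch of such a fit is given below, with hypothetical yearly percentages standing in for the actual data.

```python
import numpy as np

years = np.arange(2010, 2021)
# Hypothetical yearly percentages of documents within scope.
pct_in_scope = np.array([7.1, 6.9, 6.5, 6.6, 6.2, 6.0, 5.9, 5.7, 5.6, 5.4, 5.3])

slope, intercept = np.polyfit(years, pct_in_scope, deg=1)  # least-squares line
print(f"trend: {slope:+.3f} percentage points per year")
```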
3 Expert Survey
The expert survey is conducted to help better understand the state of the art and possible future developments of MRSs. It captures the views of a representative sample of internationally leading experts in the field of MRSs.

3.1 Methodology
The expert survey received ethics approval following a review by The University of Sheffield (application reference 034028). It uses a questionnaire that includes
closed- and open-ended questions. Responses are anonymous. Participants can choose not to answer any specific question if they prefer. Where subject-specific questions provide multiple options to choose from, these are presented in a random order to each participant. To identify suitable prospective participants, research paper databases; conference, workshop and journal websites (invited speakers, organisers, editorial boards); resources related to non-academic organisations (e.g. industry, government); websites of technical committees (e.g. IEEE RAS TC Multi-Robot Systems); and other sources are explored. Special emphasis is placed on identifying female experts, experts from traditionally underrepresented geographic regions, and experts from industry and other non-academic organisations. The participants are given three weeks to complete the survey from the moment of invitation (9 July 2020).

3.2 Findings
Overall, 179 experts were invited to participate in the survey, including 119 men (66.5%) and 60 women (33.5%).² The invited experts were from the following broad geographic regions: Africa (10), America (62), Asia (35), Europe (66), and Oceania (6), as defined by the Statistics Division of the United Nations.³ A total of 68 participants completed the survey, corresponding to a participation rate of 38.0% of the invited experts. Each participant provided consent to participate in the survey. In this section, we analyse the findings.

² The gender for each prospective participant was estimated. Actual participants could choose to report their gender.
³ See geographic regions on https://unstats.un.org/unsd/methodology/m49/.

Participant Distribution. To ensure the responses to the expert survey are representative, or reveal biases where they exist, we included four personal questions regarding the participants' gender, geographic region, sector of work, and number of years of relevant professional experience. Overall, 76.5% and 20.6% of participants reported themselves as male and female, respectively. A small percentage of participants preferred not to answer (2.9%). The gender distribution is clearly biased towards men, primarily as 66.5% of those invited to participate were men. This can, at least partially, be attributed to women being underrepresented in the field of MRSs and, especially, in leading roles. For the related disciplines of electrical engineering, mechanical engineering, and computer science, between 13.5% and 19.7% of tenured/tenure-track faculty positions in the USA are held by women, according to a recent survey [7]. Most participants were from Europe, America and Asia. The participant share of Europe is 42.6%, with 1.5% from Eastern Europe, 7.4% from Northern Europe, 16.2% from Southern Europe, and 17.6% from Western Europe. The participant share of America is 32.4%, with 26.5% from North America and 5.9% from South America. The participant share of Asia is 20.6%, with 7.4% from Eastern Asia and 4.4% from each of South-eastern, Southern, or Western
Asia. The remaining participants were from Australia and New Zealand (2.9%) or did not disclose their geographic region (1.5%). Although the survey succeeded in capturing the views of experts from a wide range of regions, some bias can be observed, in particular towards participants from Europe. It should also be noted that none of the experts reported Africa as their geographic region. Most participants work in academia (77.9%), followed by industry (14.7%). Participants who work in both academia and industry, or in other sectors, are in the minority (2.9% in either case). Three quarters of the participants have up to 20 years of relevant professional experience, with half reporting 10 to 20 years (50.0%) and a quarter less than 10 years (25.0%). Most of the remaining participants have between 20 and 30 years of relevant professional experience (19.1%). Two participants have more than 30 years of relevant professional experience (2.9%). Analysis of Responses to Close-Ended Questions. This section presents the views obtained from the experts when answering the subject-specific, closed-ended questions of the survey. The first question is about what constitutes an MRS. The experts are asked to what extent they consider 12 example systems, labelled a–ℓ, to be MRSs. Figure 6 shows the distribution of their responses. Their opinions are diverse. However, broad consensus exists that systems a, b and c would not be considered as MRSs. In these systems, the robots may not even be active. Moreover, broad consensus exists that system ℓ, where the robots operate in the same environment and have a common purpose, would be considered as an MRS. Most experts viewed systems where multiple robots interact, such as systems i, j, and k, as instances of MRSs, even when the robots were physically separated, such as in system k. Systems d and e are not considered as MRSs by most of the experts, though conflicting views exist. For system d, the robots are not self-propelled and do not necessarily interact. For system e, one robot transports another robot. For the remaining systems f–h, no clear trend is observed. For system f, the robots share a common infrastructure to obtain position information but are otherwise independent. In the case of system g, a group of conventional manipulators works towards a common goal along a factory assembly line. Given that the motion of each manipulator is fully automated, there appears to be limited interaction. Where the operations of one manipulator build upon those of the previous ones, however, the group could be seen as interacting via the products that they assemble. For system h, each robot is teleoperated by a dedicated human operator while sharing the same environment. Different from system j, the robots are not moving autonomously. As the wider context may not be sufficiently clear, it is difficult to categorise this system, as suggested by the relatively high percentage of experts selecting the option "other (e.g. would need more information to decide)".
Fig. 6. Distribution of the expert responses to the question of whether the following are considered an MRS: (a) A group of robots stored on the same shelf, (b) Any robot of which more than two copies exist, (c) A group of robots that can be stacked to ease transportation, (d) A group of passive beads residing in a same environment, and being externally controlled through changes in a magnetic field, (e) A robot transporting another robot, (f) A group of robots operating in different environments, and that do not interact; each obtains positional information from the same global infrastructure, (g) A group of conventional manipulators along a factory assembly line, working on the same products though at different stages, (h) A group of robots operating in a same environment, each being teleoperated by a dedicated operator; the operators may have conflicting goals, though they always attempt to prevent collisions between robots, (i) A robot composed of several modules that can autonomously rearrange the way they are connected to assume a new shape, (j) A group of robots operating in a same environment, each having its own goal, including not to collide with others, (k) A group of robots operating in different environments and that learn from each other through a cloud service, (ℓ) A group of robots operating in a same environment and having a common purpose. Statements ranked by weighted mean results.
The second question concerns the future deployment of MRSs. The experts were asked the time frame in which they expect MRSs to be widely deployed within their geographic region in various application scenarios. Figure 7 illustrates the distribution of the expected times for each of the application scenarios. Almost all experts consider warehouse logistics an achievable target in the short term (0–5 years). The majority of experts consider surveillance and security, manufacturing and entertainment achievable targets in the short term (0–5 years). These application scenarios have already fully embraced robotics in multiple geographic regions. Transport and delivery, inspection and maintenance, agriculture and forestry, education, mining, first response, and domestic applications are considered achievable targets in the medium term (5–10 years) by many of the experts. Space, medical applications, rehabilitation and care, and construction are expected to face a longer wait (>10 years). Additionally, it is worth noting that for education a moderate number of experts suggested that MRSs may never be widely deployed or may not be applicable.
Fig. 7. Distribution of the responses by experts to the question in which time frame they expect MRSs to be widely deployed within their geographic region in the specified applications. Applications ranked by weighted mean results.
Deploying MRSs in the real world involves solving some problems that are equally present when deploying single-robot systems. The third question specifically focuses on the transition from systems of single robots to systems of multiple robots. It asks the experts to gauge how much additional effort this transition requires from the robotics community with respect to a number of areas of development. Figure 8 shows the distribution of the additional effort the experts consider necessary for each of the suggested areas of development. In general, we observe that all aspects would demand additional effort. In particular, experimentation under realistic conditions, autonomy/resilience, system integration, trustworthiness, and end-user acceptance can be highlighted as the most demanding areas of development, with the highest percentage of experts choosing ratings of either 5 or 6. By contrast, issues such as materials, modelling, ethical concerns, and simulation are considered less demanding in terms of additional effort, with the highest percentage of experts choosing ratings of either 1 or 2. Regarding legal aspects, opinions are divided. Diversity and inclusion was appraised as "Undecided" by 29.4% of experts, possibly because the challenge is viewed by some as the same irrespective of whether the system to be designed comprises one or multiple robots. The last question asks the experts how fast they expect the field of MRSs to have grown in ten years from the time of the survey compared to the field of robotics as a whole. Figure 9 presents the responses, suggesting a bimodal distribution with a small peak at "slower" and a larger one at "faster".
Fig. 8. Distribution of the expert responses to the question of how much additional effort is required by the robotics community to move from single-robot systems to MRSs in real-world applications with respect to various areas of development. The scale is such that 1 corresponds to "insignificant", whereas 6 corresponds to "unprecedented (e.g. paradigm change required)". Areas ranked by weighted mean results.
Fig. 9. Distribution of the expert responses to the question of how fast they expect the field of MRSs to have grown in ten years from the time of the survey compared to the field of robotics as a whole.
Analysis of Responses to Open-Ended Questions. In this section, we present and discuss the participant responses to the open-ended questions of the survey. The first question asked how the experts' attitude towards MRSs had changed over the past 10 years. About a third (32.4%) of the participants responded that their attitude had not changed. The remaining participants presented a multitude of aspects that had changed in their position towards MRSs. The most frequently mentioned aspects include the focus of their work (27.3%), in particular a shift from theoretical work to more practical applications; the definition of MRS they used, to either include or exclude specific control architectures (centralised vs. decentralised), system size, etc.; and the consideration of new capabilities or applications of MRSs, such as machine learning, cloud computing, and warehouse logistics. Although the outlook of most responses to this question was positive, a small number of participants (6.8% of the participants that reported a change in attitude) reported a negative position, questioning the extent to which real-world applications will be impacted by MRSs. The next question asked participants to give examples of commercial applications other than warehouse logistics where MRSs are currently being deployed. A total of 63.2% of participants gave examples, with the most common being (i) entertainment, for example, through the use of drones such as in light shows and for filming, (ii) agriculture, for seeding, tending and harvesting of crops,
and (iii) surveillance. Interestingly, agriculture is included in the list of applications that experts expect to reach deployment only in the medium term (see Sect. 3.2). Other applications, from the most to the least mentioned, include defence, inspection, monitoring, logistics, transport, education, hospital applications, mining, maintenance, and manufacturing. The field of MRSs is diverse, with research and development constantly progressing across a wide range of topics. We presented the experts with a list of topics and asked them to choose the one they are most familiar with. An option to add their own topic was provided. Each expert was then asked to present the main challenges regarding the chosen topic. The two most frequently chosen topics are coordination/cooperation, control and planning (44.1%) and bio-inspiration, modular and swarm robots (25.0%). The most frequently mentioned challenges in coordination/cooperation, control and planning are related to system integration, particularly in unstructured environments; robustness of the system, both in terms of hardware and software/algorithms/behaviours; logistics, in terms of system deployment and energy; scalability, particularly in terms of how behaviours scale as more robots are added; and costs. Less frequently mentioned challenges include the reality gap; operational safety, in terms of humans, robots and tasks; and the lack of real-world testing. For the participants choosing the topic of bio-inspiration, modular and swarm robotics, the most frequently mentioned challenge is the lack of access to, and the high cost of, reliable and capable robot platforms. This is followed by logistics challenges, which are particularly severe when studying swarms of robots. Another challenge is the lack of realistic real-world applications, as many studies on this topic fall back on simple and commonly used behaviours, such as aggregation or pattern formation, which would generally solve only a small part of a given real-world problem. The availability of capable, reliable and robust robot platforms is also the most frequently mentioned challenge on the topic of applications. For networked robots, communication, end-user acceptance and scalability are some of the challenges identified by the experts. To assess what can be achieved in the future, information is required on what has been achieved in the past. With that in mind, survey participants were asked which area of MRSs they consider to have improved the most over the last 10 years. From the responses (89.7% of the participants replied), the areas most frequently mentioned are coordination/cooperation, control and hardware. From the responses to the previous question, we know that more than 40% of the participants are most familiar with coordination/cooperation, control and planning, and that the challenges identified in these fields were mostly related to implementation issues. As a result, it is reasonable for coordination/cooperation, control and planning to be perceived as areas with significant development over the last 10 years. By contrast, hardware, in the form of available platforms, is one of the aspects frequently mentioned in the previous question (for multiple topics) that still presents a challenge to be overcome. Despite hardware being one of the topics participants believe to have shown the most development, further
progress is vital, for example, to enable more comprehensive validation and/or deployment of solutions. While current challenges and recent developments help determine possible future development paths, we also wish to identify promising areas for future research. Survey participants were asked what developments specific to MRSs they would like to see in the next 10 years. The most commonly mentioned developments were real-world testing (16.2%), real-world applications (13.2%) and coordination/cooperation (11.7%). The interest in real-world testing could be viewed as a consequence of an insufficient availability of hardware for implementation and testing. Although hardware has been identified as a topic of interest for the next 10 years, the percentage of experts mentioning it is relatively small (4.4%). However, developments in this field are required to allow real-world testing to be performed. The interest in real-world applications could be viewed as a consequence of the change in perspective, as previously mentioned, whereby more theoretical work is being replaced with more practical work. Additionally, it is through the study of new applications that new challenges can be identified. Lastly, with new applications and more experimentation, the requirements for coordination/cooperation algorithms are expected to become more stringent, as MRS performance evaluations move from laboratory environments to realistic environments. Not only does the amount of recent development imply that coordination/cooperation is a growing topic; the development of new technologies, particularly in communication, is also expected to enable more reliable coordination/cooperation of MRSs. However, significant work is required to reconcile new technologies and coordination/cooperation strategies. One of the logistics concerns presented by the participants is the lack of adequately trained personnel. The participants were asked what they consider to be an effective activity for training/teaching staff or students to work with MRSs. The majority (85.3%) of the participants replied, resulting in the following proposed activities (ordered from most to least mentioned): hands-on experience; simulations; experimentation; courses; algorithm development; participation in competitions; study and discussion of recent literature; development of robotic systems; and summer schools. In the last open-ended question, we asked participants to state the main difficulties faced in their current or past jobs when undertaking work related to MRSs. Contrary to our expectations, many of the responses concerned technical difficulties, with the most commonly mentioned being hardware reliability, logistics, complexity of experimentation, implementation issues, and communication. Non-technical difficulties that were reported include the costs of, and access to, equipment and infrastructure, personnel training, space constraints, tool availability, and the management of expectations in the field of study.
4 Conclusions
This paper reported a data-driven trend analysis of MRSs. An exhaustive search of all papers published in 27 leading robotics journals from 2010 to 2020 revealed (i) a substantial growth in the number of papers
on MRSs, although the fraction of papers devoted to the topic decreased; (ii) for half of the journals, the fraction of papers devoted to MRSs exceeded 5%; (iii) six journals reported MRS research at TRL 6 or above; and (iv) a decline, in relative terms, in the use of laboratory demonstrations, while theoretical work and real-world demonstrations remain in focus. A survey of 68 leading experts revealed (i) that there is no consensus regarding what constitutes an MRS; (ii) the applications where MRSs are expected to be widely deployed in the short, medium and long term; and (iii) the areas of development that require the most effort to move from single-robot systems to MRSs in real-world applications, including experimentation under realistic conditions, autonomy/resilience, system integration, trustworthiness and end-user acceptance. There was no consensus regarding how rapidly the field will develop over the next decade, with some expecting faster development and others slower development than robotics as a whole.

Acknowledgments. This study was conducted as part of the Multi-Robot Systems project (Oct 2019–Dec 2020), which was funded by the Defence Science and Technology Laboratory (contract no. DSTL-1000141223). The authors thank the 68 participants of the expert survey, and Alejandro R. Mosteo for feedback on the design of the expert survey.
References
1. Walter, W.G.: The Living Brain. WW Norton (1953)
2. Dudek, G., Jenkin, M.R.M., Milios, E., Wilkes, D.: A taxonomy for multi-agent robotics. Auton. Robot. 3(4), 375–397 (1996)
3. Farinelli, A., Iocchi, L., Nardi, D.: Multirobot systems: a classification focused on coordination. IEEE Trans. Syst. Man Cybern. Part B 34(5), 2015–2028 (2004)
4. Parker, L.E., Rus, D., Sukhatme, G.S.: Multiple mobile robot systems. In: Springer Handbook of Robotics, pp. 1335–1384. Springer (2016)
5. Rossi, F., Bandyopadhyay, S., Wolf, M., Pavone, M.: Review of multi-agent algorithms for collective behavior: a structural taxonomy. IFAC-PapersOnLine 51(12), 112–117 (2018)
6. UK House of Commons, Science and Technology Committee: Technology and Innovation Centres, Annex 1: Technology Readiness Levels (2010). https://publications.parliament.uk/pa/cm201011/cmselect/cmsctech/619/61913.htm#note221
7. American Society for Engineering Education: Engineering and Engineering Technology by the Numbers. Washington, DC (2019)
Coverage Control for Exploration of Unknown Non-convex Environments with Limited Range Multi-robot Systems

Mattia Catellani, Federico Pratissoli, Filippo Bertoncelli, and Lorenzo Sabattini

Department of Sciences and Methods for Engineering (DISMI), University of Modena and Reggio Emilia, Reggio Emilia, Italy
{mattia.catellani,federico.pratissoli,filippo.bertoncelli,lorenzo.sabattini}@unimore.it
Abstract. This paper presents a novel methodology for the exploration and monitoring of unknown environments performed by a group of coordinated mobile robots. The proposed methodology builds upon Voronoi-based coverage control methods, with time-varying probability density functions. In particular, we propose a methodology that, exploiting locally available information, allows robots to dynamically update the probability density function in order to drive the robots to explore previously unexplored areas. The proposed control strategy was validated in extensive simulations, and by means of experiments carried out on a group of mobile robots. Results of simulations and experiments demonstrate the effectiveness of the proposed methodology in leading to the exploration of large portions of the environment.
1 Introduction
The problem of coordinating multiple robots with the task of exploring (and mapping) an unknown environment has been widely studied in recent years. One of the most widely adopted strategies, first described in [11], exploits potential fields to generate collision-free trajectories. Every robot is treated as a positive ion, while obstacles are assumed to be positively charged and the goal negatively charged, resulting in every robot being driven to the goal while being repelled by both other agents and obstacles. Other control strategies can be derived from the well-known random walk approach, such as the one proposed in [10], where robots move randomly within the environment. An alternative approach is the one proposed by [18], where the authors study the problem of multi-robot exploration and coverage with limited communication. They propose a decentralized algorithm based on leaving marks on the unknown terrain, which can be sensed by the robots and allow them to cover the terrain. The limited communication issue has been considered also by [6], where the control of the multi-robot system relies on communication with a base station. In order to assist the robots' coverage of the environment, [2] proposes an algorithm that deploys radio beacons
and, similarly, [3] proposes robots that, in the process of moving around, deploy network nodes into the environment. One of the most relevant and studied exploration methodologies is the frontier-based coordination approach [19]. The environment is discretized into standard occupancy grids, and a frontier cell is any already explored cell that is an immediate neighbor of an unknown, unexplored cell. A mapping of the environment is required, usually generated exploiting grid maps or graphs, to keep the already explored portion of the environment updated and to localize the frontier node positions [8]. Coverage-based control strategies [5] coordinate a group of autonomous robots with the aim of maximizing the coverage of the environment and generally exploit a Voronoi partitioning of the environment. The tools behind these methodologies can be exploited to deal with the exploration of unknown environments, as in [1], where the Voronoi diagram is used to generate an obstacle-free convex neighbourhood among strongly convex obstacles. A significant work is presented by [9], where a frontier-based exploration methodology is integrated with a Voronoi partitioning of the environment to avoid duplicated exploration areas. In both of these works, the boundary of the environment and the obstacles need to be fully or partially known. A recent study presented in [17] deals with the problem caused by updating a single global grid map while managing only local information. In [16], an exploration and navigation framework has been proposed to explore GPS-denied environments, using the cooperation between UAVs and UGVs to increase the accuracy of the SLAM algorithm. The mentioned studies, and the majority of the approaches in multi-robot exploration, are based on a discretization of the environment, typically exploiting occupancy grids. This approach is not suitable for large environments, where the number of cells grows significantly, leading to severe computational and memory costs. Moreover, these approaches require a strong communication network in order to keep the shared map updated among the robots [12]. A methodology that deals with communication issues has been presented in [14], where the total area is partitioned based on the topology, exploiting Generalized Voronoi Graphs for exploration purposes. However, the robots are constrained to move on the edges of the Voronoi diagrams. Moreover, this strategy is mostly effective in environments structured in a way that naturally guides robots during exploration, e.g. with corridors, losing effectiveness in more generic spaces. An alternative and relevant approach to mapping and exploring large environments is presented by [4]. A Gaussian mixture model is used for the global mapping of a complex environment. This approach, developed and tested for single-robot monitoring, has the benefits of requiring a small memory footprint and a low volume of communication. Following this strategy, in this work we propose a methodology that exploits a mixture of Gaussian distributions to coordinate a group of robots and explore an unknown environment. Since a Gaussian function is defined by only a few parameters and no grid discretization of the environment is needed, the strategy aims at efficient memory usage and low communication load among robots.
Contribution. Our methodology uses the coverage control theory described in [15], which introduces a limited Voronoi diagram able to deal with unknown and non-convex environments, differently from the approach in [5]. Exploiting this strategy, which makes it possible to obtain an optimal configuration of the agents in the environment so as to maximize the covered area, we provide a solution to dynamically explore an unknown environment while continuously passing through it, in order to keep it constantly monitored. We combine the coverage-based control methodology with a mixture of Gaussian distributions to model the areas of the environment already explored. The proposed approach performs the exploration and monitoring without a discretization of the environment (grid map or graph) and without any knowledge of the boundaries of the environment. Moreover, it is worth noting that the few parameters that define a Gaussian function can be easily shared among the robots, keeping communication and memory usage low, as stated in [4]. Finally, most of the works found in the literature provide simple simulations to show the proposed strategy, and only a few works test the control on real platforms. We extensively tested the proposed control in both simulated and real-world experiments. We compare the proposed approach with other typical approaches that do not require a discretization or partial knowledge of the environment, such as the random walk and potential field approaches.
2 Problem Statement
Consider a multi-robot system constituted by n holonomic robots able to move in two dimensions, controlled with the objective of monitoring and exploring an unknown environment. We assume each robot to be modeled as a single integrator system,¹ whose position p_i ∈ R² evolves according to ṗ_i = u_i, where u_i ∈ R² is the control input, ∀i = 1, ..., n. We consider the following assumptions:

1. Localization with respect to a global reference frame: each robot is able to localize itself with respect to a global reference frame, which is shared among the robots in the team;²
2. Limited sensing capabilities: each robot is able to measure the position of neighboring robots and objects (including boundaries of the environment), within its limited sensing range (defined by a circle with radius r_sens ∈ R_{>0}).

Based on these assumptions, the problem addressed in this paper is then formalized as follows:

¹ We would like to remark that, even though the single integrator is a very simplified model, it can still effectively be exploited to control real mobile robots: using a sufficiently good trajectory tracking controller, the single integrator model can be used to generate velocity references for widely used mobile robotic platforms, such as wheeled mobile robots and unmanned aerial vehicles.
² This can be achieved, e.g., by means of the Global Positioning System (GPS) or cooperative localization techniques, which are however out of the scope of the paper and will not be addressed.
Problem 1. Define a distributed control strategy that allows a multi-robot system with limited sensing capabilities to explore an arbitrary unknown, possibly non-convex, large environment. The methodology needs to account for the possibility that the environment cannot be discretized, due to memory and communication limitations.

Notation and Definitions. In the rest of the paper, we denote by N, R, R_{≥0}, and R_{>0} the sets of natural, real, non-negative real, and positive real numbers, respectively. Given x ∈ Rⁿ, let ‖x‖ denote its Euclidean norm. Let F(R²) be the collection of finite point sets in R²; we denote an element of F(R²) as P = {p_1, ..., p_n} ⊂ R², where p_1, ..., p_n are points in R². For p ∈ R² and r ∈ R_{>0}, we denote the closed and open balls in R² centered at p with radius r by B(p, r) = {q ∈ R² | ‖q − p‖ ≤ r} and B°(p, r) = {q ∈ R² | ‖q − p‖ < r}, respectively. In the paper, Q ⊂ R² denotes a generic polygon: it will be used, in particular, to denote the environment to be covered by the robots. An arbitrary point in Q is denoted by q ∈ Q.
3 Distributed Control Algorithm for Exploration
In this section we briefly describe the control algorithm used to determine the motion of the robotic agents tasked to explore the unknown environment. The proposed strategy builds upon the coverage-based control approach described in [15], where the multi-robot system reaches the optimal coverage positions, i.e., the central Voronoi configurations, in a not necessarily convex environment, according to the robots' sensing capabilities. Briefly, the coverage-based control makes use of Lloyd's algorithm to maximize the sensed area of a mobile sensing network. The environment is partitioned following the Voronoi partitioning algorithm. Each cell V_i is computed according to the corresponding robot's position p_i ∈ P and is limited to the area sensed by the robot. The centroid C_{V_i} of each partition or cell is then computed, possibly considering a probability density function φ(q) over the environment. The i-th robot is then instructed to move towards the centroid of its cell according to the law

u_i = −k_prop (p_i − C_{V_i}),    (1)

where k_prop ∈ R_{≥0} is a proportional gain. It is worth remarking that the centroid of each cell, and hence the control law of each robot, is strongly dependent on the chosen density distribution (e.g., uniform density function or Gaussian-type density function). In this work we make use of this property to control the group of robots with the aim of exploring and monitoring the environment. In fact, the objective of coverage-based control is a static and optimal positioning of the robots in the environment, without a dynamical exploration or monitoring of it. As stated in Sect. 2, each robot has a limited sensing range; therefore, the Voronoi partitioning depends on the sensing range of the robot, according to the definition in [15]:

V_i(P) = {q ∈ B(p_i, r_sens) | ‖q − p_i‖ ≤ ‖q − p_j‖, ∀p_j ∈ P}.    (2)
Namely, the i-th cell contains all the points in the environment that are closer to the i-th robot than to any other robot, and that are contained within the sensing radius of the i-th robot. The centroid of the Voronoi cell, denoted C_{V_i}, is then computed according to the probability density distribution φ(q), that is:

C_{V_i} = ( ∫_{V_i} q φ(q) dq ) / ( ∫_{V_i} φ(q) dq ).    (3)

While the probability density function can be exploited, in general, to highlight the relative importance of specific portions of the environment with respect to others, in our approach this distribution describes the probability that a specified area has already been explored, as will be explained in Sect. 4. The control law can be computed accordingly, following the methodology described in [15], where each robot is driven towards the centroid of its cell, computed as in (3). It is worth noting that, according to (2) and (3), both the partition and the centroid can be computed in a fully distributed manner: therefore, no global knowledge is required for the computation of the control action. Each robot can, in fact, compute its own control input simply knowing its own location, the neighbors' locations, and the probability distribution in the environment. In the following section we describe how we define the probability distribution using a mixture of Gaussian functions that leads to an exploration of the environment.
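To make the control loop concrete, the following is a minimal numerical sketch, not the authors' implementation, of one control step: the limited Voronoi cell of (2) is approximated by sampling the sensing disk, the centroid of (3) is computed as a φ-weighted average, and (1) yields the velocity command. The density φ, the gain and the positions are illustrative.

```python
import numpy as np

def coverage_step(p_i, others, phi, r_sens=4.0, k_prop=1.0, n_samples=20000):
    # Sample points uniformly in the sensing disk B(p_i, r_sens).
    rng = np.random.default_rng(0)
    r = r_sens * np.sqrt(rng.random(n_samples))
    theta = 2.0 * np.pi * rng.random(n_samples)
    q = p_i + np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

    # Keep points closer to robot i than to any neighbour: the cell V_i(P), Eq. (2).
    in_cell = np.ones(n_samples, dtype=bool)
    for p_j in others:
        in_cell &= np.linalg.norm(q - p_i, axis=1) <= np.linalg.norm(q - p_j, axis=1)
    q = q[in_cell]

    # phi-weighted centroid of the cell, Eq. (3), then the control law, Eq. (1).
    w = phi(q)
    centroid = (q * w[:, None]).sum(axis=0) / w.sum()
    return -k_prop * (p_i - centroid)

# Example: one neighbour, and a density that favours points far from the origin.
u = coverage_step(np.array([0.0, 0.0]),
                  others=[np.array([2.0, 1.0])],
                  phi=lambda q: 1.0 + np.linalg.norm(q, axis=1))
print(u)
```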
4 Time-Varying Probability Density Function for Exploration
As introduced in Sect. 2, the aim of this work is to monitor and explore an unknown and possibly non-convex environment with a group of robots with limited sensing capabilities. In order to complete the exploration of the whole environment, the robots need to keep track of the areas already explored by the multi-robot system. A probability density function can be exploited for this purpose, focusing the attention on the areas still to be explored and avoiding areas recently visited. The proposed distributed control algorithm uses this approach to keep the information about the explored environment updated and shared. The probability density function is defined as a combination of multiple Gaussian density functions. Each Gaussian function ϕ(q) is determined by a mean value q̄ ∈ R² and a standard deviation σ ∈ R_{>0}, and is defined as follows:

ϕ(q) = exp( −‖q − q̄‖² / (2σ²) ).    (4)

Every agent in the environment must be able to access this data when q̄ lies inside its sensing range; thus, the exchange of information can be achieved exploiting a central storage unit, or by physically placing a marker or a beacon that each robot
can easily localize. During the motion and the exploration, every robot places several Gaussian functions in the environment. The combination of all the shared Gaussian functions determines the motion of the multi-robot system. Every time a new area is explored by a robot, a new Gaussian function ϕ(q), defined as in (4), is added by the robot to the shared density distribution following the algorithm described later (Algorithm 1), affecting the motion of the overall multi-robot system. During the exploration of an unknown environment, we consider two possible scenarios the robot can face: (1) the robot explores areas that need to be crossed only once during the navigation, and (2) areas that need to be crossed more than once to complete the exploration without deadlocks. Thus, we define two types of Gaussian functions to be added by the robot, following the definition in (4): a time-constant Gaussian function and a time-varying Gaussian function. The first one remains constant over time during the whole exploration:

f_{F_i}(q) = exp( −‖q − q̄_i‖² / (2σ_i²) ).    (5)

Conversely, the time-varying Gaussian function decreases over time: for this purpose, we make use of a forgetting factor k_j(t) ∈ [0, 1], which gradually decreases the intensity of the function over time and is defined as

k_j(t) = 1 − (t − T_{0_j}) / T_forget,    (6)
where T_{0_j} is the time instant at which the Gaussian component is added and T_forget represents how long the component should remain active. Thus, the time-varying Gaussian function can be defined as

f_{V_j}(q, t) = min{ exp( −‖q − q̄_j‖² / (2σ_j²) ), k_j(t) },    (7)

where the forgetting factor k_j(t), decreasing over time, lowers the peak of the probability function until completely nullifying it at t = T_{0_j} + T_forget. The definition of two different functions allows one to specify whether an explored area needs to be crossed again multiple times or instead needs to be visited only once, depending on the specific application. This differentiation is carried out by marking explored areas that do not need to be visited again with the function f_{F_i}(q), while areas required to be crossed multiple times are marked with the function f_{V_j}(q, t): time variance decreases the intensity of the Gaussian function that models the area over time, allowing a specific region to be visited again in the future by the system. In this way, the robots will continue monitoring the environment even once the exploration is completed, keeping the gathered information up to date. Moreover, time variance is useful to avoid local minima which would cause the robots to get stuck, making it difficult to achieve a complete exploration of the environment. Every robot stores two lists of Gaussian functions, time-varying and time-constant functions, whose combination generates
the density distribution perceived by the robot and exploited to compute the control input. During the exploration process, the lists of Gaussian functions are shared over the robots' network. Each robot, during the motion, computes the Gaussian components and communicates their parameters to the network. Every robot's control input is continuously updated following the changes in the shared density distribution. The combination of the various Gaussian components is computed as follows:

φ(q) = c + m + 1 − Σ_{i=1}^{c} f_{F_i}(q) − Σ_{j=1}^{m} f_{V_j}(q),    (8)
where c is the number of time-constant density functions $f_{F_i}$ and m is the number of time-varying density functions $f_{V_j}$. This formulation generates a density distribution φ with smaller values in areas that have already been visited. While exploring, each robot adds Gaussian components that decrease the intensity of the density distribution around recently visited areas, driving the robots' attention towards more interesting regions. The density distribution φ is then used, following the control theory shown in Sect. 3, to determine the control input of each robot. Every robot decides whether to add a new Gaussian component according to Algorithm 1, whose main idea is to add new Gaussian functions so as to keep the robot in motion. A new component is added whenever the robot's position is sufficiently far from the mean points of the existing components; the threshold is set equal to the standard deviation σ. Geometrically, σ can be seen as the width of the Gaussian function and is set to half of the robot's sensing range r, i.e., σ = r/2. In a normal distribution, 95.45% of the samples lie within a 2σ radius of the center point, so the value σ = r/2 reflects the information collected by the robot within its sensing range. It is worth noting that new components are not added at the robot's current position: doing so could produce a symmetric probability distribution whose centroid coincides with the robot's position, yielding a null control input in (1). To avoid this, new components are centered at a previously occupied position; specifically, the delay Δ is calculated as the time needed for a robot to travel a distance of σ at the maximum velocity $v_{MAX}$:

$$\Delta = \frac{\sigma}{v_{MAX}} \tag{9}$$
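For concreteness, plugging in the parameter values reported later for the simulations of Sect. 5.1 (sensing range r = 4 m, hence σ = r/2 = 2 m, and $v_{MAX}$ = 0.5 m/s) gives

$$\Delta = \frac{\sigma}{v_{MAX}} = \frac{2\ \mathrm{m}}{0.5\ \mathrm{m/s}} = 4\ \mathrm{s},$$

so each new component is centered where the robot was four seconds earlier.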
Moreover, if a robot gets stuck, a new component is added to perturb the probability distribution of the surrounding environment, moving the centroid to a different position. The Gaussian functions described above thus define a probability distribution indicating whether an area has already been explored or not: an example of such a function is depicted in Fig. 1.
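To make Eqs. (7)–(8) concrete, the following is a minimal sketch, not the authors' implementation, of how the density distribution could be evaluated from the two component lists. The linear shape of the forgetting factor $k_j(t)$ is our assumption: the text only requires that it decreases from the moment the component is added until it nullifies the peak at $T_{0_j} + T_{forget}$.

```python
import numpy as np


def forgetting_factor(t, t0, t_forget):
    """Assumed linear decay: 1 when the component is added, 0 at t0 + t_forget."""
    return float(np.clip(1.0 - (t - t0) / t_forget, 0.0, 1.0))


def gaussian(q, q_c, sigma):
    """Unit-peak Gaussian centered at q_c with standard deviation sigma."""
    return float(np.exp(-np.sum((q - q_c) ** 2) / (2.0 * sigma ** 2)))


def density(q, t, comps_F, comps_V):
    """Density phi(q) of Eq. (8) from the two lists of components.

    comps_F: time-constant components as (mean, sigma) tuples.
    comps_V: time-varying components as (mean, sigma, t0, t_forget) tuples.
    """
    phi = len(comps_F) + len(comps_V) + 1.0          # the c + m + 1 offset
    for q_i, sigma_i in comps_F:
        phi -= gaussian(q, q_i, sigma_i)             # f_F terms
    for q_j, sigma_j, t0_j, tf_j in comps_V:
        # Eq. (7): the peak is capped by the decreasing forgetting factor
        phi -= min(gaussian(q, q_j, sigma_j),
                   forgetting_factor(t, t0_j, tf_j))
    return phi
```

In a coverage-control loop, a density of this kind would typically be integrated over each robot's region to locate the weighted centroid targeted by the controller of Sect. 3.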
Algorithm 1: Exploration Algorithm
input : Robot's position p; time delay Δ; list of probability functions fV(q) and fF(q) with mean points q1, ..., q(m+n) and variance σ²
output: Updated probability distribution
begin
  if ‖p − qi‖ > σ, ∀i ∈ [0, (m + n)] then
    if robot is moving then
      Add a new time-varying Gaussian component with mean point corresponding to the position of p, Δ seconds earlier
    if robot has stopped then
      Add a new time-varying Gaussian component with q at the most recent robot position that is at a distance ≥ 0.75σ from p
    if robot lies in an area that needs to be visited only once then
      Add a time-constant Gaussian component with q = p and σ = r/2
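A possible rendering of Algorithm 1 in code is sketched below; the trajectory buffer, the helper position_at, and the value of T_FORGET are our assumptions, introduced only for illustration.

```python
import numpy as np

T_FORGET = 30.0  # assumed forgetting horizon; the paper does not report a value


def position_at(trajectory, t_query):
    """Return the recorded position whose timestamp is closest to t_query."""
    return min(trajectory, key=lambda sample: abs(sample[0] - t_query))[1]


def exploration_step(p, t, trajectory, comps_V, comps_F,
                     sigma, delta, is_moving, visit_once):
    """One iteration of the decision logic in Algorithm 1 (illustrative sketch).

    trajectory: list of (timestamp, position) pairs logged by the robot.
    comps_V: time-varying components as (mean, sigma, t0, t_forget) tuples.
    comps_F: time-constant components as (mean, sigma) tuples.
    """
    means = [q for q, *_ in comps_V] + [q for q, _ in comps_F]
    # Act only when p is farther than sigma from every existing mean point
    if all(np.linalg.norm(p - q) > sigma for q in means):
        if is_moving:
            # Center the new component where the robot was Delta seconds ago,
            # so the centroid does not coincide with the current position
            q_new = position_at(trajectory, t - delta)
            comps_V.append((q_new, sigma, t, T_FORGET))
        else:
            # Robot stopped: take the most recent logged position that is
            # at least 0.75*sigma away, perturbing the local distribution
            q_new = next((q for _, q in reversed(trajectory)
                          if np.linalg.norm(p - q) >= 0.75 * sigma), p)
            comps_V.append((q_new, sigma, t, T_FORGET))
        if visit_once:
            # Non-circulating area: mark it with a time-constant component
            comps_F.append((p.copy(), sigma))
```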
5 Simulations and Experiments

5.1 Implementation
The proposed algorithm has been implemented using ROS 2 [13] and validated by means of extensive simulations and real-world experiments. In particular, several simulation environments representing wide areas populated by a large number of robots were created using the Gazebo simulator. Two series of simulations were conducted: the first aimed at a statistical analysis of the performance of the proposed control strategy, the second at a comparison with other strategies from the literature. For the first series, the number of robots and their starting locations in the environment were chosen randomly. The simulations include configurations with 2, 4, 6 and 8 robots, each with a sensing range r_sens = 4 m and a maximum linear velocity v_MAX = 0.5 m/s. Moreover, various environment dimensions were tested, along with different environment configurations and obstacle locations. In particular, we considered non-convex polygonal environments with areas of 200 m², 400 m², 900 m² and 2500 m². Two of the simulated scenarios are shown in Fig. 2: they differ in number of robots, environment dimension and complexity. For each experimental scenario we ran 20 tests of 5 min of simulation, for an overall total of 320 tests. For each experiment we saved the robots' locations over time, which are used to compute the portion of covered surface. The same statistical approach was followed for the comparison with other strategies. In this case, our solution was compared with the classic potential field navigation algorithm [11] and the random walk solution presented in [7], where robots drive with a constant straight motion, changing their direction only
Fig. 1. Probability distribution of the environment
when an obstacle is detected. As before, the same robots were used in teams of 6, 8, 10 and 12 agents placed at random starting positions within the same 2500 m² environment already implemented. 20 simulations were run for every team configuration, for a total of 80 simulations per method. We also conducted experiments on real robotic platforms exploiting the ROS 2 framework. The tests took place in a limited-size environment with polygonal obstacles, using three TurtleBot3 Burger differential-drive robots (see Fig. 3). The robots' localization with respect to a global reference frame was obtained through an OptiTrack motion capture system. Every robot, following the control algorithm, shares the locally added Gaussian functions with the other robots. We conducted several tests, randomly choosing the initial locations and orientations of the robots. Both in simulations and in real-world experiments, the parameters defining the Gaussian functions were shared among the agents using a central storage unit, where each robot can both save and retrieve data. In addition, we considered non-circulating areas, such as corners or dead-end streets, as places that need to be visited only once, marking them with the time-constant Gaussian functions $f_{F_i}(q)$ defined in (5). Assuming that every robot can detect corners in the environment by adopting a state-of-the-art strategy such as the one proposed in [20], the agents can recognize when a non-circulating area has been visited and consequently define a new time-constant Gaussian function $f_{F_i}(q)$.
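The paper shares the Gaussian parameters through a central storage unit; purely as an illustration, the sketch below shows one alternative way the same five parameters could be exchanged over a plain ROS 2 topic using rclpy. The topic name and message layout are our assumptions, not the authors' design.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float64MultiArray


class GaussianSharer(Node):
    """Exchange Gaussian component parameters with the team over a topic."""

    def __init__(self):
        super().__init__('gaussian_sharer')
        # Hypothetical topic name; five floats per component (see share())
        self.pub = self.create_publisher(Float64MultiArray, '/gaussians', 10)
        self.sub = self.create_subscription(
            Float64MultiArray, '/gaussians', self.on_component, 10)
        self.components = []  # all components known to this robot

    def share(self, qx, qy, sigma, t0, t_forget):
        """Broadcast a locally added component; t_forget <= 0 marks it constant."""
        msg = Float64MultiArray()
        msg.data = [qx, qy, sigma, t0, t_forget]
        self.pub.publish(msg)

    def on_component(self, msg):
        self.components.append(tuple(msg.data))


def main():
    rclpy.init()
    rclpy.spin(GaussianSharer())


if __name__ == '__main__':
    main()
```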
5.2 Results and Discussion
As previously mentioned, a large number of simulations was executed in order to obtain a rigorous statistical analysis of the results. We show that the performance of the control strategy depends on the initial number of robots and is not affected by the environment configuration or by the robots' initial locations. In all the experiments and simulations, the multi-robot system was able to entirely explore the unknown non-convex environment. In particular, the percentage of area covered during the exploration increases with the number of robots used.
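The paper does not detail how the covered portion of the surface is computed from the logged positions; a grid-based approximation such as the following hypothetical sketch is one common choice (the free-cell grid and its resolution are assumptions).

```python
import numpy as np


def covered_fraction(positions, free_cells, r_sens):
    """Fraction of free-space cells sensed by at least one robot over the run.

    positions:  array of shape (T, N, 2), robot positions at T time steps.
    free_cells: array of shape (C, 2), centers of the obstacle-free grid cells.
    r_sens:     sensing range of the robots (4 m in the simulations).
    """
    covered = np.zeros(len(free_cells), dtype=bool)
    for snapshot in positions:                              # one time step
        dists = np.linalg.norm(
            free_cells[:, None, :] - snapshot[None, :, :], axis=2)
        covered |= (dists <= r_sens).any(axis=1)            # any robot covers it
    return float(covered.mean())
```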
Fig. 2. Two of the scenarios tested in simulation using the Gazebo environment. The two experiments differ in number of robots, environment dimension and complexity. The blue circles represent the sensing areas of the robots.
Figure 4 analyzes the results of the simulations conducted in the environment shown in Fig. 2a, which has an area of 400 m². From the results in Fig. 4 we can conclude that 4 robots are sufficient to successfully explore the desired environment, while slightly better performance is obtained with 6 and 8 robots. Figure 5 instead shows the comparison between the results obtained with our solution and with the other strategies. The advantages with respect to the Potential Field
Fig. 3. The experimental set-up of one of the experiments conducted with real robotic platforms. Three TurtleBot3 Burger robots were tasked to explore an unknown non-convex area.
Fig. 4. Explored area in the 400 m² environment
Fig. 5. Comparison with other solutions
navigation are clear: this approach performed poorly because of the large environment used, which contains a great number of local minima where robots get stuck. The comparison with the Random Walk approach also shows that our solution achieves better performance: the multi-robot system was able to explore a larger region of the environment thanks to its ability to coordinate the agents in an optimal way. Finally, real-world experiments have also shown very good exploration of the environment: in the example in Fig. 6, where 3 robots were employed, the covered area reached 90% of the total. This value is in line with the expectations from the simulated trials, since the total area is much smaller than in the simulated environments, but fewer robots were employed.
Fig. 6. Explored area in field trials
6 Conclusions
In this paper we presented a distributed coverage-based control strategy to coordinate a group of autonomous robots with limited sensing capabilities, with the aim of exploring an unknown environment. The proposed control strategy exploits a density distribution to drive the multi-robot system towards areas of the environment still to be explored. The density distribution is built from the combination of multiple Gaussian functions placed by the robots in the environment during their motion. We showed how the full exploration of the unknown environment can be achieved without discretizing the environment through graphs or grid cells, differently from the majority of the studies in the literature. Moreover, the robots only need to store and share the Gaussian parameters across the network, which requires low communication volume and low memory consumption. Several experiments and simulations were performed on a group of mobile robots to validate the performance of the proposed control method and to compare it with other solutions from the literature. As future work, we aim to further investigate the proposed approach in comparison with frontier-based exploration methodologies.
References

1. Arslan, O., Koditschek, D.E.: Sensor-based reactive navigation in unknown convex sphere worlds. Int. J. Robot. Res. 38(2–3), 196–223 (2019)
2. Batalin, M.A., Sukhatme, G.S.: Coverage, exploration and deployment by a mobile robot and communication network. Telecommun. Syst. 26(2), 181–196 (2004)
3. Batalin, M.A., Sukhatme, G.S.: The analysis of an efficient algorithm for robot coverage and exploration based on sensor network deployment. In: Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 3478–3485. IEEE (2005)
4. Corah, M., O'Meadhra, C., Goel, K., Michael, N.: Communication-efficient planning and mapping for multi-robot exploration in large environments. IEEE Robot. Autom. Lett. 4(2), 1715–1721 (2019)
5. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004)
6. De Hoog, J., Cameron, S., Visser, A.: Dynamic team hierarchies in communication-limited multi-robot exploration. In: 2010 IEEE Safety Security and Rescue Robotics, pp. 1–7. IEEE (2010)
7. Francesca, G., Brambilla, M., Brutschy, A., Trianni, V., Birattari, M.: AutoMoDe: a novel approach to the automatic design of control software for robot swarms. Swarm Intell. 8, 1–24 (2014)
8. Franchi, A., Freda, L., Oriolo, G., Vendittelli, M.: The sensor-based random graph method for cooperative robot exploration. IEEE/ASME Trans. Mechatron. 14(2), 163–175 (2009)
9. Hu, J., Niu, H., Carrasco, J., Lennox, B., Arvin, F.: Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning. IEEE Trans. Veh. Technol. 69(12), 14413–14423 (2020)
10. Kegeleirs, M., Garzón Ramos, D., Birattari, M.: Random walk exploration for swarm mapping. In: Althoefer, K., Konstantinova, J., Zhang, K. (eds.) TAROS 2019. LNCS (LNAI), vol. 11650, pp. 211–222. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25332-5_19
11. Khatib, O.: Real-time obstacle avoidance for manipulators and mobile robots. In: Cox, I.J., Wilfong, G.T. (eds.) Autonomous Robot Vehicles, pp. 396–404. Springer, New York, NY (1986). https://doi.org/10.1007/978-1-4613-8997-2_29
12. Kim, J.: Cooperative exploration and networking while preserving collision avoidance. IEEE Trans. Cybern. 47(12), 4038–4048 (2016)
13. Macenski, S., Foote, T., Gerkey, B., Lalancette, C., Woodall, W.: Robot Operating System 2: design, architecture, and uses in the wild. Sci. Robot. 7(66), eabm6074 (2022)
14. Masaba, K., Quattrini Li, A.: GVGExp: communication-constrained multi-robot exploration system based on generalized Voronoi graphs. In: 2021 International Symposium on Multi-Robot and Multi-Agent Systems (MRS). IEEE (2021)
15. Pratissoli, F., Capelli, B., Sabattini, L.: On coverage control for limited range multi-robot systems. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2022)
16. Qin, H., et al.: Autonomous exploration and mapping system using heterogeneous UAVs and UGVs in GPS-denied environments. IEEE Trans. Veh. Technol. 68(2), 1339–1350 (2019)
17. Ryu, H.: Graph search-based exploration method using a frontier-graph structure for mobile robots. Sensors 20(21), 6270 (2020)
18. Senthilkumar, K., Bharadwaj, K.K.: Multi-robot exploration and terrain coverage in an unknown environment. Robot. Auton. Syst. 60(1), 123–132 (2012)
19. Stachniss, C.: Coordinated multi-robot exploration. In: Robotic Mapping and Exploration. Springer Tracts in Advanced Robotics, vol. 55, pp. 43–71. Springer, Berlin, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01097-2_4
20. Yan, C., Shao, B., Zhao, H., Ning, R., Zhang, Y., Xu, F.: 3D room layout estimation from a single RGB image. IEEE Trans. Multimed. 22(11), 3014–3024 (2020)
Author Index

A
Abe, Shoma 424
Abraham, Ian 113
Albani, Dario 333

B
Baumann, Cyrill 523
Belal, Mehdi 333
Beltrame, Giovanni 57
Bertoncelli, Filippo 333, 550
Bettini, Matteo 42
Blumenkamp, Jan 42
Booth, Lorenzo 466
Bouvier, Benjamin 494

C
Caccavale, Riccardo 438
Cambier, Nicolas 187
Cao, Yuhong 202
Carey, Nicole E. 140
Carpin, Stefano 466
Catellani, Mattia 550
Chen, Jiahe 509
Chen, Jun 127
Choset, Howie 113
Crosscombe, Michael 14
Cutler, Sadie 509

D
Dames, Philip 127, 300, 378
Defay, Jack A. 257

E
Edwards, Victoria 83, 363
Eiben, A. E. 187
Ermini, Mirko 438
Erunsal, I. Kagan 156

F
Fedeli, Eugenio 438
Ferrante, Eliseo 187
Fierro, Rafael 394
Finzi, Alberto 438
Furukawa, Hidemitsu 424

G
Garone, Emanuele 438
Groß, Roderich 537
Grover, Jaskaran 317

H
Harvey, David 69
Hauert, Sabine 69, 173
Hogg, Elliott 69
Hsieh, M. Ani 83, 216, 347, 363

I
Ishihara, Hisashi 100

K
Karagüzel, Tugay Alperen 187
Kawakami, Masaru 424
Kontoudis, George P. 286
Kortvelesy, Ryan 42
Kumar, Vijay 408
Kunii, Yasuharu 453

L
Latif, Ehsan 243
Lawry, Jonathan 14
Lippiello, Vincenzo 438
Liu, Changliu 317
Liu, Jiazhen 408
Liu, Lantao 273
Lorente, María-Teresa 537

M
Ma, Danna 509
Maeda, Takao 453
Manjanna, Sandeep 347
Marques, João V. Amorim 537
Martinoli, Alcherio 156, 523
Marzat, Julien 494
Mohanty, Nishant 317
Muslimov, Tagir 231

N
Naiseh, Mohammad 28
Nilles, Alexandra Q. 257
Notomista, Gennaro 479
Novick, David 394

O
Ogawa, Jun 424
Osuka, Koichi 100
Otte, Michael 286

P
Pan, Lishuo 347
Parasuraman, Ramviyas 243
Park, Shinkyu 127
Perolini, Jonas 523
Petersen, Kirstin 257, 509
Pierre, Jean-Elie 394
Pratissoli, Federico 333, 550
Prorok, Amanda 42
Pushp, Durgakant 273

R
Ramachandran, Ragesh 408
Ramchurn, Sarvapali 28
Rao, Ananya 113
Ricard, Guillaume 57
Richards, Arthur 69

S
Sabattini, Lorenzo 1, 333, 550
Sakamoto, Kosuke 453
Sartoretti, Guillaume 113, 202
Sato, Toui 453
Shen, Li 216
Shiblee, MD Nahin Islam 424
Silva, Thales C. 83, 216, 363
Sofge, Donald 286
Soorati, Mohammad D. 28
Srivastava, Alkesh Kumar 286
Sueoka, Yuichiro 100
Sukhatme, Gaurav S. 408
Sun, Xiang 394
Sun, Zhanhong 202
Sycara, Katia 317

T
Tavano, Fabrizio 438
Tourki, Emna 523
Tsunoda, Yusuke 100

V
Ventura, Rodrigo 156
Vercellino, Cristina 1
Vielfaure, David 57
Villani, Valeria 1

W
Watanabe, Yosuke 424
Werfel, Justin 140
Wilson, James 173

X
Xin, Pujie 300, 378
Xu, Junhong 273

Y
Yin, Kai 273
Yu, Xi 216

Z
Zhang, Yi 100
Zhou, Lifeng 408

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 563–564, 2024. https://doi.org/10.1007/978-3-031-51497-5
W Watanabe, Yosuke 424 Werfel, Justin 140 Wilson, James 173 X Xin, Pujie 300, 378 Xu, Junhong 273 Y Yin, Kai 273 Yu, Xi 216 Z Zhang, Yi 100 Zhou, Lifeng 408