Robotics: Science and Systems VIII [1 ed.]
9780262315722, 9780262519687
English, 501 pages, 2013

Robotics: Science and Systems VIII

edited by Nicholas Roy, Paul Newman, and Siddhartha Srinivasa

The MIT Press
Cambridge, Massachusetts
London, England

© 2013 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

For information about special quantity discounts, please email special [email protected] or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142.

Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data


Robotics: Science and Systems Conference (8th : 2012 : Sydney)
Robotics : science and systems VIII / edited by Nicholas Roy, Paul Newman, and Siddhartha Srinivasa.
pages cm
Selected papers presented at Robotics: Science and Systems (RSS) 2012, held July 9-13 at the University of Sydney, Sydney, NSW, Australia.
Includes bibliographical references and index.
ISBN 978-0-262-51968-7 (pbk. : alk. paper)
1. Robotics—Congresses. I. Roy, Nicholas. II. Newman, Paul (Paul Michael). III. Srinivasa, Siddhartha. IV. Title.
TJ210.3.R6435 2012
629.8′92—dc23
2012049981

10 9 8 7 6 5 4 3 2 1

Contents

Preface . . . ix
Organizing Committee . . . xi
Program Committee . . . xiii
Workshop Evaluation Committee . . . xvii
Sponsors . . . xix

Estimating Human Dynamics On-the-Fly Using Monocular Video For Pose Estimation
Priyanshu Agarwal, Suren Kumar, Julian Ryde, Jason J. Corso, and Venkat N. Krovi . . . 1

Fast Weighted Exponential Product Rules for Robust General Multi-Robot Data Fusion
Nisar Ahmed, Jonathan Schoenberg, and Mark Campbell . . . 9

State Estimation for Legged Robots: Consistent Fusion of Leg Kinematics and IMU
Michael Bloesch, Marco Hutter, Mark A. Hoepflinger, Stefan Leutenegger, Christian Gehring, C. David Remy, and Roland Siegwart . . . 17

Extrinsic Calibration from Per-Sensor Egomotion
Jonathan Brookshire and Seth Teller . . . 25

Colour-Consistent Structure-from-Motion Models Using Underwater Imagery
Mitch Bryson, Matthew Johnson-Roberson, Oscar Pizarro, and Stefan B. Williams . . . 33

M-Width: Stability and Accuracy of Haptic Rendering of Virtual Mass
Nick Colonnese and Allison Okamura . . . 41

Contextual Sequence Prediction with Application to Control Library Optimization
Debadeepta Dey, Tian Yu Liu, Martial Hebert, and J. Andrew Bagnell . . . 49

Physics-Based Grasp Planning Through Clutter
Mehmet R. Dogar, Kaijen Hsiao, Matei Ciocarlie, and Siddhartha S. Srinivasa . . . 57

FFT-Based Terrain Segmentation for Underwater Mapping
B. Douillard, N. Nourani-Vatani, M. Johnson-Roberson, S. Williams, C. Roman, O. Pizarro, I. Vaughn, and G. Inglis . . . 65

Formalizing Assistive Teleoperation
Anca D. Dragan and Siddhartha S. Srinivasa . . . 73

Reducing Conservativeness in Safety Guarantees by Learning Disturbances Online: Iterated Guaranteed Safe Online Learning
Jeremy H. Gillula and Claire J. Tomlin . . . 81

What’s in the Bag: A Distributed Approach to 3D Shape Duplication with Modular Robots
Kyle W. Gilpin and Daniela Rus . . . 89

An Object-Based Approach to Map Human Hand Synergies onto Robotic Hands with Dissimilar Kinematics
G. Gioioso, G. Salvietti, M. Malvezzi, and D. Prattichizzo . . . 97

What Types of Interactions Do Bio-Inspired Robot Swarms and Flocks Afford a Human?
Michael A. Goodrich, Brian Pendleton, Sean Kerman, and P. B. Sujit . . . 105

Real-Time Inverse Dynamics Learning for Musculoskeletal Robots Based on Echo State Gaussian Process Regression
Christoph Hartmann, Joschka Boedecker, Oliver Obst, Shuhei Ikemoto, and Minoru Asada . . . 113

Recognition, Prediction, and Planning for Assisted Teleoperation of Freeform Tasks
Kris Hauser . . . 121

Hybrid Operational Space Control for Compliant Legged Systems
Marco Hutter, Mark A. Hoepflinger, Christian Gehring, Michael Bloesch, C. David Remy, and Roland Siegwart . . . 129

Modeling and Prediction of Pedestrian Behavior Based on the Sub-Goal Concept
Tetsushi Ikeda, Yoshihiro Chigodo, Daniel Rea, Francesco Zanlungo, Masahiro Shiomi, and Takayuki Kanda . . . 137

Asymptotically-Optimal Path Planning on Manifolds
Léonard Jaillet and Josep M. Porta . . . 145

Minimal Coordinate Formulation of Contact Dynamics in Operational Space
Abhinandan Jain, Cory Crean, Calvin Kuo, Steven Myint, and Hubertus von Bremen . . . 153

Nonparametric Bayesian Models for Unsupervised Scene Analysis and Reconstruction
Dominik Joho, Gian Diego Tipaldi, Nikolas Engelhard, Cyrill Stachniss, and Wolfram Burgard . . . 161

Distributed Approximation of Joint Measurement Distributions Using Mixtures of Gaussians
Brian J. Julian, Stephen L. Smith, and Daniela Rus . . . 169

Robust Object Grasping Using Force Compliant Motion Primitives
Moslem Kazemi, Jean-Sebastien Valois, J. Andrew Bagnell, and Nancy Pollard . . . 177

Multi-Stage Micro Rockets for Robotic Insects
Mirko Kovač, Maria Bendana, Rohit Krishnan, Jessica Burton, Michael Smith, and Robert J. Wood . . . 185

Feature-Based Prediction of Trajectories for Socially Compliant Navigation
Markus Kuderer, Henrik Kretzschmar, Christoph Sprunk, and Wolfram Burgard . . . 193

Variational Bayesian Optimization for Runtime Risk-Sensitive Control
Scott Kuindersma, Roderic Grupen, and Andrew Barto . . . 201

Time-Optimal Trajectory Generation for Path Following with Bounded Acceleration and Velocity
Tobias Kunz and Mike Stilman . . . 209

Towards A Swarm of Agile Micro Quadrotors
Alex Kushleyev, Daniel Mellinger, and Vijay Kumar . . . 217

CompAct™ Arm: a Compliant Manipulator with Intrinsic Variable Physical Damping
Matteo Laffranchi, Nikos G. Tsagarakis, and Darwin G. Caldwell . . . 225

Robust Loop Closing Over Time
Yasir Latif, Cesar Cadena, and José Neira . . . 233

Optimization-Based Estimator Design for Vision-Aided Inertial Navigation
Mingyang Li and Anastasios I. Mourikis . . . 241

Practical Route Planning Under Delay Uncertainty: Stochastic Shortest Path Queries
Sejoon Lim, Christian Sommer, Evdokia Nikolova, and Daniela Rus . . . 249

A Distributable and Computation-Flexible Assignment Algorithm: From Local Task Swapping to Global Optimality
Lantao Liu and Dylan A. Shell . . . 257

The Banana Distribution Is Gaussian: A Localization Study with Exponential Coordinates
Andrew W. Long, Kevin C. Wolfe, Michael J. Mashner, and Gregory S. Chirikjian . . . 265

Recognition and Pose Estimation of Rigid Transparent Objects with a Kinect Sensor
Ilya Lysenkov, Victor Eruhimov, and Gary Bradski . . . 273

Towards Persistent Localization and Mapping with a Continuous Appearance-Based Topology
William Maddern, Michael Milford, and Gordon Wyeth . . . 281

Robust Navigation Execution by Planning in Belief Space
Bhaskara Marthi . . . 289

Visual Route Recognition with a Handful of Bits
Michael J. Milford . . . 297

Exploiting Passive Dynamics with Variable Stiffness Actuation in Robot Brachiation
Jun Nakanishi and Sethu Vijayakumar . . . 305

Inference on Networks of Mixtures for Robust Robot Mapping
Edwin Olson and Pratik Agarwal . . . 313

Turning-Rate Selective Control: A New Method for Independent Control of Stress-Engineered MEMS Microrobots
Igor Paprotny, Christopher G. Levey, Paul K. Wright, and Bruce R. Donald . . . 321

Affine Trajectory Deformation for Redundant Manipulators
Quang-Cuong Pham and Yoshihiko Nakamura . . . 329

E-Graphs: Bootstrapping Planning with Experience Graphs
Mike Phillips, Benjamin Cohen, Sachin Chitta, and Maxim Likhachev . . . 337

Walking and Running on Yielding and Fluidizing Ground
Feifei Qian, Tingnan Zhang, Chen Li, Pierangelo Masarati, Aaron M. Hoover, Paul Birkmeyer, Andrew Pullin, Ronald S. Fearing, and Daniel I. Goldman . . . 345

On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference
Konrad Rawlik, Marc Toussaint, and Sethu Vijayakumar . . . 353

Failure Anticipation in Pursuit-Evasion
Cyril Robin and Simon Lacroix . . . 361

Tendon-Driven Variable Impedance Control Using Reinforcement Learning
Eric Rombokas, Mark Malhotra, Evangelos Theodorou, Emanuel Todorov, and Yoky Matsuoka . . . 369

Guaranteeing High-Level Behaviors While Exploring Partially Known Maps
Shahar Sarid, Bingxin Xu, and Hadas Kress-Gazit . . . 377

Development of a Testbed for Robotic Neuromuscular Controllers
Alexander Schepelmann, Michael D. Taylor, and Hartmut Geyer . . . 385

Experiments with Balancing on Irregular Terrains Using the Dreamer Mobile Humanoid Robot
Luis Sentis, Josh Petersen, and Roland Philippsen . . . 393

Parsing Indoor Scenes Using RGB-D Imagery
Camillo J. Taylor and Anthony Cowley . . . 401

Toward Information Theoretic Human-Robot Dialog
Stefanie Tellex, Pratiksha Thaker, Robin Deits, Dimitar Simeonov, Thomas Kollar, and Nicholas Roy . . . 409

Efficiently Finding Optimal Winding-Constrained Loops in the Plane
Paul Vernaza, Venkatraman Narayanan, and Maxim Likhachev . . . 417

On the Structure of Nonlinearities in Pose Graph SLAM
Heng Wang, Gibson Hu, Shoudong Huang, and Gamini Dissanayake . . . 425

Probabilistic Modeling of Human Movements for Intention Inference
Zhikun Wang, Marc Peter Deisenroth, Heni Ben Amor, David Vogt, Bernhard Schölkopf, and Jan Peters . . . 433

Optimization of Temporal Dynamics for Adaptive Human-Robot Interaction in Assembly Manufacturing
Ronald Wilcox, Stefanos Nikolaidis, and Julie Shah . . . 441

Optimal Control with Weighted Average Costs and Temporal Logic Specifications
Eric M. Wolff, Ufuk Topcu, and Richard M. Murray . . . 449

Probabilistic Temporal Logic for Motion Planning with Resource Threshold Constraints
Chanyeol Yoo, Robert Fitch, and Salah Sukkarieh . . . 457

Hierarchical Motion Planning in Topological Representations
Dmitry Zarubin, Vladimir Ivan, Marc Toussaint, Taku Komura, and Sethu Vijayakumar . . . 465

Rigidity Maintenance Control for Multi-Robot Systems
Daniel Zelazo, Antonio Franchi, Frank Allgöwer, Heinrich H. Bülthoff, and Paolo Robuffo Giordano . . . 473


Preface

This volume contains the 61 papers presented at Robotics: Science and Systems (RSS) 2012, held July 9-13 at the University of Sydney, Sydney, NSW, Australia. This year 177 papers were submitted, each of which was reviewed by four program committee members. Final decisions were made during an in-person area chair committee meeting, yielding an acceptance rate of 35%. Readers will recall that at RSS 2011, 45 papers were accepted from a field of 183.

Beyond doubt, RSS continues to represent the best work from across the whole spectrum of the robotics science endeavour, and as ever the diversity of papers submitted (and accepted) continues to astound; this volume bears ample witness to that. This, in conjunction with RSS's single-track format, furnishes the attendee and reader with a tour of what is judged by peer review to be new, exciting and sometimes challenging. This preface, then, is an appropriate place to thank you, the robotics community, for reviewing all the submitted papers and putting up with a stream of badgering emails from the area and program chairs. Without the hard work of reviewers, RSS would be greatly impoverished. Great thanks is due.

This year RSS took on a different format. In the hope that change is good, we tried to integrate workshops, plenaries, early career spotlights, technical paper talks and e-poster sessions. Every day began with a morning of workshops or tutorials. Here one could find discussions and interactions covering the latest and best of robotics science. The afternoon sessions began with a plenary from one of our five outstanding invited speakers. They presented both visions of the future and revelations of the state of the art in fields that one might not immediately associate with robotics but surely should or soon will:

• Dr. Anders Sandberg from the University of Oxford gave a talk titled “The Robot and the Philosopher: Charting Progress at the Turing Centenary.”
• Prof. Mandyam Srinivasan FRS from the University of Queensland gave a talk titled “Small Brains, Small Planes.”
• Prof. Michelle Simmons from the University of New South Wales gave a talk titled “Single Atom Devices for Quantum Computing.”
• Prof. Zoubin Ghahramani from Cambridge University gave a talk titled “Machine Learning as Probabilistic Modeling.”
• Dr. Andrew Howard from SpaceX gave a talk titled “Enter the Dragon: The SpaceX COTS Missions.”¹

¹ With perhaps the most astounding tales of engineering gusto heard for some time.

The afternoons proceeded with a mix of best paper finalist talks and early career spotlight talks, the latter given by three rising and recognised stars in the robotics community:

• Jan Peters from Technische Universität Darmstadt gave a talk titled “Towards Motor Skill Learning for Robotics.”
• Charlie Kemp from Georgia Institute of Technology gave a talk titled “Mobile Manipulation for Healthcare.”
• Cyrill Stachniss from the University of Freiburg gave a talk titled “Towards Lifelong Navigation for Mobile Robots.”

A highlight every afternoon was the all-important rapid-fire e-poster teaser talks, punctuated and regulated by the RSS gong. This year, instead of having one gargantuan poster session, each day ended with a smaller, refreshment-enhanced e-poster session with around 15 presenters. We hoped that these sessions would allow authors and attendees to interact in meaningful, two-way discussion without attendees being overwhelmed by 61 poster shows running in parallel. The jury is out, but the wine was good.

Of course RSS 2012 was only possible because of industrial and institutional sponsoring. Special thanks to NICTA for Platinum-level sponsorship; ONR Global and the Australian DSTO for Gold sponsorship; Google and NSW Trade and Investment for Silver; and Willow Garage, CSIRO, ARAA and Springer for Bronze sponsorship. Thanks also to Google for funding the best paper award and to Springer for funding the best student paper award. We would like to recognize our technical sponsors: the IEEE Robotics and Automation Society, the Association for the Advancement of Artificial Intelligence, the International Foundation of Robotics Research and the Robotics Society of Japan.

A huge thanks must go to the area chairs, who once again brought panache, insight and professionalism to the paper reviewing process. There was a lot to do, but it was worth it. They were: Pieter Abbeel (University of California, Berkeley), Tim Barfoot (University of Toronto), Han-Lim Choi (KAIST), Noah Cowan (Johns Hopkins University), Andrew Davison (Imperial College), Frank Dellaert (Georgia Institute of Technology), Ryan Eustice (University of Michigan), Seth Hutchinson (UIUC), Takayuki Kanda (ATR), George Kantor (Carnegie Mellon University), Hadas Kress-Gazit (Cornell University), Katherine Kuchenbecker (University of Pennsylvania), Ingmar Posner (University of Oxford), Cédric Pradalier (ETH Zürich), Luis Sentis (University of Texas at Austin), Siddhartha Srinivasa (Carnegie Mellon University), Cyrill Stachniss (University of Freiburg), Salah Sukkarieh (University of Sydney), Juan Tardós (University of Zaragoza), Sethu Vijayakumar (University of Edinburgh) and Gordon Wyeth (QUT).

We save the last and greatest thanks for Stefan Williams and Fabio Ramos for their unflappable approach to local arrangements, to Dylan Shell for elegant and timely action as Publicity Chair, to José Neira for building an outstanding program of workshops and tutorials, to Siddhartha Srinivasa for being the final gatekeeper of quality as Publications Chair, and finally, to Anca Dragan and Mehmet Dogar for dealing with the business of getting the conference booklet and this tome together. If this conference was a success, it was in no small part because of these folk.

Now please read on and enjoy what RSS 2012 brought to the community, either here on these pages or online (including color images) at:
http://www.roboticsproceedings.org/rss08/index.html

Nicholas Roy, Massachusetts Institute of Technology
Paul Newman, University of Oxford
Siddhartha Srinivasa, Carnegie Mellon University
July 2012

Organizing Committee General Chair

Nicholas Roy, Massachusetts Institute of Technology

Program Chair

Paul Newman, University of Oxford

Local Arrangement Co-Chairs

Stefan B. Williams, University of Sydney
Fabio Ramos, University of Sydney

Workshops Chair

José Neira, Universidad de Zaragoza

Publicity Chair

Dylan Shell, Texas A&M

Publications Chair

Siddhartha Srinivasa, Carnegie Mellon University

Area Chairs

Pieter Abbeel, UC Berkeley
Tim Barfoot, University of Toronto
Han-Lim Choi, KAIST
Noah Cowan, Johns Hopkins University
Andrew Davison, Imperial College
Frank Dellaert, Georgia Institute of Technology
Ryan Eustice, University of Michigan
Seth Hutchinson, UIUC
Takayuki Kanda, ATR
George Kantor, Carnegie Mellon University
Hadas Kress-Gazit, Cornell University
Katherine Kuchenbecker, University of Pennsylvania
Ingmar Posner, Oxford University
Cédric Pradalier, ETH Zürich
Luis Sentis, University of Texas at Austin
Siddhartha Srinivasa, Carnegie Mellon University
Cyrill Stachniss, University of Freiburg
Salah Sukkarieh, University of Sydney
Juan Tardós, Universidad de Zaragoza
Sethu Vijayakumar, University of Edinburgh
Gordon Wyeth, QUT


RSS Foundation Board President

Sebastian Thrun, Stanford University

Directors

Oliver Brock, Technische Universität Berlin
Dieter Fox, University of Washington
Yoky Matsuoka, University of Washington
Stefan Schaal, University of Southern California
Gaurav Sukhatme, University of Southern California
Jeff Trinkle, Rensselaer Polytechnic Institute

Treasurer

Wolfram Burgard, University of Freiburg

Secretary

Nicholas Roy, Massachusetts Institute of Technology


Program Committee

Abbott, Jake Adams, Martin Ahmed, Nisar Akin, Emrah Andreasson, Henrik Asfour, Tamim Atkeson, Chris Ayanian, Nora Bachrach, Abe Bailey, Tim Bajcsy, Ruzena Balasubramanian, Ravi Balkcom, Devin Beall, Chris Beer, Jenay Beetz, Michael Behnke, Sven Bekris, Kostas Belta, Calin Benenson, Rodrigo Bennewitz, Maren Berenson, Dmitry Bergbreiter, Sarah Bergerman, Marcel Bertuccelli, Luca Bicchi, Antonio Billard, Aude Bosse, Mike Bretl, Tim Brock, Oliver Bryson, Mitch Buchli, Jonas Burgard, Wolfram Calinon, Sylvain Carpin, Stefano Castellanos, José Çavuşoğlu, Cenk Censi, Andrea Chadwicke, Odest Chaumette, Francois Chernova, Sonia Chirikjian, Greg

Chli, Margarita Choi, Youngjin Chong, NakYoung Choset, Howie Christensen, Henrik Chung, Soon-Jo Chung, Timothy Civera, Javier Clark, Jonathan Clark, Chris Colas, Francis Comport, Andrew Conti, Francois Corke, Peter Crandall, Jacob Degani, Amir Deisenroth, Marc Deng, Xinyan Deshpande, Ashish Doğar, Mehmet Douillard, Bertrand Dragan, Anca Duckett, Tom Dudek, Greg Fairfield, Nathaniel Fitch, Robert Fraisse, Philippe Fraundorfer, Friedrich Frazzoli, Emilio Frew, Eric Frisoli, Antonio Fritz, Mario Full, Robert Furgale, Paul Gassert, Roger Geyer, Hartmut Geyer, Christopher Giguere, Philippe Gillespie, Brent Glas, Dylan Goldberg, Ken Goldman, Dan

Gosselin, Clément Gould, Stephen Grisetti, Giorgio Grizzle, Jessy Grocholsky, Ben Grollman, Dan Gross, Roderich Gutmann, Steffen Ha, Quang Haddadin, Sami Hannaford, Blake Hart, Stephen Hauser, Kris Haydar, Ali Henning, Philippe Hollerbach, John Hollinger, Geoff Hover, Franz How, Jonathan Howard, Matthew Howard, Andrew Howard, Ayanna Hsiao, Kaijen Hsieh, Ani Hsu, David Huang, Haomiao Huang, Albert Hurst, Jonathan Hyun, Baro Ijspeert, Auke Ila, Viorela Imai, Michita Indelman, Vadim Jenkin, Michael Jensfelt, Patric Jo, Sungho Kaess, Michael Kallem, Vinutha Karaman, Sertac Katz, Dov Kelly, Jonathan Kelly, Alonzo

Kim, Jung Kinsey, James Klein, Georg Knepper, Ross Ko, SeongYoung Kodagoda, Sarath Koditschek, Daniel Kolter, Zico Kosecka, Jana Köser, Kevin Kroeger, Torsten Krüger, Norbert Kuemmerle, Rainer Kuhnlenz, Kolja Kumar, Pawan Kuwata, Yoshiaki Kyrki, Ville Lacroix, Simon Lai, Kevin Lamiraux, Florent Lane, David Lavalle, Steve Lee, Jusuk Lee, Joon-Yong Leibe, Bastian Leonard, John Leonardis, Ales Levesque, Vincent Likhachev, Maxim Lilienthal, Achim Little, Jim Liu, Dikai Lozano-Perez, Tomas Lynch, Kevin Maciver, Malcolm Manchester, Ian Mansard, Nicolas Marchand, Eric Maria, Jose Marshall, Joshua Marthi, Bhaskara Martinelli, Agostino Martinez, Oskar Mason, Matt Matthies, Larry

Maye, Jerome Mayol, Walterio McLurkin, James Mei, Christopher Michael, Nathan Michaud, Francois Milford, Michael Minato, Takashi Minguez, Javier Mistry, Michael Mitra, Sayan Moll, Mark Mombaur, Katja Moon, Hyungpil Morgansen, Kristi Morimoto, Jun Mountney, Peter Mourikis, Anastasios Munich, Mario Myung, Hyun Nakanishi, Jun Napp, Nils Neira, José Nelson, Brad Neumann, Gerhard Nuske, Stephen Ny, Jerome Le O’Kane, Jason Oliver, Kai Olson, Edwin Oriolo, Giuseppe Ott, Christian Pacheco, Lluis Pack, Leslie Padois, Vincent Park, Jaeheung Paz, Lina Peynot, Thierry Pezzementi, Zachary Pfaff, Patrick Philippsen, Roland Piater, Justus Piniés, Pedro Pizarro, Oscar

Pomerleau, Francois Porta, Josep Rajan, Kanna Ramamoorthy, Subramanian Ramos, Fabio Ranganathan, Ananth Rao, Rajesh Ratliff, Nathan Ravela, Sai Ren, Wei Revzen, Shai Riviere, Cam Rock, Steve Rogers, John Roman, Chris Rusu, Radu Salichs, Miguel Saranlı, Uluç Sardellitti, Irene Sattar, Junaed Scaramuzza, Davide Scerri, Paul Schaal, Stefan Schwager, Mac Secchi, Cristian Shah, Julie Shibata, Tomohiro Shiller, Zvi Shin, Jiwon Shiomi, Masahiro Sibley, Gabe Simeon, Nicola Singh, Arjun Smith, Christian Smith, Stephen Smith, Ryan Sousa, João Spaan, Matthijs Spinello, Luciano Stilman, Mike Stoytchev, Alex Strasdat, Hauke Strobl, Klaus Sturm, Juergen

Sukhatme, Gaurav Takayama, Leila Tangorra, James Tanner, Herbert Tapus, Adriana Tedrake, Russ Teo, Rodney Tews, Ashley Tipaldi, Gian Todorov, Emanuel Toussaint, Marc Trawny, Nikolas Trevor, Alex

Triebel, Rudolph Ude, Ales Ueda, Jun Underwood, James Upcroft, Ben Valls, Jamie van den Berg, Jur Vaughan, Richard Vedaldi, Andrea Venkadesan, Madhusudan Walter, Matt Waslander, Steven


Werfel, Justin Whitcomb, Louis Williams, Stephen Wongpiromsarn, Nok Wood, Robert Yamane, Katsu Yguel, Manuel Yoshida, Eiichi Yoshikawa, Yuichiro Zavlanos, Michael Zhang, Fumin Zinn, Mike

Workshop Evaluation Committee

Balkcom, Devin Barfoot, Tim Castellanos, José Dudek, Gregory Eustice, Ryan Grisetti, Giorgio Howard, Ayanna Inamura, Tetsunari Kanda, Takayuki Kantor, George Klavins, Eric

Kragic, Danica Kress-Gazit, Hadas Kumar, Vijay Leonard, John Morimoto, Jun Oh, Paul Pineau, Joelle Posner, Ingmar Roumeliotis, Stergios Saranlı, Uluç Sentis, Luis


Simeon, Nicola Singh, Sanjiv Stachniss, Cyrill Sukkarieh, Salah Tardós, Juan Tedrake, Russ Thomaz, Andrea Wood, Robert Wyeth, Gordon Xiao, Jing Yamane, Katsu

Sponsors

The organizers of Robotics: Science and Systems 2012 gratefully acknowledge the following conference sponsors:

• Platinum Sponsors: NICTA

• Gold Sponsors: ONR Global, Australian DSTO

• Silver Sponsors: Google, NSW Trade and Investment

• Bronze Sponsors: Willow Garage, CSIRO, ARAA, Springer


• Sponsor of Student Travel Fellowships:

• Sponsor of the Best Paper Award: Google

• Sponsor of the Best Student Paper Award: Springer

• Technical Sponsors: IEEE Robotics and Automation Society, Association for the Advancement of Artificial Intelligence, International Foundation of Robotics Research, Robotics Society of Japan

• Organized by:


Estimating Human Dynamics On-the-Fly Using Monocular Video For Pose Estimation

Priyanshu Agarwal†, Suren Kumar†, Julian Ryde‡, Jason J. Corso‡, and Venkat N. Krovi†
†Mechanical and Aerospace Engineering Department
‡Department of Computer Science and Engineering
State University of New York at Buffalo, Buffalo, New York 14260-1660
Email: {priyansh, surenkum, jryde, jcorso, vkrovi}@buffalo.edu

style due to changes in speed, step-length, and mass leading to 3D person tracking with physically plausible poses. The Kneed Walker extension, allowed for capture of subtle aspects of motion such as knee bend and torso lean, even when these are not strongly evident from the images [5]. Vondrak et al. [31] employed a full body 3D simulation-based prior that explicitly incorporated motion control and dynamics into a Bayesian filtering framework for human motion tracking. However, there is always one or more fundamental assumptions involved that there is a priori knowledge about the physical properties (e.g. mass, inertia, limb lengths, ground plane and/or collision geometries), the activity in the scene, calibrated camera, imagery from multiple cameras, availability of similar motion dataset [24, 2, 3, 25, 16, 26, 32]. Furthermore, the obtained pose estimates are susceptible to the error in the physical parameters estimates of the system due to propagation of pose states via a nonlinear dynamics model. Brubaker et al. [4] proposed a framework for estimating contact dynamics and internal torques using 3D pose estimates obtained from two views (roughly sagittal and roughly frontal) of a subject using calibrated cameras, mass and inertial properties determined by combining the body segment length estimated from the motion capture data with standard biomechanical data. However, well established system identification techniques fail to solve the problem of estimating physical human parameters solely using uncalibrated monocular imagery when only partial noisy pose estimates are available due to: (i) partial state knowledge (no pose velocity information); (ii) noise due to imaging inaccuracies; (iii) unknown and insufficient input excitation (both structure and frequency); (iv) low sampling frequency; and (v) inherent dynamics nonlinearity. 
Nevertheless, accurate and efficient kinematic and inertial parameter estimation of articulated multibody systems using noisy partial sensing has broader applications for enhanced telepresence [22], robot navigation [12], vision-based humanoid control, multi-robot cooperation [20] and imitation based robot control [21]. In this work, we build on our previous work [1] wherein a framework for human pose estimation in uncalibrated monocular videos is proposed to obtain the raw pose estimates of lower limbs. A Fast Fourier Transform (FFT) of the raw pose estimates provides information regarding the fundamental fre-

Abstract—Human pose estimation using uncalibrated monocular visual inputs alone is a challenging problem for both the computer vision and robotics communities. From the robotics perspective, the challenge here is one of pose estimation of a multiply-articulated system of bodies using a single nonspecialized environmental sensor (the camera) and thereby, creating low-order surrogate computational models for analysis and control. In this work, we propose a technique for estimating the lowerlimb dynamics of a human solely based on captured behavior using an uncalibrated monocular video camera. We leverage our previously developed framework for human pose estimation to (i) deduce the correct sequence of temporally coherent gap-filled pose estimates, (ii) estimate physical parameters, employing a dynamics model incorporating the anthropometric constraints, and (iii) filter out the optimized gap-filled pose estimates, using an Unscented Kalman Filter (UKF) with the estimated dynamicallyequivalent human dynamics model. We test the framework on videos from the publicly available DARPA Mind’s Eye Year 1 corpus [8]. The combined estimation and filtering framework not only results in more accurate physically plausible pose estimates, but also provides pose estimates for frames, where the original human pose estimation framework failed to provide one.

I. I NTRODUCTION Estimating and tracking 3D pose of humans in unconstrained environments using monocular vision poses several technical challenges due to high-dimensionality of human pose, self-occlusion, unconstrained motions, variability in human motion and appearance, observation ambiguities (left/right limb ambiguity), ambiguities due to camera viewpoint, motion blur and unconstrained lighting [13]. In the past, visual tracking algorithms have relied on kinematic prior models. While these exploit the motion constraints of articulated models, joint limits, and temporal smoothness for improved performance, they have proven insufficient to capture the nonlinearities of human motion [28, 27, 14]. More advanced activity-specific models learned from motion capture data have also been employed that yield approximately correct body configurations on datasets containing similar motions [9, 29, 30]. However, such models often lead to poses that are physically implausible (e.g. foot-skates, out-of-plane rotations) and face difficulty in generalization beyond the learned dataset. Brubaker et al. [6] introduced the notion of a physics-based biomechanical prior-model characterizing lower-body dynamics. A key benefit is the natural accommodation of variations in

1

Fig. 1: Overview of the overall system. The grayed-out boxes represent previous work of the authors that is used in this framework. Pipeline stages: Raw Monocular Video → Video Preprocessing (Silhouette Extraction, Person Detection by Tracking) → Optimization-based 3D Heading/Pose Estimation → (Pose Estimates) → Frequency-based Activity Inferencing and Parameter Estimation, using the Dynamics Model Structure (no input/inertial parameter information) → Non-linear Least Squares Pose-based Pose Sequence and Physical Parameter Estimation → (Pose Sequence, Parameter, and Input Estimates) → Pose Filtering using the Estimated Dynamically-Equivalent Model (Unscented Kalman Filter) → Optimized Pose Sequence and Dynamically Equivalent Model.

...frequency in the estimates, leading to inference about the current activity in the scene. We focus on inertial and kinematic parameter estimation for human lower limbs from monocular video by setting up optimization subproblems that employ the structure of the dynamics model. A simplified dynamics model, the Anthropomorphic Walker, is used, along with optimized anthropometric constraints tuned to generate a stable gait, to identify the human physical parameters together with an optimized sequence of gap-filled pose estimates. Finally, an UKF is used to filter the optimized gap-filled pose estimates, using the continuous-discrete state-space model of the estimated dynamically-equivalent human dynamics model for walking.

II. SYSTEM OVERVIEW

Problem Statement: Given noisy partial pose estimates (hip joint angles) from an uncalibrated monocular video of a human, develop a dynamically equivalent model of the human activity suitable for subsequent use in filtering the pose estimates.

Fig. 1 provides an overview of the proposed system. A raw monocular video is processed to extract the human silhouette and track humans in the scene. The human pose estimation framework is then used to obtain the pose estimates. The lower-limb pose information is used to extract the primary frequency component and infer the activity in the scene. An anthropometrically constrained dynamics model with uncertain physical parameters is simulated to obtain a model-based frequency estimate. In Phase I, a non-linear least squares (NLS) estimator is used to estimate a subset of the parameters that minimizes the frequency disparity (between the measured and simulated frequencies). This is used to bootstrap a Phase II NLS estimator (with 6 uncertain parameters) that minimizes the disparity between measured and simulated poses. These estimates are then used to obtain a dynamically equivalent system, which serves as the process model for an UKF that produces the filtered pose estimates.

III. HUMAN POSE ESTIMATION

The authors' prior work on human pose estimation employs a model-based generative approach for the task of human pose estimation in unrestricted environments. Unlike many previous approaches, the framework is fully automatic, with no manual initialization for monocular video-based pose estimation. We do not use any camera calibration, prior motion (motion capture database), prior activity, appearance, or body shape or size information about the scene. The generative model for estimating the heading direction of the subject in the video uses motion-based cues to constrain the pose search space. Thereafter, a sequential optimization framework estimates the remaining uncoupled pose states (camera parameters, body location, body joint angles), leveraging a combination of deterministic and probabilistic optimization approaches. The framework outputs two probable body pose estimates (human torso location (x, y, scaled z), right/left hip flexion angle, right/left knee flexion angle, human heading direction) for each frame in the video. Figs. 7(a) and 7(b) illustrate the two raw pose estimates (right leg forward and left leg forward, respectively) obtained for a few videos in the DARPA corpus [8]. However, determining the correct temporally coherent sequence of physically plausible pose estimates remained a challenge, which this work addresses for nearly periodic activities.

IV. HUMAN DYNAMICS MODEL

A. The Anthropomorphic Walker

The Anthropomorphic Walker, proposed by McGeer [19] and later modified by Kuo [17] by introducing an impulsive push-off combined with a torsional spring between the legs, permits the generation of a range of stable walking motions on level and slanted ground. We use this simplified dynamics model (Eqns. 1, 2) for modeling human walking.

  Tᵀ M T q̈ = f(t) + Tᵀ M (G − g)    (1)

where q = [φ1, φ2]ᵀ, 𝒯(q) is the kinematic transfer function, M is the mass matrix, T is the Jacobian of 𝒯(q), f(t) is the generalized force vector, G is the gravitational acceleration vector, g is the convective acceleration, and γ is the ground slope angle, which is assumed to be zero in this work. The post-collision velocities are solved using

  T⁺ᵀ M T⁺ q̇⁺ = T⁺ᵀ (S + M T q̇⁻)    (2)

where 𝒯⁺(q) is the kinematic transfer function immediately after the impact, T⁺(q) is the Jacobian of 𝒯⁺(q), q̇⁻ and q̇⁺ are the pre- and post-collision velocities, respectively, and S is the impulse vector.

Fig. 2: Parameters of the Anthropomorphic Walker. Please refer to [6] for details regarding the dynamics model.

B. Incorporation of Empirical Anthropomorphic Constraints

In order to constrain the various physical parameters (lengths, mass, inertia) in the original model, we exploit the empirical anthropometric values provided in [7]. However, since the dynamics model is a simplified version of the actual human anthropometry, the values are not directly applicable. We set up an optimization problem (Eqn. 3) to estimate the optimal length relations such that minimum variation is obtained in step length across multiple steps (i.e., the gait is stable), by simulating the dynamics model for the current set of anthropometric relations. We initialize these relations with the corresponding relations in standard human anthropometry.

  min_{Θa} f(Θa) = Var(lsi)
  subject to: Θal⁽ʲ⁾ ≤ Θa⁽ʲ⁾ ≤ Θau⁽ʲ⁾,  lsi > 0,  ns > nmin    (3)
  Θa = [rR, rC, rs, rth, rγl, rγt]

where Var(·) stands for variance, and lsi, rx, rγx, ns, and nmin represent the step length of the i-th step, the relative fraction of a parameter with respect to the total height of the human, the relative fraction of a radius of gyration with respect to the length of the body part, the number of steps, and the minimum number of foot steps required in simulation, respectively. We use Θal = [0.1, 0.6, 0.2, 0.2, 0.3, 0.4] and Θau = [0.2, 0.7, 0.3, 0.3, 0.4, 0.5] such that the standard relations are well within the bounds. In order to obtain global solutions in the multi-modal space, we use a standard genetic algorithm [11] to solve the optimization problem, with termination criteria given by tolerances on the parameter values (0.001) and the function value (0.01). The optimized length relations, along with the other mass relations, are provided in Table I.

TABLE I: Relative fractions and expressions obtained after optimizing the anthropometric relationships for various physical parameters. Relative fractions are expressed in terms of full body mass and length.

  Body Part                 | Notation | Relative fraction / Expression
  Mass relations
    Torso/Upper body        | mt       | 0.678
    Thigh                   | mth      | 0.1
    Shank                   | ms       | 0.061
    Leg                     | ml       | mth + ms
  Length relations
    Upper body              | lt       | 0.47
    Thigh                   | lth      | 0.23669
    Shank                   | ls       | 0.24556
    Foot radius             | Rf       | 0.14466
  Center-of-mass relations
    Leg (from foot end)     | Cl       | 0.553
  Radius-of-gyration relations
    Leg                     | γl       | 0.326
    Torso                   | γt       | 0.496
  Inertia relations
    Leg                     | Il       | ml × γl²
    Torso                   | It       | mt × γt²

The only unknowns in the model after incorporating these constraints are the total mass (m), total height (h), stiffness (κ) of the spring connecting the two legs, and the magnitude of the foot-ground collision impulse (ι) of the human. However, since the spring stiffness and the magnitude of the foot-ground collision impulse lead to the same effect in the dynamics, we treat the impulsive reaction force as a constant. Note that up to this point, no pose information regarding the specific person in the scene has been exploited.

V. POSE AND PARAMETER ESTIMATION

Once a stable walking model is obtained, its parameters can be optimized to reproduce the characteristic nearly periodic gait of the human in the scene, using the raw pose estimates, in a two-stage approach.

A. Gait Frequency Based Estimation

Estimation of all parameters simultaneously based on pose alone leads at times to incorrect estimates due to aliasing.
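With the Table I fractions fixed, segment parameters follow directly from a subject's total mass and height. A minimal stdlib-only sketch (the interpretation of the radius-of-gyration fractions as fractions of the corresponding segment length follows the definition of rγx above and is our assumption; function and key names are illustrative, not the authors' code):

```python
# Sketch: derive Anthropomorphic Walker segment parameters from total
# body mass m (kg) and height h (m) using the Table I fractions.
FRACTIONS = {
    "torso_mass": 0.678, "thigh_mass": 0.1, "shank_mass": 0.061,
    "torso_len": 0.47, "thigh_len": 0.23669, "shank_len": 0.24556,
    "foot_radius": 0.14466, "leg_com": 0.553,
    "leg_gyration": 0.326, "torso_gyration": 0.496,
}

def walker_parameters(m, h):
    f = FRACTIONS
    m_t = f["torso_mass"] * m
    m_l = (f["thigh_mass"] + f["shank_mass"]) * m   # leg mass = thigh + shank
    l_t = f["torso_len"] * h
    l_leg = (f["thigh_len"] + f["shank_len"]) * h
    return {
        "m_t": m_t, "m_l": m_l, "l_t": l_t, "l_leg": l_leg,
        "R_f": f["foot_radius"] * h,                # rolling foot radius
        "C_l": f["leg_com"] * l_leg,                # leg COM from foot end
        # Inertia = mass * (radius of gyration)^2; gyration radius taken
        # as a fraction of the segment length (assumption, see lead-in).
        "I_l": m_l * (f["leg_gyration"] * l_leg) ** 2,
        "I_t": m_t * (f["torso_gyration"] * l_t) ** 2,
    }

p = walker_parameters(70.0, 1.75)
```

With these relations fixed, the only remaining unknowns carried into Section V are m, h, and κ.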


Fig. 3: Frequency estimation from raw pose estimates for a video in the DARPA corpus. (a) Time evolution of the raw pose estimates (right/left hip, estimates 1 and 2), and (b) single-sided amplitude spectrum of the raw pose estimates.

Fig. 4: Optimized gap-filled and dynamics-simulation-based pose estimates for a video in the DARPA corpus. (a) Right hip angle, and (b) left hip angle.
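The dominant-frequency extraction illustrated in Fig. 3(b) can be sketched with a plain discrete Fourier transform (stdlib only; the 30 fps sampling rate and the synthetic 1.17 Hz hip-angle signal are illustrative assumptions, and the 2.0 Hz walk/run threshold is the one attributed to [10] in the text):

```python
import cmath, math

def dominant_frequency(signal, fps):
    """Return the frequency (Hz) of the largest non-DC DFT bin."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]          # remove DC component
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):              # single-sided spectrum
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fps / n

fps = 30.0
t = [i / fps for i in range(120)]           # 4 s of synthetic "video"
hip = [0.5 * math.sin(2 * math.pi * 1.17 * ti) for ti in t]
fe = dominant_frequency(hip, fps)
activity = "walking" if fe < 2.0 else "running"   # threshold from [10]
```

With a 4 s window the bin spacing is 0.25 Hz, so the recovered peak lands on the bin nearest 1.17 Hz; a longer window or interpolation across bins would sharpen the estimate.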

Physical parameter estimation that first minimizes the difference between the simulated and estimated gait frequencies reduces this error. Fig. 3 shows the raw pose estimates and the single-sided amplitude spectrum of the raw pose estimates obtained using the FFT. We take the average of the frequency estimates obtained using the right (frh1, frh2) and the left (flh1, flh2) hip raw pose estimates for both raw estimates (Eqn. 4). The range in which the frequency of the raw pose estimates lies determines the periodic activity of the human in the scene. Fujiyoshi and Lipton [10] have shown that it is possible to classify motions into walking and running based on the frequency of the cyclic motion, and that a threshold of 2.0 Hz correctly classifies 97.5% of the target motions. In this work, we consider the case where the human activity is walking (fe ≈ 1.17 Hz) to illustrate the concept. More general, complex dynamics models can be built to cater to multiple nearly periodic activities. In this Phase I NLS estimation problem (Eqn. 5), we seek to minimize the difference between the estimated frequency (fe) and the frequency obtained using the dynamics simulation (fs).

  fe = (frh1 + flh1 + frh2 + flh2) / 4    (4)

  min_Θ f(Θ) = |fs − fe|    (5)
  subject to: Θl ≤ Θ ≤ Θu,  Θ = [m, h, κ]

We use Θl = [1, 0.1, 0.1] and Θu = [200, 2, 100] so that the optimization problem remains generalizable.

B. Pose Based Estimation

In Phase II, we tune the full parameter set of the model to minimize the difference between the dynamics-simulation-based states and the estimated states (chosen based on proximity to the simulated state for the current frame). Initializing the dynamics model with the noisy raw pose states also results in unstable walking (due to noise in the estimation), so the initial states are estimated during the parameter estimation as well. The objective is to minimize the difference between the dynamics-simulation-based states and the estimated raw states (Eqn. 6).

  min_Θ f(Θ) = Σ_{j=1}^{N} min_i ‖Xsj − Xei,j‖
  subject to: Θl ≤ Θ ≤ Θu    (6)
  Θ = [m, h, κ, φ10, φ20, φ̇10, φ̇20]

where X = [φ1, φ2], Xs and Xei, i = 1, 2, represent the simulated and the two estimated poses, respectively, and Θ represents the unknown parameters to be estimated. Furthermore, experiments showed that assuming φ20 = −φ10 resulted in a stable walking gait, which was otherwise difficult to obtain. We use Θl = [1, 0.1, 0.1, −π/2, −π/2, −3π/2, −3π/2] and Θu = [200, 2, 100, π/2, π/2, 3π/2, 3π/2]. We use the function-evaluation-based Nelder-Mead simplex method [18] (derivative-based methods are inapplicable in the absence of a closed-form solution for the states) to solve both optimization problems, with termination criteria given by tolerances on the parameter values (0.01) and the function value (0.01) and a maximum number of function evaluations (100000). During the optimization for pose and physical parameters we obtain (i) the correct sequence of the pose estimates as well as (ii) the estimates of the physical parameters of the system, resulting in a dynamically-equivalent model of the human in the scene.

Raw pose estimates are missing for many frames because the pose estimation algorithm is unable to provide reliable estimates for them, owing to: (i) improper silhouette segmentation due to insignificant lower-limb movement, large shadows, high similarity in appearance between foreground and background, or merging of the human silhouette with other entities; and (ii) partial occlusion of the human in the scene. We substitute the dynamics-simulation-based states for the states available from simulation (hip angles) and use linear interpolation for the remaining states. Fig. 4 shows the optimized gap-filled and dynamics-simulation-based pose estimates for the hip angles in a video.

In order to filter the optimized pose estimates using the estimated dynamically-equivalent model, we first linearized the dynamics model. However, the linearization does not truly capture the non-linearity in the model, especially because a strong non-linearity is present in the system (due

to impulsive collision of the foot with the ground). This leads to unstable walking, rendering the use of an Extended Kalman Filter infeasible. Hence, in lieu of computationally expensive particle filters, we explore the use of an UKF for filtering the optimized states, using the dynamically equivalent model of the human in the scene.
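The per-frame choice between the two raw pose candidates (by proximity to the simulated state) and the gap filling with simulated states, as described above, can be sketched as follows (stdlib only; the data layout and names are illustrative, not the authors' code):

```python
def select_and_fill(candidates, simulated):
    """For each frame, pick the raw candidate closest to the simulated
    state; substitute the simulated state where no candidate exists.

    candidates: per-frame tuple of candidate (phi1, phi2) pairs, or None
                for frames where pose estimation failed.
    simulated:  per-frame (phi1, phi2) from the dynamics simulation.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    fused = []
    for cand, sim in zip(candidates, simulated):
        if cand is None:                  # gap: fall back to simulation
            fused.append(sim)
        else:                             # pick nearest candidate
            fused.append(min(cand, key=lambda c: dist(c, sim)))
    return fused

sim = [(0.1, -0.1), (0.2, -0.2), (0.3, -0.3)]
raw = [((0.12, -0.09), (-0.09, 0.12)),    # estimate 1 is closer
       None,                              # missing frame
       ((-0.28, 0.31), (0.28, -0.31))]    # estimate 2 is closer
fused = select_and_fill(raw, sim)
```

This resolves the left/right limb ambiguity frame by frame, which is why the fused sequence no longer exhibits the limb swapping seen in the raw estimates.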

VI. UNSCENTED KALMAN FILTERING

The original discrete-time Unscented Kalman Filter [15] works satisfactorily when small time steps are considered. However, for the problem of human pose filtering, the measurement update rate (Δt) is governed by the frame rate of the video being processed, which is low even for high-frame-rate cameras (∼100 Hz), especially given the significant non-linearity in the dynamics. Furthermore, the presence of a discontinuity in the system dynamics requires a continuous-discrete Kalman filter to be employed, especially because the occurrence of the discontinuity significantly affects the stability of the system. The use of a continuous-time non-linear state-space model ensures that the occurrence of the discontinuity is accurately captured in time, which is imperative for the stability of the system. In addition, the resulting change in the system model requires careful switching of the states and of the entries in the associated state and error covariance matrices, as explained in Algorithm 1. We use the following continuous-discrete system model (Eqn. 7), derived from the dynamics model (Eqn. 1) for human walking, with the states switching roles (φ1, φ̇1 become φ2, φ̇2, respectively, and vice versa) when a collision occurs:

  ẋ1(t) = φ̇1(t)
  ẋ2(t) = φ̇2(t)
  ẋ3(t) = f1(t, φ1(t), φ2(t), φ̇1(t), φ̇2(t)) + w1(t)
  ẋ4(t) = f2(t, φ1(t), φ2(t), φ̇1(t), φ̇2(t)) + w2(t)
  y1(k) = φ1(k) + v1(k),  y2(k) = φ2(k) + v2(k)    (7)

where w(t) = [w1(t) w2(t)]ᵀ ∼ N(0, Q), v(k) = [v1(k) v2(k)]ᵀ ∼ N(0, R), and
  E(w(t1)w(t2)ᵀ) = δ(t1 − t2) Q,  E(v(i)v(j)ᵀ) = δij R,  E(v(k)w(t)ᵀ) = 0, ∀ i, j.

Algorithm 1 Pseudocode for the UKF implemented for a continuous-discrete system model.

Step 1. Initialization:
  x̂0 = [φ1, φ2, φ̇1, φ̇2]ᵀ_optimized,  Px0 = diag(0.01, 0.01, 0.5, 0.5),
  Q0 = diag(10, 10),  R0 = diag(0.01, 0.01),
  P0ᵃ = blkdiag(Pxk, Qk, Rk)

Step 2. For k = 1, ..., N:
  Step 2.1 Calculate the sigma points {χᵃ_{k,i} ∈ Rᴸ | i = 0, ..., 2L}, where L = nx(= 4) + nw(= 2) + nv(= 2):
    χᵃ_{k−1} = [x̂ᵃ_{k−1},  x̂ᵃ_{k−1} + γ S_{k−1},  x̂ᵃ_{k−1} − γ S_{k−1}],  where Pᵃ_{k−1} = S_{k−1} S_{k−1}ᵀ
  Step 2.2 Time-update equations, for i = 0, ..., 2L:
    χ_{t|k−1,i} = f(t, χˣ_{k−1,i}, χʷ_{k−1,i}),  t ∈ [0, Δt]
  Step 2.3 If a collision occurred at t = tc:
    φ̇⁻(tc) = [χ_{tc,i,3}, χ_{tc,i,4}]ᵀ
    φ̇⁺(tc) = (T⁺ᵀ M T⁺)⁻¹ T⁺ᵀ (S + M T⁻ φ̇⁻(tc))
    Swap the states (the stance leg becomes the swing leg and vice versa):
    χ_{tc|k−1,i} = [χ_{tc,i,2}, χ_{tc,i,1}, φ̇2⁺(tc), φ̇1⁺(tc)]ᵀ
    χ_{t|tc,i} = f(t, χˣ_{tc|k−1,i}, χʷ_{k−1,i}),  t ∈ [tc, Δt]
    χ_{k|k−1,i} = [χ_{Δt,i,2}, χ_{Δt,i,1}, χ_{Δt,i,4}, χ_{Δt,i,3}]ᵀ
  Step 2.4 If no collision occurred:
    χ_{k|k−1,i} = [χ_{Δt,i,2}, χ_{Δt,i,1}, χ_{Δt,i,4}, χ_{Δt,i,3}]ᵀ
    x̂⁻_k = Σ_{i=0}^{2L} wᵢᵐ χˣ_{k|k−1,i}
    P⁻_{x,k} = Σ_{i=0}^{2L} wᵢᶜ (χˣ_{k|k−1,i} − x̂⁻_k)(χˣ_{k|k−1,i} − x̂⁻_k)ᵀ
  Step 2.5 Measurement-update equations, for i = 0, ..., 2L:
    χʸ_{k|k−1,i} = h(χˣ_{k|k−1,i}, χᵛ_{k−1,i})
    ŷ⁻_k = Σ_{i=0}^{2L} wᵢᵐ χʸ_{k|k−1,i}
    P_{eyey} = Σ_{i=0}^{2L} wᵢᶜ (χʸ_{k|k−1,i} − ŷ⁻_k)(χʸ_{k|k−1,i} − ŷ⁻_k)ᵀ
    P_{exey} = Σ_{i=0}^{2L} wᵢᶜ (χˣ_{k|k−1,i} − x̂⁻_k)(χʸ_{k|k−1,i} − ŷ⁻_k)ᵀ
  Step 2.6 If a collision occurred, swap the entries in the covariance matrices P⁻_{x,k}, P_{eyey}, and P_{exey} to accommodate the swapping of the states.
  Step 2.7 Kalman update equations:
    K_k = P_{exey} P_{eyey}⁻¹
    x̂_k = x̂⁻_k + K_k (y_k − ŷ⁻_k)
    P_{x,k} = P⁻_{x,k} − K_k P_{eyey} K_kᵀ
End for

where {wᵢ} is a set of scalar weights given by
  w₀ᵐ = λ/(L + λ),  w₀ᶜ = λ/(L + λ) + (1 − α² + β),
  wᵢᵐ = wᵢᶜ = 1/(2(L + λ)), i = 1, ..., 2L,
  λ = α²(L + κ) − L,  γ = √(L + λ).
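The scalar weights closing Algorithm 1 can be computed directly; a stdlib-only sketch (the α, β, κ values below are illustrative, not taken from the paper; by construction the mean weights sum to one):

```python
import math

def ukf_weights(L, alpha, beta, kappa):
    """Sigma-point weights for the 2L+1 points of a UKF."""
    lam = alpha ** 2 * (L + kappa) - L
    gamma = math.sqrt(L + lam)                 # sigma-point spread
    wm = [lam / (L + lam)] + [1.0 / (2.0 * (L + lam))] * (2 * L)
    wc = list(wm)
    wc[0] += 1.0 - alpha ** 2 + beta           # extra term on w0^c only
    return wm, wc, gamma

# Augmented dimension from Algorithm 1: L = nx + nw + nv = 4 + 2 + 2 = 8.
# alpha, beta, kappa here are illustrative choices.
wm, wc, gamma = ukf_weights(8, alpha=0.5, beta=0.0, kappa=2.0)
```

Note that λ can be negative for small α, so w₀ᵐ can be negative; the weights still sum to one, and L + λ = α²(L + κ) stays positive so γ remains real.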

It was observed that arbitrarily choosing the process and measurement covariances can render the state covariance matrix non-positive-definite due to round-off errors during the filtering process, in which case the filtering cannot continue. We give more weight to the measurements and accordingly choose large values for the process covariance.

Fig. 5: Results for the UKF-filtered states for a video in the DARPA corpus (α = √(2/L), β = 0, κ = 2). (a) Right hip angle, and (b) left hip angle; each panel compares the UKF-estimated and optimized hip angles over the time steps.

VII. RESULTS

Fig. 5 shows the filtered pose estimates for the hip angles obtained using the UKF. As is evident, the higher-frequency noise in the raw pose estimates is filtered out and the filtered estimates are temporally coherent. Fig. 7 shows the raw estimates corresponding to right/left leg forward, together with the simulated, optimized, and filtered estimates for the hip angles, rendered on the original video. Figs. 7(a) and 7(b) show the raw estimates, with the frames with missing pose estimates rendered at reduced color intensity. Note that the limb identities are not maintained (the left leg becomes the right and vice versa) in these raw estimates. Fig. 7(c) shows that the hip angles generated using the estimated dynamics very closely explain the true hip angles, even for the frames where no raw pose estimates were available. Note that a subset of the pose variables (left/right knee angles, torso location, and heading estimate) for the frames with missing pose estimates is obtained via linear interpolation. Fig. 7(d) shows the optimized pose estimates obtained by choosing the correct raw pose estimates where they were available and substituting the dynamics-based pose estimates where the raw estimates were missing. Fig. 7(e) shows the pose estimates obtained after filtering the optimized pose estimates using an UKF with the estimated dynamics model as the process model. The filtered poses show that not only were the poor raw pose estimates replaced by better ones, but the frames for which no pose estimates were present were also filled with accurate pose estimates.

Table II presents the results obtained after filtering the optimized pose estimates for a video in the DARPA corpus. Note that the error metric [23] for the filtered poses should ideally not be compared with that of the raw estimates, because the frames where no pose estimates were available are simply filled with the dynamics estimates and interpolation, i.e., no evidence from the images is used. Nevertheless, the error metric computed over the total number of frames is better than that of the raw estimates. Furthermore, filtered pose estimates are now available for 94 frames, as opposed to the 52 frames with noisy raw estimates. Fig. 6 shows the error distribution across the 13 salient marker points considered for the error metric evaluation for the same video. There is a significant reduction in the error corresponding to the left/right foot markers, as there is no swapping

TABLE II: Error metric (L1 norm for 13 salient marker points in the image frame) for the raw and the filtered pose estimates for a video in the DARPA corpus. The original frame resolution is 1280 × 720 and the ground truth was obtained via manual annotation.

  Output   | Average Error Per Marker Per Frame | Number of Frames Processed
  Raw      | 18.33 pixels                       | 52
  Filtered | 16.54 pixels                       | 94

of the limbs in the estimated pose. However, the marker estimates are in general not directly comparable, as the numbers of frames considered differ between the two histograms.

VIII. CONCLUSION

In this work, we propose a technique for estimating the dynamics of a human based solely on uncalibrated monocular video. An UKF is used to filter the optimized gap-filled pose estimates using the continuous-discrete state-space model of the estimated dynamically-equivalent human dynamics model. Results showed that the framework not only produced more accurate pose estimates, but also provided physically plausible pose estimates for frames where the original human pose estimation framework failed to provide one. In the future, we plan to build a multi-model adaptive Unscented Kalman Filter with more complex dynamics models (also simulating knee flexion, torso lean, and arms) that can cater to multiple human activities.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the support of the Defense Advanced Research Projects Agency Mind's Eye Program (W911NF-10-2-0062). The keen insight of the anonymous reviewers is also gratefully acknowledged.

REFERENCES

[1] P. Agarwal, S. Kumar, J. Ryde, J. Corso, and V. Krovi. An optimization based framework for human pose estimation in monocular videos. In International Symposium on Visual Computing, 2012.


Fig. 7: Human pose estimation results at various stages for videos in the DARPA corpus. (a) Raw estimates with left leg forward, (b) raw estimates with right leg forward, (c) optimized simulated dynamics, (d) optimized estimates, and (e) filtered estimates.


Fig. 6: Average error distribution over the 13 markers for the obtained pose estimates for a video in the DARPA corpus. (a) Raw estimates, averaged over both outputs (left leg forward, right leg forward): average = 18.33 pixels, median = 18.55 pixels. (b) Filtered estimates: average = 16.54 pixels, median = 15.04 pixels.

[2] A.O. Balan and M.J. Black. An adaptive appearance model approach for model-based articulated object tracking. In CVPR, volume 1, pages 758–765, 2006.
[3] A.O. Balan, L. Sigal, M.J. Black, J.E. Davis, and H.W. Haussecker. Detailed human shape and pose from images. In CVPR, pages 1–8, 2007.
[4] M.A. Brubaker, L. Sigal, and D.J. Fleet. Estimating contact dynamics. In ICCV, pages 2389–2396, 2010.
[5] M.A. Brubaker and D.J. Fleet. The kneed walker for human pose tracking. In CVPR, pages 1–8, 2008.
[6] M.A. Brubaker, D.J. Fleet, and A. Hertzmann. Physics-based person tracking using simplified lower-body dynamics. In CVPR, pages 1–8, 2007.
[7] A. Bruderlin and T.W. Calvert. Goal-directed, dynamic animation of human walking. ACM SIGGRAPH Computer Graphics, 23(3):233–242, 1989.
[8] DARPA. "DARPA Mind's Eye Year 1 Videos", December 2011. www.visint.org.
[9] A. Elgammal and C.S. Lee. Inferring 3D body pose from silhouettes using activity manifold learning. In CVPR, volume 2, pages II–681, 2004.
[10] H. Fujiyoshi and A.J. Lipton. Real-time human motion analysis by image skeletonization. In IEEE Workshop on Applications of Computer Vision, pages 15–21, 1998.
[11] D.E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[12] O.K. Gupta and R.A. Jarvis. Robust pose estimation and tracking system for a mobile robot using a panoramic camera. In IEEE Conference on Robotics Automation and Mechatronics, pages 533–539, 2010.
[13] Y.W. Hen and R. Paramesran. Single camera 3D human pose estimation: A review of current techniques. In International Conference for Technical Postgraduates, pages 1–8, 2009.
[14] L. Herda, R. Urtasun, and P. Fua. Hierarchical implicit surface joint limits for human body tracking. Computer Vision and Image Understanding, 99(2):189–209, 2005.
[15] S.J. Julier and J.K. Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3):401–422, 2004.
[16] H. Kjellstrom, D. Kragic, and M.J. Black. Tracking people interacting with objects. In CVPR, pages 747–754, 2010.
[17] A.D. Kuo. A simple model of bipedal walking predicts the preferred speed–step length relationship. Journal of Biomechanical Engineering, 123:264, 2001.
[18] J.C. Lagarias, J.A. Reeds, M.H. Wright, and P.E. Wright. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 9:112–147, 1998.
[19] Tad McGeer. Passive dynamic walking. The International Journal of Robotics Research, 9(2):62–82, 1990.
[20] M.B. Nogueira, A.A.D. Medeiros, P.J. Alsina, and N.R.N. Brazil. Pose estimation of a humanoid robot using images from a mobile external camera. In IFAC Workshop on Multivehicle Systems, 2006.
[21] J.P.B. Rubio, C. Zhou, and F.S. Hernández. Vision-based walking parameter estimation for biped locomotion imitation. Computational Intelligence and Bioinspired Systems, pages 677–684, 2005.
[22] M. Sarkis, K. Diepold, and K. Huper. Pose estimation of a moving humanoid using Gauss-Newton optimization on a manifold. In 7th IEEE-RAS International Conference on Humanoid Robots, pages 228–234, 2007.
[23] L. Sigal and M.J. Black. HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion. Brown University TR, 120, 2006.
[24] L. Sigal and M.J. Black. Measure locally, reason globally: Occlusion-sensitive articulated pose estimation. In CVPR, volume 2, pages 2041–2048, 2006.
[25] L. Sigal, A. Balan, and M.J. Black. Combined discriminative and generative articulated pose and non-rigid shape estimation. Advances in Neural Information Processing Systems, 20:1337–1344, 2007.
[26] L. Sigal, M. Isard, H. Haussecker, and M.J. Black. Loose-limbed people: Estimating 3D human pose and motion using non-parametric belief propagation. IJCV, pages 1–34, 2011.
[27] C. Sminchisescu and A. Jepson. Variational mixture smoothing for non-linear dynamical systems. In CVPR, volume 2, pages II–608, 2004.
[28] C. Sminchisescu and B. Triggs. Kinematic jump processes for monocular 3D human tracking. In CVPR, volume 1, pages I–69, 2003.
[29] R. Urtasun, D.J. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking from small training sets. In ICCV, volume 1, pages 403–410, 2005.
[30] R. Urtasun, D.J. Fleet, and P. Fua. 3D people tracking with Gaussian process dynamical models. In CVPR, volume 1, pages 238–245, 2006.
[31] M. Vondrak, L. Sigal, and O.C. Jenkins. Physical simulation for probabilistic motion tracking. In CVPR, pages 1–8, 2008.
[32] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385–1392, 2011.

Fast Weighted Exponential Product Rules for Robust General Multi-Robot Data Fusion

Nisar Ahmed*, Jonathan Schoenberg*, and Mark Campbell
Autonomous Systems Laboratory, Cornell University, Ithaca, New York 14853
e-mail: [nra6, jrs55, mc288]@cornell.edu
*Both authors contributed equally to this work.

Abstract—This paper considers the distributed data fusion (DDF) problem for general multi-agent robotic sensor networks in applications such as 3D mapping and target search. In particular, it focuses on the use of conservative fusion via the weighted exponential product (WEP) rule to combat inconsistencies that arise from double-counting common information between fusion agents. WEP fusion is ideal for fusing arbitrarily distributed estimates in ad-hoc communication network topologies, but current WEP rule variants have limited applicability to general multi-robot DDF. To address these issues, new information-theoretic WEP metrics are presented, along with novel optimization algorithms for efficiently performing DDF within a recursive Bayesian estimation framework. While the proposed WEP fusion methods generalize to arbitrary probability distribution functions (pdfs), emphasis is placed here on the widely used Bernoulli and Gaussian mixture pdfs. Experimental results for multi-robot 3D mapping and target search applications show the effectiveness of the proposed methods.

I. INTRODUCTION

The problem of fusing information from an ensemble of noisy data streams is critical to many existing and soon-to-be-realized robotic systems operating in uncertain dynamic environments. This is particularly true for distributed multi-robot systems requiring distributed perception for applications such as collaborative mapping for exploration, target search/tracking for surveillance, and futuristic unmanned urban transport systems. For sensor agents in a network to share information and perform distributed data fusion (DDF), it is most desirable to establish a scalable, flexible, and robust network over which the robots can transmit and receive information. An ad-hoc, arbitrarily connected network provides scalability for fusion agents to join and drop off the network, flexibility to allow agents to join at any point, and robustness in that multiple links or agents must fail before the network becomes disconnected [5]. Implementation of DDF for general robot sensor networks thus requires conservative data fusion techniques to maintain estimates that avoid inconsistencies due to rumor propagation [3]. A common conservative fusion rule for estimates with Gaussian probability distribution functions (pdfs) with unknown correlation is Covariance Intersection (CI) [11]. This rule is appropriate for certain types of problems, but it is inadequate for handling the non-Gaussian distributions that arise in applications such as target search and 3D mapping. A suitable conservative fusion rule for arbitrary pdfs with unknown correlation is the weighted exponential product (WEP) [1]. The WEP is a generalization of CI to non-Gaussian distributions [9], and different cost metrics to determine an optimal WEP fusion weight have been proposed [3, 9]. However, these existing WEP fusion approaches have drawbacks that can limit their usefulness in robotic DDF, including the theoretical nature of their cost metrics and difficult implementation for arbitrary pdfs.

This paper makes the following contributions: (1) it proposes novel information-theoretic metrics for performing WEP fusion that address these issues and are suitable for fusing arbitrary state estimates shared via ad-hoc network communication topologies in a wide range of robotic DDF applications; (2) it presents new formally consistent algorithms for online implementation of the proposed WEP fusion metrics that can be used to quickly and robustly combine information in a recursive Bayesian estimation framework, with emphasis on the Bernoulli and Gaussian mixture distribution functions used widely in robotics; (3) it demonstrates the effectiveness of the proposed WEP fusion methods for performing online collaborative 3D mapping and 2D target search with multi-robot networks in loopy communication topologies.

A. DDF Preliminaries

Formally, let x_k ∈ R^{n_x} be an n_x-dimensional state to be estimated at discrete time steps k = 0, 1, ... by N independent robotic sensor agents. Assume that each robot i ∈ {1, ..., N} obtains n_y local sensor measurements of x_k in the vector y_i^k with likelihood function p_i(y_i^k | x_k). Let p_i(x_k | Z_i^k) be the local posterior pdf for robot i given the set of all information Z_i^k = {Z_i^{k-1}, y_i^k} available to i before new information Z_j^k arrives from robot j ≠ i, such that local fusion of y_i^k is given by the posterior pdf from Bayes' rule,

p_i(x_k | Z_i^k) ∝ p_i(x_k | Z_i^{k-1}) p_i(y_i^k | x_k).   (1)
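For intuition, the recursion (1) for a single Bernoulli state (as used later for occupancy grid cells) reduces to summing log-likelihood ratios; a minimal sketch of this idea in Python, where the inverse sensor model values in the measurement stream are hypothetical stand-ins:

```python
import math

def logit(p):
    # log-odds of a Bernoulli probability
    return math.log(p / (1.0 - p))

def bayes_update(l_prior, p_occ_given_z):
    # One step of the local Bayesian fusion (1) for a single Bernoulli cell,
    # in log-odds form with a uniform (0.5) prior inside the sensor model.
    return l_prior + logit(p_occ_given_z)

# recursive filtering over a short (hypothetical) measurement stream
l = 0.0  # uninformative prior, p = 0.5
for p_z in [0.7, 0.7, 0.4]:
    l = bayes_update(l, p_z)
p_post = 1.0 / (1.0 + math.exp(-l))  # back to probability
```

The log-odds form makes the recursive structure of (1) explicit: each new measurement simply adds its evidence to the accumulated posterior.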

For generality, the robot networks considered here are assumed to have arbitrary dynamic node-to-node communication topologies, such that: (i) N may vary; (ii) each robot is only aware of the nodes it is connected to; (iii) no robot ever knows the complete global network topology; (iv) no robot knows the receipt status of messages it has sent. Centralized fusion or raw data sharing can fully recover the new information in ∪_{i=1}^N Z_i^k, but such methods scale poorly with


N and are vulnerable to node failures. DDF overcomes such bottlenecks and can recover the centralized Bayesian fusion result for a given pair of robots (i, j) [8], since

p(x_k | Z_i^k ∪ Z_j^k) ∝ p_i(x_k | Z_i^k) p_j(x_k | Z_j^k) / p(x_k | Z_i^k ∩ Z_j^k),   (2)

where the denominator is the common information pdf for the local pdf p_i(x_k | Z_i^k) ∝ p(x_k | Z_i^k ∩ Z_j^k) · p(x_k | Z_{i\j}^k), where Z_{i\j}^k is the exclusive local information set for node i (and similarly for node j). Hence, the common information pdf must be explicitly known for DDF to optimally extract only the new information in Z_i^k ∪ Z_j^k. Note that if the division by p(x_k | Z_i^k ∩ Z_j^k) were ignored in (2), the resulting 'Naive Bayes' fusion rule would always double-count p(x_k | Z_i^k ∩ Z_j^k) and induce inconsistencies via rumor propagation. However, tracking p(x_k | Z_i^k ∩ Z_j^k) in an arbitrary ad-hoc network requires sharing the pedigree of each distribution to be fused; this becomes computationally prohibitive for non-tree communication topologies with loops/cycles [3, 8]. The generality of assumptions (i)-(iv) above leads to difficulties for implementing DDF via graphical model techniques. An alternative method for consistent DDF considered here is the conservative weighted exponential product (WEP) fusion rule,

p_WEP(x_k) ≡ p_WEP(x_k | Z_i^k ∪ Z_j^k) = (1/c) p_i^ω(x_k | Z_i^k) · p_j^{1-ω}(x_k | Z_j^k),   (3)

where ω ∈ [0, 1] and c is a normalizing constant. The WEP rule has two properties that make it well-suited to general multi-robot DDF problems: (1) it is applicable to arbitrary non-Gaussian pdfs, and (2) for any ω ∈ [0, 1], it is guaranteed to avoid double-counting of p(x_k | Z_i^k ∩ Z_j^k) [1]. However, this also means some exclusive information from i and j is always discarded. Hence, the WEP fusion weight ω must be picked carefully to ensure useful information exchange between robots i and j.

B. WEP Fusion Metrics

Ref. [9] considers two popular cost metrics on (3) for optimal selection of ω ∈ [0, 1]. The first cost is the Shannon entropy of p_WEP(x_k), which is motivated by the idea that the absolute uncertainty of (3) should be minimized. Ref. [9] shows that the resulting Shannon fusion rule [3] is a direct non-Gaussian generalization of the Gaussian CI rule.
The second cost selects the ω = ω* corresponding to the Chernoff information between p_i(x_k | Z_i^k) and p_j(x_k | Z_j^k) [4], and is thus called Chernoff fusion [6, 10]. It can be shown that this choice equates the Kullback-Leibler divergences (KLDs) between (3) and the individual robot posteriors, which guarantees that the Chernoff fusion result lies 'half-way' between the original local i and j posteriors in an information gain/loss sense. Both the Shannon and Chernoff fusion rules provide sound information-theoretic justifications for choosing ω, but each rule has potential limitations in the context of general multi-robot DDF, as explained in Section II.
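To illustrate Chernoff weight selection, consider two 1D Gaussian posteriors, for which the WEP rule (3) reduces to covariance intersection. The sketch below (our own illustration, not the implementation of [9]) finds the ω that equates the two KLDs by bisection, exploiting the fact that the KLD difference changes sign between ω = 0 and ω = 1:

```python
import math

def kld_gauss(m1, v1, m2, v2):
    # KL divergence D(N(m1,v1) || N(m2,v2)) for 1D Gaussians
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def wep_gauss(mi, vi, mj, vj, w):
    # WEP of two Gaussians is Gaussian: precisions and information
    # vectors combine linearly (this is exactly covariance intersection)
    lam = w / vi + (1.0 - w) / vj            # fused precision
    eta = w * mi / vi + (1.0 - w) * mj / vj  # fused information vector
    return eta / lam, 1.0 / lam

def chernoff_weight(mi, vi, mj, vj, tol=1e-10):
    # Bisection on f(w) = D(p_w||p_i) - D(p_w||p_j):
    # f(0) >= 0 (p_w = p_j, far from p_i) and f(1) <= 0, so a root exists.
    def f(w):
        m, v = wep_gauss(mi, vi, mj, vj, w)
        return kld_gauss(m, v, mi, vi) - kld_gauss(m, v, mj, vj)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)
```

For two equally confident Gaussians the rule returns ω = 0.5, i.e. the symmetric 'half-way' fusion described above.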

C. WEP Fusion Algorithms

General WEP fusion can present challenging implementation issues when dealing with arbitrary non-Gaussian pdfs. For instance, both Shannon and Chernoff fusion require minimization of integral costs that are in general analytically intractable, making WEP fusion potentially unsuitable for online robotic DDF problems. Furthermore, (3) is generally not available in closed form. Heuristic WEP fusion approximations have been proposed to address these issues [6, 10], but these have no formal guarantees of either correctly minimizing the desired WEP cost or accurately approximating (3). Such heuristic approximations can thus lose information that should be preserved between robots or introduce spurious information.

II. NEW INFORMATION-BASED WEP METRICS

A. Limitations of Existing Metrics

Shannon fusion is not always justified from a general Bayesian estimation perspective, since acquisition of new information can, in many cases, increase the entropy of the exact/centralized Bayesian posterior for non-Gaussian pdfs [1]. Chernoff fusion ignores the possibility that an imbalance in information gain may exist between robots i and j prior to fusion. To address such scenarios, ref. [9] suggests introducing weighting factors into Chernoff fusion equal to the number of measurements taken by each robot, in order to bias ω towards the better informed pdf. However, the number of measurements is not a complete reflection of each robot's information gain, as local differences in sensor quality/characteristics and information obtained from other neighbors in the robot network must also be taken into account.

B. Generalized Information Weighted Chernoff Fusion

The first new fusion rule modifies the measurement weighted Chernoff fusion of [9] to apply weighting factors c_i, c_j ≥ 0 that reflect the actual information gained by each robot i and j, instead of the number of measurements.
This biases ω towards the constituent pdf with greater information content without explicitly counting measurements, so that ω satisfies

c_i · D_KL[ p_WEP(x_k) || p_i(x_k | Z_i^k) ] = c_j · D_KL[ p_WEP(x_k) || p_j(x_k | Z_j^k) ],   (4)

where D_KL[·||·] is the KLD. For problems where y_i^k is always guaranteed to reduce uncertainty (e.g. static linear Gaussian estimation), c_i and c_j can be conveniently estimated as the reciprocals of the entropies of the constituent distributions,

c_i = 1 / H[p_i(x_k | Z_i^k)],   c_j = 1 / H[p_j(x_k | Z_j^k)],   (5)

where H[p(x)] is the Shannon entropy of p(x). This leads to an Entropy Weighted Chernoff Fusion solution that uses a weight ω* ∈ [0, 1] to equate the entropy weighted KLDs between the WEP and original distributions. This biases the result towards the distribution with the lower entropy along the chord that connects the two distributions according to the KLDs.


However, as discussed in Section II-A, the Shannon entropy is not always a suitable indicator of information gain. It is also unsuitable for fusing distributions over continuous states x_k, since the differential Shannon entropy of probability density functions can be negative (whereas c_i, c_j should be non-negative). One possible alternative is to select c_i and c_j to be the information gained in the KLD sense with respect to a common reference distribution p_r(x_k) for robots i and j. If a common prior pdf p(x_0) is available to all agents at k = 0 for initializing Bayesian estimation, then this forms a convenient choice for p_r(x_k), so that

c_i = D_KL[ p_i(x_k | Z_i^k) || p(x_0) ],   (6)

and similarly for c_j. These weights reflect the total information gain of each agent over time with respect to p(x_0) via the combined effects of local Bayesian fusion and DDF. Note that a fixed reference distribution must be used with the KLD, since it is not a true distance metric and does not obey the triangle inequality. The KLD could be replaced by another divergence that is a true metric (e.g. the Hellinger divergence [2] or the symmetric KLD), which removes the need for a common prior and permits c_i, c_j to be computed incrementally, but that is not demonstrated here.

C. Minimum Information Loss Weight Fusion

A second cost is now developed to minimize an approximation of the information lost as a result of WEP fusion. Eq. (3) discounts exclusive information at i and j to ensure that the unknown common information between the two is counted only once. Therefore, there is an information loss between the optimal fusion pdf (2) and the WEP fusion pdf (3), which is defined as

I_LOSS ≜ D_KL[ p_OPTIMAL(x | Z_i^K ∪ Z_j^K) || p_WEP(x | Z_i^K ∪ Z_j^K) ],   (7)

which cannot be computed without knowing the optimally fused distribution. However, note that if i and j are known to have no information in common, then the optimal fusion rule simplifies to Naive Bayes (NB) fusion,

p_NB(x | Z_i^K ∪ Z_j^K) = (1/c) p(x | Z_i^K) p(x | Z_j^K).   (8)

In the case of an unknown dependence between the i and j pdfs, the discounting of exclusive information by WEP fusion leads to information loss if the true dependence between the distributions is zero. Therefore, an approximation to the information loss is the KLD between the NB and WEP fusion pdfs,

I_LOSS ≈ Ī_LOSS(ω) ≜ D_KL[ p_NB(x | Z_i ∪ Z_j) || p_WEP(x | Z_i ∪ Z_j) ].   (9)

It is now possible to compute the fusion weight ω* ∈ [0, 1] to minimize the approximate information loss,

ω* = arg min_{ω ∈ [0,1]} Ī_LOSS(ω).   (10)

The advantage of this Minimum Information Loss Fusion scheme (10) is that ω is selected to minimize the possible information loss (7) should the two distributions be truly uncorrelated. In practice, Minimum Information Loss Fusion drives the solution towards Naive Bayes fusion when the distributions to fuse are significantly different (in a Kullback-Leibler sense), which can indicate that the two do not share a significant amount of common information. The Minimum Information Loss Fusion scheme thus provides an automatic method for trading off between the Naive Bayes and conservative fusion rules, without resorting to a heuristic decision.

III. WEP FUSION FOR MULTI-ROBOT OCCUPANCY GRIDS AND GAUSSIAN MIXTURE BELIEFS

The proofs are omitted due to limited space, but it can be shown that the newly proposed metrics of Section II are all convex in ω for arbitrary pdfs p_i(x_k | Z_i^k) and p_j(x_k | Z_j^k). In this section, the problem of implementing WEP fusion using the (un)weighted Chernoff and Minimum Information Loss cost metrics is addressed. The focus is on cases where both p_i(x_k | Z_i^k) and p_j(x_k | Z_j^k) are described by either discrete Bernoulli distributions or continuous finite Gaussian mixtures, which are commonly used in robotics, e.g. for occupancy grid mapping and object tracking/localization, respectively. With some minor modifications, the methods described here can be readily applied in other contexts and to other arbitrary discrete/continuous pdfs.

A. Fast WEP Fusion for Bernoulli Distributions

Applying the general WEP fusion rule (3) to two Bernoulli distributions with unknown correlation results in a fused distribution that is also Bernoulli. With

p_i(x_k | Z_i^k) = p_i,   p_j(x_k | Z_j^k) = p_j,   (11)

p_WEP(x_k) = p_i^ω p_j^{1-ω} / [ p_i^ω p_j^{1-ω} + (1 - p_i)^ω (1 - p_j)^{1-ω} ].   (12)

In the case of Chernoff fusion, the optimal weight ω* is given in closed form,

ω*_CF = { log[ (log(1 - p_j) - log(1 - p_i)) / (log p_i - log p_j) ] - log p_j + log(1 - p_j) } / { log p_i - log p_j - log(1 - p_i) + log(1 - p_j) }.   (13)

For Entropy Weighted Chernoff Fusion, ω* cannot be found in closed form, and the following equation must be solved numerically,

(1/H[p_i]) [ p_ω log(p_ω / p_i) + (1 - p_ω) log((1 - p_ω)/(1 - p_i)) ] - (1/H[p_j]) [ p_ω log(p_ω / p_j) + (1 - p_ω) log((1 - p_ω)/(1 - p_j)) ] = 0,   (14)

where p_ω is given by (12) and the entropy of a Bernoulli distribution is H[p] = -p log p - (1 - p) log(1 - p). When the entropies of the two distributions are equal, Entropy Weighted Chernoff Fusion collapses to regular (i.e. unweighted) Chernoff fusion.


In the case of Minimum Information Loss fusion, ω* is given in closed form,

ω*_ML = { log p_j - log(1 - p_j) - log[ p_i p_j / ((p_i - 1)(p_j - 1)) ] } / { log p_j - log(1 - p_j) - log p_i + log(1 - p_i) }.   (15)
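The three Bernoulli weight rules can be sketched directly from (12)-(15); the fragment below is our own illustration (natural logarithms throughout, with the bisection routine for (14) an assumed but standard choice of 1D solver):

```python
import math

def bern_wep(pi, pj, w):
    # WEP-fused Bernoulli probability, eq. (12)
    num = pi ** w * pj ** (1.0 - w)
    return num / (num + (1.0 - pi) ** w * (1.0 - pj) ** (1.0 - w))

def bern_kld(a, b):
    # KL divergence between Bernoulli(a) and Bernoulli(b)
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def bern_entropy(p):
    # Shannon entropy of a Bernoulli distribution
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def chernoff_w(pi, pj):
    # closed-form Chernoff weight, eq. (13)
    qi, qj = 1.0 - pi, 1.0 - pj
    num = math.log((math.log(qj) - math.log(qi)) / (math.log(pi) - math.log(pj)))
    num += math.log(qj) - math.log(pj)
    den = math.log(pi) - math.log(pj) - math.log(qi) + math.log(qj)
    return num / den

def entropy_weighted_chernoff_w(pi, pj, tol=1e-12):
    # numerical solve of eq. (14): f decreases from f(0) > 0 to f(1) < 0
    def f(w):
        pw = bern_wep(pi, pj, w)
        return (bern_kld(pw, pi) / bern_entropy(pi)
                - bern_kld(pw, pj) / bern_entropy(pj))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def min_loss_w(pi, pj):
    # closed-form Minimum Information Loss weight, eq. (15), clipped to [0, 1]
    qi, qj = 1.0 - pi, 1.0 - pj
    num = math.log(pj) - math.log(qj) - math.log(pi * pj / (qi * qj))
    den = math.log(pj) - math.log(qj) - math.log(pi) + math.log(qi)
    return min(1.0, max(0.0, num / den))
```

As sanity checks, (13) equates the two KLDs from the fused distribution, and the unclipped (15) weight makes the WEP result coincide with the Naive Bayes fusion probability.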

Algorithm 1 GM WEP Fusion via Importance Sampling

Inputs: robot GM pdfs p_i(x_k) and p_j(x_k); number of samples N_s; initial guess ω_0; importance pdf exponent ω̄; update function [ω_old, ω_curr] ← R_1D[f̃(ω_curr), ω_curr, ω_old] for the desired 1D minimization rule; effective sample size threshold N_t
Output: ω̃* ∈ [0, 1]; samples {x_s}_{s=1}^{N_s} with normalized weights {θ_s}_{s=1}^{N_s}

1. Initialize ω_curr ← ω_0 and ω_old according to R_1D
2. Construct the GM importance sampling pdf g(x; ω̄) via (21)
3. Draw N_s samples {x_s}_{s=1}^{N_s} ~ g(x; ω̄)
4. Store the pdf values p_i(x_s), p_j(x_s), g(x_s; ω̄) for {x_s}_{s=1}^{N_s}
while ω_curr not converged do
  5. Compute unnormalized importance weights {θ_s}_{s=1}^{N_s} via (20)
  6. Using (19), compute f̃(ω_curr) via (17) or (18)
  7. Update [ω_old, ω_curr] ← R_1D[f̃(ω_curr), ω_curr, ω_old]
end while
8. If the effective sample size < N_t, set ω̄ = ω_curr and go to step 1
9. Normalize {θ_s}_{s=1}^{N_s} s.t. Σ_{s=1}^{N_s} θ_s = 1 and return ω̃* = ω_curr
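A compact 1D Python sketch of the ideas in Algorithm 1, restricted to the unweighted Chernoff case (we use the equivalent root-finding form of the Chernoff condition, E_{p_ω}[log p_j − log p_i] = 0, rather than the paper's cost minimization, and a simple bisection as the R_1D rule; all function names are our own):

```python
import math, random

def gm_pdf(gm, x):
    # evaluate a 1D Gaussian mixture [(w, mu, var), ...] at x
    return sum(w * math.exp(-0.5 * (x - mu) ** 2 / v) / math.sqrt(2 * math.pi * v)
               for w, mu, v in gm)

def first_order_wep_gm(gm_i, gm_j, w):
    # component-wise covariance intersection importance pdf, 1D case of (21)-(24)
    comps, norm = [], 0.0
    for wi, mi, vi in gm_i:
        for wj, mj, vj in gm_j:
            vF = 1.0 / (w / vi + (1.0 - w) / vj)
            mF = vF * (w * mi / vi + (1.0 - w) * mj / vj)
            wF = wi ** w * wj ** (1.0 - w)
            comps.append((wF, mF, vF))
            norm += wF
    return [(wF / norm, mF, vF) for wF, mF, vF in comps]

def sample_gm(gm, n, rng):
    # draw n samples from a 1D GM by component selection + Gaussian draw
    out = []
    for _ in range(n):
        r, acc = rng.random(), 0.0
        for w, mu, v in gm:
            acc += w
            if r <= acc:
                out.append(rng.gauss(mu, math.sqrt(v)))
                break
        else:
            out.append(rng.gauss(gm[-1][1], math.sqrt(gm[-1][2])))
    return out

def chernoff_weight_gm(gm_i, gm_j, n_s=4000, w_bar=0.5, seed=0):
    rng = random.Random(seed)
    g = first_order_wep_gm(gm_i, gm_j, w_bar)   # fixed importance pdf
    xs = sample_gm(g, n_s, rng)                 # drawn once, reused for all w
    pi_v = [gm_pdf(gm_i, x) for x in xs]
    pj_v = [gm_pdf(gm_j, x) for x in xs]
    g_v = [gm_pdf(g, x) for x in xs]
    def f(w):
        # self-normalized IS estimate of E_{p_w}[log p_j - log p_i], via (19)-(20)
        th = [pi ** w * pj ** (1.0 - w) / gv
              for pi, pj, gv in zip(pi_v, pj_v, g_v)]
        num = sum(t * (math.log(pj) - math.log(pi))
                  for t, pi, pj in zip(th, pi_v, pj_v))
        return num / sum(th)
    lo, hi = 0.0, 1.0
    while hi - lo > 1e-6:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)
```

As in Algorithm 1, the samples and pdf values are computed once and only the exponential tilting weights θ_s(ω) are recomputed inside the 1D search.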

B. Fast WEP Fusion for Finite Gaussian Mixtures

Assume each robot's pdf is a finite Gaussian mixture (GM),

p_i(x_k | Z_i^k) = p_i(x_k) = Σ_{m=1}^{M^i} w_m^i N(μ_m^i, Σ_m^i),   (16)

where (for agent i and similarly for j) M^i is the number of mixands, μ_m^i and Σ_m^i are the mean and covariance matrix of component m, and w_m^i is the m-th mixing weight such that w_m^i ∈ [0, 1] and Σ_{m=1}^{M^i} w_m^i = 1. The WEP cost metric integrals and the resulting fusion pdf (3) are not closed form for GMs. A new fast and consistent two-phase Monte Carlo-based optimization and learning procedure is presented to overcome these issues.

1) Particle-based optimization: The Information Weighted Chernoff fusion cost can be written as

f_WEP^k(ω) = ∫ h_ij(x_k) · p_WEP(x_k) dx_k,   (17)

where h_ij(x_k) = [ c_i log p_i(x_k) - c_j log p_j(x_k) ] / [ log p_i(x_k) - log p_j(x_k) ] (unweighted Chernoff fusion uses c_i = c_j = 1); the Minimum Information Loss cost (9) can be written as

f_WEP^k(ω) = ω · κ + log ∫ h_ij(x_k) · p_WEP(x_k) dx_k,   (18)

where h_ij(x_k) = 1 and κ = ∫ p_i(x_k) p_j(x_k) log[ p_j(x_k) / p_i(x_k) ] dx_k.

The integral terms with respect to p_WEP(x_k) in each of these analytically intractable costs are approximated via importance sampling,

∫ h_ij(x_k) · p_WEP(x_k) dx_k ≈ (1/N_s) Σ_{s=1}^{N_s} θ_s(ω) · h_ij(x_s),   (19)

θ_s(ω) = p_i^ω(x_s) p_j^{1-ω}(x_s) / g(x_s; ω),   (20)

where h_ij(x_s) are sample values of h_ij(x_k) and θ_s(ω) are unnormalized non-negative weights for N_s i.i.d. samples {x_s}_{s=1}^{N_s} drawn from a fixed importance pdf g(x_k; ω). The constants c_i and c_j in (6) and κ in (18) are generally not available in closed form for GMs, but can be well-approximated with fast deterministic methods [7]. To ensure that (19) is well-behaved and easy to compute, the importance pdf g(x_k; ω) should be easy to sample from and evaluate, have non-zero support over the entire domain of integration, and resemble (3) closely enough to control the variance of (19). Many choices are possible for g(x_k; ω), but the following 'first-order' GM approximation to (3) proposed by [10] is particularly effective,

g(x; ω) = Σ_{m=1}^{M^F} w_m^F N(μ_m^F, Σ_m^F),   (21)

Σ_m^F = [ ω(Σ_q^i)^{-1} + (1-ω)(Σ_r^j)^{-1} ]^{-1},   (22)

μ_m^F = Σ_m^F [ ω(Σ_q^i)^{-1} μ_q^i + (1-ω)(Σ_r^j)^{-1} μ_r^j ],   (23)

w_m^F = (w_q^i)^ω (w_r^j)^{1-ω} / Σ_{q',r'} (w_{q'}^i)^ω (w_{r'}^j)^{1-ω},   (24)

which is a component-wise covariance intersection operation on the constituent GMs (with M^F resulting mixands). Eq. (19) is unbiased and converges to the true value of the integral as N_s → ∞. Hence, a numerical 1D optimization routine can obtain an unbiased estimate ω̃* of ω using (19) to evaluate (17) or (18). The 1D search is accelerated by using a fixed importance pdf g(x; ω) = g(x; ω̄), where ω̄ is the same in all optimization steps. As such, the samples {x_s}_{s=1}^{N_s} are drawn once and the pdf values {p_i(x_s), p_j(x_s), g(x_s; ω̄)} are stored as constants for recalculating (20) at new ω values. The resulting optimization procedure is shown in Algorithm 1, where R_1D[f̃(ω_curr), ω_curr, ω_old] denotes an ω-update rule for any desired 1D optimization algorithm (e.g. bisection search or golden section search) and ω̄ = 0.5 is typically used in our applications as an initial estimate. Algorithm 1 typically converges quite quickly and may therefore be run again if it terminates with an effective sample size less than some prescribed threshold N_t. In this case, ω̄ is reset to the current best estimate of ω to restart the optimization.

2) WEM Learning: Immediately following optimization, density estimation procedures are applied to the importance samples to recover an accurate GM approximation of (3) at ω̃*. The Weighted EM (WEM) algorithm is used for GMs to exploit the normalized importance weights as part of the learning problem. Details for implementing WEM learning of GMs from weighted samples are given in [7]. The initial GM parameters for WEM are found by compressing (21) to a desired maximum number of GM terms, M^MAX, using [13].


Fig. 1. (a) Mobile robot setup with Hokuyo lidar and on-board computer for the distributed occupancy grid map fusion experiment. (b) Laboratory environment for the occupancy grid mapping experiment; the space is 15 × 8 meters and contains boxes of different sizes.

IV. EXPERIMENTAL RESULTS

A. Multi-Robot Occupancy Grid Map Fusion

The generalized fusion of the Bernoulli distribution (Section III-A) is directly applicable to fusion of multi-robot occupancy grid maps. The probability of a voxel of space being occupied is represented as a Bernoulli probability, and each voxel is assumed independent. As a result, the fusion of occupancy grid maps from multiple robots is accomplished by sequentially performing fusion across each of the grid cells in the map. A laboratory experiment demonstrates the fusion of 3D occupancy grid maps based on the different fusion rules derived, including Chernoff Fusion, Entropy Weighted Chernoff Fusion, and Minimum Information Loss Fusion, and compares the resulting approximate information loss as a function of map location for the different techniques. Data was collected using the Pioneer P3-DX differential drive mobile robot from Mobile Robots Inc. shown in Fig. 1(a). The primary sensor for the occupancy grid mapping is the Hokuyo URG-04X laser scanner, which features a 240° field of view, an angular resolution of 0.36°, and a maximum range of ≈ 5 m. The laser is pitched downward 45° and scans along the ground as the robot moves forward. The test environment is 15 × 8 meters and is instrumented with a Vicon MX+ precision tracking system for robot localization and 3D object position/attitude determination. The features in the environment consist of boxes between 10 and 25 cm tall that are meant to simulate traffic cones or other similarly sized obstacles for a full-size traffic vehicle. Eight robots are run on different paths around the environment for a 120 second data collection (Fig. 2). The robots are run sequentially to avoid sensing each other during map construction. Each robot constructs a 3D occupancy grid map using the Octomap implementation [14]. The occupancy grid resolution is 0.05 m.
To establish a baseline occupancy grid map, the data from all the mobile robots is processed to produce the centralized solution, equivalent to all agents sending each laser scan and position report to a central server for processing. The resulting centralized map is shown in Fig. 3(a). The map is rendered displaying only the voxels that are occupied (p_OCC > 0.8); the empty and unknown voxels are not shown, and the voxels are falsely shaded according

Fig. 2. The overhead paths of the eight mobile robots exploring the environment for distributed occupancy grid mapping experiment. The data collection interval is 120 seconds and different robots explore different regions of the environment with some overlap to demonstrate the benefit of the generalized fusion rules to distributed mapping. The distributed mapping optimal network topology is shown as an inset.

Fig. 3. (a) 3D centralized occupancy grid map displaying the occupied voxels with false coloring based on height. (b) 3D Occupancy grid map from Agent 1 using only local updates. The occupied and empty cells (lightly shaded) are shown along with the final pose of the robot (large box). (c) 3D Occupancy grid map from Agent 1 after Minimum Information Loss Fusion; this map is qualitatively similar to the centralized solution.

to height according to the scale in Fig. 3(b); a 0.25 m grid is shown for reference. The line of small skinny boxes at the left of the environment is clearly visible, along with the larger boxes towards the right of the environment and the outer walls. There are gaps in the center of the map where no robot explored. The centralized solution establishes a baseline map for evaluating the distributed occupancy grid mapping approaches.


Distributed Occupancy Grid Fusion on an Optimally Connected Network: To evaluate the application of the distributed generalized fusion rules to occupancy grid mapping, the mobile robots are connected in a network. The network is connected according to the topology with maximum robustness [5]. The topology is symmetric, its node and link connectivity both equal the minimum degree, and each node is a central vertex of the graph. The topology is shown as an inset in Fig. 2. This makes it difficult to track common information during fusion without data tagging, so generalized data fusion for unknown correlations is used. The individual sensor nodes collect and process local scans to build a local occupancy grid map that is updated with map data passed along the network from other agents. An example of a map constructed via local updates only is shown in Fig. 3(b), which shows the map from Agent 1 (cf. its path in Fig. 2). The map is rendered showing the occupied cells falsely shaded according to height; the empty cells are shown lightly shaded and the unknown cells are not shown. The final pose of the robot is shown as a solid box. The map shows that Agent 1 explores only a portion of the environment. To enable a full representation of the environment, the distributed data fusion techniques are used over the maximally robust sensor network. The robots share map information across the bi-directional links connecting them. The agents are required to share the following information for each voxel: the center coordinate and the log-odds probability. This implies that each voxel requires 32 bytes of data (if all numbers are in double precision) transmitted per communication. The nodes communicate aperiodically as they collect information. The final occupancy grid map after Minimum Information Loss Fusion is shown in Fig. 3(c).
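Since the shared maps are in log-odds form, the per-voxel Minimum Information Loss fusion is especially cheap: writing L = log(p/(1−p)), the closed-form weight (15) rearranges algebraically to ω = L_i/(L_i − L_j) clipped to [0, 1] (this log-odds form is our own rearrangement, not given explicitly in the text):

```python
def fuse_voxel_min_info_loss(li, lj):
    # Minimum Information Loss WEP fusion of two Bernoulli occupancy
    # log-odds values; eq. (15) rewritten in log-odds form.
    if li == lj:
        return li
    w = min(1.0, max(0.0, li / (li - lj)))  # clipped fusion weight
    return w * li + (1.0 - w) * lj          # WEP is linear in log-odds
```

When the two maps disagree in sign, ω falls inside (0, 1) and the rule recovers Naive Bayes addition of log-odds; when they agree in sign, it keeps the more confident value, counting the shared evidence only once.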
The distributed data fusion via Minimum Information Loss Fusion does not have access to the centralized solution or the optimal fusion distribution, but it successfully builds a map that is qualitatively similar to the centralized solution (Fig. 3(a)). The maps for Chernoff Fusion and Entropy Weighted Chernoff Fusion are similar (not shown).

Approximate Information Loss Maps: The approximate information loss Ī_LOSS (9) is computed after fusion for each of the different fusion rules. The resulting information loss maps can be used for planning purposes to balance exploration versus verification of cells that may have incurred substantial information loss as a result of fusion. The information loss map resulting from Chernoff Fusion is shown in Fig. 4(a). The map is falsely shaded according to the approximate information loss metric Ī_LOSS, and the scale is shown in Fig. 4(b), where the range reflects the maximum range of values according to the empirical CDF computed over all possible combinations of two Bernoulli distributions for the different fusion rules (not shown). The areas of largest information loss occur where information coming from remote nodes clashes with the content estimated locally at Agent 1; these lie in the left-hand portion of the map. The areas of information loss in the right-hand portion of the map result from the sequential application of the fusion rule as information is received on the

Fig. 4. (a) Approximate information loss I¯LOSS on the occupancy grid after Chernoff Fusion for Agent 1. (b) Approximate information loss I¯LOSS on the occupancy grid after Entropy Weighted Chernoff Fusion for Agent 1. (c) Approximate information loss I¯LOSS on the occupancy grid after Minimum Information Loss Fusion for Agent 1.

three links connected to Agent 1. The resulting information loss map for Entropy Weighted Chernoff Fusion at Agent 1 is shown in Fig. 4(b). The results show an increase in the approximate information loss in some areas and a decrease in others as a result of Entropy Weighted Chernoff Fusion. The resulting fused map (not shown) is similar to Fig. 3(b). The Minimum Information Loss Fusion rule is significantly better than either approach in terms of reducing the need for verification due to potential information loss, especially along the ground. The resulting information loss map for Minimum Information Loss Fusion at Agent 1 is shown in Fig. 4(c). The results show a dramatic improvement in the potential information loss, because the fusion rule has zero loss for 50% of the possible combinations of Bernoulli distributions. The Minimum Information Loss Fusion rule generates consistent, high-quality occupancy grid maps and has the lowest information loss. Therefore, this fusion rule is the best suited for distributed data fusion of occupancy grid maps.

B. GM Fusion for Multi-Robot Target Search

In another experiment, three Pioneer P3-DX robots are tasked to search for a static target with location x_TARG in an environment similar to the one shown in Fig. 1. Each robot is equipped with a single forward-facing Unibrain Fire-i camera


Fig. 5. (a) Example GM prior over xTARG (darker/lighter surface indicates lower/higher density); (b) model for camera-based binary ‘No Detection’ likelihood (P (‘No Detection’|xTARG ) shown zoomed in); (c) posterior GM after fusing GM prior in (a) with ‘No Detection’ report in (b) via eq.(1) using Gaussian sum filter approximation.

with a 42.5° field of view, which feeds onboard visual object detection software that produces binary 'detection'/'no detection' outputs at 1 Hz with a 1 m maximum detection range. The likelihood model for the visual object detectors is shown in Fig. 5(b) and the common GM prior pdf for x_TARG at k = 0 is shown in Fig. 6(a). The sensor likelihood model for each robot is highly non-Gaussian and induces a non-Gaussian posterior via (1). As shown in Fig. 5(c), this local Bayesian posterior is well-approximated at each time step k by a finite GM using a variant of the Gaussian sum particle filter with 1000 particles per mixand [12] and a standard GM reduction algorithm to keep M^MAX = 15 for each robot [13]. The robots move along different pre-defined search trajectories for 65 secs, during which they fuse their own camera observations into their local copies of the GM pdf for x_TARG (none of the agents finds the target). The robots communicate 5 times (once every 13 secs), using the cyclic communication pattern 1 → 2 → 3 → 1 at time steps k = 26 and k = 39, and 1 → 3 → 2 → 1 at the remaining instances. The unweighted Chernoff, KLD-gain Information Weighted Chernoff, and Minimum Information Loss rules were all evaluated for DDF via Algorithm 1 with N_s = 1000. Non-DDF benchmark runs using no sharing and centralized fusion were also evaluated, alongside alternative DDF benchmark runs obtained via Naive Bayes and conservative Bhattacharyya fusion. Bhattacharyya fusion is another WEP fusion rule that always assigns ω = 0.5 [3]. This fusion rule requires no optimization, but it does require approximation of the WEP fusion pdf, which is accomplished by applying WEM to weighted importance samples, as done for the other WEP rules used here. Fig. 6(b)-(h) show the resulting fusion posterior GM pdfs at k = 65 for robot 1 (bottom left of map). Fig. 7 shows the KLD information gains (6) for each robot over time under each fusion scheme. Fig. 6(b) and Fig.
7(a) show that robot 1 contributes the least amount of new information, as it remains completely still at the bottom left of the map throughout the search. When using DDF to update its local copy of the x_TARG GM, robot 1 exchanges GMs with robots 2 and 3 without any of the robots being aware of the common information in the network. The Naive Bayes DDF results in Fig. 6(d) and Fig. 7(c) show that repeated double-counting of common information drives the final DDF posterior pdfs for all robots very far from the centralized fusion posterior in Fig. 6(c) and thus leads to severely inconsistent results.

In contrast, the final WEP DDF posterior GMs in Fig. 6(e)-(h) and information gains in Fig. 7(d)-(f) all bear much closer resemblance to the centralized fusion result (with KLDs to the centralized result roughly the same, between 0.4 and 0.55 nats). The posterior GM plots for Information Weighted Chernoff and Minimum Information Loss fusion show that these rules are better at retaining some of the subtler features of the centralized fusion GM pdf, such as the distinctness of the modes around the outskirts of the map and the 'plowed lanes' created by robots 2 and 3. On the other hand, the posterior GMs for unweighted Chernoff and Bhattacharyya fusion are smeared out, indicating that less novel information is retained by WEP fusion at the expense of robot 1's older, outdated common prior information (note the frequent information 'dips' for unweighted Chernoff). Interestingly, Fig. 7(f) shows that each robot is more likely to gain (and less likely to lose) information via Information Weighted Chernoff fusion, as the gains for each robot rarely drop from their previous values, as they do near the end of the unweighted Chernoff and Minimum Information Loss results. Further analysis shows that this behavior of Information Weighted Chernoff fusion stems from an aggressive 'winner takes all' strategy for selecting ω in the initial stages of the search, so that values of ω close to 0 or 1 are observed frequently. As the robot information gains become similar after k = 30, the ω values selected by Information Weighted Chernoff become less extreme. This suggests that different approaches are more appropriate at different times during the search process.

V. CONCLUSION

The WEP approach for robustly fusing two arbitrary probability distributions with unknown common information was addressed for general multi-robot applications.
The nominal Chernoff fusion rule does not account for differences in the content or quality of the information sources used to construct the constituent robot distributions; the Information Weighted Chernoff Fusion rule was developed to account for such potential disparities without explicitly counting the number of measurements. To approximately minimize the losses incurred by WEP fusion relative to the exact fusion posterior, a second novel Minimum Information Loss Fusion rule was also developed, using the KLD between the Naive Bayes and WEP-fused distributions as the cost metric. Novel fast convex implementations of the proposed WEP fusion rules were presented for fusion of Bernoulli and arbitrary Gaussian mixture pdfs. These methods were experimentally demonstrated in loopy multi-robot communication topologies for fusion of 3D occupancy grids and 2D target location beliefs. Future work will investigate the theoretical properties and online performance of the proposed WEP methods across a variety of dynamic ad hoc fusion scenarios, spanning scales from dozens to thousands of robots.

ACKNOWLEDGMENTS

This work was supported by the Air Force Office of Scientific Research (AFOSR) (FA9550-08-1-0356) and the Army Research Office (ARO) (W911NF-09-1-0466).


[Fig. 7 (plot panels; graphics not recoverable): KLD information gain from the prior (nats) vs. time step k (secs) for Robots 1-3 under each fusion scheme. Panel annotations: without fusion, stationary Robot 1 receives no new info while Robots 2 and 3 separately and slowly gain new info over their search paths; centralized fusion gives the optimal info gain curve for all robots; Naive Bayes information gains are ~10x larger than for centralized fusion; the WEP rules show a large initial info gain for Robot 1 during fusion steps, followed by smaller info gains/drops over the network.]

Fig. 6. (a) Prior GM pdf for x_TARG at k = 0; (b)-(h) Robot 1's (bottom left) GM posterior pdf at k = 65 under various fusion rules: (b) no fusion, (c) centralized fusion, (d) Naive Bayes, (e) unweighted Chernoff, (f) Information Weighted Chernoff, (g) Minimum Information Loss, (h) Bhattacharyya fusion.

Fig. 7. Robot information gains vs. time relative to the k = 0 GM prior under different fusion rules: (a) no fusion, (b) centralized, (c) Naive Bayes (note the larger vertical scale), (d) Minimum Information Loss, (e) unweighted Chernoff, (f) Information Weighted Chernoff. Results for Bhattacharyya fusion (not shown) are similar to the unweighted Chernoff results. Dashed vertical lines denote inter-agent DDF time steps.

REFERENCES

[1] T. Bailey, S. Julier, and G. Agamennoni. On conservative fusion of information with unknown non-Gaussian dependence. Technical report, Australian Centre for Field Robotics, 2011.
[2] F. Bourgault. Decentralized Control in a Bayesian World. PhD thesis, University of Sydney, 2005.
[3] K. Chang, C.-Y. Chong, and S. Mori. Analytical and computational evaluation of scalable distributed fusion algorithms. IEEE Trans. on Aerospace and Electronic Systems, 46(4):2022-2034, 2010.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[5] A. H. Dekker and B. D. Colbert. Network robustness and graph topology. 2004.
[6] W. J. Farrell and C. Ganesh. Generalized Chernoff fusion approximation for practical distributed data fusion. In 12th Int'l Conf. on Information Fusion, pages 555-562, 2009.
[7] J. Goldberger, H. K. Greenspan, and J. Dreyfuss. Simplifying mixture models using the unscented transform. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(8):1496-1502, 2008.
[8] S. Grime and H. F. Durrant-Whyte. Data fusion in decentralized sensor networks. Control Engineering Practice, 2(5):849-863, 1994.
[9] M. B. Hurley. An information-theoretic justification for covariance intersection and its generalization. Technical report, MIT Lincoln Lab, 2001.
[10] S. Julier. An empirical study into the use of Chernoff information for robust, distributed fusion of Gaussian mixture models. In 9th Int'l Conf. on Information Fusion, 2006.
[11] S. J. Julier and J. K. Uhlmann. A non-divergent estimation algorithm in the presence of unknown correlations. In Proc. of the American Control Conf., volume 4, pages 2369-2373, 1997.
[12] J. H. Kotecha and P. M. Djuric. Gaussian sum particle filtering. IEEE Trans. on Signal Processing, 51(10):2602-2612, 2003.
[13] A. R. Runnalls. Kullback-Leibler approach to Gaussian mixture reduction. IEEE Trans. on Aerospace and Electronic Systems, 43(3):989-999, 2007.
[14] K. M. Wurm, A. Hornung, M. Bennewitz, C. Stachniss, and W. Burgard. OctoMap: A probabilistic, flexible, and compact 3D map representation for robotic systems. In Proc. of the ICRA 2010 Workshop on Best Practice in 3D Perception and Modeling for Mobile Manipulation, 2010.
State Estimation for Legged Robots: Consistent Fusion of Leg Kinematics and IMU

Michael Bloesch, Marco Hutter, Mark A. Hoepflinger, Stefan Leutenegger, Christian Gehring, C. David Remy and Roland Siegwart
Autonomous Systems Lab, ETH Zürich, Switzerland, [email protected]

Abstract—This paper introduces a state estimation framework for legged robots that allows estimating the full pose of the robot without making any assumptions about the geometrical structure of its environment. This is achieved by means of an Observability Constrained Extended Kalman Filter that fuses kinematic encoder data with on-board IMU measurements. By including the absolute position of all footholds into the filter state, simple model equations can be formulated which accurately capture the uncertainties associated with the intermittent ground contacts. The resulting filter simultaneously estimates the position of all footholds and the pose of the main body. In the algorithmic formulation, special attention is paid to the consistency of the linearized filter: it maintains the same observability properties as the nonlinear system, which is a prerequisite for accurate state estimation. The presented approach is implemented in simulation and validated experimentally on an actual quadrupedal robot.

I. INTRODUCTION

Particularly in rough and highly unstructured environments in which we ultimately want to employ autonomous legged robots, postural controllers require fast and precise knowledge of the state of the robots they are regulating. Especially for dynamic locomotion, the underlying state estimation can quickly become a bottleneck in terms of achievable bandwidth, robustness, and locomotion speed. To achieve the required performance, a state estimator for legged robots should explicitly take into account that such systems interact with their environment via multiple intermittent ground contacts. Ignoring or neglecting the ground interaction will lead to computationally and sensorially more "expensive" approaches, ranging from vision-based [2, 16, 17] to GPS-supported [4, 6] methods. In contrast to such approaches, we will show in the following that in cases where on-board sensors fully measure the internal kinematics of the robot as well as its inertial acceleration and rotational rate, precise information on the robot's pose can be made readily available.

One of the earliest approaches exploiting the information given by the leg kinematics was implemented by Lin et al. [13] in 2005 on a hexapod robot. Assuming that the robot is in contact with three of its six feet at all times and assuming completely flat terrain, they implemented a leg-based odometer. Their method requires the robot to follow a tripod gait and is affected by drift. In [14], the same group fused the leg-based odometer with data from an Inertial Measurement Unit (IMU) and was thus able to handle tripod running. Using the assumption of

(This research was supported by the Swiss National Science Foundation through the National Centre of Competence in Research Robotics.)

Fig. 1. Experimental quadruped platform StarlETH [11]. The inertial and the body fixed coordinate frames I and B are depicted, as well as the absolute position vectors of the robot r and of the footholds p1 , . . . , pN . The presented EKF includes all foothold positions into the estimation process.

knowing the precise relief of the terrain, Chitta et al. [3] implemented a pose estimator based on a particle filter. It fuses leg kinematics and IMU data in order to globally localize a robot. Very recently, three novel pose estimators have been presented that are all based on leg kinematics. Reinstein and Hoffmann [15] presented a data-driven approach using joint encoders, pressure sensors, and an on-board IMU. They searched for significant sensor-based indicators in order to determine the stride length when given a specific gait pattern. With this assumption, they successfully limited the position drift of the IMU and, by appropriate training of the filter, could additionally handle slippage. Chilian et al. [2] implemented a leg odometer based on point cloud matching for a hexapod robot, requiring a minimum of three feet in contact. It is based on a multisensor fusion algorithm that includes inertial measurements and visual odometry. Assuming planar spring-mass running, Gur and Saranli [7] proposed a generic, model-based state estimation technique.

In the presented approach we implement an Extended Kalman Filter (EKF) and choose an appropriate state vector in order to break down the estimation problem to the proper formulation of a few simple measurement equations. Without any assumption about the shape of the terrain, we are able to estimate the full state of the robot's main body, and we can provide an estimate of the ground geometry. By performing an observability analysis, we show that apart from the absolute


position and yaw angle of the robot, all other states can be precisely observed as long as at least one foot is in contact with the ground. This means that, after accumulating some drift during a flight phase, the pitch and roll angles become fully observable again when the robot regains ground contact, and the corresponding estimation errors will decrease. Only proprioceptive sensors are required and no assumptions are made concerning the type of gait or the number of robot legs. Small amounts of foot slippage and uncertainties in the leg kinematics can be handled as well. Due to current limitations of the control approach, dynamic gaits are currently evaluated in simulation only. Still, results obtained from static walking sequences on an actual quadrupedal platform (see Fig. 1) are presented and compared with ground truth measurements from an external motion tracker.

The structure of the paper is as follows. In section II a short overview of the sensory devices is provided. Subsequently, section III presents the design of an Extended Kalman Filter. Section IV argues on the observability of the filter states and introduces observability constraints. Simulation and experimental validation are discussed in section V.

II. SENSOR DEVICES AND MEASUREMENT MODELS

This section discusses the required sensors and the corresponding stochastic measurement models for an N-legged robot. The particular model choices represent a trade-off between simplicity and accuracy. Throughout the paper, external disturbances and noise will be modeled as continuous white Gaussian noise or as discrete Gaussian noise processes. This is a coarse simplification, but can be handled by increasing the corresponding covariance matrix.

A. Forward Kinematics and Encoders

Incremental encoders provide access to the angular position of all joints.
The corresponding encoder measurement vector α̃ of the joint angle vector α is assumed to be affected by discrete Gaussian noise n_α with covariance matrix R_α:

α̃ = α + n_α. (1)

Based on the known leg kinematics, the location of each foot can be computed with respect to the main body. However, due to erroneous calibration and possible errors in the kinematic model lkin_i(·) of leg i, additive discrete Gaussian noise terms n_{s,i} are included in the model:

s_i = lkin_i(α) + n_{s,i}, (2)

where s_i represents the vector from the center of the main body to the contact point of leg i and where R_s is the covariance matrix of n_{s,i}. Both s_i and n_{s,i} are expressed in the body fixed frame B.

B. Inertial Sensors

The IMU measures the proper acceleration f and the angular rate ω of the robot's main body. The proper acceleration is related to the absolute acceleration a by

f = C(a − g), (3)

where C is the matrix rotating coordinates of a vector expressed in the inertial coordinate frame I into the body coordinate frame B. The IMU quantities f and ω are assumed to be directly measured in the body coordinate frame B. In order to describe the underlying stochastic process, the following continuous stochastic models are introduced:

f̃ = f + b_f + w_f, (4)
ḃ_f = w_{bf}, (5)
ω̃ = ω + b_ω + w_ω, (6)
ḃ_ω = w_{bω}. (7)

The measured quantities f̃ and ω̃ are affected by additive white Gaussian noise processes w_f and w_ω and by the bias terms b_f and b_ω. The bias terms are modeled as Brownian motions and their derivatives can be represented by white Gaussian noise processes w_{bf} and w_{bω}. The noise terms are specified by the corresponding covariance parameters Q_f, Q_{bf}, Q_ω, and Q_{bω}. Following the paper of El-Sheimy et al. [5], they can be evaluated by examining the measured Allan variances. For the sake of simplicity, each covariance parameter is assumed to be a diagonal matrix with identical diagonal entries.

III. STATE ESTIMATION

As stated in the previous section, two different sources of data are available. Each of them provides information that can potentially contribute to the state estimate of the robot. In order to exploit this information, an Extended Kalman Filter is designed. This section starts by defining the state vector of the filter and subsequently continues by formulating the filter models and equations.

A. Filter State Definition

The state vector of the filter has to be chosen such that the corresponding prediction and measurement equations can be stated in a clean and consistent manner. In this approach the state vector of the quadruped robot is composed of the position of the center of the main body r, the corresponding velocity v, and the quaternion q representing the rotation from the inertial coordinate frame I to the body coordinate frame B. In order to consider the kinematics of the legs, the absolute positions of the N foot contact points p_i are included in the state vector. Together with the accelerometer bias b_f and the gyroscope bias b_ω this yields the following state vector:

x := (r, v, q, p_1, ..., p_N, b_f, b_ω). (8)

r, v, and all contact positions p_i are expressed in the inertial coordinate frame I, whereas b_f and b_ω are expressed in the body coordinate frame B. Given a quaternion q the corresponding rotation matrix C can be easily determined.
The presented Extended Kalman Filter represents the uncertainties of the estimated state vector via the covariance matrix P of the corresponding state error vector δx:

P := Cov(δx), (9)
δx := (δr, δv, δφ, δp_1, ..., δp_N, δb_f, δb_ω). (10)

For the orientation state q, special care has to be taken. It possesses 3 degrees of freedom and its covariance term should therefore also be represented by a 3-dimensional covariance matrix. Therefore the error of the pose is represented as a 3-dimensional rotation vector δφ. That is, if q̂ represents the estimate of the orientation quaternion, the error quaternion δq is defined by the relation

q = δq ⊗ q̂, (11)

where ⊗ is the quaternion multiplication operator and where the quaternion error is related to the error rotation vector by means of the map ζ(·):

δq = ζ(δφ), (12)
ζ : v ↦ ζ(v) = ( (v/‖v‖) sin(‖v‖/2), cos(‖v‖/2) ). (13)
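The map ζ(·) can be implemented directly; a minimal sketch (the function name, the [x, y, z, w] component ordering, and the small-angle guard are choices made here, not specified by the paper):

```python
import numpy as np

def zeta(v):
    """Rotation vector -> unit quaternion [x, y, z, w], cf. the map above."""
    v = np.asarray(v, dtype=float)
    angle = np.linalg.norm(v)
    if angle < 1e-12:
        # Small-angle guard: (v/||v||) sin(||v||/2) -> 0, giving the identity.
        return np.array([0.0, 0.0, 0.0, 1.0])
    axis = v / angle
    return np.concatenate([axis * np.sin(angle / 2.0), [np.cos(angle / 2.0)]])
```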

foothold is set to infinity (or to a very large value) whenever it has no ground contact. This enables the corresponding foothold to relocate and reset its position estimate when it regains ground contact, whereby the old foothold position is dropped from the estimation process. This is all that is required in order to handle intermittent contacts when stepping.

Q_{p,i} = diag(w_{p,i,x}, w_{p,i,y}, w_{p,i,z}). (21)

C. Measurement Model

Based on the kinematic model (2), a transformed measurement quantity is introduced for each leg i:

s̃_i := lkin_i(α̃) (22)
    ≈ lkin_i(α) + J_{lkin,i} n_α. (23)

The inclusion of the foot contact positions into the filter state is the key point in the filter design, enabling a simple and consistent representation of the model equations. The leg kinematics measurements represent relative pose measurements between main body and foot contact, based on which the EKF is able to simultaneously correct the location of the foot contacts as well as the pose of the main body. In fact, the presented approach can be interpreted as a simultaneous localization and mapping (SLAM) algorithm, where the positions of the actual foot contacts build up the map the robot is localized in.

≈ s_i − n_{s,i} + J_{lkin,i} n_α =: s_i + n_i. (24)

The linearized noise effect from the encoders (1) and the noise from the foothold position are joined into a new measurement noise quantity n_i with covariance matrix R_i:

R_i = R_s + J_{lkin,i} R_α J_{lkin,i}^T, (25)

where J_{lkin,i} is the Jacobian of the kinematics of leg i with respect to the joint angles α_i of the same leg:

B. Prediction Model

The prediction equations are responsible for propagating the state from one timestep to the next. The IMU measurements f̃ and ω̃ are directly included here. Using (3)-(7), a set of continuous-time differential equations can be formulated:

ṙ = v, (14)
v̇ = a = C^T (f̃ − b_f − w_f) + g, (15)
q̇ = ½ Ω(ω) q = ½ Ω(ω̃ − b_ω − w_ω) q, (16)
ṗ_i = C^T w_{p,i}  ∀i ∈ {1, ..., N}, (17)
ḃ_f = w_{bf}, (18)
ḃ_ω = w_{bω}, (19)

J_{lkin,i} := ∂lkin_i(α)/∂α_i,  i ∈ {1, ..., N}. (26)

s̃_i is the measurement of the position of the foot contact of leg i with respect to the body coordinate frame B, which can also be expressed as the absolute position of the foot contact minus the absolute position of the robot, rotated into the body frame:

s̃_i = C(p_i − r) + n_i. (27)

D. Extended Kalman Filter Equations

For the subsequent linearization and discretization of the above models, the following auxiliary quantity is introduced:

where Ω(·) maps an arbitrary rotational rate ω to the 4×4 matrix used for representing the corresponding quaternion rate:

Ω : ω ↦ Ω(ω) = ⎡  0    ω_z  −ω_y  ω_x
                 −ω_z   0    ω_x  ω_y
                  ω_y  −ω_x   0   ω_z
                 −ω_x  −ω_y  −ω_z  0  ⎦. (20)

While in principle the foot contacts are assumed to remain stationary, the white noise terms w_{p,i} in (17) with covariance parameter Q_{p,i} are added to the absolute foot positions in order to handle a certain amount of foot slippage. The noise is described in the body frame, which allows tuning the magnitude of the noise terms in the different directions relative to the quadruped orientation (21). Furthermore, the noise parameter of a certain

Γ_n := Σ_{i=0}^{∞} (Δt^{i+n} / (i+n)!) (ω^×)^i, (28)

where the (·)^× superscript is used to represent the skew-symmetric matrix obtained from a vector. It draws on the series expansion of the matrix exponential. For n = 0 it yields:

Γ_0 = Σ_{i=0}^{∞} (Δt ω^×)^i / i! = exp(Δt ω^×). (29)

This means that Γ_0 represents the incremental rotation matrix if rotating an arbitrary coordinate frame with a rotational rate of −ω for Δt seconds. There exists a closed-form expression for Γ_n that can be efficiently numerically evaluated (similar to Rodrigues' rotation formula).
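Numerically, Γ_0 can be obtained by truncating the series in (29); a sketch (the closed-form Rodrigues-like expression the authors allude to would be more efficient, and the truncation depth here is an arbitrary choice):

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix w^x, such that skew(w) @ v == np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def gamma0(w, dt, terms=30):
    """Gamma_0 = exp(dt * w^x), evaluated by truncating the series in (29)."""
    A = dt * skew(w)
    G, term = np.eye(3), np.eye(3)
    for i in range(1, terms):
        term = term @ A / i      # accumulates A^i / i!
        G = G + term
    return G
```

As expected of the exponential of a skew-symmetric matrix, the result is a rotation matrix (orthogonal, determinant one).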


1) Prediction Step: A standard filtering convention is employed: at time step k the a priori state estimate is represented by x̂^-_k, the a posteriori state estimate by x̂^+_k. Assuming zero-order hold for the measured quantities f̃_k and ω̃_k, and neglecting the effect of the incremental rotation, equations (14)-(19) can be discretized to:

r̂^-_{k+1} = r̂^+_k + Δt v̂^+_k + (Δt²/2)(Ĉ^{+T}_k f̂_k + g), (30)
v̂^-_{k+1} = v̂^+_k + Δt (Ĉ^{+T}_k f̂_k + g), (31)
q̂^-_{k+1} = ζ(Δt ω̂_k) ⊗ q̂^+_k, (32)
p̂^-_{i,k+1} = p̂^+_{i,k}  ∀i ∈ {1, ..., N}, (33)
b̂^-_{f,k+1} = b̂^+_{f,k}, (34)
b̂^-_{ω,k+1} = b̂^+_{ω,k}, (35)

with the bias corrected IMU measurements

f̂_k = f̃_k − b̂^+_{f,k}, (36)
ω̂_k = ω̃_k − b̂^+_{ω,k}. (37)

In order to correctly propagate the covariance matrix through the state dynamics, a set of linear differential equations describing the error dynamics is derived from (14)-(19), where all higher order terms were neglected:

δṙ = δv, (38)
δv̇ = −C^T f^× δφ − C^T δb_f − C^T w_f, (39)
δφ̇ = −ω^× δφ − δb_ω − w_ω, (40)
δṗ_i = C^T w_{p,i}  ∀i ∈ {1, ..., N}, (41)
δḃ_f = w_{bf}, (42)
δḃ_ω = w_{bω}. (43)

For the subsequent discretization, Van Loan's results [18] and the relation (28) can be applied to get the discrete linearized error dynamics matrix F_k and the discrete process noise covariance matrix Q_k (for readability only one foothold estimate is depicted):

F_k = ⎡ I   Δt I   −(Δt²/2) Ĉ^{+T}_k f̂_k^×   0   −(Δt²/2) Ĉ^{+T}_k   0
        0   I      −Δt Ĉ^{+T}_k f̂_k^×        0   −Δt Ĉ^{+T}_k        0
        0   0      Γ̂^T_{0,k}                 0   0                   −Γ̂^T_{1,k}
        0   0      0                          I   0                   0
        0   0      0                          0   I                   0
        0   0      0                          0   0                   I ⎦, (44)

Q_k = ⎡ (Δt³/3)Q_f + (Δt⁵/20)Q_{bf}   (Δt²/2)Q_f + (Δt⁴/8)Q_{bf}   0   0   −(Δt³/6)Ĉ^{+T}_k Q_{bf}   0
        (Δt²/2)Q_f + (Δt⁴/8)Q_{bf}    Δt Q_f + (Δt³/3)Q_{bf}       0   0   −(Δt²/2)Ĉ^{+T}_k Q_{bf}   0
        0   0   Δt Q_ω + (Γ̂_{3,k} + Γ̂^T_{3,k})Q_{bω}   0   0   −Γ̂^T_{2,k} Q_{bω}
        0   0   0   Δt Ĉ^{+T}_k Q_p Ĉ^+_k   0   0
        −(Δt³/6)Q_{bf} Ĉ^+_k   −(Δt²/2)Q_{bf} Ĉ^+_k   0   0   Δt Q_{bf}   0
        0   0   −Q_{bω} Γ̂_{2,k}   0   0   Δt Q_{bω} ⎦. (45)

By linearly combining two Gaussian distributions the Extended Kalman Filter stipulates the following a priori estimate of the covariance matrix at the timestep k + 1:

P^-_{k+1} = F_k P^+_k F_k^T + Q_k.

2) Update Step: The measurement residual, also called innovation, is the difference between actual measurements and their predicted value:

y_k := ( s̃_{1,k} − Ĉ^-_k (p̂^-_{1,k} − r̂^-_k), ..., s̃_{N,k} − Ĉ^-_k (p̂^-_{N,k} − r̂^-_k) ). (46)

Considering the error states and again neglecting all higher order terms, it can be derived that the errors of the predicted leg kinematics measurements are given by:

s̃_{i,k} − Ĉ^-_k (p̂^-_{i,k} − r̂^-_k) ≈ −Ĉ^-_k δr^-_k + Ĉ^-_k δp^-_{i,k} + (Ĉ^-_k (p̂^-_{i,k} − r̂^-_k))^× δφ^-_k. (47)

With this the measurement Jacobian H_k can be evaluated:

H_k = ∂y_k/∂x̂_k = ⎡ −Ĉ^-_k   0   (Ĉ^-_k (p̂^-_{1,k} − r̂^-_k))^×   Ĉ^-_k ⋯ 0     0   0
                     ⋮        ⋮    ⋮                               ⋱           ⋮   ⋮
                     −Ĉ^-_k   0   (Ĉ^-_k (p̂^-_{N,k} − r̂^-_k))^×   0 ⋯ Ĉ^-_k   0   0 ⎦.

Stacking the single measurement noise matrices (25) returns the total measurement noise matrix:

R_k = diag(R_{1,k}, ..., R_{N,k}). (48)

Finally the a priori state estimate can be merged with the current measurements, where the Extended Kalman Filter states the following update equations:

S_k := H_k P^-_k H_k^T + R_k, (49)
K_k := P^-_k H_k^T S_k^{-1}, (50)
Δx_k := K_k y_k, (51)
P^+_k := (I − K_k H_k) P^-_k, (52)

where S_k represents the innovation covariance, K_k the Kalman gain, Δx_k the resulting correction vector and P^+_k the a posteriori estimate of the state covariance matrix. Given Δx_k the state estimate can be updated. Again the orientation state requires special attention. Although the quaternion is of dimension 4, the extracted rotational correction Δφ_k has only 3 dimensions. It basically represents the 3D rotation vector that needs to be applied to correct the predicted quaternion:

q̂^+_k = ζ(Δφ_k) ⊗ q̂^-_k. (53)
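The update equations (49)-(52) are the standard EKF ones; a minimal numerical sketch for a generic residual y with Jacobian H (illustrative names; the actual filter stacks all N leg measurements and applies the quaternion correction (53) to the orientation state separately):

```python
import numpy as np

def kinematic_residual(s_tilde, C_hat, p_hat, r_hat):
    """Innovation for one foothold, cf. (46): y_i = s~_i - C(p^- - r^-)."""
    return s_tilde - C_hat @ (p_hat - r_hat)

def ekf_update(P, H, R, y):
    """Generic EKF measurement update, cf. (49)-(52)."""
    S = H @ P @ H.T + R                          # innovation covariance (49)
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain (50)
    dx = K @ y                                   # state correction (51)
    P_post = (np.eye(P.shape[0]) - K @ H) @ P    # a posteriori covariance (52)
    return dx, P_post
```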

IV. OBSERVABILITY ANALYSIS

A. Nonlinear Observability Analysis

Analyzing the observability characteristics of the underlying nonlinear system reveals the theoretical limitations of state estimation and can validate the employed approach. Based on the paper of Hermann and Krener [8], a nonlinear observability analysis is performed. In order to remain analytically tractable, robocentric coordinates are introduced. The coordinate transformation is bijective and will thus not change the observability characteristics. Given the current operating point

x* := (r*, v*, q*, p*_1, ..., p*_N, b*_f, b*_ω), (54)

the following coordinate transformation is introduced:

z := (s_1, ..., s_N, v̄, b̄_ω, q̄, b̄_f, r̄)
  := (C(p_1 − r), ..., C(p_N − r), Cv, b_ω − b*_ω, q ⊗ q*⁻¹, b_f − b*_f, Cr). (55)

The quantities in (55) are ordered such that a nice row echelon form results. The corresponding prediction model (14)-(19) and measurement equation (27) will be transformed to

ż := ( (ω − b̄_ω)^× s_1 − v̄,
       ...,
       (ω − b̄_ω)^× s_N − v̄,
       (ω − b̄_ω)^× v̄ + f − b̄_f + C̄C* g,
       0,
       ½ Ω(ω − b̄_ω) q̄,
       0,
       (ω − b̄_ω)^× r̄ + v̄ ), (56)

s̃_i = s_i,  i ∈ {1, ..., N}, (57)

where all noise terms were disregarded as they have no influence on the observability and where C̄ and C* represent the rotation matrices corresponding to q̄ and to q*, respectively. The observability of the transformed system can now be analyzed. In contrast to the linear case, Lie derivatives need to be computed in order to evaluate the observability matrix. By applying a few row operations and by directly including the transformed operating point

z* := (s*_1, ..., s*_N, C*v*, 0, (0 0 0 1), 0, C*r*), (58)

the observability matrix can be converted into a row echelon form. For the sake of readability the * indices are dropped again:

O = ⎡ I ⋯ 0   0            0            0                    0  0
      ⋮ ⋱ ⋮   ⋮            ⋮            ⋮                    ⋮  ⋮
      0 ⋯ I   0            0            0                    0  0
      0 ⋯ 0  −I            0            s_1^×                0  0
      0 ⋯ 0   0            I            O_1                  0  0
      0 ⋯ 0   0           −2(Cg)^×      O_2                  0  0
      0 ⋯ 0   0            0            Δs_{i,j}^×           0  0
      0 ⋯ 0   0            0            Δs_{i,j}^× ω^×       0  0
      0 ⋯ 0   0            0            Δs_{i,j}^× ω^× ω^×   0  0
      0 ⋯ 0   0            2ω^×(Cg)^×   O_3                  0  0 ⎦, (59)

with

O_1 = −s_1^× ω^× − 2(Cv)^×, (60)
O_2 = (s_1^× ω^× + 3(Cv)^×) ω^× − ω^× (s_1^× ω^× + 2(Cv)^×) − (Cg)^× − 2f^×, (61)
O_3 = ω^× (s_1^× ω^× ω^× + 5(Cv)^× ω^× − 4f^× − 3(Cg)^×)
      − (s_1^× ω^× ω^× + 4(Cv)^× ω^× − 3f^× − 2(Cg)^×) ω^× − 4ω^× (Cv) ω^T, (62)
Δs_{i,j} := s_i − s_j. (63)

A full interpretation of this matrix is not within the scope of this paper. However, two essential points are emphasized. The four-dimensional manifold composed of robot position and yaw angle (rotation around the gravity vector g) is always unobservable. This can be verified by looking at the tangential space spanned by the matrix

Ū = ⎡ 0 ⋯ 0   0   0   0          0   I
      0 ⋯ 0   0   0   ½(Cg)^T    0   0 ⎦^T, (64)

O Ū = 0. (65)

Infinitesimal errors Δz = Ū ε lying within the subspace of Ū cannot be detected. Transforming this back to our original coordinates yields the tangential space

U = ⎡ C^T        0          0      C^T ⋯ C^T              0  0
      g^T r^×    g^T v^×    g^T C  g^T p_1^× ⋯ g^T p_N^×  0  0 ⎦^T, (66)

where the upper row corresponds to a 3-dimensional translation of the inertial coordinate frame and where the lower row corresponds to a rotation of the inertial coordinate frame around the gravity axis g.

The second point is that in some cases, the rank loss associated with the unobservable manifold can increase by up to 5 additional ranks. Table I depicts some of the cases. All cases which induce a rank loss require some singularities. It can thus be stated that statistically almost surely the unobservable space will be limited to absolute position and yaw angle (except for the case where there is no ground contact at all). Note that if the bias estimation is excluded, the unobservable subspace will be invariantly of rank four. Unfortunately, typical operating points can lie very close to singular cases. The upper highlighted row in table I represents

ω           f             v    foot contacts                  Rank loss
ω · Cg ≠ 0  *             *    s_1, ..., s_N not co-linear    0
ω · Cg ≠ 0  det O_3 ≠ 0   *    at least one contact           0
ω · Cg = 0  *             *    at least one contact           ≥ 1
0           *             *    at least one contact           ≥ 2
0           *             *    not co-linear                  2
0           0             *    s_1 = ... = s_N                3
0           −1/2 Cg       *    s_1 = ... = s_N                5

TABLE I. ESTIMATION SCENARIOS AND CORRESPONDING RANK LOSS.

the case where the robot has at least 3 non-co-linear ground contacts and where the rotation axis is not perpendicular to the gravity vector. The lower highlighted row represents the corresponding singular case where ω = 0, inducing a rank loss of 2. This proximity to singular cases can cause bad convergence quality. For this reason the filter is implemented in such a manner that the estimation of the accelerometer and gyroscope biases can be enabled or disabled at runtime. Thereby it is possible to disable the bias estimation for critical tasks. On the other hand, special maneuvers which allow the bias states to be properly estimated can be derived from the conditions in table I.

The above phenomenon has been observed earlier in the context of EKF-SLAM [9, 12]. Huang et al. [10] introduced the Observability Constrained Extended Kalman Filter in order to tackle this issue. The approach in this paper goes much along their idea: the unobservable subspace of the nonlinear system (66) should also be unobservable in the linearized and discretized system (67)-(68). Mathematically, this can be imposed by adding the following constraint: M U = 0.

(70)

In order to meet this constraint Huang et al. evaluate the Jacobians at special operating points: instead of using the actual state estimate they use slightly altered values. The approach in this paper tackles the observability problem by exploiting the following observation: the noiseless case does meet the constraint (70) because it perfectly fulfills the prediction equations (30)-(35) and thus the appropriate terms are canceled out. For the presented filter it suffices if the following constraints are introduced (where a ∗ denotes the states or measurements around which Jacobians are evaluated): Δt2 ∗T ∗ (C k f k,1 + g), 2 ∗ = v ∗k + Δt(C ∗T k f k,2 + g),

B. Observability Analysis of the Extended Kalman Filter

r ∗k+1 = r ∗k + Δtv ∗k +

(71)

The filter makes use of a linearized and discretized version of the nonlinear system model:

v ∗k+1

(72)

xk+1 = F k xk + wlin,k , y k = H k xk + nlin,k ,

q ∗k+1 p∗i,k+1

(67) (68)

where errors caused by linearization or discretization are incorporated in the noise terms wlin,k and nlin,k . The corresponding observability analysis will be performed by applying the concept of local observability matrices [1]: similar to the time-invariant case the observable subspace can be derived by analyzing the subspace spanned by the rows of a local observability matrix: ⎡ ⎤ Hk ⎢ ⎥ H k+1 F k ⎢ ⎥ ⎢ H k+2 F k+1 F k ⎥ M=⎢ (69) ⎥. ⎢H k+3 F k+2 F k+1 F k ⎥ ⎣ ⎦ .. . The observability characteristics of the discrete linear timevarying system (67)-(68) can differ from those of the underlying nonlinear system (14)-(19),(27). This discrepancy can be caused by linearization/discretization effects as well as by noise effects. The effect of noise becomes particularly evident when contemplating the observability characteristics of a corresponding noiseless (ideal) system. For the presented system the effect of noise renders the yaw angle observable by preventing the evaluation of the Jacobians F k and H k around the true state and thereby increasing the numerical rank of the local observability matrix M. The spurious appearance of new observable states is strongly objectionable as it results in overconfident state estimation. The magnitude of this inconsistency depends on the noise ratio, but in the long run, it will always deteriorate the state estimate.

= =

ζ(ω ∗k ) p∗i,k



q ∗k ,

(73)

∀i ∈ {1, . . . , N }.

(74)

Both, filter state and IMU measurements, are allowed to differ from their actual estimated quantities. However, in order to keep the linearization errors small the linearization point should remain as close as possible to the estimated state. Thus, given the timestep li of the last touch-down event of foot i, the first-ever available estimate is chosen for the linearization: r ∗k = r − k, p∗i,k

=

p− i,li

v ∗k = v − k,

q ∗k = q − k,

∀i ∈ {1, . . . , N }.

(75) (76)

This is in analogy to the First-Estimates Jacobian EKF of Huang et al. [9]. But, in general, the prediction constraints (71)-(73) are still not met. For this reason the additional terms f ∗k,1 , f ∗k,2 and ω ∗k were introduced. Now, by choosing  ∗  r k+1 − r ∗k − Δtv ∗k ∗ ∗T f k,1 = C k −g , (77) 0.5Δt2   ∗ v k+1 − v ∗k −g , (78) f ∗k,2 = C ∗T k Δt   ω ∗k = ζ −1 q ∗k+1 ⊗ q ∗−1 (79) k all constraints can be easily met. The above quantities represent the IMU measurements that would arise when considering two subsequent filter prediction states at timestep k and k + 1. Because the acceleration related measurements can differ if evaluated based on the position prediction or on the velocity prediction, two terms were introduced. This permits to keep the computation of the linearization quantities simple and avoids complex optimization algorithms or oscillation provoking bindings between subsequent linearization points.

Computing the Jacobians F_k and H_k using the supplementary linearization quantities and evaluating the corresponding local observability matrix (69) yields:

M = [ −I   0   C_kᵀ s×_{1,k}            I  ⋯  0   0                         0
      ⋮    ⋮        ⋮                   ⋮  ⋱  ⋮   ⋮                         ⋮
      −I   0   C_kᵀ s×_{N,k}            0  ⋯  I   0                         0
      0   −I   (v_k + (Δt/2) g)× C_kᵀ   0  ⋯  0   −(Δt²/2) C_kᵀ             #
      0    0   −g×                      0  ⋯  0   (1/2)(Cᵀ_{k+1} + Cᵀ_k)    #
      0    0   0                        0  ⋯  0   (1/2)(Cᵀ_{k+2} − Cᵀ_k)    #
      0    0   0                        0  ⋯  0   (1/2)(Cᵀ_{k+3} − Cᵀ_{k+1}) #
      0    0   0                        0  ⋯  0   ⋮                         # ]   (80)

whereby it is simple to verify that the observability constraint (70) is satisfied. As a last side note: similarly to the nonlinear case, observability rank loss will again be induced when ω ≡ 0 and thus Cᵀ_{k+2} − Cᵀ_k = 0.

V. RESULTS AND DISCUSSION

Experiments are performed in simulation and on a real platform, whereby a series-elastic actuated quadruped is stabilized by a virtual model control approach [11] using the feedback of the pose estimator. The estimation of accelerometer and gyroscope biases is always enabled. In a first experiment, the filter behavior is evaluated for a dynamic trotting gait within a simulation environment including realistic noise levels. Fig. 2 shows results from a 15 s trajectory with a reference forward speed of 0.4 m/s. The uncertainties of the robot and of the foothold positions are represented by the corresponding 1σ-ellipses. The effects of the unobservable absolute position and yaw angle can clearly be perceived. The leg kinematics measurements directly correlate the estimate of the main body position with the estimates of the foothold positions and thereby strongly limit the drift. Moreover, considering the correlations induced by the prediction model, the filter is able to properly correct the estimated quantities, rendering the inclination angles and the velocities fully observable. Based on the resulting state estimate, the quadruped can stabilize itself in a highly dynamic gait.

Fig. 2. 2D view of a 5 m trotting sequence in simulation. Dashed line (in the background): ground-truth body trajectory. Darker (red) ellipses: successive position estimates of the robot's main body. Light grey ellipses: estimates of the foothold positions. In both cases the ellipses are scaled depending on the corresponding standard deviation (1σ). The position error at the end amounts to less than 5% of the traveled distance.

Fig. 3. Comparison between the estimated position and the motion capture system's position outputs. All three positions are affected by some drift, amounting to up to 10% of the traveled distance.

The second set of results is collected on a real platform. During the experiment, independent ground truth measurements are provided by an external visual tracking system. A 60 s long static walking sequence, where the robot moves approximately one meter forward, is evaluated. By pushing and tilting the robot, external disturbances are imposed on the slow locomotion pattern. Because the position is not fully observable, a slight drift occurs for the corresponding estimates (see Fig. 3); it can amount to roughly 10% of the traveled distance. Notable sources of drift are the inaccurate leg kinematics and the fault-prone contact detection. The slightly longer actual robot shank explains the shorter estimated traveled distance (x direction). On the other hand, small perturbations are closely tracked by the filter. This is attested by very precise velocity estimates yielding RMS error
values of less than 0.02 m/s (see Fig. 4). Like the velocity states, the roll and pitch angles are fully observable and likewise exhibit very small estimation errors (see Fig. 5). The drift of the yaw angle is almost imperceptible. For all estimates the corresponding 3σ covariance hull is plotted. Except for the x-position estimate, where model inaccuracies induce a significant offset, all estimation errors remain within the covariance hull, validating the consistency of the presented approach.

Fig. 4. Comparison between the estimated velocity and the motion capture system's numerical position derivatives. All three velocity estimates are fully observable and can consequently be tracked very accurately. The resulting RMS error values are 0.0111 m/s, 0.0153 m/s, and 0.0126 m/s.

Fig. 5. Comparison between the estimated roll, pitch, and yaw angles and the motion capture system's orientation outputs. Roll and pitch are fully observable and the filter produces very precise corresponding estimates, with angular RMS errors below 0.5 deg (0.0088 rad and 0.0073 rad). The yaw angle drift is almost unnoticeable.

VI. CONCLUSION AND FUTURE WORK

This paper presents a pose estimator for legged robots. It fuses information from leg kinematics and IMU data, whereby the model equations are kept simple and precise, and only a minimum of assumptions is introduced (mainly limited foot slippage). The filter can handle unknown terrain and arbitrary locomotion gaits. Through an observability analysis, it was shown that for non-degenerate cases only the absolute position and the yaw angle are unobservable. Consequently, the roll and pitch angles as well as the robot's velocity can be accurately tracked, which was confirmed by the experimental results. Compared to setups with proprioceptive sensors only, the obtained state estimate attains an unprecedented level of precision. The very generic formulation enables the filter to be extended with further sensory measurements and allows its implementation on various kinds of legged platforms.

Future work will include handling the unobservable states. Different approaches, such as introducing coordinate transformations, partitioning the unobservable manifold, or implementing pseudo-measurements, could be evaluated. Fusion with exteroceptive sensors will also be investigated. More aggressive locomotion needs further testing: while it has been validated in simulation, future work will include dynamic walking on the real quadruped platform.

REFERENCES

[1] Z. Chen, K. Jiang, and J. C. Hung. Local observability matrix and its application to observability analyses. In Industrial Electronics Society, 16th Annual Conference of IEEE, Nov. 1990.
[2] A. Chilian, H. Hirschmüller, and M. Görner. Multisensor data fusion for robust pose estimation of a six-legged walking robot. In Intelligent Robots and Systems, IEEE/RSJ International Conference on, Sep. 2011.
[3] S. Chitta, P. Vernaza, R. Geykhman, and D. D. Lee. Proprioceptive localization for a quadrupedal robot on known terrain. In Robotics and Automation, IEEE International Conference on, Apr. 2007.
[4] J. A. Cobano, J. Estremera, and P. Gonzalez de Santos. Location of legged robots in outdoor environments. Robotics and Autonomous Systems, 56:751–761, 2008.
[5] N. El-Sheimy, Haiying Hou, and Xiaoji Niu. Analysis and modeling of inertial sensors using Allan variance. Instrumentation and Measurement, IEEE Transactions on, 57(1):140–149, Jan. 2008.
[6] B. Gassmann, F. Zacharias, J. M. Zöllner, and R. Dillmann. Localization of walking robots. In Robotics and Automation, IEEE Int. Conf. on, Apr. 2005.
[7] O. Gur and U. Saranli. Model-based proprioceptive state estimation for spring-mass running. In Proceedings of Robotics: Science and Systems, Jun. 2011.
[8] R. Hermann and A. Krener. Nonlinear controllability and observability. Automatic Control, IEEE Transactions on, 22(5):728–740, Oct. 1977.
[9] G. P. Huang, A. I. Mourikis, and S. I. Roumeliotis. Analysis and improvement of the consistency of extended Kalman filter based SLAM. In Robotics and Automation, IEEE International Conference on, May 2008.
[10] Guoquan P. Huang, Anastasios I. Mourikis, and Stergios I. Roumeliotis. Observability-based rules for designing consistent EKF SLAM estimators. International Journal of Robotics Research, 29:502–528, Apr. 2010.
[11] M. Hutter, C. Gehring, M. Bloesch, M. A. Hoepflinger, C. D. Remy, and R. Siegwart. StarlETH: A compliant quadrupedal robot for fast, efficient, and versatile locomotion. In Climbing and Walking Robots, International Conference on, Jul. 2012.
[12] S. J. Julier and J. K. Uhlmann. A counter example to the theory of simultaneous localization and map building. In Robotics and Automation, IEEE Int. Conf. on, May 2001.
[13] P. C. Lin, H. Komsuoglu, and D. E. Koditschek. A leg configuration measurement system for full-body pose estimates in a hexapod robot. Robotics, IEEE Transactions on, 21(3):411–422, Jun. 2005.
[14] P. C. Lin, H. Komsuoglu, and D. E. Koditschek. Sensor data fusion for body state estimation in a hexapod robot with dynamical gaits. Robotics, IEEE Transactions on, 22(5):932–943, Oct. 2006.
[15] M. Reinstein and M. Hoffmann. Dead reckoning in a dynamic quadruped robot: Inertial navigation system aided by a legged odometer. In Robotics and Automation, IEEE International Conference on, May 2011.
[16] S. P. N. Singh, P. J. Csonka, and K. J. Waldron. Optical flow aided motion estimation for legged locomotion. In Intelligent Robots and Systems, IEEE/RSJ International Conference on, Oct. 2006.
[17] F.-L. Tong and M. Q.-H. Meng. Localization for legged robot with single low-resolution camera using genetic algorithm. In Integration Technology, IEEE, Mar. 2007.
[18] C. Van Loan. Computing integrals involving the matrix exponential. Automatic Control, IEEE Transactions on, 23(3):395–404, Jun. 1978.

Extrinsic Calibration from Per-Sensor Egomotion

Jonathan Brookshire and Seth Teller
MIT Computer Science and Artificial Intelligence Laboratory, {jbrooksh, teller}@csail.mit.edu

Abstract—We show how to recover the 6-DOF transform between two sensors mounted rigidly on a moving body, a form of extrinsic calibration useful for data fusion. Our algorithm takes noisy, per-sensor incremental egomotion observations (i.e., incremental poses) as input and produces as output an estimate of the maximum-likelihood 6-DOF calibration relating the sensors and accompanying uncertainty. The 6-DOF transformation sought can be represented effectively as a unit dual quaternion with 8 parameters subject to two constraints. Noise is explicitly modeled (via the Lie algebra), yielding a constrained Fisher Information Matrix and Cramer-Rao Lower Bound. The result is an analysis of motion degeneracy and a singularity-free optimization procedure. The method requires only that the sensors travel together along a motion path that is non-degenerate. It does not require that the sensors be synchronized, have overlapping fields of view, or observe common features. It does not require construction of a global reference frame or solving SLAM. In practice, from hand-held motion of RGB-D cameras, the method recovered inter-camera calibrations accurate to within ∼0.014 m and ∼0.022 radians (about 1 cm and 1 degree).

I. INTRODUCTION

Extrinsic calibration – the 6-DOF rigid body transform relating sensor coordinate frames – is useful for data fusion. For example, point clouds arising from different range sensors can be aligned by transforming one sensor's data into another sensor's frame, or all sensor data into a common body frame. We show that inter-sensor calibration and an uncertainty estimate can be accurately and efficiently recovered from incremental poses (and uncertainties) observed by each sensor. Fig. 1 shows sensors r and s, each able to observe its own incremental motion v_{ri} and v_{si} respectively, such that the calibration k satisfies:

v_{si} = g(v_{ri}, k)

(1)

where g simply transforms one set of motions according to k. This paper describes an algorithm to find the k that best aligns two series of observed incremental motions. The algorithm takes as input two sets of 6-DOF incremental pose observations and a 6×6 covariance matrix associated with each incremental pose. It produces as output an estimate of the 6-DOF calibration, and a Cramer-Rao lower bound (CRLB) [1] on the uncertainty of that estimate. (For source code and data, see http://rvsn.csail.mit.edu/calibration3d.) Prior to describing our estimation method, we confirm that the calibration is in general observable, i.e., that there is sufficient information in the observations to define k uniquely. Observability analysis yields a characterization of singularities in the Fisher Information Matrix (FIM) arising from non-generic sensor motion. For example, singularity analysis reveals that the 6-DOF calibration cannot be recovered from planar-only motion, or when the sensors rotate only around a single

axis. This confirms previous findings [3, 20] and provides a variance estimator useful in practice.

Fig. 1: The incremental motions of the r (red) and s (blue) sensors are used to recover the calibration between the sensors as the robot moves. The dotted lines suggest the incremental motions, v_{ri} and v_{si}, for sensors r and s, respectively. (In color; best viewed online.)

A key aspect of this work is the choice of representation for elements of the Special Euclidean group SE(3), each of which combines a translation in R³ with a rotation in SO(3). Ideally, we desire a representation that (1) supports vector addition and scaling, so that a principled noise model can be formulated, and (2) yields a simple form for g in Eq. 1, so that any singularities in the FIM can be readily identified. We considered pairing translations with a number of rotation representations – Euler angles, Rodrigues parameters, and quaternions – but each lacks some of the criteria above. Instead, we compromise by representing each element of SE(3) as a unit dual quaternion (DQ) in the space H. Each DQ q ∈ H has eight parameters and can be expressed:

q = q_r + ε q_ε

(2)

where qr is a “real” unit quaternion representing the rotation, qε is the “dual part” representing the translation, and ε2 = 0. An 8-element DQ is “over-parametrized” (thus subject to two constraints) when representing a 6-DOF rigid body transform. Although DQ’s are not minimal, they are convenient for this problem, combining in a way analogous to quaternion composition and yielding a simple form for g – about 20 lines of MatLab containing only polynomials and no trigonometric functions. An Euler-angle representation, by contrast, is minimal but produces much more complex expressions involving hundreds of lines of trigonometric terms. Homogeneous transformations yield a simple form of g, but require maintenance of many constraints. The DQ representation offers a good balance between compactness and convenience. Ordinary additive Gaussian noise cannot be employed with DQ’s, since doing so would produce points not on SE(3). Instead, we define a noise model using a projected Gaussian in the Lie algebra of DQ’s [9] which is appropriate for this


over-parametrized form. To identify singularities of the FIM, we adapt communication theory's "blind channel estimation" methods to determine the calibration observability. Originally developed to determine the CRLB on constrained imaginary numbers [18], these methods extend naturally to DQ's. Background on calibration techniques and an introduction to DQ's is provided in § II. The problem is formally stated in § III, along with a noise model appropriate for the DQ representation. System observability is proven, and degenerate cases are discussed, in § IV. The optimization process for constrained parameters is described in § V, with techniques for resampling asynchronous data and converting between representations provided in § VI. Experimental results from simulated and real data are given in § VII.

II. BACKGROUND

A. Calibration

There are several ways to determine the calibration. One can attempt to physically measure the rigid body transform between sensors. However, manual measurement can be made difficult by the need to establish an origin at an inaccessible location within a sensor, or to measure around solid parts of the body to which the sensors are mounted. Alternatively, the calibration can be mechanically engineered through the use of precision-machined mounts. This can work well for sensors in close proximity (e.g., stereo camera rigs), but is impractical for sensors placed far apart (e.g., on a vehicle or vessel). The calibration could also be determined by establishing a global reference frame using Simultaneous Localization and Mapping (SLAM), then localizing each sensor within that frame (e.g., [11]). This approach has the significant disadvantage that it must invoke a robust SLAM solver as a subroutine. Incremental motions have also been used to recover "hand-eye calibration" parameters.
The authors in [5, 3, 20] recover the calibration between an end-effector and an adjacent camera by commanding the end-effector to move with some known velocity and estimating the camera motion. The degenerate conditions in [3, 20] are established through geometric arguments; we confirm their results via information theory and the CRLB. Further, the use of the CRLB allows our algorithm to identify a set of observations as degenerate, or nearly degenerate (resulting in a large variance), in practice. Dual quaternions were also used by [5]; we extend this notion and explicitly model the system noise via a projected Gaussian. Doing so allows us to confirm the CRLB in simulated experiments (§ VII-A).

B. Unit Dual Quaternions

Our calibration algorithm requires optimization over elements of SE(3). Optimization over rigid body transformations is not new (e.g., [8]) and is a key component of many SLAM solutions. In our setting, unit dual quaternions (DQ's) prove to be a convenient representation both because they

yield a simple form for g (Eq. 1) and because they can be implemented efficiently [14]. Optimization with DQ's was also examined in [6], but their cost function included only translations; our optimization must simultaneously minimize rotation error. We review DQ's briefly here; the interested reader should see [15, p. 53-62] for details. A DQ q can be written in several ways: as an eight-element vector [q₀, ⋯, q₇]; as two four-element vectors [q_r, q_ε] (c.f. Eq. 2); or as a sum of imaginary and dual components:

q = (q₀ + q₁ i + q₂ j + q₃ k) + ε (q₄ + q₅ i + q₆ j + q₇ k)          (3)

In this form, two DQ's multiply according to the standard rules for the imaginary numbers {i, j, k}. We write DQ multiplication as a₁ ∘ a₂, where {a₁, a₂} ∈ H. When we have vectors of DQ's, e.g., a = [a₁, a₂] and b = [b₁, b₂], where {b₁, b₂} ∈ H, we write a ∘ b to mean [a₁ ∘ b₁, a₂ ∘ b₂]. A pure rotation defined by unit quaternion q_r is represented by the DQ q = [q_r, 0, 0, 0, 0]. A pure translation defined by t = [t₀, t₁, t₂] can be represented by the DQ:

q = [1, 0, 0, 0, 0, t₀/2, t₁/2, t₂/2]                               (4)

Given rigid body transform q, the inverse transform q⁻¹ is:

q⁻¹ = [q₀, −q₁, −q₂, −q₃, q₄, −q₅, −q₆, −q₇]                        (5)

such that q⁻¹ ∘ q = q ∘ q⁻¹ = I = [1, 0, 0, 0, 0, 0, 0, 0]. A vector v = [v₀, v₁, v₂] can be represented as a DQ by:

q_v = [1, 0, 0, 0, 0, v₀, v₁, v₂]                                   (6)

The DQ form q_v of vector v transforms according to q as:

q_v' = q ∘ q_v ∘ q*                                                 (7)

where q* is the dual-conjugate [14] to q:

q* = [q₀, −q₁, −q₂, −q₃, −q₄, q₅, q₆, q₇]                           (8)

DQ transforms can be composed as with unit quaternions; applying transform A, then transform B, to point v yields:

q_v' = q_B ∘ (q_A ∘ q_v ∘ q_A*) ∘ q_B* = q_BA ∘ q_v ∘ q_BA*         (9)

where q_BA = q_B ∘ q_A. If the incremental motion v_{ri} and calibration k are expressed as DQ's, then Eq. 1 becomes:

g(v_{ri}, k) := k⁻¹ ∘ v_{ri} ∘ k                                    (10)

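The DQ algebra above can be sketched in a few lines of code (a minimal illustration, not the authors' implementation; the 8-vector layout follows Eqs. (2)-(5)):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dq_mul(a, b):
    """DQ product a ∘ b; each DQ is an 8-vector (real part, dual part)."""
    return np.concatenate([qmul(a[:4], b[:4]),
                           qmul(a[:4], b[4:]) + qmul(a[4:], b[:4])])

def dq_inv(q):
    """Inverse of a unit DQ, Eq. (5): conjugate both quaternion parts."""
    return q * np.array([1, -1, -1, -1, 1, -1, -1, -1])

def dq_translation(t):
    """Pure translation as a DQ, Eq. (4)."""
    return np.array([1, 0, 0, 0, 0, t[0]/2, t[1]/2, t[2]/2])

def g(v, k):
    """Eq. (10): transform an incremental motion v by the calibration k."""
    return dq_mul(dq_mul(dq_inv(k), v), k)
```

For example, composing two pure translations yields the translation by the vector sum, and conjugating a translation by a pure rotation rotates the translation vector, as the composition rule (9) predicts.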
A DQ with q_rᵀ q_r = 1 and q_rᵀ q_ε = 0 (c.f. Eq. 2) has unit length. We impose these conditions as two constraints below.

C. Lie Groups & Lie Algebras

Lie groups are smooth manifolds for which associativity of multiplication holds, an identity exists, and the inverse is defined [7, 13]; examples include Rⁿ, SO(3), and SE(3). However, Rⁿ is also a vector space (allowing addition, scaling, and commutativity), but SO(3) and SE(3) are not. For this reason, points in SO(3) and SE(3) cannot simply be interpolated or averaged. We use the Lie algebra to enable an optimization method that requires these operations. The Lie algebra describes a local neighborhood (i.e., a tangent space) of a Lie group. So the Lie algebra of DQ's, h, can be used to express a vector space tangent to some point in H. Within this vector space, DQ's can be arithmetically manipulated. Once the operation is complete, points in the Lie algebra can be projected back into the Lie group. The logarithm maps from the Lie group to the Lie algebra; the exponential maps from the Lie algebra to the Lie group. Both mappings are done at the identity. For example, if we have two elements of a Lie group {u, v} ∈ H, the "box" notation of [9] expresses the difference between v and u as d = v ⊟ u. (Here, {u, v} are each eight-element vectors and d is a six-element vector.) That is, the box-minus operator connotes d = 2 log(u⁻¹ ∘ v), where d ∈ h. In the Lie group, u⁻¹ ∘ v is a small transformation relative to the identity, i.e., relative to u⁻¹ ∘ u (see Fig. 2). The factor of two before the log results from the fact that DQ's multiply the vector on both the left and the right during transformation (c.f. Eq. 7). Similarly, the box-plus addition operator [9] involves exponentiation. If d ∈ h, then exp d ∈ H. If d = 2 log(u⁻¹ ∘ v), then u ∘ exp(d/2) = u ∘ exp(log(u⁻¹ ∘ v)) = u ∘ u⁻¹ ∘ v = v. Since d applies a small transform to u, we use the box-plus operator to write v = u ⊞ d = u ∘ exp(d/2).
Fig. 2: The mapping between the Lie group, H, and the Lie algebra, h, is performed at the identity, u⁻¹ ∘ u.

This definition of the difference between DQ's yields a Gaussian distribution as follows: imagine a Gaussian drawn on the line h in Fig. 2. Exponentiating points on this Gaussian "projects" the distribution onto H. This projected Gaussian serves as our noise model (§ III). Summarizing, the Lie group/algebra enables several key operations: (1) addition of two DQ's, (2) subtraction of two DQ's, and (3) addition of noise to a DQ.

1) Logarithm of dual quaternions: The logarithm of some q ∈ H can be calculated as [17]:

log q = (1 / (4 sin³θ)) [ (2θ − sin 2θ) q³ + (−6θ cos θ + 2 sin 3θ) q² + (6θ − sin 2θ − sin 4θ) q + (−3θ cos θ + θ cos 3θ − sin θ + sin 3θ) I ]_{1:3,5:7}   (11)

where θ is the rotation angle associated with the DQ, and exponentiation of a DQ is implemented through repeated multiplication (∘). (This expression incorporates a correction, provided by Selig, to that given in [17].) The (·)_{1:3,5:7} removes the identically zero-valued first and fifth elements from the 8-vector. To avoid the singularity at θ = 0, the limit of log q can be evaluated as:

lim_{θ→0} log q = [0, 0, 0, q₅, q₆, q₇]                             (12)

For compactness, we write log a to mean [log a₁, log a₂].

2) Exponential of dual quaternions: If d ∈ h, w = [0, d₀, d₁, d₂] is a quaternion and q = [w, 0, d₃, d₄, d₅], we can exponentiate d as [17]:

exp d = (1/2)(2 cos|w| + |w| sin|w|) I − (1/2)(cos|w| − 3 sinc|w|) q + (1/2)(sinc|w|) q² − (1 / (2|w|²))(cos|w| − sinc|w|) q³   (13)

The singularity at w = 0 can be avoided by evaluating:

lim_{|w|→0} exp d = I + q                                           (14)
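A direct transcription of Eqs. (13)-(14) can be checked numerically (a sketch under the stated 8-vector conventions; note the half-angle behavior: exp([θ,0,0,0,0,0]) gives a quaternion rotation by 2θ, consistent with the exp(d/2) usage of the box-plus operator, and that the unnormalized sinc is needed, not NumPy's normalized `np.sinc`):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dq_mul(a, b):
    return np.concatenate([qmul(a[:4], b[:4]),
                           qmul(a[:4], b[4:]) + qmul(a[4:], b[:4])])

def dq_exp(d):
    """Exponential map h -> H per Eq. (13); d is a 6-vector."""
    w = np.array([0.0, d[0], d[1], d[2]])
    q = np.concatenate([w, [0.0, d[3], d[4], d[5]]])
    I = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])
    n = np.linalg.norm(w)
    if n < 1e-8:                      # Eq. (14): limit as |w| -> 0
        return I + q
    q2 = dq_mul(q, q)
    q3 = dq_mul(q2, q)
    sinc = np.sin(n) / n              # unnormalized sinc (not np.sinc)
    return (0.5*(2*np.cos(n) + n*np.sin(n))*I
            - 0.5*(np.cos(n) - 3*sinc)*q
            + 0.5*sinc*q2
            - (np.cos(n) - sinc)/(2*n**2)*q3)
```

Expanding the four coefficients for small |w| reproduces the series I + q + q²/2 + q³/6, i.e., the ordinary exponential of q, which is a useful sanity check on the transcription.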

III. PROBLEM STATEMENT

Given the observed incremental motions and their covariances, the problem is to estimate the most likely calibration. We formulate this task as a non-linear least squares optimization with a state representation of DQ's. DQ's were chosen to verify the CRLB in § VII-A. In § V, we show how to perform the calculation in the over-parametrized state space. (We chose an over-parametrization rather than a minimal representation in order to avoid singularities.) The calibration to estimate is k = [k₀, ⋯, k₇], where k ∈ H. The 6-DOF incremental poses from each sensor form a series of DQ observations (our experiments use FOVIS [10] and KinectFusion [16]). Let z_r = [z_{r1}, z_{r2}, ⋯, z_{rN}] and z_s = [z_{s1}, z_{s2}, ⋯, z_{sN}] be the observations from sensors r and s, respectively. Note that {z_{ri}, z_{si}} ∈ H. Finally, let z = [z_r, z_s] be the (2N) observations. Both the incremental poses of sensor r and the calibration must be estimated [2], so the state to estimate is x = [v_r, k], consisting of (N + 1) DQ's, where v_r = [v_{r1}, v_{r2}, ⋯, v_{rN}] is the latent incremental motion of sensor r. We then formulate the task as a maximum likelihood optimization [2]:

x̂_ML(z) = argmax_{x=[v_r,k]} ∏_{i=1}^{N} P(z_{ri} | v_{ri}) P(z_{si} | v_{ri}, k)   (15)

under the constraint that {v_{ri}, k} ∈ H. The probability functions might be assumed to be Gaussian. However, it is clear that adding Gaussian noise to each term of a DQ will not generally result in a DQ. Instead, we use the projected Gaussian. By comparison, other approaches [5, 3, 20] simply ignore noise and assume that observations and dimensions should be equally weighted. However, when observation uncertainty varies (e.g., when the number of features varies for a vision algorithm) or when the uncertainty among dimensions varies (e.g., translations are less accurate than rotations), explicit representation of noise minimizes the effects of inaccurate observations. Further, a principled noise model allows recovery of the CRLB and, thus, the calibration's uncertainty.


A. Process model

The process model can be written as:

z = G(x) ∘ exp(δ/2) ≡ G(x) ⊞ δ                                     (16)
G(x) = [v_{r1}, ⋯, v_{rN}, g(v_{r1}, k), ⋯, g(v_{rN}, k)]           (17)

where δ ∈ h acts as a projected Gaussian: δ ∼ N(0, Σ_z). Here, the expected observations G(x) have been corrupted by a noisy transformation with δ. Notice that the process model is not typical additive Gaussian noise, z = G(x) + δ, which would result in z ∉ H. The difference between the observations, z, and the expected values, G(x), is λ = −2 log(z⁻¹ ∘ G(x)), where λ includes 12N parameters, six for each of the 2N observations. The posteriors in Eq. 15 can be written:

P(z|x) ∼ f(z⁻¹ ∘ G(x)) = (1 / √((2π)^{12N} |Σ_z|)) e^{−(1/2) λᵀ Σ_z⁻¹ λ}   (18)

IV. OBSERVABILITY

Assume we have some estimate x̂ of the true parameters x₀. We wish to know how the estimate varies, so we calculate the covariance E[(x̂ − x₀)(x̂ − x₀)ᵀ]. Cramer and Rao [19] showed that this quantity can be no smaller than the inverse of J, the Fisher Information Matrix (FIM):

E[(x̂ − x₀)(x̂ − x₀)ᵀ] ≥ J⁻¹                                         (19)

For an unbiased estimator, the FIM is:

J = E[(∇_x ln P(z|x))ᵀ (∇_x ln P(z|x))]                             (20)

The CRLB is critical because (1) it defines a best-case covariance for our estimate, and (2) if J is singular, no estimate exists. If (and only if) x cannot be estimated, then x is unobservable. Indeed, we wish to identify the situations under which x is unobservable and arrange to avoid them in practice.

A. Shift invariance

The FIM, as it is based on the gradient, is invariant under rigid body transformations, so the FIM of a distribution of f(z⁻¹ ∘ G(x)) equals that of the distribution f(G(x)). This is because the expectation is an integration over SE(3) and is unaffected by a shift [4]. Thus, we analyze P(z|x) ∼ f(G(x)) to produce an equivalent FIM. Let H(y) = −2 log y and λ = H(G(x)). H accepts a 16N × 1 parameter vector of DQ's in the Lie group, and returns a 12N × 1 vector of differences in the Lie algebra.

B. Fisher Information Matrix

In order to calculate the FIM for our estimation problem, we first find the gradient of λ by applying the chain rule:

∇_x λ = [∇_p H(p)|_{p=G(x)}]ᵀ [∇_x G(x)]                            (21)
      = J_H J_G                                                     (22)

where J_H is 12N × 16N and J_G is 16N × 8(N + 1). Then, we calculate:

∇_x ln P(z|x) = ∇_x ln(c e^{−(1/2) λᵀ Σ_z⁻¹ λ})                     (23)
             = ∇_x (−(1/2) λᵀ Σ_z⁻¹ λ)                              (24)
             = −(J_H J_G)ᵀ Σ_z⁻¹ λ                                  (25)

with Σ_z⁻¹ of size 12N × 12N and λ of size 12N × 1. Substituting into Eq. 20:

J = E[(∇_x ln P(z|x))ᵀ (∇_x ln P(z|x))]                             (26)
  = E[(J_Gᵀ J_Hᵀ Σ_z⁻¹ λ)(J_Gᵀ J_Hᵀ Σ_z⁻¹ λ)ᵀ]                      (27)
  = J_Gᵀ J_Hᵀ Σ_z⁻¹ E[λ λᵀ] Σ_z⁻¹ J_H J_G                            (28)
  = J_Gᵀ J_Hᵀ Σ_z⁻¹ Σ_z Σ_z⁻¹ J_H J_G                                (29)-(30)
  = J_Gᵀ J_Hᵀ Σ_z⁻¹ J_H J_G                                          (31)

Since each of the (N + 1) DQ's in x is subject to two constraints, J_G is always rank-deficient by at least 2(N + 1). Of course, our interest is not in rank deficiencies caused by over-parametrization but in singularities due to the observations. We must distinguish singularities caused by over-parametrization from those caused by insufficient or degenerate data.

C. Cramer-Rao Lower Bound

In the communications field, a related problem is to estimate complex parameters of the wireless transmission path. These complex "channel" parameters are usually unknown but obey some constraints (e.g., they have unit magnitude). Since the CRLB is often useful for system tuning, Stoica [18] developed techniques to determine the CRLB with constrained parameters. Following [18], suppose the m constraints are expressed as f^c = [f^c_1, ⋯, f^c_m] such that f^c(x) = 0. Let F^c be the Jacobian of f^c and U be the null space of F^c such that F^c(x) U = 0. When K = Uᵀ J U is non-singular, the CRLB exists. In our case,

K = Uᵀ J U = Uᵀ J_Gᵀ J_Hᵀ Σ_z⁻¹ J_H J_G U                            (32)
  = Uᵀ J_Gᵀ J_Hᵀ L Lᵀ J_H J_G U                                      (33)
  = (Lᵀ J_H J_G U)ᵀ (Lᵀ J_H J_G U)                                   (34)

where Σ_z⁻¹ = L Lᵀ by the Cholesky factorization. In order to find the cases where J is singular, we examine the rank of K:

rank(K) = rank((Lᵀ J_H J_G U)ᵀ (Lᵀ J_H J_G U))                       (35)

Since rank(Aᵀ A) = rank(A),

rank(K) = rank(Lᵀ J_H J_G U)                                        (36)

Further, since each observation is full rank, Σ_z is full rank; Lᵀ is a 12N × 12N matrix with rank 12N and rank(Lᵀ A) = rank(A). Thus (see Appendix A):

rank(K) = rank(J_H J_G U) = rank(J_G U)                             (37)

Fig. 3: Visualization of the matrix J_G U shows that only the first six columns can reduce. Blank entries are zero; orange are unity; red are more complex quantities. (Full expressions for the matrix elements shown in red are included at the URL given in § I.)

D. Degeneracies

As shown in [2], one observation is insufficient to recover the calibration. For two observations and a particular null space matrix, U, J_G U has the form shown in Fig. 3. Notice that columns 7-18 are always linearly independent due to the unity entries. This is not surprising, since v_{ri}, the estimated motion of sensor r, is directly observed by z_{ri}. Only the first six columns, corresponding to the six DOFs of the calibration, can possibly reduce. By examining linear combinations of these columns, we can identify a singularity which, if avoided in practice, will ensure that the calibration is observable. Suppose the two incremental motions experienced by sensor r are {a, b} ∈ H and let a_i be the i-th term of the 8-element DQ, a. When a₁b₃ = a₃b₁ and a₂b₃ = a₃b₂, J_G U is singular and the calibration is unobservable. Since the 2nd-4th elements of the DQ correspond to the 2nd-4th elements of a unit quaternion representing the rotation, it follows that these relations hold only when there is no rotation or when the rotation axes of a and b are parallel. Thus, the calibration is unobservable when the sensors rotate only about parallel axes.

In principle, this analysis could have been completed using any representation for SE(3). However, attempting the analysis using Euler angles and Mathematica 8.0 exceeded 24 GB of memory without completing; manual analysis was equally difficult. By contrast, the DQ over-parametrization made both manual and automated analyses tractable to perform and simple to interpret, making readily apparent the degeneracy arising from a fixed axis of rotation.

The condition to avoid degeneracy has several common special cases:
1) Constant velocity. When the sensors move with constant velocity, the axes of rotation are constant.
2) Translation only. When the sensors only translate, no rotation is experienced.
3) Planar motion. When the robot travels only in a plane, the rotation axis is fixed (perpendicular to the plane).
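The parallel-axis condition above amounts to the vanishing of the cross product of the two quaternions' vector parts (the stated relations a₁b₃ = a₃b₁ and a₂b₃ = a₃b₂ are two of its components). A hedged sketch of such a motion-degeneracy check (illustrative only, not the paper's CRLB-based test):

```python
import numpy as np

def calibration_observable(a, b, tol=1e-9):
    """Given two incremental rotations as unit quaternions [w, x, y, z],
    return False when the motion is degenerate: either rotation is zero
    or the two rotation axes are parallel."""
    cross = np.cross(a[1:4], b[1:4])   # components include a1*b3 - a3*b1, etc.
    return np.linalg.norm(cross) > tol
```

In practice, the paper's CRLB gives a graded version of this binary test: nearly parallel axes produce a large, but finite, variance estimate.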

Fig. 4: Two robots driven along the suggested paths experience rotation about only one axis (green). As a result, the true calibration relating the two true sensor frames (red/blue) cannot be determined. The magenta lines and frames show ambiguous locations for the red sensor frame.

These special cases, however, do not fully characterize the degeneracy. So long as the axis of rotation of the incremental poses remains fixed, no amount of translation or rotation will avoid the singularity. In Fig. 4, for example, a robot is traveling along the suggested terrain. Although the robot translates and rotates by some varying amount between poses, it always rotates about the same axis (green lines). In such situations, the calibration is ambiguous at least along a line parallel to the axis of rotation (magenta line). That is, if the calibration is translated along such a line, the observations from sensor s remain fixed. Thus, because multiple calibrations result in the same observations, the calibration is unobservable.

V. OPTIMIZATION

In order to estimate the calibration in Eq. 15, we perform non-linear least squares optimization, using a modified Levenberg-Marquardt (L-M) algorithm [9]. The optimization proceeds as:

x_{t+1} = x_t ⊞ [ −(J_t^T Σ^{−1} J_t)^{−1} J_t^T Σ^{−1} (G(x_t) ⊟ z) ]   (38)

The term G(x_t) ⊟ z represents error elements in h, which are scaled by the gradient and added (via ⊞) to the current parameter estimate to produce a new set of DQ's. The method computes error, and applies corrections to the current estimates, in the tangent space via the Lie algebra. After each update, the parameters lie in H. J_t is the analytic Jacobian at the current estimate [21]:

J_t = ∇_h [ z ⊟ G(x_t ∘ exp(h/2)) ] |_{h=0}   (39)

Essentially, the method shifts the parameters x_t by an increment h ∈ h, then evaluates the derivative of the resulting error at h = 0.
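The ⊞-style update of Eq. 38 can be illustrated on a simpler manifold. The following sketch is our own illustration, not the paper's implementation: it fits a unit quaternion with plain Gauss-Newton plus a small damping term, and uses a numeric Jacobian in place of the analytic one in Eq. 39. It shows the pattern of computing corrections in the tangent space and mapping them back onto the manifold via exp:

```python
import numpy as np

def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_exp(v):
    # exponential map: 3-vector in the tangent space -> unit quaternion
    th = np.linalg.norm(v)
    if th < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(th)], np.sin(th) * v / th))

def rotate(q, p):
    # rotate point p by unit quaternion q: q * [0, p] * q^-1
    qp = quat_mul(quat_mul(q, np.concatenate(([0.0], p))),
                  q * np.array([1.0, -1.0, -1.0, -1.0]))
    return qp[1:]

def boxplus(q, h):
    # q ⊞ h = q ∘ exp(h/2): apply a tangent-space increment on the manifold
    return quat_mul(q, quat_exp(h / 2.0))

# Fit a unit quaternion to point correspondences y_i = R(q_true) p_i
p = np.eye(3)
q_true = quat_exp(np.array([0.0, 0.0, 0.3]))
y = np.array([rotate(q_true, pi) for pi in p])

def residual(q):
    return np.concatenate([rotate(q, pi) - yi for pi, yi in zip(p, y)])

q = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(10):
    r = residual(q)
    J = np.zeros((r.size, 3))          # numeric Jacobian w.r.t. the increment h
    for j in range(3):
        h = np.zeros(3)
        h[j] = 1e-6
        J[:, j] = (residual(boxplus(q, h)) - r) / 1e-6
    q = boxplus(q, -np.linalg.solve(J.T @ J + 1e-9 * np.eye(3), J.T @ r))
```

Because the increment lives in the 3-dimensional tangent space, the update never leaves the unit-quaternion manifold and no explicit unit-norm constraint is needed.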


VI. INTERPOLATION

Although the DQ representation facilitates the FIM analysis, and there are methods to develop a noise model, data and covariance matrices will typically be available in more common formats, such as Euler angles. Furthermore, sensors are rarely synchronized, so incremental motions may be observed over different sample periods. Following the process in [2], we use the Scaled Unscented Transform (SUT) [12] to (1) convert incremental motion and covariance data to the DQ representation and (2) resample data from different sensors at common times.

The SUT creates a set of sigma points, X, centered about the mean and spaced according to the covariance matrix. Each point is passed through the interpolation function f to produce a set of transformed points, Y. A new distribution is then created to approximate the weighted Y. We employ the process adapted by [9] to incorporate the Lie algebra. First, f converts the Euler states to DQ's. Second, it accumulates the incremental motions into a common reference frame and resamples them at the desired instants, typically the sample times of the reference sensor. This resampling is done via the DQ SLERP operator [14], which interpolates between poses with constant speed along the shortest path. The function then calculates the incremental motions.

The SUT requires that the transformed sigma points, Y, be averaged according to some weights, w. Because averaging each individual element of a DQ would not result in a DQ, the mean of a set of DQ's must be estimated by optimization. We wish to find the mean b̂ that minimizes the error term in:

b̂(w, Y) = argmin_b Σ_{i=1}^N w_i ‖ log(b^{−1} ∘ Y_i) ‖   (40)
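A textbook form of the scaled unscented transform can be sketched as follows (our own generic implementation, not the authors' code; the parameter defaults α = 10⁻³, β = 2, κ = 0 are assumptions). Note that the central mean weight w_0 is negative for small α, which is what forces the DQ averaging step to tolerate negative weights:

```python
import numpy as np

def sigma_points(mu, P, alpha=1e-3, beta=2.0, kappa=0.0):
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)            # matrix square root
    X = [mu] + [mu + S[:, i] for i in range(n)] + [mu - S[:, i] for i in range(n)]
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)                          # negative for small alpha
    wc[0] = wm[0] + (1.0 - alpha**2 + beta)
    return np.array(X), wm, wc

def unscented_transform(f, mu, P):
    """Propagate a Gaussian (mu, P) through a nonlinear function f."""
    X, wm, wc = sigma_points(mu, P)
    Y = np.array([f(x) for x in X])                  # push points through f
    my = wm @ Y
    Py = sum(w * np.outer(y - my, y - my) for w, y in zip(wc, Y))
    return my, Py
```

For a linear f the transform is exact, which gives a convenient sanity check; in the paper's setting f is the Euler-to-DQ conversion plus SLERP resampling, and the transformed points must be averaged on the manifold rather than componentwise.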

Fig. 5: Motion simulated such that the red and blue sensors traveled the paths shown. The path is always non-degenerate; in this image k = [0.1, 0.05, 0.01, 0, 0, π/3].


The averaging procedure given in [9], intended for non-negative weights, fails to converge when the Y_i's are similar (i.e., for small covariance). Since our SUT implementation can return negative weights, we instead use L-M optimization (Eq. 38) with the mean cost function given in Eq. 40.
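For intuition, the fixed-point variant of this manifold averaging (the scheme of [9], which the paper replaces with L-M to tolerate negative weights) can be sketched with ordinary unit quaternions; the helper functions and conventions here are our own assumptions:

```python
import numpy as np

def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_exp(v):
    th = np.linalg.norm(v)
    if th < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(th)], np.sin(th) * v / th))

def quat_log(q):
    # inverse of quat_exp for unit quaternions
    v = np.asarray(q[1:4])
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return np.arctan2(n, q[0]) * v / n

def quat_mean(quats, weights, iters=20):
    b = np.array(quats[0], dtype=float)
    for _ in range(iters):
        # average the error vectors log(b^-1 ∘ Y_i) in the tangent space at b
        e = sum(w * quat_log(quat_mul(b * np.array([1.0, -1.0, -1.0, -1.0]), q))
                for w, q in zip(weights, quats))
        b = quat_mul(b, quat_exp(e))
        b /= np.linalg.norm(b)
    return b

q1 = quat_exp(np.array([0.0, 0.0, 0.1]))
q2 = quat_exp(np.array([0.0, 0.0, 0.3]))
m = quat_mean([q1, q2], [0.5, 0.5])   # mean lies halfway between q1 and q2
```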

Fig. 6: Histograms (gray) of calibration estimates from 400 simulations of the path in Fig. 5 match well with the true calibration (green triangles) and constrained CRLB (green diamonds). Black lines indicate the sample mean (solid) and one standard deviation (dashed); red lines show a fitted Gaussian.

VII. RESULTS

A. Simulation

We validated the calibration estimates and constrained CRLB by simulating robot motion along the path shown in Fig. 5. The N = 62 observations and covariances per sensor were simulated using a translation/Euler angle parametrization (instead of DQ's) to model data obtained from sensors in practice. Additive Gaussian noise is applied to this minimal representation with a magnitude 10-40% of the true value. Thirty different calibrations were simulated, each with rigid body parameters k uniformly drawn from [±3 m, ±3 m, ±3 m, ±π, ±π, ±π]. The observations and covariances were then converted to DQ's using the interpolation method of § VI. We validated the CRLB by performing 400 Monte Carlo simulations [1] for each calibration, sampling each velocity vector from its Gaussian distribution.

Fig. 6 shows results for a sample calibration:

kEuler = [2.79 m, −2.79 m, −1.45 m, −0.51 r, 0.94 r, −1.22 r]

in meters (m) and radians (r) or, in DQ form, at:

kDQ = [0.77, 0.07, 0.49, −0.40, 0.30, 1.99, −0.57, 0.21]

As shown, the mean and standard deviations from the simulations match well with the true value and the CRLB, respectively. It is important to note that the covariance matrix corresponding to Fig. 6 is calculated on an over-parametrization; there are only six DOF's in the 8-element DQ representation. Due to these dependent (i.e., constrained) parameters, the covariance matrix is singular. However, because we use the Lie algebra during optimization and filtering, we can employ the DQ parametrization to avoid problems associated with singular covariances.

Fig. 7 shows the error between the thirty true calibrations and the mean of the estimated values for one of the DQ


TABLE I: Calibrations recovered from real data

              x (m)    y (m)    z (m)    ρ (r)    ϑ (r)    ψ (r)
True 1 (a)   -0.045   -0.305   -0.572   -1.316    0.906   -1.703
Error (b)     0.000    0.011    0.011    0.022    0.002    0.009
True 2 (a)   -0.423   -0.004    0.006   -0.000    0.000    3.141
Error (b)    -0.007   -0.014   -0.003   -0.019    0.003   -0.007
True 3 (a)   -0.165    0.204   -0.244    1.316   -0.906    3.009
Error (b)    -0.007    0.003   -0.013    0.003    0.000   -0.005
True 4 (a)   -0.040    0.025    0.000   -0.052    0.000    3.141
Error (b)    -0.006    0.008    0.006    0.013   -0.017    0.005

(a) Ground truth calibration
(b) Difference between mean of the estimates and the true calibration

Fig. 7: The error between the known calibration and the mean estimate was less than ±0.01 for each DQ parameter.


Fig. 8: The standard deviation of the simulations and predicted CRLB agreed to within ∼0.01 for each DQ parameter.

Fig. 9: We assess the method’s consistency by recovering the loop of calibrations relating three RGB-D sensors.

parameters. The other DQ elements (not shown) behaved similarly; they were recovered to within about 0.01 of truth. Fig. 8 compares the standard deviation of the parameters resulting from the Monte Carlo experiments and the predicted CRLB. In general, the method came close (within about 0.01) to the best-case CRLB.

B. Real data

We further validated the estimator with two different types of depth sensors and motion estimation algorithms. First, we collected data with two Microsoft Kinect RGB-D cameras, mounted on three different machined rigs with known calibrations. The RGB-D data from each camera was processed using the Fast Odometry from VISion (FOVIS) [10] library, which uses image features and depth data to produce incremental motion estimates. Second, we collected data with two rigidly mounted Xtion RGB-D cameras and used the KinectFusion algorithm [16] for motion estimation. For all four calibrations, we moved each rig by hand along a path in 3D. The interpolation algorithm (§ VI) was used to synchronize the translations/Euler angles and convert to DQ's. We characterized the noise in both systems using data from stationary sensors. We projected the noise into h and, using a chi-squared goodness-of-fit test, found the projected Gaussian to be a good approximation (at 5% significance) for both FOVIS and KinectFusion.

Table I summarizes the results of the four different calibrations. For clarity, the transforms are shown as translations and Euler angles, but all processing was done with DQ's. We assumed a constant variance for each DOF. The first three calibrations used Kinects and FOVIS with 2N ≈ 2000 observations; the last used Xtions and KinectFusion with 2N ≈ 400. In each case, the method recovered the inter-sensor translation to within about 1.4 cm, and the rotation to within about 1.26 degrees.

We assessed the estimator's consistency with three rigidly mounted depth cameras r, s, and t (Fig. 9). We estimated the pairwise calibrations k_{s,r}, k_{t,s}, and k_{r,t}, where, e.g., k_{s,r} is the calibration between r and s. The closed loop of estimated transformations should return to the starting sensor frame:

k_{s,r} ∘ k_{t,s} ∘ k_{r,t} = I   (41)

The accumulated error was small: translation error was [−4.27, −2.66, 7.13] mm and rotation error (again in Euler angles) was [7.03, −5.20, −1.49] milliradians.

VIII. CONCLUSION

We described a practical method that recovers the 6-DOF rigid body transform between two sensors, from each sensor's observations of its 6-DOF incremental motion. Our contributions include treating observation noise in a principled manner, allowing calculation of a lower bound on the uncertainty of the
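The loop-closure test of Eq. 41 amounts to composing the three estimated transforms and measuring how far the product is from identity. A sketch with hypothetical 4×4 homogeneous transforms (the axis-angle constructor and the error metrics are standard constructions, not taken from the paper):

```python
import numpy as np

def se3(rotvec, t):
    """Homogeneous transform from an axis-angle vector and a translation
    (Rodrigues' rotation formula)."""
    th = np.linalg.norm(rotvec)
    K = np.zeros((3, 3))
    if th > 1e-12:
        k = rotvec / th
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    R = np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# hypothetical pairwise calibrations; a consistent set composes to identity
k_sr = se3(np.array([0.0, 0.0, np.pi / 2]), np.array([0.1, 0.0, 0.0]))
k_ts = se3(np.array([0.0, 0.0, np.pi / 2]), np.array([0.0, 0.2, 0.0]))
k_rt = np.linalg.inv(k_sr @ k_ts)     # closes the loop exactly by construction

loop = k_sr @ k_ts @ k_rt
trans_err = np.linalg.norm(loop[:3, 3])
rot_err = np.arccos(np.clip((np.trace(loop[:3, :3]) - 1.0) / 2.0, -1.0, 1.0))
print(trans_err, rot_err)             # both ≈ 0 for a consistent loop
```

With real estimates the product is not exactly identity; its translation part and rotation angle are precisely the millimetre/milliradian residuals reported above.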


estimated calibration. We showed that the system is unobservable when rotation occurs only about parallel axes. Additionally, we illustrated the use of a constrained DQ parametrization, which greatly simplified the algebraic machinery of the degeneracy analysis. Such over-parametrizations are typically avoided in practice because they make it difficult to perform vector operations (addition, scaling, averaging, etc.), develop noise models, and identify system singularities. We assembled the tools for each required operation, employing the Lie algebra to define local vector operations and a suitable projected Gaussian noise model. Finally, we demonstrated that the constrained form of the CRLB enables system observability to be established. The method accurately recovers the 6-DOF transformations relating pairs of asynchronous, rigidly attached sensors, requiring only hand-held motion of the sensors through space.

We gratefully acknowledge the support of the Office of Naval Research through award #N000141210071.

APPENDIX

If A is D × E, B is E × F, N(A) is the null space of A, R(B) is the column space of B, and dim A is the number of vectors in the basis of A, then

rank(AB) = rank(B) − dim[N(A) ∩ R(B)].

Intuitively, this means that if a column of J_G U lies in the null space of J_H, information is lost during the multiplication and the rank of the matrix product is reduced. Substituting from Eq. 37,

rank(J_H J_G U) = rank(J_G U) − dim[N(J_H) ∩ R(J_G U)]

In order to show that rank(J_H J_G U) = rank(J_G U), there are two cases:

1) If J_G U is singular, then rank(J_G U) < 6(N + 1), where N is the number of observations. This implies rank(J_H J_G U) < 6(N + 1). Thus, J_G U singular implies J_H J_G U singular.

2) If J_H J_G U is singular, then either J_G U is singular or dim[N(J_H) ∩ R(J_G U)] > 0.
• If J_G U is singular, then this is the case above.
• If J_G U is not singular, then rank(J_G U) = 6(N + 1). The task then becomes to determine dim[N(J_H) ∩ R(J_G U)]. Since J_G U is full rank, R(J_G U) is spanned by the columns of J_G U. Furthermore, there are 4N columns in N(J_H), one for each of the two constraints of the 2N DQ's. (Fig. 10 shows N(J_H) for N = 2.) It can be shown that rank([J_H, J_G U]) = rank(J_H) + rank(J_G U). In other words, none of the columns of N(J_H) intersect with the columns of J_G U. Thus, N(J_H) ∩ R(J_G U) = ∅ and rank(J_H J_G U) = rank(J_G U). Since J_G U is not singular, J_H J_G U is not singular, which is a contradiction. Only the former possibility remains, and J_H J_G U singular implies J_G U singular.

Therefore, J_H J_G U is singular if and only if J_G U is singular. This is not a surprising result; the log function preserves information when mapping between the Lie group and the Lie algebra.

Fig. 10: The matrix N(J_H)^T, depicted here for N = 2, reveals 4N DOF's corresponding to the constraints of the 2N DQ's in z. Blank entries are zero; filled entries are unity.

REFERENCES

[1] Y. Bar-Shalom, T. Kirubarajan, and X. Li. Estimation with Applications to Tracking and Navigation. John Wiley & Sons, Inc., New York, NY, USA, 2002.
[2] J. Brookshire and S. Teller. Automatic calibration of multiple coplanar sensors. RSS, 2011.
[3] H. H. Chen. A screw motion approach to uniqueness analysis of head-eye geometry. In CVPR, Jun 1991.
[4] G. Chirikjian. Information theory on Lie groups and mobile robotics applications. In ICRA, 2010.
[5] K. Daniilidis. Hand-eye calibration using dual quaternions. IJRR, 18, 1998.
[6] D. W. Eggert, A. Lorusso, and R. B. Fisher. Estimating 3D rigid body transformations: a comparison of four major algorithms. Mach. Vision Appl., 9(5-6), March 1997.
[7] V. Govindu. Lie-algebraic averaging for globally consistent motion estimation. In CVPR, 2004.
[8] G. Grisetti, S. Grzonka, C. Stachniss, P. Pfaff, and W. Burgard. Efficient estimation of accurate maximum likelihood maps in 3D. In IROS, Nov 2007.
[9] C. Hertzberg, R. Wagner, U. Frese, and L. Schröder. Integrating generic sensor fusion algorithms with sound state representations through encapsulation of manifolds. CoRR, 2011.
[10] A. Huang, A. Bachrach, P. Henry, et al. Visual odometry and mapping for autonomous flight using an RGB-D camera. In ISRR, Aug 2011.
[11] E. Jones and S. Soatto. Visual-inertial navigation, mapping and localization: A scalable real-time causal approach. IJRR, Oct 2010.
[12] S. Julier. The scaled unscented transformation. In Proc. ACC, volume 6, pages 4555–4559, 2002.
[13] K. Kanatani. Group Theoretical Methods in Image Understanding. Springer-Verlag New York, Inc., 1990.
[14] L. Kavan, S. Collins, J. Zara, and C. O'Sullivan. Geometric skinning with approximate dual quaternion blending. Volume 27. ACM Press, 2008.
[15] J. McCarthy. An Introduction to Theoretical Kinematics. MIT Press, 1990.
[16] R. Newcombe, A. Davison, S. Izadi, et al. KinectFusion: real-time dense surface mapping and tracking. In ISMAR, Oct 2011.
[17] J. Selig. Exponential and Cayley maps for dual quaternions. Adv. in App. Clifford Algebras, 20, 2010.
[18] P. Stoica and B. C. Ng. On the Cramer-Rao bound under parametric constraints. IEEE Signal Processing Letters, 5(7):177–179, Jul 1998.
[19] H. Van Trees. Detection, Estimation, and Modulation Theory, Part I. John Wiley & Sons, New York, 1968.
[20] R. Y. Tsai and R. K. Lenz. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom., 5(3), Jun 1989.
[21] A. Ude. Nonlinear least squares optimisation of unit quaternion functions for pose estimation from corresponding features. In ICPR, volume 1, Aug 1998.

Colour-Consistent Structure-from-Motion Models Using Underwater Imagery Mitch Bryson, Matthew Johnson-Roberson, Oscar Pizarro and Stefan B. Williams Australian Centre for Field Robotics University of Sydney, Australia email: [email protected]

Abstract—This paper presents an automated approach to correcting for colour inconsistency in underwater images collected from multiple perspectives during the construction of 3D structure-from-motion models. When capturing images underwater, the water column imposes several effects on images that are negligible in air, such as colour-dependent attenuation and lighting patterns. These effects cause problems for human interpretation of images and also confound computer-based techniques for clustering and classification. Our approach exploits the 3D structure of the scene generated using structure-from-motion and photogrammetry techniques, accounting for distance-based attenuation, vignetting and lighting pattern, and improves the consistency of photo-textured 3D models. Results are presented using imagery collected in two different underwater environments using an Autonomous Underwater Vehicle (AUV).

I. INTRODUCTION

In recent years, marine biologists and ecologists have increasingly relied on remote video and imagery from Autonomous Underwater Vehicles (AUVs) for monitoring marine benthic habitats such as coral reefs, boulder fields and kelp forests. Imagery from underwater habitats can be used to classify and count the abundance of various species in an area. Data collected over multiple sampling times can be used to infer changes to the environment and population, for example due to pollution, bio-invasion or climate change. To provide a spatial context to collected imagery, techniques in structure-from-motion, photogrammetry and Simultaneous Localisation And Mapping (SLAM) have been applied to provide three-dimensional (3D) reconstructions of the underwater terrain using collected images from either monocular or stereo camera setups and other navigational sensor information (for example, see [2, 3, 6, 9]).

When capturing images underwater, the water column imposes several effects on images that are not typically seen when imaging in air. Water causes significant attenuation of light passing through it, reducing its intensity exponentially with the distance travelled. For this reason, sunlight, commonly used as a lighting source in terrestrial photogrammetry, is typically not strong enough to illuminate scenes below depths of approximately 20 m, necessitating the use of artificial lighting on board an imaging platform.

The attenuation of light underwater is frequency-dependent; red light is attenuated over much shorter distances than blue light, resulting in a change in the observed colour of an object

Fig. 1: Example Images and a 3D Structure-from-Motion Model of an Underwater Environment: Top, example underwater images taken from an AUV. Bottom, a 3D terrain model derived from stereo image pairs in a structure-from-motion processing pipeline where textures on the right half of the model are created from the collected imagery. Vehicle-fixed lighting patterns and water-based light attenuation cause foreground objects to appear bright and red with respect to background objects that appear dark and blue/green resulting in texture inconsistencies.

Full colour paper available at: http://roboticsproceedings.org/rss08/p05.html.

at different distances from the camera and light source. In the context of structure-from-motion, the colour and reflectivity of objects is significantly different when imaged from different camera perspectives and distances (see Figure 1). This can cause problems for human interpretation of images and computer-based techniques for clustering and classification of image data based on colour intensities. The aim of this paper is to develop a method for correcting


images so that they appear as if imaged in air, where effects such as attenuation are negligible, and where observed image intensities and colour are much less dependent on the perspective and distance from which the image was captured. The context of this work is imaging in large-scale biological habitats, which makes the use of colour-calibration tools such as colour-boards undesirable due to logistical complexity and the sensitivity of marine habitats to man-made disturbances of the seafloor.

Instead, our approach to colour correction focuses on colour consistency by utilising the assumption of a 'grey-world' colour distribution. We assume, just as is the case in air, that surface reflectances have a distribution that on average is grey and is independent of scene geometry. In air, given the benign imaging process, this assumption approximately holds at the level of image pixel intensities, which is why grey-world image correction is remarkably effective [1]. Underwater, range- and wavelength-dependent attenuation bias the distribution of image intensities, and the grey-world correction cannot be applied naively. Our method exploits the known structure of the 3D landscape and the relative position of the camera, derived using structure-from-motion and photogrammetry, to incorporate an image formation model, including attenuation, into the grey-world correction. Results of the method are illustrated in two different underwater environments using images collected from an AUV.

Section II provides an overview of past work in modelling underwater image formation and underwater image correction. Section III details our approach to underwater structure-from-motion and image correction. Section IV provides an overview of the context in which underwater images are acquired and the experimental setup used to demonstrate our image correction method. Section V provides results of our image correction method. Conclusions and future work are presented in Section VI.

II. BACKGROUND LITERATURE

The predominant effects of the water column on images taken underwater are scattering and attenuation [5]. Scattering occurs when light is reflected from microscopic particles in the water, causing a 'blurring' in images that increases with distance. The authors in [11] present an approach to reducing the effects of light scattering by taking images through a polarising filter at various angles. Attenuation occurs when light is absorbed or diffracted by water molecules or other particles in the water [5] and can be affected by water temperature, salinity, water quality (i.e. from pollution or sediment suspension) and suspended microscopic life (plankton). Attenuation is the dominant cause of the colour imbalance often visible in underwater images.

Several authors have proposed approaches to compensating for this effect. In [14], the authors present an active imaging strategy that adaptively illuminates a scene during imaging based on the average depth from the camera. The approach alters the colour balance of the strobe lighting source (for example, to increase red light for scenes further from the camera) but uses only one depth value (derived from the camera focal length) per scene,

Fig. 2: Left, example of a colour image of an underwater scene and right, the corresponding range image derived from 3D structure-from-motion. The range image is a map of the distance between the front of the camera lens and objects in the scene for each pixel in the image.

neglecting the case where different objects in a single scene are observed at different ranges from the camera. In [15], the authors propose a method for correcting underwater images for colour attenuation by estimating attenuation coefficients using either a single image captured both underwater and in air, or two underwater images captured at different depth values. The method relies on a controlled setup for capturing images and is only demonstrated on three images. Similarly, the authors of [12] present a method for colour correcting underwater images for attenuation using known terrain structure, under the assumption that a sufficient number of 'white' points can be identified in the data, either from known properties of the scene or by using calibration targets. While these approaches are insightful, they are typically impractical in large-scale, unstructured underwater environments.

In [13], the authors present a learning-based approach to colour correction that learns the average relationship between colours underwater and in air using training points of images of the same objects both in and out of water. The approach ignores the effect of distance-based attenuation, instead learning a single transformation that corresponds to an object at a fixed imaging distance.

Past approaches to underwater colour correction in the literature either ignore the explicit causes of attenuation or require complicated and limiting calibration setups for attenuation compensation. The contribution of our work is thus a method of underwater image correction that explicitly accounts for distance-based attenuation and does not require an in-situ colour calibration setup, thus enabling large-scale surveying in unstructured underwater environments.

III. METHODOLOGY

This section of the paper provides a background in image transformation algorithms, underwater structure-from-motion and an underwater image formation model, and describes our approach to underwater image correction.

A. Underwater Structure from Motion

The colour correction strategy discussed throughout the rest of the paper exploits knowledge about precise distances between the camera and each point in an observed scene.


This information is generated using an existing structure-from-motion processing pipeline; more details on the approach can be found in [6, 9]. Overlapping stereo-image pairs are collected over an underwater scene using an AUV. Scale Invariant Feature Transform (SIFT) feature points [8] are extracted from each pair and used to compute pose-relative 3D point features. Features are also matched and tracked across overlapping poses and used with additional navigation sensor information (a depth sensor, doppler-velocity sensor and attitude sensors) to compute the trajectory of the platform using a pose-based Simultaneous Localisation and Mapping (SLAM) algorithm [9]. The estimated trajectory and pose-relative 3D point features are then used to construct a global feature map of the terrain, from which a photo-textured surface model is created by projecting colour camera images onto the estimated terrain structure [6]. The points in the estimated surface model are then back-projected into each stereo-image pair and interpolated to produce a range image, i.e. an image whose intensity values correspond to the range from the camera lens to the observed scene points. Figure 2 illustrates an example underwater colour image of a boulder field with scene objects at varying distances from the camera and the corresponding range image derived from a 3D structure-from-motion model of the area.

Fig. 3: Underwater Image Formation Model: Light is reflected from the surface of an object (R) from an artificial lighting source attached to the underwater platform. This light is attenuated as it passes through the water column and reaches the camera lens with a radiance (L). Irradiance (E) reaching the image sensor is affected by vignetting through the camera lens and transformed to an image brightness intensity (I) via the sensor response function of the camera.

B. Underwater Image Formation Model

The measured image intensity recorded by a camera at a point in an underwater scene is not directly proportional to the intensity of light reflected from the point; instead, several factors act on the light path that make this relationship camera perspective-dependent. The light reflected from an object per unit area (reflectance, R) is attenuated through the water column, resulting in an exponential decrease in the magnitude of the light per unit area arriving at the front of the lens (radiance, L) as the distance between the object and the camera lens increases [5]. Equation 1 describes the relationship between object reflectance and radiance at the camera lens as a function of the object distance, d, and the attenuation coefficient, b(λ), which is a function of the wavelength of light λ (i.e. colour):

L = R e^{−b(λ)d}   (1)

Light travelling through the lens of the camera undergoes a fade-out in intensity towards the corners of the image via the effect of vignetting [7]. Vignetting is caused primarily by the geometry of light passing through the lens and aperture of the camera; light passing in at greater angles to the principal axis of the camera is partially shaded by the aperture and sometimes by the lens housing. Vignetting can be summarised by:

E = C(u, v) L   (2)

where E is the light per unit area arriving at the camera image sensor (irradiance) and C(u, v) is constant for a given horizontal and vertical position in the image [u, v]. C(u, v) may be parameterised in various ways under different camera setups; a typical model [7] for a simple lens relates C(u, v) to θ, the angle of the image position from the principal axis of the camera:

C = (πr² / 4l²) cos⁴θ   (3)

where r is the radius of the lens and l is the distance between the lens and the image plane. The term C(u, v) can also account for the effects of a camera-fixed lighting pattern, which is typical of underwater imaging, where artificial lighting is carried by the underwater platform and imposes an object illumination that is dependent on the position of the object in the image.

The last step in image formation occurs when light arriving at the image sensor of the camera is converted into an image intensity value I, via the sensor response function of the camera f(.):

I = f(kE)   (4)

where k is the exposure constant, typically proportional to the shutter speed of the camera. The sensor response function f(.) can take a variety of forms, for example a gamma curve, and is typically controlled by camera firmware. A detailed discussion of sensor response functions can be found in [4]. The overall image formation process is illustrated in Figure 3.

Under the assumption that the sensor response function is linear, with the form:

I = AkE + c   (5)

where the parameters A and c are constant, Equations 1, 2 and 5 can be combined into a single equation:

I = A k C(u, v) R e^{−b(λ)d} + c   (6)
  = a(u, v) R e^{−b(λ)d} + c   (7)

where the parameter a(u, v) is a function of image position, b(λ) is a function of light wavelength (i.e. image colour channel) and c is a constant.

Our underwater image formation model assumes that ambient light (i.e. from the sun) is negligible compared to the
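Given estimates of a(u, v), b(λ) and c, Eq. 7 can be inverted to recover a reflectance that is independent of range. A small sketch of the forward model and its inversion; all parameter values here are made up for illustration, not estimated from data:

```python
import numpy as np

# hypothetical model parameters
H, W = 4, 6
a = 0.8 * np.ones((H, W))            # lighting/vignetting gain a(u, v)
b = np.array([0.6, 0.2, 0.1])        # per-channel attenuation b(λ): R, G, B
c = 0.05                             # sensor offset
d = np.random.default_rng(0).uniform(1.0, 3.0, (H, W))   # range image (m)
R_true = np.random.default_rng(1).uniform(0.2, 0.9, (H, W, 3))

# forward model (Eq. 7): I = a(u,v) R e^{-b(λ) d} + c
I = a[..., None] * R_true * np.exp(-b * d[..., None]) + c

# inversion recovers reflectance regardless of how far each pixel was imaged
R_hat = (I - c) * np.exp(b * d[..., None]) / a[..., None]
print(np.allclose(R_hat, R_true))    # True
```

Note how red (large b) fades fastest with range in the forward model, matching the colour imbalance described earlier.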


vehicle-carried artificial light source, which typically holds for the operating depths considered in this study. The effect of ambient light could be added as an additional parameter to the model in Equation 7; this is beyond the scope of this paper and will be considered in future work.

C. Image Transformation Algorithms

As a result of the effect of artificial lighting patterns and attenuation, underwater images typically exhibit a colour balance that is different from what would be observed in air, with much stronger blue and green responses. Image transformations such as contrast and brightness enhancements can be applied to each colour channel of the image to compensate; consider the effect of applying a linear transform to each pixel in the image:

I_y(u, v, λ) = m(λ) I_x(u, v, λ) + n(λ)   (8)

where I_x(u, v, λ) is the original intensity of a given pixel [u, v] for channel λ in image x, I_y(u, v, λ) is the image pixel intensity in image y, and m(λ) and n(λ) are scaling and offset constants applied across all pixels in a given colour channel λ (i.e. separate values for each of the red, green and blue channels). The mean μ_y(λ) and variance σ_y²(λ) of the resulting pixel histogram after transformation for a given channel can be calculated as a function of the original image channel histogram statistics and the transformation parameters:

μ_y(λ) = E[I_y(λ)] = m(λ) μ_x(λ) + n(λ)   (9)
σ_y²(λ) = E[(I_y(λ) − E[I_y(λ)])²]   (10)
        = m(λ)² E[(I_x(λ) − E[I_x(λ)])²]   (11)
        = m(λ)² σ_x²(λ)   (12)

where μ_x(λ) and σ_x²(λ) are the mean and variance of a given channel of the original image x and E[.] is the expectation operator. These relationships can then be used to derive a linear transform that results in an image channel with any desired mean and variance:

m(λ) = √( σ_y² / σ_x²(λ) )   (13)
n(λ) = μ_y − m(λ) μ_x(λ)   (14)

where μ_y and σ_y² are the desired mean and variance.

Fig. 4: Image before and after grey-world correction: the top left subfigure shows the original, uncorrected image; the top right subfigure shows the colour-balanced image. The bottom subfigures illustrate the image colour intensity histograms.

This method can be used to balance colour channels by setting the desired mean and variance equal for each channel and computing separate scale and offset parameters for each image channel. Figure 4 provides an example of this process. An uncompensated underwater image with corresponding histogram is shown along with the final, compensated image and image histogram after applying a linear transformation with parameters derived for each red-green-blue channel using Equations 13 and 14, with a desired mean of μ_y = 0.5 and a desired image variance of σ_y² = 0.16² for each channel. This process is commonly referred to as a 'grey-world' transformation [1] and can also be applied across a collection of images taken in a given environment by computing μ_x(λ) and σ_x²(λ) for each red, green and blue image channel using the concatenation of data from several images (i.e. μ_x(λ) and σ_x²(λ) are computed using all pixels from all images in a given channel). The assumption behind the grey-world transformation is that the real colours in the environment are evenly distributed (i.e. grey on average).

D. Image Transformation Accounting for Image Spatial Effects

Camera-fixed lighting pattern and vignetting are two examples of effects that result in pixels in different regions of an image having different statistical distributions across multiple images of an environment. On average, pixels towards the centre of an image will appear brighter than those towards the periphery. To account for this, the grey-world image transformation can be computed separately for each spatial region of an image. Thus for each pixel location [u, v], the image correction is applied:

I_y(u, v, λ) = m(u, v, λ) I_x(u, v, λ) + n(u, v, λ)

(15)

where a separate set of scale and offset parameters is used for each pixel location and channel: ) σy2 m(u, v, λ) = (16) σx2 (u, v, λ) (17) n(u, v, λ) = μy − m(u, v, λ)μx (u, v, λ)
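A minimal NumPy sketch of the per-channel grey-world balance described above (an illustrative sketch, not the authors' implementation; the target mean 0.5 and standard deviation 0.16 mirror the Figure 4 example):

```python
import numpy as np

def grey_world(img, mu_y=0.5, sigma_y=0.16):
    """Per-channel linear transform I_y = m*I_x + n (Equation 8), with
    m and n chosen so each channel attains the desired mean and
    variance (Equations 13 and 14)."""
    out = np.empty_like(img, dtype=float)
    for c in range(img.shape[2]):          # separate (m, n) per colour channel
        mu_x = img[..., c].mean()
        sigma_x = img[..., c].std()
        m = sigma_y / sigma_x              # Eq. 13: m = sqrt(sigma_y^2 / sigma_x^2)
        n = mu_y - m * mu_x                # Eq. 14
        out[..., c] = m * img[..., c] + n
    return out
```

The same routine balances a whole collection of images by computing mu_x and sigma_x over the concatenated pixels of all images in a given channel, as described in the text.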


Fig. 5: Scatter plot of measured pixel intensity vs. distance to camera for the centre pixel (i.e. u = 680, v = 512) in a set of 2180 1360-by-1024 pixel images. The scatter data exhibit an exponential falloff in intensity with distance from the camera due to attenuation, and a dependency on colour.

where μy and σy² are the desired mean and variance, and μx(u, v, λ) and σx²(u, v, λ) are the mean and variance of all intensities in a given set of N images at the pixel location [u, v] for a given channel λ. This process benefits from a large collection of N images within a given environment from different perspectives, in order to provide a statistically significant measurement of μx(u, v, λ) and σx²(u, v, λ) over objects of various colours and intensities.

E. Image Transformation Accounting for Attenuation

In addition to image spatial effects, such as lighting pattern and vignetting, attenuation also serves to skew the distribution of intensities measured at a given pixel location over an image set, based on the range to the different objects measured in each image. For example, Figure 5 illustrates a scatter plot of the red, green and blue image intensities measured from the centre pixel (i.e. u = 680, v = 512) in a set of 2180 1360-by-1024 pixel images vs. the computed range in the scene of this image pixel, taken from various perspectives over an underwater environment. The distributions in pixel intensities exhibit considerable correlation to the range to objects, with an exponentially decaying relationship to range, indicative of the image formation model described in Equation 7.

As in the case of lighting pattern and vignetting, range-dependent attenuation can be accounted for by applying an image transformation that is a function of image channel, image pixel coordinate and range to the object. The complete image transformation is thus computed as:

Iy(u, v, λ) = m(u, v, λ, d) Ix(u, v, λ) + n(u, v, λ, d)   (18)
m(u, v, λ, d) = √(σy² / σx²(u, v, λ, d))   (19)
n(u, v, λ, d) = μy − m(u, v, λ, d) μx(u, v, λ, d)   (20)

where μx(u, v, λ, d) and σx²(u, v, λ, d) are the mean and variance of all intensities in a given set of N images at the pixel location [u, v] for a given depth d and for a given

channel λ. One potential method for computing μx(u, v, λ, d) and σx²(u, v, λ, d) is to ‘bin’ measured pixel intensities into specified values of d (distance from the camera) and compute the mean and variance of each bin; this requires a large number of samples at each bin, resulting in certain ranges and pixel locations being under-sampled. Since the distribution of intensities across different depths is expected to follow the image formation relationship derived in Equation 7, an alternative approach is taken. For each pixel location, a scatter plot of image intensity vs. range is created, with one point for each of the N images in the underwater dataset. If μR is the expected mean reflectance of the surface, then the mean image intensity measured from pixel [u, v], channel λ and range d is:

μx(u, v, λ, d) = a(u, v, λ) μR e^(−b(u,v,λ)d) + c(u, v, λ)   (21)

where a(u, v, λ), b(u, v, λ) and c(u, v, λ) correspond to parameters for the considered pixel location and image channel. Let a∗(u, v, λ) = a(u, v, λ)μR; the parameters a∗(u, v, λ), b(u, v, λ) and c(u, v, λ) in Equation 21 can then be estimated by taking a non-linear least-squares fit of this function to the scatter data using a Levenberg-Marquardt optimisation [10]. An initial state estimate x = [a, b, c] = [1, 0, 0] was used and was found to provide stable convergence in all of the datasets examined. The mean intensity is then computed from the function μx(u, v, λ, d) = a∗(u, v, λ)e^(−b(u,v,λ)d) + c(u, v, λ) as a function of distance d.

To compute σx²(u, v, λ, d), the variance corresponding to pixel [u, v] and channel λ as a function of distance d, a similar approach is taken. If σR² is the expected variance of the reflectance of the surface, then σx²(u, v, λ, d) is:

σx²(u, v, λ, d) = E[(Iy(u, v, λ, d) − E[Iy(u, v, λ, d)])²]
                = [a(u, v, λ) e^(−b(u,v,λ)d)]² σR²   (22)

The expected variance of surface reflectance σR² can be computed using the parameters estimated in Equation 21; for each sample of the scatter data, the expected reflectance can be computed by taking the inverse of Equation 21, and from the resulting values of reflectance the variance σR² is calculated.
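Under stated assumptions (synthetic data, NumPy only, a fixed stand-in reflectance variance; this is a sketch, not the authors' code), the per-pixel pipeline, a Levenberg-Marquardt fit of Equation 21 followed by the range-dependent correction of Equations 18-20, looks like:

```python
import numpy as np

def fit_attenuation(d, y, iters=200):
    """Levenberg-Marquardt fit of y = a*exp(-b*d) + c (Equation 21),
    started from [a, b, c] = [1, 0, 0] as in the text."""
    p = np.array([1.0, 0.0, 0.0])
    lam = 1e-3
    resid = lambda p: p[0] * np.exp(-p[1] * d) + p[2] - y
    cost = np.sum(resid(p) ** 2)
    for _ in range(iters):
        a, b, _ = p
        e = np.exp(-b * d)
        J = np.column_stack([e, -a * d * e, np.ones_like(d)])  # Jacobian of the model
        r = resid(p)
        step = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
        new_cost = np.sum(resid(p + step) ** 2)
        if new_cost < cost:            # accept step, relax damping
            p, cost, lam = p + step, new_cost, lam * 0.7
        else:                          # reject step, raise damping
            lam *= 2.0
    return p                           # fitted [a*, b, c] of Eq. 21

def correct_pixel(I_x, d, p, mu_y=0.5, var_y=0.16 ** 2):
    """Equations 18-20 for one pixel/channel: the statistics at range d
    come from the fitted model; var_R = 1 is an arbitrary stand-in for
    the reflectance variance of Equation 22."""
    a, b, c = p
    mu_x = a * np.exp(-b * d) + c               # Eq. 21
    var_x = (a * np.exp(-b * d)) ** 2 * 1.0     # Eq. 22 with var_R = 1
    m = np.sqrt(var_y / var_x)                  # Eq. 19
    n = mu_y - m * mu_x                         # Eq. 20
    return m * I_x + n                          # Eq. 18
```

A pixel whose raw intensity equals the fitted mean at its range maps exactly to the target mean, which is the consistency property the correction is after.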

IV. EXPERIMENTAL SETUP

Data collected from an AUV deployed in two different types of underwater environments were used to test the image correction techniques described above. The first dataset was collected over an underwater boulder field populated by various species of sponge, algae and sea urchins. The data were collected by taking overlapping transects of a rectangular region approximately 25-by-20 m and contained images collected at various altitudes as the vehicle moved over the rocky terrain. The second dataset was collected from a single line transect over an area with various types of benthic coverage, including rocky reef platforms with various species of coral, kelp and sandy bottom. In both datasets, the underwater terrain is rugose and results in images being collected from a range of different perspectives and camera-object distances.


The underwater images were collected using a stereo camera setup with two 1360-by-1024 pixel cameras; the left camera was a colour (Bayer) camera and the right camera monochrome (only images from the left camera were used in the image correction process). The cameras are mounted below the AUV platform looking downwards so as to capture images of the terrain as the vehicle passes over. Images were captured in a raw 12-bit format and later converted to three colour channels. The camera firmware used a linear sensor response function. The aperture and shutter exposure times were fixed for each data collection. Artificial lighting was provided by two strobes attached to the AUV that provide white light to the scene. Both datasets were processed using a structure-from-motion pipeline [6, 9], and the derived 3D structure was used to create range images for each left camera image collected.

Two sets of corrected images were obtained from the raw data for both datasets for comparison. The correction was applied firstly using Equation 15, which only accounts for image space effects such as vignetting/lighting pattern and colour balance (i.e. not accounting for distance-based attenuation). A second set of corrected images was obtained using Equation 18, which accounts for the full image formation model described in Equation 7, including image space effects and distance-based attenuation. Raw images were typically too dark to view with the naked eye (see Figure 4 for an example) and thus are not displayed in the comparisons shown. The corrected images were assessed both qualitatively (by examining consistency in appearance) and quantitatively (by measuring the inconsistency in overlapping images of common objects). For each planar region of the surface model, a list of images that covered this region was compiled. The structure-from-motion estimated image poses were used, via back-projection, to compute the patch of each image corresponding to the given spatial region.
As a measure of inconsistency, the variance of these image texture intensities at each planar region was computed and displayed spatially over the map for each region. When images were affected by depth-based attenuation, regions that were viewed from multiple perspectives (and thus multiple ranges) displayed widely different texture intensities (and thus high variance and inconsistency), whereas when attenuation was compensated for, images of a common region displayed lower variance in intensity. The spatial patterns seen in the variance (i.e. faint banding patterns) correspond to changes in the number of overlapping views of each point. Some of the sharp peaks in variance correspond to small registration errors in the structure-from-motion process.

V. RESULTS

Figure 6 illustrates examples of the mean/variance curve fitting using image intensity samples taken from the centre pixel (i.e. u = 680, v = 512) of the 2180 images in dataset 1 (boulder field). For a given value of distance d, the mean and variance values approximate the distribution of image intensities captured from various objects at this distance from the camera and are used to derive the image correction parameters. The use of a fitting approach to compute statistics allowed for

Fig. 6: Scatter plot of measured pixel intensity vs. distance to camera, and estimated mean and standard deviation curves for each red, green and blue channel for the centre pixel, taken from 2180 images in dataset 1 (boulder field). The mean and variance (shown at three standard deviations from the mean) curves provide an approximation of the distribution of pixel intensities at a given distance from the camera.

a robust estimate at ranges where only a small amount of data is present (i.e. very close to or far away from the camera). Figure 7 illustrates a comparison between corrected images using the correction methods described in Sections III-D and III-E. The left subfigure illustrates an image that has been corrected using Equation 15, where only the spatial effects and colour channel are considered in the correction (i.e. no distance-based attenuation model). Although the image now has colours that are balanced and does not exhibit spatial effects such as vignetting or lighting pattern, there is still significant distance-based attenuation. The image in the right subfigure has been corrected using Equation 18, where all the effects of image position, colour channel and distance-based attenuation are considered in the correction. Both the colour intensity and contrast in the more distant regions of the image have been increased to match the overall scene, while the less distant regions have been darkened, largely removing the effects of attenuation. Once images were corrected across an entire dataset, they were applied as photo-textures for the structure-from-motion


Fig. 7: Left, colour compensated image using standard (non depth-based) correction and right, colour compensated image with full water attenuation correction. Both the intensity and contrast in the more distant regions of the image have been increased to match the overall scene in the right subfigure, while objects less distant have been darkened.

Fig. 8: Left, 3D photo-textured scene using standard colour corrected textures and right, 3D photo-textured scene using full water attenuation colour corrected textures.

Fig. 9: First and second subfigures from the top: comparison of 3D photo-textured scenes using standard and full water attenuation colour corrected textures. Third and fourth subfigures: comparison of image texture standard deviation for the same corrected textures (colourbar units are normalised intensity). The attenuation-corrected textures display significantly reduced variance in the intensity of images at each part of the map, in particular areas corresponding to large perspective changes by the vehicle.


derived 3D surface model. Figure 8 shows a comparison of the photo-textured terrain model of dataset 1 (boulder field) when using image textures that have been corrected via the methods described in Sections III-D and III-E. The left subfigure illustrates the model using textures taken from images corrected using Equation 15 (i.e. no distance-based attenuation model) and the right subfigure illustrates the model using textures corrected using Equation 18 (i.e. full model with attenuation correction). The left model exhibits considerable correlation between the intensity/colour of image textures and the average distance from which each part of the surface was imaged, in particular a horizontal banding pattern that corresponds to overlapping swaths during image collection that occur at slightly different heights above the terrain. The distance and general spatial correlation has been essentially removed in the corrected-texture model to the right.

Figure 9 illustrates a similar comparison between photo-textured models from dataset 2 (reef), with the non-attenuation-corrected model shown in the first subfigure (from the top) and the attenuation-corrected model shown in the second subfigure. Inconsistencies are visible in the first model during passage over sections of rocky reef that sit above the sandy bottom and appear bright and red in the images as the AUV changes its height above the terrain in order to clear the obstacle. The bottom two subfigures illustrate maps of the variance in overlapping image textures across the map for the two different correction schemes. The attenuation-corrected textures display significantly reduced variance in the intensity of images at each part of the map, in particular in areas corresponding to large perspective changes by the vehicle.

VI. CONCLUSIONS AND FUTURE WORK

This paper has developed an automated approach for correcting colour inconsistency in underwater images collected from multiple perspectives during the construction of 3D structure-from-motion models. Our technique exploits the 3D structure of the scene generated using structure-from-motion and photogrammetry techniques, accounting for distance-based attenuation, and improves the consistency of photo-textured 3D models. Results are presented using imagery collected in two different underwater environments and demonstrate both the qualitative and quantitative improvement of the imagery. Our approach relies on the assumption of a ‘grey-world’ (i.e. one in which colours in the environment are grey on average and not biased in hue) and furthermore that this distribution is spatially consistent (in particular, in depth). Future work will consider extensions to our approach to account for environments where this assumption is violated, such as when objects of a particularly strong colour are only present at a biased depth in the dataset. One potential approach to this issue is to consider robustified fitting approaches or outlier rejection methods that allow a select, ‘well-behaved’ subset of the data to be used during model fitting. Future work will also consider approaches for building consistency in image datasets collected over multiple collection times, for monitoring long-term changes to the environment.

REFERENCES

[1] G. Buchsbaum. A Spatial Processor Model for Object Color Perception. Journal of the Franklin Institute, 310(1):1–26, 1980.
[2] R. Campos, R. Garcia, and T. Nicosevici. Surface reconstruction methods for the recovery of 3D models from underwater sites. In IEEE OCEANS Conference, 2011.
[3] R. Eustice, O. Pizarro, and H. Singh. Visually Augmented Navigation for Autonomous Underwater Vehicles. IEEE Journal of Oceanic Engineering, 33(2):103–122, 2008.
[4] M. D. Grossberg and S. K. Nayar. Modelling the Space of Camera Response Functions. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(10):1272–1282, 2004.
[5] J. S. Jaffe. Computer Modeling and the Design of Optimal Underwater Imaging Systems. IEEE Journal of Oceanic Engineering, 15(2):101–111, 1990.
[6] M. Johnson-Roberson, O. Pizarro, S. B. Williams, and I. Mahon. Generation and Visualization of Large-scale Three-dimensional Reconstructions from Underwater Robotic Surveys. Journal of Field Robotics, 27(1):21–51, 2010.
[7] S. J. Kim and M. Pollefeys. Robust Radiometric Calibration and Vignetting Correction. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(4):562–576, 2008.
[8] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[9] I. Mahon, S. Williams, O. Pizarro, and M. Johnson-Roberson. Efficient View-based SLAM using Visual Loop Closures. IEEE Transactions on Robotics, 24(5):1002–1014, 2008.
[10] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. SIAM Journal on Applied Mathematics, 11(2):431–441, 1963.
[11] Y. Y. Schechner and N. Karpel. Clear Underwater Vision. In IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[12] A. Sedlazeck, K. Koser, and R. Koch. 3D reconstruction based on underwater video from ROV Kiel 6000 considering underwater imaging conditions. In IEEE OCEANS Conference, 2009.
[13] L. A. Torres-Mendez and G. Dudek. A Statistical Learning-Based Method for Color Correction of Underwater Images. In Research on Computer Science Vol. 17, Advances in Artificial Intelligence Theory, 2005.
[14] I. Vasilescu, C. Detweiler, and D. Rus. Color-Accurate Underwater Imaging using Perceptual Adaptive Illumination. In Robotics: Science and Systems, 2010.
[15] A. Yamashita, M. Fujii, and T. Kaneko. Color Registration of Underwater Images for Underwater Sensing with Consideration of Light Attenuation. In IEEE International Conference on Robotics and Automation, 2007.


M-Width: Stability and Accuracy of Haptic Rendering of Virtual Mass Nick Colonnese

Allison Okamura

Mechanical Engineering Stanford University Stanford, California 94305-2232 Email: [email protected]

Mechanical Engineering Stanford University Stanford, California 94305-2232 Email: [email protected]

I. INTRODUCTION

In many applications of haptic virtual environments, we desire the user to feel a very specific set of mechanical properties, which arise from inherent device dynamics, programmed virtual environment dynamics, and (potentially unexpected) interactions between the two that depend on control structure, sampling rate, and other system properties. Consider the example application shown in Figure 1. Here, researchers wish to test the hypothesis that the cerebellum (a region of the brain that plays an important role in motor control) holds a dynamic model of the body used in feed-forward planning of movements. Patients with cerebellar damage who exhibit ataxia (uncoordinated movements) are asked to make fast reaching movements in a backdrivable, planar exoskeleton robot. If a rendered inertia could be found that eliminates the ataxia, this would provide significant evidence toward the theory that cerebellar damage results in a specific bias in internal modeling of body dynamics, and it would open up a host of potential new rehabilitation therapies for patients with ataxia. However, modifying the effective inertia of the user/robot combination to be a specific value evokes fundamental challenges in haptic device control.

Haptic interfaces should be simultaneously stable and accurate. Unstable behavior can cause damage to a haptic device, injure the human user, and generate unrealistic and unexpected

Abstract—In many physical human-robot interaction scenarios, such as haptic virtual environments for training and rehabilitation, it is desirable to carefully control the apparent inertia of a robot. Inertia compensation can be used to mitigate forces felt by the user during free-space motion, and rendering of additional inertia is desired for particular rehabilitation and training procedures. Many factors influence the stability and accuracy of rendering for haptic display of a pure mass, including device mechanical properties, sample rate, control structure, and human behavior. Inspired by the “Z-Width” approach to haptic device stability and performance analysis, we introduce “M-width”, which we define as the dynamic range of virtual masses renderable in a stable manner. We identify the important parameters for system stability, find stability boundaries, and describe the expected accuracy of the haptic rendering for a canonical haptic system. These results serve as a design tool for creating stable and accurate haptic virtual environments, establish limits of performance, and lay the groundwork for new controllers to improve mass rendering.

Fig. 1. Example application of inertia rendering in rehabilitation. (a) A user makes single-joint reaches to a 30◦ target in an exoskeleton robot, during which rendered inertia alters his movement. (b) A patient with hypermetria (tendency to overshoot) is predicted to benefit from decreased effective arm inertia. (c) A patient with hypometria (tendency to undershoot) is predicted to benefit from increased effective arm inertia. While neuromechanical models predict these effects, implementing inertia rendering requires improved understanding of system stability and accuracy.

haptic sensations. Accurate haptic virtual environments should feel exactly as desired, with no unwanted effects of haptic device dynamics (e.g., the inherent inertia of the mechanism). In the rehabilitation example given above, stability and accuracy are both necessary for the scientific validity of the study. Yet, there exists a classic tradeoff between ensuring the stability of a haptic display and the range of impedances that it can render. Mass rendering (presenting a force proportional to acceleration) presents a particular challenge because measuring acceleration on a typical impedance-type haptic device is noisy; it involves double discrete differentiation of a noisy position signal. While system improvements (e.g., the addition of new sensors) can improve mass rendering, an improved understanding of the performance of a canonical haptic device will yield insight as to what modifications in hardware, sensing, and control will improve mass rendering. The long-term goal of this work is to develop stable and accurate mass rendering for rehabilitation applications, and to predict the limits of performance. Here, we begin by developing a framework

41

for analyzing the stability and accuracy of mass rendering, with theory applicable to a one-degree-of-freedom linear system.

Colgate and Brown [5] define “Z-Width” as the dynamic range of impedances that can be rendered passively. Passive systems are incapable of generating a net amount of energy, and the coupling of passive systems is guaranteed to be stable. However, conditions for passivity can be conservative compared to conditions for stability. Although Z-width is a general term spanning all impedances, it has generally been discussed as relating to haptic rendering of virtual springs and dampers. In addition, Colgate and Schenkel [6] derived a general theorem for the passivity of haptic interfaces.

This paper introduces “M-Width”, the dynamic range of virtual mass renderable in a stable manner. We allow for positive virtual mass, corresponding to mass rendering, as well as negative virtual mass, corresponding to mass compensation. The definition of M-width differs from Z-width in three key ways: (1) we consider BIBO (bounded input, bounded output) stability, not passivity, to avoid an overly conservative implementation; (2) it models the human operator as a specific impedance, not a general passive element; and (3) the desired virtual environment is a pure mass with motion data filtering.

This work builds upon significant prior work related to the stability of haptic displays, e.g. [1, 6, 7, 10]. Virtual mass rendering has been previously explored for specific medical robotics scenarios, including dynamic compensation for a surgical teleoperator [14] and inertia compensation for a lower-leg exoskeleton [3]. Gil et al. [11] explore inertia compensation by force feedforward for an impedance haptic device using additional sensors for acceleration estimation. Adams and Hannaford [2] present a general virtual coupling approach that could be used to render mass that is guaranteed to be passive, but do not consider mass explicitly. Hannaford and Ryu [12] extend the passivity domain, at the necessary expense of reducing the quality of haptic rendering. Brown and Colgate [4] establish conditions for passive positive-mass simulations. While recognizing that many scenarios require treatment of nonlinear models, in this paper we consider linear models, which are relevant to the problem of dysmetria (under- or overshooting targets) in single-jointed movements and enable the use of classic linear control theory.

II. SYSTEM MODELS

In this section we introduce system models for a human interacting with a haptic device, where the goal is to render a pure mass. There are many different haptic system architectures that can be used to render inertia, but in this paper we are interested in an impedance-type device with position sensing only. Three system models are introduced. One is a hybrid model, containing both continuous and discrete elements. This model is the closest to reality, in that it captures the control of a physical haptic device through a computer containing A/D (analog to digital) and D/A (digital to analog) components. Entirely continuous and discrete models are also introduced. The continuous model is used for the identification

Fig. 2. (a) Schematic of a human interacting with a haptic device implementing virtual mass. (b) Block diagram for this system, considering human parameters, device dynamics, sampling, and ZOH.

of important parameters with respect to stability, and for use in a quality-of-rendering analysis. We show that the discrete model results in stability bounds that are identical to the hybrid model stability bounds.

We consider a haptic device described by a mass, m, and viscous damper, b, acted upon by two external forces: the force applied by the operator, Fh(s), and the force applied by the actuator implementing the virtual environment, Fa(s) (Figure 2). The system is equipped with only one sensor measuring the position of the mass, X(s). An estimate of the acceleration of the device is formed by performing a double back difference on the sampled position signal and then low-pass filtering. The actuator force is the product of the acceleration estimate and the desired virtual mass.

A. Hybrid Model

The system equation of motion is:

Fh(s) − Fa(s) = X(s)(ms² + bs).   (1)

The force of the operator is determined by the motion of the operator Xh(s) via the hand impedance,

Fh(s) = bh(sXh(s) − sX(s)) + kh(Xh(s) − X(s)),   (2)

where bh and kh are positive constants corresponding to the hand damping and stiffness, respectively. The continuous


position signal X(s) is sampled with a constant sampling period of T to get a sampled position X(k). The sampled position measurements go through a discrete double back differencing operator,

D(z) = (z − 1)² / (T z)²,   (3)

and then a discrete low-pass filter,

H(z) = (1 − e^(−ω0 T)) z / (z − e^(−ω0 T)),   (4)

to form an estimate of the acceleration signal. The discrete low-pass filter has cut-off frequency ω0 (rad/s). The low-pass filter exists for practical considerations because the double differentiation of the sampled position signal is expected to be noisy. Other filters could be used; we chose one of the simplest possible filters to provide information about baseline performance. In a Z-width analysis, larger virtual damping enables the stable display of larger virtual environment stiffness. Here, a greater degree of filtering enables the stable display of a larger span of virtual masses. We will see, however, that filtering can result in inaccurate mass rendering.

The force of the actuator is the product of the acceleration estimate and a desired virtual mass, M, which can be positive (mass rendering) or negative (mass compensation). The actuator force is held constant for the duration of the sampling period with a zero-order hold (ZOH), resulting in a continuous-time staircase actuator force. At sample time k the actuator force is

Fa(k) = M H(z)D(z)X(k).   (5)
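Equations 3-5 amount to a three-sample finite difference followed by a one-pole IIR filter and a gain. A plain-Python sketch (the parameter values used in the usage note are illustrative, not taken from the paper):

```python
import math

def virtual_mass_force(positions, T, w0, M):
    """Acceleration estimate via the double back difference D(z) (Eq. 3)
    and low-pass filter H(z) (Eq. 4), scaled by the virtual mass M (Eq. 5)."""
    alpha = math.exp(-w0 * T)          # pole of H(z); DC gain is unity
    a_hat = 0.0
    forces = []
    for k in range(len(positions)):
        if k >= 2:                     # D(z) needs three samples
            d2 = (positions[k] - 2 * positions[k - 1] + positions[k - 2]) / T ** 2
        else:
            d2 = 0.0                   # warm-up: no estimate yet
        a_hat = alpha * a_hat + (1 - alpha) * d2   # difference equation of H(z)
        forces.append(M * a_hat)       # Eq. 5; held over each period by the ZOH
    return forces
```

For a constant-acceleration trajectory the estimate settles to the true acceleration, since the double back difference is exact for quadratics and H(z) has unity DC gain.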

B. Continuous Model

It is possible to convert the general hybrid system shown in Figure 2 into an entirely continuous one. The continuous model is desirable because of its tractability for finding important parameters with respect to stability using conventional linear control systems analysis. The two systems are similar, but not identical. To represent the system continuously, the discrete elements are converted to continuous ones. The continuous representation of D(z), D(s), is found using bilinear (Tustin) mapping:

D(z) → D(s) = 4s² / (sT + 2)².   (6)

The continuous representation of H(z), H(s), is a first-order low-pass filter with cut-off frequency ω0 (rad/s) and unity gain at DC:

H(z) → H(s) = ω0 / (s + ω0).   (7)

The zero-order hold is modeled as a time delay equal to half the sample period:

ZOH → e^(−sT/2).   (8)
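As a quick numerical sanity check on these continuous stand-ins (the values of T, ω0 and the test frequency below are illustrative and chosen well below the Nyquist rate), the discrete operators and their continuous counterparts nearly coincide:

```python
import numpy as np

T = 0.001                      # sample period (illustrative)
w0 = 2 * np.pi * 20            # filter cut-off (illustrative)
w = 2 * np.pi * 2              # test frequency, far below Nyquist (pi/T)
s = 1j * w
z = np.exp(s * T)              # evaluate discrete operators on the unit circle

D_z = (z - 1) ** 2 / (T * z) ** 2                         # Eq. 3
D_s = 4 * s ** 2 / (s * T + 2) ** 2                       # Eq. 6 (Tustin image)
H_z = (1 - np.exp(-w0 * T)) * z / (z - np.exp(-w0 * T))   # Eq. 4
H_s = w0 / (s + w0)                                       # Eq. 7
zoh = (1 - np.exp(-s * T)) / (s * T)                      # exact ZOH response
delay = np.exp(-s * T / 2)                                # Eq. 8 approximation

assert abs(D_z - D_s) / abs(D_s) < 1e-2
assert abs(H_z - H_s) / abs(H_s) < 2e-2
assert abs(zoh - delay) < 1e-3
```

The mismatch grows toward the Nyquist frequency, which is one reason the continuous and hybrid models are "similar, but not identical."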

Fig. 3. Steps for converting the hybrid system into an entirely discrete system. A: From the hybrid model shown in Figure 2, we find a continuous transfer function G(s). B: Then, the continuous elements are converted into an equivalent element G(z), resulting in an entirely discrete system.
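The step-B conversion in Figure 3, forming a ZOH equivalent of the continuous elements, can be sketched in state space. This is a NumPy-only sketch under stated assumptions: the short Taylor series stands in for a library matrix exponential, and the parameter values are illustrative, not from the paper:

```python
import numpy as np

def zoh_discretize(m, b_t, kh, T, terms=25):
    """ZOH equivalent of G(s) = 1/(m s^2 + b_t s + kh), with b_t = b + bh,
    returned as state-space matrices (Ad, Bd): Ad = e^{AT}, and Bd
    integrates the held input over one sample period."""
    A = np.array([[0.0, 1.0], [-kh / m, -b_t / m]])
    B = np.array([[0.0], [1.0 / m]])
    Ad = np.eye(2)                 # running sum of (A T)^k / k!
    S = np.eye(2)                  # running sum of (A T)^k / (k + 1)!
    P = np.eye(2)
    for k in range(1, terms):
        P = P @ (A * T) / k        # (A T)^k / k!
        Ad = Ad + P
        S = S + P / (k + 1)
    Bd = T * (S @ B)
    return Ad, Bd

# ZOH equivalence preserves the DC gain exactly: a unit force step
# settles at x = 1/kh, matching the continuous plant.
m, b_t, kh, T = 0.1, 5.0, 100.0, 0.001   # illustrative values
Ad, Bd = zoh_discretize(m, b_t, kh, T)
x = np.zeros((2, 1))
for _ in range(5000):
    x = Ad @ x + Bd * 1.0
```

This matching of input-output behavior at the sample instants is why the hybrid and discrete models share the same stability bounds.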

C. Discrete Model

It is also possible to represent the system with an entirely discrete model. The discrete model is useful for stability analysis in which the discrete elements can be considered explicitly. Figure 3 illustrates the process. First, a transfer function from Fa(s) to X(s), G(s), is formed by combining only the continuous elements into a single transfer function:

G(s) = X(s) / Fa(s) = 1 / (ms² + s(b + bh) + kh).   (9)

Then, the ZOH, G(s), and the sampling elements are converted into a discrete element, G(z), using a zero-order hold equivalent [9]:

G(z) = ((z − 1)/z) Z{G(s)/s}.   (10)

The input-output characteristics of the continuous and discrete elements are the same. For this reason, the stability of the hybrid and discrete models is identical.

III. SYSTEM STABILITY

We now analyze the stability of the system shown in Figure 2, considering the all-continuous and discrete representations of the hybrid system, from which BIBO (bounded input, bounded output) stability can be determined. First, we identify important parameters with respect to stability using the continuous model. This is useful as a haptic system design tool because it identifies what key parameters determine mass rendering stability. Then, we form quantitative stability boundaries using the discrete model, equivalent to those of the hybrid model. This establishes the limits of a haptic system rendering mass using this architecture. Finally, we discuss how the continuous and discrete models differ.

A. Effects of Parameters on Stability

To find important parameters with respect to stability we analyze the characteristic polynomial of the continuous system. We can determine system stability using a Bode plot. By examining how the form of the Bode plot changes with


system parameters, we observe the effect of the parameters on stability. The characteristic polynomial of the system is 1 + L(s), where

L(s) = \frac{M}{m} \left( \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \right) \left( \frac{4s^2}{(sT + 2)^2} \right) \left( \frac{\omega_0 e^{-sT/2}}{s + \omega_0} \right),   (11)

\omega_n = \sqrt{\frac{k_h}{m}},   (12)

\zeta = \frac{b + b_h}{2\sqrt{k_h m}}.   (13)

TABLE I
EFFECT OF PARAMETERS ON GAIN MARGIN OF L(s)

Parameter | Gain Margin (positive M) | Gain Margin (negative M)
|M/m|     | ⇓                        | ⇓
ωn        | no effect*               | ⇑
ζ         | no effect*               | ⇑
T         | ⇓                        | little effect
ω0        | ⇓                        | ⇓

* for ωn ≪ ω∗

We see that L(s) can be formed by the product of the ratio of M to m, two poles from a second order system described by ωn and ζ, four zeros at the origin, two poles on the real axis at −2/T, a first-order low-pass filter with cut-off frequency ω0 (rad/s), and a linear phase delay depending on T.

It is convenient to define ω∗ as the frequency at which the phase of L(s) is −180°:

\angle L(j\omega^*) = -180^\circ.   (14)

The stability of the system is determined by the value of the gain margin of L(s). The gain margin, GM, is defined to be

GM = \frac{1}{|L(\omega^*)|}.   (15)

A gain margin greater than one corresponds to a stable system, and a gain margin of less than one corresponds to an unstable system. The Bode plots for positive and negative M have the same magnitude, and the phase for negative M is 180° lower than for positive M at every frequency. The effects of these system parameters on the gain margin of L(s) are summarized in Table I.

• |M/m| is a gain of the system; changing it directly affects the gain margin. There exists a maximum stable value of |M/m|.
• ωn is the frequency at which the magnitude and phase of the second order system transition. For positive M, if ωn ≪ ω∗ then ωn has no effect on the gain margin. For negative M, ω∗ is nearly ωn, so increasing ωn increases the gain margin.
• ζ affects the magnitude and phase of the second order system around ωn. The lower the value of ζ, the higher the "spike" in magnitude around ωn, and the smaller the span for the phase to transition. For positive M, if ωn ≪ ω∗ the value of ζ does not affect the gain margin. For negative M, increasing ζ increases the gain margin.
• ω0 affects the gain margin by introducing a pole at frequency ω0. Lower cut-off frequencies, corresponding to more aggressive filtering, increase the gain margin.
• T affects how quickly phase lag is added to the system. Larger sample periods result in more phase lag added at a given frequency. For positive M, T has a large effect on the gain margin; a smaller T corresponds to a higher gain margin. For negative M, T has little effect on the gain margin.

TABLE II
HUMAN AND DEVICE PARAMETERS

Human Parameters [13]: kh = 700 (N/m), bh = 5 (Ns/m)
Device Parameters [7]: m = 70 (g), b = 0.005 (Ns/m)
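As a quick numerical check of equations (12) and (13), the Table II values (with the device mass converted to kilograms) give the second-order parameters directly. This is an illustrative sketch:

```python
import math

kh = 700.0   # human stiffness (N/m)
bh = 5.0     # human damping (Ns/m)
m = 0.070    # device mass (kg), i.e., 70 g
b = 0.005    # device damping (Ns/m)

wn = math.sqrt(kh / m)                       # equation (12)
zeta = (b + bh) / (2.0 * math.sqrt(kh * m))  # equation (13)

print(wn)    # 100 rad/s, the value used in Figures 4 and 5
print(zeta)  # approximately 0.36
```

The damping ratio computed from the raw Table II numbers (about 0.36) is close to, though not identical to, the representative value ζ = 0.5 used for the Bode plots.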

Figure 4 shows how L(s) varies with the sample time, T, for positive desired virtual mass. (The Bode plots for negative desired virtual mass would have −180° lower phase at every frequency.) The parameter values are |M/m| = 1, ωn = 100 (rad/s), ζ = 0.5, and ω0 = 30 (rad/s). These values were chosen to represent a human interacting with a Phantom Premium haptic device (SensAble, Inc.) based on a human model from Kuchenbecker et al. [13] and a device model from Diolaiti et al. [7] (Table II). Figure 4 demonstrates that the sample rate significantly affects the gain margin for positive virtual mass because the value of ω∗ is mainly determined by the sample rate. However, for negative virtual mass, ω∗ is nearly ωn, so the sample rate has little effect on the gain margin.

Figure 5 shows how L(s) varies with ωn for positive virtual mass. For positive virtual mass, ωn and ζ do not affect the gain margin of L(s), as long as the phase transition of the second order system reaches its asymptote before ω∗. However, for negative virtual mass they do affect the gain margin. Thus, the haptic device and operator parameters have more effect on stability for inertia compensation than inertia rendering.

The virtual inertia to device inertia ratio, M/m, is a gain of the system and directly affects the gain margin of L(s). If the other system parameter values are set, we can determine the range of M/m for which the system is stable. This analysis shows how much inertia rendering is possible. For example, the Bode plot of Figure 4 at a sample rate of 1000 Hz is about −45 dB at ω∗, so for positive M, values of M/m up to 45 dB are stable. For negative M, the magnitude of the system is about −15 dB at ω∗, so values of |M/m| up to 15 dB are stable. This establishes a range of stable commanded virtual mass values, but we will see in Section IV that the rendered mass of the system may not actually match the commanded value.
This analysis is based on the continuous model whose stability boundaries are similar, but not identical, to the hybrid and discrete model stability boundaries.



Fig. 6. Examining M-width: Stability boundaries from analysis and numerical simulation. Lines mark boundaries between stable and unstable behavior; the origin side is stable. Given the parameters in Table II, a cut-off frequency of the low-pass filter, f0 (Hz), and sample rate, the plots display the range of desired virtual mass to device mass ratios, M/m, that are stable. The maximum minus the minimum stable value of virtual to device ratio scaled by the device mass is the M-width.


Fig. 4. Bode plots of L(s) with positive virtual mass M for various sample rates. Here M/m = 1, ωn = 100 (rad/s), ζ = 0.5, and ω0 = 30 (rad/s). The Bode plots for negative virtual mass have 180◦ lower phase at every frequency. The sample rate affects the gain margin greatly for positive virtual mass, but not for negative virtual mass.


Fig. 5. Bode plots of L(s) with positive virtual mass M for various values of ωn . Here M/m = 1, ζ = 0.5, ω0 = 30 (rad/s), and the sample rate is 1000 Hz. The Bode plots for negative virtual mass M have 180◦ lower phase at every frequency. For positive virtual mass ωn and ζ do not affect the gain margin of L(s) as long as the phase transition of the second order system reaches its asymptote before ω ∗ , but for negative virtual mass they do affect the gain margin.


B. Stability Boundaries (M-width)

In this section, analytical stability boundaries are formed using the continuous and discrete models. Stability boundaries are also formed using numerical simulation. Figure 6 displays stability boundaries for a family of systems. The maximum minus the minimum stable value of the (desired) virtual to device mass ratio, M/m, scaled by the device mass, gives the M-width of the system: the dynamic range of virtual masses renderable in a stable manner. The maximum stable value for rendering positive mass is predominantly determined by the sample time. For rendering negative inertia, the maximum stable value is mainly determined by ωn and ζ. Reducing the cut-off frequency of the filter increases the M-width.

C. Continuous and Discrete Model Comparison

How do the continuous and discrete models compare? We now examine whether the stability intuition that we get from the continuous model is applicable to the discrete model,


Fig. 7. Bode plots of the loop polynomials for the continuous and discrete systems. The plots are similar, but the discrete system has more phase lag at higher frequencies.

which shares the same stability boundaries as the hybrid one. Figure 7 displays the Bode plots of L(s) and L(z) for the continuous and discrete models, respectively. The system parameter values are the same as in Figure 4 with a sample rate of 1000 Hz. The Bode plots are very similar, but the discrete model has slightly more phase lag at higher frequencies. Thus, the continuous model will appear stable for parameters at which the discrete model is unstable. However, the general effects of parameters for the models are similar, so the intuitive parameter guidelines apparent in the continuous model are still useful.
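The extra phase lag of the discrete model can be seen directly by comparing the frequency response of the continuous plant with that of its zero-order-hold equivalent. This sketch uses the Table II parameters at an assumed 1 kHz sample rate; it compares the plants rather than the full loop polynomials L(s) and L(z) plotted in Figure 7:

```python
import numpy as np
from scipy.signal import cont2discrete, freqs, freqz

m, b, bh, kh = 0.070, 0.005, 5.0, 700.0
T = 0.001

num, den = [1.0], [m, b + bh, kh]
numd, dend, _ = cont2discrete((num, den), T, method='zoh')

w = np.linspace(1.0, 1000.0, 2000)               # rad/s, well below Nyquist (pi/T)
_, hc = freqs(num, den, worN=w)                  # continuous response G(jw)
_, hd = freqz(np.ravel(numd), dend, worN=w * T)  # discrete response at w*T rad/sample

phase_c = np.unwrap(np.angle(hc))
phase_d = np.unwrap(np.angle(hd))

# Positive difference: the discrete model lags more (roughly wT/2 radians)
print(phase_c[-1] - phase_d[-1])
```

At low frequency the two responses are nearly indistinguishable; the gap opens as frequency approaches the Nyquist rate, which is why the discrete model is the more conservative one for stability analysis.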



Setting ξ > 0 leads to an optimistic estimate of improvement and tends to encourage exploration of regions of high uncertainty. Cost scale invariance can be achieved by multiplying ξ by the signal standard deviation, σf [18].

B. Variational Heteroscedastic Gaussian Process Regression

One limitation of the standard regression model is the assumption of i.i.d. noise over the input space (see equation (2)). Many data do not adhere to this simplification and models capable of capturing heteroscedasticity are required. The heteroscedastic regression model takes the form

\hat{J}(\theta) = J(\theta) + \varepsilon(\theta), \quad \varepsilon(\theta) \sim \mathcal{N}(0, r(\theta)),   (6)

where the noise variance, r(θ), is dependent on the input. In the Bayesian setting, a second GP prior, g(θ) ∼ GP(μ0, kg(θ, θ′)), is placed over the unknown log variance function, g(θ) ≡ log r(θ) [7, 10, 15]. This heteroscedastic Gaussian process (HGP) model has the unfortunate property that the computations of the posterior distribution and the marginal likelihood are intractable, thus making hyperparameter optimization and prediction difficult.

In the variational heteroscedastic Gaussian process (VHGP) model [15], a variational lower bound on the marginal likelihood of the HGP model serves as a tractable surrogate function for optimizing the hyperparameters. Let g = [g(θ1), g(θ2), ..., g(θN)]⊤ be the vector of latent log noise variances for the N data points. By defining a normal variational probability density, q(g) ∼ N(μ, Σ), the marginal variational bound can be derived [15]:

F(\mu, \Sigma) = \log \mathcal{N}(\mathbf{y} \,|\, \mathbf{0}, K_f + R) - \tfrac{1}{4}\mathrm{tr}(\Sigma) - \mathrm{KL}\big( \mathcal{N}(\mathbf{g} \,|\, \mu, \Sigma) \,\|\, \mathcal{N}(\mathbf{g} \,|\, \mu_0 \mathbf{1}, K_g) \big),   (7)

where R is a diagonal matrix with elements [R]ii = e^{[μ]i − [Σ]ii/2}. Intuitively, by maximizing equation (7) with respect to the variational parameters, μ and Σ, we maximize the log marginal likelihood under the variational approximation while minimizing the distance (in the Kullback-Leibler sense) between the variational distribution and the distribution implied by the GP prior. By exploiting properties of F(μ, Σ) at its maximum, one can write μ and Σ in terms of N variational parameters [15]:

\mu = K_g (\Lambda - \tfrac{1}{2}I)\mathbf{1} + \mu_0 \mathbf{1}, \qquad \Sigma^{-1} = K_g^{-1} + \Lambda,

where Λ is a positive semidefinite diagonal matrix of variational parameters. F(μ, Σ) can be simultaneously maximized with respect to the variational parameters and the HGP model hyperparameters, Ψf and Ψg. If the kernel functions kf(θ, θ′) and kg(θ, θ′) are squared exponentials (1), then Ψf = {σf, ℓf} and Ψg = {μ0, σg, ℓg}.
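The pieces of the bound can be assembled in a few lines of NumPy. The following sketch evaluates F(μ, Σ) for toy data, using the closed forms for μ and Σ in terms of the diagonal matrix Λ; all data, hyperparameter, and variational-parameter values here are arbitrary illustrations, not values from the paper:

```python
import numpy as np

def se_kernel(x, sigma, ell):
    """Squared exponential kernel on 1-D inputs."""
    d = x[:, None] - x[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell)**2)

rng = np.random.default_rng(0)
N = 20
x = np.linspace(-3.0, 3.0, N)
y = np.sin(x) + 0.1 * rng.standard_normal(N)   # toy cost observations
one = np.ones(N)

Kf = se_kernel(x, 1.0, 1.0) + 1e-6 * np.eye(N)  # kernel for f (Psi_f)
mu0 = -2.0
Kg = se_kernel(x, 1.0, 2.0) + 1e-6 * np.eye(N)  # kernel for g (Psi_g)
lam = 0.3 * one                                  # diagonal of Lambda (toy values)

# Closed forms: mu = Kg (Lambda - I/2) 1 + mu0 1,  Sigma = (Kg^-1 + Lambda)^-1
mu = Kg @ ((lam - 0.5) * one) + mu0 * one
Sigma = Kg - Kg @ np.linalg.solve(Kg + np.diag(1.0 / lam), Kg)
Sigma = 0.5 * (Sigma + Sigma.T)
R = np.diag(np.exp(mu - np.diag(Sigma) / 2.0))

# log N(y | 0, Kf + R)
C = Kf + R
loglik = -0.5 * (y @ np.linalg.solve(C, y)
                 + np.linalg.slogdet(C)[1] + N * np.log(2.0 * np.pi))

# KL( N(mu, Sigma) || N(mu0 1, Kg) )
dm = mu - mu0 * one
kl = 0.5 * (np.trace(np.linalg.solve(Kg, Sigma)) + dm @ np.linalg.solve(Kg, dm)
            - N + np.linalg.slogdet(Kg)[1] - np.linalg.slogdet(Sigma)[1])

F = loglik - 0.25 * np.trace(Sigma) - kl
print(F)   # the marginal variational bound (7), a finite scalar
```

In the full method this quantity (plus the log hyperpriors introduced later) is what gets maximized jointly over Λ and the kernel hyperparameters.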


The VHGP model yields a non-Gaussian variational predictive density,

q(\hat{J}_*) = \int \mathcal{N}(\hat{J}_* \,|\, a_*, c_*^2 + e^{g_*}) \, \mathcal{N}(g_* \,|\, \mu_*, \sigma_*^2) \, dg_*,   (8)

where

a_* = \mathbf{k}_{f*}^\top (K_f + R)^{-1} \mathbf{y},
c_*^2 = k_f(\theta_*, \theta_*) - \mathbf{k}_{f*}^\top (K_f + R)^{-1} \mathbf{k}_{f*},
\mu_* = \mathbf{k}_{g*}^\top (\Lambda - \tfrac{1}{2}I)\mathbf{1} + \mu_0,
\sigma_*^2 = k_g(\theta_*, \theta_*) - \mathbf{k}_{g*}^\top (K_g + \Lambda^{-1})^{-1} \mathbf{k}_{g*}.

Although this predictive density is still intractable, its mean and variance can be calculated in closed form [15]:

E_q[\hat{J}_*] = a_*,   (9)
V_q[\hat{J}_*] = c_*^2 + \exp(\mu_* + \sigma_*^2/2) \equiv s_*^2.   (10)

III. VARIATIONAL BAYESIAN OPTIMIZATION

There are two practical motivations for capturing policy-dependent variance structure during optimization. First, metrics computed on the predictive distribution, such as EI and probability of improvement, will return more meaningful values for the system under consideration. Second, it creates the opportunity to employ policy selection criteria that take cost variance into account, i.e., that are risk-sensitive.

We extend the VHGP model to the optimization case by deriving the expression for expected improvement and its gradient and show that both can be efficiently approximated to several decimal places using Gauss-Hermite quadrature (as is the case for the predictive distribution itself [15]). We show how efficiently computable confidence bound selection criteria can be used to select risk-sensitive policies and generalize the expected improvement criterion. To address numerical issues that arise when N is small (i.e., in the early stages of optimization), we incorporate independent log priors into the marginal variational bound and identify heuristic sampling strategies that perform well empirically.

A. Expected Improvement

The improvement, I_*, of a policy, θ_*, under the variational predictive distribution (8) is

q(I_*) = \int \mathcal{N}(I_* \,|\, \mu_{\mathrm{best}} - a_*, v_*^2) \, \mathcal{N}(g_* \,|\, \mu_*, \sigma_*^2) \, dg_*,   (11)

where v_*^2 = c_*^2 + e^{g_*}. The expression for EI then becomes

EI(\theta_*) = \int_0^\infty I_* \, q(I_*) \, dI_* = \int v_* [u_* \Phi(u_*) + \phi(u_*)] \, \mathcal{N}(g_* \,|\, \mu_*, \sigma_*^2) \, dg_*,   (12)

where u_* = (\mu_{\mathrm{best}} - a_*)/v_*. Although this expression is not analytically tractable, it can be efficiently approximated using Gauss-Hermite quadrature. This can be made clear by setting w = (g_* - \mu_*)/(\sqrt{2}\sigma_*) and replacing all occurrences of g_* in the expressions for v_* and u_*:

EI(\theta_*) = \int e^{-w^2} \, \frac{v_* [u_* \Phi(u_*) + \phi(u_*)]}{\sqrt{\pi}} \, dw \equiv \int e^{-w^2} h(w) \, dw.   (13)

Similarly, the gradient \partial EI(\theta)/\partial\theta can be computed under the integral (12) and the result is of the desired form:

\frac{\partial EI(\theta_*)}{\partial \theta} = \int e^{-w^2} z(w) \, dw,   (14)

where

z(w) = \frac{1}{\sqrt{\pi}} \left[ \frac{\partial v_*}{\partial \theta} \big( u_* \Phi(u_*) + \phi(u_*) \big) + v_* \Phi(u_*) \frac{\partial u_*}{\partial \theta} \right],

with the partial derivatives evaluated at g_* = \mu_* + \sqrt{2}\sigma_* w, so that terms in \partial\mu_*/\partial\theta + \sqrt{2}\,w\,\partial\sigma_*/\partial\theta enter through the dependence of g_* on θ.
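The Gauss-Hermite approximation of (13) takes only a few lines of code. In this sketch the predictive quantities a∗, c∗², μ∗, and σ∗² are arbitrary stand-in values rather than outputs of a fitted VHGP model; the quadrature result is checked against a brute-force discretization of (12):

```python
import numpy as np
from scipy.stats import norm

a, c2, mu, sigma2 = 1.0, 0.04, -1.0, 0.25   # stand-ins for a*, c*^2, mu*, sigma*^2
mu_best = 1.2
sigma = np.sqrt(sigma2)

def integrand(g):
    """v*[u*Phi(u*) + phi(u*)] with g* replaced by g."""
    v = np.sqrt(c2 + np.exp(g))
    u = (mu_best - a) / v
    return v * (u * norm.cdf(u) + norm.pdf(u))

# Gauss-Hermite: int e^{-w^2} f(w) dw ~= sum_i omega_i f(w_i)
w_i, omega_i = np.polynomial.hermite.hermgauss(40)
ei_gh = np.sum(omega_i * integrand(mu + np.sqrt(2.0) * sigma * w_i)) / np.sqrt(np.pi)

# Brute-force reference: integrate (12) over g* on a dense grid
g = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
f = integrand(g) * norm.pdf(g, mu, sigma)
ei_ref = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(g))

print(ei_gh, ei_ref)   # the two estimates agree closely
```

The same quadrature nodes can be reused for the gradient integral (14), which is what makes gradient-based maximization of EI inexpensive.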

As in the standard Bayesian optimization setting, we can easily incorporate an exploration parameter, ξ, by setting u∗ = (μbest − a∗ + ξ)/v∗. EI can be maximized using standard nonlinear optimization algorithms. Since flat regions and multiple local maxima may be present, it is common practice to perform random restarts during EI optimization to avoid low-quality solutions. In our experiments, we use the NLOPT [8] implementation of sequential quadratic programming (SQP) with 25 random restarts to optimize EI.

B. Confidence Bound Selection

In order to exploit cost variance information for policy selection, we must consider selection criteria that flexibly take cost variance into account. Although EI performs well during learning by balancing exploration and exploitation, it falls short in this regard since it always favors high variance or high uncertainty among solutions with equivalent expected cost. In contrast, confidence bound (CB) selection criteria allow one to directly specify the sensitivity to cost variance. The family of confidence bound selection criteria has the general form

CB(\theta_*, \kappa) = E[\hat{J}_*] + b(V[\hat{J}_*], \kappa),   (15)

where b(·, ·) is a function of the cost variance and a constant κ that controls the system's sensitivity to risk. Such criteria have been extensively studied in the context of statistical global optimization [5, 26] and economic decision making [16]. Favorable regret bounds for sampling with CB criteria of the form b(V[\hat{J}_*], \kappa) = \kappa\sqrt{V[\hat{J}_*]} \equiv \kappa s_* have also been derived for certain types of Bayesian optimization problems [26]. Interestingly, CB criteria have a strong connection to the exponential utility functions of risk-sensitive optimal control (RSOC) [32, 31]. Consider the standard RSOC objective function,


\gamma(\kappa) = -2\kappa^{-1} \log E\!\left[ e^{-\frac{1}{2}\kappa \hat{J}_*} \right].   (16)

Taking the second order Taylor expansion of e^{-\frac{1}{2}\kappa \hat{J}_*} about E[\hat{J}_*] yields

\gamma(\kappa) \approx E[\hat{J}_*] - \tfrac{1}{4}\kappa V[\hat{J}_*].   (17)

This approximation exposes the role of the parameter κ in determining the risk sensitivity of the system: κ < 0 is risk-averse, κ > 0 is risk-seeking, and κ = 0 is risk-neutral [31]. Thus, policies selected according to a CB criterion with b(V[\hat{J}_*], \kappa) = -\tfrac{1}{4}\kappa V[\hat{J}_*] can be viewed as approximate RSOC solutions. Furthermore, since the selection is performed with respect to the predictive distribution (8), policies with different risk characteristics can be selected on-the-fly, without having to perform separate optimizations that require additional policy executions on the system.

We can also apply confidence bound criteria to generalize EI to an expected risk improvement (ERI) criterion. We define risk improvement for a policy θ∗ as I_*^\kappa = \mu_{\mathrm{best}} + \kappa s_{\mathrm{best}} - \hat{J}_* - \kappa s_*, where μbest and sbest are found by minimizing E_q[\hat{J}_{\theta_i}] + \kappa \sqrt{V_q[\hat{J}_{\theta_i}]} over all previously evaluated policies. Thus, ERI can be viewed as a generalization of EI where u∗ = (μbest − a∗ + κ(sbest − s∗))/v∗ and ERI = EI if κ = 0.

C. Coping with Small Sample Sizes

1) Log Hyperpriors: To avoid numerical instability in the hyperparameter optimization when N is small, we augment F(μ, Σ) with independent log-normal priors [18] for each hyperparameter in the VHGP model,

\hat{F}(\mu, \Sigma) = F(\mu, \Sigma) + \sum_{\psi_k \in \Psi} \log \mathcal{N}(\log \psi_k \,|\, \mu_k, \sigma_k^2),   (18)

where Ψ = Ψf ∪ Ψg is the set of all hyperparameters. In practice, these priors can be quite vague and thus do not require significant experimenter insight. For example, in our experiments we set the log prior on length scales so that the width of the 95% confidence region is at least 20 times the actual policy parameter range. As is the case with standard marginal likelihood maximization, F̂(μ, Σ) may have several local optima. In practice, performing random restarts helps avoid low-quality solutions (especially when N is small). In our experiments, we perform 10 random restarts using SQP for policy selection.

2) Sampling: It is well known [9] that selecting policies based on distributions fit using very little data can lead to myopic sampling and premature convergence. Incorporating external randomization is one way to help alleviate this problem. For example, it is common to obtain a random sample of N0 initial policies prior to performing optimization. We have found that sampling according to EI with probability 1 − ε and randomly otherwise performs well empirically. In the standard Bayesian optimization setting with model selection, ε-random EI selection has been shown to yield near-optimal global convergence rates [4]. Randomized CB selection with, e.g., κ ∼ N(0, 1) can also be applied when the policy search is aimed at identifying a spectrum of policies with different risk sensitivities. However, since this technique relies completely

on the estimated cost distribution, it is most appropriate to apply after a reasonable initial estimate of the cost distribution has been obtained.

The Variational Bayesian Optimization (VBO) algorithm is outlined in Algorithm 1.

Algorithm 1 Variational Bayesian Optimization
Input: Previous evaluations: Θ, y; Iterations: n
  1) for i := 1 : n
     a) Maximize equation (18) given Θ, y:
        Ψf+, Ψg+, Λ+ := arg max F̂(μ, Σ)
     b) Optimize selection criterion, EI, ERI, or CB, w.r.t. the optimized model:
        θ′ := arg maxθ S(θ, Ψf+, Ψg+, Λ+)
     c) Execute θ′, observe cost, Ĵ(θ′)
     d) Append Θ := [Θ; θ′], y := [y; Ĵ(θ′)]
  2) Return Θ, y

IV. EXPERIMENTS

A. Synthetic Data

As an illustrative example, in Figure 1 we compare the performance of VBO to standard Bayesian optimization in a simple 1-dimensional noisy optimization task. For this task, the true underlying cost distribution (Figure 1(a)) has two global minima with different cost variances. Both algorithms begin with the same N0 = 10 random samples and perform 10 iterations of EI selection (ξ = 1.0, ε = 0.25). In Figure 1(b), we see that Bayesian optimization succeeds in identifying the regions of low cost, but it cannot capture the policy-dependent variance characteristics. In contrast, VBO reliably identifies the minima and approximates the local variance characteristics. Figure 1(d) shows the result of applying two different confidence bound selection criteria to vary risk sensitivity. Here we maximized −CB(θ∗, κ), where

CB(\theta_*, \kappa) = E_q[\hat{J}_*] + \kappa s_*   (19)

and κ = −1.5 and κ = 1.5 were used to select risk-seeking and risk-averse policy parameters, respectively.

B. Noisy Pendulum

As another simple example, we considered a swing-up task for a noisy pendulum system. In this task, the maximum torque output of the pendulum actuator is unknown and is drawn from a normal distribution at the beginning of each episode. As a rough physical analogy, this might be understood as fluctuations in motor performance that are caused by unmeasured changes in temperature. The policy space consisted of "bang-bang" policies in which the maximum torque is applied in the positive or negative direction, with switching times specified by two parameters, 0 ≤ t1, t2 ≤ 1.5 sec. Thus, θ = [t1, t2]. The cost function was defined as

J(\theta) = \int_0^T 0.01\,\alpha(t) + 0.0001\,u(t)^2 \, dt,   (20)

Fig. 2. The cost distribution for the simulated noisy pendulum system obtained by a 20x20 discretization of the policy space. Each policy was evaluated 100 times to estimate its mean and variance (N = 40000).

Fig. 1. (a) A noisy cost function sampled during 10 iterations (N0 = 10) of (b) Bayesian optimization and (c) the VBO algorithm. Bayesian optimization succeeded in identifying the minima, but it cannot distinguish between high and low variance solutions. (d) Confidence bound selection criteria are used to select risk-seeking and risk-averse policy parameter values.
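Given estimated means and standard deviations for a set of previously evaluated policies, the confidence bound rule of equation (19) reduces to a one-line argmin. The cost estimates below are hypothetical values chosen so that the three risk settings pick different policies:

```python
import numpy as np

means = np.array([1.00, 1.20, 0.95])  # E_q[J] for three candidate policies
stds = np.array([2.00, 0.10, 0.30])   # s = sqrt(V_q[J])

def select(kappa):
    """Minimize CB(theta, kappa) = E_q[J] + kappa * s over candidates."""
    return int(np.argmin(means + kappa * stds))

print(select(1.5))   # 1: risk-averse, accepts a higher mean for low variance
print(select(0.0))   # 2: risk-neutral, lowest expected cost
print(select(-1.5))  # 0: risk-seeking, drawn to the high-variance policy
```

Because selection operates only on the learned distribution, changing κ requires no new policy executions, which is the runtime flexibility emphasized in the text.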

where 0 ≤ α(t) ≤ π is the pendulum angle measured from upright vertical, T = 3.5 sec, and u(t) = τmax if 0 ≤ t ≤ t1, u(t) = −τmax if t1 < t ≤ t1 + t2, and u(t) = τmax if t1 + t2 < t ≤ T. The system always started in the downward vertical position with zero initial velocity and the episode terminated if the pendulum came within 0.1 radians of the upright vertical position. The parameters of the system were l = 1.0 m, m = 1.0 kg, and τmax ∼ N(4, 0.3²) Nm. With these physical parameters, the pendulum must (with probability ≈ 1.0) perform at least two swings to reach vertical in less than T seconds.

The cost function (20) suggests that policies that reach vertical as quickly as possible (i.e., using the fewest swings) are preferred. However, the success of an aggressive policy depends on the torque generating capability of the pendulum. With a noisy actuator, we expect aggressive policies to have higher variance. An approximation of the cost distribution obtained via discretization (N = 40000) is shown in Figure 2. Here we indeed see that regions around policies that attempt two-swing solutions (θ = [0.0, 1.0], θ = [1.0, 1.5]) have low expected cost, but high cost variance.

Figure 3 shows the results of 25 iterations of VBO using EI selection (N0 = 15, ξ = 1.0, ε = 0.2) in the noisy pendulum task. After N = 40 total evaluations, the expected cost and cost variance are sensibly represented in regions of low cost. Figure 4 illustrates two policies selected by minimizing the CB criterion (19) on the learned distribution with κ = ±2.0. The risk-seeking policy (θ = [1.03, 1.5]) makes a large initial swing, attempting to reach the vertical position in two swings. The risk-averse policy (θ = [0.63, 1.14]) always produces three swings and exhibits low cost variance, though it has higher cost than the risk-seeking policy when the maximum torque is large.
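The bang-bang policy and cost (20) can be sketched with a simple forward-Euler rollout. This is an illustrative reimplementation under stated assumptions (Euler integration, g = 9.81 m/s², simple angle wrapping), not the authors' simulator:

```python
import math
import random

def rollout(t1, t2, tau_max, T=3.5, dt=1e-3, m=1.0, l=1.0, g=9.81):
    theta, omega = math.pi, 0.0     # start hanging down, pi rad from upright
    J, t = 0.0, 0.0
    while t < T:
        if t <= t1:
            u = tau_max
        elif t <= t1 + t2:
            u = -tau_max
        else:
            u = tau_max
        alpha = abs((theta + math.pi) % (2.0 * math.pi) - math.pi)  # distance from upright
        J += (0.01 * alpha + 0.0001 * u * u) * dt
        if alpha < 0.1:             # episode terminates near upright
            break
        # theta is measured from upright, so gravity pushes away from the goal
        omega += dt * (u + m * g * l * math.sin(theta)) / (m * l * l)
        theta += dt * omega
        t += dt
    return J

random.seed(0)
tau_max = random.gauss(4.0, 0.3)    # actuator limit drawn once per episode
print(rollout(0.63, 1.14, tau_max)) # cost of the risk-averse policy for this draw
```

Repeating the rollout over many draws of τmax gives the per-policy cost mean and variance that the discretized distribution in Figure 2 summarizes.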

Fig. 3. Estimated cost distribution after 25 iterations of VBO (N = 40). The optimization algorithm focuses modeling effort on regions of low cost.

It is often easy to understand the utility of risk-averse and risk-neutral policies, but the motivation for selecting risk-seeking policies may be less clear. The above result suggests one possibility: the acquisition of specialized, high-performance policies. For example, in some cases risk-seeking policies could be chosen in an attempt to identify observable initial conditions that lead to rare low-cost events. Subsequent optimizations might then be performed to direct the system to these initial conditions. One could also imagine situations where the context might demand performance that lower risk policies are very unlikely to generate, for example, if the minimum time to goal was reduced so that only two-swing policies had a reasonable chance of succeeding. In such instances it may be desirable to select higher risk policies, even if the probability of succeeding is quite low.

C. Balance Recovery with the uBot-5

The uBot-5 is an 11-DoF mobile manipulator that has two 4-DoF arms, a rotating trunk, and two wheels in a differential drive configuration. The robot has a mass of 19 kg and stands 60 cm from the ground with a torso that is roughly similar to a small adult human in scale and geometry (Figure 5). The robot balances using a linear-quadratic regulator (LQR) with feedback from an onboard inertial measurement unit (IMU). In our previous experiments [13], the energetic and stabilizing effects of rapid arm motions on the LQR stabilized system were evaluated in the context of recovery from impact

perturbations. One observation we made was that high energy impacts caused a subset of possible recovery policies to have high cost variance: successfully stabilizing in some trials, while failing to stabilize in others. We extend these experiments by considering larger impact perturbations, increasing the set of arm initial conditions, and defining a policy space that permits more flexible, asymmetric arm motions.

Fig. 4. Performance of risk-averse (a)-(c) and risk-seeking (d)-(f) policies as the maximum pendulum torque is varied. Risk-averse: (a) τmax = 3.4 Nm, J(θ) = 18.2; (b) τmax = 4.0 Nm, J(θ) = 17.0; (c) τmax = 4.6 Nm, J(θ) = 15.9. Risk-seeking: (d) τmax = 3.4 Nm, J(θ) = 19.9; (e) τmax = 4.0 Nm, J(θ) = 17.7; (f) τmax = 4.6 Nm, J(θ) = 13.0.

Fig. 5. The uBot-5 situated in the impact pendulum apparatus.

The robot was placed in a balancing configuration with its upper torso aligned with a 3.3 kg mass suspended from the ceiling (Figure 5). The mass was pulled away from the robot to a fixed angle and released, producing a controlled impact between the swinging mass and the robot. The pendulum momentum prior to impact was 9.9 ± 0.8 Ns and the resulting impact force was approximately equal to the robot's total mass in earth gravity. The robot was consistently unable to recover from this perturbation using only the wheel LQR (see the rightmost column of Figure 6).

This problem is well suited for model-free policy optimization since there are several physical properties, such as joint friction, wheel backlash, and tire slippage, that make the system difficult to model accurately. In addition, although the underlying state and action spaces are high dimensional (22 and 8, respectively), low-dimensional policy spaces that contain high-quality solutions are relatively straightforward to identify.

The parameterized policy controlled each arm joint according to an exponential trajectory, τi(t) = e^{−λi t}, where 0 ≤ τi(t) ≤ 1 is the commanded DC motor power for joint i at time t. The λ parameters were paired for the shoulder/elbow pitch and the shoulder roll/yaw joints. This pairing allowed the magnitude of dorsal and lateral arm motions to be independently specified. We commanded the pitch (dorsal) motions separately for each arm and mirrored the lateral motions, which reduced the number of policy parameters to 3. The range of each λi was constrained: 1 ≤ λi ≤ 15. At time t, if ∀i τi(t) < 0.25, the arms were retracted to a nominal configuration (the mean of the initial configurations) using a fixed, low-gain linear position controller.

The cost function was designed to encourage energy efficient solutions that successfully stabilized the system,

J(\theta) = h(x(T)) + \int_0^T \tfrac{1}{10} I(t) V(t) \, dt,   (21)

where I(t) and V(t) are the total absolute motor current and voltage at time t, respectively, T = 3.5 sec, and h(x(T)) = 5 if x(T) ∈ FailureStates, otherwise h(x(T)) = 0.

After 15 random initial trials, we applied VBO with EI selection (ξ = 1.0, ε = 0.2) for 15 episodes and randomized CB selection (κ ∼ N(0, 1)) for 15 episodes, resulting in a total of N = 45 policy evaluations. Since the left and right pitch parameters are symmetric with respect to cost, we imposed an arbitrary ordering constraint, λleft ≥ λright, during policy selection. After training, we evaluated four policies with different risk sensitivity selected by minimizing the CB criterion (19) with κ = 2, κ = 0, κ = −1.5, and κ = −2. Each selected policy was evaluated 10 times and the results are shown in Figure 6. The sample statistics confirm the algorithmic predictions about the relative riskiness of each policy.

In this case, the risk-averse and risk-neutral policies were very similar (no statistically significant difference between the mean or variance), while the two risk-seeking policies had higher variance (for κ = −2, the differences in both the sample mean and variance were statistically significant). For κ = −2, the selected policy produced an upward laterally-directed arm motion that failed approximately 50% of the time. In this case, the standard deviation of cost was sufficiently large that the second term in equation (19) dominated, producing a policy with high variance and poor average performance. A slightly less risk-seeking selection (κ = −1.5) yielded a policy with conservative low-energy arm movements that was more sensitive to initial conditions than the lower risk policies. This exertion of minimal effort could be viewed as a kind of gamble on initial conditions.
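The exponential arm commands and the retraction condition have a simple closed-form consequence: τi(t) < 0.25 exactly when t > ln(4)/λi, so retraction is triggered by the slowest-decaying joint (smallest λ). A small sketch, with hypothetical λ values inside the constrained range 1 ≤ λi ≤ 15:

```python
import math

lams = [2.0, 6.0, 14.0]   # hypothetical policy parameters (not from the paper)

def command(lam, t):
    """Commanded motor power tau_i(t) = exp(-lambda_i * t), in [0, 1]."""
    return math.exp(-lam * t)

# tau_i(t) < 0.25  <=>  t > ln(4)/lambda_i, so all joints satisfy the
# retraction condition once the smallest-lambda joint does.
t_retract = math.log(4.0) / min(lams)
print(t_retract)  # about 0.69 s for lambda_min = 2
```

Larger λ values thus produce brief, aggressive arm thrusts followed by early retraction, while small λ values sustain the motion for most of the episode.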
Figure 7 shows two successful trials executing risk-averse and risk-seeking policies.


Fig. 7. Time series (duration: 1 second) showing two successful trials executing low-risk (top, κ = 2) and high-risk (bottom, κ = −2) policies selected using confidence bound criteria on the learned cost distribution. The low-risk policy produced an asymmetric dorsally-directed arm motion with reliable recovery performance. The high-risk policy produced an upward laterally-directed arm motion that failed approximately 50% of the time.

Fig. 6. Data collected over 10 trials using policies identified as risk-averse, risk-neutral, and risk-seeking after performing VBO. The policies were selected using confidence bound criteria with κ = 2, κ = 0, κ = −1.5, and κ = −2, from left to right. The sample means and two times sample standard deviations are shown. The shaded region on the top part of the plot contains all trials that resulted in failure to stabilize. Ten trials with a fixed-arm policy are plotted on the far right to serve as a baseline level of performance for this impact magnitude.

V. D ISCUSSION AND F UTURE W ORK In many systems, it may be advantageous to adjust risk sensitivity based on runtime context. For example, systems whose environments change in ways that make failures more or less costly (such as operating around catastrophic obstacles or in a safety harness) or when the context demands that the system seek out a low-probability high-performance event. Perhaps not surprisingly, this variable risk property has been observed in a variety of animal species, from simple motor tasks in humans to foraging birds and bees [2, 1]. However, most methods for learning policies by interaction focus on the risk-neutral minimization of expected cost. Extending Bayesian optimization methods to capture policydependent cost variance creates the opportunity to select policies with different risk sensitivity. Furthermore, the ability to change risk sensitivity at runtime offers an advantage over existing risk-sensitive control techniques, e.g., [21, 30], that require separate optimizations and policy executions to produce policies with different risk. There are several properties of VBO that should be consid-

ered when determining its suitability for a particular problem. First, although the computational complexity is the same as Bayesian optimization, O(N 3 ), the greater flexibility of the VHGP model means that VBO tends to require more initial policy evaluations than standard Bayesian optimization. In addition, many model-free policy search algorithms, such as Bayesian optimization, VBO, and stochastic gradient descent [25], are sensitive to the number of policy parameters—highdimensional policies can require many trials to optimize. Thus, these algorithms are most effective in problems where lowdimensional policy spaces are available, but accurate system models are not. However, there is evidence policy spaces at least up to 15 dimensions can be efficiently explored with Bayesian optimization if estimates of the GP hyperparameters can be obtained a priori [17]. In contrast to local methods, such as policy gradient, Bayesian optimization and VBO can produce large changes in policy parameters between episodes, which could be undesirable in some situations. One approach to alleviating this potential problem is to combine VBO with local gradient methods. For example, one could imagine collecting data by performing gradient descent, rather than randomly selecting policies initially. In this case, both the samples obtained and the gradient estimates could be used to constrain the posterior cost distribution. In turn, the learned local cost distribution could act as a critic structure to reduce the variance of the policy update. Local offline optimization could be interweaved with the local policy updates to select greedy policies or change risk sensitivity using CB criteria. Some of these ideas have been explored in our recent work [14]. Another important consideration is the choice of kernel functions in the GP priors. 
In this work, we used the anisotropic squared exponential kernel to encode our prior assumptions regarding the smoothness and regularity of the underlying cost function. However, for many problems the underlying cost function is not smooth or regular; it contains flat regions and sharp discontinuities that can be difficult to represent. An interesting direction for future work is the use of kernel functions with local support, i.e., kernels that are not invariant to shifts in policy space [24].
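For reference, the anisotropic squared exponential kernel mentioned above can be sketched as follows. This is the standard ARD construction; the function name and parameterization here are illustrative and not the authors' implementation:

```python
import numpy as np

def ard_se_kernel(x1, x2, signal_var=1.0, lengthscales=None):
    """Anisotropic (ARD) squared exponential kernel:
    k(x, x') = sigma_f^2 * exp(-0.5 * sum_d (x_d - x'_d)^2 / l_d^2).
    A separate lengthscale l_d per policy-parameter dimension encodes
    how quickly the cost is assumed to vary along that dimension."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if lengthscales is None:
        lengthscales = np.ones_like(x1)
    # Scale each coordinate difference by its lengthscale, then
    # take the squared Euclidean distance in the scaled space.
    d = (x1 - x2) / np.asarray(lengthscales, dtype=float)
    return signal_var * np.exp(-0.5 * np.dot(d, d))
```

Because the kernel factorizes over dimensions, a large lengthscale in one dimension makes the prior nearly flat along it; this is exactly the smoothness assumption the text refers to, and also why such a kernel struggles with the sharp discontinuities that motivate locally supported alternatives.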


VI. CONCLUSION

Varying risk sensitivity based on runtime context is a potentially powerful way to generate flexible control in robot systems. We considered this problem in the context of model-free policy search, where risk-sensitive policies can be selected based on an efficiently learned cost distribution. Our experimental results suggest that VBO is an efficient and plausible method for achieving risk-sensitive control.

ACKNOWLEDGMENTS

The authors would like to thank George Konidaris and the anonymous reviewers for their helpful comments. Scott Kuindersma was supported by a NASA GSRP Fellowship from Johnson Space Center. Roderic Grupen was supported by ONR MURI award N00014-07-1-0749. Andrew Barto was supported by the AFOSR under grant FA9550-08-1-0418.

REFERENCES

[1] Melissa Bateson. Recent advances in our understanding of risk-sensitive foraging preferences. Proceedings of the Nutrition Society, 61:1–8, 2002.
[2] Daniel A. Braun, Arne J. Nagengast, and Daniel M. Wolpert. Risk-sensitivity in sensorimotor control. Frontiers in Human Neuroscience, 5:1–10, January 2011.
[3] Eric Brochu, Vlad Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-023, University of British Columbia, Department of Computer Science, 2009.
[4] Adam D. Bull. Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research, 12:2879–2904, October 2011.
[5] Dennis D. Cox and Susan John. A statistical method for global optimization. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, volume 2, pages 1241–1246, 1992. doi: 10.1109/ICSMC.1992.271617.
[6] Marcus Frean and Phillip Boyle. Using Gaussian processes to optimize expensive functions. In AI 2008: Advances in Artificial Intelligence, pages 258–267, 2008.
[7] Paul W. Goldberg, Christopher K. I. Williams, and Christopher M. Bishop. Regression with input-dependent noise: A Gaussian process treatment. In Advances in Neural Information Processing Systems 10 (NIPS), pages 493–499, 1998.
[8] Steven G. Johnson. The NLopt nonlinear-optimization package. http://ab-initio.mit.edu/nlopt, 2011.
[9] Donald R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21:345–383, 2001.
[10] Kristian Kersting, Christian Plagemann, Patrick Pfaff, and Wolfram Burgard. Most likely heteroscedastic Gaussian process regression. In Proceedings of the International Conference on Machine Learning (ICML), pages 393–400, 2010.
[11] Jens Kober and Jan Peters. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems 21. MIT Press, 2009.
[12] J. Zico Kolter and Andrew Y. Ng. Policy search via the signed derivative. In Robotics: Science and Systems V (RSS), 2010.
[13] Scott Kuindersma, Roderic Grupen, and Andrew Barto. Learning dynamic arm motions for postural recovery. In Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots, pages 7–12, Bled, Slovenia, October 2011.
[14] Scott Kuindersma, Roderic Grupen, and Andrew Barto. Variable risk dynamic mobile manipulation. In RSS 2012 Workshop on Mobile Manipulation, Sydney, Australia, 2012.
[15] Miguel Lázaro-Gredilla and Michalis K. Titsias. Variational heteroscedastic Gaussian process regression. In Proceedings of the International Conference on Machine Learning (ICML), 2011.
[16] H. Levy and H. M. Markowitz. Approximating expected utility by a function of mean and variance. The American Economic Review, 69(3):308–317, June 1979.
[17] Daniel Lizotte, Tao Wang, Michael Bowling, and Dale Schuurmans. Automatic gait optimization with Gaussian process regression. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI), 2007.
[18] Daniel James Lizotte. Practical Bayesian Optimization. PhD thesis, University of Alberta, Edmonton, Alberta, 2008.
[19] Ruben Martinez-Cantin, Nando de Freitas, Arnaud Doucet, and José A. Castellanos. Active policy learning for robot planning and exploration under uncertainty. In Proceedings of Robotics: Science and Systems, 2007.
[20] Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, José A. Castellanos, and Arnaud Doucet. A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Autonomous Robots, 27:93–103, 2009.
[21] Oliver Mihatsch and Ralph Neuneier. Risk-sensitive reinforcement learning. Machine Learning, 49:267–290, 2002.
[22] J. Močkus, V. Tiesis, and A. Žilinskas. The application of Bayesian methods for seeking the extremum. In Toward Global Optimization, volume 2, pages 117–128. Elsevier, 1978.
[23] Jan Peters and Stefan Schaal. Policy gradient methods for robotics. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), pages 2219–2225, 2006.
[24] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[25] John W. Roberts and Russ Tedrake. Signal-to-noise ratio analysis of policy gradient algorithms. In Advances in Neural Information Processing Systems 21 (NIPS), 2009.
[26] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
[27] Russ Tedrake, Teresa Weirui Zhang, and H. Sebastian Seung. Stochastic policy gradient reinforcement learning on a simple 3D biped. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), volume 3, pages 2849–2854, Sendai, Japan, September 2004.
[28] Matthew Tesch, Jeff Schneider, and Howie Choset. Using response surfaces and expected improvement to optimize snake robot gait parameters. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA, 2011.
[29] Evangelos Theodorou, Jonas Buchli, and Stefan Schaal. Reinforcement learning of motor skills in high dimensions: A path integral approach. In Proceedings of the IEEE International Conference on Robotics and Automation, Anchorage, Alaska, May 2010.
[30] Bart van den Broek, Wim Wiegerinck, and Bert Kappen. Risk sensitive path integral control. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), pages 615–622, 2010.
[31] Peter Whittle. Risk-sensitive linear/quadratic/Gaussian control. Advances in Applied Probability, 13:764–777, 1981.
[32] Peter Whittle. Risk-Sensitive Optimal Control. John Wiley & Sons, 1990.
[33] Aaron Wilson, Alan Fern, and Prasad Tadepalli. A behavior based kernel for policy search via Bayesian optimization. In Proceedings of the ICML 2011 Workshop: Planning and Acting with Uncertain Models, Bellevue, WA, 2011.


Time-Optimal Trajectory Generation for Path Following with Bounded Acceleration and Velocity Tobias Kunz and Mike Stilman School of Interactive Computing, Center for Robotics and Intelligent Machines Georgia Institute of Technology Atlanta, GA 30332 Email: [email protected], [email protected]

Abstract—This paper presents a novel method to generate the time-optimal trajectory that exactly follows a given differentiable joint-space path within given bounds on joint accelerations and velocities. We also present a path preprocessing method to make nondifferentiable paths differentiable by adding circular blends. We introduce improvements to existing work that make the algorithm more robust in the presence of numerical inaccuracies. Furthermore, we validate our methods on hundreds of randomly generated test cases on simulated and real 7-DOF robot arms. Finally, we provide open source software that implements our algorithms.

Fig. 1. Circular blend around waypoint q_i (labels in the figure: q_{i−1}, q_i, q_{i+1}, α_i, ℓ_i, r_i, c_i, x̂_i, ŷ_i).
I. INTRODUCTION

To deal with the complexity of planning a robot motion, the problem is often subdivided into two or more subproblems. The first subproblem is planning a geometric path through the environment that does not collide with obstacles. In an additional step, this path is converted into a time-parametrized trajectory that follows the path within the capabilities of the robot. Preferably, we are looking for a near-optimal trajectory according to some optimality criterion.

In practice the capabilities of the robot cannot be expressed exactly, so they are normally approximated using some model of the robot and limiting certain quantities. One option is to model the dynamics of the robot and limit the torques that can be applied by the joints. In this paper we use limits on joint accelerations and velocities, which are often available.

This paper presents a method to generate the time-optimal trajectory along a given path within given bounds on accelerations and velocities. The path can be given in any arbitrary configuration space. However, we assume that the acceleration and velocity of individual coordinates are limited. Thus, the configuration coordinates need to be chosen such that they match the quantities that need to be limited. For a robot manipulator the coordinates are typically chosen to match the joints of the robot; we will therefore refer to the limits as joint acceleration and joint velocity limits.

We assume that the path is differentiable with respect to some path parameter s. This is a weak assumption: if the path were not differentiable at a point, the robot would have to come to a complete stop at that point in order to follow the path. The path could then be split at that point and the method presented here applied to each part. We also assume that the path curvature is piecewise-continuous and that for every piece the path is coplanar.

The output of a typical geometric path planner is a path in configuration space consisting of continuous straight-line segments between waypoints. Such a path is not differentiable at the waypoints. Thus, we also present a preprocessing step that makes such a path differentiable by adding circular blends.

II. RELATED WORK

One approach described in standard textbooks [1, 2] for generating a trajectory that satisfies acceleration and velocity constraints from a list of waypoints is to use linear segments with parabolic blends. However, this approach is not directly applicable to automatically generated paths with potentially dense waypoints: it assumes that the timing between the waypoints, and thus the velocities of the linear segments, is already known, whereas a path normally does not include timing for the waypoints. Choosing the timing is not trivial. Generally we prefer to move fast, but if we move too fast, neighboring blend segments might overlap and render the trajectory invalid. [3] presents a very simple method for choosing the timing, which tries to resolve overlaps of blend segments locally by slowing down neighboring linear segments until no blend segments overlap. Yet, this method can lead to very slow trajectories, especially if the waypoints are close together.

The general approach used in this paper was introduced by Bobrow et al. [4] and by Shin and McKay [5]. We build on the latter. They propose to convert all joint constraints into constraints on the acceleration and velocity along the


path, which reduces the problem to one dimension, and then to search the two-dimensional phase plane for the optimal trajectory. They consider not only acceleration constraints but use a more general model: they are able to limit torque in the presence of robot dynamics, or even to limit voltage in the presence of motor dynamics. Partially due to this generality, their derivation is more complex than ours; we are therefore able to more thoroughly consider all possible special cases. [5] falsely claims that at every point along the limit curve there is only a single feasible acceleration. It also does not consider limits on joint velocities. Finally, Shin and McKay search for switching points solely numerically, which is less efficient and makes the algorithm more likely to fail, especially in the presence of discontinuities in the path curvature.

The following papers have built on top of this general approach; all of them assume torque constraints. A couple of papers analyse conditions for switching points in order to calculate them explicitly instead of searching for them numerically. [6] notes that points where the limit curve is nondifferentiable are potential switching points and gives a necessary condition for identifying them. It also states that there is more than one valid acceleration at these switching points. [7] lists all possible cases for switching points and gives necessary conditions for some of them that can be used to explicitly calculate switching point candidates. However, they do not give sufficient conditions for switching points, and one of the cases still requires a numerical search. We will show that the case that requires a numerical search in [7] cannot happen in our case, with acceleration limits instead of torque limits and a piecewise-planar path. We only use a numerical search for switching points caused by joint velocity limits.
[8] deals with those switching points along the limit curve where the acceleration is not uniquely determined. However, we will show that the acceleration they claim to be optimal at such a point is incorrect. Finally, [9] considers joint velocity constraints in addition to torque constraints. None of the previous work deals with numerical issues or reports how robustly their algorithm performs.

III. CONTRIBUTION

Our algorithms build on existing work. The novel contributions of this paper are the following:
• Combining path-following with a preprocessing step that converts the output of typical path planners to a differentiable path.
• A more thorough derivation of the constraints in the phase plane than [5]. At the same time our derivation is simpler, since we only cover a special case of [5].
• We show that for the case of constraints on only acceleration and piecewise-planar paths, the limit curve is never continuous and differentiable at a switching point. Thus, we can calculate all switching points along the acceleration limit curve explicitly. Previous work relied at least partially on numerical search, but also handled the more general case of torque constraints.
• [8] applies an incorrect optimal acceleration at switching points where the limit curve is non-differentiable. We demonstrate this and provide an alternative solution.
• We provide sufficient conditions for switching points instead of just necessary ones.
• An improvement to the algorithm that makes it more robust in the presence of numerical inaccuracies.
• Our algorithm is demonstrated to be sufficiently robust to follow over 100 randomly generated paths without failing.
• We provide open-source software, available for download at http://www.golems.org/node/1570.

IV. PATH PREPROCESSING

Common geometric path planners like PRMs or RRTs usually output the resulting path as a list of waypoints, which are connected by straight lines in configuration space. At a waypoint the path changes its direction instantaneously and thus is not differentiable. In order to follow such a path exactly, the robot would have to come to a complete stop at every waypoint, which would make the robot motion very slow and unnatural-looking. Therefore, we add circular blends around the waypoints, which make the path differentiable. If the path is already differentiable, the preprocessing described in this section can be omitted.

We are looking for a circular segment that starts tangential to the linear path segment before the waypoint and ends tangential to the linear path segment after the waypoint. We also need to ensure that the circular segment does not replace more than half of each of the neighboring linear segments; otherwise we might not get a continuous path. In addition, we allow the enforcement of a maximum deviation from the original path.

First, we define some quantities that are helpful for defining the circle (see also Figure 1). The unit vector ŷ_i pointing from waypoint q_{i−1} to q_i is given by

    ŷ_i = (q_i − q_{i−1}) / ||q_i − q_{i−1}||    (1)

The angle α_i between the two adjoining path segments at waypoint q_i is given by

    α_i = arccos( ŷ_i · ŷ_{i+1} )    (2)

The distance ℓ_i between waypoint q_i and the points where the circle touches the linear segments is given by

    ℓ_i = min( ||q_i − q_{i−1}|| / 2,  ||q_{i+1} − q_i|| / 2,  δ sin(α_i/2) / (1 − cos(α_i/2)) )    (3)

where the first two elements give the maximum possible distances such that the circular segment does not replace more than half of the adjoining linear segments, and the last element limits the radius to make sure the circular segment stays within a distance δ of waypoint q_i.

Given the quantities above, we can now define the circular segment. The circle is defined by its center c_i, its radius r_i, and two orthonormal vectors x̂_i and ŷ_i spanning the plane in which the circle lies. x̂_i points from the center of the circle to the point where the circle touches the preceding linear path segment; ŷ_i is the previously defined direction of the preceding linear segment.

    r_i = ℓ_i / tan(α_i/2)    (4)

    c_i = q_i + ( r_i / cos(α_i/2) ) · ( ŷ_{i+1} − ŷ_i ) / ||ŷ_{i+1} − ŷ_i||    (5)

    x̂_i = ( q_i − ℓ_i ŷ_i − c_i ) / ||q_i − ℓ_i ŷ_i − c_i||    (6)

Given these quantities specifying the circle, we can calculate the robot configuration q for any point on the circular segment as a function f(s) of the arc length s traveled from the start of the path. As we are currently only considering the current circular path segment, we assume s_i ≤ s ≤ s_i + α_i r_i, where s_i specifies the start of the circular segment. Similarly, we can calculate the first and second derivatives of the function f(s). Note that these are not time derivatives but derivatives by s. We will make use of these derivatives later.

    q = f(s) = c_i + r_i ( x̂_i cos((s − s_i)/r_i) + ŷ_i sin((s − s_i)/r_i) )    (7)

    f′(s) = −x̂_i sin((s − s_i)/r_i) + ŷ_i cos((s − s_i)/r_i)    (8)

    f″(s) = −(1/r_i) ( x̂_i cos((s − s_i)/r_i) + ŷ_i sin((s − s_i)/r_i) )    (9)

V. REDUCTION TO ONE DIMENSION

The approach we are using was originally proposed in [5], which finds a minimum-time trajectory that satisfies torque limits on the joints. In contrast, we assume acceleration and velocity limits on the joints. Acceleration limits are a special case of torque limits, so using acceleration instead of torque limits results in a simpler problem and a simpler derivation of the following equations than in [5]. Unlike [5] and like [9], we also consider velocity limits on the joints.

Because the solution is constrained to follow a given path exactly, the problem can be reduced to one dimension: choosing the velocity ṡ = ds/dt for every position s along the path. A path of length s_f is given as a function f : [0, s_f] → Rⁿ. The configuration q at a point s along the path is given by

    q = f(s),    0 ≤ s ≤ s_f    (10)

where s can be an arbitrary parameter; we will assume it is the arc length traveled since the start of the path. We can also express the joint velocities and accelerations with respect to the parameter s:

    q̇ = (d/dt) f(s) = f′(s) (ds/dt) = f′(s) ṡ    (11)

    q̈ = f′(s) s̈ + f″(s) ṡ²    (12)

If s is the arc length and ṡ and s̈ are the velocity and acceleration along the path, then f′(s) is the unit vector tangent to the path and f″(s) is the curvature vector [8].

The constraints in the high-dimensional joint space need to be converted into constraints on the scalar path velocity ṡ(s) and path acceleration s̈(s, ṡ). The constraints on joint accelerations result in constraints on the acceleration and velocity along the path, as shown in Section V-A. The constraints on joint velocities result in constraints on the velocity along the path, as presented in Section V-B.

A. Joint Acceleration Limits

We have constraints on the joint accelerations given as

    −q̈_i^max ≤ q̈_i ≤ q̈_i^max    (13)

where q̈_i is the ith component of the vector q̈. Although the universal quantifier is omitted in the following, all of the following inequalities have to hold for all i ∈ [1, ..., n]. Plugging Eq. 12 into Eq. 13 yields

    (13) ⇔ −q̈_i^max ≤ f′_i(s) s̈ + f″_i(s) ṡ² ≤ q̈_i^max    (14)

If f′_i(s) > 0:

    (14) ⇔ −q̈_i^max / f′_i(s) − f″_i(s) ṡ² / f′_i(s) ≤ s̈ ≤ q̈_i^max / f′_i(s) − f″_i(s) ṡ² / f′_i(s)    (15)
         ⇔ −q̈_i^max / |f′_i(s)| − f″_i(s) ṡ² / f′_i(s) ≤ s̈ ≤ q̈_i^max / |f′_i(s)| − f″_i(s) ṡ² / f′_i(s)    (16)

If f′_i(s) < 0:

    (14) ⇔ −q̈_i^max / f′_i(s) − f″_i(s) ṡ² / f′_i(s) ≥ s̈ ≥ q̈_i^max / f′_i(s) − f″_i(s) ṡ² / f′_i(s)    (17)
         ⇔ q̈_i^max / |f′_i(s)| − f″_i(s) ṡ² / f′_i(s) ≥ s̈ ≥ −q̈_i^max / |f′_i(s)| − f″_i(s) ṡ² / f′_i(s)    (18)

If f′_i(s) = 0 and f″_i(s) ≠ 0:

    (14) ⇔ −q̈_i^max / |f″_i(s)| ≤ ṡ² ≤ q̈_i^max / |f″_i(s)|    (19)
         ⇔ ṡ ≤ sqrt( q̈_i^max / |f″_i(s)| )    (20)

If f′_i(s) = 0 and f″_i(s) = 0, Eq. 14 is always satisfied.

Eq. 16 and Eq. 18 are equivalent. Thus, the limits on the path acceleration s̈ are

    s̈^min(s, ṡ) ≤ s̈ ≤ s̈^max(s, ṡ)    (21)

with

    s̈^min(s, ṡ) = max over i ∈ [1, ..., n], f′_i(s) ≠ 0 of ( −q̈_i^max / |f′_i(s)| − f″_i(s) ṡ² / f′_i(s) )    (22)

    s̈^max(s, ṡ) = min over i ∈ [1, ..., n], f′_i(s) ≠ 0 of ( q̈_i^max / |f′_i(s)| − f″_i(s) ṡ² / f′_i(s) )    (23)

The limit on the path acceleration also constrains the path velocity, because in order for a path velocity to be feasible we need s̈^min(s, ṡ) ≤ s̈^max(s, ṡ). We now derive the path velocity limit ṡ_acc^max(s) caused by the acceleration constraints. We get rid of the min and max functions in Eq. 22 and 23 by requiring the inequality to hold for all possible combinations of arguments to the min and max functions.

    s̈^min(s, ṡ) ≤ s̈^max(s, ṡ)    (24)

    ⇔ ( q̈_i^max / |f′_i(s)| − f″_i(s) ṡ² / f′_i(s) ) − ( −q̈_j^max / |f′_j(s)| − f″_j(s) ṡ² / f′_j(s) ) ≥ 0
      ∀i, j ∈ [1, ..., n], f′_i(s) ≠ 0, f′_j(s) ≠ 0    (25)

    ⇔ −( f″_i(s)/f′_i(s) − f″_j(s)/f′_j(s) ) ṡ² + ( q̈_i^max / |f′_i(s)| + q̈_j^max / |f′_j(s)| ) ≥ 0
      ∀i, j ∈ [1, ..., n], f′_i(s) ≠ 0, f′_j(s) ≠ 0    (26)

    ⇔ −| f″_i(s)/f′_i(s) − f″_j(s)/f′_j(s) | ṡ² + ( q̈_i^max / |f′_i(s)| + q̈_j^max / |f′_j(s)| ) ≥ 0
      ∀i ∈ [1, ..., n], j ∈ [i+1, ..., n], f′_i(s) ≠ 0, f′_j(s) ≠ 0    (27)

This gives a set of downward-facing parabolas in ṡ, horizontally centered around the origin. Each parabola is positive within an interval around 0, which is the interval the feasible velocities may lie in. The positive bound of the interval can be found by setting the parabola equation to zero:

    −| f″_i(s)/f′_i(s) − f″_j(s)/f′_j(s) | ṡ² + ( q̈_i^max / |f′_i(s)| + q̈_j^max / |f′_j(s)| ) = 0    (28)

    ⇔ ṡ = sqrt( ( q̈_i^max / |f′_i(s)| + q̈_j^max / |f′_j(s)| ) / | f″_i(s)/f′_i(s) − f″_j(s)/f′_j(s) | )    (29)

The interval of feasible path velocities ṡ is defined by the intersection of the feasible intervals of all parabolas. Thus, the upper bound for the path velocity is the minimum of all upper bounds given in Eq. 29. Combining this with the case from Eq. 20, the constraint on the path velocity caused by joint acceleration limits is given by

    ṡ ≤ ṡ_acc^max(s)    (30)

with

    ṡ_acc^max(s) = min{
        min over i ∈ [1, ..., n], j ∈ [i+1, ..., n] with f′_i(s) ≠ 0, f′_j(s) ≠ 0, f″_i(s)/f′_i(s) − f″_j(s)/f′_j(s) ≠ 0 of
            sqrt( ( q̈_i^max / |f′_i(s)| + q̈_j^max / |f′_j(s)| ) / | f″_i(s)/f′_i(s) − f″_j(s)/f′_j(s) | ),
        min over i ∈ [1, ..., n] with f′_i(s) = 0, f″_i(s) ≠ 0 of
            sqrt( q̈_i^max / |f″_i(s)| )
    }    (31)

B. Joint Velocity Limits

Constraints on the joint velocities are given as

    −q̇_i^max ≤ q̇_i ≤ q̇_i^max    ∀i ∈ [1, ..., n]    (32)

Plugging Eq. 11 into Eq. 32 yields

    −q̇_i^max ≤ f′_i(s) ṡ ≤ q̇_i^max    ∀i ∈ [1, ..., n]    (33)

If f′_i(s) = 0, then Eq. 33 is always satisfied. Otherwise, because ṡ > 0, Eq. 33 is equivalent to

    ṡ ≤ q̇_i^max / |f′_i(s)|    ∀i ∈ [1, ..., n]    (34)

Thus, the constraint on the path velocity caused by limits on the joint velocities is given by

    ṡ ≤ ṡ_vel^max(s)    (35)

with

    ṡ_vel^max(s) = min over i ∈ [1, ..., n], f′_i(s) ≠ 0 of ( q̇_i^max / |f′_i(s)| )    (36)

We will make use of the slope of this limit curve in the phase plane, which is its derivative by s:

    (d/ds) ṡ_vel^max(s) = −q̇_i^max f″_i(s) / ( f′_i(s) |f′_i(s)| ),    i = argmin over i ∈ [1, ..., n], f′_i(s) ≠ 0 of q̇_i^max / |f′_i(s)|    (37)
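To make the reduction concrete, the pointwise limit curves of Eqs. 31 and 36 can be evaluated directly from f′(s), f″(s) and the joint limits. The following is a minimal sketch; the function name and interface are illustrative, not the authors' released implementation:

```python
import numpy as np

def max_path_velocity(f1, f2, qd_max, qdd_max, eps=1e-9):
    """Pointwise upper bounds on the path velocity sdot at one path point.
    f1, f2: path derivatives f'(s) and f''(s) (length-n arrays);
    qd_max, qdd_max: joint velocity and acceleration limits.
    Returns (velocity-limit bound, Eq. 36; acceleration-limit bound, Eq. 31)."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    n = len(f1)
    # Eq. 36: sdot <= min_i qd_max_i / |f1_i| over joints with f1_i != 0.
    v = np.inf
    for i in range(n):
        if abs(f1[i]) > eps:
            v = min(v, qd_max[i] / abs(f1[i]))
    # Eq. 31, first part: pairwise parabola bounds from Eq. 29.
    a = np.inf
    for i in range(n):
        for j in range(i + 1, n):
            if abs(f1[i]) > eps and abs(f1[j]) > eps:
                denom = abs(f2[i] / f1[i] - f2[j] / f1[j])
                if denom > eps:
                    num = qdd_max[i] / abs(f1[i]) + qdd_max[j] / abs(f1[j])
                    a = min(a, np.sqrt(num / denom))
    # Eq. 31, second part (Eq. 20): joints with zero tangent but curvature.
    for i in range(n):
        if abs(f1[i]) <= eps and abs(f2[i]) > eps:
            a = min(a, np.sqrt(qdd_max[i] / abs(f2[i])))
    return v, a
```

Sampling these two bounds along s traces out the velocity and acceleration limit curves that the phase-plane algorithm of the next section integrates against.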

VI. ALGORITHM

Figure 2 shows the s–ṡ phase plane. The start point is in the bottom-left corner and the end point in the bottom-right corner. The two limit curves on the path velocity, caused by the joint acceleration and joint velocity constraints respectively, are shown. The trajectory must stay below both limit curves. We want to find a trajectory that maximizes the path velocity ṡ at every point along the path. While not on the limit curve, the path acceleration must be at one of its limits, i.e. s̈^min or s̈^max. Thus, at every point of the phase plane below the limit curve there are two possible directions to move in: one applying minimum acceleration and one applying maximum acceleration. The algorithm needs to find the switching points between minimum and maximum acceleration. In Figure 2, switching points are marked by arrows. Switching points from minimum to maximum acceleration must be on the limit curve, because otherwise we could find a faster trajectory above the solution trajectory [5].

The high-level algorithm is described below. It differs slightly from [5]: we integrate backward from the end point as the very last step. This makes the algorithm slightly simpler to implement, because while integrating forward we can never intersect another trajectory part, and while integrating backward we can never reach the limit curve. Section VIII gives some more implementation details. The numbers in Figure 2 correspond to the steps of the algorithm.

1) Start from the start of the path, i.e. s = 0 and ṡ = 0.
2) Integrate forward with maximum acceleration s̈ = s̈^max(s, ṡ) until one of the following conditions is met:
   • If s ≥ s_f, continue from the end of the path, i.e. s = s_f and ṡ = 0, and go to step 5.
   • If ṡ > ṡ_acc^max(s), go to step 4.
   • If ṡ > ṡ_vel^max(s), go to step 3.


Fig. 2. Phase-plane trajectory, showing the acceleration limit curve, the velocity limit curve, and the time-optimal trajectory; the numbered labels correspond to the steps of the algorithm.

3) Follow the limit curve ṡ_vel^max(s) until one of the following conditions is met:
   • If s̈^max(s, ṡ_vel^max(s)) / ṡ < (d/ds) ṡ_vel^max(s), go back to step 2.
   • If s̈^min(s, ṡ_vel^max(s)) / ṡ > (d/ds) ṡ_vel^max(s), go to step 4.
4) Search along the combined limit curve for the next switching point (see Section VII). Continue from the switching point and go to step 5.
5) Integrate backward with minimum acceleration until the start trajectory is hit. (If the limit curve is hit instead, the algorithm has failed; we used this condition to detect the failures shown in Table I.) The point where the start trajectory is intersected is a switching point. Replace the part of the start trajectory after that switching point with the trajectory just generated.
   • If we transitioned into this step from step 2, halt: the start trajectory has reached the end of the path with s = s_f and ṡ = 0.
   • Otherwise, continue from the end of the start trajectory, which is the switching point found in step 4, and go to step 2.

VII. SWITCHING POINTS

With the exception of a finite number of points, at every point on the acceleration limit curve there is only a single feasible acceleration. If this acceleration leads into the feasible region, the point on the limit curve is called a trajectory source [6]. If this acceleration leads out of the feasible region, the point is called a trajectory sink. We define trajectory sources and sinks similarly for the velocity limit curve, according to whether all feasible accelerations lead into or out of the feasible region. If the interval of feasible accelerations allows following the velocity limit curve, we call the point singular. A switching point along a limit curve is a point where the limit curve switches from being a trajectory sink to being a trajectory source or to being singular. The limit curve is the curve given by the minimum of the acceleration limit curve and the velocity limit curve. A switching point of the acceleration or velocity limit curve is a switching point of the limit curve if the curve the switching point lies on is the lower one. The following two sections deal with finding the switching points on both the acceleration and the velocity limit curves.

A. Caused by Acceleration Constraints

This section describes how to find switching points along the path velocity limit curve ṡ_acc^max(s) caused by constraints on the joint accelerations. We distinguish three cases of these switching points, depending on whether the curve is continuous and/or differentiable at the switching point.

1) Discontinuous: ṡ_acc^max(s) is discontinuous if and only if the path curvature f″(s) is discontinuous [7]. For a path generated as described in Section IV, the discontinuities are exactly the points where the path switches between a circular segment and a straight line segment. If the path's curvature f″(s) is continuous everywhere, this case of switching points does not happen.

At a discontinuity s there are two path velocity limits, ṡ_acc^max(s⁻) and ṡ_acc^max(s⁺). The switching point is always at the smaller one of the two. If the discontinuity is a positive step and there is a trajectory sink in the negative direction of the discontinuity, then the discontinuity is a switching point. Equally, if the discontinuity is a negative step and there is a trajectory source in the positive direction of the discontinuity, then the discontinuity is a switching point. More formally, a discontinuity of ṡ_acc^max(s) is a switching point if and only if

    ( ṡ_acc^max(s⁻) < ṡ_acc^max(s⁺) ∧ s̈^max(s⁻, ṡ_acc^max(s⁻)) / ṡ_acc^max(s⁻) ≥ (d/ds) ṡ_acc^max(s⁻) )
    ∨ ( ṡ_acc^max(s⁻) > ṡ_acc^max(s⁺) ∧ s̈^max(s⁺, ṡ_acc^max(s⁺)) / ṡ_acc^max(s⁺) ≤ (d/ds) ṡ_acc^max(s⁺) )    (38)

2) Continuous and Nondifferentiable: The original paper [5] introducing the approach claims that every point along


Fig. 3. Minimum and maximum acceleration trajectories near a switching point where the limit curve is nondifferentiable.

the limit curve only has a single allowable acceleration with ¨max (s, s˙ max s¨min (s, s˙ max acc (s)) = s acc (s)). However, this is inaccurate. As [6] notes, there are points along the limit curve where there is an interval of allowable accelerations, i. e. ¨max (s, s˙ max s¨min (s, s˙ max acc (s)) < s acc (s)). These points are called critical points in [8] and zero-inertia points in [7]. At these points the limit curve is continuous but nondifferentiable [8]. As noted in [6], a neccessary condition for such switching points is ∃i : fi (s) = 0

(39)

For a path generated from waypoints as described in Section IV, points satisfying Eq. 39 can be easily calculated. There are at most n per circular segment. [6] does not note that choosing the correct acceleration at such a switching point might be an issue. [6] proposes to integrate backward from such a switching point with minimum acceleration and integrate forward with maximum acceleration as usual. [8] notices that the acceleration at such a switching point needs special consideration. [8] notes that the maximum or minimum acceleration cannot always be followed, because they would lead into the infeasible region of the phase plane. [8] calls such a point a singular point and proposes to follow the limit curve tangentially. However, both [6] and [8] are in error. Figure 3 shows a switching point at a point where the limit curve is nondifferentiable. At the switching point two arrows indicate the direction of motion in the phase plane when applying minimum or maximum acceleration, respectively. In the feasible region of the phase plane gray curves visualize two vector fields. They show minimum- and maximumacceleration trajectories. Minimum-acceleration trajectories are dashed, maximum-acceleration trajectories are solid. In the case shown here, if integrating backwards with minimum acceleration, the infeasible region would be entered. Thus, according to [8] we would have to follow the red limit curve tangentially backwards from the switching point. For integrating forward [6] and [8] propose to follow the upward arrow. However, both of these motions are not possible, as they would not move along the vector field sourrounding the switching point. Zero is the only acceleration at the switching point that conforms with the vector field around it, resulting in a trajecory moving horizontally in the phase plane. We claim that the optimal acceleration is zero at every such

Fig. 4. Joint acceleration space for n=3 with box constraints and a trajectory for a switching point that meets the limit curve tangentially.

switching point. We leave the proof for future work, but all of our experimental data supports this claim. In order for the zero acceleration to be feasible, the limit curve must switch from a negative to a positive slope at the switching point. We claim that the limit curve switching from a negative to a positive slope is a necessary and sufficient condition for a switching point at a point of nondifferentiability of the limit curve. Together with the necessary condition stated in Eq. 39, this allows for an explicit enumeration of all such switching points. 3) Continuous and Differentiable: According to the following lemma this case does not exist. Lemma 1: If only joint accelerations are constrained, the path is piece-wise coplanar, and the curvature $f''(s)$ is discontinuous at the points where the pieces are stitched together, then there are no switching points at which the limit curve is continuous and differentiable. Proof: Before the switching point the path acceleration is at the lower limit. This implies that at least one joint is at its acceleration limit. After the switching point the path acceleration is at its upper limit. This implies that at least one joint distinct from the previous one is at its acceleration limit. At the switching point both joints are at their acceleration limits. The limit curve, the phase-plane trajectory and the joint-space trajectory are continuous and differentiable near the switching point. One joint approaches its acceleration limit at the switching point and then stays at the limit. Another, distinct joint is at its acceleration limit and leaves the limit at the switching point. Figure 4 shows the joint acceleration space for a 3-joint system with the joint acceleration limits shown as a box. At the switching point two joints are at their acceleration limits. Thus, the trajectory is on an edge of the box at the switching point.
Before the switching point the trajectory stays on one surface of the box and approaches the edge tangentially. After the switching point the trajectory leaves the edge tangentially while staying on another surface of the constraint box. This trajectory lives in at least a three-dimensional subspace of the joint acceleration space; there is no two-dimensional subspace that contains the trajectory in the vicinity of the switching point. Because the path is piecewise planar, the accelerations in joint space are also piecewise planar, i.e. contained in a two-dimensional subspace. Hence, a switching point where the limit curve is differentiable does not exist. The only case where such a switching point can exist


is where two planar parts of the path are stitched together. However, these points are those where the curvature $f''(s)$ is discontinuous and thus the limit curve is discontinuous.

B. Caused by Velocity Constraints

This section describes how to find switching points along the path velocity limit curve $\dot{s}^{\max}_{vel}(s)$ caused by constraints on the joint velocities. We distinguish two cases of these switching points, depending on whether $\ddot{s}^{\min}(s, \dot{s}^{\max}_{vel}(s))$ is continuous.

1) Continuous: $s$ is a possible switching point if and only if

$$\ddot{s}^{\min}(s, \dot{s}^{\max}_{vel}(s)) = \frac{d}{ds}\dot{s}^{\max}_{vel}(s) \cdot \dot{s}^{\max}_{vel}(s) \tag{40}$$

We search for these switching points numerically by stepping along the limit curve until we detect a sign change. Then we use bisection to determine the switching point more accurately. See also Section VIII-B.

2) Discontinuous: $f_i'(s)$ being discontinuous is a necessary condition for $\ddot{s}^{\min}(s, \dot{s}^{\max}_{vel}(s))$ being discontinuous. $s$ is a possible switching point if and only if

$$\left(\ddot{s}^{\min}(s^-, \dot{s}^{\max}_{vel}(s^-)) \geq \frac{d}{ds}\dot{s}^{\max}_{vel}(s^-) \cdot \dot{s}^{\max}_{vel}(s^-)\right) \tag{41}$$

$$\wedge \left(\ddot{s}^{\min}(s^+, \dot{s}^{\max}_{vel}(s^+)) \leq \frac{d}{ds}\dot{s}^{\max}_{vel}(s^+) \cdot \dot{s}^{\max}_{vel}(s^+)\right) \tag{42}$$

VIII. NUMERICAL CONSIDERATIONS

When doing floating-point calculations we must accept approximate results. However, we cannot accept the algorithm failing completely due to numerical inaccuracies. We now describe how numerical inaccuracies can make the algorithm described in Section VI fail, and the measures we have taken to avoid that. None of the previous papers on this approach [5-9] have dealt with numerical issues.

A. Integration

The algorithm described in Section VI assumes that we can integrate the trajectory exactly. Under this assumption it is impossible to hit the limit curve at a trajectory source when integrating forward. However, as we have to do the integration numerically with some non-zero step size, it is not exact. This can lead to the limit curve being hit at a trajectory source. Figure 5 shows an example of the trajectory hitting the limit curve at a trajectory source. The figure shows the limit curve together with an accurate trajectory generated using a small step size and a less accurate trajectory generated with a 10 times larger step size (1 ms). The trajectories enter the figure on the left at minimum acceleration until they touch the limit curve. Then they switch to maximum acceleration. Subsequently, the accurate trajectory gets close to the limit curve but does not hit it. The inaccurate trajectory, however, because of the larger step size, misses the bend shortly before the limit curve and hits the limit curve. According to the algorithm described in Section VI and the algorithms described in previous work [5, 6, 8, 9], we would have to stop the forward integration, search for the next switching point along the limit curve and integrate backward from there until we hit the forward trajectory. However, when integrating backward

Fig. 5. Dealing with integration inaccuracies: (a) failure without source/sink check; (b) success with source/sink check. Each plot shows the limit curve, an accurate trajectory and an inaccurate trajectory.

from the next switching point we might hit the limit curve instead of the forward trajectory. This is because the algorithm relies on the fact that there are no trajectory sources between the point where the forward trajectory hits the limit curve and the next switching point, which is not the case if the forward trajectory hits the limit curve at a trajectory source. To deal with this issue, we determine the intersection point with the limit curve by bisection and then check whether this point is a trajectory source by comparing the slope of the limit curve with the slope of the maximum-acceleration trajectory. Figure 5 shows the result of this measure. The inaccurate trajectory does not follow the accurate one exactly, but it does not stop when the limit curve is hit. Thus the algorithm succeeds. Note that although the smaller step size solved the problem in the example shown in Figure 5, it does not do so in general. In general, a smaller step size or a better integration method only makes it less likely that the limit curve is hit at a trajectory source. Thus, the method described above is necessary to ensure completeness of the algorithm.

B. Switching Point Search

All previous work at least partially relies on numerical search to find switching points, but none notes the potential issues caused by using it. We also use numerical search, but only along the velocity limit curve. If we only had acceleration constraints, numerical search would not be necessary. We try to avoid numerical search as much as possible, because it


is slower, less accurate and might make the algorithm fail. Our algorithm relies on the fact that we can find the next switching point along the limit curve. However, by searching numerically and stepping along the limit curve, we could theoretically miss a switching point and find one that is not the next one. The algorithm could potentially be adapted to work with any switching point without requiring it to be the next one. However, during our experiments we did not encounter a failure caused by missing a switching point.

IX. EXPERIMENTAL RESULTS

We evaluated our approach by generating trajectories for a 7-DOF robot arm that is given the task of picking and placing a bottle from or onto a shelf or table. The results are averages over 100 pick-and-place operations. For each pick-and-place operation we randomly selected a pick and a place location. Figure 6 shows the 200 pick and place locations. We used a bidirectional RRT to plan a configuration-space path for the arm. Each pick-and-place operation consists of 3 path segments: from the initial arm configuration to the pick configuration, to the place configuration, and back to the initial configuration. In our evaluation we ignore the finger motion necessary to grasp the object. The robot shown in Figure 6 uses only its left arm. The arm configuration shown is the initial arm configuration, which is reached at the beginning and end of every pick-and-place operation. The paths generated by the RRT planner are shortened using random shortcutting. The result is a set of paths given by waypoints at most 0.1 apart. Here we evaluate the trajectory generation from these waypoints. This method of generating example paths is close to real-world usage and does not allow us to handcraft waypoints until the trajectory generation accidentally succeeds.

Fig. 6. 200 start and goal locations.

TABLE I
RESULTS FOR VARYING TIME STEP

Time Step | Computation Time | Execution Time | Failure rate without source/sink check | Failure rate with source/sink check
10 ms     | 0.2 s            | 4.74 s         | 13 %                                   | 0 %
1 ms      | 0.9 s            | 4.71 s         | 3 %                                    | 0 %
0.1 ms    | 31.8 s           | 4.70 s         | 0 %                                    | 0 %

Table I shows computation and execution times for different time steps used for the integration of the trajectory. The computation time was achieved on a single core of an Intel Core i5 at 2.67 GHz. Smaller time steps lead to a longer computation time and a trajectory slightly closer to the optimum. What is not shown here is that a smaller step size also leads to the constraints being satisfied more accurately. Related to that is the fact that a larger time step makes the algorithm more likely to fail if the countermeasures for numerical inaccuracies described in Section VIII-A are omitted. Our algorithm is robust enough to successfully generate trajectories for 100 random pick-and-place operations using different time steps. We also ran our algorithm on a real robot arm following a path defined by hand-crafted waypoints. A video of this and a few example pick-and-place operations is available at http://www.golems.org/node/1570.

X. CONCLUSION

We presented an algorithm to follow a path in a time-optimal manner while satisfying joint acceleration and velocity constraints. We added blends to a list of waypoints in order to be able to use the output of standard path planners. Our improvements to existing work make the algorithm more robust and less likely to fail. We demonstrated the robustness of our implementation by following paths planned by an RRT planner for 100 random pick-and-place operations.

ACKNOWLEDGEMENTS

This work is supported by Toyota Motor Engineering & Manufacturing North America (TEMA). We appreciate its great contribution to our robotics research and education.

REFERENCES

[1] J.J. Craig. Introduction to Robotics: Mechanics and Control (3rd Edition). Prentice Hall, 2004.
[2] B. Siciliano, L. Sciavicco, and L. Villani. Robotics: Modelling, Planning and Control. Springer, 2009.
[3] T. Kunz and M. Stilman. Turning Paths Into Trajectories Using Parabolic Blends. Technical Report GT-GOLEM-2011-006, Georgia Institute of Technology, 2011.
[4] J.E. Bobrow, S. Dubowsky, and J.S. Gibson. Time-optimal control of robotic manipulators along specified paths. The Int. Journal of Robotics Research, 4(3):3-17, 1985.
[5] K. Shin and N. McKay. Minimum-time control of robotic manipulators with geometric path constraints. IEEE Transactions on Automatic Control, 30(6):531-541, 1985.
[6] F. Pfeiffer and R. Johanni. A concept for manipulator trajectory planning. IEEE Journal of Robotics and Automation, 3(2):115-123, 1987.
[7] J.-J.E. Slotine and H.S. Yang. Improving the efficiency of time-optimal path-following algorithms. IEEE Transactions on Robotics and Automation, 5(1):118-124, 1989.
[8] Z. Shiller and H.H. Lu. Computation of path constrained time optimal motions with dynamic singularities. Journal of Dynamic Systems, Measurement, and Control, 114:34, 1992.
[9] L. Zlajpah. On time optimal path control of manipulators with bounded joint velocities and torques. In Proc. of IEEE Int. Conf. on Robotics and Automation, 1996.


Towards A Swarm of Agile Micro Quadrotors Alex Kushleyev, Daniel Mellinger, Vijay Kumar GRASP Lab, University of Pennsylvania

Abstract—We describe a prototype 73 gram, 21 cm diameter micro quadrotor with onboard attitude estimation and control that operates autonomously with an external localization system. We argue that the reduction in size leads to agility and the ability to operate in tight formations, and provide experimental arguments in support of this claim. The robot is shown to be capable of 1850◦/sec roll and pitch, performs a 360◦ flip in 0.4 seconds and exhibits a lateral step response of 1 body length in 1 second. We describe the architecture and algorithms to coordinate a team of quadrotors, organize them into groups and fly through known three-dimensional environments. We provide experimental results for a team of 20 micro quadrotors.

Fig. 1. A formation of 20 micro quadrotors in flight. See video at http://youtu.be/50Fdi7712KQ

I. INTRODUCTION

The last decade has seen rapid progress in micro aerial robots, autonomous aerial vehicles that are smaller than 1 meter in scale and 1 kg or less in mass. Winged aircraft range from fixed-wing vehicles [14] to flapping-wing vehicles [6], the latter mostly inspired by insect flight. Rotorcraft, including helicopters, coaxial rotorcraft [9], ducted fans [22], quadrotors [10] and hexarotors, have proved to be more mature [15], with quadrotors being the most commonly used aerial platform in robotics research labs. In this class, the Hummingbird quadrotor sold by Ascending Technologies, GmbH [2], with a tip-to-tip wingspan of 55 cm, a height of 8 cm, a mass of about 500 grams including a Lithium Polymer battery, and a power consumption of about 75 Watts, is a remarkably capable and robust platform as shown in [16, 17]. Of course micro aerial robots have a fundamental payload limitation that is difficult to overcome in many practical applications. However, larger payloads can be manipulated and transported by multiple UAVs either using grippers or cables [20]. Applications such as surveillance or search and rescue that require coverage of large areas or imagery from multiple sensors can be addressed by coordinating multiple UAVs, each with different sensors. Our interest in this paper is scaling down the quadrotor platform to develop a truly small micro UAV. The most important and obvious benefit of scaling down in size is the ability of the quadrotor to operate in tightly constrained environments in tight formations. While the payload capacity of the quadrotor falls dramatically, it is possible to deploy multiple quadrotors that cooperate to overcome this limitation. Again, the small size benefits us because smaller vehicles can operate in closer proximity than large vehicles. Another interesting benefit of scaling down is agility.
As argued later and illustrated with experimental results, smaller quadrotors exhibit higher accelerations allowing more rapid adaptation to disturbances and higher stability.

II. AGILITY OF MICRO QUADROTORS

It is useful to develop a simple physics model to analyze a quadrotor's ability to produce linear and angular accelerations from a hover state. If the characteristic length is $L$, the rotor radius $R$ scales linearly with $L$. The mass scales as $L^3$ and the moments of inertia as $L^5$. On the other hand, the lift or thrust, $F$, and drag, $D$, from the rotors scale with the cross-sectional area and the square of the blade-tip velocity, $v$. If the angular speed of the blades is defined by $\omega = \frac{v}{L}$, then $F \sim \omega^2 L^4$ and $D \sim \omega^2 L^4$. The linear acceleration $a$ scales as $a \sim \frac{\omega^2 L^4}{L^3} = \omega^2 L$. Thrusts from the rotors produce a moment with a moment arm $L$. Thus the angular acceleration $\alpha \sim \frac{\omega^2 L^5}{L^5} = \omega^2$. The rotor speed $\omega$ also scales with length, since smaller motors produce less torque, which limits their peak speed because of the drag resistance that scales the same way as lift. There are two commonly accepted approaches to study scaling in aerial vehicles [28]. Mach scaling is used for compressible flows and essentially assumes that the tip velocities are constant, leading to $\omega \sim \frac{1}{R}$. Froude scaling is used for incompressible flows and assumes that for similar aircraft configurations the Froude number, $\frac{v^2}{Lg}$, is constant. Here $g$ is the acceleration due to gravity. This yields $\omega \sim \frac{1}{\sqrt{R}}$. However, neither Froude nor Mach similitude takes motor characteristics or battery properties into account. While motor torque increases with length, the operating speed for the rotors is determined by matching the torque-speed characteristics of the motor to the drag-versus-speed characteristics of the propellers. Further, the motor torque depends on the ability of the battery to source the required current. All these variables are tightly coupled for smaller designs since there are fewer choices available at smaller length scales.
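The scaling argument can be condensed into a few lines of arithmetic. The sketch below (our own condensation, not code from the paper) combines $a \sim \omega^2 L$ and $\alpha \sim \omega^2$ with the two similitude assumptions to obtain the resulting length exponents:

```python
# Length-scaling exponents: a ~ omega^2 * L and alpha ~ omega^2, with
# omega ~ L**omega_exp. (Illustrative sketch of the argument above.)
def accel_exponents(omega_exp):
    a_exp = 2 * omega_exp + 1      # linear acceleration a ~ omega^2 L
    alpha_exp = 2 * omega_exp      # angular acceleration alpha ~ omega^2
    return a_exp, alpha_exp

mach = accel_exponents(-1.0)       # Mach scaling: omega ~ 1/L
froude = accel_exponents(-0.5)     # Froude scaling: omega ~ 1/sqrt(L)
print(mach)    # (-1.0, -2.0): a ~ 1/L, alpha ~ 1/L^2
print(froude)  # (0.0, -1.0): a independent of L, alpha ~ 1/L
```

Under either similitude the angular acceleration grows as the vehicle shrinks, which is the quantitative basis for the agility claim.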
Finally, the assumption that propeller blades are rigid may be wrong and the performance of the blades can be very different at smaller scales; the quadratic scaling of the lift with speed may


Fig. 2. A prototype micro quadrotor.

Fig. 3. Attitude controller performance data: (a) pitch angle step input response (pitch angle in degrees vs. time in seconds); (b) data for the flipping maneuver.

not be accurate. Nevertheless these two cases are meaningful since they provide some insight into the physics underlying the maneuverability of the craft. Froude scaling suggests that the acceleration $a$ is independent of length while the angular acceleration $\alpha \sim L^{-1}$. On the other hand, Mach scaling leads to the conclusion that $a \sim L^{-1}$ while $\alpha \sim L^{-2}$. Since quadrotors must rotate (exhibit angular accelerations) in order to translate, smaller quadrotors are much more agile. There are two design points that are illustrative of the quadrotor configuration. The Pelican quadrotor from Ascending Technologies [2] equipped with sensors (approx. 2 kg gross weight, 0.75 m diameter, and 5400 rpm nominal rotor speed at hover) consumes approximately 400 W of power [25]. The Hummingbird quadrotor from Ascending Technologies (500 grams gross weight, approximately 0.5 m diameter, and 5000 rpm nominal rotor speed at hover) without additional sensors consumes about 75 W. In this paper, we outline a design for a quadrotor which is approximately 40% of the size of the Hummingbird, has 15% of its mass, and consumes approximately 20% of the power for hovering.

III. THE MICRO QUADROTOR

A. The Vehicle

The prototype quadrotor is shown in Figure 2. Its booms are made of carbon fiber rods which are sandwiched between a custom motor controller board on the bottom and the main controller board on the top. To produce lift the vehicle uses four fixed-pitch propellers with diameters of 8 cm. The vehicle propeller-tip-to-propeller-tip distance is 21 cm and its weight without a battery is 50 grams. The hover time is approximately 11 minutes with a 2-cell 400 mAh Li-Po battery that weighs 23 grams.

B. Electronics

Despite its small size this vehicle contains a full suite of onboard sensors. An ARM Cortex-M3 processor, running at 72 MHz, serves as the main processor. The vehicle contains a 3-axis magnetometer, a 3-axis accelerometer, a 2-axis 2000 deg/sec rate gyro for the roll and pitch axes, and a single-axis 500 deg/sec rate gyro for the yaw axis. The vehicle also contains a barometer which can be used to sense a change in altitude. For communication the vehicle contains two Zigbee transceivers that can operate at either 900 MHz or 2.4 GHz.

C. Software Infrastructure

The work in this paper uses a Vicon motion capture system [5] to sense the position of each vehicle at 100 Hz. This data is streamed over a gigabit Ethernet network to a desktop base station. High-level control and planning is done in MATLAB on the base station, which sends commands to each quadrotor at 100 Hz. The software for controlling a large team of quadrotors is described later in Sec. V (see Fig. 7). Low-level estimation and control loops run on the onboard microprocessor at a rate of 600 Hz. Each quadrotor has two independent radio transceivers, operating at 900 MHz and 2.4 GHz. The base station sends, via custom radio modules, the desired commands, containing orientation, thrust, angular rates and attitude controller gains, to the individual quadrotors. The onboard rate gyros and accelerometer are used to estimate the orientation and angular velocity of the craft. The main microprocessor runs the attitude controller described in Sec. IV and sends the desired propeller speeds to each of the four motor controllers at the full rate (600 Hz).

D. Performance

Some performance data for the onboard attitude controller is shown in Fig. 3. The small moments of inertia of the vehicle enable it to create large angular accelerations. As shown in Fig. 3(a), the attitude control is designed to be approximately critically damped with a settling time of less than 0.2 seconds. Note that this is twice as fast as the settling time of the attitude controller for the AscTec Hummingbird reported in [18]. Data for a flip is presented in Fig. 3(b). Here the vehicle completes a full flip about its y axis in about 0.4 seconds and reaches a maximum angular velocity of 1850 deg/sec. The position controller described in Sec. IV uses the roll and pitch angles to control the x and y position of the vehicle. For this reason, a stiff attitude controller is required for stiff position control. Responses to step inputs in the lateral and vertical directions are shown in Fig. 4(b). For the hovering performance data shown in Fig. 4, the standard deviations of the error for x and y are about 0.75 cm and about 0.2 cm for z.
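The battery figures quoted in Sec. III-A allow a rough cross-check of the claimed power budget. In the sketch below, the 7.4 V nominal voltage of a 2-cell Li-Po is our assumption (not stated in the text); capacity and hover time are from the text:

```python
# Average hover power implied by the reported battery capacity and hover time.
# Assumes a nominal 7.4 V for the 2-cell Li-Po (our assumption).
battery_energy_wh = 7.4 * 0.400   # 400 mAh at ~7.4 V -> ~2.96 Wh
hover_time_h = 11.0 / 60.0        # ~11 minutes of hover
hover_power_w = battery_energy_wh / hover_time_h
print(round(hover_power_w, 1))    # ~16 W, in line with ~20% of the Hummingbird's 75 W
```

The result is consistent with the claim in Sec. II that the vehicle hovers on roughly 20% of the Hummingbird's 75 W.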


Fig. 4. Position controller performance: (a) position error — the three curves represent the x, y, and z errors while hovering; (b) step response for the position controller in x (top) and z (bottom).

Fig. 5. The reference frames and propeller numbering convention.

IV. DYNAMICS AND CONTROL

The dynamic model and control for the micro quadrotor is based on the approach in [17]. As shown in Figure 5, we consider a body-fixed frame B aligned with the principal axes of the quadrotor (unit vectors $b_i$) and an inertial frame A with unit vectors $a_i$. B is described in A by a position vector $r$ to the center of mass C and a rotation matrix $R$. In order to avoid the singularities associated with parameterization, we use the full rotation matrix to describe orientations. The angular velocity of the quadrotor in the body frame, $\omega$, is given by $\hat{\omega} = R^T \dot{R}$, where $\hat{\cdot}$ denotes the skew-symmetric matrix form of the vector. As shown in Fig. 5, the four rotors are numbered 1-4, with odd-numbered rotors having a pitch that is opposite to the even-numbered rotors. The angular speed of rotor $i$ is $\omega_i$. The resulting lift, $F_i$, and the reaction moment, $M_i$, are given by:

$$F_i = k_F \omega_i^2, \qquad M_i = k_M \omega_i^2,$$

where the constants $k_F$ and $k_M$ are empirically determined. For our micro quadrotor, the motor dynamics have a time constant of less than 10 msec and are much faster than the time scale of rigid body dynamics and aerodynamics. Thus we neglect the motor dynamics and assume $F_i$ and $M_i$ can be changed instantaneously. Therefore the control input to the system, $u$, consists of the net thrust in the $b_3$ direction, $u_1 = \sum_{i=1}^{4} F_i$, and the moments in B, $[u_2, u_3, u_4]^T$, given by:

$$u = \begin{bmatrix} k_F & k_F & k_F & k_F \\ 0 & k_F L & 0 & -k_F L \\ -k_F L & 0 & k_F L & 0 \\ -k_M & k_M & -k_M & k_M \end{bmatrix} \begin{bmatrix} \omega_1^2 \\ \omega_2^2 \\ \omega_3^2 \\ \omega_4^2 \end{bmatrix}, \tag{1}$$

where $L$ is the distance from the axis of rotation of the propellers to the center of the quadrotor. The Newton-Euler equations of motion are given by:

$$m\ddot{r} = -mg\,a_3 + u_1 b_3 \tag{2}$$

$$\dot{\omega} = I^{-1}\left(-\omega \times I\omega + \begin{bmatrix} u_2 \\ u_3 \\ u_4 \end{bmatrix}\right) \tag{3}$$

where $I$ is the moment of inertia matrix along $b_i$. We specify the desired trajectory using a time-parameterized position vector and yaw angle. Given a trajectory, $\sigma(t) : [0, t_f] \rightarrow \mathbb{R}^3 \times SO(2)$, the controller derives the input $u_1$ based on position and velocity errors:

$$u_1 = (-K_p e_p - K_v e_v + mg\,a_3) \cdot b_3 \tag{4}$$

where $e_p = r - r_T$ and $e_v = \dot{r} - \dot{r}_T$. The other three inputs are determined by computing the desired rotation matrix. First, we want to align the thrust vector $u_1 b_3$ with $(-K_p e_p - K_v e_v + mg\,a_3)$ in (4). Second, we want the yaw angle to follow the specified yaw $\psi_T(t)$. From these two pieces of information we can compute $R_{des}$ and the error in rotation according to:

$$e_R = \frac{1}{2}\left(R_{des}^T R - R^T R_{des}\right)^\vee$$

where $\vee$ represents the vee map which takes elements of $so(3)$ to $\mathbb{R}^3$. The desired angular velocity is computed by differentiating the expression for $R$, and the desired moments can be expressed as a function of the orientation error, $e_R$, and the angular velocity error, $e_\omega$:

$$[u_2, u_3, u_4]^T = -K_R e_R - K_\omega e_\omega, \tag{5}$$

where $K_R$ and $K_\omega$ are diagonal gain matrices. Finally we compute the desired rotor speeds to achieve the desired $u$ by inverting (1).
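The final step — inverting the mixing matrix of Eq. (1) to recover rotor speeds from a desired input $u$ — can be sketched as follows. The numerical values of $k_F$, $k_M$ and $L$ are illustrative placeholders, not the paper's identified constants:

```python
import numpy as np

# Mixing matrix of Eq. (1). kF, kM, L below are illustrative placeholders.
kF, kM, L = 6.1e-8, 1.5e-9, 0.105
A = np.array([
    [kF,     kF,    kF,    kF  ],
    [0.0,    kF*L,  0.0,  -kF*L],
    [-kF*L,  0.0,   kF*L,  0.0 ],
    [-kM,    kM,   -kM,    kM  ],
])

def rotor_speeds(u):
    """Invert Eq. (1): desired u = [u1, u2, u3, u4] -> rotor speeds omega_i."""
    w_squared = np.linalg.solve(A, u)
    # negative squared speeds are infeasible; clamp before taking the root
    return np.sqrt(np.maximum(w_squared, 0.0))

# Hover for the 73 g vehicle: net thrust u1 = m*g, zero moments.
u_hover = np.array([0.073 * 9.81, 0.0, 0.0, 0.0])
print(rotor_speeds(u_hover))  # four equal rotor speeds at hover
```

At hover the unique solution is four equal rotor speeds, since the moment rows of the matrix all cancel for equal $\omega_i^2$.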

V. CONTROL AND PLANNING FOR GROUPS

A. Architecture

We are primarily interested in the challenge of coordinating a large team of quadrotors. To manage the complexity that results from growth of the state space dimensionality and to limit the combinatorial explosion arising from interactions between labeled vehicles, we consider a team architecture in which the team is organized into labeled groups, each with labeled vehicles. Formally, we can define a group of agents as a collection of agents which work simultaneously to complete a single task. Two or more groups act in a team to complete a task which requires completing multiple parallel subtasks [7]. We assume that vehicles within a group can communicate at high data rates with low latencies, while the communication requirements for coordination across groups are much less stringent. Most importantly, vehicles within a group are labeled. The small group size allows us to design controllers and planners that provide global guarantees on








Fig. 6. The team of quadrotors is organized into m groups. While vehicles within the group are tightly coordinated and centralized control and planning is possible, the inter-group coordination need not be centralized.

shapes, communication topology, and relative positions of individual, agile robots. Our approach is in contrast to truly decentralized approaches, which are necessary in swarms with hundreds and thousands of agents [21]. While models of leaderless aggregation and swarming with aerial robots are discussed in the robotics community [11, 26, 19], there the challenge of enumerating labeled interactions between robots is circumvented by controlling such aggregate descriptors of the formation as statistical distributions. These methods cannot provide guarantees on shape or topology. Reciprocal collision avoidance algorithms [27] have the potential to navigate robots to goal destinations, but no guarantees are available for transient performance and no proof of convergence exists. On the other hand, the problem of designing decentralized controllers for trajectory tracking for three-dimensional rigid structures is now fairly well understood [12, 13, 8], although few experimental results are available for aerial robots. Our framework allows the maintenance of such rigid structures in groups.

B. Formation Flight

Flying in formation reduces the complexity of generating trajectories for a large team of vehicles to generating a trajectory for a single entity. If the controllers are well-designed, there is no need to explicitly incorporate collision avoidance between vehicles. The position error for quadrotor $q$ at time $t$ can be written as

$$e_{pq}(t) = e_f(t) + e_{lq}(t) \tag{6}$$

where $e_f(t)$ is the formation error describing the error of the position of the group from the prescribed trajectory, and $e_{lq}(t)$ is the local error of quadrotor $q$ within the formation of the group. As we will show in Sec. VI, the local error is typically quite small even for aggressive trajectories, even though the formation error can be quite large. A major disadvantage of formation flight is that the rigid formation can only fit through large gaps.
This can be addressed by changing the shape of the formation of the team or by dividing the team into smaller groups, allowing each group to negotiate the gap independently.

C. Time-Separated Trajectory Following

Another way to reduce the complexity of the trajectory generation problem is to require all vehicles to follow the same

team trajectory but be separated by some time increment. Here we let the trajectory for quadrotor $q$ be defined as

$$r_{Tq}(t) = r_{TT}(t + \Delta t_q) \tag{7}$$

where $r_{TT}$ is the team trajectory and $\Delta t_q$ is the time shift for quadrotor $q$ from some common clock, $t$. If the team trajectory does not intersect or come within an unsafe distance of itself, then vehicles simply need to follow each other at a safe time separation. Large numbers of vehicles can follow team trajectories that intersect themselves if the time separations, $\Delta t_q$, are chosen so that no two vehicles are at any of the intersection points at the same time. An experiment for an intersecting team trajectory is shown in Sec. VI.

D. Trajectory Generation with MIQPs

Here we describe a method for generating smooth, safe trajectories through known 3-D environments satisfying specifications on intermediate waypoints for multiple vehicles. Integer constraints are used to enforce collision constraints with obstacles and other vehicles and also to optimally assign goal positions. This method draws from the extensive literature on mixed-integer linear programs (MILPs) and their application to trajectory planning by Schouwenaars et al. [23, 24].

1) Basic Method: As described in [17], an optimization program can be used to generate trajectories that smoothly transition through $n_w$ desired waypoints at specified times, $t_w$. The optimization program to solve this problem while minimizing the integral of the square of the $k_r$th derivative of position for $n_q$ quadrotors is shown below:

$$\min \sum_{q=1}^{n_q} \int_{t_0}^{t_{n_w}} \left\| \frac{d^{k_r} r_{Tq}}{dt^{k_r}} \right\|^2 dt \tag{8}$$

$$\text{s.t.} \quad r_{Tq}(t_w) = r_{wq}, \quad w = 0, ..., n_w;\ \forall q$$

$$\frac{d^j x_{Tq}}{dt^j}\Big|_{t=t_w} = 0 \text{ or free}, \quad w = 0, n_w;\ j = 1, ..., k_r;\ \forall q$$

$$\frac{d^j y_{Tq}}{dt^j}\Big|_{t=t_w} = 0 \text{ or free}, \quad w = 0, n_w;\ j = 1, ..., k_r;\ \forall q$$

$$\frac{d^j z_{Tq}}{dt^j}\Big|_{t=t_w} = 0 \text{ or free}, \quad w = 0, n_w;\ j = 1, ..., k_r;\ \forall q$$

Here $r_{Tq} = [x_{Tq}, y_{Tq}, z_{Tq}]$ represents the trajectory for quadrotor $q$ and $r_{wq}$ represents the desired waypoints for quadrotor $q$. We enforce continuity of the first $k_r$ derivatives of $r_{Tq}$ at $t_1, ..., t_{n_w - 1}$. As shown in [17], writing the trajectories as piecewise polynomial functions allows (8) to be written as a quadratic program (QP) in which the decision variables are the coefficients of the polynomials. For quadrotors, since the inputs $u_2$ and $u_3$ appear as functions of the fourth derivatives of the positions, we generate trajectories that minimize the integral of the square of the norm of the snap (the second derivative of acceleration, $k_r = 4$). Large-order polynomials are used to satisfy such additional trajectory constraints as obstacle avoidance that are not explicitly specified by intermediate waypoints.

2) Integer Constraints for Collision Avoidance: For collision avoidance we model the quadrotors as rectangular prisms oriented with the world frame with side lengths $l_x$, $l_y$, and $l_z$. These lengths are large enough so that the quadrotor can roll, pitch, and yaw to any angle and stay within the prism. We consider navigating this prism through an environment


with $n_o$ convex obstacles. Each convex obstacle $o$ can be represented by a convex region in configuration space with $n_f(o)$ faces. For each face $f$ the condition that the quadrotor's desired position at time $t_k$, $r_{Tq}(t_k)$, be outside of obstacle $o$ can be written as

$$n_{of} \cdot r_{Tq}(t_k) \leq s_{of}, \quad \forall f = 1, ..., n_f(o) \tag{9}$$

where $n_{of}$ is the normal vector to face $f$ of obstacle $o$ in configuration space and $s_{of}$ is a scalar that determines the location of the plane. If (9) is satisfied for at least one of the faces then the rectangular prism, and hence the quadrotor, is not in collision with the obstacle. The condition that quadrotor $q$ does not collide with an obstacle $o$ at time $t_k$ can be enforced with binary variables, $b_{qofk}$, as

$$n_{of} \cdot r_{Tq}(t_k) \leq s_{of} + M b_{qofk}, \quad \forall f = 1, ..., n_f(o)$$

$$b_{qofk} = 0 \text{ or } 1, \quad \forall f = 1, ..., n_f(o)$$

$$\sum_{f=1}^{n_f(o)} b_{qofk} \leq n_f(o) - 1 \tag{10}$$

where $M$ is a large positive number [23]. Note that if $b_{qofk}$ is 1 then the inequality for face $f$ is always satisfied. The last inequality in (10) requires that the non-collision constraint be satisfied for at least one face of the obstacle, which implies that the prism does not collide with the obstacle. We can then introduce (10) into (8) for all $n_q$ quadrotors for all $n_o$ obstacles at $n_k$ intermediate time steps between waypoints. The addition of the integer variables into the quadratic program causes this optimization problem to become a mixed-integer quadratic program (MIQP).

3) Inter-Quadrotor Collision Avoidance: When transitioning between waypoints quadrotors must stay a safe distance away from each other. We enforce this constraint at $n_k$ intermediate time steps between waypoints, which can be represented mathematically for quadrotors 1 and 2 by the following set of constraints:

$$\forall t_k : \quad x_{T1}(t_k) - x_{T2}(t_k) \leq d_{x12} \ \text{or} \ x_{T2}(t_k) - x_{T1}(t_k) \leq d_{x21} \ \text{or}$$

freedom in the optimization problem. For each quadrotor $q$ and goal $g$ we introduce the integer constraints:

$$x_{Tq}(t_{n_w}) \leq x_g + M\beta_{qg}, \qquad x_{Tq}(t_{n_w}) \geq x_g - M\beta_{qg}$$

$$y_{Tq}(t_{n_w}) \leq y_g + M\beta_{qg}, \qquad y_{Tq}(t_{n_w}) \geq y_g - M\beta_{qg}$$

$$z_{Tq}(t_{n_w}) \leq z_g + M\beta_{qg}, \qquad z_{Tq}(t_{n_w}) \geq z_g - M\beta_{qg} \tag{12}$$

Here $\beta_{qg}$ is a binary variable used to enforce the optimal goal assignment. If $\beta_{qg}$ is 0 then quadrotor $q$ must be at goal $g$ at $t_{n_w}$. If $\beta_{qg}$ is 1 then these constraints are satisfied for any final position of quadrotor $q$. In order to guarantee that at least $n_g$ quadrotors reach the desired goals we introduce the following constraint:

$$\sum_{q=1}^{n_q} \sum_{g=1}^{n_g} \beta_{qg} \leq n_g n_q - n_g \tag{13}$$

where M is a large positive number [23]. Note that if bqof k is 1 then the inequality for face f is always satisfied. The last inequality in (10) requires that the non-collision constraint be satisfied for at least one face of the obstacle which implies that the prism does not collide with the obstacle. We can then introduce (10) into (8) for all nq quadrotors for all no obstacles at nk intermediate time steps between waypoints. The addition of the integer variables into the quadratic program causes this optimization problem to become a mixed-integer quadratic program (MIQP). 3) Inter-Quadrotor Collision Avoidance: When transitioning between waypoints quadrotors must stay a safe distance away from each other. We enforce this constraint at nk intermediate time steps between waypoints which can be represented mathematically for quadrotors 1 and 2 by the following set of constraints: ∀tk : xT 1 (tk ) − xT 2 (tk ) ≤ dx12 or xT 2 (tk ) − xT 1 (tk ) ≤ dx21 or

yT 1 (tk ) − yT 2 (tk ) ≤ dy12

or

yT 2 (tk ) − yT 1 (tk ) ≤ dy21

(11)

Here the d terms represent safety distances. For axially symmetric vehicles dx12 = dx21 = dy12 = dy21 . Experimentally we have found that quadrotors must avoid flying in each other’s downwash because of a decrease in tracking performance and even instability in the worst cases. Therefore we do not allow vehicles to fly underneath each other here. Finally, we incorporate constraints (11) between all nq quadrotors in the same manner as in (10) into (8). 4) Integer Constraints for Optimal Goal Assignment: In many cases one might not care that a certain quadrotor goes to a certain goal but rather that any vehicle does. Here we describe a method for using integer constraints to find the optimal goal assignments for the vehicles. This results in a lower total cost compared to fixed-goal assignment and often a faster planning time because there are more degrees of

Note that this approach can be easily adapted if there are more quadrotors than goals or vice versa. 5) Relaxations for Large Teams: The solving time of the MIQP grows exponentially with the number of binary variables that are introduced into the MIQP. Therefore, the direct use of this method does not scale well for large teams. Here we present two relaxations that enable this approach to be used for large teams of vehicles. a) Planning for Groups within a Team: A large team of vehicles can be divided into smaller groups. We can then use the MIQP method to generate trajectories to transition groups of vehicles to group goal locations. This reduces the complexity of the MIQP because instead of planning trajectories for all nq vehicles we simply plan trajectories for the groups. Of course we are making a sacrifice here by not allowing the quadrotors to have the flexibility to move independently. b) Planning for Sub-Regions: In many cases the environment can be partitioned into nr convex sub-regions where each sub-region contains the same number of quadrotor start and goal positions. After partitioning the environment the MIQP trajectory generation method can be used for the vehicles inside each region. Here we require quadrotors to stay inside their own regions using linear constraints on the positions of the vehicles. This approach guarantees collisionfree trajectories and allows quadrotors the flexibility to move independently. We are gaining tractability at the expense of optimality since the true optimal solution might actually require quadrotors to cross region boundaries while this relaxed version does not. Also, it is possible that no feasible trajectories exist inside a sub-region but feasible trajectories do exist which cross region boundaries. Nonetheless, this approach works well in many scenarios and we show its application to formation transitions for teams of 16 vehicles in Sec. VI.
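The disjunctive logic behind constraints (9)-(10) can be sketched numerically: a position is collision-free with respect to a convex obstacle exactly when the face inequality holds for at least one face, which is what the big-M binaries encode in the MIQP. The function name and the obstacle data below (a unit cube in configuration space) are illustrative, not from the paper.

```python
import numpy as np

def outside_obstacle(r, normals, offsets):
    """Check the disjunction in (9)-(10): n_of . r <= s_of must hold for
    at least one face f. In the MIQP, b_qofk = 0 selects that face, and
    sum_f b_qofk <= nf(o) - 1 forces at least one binary to be 0."""
    return bool(np.any(normals @ r <= offsets))

# Illustrative obstacle: unit cube |x|, |y|, |z| <= 0.5 in configuration
# space, written so that n_of . r <= s_of means "outside face f".
normals = -np.vstack([np.eye(3), -np.eye(3)])  # inward-pointing face normals
offsets = np.full(6, -0.5)

print(outside_obstacle(np.array([1.0, 0.0, 0.0]), normals, offsets))  # True
print(outside_obstacle(np.array([0.0, 0.0, 0.0]), normals, offsets))  # False
```

In the actual optimization these disjunctions are not checked pointwise but encoded with the binaries bqofk and a big-M constant, so that the branch-and-bound solver searches over which face is active at each time step.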


Fig. 7. Software infrastructure.

VI. EXPERIMENTAL RESULTS

A. Software Infrastructure for Groups

Our architecture (Sec. V-A) is important for a very practical reason: for a large team of quadrotors it is impossible to run a single loop that can receive all the Vicon data, compute the commands, and communicate with each quadrotor at a fast enough rate. As shown in Fig. 7, each group is controlled by a dedicated software node running in an independent thread. These control nodes receive vehicle pose data from a special Vicon node via shared memory. That node connects to the Vicon tracking system, receives marker positions for each subject, performs a 6D pose fit to the marker data, and carries out additional processing for velocity estimation. Finally, the processed pose estimates are published to shared memory using the Boost C++ library [3]. Shared memory is the fastest method of inter-process communication, which ensures the lowest latency for the time-critical data. The control nodes, implemented in Matlab, read the pose data directly from shared memory and compute the commanded orientations and net thrusts for several quadrotors based on the controller described in Sec. IV. For non-time-critical data sharing we use Inter Process Communication (IPC) [4]. For example, high-level user commands such as desired vehicle positions are sent to a planner, which computes the trajectories for the vehicles and sends them to the Matlab control nodes via IPC. IPC provides flexible message passing and uses TCP/IP sockets to send data between processes. Each Matlab control node is associated with a radio module containing 900 MHz and 2.4 GHz Zigbee transceivers, which is used to communicate with all the vehicles in its group. The radio module sends control commands to several vehicles, up to five in this work. Each vehicle operates on a separate channel, and the radio module hops between the frequencies for each quadrotor, sending out commands to each vehicle at 100 Hz.
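The command scheduling described above (one radio module hopping between per-vehicle channels so that each of up to five vehicles receives commands at 100 Hz) can be sketched as follows; the uniform slot timing and cyclic hop order are our assumptions, not stated in the paper.

```python
def hop_schedule(n_vehicles, rate_hz=100.0, horizon_s=0.1):
    """Return (time, vehicle) command slots for one radio module that
    hops across n_vehicles channels, commanding each at rate_hz.
    Assumed scheduling policy: one hop per vehicle per command period."""
    slot = 1.0 / (rate_hz * n_vehicles)  # hop interval within one period
    events = []
    for k in range(int(horizon_s * rate_hz)):  # command periods
        t0 = k / rate_hz
        for v in range(n_vehicles):
            events.append((t0 + v * slot, v))
    return events

events = hop_schedule(5)
counts = [sum(1 for _, v in events if v == k) for k in range(5)]
print(counts)  # each of the 5 vehicles gets 10 commands in 0.1 s, i.e. 100 Hz
```

With five vehicles this implies a hop rate of 500 channel switches per second, which is consistent with the 100 Hz per-vehicle command rate quoted above.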
The radio modules can also simultaneously receive high-bandwidth feedback from the vehicles, making use of the two independent transceivers.

B. Formation Flight

In Fig. 8 we present data for a team of four quadrotors following a trajectory as a formation. The group formation error is significantly larger than the local error. The local x and y errors are always less than 3 cm, while the formation x error is as large as 11 cm. This data is representative of all formation trajectory following data because all vehicles are nominally

Fig. 8. Formation following for a 4-quadrotor trajectory. (a) Top view: the gray lines represent the desired trajectories for each of the four vehicles and the black lines the actual trajectories. (b) Error time histories: the black line represents the formation error, ef(t), from the desired trajectory and the gray lines the local errors, eli(t), for each quadrotor.
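As a sketch of the two error measures plotted in Fig. 8 (the exact definitions of ef(t) and eli(t) are given earlier in the paper; here we assume the formation error is the deviation of the formation reference from its desired trajectory, and the local error is each vehicle's deviation from its slot in the formation), one could compute:

```python
import numpy as np

def formation_and_local_errors(positions, desired_ref, offsets):
    """positions: (n, 3) actual vehicle positions; desired_ref: (3,)
    desired formation reference point; offsets: (n, 3) nominal vehicle
    slots relative to the reference. Assumed decomposition, not the
    paper's exact formulas."""
    ref_est = (positions - offsets).mean(axis=0)  # where the formation actually is
    e_f = ref_est - desired_ref                   # formation error
    e_l = positions - (ref_est + offsets)         # per-vehicle local errors
    return e_f, e_l

# A rigid 4-vehicle square formation blown 10 cm off course:
# large formation error, near-zero local errors.
offsets = np.array([[0.5, 0.5, 0], [0.5, -0.5, 0], [-0.5, 0.5, 0], [-0.5, -0.5, 0]])
desired = np.array([0.0, 0.0, 1.3])
positions = desired + offsets + np.array([0.10, 0.0, 0.0])
e_f, e_l = formation_and_local_errors(positions, desired, offsets)
```

This mirrors the observation in Fig. 8 that the whole group can drift together (large ef) while the relative errors within the group stay small.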

Fig. 9. (a) Average standard deviation of the x, y, and z errors for 20 quadrotors in a grid formation. (b) 16 quadrotors following a figure-eight pattern. See the video at http://youtu.be/50Fdi7712KQ

the same and are running the same controller with the same gains. Therefore, even though the deviation from the desired trajectory may be large, the relative position error within the group is small. In Fig. 9(a) we show average error data for 20 vehicles flying in the grid formation shown in Fig. 1. For this experiment the vehicles were controlled to hover at a height of 1.3 meters for at least 30 seconds at several quadrotor-center-to-quadrotor-center grid spacing distances. The air disturbance created by the downwash of all 20 vehicles is significant and causes the tracking performance to be worse for any vehicle in this formation than for an individual vehicle in still air as presented in Fig. 4. However, as shown in Fig. 9(a), the separation distance did not have any effect on the hovering performance. Note that at 35 cm grid spacing the nominal distance between propeller tips is about 14 cm.

C. Time-Separated Trajectory Following

In Fig. 9(b) we show a team of 16 vehicles following a cyclic figure-eight pattern. The time to complete the entire cycle is tc, and the vehicles are equally spaced in time along the trajectory at increments of tc/16. In order to guarantee collision-free trajectories at the intersection, vehicles spend 15/32 tc in one loop of the trajectory and 17/32 tc in the other. A trajectory that satisfies these timing constraints and has


Fig. 10. Four groups of four quadrotors flying through a window.

Fig. 11. A team of sixteen vehicles transitioning from a planar grid to a three-dimensional helix (top) and pyramid (bottom).

some specified velocity at the intersection point (with zero acceleration and jerk) is generated using the optimization-based method for a single vehicle described in [17].

D. MIQP trajectories

In this paper, we use a branch-and-bound solver [1] to solve the MIQP trajectory generation problem. The solving time for the MIQP is an exponential function of the number of binary constraints and also of the geometric complexity of the environment. The first solution is often delivered within seconds, but finding the true optimal solution and a certificate of optimality can take as long as 20 minutes on a 3.4 GHz quad-core Core i7 desktop machine for the examples presented here.

1) Planning for Groups within a Team: In Fig. 10 we show snapshots from an experiment with four groups of four quadrotors transitioning from one side of a gap to the other. Note that in this example the optimal goal assignment is performed at the group level.

2) Planning for Sub-Regions: In Fig. 11 we show snapshots from experiments with 16 vehicles transitioning from a planar grid to a three-dimensional helix and pyramid. Directly using the MIQP approach to generate trajectories for 16 vehicles is not practical. Therefore, in both experiments the space is divided into two regions, and separate MIQPs with 8 vehicles each are used to generate trajectories for the vehicles on the left and right sides of the formation. Note that, in general, the formations do not have to be symmetric, but here we exploit the symmetry and only solve a single MIQP for 8 vehicles for these examples. Optimal goal assignment is used so that the vehicles collectively choose their goals to minimize the total cost.

VII. CONCLUSION

In this paper we describe the design, manufacture, modeling and control of a micro quadrotor that has a 73 gram mass and is 21 cm in diameter, together with the architecture and software for coordinating a team of micro quadrotors, with experimental results. While our quadrotors rely on an external localization

system for position estimation and therefore cannot be truly decentralized at this stage, these results represent the first step toward the development of a swarm of micro quadrotors. The small size is shown to facilitate agility and the ability to fly in close proximity with less than one body length separation. Mixed-integer quadratic programming techniques are used to coordinate twenty micro quadrotors in known three-dimensional environments with obstacles. The videos of all experiments are available at http://youtu.be/50Fdi7712KQ

REFERENCES

[1] IBM ILOG CPLEX V12.1: User's manual for CPLEX. International Business Machines Corporation, 2009.
[2] Ascending Technologies, GmbH. http://www.asctec.de.
[3] Boost C++ Libraries. http://www.boost.org.
[4] Inter Process Communication. http://www.cs.cmu.edu/~ipc/.
[5] Vicon Motion Systems, Inc. http://www.vicon.com.
[6] AeroVironment. AeroVironment Nano Hummingbird, August 2011. Online: http://www.avinc.com/nano.
[7] C. Anderson and N. R. Franks. Teams in animal societies. Behavioral Ecology, 12(5):534–540, 2001.
[8] R. W. Beard, J. Lawton, and F. Y. Hadaegh. A coordination architecture for spacecraft formation control. IEEE Trans. Control Syst. Technol., 9(6):777–790, November 2001.
[9] C. Bermes. Design and dynamic modeling of autonomous coaxial micro helicopters. PhD thesis, ETH Zurich, Switzerland, 2010.
[10] S. Bouabdallah. Design and control of quadrotors with applications to autonomous flying. PhD thesis, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, February 2007.
[11] F. Bullo, J. Cortés, and S. Martínez. Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms. Applied Mathematics Series. Princeton University Press, 2009.
[12] J. P. Desai, J. P. Ostrowski, and V. Kumar. Modeling and


control of formations of nonholonomic mobile robots. IEEE Trans. Robot., 17(6):905–908, December 2001.
[13] M. Egerstedt and X. Hu. Formation constrained multi-agent control. IEEE Trans. Robot. Autom., 17(6):947–951, December 2001.
[14] D. Floreano, J.-C. Zufferey, A. Klaptocz, J. M. Germann, and M. Kovac. Aerial locomotion in cluttered environments. In Proceedings of the 15th International Symposium on Robotics Research, 2011.
[15] V. Kumar and N. Michael. Opportunities and challenges with autonomous micro aerial vehicles. In International Symposium on Robotics Research, 2011.
[16] S. Lupashin, A. Schöllig, M. Sherback, and R. D'Andrea. A simple learning strategy for high-speed quadrocopter multi-flips. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 1642–1648, Anchorage, AK, May 2010.
[17] D. Mellinger and V. Kumar. Minimum snap trajectory generation and control for quadrotors. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 2520–2525, Shanghai, China, May 2011.
[18] D. Mellinger, N. Michael, and V. Kumar. Trajectory generation and control for precise aggressive maneuvers. In Int. Symposium on Experimental Robotics, December 2010.
[19] N. Michael and V. Kumar. Control of ensembles of aerial robots. Proc. of the IEEE, 99(9):1587–1602, September 2011.
[20] N. Michael, J. Fink, and V. Kumar. Cooperative manipulation and transportation with aerial robots. Auton. Robots, 30(1):73–86, January 2011.
[21] J. Parrish and W. Hamner, editors. Animal Groups in Three Dimensions. Cambridge University Press, New York, 1997.
[22] D. Pines and F. Bohorquez. Challenges facing future micro air vehicle development. AIAA J. of Aircraft, 43(2):290–305, 2006.
[23] T. Schouwenaars, B. De Moor, E. Feron, and J. How. Mixed integer programming for multi-vehicle path planning. In European Control Conference, pages 2603–2608, 2001.
[24] T. Schouwenaars, A. Stubbs, J. Paduano, and E. Feron. Multi-vehicle path planning for non-line of sight communication. In American Control Conference, 2006.
[25] S. Shen, N. Michael, and V. Kumar. Autonomous multi-floor indoor navigation with a computationally constrained MAV. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 20–25, Shanghai, China, May 2011.
[26] H. Tanner, A. Jadbabaie, and G. J. Pappas. Flocking in fixed and switching networks. IEEE Trans. Autom. Control, 52(5):863–868, May 2007.
[27] J. van den Berg, S. J. Guy, M. C. Lin, and D. Manocha. Reciprocal n-body collision avoidance. In International Symposium on Robotics Research, 2009.
[28] C. H. Wolowicz, J. S. Bowman, and W. P. Gilbert. Similitude requirements and scaling relationships as applied to model testing. Technical report, NASA, August 1979.


CompAct™ Arm: a Compliant Manipulator with Intrinsic Variable Physical Damping

Matteo Laffranchi, Nikos G. Tsagarakis and Darwin G. Caldwell
Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genova, Italy
{matteo.laffranchi; nikos.tsagarakis; darwin.caldwell}@iit.it

Abstract — Humans exploit compliance in their biomechanical muscle-tendon-bone actuation structure to enable robust and safe interaction with the environment, and they utilize the elastic energy stored in muscles and tendons to obtain high energy efficiency or large output mechanical power peaks at their limbs. From the robotic/mechatronic point of view it is clear that emulating such a property in robotic actuation systems enables performance which is not possible with classical stiff designs. On the other hand, transmission compliance introduces some disadvantages, such as typically underdamped modes which reduce the achievable control bandwidth, stability margin and accuracy of the controlled system. These limitations are overcome in mammals by means of physical damping, which clarifies why these biological systems are able to perform fast and smooth yet accurate motions with their limbs. This motivates the present work, which consists of the analysis and development of the CompAct™ Arm, a novel compliant manipulator with intrinsic variable damping. This is probably the first robotic system to exhibit these diverse bio-inspired characteristics. A motivation analysis is initially presented to show how the drawbacks introduced by compliance can be overcome by means of physical damping. The second part of the paper presents the mechatronic development of the robotic manipulator and preliminary experimental results.

Keywords — Mechatronic Design, Compliant Assembly, Bio-Inspired Design, Dynamics

I. INTRODUCTION

Improvements in production time and the quality of products lie at the heart of modern manufacturing, and the use of robots has formed an increasingly important aspect of the drive for efficiency. These robots are designed for precision, speed and repeatability and usually work in restricted areas to prevent any harmful interaction with humans. However, new opportunities are arising in houses and offices which mean that robots will no longer be confined to these relatively restricted factory environments, and this sets new demands in terms of the ability to interact with less structured and more uncertain environments. It is evident that classical heavy and stiff manipulators with high control gains are not suitable to cooperate and work within these new operational areas. In fact, the typical design/control approaches mentioned previously mean that the resulting robots present serious interaction limitations and large output mechanical impedance, so that robot and human safety are compromised when unexpected events occur. Aspects other than safety can also be jeopardized by this actuation approach, e.g. mechanical robustness and energy efficiency

Figure 1. The CompAct™ Arm: (a) external layout semi-transparent CAD view and (b) fully assembled prototype.

[1-4]. The reason for this is that rigid systems do not permit mechanical energy storage, while compliant systems, such as humans' biological structures, can exploit this property to improve the mentioned characteristics. These bio-inspired considerations motivate scientists to attempt to replicate compliance within robots with a variety of solutions [1-3, 5]. Although compliance can effectively make a difference, from the engineering point of view it also introduces some side effects. It typically induces underdamped dynamics which deteriorate the stability margin and the accuracy of the robot (due to the induced oscillations) and limit the bandwidth which can be set in the controlled system [6-8]. These drawbacks are tackled in mammals, which exhibit joint compliance, by the synergetic presence in their bodies of both compliance and physical damping. The passive compliance and damping typical of biological joints explain why, e.g., humans are able to interact safely and be robust during collisions thanks to compliance, while at the same time having the potential of achieving fast, smooth yet accurate motion thanks to physical damping [9-11]. From the design perspective, it can be concluded that the introduction of damping into the mechatronics of compliant robot joints (possibly with the ability to vary its level) can be a solution to overcome the drawbacks introduced by compliance. Apart from this, previous work demonstrated that physical damping can enhance the dynamic performance and accuracy of force-torque controlled systems [12]. Although some of the mentioned advantages are obvious, a rigorous


analysis is necessary to clarify in depth the role of physical damping in compliant robotic joints. In its first part, this paper proposes an analysis of the effects of physical damping in compliant robotic joints, assessing how this parameter affects the dynamic performance, energy efficiency and safety of compliant actuators. The positive outcome of this study motivated the design and development of the CompAct™ Arm, i.e. a manipulator powered by actuators with intrinsically rich capabilities related to the synergetic combination of passive compliance and variable physical damping. To the authors' knowledge this is the first robotic system to exhibit these combined properties. We believe that such a system is a step towards achieving the functional performance of natural systems, and particularly of the human, in terms of motion agility, safety, energy efficiency and power. The mechatronic details of this highly integrated system are presented in the second part of this paper. The paper is structured as follows: Section II presents an analysis to evaluate and quantify how physical damping affects the characteristics of a generic compliant actuator, while Section III introduces the mechatronic details of the actuation units employed for the development of the manipulator. Section IV defines the kinematic layout of the robot and analyses a dynamic simulation involving an interactive task to determine the preliminary specifications of the manipulator and related actuation systems. The robot assembly and design are analysed in Section V, with Section VI presenting preliminary results obtained with the prototype. Finally, Section VII addresses the conclusions and future work.

II. DYNAMICS OF COMPLIANT ACTUATION SYSTEMS WITH INTRINSIC VARIABLE DAMPING

A. Mechanical model

Considering the basic equivalent linear model of a compliant actuator, Fig. 2a, the system dynamics can be expressed by the following set of equations:

M_R ẍ_θ + D(ẋ_θ − ẋ_q) + K(x_θ − x_q) = F_in
M_L ẍ_q = D(ẋ_θ − ẋ_q) + K(x_θ − x_q) − F_out    (1)

where M_R = 0.5 kg and M_L = 1 kg are the reflected rotor and equivalent link inertias, D and K = 100 N/m are the joint viscous damping and stiffness, x_θ and x_q are the rotor and link equivalent linear displacements, and F_in and F_out are the effort provided by the actuation system and the force applied to the outer link, respectively. The basic model presented in Fig. 2a makes some assumptions: it does not consider any dissipative effect (such as friction/damping due to the gearbox, bearings, etc., or joule losses in the resistance of the drivers/windings) apart from that of the transmission damper D, and it neglects the electrical (or hydraulic) dynamics of the driving system. In fact, these hypotheses are made on purpose to make the analyses independent of the specific nature of the actuator and make the analysis and results more generalised.
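The effect of the transmission damper in (1) can be illustrated with a small simulation using the stated values (M_R = 0.5 kg, M_L = 1 kg, K = 100 N/m); the integration scheme, time horizon, and initial deflection are our choices, not from the paper.

```python
def residual_deflection(D, T=2.0, dt=1e-4):
    """Integrate the free response of model (1) (F_in = F_out = 0) with
    semi-implicit Euler, starting from a 10 cm spring deflection, and
    return the remaining deflection |x_theta - x_q| after T seconds."""
    MR, ML, K = 0.5, 1.0, 100.0
    x_t, v_t = 0.0, 0.0   # rotor position and velocity
    x_q, v_q = 0.1, 0.0   # link starts 10 cm from equilibrium
    for _ in range(int(T / dt)):
        f = D * (v_t - v_q) + K * (x_t - x_q)  # transmission force
        v_t += (-f / MR) * dt                  # rotor acceleration, F_in = 0
        v_q += (f / ML) * dt                   # link acceleration, F_out = 0
        x_t += v_t * dt
        x_q += v_q * dt
    return abs(x_t - x_q)

# A lightly damped joint is still ringing after 2 s; a well-damped one is settled.
print(residual_deflection(0.5), residual_deflection(20.0))
```

The lightly damped case (D = 0.5 Ns/m) retains a visible residual oscillation, while the heavily damped case (D = 20 Ns/m) settles orders of magnitude closer to equilibrium, which is the underdamped-mode problem the following subsections quantify.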

Figure 2. (a) Equivalent model of a SEA with variable physical damping and (b) model of the constrained collision scenario.

B. Dynamic performance

Assuming no interaction is involved (F_out = 0 N), the following transfer function between rotor and link velocity (V_θ and V_q) can be obtained from (1):

P(s) = V_q(s) / V_θ(s) = (Ds + K) / (M_L s² + Ds + K)    (2)
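The claim that damping suppresses the resonant peak of (2) can be checked numerically with the stated parameters (M_L = 1 kg, K = 100 N/m); the two damping levels compared below are illustrative choices of ours.

```python
import numpy as np

def mag_P(w, D, ML=1.0, K=100.0):
    """|P(jw)| for the transfer function (2), evaluated over an array of
    frequencies w (rad/s)."""
    s = 1j * w
    return abs((D * s + K) / (ML * s**2 + D * s + K))

w = np.linspace(0.1, 50.0, 5000)         # resonance sits near sqrt(K/ML) = 10 rad/s
peak_low_D = mag_P(w, D=1.0).max()       # lightly damped: tall resonant peak
peak_high_D = mag_P(w, D=15.0).max()     # well damped: nearly flat response
print(peak_low_D, peak_high_D)
```

With D = 1 Ns/m the peak gain is about 10, i.e. link oscillations an order of magnitude larger than the commanded rotor velocity near resonance, while D = 15 Ns/m keeps the magnitude close to unity everywhere.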

The variable joint viscous damping D of the system in Fig. 2a can be increased to shift the zero of (2) to lower frequencies, introducing a phase-lead action (typically from resonance to approximately two decades after) that avoids the sharp and significant phase lag introduced by the complex-conjugate poles caused by compliance(1). This improves the stability of the system, hence facilitating link position/velocity control. At the same time, the damping D can be set sufficiently large so as to reduce the amplitude of the oscillations of the vibration modes and facilitate accurate position tracking when needed. In addition, the magnitude of (2) increases with joint damping in the high frequency domain above resonance. From the performance perspective this is a further attractive feature, since it means that for a given actuation system the maximum output speed limit increases at a given operating frequency (above resonance). The high frequency performance improvement gained by a joint damping increase can be quantified by the following coefficient:

G_D = lim_{ω→∞} |P⁺(jω)| / |P⁻(jω)| = D⁺ / D⁻    (3)

where P⁺ and P⁻ are plants having the form of (2), featuring two different levels of joint damping D⁺ and D⁻ such that D⁺ > D⁻. Equation (3) shows that G_D increases proportionally with joint damping, meaning that dynamic performance can be augmented by means of damping.

C. Energy efficiency

Considering the system in Fig. 2a, the rotor velocity v_θ required to produce a certain link motion can be computed using (2), while the required actuator force F_in is obtained using (1). The corresponding mechanical power input P_in required to provide a desired link velocity can therefore be computed in the frequency domain as:

(1) Compliance acts as a limiting factor on the dynamic performance which can be achieved in the controlled system. As a rule of thumb, the bandwidth of closed-loop (lightly damped) systems should be placed below 1/3 of the frequency of the mechanical poles introduced by compliance [6].


Pin (s)  Fin (s) *V (s)

constrained collision of the system in Fig. 2b, [14]. This scenario is modelled as in Fig. 2b with the purpose of characterizing the joint force generated at collision. A transfer function which can be used for the evaluation of the safety is the force transmissibility between the input and joint force(3). For the system in Fig. 2b, this function is: (5) F Ds  K

(4)

Considering a required velocity output of amplitude 0.1 rad/s, the required RMS power input as a function of the frequency is shown in Fig. 3 for different physical viscous damping levels. It can be observed that the effect of damping on the power consumption varies in function of the operating frequency, that is, in the low frequency domain (approximately in the range of [0, 1]Hz for this specific system) there is no substantial change varying damping; whereas systems with small damping are much more energy efficient in proximity of resonance.

T ( s) 

III.

This last effect is quite obvious since the system is exploiting its natural dynamics to perform the motion. In contrast to this, it is quite surprising that systems with high joint physical damping feature much higher energy efficiency than compliant actuators with low joint viscous friction in the high frequency domain. This is due to the fact that motor and link in compliant actuators are highly decoupled in the high frequency domain and therefore in this region much more effort is required on the motor side to compensate for this decoupling effect when damping is low [13]. In other words, compliance can be beneficial from the efficiency perspective when the system operating frequency matches the natural dynamics (as in e.g. [1]), however when the task requires higher frequency motions the system has to work against the physics of the system hence compromising energy efficiency(2). This in turns means that physical damping can be attractive especially in fixed stiffness designs (as e.g. SEAs) where dynamics cannot be tuned: a tunable damper placed in parallel to the transmission compliance can be an effective mean for the development of performing and efficient (at high frequency spectrum) compliant actuators.

2

It is worth mentioning that this effect can be observed also in the time domain for tracking profiles whose dominant spectral components are well below the system natural frequency [13].

Fin



M R s 2  Ds  K

This transfer function has the same form of (2), apart from the fact that the reflected mass appearing at denominator is that of the rotor instead of that of the link. This means that (3) is valid also for (5), that is, the same amount of improvement gained in performance is lost in safety. In fact, this does not represent a problem because if the actuator has the possibility of varying the joint physical damping level as in the systems of Fig. 2, this parameter can be adjusted to an optimal value as a function of the system states and of the required safety level (as in e.g. the fast and soft paradigm, [5]) to obtain the best safety-performance tradeoff.

Figure 3. Input mechanical power consumption required to provide an output velocity of amplitude 0.1 rad/s for different damping ratio values as a function of the operating frequency.

D. Safety The higher dynamic performance and energy efficiency achieved in plants with larger physical viscous damping level demonstrated previously is due to the increased coupling between rotor and link. This is on one hand an attractive property of physical viscous damping, however on the other hand this quantity introduces the side effect of a decreased safety level due to the generation of higher forces at impact. To evaluate this effect, we consider here the risky case of a

j

MECHATRONICS OF THE ACTUATION SYSTEMS

A. Damping systems Dampers can be categorized in three main classes: passive, active [15] and semi-active [7, 12] basing on the amount of external power required for the damping control system to perform its functionality. Purely passive dampers are not able to adapt the damping value in function of the system configuration/state, while active dampers can be employed to suppress oscillations however they do not offer the benefits analysed in the previous section. On the other hand, semiactive solutions offer the reliability of passive devices while at the same time maintain the versatility and the adaptability of active systems without requiring large amount of energy [7]. In addition, differently from active implementations, they cannot inject mechanical power into the system and therefore they do not have the potential of destabilizing the controlled actuator. When appropriately controlled semi-active systems perform significantly better than passive devices and have the potential of achieving the majority of the performance of the fully active configuration (e.g. ability of regulation of the damping level over wide ranges) [7]. Furthermore, a semiactive damping system can be completely disconnected in particularly critical conditions to maximize the decoupling effect of compliance, such as e.g. when the robot is working in close proximity of the human/environment. This suggests that the use of an actuation system which can provide passive compliance and variable physical damping (using e.g. a semi active damper) can effectively be employed to develop human-friendly and still well performing robots and is therefore employed in the actuation modules used for the development of the manipulator presented in this work.

(3) In this work we do not consider the contribution of the reflected link mass to the impact force, since this variable is independent of the transmission viscous damping and stiffness.


B. The modular design concept

Looking at the kinematics of existing serial robots, e.g. the Comau Smart NM [16] and the Barrett WAM [17], but also humanoid robots such as the Honda ASIMO [18], Figs. 4a, b, it can be observed that their kinematic structures are mainly composed of revolute joints placed in two different configurations.

Figure 4. Conceptual schemes showing the kinematics of (a) the Comau Smart NM and (b) the Honda ASIMO and Barrett WAM manipulators.

The first kinematic configuration has the joint axis aligned with the robot structure (defined as the “longitudinal joint” in this work), whereas in the second arrangement the joint axis is placed perpendicularly to the robot structure, which is instead defined as the “transverse joint” configuration. For instance, the Comau robot shown in Fig. 4a follows a longitudinal-transverse-transverse configuration, while the ASIMO robot arm, Fig. 4b, presents a kinematic chain with a longitudinal-transverse-longitudinal-transverse arrangement. From the analysed examples of Fig. 4, it can be concluded that, from the kinematic perspective, a serial robot uses mostly revolute joints placed in two basic configurations: the longitudinal and the transverse joint. A serial robot design can therefore be developed by means of an appropriate serial combination of these two joint configurations. The employed actuation system is a series elastic actuator with the ability to regulate the physical damping in parallel to the joint compliance, and is therefore suited to the development of this manipulator [8]. The adjustment of the damping is realized by a semi-active friction damper actuated by light and compact piezoelectric stack actuators(4), Fig. 5b. Based on the modular design concept, the actuation units were formatted following the longitudinal and transverse kinematic schemes for developing the full arm. To satisfy the lightweight requirement for the overall arm, a carbon fibre frame was used in this design. This structure serves not only as a frame but also as a cover preventing the human/environment from coming into contact with the mechatronics of the manipulator, therefore improving its robustness and safety properties.
A further attractive aspect of the use of composite material is that, in contrast to traditional manufacturing processes such as milling or sheet metal forming, it facilitates the development of parts without sharp edges, which may be unsafe during

(4) The use of semi-active damping systems placed between the actuator and the load is not a new concept in robotics, see e.g. [12]; however, these works do not make use of mechanical compliance. In fact, the dampers in these works are employed as transmission systems (similarly to clutches) and are hence used to deliver the actuator effort to the load. This means that these systems typically suffer from low energy efficiency when compared to SEAs/VSAs or hybrid solutions such as that presented in this work [12].

interaction. The highly integrated modules embed all the mechatronics of the system, including the motor-gearbox group, series compliant joint, semi-active damping system, and control and conditioning electronics.

1) Longitudinal actuation module: The design of this module is shown in Fig. 5. The actuation system is fixed to the carbon fibre frames at two points, i.e. at the base and front support flanges of the actuator, Fig. 5a. This avoids the application of thrust moments to the actuator, which may result in malfunction due to misalignment between the frictional surfaces of the damping system, Fig. 5b. The input interface is an aluminium disk directly connected to the frame by means of structural adhesive, while the output interface is an aluminium component machined with a geometry which is the negative of that of the input disk. The thickness of the carbon fibre frame has been dimensioned to 1.5 mm, which guarantees sufficient strength of the structure when the nominal load is applied to the fully extended arm, and has been validated by means of finite element analysis (FEM). The achievable range of motion for this joint is [0, 2π] rad, although its motion can be constrained on demand to shorter ranges by means of mechanical locks. The total weight of the unit is mS = 2.2 kg. The fully assembled module is shown in Fig. 7a.

Figure 5. (a) Open view (greyed components: actuation system, electronics, input-output interfaces) and (b) semi-exploded cross-section view of the longitudinal actuation module (greyed components: piezoelectric stacks of the VPDA, springs of the compliant module, frictional surfaces of the VPDA). For more details on this design, please refer to [8].

2) Transverse actuation module: In the transverse configuration the frame has been designed to allow the replication of the kinematic configuration mentioned previously. As in the longitudinal module, the actuator is supported on both sides by means of the base and front flanges, for the same reasons previously explained, Fig. 6. The input and output mechanical interfaces present the same geometry as those of the longitudinal module in order to fulfil the modularity requirement and permit arbitrary interconnections between the modules. In this case the thickness of the carbon fibre structure is again 1.5 mm, the maximum achievable range of motion for this joint is [−π/2, π/2] rad, and its total weight is mT = 2.4 kg. The fully assembled prototype is shown in Fig. 7a, and its specifications are reported in Tab. I together with those of the longitudinal module.


at the output of the harmonic drive gear to measure the joint torque. Each module is provided with a motor control board (Fig. 7b) used for sensor data acquisition, control of the whole unit, and generation of the command voltages for the brushless DC motor and for the piezoelectric driver amplifier board (Fig. 7c). The latter board amplifies the control signals for the piezo stack actuators of the friction damper to the nominal voltage range of [0, 150] V required by the piezos. Communication with the controller is performed through a 100 Mbit Ethernet interface using both the TCP and UDP protocols.

Figure 6. Open view of the transverse actuation module (greyed components: actuation system, electronics, input-output interfaces). For the mechanical design of the actuation system refer to Fig. 5b and to [8].

TABLE I. SPECIFICATIONS OF THE LONGITUDINAL AND TRANSVERSE ACTUATION MODULES

Parameter                                        Value
Gear ratio (N)                                   100
Power                                            190 W
Maximum output continuous torque                 40 Nm
Maximum joint velocity                           10.7 rad/s (≈ 610°/s)
Maximum rotary passive deflection                ±0.18 rad
Maximum damping torque                           9 Nm
Joint stiffness                                  188 Nm/rad
Joint damping range                              [0, 9] Nms/rad
Inherent joint viscous damping                   0.25 Nms/rad
Total mass of the longitudinal actuator (mS)     2.2 kg
Total mass of the transverse actuator (mT)       2.4 kg

Figure 7. (a) Longitudinal (left) and transverse (right) actuation modules, (b) actuator control board and (c) piezo amplifier board.

C. Sensing and electronics

Regarding the available sensors, both modules are equipped with an incremental 12-bit position encoder placed on the motor shaft and two magnetic 12-bit absolute encoders to monitor the compliant module deflection angle and the mechanical angle of the motor after the reduction drive. A strain-gauge based custom-made force sensor is machined on the actuator structure to permit the measurement of the force applied by the piezo actuators, whereas a custom-made torque sensor exploiting the same sensing principle is assembled

IV. MANIPULATOR SPECIFICATIONS

Among the most important issues to be addressed when considering an anthropomorphic/humanoid design is the definition of the basic layout of the robot. This phase includes the definition of the kinematics and dimensions of the arm which guarantee an appropriate nominal payload while permitting an overall natural and proportioned design. The size of the manipulator is approximately that of an adult human arm. Regarding the kinematics, three degrees of freedom are located at the shoulder complex, complemented by an additional degree of freedom implementing the elbow flexion-extension DOF. The first three DOFs reproduce, in sequence, the arm extension-flexion, adduction-abduction and humeral roll degrees of freedom, in accordance with the ball-and-socket kinematic shoulder model [19]. The elbow pronation-supination degree of freedom is not implemented in this first manipulator version, since it plays a less important role in physical human-robot interaction and therefore the inclusion of compliance (and related variable physical damping) is likely not strictly necessary for this specific degree of freedom. The average ranges of motion of the adult human arm are used as a starting point for the selection of the moveable ranges of each joint of the manipulator. These are reported in Tab. II, [19](5).

A. Static analysis

Although an arm with dimensions exactly equal to those of a 50th percentile male could be designed by means of the modules presented in Section III, these dimensions have been increased with the purpose of making the manipulator layout proportioned while guaranteeing that the actuators are capable of delivering the required torque. Statically, the configurations with the fully extended arm in the transverse plane are the worst-case layouts, since the arm centre of mass is most distal, maximizing the lever arm of the moment produced by gravity.
The worst-case load scenarios of the manipulator are shown in Fig. 8. The two cases of Fig. 8 are equivalent due to the properties of the roll-pitch-roll joint employed for the development of the shoulder, and this allows finding a unique solution to the dimensioning problem.

(5) Note that the ranges of motion reported in Tab. II have been slightly extended with respect to those of the human arm.


assembly of a hand/gripper at the robot end effector still leaving some load margin. The overall dimensions, mass and nominal payload of the manipulator are reported in Tab. II.

Figure 8. Worst-case load configurations of the arm. In configuration (a) the shoulder adduction-abduction actuator receives the maximum load, while in (b) the most stressed actuator is that of the flexion-extension DOF.

B. Dynamic and interaction analysis

Using the determined kinematic layout and the physical properties (e.g. mass, stiffness) reported in Tabs. I and II as a baseline, dynamic simulations have been carried out to give an indication of the typical torque/velocity profiles required of the employed actuation system when the arm performs a task requiring physical interaction. A further mass of 2 kg has been applied to the end effector to simulate a load and/or the mass of a robotic hand/gripper. As an example case, we analyse here the previously studied task of a manipulator writing on a blackboard, [21], Fig. 9.

The equilibrium equation of the static torque due to gravitational forces at the critical actuators (greyed in Fig. 8) is:

T = L_1COG m_S g + L_SE m_T g + (L_SE + L_EW) m_pload g    (6)

where T is the torque applied to the critically loaded actuators and L_1COG is the distance between the point of intersection of the axes of the DOFs of the shoulder (defined as p_SH) and the centre of gravity of the humeral roll actuator, given by the dimensions of the mechanical assembly of the longitudinal and transverse actuation modules. L_SE represents the distance between p_SH and the centre of gravity of the elbow flexion-extension DOF, g is the acceleration of gravity, and P = 1.8 is the ratio between the total arm and humerus lengths (P = (L_SE + L_EW)/L_SE, using the notation of Fig. 8) of a 50th percentile adult male [20].

TABLE II. SPECIFICATIONS AND RANGES OF MOTION (ROM) OF THE MANIPULATOR

Parameter                                                  Value
Arm base – shoulder axes intersection distance (LBS)       193·10⁻³ m
Shoulder axes intersection – elbow axis distance (LSE)     393·10⁻³ m
Elbow axis – wrist axes intersection distance (LEW)        360·10⁻³ m
Total weight (marm)                                        9.9 kg
Nominal static load at the EE @ extended arm (mpload)      4 kg
Arm extension-flexion ROM                                  [−45, 180]°
Arm adduction-abduction ROM                                [0, 170]°
Humeral roll internal-external rotation ROM                [−90, 90]°
Elbow flexion-extension ROM                                [0, 145]°

Setting the payload to m_pload = 4 kg, considering that the critical actuators are loaded at their maximum torque value (Tab. I), and solving (6) with respect to L_SE, it is possible to obtain the value of this parameter, while the total arm length can be set by means of the scaling factor P:

L_SE + L_EW = P L_SE    (7)

where L_EW is the distance between the elbow and the wrist axes. For the specified payload, the layout of the arm is scaled up by approximately 40% with respect to the dimensions of a 50th percentile human arm [20]. At the same time, such a nominal payload will permit the
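As a sanity check of the dimensioning procedure, Eqs. (6) and (7) can be solved for L_SE directly. The sketch below uses the masses and torque limit from Tabs. I and II, but L_1COG = 0.10 m is a hypothetical placeholder, since the paper only states that this distance is fixed by the module assembly:

```python
G = 9.81  # gravitational acceleration [m/s^2]

def solve_arm_lengths(T_max, m_S, m_T, m_pload, L_1COG, P=1.8):
    """Solve Eq. (6) for L_SE given the maximum actuator torque.

    Eq. (6): T = L_1COG*m_S*g + L_SE*m_T*g + (L_SE + L_EW)*m_pload*g
    Eq. (7): L_SE + L_EW = P * L_SE, so L_EW = (P - 1) * L_SE.

    Substituting (7) into (6) and isolating L_SE gives:
        L_SE = (T - L_1COG*m_S*g) / (g * (m_T + P*m_pload))
    """
    L_SE = (T_max - L_1COG * m_S * G) / (G * (m_T + P * m_pload))
    L_EW = (P - 1.0) * L_SE
    return L_SE, L_EW

# m_S, m_T, T_max from Tab. I; m_pload from Tab. II; L_1COG is assumed.
L_SE, L_EW = solve_arm_lengths(T_max=40.0, m_S=2.2, m_T=2.4,
                               m_pload=4.0, L_1COG=0.10)
```

With these assumed values L_SE comes out close to 0.40 m, of the same order as the 393 mm reported in Tab. II.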

Figure 9. (a) Snapshot sequence of the simulation animation (manipulator 3D model in black and blackboard in grey) and (b) torques required for the execution of the task. The acronyms SFE, SAA, HRO, EFE represent the Shoulder Flexion-Extension, Shoulder Abduction-Adduction, Humeral Roll and Elbow Flexion-Extension DOFs, while ti is the instant when interaction begins.

The joint trajectory data obtained from the experiments in [21](6) were imposed on the motor positions of the manipulator model to obtain the required joint torques (which result from gravity, dynamics and interaction). A virtual rigid wall simulating the blackboard was implemented in the simulator to generate normal reaction forces on the manipulator end effector when contact occurred. The joint torques required for the execution of this task are shown in Fig. 9b. Before interaction (which occurs at ti = 2.2 s), the required torques reach relatively small magnitudes, with a maximum amplitude of approximately 5 Nm in the shoulder abduction-adduction joint; in this phase the actuators compensate only for the gravity and inertial torques. After this instant the torques increase in magnitude due to the increased effort required by the interaction. It is important to notice that the torques of Fig. 9b correspond to the effort required to perfectly track the desired motor positions and therefore can represent a scenario

(6) We use this example case because the Barrett WAM arm employed for the experiments in [21] presents a kinematic layout very similar to that of the arm presented in this paper.


where high gain motor position controllers are implemented. This hence constitutes a worst-case situation, since robots designed for interaction should employ safety-oriented controllers which can adapt the robot configuration to limit the interaction force/torque. The maximum output velocity required of the actuators in the execution of this task is 0.42 rad/s, reached by the elbow joint. The obtained maximum velocities and torques are much lower than the nominal values which can be generated by the actuators (Tab. I), demonstrating that such an interaction task can be executed by the presented system with large torque and velocity safety margins.
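The virtual rigid wall used in the simulation above can be approximated by a stiff unilateral spring; the minimal sketch below is an assumption about how such a wall is commonly modeled, and the stiffness value is illustrative, not taken from the paper:

```python
def wall_reaction_force(x_ee, x_wall, k_wall=1.0e4):
    """Normal reaction of a virtual rigid wall, modeled as a stiff
    unilateral spring: the wall pushes back only while the end effector
    penetrates its plane (x_ee > x_wall), otherwise no force is applied.

    x_ee   : end-effector coordinate along the wall normal [m]
    x_wall : wall plane coordinate [m]
    k_wall : illustrative wall stiffness [N/m], not a paper value
    """
    penetration = x_ee - x_wall
    return -k_wall * penetration if penetration > 0.0 else 0.0
```

The one-sided condition is what makes the contact unilateral: the wall can only push, never pull, so the reaction vanishes as soon as the end effector leaves the surface.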

V. MANIPULATOR ASSEMBLY AND KINEMATICS

A roll-pitch-roll “spherical” joint made of the longitudinal-transverse-longitudinal series forms the shoulder of the manipulator. Assembled to this, a hollow carbon fibre part defined as the “Humerus” connects the shoulder to the elbow. The Humerus output interface is inclined by 50 degrees with respect to its input flange to meet the range of motion specifications, Tab. II. The output flange of the Humerus is connected to the input interface of another transverse actuator, which implements the elbow flexion-extension degree of freedom. Finally, a hollow part implements the manipulator forearm, completing the design of the arm.

VI. EXPERIMENTAL RESULTS

Preliminary results are presented in this section as a proof of concept of the employed actuation system applied to the presented design. A weight of mass 3 kg was assembled at the link extremity of an experimental prototype version of the actuation modules shown in Fig. 7 in order to replicate a moment of inertia of JL = 0.13 kg·m², which simulates the load of a possible robot configuration (it is about one third of the highest moment of inertia which can be reached by the manipulator, i.e. that seen by the proximal actuator for the fully extended arm, Jarm = 0.42 kg·m² ≈ 3JL). A step position reference of amplitude 0.15 rad (8.6°) was sent to the position-controlled motor in order to induce oscillations to be damped by the semi-active damper, while the link position was recorded, Fig. 11. It is important to remark that no controllers shaping the dynamics (e.g. impedance/admittance control) were implemented: the motor was controlled using pure position control in order to show the action of the full module with the variable physical damper, which is regulated using the viscous damping control scheme proposed in [8]. At the same time, two levels of desired joint physical damping (corresponding to damping ratios of ζ1 = 0.3 and ζ2 = 1(7)) were sent to the damping system in order to show the ability of the damping module to dynamically replicate the desired joint physical damping, Fig. 11. The experimental data obtained in Fig. 11a have been compared with simulation results to validate the effectiveness of the system mechatronics.

Figure 10. Kinematics of the manipulator and reference frames according to the Denavit-Hartenberg convention. The frame shown at the manipulator base is used as the “home” configuration for the DH parameters.

TABLE III. D-H PARAMETERS OF THE MANIPULATOR

Link ID    d       θ            a        α
1          LBS     (3/2)π+q2    0        (3/2)π
2          L12     (3/2)π−q2    0        (3/2)π
3          LSE     (3/2)π+q2    LSE_H    (3/2)π
4          0       q4           LEW      0

This latter part is designed to permit the future integration of a robotic hand. The full arm design is shown in Fig. 1. A schematic summarizing the kinematics of the manipulator is shown in Fig. 10, and the Denavit-Hartenberg parameters are reported in Table III. The ranges of motion reported in Tab. II are implemented by means of a mechanical pin-based locking mechanism employed in each joint of the arm.
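The D-H parameters of Table III, with columns ordered (d, θ, a, α), can be chained into a base-to-end-effector transform using the standard convention. The following generic sketch is not tied to the paper's implementation (none is given) and uses plain lists to stay dependency-free:

```python
import math

def dh_transform(d, theta, a, alpha):
    """Homogeneous transform for one link, standard Denavit-Hartenberg
    convention: Rot_z(theta) * Trans_z(d) * Trans_x(a) * Rot_x(alpha)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ]

def mat_mul(A, B):
    """4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_kinematics(dh_rows):
    """Chain per-link transforms base-to-end-effector.
    dh_rows: iterable of (d, theta, a, alpha) tuples, as in Table III."""
    T = [[float(i == j) for j in range(4)] for i in range(4)]
    for row in dh_rows:
        T = mat_mul(T, dh_transform(*row))
    return T
```

With the numeric link lengths of Tab. II substituted into the Table III rows, this function returns the 4x4 pose of the forearm tip in the base frame.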

Figure 11. (a) Recorded link position step responses of the actuation system for different levels of joint physical damping, and overplots of the experimental results shown in (a) with the simulation results for the (b) high and (c) low damping cases.

The simulated link position data were obtained by using the motor trajectory recorded in the experiments of Fig. 11a as a flow source for an equivalent mass-spring-variable-damper system which simulates the joint-link assembly. These data have been plotted together with the experimental link position for both a low (D1 = 1.48 Nms/rad, ζ1 = 0.3) and high

(7) Note that this quantity is intended in this paper as the equivalent damping ratio of the mass-spring-damper system formed by the damped compliant joint and the link inertia (locked system).


(D2 = 9.89 Nms/rad, ζ2 = 1) joint viscous damping level, Figs. 11b and 11c, respectively. The agreement between simulated and experimental data is clear; however, nonlinearities such as dry friction due to the bearings and assembly have not been included in the simulation model, which introduces slight estimation errors.
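For reference, the quoted damping ratios can be related to the physical damping through the standard second-order relation ζ = D/(2·√(K·J)) for the locked joint-link system (footnote 7); this relation is our assumption, not an equation given in the paper. With K = 188 Nm/rad (Tab. I) and JL = 0.13 kg·m², the critical damping comes out at about 9.89 Nms/rad, matching the reported high-damping setting D2:

```python
import math

K_JOINT = 188.0   # joint stiffness [Nm/rad], from Tab. I
J_LINK = 0.13     # link moment of inertia [kg m^2], from Section VI

def damping_ratio(D, K=K_JOINT, J=J_LINK):
    """Equivalent damping ratio of the locked joint-link
    mass-spring-damper system: zeta = D / (2 * sqrt(K * J)).
    (Assumed second-order relation, see lead-in.)"""
    return D / (2.0 * math.sqrt(K * J))

# Damping level at which the locked system is critically damped (zeta = 1):
D_critical = 2.0 * math.sqrt(K_JOINT * J_LINK)
```

Since D_critical falls within the [0, 9.89] Nms/rad range the damper can replicate, the module can span underdamped to critically damped behaviour for this load.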


VII. CONCLUSIONS AND FUTURE WORK


This paper presented the analysis and development of the CompAct™ Arm, a highly integrated compliant manipulator with intrinsic variable physical damping. The results obtained in the first part of the paper demonstrated that joint physical damping has a significant positive effect on the dynamic performance, accuracy and efficiency (in the high frequency domain) of compliant robots. This motivated the employment of a piezoelectrically actuated semi-active friction damper in parallel with the transmission compliance of the actuation system. The resulting manipulator is probably the first robotic system to exhibit these rich bioinspired physical properties, making a step towards the development of robots with enhanced capabilities and performance approaching those of biological systems, and in particular of the human. In addition, this anthropomorphic manipulator presents low weight (thanks to the compact design and carbon fibre frame) and joint compliance at all DOFs, with the objective of realising high interaction-related performance and enhanced robustness and safety. It is developed following a modular design concept, i.e. the use of highly integrated actuation modules which embed the whole actuator mechatronics. The obvious advantage of such a design approach is its versatility, which facilitates the development of further robots (e.g. full humanoids) employing the two presented modules. The intrinsic characteristics of the presented system (light weight, joint compliance and variable physical damping) will make this robot able to interact with the environment while presenting robustness, good accuracy and dynamic performance. Further experiments will be carried out on the full manipulator to show the benefits of the introduced damping, as demonstrated in the first part of the paper, and to fully characterise the effects of damping on safety.
Future developments will also include the design of novel control strategies intended to fully exploit the functionalities and advantages of the incorporated semiactive damping system.


ACKNOWLEDGMENT


This work is supported by the European Commission project SAPHARI FP7-ICT-287513.


REFERENCES

[1] A. Jafari, N. G. Tsagarakis, and D. G. Caldwell, "Exploiting natural dynamics for energy minimization using an actuator with adjustable stiffness (AwAS)," in International Conference on Robotics and Automation, Shanghai, China, 2011.


[2] M. G. Catalano, G. Grioli, M. Garabini, F. Bonomo, M. Mancini, N. G. Tsagarakis, and A. Bicchi, "VSA-CubeBot: a modular variable stiffness platform for multi degrees of freedom systems," in International Conference on Robotics and Automation, Shanghai, China, 2011.
[3] A. M. Dollar and R. D. Howe, "A robust compliant grasper via shape deposition manufacturing," IEEE/ASME Transactions on Mechatronics, vol. 11, pp. 154-161, 2006.
[4] M. Laffranchi, N. G. Tsagarakis, F. Cannella, and D. G. Caldwell, "Antagonistic and series elastic actuators: a comparative analysis on the energy consumption," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), St. Louis, USA, 2009, pp. 5678-5684.
[5] A. Bicchi and G. Tonietti, "Fast and soft arm tactics," IEEE Robotics & Automation Magazine, vol. 11, 2004.
[6] A. De Luca and W. Book, "Robots with flexible elements," in Handbook of Robotics, Springer, 2008.
[7] M. Laffranchi, N. G. Tsagarakis, and D. G. Caldwell, "A variable physical damping actuator (VPDA) for compliant robotic joints," in IEEE International Conference on Robotics and Automation (ICRA), Anchorage, USA, 2010, pp. 1668-1674.
[8] M. Laffranchi, N. Tsagarakis, and D. G. Caldwell, "A compact compliant actuator (CompAct™) with variable physical damping," in IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 2011, pp. 4644-4650.
[9] T. E. Milner and C. Cloutier, "Damping of the wrist joint during voluntary movement," Experimental Brain Research, vol. 122, pp. 309-317, 1998.
[10] H. Gomi and R. Osu, "Task-dependent viscoelasticity of human multijoint arm and its spatial characteristics for interaction with environments," J. Neuroscience, vol. 18, pp. 8965-8978, 1998.
[11] T. Tsuji, Y. Takeda, and Y. Tanaka, "Analysis of mechanical impedance in human arm movements using a virtual tennis system," Biological Cybernetics, vol. 91, pp. 295-305, 2004.
[12] C. Chee-Meng, H. Geok-Soon, and Z. Wei, "Series damper actuator: a novel force/torque control actuator," in 4th IEEE/RAS International Conference on Humanoid Robots, 2004.
[13] M. Laffranchi, N. G. Tsagarakis, and D. G. Caldwell, "Analysis and development of a semiactive damper for compliant actuation systems," IEEE/ASME Transactions on Mechatronics, 2012.
[14] M. Laffranchi, N. G. Tsagarakis, and D. G. Caldwell, "Safe human robot interaction via energy regulation control," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), St. Louis, USA, 2009, pp. 35-41.
[15] J. Y. Lew and S. M. Moon, "A simple active damping control for compliant base manipulators," IEEE/ASME Transactions on Mechatronics, vol. 6, pp. 305-310, 2001.
[16] http://www.comau.it/.
[17] http://www.barrett.com/robot/index.
[18] M. Hirose and K. Ogawa, "Honda humanoid robots development," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 365, pp. 11-19, 2007.
[19] V. M. Zatsiorsky, Kinematics of Human Motion. Leeds, UK: Human Kinetics Pub., 1998.
[20] A. R. Tilley and H. D. Associates, The Measure of Man and Woman: Human Factors in Design. Whitney Library of Design, 1993.
[21] A. Pistillo, S. Calinon, and D. G. Caldwell, "Bilateral physical interaction with a robot manipulator through a weighted combination of flow fields," in International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 2011.

Robust Loop Closing Over Time

Yasir Latif, Cesar Cadena and José Neira
Instituto de Investigación en Ingeniería de Aragón, I3A
Universidad de Zaragoza, Zaragoza, Spain
Email: {ylatif, ccadena, jneira}@unizar.es

Abstract—Long term autonomy in robots requires the ability to reconsider previously taken decisions when new evidence becomes available. Loop closing links generated by a place recognition system may become inconsistent as additional evidence arrives. This paper is concerned with the detection and exclusion of such contradictory information from the map being built, in order to recover the correct map estimate. We propose a novel consistency based method to extract the loop closure regions that agree both among themselves and with the robot trajectory over time. We also assume that the contradictory loop closures are inconsistent among themselves and with the robot trajectory. We support our proposal, the RRR algorithm, on well-known odometry systems, e.g. visual or laser, using the very efficient graph optimization framework g2o as back-end. We back our claims with several experiments carried out on real data.

I. INTRODUCTION

The ability to detect that past place recognition decisions (also known as the loop closing problem) were incorrect, and to recover the accurate state estimate once they are removed, is crucial for lifelong mobile robot operation. No matter how robust a place recognition system might be, there always exists the possibility of getting just one false positive loop closing hypothesis. This can be catastrophic for any estimation algorithm. Only by collecting more evidence over time can we detect and correct these mistakes in place recognition. Reliability increases with redundancy during the verification process. Consider long term navigation, see Fig. 4(b) (best viewed in color), where information is collected session by session and place recognition is used to improve the estimation after every session. Failures may happen, but with the arrival of new evidence and the reconsideration of decisions taken earlier, the estimation can be corrected. To achieve this, we need to realize that the place recognition system has generated wrong constraints, remove them if necessary, and recompute the estimation. We propose a novel consistency method based on consensus: wrong loop closure constraints are usually in contradiction among themselves, with the correct ones, and with the sequential movement constraints (odometry) of the robot. At the same time, the correct ones form a consensus among themselves and with the odometry. Our method, the Realizing, Reversing, Recovering (RRR) algorithm, works with the pose graph formulation. It is part of the back-end of a SLAM system and is therefore independent of the type of sensor used for odometry or place recognition.

All our method needs is a system that is able to generate a pose graph for the sequential poses and a place recognition system that can provide constraints for loop closure. In the next section, we discuss relevant work and highlight the need for a robust loop closing method. In section III we detail our proposal, the RRR algorithm, and carry out real experiments in section IV. Finally, in sections V and VI discussion and conclusions about the work are presented, along with possible future work.

II. RELATED WORK

Several approaches to persistent mapping have appeared in the robotics literature in the recent past. Konolige and Bowman [8] presented a stereo visual odometry system that works together with a bag of words place recognition system to build multiple representations of dynamic environments over time. Multiple map stitching, recovery from odometry failures, loop closing, and global localization all rely on the robustness of the place recognition system. "Weak links" are removed when place recognition is able to close loops, making the approach prone to errors when place recognition closes loops wrongly. Similarly, McDonald et al. [11] presented a multi-session 6 DOF visual SLAM system using "anchor nodes". In their approach place recognition is assumed to be perfect and its output is trusted every time. Sibley et al. [17] presented a relative bundle adjustment approach for large scale topological mapping. They show an example of mapping from London to Oxford, over a trajectory of about 121 km. They also use appearance-based place recognition and are therefore in danger of making wrong decisions in case of incorrect place recognition. All of these approaches to large scale and persistent mapping rely on a place recognition system with zero false positives. Appearance-based loop closing approaches that work in real time usually follow the bag of words approach [18].
FabMap, from Cummins and Newman [3], uses a bag of words approach with probabilistic reasoning that has been shown to work robustly over a long trajectory. The BoW-CRF system [1] generates loop closings using a bag-of-words and carries out verifications using conditional random fields. This improves the robustness of the system. In any case, neither approach can guarantee 100% accuracy. Olson [12] proposed a hypothesis verification method for loop closure constraints using graph partitioning based on spectral clustering. Another approach is to delay decision making and maintain multiple


topologies of the map with an associated belief for each one. Ranganathan and Dellaert [14] follow this approach using a Rao-Blackwellized particle filter. However, they do not explicitly show how their system is affected by and recovers from wrong loop closures. Their method is also unique in the sense that it uses the estimation process itself to reason about possible loop closures. A similar estimation-based reasoning approach using the pose graph formulation was presented by Sünderhauf and Protzel [20], in which a robust SLAM back-end using "switch factors" was proposed. The central idea is to penalize, during graph optimization, those loop closure links that deviate from the constraints they suggest between two nodes. Similar to our approach, they change the topological structure of the graph based on the identification and rejection of wrong loop closures. In contrast, however, they assess the validity of every loop closure on its own, without forming a general consensus using all the available information. In cases where a number of hypotheses suggest the same but wrong loop closings (for example due to perceptual aliasing in long corridors), overall consensus helps us in the rejection of such outliers. Their method also suggests a continuous function governing the state of the "switch factors", which does not make sense in all cases.

III. OUR PROPOSAL: THE RRR ALGORITHM

Since place recognition is a very important step in map estimation, there is a need for a robust mechanism that can identify incorrect place recognition links. We propose a robust consistency-based loop closure verification method using the pose graph formulation, based on the observation that correct loop closures in conjunction with odometry can help in the detection of wrong loop closures. Our method follows the line of work in which the estimation process itself is used in making the distinction between correct and false loop closures.
Thanks to the maturity of the Simultaneous Localization and Mapping (SLAM) problem, in recent years several very efficient estimation methods have been published, for instance iSAM by Kaess et al. [7], HOG-Man by Grisetti et al. [6], and g2o by Kümmerle et al. [10]. The graph-based formulation for SLAM, the so-called "graph-SLAM", models robot poses as nodes in a graph where links from odometry or loop closures form edges or "constraints". Let x = (x_1 \ldots x_n)^T be a vector of parameters that describe the configuration of the nodes. Let \omega_{ij} and \Omega_{ij} be the mean and the information of the observation of node j from node i. Given the state x, let f_{ij}(x) be the function that calculates the predicted (noise-free) observation according to the current state. The residual r_{ij} can then be calculated as:

    r_{ij}(x) = \omega_{ij} - f_{ij}(x)    (1)

Constraints can be introduced either by odometry, giving sequential constraints (j = i + 1), or by the place recognition system, giving non-sequential constraints. The amount of error introduced by each constraint, weighted by its information, can be calculated as:

    d_{ij}(x)^2 = r_{ij}(x)^T \Omega_{ij} r_{ij}(x)    (2)

and therefore the overall error, assuming all the constraints to be independent, is given by:

    D^2(x) = \sum_{ij} d_{ij}(x)^2 = \sum_{ij} r_{ij}(x)^T \Omega_{ij} r_{ij}(x)    (3)

The solution to the graph-SLAM problem is the state x^* that minimizes the overall error:

    x^* = \operatorname{argmin}_x \sum_{ij} r_{ij}(x)^T \Omega_{ij} r_{ij}(x)    (4)
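As an illustration, the residual and error terms in Eqs. (1)-(3) can be sketched for a toy one-dimensional pose graph, where the predicted observation f_ij(x) is simply x_j - x_i (the numeric values below are made up for illustration):

```python
import numpy as np

# Toy 1-D pose graph: each constraint is (i, j, omega_ij, Omega_ij), where
# omega_ij is the observed displacement and Omega_ij its (scalar) information.
def residual(x, i, j, omega_ij):
    # Eq. (1): r_ij(x) = omega_ij - f_ij(x), with f_ij(x) = x_j - x_i in 1-D
    return omega_ij - (x[j] - x[i])

def constraint_error(x, i, j, omega_ij, Omega_ij):
    # Eq. (2): d_ij(x)^2 = r_ij(x)^T Omega_ij r_ij(x)
    r = residual(x, i, j, omega_ij)
    return float(r * Omega_ij * r)

def total_error(x, constraints):
    # Eq. (3): D^2(x) = sum over all constraints of d_ij(x)^2
    return sum(constraint_error(x, *c) for c in constraints)

x = np.array([0.0, 1.0, 2.1])        # node configuration
constraints = [(0, 1, 1.0, 4.0),     # odometry links
               (1, 2, 1.0, 4.0),
               (0, 2, 2.0, 1.0)]     # a loop closure link
D2 = total_error(x, constraints)     # ~0.05 for these values
```

Minimizing D²(x) over x, e.g. with Gauss-Newton as in Eq. (4), would then yield the optimal configuration x*.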

Iterative approaches such as Gauss-Newton or Levenberg-Marquardt can be used to compute the optimal state estimate [10]. We divide the constraints into two sets: S contains sequential links and R contains links from place recognition. Since all constraints are mutually independent, the error in (3) can be written as:

    D^2(x) = \sum_{(i,j) \in S} d_{ij}(x)^2 + \sum_{(i,j) \in R} d_{ij}(x)^2    (5)

We further divide the set R into n disjoint subsets R_k, where each subset only contains topologically related constraints (sequences of links that relate similar portions of the robot trajectory), such that R = \cup_{k=1}^{n} R_k and R_i \cap R_j = \emptyset for all i \neq j. We term each of these subsets a "cluster". The error for the set R can then be written as:

    \sum_{(i,j) \in R} d_{ij}(x)^2 = \sum_{c=1}^{n} \sum_{(i,j) \in R_c} d_{ij}(x)^2 = \sum_{c=1}^{n} d_{R_c}(x)^2    (6)

where d_{R_c}(x)^2 is the error contributed by the c-th subset. This simply means that the overall error introduced by the place recognition constraints is the sum of the individual errors of each cluster. Assuming that we do not have any outliers in odometry, the error in (3) is caused practically only by the loop closing links. Once we iterate to find the optimal state, the error in the odometry is no longer zero. This increase in odometry error gives us a measure of the metric change that must take place so that the graph conforms to the place recognition constraints. This error will be smaller when the corresponding place recognition constraints are correct, because a comparatively smaller metric change is needed than the change required by wrong constraints. Moreover, clusters that suggest the same change cause smaller errors among them than conflicting clusters do. By measuring how much error the clusters introduce, we can detect which clusters agree with each other. A point to note here is that even though odometry drifts with time, it is still a useful measure of the underlying topology of the graph. Consider the case when one of these clusters is the only place recognition constraint in the graph. In the absence of other clusters, this cluster can make the odometry converge to an acceptable solution, where acceptable means that the overall errors are within some statistical threshold. If this cluster alone deforms the graph enough to make the solution unacceptable,


it is highly likely that this cluster is suggesting a wrong place recognition decision. This forms the basis of our intra-cluster test. Mathematically, for any cluster R_i to be individually consistent, two conditions must hold after optimizing the graph using only the odometry links and the links in R_i: the overall error

    D_G^2(x) = \sum_{(i,j) \in S} r_{ij}(x)^T \Omega_{ij} r_{ij}(x) + \sum_{(i,j) \in R_i} d_{ij}(x)^2

must lie within its statistical (chi-squared) bound, and the error d_{R_i}(x)^2 contributed by the cluster itself must lie within its corresponding bound.

[Algorithm 2 (pseudocode): clusters found jointly consistent with goodSet are added to it; clusters checked in the last iteration and found inconsistent are placed in rejectSet and omitted from the next optimization.]

To establish which runner would end up in the second position, we ask the winner not to participate in the race and conduct the race again. We continue asking the winner each time to join the previous winners, to see who will take the next spot. The winners in our case are the members of the goodSet. This is shown in Algorithm 2. As long as we keep finding clusters that are jointly consistent with the goodSet, it will grow. An important point to mention here is the use of the rejectSet, which contains all the clusters that we checked in the last iteration but found to be inconsistent with the goodSet. We omit them from the optimization process.

A. Comparisons

In the experiments, the RRR algorithm is used in combination with a weak place recognition algorithm (BoW) that is known to exhibit many false positives. In this way, we test the robustness of RRR to incorrect loop closing links. We compare our method with a branch and bound search (BBS) traversal of the whole solution space, which is computationally very demanding but guaranteed to find the largest subset of hypotheses that is jointly consistent (according to Algorithm 2). We also compare with Olson's method [12] and a modified version of it (see below). This is for verification within a single session. Olson's method is mainly used for outlier rejection and for finding inliers that are in some sense consistent with each other. Without any modifications, and calculating the consistency just within the "groups" (that we call clusters), the method has very low recall. We believe this happens because a few very well connected clusters prevent other correct, but comparatively weakly connected, clusters from entering the final solution. In the modified version, we calculate the consistency between all the given loop closure links. This makes the algorithm unreliable because Dijkstra links are derived from long trajectory links. At the same time, the adjacency criterion takes a long time to compute. The only advantage is that it dilutes the effect of a few very well connected clusters. The results for this method are shown in Fig. 1. The method accepts some false positive loop closures mainly for the aforementioned reason.
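The incremental construction of the goodSet in Algorithm 2 can be sketched as follows, where `jointly_consistent` is a hypothetical callback standing in for graph optimization plus the chi-squared checks over a candidate set of clusters:

```python
def expand_good_set(clusters, jointly_consistent):
    """Greedy 'race' of Algorithm 2 (sketch): grow goodSet with clusters that
    are jointly consistent with it; clusters rejected in the last round are
    kept out of the next optimization but may be re-examined once goodSet
    has grown."""
    good_set, reject_set = set(), set()
    while True:
        candidates = set(clusters) - good_set - reject_set
        if not candidates:
            break
        accepted = {c for c in candidates
                    if jointly_consistent(good_set | {c})}
        if accepted:
            good_set |= accepted
            reject_set = set()        # rejected clusters get another chance
        else:
            reject_set |= candidates  # nothing accepted: stop re-checking these
    return good_set

# Example: cluster 'c' is inconsistent with everything else.
consistent = lambda s: 'c' not in s
winners = expand_good_set({'a', 'b', 'c'}, consistent)   # -> {'a', 'b'}
```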


Fig. 1. Experiment used for comparisons against other possible approaches, over one session of the Bicocca campus (Fig. 4(b)). (a) Loop closures from a BoW system: Pr = 0.69, Re = 0.85, RMSE = 55.5 m. (b) Output of Olson's method: Pr = 1.0, Re = 0.42, RMSE = 3.0 m. (c) Modified Olson's method's output: Pr = 0.98, Re = 0.78, RMSE = 1.8 m. (d) Output of our proposal: Pr = 1.0, Re = 0.65, RMSE = 1.0 m. Inset: result of optimization carried out using accepted links. We show the precision (Pr), recall (Re) and root mean square error (RMSE) for each method. (Figure best viewed in colour online at http://roboticsproceedings.org/rss08/p30.html)

When the base approach reaches full precision, our method accepts all the constraints, obtaining the same recall and errors. We test the RRR algorithm on several real datasets. The odometry information comes from visual+wheel, laser, and stereo odometry, for outdoors-Bovisa (Fig. 3(a)), mixed-Bovisa (Fig. 3(b)), and New College (Fig. 3(c)), respectively. We use BoW for place recognition, with geometrical checking (GC) based on finding the rigid-body transformation with stereo correspondences. The results are shown in Fig. 3. It can be seen that our method successfully rejects all the wrong loop closing links.
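Throughout these experiments, loop-closure links are grouped into clusters of temporally adjacent links (a gap threshold of t_g = 10 s is used, as discussed in Section V). A minimal sketch of such time-based grouping, assuming each link carries a timestamp (the `(timestamp, node_i, node_j)` layout is an assumed data structure, not the authors'):

```python
def cluster_by_time(links, t_gap=10.0):
    """Group loop-closure links into clusters: a link joins the current
    cluster if its timestamp is within t_gap seconds of the cluster's
    last link; otherwise it starts a new cluster (sketch)."""
    clusters = []
    for link in sorted(links, key=lambda l: l[0]):
        if clusters and link[0] - clusters[-1][-1][0] <= t_gap:
            clusters[-1].append(link)
        else:
            clusters.append([link])
    return clusters

links = [(0.0, 10, 55), (3.0, 11, 56), (40.0, 30, 90)]
groups = cluster_by_time(links)   # first two links share a cluster
```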

Fig. 2. Effect of the RRR algorithm over the precision (top), recall (middle), and root mean square error (bottom) compared with the base approach (BoW+GC) as we sweep the parameter α− of the BoW system. This is over the same Bicocca session used in Fig. 1. The curves for BBS in each case are superimposed with those of RRR.

BBS traverses the solution space to compute the largest set of clusters of loop closing hypotheses that are jointly consistent. As can be seen in Fig. 2, BBS selects the same clusters as our method, but at the expense of much higher computational complexity. To further evaluate the performance of the RRR algorithm, we sweep the minimum confidence level of acceptance α− of the BoW system, and compute the precision, recall, and root mean square error of the optimized graph for each value of α−. In Fig. 2, we show the results without our method (base) and with our RRR algorithm. RRR obtains full precision over the whole range and better root mean square error than the base approach. It is important to note that although we sacrifice some recall, we obtain better metric accuracy. This shows the disastrous effect of just a single false positive loop closure on the final estimate.

B. Long term operation

The task of long term navigation involves robustly joining multiple sessions into a single map. For this purpose, we use multiple sessions from RAWSEEDS [15]. The dataset contains five overlapping runs in an indoor environment and provides wheel odometry, laser scans, and images from a camera mounted on the robot. In order to use the RRR algorithm, we need some form of odometry as well as the loop closing hypotheses. We use laser scans from the dataset to compute the laser odometry using a simple scan matcher (see Fig. 4(a)), and use BoW plus GC (see Fig. 4(b)) to compute appearance-based place recognition links. These are the inputs to our method. We run our method incrementally on the five sessions. The first iteration of the algorithm works with data from the first session and computes the set of consistent clusters. The optimized first session is shown at the top-most position in Fig. 5; the inset shows the input to this step. This optimized session is used in conjunction with the second session and combined reasoning is done on both. During the second session, due to lack of evidence and odometry error, we accept some false positives, the effect of which can be seen in the 3D plot. During the next session, as more evidence becomes available, we are able to recover from the previous mistake. As can be seen from the successive improvement in the overall estimate, our method is able to take into account


(a) Outdoors (Bovisa campus)

(b) Mix outdoors-indoors (Bovisa campus)

(c) Outdoors (New College)

Fig. 3. Results of the RRR algorithm over different environments. The loop closures are obtained from an appearance based place recognition algorithm. Odometry is in blue, accepted links in green and rejected ones in red. (Figure best viewed in colour)

V. DISCUSSION

(a) Laser odometry given by a simple scan matching.

(b) Loop closing constraints from BoW+GC, α− = 0.15.

Fig. 4. Bicocca multi-session experiment in an indoor environment. (Figure best viewed in colour)

more evidence as it becomes available and incorporate it into the overall map. The last (bottom-most) map in Fig. 5 is the final map estimate based on evidence and combined reasoning on all the sessions. In the same way, we run our method on the 36 sessions of the Lincoln dataset. This dataset provides omnidirectional images and laser scans. We compute the odometry by simple scan matching of the laser returns (see Fig. 6(a)), and the loop closure constraints by taking the best match when comparing the GIST descriptors of the omnidirectional images, as described in [16] (see Fig. 6(b)). The final result of our proposal is shown in Fig. 7. The algorithm takes 40.5s to complete, with no false positives.

The method proposed in this work is applicable to the metric pose graph formulation. Any SLAM system that constructs a metric map can be used in conjunction with our system for loop closure verification over time. The method is not restricted to any type of sensor. The pose graph, for example, can be constructed from wheel, laser, or visual odometry, or any other sensor. Similarly, the method is independent of the mechanism used for place recognition. This gives our method the flexibility to be used together with any SLAM front end that can generate a pose graph and the corresponding place recognition constraints. For timing information, we compare our method again with BBS and Olson's method. BBS explores an interpretation tree which, in the case of our extension to clusters, is a binary tree. The worst-case complexity of BBS is therefore O(2^n), where n is the number of clusters. In our experiment, after clustering and intra-cluster consistency (6s), the algorithm took 12s for a binary tree of depth 20. There are randomized versions to check joint consistency, such as the RJC algorithm proposed by Paz et al. [13] or the 1-point RANSAC algorithm proposed by Civera et al. [2]. They are based on the random sample consensus paradigm [5], and need a minimum number of data points to fit a model. In the case addressed here, with arbitrary trajectories and without prior knowledge of the number of loop closure areas, there is no clear way to determine the minimum number of data points needed to fit the model. For Olson's method, the most expensive operation is the computation of the consistency matrix. For N hypotheses the complexity of calculating the consistency matrix is O(N^2). It should be noted that N, the number of hypotheses, is much larger than the number of clusters n. In our experiment, the running time for this method was 8s. For n clusters, our method needs to carry out n intra-cluster consistency tests.
Moreover, in the worst case, if all the clusters are considered for joint consistency after the first optimization and are not jointly consistent, we need n joint consistency checks. This makes our method linear, O(n), in the number of clusters. The actual cost of carrying out each step, however, depends on the underlying graph and the optimization method


(a) Laser odometry. (b) Loop closure constraints.

Fig. 6. Lincoln dataset. (Figure best viewed in colour)

Fig. 7. Results of our RRR algorithm for long term operation using the 36 sessions of Lincoln dataset. Left: Loop closure constraints accepted. Center and right: Optimized graph as output of our method. (Figure best viewed in colour)

Fig. 5. Results for long term operation using Bicocca Feb-25a, Feb-25b, Feb-26a, Feb-26b and Feb-27a. Inset: Input to current step. 3D plot: Optimized graph as output of our method. (Figure best viewed in colour)

used. For the experiment in Fig. 1, our method takes 8s to calculate the solution, spending 6s of that in clustering and intra-cluster consistency checks. As mentioned above, these two operations can be carried out in real time. Currently, we consider that links less than tg = 10s apart belong to the same cluster, in accordance with the usual frequency of the place recognition algorithm, 1 fps. A smaller value for tg results in more clusters, and a corresponding increase in computational cost, while precision and recall are not affected. A large value can result in low recall, because large clusters containing valid loop closing links could be rejected more frequently. One of the main advantages of our method is that, while finding a solution similar to BBS, the algorithm works in O(n)

as opposed to O(2^n). For the first two sessions of the long term experiment in Fig. 5, BBS takes 1537.6s, compared to our method which runs in 34.8s. For the complete experiment, BBS takes over two days to come up with a solution, while our method takes 314.5s. The long term experiments demonstrate the importance of incorporating new evidence session by session to detect possible previous failures. It is important to mention here that, as the number of sessions continues to increase, further criteria need to be taken into account in order to join and/or forget past evidence. For instance, if we track the performance history of the clusters, we can decide to forget some of them if they are always rejected (as would be the case for perceptual aliasing). Also, we can fuse verified clusters from different times if they are metrically close after the optimization is carried out. Here, we assumed that the errors in odometry links are small. We think that this is a plausible assumption given the higher frame rate of odometry w.r.t. the place recognition rate and the state of the art in odometry techniques. Nevertheless, if a failure occurs, the approaches of "weak links" [9] or "anchor nodes" [11] can be used to separate the experiment into different sessions.

VI. CONCLUSION

The task of long term operation requires that, at any given time, the map be updated using all the available information.


Therefore, a method is required that can reason based on all the available evidence and, as a result, produce the best map estimate in light of that evidence. We therefore use the term "loop closing over time" for methods that are able to reason about the consistency of loop closing hypotheses not only in space but also along the time dimension. In this work, we have presented a novel consistency-based approach to this problem. We support with evidence our claim that the estimation process itself can be used to establish the validity of loop closing hypotheses. We back up our claims with experiments showing the long term operation abilities of our system as well as its robustness when compared to state-of-the-art data association methods. As future work, we will explore extending our algorithm to consider the possibility of incorrect odometry links.

ACKNOWLEDGEMENTS

This research has been funded by the Dirección General de Investigación of Spain under projects DPI2009-13710 and DPI2009-07130, and by DGA-FSE (group T04).

REFERENCES

[1] C. Cadena, D. Gálvez-López, J. D. Tardós, and J. Neira. Robust place recognition with stereo sequences. IEEE Transactions on Robotics, 28(4), 2012. doi: 10.1109/TRO.2012.2189497. To appear.
[2] J. Civera, O. G. Grasa, A. J. Davison, and J. M. M. Montiel. 1-Point RANSAC for extended Kalman filtering: Application to real-time structure from motion and visual odometry. J. Field Robot., 27:609–631, September 2010. doi: 10.1002/rob.v27:5.
[3] M. Cummins and P. Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research, 2010. doi: 10.1177/0278364910385483.
[4] F. Dayoub, G. Cielniak, and T. Duckett. Long-term experiments with an adaptive spherical view representation for navigation in changing environments.
Robotics and Autonomous Systems, 59(5):285–295, 2011.
[5] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981.
[6] G. Grisetti, R. Kümmerle, C. Stachniss, U. Frese, and C. Hertzberg. Hierarchical optimization on manifolds for online 2D and 3D mapping. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 273–278, May 2010. doi: 10.1109/ROBOT.2010.5509407.
[7] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. Leonard, and F. Dellaert. iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering. In IEEE Intl. Conf. on Robotics and Automation, ICRA, Shanghai, China, May 2011.

[8] K. Konolige and J. Bowman. Towards lifelong visual maps. In Intelligent Robots and Systems (IROS), 2009 IEEE/RSJ International Conference on, pages 1156–1163. IEEE, 2009.
[9] K. Konolige, J. Bowman, J. D. Chen, P. Mihelich, M. Calonder, V. Lepetit, and P. Fua. View-based maps. The International Journal of Robotics Research, 29(8):941–957, 2010. doi: 10.1177/0278364910370376.
[10] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A general framework for graph optimization. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), Shanghai, China, May 2011.
[11] J. McDonald, M. Kaess, C. Cadena, J. Neira, and J. J. Leonard. 6-DOF multi-session visual SLAM using anchor nodes. In European Conference on Mobile Robotics, ECMR, 2011.
[12] E. Olson. Recognizing places using spectrally clustered local matches. Robotics and Autonomous Systems, 57(12):1157–1172, December 2009.
[13] L. M. Paz, J. Guivant, J. D. Tardós, and J. Neira. Data association in O(n) for divide and conquer SLAM. In Proc. Robotics: Science and Systems, Atlanta, GA, USA, June 2007.
[14] A. Ranganathan and F. Dellaert. Online probabilistic topological mapping. The International Journal of Robotics Research, 30(6):755–771, May 2011. doi: 10.1177/0278364910393287.
[15] RAWSEEDS. Robotics advancement through Web-publishing of sensorial and elaborated extensive data sets (project FP6-IST-045144), 2009. http://www.rawseeds.org/rs/datasets.
[16] A. Rituerto, A. C. Murillo, and J. J. Guerrero. Semantic labeling for indoor topological mapping using a wearable catadioptric system. To appear in Robotics and Autonomous Systems, special issue on Semantic Perception, Mapping and Exploration, 2012.
[17] G. Sibley, C. Mei, I. Reid, and P. Newman. Vast-scale outdoor navigation using adaptive relative bundle adjustment.
The International Journal of Robotics Research, 29(8):958–980, 2010.
[18] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the International Conference on Computer Vision, volume 2, pages 1470–1477, October 2003.
[19] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The New College vision and laser data set. The International Journal of Robotics Research, 28(5):595–599, May 2009. doi: 10.1177/0278364909103911. URL http://www.robots.ox.ac.uk/NewCollegeData/.
[20] N. Sünderhauf and P. Protzel. BRIEF-Gist - Closing the loop by simple means. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 1234–1241, Sept. 2011.


Optimization-Based Estimator Design for Vision-Aided Inertial Navigation Mingyang Li and Anastasios I. Mourikis Dept. of Electrical Engineering, University of California, Riverside E-mail: [email protected], [email protected] Abstract—This paper focuses on the problem of real-time pose tracking using visual and inertial sensors in systems with limited processing power. Our main contribution is a novel approach to the design of estimators for these systems, which optimally utilizes the available resources. Specifically, we design a hybrid estimator that integrates two algorithms with complementary computational characteristics, namely a sliding-window EKF and EKF-SLAM. To decide which algorithm is best suited to process each of the available features at runtime, we learn the distribution of the feature number and of the lengths of the feature tracks. We show that using this information, we can predict the expected computational cost of each feature-allocation policy, and formulate an objective function whose minimization determines the optimal way to process the feature data. Our results demonstrate that the hybrid algorithm outperforms each individual method (EKF-SLAM and sliding-window EKF) by a wide margin, and allows processing the sensor data at real-time speed on the processor of a mobile phone.

I. INTRODUCTION

In this paper we address the problem of tracking the 3D pose of small, resource-constrained systems in unknown environments. Specifically, we are interested in estimating the motion of miniature devices, similar in size to a mobile phone. In contrast to medium- and large-scale systems (e.g., mobile robots, UAVs, autonomous cars), small devices have limited computational capabilities and battery life, factors which make the pose-estimation problem challenging. In the absence of GPS, the types of sensors that can be used for pose estimation in small-scale systems are quite restricted. In our work, we opt for using visual and inertial measurements. This is motivated by the fact that cameras and inertial measurement units (IMUs) are small, lightweight, and inexpensive sensors, which can operate in almost any environment. Our goal is to estimate the moving platform's trajectory only, using the inertial measurements and the observations of naturally occurring point features. We do not assume that a map of the area is available, nor do we aim to build such a map. Thus, the problem we address is analogous to visual odometry, with the added characteristic that an IMU is available. We term the approach visual-inertial odometry. The key challenge that needs to be addressed when designing an estimator for this task is the limited availability of processing power. Feature detection algorithms can track hundreds of features in images of natural scenes, but processing all this data in real time is challenging, particularly for a small, resource-constrained device. To date, the existence of resource

limitations has been typically addressed by proposing a motion estimation algorithm (e.g., EKF-SLAM, sliding-window iterative estimation), and testing whether it can operate within the given resource constraints. If it can, success is declared. However, this design paradigm has the shortcoming that it cannot ensure optimal use of the available resources: even though one algorithm may operate using a certain amount of CPU time, it is possible that a different algorithm can either (i) produce a more accurate estimate in the same amount of time, or (ii) compute an estimate of the same precision faster. Clearly, this situation is undesirable. An additional difficulty that arises when designing an estimator for any localization task is that, typically, the computational efficiency of different methods depends strongly on the nature of each particular dataset. For instance, one algorithm may outperform all others when the environment is feature-rich and the vehicle is moving slowly, while a different algorithm may be the fastest in feature-sparse environments under fast motion. This makes algorithm selection a difficult task, for which no general, systematic methods exist to date. To address the limitations described above, in this work we propose a new paradigm for the design of motion estimation algorithms. Specifically, we propose designing a hybrid estimator that incorporates two (or more, in general) algorithms with complementary computational characteristics. This makes it possible to decide, in real time, to process different measurements (e.g., different features) by a different algorithm. Since the optimal choice for each measurement will depend on the characteristics of the sensor data, in the proposed framework we employ statistical learning to learn these characteristics. Gathering statistical information allows us to compute the expected cost of any strategy for allocating measurements to algorithms. 
To identify the optimal strategy we solve, in real time, an optimization problem whose objective function is the expected computation time. In this paper, we apply the approach described in the preceding paragraph to the problem of visual-inertial odometry. Specifically, we design a hybrid extended Kalman filter (EKF), which integrates EKF-SLAM with a sliding-window EKF estimator. As explained in Section III, these two estimators process the same measurement information in different ways, and have complementary computational characteristics. We show that the optimal choice of algorithm to process each individual feature depends on the distribution of the feature-track lengths of all features. This distribution is not known in


advance (it depends on the environment, the camera motion, as well as the feature tracker used), and therefore we learn it from the image sequence. Using this information, the optimal strategy for processing the feature measurements can be computed by solving a one-variable optimization problem, as shown in Section V. Our results demonstrate that the hybrid algorithm outperforms each individual method (EKF-SLAM and sliding-window EKF) by a wide margin. In fact, the hybrid filter allows processing the sensor data at real-time speed on a mobile phone, something that is not possible with either of the individual algorithms.

II. RELATED WORK

A huge number of methods have been proposed for pose estimation, and thus any attempt at a general literature survey in the limited space available would be incomplete. We therefore focus here on methods that seek to optimize, in some sense, the computational efficiency of localization. To the best of our knowledge, no prior work exists that employs learning of the feature tracks' characteristics to explicitly optimize the computational cost of estimation. Past work has focused primarily on SLAM, and can broadly be categorized as: a) Exact reformulations of the SLAM equations: Typical methods decompose the computations into smaller parts, and selectively carry out only the necessary computations at each time instant (see e.g., [1]–[4]). Typically, in these methods the currently visible features are updated normally at every time step, while the remaining ones are updated only "on demand" or when re-visited. In the case of visual-inertial odometry, however, where the state vector contains only the actively tracked features and no loop-closure is considered, these methods are not applicable. b) Approximations of the SLAM equations: On the other hand, several methods exist that employ approximations to the SLAM equations (e.g., [5]–[7] and references therein), to reduce the required computations.
In contrast to these methods, which trade off information for efficiency, our proposed approach involves no information loss, and no approximations, other than the inaccuracies due to the EKF's linearization. c) Feature selection methods: A different family of approaches seek to reduce the computational cost of localization by processing only the most valuable, in terms of uncertainty reduction, measurements (e.g., [8], [9]). Only processing a small subset of carefully selected measurements can often result in high accuracy. However, since the remaining measurements are simply discarded, loss of information inevitably occurs. In our proposed approach, all the available measurements are processed, and no localization information is lost.

III. ESTIMATORS FOR VISUAL-INERTIAL ODOMETRY

We now examine the possible choices of algorithms for processing the feature observations in visual-inertial odometry. Since real-time performance is necessary, any candidate algorithm must have bounded computational complexity, irrespective of the duration of the trajectory. Within this class, practically all algorithms proposed to date employ either an

EKF (e.g., [10]–[12]), or iterative minimization over a window of states (e.g., [13], [14]). The latter approaches iteratively relinearize the measurement equations to better deal with their nonlinearity, which, however, incurs a high computational cost. In our recent work [15] we showed that, by choosing linearization points that preserve the system's observability properties, EKF-based visual-inertial odometry can attain better accuracy than iterative-minimization methods, at only a fraction of the computational cost. Therefore, in this paper we only focus on EKF-based algorithms. Within the class of EKF methods, there are two possible formulations of the estimator. Specifically, we can employ the EKF-SLAM approach (e.g. [16]–[18] and references therein), in which the state vector contains the current IMU state and feature positions. To keep the computations bounded, features that leave the field of view must be removed from the state vector, leaving only the currently observed ones [18]. Other EKF algorithms instead maintain a sliding window of camera poses in the state vector, and use the feature observations to apply probabilistic constraints between these poses (e.g., [10], [19], [20]). Out of this class of methods, the multi-state-constraint Kalman filter (MSCKF) algorithm [10] uses all the information provided by the feature measurements, which makes it our preferred method. Both the MSCKF and EKF-SLAM use exactly the same information, and only differ in the way they organize the computations, and in linearization (more on this in Section VI). If the measurement models were linear, both methods would yield exactly the same result, equal to the MAP estimate of the IMU pose [21]. With respect to computational cost, however, the two methods differ dramatically. For the EKF-SLAM algorithm the cost at each time-step is cubic in the number of features (since all features in the state vector are observed).
In contrast, the MSCKF has computational cost that scales linearly with the number of features, but cubically with the length of the feature tracks (see Section IV-B). Therefore, if many features are tracked, each in a small number of frames, the MSCKF approach is preferable, but if few features are tracked over long image sequences, EKF-SLAM results in lower computational cost.¹ Clearly, EKF-SLAM and the MSCKF algorithm are complementary, with each being superior in different circumstances. This motivates us to integrate both algorithms in a single, hybrid filter, as described next.

IV. THE HYBRID MSCKF/SLAM ALGORITHM

In this section we present the hybrid MSCKF/SLAM estimator for visual-inertial odometry. We begin our presentation by briefly outlining the MSCKF algorithm, which was originally proposed in [10]. In our work, we employ the modified MSCKF algorithm, which attains increased accuracy and consistency by choosing linearization points that ensure the correct observability properties [15].

¹ Similar conclusions can be reached from the analysis of [22], which compares EKF-SLAM to iterative optimization over a window of poses.
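The complementary scaling of the two formulations can be illustrated with a toy flop model (hypothetical unit constants, not the paper's detailed operation counts):

```python
# Toy per-update cost model contrasting the two EKF formulations:
#   EKF-SLAM : cubic in the number of features kept in the state vector.
#   MSCKF    : linear in the number of features, cubic in track length.
# The unit constants are hypothetical; only the scaling matters here.

def slam_cost(num_features):
    # O(n^3) covariance update when all n features in the state are observed
    return num_features ** 3

def msckf_cost(track_lengths):
    # O(sum of l_i^3), dominated by per-feature residual construction
    return sum(l ** 3 for l in track_lengths)

# Many short tracks: the MSCKF is far cheaper than keeping 200 SLAM features.
many_short = [4] * 200
# Few long tracks: EKF-SLAM with 10 features beats the MSCKF.
few_long = [40] * 10

assert msckf_cost(many_short) < slam_cost(200)
assert slam_cost(10) < msckf_cost(few_long)
```

The crossover between the two regimes is exactly what the hybrid filter of Section IV exploits.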


A. The MSCKF algorithm for visual-inertial odometry

The MSCKF does not include the feature positions in the state vector, but instead uses the feature measurements to directly impose constraints between the camera poses. The state vector of the MSCKF is:

  x_k = [ x_{I_k}^T  x_{C_1}^T  x_{C_2}^T  ···  x_{C_m}^T ]^T    (1)

where x_{I_k} is the current IMU state, and x_{C_i}, i = 1...m, are the camera poses (positions and orientations) at the times the last m images were recorded. The IMU state is defined as:

  x_I = [ q̄^T  p^T  v^T  b_g^T  b_a^T ]^T    (2)

where q̄ is the unit quaternion describing the rotation from the global frame to the IMU frame, p and v denote the IMU position and velocity, respectively, while b_g and b_a are the gyroscope and accelerometer biases.

The MSCKF uses the IMU measurements to propagate the current IMU state and the covariance matrix, P_{k+1|k}, while the feature measurements are used for EKF updates. Let us assume that the i-th feature, which has been observed in ℓ images, has just been lost from tracking (e.g., it went out of the field of view). At this time, we use all the measurements of the feature to carry out an EKF update. Let the observations of the feature be described by the nonlinear equations z_{ij} = h(x_{C_j}, f_i) + n_{ij}, for j = 1...ℓ, where f_i is the feature position (described by the inverse-depth parameterization [23]) and n_{ij} is the noise vector, modelled as zero-mean Gaussian with covariance matrix σ²I₂. Using all the measurements we compute a feature position estimate f̂_i, and then compute the residuals z̃_{ij} = z_{ij} − h(x̂_{C_j}, f̂_i), j = 1...ℓ. By linearizing, these residuals can be written as:

  z̃_{ij} ≃ H_{ij} x̃_{C_j} + H_{f_{ij}} f̃_i + n_{ij},  j = 1...ℓ    (3)

where x̃_{C_j} and f̃_i are the estimation errors of the j-th camera pose and i-th feature, respectively, and the matrices H_{ij} and H_{f_{ij}} are the corresponding Jacobians. Since the feature is not included in the MSCKF state vector, we proceed to marginalize it out. For this purpose, we first form the vector containing the ℓ residuals from all the feature's measurements:

  z̃_i ≃ H_i x̃ + H_{f_i} f̃_i + n_i    (4)

where z̃_i and n_i are block vectors with elements z̃_{ij} and n_{ij}, respectively, and H_i and H_{f_i} are matrices with block rows H_{ij} and H_{f_{ij}}, for j = 1...ℓ. Subsequently, we define the residual vector z̃_i^o = V^T z̃_i, where V is a matrix whose columns form a basis of the left nullspace of H_{f_i}. From (4), we obtain:

  z̃_i^o = V^T z̃_i ≃ V^T H_i x̃ + V^T n_i = H_i^o x̃ + n_i^o    (5)

Once z̃_i^o and H_i^o are available, we proceed to carry out a Mahalanobis gating test for the residual z̃_i^o, by computing:

  γ = (z̃_i^o)^T ( H_i^o P_{k+1|k} (H_i^o)^T + σ²I )⁻¹ z̃_i^o    (6)

and comparing it against a threshold given by the 95-th percentile of the χ² distribution. By stacking together the residuals of all features that pass this gating test, we obtain:

  z̃^o = H^o x̃ + n^o    (7)

where z̃^o, H^o, and n^o are block vectors/matrices, with block rows z̃_i^o, H_i^o, and n_i^o, for i = 1...n, respectively. We can now use the above residual, which only involves the camera poses, for an EKF update. However, if a large number of features are processed at each time instant (the common situation), further computational savings can be achieved. Specifically, in [10] it is shown that instead of using the above residual, we can equivalently compute the thin QR factorization of H^o, written as H^o = QH^r, and then employ the residual z̃^r for updates, defined as:

  z̃^r = Q^T z̃^o = H^r x̃ + n^r    (8)

where n^r is the r×1 noise vector, with covariance matrix σ²I_r. Once the residual z̃^r and the matrix H^r have been computed, we compute the state correction and update the covariance matrix via the standard EKF equations:

  Δx = K z̃^r    (9)
  P_{k+1|k+1} = P_{k+1|k} − K S K^T    (10)
  S = H^r P_{k+1|k} (H^r)^T + σ²I_r    (11)
  K = P_{k+1|k} (H^r)^T S⁻¹    (12)

B. Computational complexity of the MSCKF

The way in which the feature measurements are processed in the MSCKF is optimal, in the sense that no approximations are used, except for the EKF's linearization [21]. This is true, however, only if the sliding window of states, m, contains at least as many states as the longest feature track. If it does not, then the measurements that occurred more than m time-steps in the past cannot be processed. Therefore, to use all the available feature information, the MSCKF must maintain a window of states long enough to include the longest feature tracks.

We now examine the dependence of the computational requirements of the MSCKF on the number of features and the lengths of the feature tracks. The computation time of the MSCKF is dominated by the following operations:

1) The computation of the residual and Jacobian matrix in (5), for each feature. If n features are processed, with feature track lengths ℓ_i, i = 1...n, this requires O(Σ_{i=1}^n ℓ_i²) operations.
2) The Mahalanobis test, requiring O(Σ_{i=1}^n ℓ_i³) operations.
3) The computation of the residual and the Jacobian matrix in (8), which, by exploiting the structure in H^o, can be implemented in approximately O(Σ_{i=1}^n ℓ_i³) operations.
4) The computation of the Kalman gain and the update of the covariance matrix, which require O(r(15+6m)² + r²(15+6m) + r³) operations. Here 15+6m is the size of the state covariance matrix, and r is the number of rows of H^r. It can be shown that, in general, r (which equals the number of independent constraints for the camera poses) is given by [23]:



  r = 2(ℓ_(1) + ℓ_(2) + ℓ_(3)) − 7    (13)
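Equation (13) can be evaluated directly; here ℓ_(1), ℓ_(2), ℓ_(3) denote the lengths of the three longest feature tracks (assuming at least three features are processed), and the helper name below is illustrative:

```python
# Eq. (13): the number of independent pose constraints r provided by the
# MSCKF features, determined by the three longest track lengths.
def num_constraints(track_lengths):
    top3 = sorted(track_lengths, reverse=True)[:3]
    return 2 * sum(top3) - 7

# Example: tracks of lengths 10, 8, 8, 5, 4 give r = 2*(10+8+8) - 7 = 45.
assert num_constraints([10, 8, 8, 5, 4]) == 45
```

Since r enters the update cost quadratically and cubically (item 4 above), a handful of very long tracks is enough to dominate the MSCKF's runtime.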

[Figure: histograms of probability vs. feature track length (5 to 40 frames), for the canyon area and the town streets.]

Fig. 1. Distribution of the feature track lengths in two parts of the Cheddar Gorge dataset [24]. The features detected were Harris corners, matched by normalized cross-correlation.

where ℓ_(1), ℓ_(2), and ℓ_(3) are the lengths of the three longest feature tracks. From the above we see that, while the computational cost of the MSCKF is linear in the number of features, it is at least quadratic in the size of the sliding window, m. In fact, if the size of the sliding window is chosen to be equal to the longest feature-track length, then r is also O(m), and the overall complexity becomes cubic in m. This demonstrates a shortcoming of the MSCKF: to preserve all measurement information, the complexity of the algorithm scales cubically with the longest feature-track length.

In real-world datasets the distribution of the feature-track lengths is non-uniform, with many features tracked for short durations, and a few more stable features tracked over longer periods. For instance, Fig. 1 shows the distribution of the feature track lengths in two parts of a real-world dataset [24]. It can be seen that, even though the longest feature tracks reach 40 images, the vast majority of features are tracked for a small number of frames (the percentage of features tracked in five or fewer images is 88% and 69% in the two cases shown, respectively). To be able to process the small percentage of features with long track lengths, the MSCKF must maintain a long sliding window, which is computationally inefficient. In what follows, we show how we can integrate the MSCKF algorithm with EKF-SLAM, to address this limitation.

C. The hybrid MSCKF/SLAM algorithm

The alternative way of processing feature measurements in the EKF is to include them in the state vector, and use their observations as in the standard visual-SLAM formulation. As discussed in Section III, this approach has computational properties complementary to those of the MSCKF: while the MSCKF is better at processing many features tracked for a few frames, EKF-SLAM is faster when few features are tracked for many frames. This motivates us to combine both approaches in a single, "hybrid" estimator.
Specifically, we formulate a filter whose state vector at time-step k contains the current IMU state, a sliding window of m camera poses, and s_k features:

  x_k = [ x_{I_k}^T  x_{C_1}^T  ···  x_{C_m}^T  f_1^T  ···  f_{s_k}^T ]^T    (14)

This provides us with the ability to choose whether a feature will be processed using the MSCKF approach, or whether it

will be included in the state vector and processed as in EKF-SLAM. By analyzing in detail the computational requirements of each EKF update, it can be shown that, when many features are present, there is nothing to be gained by initializing in the state vector any feature observed fewer than m times [23]. Thus, the optimal (in terms of computational requirements) strategy for using the features turns out to be a simple one: if feature i's track is lost after fewer than m frames (i.e., ℓ_i < m), then the feature is processed using the MSCKF equations, as described in Section IV-A. On the other hand, if a feature is still actively being tracked after m images, it is initialized into the state vector, and used for SLAM. Thus, the only choice that one needs to make is the size of the sliding window, m. In Section V we show how this can be chosen optimally.

At a given time step, the EKF update is carried out using a residual vector constructed by stacking together the residual computed from the "MSCKF features" and the residuals from the s_k observations of the "SLAM features":

  z̃_k = [ z̃^r ; z̃_{1m} ; ··· ; z̃_{s_k m} ] ≃ [ H^r ; H_{1m} ; ··· ; H_{s_k m} ] x̃_k + n_k = H_k x̃_k + n_k    (15)

In the above equation, z̃_{jm}, for j = 1...s_k, are the residuals of the observations of the s_k SLAM features from the latest camera state (state m), and H_{jm}, for j = 1...s_k, are the associated Jacobian matrices (see (3)). Each of these residuals is a 2×1 vector, while each H_{jm} is a 2×(15+6m+3s_k) matrix (here 15+6m+3s_k is the size of the state covariance matrix). The residual z̃_k and the Jacobian H_k are used for updating the state and covariance matrix, similarly to (9)-(12).

For initializing new features, the m measurements of a feature are used to triangulate it and to compute its initial covariance and the cross-correlation with other filter states. In our implementation, we use the inverse-depth feature parameterization, due to its superior linearity properties. The latest camera clone, x_{C_m}, is used as the "anchor" state, based on which the inverse-depth parameters are computed. If the feature is still actively being tracked at the time its anchor state needs to be removed from the sliding window, the anchor is changed to the most recent camera state, and the covariance matrix of the filter is appropriately modified. The details of feature initialization as well as the "anchor change" process are shown in [23]. Finally, the hybrid MSCKF/SLAM algorithm is described in Algorithm 1.

V. OPTIMIZING THE PERFORMANCE OF THE HYBRID EKF

In this section we show how the size of the sliding window, m, can be selected so as to minimize the computational cost of the hybrid MSCKF/SLAM filter. As the results in Section VI show, the choice of m has a profound effect on the time requirements of the algorithm. With a suitable choice, the hybrid method can significantly outperform each of its individual components.
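The per-feature routing decision that the hybrid filter applies (Section IV-C; see also Algorithm 1) can be sketched as follows, with illustrative names and data that are not from the paper's implementation:

```python
# Minimal sketch of the hybrid filter's routing rule: a feature whose track
# ends before reaching m frames is consumed by an MSCKF update; a feature
# still tracked after m frames is initialized into the state vector and
# used for SLAM; shorter active tracks simply wait for more observations.
def route_features(tracks, m):
    """tracks: dict feature_id -> (track_length, still_tracked)."""
    msckf, slam = [], []
    for fid, (length, active) in tracks.items():
        if active and length >= m:
            slam.append(fid)      # initialize into the state vector
        elif not active:
            msckf.append(fid)     # lost track: marginalize via the MSCKF
        # active tracks shorter than m are left pending
    return msckf, slam

tracks = {1: (3, False), 2: (12, True), 3: (6, True), 4: (9, False)}
msckf_ids, slam_ids = route_features(tracks, m=10)
assert msckf_ids == [1, 4] and slam_ids == [2]
```

The only free parameter of this rule is m itself, which is exactly what the optimization below selects.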


Algorithm 1 Hybrid MSCKF/SLAM algorithm

Propagation: Propagate the state vector and covariance matrix using the IMU readings.

Update: Once camera measurements become available:
• Augment the state vector with the latest camera pose.
• For features to be processed in the MSCKF (feature tracks of length smaller than m), do the following:
  – For each feature to be processed, calculate the residual and Jacobian matrix in (5).
  – Perform the Mahalanobis gating test in (6).
  – Using all features that passed the gating test, form the residual vector and the Jacobian matrix in (8).
• For features that are included in the state vector, compute the residuals and measurement Jacobian matrices, and form the residual z̃_k and matrix H_k in (15).
• Update the state vector and covariance matrix, via (9)-(12), using the residual z̃_k and Jacobian matrix H_k.
• Initialize features tracked in all m images of the sliding window.

State Management:
• Remove SLAM features that are no longer tracked, and change the anchor pose for SLAM features anchored at the oldest pose.
• Remove the oldest camera pose from the state vector. If no feature is currently tracked for more than m_o poses (with m_o < m − 1), remove the oldest m − m_o poses.

A. Operation count for each update

By carefully analyzing the computations needed, we calculate the number of floating-point operations per update of the hybrid algorithm:

  f_k = α_1 Σ_{i=1}^n ℓ_i³ + α_2 r m² + α_3 r² m + α_4 r m s_k + α_5 (r + 2s_k)³ + α_6 (r + 2s_k)² (6m + 15 + 3s_k) + α_7 (r + 2s_k)(6m + 15 + 3s_k)² + l.o.t.    (16)

where the α_i's are known constants, n is the number of features used in the MSCKF, r is defined in (13), and l.o.t. stands for lower-order terms.² The terms shown above correspond to the computations for the MSCKF features (1st term), for the matrix S of the hybrid filter (2nd-4th terms), and for the EKF update (5th-7th terms). Note that (16) also models the probability of failure for the Mahalanobis test.

It is interesting to examine the properties of (16). First, we note that since r represents the number of constraints provided by the MSCKF features for the poses in the sliding window, it is bounded above by 6m − 7: the total number of unknowns in the sliding window is 6m, and the feature measurements

² The full expression for the operation count consists of tens of individual terms, whose inclusion would merely complicate the presentation, and not add much insight. Note, however, that in the optimization described in Section V-B, the complete expression is used.

cannot provide any information about the global pose or scale, which correspond to 7 degrees of freedom. If many features are available, we will have r ≈ 6m − 7, and thus:

  f_k ≈ α_1 Σ_{i=1}^n ℓ_i³ + α_2 m³ + α_3 m² s_k + α_4 m s_k² + α_5 s_k³

This approximate expression makes it possible to gain intuition about the behavior of the computational cost of the hybrid method: as the size of the sliding window, m, increases, more features will be processed by the MSCKF, and fewer features will be included in the state vector of the filter. Thus, as m increases, the term Σ_{i=1}^n ℓ_i³ will also increase, but s_k will decrease rapidly. These two opposing trends result in the performance curves shown in Section VI (e.g., Fig. 3).

B. Minimizing the average operation count

We now turn our attention to determining the optimal value of m, in order to minimize the runtime of the hybrid EKF estimator. Equation (16) provides us with the operation count of one filter update, so at first glance it may appear that one needs to minimize this expression with respect to m at each time instant. However, that would be an ill-posed problem. To see why, consider the case where, at time step k, the sliding window has length m = 20, and ten features exist that have been continuously tracked in 20 images. At this time, we have the option of either increasing the size of the sliding window, or including the ten features in the state vector. Which is best depends on the future behavior of the feature tracks: if these features end up being tracked for a very large number of frames (≫ 20), then it would be preferable to include the features in the state vector. If, on the other hand, the features end up being tracked for only 21 frames, it would be preferable to increase m by one. Clearly, it is impossible to obtain future information about any particular feature track. We can, however, collect statistical information about the properties of the feature tracks, and use this information to minimize the expected cost of the EKF updates. This is precisely the approach we implement.
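A minimal sketch of this idea, using a simplified cost surrogate and hypothetical track-length data rather than the full operation count of (16):

```python
import collections

# Learn an empirical pmf of feature track lengths from recent history, then
# pick the window size m minimizing a simplified expected per-update cost:
# tracks ending before m frames go to the MSCKF (~ l^3 each), longer tracks
# become SLAM features that enlarge the state (cubic in its dimension).
def learn_pmf(track_lengths):
    counts = collections.Counter(track_lengths)
    total = sum(counts.values())
    return {l: c / total for l, c in counts.items()}

def expected_cost(m, pmf, n_feats=200):
    # Expected MSCKF work: tracks that terminate before m frames.
    msckf = n_feats * sum(p * l ** 3 for l, p in pmf.items() if l < m)
    # Expected number of tracks reaching m frames (SLAM features).
    n_slam = n_feats * sum(p for l, p in pmf.items() if l >= m)
    # Toy EKF-update term, cubic in the state dimension 15 + 6m + 3*s_k.
    state_dim = 15 + 6 * m + 3 * n_slam
    return msckf + 0.01 * state_dim ** 3

# Mostly short tracks with a heavy tail, qualitatively as in Fig. 1.
history = [3] * 70 + [5] * 15 + [8] * 8 + [20] * 5 + [40] * 2
pmf = learn_pmf(history)
best_m = min(range(2, 41), key=lambda m: expected_cost(m, pmf))
assert 6 <= best_m <= 12   # an interior optimum, as in Fig. 2
```

The surrogate reproduces the two opposing trends discussed above: very small m inflates the SLAM state, very large m pays cubically for long MSCKF tracks, and the minimum lies in between.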
Specifically, during the filter's operation, we collect statistics to learn the probability mass function (pmf) of the feature track lengths, p(ℓ_i), the probability of failure of the Mahalanobis gating test, as well as the pmf of the number of features tracked in the images. Using the learned pmfs, we compute the average number of operations needed for each EKF update, f̄(m), by direct application of the definition of the expected value of a function of random variables. The value of m that yields the minimum f̄(m) can be found by exhaustive search among all possible values. However, we have found that the cost curve in practical cases is quasiconvex, which means that it has a unique minimum (see Fig. 2). Therefore, to reduce the time needed for the optimization, we perform a local search starting from a known good initial guess (e.g., the last computed value of m). Since the statistical properties of the feature tracks can change over time (see Fig. 1), we perform the learning of the pmfs as well


[Figure: normalized actual runtime and expected flop count vs. sliding window size (5 to 30).]

Fig. 2. Comparison of actual runtime vs. expected flop count. Since the curves have different units, each has been normalized with respect to its minimum value.

as the selection of the optimal m in consecutive time windows spanning a few seconds (15 sec in our implementation).

It is worth noting that in modern computers the use of flop counts to model computational cost is not always suitable, as performance is affected by several factors, including vectorization, cache access patterns, data locality, etc. However, we have experimentally verified that for the algorithms considered here, and for our implementation, the computed flop counts closely follow the observed runtimes. Specifically, Fig. 2 shows the actual runtime of the hybrid filter, as well as the value f̄(m) calculated analytically, for a specific trajectory. We can observe that the two curves are very similar, with the only significant differences observed at the two "extremes" of very small or very large m. These regions are less important, however, as they fall outside the typical operational region of the hybrid filter. Thus, using f̄(m) as the objective function to minimize is appropriate.

VI. EXPERIMENTAL RESULTS

A. Simulated data

We generate simulation data that closely match real-world datasets, to have as realistic a test environment as possible. Our first simulation environment is based on the dataset of [24], which consists of a 29.6-km long trajectory through canyons, forested areas, and a town. We generate a ground-truth trajectory (position, velocity, orientation) that matches the vehicle's actual trajectory, as computed by a high-precision INS. Using this trajectory, we subsequently generate IMU measurements corrupted with noise and bias characteristics identical to those of the Xsens MTi-G sensor used in the dataset. Moreover, we generate monocular feature tracks with statistical characteristics similar to those of the features extracted in the actual dataset. The number of features observed in each image and the feature track distribution change in each part of the trajectory, as in the actual dataset (see Fig. 1).
Overall, there are 232 features observed in each image on average, and the average feature track length is 5.6 frames. The IMU measurements are available at 100 Hz, while the camera frame rate is 20 Hz, as in the actual dataset. All the results reported are averages over 50 Monte-Carlo trials.

Fig. 3 plots the results from the application of the hybrid filter in this setting. Specifically, the top plot shows the average runtime for each update of the hybrid algorithm. The solid blue line is the average time when m is chosen in advance, and kept constant for the entire trajectory. The red dashed line denotes the runtime achieved when applying the optimization process described in Section V, which optimally chooses m in each time window, to adapt to local feature properties. This plot shows that, by optimizing the value of m in real time, we can attain performance higher than that of any fixed value. Note that when m is large, no features are initialized, and thus the right-most part of the plot gives the performance of the plain MSCKF (similarly, for small m we obtain pure EKF-SLAM). Therefore, from this plot we can see that the optimal hybrid filter has a runtime 37.17% smaller than that of the MSCKF, and 72.8% smaller than EKF-SLAM. In the second subplot of Fig. 3 we plot the RMS position error, averaged over all Monte-Carlo trials and over the duration of the trajectory. We can observe that the plain MSCKF results in the highest accuracy. We attribute this to two causes. First, in the MSCKF features are explicitly marginalized, and thus no Gaussianity assumptions are needed for the pdf of the feature position errors (as is the case in SLAM). Second, all the measurements of each feature are used jointly in the MSCKF, which means that outliers can be more easily detected, and better linearization points can be computed. By combining the MSCKF with EKF-SLAM some accuracy may be lost, as the errors for the features included in the state vector are now assumed to be Gaussian. However, we can see that if the size of the sliding window increases above a moderate value (e.g., 9 in this case), the change in the accuracy is almost negligible. 
Intuitively, when a sufficient number of observations is used to initialize features, the feature errors’ pdf becomes “Gaussian enough” and the accuracy of the hybrid filter is very close to that of the MSCKF. Based on these results, in our optimization we do not allow the value of m to fall below a certain threshold (set to 7 in our implementation). The timing results presented in Fig. 3 are obtained on a laptop computer with a Core i7 processor at 2.13GHz, and a single-threaded C++ implementation. Clearly, if the data were to be processed on this computer, the timing performance would easily allow for real-time implementation (the hybrid filter requires fewer than 4 msec per update with optimal m). However, our primary interest is in implementing pose estimation on small portable devices. For this reason, we conducted a second set of tests, in which the data were processed on a Samsung Galaxy S2 mobile phone, equipped with a 1.2-GHz Exynos 4210 processor. For these tests, we ported our C++ implementation to Android using the Android NDK. The simulation data were produced in a similar way as described above, by emulating a real-world dataset collected while driving in a residential area of Riverside, CA. The vehicle trajectory and statistics of the dataset (e.g., feature distribution, feature track length, and so on) are different, allowing us to test the proposed algorithm in different situations. Fig. 4 shows the results of the hybrid filter in this setting.


[Figure: two subplots of update time (msec) and position error (m) vs. sliding window size (5 to 30), for fixed and adaptive window sizes.]

Fig. 3. Monte-Carlo simulation results: Timing performance and rms position errors of the hybrid filter, for changing values of m. Timing measured on a laptop computer.

[Figure: two subplots of update time (msec) and position error (m) vs. sliding window size (5 to 30), for fixed and adaptive window sizes.]

Fig. 4. Monte-Carlo simulation results: Timing performance and rms position errors of the hybrid filter, for changing values of m. Timing measured on a Samsung Galaxy S2 mobile phone. Note that the dataset used here is different from the one used in Fig. 3, hence the different accuracy.

[Figure: estimated trajectories overlaid on a North-South vs. East-West map, for the MSCKF, the hybrid MSCKF/SLAM filter, and SLAM.]

Fig. 5. The trajectory estimates of the MSCKF, EKF-SLAM, and the hybrid algorithm in the real-world experiment.

These results are very similar to what was observed in the first simulation, with the optimal hybrid filter outperforming each of the individual algorithms by a wide margin (runtime 45.8% smaller than the MSCKF, and 55.6% smaller than SLAM). More importantly, however, we observe that the hybrid filter is capable of processing the data at real-time speeds, even on the much less capable processor of the mobile phone. Specifically, the average time needed for each update of the hybrid filter with optimally chosen thresholds is 33.78 msec, corresponding to a rate of approximately 30 images per second. Since the images are recorded at 20 Hz, this means that the proposed hybrid estimator is capable of real-time processing, something that is not possible with any of the individual methods.

B. Real-world data

In addition to the synthetic datasets described above, we employed our proposed algorithm for processing the feature measurements recorded during a real-world experiment. During this experiment an IMU/camera platform was mounted on top of a car and driven on city streets. The sensors consisted of an Inertial Science ISIS IMU and a PointGrey Bumblebee2 stereo pair (only a single camera's images are used). The IMU provides measurements at 100 Hz, while the camera images were stored at 10 Hz. Harris feature points are extracted, and matching is carried out by normalized cross-correlation. The vehicle trajectory is approximately 5.5 km long, and a total of 7922 images are processed.

In this dataset, to compensate for the low frame rate of the images, we use lower feature-detection thresholds, which leads to a larger number of features. Specifically, 540 features are tracked in each image on average, and the average track length is 5.06 frames. This substantially increases the overall computational requirements of all algorithms. When run on the mobile phone's processor, the average time per update is 139 msec for the MSCKF, 774 msec for EKF-SLAM, and 77 msec for the hybrid filter with optimal m. In Fig. 5 the trajectory estimates for each of the three methods are plotted on a map of the area where the experiment took place. We observe that the accuracy of the MSCKF and the hybrid filter are similar, and substantially better than that of EKF-SLAM.

In Fig. 6 we plot the computed values of the optimal m during the experiment. This figure shows that, due to the changing properties of the feature tracks' distribution, the optimal value varies considerably over time, justifying the need for periodic re-optimization. As a final remark, we note that the optimization process itself is computationally inexpensive. In our implementation, the optimal threshold is re-computed every 15 sec, and this process takes 0.31 msec, on average, on the mobile phone.
Therefore, the optimization takes up less than 0.003% of the overall processing time, while resulting in substantial performance gains.


[Figure: optimal window size (between 13 and 21) vs. time (0 to 800 sec).]

Fig. 6. The length of the sliding window selected via the optimization process for the real-world experiment.

VII. DISCUSSION AND CONCLUDING REMARKS

In this work, we presented a hybrid MSCKF/SLAM estimator for visual-inertial odometry, as well as a methodology for optimizing the estimator's computational efficiency. The results presented in the preceding section demonstrate that the proposed algorithm offers dramatic performance gains over both the MSCKF and EKF-SLAM individually. In all the cases examined, the algorithm was able to attain runtimes that allow for real-time performance, even on the low-end (by today's standards) processor of a mobile phone. These results validate the learning-based optimization approach for designing pose estimation algorithms outlined in Section I.

In this paper we have exclusively focused on optimizing the performance of the estimator (the estimation "back end," as it is sometimes called). The "front end" of visual feature extraction was not considered in this work. Obviously, for real-time operation on a resource-constrained device both of these components must be optimized, and this is the subject of our ongoing work.

As a final remark, we would like to stress that the learning-based performance optimization approach proposed in this paper is general, and can be applied to different estimators, and augmented with additional information. For instance, one can model the correlations between the track lengths of different features, which are now assumed independent. Moreover, one can use additional information (e.g., the distinctiveness of features) to predict the track length of each feature individually, or its likelihood of passing the Mahalanobis gating test. Such improvements will allow further optimization of the algorithm's performance.

ACKNOWLEDGMENTS

This work was supported by the National Science Foundation (grant no. IIS-1117957), the UC Riverside Bourns College of Engineering, and the Hellman Family Foundation.

REFERENCES

[1] J. E. Guivant and E. M. Nebot, "Optimization of the simultaneous localization and map building algorithm for real time implementation," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 242–257, June 2001.
[2] L. M. Paz, J. D. Tardos, and J. Neira, "Divide and conquer: EKF SLAM in O(n)," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1107–1120, Oct. 2008.

[3] M. Walter, R. Eustice, and J. Leonard, "Exactly sparse extended information filters for feature-based SLAM," International Journal of Robotics Research, vol. 26, no. 4, pp. 335–359, Apr. 2007.
[4] M. Kaess, A. Ranganathan, and F. Dellaert, "iSAM: Incremental smoothing and mapping," IEEE Transactions on Robotics, vol. 24, no. 6, pp. 1365–1378, Dec. 2008.
[5] E. D. Nerurkar and S. I. Roumeliotis, "Power-SLAM: a linear-complexity, anytime algorithm for SLAM," International Journal of Robotics Research, vol. 30, no. 6, pp. 772–788, May 2011.
[6] S. J. Julier, "A sparse weight Kalman filter approach to simultaneous localisation and map building," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Maui, HI, Oct. 29-Nov. 3 2001, pp. 1251–1256.
[7] S. Thrun, Y. Liu, D. Koller, A. Ng, Z. Ghahramani, and H. Durrant-Whyte, "Simultaneous localization and mapping with sparse extended information filters," International Journal of Robotics Research, vol. 23, no. 7-8, pp. 693–716, Aug. 2004.
[8] H. Strasdat, C. Stachniss, and W. Burgard, "Which landmark is useful? Learning selection policies for navigation in unknown environments," in Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, May 2009, pp. 1410–1415.
[9] A. J. Davison and D. W. Murray, "Simultaneous localisation and map-building using active vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 865–880, Jul. 2002.
[10] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in Proceedings of the IEEE International Conference on Robotics and Automation, Rome, Italy, Apr. 2007, pp. 3565–3572.
[11] J. Kelly and G. Sukhatme, "Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration," International Journal of Robotics Research, vol. 30, no. 1, pp. 56–79, Jan. 2011.
[12] E. Jones and S. Soatto, "Visual-inertial navigation, mapping and localization: A scalable real-time causal approach," International Journal of Robotics Research, vol. 30, no. 4, pp. 407–430, Apr. 2011.
[13] T. Dong-Si and A. I. Mourikis, "Motion tracking with fixed-lag smoothing: Algorithm and consistency analysis," in Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, May 2011, pp. 5655–5662.
[14] G. Sibley, L. Matthies, and G. Sukhatme, "Sliding window filter with application to planetary landing," Journal of Field Robotics, vol. 27, no. 5, pp. 587–608, Sep. 2010.
[15] M. Li and A. I. Mourikis, "Improving the accuracy of EKF-based visual-inertial odometry," in Proceedings of the IEEE International Conference on Robotics and Automation, St. Paul, MN, May 2012, pp. 828–835.
[16] A. Davison, I. Reid, N. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, Jun. 2007.
[17] P. Pinies and J. D. Tardos, "Scalable SLAM building conditionally independent local maps," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, Nov. 2007, pp. 3466–3471.
[18] R. Munguia and A. Grau, "Monocular SLAM for visual odometry," in Proceedings of the IEEE International Symposium on Intelligent Signal Processing, Alcala de Henares, Spain, Oct. 2007, pp. 1–6.
[19] D. D. Diel, P. DeBitetto, and S. Teller, "Epipolar constraints for vision-aided inertial navigation," in IEEE Workshop on Motion and Video Computing, Breckenridge, CO, Jan. 2005, pp. 221–228.
[20] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, "Augmenting inertial navigation with image-based motion estimation," in Proceedings of the IEEE International Conference on Robotics and Automation, Washington, DC, May 2002, pp. 4326–4333.
[21] M. Li and A. I. Mourikis, "Consistency of EKF-based visual-inertial odometry," University of California, Riverside, Tech. Rep., 2011, www.ee.ucr.edu/∼mourikis/tech reports/VIO.pdf.
[22] H. Strasdat, J. M. M. Montiel, and A. J. Davison, "Real-time monocular SLAM: Why filter?" in Proceedings of the IEEE International Conference on Robotics and Automation, Anchorage, AK, May 2010, pp. 2657–2664.
[23] M. Li and A. I. Mourikis, "Optimization-based estimator design for vision-aided inertial navigation: Supplemental materials," 2012, www.ee.ucr.edu/∼mli/SupplMaterialsRSS2012.pdf.
[24] R. Simpson, J. Cullip, and J. Revell, "The Cheddar Gorge data set," Tech. Rep., 2011.


Practical Route Planning Under Delay Uncertainty: Stochastic Shortest Path Queries

Sejoon Lim*, Christian Sommer*, Evdokia Nikolova†, Daniela Rus*

*Computer Science and Artificial Intelligence Lab, MIT
Email: {sjlim, csom, rus}@csail.mit.edu
†Department of Computer Science and Engineering, Texas A&M University
Email: [email protected]

Abstract—We describe an algorithm for stochastic path planning and applications to route planning in the presence of traffic delays. We improve on the prior state of the art by designing, analyzing, implementing, and evaluating data structures that answer approximate stochastic shortest-path queries. For example, our data structures can be used to efficiently compute paths that maximize the probability of arriving at a destination before a given time deadline. Our main theoretical result is an algorithm that, given a directed planar network with edge lengths characterized by expected travel time and variance, pre-computes a data structure in quasi-linear time such that approximate stochastic shortest-path queries can be answered in poly-logarithmic time (actual worst-case bounds depend on the probabilistic model). Our main experimental results are two-fold: (i) we provide methods to extract travel-time distributions from a large set of heterogeneous GPS traces and we build a stochastic model of an entire city, and (ii) we adapt our algorithms to work for real-world road networks, we provide an efficient implementation, and we evaluate the performance of our method for the model of the aforementioned city.

Fig. 1. Path query example for maximizing the probability of reaching the destination within a user-specified deadline. Our system finds such a path within milliseconds for a city-sized network. When a user at origin ‘O’ wants to get to destination ‘D’ within 30 minutes between 5pm and 6pm on a weekday, Path 1 offers the best chance of reaching the destination within the deadline. However, when the deadline changes to 20 minutes, Path 1 has less than a 50% chance of making the deadline. In this case, Path 2 is the best route, offering an 88.5% chance of on-time arrival.
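The on-time-arrival probabilities quoted in Fig. 1 can be reproduced under the Gaussian travel-time model the paper adopts later (Section II): a path with mean travel time μ and variance τ arrives by deadline d with probability Φ((d − μ)/√τ). A minimal sketch, with purely illustrative numbers (not taken from the paper's dataset):

```python
# Hedged sketch: on-time arrival probability for a path whose travel
# time is modeled as Normal(mean, variance), as in the paper's
# probability-tail model. Numbers below are illustrative only.
from math import erf, sqrt

def on_time_probability(mean_minutes: float, variance: float, deadline: float) -> float:
    """P(T <= deadline) for T ~ Normal(mean_minutes, variance)."""
    z = (deadline - mean_minutes) / sqrt(variance)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# A hypothetical path with mean 18 min and variance 4 min^2,
# queried against a 20-minute deadline (z = 1, about 84.1%):
p = on_time_probability(18.0, 4.0, 20.0)
assert 0.84 < p < 0.85
```

The same computation is what the query algorithm must implicitly optimize over paths, which is why the closed-form Gaussian tail matters for tractability.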

I. INTRODUCTION

We present and evaluate a new algorithm for planning the path of vehicles on roadways in the presence of uncertainty in traffic delays. We build on prior work on stochastic path planning [15, 19, 20], which introduced stochastic path planning algorithms in this model. This prior work presents novel provably correct algorithms that are well suited for planning the path of a vehicle on small-scale graphs; however, the computational complexity of this prior suite of stochastic path planning algorithms makes them impractical for real-time path planning on city-scale road networks. The stochastic path planning model incorporates traffic history in the form of probability distributions of delays observed on actual road segments. The probability distributions capture the historical delay observations that are assigned as weights on the edges of the road network. The stochastic path planning algorithm minimizes a user-specified cost function of the delay distribution. Current route planning methods can produce routes with shortest expected time (based on historical traffic data), and some recent commercial systems, using almost-real-time data based on cell-phone locations [22, 26], can also compute “smartest” routes in dynamic traffic scenarios. The algorithm in this paper improves on and extends previous work by enabling fast computation of routes guaranteed to maximize the likelihood

of arriving at a destination within a specified deadline in city-scale route graphs. This work provides a planning system that can be used by autonomous as well as human-driven vehicles for traffic-routing applications that meet travel goals such as “when should you leave, and what path should you take, to reach the destination by the given deadline with high probability?” Preserving the inherently non-deterministic nature of actual traffic, we work with accurate probabilistic models of traffic-intense areas; in our work, we actually build a probabilistic model of an entire city based on a large set of heterogeneous GPS traces.

Our Contributions: The algorithm in this paper improves the state of the art on stochastic path planning [18, 19, 20] and its application to traffic routing [15]. The running time of the state-of-the-art stochastic-shortest-path algorithm is approximately quadratic in the number of nodes, rendering it too slow for real-time performance on city-scale networks. We improve upon this running time by using efficient data structures. In contrast to previous work [18], we provide a two-phase algorithm, consisting of preprocessing and query algorithms, better suited for practical implementations. Our preprocessing algorithm runs in quasi-linear time and it does not depend on the origin-destination pair or on the deadline,


which is the main theoretical challenge. Our query algorithm runs in sublinear time (whereas the algorithms in [15] run in polynomial time). The running time is polylogarithmic, offering exponential speedups over previous algorithms and allowing for almost instantaneous response times (see Lemma 1 and Theorem 3 for details). We have implemented the algorithms described in this paper and packaged them as a route planning system. We evaluated the system using the road map of an entire city. Using GPS traces from a large fleet of taxis (approximately 16,000 vehicles), we extract a time-dependent, probabilistic model for each individual road segment. Our algorithms preprocess the road network equipped with probabilistic edge lengths into a data structure. Using this data structure, our query algorithm can efficiently answer approximate stochastic shortest-path queries. Our experiments validate the theoretical performance predictions, showing improvements upon existing query algorithms by several orders of magnitude.

II. BACKGROUND

A. Stochastic Shortest Paths

In this section, we provide a brief overview of the stochastic shortest-paths problem and known results, extracted from [18]. Consider a graph G with a given source node s and destination node t, with stochastic edge delays w_e that come from independent distributions with means and variances (μ_e, τ_e), respectively. Denote the vector of edge delays by w = (w_1, ..., w_|E|) and the corresponding vectors of edge means and variances by μ = (μ_1, ..., μ_|E|) and τ = (τ_1, ..., τ_|E|). Denote the incidence vector of a path by x ∈ {0, 1}^|E|, with ones corresponding to edges present in the path and zeros otherwise. Let the feasible set of s-to-t paths be F ⊂ {0, 1}^|E|. Given the uncertainty of edge delays, it is not immediately clear how to define a stochastic shortest path: is it the path that minimizes mean delay? Or should it minimize path variance or some other metric?
Risk-averse users would naturally care about the variability in addition to the mean delay. Two risk-averse models that have been previously considered are the mean-risk model and the probability-tail model, which we describe below. The algorithms for both models make black-box calls to the underlying deterministic shortest-path algorithm, which computes

min w^T x subject to x ∈ F.  (1)
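The deterministic subroutine of problem (1) is just a shortest-path computation under an arbitrary nonnegative edge-length function, which is what both risk-averse models call as a black box. A self-contained sketch (the adjacency representation and edge-id scheme are our own illustrative choices, not the paper's implementation):

```python
# Hedged sketch of the black-box subroutine for problem (1):
# Dijkstra's algorithm over a user-supplied edge-length function.
import heapq

def dijkstra(adj, length, s, t):
    """Return (distance, edge list) of a shortest s-t path.
    adj: node -> list of (neighbor, edge_id); length: edge_id -> weight >= 0."""
    dist, prev = {s: 0.0}, {}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            break  # t settled: dist[t] is final
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, e in adj.get(u, []):
            nd = d + length(e)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, e)
                heapq.heappush(pq, (nd, v))
    path, node = [], t
    while node != s:  # walk predecessors back to s
        u, e = prev[node]
        path.append(e)
        node = u
    return dist[t], list(reversed(path))
```

Because the stochastic algorithms only exchange the length function (e.g. k·μ(e) + τ(e) later in the paper), any such routine, or an approximate distance oracle, can be dropped in.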

a) A Mean-Risk Model: In this model, we seek the path minimizing mean delay plus a risk-aversion coefficient c ≥ 0 times the path standard deviation; more formally:

minimize μ^T x + c·√(τ^T x) subject to x ∈ F.  (2)

This is a non-convex combinatorial problem, which can be solved exactly by enumerating all paths that minimize some positive combination of mean and variance (the latter is a deterministic shortest-path problem with respect to edge lengths equal to the corresponding mean-variance linear combination)

and selecting the one with minimal objective (2) (see [18, 20] for more details). The number of deterministic shortest-path iterations is at most n^O(log n) in the worst case [20]. Furthermore, there is a fully-polynomial-time approximation scheme (FPTAS) for the problem, as stated in the following theorem:

Theorem 1 ([18, Theorem 4.1]): Suppose we have a δ-approximation algorithm for solving the deterministic shortest-path problem (1). For any ε > 0, the mean-risk model (2) can be approximated to a multiplicative factor of δ(1 + ε) by calling the algorithm for the deterministic shortest-path problem polynomially many times in the input size and 1/ε.

In other words, one can use both exact (δ = 1) and δ-approximate (for any δ > 1) shortest-path subroutines and obtain a corresponding approximate risk-averse solution.

b) Probability-Tail Model: In this model, we are given a deadline d and we seek the path maximizing the probability of arriving before the deadline: maximize Pr(w^T x ≤ d) subject to x ∈ F. This model assumes normally-distributed edge delays in order to obtain a closed-form expression for the probability tail. In particular, when w is Gaussian, the problem is transformed into the following formulation [18]:

maximize (d − μ^T x) / √(τ^T x) subject to x ∈ F.  (3)

Similarly to the mean-risk model, problem (3) can be solved exactly in time n^O(log n) [20], and it can be approximated as follows:

Theorem 2 ([18, Theorem 4.2]): Suppose we have a δ-approximation algorithm for solving the deterministic shortest-path problem (1). For any ε > 0, the probability-tail model (3) has a (1 − δ^(−(1−ε²/4)) (2+ε)ε/4)-approximation algorithm that calls the algorithm for the deterministic shortest-path problem polynomially many times in 1/ε and the input size, assuming the optimal solution to problem (3) satisfies μ^T x* ≤ (1 − ε)d for a user-specified deadline d.

In our scenario, one main objective is to minimize the running time of the overall procedure at query time, and hence the running times of these shortest-path subroutines are of high priority.

B. Approximate Distance Oracles

An (approximate) distance oracle is a data structure that efficiently answers shortest-path and/or distance queries in graphs. Relevant quantities of a distance oracle include the preprocessing time (the time required to construct the data structure), the space consumption, the query time, and, for approximate distance oracles, the stretch, which is the worst-case ratio, among all pairs of query nodes, of the query result divided by the actual shortest-path distance. Approximate distance oracles using quasi-linear space for poly-logarithmic query time and stretch (1 + ε) are known for planar [24] (implementation in [16]), bounded-genus [14], minor-free [1], geometric [13], and bounded-doubling-dimension graphs [4]. For realistic traffic scenarios, we need the distance oracle to work with directed graphs (also called digraphs). For planar


digraphs, Thorup [24] gave an approximate distance oracle that occupies space O(n ε^(−1) log(nL) log n) and answers (1 + ε)-approximate distance queries in time O(ε^(−1) + log log(nL)), where L denotes the largest integer length (see [14] for different tradeoffs between space and query time). At a high level, Thorup’s method works as follows. The preprocessing algorithm recursively separates the graph into subgraphs of approximately equal size. Shortest paths are used as separators. For planar graphs, it is possible to separate the graph into two roughly balanced pieces by using three shortest paths Q1, Q2, Q3. Next, the preprocessing algorithm considers each node v ∈ V and computes O(1/ε) portals for v on each separator path Qi such that the shortest path from v to each node q ∈ Qi is approximated within a (1 + ε)-factor by going through one of the portals. Since each node only needs to remember O(1/ε) distances to portals per level, and since each node participates in O(log n) levels, the overall space complexity per node is bounded by O(ε^(−1) log n). For directed planar graphs, the construction is more involved and the space and time complexities are slightly higher.
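One plausible reading of the portal idea can be sketched as a greedy filter: walk along the separator path and keep a node as a portal only if no previously kept portal already (1+ε)-approximates the route to it. This is an illustrative heuristic, not Thorup's actual construction (in particular, it does not guarantee the O(1/ε) portal bound); `dists_to_path` and `path_dists` are hypothetical inputs.

```python
# Hedged sketch: greedy portal selection for one node v and one
# separator path Q. Not the construction of [24]; illustration only.
def select_portals(dists_to_path, path_dists, eps):
    """dists_to_path[i] = dist(v, q_i); path_dists[i] = position of q_i
    along Q, so |path_dists[i] - path_dists[j]| = dist_Q(q_i, q_j).
    Keeps q_i unless some kept portal p gives
    dist(v, p) + dist_Q(p, q_i) <= (1 + eps) * dist(v, q_i)."""
    portals = []

    def covered(i):
        return any(
            dists_to_path[p] + abs(path_dists[p] - path_dists[i])
            <= (1 + eps) * dists_to_path[i]
            for p in portals
        )

    for i in range(len(dists_to_path)):
        if not covered(i):
            portals.append(i)
    return portals
```

Under this sketch, a node close to one end of the path typically covers many nearby path nodes, which is the intuition behind storing only a few portal distances per node.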

Furthermore, many efficient practical methods have been devised [5, 10, 21]; their time and space complexities are, however, difficult to analyze. Competitive worst-case bounds have been achieved under the assumption that actual road networks have small highway dimension [2, 3]. We point out one particular method, which is in some sense related to our construction. Geisberger, Kobitzsch, and Sanders [11] provide a heuristic shortest-path query method with edge lengths defined as a customizable linear combination of two edge length functions (this linear combination, however, is not arbitrary, which is why we cannot use their data structure).

Fig. 2. For each node v we have O(1/ε) portals on each separator path Q such that the shortest path from v to each node q ∈ Q is approximated within a (1 + ε)-factor by going through one of the portals. Note that two nodes u and v may not necessarily use the same set of portals.

III. STOCHASTIC MOTION PLANNING WITH SPARSE PRECOMPUTED DATA

A. Overview

We provide a new stochastic path planning algorithm and its implementation in a real traffic network. The main research objective is to minimize the time for querying a path. Algorithms studied in [15, 20] find an exact solution by iteratively determining the parameter value k that governs the edge length ℓ(e) := k·μ(e) + τ(e) in terms of mean μ and variance τ. Such parameter values depend on a specific problem instance, defined by a deadline or a risk-aversion coefficient. We develop a method that finds a set of parameter values for the entire network, independent of the specific problem, and that provides guaranteed approximation accuracy for all choices of parameters simultaneously. We show that it suffices to maintain only a small set of parameters, which allows for small storage and efficient preprocessing. We preprocess the network using this set of parameter values. We implement the deterministic shortest-path algorithms required in [18] using a distance oracle, as described in the previous section. This allows for fast query times. The preprocessing algorithm, while taking quasi-linear time, is executed only once, and the data structure can be computed on a powerful machine (not necessarily the machine that the query algorithm runs on). One important technical difference compared to the method described in [15] is that we cannot use an adaptive algorithm at query time, since the preprocessing algorithm must prepare for each step that might get executed at query time. In other words, the preprocessing algorithm must prepare for all possible query pairs (s, t) ∈ V × V and, potentially, all deadline parameter values d ∈ R. While such preprocessing algorithms are standard for classical graph distances, adaptations are necessary for stochastic shortest paths.

B. Preprocessing Algorithm

We adapt the algorithms in Theorem 1 and Theorem 2 to our scenario, where we have access to an approximate oracle for the deterministic shortest-path problem with edge lengths e → k·μ(e) + τ(e) for some k ∈ K. In previous work [18], the aim was a polynomial-time algorithm, and shortest-path problems could be computed in quasi-linear time for any value k. Since we aim at sublinear query time, the queries made to the oracle need to be non-adaptive, or at least restricted to a small set of possible queries K. We can then precompute distance oracles for all these values k ∈ K. On a high level, the preprocessing algorithm is rather simple.

Algorithm 1: PREPROCESS
Data: G = (V, E) with given lengths μ, τ : E → R+
Result: distance oracles orc_k for each k ∈ K
compute the set K;
for k ∈ K do
    orc_k ← build (approximate) distance oracle for length function e → k·μ(e) + τ(e);
end

Note that this preprocessing algorithm can be parallelized in a straightforward way. Since there are no dependencies between the oracles constructed for different values of k, the preprocessing algorithm can also make use of multi-core architectures. As an immediate consequence, we obtain the following:

Claim 1: The preprocessing time is bounded by O(|K| · P), where P denotes the preprocessing time of the deterministic distance oracle.
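Algorithm 1 can be sketched as follows. Here the geometric grid {L, (1+ξ)L, (1+ξ)²L, …, U} stands in for the set K computed in the next subsection, and a plain per-k weight table stands in for a real (1+ε) distance oracle; the function names are our own, not the paper's implementation.

```python
# Hedged sketch of Algorithm 1 (PREPROCESS): one deterministic
# "oracle" per k in K. A real planar distance oracle is replaced by a
# trivial stand-in: the per-k edge-weight table one would feed to any
# shortest-path routine or oracle builder.
def geometric_k_values(L, U, xi):
    """K = {L, (1+xi)L, (1+xi)^2 L, ...} truncated at U."""
    ks, k = [], L
    while k <= U:
        ks.append(k)
        k *= 1.0 + xi
    return ks

def preprocess(edges, mu, tau, L, U, xi):
    """edges: iterable of edge ids; mu, tau: dicts edge -> mean/variance.
    Returns k -> {edge: k*mu(e) + tau(e)} (stand-in for orc_k)."""
    oracles = {}
    for k in geometric_k_values(L, U, xi):
        oracles[k] = {e: k * mu[e] + tau[e] for e in edges}
    return oracles
```

Since the loop iterations are independent, each k could be handled by a separate worker, matching the parallelization remark above.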


A main technical contribution of this work is showing how to compute (and analyze) this set K. The size of K mostly depends on the user’s choice of ε. We run the deterministic preprocessing algorithm for the graph with edge lengths ℓ_k(e) = k·μ(e) + τ(e) for k ∈ {L, (1+ξ)L, (1+ξ)²L, ..., U}, for ξ, L, U to be defined for each model. Also, the approximation parameter ε we use for the preprocessing algorithm in the distance oracle changes with the model.

1) A Mean-Risk Model: Using a δ-approximate distance oracle, the end result is a δ(1+ε)-approximation. Therefore, when asking for a (1+ε)-approximate answer, we internally set ε_int := ε/3, since, for ε < 1/2, we have that (1 + ε/3)² ≤ 1 + 2ε/3 + ε²/9 ≤ 1 + ε. In the following, we write ε instead of ε_int. Next, we set ξ, L, and U as follows:

ξ = √(ε/(1+ε)),  L = min_e τ(e),  U = Σ_e τ(e) ≤ n · max_e τ(e).

Alternatively, U can be set to the largest possible variance of any path. We have that the total number of values k is at most

|K| = log_{1+ξ}(U/L) = O( log( n · max_e τ(e) / min_e τ(e) ) / √ε ).

As a consequence, we obtain the following lemma.

Lemma 1: For any graph G = (V, E) and for any two length functions μ : E → R+ and τ : E → R+, there exists a set K of size |K| = O( ε^(−1/2) log( n · max_e τ(e) / min_e τ(e) ) ) such that mean-risk shortest paths can be approximated by a shortest path in G with length function ℓ_k(e) = k·μ(e) + τ(e) for some k ∈ K.

2) Probability-Tail Model: For the probability-tail model, we follow the same strategy as in the previous section, based on the proof of [18, Lemma 3.4], which shows that, in the mean-variance plane, any fixed parabola (also termed level set) with apex (d, 0) can be ε-approximated by a piecewise linear curve using few line segments (see [18, Figure 1(b)] for an illustration). We exploit this ε-approximation in our algorithm. Furthermore, we define level sets irrespective of the deadline d: we may actually define K for any value of d, since for a given query deadline d the actual parabolas are just shifted to the left or to the right. Each segment defines a slope value k, and the union of all these slope values (their absolute values, actually) is our set of slopes. Note that shifting the parabola to the left or to the right does not affect these slopes. We set ξ, L, and U as follows:

ξ = ε/2,  L = 2·min_e τ(e) / d_L²,  U = 2·Σ_e τ(e) / (ε·d_U) ≤ 2n·max_e τ(e) / (ε·d_U) ≤ 2n·max_e τ(e) / (ε·min_e μ(e)).

Although the user may choose to query the data structure for an arbitrarily large deadline d, we can give an upper bound on d_L = max d as follows. As soon as the deadline is so large that shortest paths do not change anymore, we do not have to consider larger values of d. In a straightforward way, d_L can be computed by computing all-pairs shortest paths for increasing values of d. Alternatively, again for increasing values of d, a 2-approximation of the diameter can be found in linear time.

C. Query Algorithm

The simplest version just tries all values k ∈ K.

Algorithm 2: QUERY
Data: G = (V, E) with given lengths μ, τ : E → R+; distance oracles orc_k for each k ∈ K; origin O, destination D, and mean-risk coefficient c or deadline d, depending on the risk model
Result: approximately optimal path p
p ← NIL;
for k ∈ K do
    q ← find_path(O, D, orc_k);
    if (meanRisk and q.μ + c·√(q.τ) < p.μ + c·√(p.τ)) or (probTail and (d − q.μ)/√(q.τ) > (d − p.μ)/√(p.τ)) then
        p ← q;
    end
end
return p;
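Algorithm 2 can be sketched compactly; `find_path`, the path objects with `.mu`/`.tau` attributes, and the oracle handles are stand-ins for the paper's data structures, not its actual implementation:

```python
# Hedged sketch of Algorithm 2 (QUERY): scan all k in K and keep the
# path that is best under the chosen risk model.
from math import sqrt

def query(oracles, find_path, O, D, mode, c=0.0, d=0.0):
    """oracles: k -> oracle handle; find_path(O, D, oracle) -> path
    object with .mu (mean) and .tau (variance); mode is 'meanRisk'
    or 'probTail'. Lower score is better for both modes."""
    def score(p):
        if mode == "meanRisk":
            return p.mu + c * sqrt(p.tau)       # minimize objective (2)
        return -(d - p.mu) / sqrt(p.tau)        # maximize objective (3)

    best = None
    for k, orc in oracles.items():
        q = find_path(O, D, orc)
        if best is None or score(q) < score(best):
            best = q
    return best
```

The binary-search heuristic mentioned below Claim 2 would replace the full scan over `oracles` with a search over the sorted k values, at the cost of losing the worst-case guarantee of the exhaustive version.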

Claim 2: The query time is bounded by O(|K| · Q), where Q denotes the query time of the deterministic distance oracle.

Combined with Claim 1, we obtain our main theorem.

Theorem 3: There is a data structure that can be computed in time O(|K| · P), occupying space O(|K| · S), and answering approximate stochastic shortest-path queries in time O(|K| · Q), where P, S, Q denote the preprocessing, space, and query complexities of the deterministic distance oracle, respectively.

In the practical part, we also have a heuristic that binary-searches for the optimal value in K, resulting in faster query times. Plugging in the distance oracle of Thorup [24], we obtain our main result. We state it in the mean-risk model; an analogous bound holds for the probability-tail model. To apply [24, Theorem 3.16], we set the largest integer length N(k) := max_{e∈E} (k·μ(e) + τ(e)) for each preprocessing step k ∈ K. Also, let τ̄ = max_e τ(e) / min_e τ(e).

Corollary 1: For any directed planar graph, there exists a data structure that can be computed in time O(n (log(nτ̄)) (lg(nN)) (lg n)³ ε^(−5/2)), occupying space O(n ε^(−3/2) (lg n) (lg(nN)) (log(nτ̄))), such that approximate stochastic shortest-path queries can be answered in time O(ε^(−1/2) log(nτ̄) lg lg(nN) + ε^(−3/2) log(nτ̄)).

The space can be reduced at the cost of increasing the query time by using the construction in [14]. In particular, for scenarios where memory is scarce (e.g. on mobile devices),


one can obtain a linear-space data structure. Note that the preprocessing algorithm can be executed on a powerful computer and not necessarily on the mobile device. The query time remains poly-logarithmic.

IV. STOCHASTIC TRAFFIC MODEL BASED ON GPS TRACES

We compute a stochastic model of an entire city using the following process. We start with a heterogeneous set of approximately 500M anonymized GPS points. Grouped by individual vehicles, we extract those traces (subpaths) for which the sampling frequency is sufficiently high (at least one data point every thirty seconds). For each trace, we employ map-matching algorithms to compute the sequence of road segments that is most likely to have been used by this particular trace. The process of matching GPS trace data onto a map has been investigated before [12, 27]. To overcome the noise and sparsity of GPS data, a map-matching method based on the Viterbi algorithm was suggested in [17]; we use their method for our dataset. Since we do not have the ground truth for our dataset, the correctness of the map-matching step cannot be verified. It is actually rather unlikely that all traces are mapped to the correct sequence of road segments; we assume that our dataset is large enough that the majority of traces is mapped to correct road segments. We determine the speed of each vehicle per road segment and we sort all the values for each road segment by time. We then bin speed values into one-hour intervals for each day of the week. Using standard statistics, we compute the sample mean and standard deviation of the speed and travel time for each bin. For examples of distributions for different road segments and different times of the day, see Fig. 3.

V. PRACTICAL CONSIDERATIONS

We adapt our method for planar networks so that it can efficiently handle the actual road network models we use in our experiments.
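The binning step of Section IV can be sketched as follows; the record format (segment id, weekday, hour, travel time) and function name are our own illustrative assumptions, not the paper's pipeline:

```python
# Hedged sketch of the traffic-model construction (Sec. IV): group
# per-segment observations into hour-of-week bins and compute the
# sample mean and (unbiased) sample variance per bin.
from collections import defaultdict

def build_model(observations):
    """observations: iterable of (segment_id, weekday, hour, travel_time).
    Returns (segment_id, weekday, hour) -> (mean, variance)."""
    bins = defaultdict(list)
    for seg, weekday, hour, t in observations:
        bins[(seg, weekday, hour)].append(t)
    model = {}
    for key, ts in bins.items():
        n = len(ts)
        mean = sum(ts) / n
        var = sum((x - mean) ** 2 for x in ts) / (n - 1) if n > 1 else 0.0
        model[key] = (mean, var)
    return model
```

The resulting per-bin (mean, variance) pairs are exactly the (μ_e, τ_e) inputs the planning algorithms of Section III expect.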

Fig. 3. The travel-time distributions of two different road segments for 0–1 am, 8–9 am, and 5–6 pm on weekdays.

A. Non-planarities

Our network is not exactly planar, and many real-world road networks may be non-planar [8]. The internal algorithms of the distance oracle use separator paths, which are sensitive to non-planarities (crossings). We propose to address most of these non-planarities as follows (inspired by similar constructions for minor-free graphs [1]). On the highest level of the recursion, instead of using two shortest paths to separate the graph, (i) we interpret the set of express ways (cf. highways) as a set of separator paths, (ii) we compute portals for each separator path Q and each node v (distances to portals are computed with respect to the entire graph), and (iii) we remove the set of express ways from the graph. The highest level of the recursion handles all paths that use any part of an express way. The remaining graph has only few further crossings.

Turn restrictions: For some intersections, particularly in urban settings, it may not be possible to make a left turn due to a turn restriction, or it may just be more time-consuming to make a left turn, which can be modeled by a turn cost. Turn costs can be taken into account by modeling each road segment by a node of the graph, and then connecting only those nodes whose corresponding road segments can be traversed in sequence [7, 28]. Such a transformation increases the number of nodes [9] and, more importantly for our oracles, it also violates planarity. We argue that planarity is not just violated in a moderate way but that the violation is actually such that it cannot possibly be fixed. If arbitrary turn restrictions were allowed, we could just (i) draw any graph in the plane (for example the large-girth graphs in [25] or the expander graphs in [23]), (ii) add a node for each intersection, (iii) apply left-turn and right-turn restrictions (i.e., apply the supposedly-existing transformation without violating planarity), (iv) compute a planar distance oracle [24], and (v) store only the distance labels for the actual nodes of the graph. If the transformation in step (iii) could be done, the resulting oracle would contradict known lower bounds. In our implementation, we connect each node to additional portals as follows: on any separator path, we deem each turn-restricted intersection a portal (assuming that those intersections are not ubiquitous). The upper bound of O(1/ε) portals per separator path cannot be guaranteed anymore; the bound in the implementation is O(1/ε + R), where R is the number of intersections with turn restrictions on a given separator path.

B. Directions

Realistic modeling of traffic requires us to consider directed graphs.¹ The existing implementation [16] is for undirected graphs only, and the directed oracle is significantly more involved [24]. Existing methods for directed road networks [5, 6, 10, 21] usually do not have theoretical worst-case bounds close to [24] (some theoretical results are known [2, 3]). For our scenario, we integrate the undirected implementation into our code. While our theoretical results hold for directed networks, our experimental evaluation is for undirected networks.

¹The shortest-path problem for directed graphs is strictly more general. In a straightforward way, any undirected graph can be represented by a directed graph with bidirectional edges. For arbitrary directed graphs, even if they are guaranteed to be sparse, not much is known with respect to shortest-path queries, while the undirected variant is quite well understood.

C. Reduced Set of Portals on Separator Paths

On the highest level, we cannot use planar-graph algorithms to efficiently compute portals and distances to portals for each node. Instead, we propose to relax the approximation guarantee in a controlled way. We heuristically treat each separator path (or express way) as in Algorithm 3.

Algorithm 3: PORTALS(H, Q, m)
Data: a path Q and a parameter m
Result: a set of portals for each v ∈ V(H)
SINGLESOURCESHORTESTPATHS(Q);  /* dummy source connected to each q ∈ Q; computes first portal q_0(v) and distance d(v, q_0(v)) for each v ∈ V(H) */
for j ∈ {0, 1, 2, ..., m−1} do
    Q^(j) ← subset of nodes q_i with i ≡ j (mod m);
    SINGLESOURCESHORTESTPATHS(Q^(j));  /* dummy source connected to all q ∈ Q^(j); computes another portal q_j(v) and distance d(v, q_j(v)) for each v ∈ V(H) */
    if (1 + ε)·d(v, q_j(v)) > d(v, q_i(v)) + d(q_i(v), q_j(v)) for all portals q_i(v) added previously then
        add portal;
    end
end

D. Deadline values

In the probability-tail model, the user may specify an arbitrary deadline d. As a consequence, the ratio of the smallest and the largest possible d can be arbitrarily large. As argued in the theoretical sections, this can be controlled for effectively. In practical implementations, one could also limit the range of user input values by restricting the deadline values available.

VI. EXPERIMENTS

We evaluate an implementation of our algorithm on a real-world network dataset. We show that the query time is several orders of magnitude faster than that of the previous algorithms, and we also show that approximately optimal paths are reasonably close to optimal paths.

A. Setup

The tests were run on a machine with 2 GB of main memory and an Intel Xeon 1.86 GHz processor. Our graph represents a city-scale network with 40,000+ nodes and 70,000+ edges. As described in the previous section, we adapted the worst-case-proven algorithm and tailored it to fit our real-world scenario. We implemented the distance oracle for undirected graphs based on the implementation of [16]. Since our network is mostly bidirectional (meaning that there are only few one-way streets), our graph could potentially be considered undirected. However, in time-dependent scenarios, the traffic may still flow in one direction, while a road may potentially be jammed in the other direction. Using the following heuristic, we can use the undirected variant of the distance oracle without significant impact on the path quality. To compile the different travel-time distributions of bidirectional road segments, we built distance oracles for two different edge length functions: one with inbound driving-time statistics and one with outbound driving-time statistics. The geometric orientation of an edge with respect to the city center was used as the reference to indicate which of the two travel-time statistics should be used when constructing the distance oracle. At query time, for example, if the route query overall corresponds to an inbound movement, the distance oracles built based on the inbound statistics are used. We examined the pre-processing times, query times, and approximation ratios for different ε values for the two models: specifically, we used ε ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1} for the Mean-Risk model and ε ∈ {0.5, 1} for the Probability-Tail model.

B. Pre-processing
Depending on the user’s desired accuracy level ε, we find the values ξ , L, and U (as outlined in the theoretical part). The sizes of the set K for different values of ε are shown in Fig. 4 (Left). The pre-processing time depends mostly on |K | (and therefore on ε). For each value of ε, we used c ∈ {0, 0.5, 1, · · · , 6} for the Mean-Risk model, and d ∈ {1.1 · m∗ , 1.2 · m∗ , · · · , 2 · m∗ } for the Probability-Tail model, where m∗ is the minimum expected travel time for a given origin-destination pair. 1) Computation time: The pre-processing times for different ε values are shown in Fig. 4 (Middle). For example, for the Mean-Risk model, if we set ε = 0.1, 82 different k values are used. The overall pre-processing time is 5,159 seconds. The average pre-processing time per oracle is about 63 seconds. Note that the number of distance oracles to pre-process only depends on the desired accuracy of the solution paths defined by ε. It is independent of the users’ other query parameters, such as origin, destination and the risk-aversion coefficient c in the Mean-Risk model or the deadline d in the Probability-Tail model. 2) Data space: Depending on ε, the overall space consumption varies greatly (see Fig. 4 (Right)), mostly due to two reasons. First, the number of k values increases as the accuracy is increased (meaning that ε is decreased), see Fig. 4 (Left), and second, the number of connections in the distance oracle grows linearly with 1/ε.
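The two user-facing objectives parameterized by the c and d grids above can be sketched in code. This is an illustrative reading only, with made-up edge statistics: we assume the Mean-Risk objective has the form μ + c·σ and approximate the Probability-Tail objective P(T ≤ d) with a normal CDF; neither assumption is taken from this paper's implementation.

```python
import math

# Hypothetical per-edge travel-time statistics (mean, variance) for two
# candidate routes; the numbers are illustrative, not from the paper's data.
route_a = [(4.0, 1.0), (3.0, 4.0), (5.0, 1.0)]   # short mean, high variance
route_b = [(5.0, 0.2), (4.0, 0.2), (4.0, 0.2)]   # longer mean, low variance

def mean_risk(route, c):
    """Mean-Risk objective mu + c*sigma, assuming independent edges."""
    mu = sum(m for m, _ in route)
    var = sum(v for _, v in route)
    return mu + c * math.sqrt(var)

def prob_tail(route, d):
    """Probability-Tail objective P(T <= d) under a normal approximation."""
    mu = sum(m for m, _ in route)
    sigma = math.sqrt(sum(v for _, v in route))
    return 0.5 * (1.0 + math.erf((d - mu) / (sigma * math.sqrt(2.0))))

# A risk-neutral user (c = 0) prefers the shorter-in-expectation route A,
# while a risk-averse user (c = 2) prefers the low-variance route B.
best_neutral = min((route_a, route_b), key=lambda r: mean_risk(r, 0.0))
best_averse = min((route_a, route_b), key=lambda r: mean_risk(r, 2.0))
```

The same edge statistics can serve both models, which is why only the scalar parameter (c or d) changes between queries while the pre-computed oracles are reused.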


Fig. 4. The number of k values needed for the road network: growth of |K| for the Mean-Risk and the Probability-Tail models with respect to the user-specified level of accuracy ε (Left); pre-processing time for each ε (Middle); total storage requirement for all the distance oracles (y-axis) for each ε value, with 1/ε on the x-axis (Right).

Fig. 5. The histogram of the query time for 1,000 random OD-pairs. 'Pruning algorithm' refers to Algorithm 1 in [15]. Our new method is faster by several orders of magnitude: the speed-up is from tens of seconds or several minutes down to tens or hundreds of microseconds.

C. Path Queries

To measure the query time, we compute stochastic shortest paths between 1,000 random origin-destination (OD) pairs. To evaluate the quality and accuracy of the reported paths, we query 1,000 random OD pairs.
1) Query time: The time required to answer a route query for different values of ε is illustrated in Fig. 5. The query time decreases as ε increases. As a comparison to our method, we implemented [15, Algorithm 1]; its query time results are also plotted in Fig. 5.
2) Approximation quality: We compare the quality of the path output by our algorithm to the optimal path length, computed by the algorithm described in [15]. As a quality measure, we report the approximation ratio, defined as the difference between the optimal solution (as defined in equations (2) and (3), respectively) and our solution, divided by the optimal solution (see Fig. 6). Our worst-case guarantee says that this ratio can never be larger than the user-specified level of accuracy ε. Our experimental results show that the ratio is

Fig. 6. The histogram of the absolute performance guarantee (error) for 1,000 random OD-pairs. We used c ∈ {0, 0.5, · · · , 6} for the Mean-Risk model and d ∈ {1.1 · m∗, 1.2 · m∗, · · · , 2 · m∗} for the Probability-Tail model, where m∗ is the minimum expected travel time for a given origin-destination pair.

smaller than ε, and in fact substantially smaller. Even for a relatively large value such as ε = 1, we still obtain an approximation ratio of less than 0.02. Based on this experimental result, we suggest using larger values of ε, saving both computation time (pre-processing and query) and storage (as outlined in the previous section).

D. Path Examples

We developed a web interface that responds to user inputs. Users can select the origin and destination, as well as the time of day and the day of the week. The system computes the stochastic shortest path and displays it on the map. Fig. 1 shows our different solution paths for different deadlines for the Probability-Tail model with ε = 0.1.

VII. CONCLUSION

We improve upon the state of the art in stochastic path planning, providing a method that can answer stochastic shortest-path queries in poly-logarithmic time using a data structure that occupies space essentially proportional to the size of the


network. Our method has proven worst-case guarantees and it might be applicable to real-world scenarios. VIII. ACKNOWLEDGMENTS Support for this research has been provided by the Singapore-MIT Alliance for Research and Technology (The Future of Urban Mobility project), NSF awards CPS-0931550 and 0735953, and ONR awards N00014-09-1-1051 (MURI SMARTS) and N00014-09-1-1031. C.S. was partially supported by the Swiss National Science Foundation. We are grateful for this support. Many thanks to Oliver Senn for connecting the first two authors. R EFERENCES [1] I. Abraham and C. Gavoille. Object location using path separators. In 25th ACM Symposium on Principles of Distributed Computing (PODC), pages 188–197, 2006. [2] I. Abraham, A. Fiat, A. Goldberg, and R. Werneck. Highway dimension, shortest paths, and provably efficient algorithms. In 21st ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010. [3] I. Abraham, D. Delling, A. Fiat, A. Goldberg, and R. Werneck. VC-dimension and shortest path algorithms. In 38th Int. Colloquium on Automata, Languages, and Programming (ICALP), pages 690–699, 2011. [4] Y. Bartal, L.-A. Gottlieb, T. Kopelowitz, M. Lewenstein, and L. Roditty. Fast, precise and dynamic distance queries. In 22nd ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 840–853, 2011. [5] H. Bast, S. Funke, P. Sanders, and D. Schultes. Fast routing in road networks with transit nodes. Science, 316(5824):566, 2007. [6] R. Bauer and D. Delling. SHARC: Fast and robust unidirectional routing. ACM Journal of Experimental Algorithmics, 14, 2009. [7] T. Caldwell. On finding minimum routes in a network with turn penalties. Comm. of the ACM, 4(2), 1961. [8] D. Eppstein and M. T. Goodrich. Studying (non-planar) road networks through an algorithmic lens. In 16th ACM SIGSPATIAL Int. Symposium on Advances in Geographic Information Systems (GIS), page 16, 2008. [9] R. Geisberger and C. Vetter. Efficient routing in road networks with turn costs. In 10th Int. 
Symposium on Experimental Algorithms (SEA), pages 100–111, 2011. [10] R. Geisberger, P. Sanders, D. Schultes, and D. Delling. Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In 7th Int. Workshop on Experimental Algorithms (WEA), pages 319–333, 2008. [11] R. Geisberger, M. Kobitzsch, and P. Sanders. Route planning with flexible objective functions. In 12th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 124–137, 2010. [12] J. Greenfeld. Matching GPS observations to locations on a digital map. In 81st Annual Meeting of the Transportation Research Board, volume 1, 2002.

[13] J. Gudmundsson, C. Levcopoulos, G. Narasimhan, and M. Smid. Approximate distance oracles for geometric spanners. ACM Transactions on Algorithms, 4(1), 2008. [14] K. Kawarabayashi, P. N. Klein, and C. Sommer. Linear-space approximate distance oracles for planar, bounded-genus and minor-free graphs. In 38th Int. Colloquium on Automata, Languages and Programming (ICALP), pages 135–146, 2011. [15] S. Lim, H. Balakrishnan, D. Gifford, S. Madden, and D. Rus. Stochastic motion planning and applications to traffic. International Journal of Robotics Research, 30(6):699–712, 2011. Announced at WAFR 2008. [16] L. F. Muller and M. Zachariasen. Fast and compact oracles for approximate distances in planar graphs. In 15th European Symposium on Algorithms (ESA), pages 657–668, 2007. [17] P. Newson and J. Krumm. Hidden Markov map matching through noise and sparseness. In 17th ACM SIGSPATIAL Int. Conference on Advances in Geographic Information Systems (GIS), pages 336–343, 2009. [18] E. Nikolova. Approximation algorithms for reliable stochastic combinatorial optimization. In 13th Int. Workshop on Approximation, Randomization, and Combinatorial Optimization (APPROX), pages 338–351, 2010. [19] E. Nikolova, M. Brand, and D. R. Karger. Optimal route planning under uncertainty. In 16th Int. Conference on Automated Planning and Scheduling (ICAPS), pages 131–141, 2006. [20] E. Nikolova, J. A. Kelner, M. Brand, and M. Mitzenmacher. Stochastic shortest paths via quasi-convex maximization. In 14th European Symposium on Algorithms (ESA), pages 552–563, 2006. [21] P. Sanders and D. Schultes. Highway hierarchies hasten exact shortest path queries. In 13th European Symposium on Algorithms (ESA), pages 568–579, 2005. [22] R.-P. Schäfer. IQ routes and HD traffic: technology insights about TomTom's time-dynamic navigation concept. In Foundations of Software Engineering (SIGSOFT FSE), pages 171–172, 2009. [23] C. Sommer, E. Verbin, and W. Yu. Distance oracles for sparse graphs.
In 50th IEEE Symposium on Foundations of Computer Science (FOCS), pages 703–712, 2009. [24] M. Thorup. Compact oracles for reachability and approximate distances in planar digraphs. Journal of the ACM, 51(6):993–1024, 2004. [25] M. Thorup and U. Zwick. Approximate distance oracles. Journal of the ACM, 52(1):1–24, 2005. [26] TomTom. How TomTom’s HD Traffic and IQ Routes data provides the very best routing — travel time measurements using GSM and GPS probe data. White Paper. [27] C. White, D. Bernstein, and A. Kornhauser. Some map matching algorithms for personal navigation assistants. Transportation Research Part C: Emerging Technologies, 8(1-6):91–108, 2000. [28] S. Winter. Modeling costs of turns in route planning. GeoInformatica, 6(4):363–380, 2002.


A Distributable and Computation-Flexible Assignment Algorithm: From Local Task Swapping to Global Optimality Lantao Liu

Dylan A. Shell

Dept. of Computer Science and Engineering Texas A&M University, College Station, USA Email: [email protected]

Dept. of Computer Science and Engineering Texas A&M University, College Station, USA Email: [email protected]

Abstract— This paper introduces an algorithm for solving the assignment problem with several appealing features for online, distributed robotics applications. The method can start with any initial matching and incrementally improve the solution to reach the global optimum, producing valid assignments at any intermediate point. It is an any-time algorithm with an attractive performance profile (quality improves linearly) that, additionally, is comparatively straightforward to implement and is efficient both theoretically (O(n^3 lg n) complexity is better than widely used solvers) and practically (comparable to the fastest implementation, for up to hundreds of robots/tasks). We present a centralized version and two decentralized variants that trade between computational and communication complexity. Inspired by techniques that employ task exchanges between robots, our algorithm guarantees global optimality while using generalized "swap" primitives. The centralized version turns out to be a computational improvement and reinterpretation of the little-known method of Balinski-Gomory, proposed half a century ago. Deeper understanding of the relationship between approximate swap-based techniques (developed by roboticists) and combinatorial optimization techniques, e.g., the Hungarian and Auction algorithms (developed by operations researchers but used extensively by roboticists), is uncovered.

I. INTRODUCTION A common class of multi-robot task-allocation mechanisms involves estimating the expected cost of each robot's performance of each available task, and matching robots to tasks in order to minimize overall cost. By allocating robots to tasks repeatedly, a team can adapt as circumstances change and demonstrate fluid coordination. A natural tension exists between two factors: running time is important as it determines how dynamic the team can be, while the quality of the allocation reflects the resultant total cost and hence the performance of the team. While the importance of solutions that trade the quality of results against the cost of computation has been established for some time (e.g., review in [1]), the assignment problem underlying efficient task-allocation has received little attention in this regard. This paper introduces an algorithm that yields a feasible allocation at any point in its execution and an optimal assignment when it runs to completion. The results give an easily characterizable relationship between running time and allocation quality, allowing one factor to be traded for the other, and even the marginal value of computation to be estimated. Additionally, the algorithm may start from any initial matching, so it can easily be used to refine sub-optimal assignments computed by other methods.

But the flexibility afforded by an any-time algorithm will be counterproductive if it comes at too high a cost. The method we describe has strongly polynomial running time, and we show that it can be competitive with the fastest existing implementation even for hundreds of robots and tasks. Additionally, the cost can be borne by multiple robots because variants of the algorithm can be executed in a decentralized way. We are unaware of another solution to the assignment problem with these features. II. RELATED WORK Task allocation is one of the fundamental problems in distributed multi-robot coordination [2]. This paper draws a connection between (A.) methods for improving local performance, e.g., via incremental improvement of clique preferences, and (B.) allocation methods which seek to solve (or approximate) the global optimum of the assignment. A. Local Task Exchanges in Task-Allocation Several researchers have proposed opportunistic methods in which pairs of robots within communication range adjust their workload by redistributing or exchanging tasks between themselves [3, 4, 5], also called O-contracts [6]. These intuitively appealing methods allow for a form of localized, lightweight coordination of the flavor advocated by [7]. Zheng and Koenig [8] recently explored a generalization of the idea in which an exchange mechanism involving K robots (called K-swaps) improves solution quality. They theoretically analyzed the method and illustrated its properties empirically. This paper gives new insight into how generalized swap-like mechanisms can ensure optimality, in our case through something analogous to automatic computation of the necessary value of K. Also, we have characterized the running time of our method. B. Optimal Assignment in Task-Allocation The first and best-known optimal assignment method is Kuhn's O(n^3) Hungarian algorithm [9].
It is a dual-based algorithm because the variables in the dual program are maintained as feasible during each iteration in which a primal solution is sought. Many other assignment algorithms have been developed subsequently (see the review [10]); however, most are dual-based methods, including the augmenting path [11], auction [12], and pseudo-flow [13] algorithms. These (and approximations to them) underlie many examples of robot task-allocation, e.g., see [14, 15, 16]. Special mention must be made

257

of market-based methods ([17, 18]), as they have proliferated, presumably on the basis of inspiration from real markets and their naturally distributed operation, and of Bertsekas's economic interpretation of dual variables as prices [12]. Fully distributing such methods sacrifices optimality: [19] gives bounds for some auction strategies, and [20] introduces a distributed auction algorithm in the context of networked systems. Little work reports using primal approaches for task-allocation; researchers who solve the (relaxed) Linear Program directly likely use the popular (and generally non-polynomial-time) simplex method [21]. The primal assignment algorithm proposed by Balinski and Gomory [22] is an obscure method that appears entirely unknown within robotics. The relationship is not obvious from their presentation, but their chaining sequence of alternating primal variables is akin to the swap loop transformation we have identified. Our centralized algorithm improves on their run-time performance (they require O(n^4) time). Also, the data structures we employ differ, as ours were selected to reduce communication cost in the decentralized versions, which is not something they consider.

III. PROBLEM DESCRIPTION & PRELIMINARIES

We consider the multi-robot task assignment problem in which the solution is an association of each robot to exactly one task, denoted SR-ST-IA in [14]. An assignment A = ⟨R, T⟩ consists of a set of robots R and a set of tasks T. We are given a matrix C = (cij)n×n, where cij : R × T → R+ represents the cost of having robot i perform task j. In our work, n = |R| = |T|: the number of robots is identical to the number of tasks (otherwise dummy rows/columns can be inserted).

A. Formulations

This problem can be formulated as an equivalent pair of linear programs. The primal is a minimization formulation:

    minimize   f = Σ_{i,j} cij xij,
    subject to Σ_j xij = 1, ∀i,
               Σ_i xij = 1, ∀j,                (1)
               xij ≥ 0, ∀(i, j),

where an optimal solution is eventually an extreme point of its feasible set (each xij equals 0 or 1). Let the binary matrix X = {xij}, ∀(i, j), contain the primal variables. The constraints Σ_j xij = 1 and Σ_i xij = 1 enforce a mutual exclusion property. There are corresponding dual vectors u = {ui} and v = {vj}, with dual linear program:

    maximize   g = Σ_i ui + Σ_j vj,            (2)
    subject to ui + vj ≤ cij, ∀(i, j).

B. Reduced Cost, Complementary Slackness, and Feasibility

Reduced costs are defined as follows:

    c̄ij = cij − ui − vj, ∀(i, j).             (3)

For a maximization dual as shown in Program (2), its constraint shows that an assignment pair (i, j) is feasible when and only when c̄ij ≥ 0.
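Because an optimal solution of program (1) is an extreme point, i.e., a permutation matrix, a tiny instance can be solved by enumerating permutations. The following is an intuition-building sketch with illustrative costs of our own choosing (a real solver would use the Hungarian algorithm or the primal method of this paper):

```python
from itertools import permutations

# Illustrative 3x3 cost matrix (not from the paper); sigma[i] below is the
# task assigned to robot i, so a permutation encodes a full assignment X.
C = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]

def assignment_cost(C, sigma):
    """Objective f of program (1) for the permutation matrix given by sigma."""
    return sum(C[i][j] for i, j in enumerate(sigma))

n = len(C)
best = min(permutations(range(n)), key=lambda s: assignment_cost(C, s))
best_cost = assignment_cost(C, best)
```

Enumeration is O(n!) and only serves to make the feasible set concrete; the point of the paper's method is to reach the same optimum in O(n^3 lg n).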

(a) (b) Fig. 1. Primal transformations are task swaps. (a) A cost matrix with two independent swap loops, where shaded and bold-edged squares represent old and new assigned entries, respectively; (b) Task swapping from an independent swap loop (e.g., left closed loop in (a)) among four robots and tasks.

The duality theorems show that a pair of feasible primal and dual solutions are optimal iff the following is satisfied:

    xij (cij − ui − vj) = xij c̄ij = 0, ∀(i, j).             (4)

Equation (4) is called complementary slackness. It also indicates that, if a robot-task pair (i, j) is assigned, i.e., xij = 1, then the corresponding reduced cost c̄ij must be equal to 0.

C. Transformations and Admissibilities

Primal and dual transformations and, in particular, their admissibilities are used later in the paper.
• Admissible Primal Transformation: A map Zp : X → X′ is an admissible primal transformation if the primal solution quality is better after the transformation, i.e., X′ = Zp(X) is admissible iff f(X′) < f(X) for a minimization problem.
• Admissible Dual Transformation: Zd : (u, v) → (u′, v′) is an admissible dual transformation if the size of the set of feasible reduced costs increases, i.e., (u′, v′) = Zd(u, v) is admissible iff |{(i, j) | c̄′ij ≥ 0}| > |{(i, j) | c̄ij ≥ 0}|.

IV. TASK SWAPPING AND OPTIMALITY

Any primal transformation X′ = Zp(X) is easily visualized by superimposing both X and X′ on an assignment matrix. Shown as shaded and bold-edged entries in Fig. 1(a), the transformations can be interpreted as row-wise and column-wise aligned arcs, each of which bridges exactly one shaded entry (old assignment) and exactly one bold-edged entry (new assignment). Note that both types of entries have reduced costs equal to 0. Connecting the beginning to the end closes the path to form what we call a swap loop, which is easily imagined as a subset of robots handing over tasks in a chain. If a swap loop shares no path segment with any other, it is termed independent.
Theorem 4.1: A primal transformation X′ = Zp(X) with X ≠ X′ forms a (non-empty) set of independent swap loops.
Proof: The mutual exclusion property proves both parts. Independence: if a path is not independent, there must be at least one segment that is shared by multiple paths. Any such segment contradicts the mutual exclusion constraints, since either Σ_i xij > 1, or Σ_j xij > 1, or both. Closeness: a non-closed path has end entries that are exposed, but this leads to Σ_j xij = 0 or Σ_i xij = 0.
Assume Sswp = {swpλ} (λ ∈ [1, m]) is a set of swap loops, where swpλ denotes the λth swap loop. Let the primal transformation X → X′ with the specific set of swap loops Sswp also be denoted X′ = Zp^{Sswp}(X).
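A swap loop amounts to a cyclic rotation of tasks among a subset of robots while all other robots keep their tasks. A minimal sketch (the robot/task names and the rotation direction are our own convention, not the paper's):

```python
# Apply a swap loop: robot loop[i] takes the task currently held by the
# next robot in the loop, so tasks rotate along the chain of robots.
def apply_swap_loop(assignment, loop):
    """assignment: dict robot -> task; loop: list of robots in the loop."""
    new = dict(assignment)
    k = len(loop)
    for i in range(k):
        new[loop[i]] = assignment[loop[(i + 1) % k]]
    return new

X = {'r1': 't1', 'r2': 't2', 'r3': 't3', 'r4': 't4'}
# Hypothetical 3-robot swap loop among r1, r2, r4; r3 is untouched.
X2 = apply_swap_loop(X, ['r1', 'r2', 'r4'])
```

Note that the result is again a valid assignment: the rotation preserves the mutual exclusion property, which is exactly why swap loops are the right primal primitive.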


Theorem 4.2: A primal transformation involving mutually independent swap loops Sswp = {swp1, swp2, · · · , swpm} can be separated and chained in any random order, i.e.,

    X′ = Zp^{swp1}(Zp^{swp2}(· · · Zp^{swpm}(X))).

Proof: A primal transformation is isomorphic to a set of row and column permutations. Assume the row and column permutation matrices (each a square orthogonal binary doubly stochastic matrix) corresponding to set Sswp are P and Q, so that PXQ permutes the rows and columns of X appropriately. If row i is unaffected, the ith column of P is pi = ei (the ith column of the identity matrix), and then P = P1 P2 · · · Pm, where Pλ represents the separated permutation matrix for the λth swap loop, has a non-interfering form so that the order of the product does not matter. Thus we have X′ = PXQ = P1 P2 · · · Pm X Qm Qm−1 · · · Q1 (the order of the Pλ does not matter, nor, analogously, does that of the Qλ), which is equivalent to X′ = Zp^{swp1}(Zp^{swp2} · · · Zp^{swpm}(X)).

Fig. 2. Amalgamation allows synthesis of complex swap loops from multiple dependent swap loops. Overlapped path segments cancel each other out.

However, many times independent swap loops cannot be directly obtained. Instead, an independent swap loop may be composed of multiple dependent swap loops that share rows/columns on some path segments.
Theorem 4.3: Two dependent swap loops with overlapping, reversed segments can be amalgamated into a new swap loop, and vice versa.
Proof: A directed path segment can be conveniently represented as a vector π. Path segments π1 and π2 sharing the same rows or columns, but with different directions, cancel via π = π1 + π2, which has the interpretation of a task (robot) handed from one robot (task) to another, but then passed back again. Such cancellation must form a loop because each merger collapses one pair of such segments, consistently connecting two partial loops. The opposite operation (decomposition) involves analogous reasoning.
While the ordering of independent swap loops is unimportant, the number, size, and order of dependent loops matter.
Theorem 4.4: When K < n, K-swaps are susceptible to local minima.
Proof: A K-swap loop involves at most K robots and K assigned tasks. Quiescence results from reaching equilibrium after sufficiently many K-swaps, so that no more swaps can be executed. The robots and their assigned tasks involved in a K-swap form a smaller sub-assignment of size K. Thus, we have (n choose K) possible such sub-assignments, and all of them are optimal at equilibrium. Assume the set of these sub-assignments is SA = {Aγ}, where γ ∈ [1, (n choose K)]. Aγ = {Rγ, Tγ} represents the sub-assignment with robot (task) index sets satisfying |Rγ| = K (|Tγ| = K). Therefore, the dual program for each sub-assignment is:

    max g(Aγ) = Σ_{i∈Rγ} ui + Σ_{j∈Tγ} vj,                        (5)
    subject to ui + vj ≤ cij, ∀i ∈ Rγ, j ∈ Tγ.                    (6)

If we put all the sub-assignments together, the objective of the whole assignment problem can be written in the form

    (n−1 choose K−1)^(−1) Σ_{γ∈[1,|SA|]} max g(Aγ),               (7)

where the first term in the product accounts for the repeated summation of each dual variable. By the fact that Σ_i (max zi) ≥ max (Σ_i zi) (the zi being arbitrary functions), we have

    (n−1 choose K−1)^(−1) Σ_{γ∈[1,|SA|]} max g(Aγ) ≥ max g(A),    (8)

where A is the original n × n assignment. With the duality theorems, this is equivalent to

    (n−1 choose K−1)^(−1) Σ_{γ∈[1,|SA|]} min f(Aγ) ≥ min f(A).    (9)
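The local-minimum phenomenon of Theorem 4.4 can be made concrete on a toy 3 × 3 instance of our own construction (not from the paper): every 2-swap strictly worsens the current assignment, yet a 3-cycle rotation reaches the optimum, so a 2-swap equilibrium is only a local minimum.

```python
from itertools import combinations, permutations

# Toy instance: the diagonal assignment is a 2-swap local minimum but not
# globally optimal (costs are illustrative).
C = [[3, 0, 9],
     [9, 3, 0],
     [0, 9, 3]]

def cost(sigma):
    return sum(C[i][sigma[i]] for i in range(3))

sigma = (0, 1, 2)          # current assignment, cost 3 + 3 + 3 = 9
two_swap_costs = []
for i, j in combinations(range(3), 2):
    s = list(sigma)
    s[i], s[j] = s[j], s[i]          # exchange the two robots' tasks
    two_swap_costs.append(cost(tuple(s)))

# The global optimum is the 3-cycle rotation (1, 2, 0) with cost 0.
optimum = min(cost(p) for p in permutations(range(3)))
```

Every pairwise exchange costs 12 > 9, so no 2-swap fires, even though rotating all three tasks achieves cost 0; an exchange mechanism with K = n = 3 is needed here.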

So even completing every possible K-swap, and doing so until equilibrium is reached, may still end sub-optimally.

V. AN OPTIMAL SWAP-BASED PRIMAL METHOD

The preceding results suggest that to obtain the optimal primal transformation, one seeks a set of independent swap loops, but that these can be equivalently sought as a series of dependent swap loops. The primal assignment method we describe achieves this iteratively and avoids local minima because later swaps may correct earlier ones based on "enlarged" views that examine increasing numbers of rows and columns. The essence of the primal assignment method is that, at any time, the primal solution's feasibility is maintained (i.e., the mutual exclusion property is satisfied), while infeasible dual variables are manipulated under the complementary slackness condition. At each iteration, either an admissible primal transformation is found, or a new, improved set of dual variables is obtained. Once all reduced costs are feasible, the primal and dual solutions simultaneously reach their (equal-valued) optimum.

Algorithm V.1 PRIMAL(C)
1: init arrays u[n] := {0}, v[n] := diag(C), C̄[n][n] := {0}
2: for i := 1 to n do
3:   update matrix C̄ with c̄i′j′ = ci′j′ − ui′ − vj′, ∀i′, j′
4:   if min{C̄[:][i]} < 0 then
5:     array δ[n] := {0}
6:     heap h[n] := PRE-PROCESS(C̄, u, v)
7:     check the ith column of C̄, get smallest-valued entry (x, y)
8:     SWAP LOOP(C̄, δ, h, x, y)
9:     for j := 1 to n do
10:      u[j] := u[j] + δ[a(j)], v[j] := v[j] − δ[j]
11:    v[y] := v[y] − |C̄[x][y] − ux − vy| so that C̄[x][y] = 0
12:    if a swap loop was found, swap tasks to augment the solution
Note: Variable ui and u[i] are equivalent, vector v ≡ v[n], matrix C̄ ≡ C̄[n][n].

The whole algorithm is organized in Algorithm V.1, and critical steps are described in Algorithms V.2–V.4 in some detail to ensure that the pseudo-code is appropriate for straightforward implementation.

The whole algorithm is organized in Algorithm V.1, and critical steps are described in Algorithms V.2–V.4 in some detail to ensure that the pseudo-code is appropriate for straightforward implementation.


A. Algorithm V.2: Pre-processing

At each stage, the reduced cost matrix C̄ is pre-processed before searching for a swap loop: a separate min-heap is used to maintain the feasible reduced costs in each row, such that the smallest values (root elements) can be extracted or removed efficiently.

Algorithm V.2 PRE-PROCESS(C̄, u, v)
1: initiate n min-heaps h[n] := {null}
2: for i := 1 to n do
3:   for j := 1 to n do
4:     if C̄[i][j] ≥ 0 AND j ≠ a(i) then
5:       make pair p := ⟨label = j, value = C̄[i][j]⟩
6:       insert p into h[i]
7: return min-heaps h
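These per-row min-heaps map directly onto Python's heapq. The sketch below follows the structure of Algorithm V.2; the (value, label) pair ordering, the list-based matrix, and the array `a` standing in for the assignment function a(·) are our choices:

```python
import heapq

def pre_process(cbar, a):
    """Build one min-heap per row containing (value, label) pairs for the
    feasible reduced costs, skipping each row's assigned column a[i]."""
    n = len(cbar)
    heaps = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if cbar[i][j] >= 0 and j != a[i]:
                heapq.heappush(heaps[i], (cbar[i][j], j))
    return heaps

cbar = [[0, 2, -1],
        [4, 0, 1],
        [3, 5, 0]]      # illustrative reduced costs
a = [0, 1, 2]           # diagonal assignment
heaps = pre_process(cbar, a)
value, label = heaps[1][0]   # smallest feasible entry of row 1: (1, 2)
```

Tuples compare lexicographically in Python, so putting the value first gives the min-by-value ordering the algorithm needs; root extraction is O(1) and removal O(lg n), matching the complexity accounting in Section V-E.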

B. Algorithm V.3: Searching for Swap Loops

Any swap loop yields an admissible primal transformation. Loops are sought by bridging path segments in the reduced cost matrix. A horizontal path segment is built from a currently assigned entry to a new entry with reduced cost of zero in the same row. Vertical path segments are implicitly identified as running from unassigned zero-valued entries to the unique assigned entries in the respective columns. Fig. 3(a) shows the process. The search uses a tree, expanded in breadth-first fashion, to find the shortest loop; a dead end (i.e., an empty queue) triggers the dual adjustment step.

Algorithm V.3 SWAP LOOP(C̄, h, δ, x, y)
1: starting row rs := x, column ts := y, SR := ST := ∅
2: initiate r := a⁻¹(y), t := y, Vpath(t : rs → r)
3: push r into queue Q, color(SR ∪ {r}; ST ∪ {ts})
4: while Q not empty AND Q.front ≠ rs do
5:   r := Q.front, Q.pop once
6:   initiate set Sδ := {r}
7:   for each r ∈ Sδ do
8:     t := h[r].extract.label
9:     while p(r, t)† = 0 do
10:      if t ∉ ST then
11:        Hpath(r : a(r) → t), Vpath(t : r → a⁻¹(t))
12:        push a⁻¹(t) into Q, color(SR ∪ {a⁻¹(t)}; ST ∪ {t})
13:      h[r].remove root element and update root
14:      update t := h[r].extract.label
15:   if Q empty then
16:     DUAL ADJ(C̄, Q, h, δ, Sδ, rs, ts)
17:     if updated Sδ not empty then
18:       go to STEP 7
19:     return
20: Hpath(rs : t → ts), form a loop

†: p(·) is a projection of reduced cost, defined in (11) on page 5.

In Algorithm V.3, the function t = a(r) denotes that the assignment of row r is column t, and thus is used to extract the column index for a given row index; the inverse a⁻¹ does the reverse. Horizontal (vertical) segments are constructed via Hpath(cur_row : col1 → col2) (Vpath(cur_col : row1 → row2)), where the three arguments represent the current row (column) containing the path, the starting column (row), and the ending column (row) of the segment, respectively. The coloring of visited rows/columns is merely the set union operation.
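To make the loop search concrete, here is a simplified, centralized sketch: rows hand tasks along zero-reduced-cost entries, breadth-first, until the chain reaches back to the starting row. It deliberately omits the heaps, the coloring bookkeeping, and the dual adjustment of Algorithm V.3, and its helper names are ours, so it is an illustration of the search idea rather than the algorithm itself.

```python
from collections import deque

def find_swap_loop(cbar, a, rs, ts):
    """Search for a swap loop that starts by giving column ts to row rs.
    Returns a list of (row, new_column) moves, or None if no loop exists."""
    n = len(cbar)
    inv = {t: r for r, t in enumerate(a)}   # column -> row assigned to it
    start = inv[ts]                         # row displaced by rs taking ts
    parent = {start: None}                  # row -> (previous row, column it took)
    used_cols = {ts}
    q = deque([start])
    while q:
        r = q.popleft()
        if r == rs:                         # chain closed back to the start
            moves, node = [(rs, ts)], parent[r]
            while node is not None:
                prev_row, col = node
                moves.append((prev_row, col))
                node = parent[prev_row]
            return moves
        for t in range(n):
            if cbar[r][t] == 0 and t != a[r] and t not in used_cols:
                used_cols.add(t)
                nxt = inv[t]                # row displaced if r takes t
                if nxt not in parent:
                    parent[nxt] = (r, t)
                    q.append(nxt)
    return None

cbar = [[0, 5, -2],                         # entry (0, 2) is infeasible
        [3, 0, 4],
        [0, 4, 0]]
a = [0, 1, 2]                               # current (diagonal) assignment
loop = find_swap_loop(cbar, a, rs=0, ts=2)
new_a = list(a)
for r, t in loop or []:
    new_a[r] = t                            # apply the loop: a valid permutation
```

In the example, robot 0 takes task 2, which displaces robot 2 onto its zero-reduced-cost alternative, task 0, closing a two-robot loop while robot 1 is untouched.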

(a) (b) Fig. 3. (a) Path segments are bridged with one another while searching for swap loops. Shaded entries are currently assigned, and bold-edged entries have reduced costs equal to zero. Waved lines represent the paths found after dual adjustments; (b) The associated tree data structure that aids efficient searching.

C. Algorithm V.4: Dual Adjustments

If all branches hit dead ends, the dual adjustment introduces entries with 0-valued reduced costs so that the tree can be expanded further. This is done by changing the values of the dual variables, which indirectly changes the reduced costs of the corresponding entries. Doing so can only increase the size of the set of feasible reduced costs; thus the dual adjustment never deteriorates the current result. The method subtracts the smallest feasible reduced cost from all visited (colored) rows and adds it to every visited column, producing at least one new 0-valued reduced cost. The waved arrows in Fig. 3 illustrate this procedure.

Algorithm V.4 DUAL ADJ(C̄, Q, h, δ, Sδ, rs, ts)
1: array top_d[n] := {∞}, col_d[n] := {0}
2: for i := 1 to n do
3:   if row i ∈ SR then
4:     top_d[i] := p(i, h[i].extract.label)
5: min_δ := min{top_d[i]}
6: if min_δ > 0 then
7:   update Sδ := {i | top_d[i] = min_δ}
8: else
9:   min_δ := −p(rs, ts)
10: for i := 1 to n do
11:   if row i ∈ SR then
12:     update δ[a(i)] := δ[a(i)] + min_δ
13:     col_d[i] := p(i, ts)
14: if min{col_d[i]} ≥ 0 then
15:   terminate current stage
16: update starting row rs := argmin_i{col_d[i]}
17: if rs ∈ SR then
18:   Hpath(rs : a(rs) → ts), form a loop
19: terminate current swap loop searching

Next, we return to the relation of this method to Balinski-Gomory's primal technique [22]. The theoretical complexity and the empirical results below show the superiority of the swap-based approach. Nevertheless, it is worthwhile to address the conceptual differences in detail, as a common underlying idea is involved: both employ an iterative labelling and updating technique to seek a chaining sequence of alternating primal variables, which are used to adjust and augment the primal solutions. Three aspects are worth highlighting:
1) The swap loop search incorporates the dual adjustment procedure. Balinski-Gomory's method may require n rounds of traversals and cost O(n) times more than the traversal based on building and maintaining our tree. This modification is most significant in the decentralized context, as each traversal involves communication overhead.
2) Instead of directly updating u, v, the δ array accumulates the dual variable adjustments during each stage. All updates


are transferred to u and v after the whole stage is terminated:

    u′i = ui + Σω δ(ω)a(i) for all i ∈ SR, and u′i = ui otherwise;
    v′j = vj − Σω δ(ω)j for all j ∈ ST, and v′j = vj otherwise,        (10)

where SR and ST are the index sets of colored rows and columns, respectively, and ω indexes the iterations. The benefit is that the reduced costs of the whole matrix need not be updated on every dual-variable adjustment. Instead, the reduced cost c̄′ij of an individual entry (i, j) during an intermediate stage can be obtained via a projection p(i, j):

    c̄′ij = p(i, j) = c̄ij + δj − δa(i).                                (11)

3) Swap loops are found more efficiently: for example, the heaps, the coloring sets, and the tree with alternating nodes — assigned entries with n-ary branches, and unassigned entries with unary branches — quickly track the formation of loops even when the root is modified (Step 16 of Algorithm V.4).

D. Correctness

Assume the starting infeasible entry of the matrix is (k, l), with reduced cost $\bar{c}_{kl} = c_{kl} - u_k - v_l < 0$.

Theorem 5.1: Once a task swap loop starting from entry (k, l) is obtained, the task swaps must lead to an admissible primal transformation.

Proof: A term $c_{ij} x_{ij}$ contributes to $f(X) = \sum_{i,j} c_{ij} x_{ij}$ only when the binary variable $x_{ij} = 1$, in which case $c_{ij} = u_i + v_j$ (see (4)). With (10):

$$
\begin{aligned}
f(X') - f(X) &= \sum_{i,j} (u_i' + v_j') - \sum_{i,j} (u_i + v_j) \\
&= \sum_i \left( u_i' - u_i + v_{a(i)}' - v_{a(i)} \right) \\
&= \sum_i \sum_\omega \delta^{(\omega)}_{a(i)} - \sum_i \sum_\omega \delta^{(\omega)}_{a(i)} - |\bar{c}_{kl}| \\
&= \bar{c}_{kl} < 0
\end{aligned} \tag{12}
$$

(see Step 11 in Algorithm V.1). So after a swap, the value of the primal objective must decrease.

Theorem 5.2: If no task swap loop starting from entry (k, l) is found, an admissible dual transformation must be produced.

Proof: First, feasible reduced costs remain feasible:

$$
\bar{c}_{ij}' = c_{ij} - u_i' - v_j'
= c_{ij} - u_i - \sum_\omega \delta^{(\omega)}_{a(i)} - v_j + \sum_\omega \delta^{(\omega)}_{j}
= \begin{cases}
\bar{c}_{ij} \ge 0, & \forall i \in S_R,\ j \in S_T \\
\bar{c}_{ij} - \sum_\omega \delta^{(\omega)}_{a(i)} \ge 0, & \forall i \in S_R,\ j \notin S_T.
\end{cases} \tag{13}
$$

Second, at least $\bar{c}_{kl}$ will become feasible, which leads to termination before the formation of a swap loop, even under the sophisticated strategy allowing dynamic updating of the starting entry (see Step 16 of Algorithm V.4). This proves that the set of feasible reduced costs must grow.

E. Time Complexity

Algorithm V.1 requires at most n stages because in each stage the smallest infeasible reduced cost in a column is selected (Step 7 of Algorithm V.1); all other infeasible entries in the same column will thus also become feasible. The preprocessing using min-heaps for any stage requires O(n^2 lg n).

Fig. 4. Swap loop searching in a multi-robot system using Euclidean distance as the cost metric. Circles represent robots and triangles denote tasks. The graphs can also be interpreted as hypergraphs [25].

During each stage, there are at most n DUAL ADJs in the worst case, and each needs O(n) time to obtain min δ via the heaps. Visited columns are colored in a sorted set and are never reconsidered for path bridging within a given stage. There are at most n^2 entries to color and check, each costing O(lg n), yielding O(n^2 lg n) per stage. Therefore, the total time complexity of the whole algorithm is O(n^3 lg n), and the light-weight operations lead to a small constant factor. By way of comparison, Balinski-Gomory's primal method [22] uses O(n^2) searching steps with O(n^2) time complexity for each step. Some researchers [23, 24] have suggested that it may be possible to further improve the time complexity to O(n^3) using techniques such as the blossom method [11]; to the best of our knowledge, no such variant has been forthcoming. In addition, although the min-heaps in Algorithm V.2 are created in a separate step for clarity of the algorithmic description, in practice they can be constructed on the fly, only when required, which yields a better practical running time although the asymptotic complexity is unchanged. Experimental results also show that using a fast approximation algorithm for initialization produces running times close to the fastest existing assignment algorithms with O(n^3) time complexity.

VI. DISTRIBUTED VARIANTS

Distributed variants of our primal method are easily obtained. Swap loops are searched via message passing: messages carrying the dual variables (u, v) and the dual updates δ are passed down the tree as the search progresses. The idea is illustrated in Fig. 4 for a single swap-loop-searching stage with four robots. The lines in Fig. 4(a) show the initial pairwise robot-task assignment; the arrows in Fig. 4(b) show bridging edges found by searching for a swap loop starting from a selected pair. If the path-ending pair connects to the starting pair, then a swap loop has been found (Fig. 4(c)) and tasks may be exchanged among the robots in the loop. The new assignment is shown in Fig. 4(d).

Unlike centralized algorithms, the cost matrix may not be globally visible. Instead, each robot maintains and manipulates its own cost vector associated with all tasks. A noteworthy feature is that a robot need not know the cost information of other robots, since the two arrays of dual variables are shared. We do assume that the initial assignment solution and the corresponding costs for the assigned robot-task pairs are known to all robots, so the initial reduced costs for each robot can be calculated locally.

The algorithm has two roles: an organizer robot that holds the starting infeasible entry, and the remaining member robots (each with a unique ID). The organizer initiates a swap loop search at stage m by communicating a message containing the dual information u^{m-1}, v^{m-1} obtained from stage m - 1, as well as a newly created dual increment vector δ^m. A successor robot is located from either the assignment information or the newly found feasible and orthogonal entries satisfying complementary slackness, as in the centralized version. When a path can no longer be expanded, member robots at the respective "dead ends" request a dual adjustment from the organizer. Once the organizer has collected requests equal to the number of branches, it computes and transmits δ^m. The process continues until a swap loop is found and tasks are exchanged. At this point, the organizer either re-elects itself as the next stage's organizer or hands over the role to another robot, based on the different strategies discussed below. The roles are described in Algorithms VI.1 and VI.2.

Algorithm VI.1 Organizer(u^{m-1}, v^{m-1}, δ^m)
 1: initiate (only once):
 2:   decide starting entry (x_m, y_m) for current stage m
 3:   send msg(u^{m-1}, v^{m-1}) to member with ID a^{-1}(y)
 4: listening:
 5:   if all involved IDs request dual adjustments then
 6:     compute δ^m, send it to the corresponding ID(s)
 7:   end if
 8:   if swap loop formed then
 9:     with δ^m, update u^{m-1}, v^{m-1} to u^m, v^m for the next stage
10:     decide next organizer j and send msg(u^m, v^m) to ID j

Algorithm VI.2 Member[i](organizer ID, u^{m-1}, v^{m-1}, δ^m)
 1: update c̄_ij ∀j with the received u^{m-1}, v^{m-1}, δ^m
 2: if {j | c̄_ij = 0} ≠ ∅ then
 3:   for each j ∈ {j | c̄_ij = 0, j ≠ a(i)} do
 4:     send (u^{m-1}, v^{m-1}, δ^m) to ID a^{-1}(j)
 5:   send newly involved IDs and no. of new branches to organizer
 6: else
 7:   send min{c̄_ij | c̄_ij > 0, ∀j} to organizer, requesting a dual adjustment
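The member-side branching decision can be sketched as below. This is a hedged, local-only illustration of Algorithm VI.2 (messages are returned rather than sent over a network, and all names are ours):

```python
def member_step(i, cbar_row, assigned_j):
    """Sketch of a member robot's decision in the swap-loop search:
    if robot i holds zero reduced-cost entries other than its own
    assignment, the search forks to those columns; otherwise it asks
    the organizer for a dual adjustment, reporting its smallest
    strictly positive reduced cost. Illustrative only."""
    zeros = [j for j, c in enumerate(cbar_row) if c == 0 and j != assigned_j]
    if zeros:
        return ("expand", zeros)  # search continues at IDs a^-1(j), j in zeros
    positives = [c for c in cbar_row if c > 0]
    return ("request_dual_adjustment", min(positives))
```

In the real decentralized variant, the "expand" branch corresponds to forwarding (u, v, δ) messages, and the second branch to a dead-end robot's dual-adjustment request.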

Once a reduced cost becomes feasible it never becomes infeasible again (see Theorem 5.2) so the algorithm needs to iteratively transform each infeasible reduced cost to approach global optimality. Two different approaches for locating and transforming the infeasible values lead to two versions of the algorithm: task-oriented and robot-oriented variants.

Fig. 5. Illustrations of the task-oriented (a) and robot-oriented (b) strategies. Shaded entries have infeasible reduced costs. Solid and hollow stars represent the current starting entry and the (possible) next starting entry, respectively.

A. Task Oriented Variant

The task-oriented approach attempts to cover all infeasible reduced costs of one task before moving to those of other tasks; it thus operates column-wise in the cost matrix. This approach mimics the procedure of the centralized version: for any given task (column), the robot holding the smallest projected infeasible reduced cost is elected as organizer. During the swap-loop-searching stages, it is possible that after some DUAL ADJs another member comes to hold a "worse" projected infeasible reduced cost. Therefore, after each update of δ, the organizer must check all involved members within the current tree, and hand over the organizer role if necessary.

B. Robot Oriented Variant

The robot-oriented method aims to cover all infeasible reduced costs of one robot before transferring to another robot; it works in a row-wise fashion. The organizer is randomly selected from all members that hold infeasible reduced costs, and keeps the role for the whole stage. Monitoring of "worse" projected costs is not required, but each stage only guarantees that the starting entry will become feasible, not others. This means the organizer may need to iteratively fix all of its associated infeasible reduced costs, one stage at a time, before transferring the role to a successor organizer.

To compare: (A) the advantage of the task-oriented scheme is that at most n stages are needed to reach global optimality, since each stage turns all infeasible reduced costs associated with a task feasible. Its disadvantage is the extra communication: at the beginning of each stage, the member holding the smallest reduced cost for the chosen task has to be determined, and additional communication is involved in the monitoring as well. (B) The robot-oriented strategy has greater decentralization and eliminates the extra monitoring communication (the disadvantage noted for the task-oriented scheme). At any stage only a subset of robots need be involved and no global communication is required. The disadvantage of this variant is that a total of O(n^2) stages (note, each stage is still equivalent to O(n) steps/stages of Balinski-Gomory's method) may be needed.

VII. EXPERIMENTS

Three forms of experiment were conducted: run-time performance of the centralized algorithm, access pattern analysis, and comparison of the decentralized variants.

A. Algorithmic Performance Analysis

We implemented both our swap-based algorithm and Balinski-Gomory's method in C++ (with STL data structures), and used an optimized implementation of the Hungarian algorithm (O(n^3) complexity) available in the dlib library

Fig. 6. Comparison of running times: (a) times for an optimized Hungarian method, Balinski-Gomory's method, and the swap-based algorithm, with the primal methods starting from random initial solutions; (b) running time is improved when the algorithm is combined with a fast approximation method.


Fig. 7. (a) Linear solution quality and running time from different initial solutions (matrix size: 100×100); (b) entries traversed during stages.

(http://dlib.net) for comparison. The experiments were run on a standard dual-core desktop with a 2.7 GHz CPU and 3 GB of memory. Fig. 6(a) shows the performance results. We can see that the swap-based algorithm has a significantly better practical running time than Balinski-Gomory's method. The flexibility of the algorithm allowed for further improvement: fast approximation algorithms can give a reasonable initial assignment. Fig. 6(b) shows the improvement using an extremely cheap greedy assignment that assigns the robot-task pairs with lowest costs first. This reduces the practical running time to be very close to that of the Hungarian algorithm, especially for matrices with n < 300.

To analyze solution quality as a function of running time, we computed scenarios with 100 robots and 100 tasks with randomly generated c_ij ∈ [0, 10^4] ∀i, j. The solution quality and time consumed for individual stages are illustrated in Fig. 7(a). Solution quality is measured by the parameter η, calculated as the ratio of the current solution α_i at stage i to the final optimum α_n, i.e., η = α_i/α_n ≥ 1. In each figure, the three series represent initial assignments at different "distances" from the optimal solution. A 60% processed initial solution means the initial solution is α_60 (the solution output at the 60th stage from a random initialization). The matrix is column-wise shuffled before a processed solution is input, so that a re-computation from scratch is executed (otherwise it would be equivalent to simply continuing the computation). We can see that the solution quality for all three scenarios changes approximately linearly with the number of stages, which indicates that the "step length" of the improvement is a constant. From this observation, computational resources and solution accuracy are fungible, as each is controllable in terms of the other. Given a current solution α_m at the m-th stage (m ≥ 1) as well as an initial solution α_0, the optimum can be estimated:

$$
\hat{\alpha}_n = \alpha_0 + \frac{n}{m}(\alpha_m - \alpha_0) = \alpha_0 - n\Delta_s, \tag{14}
$$

where $\Delta_s = (\alpha_0 - \alpha_m)/m \ge 0$ is the step length of the solution improvement. To bound the accuracy within $1 + \epsilon$, where $\epsilon \ge 0$, suppose we stop at the $\theta$-th stage; then

$$
\frac{\alpha_0 - \theta\Delta_s}{\hat{\alpha}_n} \le 1 + \epsilon \implies \theta \ge \frac{\alpha_0 - (1+\epsilon)(\alpha_0 - n\Delta_s)}{\Delta_s}. \tag{15}
$$
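The estimator (14) and the stopping rule (15) are straightforward to compute; a minimal sketch, assuming the observed linear improvement holds exactly and writing the step length as Δ_s = (α_0 − α_m)/m (function names are ours):

```python
import math

def estimate_optimum(alpha0, alpha_m, m, n):
    """Estimate the optimal objective from the linear trend, as in (14):
    alpha_hat_n = alpha0 - n * Delta_s, with Delta_s = (alpha0 - alpha_m)/m."""
    step = (alpha0 - alpha_m) / m  # Delta_s >= 0 for a minimization problem
    return alpha0 - n * step, step

def stages_for_accuracy(alpha0, step, n, eps):
    """Smallest stage theta whose solution is within (1 + eps) of the
    estimated optimum, as in (15)."""
    alpha_hat = alpha0 - n * step
    theta = (alpha0 - (1 + eps) * alpha_hat) / step
    return math.ceil(theta)
```

For example, with α_0 = 100, α_10 = 90 and n = 100 stages total, the per-stage step is 1 and the estimate follows directly; the any-time trade-off then amounts to choosing eps.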

B. Access Patterns Imply Suitability for Distribution

Intuitively, the entries in the spanning tree during each stage reflect the cost of communication.* Thus, we compared the access pattern of our swap-based algorithm with Balinski-Gomory's method on 100 × 100 matrices with random initial assignments. Fig. 7(b) shows that swap-loop traversal

Fig. 8. Quantities of involved rows (robots) and lengths of swap loops over stages (matrix size 100×100): (a) results from random initial solutions; (b) results from greedy initial solutions.

results in a large reduction in accesses: the average is ∼100 per stage, in contrast with Balinski-Gomory's method requiring ∼700, with larger standard deviations (in fact, reaching more than 8,000 traversals when many dual adjustments occur). These results quantify the claims made about the swap-based method fitting a decentralized paradigm.

We also investigated the total number of rows (and, correspondingly, columns) involved during each stage, which reflects the number of robots involved in decentralized applications, as well as the size of the swap loop formed at the end of each stage (defined as the number of colored rows). Fig. 8 shows results from randomly (left plot) and greedily (right plot) initiated solutions. We see that the number of involved rows can be significantly reduced given better initial solutions, and the loops are comparatively small in either case. More detailed statistics are given in the table below. We conclude that improving the initial assignment solution improves not only the running time but also the degree of locality in communication and computation. The averaged longest swap lengths show that the admissible primal transformations are a series of small swaps (one can regard the longest length as equivalent to the K of K-swaps), which still attain optimality.

STATISTICS OF SWAP LOOPS AMONG STAGES (MATRIX SIZE 100)

Initial solution     No. loops   Avg. length   Avg. longest   Avg. involved
random initiation    97.12       10.16         21.06          46.72
30% processed        71.90        7.34         19.97          34.29
60% processed        47.20        4.46         14.56          23.92
greedy initiation    24.86        2.30         11.80          16.14

Note: the last three columns denote the averaged lengths, the averaged longest lengths of swap loops, and the averaged number of colored rows in single stages, respectively.

C. Results from Decentralized Variants

We also implemented both variants of the decentralized algorithm and distributed them over five networked computers for testing. The implementations can be directly applied to distributed multi-robot task assignment, e.g., to the test routing problems in [26, 27]. The hosts were given unique IDs from 1 to 5, and communication was performed via UDP, with each host running a UDP server to listen for messages sent by its peers. Information such as machine IDs, values of dual variables, requests for dual adjustments, etc., was encoded via simple protocols over the message passing. To initiate the

* Every traversed entry on the path segments, whether assigned or unassigned, must connect to a new entry in another row, requiring a message to be passed. The number is approximately half of all traversed entries, since each entry is counted twice in the analysis of communication complexity.


Fig. 9. Performance of the task-oriented (T-O) and robot-oriented (R-O) decentralized implementations. Measurements of 5 hosts and 5 tasks.

system, we inject 5 tasks with IDs from 1 to 5, and each machine randomly generates an array of cost values associated with these 5 tasks. The initial allocation assigns every machine to the task with the identical ID; the corresponding costs for these assigned pairs are communicated. An initial organizer is randomly selected. Both distributed variants of the algorithm were tested. Fig. 9(a) shows the number of stages used by the two schemes (average and variance over 10 separate instances). Fig. 9(b) and Fig. 9(c) show the communication cost (number of messages) and the robots involved (ever having received/processed messages) per stage, respectively. These empirical results also validate the claims made above: (i) the task-oriented scheme requires fewer stages but has greater communication per stage; (ii) although the robot-oriented method uses more stages, less communication is needed and fewer robots are involved, indicating more local computation and communication.

VIII. CONCLUSION

Strategies of task swaps are a natural paradigm for decentralized optimization and have been used for years (having been identified independently by several groups). It is now, using the algorithm we present, that optimality can be guaranteed with these same primitive operations. Additionally, we have sought to emphasize the useful any-time aspect of primal techniques. In summary, we highlight features of the introduced method:

Natural primitives and optimality: the method is based on task swap loops, a generalization of O-contracts, task exchanges, and K-swaps; these are techniques with intuitive interpretations in distributed systems and natural implementations. Unlike other swap-based methods, however, global optimality can be guaranteed.

Computational flexibility and modularity: the algorithm can start with any feasible solution and can stop at any stage, always holding a feasible solution of non-decreasing quality. It can be used as a portable module to improve non-optimal assignment methods, e.g., some variants of market-based, auction-like methods.

Any-time behavior and efficiency: unlike primal techniques for general LPs, optimality is reached within strongly polynomial time. Initialization with fast approximation methods makes it competitive in practice, and it can potentially be further accelerated. Additionally, the linear improvement in solution quality makes it possible to balance computation time against assignment accuracy.

Ease of implementation: the algorithm uses simple data structures, with a straightforward implementation that is much simpler than comparably efficient techniques.

Ranked solutions: assignments are found with increasing quality, allowing fast transitions to good choices without re-computation if commitment to the optimal assignment fails.

Decentralized variants, local computation and communication: typically only a small subset of robots is involved. The decentralized variants of the algorithm require no single privileged global controller, and they allow one to trade between decentralization (communication) and running time (number of stages).

REFERENCES

[1] S. Zilberstein, "Using Anytime Algorithms in Intelligent Systems," AI Magazine 17(3), 1996.
[2] L. E. Parker, "Multiple Mobile Robot Systems," in Handbook of Robotics, B. Siciliano and O. Khatib, Eds. Springer, 2008, ch. 40.
[3] M. Golfarelli, D. Maio, and S. Rizzi, "Multi-agent path planning based on task-swap negotiation," in Proc. UK Planning and Scheduling Special Interest Group Workshop, 1997, pp. 69–82.
[4] M. B. Dias and A. Stentz, "Opportunistic optimization for market-based multirobot control," in Proc. IROS, 2002, pp. 2714–2720.
[5] L. Thomas, A. Rachid, and L. Simon, "A distributed tasks allocation scheme in multi-UAV context," in Proc. ICRA, 2004, pp. 3622–3627.
[6] T. Sandholm, "Contract types for satisficing task allocation: I. Theoretical results," in AAAI Spring Symp.: Satisficing Models, 1998, pp. 68–75.
[7] P. Stone, G. A. Kaminka, S. Kraus, and J. S. Rosenschein, "Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination," in Proc. AAAI, 2010.
[8] X. Zheng and S. Koenig, "K-swaps: cooperative negotiation for solving task-allocation problems," in Proc. IJCAI, 2009, pp. 373–378.
[9] H. W. Kuhn, "The Hungarian Method for the Assignment Problem," Naval Research Logistics Quarterly 2:83–97, 1955.
[10] R. Burkard, M. Dell'Amico, and S. Martello, Assignment Problems. New York, NY: Society for Industrial and Applied Mathematics, 2009.
[11] J. Edmonds and R. M. Karp, "Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems," J. ACM 19(2):248–264, 1972.
[12] D. P. Bertsekas, "The auction algorithm for assignment and other network flow problems: A tutorial," Interfaces 20(4):133–149, 1990.
[13] A. V. Goldberg and R. Kennedy, "An Efficient Cost Scaling Algorithm for the Assignment Problem," Math. Program. 71(2):153–177, 1995.
[14] B. P. Gerkey and M. J. Matarić, "A formal analysis and taxonomy of task allocation in multi-robot systems," IJRR 23(9):939–954, 2004.
[15] M. Nanjanath and M. Gini, "Dynamic task allocation for robots via auctions," in Proc. ICRA, 2006, pp. 2781–2786.
[16] S. Giordani, M. Lujak, and F. Martinelli, "A Distributed Algorithm for the Multi-Robot Task Allocation Problem," LNCS: Trends in Applied Intelligent Systems, vol. 6096, pp. 721–730, 2010.
[17] M. B. Dias, R. Zlot, N. Kalra, and A. Stentz, "Market-Based Multirobot Coordination: A Survey and Analysis," Proc. of the IEEE, 2006.
[18] S. Koenig, P. Keskinocak, and C. A. Tovey, "Progress on Agent Coordination with Cooperative Auctions," in Proc. AAAI, 2010.
[19] M. G. Lagoudakis, E. Markakis, D. Kempe, P. Keskinocak, A. Kleywegt, S. Koenig, C. Tovey, A. Meyerson, and S. Jain, "Auction-based multirobot routing," in Robotics: Science and Systems, 2005.
[20] M. M. Zavlanos, L. Spesivtsev, and G. J. Pappas, "A Distributed Auction Algorithm for the Assignment Problem," in Proc. CDC, 2008.
[21] G. Dantzig, Linear Programming and Extensions. Princeton University Press, 1963.
[22] M. L. Balinski and R. E. Gomory, "A primal method for the assignment and transportation problems," Management Sci. 10(3):578–593, 1964.
[23] W. Cunningham and A. B. Marsh III, "A Primal Algorithm for Optimum Matching," Mathematical Programming Study, pp. 50–72, 1978.
[24] M. Akgül, "The linear assignment problem," Combinatorial Optimization, pp. 85–122, 1992.
[25] L. Liu and D. Shell, "Large-scale multi-robot task allocation via dynamic partitioning and distribution," Autonomous Robots 33(3):291–307, 2012.
[26] M. Berhault, H. Huang, P. Keskinocak, S. Koenig, W. Elmaghraby, P. Griffin, and A. J. Kleywegt, "Robot Exploration with Combinatorial Auctions," in Proc. IROS, 2003, pp. 1957–1962.
[27] L. Liu and D. Shell, "Assessing Optimal Assignment under Uncertainty: An Interval-based Algorithm," IJRR 30(7):936–953, 2011.


The Banana Distribution Is Gaussian: A Localization Study with Exponential Coordinates

Andrew W. Long*, Kevin C. Wolfe†, Michael J. Mashner†, Gregory S. Chirikjian†

* Northwestern University, Evanston, IL 60208
† Johns Hopkins University, Baltimore, MD 21218

Abstract—Distributions in position and orientation are central to many problems in robot localization. To increase efficiency, a majority of algorithms for planar mobile robots use Gaussians defined on positional Cartesian coordinates and heading. However, the distribution of poses for a noisy two-wheeled robot moving in the plane has been observed by many to be a "banana-shaped" distribution, which is clearly not Gaussian/normal in these coordinates. As uncertainty increases, many localization algorithms therefore become "inconsistent" as the normality assumption breaks down. We observe that this is because the combination of Cartesian coordinates and heading is not the most appropriate set of coordinates to use, and that the banana distribution can be described in closed form as a Gaussian in an alternative set of coordinates via the so-called exponential map. With this formulation, we can derive closed-form expressions for propagating the mean and covariance of the Gaussian in these exponential coordinates for a differential-drive car moving along a trajectory constructed from straight segments and arcs of constant curvature. In addition, we detail how to fuse two or more Gaussians in exponential coordinates, together with given relative pose measurements, between robots moving in formation. The propagation and fusion formulas utilized here reduce uncertainty in localization better than traditional methods. We demonstrate with numerical examples dramatic improvements in the estimated pose of three robots moving in formation when compared to classical Cartesian-coordinate-based Gaussian fusion methods.

I. INTRODUCTION

A rich area of robotics research known as SLAM, or simultaneous localization and mapping, consists of a robot mapping its environment while estimating where it may be in this map. To incorporate uncertainty, one strategy represents all possible poses of the robot with a probability density function (pdf). The goal then is to maintain and update this pdf as the robot moves. One solution is to propagate the entire pdf, for instance with Fourier transform techniques [26]. However, these techniques can be too numerically intensive for real-time SLAM applications. According to Durrant-Whyte and Bailey, the two most common tools used to increase efficiency in SLAM algorithms are the extended Kalman filter (EKF) and the Rao-Blackwellized (RB) particle filter [8]. EKF-SLAM is based on the classic work of Smith et al., in which linearized models are used to propagate the uncertainty of the non-linear motion and observations [21]. Several strategies for improving efficiency for large-scale SLAM problems using this technique are documented by Bailey and Durrant-Whyte [2]. By using the RB particle filter, Murphy observed that the SLAM problem


Fig. 1. (a) Banana-shaped distribution for position of a differential-drive robot moving along a straight line with noisy wheel speeds. (b) Notation for differential-drive robot.

could be decomposed into a robot localization problem and several independent landmark estimation problems [18]. With this factorization, many efficient SLAM algorithms, known as FastSLAM, have been proposed [16, 17, 11]. A technique inherent to EKF-SLAM and many FastSLAM algorithms for robots operating in the plane is to represent all distributions with Gaussians in Cartesian coordinates (x, y) and orientation heading angle (θ). This Gaussian/normal representation can be fully parameterized by a mean and covariance in these variables, and equal-probability-density contours of these distributions are ellipses. However, if we command a differential-drive robot to drive along a straight path, the resulting distribution from many sample paths may look similar to Fig. 1(a). Originally described by Thrun et al. [23], [24], this distribution is generally referred to as the "banana-shaped" distribution; it does not have elliptical probability density contours in the variables (x, y, θ). As the uncertainty grows, the assumption of normality breaks down and the maps inevitably become inconsistent [14]. Many have studied the problem of inconsistency for EKF-SLAM [14], [3], [5], [13], [6]. Julier and Uhlmann proved that this inconsistency is a direct result of the linearization of the non-linear models in EKF-SLAM [14]. The extent of the inconsistency was studied by Bailey et al. in the context of heading uncertainty: they showed that if the standard deviation of the heading was larger than one or two degrees, the map ultimately failed, specifically with regard to excessive information gain and jagged vehicle trajectories [3]. In addition, the quality of the normality assumption in these coordinates was


studied for the RB particle filter in [22], which demonstrated that several real-world examples were not well described by Gaussians. In the majority of existing algorithms, there is an inherent assumption that the distribution should be represented by Gaussians in Cartesian coordinates.

In this paper, we propose to use exponential coordinates and Lie groups to represent the robot's pose. We demonstrate that a Gaussian in these coordinates for planar robots provides a much better fit than a Gaussian in Cartesian coordinates as the uncertainty increases. With this approach, we can derive closed-form propagation and fusion formulas that can be used to estimate the mean and covariance of the robot's pose. In the last few years, the idea of using the Lie-group nature of rigid-body transformations has drawn some attention for the SLAM problem, particularly in the area of monocular camera SLAM [15], [20], [10]. General robot motion utilizing rigid-body transformations has been applied with Consistent Pose Registration (CPR) [1] and complementary filters [4].

The outline of the paper is as follows. In Section II, we derive the stochastic differential equation for the differential-drive robot operating in the plane. We review rigid-body motions and their relationship to exponential coordinates in Section III. We provide the definitions of the mean, the covariance, and the Gaussian probability density function in these coordinates in Section IV. A comparison between the Gaussian in Cartesian coordinates and the Gaussian in exponential coordinates is detailed in Section V. In Section VI, we derive closed-form expressions that can be used to propagate the mean and covariance of the Gaussian in exponential coordinates. We conclude by deriving closed-form formulas for fusing two (or more) Gaussians in exponential coordinates in Section VII. These fusion formulas are then applied to multiple robots moving in formation.

II. STOCHASTIC DIFFERENTIAL EQUATION

The example we consider in this paper is the kinematic cart, or differential-drive robot. This robot has two wheels, each of radius r, which can roll but not slip. The two wheels share an axis of rotation and are separated by a length ℓ. The configuration of the robot can be represented as x = [x, y, θ]^T, where (x, y) is the Cartesian position in the plane of the center of the axle and θ is the orientation of the robot, as shown in Fig. 1(b). For the configuration coordinates x, the governing differential equations are

$$
dx = \frac{r\cos\theta\,(d\phi_1 + d\phi_2)}{2}, \quad
dy = \frac{r\sin\theta\,(d\phi_1 + d\phi_2)}{2}, \quad
d\theta = \frac{r}{\ell}(d\phi_1 - d\phi_2),
$$

where the rate of spinning of each wheel is given by ω_i = dφ_i/dt for i = 1 or 2. If the wheel speeds are governed by stochastic differential equations (SDEs) of the form

$$
d\phi_i = \omega_i(t)\,dt + \sqrt{D}\,dw_i \quad \text{for } i = 1 \text{ or } 2, \tag{1}
$$

where the dw_i are unit-strength Wiener processes and D is a noise coefficient, then one obtains the SDE

$$
d\mathbf{x} = \begin{pmatrix} dx \\ dy \\ d\theta \end{pmatrix}
= \begin{pmatrix} \frac{r}{2}(\omega_1 + \omega_2)\cos\theta \\ \frac{r}{2}(\omega_1 + \omega_2)\sin\theta \\ \frac{r}{\ell}(\omega_1 - \omega_2) \end{pmatrix} dt
+ \sqrt{D} \begin{pmatrix} \frac{r}{2}\cos\theta & \frac{r}{2}\cos\theta \\ \frac{r}{2}\sin\theta & \frac{r}{2}\sin\theta \\ \frac{r}{\ell} & -\frac{r}{\ell} \end{pmatrix}
\begin{pmatrix} dw_1 \\ dw_2 \end{pmatrix}. \tag{2}
$$

In this paper, we focus on the distributions for a robot which, deterministically, would be driving straight or driving along an arc of constant curvature. For deterministic driving straight forward at speed v, the constant wheel speeds are

$$
\omega_1 = \omega_2 = \frac{v}{r}. \tag{3}
$$

For deterministic driving along an arc of radius a counterclockwise at rate α̇, the wheel speeds can be shown to be

$$
\omega_1 = \frac{\dot{\alpha}}{r}\left(a + \frac{\ell}{2}\right), \quad
\omega_2 = \frac{\dot{\alpha}}{r}\left(a - \frac{\ell}{2}\right). \tag{4}
$$

III. REVIEW OF RIGID-BODY MOTIONS

The planar special Euclidean group, G = SE(2), is the semidirect product of the plane, R^2, with the special orthogonal group, SO(2). The elements of SE(2) can be represented, with a rotational part R ∈ SO(2) and a translational part t = [t_1, t_2]^T, as 3 × 3 homogeneous transformation matrices

$$
g = \begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix} \in SE(2),
$$

where the group operation ◦ is matrix multiplication. We will also make use of the Lie algebra se(2) associated with SE(2). For a vector x = [v_1, v_2, α]^T, an element X of the Lie algebra se(2) can be expressed as

$$
X = \hat{\mathbf{x}} = \begin{pmatrix} 0 & -\alpha & v_1 \\ \alpha & 0 & v_2 \\ 0 & 0 & 0 \end{pmatrix}
\quad \text{and} \quad X^\vee = \mathbf{x}, \tag{5}
$$

where the ∧ and ∨ operators allow us to map from R^3 to se(2) and back. By applying the matrix exponential exp(·) to elements of se(2), we obtain group elements of SE(2):

$$
g(v_1, v_2, \alpha) = \exp(X) = \begin{pmatrix} \cos\alpha & -\sin\alpha & t_1 \\ \sin\alpha & \cos\alpha & t_2 \\ 0 & 0 & 1 \end{pmatrix},
$$

where

$$
t_1 = [v_2(-1 + \cos\alpha) + v_1\sin\alpha]/\alpha \quad \text{and} \tag{6}
$$
$$
t_2 = [v_1(1 - \cos\alpha) + v_2\sin\alpha]/\alpha. \tag{7}
$$

Since we are using the exponential in this formulation, we will refer to the coordinates (v_1, v_2, α) as exponential coordinates.
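The SDE (2) can be integrated numerically with an Euler-Maruyama scheme, as done for the experiments in Section V. A minimal stdlib-only sketch (parameter values r = 0.033 and ℓ = 0.200 are from the paper; the function names and seeding are ours):

```python
import math
import random

def simulate(omega1, omega2, r=0.033, ell=0.200, D=0.0,
             dt=0.001, T=1.0, rng=random.Random(0)):
    """Euler-Maruyama integration of (2): each wheel increment is
    dphi_i = omega_i * dt + sqrt(D) * dw_i, with dw_i ~ N(0, dt).
    Returns the final pose (x, y, theta). Illustrative sketch only."""
    x = y = theta = 0.0
    steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        dphi1 = omega1 * dt + math.sqrt(D) * rng.gauss(0.0, sqrt_dt)
        dphi2 = omega2 * dt + math.sqrt(D) * rng.gauss(0.0, sqrt_dt)
        x += 0.5 * r * math.cos(theta) * (dphi1 + dphi2)
        y += 0.5 * r * math.sin(theta) * (dphi1 + dphi2)
        theta += (r / ell) * (dphi1 - dphi2)
    return x, y, theta

# Driving straight at speed v uses omega1 = omega2 = v / r, from (3).
v = 1.0
pose = simulate(v / 0.033, v / 0.033, D=0.0)  # noise-free sanity check
# final x is ~ v * T; y and theta remain zero
```

Repeating the call with D > 0 and many independent seeds yields a cloud of sample poses like the banana-shaped distribution of Fig. 1(a).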


We can obtain the vector of exponential coordinates x from the group element g \in SE(2) from

x = (\log(g))^\vee,    (8)

where \log(\cdot) is the matrix logarithm. Given a time-dependent rigid-body motion g(t), the spatial velocity as seen in the body-fixed frame is given as the quantity

g^{-1}\dot{g} = \begin{pmatrix} R^T\dot{R} & R^T\dot{t} \\ 0^T & 0 \end{pmatrix} \in se(2),    (9)

where the dot represents a time derivative. We define the adjoint operator Ad(g) to satisfy [7]

Ad(g)x = \log^\vee(g \circ \exp(X) \circ g^{-1}).    (10)

For SE(2), the adjoint matrix is given by

Ad(g) = \begin{pmatrix} R & Mt \\ 0^T & 1 \end{pmatrix}, \quad \text{where} \quad M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.    (11)

We define another adjoint operator ad(X) to satisfy [7]

ad(X)y = ([X, Y])^\vee,    (12)

where [X, Y] = XY - YX is the Lie bracket. This adjoint matrix for se(2) is given by

ad(X) = \begin{pmatrix} -\alpha M & Mv \\ 0^T & 0 \end{pmatrix},    (13)

where v = [v_1, v_2]^T. The two adjoint matrices are related by

Ad(\exp(X)) = \exp(ad(X)).    (14)

The Baker-Campbell-Hausdorff (BCH) formula [9] is a useful expression that relates the matrix exponential with the Lie bracket. The BCH is given by

Z(X, Y) = \log(\exp(X) \circ \exp(Y)) = X + Y + \frac{1}{2}[X, Y] + \ldots.    (15)

If the \vee operator is applied to this formula, we obtain

z = x + y + \frac{1}{2} ad(X)y + \ldots.    (16)

IV. MEANS, COVARIANCES AND GAUSSIAN DISTRIBUTIONS

For a vector x \in R^n, the mean \tilde{\mu} and covariance \tilde{\Sigma}^1 about the mean for the pdf f(x) are given respectively as

0 = \int_{R^n} (x - \tilde{\mu}) f(x)\,dx    (17)

and

\tilde{\Sigma} = \int_{R^n} (x - \tilde{\mu})(x - \tilde{\mu})^T f(x)\,dx.    (18)

A multidimensional Gaussian probability density function with mean \tilde{\mu} and covariance matrix \tilde{\Sigma} is defined as

f(x; \tilde{\mu}, \tilde{\Sigma}) = \frac{1}{\tilde{c}(\tilde{\Sigma})} \exp\left(-\frac{1}{2}(x - \tilde{\mu})^T \tilde{\Sigma}^{-1} (x - \tilde{\mu})\right),    (19)

where \exp(\cdot) is the usual scalar exponential function and \tilde{c}(\tilde{\Sigma}) is a normalizing factor to ensure f(x; \tilde{\mu}, \tilde{\Sigma}) is a pdf. In (19),

\tilde{c}(\tilde{\Sigma}) = (2\pi)^{n/2} |\det \tilde{\Sigma}|^{1/2}.    (20)

^1 We will use a 'tilde' (~) to represent quantities associated with the Gaussian in Cartesian coordinates.

These definitions can be naturally extended to matrix Lie groups as in [25]. Given a group G with operation \circ, the mean \mu \in G of a pdf f(g) is defined to satisfy

\int_G \log^\vee(\mu^{-1} \circ g) f(g)\,dg = 0.    (21)

The covariance about the mean is defined as

\Sigma = \int_G \log^\vee(\mu^{-1} \circ g)[\log^\vee(\mu^{-1} \circ g)]^T f(g)\,dg.    (22)

A multidimensional Gaussian for Lie groups can be defined as

f(g; \mu, \Sigma) = \frac{1}{c(\Sigma)} \exp\left(-\frac{1}{2} y^T \Sigma^{-1} y\right),    (23)

where y = \log^\vee(\mu^{-1} \circ g) and c(\Sigma) is a normalizing factor. For highly concentrated distributions (i.e. the distribution decays rapidly as you move away from the mean), c(\Sigma) \approx \tilde{c}(\Sigma).

V. GAUSSIAN COMPARISONS

In this section, we compare the Gaussian in Cartesian coordinates defined in (19) to the one defined in (23) in exponential coordinates for the differential-drive car moving along a straight path and moving along an arc. For all the examples in the remainder of this paper, the wheel base \ell is 0.200 and the radius r of each wheel is 0.033. To obtain a set of sample frames representing the true distribution of this system, we numerically integrated the stochastic differential equation given in (2) 10,000 times using a modified version of the Euler-Maruyama method [12] with a time step of dt = 0.001 for a total time of T = 1 second. Using these sample data points, we can approximate the mean and covariance for Cartesian coordinates with

\tilde{\mu} = \frac{1}{N} \sum_{i=1}^N x_i    (24)

and

\tilde{\Sigma} = \frac{1}{N} \sum_{i=1}^N (x_i - \tilde{\mu})(x_i - \tilde{\mu})^T,    (25)

where x_i = [x_i, y_i, \theta_i]^T and N is the number of sample points. The mean for exponential coordinates defined in (21) can be approximated with the recursive formula

\mu = \mu \circ \exp\left(\frac{1}{N} \sum_{i=1}^N \log(\mu^{-1} \circ g_i)\right).    (26)

The covariance matrix in exponential coordinates defined in (22) can be estimated with

\Sigma = \frac{1}{N} \sum_{i=1}^N y_i y_i^T,    (27)

where y_i = [\log(\mu^{-1} \circ g_i)]^\vee.
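The Euler–Maruyama sampling described above can be sketched as follows. This is a minimal version with our own function and variable names, the paper's parameter values (r = 0.033, \ell = 0.200), and far fewer than the paper's 10,000 samples:

```python
import numpy as np

def simulate_ddrive(omega1, omega2, r=0.033, ell=0.200, D=1.0,
                    T=1.0, dt=0.001, n_samples=2000, seed=0):
    """Euler-Maruyama sampling of the stochastic unicycle SDE (2)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.zeros((n_samples, 3))   # columns: x, y, theta
    for _ in range(n_steps):
        th = x[:, 2]
        dw = rng.normal(0.0, np.sqrt(dt), size=(n_samples, 2))
        dphi1 = omega1 * dt + np.sqrt(D) * dw[:, 0]
        dphi2 = omega2 * dt + np.sqrt(D) * dw[:, 1]
        fwd = 0.5 * r * (dphi1 + dphi2)          # body-forward increment
        x[:, 0] += fwd * np.cos(th)
        x[:, 1] += fwd * np.sin(th)
        x[:, 2] += (r / ell) * (dphi1 - dphi2)   # heading increment
    return x

# straight driving at unit forward speed: v = r*omega, so omega = 1/r (eq. (3))
samples = simulate_ddrive(omega1=1 / 0.033, omega2=1 / 0.033)
```

Scatter-plotting `samples[:, 0]` against `samples[:, 1]` reproduces the characteristic banana-shaped cloud that the two Gaussian models are fit to.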


Fig. 2. Distributions for the differential drive robot moving ideally along a straight line with diffusion constant DT = 1 (left) and DT = 7 (right). Both plots have pdf contours of Gaussians of best fit in exponential coordinates and Cartesian coordinates marginalized over the heading.


Fig. 3. Distributions for the differential-drive robot moving ideally along a constant-curvature arc with diffusion constant DT = 1 (left) and DT = 4 (right). Both plots have pdf contours of Gaussians of best fit in exponential coordinates and Cartesian coordinates marginalized over the heading.

The sample mean and covariance for each method parameterize the associated pdf. These pdfs were marginalized over orientation, and contours of equal pdf value were plotted along with the sampled end positions of the simulations. Fig. 2 and Fig. 3 present these contours for various values of DT for the driving straight and driving along an arc examples, where D is a diffusion coefficient and T is the total drive time. Here T is equal to one. It is evident that the Gaussian in exponential coordinates fits the sampled data more accurately for both cases, especially as uncertainty increases. If we take the poses of the sampled data, represent them in exponential coordinates, and look at the (v1 , v2 ) values as shown in

Fig. 4, it is clear that the result is approximately elliptical. To numerically verify that the Gaussian in exponential coordinates is a better fit for the sample data than the Gaussian in Cartesian coordinates, we performed a log-likelihood ratio test for various values of the diffusion coefficient D as shown in Fig. 5. Note that for small diffusion values, the log-likelihood ratio is close to one, indicating that both Gaussian models perform approximately the same. However, as the uncertainty is increased, the Gaussian in (23) in exponential coordinates performs better than the Gaussian in Cartesian coordinates.
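The sample-based estimators (24)–(27) can be sketched concretely. The snippet below restates small exp/log helpers for SE(2) and iterates the recursive Lie-group mean; the function names are ours and the test data is synthetic rather than the paper's simulation output:

```python
import numpy as np

def exp_se2(x):
    v1, v2, a = x
    c, s = np.cos(a), np.sin(a)
    if abs(a) < 1e-12:
        t1, t2 = v1, v2
    else:
        t1 = (v2 * (c - 1.0) + v1 * s) / a
        t2 = (v1 * (1.0 - c) + v2 * s) / a
    return np.array([[c, -s, t1], [s, c, t2], [0.0, 0.0, 1.0]])

def log_se2(g):
    a = np.arctan2(g[1, 0], g[0, 0])
    if abs(a) < 1e-12:
        return np.array([g[0, 2], g[1, 2], a])
    A = np.array([[np.sin(a), np.cos(a) - 1.0],
                  [1.0 - np.cos(a), np.sin(a)]]) / a
    v1, v2 = np.linalg.solve(A, g[:2, 2])
    return np.array([v1, v2, a])

def lie_mean_cov(poses, iters=10):
    """Recursive SE(2) mean, eq. (26), and covariance, eq. (27)."""
    mu = poses[0]
    for _ in range(iters):
        ys = np.array([log_se2(np.linalg.inv(mu) @ g) for g in poses])
        mu = mu @ exp_se2(ys.mean(axis=0))
    ys = np.array([log_se2(np.linalg.inv(mu) @ g) for g in poses])
    return mu, (ys.T @ ys) / len(poses)

rng = np.random.default_rng(1)
true_mu = exp_se2(np.array([1.0, 0.2, 0.3]))
poses = [true_mu @ exp_se2(0.05 * rng.normal(size=3)) for _ in range(500)]
mu, Sigma = lie_mean_cov(poses)
```

With right-multiplied perturbations of standard deviation 0.05, the recovered covariance is close to 0.05^2 I, and the recursive mean converges in a few iterations because each step shrinks the average residual log.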


VI. PROPAGATION WITH EXPONENTIAL COORDINATES

In this section, we describe how to propagate the mean and covariance defined in (21) and (22) with closed-form estimation formulas. By using the SDE (2), we can derive another stochastic differential equation as

(g^{-1}\dot{g})^\vee dt = \begin{pmatrix} \cos\alpha\,dx + \sin\alpha\,dy \\ -\sin\alpha\,dx + \cos\alpha\,dy \\ d\alpha \end{pmatrix}
= \begin{pmatrix} \frac{r}{2}(\omega_1 + \omega_2) \\ 0 \\ \frac{r}{\ell}(\omega_1 - \omega_2) \end{pmatrix} dt
+ \sqrt{D} \begin{pmatrix} \frac{r}{2} & \frac{r}{2} \\ 0 & 0 \\ \frac{r}{\ell} & -\frac{r}{\ell} \end{pmatrix} \begin{pmatrix} dw_1 \\ dw_2 \end{pmatrix}.    (28)

This will be written in short-hand as

(g^{-1}\dot{g})^\vee dt = h\,dt + H\,dw.    (29)

Fig. 4. Distributions as a function of exponential coordinates (v1, v2) with pdf contours marginalized over the heading for the Gaussian in exponential coordinates for driving straight (left) and driving along an arc (right) with diffusion DT = 1.

When rigid-body transformations are close to the identity, the SE(2)-mean and SE(2)-covariance defined in (21) and (22) can be approximated with [19]

\mu(t) = \exp\left(\int_0^t \hat{h}\,d\tau\right)    (30)

and

\Sigma(t) = \int_0^t Ad(\mu^{-1}(\tau))\, H H^T\, Ad^T(\mu^{-1}(\tau))\,d\tau,    (31)

where H H^T is a constant diffusion matrix. For the straight driving example (\omega_1 = \omega_2 = \omega), we have

\mu(t) = \begin{pmatrix} 1 & 0 & r\omega t \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (32)
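The integral (31) can be checked numerically for this straight-driving case: \mu(\tau) is a pure translation, so Ad(\mu^{-1}(\tau)) has a single nonconstant entry, and simple trapezoidal quadrature of (31) reproduces the closed-form covariance entries given in (33). The sketch below uses our own names and the paper's parameter values:

```python
import numpy as np

r, ell, D, omega, T = 0.033, 0.200, 1.0, 1.0 / 0.033, 1.0
HHt = D * np.diag([r**2 / 2.0, 0.0, 2.0 * r**2 / ell**2])  # H H^T from (29)

def Ad_mu_inv(tau):
    """Ad(mu^{-1}(tau)) for mu(tau) = translation by (r*omega*tau, 0)."""
    A = np.eye(3)
    A[1, 2] = r * omega * tau   # M t term of eq. (11) for this translation
    return A

# trapezoidal quadrature of the integral in (31)
taus = np.linspace(0.0, T, 2001)
vals = np.array([Ad_mu_inv(t) @ HHt @ Ad_mu_inv(t).T for t in taus])
Sigma_num = 0.5 * (vals[:-1] + vals[1:]).sum(axis=0) * (taus[1] - taus[0])

# closed-form covariance entries for straight driving, eq. (33)
Sigma_cf = np.array(
    [[D * r**2 * T / 2.0, 0.0, 0.0],
     [0.0, 2 * D * omega**2 * r**4 * T**3 / (3 * ell**2),
      D * omega * r**3 * T**2 / ell**2],
     [0.0, D * omega * r**3 * T**2 / ell**2, 2 * D * r**2 * T / ell**2]])
```

The quadrature and closed form agree to roughly the O(d\tau^2) accuracy of the trapezoid rule.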

We can solve the integral in (31) in closed form as

\Sigma(t) = \begin{pmatrix} \frac{1}{2}Dr^2 t & 0 & 0 \\ 0 & \frac{2D\omega^2 r^4 t^3}{3\ell^2} & \frac{D\omega r^3 t^2}{\ell^2} \\ 0 & \frac{D\omega r^3 t^2}{\ell^2} & \frac{2Dr^2 t}{\ell^2} \end{pmatrix}.    (33)

For the constant-curvature case (\omega_1 \neq \omega_2) with (4), we have

\mu(t) = \begin{pmatrix} \cos(\dot{\alpha}t) & -\sin(\dot{\alpha}t) & a\sin(\dot{\alpha}t) \\ \sin(\dot{\alpha}t) & \cos(\dot{\alpha}t) & a(1 - \cos(\dot{\alpha}t)) \\ 0 & 0 & 1 \end{pmatrix}.    (34)

The closed-form SE(2)-covariance matrix is then

\Sigma(t) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix},    (35)

where

\sigma_{11} = \frac{c}{8}\left[(4a^2 + \ell^2)(2\dot{\alpha}t + \sin(2\dot{\alpha}t)) + 16a^2(\dot{\alpha}t - 2\sin(\dot{\alpha}t))\right],

\sigma_{12} = \sigma_{21} = c\left[a^2(1 - \cos(\dot{\alpha}t))^2 - \frac{\ell^2}{4}\sin^2(\dot{\alpha}t)\right],

\sigma_{13} = \sigma_{31} = 2ca(\dot{\alpha}t - \sin(\dot{\alpha}t)),

\sigma_{22} = -\frac{c}{8}(4a^2 + \ell^2)(-2\dot{\alpha}t + \sin(2\dot{\alpha}t)),

\sigma_{23} = \sigma_{32} = -2ca(-1 + \cos(\dot{\alpha}t)),

\sigma_{33} = 2c\dot{\alpha}t, \quad \text{and} \quad c = \frac{Dr^2}{\ell^2\dot{\alpha}}.

Fig. 5. Log-likelihood ratio test of the Gaussian in exponential coordinates to the Gaussian in Cartesian coordinates as a function of the diffusion DT.

To demonstrate the accuracy of this propagation method, we calculated the SE(2)-mean and SE(2)-covariance from 10,000 sample data frames using (26) and (27) and from the propagation formulas. For the straight driving example with

r\omega = 1 and DT = 1, the mean and covariance of the data are

\mu_{data} = \begin{pmatrix} 1.0000 & 0.0003 & 1.0000 \\ -0.0003 & 1.0000 & 0.0000 \\ 0 & 0 & 1 \end{pmatrix}    (36)

and

\Sigma_{data} = \begin{pmatrix} 0.0006 & 0.0000 & -0.0001 \\ 0.0000 & 0.0184 & 0.0276 \\ -0.0001 & 0.0276 & 0.0551 \end{pmatrix}.    (37)

The propagated mean and covariance for these parameters are

\mu_{prop} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}    (38)

and

\Sigma_{prop} = \begin{pmatrix} 0.0006 & 0 & 0 \\ 0 & 0.0184 & 0.0276 \\ 0 & 0.0276 & 0.0553 \end{pmatrix}.    (39)

If we increase the noise to DT = 7, the mean and covariance from sample data are

\mu_{data} = \begin{pmatrix} 1.0000 & 0.0011 & 1.0009 \\ -0.0011 & 1.0000 & -0.0002 \\ 0 & 0 & 1 \end{pmatrix}    (40)

and

\Sigma_{data} = \begin{pmatrix} 0.0068 & 0.0000 & 0.0002 \\ 0.0000 & 0.1278 & 0.1943 \\ 0.0002 & 0.1943 & 0.3883 \end{pmatrix}.    (41)

The propagated mean and covariance are

\mu_{prop} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}    (42)

and

\Sigma_{prop} = \begin{pmatrix} 0.0039 & 0 & 0 \\ 0 & 0.1290 & 0.1935 \\ 0 & 0.1935 & 0.3869 \end{pmatrix}.    (43)

Similar accuracy can be seen for the constant-curvature case.

VII. FUSION WITH EXPONENTIAL COORDINATES

Here we consider the fusion of banana distributions for N robots moving in formation. To begin, consider only two such robots, i and j. The actual (unknown) poses of these robots at time t are g_i and g_j. At time t = 0, the known initial poses are a_i and a_j, and the banana distributions for each robot diffuse so as to result in distributions f(a_i^{-1} \circ g_i; \mu_i, \Sigma_i) and f(a_j^{-1} \circ g_j; \mu_j, \Sigma_j). The means and covariances of these two distributions can be estimated using the propagation formulas from the previous section. Let us assume that an exact relative measurement between i and j is taken so that

m_{ij} = g_i^{-1} \circ g_j    (44)

is known. Then the distribution of g_i becomes, with a form of Bayesian fusion,

p_{ij}(g_i) = f(a_i^{-1} \circ g_i; \mu_i, \Sigma_i) \cdot f(a_j^{-1} \circ g_i \circ m_{ij}; \mu_j, \Sigma_j).    (45)

Noting that m_{ji} = m_{ij}^{-1} and switching the roles of i and j gives the same expression for p_{ji}(g_j). Our goal in this fusion problem is to write this new fused distribution in the form

p_{ij}(g_i) = f(a_i^{-1} \circ g_i; \mu_{ij}, \Sigma_{ij}),    (46)

and more generally, p_{1,2,\ldots,N}(g_i). To save space in this section, we will occasionally remove the \circ when the operation is clear. We can rewrite p_{ij}(g_i) in (45) as

p_{ij}(g_i) = f(\mu_i^{-1} a_i^{-1} g_i; I, \Sigma_i) \cdot f(\mu_j^{-1} a_j^{-1} g_i m_{ij}; I, \Sigma_j).    (47)

Making the substitution h = \mu_i^{-1} \circ a_i^{-1} \circ g_i, we have

p_{ij}(a_i \mu_i h) = f(h; I, \Sigma_i) \cdot f(m_{ij}^{-1} q h m_{ij}; I, \Sigma_j),    (48)

where q = m_{ij} \circ \mu_j^{-1} \circ a_j^{-1} \circ a_i \circ \mu_i. We rewrite (46) as

p_{ij}(a_i \mu_i h) = f(\mu_i h; \mu_{ij}, \Sigma_{ij}) = f(h; \mu'_{ij}, \Sigma_{ij}),    (49)

where \mu'_{ij} = \mu_i^{-1} \circ \mu_{ij}. Our goal is to find closed-form expressions for \mu_{ij} and \Sigma_{ij}. The exponents of all three Gaussian distributions (f) in (48) and (49) are all scalars, and thus we have, to within a constant C,

\left[\log^\vee(e^{X_{ij}} e^Y)\right]^T \Sigma_{ij}^{-1} \left[\log^\vee(e^{X_{ij}} e^Y)\right]
= C + \left[\log^\vee(e^{X_i} e^Y)\right]^T \Sigma_i^{-1} \left[\log^\vee(e^{X_i} e^Y)\right]
+ \left[\log^\vee(m_{ij}^{-1} e^{X_j} e^Y m_{ij})\right]^T \Sigma_j^{-1} \left[\log^\vee(m_{ij}^{-1} e^{X_j} e^Y m_{ij})\right],    (50)

where e^{X_{ij}} = \exp(X_{ij}) = (\mu'_{ij})^{-1}, e^{X_i} = \exp(X_i) = I, e^{X_j} = \exp(X_j) = q and e^Y = \exp(Y) = h. Note that

\log^\vee(m_{ij}^{-1} e^{X_j} e^Y m_{ij}) = Ad(m_{ij}^{-1}) \log^\vee(e^{X_j} e^Y).    (51)

We assume that q and h are small such that the BCH expansion can be approximated with only the first three terms

\log^\vee(\exp(X)\exp(Y)) \approx x + y + \frac{1}{2} ad(X)y.    (52)

Since ad(X)x = 0, we can rewrite (52) as

\log^\vee(\exp(X)\exp(Y)) \approx (I + \frac{1}{2} ad(X))(x + y).    (53)

Since we renormalize, we rewrite (50) ignoring the constant C as

(x_{ij} + y)^T \Gamma_{ij}^T \Sigma_{ij}^{-1} \Gamma_{ij} (x_{ij} + y)
= (x_i + y)^T \Gamma_i^T \Sigma_i^{-1} \Gamma_i (x_i + y)
+ (x_j + y)^T \Gamma_j^T Ad^{-T}(m_{ij}) \Sigma_j^{-1} Ad^{-1}(m_{ij}) \Gamma_j (x_j + y),

where \Gamma_k = I + \frac{1}{2} ad(X_k). Let

S_{ij} = \Gamma_{ij}^T \Sigma_{ij}^{-1} \Gamma_{ij}, \quad S_i = \Gamma_i^T \Sigma_i^{-1} \Gamma_i, \quad \text{and} \quad S_j = \Gamma_j^T Ad^{-T}(m_{ij}) \Sigma_j^{-1} Ad^{-1}(m_{ij}) \Gamma_j.

We can then combine like terms and solve for the following closed-form formulas

S_{ij} = S_i + S_j,    (54)

x_{ij} = S_{ij}^{-1}(S_i x_i + S_j x_j), \quad \text{and}    (55)

\Sigma_{ij} = \Gamma_{ij} S_{ij}^{-1} \Gamma_{ij}^T,    (56)

which have a similar form to the product of Gaussians on R^n. We can then extract the desired mean \mu_{ij} as

\mu_{ij} = \mu_i \circ \exp(-\hat{x}_{ij}).    (57)
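The fusion update (54)–(57) can be sketched for the simplified case a_i = a_j = I (so x_i = 0). The helper names are ours; as a degenerate sanity check, fusing two identical estimates should keep the mean and halve the covariance, exactly as for Gaussians on R^n:

```python
import numpy as np

M = np.array([[0.0, 1.0], [-1.0, 0.0]])

def se2_exp(x):
    v1, v2, a = x
    c, s = np.cos(a), np.sin(a)
    if abs(a) < 1e-12:
        t1, t2 = v1, v2
    else:
        t1 = (v2 * (c - 1.0) + v1 * s) / a
        t2 = (v1 * (1.0 - c) + v2 * s) / a
    return np.array([[c, -s, t1], [s, c, t2], [0.0, 0.0, 1.0]])

def se2_log(g):
    a = np.arctan2(g[1, 0], g[0, 0])
    if abs(a) < 1e-12:
        return np.array([g[0, 2], g[1, 2], a])
    A = np.array([[np.sin(a), np.cos(a) - 1.0],
                  [1.0 - np.cos(a), np.sin(a)]]) / a
    v1, v2 = np.linalg.solve(A, g[:2, 2])
    return np.array([v1, v2, a])

def ad(x):
    """ad-matrix of x for se(2), eq. (13)."""
    out = np.zeros((3, 3))
    out[:2, :2] = -x[2] * M
    out[:2, 2] = M @ x[:2]
    return out

def Ad(g):
    """Adjoint matrix of g in SE(2), eq. (11)."""
    out = np.eye(3)
    out[:2, :2] = g[:2, :2]
    out[:2, 2] = M @ g[:2, 2]
    return out

def fuse(mu_i, Sig_i, mu_j, Sig_j, m_ij):
    """Closed-form fusion (54)-(57), specialized to a_i = a_j = I."""
    x_j = se2_log(m_ij @ np.linalg.inv(mu_j) @ mu_i)  # q of eq. (48)
    G_j = np.eye(3) + 0.5 * ad(x_j)
    Am = np.linalg.inv(Ad(m_ij))
    S_i = np.linalg.inv(Sig_i)                        # Gamma_i = I since x_i = 0
    S_j = G_j.T @ Am.T @ np.linalg.inv(Sig_j) @ Am @ G_j
    S_ij = S_i + S_j                                  # (54)
    x_ij = np.linalg.solve(S_ij, S_j @ x_j)           # (55) with x_i = 0
    G_ij = np.eye(3) + 0.5 * ad(x_ij)
    Sig_ij = G_ij @ np.linalg.inv(S_ij) @ G_ij.T      # (56)
    mu_ij = mu_i @ se2_exp(-x_ij)                     # (57)
    return mu_ij, Sig_ij

mu = se2_exp(np.array([1.0, 0.0, 0.2]))
Sig = np.diag([0.01, 0.02, 0.005])
mu_f, Sig_f = fuse(mu, Sig, mu, Sig, np.eye(3))
```

With identical inputs and m_ij = I, the correction x_ij vanishes, all \Gamma matrices reduce to the identity, and the update collapses to the familiar information-form average.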

These equations can be used recursively for multiple robot systems with multiple measurements. These formulas can be contrasted with those used traditionally for Bayesian fusion of multivariate Gaussians on R^3,

f(x_i; \tilde{a}_i + \tilde{\mu}_{ij}, \tilde{\Sigma}_{ij}) = f(x_i; \tilde{a}_i + \tilde{\mu}_i, \tilde{\Sigma}_i) \cdot f(x_i; \tilde{a}_j + \tilde{\mu}_j - \tilde{m}_{ij}, \tilde{\Sigma}_j),    (58)

where \tilde{m}_{ij} = x_j - x_i,

\tilde{\Sigma}_{ij}^{-1} = \tilde{\Sigma}_i^{-1} + \tilde{\Sigma}_j^{-1}, \quad \text{and}

\tilde{\mu}_{ij} = \tilde{\Sigma}_{ij}\left[\tilde{\Sigma}_i^{-1}(\tilde{a}_i + \tilde{\mu}_i) + \tilde{\Sigma}_j^{-1}(\tilde{a}_j + \tilde{\mu}_j - \tilde{m}_{ij})\right] - \tilde{a}_i.

The propagation and fusion formulas were tested on a three-robot system driving in a triangular formation with diffusion constant DT = 3 as shown in Fig. 6. Sample data from integrating the SDE in (2) many times are represented by light gray dots similar to Fig. 1(a). We integrated the SDE in (2) once from each starting position to represent a set of three true poses represented by black circles. From these true poses, we generated the exact measurements between each robot. The pdf contours from the fused Gaussians in exponential coordinates marginalized over the heading are also shown in the figure. Note that the fused pdf contours are not trying to match the banana distributions, but instead give a better estimate of the location of the robots based on the exact measurements.

Fig. 6. Fusion of Gaussians in exponential coordinates for three robots driving in formation, with true positions marked with filled-in circles.

These results obtained using propagation and fusion on Gaussians in exponential coordinates were compared with the fusion of Gaussians in Cartesian coordinates. For comparison, we let x*_i be the true pose (x_i, y_i, \theta_i)^T in Cartesian coordinates for the ith robot. Then, x^e_i and x^c_i are the poses in Cartesian coordinates associated with the means from the fused Gaussians in exponential and Cartesian coordinates, respectively. For Fig. 6, these poses are given as:

x*_1 = \begin{pmatrix} 2.043 \\ 0.090 \\ -0.034 \end{pmatrix}, \quad x^e_1 = \begin{pmatrix} 2.010 \\ 0.039 \\ -0.054 \end{pmatrix}, \quad x^c_1 = \begin{pmatrix} 1.976 \\ 0.146 \\ 0.132 \end{pmatrix},

x*_2 = \begin{pmatrix} 1.001 \\ 0.840 \\ -0.128 \end{pmatrix}, \quad x^e_2 = \begin{pmatrix} 0.993 \\ 0.836 \\ -0.182 \end{pmatrix}, \quad x^c_2 = \begin{pmatrix} 0.934 \\ 0.896 \\ 0.037 \end{pmatrix},

x*_3 = \begin{pmatrix} 1.033 \\ -1.077 \\ -0.283 \end{pmatrix}, \quad x^e_3 = \begin{pmatrix} 0.977 \\ -1.106 \\ -0.303 \end{pmatrix}, \quad x^c_3 = \begin{pmatrix} 0.966 \\ -1.021 \\ -0.117 \end{pmatrix}.

It is clear that the means for the Gaussian model in exponential coordinates are closer to the true values. Moreover, if we examine the covariances associated with both models, it is apparent that the covariances of the exponential model have smaller magnitude than those for the Cartesian Gaussian. For example, the covariances of the fused distributions for the first robot are

\Sigma_1 = \begin{pmatrix} 0.0006 & -0.0000 & -0.0001 \\ -0.0000 & 0.0060 & 0.0011 \\ -0.0001 & 0.0011 & 0.0008 \end{pmatrix}
\quad \text{and} \quad
\tilde{\Sigma}_1 = \begin{pmatrix} 0.0013 & -0.0002 & -0.0003 \\ -0.0002 & 0.0166 & 0.0267 \\ -0.0003 & 0.0267 & 0.0571 \end{pmatrix}.
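For comparison, the Cartesian update behind (58) is the usual information-form product of Gaussians on R^3. The sketch below is generic and omits the initial-pose offsets \tilde{a}_i and the measurement shift \tilde{m}_{ij}; the function name is ours:

```python
import numpy as np

def fuse_r3(mu_i, Sig_i, mu_j, Sig_j):
    """Information-form product of two Gaussians on R^3."""
    Li, Lj = np.linalg.inv(Sig_i), np.linalg.inv(Sig_j)
    Sig = np.linalg.inv(Li + Lj)          # covariance update of (58)
    mu = Sig @ (Li @ mu_i + Lj @ mu_j)    # information-weighted mean
    return mu, Sig

mu_f, Sig_f = fuse_r3(np.array([1.0, 0.0, 0.1]), np.diag([0.02, 0.02, 0.01]),
                      np.array([1.1, 0.1, 0.0]), np.diag([0.02, 0.02, 0.01]))
```

With equal covariances the fused mean is the simple average of the two means and the covariance is halved, which is the Euclidean analogue of the SE(2) result above.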

Thus, the fused exponential-coordinate Gaussians' results are more highly focused than the Cartesian ones.

VIII. CONCLUSION

A comparison between the commonly used multivariate Gaussian model on (x, y, θ) and one based on the coordinates of the Lie algebra (or exponential coordinates) has been provided. These models were examined in the context of the stochastic kinematic model of a differential-drive robot. Through numerical examples, we demonstrated that the model based on exponential coordinates is able to capture the underlying distribution more accurately and more precisely. This held true for trajectories that were both straight and circular.

In addition, a closed-form approximation was given for propagating the mean and covariance for paths consisting of straight segments and arcs of constant curvature. This allows the mean and covariance to be estimated quickly. Finally, we presented a closed-form approximation for multiplying two Gaussian pdfs in exponential coordinates. Multiplying two pdfs allows us to effectively fuse information obtained from two different sources. In the case of the example provided, these sources were forward propagation of the stochastic kinematic model and relative measurements between robots. The pdfs obtained using this method provided significantly increased accuracy when compared with those from just forward propagation.


The methods and models described above can be used to improve current state of the art SLAM algorithms. We hope to use these methods to develop a new filter for SE(2) similar to the Kalman filter. Also, we would like to develop new approaches for fusing data from different models together. For example, if a Gaussian on exponential coordinates is used to model forward propagation, we want to fuse it with noisy measurements from a Gaussian on Cartesian coordinates. ACKNOWLEDGMENTS This work was funded in part by NSF grant IIS-0915542 “RI: Small: Robotic Inspection, Diagnosis, and Repair” and Andrew Long’s NDSEG fellowship. R EFERENCES [1] M. Agrawal. A Lie algebraic approach for consistent pose registration for general euclidean motion. In Proc. of IEEE Int’l Conf. on Intelligent Robots and Systems (IROS), pages 1891–1897, 2006. [2] T. Bailey and H. Durrant-Whyte. Simultaneous localization and mapping (SLAM): Part II. IEEE Robotics and Automation Magazine, 13(3):108–117, 2006. [3] T. Bailey, J. Nieto, J. Guivant, M. Stevens, and E. Nebot. Consistency of the EKF-SLAM algorithm. In Proc. of IEEE Int’l Conf. on Intelligent Robots and Systems (IROS), pages 3562–3568, 2006. [4] G. Baldwin, R. Mahony, J. Trumpf, T. Hamel, and T. Cheviron. Complementary filter design on the special euclidean group SE (3). In Proc. of the 2007 European Control Conference, 2007. [5] J.A. Castellanos, J. Neira, and J.D. Tard´os. Limits to the consistency of EKF-based SLAM. IFAC Symposium on Intelligent Autonomous Vehicles, 2004. [6] JA Castellanos, R. Martinez-Cantin, JD Tard´os, and J. Neira. Robocentric map joining: Improving the consistency of EKF-SLAM. Robotics and Autonomous Systems, 55(1):21–29, 2007. [7] G. Chirikjian. Stochastic Models, Information Theory, and Lie Groups, Volume 2. 2012. [8] H. Durrant Whyte and T. Bailey. Simultaneous localisation and mapping (SLAM): Part I the essential algorithms. IEEE Robotics and Automation Magazine, 13(2):99–110, 2006. [9] E. B. Dynkin. 
Calculation of the coefficients in the Campbell-Hausdorff formula. Doklady Akademii Nauk SSSR, 57:323–326, 1947. [10] E. Eade and T. Drummond. Scalable monocular SLAM. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 469–476, 2006. [11] D. Hahnel, W. Burgard, D. Fox, and S. Thrun. An efficient FastSLAM algorithm for generating maps of large-scale cyclic environments from raw laser range measurements. In Proc. of IEEE Int’l Conf. on Intelligent Robots and Systems (IROS), volume 1, pages 206–211, 2003.

[12] D.J. Higham. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM review, pages 525–546, 2001. [13] S. Huang and G. Dissanayake. Convergence and consistency analysis for extended Kalman filter based SLAM. IEEE Trans. on Robotics, 23(5):1036–1049, 2007. [14] S.J. Julier and J.K. Uhlmann. A counter example to the theory of simultaneous localization and map building. In IEEE Int’l Conf. on Robotics and Automation (ICRA), volume 4, pages 4238–4243, 2001. [15] J. Kwon and K.M. Lee. Monocular SLAM with locally planar landmarks via geometric Rao-Blackwellized particle filtering on Lie groups. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1522– 1529, 2010. [16] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proc. of the Nat. Conf. on Artificial Intelligence, pages 593–598, 2002. [17] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Int’l Joint Conf. on Artificial Intelligence, volume 18, pages 1151–1156. Lawrence Erlbaum Associates LTD, 2003. [18] K. Murphy. Bayesian map learning in dynamic environments. Advances in Neural Information Processing Systems (NIPS), 12:1015–1021, 1999. [19] W. Park, Y. Wang, and G.S. Chirikjian. The path-ofprobability algorithm for steering and feedback control of flexible needles. The Int’l J. of Robotics Res., 29(7): 813–830, 2010. [20] G. Silveira, E. Malis, and P. Rives. An efficient direct approach to visual SLAM. IEEE Trans. on Robotics, 24 (5):969–979, 2008. [21] R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. Autonomous robot vehicles, 1:167–193, 1990. [22] C. Stachniss, G. Grisetti, W. Burgard, and N. Roy. Analyzing gaussian proposal distributions for mapping with Rao-Blackwellized particle filters. 
In Proc. of IEEE Int’l Conf. on Intelligent Robots and Systems (IROS), pages 3485–3490, 2007. [23] S. Thrun, W. Burgard, and D. Fox. A real-time algorithm for mobile robot mapping with applications to multirobot and 3D mapping. In IEEE Int’l Conf. on Robotics and Automation (ICRA), volume 1, pages 321–328, 2000. [24] S. Thrun, W. Burgard, and D. Fox. Probabilistic robotics. MIT Press, 2006. [25] Y. Wang and G. S. Chirikjian. Nonparametric secondorder theory of error propagation on motion groups. The Int’l J. of Robotics Res., 27(11-12):1258–1273, 2008. [26] Y. Zhou and G.S. Chirikjian. Probabilistic models of dead-reckoning error in nonholonomic mobile robots. In IEEE Int’l Conf. on Robotics and Automation (ICRA), volume 2, pages 1594–1599, 2003.


Recognition and Pose Estimation of Rigid Transparent Objects with a Kinect Sensor

Ilya Lysenkov (Itseez, [email protected])    Victor Eruhimov (Itseez, [email protected])    Gary Bradski (Willow Garage, [email protected])

Abstract—Recognizing and determining the 6DOF pose of transparent objects is necessary in order for robots to manipulate such objects. However, it is a challenging problem for computer vision. We propose new algorithms for segmentation, pose estimation and recognition of transparent objects from a single RGB-D image from a Kinect sensor. Kinect's weakness in the perception of transparent objects is exploited in their segmentation. Following segmentation, edge fitting is used for recognition and pose estimation. A 3D model of the object is created automatically during training; it is required for pose estimation and recognition. The algorithm is evaluated in different conditions of a domestic environment within the framework of a robotic grasping pipeline, where it demonstrates high grasping success rates compared to the state-of-the-art results. The method does not currently handle occlusions or overlapping transparent objects, but it is robust to non-transparent clutter.

I. INTRODUCTION

Transparent objects are a common part of domestic and industrial environments. Recognition and 6 degrees of freedom (6DOF) pose estimation of objects is required for robotic grasping and manipulation. However, transparent objects are very challenging for robot perception with RGB images as well as with modern 3D sensors.

• 2D Computer Vision. The appearance of a transparent object strongly depends on its background and lighting. Transparent objects usually don't have their own texture features, their edges are typically weak, and intensity gradient features are heavily influenced by see-through background clutter as well as specularity edges induced by lighting. So classical computer vision algorithms for recognition and pose estimation are difficult to apply to transparent objects.

• 3D Computer Vision. 3D point clouds are successfully used for object recognition and pose estimation (Hinterstoisser et al. [10]). However, modern sensors (Kinect, ToF cameras, stereo cameras, laser scanners) can't estimate depth reliably and produce point clouds for transparent and specular objects, so these algorithms cannot be applied. Cross-modal stereo can be used to get depth estimation on transparent objects with Kinect (Chiu et al. [4]) but its quality is far from estimation on Lambertian objects. Acquisition of 3D data and reconstruction of transparent objects is a challenging unsolved problem (Ihrke et al. [11], Mériaudeau et al. [19]) and an active area of research (Jiang and Smith [12]).

(a) Kinect RGB image

(b) Kinect depth image

(c) Segmentation

(d) Recognition and pose estimation

Fig. 1. Example of test data with 3 transparent objects with segmentation, recognition and pose estimation results using the proposed algorithms.

We address these challenges and propose an algorithm for segmentation, pose estimation and recognition of transparent objects. Unknown transparent objects are segmented from a single image of a Kinect sensor by exploiting its failures on specular surfaces. 3D models of objects created at the training stage are fitted to extracted edges. A cost function value is used to make a decision about an instance of the object and determine its 6DOF pose relative to the robot. The proposed algorithm was integrated into the object recognition stack (Rublee et al. [26]) and connected with the robotic grasping pipeline (Ciocarlie [5]) in the Robot Operating System (ROS, Quigley and WillowGarage [24]) and evaluated on a Willow Garage PR2 robot. The grasping is robust to non-transparent clutter and the success rate is over 80%.

II. RELATED WORK

Transparent objects and the challenges they bring in perception were rarely addressed in research papers until recently. McHenry et al. [18] derive features from inherent properties of transparent objects (transparency, reflection and refraction). Usually these properties are obstacles for computer vision


algorithms and features like specular highlights are considered as noise. The situation is different for transparent objects. These features are characteristic of transparency and so they are used to segment unknown transparent objects from a single RGB image by McHenry et al. [18], McHenry and Ponce [17]. This problem is also addressed by Kompella and Sturm [14] in a robotic context of transparent obstacles detection. These approaches are based on finding correspondence between deformed background behind a transparent object and its unoccluded part available as foreground on the same image. This correspondence between regions is used to segment the transparent object. However, these counterpart regions should be large enough to estimate features such as blurring, texture distortion and other transparent overlay effects. It is not the case in highly cluttered scenes or, for example, if a transparent glass is filled with a colored drink like milk. The algorithm proposed in this paper allows segmentation even in such cases. Lei et al. [15] use light detection and ranging (LIDAR) data to segment unknown transparent objects: highlight spots from a RGB image are used to find candidate areas of transparent objects and then the GrabCut segmentation algorithm (Rother et al. [25]) is applied to a depth image and a laser reflectance intensity image to compute the final segmentation. The experimental setup consist of a 2D LIDAR device, a registered RGB camera and a pan-tilt unit so it is more complex and expensive than Kinect used in the current paper. Osadchy et al. [22] utilize specular highlights of transparent objects in the recognition problem. Successful recognition of 9 transparent objects was demonstrated with this single feature. Unfortunately, a test scene must have a dominant light source with known position so applicability of the approach is limited. Fritz et al. 
[8] use an additive model of latent factors to learn appearance of transparent objects and remove influence of a background behind them. It allows recognition of transparent objects in varying backgrounds. The algorithm was evaluated on the problem of transparent object detection in challenging conditions of domestic environment with complex backgrounds using a dataset of 4 transparent objects. Klank et al. [13] detect unknown transparent objects and reconstruct them using two views of a test scene from a ToF camera. The problem is challenging especially because no training is involved and a 3D model of an object is reconstructed directly from test data. The algorithm is tolerant to difficult lighting conditions thanks to data from a ToF camera. Transparent objects are assumed to stay on a flat surface but the algorithm is tolerant to violations of this assumption. The algorithm is not misled by opaque objects and distinguishes them from transparent objects correctly. The algorithm was evaluated on a dataset of 5 transparent objects with uniform backgrounds behind them. The objects were placed on a dark table separately from each other. Transparent objects were reconstructed successfully in 55% of 105 attempts. Also transparent object grasping with a robot was evaluated. It was successful in 41% of the cases when reconstruction was successful so the robot grasped 23% of all transparent objects overall.

Phillips et al. [23] propose to use inverse perspective mapping for detection and pose estimation of transparent objects. This cue can be computed if two views of a test scene are available and objects stay on a support plane. The algorithm was evaluated on a dataset of 5 transparent objects that were isolated from each other in a test scene. Poses of the objects are estimated with the accuracy of 2.7-11.1 millimeters with the mean error of 7.8 millimeters. However, pose is estimated with exhaustive coarse-to-fine search in 2D space of poses on a table (with refinement of the best candidate) so this is not scalable to 6-DOF pose estimation. The approach was demonstrated to be applicable to the detection problem. III. P ROPOSED APPROACH We consider problems of segmentation, pose estimation, recognition and grasping of transparent objects in this paper. Our main focus is accurate pose estimation of transparent objects which is used to enable robotic grasping. Previous works by Klank et al. [13], Phillips et al. [23] are the most relevant here because other papers don’t deal with 3D pose estimation of transparent objects. These works require two views of a test scene, in contrary we use a single image of Kinect at the test stage to recognize and estimate pose of transparent objects. Kinect uses structured light and projects a IR-pattern on objects to estimate their depth. However, such a technique doesn’t work with specular, refractive, translucent objects and objects with very low surface albedo (Ihrke et al. [11]). So Kinect can’t produce point clouds of transparent objects but we take advantage of this fact. The regions where Kinect fails to estimate depth are likely to be produced by specular or transparent surfaces and we use invalid areas in a Kinect depth map as an interest area operator. Using this cue and a corresponding RGB image, we segment transparent objects from background and extract their silhouettes. 
3D models of transparent objects are created during a training stage by painting objects with color and scanning them. These learned models are fitted to the extracted RGB silhouettes on the test image by varying poses of the object. We treated this as an optimization problem with regard to 6 parameters that define a 3D pose of a rigid object relative to the robot’s camera. This multi-dimensional optimization problem is decomposed into several parts. Some of them have closed-form solutions and others can be solved with standard iterative algorithms like Levenberg-Marquardt. The inherent ambiguity in pose estimation of transparent objects is discussed. This ambiguity is resolved with a support plane assumption and the correct pose is returned which can be used for robotic grasping. Our main contributions are: • A model of transparent objects that takes into account both silhouette and surface edges. • An algorithm for 6DOF pose estimation of transparent objects. It is based on existing CAD-based approaches to pose estimation (Ulrich et al. [28], Liu et al. [16]) but our algorithm doesn’t need a CAD model and utilizes the specifics of transparent objects to meet practical requirements in performance and accuracy.




• A complete system for grasping transparent objects with Kinect as input, going from model capture through segmentation and pose estimation to grasping.

The proposed algorithm was evaluated on 13 transparent objects in different conditions. As stated above, it was integrated into a robotic grasping pipeline and evaluated on a PR2 robot. Grasping was successful in 80% of cases, and this result is robust even to challenging, complex backgrounds behind the transparent objects.

Fig. 2. A 3D model of a transparent object created with KinectFusion at the training stage. The transparent object was sprayed white so that a Kinect sensor could be used to estimate surface depth.

IV. TRAINING STAGE

Our method creates a 3D model of an object that allows us to generate edgels (pixels belonging to edges) in a 2D image given the object's 6-DOF pose. Transparent objects, like any untextured objects, create two types of edges: surface and silhouette edges. A surface edge is produced by a discontinuity in surface orientation; the corresponding image edgels are made visible by lighting discontinuities. A silhouette edge is produced by the object border, and the corresponding image edgels are made visible by both lighting and contrast with the background. Examples of surface edges are the edge where the stem joins the rest of the glass and the back edge of the cup lip (Fig. 5), while the side edges of the glass are silhouette edges. These two types of edges depend differently on the object pose, but both are equally important for pose estimation, so we take both into account. We define an object model that contains a silhouette model and a surface edge model. The silhouette model is a 3D point cloud of the whole object surface and is used to generate silhouette edges. The surface edge model is a set of 3D points corresponding to surface curvature discontinuities.

The creation of a silhouette model requires 3D scanning of a transparent object, but there are no robust algorithms for this (Ihrke et al. [11]). One possible solution is to create the model manually, but that is a time-consuming process. Another is to find and download a similar 3D model from the Internet, as done by Phillips et al. [23] for 5 transparent objects; unfortunately, existing databases of 3D models are limited, and problems arise when no similar models are available. Finally, one can cover transparent objects with powder or paint and apply standard scanning algorithms to these now-Lambertian objects, as done by Osadchy et al. [22]. We use the last approach because it allows us to create high-quality models.
We paint a transparent object white and use KinectFusion (Newcombe et al. [20]) to compute a clean point cloud of the object. The KinectFusion algorithm creates a single point cloud of a scene by registering point clouds from different viewpoints. The modeled object was placed on a transparent stand on a table to enable easy 3D segmentation of the model as a cluster of points above the table plane. KinectFusion does not use 2D photometric data and so relies on tracking and registration against 3D structure only. To ensure good 3D registration, we placed a 3D calibration rig near the object to be modeled. This allows robust and precise camera tracking, which results in accurate 3D models. One of the created models is shown in Fig. 2.

The surface edge model is created from the silhouette model: all points of the silhouette model are projected onto the training images, and points that frequently project onto Canny edges form the surface edge model.

The models are used to compute the edges of the object for a given pose. Points of the silhouette model are projected onto an image for the given pose, and morphological operations are applied to obtain a single blob; the contour of this blob is the silhouette of the object. Surface edges are computed by projecting the surface edge model. Because the objects are transparent there are no self-occlusions, so we obtain surface edges directly.

Template-based approaches are popular in computer vision; a recent example is Hinterstoisser et al. [10]. Templates are created for several poses by sampling views from a viewing sphere of the object; these templates are matched to a test image and a full pose is estimated. Standard template matching algorithms are not suitable for transparent objects, because such objects have weak gradients that can be severely disturbed by the gradients of the background behind the object. These algorithms degrade significantly in such cases and must be augmented with a depth map (Hinterstoisser et al. [10]), which is not available for transparent objects.

We adapt a template-based approach to pose estimation of transparent objects. Pose estimation of a rigid object requires estimation of 6 parameters: 3 for rotation and 3 for translation. A rotation matrix can be decomposed into a superposition of three rotations around coordinate axes: R = Rz Rx Ry, where we consider fixed coordinate axes. The optical axis is z, and the upright direction of the object is aligned with the y-axis for the identity transformation. Translation parameters and Rz are found at the testing stage, so we create templates of the object for different Ry and Rx during the training stage.
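The projection-to-silhouette step above can be sketched as follows. This is an illustrative reimplementation using scipy morphology, not the authors' code; the structuring-element size and image dimensions are arbitrary choices.

```python
import numpy as np
from scipy import ndimage

def silhouette_from_projection(pts2d, shape):
    """Rasterize projected silhouette-model points, merge them into a
    single blob with morphological closing, and return the blob's
    boundary pixels, i.e. the silhouette edgels."""
    img = np.zeros(shape, dtype=bool)
    img[pts2d[:, 1], pts2d[:, 0]] = True          # pts2d holds (x, y) pixels
    blob = ndimage.binary_closing(img, structure=np.ones((7, 7)))
    blob = ndimage.binary_fill_holes(blob)
    return blob & ~ndimage.binary_erosion(blob)   # contour of the blob

# dense projection of a 10x10 patch of model points
pts = np.array([(x, y) for x in range(10, 20) for y in range(10, 20)])
contour = silhouette_from_projection(pts, (40, 40))
print(int(contour.sum()))                         # 36 boundary pixels
```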
We sample possible rotations Ry and Rx with discrete steps, and for each step we project our model to get a silhouette of the object in this pose. The y-axis corresponds to the upright direction, and many transparent objects common in household environments (plates, glasses) are rotationally symmetric around it. The algorithm therefore checks whether an object has rotational symmetry around the y-axis; if so, templates are sampled over Rx only. When evaluating the algorithm, 100 templates were sampled for non-symmetric objects and 10 for symmetric objects. As a result, the training stage yields a 3D model of the object and silhouettes for the sampled rotations Ry and Rx.
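A minimal sketch of the template-rotation sampling. The paper specifies only the template counts (100 vs. 10), so the angle ranges and the linear sampling below are assumptions.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def sample_template_rotations(symmetric, n_x=10, n_y=10):
    """Sample the out-of-plane rotations Rx, Ry for template generation.
    For objects rotationally symmetric about the upright y-axis, Ry is
    irrelevant, so only Rx is sampled (10 templates instead of 100)."""
    xs = np.linspace(0.0, np.pi / 2, n_x)                    # assumed tilt range
    ys = [0.0] if symmetric else np.linspace(0.0, 2 * np.pi,
                                             n_y, endpoint=False)
    return [rot_x(ax) @ rot_y(ay) for ax in xs for ay in ys]

templates = sample_template_rotations(symmetric=False)
print(len(templates), len(sample_template_rotations(symmetric=True)))  # 100 10
```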


V. TESTING STAGE

A Kinect RGB image of two transparent objects is shown in Fig. 3a. Kinect has a low-quality RGB camera, and it is difficult to detect objects and estimate their poses robustly from such images, especially when the background is not uniform. However, Fig. 3b shows the corresponding Kinect depth map, where invalid depth (regions where Kinect fails to estimate depth) is colored black. Kinect cannot estimate depth on transparent objects, but we can take advantage of this fact: regions where Kinect fails are likely to belong to transparent objects, so this cue can be used to segment transparent objects in an image.

(a) Kinect RGB image

(b) Kinect depth image

Fig. 3. Kinect images of two transparent objects. It is hard to detect transparent objects from the RGB image but the depth map provides an informative cue for segmentation: Kinect can’t estimate depth in regions where transparent objects are located.

A. Transparent object segmentation

Kinect fails to estimate depth not only on transparent objects; it often fails on object contours, and it can also return valid depth for the background behind transparent objects. To clean this up, morphological operations (closing and opening) are applied to find and eliminate small regions. The result is a proposed mask representing transparent objects. However, invalid depth does not always correspond to transparent objects, and noise distorts the masks, so these operations can only produce approximate regions of transparent objects. To refine these masks, we use them as an initialization and as constraints for the GrabCut segmentation algorithm (Rother et al. [25]) on the corresponding Kinect RGB image. Regions with transparent objects differ only slightly from the surrounding background, so many segmentation algorithms fail to segment them without additional information. However, the mask derived from the Kinect depth map provides a good initialization and enough constraints to allow GrabCut to segment transparent objects accurately. Segmentation results are shown in Fig. 4a. Segmentation of the left object is accurate, but the right object is segmented incorrectly due to the non-uniform background; subsequent steps of our algorithm are robust to such segmentation errors. There is also a third, false segmentation mask, produced where Kinect failed on a non-transparent object. This is a rare case, but the pose estimation algorithm is robust to such situations too, since the false region is unlikely to match any of the models well.
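The mask-cleanup step can be sketched as below, with scipy morphology standing in for the authors' implementation. The area threshold is an illustrative value, and the GrabCut refinement on the RGB image is only indicated in a comment.

```python
import numpy as np
from scipy import ndimage

def transparent_object_mask(depth, min_area=200):
    """Approximate transparent-object regions from a Kinect depth map.

    Invalid depth (zeros) is used as an interest operator; morphological
    closing/opening removes thin contour noise, and small blobs are
    dropped. In the full pipeline this mask then seeds GrabCut on the
    corresponding RGB image."""
    invalid = depth == 0                      # Kinect returns 0 where depth failed
    mask = ndimage.binary_closing(invalid, structure=np.ones((5, 5)))
    mask = ndimage.binary_opening(mask, structure=np.ones((5, 5)))
    labels, n = ndimage.label(mask)
    for i in range(1, n + 1):
        if (labels == i).sum() < min_area:    # discard remaining small regions
            mask[labels == i] = False
    return mask

# synthetic depth map: a 40x40 "transparent object" hole plus an isolated
# noise pixel of invalid depth
depth = np.full((120, 160), 1500, dtype=np.uint16)
depth[30:70, 50:90] = 0                       # object region: no depth
depth[5, 5] = 0                               # isolated noise pixel
mask = transparent_object_mask(depth)
print(mask[50, 70], mask[5, 5])               # True False
```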

(a) Results of glass segmentation

(b) Projection of the estimated pose

Fig. 4. (a) The algorithm uses simple morphological operations to get approximate masks from a Kinect depth map and then refines them by applying the GrabCut algorithm to a Kinect RGB image. Pose estimation is robust to errors of the segmentation. (b) The algorithm recognized the object correctly and its pose was estimated accurately.

B. Initial pose estimation

We find the initial pose estimate using the test image silhouette. We already have silhouettes of the object for various Rx and Ry from the training stage, so we need to estimate the translation parameters and Rz for each silhouette. To do so, we first find a two-dimensional similarity transformation between the training and test silhouettes, consisting of a 2D translation, uniform scaling and rotation in the image plane. This is done using Procrustes analysis (2D shape matching, Dryden and Mardia [6]). The 2D translation is estimated by aligning centroids:

$$(\bar{x}, \bar{y}) = \frac{1}{n}\sum_{i=1}^{n}(x_i, y_i), \qquad (1)$$

where n is the number of points (x_1, y_1), ..., (x_n, y_n) in a silhouette. The 2D scale is estimated by aligning the scatters of the points:

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(x_i-\bar{x})^2 + (y_i-\bar{y})^2\right]}. \qquad (2)$$

Correspondences between points in the training and test silhouettes are not known, so 2D-2D ICP (Besl and McKay [1]) is used to estimate the rotation between the silhouettes and to refine the other parameters of the similarity transformation.

Now we need to compute a corresponding 3D transformation that maps points of the 3D model to the same image locations as this 2D similarity transformation. Unfortunately, no such transformation exists under the perspective camera model in the general case, so we use a weak perspective projection model (Hartley and Zisserman [9]), which assumes all object points have the same depth. This assumption is valid if the object size is small compared to the distance to the camera. The model yields the following equation, which must be solved for all x, y simultaneously with respect to the translation (tx, ty, tz) and rotation Rz:

$$\frac{1}{\bar{z}}\, S K \begin{pmatrix} x \\ y \\ \bar{z} \end{pmatrix} = \frac{1}{\bar{z}+t_z}\, K \begin{pmatrix} r_{11} & r_{12} & 0 & t_x \\ r_{21} & r_{22} & 0 & t_y \\ 0 & 0 & 1 & t_z \end{pmatrix} \begin{pmatrix} x \\ y \\ \bar{z} \\ 1 \end{pmatrix}, \qquad (3)$$


where K is the camera matrix of intrinsic parameters, S is the similarity transformation written as a 3 × 3 matrix, and

$$R_z = \begin{pmatrix} r_{11} & r_{12} & 0 \\ r_{21} & r_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (4)$$

There are two solutions to this problem, but one of them places the object behind the camera. Denoting the matrix A = K⁻¹SK with elements A = (a_ij), the unique physically realizable solution is given by:

$$t_z = \left(\frac{1}{\sqrt{\det A}} - 1\right)\bar{z}, \qquad t_x = a_{13}(\bar{z} + t_z), \qquad t_y = a_{23}(\bar{z} + t_z),$$
$$\begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix} = \left(1 + \frac{t_z}{\bar{z}}\right) \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \qquad (5)$$
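Eqs. (1)-(2) and (5) admit a compact implementation. The sketch below omits the ICP rotation refinement and uses an identity camera matrix in the demo; both are simplifying assumptions for illustration, not the authors' code.

```python
import numpy as np

def pose_from_silhouettes(train_pts, test_pts, K, z_bar):
    """Initial pose from silhouette alignment: Procrustes translation and
    scale (Eqs. 1-2; the ICP rotation refinement is omitted) followed by
    the closed-form weak-perspective lift (Eq. 5)."""
    c_tr, c_te = train_pts.mean(axis=0), test_pts.mean(axis=0)   # Eq. 1
    s_tr = np.sqrt(((train_pts - c_tr) ** 2).sum(axis=1).mean()) # Eq. 2
    s_te = np.sqrt(((test_pts - c_te) ** 2).sum(axis=1).mean())
    s = s_te / s_tr
    t2d = c_te - s * c_tr
    S = np.array([[s, 0, t2d[0]], [0, s, t2d[1]], [0, 0, 1.0]])  # 2D similarity
    A = np.linalg.inv(K) @ S @ K
    tz = (1.0 / np.sqrt(np.linalg.det(A)) - 1.0) * z_bar         # Eq. 5
    tx = A[0, 2] * (z_bar + tz)
    ty = A[1, 2] * (z_bar + tz)
    Rz = np.eye(3)
    Rz[:2, :2] = (1.0 + tz / z_bar) * A[:2, :2]
    return np.array([tx, ty, tz]), Rz

# test silhouette twice as large in the image => object is closer by z_bar/2
K = np.eye(3)
pts = np.array([[0., 0.], [4., 0.], [4., 4.], [0., 4.]])
t, Rz = pose_from_silhouettes(pts, 2.0 * pts, K, z_bar=1.0)
print(t)   # [ 0.   0.  -0.5]
```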

Translation parameters and Rz are computed from these equations for each Ry and Rx from the training stage. If Ry and Rx are close to the correct values of the test object pose, the training silhouette and the test silhouette will match well, while incorrect Rx and Ry will produce bad matches. We therefore need a function that measures the quality of silhouette matches in order to find the correct pose. We use the Chamfer Distance (Borgefors [2]) for this task. Its main weakness is low accuracy in cluttered scenes, and various modifications have been proposed (Olson and Huttenlocher [21], Shotton et al. [27]), but since we work with segmented silhouettes this is not a problem in our pipeline. Poses with large Chamfer Distance are discarded, and non-maximum suppression is applied to remove similar transformations. Several plausible poses remain after these procedures (usually 1-3 per test silhouette). The computed poses are not very accurate, due to the discrete sampling of Ry and Rx and the weak perspective assumption, so they need to be refined.

C. Pose refinement

Surface and silhouette edges are represented by 3D point clouds that should be aligned with the segmented silhouette and the Canny edges of a test image. However, we do not know the correspondences between 3D points and edge points, so this is a 3D-2D registration problem, and we use a robust variant of Levenberg-Marquardt Iterative Closest Point (LM-ICP, Fitzgibbon [7]) to solve it. The result of this algorithm is the refined pose.

D. Ambiguity of poses

The problem of pose estimation from one monocular image has an inherent ambiguity for transparent objects: significantly different poses can have very similar projections onto a test image. For example, see Fig. 5. After a while you can see two different poses in this image: either the wine glass is upside-down or it is lying on its side on the table. Also see the projected object model for a pose in Fig. 6a.
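For reference, the Chamfer Distance scoring of candidate silhouette matches described above can be computed with a distance transform. This is an illustrative sketch, not the paper's implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_distance(template_pts, test_edge_mask):
    """Mean distance from projected template edgels to the nearest test
    edge pixel, via a distance transform of the test edge map."""
    dist = distance_transform_edt(~test_edge_mask)   # distance to nearest edge
    rows, cols = template_pts[:, 1], template_pts[:, 0]
    return dist[rows, cols].mean()

edges = np.zeros((50, 50), dtype=bool)
edges[25, :] = True                                  # horizontal edge at row 25
pts = np.array([[10, 25], [20, 28]])                 # (x, y): one on, one 3 px off
d = chamfer_distance(pts, edges)
print(d)                                             # 1.5
```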
Fig. 5. Ambiguous pose. There are two possible interpretations: either the wine glass is turned upside-down or it is lying on its side. This ambiguity would not arise if the object were opaque.

(a) Ambiguous projection    (b) Point cloud of this test scene

Fig. 6. (a) The model projected onto the image. The projection looks good because it is aligned well with the edges of the object. (b) A view from the side shows that the model is in fact far from the true pose (the table is below and the object model floats above it).

It appears to be a good pose, because model points project onto the object and model edges align with test edges. But the point cloud of this scene, shown in Fig. 6b, reveals that the pose is in fact incorrect. Additional information is required to resolve this ambiguity. One solution is to use a second view of the scene from a different viewpoint; however, this may require moving the robot and complicates the pose estimation pipeline. Instead, we assume that the transparent object stands on a support plane. This assumption resolves the pose ambiguity and can also be used to refine computed poses. Note that the assumption is not inherent to the approach and can easily be relaxed. It implies that we must transform a computed pose so that the object rests on the support plane while the projection of the model under the new pose remains similar to the projection under the old pose. Denoting the vector of residual errors for a pose T as err(T) and the starting pose as T0, we need to solve the following


problem:

$$\min_T \|\mathrm{err}(T) - \mathrm{err}(T_0)\|, \qquad (6)$$

subject to the constraint that the object in pose T rests on the support plane. We expand err(T) in a Taylor series around T0 and discard the higher-order terms, since T0 should be close to the correct pose; thus ||err(T) − err(T0)|| ≈ ||JΔT||, where J is the Jacobian and ΔT = T − T0. Minimizing ||JΔT|| is equivalent to minimizing the dot product (JΔT, JΔT) = ΔTᵀJᵀJΔT = ΔTᵀQΔT, where Q = JᵀJ. The constraints on the pose T consist of constraints on the rotation of the object (the upright direction must be aligned with the plane normal) and on its location (the bottom of the object must lie on the support plane). These are linear constraints on ΔT, which we write as EΔT = d. So the final problem is:

$$\min_{\Delta T} \Delta T^\top Q \Delta T \quad \text{s.t.} \quad E\,\Delta T = d. \qquad (7)$$

This is a quadratic programming problem with linear equality constraints. Its solution is obtained with Lagrange multipliers as the solution of the linear system (Boyd and Vandenberghe [3]):

$$\begin{pmatrix} Q & E^\top \\ E & 0 \end{pmatrix} \begin{pmatrix} \Delta T \\ \lambda \end{pmatrix} = \begin{pmatrix} 0 \\ d \end{pmatrix}. \qquad (8)$$
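The constrained update of Eqs. (7)-(8) reduces to one linear solve. In this sketch the Jacobian and constraint matrices are toy values chosen for illustration.

```python
import numpy as np

def constrained_pose_update(J, E, d):
    """Solve min ΔTᵀQΔT subject to EΔT = d (Eqs. 7-8), with Q = JᵀJ,
    via the Lagrange-multiplier (KKT) linear system of Eq. 8."""
    Q = J.T @ J
    n, m = Q.shape[0], E.shape[0]
    KKT = np.block([[Q, E.T], [E, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), d])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n]                       # ΔT; sol[n:] are the multipliers λ

# toy example: 2-dof "pose", constraint fixes the first component to 1;
# the unconstrained optimum is ΔT = 0, so the solution is [1, 0]
J = np.array([[1.0, 0.0], [0.0, 2.0]])
E = np.array([[1.0, 0.0]])
d = np.array([1.0])
dT = constrained_pose_update(J, E, d)
print(dT)   # [1. 0.]
```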

Estimated poses are updated with the computed ΔT and further refined by LM-ICP (Fitzgibbon [7]) under the constraint that the object stays on the support plane. The best pose, i.e. the one achieving the minimum value of the LM-ICP cost function, is returned as the pose of the object. An example of an estimated pose is shown in Fig. 4b.

E. Recognition technique

Each silhouette can belong to only one object, because we assume transparent objects do not overlap each other in the test image. We therefore recognize objects by the final values of the LM-ICP cost function: the minimum value indicates the object most similar to the silhouette.

VI. EXPERIMENTS AND RESULTS

We scanned 13 transparent objects and created 3 datasets to evaluate the algorithm. The first dataset has 5 transparent objects and was used to measure the accuracy of pose estimation and recognition. The second and third datasets have 8 other objects and were used to evaluate robotic grasping against uniform and textured backgrounds. An open-source implementation of the system is available at https://github.com/wg-perception/transparent_objects.

A. Pose estimation accuracy

Each of the 5 transparent objects from the first dataset was placed on a table with a uniform background, one object at a time. The table had fiducial markers attached to it, which were used to estimate the ground truth pose. An example of a test image is shown in Fig. 7. About 500 test images were captured (70-120 images per object) at a resolution of 640 × 480. The results are shown in Fig. 8. Poses are estimated with an accuracy of several millimeters. The ground truth data can have a systematic error of about 4 millimeters due to approximate measurement of the distance to a test object, so we also report the relative translation error to exclude this bias. Rotation error is not reported because all objects in this dataset are rotationally symmetric, and the support plane constraints then define the rotation of an object completely.

Fig. 7. Example of a test image used to evaluate pose estimation accuracy.

Object      | Success rate | Mean translation error (cm)
            |              | absolute | relative
bank        | 0.99         | 0.3      | 0.2
bottle      | 0.99         | 1.2      | 0.5
bucket      | 1.00         | 0.5      | 0.5
glass       | 0.94         | 0.4      | 0.3
wine glass  | 0.97         | 1.1      | 0.5
mean        | 0.98         | 0.7      | 0.4

Fig. 8. Accuracy of pose estimation. Success rate is the proportion of cases when a final pose was closer than 2cm to the ground truth pose. Translation error is the distance between an estimated location of the object and (a) the ground truth location (absolute error) or (b) the mean estimated location computed from all test images (relative error).

B. Recognition

Recognition was evaluated on the same dataset as pose estimation, but here the algorithm had to estimate the poses of all objects and return the one with the lowest cost function. Results are shown in Fig. 9. The bank, bucket and glass are correctly recognized in 93% of cases on average. However, the wine glass is never recognized, due to failures at the glass segmentation step: the wine glass has a very thin stem that is discarded by the morphological operations, so its segmentation looks like a regular glass. It is interesting to note that pose estimation is still robust to such segmentation errors and achieves good accuracy for the wine glass, as reported in Fig. 8. The recognition results show that our algorithm can be applied to the problem of transparent object recognition, but a more robust segmentation algorithm is required to use it in practice for this task. For example, the segmentation should


be improved by combining the invalid-depth cue proposed in this paper with the complementary algorithms proposed by McHenry et al. [18] and McHenry and Ponce [17].

actual \ predicted | bank | bottle | bucket | glass | wine glass
bank               | 0.97 | 0      | 0      | 0.03  | 0
bottle             | 0.01 | 0.57   | 0      | 0.42  | 0
bucket             | 0    | 0      | 0.84   | 0.16  | 0
glass              | 0.01 | 0      | 0.01   | 0.98  | 0
wine glass         | 0    | 0      | 0      | 1     | 0

Fig. 9. Normalized confusion matrix. Actual classes are in the rows and predicted classes are in the columns. The algorithm recognizes the bank, bucket and glass in 93% of cases on average but fails to recognize the wine glass due to problems with segmentation of its thin stem.

(a) Uniform background    (b) Cluttered background

Fig. 11. Typical images of a transparent object (in the middle of each image) used to evaluate robotic grasping.

C. Grasping

Object         | Uniform background | Non-transparent clutter
tall glass     | 8                  | 8
small cup      | 10                 | 8
middle cup     | 10                 | 10
wine glass     | 9                  | 4
yellow cup     | 9                  | 10
perfume bottle | 9                  | 10
perfume ball   | 3                  | 7
perfume box    | 8                  | 7
mean           | 8.25               | 8.00

Fig. 12. Number of successful grasps for each object from 10 attempts. Grasps are successful in 80% of cases and this result is robust to complex non-transparent clutter.

Fig. 10. Dataset of 8 transparent objects with different shapes, materials and characteristics (one of them is translucent). Test objects for evaluating robotic grasping are on the left. Identical copies of these objects, painted, are on the right; they were used to create 3D models with KinectFusion at the training stage.

The dataset of 8 transparent objects was used to evaluate grasping (Fig. 10). A transparent object was placed in front of the robot at various locations on a table. Our pose estimation algorithm was then run to obtain the pose of the object, and the computed pose was used to grasp it. The robot had to lift the object 10 cm, and a grasp was considered successful if the robot could hold the object for at least 30 seconds. We evaluated the algorithm in two conditions. The first experiment had a simple uniform background (Fig. 11a). The second had a challenging background with non-transparent clutter (Fig. 11b), where objects were placed on one of several paintings lying on the table. We made 10 grasp attempts for each object in each experimental setup. Results are presented in Fig. 12. The robot grasps a transparent object in 80% of cases, which shows the algorithm is robust to cluttered backgrounds. However, the results in the two setups differ for the wine glass. There are two ways to grasp the objects in the dataset: from the side or from the top. Grasps from the top are easy

because they succeed even if the gripper is shifted slightly from the correct position. Grasps from the side are very difficult for some objects. For example, the wine glass is almost as wide as the maximum opening of the PR2 gripper, so even small errors at any stage of the grasping pipeline (model capture, pose estimation, PR2 calibration) result in a failed grasp. The wine glass was grasped mostly from the top when the background was uniform, but when the background was cluttered the robot mostly attempted grasps from the side, because the additional objects changed the collision map and different grasp points were selected by the PR2 grasping pipeline (Ciocarlie [5]). The difference in results is thus explained not by the difference in background but by these two types of grasps. Results also differ for the perfume ball between the two setups. The perfume ball is a challenging object to grasp because it is heavy and can easily slip out of the gripper, resulting in less stable grasps than for the other objects; the difference in results is explained by the higher variance of successful grasps for this object. Overall, the results of our algorithm do not change with different backgrounds. This property is very important when dealing with transparent objects, because the variety of cluttered backgrounds visible behind transparent objects is one of the main challenges for computer vision algorithms. Our algorithm is robust to clutter because a Kinect depth map is used for segmentation, and the depth map is not affected by the background behind transparent objects.


VII. CONCLUSION

This paper presents new algorithms for segmentation, pose estimation and recognition of transparent objects. A robot is able to grasp 80% of known transparent objects with the proposed algorithm, and this result is robust across non-specular backgrounds behind the objects. Our approach, like the other existing algorithms for pose estimation (Klank et al. [13], Phillips et al. [23]), cannot handle overlapping transparent objects, so this is the main direction for future work.

ACKNOWLEDGMENTS

We would like to thank Vincent Rabaud (Willow Garage) and Ethan Rublee (Willow Garage) for help with integrating the algorithm into the object recognition stack and connecting it with the PR2 grasping pipeline. We also thank the anonymous reviewers for their useful comments and suggestions, which helped improve the paper.

REFERENCES
[1] P.J. Besl and N.D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992.
[2] G. Borgefors. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988.
[3] S.P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] Walon Chiu, Ulf Blanke, and Mario Fritz. Improving the Kinect by cross-modal stereo. In 22nd British Machine Vision Conference (BMVC), 2011.
[5] M. Ciocarlie. Object manipulator package (ROS). http://www.ros.org/wiki/object_manipulator, 2012.
[6] I.L. Dryden and K.V. Mardia. Statistical Shape Analysis, volume 4. John Wiley & Sons, New York, 1998.
[7] A.W. Fitzgibbon. Robust registration of 2D and 3D point sets. Image and Vision Computing, 2003.
[8] M. Fritz, M. Black, G. Bradski, and T. Darrell. An additive latent feature model for transparent object recognition. Advances in Neural Information Processing Systems (NIPS), 2009.
[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[10] S. Hinterstoisser, S. Holzer, C. Cagniart, S. Ilic, K. Konolige, N. Navab, and V. Lepetit. Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In IEEE International Conference on Computer Vision, 2011.
[11] Ivo Ihrke, Kiriakos N. Kutulakos, Hendrik P. A. Lensch, Marcus Magnor, and Wolfgang Heidrich. State of the art in transparent and specular object reconstruction. In STAR Proceedings of Eurographics, pages 87-108, 2008.
[12] L.T. Jiang and J.R. Smith. Seashell effect pretouch sensing for robotic grasping. In Robotics and Automation (ICRA), IEEE International Conference on, 2012.
[13] U. Klank, D. Carton, and M. Beetz. Transparent object detection and reconstruction on a mobile platform. In Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.
[14] V.R. Kompella and P. Sturm. Detection and avoidance of semi-transparent obstacles using a collective-reward based approach. In Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.
[15] Zhong Lei, K. Ohno, M. Tsubota, E. Takeuchi, and S. Tadokoro. Transparent object detection using color image and laser reflectance image for mobile manipulator. In Robotics and Biomimetics (ROBIO), 2011 IEEE International Conference on, pages 1-7, Dec. 2011.
[16] M.Y. Liu, O. Tuzel, A. Veeraraghavan, R. Chellappa, A. Agrawal, and H. Okuda. Pose estimation in heavy clutter using a multi-flash camera. In IEEE International Conference on Robotics and Automation (ICRA), 2010.
[17] K. McHenry and J. Ponce. A geodesic active contour framework for finding glass. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
[18] K. McHenry, J. Ponce, and D. Forsyth. Finding glass. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[19] F. Mériaudeau, R. Rantoson, D. Fofi, and C. Stolz. Review and comparison of non-conventional imaging systems for three-dimensional digitization of transparent objects. Journal of Electronic Imaging, 21:021105, 2012.
[20] R.A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A.J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In 10th IEEE International Symposium on Mixed and Augmented Reality, 2011.
[21] C.F. Olson and D.P. Huttenlocher. Automatic target recognition by matching oriented edge pixels. IEEE Transactions on Image Processing, 6(1):103-113, 1997.
[22] M. Osadchy, D. Jacobs, and R. Ramamoorthi. Using specularities for recognition. In 9th IEEE International Conference on Computer Vision, 2003.
[23] C.J. Phillips, K.G. Derpanis, and K. Daniilidis. A novel stereoscopic cue for figure-ground segregation of semi-transparent objects. In 1st IEEE Workshop on Challenges and Opportunities in Robot Perception, 2011.
[24] M. Quigley and Willow Garage. Robot Operating System (ROS). http://www.ros.org/wiki/, 2012.
[25] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics (TOG), 2004.
[26] E. Rublee, T. Straszheim, and V. Rabaud. Object recognition. http://www.ros.org/wiki/object_recognition, 2012.
[27] J. Shotton, A. Blake, and R. Cipolla. Multiscale categorical object recognition using contour fragments. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1270-1281, 2007.
[28] M. Ulrich, C. Wiedemann, and C. Steger. CAD-based recognition of 3D objects in monocular images. In International Conference on Robotics and Automation, 2009.

Towards Persistent Localization and Mapping with a Continuous Appearance-Based Topology Will Maddern, Michael Milford and Gordon Wyeth School of Electrical Engineering and Computer Science Queensland University of Technology, Brisbane, Australia {w.maddern, michael.milford, gordon.wyeth}@qut.edu.au

Abstract— Appearance-based localization can provide loop closure detection at vast scales regardless of accumulated metric error. However, the computation time and memory requirements of current appearance-based methods scale not only with the size of the environment but also with the operation time of the platform. Additionally, repeated visits to locations will develop multiple competing representations, which will reduce recall performance over time. These properties impose severe restrictions on long-term autonomy for mobile robots, as loop closure performance will inevitably degrade with increased operation time. In this paper we present a graphical extension to CAT-SLAM, a particle filter-based algorithm for appearance-based localization and mapping, to provide constant computation and memory requirements over time and minimal degradation of recall performance during repeated visits to locations. We demonstrate loop closure detection in a large urban environment with capped computation time and memory requirements and performance exceeding previous appearance-based methods by a factor of 2. We discuss the limitations of the algorithm with respect to environment size, appearance change over time and applications in topological planning and navigation for long-term robot operation.

Keywords— Localization, Mapping, SLAM, Vision

I. INTRODUCTION

Appearance-based mapping provides the means to create useful topological and metric maps on resource-limited robots by using appearance signatures to identify places, rather than relying on accurate metric sensing. Appearance-based methods are popular for detecting loop closure in range-based metric maps [1, 2], and also for generating complete topological maps that can be used for path planning and navigation [3]. Appearance may refer more broadly to a robot's sensor signatures [4], but most often refers to a snapshot image of a location from a camera mounted on the robot. The computer vision community has provided much of the initial impetus in the advent of appearance-based SLAM. Advances in image retrieval techniques, such as visual bag-of-words [5], have produced impressive results, but there is an opportunity to take advantage of the robotic context of the appearance-based mapping problem. Robots on the move typically have readily available odometric information, or can easily derive odometry from visual information. The incorporation of a motion model into the image retrieval process has the potential to greatly enhance the performance of an appearance-based system. In CAT-SLAM [6, 7], the local movement information of the robot is fused with appearance information using a particle filter to dramatically improve the recall of location over the use of appearance information alone.

In this paper, we address the problem of using appearance-based methods when revisiting the same location multiple times. Typical appearance-based methods have linear growth in the number of representations as locations are revisited over and over; consequently, both computation time and memory usage grow without bound over time. This problem is compounded by a corresponding fall in recall performance as multiple representations of a single place compete to be the "correct" or best representation. The new method, called CAT-Graph, uses a combination of visual appearance and local odometry data as in CAT-SLAM, but fuses multiple visits to the same location into a topological graph-based representation. Localization is performed by propagating particles along the edges of the graph using local motion information and updating particle weights based on local appearance information. Mapping is performed by adding new motion and observation information to the graph as the robot visits new locations. Locations that are revisited are explicitly connected to the most recent location, and particles can traverse these connections to simultaneously evaluate multiple representations of a single place. The number of nodes in the graph is limited by a pruning scheme that maintains the map at constant memory requirements while minimizing information loss and maintaining performance. Importantly, the algorithm does not rely on global metric information, nor does it attempt to relax the measurements between nodes for global metric consistency. Instead, the topology maintains relative representations between nodes, providing local metric information that improves recall performance. We show that global metric accuracy is not required to incorporate appearance and motion into our mapping and localization algorithm.

The paper proceeds with a review of recent work in appearance-based mapping and localization systems, before presenting the details of the CAT-Graph algorithm. The performance of the algorithm is demonstrated using the well-known New College dataset [8], illustrating marked

This research was supported by an Australian Research Council Special Research Initiative on Thinking Systems, TS0669699, and an Australian Postgraduate Award.


improvements in recall performance with capped memory and computation costs.

II. RELATED WORK

Graphical representations in metric SLAM have been studied extensively for many years [9], and pose-graph optimization remains an active area of research [10, 11]. GraphSLAM [12] and other well-known topological SLAM methods form a pose graph of observations connected by edges constrained using relative motion information. However, the goal of these graphical methods is to create an optimal metric map in a global Euclidean frame. The use of relative representations without global embedding has been explored in the visual odometry community [13], where only local metric accuracy is required. Notably, [14] combines local relative geometry without a global frame with topological appearance-based localization using FAB-MAP.

A number of recent algorithms in the field of probabilistic topological mapping approach loop closure and map construction as two parts of the same problem. The approach of [15] finds the optimal set of local metric and appearance information in the current map that best matches the current set of observations and local motion. [16] describes a system where both local metric maps and topological position are used to determine the current location within the hybrid map. A general formulation of this approach is presented in [17] using a Rao-Blackwellised particle filter across a probabilistic topology. However, these approaches have only been demonstrated in small environments.

Appearance-based localization systems do not typically address data association when revisiting locations multiple times, instead creating multiple representations for each location. [18, 19] describe a data association procedure upon loop closure detection, but the later approach adopted in [20] simply adds multiple representations. Large-scale appearance-based localization is typically only demonstrated on trajectories that feature one loop closure for each location [21], which does not address the persistence problem.

Attempts to improve the recall performance of appearance-based SLAM algorithms such as FAB-MAP typically require additional information not provided by descriptor-based image similarity alone: [1] uses RANSAC to compare feature geometry, [6] requires vehicle odometry information, and [22] uses additional laser or stereo image sensors for 3D geometric verification.

Constant computation time and memory requirements for mapping systems have been addressed most extensively in the metric mapping domain. Submaps have been used to achieve constant-time map convergence in the approach by [23], with the assumption of repeated robot visits to all parts of the environment. Dynamic submaps have also been used to achieve a constant map update time in the approach by [24]. Occupancy grid mapping approaches typically scale linearly in memory with the area of the map: [25] builds on the occupancy grid approach by forming multiple occupancy maps in parallel, each representing a different timescale, and demonstrated this over a period of five weeks.

Relatively little work on constant-memory or constant-computation-time mapping has occurred in the appearance-based mapping space. [26] describes a short/long-term memory approach to limiting visual bag-of-words location growth, and [27] outlines a clustering-based approach to identify unnecessary views and remove them from the map. [28] presented an information-based approach to node pruning, which removes nodes based on local visual saliency relative to their neighbors. This method provided constant-memory loop closure detection using CAT-SLAM when operating in a fixed-size environment, but did not address frequent location revisiting. In this paper we explore the challenges of achieving constant computation time and memory requirements during repeated revisits in a fixed-size environment using an appearance-based system.
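The constant-memory behaviour targeted here comes from bounding the number of nodes in a topological graph and bypassing pruned nodes to preserve connectivity. As an illustrative sketch of that idea only — the class and method names below are ours, not the paper's, and relative motions are composed naively as scalars rather than SE(2) poses:

```python
# Illustrative sketch: a topological graph with a fixed node budget.
# Names (CatGraph, add_location, _prune_least_informative) are ours.

class CatGraph:
    def __init__(self, max_nodes):
        self.max_nodes = max_nodes
        self.observations = {}   # node id -> bag-of-words observation
        self.edges = []          # (i, j, u_ij) relative-motion links
        self.info = {}           # node id -> information content I_i
        self._next_id = 0

    def add_location(self, z, u, info_content):
        """Append a node linked to the previous one; prune if over budget."""
        k = self._next_id
        self._next_id += 1
        self.observations[k] = z
        self.info[k] = info_content
        if k > 0:
            self.edges.append((k - 1, k, u))
        if len(self.observations) > self.max_nodes:
            self._prune_least_informative()
        return k

    def _prune_least_informative(self):
        """Remove the node with lowest information content and bypass it,
        preserving connectivity (naive scalar composition of motions)."""
        victim = min(self.info, key=self.info.get)
        del self.observations[victim], self.info[victim]
        incoming = [(i, u) for (i, j, u) in self.edges if j == victim]
        outgoing = [(j, u) for (i, j, u) in self.edges if i == victim]
        self.edges = [e for e in self.edges if victim not in e[:2]]
        for i, u_in in incoming:        # connect each predecessor
            for j, u_out in outgoing:   # to each successor
                self.edges.append((i, j, u_in + u_out))
```

With a budget of three nodes, adding five locations keeps the graph at three nodes while rerouting edges around each pruned node.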

III. ALGORITHM DETAILS

The proposed algorithm outlined in this section extends the linear 'trajectory-based' representation of [7] to a generalised graph-based representation. The steps of the algorithm for each new update of control input uk and visual bag-of-words observation zk are as follows:

1. Add uk and zk to the graph G as node Nk.
2. Update all particle locations in the graph using control input uk; match to the best existing location in the graph.
3. Update all particle weights using observation zk: match to the expected appearance at the particle location.
4. Normalise particle weights along with an 'unvisited location' weight representing the likelihood of a new place.
5. Determine if the particles indicate a loop closure has occurred; if so, create a link from Nk to the particle location.
6. Resample the particles if necessary. Distribute 'unvisited location' particles to random locations in the graph (to combat particle deprivation).
7. If the number of nodes exceeds the maximum, determine node information content and remove the least informative node.

The graph G defines a connected manifold which is locally Euclidean but not embedded in a global Euclidean frame. Nodes Ni in the graph are associated with observations zi, which take the form of a visual bag-of-words representation of features visible at location i. Edges eij connect node Ni to node Nj and are associated with the control input uij (and associated covariance matrix Σij) experienced at location i. Localization is performed using a set of np particles pk(i) which represent the likelihood of revisiting the previously visited location at fraction α(i) (between 0 and 1) along associated edge eij(i) at time k. Each particle is also associated with a Boolean direction variable d(i) along with a weight wk(i).

A. Local Graph Construction

To perform local operations on the graph we adopt a relative representation, which constructs the local neighbourhood of the graph from the reference frame of particle pk(i) to a maximum radius of r. A standard non-linear motion model f(x, u) generates 3DOF Euclidean changes in local pose x from control input u, defined as follows:


x = [x y θ]ᵀ,  uk = [Δx Δy Δθ]ᵀ    (1)

The first step in constructing the local graph is to determine the local position xiL of node Ni relative to particle pk(i). From there a breadth-first search (with fringe F) is performed to find all nodes Nn (and associated local positions xnL) and edges emn and enm within a distance r of particle pk(i), as outlined in Algorithm 1. Note that the first node and associated edge to fall outside distance r is still added to the list – this ensures all edges even partly within the local graph radius are included in EL. The direction variable d(i) reverses the initial position xiL to represent a particle in a reverse orientation along edge eij. A sample local graph is illustrated in Fig. 1(a).

Algorithm 1 – Local Graph Construction
function localGraph(pk(i), r):
    NL := empty list of nodes and local positions
    EL := empty list of edges
    xiL := −α(i) f(0, uij), or xiL := −α(i) f([0 0 π], uij) if d(i) = 1
    push {Ni, xiL} to fringe F
    until fringe F is empty:
        pop {Nn, xnL} from fringe F
        add {Nn, xnL} to NL (if Nn not already present)
        for every edge emn and enm linked to Nn do:
            xmL := f(xnL, unm) or xmL := f(xnL, −umn)
            push emn or enm to EL
            if |xmL| < r then
                push {Nm, xmL} to fringe F
            else
                add {Nm, xmL} to NL
            end if
        end for
    end until
    return NL, EL

Figure 1 – Local graph and particle update diagram. (a) illustrates the local graph of particle pk(i). The graph is constructed from the reference frame of the particle to a fixed radius r. (b) illustrates the local particle update procedure for particle pk(i). The particle is locally propagated by control input uk (plus additive Gaussian noise wk) to generate the proposed local position x̂k(i). Each edge is then tested to find the location xk(i) which minimizes the Mahalanobis distance to the proposed particle location x̂k(i) using covariance matrix Σk. The updated particle location is set to the most likely edge ejk at fraction α̂.

B. Particle Position Update

To follow a localization hypothesis, each particle is propagated along edges in the graph according to the current control input uk. The proposed local change in pose x̂k(i) for each particle is generated by adding Gaussian noise wk with covariance Σk to the position generated by the non-linear motion model f(0, uk). However, the proposed current location will rarely correspond to a location along an existing edge in the local graph; therefore particles are assigned to the most likely location along an edge in the local graph. Formally, we seek a location x̂kL at fraction α̂ along edge eij in EL which minimizes the Mahalanobis distance (using covariance matrix Σk) to the proposed particle location x̂k(i). We solve for the minimum by differentiating the Mahalanobis distance along an edge eij with respect to fraction α̂ and solving for ∂/∂α̂ = 0: if α̂ is between 0 and 1, a local minimum lies along edge eij. The weight of each particle is updated based on the Gaussian likelihood function P(x̂kL | x̂k(i), Σk). This process is illustrated in Fig. 1(b) and outlined in Algorithm 2.

To reduce computational complexity the graph is only built to the size required to include any edge near the most likely particle location x̂k(i). The local graph radius is set to the length of the proposed pose change x̂k(i) plus a multiple s of the maximum eigenvalue λ1 of covariance matrix Σk (typically set to 3 standard deviations or more). This represents a worst-case scenario for the difference between the proposed pose change x̂k(i) and the nearest edge location x̂kL.

Algorithm 2 – Particle Position Update
function updateParticlePosition(pk−1(i), uk, Σk):
    x̂k(i) := f(0, uk) + N(0, Σk)
    NL, EL := localGraph(pk−1(i), |x̂k(i)| + s λ1(Σk))
    for every edge eij in EL from xiL to xjL in NL do:
        solve ∂/∂α̂ [xiL + α̂(xjL − xiL) − x̂k(i)]ᵀ Σk⁻¹ [xiL + α̂(xjL − xiL) − x̂k(i)] = 0 for α̂
        if 0 ≤ α̂ ≤ 1 then
            x̂kL := xiL + α̂(xjL − xiL)
            P(x̂kL | x̂k(i), Σk) := exp(−½ [x̂kL − x̂k(i)]ᵀ Σk⁻¹ [x̂kL − x̂k(i)])
            store α̂, i, j for maximum P
        end if
    end for
    αk(i) := α̂ of maximum P,  ek(i) := eij of maximum P,  ŵk(i) := wk−1(i) · max P(x̂kL | x̂k(i), Σk)

C. Appearance-based Observation Update

The observation update weights particles based on the likelihood of the current observation given the expected local appearance. The likelihood is calculated by comparing the current visual bag-of-words zk to the appearance generated by interpolating between observations zi and zj, given that particle pk(i)

is at fraction α(i) on edge eij at time k. The weight of each particle is updated as follows:

ŵk(i) := ŵk(i) P(zk | zi, zj, α(i))    (3)

The form of this likelihood function is derived from FAB-MAP and illustrated in full in [7].

D. New Place Detection

Since the particle set pk is constrained to exist only along edges e in the graph, the particles can only represent location hypotheses for previously visited locations. To determine if the current set of observation and motion information indicates the vehicle is in a previously unvisited location we sample from an 'unvisited' location Ḡ not on the graph G.

P(Ḡ | zk, uk) = P(zk | Ḡ) P(Ḡ | uk)    (4)

The observation and motion distributions for an unvisited location Ḡ can be approximated using information from training data as follows:

P(zk | Ḡ) P(Ḡ | uk) ≈ P(zk | zavg) P(uavg | uk)    (5)

Here zavg represents an 'average' observation and uavg an 'average' control input. These are determined using the mean field approximation or a sampling method, both presented in [8]. The proposed weighting assigned to a location not on the graph is given as follows:

ŵknew = (1/np) P(zk | zavg) P(uavg | uk)    (6)

The new location weight is denoted by wnew. Note that it is not recursively updated; this represents a uniform likelihood of departing the graph at any point in time.

E. Resampling and Loop Closure Detection

Particle resampling is performed using the Select with Replacement method as in [7]. Any particles selected to replace the new location weight are sampled to a random edge on the graph (with a random direction). This serves to counteract the effects of particle deprivation, since the proportion of particles sampled to random locations on the graph increases as the new place likelihood increases, thereby increasing the probability of detecting loop closures without requiring evaluation of every previously visited location.

To determine the most likely location hypothesis from the distribution of particles a spatially selective method is used, equivalent to integrating the probability distribution over a local area in the graph. For every particle pk(i), the location hypothesis P(pk(i)) is equal to the sum of the weights of all particles within distance dh of the current particle within its local graph. The value of dh is selected based on the desired resolution of loop closure detection, and as such the location hypothesis is not subject to arbitrary location discretization due to local visual saliency. If the maximum location hypothesis exceeds a threshold T, a new graph edge is added between the current and matched location. This process is outlined in Algorithm 3.

Algorithm 3 – Loop Closure Detection
for every particle pk(i) do:
    NL, EL := localGraph(pk(i), dh)
    for every particle pk(j) on every edge eij in EL do:
        P(pk(i)) := P(pk(i)) + wk(j)
    end for
end for
if max P(pk(i)) > T then
    create edge eki using uk − αk(i) uij,  Σk + Σij
end if

The addition of a new edge on a loop closure event is crucial for increasing recall on repeated traverses of a route; it allows particles to simultaneously evaluate multiple representations of a location while recognising that all representations correspond to the same physical location (since the local graph construction will connect both locations).

Figure 2 – Node information content and node pruning diagram. (a) illustrates the calculation of the most likely local position x̂iL at fraction α̂ on the hypothetical edge ejk between nodes Nj and Nk. This location is used to generate an expected appearance which is compared to observation zi at node Ni to determine the information content Ii. (b) illustrates the connectivity of the local graph after node Ni has been removed.
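The closed-form minimization used in Algorithm 2 (and again, in the same form, in Algorithm 4 below) reduces to a one-dimensional projection: differentiating the Mahalanobis distance along an edge with respect to α̂ and setting the derivative to zero. A sketch in Python — the function name is ours, and angle wrap-around on the θ component is ignored for brevity:

```python
import numpy as np

def project_onto_edge(x_i, x_j, x_hat, cov):
    """Project proposed pose x_hat onto the edge from x_i to x_j,
    minimizing the Mahalanobis distance under covariance cov.
    Returns (alpha, edge point, Gaussian likelihood); alpha is None
    if the minimum does not lie within the edge segment [0, 1]."""
    x_i, x_j, x_hat = (np.asarray(v, float) for v in (x_i, x_j, x_hat))
    S_inv = np.linalg.inv(np.asarray(cov, float))
    d = x_j - x_i
    # d/d_alpha of (x_i + alpha*d - x_hat)^T S^-1 (x_i + alpha*d - x_hat) = 0
    alpha = float(d @ S_inv @ (x_hat - x_i)) / float(d @ S_inv @ d)
    if not (0.0 <= alpha <= 1.0):
        return None, None, 0.0
    x_edge = x_i + alpha * d
    r = x_edge - x_hat
    likelihood = float(np.exp(-0.5 * r @ S_inv @ r))
    return alpha, x_edge, likelihood
```

For example, a proposed pose [1, 0.5, 0] projects onto the edge from [0, 0, 0] to [2, 0, 0] at α̂ = 0.5 under an identity covariance, with likelihood exp(−0.125).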

F. Local Information-based Pruning

To limit the map to a fixed maximum size we extend the information-based trajectory pruning approach of [28] to a graphical representation. The pruning stage is performed before each particle update process. For each new node added to the graph, if the total number of nodes in the graph exceeds a preset number, the node with the lowest information content relative to its neighbors is removed and replaced with a direct link between its neighbors.

To find the information content of a node Ni relative to its neighbors Nj and Nk we compare the observation zi to that generated by interpolating neighboring observations zj and zk along a proposed edge ejk that bypasses node Ni. If the proposed edge ejk produces an adequate representation for observation zi, then node Ni can be removed with minimal loss of information. This process is illustrated in Fig. 2(a) and outlined in Algorithm 4. The information content Ii of node Ni is defined as the negative log-likelihood of the odometric and appearance-based match between node Ni and the proposed location x̂iL along the proposed edge between Nj and Nk. Unlike in [28], nodes may have multiple neighbors due to explicit loop closure events, and therefore the information content of all proposed edges between neighbors must be evaluated to find the bypassing edge with highest information content.

Algorithm 4 – Node Information Content
for every node Ni do:
    for every edge pair eji from Nj and eik to Nk do:
        xjL := f(0, −uji),  xkL := f(0, uik),  Σjk := Σji + Σik
        solve ∂/∂α̂ [xjL + α̂(xkL − xjL)]ᵀ Σjk⁻¹ [xjL + α̂(xkL − xjL)] = 0 for α̂
        x̂iL := xjL + α̂(xkL − xjL)
        P(x̂iL | xjL, xkL) := exp(−½ [x̂iL]ᵀ Σjk⁻¹ [x̂iL])
        Ii := −log [ P(zi | zj, zk, α̂) P(x̂iL | xjL, xkL) ]
    end for
end for

To avoid unbounded growth in storage requirements, nodes are removed from the graph once the total number of nodes exceeds a threshold nn. The node Ni with minimum information content Ii is deleted from the graph, and the proposed edge ejk between neighbors Nj and Nk is added. All other nodes connected to Ni are re-routed to neighbors Nj and Nk to preserve the connectivity of the graph. Particles previously on edges connected to Ni are relocated to the new edges. This process is illustrated in Fig. 2(b) and outlined in Algorithm 5.

Algorithm 5 – Node Pruning
if k > nn then
    find Ni with minimum Ii between nodes Nj and Nk
    create edge ejk between Nj and Nk using uji + uik,  Σji + Σik
    for every edge eni from Nn to Ni where n ≠ j do
        create edge enj using uji − uni,  Σji + Σni
    end for
    for every edge eim from Ni to Nm where m ≠ k do
        create edge ekm using uim − uik,  Σim + Σik
    end for
    delete node Ni and observation zi, and edges eji and eik
end if

IV. EXPERIMENTAL SETUP

To facilitate a direct comparison to FAB-MAP and previous implementations of CAT-SLAM, we use the same evaluation dataset as that presented in [6]. To differentiate the algorithms, we refer to the novel representation presented in this paper as a Continuous Appearance-based Topological Graph, or CAT-Graph.

A. Testing Environment

The urban dataset used for this evaluation is presented in [8]. It consists of 8127 panoramic images from a Point Grey Ladybug2 camera with accompanying wheel odometry (from shaft encoders on the Segway RMP) and GPS data logged at 5Hz. The route taken is a 2.5km tour of the grounds of New College, Oxford, pictured in Fig. 3, with multiple traversals of each location in both forward and reverse directions (a total of 5 traversals of the quadrangle area). Ground truth is nominally provided by GPS locations; however, the signal is degraded in many locations throughout the urban dataset (particularly through a tunnel between courtyards). Approximately 45% of the panoramic images have an associated valid GPS position; recall data for the precision-recall curves is instead based on a hand-corrected trajectory from wheel odometry and manual loop closures, which provides a topologically correct location for every frame.

Figure 3 – Aerial view of test environment with route overlaid.

B. Algorithm Parameters

As FAB-MAP, CAT-SLAM and CAT-Graph only require appearance information, no camera calibration or image registration is required. Feature descriptors are extracted using SURF [29], and a fast approximate nearest neighbour algorithm [30] was used to find the corresponding visual word for each descriptor. The FAB-MAP implementation used for comparison is openFABMAP [31], which produces results comparable to those presented in [2]. Parameters for the detector functions of FAB-MAP were taken from [19]. The parameters and results for CAT-SLAM on the same dataset are taken from [6]. The codebook was generated using modified sequential clustering [32], yielding 5000 visual words. Parameters for the three algorithms are presented in Table 1.

TABLE 1 – ALGORITHM PARAMETERS

openFABMAP
    p(zi = 1 | ei = 0)              0
    p(zi = 0 | ei = 1)              0.61
    p(Lnew | Zk−1)                  0.9

CAT-SLAM
    p(zi = 1 | ei = 0)              0
    p(zi = 0 | ei = 1)              0.61
    Number of Particles N           2000
    ESS Threshold                   0.25
    Distribution Radius r           2.5m

CAT-Graph
    p(zi = 1 | ei = 0)              0
    p(zi = 0 | ei = 1)              0.61
    Number of Particles np          1000
    Number of Nodes nn              2000
    Particle Update Graph Size s    3σ
    ESS Threshold                   0.25
    Distribution Radius dh          2.5m
    Hypothesis Threshold T          0.9

V. RESULTS
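The results below are summarised as precision-recall curves. As a reminder of how such a curve is produced — this is a generic sketch, not the authors' evaluation code — proposed loop closures are sorted by match likelihood, a sweep over the acceptance threshold yields precision and recall at each operating point, and the headline figure is recall at 100% precision:

```python
import numpy as np

def precision_recall(scores, correct, n_true):
    """scores: match likelihoods of proposed loop closures; correct:
    Booleans, True where the proposal agrees with ground truth;
    n_true: number of ground-truth loop closure opportunities."""
    order = np.argsort(scores)[::-1]          # sweep threshold high -> low
    correct = np.asarray(correct)[order]
    tp = np.cumsum(correct)                   # true positives so far
    fp = np.cumsum(~correct)                  # false positives so far
    precision = tp / (tp + fp)
    recall = tp / n_true
    return precision, recall

def recall_at_full_precision(precision, recall):
    """Maximum recall achieved with no false positives accepted."""
    perfect = precision == 1.0
    return float(recall[perfect].max()) if perfect.any() else 0.0
```

For instance, four proposals with scores [0.9, 0.8, 0.7, 0.6] of which the third is wrong, against four ground-truth opportunities, give a maximum recall of 0.5 at 100% precision.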

A. Precision-Recall Performance

To assess the performance of the CAT-Graph algorithm in comparison to openFABMAP and CAT-SLAM, we examine the precision-recall curves they produce for the test environment. Expected matches are defined as previously visited locations within 7.5m of the current location. The desired performance is high recall at 100% precision.

Fig. 4 presents the precision-recall curves produced by openFABMAP, CAT-SLAM and two variants of the CAT-Graph algorithm: one using 1000 particles, and one using 1000 particles with a limit of 2000 nodes. Both CAT-Graph variants provide almost double the recall of CAT-SLAM and 7 times the recall of openFABMAP at 100% precision, but do not differ significantly from each other despite the difference in memory scaling. This demonstrates the effectiveness of the information-based node-pruning algorithm; for this environment, localization performance is not significantly affected despite the removal of approximately 5000 observations from the graph.

Figure 4 – Precision-recall curves for four algorithm variants on the New College dataset. The two CAT-Graph variants provide greatly increased recall performance over CAT-SLAM and openFABMAP despite the differences in computational and memory scaling.

B. Loop Closure Distribution

To assess the effects of using loop closure events to inform the topological graph construction (and therefore to examine the improvement gained over other appearance-based SLAM algorithms which do not explicitly perform data association), we examine the loop closure distribution for all four algorithms. Fig. 6 shows the loop closures detected at 100% precision projected onto the hand-corrected ground truth. Both variants of CAT-Graph detect significantly more loop closures than openFABMAP and CAT-SLAM, and the distribution of loop closures is uniform: they are not concentrated at any particular location but rather spread across the environment.

Figure 6 – Loop closure distribution at 100% precision overlaid on hand-corrected ground truth based on wheel odometry information. CAT-Graph detects consistently more loop closures than both openFABMAP and CAT-SLAM in all areas of the environment.

C. Node Information Content

Fig. 5 illustrates a pair of frame sequences at different locations in the environment. The first sequence involves the robot moving under an overhanging tree and passing a distinctive building, while the second involves the robot travelling through a dark, featureless tunnel. As shown below the sequences, the information content for the central frame in the second sequence is significantly lower than that calculated for the central frame in the first sequence.

Figure 5 – Information content illustration. (a) illustrates a sequence of nodes with high information content (Ii = 27.64): close proximity to buildings and trees causes a high degree of difference between sequential frames, yielding high relative information. (b) illustrates a sequence through a tunnel (Ii = 6.71), with very little visual change between successive frames and therefore low information content.

D. Computational and Memory Scaling

Fig. 7 presents the computation and storage requirements for openFABMAP, CAT-SLAM and the two CAT-Graph variants. The computation time does not include feature extraction and visual word classification (on average 800ms per frame), as these are identical for all algorithms. The difference in computational scaling is clear, with openFABMAP reduced to update rates below 1Hz by the end of the dataset due to the linear increase in computational requirements. The CAT-Graph configuration with 2000 nodes requires the greatest amount of time per update (as the information content of each node is assessed), but still remains approximately constant over the length of the dataset. Spikes in the computation graph are due to the evaluation of nodes with multiple neighbors, as the information content between every combination of neighbors is assessed.

The storage requirements scale linearly with operation time for all but the CAT-Graph configuration with limited nodes. All algorithms require the codebook and Chow-Liu tree from training data, which contribute to the initial ~4MB at the start of the dataset. Storage requirements initially increase linearly for all algorithms; CAT-SLAM requires additional storage for location and odometry information for the continuous trajectory as well as particle locations, weights and directions. However, once the number of locations reaches 2000, CAT-Graph with limited nodes does not require more storage, whereas openFABMAP and CAT-SLAM memory requirements continue to increase linearly with the number of frames.

Figure 7 – Computation time and memory requirements for openFABMAP, CAT-SLAM and CAT-Graph. All CAT-SLAM and CAT-Graph variants provide constant computational time scaling, and the CAT-Graph variant with limited nodes provides constant memory scaling over time.

Loop closure events in CAT-Graph also increase the number of edges in the map and therefore memory requirements. However, the worst case for edge storage occurs when an edge exists for every node pair, making it O(N²) in the number of nodes; worst-case edge storage requirements therefore do not grow over time if the number of nodes is fixed.

VI. DISCUSSION

The use of a topological graph and local metric representations, along with loop-closure-informed graph links, provides CAT-Graph with significant performance increases over CAT-SLAM and openFABMAP. In this section we discuss the key insights gained from this work and directions for future work.

A. Improved Loop Closure Through Hypothesis Aggregation

The ability to combine multiple hypotheses for multiple representations of the same location is crucial for long-term operation in a fixed-size environment. CAT-Graph consistently detects more loop closures over successive revisits to the same location, whereas CAT-SLAM develops multiple separate representations of the same trajectory but does not identify that all correspond to the same location. Unlike in CAT-SLAM, where location hypotheses compete upon revisiting a location, creating explicit links between locations when loop closures are detected allows multiple hypotheses from multiple representations of the same location to support, rather than compete with, each other.

B. Saliency-based Map Pruning

The node pruning scheme enables localization and mapping in a fixed-size environment without memory requirements increasing linearly with operation time. Nodes are removed based on visual saliency, with less distinctive areas of the environment generating fewer nodes. The selection of the maximum number of nodes nn depends on the memory available to the robot and the visual saliency of the environment. Experiments presented in [28] for the CAT-SLAM algorithm illustrate a graceful degradation of recall performance with reduced numbers of nodes and particles; however, determining the absolute minimum number of nodes

REFERENCES
[1] K. Konolige et al., "View-based maps," The International Journal of Robotics Research, 2009, 29(8): 1-17.
[2] P. Newman et al., "Navigating, recognizing and describing urban spaces with vision and lasers," The International Journal of Robotics Research, 2009, 28(11-12): 1406.
[3] M. Milford and G. Wyeth, "Persistent navigation and mapping using a biologically inspired SLAM system," The International Journal of Robotics Research, 2010, 29: 1131-1153.
[4] P. E. Rybski et al., "Appearance-based minimalistic metric SLAM," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, NV, 2003.
[5] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," in IEEE International Conference on Computer Vision, Nice, France, 2003.
[6] W. Maddern, M. Milford, and G. Wyeth, "Continuous appearance-based trajectory SLAM," in IEEE International Conference on Robotics and Automation, Shanghai, China, 2011.
[7] W. Maddern, M. Milford, and G. Wyeth, "CAT-SLAM: Probabilistic localization and mapping using a continuous appearance-based trajectory," The International Journal of Robotics Research, 2012, 31(4): 429-451.
[8] M. Smith et al., "The New College vision and laser data set," The International Journal of Robotics Research, 2009, 28(5): 595.
[9] S. Thrun and J. Leonard, "Simultaneous localization and mapping," in Springer Handbook of Robotics, B. Siciliano, Ed. Springer Berlin Heidelberg, 2008.
[10] M. Kaess et al., "iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering," in IEEE International Conference on Robotics and Automation, Shanghai, China, 2011.
[11] R. Kummerle et al., "g2o: A general framework for graph optimization," in IEEE International Conference on Robotics and Automation, Shanghai, China, 2011.
[12] S. Thrun and M. Montemerlo, "The graph SLAM algorithm with applications to large-scale mapping of urban structures," The International Journal of Robotics Research, 2006, 25(5-6): 403.
[13] D. Nister, O. Naroditsky, and J. Bergen, "Visual odometry," in IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[14] G. Sibley et al., "Vast-scale outdoor navigation using adaptive relative bundle adjustment," The International Journal of Robotics Research, 2010.
[15] E. Olson, "Robust and efficient robotic mapping," PhD thesis, Massachusetts Institute of Technology, 2008.
[16] J. Blanco, J. Fernandez-Madrigal, and J. Gonzales, "Toward a unified Bayesian approach to hybrid metric-topological SLAM," IEEE Transactions on Robotics, 2008, 24(2): 259-270.
[17] A. Ranganathan and F. Dellaert, "Online probabilistic topological mapping," The International Journal of Robotics Research, 2011.
[18] M. Cummins and P. Newman, "Probabilistic appearance based navigation and loop closing," in IEEE International Conference on Robotics and Automation, Rome, Italy, 2007.
[19] M. Cummins and P. Newman, "FAB-MAP: Probabilistic localization and mapping in the space of appearance," The International Journal of Robotics Research, 2008, 27(6): 647.
[20] M. Cummins and P. Newman, "Appearance-only SLAM at large scale with FAB-MAP 2.0," The International Journal of Robotics Research, 2010.
[21] M. Cummins and P. Newman, "Highly scalable appearance-only SLAM - FAB-MAP 2.0," in Robotics: Science and Systems Conference, Seattle, Washington, 2009.
[22] R. Paul and P. Newman, "FAB-MAP 3D: Topological mapping with spatial and visual appearance," in IEEE International Conference on Robotics and Automation, Anchorage, Alaska, 2010.
[23] J. Leonard and P. Newman, "Consistent, convergent, and constant-time SLAM," in IJCAI, Acapulco, Mexico, 2003.
[24] S. Thrun et al., "SLAM updates require constant time," in Workshop on the Algorithmic Foundations of Robotics, 2002.
[25] P. Biber and T. Duckett, "Experimental analysis of sample-based maps for long-term SLAM," The International Journal of Robotics Research, 2009, 28(1): 20-33.
[26] M. Labbe and F. Michaud, "Memory management for real-time appearance-based loop closure detection," in IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, 2011.
[27] K. Konolige and J. Bowman, "Towards lifelong visual maps," in IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, 2009.
[28] W. Maddern, M. Milford, and G. Wyeth, "Capping computation time and storage requirements for appearance-based localization with CAT-SLAM," in IEEE International Conference on Robotics and Automation, St. Paul, MN, 2012.
[29] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in European Conference on Computer Vision, Graz, Austria, 2006.
[30] S. Arya et al., "An optimal algorithm for approximate nearest neighbor searching in fixed dimensions," Journal of the ACM, 1998, 45(6): 891-923.
[31] A. Glover et al., "OpenFABMAP: An open source toolbox for appearance-based loop closure detection," in IEEE International Conference on Robotics and Automation, St. Paul, MN, 2012.
[32] A. Teynor and H. Burkhardt, "Fast codebook generation by sequential data analysis for object classification," Advances in Visual Computing, 2007: 610-620.
[33] A. Glover et al., "FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day," in IEEE International Conference on Robotics and Automation, Anchorage, Alaska, 2010.
[34] F. Dayoub and T. Duckett, "An adaptive appearance-based map for long-term topological localization of mobile robots," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008.


Robust Navigation Execution by Planning in Belief Space Bhaskara Marthi Willow Garage, Inc. Menlo Park, CA 94025 Email: [email protected]

Abstract—We consider robot navigation in environments given a known static map, but where dynamic obstacles of varying and unknown lifespans appear and disappear over time. We describe a roadmap-based formulation of the problem that takes the sensing and transition uncertainty into account, and an efficient online planner for this problem. The planner displays behaviors such as persistence and obstacle timeouts that would normally be hardcoded into an executive. It is also able to make inferences about obstacle types even with impoverished sensors. We present empirical results on simulated domains and on a PR2 robot.


I. INTRODUCTION

Navigation given a known map is a fundamental capability for mobile robots, and there exist practical and efficient methods for subproblems such as localization and path-planning. Less well-understood is how to put these pieces together such that the overall system acts robustly in the world. Define a navigation system to be a program that takes in a known static map, localization information, sensor data, and navigation goals (represented as positions in space), and controls lower level systems (say via velocity commands of the form (ẋ, ẏ, θ̇)). Given a correct map of a static indoor environment (i.e., with no new obstacles besides the ones in the map), a navigation system is easy to write. For example, we could take localization input from the AMCL algorithm [24], call a planner such as A* [21] on a discretized grid to produce a path, then use a trajectory following algorithm such as DWA [1] to generate velocity commands that will follow that path.

An autonomous robot in an unconstrained real-world environment such as an office must, however, also deal with various sorts of changes to the map. A person might be standing in a doorway, a couch may have temporarily been moved into a corridor, rendering it impassable, and so on. Navigation systems should behave efficiently and sensibly given that such dynamic obstacles appear (and disappear) over time. A standard approach is to maintain a map that is updated in some manner given obstacles. Getting this to work robustly is surprisingly tricky.

Consider the prototypical example shown in Figure 1. Here the shortest way to the goal is to go down the narrow hallway. The next best alternative is to go around the building, which takes ten times as long. Now suppose a set of obstacle points appear in the hallway at position A, making it impossible to traverse. Certainly, the robot should not stand and wait forever.
The map must therefore be updated with this obstacle, causing the robot to eventually choose the long path.

Fig. 1. An example navigation problem. The goal is to get to G from S.

Also, the robot must not suddenly change its mind when it is, say, at position B, and turn back around. This requires either making obstacles fairly long-lived, or hardcoding "persistence" of some sort into the system. In the first case, there needs to be some scheme for eventually timing out obstacles, to avoid the possibility, e.g., of a corridor being considered permanently out of bounds due to seeing a person blocking it once.

Finally, the algorithm should deal intelligently with different types of obstacles. It might, for example, make sense to wait for a person to move, but not for a couch. Ideally, perception would give us this information. But even if, as is currently the norm, perception is fairly impoverished and noisy, there is still the possibility of a kind of implicit sensing of the obstacle type: in the example, it might make sense to just wait for a few seconds. If the obstacle disappears, the robot can take the short path after all. If it stays where it is, it is likely to be static and the robot should take the long route.

The usual way to achieve robust and efficient behavior in such cases is to have a program known as an executive sitting above the planners [16]. The executive maintains its own local state and contains various hand-coded procedures, which are intended to avoid infinite loops and dead-ends and to appear intelligent and goal-directed. As a typical example, we consider the executive from the ROS navigation stack [11], a popular open source navigation system. Table I shows the set of local state variables maintained by this executive. The


State variable          Meaning
mode                    Overall mode: planning, controlling, or clearing
time                    The current time
last-valid-plan         Timestamp when planning last succeeded
last-valid-control      Timestamp when a valid velocity was last found
oscillation-pose        Reference pose used to detect oscillations
oscillation-timestamp   Timestamp for oscillation pose
recovery-index          Which recovery behavior was last tried
recovery-cause          Cause of last failure: one of no-valid-plan, no-valid-control, or oscillation

we will show, the optimal policy for this POMDP automatically exhibits the various behaviors discussed above, such as persistence, patience, and implicit sensing, without having to hardcode them into an executive. The scientific contributions include efficient algorithms for state-updates (Section IV) and planning (Section V) in this POMDP. At the systems level, the main contribution is showing by example that it is possible to build a robust navigation system with failure recovery using a principled decisiontheoretic planning approach, rather than scripted behaviors. Section VI describes the implemented system and empirical results on both simulated and real-world domains.

TABLE I
STATE VARIABLES OF THE EXECUTIVE IN THE ROS NAVIGATION STACK
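As a rough illustration of Table I, the following is a minimal, hypothetical sketch of a stateful executive loop over a subset of those variables; it is not the actual ROS move_base implementation, and the constant and method names are ours.

```python
import time
from enum import Enum

class Mode(Enum):
    PLANNING = "planning"
    CONTROLLING = "controlling"
    CLEARING = "clearing"

class Executive:
    """Hypothetical sketch of a mode-based navigation executive.

    Tracks a subset of the state in Table I; the real executive also
    handles oscillation detection and recovery causes.
    """
    PLANNER_PATIENCE = 5.0  # seconds without a valid plan before clearing (placeholder)

    def __init__(self, recovery_behaviors):
        self.mode = Mode.PLANNING
        self.last_valid_plan = time.time()   # "last-valid-plan"
        self.recovery_index = 0              # "recovery-index"
        self.recovery_behaviors = recovery_behaviors

    def step(self, plan_found, control_found):
        now = time.time()
        if self.mode is Mode.PLANNING:
            if plan_found:
                self.last_valid_plan = now
                self.mode = Mode.CONTROLLING
            elif now - self.last_valid_plan > self.PLANNER_PATIENCE:
                self.mode = Mode.CLEARING    # recovery-cause: no-valid-plan
        elif self.mode is Mode.CONTROLLING:
            if not control_found:
                self.mode = Mode.CLEARING    # recovery-cause: no-valid-control
        else:
            # CLEARING: run the next recovery behavior, then replan.
            self.recovery_behaviors[self.recovery_index]()
            self.recovery_index = (self.recovery_index + 1) % len(self.recovery_behaviors)
            self.mode = Mode.PLANNING
        return self.mode
```

Even this toy version shows the coupling the paper criticizes: every transition depends on hidden local state, and the planner has no model of it.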

idea is that the overall system can be in one of three modes. In "planning" mode, the robot is stationary and waiting for a plan. In "controlling" mode, the robot has a plan and is sending velocity commands at some rate. Finally, in "clearing" mode, something has gone wrong, and one of a user-specified set of recovery behaviors is being executed. These may include removing obstacles from the map beyond some distance, or rotating in place. Several other variables govern the transitions between these modes, as shown in Table I.

The ROS navigation system has achieved impressive performance in real-world demonstrations, such as navigating with no human intervention for eleven days in an office environment. But there are a few disadvantages to using a complicated and stateful executive. First, there is significant programmer effort involved in coming up with the execution strategy, which is usually done in a trial-and-error fashion. Second, understanding the system's behavior requires knowing all the state variables in Table I and their interactions with each other and with the recovery behaviors. This makes it hard to debug or modify. It also means the planning algorithms have an inaccurate model of how their plans will be executed. Finally, the executive is not capable of certain types of behavior, such as information-seeking actions.

We propose an alternative way of structuring navigation systems. The idea is to explicitly model and plan for the uncertainty in the world, then use a simple stateless executive that just follows the resulting plan. There are many ways to formulate the planning problem, some of which we discuss in Section III. One possibility is to model the transition uncertainty with costs or probabilities, while still pretending that the state is fully observable. These approaches are systematically suboptimal, though, as they will not take information-gathering actions.
Alternatively, we can explicitly model the sensing uncertainty, using a partially observable Markov decision process (POMDP). POMDPs are notoriously difficult to solve, however, and the POMDP in our case has size exponential in the number of possible obstacle locations.

In Section III, we present a particular formulation of the problem as a POMDP, and an algorithm for solving it efficiently enough to run online on a mobile robot. The formulation is based on a roadmap over the static map, where nodes represent particular locations and edges are local paths that can become blocked and unblocked over time. Obstacles may belong to different classes, with varying lifetimes. As

II. BACKGROUND

A. MDPs

An undiscounted Markov decision process [17], or MDP, consists of a state space S, action set A, transition model P(s′|s, a), reward function R(s, a, s′), and terminal states T ⊂ S. A stationary policy is a function π : S → A. A policy induces a distribution over state-action trajectories that continue until reaching a terminal state. The value function of a policy, V^π(s), is the expected total reward for following π starting at s, and the Q-function Q^π(s, a) is the expected total reward for doing a in s, then following π. The optimal policy π* maximizes the value at all states, and we write V and Q for its value and Q-functions. In our examples, the set of actions varies depending on the state, but this can be represented by making nonapplicable actions have reward −∞.

B. POMDPs

A partially observable Markov decision process [6], or POMDP, is like an MDP except that states are not directly observed. Instead, there is an observation distribution Z(o|s, a, s′) over the observation that's received when making the transition from s to s′ via a. In the context of POMDPs, we call a distribution over the state space a belief state. Given a belief state b about the current state s, if we do an action a, the marginal distribution over the next state is the result of the projection operator b′ = P(b, a), where:

    b′(s′) = Σ_{s∈S} b(s) P(s′|s, a)

Also, given an observation o, the conditional distribution is b″ = C(b′, a, o), where:

    b″(s′) = b′(s′) Z(o|s, a, s′) / Σ_{s″∈S} b′(s″) Z(o|s, a, s″)

The filtering operator F consists of projection followed by conditioning, and is used to update the distribution over the current state. A key fact about POMDPs is that the sequence of beliefs itself forms an MDP (with the same action space). To sample from the transition model of this MDP given belief b and action a, first sample s from b, then sample s′ from P(·|s, a) and o from Z(·|s, a, s′), and set b′ = F(b, a, o) (s′ is discarded). The reward function


is R(b, a, b′) = Σ_{s,s′} b(s) P(s′|s, a) R(s, a, s′). The belief state summarizes the relevant information about the action-observation history; in particular, any optimal policy for the belief state MDP is also optimal for the POMDP, assuming the belief state is maintained exactly.
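The projection and filtering operators above can be sketched for a finite POMDP stored as plain dictionaries. This is an illustrative sketch, not the paper's implementation; for simplicity it assumes an observation model that depends only on the action and the arrived-at state.

```python
def project(b, P, a):
    """Projection b' = P(b, a): marginal over the next state.

    b maps states to probabilities; P[s][a] maps next states to
    probabilities (names are illustrative).
    """
    b2 = {}
    for s, ps in b.items():
        for s2, p in P[s][a].items():
            b2[s2] = b2.get(s2, 0.0) + ps * p
    return b2

def filter_belief(b, P, Z, a, o):
    """F: projection followed by conditioning on observation o.

    Z[(a, s2)] maps observations to probabilities, i.e. the observation
    is assumed to depend only on (a, s2) here.
    """
    b2 = project(b, P, a)
    # Weight each next state by the likelihood of the observation...
    w = {s2: p * Z[(a, s2)].get(o, 0.0) for s2, p in b2.items()}
    # ...then renormalize, as in the conditioning formula above.
    total = sum(w.values())
    return {s2: v / total for s2, v in w.items()}
```

Sampling a belief-MDP transition then amounts to drawing s from b, simulating (s′, o), and calling `filter_belief(b, P, Z, a, o)`.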


C. Solution algorithms

The literature has tended to focus on the offline planning problem of finding a full policy π over the belief space. Due to the constraints of running on a robot in real time, we consider instead the online planning problem [20], where the agent is repeatedly given an observation and just returns an action for the current belief state. Most online algorithms are based on forward search. A forward search tree for (belief state) MDPs consists of alternating layers of action nodes and chance nodes. The root of the tree is an action node labelled with the initial (belief) state. An action node has children corresponding to the possible actions. The chance node corresponding to a state-action has children corresponding to the possible successor states, labelled with the probability of that particular state.

A search tree can be used to estimate the value of taking each action at the root by repeated backups. The leaf action nodes are given values according to some heuristic estimate of the value function at their state. The value of a chance node is the average of the child values weighted by the probabilities. The value of an action node is the maximum of the child values.

There are various choices in how to generate search trees. First, the children of a chance node can be generated either exhaustively, based on an explicitly given transition model of the MDP, or indirectly, by sampling repeatedly from it, which only requires a simulator. Second, there is the choice of which nodes to expand, given finite total computation time. Apart from simple fixed-depth strategies [7], there are various more sophisticated methods [20, 9]. Finally, the choice of evaluation function at the leaves is of key importance, especially when the tree depth is much lower than the expected time to termination.

III. MODELING THE PROBLEM

Our goal is to build a robotic system that navigates between locations in an indoor environment efficiently.
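The fixed-depth, sampling-based forward search of Section II-C can be sketched as a short recursion; the function and parameter names are ours, and `simulate` stands in for any simulator of the (belief-state) MDP.

```python
def search_value(b, actions, simulate, heuristic, depth, width):
    """Fixed-depth, fixed-width forward search over (belief) states.

    actions(b) lists applicable actions; simulate(b, a) samples one
    transition, returning (reward, next_state); heuristic(b) scores
    leaves. All names are illustrative, not from the paper's code.
    """
    if depth == 0:
        return heuristic(b)  # leaf evaluation
    best = float("-inf")
    for a in actions(b):
        # Chance node: Monte Carlo average of sampled child values.
        total = 0.0
        for _ in range(width):
            r, b2 = simulate(b, a)
            total += r + search_value(b2, actions, simulate,
                                      heuristic, depth - 1, width)
        # Action node: maximize over actions.
        best = max(best, total / width)
    return best
```

The root action is then the argmax over `actions(b)` of the same quantity; note the cost grows as (|A|·width)^depth, which is why the horizon reduction of Section V matters.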
We now consider several ways to model this as a planning problem, in order to motivate our chosen formulation in Section III-E. All models will be defined with respect to a roadmap over the static map. This is generated as follows from a known static map: first, waypoints are sampled from the free space. This can be done using simple uniform tiling sampling, or more intelligently, e.g., based on the Voronoi graph of free space [3]. The main constraint is that from every point in free space, there is a path of length l < R leading to a waypoint, where R is a fixed radius threshold. Next, given the waypoints, standard path planning is run offline on pairs of nearby waypoints to generate edges.

If obstacles are not considered, this graph can be used for planning by adding the start and goal positions to the graph, then searching for a path between them. When the robot is at a waypoint, it does a quick local reachability check to the neighbors in the graph. This process acts as a

Fig. 2. Problem in which two paths are possible, but edge A was more recently observed blocked than edge B.

Fig. 3. Problem in which the top path is better because there is more scope to move around obstacles.

deterministic virtual sensor [10] that allows us to know the state of all edges incident to the node of the graph the robot is currently at.

A. Deterministic

A simple scheme for updating the roadmap is to maintain a list of blocked edges. An edge is added to this list whenever it is observed blocked and removed when it is observed free. An immediate problem is that the robot can eventually get into a situation where no path exists. To avoid getting stuck, whenever a path cannot be found, all edges that are not currently observed blocked are unblocked. If a path still cannot be found, a wait action is chosen.

This scheme does not take any account of the likelihood of an edge being blocked, or of the relative cost of alternative paths. It therefore makes various kinds of systematic errors.

Example 1: (Block probabilities) In Figure 2, there are two paths, both blocked, but edge A has been observed blocked much more recently than edge B. It therefore makes sense to take the top path.

Example 2: (Prediction) In Figure 3, no edges have been observed blocked. It nevertheless makes sense to take the top path, because a single blocked edge on that path can be circumvented, while a single blocked edge on the bottom path requires going back to the start.

Example 3: (Patience) In Figure 4, if an obstacle is observed on the edge between S and G, it might make sense (assuming a dynamic model of obstacles) to wait rather than taking the longer path, since the obstacle could disappear.

In each of these cases, the deterministic algorithm will choose the wrong action either always or often.

B. Deterministic with blocked-edge costs

Rather than viewing blocked edges as completely impassable, we could give them an extra cost proportional to their probability of being blocked. In other words, given an edge such that it was last observed T seconds ago, and it was


E. POMDP


Fig. 4. Problem in which shortest path is currently blocked, but it might make sense to wait and see if it disappears.


Fig. 5. Problem in which it may make sense to resense the obstacle before committing to the longer path.

blocked at that time, its cost is c(e) + e^(−aT)·B. It therefore deals correctly with Example 1. It still fails on Examples 2 and 3. While this algorithm takes more account of uncertainty, it still neglects the fact that actions can add information.

Example 4: (Value of information) In Figure 5, suppose the robot is at S, and the short path to G has been recently observed blocked. Given the length of the long path, it might still make sense to check again if the path has become free before deciding on the long path. It is not possible to achieve this in general by adjusting the a and B parameters, for that would prevent the robot from ever considering the long path.

Having time-varying costs can also lead to another problem.

Example 5: (Persistence) In Figure 5, suppose now that the robot is at R, proceeding down the long path, which was chosen because the short path to G from S had a high initial probability of being blocked. The probability of being blocked will decrease exponentially, though, so it is possible that the edge's blocked cost decreases enough that the robot will stop in the middle of the path and go back down the other way, which leads to behavior that is suboptimal (and looks strange).

C. Most likely state

A related method is to maintain a probability distribution over the true state of the world and plan assuming the most likely graph. As above, this method fails to take sensing actions in Example 4.

D. MDP

Rather than using costs, we can model the uncertainty using an MDP. A straightforward way to do this would be to have each edge's status be sampled afresh each time the robot is adjacent to it. The problem this runs into is that, since the edges have no hidden state, a behavior such as "wait for 10 seconds, then choose another path if this edge is still blocked" would never be followed in cases like Example 4: the robot would either leave immediately or wait forever.
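The time-decayed cost of Section III-B is simple enough to state directly; the values of a and B below are arbitrary placeholders for the tuning parameters in the text.

```python
import math

def blocked_edge_cost(base_cost, seconds_since_blocked, a=0.01, B=100.0):
    """Cost c(e) + e^(-aT) * B for an edge last observed blocked
    T seconds ago (Section III-B). a and B are placeholder tunings."""
    return base_cost + math.exp(-a * seconds_since_blocked) * B
```

The decay is what causes the persistence failure of Example 5: the penalty shrinks toward zero regardless of whether any new information has arrived.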

POMDPs model both the transition and sensing uncertainty in the domain. We use the following POMDP model:

• There is one state variable for the current position in the roadmap and, for each edge, a status, which can either be free or blocked. In the latter case it belongs to one of a predefined set of classes. In our examples, the obstacle classes are temporary, person, and static.
• The actions at a state are to take one of the outgoing edges from that node, or to wait for 3 seconds at the current position.¹
• The transition model is that a move succeeds iff the edge is not blocked (at the start of the move). If so, it has a duration depending on the edge length, and the robot ends up at the other node incident to the edge. Additionally, each edge status evolves according to an independent continuous-time Markov chain. The parameters are:
  – the block rate (expected time till an obstacle appears);
  – the prior distribution over obstacle classes;
  – for each obstacle class, the unblock rate, or expected time till it disappears.
• The observation model is that the current position is known with certainty and, for each adjacent edge, the robot observes whether it is free or blocked (but not which class the obstacle belongs to).
• The cost of an action is the time it takes.

Optimal solutions to (instances of) this POMDP avoid the problems listed above. For example, in Figure 5, the belief will include a distribution on whether the obstacle is static, temporary, or a person. If the prior probability of being temporary or a person is high enough, the optimal plan will be to move to the blocked edge and wait for some amount of time. If the path clears, take it. If not, then over time the probability of being a static obstacle will increase. This falls out of the Bayesian updating formula:

    P(E|h) ∝ P(E) P(h|E)

where h is the observation history and E is the event that the obstacle is static.
Even if the prior probability P(E) is low, the longer the robot waits without the obstacle moving, the higher the likelihood term P(h|E) becomes, until it eventually becomes optimal to take an alternate path.

IV. STATE ESTIMATION

A belief update given belief b, action a, and observation o consists of a projection through a followed by conditioning on o. In our case, the transition model is:

    P(s′|s, a) = I{s′_p = f_p(s, a)} · ∏_{(u,v)∈E} P(s′_uv | s_uv, t(s, a))

where:

• f_p(s, a) is the deterministic transition function of position, which results in moving to the other incident node of a if the move is legal, and staying at the current position for

¹ A 3 second interval was chosen as the minimum reasonable wait time in our environment; longer waits can be obtained by repeating the action.



Fig. 6. Original problem. Dashed edges have length 2 and have block probability above the threshold. All other edges have length 1, and have block probability below the threshold.

Fig. 7. Abstracted version of problem in Figure 6. In the initial belief state, dashed edges are possibly blocked, and all others are definitely free.

1 second otherwise. Wait actions also result in staying at the current position.
• t(s, a) is the duration of the action, which is just the edge length.
• P(s′_uv | s_uv, t) is given by the continuous-time Markov chain transition distribution.

We represent our belief states in factored form, as consisting of a known position b_p and a distribution b_uv over each edge's status. Given such a belief, since the transition model above factors over terms, each of which depends on one of the belief variables above, the projected distribution P(b, a) will also have this factored form. Similarly, the observation model is deterministic, and can be written:

    Z(o|s, a, s′) = ∏_{(u,v)∈E} I{o_uv = g(s′_uv, s′_p)}

locations where the robot has recently observed obstacles. Intuitively, (the nodes incident to) these edges are the important ones to reason about during planning. The remaining nodes (apart from the start and goal) are just intermediate locations we pass through; assuming the block rate is not high, the optimal conditional plan will look something like: 1) visit the left dashed edge; 2) if it is free, take it; 3) otherwise wait for some number of seconds; 4) if the edge is still blocked, move to the right dashed edge; and so on. Figure 7 shows an abstracted version of the problem based on this intuition. The edge lengths have been computed using Dijkstra's algorithm in the original graph, implicitly assuming no new obstacles will appear. Note that Figure 7 is still nontrivial to solve optimally because it is partially observable, and obstacles may still appear and disappear on the dashed edges. Nevertheless, it has a shorter effective horizon.

More precisely, define the abstract graph G_a given a problem and belief state. First, let B be the set of edges whose probability of being blocked exceeds some threshold (we use (1 + p_β)/2, where p_β is the stationary probability of an unobserved edge being blocked). Let the cut graph be the original graph with edges in B removed. Given current position v_s and goal v_g, the abstract graph has vertices Ṽ = V_B ∪ {v_s, v_g}, where V_B are the vertices incident to edges in B. For each edge in B, there is a corresponding edge in G_a with the same length. Also, for every pair of vertices u, v ∈ Ṽ, if the distance d_uv between them in the cut graph is finite, add an edge in G_a between u and v with length d_uv. We can then construct the reduced POMDP based on this abstract graph. The initial belief b_0 of the reduced problem is the same as b except that edges not in B are not present, and the newly added abstract edges are included and are considered free with probability 1. The unblocking rates are the same as in the original problem, but the block rate is 0.

The proposed solution algorithm is to plan in the reduced POMDP and follow the first concrete edge of the first action in that plan

where the function g returns free or blocked if s_p equals u or v, and unobserved otherwise. Since the position is known, the update can once again be done in factored form: for edges adjacent to the current position, if the edge is observed free, the conditional distribution is free with probability 1, and if it is observed blocked, we make the probability of it being free 0 and reweight the remaining statuses to sum to 1.

V. EFFICIENT PLANNING

The state space of the POMDP model is exponential in the number of edges, which is beyond the capabilities of current general-purpose offline planners. Online planners based on forward search do not directly depend on the state space size, but do have exponential dependence (for a fixed bound on error) on the search horizon, which can be large in our case, since plan sizes can be on the order of the graph diameter.

A. The Abstract Graph

We can take advantage of structure in the problem. Figure 6 shows an example instance. The dashed edges are two


(this is also computed as part of the Dijkstra search to find the edge costs). The assumption made in the reduced POMDP, that new obstacles do not appear on abstract edges, may not hold. Therefore, we replan after traversing each edge of the original graph, using a new abstract graph constructed based on the latest observations.

B. Analysis

We can analyze the quality of this approximation as a function of the domain parameters. Specifically, let p = (p_1, ..., p_K) be the stationary probability distribution for the edge obstacle Markov chains, where p_1 corresponds to no obstacle, and let e = (e_1, ..., e_K) be the expected unblocking time for each obstacle type (with e_1 = 0).

Theorem 1: Let s and g be nodes such that the optimal expected travel time between them is T. Suppose the robot follows an optimal policy with respect to the abstract graph. Then the expected time T′ to travel from s to g satisfies T′ ≤ (1 + pᵀe)T.

Intuitively, the approximation quality depends on how frequent obstacles, especially long-lived ones, are. The bound is loose because, in fact, unforeseen obstacles mainly matter in cases like Example 2, where there are bottlenecks and narrow hallways. In practice, the approximation error is unlikely to be substantial unless most of the domain consists of narrow paths.

C. Planning with the Abstract Graph

The reduced POMDP still has a large state space, but a shorter horizon, making it a good candidate for forward search. We use simple fixed-depth search with fixed-width sampling, as described in Section II-C. More intelligent strategies [9] would likely improve performance. Leaves of the search tree are evaluated using the following heuristic function. Given a belief state b, sample some number of graphs; in each graph, compute the minimum of the shortest path distance to the goal and the graph diameter (caching within and across leaf nodes is used to make this efficient), and average the results.
Intuitively, this corresponds to taking full account of partial observability until the search horizon, then making a full-observability approximation.

At state nodes of the lookahead tree, since the current position in the graph and neighboring edge statuses are known with certainty, the number of children is bounded by the outdegree of the roadmap. For action nodes, suppose we are at a node corresponding to doing action a after belief state b. A naive way to generate a child would be:

1) Sample a state s from the distribution b.
2) Sample a next state s′ and observation o from P(s′, o|s, a).
3) Set b′ ← F(b, a, o).
4) If there is already a child node corresponding to b′, increase its count; otherwise create a new node.

This procedure results in a large number of belief state updates in the inner loop. Instead, we use a trick that we have not previously seen in the literature, based on the fact that F is deterministic given b, a, o: label each outgoing edge with an observation, and only compute F when the observation has

not been seen before. This helps in our setting because the observation space is much smaller than the state space.

VI. EXPERIMENTS

We evaluated the various formulations and the ROS navigation stack on simulated graph instances as well as on a PR2 robot. The parameters for the models (e.g., block rates) were chosen based on the known domain model in the simulated case, and set by hand for the physical robot experiments.

A. Simulated experiments

We first compared several of the approaches described in Section III on a set of simulated instances. There are two metrics of interest: planning time and plan quality, which we model as total execution time.² In our evaluations, we bounded planning time and compared the total execution time across algorithms. The scale of our roadmaps is such that traversing each edge typically takes several seconds; we therefore placed a conservative limit of one second of planning time per action, and compared the following approaches:

• DA is the deterministic agent from Section III-A.
• BA1 and BA2 are the block-cost agents from Section III-B, using block costs of 10 and 1000.
• ALA1 through ALA3 are abstract POMDP agents from Section III-E, using search depths 1 to 3, and a sampling width of 100.

We used a set of instances with graph sizes ranging from 20 to 1000. The results are shown in Table II. The POMDP-based algorithms were the best by a significant margin on each domain. Interestingly, ALA2 occasionally did better than ALA3. This phenomenon is known in the search literature as a lookahead pathology [2]. A more intelligent search strategy than uniform-depth search should mitigate this problem [9].

B. Experiments on robotic platform

We also evaluated our algorithm on a Willow Garage PR2 robot. A roadmap was generated offline from a static map of our environment by starting with uniformly spaced nodes, which were then perturbed locally to move away from obstacles.
The environment is about 40 by 40 meters, and the roadmap had about 200 nodes. We implemented a bridge ROS node to connect our graph-based system to the continuous state of the robot, as shown in Figure 8. Pose messages from the AMCL localization system are mapped to the nearest node on the roadmap, while actions (edges of the roadmap) are mapped to goal commands for a trajectory planner based on the elastic band algorithm [19]. A local occupancy grid is maintained in the odometric frame, based on laser range readings; when reaching a new node, a shortest path computation is done in this grid (with obstacles inflated by the robot radius) to determine the neighboring edge statuses. An action is run either until the trajectory follower reports success or failure, or the current nearest node changes. The planner runs asynchronously, using increasing tree depths, while the 2 The framework can be straightforwardly extended to more complex cost functions including, e.g., proximity to obstacles.
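As a concrete illustration of the observation-keyed child generation described above, the following is a minimal Python sketch. The belief representation, the `step` transition sampler, and the `belief_update` filter F are illustrative stand-ins, not the system's actual implementation.

```python
import random

class ActionNode:
    """Child bookkeeping for one (belief, action) node of the lookahead tree."""
    def __init__(self, belief):
        self.belief = belief      # dict: state -> probability
        self.children = {}        # observation -> [child_belief, visit_count]

def sample_child(node, action, step, belief_update, rng=random):
    """One sampling pass: draw (s', o), but only run the expensive
    belief update F(b, a, o) when observation o is new for this node."""
    states = list(node.belief)
    s = rng.choices(states, weights=[node.belief[x] for x in states])[0]
    _s_next, obs = step(s, action, rng)   # sample from P(s', o | s, a)
    if obs in node.children:
        # F is deterministic given (b, a, o): reuse the cached child belief
        node.children[obs][1] += 1
    else:
        node.children[obs] = [belief_update(node.belief, action, obs), 1]
    return obs
```

Because each outgoing edge is keyed by its observation, the number of calls to `belief_update` is bounded by the number of distinct observations seen at the node, rather than by the number of samples drawn.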


Instance |      DA    |    ALA1    |   ALA2   |   ALA3    |    BA1    |    BA2
       1 |  161 ± 23  |  186 ± 36  |  132 ± 7 |  142 ± 12 |  272 ± 57 |  285 ± 94
       2 |  758 ± 207 |  225 ± 59  | 228 ± 65 |  149 ± 38 | 522 ± 207 | 380 ± 123
       3 | 1203 ± 64  |  936 ± 58  | 907 ± 54 | 1025 ± 48 | 1064 ± 73 | 1134 ± 63
       4 |  353 ± 14  |  361 ± 33  | 191 ± 22 |  178 ± 12 |  259 ± 33 |  286 ± 40
       5 |  897 ± 97  | 853 ± 122  | 639 ± 43 |  625 ± 37 | 914 ± 103 | 1321 ± 91
       6 | 1115 ± 80  | 1026 ± 56  | 915 ± 54 |  924 ± 51 | 1079 ± 88 | 1226 ± 76
       7 |  441 ± 39  |  493 ± 58  | 410 ± 44 |  378 ± 36 |  408 ± 37 |  461 ± 36
       8 |  732 ± 101 | 857 ± 122  | 684 ± 55 |  621 ± 51 |  817 ± 62 |  808 ± 53

TABLE II. Results of evaluating the different agents on a set of benchmark environments. Each cell reports the mean cost until reaching the goal ± sample standard deviation over 30 independent trials. Planning time was bounded at one second.

robot is navigating, assuming the action succeeds, i.e., that we end up at the node on the other side of the given edge. If this does happen, the next action can be selected immediately without pausing. If the action terminates at an unexpected node (usually because a person was in the robot's way), the robot pauses and the planner is rerun for one second. This happens only occasionally, as the elastic band planner has some degree of reactivity to unforeseen obstacles.

Fig. 8. Screenshot from visualization during execution on the PR2 (on the subset of the overall environment used in the experiments). The roadmap graph is overlaid on a static occupancy grid. The polygon shows the robot's current position, which is mapped to a particular node of the roadmap. An obstacle is visible in the hallway, resulting in the right outgoing edge from the robot's position being blocked; thus the only available action is to follow the left edge.

We compared the performance of our navigation system against the ROS navigation stack [11] by measuring the time taken to navigate to a sequence of waypoints. Five trials were run for each system. The trials were conducted during the day, when there is a reasonably high density of people. Our environment is fairly typical of academic or office buildings, with a mixture of open spaces, rooms, and hallways of varying width. The people in the building are familiar with navigating robots; they neither try to interact with the robots nor move out of their way. Table III shows navigation times. In the best case, the two systems performed similarly; this corresponds to situations where no unforeseen obstacles were encountered in hallways or doorways. However, the worst-case cost of the ROS navigation stack was much higher. This seems to be mainly because it does not model the probability of different observation types or take the length of the second-best path into account when deciding how long to wait given an unforeseen obstacle; it therefore often gives up prematurely in favour of a significantly longer path.

VII. RELATED WORK

Navigation using roadmaps has been well studied. Most existing work has assumed the graph is known; an exception is [13], in which a PRM planner is modified to accept or reject sampled points based on the probability of collision, and which then looks for an unconditional plan with a high chance of success. In contrast, since our approach has an observation model, it returns conditional plans that base future actions on the results of observations. Ong et al. [15] studied mixed observability Markov decision processes (MOMDPs), in which part of the state is perfectly observed, and showed how to speed up offline dynamic programming algorithms in this case. Our problem is a MOMDP, since the robot position in the roadmap is perfectly observed; unfortunately, the unobserved portion of the state space is still exponentially large. Closely related to this research is that of Kneebone and Dearden [8]. Like us, they consider navigation in a partially observed graph and represent the uncertainty using a POMDP. A difference is that in their framework, obstacles do not appear and disappear over time. This makes a qualitative difference in the kinds of policies that are found: there is no longer any utility in taking wait actions, nor are there useful inferences to be made about obstacle classes (effectively, all obstacles belong to a single class whose unblocking rate is 0). Also related is the literature on the Canadian Traveller's Problem [14]. Again, the obstacles there are either static or Markovian, so a bounded wait will never be optimal.


Trial | ROS Navigation Stack | ALA
    1 |                  436 | 460
    2 |                  844 | 605
    3 |                 1118 | 549
    4 |                  491 | 534
    5 |                  520 | 502

TABLE III. Execution time (seconds) for the PR2 using the ROS navigation stack and our system on a fixed set of waypoints in our environment.

The idea of using an abstract graph to solve the planning problem is reminiscent of approaches based on subgoals and macro-actions [4]. Subgoal choice tends to be a challenging question for these approaches [22]; we make use of the structure of the problem to choose subgoals as places where obstacles are likely to be. Theocharous and Kaelbling [23] also describe a macro-action-based POMDP planning algorithm for robot navigation, but they were concerned with uncertainty about the robot's position rather than about the obstacles. Another complementary line of research is on motion planning among movable obstacles [5]. Unlike our work, which considers an unknown and changing set of static obstacles, that work considers a known, fixed set of moving obstacles. Execution systems for robotics have been widely studied [16], but many implemented systems still use finite state machines or scripts in a general-purpose programming language. TREX [18] is a sophisticated executive based on temporal planning that was recently used in a navigation task involving failure recovery [12]. The executive in that case still required significant procedural knowledge to be prespecified in the form of temporal constraints, and planning was mainly used to ensure that the constraints were met.

VIII. CONCLUSION

We view the main contribution of this work as showing by example that it is possible to build robotic systems that reason about and react to execution failures using planning. Our formulation deals with uncertainty in a principled way, and optimal solutions to it exhibit various behaviors that are normally hardcoded into an executive, such as persistence and inference about obstacles. From a software engineering point of view, the resulting systems are less stateful and hence easier to understand and modify. We also described an efficient approximate solution algorithm that can feasibly be run online on a robot, and showed improved performance compared with traditional deterministic planners.
REFERENCES

[1] Oliver Brock and Oussama Khatib. High-speed navigation using the global dynamic window approach. In ICRA, 1999.
[2] Vadim Bulitko and Mitja Lustrek. Lookahead pathology in real-time path-finding. In AAAI, 2006.
[3] Howie Choset and Joel W. Burdick. Sensor-based exploration: The hierarchical generalized Voronoi graph. IJRR, 19(2), 2000.
[4] Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. JAIR, 13, 2000.
[5] David Hsu, Robert Kindel, Jean-Claude Latombe, and Stephen Rock. Randomized kinodynamic motion planning with moving obstacles. IJRR, 21(3), 2002.
[6] Leslie Kaelbling, Michael Littman, and Anthony Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 1998.
[7] Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In IJCAI, 1999.
[8] Michael Kneebone and Richard Dearden. Navigation planning in probabilistic roadmaps with uncertainty. In ICAPS, 2009.
[9] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In ECML, 2006.
[10] Steve LaValle. Filtering and planning in information spaces. IROS tutorial notes, 2009.
[11] Eitan Marder-Eppstein, Eric Berger, Tully Foote, Brian Gerkey, and Kurt Konolige. The office marathon: Robust navigation in an indoor office environment. In ICRA, 2010.
[12] Wim Meeussen, Melonee Wise, Stuart Glaser, Sachin Chitta, Conor McGann, Patrick Mihelich, Eitan Marder-Eppstein, Marius Muja, Victor Eruhimov, Tully Foote, John Hsu, Radu Rusu, Bhaskara Marthi, Gary Bradski, Kurt Konolige, Brian Gerkey, and Eric Berger. Autonomous door opening and plugging in with a personal robot. In ICRA, 2010.
[13] Patrycja Missiuro and Nicholas Roy. Adapting probabilistic roadmaps to handle uncertain maps. In ICRA, 2006.
[14] Evdokia Nikolova and David R. Karger. Route planning under uncertainty: The Canadian traveller problem. In AAAI, 2008.
[15] S. C. W. Ong, S. W. Png, D. Hsu, and W. S. Lee. POMDPs for robotic tasks with mixed observability. In Proc. Robotics: Science and Systems, 2009.
[16] Ola Pettersson. Execution monitoring in robotics: A survey. Robotics and Autonomous Systems, 53(2), 2005.
[17] M. L. Puterman. Markov Decision Processes. Wiley-Interscience, 2005.
[18] Frederic Py, Kanna Rajan, and Conor McGann. A systematic agent framework for situated autonomous systems. In AAMAS, 2010.
[19] Sean Quinlan and Oussama Khatib. Elastic bands: Connecting path planning and control. In ICRA, 1993.
[20] Stéphane Ross, Joelle Pineau, Sébastien Paquet, and Brahim Chaib-draa. Online planning algorithms for POMDPs. JAIR, 32, 2008.
[21] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson, 2010.
[22] Özgür Şimşek, Alicia P. Wolfe, and Andrew G. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. In ICML, 2005.
[23] Georgios Theocharous and Leslie Kaelbling. Approximate planning in POMDPs with macro-actions. In NIPS, 2003.
[24] Sebastian Thrun. Probabilistic robotics. Communications of the ACM, 45(3), 2002.


Visual Route Recognition with a Handful of Bits Michael J. Milford, Member, IEEE

Abstract—In this paper we use a sequence-based visual localization algorithm to reveal surprising answers to the question: how much visual information is actually needed to conduct effective navigation? The algorithm actively searches for the best local image matches within a sliding window of short route segments, or 'sub-routes', and matches sub-routes by searching for coherent sequences of local image matches. In contrast to many existing techniques, the technique requires no pre-training or camera parameter calibration. We compare the algorithm's performance to the state-of-the-art FAB-MAP 2.0 algorithm on a 70 km benchmark dataset. Performance matches or exceeds the state-of-the-art feature-based localization technique using images as small as 4 pixels, fields of view reduced by a factor of 250, and pixel bit depths reduced to 2 bits. We present further results demonstrating the system localizing in an office environment with near 100% precision using two 7-bit Lego light sensors, as well as using 16 and 32 pixel images from a motorbike race and a mountain rally car stage. By demonstrating how little image information is required to achieve localization along a route, we hope to stimulate future 'low fidelity' approaches to visual navigation that complement probabilistic feature-based techniques.

Index Terms—localization, navigation, featureless, visual route recognition, SeqSLAM

I. INTRODUCTION

Current state-of-the-art visual navigation systems are dominated by probabilistic feature-based techniques such as FAB-MAP [1], FrameSLAM [2], and MonoSLAM [3]. These techniques have displayed impressive performance in a range of experiments, the largest of which have occurred over distances of 1000 km [1]. These feature-based approaches have desirable properties such as easy integration with metric pose estimation and semantic mapping techniques, and the ability to localize off a single image. However, such approaches also have shortcomings. Many require training on a suitable dataset to develop a visual 'codebook' before they can be applied in an environment, and using an inappropriate codebook can result in poor system performance. Feature-based techniques also rely on being able to reliably detect features, a requirement that is difficult to meet under the changing environmental conditions caused by weather, day-night cycles and seasons. The quest to map increasingly impressive datasets has been accompanied by a trend towards increasingly sophisticated algorithms, burgeoning sensor megapixel counts and large camera fields of view. In this quest, one very important question has been largely neglected: what visual information is actually needed to conduct effective vision-based navigation?

M. J. Milford is with the School of Electrical Engineering and Computer Science at the Queensland University of Technology, Brisbane, Australia (e-mail: michael.milford AT qut.edu.au). This work was supported by an Australian Research Council Fellowship DE120100995 to MM.

In this paper we present evidence to suggest that, at least for localizing along a route, a simple sequence-based localization algorithm is able to match or surpass the performance of a state-of-the-art algorithm while using images with resolutions up to one millionth the size and less than one two-hundredth the field of view. Specifically, we make the following contributions:

• a route recognition algorithm incorporating whole-of-appearance image comparison with Dynamic Time Warping [4] sequence recognition, which requires no training and is not reliant on feature recognition,

• extensive experimental results showing the effect of image resolution, camera field of view, pixel bit depth and sequence length, with comparison to a state-of-the-art method on a publicly available, modern benchmark dataset, and

• further experimental results from rally car, motorbike and office datasets, including localization using two 7-bit Lego light intensity sensors.

II. BACKGROUND

The most relevant use of image sequences for localization is the work of [5], in which loop closure was performed by comparing sequences of images based on the similarity of 128-D vectors of SIFT descriptors. Due to its reliance on visual features, the method required the development of additional algorithms to address the visual ambiguity caused by repetitive foliage or architectural features. Image sequence information has also been used to geo-locate a person based on a sequence of photos they have taken, even when none of the individual images contains recognizable features [6]. In contrast, the technique presented here forgoes the use of features and uses a novel image difference normalization scheme to partially address visual ambiguity.

While there are a large number of vision-based mapping systems [1-3, 7, 8], few current implementations use low resolution images. Earlier research did use relatively low resolution visual images to perform navigation (for reasons of computation as much as anything else), including numerous early systems such as ALVINN [9] and insect-inspired algorithms [10]. More recently, low resolution approaches have been deployed on Sony AIBO robot dogs [11] and Pioneer robots [12], including the biologically inspired RatSLAM system [13]. In research fields outside of localization, such as face recognition [14] and object recognition, matching using low resolution images has been found to be highly effective [15]. In this work we use image snapshots with up to two orders of magnitude less information than in these previous studies.

The research presented here also builds on related work by the authors on localization using 'whole of image' appearance-based methods under extreme environmental change [16], in which it was shown that routes could be recognized over day-to-night, sunny-to-stormy and summer-to-winter transitions, albeit with image sizes of approximately 1000 pixels. We therefore do not specifically address extreme environmental change in this paper, but rather focus on pushing the boundaries even further on the minimum resolution, pixel bit depth, and field of view required to recognize a route under more modest change.

III. APPROACH

The algorithm consists of two primary modules, the image comparison algorithm and the sequence recognition algorithm.

A. Image Similarity

For panoramic images, mean absolute image differences D between the current image i and all stored images j are calculated using the mean absolute intensity differences, performed over a range of horizontal pixel offsets:

    D_j = \min_{\Delta x \in \sigma} g(\Delta x, i, j)    (1)

where \sigma is the offset range and g() is given by:

    g(\Delta x, i, j) = \frac{1}{s} \sum_{x} \sum_{y} \left| p^{i}_{x+\Delta x,\, y} - p^{j}_{x,\, y} \right|    (2)

where s is the area in pixels of the image and p^{i}_{x,y} is the intensity of pixel (x, y) in image i. Setting \sigma = [0, s_x/2] (an offset of half the panorama width, i.e., a 180° rotation) enables recognition when traversing a route in reverse. For perspective cameras, \sigma can be set to span a range of offset angles to provide some invariance (assuming mostly distal features) to camera yaw. However, for the perspective camera datasets in this paper only the no-offset case was used (\sigma = [0]).

B. Sequence Matching

Comparisons between the current image and all stored images yield a vector of image differences, as in [5]. The matrix M of image differences for the n most recent frames forms the space within which the search for matching sub-routes is performed. The key processing step is to normalize the image difference values within their (spatially) local image neighborhoods, similar to the creation of standard scores in statistics (Figure 1a). The updated image difference vectors (Figure 1b) are given by:

    \hat{D}_i = \frac{D_i - \bar{D}_l}{\max(\sigma_l, \sigma_{min})}    (3)

where \bar{D}_l is the local mean and \sigma_l the local standard deviation, computed over a distance of R_w images acquired before and after the current image i, and \sigma_{min} is a minimum standard deviation constant used to avoid undefined output, set to 1/256 of the intensity range in this paper. R_w was set to 10 frames for all experiments in this paper.

Figure 1. The image difference matrix M (a) before and (b) after normalization, with small circles showing the elements where each search originates. Only the bottom section of the difference matrix is shown.

Normalizing the difference values in local image neighborhoods is a process that would be counterproductive when localizing off single frames. However, in the context of recognizing sequences of images, this process ensures there are clear locally best matching images in every sub-route along the entire stored set of routes, to some extent negating the effect of global biases such as lighting changes and image commonalities.

To find likely route matches, we perform a continuous version of the Dynamic Time Warping (DTW) method proposed by Sakoe and Chiba [4]. We impose continuity and slope constraint conditions to constrain the search space. The boundary condition and monotonically increasing constraints are not applicable, due to uncertainty in velocity and the need to match both forward and reverse traverses of a route. The search is continuous in that searches are started at every element in the left column of the image difference matrix (shown by the small solid circles in Figure 1).

The output of the DTW search is a vector of sub-route matching scores for each search origin and each slope condition. The best matching sub-route is determined as:

    i_{min} = \arg\min_{1 \le i \le m} s(i)    (4)

where m is the number of stored images and s(i) is the normalized sub-route difference score for sub-route i over all slope constraints:

    s(i) = \min(\mathbf{d}_i)    (5)

The vector \mathbf{d}_i contains the difference scores for sub-route i over all slope possibilities k:

    d_i^k = \frac{1}{n} \sum_{j=1}^{n} \hat{D}_{u(i,j,k),\, j}    (6)

where n is the sub-route length and u(i, j, k) provides the element row index in the image difference matrix:

    u(i, j, k) = i + j \tan(v_k)    (7)

where v_k is a specific slope constraint. The slope constraint is set to span a range of values that encompass possible frame rate variations. For constant frame rate scenarios, such as the


Eynsham dataset or datasets with odometry, it is possible to use a small range or even a single value of v_k. By considering the sum of sub-route difference scores s(i) as a sum of normally distributed random variables, each with the same mean and variance, the sum of normalized differences over a sub-route of length n frames has mean zero and variance n, assuming that frames are captured far enough apart to be considered independent. Dividing by the number of frames produces a normalized route difference score with mean zero and variance 1/n. Percentile rank scores can then be used to determine an appropriate sub-route matching threshold. For example, for the primary sub-route length n = 50 used in this paper, a sub-route threshold of -1 yields a 7.7×10⁻¹³ probability of the match occurring by chance.
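The sequence-matching machinery just described can be condensed into a short Python sketch: local standard-score normalization of a difference vector, a slope-constrained line search over the difference matrix, and the Gaussian false-match probability behind the threshold choice. Function names and the brute-force search loop are illustrative, not the author's implementation.

```python
import math

def normalize_local(diffs, Rw=10, sigma_min=1.0 / 256):
    """Standard-score each difference within a window of Rw neighbours on
    either side, flooring the std at sigma_min to avoid undefined output."""
    out = []
    for i, d in enumerate(diffs):
        window = diffs[max(0, i - Rw): i + Rw + 1]
        mean = sum(window) / len(window)
        var = sum((w - mean) ** 2 for w in window) / len(window)
        out.append((d - mean) / max(math.sqrt(var), sigma_min))
    return out

def best_subroute(M, slopes, n):
    """Search the normalized difference matrix M (rows: stored images,
    cols: the n most recent frames) for the straight line with one of the
    allowed slopes that has the lowest mean difference; one search is
    started per stored-image row, as in the continuous DTW variant."""
    m = len(M)
    best = (None, float("inf"))               # (start row, score)
    for i in range(m):
        for v in slopes:                      # slope constraint v_k
            rows = [int(round(i + j * math.tan(v))) for j in range(n)]
            if all(0 <= r < m for r in rows):
                score = sum(M[r][j] for j, r in enumerate(rows)) / n
                if score < best[1]:
                    best = (i, score)
    return best

def false_match_prob(n, threshold=-1.0):
    """P(score < threshold) when per-frame normalized scores are i.i.d.
    standard normal, so the mean of n of them is N(0, 1/n)."""
    return 0.5 * math.erfc(-threshold * math.sqrt(n / 2.0))

# e.g. false_match_prob(50) is about 7.7e-13, matching the text.
```

A forward match appears as a line of slope near +1 through the matrix and a reverse match as a line of negative slope, so supplying both positive and negative v_k values covers both traversal directions.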

To determine whether the current sub-route matches any stored sub-routes, the minimum matching score is compared to a matching threshold s_m. If the minimum score is below the threshold, the sub-route is deemed a match; otherwise it is assigned as a new sub-route. An example of the minimum matching scores over every frame of a dataset (the Eynsham dataset described in this paper) is shown in Figure 2. In the second half of the dataset the route is repeated, leading to lower minimum matching scores.

Figure 2. Normalized sub-route difference scores for the Eynsham dataset, with the matching threshold s_m that yields 100% precision performance.

IV. EXPERIMENTAL SETUP

In this section we describe the four datasets used in this work and the image pre-processing for each study.

A. Datasets

A total of four datasets were processed, each of which consisted of two traverses of the same route. The datasets were: a 70 km road journey in Eynsham, the United Kingdom; 2 km of motorbike circuit racing in Rowrah, the United Kingdom; 40 km of off-road racing up Pikes Peak in the Rocky Mountains, the United States; and 100 meters in an Office building (italics indicate dataset names). The Eynsham route was the primary dataset, on which extensive quantitative analysis was performed. The other datasets were added to provide additional evidence for the general applicability of the algorithm. Key dataset parameters are provided in Table I, including the storage space required to represent the entire dataset using low resolution images. Figure 3 shows aerial maps and imagery of the Eynsham, Rowrah and Pikes Peak datasets, with lines showing the route that was traversed twice.

The Eynsham dataset consisted of high resolution image captures from a Ladybug2 camera (circular array of five cameras) at 9575 locations spaced along the route. The Rowrah dataset was obtained from an onboard camera mounted on a racing bike. The Pikes Peak dataset was obtained from cameras mounted on two different racing cars racing up the mountain, with the car dashboard and structure cropped from the images. This cropping process could most likely be automated by applying some form of image matching to small training samples from each camera type. The route consisted of heavily forested terrain and switchbacks up the side of a mountain, ending in rocky open terrain partially covered in snow.

TABLE I. DATASETS

Dataset    | Distance | Number of frames | Distance between frames | Image storage
Eynsham    | 70 km    | 9575             | 6.7 m (median)          | 306 kB
Rowrah     | 2 km     | 440              | 4.5 m (mean)            | 7 kB
Pikes Peak | 40 km    | 4971             | 8 m (mean)              | 159 kB
Office     | 53 m     | 832              | 0.13 m (mean)           | 1.6 kB

Dataset videos and data: Rowrah: http://www.youtube.com/watch?v=_UfLrcVvJ5o; Pikes Peak: http://www.youtube.com/watch?v=4UIOq8vaSCc and http://www.youtube.com/watch?v=7VAJaZAV-gQ; Office: http://df.arcs.org.au/quickshare/790eb180b9e87d53/data3.mat

Figure 3. The (a) 35 km Eynsham, (b) 1 km Rowrah and (c) 20 km Pikes Peak routes, each of which were repeated twice. Copyright DigitalGlobe, GeoEye, Getmapping plc, The GeoInformation Group, USDA Farm Service Agency, Infoterra Ltd & Bluesky, Map data ©2012 Google.

Figure 4. (a) The Lego Mindstorms dataset acquisition rig with 2 sideways facing light sensors and a GoPro camera for evaluation of matched routes. (b) The 53 meter long route which was repeated twice to create the dataset.

B. Image Pre-Processing

1) Eynsham Resolution Reduced Panoramic Images: For the Eynsham dataset, image processing consisted of image concatenation and resolution reduction (Figure 5). The raw camera images were crudely cropped to remove overlap between images. No additional processing such as camera undistortion, blending or illumination adjustment was performed. The resulting panorama was then resolution reduced (re-sampling using pixel area relation in OpenCV 2.1.0) to the resolutions shown in Table II.

Figure 5. Image pre-processing for the full panoramic images consisted of a crude image stitching stage followed by a reduction in image resolution. The current image was compared at 0° and 180° offsets to all stored images on a pixel by pixel basis to form the image difference matrix described in Section III.B.

2) Reduced Field of View: For the reduced field of view experiments, a small area representing 0.4% of the total panoramic image was extracted from the centre of the forward facing image (Figure 6). The resultant 80×60 pixel image was then resolution reduced to the sizes shown in Table II.

Figure 6. To evaluate the effect of drastically reducing the field of view, an area representing 0.4% of the original panoramic image was extracted and resolution reduced.

3) Reduced Pixel Depth: Reduced pixel depths were obtained by reducing the bit depth of each pixel in the resolution reduced images (Figure 7). Grayscale image intensities were evenly distributed over the 256 values possible in an 8 bit intensity range, such that a 1 bit image had intensity values 85 or 171, a 2 bit image had intensity values 51, 102, 154 or 205, and so on.

Figure 7. To evaluate the effect of reduced pixel bit depth, the resolution reduced panoramic images were resampled at 1 bit, 2 bit, 4 bit and 8 bit grayscale pixel depths.

TABLE II. IMAGE SIZES

Dataset & Image Type                                         | Reduced Resolution | Width×Height
Eynsham panoramic images (original image 829440 pixels,      | 4 pixels           | 2×2
1620×512)                                                    | 8 pixels           | 4×2
                                                             | 32 pixels          | 8×4
                                                             | 128 pixels         | 16×8
                                                             | 512 pixels         | 32×16
Eynsham cropped images (original crop 4800 pixels, 80×60)    | 2 pixels           | 2×1
                                                             | 4 pixels           | 2×2
                                                             | 16 pixels          | 4×4
                                                             | 64 pixels          | 8×8
                                                             | 256 pixels         | 16×16
Rowrah                                                       | 16 pixels          | 4×4
Pikes Peak                                                   | 32 pixels          | 8×4
Office NXT                                                   | 2×7 bit pixels     | 2×1

C. Precision-Recall Calculation

To generate precision-recall curves, we used the manually corrected GPS data provided by the authors of the original study [1]. Detected route segment matches were classified as correct if the spatial distance separating the central frames of each route was less than 40 meters, as in the original study. Matches outside this distance were classified as false positives, with missed matches classified as false negatives. Matches were assessed for both traverses of the route, rather than just the second traverse. To generate each precision-recall curve, we conducted a sweep over a range of matching threshold s_m values. The range of values was chosen such that for most experiments a complete range of recall rates from 0% to 100% was obtained.

V. RESULTS

We present a range of results evaluating the performance of the system with varying image resolution, sequence length, pixel bit depth, and field of view. This extensive testing is performed on the 70 km Eynsham dataset, for which both ground truth and a state-of-the-art comparison are available. We also present additional results demonstrating qualitatively the applicability of the technique to three other varied datasets.

Figure 8. Precision recall curves for a range of reduced resolution panoramic images, with performance compared to four FAB-MAP implementations. Note the axis ranges.
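The threshold sweep used to generate the precision-recall curves can be sketched as follows. The data structures (`matches`, `ground_truth_dist`) are hypothetical stand-ins; only the 40 m correctness tolerance and the sweep over s_m come from the text.

```python
def precision_recall_sweep(matches, ground_truth_dist, thresholds, tol=40.0):
    """Sweep the matching threshold s_m and compute one (precision, recall)
    point per threshold. `matches` maps each query sub-route to its
    (best score, matched location); `ground_truth_dist(query, loc)` gives
    the metric distance between the matched location and the true one.
    A detection is correct if that distance is under `tol` meters; queries
    whose score exceeds the threshold count as missed (false negatives)."""
    curve = []
    for sm in thresholds:
        tp = fp = fn = 0
        for query, (score, loc) in matches.items():
            if score < sm:                     # reported as a match
                if ground_truth_dist(query, loc) < tol:
                    tp += 1
                else:
                    fp += 1
            else:                              # no match reported
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        curve.append((sm, precision, recall))
    return curve
```

Lowering s_m trades recall for precision, which is exactly the sweep that produces the curves in the figures that follow.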

A. Precision-Recall The precision-recall performance using panoramic images from the Eynsham dataset is shown in Figure 8. At high precision levels, 4 pixel images produce superior performance to the baseline FAB-MAP performance. Increasing the resolution to 8 pixels enables the system to overtake the FABMAP with motion model results, while with 32 pixels

300

performance is superior (50% recall at 100% precision) to FAB-MAP with motion model and epipolar geometry, except between 86% and 89% recall rates. The sequence-based technique is also able to attain 100% recall, at 24%, 39%, and 46% precision levels for 4, 8 and 32 pixel images respectively. Figure 9 shows a zoomed in comparison of the techniques. Performance gains are minimal above an image size of 32 pixels.

performance superior at high precision levels to both the baseline and motion model FAB-MAP 2.0 performance. 50 frame sequences provided the best performance at high precision levels, while 100 frame sequences provided the highest recall at lower precision levels. The 50 frame sub-route length (335 meters) is consistent with the warm start localization times of commercial GPS navigation systems of 28.5 seconds [17] (380 meters at 30 miles per hour).

Figure 10. Loop closure map for the Eynsham dataset using 32 pixel images at 99% precision and 82% recall. Loop closures are shown by the large circles, with false positive matches shown by small crosses.

Figure 9. Enlarged plot of the high precision and recall performance curves. Note the rapidly reducing performance gains for image sizes above 32 pixels.

B. Loop Closure Locations Although the precision performance using 32 pixel images is higher for a given recall rate compared to FAB-MAP, this is counteracted partly by inferior spatial loop closure coverage. Consequently, the algorithm requires a higher recall rate to achieve the equivalent loop closure coverage to FAB-MAP. Figure 10 shows the loop closures achieved at a 99% precision level, showing comparable loop closure coverage to the FABMAP algorithm. The sections of the route where the algorithm failed to match a route were mostly due to the linear search constraints not finding sequence matches when frame capture spatial frequencies varied significantly. The reverse route is also only thoroughly recognized at higher recall rates (and precision levels below 100%). Further discussion of this issue is provided at the end of the paper.

Figure 11. (a) Image difference matrix for a matched sub-route, with white circles showing the corresponding matching frame pairs. The matching gradient was approximately 1, indicating this route segment was traversed at the same speed both times. (b) Frames from the second traverse and (c) matching frames from the first traverse of the route. The full frames are shown for visibility, although the actual processed images were low resolution versions of the top 75% of the frame.

C. Sample Route Matches and Speed Ratios Figures 11-12 show matched sub-routes for both a forward (11) and reverse (12) sub-route match. The section of the image difference matrix in which the sub-route match was found is shown in panel (a), with white circles indicating the matching locations of five representative images from the matched subroute. Panels (b-c) show the corresponding images. Figure 13 shows a histogram of the relative frame sampling speed calculated for all matched sub-routes at the maximum recall, 100% precision point. The peak around 1 shows that frames were spatially sampled at similar rates during the second traverse of the route, while the small peak at -0.8 indicates the sub-routes matched in reverse. D. Sequence Length Increasing sequence length had a positive effect on performance up to a point (Figure 14). Matching 10 frame sequences was clearly inadequate, but 20 frames provided

Figure 12. (a) Image difference matrix for the reverse traversal along a section of route. (b) Frames from the second traverse and (c) matching frames from the first traverse of the route. The matching route segment had a gradient of approximately -0.8, indicating that the route was traversed approximately 20% more slowly in reverse.

E. Pixel Bit Depth The pixel bit depth had little effect on system performance beyond 2 bits (4 possible intensities). At a pixel depth of 2 bits,


performance was superior to the best FAB-MAP 2.0 performance at high precision levels, but had lower recall rates at lower precision levels. There was negligible difference in performance between 4 and 8 bit depths. A sequence length of 50 frames was used for all the pixel bit depth results.
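The bit-depth reduction used in these experiments can be sketched as a simple quantization of 8-bit grayscale values to 2**bits intensity levels. This is an assumed mapping for illustration; the exact quantization the authors used may differ, and `quantize` is a hypothetical name:

```python
import numpy as np

def quantize(img, bits):
    """Reduce 8-bit grayscale values to the given pixel bit depth
    (2**bits intensity levels). Sketch of the bit-depth experiments;
    the authors' exact mapping may differ."""
    levels = 2 ** bits
    return (img.astype(np.uint16) * levels // 256).astype(np.uint8)
```

At 2 bits, each pixel keeps only four intensity levels, which the results above show is already sufficient for sequence matching.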

Figure 16. Precision-recall curves with a limited field of view equivalent to 0.4% of the original panoramic image (see Figure 6).

Figure 13. Relative speed ratios calculated for matching Eynsham sub-routes at the 100% precision, maximum recall point.

G. Rowrah Track Bike Dataset
For the Rowrah, Pikes Peak and Office NXT datasets we did not have metric ground truth. Instead, we present the sub-route recognition graphs at a performance level that was qualitatively assessed to be near 100% precision, by comparing the image sequences associated with the matched sub-routes. For the Rowrah dataset, all sub-routes were matched within a few frames of the correct location (each frame separated by an average of 8 meters), as shown in Figure 17a. The vertical axis shows the central frame number of the matching sub-route from the first traverse of the route. Figure 17b shows five frames obtained by evenly sampling a sub-route from the second traverse of the circuit, with Figure 17c showing the corresponding frames from the matching sub-route during the first circuit traverse.

Figure 14. Precision-recall performance using 32 pixel images to match subroutes of 10, 20, 50 and 100 frames in length.

Figure 17. (a) Sub-route localization for the Rowrah dataset. (b) Frames from second route traverse and (c) matching frames from the first traverse.

Figure 15. Precision-recall performance using 32 pixel images with 1, 2, 4 and 8 bit grayscale pixel depths. There was no significant improvement in performance above 4 bit pixel depths.

F. Limited Field of View
When limiting the field of view to 0.4% of the original panoramic image, the recall rate at 100% precision improved to 56% for 16 pixel images, 67% for 64 pixel images, and 69% for 256 pixel images. The effect of increasing resolution was broadly the same as for the full field of view images, with gains rapidly diminishing after reaching about 32 pixels in resolution.

H. Pikes Peak Dataset
The recall level at near 100% precision was around 50% for the Pikes Peak dataset (Figure 18). The lower recall rate was most likely due to significant changes in the racing-line taken by the car, larger variations in vehicle speed and the relatively bland nature of the environment, especially in the later mountainous stages.

I. Office NXT Dataset
The route matching performance for the NXT dataset is shown in Figure 19a. Figures 19d and 19e show the NXT light


sensor readings, with illustrative camera frames shown in Figures 19b-c. The first column (d) shows the pairs of light sensor readings obtained at that location during the second traverse, and the second column (e) shows the matched light sensor readings from the first route traverse. Manual inspection of the camera frames verified that every matched route segment was accurate to within approximately a meter.

Figure 18. (a) Sub-route recognition for the Pikes Peak dataset. Recall was not achieved in sections of the 20 km route, most likely due to large variations in racing-line. (b) Frames from second traverse and (c) matching frames from the first traverse of the route.

Figure 19. (a) Sub-route recognition for the Office NXT dataset. (b) Frames from second traverse and (c) matching frames from the first traverse of the route. (d) Light sensor readings from second traverse and (e) matching light sensor readings from the first traverse of the route.

J. Computation and Storage
In this section we describe the storage and computational requirements of the system and present a range of computation scenarios for the proposed visual GPS application.

1) Storage: At a pixel depth of 4 bits and an image size of 32 pixels, each stored image occupies 16 bytes. It is an informative exercise to calculate the storage required for imagery covering the entire global road network. According to the CIA World Factbook [18], the 191 countries surveyed have a total of approximately 70 million kilometers of paved and unpaved roads. Storing an image every 5 meters of road would produce 224 gigabytes of data, easily held on a hard drive dating from 2006 or on current solid state media. It would therefore be possible to store a global database of images locally on a device, or to download imagery for local areas on demand: a 10 megabyte download would provide 3000 km of road data, enough for a regional area.

2) Computation: All experiments in this paper were performed at real-time or better speed using unoptimized Matlab and C code. Computation is dominated by the search for matching sub-routes. The primary factor affecting computation is the allowance for varying velocities when conducting the matching route segment search. Here we discuss the localization-only scenario where a library of images already exists, starting with the situation where self-motion information (such as car odometry) allows the search space to be constrained to a simple linear search. For a sub-route length of n frames, the dominant calculation is the nm frame comparisons that must be performed, where m is the number of visual templates stored in the template library. Each frame comparison constitutes s byte-wise comparisons, or 2s comparisons to enable forward and reverse route matching; for 32 pixel panoramic images, this is 64 byte-wise comparisons. Table III presents a number of computation scenarios, assuming 5 meter frame spacing and a camera speed of 15 meters per second (54 km/h). With single instruction, multiple data (SIMD) instructions, the large city scenario is achievable on a current desktop machine. To achieve real-time performance during initial localization within an entire country or the world, significant optimizations would need to be implemented. One straightforward method is to cache frame-by-frame comparisons, comparing new images in the current sub-route as they are seen, leading to an n-times speed-up at the cost of needing more fast memory. However, the fast memory requirement at the large city size is well within the capabilities of all current devices, including mobile phones and most other portable devices. Modern graphics card architectures and growing CPU core counts, even on mobile devices, offer the potential for further significant speed-ups through parallel processing. In addition, optimized data structures could remove the barrier to achieving country-wide localization. Once localized, the search space can also be massively constrained, as is done in current GPS systems. If a spatially regular frame rate cannot be guaranteed, the search space must expand to incorporate multiple possible velocities, which increases the computation by a factor dependent on the range of possible velocities. For the Eynsham dataset, allowing for a frame rate variation of 19% increased computation time by a factor of 10.

TABLE III. COMPUTATION SCENARIOS

Route Length | Qualitative Description | Template Storage | Byte-wise calculations per second | Calculations with caching | Cache fast memory requirement
100 km | Local area | 320 kB | 192x10^6 | 3.84x10^6 | 1 MB
10,000 km | Large city | 32 MB | 19.2x10^9 | 384x10^6 | 100 MB
1x10^6 km | Medium country | 3.2 GB | 1.92x10^12 | 38.4x10^9 | 10 GB
7x10^7 km | World road network | 224 GB | 134x10^12 | 2.69x10^12 | 700 GB
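The arithmetic behind the computation scenarios can be reproduced directly from the stated assumptions (5 m frame spacing, 15 m/s camera speed, 50 frame sequences, 2x32 byte-wise comparisons per frame pair, 16 bytes of storage per image). The helper below is illustrative, not from the paper:

```python
def scenario(route_km, seq_len=50, spacing_m=5, speed_mps=15, img_bytes=32):
    """Reproduce the Table III arithmetic under the paper's stated
    assumptions. The helper itself is an illustrative sketch."""
    m = route_km * 1000 // spacing_m        # templates in the library
    fps = speed_mps / spacing_m             # new frames per second
    per_pair = 2 * img_bytes                # forward + reverse comparison
    calcs = fps * seq_len * m * per_pair    # byte-wise comparisons per second
    cached = calcs / seq_len                # n-times saving from caching
    cache_bytes = m * seq_len               # one cached byte per frame pair
    storage = m * 16                        # 4-bit, 32 pixel images
    return storage, calcs, cached, cache_bytes
```

For the 100 km "local area" row this yields 320 kB of templates, 192x10^6 byte-wise comparisons per second, 3.84x10^6 with caching, and 1 MB of cache memory, matching the table.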

VI. DISCUSSION

The presented approach sacrifices single frame matching capability to achieve robust localization along a route, without the need for prior training or feature detection. While the method is not suitable for single snapshot mobile phone localization, a large range of potential applications involve navigation along routes, such as street navigation and indoor mobile robots. The experimental results show that in the context of navigation along routes, vision-based localization can be achieved with remarkably few pixels, tiny fields of view and reduced pixel bit depths. Localization was achievable through two key, interdependent measures. Performing localization using a sequence of images rather than a single image removes the requirement that the image matching scheme be able to reliably calculate a single global image match. Instead, the image matching front end need only report matches that are, on average, better than chance. How much better depends on the length of the matching sequence, with longer sequences reducing the performance requirements for the image matcher but increasing the computational cost of the sequence matching algorithm. This trade-off is avoided in a subset of real world navigation applications such as domestic car travel, where translational speed information is available from On Board Diagnostic (OBD) systems. The use of sequences rather than individual images also introduces two types of lag: a delay in initial localization upon startup, and a delay when the route taken consists of several fragmented previously traversed sequences. Variable sequence length matching could partially address the initial localization lag, by localizing more rapidly when the environment is easily recognizable. To adapt the system to deal with fragmented sequences, we are pursuing three approaches. The first is to use traditional probabilistic filters, which also potentially removes the need for a sequence length parameter.
The second is to expand the local best matching and search from one-dimensional routes to two-dimensional areas. The third is to maintain localization using odometry in situations where a previously localized system briefly loses localization while traversing several fragmented sequences (such as passing through a complex intersection).
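One measure used in this work, normalizing image difference scores within local sub-routes so that every stretch of stored route proposes a competitive best-match candidate, can be sketched as a windowed normalization of the difference matrix. The window size is an assumed parameter and the code is not the authors' implementation:

```python
import numpy as np

def local_normalize(D, w=10):
    """Normalize each row of the difference matrix D within a window of
    +/- w neighbouring templates, removing systematic biases so every
    section of stored route offers a competitive best match. Sketch of
    local score normalization; window size is an assumption."""
    Dn = np.empty_like(D, dtype=float)
    for i in range(D.shape[0]):
        lo, hi = max(0, i - w), min(D.shape[0], i + w + 1)
        patch = D[lo:hi]
        Dn[i] = (D[i] - patch.mean(axis=0)) / (patch.std(axis=0) + 1e-9)
    return Dn
```

A constant bias shared by a block of templates (for example, a uniformly dark stretch of route) is removed, analogous to patch normalization of an image under illumination change.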

Matching using sequences rather than individual frames allows the image matching algorithm to be modified in ways that would render it useless as a global image matcher. We exploit this ability by normalizing the image difference scores within local sub-routes, forcing the algorithm to calculate a best image match candidate within every section of route stored in the image database. While this measure would produce large numbers of false positive loop closures were single image localization used, by matching over a sequence of images it enhances the ability of the algorithm to localize by removing the effect of systematic biases, in much the same way that applying patch normalization to an image removes some of the effect of illumination change [19]. This combination of sequence matching and local image comparison provides a basis for future development of sequence-based, featureless localization techniques with capabilities complementary to single frame feature-based techniques.

ACKNOWLEDGMENT
Author MM thanks Peter Corke, Gordon Wyeth and Liz Murphy for their helpful comments on the draft manuscript.

REFERENCES
[1] M. Cummins and P. Newman, "Highly scalable appearance-only SLAM - FAB-MAP 2.0," in Robotics: Science and Systems, Seattle, United States, 2009.
[2] K. Konolige and M. Agrawal, "FrameSLAM: From Bundle Adjustment to Real-Time Visual Mapping," IEEE Transactions on Robotics, vol. 24, pp. 1066-1077, 2008.
[3] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-Time Single Camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 1052-1067, 2007.
[4] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 43-49, 1978.
[5] P. Newman, D. Cole, and K. Ho, "Outdoor SLAM using Visual Appearance and Laser Ranging," in International Conference on Robotics and Automation, Florida, United States, 2006.
[6] E. Kalogerakis, O. Vesselova, J. Hays, A. A. Efros, and A. Hertzmann, "Image sequence geolocation with human travel priors," in International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 253-260.
[7] P. Newman, G. Sibley, M. Smith, M. Cummins, A. Harrison, C. Mei, I. Posner, R. Shade, D. Schroeter, L. Murphy, W. Churchill, D. Cole, and I. Reid, "Navigating, Recognizing and Describing Urban Spaces With Vision and Lasers," The International Journal of Robotics Research, 2009.
[8] R. Sim, P. Elinas, M. Griffin, and J. J. Little, "Vision-based SLAM using the Rao-Blackwellised Particle Filter," in International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, 2005.
[9] D. A. Pomerleau, "Neural network perception for mobile robot guidance," DTIC Document, 1992.
[10] M. O. Franz, P. G. Scholkopf, H. A. Mallot, and H. H. Bulthoff, "Learning View Graphs for Robot Navigation," Autonomous Robots, vol. 5, pp. 111-125, 1998.
[11] D. Q. Huynh, A. Saini, and W. Liu, "Evaluation of three local descriptors on low resolution images for robot navigation," in Image and Vision Computing New Zealand, Wellington, New Zealand, 2009, pp. 113-118.
[12] V. N. Murali and S. T. Birchfield, "Autonomous navigation and mapping using monocular low-resolution grayscale vision," in Conference on Computer Vision and Pattern Recognition, Alaska, United States, 2008, pp. 1-8.
[13] M. Milford and G. Wyeth, "Persistent Navigation and Mapping using a Biologically Inspired SLAM System," International Journal of Robotics Research, vol. 29, pp. 1131-1153, 2010.
[14] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, "Overview of the face recognition grand challenge," in International Conference on Control, Automation, Robotics and Vision, Singapore, 2005, pp. 947-954, vol. 1.
[15] A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large image databases for recognition," in Computer Vision and Pattern Recognition, Anchorage, United States, 2008, pp. 1-8.
[16] M. Milford and G. Wyeth, "SeqSLAM: Visual Route-Based Navigation for Sunny Summer Days and Stormy Winter Nights," in IEEE International Conference on Robotics and Automation, St Paul, United States, 2012.
[17] J. Mahaffey, "TTFF Comparisons," 2003.
[18] CIA, The World Factbook, 2012.
[19] A. M. Zhang and L. Kleeman, "Robust Appearance Based Visual Route Following for Navigation in Large-scale Outdoor Environments," The International Journal of Robotics Research, vol. 28, pp. 331-356, 2009.

Exploiting Passive Dynamics with Variable Stiffness Actuation in Robot Brachiation Jun Nakanishi and Sethu Vijayakumar School of Informatics, University of Edinburgh, United Kingdom Email: [email protected], [email protected]

Abstract—This paper explores a passive control strategy with variable stiffness actuation for swing movements. We consider brachiation as an example of a highly dynamic task which requires efficient exploitation of gravity for successful task execution. First, we present our passive control strategy for a pendulum with variable stiffness actuation. Then, we formulate the problem in an optimal control framework with temporal optimization in order to simultaneously find an appropriate stiffness profile and movement duration such that the resultant movement exploits the passive dynamics of the robot. Finally, numerical evaluations on a two-link brachiating robot with a variable stiffness actuator (VSA) model are provided to demonstrate the effectiveness of our approach under different task requirements, modelling errors and switching in the robot dynamics. In addition, we discuss the issue of task description in terms of the choice of cost function for successful task execution in optimal control.

I. INTRODUCTION
In recent years, there has been growing effort in the development of variable stiffness actuators. Various actuator designs with mechanically adjustable stiffness/compliance, composed of passive elastic elements such as springs, have been proposed [4, 6, 11, 12, 14, 15]. In contrast to conventional stiff actuators, variable stiffness actuators are expected to offer desirable properties such as compliant actuation and energy storage capability, with potential applications in human-robot interaction and improved task performance in dynamic tasks. This paper explores a control strategy for exploiting passive dynamics in tasks involving swing movements with variable stiffness actuation, based on optimal control. Despite the potential benefits of variable stiffness joints, finding an appropriate control strategy to fully exploit the capabilities of variable stiffness actuators (VSAs) is challenging due to the increased complexity of the mechanical properties and the number of control variables. Taking an optimal control approach, recent studies [3, 8, 10] have investigated the benefits of variable stiffness actuation, such as energy storage in explosive movements, from a viewpoint of performance improvement. Braun et al. [3] demonstrated such benefits of VSAs by simultaneously optimizing time-varying torque and stiffness profiles of the actuator in a ball throwing task. In [8, 10], the optimal control problem of maximizing link velocity with variable stiffness actuator models was investigated; it was shown that a much larger link velocity can be achieved than that of the motor in the VSA with the help of appropriate stiffness adjustment

during a hitting movement. In a similar problem, Hondo and Mizuuchi [13] have discussed the issue of determining the inertia parameter and spring constant in the design of series elastic actuators to increase the peak velocity. In robot running, Karssen and Wisse [16] have presented numerical studies demonstrating that an optimized nonlinear leg stiffness profile can improve robustness against disturbances.

In this paper, we focus on a passive control strategy with variable stiffness actuation for swing movements in a brachiation task. Indeed, the importance of exploiting intrinsic passive dynamics for efficient actuation and control has been discussed in the study of passive dynamic walking, where biped robots with no or minimal actuation can exhibit human-like natural walking behavior [5]. In this study, we consider brachiation as an example of a dynamic task involving swing movements. Brachiation is an interesting form of locomotion in which an ape swings from handhold to handhold like a pendulum [7, 29]; it requires explicit exploitation of the passive dynamics, with the help of gravity, to achieve the task. From a control point of view, designing a brachiating controller is a challenging problem since the system is underactuated, i.e., there is no actuation at the gripper. Efforts have been made to develop control laws for classes of underactuated systems from a control theoretic view1, e.g., [18, 19, 22, 32]. In our previous study [23], we proposed a method of describing the task using a dynamical system based on a nonlinear control approach, and derived a nonlinear control law for a joint torque controlled two-link brachiating robot. The control strategy in [23] uses active cancellation of the plant dynamics via input-output linearization to force the robot to mimic a specified pendulum-like motion described in terms of target dynamics.
In contrast, Gomes and Ruina [9] studied brachiation with zero-energy-cost motions using only the passive dynamics of the body. They sought numerical solutions for the initial conditions which lead to periodically continuous locomotion without any joint torques. By extending the (unstable) fixed point solutions in unactuated horizontal brachiation found in [9], Rosa et al. [28] numerically studied open-loop stable (unactuated downhill and powered uphill) brachiation of a two-link model from a viewpoint of hybrid systems control, including switching and discontinuous transitions.

1 Much of the related work has focused on the motion planning of underactuated manipulators in a horizontal plane (not necessarily under the influence of gravity). In such a case, the dynamic coupling of link inertia is exploited rather than the passive dynamics due to gravity.


Motivated by the work in [9], our goal in this study is to demonstrate that highly dynamic tasks such as brachiation can be achieved by fully exploiting passive dynamics with simultaneous stiffness and temporal optimization. In our recent work [24], the effectiveness of temporal optimization and stiffness optimization in periodic movements was discussed. However, temporal optimization and stiffness optimization were treated separately, and rather simplified, ideal actuator models were used in the evaluation. In this study, numerical evaluations of our approach on a two-link brachiating robot with a realistic MACCEPA (Mechanically Adjustable Compliance and Controllable Equilibrium Position Actuator) VSA model [11] (see the motivation for this particular VSA in Section III-B) are provided to show the effectiveness of our approach under different task requirements, modelling errors and variations of the robot dynamics. Furthermore, we also discuss the issue and effect of task encoding via an appropriate choice of the cost function for successful task execution.

II. PASSIVE CONTROL STRATEGY IN SWING MOVEMENT WITH VARIABLE STIFFNESS ACTUATION
Our goal in this paper is to devise a control strategy to achieve the desired swing maneuver in brachiation by exploiting the natural dynamics of the system. To begin with, we discuss approaches to implementing a passive control strategy, considering a pendulum with variable stiffness actuation. A natural and desirable strategy would be to make good use of gravity by making the joints passive and compliant. For example, in walking, unlike many high gain position controlled biped robots with stiff joints, humans seem to let the lower leg swing freely by relaxing the muscles controlling the knee joint during the swing phase, increasing stiffness only when necessary. In fact, stiffness modulation is observed during the walking cycle in a cat locomotion study2 [1].
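The frequency-shaping role of joint stiffness can be made concrete with a single-link pendulum: linearized about the hanging posture, a parallel joint stiffness k shifts the natural frequency to sqrt((mgl + k)/(ml^2)). A small sketch of this relationship (the parameter values are illustrative assumptions, not the robot's):

```python
import math

def natural_frequency(k, m=1.0, l=0.5, g=9.81):
    """Natural frequency (rad/s) of a single-link pendulum with an added
    joint stiffness k, linearized about the hanging posture:
    m*l^2*q'' + (m*g*l + k)*q = 0.  Parameter values are illustrative."""
    return math.sqrt((m * g * l + k) / (m * l ** 2))
```

Relaxing the spring (k approaching 0) recovers the passive pendulum frequency; increasing k raises the swing frequency, which is exactly the kind of modulation a VSA can provide.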
Consider the dynamics of a simplified single-link pendulum under the influence of gravity. If we consider an idealized VSA model of the form τ = −k(q − qm), where q is the joint angle, τ is the joint torque, k is the stiffness and qm is the equilibrium position of the elastic actuator, then the dynamics can be written as:

m l² q̈ + m g l sin q = τ = −k(q − qm)   (1)

where m is the mass, l is the length and g is the gravitational constant. In this idealized VSA model, we assume that k and qm are the control variables. From a viewpoint of position control, one way of looking at this system is as a manipulator with a flexible (elastic) joint, for which we solve a tracking control problem [31]. Recently, Palli et al. [25] proposed a tracking control algorithm for such a flexible joint manipulator with variable stiffness actuation to achieve asymptotic tracking of desired joint and stiffness trajectories based on input-output linearization, effectively an active cancellation of the intrinsic robot dynamics. Note that the main focus of [25] is the tracking control of given joint and stiffness trajectories; the problem of generating such desired trajectories for a given task is not addressed. On the other hand, if we rearrange the linearized dynamics of (1) (sin q ≈ q) as

m l² q̈ + (m g l + k) q = v   (2)

2 To our knowledge, there are a large number of studies of stretch reflex modulation in human walking; however, work that specifically addresses stiffness modulation is very limited. In human arm cyclic movements, Bennett et al. [2] reported time-varying stiffness modulation in the elbow joint.

where v = kqm, another view of the control problem is that varying the actuator stiffness k in the second term on the left-hand side effectively changes the dynamic properties, e.g., the natural frequency, of the pendulum. From this perspective, the control problem can be framed as finding an appropriate (preferably small) stiffness profile k to modulate the system dynamics (only when necessary) and computing the virtual equilibrium trajectory qm [30] to fulfill the specified task requirement while maximally exploiting the natural dynamics. In a realistic situation, it is not straightforward to compute a control command for the actuator to realize such an idea, due to the complexity of the system dynamics, the actuator mechanisms, the requirement of coordinating multiple degrees of freedom, and redundancy in actuation. Next, we exploit the framework of optimal control and spatiotemporal optimization of variable stiffness actuation to find appropriate control commands to implement the brachiation task.

III. PROBLEM FORMULATION

A. Robot Dynamics
The equation of motion of the two-link brachiating robot shown in Fig. 1 takes the standard form of rigid body dynamics, where only the second joint is actuated:

M(q)q̈ + C(q, q̇)q̇ + g(q) + Dq̇ = [0, τ]ᵀ   (3)

where q = [q1, q2]ᵀ is the joint angle vector, M is the inertia matrix, C is the Coriolis term, g is the gravity vector, D is the viscous damping matrix, and τ is the joint torque acting on the second joint.

B. Variable Stiffness Actuation
We consider a MACCEPA model [11] as our VSA implementation of choice. MACCEPA is one of the designs of mechanically adjustable compliant actuators with a passive elastic element (see Fig. 1). This actuator design has the desirable characteristic that the joint can be very compliant and mechanically passive/back-drivable: this allows free swinging with a large range of movement by relaxing the spring, which is highly suitable for the brachiation task we consider.
MACCEPA is equipped with two position controlled servo motors, qm1 and qm2, each of which controls the equilibrium position and the spring pre-tension, respectively. The parameters of the robot we use in this study (Fig. 1(c)) are based on a 2-link MACCEPA joint (Fig. 1(b)) constructed in our lab [3]. The joint torque for this actuator model is given by

τ = κ BC sin(qm1 − q) [1 + (rd qm2 − (C − B)) / √(B² + C² − 2BC cos(qm1 − q))]   (4)


and the joint stiffness can be computed as

k = −∂τ/∂q = κ cos(α) BC (1 + β/γ) − κ sin²(α) B²C² β / γ³   (5)

where κ is the spring constant, rd is the drum radius, α = qm1 − q, β = rd qm2 − (C − B) and γ = √(B² + C² − 2BC cos(qm1 − q)) (see Fig. 1(a) and (c) for the definitions of the model parameters and variables). The spring tension is given by

F = κ(l − l0)   (6)

where l = √(B² + C² − 2BC cos(qm1 − q)) + rd qm2 is the current spring length and l0 = C − B is the spring length at rest. The joint torque equation (4) can also be rearranged in terms of the moment arm and the spring tension as

τ = (BC sin(qm1 − q) / γ) F.   (7)

Fig. 1. Model of a two-link brachiating robot with a MACCEPA variable stiffness actuator. (a) Brachiating robot model with VSA. (b) 2-link MACCEPA joint. (c) Model parameters.

Robot parameters | i=1 | i=2
Mass mi (kg) | 0.42 | 0.23
Moment of inertia Ii (kgm²) | 0.0022 | 0.0017
Link length li (m) | 0.25 | 0.25
COM location lci (m) | 0.135 | 0.0983
Viscous friction di (Nm/s) | 0.01 | 0.01

MACCEPA parameters | value
Spring constant κ (N/m) | 771
Lever length B (m) | 0.03
Pin displacement C (m) | 0.0125
Drum radius rd (m) | 0.01
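Equations (4)-(7) can be checked numerically with the parameter values from Fig. 1(c). The sketch below implements the torque, stiffness and tension expressions; it is an illustrative check, not the authors' simulation code:

```python
import math

# MACCEPA parameters from Fig. 1(c)
KAPPA, B, C, RD = 771.0, 0.03, 0.0125, 0.01

def gamma(q, qm1):
    """gamma = sqrt(B^2 + C^2 - 2BC cos(qm1 - q)), as defined after (5)."""
    return math.sqrt(B**2 + C**2 - 2*B*C*math.cos(qm1 - q))

def torque(q, qm1, qm2):
    """Joint torque, Eq. (4)."""
    beta = RD*qm2 - (C - B)
    return KAPPA*B*C*math.sin(qm1 - q)*(1.0 + beta/gamma(q, qm1))

def stiffness(q, qm1, qm2):
    """Joint stiffness k = -dtau/dq, Eq. (5)."""
    a = qm1 - q
    beta = RD*qm2 - (C - B)
    g = gamma(q, qm1)
    return (KAPPA*math.cos(a)*B*C*(1.0 + beta/g)
            - KAPPA*math.sin(a)**2 * B**2 * C**2 * beta / g**3)

def tension(q, qm1, qm2):
    """Spring tension F = kappa*(l - l0), Eq. (6)."""
    l = gamma(q, qm1) + RD*qm2
    return KAPPA*(l - (C - B))
```

A finite-difference check of −∂τ/∂q against (5), and of the moment-arm form (7) against (4), verifies that the three expressions are internally consistent.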

Note that MACCEPA has a relatively simple configuration compared to other VSA designs; however, the torque and stiffness relationships in (4) and (5) depend on the current joint angle and the two servo motor angles in a complicated manner, and its control is not straightforward. In addition, we include realistic position controlled servo motor dynamics, approximated by a second-order system with PD feedback control:

q̈mi + 2ai q̇mi + ai²(qmi − ui) = 0,   (i = 1, 2)   (8)

where ui is the motor position command, ai determines the bandwidth of the actuator, and the ranges of the servo motors are limited as qmi,min ≤ qmi ≤ qmi,max and ui,min ≤ ui ≤ ui,max [3]. In this study, we use ai = 50.

C. Augmented Plant Dynamics
The plant dynamics, composed of the robot dynamics (3) and the servo motor dynamics (8), can now be formulated as

ẋ = f(x, u)   (9)

where

f = [ x2,  M(x1)⁻¹(−C(x1, x2)x2 − g(x1) − Dx2 + [0, τ(x1, x3)]ᵀ),  x4,  −a²x3 − 2a x4 + a²u ]ᵀ   (10)

Here x = [x1, x2, x3, x4]ᵀ = [q, q̇, qm, q̇m]ᵀ ∈ R⁸, q = [q1, q2]ᵀ, qm = [qm1, qm2]ᵀ and u = [u1, u2]ᵀ. Note that a in (10) denotes a = diag{ai} and a² is defined as a² = diag{ai²} for notational convenience.

D. Optimal Feedback Control with Temporal Optimization
For the plant dynamics

ẋ = f(x, u),   (11)

the objective of optimal control [33] is to find a control law

u* = u(x, t)   (12)

which minimizes the cost function

J = Φ(x(T)) + ∫₀ᵀ h(x(t), u(t)) dt   (13)

for a given movement duration T, where Φ(x(T)) is the terminal cost and h(x(t), u(t)) is the running cost. We employ the iterative linear quadratic Gaussian (ILQG) algorithm [17] to obtain a locally optimal feedback control law

u(x, t) = uopt(t) + L(t)(x(t) − xopt(t)).   (14)

In addition to obtaining an optimal control law, we simultaneously optimize the movement duration T using the temporal optimization algorithm proposed in [27]. In [27], a mapping β(t) from the real time t to a canonical time t',

t' = ∫₀ᵗ (1/β(s)) ds,   (15)

is introduced, and β(t) is optimized to yield the optimal movement duration. In this study, we simplify the temporal


optimization algorithm by discretizing (15), with the assumption that β(t) is constant during the movement:

Δt' = (1/β) Δt.   (16)

By updating β using gradient descent,

βnew = β − η ∇βJ   (17)

where η > 0 is a learning rate, we obtain the movement duration T' = (1/β)T, where T = NΔt (N is the number of discrete time steps). In the complete optimization procedure, ILQG and the update of β in (17) are iterated in an EM (Expectation-Maximization)-like manner until convergence, yielding the final optimal feedback control law (14) and the associated movement duration T*. Depending on the task objective, it is further possible to augment the cost by including the time explicitly as

J' = J + wT T   (18)

where J is the cost (13) and wT is the weight on the time cost, which determines the trade-off between the original cost J and the movement duration T.

IV. EVALUATIONS3

3 A video clip summarizing the results is available at http://goo.gl/iYrFr

A. Optimization Results in Brachiation Task
In this paper, we consider the tasks of swing locomotion from handhold to handhold on a ladder, and of swinging up from the suspended posture to catch the target bar. Motivated by the discussion of our passive control strategy in Section II, we consider the following cost function to encode the task (the specific reason will be explained below):

J = (y(T) − y*)ᵀ QT (y(T) − y*) + ∫₀ᵀ (uᵀ R1 u + R2 F²) dt   (19)

where y = [r, ṙ]ᵀ ∈ R⁴ contains the position and velocity of the gripper in Cartesian coordinates, y* = [r*, 0]ᵀ is the target value when reaching the target, and F is the spring tension in the VSA given in (6). This objective function is designed to reach the target located at r* at the specified time T while minimizing the spring tension F in the VSA. Note that the main component of the running cost is the minimization of the spring tension F by the second term, while the first term uᵀR1u is added for regularization with a small choice of the weights in R1. In practice, this is necessary since F is a function of the state, and ILQG requires a control cost in its formulation to compute the optimal control law. Notice that the actuator torque (7) can be expressed in the form

τ = −F sin(q − qm1) / γ'   (20)

where γ' = √(B² + C² − 2BC cos(qm1 − q)) / BC. In this equation (20), it can be seen that F plays a role similar to the stiffness parameter k in the simplified actuator model

τ = −k(q − qm).   (21)

Fig. 2. Optimization of the locomotion task using the cost (19). (a) Movement of the robot with optimized duration T = 0.606 (sec). (b) Joint trajectories and servo motor positions. (c) Joint torque, spring tension and joint stiffness. In (b) and (c), gray thin lines show the plots for non-optimized T in the range T = [0.5, ..., 0.7] (sec) and thick lines show the plots for the optimized Topt = 0.606. Note that especially at the beginning and the end of the movement, joint torque, spring tension and joint stiffness are kept small, allowing the joint to swing passively.

Another interpretation can be considered in such a way that if we linearize (4) around the equilibrium position assuming that α = qm1 − q  1, the relationship between the joint stiffness k in (5) and the spring tension F in (6) can be approximated as 1 F. (22) k≈√ 2 B + C 2 − 2BC Thus, effectively, minimizing the spring tension F corresponds

Fig. 3. Servo motor commands u_i (dotted line) and actual angles q_mi (solid line) for the result with the optimal movement duration T = 0.606 (cf. Fig. 2 (b) bottom). A delay in the servo motor response, characterized by the servo motor dynamics (8), can be observed. The proposed optimal control framework finds appropriate control commands taking this effect into account.

to minimizing the stiffness k in an approximated way. Note that it is possible to use k directly in the cost function. However, in practice, the first and second derivatives of k are needed to implement the ILQG algorithm, and these become significantly more complex than those of F, since the joint stiffness k is already the first derivative of τ as described in (5). Thus, it is preferable to use the spring tension F. This close relationship between F and k in the general nonlinear case can be observed in the plots, for example, in Fig. 2 (middle and bottom of (c)). In fact, the appropriate choice of the cost function is critical for successful task execution. We discuss the issue of task encoding via cost selection in more detail in Section IV-D.

1) Swing Locomotion: Consider the case where the target bar is located at d = 0.3 (m). We optimize both the control command u and the movement duration T. We use Q_T = diag{10000, 10000, 10, 10}, R₁ = diag{0.0001, 0.0001} and R₂ = 0.1 for the cost function in (19). As mentioned above, R₁ is chosen to be a small value for the regularization needed in the ILQG implementation. The optimized movement duration was T = 0.606 (sec). Fig. 2 shows (a) the optimized robot movement, (b) joint trajectories and servo motor positions, and (c) joint torque, spring tension and joint stiffness. In the plots, trajectories for fixed time horizons in the range T = 0.5 to 0.7 (sec) are also overlaid for comparison, in addition to the case of the optimal movement duration T = 0.606 (sec). In the optimized movement, the spring tension and the joint stiffness are kept small at the beginning and end of the movement, resulting in nearly zero joint torque, which allows the joint to swing passively. Joint torque is exerted only during the middle of the swing by increasing the spring tension as necessary.
This result suggests that the natural plant dynamics are fully exploited for the desired task execution, based on the control strategy discussed in Section II, with simultaneous stiffness and temporal optimization. To illustrate the effect of the servo motor dynamics characterized by (8), Fig. 3 shows the servo motor position commands and actual motor angles for the optimal movement duration (cf. Fig. 2 (b) bottom). Delays in the servo motor response can be observed in this plot, and the proposed optimal control framework finds appropriate control commands taking this effect into account.
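The simultaneous optimization of commands and movement duration described above can be sketched, in heavily simplified form, on a toy system. The following is an illustrative stand-in, not the authors' ILQG implementation: a one-dimensional reaching plant with a time-augmented cost J′ = J + w_T·T as in (18), where the duration is chosen by scanning candidate values (the paper instead updates a time-scaling factor β within an EM-like loop). All names and numeric values here are assumptions for illustration.

```python
import numpy as np

def rollout_cost(T, n=50, w_T=5.0):
    """Time-augmented cost J' = J + w_T*T for a toy 1-D reaching plant.

    A hypothetical stand-in for the brachiation dynamics: a unit point
    mass driven to r* = 1 by a fixed rest-to-rest open-loop command
    (not an ILQG solution).
    """
    dt = T / n
    x = np.zeros(2)                      # [position, velocity]
    J = 0.0
    for k in range(n):
        t = (k + 0.5) * dt               # midpoint sampling of the command
        u = 6.0 / T**2 * (1.0 - 2.0 * t / T)   # rest-to-rest profile
        J += 0.01 * u * u * dt           # running control cost (cf. u^T R1 u)
        x = x + dt * np.array([x[1], u])       # Euler integration step
    J += 1e4 * ((x[0] - 1.0) ** 2 + x[1] ** 2) # terminal cost (cf. Q_T)
    return J + w_T * T                   # time-augmented cost, eq. (18)

# scan candidate durations instead of the EM-like beta update of the paper
durations = np.linspace(0.3, 2.0, 60)
T_opt = durations[np.argmin([rollout_cost(T) for T in durations])]
```

The trade-off mirrors the paper's: a short T inflates the control effort, a long T is penalized by w_T·T, so an intermediate duration minimizes the augmented cost.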

Fig. 4. Effect of optimal feedback in the presence of parameter mismatch between the nominal model used to compute the optimal control and the actual robot. The movement under the optimal feedback control law is compared with the movement under the feedforward command only. The result demonstrates the effectiveness of optimal feedback control under model uncertainty.
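The feedforward-versus-feedback comparison illustrated in Fig. 4 can be reproduced in spirit on a toy plant. This is a hedged sketch, not the paper's locally optimal ILQG feedback law: a point mass with a deliberate mass error between the nominal model (used for planning) and the actual plant, executed open loop and then with a simple trajectory-tracking correction added to the feedforward command. The gains kp and kd and the 30% mass error are illustrative assumptions.

```python
import numpy as np

T, n = 1.0, 2000
dt = T / n
m_nom, m_act = 1.0, 1.3              # 30% modelling error in the mass

def reference(t):
    # rest-to-rest reference trajectory and feedforward for the nominal model
    s = t / T
    x_d = 3 * s**2 - 2 * s**3                    # position, 0 -> 1
    v_d = (6 * s - 6 * s**2) / T                 # velocity
    u_ff = m_nom * 6.0 / T**2 * (1.0 - 2.0 * s)  # nominal feedforward force
    return x_d, v_d, u_ff

def rollout(use_feedback, kp=60.0, kd=15.0):
    x, v = 0.0, 0.0
    for k in range(n):
        x_d, v_d, u = reference(k * dt)
        if use_feedback:                 # locally corrective feedback term
            u += kp * (x_d - x) + kd * (v_d - v)
        a = u / m_act                    # actual (mismatched) plant
        x, v = x + dt * v, v + dt * a
    return x                             # final position; the target is 1

x_ol = rollout(False)   # feedforward only: misses the target
x_fb = rollout(True)    # with feedback: lands much closer
```

As in Fig. 4, the open-loop rollout falls well short of the target under the model mismatch, while the feedback-corrected rollout ends near it.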

B. Effect of Optimal Feedback under Modelling Error

One of the benefits of the optimal feedback control framework is that, in addition to computing the optimal feedforward control command, it provides a locally optimal feedback controller in the neighborhood of the optimal trajectory, which allows the controller to make corrections when there are small deviations from the nominal optimal trajectory. In this section, we present numerical studies of the effect of optimal feedback control (14) under model mismatch between the nominal and actual robot parameters. We introduce a modelling error as m_i,nominal = 1.05 m_i (link mass) and l_c,i,nominal = 1.1 l_c,i (location of the center of mass on the link) for i = 1, 2. Fig. 4 compares the movement using the optimal feedback control law (14) obtained in the simulation of Section IV-A1 above with the movement using only the feedforward (open-loop) optimal control command u = u_opt(t), in the presence of this modelling error. Using only feedforward control, the robot deviates from the target bar due to the model mismatch. With the optimal feedback control law, however, the robot is able to get closer to the target with the help of the feedback term. These results demonstrate the effectiveness of optimal feedback control. In future work, we are interested in on-line learning of the plant dynamics to address the issue of model uncertainty [20, 21].

C. Switching Dynamics and Task Parameters

In this section, we explore different task requirements with switching dynamics. In the following simulation, we use the robot model with link lengths l₁ = 0.2 (m) and l₂ = 0.35 (m), introducing an asymmetric configuration in the robot structure. We consider the task of first swinging up from the suspended posture to the target at d = 0.45 (m), then subsequently locomoting twice to the target bars at d = 0.4 (m) and d = 0.42 (m) (irregular intervals).
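The maneuver schedule just described can be sketched as a loop that re-plans each swing with the link roles swapped after every grasp. This is a structural sketch only: `optimize_swing` is a hypothetical placeholder for the full ILQG-plus-duration optimization, here returning just a crude pendulum-period-based duration guess.

```python
import math

def optimize_swing(l1, l2, d):
    # hypothetical stand-in for optimizing one maneuver; returns only a
    # crude duration guess from the natural period of the distal link
    # (the real optimizer would also return the command sequence)
    return 2.0 * math.pi * math.sqrt(l2 / 9.81)

config = (0.2, 0.35)            # asymmetric link lengths l1, l2 (m)
targets = [0.45, 0.40, 0.42]    # handhold spacings (m), irregular intervals
plan = []
for d in targets:
    l1, l2 = config
    plan.append((config, d, optimize_swing(l1, l2, d)))
    config = (l2, l1)           # after each grasp the configuration flips
```

The alternating configuration is why each maneuver must be optimized with its own stiffness profile and duration, as the section goes on to discuss.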
Note that every time the robot grasps the target and starts swinging toward the next target, the robot configuration is interchanged, which significantly changes the dynamic properties of each swing movement due to the asymmetric structure of the robot. Thus, the stiffness and movement duration need to be adjusted

appropriately to fulfill the desired task requirement. The cost function (19) with the same parameters is used as in the previous simulations. For the swing-up task, we add the time cost w_T T with w_T = 5 (see (18)); i.e., the task requirement in swing-up is to swing up quickly while minimizing the control cost. Fig. 5 shows (a) the sequence of the optimized robot movement, (b) joint trajectories and servo motor positions, and (c) joint torque, spring tension and joint stiffness. The obtained optimal movement durations were 1) T = 2.071 (sec) for the swing-up, 2) T = 0.778 (sec) for the locomotion with the interchanged (upside-down) robot model, and 3) T = 0.611 (sec) for the last swing movement, respectively. Notice the significant difference in the optimized movement time between swing locomotions 2) and 3) (more than 25%). In 2) the lower link is heavier, while in 3) the top link is heavier due to the mass of the VSA. Thus, the effective natural frequency of the pendulum movement is different, which results in different movement durations. These results highlight that our approach can find appropriate movement durations and command sequences to achieve the task under different requirements and conditions (locomotion, swing-up, robot dynamics change and target distance change). In this example, each maneuver is optimized separately. Optimization over multiple swing movements, including transitions, will be of future interest.

Fig. 5. Simulation results for the sequence of movements: (a) sequence of the movement of the robot, consisting of 1) swing-up (T = 0 to 2.071 sec) to the target at d = 0.45 (m), 2) first locomotion with interchanged configuration (T = 2.071 to 2.849 sec, i.e., 0.778 sec) to d = 0.4 (m), and 3) second locomotion (T = 2.849 to 3.460 sec, i.e., 0.611 sec) to d = 0.42 (m); (b) joint trajectories and servo motor positions; (c) joint torque, spring tension and joint stiffness. Note that the robot configuration is asymmetric, with link lengths l₁ = 0.2 (m) and l₂ = 0.35 (m). When the robot swings after grasping the bar, the robot configuration is interchanged, which significantly changes the dynamic characteristics.

D. Design and Selection of a Cost Function

In optimal control, a task is generally encoded in terms of a cost function, which can be viewed as an abstract representation of the task. In our view and experience, the design and selection of the cost function is one of the most important and difficult parts of a successful application of such an optimal control framework. For a simple task and plant dynamics, an intuitive choice (typically a quadratic cost in the state and control, as in an LQR setting) suffices, though it is still necessary to adjust the weights. However, for a highly dynamic task with complex plant dynamics, this becomes increasingly difficult, and the appropriate choice of the cost function which best encodes the task remains an open issue. In this section, we explore a few more candidate cost functions. In addition to the cost function (19), consider the following running costs h = h(x, u) in (13):

• quadratic cost in the control command (servo motor position command):

h = u^T R u    (23)

• quadratic cost in the joint torque, where the main term is the cost associated with the joint torque τ and u^T R₁ u is added for regularization (small R₁):

h = u^T R₁ u + R₂ τ²    (24)

Figure 6 shows the results using the running cost h = u^T R u in (23) with R = diag{1, 1}. The obtained optimal movement duration is T = 0.604 (sec). Figure 7 shows the results using the running cost h = u^T R₁ u + R₂ τ² in (24) with R₁ = diag{0.0001, 0.0001} and R₂ = 100. The obtained optimal movement duration is T = 0.620 (sec). In both cases, the same terminal cost parameters are used as in the case of (19). As demonstrated in Figs. 6 and 7, the robot also successfully reaches the target bar by minimizing each specific cost, as in the case of the cost (19) presented in Section IV-A above. However, with the running cost (23), a significant difference in the resultant robot movement and much higher spring tension and joint stiffness can be observed in Fig. 6. As can be seen in Fig. 7, with the torque-based cost (24), the resultant movement looks almost identical to the one obtained with the cost (19) (see Fig. 2) and the joint torque profile is comparable. However, we can

Fig. 6. Optimization of the locomotion task using the running cost h = u^T R u in (23) with R = diag{1, 1}. Left: movement of the robot with optimized duration T = 0.604 (sec). Center: joint angles and servo motor angles. Right: joint torque, spring tension and joint stiffness. Note that while the task itself is achieved, the movement looks very different from the one in Fig. 2, and much higher spring tension and joint stiffness can be observed during the swing movement.

Fig. 7. Optimization of the locomotion task using the running cost h = u^T R₁ u + R₂ τ² in (24) with R₁ = diag{0.0001, 0.0001} and R₂ = 100. Left: movement of the robot with optimized duration T = 0.620 (sec). Center: joint angles and servo motor angles. Right: joint torque, spring tension and joint stiffness. The resultant movement looks almost identical to the one in Fig. 2 and the joint torque profile is comparable. However, the spring tension and joint stiffness are larger than those of Fig. 2.

observe that the spring tension and joint stiffness are larger than those obtained with the cost (19). This is due to the redundancy in the variable stiffness actuation, and the results depend on how we resolve it through an appropriate choice of the cost function. These results suggest that the choice of the cost function is crucial; however, its selection is still non-intuitive. Note that other cost functions could be considered, e.g., energy consumption. In the brachiation task, without friction, the mechanical energy of the rigid body dynamics, E = ∫_0^T τ q̇₂ dt, is conserved for swing locomotion between handholds at the same interval and height, starting and ending at zero velocity (if no potential energy is stored in the spring of the VSA at the end of the swing). Thus, if we wish to consider true energy consumption, it would be necessary to evaluate the electrical energy consumed at the motor level. However, this is not straightforward, since it requires a precise model of the mechanical and electrical motor dynamics including factors such as motor efficiency and transmission losses, which can be complex to model in practice, and the resulting control strategy would depend largely on the properties of the actual motors used.

E. Remarks on Other Joint Actuation and Controller Design Approaches

In this paper, we explore variable stiffness actuation to exploit passive dynamics in swing movements. One of the desirable properties of the variable stiffness actuation we consider is that the joint can be made fully mechanically passive by appropriately adjusting the spring tension in the actuator

mechanisms. This is in contrast to joint actuation with geared electric motors in many existing robotic systems. Typically, joints with highly geared motors designed for position control are not fully back-drivable, i.e., the joints cannot be made passive to exploit the natural dynamics of the link. For example, the brachiating robot in [23] uses a DC motor with a harmonic drive gear and exhibited complex and relatively high friction. In that design it is therefore not possible to exploit the passive dynamics of the second link, since the motor is not back-drivable by gravity, and the joint must be driven actively to achieve the swing movement. To make a joint fully back-drivable without passive components, high-performance direct-drive motors would be needed, which typically require precise torque control mechanisms.

From the viewpoint of a different controller design approach, the target dynamics method [23] uses input-output linearization to actively cancel the plant dynamics. While its effectiveness has been demonstrated on torque-controlled robot hardware, it is not straightforward to apply this method to the control of robots with general variable stiffness mechanisms, since the system dynamics are not easily input-output linearizable due to redundancy and complex nonlinearity in the actuator dynamics. Furthermore, it turns out that for the parameter setting used in Section IV-C, the target dynamics controller becomes singular at some joint angle q₂ within the range of the movement, even in the torque-controlled case. With the link mass parameters used in this paper, we did not find problems for equal link lengths l₁ = l₂; however, we numerically found that, typically, when l₂ > l₁, the


target dynamics method encounters an ill-posed invertibility problem in the derivation of the control law (cf. Equation (15) in [23]).

V. CONCLUSION

In this paper, we have presented an optimal control framework for exploiting the passive dynamics of a system for swing movements. As an example, we considered brachiation with a two-link underactuated robot with a variable stiffness actuation mechanism, which is a highly dynamic and challenging task. Numerical simulations illustrated that our framework is able to simultaneously optimize the time-varying joint stiffness profile and the movement duration while exploiting the passive dynamics of the system. These results demonstrate that our approach can deal with different task requirements (locomotion at different intervals, swing-up), modelling errors and switching in the robot dynamics. In addition, we empirically explored the design and selection of an appropriate cost function for successful task execution. The approach presented in this paper, which exploits the passive dynamics with VSA, contrasts with nonlinear controller design based on active cancellation of the plant dynamics using input-output linearization for the same task [23]. However, we feel that it shares an important issue: task encoding (or description), either in the form of target dynamics or in terms of a cost function, based on physical understanding of and insight into the task. We aim to extend our approach to include variable damping [26] for dynamic tasks involving interaction with the environment.

ACKNOWLEDGMENTS

This work was funded by the EU Seventh Framework Programme (FP7) as part of the STIFF project.

REFERENCES

[1] K. Akawaza, et al. Modulation of stretch reflexes during locomotion in the mesencephalic cat. J. of Physiology, 329(1):553–567, 1982.
[2] D. J. Bennett, et al. Time-varying stiffness of human elbow joint during cyclic voluntary movement. Exp. Brain Res., 88:433–442, 1992.
[3] D. Braun, M. Howard, and S. Vijayakumar.
Optimal variable stiffness control: formulation and application to explosive movement tasks. Autonomous Robots, 2012. doi: 10.1007/s10514-012-9302-3.
[4] M. G. Catalano, R. Schiavi, and A. Bicchi. Mechanism design for variable stiffness actuation based on enumeration and analysis of performance. In IEEE ICRA, 2010.
[5] S. Collins, A. Ruina, R. Tedrake, and M. Wisse. Efficient bipedal robots based on passive-dynamic walkers. Science, 307(5712):1082–1085, 2005.
[6] O. Eiberger, et al. On joint design with intrinsic variable compliance: derivation of the DLR QA-joint. In IEEE ICRA, 2010.
[7] S. Eimerl and I. DeVore. The Primates. Time-Life Books, 1966.
[8] M. Garabini, A. Passaglia, F. Belo, P. Salaris, and A. Bicchi. Optimality principles in variable stiffness control: the VSA hammer. In IEEE/RSJ IROS, 2011.
[9] M. W. Gomes and A. L. Ruina. A five-link 2D brachiating ape model with life-like zero-energy-cost motions. J. of Theoretical Biology, 237(3):265–278, 2005.

[10] S. Haddadin, M. Weis, S. Wolf, and A. Albu-Schäffer. Optimal control for maximizing link velocity of robotic variable stiffness joints. In IFAC World Congress, 2011.
[11] R. Van Ham, et al. MACCEPA, the mechanically adjustable compliance and controllable equilibrium position actuator: design and implementation in a biped robot. Rob. and Aut. Sys., 55(10):761–768, 2007.
[12] R. Van Ham, et al. Compliant actuator designs. IEEE Robotics and Automation Mag., 16(3):81–94, 2009.
[13] T. Hondo and I. Mizuuchi. Analysis of the 1-joint spring-motor coupling system and optimization criteria focusing on the velocity increasing effect. In IEEE ICRA, 2011.
[14] J. W. Hurst, J. E. Chestnutt, and A. A. Rizzi. The actuator with mechanically adjustable series compliance. IEEE Trans. on Robotics, 26(4):597–606, 2010.
[15] A. Jafari, N. G. Tsagarakis, B. Vanderborght, and D. G. Caldwell. A novel actuator with adjustable stiffness (AwAS). In IEEE/RSJ IROS, 2010.
[16] J. G. D. Karssen and M. Wisse. Running with improved disturbance rejection by using non-linear leg springs. Int. J. of Robotics Research, 30(13):1585–1595, 2011.
[17] W. Li and E. Todorov. Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic system. Int. J. of Control, 80(9):1439–1453, 2007.
[18] A. De Luca and G. Oriolo. Trajectory planning and control for planar robots with passive last joint. Int. J. of Robotics Research, 21(5-6):575–590, 2002.
[19] K. M. Lynch, et al. Collision-free trajectory planning for a 3-DOF robot with a passive joint. Int. J. of Robotics Research, 19(12):1171–1184, 2000.
[20] D. Mitrovic, S. Klanke, R. Osu, M. Kawato, and S. Vijayakumar. A computational model of limb impedance control based on principles of internal model uncertainty. PLoS ONE, 2010.
[21] D. Mitrovic, S. Klanke, and S. Vijayakumar. Learning impedance control of antagonistic systems based on stochastic optimization principles. Int. J. of Robotics Research, 2010.
[22] Y. Nakamura, et al. Nonlinear behavior and control of a nonholonomic free-joint manipulator. IEEE Trans. on Rob. and Aut., 13(6):853–862, 1997.
[23] J. Nakanishi, T. Fukuda, and D. E. Koditschek. A brachiating robot controller. IEEE Trans. on Robotics and Automation, 16(2):109–123, 2000.
[24] J. Nakanishi, K. Rawlik, and S. Vijayakumar. Stiffness and temporal optimization in periodic movements: an optimal control approach. In IEEE/RSJ IROS, 2011.
[25] G. Palli, C. Melchiorri, and A. De Luca. On the feedback linearization of robots with variable joint stiffness. In IEEE ICRA, 2008.
[26] A. Radulescu, M. Howard, D. J. Braun, and S. Vijayakumar. Exploiting variable physical damping in rapid movement tasks. In IEEE/ASME AIM Conf., 2012.
[27] K. Rawlik, M. Toussaint, and S. Vijayakumar. An approximate inference approach to temporal optimization in optimal control. In NIPS, 2010.
[28] N. Rosa, A. Barber, R. D. Gregg, and K. Lynch. Stable open-loop brachiation on a vertical wall. In IEEE ICRA, 2012.
[29] F. Saito, T. Fukuda, and F. Arai. Swing and locomotion control for a two-link brachiation robot. IEEE Cont. Sys. Mag., 14(1):5–12, 1994.
[30] R. Shadmehr. Learning virtual equilibrium trajectories for control of a robot arm. Neural Computation, 2(4):436–446, 1990.
[31] M. W. Spong. Modeling and control of elastic joint robots. Trans. of ASME, J. of Dynamic Systems, Measurement, and Control, 109(6):310–319, 1987.
[32] M. W. Spong. The swing up control problem for the acrobot. IEEE Control Sys. Mag., 15(1):49–55, 1995.
[33] R. F. Stengel. Optimal Control and Estimation. 1994.


Inference on Networks of Mixtures for Robust Robot Mapping

Edwin Olson
Computer Science and Engineering, University of Michigan, 2260 Hayward Street, Ann Arbor, Michigan
Email: [email protected]

Pratik Agarwal
Computer Science and Engineering, University of Michigan, 2260 Hayward Street, Ann Arbor, Michigan
Email: [email protected]

Abstract— The central challenge in robotic mapping is obtaining reliable data associations (or "loop closures"): state-of-the-art inference algorithms can fail catastrophically if even one erroneous loop closure is incorporated into the map. Consequently, much work has been done to push error rates closer to zero. However, a long-lived or multi-robot system will still encounter errors, leading to system failure. We propose a fundamentally different approach: richer error models that allow the probability of a failure to be explicitly modeled. In other words, we optimize the map while simultaneously determining which loop closures are correct, from within a single integrated Bayesian framework. Unlike earlier multiple-hypothesis approaches, our approach avoids exponential memory complexity and is fast enough for real-time performance. We show that the proposed method not only allows loop closing errors to be automatically identified, but also that, in extreme cases, the "front-end" loop-validation systems can be unnecessary. We demonstrate our system both on standard benchmarks and on the real-world datasets that motivated this work.

I. INTRODUCTION

Robot mapping problems are often formulated as an inference problem on a factor graph: variable nodes (representing the locations of robots or other landmarks in the environment) are related through factor nodes, which encode geometric relationships between those nodes. Recent Simultaneous Localization and Mapping (SLAM) algorithms can rapidly find maximum likelihood solutions for maps, exploiting both fundamental improvements in the understanding of the structure of mapping problems [1], [2], [3], and the computational convenience afforded by assuming that error models are simple uni-modal Gaussians [4].

Despite their convenience, Gaussian error models often poorly approximate the truth. In the SLAM domain, perceptual aliasing can lead to incorrect loop closures, and the resulting error can lead to divergence of the map estimate. Similarly, the wheels of a robot may sometimes grip and sometimes slip, leading to a bi-modal motion model. Similar challenges arise throughout robotics, including sonar and radar (with multi-path effects), target tracking (where multiple disjoint hypotheses may warrant consideration), etc.

In the specific case of SLAM, it has become standard practice to decompose the problem into two halves: a "front-end" and a "back-end". The front-end is responsible for identifying and validating loop closures and constructing a factor graph; the back-end then performs inference (often maximum likelihood) on this factor graph.

Fig. 1. Recovering a map in the presence of erroneous loop closures. We evaluated the robustness of our method by adding erroneous loop closures to the Intel data set. The top row shows the posterior map as computed by a state-of-the-art sparse Cholesky factorization method with 1, 10, and 100 bad loop closures. The bottom row shows the posterior map for the same data set using our proposed max-mixture method. While the earlier method produces maps with increasing global deformation, our proposed method is essentially unaffected by the presence of the incorrect loop closures.

In most of the literature, it is assumed that the loop closures found by the front-end have noise that can be modeled as Gaussian. For example, the front-end might assert that the robot is now at the same location that it was ten minutes ago (it has "closed a loop"), with an uncertainty of 1 meter. Suppose, however, that the robot was somewhere else entirely, a full 10 meters away. The back-end's role is to compute the maximum likelihood map, and an error of ten standard deviations is so profoundly unlikely that the back-end will almost certainly never recover the correct map: it is compelled to distort the map so as to make the erroneous loop closure more probable (see Fig. 1).

The conventional strategy is to build better front-end systems. Indeed, much effort has been devoted to creating better front-ends [5], [6], [7], and these approaches have succeeded in vastly reducing the rate of errors. But for systems that accumulate many robot-hours of operation, or robots operating in particularly challenging environments, even an extremely low error rate still results in errors. These errors lead to divergence of the map and failure of the system.

Our recent efforts at building a team of robots that can cooperatively explore and map an urban environment [8] illustrate the challenges, and motivated this work. At the time, we modeled the uncertainty of odometry and loop closing edges with simple Gaussians, but despite extensive validation of these edges prior to optimization, some of these edges had large errors that were virtually impossible given their noise model. Even with a novel interface allowing a human to help untangle the resulting map [8], errors were still evident (see Fig. 6). Our subsequent analysis revealed that odometry edges were often to blame. We had assumed a 15% noise model, but our robots, driving under autonomous control, would occasionally get caught on small, unsensed obstacles. As a result, the robot actually encountered 100% error, five standard deviations given our prior noise model. The resulting error in our position estimates exacerbated the perceptual aliasing problem: our incorrect position prior would argue against correct loop closure hypotheses, and would favor some incorrect hypotheses.

In this paper, we propose a novel approach that allows efficient maximum-likelihood inference on factor graph networks that contain arbitrarily complex probability distributions. This is in contrast to state-of-the-art factor graph methods, which are limited to uni-modal Gaussian distributions and suffer from the real-world problems described above. Specifically, we propose a new type of mixture model, the max-mixture, which provides expressivity similar to a sum-mixture while avoiding the associated computational costs.
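To preview the idea, the following sketch (our illustration, with made-up numbers rather than anything from the paper's experiments) contrasts a max-mixture with a sum-mixture for a 2-D loop-closure residual. Because the max passes through the logarithm, the cost at any residual is simply that of the single dominant Gaussian component, so a standard nonlinear least-squares back-end can be reused once the dominant component is selected.

```python
import numpy as np

def component_loglik(r, w, info):
    # log of w * N(r; 0, info^-1) for residual r and information matrix `info`
    _, logdet = np.linalg.slogdet(info / (2.0 * np.pi))
    return np.log(w) + 0.5 * logdet - 0.5 * r @ info @ r

def max_mixture_loglik(r, components):
    # max-mixture: log max_i w_i N_i(r); contrast with the sum-mixture
    # log sum_i w_i N_i(r), whose logarithm does not reduce to a quadratic
    return max(component_loglik(r, w, info) for w, info in components)

# hypothetical loop-closure factor: a confident inlier model plus a very
# broad "null hypothesis" component that absorbs outliers
components = [
    (0.9, np.eye(2) * 100.0),   # inlier: sigma = 0.1 m
    (0.1, np.eye(2) * 0.01),    # null:   sigma = 10 m
]

def dominant(r):
    # index of the component selected by the max (used for linearization)
    return int(np.argmax([component_loglik(r, w, i) for w, i in components]))

r_good = np.array([0.05, -0.02])   # small residual: inlier dominates
r_bad = np.array([8.0, -6.0])      # ~10-sigma outlier: null takes over
```

With a small residual the inlier component dominates and the factor behaves like an ordinary Gaussian constraint; with a 10-sigma outlier the null hypothesis takes over, so the bad loop closure no longer distorts the map.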
With such a mixture, the "slip or grip" odometry problem can be modeled as a multi-modal distribution, and loop closures can be accompanied by a "null" hypothesis. In essence, the back-end optimization system serves as part of the front-end, playing an important role in validating loop closures and preventing divergence of the map.

We will demonstrate our system on real data, showing that it can easily handle the error rates of current front-end data validation systems, allowing robust operation even when these systems produce poor output. We will also illustrate that, in extreme cases, no front-end loop validation is required at all: all candidate loop closures can simply be added to the factor graph, and our approach simultaneously produces a maximum likelihood map while identifying the set of edges that are correct. This is an interesting development, since it provides a fully integrated Bayesian treatment of both mapping and data association, tasks which are usually decoupled.

It has previously been shown that exact inference on even poly-trees of mixtures is NP-hard [9]. Our method avoids exponential complexity at the expense of guaranteed convergence to the maximum likelihood solution. In this paper, we explore the robustness of our method and characterize the error rates that can be handled.

In short, the contributions of this paper are:
• We formulate a new mixture model that provides significant computational advantages over the more traditional sum-of-Gaussians mixtures, while retaining similar expressive power.
• We develop an algorithm for fast maximum-likelihood inference on factor graph networks containing these max-mixtures.
• We demonstrate how robot mapping systems can use these methods to robustly handle errors in odometry and loop-closing systems.
• We characterize the robustness of our method to local minima, identifying factors (like error rate and overall graph degree) and their impact.
• We evaluate our algorithm on real-world datasets to demonstrate its practical applicability, both in terms of the quality of results and the computation time required.

II. RELATED WORK

We are not the first to consider estimation in the presence of non-Gaussian noise. Two well-known families of methods allow more complex error models to be used: particle filters and multiple hypothesis tracking (MHT) approaches.

Particle filters, perhaps best exemplified by FastSLAM [10], approximate arbitrary probability distributions through a finite number of samples. Particle filters attempt to explicitly (though non-parametrically) describe the posterior distribution. Unfortunately, the posterior grows in complexity over time, requiring an ever-increasing number of particles to maintain the quality of the posterior approximation. This growth quickly becomes untenable, forcing practical implementations to employ particle resampling techniques [11], [12], [13]. Unavoidably, resampling leads to a loss of information, since areas with low probability density are effectively truncated to zero. This loss of information can make it difficult to recover the correct solution, particularly after a protracted period of high uncertainty [14], [15].

Multiple hypothesis tracking approaches provide an alternative more closely related to mixture models [16]. These explicitly represent the posterior using an ensemble of Gaussians that collectively encode a mixture. However, the number of potential hypotheses tends to grow exponentially, forcing pruning of hypotheses and thus information loss. A similar problem occurs when explicitly performing inference on mixtures: Durrant-Whyte et al. approximate terrain using mixtures [17], but must resample the terrain to prevent an intractable number of mixture components. In both cases, information is lost, which can result in lower-quality solutions.

In the special case where errors are modeled as uni-modal Gaussians, the maximum likelihood solution of the factor graph network can be found using non-linear least squares.
Beginning with the observation that the information matrix is sparse [18], [19], [20], efforts to exploit that sparsity resulted in rapid improvements to map inference by leveraging sparse factorization and good variable-ordering heuristics [21], [22], [23]. While the fastest of these methods generally provide only maximum-likelihood inference (a shortcoming shared by our proposed method), approximate marginal estimation



methods are fast and easy to implement [24], [25]. It is highly desirable for new methods to be able to leverage the same insights that made these approaches so effective.

One method similar to our own explicitly introduces switching variables whose values determine whether or not a loop closure is accepted [26]. This work is notable for being the first to propose a practical way of dealing with front-end errors during inference. In comparison to our approach, the activation or deactivation of a loop closure is penalized through a separate cost function (as opposed to being integrated into a mixture model), initial values must be assigned to the switching variables (as opposed to our implicit inference over the latent variables), and each closure is limited to being either “on” or “off” (as opposed to being able to model mixtures with multiple distinct modes).

Robustified cost functions [27] provide resilience to errors by reducing the cost associated with outliers, and have been widely used in the vision community [28], [29]. We show that robustified cost functions are subsumed by our mixture approach.

Our proposed method avoids the exponential growth in memory requirements of particle filter and MHT approaches by avoiding an explicit representation of the posterior density. Instead, like other methods based on sparse factorization, our method extracts a maximum likelihood estimate. Critically, while the memory cost of representing the posterior distribution grows exponentially, the cost of storing the underlying factor graph network (which implicitly encodes the posterior) grows only linearly with the size of the network. In other words, our method (which only stores the factor graph) can recover solutions that would have been culled by particle and MHT approaches. In addition, our approach benefits from the same sparsity and variable-ordering insights that have recently benefited uni-modal approaches.

III. APPROACH

Our goal is to infer the posterior distribution of the state variable (or map) x, which can be written in terms of the factor potentials in the factor graph. The probability is conditioned on sensor observations z; with an application of Bayes’ rule and by assuming an uninformative prior p(x), we obtain:

p(x | z) ∝ ∏_i p(z_i | x)    (1)

In prior work, it is generally assumed that the factor potentials p(z_i | x) can be written as Gaussians:

p(z_i | x) = (2π)^(−n/2) |Λ_i^(−1)|^(−1/2) exp( −(1/2) (f_i(x) − z_i)^T Λ_i (f_i(x) − z_i) )    (2)

Further, while f_i(x) is generally non-linear, it is assumed that it can be approximated using a first-order Taylor expansion such that f_i(x) ≈ J_i Δx + f_i(x_0). The posterior maximum likelihood value can then be easily found by taking the logarithm of Eqn. 1, differentiating with respect to x, and solving for x. This classic least-squares approach leads to a simple linear system


Fig. 2. Mixture Overview. Given two mixture components (top left), the max- and sum- mixtures produce different distributions. In both cases, arbitrary distributions can be approximated. A robustified cost function (in this case a corrupted Gaussian, bottom) can be constructed from two Gaussian components with equal means but different variances.

of the form Ax = b. Critically, this is possible because the logarithm operator can be “pushed” inside the product in Eqn. 1, reducing the product of N terms into a sum of N simple quadratic terms. No logarithms or exponentials remain, making the resulting expression easy to solve. We might now consider a more complicated factor potential p(z_i | x), such as a sum-mixture of Gaussians:

p(z_i | x) = Σ_j w_j N(μ_j, Λ_j^(−1))    (3)

In this case, each N(μ_j, Λ_j^(−1)) represents a different Gaussian distribution. Such a sum-mixture allows great flexibility in specifying the distribution p(z_i | x). For example, we can encode a robustified cost function by using two components with the same mean but different variances, or we can encode a bi-modal distribution. The problem with a sum-mixture is that the maximum likelihood solution is no longer simple: the logarithm can no longer be pushed all the way into the individual Gaussian components, because the summation in Eqn. 3 prevents it. As a result, the introduction of a sum-mixture means that it is no longer possible to derive a simple solution for x.

A. Max-Mixtures

Our solution to this problem is a new mixture model type, one based on a max operator rather than a sum:

p(z_i | x) = max_j w_j N(μ_j, Λ_j^(−1))    (4)

While the change is relatively minor, the implications for optimization are profound. The logarithm can be pushed inside a max-mixture: the max operator acts as a selector, returning a single well-behaved Gaussian component. The optimization process is described more thoroughly in the following section.
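As a concrete illustration of this selector, the following sketch (ours, not the paper's implementation; the helper names are invented) evaluates Eqn. 4 in log space by scoring each weighted component and keeping only the best one:

```python
import numpy as np

# Illustrative sketch (not the paper's code): evaluating a max-mixture
# (Eqn. 4) in log space. The max over components becomes an argmax over
# per-component log-likelihoods, so the result is a single Gaussian term.

def log_gaussian(r, Lambda):
    """Log-density of a zero-mean Gaussian with information matrix Lambda,
    evaluated at residual r."""
    k = len(r)
    _, logdet = np.linalg.slogdet(Lambda)
    return 0.5 * logdet - 0.5 * k * np.log(2.0 * np.pi) - 0.5 * r @ Lambda @ r

def max_mixture_loglik(residuals, infos, weights):
    """Return (index of the selected component, its weighted log-likelihood)."""
    scores = [np.log(w) + log_gaussian(r, L)
              for r, L, w in zip(residuals, infos, weights)]
    j = int(np.argmax(scores))
    return j, scores[j]

# Bi-modal example: the first component explains the residual far better.
residuals = [np.array([0.1]), np.array([3.0])]
infos = [np.eye(1), np.eye(1)]
j, ll = max_mixture_loglik(residuals, infos, [0.5, 0.5])
# j == 0: only the winning component contributes to the objective
```

Once the winning component is known, only its Jacobian, residual, and information matrix enter the least-squares problem; the losing components are ignored entirely.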


A max-mixture has much of the same character as a sum-mixture and retains a similar expressivity: multi-modal distributions and robustified distributions can be similarly represented (see Fig. 2). Note, however, that when fitting a mixture to a desired probability distribution, different components will result for sum- and max-mixtures. Assuring that the distributions integrate to one is also handled differently: for a sum-mixture, Σ_j w_j = 1 is a necessary and sufficient condition; for a max-mixture, proper normalization is generally more difficult to guarantee. Usefully, for maximum likelihood inference, it is inconsequential whether the distribution integrates to one. Specifically, suppose that some normalization factor γ is required in order to ensure that a max-mixture integrates to one. Since γ scales the distribution, the log of the resulting max-mixture is simply the log of the un-normalized distribution plus a constant. The addition of such a constant does not change the solution of a maximum-likelihood problem, and thus it is unnecessary for our purposes to compute γ.

B. Cholesky-MM

We now show how max-mixture distributions can be incorporated into existing graph optimization frameworks. The principal step in such a framework is to compute the Jacobian, residual, and information matrix for each factor potential. As we described previously, these are trivial to compute for a uni-modal Gaussian distribution. For the max-mixture case, it might seem that computing the needed derivatives for the Jacobian is difficult: the max-mixture is not actually differentiable at the points where the maximum-likelihood component changes. While this makes it difficult to write a closed-form expression for the derivatives, they are nonetheless easy to compute. The key observation is that the max operator serves as a selector: once the mixture component with the maximum likelihood is known, the behavior of the other mixture components is irrelevant.
In other words, the solver simply iterates over the components, identifies the most probable, and returns the Jacobian, residual, and information matrix for that component, scaled according to its weight w_j. If the likelihoods of two components are tied (an event which corresponds to evaluating the Jacobian at a non-differentiable point), we pick one of the components arbitrarily; these boundary regions carry zero probability mass. The resulting Jacobians, residuals, and information matrices are combined into a large least-squares problem, which we solve with a minimum-degree variable-ordering heuristic followed by sparse Cholesky factorization, in a manner similar to that described by [3]. With our modifications to handle max-mixtures, we call our system Cholesky-MM. It is often necessary to iterate the full least-squares problem several times. Each time, the best component in each max-mixture is re-evaluated: in essence, as the optimization proceeds, we dynamically select the best mixture component as an integral part of the optimization process.
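A minimal 1-D sketch of this iteration (our illustration; dense scalar algebra stands in for the sparse, variable-ordered Cholesky solve) is:

```python
import numpy as np

# Illustrative 1-D sketch of the Cholesky-MM loop (not the paper's code):
# each pass re-selects the most likely component of every max-mixture
# factor, then solves the resulting ordinary least-squares problem.

def solve_max_mixture_1d(x0, factors, iters=10):
    """factors: one list of (mean, information, weight) components per factor."""
    x = float(x0)
    for _ in range(iters):
        A, b = 0.0, 0.0
        for comps in factors:
            def score(c):
                mu, info, w = c
                # weighted log-likelihood of this component at the current x
                return np.log(w) + 0.5 * np.log(info) - 0.5 * info * (x - mu) ** 2
            mu, info, w = max(comps, key=score)   # the max-mixture selector
            A += info                             # accumulate normal equations
            b += info * (mu - x)
        x += b / A                                # solve A dx = b, update x
    return x

# One bi-modal factor with modes at 0 and 5; an initial guess of 4.5 lies in
# the basin of the mode at 5, so the solver converges there.
x = solve_max_mixture_1d(4.5, [[(0.0, 1.0, 0.5), (5.0, 1.0, 0.5)]])
```

The design point to notice is that component selection happens inside the iteration loop, so a different mode can become active as the state estimate improves.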

Even in the non-mixture case, this sort of non-linear optimization cannot guarantee convergence to the globally optimal solution. It is useful to think of a given inference problem as having a “basin of convergence”: a region containing all the initial values of x that would ultimately converge to the globally optimal solution. For most well-behaved problems with simple Gaussian distributions, the basin of convergence is large. Divergence occurs when the linearization error is so severe that the gradient points in the wrong direction.

The posterior distribution for a network with N mixtures, each with c components, is a mixture with as many as c^N components. In the worst case, these could be non-overlapping, resulting in c^N local minima. The globally optimal solution still has a basin of convergence: if our initial solution is “close” to the optimal solution, our algorithm will converge. But if the basin of convergence is extremely small, then the practical utility of our algorithm will be limited. In other words, the key question to be answered about our approach is whether the basin of convergence is usefully large. Naturally, the size of this basin is strongly affected by the properties of the problem and the robustness of the algorithm used to search for a solution. One of the main results of this paper is to show that our approach yields a large basin of convergence for a wide range of useful robotics problems.

IV. APPLICATIONS AND EVALUATION

In this section, we show how our approach can be applied to several real-world problems. We include quantitative evaluations of the performance of our algorithm, and characterize its robustness and runtime performance.

A. Uncertain loop closures

We first consider the case where we have a front-end that produces loop closures with a relatively low, but non-zero, error rate. For each uncertain loop closure, we introduce a max-mixture consisting of two components: 1) the front-end’s loop closure and 2) a null hypothesis.
The null hypothesis, representing the case where the loop closure is wrong, is implemented using a mixture component with a very large covariance. Weights are assigned to these two components in accordance with the error rate of the front-end. In practice, the behavior of the algorithm is not particularly sensitive to the weights associated with the null hypotheses. The main benefit of our approach arises from assigning a larger probability to incorrect loop closures than the exponentially-fast-decaying probability specified by the loop closer’s Gaussian. Even if the null hypothesis has a very low weight (we used 10^−5), it will provide a sufficiently plausible explanation of the data to prevent a radical distortion of the graph. Second, once the null hypothesis becomes dominant, its large variance results in a very weak gradient for the edge; as a result, the edge plays virtually no role in subsequent optimization. We set the mean of the null hypothesis equal to that of the front-end’s hypothesis so that the small amount of gradient that

316

remains produces a slight bias back towards the front-end’s hypothesis. If subsequent observations re-affirm the front-end’s hypothesis, it can still become active in the future. Unlike particle filter or MHT methods, which must eventually truncate unlikely events, no information is lost.

A two-component mixture model in which both components have identical means but different variances can be viewed as a robustified cost function. In particular, parameters can be chosen so that a two-component max-mixture closely approximates a corrupted Gaussian [27] (see Fig. 2).
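Constructing such a factor is straightforward; the sketch below is our illustration, not the paper's code. The 10^−5 null weight comes from the text, while the covariance inflation factor is an assumption:

```python
import numpy as np

# Illustrative sketch: a two-component max-mixture for an uncertain loop
# closure. The null-hypothesis weight (1e-5) follows the text; the factor
# by which its covariance is inflated is an assumption for illustration.

def loop_closure_mixture(z, Lambda, null_weight=1e-5, inflate=1e6):
    """z: measured relative pose; Lambda: its information matrix.
    Returns a list of (mean, information, weight) components."""
    return [
        (z, Lambda, 1.0 - null_weight),      # front-end's hypothesis
        (z, Lambda / inflate, null_weight),  # null hypothesis: same mean,
                                             # enormously larger covariance
    ]

mix = loop_closure_mixture(np.array([1.0, 2.0, 0.1]), np.eye(3) * 100.0)
# two components sharing a mean, differing only in weight and covariance
```

Sharing the mean between the two components implements the slight bias back towards the front-end's hypothesis described above.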


Fig. 4. Error distribution for true and false positives. Our method accepts some randomly-generated “false positives”, but an analysis of the error of those edges indicates that they (left) are only slightly worse than the error of true edges (right).

To evaluate the performance of our approach, we added randomly-generated loop closures to two standard benchmark datasets: the 3500-node Manhattan set [25] and the Intel data set. These were processed in an online fashion, adding one pose at a time along with its edges (both correct and incorrect). This mimics real-world operation better than a batch approach, and is more challenging: it is easier to become caught in a local minimum, since fewer edges are available to guide the optimization towards the global optimum. For a given number of randomly-generated edges, we compute the posterior map generated by our method and by a standard non-mixture method, using a laser-odometry solution as the initial state estimate. The mean squared error of each map relative to the globally optimal solution [30] is listed in Fig. 3. Our proposed method achieves dramatically lower mean squared errors than standard non-mixture versions. While the true positive rate is nearly perfect in both experiments, some randomly-generated edges (labeled false positives) are accepted by our system. However, since the false positives are randomly generated, some of them (by chance) are actually close to the truth. Such “accidentally correct” edges should be accepted by our algorithm.¹

¹We favor generating “false positives” in a purely random way, even though it leads to “accidentally” correct edges. Any filtering operation to reject these low-error edges would introduce a free parameter (the error threshold) whose value could be tuned to favor the algorithm.

We can evaluate the quality of the accepted edges by comparing the error distributions of the true positives and false positives (see Fig. 4). As the histograms indicate, the distributions are similar, though the error distribution of the false positives is slightly worse than that of the true positives. Still, no extreme outliers (the type that cause divergence of the map) are accepted by our method.

B. Multi-modal distributions

In the previous section, we demonstrated that our method can be used to encode null hypotheses, or equivalently, to implement robustified cost functions, a capability similar to earlier work [26]. In that case, the probability distributions being approximated by each mixture have only a single maximum. Our use of a mixture model, however, also allows multi-modal distributions to be encoded; the ability to directly represent multi-modal distributions is a distinctive feature of our approach.

For example, one of the original motivating problems of this work was the “slip or grip” problem: the case where a robot’s wheels occasionally slip catastrophically, resulting in near-zero motion. With a typical odometry noise model of 10-20%, such an outcome would wreak havoc on the posterior map. Our approach to the “slip or grip” problem is to use a two-component mixture model: one component (with a large weight) corresponds to the usual 15% noise model, while the second component (with a low weight) is centered around zero. No changes to our optimization algorithm are required to handle such a case; however, since the distribution now has multiple local maxima, it poses a greater challenge in terms of robustness.

Of course, without some independent source of information that contradicts the odometry data, there is no way to determine that the wheels were slipping. To provide this independent information, we used a state-of-the-art scan matching system [31] to generate loop closures. We manually induced wheel slippage by pulling on the robot.
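The two-component odometry factor described above can be sketched as follows. This is our illustration: only the ~15% noise model and the zero-motion mode come from the text; the slip weight and the per-axis noise values are assumptions:

```python
import numpy as np

# Illustrative "slip or grip" odometry mixture: a high-weight component
# around the measured motion with ~15% proportional noise (per the text),
# plus a low-weight component centered at zero motion. The slip weight and
# the noise floors are assumptions for illustration.

def slip_or_grip_mixture(odom, w_slip=0.05):
    """odom: measured (dx, dy, dtheta). Returns (mean, information, weight)
    components for a max-mixture odometry factor."""
    sigma_grip = np.maximum(0.15 * np.abs(odom), 1e-3)  # ~15% noise model
    sigma_slip = np.full_like(odom, 0.05)               # tight zero-motion mode
    grip = (odom, np.diag(1.0 / sigma_grip**2), 1.0 - w_slip)
    slip = (np.zeros_like(odom), np.diag(1.0 / sigma_slip**2), w_slip)
    return [grip, slip]

mix = slip_or_grip_mixture(np.array([0.5, 0.0, 0.02]))
# component 0: nominal odometry; component 1: catastrophic slip (no motion)
```

Because the two modes have different means, this factor is genuinely multi-modal, unlike the null-hypothesis mixture of the previous section.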
Despite the good loop closures, standard methods are unable to recover the correct map. In contrast, our method determines that

Fig. 5. Slip or Grip Example. We evaluate the ability of our algorithm to recover a good map in the presence of catastrophically slipping wheels. In this case, the robot is obtaining loop closures using a conventional laser scanner front-end. These loop closures are of high quality, but the odometry edges still cause significant map distortions when using standard methods (left). When a small probability is added to account for slippage, our mixture approach recovers a much improved map (right).

317

Manhattan Data Set
True Loop Edges | False Edges | True Positives | False Positives | Avg. FP Error | MSE (Our method) | MSE (Non-mixture)
2099 |    0 | 2099 |  0 |  NaN   | 0.6726 |   0.6726
2099 |   10 | 2099 |  0 |  NaN   | 0.6713 | 525.27
2099 |  100 | 2099 |  1 | 0.0208 | 0.6850 | 839.39
2099 |  200 | 2099 |  2 | 0.5001 | 0.6861 | 874.67
2099 |  500 | 2099 |  3 | 0.6477 | 0.6997 | 888.82
2099 | 1000 | 2099 | 10 | 0.7155 | 0.7195 | 893.98
2099 | 2000 | 2099 | 22 | 0.5947 | 0.7151 | 892.54
2099 | 3000 | 2099 | 36 | 0.5821 | 0.7316 | 896.01
2099 | 4000 | 2099 | 51 | 0.6155 | 0.8317 | 896.05

Intel Data Set
True Loop Edges | False Edges | True Positives | False Positives | Avg. FP Error | MSE (Our method) | MSE (Non-mixture)
14731 |    0 | 14731 |   0 |  NaN   | 7.122×10⁻¹⁰ | 1.55×10⁻⁹
14731 |   10 | 14731 |   0 |  NaN   | 7.123×10⁻¹⁰ | 0.044
14731 |  100 | 14731 |   2 | 0.1769 | 4.431×10⁻⁶  | 2.919
14731 |  200 | 14731 |   9 | 0.1960 | 5.583×10⁻⁶  | 8.810
14731 |  500 | 14731 |  19 | 0.1676 | 1.256×10⁻⁵  | 34.49
14731 | 1000 | 14731 |  29 | 0.1851 | 5.840×10⁻⁵  | 71.86
14731 | 2000 | 14731 |  64 | 0.1937 | 2.543×10⁻⁴  | 86.37
14731 | 3000 | 14731 | 103 | 0.1896 | 3.307×10⁻⁴  | 91.04
14731 | 4000 | 14217 | 146 | 0.1699 | 0.014       | 95.36

Fig. 3. Null-hypothesis robustness. We evaluate the robustness of our method and a standard Gaussian method to the presence of randomly-generated edges. As the number of randomly-generated edges increases, the mean squared error (MSE) of standard approaches rapidly degenerates; our proposed method produces good maps even when the number of randomly-generated edges is large in comparison to the number of true edges. Our approach does accept some randomly-generated edges (labeled “false positives” above); however, the error of these accepted edges is comparable to that of the true positives. In each case, the initial state estimate is that from the open-loop odometry.

the “slip” mode is more likely than the “grip” mode, and recovers the correct map (see Fig. 5).

As part of our earlier multi-robot mapping work [8], we employed a team of 14 robots to explore a large urban environment. Wheel slippage contributed to a poor map in two ways: 1) the erroneous factor potentials themselves, and 2) the inability to identify good loop closures due to a low-quality motion estimate. By using the better odometry model, our online system produced a significantly improved map (see Fig. 6).

C. Eliminating the Front End

In current approaches, front-end systems are typically responsible for validating loop closures prior to adding them to the factor graph network. However, if the back-end can recover from errors, is it possible to omit the filtering entirely? In certain cases, our inference method can eliminate the need for loop validation by the front-end. This is desirable from a conceptual standpoint: in principle, a better map

Fig. 6. Online results using odometry mixture model. The left figure shows a map of a 30m × 25m area in which our multi-robot urban mapping team produced a poor map due to wheel slippage and the ensuing inability to find loop-closures. With our odometry mixture model (right), the wheel slippage is (implicitly) detected, and we find additional loop closures. The result is a significantly improved map.

should result from handling loop closing and mapping within a single integrated Bayesian framework. The conventional decoupling of mapping into a front-end and a back-end, while practical, prevents a fully Bayesian solution.

We evaluated this possibility using the Intel data set. At every pose, a laser scan matcher attempts a match to every previous pose. The top k matches (as measured by overlap of the two scans) are formed into a mixture containing k + 1 components; the extra component is a null hypothesis that handles the case where all k matches are incorrect. To push the experiment as far as possible, no position information was used to prune the set of k matches. Larger values of k provide robustness against perceptual aliasing, since they increase the likelihood that the correct match is present somewhere within the set of k components.

Note that this approach is significantly different from adding k conventional edges to the graph. When edges are added “in parallel”, the posterior reflects a weighted average of all of them. In our mixture formulation, only the selected mode has an effect on the solution: the others are effectively “switched off”. An example of one mixture with k = 4 putative matches is shown in Fig. 7. The weight of each component is set in proportion to the score of the scan matcher.

Running our system in an online fashion, we obtain the final map shown in Fig. 8. Online operation is more difficult than batch operation, since less information is available early on to correct erroneous edges. Our system recovers a consistent global map despite the lack of any front-end loop validation. The quality of the open-loop trajectory estimate plays an important role in determining whether the initial state estimate is within the basin of convergence. In this case, our open-loop trajectory estimate is fairly good, and our method is able to infer the correct mode for each mixture despite the


lack of any front-end loop validation. The robustness of our method is amplified by better front-end systems: with higher-quality loop closures, the basin of convergence is enlarged, allowing good maps to be computed even when the open-loop trajectory is poor.
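A sketch of building such a (k+1)-component factor from scan-matcher output follows. This is our illustration: the score-proportional weights come from the text, while the normalization and the null-hypothesis parameters are assumptions:

```python
import numpy as np

# Illustrative sketch: a (k+1)-component data-association mixture built from
# the top-k scan-match hypotheses plus a null hypothesis. Weights are set in
# proportion to the matcher's score (per the text); the normalization scheme
# and the null-hypothesis parameters are assumptions for illustration.

def data_association_mixture(matches, null_weight=1e-5, inflate=1e6):
    """matches: list of (relative_pose, information, score) from the matcher.
    Returns a list of (mean, information, weight) components."""
    total = sum(score for _, _, score in matches)
    comps = [(z, L, (1.0 - null_weight) * score / total)
             for z, L, score in matches]
    # Null hypothesis: reuse the best match's mean with a huge covariance.
    z_best, L_best, _ = max(matches, key=lambda m: m[2])
    comps.append((z_best, L_best / inflate, null_weight))
    return comps

matches = [(np.zeros(3), np.eye(3), 0.9), (np.ones(3), np.eye(3), 0.1)]
comps = data_association_mixture(matches)
# three components: two scan-match hypotheses plus the null hypothesis
```

Because only the selected mode affects the solution, adding the k hypotheses this way is quite different from inserting k independent edges, whose effects would be averaged together.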


D. Robustness


We have identified two basic factors that have a significant influence on the success of our method: the number of incorrect loop closures and the node degree of the graph. The node degree is an important factor because it determines how over-determined the system is: it determines the degree to which correct edges can “overpower” incorrect edges. To illustrate the relationship between these factors and the resulting quality of the map (measured in terms of mean


Fig. 9. Effect of error rate and node degree on robustness. We evaluate the quality of posterior maps (in terms of mean squared error) as a function of the percentage of bad edges and the node degree of the graph. Each data point represents the average of 3,000 random trials; the standard error is also plotted showing that the results are significant. The quality of the posterior graph is robust to even high levels of error, and is improved further by problems with a high node degree. Our methods, regardless of settings, dramatically out-perform non-mixture methods (black dotted line).


Fig. 7. Data-association as a mixture. Given a query pose (R), we perform a brute-force scan matching operation against all previous poses. The best 4 scan match results, based on overlap, are added to a max-mixture model that also includes a null hypothesis. The positions of the best matches are shown as circles numbered 1-4, and the corresponding scan alignments are shown at the bottom. Their similarity in appearance reflects a significant degree of perceptual aliasing. The scan matcher finds two correct matches and two incorrect matches; the two correct matches are those numbered 1 and 2 at the bottom of the map, shown in the first two scan alignments.

Fig. 10. Runtime performance. Using the Intel dataset, we plot the time required to compute a posterior map after every pose, using a batch solver. Our Intel dataset contains 875 nodes and 15605 edges, and each edge is modeled as a two-component max-mixture with a null hypothesis. The additional cost of handling mixtures is quite small in comparison to the total computation time.

squared error), we considered a range of loop-closing error rates (from 0% to 100%) for graphs with an average node degree of 4, 8, and 12. Note that an error rate of 80% means that incorrect loop closures outnumber correct loop closures by a ratio of 4:1. In each case, the vehicle’s noisy odometry is also provided. For each condition, we evaluate the performance of our method on 100,000 randomly-generated Manhattan-world graphs (see Fig. 9). Our method produces good maps even when the error rate is very high, and performance improves further with increasing node degree. In contrast, a standard non-mixture approach diverges almost immediately.

E. Runtime Performance

Fig. 8. Intel without front-end loop validation. Our system can identify correct loop closures and compute a posterior map from within a single integrated Bayesian framework (right); the typical front-end loop validation has been replaced with a k + 1 mixture component containing the k best laser scan matches (based purely on overlap) plus a null hypothesis. In this experiment, we used k = 5. For reference, the open-loop trajectory of the robot is given on the left.

The performance of our method is comparable to existing state-of-the-art sparse factorization methods (see Fig. 10). It takes additional time to identify the maximum likelihood mode for each mixture, but this cost is minor in comparison to the cost of solving the resulting linear system.


V. CONCLUSION

We have described a method for performing inference on networks of mixtures, and an application of our method to robot mapping. Our method consists of a novel mixture model based on a “max” operator that makes the computation of the Jacobian and residual fast, and we show how existing sparse factorization methods can be extended to incorporate these mixtures. We believe that such an approach is necessary for long-lived systems, since any system that relies on a zero error rate will fail.

We demonstrate how the mixture model allows null hypotheses and robustified cost functions to be incorporated into a maximum likelihood inference system. We show that our system is robust to error rates far in excess of what can be achieved with existing front-end loop validation methods. We further demonstrate a multi-modal formulation, addressing the “slip or grip” problem and showing that our system can make loop validation unnecessary in some cases.

Our algorithm cannot guarantee convergence to the global optimum, but we characterized the basin of convergence, demonstrating the relationship between error rate, node degree, and convergence to a good solution. Finally, we demonstrate that the performance of our algorithm is similar to that of existing state-of-the-art maximum likelihood systems, and can be easily integrated into state-of-the-art methods [22].

REFERENCES

[1] P. M. Newman, “On the structure and solution of the simultaneous localisation and map building problem,” Ph.D. dissertation, University of Sydney, Australia, 1999.
[2] U. Frese, “A proof for the approximate sparsity of SLAM information matrices,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Barcelona, Spain, April 2005, pp. 331–337.
[3] F. Dellaert, “Square root SAM,” in Proceedings of Robotics: Science and Systems (RSS), Cambridge, USA, June 2005.
[4] R. Smith, M. Self, and P. Cheeseman, “A stochastic map for uncertain spatial relationships,” in Proceedings of the International Symposium of Robotics Research (ISRR), O. Faugeras and G. Giralt, Eds., 1988, pp. 467–474.
[5] J. Neira and J. D. Tardos, “Data association in stochastic mapping using the joint compatibility test,” IEEE Transactions on Robotics and Automation, vol. 17, no. 6, pp. 890–897, December 2001.
[6] T. Bailey, “Mobile robot localisation and mapping in extensive outdoor environments,” Ph.D. dissertation, Australian Centre for Field Robotics, University of Sydney, August 2002.
[7] E. Olson, “Recognizing places using spectrally clustered local matches,” Robotics and Autonomous Systems, 2009.
[8] E. Olson, J. Strom, R. Morton, A. Richardson, P. Ranganathan, R. Goeddel, M. Bulic, J. Crossman, and B. Marinier, “Progress towards multi-robot reconnaissance and the MAGIC 2010 competition,” Journal of Field Robotics, September 2012.
[9] U. Lerner and R. Parr, “Inference in hybrid networks: Theoretical limits and practical algorithms,” in Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-01). San Francisco, CA: Morgan Kaufmann, 2001, pp. 310–331.
[10] M. Montemerlo, “FastSLAM: A factored solution to the simultaneous localization and mapping problem with unknown data association,” Ph.D. dissertation, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, July 2003.
[11] D. Hähnel, W. Burgard, D. Fox, and S. Thrun, “A highly efficient FastSLAM algorithm for generating cyclic maps of large-scale environments from raw laser range measurements,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2003, pp. 206–211.
[12] N. Kwak, I.-K. Kim, H.-C. Lee, and B. H. Lee, “Analysis of resampling process for the particle depletion problem in FastSLAM,” in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2007, pp. 200–205.
[13] C. Stachniss, G. Grisetti, and W. Burgard, “Recovering particle diversity in a Rao-Blackwellized particle filter for SLAM after actively closing loops,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Barcelona, Spain, 2005, pp. 667–672.
[14] T. Bailey, J. Nieto, and E. Nebot, “Consistency of the FastSLAM algorithm,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2006, pp. 424–429.
[15] G. Grisetti, C. Stachniss, and W. Burgard, “Improving grid-based SLAM with Rao-Blackwellized particle filters by adaptive proposals and selective resampling,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Barcelona, April 2005, pp. 2432–2437.
[16] S. S. Blackman, “Multiple hypothesis tracking for multiple target tracking,” IEEE Aerospace and Electronic Systems Magazine, vol. 19, pp. 5–18, 2004.
[17] H. Durrant-Whyte, S. Majumder, S. Thrun, M. de Battista, and S. Scheding, “A Bayesian algorithm for simultaneous localisation and map building,” in Robotics Research, ser. Springer Tracts in Advanced Robotics, R. Jarvis and A. Zelinsky, Eds. Springer Berlin/Heidelberg, 2003, vol. 6, pp. 49–60.
[18] S. Thrun and Y. Liu, “Multi-robot SLAM with sparse extended information filters,” in Proceedings of the International Symposium of Robotics Research (ISRR), Siena, Italy, 2003.
[19] M. Walter, R. Eustice, and J. Leonard, “A provably consistent method for imposing exact sparsity in feature-based SLAM information filters,” in Proceedings of the International Symposium of Robotics Research (ISRR), S. Thrun, R. Brooks, and H. Durrant-Whyte, Eds. San Francisco, CA: Springer, October 2005, pp. 214–234.
[20] R. Eustice, H. Singh, J. Leonard, and M. Walter, “Visually mapping the RMS Titanic: Conservative covariance estimates for SLAM information filters,” International Journal of Robotics Research, vol. 25, no. 12, pp. 1223–1242, December 2006.
[21] F. Dellaert and M. Kaess, “Square root SAM: Simultaneous localization and mapping via square root information smoothing,” International Journal of Robotics Research, vol. 25, no. 12, pp. 1181–1203, December 2006.
[22] M. Kaess, A. Ranganathan, and F. Dellaert, “iSAM: Incremental smoothing and mapping,” IEEE Transactions on Robotics, vol. 24, no. 6, pp. 1365–1378, 2008.
[23] K. Konolige, “Sparse sparse bundle adjustment,” in British Machine Vision Conference, Aberystwyth, Wales, August 2010.
[24] M. Bosse, P. Newman, J. Leonard, and S. Teller, “Simultaneous localization and map building in large-scale cyclic environments using the Atlas framework,” International Journal of Robotics Research, vol. 23, no. 12, pp. 1113–1139, December 2004.
[25] E. Olson, “Robust and efficient robotic mapping,” Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA, June 2008.
[26] N. Sünderhauf and P. Protzel, “Towards a robust back-end for pose graph SLAM,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2012.
[27] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.
[28] H. Strasdat, J. Montiel, and A. Davison, “Scale drift-aware large scale monocular SLAM,” in Proceedings of Robotics: Science and Systems (RSS), vol. 2, no. 3, p. 5, 2010.
[29] G. Sibley, C. Mei, I. Reid, and P. Newman, “Adaptive relative bundle adjustment,” in Proceedings of Robotics: Science and Systems (RSS), Seattle, USA, June 2009.
[30] E. Olson, “Evaluating back-ends: Metrics,” in Automated SLAM Evaluation Workshop, Robotics: Science and Systems, Los Angeles, USA, 2011.
[31] ——, “Real-time correlative scan matching,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, June 2009.


Turning-Rate Selective Control: A New Method for Independent Control of Stress-Engineered MEMS Microrobots

Igor Paprotny∗†∗∗††, Christopher G. Levey‡, Paul K. Wright†§ and Bruce R. Donald¶∗∗

∗ Berkeley Sensor & Actuator Center (BSAC), University of California, Berkeley, CA, USA.
† Center for Information Technology Research in the Interest of Society (CITRIS), University of California, Berkeley, CA, USA.
‡ Thayer School of Engineering, Dartmouth College, Hanover, NH, USA.
§ Department of Mechanical Engineering, University of California, Berkeley, CA, USA.
¶ Department of Computer Science, Duke University, Durham, NC, USA.
Department of Biochemistry, School of Medicine, Duke University Medical Center, Durham, NC, USA.
∗∗ Duke Institute for Brain Sciences, Duke University Medical Center, Durham, NC, USA.
†† Email: [email protected]

Abstract—We present a novel method for independently controlling multiple stress-engineered MEMS microrobots called MicroStressBots through a single, global, control signal. We call this technique Turning-rate Selective Control (TSC). To implement TSC, we fabricated MicroStressBots that exhibit different turning rates. TSC specifically exploits these designed variations in turning rates between individual microrobots to differentiate their motion, and thereby obtain individual control. Thus, even though all robots move simultaneously, and are identical except for their turning rates, TSC can individually and independently position the robots’ centers of rotation within a planar configuration space. This allows the individual robots to be independently maneuverable to within a distance r from an arbitrary desired goal point in the plane. The distance r is the turning radius (approximately half of a microrobot width). We describe the theory behind TSC and, by using fabricated microrobots, show experimental results that confirm the feasibility of TSC for controlling multiple MicroStressBots through a single, global, control signal. Our paper further validates the multi-microrobot control paradigm called Global Control Selective Response that we first introduced in 2007. We conclude by discussing how TSC can extend the maximum number of independently controllable MicroStressBots.

I. INTRODUCTION

The last decade has seen the development of several novel microscale mobile robotic systems, such as electrostatically-driven stress-engineered MEMS microrobots (MicroStressBots) [6], resonating stepping robots [10], stick-slip magnetic walkers [9], and microscrew-based swimmers [11]. Virtually all envisioned microrobotic applications (for example, neurological [1]) rely on the combined actions of many microrobots. However, control of many microrobots through a single global control signal has been demonstrated by only a few groups [7, 4]. Simultaneous control of several microrobots is significantly more challenging than control of a single microrobot because one must overcome the high level of underactuation present in such systems. The high level of underactuation is a result of the limited ability of the microrobots to decode a global control signal. Donald et al. [8] analyzed the control voltage complexity of stress-engineered MEMS microrobots, defined as the number of distinct voltage levels of the control waveform required to independently maneuver n robots. It was shown that MicroStressBots can exhibit sub-linear control voltage complexity (O(√n)) if their steering arms are designed to have Symmetric Electromechanically Saturated (SESat) hysteresis gaps.

In this work, we report on a new method to independently maneuver (independently control) MicroStressBots on a planar substrate. Called Turning-rate Selective Control (TSC), this method allows for independent control of microrobots that are differentiated only by their turning rates. Because MicroStressBots can be designed to have different turning rates, TSC fits within the paradigm of Global Control Selective Response (GCSR) [5], where independent control of multiple microrobots is enabled by engineering the robots to exhibit different behavior during portions of the global control signal. Our TSC idea is very simple, yet highly effective, as our experimental results show (Sec. IV). The mechanism of TSC is both different from and independent of the mechanism of SESat-based control, allowing TSC to complement SESat-based control and increase the maximum number of independently controllable MicroStressBots. We present the concept behind TSC, and show experimental results confirming the feasibility of TSC for enabling independent control of stress-engineered MEMS microrobots. There are no simulations in this paper; as described below, the presented control method was validated (up to proof-of-concept) using physically fabricated MicroStressBots.

The paper is structured as follows: in Sec. II, we introduce the design, the kinematics, and notation for MicroStressBots. Sec. III describes the theory behind TSC, while experimental results confirming its feasibility are presented in Sec. IV. A


concluding discussion summarizing the benefits and limitations of TSC is included in Sec. V.

II. STRESS-ENGINEERED MEMS MICROROBOT (MICROSTRESSBOT)

A scanning-electron microscope (SEM) image of a MicroStressBot is shown in Fig. 1. The robot is approximately 260 μm × 60 μm × 10 μm in size, and has two actuated internal degrees of freedom: an untethered scratch-drive actuator (USDA) that provides forward motion, and a steering-arm actuator that determines whether the robot moves in a straight line or turns. The steering-arm actuator consists of a cantilever beam (80-150 μm long) with a circular pad at its end (20-50 μm in diameter). The robot chassis, including the USDA and the steering-arm actuator, is fabricated from polycrystalline silicon using a MEMS foundry process [3]. Post-processing is used to add a stressor layer to curve the initially planar steering arm out-of-plane. The stressor layer (consisting of evaporated chromium) is lithographically patterned to produce the exact amount of steering-arm deflection.

one side and cannot back up. MicroStressBots can be either right- or left-handed, depending on whether the steering arm is attached to the right or left side of the USDA, respectively.

Fig. 2. Kinematics of the stress-engineered MEMS microrobot. When the steering arm is lowered, the robot turns around a center of rotation cr .
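The kinematics in Fig. 2 can be read as a unicycle model: with the arm raised the robot translates along its heading, and with the arm lowered it rotates about cr at rate ω = v/r. A minimal simulation sketch (our illustration, not the authors' code; the speed value is hypothetical, while the turning radius is R1's measured value from Table I):

```python
# Unicycle sketch of the kinematics in Fig. 2 (our illustration, not the
# authors' code). Arm raised: the robot translates along its heading.
# Arm lowered: it turns about its center of rotation cr at rate omega = v/r.
import math

def step(q, v, dt, r=None, h=1):
    """Advance configuration q = (x, y, theta) for dt seconds.
    r=None selects straight-line motion; otherwise the robot turns with
    radius r (h = +1 for a left-handed robot, -1 for right-handed)."""
    x, y, theta = q
    if r is None:
        return (x + v * dt * math.cos(theta),
                y + v * dt * math.sin(theta),
                theta)
    omega = h * v / r
    # closed-form integration of the circular arc over dt
    return (x + (v / omega) * (math.sin(theta + omega * dt) - math.sin(theta)),
            y - (v / omega) * (math.cos(theta + omega * dt) - math.cos(theta)),
            theta + omega * dt)

q = (0.0, 0.0, 0.0)
q = step(q, v=63.0, dt=1.0)             # straight-motion primitive
q = step(q, v=63.0, dt=1.0, r=136.66)   # turning primitive (r from Table I)
```

Applying a turning primitive leaves the point one turning radius to the robot's side, the center of rotation cr, fixed, which is the property TSC exploits.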

A. Turning Mechanism

Fig. 1. SEM image of a MicroStressBot. The robot consists of an untethered scratch-drive actuator (USDA) that provides forward motion (propulsion), and a steering-arm actuator that controls whether the robot moves in a straight line or turns.

MicroStressBots are actuated on a field of zirconia-insulated interdigitated electrodes. When a voltage waveform is applied between the opposing pairs of these electrodes, an alternating electric potential is capacitively coupled to the microrobot chassis, supplying both power for the actuation of the USDA (propulsion) and a control signal for the actuation of the steering arm. An alternating voltage waveform containing both a steering-arm control signal and a USDA power-delivery waveform is called a control primitive [7]. When the steering arm is elevated and the USDA receives power, the robot moves along a straight-line trajectory. When the steering arm is lowered as the USDA is powered, the robot turns with a turning rate ω around its center of rotation, cr. The kinematics of the MicroStressBots are summarized in Fig. 2. Although a single robot is globally controllable within its R2 × S1 configuration space (C-space) [12], the robot is not small-time locally controllable, since it can only turn to

The interaction between the steering arm and the underlying substrate as the USDA is powered causes the MicroStressBots to follow a curved trajectory, i.e., to turn. The turning mechanism is illustrated in Fig. 3. During pull-down, a portion s of the steering arm comes into flat contact with the substrate (Fig. 3.a). When the USDA is subsequently actuated, s acts as a temporary anchor, restricting the motion of the tip of the steering arm. The robot follows a curved trajectory, flexing the steering arm until the restoring force of the arm equals the force applied by the USDA (Fig. 3.b). When the arm is released during periodic polarity reversal of the AC waveform, the flexure in the arm is relieved, resulting in a net change in the heading θ of the microrobot (Fig. 3.b). The amount of steering-arm flexure is highly dependent on the geometry of the steering-arm actuator, making the corresponding turning rate design-specific.

B. Notation for MicroStressBot Control

Consider a system of n MicroStressBots Ri, i ∈ {1, . . . , n}. Let Pj be a control primitive, drawn from an alphabet Σ of m available control primitives; Σ = {P1, . . . , Pm}. Application of each control primitive Pj to the power-delivery substrate will cause all the MicroStressBots on that substrate to move simultaneously. However, the differences in the designs of their steering-arm actuators may cause individual robots to move along different trajectories (exhibit different behavior). Because each robot Ri can exhibit different behavior during each control primitive Pj, we use a control matrix to define the relationship between the applied control primitives and the resulting microrobot behavior. The m × n control matrix is M = (aji), where aji is a pair containing the velocity vji and the turning radius rji of robot Ri during the application of control primitive Pj; aji = (vji, rji). If robot Ri moves forward in a straight line during the application of Pj, rji


primitive is applied to the substrate. Consequently, the control sequence S defines nominal (i.e., error-free) trajectories Ti, i ∈ {1, . . . , n}, for all n microrobots on the substrate. The nominal trajectories described by S are especially useful for planning the motion of microrobots. The relationship between the control sequence and the trajectories of a system of three MicroStressBots subject to a hypothetical control sequence S and a control matrix M is illustrated in Fig. 4. An orbit is a trajectory that returns the microrobot to its initial configuration. An example of a circular orbit is illustrated in Fig. 4. A cr-orbit is a trajectory that returns the center of rotation of a microrobot to its initial location. All orbits are cr-orbits, but the inverse is not necessarily the case.
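As a concrete illustration of this notation (a sketch under our own assumptions, not code from the paper), the control matrix M = (aji) and a control sequence S can be represented directly. The turning radii below are the measured values from Table I, while the velocities are hypothetical, chosen so that ωi = vi/ri matches the measured turning rates:

```python
# Representing the control matrix M = (a_ji) and a control sequence S
# (a sketch of the notation, not the authors' software). Each entry
# a_ji = (v_ji, r_ji) is robot R_i's velocity and turning radius under
# primitive P_j; r_ji = inf encodes straight-line motion.
import math

M = {
    ("P1", "R1"): (62.9, math.inf),   # both robots drive straight under P1
    ("P1", "R2"): (35.5, math.inf),
    ("P2", "R1"): (62.9, 136.66),     # both robots turn under P2,
    ("P2", "R2"): (35.5, 177.50),     # each with its own design-set radius
}

# A control sequence S = {P1^t1, ..., Pk^tk} as (primitive, duration) pairs:
S = [("P2", 3.0), ("P1", 1.5), ("P2", 3.0)]
total_duration = sum(t for _, t in S)

# Turning rate omega_i = v_i / r_i of each robot under the turning primitive:
omega = {robot: v / r for (p, robot), (v, r) in M.items() if p == "P2"}
```

Applying S drives every robot simultaneously; the differentiated entries of M are what produce different trajectories from the same global sequence.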

Fig. 3. Turning mechanism of the MicroStressBot (top and cross-section views). (a) The steering arm is pulled in contact with the substrate, and a flat region s is temporarily anchored in place by electrostatic forces. (b) The USDA is actuated, causing the steering arm to flex while the robot follows a trajectory with the radius of curvature r. (c) As the arm is released during a polarity reversal of the drive voltage waveform, the flexure of the steering arm is released. This cycle is continuously repeated, causing the robot to turn.

approaches infinity. While turning, the velocity vji and the turning radius rji do not vary much between the control primitives that cause turning, and for the remainder of this paper we shall assume that vi = vji and ri = rji for all j that correspond to turning control primitives. The turning rate of the robots is consequently ωi = vi/ri. We define the center of rotation cr of robot Ri as cr(Ri). To maneuver the microrobots on the power-delivery substrate, control primitives will be applied in some sequence to actuate the robots. All the MicroStressBots on the substrate receive the same single global power-delivery and control signal, and all the robots will move during the application of the control primitives. We define a control sequence S to be a sequence of control primitives

S = {P1^t1, P2^t2, . . . , Pk^tk},   (1)

where Pl^tl, l ∈ {1, . . . , k}, is a control primitive drawn from the alphabet of available control primitives Σ, and tl denotes the duration the voltage waveform described by the control

Fig. 4. Relationship between the control sequence S and trajectories for a group of three microrobots, R1, R2, and R3. Nominal trajectories T1, T2, and T3 for a control sequence S = {P3^a, P1^b, P2^c} are shown. Note that the behavior of R1 and R2 differs only during the application of control primitive P2, while robot R3 always turns, completing an orbit.

C. Interpolated Turning

Interpolated turning is a technique that allows a MicroStressBot to follow a curved trajectory with an effective radius of curvature (r′) that can be set to any value between the turning radius (r) and infinity by interleaving straight-line and turning trajectory segments. The concept of interpolated turning is illustrated in Fig. 5. Let an interpolated turning trajectory TI be defined by the control sequence SI = {P1^a, P2^b, · · · , P1^a, P2^b}, where P1 and P2 are straight-motion and turning control primitives, respectively. Let ρa and ρb be the fractions of the time primitives P1 and P2 are applied in the control sequence SI; ρa = a/(a+b) and ρb = b/(a+b). The trajectory TI will now approximate a turning trajectory with a radius of curvature r′, r ≤ r′ < ∞, according to

r′ = r (1 + ρa/ρb).   (2)
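Eq. (2) can be inverted to choose the interpolation fractions for a desired effective radius. A small sketch (our illustration; the target radius of 300 μm is arbitrary, and r = 136.66 μm is R1's measured turning radius from Table I):

```python
# Inverting Eq. (2) to pick interpolation fractions for a desired effective
# radius (a sketch, not the authors' planner).

def interpolation_fractions(r, r_prime):
    """Return (rho_a, rho_b), the fractions of time spent on the straight
    primitive P1 and the turning primitive P2, so that the interpolated
    trajectory has effective radius r_prime = r * (1 + rho_a / rho_b)."""
    if r_prime < r:
        raise ValueError("effective radius cannot be below the turning radius")
    ratio = r_prime / r - 1.0      # rho_a / rho_b implied by Eq. (2)
    rho_b = 1.0 / (1.0 + ratio)    # normalize so rho_a + rho_b = 1
    return 1.0 - rho_b, rho_b

rho_a, rho_b = interpolation_fractions(r=136.66, r_prime=300.0)
effective_radius = 136.66 * (1.0 + rho_a / rho_b)   # reproduces Eq. (2)
```

Note that r_prime = r gives ρa = 0 (pure turning), and ρb → 0 drives the effective radius toward infinity (pure straight-line motion), matching the limits stated above.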


Fig. 5. Interpolated turning. The robot follows an interpolated trajectory TI defined by the control sequence SI = {P1^a, P2^b, · · · , P1^a, P2^b}, where P1 and P2 are straight-motion and turning control primitives, respectively. ρa = a/(a+b) and ρb = b/(a+b), and the trajectory of SI approximates a turning trajectory with radius of curvature r′ according to Eq. (2).

The approximation improves as the number of primitive pairs P1-P2 repeated within SI increases. In our experiments, the P1-P2 pair was merged into a new control primitive in which the pair was repeated more than 20 times, producing a good approximation to r′.

III. TURNING-RATE SELECTIVE CONTROL (TSC)

The ability to vary the turning rates by varying the steering-arm designs, as well as to adjust the effective turning radius r′ during interpolated turning, allows us to propose a new method for independent control of MicroStressBots. Called Turning-rate Selective Control (TSC), this method uses a novel mechanism to achieve independent control of MicroStressBots, different from our previously presented approaches. In TSC, the microrobots are differentiated only through design-induced variation of their turning rates, and not through selective pull-down and release of their steering-arm actuators [7]. This implies that during TSC, all the robots are either turning or moving straight at any given time. TSC is based on engineering orbit trajectories that maneuver some robots towards their goals, while other robots are returned to their starting configurations. By combining several orbits, TSC can independently translate the centers of rotation cr of the individual robots in R2. (During TSC all robots are turning at exactly the same time, and hence TSC can independently maneuver only the centers of rotation.)

Consider two orbits that follow a turning trajectory and an interpolated turning trajectory, shown in Figs. 6.a and 6.b, respectively. When the robot is turning, the center of rotation cr remains in the same location in R2, as illustrated in Fig. 6.a. However, when the robot is following an interpolated turning trajectory, the center of rotation cr is translated along a curved path with radius of curvature rc = r′ − r. When the robot completes this orbit, cr returns to its starting location (Fig. 6.b).
Because each MicroStressBot makes a full rotation when it completes an orbit, MicroStressBots with different turning rates will complete their orbits at different times. Robots that


Fig. 6. Orbit trajectories. (a) During an orbit following a turning trajectory, the microrobot rotates with radius r around its center of rotation cr. The location of cr does not change (the orientation θ does change; cr is turning in place). (b) During an orbit based on an interpolated turning trajectory, the microrobot follows a trajectory with a radius of curvature r′ = r + rc. The center of rotation cr follows a circular trajectory (its own orbit) with radius rc, and returns to its starting location.

are maneuvered along interpolated turning trajectories and do not complete an orbit are translated in C-space. Fig. 7 illustrates the concept of TSC. Consider robots R1 and R2 designed such that ω2 = (1/3)ω1. An orbit for R1 based on an interpolated turning trajectory translates the center of rotation cr(R2) of robot R2 by a vector γ. The vector γ can be expressed as

γ = rc ( sin θ2 − sin(θ2 + 2πh ω2/ω1),  cos(θ2 + 2πh ω2/ω1) − cos θ2 ),   (3)

where θ2 is the orientation of robot R2, h denotes the side the steering arm is attached to (h = 1 when the robot is left-handed, h = −1 when the robot is right-handed), rc = r2′ − r2, where r2 and r2′ are the turning radius and the interpolated turning radius (from Eq. 2), respectively, of robot R2, and ω1 and ω2 are the turning rates of robots R1 and R2, respectively. Note that in Fig. 7, as well as the other figures in this section, the length of the trajectories is not to scale. The starting and final locations of the centers of rotation cr for the robots are marked by (S)- and (E)-labeled dots, respectively. Overlapping dots indicate no change in location. In addition, where appropriate, grey color denotes past trajectories and intermediate configurations. After translating the center of rotation cr(R2) by displacement vector γ, cr(R2) can be translated by a second linearly independent displacement vector γ2 (see Fig. 8). The vectors γ and γ2 effectively allow cr(R2) to be translated anywhere within R2 without moving the center of rotation cr(R1). Correspondingly, the center of rotation cr(R1) of robot R1 can be translated while robot R2 completes an orbit, and cr(R2) returns to its starting location. One approach is to initially move robot R2 using interpolated turning, and then construct a non-orbit trajectory that maneuvers cr(R2) back to its original location, while displacing robot R1 by vector η. This approach is illustrated in Fig. 9.
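Taking Eq. (3) as printed above (our reading of the typeset equation), the per-orbit displacement of cr(R2) can be evaluated numerically. In the sketch below, rc = 40 μm is a hypothetical value, and ω2 = ω1/3 follows the example in the text:

```python
# Evaluating the displacement gamma of cr(R2) accumulated while R1 completes
# one orbit, per Eq. (3) (a numerical illustration, not the authors' code).
import math

def gamma(theta2, h, rc, omega1, omega2):
    """Displacement of cr(R2) per orbit of R1 (Eq. 3)."""
    phi = theta2 + 2.0 * math.pi * h * omega2 / omega1
    return (rc * (math.sin(theta2) - math.sin(phi)),
            rc * (math.cos(phi) - math.cos(theta2)))

gx, gy = gamma(theta2=0.0, h=1, rc=40.0, omega1=0.46, omega2=0.46 / 3.0)
# One orbit of R1 rotates R2 by 2*pi*omega2/omega1, so cr(R2) sweeps that
# angle along a circle of radius rc; |gamma| is the chord of this arc,
# 2 * rc * sin(pi * omega2 / omega1).
```

The chord interpretation makes the limits intuitive: if ω2/ω1 were an integer, R2 would complete its own orbit and γ would vanish, so TSC relies on the turning-rate ratio being non-integral.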


Fig. 7. The concept of Turning-rate Selective Control (TSC). Trajectories of two robots R1 and R2 with different turning rates: ω2 = (1/3)ω1. As R1 follows a circular orbit trajectory based on interpolated turning, the center of rotation cr(R2) of robot R2 is displaced by vector γ.

Fig. 9. Translating robot R1 and not robot R2. (a) Initially, the center of rotation cr(R2) is translated by vector γ using an interpolated turning trajectory while robot R1 completes an orbit. (b) Both robots are then rotated such that the center of rotation cr(R2) can return to its starting location using straight-line motion along vector γ′ (bottom), while robot R1 and its center of rotation cr(R1) are both translated by vector η (top).

TABLE I. EXPERIMENTALLY MEASURED TURNING RATES ω AND TURNING RADII r FOR FABRICATED MICROROBOTS R1 AND R2


Fig. 8. Translating the center of rotation cr(R2) by two displacement vectors that span R2. (a) Translating cr(R2) by vector γ, while robot R1 completes an orbit. (b) Both robots turn (non-interpolated) to an arbitrary angle, and cr(R2) is translated by another vector γ2, while robot R1 completes another orbit. The angle between the vectors, as well as their lengths, can be varied arbitrarily, allowing cr(R2) to be translated anywhere in R2.

A. Extension to n Microrobots

TSC can be extended to independently control n MicroStressBots by individually translating a single robot while the remaining n − 1 robots complete their orbits. Consider adding a third microrobot R3 with turning rate ω3 = (2/3)ω1 to the system shown in Fig. 7. The center of rotation cr(R3), and not cr(R2), is translated by engineering the displacement vectors γ and γ2 such that they cancel each other, resulting in a non-circular cr-orbit that returns the center of rotation cr(R2) to its starting location. Because all the trajectories complete orbits for robot R1, its center of rotation, cr(R1), also remains in its starting location. The center of rotation cr(R3) is translated by a vector ζ3. The trajectories for this implementation of

Robot    ω [rad/s]      r [μm]
R1       0.46 (0.05)    136.66 (7.6)
R2       0.20 (0.01)    177.50 (5.0)

TSC on three robots are shown in Fig. 10. By induction, two displacement vectors ζ3 and ζ3′ can be combined to form a cr-orbit for robot R3, allowing the translation of the center of rotation of a fourth robot R4, and so on. We will use the shorthand TSC(n) to denote TSC of n > 2 microrobots.

IV. EXPERIMENTAL RESULTS

We fabricated two MicroStressBots designed to exhibit different turning rates. The designs and optical micrographs of the two robots are shown in Figs. 11 and 12, respectively. L and w correspond to the length and width of the steering arm, respectively; d is the diameter of the steering-arm pad, and c is the length of the stressor layer applied to the steering arm. The large difference in the turning rates of the robots was experimentally verified. Measured average turning rates ω and turning radii r for both robots are shown in Tab. I. Values in parentheses show the standard deviations. Differences in turning rates are also clearly visible in the optical micrographs in Fig. 13. The four images were obtained at times t = 2, 6, 8, 12 seconds from the start of the experiment and include traces of the completed trajectories (black) for the two robots turning. Fig. 14 shows the basic implementation of TSC on MicroStressBots R1 and R2, where one of the robots is displaced



Fig. 12.


Design and optical micrograph (inset) of robot R2 .


Fig. 10. Extending TSC to three MicroStressBots. A third robot R3 (ω3 = (2/3)ω1) is added to the system from Fig. 7. (a) The robots are actuated along interleaved trajectories that result in an orbit of R1 (top), while the centers of rotation cr(R2) and cr(R3) are translated by vectors γ (middle) and ζ1 (bottom), respectively. (b) The robots rotate (solid-line trajectory) such that an interleaved turning trajectory translates the center of rotation cr(R2) back to its starting location (middle). The center of rotation cr(R3) is translated by a vector ζ2 (bottom), while robot R1 completes another orbit. The overall displacement of cr(R3) is ζ3 = ζ1 + ζ2, while the locations of cr(R1) and cr(R2) remain unchanged.


Fig. 13. Optical micrographs of robots R1 and R2 at times t = 2, 6, 8, 12 seconds while following turning trajectories. Traces of the completed trajectories for both robots are shown in black.


Fig. 11.


Design and optical micrograph (inset) of robot R1 .

while the second robot orbits back to its initial configuration, analogous to the concept illustrated in Fig. 7. The center of rotation cr (R2 ) is translated by vector γ using an interpolated turning trajectory while robot R1 completes an orbit. Control

error contributed to a slight displacement of the center of rotation cr(R1). Fig. 15 shows the implementation of TSC where the center of rotation of one robot is translated in R2 by two independent vectors, analogous to the concept illustrated in Fig. 8. The center of rotation for robot R2, cr(R2), is translated by two vectors γ and γ2. Robot R1 completes two closed-loop orbits during this experiment. Again, a slight drift in the center of rotation cr(R1) is apparent due to control error. During the experiment, robot R2 moves partially outside the view of the optical microscope. A movie of this experiment is available online [13]. Fig. 16 shows the implementation of TSC to independently translate the robot with the larger turning rate, analogous to the implementation illustrated in Fig. 9. The steps in this particular implementation were conducted in the reverse order from that suggested in Sec. III. In this implementation,



Fig. 14. Composite optical micrograph showing the basic implementation of TSC. Robots R1 and R2 follow interpolated turning trajectories. As robot R1 completes an orbit and returns to its starting configuration, the center of rotation of robot R2, cr(R2), is translated by a vector γ.


robot R2 was first translated using a combination of turning and straight-line trajectories, displacing the center of rotation cr(R1) by vector η (Fig. 16.a). Consequently, the center of rotation cr(R2) was translated back to its starting location, completing a cr-orbit for R2. Also in this experiment, a slight drift in the location of cr(R2) can be observed. In practice, this control error will define the ultimate accuracy with which TSC can be implemented, analogous to the forward projections described in [8]. A movie of this experiment is available online [13].

V. CONCLUSION

We presented a new method for independent control of stress-engineered MEMS microrobots (MicroStressBots). Called Turning-rate Selective Control (TSC), this method allows for the independent translation of the centers of rotation cr of MicroStressBots that are identical except for their turning rates. This allows the individual robots to be maneuvered independently to within the turning radius (r) of arbitrary locations in R2. The concept of TSC was experimentally validated using physically fabricated MicroStressBots. The mechanism of TSC is both different from, and independent of, SESat-based control [8]. This allows TSC to augment SESat and increase the number of microrobots that can be independently controlled using a single, global, control signal. When properly designed, n MicroStressBots can be controlled using 2√n independent control voltage levels [8]. Consider a hybrid SESat-TSC(u) system containing n = u × w robots, where u is the number of groups of w SESat MicroStressBots that are differentiated through their u different turning rates. The w robots within a group are controlled using SESat, while the robots between the u groups are controlled using TSC. Such a SESat-TSC(u) system can independently control n robots using the same number of control voltage levels as a w-robot SESat system, and has control voltage complexity of 2√(n/u).
For example, suppose one wants to independently control 100 MicroStressBots through a single global control signal.


Fig. 15. Composite optical micrographs showing the translation of the center of rotation for robot R2 , cr (R2 ), by two linearly independent vectors. (a) Both robots follow interpolated turning trajectories. The center of rotation cr (R2 ) is displaced by vector γ while robot R1 completes an orbit. (b) Both robots turn (turning trajectories are denoted by dotted lines). During turning, the centers of rotation of both robots remain in place, allowing the angle between γ and γ2 to be set to an arbitrary value. The center of rotation for robot R2 , cr (R2 ), is then translated using interpolated turning by a second vector γ2 , as R2 moves outside of the view of the microscope (intermediate configurations of robot R2 are included for clarity). The (E) dot and arrow point to the approximate location of cr (R2 ) after the completion of the experiment.

According to SESat-based control, one would need 20 control voltage levels to achieve independent control. Suppose we divide the 100 microrobots into four groups, each containing 25 robots that are independently controlled using SESat, while robots between the groups are differentiated using TSC. The robots within this hybrid SESat-TSC(4) system can now be controlled using only 10 independent control voltage levels. TSC can only maneuver the centers of rotation for the robots in R2; however, the SESat system can decouple the rotation of the individual robots. It would be interesting to analyze whether a SESat-TSC(n) system can be engineered to independently maneuver the robots in R2 × S1. Because TSC cannot selectively snap down or release the steering arms of the individual robots, the robots either all move in a straight line or all turn at the same time. Consequently, the orientations θ of the robots controlled by TSC remain coupled, i.e., they cannot be independently controlled.
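The voltage-level counts in the example above follow directly from the stated complexities (plain arithmetic, not code from the paper):

```python
# Checking the quoted control voltage levels: pure SESat needs 2*sqrt(n)
# levels for n robots, and a hybrid SESat-TSC(u) system with n = u * w
# robots needs 2*sqrt(n / u), i.e. the cost of a single w-robot SESat group.
import math

def sesat_levels(n):
    return 2.0 * math.sqrt(n)

def hybrid_levels(n, u):
    assert n % u == 0        # n robots split evenly into u turning-rate groups
    return 2.0 * math.sqrt(n / u)

levels_plain = sesat_levels(100)        # pure SESat for 100 robots
levels_hybrid = hybrid_levels(100, 4)   # SESat-TSC(4): w = 25 robots per group
```

With n = 100 and u = 4 this reproduces the 20-level versus 10-level comparison in the text.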


NIH, and 2000-DT-CX-K001 to B.R.D., from the Office for Domestic Preparedness, Department of Homeland Security, USA, and funded in part by the Center for Information Technology Research in the Interest of Society (CITRIS). The authors wish to thank D. Balkcom, C. Bailey-Kellogg, and Kris Pister for their advice and suggestions. The electron micrograph was taken at the Dartmouth Ripple Electron Microscopy Laboratory, with the help of C.P. Daghlian.


REFERENCES


Fig. 16. Composite optical micrographs showing the implementation of TSC to independently translate the robot with the larger turning rate (R1). (a) The center of rotation of robot R2, cr(R2), is translated using a combination of turning and straight-line trajectories (denoted by dotted lines), and is displaced by a vector γ. The resulting trajectory of robot R1 does not form an orbit, causing the center of rotation cr(R1) to be translated by vector η. (b) The center of rotation of robot R2, cr(R2), is translated back to its starting location by vector γ′ using interpolated turning, completing a cr-orbit for R2. Because the interpolated turning trajectory completes a cr-orbit for robot R1, the center of rotation cr(R1) maintains its displacement by vector η from step (a).

Because r > 0, the MicroStressBots do not turn in place. To approximate desired configurations in R2 × S1 , the robots must orbit around their centers of rotation (cr ) until mutual differences in their turning rates cause their orientations to align close to the desired configurations. Hence, there is a trade-off between the lengths of the robot trajectories and the resulting proximity to the goal configurations, which is especially important when we consider perturbations due to the effects of control error. It would be interesting to fully investigate such trade-offs in future work. Finally, related theoretical ideas have been also presented in e.g. [2], and it would be interesting to investigate how these methods fit with the SESat-TSC framework.

[1] I. V. Borzenets, I. Yoon, M. W. Prior, B. R. Donald, R. D. Mooney, and G. Finkelstein. Ultra-sharp metal and nanotubebased probes for applications in scanning microscopy and neural recording. Journal of Applied Physics, 111(7):074703, 2012. [2] T. Bretl. Control of many agents using few instructions. In the Proceedings of Robotics: Science and Systems 2007 conference, Atlanta, GA, 2007. [3] A. Cowen, B. Hardy, R. Mahadevan, and S. Wilcenski. PolyMUMPs Design Handbook. MEMSCAP, 2011. [4] E. Diller, C. Pawashe, S. Floyd, and M. Sitti. Assembly and disassembly of magnetic mobile micro-robots towards deterministic 2-D reconfigurable micro-systems. The International Journal of Robotics Research, 30(14):1667–1680, December 2011. [5] B. R. Donald. Building very small mobile micro robots. Inaugural Lecture, Nanotechnology Public Lecture Series, MIT (Research Laboratory for Electronics, Electrical Engineering and Computer Science, and Microsystems Technology, Laboratories). Cambridge, MA., April 2007. avaliable online at: http://video.mit.edu/watch/building-very-small-mobilemicrorobots-9273/. [6] B. R. Donald, C. G. Levey, C. McGray, I. Paprotny, and D. Rus. An untethered, electrostatic, globally-controllable MEMS micro-robot. Journal of Microelectromechanical Systems, 15(1):1–15, January 2006. [7] B. R. Donald, C. G. Levey, and I. Paprotny. Planar microassembly by parallel actuation of MEMS microrobots. Journal of Microelectromechanical Systems, 17(4):789–808, August 2008. [8] B. R. Donald, C. G. Levey, I. Paprotny, and D. Rus. Simultaneous control of multiple mems microrobots. In the proceedings of the 9th International Workshop on Algorithmic Foundations of Robotics (WAFR), December 2008, and also in Springer Tracts on Advanced Robotics, 57, G. S. Chirikjian, H. Choset, M. Morales, and T. Murphey (eds.), Springer-Verlag (Berlin), (2010). [9] S. Floyd, C. Pawashe, and M. Sitti. An untethered magnetically actuated micro-robot capable of motion on arbitrary surfaces. 
In the Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2008. [10] D. R. Frutiger, B. E. Kratochvil, K. Vollmers, and B. J. Nelson. Magmites wireless resonant magnetic microrobots. In the Proceedings of IEEE International Conference on Robotics and Automation (ICRA), May 2008. [11] A. Ghosh and P. Fischer. Controlled propulsion of artificial magnetic nanostructured propellers. Nano letters, 9(6):2243– 2245, 2009. [12] S. LaValle. Planning Algorithms. Cambridge University Press, 2006. [13] I. Paprotny, C. G. Levey, P. K. Wright, and B. R. Donald. Turnig-rate Selective Control (TSC). [Online]. Avaliable: http://youtu.be/aZMbf6lrEMA, 2012.

ACKNOWLEDGMENTS This work was supported by a grant to B.R.D. from the Duke Institute of Brain Sciences, grant numbers GM-65982 to B.R.D. from

328

Affine Trajectory Deformation for Redundant Manipulators Quang-Cuong Pham and Yoshihiko Nakamura Department of Mechano-Informatics University of Tokyo, Japan

Abstract—We propose a new method to smoothly deform trajectories of redundant manipulators in order to deal with unforeseen perturbations or to retarget captured motions into new environments. This method is based on the recently-developed affine deformation framework, which offers such advantages as closed-form solutions, one-step computation and no trajectory re-integration. Satisfaction of inequality constraints and dynamics optimization are seamlessly integrated into the framework. Applications of the method to interactive motion editing and motion transfer to humanoid robots are presented. Building on these developments, we offer a brief discussion of the concept of redundancy from the viewpoint of group theory.

I. INTRODUCTION

Planning trajectories for highly redundant manipulators is challenging and time-consuming because of the large number of degrees of freedom associated with these systems [7]. As a consequence, in order to deal with unforeseen obstacles or with perturbations of the target or of the system state, it is sometimes more advantageous to deform a previously planned trajectory than to recompute an entirely new one. In motion-capture-based applications, deforming captured trajectories – e.g. to adapt them to a different environment, to retarget them to a different character [10, 2, 6], or to transfer them to a humanoid robot [16, 14] – is the only viable option, as one cannot reasonably record beforehand all the motions with the desired kinematic and dynamic properties.

A. Some desirable properties of the deformations

Our aim in the present article is to design a trajectory deformation algorithm that satisfies the following three requirements: accuracy, smoothness and optimality. "Accuracy" simply means that the deformed trajectory should attain the objective for which the deformation has been conceived: for instance, to reach exactly a new configuration, with a specified velocity, acceleration, etc. By "smoothness", we mean that the deformed trajectory should conserve the regularity properties (such as being C¹, C² or more generally C^p) of the original trajectory. This is particularly critical, for instance, in the context of computer graphics to avoid motion jerkiness or, in the context of robot control, to avoid infinite torques. By "optimality", we mean that one should have the possibility to choose a deformation that optimizes certain criteria, so long as this optimization does not interfere with the previous requirements of accuracy and smoothness. One typical objective can be, for instance, to minimize the distance between the deformed trajectory and the original one. This is obviously desirable in the case of motion retargeting [10, 2, 6] or motion transfer to humanoid robots [16, 14]. In the context of robot control, the original trajectory is often obtained through extensive computations (e.g. to minimize energy consumption, to avoid obstacles, to respect joint limits, etc.); therefore, if the deformed trajectory is "close" to the original one, one can expect the former to keep the good properties of the latter. The deformation algorithm should also leave the possibility for specifying explicitly other optimization objectives, such as integrated torque, energy consumption, distance to obstacles, etc.

B. Existing approaches

Two main approaches exist in the literature for handling trajectory deformations. In spline-based approaches, a deformation is made by altering the coefficients multiplying the basis splines [13] or by adding to the original trajectory a displacement map – which is a sum of splines [2, 6]. These modifications can furthermore be done in a coarse-to-fine manner using wavelet bases [13] or through hierarchical approximations [6]. Another approach is based on the encoding of the original trajectory by an autonomous nonlinear dynamical system [3, 14]. A deformation is then made by altering the coefficients multiplying the basis functions that appear in the definition of the dynamical system. This approach yields a robust execution-time behavior thanks to the autonomous nature of the dynamical system. At the same time, because of this dynamical-system representation, inequality constraints (such as joint limits or obstacle avoidance) and equality constraints (such as a specified final velocity, acceleration, etc.) at specific time instants cannot be taken into account without integrating the whole trajectory up to these time instants, which can be very costly.

The above two approaches are similar in that they make use of exogenous basis functions: splines in the spline-based approach, and Gaussian kernel functions in the dynamical-system-based approach. A first, pervasive, difficulty then consists of choosing the appropriate bases for a particular task. Second, adding artificial functions to a natural movement can produce undesirable behaviors, such as large undulations in the case of splines [6, 13], lack of smoothness, etc. – which call for supplementary and often costly efforts to correct.

C. Our approach

Unlike the spline-based and the dynamical-system-based approaches, the method we propose makes use of no exogenous basis function. Indeed, inspired by the recent finding that human movements are – to some extent – invariant under


affine transformations [1], we deform a given trajectory by applying affine transformations on parts of it. Thus, the only "basis functions" are the original joint angle time series and, as a consequence, any deformed trajectory under this scheme automatically preserves some properties of the original one. Such invariant properties may include, for instance: smoothness, absence of large undulations, piece-wise parabolicity (which is particularly important for industrial robots), affine velocity, frequency spectrum, etc., or, more qualitatively, motion "naturalness" [18], which is a desirable feature in motion retargeting applications, yet difficult to quantify. We shall show that, despite this small basis of functions, it is still possible to satisfy the requirements of "accuracy", "smoothness" and "optimality" stated earlier by leveraging the extra redundancy offered by the affine transformations.

More precisely, our approach is based on the affine trajectory deformation framework first developed for nonholonomic mobile robots [8, 9]. In contrast with previous trajectory deformation methods for mobile robots [5, 11], affine-geometry-based algorithms are exact (given by closed-form expressions), can be executed in one step, and do not require any trajectory re-integration. The present manuscript focuses on manipulators with many degrees of freedom, but some of the new developments presented here (about e.g. inequality constraints or dynamics optimization) should benefit all classes of systems for which affine deformations are applicable.

In section II, we present the main algorithms of affine trajectory deformation for redundant manipulators. Closed-form expressions of the deformations that optimize the closeness between the original and the deformed trajectories are given, and explicit bounds on the dissimilarity between the original and the deformed trajectories, or between their derivatives, are presented. Inequality constraints, dynamics optimization, and kinematics filtering are integrated into the affine deformation framework. In section III, we illustrate these results with two simple concrete examples: interactive motion editing and motion transfer to humanoid robots. In section IV, we give a characterization of trajectory redundancy by the group of admissible deformations, revisiting thereby the concept of kinematic redundancy and suggesting a new theoretical approach to motion planning.

II. AFFINE TRAJECTORY DEFORMATIONS

A. Affine spaces and affine transformations

An affine space is a set A together with a group action of a vector space W. An element w ∈ W transforms a point θ ∈ A into another point θ′ by θ′ = θ + w, which can also be written θ′ − θ = w. Given a point θ₀ ∈ A (the origin), an affine transformation F of the affine space can be defined by a couple (w, M), where w ∈ W and M is a non-singular linear map W → W. The transformation F acts on A by

∀θ ∈ A,  F(θ) = θ₀ + M(θ − θ₀) + w.

Note that, if θ₀ is a fixed-point of F, then F can be written in the form

∀θ ∈ A,  F(θ) = θ₀ + M(θ − θ₀).
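To make the fixed-point form above concrete, here is a minimal numerical sketch (the 3-DOF trajectory, the deformation instant τ, and the particular map M are all invented for illustration, not taken from the paper). M is built as a rank-one update orthogonal to the velocity u₁ at τ, so that it fixes u₁ — anticipating the smoothness conditions of section II-B — and the deformed trajectory is checked numerically to remain C¹ at τ:

```python
import numpy as np

# Minimal sketch of a fixed-point affine deformation (all values invented).
# theta is a hypothetical 3-DOF C^1 trajectory on [0, 1], deformed at tau.
theta = lambda t: np.stack([np.sin(t), np.asarray(t) ** 2, np.cos(2 * t)], axis=-1)
dtheta = lambda t: np.array([np.cos(t), 2 * t, -2 * np.sin(2 * t)])
tau = 0.4

u1 = dtheta(tau)                        # velocity at the deformation instant
v = np.array([0.0, 0.0, 1.0])
v = v - (v @ u1) / (u1 @ u1) * u1       # make v orthogonal to u1
M = np.eye(3) + 0.3 * np.outer(np.array([1.0, -1.0, 0.5]), v)
assert np.allclose(M @ u1, u1)          # M fixes the velocity at tau

def deformed(t):
    """theta'(t): identity before tau, fixed-point affine map after tau."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = theta(t).copy()
    late = t >= tau
    out[late] = theta(tau) + (theta(t[late]) - theta(tau)) @ M.T
    return out

# Numerical check of C^1 continuity at tau: value and velocity both match.
eps = 1e-6
assert np.allclose(deformed(tau - eps), deformed(tau + eps), atol=1e-4)
vel_minus = (deformed(tau) - deformed(tau - eps)) / eps
vel_plus = (deformed(tau + eps) - deformed(tau)) / eps
assert np.allclose(vel_minus, vel_plus, atol=1e-3)
```

Because θ(τ) is a fixed point and M leaves u₁ unchanged, both the value and the one-sided derivatives agree at τ, which is exactly the C¹-preserving condition formalized in section II-B.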

B. Equality constraints on the deformations

1) Smoothness constraints at the deformation time instant: Consider a C^p trajectory θ(t)_{t∈[0,T]} and an affine transformation F that deforms θ(t)_{t∈[0,T]} into θ′(t)_{t∈[0,T]} at a time instant τ, i.e.

θ′(t) = θ(t)      ∀t < τ,
θ′(t) = F(θ(t))   ∀t ≥ τ.

We say that F is C^p-preserving if the resulting θ′(t)_{t∈[0,T]} is also C^p. From [8], we know that F is C^p-preserving if and only if (i) θ(τ) is a fixed-point of F and (ii) the first p derivatives of θ′ are continuous at τ. Condition (i) implies that F is of the form

∀φ,  F(φ) = θ(τ) + M(φ − θ(τ)),   (1)

where M is a non-singular linear map A → A. Next, for i = 1…p, let us write u_i = d^iθ/dt^i(τ) (u_1 is the velocity vector at time τ, u_2 is the acceleration vector, etc.). Then condition (ii) can be formulated as

∀i = 1…p,  M(u_i) = u_i.   (2)

Let us now consider the non-degenerate case when u_1, …, u_p are linearly independent. In this case, it is possible to construct an orthonormal basis B whose first p vectors form a basis of U = Span(u_1, …, u_p), for instance using a Gram–Schmidt orthonormalization procedure. From equation (2), the matrix of M in this basis is of the form

M = ⎛ 1 ⋯ 0   m_{1,p+1}        ⋯  m_{1,n}      ⎞
    ⎜ ⋮ ⋱ ⋮   ⋮                   ⋮            ⎟
    ⎜ 0 ⋯ 1   m_{p,p+1}        ⋯  m_{p,n}      ⎟
    ⎜ 0 ⋯ 0   1 + m_{p+1,p+1}  ⋯  m_{p+1,n}    ⎟
    ⎜ ⋮   ⋮   ⋮                ⋱  ⋮            ⎟
    ⎝ 0 ⋯ 0   m_{n,p+1}        ⋯  1 + m_{n,n}  ⎠,

which shows that the space of C^p-preserving affine transformations at τ forms a Lie subgroup of the General Affine group GA_n of dimension n(n − p) = n² − pn, parameterized by the m_{i,j}.

2) Constraints on the final configuration: Regarding the "accuracy" requirement, let us now characterize, within the space of C^p-preserving deformations, those that allow attaining a desired final position θ_d, i.e., such that θ′(T) = F(θ(T)) = θ_d. Let (λ_1, …, λ_n) and (μ_1, …, μ_n) denote respectively the coordinates of θ(T) − θ(τ) and of θ_d − θ(τ) in the basis B. Then, the condition for the deformed trajectory to reach the desired configuration θ_d at time T is (see Fig. 1A for an illustration)

M (λ_1 … λ_n)⊤ = (μ_1 … μ_n)⊤.   (3)

Equation (3) can actually be transformed into

V m = δ,   (4)

where V is the n × n(n − p) matrix defined by




V = ⎛ λ_{p+1} ⋯ λ_n   0 ⋯ 0           ⋯   0 ⋯ 0           ⎞
    ⎜ 0 ⋯ 0           λ_{p+1} ⋯ λ_n   ⋯   0 ⋯ 0           ⎟
    ⎜ ⋮                               ⋱   ⋮               ⎟
    ⎝ 0 ⋯ 0           0 ⋯ 0           ⋯   λ_{p+1} ⋯ λ_n   ⎠,   (5)

and m and δ are the vectors of sizes n(n − p) × 1 and n × 1 defined by

m = (m_{1,p+1}, …, m_{1,n}, …, m_{n,p+1}, …, m_{n,n})⊤  and  δ = (δ_1, …, δ_n)⊤ = (μ_1 − λ_1, …, μ_n − λ_n)⊤.

Note from the above definition that δ represents in fact the coordinates of θ_d − θ(T) in the basis B.

3) Constraints on the final derivatives: Assume for instance that one wishes to arrive at the final configuration with a desired velocity v_d. Then, the matrix of the transformation must satisfy, in addition to equation (3), the following equation

M (λ^v_1 … λ^v_n)⊤ = (μ^v_1 … μ^v_n)⊤,   (6)

where (λ^v_1, …, λ^v_n) and (μ^v_1, …, μ^v_n) denote respectively the coordinates of θ̇(T) and of v_d in the basis B. Next, equation (4) becomes

V_g m = δ_g,   (7)

where V_g is a 2n × n(n − p) matrix constructed by adding below V an n × n(n − p) matrix similar to V but containing the λ^v_i instead of the λ_i, and where δ_g is a 2n × 1 vector defined by δ_g = (μ_1 − λ_1, …, μ_n − λ_n, μ^v_1 − λ^v_1, …, μ^v_n − λ^v_n)⊤.

More generally, satisfying k final constraints (k = 3 for instance if one wants to constrain the final position, velocity, and acceleration) will give rise to a matrix V_g of size kn × n(n − p). From the dimensions of the matrices of the linear system (7), one can thus draw the following conclusions:
• If n² − pn ≥ kn (i.e. n ≥ k + p), then it is possible to deform θ into a θ′ that satisfies k final constraints while guaranteeing C^p continuity.
• Furthermore, if n > k + p, then such a θ′ is not unique. In fact, the space of affine transformations that satisfy the above conditions constitutes a Lie subgroup of GA_n of dimension n² − (k + p)n. Within this space, one can then choose the deformations that optimize certain criteria, as detailed in the following sections and summarized in Fig. 1B.

Fig. 1. A: Illustration for the case n = 2 (system of dimension 2) and p = 1 (C¹ continuity, i.e. continuity of the velocity, is required). B: Flowchart of the deformation algorithm.

C. Minimum-norm deformations

1) Closed-form solution: As stated in the Introduction, one important optimization objective for the deformed trajectory can consist of being the "closest" possible to the original trajectory. One way to achieve this is to minimize the distance between the transformation F and the identity transformation, which in turn can be quantified by the Frobenius distance between the matrix M and I. The Frobenius distance between M and I is given by ‖M − I‖_F = √(Σ_{i,j} m_{ij}²) = ‖m‖_2, where ‖·‖_2 denotes the 2-norm of vectors. On the other hand, the m of minimum 2-norm that satisfies equation (4) is given by

m = V⁺δ,

where V⁺ denotes the Moore–Penrose pseudo-inverse of V (for simplicity, we treat the case when only the final configuration is constrained). Observe next that, since the rows of V are linearly independent, one can in fact compute V⁺ explicitly by

V⁺ = V⊤(VV⊤)⁻¹ = (1/S) V⊤,

where S = Σ_{i=p+1}^n λ_i² = ‖θ̃_proj(T)‖²_2, and θ̃_proj(T) is the orthogonal projection of θ(T) − θ(τ) on U⊥ (see Fig. 1A for an illustration). One can thus compute m by

m = V⁺δ = (1/S) (λ_{p+1}δ_1, …, λ_nδ_1, …, λ_{p+1}δ_n, …, λ_nδ_n)⊤,

which leads to the explicit form of M as M = I + (1/S) W, where

W = ⎛ 0 ⋯ 0   λ_{p+1}δ_1 ⋯ λ_nδ_1 ⎞
    ⎜ ⋮   ⋮   ⋮              ⋮    ⎟
    ⎝ 0 ⋯ 0   λ_{p+1}δ_n ⋯ λ_nδ_n ⎠.   (8)

2) Bound on the dissimilarity of the trajectories: Let us now compute the distance between the deformed and the original trajectories. First, observe from the form of F [cf. equation (1)] that, for all t ≥ τ,

θ′(t) − θ(t) = (M − I)(θ(t) − θ(τ)) = (M − I) θ̃_proj(t).   (9)

Letting ‖·‖_2 also denote the induced 2-norm of linear maps and matrices, one has, from the explicit form (8) of M,

‖M − I‖_2 = (1/S) ‖W‖_2 ≤ (1/S) ‖W‖_F = (1/S) √( (Σ_{i=1}^n δ_i²)(Σ_{i=p+1}^n λ_i²) ) = ‖δ‖_2 / √S = ‖θ_d − θ(T)‖_2 / √S.   (10)

Next, from equation (9), one has

‖θ′(t) − θ(t)‖_2 ≤ ‖M − I‖_2 · ‖θ̃_proj(t)‖_2 = ‖θ_d − θ(T)‖_2 ‖θ̃_proj(t)‖_2 / √S = ‖θ_d − θ(T)‖_2 · ‖θ̃_proj(t)‖_2 / ‖θ̃_proj(T)‖_2.   (11)

Note that inequality (11) is in fact an equality for t = T. This inequality provides a bound on the distance between the deformed and the original trajectories at each time instant. Since θ̃_proj(t) and θ̃_proj(T) do not depend on θ_d, one can in fact write

max_{t∈[τ,T]} ‖θ′(t) − θ(t)‖_2 ≤ K ‖θ_d − θ(T)‖_2,

where K is a constant independent of θ_d (for instance, K = 1 in Fig. 1A). In particular, this shows a desirable "continuity" property of our deformation algorithm: when θ_d → θ(T), one has θ′(t)_{t∈[0,T]} → θ(t)_{t∈[0,T]} in L².

3) Bounds on the dissimilarity of the derivatives: One can also compute the distance between the derivatives (velocity, acceleration, etc.) of the deformed and the original trajectories in a similar way as above. One has indeed, for all levels q of differentiation,

∀t ∈ [τ, T]   d^qθ′/dt^q(t) − d^qθ/dt^q(t) = (M − I)(d^qθ/dt^q(t)),

which, together with (10), allows obtaining the following bound for all t ∈ [τ, T]:

‖d^qθ′/dt^q(t) − d^qθ/dt^q(t)‖_2 ≤ (‖θ_d − θ(T)‖_2 / ‖θ̃_proj(T)‖_2) · ‖d^qθ/dt^q(t)‖_2.

D. Inequality constraints

In addition to the equality constraints of section II-B, most applications also require the satisfaction of inequality constraints, such as joint limits, upper bounds on the velocities, accelerations or torques, avoidance of obstacles, etc. In many cases, these constraints can be expressed by

A_i θ_Can(t_i) ≤ b_i,   (12)

where t_i ∈ [τ, T] is a specific time instant, A_i is a c × n matrix, b_i is a c × 1 vector and θ_Can(t_i) is the n × 1 vector containing the coordinates of θ(t_i) in the canonical basis. To enforce joint limits, one can for example choose several t_i that sample the region where the joint values are expected to be large. Note also that constraints on higher-order derivatives such as θ̇, θ̈, etc. can be treated using the formulae of section II-B3.

Next, let P denote the basis matrix of B (cf. section II-B1). Equation (4) can then be equivalently written as

V_i m = P⊤ (θ′_Can(t_i) − θ_Can(t_i)),

which yields

θ′_Can(t_i) = θ_Can(t_i) + P V_i m,

where V_i has been constructed as in equation (5). Equation (12) is then equivalent to

A_i P V_i m ≤ b_i − A_i θ_Can(t_i).

One can finally construct a matrix A_g and a vector b_g by piling vertically all the A_i P V_i on one hand and all the b_i − A_i θ_Can(t_i) on the other hand. Now, to find the deformation of minimum norm (cf. section II-C) while satisfying the inequalities, it suffices to solve the Quadratic Program

min (1/2) ‖m‖²_2   subject to   V_g m = δ_g,   A_g m ≤ b_g.   (13)

Note finally that the system of equality and inequality constraints can be fully prioritized and solved efficiently using recent algorithms [4].

As an illustration, consider a planar 3-link manipulator. The original trajectory of the end-effector is a straight line between the initial position and the final position. However, the corresponding joint angle trajectory violates several joint limit constraints. A deformed C¹ trajectory is then computed that connects the initial and final positions while respecting the constraints and staying the closest possible to the original trajectory; see Fig. 2.

Fig. 2. Deformation under inequality constraints. The original trajectory (gray) is deformed into the black one in order to respect the constraints θ₁ ≥ 0 and θ₂ + θ₃ ≤ 2.8 rad (black horizontal lines in the bottom plots). The gray dots correspond to the time instant of the deformation. Note that the deformed trajectory remains C¹. The time instants when the constraints are enforced (black dots in the bottom plots) were chosen near the minimum of θ₁ and the maximum of θ₂ + θ₃ in the original trajectory.
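As a sanity check on the closed-form minimum-norm solution (8), the following sketch (with invented dimensions n = 4, p = 1 and random coordinates, all expressed directly in the orthonormal basis B) verifies numerically that M = I + W/S fixes the first p basis vectors and maps the coordinates λ of θ(T) − θ(τ) to those μ of θ_d − θ(τ):

```python
import numpy as np

# Numerical sanity check of the closed-form minimum-norm map (invented data).
rng = np.random.default_rng(0)
n, p = 4, 1                      # hypothetical: 4-DOF system, C^1 continuity

# Coordinates, expressed directly in the orthonormal basis B:
lam = rng.standard_normal(n)     # lambda: coords of theta(T) - theta(tau)
mu = rng.standard_normal(n)      # mu:     coords of theta_d - theta(tau)
delta = mu - lam

S = np.sum(lam[p:] ** 2)         # S = sum_{i>p} lambda_i^2

# W: zero first p columns; row i of the last n-p columns is delta_i * lam[p:]
W = np.zeros((n, n))
W[:, p:] = np.outer(delta, lam[p:])

M = np.eye(n) + W / S            # minimum-norm C^p-preserving map, eq. (8)

# M fixes the first p basis vectors (hence the derivatives u_1..u_p) ...
assert np.allclose(M[:, :p], np.eye(n)[:, :p])
# ... and sends lambda to mu, i.e. theta(T) to theta_d, as required by (3):
assert np.allclose(M @ lam, mu)
```

The second assertion holds because (Wλ)_i = δ_i Σ_{j>p} λ_j² = δ_i S, so Mλ = λ + δ = μ, mirroring the derivation above.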

E. Optimization of dynamic quantities

As stated in the Introduction, it is sometimes necessary to specify explicitly the objective of the optimization, such as energy consumption or integrated square torque. However, in contrast with the minimum-norm deformation case, closed-form solutions cannot, in general, be obtained for such optimizations. One must then resort to iterative methods. Typically, consider a cost function of the type

C(θ) = ∫₀ᵀ c(θ(t), θ̇(t), θ̈(t)) dt.

Then the optimization problem can be formulated as

min_M ∫_τ^T c(θ(τ) + M(θ(t) − θ(τ)), M(θ̇(t)), M(θ̈(t))) dt   (14)

subject to the linear equality and inequality constraints of equation (13). Note that, if an iterative method is used, a very convenient candidate for an initial guess is the M of minimum norm, which can be computed very quickly following sections II-C and II-D.

As an illustration, consider again the 3-link planar manipulator of the previous section. The deformation was made here in order to minimize the integrated squared torque

C(θ) = ∫₀ᵀ (ρ₁²(t) + ρ₂²(t) + ρ₃²(t)) dt,

where ρ_j is the torque applied at joint j. Note that the above equation can be easily put in the form of equation (14) by expressing the ρ_j as functions of θ, θ̇, θ̈ and the geometric and inertial parameters of the system. An example is given in Fig. 3.

Fig. 3. Minimization of squared torque. The original trajectory (gray) is deformed into a new trajectory (black) whose integrated squared torque is smaller. The dots correspond to the time instant of the deformation. Note that the deformed joint angle trajectories are C¹ but not C². Note also that, in agreement with classical results (see e.g. [15]), the torque-optimal trajectory of the end-effector in Cartesian space is not straight but slightly curved.

F. Fine-grained kinematics filtering

In the previous sections, we have presented a method to find trajectory deformations that suit global, coarse-grained, objectives, such as reaching a desired final configuration, avoiding obstacles at specific time instants or optimizing a global trajectory cost. However, a trajectory obtained from such deformations may not satisfy local, fine-grained, constraints. For instance, when modifying a walking pattern, one needs to make sure that the stance foot neither leaves the ground nor slips. This is expressed by maintaining the foot position constant throughout a continuous time interval. It is clear that a single affine transformation cannot satisfy such a constraint that applies on a continuous time interval. Thus, to take into account this type of fine-grained constraints, we propose to subsequently "filter" the deformed trajectory. More precisely, consider for instance the following constraint

g(θ(t)) = 0,   ∀t ∈ [t₀, t₁].   (15)

Let us denote by J_g the Jacobian matrix of g and by N_g(φ) the nullspace of J_g at point φ. One first needs to make sure that the deformed trajectory θ′ satisfies the constraint at t₀ by requiring g(θ′(t₀)) = 0 (and θ̇′(t₀) ∈ N_g(θ′(t₀)) if one needs C¹ continuity). This can be done by using the affine deformation method presented earlier. Next, differentiating (15) yields the constraint J_g(θ)θ̇ = 0, ∀t ∈ [t₀, t₁]. Let us now consider the filtered trajectory θ″ defined by

θ″(t) = θ′(t)                      ∀t ≤ t₀,
θ̇″(t) = proj_{N_g(θ″(t))} θ̇′(t)   ∀t ∈ [t₀, t₁],   (16)

where proj is the operator of orthogonal projection. Equation (16) is a differential equation that is very simple to integrate and whose solution θ″ satisfies the constraint g for all t ∈ [t₀, t₁]. While in general there is no guarantee that θ″ will stay close to θ′, in practice, for highly redundant manipulators and short constraint time intervals, this procedure can yield satisfying results, as illustrated in Fig. 4. Finally, if needed, one can make a final affine deformation to reconnect θ′(t)_{t∈(t₁,T]} with θ″(t)_{t∈[t₀,t₁]} at time t₁.

Fig. 4. Kinematics filtering to fulfill fine-grained constraints. The original trajectory (dark gray) allows connecting the initial and final end-effector positions by a straight line in the end-effector space. The joint angle trajectory is next deformed in order to reach the final end-effector position but using a different joint angle configuration (black trajectory, top plot). However, the black end-effector trajectory (black thick line in the top plot) is no longer a straight line. The black joint angle trajectory is then "kinematically filtered" into the light gray one, whose end-effector trajectory is a straight line in Cartesian space, similarly to the original dark gray trajectory, but whose joint angle trajectory is close to the black one (see bottom plots). In particular, the final black and light gray configurations are almost identical.

The previous development constitutes in some sense a "kinematics filter". A more sophisticated approach – the "dynamics filter" – allows filtering a non-dynamically-consistent trajectory into a dynamically-consistent one by using a fast dynamics computation algorithm for structure-varying kinematic chains [16].
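The kinematics filter of equation (16) is easy to prototype. Below is a hedged sketch (the toy constraint g, the deformed trajectory, and the finite-difference Jacobian are all invented for illustration): at each Euler step, the deformed velocity is projected onto the nullspace of J_g, which keeps g constant over the filtered interval, as required. Note that θ′(t₀) is chosen to satisfy g exactly, matching the precondition stated above.

```python
import numpy as np

def jacobian(g, theta, eps=1e-6):
    """Finite-difference Jacobian of the constraint g at theta."""
    g0 = np.atleast_1d(g(theta))
    J = np.zeros((g0.size, theta.size))
    for k in range(theta.size):
        d = np.zeros_like(theta)
        d[k] = eps
        J[:, k] = (np.atleast_1d(g(theta + d)) - g0) / eps
    return J

def kinematics_filter(g, theta_prime, dt):
    """Euler integration of eq. (16): project the deformed velocity onto
    the nullspace of J_g so that g stays constant on the interval."""
    theta = [theta_prime[0]]
    for i in range(len(theta_prime) - 1):
        v = (theta_prime[i + 1] - theta_prime[i]) / dt   # deformed velocity
        J = jacobian(g, theta[-1])
        v = v - np.linalg.pinv(J) @ (J @ v)              # orthogonal projection
        theta.append(theta[-1] + dt * v)
    return np.array(theta)

# Toy constraint (invented): keep the sum of the three joint angles constant.
g = lambda th: np.array([th.sum() - 1.0])
t = np.linspace(0.0, 1.0, 101)
theta_prime = np.stack([0.5 + 0.3 * t, 0.5 - 0.1 * t, 0.2 * t ** 2], axis=1)

theta_dp = kinematics_filter(g, theta_prime, dt=t[1] - t[0])
assert np.ptp(theta_dp.sum(axis=1)) < 1e-8   # g held throughout the interval
```

For a nonlinear g, the projected velocity only enforces the constraint to first order, so a long interval or a coarse time step may require a small corrective projection of the configuration itself at each step.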


III. EXAMPLES OF APPLICATIONS


A. Interactive motion editing


In typical computer graphics applications, the system under study is so complicated that interactive manipulations are the only way to modify the motion to suit one's needs. In this perspective, a pin-and-drag algorithm that allows interactively altering a particular configuration while respecting kinematic and contact constraints using differential inverse kinematics was designed in [17]. Based on the previous development, we propose here a way to extend this algorithm to handle interactive modifications of the entire trajectory. Assume that a joint angle trajectory θ(t)_{t∈[0,T]} is given, and that we are interested in modifying the position of a certain landmark point r = f(θ). For instance, if θ represents the joint angles of a human leg, r can represent the position of the right toe, and we want to modify r so that the toe passes above an obstacle at a time instant t_obs. This can be done as follows:
1) First, "drag" the position of r(t_obs) towards a higher position r_d and compute the corresponding θ_d using the pin-and-drag algorithm [17];
2) Next, choose an appropriate τ < t_obs and deform θ(t)_{t∈[0,T]} at τ to obtain θ′(t)_{t∈[0,T]} with θ′(t_obs) = θ_d;
3) Finally, choose an appropriate τ′ ≥ t_obs and deform θ′(t)_{t∈[0,T]} at τ′ to obtain θ″(t)_{t∈[0,T]} with θ″(T) = θ(T).
Note that, at steps 2 and 3, one may also impose velocity, acceleration, etc. constraints to guarantee C¹, C², etc. continuity. The new trajectory θ″(t)_{t∈[0,T]} can be computed very quickly, even for systems with large numbers of degrees of freedom, thanks to the efficiency of the pin-and-drag and affine deformation algorithms. One can thus visualize the whole deformed trajectory in real-time as one drags the landmark point, which greatly enhances the editing experience. This is illustrated in Fig. 5.


Fig. 5. Interactive editing of a human walking pattern. The original trajectory is reconstructed from motion-capture data and is plotted in dark gray. The intermediate configuration at time t_obs is highlighted by bold dark gray sticks (top plot) and the landmark point is represented by the leftmost black disk. First, differential inverse kinematics is performed to compute a desired configuration θ_d (highlighted by the super bold gray sticks in the top plot) corresponding to a higher position r_d of the landmark point (rightmost black disk). Then a first deformation (black) is made to bring the configuration at time t_obs towards the desired configuration θ_d. Finally, a second deformation (light gray) is made to bring the configuration at T back to that of the original trajectory, with the original velocity, ensuring thereby C¹ continuity with the final part of the trajectory. Note that the time instant T corresponds here to the time instant of the first heel strike after t_obs.

The smoothness of the deformed robot joint angles and of the velocity profiles can be observed in the plots of Fig. 6. Note also that, although noisy (because of motor and sensor noise), the accelerations remained small at the global scale thanks to the C1 continuity guaranteed by the affine deformation algorithm. Finally, note that it is possible to implement online feedback control by applying this algorithm in a reactive manner [9].

IV. A CHARACTERIZATION OF TRAJECTORY REDUNDANCY BY THE GROUP OF ADMISSIBLE DEFORMATIONS

B. Motion transfer to humanoid robots

We first recorded a hand-waving movement (four degrees of freedom: elbow flexion, shoulder pitch, roll, yaw) of a human subject (see the first row of snapshots in Fig. 6). Optical markers were placed on the subject’s shoulder, elbow and wrist, which allow reconstructing the joint angles by inverse kinematics. These joint angles were then transferred to a humanoid robot (second row of snapshots in Fig. 6). Another robot movement was obtained by smoothly deforming the joint angle trajectories in order to reach a new final arm configuration corresponding to: elbow angle −0.2 rad, shoulder pitch +0.2 rad, roll +0.2 rad, yaw −0.2 rad (third row of snapshots in Fig. 6).

Building from the previous development, we now discuss the concept of redundancy from the viewpoint of group theory.

A. Configuration and velocity redundancies

A manipulator is said to be kinematically redundant with respect to a task when more degrees of freedom than the minimum number required to execute that task are available, see e.g. [7, 12]. As in section III-A, consider the system

r = f(θ),    (17)

where r is a vector of dimension m representing the configuration of the end-effector and θ is a vector of dimension n representing the joint angles.



Studied differentially, redundancy manifests as velocity redundancy. Differentiating (17) indeed yields ṙ = Jf(θ)θ̇, where Jf = ∂f/∂θ is the Jacobian matrix of f, of dimension m × n. If n > m and Jf(θ(t)) is non-singular, then a given desired instantaneous velocity vrd of the end-effector can be achieved by infinitely many different instantaneous velocities vθ of the joint angles (for notational simplicity, we have dropped the time index t). More precisely, let S represent the null-space of Jf(θ) and vθ∗ = Jf(θ)+ vrd. Then any joint angle velocity in the affine subspace {vθ∗ + S} will achieve the desired end-effector velocity vrd [7, 12]. From a group-theoretic viewpoint, which will be convenient later on, let TS denote the space of the translations whose vectors belong to S. This set can actually be viewed as a Lie subgroup of dimension n − m of the general affine group GAn. The space of all joint angle velocities vθ corresponding to a single end-effector velocity vrd described above can then be seen as the orbit of vθ∗ under the action of TS, and the “degree of velocity redundancy” of the system at θ as the dimension of TS as a Lie subgroup of GAn.
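The affine subspace {vθ∗ + S} can be illustrated with a short NumPy sketch (illustrative, not code from the paper): every joint velocity obtained by adding a null-space element to the pseudoinverse solution realizes the same end-effector velocity.

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((2, 4))          # Jacobian Jf: m=2 task dims, n=4 joints
v_r_d = np.array([0.3, -0.1])            # desired end-effector velocity

v_theta_star = np.linalg.pinv(J) @ v_r_d # minimum-norm particular solution

# Basis of the null space S of J via SVD: rows of Vt beyond the rank
_, _, Vt = np.linalg.svd(J)
S = Vt[2:].T                             # n x (n - m) = 4 x 2 basis

# Any element of {v_theta* + S z} realizes the same end-effector velocity
z = np.array([1.7, -0.4])
v_theta = v_theta_star + S @ z
print(np.allclose(J @ v_theta, v_r_d))   # True
```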


B. Trajectory redundancy


Fig. 6. Transferring a human hand-waving motion to a humanoid robot. Top row: the original human movement (snapshots taken every 500 ms). Second row (condition NORMAL): the joint angles obtained from inverse kinematics are transferred to the humanoid robot after making one affine deformation at t = 1 s to align the initial posture of the motion with the standard initial posture of the robot (snapshots taken every 2 s). Third row (condition MODIFIED): a second affine deformation is made at t = 1.5 s to alter the final posture of the robot. In the last three snapshots, the robot’s right arm in condition NORMAL was superimposed for comparison. Note that the robot movement speed was reduced to 1/4th of the original human speed to comply with the hardware limitations. The graphics show the joint angles and their derivatives for the original human movement (dark gray), and for the scaled robot movements (NORMAL in black and MODIFIED in light gray).

If n > m, then there generally exist infinitely many θ that correspond to a given r, which constitutes the notion of configuration redundancy, see Fig. 7A.



The developments of sections II and III have highlighted another type of redundancy, namely trajectory redundancy: once a particular joint configuration θd has been chosen from the many possible joint configurations that achieve a given end-effector configuration, there still exist infinitely many joint angle trajectories that can bring the manipulator from the initial configuration θ0 towards θd with a specified velocity, acceleration, etc. while respecting the system kinematic and dynamic constraints, see Fig. 7B. Unlike configuration/velocity redundancies, trajectory redundancy is generally of infinite dimension. Finding a convenient way to parameterize a subset of admissible trajectories is then of particular interest. We have seen, from the development of section II, that, given a time instant τ, the set of admissible trajectories that can be obtained by affinely deforming an original trajectory θ(t)t∈[0,T] is – roughly – the orbit of that trajectory under the action of a subgroup of dimension n² − (k + p)n of GAn. To obtain a more convenient description, we now propose a construction that removes the dependence upon τ, thereby allowing the composition of deformations at different time instants. For a given τ, let G(τ) denote the group of admissible affine transformations at time τ. As we remarked previously, G(τ) is of dimension n² − (k + p)n if θ is not singular at τ. Consider now two finite sequences τ̂ = {τ1, ..., τl} (with 0 ≤ τ1 < ... < τl < T) and ĝ = {g1, ..., gl} ∈ G(τ1) × ... × G(τl) = Ĝ(τ̂). We define the deformation fτ̂,ĝ of θ by

∀t ∈ [0, τ1], fτ̂,ĝ(θ)(t) = θ(t)
∀i ∈ [1, l], ∀t ∈ (τi, τi+1], fτ̂,ĝ(θ)(t) = (g1 ∘ ... ∘ gi)(θ(t)),

Fig. 7. Configuration redundancy (A) and trajectory redundancy (B). For simplicity, we have sketched in plot B the end-effector trajectory redundancy, but this notion applies more generally to the joint angle trajectories.

Redundancy can also be studied from a differential viewpoint.

with the convention τl+1 = T.

Given two sequences τ̂ and ĝ ∈ Ĝ(τ̂), we define the inverse of fτ̂,ĝ, denoted fτ̂,ĝ⁻¹, by the deformation associated with the sequences τ̂ and ĝ⁻¹ = {g1⁻¹, ..., gl⁻¹}.
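The inverse just defined, together with a merge of two sequences at possibly shared time instants (composing the two maps when instants coincide), can be sketched as follows. Representing each gi by an invertible matrix, and the sequences as time-indexed dictionaries, is our own simplifying assumption, not the paper's representation.

```python
import numpy as np

def merge(ghat, ghat2):
    """Merge two {time: matrix} deformation sequences: instants unique to
    either sequence keep their map; shared instants compose the two maps."""
    out = {}
    for t in sorted(set(ghat) | set(ghat2)):
        if t in ghat and t in ghat2:
            out[t] = ghat[t] @ ghat2[t]             # shared instant: compose
        else:
            out[t] = ghat.get(t, ghat2.get(t))      # unique instant: keep as-is
    return out

def inverse(ghat):
    """Inverse sequence: same instants, inverted maps."""
    return {t: np.linalg.inv(g) for t, g in ghat.items()}

# Merging a sequence with its inverse yields identity maps at every instant
ghat = {0.5: np.array([[2.0, 0.0], [0.0, 1.0]]),
        1.0: np.array([[1.0, 1.0], [0.0, 1.0]])}
merged = merge(ghat, inverse(ghat))
print(all(np.allclose(g, np.eye(2)) for g in merged.values()))  # True
```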


Given τ̂, τ̂′, ĝ ∈ Ĝ(τ̂) and ĝ′ ∈ Ĝ(τ̂′), we define the product fτ̂,ĝ ∗ fτ̂′,ĝ′ by the deformation associated with the sequences τ̂ ∪ τ̂′ and ĝ + ĝ′, where the latter sequence is constructed as follows:
• For all τi ∈ τ̂ \ τ̂′, put gi into ĝ + ĝ′ at the position corresponding to τi;
• For all τ′i ∈ τ̂′ \ τ̂, put g′i into ĝ + ĝ′ at the position corresponding to τ′i;
• For all σ ∈ τ̂ ∩ τ̂′, say σ = τi = τ′j, put gi ∘ g′j into ĝ + ĝ′ at the position corresponding to σ.

We can now state the following proposition.

Proposition 1. The set of all fτ̂,ĝ endowed with the inverse and product operations as defined above is a Lie group of dimension n² − (k + p)n + 1. We call this group the affine deformation group of θ(t)t∈[0,T] and denote it by A(θ).

Recall that we have previously identified the redundancy of velocities with a certain group of translations TS of dimension n − m. Similarly, we identify here part of the redundancy of trajectories with the group A(θ), in the sense that the orbit of θ under the action of A(θ) is the set of all admissible trajectories that can be obtained from θ by finite sequences of affine deformations. Note that, in the limit of large n – as in the case of highly redundant manipulators or of humanoid robots – the dimension of A(θ) grows linearly with n², thereby allowing more “freedom” than configuration redundancy alone (the dimension of TS grows linearly with n only). The group property and the matrix representation of the admissible deformations allow searching efficiently (using random sampling, optimization, etc.) within the space of trajectory redundancy, as partially illustrated in sections II and III. We also believe that a better understanding of A(θ) – in particular that of its associated Lie algebra – may provide powerful methods to make trajectory deformations.

V. CONCLUSION

We have presented a new method of trajectory deformation for redundant manipulators based on affine transformations.
This method is exact, fast, and guarantees the smoothness of the resulting trajectories. Furthermore, the resulting trajectories can be chosen so as to remain close to the original trajectory (with explicit bounds), to optimize a given cost function, or to satisfy inequality constraints at specific time instants. Combined with further kinematic or dynamic filtering, the method can also yield trajectories that satisfy constraints applying on continuous time intervals, in a coarse-to-fine manner. This method is advantageous with respect to spline- or dynamic-system-based approaches in that it does not require choosing an exogenous function basis (such as a hierarchical spline basis [6], a wavelet spline basis [13], Gaussian kernels [3, 14], etc.): indeed, the only “basis functions” we use are the original joint angle trajectories. In particular, there is no need to fine-tune the basis functions or to put extra constraints on the coefficients multiplying the basis functions in order to avoid undesirable behaviors (such as spline trajectories that undulate too much [6] or wavelets that have too much energy [13]). Finally, the purely endogenous

nature of the proposed method allows the deformed motions to preserve quantitative and qualitative properties, such as e.g. the difficult-to-quantify “naturalness” [18], of the original motions with no extra effort. Our current research focuses on developing this method for full-scale applications in character animation [10, 6, 17, 18] and humanoid robot control [16, 14].

Acknowledgments

We would like to thank Mr. Hamano for the human experiment data, Dr. Srinivasa and Mr. Santacruz for help with the humanoid robot experiment, and Prof. Bennequin and Dr. Diankov for helpful discussions. This work was supported by “Grants-in-Aid for Scientific Research” for JSPS fellows and by a JSPS postdoctoral fellowship.

REFERENCES

[1] D. Bennequin, R. Fuchs, A. Berthoz, and T. Flash. Movement timing and invariance arise from several geometries. PLoS Comput Biol, 5(7):e1000426, Jul 2009. doi: 10.1371/journal.pcbi.1000426.
[2] M. Gleicher. Retargetting motion to new characters. In ACM SIGGRAPH, pages 33–42. ACM, 1998.
[3] A.J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In IEEE International Conference on Robotics and Automation, 2002.
[4] O. Kanoun, F. Lamiraux, and P.-B. Wieber. Kinematic control of redundant manipulators: Generalizing the task-priority framework to inequality tasks. IEEE Transactions on Robotics, 27(4):785–792, 2011. doi: 10.1109/TRO.2011.2142450.
[5] F. Lamiraux, D. Bonnafous, and O. Lefebvre. Reactive path deformation for nonholonomic mobile robots. IEEE Transactions on Robotics, 20(6):967–977, 2004.
[6] J. Lee and S.Y. Shin. A hierarchical approach to interactive motion editing for human-like figures. In ACM SIGGRAPH, pages 39–48. ACM, 1999.
[7] Y. Nakamura. Advanced Robotics: Redundancy and Optimization. Addison-Wesley, 1990.
[8] Q.-C. Pham. Fast trajectory correction for nonholonomic mobile robots using affine transformations. In Robotics: Science and Systems, 2011.
[9] Q.-C. Pham and Y. Nakamura. Regularity properties and deformation of wheeled robots trajectories. In IEEE International Conference on Robotics and Automation, 2012.
[10] C. Rose, B. Guenter, B. Bodenheimer, and M.F. Cohen. Efficient generation of motion transitions using spacetime constraints. In ACM SIGGRAPH, pages 147–154. ACM, 1996.
[11] K. Seiler, S. Singh, and H. Durrant-Whyte. Using Lie group symmetries for fast corrective motion planning. In Algorithmic Foundations of Robotics IX, 2010.
[12] B. Siciliano and J.-J. E. Slotine. A general framework for managing multiple tasks in highly redundant robotic systems. In International Conference on Advanced Robotics, pages 1211–1216, 1991. doi: 10.1109/ICAR.1991.240390.
[13] A. Ude, C.G. Atkeson, and M. Riley. Planning of joint trajectories for humanoid robots using B-spline wavelets. In IEEE International Conference on Robotics and Automation, volume 3, pages 2223–2228. IEEE, 2000.
[14] A. Ude, A. Gams, T. Asfour, and J. Morimoto. Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5):800–815, 2010.
[15] Y. Uno, M. Kawato, and R. Suzuki. Formation and control of optimal trajectory in human multijoint arm movement: minimum torque-change model. Biol Cybern, 61(2):89–101, 1989. ISSN 0340-1200.
[16] K. Yamane and Y. Nakamura. Dynamics filter – concept and implementation of online motion generator for human figures. IEEE Transactions on Robotics and Automation, 19(3):421–432, 2003.
[17] K. Yamane and Y. Nakamura. Natural motion animation through constraining and deconstraining at will. IEEE Transactions on Visualization and Computer Graphics, pages 352–360, 2003.
[18] K. Yamane, J.J. Kuffner, and J.K. Hodgins. Synthesizing animations of human manipulation tasks. In ACM Transactions on Graphics (TOG), volume 23, pages 532–539. ACM, 2004.


E-Graphs: Bootstrapping Planning with Experience Graphs

Mike Phillips (Carnegie Mellon University), Benjamin Cohen (University of Pennsylvania), Sachin Chitta (Willow Garage), Maxim Likhachev (Carnegie Mellon University)
[email protected], [email protected], [email protected], [email protected]

Abstract—Human environments possess a significant amount of underlying structure that is under-utilized in motion planning and mobile manipulation. In domestic environments, for example, walls and shelves are static, large objects such as furniture and kitchen appliances most of the time do not move and do not change, and objects are typically placed on a limited number of support surfaces such as tables, countertops or shelves. Motion planning for robots operating in such environments should be able to exploit this structure to improve its performance with each execution of a task. In this paper, we develop an online motion planning approach which learns from its planning episodes (experiences) a graph, an Experience Graph. This graph represents the underlying connectivity of the space required for the execution of the mundane tasks performed by the robot. The planner uses the Experience Graph to accelerate its planning efforts whenever possible. On the theoretical side, we show that planning with Experience Graphs is complete and provides bounds on sub-optimality with respect to the graph that represents the original planning problem. On the experimental side, we show in simulations and on a physical robot that our approach is particularly suitable for higher-dimensional motion planning tasks such as planning for single-arm manipulation and two-armed mobile manipulation. The approach provides significant speedups over planning from scratch and generates predictable motion plans: motions planned from start positions that are close to each other to goal positions that are also close to each other tend to be similar.
In addition, we show how Experience Graphs can incorporate solutions from other approaches such as human demonstrations, providing an easy way of bootstrapping motion planning for complex tasks.

I. INTRODUCTION

Motion planning is essential for robots operating in dynamic human environments. Tasks like picking and placing objects and mobile manipulation of larger boxes require the robot to approach and pick up (or place) objects with minimal collision with the rest of the environment. Fast performance of motion planning algorithms in such tasks is critical, to account for the speed of operation expected by humans and to account for sudden changes in the environment. This is especially true of tasks involving higher-dimensional configuration spaces, e.g. for a two-armed mobile manipulation system (Figure 1). At the same time, many mundane manipulation tasks, such as picking and placing various objects in a kitchen, are highly repetitive. It is therefore expected that robots should be capable of learning and improving their performance with every execution of these repetitive tasks. In particular, robots

We thank Willow Garage for their support of this work. This research was also sponsored by ARL, under the Robotics CTA program grant W911NF10-2-0016.

Fig. 1. Motion planning is often used to compute motions for repetitive tasks such as dual-arm mobile manipulation in a kitchen.

should be capable of exploiting learned knowledge about the underlying geometric structure in tasks and human environments. While human environments can be very dynamic, e.g. with people walking around, large parts of the environment are still static for significant periods of time. Similarly, tasks tend to have some form of spatial structure, e.g. objects are often found on support surfaces like tables and desks. This work focuses on learning from experience for motion planning. Our approach relies on a graph-search method for planning that builds an Experience Graph online to represent the high-level connectivity of the free space used for the encountered planning tasks. New motion planning requests reuse this graph as much as possible, accelerating the planning process significantly by eliminating the need for searching large portions of the search-space. While previously encountered motion planning problems can speed up the planner dramatically, in their absence it gracefully falls back to searching the original search-space, adding the newly generated motion to the Experience graph. Planning with Experience graphs is therefore complete. Furthermore, we show that it provides bounds on sub-optimality with respect to the graph that represents the original planning problem. Planning with Experience graphs leads to consistent and predictable solutions for motion plans requested in similar (but not the same) scenarios, e.g. when the goal states of the robot are close to each other. Our approach is particularly useful when the tasks are somewhat repeatable spatially, e.g. in moving a set of dishes off a particular counter into a dishwasher. Although the start and goal states would be different for each motion plan, the general motion of moving a dish from the counter to the washer would essentially be the same each time. We provide experimental results demonstrating the use of our approach both in simulation and on a real robot.
A full-body planner for the PR2 robot was developed for dual-arm mobile manipulation tasks using a search-based planner.


We show how the use of Experience graphs improves the performance of the planner in this high-dimensional state space. We also present results comparing our planner against sampling-based planning approaches. Finally, we show a preliminary application of Experience graphs to learning by demonstration.

II. RELATED WORK

Initial approaches to motion planning focused on planning from scratch, i.e. there was no reuse of the information from previous plans. Recently, there has been more work on reuse of previous information for motion planning, especially in the context of performance optimization for motion planning in real-time dynamic environments. Lien et al. [12] presented an approach that involved constructing roadmaps for obstacles, storing them in a database, and reusing them during motion planning. Bruce et al. [3] extended the traditional RRT approach to reuse cached plans and bias the search towards waypoints from older plans. Extensions of this approach can be found in [20, 6]. In [15], an evolutionary algorithm approach was used to bias RRT search for replanning in dynamic environments towards the edges of the explored areas, intended to reduce the time spent on searching parts of the space that have already been explored. In [20], workspace probability distributions were automatically learned for certain classes of motion planning problems. The distributions attempted to capture a locally optimal weighting of workspace features. Trajectory libraries have seen use for adapting policies for new situations [18], especially for control of underactuated systems and high-dimensional systems. In [1], new trajectories in a state space were generated by combining nearby trajectories, including information about local value function estimates at waypoints. In [14], a trajectory library was used in combination with an optimal control method for generating a balance controller for a two-link robot.
Transfer of policies across tasks, where policies designed for a particular task are adapted and reused in a new task, was discussed in [19]. A learning-based approach to reuse information from previous motion plans, the environment, and the types of obstacles was presented by Jetchev [7]. Here, a high-dimensional feature vector was used to capture information about the proximity of the robot to obstacles. After a dimensionality reduction in the feature space, several learning methods, including nearest neighbor, locally weighted regression and clustering, are used to predict a good path from the database in a new situation. The work in [8] is most similar to ours and involves the use of a database of older motion plans. The approach uses a bi-directional RRT and tries to draw the search towards a path which is most similar to the new motion planning problem (based on distances to the start, goal and obstacles). Some more recent work [2] also attempts to repair paths from a database of past paths using sampling-based planners. Comparing against this approach is future work. The use of a database of motion plans is a key feature of our approach. However, we differ from other approaches

in that we attempt to use all the information from previous searches instead of attempting to pick the most similar or best path. Our approach can reuse parts of the Experience Graph even when the start and goal states change. Contrary to other approaches, our approach also gracefully degenerates to planning from scratch (with no reused information) to deal with new scenarios which might be completely different from the ones in the database. Our approach also does not rely on object or shape recognition and is thus agnostic to the representation of the environment, e.g. as a voxel grid or individual objects represented using meshes. Although our approach is also comparable to Probabilistic Roadmaps [11], a crucial difference is that Experience Graphs are generated from task-based requests instead of sampling the whole space. We are also able to provide a bound on the quality of the returned solution, with respect to the discretization of the action and state spaces. Finally, our approach tends to generate consistent solutions, as we confirm in our experimental analysis.

III. ALGORITHM

A. Overview

An Experience Graph, or E-Graph, is a graph formed from the solutions found by the planner for previous planning queries or from demonstrations. We will abbreviate this graph as GE. The graph GE is incomparably smaller than the graph G used to represent the original planning problem. At the same time, it is representative of the connectivity of the space exercised by the previously found motions. The key idea of planning with GE is to bias the search efforts, using a specially constructed heuristic function, towards finding a way to get onto the graph GE and to remain searching GE rather than G as much as possible. This avoids exploring large portions of the original graph G. In the following we explain how to do this in a way that guarantees completeness and bounds on solution quality with respect to the original graph G.

B. Definitions and Assumptions

First, we will list some definitions and notations that will help explain our algorithm. We assume the problem is represented as a graph where a start and goal state are provided (sstart, sgoal) and the desired output is a path (sequence of edges) that connects the start to the goal.
• G(V G, E G) is a graph modeling the original motion planning problem, where V G is the set of vertices and E G is the set of edges connecting pairs of vertices in V G.
• GE(V E, E E) is the E-Graph that our algorithm builds over time (GE ⊆ G).
• c(u, v) is the cost of the edge from vertex u to vertex v.
• cE(u, v) is the cost of the edge from vertex u to vertex v in graph GE and is always equal to c(u, v).

Edge costs in the graph are allowed to change over time (including edges being removed and added, which happens when new obstacles appear or old obstacles disappear). The more static the graph is, the more benefits our algorithm provides. The algorithm is based on heuristic search and is


therefore assumed to take in a heuristic function hG(u, v) estimating the cost from u to v (u, v ∈ V G). We assume hG(u, v) is admissible and consistent for any pair of states u, v ∈ V G. An admissible heuristic never overestimates the minimum cost from any state to any other state. A consistent heuristic hG(u, v) is one that satisfies the triangle inequality, hG(u, v) ≤ c(u, s) + hG(s, v) and hG(u, u) = 0, ∀u, v, s ∈ V G and ∀(u, s) ∈ E G.

C. Algorithm Detail

The planner maintains two graphs, G and GE. At the high level, every time the planner receives a new planning request, the findPath function is called. It first updates GE to account for edge cost changes and performs some precomputations. Then it calls the computePath function, which produces a path π. This path is then added to GE. The updateEGraph function works by updating any edge costs in GE that have changed. If any edges are invalid (e.g. they are now blocked by obstacles), they are put into a disabled list. Conversely, if an edge in the disabled list now has finite cost, it is re-enabled. At this point, the graph GE should only contain finite edges. A precomputeShortcuts function is then called, which can be used to compute shortcut edges before the search begins. Ways to compute shortcuts are discussed in Section V. Finally, our heuristic hE, which encourages path reuse, is computed†.

findPath(sstart, sgoal)
1: updateEGraph(sgoal)
2: π = computePath(sstart, sgoal)
3: GE = GE ∪ π

updateEGraph(sgoal)
1: updateChangedCosts()
2: disable edges that are now invalid
3: re-enable disabled edges that are now valid
4: precomputeShortcuts()
5: compute heuristic hE according to Equation 1
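The updateEGraph bookkeeping can be sketched as follows (a minimal illustration with hypothetical names, not the authors' code): edges whose current cost becomes infinite move to a disabled list, and disabled edges that become finite again are re-enabled.

```python
import math

class EGraph:
    def __init__(self):
        self.edges = {}      # (u, v) -> cost, currently usable E-Graph edges
        self.disabled = {}   # (u, v) -> last known finite cost

    def update(self, cost_fn):
        for e in list(self.edges):
            c = cost_fn(*e)
            if math.isinf(c):
                self.disabled[e] = self.edges.pop(e)   # now blocked by an obstacle
            else:
                self.edges[e] = c                      # refresh a changed cost
        for e in list(self.disabled):
            c = cost_fn(*e)
            if not math.isinf(c):
                del self.disabled[e]
                self.edges[e] = c                      # valid again: re-enable

    def add_path(self, path, cost_fn):
        # after planning, fold the returned path into the E-Graph
        for u, v in zip(path, path[1:]):
            self.edges[(u, v)] = cost_fn(u, v)

# usage: plan, store the path, then an obstacle blocks one edge
costs = {("a", "b"): 1.0, ("b", "c"): 2.0}
fn = lambda u, v: costs[(u, v)]
eg = EGraph()
eg.add_path(["a", "b", "c"], fn)
costs[("a", "b")] = math.inf
eg.update(fn)   # ("a", "b") is moved to the disabled list
```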

Our algorithm’s speed-up comes from being able to reuse parts of old paths and avoid searching large portions of graph G. To accomplish this we introduce a heuristic which intelligently guides the search toward GE when it looks like following parts of paths in GE will help the search get close to the goal. We define a new heuristic hE in terms of the given heuristic hG and edges in GE for any state s0 ∈ V G:

hE(s0) = min_π Σ_{i=0}^{N−1} min{ εE hG(si, si+1), cE(si, si+1) }    (1)

where π is a path ⟨s0 ... sN−1⟩ with sN−1 = sgoal, and εE is a scalar ≥ 1. Equation 1 returns the cost of the minimal path from the queried state s0 to the goal, where the path consists of

† This can be done with one search prior to planning or can be done on the fly for states generated by the planner. The first option is feasible whenever the heuristic is computed for lower-dimensional projections of the original problem, such as 3D space for the end-effector of a high-dimensional arm.
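Equation 1 can be prototyped with a small Dijkstra search over the E-Graph vertices plus the goal, connecting every pair of states at the inflated cost εE·hG (a toy sketch with illustrative names, not the authors' implementation):

```python
import heapq
import math

def h_g(a, b):
    # original heuristic: Euclidean distance between 2D points
    return math.dist(a, b)

def e_graph_heuristic(egraph_edges, goal, eps_e):
    """Return hE per Equation 1, for states as 2D tuples and an E-Graph
    given as a list of (u, v, cost) edges."""
    nodes = {goal} | {u for u, v, c in egraph_edges} | {v for u, v, c in egraph_edges}
    adj = {n: [] for n in nodes}
    for u, v, c in egraph_edges:          # real E-Graph edges at true cost
        adj[u].append((v, c))
        adj[v].append((u, c))
    for u in nodes:                       # "jump" edges at inflated hG
        for v in nodes:
            if u != v:
                adj[u].append((v, eps_e * h_g(u, v)))
    # Dijkstra from the goal over this small graph
    dist = {n: math.inf for n in nodes}
    dist[goal] = 0.0
    pq = [(0.0, goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, c in adj[u]:
            if d + c < dist[v]:
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    # hE(s): best of jumping (at inflated cost) to any E-Graph vertex or the goal
    def h_e(s):
        return min(eps_e * h_g(s, n) + dist[n] for n in nodes)
    return h_e

# one E-Graph edge from (5,0) to (1,0) of cost 4, goal at the origin
egraph = [((5.0, 0.0), (1.0, 0.0), 4.0)]
h10 = e_graph_heuristic(egraph, (0.0, 0.0), 10.0)
print(h10((6.0, 0.0)))  # 24.0: jump to (5,0) for 10, edge for 4, jump to goal for 10
```

With εE = 1 the same function reduces to hG, matching the behavior shown in Fig. 2a.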

(a) εE = 1    (b) εE → ∞

Fig. 2. Effect of εE. The dark solid lines are paths in GE while the dark dotted lines are the best paths from s0 to sgoal according to hE. Note that as εE increases, the heuristic prefers to travel on GE. The light gray circles and lines show the graph G, and the filled-in gray circles show the expanded states when planning with E-Graphs. The dark gray arrow shows the returned path.

an arbitrary number of segments of two kinds. The first kind of segment corresponds to an instantaneous jump between si and si+1 at a cost equal to the original heuristic inflated by εE (this is equivalent to saying all states can reach all other states according to the original heuristic, but inflated by εE). The second kind of segment is an edge from GE, and it uses its actual cost. As the penalty term εE increases, the heuristic path from s0 to the goal will go farther out of its way to travel toward the goal using E-Graph edges. The larger εE is, the more the actual search avoids exploring G and focuses on traveling on paths in GE. Figure 2 demonstrates how this works. As εE increases, the heuristic guides the search to expand states along parts of GE as shown in Figure 2. In Figure 2a, the heuristic causes the search to ignore the graph GE because, without inflating hG at all (εE = 1), the heuristic will never favor following edges in GE. This figure also shows how, during the search, by following GE paths we can avoid obstacles and have far fewer expansions. The expanded states are shown as filled-in gray circles, which change based on εE. The computePath function (shown below) runs Weighted A* without re-expansions [13]. Weighted A* uses a parameter ε > 1 to inflate the heuristic used by A*. The solution cost is guaranteed to be no worse than ε times the cost of the optimal solution, and in practice it runs dramatically faster than A*. The main modification to Weighted A* is that, in addition to using the edges that G already provides (getSuccessors), we add two additional types of successors: shortcuts and snap motions (line 9). The only other change is that instead of using the heuristic hG, we use our new heuristic hE (lines 4 and 16). Shortcut successors are generated when expanding a state s ∈ GE. A shortcut successor uses GE to jump to a place much closer to sgoal (closer according to the heuristic hG).
This shortcut may use many edges from various previous paths. The shortcuts allow the planner to quickly get near the goal without having to re-generate paths in GE . Possible shortcuts are discussed in Section V. Finally, for environments that can support it, we introduce snap motions. Sometimes, the heuristic may lead the search to a local minimum at the “closest” point to GE with respect to the heuristic, hG , but it may not be a state on GE . For example, in (x, y, θ) navigation, a 2D (x, y) heuristic will create a minimum for 2 states with the same x,y but different


computePath(sstart , sgoal ) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19:

Theorem 2. For a finite graph G, the planner terminates, and the solution it returns is guaranteed to be no worse than ε · εE times the optimal solution cost in graph G.

OP EN = ∅ CLOSED = ∅ g(sstart ) = 0 f (sstart ) = εhE (sstart ) insert sstart into OP EN with f (sstart ) while sgoal is not expanded do remove s with the smallest f -value from OP EN insert s in CLOSED S = getSuccessors(s) ∪ shortcuts(s) ∪ snap(s) for all s ∈ S do if s was not visited before then f (s ) = g(s ) = ∞ end if / CLOSED then if g(s ) > g(s) + c(s, s ) and s ∈ g(s ) = g(s) + c(s, s ) f (s ) = g(s ) + εhE (s ) insert s into OP EN with f (s ) end if end for

Consider h (s) = hE (s)/εE . h (s) is clearly consistent. Then, εhE (s) = ε·εE h (s). The proof that ε·εE h (s) leads to Weighted A* (without re-expansions) returning paths bounded by ε · εE times the optimal solution cost follows from [13]. V. I MPLEMENTATION D ETAIL In this section we discuss how various parts of the algorithm could be implemented. A. Heuristic

20: end while

θ. A problem then arises because there isn’t a useful heuristic gradient to follow, and therefore, many states will be expanded blindly. We borrow the idea of adaptive motion primitives [4] to generate a new action which can snap to a state on GE whenever states si , sj have hG (si , sj ) = 0 and sj ∈ GE and si ∈ / GE . The action is only used if it is valid with respect to the current planning problem (e.g. doesn’t collide with obstacles). As with any other action, it has a cost that is taken into account during the search. IV. T HEORETICAL A NALYSIS Our planner provides a guarantee on completeness with respect to G (the original graph representation of the problem). Theorem 1. For a finite graph G, our planner terminates and finds a path in G that connects sstart to sgoal if one exists. Since no edges are removed from the graph (we only add) and we are searching the graph with Weighted A* (a complete planner), if a solution exists on the original graph, our algorithm will find it. Our planner provides a bound on the sub-optimality of the solution cost. The proof for this bound depends on our heuristic function hE being εE -consistent. Lemma 1. If the original heuristic function hG (u, v) is consistent, then the heuristic function hE is εE -consistent. From Equation 1, for any s, s ∈ V G , (s, s ) ∈ E G . hE (s) ≤ min{εE hG (s, s ), cE (s, s )} + hE (s ) hE (s) ≤ εE hG (s, s ) + hE (s )

Some heuristics hG are derived using dynamic programming in a lower-dimensional state space, such as a 2D Dijkstra search for an (x, y, θ) navigation problem. Alternatively, some heuristics hG can be computed in O(1) upon request (e.g., Euclidean distance). In the first case, we can compute hE by running a Dijkstra search on the low-dimensional projection of G, with additional edges from GE connecting the low-dimensional projections of the states in GE. This can be computed with efficiency similar to the original heuristic hG, so it does not hurt planning times (this is what we used in our experiments). In the second case, hE can be computed by constructing a fully connected graph on V E and sgoal (using both εE·hG and cE). A Dijkstra search is used to find the heuristic hE from each of these states to the goal. Then, during the search, for any state s, hE(s) = min over s' ∈ {V E ∪ sgoal} of (hG(s, s') + hE(s', sgoal)). In our implementation, we compute the heuristic hE in an on-demand fashion: our computation runs Dijkstra's algorithm until the heuristic of the requested state is computed, and then suspends until another un-computed heuristic is requested.

B. Shortcuts

Shortcuts accelerate the search by allowing it to bypass retracing an old path in GE (re-expanding the states on it). The algorithm works with or without shortcuts. Essentially, the shortcuts are pre-computed edges that connect all states in GE to a very small subset of states in GE. Shortcut successors can only be generated when expanding a state s ∈ GE. There are several obvious choices for this subset. For example, it can contain, for each connected component of GE, the state closest to the goal, where closeness is defined by hG or hE (in our experiments we used hG). Other ways of computing this subset are possible; exploring these options is future work.

C. Pre-Computations

hE(s) ≤ εE·c(s, s') + hE(s')

The argument for the first line comes from Equation 1 by contradiction: suppose the line were not true; then, during the computation of hE(s), a shorter path π could have been found by traveling to s' and connecting to s with min{εE·hG(s, s'), cE(s, s')}. The last step follows from hG being admissible. Therefore, hE is εE-consistent.
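The εE-consistent heuristic just analyzed can also be sketched directly from its definition: a Dijkstra search from the goal over the fully connected graph on {s} ∪ V E ∪ {sgoal}, with edge weights min{εE·hG, cE}. The Python below is an illustrative sketch under that reading (function names and signatures are hypothetical; a real implementation would cache and suspend the Dijkstra run on demand, as described in Section V-A).

```python
import heapq

def e_graph_heuristic(s, goal, V_E, h_G, c_E, eps_E):
    """Evaluate hE(s): Dijkstra from the goal over the complete graph on
    {s} ∪ V_E ∪ {goal}, where each edge (u, v) costs min(eps_E*h_G, c_E).
    c_E returns infinity for pairs not connected by an E-Graph edge."""
    nodes = [s, goal] + [v for v in V_E if v not in (s, goal)]

    def w(u, v):
        return min(eps_E * h_G(u, v), c_E(u, v))

    dist = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale entry
        for v in nodes:
            if v == u:
                continue
            nd = d + w(u, v)
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist[s]
```

With a cheap E-Graph edge present, the heuristic is pulled toward routes that reuse experience, which is exactly the bias the planner exploits.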

Some of the computations on GE can be done before the goal is known. In particular, shortcuts(s) (line 9 of computePath) needs to know the costs of least-cost paths between pairs of states in GE. If a state u ∈ GE being expanded has a shortcut to a state v in the same component of GE, then we need to assign an edge cost c(u, v); to do so, we need to know the cost of a least-cost path on GE from u to v.


These costs can be computed before knowing the goal by using an all-pairs shortest path algorithm like Floyd-Warshall. This can be done in a separate thread between planning queries (as can adding the path from the previous query into GE). To make Floyd-Warshall run faster and to save memory, we can also exploit the fact that most of the paths in GE do not intersect each other in many places. We can therefore compress GE into a much smaller graph containing only the vertices of degree ≠ 2 and run Floyd-Warshall on it. Then, the cost of a path between any pair x, y ∈ GE is given by min_{xi,yi}{c(x, xi) + c(xi, yi) + c(yi, y)}.



where xi ∈ {x1, x2} and yi ∈ {y1, y2}; x1 and x2 are the vertices of degree ≠ 2 bounding the path segment on which x resides, and y1 and y2 are defined similarly.
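The all-pairs precomputation described above might be sketched as follows: a plain Floyd-Warshall over the compressed E-Graph (the node names and the edge_cost structure are illustrative, not the authors' data layout).

```python
def all_pairs_costs(nodes, edge_cost):
    """Floyd-Warshall on the compressed E-Graph. edge_cost maps an ordered
    pair (u, v) to the cost of the path segment between those junction
    vertices; returns least-cost path costs d[(u, v)] for all pairs."""
    INF = float('inf')
    # Initialize with direct segment costs (0 on the diagonal).
    d = {(u, v): (0.0 if u == v else edge_cost.get((u, v), INF))
         for u in nodes for v in nodes}
    # Relax through every intermediate vertex k.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d
```

Because the compressed graph contains only junction and endpoint vertices, the cubic cost of Floyd-Warshall is paid on a much smaller node set than the full set of E-Graph states.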

Fig. 3. Full-body planning in a warehouse. (a) Bootstrap goals. (b) One of the test sets.

A. Full Body Planning

Planning in higher-dimensional spaces can be challenging for search-based planners. A full-body planning scenario for a PR2 robot is thus a good test of the capabilities developed in this work. Our test involves the PR2 carrying objects in a large environment. We restrict the objects to be upright in orientation, a constraint that often presents itself in real-world tasks like carrying large objects, trays, or liquid containers. We assume that the two end-effectors are rigidly attached to the object.

The state space is 10-dimensional. Four dimensions define the position and yaw of the object in a frame attached to the robot. The planner operates directly in this workspace (instead of joint space) and uses inverse kinematics to map movements back onto the joint angles of both arms. To restrict the number of inverse kinematics solutions to one, the redundant degree of freedom in each arm (the upper arm roll joint) is part of our state. The robot's position and yaw in the map frame and the height of the telescoping spine are the last four dimensions.

A 3D Dijkstra heuristic is used to plan (backwards) for a sphere inscribed in the carried object from its goal position to the start position (this is the low-dimensional projection for the heuristic). The heuristic is useful in that it accounts for collisions between the object and obstacles. However, it does not handle the complex kinematic constraints on the motion of the object due to the arms, spine, and base, and does not account for collisions between the body of the robot and the environment.

In all experiments, ε = 2 and εE = 10, resulting in a sub-optimality bound of 20. We chose these parameter values (manually) as they provided a good combination of speed-up and sub-optimality bound. In future work, we will look into ways to automatically reduce the sub-optimality bound as planning time allows. Setting εE to 10 strongly encourages the search to go to GE if it looks like

TABLE I
WAREHOUSE ENVIRONMENT: E-GRAPH PLANNING TIME

successes (of 500) | mean time (s) | std dev (s) | max (s)
500                | 0.30          | 0.43        | 3.37

VI. EXPERIMENTAL RESULTS

A variety of experiments were run in multiple domains to verify the performance gains of our algorithm. We compared our approach against Weighted A* without using GE, as well as against a variety of sampling-based planners. The test domains include planning for a 7 degree of freedom arm and full-body planning for the PR2 robot (two arms, the telescoping spine, and navigation of the base).


following it can get it closer to the goal. Setting ε to 2 inflates the whole heuristic, including the part of hE that uses paths in GE. This encourages the planner to use shortcuts as soon as they become available, preventing it from re-expanding an old path. The results were compared against regular Weighted A* with ε = 20, so that both approaches would have the same sub-optimality bound. We also compared against several sampling-based planners from OMPL (the Open Motion Planning Library), including RRT-Connect (a bi-directional RRT), PRM, and RRT* [5, 9, 11, 10]. Since the PRM is a multi-query approach, we allowed it to keep its accumulated roadmap between queries, as our method does. Since RRT* is an anytime method that improves as more time is allowed, we provide results for the first solution found as well as for the solution after our timeout (we gave all planners up to 2 minutes to plan). We post-process all sampling-planner paths using OMPL's short-cutter (E-Graph and Weighted A* paths have no post-processing).

1) Warehouse scenario: Our first scenario is modeled on an industrial warehouse where the robot must pick up objects off a pallet and move them onto a shelving unit (Figure 3). The goals alternate between the pallet and the shelves. Since our planner improves in performance with repeated attempts in a particular spatial region, we first bootstrap it with 45 uniformly distributed goals (split between the pallet and the shelves). The bootstrap goals and the resultant GE (projected onto 3D end effector position) after processing them are shown in Figure 3a. We bootstrapped the PRM with the same 45 goals. A set of 100 random goals (with varying positions and yaws of the object), alternating between the pallet and the shelves, was then given to the planner. This entire process was repeated 5 times, with GE cleared before each new set of 100 goals. The planning times for E-Graphs for all 500 random goals are shown in Table I.
On average, 94% of the edges on a path produced by our planner were recycled from GE . The mean time to update GE between queries (add the new path and compute Floyd-Warshall) was 0.31 seconds. In Table II we compare E-Graphs to several other methods. We show the average speed-up by computing the average E-


TABLE II
WAREHOUSE ENVIRONMENT: PLANNING TIME COMPARISON

method                | successes (of 500) | mean speed-up | std dev speed-up | max speed-up
Weighted A*           | 497                | 15.80         | 54.05            | 625.54
RRT-Connect           | 500                |  0.59         |  0.42            |   3.08
PRM                   | 500                |  2.16         | 14.79            | 226.51
RRT* (first solution) | 440                | 11.90         | 45.57            | 594.45

TABLE III
WAREHOUSE ENVIRONMENT: PATH QUALITY COMPARISON

method                | object XYZ path length ratio | std dev ratio | base XY path length ratio | std dev ratio
Weighted A*           | 0.70                         | 0.15          | 1.36                      | 2.89
RRT-Connect           | 0.94                         | 0.59          | 1.55                      | 3.31
PRM                   | 0.78                         | 0.33          | 0.64                      | 0.39
RRT* (first solution) | 0.86                         | 0.37          | 1.47                      | 3.46
RRT* (final solution) | 0.87                         | 0.43          | 1.41                      | 2.84

Graph-to-other-method ratio (across the cases where both methods found a solution to the query). We can see that E-Graphs are generally significantly faster than the other methods (except bi-directional RRT). In Table III we compare path quality on two metrics: the length of the path the carried object travels and the length of the path the base travels. The results in this table are also ratios, and we can see that all methods produce shorter paths than E-Graphs. This is expected for Weighted A*, since E-Graph paths sometimes go somewhat out of their way to find solutions more quickly. Sampling-based planners generally have poor path quality, but the short-cutting post-processing step works well on this relatively simple experiment, where most paths are very close to straight lines in configuration space. Our next experiment is a more difficult kitchen scenario, where the more complicated paths make short-cutting less effective.

2) Kitchen Environment: A second set of tests was run in a simulated kitchen environment. 50 goals were chosen in locations where objects are often found (e.g., tables, countertops, cabinets, the refrigerator, the dishwasher). 10 representative goals were chosen to bootstrap our planner, which was then tested against the remaining 40 goals. Figure 4 shows GE both after bootstrapping and after all the goals have been processed. Table IV shows the planning times for E-Graphs. On average, 80% of the edges on a path produced by our planner were recycled from GE. The mean time to update GE between queries (add the new path and compute Floyd-Warshall) was 0.34 seconds. In Table V we can see that E-Graphs provide a significant speed-up over all the other methods, generally over 30 times (though only a factor of 2 over bi-directional RRT). The PRM and RRT* also failed to solve many of the queries within the 2 minute limit. In Table VI we can see that E-Graphs provide significantly shorter paths than bi-directional RRT (Figure 5 shows an example).
While RRT* path quality does improve with time, after 2 minutes it is still no better than E-Graphs. The PRM does do better than E-Graphs in the kitchen scenario;

Fig. 4. Full-body planning in a kitchen scenario. (a), (b) GE after bootstrap goals. (c), (d) GE after test goals.

TABLE V
KITCHEN ENVIRONMENT: PLANNING TIME COMPARISON

method                | successes (of 40) | mean speed-up | std dev speed-up | max speed-up
Weighted A*           | 37                | 34.62         |  87.74           | 506.78
RRT-Connect           | 40                |  1.97         |   2.35           |  11.32
PRM                   | 25                | 16.52         |  74.25           | 372.90
RRT* (first solution) | 23                | 50.99         | 141.35           | 613.54

however, the PRM only solved 25 of the 40 cases and is therefore only compared against E-Graphs on easier queries, where short-cutting still works well. We ran an additional experiment in the kitchen scenario to show the consistency of E-Graph solutions. Consistency measures how similar the output of a planner is given similar inputs (start and goal). In many domains, this kind of path

Fig. 5. An example comparing an RRT-Connect path (a) to an E-Graph path (b); path waypoints move from black to white. The E-Graph path is shorter and more goal-directed.

TABLE IV
KITCHEN ENVIRONMENT: E-GRAPH PLANNING TIME

successes (of 40) | mean time (s) | std dev (s) | max (s)
40                | 0.33          | 0.25        | 1.00

TABLE VI
KITCHEN ENVIRONMENT: PATH QUALITY COMPARISON

method                | object XYZ path length ratio | std dev ratio | base XY path length ratio | std dev ratio
Weighted A*           | 0.91                         | 0.68          | 1.14                      | 1.40
RRT-Connect           | 2.54                         | 4.67          | 3.45                      | 9.67
PRM                   | 0.85                         | 0.32          | 0.88                      | 0.48
RRT* (first solution) | 1.08                         | 0.60          | 1.39                      | 1.79
RRT* (final solution) | 1.03                         | 0.48          | 1.36                      | 1.96

Fig. 6. Consistency experiment in a kitchen scenario. 58 goals were split into two groups and the various methods were asked to plan between them.

Fig. 7. Learning by demonstration in a more difficult warehouse scenario. (a) A demonstrated path. (b) GE after 12 goals.

TABLE VII
KITCHEN ENVIRONMENT: PATH CONSISTENCY COMPARISON

method                | dynamic time warping similarity
E-Graphs              | 407
RRT-Connect           | 1512
PRM                   | 748
RRT* (first solution) | 1432
RRT* (final solution) | 1034

predictability is critical for people to be comfortable around robots. We tested this by placing two groups of goals in the kitchen environment, as shown in Figure 6. There are 58 goals split evenly between a cabinet and a refrigerator shelf. We ran through the goals alternating between the two groups, so that while no two queries have the same start and goal, the paths should be mostly the same for each. The results of the experiment are shown in Table VII. We used the dynamic time warping similarity metric [16] to compare the methods; a value closer to 0 means the paths are more similar. Since this metric compares pairs of paths, we performed an all-pairs comparison within each group and then took the average path similarity. We can see that E-Graphs have by far the most consistent paths, due to the deterministic nature of the planner and the reuse of previous path segments.

B. Learning by Demonstration

The high dimensionality of the full-body domain makes it easy for the planner to get stuck in large local minima. This is especially true when the heuristic is misleading, such as in Figure 7a. After picking up an object at the pallet and given a goal in the shelf, the heuristic guides the search around the right side of the pole, where the robot's body won't fit through. The search without Experience Graphs then takes over 30 minutes to solve the problem. However, if a path were demonstrated and then added to GE, our approach could harness this information to reach all the desired goals. We demonstrated such a path through teleoperation in the simulated world (Figure 7a). We added this demonstrated path to GE by having each recorded waypoint be a vertex, with costs given by the cost function used in our approach. After the demonstration, our approach had an average planning time of 1.08 seconds over all 12 goals, as shown in Figure 7b.
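The dynamic time warping similarity used in the consistency comparison can be sketched as the standard dynamic-programming recurrence (an illustrative sketch, not the authors' code; dist is any waypoint-to-waypoint distance, and [16] describes the original formulation):

```python
def dtw_distance(path_a, path_b, dist):
    """Dynamic time warping cost between two paths: the minimum summed
    waypoint distance over all monotone alignments of the two paths."""
    INF = float('inf')
    n, m = len(path_a), len(path_b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(path_a[i - 1], path_b[j - 1])
            # Extend the cheapest of: advance a, advance b, advance both.
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because DTW aligns waypoints non-uniformly, two paths that follow the same route at different speeds (or with different waypoint densities) still score as similar, which suits comparing planner outputs of varying length.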
While learning by demonstration is not the focus of this work, these preliminary results show it as a promising application. It is also interesting to note that any path can be demonstrated regardless of how sub-optimal the

Fig. 8. Tabletop manipulation experiments. (a) Grasping pipeline setup. (b) GE partway through the experiment.

quality is. Our planner will use parts of it where possible and return a solution with a bound on the resulting solution quality.

C. Single Arm Planning

The planner's performance was also tested for tabletop manipulation using the PR2 robot (Figure 8a). A search-based planner [4] generates safe paths for each of the PR2's 7-DOF arms separately. The goals for the planner are specified as the position and orientation of the end-effector. Our implementation builds on the ROS grasping pipeline to pick up and put down objects on the table in front of the robot. Our approach is used during both the pick and place motions to plan paths for the robot's arms (individually). In the experiments, statistics for 411 planning requests were recorded using Weighted A*, our approach, and a randomized planner (SBL [17]). The results are shown in Table VIII, and Figure 8b shows GE part-way through the experiment. We set ε = 2 and εE = 50 for a sub-optimality bound of 100. We ran the regular Weighted A* planner with ε = 100. Table VIII shows that we have a speed increase of about 2.5 over both methods. The heuristic computation time (0.1 s) dominates the planning times for both our approach and the Weighted A* approach, resulting in a smaller speed-up than expected from the ratio of expansions. On average, 95% of the edges on a path produced by our planner were recycled from


TABLE VIII
RESULTS ON TABLETOP MANIPULATION (411 GOALS)

                 | E-Graphs (E) | Weighted A* (W) | SBL (S) | Ratio (W/E) | Ratio (S/E)
mean time (s)    | 0.13         | 0.26            | 0.24    | 2.50        | 2.44
std dev time (s) | 0.07         | 0.10            | 0.09    | 1.23        | 1.50
mean expands     | 4            | 145             | N/A     | 66          | N/A
mean cost        | 117349       | 109297          | N/A     | 1.03        | N/A

Fig. 9. E-Graphs provide similar solutions to similar problems. (a) 40 end effector trajectories with similar starts and goals; our approach is in black, while the sampling approach is in gray. (b) One visualized path.

comparison against the same search-based planning method without E-Graphs showed a speed-up of 1 to 2 orders of magnitude. Our comparison against a state-of-the-art sampling-based approach showed that we are competitive in terms of speed but yield more consistent paths. As future work, we would like to use heuristics to prune GE as it gets large, and to take a deeper look at applications in the field of learning by demonstration. We are also interested in making this an anytime search, so that the solution approaches optimality as more time is allowed.


TABLE IX
LENGTH OF 40 SIMILAR QUERIES IN TABLETOP MANIPULATION

         | mean length (m) | std dev length (m)
E-Graphs | 1.378           | 0.012
SBL      | 1.211           | 0.178

GE. The mean time to update GE (add the new path and compute Floyd-Warshall) was 0.12 seconds. These results show that our approach is competitive with sampling-based methods in terms of planning times. However, our approach provides explicit cost minimization and, therefore, some notion of consistency (for similar goals we provide similar paths). Figure 9a shows the end effector paths for 40 queries that had similar starts and goals (within a 3×3×3 cm cube). The gray paths are from the randomized approach, while the black paths are from our approach. Notice that the randomized approach produces highly varying paths even after short-cutting, whereas our approach consistently provides almost the same path each time (we also applied a simple short-cutter to eliminate artifacts from the discretized environment). Table IX shows that our approach has only a slightly longer path length (for the end effector distance) but a significantly lower standard deviation. While our planner's cost function actually minimizes the change in joint angles, our average end effector path length is still competitive with the sampling-based approach. Figure 9b shows one of these paths visualized in more detail.

VII. CONCLUSION

In this paper we have presented planning with Experience Graphs, a general search-based planning method for reusing parts of previous paths in order to speed up future planning requests. Our approach does this while still providing theoretical guarantees on completeness and a bound on the solution quality with respect to the graph representing the planning problem. The paths our planner uses can be fed back from each planning episode in an online fashion, allowing the planner to improve over time. Paths can also be demonstrated by a human (or another planner) to help our planner improve (while still providing a bound on solution quality).
We provided experiments in the robotics domains of planning for a 7 DoF arm in tabletop manipulation tasks and full-body planning for the PR2 robot performing mobile manipulation tasks in warehouse and kitchen scenarios. Our

REFERENCES

[1] C. Atkeson and J. Morimoto. Nonparametric representation of policies and value functions: A trajectory-based approach. In Advances in Neural Information Processing Systems (NIPS), 2003.
[2] D. Berenson, P. Abbeel, and K. Goldberg. A robot path planning framework that learns from experience. In IEEE International Conference on Robotics and Automation (ICRA), 2012.
[3] J. Bruce and M. Veloso. Real-time randomized path planning for robot navigation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2002.
[4] B. J. Cohen, G. Subramanian, S. Chitta, and M. Likhachev. Planning for manipulation with adaptive motion primitives. In IEEE International Conference on Robotics and Automation (ICRA), 2011.
[5] I. A. Sucan, M. Moll, and L. E. Kavraki. The Open Motion Planning Library. IEEE Robotics & Automation Magazine, 2012. URL http://ompl.kavrakilab.org. To appear.
[6] D. Ferguson, N. Kalra, and A. Stentz. Replanning with RRTs. In IEEE International Conference on Robotics and Automation (ICRA), May 2006.
[7] N. Jetchev and M. Toussaint. Trajectory prediction: Learning to map situations to robot trajectories. In IEEE International Conference on Robotics and Automation (ICRA), 2010.
[8] X. Jiang and M. Kallmann. Learning humanoid reaching tasks in dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007.
[9] J. J. Kuffner and S. M. LaValle. RRT-Connect: An efficient approach to single-query path planning. In IEEE International Conference on Robotics and Automation (ICRA), 2000.
[10] S. Karaman and E. Frazzoli. Incremental sampling-based algorithms for optimal motion planning. In Robotics: Science and Systems (RSS), Zaragoza, Spain, June 2010.
[11] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566-580, 1996.
[12] J. Lien and Y. Lu. Planning motion in environments with similar obstacles. In Robotics: Science and Systems (RSS), 2005.
[13] M. Likhachev, G. Gordon, and S. Thrun. ARA*: Anytime A* with provable bounds on sub-optimality. In Advances in Neural Information Processing Systems (NIPS) 16. Cambridge, MA: MIT Press, 2003.
[14] C. Liu and C. G. Atkeson. Standing balance control using a trajectory library. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[15] S. Martin, S. Wright, and J. Sheppard. Offline and online evolutionary bi-directional RRT algorithms for efficient re-planning in environments with moving obstacles. In IEEE Conference on Automation Science and Engineering, 2007.
[16] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26, 1978.
[17] G. Sanchez and J.-C. Latombe. A single-query bi-directional probabilistic roadmap planner with lazy collision checking. In International Symposium on Robotics Research (ISRR), 2001.
[18] M. Stolle and C. Atkeson. Policies based on trajectory libraries. In IEEE International Conference on Robotics and Automation (ICRA), 2006.
[19] M. Stolle, H. Tappeiner, J. Chestnutt, and C. Atkeson. Transfer of policies based on trajectory libraries. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007.
[20] M. Zucker, J. Kuffner, and M. Branicky. Multipartite RRTs for rapid replanning in dynamic environments. In IEEE International Conference on Robotics and Automation (ICRA), 2007.


Walking and Running on Yielding and Fluidizing Ground

Feifei Qian¹, Tingnan Zhang¹, Chen Li¹, Pierangelo Masarati⁴, Aaron M. Hoover³, Paul Birkmeyer², Andrew Pullin², Ronald S. Fearing² & Daniel I. Goldman¹

¹ Georgia Institute of Technology, Atlanta, GA, USA 30332-0250
² University of California at Berkeley, Berkeley, CA, USA 94720
³ Franklin W. Olin College of Engineering, Needham, MA, USA 02492-1200
⁴ Dipartimento di Ingegneria Aerospaziale, Politecnico di Milano, Milano, Italy 20156

Abstract—We study the detailed locomotor mechanics of a small, lightweight robot (DynaRoACH, 10 cm, 25 g) which can move on a granular substrate of closely packed 3 mm diameter glass particles at speeds up to 50 cm/s (5 body length/s), approaching the performance of small, high-performing, desert-dwelling lizards. To reveal how the robot achieves this high performance, we use high speed imaging to capture kinematics, and develop a numerical multi-body simulation of the robot coupled to an experimentally validated discrete element method (DEM) simulation of the granular media. Average forward speeds measured in both experiment and simulation agreed well, and increased non-linearly with stride frequency, reflecting a change in the mode of propulsion. At low frequencies, the robot used a quasi-static "rotary walking" mode, in which the granular material yielded as the legs penetrated and then solidified once vertical force balance was achieved. At high frequencies, duty factor decreased below 0.5 and aerial phases occurred. The propulsion mechanism was qualitatively different: the robot ran rapidly by utilizing the speed-dependent fluid-like inertial response of the material. We also used our simulation tool to vary substrate parameters that were inconvenient to vary in experiment (e.g., granular particle friction) to test performance and reveal limits of stability of the robot.
Using small robots as physical models, our study reveals a mechanism by which small animals can achieve high performance on granular substrates, which in turn advances the design and control of small robots on deformable terrains.

Keywords: bio-inspired robot; granular media; lightweight; legged locomotion

I. INTRODUCTION

There is an increasing need for robots to traverse a diversity of complex terrain. Platforms have been developed that can effectively run on fractured rigid ground [1], [2], crawl within concave surfaces [3], and climb on walls [4]. However, relative to biological organisms [5], manmade devices often have poor locomotor ability on granular substrates like sand and gravel. We acknowledge funding from the Burroughs Wellcome Fund, the Army Research Laboratory (ARL) Micro Autonomous Systems and Technology (MAST) Collaborative Technology Alliance (CTA), the National Science Foundation (NSF) Physics of Living Systems (PoLS), and the Army Research Office (ARO).

For example, in wheeled and tracked vehicles, wheel slippage and sinkage can cause significant performance loss [6]. Granular media are collections of particles that interact through dissipative, repulsive contact forces [7]. Forced granular media remain solid below the yield stress but flow like a fluid when the yield stress is exceeded [8]. The solid-fluid transition presents great challenges for terrestrial devices moving on granular media. For example, previous studies [9] demonstrated that a bio-inspired RHex-class legged robot, SandBot (30 cm, 2.3 kg), walked effectively at up to 1 body length/s on granular media at low to intermediate stride frequencies, where the granular material behaved like a yielding solid. The granular material yielded as the legs penetrated until vertical force balance was achieved. The granular material then solidified under the legs while the body lifted and moved forward, as if the robot were walking on a solid. At high stride frequencies, however, because the legs encountered previously disturbed ground, the granular material around the legs became continuously fluidized, and the robot “swam” forward slowly (~ 0.01 body length/s) using drag on the legs to overcome belly drag. In contrast, a variety of animals live in the deserts and move rapidly across granular surfaces. For example, the zebra-tailed lizard (Callisaurus draconoides, ~ 10 cm, ~ 10 g) can run at speeds over 100 cm/s (10 body length/s) on sand. Unlike SandBot which must penetrate a large portion (> 70%) of its legs to move on granular media, the lizard is light enough that even while running it only penetrates a small portion (< 30%) of its legs into granular substrates to generate force [5]. This suggests that a small, lightweight body may confer advantages for locomotion on deformable surfaces such as granular media. 
Recent advances in smart composite microstructure (SCM) [10] have enabled the development of small, lightweight robots (~ 10 cm, ~ 20 g) [11] [12] like DynaRoACH (Fig. 1A). These robots are similar in size to the zebra-tailed lizard (among other myriad desert vertebrates and invertebrates [13], [14]) and can achieve performance approaching animals (~ 10 body length/s) on solid surfaces. Therefore, in addition to advancing locomotor capabilities of devices on complex terrain [15], these lightweight robots provide promising physical models to study how effective legged locomotion can be achieved on granular substrates in small, high-performing animals.

345

Another challenge for studying locomotion on granular media is the lack of comprehensive force models at the level of the Navier-Stokes equations for fluids [16]. Recently an experimentally validated discrete element method (DEM) simulation (described below) of a model granular medium (3 mm diameter glass particles) was developed and successfully captured the locomotor mechanics of a sand-swimming lizard moving within granular media [17]. The DEM simulation provides a tool to obtain accurate, detailed information such as force and flow of the granular media during intrusions relevant to locomotion. Such information is challenging to obtain in experiments, since force platforms [18] and 3D particle image velocimetry (PIV) are not yet developed for deformable opaque ground. In this paper we reveal basic principles of movement of lightweight locomotors on granular media using a combination of laboratory experiment and computer simulation. We perform studies of DynaRoACH on a medium of small glass particles. To obtain estimates of ground reaction forces that result in high performance, we integrate the DEM simulation with a multibody dynamic simulation of the robot. Our study reveals for the first time that qualitatively different propulsion mechanisms exist for low and high frequency movement on granular media. While the low frequency locomotion of DynaRoACH can be understood using a previously introduced “rotary walking” model, at higher frequency, the robot utilizes a hydrodynamic mechanical response of the granular medium to achieve high performance through sustained fluidization of the ground. We use the simulation to systematically vary parameters like particle friction that are inconvenient to modify in experiment, and demonstrate performance and stability limits. 
We expect that the mechanics discovered here and the tools we have developed should be applicable to other devices and provide a starting point to understand biological locomotion and develop robot designs on more complex deformable substrates, like leaf litter and mud. II.

demonstrated that these particles were a good model for studying locomotion within granular media. In nature, granular media exist in a range of compactions, measured by the volume fraction φ (the ratio between the solid volume and the occupied volume). For dry granular media, φ can vary over 0.57 < φ < 0.64 [19], although this range is influenced by particle friction [20]. The yield strength of a granular medium generally increases with φ [9], [21] and affects locomotor performance on the granular medium [9]. In our study, we prepared the granular medium into a closely packed state (φ = 0.63). However, we found that our results did not qualitatively change for different φ (e.g., robot speed was insensitive to φ), likely because the robot penetrated its legs into the granular medium to depths of only a few particle diameters, and the range of achievable φ was small (0.61 < φ < 0.63) in the low friction 3 mm particles.
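For monodisperse spheres, the volume fraction defined above is straightforward to compute; the following sketch (with illustrative parameters, not the authors' measurement procedure) simply divides the total sphere volume by the occupied bed volume:

```python
import math

def volume_fraction(n_particles, particle_diameter, occupied_volume):
    """phi = (total solid sphere volume) / (occupied volume).
    All lengths in consistent units (e.g., cm and cm^3)."""
    r = particle_diameter / 2.0
    solid_volume = n_particles * (4.0 / 3.0) * math.pi * r ** 3
    return solid_volume / occupied_volume
```

For example, counting particles that fill a known trackway volume and evaluating this ratio distinguishes a loosely packed bed (φ near 0.57) from a closely packed one (φ near 0.63).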

II. MATERIALS AND METHODS

A. Experiments

1) Robotic platform: The DynaRoACH robot used in this study (Fig. 1A) is a small, lightweight (10 cm, 25 g), bio-inspired hexapedal robot [11]. It has six c-shaped legs (radius 1 cm) and uses an alternating tripod gait. All six legs are driven by a single motor through body linkages. The motor is controlled by a centralized controller mounted close to the center of mass (CoM) of the robot. Control parameters like stride frequency, running time, and PID control gains are set on a PC and communicated to the controller through a Bluetooth wireless interface.

2) Model granular media: We used 3.0 ± 0.2 mm diameter glass particles (density 2.47 g/cm³) as the granular medium (see Fig. 1C). The large size of the particles reduces computation time in the simulation portion of the study, facilitating a direct comparison between experiment and simulation. While these particles are larger than most natural sand grains, they have similar qualitative behavior in response to intrusion; a previous study [17] also


Figure 1. Locomotion experiment and simulation. (A) The lightweight, hexapedal DynaRoACH robot resting on a bed of 3 mm diameter glass particles. (B) Leg tip trajectories from top view and side view. Blue and green trajectories denote the two alternating tripods. (C) High-speed video experimental setup. (D) Simulation of the robot using MBDyn. (E) Leg tip trajectories in simulation. (F) Simulation of the robot running on a bed of 3 mm particles.

The simulated granular bed (3 × 10⁵ particles) was 60 PD (particle diameters) wide, 15 PD deep, and 290 PD long, with frictionless boundaries (Fig. 1F). At low frequencies we used a shorter granular bed (90 PD long) containing 1 × 10⁵ particles to save computation time.

3) Locomotion experiments: We ran the DynaRoACH robot on a 75 cm long, 30 cm wide trackway filled to a depth of 6 cm (Fig. 1C). We pressed the bed with a flat plate before each trial to prepare the particles into a closely packed state. Running kinematics were captured by two high-speed video cameras (AOS X-PRI) from top and side views at a frame rate of 200 fps. One high-contrast dorsal marker was bonded above the robot center of mass to obtain average forward speed; two lateral markers were bonded on the front and rear of the robot body to obtain CoM height (approximated by the average vertical position of the two markers). Stride frequency was determined from the videos.
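For concreteness, the volume fraction defined earlier can be computed from a measured particle mass and the occupied bed volume. A small sketch: only the glass density (2.47 g/cm³) and trackway dimensions come from the text; the mass value is back-computed purely for illustration.

```python
# Only the 2.47 g/cm^3 glass density and 75 x 30 x 6 cm bed come from the
# text; the mass used below is an illustrative back-computed value.
PARTICLE_DENSITY = 2.47  # g/cm^3, 3 mm glass particles

def volume_fraction(total_particle_mass_g, occupied_volume_cm3):
    """phi = solid volume / occupied volume."""
    solid_volume = total_particle_mass_g / PARTICLE_DENSITY
    return solid_volume / occupied_volume_cm3

bed_volume = 75 * 30 * 6  # cm^3, the filled trackway
mass_for_closely_packed = 0.63 * bed_volume * PARTICLE_DENSITY  # invert phi = 0.63
phi = volume_fraction(mass_for_closely_packed, bed_volume)
print(round(phi, 2))  # -> 0.63 (round trip)
```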

2) Dynamic simulation of the robot: To model the robot we used a multi-body dynamic simulator, MBDyn [23], which allows time-domain simulation of multi-body mechanical systems from first-principles equations. MBDyn provides full 3D simulation with all six translational and rotational degrees of freedom. This is essential for locomotion on the surface, during which pitch, roll, and yaw are often present [15].

4) Leg trajectories: To capture the prescribed leg trajectories, the robot was suspended in the air and the trajectories of the leg tips were recorded as the motor rotated. During a cycle, each leg rotated backward about the hip (retraction), lifted sideways, and swung forward (protraction) (Fig. 1B). Leg kinematic parameters such as the fore-aft swing angle and lateral lifting angle were determined by tracking the markers on the legs and used to guide tuning of the leg trajectories in simulation. We define a stride period T as the time between the starts of two consecutive retraction phases, and stance as the interval during which the leg generates ground reaction force (determined from simulation).

In the dynamic simulation, the robot was constructed with body and leg geometries similar to those of the actual robot (Fig. 1D). The simulated robot was composed of 13 individual rigid parts: one box-shaped body, six c-shaped legs, and six linking plates between the legs and body. The legs of the actual robot were not perfectly rigid, but experimental observations showed little leg deformation during locomotion on granular media. The joints between the link plates and c-legs allowed front-back swing of the legs, while the plate-body joints allowed sideways lifting of the legs. Tuning kinematic parameters for the joint movements produced leg trajectories that resembled experimental measurements (Fig. 1E) without mimicking the internal linkage of the actual robot.

B. Simulation

1) Discrete element method to model contact forces: To investigate the locomotion of DynaRoACH in more detail, a simulation of the robot was developed and coupled to a granular media simulation. This granular simulation used the discrete element method (DEM) to compute the particle-particle and particle-intruder interactions for the 3 mm diameter glass particles. As in previous work [17], the DEM simulation was validated by matching the forces on intruders moving in the granular medium (e.g., a rod dragged horizontally) with experimental measurements.
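The DEM contact force law used for these interactions (a Hertzian normal repulsion with velocity-dependent dissipation, and Coulomb friction tangentially, per the formulas and parameter values quoted later in this section) can be sketched in a few lines. This is an illustrative implementation of the printed formulas, not the actual simulation code; the clamp to non-negative normal force is our own assumption.

```python
import math

# Parameter values quoted in the text (validated against rod-drag data [17]).
K_N = 2e6    # contact stiffness, kg s^-2 m^-1
G_N = 15.0   # viscoelastic dissipation coefficient, kg s^-1 m^-1/2
MU_PP = 0.1  # particle-particle friction coefficient
MU_PI = 0.3  # particle-intruder friction coefficient

def normal_force(delta, v_n):
    """F_n = k_n * delta^(3/2) - G_n * v_n * delta^(1/2).

    delta: virtual overlap between the contacting pair (m)
    v_n:   normal component of the relative velocity (m/s),
           with the sign convention of the printed force law
    """
    if delta <= 0.0:
        return 0.0  # pair not in contact
    f = K_N * delta**1.5 - G_N * v_n * math.sqrt(delta)
    return max(f, 0.0)  # assume contacts cannot pull (our clamp)

def tangential_force(f_n, mu=MU_PP):
    """Coulomb friction magnitude |F_s| = mu * F_n, opposing sliding."""
    return mu * f_n
```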

3) Integration of DEM with dynamic simulation: We combined MBDyn with the DEM code to simulate robot locomotion on granular media via a communication interface (a UNIX socket, using C++). At each time step, MBDyn integrated the equations of motion for the robot under the forces and torques calculated by the DEM code. The updated kinematics, including position, orientation, velocity, and angular velocity of each part of the robot, were then passed back to the DEM code to compute the force and torque on all interacting elements at the next time step. The time step was 1 μs, set by the particle collision time in the DEM.
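The per-step exchange described above can be sketched schematically. The object and method names below are invented for illustration only; the real coupling passes data between the two codes over a UNIX socket.

```python
# Schematic of the MBDyn <-> DEM exchange; `mbdyn` and `dem` stand in for
# the two coupled solvers and their methods are hypothetical names.
DT = 1e-6  # time step: 1 microsecond, set by the particle collision time

def cosimulate(mbdyn, dem, n_steps):
    # Start from the granular forces/torques on each robot part at t = 0.
    forces = dem.contact_forces()
    for _ in range(n_steps):
        # 1) MBDyn integrates the robot's equations of motion under the
        #    granular forces and torques supplied by the DEM side.
        kinematics = mbdyn.step(forces, DT)
        # 2) Updated position/orientation/velocity of every rigid part is
        #    passed back so the DEM can recompute particle-robot contacts.
        dem.update_intruders(kinematics)
        forces = dem.step(DT)
    return mbdyn.state()
```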

In addition to the kinematics during locomotion (e.g., CoM position and velocity, stride length, limb penetration depth), the dynamics during locomotion (e.g., the net ground reaction force on each limb and tripod) were also determined from the simulation.

In the DEM simulation, the normal contact force between an interacting pair of particles is given by a standard force law [22], a Hertzian repulsion plus a velocity-dependent dissipation (to account for the coefficient of restitution):

F_n = k_n δ^(3/2) − G_n v_n δ^(1/2);

the tangential contact force is modeled as Coulomb friction:

F_s = μ F_n;

where δ is the deformation (virtual overlap) between contacting particle-particle or particle-intruder pairs, v_n the normal component of the relative velocity, k_n = 2 × 10⁶ kg s⁻² m⁻¹ and G_n = 15 kg s⁻¹ m^(−1/2) the contact stiffness and viscoelastic dissipation coefficients, and μ_pp = 0.1 and μ_pi = 0.3 the particle-particle and particle-intruder friction coefficients. The restitution and friction coefficients were experimentally measured and validated in a rod drag experiment [17]. Once these parameters were set in the DEM simulation, the robot locomotion could be accurately predicted over a wide range of conditions.

C. Kinematic predictions based on previous work

Because the small DynaRoACH robot has c-legs similar in shape to those of the larger SandBot, we use the rotary walking model developed for SandBot [9] to make two kinematic predictions for the locomotion of the DynaRoACH robot on granular media. We then test these predictions in both experiment and simulation.

First, we predict that both body height and forward speed will increase during stance and decrease between stances. For walking at low frequencies, leg intrusion speeds will be small enough that granular forces are dominated by friction, and are therefore independent of speed and increasing with depth (hydrostatic-like) [24], [25]. As a result, during each step the legs will initially slowly penetrate into the granular


medium while the body rests on the surface. As the legs penetrate deeply enough for the lift on the legs to balance the weight and vertical inertial force of the body, the legs should stop penetrating and rotate atop solidified granular media, lifting the body and kinematically propelling it forward. As the legs withdraw from the granular medium, the body should fall and forward speed should decrease to zero. We refer to this as rotary walking in SandBot, and expect to see the same features in the small robot.

Second, based on the rotary walking model, we predict that stride length should decrease with stride frequency. In the quasi-static rotary walking mode, stride length is inversely related to leg penetration depth by geometry. As stride frequency increases, the inertial force of the body during body lift-up increases, so the legs should penetrate more deeply, and stride length should therefore decrease. In addition, the transition from walking to swimming should be triggered by this reduction in stride length: at high enough stride frequency, stride length should become small enough that the legs encounter previously disturbed material during each step.

III. RESULTS AND DISCUSSION

A. Kinematics

The time-averaged forward speed of the robot measured in both experiment and simulation (Fig. 3A) agreed well, and

increased monotonically with stride frequency. At the highest stride frequency tested in experiment (12 Hz), the robot reached a speed of 50 cm/s (5 body lengths/s), comparable to slow runs of the zebra-tailed lizard. Calculated stride length (Fig. 3B) decreased with stride frequency from low (0–3 Hz) to intermediate frequencies (4–6 Hz) but increased with stride frequency at high frequencies (7–12 Hz). Duty factor (the percentage of the total stride period during which the limb is in contact with the ground) measured in simulation (Fig. 2E,J) fell below 0.5 at intermediate frequencies of ~6 Hz (Fig. 3C), indicating the onset of aerial phases. Closer examination of the kinematics revealed that the robot displayed a transition in locomotor mode as stride frequency increased.

1) Walking at low stride frequencies: At low stride frequencies (e.g., 3 Hz, Fig. 2A-E), as predicted, the DynaRoACH robot used a quasi-static rotary walking locomotor mode, in which forward speed increased sublinearly with stride frequency (i.e., stride length decreased; Fig. 3B). Instantaneous forward speed increased from 0 to 25 cm/s during most of stance and then dropped to zero (Fig. 2C). The vertical position of the CoM tracked in simulation (Fig. 2A,B,D) showed that average body height was 1.28 ± 0.03 cm during stance; body height increased by 0.46 ± 0.03 cm (38% of the standing body height of 1.2 cm) during most of stance, and then decreased by the same amount.
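The duty factor used above can be extracted from a sampled vertical force trace over one stride. A minimal sketch; the trace and contact threshold below are toy values for illustration only.

```python
def duty_factor(fz_samples, threshold=0.0):
    """Fraction of samples in one stride period where the limb bears load.

    fz_samples: vertical ground reaction force sampled over a single stride
    threshold:  force level above which the limb counts as in contact
    """
    in_stance = [f > threshold for f in fz_samples]
    return sum(in_stance) / len(in_stance)

# Toy trace: contact during 40% of the stride -> duty factor 0.4, i.e.,
# below 0.5, indicating an aerial phase as seen above ~6 Hz.
trace = [1.0] * 40 + [0.0] * 60
print(duty_factor(trace))  # -> 0.4
```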

Figure 2. Two locomotion modes observed for DynaRoACH moving on granular media. (A-E) Walking at low frequencies (e.g., 3 Hz). (F-J) Running at high frequencies (e.g., 11 Hz). (A,F) Side view of the robot in experiment. (B,G) Side view of the robot in simulation. (C,H) Instantaneous forward speed vs. time. (D,I) Body height vs. time. (E,J) Vertical ground reaction force on a tripod vs. time. Dashed and solid curves correspond to the two alternating tripods. The black horizontal line indicates body weight (mg). In (A,B) and (F,G) the three time instants shown are the start, middle, and end of three different stances. In (C-E) and (H-J) data are shown for simulation only. Duty factor is stance duration t divided by stride period T.


The observed decrease in stride length with stride frequency, the increase in body height and forward speed during most of stance, and the decrease in body height and forward speed between stances were in accord with the rotary walking model [9] (Fig. 3A,B, red curve). This suggests that, like SandBot, the small DynaRoACH robot also rotary-walked on solidified granular media at low frequencies.

water [27] or granular surfaces [5], [9], it is critical to generate sufficient F_z to balance the weight and inertial force of the body before the legs sink too deeply into the substrate. Averaged over a cycle, the lift must equal the body weight, i.e.,

(1/T) ∫_0^{DT} F_z dt = mg,

where mg is the body weight of the robot, T the cycle period, and D the duty factor, defined as the stance duration divided by T.
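A direct consequence of this cycle-averaged constraint is that the smaller the fraction of the cycle spent in contact, the larger the stance forces must be. A small numeric sketch; the 0.3 contact fraction below is an assumed illustrative value, not a measured one.

```python
def min_peak_force_ratio(contact_fraction):
    """If lift acts only while legs touch the ground (a fraction D of the
    cycle), the stance-averaged F_z must be mg/D, so the peak F_z is at
    least 1/D body weights."""
    return 1.0 / contact_fraction

# Continuous contact (slow walking): peak must reach at least 1 body weight.
print(min_peak_force_ratio(1.0))  # -> 1.0
# Assumed contact fraction of 0.3 at high frequency (illustrative only):
print(min_peak_force_ratio(0.3))  # peak of at least ~3.3 body weights
```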

The decrease in body height and forward speed occurred at the start of stance; however, this does not contradict the rotary walking model, but is a consequence of the different leg trajectories of the small robot and SandBot. Because SandBot rotates its legs in circular trajectories, its body must rest on the surface between two tripods, resulting in stance phases that begin during the retraction phases of the legs. The small DynaRoACH robot instead uses protraction-retraction leg trajectories, which result in stance phases that begin during the protraction phases of the legs.

2) Running at high stride frequencies: At high stride frequencies (e.g., 11 Hz, Fig. 2F-J) the DynaRoACH robot exhibited a different locomotor mode than predicted by the rotary walking model. The forward speed of DynaRoACH increased super-linearly with stride frequency (i.e., stride length increased; Fig. 3B). Instantaneous forward speed was always greater than zero; it decreased during the first half of stance and then increased during the second half of stance (Fig. 2H). The average body height measured in simulation was 1.86 ± 0.02 cm above the surface (Fig. 2F,G,I), which was 0.58 cm (48% of the standing body height of 1.2 cm) higher than at low frequencies. Body height decreased by 0.19 ± 0.02 cm (16% of the standing body height) during the first half of stance and increased by the same amount during the second half. Simulation revealed that grains around the intruding legs remained fluidized throughout the stance phase. While these kinematics differed from those predicted by the rotary walking model (Fig. 3A,B, red curve), they also differed from the slow surface swimming that SandBot used at high stride frequencies [9], in which body height remained constant while the belly lay on the surface and forward speed was small (~1 cm/s).
The decrease in body height and instantaneous forward speed during the first half of stance, the monotonic increase of average forward speed with stride frequency, and the aerial phases observed in the small robot at high stride frequencies resembled those observed in the zebra-tailed lizard running on granular substrates [5], which follows a spring-loaded inverted pendulum (SLIP) model [26]. This suggested that, unlike SandBot but like the zebra-tailed lizard, the small robot used a SLIP-like running mode at high frequencies.

B. Vertical ground reaction force

To understand the mechanism of the transition in locomotor mode from walking at low frequencies to running (but not swimming) at high frequencies, we examined in simulation the vertical ground reaction force, F_z, on a tripod of legs. For animals and legged robots moving on deformable or yielding substrates such as the surface of

Figure 3. Performance and gait parameters. (A) Average forward speed vs. stride frequency (circles: experiment; squares: simulation). (B) Stride length vs. stride frequency (circles: experiment; squares: simulation). (C) Duty factor vs. stride frequency (simulation only). Dashed curves in (A,B) are predictions from the rotary walking model [9]. Error bars indicate standard deviation.


At low stride frequencies (e.g., 3 Hz), because the duty factor was greater than (but close to) 0.5, the F_z on both tripods was close to the body weight for most of the cycle (Fig. 2E). As the duty factor decreased below 0.5 with increasing stride frequency, the F_z traces of the two tripods no longer overlapped, and the magnitude of F_z on each tripod increased. The peak F_z increased from ~1 mg at 3 Hz (Fig. 2E) to ~6–7 mg at 12 Hz (Fig. 2J). The peak torque on a tripod about the hips measured in simulation at 12 Hz was 10 mN·m; this was less than the stall torque of the motor-gearbox system.

mechanism from low-frequency (i.e., low vertical penetration speed) walking on yielding ground to high-frequency (i.e., high vertical penetration speed) running on fluidizing ground is generic to locomotion on granular media. However, the frequency at which the walk-to-run transition occurs should depend on parameters of the granular media as well as on the robot morphology and kinematics. The capability of the small robot to run rapidly at high frequencies on granular media using hydrodynamic-like forces, in contrast to SandBot's slow swimming, suggests that lightweight locomotors have an advantage when moving on granular surfaces. Indeed, the small robot's legs are relatively large (~1.4 cm²) compared to its body weight (25 g) (each leg applies a pressure of 800 Pa when standing), and can generate enough hydrodynamic-like lift by paddling rapidly to keep the body well above the surface. By contrast, SandBot's legs are relatively small (~5 cm²) compared to its body weight (2300 g) (each leg applies a pressure of ~1.5 × 10⁴ Pa when standing), and cannot generate enough hydrodynamic-like lift to support the body before the legs sink deeply enough to encounter previously disturbed material between steps and trigger swimming. This may explain why the zebra-tailed lizard, the highest-performing among desert lizards of similar size, has the largest hind feet [5].
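The quoted foot pressures can be roughly reproduced from the masses and leg areas, assuming the weight is shared by one tripod of three legs; the tripod-sharing assumption is ours, and the text does not state the effective contact area used.

```python
def standing_foot_pressure(mass_kg, leg_area_cm2, legs_in_contact=3, g=9.81):
    """Pressure per leg when standing; the three-legs-in-contact (tripod)
    assumption is ours, not stated explicitly in the text."""
    area_m2 = leg_area_cm2 * 1e-4
    return mass_kg * g / (legs_in_contact * area_m2)

# SandBot: 2300 g on ~5 cm^2 legs -> ~1.5e4 Pa, matching the quoted value.
print(round(standing_foot_pressure(2.3, 5.0)))  # -> 15042
# DynaRoACH: 25 g on ~1.4 cm^2 legs -> ~600 Pa, the same order as the
# 800 Pa quoted (the gap suggests a smaller effective contact area).
print(round(standing_foot_pressure(0.025, 1.4)))
```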

The large increase in F_z at high frequencies was likely the result of a different material response to leg intrusion than the friction-dominated yielding observed in the SandBot study, because the SandBot walking model based on friction-dominated hydrostatic-like forces can only explain the increase in F_z at low frequencies. At low frequencies (< 3 Hz), the granular force was friction-dominated, and was therefore assumed to be independent of intrusion speed and to increase with penetration depth. As stride frequency increased, this depth-dependent granular force increased due to the increasing inertial force associated with lifting of the body [9], resulting in increasing penetration depth (Fig. 4A). At high stride frequencies (> 3 Hz), however, the measured leg penetration depth decreased (Fig. 4A), counter to the rotary walking model's prediction. In this case, the walking model predicted a decrease in the lift force on the legs, contrary to observations (Fig. 2E,J). This discrepancy suggests that there must be an additional contribution to the force at high frequencies. Examination of leg kinematics in simulation revealed that the vertical penetration speed of the legs increased with stride frequency and reached nearly 1 m/s at 12 Hz (Fig. 4B). It is known that granular forces during high-speed impact are hydrodynamic-like and increase quadratically with impact speed [28], [29]. Our data for lift vs. vertical leg penetration speed (Fig. 5, green squares) are described approximately by a quadratic with a non-zero y-intercept (Fig. 5, red curve), the intercept reflecting the finite yield stress of the medium. We hypothesize that as the vertical leg penetration speed increases, the inertial force of the grains accelerated by the legs becomes important and contributes significantly to the vertical ground reaction force.
In other words, at high frequencies, instead of swimming, the robot runs on granular material that behaves like an "inertial fluid", much like the basilisk lizard (Basiliscus), the so-called "Jesus Christ lizard" that runs on the surface of water [27]. We used previous intrusion studies in granular media to estimate the transition frequency for the DynaRoACH robot. Studies of horizontal drag and vertical impact in granular media [24], [28], [29] suggest that inertial effects become important for intrusion speeds beyond v_c ~ (2gd)^(1/2), where d is the particle diameter and g the gravitational acceleration. For 3 mm glass particles, v_c ≈ 25 cm/s. This indicates that the hydrodynamic-like force should become significant as the vertical leg penetration speed increases beyond ~25 cm/s, i.e., as stride frequency increases past ~6 Hz (Fig. 4B). This matches the observed transition in locomotor mode around 6 Hz (Fig. 3). We posit that this transition of the propulsion

Figure 4. Limb penetration and intrusion speed measured in simulation. (A) Maximal leg penetration depth (measured in particle diameters) vs. stride frequency. (B) Maximum leg vertical penetration speed vs. stride frequency. Error bars indicate standard deviation.
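The critical intrusion speed estimate v_c ~ (2gd)^(1/2) discussed above is easy to check numerically for the 3 mm particles:

```python
import math

def critical_intrusion_speed(d_m, g=9.81):
    """v_c ~ sqrt(2 g d): speed beyond which grain inertia matters [24,28,29].

    d_m: particle diameter in meters
    """
    return math.sqrt(2.0 * g * d_m)

v_c = critical_intrusion_speed(0.003)  # 3 mm glass particles
print(round(v_c * 100, 1))  # -> 24.3 cm/s, i.e., the ~25 cm/s quoted above
```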


region of accelerating particles increases with μ_pp, because larger μ_pp facilitates interlocking of the particles. The decrease of speed with μ_pp over 0.2 < μ_pp < 0.4 for running at high frequencies is likely the result of reduced propulsion due to an asymmetric gait. We observed that in this regime the particles became more tightly interlocked and lost fluidity. The lift force on one tripod alone became sufficient to accelerate the robot upward, such that the body travelled through the air during the remainder of the cycle and the other tripod never touched the ground. As μ_pp increased above 0.4, the particles became so tightly interlocked that the granular substrate behaved like rigid ground. Locomotion became unstable as the robot bounced erratically, and its movement direction changed randomly. This is similar to the unstable motion of the robot on rigid ground [15]. Forward speed was not well defined in this regime. The perfectly rigid legs of the robot used in simulation may have contributed to this instability by generating force spikes during substrate impact [30].

Figure 5. Vertical ground reaction force vs. peak leg penetration speed measured in simulation. Dashed red curve indicates a quadratic fit with fitting parameter α = 5.7 × 10⁻³ cm⁻² s² and a non-zero intercept at mg. Error bars indicate standard deviation.

C. Parameter variation and broader applicability

Since both the DEM simulation of the granular medium and the MBDyn multi-body analysis are based on first principles (i.e., Newton-Euler equations of motion) and validated empirical models (i.e., Hertzian contact theory and the Coulomb friction model), our results should be applicable to a broad class of granular substrates. The simulation tool therefore allows rapid, accurate, and systematic variation of the properties of the locomotor (e.g., leg geometry, leg trajectories) as well as of the granular substrate (e.g., size, density, friction, and hardness of particles). For example, while in experiment it is difficult to alter one property without changing others, in simulation we can test the effect of each substrate property independently. This can provide insight for the design of future multi-terrain robots and allows testing of biological hypotheses.
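A single-parameter sweep of the kind described here can be sketched as follows. The `run_simulation` callable stands in for the coupled MBDyn/DEM pipeline and is not a real API; the saturating toy curve below merely mimics the low-frequency trend reported later.

```python
# Schematic parameter sweep; `run_simulation` is a hypothetical stand-in
# for the coupled MBDyn/DEM pipeline.
def sweep_particle_friction(run_simulation, mu_values, stride_freq_hz):
    """Vary particle-particle friction alone, holding all else fixed."""
    results = {}
    for mu_pp in mu_values:
        results[mu_pp] = run_simulation(mu_pp=mu_pp, freq=stride_freq_hz)
    return results

# Example usage with a toy saturating speed model in place of the simulator:
speeds = sweep_particle_friction(
    lambda mu_pp, freq: freq * (1.0 - 2.71828 ** (-10.0 * mu_pp)),
    [0.05, 0.1, 0.2, 0.4],
    stride_freq_hz=3)
```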

IV. CONCLUSIONS

Inspired by high-performing desert animals moving on granular media, we studied the locomotion of a lightweight, bio-inspired, legged robot on a granular substrate and developed an experimentally validated computer simulation of its locomotion. Kinematics measured in simulation matched experiment and enabled examination of the ground reaction forces responsible for the high locomotor performance. The small robot displayed a transition in locomotor mode from walking at low frequencies to running at high frequencies. At low frequencies, hydrostatic-like forces generated during yielding of the granular material ultimately led to solidification of the material, and the robot moved as if walking on a solid surface. At high frequencies, however, the inertia of the accelerated grains became important and the forces became hydrodynamic-like. In this regime the robot ran rapidly by paddling its legs through fluidized granular material, which behaved like an "inertial fluid".

To demonstrate the capabilities of our simulation, we analyzed the effect of the particle-particle friction coefficient, μ_pp. Dramatic changes in locomotor performance were observed when μ_pp was varied (Fig. 6). For walking at low frequencies (e.g., 3 Hz; Fig. 6, filled green squares), robot speed increased monotonically with μ_pp and saturated at high μ_pp. In this regime, forces were dominated by friction. As μ_pp increased, the legs needed to penetrate less deeply to balance the body weight and inertial force; as a result, stride length and speed increased with μ_pp. The saturation of speed at high μ_pp was a consequence of the leg penetration depth approaching zero with increasing μ_pp, so stride length could not increase further. For running at high frequencies (e.g., 12 Hz; Fig. 6, empty green squares), robot speed increased with μ_pp for μ_pp < 0.2 and decreased with μ_pp for μ_pp > 0.2. The increase of speed for μ_pp < 0.2 is likely because the inertial force provided by the particles accelerated by the legs increased with μ_pp. We observed that multiple layers of particles directly under and around the legs were fluidized during leg penetration. We hypothesize that the size of the

Figure 6. Robot average speed upon variation of particle-particle friction in simulation (filled squares: 3 Hz; empty squares: 12 Hz).


[9] C. Li, P. B. Umbanhowar, H. Komsuoglu, D. E. Koditschek, and D. I. Goldman, “Sensitive dependence of the motion of a legged robot on granular media,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 9, pp. 3029-3034, 2009. [10] R. J. Wood, R. Sahai, E. Steltz, and R. S. Fearing, “Microrobot design using fiber reinforced composites,” Journal of Mechanical Design, vol. 130, no. 5, pp. 1-11, 2008. [11] A. M. Hoover, S. Burden, X. Fu, S. S. Sastry, and R. S. Fearing, “Bio-inspired design and dynamic maneuverability of a minimally actuated six-legged robot,” in IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), pp. 869-876, 2010. [12] P. Birkmeyer, K. Peterson, and R. S. Fearing, “DASH: A dynamic 16g hexapedal robot,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2683-2689, 2009. [13] W. Mosauer, “Adaptive convergence in the sand reptiles of the Sahara and of California: A study in structure and behavior,” Copeia, vol. 1932, no. 2, pp. 72-78, Jul. 1932. [14] C. S. Crawford, Biology of Desert Invertebrates. New York: Springer, 1981. [15] C. Li, A. M. Hoover, P. Birkmeyer, P. B. Umbanhowar, R. S. Fearing, and D. I. Goldman, “Systematic study of the performance of small robots on controlled laboratory substrates,” Proceedings of SPIE Defense, Security, and Sensing Conference, 7679:76790Z (1-13), 2010. [16] S. Vogel, Life in Moving Fluids: The Physical Biology of Flow. Princeton: Princeton University Press, 1996. [17] R. D. Maladen, Y. Ding, P. B. Umbanhowar, A. Kamor, and D. I. Goldman, “Mechanical models of sandfish locomotion reveal principles of high performance subsurface sand-swimming,” Journal of The Royal Society Interface, vol. 8, no. 62, pp. 1332-1345, Mar. 2011. [18] A. A. Biewener and R. J. Full, “Force platform and kinematic analysis,” in Biomechanics: Structures and Systems: A Practical Approach, Oxford University Press, pp. 45-73, 1992. [19] W. W. Dickinson and J. D.
Ward, “Low depositional porosity in eolian sands and sandstones, Namib Desert,” Journal of Sedimentary Research, vol. 64, no. 2a, pp. 226-232, 1994. [20] M. Jerkins, M. Schröter, T. J. Senden, M. Saadatfar, and T. Aste, “Onset of mechanical stability in random packings of frictional spheres,” Physical Review Letters, vol. 101, no. 1, pp. 1-4, 2008. [21] N. Gravish, P. B. Umbanhowar, and D. I. Goldman, “Force and flow transition in plowed granular media,” Physical Review Letters, vol. 105, no. 12, pp. 208301(1-4), Sep. 2010. [22] J. Schäfer, S. Dippel, and D. Wolf, “Force schemes in simulations of granular materials,” Journal de Physique I, vol. 6, no. 1, pp. 5-20, 1996. [23] G. L. Ghiringhelli, P. Masarati, and P. Mantegazza, “Multi-body analysis of a tiltrotor configuration,” Nonlinear Dynamics, vol. 19, no. 4, pp. 333-357, 1999. [24] R. Albert, M. A. Pfeifer, A.-L. Barabási, and P. Schiffer, “Slow drag in a granular medium,” Physical Review Letters, vol. 82, pp. 205-208, 1999. [25] G. Hill, S. Yeung, and S. A. Koehler, “Scaling vertical drag forces in granular media,” Europhysics Letters (EPL), vol. 72, no. 1, pp. 137-143, Oct. 2005. [26] R. Blickhan, “The spring-mass model for running and hopping,” Journal of Biomechanics, vol. 22, no. 11-12, pp. 1217-1227, 1989. [27] J. W. Glasheen and T. A. McMahon, “A hydrodynamic model of locomotion in the Basilisk Lizard,” Nature, vol. 380, no. 6572, pp. 340-341, 1996. [28] H. Katsuragi and D. J. Durian, “Unified force law for granular impact cratering,” Nature Physics, vol. 3, no. 6, pp. 420-423, Apr. 2007. [29] D. I. Goldman and P. B. Umbanhowar, “Scaling and dynamics of sphere and disk impact into granular media,” Physical Review E, vol. 77, no. 2, pp. 021308(1-14), Feb. 2008. [30] S. Kim, J. E. Clark, and M. R. Cutkosky, “iSprawl: Design and tuning for high-speed autonomous open-loop running,” The International Journal of Robotics Research, vol. 25, no. 9, pp. 903-912, 2006.

Our results reveal that lightweight robots can achieve high locomotor performance on granular media by exploiting fluid-like properties of the granular material. This locomotion mode is distinct from the previously observed low-frequency yielding (walking) strategy, and provides a better understanding of the fundamental locomotion modes available on a broad class of granular substrates. The lightweight robot platform enables detailed examination of legged locomotion, and is likely the best model to date of a robot running on granular media. In addition, the integrated simulation tool we developed can be used to systematically test the effects of both locomotor and substrate properties on locomotor performance, which can guide the design, control, and power consumption estimates of high-performing multi-terrain robot platforms. Finally, we note that while experiment and simulation allow detailed investigation of the mechanics of movement on granular media, a complementary approach is needed: low-order dynamical models that can be used to gain insight into the critical mechanics of dynamic running. In future work, we will investigate whether a dynamic force law describing the hydrodynamic-like forces during high-speed leg intrusion can be obtained from measurements in the DEM simulation. We posit that a generalized locomotion model similar to the spring-loaded inverted pendulum (SLIP) [26] can be developed based on this force law, extending our current study to more general conditions. This generalized model will shed light on the locomotor dynamics of legged animals and robots on granular media, and guide the development of analytically tractable low-order models.

ACKNOWLEDGMENTS

We thank Jeff Shen for help with data collection, and Yang Ding and Paul Umbanhowar for helpful discussions.

REFERENCES

[1] U. Saranli, M. Buehler, and D. E. Koditschek, “RHex: A simple and highly mobile hexapod robot,” The International Journal of Robotics Research, vol. 20, no. 7, pp.
616-631, 2001. [2] R. T. Schroer, M. J. Boggess, R. J. Bachmann, R. D. Quinn, and R. E. Ritzmann, “Comparing cockroach and Whegs robot body motions,” in Proceedings of the 2004 IEEE International Conference on Robotics and Automation, vol. 4, pp. 3288-3293, 2004. [3] C. Wright et al., “Design of a modular snake robot,” Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2609-2614, Oct. 2007. [4] S. Kim, M. Spenko, S. Trujillo, et al., “Whole body adhesion: hierarchical, directional and distributed control of adhesive forces for a climbing robot,” in Proceedings of the 2007 IEEE International Conference on Robotics and Automation, pp. 1268-1273, April 2007. [5] C. Li, S. T. Hsieh, and D. I. Goldman, “Multi-functional foot use during running in the zebra-tailed lizard (Callisaurus draconoides),” Journal of Experimental Biology, vol. 215, no. 18, 2012. [6] J. Matson, “Unfree Spirit: NASA's Mars Rover Appears Stuck for Good,” Scientific American, vol. 302, no. 4, p. 16, 2010. [7] H. M. Jaeger, S. R. Nagel, and R. P. Behringer, “The physics of granular materials,” Physics Today, vol. 49, p. 32, 1996. [8] R. M. Nedderman, Statics and Kinematics of Granular Materials. Cambridge: Cambridge University Press, 1992.


On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference

Konrad Rawlik∗, Marc Toussaint† and Sethu Vijayakumar∗

∗School of Informatics, University of Edinburgh, UK
†Department of Computer Science, FU Berlin, Germany

Abstract—We present a reformulation of the stochastic optimal control problem in terms of KL divergence minimisation, which not only provides a unifying perspective on previous approaches in this area, but also demonstrates that the formalism leads to novel practical approaches to the control problem. Specifically, a natural relaxation of the dual formulation gives rise to exact iterative solutions to the finite and infinite horizon stochastic optimal control problem, while direct application of Bayesian inference methods yields instances of risk-sensitive control. We furthermore study corresponding formulations in the reinforcement learning setting and present model-free algorithms for problems with both discrete and continuous state and action spaces. Evaluation of the proposed methods on the standard Gridworld and Cart-Pole benchmarks verifies the theoretical insights and shows that they improve upon current approaches.

I. INTRODUCTION

In recent years the framework of stochastic optimal control (SOC) [20] has found increasing application in the domain of planning and control of realistic robotic systems, e.g., [6, 14, 7, 2, 15], while also finding widespread use as one of the most successful normative models of human motion control [23]. In general, SOC can be summarised as the problem of controlling a stochastic system so as to minimise expected cost. A specific instance of SOC is the reinforcement learning (RL) formalism [21], which does not assume knowledge of the dynamics or cost function, a situation that often arises in practice. However, solving the RL problem remains challenging, in particular in continuous spaces [16]. A recent, promising direction in the field has been the application of inference methods [1] to these problems, e.g., [10, 22, 24].

In this context, we introduce a generic formulation of the SOC problem in terms of Kullback-Leibler (KL) divergence minimisation. Although the arising KL divergences can, in general, not be minimised in closed form, we provide a natural iterative procedure that results in algorithms that we prove to converge asymptotically to the exact solution of the SOC problem. Specifically, algorithms for both finite and infinite horizon problems are derived, and their corresponding formulations in the RL setting are introduced. We show that the latter corresponds to the independently derived result of [5] for the specific case of infinite horizon discrete problems; here, we extend this to problems with continuous actions.

Formulation of SOC problems in terms of KL minimisation has been previously studied by, amongst others, [22], [11] and [10], leading to efficient methods for both stochastic optimal control [22] and RL [7]. However, as we will discuss, these cases make restrictive assumptions about the problem dynamics and costs which can be relaxed under our framework, besides providing a unifying and generic formalism. Finally, we are able to clarify the relation of SOC to the inference control formulation of [24, 17, 26], which allows for arbitrary problems, showing it to be an instance of risk sensitive control. The generalisation of this relation given by our approach makes it possible to apply out-of-the-box inference methods to obtain approximate optimal policies. This is of particular interest in the case of continuous problems – here approximations are unavoidable since explicit representations are often not available.

II. PRELIMINARIES

A. Stochastic Optimal Control

We will consider control problems which can be modeled by a Markov decision process (MDP). Using the standard formalism, see also e.g., [21], let x_t ∈ X be the state and u_t ∈ U the control signals at times t = 1, 2, ..., T. To simplify the notation, we shall denote complete state and control trajectories x_{1...T}, u_{0...T} by x̄, ū. Let P(x_{t+1}|x_t, u_t) be the transition probability for moving from x_t to x_{t+1} under control u_t, and let C_t(x, u) ≥ 0 be the cost incurred per stage for choosing control u in state x at time t. Let the policy π(u_t|x_t) denote the conditional probability of choosing the control u_t given the state x_t. In particular, a deterministic policy is given by a conditional delta distribution, i.e. π(u_t|x_t) = δ_{u_t = τ(x_t)} for some function τ. The SOC problem consists of finding a policy which minimises the expected cost, i.e., solving

    π* = argmin_π ⟨ Σ_{t=0}^{T} C_t(x_t, u_t) ⟩_{q_π},    (1)

where ⟨·⟩_{q_π} denotes the expectation with respect to

    q_π(x̄, ū | x_0) = π(u_0|x_0) ∏_{t=1}^{T} π(u_t|x_t) P(x_{t+1}|x_t, u_t),    (2)

the distribution over trajectories under policy π. In the case of infinite horizon problems, i.e. letting T → ∞, we will consider the discounted cost setting and specifically assume that C_t(x_t, u_t) = γ^t C_•(x_t, u_t), where C_• is a time stationary cost and γ ∈ [0, 1] a discount factor.
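The objective (1) and the trajectory distribution (2) can be made concrete with a small numerical sketch. The snippet below is an illustration only – the MDP, costs and policy are invented, not taken from the paper – and estimates the expected cost of a stochastic policy by Monte Carlo sampling of trajectories:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: 3 states, 2 controls, horizon T = 10.
n_x, n_u, T = 3, 2, 10
P = rng.dirichlet(np.ones(n_x), size=(n_x, n_u))  # P[x, u] -> distribution over next states
C = rng.uniform(0.0, 1.0, size=(n_x, n_u))        # per-stage costs C(x, u) >= 0
pi = np.full((n_x, n_u), 1.0 / n_u)               # uniform stochastic policy pi(u|x)

def expected_cost(pi, n_samples=5000, x0=0):
    """Monte Carlo estimate of the objective in (1) under the trajectory
    distribution q_pi of (2)."""
    total = 0.0
    for _ in range(n_samples):
        x = x0
        for _t in range(T + 1):            # stages t = 0, ..., T
            u = rng.choice(n_u, p=pi[x])   # sample control from pi(u|x)
            total += C[x, u]
            x = rng.choice(n_x, p=P[x, u]) # sample successor state
    return total / n_samples

print(expected_cost(pi))
```

For tabular problems the same expectation could of course be computed exactly by forward propagation of the state-control marginals; sampling is shown here because it carries over unchanged to the RL setting discussed later.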


Fig. 1: The graphical model for the Bayesian formulation of the control problem in the finite horizon case (controls u_t, states x_t and task variables r_t, for t = 0, . . . , T). In the infinite horizon case we obtain a stochastic Markov process.

B. Inference Control Model

A Bayesian inference based approximation of the above control problem can be formulated [24] as illustrated in Fig. 1. In addition to the state and control variables of classical SOC, a binary dynamic random task variable r_t is introduced, and the task likelihood is related to the classical cost by choosing P(r_t = 1|x_t, u_t) = exp{−ηC(x_t, u_t)}, where η > 0 is some constant in analogy with the inverse temperature of a Boltzmann distribution. For some given policy π and assuming the artificial observations r_{0...T} = 1, we denote the unnormalised posterior by p_π(x̄, ū):

    p_π(x̄, ū) = P(x̄, ū, r̄ = 1 | x_0) = q_π(x̄, ū) ∏_{t=0}^{T} exp{−ηC_t(x_t, u_t)}.    (3)

C. General Duality

While the Bayesian model has been employed successfully for trajectory planning, e.g., in [24], its relation to the classical SOC problem remained unclear. Although a specific subset of SOC problems, studied by [11] and [22], can be formulated in a similar Bayesian model, as explicitly done by [10] (we discuss the relation to this work further in III-D3), here, we establish the formal correspondence between the two formalisms in the general case with the following result:

Theorem 1. Let π^0 be an arbitrary stochastic policy and D the set of deterministic policies; then the problem

    π* = argmin_{π∈D} KL( q_π(x̄, ū) || p_{π^0}(x̄, ū) )    (4)

is equivalent to the stochastic optimal control problem (1) with cost per stage

    Ĉ_t(x_t, u_t) = C_t(x_t, u_t) − (1/η) log π^0(u_t|x_t).    (5)

Proof: see Supplementary Material¹.

As an immediate consequence we may recover any given stochastic optimal control problem with cost C_t by choosing π^0(·|x) to be the uniform distribution over U².

III. ITERATIVE SOLUTIONS

Although Theorem 1 provides the correspondence between the SOC formulation and the computationally attractive inference control approach, due to the constraint π ∈ D, (4) remains as intractable as the classical formulation via the Bellman equation. However, relaxation of this constraint to allow minimisation over arbitrary stochastic policies provides a closed form solution, and although it does not directly lead to an optimal policy, we have the following result:

Theorem 2. For any π ≠ π^0, KL(q_π || p_{π^0}) ≤ KL(q_{π^0} || p_{π^0}) implies ⟨C(x̄, ū)⟩_{q_π} < ⟨C(x̄, ū)⟩_{q_{π^0}}.

Proof: see Supplementary Material.

Consequently, with some initial π^0, the iteration

    π^{n+1} ← argmin_π KL( q_π || p_{π^n} ),    (6)

where π is an arbitrary³ conditional distribution over u, gives rise to a chain of stochastic policies with ever decreasing expected costs. We would like to note that the conditions imposed by the above result, in order to guarantee a policy improvement, are relatively weak. By exploiting this, in addition to the iteration arising from (6), we present a relaxation which satisfies Theorem 2 and leads to practical algorithms for infinite horizon problems, and the related iteration of Bayesian inference which leads to risk-sensitive control.

A. Exact Minimisation - Finite Horizon Problems

The general minimisation in iteration (6) can, as previously indicated, be performed in closed form, and the new policy (for derivation, see Supplementary Material) is given by the Boltzmann like distribution

    π^{n+1}(u_t|x_t) = exp{ Ψ^{n+1}_t(x_t, u_t) − Ψ̄^{n+1}_t(x_t) },    (7)

with energy

    Ψ^{n+1}_t(x_t, u_t) = log π^n(u_t|x_t) + log P(r_t = 1|x_t, u_t)
                          + Σ_{x_{t+1}} P(x_{t+1}|x_t, u_t) Ψ̄^{n+1}_{t+1}(x_{t+1})    (8)

and log partition function

    Ψ̄^{n+1}_t(x_t) = log Σ_u exp{ Ψ^{n+1}_t(x_t, u) }.    (9)

In the finite horizon case, the policy can therefore be computed backwards in time.

1) Convergence Analysis: Following [12], we bound the progress of the trajectory posterior under policy π^n towards the corresponding distribution under some chosen π̂, obtaining

Lemma 3. Let the sequence {π^n} be generated by (6) and let π̂ be an arbitrary (stochastic) policy. Then

    KL(q_π̂ || q_{π^{n+1}}) − KL(q_π̂ || q_{π^n}) ≤ η⟨C(x̄, ū)⟩_{q_π̂} − η⟨C(x̄, ū)⟩_{q_{π^{n+1}}}.    (10)

¹ Supplementary Material can be found at http://arxiv.org/abs/1009.3958
² n.b., formally we require U to be finite or bounded
³ n.b., formally certain assumptions have to be made to ensure the support of q_π is a subset of the support of p_{π^n}
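The backward recursion (7)–(9) can be sketched for a tabular MDP. The sketch below is our own minimal illustration: it assumes a known transition model `P`, a time stationary task likelihood `log_r` = log P(r = 1|x, u) = −ηC(x, u), and, purely for brevity, a time stationary previous policy `log_pi_n`; the function name and array layouts are not from the paper.

```python
import numpy as np

def policy_update(log_pi_n, log_r, P, T):
    """One iteration of (6): compute pi^{n+1} by the backward recursion (7)-(9).

    log_pi_n : (n_x, n_u) log pi^n(u|x), assumed time stationary for simplicity
    log_r    : (n_x, n_u) log P(r = 1|x, u) = -eta * C(x, u)
    P        : (n_x, n_u, n_x) transition probabilities P(x'|x, u)
    Returns a list of (n_x, n_u) arrays pi^{n+1}_t(u|x), t = 0..T.
    """
    n_x, n_u = log_r.shape
    policies = [None] * (T + 1)
    psi_bar_next = np.zeros(n_x)                    # bar{Psi}_{T+1} := 0
    for t in reversed(range(T + 1)):
        psi = log_pi_n + log_r + P @ psi_bar_next   # energy, eq. (8)
        psi_bar = np.logaddexp.reduce(psi, axis=1)  # log partition, eq. (9)
        policies[t] = np.exp(psi - psi_bar[:, None])  # Boltzmann policy, eq. (7)
        psi_bar_next = psi_bar
    return policies
```

Feeding each resulting policy back in as the next `log_pi_n` produces the chain of improving policies described by Theorem 2 (a faithful implementation would carry the full time-indexed policy).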


Proof: See Supplementary Material.

Summing the above bound over 0 . . . N, we can compute the bound

    (1/N) Σ_{n=1}^{N+1} ⟨C(x̄, ū)⟩_{q_{π^n}} ≤ ⟨C(x̄, ū)⟩_{q_π̂} + (1/(ηN)) KL(q_π̂ || q_{π^0})    (11)

on the average expected cost of the policies π^1 . . . π^{N+1}. Now, since Theorem 2 guarantees that the expected cost for each π^n is non increasing with n, using (11), we can obtain the following stronger convergence result.

Theorem 4. Let {π^n} be a sequence of policies generated by (6), with π^0 s.t. π^0(·|x ∈ X) has support U. Then

    lim_{n→∞} ⟨C(x̄, ū)⟩_{q_{π^n}} = min_π ⟨C(x̄, ū)⟩_{q_π}.    (12)

Proof: See Supplementary Material.

B. Asynchronous Updates - Infinite Horizon Problems

In the infinite horizon setting, discussed in II-A, it is easy to show that the time stationary analog of (8) can be obtained as

    Ψ^{n+1}(x, u) = log π^n(u|x) + log P(r = 1|x, u) + γ Σ_y P(y|x, u) Ψ̄^{n+1}(y).    (13)

However, due to the form of Ψ̄^{n+1}, this does not yield Ψ^{n+1} in closed form. To obtain a practical solution we make use of the relatively weak conditions given by Theorem 2 for obtaining a lower expected cost, which allow us to consider the minimisation in (6) over some iteration dependent subset P_n of the set of all (stochastic) policies. Then, Theorem 2 guarantees the expected costs to be non increasing if, for all n, π^n ∈ P_n. Such iterations admit asynchronous updates as an interesting case, i.e., updating one or several time steps of the policy at each iteration in any particular order. Formally, we choose a schedule of time step sets T_0, T_1, . . . and let P_n = {π : ∀t ∉ T_n, π_t = π^n_t}. Specifically, we will consider the schedule for such updates given by T_n = {0, . . . , n − 1}, i.e., in each iteration we consider finite horizon problems with increasing horizon. Such a schedule leads to the update π^{n+1}_{t+1} = π^n_t for all t > 0, while the new first step policy, π^{n+1}_0, is of the form (7) and obtained via

    Ψ^{n+1}_0(x, u) = Ψ^n_0(x, u) − Ψ̄^n_0(x) + log P(r = 1|x, u) + γ Σ_{x'} P(x'|x, u) Ψ̄^n_0(x'),    (14)

hence yielding a practical iteration which has a strong analogy to value iteration, see e.g., [21].

1) Convergence Analysis: Essentially equivalent convergence results to the finite horizon case can be obtained for the asynchronous algorithm (14) in the infinite horizon setting. Informally, we proceed by assuming that the cost is bounded and consider finite horizon problems with growing horizon, bounding the expected cost of the infinite tail. Due to the assumption that the cost is discounted, the expected cost of the tail goes to zero as the horizon increases, leading to a result analogous to Theorem 4 (see Supplementary Material for formal proof).

C. Posterior Policy Iteration

Since our starting point was the relaxation of the relation between SOC and inference control, it is interesting to consider sequential inference of the posterior policy, which is the natural iteration arising in the latter framework. Such an iteration is of particular interest as posterior inference is a well studied problem with a large range of approximate algorithms [1] which could be exploited for practical implementations. Although unconstrained minimisation of the KL divergence is achieved by the posterior, in our case the specific form of q_π in (6) is, as can be seen in (2), restricted by the prescribed system dynamics, leading to the results presented in the last sections. Nonetheless, we may consider the iteration

    π^{n+1} = p_{π^n}(u_t|x_t),    (15)

which, as we show (see Supplementary Material), will converge to the policy

    π̃ = argmin_π −(1/η) log ⟨ exp{−ηC(x̄, ū)} ⟩_{q_π}.    (16)

The objective being minimised is exactly the risk sensitive objective of [8], which has recently also been used in the path integral approach to SOC [3]. In particular, note that for η → 0, we obtain the classical risk neutral controls, allowing near optimal policies for arbitrary SOC problems to be computed by iterated Bayesian inference.

D. Relation to Previous Work

1) Dynamic Policy Programming (DPP): The recently introduced DPP algorithm [5] is closely related to the formalism described here. Specifically, while the update equations (14) coincide, we provide a more general view of DPP by deriving it as a special case of the novel result in Theorem 2. In addition, III-A provides the direct extension of DPP to finite horizon problems, while the convergence proofs of III-B extend those given by [5] to continuous state and action spaces.

2) Approximate Inference Control (AICO): The AICO [24] approach to trajectory optimisation shares the same Bayesian model used as a starting point here (cf. II-B). However, although using local LQG approximations AICO converges to locally optimal trajectories, the relation to the classical SOC problem remained unclear. We not only establish such a formal relation, but also note that AICO can be interpreted as one step of the posterior policy iteration introduced in III-C. More specifically, if one were to use the maximum likelihood policy obtained by AICO, one would obtain (approximate) optimal risk seeking controls.
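The risk sensitive objective in (16) is easy to illustrate numerically: evaluated on sampled trajectory costs, −(1/η) log⟨exp{−ηC}⟩ approaches the expected cost as η → 0, while larger η > 0 weights low-cost outcomes more heavily (risk seeking). A small sketch with synthetic costs (the distribution is arbitrary, chosen only for illustration):

```python
import numpy as np

def risk_sensitive(costs, eta):
    """-(1/eta) * log < exp(-eta * C) >, the objective in (16), from cost samples."""
    a = -eta * costs
    m = a.max()                                   # log-mean-exp for numerical stability
    return -(np.log(np.mean(np.exp(a - m))) + m) / eta

rng = np.random.default_rng(0)
costs = rng.normal(10.0, 2.0, size=100_000)       # synthetic trajectory costs

print(risk_sensitive(costs, 1e-6))   # close to the sample mean: risk neutral limit
print(risk_sensitive(costs, 1.0))    # below the mean: low-cost samples dominate
```

For Gaussian costs the objective equals mean − η·variance/2, which makes the risk seeking effect of increasing η explicit.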


3) Path Integral Control: Let us briefly recall the KL control framework [10], the alternative formulations in [22] being equivalent for our purposes. Choose some free dynamics ν_0(x_{t+1}|x_t) and let the cost be given as

    C(x̄) = ℓ(x̄) + log [ ν(x̄) / ν_0(x̄) ],

where ν(x_{t+1}|x_t) is the controlled process under some policy and ℓ a state dependent cost. Then

    ⟨C(x̄)⟩_ν = KL( ν(x̄) || ν_0(x̄) exp{−ℓ(x̄)} )    (17)

is minimised w.r.t. ν by

    ν(x_{1:T}|x_0) = (1/Z(x_0)) exp{−ℓ(x_{1:T})} ν_0(x_{1:T}|x_0),    (18)

and one concludes that the optimal control is given by ν(x_{t+1}|x_t), where the implied meaning is that ν(x_{t+1}|x_t) is the trajectory distribution under the optimal policy. Although (18) gives a process which minimises (17), it is not obvious how to compute the actual controls u_t. Specifically, when given a model of the dynamics, i.e., P(x_{t+1}|x_t, u_t), and having chosen some ν_0, a non trivial, yet implicitly made, assumption is that there exists a policy implementing the required transitions ν(x_{t+1}|x_t), i.e., ∃π s.t.

    KL( Σ_{u_t} P(x_{t+1}|x_t, u_t) π(u_t|x_t) || ν(x_{t+1}|x_t) ) = 0.    (19)

However, in general, such a π will not exist. This is made very explicit for the discrete MDP case in [22], where it is acknowledged that the method is only applicable if the dynamics are fully controllable, i.e., P(x_{t+1}|x_t, u_t) can be brought into any required form by the controls. Although in the same paper it is suggested that solutions to classical problems can be obtained by continuous embedding of the discrete MDP, such an approach has several drawbacks. For one, it requires solving a continuous problem even for cases which could otherwise have been represented in tabular form, but more importantly, such an approach is obviously not applicable to problems which already have continuous state or action spaces.

In the case of problems with continuous states and actions we may consider the specific form

    x_{t+1} = F(x_t) + B(u_t + ξ),  ξ ∼ N(0, Q),
    C_t(x_t, u_t) = ℓ(x_t) + u_t^T H u_t,    (20)

with F, B and ℓ having arbitrary form, but H, Q such that H ∝ B^T Q^{−1} B. This is of interest, as it is the discrete time form of the fully controllable continuous time problem which underlies the path integral approach [11]. It has also been claimed, e.g., [10], that, analogously to the continuous time case, the solution of this problem is given by (18). However, considering the simple instance of a one step LQG problem, we see that (19) will not hold, as in this case the variance of P(x_1|x_0, u_0) is uncontrolled. Hence ν is not the trajectory distribution under the optimal policy. Furthermore, it is straightforward to convince oneself that attempting to find the policy implementing the best realisable transition, i.e., relaxation of (19) to

    argmin_{π∈D} KL( Σ_{u_t} P(x_{t+1}|x_t, u_t) π(u_t|x_t) || ν(x_{t+1}|x_t) ),

does also not lead to the desired result. However, for problems of the specific form (20), a closer relation between Theorem 1 and (17) does indeed exist. To illustrate this, we write the KL divergence of Theorem 1 in terms of the state trajectory (x̄) marginals as

    KL( q_π(x̄, ū) || p_{π^0}(x̄, ū) ) = KL( q_π(x̄) || ν(x̄) ) − ⟨ Σ_t ( m_t^T Q^{−1} B u_t − (1/2) u_t^T H u_t ) ⟩_{q_π(x̄, ū)},

where m_t = x_{t+1} − x_t − F(x_t). Furthermore, since for a deterministic policy, i.e. π(u_t|x_t) = δ_{u_t = τ(x_t)}, we have ⟨m_t⟩_{q_π} = ⟨B u_t⟩_{q_π} = Bτ(x_t), the second term is zero under the condition required, i.e., H = 2B^T Q^{−1} B, and, analogous to (17), it is sufficient to consider the distributions over state trajectories only. In conclusion, for discrete time problems, the work of [10, 22] constitutes special cases of Theorem 1 which either assume fully controllable dynamics or where the control trajectories can be marginalised from Theorem 1.

4) Expectation Maximisation: Several suggestions for mapping the SOC problem onto a maximum likelihood problem and using Expectation Maximisation (EM) have been made recently in the literature, e.g., [25], and, going further back, the probability matching approach [4, 19] also has close links with EM. Considering (6), the proposed approach has a close relation to the free energy view of EM. Given a free energy

    F(q̃, π) = log P(r̄ = 1; π) − KL(q̃ || p_π)    (21)
             = ⟨ log P(r̄ = 1, x̄, ū; π) ⟩_q̃ + H(q̃),    (22)

EM alternates between minimising KL(q̃ || p_π) w.r.t. q̃ in (21) and maximising the free energy w.r.t. the potentially infinite parameter vector π in (22). Our iteration of (6) deviates from this standard EM in that the KL minimisation in (6) is w.r.t. a constrained q̃, namely one which can be generated by a control π. The M-step is then trivially assigning the new π to the one corresponding to q̃. The constrained E-step departs from standard EM but is a special case of the alternating minimisation procedures of [9]. Importantly, however, unlike the previously mentioned EM based approaches, which can only guarantee convergence to a local extremum, we have demonstrated algorithms with guaranteed convergence to the global optimum.

IV. REINFORCEMENT LEARNING

We now turn to the RL setting [21], where one aims to learn a good policy given only samples from the transition probability and associated incurred costs. As RL usually considers the discounted cost infinite horizon setting, we concentrate on this case, with the understanding that equivalent steps can


be taken in the finite horizon case. We note that for any given x, u the update of (14) can be written as an expectation w.r.t. the transition probability P(y|x, u) and hence may be approximated from a set of sampled transitions. In particular, given a single sample (x, u, R, y) of a transition from x to y under control u, obtaining reward R = log P(r = 1|x, u), we may perform the approximate update

    Ψ(x, u) ← Ψ(x, u) + α [ R + γΨ̄(y) − Ψ̄(x) ],    (23)

with α a learning rate, and for trajectory data applying such an update individually to each tuple (x_t, u_t, R_t, x_{t+1}).

A. Relation to Classical Algorithms

Before proceeding, let us highlight certain similarities and differences between (23) and two classical algorithms, Q-learning and TD(0) [21]. The Q-learning algorithm learns the state-action value function. We note that Ψ has certain similarities to a Q function, in the sense that a higher value of Ψ for a certain control in a given state indicates that the control is 'better' – in fact, for the optimal controls the Q function and Ψ converge to the same absolute value (see Supplementary Material). However, unlike the Q function, which also converges to the expected cost for the sub-optimal controls, Ψ goes to −∞ for sub-optimal actions. A potentially more insightful difference between the two algorithms is the nature of the updates employed. The Q-learning algorithm uses updates of the form

    Q(x, u) ← Q(x, u) + α [ R + γ max_{u'} Q(y, u') − Q(x, u) ],

where α is a learning rate. Note that it employs information from the current command and the single best future command under current knowledge. The proposed algorithm, on the other hand, uses a soft-max operation by employing Ψ̄, averaging over information about the future according to the current belief about the control distribution, hence taking into account uncertainty arising from, e.g., sampling. On the other hand, the TD(0) algorithm, which learns through value function approximation, has updates of the form

    V(x) ← V(x) + α [ R + γV(y) − V(x) ],

with α again a learning rate. Since it can be shown that Ψ̄ converges to the value function of the optimal policy (cf. Supplementary Material), the proposed update converges towards the TD(0) update for samples generated under the optimal policy. In particular, while TD(0) is an on-policy method and learns the value function of the policy used to generate samples, the proposed method learns the value function of the optimal policy directly.

B. RL with continuous states and actions

One needs to use parametric representations [21] to store Ψ when tabular means are no longer viable or efficient, as is the case with high dimensional, large discrete [5] or continuous state and control spaces. Similar to numerous

previous approaches, e.g., [5], we used a linear basis function model to approximate Ψ, i.e.,

    Ψ(x, u) ≈ Ψ̃(x, u, w) = Σ_{m=0}^{M} w_m φ_m(x, u),    (24)

where φ_m : X × U → R are a set of given basis functions and w = (w_1, . . . , w_M) is the vector of parameters that are optimised. For such an approximation, and given a set of samples (x_{1...K}, u_{1...K}, R_{1...K}, y_{1...K}), the updates (8) and (23) can be written in matrix notation as

    Φ w^{n+1} = Φ w^n + z,    (25)

where Φ is the K × M matrix with entries Φ_{k,m} = φ_m(x_k, u_k) and z is the vector with elements

    z_k = γΨ̄(y_k) + R_k − Ψ̄(x_k).    (26)

This suggests an update rule of the form

    w ← w + (Φ^T Φ)^{−1} Φ^T z.    (27)

The choice of basis functions is somewhat complicated by the need to evaluate the log partition function of the policy, i.e. log Σ_u exp{Ψ̃(x, u)}, when forming the vector z. In cases where U is a finite set, arbitrary basis functions can be chosen, as the integral reduces to a finite sum. However, for problems with infinite (or continuous) control spaces, bases need to be chosen such that the resulting integral is analytically tractable, i.e. the partition function of the stochastic policy can be evaluated. One class of such basis sets is given by those Ψ̃(x, u, w) that can be brought into the form

    Ψ̃(x, u, w) = −(1/2) u^T A(x, w) u + u^T a(x, w) + a_0(x, w),    (28)

where A(x, w) is a positive definite matrix-valued function, a(x, w) is a vector-valued function and a_0(x, w) a scalar function. For such a set, the integral is of the Gaussian form and the closed form solution

    log ∫_u exp{Ψ̃} = −(1/2) log |A| + (1/2) a^T A^{−1} a + a_0 + constant    (29)

is obtained. This gives us a recipe to employ basis functions that lead to tractable computations, and the policy can be computed as π(u|x, w) = N(u | A^{−1}a, A^{−1}).

V. EXPERIMENTS

A. Gridworld - Analytical Infinite Horizon RL

We start by evaluating the proposed algorithm (23) on a problem used in [22], with finite state and action spaces, which allows a tabular representation of Ψ. The state space is given by an N × N grid (see Fig. 2(b)) with some obstacles. The control can move the state to any adjacent one not occupied by an obstacle, and the move succeeds with a probability of 0.8. Additionally, a set A ⊆ X of absorbing target states was defined and the agent incurs a cost of 1 at all states other than the target, i.e., C(x, u) = δ_{x∉A}, with δ the Kronecker delta. The cost was not discounted. We benchmark performance against tabular Q-learning [21]. Both algorithms were given


Fig. 2: Results from the Gridworld problem. (a) Evolution of the mean error in (30), plotted against the number of samples, for Q-learning, Ψ-learning and online Ψ-learning, averaged over 10 trials with error bars indicating the s.d. (b) Optimal value function of the problem (white: low expected cost, black: high expected cost). Obstacles are black and the target state is indicated by *.

data from episodes generated with controls sampled from an uninformed policy. Once a target state was reached, or if the target wasn't reached within 100 steps, the state was reset randomly. The learning rate for Q-learning decayed as α = c/(c + k), with k the number of transitions sampled and c a constant which was optimised manually. Representative results are illustrated in Fig. 2. We plot the approximation error

    e_J = max_x |J(x) − Ĵ(x)| / max_x J(x)    (30)

between the true value function J, obtained by value iteration, and its estimate Ĵ, given by Ψ̄ and max_u Q(x, u) respectively. Both algorithms achieved the same error at convergence, but the proposed algorithm (Ψ-learning) consistently required fewer samples than Q-learning for convergence – this is consistent with the discussion in IV-A. We additionally considered an online variant of Ψ-learning where the controls are sampled from the policy given by the current Ψ, i.e. π(u|x) = exp{Ψ(x, u) − Ψ̄(x)}. As expected, the online version outperformed sampling using an uninformed policy. The aim of this evaluation, besides providing a sanity check of the workings of the algorithm, was to illustrate that the proposed method provides similar performance advantages as obtained for the restricted class of problems considered in [22], despite working in the product space of states and actions, as necessitated by considering the unrestricted SOC problem.

B. Cart-Pole System

We now move on to problems with continuous state and action spaces, which make approximations necessary, demonstrating that the theoretical results presented in III can lead to practical algorithms. Specifically, we will consider both an approximate inference approach for implementing the posterior policy iteration of III-C on a finite horizon problem, and the basis function based approach, discussed in IV-B, to the RL version of the asynchronous updates for infinite horizon problems derived in III-B.
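As a concrete illustration of the tabular algorithm evaluated in the Gridworld experiment above, the update (23) can be sketched on a hypothetical one-dimensional corridor – a deliberately simplified stand-in for the grid, with all settings (corridor length, learning rate, episode counts) invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D corridor: states 0..4, state 4 is the absorbing target;
# controls 0 = left, 1 = right; moves succeed with probability 0.8;
# C(x, u) = 1 away from the target, cost not discounted (gamma = 1).
n_x, n_u, gamma, alpha = 5, 2, 1.0, 0.1
target = n_x - 1

def step(x, u):
    R = 0.0 if x == target else -1.0   # R = log P(r=1|x,u) = -eta*C, with eta = 1
    if x != target and rng.random() < 0.8:
        x = max(0, x - 1) if u == 0 else min(n_x - 1, x + 1)
    return x, R

def psi_bar(Psi, s):
    return np.logaddexp.reduce(Psi[s])  # tabular log partition, cf. (9)

Psi = np.zeros((n_x, n_u))
for _ in range(2000):
    x = rng.integers(0, n_x)
    for _t in range(50):
        u = rng.integers(0, n_u)        # uninformed sampling policy
        y, R = step(x, u)
        # Psi-learning update, eq. (23)
        Psi[x, u] += alpha * (R + gamma * psi_bar(Psi, y) - psi_bar(Psi, x))
        x = y
        if x == target:
            break

print(np.argmax(Psi, axis=1))  # moving right should be preferred away from the target
```

The online variant mentioned above would instead sample u from exp{Ψ(x, ·) − Ψ̄(x)} rather than uniformly.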

We have chosen the classical Cart-Pole problem [21], which has been used repeatedly as a benchmark in reinforcement learning, e.g., [18]. This plant, illustrated in Fig. 3a, consists of an inverted pendulum which is mounted on a cart and is controlled by exerting forces on the latter. Formally, the state space is given by x = (x, ẋ, θ, θ̇), with x the position of the cart, θ the pendulum's angular deviation from the upright position, and ẋ, θ̇ their respective temporal derivatives. Neglecting the influence of friction, the continuous time dynamics of the state are given by

    θ̈ = [ g sin(θ) + cos(θ) ( −c_1 u − c_2 θ̇² sin(θ) ) ] / [ (4/3) l − c_2 cos²(θ) ],
    ẍ = c_1 u + c_2 ( θ̇² sin(θ) − θ̈ cos(θ) ),    (31)

with g = 9.8 m/s² the gravitational constant, l = 0.5 m the pendulum length, and constants c_1 = (M_p + M_c)^{−1} and c_2 = l M_p (M_p + M_c)^{−1}, where M_p = 0.1 kg and M_c = 1 kg are the pendulum and cart masses, respectively. The control interval was 0.02 s and the dynamics were simulated using the fourth order Runge-Kutta method. Stochasticity was introduced by adding zero mean Gaussian noise, with small diagonal covariance, to the new state. These settings correspond to those used in the comparative evaluations of [18].

1) Model Based Posterior Policy Iteration: First, we consider a finite horizon optimal control problem, assuming we have access to both the plant dynamics and the cost function. The exact algorithm of III-A does not lend itself easily to this setting, due to the intractable integrals arising in (8) as a consequence of the nonlinear dynamics – although we note that by taking local LQG approximations of the problem, closed form updates can be derived. However, we demonstrate that by using standard approximate inference techniques, the alternative posterior policy iteration of III-C can yield good approximate optimal policies. Specifically, we consider the swing up task, in which the pendulum has to be moved from a hanging down to an upright position and balanced. The per-step cost for this task is given by

    C_t(x_t, u_t) = ω_1 θ² + ω_2 θ̇² + ω_3 u_t²   ∀t ∈ [0, T],    (32)

where ω is a vector of weights. The time horizon was T = 3 s, but note that, since a cost is incurred in each time step for pendulum positions away from rest in the upright position, a rapid swing up followed by holding is encouraged.

As the posterior p_π(x̄, ū) is not tractable in this setting, we use an extended Kalman smoother [20] to estimate a Gaussian approximation to the full posterior, leading to a Gaussian posterior policy. As a consequence of the Gaussian approximation and the inference method chosen, inference is required to be performed only once, for p_{π^0}(x̄, ū), and the eventual result of the iteration (15) can be obtained as the linear policy given by the mean of the posterior policy. In Fig. 3, we plot the expected costs and the cost variances, both estimated by sampling under the obtained policies, for different values of the parameter η. For reference, we also


Fig. 3: Results for model based approximate posterior policy iteration on the Cart-Pole swing-up task. (a) Schematic of the pole on cart plant used in the experiments. (b) Expected cost achieved by policies obtained for different values of the parameter η (plotted against log₁₀(η)). The dashed line indicates the expected cost of the policy obtained using iLQG. All values estimated from 1000 trajectories sampled using the respective policy. (c) Variance of the costs achieved by the same policies as in (b).

show the expected cost from the policy obtained using the iLQG algorithm [13], which also computes an approximately optimal linear policy. We first observe that, as predicted by III-C, η acts to control the risk seeking behavior of the policy, and for increasing values of η the cost variance increases substantially. Furthermore, we note that the choice of η = 1, which, as discussed, corresponds to the AICO setting, leads to results substantially different from the case of classical (risk neutral) optimal control. However, reducing η rapidly leads to policies obtained by approximate inference which exhibit similar performance to those obtained by classical approximate methods.

2) RL with approximations: To evaluate the RL approach proposed in IV-B, we consider the balancing task, following closely the procedures in [18], where this task was used for the evaluation of policy gradient methods. The task, which consists of stabilising the pendulum in the upright position while simultaneously keeping the cart at the center of the track, had the cost function

    C_•(x, u) = { 0 if (x, θ) in target set; ω_θ θ² + ω_x x² else },    (33)

where the target was given by x ∈ [−0.05 m, 0.05 m] and θ ∈ [−0.05 rad, 0.05 rad], and the discount rate was γ = 0. We chose this cost as we found it to give better results for uninformed initial policies, for which the piecewise constant cost of [18] provided little information.

The linear policy learned in [18] corresponds to a second order polynomial basis for Ψ in the proposed method (Ψ-Learning). Specifically, we used the basis set

    {u², ux, uẋ, uθ, uθ̇, x², xẋ, ẋ², xθ, xθ̇, ẋθ, ẋθ̇, θ², θθ̇, θ̇²},

which is of the form (28) and indeed only constitutes an approximation to the true Ψ, as the problem is non-LQG.
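For bases of the form (28), the closed form log partition (29) and the induced Gaussian policy can be sketched as follows. The constant in (29) is (d/2) log 2π for d-dimensional controls; the numerical values of A and a below are arbitrary, chosen only for illustration:

```python
import numpy as np

def log_partition(A, a, a0):
    """Closed form (29): log of the integral of exp{ -1/2 u^T A u + u^T a + a0 } over u."""
    d = len(a)
    sign, logdet = np.linalg.slogdet(A)
    assert sign > 0, "A must be positive definite"
    return -0.5 * logdet + 0.5 * a @ np.linalg.solve(A, a) + a0 + 0.5 * d * np.log(2 * np.pi)

# The resulting stochastic policy is Gaussian: pi(u|x) = N(u | A^-1 a, A^-1).
A = np.array([[2.0, 0.3], [0.3, 1.0]])
a = np.array([0.5, -1.0])
mean, cov = np.linalg.solve(A, a), np.linalg.inv(A)
print(log_partition(A, a, 0.0), mean, cov)
```

Sampling controls during training then reduces to drawing from N(A⁻¹a, A⁻¹), with A, a evaluated at the current state and weights.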

Fig. 4: Results for RL with continuous state and action spaces. (a) Length of training episodes, averaged over blocks of 25 episodes, for Ψ-Learning when initialized with an uninformed policy. The vertical dashed line indicates the point at which initial policies for the results in the subsequent comparison experiment were picked. Error bars indicate s.d. (b) Comparison of the evolution of the expected cost between eNAC and Ψ-Learning. Both methods are initialised with the same stabilising policies (cf. (a)) and results averaged over 10 trials with error bars indicating s.d.

Episodes were sampled with starting states drawn such that θ ∈ [−0.2rad, 0.2rad] and x ∈ [−0.5m, 0.5m], and controls were sampled from the stochastic policy given by the current parameters. During training, episodes were terminated if the plant left the acceptable region θ ∈ [−0.2rad, 0.2rad] and x ∈ [−0.5m, 0.5m], or after 200 steps. Policy parameters were updated every 10 episodes, and every 5 updates policies were evaluated by sampling 50 episodes of 500 steps using the mean of the policy. All results were averaged over 10 trials. The learning rate parameter for policy gradient methods was adjusted manually for best results.

Despite the change in cost function, like [18], we were not able to reliably obtain good policies from uninformed initialisation when using policy gradient methods. Our method, on the other hand, when initialised with an uninformed policy, i.e., zero mean and a variance of 10, was able to learn a stabilising policy within 400 training episodes. This is illustrated in Fig. 4a, where the average length of training episodes is shown. In order to be able to compare to the episodic Natural Actor Critic (eNAC) method, which produced the best result in [18], we used the policies obtained by Ψ-Learning after 400 training episodes as initial policies. By this stage, the average expected cost of the policies was 239.35, compared to the initial cost which had been of the order 3 × 10⁵. Fig. 4b shows the evolution of the expected cost for both methods with such an initialisation, and as can be seen, Ψ-Learning outperformed eNAC both in terms of convergence speed and attained expected cost. As the quality of the obtained policy will depend on how well the basis set can approximate the true Ψ, we also considered a more complex set of bases. Specifically, while keeping A in (28) a set of non-zero constant basis functions,


we represented a(x, w) using the general and commonly used squared-exponential bases, which are of the form

φ(x) = exp{−(x − m_φ)ᵀ Σ_φ (x − m_φ)}  (34)

with center m_φ and metric Σ_φ. The centers were sampled randomly from the region given by the acceptable region specified earlier, together with ẋ ∈ [−1m/s, 1m/s] and θ̇ ∈ [−1rad/s, 1rad/s], and Σ_φ was chosen to be diagonal. For this setting we were not able to obtain good policies using eNAC, while in the case of Ψ-Learning this choice did not outperform the polynomial basis, yielding a best policy with expected cost 26.4.

VI. CONCLUSION

We have presented a general relation between stochastic optimal control problems and minimisation of KL divergences of the form (4). This allowed us to derive iterative algorithms for obtaining both risk-neutral and risk-sensitive optimal controls for finite and infinite horizon MDPs. We show that these algorithms, although instances of generalised EM procedures, enjoy guaranteed convergence to the global optimum. Further, we discuss the connections of our work to previous approaches in this area, highlighting that many of these arise in our formulation as special cases which either require restrictions on the class of problems (e.g., [22, 10]), or for which the relation to SOC was previously unclear (e.g., [24]). The formalism is then extended to the model-free RL setting in both the finite and infinite horizon case. In the case of finite state and action spaces, using a tabular representation, we obtain an exact algorithm with interesting relations to Q- and TD(0)-learning. We also present an approximation, based on basis function representations, which extends [5] to problems with continuous state and action spaces. Our approach is verified in the discrete setting, and we highlight the novel aspects of our work in experiments on a problem with continuous states and actions in the form of the standard cart-pole benchmark.
On the one hand we show that, by employing standard out-of-the-box approximate inference methods, optimal policies can be computed for model-based finite horizon problems, addressing the shortcomings of [24]. On the other hand, we consider an infinite horizon problem in the model-free RL setting, demonstrating that the proposed approximate algorithm shows performance competitive with the well-established eNAC algorithm. We also provide a recipe for selecting appropriate basis functions that lead to efficient, tractable solutions.

REFERENCES

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[2] D. Braun, M. Howard, and S. Vijayakumar. Exploiting variable stiffness in explosive movement tasks. In R:SS, 2011.
[3] J. L. van den Broek, W. A. J. J. Wiegerinck, and H. J. Kappen. Risk sensitive path integral control. In UAI, 2010.
[4] P. Dayan and G. E. Hinton. Using EM for reinforcement learning. Neural Computation, 9:271–278, 1997.

[5] A. M. Gheshlaghi et al. Dynamic policy programming with function approximation. In AISTATS, 2011.
[6] D. Mitrovic et al. Optimal feedback control for anthropomorphic manipulators. In ICRA, 2010.
[7] E. A. Theodorou et al. Learning policy improvements with path integrals. In AISTATS, 2010.
[8] S. I. Marcus et al. Risk sensitive Markov decision processes. Systems and Control in the 21st Century, 1997.
[9] A. Gunawardana and W. Byrne. Convergence theorems for generalized alternating minimization procedures. J. of Machine Learning Research, 6:2049–2073, 2005.
[10] B. Kappen, V. Gomez, and M. Opper. Optimal control as a graphical model inference problem. arXiv:0901.0633v2, 2009.
[11] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. J. of Statistical Mechanics: Theory and Experiment, page 11011ff, 2005.
[12] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132:1–64, 1997.
[13] W. Li and E. Todorov. An iterative optimal control and estimation design for nonlinear stochastic system. In CDC, 2006.
[14] D. Mitrovic, S. Klanke, and S. Vijayakumar. Adaptive optimal control for redundantly actuated arms. In SAB, 2008.
[15] J. Nakanishi, K. Rawlik, and S. Vijayakumar. Stiffness and temporal optimization in periodic movements: An optimal control approach. In IROS, 2011.
[16] J. Peters, S. Vijayakumar, and S. Schaal. Reinforcement learning for humanoid robotics. In Humanoids, 2003.
[17] K. Rawlik, M. Toussaint, and S. Vijayakumar. An approximate inference approach to temporal optimization in optimal control. In NIPS, 2010.
[18] M. Riedmiller, J. Peters, and S. Schaal. Evaluation of policy gradient methods and variants on the cart-pole benchmark. In IEEE ADPRL, 2007.
[19] P. N. Sabes and M. I. Jordan. Reinforcement learning by probability matching. In NIPS, 1996.
[20] R. F. Stengel. Optimal Control and Estimation. Dover Publications, 1986.
[21] R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, 1998.
[22] E. Todorov. Efficient computation of optimal actions. PNAS, 106:11478–11483, 2009.
[23] E. Todorov and M. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5:1226–1235, 2002.
[24] M. Toussaint. Robot trajectory optimization using approximate inference. In ICML, 2009.
[25] M. Toussaint and A. Storkey. Probabilistic inference for solving discrete and continuous state Markov decision processes. In ICML, 2006.
[26] D. Zarubin, V. Ivan, M. Toussaint, T. Komura, and S. Vijayakumar. Hierarchical motion planning in topological representations. In R:SS, 2012.


Failure Anticipation in Pursuit-Evasion

Cyril Robin∗† and Simon Lacroix∗†
∗ CNRS, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France
† Univ de Toulouse, LAAS, F-31400 Toulouse, France
[email protected], [email protected]

Abstract—This paper presents a new approach for the pursuit of targets by a team of aerial and ground robots under realistic conditions. Mobile target pursuit is often a sub-task of more general scenarios that call for environment exploration or monitoring activities. Since most of the time a single robot is sufficient to ensure the pursuit of a target, our approach aims at minimizing the team resources devoted to the pursuit: while ensuring the pursuit, a single pursuer evaluates its own potential failures on the basis of the situation defined by the target evolution and the environment structure. It thus assesses its need for team support. When support is necessary to keep the target in view, one or more additional robots are involved, according to a task allocation scheme. We provide mathematical bounds on the complexity of the approach, which ensure that the system has real-time performance. Simulations in a variety of realistic situations illustrate the efficiency of the proposed solution.

them. Localization may use different sensors and vantage points [19] or some target specificities like group coherence [17], while monitoring is achieved through a direct view of the targets. With several targets, the problem often comes down to trading the targets between the observers, depending on their relative positions [7, 8]. When there is only one target, a single robot may perform the tracking task alone – which we call "following", though it is often referred to as "pursuit-evasion". This latter problem often assumes that a single robot is enough to perform the target monitoring task.

I. INTRODUCTION – CONTEXT

Numerous mobile robotics applications are related to "targets", be they adversaries to detect and chase, flocks of animals to monitor, or other robots to follow or assist. Detecting, localizing or tracking targets raises a large variety of problems, to which the research community has devoted a lot of work. Figure 1 summarizes the primary problems encountered in target-related applications. One may first distinguish two different missions: detecting one or more targets, and tracking them. The first mission aims to control an area, and ends with the detection or the capture of the targets inside this area, using several agents, robots or fixed sensors. The historical problem is known as the art gallery problem [16], which considers fixed sensors. More recent variations are the patrolling problem [14] and the surveillance problem, which consider either mobile sensors only or a combination of mobile and fixed sensors. Besides, most problems known as pursuit-evasion or search problems are actually "capture" problems, where the aim is to surround a target, avoiding both the contamination of previously cleared areas and the evasion of the targets. It is assumed that the number of robots is sufficient for this purpose. Chung et al. [5] propose a good survey of existing mathematical and robotics-oriented work in this area. Discovering or detecting a target is one thing, but in real applications this is often only a part of the whole scenario, either because the robots cannot neutralize the targets or because it is not desired or expected. Once the targets are designated, the robots often have to track them. This is the second kind of mission, which starts upon target detection. One may then want to localize the targets [10] or to monitor

Fig. 1. Overview of the target management problems. [Tree: Target Management splits into Target Detection – Art Gallery [16], Search [5] – and Target Tracking – Localizing [10, 19, 17] and Monitoring, the latter split into Observing multiple targets [7, 8] and Following a single target [13, 3, 11, 2, 12, 1].]

For all these problems, whatever the application context and the target type considered, multi-robot teams naturally extend the capacity of a single robot to manage the targets. A heterogeneous team provides even more solutions and opportunities to perform a mission, e.g. by multiplying the vantage points, as in cooperating aerial and ground teams. Also, while a single kind of robot can perform well in specific cases, a heterogeneous team can achieve multi-objective missions, some robots being more adapted to target tracking, others being better fitted to target detection for instance. We hereby consider a multi-objective scenario where a heterogeneous team of robots is in charge of controlling a predefined known area. The robots are not numerous enough to statically cover the entire area (this is not an art gallery problem), and several targets can be present in the environment. The robots are initially engaged in an exploration phase, according to a pre-planned strategy. Once a mobile target is discovered, the team must ensure that the target remains visible, while


pursuing the environment exploration. The scenario naturally imposes the satisfaction of real-time constraints. The paper focuses on the target pursuit phase, under the realistic constraints raised by this multi-objective scenario. The only hypothesis on the target is that it is not harmful to the robots: besides this, it may be adversarial, evasive, or move according to any other strategy. The success of the tracking task is defined according to some visibility criteria (in terms of distance and continuity, from a 100% target visibility constraint to a more relaxed one). As the team performs both exploration and one or more tracking tasks, we want to minimize the resources required by the management of one target: the objective is to satisfy visibility constraints on the target, while minimizing the number of robots required for this purpose. As a single pursuer is most of the time sufficient to perform the tracking task, this is the favored tracking configuration. However, a single pursuer can sometimes fail, in which case it asks for support from other robots. The two main issues raised here and tackled in this paper are (i) how to predict the single pursuer's failures and (ii) how to anticipate and prevent them with support robots. The next section reviews the main results of the literature on target following, formalizes the problem, and presents our approach and the models used. Section III is the heart of the paper: it depicts how the pursuer can assess future tracking failures while the target is being tracked. Results in various realistic situations are presented and analyzed. Section IV exploits these potential tracking failures to define cooperative support tasks, using a task allocation scheme, and a discussion concludes the paper.

II. THE PURSUIT PROBLEM

Much work on the target following problem can be found in the literature. Eaton and Zadeh [18] first exposed the search for an optimal strategy as an optimization problem in discrete probabilistic systems, working with huge search space sizes. More recently, contributions by Hutchinson et al. [13, 3, 11, 2] thoroughly analyze the problem and some of its variations: the following problem is fully decidable, but its complexity hinders the definition of optimal solutions under real-time constraints. Besides, some single-robot local following strategies exhibit great results in both simulations and actual experiments (e.g. [12, 1]). Note that these approaches exploit poor environment models, usually 2D binary free/obstacle models in which the obstacle areas coincide with visibility masks.

To minimize the resources allocated to the target pursuit, our approach is to mainly rely on one robot (the pursuer) to perform the pursuit task: the state of the art shows that this should be sufficient most of the time. But due to its capacities, the target capacities and the environment configuration, the pursuer may fail to ensure the visibility constraint, be its strategy optimal or not. The pursuer therefore constantly evaluates the potential upcoming failures and their associated risk, and asks for support from other robots only when required. For instance "the target is rather far and about to enter a maze" or "the target is about to cross an area the pursuer cannot cross" are situations that do call for support from other robots, whereas the situation "the target is moving behind a small building located in a wide open area" does not, if temporary occlusions are allowed (figure 2).

Fig. 2. Illustration of our approach: (a) pursuit with AAV; (b) risk evaluation; (c) ask for support; (d) pursuit with AGV. In (a) the helicopter is the pursuer. In (b), the target is about to enter a building in which the helicopter cannot proceed: the pursuer asks for support (c), and the ground robot becomes the pursuer (d).

A. Problem statement

According to an economy of means principle, we want to minimize the number of robots requisitioned for the tracking task. Let R be the set of robots. The problem is formally defined as follows:

∀τ, minimize Σ_{r∈R} a(r, τ)  (1)

where

a(r, τ) = { 1 if the robot r is active at time τ for tracking or support in pursuit; 0 else }  (2)

while satisfying the visibility criterion:

∫_{τ−Tmax}^{τ} h({r : a(r) = 1}, target, θ) dθ ≤ Tmax  (3)

with

h(S, target, θ) = { 1 if the target is hidden from the set of robots S at time θ; 0 else }  (4)

More specifically, the target is hidden if none of the active robots sees it (i.e. it is occluded by the environment or beyond the sensors' maximal range). Tmax is a criterion specified by the operator, which allows relaxing the visibility conditions: the target may be out of sight, but no longer than Tmax seconds (Tmax can also be set to 0).
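On a discretised timeline, the criterion (3) reduces to forbidding runs of "hidden" steps longer than Tmax; a minimal sketch (our own helper, assuming a unit time step):

```python
def visibility_ok(hidden, t_max_steps):
    """hidden[k] is the indicator h of eq. (4) evaluated at step k for the
    set of active robots; the criterion fails as soon as the target stays
    hidden for more than t_max_steps consecutive steps."""
    run = 0
    for h in hidden:
        run = run + 1 if h else 0
        if run > t_max_steps:
            return False
    return True
```

Setting `t_max_steps = 0` recovers the strict 100% visibility constraint: any single occlusion step fails.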


B. Realistic Models

To be able to deal with a whole variety of realistic environments, different types of targets and heterogeneous robots, we use environment models that make explicit the traversability properties for both the ground robots and the target, and on which visibility constraints can be assessed. A layered model of the environment allows recovering inter-visibility and traversability information (namely, a 2.5D elevation map and a symbolic layer that expresses the terrain type, e.g. roads, grass, obstacles, buildings, rivers...). Motion models are associated with each vehicle (robots and targets); they define their movement capacities in terms of attainable speeds.¹ The robot sensors are modeled as omnidirectional cameras, to which a maximal range of detection is associated. By convolving the vehicle motion models with the terrain-type layer, we end up with a multi-layered map for traversability, used for target motion prediction and robot planning. A priori target visibility information is also computed for each robot (e.g. an AAV will never be able to view a target in a building), and the 2.5D elevation map is exploited to assess visibilities (see figure 3). All these layers are represented by Cartesian grids, and are exploited online to compute the motion and sensing possibilities using the available information on the current state (such as target speed and position). Each robot embeds the models of its motion and sensing capacities, and the various target motion models.
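A traversability channel can be sketched as a per-cell lookup mapping the symbolic terrain layer to attainable speeds for each vehicle (the terrain classes and speed values below are illustrative assumptions, not the paper's actual models):

```python
import numpy as np

# hypothetical terrain classes: 0=road, 1=grass, 2=building, 3=river, 4=covered area
terrain = np.array([[0, 0, 1],
                    [3, 4, 2]])

# attainable speed (m/s) per terrain class and per vehicle; 0 = impassable
SPEEDS = {"agv": [2.0, 1.0, 0.0, 0.0, 2.0],  # ground robot: blocked by buildings, rivers
          "aav": [3.0, 3.0, 3.0, 3.0, 0.0]}  # aerial robot: flies over all but covered areas

def traversability_layer(terrain, vehicle):
    """One channel of the multi-layered traversability map."""
    return np.asarray(SPEEDS[vehicle])[terrain]
```

The same grid thus yields different channels per vehicle, which is what lets the pursuer reason about cells the target can cross but it cannot.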

Fig. 3. Environment models used. (a) Satellite view of the area (orthoimage); (b) elevation model (derived from an airborne lidar survey); (c) 2.5D multichannel traversability map: here different gray values correspond to different attainable speeds, along a light-gray/dark-gray fast/slow scale; (d) 2.5D multichannel visibility map.

III. ASSESSING THE NEED FOR TEAM SUPPORT

Our approach relies on the fact that the robot which tracks the target, called the pursuer, is able to predict tracking failures, that is, losses of target visibility for a duration longer than Tmax. For this purpose, we evaluate all the possible pursuer/target situations over a temporal horizon h to isolate the tracking failures that can occur between the current time τ and the time τ + h. We hereby depict how this can be achieved in real time – the predicted failures generate support task requests, whose processing is depicted in section IV.

A. Failure evaluation

As failures mainly depend on the environment and on the relative positions of the target and the pursuer, there is no real general shortcut for their assessment, and so one has to compute all the possible future states over the time horizon h. A future state is a failure state if constraint (3) is not satisfied, which for the pursuer is equivalent to:

∀θ ∈ [τ; τ + Tmax], h({pursuer}, target, θ) = 1  (5)

i.e. if the target is hidden from the pursuer's sensors for Tmax seconds or more. To compute all the possible states, we exploit the discrete structure of the traversability models, and define a tree using the pursuer and target motion models². At every time step, the tree grows with a branching factor of mt · mr, where mt and mr are respectively the number of possible motions for the target and the robot. With a temporal horizon h, the complexity of the tree building is O((mt · mr)^h). Usual values for the parameters are mt = mr = 9 (9-connexity in a 2D grid), and h = 20 at least. h must be large enough so that the other robots of the team can address the requests for support, without a priori constraining them too much to remain in the vicinity of the pursuer: indeed, the smaller h is, the quicker and closer the supporting robots must be. Using a value of h = 20 and considering 9-connexity, the tree complexity is O(81^h), i.e. near 10^38, which is by no means tractable.

¹ We assume that once detected, a target is identified – hence its capacities are known to the pursuer.
² For the sake of simplicity, we consider throughout this section that the pursuer and the target evolve at the same speeds. This does not cause any loss of genericity, but only slightly influences the overall complexity – see appendix.
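The blow-up is easy to make concrete (a toy count, not part of the planner):

```python
def naive_tree_size(m_t, m_r, h):
    """Leaf count of the exhaustive pursuer/target tree after h time steps:
    branching factor m_t * m_r (target moves x robot moves) at every step."""
    return (m_t * m_r) ** h
```

With m_t = m_r = 9 and h = 20 this is 81^20 ≈ 1.5 × 10^38 states, motivating the reductions of the next subsections.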


must be evaluated within the time horizon h to cope with this unpredictability. As computing an optimal tracking strategy raises a combinatorial problem, we use a local greedy strategy. One may want to use a pre-computed optimal strategy, but this would not be robust to differences between the a priori environment model and the actual environment. Real-time state-of-the-art local strategies are efficient, but still require many computing resources and often make strong assumptions about the target motions. Our local strategy applies the two following rules: (1) if the target is visible, try to get closer to the target while maintaining visibility; (2) else, find the shortest path to the closest position which satisfies the target visibility criteria. While clearly sub-optimal, this simple local strategy shows acceptable tracking performance, is extremely quick to compute, and can be applied to either an AAV or an AGV. This local strategy also brings an important advantage: the branching factor of the tree is directly reduced to mt, the number of available positions for the target, because the strategy associates a single robot position to a given target position. The tree complexity becomes O(mt^h) (see figure 4), that is, near 10^19 with mt = 9 and h = 20. Note that mt could be reduced by making assumptions on the target motions, but this is not desired, and it does not drastically reduce the problem's exponential complexity.

  

Fig. 4. Reducing the complexity: the impact of a local strategy.
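A sketch of the two rules on a grid (the map queries `visible`, `dist` and `nearest_viewpoint` are stand-ins for the model lookups of section II-B; rule (2) is approximated here by a single greedy step towards the goal rather than a full shortest path):

```python
def greedy_move(robot, target, moves, visible, dist, nearest_viewpoint):
    """Rule (1): among the robot's next cells, keep the target visible and
    close in; rule (2): otherwise step towards the closest cell from which
    the target would be visible again."""
    keeping = [m for m in moves(robot) if visible(m, target)]
    if keeping:                                  # rule (1)
        return min(keeping, key=lambda m: dist(m, target))
    goal = nearest_viewpoint(target)             # rule (2)
    return min(moves(robot), key=lambda m: dist(m, goal))
```

Because the returned move is a function of the target position alone, each target position maps to one robot position, which is exactly what cuts the branching factor from mt · mr to mt.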

C. From a tree to a cyclic pursuit graph

The previous complexity gain is not enough to allow the satisfaction of real-time constraints: the complexity needs to be further reduced, while still ensuring the detection of all the potential failures. Various structure transformations of the tree allow finding and exploiting state redundancies, which are brought by the fact that the tree is built upon a grid structure. First, note that several positions pt of the target may be handled by only one robot position pr. The former state-nodes (τ, pt, pr) become new nodes of type (τ, {pt}, pr), with τ the considered time. This leaves the tree structure unchanged (figure 5). Second, for a same stage in the tree, there are several similar nodes which only differ by their parents. However, the evaluation of a node only depends on spatial (and sometimes temporal) considerations, not on the past positions of the target or the robot. So similar nodes can be gathered, even if they have different parents. The tree structure can be changed into a graph structure, but the information carried by the nodes stays the same (figure 6).

Fig. 5. Reducing the complexity: merging spatial redundancies.


Fig. 6. Reducing the complexity: using a graph structure to exploit spatial redundancies
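Schematically, the two merges amount to re-keying the nodes of a stage (our own sketch; the real nodes also carry the failure evaluation):

```python
from collections import defaultdict

def merge_stage(nodes):
    """nodes: (tau, p_t, p_r) tree nodes of one stage. Grouping target
    positions handled by the same robot position, then discarding
    duplicate parents, yields graph nodes (tau, frozenset of p_t, p_r)."""
    grouped = defaultdict(set)
    for tau, p_t, p_r in nodes:
        grouped[(tau, p_r)].add(p_t)
    return {(tau, frozenset(p_ts), p_r)
            for (tau, p_r), p_ts in grouped.items()}
```

Returning a set is what collapses nodes that differ only by their parents: two histories leading to the same (τ, {pt}, pr) produce one graph node.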

For a 9-connexity grid, these two structure transformations drastically reduce the complexity, down to O(h^5) in the worst case (see appendix for details). Temporal redundancies allow further reducing the complexity, by introducing temporal loops in the graph: some situations are indeed encountered several times, e.g. when none of the vehicles moves, or after one loop around a building (figure 7). The nodes then embed the following data: (τ̂, {pt}, pr), where τ̂ = min{τ : (τ, {pt}, pr)}, i.e. τ̂ is the first temporal occurrence of the considered spatial state. The resulting structure is referred to as the pursuit graph. Last, we introduce an iterative algorithm to build the graph, which allows considering only the really new nodes at each temporal step, referred to as the front nodes. This shrinks the complexity down to O(h^3), still without any loss of information nor precision in the prediction (see appendix for details).
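The iterative construction can be sketched as a frontier expansion in which a spatial state is expanded only at its first occurrence τ̂ (the successor function stands in for the motion models combined with the tracking strategy):

```python
def build_pursuit_graph(start, successors, h):
    """Returns {spatial_state: tau_hat}, the first occurrence time of every
    state reachable within the horizon h; only newly seen ('front') states
    are expanded at each step, which is what caps the complexity."""
    tau_hat = {start: 0}
    front = [start]
    for tau in range(1, h + 1):
        new_front = []
        for state in front:
            for nxt in successors(state):
                if nxt not in tau_hat:       # a revisit closes a temporal loop
                    tau_hat[nxt] = tau
                    new_front.append(nxt)
        front = new_front
    return tau_hat
```

A revisited state contributes an edge back to its first occurrence instead of a new node, so the graph stays polynomial in h even though the underlying tree is exponential.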

Fig. 7. Reducing the complexity: introducing temporal loops.

The resulting complexity is polynomial of low degree, which allows assessing the potential failures, and hence the needs for team support, in real time. Each potential failure generates a task, which has the form (τ, p), where τ is the date


of the failure and p its position. In other words, the support task is defined as: watch the position p at date τ. The result is a small set of tasks, which is empty most of the time (when the target is fully under the control of the pursuer), and which constitutes the seeds for multi-robot cooperation.

D. Results

The bounded tractable complexity and the fact that the elaborated solution guarantees the assessment of all the potential failures are satisfying, but in practice, how does the pursuer perform, and are the complexity bounds low enough for real-time applications? Three examples are hereby presented to appreciate the performance of the solution: one takes place in a Manhattan-like environment, one is a "river-crossing" situation, and one exploits models from an actual experimental field.

Fig. 8. Pursuit in a Manhattan environment: (a) traversability for the pursuer (AAV); (b) traversability for the target (ground target); (c) visibility map: elevations of the buildings, and presence of a covered area in the bottom center of the map (in black). The vehicle trajectories are displayed (dark gray = target; light gray = pursuer), and the elliptical dark areas indicate places where the target is temporarily hidden. The target escapes into the covered area (which is a definite failure, denoted as fail). In phase I the target is under control, while in phase II the target is too close to the covered area (danger).

1) Manhattan situation: Figure 8 shows the traversability and visibility maps, and the trajectories of both the target and the robot pursuer. The situation is a typical pursuit in a Manhattan-like town center, with a ground target and an AAV pursuer. There are buildings and, in the middle, a covered area which the AAV can neither enter nor see through (it can only fly above it). The buildings do not constrain the pursuer's movement, but its visibility. The motions of both the target and the pursuer are holonomic. Figure 9-a displays the number of dangers (in dark gray) and the temporal horizon (in light gray) over time. Dangers are states that do require a request for support from other robots of the team. One can distinguish two phases: in phase I the target is under control (no predicted failures), even if the target is sometimes temporarily hidden. The temporal horizon increases at the beginning (easy environment) and then lowers as more buildings obstruct the visibility. This is the result of the real-time constraints and the adjustment of the computation time

Fig. 9. Evolution of the number of dangers (thin line) and of the temporal horizon (thick line) as a function of time steps for the three considered cases. The time laps during which the target is temporary hidden are indicated with thin vertical lines. Phases are distinguished by thick vertical lines.

required by the building of the pursuit graph (see paragraph III-D4). In phase II, the target approaches the covered area, which generates several danger states – and in the absence of support by another robot, the target eventually enters the covered area. The figures highlight that the greedy strategy, although not optimal, behaves quite well. Furthermore, the system generates temporally and spatially gathered tasks (recall that each danger generates one task). This is really important because, on the one hand, it avoids goings and comings of the support robots, and, on the other hand, the support robots will probably be able to handle several tasks at the same time, which helps to minimize the number of robots required for the pursuit mission (see section IV).

2) River-crossing situation: Figure 10 shows a situation where the target and the robot do not have the same traversability map, with an advantage for the former – imagine for example that the target is waterproof whereas the robot is not. While the target crosses the river, the robot has to take the bridge instead of directly following the target (phase I), which probably leads to a temporary occlusion (phase II), until the robot reaches the target again (phase III).


Figure 9-b displays the number of dangers and the evolution of the temporal horizon over the pursuit. One can again distinguish three phases: at the beginning there are risks that the target escapes (by breaking some visibility constraints); then, as the target trajectory is not optimal, the risks disappear, even when the target hides behind the building; at the end the pursuer catches up with the target again. As the surrounding environment forms a dead-end, which is easy to handle, the temporal horizon reaches high values (over sixty time steps).

Fig. 10. Pursuit in a river-crossing situation: (a) traversability for the pursuer (AAV) – the river is at the bottom of the terrain, a bridge is on the left; (b) traversability for the target (ground target); (c) visibility map. Difficulties arise when the target crosses the river, as the pursuer cannot follow it.

3) Experimental field situation: The third example is a realistic environment for the ground target and the pursuer; the vehicles evolve in both cluttered and rather clear areas (figure 11). As figure 9-c shows, dangers exist near and within the cluttered areas, whereas in less difficult areas the greedy strategy performs well enough to prevent risks of failure (no need for support). In cluttered areas the greedy strategy performs quite well too: it cannot prevent the target from escaping with an optimal strategy, but it can quite easily follow a not-so-adversarial target.

Fig. 11. Pursuit in a realistic environment: (a) the pursuer (AGV) and ground target traversability; (b) the pursuer visibility. Difficulties arise in cluttered areas (phases I and III).

4) Real-time performance: The previous section states that, considering a 9-connexity grid, the upper bound on the complexity of the graph construction is O(h^3) in the worst case. In practice the complexity appears to be near quadratic (figure 12). This allows real-time computation, considering a value of 1 or 2 seconds for the time steps (figure 13), while keeping a reasonably acceptable temporal horizon (above 15). In fact, the temporal horizon is constantly adjusted to fit the computation time to the real-time constraint – that is, to expand the graph by at least one time step in no longer than one time step.

Fig. 12. Typical evolution of the computation time with the temporal horizon. Polynomial regression gives a near quadratic complexity (here F(h) ≈ 2·h^2.2, with a very confident value for the coefficient of determination R^2).

Note in figure 13 that the computation time occasionally exceeds the one-second cycle criterion. This is because the system currently computes the states not one by one, but one horizon step at a time. As the prediction of the computational cost is not very precise, the system sometimes asks for one more horizon step and exceeds the time constraint. This is always balanced by the following computation cycles, which are very short. It could be solved by a slight adjustment of the implementation to compute the states one by one, improving both the respect of the time constraints and the overall performance (since more states could be computed).

Fig. 13. Computation cycle at each time step for the three previous examples. The thick horizontal line represents the "one second" limit.
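The near-quadratic empirical complexity rests on the 9-connexity geometry: with no obstacles, the set of cells reachable in s time steps is a (2s+1)×(2s+1) square, the bound used in the appendix. This can be checked with a short self-contained sketch (our own illustration, not part of the authors' system):

```python
def reachable_cells(steps):
    """Count the cells reachable in `steps` moves on an obstacle-free
    2D grid with 9-connexity (the 8 neighbours plus staying in place)."""
    frontier = {(0, 0)}
    for _ in range(steps):
        frontier = {(x + dx, y + dy)
                    for (x, y) in frontier
                    for dx in (-1, 0, 1)
                    for dy in (-1, 0, 1)}
    return len(frontier)

# |A_s| = (2s + 1)^2: the reachable set grows only quadratically with s,
# which is what keeps the graph expansion tractable in real time.
sizes = [reachable_cells(s) for s in range(1, 6)]
```

With obstacles or lower connexity the reachable sets can only shrink, so (2s+1)^2 remains an upper bound.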

IV. SUPPLYING TEAM SUPPORT

While a pursuer performs the following task, it predicts all its potential failures until a temporal horizon τ + h, τ being the current time. These predictions straightforwardly generate a set of support tasks, defined as "watch place p at time θ", with p a cell of the spatial grid and τ ≤ θ ≤ τ + h. By construction, the pursuer which issues these tasks cannot handle them itself. But it manages them: it allocates them to other robots according to their availability, capacities, priorities and the global objective of minimizing the number of required robots. Note that communication issues are currently not considered – they mainly lead to situational awareness inconsistencies between robots,


and may also lead the pursuer to have no (or not enough) support. The set of support tasks is updated at each time step. Depending on the target's motions, new tasks may appear and previous ones may become deprecated. The pursuer broadcasts the updates to the surrounding robots, which in turn update their current task lists and evaluate whether they could be of any help for the non-allocated tasks: this is a classical multiple task assignment problem. Task allocation problems have given rise to a large amount of work; they raise combinatorial issues, and finding an optimal solution is still very difficult for non-trivially sized problems. Nevertheless many solutions have been proposed, for example using stochastic methods [15]. Auction-based approaches have been largely studied (see Dias et al. [6] for a survey) and have been shown to efficiently produce acceptable solutions. Choi et al. [4] present decentralized algorithms which lead to good solutions with some guarantees of optimality. The authors also highlight the combinatorial issue raised by the construction of the bundles of tasks (how to choose which subset of tasks an agent will handle) and the conflicts that result from the iterative construction of those bundles. We have chosen to implement an auction process, as such approaches have shown good results and can handle inconsistencies within the situational awareness of agents. But in our case the costs are quite different from the usual ones: as we aim to minimize the number of required robots, performing one or several support tasks has the same cost for a given robot, whereas the reward is linear in the number of handled tasks. The auction process is one-turn, with multiple bids allowed: after the update of the task lists, each available robot computes several bundles of tasks it can handle together, and proposes them to the auctioneer (the pursuer robot).
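To make the allocation concrete, here is a hypothetical sketch of such a one-turn auction with task bundles encoded as bitmasks; the names and structure are ours for illustration, not the authors' implementation. Each robot submits its bundles (or nothing), and the auctioneer exhaustively combines them, preferring combinations that cover more tasks with fewer robots:

```python
from itertools import product

def allocate(bids):
    """One-turn auction: `bids` holds, per robot, the task bundles it
    offers, each bundle a bitmask over the support tasks. Every robot
    may also be assigned no bundle (0). Returns the chosen bundle per
    robot and the bitmask of covered tasks."""
    best = None
    for choice in product(*[[0] + bundles for bundles in bids]):
        covered = 0
        for mask in choice:
            covered |= mask
        n_tasks = bin(covered).count("1")             # tasks handled (maximize)
        n_robots = sum(1 for mask in choice if mask)  # robots used (minimize)
        key = (n_tasks, -n_robots)
        if best is None or key > best[0]:
            best = (key, choice, covered)
    return best[1], best[2]

# Two robots bid overlapping bundles over three tasks (bits 0..2).
choice, covered = allocate([[0b011, 0b001], [0b110, 0b100]])
```

With three bundles per robot plus the empty choice and four bidding robots, this loop visits 4^4 = 256 combinations, small enough to enumerate exhaustively.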
Then the auctioneer combines those bundles and allocates subsets of tasks (either a bundle or a sub-part of a bundle) to the robots, trying to minimize the number of robots involved. The support robots compute the bundles of tasks they can handle according to their capacities, the other tasks they already have, and their goals and priorities. As they cannot be exhaustive (n tasks lead to Σ_{p=1}^{n} C(n,p) = 2^n − 1 possible bundles), they only compute the biggest bundles in an iterative way. We also want each task to be part of at least one bundle (if the robot is indeed able to perform the task). The result is an overlapping partition of the task set. Recalling the results of part III, the generated support tasks are spatially and temporally gathered, which helps to keep a small number of task bundles (usual values are 0 to 3). Tasks are also independent from each other (except for the temporal constraints) and thus can be handled separately.

Once these bundles are computed, the auctioneer gathers them and allocates tasks. It optimally combines the different bundles according to the following criteria: (i) all the support tasks must be handled if possible; (ii) the number of required robots must be minimized. This directly comes from the objective function (1). Finding the optimal combination of bundles could be expensive: considering N_t the average number of bundles for each robot and N_r the number of bidding robots, there are O((N_t + 1)^{N_r}) possible combinations³. Typical maximum values are N_t = 3 and N_r = 4, which leads to 4^4 = 256 possible combinations. This is still easy and quick to compute for the auctioneer using a Boolean representation and bit-to-bit computation. Finally, combinations are ranked by the number of handled tasks (max) and the number of support robots involved (min), including the robots already required for non-deprecated former tasks. Then the auctioneer sends the support robots the support tasks they have been allocated.

Note that the heavy computational load of the auction phase is deported onto the bundle construction, sparing the auctioneer, which is also the pursuer. It can thus spend more time computing the pursuit graph and expanding the temporal horizon, which is the heart of the system. For the same reason we also introduce a temporal shift (of one time step) between the time tasks are issued and the time the associated bids are computed. Doing so allows support robots to compute bundles while the pursuer computes the next pursuit graph, so that no robot waits for the others. This shift is negligible compared to the expected horizon size (1 ≪ h).

Work is still in progress on this part, but first results are promising and show that a minimum number of robots is involved, while still respecting our real-time constraints.

V. DISCUSSION

We have presented a new approach to realistic target tracking problems. Keeping in mind that a team of robots often has several distinct objectives, we provide a new pursuer-centered cooperation scheme which minimizes the number of required robots at any time, thus satisfying an economy-of-means principle. The approach is based on a single pursuer which calls for team support only when required, through the evaluation of future failures it cannot handle alone.
We exploit realistic multi-layered 3D models, which allow us to consider different capacities (such as being waterproof or not) for both targets and robots. Kolling et al. [9] recently introduced 2.5D visibility in pursuit-evasion, but to the best of our knowledge this is the first time realistic multi-layered models such as ours have been used for such problems. As our local following strategy performs quite well, only one robot is required most of the time, but the team provides support when needed to prevent the loss of the target. The resulting system shows promising results in realistic simulations, and integration within a team composed of one AAV and two UGVs is under way.

We intend to extend the models by considering communication constraints and failures: this is not a trivial issue, as communication constraints threaten the whole cooperation. Possible solutions are to extend the anticipation of the pursuer, and to integrate communication rendez-vous for the support robots.

³ The "+1" accounts for the empty bundle: each robot can either handle a bundle of tasks, or none. This is important to consider as we want to minimize the number of required robots.


Problems may also occur if the pursuer does not behave as planned (i.e. according to its pursuing strategy). However, in such situations the system does not fail entirely, as re-planning and re-computing the risks on the fly remain tractable. Besides, adjusting the spatial and temporal resolutions will lead to better anticipation and to more flexibility in the execution of the plans. We also plan to integrate finer probabilistic motion models, and thus to handle potential failures according to both their probability and their temporal proximity. Finally, our longer-term perspective is to merge our target tracking system into an overall surveillance scheme, aiming at a whole team of robots performing several tasks and achieving different goals in parallel under realistic conditions.

ACKNOWLEDGMENTS

This research is partially funded by the DGA.

APPENDIX

We develop here the mathematical bounds presented in section III. The mathematical proof stands for an environment free of any obstacle and in 9-connexity (using a 2D grid). Unexpectedly this is an upper bound for the complexity, as obstacles, view length and lower connexity lead to motion restrictions, and thus reduce the number of possible spatial states.

Under such hypotheses we have, for each stage s in the graph obtained after the treatment of spatial redundancies, at most n_s nodes, with n_s ≤ |Acc(target, s)| · |Acc(robot, s)|, where Acc(i, s) = A_{i,s} is the set of all positions accessible to i in time s. The total size N of the graph is N = Σ_{s=1}^{h} |A_{t,s}| · |A_{r,s}|. Besides, in 9-connexity we have |A_{i,s}| ≤ 1 + Σ_{σ=1}^{s} 8σ = (2s + 1)^2, therefore N ≤ Σ_{s=1}^{h} (2s + 1)^4 = O(h^5). Here we consider that the pursuer and the target have the same velocity; this does not restrict the generality of the proof, as different velocities only change the resulting complexity (|A_{t,s}| = |A_{r,s}|), not the reasoning.

Adding temporal loops and iterative construction of the graph, one only develops the new nodes at each stage, referred to as the front nodes. Under the same hypotheses, the number of front nodes N_f satisfies:

N_f ≤ (|A_{t,h}| − |A_{t,h−1}|) · |A_{r,h−1}| + (|A_{r,h}| − |A_{r,h−1}|) · |A_{t,h−1}|
    ≤ 8h · O(h^2) + 8h · O(h^2)
    = O(h^3)

REFERENCES

[1] T. Bandyopadhyay, Yuanping Li, M. H. Ang, and D. Hsu. A greedy strategy for tracking a locally predictable target among obstacles. In ICRA 2006, IEEE International Conference on Robotics and Automation, pages 2342–2347, May 2006.
[2] S. Bhattacharya and S. Hutchinson. On the existence of Nash equilibrium for a two player pursuit-evasion game with visibility constraints. In Algorithmic Foundation of Robotics VIII, volume 57 of Springer Tracts in Advanced Robotics, pages 251–265, 2009.
[3] S. Bhattacharya, S. Candido, and S. Hutchinson. Motion strategies for surveillance. In Proceedings of Robotics: Science and Systems, 2007.
[4] Han-Lim Choi, Luc Brunet, and Jonathan P. How. Consensus-based decentralized auctions for robust task allocation. IEEE Transactions on Robotics, 25(4):912–926, 2009.
[5] T. Chung, G. Hollinger, and V. Isler. Search and pursuit-evasion in mobile robotics. Autonomous Robots, 31:299–316, 2011.
[6] M. B. Dias, R. Zlot, N. Kalra, and A. Stentz. Market-based multirobot coordination: a survey and analysis. Proceedings of the IEEE, 94(7):1257–1270, July 2006.
[7] B. Jung and G. S. Sukhatme. Tracking targets using multiple robots: the effect of environment occlusion. Autonomous Robots, 13:191–205, 2002.
[8] A. Kolling and S. Carpin. Cooperative observation of multiple moving targets: an algorithm and its formalization. International Journal of Robotics Research, 26:935–953, September 2007.
[9] A. Kolling, A. Kleiner, M. Lewis, and K. Sycara. Pursuit-evasion in 2.5D based on team-visibility. In IROS 2010, IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4610–4616, October 2010.
[10] R. Mottaghi and R. Vaughan. An integrated particle filter and potential field method applied to cooperative multi-robot target tracking. Autonomous Robots, 23:19–35, 2007.
[11] R. Murrieta-Cid, R. Monroy, S. Hutchinson, and J.-P. Laumond. A complexity result for the pursuit-evasion game of maintaining visibility of a moving evader. In IEEE International Conference on Robotics and Automation.
[12] R. Murrieta-Cid, B. Tovar, and S. Hutchinson. A sampling-based motion planning approach to maintain visibility of unpredictable targets. Autonomous Robots, 19:285–300, 2005.
[13] R. Murrieta-Cid, R. Sarmiento, S. Bhattacharya, and S. Hutchinson. Surveillance strategies for a pursuer with finite sensor range. International Journal of Robotics Research, 2007.
[14] D. Portugal and R. Rocha. A survey on multi-robot patrolling algorithms. In Technological Innovation for Sustainability, volume 349 of IFIP Advances in Information and Communication Technology, pages 139–146. Springer, Boston, 2011.
[15] Sertac Karaman, Tal Shima, and Emilio Frazzoli. Task assignment under complex UAV operations using genetic algorithms. In AIAA Proceedings, 2009.
[16] Jorge Urrutia. Art gallery and illumination problems. In Handbook of Computational Geometry, pages 973–1027, 2000.
[17] Christopher Vo and Jyh-Ming Lien. Visibility-based strategies for searching and tracking unpredictable coherent targets among known obstacles. In ICRA 2010 Workshop: Search and Pursuit/Evasion in the Physical World: Efficiency, Scalability, and Guarantees, Anchorage, Alaska, May 2010.
[18] L. A. Zadeh and J. H. Eaton. Optimal pursuit strategies in discrete-state probabilistic systems. Trans. ASME Ser. D, J. Basic Eng., 1962.
[19] Ke Zhou and S. I. Roumeliotis. Multirobot active target tracking with combinations of relative observations. IEEE Transactions on Robotics, 27(4):678–695, August 2011.


Tendon-Driven Variable Impedance Control Using Reinforcement Learning

Eric Rombokas, Mark Malhotra, Evangelos Theodorou, Emanuel Todorov, and Yoky Matsuoka

Abstract—Biological motor control is capable of learning complex movements containing contact transitions and unknown force requirements while adapting the impedance of the system. In this work, we seek to achieve robotic mimicry of this compliance, employing stiffness only when it is necessary for task completion. We use path integral reinforcement learning, which has been successfully applied to torque-driven systems to learn episodic tasks without using explicit models. Applying this method to tendon-driven systems is challenging because of the increase in dimensionality, the intrinsic nonlinearities of such systems, and the increased effect of external dynamics on the lighter tendon-driven end effectors. We demonstrate the simultaneous learning of feedback gains and desired tendon trajectories in a dynamically complex sliding-switch task with a tendon-driven robotic hand. The learned controls look noisy but nonetheless result in smooth and expert task performance. We show discovery of dynamic strategies not explored in a demonstration, and that the learned strategy is useful for understanding difficult-to-model plant characteristics.

I. INTRODUCTION

Though significant progress has been made toward controlling tendon-driven robotic systems, learning control of such systems remains a challenging task. Most of the difficulties arise from the existence of strong nonlinearities imposed by the tendon-hood structure, model uncertainty, and task complexity due to the need for simultaneous movement and force control. Additionally, fine object manipulation using tendon-driven systems may require sudden changes in gains and/or tendon excursions. Smooth controls can be used when the mass of a manipulated object is small compared to that of the manipulator, but as the dynamics of the external object become significant, dexterous contact with the object requires more abrupt changes in control. Tendon-driven systems allow lightweight manipulators, which accentuate this problem. Previous work has learned manipulation tasks by learning parameters of smooth trajectory basis functions [19], [3], but the imposed control smoothness limits the dynamic interaction that can occur with the manipulated object. Optimal control provides a principled approach to determining control policies that are compliant and dynamic [2], and can also handle contact dynamics [27], but requires identified models of the robot and task. Instead, we use state-of-the-art model-free reinforcement learning to directly learn policies for movement control without having to explicitly model uncertain robot and task dynamics. The challenges posed by tendon-actuated manipulators constitute an important and previously unmet test for model-free reinforcement learning.

Fig. 1: The ACT hand is a tendon-driven robot designed to mimic the tendons and joints of the human hand.

In this work we use Policy Improvement with Path Integrals (Section IV) [24, 22] for learning complex manipulation tasks with a tendon-driven biomimetic robotic hand (Section II). PI2 concentrates sampling of the state space around a rough initial demonstration or previously learned strategy, making it effective in high-dimensional problems. We introduce a structure by which PI2 can learn a discontinuous variable impedance control policy that enables tasks requiring contact, motion, and force control during object interaction. With respect to previous results on variable stiffness control with reinforcement learning [3], here we are not using any policy parameterizations that are based on function approximation. Instead, we represent trajectories and control gains as Markov diffusion processes. This choice expands the dimensionality of the controllable space and allows for better exploration of the nonlinear dynamics of the ACT hand and the task. The learned strategies can yield insight into subtleties of the plant, showing how biomimetic robotics can not only use inspiration from nature to achieve robotic goals, but can provide insights into the systems which they mimic (Section VII-D). Videos of the experiments can be found at http://tendondriven.pbworks.com/.

II. TENDON-DRIVEN BIOMIMETIC HAND

The robotic hand mimics the interaction among muscle excursions and joint movements produced by the bone and tendon geometries of the human hand, as in [6]. The index finger has the full 4 degree-of-freedom joint mobility and is controlled by six motor-driven tendons acting through a



crocheted tendon-hood. Two tendons, the FDS and FDP, act as flexors; the EI, RI, and PI act as extensors and ab/adductors; the LUM is an abductor but switches from extensor to flexor depending on finger posture. By sharing the redundancies and nonlinearities of human hands [5], the system constitutes a challenging testbed for model identification, control, and task learning, while also providing a unique perspective for the study of biomechanics and human motor control. The experiments presented here use only the index finger, with a silicone rubber skin on the palmar surface of the distal segment. The brushless DC motors that actuate the 6 tendons are torque-controlled at 200 Hz and measure tendon displacements at a resolution of 2.30 μm; the tendon lengths alone are used for feedback control, as there is no direct measurement of joint kinematics. Successfully performing manipulation tasks thus requires a control policy that can handle the nonlinear dynamics and high dimensionality of the robot as well as the dynamics of the task itself.


Fig. 2: Playback of the demonstration trajectories using a constant-gain proportional controller fails to achieve task-performing tendon trajectories, and does not exhibit compliance during task-irrelevant time periods. Acronyms are anatomical tendon names, e.g. Flexor Digitorum Profundus (FDP).

III. SLIDING SWITCH TASK

The kinematically simple task of sliding a switch is difficult to perform expertly with a tendon-driven finger; contact and task dynamics constitute a large part of the force required from the controlling tendons. An important research topic in neuromuscular control is how humans achieve such hybrid control, transitioning from motion to force control, as in tapping a finger on a rigid surface [29]. Even for isometric tasks it is nontrivial to decode the recorded activations of muscles and understand how these act through the tendons on the end effector [28]. In this paper we examine the task of contacting a sliding switch and pushing it down (see Figure 3). The switch in our apparatus is coupled to a belt and motor which allow the imposition of synthetic dynamics. The position of the switch x is measured with a linear potentiometer. Importantly, the finger loses contact with the switch at xreach, before reaching the bottom of the possible range, denoted xmin. We begin with a single demonstration of the desired task, in which a human holds the finger and moves it through a motion of pushing the switch down. The tendon excursions produced by this externally-powered example grossly resemble those required for the robot to complete the task, but simply replaying them using a general-purpose PID controller would not result in successful task completion, for two main reasons. Firstly, during demonstration the tendons are not loaded, which changes the configuration of the tendon network in comparison to when it is actively moving. Secondly, and more importantly, the tendon trajectories encountered during a demonstration do not impart any information about the forces required to accommodate the dynamics of the task. For instance, at the beginning of the task, the finger must transition from moving freely through air to contacting and pushing the switch.
A feedback controller following a reference trajectory has no way of anticipating this contact transition, and therefore will fail to initially strike the switch with enough force to produce the desired motion.


Fig. 3: Hybrid Switch Task. During the third experiment, spring dynamics are added to the natural switch dynamics, in the form of a virtual spring force F. xmin is the physical extent of the sliding switch, but the finger loses contact before reaching it, at a point xreach which depends on finger movement.

Without any prior system identification of the robot or sliding switch, PI2 can directly learn a control policy that minimizes a cost associated with the position of the switch, feedback gain, and tendon travel.

IV. POLICY IMPROVEMENT WITH PATH INTEGRALS: PI2

In this section we review the framework of reinforcement learning based on path integrals. The stochastic optimal control problem is a constrained optimization formulated as follows:




V(x) = min_{u(x,t)} J(x, u) = min_{u(x,t)} ∫_{t}^{t_N} L(x, u, t) dt   (1)

subject to the nonlinear stochastic dynamics:

dx = α(x) dt + C(x) u dt + B(x) δω   (2)

with x ∈ ℝ^{n×1} denoting the state of the system, u ∈ ℝ^{p×1} the control vector and δω ∈ ℝ^{p×1} Brownian noise. The function α(x) ∈ ℝ^{n×1} is the drift, which can be a nonlinear function of the state x. The matrix C(x) ∈ ℝ^{n×p} is the control transition matrix and B(x) ∈ ℝ^{n×p} is the diffusion matrix.

Under the optimal controls u* the cost function is equal to the value function V(x). The immediate cost L(x, u, t) is expressed as:

L(x, u, t) = q(x, t) + (1/2) uᵀ R u   (3)

The running cost L(x, u, t) has two terms: the first, q(x, t), is an arbitrary state-dependent cost, and the second is the control cost with weight R > 0. The optimal controls u*(x, t) are expressed as a function of the cost-to-go V(x, t) as follows:

u*(x, t) = −R⁻¹ C(x)ᵀ ∇_x V(x, t)   (4)

The role of optimal control is to drive the system towards parts of the state space with small values of V(x, t). The value function V(x, t) satisfies the Hamilton-Jacobi-Bellman (HJB) partial differential equation [7, 20]. Recent work on path integral control [24, 11] exploits the logarithmic transformation V(x, t) = −λ log Ψ(x, t) to turn the HJB equation into a linear PDE whose solution [8, 13] is represented via forward sampling of diffusion processes. The outcome is the expression that follows:

V(x, t_i) = −λ log ∫ P_path exp( −(φ_{t_N} + Σ_{j=i}^{N−1} q_{t_j} dt) / λ ) dx_N   (5)

where the probability P_path = P(x_N, t_N | x_i, t_i) has the form of a path integral starting from the initial state x_i and ending in state x_N under the uncontrolled (u = 0) stochastic dynamics in (2). In [24], [12] it has been shown that under the solution of the value function in (5) the path integral optimal control takes the form:

u_PI(x_{t_i}) dt = lim_{dt→0} ∫ P(τ_i) δω_{t_i} dτ_i   (6)

where τ_i is a trajectory in state space starting from x_{t_i} and ending in x_{t_N}, so τ_i = (x_{t_i}, ..., x_{t_N}). The probability P(τ_i) is defined as:

P(τ_i) = exp(−S̃(τ_i)/λ) / ∫ exp(−S̃(τ_i)/λ) dτ_i   (7)

The term S̃(τ_i) is defined as:

S̃(τ_i) ∝ φ(x_{t_N}) + ∫_{t_i}^{t_N} q(x_{t_j}) δt + ∫_{t_i}^{t_N} ‖ (x_c(t_j + δt) − x_c(t_j))/δt − α_c(x(t_j)) ‖²_{Σ_{t_j}⁻¹} δt   (8)

which in discrete time is approximated by:

S(τ_i) ∝ φ(x_{t_N}) + Σ_{j=i}^{N−1} ( q(x_{t_j}) dt + (1/2) δω_{t_j}ᵀ M δω_{t_j} )   (9)

with M = B(x)ᵀ ( C(x) R⁻¹ C(x)ᵀ )⁻¹ B(x). Essentially the optimal control is an average of variations δω weighted by their probabilities. This probability is inversely proportional to the path cost according to (7) and (9): low-cost paths have high probability and vice versa. Table I illustrates the path integral control in iterative form as applied to the ACT Hand. Alternative iterative formulations, based on Girsanov's theorem [13] and resulting in small differences in the biasing terms of the path cost S̃, are available in [23].

TABLE I: Policy Improvement with Path Integrals (PI2)

• Given:
  – An immediate state-dependent cost function q(x_t)
  – The control weight R ∝ Σ⁻¹
• Repeat until convergence of the trajectory cost:
  – Create K roll-outs of the system from the same start state x_0 using stochastic parameters u + δu_s at every time step
  – For k = 1...K, compute:
      P(τ_{i,k}) = exp(−S(τ_{i,k})/λ) / Σ_{k=1}^{K} exp(−S(τ_{i,k})/λ)
      S(τ_{i,k}) = φ_{t_N} + Σ_{j=i}^{N−1} q_{t_j} dt + (1/2) Σ_{j=i}^{N−1} ‖u + δu_s‖²_M
  – For i = 1...(N−1), compute:
      δu(x_{t_i}) = Σ_{k=1}^{K} P(τ_{i,k}) δu_s(t_i, k)
  – Update u ← max( u_min, min(u + δu, u_max) )

V. PI2 LEARNING VARIABLE TENDON IMPEDANCE

A. Variable Impedance

If interaction forces with the environment were negligible, a standard approach would be to apply negative feedback control with gains as aggressive as possible while still maintaining stability. High feedback gains, however, are dangerous for robots interacting with humans and are potentially wasteful when perturbations are not task-relevant. The idea of impedance control is to control the dynamic behavior of the manipulator in addition to commanding a reference state trajectory [10], [3], [4]. Variable stiffness impedance control specifies a schedule of gains emphasizing uncertain or unforgiving components of the task, while allowing compliance elsewhere. For complex tasks and unknown environmental dynamics, a suitable target impedance schedule is difficult to specify a priori. For biological and some biomimetic systems, stiffness is achieved by coactivation of antagonist muscles, introducing further challenge for impedance scheduling due to complex actuator dynamics [9], [16]. Here we define the basic structure of our controller, then describe how learning is applied.
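The PI2 iteration of Table I can be condensed into a short numerical sketch. The following is a toy, model-free version for a single control sequence with a placeholder rollout-cost function; it is our illustration of the update rule, not the ACT-hand implementation (which adds noise windowing and per-tendon agents):

```python
import numpy as np

def pi2_update(u, rollout_cost, K=10, sigma=0.05, lam=0.05,
               u_min=-1.0, u_max=1.0):
    """One PI2 improvement of a length-N control sequence u.
    rollout_cost(u_pert) returns the per-timestep cost of one rollout."""
    N = len(u)
    noise = sigma * np.random.randn(K, N)               # K exploratory rollouts
    costs = np.array([rollout_cost(u + noise[k]) for k in range(K)])
    S = np.cumsum(costs[:, ::-1], axis=1)[:, ::-1]      # cost-to-go S(tau_i)
    P = np.exp(-(S - S.min(axis=0)) / lam)              # softmax over rollouts
    P /= P.sum(axis=0, keepdims=True)
    delta_u = (P * noise).sum(axis=0)                   # probability-weighted noise
    return np.clip(u + delta_u, u_min, u_max)           # clamp as in Table I
```

Iterating pi2_update with a simple quadratic rollout cost drives u toward the cost minimum without any model of the plant.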


Consider motor commands τ calculated via a proportional controller:

τ_i = −k_i (l_i − l̂_i)   (10)

where k_i is the proportional gain and l̂_i is the desired reference length for tendon i. By varying k in time according to a gain schedule, we may achieve variable impedance specific to each tendon. The reference trajectories may differ from the demonstration to anticipate task dynamics.
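In code, the controller of eq. (10) is a one-liner per timestep; the gain and reference values below are hypothetical, standing in for the learned schedules:

```python
import numpy as np

def tendon_torques(l, l_ref, k):
    """Per-tendon proportional control, tau_i = -k_i * (l_i - l_ref_i).
    A time-varying k realizes the variable impedance schedule."""
    return -k * (l - l_ref)

# One timestep over the six tendons (FDP, FDS, EI, RI, PI, LUM):
# stiff flexors, compliant extensors -- an illustrative gain schedule.
l     = np.array([0.10, 0.02, 0.00, 0.00, 0.01, 0.03])    # measured excursions (cm)
l_ref = np.array([0.12, 0.05, 0.00, 0.00, 0.00, 0.00])    # reference lengths (cm)
k     = np.array([800.0, 800.0, 50.0, 50.0, 50.0, 50.0])  # gains at this timestep
tau = tendon_torques(l, l_ref, k)
```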

B. Learning Variable Impedance Using PI2

We apply PI2 to learning controls with no time-averaging of the learned parameter vector. We use a time-varying impedance controller with a differential equation on the gains and reference lengths:

dk(t) = (−α k(t) + u_k(t)) dt + σ_k δu_k   (11)

dl̂(t) = (−α l̂(t) + u_l̂(t)) dt + σ_l̂ δu_l̂   (12)

where u_k and u_l̂ are the changes in gains and reference trajectories, and δu_k and δu_l̂ are sampled from a zero-mean normal distribution as in the PI2 algorithm (Table I). The smoothing effect of these differential equations provides a means for using anything from unsmoothed (α → ∞) to highly smoothed (α = 0) controls. In either case, no time averaging acts at the level of the learning algorithm [19]. The experiments presented here use α = 0.9/dt = 180, providing a very slight smoothing effect on the control outputs. Section VII presents learned controls which, although quite noisy, produce smooth tendon trajectories with contact-produced discontinuities, as illustrated in Figure 5.

The controls vector u = u(t) + δu(t) optimized by the learning algorithm (see Table I) corresponds to a concatenation of gain and reference length controls: u = [u_k(t), u_l̂(t)]ᵀ. PI2 learns the optimal gain and reference length for each tendon at each timestep, a considerable number of parameters. To combat this high dimensionality without adversely affecting the learning, we subsample the injected noise δu so that it changes every 50 timesteps during a rollout. Such windowing is especially important for systems like ours, whose high-order dynamics filter out much of the injected noise.

Learning proceeds through iterative revision of the policy parameters; each of these revisions we refer to as a trial. A sample trajectory is queried from the system by sampling δu and actually performing a switch-slide using the resulting u + δu. We refer to one of these exploratory executions of the task as a rollout. To revise u at the end of a trial, each sampled control strategy is weighted according to the cost encountered by the corresponding rollout (Table I). The results reported here use σ_k = 50 for sampling gains and σ_l̂ = 0.05 for sampling reference trajectories. The smaller this exploration variance is, the more similar the rollouts are, so the magnitude of σ should depend on the natural stochasticity of the plant, though here it is set by hand. Convergence is qualitatively insensitive to the exact value of σ, and high variance results in low control cost (Section IV) and therefore increased control authority. Each trial consists of ten rollouts, and after every third trial performance is evaluated by executing three exploration-free rollouts (σ = 0). The controls learned for each tendon are linked to the others only through their shared effect on the system, reflected in the rollout costs. In this way, each tendon-specific PI2 can be considered a parallel agent operating on its own.

VI. COST FUNCTION AND EXPERIMENTS

A. Cost Function

We conduct three experiments using the same setup and demonstration. For all experiments, the cost-to-go function for a rollout having duration T may be expressed as:

C_t = q_terminal(x_T) + Σ_{t}^{T} ( q(x_t) + u_tᵀ R u_t )   (13)

Here x_t corresponds to the position of the switch at time t. q(x_t) is the cost weighting on the switch state, with q_terminal(x_T) the terminal cost at the end of the rollout. R is the cost weighting for controls. The experiments presented here use q(x_t) = 20000 x_t and q_terminal(x_t) = 300 q(x_t), meaning the terminal state cost is as costly as 300 time steps.

B. Gain Scheduling: Experiment One

For the first experiment, PI2 learns only the gains K; the sampling noise σ_l̂ = 0. This means that the learned strategy will always kinematically resemble the demonstration, but will optimize compliant control of the dynamics.

C. Gains and Reference Lengths: Experiment Two

The second experiment is to learn both gains and reference lengths simultaneously. The learned strategy may take advantage of kinematic poses not experienced in the demonstration.

D. Stabilizing Spring Dynamics: Experiment Three

In the third experiment, PI2 learns both gains and reference lengths, and we also introduce spring behavior to the switch dynamics, requiring a qualitatively different learned optimal strategy. The intrinsic switch dynamics are retained, but the motor coupled to the switch also resists the finger with a stiffness force f_spring = k(x_max − x). This force springs the switch back up toward its home position if the finger pushes too far and loses contact. For the first two experiments, the optimal strategy could include pushing past the point of losing contact with the switch, but for the third the finger must proceed quickly to the edge of losing contact and go no further.

VII. RESULTS

The task is successfully learned in each experiment. Figure 6 depicts the summed cost-to-go as learning proceeds for 100 trials (revisions of control) for two separate executions of each experiment. Every third trial, three unexploratory rollouts are performed, and their means and standard deviations are denoted by the solid line and shaded width in the figure. Cost


Fig. 4: Final posture attained for early, middle, and late learning (rows) for each experiment (columns). In pane (f) we observe the switch being thrown past the last point of finger contact to its physical limit. Pane (h) depicts a strategy which has pushed the finger too far, and the switch has bounced upward due to the added spring force (experiment three, Section V). In pane (i) we see the very tip of the finger just preventing the switch from returning.


Fig. 5: Results after learning (trial 100) for all three experiments. The first row depicts the torque commands learned for all tendons. Dynamic requirements of the tasks are apparent in the learned strategies. For instance, Experiment 2 appears to require two bursts of torque to achieve vigorous flexion, but Experiment 3 requires another in order to arrest finger motion before losing contact. The middle row depicts the resulting tendon trajectories, and the third row depicts the position of the switch.



Fig. 6: Learning as trials progress. In experiment one (left), only the gains are learned and the reference lengths recorded during the demonstration are used. In experiment two (middle), both the gains and reference lengths of the tendons are learned. In experiment three (right), additional spring dynamics resist switch displacement from initial position. Solid line indicates mean cost for three unexploratory rollouts every third trial; the shaded area is standard deviation. Each of the two curves corresponds to a repetition of the experiment beginning with the same initial control strategy.


The cost magnitude is different for each experiment due to different control exploration (Equation 13), so the sum cost-to-go for the first noise-free rollout is normalized for comparison. Figure 5 shows the commanded torques, resulting tendon trajectories, and switch position trajectories for the final learned control strategies for each experiment.

A. Gain Scheduling: Experiment One Results

In Experiment 1, the optimized gain schedule produces a movement which smoothly pushes the switch to near the position attained in the demonstration. As can be seen in Figure 5, top-left pane, the learned strategy uses the two primary flexors very similarly. After 100 trials, the total cost is reduced by about 15% from the initial policy.

B. Gains and Reference Lengths: Experiment Two Results

The strategy found for Experiment 2, learning both the gains and references, differed between the two executions of the experiment. Both learned strategies quickly move the switch down, past the kinematic pose observed in the demonstration. During the second execution, however, PI² learned that the switch could be thrown down, using the inertia of the switch to carry it past the last point of finger contact, x_reach. The result of this highly dynamic behavior can be seen in Figure 4, pane (f): the switch is thrown to its physical limit x_min in a single ballistic motion, as evidenced in the switch position trajectory pictured in Figure 5, middle-bottom.

C. Stabilizing Spring Dynamics: Experiment Three Results

Experiment 3 presents the most dynamically challenging task. In this experiment, the motor resisting switch displacement implicitly imposes a task constraint: the finger must move quickly to push the switch as before, but now it must stop quickly to prevent overshoot. As in the other experiments, Figure 5, bottom-right, presents the switch trajectory for the learned strategy. The learned strategy makes a large initial switch movement followed by a burst of flexion and then co-contraction of flexors and extensors. To better illustrate the effect of the additional spring dynamics, Figure 7 presents a successful stop (top) and a failed stop (bottom) switch trajectory. Recall that the control policy incorporates feedback of tendon lengths only; measurement of the switch position is used in evaluating a rollout but not within the control loop.

Fig. 7: Above: Switch position learned for Experiment 3, in which additional spring dynamics resist switch displacement from initial position. Below: Switch position for a middle-learning failure to stop before leaving contact with the switch.

Fig. 8: Torques resulting from the learned control strategy for Experiment 3, in which additional spring dynamics resist switch displacement from initial position. The rows show the commanded motor torques (mA) for the flexors, extensors, and lumbrical.


D. The Role of the Lumbrical Tendon

Experiment 3 yields insight into the role of the lumbrical tendon. In that experiment the finger must move quickly but then stabilize the switch at the extent of the finger's reach, corresponding to switch position x_reach in Figure 3. This requires more elaborate use of the extensor tendons than in the first two experiments, in which the extensors primarily stabilize the abduction-adduction motion of the finger. Examining the strategy learned by the controller for this tendon serves as a novel way of illuminating the complex role it plays. The tendon extensor mechanism matches the human tendon network in important ways [30], including finger lumbricals. Lumbricals are smaller tendons which attach to other tendons, both flexors and extensors, instead of bone. This produces a moment arm relationship to joint torques which is complex and highly pose-dependent. Roughly, the lumbrical is a flexor of the metacarpophalangeal (MCP) joint, the "knuckle", while being an extensor of the two distal joints. With the hand outstretched, thumb up, the lumbrical would be used to push the fingertip as far as possible from the palm, straightening the finger while flexing the knuckle. This is exactly the motion necessary for the finger to contact the switch for the longest amount of time during the downstroke. By examining the control strategy learned for the lumbrical tendon in concert with the more straightforward tendons, we observe its protean nature. As illustrated in Figure 8, it acts as a flexor at the beginning of the motion, with a clear peak of torque delivered in concert with the FDP/FDS flexors at the moment of greatest switch movement. However, after motion has ceased we observe co-contraction of all tendons, but the pattern of activation of the lumbrical more closely matches the torque profile of the extensors.

VIII. RELATED WORK AND DISCUSSION

A. Optimal Control for Tendon-Driven Systems

Optimal control provides a principled approach for robots to expertly perform tasks without relying on pre-programmed or stereotyped motions. Recent successes in applying algorithms like iLQG [17, 15, 18] suggest that difficult control tasks, like dexterous manipulation, may be better achieved by formulating models and cost functions than by laboriously hand-tuning reference trajectories or feedback gains directly. Notably, optimal control can solve for time-varying torque and stiffness profiles to achieve dynamic tasks compliantly [2]. In contrast with [17, 15, 2], however, we do not use state-space dynamical models. PI² is a derivative-free method in the sense that it does not require linearization of dynamics and quadratic approximations of cost functions on state-space trajectories. More importantly, the optimal control in our approach is an average over sampled controls based on how well they performed on the real system, and thus no differentiation of the value function is performed. With respect to previous work on variable stiffness control with reinforcement learning [4, 3], our approach does not use function approximators to represent gains or desired trajectories. This lack of policy parameterization allows us to learn the non-smooth trajectories and control gains required for tasks that involve contact, which is particularly important when controlling tendon-actuated manipulators.

Tendon-driven systems are particularly advantageous in applications like dexterous hands where it is desirable to have compact and low-inertia end effectors [14]. The associated increase in control dimensionality and nonlinearity, as well as the relatively greater prominence of task dynamics, make system identification of the task and the robot less tractable. Model-based control methods for systems reflecting the biomechanical complexities of hands are very much an open research topic, due mostly to the interaction of tendons in the network. This paper shows that by using a model-free reinforcement learning algorithm like PI², we can bridge the gap between optimal control and tendon-driven systems: we simultaneously enjoy the benefit of choosing cost functions instead of reference trajectories or gains while also circumventing the need for an accurate model of the robot and environment. Future work will investigate the generalizability of learned policies and the possibility of using this algorithm as a data collection mechanism for system identification.

The absence of models for the underlying hand and task dynamics, as well as the lack of policy parameterization, does not come without cost. As demonstrated in this paper, our learning approach converges to an optimal solution in all experiments; nevertheless, this may require the execution of many rollouts on the real system. This is not a surprising observation, since it is related to the exploration-exploitation trade-off in the reinforcement learning literature [21] and to how much information about the underlying dynamics is initially provided to the learning algorithm. Clearly, once a model is known, its use could speed up learning.
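The averaging over sampled controls mentioned above is the heart of PI²; the following sketch shows the idea in its simplest form. It is an illustrative simplification, not the paper's exact update: the single flat parameter vector, the Gaussian exploration, the cost normalization, and the temperature λ are generic path-integral ingredients, and the per-time-step structure of the real algorithm is collapsed away.

```python
import math
import random

def pi2_update(theta, rollout_cost, sigma=0.1, n_rollouts=10, lam=1.0):
    """One PI2-style update: sample perturbed parameter vectors, run a
    rollout for each, weight each perturbation by an exponentiated
    (normalized) negative cost, and average -- no gradients required."""
    samples, costs = [], []
    for _ in range(n_rollouts):
        eps = [random.gauss(0.0, sigma) for _ in theta]
        samples.append(eps)
        costs.append(rollout_cost([t + e for t, e in zip(theta, eps)]))
    # Softmax-style weights: low-cost rollouts dominate the average.
    c_min, c_max = min(costs), max(costs)
    span = max(c_max - c_min, 1e-12)
    weights = [math.exp(-lam * (c - c_min) / span) for c in costs]
    total = sum(weights)
    step = [sum(w * s[i] for w, s in zip(weights, samples)) / total
            for i in range(len(theta))]
    return [t + d for t, d in zip(theta, step)]
```

Because the update is an average over controls actually tried on the system, weighted only by measured cost, no model and no value-function derivative ever appears, which is the property emphasized in the text.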
Future work will explore the use of Stochastic Differential Dynamic Programming (SDDP) [25] or iLQG on contact-less models, and integrate the resulting control policies with PI² to learn tasks with contact and motion-to-force control transitions.

B. Discovery of System Phenomena

The second execution of Experiment 2 discovered that the switch could be thrown beyond x_reach, and the learned behavior for Experiment 3 involved co-contraction of antagonists, as seen in biological tendon actuation, without explicit instruction. By interacting directly with the environment, the robotic system learned to perform dynamic behavior never encountered in the demonstration, exploiting subtle phenomena without system identification. The advantages of this kind of embodied learning have been explored in a variety of experiments, for example learning circuit configurations [26] or robot control and morphology [1]. Manipulators incorporating complex actuation and, eventually, sensory capabilities could benefit from learning from task-relevant experience as opposed to expert knowledge. Similarly, analysis of the effects of varying aspects of complex systems like tendon networks can be difficult due to dimensionality, model mismatch, and nonlinear phenomena.


As described in Section VII-D, the learned policy produces torques which validate the biomimetic anatomical properties of the tendon hood extensor mechanism. The approach outlined in these experiments is valid for a variety of complex systems which might otherwise be difficult to measure or simulate. By observing that the learned usage of the lumbrical tendon produces dynamic consequences similar to those known for humans, we can now more confidently assert the biomimicry of that aspect of the tendon network.

REFERENCES

[1] J. Bongard. The utility of evolving simulated robot morphology increases with task complexity for object manipulation. Artificial Life, 16(3):201–223, 2010.
[2] D.J. Braun, M. Howard, and S. Vijayakumar. Exploiting variable stiffness in explosive movement tasks. In Robotics: Science and Systems Conference, 2011.
[3] J. Buchli, E. Theodorou, F. Stulp, and S. Schaal. Variable impedance control - a reinforcement learning approach. In Robotics: Science and Systems Conference, 2010.
[4] J. Buchli, F. Stulp, E. Theodorou, and S. Schaal. Learning variable impedance control. International Journal of Robotics Research, 30(7):820, 2011.
[5] A.D. Deshpande, J. Ko, D. Fox, and Y. Matsuoka. Anatomically correct testbed hand control: muscle and joint control strategies. In IEEE International Conference on Robotics and Automation, pages 4416–4422, 2009.
[6] A.D. Deshpande, Z. Xu, M. J. V. Weghe, L. Y. Chang, B. H. Brown, D. D. Wilkinson, S. M. Bidic, and Y. Matsuoka. Mechanisms of anatomically correct testbed (ACT) hand. IEEE Trans. Mechatronics, to appear, 2012.
[7] P. Dorato, V. Cerone, and C. Abdallah. Linear Quadratic Control: An Introduction. Krieger Publishing Co., Inc., 2000.
[8] A. Friedman. Stochastic Differential Equations And Applications. Academic Press, 1975.
[9] N. Hogan. Adaptive control of mechanical impedance by coactivation of antagonist muscles. IEEE Transactions on Automatic Control, 29(8):681–690, 1984.
[10] N. Hogan. Impedance control: An approach to manipulation. Journal of Dynamic Systems, Measurement, and Control, 107(2):17, 1985.
[11] H. J. Kappen. Linear theory for control of nonlinear stochastic systems. Phys Rev Lett, 95:200–201, 2005.
[12] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 11:P11011, 2005.
[13] I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus. Springer, 2nd edition, August 1991.
[14] H. Kobayashi, K. Hyodo, and D. Ogane. On tendon-driven robotic mechanisms with redundant tendons. The International Journal of Robotics Research, 17(5):561–571, 1998.
[15] P. Kulchenko and E. Todorov. First-exit model predictive control of fast discontinuous dynamics: Application to ball bouncing. In IEEE International Conference on Robotics and Automation, pages 2144–2151, 2011.
[16] S.A. Migliore, E.A. Brown, and S.P. DeWeerth. Biologically inspired joint stiffness control. In IEEE International Conference on Robotics and Automation, pages 4508–4513. IEEE, 2005.
[17] D. Mitrovic, S. Nagashima, S. Klanke, T. Matsubara, and S. Vijayakumar. Optimal feedback control for anthropomorphic manipulators. In IEEE International Conference on Robotics and Automation, pages 4143–4150.
[18] D. Mitrovic, S. Klanke, and S. Vijayakumar. Learning impedance control of antagonistic systems based on stochastic optimization principles. International Journal of Robotics Research, 30(5):556–573, 2010.
[19] E. Rombokas, E. Theodorou, M. Malhotra, E. Todorov, and Y. Matsuoka. Tendon-driven control of biomechanical and robotic systems: A path integral reinforcement learning approach. 2012.
[20] R.F. Stengel. Optimal Control and Estimation. Dover Publications, New York, 1994.
[21] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, 1998.
[22] E. Theodorou. Iterative Path Integral Stochastic Optimal Control: Theory and Applications to Motor Control. PhD thesis, University of Southern California, 2011.
[23] E. Theodorou and E. Todorov. Relative entropy and free energy dualities: Connection to path integral and KL control. Submitted, 2012.
[24] E. Theodorou, J. Buchli, and S. Schaal. A generalized path integral approach to reinforcement learning. Journal of Machine Learning Research, (11):3137–3181, 2010.
[25] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic differential dynamic programming. In American Control Conference, 2010.
[26] A. Thompson. An evolved circuit, intrinsic in silicon, entwined with physics. Evolvable Systems: From Biology to Hardware, pages 390–405, 1997.
[27] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control.
[28] F.J. Valero-Cuevas, F.E. Zajac, C.G. Burgar, et al. Large index-fingertip forces are produced by subject-independent patterns of muscle excitation. Journal of Biomechanics, 31(8):693–704, 1998.
[29] M. Venkadesan and F.J. Valero-Cuevas. Neural control of motion-to-force transitions with the fingertip. Journal of Neuroscience, 28(6):1366–1373, 2008.
[30] D.D. Wilkinson, M.V. Weghe, and Y. Matsuoka. An extensor mechanism for an anatomical robotic hand. In IEEE International Conference on Robotics and Automation, 2003.

Guaranteeing High-Level Behaviors While Exploring Partially Known Maps

Shahar Sarid, Bingxin Xu, and Hadas Kress-Gazit

Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, New York 14853, USA
Email: {sarid, bx38, hadaskg}@cornell.edu

Abstract—This paper presents an approach for automatically synthesizing and re-synthesizing a hybrid controller that guarantees a robot will exhibit a user-defined high-level behavior while exploring a partially known workspace (map). The approach includes dynamically adjusting the discrete abstraction of the workspace as new regions are detected by the robot’s sensors, automatically rewriting the specification (formally defined using Linear Temporal Logic) and re-synthesizing the control while preserving the robot state and its history of task completion. The approach is implemented within the LTLMoP toolkit and is demonstrated using a Pioneer 3-DX in the lab.

I. INTRODUCTION

Consider the case of a "Search and Rescue" scenario where a mobile robot is patrolling inside a collapsed building with unknown rooms. The robot is required to explore all the accessible rooms and search for survivors. The robot's high-level behavior must be guaranteed to ensure safety and mission completion while expanding the map to include unknown regions. Recently, several approaches have been suggested for generating correct high-level robot behavior [14, 12, 8, 15, 11, 2, 20] from an abstract description of a task; these approaches tackle the inherently continuous problem of robot sensing, motion and action by creating a discrete abstraction of the task, generating a provably-correct discrete solution, and implementing the solution by composing a set of continuous low-level controllers such that the overall continuous behavior of the robot is the desired one. The specification formalisms are usually a variant of temporal logic, with Linear Temporal Logic (LTL) being the most commonly used. The creation of the discrete solution is based on ideas from formal methods, mainly model checking [3] and synthesis [17] when assuming no noise on sensors and actuators (e.g. [14, 12, 20, 2]), and probabilistic model checking and policy synthesis for Markov Decision Processes (MDPs) when uncertainty is taken into account (e.g. [15, 10]). When implementing the continuous behavior, researchers have considered potential-field type motion planners (e.g. [14, 12]) as well as sampling-based approaches (e.g. [11, 2]).

This work was supported by NSF CNS-0931686 and NSF CAREER CNS-0953365.

What is common to all the aforementioned approaches is that the workspace is assumed to be known; that is, the robot is required to perform its high-level task within a known map. In the process of generating the control, the map is abstracted to a graph where the nodes are regions in the map and edges in the graph indicate that the robot can move between the two regions without going through a third region. In general, when considering a convex partition of a polygonal map, the edges represent the adjacency of the regions (an edge exists between nodes representing two regions that share a common face), since several low-level controllers developed over the years are guaranteed to drive a robot from any point within a convex polygon to one of its faces, and thus to the adjacent region (e.g. [4, 1, 16]).

This paper relaxes the assumption of an a priori completely known map; instead, an initial map is assumed which may contain as little as one region, and as the robot is performing its task, new regions are discovered and the task is adjusted accordingly in an automatic and provably-correct manner. The new regions are detected using an occupancy grid [7] and then added to the known map, gradually expanding and creating a larger map.

This work builds on the approach of [14], where a fragment of LTL is the formalism used to capture tasks that are reactive, meaning that the robot behavior depends on the occurrence of environmental events, for example "continually patrol all the rooms, stop if you see a person and ask them if they need help". Given a formula in the logic, a tractable synthesis technique is used to generate an automaton such that all its executions satisfy the LTL formula. The complexity of synthesizing an arbitrary LTL formula is doubly exponential in the size of the formula, but if the specification is restricted to a specific fragment of LTL, the complexity becomes polynomial in the state space [17].
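Generating transition constraints from such an adjacency graph is mechanical. The sketch below renders them as strings using the common ASCII conventions `[]` for Always, `X` for Next, and `|` for disjunction; the dictionary encoding of the graph and the region names are assumptions made for illustration, not the internal format of any of the cited tools.

```python
def motion_safety(adjacency):
    """Render the motion constraints implied by a region adjacency graph:
    from region r the robot may stay in r or move, at the next step, to
    any region that shares a face with r."""
    formulas = []
    for region, neighbors in sorted(adjacency.items()):
        options = " | ".join([region] + sorted(neighbors))
        formulas.append(f"[]({region} -> X({options}))")
    return formulas
```

For a region r2 adjacent to r0, r1 and r3, this produces a constraint of the shape []( r2 -> X(r2 | r0 | r1 | r3) ), matching the motion constraints used later in the paper.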
The contribution of this paper is in providing guaranteed high-level behavior of robots operating in unknown workspaces. Specifically, this paper describes: (i) the set of user-specified high-level tasks, which is enriched by allowing quantifiers ('all', 'any') over regions; this allows a user to say "visit all offices"; (ii) the discrete abstraction of the workspace, which is updated on-the-fly based on sensor information; and (iii) a re-synthesis algorithm. The algorithm preserves the task history and automatically updates the instructions of the robot behaviors according to the new workspace.

Testing the re-synthesis method and algorithms in real environments with a physical robot equipped with range sensors gives rise to questions regarding when and how to decide that a new region is detected. As a first step toward a generalized solution to the problem, assumptions regarding the environment are made (Section III). The work described in this paper was implemented on a physical robot in the lab, where the onboard laser range scanner was used to detect previously unknown regions while the robot was performing a high-level task. The task described in Section VI required the robot to collect assignments from students in classrooms and hand them to professors in their offices. When the robot saw a hazardous item in front of a classroom (a cone), it avoided entering that classroom and raised a flag of warning (Fig. 1). The robot detected new rooms, distinguished between classrooms and offices according to their size, and continued behaving according to the specification in the a priori unknown workspace.

Fig. 1: The robot is visiting the classrooms. The cone represents a hazardous item in front of a classroom and the raised flag was the robot's response to detecting the cone. The cup contained a touch sensor and was used to collect assignments.

The paper is structured as follows: Section II presents background information regarding the approach and formalisms for generating high-level correct robot control. Section III defines the problem this paper is focusing on and the assumptions that are made. Section IV discusses the enriched grammar containing region quantifiers, and Section V describes the re-synthesis process. Section VI describes the experiments and the paper concludes in Section VII.

II. BACKGROUND

This section presents the background information needed for the remainder of the paper. It includes the definition of the underlying logical formalism, an overview of the process of generating provably-correct robot control from logical formulas, and a description of LTLMoP [9], the toolbox that is used to generate the control and perform experiments.

A. Linear Temporal Logic (LTL) specifications

Linear Temporal Logic (LTL) is a modal logic that includes, in addition to Boolean operators (such as 'not', 'and', etc.), temporal operators which allow formulas to capture truth values of atomic propositions (π) as they evolve over time. LTL formulas are constructed from atomic propositions π ∈ AP according to the following recursive rules:

ϕ ::= π | ¬ϕ | ϕ ∨ ϕ | ◯ϕ | ◇ϕ | □ϕ   (1)
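To make the temporal operators concrete, the toy evaluator below interprets 'Next', 'Eventually', and 'Always' over a finite trace of states. This is an illustrative simplification only: LTL proper is defined over infinite executions, and none of the paper's synthesis machinery is reproduced here.

```python
# A trace is a list of sets; trace[t] holds the atomic propositions
# that are true at step t.  Formulas are functions (trace, t) -> bool.

def holds(prop):
    """Atomic proposition: prop is true at the current step."""
    return lambda trace, t: prop in trace[t]

def next_(phi):
    """'Next': phi holds at the following step."""
    return lambda trace, t: t + 1 < len(trace) and phi(trace, t + 1)

def eventually(phi):
    """'Eventually': phi holds at some step >= t."""
    return lambda trace, t: any(phi(trace, k) for k in range(t, len(trace)))

def always(phi):
    """'Always': phi holds at every step >= t."""
    return lambda trace, t: all(phi(trace, k) for k in range(t, len(trace)))
```

For example, always(eventually(holds("ra"))) is the finite-trace analogue of the liveness requirement that region ra be visited again and again.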


where ◯ is 'Next', ◇ is 'Eventually', and □ is 'Always'.

This work considers robot specifications that are defined over a discrete abstraction of the robot motion and action and are captured in a fragment of LTL. Specifically, the atomic propositions comprise a set X = {x1, ..., xm} of environment propositions corresponding to abstract sensor information (e.g. "object detected"), and a set Y = {r1, ..., rn, a1, ..., ak} of robot propositions that correspond to the location of the robot (if ri is true then the robot is currently in region i) and its actions aj (e.g. "flag is raised"). The fragment of LTL considered in this work follows [17, 14], where formulas are of the form ϕ = (ϕe ⇒ ϕs); ϕe is an assumption about the sensor propositions, and thus about the behavior of the environment, and ϕs represents the desired behavior of the robot. ϕe and ϕs have the following structure: ϕe = ϕe_i ∧ ϕe_t ∧ ϕe_g and ϕs = ϕs_i ∧ ϕs_t ∧ ϕs_g, where:

• ϕe_i, ϕs_i are non-temporal Boolean formulas constraining the initial value(s) for the environment and robot respectively. They are of the form ∧i Bi, where Bi is a Boolean formula over the set X ∪ Y.

• ϕe_t, ϕs_t represent safety assumptions on the environment and safety requirements on the robot's behavior. Safety includes all behaviors that the environment or the robot must always satisfy. ϕe_t constrains the next environment state based on the current environment state and current robot state, and ϕs_t constrains the possible next robot state based on the current environment state, current robot state, and next environment state. The formulas are of the form ∧i □Bi, where Bi is a Boolean formula over the set X ∪ Y ∪ ◯X for ϕe_t and X ∪ Y ∪ ◯X ∪ ◯Y for ϕs_t.

• ϕe_g, ϕs_g represent fairness assumptions for the environment and liveness requirements for the robot. These include goals the environment or the robot should always eventually satisfy. The formulas are of the form ∧i □◇Bi, where Bi is a Boolean formula over the set X ∪ Y.

One property of the synthesis algorithm is that the order of the formulas in ϕs_g determines the sequence in which the robot will fulfill its liveness requirements. For a liveness requirement ϕi, the goal number gNum(ϕi) = i indicates that ϕi is in the i-th position in the sequence of goals. For example, in ϕs_g = □◇ra ∧ □◇(rb ∨ flag), the requirement that the robot eventually go to ra has gNum = 0 (the first liveness) and the requirement of going to rb or raising the flag has gNum = 1.

B. Discrete abstraction of the workspace

The robot's workspace is assumed to be a 2-dimensional polygonal environment. The motion of the robot in the
One property of the synthesis algorithm is that the order of the formulas in ϕsg determines the sequence in which the robot will fulfill its liveness requirements. For a liveness requirement ϕi , the goal number gNum(ϕi ) = i indicates that ϕi is in the i-th position in the sequence of goals. For example, in ϕsg = 23ra ∧ 23(rb ∨ f lag), the requirement that the robot eventually go to ra is of gNum = 0 (the first liveness) and the requirement of going to rb or raising the flag is of gNum = 1. B. Discrete abstraction of the workspace The robot’s workspace is assumed to be a 2 dimensional polygonal environment. The motion of the robot in the


workspace is abstracted by a graph where each node represents a region and the edges represent adjacency relations between the regions. Leveraging controllers such as those of [4, 1, 16], given a set of convex polygons, a robot can move between any adjacent regions if there are no obstacles. For example, from region r2 shown in Figure 2 the robot can move to the adjacent regions r0, r1, r3, or stay in r2. The motion constraints are captured by: □(r2 ⇒ ◯(r2 ∨ r0 ∨ r1 ∨ r3)).

C. Control generation

Given a robot task as an LTL formula belonging to the fragment described above, if the task is synthesizable [18], an automaton whose behaviors satisfy the formula will be automatically generated (see [17, 14] for details). The hybrid controller used to continuously control the robot is based on the execution of the automaton. An admissible input sequence is a sequence X1, X2, ..., with Xj ∈ 2^X the set of environment propositions that are true at time step j, that satisfies ϕe. A run of the automaton under an admissible input sequence is a sequence of states q0, q1, ..., which starts at a possible initial state of the automaton: q0 ∈ Q0. At each time step, the robot sensor information is used to determine the truth values of the environment propositions X, and together with the current state q the next state q' is determined following the transition relation δ, i.e., q' = δ(q, X). γ is the state labeling function, where γ(q) = y and y ⊆ Y is the set of robot propositions that are true in state q. Based on the labels of q', the next region and the next actions are determined and the appropriate low-level controllers are executed. The reader is referred to [14] for more details. Every state q ∈ Q has an associated goal number, which indicates the current goal (liveness requirement) the robot is heading toward. This goal number is denoted by γr(q).

D. LTL MissiOn Planner (LTLMoP)

Linear Temporal Logic MissiOn Planning (LTLMoP) [9] is a Python-based, open-source toolkit that allows users to control physical and simulated robots by specifying high-level instructions in structured English. Furthermore, if a specification contains behaviors that cannot be guaranteed (they may be inconsistent or there may be an environment in which the robot fails), no automaton will be synthesized, and LTLMoP facilitates the process of understanding the problem in the given specification [18]. The experiments described in this paper were executed using LTLMoP, as further discussed in Section VI.

III. PROBLEM FORMULATION AND ASSUMPTIONS

This paper focuses on guaranteeing the execution of high-level tasks by a mobile robot operating in an a priori unknown environment. While executing its mission, the robot detects regions that are not defined in its current map. The basic assumptions regarding the unknown regions are as follows. First, the robot starts the execution with a partially known map. The initially known map can be arbitrarily small, on the

order of the size of the robot. This assumption is essential, since the robot must be positioned in a free area first. Second, the robot must have the capability to detect new regions, i.e., it must have adequate sensors that can sense obstacles or the lack thereof. The robot is assumed to have the appropriate sensors and actuators to perform the high-level tasks it is instructed to do. These sensors and actuators are assumed to be binary, as described in Section II.

Definition 1: Robot model. The robot is assumed to be of size D, where size can be the diameter of a circular robot, or the longest dimension in the plane of movement. We assume that the robot's structure and dynamic properties constrain its movement such that it has a minimal operating circular area of radius sD (for example, for turning), where s ≥ 1 is a safety factor which depends on the robot's structure and dynamic properties. The robot's position and orientation relative to a fixed frame are assumed to be known.

Definition 2: Region. A region p with index i, denoted pi, is a polygon in the plane, defined as a circuit of j line segments, sides [5], or edges [6], pi1pi2, pi2pi3, ..., pijpi1, joining j = |pi| points, or vertices, pi1, pi2, pi3, ..., pij. The edges of pi must not cross each other. The polygon is regarded as consisting of its vertices, edges, and the interior area bounded by them. Each region is simply connected, meaning there are no holes within it. The region pi can be non-convex. A region is free from obstacles, and it contains a bounded circle ri whose diameter dri satisfies dri ≥ sD, to comply with Definition 1.

Definition 3: Initial region. The robot starts from the initial region p, which conforms to Definition 2.

Definition 4: Workspace. The workspace is the map the robot uses, in which its operation can be guaranteed, and is composed of a set of regions as defined above. The current workspace is assumed to be known in each step i, and is denoted P^i.
A step is defined as a stage of execution during which the map does not change. If the map changes, as explained later, the step is incremented. Each workspace has a boundary, defined as follows.

Definition 5: Boundary. The boundary B^i of a workspace P^i is defined as a polygon, or a set of polygons, composed of all the edges of the regions of P^i which are not shared between two regions. If the workspace is a simple polygon, its boundary is composed of only one polygon. However, if the workspace contains holes, the boundary is composed of an outer polygon, and another polygon for each hole. Within each step, the workspace is assumed to be static inside its boundary, but dynamic in the sense that a previously occupied boundary edge can become unoccupied and serve as the edge of a newly detected region.

Definition 6: Expansion. The workspace can be dynamically expanded in each step. The robot starts in step 0 with initial workspace P^0, where P^0 contains at least the initial region p, and is possibly composed of a total of g regions, such that P^0 = p1 ∪ p2 ∪ p3 ... ∪ pg. In each step i, a new region p^i is added. Each new added region p^i conforms to


A. Quantifiers

Fig. 2: Left: Initial known map. The robot position is marked with a white circle. Right: Two new, unknown regions, safe1 and danger1, are updated when the robot visits r2, r3.

Definition 2. Moreover, each p_i must be adjacent to at least one known region p_h, h = 1 ... i − 1, or p_h ∈ P^0. The workspace is recursively defined as follows: P^0 = ⋃_{d=1}^{g} p_d, and P^i = P^{i−1} ∪ p_i, ∀i > 0.

Definition 7: New-region sensor. The robot is assumed to have a new-region sensor, which can detect new regions. The new-region sensor is capable of distinguishing, up to its operating range, between free and occupied space¹. According to Definition 2, the dimensions of a new region must have minimal values; therefore, the sensor's operating range must be greater than sD.

Problem 1 (Operating in an unknown environment): Given an initial known workspace P^0, an initial region p ∈ P^0, a mobile robot equipped with a new-region sensor, initial conditions, and task specifications, construct a controller such that the robot's behavior satisfies the user specification in any admissible, possibly unknown environment.

The problem formulation is illustrated in the following example, which demonstrates a simple high-level task defined over an unknown workspace and executed using LTLMoP.

Example 1: Search and Rescue mission. Consider a "Search and Rescue" scenario in which a mobile robot is placed in a building that collapsed due to an earthquake. The robot must explore the inner parts of what is left of the building and search for survivors. The building is partially ruined, such that the original blueprint cannot be used. The robot is placed in a partially clear area, shown in Figure 2 (Left). There are two groups of regions: dangerous and safe. The robot is required to visit all the dangerous regions, r2 and r3, and to search for survivors (people). If the robot finds people, it will guide them back to one of the safe places: r0 or r1.
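The recursive workspace expansion of Definition 6 can be sketched as follows. This is a minimal illustration, not LTLMoP's implementation: the Workspace class, the region names, and the adjacency-list encoding (which stands in for the geometric shared-edge test) are assumptions.

```python
# Minimal sketch of the recursive workspace expansion P^i = P^(i-1) ∪ p_i
# (Definition 6). Regions are kept as a name -> adjacency-list map; the
# adjacency check stands in for the geometric test that a new region
# shares an edge with at least one known region.

class Workspace:
    def __init__(self, initial_regions):
        # P^0: the initially known regions and their adjacencies.
        self.regions = {name: list(adj) for name, adj in initial_regions.items()}
        self.step = 0

    def add_region(self, name, adjacent_to):
        # Each new region must be adjacent to at least one known region.
        known = [r for r in adjacent_to if r in self.regions]
        if not known:
            raise ValueError("new region must touch the known workspace")
        self.regions[name] = known
        for r in known:
            self.regions[r].append(name)
        self.step += 1  # the map changed, so the step is incremented

ws = Workspace({"r0": ["r2"], "r1": ["r2"],
                "r2": ["r0", "r1", "r3"], "r3": ["r2"]})
ws.add_region("danger1", adjacent_to=["r2"])
ws.add_region("safe1", adjacent_to=["r3"])
print(sorted(ws.regions))  # ['danger1', 'r0', 'r1', 'r2', 'r3', 'safe1']
print(ws.step)             # 2
```

The adjacency lists recorded here are exactly what Section V-B later uses to regenerate the motion constraint graph.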
If a new region is detected, it is identified by the sensor as safe or dangerous, and then it is added to the map.

IV. GRAMMAR

The grammar [13, 9] used by LTLMoP has been enriched with quantifiers and with reactions to new regions, which are needed for defining tasks in unknown maps.

¹The new-region sensor should be some sort of range sensor (e.g., infrared, ultrasonic, a laser scanner, or a camera using vision techniques).

In the scenario of Example 1, the same task is assigned over multiple regions. When a new region is detected, whether safe or dangerous, extra specification is needed to define the tasks over the new region. An automatic process for assigning tasks over multiple regions is therefore necessary. To deal with such conditions, quantifiers are introduced in this section. To apply quantifiers over multiple regions, one defines groups as follows:
• Group groupName is region1, region2, region3...
If the groupName is used together with the quantifiers "all" or "any", the sentences are automatically converted into LTL formulas over the regions. The translation process is as follows:

if quantifier 'any':
    result = join region_i[:] with ∨
    substitute groupName with result
if quantifier 'all':
    result_i = substitute groupName with region_i
    join result_i[:] with ∧

Revisiting Example 1, the new specification would be:
• Group Safe is r0, r1
• Group Dangerous is r2, r3
• If you are not activating guide then visit all Dangerous
• If you are activating guide then visit any Safe
which is translated into

⋀_{i∈{2,3}} □◇((¬a_guide) ⇒ r_i) ∧ □◇((a_guide) ⇒ (r_0 ∨ r_1))

These quantifiers facilitate writing specifications for unknown workspaces, as they do not require the explicit enumeration of all regions. When a new region is detected, the user is not required to manually write the additional specification, and the robot is able to automatically re-synthesize the controller and resume the execution. The grammar for re-synthesis is introduced in Section IV-B, and the algorithm is introduced in Section V.

B. Specification for re-synthesis

Now assume that in Example 1, when the robot enters r2, it detects a new dangerous region, danger1, adjacent to r2. Then, when it enters r3, it detects a new safe region, safe1, adjacent to r3. The new workspace is shown in Figure 2 (Right).
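The "any"/"all" translation of Section IV-A, applied once the new regions have been added to their groups, can be sketched in Python. The function name and the string-based LTL encoding (`[]<>` for □◇) are illustrative assumptions, not LTLMoP's actual API.

```python
# Sketch of the quantifier translation described in Section IV-A:
# "any" joins the group's regions with disjunction and substitutes the
# result for groupName; "all" makes one copy of the sentence per region
# and joins the copies with conjunction.

def translate(sentence, group_name, regions, quantifier):
    if quantifier == "any":
        # join region_i with ∨, then substitute groupName with the result
        result = "(" + " | ".join(regions) + ")"
        return sentence.replace(group_name, result)
    elif quantifier == "all":
        # substitute groupName with each region_i, join the copies with ∧
        copies = [sentence.replace(group_name, r) for r in regions]
        return " & ".join(copies)
    raise ValueError("unknown quantifier")

# Example 1 groups after danger1 and safe1 have been detected and added:
dangerous = ["r2", "r3", "danger1"]
safe = ["r0", "r1", "safe1"]
print(translate("[]<>(!a_guide -> Dangerous)", "Dangerous", dangerous, "all"))
# []<>(!a_guide -> r2) & []<>(!a_guide -> r3) & []<>(!a_guide -> danger1)
print(translate("[]<>(a_guide -> Safe)", "Safe", safe, "any"))
# []<>(a_guide -> (r0 | r1 | safe1))
```

Because the translation is purely syntactic, re-running it after a group gains a member is all that is needed to regenerate the affected formulas.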
We define a special robot action, "re-synthesize", which terminates the execution of the hybrid controller and re-generates the controller. "Re-synthesize" is used in the same manner as a robot proposition in the requirements specification:
• If you are sensing regionSensor then do re-synthesize.
The re-synthesize action is activated when regionSensor returns true. The user may also explicitly indicate which group the new region should be added to:
• If regionSensor then do re-synthesize and add into groupName.
In this case the relation regionSensor → groupName is defined. Later, during the re-synthesis process, if regionSensor = True,


the corresponding groupName is returned, to ensure that the correct group is updated with the new region. Furthermore, only specifications applied to this group need to be rewritten as new LTL formulas. If the robot is capable of distinguishing between different features of the new regions, the detected regions can be added into different groups accordingly. One sensor proposition is necessary for each of the groups, as in Example 1:
• If you are sensing safe-new-region then do re-synthesize and add it into Safe.
• If you are sensing dangerous-new-region then do re-synthesize and add it into Dangerous.

V. CONTROLLER RE-SYNTHESIS DURING EXECUTION

This section describes the automated process for re-synthesizing the high-level controller when new areas are found. It addresses the special robot action re-synthesize, the need for capturing the current robot state and goal, the detection of a new region using sensors, the process of modifying the discrete abstraction of the workspace, and the algorithm for creating and synthesizing a modified LTL formula.

A. The robot action "re-synthesize"

Detection of a new region is captured by a Boolean environment proposition, regionSensor. The re-synthesis action is captured by a robot proposition. The two propositions are denoted s_new-region and a_re-synthesize, respectively. The specification "If you are sensing regionSensor then do re-synthesize" is captured by the LTL formula:

□(s_new-region ⇒ a_re-synthesize)   (2)

The "re-synthesize" action is special in that the low-level controller associated with it terminates the execution of the automaton and calls the module that rewrites the LTL formula and re-synthesizes an appropriate automaton. The re-synthesis process is shown in Algorithm 1.

Algorithm 1 Low-level controller for the re-synthesize action
1: Break execution, reset s_new-region, a_re-synthesize
2: CurrRobotState ⇐ γ(q)
3: CurrEnvState ⇐ values of sensor propositions
4: NewInitState: q̂_0 ⇐ CurrRobotState ∧ CurrEnvState
5: Modify discrete abstraction in workspace, get Ŷ (see Section V-B)
6: CurrGoalNum ⇐ γ_r(q)
7: Modify liveness conditions in LTL (see Algorithm 2)
8: Re-synthesize the automaton
9: Load new automaton and resume execution

In line 1, the execution of the current automaton is terminated, and the two propositions are reset: s_new-region = False and a_re-synthesize = False. In lines 2-3, the environment propositions and robot propositions of the current state are recorded, in order to describe the initial state of the robot when resuming the execution. In line 4, the initial condition for the new automaton is obtained, and the LTL formulas ϕ_i^s, ϕ_i^e for the initial conditions are updated. In line 5, the new region proposition is added: Ŷ = Y ∪ {r_new}. The process of modifying the workspace and rewriting the LTL formula ϕ_t^s is discussed in Section V-B. In line 6, the goal number of the liveness requirement toward which the robot is currently moving is recorded. In line 7, re-ordering of the goal requirements is achieved by rewriting the LTL formula ϕ_g^s, as described in Algorithm 2 and Section V-C. In lines 8-9, the updated LTL formula is synthesized and the new automaton is executed.

B. Modifying the discrete abstraction

As defined in Section III, the robot maintains a map of the workspace and expands it on-the-fly. The map maintained by the robot is composed of a list of polygonal regions. At step i, the robot is equipped with a map of the already known area, P^i, as well as its outer boundary, B^i. The new-region sensor repeatedly checks each edge of the current region p_c, the region where the robot is currently located. If the edge also belongs to the boundary, the new-region sensor tests whether there is a new region emanating from it. Defining a new region is possible only if the new region fulfills the following conditions: the length and the width of the new region, and the width of the edge the new region is emanating from, must be greater than sD. Since the new region is adjacent to the known map, there must exist a transition boundary between the new region and the known regions. The motion constraint graph is regenerated from the new adjacency relationships between the regions, which are written into the robot safety assumption ϕ_t^s. Revisiting Example 1, the change in the known workspace, as shown in Fig. 2, results in an updated ϕ_t^s:

□(r_0 ⇒ (r_0 ∨ r_2))
∧ □(r_1 ⇒ (r_1 ∨ r_2))
∧ □(r_2 ⇒ (r_2 ∨ r_0 ∨ r_1 ∨ r_3 ∨ danger_1))
∧ □(r_3 ⇒ (r_3 ∨ r_2 ∨ safe_1))
∧ □(danger_1 ⇒ (danger_1 ∨ r_2))
∧ □(safe_1 ⇒ (safe_1 ∨ r_3))

C. Rearranging the liveness conditions

Given the current goal number, the robot is able to determine which liveness requirement it was pursuing. Therefore, it is able to distinguish between the complete and incomplete goals. The history of the completed high-level tasks is captured by re-assigning the order of the goals. The robot is allowed to first address incomplete liveness requirements and, in addition, to choose the exploration strategy by inserting the new goal at different positions: either first, thereby creating a depth-first strategy, or after the incomplete goals, creating a breadth-first strategy.
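This goal reordering can be sketched as follows; the representation of goals as plain strings and the function name are illustrative assumptions.

```python
# Sketch of the liveness-goal reordering: goals with index < curr_goal_num
# are complete, the rest are incomplete; the new goal is inserted either
# before the incomplete goals (depth-first) or after them (breadth-first),
# and completed goals always move to the end.

def reorder_goals(goals, curr_goal_num, new_goal, strategy):
    complete = goals[:curr_goal_num]
    incomplete = goals[curr_goal_num:]
    if strategy == "depth":
        return [new_goal] + incomplete + complete
    elif strategy == "breadth":
        return incomplete + [new_goal] + complete
    raise ValueError("unknown strategy")

goals = ["visit r2", "visit r3", "guide to safe"]
# the robot already achieved goal 0 and is currently pursuing goal 1
print(reorder_goals(goals, 1, "visit danger1", "depth"))
# ['visit danger1', 'visit r3', 'guide to safe', 'visit r2']
print(reorder_goals(goals, 1, "visit danger1", "breadth"))
# ['visit r3', 'guide to safe', 'visit danger1', 'visit r2']
```

Either ordering preserves every liveness requirement, so correctness is unaffected; only the order in which the synthesized automaton pursues the goals changes.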
The detailed process is shown in Algorithm 2.

Algorithm 2 Rewriting the LTL formula for liveness requirements
1: CompGoals ⇐ ϕ ∈ ϕ_g^s, gNum(ϕ) < CurrGoalNum
2: IncompGoals ⇐ ϕ ∈ ϕ_g^s, gNum(ϕ) ≥ CurrGoalNum
3: groupName ← regionSensor
4: Translate specs with groupName into LTL
5: newLiveness ⇐ liveness with newRegion in new LTL
6: if Depth First Order then
7:   ϕ_g^s ⇐ newLiveness ∧ IncompGoals ∧ CompGoals
8: end if
9: if Breadth First Order then
10:   ϕ_g^s ⇐ IncompGoals ∧ newLiveness ∧ CompGoals
11: end if

In lines 1-2, the liveness requirements are classified as complete or incomplete goals. In line 3, the user's predefined group for the new region is loaded. In lines 4-5, the liveness with the relevant groupName is translated into LTL, as described in Section IV-A. In lines 6-8, the goals are re-ordered: following the depth-first strategy, the new goal is put before the incomplete goals. In lines 9-11, following the breadth-first strategy, the new goal is added after the set of incomplete goals.

Revisiting Example 1, assume that when the new region danger1 is detected, the robot has already visited r1 and r2, so the priority of the original goals has changed. The robot is capable of choosing either a depth-first or a breadth-first order, which is achieved as follows:

Depth First (Alg. 2, lines 6-8):
ϕ_g^s = □◇((¬a_guide) ⇒ danger_1) ∧ □◇((¬a_guide) ⇒ r_3) ∧ □◇((¬a_guide) ⇒ r_2) ∧ □◇((a_guide) ⇒ (r_0 ∨ r_1))

Breadth First (Alg. 2, lines 9-11):
ϕ_g^s = □◇((¬a_guide) ⇒ r_3) ∧ □◇((¬a_guide) ⇒ danger_1) ∧ □◇((¬a_guide) ⇒ r_2) ∧ □◇((a_guide) ⇒ (r_0 ∨ r_1))

VI. EXPERIMENTS

The new algorithms and procedures for detecting and adding unknown regions to the synthesized controller were validated in a real environment with a physical robot. The experiments illustrate how high-level tasks are automatically adjusted while the environment is being expanded.

Example 2: Classroom assistant. The robot is looking for students who need to submit assignments to their professors in the workspace depicted in Fig. 3. The robot wanders through the classrooms until it finds such a student. Once it is handed an assignment, it searches the offices for a professor until it finds one. If a door opens (new classroom or new office), the robot explores that region, classifies it, and continues executing its task accordingly. If the robot detects a hazardous item in front of classroom2, it avoids entering that classroom and raises its flag as a warning. The specifications, given in structured English, are presented in Listing 1.

A. Experimental Setup

The experimental testbed includes a mobile robot which can move autonomously in the environment and detect new regions. Additionally, the robot is able to perform other actions

Listing 1 Specification for the experiments
Environment starts with false
Robot starts with false
Always not (newClassroom and newOffice)
Always not ((newClassroom or newOffice) and hazardous)
Group Classrooms is classroom1
Group Offices is office1
Do flag if and only if you are sensing hazardous
If you are sensing newClassroom then do re-synthesize and add to Classrooms
If you are sensing newOffice then do re-synthesize and add to Offices
If you are sensing newClassroom or you are sensing newOffice then stay
If you are not sensing newClassroom and you are not sensing newOffice then do not re-synthesize
If you are not sensing assignment and you are not activating re-synthesize then visit all Classrooms
If you are sensing assignment and you are not activating re-synthesize then visit all Offices
If you are not sensing hazardous and you are not sensing assignment and you are not activating re-synthesize then visit classroom2
If you are sensing hazardous then always not classroom2

Fig. 3: Left: The initial map is composed of hall1, hall2, classroom1, classroom2, office1. Right: field top view.

such as sensing objects (e.g., cones) and performing actions (e.g., raising a flag). The mobile robot used is based on the Pioneer 3-DX mobile robot platform, upgraded with a compact PC on top, and with sensors and actuators as follows. For sensing new regions, a Hokuyo URG-04LX scanning laser range finder was used. The range finder scans at 10 Hz with a field of view of 240 degrees and an angular step size of 0.36 degrees. Its range is approximately 4 meters. The scanner data was overlaid on an occupancy grid with a resolution of 10 cm. LTLMoP communicated with the robot via wireless communication. The experiments were conducted in an indoor environment. A Vicon motion capture system was used for obtaining accurate pose information for the robot. The Vicon system consists of 24 infrared cameras mounted on a truss attached to the ceiling, allowing for accurate tracking of rigid bodies marked with reflective markers. The new-region sensor detects new regions as explained in Section V-B. The new region must be at least sD wide and sD deep relative to the current-region edge it is emanating from, and it is assumed to be rectangular. The new-region sensor scans the occupancy grid for unoccupied cells, starting from the edges of the current region. It only scans edges which are contained in boundary edges. If the sensor finds an opening


larger than a minimal width, it continues to scan parallel lines on the grid, advancing outward from the current region. The scan continues until a line is not wide enough. During the scan, the start and end points of each line are recorded, along with its depth, such that at the end of the scan, the new region can be defined according to the maximal width, depth, or area, as long as it fulfills the minimal size requirements. The new-region sensor distinguishes between a small new region and a large new region by checking whether either the width or the depth of the new region exceeds a certain threshold. In the reported experiment, the threshold was set to 2sD, with D = 0.46 m and s = 1.5. In the experiment scenario, the small regions are referred to as offices and the large regions as classrooms. In order to maintain the guarantees, the new-region sensor is implemented conservatively.

The sensor and robot propositions used are as follows.

Sensors: newClassroom, newOffice, hazardous, assignment
• newClassroom/newOffice: The two propositions are attached to the same new-region sensor. If the detected new region is larger than the threshold, the sensor returns true in newClassroom. Otherwise, newOffice returns true.
• hazardous: A red cone served as a hazardous item. The cone detector is composed of a software blob detector, which receives the image frames from a video camera mounted on the mobile robot.
• assignment: The object sensor, a cup that can sense the presence or absence of objects within it, was used to sense the presence of an assignment.

Actuators: flag, re-synthesize
• flag: The robot raises a flag if the proposition value is true, and lowers the flag if false.
• re-synthesize: When the proposition is true, it activates the local controller that terminates the execution and starts the re-synthesis process.
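The size-based validity check and classification described above can be sketched as follows. The function name is an illustrative assumption; the constants are the reported ones (D = 0.46 m, s = 1.5, threshold 2sD).

```python
# Sketch of the new-region validity check and size-based classification.
# A candidate region must be at least sD wide and sD deep; it is labeled
# a classroom if its width or depth exceeds the 2sD threshold, and an
# office otherwise.

D = 0.46  # robot size in meters
S = 1.5   # safety factor

def classify_new_region(width, depth, d=D, s=S):
    min_size = s * d          # minimal admissible dimension (sD = 0.69 m)
    threshold = 2 * s * d     # small/large threshold (2sD = 1.38 m)
    if width < min_size or depth < min_size:
        return None           # too small to be a valid region
    if width > threshold or depth > threshold:
        return "classroom"    # large region
    return "office"           # small region

print(classify_new_region(0.5, 0.5))   # too small -> None
print(classify_new_region(0.8, 0.9))   # valid, below 2sD -> office
print(classify_new_region(2.0, 1.0))   # width above 2sD -> classroom
```

Rejecting undersized candidates (returning None) is what makes the sensor conservative: only regions that provably satisfy Definition 2 are added to the map.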

(a) After detecting the classroom new1. The boundary is in yellow; the regions are separated by turquoise lines. (b) LTLMoP map after adding classroom new1.

(c) After adding classroom new2 and offices new3, new4. (d) LTLMoP map after the addition of the office new4.

(e) Student submits assignment to robot in classroom2. (f) Robot submits assignment to professor in office new4.

B. Description of the experiment

The robot starts in classroom1 and heads toward hall1. As soon as it enters hall1, it detects new1 (Fig. 4a, 4b) and classifies it as a large room, thus adding it to the group Classrooms and then re-synthesizing. It visits region new1 and classroom1 according to the specifications, then heads toward classroom2. On its way, the robot passes through hall2, where it detects new2 (large) (Fig. 4c, 4d), adds it to group Classrooms, and re-synthesizes. Immediately afterwards, it detects new3 (small) (Fig. 4c, 4d), adds it to group Offices, and re-synthesizes. Afterward, it continues to classroom new2, where a student hands the robot an assignment (Fig. 4e), and the robot starts searching the offices new3 and office1 for a professor to whom it can submit the assignment. Since the robot does not find the professor in those offices, it continues to search the offices until a door to a new office is opened; the robot then detects new4 (Fig. 4c, 4d), adds it to group Offices, and re-synthesizes. The robot moves to office new4, where it finds the professor and hands him the assignment (Fig. 4f). Since the robot now has no assignments to deliver,

(g) Avoiding classroom2 and raising the warning flag as a response to the hazardous cone. (h) Visiting the office new3 with the assignment in the cup.

Fig. 4: Assignment collecting and handing experiment.

it continues to search for students in the classrooms, starting with classroom2, which it has not visited yet. The robot continues its correct execution according to the specifications, continuously searching for students with assignments and handing the assignments to professors. After some time, a hazardous item appears in front of classroom2 (Fig. 4g). When the robot is in hall2 and heading to classroom2, it detects the hazardous item; consequently, it raises the warning flag (Fig. 4g), skips classroom2, and continues to new2 instead.


After the hazardous item exits the view of the sensor, the robot lowers the flag and continues execution with correct behavior. From the experiment execution [19], it is evident that under the assumptions of Section III, correct behavior is achieved even when adding regions that are unknown a priori. Moreover, the robot correctly classified the new regions according to their sizes and added them to the specifications accordingly.

VII. CONCLUSIONS AND FUTURE WORK

This paper describes an approach to guaranteeing high-level robot behaviors in partially known and continuously updated workspaces through re-synthesis. This includes modifying the discrete abstraction of the workspace on-the-fly, preserving the robot's state and history of task progress, and sequencing the liveness requirements such that the robot exhibits the desired behavior. The proposed methods and algorithms were successfully tested in experiments. Moreover, it was shown that specific features of the newly detected regions can be used to classify them into different groups, enabling automatic redefinition of the high-level task accordingly while maintaining correct execution. The process of changing the controller is automated, and the new controller is guaranteed to be correct with respect to the specifications. Furthermore, if a controller cannot be generated, it means that the task can no longer be achieved in the modified workspace, and LTLMoP can provide explanations for the failure. Ongoing research is focused on improving the new-region detection, such that the under-approximation will be closer to the exact free area. Another research direction aims to deal with changes in known regions, such as the unpredictable appearance of obstacles, which may result in re-decomposition of the regions, or even in an unrealizable specification.
ACKNOWLEDGMENTS

The authors gratefully acknowledge the contribution of Cameron Finucane, Eric Sample, and Nyk Lotocky to the development process and to the implementation.

REFERENCES

[1] C. Belta and L.C.G.J.M. Habets. Constructing decidable hybrid systems with velocity bounds. In IEEE Conference on Decision and Control, Bahamas, 2004.
[2] A. Bhatia, L.E. Kavraki, and M.Y. Vardi. Sampling-based motion planning with temporal goals. In IEEE International Conference on Robotics and Automation (ICRA), pages 2689–2696, 2010.
[3] E.M. Clarke, O. Grumberg, and D.A. Peled. Model Checking. MIT Press, Cambridge, Massachusetts, 1999.
[4] D.C. Conner, A.A. Rizzi, and H. Choset. Composition of local potential functions for global robot control and navigation. In IEEE/RSJ Int'l. Conf. on Intelligent Robots and Systems, pages 3546–3551, Las Vegas, NV, October 2003.
[5] H.S.M. Coxeter. Regular Polytopes. Dover Publications, 1973.

[6] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, second edition, 2000.
[7] A. Elfes. Sonar-based real-world mapping and navigation. IEEE Journal of Robotics and Automation, 3(3):249–265, June 1987.
[8] G.E. Fainekos, A. Girard, H. Kress-Gazit, and G.J. Pappas. Temporal logic motion planning for dynamic robots. Automatica, 45(2):343–352, 2009.
[9] C. Finucane, G. Jing, and H. Kress-Gazit. LTLMoP: Experimenting with language, temporal logic and robot control. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1988–1993, Taipei, Taiwan, 2010.
[10] B. Johnson and H. Kress-Gazit. Probabilistic analysis of correctness of high-level robot behavior with sensor error. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2011.
[11] S. Karaman and E. Frazzoli. Sampling-based motion planning with deterministic μ-calculus specifications. In IEEE Conference on Decision and Control, pages 2222–2229, 2009.
[12] M. Kloetzer and C. Belta. A fully automated framework for control of linear systems from LTL specifications. In Hybrid Systems: Computation and Control, volume 3927, pages 333–347, 2006.
[13] H. Kress-Gazit, G.E. Fainekos, and G.J. Pappas. Translating structured English to robot controllers. Advanced Robotics, 22(12):1343–1359, 2008.
[14] H. Kress-Gazit, G.E. Fainekos, and G.J. Pappas. Temporal-logic-based reactive mission and motion planning. IEEE Transactions on Robotics, 25(6):1370–1381, 2009.
[15] M. Lahijanian, J. Wasniewski, S.B. Andersson, and C. Belta. Motion planning and control from temporal logic specifications with probabilistic satisfaction guarantees. In IEEE International Conference on Robotics and Automation (ICRA), pages 3227–3232, May 2010.
[16] S.R. Lindemann and S.M. LaValle. Smooth feedback for car-like vehicles in polygonal environments. In IEEE Conference on Robotics and Automation, 2007.
[17] N. Piterman, A. Pnueli, and Y. Sa'ar. Synthesis of Reactive(1) designs. In VMCAI, pages 364–380, Charleston, SC, January 2006.
[18] V. Raman and H. Kress-Gazit. Analyzing unsynthesizable specifications for high-level robot behavior using LTLMoP. In CAV, pages 663–668, July 2011.
[19] S. Sarid, B. Xu, and H. Kress-Gazit. Guaranteed high-level robot behavior while exploring unknown workspace, 2012. URL http://www.youtube.com/watch?v=oiF4Wt8xgio. Video (YouTube).
[20] T. Wongpiromsarn, U. Topcu, and R.M. Murray. Receding horizon temporal logic planning. In IEEE Conference on Decision and Control (CDC), pages 5997–6004, December 2009.


Development of a Testbed for Robotic Neuromuscular Controllers Alexander Schepelmann, Michael D. Taylor, Hartmut Geyer The Robotics Institute Carnegie Mellon University Pittsburgh, Pennsylvania 15213 Email: [email protected], [email protected], [email protected]

Abstract—Current control approaches to robotic legged locomotion rely on centralized planning and tracking or motion pattern matching. Central control is not available to robotic assistive devices that integrate with humans, and matching predefined patterns severely limits user dexterity. By contrast, biological systems show substantial legged dexterity even when their central nervous system is severed from their spinal cord, indicating that neuromuscular feedback controls can be harnessed to encode stability, adaptability, and maneuverability into legged systems. Here we present the initial steps to develop a robotic gait testbed that can implement and verify neuromuscular controls for robotic assistive devices. The initial stage consists of an antagonistically actuated two segment leg with a floating compliant joint. We detail its electromechanical design and low level, velocity-based torque control. Additionally, we present experiments that test the leg’s performance during human-like high fidelity motions. The results show that the robot can track fast motions corresponding to 87% of the maximum performance limit of human muscle. The experiments also reveal limitations of our current implementation and we discuss solutions to overcoming them.

I. INTRODUCTION

Current approaches to leg control in locomotion either use centralized planning and tracking or mimic predefined joint motion patterns extracted from normal human gait. The first approach is used in humanoids including Honda's ASIMO and AIST's HRP-4 [11][18][27], but cannot be applied to robotic assistance wherein the central human user's state is unknown. As a result, the second approach prevails in rehabilitation robotics [17][2][12][13]. For instance, exoskeletons developed for paralyzed patients enforce a limited set of pre-defined motion patterns of normal human gait [21][1], which severely constrains their functional dexterity. In part, this problem can be overcome by combining motion libraries and pattern recognition, as Sup et al. [30] demonstrated with a powered legged prosthesis for speed and slope adaptation. However, control strategies that generate the stability, maneuverability, and adaptability needed for truly high mobility have not been identified with this approach. We seek to develop an alternative control approach to powered legged systems that builds on decentralized neuromuscular control strategies of human locomotion. Animal and human legs possess remarkable autonomy in behavior and control. For example, decerebrate cats and rats have no brain control over their legs, yet they seamlessly adapt to different locomotion speeds on a treadmill and autonomously transition

between gaits [28][4][20]. Similar neuro-scientific experiments reveal that, in biological systems, dexterous performance of segmented legs is realized to a large extent by local feedback controls that bypass central processing, and by biomechanical designs that hardcode functional leg responses [5][24][14][7]. Recently, neuromuscular models of human locomotion were developed, which are controlled by autonomous local feedbacks without central planning, yet adapt to their environment and show substantial robustness of locomotion [8][29]. Eilenberg et al. [6] have implemented part of this feedback control in a powered ankle-foot prosthesis, resulting in a system that adapts to the environment without requiring explicit terrain sensing. To generalize this approach to segmented powered legs, we here present our initial steps of developing a robotic gait testbed that can implement and test neuromuscular controllers for robotic assistive devices. The testbed currently consists of a half-human sized, two segment leg with two antagonistic actuators and a compliant floating joint. We detail the electromechanical design and control of this robotic neuromuscular leg (RNL) (sect. II and III), and present and discuss experimental results that test the performance of the leg during human-like, high-fidelity motions (sect. IV and V).

II. ELECTROMECHANICAL DESIGN

RNL is a half-human sized, two segmented, antagonistically actuated robotic leg with joint compliance (Fig. 1). The electromechanical design of RNL is driven by three themes: dynamic similarity, antagonistic actuation, and leg compliance.

A. Dynamic Similarity

We aim to build a testbed that matches human leg performance to develop neuromuscular controllers for powered segmented legs. For cost and safety considerations we build a robotic leg that is half the size and a quarter of the weight of a human leg. To ensure that the dynamic behavior of our robot matches human legs, we use dynamic scaling.
Dynamic scaling uses fundamental physical variables to define relationships between a system’s quantities at different scales. This approach was formalized by Buckingham [3] and is often applied in aerospace and fluid engineering applications. In mechanical systems, fundamental units are mass, length, and time. The robot’s mass and length targets define these scaling factors



Quantity                Human    RNL     Scaling Factor
Leg Length (m)          1        0.5     1/2
Thigh Length (m)        0.46     0.23    1/2
Shank Length (m)        0.54     0.27    1/2
Knee Radius (m)         0.06     0.03    1/2
Total Mass (kg)         80       20      1/4
Thigh Mass (kg)         8        2       1/4
Shank Mass (kg)         3.7      0.9     1/4
Foot Mass (kg)          1.2      0.3     1/4
Vas. Max Force (N)      6000     1500    1/4
Vas. Max Vel (m/s)      0.96     0.68    √2/2
Ham. Max Force (N)      3000     750     1/4
Ham. Max Vel (m/s)      1.2      0.84    √2/2
Max Joint Torque (Nm)   368      45      1/8
Max Joint Vel (rpm)     153      217     √2

TABLE I: Mechanical performance goals. Human mechanical properties dynamically scaled to RNL dimensions. Human data taken from Winter [32].

as m_robot/m_human = 1/4 and l_robot/l_human = 1/2, respectively. Since the robot is exposed to the same gravitational field as humans, the relationship g_robot = g_human must hold, resulting in a time scale of t_robot/t_human = √(1/2). Given these fundamental unit scales, force, torque, and velocity scale as F_robot/F_human = 1/4, τ_robot/τ_human = 1/8, and v_robot/v_human = √2/2. The performance envelope of human leg actuation is defined by the maximum contraction velocity and maximum isometric force that muscles can develop. We dynamically scale these two parameters to identify the equivalent actuator no-load speed and torque requirements of RNL. The largest human leg muscle is the vastus, a knee extensor located in the front of the thigh (Fig. 1a). The dominant muscle used in knee flexion is the hamstring, whose performance requirements are lower than those of the vastus (Tab. I).
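The derived scale factors follow from the three fundamental scales by dimensional analysis; a minimal sketch (variable names are illustrative):

```python
# Dynamic scaling sketch: the mass, length, and gravity scales are chosen,
# and the time, velocity, force, and torque scales follow from dimensional
# analysis (t ~ sqrt(l/g), v = l/t, F = m*l/t^2, tau = F*l).

import math

mass_scale = 1 / 4     # m_robot / m_human
length_scale = 1 / 2   # l_robot / l_human
gravity_scale = 1.0    # same gravitational field

time_scale = math.sqrt(length_scale / gravity_scale)      # sqrt(1/2)
velocity_scale = length_scale / time_scale                # sqrt(2)/2
force_scale = mass_scale * length_scale / time_scale**2   # 1/4
torque_scale = force_scale * length_scale                 # 1/8

# Scaled vastus requirements from Table I (human: 6000 N, 0.96 m/s)
print(round(6000 * force_scale))          # 1500
print(round(0.96 * velocity_scale, 2))    # 0.68
print(round(torque_scale, 3))             # 0.125
```

The printed values reproduce the RNL column of Table I, confirming that the tabulated targets are consistent with the three fundamental scales.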

B. Antagonistic Actuation

RNL uses antagonistic actuators, realized by cable-driven series elastic actuators (SEAs), which simultaneously meet several design requirements: the ability to actively command zero joint torque to allow passive dynamics, satisfying human mass distributions, and actuation across floating joints. SEAs were originally developed by Pratt and Williamson [25] and are common in bipedal robots such as Spring Flamingo, developed by Pratt and Pratt [26], and M2, developed by Paluska [22]. SEAs are characterized by a compliant element between motor and load. This element decouples load and motor inertia, which enables precise torque control with highly geared motors (at the expense of system bandwidth), including zero torque. SEAs have been combined with cable drives in legged systems [10]. Using nondirect elements in the drivetrain, such as cable, belt, and chain drives, allows actuators to be located away from the joint, which is ideal for realizing human-like segment mass distributions. Unlike belt and chain drives, cable drives can also go slack like human muscles, which allows them to act across compliant segments. The drivetrain of RNL's SEAs is shown in Fig. 2b. With a later technology transfer to prosthetic and orthotic devices in mind, we limit ourselves to electric DC motors, focusing on the Maxon RE line. Several motor configurations match our design targets (Tab. II). The optimal combination of low weight, low rotor inertia, torque, and speed is reached by four mechanically coupled RE 30 motors. However, we instead opt for two mechanically coupled RE 40 motors to limit actuator complexity, at the expense of increased rotor inertia. For compactness, we incorporate a custom three-stage

Fig. 2: RNL SEAs. (a) Drivetrain schematic (b) Drivetrain assembly (c) Prototype
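The SEA torque-sensing principle can be sketched as follows (an assumed implementation, not the authors' code; the function names and sign convention are illustrative, while the spring stiffness, 9-bit encoder resolution, and 3:1 output stage are taken from the text):

```python
# Two absolute encoders on either side of the series spring measure its
# deflection; spring torque is stiffness times deflection, reflected to
# the joint through the external cable-drive stage.
import math

K_SPRING = 1.75      # Nm/rad, off-the-shelf spring coupler (from text)
ENCODER_BITS = 9     # RM22B absolute encoders (from text)
OUTPUT_RATIO = 3.0   # cable-drive stage between SEA output and joint

def spring_torque(theta_motor_side, theta_load_side):
    """Torque carried by the series spring, in Nm."""
    return K_SPRING * (theta_motor_side - theta_load_side)

def joint_torque(theta_motor_side, theta_load_side):
    """Torque reflected to the knee through the 3:1 output stage."""
    return OUTPUT_RATIO * spring_torque(theta_motor_side, theta_load_side)

# One encoder count corresponds to this much spring deflection ...
count = 2 * math.pi / 2**ENCODER_BITS   # ~0.0123 rad
# ... i.e. roughly 0.02 Nm at the spring per count, before the output stage.
```

The quantization of the two encoders is what ultimately bounds the torque resolution of the SEAs, a limit the co-contraction experiments later quantify.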


Parameter              4x RE 30   2x RE 40
Gear Ratio             40         36
Total Weight (g)       952        960
Rotor Inertia (kgm²)   0.0221     0.03
Nominal Torque (Nm)    14.1       13.2
Stall Torque (Nm)      163        180
No Load Speed (rpm)    212        211
Nominal Speed (rpm)    194        194

TABLE II: SEA motor configurations. The optimal configuration is comprised of four RE 30s. Two RE 40s meet the same performance criteria with lower mechanical complexity, at the expense of increased rotor inertia.

drivetrain into our actuation system (Fig. 2a). The first stage mechanically couples the motors with a 4:1 reduction; the second stage orients the output shaft's axis of rotation with a 3:1 reduction. We locate a rotary spring coupling between these stages. Currently, we use an off-the-shelf spring coupler (Stock Drive Products) with a stiffness of 1.75 Nm/rad and a maximum torque rating of 1.4 Nm. An additional 3:1 reduction is located between the SEA output shaft and joint, connected via cable drive, for a total 36:1 gear reduction in our drivetrain. The gear ratio of the external stage can be modified, which allows us to use the same SEA for muscles with different properties. We realize force measurements via two absolute rotary encoders (RM22B: 9-bit, magnetic, analogue encoders; Renishaw PLC) located on either side of the spring. Additionally, an incremental encoder on the shaft of one RE 40 measures motor velocity (RM22I: 9-bit, magnetic, digital encoder; Renishaw PLC). One motor controller (Solo Whistle, Elmo MC) supplies the same current to both RE 40s. Currently RNL uses two SEAs. One is located in the robot's thigh, simulating the vastus muscle; the other is located on the mounting frame, simulating the biarticular hamstring muscle (Fig. 1c).

C. Leg Compliance

Humans are not rigidly coupled kinematic chains, possessing interjoint cartilage and soft tissue around bones. To capture this aspect in RNL, we incorporate a floating joint design into the robot (Fig. 3). The floating joint design connects the robot's thigh and shank. The joint is composed of two rapid-prototyped clamping plates (VeroWhitePlus, Objet Ltd.), which hold the compliant element. The compliant element is made from a two-part PMC-744 urethane rubber mix (Smooth-On Inc.), which is cast into the clamping plates. A compliance retainer, constructed from a brass spur gear, couples the joint's shaft and compliant element, and defines the floating center of rotation of the knee joint's axis. The retainer's set screw rigidly couples the thigh and joint shaft. This design restricts rotational motion to only occur in the robot's shank. An RM22B encoder head is mounted on the joint shaft. The corresponding encoder body is located on the shank. Relative motion between the encoder head and body measures knee position during leg movement.

Fig. 3: Floating compliant knee joint. (a) Concept (b) Prototype (c) Close-up of implementation

III. VELOCITY-BASED SEA CONTROL

Our actuators generate desired torques using a velocity-based SEA control scheme [31]. Pratt and Williamson [25] originally approached SEA control from a torque-based perspective. Recently, an alternative SEA controller was proposed by Wyeth [33], which modulates load torque, τl, using motor velocity, ωm, as a control target instead of motor torque,

    ωm = τl (1/(Jl s) + s/ks),    (1)

where Jl is the load inertia and ks is the stiffness of the compliant element. This approach is advantageous because losses due to gear dynamics between the motor and spring do not need to be considered. Since motor velocity corresponds exactly to velocity at the drivetrain output, the velocity loop automatically compensates for losses without additional tuning of the outer control loop. In addition, the controller's inverse dynamics terms (Fig. 4) only require the first derivative of motor position, which leads to increased system bandwidth, since low-pass filters with higher cutoff frequencies can be used.

Wyeth [33]'s formulation of velocity-based SEA control (eq. 1) requires the load inertia to be known. For legged systems, it is not clear what the load inertia is, as load dynamics constantly change with joint position and gait phase. However, knowledge of the load inertia is not necessary in the formulation of a velocity-based SEA controller. Starting with Pratt and Williamson [25]'s derivation, the torque exerted by a spring due to angular deflection is given as

    τl = −ks (θl − θm),    (2)

where τl is the torque applied to the load, ks is the spring stiffness, and (θl − θm) defines the deflection of the spring between the load and motor side. To formulate torque control as velocity-based control, we write motor position as a function of motor velocity, θm = ωm/s, and resolve eq. 2 to

    ωm = (τl/ks) s + θl s.    (3)

Here Jl does not need to be known. Fig. 4 shows a schematic implementation of this velocity control, in which eq. 3 has been implemented as feedforward compensation, P(s) is PD feedback compensation for model uncertainty, and C(s) represents the motor controller.
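A minimal discrete-time sketch of this velocity command is given below (illustrative, not the authors' implementation; the gains, sample time, and the proportional torque-error term standing in for P(s) are assumptions):

```python
# Velocity-based SEA control per eq. (3): the motor velocity command is
# the load velocity plus the rate of change of the desired spring
# deflection, with a proportional term on the torque error standing in
# for the feedback compensation P(s) of Fig. 4.
K_S = 1.75   # spring stiffness, Nm/rad (from text)
DT = 0.001   # control period, s (assumed)
KP = 5.0     # illustrative feedback gain

class VelocitySEAController:
    def __init__(self):
        self.tau_des_prev = 0.0
        self.theta_load_prev = 0.0

    def command(self, tau_des, tau_meas, theta_load):
        """Return the motor velocity command omega_m (rad/s)."""
        # Feedforward terms of eq. (3): (tau_l / k_s) s + theta_l s,
        # realized here as finite differences.
        tau_rate = (tau_des - self.tau_des_prev) / DT
        load_vel = (theta_load - self.theta_load_prev) / DT
        self.tau_des_prev = tau_des
        self.theta_load_prev = theta_load
        # Feedback on the torque error (stands in for P(s))
        return tau_rate / K_S + load_vel + KP * (tau_des - tau_meas)
```

Note the design point argued in the text: unlike eq. (1), no load-inertia term appears anywhere in the command.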























Fig. 4: Velocity-based SEA control system model.

Fig. 5: High fidelity motion experiments. RNL knee position, velocity, and antagonistic actuator torques for walking trajectories corresponding to (a) 1.0x, (b,c) 1.6x, and (d) 1.8x nominal walking speed trajectories. In experiments, motion occurred between 4s and 8s. Plots show mean±s.d. for 10 repetitions. Dark grey lines: commanded trajectories. Black lines: measured trajectories. Light grey lines: ±s.d. of measured trajectories.

IV. EXPERIMENTS

The electromechanical design of RNL includes custom-built antagonistic actuators, an alternative velocity-based SEA controller, and a floating compliant knee joint. RNL is intended as a platform to develop neuromuscular controllers that embed human-like performance. To verify whether our design and control


Fig. 6: RNL knee position and antagonistic actuator torque during co-contraction experiments for (a) 0% (b) 5% (c) 10% (d) 15%. Dark grey lines: commanded trajectory. Black lines: measured trajectory. Light grey lines: External impulse disturbances to instigate shank oscillation.
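The knee tracking law used in the motion experiments (Sec. IV-A) sums feedforward, gravity-compensation, and PD terms; a minimal sketch follows, assuming a pendulum-style gravity model m g lcom sin(φ) and an illustrative flexion/extension split (parameters Js, lcom, gains, and the 0.5 Nm pretension are from the text):

```python
# Sketch of the knee tracking controller: net torque = inertial
# feedforward + gravity compensation + PD feedback; the net torque is
# then split between the antagonistic SEAs with a cable pretension.
import math

JS = 0.005       # shank inertia about the knee, kg m^2
M_SHANK = 1.1    # kg, shank with added foot weight
L_COM = 0.107    # m, shank center of mass below the knee
G = 9.81
KP, KD = 15.0, 0.15          # kd = 0.01 * kp
TAU_MIN = 0.5                # Nm, minimum torque to avoid cable slack

def net_knee_torque(phi, phi_dot, phi_ref, phi_dot_ref, phi_ddot_ref):
    feedforward = JS * phi_ddot_ref
    gravity = M_SHANK * G * L_COM * math.sin(phi)  # assumed comp. form
    feedback = KP * (phi_ref - phi) + KD * (phi_dot_ref - phi_dot)
    return feedforward + gravity + feedback

def split_antagonistic(tau_net):
    """Distribute the net torque to extensor/flexor SEAs with pretension."""
    tau_ext = max(tau_net, 0.0) + TAU_MIN
    tau_flex = max(-tau_net, 0.0) + TAU_MIN
    return tau_ext, tau_flex
```

Each of the two resulting torques would then be passed as the desired torque to the velocity-based SEA controller of Sec. III.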

implementations meet this performance, we devise two experiments. The first experiment tests whether the robot can generate the fast, high fidelity motions that characterize human locomotion. The second experiment targets the ability to execute the antagonistic co-contraction often seen in neuromuscular control. Since RNL currently only has actuators simulating the hamstring and vastus, we focus on the robot's knee for this evaluation. To simulate the inertial effects of a foot segment, we attach a weight to the bottom of RNL's shank, increasing its total mass to 1.1 kg. The resulting mechanical properties of the shank segment are lcom = 0.107 m and Js = 0.005 kgm², where lcom is the distance of the shank's center of mass from the knee pivot and Js is the inertia.

A. High Fidelity Motion Experiments

We evaluate RNL's motion fidelity using dynamically scaled human joint trajectories. We first numerically differentiate the knee angular position data, φk^ref, tabulated in Winter [32], to generate reference trajectories for the joint velocity, φ̇k^ref, and acceleration, φ̈k^ref, observed in human walking. The trajectories are median filtered (filter order = 10) to eliminate artifacts resulting from the differentiation. Next, we use dimensional scaling to adapt the trajectories to RNL's scale. The resulting joint trajectories correspond to normal human walking speed (1.0x nominal trajectory). We scale these trajectories to range between 1.0x and 2.0x the speed of the nominal trajectory, the latter of which reaches the maximum knee joint velocity (Tab. I). With these references, we implement a tracking control that outputs desired actuator torques. The controller includes feedforward torque trajectories, Js φ̈k^ref; gravity compensation; and PD feedback compensation, kp(φk^ref − φk) + kd(φ̇k^ref − φ̇k), where kp = 15 and kd = 0.01 kp are the position and velocity feedback gains. The gains are kept constant throughout all experiments. φk and φ̇k are measured by the absolute encoder on RNL's knee joint. The four components are summed to generate the net joint torque. The resulting actuator flexion and extension torques are commanded to the corresponding SEA via the velocity-based control scheme described in section III. To avoid cable slack, each SEA applies a minimum 0.5 Nm torque to the joint at all times.

Fig. 5 summarizes the results of the motion experiments. Tabs. III and IV list the mean cross-correlation coefficients and signal time delays over the trials for all traces. We observe that the executed joint position and velocity trajectories closely follow the desired trajectories for trials up to 1.6x, with position and velocity correlation coefficients greater than 0.90 and 0.80, respectively. The largest differences between desired and commanded position occur during periods of knee extension at 1.6x, with a maximum error of 7.3°. Velocity shows a similar tracking quality. Top speeds reached 160 rpm in 1.6x trials. In addition, we observe a high degree of repeatability throughout all trials, with a maximum standard deviation of 7.4° and 44 rpm from the mean values over all trials.¹

¹ The first leg swing was not included in this calculation, as impulse accelerations from rest are not representative of system dynamics.


The results indicate that human-like segment motions can be reliably and accurately replicated for speeds up to 160 rpm. In higher speed trials, the quality of position and velocity tracking declines (Tabs. III and IV). Nevertheless, velocity targets were still reached, with top speeds of at least 190 rpm (Fig. 5d), or about 87% of the desired performance goals outlined in Tab. I. We could not test higher speeds, because the compliance retainer in the knee joint failed during the high speed trials.

       Rθk         Rθ̇k         Rτvas       Rτham
1.0x   0.97±0.01   0.87±0.04   0.71±0.06   0.90±0.02
1.2x   0.97±0.01   0.88±0.02   0.67±0.04   0.90±0.02
1.4x   0.94±0.02   0.82±0.04   0.80±0.02   0.82±0.05
1.6x   0.95±0.02   0.85±0.02   0.80±0.01   0.80±0.03
1.8x   0.80±0.06   0.64±0.08   0.77±0.04   0.62±0.04

TABLE III: Mean correlation coefficients (R) for walking trials. n = 10 for all speeds.

       tθk         tθ̇k         tτvas       tτham
1.0x   9.5±1.5     18.4±2.6    13.1±1.7    4.8±0.9
1.2x   7.2±0.9     11±0.7      10.3±1.2    5.1±0.7
1.4x   15.9±1.4    19.5±1.8    9.1±1.0     9.0±1.6
1.6x   12.3±2.4    15.2±2.3    10.7±0.8    9.3±1.7
1.8x   25.0±3.9    25.2±4.4    9.5±2.1     14.9±0.9

TABLE IV: Mean time delays (t) in ms for walking trials. n = 10 for all speeds.

A (%)   kmuscle (N/m)   τPL (Nm)   kknee (Nm/rad)
0       0               0          0
5       7310            1.22       6.59
10      14640           2.44       13.17
15      21950           3.66       19.79

TABLE V: Equivalent stiffnesses and SEA pre-load torques at different muscle activation levels. Values calculated at lCE = 1/2 lopt.

                   0%          5%          10%
T Simulation (s)   0.78        0.30        0.22
T RNL (s)          0.75±0.04   0.25±0.01   0.25±0.02

TABLE VI: Period of oscillation at different levels of co-contraction.

B. Co-Contraction Experiments

Antagonistic muscle co-contraction is characterized by equal pre-load torques of the muscles, τPL, and an equivalent rotational joint stiffness, kknee. The corresponding desired SEA torques are given by

    τvas/ham = τPL ± kknee (φk − φ0),    (4)

where φ0 is the joint reference position. To estimate appropriate values for τPL and kknee, we use a Hill-type muscle model as described in Geyer et al. [9]. In these models, the muscle force, F, is given by

    F = A fl fv Fmax^iso,    (5)

where A is the percentage of muscle activation, Fmax^iso is the muscle's maximum isometric force, and fl and fv are the force-length and force-velocity relationships. Combined with the knee joint radius, the muscle force defines τPL. For calculating the local stiffness, we neglect modulations by fv and model fl as

    fl = exp( ln(0.05) |(lCE − lopt)/(0.056 lopt)|³ ),    (6)

where lCE is the length of the muscle's contractile element and lopt is its optimal length. The resulting muscle stiffness is

    kmuscle = A (dfl/dlCE) Fmax^iso.    (7)
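Eqs. (4)–(7) can be sketched as follows (an illustrative sketch, not the authors' code; the width parameter w = 0.056 from the text is treated here as an absolute width in meters, which is an assumption, and the tabulated values of Tab. V are not reproduced, only the functional forms):

```python
# Hill-type relations with fv neglected, as in the text: the bell-shaped
# force-length curve (eq. 6), muscle force (eq. 5), local stiffness via
# the analytic derivative (eq. 7), and the antagonistic split of eq. (4).
import math

L_OPT = 0.04        # m, scaled vastus optimal fiber length (from text)
W = 0.056           # force-length width, assumed absolute (m)
F_MAX_ISO = 1500.0  # N, scaled vastus maximum isometric force (Tab. I)

def fl(l_ce):
    """Bell-shaped force-length relationship, eq. (6)."""
    return math.exp(math.log(0.05) * abs((l_ce - L_OPT) / W) ** 3)

def muscle_force(activation, l_ce):
    """Eq. (5) with fv = 1 (isometric assumption)."""
    return activation * fl(l_ce) * F_MAX_ISO

def muscle_stiffness(activation, l_ce):
    """Eq. (7): k = A * dfl/dlCE * F_max^iso, analytic derivative."""
    d = (l_ce - L_OPT) / W
    dfl = fl(l_ce) * math.log(0.05) * 3.0 * d * abs(d) / W
    return activation * dfl * F_MAX_ISO

def co_contraction_torques(tau_pl, k_knee, phi, phi0=0.0):
    """Eq. (4): equal pre-load plus/minus the joint stiffness term."""
    return tau_pl + k_knee * (phi - phi0), tau_pl - k_knee * (phi - phi0)
```

On the ascending limb (lCE < lopt) the stiffness is positive, which is why the operating point below the optimal length yields the stabilizing joint stiffnesses of Tab. V.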

Using scaled values of the vastus complex (lopt = 0.04, w = 0.056), we calculate kmuscle, τPL, and kknee for four levels of A, ranging from 0% to 15% muscle activation (Tab. V). If RNL can successfully execute co-contraction, the motion resulting from the torques described by eq. 4 matches the motion of a driven physical pendulum whose center of mass and inertia properties correspond to those of the shank. We test the quality of co-contraction control by deflecting the shank from its rest position φ0 = 0 and comparing the resulting behavior to that of a simulated, equivalent driven physical pendulum.

Fig. 6 shows the behavior of RNL during the co-contraction experiments. Fig. 6a shows the observed motion and torques for 0% co-contraction, which evaluates the quality of RNL's zero torque control. The shank follows the motion of a damped pendulum with a period T that matches the simulated period (Tab. VI). For 0% co-contraction, the desired τvas/ham is zero. The measured torques track the commanded torque within the torque resolution limits of the SEAs (±0.1 Nm joint torque resolution for an SEA spring stiffness of 1.75 Nm/rad and 9-bit encoders). Higher levels of co-contraction are possible (Fig. 6b-d), but the commanded torques are not well tracked, as large oscillations result from the antagonistic actuators counteracting each other with large pre-load torques. Because the oscillations exceed the maximum torque rating of our spring at approximately 15% co-contraction, we cannot command higher levels of co-contraction. This shortcoming of our current implementation of antagonistic control is visible in the position trajectories as well.

V. DISCUSSION & CONCLUSION

Our goal is to develop neuromuscular controllers for powered segmented legs. To realize this goal, we presented the


initial design and validation stage of a robotic test platform targeting human-like performance. The platform currently consists of RNL, a half-human-sized, two-segment robotic leg with a floating compliant knee joint (Fig. 1d). The electromechanical design of the leg meets the physical weight and size properties and actuation performance defined by human physiology. To meet these criteria, we developed a modular, compact SEA (Fig. 2) that matches the performance of the vastii, the largest muscle group in human legs. In addition, we formulated an SEA control that takes advantage of velocity-based torque control without having to know the load inertia (Fig. 4). In two experiments, we tested whether our design and control implementation can deliver the fast motions that characterize human locomotion and can generate the antagonistic co-contraction seen in neuromuscular systems. The experimental results show that we can reliably generate human-like leg motions with high positional accuracy for joint speeds up to 160 rpm, and with lesser positional accuracy for joint speeds up to 190 rpm (Fig. 5), approximately 90% of the maximum designed-for joint velocity (Tab. I). The failure of the knee joint's clamping plates during the high speed experiments indicates that the prototyped plates will need to be replaced by versions made from stronger material to achieve these maximum speeds. On the other hand, the compliant part of the floating joint did not show any wear throughout the experiments, suggesting that its design works very well. The co-contraction experiments revealed that the antagonistic actuators can command zero torque within the torque resolution limits of the SEAs, enabling near-passive motions at the joint level (Fig. 6a). Higher levels of co-contraction, up to 15% of human muscle activations, were possible, but produced oscillations in the SEA torque patterns which limited the performance of co-contraction control.
The experiments revealed two shortcomings of our current implementation, which we are working on overcoming. First, the actuators’ bandwidth limitation creates about 25ms of time delay during high speed position and velocity tracking. Scaled to human dimensions, this delay would be about 35ms, which is similar to the feedback delays observed in humans. However, the SEA delay will increase substantially with larger actuation levels necessary for stance motions. To diminish the delay, we are looking to replace the linear rotary spring in the actuators with a nonlinear stiffening spring. Stiffening springs in SEAs enable high precision zero torque control with high bandwidth responses at large commanded torques. The advantages of nonlinear springs to series elastic actuation are widely recognized (Pratt and Williamson [25], Hurst et al. [15], Migliore et al. [19]), although a systematic method for developing compact springs with custom force deflection profiles has not been proposed. In addition, a custom nonlinear spring coupler will also allow us to overcome the current torque limit of 12.6N m at the output that is defined by the maximum torque capacity of the off-the-shelf springs in the SEAs. The second shortcoming is the 15% limit on co-contraction control. The impact of this limit on implementing neuromus-

cular control strategies is unclear. While humans can reach up to 30% of antagonistic co-contraction at the leg joints [23], these levels are observed during the stance phase in which the legs are loaded by body weight, which is about 17 times larger than the weight of the shank. In the experiments, by contrast, the leg was unloaded and resistance to motion was entirely due to the mass properties of the shank. In particular, already at 5% co-contraction the peak torques of the actuators reached 3.2N m, which corresponds to the torque created by one body weight. Hurst et al. [16] suggest that damping elements parallel to the SEA spring may attenuate oscillations when load inertia is low. We plan to test if adding such elements into the SEA drivetrain will improve the quality of co-contraction control. The presented robot validates our initial design and control implementation, but does not represent a full neuromuscular leg. We are currently expanding RNL into a multi-degree of freedom leg with hip, knee, and ankle joints as well as an adaptive, compliant foot. This robot will incorporate 7 actuators based on our presented SEA design and control implementation. These actuators represent the major monoand bi-articular muscles that propel human legs during walking and running. The actuators will be controlled by reflexive neuromuscular models [8] [29], which will generate desired torques for each SEA unit (τd in Fig. 4). With this robot, the immediate goal will be to realize reflexive leg controls during swing, crucial for autonomous balance recovery in amputee locomotion. Ultimately, we expect this work to result in a generalized gait testbed to develop and test neuromuscular controllers for multi-articulated powered robotic limbs for rehabilitation and humanoid robotics. ACKNOWLEDGMENTS This work is supported by the Richard King Mellon Foundation. A.S. is supported by the National Science Foundation Graduate Research Fellowship Program. 
The authors wish to thank Larry Hayhurst and David Matten for their help with robot construction. We also wish to thank Daniel Häufle for his help with hardware communication.


REFERENCES

[1] S.K. Agrawal, S.K. Banala, K. Mankala, V. Sangwan, J.P. Scholz, V. Krishnamoorthy, and W.L. Hsu. Exoskeletons for gait assistance and training of the motor impaired. Proc IEEE Int Conf Rehab Rob, pages 1108–1113, 2007.
[2] H.K. Au, H. Herr, J. Weber, and E.C. Martinez-Villalpando. Powered ankle-foot prosthesis for the improvement of amputee ambulation. Conf Proc IEEE Eng Med Biol Soc, pages 3020–3026, 2007.
[3] E. Buckingham. On physically similar systems; illustrations of the use of dimensional equations. Phys Rev, 4:345–376, 1914.
[4] G. Courtine, Y. Gerasimenko, R. van den Brand, A. Yew, P. Musienko, H. Zhong, B. Song, Y. Ao, R.M. Ichiyama, I. Lavrov, R.R. Roy, M.V. Sofroniew, and V.R. Edgerton. Transformation of nonfunctional spinal circuits into functional states after the loss of brain input. Nat Neurosci, 12(10):1333–1342, 2009.
[5] V. Dietz. Proprioception and locomotor disorders. Nat Rev Neurosci, 3(10):781–790, 2002.
[6] M.F. Eilenberg, H. Geyer, and H. Herr. Control of a powered ankle-foot prosthesis based on a neuromuscular model. IEEE Trans Neural Sys Rehab Eng, 18(2):164–173, 2010.
[7] Y. Gerasimenko, R.R. Roy, and V.R. Edgerton. Epidural stimulation: comparison of the spinal circuits that generate and control locomotion in rats, cats and humans. Exp Neurology, 209(2):417–425, 2008.
[8] H. Geyer and H. Herr. A muscle-reflex model that encodes principles of legged mechanics produces human walking dynamics and muscle activities. IEEE Trans Neural Sys Rehab Eng, 18(3):263–273, 2010.
[9] H. Geyer, A. Seyfarth, and R. Blickhan. Positive force feedback in bouncing gaits? Proc R Soc B, 270:2173–2183, 2002.
[10] J.W. Grizzle, J. Hurst, B. Morris, H.W. Park, and K. Sreenath. MABEL, a new robotic bipedal walker and runner. American Control Conference, pages 2030–2036, 2009.
[11] M. Hirose and K. Ogawa. Honda humanoid robots development. Phil Trans R Soc A, 365(1850):11–19, 2007.
[12] J. Hitt, A.M. Oymagil, T. Sugar, K. Hollander, A. Boehler, and J. Fleeger. Dynamically controlled ankle-foot orthosis (DCO) with regenerative kinematics: incrementally attaining user portability. Proc IEEE ICRA, pages 1541–1546, 2007.
[13] M.A. Holgate, A.W. Boehler, and T. Sugar. Control algorithms for ankle robots: a reflection on the state-of-the-art and presentation of two novel algorithms. IEEE/RAS-EMBS Int Conf Biomed Rob and Biomech, pages 97–102, 2008.
[14] H. Hultborn and J.B. Nielsen. Spinal control of locomotion – from cat to man. Acta Physiologica, 189(2):111–121, 2007.
[15] J.W. Hurst, J.E. Chestnutt, and A.A. Rizzi. An actuator with physically variable stiffness for highly dynamic legged locomotion. Proc IEEE ICRA, pages 4662–4667, 2004.
[16] J.W. Hurst, A.A. Rizzi, and D. Hobbelen. Series elastic actuation: potential and pitfalls. Proc Int Conf Climb Walk Robots, pages 1–6, 2004.
[17] R. Jimenez-Fabian and O. Verlinden. Review of control algorithms for robotic ankle systems in lower-limb orthoses, prostheses, and exoskeletons. Med Eng Phys, 34(4):397–408, 2012.
[18] S. Kajita, T. Nagasaki, K. Kaneko, and H. Hirukawa. ZMP-based biped running control. IEEE Robotics and Automation Magazine, 14(2):63–72, 2007.
[19] S.A. Migliore, E.A. Brown, and S.P. DeWeerth. Novel nonlinear elastic actuators for passively controlling robotic joint compliance. J Mech Des, 129(4):406–412, 2007.
[20] P. Musienko, R. van den Brand, O. Maerzendorfer, R.R. Roy, Y. Gerasimenko, V.R. Edgerton, and G. Courtine. Controlling specific locomotor behaviors through multidimensional monoaminergic modulation of spinal circuitries. J Neurosci, 31(25):9264–9278, 2011.
[21] P.D. Neuhaus, J.H. Noorden, T.J. Craig, J. Kirschbaum, and J.E. Pratt. Design and evaluation of Mina: a robotic orthosis for paraplegics. Proc IEEE ICORR, pages 1–8, 2011.
[22] D.J. Paluska. Design of a humanoid biped for walking research. MS thesis, Massachusetts Institute of Technology, 2000.
[23] J. Perry and J.M. Burnfield. Gait analysis: normal and pathological function. SLACK Incorporated, 2nd edition, 2010.
[24] E. Pierrot-Desseilligny and D. Burke. The circuitry of the human spinal cord: its role in motor control and movement disorders. Cambridge University Press, 2005.
[25] G.A. Pratt and M.M. Williamson. Series elastic actuators. Proc IEEE/RSJ IROS, 1:399–406, 1995.
[26] J.E. Pratt and G.A. Pratt. Intuitive control of a planar bipedal walking robot. Proc IEEE ICRA, 3:2014–2021, 1998.
[27] M. Raibert. Legged robots that balance. MIT Press, Cambridge, 1986.
[28] M.L. Shik, F.V. Severin, and G.N. Orlovski. Control of walking and running by means of electric stimulation of the midbrain. Biofizika, 11(4):659–666, 1966.
[29] S. Song and H. Geyer. Regulating speed and generating large speed transitions in a neuromuscular human walking model. Proc IEEE ICRA, pages 511–516, 2012.
[30] F. Sup, H.A. Varol, and M. Goldfarb. Upslope walking with a powered knee and ankle prosthesis: initial results with an amputee subject. IEEE Trans Neural Sys Rehab Eng, 19(1):71–78, 2011.
[31] M.D. Taylor. A compact series elastic actuator for bipedal robots with human-like dynamic performance. MS thesis, Carnegie Mellon University, 2011.
[32] D.A. Winter. Biomechanics and motor control of human movement. Wiley, 4th edition, 2009.
[33] G. Wyeth. Control issues for velocity sourced series elastic actuators. Proc IEEE ICRA, 2006.


Experiments with Balancing on Irregular Terrains Using the Dreamer Mobile Humanoid Robot Luis Sentis, University of Texas at Austin, USA Josh Petersen, University of Texas at Austin, USA Roland Philippsen, Halmstad University, Sweden

Abstract—We investigate models and controllers for mobile humanoid robots that maneuver on irregular terrains while performing accurate physical interactions with the environment and with human operators, and we test them on Dreamer, our new robot with a humanoid upper body (torso, arm, head) and a holonomic mobile base (triangularly arranged Omni wheels). All its actuators are torque controlled, and the upper body provides redundant degrees of freedom. We developed new dynamical models and created controllers that stabilize the robot in the presence of slope variations while it compliantly interacts with humans. This paper considers underactuated free-body dynamics with contact constraints between the wheels and the terrain. Moreover, Dreamer incorporates a biarticular mechanical transmission that we model as a force constraint. Using these tools, we develop new compliant multi-objective skills and include self-motion stabilization for the highly redundant robot.

I. INTRODUCTION

Real-world terrains exhibit curvature changes, discrete slabs, granular obstacles, etc. Any robot that is supposed to help automate industrial and urban chores in the presence of irregular terrains and human operators has to be able to maneuver, manipulate, and interact safely, accurately, and robustly in this type of environment. And it will have to do so while safely and physically interacting with human users. This paper contributes a theoretical and experimental study of a control approach with such capabilities. It is based on the whole-body control framework, which provides effective tools to incorporate constraints and build robot skills as hierarchies of motion and force tasks. We formulate unified dynamics for tasks and constraints and use them to construct compliant controllers for our humanoid torque-controlled mobile robot Dreamer. We describe, model, and control two fundamental constraints, prove the stability of our method for controlling motion redundancy, and show how our control approach can naturally implement kinesthetic interaction with the robot. The experiments provide an empirical proof of concept for the developed concepts and methodology. One of the most fundamental physical constraints for any mobile platform is the contact between its wheels and the ground. It is common to build mobile bases that can be treated as planar joints, but this requires them to be heavy and large to create passive righting moments that exceed any expected tipping moment. This requirement severely limits areas for potential deployment, which is why we designed Dreamer with a smaller and more lightweight base, and chose to address stability using an active approach inspired by

Fig. 1. Experimental setup with an indication of the coordinates used by the physical interaction skill described in the experimental section.

our earlier work on multicontact control. By incorporating the full mobility of the base, as opposed to the frequently unstated planarity assumption, it becomes possible to estimate balance coordinates due to variations in slope, account for free-body dynamic effects, and create an infrastructure that is dynamically correct in 3D space. For Dreamer, there is another fundamental constraint due to the mechanical coupling of two motion axes in its torso. This biarticular joint provides a large range of motion while keeping the mechatronics very compact. Stable control of redundancies is a general issue for manipulators with more degrees of freedom (DOF) than required for fulfilling a particular skill. In this paper, we contribute a detailed stability analysis of our method of addressing such redundancies. A. Related Work Our line of research is a direct successor of [11]. Here, a mobile omnidirectional manipulator for flat terrain was developed using caster wheels and a Puma industrial manipulator with an Operational Control layer [14]. Our work is a departure from the previous one in various respects: (1) we implement wholebody dynamics that incorporate the free floating underactuated effects, (2) we describe the dependencies between wheels as contact and rolling constraints, (3) we implement a dynamically correct prioritized control strategy, and (4) we control balance to prevent falling in the rough terrains. As such, we provide a platform that is more capable to adapt to outdoor

393

environments and is more dexterous in executing whole-body skills. Although whole-body compliant control in mobile robots and humanoids has been thoroughly studied by us [15], [21] and by our predecessors [3], [1], more recent implementations include [17], [9], [6]. In these works, whole-body torque control of the base and the upper body is considered. In particular, in [9], [6] the coordination of whole-body torques is accomplished by separating upper-body torques and an admittance control strategy for the velocity-controlled base. Limitations of these works include flat terrain assumptions, lacking balancing capabilities, and considering static contact conditions on the base wheels. Balancing is an important component of our approach to handling rough terrains. In [8], a large wheeled robot with reconfigurable pose is used to maneuver in outdoor terrains. Compared to our work, the previous one does not utilize whole-body dynamics or contact state estimation. As a result, that robot will perform poorly when executing fast dynamic maneuvers. A simple wheeled robot that balances to operate on rough roads has been proposed in [16]. They use wheeled inverted pendulum dynamics to stabilize balance through state feedback control. This work shares balancing capabilities with our approach but is limited in that it does not have manipulation capabilities. As for the control architecture, our software relates (within a limited architectural scope) to [2], but in contrast we provide whole-body dynamic models apt for the control of mobile humanoids under contact constraints. When compared to [13], our software provides a multi-objective torque control layer that enables more efficient use of the robot's motion and contact redundancy. B. The Dreamer Humanoid Robot Hardware The main hardware tool that we use for this study is a new mobile dexterous humanoid robot.
It includes a torso, an arm, a hand, and an anthropomorphic head developed by Meka Robotics, and a torque-controlled holonomic base developed by our lab. The actuators for the base and the upper body, except for the head, contain torque/force sensors that enable Elmo amplifiers to implement current or torque feedback. An EtherCAT serial bus communicates with sensors and motor amplifiers from a single computer system. A PC running Ubuntu Linux with the RTAI real-time kernel runs the models and control infrastructure described in this project. The holonomic base contains rotary torque sensors as well as a 3DM-GX3-25 inertial measurement unit (IMU) from MicroStrain. It achieves holonomic motion and force capabilities by utilizing omni wheels located in an equilateral triangular configuration.

II. CONSTRAINED, UNDERACTUATED DYNAMICS

Fig. 2. “Free-body” degrees of freedom, reaction forces, and robot’s kinematic chain: in order to adapt to irregular terrains, we model the robot kinematics and dynamics with respect to an inertial reference frame. This step entails adding 6 kinematic degrees of freedom of free-body movement to the kinematic description of the robot. Furthermore, rolling and contact constraints on the wheels are modeled for contact-based control.

The model of Dreamer's dynamic behavior is formulated using Lagrangian rigid multibody dynamics. A diagram of the modeled entities is shown in Figure 2. We start by introducing the generalized coordinate vector

  q \triangleq \begin{pmatrix} q_{freebody}^T & q_{wheels}^T & q_{torso}^T & q_{arm}^T \end{pmatrix}^T \in \mathbb{R}^{n_{dofs}},    (1)

where $q_{freebody} \in \mathbb{R}^6$ corresponds to the passive free-body linear and angular DOF of the base, $q_{wheels} \in \mathbb{R}^3$ corresponds to the actuated base wheels, $q_{torso} \in \mathbb{R}^3$ corresponds to the robot's upper torso with the biarticular constraint, and $q_{arm} \in \mathbb{R}^7$ corresponds to the robot's actuated right arm. The equation of motion underlying the model is

  A(q)\,\ddot{q} + b(q, \dot{q}) + g(q) + J_{constr}^T \lambda_{constr} = U^T \tau_{control},    (2)

where $\{A, b, g\}$ are the inertial, Coriolis-centrifugal, and gravity terms; $J_{constr}$ is the constraint Jacobian (Sections II-B and II-C); $\lambda_{constr}$ are Lagrange multipliers; $U^T$ maps actuation forces into the system (Section II-A); and $\tau_{control} \in \mathbb{R}^{n_{dofs} - (1+6)}$ is the vector of motor torques. We assume $\{A, b, g\}$ known, solve for $\lambda_{constr}$ here, and develop the expressions for the remaining terms in the following subsections. $\lambda_{constr}$ can be found by left-multiplying (2) by $J_{constr} A^{-1}$ and applying the time derivative of the constraint equation $J_{constr} \dot{q} = 0$ to yield

  \lambda_{constr} = \bar{J}_{constr}^T \left( U^T \tau_{control} - b - g \right) + \Lambda_{constr} \dot{J}_{constr} \dot{q},    (3)

where $\bar{J}_{constr} \triangleq A^{-1} J_{constr}^T \Lambda_{constr}$ is the dynamically consistent generalized inverse of $J_{constr}$ and $\Lambda_{constr} \triangleq (J_{constr} A^{-1} J_{constr}^T)^{-1}$ is the inertia matrix projected onto the manifold of the constraints. These steps eliminate the constraint forces from (2), and we obtain the equation of motion of the constrained underactuated system as

  A \ddot{q} + N_{constr}^T (b + g) + J_{constr}^T \Lambda_{constr} \dot{J}_{constr} \dot{q} = (U N_{constr})^T \tau_{control},    (4)

where $N_{constr} \triangleq I - \bar{J}_{constr} J_{constr}$ is the dynamically consistent null space of the constraint Jacobian.
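The elimination of the constraint forces in (2)-(4) hinges on algebraic properties of $\bar{J}_{constr}$ and $N_{constr}$. The following sketch checks those properties numerically with NumPy; the dimensions follow the paper (19 generalized coordinates, 7 constraints), but the matrices themselves are random hypothetical stand-ins, not Dreamer's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 19          # n_dofs: 6 free-body + 3 wheels + 3 torso + 7 arm (Eq. (1))
m = 7           # constraints: 1 biarticular + 6 wheel rolling/normal (Eq. (5))

# Hypothetical smooth stand-ins for Dreamer's {A, J_constr}.
M0 = rng.standard_normal((n, n))
A = M0 @ M0.T + n * np.eye(n)            # joint-space inertia matrix (SPD)
J_constr = rng.standard_normal((m, n))   # constraint Jacobian (full row rank w.h.p.)

Ainv = np.linalg.inv(A)
# Quantities from Eqs. (3)-(4):
Lam = np.linalg.inv(J_constr @ Ainv @ J_constr.T)   # Lambda_constr
Jbar = Ainv @ J_constr.T @ Lam                      # dynamically consistent inverse
N = np.eye(n) - Jbar @ J_constr                     # N_constr

# Properties used when eliminating lambda_constr from Eq. (2):
assert np.allclose(J_constr @ Jbar, np.eye(m))      # generalized inverse
assert np.allclose(J_constr @ Ainv @ N.T, 0)        # dynamically consistent null space
assert np.allclose(N @ N, N)                        # N_constr is a projection
```

The last assertion is the idempotence that makes $N_{constr}^T$ in (4) remove exactly the constrained directions and nothing else.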


In the following subsections, we develop $U$ and $J_{constr}$. The latter collects various constraints; in this paper it is defined as

  J_{constr} \triangleq \begin{pmatrix} J_{biart} \\ J_{contact} \end{pmatrix} \in \mathbb{R}^{(1+6) \times n_{dofs}}.    (5)

A. Underactuation Model

There are two sources of underactuation on Dreamer, namely the six free-floating DOF (modeled as virtual unactuated prismatic and revolute joints) and the fact that the two sagittal joints in the torso are driven by a single motor. We express these dependencies as $q_{act} = U q \in \mathbb{R}^{n_{dofs} - (1+6)}$, where

  U \triangleq \begin{pmatrix} 0_{3\times6} & 1_{3\times3} & 0_{3\times3} & 0_{3\times7} \\ 0_{2\times6} & 0_{2\times3} & \begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{smallmatrix} & 0_{2\times7} \\ 0_{7\times6} & 0_{7\times3} & 0_{7\times3} & 1_{7\times7} \end{pmatrix}    (6)

is the $(n_{dofs} - (1+6)) \times n_{dofs}$ matrix that maps the biarticular and free-body DOF into the equation of motion.

B. Biarticular Constraint

Biarticular joints in our robot are located between the hip and thorax segments of the torso. In particular, the torso has 3 degrees of freedom (waist rotation, hip flexion/extension, and thorax flexion/extension), but only the waist rotation and hip flexion/extension are directly actuated by independent motors. A steel wire runs between the hip and thorax DOF to provide actuation to the two sagittal joints of the upper torso. This type of transmission constraint represents an underactuated DOF and creates tension forces across the wire. Because it creates position constraints, it is holonomic in nature. In particular, Dreamer's biarticular constraint can be modeled as $q_{hip} = q_{thorax}$. It follows that the instantaneous velocities of both joints are also equal ($\dot{q}_{hip} = \dot{q}_{thorax}$). Let us define the coordinate $\delta_{biart} \triangleq q_{hip} - q_{thorax}$. Then a differential relation with respect to the generalized coordinates can be expressed as

  \dot{\delta}_{biart} = J_{biart} \dot{q} = 0,    (7)

where $\dot{q}$ is the vector of generalized joint velocities (6 for the free-body base, 3 for the wheels, 3 for the upper torso, and 7 for the right arm), and the Jacobian of the biarticular constraint is

  J_{biart} \triangleq \begin{pmatrix} 0_{1\times6} & 0_{1\times3} & 0 & 1 & -1 & 0_{1\times7} \end{pmatrix} \in \mathbb{R}^{1 \times n_{dofs}}.    (8)
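Once a joint ordering is fixed, the actuation map $U$ of (6) and the biarticular Jacobian of (8) are simple sparse matrices. The sketch below builds them for the 19-DOF layout of (1); the joint indices are our own illustrative convention, not code from the authors' controller.

```python
import numpy as np

# DOF layout from Eq. (1): indices 0-5 free-body, 6-8 wheels,
# 9-11 torso (waist, hip, thorax), 12-18 arm.
n_dofs = 6 + 3 + 3 + 7            # = 19
n_act = n_dofs - (1 + 6)          # = 12 actuated joints

# Eq. (6): U selects the actuated coordinates; the torso block picks the
# waist and hip joints only (the thorax joint is wire-coupled to the hip).
U = np.zeros((n_act, n_dofs))
U[0:3, 6:9] = np.eye(3)           # wheel joints
U[3, 9] = 1.0                     # waist rotation
U[4, 10] = 1.0                    # hip flexion/extension (drives the thorax too)
U[5:12, 12:19] = np.eye(7)        # arm joints

# Eq. (8): biarticular constraint q_hip = q_thorax (indices 10 and 11).
J_biart = np.zeros((1, n_dofs))
J_biart[0, 10], J_biart[0, 11] = 1.0, -1.0

assert U.shape == (12, 19)
# A velocity obeying the wire coupling (hip rate == thorax rate) satisfies Eq. (7):
qdot = np.zeros(n_dofs)
qdot[10] = qdot[11] = 0.5
assert np.allclose(J_biart @ qdot, 0.0)
```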

C. Rolling and Contact Constraints

The relative velocity of the ground contact point of the $i$-th wheel, $v_{contact[i]}$, depends on the velocity of the wheel center, the angular velocity of the wheel, and the relative position of the contact point:

  v_{contact[i]} = v_{wheel[i]} + \omega_{wheel[i]} \times \delta_{contact[i]} \in \mathbb{R}^3,    (9)

where $v_{wheel[i]}$ and $\omega_{wheel[i]}$ are the linear and angular velocities of the wheel, and $\delta_{contact[i]} \triangleq p_{contact[i]} - p_{wheel[i]}$ is the displacement from the wheel center to the contact point. Here $J_{wheel[i]} \triangleq \begin{pmatrix} J_{v,wheel[i]}^T & J_{\omega,wheel[i]}^T \end{pmatrix}^T \in \mathbb{R}^{6 \times n_{dofs}}$ collects the linear and rotational parts of the wheel's basic Jacobian, so that the screw vector $\begin{pmatrix} v_{wheel[i]}^T & \omega_{wheel[i]}^T \end{pmatrix}^T = J_{wheel[i]} \dot{q}$ corresponds to the linear and angular velocity of the wheel. The rolling constraint implies that the velocity of the contact point in the direction of the wheel is zero, i.e. $u_{rolling[i]}^T v_{contact[i]} = 0$, where $u_{rolling[i]} \in \mathbb{R}^3$ is the direction of the wheel axis. We rewrite this condition as $J_{rolling[i]} \dot{q} = 0$, where

  J_{rolling[i]} \triangleq u_{rolling[i]}^T \left( J_{v,wheel[i]} - \delta_{contact[i]} \times J_{\omega,wheel[i]} \right).    (10)

In addition, there is a contact constraint with respect to the wheel's normal direction to the ground: $J_{normal[i]} \dot{q} = 0$, with $J_{normal[i]} \triangleq u_{normal[i]}^T J_{wheel[i]}$. Therefore the combined contact constraints of the three wheels correspond to the aggregation of the slip and the normal constraints for each wheel, i.e.

  J_{contact} \triangleq \begin{pmatrix} J_{rolling[1]} \\ J_{rolling[2]} \\ J_{rolling[3]} \\ J_{normal[1]} \\ J_{normal[2]} \\ J_{normal[3]} \end{pmatrix} \in \mathbb{R}^{6 \times n_{dofs}}.    (11)
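Each rolling-constraint row of (10) can be assembled directly from the wheel's linear and angular Jacobians, with the cross product $\delta \times J_\omega$ realized as a skew-symmetric matrix product. The following sketch illustrates this with randomly generated stand-in Jacobians (the real ones would come from Dreamer's kinematic model) and checks the row against the contact-point velocity of (9).

```python
import numpy as np

rng = np.random.default_rng(1)
n_dofs = 19

def skew(v):
    """Cross-product matrix: skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rolling_row(u_roll, Jv, Jw, delta):
    # Eq. (9): v_contact = Jv qdot + (Jw qdot) x delta
    #                    = (Jv - skew(delta) @ Jw) qdot, hence Eq. (10):
    return u_roll @ (Jv - skew(delta) @ Jw)

# Hypothetical wheel Jacobians and geometry (illustrative values only).
Jv = rng.standard_normal((3, n_dofs))      # linear part of the wheel Jacobian
Jw = rng.standard_normal((3, n_dofs))      # rotational part
delta = np.array([0.0, 0.0, -0.05])        # wheel center -> contact point
u_roll = np.array([1.0, 0.0, 0.0])         # wheel axis direction

J_roll = rolling_row(u_roll, Jv, Jw, delta)
qdot = rng.standard_normal(n_dofs)
v_contact = Jv @ qdot + np.cross(Jw @ qdot, delta)   # Eq. (9)
assert np.isclose(u_roll @ v_contact, J_roll @ qdot)
```

Stacking three such rows with the three normal rows yields the $6 \times n_{dofs}$ matrix $J_{contact}$ of (11).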

III. COMPLIANT WHOLE-BODY CONTROL

The whole-body control framework expresses controllers in the representation that is most natural for a given task, with a generic coordinate transformation $x_{task} = T_{task}(q) \in \mathbb{R}^{n_{task}}$, e.g. a homogeneous transformation for a Cartesian point, or a rotation and translation in the group $SE(3)$ for a spatial transformation of a frame. It follows that the instantaneous task kinematics can be expressed as $\dot{x}_{task} = J_{task} \dot{q}$, where $J_{task} = \partial x_{task} / \partial q \in \mathbb{R}^{n_{task} \times n_{dofs}}$ is the geometric Jacobian matrix with respect to the generalized coordinates. In order to express the task with respect to the actuated joints, Section III-A summarizes earlier results on a dynamically consistent generalized inverse of $U N_{constr}$. Then, Section III-B proceeds to develop a prioritized operational-space task/posture controller for Dreamer, and Section III-C contributes a new proof of stability for our posture control approach.

The sum of all tasks may not fully determine the motor torques for all actuated joints. In fact, this is a very common circumstance with redundant manipulators such as Dreamer. In order to stabilize the motions that remain in the null space of all tasks, we define the posture as a special kind of task which optimizes a performance criterion, as opposed to performing the usual position or force tracking. We usually place at least one posture at the end of any task hierarchy.

A. Actuated-Joint Kinematics

The number of constraints (one for the biarticular joint plus two for each of the three wheels in contact) is equal to the


number of unactuated DOF (one for the biarticular joint and six for the free-floating formulation). Thus, we can solve the robot's kinematics from the actuated joints alone. In particular, we consider decomposing the generalized coordinates into actuated and unactuated parts, i.e.

  q = D_{kin} \begin{pmatrix} q_{act} \\ q_{unact} \end{pmatrix},    (12)

where $D_{kin}$ is a matrix that distributes the generalized coordinates to their respective positions in the robot's kinematic chain. Because of the already encountered condition $J_{constr} \dot{q} = 0$, we can state that $\dot{q}^* \in \mathrm{Null}(J_{constr})$, where $\dot{q}^*$ expresses the set of generalized velocities that fulfill all of the constraints. In other words, the constrained velocities can be expressed as the self-motion manifold $\dot{q}^* = (I - J_{constr}^{\#} J_{constr}) \dot{q}$, where $J_{constr}^{\#}$ is any right inverse of the constraint Jacobian (i.e. $J_{constr} J_{constr}^{\#} = I$) and $\dot{q}$ represents unconstrained velocities. The constrained generalized velocities can be reconstructed from the velocities of the actuated DOF alone using the formula

  \dot{q}^* = \overline{U N_{constr}} \, \dot{q}_{act},    (13)

where

  \overline{U N_{constr}} \triangleq A^{-1} (U N_{constr})^T (\phi^*)^+    (14)

is the dynamically consistent generalized inverse of $U N_{constr}$, $(\cdot)^+$ is the Moore-Penrose pseudoinverse operator, and

  \phi^* \triangleq U N_{constr} A^{-1} (U N_{constr})^T    (15)

is a projection of the inertia matrix in the reduced constrained manifold. See [20] for a proof of (13).

B. Prioritized Control Structure

Based on (13), the differential kinematics can now be expressed with respect to the actuated joints alone:

  \dot{x}_{task} = J_{task}^* \dot{q}_{act},    (16)

where

  J_{task}^* \triangleq J_{task} \overline{U N_{constr}},    (17)

with $J_{task}^* \in \mathbb{R}^{n_{task} \times n_{act}}$ a reduced form of the task Jacobian consistent with the general constraint conditions. To prove (16), consider the constrained generalized velocities $\dot{q}^*$ and apply (13) to the instantaneous task kinematics, thus getting $\dot{x}_{task} = J_{task} \dot{q}^* = J_{task} \overline{U N_{constr}} \dot{q}_{act}$.

Now, consider the skill of touching objects using our robot, encoded as a control hierarchy with one task and one posture. This skill uses two coordinate systems, $x_{skill} \triangleq \{ x_{hand} \in T(3), \; x_{posture} = q_{act} \in \mathbb{R}^{n_{act}} \}$, where $T(3)$ is the group of translations. The experiments in Section IV employ a controller which follows the same principle, but adds a balancing task at the top of the control hierarchy. In general, an arbitrary task $(k)$ can be kinematically characterized by its mapping with respect to the generalized coordinates and by its generalized constrained Jacobian, i.e.

  x_{task(k)} = T_{task(k)}(q),    (18)
  \dot{x}_{task(k)} = J_{task(k)}^* \dot{q}_{act},    (19)

with

  J_{task(k)}^* \triangleq J_{task(k)} \overline{U N_{constr}}.    (20)

It can be demonstrated [20] that the dynamics of the actuated joints can be expressed as

  \ddot{q}_{act} + \phi^* (b^* + g^*) = \phi^* \tau_{control},    (21)

where

  \phi^* (b^* + g^*) \triangleq U A^{-1} \left( N_{constr}^T (b + g) + J_{constr}^T \Lambda_{constr} \dot{J}_{constr} \dot{q} \right).    (22)

By left-multiplying the constrained actuated dynamics by $\bar{J}_{task(k)}^{*T}$, where $\bar{J}_{task(k)}^* \triangleq \phi^* J_{task(k)}^{*T} \left( J_{task(k)}^* \phi^* J_{task(k)}^{*T} \right)^{-1}$, and using the equality $\ddot{x}_{task(k)} = J_{task(k)}^* \ddot{q}_{act} + \dot{J}_{task(k)}^* \dot{q}_{act}$, the constrained task dynamics can be expressed as

  \Lambda_{task(k)}^* \ddot{x}_{task(k)} + \mu_{task(k)}^* + p_{task(k)}^* = \bar{J}_{task(k)}^{*T} \tau_{control},    (23)

with $\{\Lambda_{task(k)}^*, \mu_{task(k)}^*, p_{task(k)}^*\}$ being inertial, Coriolis-centrifugal, and gravitational terms (not derived here). If $J_{task(k)}^*$ is full rank, the following control structure yields full control of the task dynamics:

  \tau_{control} = J_{task(k)}^{*T} F_{task(k)}.    (24)

This statement can be proven by applying the above torques to (23) and using the property of the generalized inverse $J_{task(k)}^* \bar{J}_{task(k)}^* = I$, thus getting

  \Lambda_{task(k)}^* \ddot{x}_{task(k)} + \mu_{task(k)}^* + p_{task(k)}^* = F_{task(k)},    (25)

which delivers full control of the task dynamics $\ddot{x}_{task(k)}$ by using feedforward control and feedback linearization by means of the control input $F_{task(k)}$, i.e.

  F_{task(k)} = \Lambda_{task(k)}^* a_{task(k)}^{ref} + \mu_{task(k)}^* + p_{task(k)}^*,    (26)

where $a_{task(k)}^{ref}$ is an acceleration control input. The above controller yields $\ddot{x}_{task(k)} = a_{task(k)}^{ref}$. For the case of one task and one posture, we define the following prioritized control structure, which provides optimal projection of the posture gradient:

  \tau_{control} = J_{task}^{*T} F_{task} + N_{task}^{*T} \tau_{posture},    (27)

where $N_{task}^* \triangleq I - \bar{J}_{task}^* J_{task}^*$ is the dynamically consistent null-space matrix of the reduced task Jacobian.
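The key property of the prioritized structure (27) is that the posture term cannot disturb the task. A minimal numerical sketch of this property, using a random symmetric positive definite matrix in place of $\phi^*$ and a random task Jacobian (both hypothetical stand-ins for the quantities of (15) and (17)):

```python
import numpy as np

rng = np.random.default_rng(2)
n_act, n_task = 12, 3

# Hypothetical reduced quantities standing in for Eqs. (15) and (17).
M0 = rng.standard_normal((n_act, n_act))
phi = M0 @ M0.T + n_act * np.eye(n_act)        # phi* (SPD stand-in)
J_task = rng.standard_normal((n_task, n_act))  # J*_task (full row rank w.h.p.)

# Dynamically consistent generalized inverse and null-space projector (Eq. (27)):
Jbar = phi @ J_task.T @ np.linalg.inv(J_task @ phi @ J_task.T)
N_task = np.eye(n_act) - Jbar @ J_task

F_task = rng.standard_normal(n_task)
tau_posture = rng.standard_normal(n_act)
tau = J_task.T @ F_task + N_task.T @ tau_posture   # Eq. (27)

# Premultiplying the control torque by Jbar^T (the task-space extraction used
# in Eq. (23)) recovers F_task exactly: the posture contribution vanishes.
assert np.allclose(Jbar.T @ tau, F_task)
```

This works because $J^*_{task} \bar{J}^*_{task} = I$ and $N^*_{task} \bar{J}^*_{task} = 0$, so the null-space term is invisible at the task level.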

C. Posture Control Stability

Uncontrolled null-space tasks are known to produce unstable behaviors [5]. Therefore, we investigate the stabilizing properties of feedback controllers for the posture. We consider the following energy function to be minimized in posture space:

  V_{posture} = \frac{1}{2} e_{q_{act}}^T K \, e_{q_{act}},    (28)

where $e_{q_{act}} \triangleq q_{act} - q_{act}^{goal}$ is a feedback error function, $q_{act}^{goal}$ is a postural attractor goal, and $K$ is a gain matrix. A naive approach is to project the gradient of (28) directly into the control structure (27), i.e.

  \tau_{control} = J_{task}^{*T} F_{task} + N_{task}^{*T} K e_{q_{act}}.    (29)

Fig. 3. Balance: the upper body is controlled such that, when projected onto a horizontal plane, its center of mass (COM) remains as close as possible to the projected center of the support triangle formed by the three wheels. This is demonstrated by manually tilting the base. Notice that the COM task controls two DOF, and the posture control takes care of stabilizing the joints at a position that minimizes the posture error while not interfering with the projected COM position.

If we apply the above torque to the posture dynamics (which are equal to the actuated dynamics) of Equation (21), and neglect the task torques, i.e. $F_{task} = 0$, we get $\ddot{q}_{act} + \phi^* (b^* + g^*) = \phi^* N_{task}^{*T} K e_{q_{act}}$. The problem with this policy is that $N_{task}^{*T}$ introduces cross-coupling of the error coordinates. Such distortion leads to a system of partial differential equations (PDEs) for which convergence to the minimum energy cannot be assured. A detailed study of the convergence and stability of this distortion remains to be conducted; however, non-convergence to the minimum energy was empirically demonstrated in [15]. This type of deficiency has been studied before, for instance in [18], which proposes a time-varying feedback stabilization scheme that augments the gain matrix $K$. However, such a solution does not take into account the null-space characteristics, and therefore cannot optimize the gradient descent. In [15], [21], we extensively investigated this problem in the context of the dynamic behavior of posture control, and proposed a control strategy equivalent to choosing a gain matrix that rotates the gradient to the best possible direction away from $\mathrm{Null}(N_{task}^{*T})$. It was found that the following postural control structure yields favorable gradient descent of postural energies:

  \tau_{p|t} = J_{p|t}^{*T} F_{posture},    (30)

where $\tau_{p|t} \triangleq N_{task}^{*T} \tau_{posture}$ is the right-most term of (27),

  J_{p|t}^* \triangleq J_{posture}^* N_{task}^*    (31)

is the task-consistent posture Jacobian matrix, and $J_{posture}^* \triangleq U \, \overline{U N_{constr}}$ is the reduced constraint-consistent Jacobian of the posture. To prove (30), let us consider the following steps:

1. If we plug (30) into (21) we get

  \ddot{q}_{act} + \phi^* (b^* + g^*) = \phi^* J_{p|t}^{*T} F_{posture}.    (32)

2. In [20] it was demonstrated that the right-hand term satisfies $\phi^* J_{p|t}^{*T} = \phi_{p|t}^*$, where

  \phi_{p|t}^* \triangleq J_{p|t}^* \phi^* J_{p|t}^{*T}    (33)

is an inverse inertia matrix in a space that is simultaneously consistent with the robot constraints and with the task priority. As such, (32) becomes

  \ddot{q}_{act} + \phi^* (b^* + g^*) = \phi_{p|t}^* F_{posture}.    (34)

Choosing the feedback control policy

  F_{posture} = \left( \phi_{p|t}^* \right)^+ K e_{q_{act}} + \left( \phi_{p|t}^* \right)^+ \phi^* (b^* + g^*)    (35)

leads to [20]

  U_{p|t}^{rank\,T} \ddot{q}_{act} = U_{p|t}^{rank\,T} K e_{q_{act}},    (36)

where

  \phi_{p|t}^* = U_{p|t}^{rank} \Sigma_{p|t}^{rank} U_{p|t}^{rank\,T}    (37)

is the minimal (rank) SVD decomposition of $\phi_{p|t}^*$, and

  U_{p|t}^{rank} \in \mathbb{R}^{n_{act} \times (n_{act} - n_{task})}    (38)

is the set of controllable eigenvectors of $\phi_{p|t}^*$. The feedback behavior of Equation (36) is now linear in the controllable directions $U_{p|t}^{rank\,T}$ and therefore leads to a system of ordinary differential equations (ODEs). It follows that it must minimize the energy function of Equation (28). The resulting performance improvement in posture space is substantial [15].

As a new contribution, we study in more detail the stability of the self-motions of our controller. The proposed general controller $F_{posture} = \left( \phi_{p|t}^* \right)^+ v + \left( \phi_{p|t}^* \right)^+ \phi^* (b^* + g^*)$ yields the closed-loop behavior [20]

  \ddot{q}_{act} = \Phi_{p|t}^* v,    (39)

where $\Phi_{p|t}^* \triangleq \phi_{p|t}^* \phi_{p|t}^{*+} = U_{p|t}^{rank} U_{p|t}^{rank\,T}$ is a singular, symmetric positive semidefinite matrix that represents the self-motion manifold of the task coordinates, and $v$ is a closed-loop controller. Let us define a projected state error

  E \triangleq \Phi_{p|t}^* \left( -k_p e_{q_{act}} - k_v \dot{q}_{act} \right),    (40)

with $k_p > 0$, $k_v > 0$. The controller choice

  v = \frac{1}{k_v} K_E E - \frac{k_p}{k_v} \dot{q}_{act} - \frac{1}{k_v} \dot{\Phi}_{p|t}^* \left( -k_p e_{q_{act}} - k_v \dot{q}_{act} \right),    (41)

where $K_E$ is a gain matrix, renders stable behavior of the self-motions. To demonstrate the stability, let us consider the Lyapunov candidate function

  V = \frac{1}{2} E^T E \geq 0.    (42)

Practically, the above energy function is strictly positive definite, as desired postures are normally uncorrelated with the task. The time derivative yields

  \dot{V} = E^T \dot{E} = -E^T (k_p \dot{q}_{act} + k_v v) - E^T \dot{\Phi}_{p|t}^* (k_p e_{q_{act}} + k_v \dot{q}_{act}) = -E^T K_E E \leq 0,    (43)

where we have used the equality

  \dot{E} = \Phi_{p|t}^* (-k_p \dot{q}_{act} - k_v \ddot{q}_{act}) + \dot{\Phi}_{p|t}^* (-k_p e_{q_{act}} - k_v \dot{q}_{act})
          = \Phi_{p|t}^* (-k_p \dot{q}_{act} - k_v \Phi_{p|t}^* v) + \dot{\Phi}_{p|t}^* (-k_p e_{q_{act}} - k_v \dot{q}_{act}),    (44)

the property of idempotence $\Phi_{p|t}^* \Phi_{p|t}^* = \Phi_{p|t}^*$ (which follows from the generalized-inverse property $\phi_{p|t}^* \phi_{p|t}^{*+} \phi_{p|t}^* = \phi_{p|t}^*$), and the closed-loop input of (41). Because Equation (43) is essentially the negative equivalent of Equation (42), the proposed self-motion controller renders stable behavior of the posture if (42) is chosen to be strictly positive. Tracking under model uncertainties remains an open question for our posture controller. We are planning to conduct such an analysis inspired by [4], [23], [10].

Fig. 4. Prioritized whole-body compliant controller.

IV. EXPERIMENTS

We performed an experiment involving maneuvering on irregular terrain while physically interacting with a human user. It serves as empirical validation of the approach by focusing on one essential irregular-terrain feature, namely a change in slope between two planar surfaces. This relatively simple setup, shown in Figure 1, already contains many of the challenges that Dreamer will have to face on more irregular terrain: excessive tilting or upper-body accelerations may topple the robot; feedforward gravity compensation, which is crucial for accurate yet compliant task control, depends on orientation; and forces due to interaction with the environment can lift wheels off the ground or lead to unwanted slippage of the base. The objective of the experiment is to demonstrate that: (1) the robot can transition safely between the flat and the sloping portions at various speeds; (2) the IMU and the models of the robot produce a good estimate of the constrained free-body dynamics; (3) the base follows the body as a result of the compliant interactions and the contact models; (4) the robot can control balance to prevent falling; (5) accurate manipulation and physical interaction can take place on the irregular terrain; (6) posture energies are minimized given the control objectives; and (7) all of this is accomplished while complying with the biarticular and rolling/contact constraints.
To show the desired capabilities, we create a skill for interacting with a human user in the irregular terrain. It is very similar to the one discussed in section III-B, with two modifications. First of all, a balance task prevents falling when transitioning from flat to sloping terrain. This task controls the position

of the center of mass (COM) projected onto a horizontal plane. Second, the posture is modified to act only on the upper-body DOF, which naturally implements kinesthetic control of the base: a human user simply needs to gently push or pull the hand in the direction the robot should go, and without further modifications to the system model or the control formulation, the three base DOF will provide the required motion. This complex coordination works because the whole-body formulation includes all DOF in all task representations, and the hand goal is defined relative to the body, thus moving the base closer to the hand. The coordinates of this physical human-robot interaction skill are thus $x_{skill}(q) \triangleq \{ x_{com}, x_{hand}, x_{posture} \} \in \mathbb{R}^{2+3+n_{posture}}$. In the following lines, we describe the task kinematics and then formulate the prioritized whole-body control structure. All the controllers and related software were implemented using the open-source real-time software infrastructure described in [19].

The balance task coordinates are

  x_{com} \triangleq \frac{1}{M} \sum_{i=i_{base}}^{n_{dofs}} m_i x_{com[i]},    (45)

where $M$ is the total mass of the robot and $i_{base} = n_{freebody} + n_{wheels}$ is the first joint attached to the upper-body base in the kinematic chain depicted in Figure 2. The COM Jacobian can thus be given as

  J_{com}(q) = \frac{1}{M} \sum_{i=i_{base}}^{n_{dofs}} m_i J_{com[i]}(q),    (46)

where $J_{com[i]}$ is the Jacobian of the $i$-th linkage in the kinematic chain. The hand coordinate system can be derived from the homogeneous transformation from the inertial frame $I$ to the hand frame $H$, both depicted in Figure 2. Thus, the hand frame is given by

  \begin{pmatrix} x_{hand} \\ 1 \end{pmatrix} = \prod_{i=0}^{n_{hand}} {}^{i+1}T_i(q) \begin{pmatrix} {}^{H}x_{hand} \\ 1 \end{pmatrix},    (47)

where $x_{hand}$ is the vector of hand translations with respect to the inertial frame $I$, ${}^{i+1}T_i(q)$ is the matrix describing the homogeneous transformation from frame $i+1$ to frame $i$, and ${}^{H}x_{hand}$ is the hand translation expressed in its local frame $H$. The Jacobian of the hand is thus the geometric operator $J_{hand}(q) \triangleq \partial x_{hand} / \partial q$.

The posture coordinates correspond to the actuated joints of the upper body; notably, for the reasons described above, the wheels are not included. Therefore $x_{posture} = S_{act}^{upper} q$, where $S_{act}^{upper}$ is a selection matrix corresponding to the actuated joints of the upper body only. It follows that the posture Jacobian is $J_{posture} = S_{act}^{upper}$.

The prioritized whole-body compliant controller described earlier endows Dreamer with the physical human-robot interaction skill for this experiment. The control structure can be expressed as


  \tau_{control} = J_{com}^{*T} F_{com} + J_{hand}^{*T} F_{hand} + J_{posture}^{*T} F_{posture},    (48)

Fig. 5. Experimental results. Sequence of video snapshots and data taken during the experiment on the irregular terrain. Snapshots (A), (B), and (C) correspond to maneuvers on a terrain of 7° slope. The corresponding data is shown in images (D), (E), (F), and (G). At instant (1) the upper-body motors are enabled. At (2) the base motors are enabled. At (3) the robot starts ascending the slope.

where the center of mass task operates with higher priority than the hand task, and the hand task operates with higher priority than the posture task. Priorities are enforced by recursively defining the differential manifold of the self-motion of the higher-priority tasks, i.e.

  J_{com}^* = J_{com} \overline{U N_{constr}},    (49)
  J_{hand}^* = J_{hand} \overline{U N_{constr}} \, N_{com}^*,    (50)
  J_{posture}^* = J_{posture} \overline{U N_{constr}} \, N_{hand,com}^*,    (51)

with null spaces of self-motion expressed as [20]

  N_{com}^* = I - \bar{J}_{com}^* J_{com}^*,    (52)
  N_{hand,com}^* = I - \bar{J}_{hand}^* J_{hand}^* - \bar{J}_{com}^* J_{com}^*.    (53)

The above control structures leverage the developments in Sections II and III to achieve dynamic consistency. $F_{com}$, $F_{hand}$, and $F_{posture}$ constitute the input commands.

We now focus on the feedback control policies for the overall controller. Figure 4 shows the flow diagram. The goal of the COM task is to move the robot's center of mass to a goal position above the center of the triangle defined by the wheels (see Figure 3). The goal of the hand task is to reach a position in front of the robot's body; compliance with respect to the human is achieved by reducing the control gains. The posture goal is to move the upper-body actuated joints to the zero position, except for the elbow joint, which should bend 90°. Summarizing,

  x_{com}^{goal} = \text{projected center of the base} \in \mathbb{R}^2,    (54)
  x_{hand}^{goal} = \text{position relative to body} \in \mathbb{R}^3,    (55)
  x_{posture}^{goal} = \text{posture with bent elbow} \in \mathbb{R}^{n_{upperbody}-1}.    (56)

We assume that the center of mass and hand tasks are nonsingular in their priority spaces. Therefore, the objectives of these tasks are to converge asymptotically to their goals. In contrast, the posture is required to be only locally stable. The proposed feedback/feedforward control laws are

  F_{com} = \Lambda_{com}^* \left( -k_{p,com} e_{com}^{goal} - k_{v,com} \dot{x}_{com} \right) + p_{com},    (57)
  F_{hand} = \Lambda_{hand}^* \left( -K_{p,hand} e_{hand}^{goal} - K_{v,hand} \dot{x}_{hand} \right) + p_{hand},    (58)
  F_{posture} = \Lambda_{posture}^* \left( -k_{p,posture} e_{posture}^{goal} - k_{v,posture} \dot{x}_{posture} \right) + p_{posture}.    (59)

Given the above linearizing control laws, the resulting closed-loop dynamics of the task coordinates become

  \ddot{x}_{com} + k_{p,com} e_{com}^{goal} + k_{v,com} \dot{x}_{com} = 0,    (60)
  \ddot{x}_{hand} + K_{p,hand} e_{hand}^{goal} + K_{v,hand} \dot{x}_{hand} = 0,    (61)

which are globally asymptotically stable, while the closed-loop dynamics of the posture become

  U_{r,posture}^T \left( \ddot{x}_{posture} + k_{p,posture} e_{posture}^{goal} + k_{v,posture} \dot{x}_{posture} \right) = 0,    (62)

which is locally stable. Note that there are alternative approaches for creating such a skill, for example inverse kinematics [22], resolved momentum control [12], or a whole-body impedance controller [7]. To the best of our knowledge, however, none of these is adept at handling irregular terrains.

Figure 5 shows a sequence of snapshots and the accompanying data of the experiment. The robot starts on flat terrain. Then the human operator pulls gently backward on the hand, and the robot transitions to the slope. After approximately 1 meter, the robot is pushed back to the horizontal area. Looking at the data plots, various phases can be discerned. During the first 8 seconds, the emergency stop button was engaged while the operator moved from the command interface to the robot. Fluctuations of hand position and postural error are due to the operator grasping Dreamer's upper arm.


After 8 seconds, the emergency button of the upper body is released, the hand position stabilizes, and the postural error is minimized. The independent power of the base is turned on after 10 seconds. Consequently, the COM relative error is minimized. After 15 seconds, the operator starts to guide the robot up the slope. It can be seen that the base moves for approximately 0.9 m at a speed of about 0.35 m/s. During this time, the balance task maintains the COM within 1 cm of its goal. The relative hand position also remains stable and close to its goal. Note that the operator keeps pulling on the hand during all of this, and notice that the postural error is minimized in the task's null space throughout the interaction.

V. CONCLUSION

Controlling mobile humanoid robots on irregular terrain entails modeling the free-body dynamics constrained by wheel contacts and, in our case, a biarticular transmission in the torso. Based on these foundations, we have presented a prioritized compliant controller which precisely tracks a hierarchy of tasks and optimizes a posture in the remaining degrees of redundancy. The general applicability of our approach has been demonstrated with a detailed study of closed-loop stability and an experiment that reflects some of the key challenges while remaining sufficiently simple to serve as an illustrative example. A remarkable result is that base movement can emerge quasi-spontaneously, adapting to the terrain without an explicit representation of the feasible directions of movement. This is enabled by the properly modeled contact constraints and by using an IMU to estimate spatial orientation. Another important consequence is that the controller does not require an explicit model of the wheel geometry and the direction of base motion. The whole-body skill used in the experiment is simple yet powerful, which demonstrates the synergistic capabilities of the framework and the robot.
Firstly, the posture behavior coupled with the compliant hand task ensures that the robot kinesthetically follows the human around the terrain. Secondly, the IMU sensor feedback and the balance task ensure that the center of mass always stays very close to the center of the horizontal projection of the base, even while transitioning between horizontal and sloped sections. Lastly, tracking and optimizing the various goals takes advantage of all degrees of freedom and constraints. Future directions of this work include detecting wheel contacts due to disturbances from terrain obstacles, dealing with wheel traction and slip, and improving the base hardware to reduce actuator friction and backlash and to improve transmission alignment and shock absorption.

REFERENCES

[1] O. Brock, O. Khatib, and S. Viji. Task-consistent obstacle avoidance and motion behavior for mobile manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 388-393, Washington, DC, USA, 2002.
[2] H. Bruyninckx. Open robot control software: the Orocos project. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), volume 3, pages 2523-2528, 2001.

[3] K.S. Chang, R. Holmberg, and O. Khatib. The augmented object model: Cooperative manipulation and parallel mechanism dynamics. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 470-475, 2000.
[4] C.C. Cheah, C. Liu, and J.J.E. Slotine. Adaptive tracking control for robots with unknown kinematic and dynamic properties. The International Journal of Robotics Research, 25(3):283-296, 2006.
[5] A. De Luca. Zero dynamics in robotic systems. Progress in Systems and Control Theory, Proceedings of the IIASA Workshop, 9:70-87, 1991.
[6] A. Dietrich, T. Wimbock, and A. Albu-Schaffer. Dynamic whole-body mobile manipulation with a torque controlled humanoid robot via impedance control laws. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3199-3206, September 2011.
[7] A. Dietrich, T. Wimbock, and A. Albu-Schaffer. Dynamic whole-body mobile manipulation with a torque controlled humanoid robot via impedance control laws. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3199-3206, September 2011.
[8] G. Freitas, F. Lizarralde, L. Hsu, and N.R. Salvi dos Reis. Kinematic reconfigurability of mobile robots on irregular terrains. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 1340-1345, May 2009.
[9] M. Fuchs, C. Borst, P.R. Giordano, A. Baumann, E. Kraemer, J. Langwald, R. Gruber, N. Seitz, G. Plank, K. Kunze, R. Burger, F. Schmidt, T. Wimboeck, and G. Hirzinger. Rollin' Justin - design considerations and realization of a mobile platform for a humanoid upper body. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 4131-4137, May 2009.
[10] M. Galicki. An adaptive regulator of robotic manipulators in the task space. IEEE Transactions on Automatic Control, 53(4):1058-1062, 2008.
[11] R. Holmberg and O. Khatib.
Development and control of a holonomic mobile robot for mobile manipulation tasks. The International Journal of Robotics Research, 19(11):1066-1074, 2000.
[12] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Resolved momentum control: Humanoid motion planning based on the linear and angular momentum. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1644-1650, Las Vegas, USA, October 2003.
[13] F. Kanehiro, H. Hirukawa, and S. Kajita. OpenHRP: Open architecture humanoid robotics platform. The International Journal of Robotics Research, 23(2):155-165, 2004.
[14] O. Khatib. A unified approach for motion and force control of robot manipulators: The operational space formulation. IEEE Journal on Robotics and Automation, 3(1):43-53, 1987.
[15] O. Khatib, L. Sentis, J.H. Park, and J. Warren. Whole body dynamic behavior and control of human-like robots. International Journal of Humanoid Robotics, 1(1):29-43, March 2004.
[16] O. Matsumoto, S. Kajita, K. Tani, and M. Oooto. A four-wheeled robot to pass over steps by changing running control modes. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 1700-1706, 1995.
[17] K. Nagasaka, Y. Kawanami, S. Shimizu, T. Kito, T. Tsuboi, A. Miyamoto, T. Fukushima, and H. Shimomura. Whole-body cooperative force control for a two-armed and two-wheeled mobile robot using generalized inverse dynamics and idealized joint units. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 3377-3383, May 2010.
[18] G. Oriolo. Stabilization of self-motions in redundant robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 704-709, 1994.
[19] R. Philippsen, L. Sentis, and O. Khatib. An open source extensible software package to create whole-body compliant skills in personal mobile manipulators.
In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 1036–1041. IEEE, 2011. [20] L. Sentis. Synthesis and Control of Whole-Body Behaviors in Humanoid Systems. PhD thesis, Stanford University, Stanford, USA, 2007. [21] L. Sentis and O. Khatib. Synthesis of whole-body behaviors through hierarchical control of behavioral primitives. International Journal of Humanoid Robotics, 2(4):505–518, December 2005. [22] R. Tellez, F. Ferro, S. Garcia, E. Gomez, E. Jorge, D. Mora, D. Pinyol, J. Oliver, O. Torres, J. Velazquez, and D. Faconti. Reem-b: An autonomous lightweight human-size humanoid robot. In Humanoid Robots, 2008. Humanoids 2008. 8th IEEE-RAS International Conference on, pages 462 –468, dec. 2008. [23] N.D. Vuong, MH Ang, T.M. Lim, and S.Y. Lim. An analysis of the operational space control of robots. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 4163–4168. IEEE, 2010.


Parsing Indoor Scenes Using RGB-D Imagery

Camillo J. Taylor and Anthony Cowley
GRASP Laboratory, University of Pennsylvania, Philadelphia, PA 19104
Email: [cjtaylor,acowley]@cis.upenn.edu

Abstract—This paper presents an approach to parsing the Manhattan structure of an indoor scene from a single RGB-D frame. The problem of recovering the floor plan is recast as an optimal labeling problem which can be solved efficiently using Dynamic Programming.

I. INTRODUCTION

This paper considers the problem of parsing RGB-D images of indoor scenes, such as the one shown in Figure 1, to extract an underlying floor plan defined by the delimiting walls.

Fig. 1. (a) RGB image. (b) Depth image. (c) Inferred floor plan.

It is important to note that this parsing problem involves more than simply identifying the wall surfaces in the scene. While this is a necessary first step, the extracted walls are effectively infinite planes supported by a finite set of depth measurements. In order to produce a floor plan one must reason about the extent of each plane and how the walls interact with each other to form corners, occlusion edges and other structures. Furthermore, in a typical scene one must contend with the fact that wall segments are often occluded by furniture, pedestrians and general clutter and that the depth measurements may be inaccurate or missing entirely. An effective parsing procedure needs to be able to infer the extents of the wall surfaces even in the presence of ambiguous, uncertain and incorrect input measurements. The approach proposed in this paper deals with all of these problems by formulating the parsing process as an optimal labeling problem which can be solved exactly and efficiently using Dynamic Programming. A number of researchers have addressed the related, but more challenging problem of inferring the 3D structure of a scene from monocular imagery. Gupta et al. [5] describe an interesting system for reasoning about scene structure using qualitative geometric and structural constraints. Hedau and his colleagues [6, 7] have explored algorithms for inferring the layout of indoor scenes based on vanishing points and other cues. Lee et al. [9] present an interesting approach to

reasoning about scene structure using volumetric constraints. Saxena and Ng [12] recast the problem in terms of a Markov Random Field and infer the scene structure based on learned models. Furukawa et al. [4] describe an impressive system for recovering the structure of indoor scenes that utilizes a sequence of monocular images and employs a sophisticated but expensive volumetric analysis procedure. In this paper we make use of the range data provided by the Kinect sensor, which simplifies the interpretation problem and provides more accurate parsing results. Recently Silberman and Fergus [13] have addressed the problem of scene analysis using RGB-D data. In this work the analysis problem is framed in terms of pixel labeling, where the goal is to assign each pixel in the frame an appropriate class label. The goal in our work is to go beyond a labeling of the visible pixels and to instead propose a coherent floor plan that accurately extrapolates the underlying structure of the scene even in the face of clutter. In this paper we choose to focus on extracting the floors and walls since these represent the major structures which delimit the extent of the scene and provide the underlying context for other structures such as doors, tables and chairs. The approach to interpretation taken in this paper is most similar to the one given by Lee, Hebert and Kanade [10], who propose an approach to extracting the Manhattan structure of a frame based on vanishing points and an analysis of possible corners in the scene. Our approach makes use of the 2.5D structure of the image in the same way that they do but takes a different approach to formulating and solving the parsing problem. The proposed approach is also similar to the parsing scheme described by Flint et al. [3, 2], who also make use of Dynamic Programming to efficiently propose interpretations of indoor scenes from monocular imagery.
The principal differences between this work and that of Flint et al. are that ours takes as input an RGB-D frame and begins by explicitly extracting planar surfaces, and that the subsequent dynamic programming optimization procedure is phrased in terms of the RGB-D measurements as opposed to monocular and stereo cues. Several schemes have been proposed to address the problem of interpreting point cloud data. Rusu et al. [11] describe an impressive system for parsing range scans acquired from indoor kitchen scenes. Toshev et al. [14] describe a scheme that has been used to automatically parse range scans to produce building models. Anguelov et al. [1] and Lalonde et al. [8]


describe schemes for classifying regions in point cloud data sets to identify buildings, trees and other structures. Most of these schemes were designed to work offline in a batch fashion. In this context one can afford to make several passes over the data to identify nearest neighbors, or to fuse neighboring regions. The goal in this work is to develop a scheme that can ultimately be run in an online fashion so that it can be used to parse the data as it is being acquired. Another salient difference is the fact that this approach seeks to exploit the relationship between the 2D range image and the associated imagery to accelerate the interpretation process.

II. TECHNICAL APPROACH

A. Image Segmentation

Fig. 3. (a) Extracted intensity edges. (b) Image segmentation.

The first step in our analysis is an edge-accurate segmentation scheme that breaks the RGB imagery into coherent, disjoint regions. The image segmentation procedure begins with a standard Canny edge extraction step which finds significant discontinuities in the intensity image as shown in Figure 3a. The detected edgels are then passed to a Delaunay Triangulation procedure which produces a triangular tessellation of the image. The resulting triangular graph is then segmented using an agglomerative merging procedure that repeatedly merges the two regions with the lowest normalized boundary cost. These merging costs are computed by considering the average HSV color in each region. This procedure can be efficiently implemented using a heap data structure and the entire segmentation procedure can be carried out in 0.1 seconds on a typical image in Matlab.
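As an illustration, the heap-driven merging loop can be sketched in Python with a union-find structure over regions. This is a hypothetical simplification, not the authors' Matlab implementation: the merge cost here is plain Euclidean distance between mean region colors rather than a normalized boundary cost, and the triangulation is abstracted into an adjacency list.

```python
import heapq

def agglomerative_merge(colors, edges, stop_cost):
    # colors: {region_id: (h, s, v) mean color}; edges: (a, b) adjacency pairs.
    parent = {r: r for r in colors}
    size = dict.fromkeys(colors, 1)
    mean = {r: list(c) for r, c in colors.items()}

    def find(r):                       # union-find with path compression
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    def cost(a, b):                    # Euclidean distance between mean colors
        return sum((x - y) ** 2 for x, y in zip(mean[a], mean[b])) ** 0.5

    heap = [(cost(a, b), a, b) for a, b in edges]
    heapq.heapify(heap)
    while heap:
        c, a, b = heapq.heappop(heap)
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        current = cost(ra, rb)
        if abs(current - c) > 1e-12:   # stale entry: regions changed, re-queue
            heapq.heappush(heap, (current, ra, rb))
            continue
        if current > stop_cost:        # cheapest remaining merge too expensive
            break
        parent[rb] = ra                # merge rb into ra, update weighted mean
        n = size[ra] + size[rb]
        mean[ra] = [(size[ra] * x + size[rb] * y) / n
                    for x, y in zip(mean[ra], mean[rb])]
        size[ra] = n
    return {r: find(r) for r in colors}
```

Lazy deletion of stale heap entries keeps each pop cheap, which is the property that makes the heap formulation fast in practice.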

Fig. 2. Flowchart of overall parsing procedure: segment RGB image; extract planar segments; find dominant rectilinear structure; identify candidate wall segments; divide image into intervals; extract wall layout.

The overall procedure for parsing the scene based on an RGB-D image is outlined in the flowchart given in Figure 2. The first stage in the pipeline segments the input RGB image into regions based on extracted image edges. The second stage of processing uses the image segmentation as a prior to search for planar surfaces in the scene. The third stage identifies the floor of the scene and estimates the dominant rectilinear orientation. The fourth stage considers the set of extracted planes and identifies segments that could serve as walls. The fifth stage breaks the image up into intervals based on the extracted wall segments. The final stage estimates the layout of the scene by labeling each interval with the index of the underlying wall. Each of these stages is explained in more detail in the following subsections, using the scene shown in Figure 1 as a running example.

B. Extracting Planar Surfaces

Fig. 4. (a) Extracted planes. (b) Result after coplanar segments are merged.

The regions derived from the image segmentation are used to suggest groupings of the depth samples from the range imagery. More specifically, the depth samples associated with each of the image regions are passed to a RANSAC routine which is used to recursively divide the point set into planar regions. A key advantage of the proposed approach is that the image segmentation procedure is very effective at suggesting useful groupings so very few RANSAC iterations are needed to discover the structures of interest. Effectively, the image segmentation serves to focus the computational effort of the procedure on groupings that are likely to yield fruitful interpretations so the procedure is able to discover relevant groupings quickly even in complex environments with several surfaces.


It is important to keep in mind that the depth measurements produced by the Kinect sensor are derived from structured light via triangulation as opposed to time of flight. As such the device is best thought of as measuring disparity, which is inversely related to depth. One practical consequence is that the error in the depth estimates increases rapidly as one gets further away from the sensor, which argues against standard approaches to fitting planes to points based on the residual error in 3D. In the proposed scheme the planar surfaces are fit to the RGB-D measurements by exploiting the observation that planar surfaces in the scene will project to planar regions in the disparity image. This can be seen by taking the standard equation for a plane in the coordinate frame of the sensor:

n_x X + n_y Y + n_z Z = c

and dividing through by the scene depth, Z, to obtain:

n_x (X/Z) + n_y (Y/Z) + n_z = c (1/Z)

Noting that u = X/Z and v = Y/Z correspond to the normalized image coordinates while w = 1/Z denotes the measured disparity at that coordinate, the disparity w = (n_x u + n_y v + n_z)/c is an affine function of (u, v). This means that planar regions in the scene can be extracted by fitting affine models to the disparity in each image region. Figure 4 shows the results of the planar interpretation procedure on the sample scene. Here the different colors correspond to different planar segments that were recovered. These planar segments are then passed to a greedy merging procedure which seeks to group coplanar segments into extended regions as shown in Figure 4b.
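A minimal sketch of this fitting step, assuming each region supplies samples of normalized coordinates and disparity, might look as follows. The names `fit_affine` and `ransac_disparity_plane` are hypothetical, and a real implementation would refine the winning model on its inlier set with least squares rather than keep the minimal-sample fit.

```python
import random

def fit_affine(p1, p2, p3):
    # Solve w = a*u + b*v + d exactly from three (u, v, w) samples
    # via Cramer's rule on the 3x3 linear system.
    (u1, v1, w1), (u2, v2, w2), (u3, v3, w3) = p1, p2, p3
    det = u1 * (v2 - v3) - v1 * (u2 - u3) + (u2 * v3 - u3 * v2)
    if abs(det) < 1e-12:
        return None                    # degenerate (collinear) sample
    a = (w1 * (v2 - v3) - v1 * (w2 - w3) + (w2 * v3 - w3 * v2)) / det
    b = (u1 * (w2 - w3) - w1 * (u2 - u3) + (u2 * w3 - u3 * w2)) / det
    d = (u1 * (v2 * w3 - v3 * w2) - v1 * (u2 * w3 - u3 * w2)
         + w1 * (u2 * v3 - u3 * v2)) / det
    return a, b, d

def ransac_disparity_plane(samples, iters=100, tol=1e-3):
    # samples: (u, v, w) triples, with w the measured disparity at (u, v).
    best, best_inliers = None, []
    for _ in range(iters):
        model = fit_affine(*random.sample(samples, 3))
        if model is None:
            continue
        a, b, d = model
        inliers = [s for s in samples
                   if abs(a * s[0] + b * s[1] + d - s[2]) < tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = model, inliers
    return best, best_inliers
```

Because the segmentation already groups mostly-coplanar pixels, a handful of iterations typically suffices, which is the efficiency argument made above.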

C. Finding Dominant Rectilinear Structure

The planar extraction procedure returns a set of segments which can then be analyzed to identify salient structures. The analysis procedure assumes that the vertical axis of the image is roughly aligned with the gravity vector. Given this assumption, the first step in the interpretation procedure is to identify the floor plane by searching for a large planar region near the bottom of the image whose normal is approximately vertical and which appears to underlie most of the other 3D points. The normal to this plane defines the direction of the gravity vector in the RGB-D sensor’s frame of reference. Candidate wall segments are identified by looking for extended planar segments whose normals are perpendicular to this gravity direction. Each candidate wall segment effectively defines an associated rectilinear orientation for the scene where the z-axis corresponds to the gravity direction, the x-axis corresponds to the normal to the wall segment and the y-axis is simply the cross product of these normal vectors. This rectilinear orientation can be compactly represented with a single rotation matrix R_cw = [x̂ ŷ ẑ] ∈ SO(3), which captures the orientation of the RGB-D sensor with respect to the Manhattan structure of the scene. The interpretation system cycles through each of the candidate wall segments and scores the associated rectilinear

orientation by determining how many of the other planar surfaces are aligned with one of the cardinal axes. The candidate rotation matrix with the most support is chosen as the dominant rectilinear orientation. The fourth column of Figure 8 depicts the extracted rectilinear structure by showing how various segments are aligned with the dominant axes. In addition to the walls extracted by the fitting procedure, 4 additional walls are added to form an axis-aligned bounding box that surrounds all of the recovered points. This bounding box serves as a ‘backdrop’ providing a default interpretation for every point in the scene.

D. Identify Candidate Walls and Wall Segments

Once the dominant rectilinear structure of the scene has been established, the system identifies extracted planar segments that may be walls in the scene. This is accomplished by finding planar structures in the scene whose normals are aligned with either the x or y axis and that have an appropriate horizontal and vertical extent. Note that in practice the walls are often partially occluded by furniture and other clutter, so there is no requirement that the wall segment extend from the floor to the ceiling. The third column of Figure 8 shows the results of the analysis that identifies wall segments. Note that for the purposes of our experiments tall cabinets and office partitions can serve as walls since they have an appropriate extent and serve to delimit the floor plan of the scene. Once the candidate walls have been identified in the image, the points that make up that segment are further divided into contiguous sections called wall segments. This partitioning accounts for the fact that a wall may stop and start in the scene, as the dominant wall in Figure 1 does. These wall segments represent continuous stretches of wall surface observed in the RGB-D image.
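The orientation hypothesis and scoring steps can be sketched as follows; these are hypothetical helper functions, not the authors' code. Each candidate wall normal, together with the gravity direction, generates a rotation hypothesis, which is then scored by counting how many plane normals align (up to sign) with one of its cardinal axes.

```python
def _norm(v):
    m = sum(c * c for c in v) ** 0.5
    return tuple(c / m for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def candidate_frame(gravity, wall_normal):
    # z: gravity direction; y: z x wall_normal; x: y x z, so that x is the
    # wall normal re-orthogonalized against gravity. Returns unit row vectors.
    z = _norm(gravity)
    y = _norm(_cross(z, _norm(wall_normal)))
    x = _cross(y, z)
    return x, y, z

def score_frame(frame, plane_normals, tol=0.95):
    # Count plane normals whose |dot| with some cardinal axis exceeds tol.
    aligned = 0
    for n in plane_normals:
        n = _norm(n)
        if max(abs(sum(p * q for p, q in zip(n, axis))) for axis in frame) > tol:
            aligned += 1
    return aligned
```

The hypothesis with the highest score would then be retained as the dominant rectilinear orientation.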
Figure 5 makes the distinction between walls, which are modeled as infinite planes, and wall segments, which are thought of as finite sections of wall surface observed in the RGB-D image.

E. Divide Image into Intervals

Figure 6 shows an idealized image of an indoor scene where the vertical axis of the camera is aligned with the gravity direction. In this case vertical lines in the scene will project to vertical lines in the image. In particular, the vertical lines corresponding to corners in the scene or to points where one wall segment occludes another would effectively subdivide the horizontal field of view into a sequence of disjoint intervals as shown. Each interval in the image would be associated with a wall. This structure was noted and exploited by Lee, Hebert and Kanade [10] in their work on parsing indoor scenes from single images. The parsing scheme proposed in this paper proceeds by breaking the image into a sequence of intervals and then associating a wall with each span to produce an interpretation of the scene. The endpoints of the intervals are derived from the extents of the extracted wall segments and from the inferred intersections between all pairs of perpendicular walls. These intersections are depicted by circles in Figure 5. All of


Fig. 5. An overhead view of a floor plan of the scene indicating the walls, which are considered to have infinite extent and are depicted by dashed lines, and the wall segments, which are depicted by the solid line segments. These wall segments correspond to contiguous planar regions in the RGB-D image. The interpretation system also considers all of the possible intersections between perpendicular walls, denoted by circles in the figure, since these may correspond to corners in the scene.

Fig. 6. Idealized figure of a frame showing how the horizontal field of view can be divided into non-overlapping intervals each of which will be associated with an underlying wall.

these putative endpoints are projected into the horizontal field of view of the sensor and then sorted from right to left to define image intervals as shown in Figure 6. Including more endpoints than needed is not a problem since the subsequent labeling stage can consolidate neighboring intervals as needed. Note that we do not require our input images to be perfectly aligned with gravity as shown in this idealized view since the rotation matrix Rcw recovered in the rectilinear analysis stage allows us to effectively rectify the input image to account for the tilt and roll of the sensor. F. Extract Wall Layout As noted in the previous subsection, the scene interpretation procedure is rephrased as a labeling problem where the goal is to label each interval in the horizontal field of view with an associated wall segment. The extents of the intervals would define the extents of the walls from the point of view of the
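Assuming the putative endpoints have already been reduced to horizontal view angles, the subdivision step amounts to filtering, sorting and pairing. This is a hypothetical sketch (here intervals are ordered left to right, and the function name is an assumption):

```python
def image_intervals(endpoint_angles, hfov):
    # endpoint_angles: horizontal view angles (radians) of candidate interval
    # boundaries; hfov: effective horizontal field of view of the sensor.
    half = hfov / 2.0
    # Keep endpoints inside the field of view; always include its two edges.
    angles = sorted({a for a in endpoint_angles if -half <= a <= half}
                    | {-half, half})
    # Consecutive angles bound the disjoint image intervals.
    return list(zip(angles, angles[1:]))
```

Duplicate endpoints collapse in the set, and, as noted above, surplus boundaries are harmless because the labeling stage can consolidate neighboring intervals.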

Fig. 7. For each pixel in the frame we can compute the index of the wall that best explains the observed disparity and orientation. We can then evaluate how well a particular wall explains a given image interval by projecting the wall into the frame at that interval and determining how many pixels in that region agree with the proposed label. The quadrilateral demarcated on the frame corresponds to the projection of one of the prospective walls into one of the image intervals.

observer. This can be viewed as a graph labeling problem where the nodes in the graph are the intervals and neighboring intervals are connected by edges to form a simple 1D chain. We associate a cost with assigning wall labels to each of the intervals and a cost to assigning different wall labels to neighboring segments and then find a labeling which minimizes the overall cost. In order to assign costs to each interval, the procedure first determines the optimal wall segment label for each pixel in the frame. This is accomplished by projecting each wall into the RGB-D frame and comparing the disparity and orientation predicted at each pixel with the observed disparity and orientation. The predicted depth or disparity at each pixel can easily be calculated based on the best fit normal to the plane. The system considers each of the extracted walls in turn and each pixel in the frame retains the index of the wall that comes closest to predicting the observed disparity and surface normal at that point. Once this initial labeling step has been completed, we can evaluate the cost of assigning a particular image interval to a particular wall candidate by projecting a quadrilateral defined by the interval endpoints and the ceiling and floor heights into the frame as shown in Figure 7. The system considers all of the pixels that lie within that region and measures what fraction of these pixels do not have the wall in question as their first choice. This fraction indicates the cost of assigning the wall label to the interval. The lower the number, the better the match. This score is also weighted by the apparent size of the interval in the image. More specifically a normalized interval size is computed by taking the angle subtended by the interval in the image and dividing it by the effective horizontal field of view of the image to produce a number between 0 and 1.
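This scoring rule can be sketched as a small function with hypothetical argument names (`pixel_best_wall` would hold the per-pixel best-wall indices inside the projected quadrilateral):

```python
def interval_cost(pixel_best_wall, wall_label, interval_angle, hfov):
    # Fraction of pixels in the interval's projected quadrilateral whose
    # best-matching wall is NOT the proposed label, weighted by the
    # normalized angular size of the interval.
    if not pixel_best_wall:
        return interval_angle / hfov       # no supporting pixels: worst case
    mismatched = sum(1 for w in pixel_best_wall if w != wall_label)
    return (interval_angle / hfov) * (mismatched / len(pixel_best_wall))
```

Lower values indicate a better match between the candidate wall and the observed pixels in that interval.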

404

The product of this normalized interval size and the fractional score mentioned in the previous paragraph is used as the final label score for the interval. Unfortunately it is typically not sufficient to simply assign each interval the index of the wall with the lowest cost. Missing depth measurements, occlusion and inherent ambiguities serve to introduce spurious labelings. The proposed scheme makes use of a global optimization procedure to enforce smoothness and produce a more coherent interpretation in the face of these uncertainties. The labeling procedure exploits the fact that the nodes in the graph we are labeling form a simple chain, which allows us to find the globally optimal assignment efficiently via Dynamic Programming. In order to encourage continuity in the labeling process, a cost is associated with each of the edges in the chain. This transition cost is designed to capture the cost associated with assigning adjacent intervals to different wall segments. If the proposed labeling induces a significant change in depth at the junction between the two image intervals then a fixed transition penalty is applied; in our experiments this transition penalty was fixed at 0.03. Note that if two perpendicular walls meet to form a corner there is no depth discontinuity and the associated transition penalty would be zero. The objective function that we are interested in minimizing takes the following form:

O(l) = Σ_{i=1}^{n_i} ( f_i(l_i) + e_i(l_i, l_{i-1}) )    (1)

where n_i denotes the number of intervals, f_i(l_i) denotes the cost of associating interval i with wall label l_i, and e_i(l_i, l_{i-1}) represents the transition cost associated with assigning interval i the label l_i while interval i−1 is assigned the label l_{i-1}. The optimization problem can be tackled in stages, where at each stage the system considers a series of optimization problems of the form:

D_{i,j} = min_k [ f_i(j) + e_i(j, k) + D_{i-1,k} ]    (2)

where D_{i,j} represents the cumulative cost of assigning interval i the label j at this stage. The Dynamic Programming procedure systematically considers all of the legal labels that can be assigned to each interval. Note that at each stage this optimization problem uses the results from the previous optimization stage. The computational complexity of the optimization is O(n_i n_w^2), where n_w denotes the number of wall labels. On a typical problem n_i is on the order of 20 while n_w is on the order of 10, so the computational cost of the optimization is usually quite small. The final output of the optimization procedure is a labeling of each of the image intervals. This labeling can be used to construct the 2.5D scene models shown in Figure 8. The overall strategy of phrasing the interpretation process as an optimal labeling problem that can be solved with Dynamic Programming is similar to the approach proposed by Flint et al. [3, 2]; however, the schemes used to define the interpretation

costs are quite different because this approach exploits RGB-D imagery.

III. EXPERIMENTAL RESULTS

In order to evaluate the effectiveness of the proposed algorithm, it was applied to a data set of 38 RGB-D images taken from various vantage points in a typical office environment. For each of the test images an interpretation score was manually generated by counting the number of image intervals where the wall assignment suggested by the automatic procedure differed from the human's assessment. A score of zero would indicate that the system had correctly labeled all of the wall surfaces in the scene, while a score of 1 would indicate that one of the image intervals was labeled incorrectly. On 20 of these images the procedure produced a completely correct interpretation, extracting all wall surfaces even in the presence of significant clutter; on 16 of the images one of the recovered wall segments was labeled incorrectly. In the vast majority of these cases the erroneous parse covers a small section of the frame and the error is caused by incomplete range data. On two of the frames the procedure failed to produce an intelligible result. The same parameter settings were used on all of the examples.

Figure 8 shows samples of the scenes that the system parsed correctly. Note that the scheme was able to handle situations with significant amounts of clutter, such as the third and fifth cases. It can also correctly handle cases with relatively complex occlusion structures, as in the second example. Note that the system correctly recovers small features like the edges of the doorway in the second example and the structure of the water cooler alcove in the third example. It is also able to deal correctly with the clutter in the fourth and fifth examples. Figure 9 shows examples of the cases where one of the planes is labeled incorrectly. Note that in all of these cases the errors are fairly subtle and the gross structure of the scene is actually recovered quite well.
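For reference, the chain dynamic program of Eqs. (1) and (2) can be sketched in a few lines. This is a hypothetical illustration, not the authors' Matlab implementation; `transition` plays the role of e_i and `costs` the role of f_i.

```python
def label_intervals(costs, transition):
    # costs[i][j] = f_i(j); transition(i, j, k) = e_i(j, k) for label j at
    # interval i following label k at interval i-1.
    n, m = len(costs), len(costs[0])
    D = [list(costs[0])]               # D[0][j] = f_0(j)
    back = []                          # backpointers for recovering labels
    for i in range(1, n):
        row, brow = [], []
        for j in range(m):
            k = min(range(m), key=lambda k: D[i - 1][k] + transition(i, j, k))
            row.append(costs[i][j] + transition(i, j, k) + D[i - 1][k])
            brow.append(k)
        D.append(row)
        back.append(brow)
    # Backtrack from the cheapest final label to recover the optimal chain.
    labels = [min(range(m), key=lambda j: D[-1][j])]
    for brow in reversed(back):
        labels.append(brow[labels[-1]])
    return labels[::-1]
```

With a small transition penalty the labeling follows the per-interval costs; a larger penalty smooths out isolated label switches, which is exactly the continuity behavior described above.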
For example, in the fifth case the person in the foreground is labeled as a wall, and in the fourth example the end of the corridor is not parsed correctly because it is beyond the range of the sensor.

The entire segmentation and analysis procedure is implemented in Matlab and it takes approximately 6 seconds to run the complete analysis on a typical RGB-D image on a Macbook Pro laptop. No attempt has been made to optimize the code and we expect that a more efficient implementation would run significantly faster. All of the code and datasets used in this paper are freely available online at the following website: http://www.cis.upenn.edu/~cjtaylor/RESEARCH/projects/RGBD/RGBD.html.

IV. CONCLUSIONS AND FUTURE WORK

This paper has presented an approach to parsing the floor plan of an indoor scene from a single RGB-D frame by finding a set of candidate walls and delimiting their extent in the image. The problem of parsing the scene is recast as an


optimal labeling problem which can be solved efficiently using Dynamic Programming. In this sense, the method exploits the 2.5D structure of the image to simplify the scene interpretation problem. The analysis provides a compact description of the overall structure of the scene in terms of a floor plane and wall segments. This representation can serve as a basis for further semantic analysis which would identify other structures such as doors, tables, chairs and windows. We note that while the Manhattan structure is a convenient cue, it is not essential to this approach. The same basic scheme could be employed to parse environments where some of the walls do not adhere to the rectilinear model.

Future work will seek to merge parse results from a sequence of RGB-D frames into larger floor plans. Here the Manhattan structure provides a convenient framework for accumulating information about recovered wall segments into a coherent map in an incremental fashion. Such an approach could be used to produce semantic decompositions of extended regions which could be useful in a variety of robotic applications.

ACKNOWLEDGMENTS

This research was partially sponsored by the Army Research Laboratory Cooperative Agreement Number W911NF-10-20016 and by the National Science Foundation through an I/UCRC grant on Safety, Security, and Rescue Robotics.

REFERENCES

[1] D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, and A. Ng. Discriminative learning of Markov random fields for segmentation of 3D scan data. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 169–176, June 2005.

[2] A. Flint, D. Murray, and I. Reid. Manhattan scene understanding using monocular, stereo, and 3D features. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2228–2235, November 2011.

[3] Alex Flint, Christopher Mei, David Murray, and Ian Reid. A dynamic programming approach to reconstructing building interiors.
In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision – ECCV 2010, volume 6315 of Lecture Notes in Computer Science, pages 394–407. Springer Berlin / Heidelberg, 2010.

[4] Yasutaka Furukawa, Brian Curless, Steven M. Seitz, and Richard Szeliski. Reconstructing building interiors from images. In International Conference on Computer Vision, pages 80–87, Kyoto, October 2009.

[5] Abhinav Gupta, Alexei A. Efros, and Martial Hebert. Blocks world revisited: image understanding using qualitative geometry and mechanics. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10, pages 482–496, Berlin, Heidelberg, 2010. Springer-Verlag.

[6] Varsha Hedau, Derek Hoiem, and David Forsyth. Recovering the spatial layout of cluttered rooms. In Proceedings of the IEEE International Conference on Computer Vision (ICCV '09), 2009.

[7] Varsha Hedau, Derek Hoiem, and David Forsyth. Thinking inside the box: using appearance models and context based on room geometry. In Proceedings of the 11th European Conference on Computer Vision: Part VI, ECCV'10, pages 224–237, Berlin, Heidelberg, 2010. Springer-Verlag.

[8] J. F. Lalonde, N. Vandapel, D. F. Huber, and M. Hebert. Natural terrain classification using three-dimensional ladar data for ground robot mobility. Journal of Field Robotics, 23(10):839–861, 2006.

[9] David Changsoo Lee, Abhinav Gupta, Martial Hebert, and Takeo Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. Advances in Neural Information Processing Systems (NIPS), 24, November 2010.

[10] D.C. Lee, M. Hebert, and T. Kanade. Geometric reasoning for single image structure recovery. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2136–2143, June 2009.

[11] Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, and Michael Beetz. Towards 3D point cloud based object maps for household environments. Robotics and Autonomous Systems Journal (Special Issue on Semantic Knowledge), 2008.

[12] Ashutosh Saxena, Min Sun, and Andrew Y. Ng. Make3D: Learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:824–840, 2009.

[13] N. Silberman and R. Fergus. Indoor scene segmentation using a structured light sensor. In Proceedings of the International Conference on Computer Vision - Workshop on 3D Representation and Recognition, 2011.

[14] A. Toshev, P. Mordohai, and B. Taskar. Detecting and parsing architecture at city scale from range data. In IEEE Conference on Computer Vision and Pattern Recognition, pages 398–405, 2010.


Fig. 8. Figure showing some successful scene parsing results. The second column shows the planar segments extracted from the scene; different colors are used to label points belonging to different planes. The third column shows the points belonging to candidate wall segments, and the fourth column shows the results of the analysis that finds the dominant rectilinear structure. The final column shows the results of parsing the image into wall segments; each pixel is assigned a label corresponding to the underlying wall.


Fig. 9. Figure showing cases where one of the recovered planes has an error. The second column shows the planar segments extracted from the scene; different colors are used to label points belonging to different planes. The third column shows the points belonging to candidate wall segments, and the fourth column shows the results of the analysis that finds the dominant rectilinear structure. The final column shows the results of parsing the image into wall segments; each pixel is assigned a label corresponding to the underlying wall.


Toward Information Theoretic Human-Robot Dialog

Stefanie Tellex1†, Pratiksha Thaker1†, Robin Deits‡, Dimitar Simeonov†, Thomas Kollar§, Nicholas Roy†

† MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA. Email: {stefie10, prthaker, mitko, nickroy}@csail.mit.edu
‡ Battelle Memorial Institute, Columbus, Ohio. Email: [email protected]
§ Carnegie Mellon University, Pittsburgh, PA. Email: [email protected]

Abstract—Our goal is to build robots that can robustly interact with humans using natural language. This problem is challenging because human language is filled with ambiguity, and furthermore, due to limitations in sensing, the robot’s perception of its environment might be much more limited than that of its human partner. To enable a robot to recover from a failure to understand a natural language utterance, this paper describes an information-theoretic strategy for asking targeted clarifying questions and using information from the answer to disambiguate the language. To identify good questions, we derive an estimate of the robot’s uncertainty about the mapping between specific phrases in the language and aspects of the external world. This metric enables the robot to ask a targeted question about the parts of the language for which it is most uncertain. After receiving an answer, the robot fuses information from the command, the question, and the answer in a joint probabilistic graphical model in the G3 framework. When using answers to questions, we show the robot is able to infer mappings between parts of the language and concrete object groundings in the external world with higher accuracy than by using information from the command alone. Furthermore, we demonstrate that by effectively selecting which questions to ask, the robot is able to achieve significant performance gains while asking many fewer questions than baseline metrics.

I. INTRODUCTION

Our aim is to make robots that can naturally and flexibly interact with a human partner via natural language. An especially challenging aspect of natural language communication is the use of ambiguous referring expressions that do not map to a unique object in the external world. For instance, Figure 1 shows a robotic forklift in a real-world environment paired with instructions created by untrained users to manipulate one of the objects in the scene. These instructions contain ambiguous phrases such as "the pallet," which could refer equally well to multiple objects in the environment. Even if the person gives a command that would be unambiguous to another person, they might refer to aspects of the world that are not directly accessible to the robot's perception. For example, one of the commands in Figure 1 refers to "the metal crate." If a robot does not have access to perceptual features corresponding to the words "metal" or "crate," it cannot disambiguate which object is being referenced. In this paper, we present an approach for enabling robots to recover from failures like these by asking a clarifying question, the same strategy that humans use when faced with ambiguous

¹ The first two authors contributed equally to this paper.

Fig. 1: Sample natural language commands collected from untrained users, commanding the forklift to pick up a pallet in (a). The commands in (b): "Move the pallet from the truck." "Remove the pallet from the back of the truck." "Offload the metal crate from the truck."

language. The robot first identifies the most ambiguous parts of a command, then asks a targeted question to try to reduce its uncertainty about which aspects of the external world correspond to the language. For example, when faced with a command such as "Pick up the pallet on the truck" in the situation shown in Figure 1, the robot can infer that because there is only one truck in the scene, but two pallets, the phrase "the pallet" is the most ambiguous, and it can ask a question like, "What do you mean by 'the pallet'?" It can then use information from the answer to disambiguate which object is being referenced, in order to infer better actions in response to the natural language command. Previous approaches to robotic question-asking do not directly map between natural language and perceptually grounded aspects of the external world, nor do they incorporate additional information from free-form natural language answers in order to disambiguate the command [3, 11, 1]. As a result, the robot cannot take advantage of its model of the environment to determine the most ambiguous parts of an arbitrary natural language command and identify a question to ask.


In order to derive an expression for the robot's uncertainty about groundings in the external world, our approach builds on the Generalized Grounding Graph (G3) framework [18, 17]. The G3 framework defines a probabilistic model that maps between parts of the language and groundings in the external world, which can be objects, places, paths, or events. The model factors according to the linguistic structure of the natural language input, enabling efficient training from a parallel corpus of language paired with corresponding groundings. However, this factorization requires introducing a new correspondence variable, which leads to difficulties when estimating the entropy over grounding variables. In this paper we derive a metric based on entropy using the G3 framework. The robot uses this metric to identify the most uncertain random variables in the model in order to select a question to ask. Once the robot has asked a question, we show that it can exploit information from an answer produced by an untrained user by merging variables in the grounding graph based on linguistic coreference. By performing inference in the merged model, the robot infers the best set of groundings corresponding to the command, the question, and the answer. We evaluate the system by collecting answers to questions created by the robot using crowdsourcing. We demonstrate that the system is able to incorporate information from the answer in order to more accurately ground concrete noun phrases in the language to objects in the external world. Furthermore, we show that our metric for identifying uncertain variables to ask questions about significantly reduces the number of questions the robot needs to ask in order to resolve its uncertainty. This work expands on previous work presented in [14].

II. BACKGROUND

We briefly review grounding graphs, which were introduced by [18], giving special attention to the motivation for the correspondence variable, Φ. The correspondence variable makes it possible to efficiently train the model using local normalization at each factor, but complicates the calculation of entropy described in Section III-A. In order for a robot to understand natural language, it must be able to map between words in the language and corresponding groundings in the external world. The aim is to find the most probable groundings γ1 . . . γN given the language Λ and the robot's model of the environment m:

argmax_{γ1 . . . γN} p(γ1 . . . γN | Λ, m)    (1)
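As an illustration, the maximization in Equation 1 can be read as a search over joint assignments of candidate groundings to linguistic constituents. The following is a toy Python sketch of that search (our own illustration, not part of the G3 framework as published; the `score` function is a hypothetical stand-in for the learned model):

```python
import itertools

def ground(constituents, candidates, score):
    """Brute-force version of Equation 1: evaluate every joint
    assignment of a candidate grounding to each linguistic
    constituent and return the highest-scoring assignment.
    `score` stands in for p(gamma_1 .. gamma_N | Lambda, m)."""
    best, best_score = None, float("-inf")
    for assignment in itertools.product(candidates,
                                        repeat=len(constituents)):
        s = score(constituents, assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best
```

In practice this joint space is far too large to enumerate, which is what motivates the factored model and the beam search described in the remainder of this section.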

For brevity, we omit m from future equations. Groundings are the specific physical concepts that are meant by the language and can be objects (e.g., a truck or a door), places (e.g., a particular location in the world), paths (e.g., a trajectory through the environment), or events (e.g., a sequence of robot actions). To learn this distribution, one standard approach is to factor it based on certain independence assumptions, then train models for each factor. Natural language has a well-known compositional, hierarchical argument structure [6], and a promising approach is to exploit this structure in order to factor the model. However, if we define a directed model over these variables, we must assume a possibly arbitrary order for the conditional γi factors. For example, for a phrase such as "the tire pallet near the other skid," we could factorize in either of the following ways:

p(γtires, γskid | Λ) = p(γskid | γtires, Λ) × p(γtires | Λ)    (2)
p(γtires, γskid | Λ) = p(γtires | γskid, Λ) × p(γskid | Λ)    (3)

Depending on the order of factorization, we will need different conditional probability tables that correspond to the meanings of words in the language. To resolve this issue, another approach is to use Bayes' rule to estimate p(Λ | γ1 . . . γN), but this distribution would require normalizing over all possible words in the language Λ. Another alternative is to use an undirected model, but this would require normalizing over all possible values of all γi variables in the model. To address these problems, the G3 framework introduces a correspondence vector Φ to capture the dependency between γ1 . . . γN and Λ. Each entry φi ∈ Φ corresponds to whether linguistic constituent λi ∈ Λ corresponds to grounding γi. We assume that γ1 . . . γN are independent of Λ unless Φ is known. Introducing Φ enables factorization according to the structure of language, with local normalization at each factor over a space of just the two possible values for φi.

1) Inference: In order to use the G3 framework for inference, we want to infer the groundings γ1 . . . γN that maximize the distribution

argmax_{γ1 . . . γN} p(γ1 . . . γN | Φ, Λ),    (4)

which is equivalent to maximizing the joint distribution of all groundings γ1 . . . γN, Φ, and Λ,

argmax_{γ1 . . . γN} p(γ1 . . . γN, Φ, Λ).    (5)

We assume that Λ and γ1 . . . γN are independent when Φ is not known, yielding:

argmax_{γ1 . . . γN} p(Φ | Λ, γ1 . . . γN) p(Λ) p(γ1 . . . γN)    (6)

This independence assumption may seem unintuitive, but it is justified because the correspondence variable Φ breaks the dependency between Λ and γ1 . . . γN. If we do not know whether γ1 . . . γN correspond to Λ, we assume that the language does not tell us anything about the groundings. Finally, we assume a constant prior on γ1 . . . γN and ignore p(Λ), since it does not depend on γ1 . . . γN, leading to:

argmax_{γ1 . . . γN} p(Φ | Λ, γ1 . . . γN)    (7)

The G3 framework described by [18] trains the model in a discriminative fashion. However, it is not a conventional conditional random field, in that the correspondence vector Φ = True is observed and the conditioning variables γi are hidden, preventing the use of standard inference techniques. To compute the maximum value of the objective in Equation 7, the system performs beam search over γ1 . . . γN, computing


[Figure 2 depicts grounding graphs for the three-turn dialog "Pick up the pallet." / "Which one?" / "The one near the truck," with grounding variables γ1 . . . γ6, correspondence variables φ1 . . . φ7, and linguistic constituents λ.]

(a) Unmerged grounding graphs for three dialog acts. The noun phrases "the pallet," "one," and "the one near the truck" refer to the same grounding in the external world, but initially have separate variables in the grounding graphs.

(b) The grounding graph after merging γ2, γ3, and γ5 based on linguistic coreference.

Fig. 2: Grounding graphs for a three-turn dialog, before and after merging based on coreference. The robot merges the three shaded variables.

the probability of each assignment from Equation 7 to find the maximum probability assignment. Although we are using p(Φ | Λ, γ1 . . . γN) as the objective function, Φ is fixed and the γ1 . . . γN are unknown. This approach is valid because, given our independence assumptions, p(Φ | Λ, γ1 . . . γN) corresponds to the joint distribution over all the variables given in Equation 5. In order to perform beam search, we factor the model according to the hierarchical, compositional linguistic structure of the command:

p(Φ | Λ, γ1 . . . γN) = ∏_i p(φi | λi, γi1 . . . γik)

This factorization can be represented graphically; we call such a graphical model the grounding graph for a natural language command. We can draw it as a factor graph, but in this paper we represent it as a directed graphical model to emphasize the role of the correspondence variable and the independence assumptions in the factorization. The directed model for the command "Pick up the pallet" appears in Figure 2. The λ variables correspond to language; the γ variables correspond to groundings in the external world; and the φ variables are True if the groundings correspond to the language, and False otherwise.
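The beam search over γ1 . . . γN described above can be sketched as follows (a minimal Python sketch of the general technique, not the authors' implementation; `factor_prob` is a hypothetical stand-in for the learned factors p(φi | λi, γi1 . . . γik)):

```python
def beam_search_groundings(variables, candidates, factor_prob, beam_width=10):
    """Extend partial assignments one grounding variable at a time,
    scoring each extension by the factor for the newly bound variable,
    and keep only the beam_width best partial assignments.
    Returns a list of (assignment, score) pairs, best first."""
    beam = [({}, 1.0)]
    for var in variables:
        extended = []
        for assignment, score in beam:
            for g in candidates[var]:
                new_assignment = dict(assignment)
                new_assignment[var] = g
                extended.append((new_assignment,
                                 score * factor_prob(var, g, new_assignment)))
        extended.sort(key=lambda pair: pair[1], reverse=True)
        beam = extended[:beam_width]
    return beam
```

The partial assignments retained in the beam also serve a second purpose: they are the saved samples used to estimate entropy in Section III-A.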

2) Training: We learn model parameters from a corpus of labeled examples. The G3 framework assumes a log-linear parametrization with feature functions fj and feature weights μj:

p(Φ | Λ, γ1 . . . γN) = (1/Z) ∏_i exp( Σ_j μj fj(φi, λi, γi1 . . . γik) )    (8)

Features map between words in the language and corresponding groundings in the external world. For example, features include object class, whether one grounded object is physically supported by another grounded object, or whether the robot is approaching or moving away from a landmark object. The training set consists of an aligned, parallel corpus of language paired with positive and negative examples of groundings in the external world. The alignment annotations consist of a mapping between each natural language constituent and a corresponding grounding in the external world. We use the corpus and alignment annotations described in [18].

III. TECHNICAL APPROACH

When faced with a command, the system extracts grounding graphs from the natural language input and performs inference to find the most likely set of values for the grounding variables


γ1 . . . γN. Next, it identifies the most uncertain grounding variable γj and asks a question about that variable, as described in Section III-A. After receiving an answer from a human partner, the robot merges grounding graphs from the command, question, and answer into a single graphical model, as described in Section III-B. Finally, it performs inference in the merged graph to find a new set of groundings that incorporates information from the answer as well as from the original command. Figure 3 shows dataflow in the system. Grayed-out blocks in the figure show pre-existing components, not novel to this paper. Black blocks show the question-asking feedback system that is the contribution of this work.

Fig. 3: System diagram. Grayed-out blocks show pre-existing components; black parts show the question-asking feedback system new to this paper.

A. Generating a Question

Our approach to asking questions is first to identify the grounding variables whose values are most uncertain, then to generate a question that tries to disambiguate the value of that variable. The system iteratively asks questions about the most uncertain grounding variable γj until it is sufficiently confident about having inferred the correct groundings. One intuitive estimate of the uncertainty of a grounding variable γj is the probability of the correspondence variable φk for each factor it participates in:

argmin_{γj} ∏_{k ∈ factors(γj)} p(φk | γ1 . . . γN, Λ)    (9)

If the system is unable to find a high-probability grounding for a variable γj, it can ask a question to collect more information. We refer to this approach as Metric 1.¹ However, this metric will not perform well if there are several objects in the external environment that could correspond equally well to the words in the language. For example, a vague expression such as "the pallet" would have high confidence for any pallet that was grounded to it; but if there were many pallets in the environment, the robot might be very uncertain about which one was meant.

¹ We use confidence instead of H_{p(Φ|γ1...γN,Λ)}(Φ), since entropy is low when p(φk | γ1 . . . γN, Λ) is either very high or very low.

A more principled approach is to formally derive an expression for the entropy of the distribution over grounding variables, then ask a question about the variable with maximum entropy. We begin by defining a family of marginal distributions for each grounding variable γj conditioned on Φ and Λ:

p(γj | Φ, Λ)    (10)

To find the most uncertain grounding variable γj, we find the distribution in this family with the highest entropy:

argmax_j H_{p(γj|Φ,Λ)}(γj)    (11)

The system can collect more information from the human partner in order to disambiguate the command by asking a question about the most uncertain variable. For example, if a command like "bring the pallet on the truck to receiving" were issued in a context with two trucks and one pallet, the entropy would be higher for the phrase "the truck," and the system could ask a question such as "Which truck?" On the other hand, if there were two pallets and one truck, the entropy would be higher for the phrase "the pallet," and the system would ask a question such as "Which pallet?" We can expand the entropy function as:

H_{p(γj|Φ,Λ)}(γj) = − Σ_{γj} p(γj | Φ, Λ) log p(γj | Φ, Λ)    (12)

Unfortunately, p(γj | Φ, Λ) cannot be directly computed in the G3 framework, because we only have p(Φ | Λ, γ1 . . . γN), which the system maximizes with respect to the unknown grounding variables γ1 . . . γN. Instead, we rewrite it as a marginalization over the joint:

p(γj | Φ, Λ) = Σ_{γ1 . . . γj−1, γj+1 . . . γN} p(γ1 . . . γN | Φ, Λ)    (13)

We use Bayes' rule to rewrite it:

p(γj | Φ, Λ) = Σ_{γ1 . . . γj−1, γj+1 . . . γN} p(Φ | γ1 . . . γN, Λ) p(γ1 . . . γN | Λ) / p(Φ | Λ)    (14)

Next, we assume that γ1 . . . γN are independent of Λ when we do not know Φ, as we did in Equation 6, yielding:

p(γj | Φ, Λ) = Σ_{γ1 . . . γj−1, γj+1 . . . γN} p(Φ | γ1 . . . γN, Λ) p(γ1 . . . γN) / p(Φ | Λ)    (15)

Finally, we assume a constant prior p(γ1 . . . γN) = C and define a constant K = C / p(Φ | Λ):

p(γj | Φ, Λ) = K Σ_{γ1 . . . γj−1, γj+1 . . . γN} p(Φ | γ1 . . . γN, Λ)    (16)

We can efficiently approximate this summation based on the results of the inference process. After running inference, the system saves all M sets of values that it considered for grounding variables in the model; we call each sample sm . Each sample consists of bindings for grounding variable γj

412

in the grounding graph, and we denote the value of variable γj in sm as sm[γj]. When performing inference using beam search, we set the beam width to be large so that the samples are diverse enough to enable an accurate estimate of entropy. We approximate Equation 16 with:

p̂(γj = g | Φ, Λ) = c(γj = g, Φ, Λ) / Σ_x c(γj = x, Φ, Λ)    (17)

where c is

c(γj = g, Φ, Λ) = K Σ_{sm : sm[γj] = g} p(Φ | sm[γ1] . . . sm[γN], Λ) × p(sm[γ1] . . . sm[γN])    (18)
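The confidence score of Equation 9 and the sample-based entropy estimate of Equations 17 and 18 can both be computed directly from quantities the inference process already produces. The following is a minimal Python sketch (our own illustration, not the authors' implementation; variable names are ours, and the constant K of Equation 16 cancels in the normalization):

```python
import math
from collections import defaultdict

def metric1_confidence(factor_probs):
    """Equation 9: score each grounding variable by the product of
    p(phi_k | ...) over the factors it participates in, and select
    the variable with the minimum product for questioning.
    factor_probs maps each variable to its list of factor values."""
    scores = {var: math.prod(ps) for var, ps in factor_probs.items()}
    return min(scores, key=scores.get)

def metric2_entropy(samples, var):
    """Equations 17 and 18 substituted into Equation 12: approximate
    the marginal p(gamma_j | Phi, Lambda) from the assignments saved
    during beam search, then return its entropy.  Each sample is a
    pair (assignment dict, unnormalized weight)."""
    mass = defaultdict(float)
    for assignment, weight in samples:
        mass[assignment[var]] += weight            # Equation 18; K cancels
    total = sum(mass.values())
    return -sum((m / total) * math.log(m / total)  # Equations 17 and 12
                for m in mass.values() if m > 0)
```

Beam samples that split their weight evenly between two pallets give the corresponding phrase an entropy of log 2 ≈ 0.69, while samples that agree on a single grounding give entropy 0.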

We can substitute Equation 18 into Equation 12 to obtain an estimate for the entropy. We refer to this approximation as Metric 2. After identifying a variable to ask about, the robot asks a question using a template-based algorithm. It finds the text associated with the grounding variable and generates a question of the form "What do the words X refer to?" Once a question has been generated, the system asks it and collects an answer from the human partner. In general, answers could take many forms. For example, Figure 5 shows commands, questions generated using the template-based algorithm, and corresponding answers collected from untrained users.

B. Merging Graphs

Once a question has been chosen and an answer obtained, the robot incorporates information from the answer into its inference process. It begins by computing separate grounding graphs for the command, the question, and the answer according to the parse structure of the language. Next, variables in the separate grounding graphs are merged based on linguistic coreference. Finally, the system performs inference in the merged graph to incorporate information from the command, question, and answer.

Resolving linguistic coreferences involves identifying linguistic constituents that refer to the same entity in the external world. For example, in the command "Pick up the tire pallet and put it on the truck," the noun phrases "the tire pallet" and "it" refer to the same physical object in the external world, or corefer. Coreference resolution is a well-studied problem in computational linguistics [7]. Although there are several existing software packages that address this problem, most are developed for large corpora of newspaper articles and generalize poorly to the language in our corpus. Instead, we created a coreference system trained on language from our corpus.
Following typical approaches to coreference resolution [16], our system consists of a classifier that predicts coreference between all pairs of noun phrases in the language, combined with a clustering algorithm that enforces transitivity and finds antecedents for all pronouns. For the pairwise classifier we used a log-linear model with bag-of-words features. The model is trained using an annotated corpus

Fig. 4: Precision vs. recall at predicting whether an inferred grounding is incorrect.

of positive and negative pairs of coreferences. We set the classification threshold of the model to 0.5 so that it chooses the result with the most probability mass. Once coreferring variables have been identified, a merging algorithm creates a single unified grounding graph. The coreference resolution algorithm identifies pairs of γ variables in the grounding graph that corefer; the merging algorithm combines all pairs of coreferring variables. Figure 2 shows a merged graph created from a command, a question, and an answer.

C. Deciding When to Ask a Question

Section III-A described how to generate a question in response to a natural language command. However, at a higher level, the robot needs to decide whether to ask a question or take an action. For example, if the robot is very confident about all grounding variables, it would be better to ask no questions at all. If it is uncertain about just one variable, a single question might suffice to disambiguate the command. Or it might ask a question, get an answer that it cannot understand, and then choose to ask an additional question about the same grounding variable to receive further clarification. Our approach to this problem is to ask questions until the entropy of the most uncertain variable is below a certain threshold. To avoid entering an infinite loop, we prohibit the robot from asking a question about the same variable more than two times.

IV. RESULTS

To evaluate the system, we use a corpus of 21 manually created commands given to a simulated robotic forklift. The commands were designed to be deliberately ambiguous in order to provide an opportunity for clarifying questions and answers. In Section IV-A, we assess the performance of Metric 1 (Confidence) and Metric 2 (Entropy) at identifying incorrect grounding variables after inference has been performed for commands in the dataset.
Second, we assess the end-to-end performance of the question-asking framework at increasing the number of correctly grounded concrete noun phrases.

A. Predicting Incorrect Examples

To directly assess the performance of the metrics defined in Section III-A, we measure their performance at identifying


grounding variables in the corpus that are incorrect. Figure 4 shows the effect of this process on the commands from the corpus as a precision vs. recall curve. Here, a true positive is an incorrect grounding variable that was predicted to be incorrect; a false positive is a correct grounding variable that was predicted to be incorrect. Metric 1 (Confidence) performs quite poorly, even having positive slope. This is not due to a bug; instead, there is a group of low-confidence grounding variables which are in fact correct. For example, for the command "Lift it from the truck," the factor for "the truck" has high confidence, but "from the truck" has lower confidence. Metric 1 multiplies the two values together, yielding an overall low estimate for confidence. Metric 2, in contrast, gives this variable much lower entropy compared to other commands, since the distribution over groundings for "from the truck" is concentrated.

B. Asking Questions

Next, we assess the performance of the system at using answers to questions to disambiguate ambiguous phrases in the corpus. To collect a corpus of questions and answers, we first generated questions for each concrete noun phrase in the corpus, then collected answers to those questions using crowdsourcing. For example, for a command like "Take the pallet and place it on the trailer to the left," the question-generation algorithm could ask about "the pallet," "it," or "the trailer to the left." By collecting answers for all concrete noun phrases in the dataset in advance, we can compare different question selection strategies offline, without collecting new data. To collect an answer to a question, we showed annotators a natural language command directing the robot to perform an action in the environment, such as "Pick up the pallet," paired with a question such as "What do you mean by 'the pallet'?" In addition, annotators saw a video of the simulated robot performing the action sequence, such as picking up a specific tire pallet in the environment.
We instructed them to provide an answer to the question in their own words, assuming that what they saw happening in the video represented the intended meaning of the command. We collected two answers from different annotators for each question. Example commands, questions, and answers from the corpus appear in Figure 5. To measure the performance of the system, we report the fraction of correctly grounded concrete noun phrases in the original command, not including the question and answer. A concrete noun phrase is one that refers to a specific single object in the external world, such as "the skid of tires." An example of a non-concrete noun phrase is "your far left-hand side." A noun phrase such as "the skid of tires" is considered to be correct if the inference maps it to the same tire pallet that the human user referenced. It is considered to be incorrect if the inference maps it to some other object, such as a trailer. We evaluate our system in several different conditions, using both automatic coreference resolution and oracle coreference resolution. As baselines, we present the performance using only information from the commands, without asking any questions, as well as performance when asking a question

about each concrete noun phrase. The baseline results in Table I show that the system realizes a large improvement in performance when using information from commands, questions, and answers compared to information from the commands alone. Next, we assess the performance of the two metrics at selecting questions to ask. We report performance for three conditions: selecting just one question to ask about a command, selecting two questions, and selecting questions until uncertainty is below a specified, hand-tuned threshold. We also compare to selecting a question at random. The system may ask up to two questions about each concrete noun phrase. When asking about a noun phrase for a second time, it generates the same question but receives a different answer. The system could ask a total of 76 possible questions over the 21 commands in the corpus; for each approach we also report the fraction of questions from this space that were actually asked. Table I shows the performance of the system in these conditions. When asking one question and using automatic coreference resolution, Metric 2 (Entropy) slightly outperforms Metric 1 (Confidence), but does no better than randomly choosing a question to ask. When asking a second question, Metric 2 achieves better accuracy than Metric 1 or random question selection, nearly matching the 71% correct achieved by asking all possible questions, despite asking only 55% of the available questions. These results show a clear improvement from the additional information in the questions and answers, and some advantage of Metric 2 over Metric 1 and random selection of questions. Note that when asking a second question, the system may choose to ask again about a previously encountered phrase, or it may choose to ask about another phrase. This mechanism means that if the answer to the first question was ambiguous, the system can recover by asking again to collect more information.
In order to focus on the results of the question-selection process, we repeat the analysis of the two metrics using oracle coreference, which determines linguistic coreference directly from the mapping between constituents in the language and groundings in the real world. This eliminates automatic coreference as a source of error in these results. We see a significant improvement using oracle coreference compared to automatic coreference; this is due to errors in which the automatic resolver merges variables that do not actually refer to the same object. When using oracle coreference, asking all possible questions shows a dramatic improvement over asking no questions (92% from 58%). With one and two questions being asked, Metric 2 (Entropy) again outperforms Metric 1 (Confidence) and random selection. Notably, question selection with Metric 2 (Entropy) is able to achieve the same 92% accuracy as asking all questions, despite only asking 55% of the questions available to it. Improved coreference resolution could be achieved by training on a larger corpus of examples, as well as adding additional features to the coreference resolver, especially including information from groundings.
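The variable-merging step of Section III-B, on which these coreference results depend, amounts to collapsing pairwise coreference decisions into equivalence classes of grounding variables. A minimal sketch of that clustering step (our own illustration using union-find; the pairwise classifier itself is not shown):

```python
def merge_coreferring(variables, coreferring_pairs):
    """Collapse pairwise coreference decisions into equivalence classes
    of grounding variables using union-find, which enforces the
    transitivity required by the clustering step of Section III-B."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for a, b in coreferring_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for v in variables:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())
```

For the dialog of Figure 2, the predicted pairs (γ2, γ3) and (γ3, γ5) collapse into the single merged variable shown in the lower graph, while γ1 and γ6 remain singleton clusters.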


Next, we assess the system's performance using the algorithm described in Section III-C to decide when to ask a question, in addition to deciding what question to ask. Table II shows the performance of Metric 1 and Metric 2. Metric 2 significantly outperforms Metric 1 while asking only a few more questions. Although it asks fewer than half of the possible questions, it approaches the performance of the system that asks all possible questions. As an example of the system's operation, for a command such as "Move the pallet over to it," the entropy according to Metric 2 is 1.82 for the phrase "it" and 1.00 for "the pallet." The system identifies "it" as the most uncertain variable, then asks "What does the word 'it' refer to?" The answer was "It refers to the empty trailer to the left of the two pallets." After incorporating the answer into the model and performing inference, the system correctly grounds the phrase "it" to the trailer and computes a new estimate for the entropy. The entropy for "it" has now been reduced, and the robot next asks "What do the words 'the pallet' refer to?" and receives the answer "The pallet is the piece of wood with the orange and grey boxes that is directly in front of the forklift." After this process, the robot has inferred correct values for all grounding variables in the grounding graph for this command. Finally, we assess the system's performance at producing better action sequences using answers to questions. For each top-level clause in the corpus, we generated a sequence of robot actions and manually assessed whether those actions matched the actions in the original video. When using information from commands only, the system correctly executes 37% of the top-level actions. In many cases, for a command such as "pick up the pallet," it does pick up a pallet, but a different one from the original video. In contrast, after incorporating information from the answers to questions, it correctly executes 66% of the commands.
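The decision procedure of Section III-C that drives this example can be sketched as a simple loop (a hypothetical Python sketch; the threshold value is illustrative only, and `entropy_of` and `ask_and_reinfer` stand in for Metric 2 and the question/answer/re-inference cycle):

```python
def dialog_loop(variables, entropy_of, ask_and_reinfer,
                threshold=1.5, max_asks_per_var=2):
    """Question the highest-entropy grounding variable until every
    variable's entropy falls below the threshold, never asking about
    the same variable more than max_asks_per_var times."""
    asks = {v: 0 for v in variables}
    while True:
        eligible = [v for v in variables if asks[v] < max_asks_per_var]
        if not eligible:
            break
        target = max(eligible, key=entropy_of)
        if entropy_of(target) < threshold:
            break  # confident enough: act instead of asking
        ask_and_reinfer(target)  # answer is merged in; entropies change
        asks[target] += 1
    return asks
```

With the entropies from the example above (1.82 for "it" and 1.00 for "the pallet") and a sufficiently low threshold, the loop asks about "it" first, re-infers, and then asks about "the pallet," matching the narrated behavior.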
Our framework provides first steps toward an information-theoretic approach for enabling the robot to ask questions about objects in the environment. Failures occurred for a number of reasons. Sometimes, the answer obtained from the human user was not useful. For example, one user answered "What do the words the pallet refer to?" with a definition of a pallet, "The wooden crate that the merchandise sits on top of," rather than specifying which pallet was being referenced (e.g., something like "the pallet with tires"). Other failures occurred in more complex environments because the robot failed to understand the disambiguating answer, as in "the object on the far left," when the system did not have a good model of left versus right. Our entropy-based approach is limited to types of questions that target specific phrases in the answer. More general algorithms are needed to handle a wider array of dialog strategies, such as yes/no questions like "Do you mean this one?" or more open-ended questions such as "Now what?"

V. RELATED WORK

Many have created systems that exploit the compositional structure of language in order to follow natural language

Command: Move your pallet further right.
Question: What do the words your pallet refer to?
Answer: Your pallet refers to the pallet you are currently carrying.

Command: Move closer to it.
Question: What does the word it refer to?
Answer: It refers to the empty truck trailer.

Command: Take the pallet and place it on the one to the left.
Question: What do the words the one refer to?
Answer: The one refers to the empty trailer.

Command: Place the pallet just to the right of the other pallet.
Question: What do the words the pallet refer to?
Answer: The wooden crate that the merchandise sits on top of.

Fig. 5: Sample commands, questions, and answers from the corpus.

TABLE I: Performance at Grounding Concrete Noun Phrases

% Correct, oracle coreference

% Questions Asked

Baselines No Questions All Questions

47% 71%

58% 92%

0% 100%

Asking One Question Random Metric 1 (Confidence) Metric 2 (Entropy)

55% 53% 55%

74% 74% 79%

28% 28% 28%

Asking Two Questions Random Metric 1 (Confidence) Metric 2 (Entropy)

61% 63% 68%

82% 82% 92%

55% 55% 55%

commands [21, 9, 4]. Previous probabilistic approaches have used generative and discriminative models for understanding route instructions, but did not produce interactive systems that can use dialog to resolve ambiguity [10, 19, 8, 18, 13]. Our work instead focuses on using an induced probabilistic model over natural language commands and groundings in order to incorporate information from questions and answers. Others have created robotic systems that interact using dialog [5, 15, 2]. Bauer et al. [1] built a robot that can find its way through an urban environment by interacting with pedestrians using a touch screen and gesture recognition system. Our approach differs in that it focuses on simple three-turn dialogs, but is able to understand language from untrained users rather than a predefined, fixed vocabulary or fixed types of answers. Furthermore, it chooses targeted questions based on the robot's model of the external world. Existing work in dialog systems [3, 12, 20, 22] uses MDP and POMDP models with a fixed, predefined state space to represent the user's intentions. In contrast, in our work the space of possible groundings is defined by the objects and actions available to the robot's perception; the user's intentions are defined by the grounding graph and vary according to the structure of the language.


TABLE II: Performance Deciding When to Ask a Question

                          % Correct,     % Questions
                          oracle         Asked
                          coreference
  Metric 1 (Confidence)   71%            38%
  Metric 2 (Entropy)      87%            40%

VI. CONCLUSION

In this paper, we presented results for a robot dialog understanding system based on a probabilistic graphical model that factors according to the structure of language. This system is able to ask the human user targeted questions about parts of a command that it failed to understand and to incorporate information from an open-ended space of answers into its model, iteratively improving its confidence and accuracy. Our next steps are to scale the algorithm to more complex dialogs, expanding the repertoire of questions and answers that the system can understand. Integrating nonverbal backchannels into the framework as new types of factors in the grounding graph remains an open problem. Finally, we aim to extend the framework to support active learning, enabling the robot to learn new word meanings based on answers it has received to questions.

VII. ACKNOWLEDGEMENTS

This work was sponsored by the U.S. Army Research Laboratory under the Robotics Collaborative Technology Alliance, by the Office of Naval Research under MURI N00014-07-1-0749, and by Battelle under Robert Carnes and Scott Sheaf. Their support is gratefully acknowledged. We would like to thank Javier Velez, Ashis Banerjee, and Josh Joseph for helpful discussions about this work.

REFERENCES

[1] Andrea Bauer, Klaas Klasing, Georgios Lidoris, Quirin Mühlbauer, Florian Rohrmüller, Stefan Sosnowski, Tingting Xu, Kolja Kühnlenz, Dirk Wollherr, and Martin Buss. The Autonomous City Explorer: Towards natural human-robot interaction in urban environments. International Journal of Social Robotics, 1(2):127–140, April 2009.
[2] Rehj Cantrell, Matthias Scheutz, Paul Schermerhorn, and Xuan Wu. Robust spoken instruction understanding for HRI. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, pages 275–282, 2010.
[3] F. Doshi and N. Roy. Spoken language interaction with model uncertainty: An adaptive human-robot interaction system. Connection Science, 20(4):299–319, 2008.
[4] J. Dzifcak, M. Scheutz, C. Baral, and P. Schermerhorn. What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution. In Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA), pages 4163–4168, 2009.
[5] Kai-yuh Hsiao, Stefanie Tellex, Soroush Vosoughi, Rony Kubat, and Deb Roy. Object schemas for grounding language in a responsive robot. Connection Science, 20(4):253–276, 2008.
[6] Ray S. Jackendoff. Semantics and Cognition, pages 161–187. MIT Press, 1983.
[7] Daniel Jurafsky and James H. Martin. Speech and Language Processing. Pearson Prentice Hall, 2nd edition, May 2008. ISBN 0131873210.
[8] Thomas Kollar, Stefanie Tellex, Deb Roy, and Nicholas Roy. Toward understanding natural language directions. In Proc. ACM/IEEE Int'l Conf. on Human-Robot Interaction (HRI), pages 259–266, 2010.
[9] Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. Walk the talk: Connecting language, knowledge, and action in route instructions. In Proc. Nat'l Conf. on Artificial Intelligence (AAAI), pages 1475–1482, 2006.
[10] Cynthia Matuszek, Dieter Fox, and Karl Koscher. Following directions using statistical machine translation. In Proc. ACM/IEEE Int'l Conf. on Human-Robot Interaction (HRI), pages 251–258, 2010.
[11] Stephanie Rosenthal, Manuela Veloso, and Anind K. Dey. Learning accuracy and availability of humans who help mobile robots. In Proc. AAAI, 2011.
[12] N. Roy, J. Pineau, and S. Thrun. Spoken dialogue management using probabilistic reasoning. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000), 2000.
[13] Nobuyuki Shimizu and Andrew Haas. Learning to follow navigational route instructions. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1488–1493, 2009.
[14] Dimitar Simeonov, Stefanie Tellex, Thomas Kollar, and Nicholas Roy. Toward interpreting spatial language discourse and gesture with grounding graphs. In 2011 RSS Workshop on Grounding Human-Robot Dialog for Spatial Tasks, 2011.
[15] M. Skubic, D. Perzanowski, S. Blisard, A. Schultz, W. Adams, M. Bugajska, and D. Brock. Spatial language for human-robot dialogs. IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 34(2):154–167, 2004. ISSN 1094-6977.
[16] Veselin Stoyanov, Claire Cardie, Nathan Gilbert, Ellen Riloff, David Buttler, and David Hysom. Reconcile: A coreference resolution research platform. Technical report, Cornell University, April 2010.
[17] S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy. Approaching the symbol grounding problem with probabilistic graphical models. AI Magazine, 32(4):64–76, 2011.
[18] S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. Banerjee, S. Teller, and N. Roy. Understanding natural language commands for robotic navigation and mobile manipulation. In Proc. AAAI, 2011.
[19] Adam Vogel and Dan Jurafsky. Learning to follow navigational directions. In Proc. Association for Computational Linguistics (ACL), pages 806–814, 2010.
[20] J. D. Williams and S. Young. Scaling POMDPs for spoken dialog management. IEEE Transactions on Audio, Speech, and Language Processing, 15(7):2116–2129, September 2007.
[21] Terry Winograd. Procedures as a representation for data in a computer program for understanding natural language. PhD thesis, Massachusetts Institute of Technology, 1970.
[22] S. Young. Using POMDPs for dialog management. In IEEE Spoken Language Technology Workshop, 2006, pages 8–13, 2006.


Efficiently Finding Optimal Winding-Constrained Loops in the Plane

Paul Vernaza, Venkatraman Narayanan, and Maxim Likhachev
[email protected], [email protected], [email protected]

The Robotics Institute Carnegie Mellon University Pittsburgh, PA 19104

Abstract—We present a method to efficiently find winding-constrained loops in the plane that are optimal with respect to a minimum-cost objective and in the presence of obstacles. Our approach is similar to a typical graph-based search for an optimal path in the plane, but with an additional state variable that encodes information about path homotopy. Upon finding a loop, the value of this state corresponds to a line integral over the loop that indicates how many times it winds around each obstacle, enabling us to reduce the problem of finding paths satisfying winding constraints to that of searching for paths to suitable states in this augmented state space. We give an intuitive interpretation of the method based on fluid mechanics and show how this yields a way to perform the necessary calculations efficiently. Results are given in which we use our method to find optimal routes for autonomous surveillance and intruder containment.

I. INTRODUCTION

The subject of this work is finding optimal constrained loops in the plane. More specifically, out of all paths starting and ending at a specific location in the plane and satisfying certain winding constraints, we would like to find one such path that minimizes a given location-dependent cost accumulated along the path, while also avoiding some regions entirely.

Figure 1 depicts a practical situation in which this type of problem arises. Here, an unmanned aerial vehicle (UAV) is tasked with flying a surveillance mission to photograph certain regions of interest (ROI) located within hostile territory. The UAV must find a route that begins and ends at its home base. It must additionally fly a route that minimizes a given cost functional, determined by distance traveled and proximity to hostile radar installations, while entirely avoiding certain high-traffic regions deemed too risky to traverse. Of prime concern is the need to photograph the ROI from all perspectives. Formally, we can ensure this by requiring that the generated path satisfy topological winding constraints with respect to the ROI—if the UAV winds at least once around each ROI, it will be able to view each ROI from every angle.

The rest of this paper is organized as follows. First, we give an intuitive description of our approach, followed by a brief discussion of the relation of our work to other methods. An in-depth description of technical details of our method follows in Section IV. Experimental results are given in Section V, followed by discussion.

Fig. 1: An optimal plan (solid line) for a hypothetical UAV surveillance mission. The UAV is constrained to take off and return to a particular location, winding twice around each of three regions of interest. Regions of interest are designated by polygonal dashed lines. Striped polygons denote high-traffic regions that the UAV must avoid. Circular dashed lines denote location and ranges of radar installations, with radar power decreasing with distance to center.

II. METHOD OVERVIEW

The basic idea of our method is very simple and best motivated by a fluid analogy. Suppose we wish to find a planar loop that encloses some set of regions. We imagine that within each region is a source that produces fluid at a certain rate. By the divergence theorem (illustrated in Fig. 2), we know that the net rate at which fluid escapes any loop is equal to the rate at which it is produced by sources enclosed by the loop. Therefore, we can find a loop enclosing the desired regions by searching for a loop through which fluid escapes at a rate equal to the sum of the rates assigned to those regions. So long as there are no two sets of regions whose rates sum to the same quantity, we can be assured that any such loop must contain only the desired regions.

We can find such a loop via a straightforward modification of any standard graph-based method for navigation in the



Fig. 2: Illustration of divergence theorem: total flow rate exiting the boundary of any region is equal to the sum of source flow rates contained within, even in the presence of other sources. Locations of flow sources are indicated by faucet icons, with their flow rates indicated by numbers adjacent to these icons.
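The divergence-theorem bookkeeping illustrated in Fig. 2 is easy to check numerically. The sketch below (our own toy geometry and rates, not taken from the paper) integrates the superposed source field against the outward normal around square loops, and confirms that the net outflow matches the total rate of the enclosed sources:

```python
import math

def V(x, y, sources):
    # Superposed point-source field: each source contributes
    # rate/(2*pi) * (x - o)/|x - o|^2, matching the flow-field construction.
    vx = vy = 0.0
    for ox, oy, rate in sources:
        dx, dy = x - ox, y - oy
        r2 = dx * dx + dy * dy
        vx += rate / (2 * math.pi) * dx / r2
        vy += rate / (2 * math.pi) * dy / r2
    return vx, vy

def outflow(corner, size, sources, n=2000):
    """Numerically integrate V . n-hat around a CCW square loop (midpoint rule)."""
    cx, cy = corner
    # (start point, unit tangent, outward normal) for each side of the square.
    sides = [((cx, cy), (1, 0), (0, -1)),                 # bottom
             ((cx + size, cy), (0, 1), (1, 0)),           # right
             ((cx + size, cy + size), (-1, 0), (0, 1)),   # top
             ((cx, cy + size), (0, -1), (-1, 0))]         # left
    total, h = 0.0, size / n
    for (sx, sy), (tx, ty), (nx, ny) in sides:
        for k in range(n):
            x = sx + tx * (k + 0.5) * h
            y = sy + ty * (k + 0.5) * h
            vx, vy = V(x, y, sources)
            total += (vx * nx + vy * ny) * h
    return total

# Toy sources of our own choosing: rates 1 and 2 at two locations.
sources = [(0.5, 0.5, 1.0), (3.0, 3.0, 2.0)]
print(round(outflow((0, 0), 1.0, sources), 2))    # encloses rate 1 only -> 1.0
print(round(outflow((-1, -1), 6.0, sources), 2))  # encloses both -> 3.0
```

The second loop's outflow is insensitive to where exactly the sources sit inside it, which is precisely the property the method exploits.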

plane. This is accomplished by treating the rate at which fluid passes through a path as a dependent variable whose state is tracked along with the usual state necessary for navigation, such as position and orientation. Any given state of the vehicle can therefore be associated with any number of different flow values, depending on how the search reached that state. When the search visits the goal configuration, we can check the flow value to verify whether the desired regions were enclosed; if they have been, then we are assured that the loop thus found is optimal with respect to all other loops satisfying our constraints, as long as the search algorithm employed is admissible. A. Example with one region Fig. 3 gives a simple illustration of how our method might proceed given just one region of interest around which to loop. Specifically, we consider the problem of finding an optimal path that begins at the flow-augmented state ((x, y), F ) = (A, 0) and winds around the obstacle one or more times before returning to the location A. We will refer to the flow state F as the F -value. The figure shows a number of possible actions that the search could take, for illustrative purposes. Each arc is marked by the quantity by which the F -value is incremented traversing the arc in the direction shown, given that the region of interest contains a source of rate one. Each location is marked by the F -values found at those locations by exploring different illustrated paths. For instance, the search begins by expanding A, generating B with F = 1/2, and generating C with

Fig. 3: Illustration of homotopy-augmented search. Shown is a 2x2 grid on which a graph search is performed, with a single obstacle in the center. Graph search begins at point A (marked F = 0). Directed edges are marked with signed F-value increments (in italics). Other values denote F-values at grid vertices associated with one or more paths discovered during search. Solid lines show paths found passing the obstacle to the left, while dashed lines show paths found passing the obstacle to the right.

F = −1/2. Next, the search expands B, generating D with F = 3/4 and E with F = 1, followed by a similar expansion of C. At this point, we note that E has been reached via two (nonhomotopic) paths, each yielding a distinct F-value. Therefore, we next expand the location E twice—once for the state (E, 1) and once for the state (E, −1). If we continue this process, the reader may verify that we will eventually expand the state (A, 2) and/or (A, −2), though this is not explicitly illustrated for purposes of clarity. Note that this will require traversing arcs in directions contrary to those illustrated, in which case the arc's F-value is negated before adding it to the state's F-value.

Upon expanding the state (A, 2), it is apparent that the search has found a path winding clockwise around the obstacle exactly once. Furthermore, it may be verified that any other path that winds once, clockwise, around the obstacle also has F = 2. However, if we expand the states in an admissible ordering, such as that provided by A* [9], we are guaranteed that the first path via which we have expanded (A, 2) is the optimal one to that state. Since all paths starting at A and winding clockwise once around the obstacle arrive at (A, 2), we can in that case conclude that this path is one that achieves the minimum cost out of all such paths.

We finally observe that any loop winding around the obstacle k times arrives at the state (A, ±2k) for k > 0, with sign determined by the winding direction. If the loop does not


wind around the obstacle, however, the path arrives at the state (A, 0). The converse statements are also true.

B. Winding around multiple regions

In the case that there are multiple regions of interest, we proceed as previously mentioned, placing a fluid source inside each region and computing F-values via superposition. We may then specify how many times the path should wind around each region, where winding is defined in the topological sense [10]. Assuming we assign to each region a fluid source of rate log z_i, where z_i is the ith prime number, a loop satisfying the winding constraints may be found by searching for a path to a state with

exp F = ∏_i z_i^{W_i},    (1)

where W_i is the winding number associated with the ith region. The fundamental theorem of arithmetic implies that this constitutes an invertible map between winding numbers and F-values.

III. RELATED WORK

Our work was inspired by the work of Bhattacharya et al. [3, 4], who first proposed the general idea described in Section II of performing graph search for navigation with a state vector augmented by homotopy information. Our work differs principally in two ways. First, we derive a version of homotopy-augmented graph search that is suitable for finding optimal loops with arbitrary winding constraints. Furthermore, we employ a real-valued encoding of homotopy information, as opposed to the complex-valued encoding derived in [3]. Although it is not the focus of this work, we note that the real-valued encoding of homotopy information derived here may also be applied to the problem described in [3], which might be characterized as point-to-point planning in 2D with homology constraints on the generated paths. In addition to having an intuitive interpretation in terms of fluids, applying the encoding we derive here would yield a method arguably simpler to implement. Also noteworthy is recent work in computer vision that employs planning with winding angles in order to perform tracking with occlusion [8]. Planning in this way is essentially equivalent to the method described here, a fact that we show formally in Sec. IV-C. Additionally, we give an intuitive fluid-based interpretation of the method, derive an appropriate heuristic, and apply the method to surveillance and optimal confinement problems. The problem of loop planning discussed here also bears some similarity to several problems in computational geometry. In the geometric knapsack problem [1], the objective is to find a simple loop enclosing discrete reward locations of varying reward, such that the enclosed reward minus the length of the loop is maximized. Arkin et al.
[1] give algorithms to solve certain variants of this problem in time polynomial in the number of vertices of the polygons given as input. Our method is similar if the reward regions are thought of as regions around which the path should wind at least once. However, our method

appears to be more flexible in a variety of ways, due to our use of graph-based search. We need not assume the environment is polygonal, nor do we assume the cost of the path is uniform across free space, or that the vehicle dynamics are trivial, as in [1]. Moreover, we can leverage problem-specific heuristics in the context of A* in order to accelerate the search. A final difference between our work and that of [1] is that we are able to find solutions with arbitrary winding constraints. As surveillance constitutes a principal application, our method is related to another problem from computational geometry known as the optimum watchman route problem. This problem is defined as that of generating a path of minimum length through an environment such that every point in the environment is visible from some point along the path. We motivate the UAV surveillance problem considered here from an aim of rendering each point along the ROI visible, but the nature of our solutions is such that this would be guaranteed only in certain cases, such as the case where ROI are flat enough to guarantee that no ROI occludes any other, or the case where the ROI are spaced sufficiently such that winding twice around each ROI is sufficient to guarantee full visibility. Therefore, it is unlikely that optimum watchman routes are equivalent to optimum winding-constrained loops. UAV planning in itself has been the focus of a significant amount of work in engineering disciplines. Typical goals of these approaches include cooperative planning, enforcing nonholonomic constraints, and planning in real-time [6, 5, 2]. Other work focuses on ensuring geometric visibility in urban environments [7]. However, it appears that comparatively little work has focused on planning optimal loop paths for UAVs. An exception is the work of [11], who devised a suboptimal algorithm to find looping paths visiting a number of ROI while satisfying path curvature constraints.
This method is again less flexible than ours in that we allow an arbitrary spatially-dependent cost function to take into account factors such as spatially-varying risk.

IV. TECHNICAL DETAILS

We now clarify some of the technical points of the method presented in Sec. II.

A. Graph construction

An informal description of our graph construction was given in Section II. That description may be formalized in the following way. First, we assume we are given a description of the graph corresponding to a point-to-point navigation problem in the plane, consisting of a state space X and a successor function succ : X → 2^X, where 2^X is the power set of X. Typical examples of X include X = ℝ² and X = ℝ² × S¹, the latter corresponding to navigation with heading dependence. We then define a new state space X′ = X × ℝ, with elements denoted by (x, f). Let ρ : X × X → (I → ℝ²) denote the function that generates the Cartesian path taken from a node in the original graph to a successor. succ′ is then defined in the following way:

succ′ : (x, f) ↦ {(x′, f′) | x′ ∈ succ(x), f′ = f + F(ρ(x, x′))},



where F : (I → ℝ²) → ℝ is a line integral that will be defined formally in the next section. The start state for graph search is defined by (x_0, 0), where x_0 is the start state in the pre-augmented graph. The goal state is defined as (x_0, ∑_i W_i log z_i), where W_i is the desired winding number of the solution path around the ith region.

B. Line integral construction

Aspects of the line integral referred to in the last section are now detailed. First, we note that a suitable flow field is given by

V(x) = ∑_i (r_i/2π) · (x − o_i)/‖x − o_i‖²,    (2)

where o_i ∈ ℝ² is an arbitrary source location inside the ith region for which we are given a winding constraint, and r_i = log z_i is the source rate assigned to the ith source. This may be established by considering, for example, the case where o_i is located at the center of a circle and devising an integrand that will integrate to r_i. Denoting by n̂(s) the normal to x̃ at s, we define

F(x̃) = ∫ V(x̃(s)) · n̂(s) ds.    (3)

In order to compute the value by which F increments along an edge, we must evaluate this integral with respect to the path taken by the edge. Assuming this path is a straight line, the integral can be computed analytically. The integral is simplified considerably by taking into consideration its rotational, translational, and scaling invariances, as depicted in Fig. 4. By further applying linearity, we may therefore reduce the general case to the line integral of V over (0, 0) → (0, 1) due to a single obstacle of rate r located at position o = (o_x, o_y). Assuming the segment normal is given by n̂ = (−1, 0)ᵀ, the absolute value of the line integral is in this case given by

∫_{s=0}^{1} V(x) · n̂ ds = −(r/2π) ∫_{y=0}^{1} −o_x/(o_x² + (y − o_y)²) dy    (4)
                        = (r/2π) [arctan((1 − o_y)/o_x) − arctan(−o_y/o_x)],    (5)

which is equal to rθ/(2π), with θ as defined in Fig. 4. The general case may then be derived via similar triangles and superposition.
For a line segment with generic endpoints p_0 and p_1, this yields

F = ∑_i s_i (r_i/2π) arccos⟨(p_1 − o_i)/‖p_1 − o_i‖, (p_0 − o_i)/‖p_0 − o_i‖⟩,    (6)

where the sign s_i ∈ {−1, 1} is determined by whether o_i falls on the left or right side of the directed segment p_0 → p_1, according to the desired convention.

C. Equivalence with winding-vector approach

Summing Equation (6) along a path reveals the F-value to be a linear projection of the vector of winding angles associated with the path around each region; i.e., F = ⟨w, θ⟩, where θ is a vector of winding angles, and w is a vector


Fig. 4: Illustration of symmetries in line integration of flow field: the flux through the segment L due to o is equal to the flux through the segment (0, 0) → (0, 1) due to o.
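The per-segment increment of Eqs. (4)-(6) reduces to rate/(2π) times the signed angle the segment subtends at the source. This can be sanity-checked against the arctan closed form of Eq. (5) for the canonical segment; the example numbers below are our own:

```python
import math

def segment_flux(p0, p1, o, rate=1.0):
    # Flux of a point source at o through the directed segment p0 -> p1:
    # rate/(2*pi) times the signed angle the segment subtends at o, which
    # is the closed form the per-segment integrals reduce to.
    ux, uy = p0[0] - o[0], p0[1] - o[1]
    vx, vy = p1[0] - o[0], p1[1] - o[1]
    return rate / (2 * math.pi) * math.atan2(ux * vy - uy * vx,
                                             ux * vx + uy * vy)

# Canonical configuration from the text: segment (0,0) -> (0,1) and a
# source at (o_x, o_y); Eq. (5) gives the magnitude in arctan form.
ox, oy, r = 2.0, 0.25, 1.0
eq5 = r / (2 * math.pi) * (math.atan((1 - oy) / ox) - math.atan(-oy / ox))
flux = segment_flux((0.0, 0.0), (0.0, 1.0), (ox, oy), r)
print(abs(abs(flux) - eq5) < 1e-12)  # magnitudes agree -> True
```

Summing `segment_flux` over the edges of a closed loop yields rate times the loop's winding number about the source, which is exactly the bookkeeping the augmented state performs.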

with w_i = log z_i/(2π). The correctness of Eq. (1) follows immediately from this observation.

This raises the question of whether an equivalent method may be obtained by augmenting the state vector with a vector of winding angles as opposed to the single F-value. Here we show that the two methods are indeed equivalent, in the sense that the entire vector of winding angles can be uniquely recovered from any F-value. The key observation is that each valid winding angle at each location in the plane can be expressed in the form θ_i = θ̄_i + 2πk_i, where k_i ∈ ℤ and θ̄_i is a location-dependent offset angle. We then observe that

exp F = exp(∑_i log z_i (k_i + θ̄_i/(2π))),    (7)

which implies that

exp(F − ∑_i (log z_i/(2π)) θ̄_i) = ∏_i z_i^{k_i}.    (8)
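The inversion implied by Eq. (8), recovering the integers k_i from a single F-value by prime factorization, can be illustrated concretely (a minimal sketch with toy offset angles of our own choosing, assuming non-negative windings):

```python
import math

def recover_winding_integers(F, offsets, primes=(2, 3)):
    # Invert Eq. (8): the left-hand side is an integer whose prime
    # factorization yields the k_i (this sketch assumes all k_i >= 0).
    lhs = math.exp(F - sum(math.log(z) * off / (2 * math.pi)
                           for z, off in zip(primes, offsets)))
    n = round(lhs)
    ks = []
    for z in primes:
        k = 0
        while n % z == 0:
            n //= z
            k += 1
        ks.append(k)
    return ks

# Forward direction: build an F-value from known winding angles, then invert.
offsets = [0.7, 1.9]            # toy location-dependent offset angles
ks_true = [2, 1]                # wind twice around region 1, once around region 2
thetas = [off + 2 * math.pi * k for off, k in zip(offsets, ks_true)]
F = sum(math.log(z) / (2 * math.pi) * th for z, th in zip((2, 3), thetas))
print(recover_winding_integers(F, offsets))  # -> [2, 1]
```
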

Therefore, given F , the winding angles θi can be obtained by finding the prime factors of the left-hand side of the previous expression. An interesting geometric interpretation of this fact is that at each location, the winding angles are determined by the intersection of the hyperplane F = w, θ and the discrete, infinite lattice of valid winding angles. The slopes of this hyperplane are ratios of logs of prime numbers, and are therefore irrational; consequently, the hyperplane can intersect the infinite lattice in at most one location. This is illustrated in Fig. 5. A practical consequence of this analysis concerns the memory efficiency of the method. Given many winding constraints, it is clearly inefficient to take the naive approach of storing each winding angle as an individual floating-point value, since the set of valid winding angles for each location is a discrete set. Storing a single F -value as a floating-point value constitutes a memory-efficient alternative to this approach. Alternatively, one may store a single floating-point offset angle



Fig. 5: Illustration of invertibility of the map from winding angles to F-values for an example with two regions of interest. The set of valid winding angles (θ_1, θ_2) consists of the set of lattice points (intersections of gray lines). The only valid pair of winding angles that also satisfies the F equation is the unique intersection of the pictured line with the lattice (circled). The uniqueness of the intersection point follows from the irrationality of the slope −log 2/log 3.

vector θ̄ per location in addition to a vector of integers k_i per state; for any reasonable problem, only a few bits per state would be necessary to represent each k_i.

D. Search heuristic

The combinatorial nature of winding constraints necessitates a careful choice of search strategy. For this reason, we employ A* search with the heuristic illustrated in Fig. 6. Given a query state for which the heuristic is to be computed, we calculate, of the regions with winding constraints, which regions would have their constraints satisfied if the loop were closed with a straight line to the goal. For each of the remaining regions with non-zero winding constraints, we compute the length of the minimum-length path beginning at the query state, touching the region, and returning to the goal location; this constitutes the minimum-length excursion necessary to satisfy that region's winding constraint. Of all the distances so computed, we choose the greatest to be the value of the heuristic.

V. EXPERIMENTS

We implemented the method and applied it to several test scenarios described here, with the aim of understanding the qualitative characteristics of planning with winding constraints, the applicability of the method to realistic problems, and the method's computational efficiency. For simplicity, the examples presented here only consider curves with positive winding, though constraints with negative winding present no obstacle to the method.

A. Synthetic environment

Fig.
7 shows the result of a simple synthetic experiment showing optimal plans generated by our method winding around three regions (referred to henceforth as ROI) in varying ways, and in the presence of obstacles. The purpose of this

Fig. 6: Illustration of heuristic. Query state indicates state for which heuristic is to be evaluated. Solid line shows a path that may have been taken to this state. Regions with positive winding constraints are depicted as solid circles. Striped area is an obstacle. Shaded area encloses regions whose winding constraints would be satisfied if the loop were closed with a straight line to the goal. Dashed line shows the maximum-length excursion to reach any winding-constrained region not in the shaded area. The value of the heuristic is the length of the dashed line.
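The heuristic depicted in Fig. 6 can be sketched as follows. This is a simplified version of our own: regions are idealized as points, and we assume the set of regions whose constraints remain unsatisfied after closing the loop straight to the goal has already been determined:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def loop_heuristic(query, goal, unsatisfied):
    # For each region whose winding constraint would remain unsatisfied if
    # the loop were closed with a straight line to the goal, the path must
    # at least make the excursion query -> region -> goal. The largest of
    # these minimum-length excursions lower-bounds the remaining length.
    if not unsatisfied:
        return dist(query, goal)
    return max(dist(query, o) + dist(o, goal) for o in unsatisfied)

print(loop_heuristic((0, 0), (0, 0), [(3, 4)]))  # -> 10.0
```

Because every excursion length is a geometric lower bound on the remaining path, the maximum over them remains admissible, which preserves A*'s optimality guarantee.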

experiment was to examine the qualitative aspects of paths with different winding constraints. In the simplest case, the path is constrained to wind around each ROI exactly once. Since we simulated no vehicle dynamics in this experiment, the optimal solutions must always be Jordan [10] in this case, as depicted. Paths constrained to wind twice around some ROI tend to consist of a large loop winding once around each obstacle with offshoots that loop around each ROI once more. However, if any pair of ROI are sufficiently close, the optimal path may contain a Jordan subloop containing both, as depicted in the lower-right portion of the figure. From an applications perspective, this type of behavior may be desirable in the case of surveillance with limited sensing range. In this case, it may be sufficient to wind around groups of ROI (instead of winding around each individually), provided that each group is contained within a circle of radius smaller than the agent’s sensing range. Note also that it is possible to specify a zero winding number for certain ROI. In the case that all the windings are either zero or one, the ROI with winding one must be contained within the region enclosed by the loop (which must be Jordan), while the ROI with winding zero must fall outside the loop. B. UAV surveillance We now return to the UAV surveillance problem mentioned in the introduction. For these experiments, we simulated hypothetical UAV missions by annotating aerial imagery of military bases with ROI, hypothetical radar installations, and regions of excessive risk to be avoided entirely. Radar installations were modeled as overlapping ellipses, each containing a cost attaining its maximum value at the center of the ellipse and diminishing gradually to zero at the boundary of the ellipse. Our objective was to find minimum-cost loops subject to winding constraints and entirely avoiding regions of excessive risk. We approximated nontrivial dynamics for the vehicle by planning



Fig. 7: Demonstration showing loops found by the method to satisfy different winding constraints. Numbers indicate winding number constraints enforced for circular regions located directly beneath. Obstacles depicted by rectangular regions.


Fig. 8: Result of path planning for hypothetical UAV surveillance mission. UAV is constrained to wind exactly twice around each ROI. Annotations are as described in Fig. 1.

for its orientation as well, limiting the maximum change in orientation between straight segments to 22.5 degrees. Figure 1 shows the result of finding an optimal path winding around each ROI exactly twice. The optimal path in this case has the previously discussed characteristic of having a large outer loop with smaller loops branching off of it. It is observed that the path generally stays just outside the region of radar visibility, straying inside only briefly in order to wind around the ROI. As expected, the path also never wanders inside the forbidden regions. Figure 8 shows a result obtained in another scenario. Again, the path was constrained to wind twice around each ROI. An interesting observation here is that upon visiting the last ROI, the UAV has a choice of either winding around the ROI and roughly retracing its path in the opposite direction, or penetrating radar range briefly in order to return to its home base. We observe that the optimal path indeed penetrates radar range via a short path near the boundaries of two overlapping radar installations before completing the loop. We additionally attempted to generate plans for a scenario


Fig. 9: Results of path planning for UAV surveillance with time-varying cost function. Curved, dashed line shows planned path. Colored, striped wedges show fields of view of enemy patrollers. Three frames are shown, each showing the position of the UAV and patrollers at a particular point in time.

in which enemy patrol units were present. These were modeled by augmenting the state with a time variable; high costs were assigned to those positions and times representing states in which the UAV could be observed by an enemy patrol unit. Fig. 9 shows an example result. The optimal path successfully allows the UAV to evade the patrol units. An interesting feature of the plan is a small loop that effectively slows the UAV in order to avoid detection by patrol unit 2.

C. Intruder confinement

We finally studied the application of our method to a problem we will refer to as intruder confinement. This problem consists of finding an optimal way to confine several intruders in a maze-like environment, subject to the constraint that innocent bystanders should not be confined. Confinement is achieved by surrounding the intruders with a team of robots that deploy from a central location. We assume that the robots have limited communication range, such that we can define a connectivity graph having an edge between two robots iff they are within communication range of each other. In order to


ensure that the robots can coordinate, this graph should remain connected at all times. A feasible deployment strategy may be obtained by finding a loop that winds around each intruder exactly once without winding around a bystander. Robots may then incrementally deploy along the loop at intervals smaller than their communication radius in order to surround the intruders while maintaining full connectivity and allowing bystanders to escape. The results shown in Fig. 10 demonstrate that the method is able to find optimal enclosures in highly irregular, maze-like indoor environments of varying topologies. As was expected, we observed that the complexity of planning did not scale significantly with the complexity of the environment, allowing us to solve problems such as those depicted. However, the complexity was significantly affected by the number of winding constraints imposed, as will be discussed shortly.
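The incremental deployment step can be sketched as follows. This fragment is our own illustration, not from the paper; the 0.9 safety margin below the communication radius is an assumption:

```python
import math

def deployment_points(loop, comm_radius):
    """Sample points along a closed polygonal loop at arc-length intervals
    smaller than comm_radius, so consecutive robots can communicate."""
    spacing = 0.9 * comm_radius          # margin below the radius (assumed)
    verts = list(loop) + [loop[0]]       # close the loop
    pts = [verts[0]]
    carried = 0.0                        # arc length since last placed robot
    for (x0, y0), (x1, y1) in zip(verts, verts[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        t = spacing - carried
        while t <= seg:
            pts.append((x0 + (x1 - x0) * t / seg, y0 + (y1 - y0) * t / seg))
            t += spacing
        carried = seg - (t - spacing)
    return pts
```

Deploying robots in order along the returned points keeps the partially deployed chain connected at every stage, since each new robot is within range of its predecessor.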


D. Computational efficiency

In order to study the computational efficiency of the method, we applied it to a synthetic example in which we varied only the number of regions with winding constraints. The experimental scenario is depicted in Fig. 11. In the nth trial, we used our method to find a loop around the regions labeled 1, 2, . . ., n, winding once in the positive direction around each. This was conducted both using the heuristic described in Sec. IV-D, which we will refer to as the loop heuristic, and using the Euclidean distance-to-goal heuristic. Fig. 11b shows an asymptotically exponential scaling of the run time as a function of the number of winding constraints for both heuristics. The loop heuristic, however, exhibits a significantly shallower slope in the log plot; for 10 constraints, the Euclidean distance heuristic is slower by a factor of 10. An interesting observation is that the performance of the loop heuristic was relatively insensitive to the number of constraints up to about six, at which point planning times increased significantly. For a very small number of constraints, the Euclidean distance heuristic outperformed the loop heuristic by a large margin, which we attribute to the smaller overhead of the Euclidean heuristic. The reason for the observed exponential scaling is that the search ultimately begins to explore many different combinations of windings around different obstacles. As we increase the number of constraints, the number of plausible winding states (and by extension, the open list) grows exponentially. The loop heuristic was able to compensate for this effect somewhat, but not completely. We also observed, in experiments with constraints winding multiple times around each region, that the performance of the loop heuristic deteriorated. This is to be expected, as it does not take into account the effect of multiple windings.
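The blow-up in winding states can be reproduced with a much simpler stand-in for the paper's F-value machinery: augment each grid cell with a vector of signed ray-crossing counts, one per obstacle, which records the winding number once the path closes. The sketch below is our own simplification, not the authors' algorithm (Dijkstra rather than heuristic A*, and integer crossing counts rather than continuous winding angles):

```python
import heapq

def crossing(x, y_from, y_to, obstacle):
    """Signed crossing of an obstacle's reference ray by a vertical move.

    Each obstacle sits at (a + 0.5, b + 0.5); its ray points in the +x
    direction. An upward crossing to the right of the obstacle counts +1
    (one counterclockwise winding gives a net count of +1), downward -1.
    """
    a, b = obstacle
    if x <= a:                       # move is not to the right of the obstacle
        return 0
    if min(y_from, y_to) <= b < max(y_from, y_to):
        return 1 if y_to > y_from else -1
    return 0

def shortest_winding_loop(start, size, obstacles, target):
    """Dijkstra over states (x, y, winding signature) on a 4-connected grid.

    Returns the length of the shortest loop from start back to start whose
    signed crossing counts equal `target` (one winding number per obstacle).
    In practice one would also prune overly large winding counts, as
    suggested in the text, to keep the augmented state space bounded.
    """
    sx, sy = start
    init = (sx, sy, (0,) * len(obstacles))
    goal = (sx, sy, tuple(target))
    dist = {init: 0.0}
    pq = [(0.0, init)]
    while pq:
        d, state = heapq.heappop(pq)
        if state == goal and d > 0:
            return d
        if d > dist.get(state, float("inf")):
            continue
        x, y, w = state
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < size and 0 <= ny < size):
                continue
            nw = w if dy == 0 else tuple(
                wi + crossing(x, y, ny, ob) for wi, ob in zip(w, obstacles))
            nxt = (nx, ny, nw)
            nd = d + 1.0
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(pq, (nd, nxt))
    return None
```

Each additional constrained obstacle multiplies the number of distinct signatures the search can reach, which is exactly the open-list growth discussed above.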
We are currently investigating different avenues to enable scaling to yet larger numbers of winding constraints. One simple enhancement would be to prune the search when winding angles become too large (a similar strategy was used in [8]), at the expense of losing the completeness guarantee of the search. Also, as noted in Section III, certain variants of the


Fig. 10: Results of intruder confinement experiments. Locations of intruders are marked by circles below exclamation symbols. Locations of bystanders are marked by circles below question marks. Paths generated by the method are shown superimposed on floor plan of environment.

geometric knapsack problem in computational geometry may be considered relaxations of the winding-constrained planning problem, and may be solved efficiently. An admissible heuristic might therefore be obtained as the solution to such a problem. Alternatively, it is possible to construct an admissible heuristic as a graph search problem on a vastly reduced graph that assumes a uniform cost function. We leave implementation of these ideas as future work.

VI. CONCLUSION

We have demonstrated a method for efficiently finding optimal loops in the plane subject to winding constraints. The method accomplishes this by collapsing together all paths sharing a common F-value, which encodes homotopy-type information about the paths. The F-value is constructed in such a way as to reversibly encode winding data when a


ACKNOWLEDGMENTS


This work was supported by ONR DR-IRIS MURI grant #N00014-09-1-1052 and ONR ANTIDOTE MURI grant #N00014-09-1-1031.


Fig. 11: Synthetic experiment to determine empirical scaling of computational complexity as a function of number of winding constraints. Fig. 11a shows experimental setup with point obstacles around which winding constraints were imposed, along with order in which these constraints were introduced and the solution path for 10 obstacles. Fig. 11b shows the time to find the optimal solution versus the number of winding constraints (note log scale), with and without the heuristic described in Sec. IV-D (labeled loop heuristic).

loop is completed, allowing us to reduce a search for cyclic, winding-constrained paths to a search for an acyclic path to a specific state encoding the desired winding. Computation of the F-value additionally has an intuitive fluid-based interpretation and may be performed efficiently. Finally, our method leverages standard graph-based search methods to achieve flexibility in problem representation, vehicle dynamics, and selection of domain-specific admissible heuristics to accelerate the search process. A particular challenge for the method as implemented thus far is the case where many winding constraints are to be enforced. In the near term, we anticipate that employing straightforward heuristics (as described in Section V-D) will greatly enhance the ability of the method to solve problems of this type. An additional area of future research is using our method as a stepping stone to solving problems with constraints related to, but stronger than, winding. For instance, in the UAV surveillance problem, we may wish to enforce the constraint that the UAV fully observe one location before proceeding to another. We are currently investigating how this type of constraint might be enforced by augmenting the state vector with variables that keep track of such ordering information. Though the variations are endless, we believe that winding-constrained planning yields fundamental insight that will prove useful in a variety of related endeavors.

REFERENCES

[1] E.M. Arkin, S. Khuller, and J.S.B. Mitchell. Geometric knapsack problems. Algorithmica, 10(5):399-427, 1993. URL http://www.springerlink.com/content/g007w81p153h3326/.
[2] J.S. Bellingham, M. Tillerson, M. Alighanbari, and J.P. How. Cooperative path planning for multiple UAVs in dynamic and uncertain environments. In IEEE Conference on Decision and Control, volume 3, Dec. 2002.
[3] S. Bhattacharya, V. Kumar, and M. Likhachev. Search-based path planning with homotopy class constraints. In Third Annual Symposium on Combinatorial Search, 2010. URL http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1920.
[4] Subhrajit Bhattacharya, Maxim Likhachev, and Vijay Kumar. Identification and representation of homotopy classes of trajectories for search-based path planning in 3D. In Proceedings of Robotics: Science and Systems, 27-30 June 2011. URL https://www.aaai.org/ocs/index.php/SOCS/SOCS10/paper/view/2089/0.
[5] S.A. Bortoff. Path planning for UAVs. In American Control Conference, volume 1, pages 364-368, Sep. 2000. doi: 10.1109/ACC.2000.878915.
[6] P.R. Chandler, M. Pachter, and S. Rasmussen. UAV cooperative control. In American Control Conference, volume 1, pages 50-55, 2001. doi: 10.1109/ACC.2001.945512.
[7] Peng Cheng, J. Keller, and V. Kumar. Time-optimal UAV trajectory planning for 3D urban structure coverage. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2750-2757, Sept. 2008. doi: 10.1109/IROS.2008.4650988.
[8] H. Gong, J. Sim, M. Likhachev, and J. Shi. Multi-hypothesis motion planning for visual object tracking. In ICCV, 2011.
[9] P.E. Hart, N.J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100-107, July 1968. ISSN 0536-1567. doi: 10.1109/TSSC.1968.300136. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4082128.
[10] J.R. Munkres. Topology: a first course. Prentice Hall, 1975.
[11] Zhijun Tang and U. Ozguner. Motion planning for multitarget surveillance with mobile sensor agents. IEEE Transactions on Robotics, 21(5):898-908, Oct. 2005. ISSN 1552-3098. doi: 10.1109/TRO.2005.847567.


On the Structure of Nonlinearities in Pose Graph SLAM

Heng Wang∗, Gibson Hu†, Shoudong Huang† and Gamini Dissanayake†

∗College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China. Email: [email protected]
†Centre for Autonomous Systems, Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia. Email: {Gibson.Hu, Shoudong.Huang, Gamini.Dissanayake}@uts.edu.au

Abstract—Pose graphs have become an attractive representation for solving Simultaneous Localization and Mapping (SLAM) problems. In this paper, we analyze the structure of the nonlinearities in the 2D SLAM problem formulated as the optimization of a pose graph. First, we prove that finding the optimal configuration of a very basic pose graph with 3 nodes (poses) and 3 edges (relative pose constraints) with spherical covariance matrices, which can be formulated as a six dimensional least squares optimization problem, is equivalent to solving a one dimensional optimization problem. Then we show that the same result can be extended to the optimization of a pose graph with "two anchor nodes", where every edge connects to one of the two anchor nodes. Furthermore, we prove that the global minimum of the resulting one dimensional optimization problem must belong to a certain interval and there are at most 3 minima in that interval. Thus the globally optimal pose configuration of the pose graph can be obtained very easily through the bisection method and closed-form formulas.

I. INTRODUCTION

Recently, SLAM techniques based on pose graphs have become very popular because of their robustness and computational efficiency. Solving the SLAM problem using a pose graph representation typically includes two stages. In the first stage (the SLAM front-end), relative pose constraints are obtained (e.g. from odometry and/or scan matching) to construct a pose graph in which each node is a robot pose and each edge is a relative pose constraint. In the second stage (the SLAM back-end), an optimization technique is applied to find the optimal pose configuration. The concept of the pose graph SLAM formulation was introduced in [1]. Recent insights into the sparse structure of the SLAM problem have boosted research in this direction. The work includes multi-level relaxation [2], stochastic gradient descent (SGD) [3], sparse Pose Adjustment (SPA) [4], and preconditioned conjugate gradient with a subgraph strategy [5]. A number of 3D pose graph SLAM algorithms have also been developed that can be applied to visual SLAM and SLAM with 3D laser data (http://openslam.org/). The optimization problem involved in pose graph SLAM has a very high dimension because all the poses involved are variables. One would expect such a high dimensional nonlinear optimization problem to have a very large number of local minima, and general iteration-based optimization algorithms to be frequently trapped in one of these local minima

unless a very good initial value is available. However, the results in [3][4][5] demonstrate that the algorithms can sometimes converge to a good solution from poor or even zero initial values. In particular, the Tree-based network optimizer (TORO) [6], which applies an incremental pose parametrization and a tree structure on top of SGD, has been reported to be extremely robust to poor initial values, especially when the covariance matrix of the relative pose is close to spherical. These phenomena suggest that the nonlinear optimization problem involved in pose graph SLAM has a very special underlying structure that requires further investigation. Since loop closing is an important feature in SLAM optimization problems, we start from analyzing the "minimal pose graph SLAM problem with loop closure", namely, the pose graph with 3 poses and 3 constraints. Analyzing minimal problems (e.g. http://cmp.felk.cvut.cz/minimal/ and [7]) can provide insights into the fundamental structure and properties of larger problems. In this paper, based on our initial work on the analysis of the number of minima for point-feature based SLAM problems [8], we prove that solving the basic 3-pose 3-constraint pose graph as a least squares problem is equivalent to solving a one dimensional optimization problem. Furthermore, we extend the result to pose graphs with "two anchor nodes" and show how to obtain the global minimum of the one dimensional optimization problem. The paper is organized as follows. Section II presents the least squares formulation of the 3-pose 3-constraint pose graph SLAM problem and shows its equivalence to a one dimensional optimization problem. In Section III, the result is extended to pose graphs with only two anchor nodes. Section IV presents some properties of the one dimensional optimization problem. Section V discusses related work and potential applications of the results. Finally, Section VI concludes the paper.
The proofs of the two main results in this paper are given in the Appendix.

II. POSE GRAPH WITH 3 POSES AND 3 CONSTRAINTS

Suppose there are three 2D robot poses $r_0, r_1, r_2$ and three constraints $Z_1^0, Z_2^0, Z_2^1$, where $Z_j^i$ is the relative pose constraint from pose $r_i$ to pose $r_j$ ($i = 0, 1$, $j = 1, 2$). The corresponding pose graph is shown in Fig. 1 with 3 nodes (poses) and 3 edges (relative pose constraints).

Fig. 1. Pose graph with 3 poses and 3 constraints.

A. Least Squares Problem Formulation

We choose pose $r_0$ as the origin of the global coordinate frame. That is, $r_0 = (x_{r_0}, y_{r_0}, \phi_{r_0})^T = (0, 0, 0)^T$. The least squares problem is to minimize

$F(X) = \sum_{i,j} \big(Z_j^i - H^{Z_j^i}(X)\big)^T P_{Z_j^i}^{-1} \big(Z_j^i - H^{Z_j^i}(X)\big) \quad (1)$

where $Z_j^i$ is the relative pose constraint from pose $r_i$ to pose $r_j$, and $P_{Z_j^i}$ is the corresponding covariance matrix. The function $H^{Z_j^0}(X)$ is given by

$H^{Z_j^0}(X) = \begin{bmatrix} x_{r_j} & y_{r_j} & \phi_{r_j} \end{bmatrix}^T \quad (2)$

and the function $H^{Z_j^1}(X)$ is given by²

$H^{Z_j^1}(X) = \begin{bmatrix} c\phi_{r_1}(x_{r_j} - x_{r_1}) + s\phi_{r_1}(y_{r_j} - y_{r_1}) \\ -s\phi_{r_1}(x_{r_j} - x_{r_1}) + c\phi_{r_1}(y_{r_j} - y_{r_1}) \\ \phi_{r_j} - \phi_{r_1} \end{bmatrix}. \quad (3)$

For the 3-pose 3-constraint problem, the state vector $X$ is

$X = \begin{bmatrix} x_{r_2} & y_{r_2} & \phi_{r_2} & x_{r_1} & y_{r_1} & \phi_{r_1} \end{bmatrix}^T.$

Suppose the relative pose constraints are

$Z_1^0 = (z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}})^T, \quad Z_2^0 = (z_{x_{r_2}}, z_{y_{r_2}}, z_{\phi_{r_2}})^T, \quad Z_2^1 = (z_{x_{r_2}^{r_1}}, z_{y_{r_2}^{r_1}}, z_{\phi_{r_2}^{r_1}})^T. \quad (4)$

Assume the covariance matrices $P_{Z_j^i}$ are spherical:³

$P_{Z_1^0} = \mathrm{diag}(\gamma_{r_1}^{-1}, \gamma_{r_1}^{-1}, \gamma_{\phi_{r_1}}^{-1}), \quad P_{Z_2^0} = \mathrm{diag}(\gamma_{r_2}^{-1}, \gamma_{r_2}^{-1}, \gamma_{\phi_{r_2}}^{-1}), \quad P_{Z_2^1} = \mathrm{diag}(\gamma_{r_2 r_1}^{-1}, \gamma_{r_2 r_1}^{-1}, \gamma_{\phi_{r_2}^{r_1}}^{-1}), \quad (5)$

then the objective function is

$F(X) = \gamma_{r_1}[(z_{x_{r_1}} - x_{r_1})^2 + (z_{y_{r_1}} - y_{r_1})^2] + \gamma_{\phi_{r_1}}(z_{\phi_{r_1}} - \phi_{r_1})^2 + \gamma_{r_2}[(z_{x_{r_2}} - x_{r_2})^2 + (z_{y_{r_2}} - y_{r_2})^2] + \gamma_{\phi_{r_2}}(z_{\phi_{r_2}} - \phi_{r_2})^2 + \gamma_{r_2 r_1}[z_{x_{r_2}^{r_1}} - c\phi_{r_1}(x_{r_2} - x_{r_1}) - s\phi_{r_1}(y_{r_2} - y_{r_1})]^2 + \gamma_{r_2 r_1}[z_{y_{r_2}^{r_1}} + s\phi_{r_1}(x_{r_2} - x_{r_1}) - c\phi_{r_1}(y_{r_2} - y_{r_1})]^2 + \gamma_{\phi_{r_2}^{r_1}}(z_{\phi_{r_2}^{r_1}} - \phi_{r_2} + \phi_{r_1})^2. \quad (6)$

Note that $F(X)$ can also be written in a new form⁴

$F(X) = \gamma_{r_1}[(z_{x_{r_1}} - x_{r_1})^2 + (z_{y_{r_1}} - y_{r_1})^2] + \gamma_{\phi_{r_1}}(z_{\phi_{r_1}} - \phi_{r_1})^2 + \gamma_{r_2}[(z_{x_{r_2}} - x_{r_2})^2 + (z_{y_{r_2}} - y_{r_2})^2] + \gamma_{\phi_{r_2}}(z_{\phi_{r_2}} - \phi_{r_2})^2 + \gamma_{r_2 r_1}[A(\phi_{r_1}) - (x_{r_2} - x_{r_1})]^2 + \gamma_{r_2 r_1}[B(\phi_{r_1}) - (y_{r_2} - y_{r_1})]^2 + \gamma_{\phi_{r_2}^{r_1}}(z_{\phi_{r_2}^{r_1}} - \phi_{r_2} + \phi_{r_1})^2 \quad (7)$

where

$A(\phi_{r_1}) = z_{x_{r_2}^{r_1}} c\phi_{r_1} - z_{y_{r_2}^{r_1}} s\phi_{r_1}, \qquad B(\phi_{r_1}) = z_{x_{r_2}^{r_1}} s\phi_{r_1} + z_{y_{r_2}^{r_1}} c\phi_{r_1}. \quad (8)$

Remark 1. In (7), the terms $c\phi_{r_1}$ and $s\phi_{r_1}$ are separated from the terms $(x_{r_2} - x_{r_1})$ and $(y_{r_2} - y_{r_1})$, which makes the computation of the gradient and the analysis of the stationary points easier. It is clear from our results that spherical covariance in the form (5) can simplify the problem significantly.

B. The Equivalence to One Dimensional Problem

The following theorem shows that the problem of minimizing the objective function (7) is equivalent to an optimization problem with one variable. The results in this paper hold for general spherical covariance matrices as given in (5). However, in order to simplify the formulas and make the paper more readable, in the remainder of this paper we assume

$\gamma_{r_1} = \gamma_{\phi_{r_1}} = \gamma_{r_2} = \gamma_{\phi_{r_2}} = \gamma_{r_2 r_1} = \gamma_{\phi_{r_2}^{r_1}} = 1. \quad (9)$

Theorem 1: Given data $z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}}$, $z_{x_{r_2}}, z_{y_{r_2}}, z_{\phi_{r_2}}$, and $z_{x_{r_2}^{r_1}}, z_{y_{r_2}^{r_1}}, z_{\phi_{r_2}^{r_1}}$, minimizing the objective function (7) under assumption (9) is equivalent to minimizing the following function of one variable $\phi$:

$f(\phi) = \phi^2 + \tfrac{1}{2}\big(\phi + \Delta z_{\phi_{r_2}^{r_1}}\big)^2 - 2a\cos(\phi + \alpha) + b \quad (10)$

where $\Delta z_{\phi_{r_2}^{r_1}}, a, \alpha, b$ are constants that can be computed from the data by

$\Delta z_{\phi_{r_2}^{r_1}} = z_{\phi_{r_2}^{r_1}} - z_{\phi_{r_2}} + z_{\phi_{r_1}} \in [-\pi, \pi), \quad a = \sqrt{p^2 + (d + q)^2}, \quad \alpha = \mathrm{atan2}(p, d + q), \quad b = \tfrac{1}{3}\big[z_{x_{r_2}^{r_1}}^2 + z_{y_{r_2}^{r_1}}^2 + (z_{x_{r_2}} - z_{x_{r_1}})^2 + (z_{y_{r_2}} - z_{y_{r_1}})^2\big] \quad (11)$

with $d, p, q$ defined in (12).

Footnote 1: The MATLAB source code for solving this kind of pose graph SLAM problem is available at http://services.eng.uts.edu.au/~sdhuang/research.htm.
Footnote 2: In this paper, we use $c\phi_{r_1}$, $s\phi_{r_1}$, $cz_{\phi_{r_1}}$, $sz_{\phi_{r_1}}$ to denote $\cos(\phi_{r_1})$, $\sin(\phi_{r_1})$, $\cos(z_{\phi_{r_1}})$, $\sin(z_{\phi_{r_1}})$, respectively.
Footnote 3: Here spherical means diagonal with the first two elements (corresponding to x and y) being the same. Most of the publicly available datasets (e.g. [9]) have covariance matrices in this format.
Footnote 4: In the following, we will simply use $A$, $B$ to denote the functions $A(\phi_{r_1})$, $B(\phi_{r_1})$. This also applies to $A_i$, $B_i$, $A_{2k}$, $B_{2k}$, etc.
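As a quick concreteness check on objective (6), it can be evaluated directly under unit weights (assumption (9)). This fragment is our own illustration, not code from the paper:

```python
import math

def objective(X, Z10, Z20, Z21):
    """Objective (6) with unit information weights (assumption (9)).

    X = (x_r2, y_r2, phi_r2, x_r1, y_r1, phi_r1), matching the state
    vector ordering of (4); each Z is a relative-pose tuple (zx, zy, zphi).
    """
    x2, y2, p2, x1, y1, p1 = X
    c, s = math.cos(p1), math.sin(p1)
    return ((Z10[0] - x1) ** 2 + (Z10[1] - y1) ** 2 + (Z10[2] - p1) ** 2
            + (Z20[0] - x2) ** 2 + (Z20[1] - y2) ** 2 + (Z20[2] - p2) ** 2
            + (Z21[0] - c * (x2 - x1) - s * (y2 - y1)) ** 2
            + (Z21[1] + s * (x2 - x1) - c * (y2 - y1)) ** 2
            + (Z21[2] - p2 + p1) ** 2)
```

With noise-free measurements, the ground-truth configuration makes every residual vanish and the objective is zero.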


TABLE I
EXAMPLE 1: FOUR DATASETS WITH DIFFERENT LEVELS OF NOISE. THE GROUND TRUTH OF THE POSES IS r0 = (0, 0, 0), r1 = (1, 0.5, π/2), r2 = (0, 1, 3π/4).

| noise | $(z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}})$ | $(z_{x_{r_2}}, z_{y_{r_2}}, z_{\phi_{r_2}})$ | $(z_{x_{r_2}^{r_1}}, z_{y_{r_2}^{r_1}}, z_{\phi_{r_2}^{r_1}})$ | $\Delta z_{\phi_{r_2}^{r_1}}$ | $\alpha$ | No. | $\phi^*$ | $f(\phi^*)$ | $F(X^*)$ |
| zero  | (1.0000, 0.5000, 1.5708) | (0, 1.0000, 2.3562)       | (0.5000, 1.0000, 0.7854)   | 0       | 0       | 1 | 0       | 0      | 0      |
| small | (1.1022, 0.3967, 1.5383) | (0.0805, 0.9991, 2.5293)  | (0.6335, 0.9682, 0.8705)   | -0.1204 | -0.0793 | 1 | 0.0493  | 0.0057 | 0.0057 |
| large | (0.9368, 0.6861, 2.1437) | (-0.2468, 1.0940, 2.3901) | (0.5627, 1.3630, 1.1482)   | 0.9018  | 0.5132  | 1 | -0.3623 | 0.3073 | 0.3073 |
| huge  | (1.1674, 6.1375, 0.0755) | (-6.6684, 2.7509, 2.4706) | (-0.8100, -7.7511, -0.6429)| -3.0380 | 1.1342  | 3 | -0.9978 | 9.7355 | 9.7355 |

'No.' means "the number of minima in $[-2\pi - \alpha, 2\pi - \alpha]$".

where

$d = \tfrac{1}{3}[(z_{x_{r_2}} - z_{x_{r_1}})^2 + (z_{y_{r_2}} - z_{y_{r_1}})^2]$
$p = \delta_a \, cz_{\phi_{r_1}} + \delta_b \, sz_{\phi_{r_1}}$
$q = -\delta_a \, sz_{\phi_{r_1}} + \delta_b \, cz_{\phi_{r_1}}$
$\delta_a = -\Delta z_{x_{r_2}^{r_1}}\big(-\tfrac{1}{3} z_{y_{r_1}} + \tfrac{1}{3} z_{y_{r_2}}\big) + \Delta z_{y_{r_2}^{r_1}}\big(-\tfrac{1}{3} z_{x_{r_1}} + \tfrac{1}{3} z_{x_{r_2}}\big)$
$\delta_b = \Delta z_{x_{r_2}^{r_1}}\big(-\tfrac{1}{3} z_{x_{r_1}} + \tfrac{1}{3} z_{x_{r_2}}\big) + \Delta z_{y_{r_2}^{r_1}}\big(-\tfrac{1}{3} z_{y_{r_1}} + \tfrac{1}{3} z_{y_{r_2}}\big)$
$\Delta z_{x_{r_2}^{r_1}} = z_{x_{r_2}^{r_1}} - (z_{x_{r_2}} - z_{x_{r_1}})cz_{\phi_{r_1}} - (z_{y_{r_2}} - z_{y_{r_1}})sz_{\phi_{r_1}}$
$\Delta z_{y_{r_2}^{r_1}} = z_{y_{r_2}^{r_1}} + (z_{x_{r_2}} - z_{x_{r_1}})sz_{\phi_{r_1}} - (z_{y_{r_2}} - z_{y_{r_1}})cz_{\phi_{r_1}}. \quad (12)$

In fact, once the solution $\phi$ to the one-variable optimization problem is obtained, the solution to the problem of minimizing $F(X)$ in (7) can be obtained in closed form by the following formulas:

$\phi_{r_1} = \phi + z_{\phi_{r_1}}$
$\phi_{r_2} = \tfrac{1}{2}(z_{\phi_{r_2}} + z_{\phi_{r_2}^{r_1}} + \phi_{r_1})$
$x_{r_1} = \tfrac{2}{3} z_{x_{r_1}} + \tfrac{1}{3} z_{x_{r_2}} - \tfrac{1}{3} A$
$y_{r_1} = \tfrac{2}{3} z_{y_{r_1}} + \tfrac{1}{3} z_{y_{r_2}} - \tfrac{1}{3} B$
$x_{r_2} = \tfrac{1}{3} z_{x_{r_1}} + \tfrac{2}{3} z_{x_{r_2}} + \tfrac{1}{3} A$
$y_{r_2} = \tfrac{1}{3} z_{y_{r_1}} + \tfrac{2}{3} z_{y_{r_2}} + \tfrac{1}{3} B. \quad (13)$

Proof: See Appendix A.

Remark 2. The constants $\Delta z_{\phi_{r_2}^{r_1}}, a, \alpha, b$ in $f(\phi)$ summarize some important information from the relative pose constraint data. For example, $\Delta z_{\phi_{r_2}^{r_1}}$ indicates the level of consistency among the three angles $z_{\phi_{r_2}^{r_1}}$, $z_{\phi_{r_2}}$, and $z_{\phi_{r_1}}$. If the three angles are compatible with each other (e.g. when the relative orientation estimates are accurate), then $\Delta z_{\phi_{r_2}^{r_1}}$ is close to 0. Similarly, if the relative pose data are accurate, then $\alpha$ is close to 0 and $a$ is proportional to the square of the distance between $r_1$ and $r_2$. The constant $b$ does not have any impact on the solution of the optimization problem.

Example 1. To illustrate the results in Theorem 1, consider the following example. Assume the ground truth of the poses is $r_0 = (0, 0, 0)$, $r_1 = (1, 0.5, \pi/2)$, $r_2 = (0, 1, 3\pi/4)$, which is similar to the configuration in Fig. 1. Four relative-constraint datasets randomly generated with 4 different levels of noise are listed in Table I. The corresponding constants $\Delta z_{\phi_{r_2}^{r_1}}, a, \alpha, b$ can then be computed for each dataset using the formulas in (11)-(12), and the function $f(\phi)$ in (10) can be obtained. The two key constants $\Delta z_{\phi_{r_2}^{r_1}}$ and $\alpha$, the number of local minima within $[-2\pi - \alpha, 2\pi - \alpha]$, the global minimum $\phi^*$ and $f(\phi^*)$, as well as the objective function value obtained by solving the least squares problem using Gauss-Newton ($F(X^*)$), are all listed in Table I. Fig. 2 shows the function $f(\phi)$ for the 4 different datasets. It can be seen that more than one minimum exists only when the noise is unrealistically large.

Fig. 2. The function f (φ) in Theorem 1 for datasets with different levels of noise: pose graph with 3 poses and 3 constraints.
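Theorem 1 can be exercised numerically. The sketch below is our own illustration, not the authors' code: it assumes unit covariances per (9), minimizes $f(\phi)$ by dense grid search over $[-2\pi - \alpha, 2\pi - \alpha]$ rather than by the bisection approach developed later in the paper, computes the constants of (11)-(12), and recovers the poses via (13):

```python
import math

def constants(Z10, Z20, Z21):
    """Constants of Theorem 1 (eqs. (11)-(12)) from the three constraints."""
    zx1, zy1, zp1 = Z10
    zx2, zy2, zp2 = Z20
    zx, zy, zp = Z21
    # wrap Delta z_phi into [-pi, pi)
    dzp = math.atan2(math.sin(zp - zp2 + zp1), math.cos(zp - zp2 + zp1))
    cz, sz = math.cos(zp1), math.sin(zp1)
    Dx, Dy = zx2 - zx1, zy2 - zy1
    dzx = zx - Dx * cz - Dy * sz
    dzy = zy + Dx * sz - Dy * cz
    da = (-dzx * Dy + dzy * Dx) / 3.0
    db = (dzx * Dx + dzy * Dy) / 3.0
    d = (Dx ** 2 + Dy ** 2) / 3.0
    p = da * cz + db * sz
    q = -da * sz + db * cz
    a = math.hypot(p, d + q)
    alpha = math.atan2(p, d + q)
    b = (zx ** 2 + zy ** 2 + Dx ** 2 + Dy ** 2) / 3.0
    return dzp, a, alpha, b

def solve_three_pose(Z10, Z20, Z21, samples=200000):
    """Minimize f(phi) of (10) on a dense grid, then recover poses via (13)."""
    dzp, a, alpha, b = constants(Z10, Z20, Z21)
    f = lambda t: t * t + 0.5 * (t + dzp) ** 2 - 2 * a * math.cos(t + alpha) + b
    lo, hi = -2 * math.pi - alpha, 2 * math.pi - alpha
    phi = min((lo + k * (hi - lo) / samples for k in range(samples + 1)), key=f)
    p1 = phi + Z10[2]
    p2 = 0.5 * (Z20[2] + Z21[2] + p1)
    A = Z21[0] * math.cos(p1) - Z21[1] * math.sin(p1)
    B = Z21[0] * math.sin(p1) + Z21[1] * math.cos(p1)
    r1 = (2 * Z10[0] / 3 + Z20[0] / 3 - A / 3,
          2 * Z10[1] / 3 + Z20[1] / 3 - B / 3, p1)
    r2 = (Z10[0] / 3 + 2 * Z20[0] / 3 + A / 3,
          Z10[1] / 3 + 2 * Z20[1] / 3 + B / 3, p2)
    return f(phi), r1, r2
```

On the noise-free dataset of Table I this recovers the ground-truth poses with $f(\phi^*) \approx 0$, matching the "zero" row.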

III. POSE GRAPH WITH TWO ANCHOR POSES

Now consider a more general pose graph as shown in Fig. 3. This pose graph has a special structure: all edges connect to one of the two nodes $r_0$ and $r_1$. Here we call nodes $r_0$ and $r_1$ "anchor nodes" or "anchor poses". Assume there are:
• $n$ poses that have two edges, linked to both $r_0$ and $r_1$: $r_2, \ldots, r_n, r_{n+1}$;
• $n_1$ poses that only have one edge, linked to $r_0$: $r_{11}, r_{12}, \ldots, r_{1n_1}$;
• $n_2$ poses that only have one edge, linked to $r_1$: $r_{21}, r_{22}, \ldots, r_{2n_2}$.
Assume the relative pose constraints are:
• from $r_0$ to $r_1$: $(z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}})^T$;
• from $r_0$ to $r_i$: $(z_{x_{r_i}}, z_{y_{r_i}}, z_{\phi_{r_i}})^T$, $i = 2, \ldots, n+1$;
• from $r_1$ to $r_i$: $(z_{x_{r_i}^{r_1}}, z_{y_{r_i}^{r_1}}, z_{\phi_{r_i}^{r_1}})^T$, $i = 2, \ldots, n+1$;
• from $r_0$ to $r_{1j}$: $(z_{x_{r_{1j}}}, z_{y_{r_{1j}}}, z_{\phi_{r_{1j}}})^T$, $j = 1, \ldots, n_1$;
• from $r_1$ to $r_{2k}$: $(z_{x_{r_{2k}}}, z_{y_{r_{2k}}}, z_{\phi_{r_{2k}}})^T$, $k = 1, \ldots, n_2$.
In the example shown in Fig. 3, $n = 3$, $n_1 = 2$, $n_2 = 4$. It is obvious that the 3-pose 3-constraint pose graph in Fig. 1 is a special case with $n = 1$, $n_1 = 0$, $n_2 = 0$. The covariance matrices of the relative pose constraints can be assumed to have the same format as in (5). However, to simplify the formulas, we assume the covariance matrices are all identity matrices. To find the optimal configuration of the pose graph, the state vector $X$ contains all the robot poses except pose $r_0$, which is the origin of the coordinate frame. Similar to (7), the objective function of the least squares optimization problem


Fig. 3. Pose graph with two anchor poses r0 and r1.

$b = \Big(\frac{1}{n+2}\sum_{i=2}^{n+1} z_{x_{r_i}} + \big(\frac{2}{n+2} - 1\big)z_{x_{r_1}}\Big)^2 + \Big(\frac{1}{n+2}\sum_{i=2}^{n+1} z_{y_{r_i}} + \big(\frac{2}{n+2} - 1\big)z_{y_{r_1}}\Big)^2 + 2\sum_{j=2}^{n+1}\Big[\frac{1}{2(n+2)} z_{x_{r_1}} - \frac{1}{2} z_{x_{r_j}} + \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{x_{r_i}}\Big]^2 + 2\sum_{j=2}^{n+1}\Big[\frac{1}{2(n+2)} z_{y_{r_1}} - \frac{1}{2} z_{y_{r_j}} + \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{y_{r_i}}\Big]^2 + \frac{n+1}{2(n+2)}\sum_{i=2}^{n+1}\big(z_{x_{r_i}^{r_1}}^2 + z_{y_{r_i}^{r_1}}^2\big) - \frac{1}{n+2}\sum_{2 \le i < j \le n+1}\big(z_{x_{r_i}^{r_1}} z_{x_{r_j}^{r_1}} + z_{y_{r_i}^{r_1}} z_{y_{r_j}^{r_1}}\big) \quad (17)$