Space Operations: Inspiring Humankind's Future [1st ed.] 978-3-030-11535-7;978-3-030-11536-4

This book includes a selection of 30 reviewed and enhanced manuscripts presented at the 15th SpaceOps Conference, held in Marseille, France, May 28–June 1, 2018.


English Pages XVII, 856 [852] Year 2019




Helene Pasquier · Craig A. Cruzen · Michael Schmidhuber · Young H. Lee
Editors

Space Operations: Inspiring Humankind’s Future


Editors

Helene Pasquier
Direction du Numérique, de l'exploitation et des Opérations
Centre National D’Etudes Spatiales (CNES)
Toulouse, France

Craig A. Cruzen
Payload and Mission Operations
NASA Marshall Space Flight Center
Huntsville, AL, USA

Michael Schmidhuber
German Space Operations Center
DLR, German Aerospace Center
Wessling, Bayern, Germany

Young H. Lee
NASA Jet Propulsion Laboratory, California Institute of Technology
Pasadena, CA, USA

ISBN 978-3-030-11535-7
ISBN 978-3-030-11536-4 (eBook)
https://doi.org/10.1007/978-3-030-11536-4

Library of Congress Control Number: 2018967424

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Foreword

Around the world today, there is a growing awareness of the role of space activities in everyday life. Beyond the established domains of telecommunications, Earth imaging, weather forecasting, and search and rescue, the contributions of reliable and accurate navigation systems to society, and of Earth observation systems that monitor the evolution of the climate or help prevent natural disasters, have become obvious to the citizens of the world. Everyone can easily appreciate the benefits of these space applications in daily life.

Nowadays, the domain of exploration is also part of this widespread appreciation of space activities, after the multiple challenges overcome with the International Space Station and the many live images broadcast around the world on the achievements of this key phase of the “Man in Space” story. Even the exotic field of space science is now well known to the public, thanks to the amazing achievements of some recent robotic missions. The Rosetta spacecraft, flown by the European Space Agency, released the Philae lander onto comet 67P/Churyumov-Gerasimenko ten years and six months after its launch, at a distance of over 500,000,000 km from Earth. Another well-known mission is the New Horizons spacecraft of the National Aeronautics and Space Administration, which sent fantastic pictures of Pluto back to Earth from over 31 astronomical units away (about 4,700,000,000 km). Space systems like these and many others are returning wonderful images and scientific data that contribute to the dream of a humankind capable of pushing the limits of knowledge and technology.

Clearly, space utilization and the related applications are part of our lives, and the outstanding achievements performed in space inspire our dreams. Nevertheless, space is mainly considered a high-technology and advanced scientific field. Young engineers in particular quickly understand the technical challenges and the complex development processes needed to enable space missions. Their motivation, when joining a space organization, is often to be a designer, coping with the technical challenges of developing a new launcher, a satellite, or a lander. The field of space operations does not often immediately capture their attention.

Of the thousands of professionals working in the aerospace industry, only a small percentage are aware of the complexity and the many exciting challenges involved in designing, testing, and operating the space assets that make these breathtaking missions a reality and a success. This is the key aspect of the space operations discipline, a very demanding field where “failure is not an option” and the consequences of any apparently small mistake may be lethal for the mission. It is also an extremely rewarding domain: when participating in mission operations design, the development of ground segments, or the operations of a complex spacecraft, you see and measure the results of all your efforts and sacrifices at the key milestones of the mission. After the announcement of “separation from the launcher on a nominal trajectory,” the launch operations team celebrates a great achievement after months of hard preparation work. Upon successful completion of “mission orbit acquisition,” “in-flight acceptance tests,” or “the start of payload operations,” the spacecraft operations teams likewise see the fruits of their dedication and team effort. And what can be said of the extremely complex operations in the proximity of remote planets, where each key event confronts the team with a “never done before” kind of operation? The images of gleeful excitement in mission control rooms, seen from time to time in the news, allow us to measure how big the challenges are and reveal the huge pressure on the shoulders of the teams, who must always be ready to react properly to unforeseen situations that could otherwise lead to disaster for a mission. When attending a SpaceOps conference and meeting the international space operations community, it becomes easier to understand these sentiments. Yes, this is a community of people who want to share their unique accomplishments and technical challenges, but also the incredible personal experiences that only space operations can take you through.

These are the ideas that the organizers of SpaceOps 2018 highlighted in Marseille, France. The theme of the conference was “To Inspire Humankind’s Future,” a notion that carries the idea of the results of space exploration and utilization contributing to improving our future society and building on the dream that goes along with the achievements of the missions.

One key element in the organization of SpaceOps 2018 was to offer young generations an opportunity to explore the domain of space operations. This took concrete form in a solid program for students and young professionals, as well as sessions designed around the topic “Inspiring Next Generations.” The prime objective of this track was to enable cross-generational exchanges between the participants.

A second key element of the conference program was the plenary panels. The first had the theme “Climate Monitoring,” to show the benefits of accurately measuring the markers of the climate from space, as well as the added value space systems demonstrate in anticipating the effects of extreme climate phenomena (e.g., tsunamis, hurricanes, and forest fires). The second plenary, on the challenges and the dream, had the theme “Sample Return Missions,” to understand the scientific objectives and the contribution of operations to meeting these mission goals. The third plenary was dedicated to the spirit of being a privileged community, under the theme “Outstanding Achievements in Space Operations,” giving examples of what operations teams have achieved in the context of extremely complex missions, rescued missions, and re-engineered or extended missions.

From its creation in 1990, the SpaceOps organization’s main goal has been the monitoring and promotion of the evolution of all major fields of space operations, including unmanned and manned missions, and embracing space missions from near Earth to deep space. It is intended to be a forum enabling experts to talk, discuss, and share their knowledge and experience. To these ends, the international group has sponsored biennial conferences since 1990. In May 2018, the 15th SpaceOps conference was held in Marseille, France, with more than 870 participants from about 40 countries attending and a total of 398 presentations given. SpaceOps 2018 included the following topics:

• Mission Design and Management
• Operations Concepts
• Flight Execution
• Ground Systems Engineering
• Data Management
• Planning and Scheduling
• Guidance, Navigation, and Control
• Communications Architectures and Networks
• Human Systems and Operations
• Cross Support, Interoperability, and Standards
• Training and Knowledge Transfer
• Launcher, Rocket, and Balloon Operations
• Small Satellite Operations
• Commercial Space Operations
• Inspiring the Next Generations

Since the 2006 conference, the organizers and the SpaceOps Technical Program Committee (TPC) have produced a post-conference book, a compilation of approximately 10% of the papers presented at each conference. In 2018, the organizers decided to base the selection on quality and innovation while aiming for a representative cross section of the conference, with papers from each topic and from different geographical origins. The best student papers are also included in the present edition. This selection of outstanding technical papers captures the ideas driving the design of the SpaceOps 2018 conference and will give the readers of this post-conference book a feeling for what the conference was and for how space operations contribute to humankind’s dream of reaching space. I sincerely hope that the readers will keep this collection of extraordinary manuscripts as a reminder of one of the memorable milestones and achievements of their lives.

Furthermore, I would like to acknowledge and send warm thanks to all the contributors to SpaceOps 2018, including all of the authors, the participants, the sponsors, every topic chair of the TPC, and the Selection and Publication Committee for the best papers in this book. Thank you all.

For more information on the SpaceOps organization, visit www.spaceops.org.

Toulouse, France

Jean-Marc Soula
Centre National D’Etudes Spatiales
SpaceOps 2018 Technical Program Committee Chair

Preface

The SpaceOps organization was founded in 1990 to foster technical discussions on all aspects of space mission operations and ground data systems among space agencies, academic institutions, space operators, and industry. The organization aims to facilitate and encourage the exchange of managerial and technical information via periodic symposia concerning spacecraft, ground systems, and mission operations. Other formal and informal meetings, workshops, and the publication of managerial and technical information are also significant objectives. Formal SpaceOps conferences are organized on a biennial basis and are hosted by a selected participating space agency. Conference features include technical sessions, plenary sessions, poster presentations, social and networking events, an industry exhibition, and sponsorship opportunities.

The fifteenth symposium was held in Marseille, France, May 28–June 1, 2018, and was organized by the French Space Agency (CNES). Its theme was “Inspiring Humankind’s Future.” Following a precedent set at the 2006 conference, the organizers of SpaceOps 2018 decided to publish a book of “best” papers reflecting representative subjects presented at the conference. The SpaceOps 2018 conference topic chairs reviewed and selected the papers for this book. The topic chairs and technical organizers included:

Jean-Marc Soula (CNES), Andy Dowen (NASA/JPL), Shinichi Nakamura (JAXA), Young H. Lee (NASA/JPL), Michael Schmidt (ESA), Mariella Spada (ESA), Sean Burns (EUMETSAT), Michael Schmidhuber (DLR), Fabio D’Amico (ASI), Gérard Galet (CNES), Joan Differding (NASA/ARC), Mark Lupisella (NASA/GSFC), Bangyeop Kim (KARI), Andrew Monham (EUMETSAT), Béatrice Deguine (CNES), Keyul Patel (NASA/JPL), Thierry Levoir (CNES), Martin Wickler (DLR), Gian Paolo Calzolari (ESA), Thomas Müller (DLR), Craig A. Cruzen (NASA/MSFC), Takahiro Yamada (JAXA), Vladimir Nazarov (IKI), Michel Doyon (CSA), Zeina Mounzer (Telespazio VEGA), Julio Monreal (ESA), Harry Shaw (NASA/GSFC), Helene Pasquier (CNES), Mark Danehy (NOAA), Eugene Avenant (SANSA), and Yunus Bhayat (SANSA).

The selected papers were examined to assess the technical accuracy and completeness of the information. They were then edited for clarity, logical organization, and emphasis of importance to space operations. Some figures and tables are reprinted exactly as they appeared in the original conference papers; in cases where AIAA originally owned the copyright, they are reprinted with permission.

The editors wish to thank the conference session and topic chairs, the organizers, the SpaceOps Executive Committee, and the SpaceOps Publications Group; all were instrumental in the development and publication of this book. Finally, and most importantly, the editors would like to thank the authors who contributed to this publication. Without their hard work and diligence, this esteemed compilation of conference best papers would not have been possible.

Toulouse, France; Huntsville, Alabama, USA; Wessling, Germany; Pasadena, California, USA
November 2018

Helene Pasquier
Craig A. Cruzen
Michael Schmidhuber
Young H. Lee

Contents

Part I  Mission Design and Mission Management

Implementing Next-Generation Relay Services at Mars in an International Relay Network (Roy E. Gladden, Greg J. Kazz, Scott C. Burleigh, Daniel Wenkert and Charles D. Edwards) .... 3

Space Mobile Network Concepts for Missions Beyond Low Earth Orbit (David J. Israel, Christopher J. Roberts, Robert M. Morgenstern, Jay L. Gao and Wallace S. Tai) .... 25

Creating a NASA Deep Space Optical Communications System (Leslie J. Deutsch, Stephen M. Lichten, Daniel J. Hoppe, Anthony J. Russo and Donald M. Cornwell) .... 43

Concept of Operations for the Gateway (Kathleen Coderre, Christine Edwards, Tim Cichan, Danielle Richey, Nathan Shupe, David Sabolish, Steven Ramm, Brent Perkes, Jerome Posey, William Pratt and Eileen Liu) .... 63

Ariane 6 Launch System Operational Concept Main Drivers (Pier Domenico Resta, Julio A. Monreal, Benoît Pouffary, Sonia Lemercier, Aline Decadi and Emilie Arnoud) .... 83

LUMIO: An Autonomous CubeSat for Lunar Exploration (Stefano Speretta, Angelo Cervone, Prem Sundaramoorthy, Ron Noomen, Samiksha Mestry, Ana Cipriano, Francesco Topputo, James Biggs, Pierluigi Di Lizia, Mauro Massari, Karthik V. Mani, Diogene A. Dei Tos, Simone Ceccherini, Vittorio Franzese, Anton Ivanov, Demetrio Labate, Leonardo Tommasi, Arnoud Jochemsen, Jānis Gailis, Roberto Furfaro, Vishnu Reddy, Johan Vennekens and Roger Walker) .... 103

Use of Terrain-Based Analysis in Mission Design, Planning and Modeling of Operations of a Lunar Exploration Rover (M. S. Menon, A. Kothandhapani, N. S. Sundaram, S. Nagaraj and A. Gopalan) .... 135

The Evolution of Interface Specification for Spacecraft Command and Control (Eric Brenner, Ron Bolton, Chris Ostrum and A. Marquis Gacy) .... 169

Exploring the Benefits of a Model-Based Approach for Tests and Operational Procedures (R. de Ferluc, F. Bergomi and G. Garcia) .... 181

The Power of High-Fidelity, Mission-Level Modeling and Simulation to Influence Spacecraft Design and Operability for Europa Clipper (Eric W. Ferguson, Steve S. Wissler, Ben K. Bradley, Pierre F. Maldague, Jan M. Ludwinski and Chistopher R. Lawler) .... 195

Attitude Control Optimization of a Two-CubeSat Virtual Telescope in a Highly Elliptical Orbit (Reza Pirayesh, Asal Naseri, Fernando Moreu, Steven Stochaj, Neerav Shah and John Krizmanic) .... 233

Part II  Ground Systems and Networks

The Cassini/Huygens Navigation Ground Data System: Design, Implementation, and Operations (R. M. Beswick) .... 261

Ground Enterprise Transformation at NESDIS (Steven R. Petersen) .... 323

CNES Mission Operations System Roadmap: Towards Rationalisation and Efficiency with ISIS (Paul Gélie, Helene Pasquier and Yves Labrune) .... 357

Return Link Service Provider (RLSP) Acknowledgement Service to Confirm the Detection and Localization of the SAR Galileo Alerts (M. Fontanier, H. Ruiz and C. Scaleggi) .... 393

Automated Techniques for Routine Monitoring and Contingency Detection of LEO Spacecraft Operations (Ed Trollope, Richard Dyer, Tiago Francisco, James Miller, Mauro Pagan Griso and Alessandro Argemandy) .... 413

The Added Value of Advanced Feature Engineering and Selection for Machine Learning Models in Spacecraft Behavior Prediction (Ying Gu, Gagan Manjunatha Gowda, Praveen Kumar Jayanna, Redouane Boumghar, Luke Lucas, Ansgar Bernardi and Andreas Dengel) .... 439

The EnMAP Mission Planning System (Thomas Fruth, Christoph Lenzen, Elke Gross and Falk Mrowka) .... 455

Recommendations Emerging from an Analysis of NASA’s Deep Space Communications Capacity (Douglas S. Abraham, Bruce E. MacNeal, David P. Heckman, Yijiang Chen, Janet P. Wu, Kristy Tran, Andrew Kwok and Carlyn-Ann Lee) .... 475

Statistical Methods for Outlier Detection in Space Telemetries (Clémentine Barreyre, Loic Boussouf, Bertrand Cabon, Béatrice Laurent and Jean-Michel Loubes) .... 513

Part III  Mission Execution

In-Orbit Experience of the Gaia and LISA Pathfinder Cold Gas Micro-propulsion Systems (Jonas Marie, Federico Cordero, David Milligan, Eric Ecale and Philippe Tatry) .... 551

The Cassini Mission: Reconstructing Thirteen Years of the Most Complex Gravity-Assist Trajectory Flown to Date (Julie Bellerose, Duane Roth, Zahi Tarzi and Sean Wagner) .... 575

Resurrecting NEOSSat: How Innovative Flight Software Saved Canada’s Space Telescope (Viqar Abbasi, Natasha Jackson, Michel Doyon, Ron Wessels, Pooya Sekhavat, Matthew Cannata, Ross Gillett and Stuart Eagleson) .... 589

New Ways to Fly an Old Spacecraft: Enabling Further Discoveries with Kepler’s K2 Mission (K. A. Larson, K. M. McCalmont-Everton, C. A. Peterson, S. E. Ross, J. Troeltzsch and D. Wiemer) .... 615

Incorporating Lessons Learned from Past Missions for InSight Activity Planning and Sequencing (Forrest Ridenhour, C. Lawler, K. Roffo, M. Smith, S. Wissler and P. Maldague) .... 635

Mars Aerobraking Operations for ExoMars TGO: A Flight Dynamics Perspective (F. Castellini, G. Bellei and B. Godard) .... 661

Operations Design, Test and In-Flight Experience of the Sentinels Optical Communications Payload (OCP) (I. Shurmer, F. Marchese and J. Morales) .... 695

Ant-Based Mission Planning for Constellations: A Generic Framework Applied to EO and Data Relay Missions (Evridiki V. Ntagiou, Roberto Armellin, Claudio Iacopino, Nicola Policella and Alessandro Donati) .... 729

Operational Benefit of a 3D Printer in Future Human Mars Missions—Results from Analog Simulation Testing (M. Müller, S. Gruber, M. D. Coen, R. Campbell, D. Kim and B. Morrell) .... 747

Ethological Approach of the Human Factors from Space Missions to Space Operations (C. Tafforin, S. Michel and G. Galet) .... 779

Enhanced Awareness in Space Operations Using Web-Based Interactive Multipurpose Dynamic Network Analysis (Redouane Boumghar, Rui Nuno Neves Madeira, Alessandro Donati, Ioannis Angelis, José Fernando Moreira Da Silva, Jose Antonio Martinez Heras and Jonathan Schulster) .... 795

Space Education and Awareness in South Africa—Programmes, Initiatives, Achievements, Challenges and Issues (S. G. Magagula and J. Witten) .... 811

Educational Outreach and International Collaboration Through ARISS: Amateur Radio on the International Space Station (Frank H. Bauer, David Taylor, Rosalie A. White and Oliver Amend) .... 827

About the Editors

Helene Pasquier, Lead Editor is a ground systems operations expert in the Operations department at CNES (French Space Agency). She has been Head of Generic Ground Systems Section in the Products and Grounds Systems Department, leading the development of generic and reusable monitoring and control software systems, which is a part of CNES Mission Operations Systems. She is acting in research and technology studies and coordinates the use of the international standards (ECSS, CCSDS Mission Operations, and Cross Support Services) within the Mission Operations Systems developed by the Operations department. She works in close cooperation with the European Space Agency (ESA) for Ground Systems Harmonization projects and has been Active Member of the SpaceOps Technical Program Committee since 2008. Since 1982 at CNES, she held key positions in Earth Observation programs (SPOT, Pleiades HR) and Science Ground Systems (CADMOS French USOC, etc.) in the area of Mission Operations Systems and Payload Operations and Data Systems. She graduated in computer science engineering from the Institut National Polytechnique de Grenoble, France (ENSIMAG, INPG). Craig A. Cruzen, Editor is a Payload Operations Director for the International Space Station (ISS) Program at NASA’s George C. Marshall Space Flight Center (MSFC) in Huntsville, Alabama, USA, where he and his colleagues lead a ground control team in performing science operations onboard the ISS. He joined NASA in 1990 as Launch Trajectory Analyst and later served as Ascent Guidance Engineer. He also was Lead Developer of NASA’s Automated Rendezvous and Capture technology development project. In 2000, he transitioned to MSFC’s Mission Operations Laboratory where he certified as a Payload Rack Officer and a Timeline Change Officer in support of ISS real-time science operations. He was selected to be a Payload Operation Director in 2003 and served on

xv

xvi

About the Editors

console until 2009. From 2009 to 2012, he was Flight Operations Lead for NASA’s Ares launch vehicle development program. He returned to the operations directors’ office in 2013 and was named training lead in 2015. In addition to his NASA assignments, he has authored 3 books in cooperation with the American Institute of Aeronautics and Astronautics on space operations techniques and innovations, (2011, 2013, and 2015) and one from Springer (2017). He is Member of several professional organizations including the International Committee on Technical Interchange for Space Mission Operations (SpaceOps), the aerospace engineering honor society, Sigma Gamma Tau, and the National Association of Flight Instructors. He is Certified Flight Instructor and has over 3000 h in several types of single- and multi-engine aircraft. He has been the recipient of numerous NASA awards including the Engineering Director’s commendation for extraordinary leadership (2010), and the Silver Snoopy Award (2013) which is given by NASA astronauts for professionalism, dedication, and outstanding support to space flight mission success. He was born in Flint, Michigan, and claims Flushing, Michigan, as his hometown. He graduated from Flushing High School in 1987 and earned a B.S. in aerospace engineering from the University of Michigan in Ann Arbor in 1992. Michael Schmidhuber, Editor is employed at the German Space Operations Center (GSOC) of the German Aerospace Center (DLR). Currently, he is responsible for the ground system engineering for geostationary projects, especially ESA’s new data-relay satellite system (EDRS) that is operated from GSOC. He has worked in flight dynamics, mission operations, and system development for various LEO and GEO satellite projects since 1994. Among his activities are establishing of intranet sites for operations and the application of virtualization in the control room environment. 
He organizes a yearly spacecraft operations training course for external participants and is also involved in GSOC's public outreach activities. He was born and grew up in Munich, Germany. His interest in spaceflight and astronomy goes back to the early seventies and was strongly influenced by the Apollo era. He graduated in aerospace engineering from the Technical University of Munich in 1994. After several space-related post-diploma jobs, he joined a contractor company to work in mission operations at DLR. In 2006, he became a staff member of DLR. Young H. Lee, Editor, is the Advanced Design Engineering Technical Group Supervisor and Project Support Lead in the Project Systems Engineering and Formulation Section at the Jet Propulsion Laboratory. She is currently supporting the Radioisotope Power Systems (RPS) program office in the area of mission analysis, coordinating a variety of mission studies that support the advocacy of new RPS for future solar system exploration. In the past, she was the Mission Study Team Lead of the multi-agency, multi-mission-directorate, and multi-center Nuclear Power Assessment Study, delivering the final report on a sustainable strategy for safe, reliable, and affordable nuclear power systems for space exploration.

Over the last ten years, she has held many diverse leadership positions in NASA programs and projects, establishing strategic and collaborative working relationships across many organizations within NASA, including its partners. In addition, she has over 20 years of experience in the development and deployment of operations systems for deep space missions, focusing on operations cost reduction, user-productivity improvements, and increased information throughput in support of many NASA deep space missions. Her first technical paper, titled "Network Monitoring and Analysis Expert System", was presented at the First International Symposium on Ground Data Systems for Spacecraft Control, Darmstadt, Germany, in June 1990. Additionally, she was the first chapter lead author of the 2012 SpaceOps Conference Book, titled The International Space Station: Unique In-Space Testbed as Exploration Analog. She has been the recipient of numerous NASA awards, including the Space Flight Awareness Team Leadership Award, which recognizes mid-level managers who consistently demonstrate loyalty, empowerment, accountability, diversity, excellence, respect, sharing, honesty, and integrity and are proactive. She earned an M.S. in management of information systems from Claremont Graduate University in California in 1992 and a B.S. in computer information systems from California State Polytechnic University, Pomona, in 1984. She currently lives in Pomona, California, and is the proud mother of two aspiring young men.

Part I

Mission Design and Mission Management

Implementing Next-Generation Relay Services at Mars in an International Relay Network

Roy E. Gladden, Greg J. Kazz, Scott C. Burleigh, Daniel Wenkert and Charles D. Edwards

Abstract Nearly all data acquired by vehicles on the surface of Mars is returned to Earth via Mars orbiters—more than 1.7 TB so far. Successful communication between the various spacecraft is achieved via the careful implementation of internationally recognized CCSDS telecommunications protocols and the use of planning and coordination services provided by NASA's Mars Program Office and the Multimission Ground Systems and Services (MGSS) Program at the Jet Propulsion Laboratory in Pasadena, CA. This modern Mars relay network has evolved since its inception in 2004 with the addition and loss of several missions, but it has fundamentally remained unchanged. Ground interfaces between the various spacecrafts' mission operation centers on Earth remain largely unique for each participant; each mission maintains its own interfaces with deep-space communications networks (e.g., DSN, ESTRACK), which are similar but still unique; and relay sessions at Mars require careful ground planning, coordination, and implementation. This chapter will discuss the existing architecture and consider how several technologies may be applied to the next generation of relay services at Mars. Ultimately, these are expected to lead to the implementation of a delay- and disruption-tolerant network at Mars, a precursor to becoming a major element in an emerging Solar System Internetwork. This chapter, which derives material from a paper the authors delivered at the SpaceOps 2018 conference [1], will discuss several of these pending technologies, which are predicted to be necessary for the next generation of relay activities at Mars.

Nomenclature

In this chapter, the following terms are used:
• The term "data" refers to binary information that needs to be received by or sent from an entity. This term intentionally makes no assumption about the content, formatting, size, or construct of the data.
• The terms "data file" and "data product" imply that data may be assembled into a distinct construct that may be received by or sent from an entity.

R. E. Gladden (B) · G. J. Kazz · S. C. Burleigh · D. Wenkert · C. D. Edwards
NASA/Jet Propulsion Laboratory/California Institute of Technology, Pasadena, CA 91109, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind's Future,
https://doi.org/10.1007/978-3-030-11536-4_1

• The term "relay services" implies the transfer of data from one entity to another by an intermediary who provides the services.
• The term "relay service user spacecraft" refers, as used herein, to a spacecraft that requires relay services from another spacecraft.
• The term "relay service provider spacecraft" refers, as used herein, to a spacecraft that provides relay services to another spacecraft. It is notable that a spacecraft may be both a relay service user spacecraft and a relay service provider spacecraft.
• The term "node" refers to an entity that acts as a relay service provider or a relay service user and may refer to a spacecraft, a ground station, or another ground-based network component.
• The term "relay session" refers to a communication session between a relay service user spacecraft and a relay service provider spacecraft during which data may be transferred.
• The term "lander" is deliberately used in this chapter to refer to any spacecraft on or near the surface of Mars, distinct from those that may be orbiting Mars. Generally, a lander is a relay service user spacecraft, but this should not be considered a strict equivalency. When specifically discussing the existing Curiosity and Opportunity rovers on the surface of Mars, the term "rover" may be used interchangeably.
• The term "orbiter" is deliberately used in this chapter to refer to any spacecraft that is orbiting Mars, distinct from those that may be on or near the surface of Mars. Generally, an orbiter is a relay service provider spacecraft, but this should not be considered a strict equivalency.
• The term "return-link" refers to the transfer of data from a relay service user spacecraft to the operators of the relay service user spacecraft via a relay service provider spacecraft and other intermediate network nodes.
• The term "forward-link" refers to the transfer of data from the operators of a relay service user spacecraft to the relay service user spacecraft, opposite to the "return-link" data flow, through a relay service provider spacecraft and other intermediate network nodes.
• Data transferred to a relay service user spacecraft is referred to as "forward-link data." This data may be a data file.
• Data received from a relay service user spacecraft is referred to as "return-link data." This data may be a data file.
• The term "sol" refers herein to a Martian day.

In this chapter, the following acronyms are used:

AIAA      American Institute of Aeronautics and Astronautics
APID      Application Identifier
BCB       Block Confidentiality Block
BIB       Block Integrity Block
BIBE      Bundle-In-Bundle Encapsulation
BP        Bundle Protocol
BSP       Bundle Security Protocol
CCSDS     Consultative Committee for Space Data Systems
CFDP      CCSDS File Delivery Protocol
DSN       Deep-Space Network
DTN       Delay-Tolerant Network
EMI       Electromagnetic Interference
ESA       European Space Agency
ESTRACK   ESA's Deep-Space Tracking Stations
FHLH      First-Hop/Last-Hop
FTP       File Transfer Protocol
InSight   Interior Exploration using Seismic Investigations, Geodesy and Heat Transport (lander)
IP        Internet Protocol
LTP       Licklider Transmission Protocol
MAC       Media Access Control (address)
MAVEN     Mars Atmosphere and Volatile Evolution (orbiter)
MEX       Mars Express (orbiter)
MGSS      Multimission Ground Systems and Services
MRO       Mars Reconnaissance Orbiter
MTU       Maximum Transmission Unit
NASA      National Aeronautics and Space Administration
PDU       Protocol Data Unit
Prox-1    Proximity-1 (protocol)
PUS       Packet Utilization Standard
SMTP      Simple Mail Transfer Protocol
SSI       Solar System Internetwork
TCP       Transmission Control Protocol
TGO       Trace Gas Orbiter
USLP      Unified Space Link Protocol

1 Introduction

Nearly all data acquired by vehicles on the surface of Mars is returned to Earth via Mars orbiters—more than 1.7 TB. Today, the Curiosity and Opportunity rovers from the National Aeronautics and Space Administration (NASA) may utilize up to five Mars orbiters, including two from the European Space Agency (ESA), to relay this data back to Earth, a capability that represents a highly successful international collaboration. Communication between the various spacecraft is achieved via the careful implementation of Consultative Committee for Space Data Systems (CCSDS) telecommunications protocols and the use of planning and coordination services provided by NASA's Mars Program Office and the Multimission Ground Systems and Services Program at the Jet Propulsion Laboratory in Pasadena, CA. The topology of the relay network includes the proximity links between the spacecraft at Mars, the deep-space links between the orbiters at Mars and their Earth-based ground stations, and the ground system links between the ground stations and the


Fig. 1 General relay network topology

mission operation centers for the orbiters and the rovers at Mars. This topology also rightly includes the infrastructure required for the planning, coordination, and implementation of activities at each node in the network, inclusive of ground systems, deep-space antennas, and the spacecraft themselves. A generalized view of this topology is illustrated in Fig. 1. This modern Mars relay network has evolved since its inception in 2004 with the addition and loss of several missions at Mars, but it has fundamentally remained unchanged. Ground interfaces between the mission operation centers of the various spacecraft on Earth remain largely unique for each participant; each mission maintains its own interfaces with the deep-space networks, which manifest as similar but still unique interfaces; and the relay sessions at Mars require careful ground planning, coordination, and implementation. For the small, existing relay network, these one-to-one interfaces are manageable and the hands-on approach to relay planning serves the participants well. However, as the network is expected to grow and evolve in the next decade to include more spacecraft from a wider cast of participating organizations—and potentially even including human exploration components—it is useful to consider what other technologies might be needed.

2 Precursor Technologies

All of the following capabilities are necessary precursors to the implementation of a delay- and disruption-tolerant network, which further requires automated, in situ communications scheduling. The implementation of these capabilities is expected to be guided or informed by existing and emerging CCSDS protocol and mission operations standards, which will be mentioned in this chapter.


2.1 Addressable Data Transfers

In the terrestrial Internet, most data communicated between machines includes an address that defines its intended endpoint. For example, when a user sends an email, the header of the email contains a construct that indicates the end recipient of the message. This email address is used by the transmission protocol to route the data through the network to the recipient. The address is meta-data associated with the content of the message and is not typically considered part of the message itself. The transmission protocol that manages the message (either to receive it or to transfer it elsewhere in the network) reads and interprets this meta-data to recognize the recipient. In the terrestrial Internet, most machines that handle emails typically do not directly know all of the intended recipients network-wide, but instead they know enough to route the email to another machine on the network that is assumed to be closer to the end recipient. In this way, data is forwarded from machine to machine until the message is delivered. By contrast, all of the orbiters that are currently operating as part of the Mars relay network can receive data from a lander on the surface of Mars, but the orbiters only know one destination to which to transfer the data: Earth. The orbiters receive the data as an undifferentiated binary stream and package it as if it were orbiter data. For example, NASA's 2001 Mars Odyssey orbiter, the oldest spacecraft in the extant Mars relay network, is told by ground operators the identity of the asset on the surface of Mars with which it communicates. It uses this information to attribute an application identifier, or APID, to the data prior to transmitting the data to Earth.
There is no other distinguishing marker applied by the orbiter to the data at transmission time, and it is left to ground operators to sort out the actual end recipient of the data, a matter that is complicated if the same APID is applied to data received from more than one lander. ESA's Mars Express (MEX) orbiter handles this return-link data in a similar manner. The newest orbiter in the network, the ExoMars Trace Gas Orbiter (TGO), operated by ESA, receives data from a surface asset during a communication session and packages all the received data as a single CCSDS File Delivery Protocol (CFDP, see Ref. [2]) data product (i.e., a file). As with the Odyssey orbiter, TGO must also be told the identity of the asset from which data will be received prior to the relay session; this information is applied to the CFDP file header. Despite having this destination information available to it, TGO does not interpret this destination data in any way, but again packages the data as if it were TGO data and transmits it to Earth. Ground processes must assemble the CFDP file on the ground, identify the intended recipient, and then route the data appropriately. NASA's Mars Reconnaissance Orbiter (MRO) and the Mars Atmosphere and Volatile Evolution (MAVEN) orbiter handle return-link data in a similar manner. In all cases, ground processes take an active role in the transfer of data to the end recipients, typically the mission operators or scientists operating the relay service user spacecraft. These ground processes are responsible for de-packaging the data as orbiter data to access the lander data within it, handling any CFDP processing of the


data, typically to remove the CFDP packaging; and then identifying the end recipient and delivering it. The situation is complicated by the use of multiple ground station networks, such as NASA's Deep-Space Network (DSN) and ESA's European Space Tracking (ESTRACK) network, which requires a multitude of implementations to support the various paths through which the data may flow. Despite the apparent simplicity of this approach, which serves the current needs of the network quite well, there are several fundamental weaknesses:

1. The orbiters must be informed, prior to the relay session, of the identity of the relay service user with whom a relay session will occur. This has led to an architecture where the orbiter must always be configured either to initiate the relay session itself or to be proactively commanded to expect to receive a signal from a relay service user spacecraft at specific times (see also Sect. 2.4 for more information on this operational paradigm).
2. The orbiters cannot support receiving data from a local relay service user spacecraft and then forwarding it to another relay service user spacecraft at Mars. This prohibits supporting orbiter-to-orbiter, lander-to-orbiter, orbiter-to-lander, and lander-to-lander relay services (i.e., without Earth as an endpoint).
3. Ground processes that handle the data, as received from the five orbiters, are unique for each orbiter. This has required the implementation of unique ground processes for de-packaging the relay data from each orbiter's data, identifying the intended recipient of the data, and then transferring that data.

To achieve addressable data transfers at Mars, the use of the Delay-Tolerant Networking (DTN) architecture's Bundle Protocol (BP), a CCSDS standard, is proposed (Ref. [3]).
Future orbiter missions could then, at the time of receipt of data from a relay service user spacecraft, identify the destination and automatically determine the next step to delivering the data to its intended recipient. Similarly, an orbiter's own science instruments could identify the recipient as the science teams who operate the instruments. In today's network, the next node to which the orbiter should transmit these packaged data would still be "Earth," but then ground stations could receive the data and directly transfer it to the recipient without the need to de-package and otherwise process the data. In addition, data transfers to destinations other than Earth could be supported. Operational experience with the DTN architecture suggests that it will be applicable to a Mars relay network; DTN was demonstrated on the EPOXI spacecraft in deep space in 2008 (Ref. [4]) and has been operational on the International Space Station since 2016 (Ref. [5]). However, one challenge in implementing this capability is that the entire network, or at least several connected parts within it, would need to understand the addressing structure within BP. In the current space community, many spacecraft are built using proprietary technologies or formatting. This is a challenge that directly inhibits adoption. To help overcome this, a DTN "first-hop/last-hop" (FHLH) mechanism is being developed that will enable a BP-cognizant spacecraft to communicate with a non-BP-cognizant spacecraft using legacy links.
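To make the contrast with today's scheme concrete, the following sketch shows endpoint-addressed forwarding in the style of the BP, where each node knows only the next hop toward a destination rather than the full path. The node names, endpoint IDs, and routing table are invented for illustration and are not drawn from any flight implementation:

```python
# Hypothetical sketch of BP-style endpoint addressing: each data unit carries
# its final destination, and every node consults only a local next-hop table.
from dataclasses import dataclass

@dataclass
class Bundle:
    source: str       # endpoint ID of the originating node, e.g. "dtn://lander-1"
    destination: str  # endpoint ID of the final recipient
    payload: bytes

# Each node knows only the *next hop* toward a destination, not the full path.
ROUTING_TABLE = {
    "orbiter-1": {
        "dtn://ops-center": "dtn://earth-station",  # downlink toward Earth
        "dtn://lander-2":   "dtn://lander-2",       # direct proximity link
    },
    "earth-station": {
        "dtn://ops-center": "dtn://ops-center",     # final delivery on the ground
    },
}

def next_hop(node: str, bundle: Bundle) -> str:
    """Return the node to which `node` should forward `bundle`."""
    return ROUTING_TABLE[node][bundle.destination]

b = Bundle("dtn://lander-1", "dtn://ops-center", b"image data")
hop1 = next_hop("orbiter-1", b)      # the orbiter forwards toward Earth
hop2 = next_hop("earth-station", b)  # the ground station delivers directly
```

Because the destination travels with the data, the ground station can route the bundle without de-packaging orbiter telemetry, and a destination other than Earth (e.g., another lander) is just another routing-table entry.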


2.2 Custody Data Transfers

In the terrestrial Internet, many protocols exist to manage the transmission of data across a network. These protocols often include a method to ensure that the data is completely transferred without error. Such protocols include the common File Transfer Protocol (FTP) (Ref. [6]) and the Simple Mail Transfer Protocol (SMTP) (Ref. [7]). In principle, reliable data transfers are effected when the receiving entity informs the sending entity that it has received the transmitted data. This can be verified by the use of checksums and other error detection schemes. When a transfer has concluded successfully, the sending entity can "forget" about the data, and the receiving entity then takes responsibility for delivering the data to the end recipient. This is called a "custody transfer." In the current Mars relay network, there exists no notion of a custody transfer. When data that has been generated by a lander is transmitted to an orbiter, the data is received as simple binary data. No specific format or content of the received data is assumed, as mentioned in Sect. 2.1. Usually, the data is reliably transferred by using the CCSDS Proximity-1 (Prox-1) Protocol (Refs. [8–10]),1 which ensures that the data is moved reliably as a bitstream.2 In this way, no data is lost or duplicated in the transfer. However, the orbiter does not take strict custody of the data because it does not understand its content. For example, a lander may send several images during a single transfer, and the orbiter has no mechanism to detect the boundaries of the image files within the data it has received. The lack of knowledge regarding the structure of the received data prohibits the use of a true custody transfer from the lander to the orbiter. The orbiters simply turn the data around and transmit the data to Earth without regard for its internal data boundaries.
Reliability on this direct-to-Earth transfer is ensured on a few of the existing spacecraft using different retransmission schemes, but even those implementations are not common across the orbiters. TGO, for example, uses the CCSDS Packet Utilization Standard (PUS) Service 13 (Ref. [12]) to ensure reliable transfers from the orbiter to Earth, but this again is applied at the level of the orbiter data product, which can include many data products as received from a lander. MRO uses a proprietary retransmission scheme that suffers from the same limitations. These retransmission schemes, even when applied, are not 100% effective, and occasionally there remain gaps in the received data. Data may also be lost in transmission from the ground stations to the mission users, but this is extremely rare. In the end, incomplete data may be received by the lander operators. These operators may then choose to command the retransmission of the missing data from the lander on the surface of Mars, thereby starting the whole chain of communications over again for that data, or they may choose to accept the loss.

1 Note that the CCSDS Proximity-1 Protocol is expected to be replaced by the Unified Space Link Protocol (USLP), as in Ref. [11].
2 Note that the current Mars relay network implements the Proximity-1 protocol as a reliable bitstream. However, Prox-1 does have a provision for reliable packet transfer as well, which enables the accountable transfer of data units across the network.

Due to the inherent

delays in the network, sometimes the original data is no longer available onboard the lander and the data is unrecoverable. If a true custody transfer of the data were to be implemented between the various nodes in the network, then the data could be moved off the lander on the first reliable transmission and the lander could then "forget" about it immediately. The orbiter would then take full responsibility for transmitting the data to Earth and could do so reliably. Upon receipt, the ground stations would then take full responsibility for transmitting the data to the next recipient, etc. This notion of a custody transfer requires that the data be packaged as recognizable Protocol Data Units (PDUs) and that each node in the network recognizes the boundaries of those units. It also relies upon the existence of a reliable transmission of the data between the nodes in the network. Here again, the BP could be applied to answer the question of packaging. The reliability question could be managed on the direct-to-Earth transfer using a variety of methodologies, but it is proposed that the DTN Licklider Transmission Protocol (LTP) (Ref. [13]) be applied, for reasons that will be explained later in this chapter. Several problems are identified relating to custody transfers in deep-space applications. First among them is that a reliable transmission of the data may require several retransmissions in order to acquire data that was lost on the first transmission. A reliable exchange between a pair of nodes that are in close proximity to each other (where the transit times are low, such as between a lander and an orbiter) may not be greatly affected by this. However, in deep-space applications, the time it takes for the radio signals to transit between the nodes (such as between an orbiter and the Earth) may cause a significant delay when attempting to assemble a complete data product.
For the operators of both the Curiosity and Opportunity rovers on the surface of Mars, this delay may be unacceptable. Both operations teams typically generate commands for the next sol's activities after receiving data from the prior sol's activities. This is often necessary because the rovers physically move from day to day and the local environmental conditions on the surface of Mars may prove hazardous to the long-term health of the rovers, or the missions themselves may have a short design lifetime. In both cases, the mission operators are willing to accept receipt of their data even if there exist significant gaps within it. To them, some data is better than no data. For this sort of operational scenario, it may be advisable to use a protocol like CFDP that can deliver partial data products incrementally as segments are received. In the other direction, when transmitting commands from Earth to the Curiosity and Opportunity rovers, the late receipt of command products onboard the rovers might prove risky to the mission, as commands are quite often time-dependent. In addition, the commands are typically not even available to send to the rovers until very close to their execution time, with little occasion for retransmission if the commands are lost anywhere in the transfer path. Thus, in the return-link direction, the rover operators will accept partial data returns, but are not content with delays in the return; in the forward-link direction, the operators will accept only an on-time and complete transfer.
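The custody-transfer idea discussed above can be sketched as follows: the sender retains a copy of each PDU until the receiver verifies it intact and signals acceptance of custody, at which point the sender may safely "forget" the data. The class names and the checksum-based custody signal are illustrative assumptions; actual DTN custody signaling is considerably richer than this:

```python
# Illustrative custody-transfer sketch (hypothetical classes, not flight code).
import hashlib

class CustodySender:
    def __init__(self):
        self.pending = {}  # pdu_id -> payload, retained until custody is accepted

    def send(self, pdu_id, payload):
        self.pending[pdu_id] = payload          # keep a copy for retransmission
        return pdu_id, payload, hashlib.sha256(payload).hexdigest()

    def on_custody_accepted(self, pdu_id):
        del self.pending[pdu_id]                # the sender may now "forget" it

class CustodyReceiver:
    def receive(self, pdu_id, payload, digest):
        # Accept custody only if the data arrived intact.
        if hashlib.sha256(payload).hexdigest() == digest:
            return pdu_id   # custody signal returned to the sender
        return None         # corrupted: the sender keeps its copy and retransmits

sender, receiver = CustodySender(), CustodyReceiver()
ack = receiver.receive(*sender.send("pdu-1", b"telemetry"))
if ack is not None:
    sender.on_custody_accepted(ack)
# sender.pending is now empty: responsibility has moved to the receiver.
```

The key property is that exactly one node is responsible for each PDU at any moment, so a lander can free its limited onboard storage after the first successful proximity pass instead of holding data for a possible retransmission request from Earth.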


The classical method for assuring reliable data delivery in a network is retransmission, as is performed by LTP over links characterized by extremely long signal propagation delays. When the resulting final delivery latency is unacceptably high, “erasure coding” can be used instead, whereby a data item is segmented into some number of data blocks and a number of “parity blocks” are computed from the content of the data blocks such that reception of a combination of data blocks and parity blocks can suffice to recreate the original data item (notably without retransmission) even if multiple data blocks are lost in transit. The number of parity blocks computed from the data blocks can be arbitrarily increased until the probability of data item recreation failure drops to nearly zero.
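The erasure-coding principle described above can be illustrated with the simplest possible scheme: a single XOR parity block over equal-sized data blocks, which lets the receiver rebuild any one lost block without retransmission. Real schemes (e.g., Reed-Solomon codes) generalize this to many parity blocks; this toy example only shows the principle:

```python
# Minimal erasure-coding illustration: one XOR parity block over k data blocks.
def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)     # transmitted alongside the data blocks

# Suppose the second block is lost in transit: XOR of everything that *did*
# arrive (the surviving data blocks plus the parity block) recreates it.
received = [data_blocks[0], data_blocks[2], parity]
recovered = xor_blocks(received)
assert recovered == b"BBBB"          # rebuilt with no retransmission round-trip
```

With one parity block, any single lost block is recoverable; adding more independently computed parity blocks drives the residual loss probability toward zero, trading extra transmitted volume for avoided round-trip delay.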

2.3 Multi-channel Support and Frequency Agility

Terrestrial computer-to-computer interfaces typically function using an exponential backoff approach to ensure that a message is passed successfully between computers. The target computer is identified by its Media Access Control (MAC) address, which is attached to the message. As many computers may exist on the same transmission medium (either a wired network or an over-the-air wireless network), a computer will wait until the medium is clear before attempting to send a message. Computers on the medium that are not the intended recipient ignore the message. If two computers attempt to transmit at the same time, a "data collision" occurs, and each computer will wait to try again. The duration it waits increases exponentially with each failed attempt until the medium becomes clear and the data can be transferred. One way to minimize the probability of a data collision on the network is to reduce the number of computers sharing the same transmission medium. In a wired network, this is often done by using a network switch, which effectively isolates several machines on a network into smaller sub-networks and passes data only when data is detected on one side of the switch that is addressed to the other side of the switch. Another way to ease congestion on a network is to chop large messages into smaller ones using a technique called segmentation. These smaller segments can be routed through a variety of pathways until they reach the end destination, where they can be reassembled into the final message. This is a primary function of the TCP/IP protocol suite (Refs. [14, 15]) and facilitates the insertion of small messages into the network from a variety of sources in the presence of very large messages, all of which may have different destinations. Some wireless devices can take advantage of different frequencies on which they can transmit data. For example, some wireless routers for use in the home can support transmissions on multiple frequencies, the most common being 2.4 and 5 GHz.
Wireless devices that use such a router can detect if one particular frequency has a lot of interference on it, presumably from network traffic from other devices, and then opt to use the other frequency in the hope that there will be fewer data collisions during data transfers.
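The exponential-backoff rule underlying these terrestrial schemes can be sketched as follows. The slot time and window cap are illustrative parameters, not values from any particular link standard:

```python
# Sketch of exponential backoff: after each collision the contention window
# doubles, and the sender retries at a random point within it.
import random

def backoff_delays(attempts, slot_time=1.0, max_exponent=10):
    """Random backoff delay (in slot times) chosen before each retry."""
    delays = []
    for attempt in range(1, attempts + 1):
        window = 2 ** min(attempt, max_exponent)  # window doubles per collision
        delays.append(random.uniform(0, window) * slot_time)
    return delays

random.seed(0)  # deterministic for the demonstration
for attempt, delay in enumerate(backoff_delays(5), start=1):
    print(f"collision {attempt}: wait up to {delay:.2f} slot times")
```

Randomizing within a growing window is what breaks the symmetry between two colliding senders: the more collisions occur, the less likely both pick the same retry instant again.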


In the Mars relay network to date, data collisions have largely been avoided by the sparsity of nodes in the network. The Curiosity and Opportunity rovers, for example, are on opposite sides of Mars and thus can never communicate with the same orbiter simultaneously. However, in late 2018, the Interior Exploration using Seismic Investigations, Geodesy and Heat Transport (InSight) lander will land close enough to the Curiosity rover that they will share visibility to the relay orbiters over 87% of the time (the remaining 13% of non-shared view periods are low-elevation, horizon-skirting view periods that are not generally suitable for performing data transfers). The Proximity-1 Protocol designates that relay sessions are to be established between two spacecraft on what is called “Channel 0,” or the “hailing” channel. Data is addressed internally with a spacecraft identifier, which is functionally equivalent to a MAC address. Just as in the terrestrial Internet, if a message is received by a spacecraft’s radio that contains a spacecraft identifier that does not match the one for that spacecraft, it ignores the message. After a relay session is established on Channel 0, the Curiosity rover can accept a directive from the orbiter to move to a different channel. When communicating with MRO, the Curiosity rover typically uses what is called “Channel 2.” This is the “working” channel between MRO and Curiosity, which is preferred over Channel 0 due to electromagnetic interference (EMI) onboard MRO by several of its science instruments. Communications between the Curiosity rover and all other orbiters in the network remain on Channel 0 after the hail is established because they do not suffer from this EMI. By contrast, the Opportunity rover does not have the ability to switch working channels and thus always communicates on Channel 0. 
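A toy model of this hailing behavior is sketched below. The spacecraft IDs, channel numbers, and radio interface are invented for illustration and greatly simplify the actual Proximity-1 session-establishment sequence:

```python
# Toy Prox-1-style session setup: the orbiter hails on Channel 0 with the
# target's spacecraft ID; radios with a different ID ignore the hail; a
# capable lander may then be directed to a separate working channel.
HAILING_CHANNEL = 0

class LanderRadio:
    def __init__(self, spacecraft_id, can_switch_channels):
        self.spacecraft_id = spacecraft_id
        self.can_switch = can_switch_channels
        self.channel = HAILING_CHANNEL

    def on_hail(self, target_id, working_channel):
        if target_id != self.spacecraft_id:
            return None                      # not addressed to us: ignore
        if self.can_switch and working_channel != HAILING_CHANNEL:
            self.channel = working_channel   # e.g., to dodge EMI on the hail channel
        return self.channel                  # session established on this channel

curiosity = LanderRadio("MSL", can_switch_channels=True)
opportunity = LanderRadio("MER-B", can_switch_channels=False)

assert curiosity.on_hail("MSL", working_channel=2) == 2      # moves to a working channel
assert opportunity.on_hail("MSL", working_channel=2) is None  # ignores a hail not for it
assert opportunity.on_hail("MER-B", working_channel=2) == 0   # stays on Channel 0
```

The model makes the contention problem visible: every session, whatever its working channel, must begin on the single shared hailing channel, so only one hail can succeed at a time within a shared view.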
Implementing Next-Generation Relay Services at Mars …

The Odyssey orbiter and the InSight lander both carry the same model of radio as the Opportunity rover and therefore bear the same complete reliance on Channel 0. All three future missions planned for launch in 2020 will have a similar mix of capabilities. This dependency on Channel 0 for establishing a relay session and, in many cases, for maintaining it is a double-edged sword. On the one hand, it ensures interoperability between the assets in the near-Mars environment; on the other hand, it means that only one communication session may be active at any given time in a single view. Even in the MRO-Curiosity case, where the working channel differs from the hailing channel, it is still desirable to keep the hailing channel clear in case the link is interrupted and the session has to be re-established. As it is, the operators of the Curiosity rover and the InSight lander have developed strategies for sharing the relay resources that are available to them. This is done by coordinating a priori which lander gets to talk to which orbiter as a function of mission phase and time of day. This ensures that only one relay session occurs at a time, but at the potential cost of reducing the overall throughput for both missions. As implemented, the current relay sessions are largely “all or nothing” affairs that are pre-scheduled to occur on the orbiters and the landers; the orbiters take no notice of the structure of the received data, thus prohibiting multiplexing of received data from multiple sources. Careful effort is expended during planning to ensure that only one relay session occurs at any given time within a given view, which altogether avoids the possibility of data collisions that are so common in the terrestrial Internet. However, this approach is only practical due to the small size of the network and would not remain practical as the network increases in size.

Given the hardware on the existing assets at Mars, these problems cannot be overcome in the near term. However, future missions could ensure that they have the ability to be more frequency agile or even carry the ability to attempt to communicate on multiple frequencies at the same time. In principle, spacecraft at Mars could employ the same techniques used by terrestrial wireless devices to detect a busy frequency and switch to another one. Fragmenting large data transfers into appropriately sized protocol data units (PDUs) could further increase the flexibility of the network, reduce the likelihood of data collisions, and maximize the transport efficiency of the network. This has implications for the design of the relay asset’s avionics, which, as mentioned in Sect. 2.1, would need to support addressable data transfers.
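The frequency-agility and fragmentation ideas above can be illustrated with a brief sketch; the channel numbers, busy-detection predicate, and MTU value are hypothetical.

```python
# Sketch of "listen-before-talk" frequency agility for a future relay radio,
# plus fragmentation of a large transfer into fixed-size PDUs. The channel
# list, busy-detection predicate, and MTU are hypothetical.

def pick_clear_channel(channels, is_busy):
    """Return the first channel on which no carrier is detected, else None."""
    for ch in channels:
        if not is_busy(ch):
            return ch
    return None

# Example: Channel 0 is occupied by another lander's session.
busy = {0}
assert pick_clear_channel([0, 1, 2], lambda ch: ch in busy) == 1

def fragment(data: bytes, mtu: int):
    """Split a large transfer into MTU-sized PDUs to limit collision damage."""
    return [data[i:i + mtu] for i in range(0, len(data), mtu)]

pdus = fragment(b"x" * 2500, mtu=1024)
assert [len(p) for p in pdus] == [1024, 1024, 452]
```

Losing one small PDU to a collision costs only that PDU's retransmission, whereas losing a monolithic transfer costs the whole session.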

2.4 Demand Access and Always On

Terrestrial computers are designed so that a computer plugged into a network is effectively always able to send data (notwithstanding the potential for data collisions described in Sect. 2.3), with an assurance that receiving computers on the network are listening and able to accept the data. The cost of maintaining a live and active Internet connection is small relative to the cost of powering the computer, so the network connection is typically maintained as a “live” connection. In wireless applications, this is not necessarily the case. Wireless devices, including the ubiquitous cell phone, may power on their transmitters only when establishing a connection or when data needs to be sent. After connectivity is established with the nearest network node and any pending data has been transferred, the devices power down their transmitters and only occasionally ping the network to ensure connectivity remains available. If connectivity is lost, such as when the device has moved out of range, the device may seek another network connection, searching continually until a connection can be established.3 In this manner, the transmitters remain active (and drawing power) only when sending data or otherwise ensuring connectivity. When data does need to be sent, wireless devices first need to establish connectivity in a “session” with the receiving radio. In the case of a wireless device in a home, as mentioned in Sect. 2.3, this session is typically established with a wireless router, which itself is plugged into a wired network and continuously powered. In this example, the router is considered to be “always on”: constantly receptive to a signal from a wireless device and able to service multiple devices in a time-shared manner.
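The duty-cycling behavior described above can be viewed as a small state machine; the states and events below are illustrative only, not drawn from any real device firmware.

```python
# Sketch of a wireless device's transmitter duty cycle: power the transmitter
# only to connect, send, or ping; search continually when out of range.
# States and events are illustrative only.

class WirelessDevice:
    def __init__(self):
        self.state = "IDLE"          # transmitter off, no active session

    def step(self, event):
        if event == "data-to-send" and self.state == "IDLE":
            self.state = "CONNECTED" # power up, establish session, transfer
        elif event == "transfer-done" and self.state == "CONNECTED":
            self.state = "IDLE"      # power the transmitter back down
        elif event == "ping-failed":
            self.state = "SEARCHING" # out of range: keep looking (drains battery)
        elif event == "network-found" and self.state == "SEARCHING":
            self.state = "IDLE"
        return self.state

dev = WirelessDevice()
assert dev.step("data-to-send") == "CONNECTED"
assert dev.step("transfer-done") == "IDLE"
assert dev.step("ping-failed") == "SEARCHING"
assert dev.step("network-found") == "IDLE"
```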

3 This searching for connectivity explains why a cell phone’s battery drains so quickly when it is out of range of the network.


R. E. Gladden et al.

In the current Mars relay network, the Opportunity and Curiosity rovers do not enjoy the benefit of an “always-on” network. Even though an orbiter may be physically visible, the orbiters are not receptive to a signal from a lander unless they have been actively scheduled to be so. In addition, the current network is designed so that reliable, full-duplex communication sessions are only initiated by the orbiter. Thus, a lander with data to be sent can only send that data to the orbiter if the orbiter first establishes the relay session. This approach requires meticulous planning by the spacecraft operators. The orbits of the relay spacecraft need to be carefully predicted and/or controlled so that view periods between the orbiters and the landers can be forecast with sufficient accuracy and lead time to facilitate the commanding processes required to schedule the relay sessions on both ends. For low-altitude orbiters, this lead time is measured in weeks, which constrains the typical last-minute planning approaches utilized by the rover operators, who respond to real-time conditions at the landing site and prefer to do daily commanding of the vehicles. The amount of data that can be communicated during each relay session is predicted by the rover operators, and the amount of science data acquired by the vehicles during their missions is limited by this estimated available data return. To complicate matters further, the actual performance of each relay session regularly varies from the prediction by as much as 20% or more. This implies that the data management scheme onboard the rovers needs to be flexible enough to accommodate both an over-performance of a relay session and an under-performance. If one were to shift the operational paradigm so that the orbiters at Mars acted analogously to their wireless router counterparts on Earth, several changes would need to be made.
The orbiters would need to have their onboard radios constantly powered so that they could receive a signal at any time. For some of the existing orbiters, the radios are powered on only at the time of a relay session. The orbiters would also need to be reprogrammed so that a relay session could be initiated by a landed asset, at any time (i.e., “demand access”). Presently, the high reliance on navigation predictions for the orbiters when planning relay sessions further constrains the relay network architecture. For example, proactive scheduling of relay sessions must account for some nominal uncertainty in the orbits. The rover operators, knowing that power and time are precious commodities onboard the rovers, will typically schedule a relay session to occur when the orbiter is predicted to be well above the horizon. This conservatism ensures that navigation errors do not cause the relay session to slip from view, but it has the side effect of underutilizing the potential throughput of a given view period. While data throughput at low elevations can be limited, some data can usually still be transmitted. Implementing a demand access approach with the orbiters would help mitigate the problem of orbiter navigation uncertainty. The rover could ping the orbiter for access reasonably close to the expected view period, and, if connectivity is available, the rover could begin the relay session immediately. If no response is received from the orbiter, the rover could wait and try again later. This strategy would both reduce the dependence on the accuracy of the predicted orbit and improve the overall data throughput. (Note that the use of the DTN Licklider Transmission Protocol (LTP) can further improve orbital contact utilization; transmission can begin at the earliest possible moment and end at the last, because any data that is lost due to low signal quality is automatically retransmitted as soon as possible.)

Of course, the issue of needing onboard data management flexibility does not go away in a demand access environment. However, such a network would allow the rover to autonomously initiate an otherwise unscheduled relay session with an orbiter to return data to Earth that may not have been transmitted when it was originally expected to be. With low-altitude orbiters, communication opportunities are sparse and short, so some onboard logic would be needed so that the rovers do not waste time and power attempting to communicate with an orbiter that will be absent for long periods. However, if higher-altitude orbiters were to be sent to Mars, continuity of coverage could become a reality and a demand access approach could become more meaningful to relay users. This concept, coupled with the frequency agility discussed in Sect. 2.3, would be a game-changer for the types of missions that could be supported at Mars. Many users could use the same relay assets simultaneously, and previously unsupported mission types could be added, such as constellations of very small satellites that have little to no means to perform navigation and/or to communicate directly to Earth.
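A demand-access retry strategy of the kind described here might look like the following sketch; the timing constants and the `ping` interface are invented for illustration.

```python
# Sketch of a demand-access strategy: the rover pings around the predicted
# rise time and begins a session on the first acknowledgement, rather than
# relying on a precisely scheduled pass. All interfaces and constants are
# invented for illustration.

def demand_access_session(ping, predicted_rise_s, window_s=600, retry_s=60):
    """Ping from (predicted rise - window) onward; return the session start
    time, or None if the orbiter never answers within the window."""
    t = predicted_rise_s - window_s      # start early to absorb orbit error
    deadline = predicted_rise_s + window_s
    while t <= deadline:
        if ping(t):                      # orbiter radio is on and answers
            return t                     # begin the relay session immediately
        t += retry_s                     # wait and try again later
    return None

# Suppose a navigation error means the orbiter actually rises 120 s early:
actual_rise = 1000 - 120
start = demand_access_session(lambda t: t >= actual_rise, predicted_rise_s=1000)
assert start is not None and start <= 1000  # session begins before predicted rise
```

The same loop absorbs a late rise equally well: the rover simply keeps pinging until the deadline instead of abandoning a pass that slipped by a few minutes.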

2.5 Node Richness and Trunk Lines

In the terrestrial Internet, a multiplicity of connections between many computers provides many routes for data to be moved between widely separated machines. Even with wireless applications, usually only one or two hops are necessary before data is inserted into the wired Internet, where this multiplicity of pathways facilitates the rapid transfer of data to its recipient. The network is node-rich, and the infrastructure is generally oversized for the amount of data transferred. Consider the network topology shown in Fig. 2, a general network diagram representing the types of networked devices in the home of the lead author of this chapter. Note that nearly all of the devices are wireless, with most connecting to the range extender in the lower middle portion of the diagram. Notably, despite the plethora of devices and device types, all of them ultimately gain access to the Internet via the single wireless modem, which acts as a trunk line. Each device operates independently on the network and negotiates with the modem and range extender. Data collisions, as described in Sect. 2.3, undoubtedly occur quite frequently, especially when media is being streamed, and yet the Internet functions seamlessly with no directed coordination among the devices and without thought on the part of the users (except when streaming movies stall!). The current Mars relay network, by contrast, is “node poor.” With five orbiters at Mars (only three of which are presently, actively, and regularly used to provide relay services), there are not many paths that data from the existing rovers can take to be
returned to Earth. Each orbiter acts as an independent trunk line for the rovers, but, as mentioned in Sect. 2.4, relay sessions with the orbiters have to be deliberately scheduled. Fortuitously, the two rovers are on opposite sides of Mars and thus have not had to coordinate relay opportunities between them. However, a larger community of relay service users is expected to arrive at Mars in the next few years, and the existing relay network will be stretched to provide relay services to these additional assets. If even more assets were to be sent, such that there were more than a dozen relay users, the existing network would simply be unable to meet all of their needs. In contrast, consider a network of three telecommunications satellites sharing an equatorial orbit at 6000 km altitude at Mars. Together, they could provide continuous relay coverage to landing sites up to 60° latitude (north and south). Such satellites, supporting multi-frequency communications (as in Sect. 2.3) and demand access features (as in Sect. 2.4), would be able to serve a variety of missions on the surface as well as lower-altitude orbiters. They could act as trunk lines, providing relay services to dozens of assets with a robust line of communication back to Earth. In this way, the relay network at Mars could look much more like that in Fig. 2, with a variety of users gaining access through a network not much broader in scope than that of a typical home.

Fig. 2 Example home network topology

3 Delay- and Disruption-Tolerant Networks

3.1 Data Loss and Latency

Terrestrial networks experience almost no delay when communicating from one node to another. Transmissions occur at the speed of light, and all nodes, wired or wireless, are physically within much less than 1 s of each other. (Even the one-way transit
time from the Earth to the Moon is less than 1.3 s.) As mentioned in Sect. 2.3, even when many data collisions are encountered on the network, a sending machine merely waits a matter of seconds before trying to send the data again. In deep-space applications, the two-way light time between two nodes in the network may be measured in minutes or hours. If the same principles were applied to deep-space communications as are applied in the terrestrial network, a data collision would not be detected by the transmitting node until a two-way light time later. Communication is further limited by the geometry of the environment (e.g., orbiters can be occulted by planets, landers can rotate out of view, etc.). Due to these limitations, the contact times between Earth-based transceivers and the orbiters in the Mars relay network are proactively and carefully scheduled. Relay sessions between the orbiters and the landers at Mars may utilize reliable data exchanges (such as the Proximity-1 Protocol mentioned in Sect. 2.2) to ensure that data is completely transmitted from the lander to the orbiter. However, the NASA orbiters then turn that data around and transmit it to Earth without assurance that it will be received completely. When the data is received on the ground, some of the operations teams for the orbiters use ground-based algorithms to detect transfer frames that were lost in transit and can queue up spacecraft commands to cause the orbiter to retransmit the missing frames. Today, this frame-loss detection is performed on the orbiter frames, which may contain any number of (or only a portion of) the data products received from the lander. The ExoMars TGO, by contrast, utilizes the Packet Utilization Standard (PUS) Service 13 to ensure that data products from the orbiter are received completely on Earth.
This is an automated process in which missing frames are detected in real time and spacecraft commands are sent back to the orbiter to retransmit them. Both approaches incur additional latency in the return of the data to the lander operators when frames are missing, and both have processes in place to deliver partial data sets when a complete data set is not available after the first transmission attempt. This quick data delivery, even of partial data sets, is highly desired by the lander operators as they attempt to construct the command load for the next sol’s operations using whatever data they can get that informs them of the state of the rover on the surface of Mars. These operators have learned to accept the partial data sets and then to expect a more complete delivery of the data hours or days later. Note that in all cases, none of the orbiters manage the data received from the landers as data products with known boundaries, as described in Sect. 2.2. The data is received from the landers, and the orbiters manage it as a series of bits, without regard for packet, frame, product, or other data constructs within the data itself. Thus, at the end of a single relay session, the complement of data to be returned to Earth may include partial data sets from the originating source. Interestingly, this is consistent with how data is transmitted across the terrestrial network using TCP/IP, where messages are fragmented into smaller components and then transmitted piecemeal across the network, to be reassembled by the receiving node. In the case of the Mars relay network, accountability for the complete receipt of data products as produced by the rovers on the surface of Mars is performed
by the lander operators. Typically, data is held onboard the rovers until it has been confirmed as received and then explicitly “released” via commands sent to the rover. The notion of a custody transfer, as described in Sect. 2.2, would cause this accountability to occur across each leg of the network. If this were the case, then the lander, once it had confirmed successful transfer of a data product to an orbiter, could rely upon the orbiter to complete the transfer and could immediately release the data from its memory space, freeing that resource for follow-on operations. The use of the Bundle Protocol would allow a sending entity to fragment its own data into small transfer units for handling through the entire network. In this way, though each bundle may consist of parts of one or more original data products, at each node in the network the data is transmitted reliably as a distinct and known unit, without the need to assemble or further fragment it at any node into some other construct for ease of transmission. The bundle becomes the atomic unit of the transfer across the entire network, much as the packet is the atomic unit of the transfer in IP. When using Ethernet, the maximum transmission unit (MTU) of a frame is 1500 bytes, but in the Bundle Protocol, the MTU is limited only by local operational constraints, if at all. Bundles that are hundreds of megabytes (even gigabytes) in size may be transmitted routinely under favorable conditions. In practice, however, transmission contact opportunities are of limited length, and bundles normally should be fragmented in such a way that the last byte transmitted during a contact is the last byte of a (possibly fragmentary, but still routable) bundle. Further segmentation is typically performed at the “convergence” protocol layer that underlies the bundle transmission layer, by protocols like LTP that are aware of the limits on frame size at the underlying space link layer.
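The contact-boundary fragmentation rule described above can be sketched as follows, under a deliberately simplified contact model (a single capacity figure per contact, no convergence-layer overhead):

```python
# Sketch of sizing bundles to a contact: fragment so that the contact's last
# transmitted byte is also the last byte of a complete, routable bundle.
# The contact model (one capacity figure in bytes) is simplified; a real
# convergence layer (e.g., LTP) would further segment each bundle for the link.

def bundles_for_contact(total_bytes, contact_capacity, max_bundle=10**8):
    """Plan bundle sizes for one contact; leftover data waits for the next."""
    send = min(total_bytes, contact_capacity)
    sizes = []
    while send > 0:
        chunk = min(send, max_bundle)
        sizes.append(chunk)
        send -= chunk
    return sizes, total_bytes - sum(sizes)  # (bundles this contact, carryover)

# 2.5 GB of products, 1 GB of contact capacity, 400 MB maximum bundle size:
sizes, leftover = bundles_for_contact(2_500_000_000, 1_000_000_000, 400_000_000)
assert sizes == [400_000_000, 400_000_000, 200_000_000]
assert leftover == 1_500_000_000
```

Every bundle emitted this way is individually routable, so the 1.5 GB carryover can depart on a different contact, or even a different path, without reassembly at intermediate nodes.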
Because different convergence-layer mechanisms may be operating on different “legs” of the end-to-end data path, this convergence-layer segmentation and the necessary subsequent reassembly are performed separately—and possibly in quite different ways—at multiple forwarding points on the path. Reassembly of the entire original bundle occurs only at the final destination, since each fragmentary bundle can theoretically take a different route through the network. A network could be configured to utilize a delay- and disruption-tolerant protocol for communicating data across the network. Such a protocol would address the issue of interruptions in connectivity. In the Mars relay network, these interruptions may take many forms: the orbiter is not visible from the lander; the lander is not able to talk to the orbiter even when it is visible; the orbiter is not visible from Earth; a tracking station on Earth is not configured to communicate with the orbiter; or the orbiter is not able to transmit to Earth even when Earth is visible. DTN empowers an enabled network to self-determine the quickest path for data to be returned to Earth. With only a few nodes, as in the present Mars relay network, the quickest path for lander data that cannot be directly transmitted to Earth is always obvious: transfer to the orbiter, transfer to Earth, transfer to the operators. In a more connected network with more nodes, the pathway may not always be so obvious. For example, consider the case where there are two orbiters at Mars, as shown in Fig. 3. In this example, assume that the lander may be able to see Orbiter 1 “now,” which has the ability to transfer data to Earth at a low data rate, and Orbiter 2 may be
visible at a later time and has the ability to transfer data to Earth at a high data rate. Including the ability to cross-link the data between the orbiters, there are a variety of paths that can be used to transfer the data back to Earth, only three of which are shown in the figure. Should the lander choose to send the data immediately to Orbiter 1, Orbiter 1 could then be confronted with the choice to send the data back to Earth directly at its low data rate or to cross-link the data to Orbiter 2 so that it can complete the transfer in its stead. It may be that Orbiter 1 can only support a low direct-to-Earth data rate while Orbiter 2 is known to support a higher rate, but it may also be the case that Orbiter 2 does not have time scheduled with Earth receivers. The use of the DTN protocols onboard the lander and the orbiters would enable each node in this network to calculate the quickest path for the data. Thus, the lander may choose to transfer the data to Orbiter 1, assuming that the quickest path for the data is at Orbiter 1’s low rate (Path 1). Orbiter 1, however, may then choose to cross-link the data to Orbiter 2 so that it may be transmitted at a higher rate at a later time (Path 2). Alternatively, the lander could choose to wait until Orbiter 2 is visible and send the data then (Path 3). The calculation to determine the quickest path should be relatively simple, but it requires knowledge of the state and availability of every node in the network. In a DTN, this knowledge is called a “contact graph,” and the act of determining the fastest route for data is called “contact graph routing.” The proper application of this routing in every node in the network should theoretically always minimize the data return latency.

Fig. 3 Some data return options (contact graph routing)
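A toy version of the route selection among Paths 1–3 can make the idea concrete. The contact times and data rates below are invented, and real contact graph routing searches a full contact plan rather than a handful of hand-built paths.

```python
# Toy contact graph routing for the two-orbiter example: given a few candidate
# paths of (contact start time, data rate) hops, choose the one with the
# earliest Earth delivery time. All numbers are invented for illustration.

def delivery_time(path, data_bits):
    """Walk a path of contacts; each contact is (start_s, rate_bps)."""
    t = 0.0
    for start, rate in path:
        t = max(t, start) + data_bits / rate  # wait for the contact, transmit
    return t

data = 8e9  # 1 GB of rover data, in bits

paths = {
    "Path 1: lander -> Orbiter 1 -> Earth (low rate)":
        [(0, 2e6), (500, 5e5)],
    "Path 2: lander -> Orbiter 1 -> Orbiter 2 -> Earth (high rate)":
        [(0, 2e6), (500, 2e6), (9000, 4e6)],
    "Path 3: wait for Orbiter 2 -> Earth (high rate)":
        [(7000, 2e6), (12000, 4e6)],
}

best = min(paths, key=lambda name: delivery_time(paths[name], data))
assert best.startswith("Path 2")  # here, cross-linking to Orbiter 2 wins
```

With these invented numbers, Path 1 delivers at 20,000 s, Path 3 at 14,000 s, and Path 2 at 11,000 s, so sending now and cross-linking beats both the slow direct path and waiting for the better orbiter.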

3.2 Predictability and Prioritization

In principle, a DTN-enabled network could service many users in the presence of long transmission times. However, several issues manifest in the use of DTN, namely the predictability of the time of data receipt and the need to perform partial data deliveries. The current rover operators are accustomed to knowing when their data will be returned to Earth. The time it takes data to traverse the path in the current Mars relay network from the rover to the orbiter to the ground stations and then to the
workstations of the rover operators is reasonably predictable. In a DTN-enabled network with many paths and many nodes, this predictability may be more elusive. However, software has been developed and successfully demonstrated that uses contact plan information—the same information that is used to compute routes through the network—to develop a range of potential delivery times for a bundle of a given size, source, destination, and transmission time, with the probability associated with each delivery time. This “bundle delivery time estimation” algorithm cannot provide a deterministically certain time of data arrival, but it can at least be used to alert network operators to potentially unacceptable latencies so that remedial action can be taken in advance. One such remedy is simply the modification of the contact plan, scheduling additional contacts—possibly at the expense of previously scheduled contacts—or lengthening contact intervals. Another possible remedy is providing additional forwarding advantage to the subject bundle itself by revising the bundle’s class of service. BP running over DTN supports three classes of service, namely bulk, standard, and expedited, which are commonly implemented as levels of priority in the assembly of forwarding queues. Applying these classes of service, a high-priority (i.e., expedited) message may be transferred through the network with lower latency than lower-priority data. This would directly support the need of the lander operators to receive their critical telemetry and other data for next-sol planning, while allowing less critical data to be returned with greater latency. As previously mentioned in Sect. 2.2, in today’s Mars relay network, data acquired during a single relay session is typically handled as a single monolithic data product when being transferred back to Earth. This prohibits the fragmentation of the data into high- versus low-priority data.
The implementation of BP on the rover, then, could be brought to bear to package relay data into smaller fragments. These fragments might collectively represent one data product (such as a large picture) or might include many data products (such as critical, captured spacecraft telemetry). The use of BP allows the network to be concerned with the priority of the data rather than its content, which is controlled and abstracted by the network user. This methodology partly addresses the desire to receive some of the data, particularly its most critical parts, with as low a latency as possible. While it does not directly address the goal of receiving partial data products in the midst of an incomplete transfer, it does effectively reduce the likelihood that this will become a practical issue.
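The class-of-service forwarding queues mentioned above can be sketched with an ordinary priority queue; the three priority levels mirror BP's bulk/standard/expedited classes, while the queue mechanics here are purely illustrative.

```python
# Sketch of class-of-service forwarding: expedited bundles drain before
# standard, and standard before bulk, with FIFO order within a class.
# The queue mechanics are illustrative only.

import heapq
import itertools

EXPEDITED, STANDARD, BULK = 0, 1, 2     # lower value = forwarded first
_seq = itertools.count()                # FIFO tie-break within a class

queue = []

def enqueue(bundle, cos):
    heapq.heappush(queue, (cos, next(_seq), bundle))

def next_bundle():
    return heapq.heappop(queue)[2]

enqueue("image-part-3", BULK)
enqueue("critical-telemetry", EXPEDITED)
enqueue("housekeeping", STANDARD)
enqueue("next-sol-plan-inputs", EXPEDITED)

order = [next_bundle() for _ in range(4)]
assert order == ["critical-telemetry", "next-sol-plan-inputs",
                 "housekeeping", "image-part-3"]
```

Both expedited bundles leave before any standard or bulk traffic, which is exactly the behavior the lander operators would want for next-sol planning telemetry.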

3.3 Encryption

BP also supports features that allow a user mission to encrypt its data. This is a particular concern for user missions when confronted with the possibility of exchanging
data through a relay network in which not all nodes may be under the control of the organization that sponsors the mission. The DTN architecture addresses this concern in two main ways. First, each bundle may contain specialized extension blocks that implement the Bundle Security Protocol (BSP). These blocks are of two types:

1. Block Integrity Blocks (BIBs) carry cryptographic signatures computed from the content of the various blocks of a bundle, typically the primary block and payload block. These signatures enable a receiving node to determine immediately whether or not the content of the block to which the BIB pertains (i.e., the “object” of the BIB) has been altered in any way subsequent to attachment of the BIB.
2. Block Confidentiality Blocks (BCBs) carry information that describes the manner in which the various blocks of the bundle—typically the payload—have been encrypted. This information enables the bundle’s destination node to decrypt the encrypted block. Note that block encryption remains in effect not only while the bundle is in transit from one node to another but also while the bundle is “at rest” at a forwarding node awaiting a transmission opportunity.

Taken together, the BSP blocks ensure that it is safe to forward data through a node operated by a given space agency when the source and destination of the data belong to another agency altogether. A BCB applied to the payload ensures that the bundle’s application information is not disclosed to an unauthorized recipient, and a BIB applied to the payload block enables receiving nodes to detect any attempt to tamper with the meta-data characterizing that information. A second mechanism for securing data in a DTN-based network is bundle-in-bundle encapsulation (BIBE). BIBE enables an entire bundle to become the payload of a second, encapsulating bundle with a different destination.
The source and destination of the encapsulating bundle may be at the entrance to and exit from a particularly hazardous region of network topology. This enables the entire encapsulated bundle to be encrypted during its transit through that dangerous space, affording a high degree of protection from traffic analysis.
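The BIB/BCB/BIBE roles can be illustrated with a toy sketch. The HMAC signature stands in for a real BSP integrity cipher suite, and the XOR "encryption" is emphatically a placeholder, not real cryptography; key distribution is ignored entirely.

```python
# Toy illustration of BSP-style blocks: a BIB carries an integrity signature
# over the payload, a BCB marks the payload as encrypted, and BIBE wraps a
# whole bundle as the payload of another. The XOR "cipher" and fixed key are
# placeholders for illustration only; real BSP uses proper cipher suites.

import hmac, hashlib, json

KEY = b"shared-agency-key"              # placeholder key material

def add_bib(bundle):
    """Attach an integrity signature (BIB) computed over the payload."""
    bundle["bib"] = hmac.new(KEY, bundle["payload"], hashlib.sha256).hexdigest()
    return bundle

def verify_bib(bundle):
    good = hmac.new(KEY, bundle["payload"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, bundle["bib"])

def xor_keystream(data):                # toy cipher, NOT real cryptography
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))

def add_bcb(bundle):
    """Encrypt the payload and record that fact in a BCB."""
    bundle["payload"] = xor_keystream(bundle["payload"])
    bundle["bcb"] = {"cipher": "toy-xor"}
    return bundle

def encapsulate(bundle, new_dest):
    """BIBE: the whole bundle becomes the payload of an encapsulating bundle."""
    inner = json.dumps(bundle, default=str).encode()
    return {"dest": new_dest, "payload": xor_keystream(inner)}

b = add_bib({"dest": "earth-station", "payload": b"rover telemetry"})
assert verify_bib(b)                    # untouched payload verifies
b["payload"] = b"tampered!"
assert not verify_bib(b)                # tampering is detected at any node

wrapped = encapsulate(add_bcb({"dest": "earth", "payload": b"data"}),
                      "relay-exit-node")
assert wrapped["dest"] == "relay-exit-node"
```

Note how the encapsulating bundle exposes only the exit-node destination; the inner bundle's addressing and payload travel through the hazardous region as opaque bytes.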

3.4 Challenges

The challenge of implementing BP over DTN is not necessarily a technical one, but rather one of adoption. The requirements on the avionics and telecommunications systems of the spacecraft are well understood, having been demonstrated in flight experiments, but existing Mars missions did not have these principles in mind (and especially not as formal requirements) when they were built. In addition, it is unfortunate that none of the orbiters in the existing Mars relay network are capable of being retrofitted to support these new methodologies, due to hardware restrictions, specifically in their implementations of onboard data handling.


In order for a true DTN-enabled network to be implemented at Mars, future relay service provider spacecraft would need to be designed with the principles described in this chapter in mind. Methodologies for the regular distribution and refreshing of contact graph schedules onboard each of the nodes need to be developed, including in coordination with international entities. These methodologies, and others relating to network management, are currently being developed and should be available long before the next generation of DTN-capable spacecraft is launched. Once they are, many network architectures could be realized, from a distributed, many-node implementation in which each node has its own connectivity to Earth, to a spoke-and-hub network in which many nodes rely upon one or a few primary nodes to transfer data back to Earth.

4 Conclusion

The technologies outlined in this chapter may be implemented in any communications network that struggles with long-leg communications, whether at Mars or at other locations around the Solar System. Ultimately, it is desired to build a unified architecture that is robust to the addition and loss of nodes, both as relay service users and as relay service providers. The other challenges outlined in this chapter, including those related to data loss, high data transfer latencies, predictability of data returns, the need for prioritization, and the requirement for data encryption, also need to be addressed. Beginning with just one orbiter implementing the full (or even a partial) complement of the principles outlined in this chapter, the next generation of relay capabilities at Mars can begin to be put into place. These principles include addressable data transfers, custody transfers, multi-channel support, radio frequency agility, demand access capabilities, always-on capabilities, node richness, and the inclusion of trunk lines. With them, a variety of mission types that have heretofore been discounted at Mars become more achievable: small satellites, cubesats, netlanders, weather stations, mini-rovers, climbers, diggers, balloons, drones, swarms, etc. Each of these would become a viable mission type, presumably alongside human explorers who would be equally served by investments in the relay infrastructure at Mars. By successfully implementing these technologies in the next generation of spacecraft, additional nodes could be introduced without regard to mission type or sponsoring organization, including both government and private entities, providing a far more robust infrastructure than is presently available. Ultimately, the objective is to construct a true interplanetary network, leading to a Solar System Internetwork (SSI) (Ref. [16]), that functions as seamlessly as the terrestrial Internet.
As more and more actors look to explore and exploit space, a unified relay network could become a critical and enabling component.


Acknowledgements The research described in this chapter was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. Government sponsorship acknowledged.

References

1. In SpaceOps Conference, AIAA Proceedings 2018, Marseille, France (2018, 28 May–1 June). https://doi.org/10.2514/6.2018-2599.
2. Consultative Committee for Space Data Systems (CCSDS). (2007, January). CCSDS file delivery protocol (CFDP). 727.0-B-4. http://www.ccsds.org.
3. Consultative Committee for Space Data Systems (CCSDS). (2015, September). CCSDS bundle protocol (BP). 734.2-B-1. http://www.ccsds.org.
4. Wyatt, J., Burleigh, S., Jones, R., Torgerson, L., & Wissler, S. (2009). Disruption tolerant networking flight validation experiment on NASA’s EPOXI mission. In 2009 First International Conference on Advances in Satellite and Space Communications, Colmar (pp. 187–196).
5. Schlesinger, A., Willman, B. M., Pitts, L., Davidson, S. R., & Pohlchuck, W. A. (2017). Delay/disruption tolerant networking for the International Space Station (ISS). In 2017 IEEE Aerospace Conference, Big Sky, MT (pp. 1–14).
6. Internet Engineering Task Force (IETF). (1972, November). File transfer protocol (FTP). RFC 412.
7. Internet Engineering Task Force (IETF). (2015, June). Simple mail transfer protocol (SMTP). RFC 7504.
8. Consultative Committee for Space Data Systems (CCSDS). (2013, December). CCSDS Proximity-1 space link protocol—physical layer. 211.1-B-4. http://www.ccsds.org.
9. Consultative Committee for Space Data Systems (CCSDS). (2013, December). CCSDS Proximity-1 space link protocol—data link layer. 211.0-B-5. http://www.ccsds.org.
10. Consultative Committee for Space Data Systems (CCSDS). (2013, December). CCSDS Proximity-1 space link protocol—coding and synchronization sublayer. 211.2-B-2. http://www.ccsds.org.
11. Consultative Committee for Space Data Systems (CCSDS). (2017, February). CCSDS unified space link protocol (USLP). 732.1-R-2. http://www.ccsds.org.
12. Telemetry and telecommand packet utilization standard (PUS), Service 13: Large data transfer service. (2003, January). ECSS-E-70-41A.
13. Consultative Committee for Space Data Systems (CCSDS). (2015, May). CCSDS Licklider transmission protocol (LTP). 734.1-B-1.
14. Internet Engineering Task Force (IETF). (1981, September). Transmission control protocol (TCP). RFC 793.
15. Internet Engineering Task Force (IETF). (1981, September). Internet protocol (IP). RFC 791.
16. Edwards, C. E., Jr., Denis, M., & Braatz, L. (2010, October 15). Operations concept for a Solar System Internetwork (SSI). Interagency Operations Advisory Group (IOAG), Space Internetworking Strategy Group (SISG) report.

Space Mobile Network Concepts for Missions Beyond Low Earth Orbit David J. Israel, Christopher J. Roberts, Robert M. Morgenstern, Jay L. Gao and Wallace S. Tai

Abstract The space mobile network (SMN) is an architectural framework that will allow for quicker, more efficient, and more easily available space communication services, providing user spacecraft with an experience similar to that of terrestrial mobile network users. While previous papers have described the SMN concept using examples of users in low Earth orbit, the framework can also be applied beyond the near-Earth environment. This chapter details how SMN concepts such as user-initiated services, which will enable users to request access to high-performance link resources in response to real-time science or operational events, would be applied in and beyond the near-Earth regime. Specifically, this work explores the application of user-initiated services to direct-to-Earth (DTE), relay, and DTE/relay hybrid scenarios in near-Earth, lunar, martian, and other space regimes.

Nomenclature

BP      Bundle Protocol
CCSDS   Consultative Committee for Space Data Systems
DAS     Demand Access System
DS      Deep Space
DSS     Distributed Space System
DTE     Direct to Earth
DTN     Delay/Disruption-Tolerant Networking
IP      Internet Protocol
LEO     Low Earth Orbit
MRN     Mars Relay Network
MUPA    Multiple Uplink per Antenna
OMSPA   Opportunistic Multiple Spacecraft per Aperture
PNT     Position, Navigation, and Timing
PROX-1  Proximity-1
RF      Radio Frequency
SCaN    Space Communications and Navigation
SMN     Space Mobile Network
TDRSS   Tracking and Data Relay Satellite System
UIS     User-Initiated Services

D. J. Israel (B) · C. J. Roberts · R. M. Morgenstern NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA e-mail: [email protected] J. L. Gao · W. S. Tai NASA/Jet Propulsion Laboratory/California Institute of Technology, Pasadena, CA 91109, USA © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind's Future, https://doi.org/10.1007/978-3-030-11536-4_2

1 Introduction NASA has introduced a framework, the space mobile network (SMN) [1, 2], for a space communication and navigation architecture. SMN involves a set of architectural ideas that will allow for automated, more efficient, and more easily available communication services, providing user spacecraft with increased network performance supportive of dynamic and autonomous mission scenarios. Users will be able to request communication services as needed from a network of diverse providers, including government and commercial providers. In this automated and more accommodating system, user platforms could request services directly to tailor them to their real-time needs; thus, SMN offers autonomous variable data collection without requiring pre-scheduled, manual requests, enabling users to obtain higher or lower data rates as needed. SMN also allows for improved availability and mitigation of delays in service provision by taking into account specific latency requirements. While previous papers describing SMN concepts and implementation have typically demonstrated the ideas using case examples of users in low Earth orbit (LEO), the concept is not restricted to LEO users and could be applied beyond LEO. This chapter will describe SMN application scenarios for users at the Moon, the Sun–Earth Lagrange points, and Mars and will propose examples of how to implement those concepts. The concepts will include support through both direct-to-Earth (DTE) links and space relays. This chapter will also discuss how missions moving beyond LEO will be able to continue under the same operations concepts, though implementation solutions may be different for various scenarios.

2 Space Mobile Network Concepts Figure 1 illustrates the key features of the SMN. A significant difference of this view from the historical NASA space communication view is the depiction of the network as a "network cloud" with access points. This depiction has been common for terrestrial networks for some time, but the networks supporting robotic and human space exploration have continued to be viewed and operated as point-to-point link providers. This view implies the ability to route or forward data between any two endpoints with connectivity to the network. In terrestrial networks, this functionality is provided by the ubiquitous Internet Protocol (IP) and its suite of associated protocols.

Fig. 1 Space mobile network key features

Though the IP suite works in some space applications, the dependence on full end-to-end connectivity for data delivery and the prevalence of "chatty" two-way support protocols prohibit its use for all space scenarios [3]. Delay/disruption-tolerant networking (DTN), specifically the Bundle Protocol (BP), has been developed to provide the benefits of networking in space (and other challenging) environments [4]. BP offers network-layer functionality using a store-and-forward approach, allowing for storage at intermediate nodes when the next hop is unavailable. As the evolution of the terrestrial Internet has demonstrated, the standardization of network and link-layer protocols allows a build-up of infrastructure through the peering of provider systems. The SMN framework continues this evolution by leveraging IP and DTN for the network-layer standards and by leveraging commercial and Consultative Committee for Space Data Systems (CCSDS) standards for the link- and physical-layer standards for space applications. The infrastructure to provide SMN services to future space users is expected to comprise global government and commercial systems. These systems, either ground stations or relay spacecraft, would provide access points to the larger network. The terrestrial mobile network user is accustomed to having continual access to the network. Providers strive to provide coverage to all users at availability and quality-of-service levels high enough to attract and maintain customers. There are significant differences, though, between the terrestrial mobile network user and the initial SMN user. Most notable are the SMN user locations, user terminal limitations, and the user's willingness to wait for full data delivery. High-availability links are a feature of the SMN; but due to the challenges involved in providing service at locations such as deep space or planetary surfaces, continuous availability is not always feasible. SMN's high-availability links are optimized for coverage and availability, which typically limits the performance with respect to data rates and link capacity. A common approach to increasing availability is to implement multiple-access systems that can provide links to multiple users. Multiple access may be provided through time, frequency, or code division. A user's time slice, carrier frequency, or code may be pre-provisioned such that a user can immediately receive service, or a multiple-access link acquisition process may occur first. The Tracking and Data Relay Satellite System's (TDRSS) Demand Access System (DAS) is an example of the former [5], and the CCSDS Proximity-1 Protocol (Prox-1) is an example of the latter [6]. Terrestrial mobile network systems are able to economically "over-provision" the area to allow all users to find an available link, except in extreme cases such as emergencies. These systems also maintain continuous control-channel links that can always locate users and assign parameters. This is a continuous power drain and link requirement on both the user and provider systems that is not acceptable for most spacecraft. Space links optimized for availability will likely have traded performance in other areas. High-gain systems that will support higher data rates or reduce the user's system gain requirements are generally implemented with larger apertures with narrower beam widths and/or high-gain amplifiers.
The costs, measured in relay size, weight, and power, or in ground station implementation and operations costs, are high, limiting deployment opportunities. Thus, these high-performance links become shared resources. Fortunately, many space missions have enough on-board storage and few stringent latency requirements, allowing them to wait until the service is available. High-performance links ranging from X-band to Ka-band in radio frequency (RF), and now expanding to optical links, are scheduled days to weeks in advance, guaranteeing link availability in time to meet mission requirements. However, this leads to inefficiencies in link utilization and an inability to call up a high-performance link rapidly to support an unplanned science event or other occasion. To provide a more responsive method for acquiring high-performance links and network services, SMN introduced the concept of user-initiated services (UIS). As illustrated in Fig. 2, UIS is a class of service acquisition processes in which the end user originates the service request. This differs from the current methods used for the acquisition of services by allowing the service acquisition process to be carried out by standardized "machine-to-machine" communication over space links. These requests may extend beyond link access requests to requests for end-to-end data delivery [7]. UIS will enable a user platform to request services over a signaling channel embedded within any links available to the user. The high-availability links provide the most available path for UIS to request services from the network. Since most space missions and networks are unable to support an always-connected control channel due to onboard power limitations and pointing and coverage constraints, UIS solutions specific to various SMN scenarios are under development.
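The store-and-forward behavior that makes BP suitable for these intermittent links can be sketched as a minimal custody queue. The class and method names below are illustrative only, not part of any BP implementation:

```python
from collections import deque

class DtnNode:
    """Toy DTN node: stores bundles while the outbound link is down,
    forwards the backlog in order when a contact becomes available."""

    def __init__(self, name):
        self.name = name
        self.store = deque()   # custody of undelivered bundles
        self.link_up = False
        self.forwarded = []

    def receive(self, bundle):
        if self.link_up:
            self.forwarded.append(bundle)   # next hop in view: forward now
        else:
            self.store.append(bundle)       # no contact: hold the bundle

    def contact_start(self):
        """Next hop comes into view: drain the stored backlog first."""
        self.link_up = True
        while self.store:
            self.forwarded.append(self.store.popleft())

    def contact_end(self):
        self.link_up = False

node = DtnNode("relay")
node.receive("science-001")    # link down: stored
node.receive("science-002")    # link down: stored
node.contact_start()           # backlog forwarded in order
node.receive("science-003")    # link up: forwarded immediately
```

A real BP node additionally carries bundle lifetimes, routing metadata, and contact-plan-driven forwarding decisions; the sketch captures only the store-until-contact behavior that distinguishes DTN from end-to-end IP delivery.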

Fig. 2 Service acquisition process hierarchy

3 Service Acquisition via User-Initiated Services Space communication operations can be described as occurring in two phases: service acquisition and service execution. Current space communication service acquisition processes are characterized by pre-planned service requests negotiated among user missions and provider network operators weeks in advance by "human-in-the-loop" systems. These requests are typically for point-to-point services and are, therefore, space link resource-specific. This reduces the ability of the provider network to allocate service requests among space link resources according to optimal prevailing conditions or other criteria. In contrast, terrestrial wireless network providers implement all of the control data flows preceding service execution autonomously and hidden from end users, resulting in the delivery of user service data, such as a text message to a friend or streamed Internet video. UIS automates current processes for space communication service acquisition using a request–response design pattern, with the service request generated by the user [8]. The data contents of the request message may vary based on user mission compatibility constraints, degree of platform autonomy, or other considerations. However, a key distinction from current service acquisition processes is that a UIS request may be service-oriented as opposed to link resource-specific.
For example, a user may specify a request to “deliver 25 GB of data from the mission platform to the science operations center within two hours,” or “get as much data as possible off my platform as soon as possible (to avoid overwriting the onboard data storage), and get it delivered to the science operations center within six hours.” Requests specified in these terms allow the provider network flexibility to optimize allocation of the request across the set of link and network resources that satisfy the user mission service and link parameter constraints, which may be provided by the government, university, or commercial resources operating as a federated network.
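A service-oriented request of the kind quoted above might be expressed as a structured message constraining the outcome rather than booking a particular antenna. The field names below are illustrative, not drawn from any UIS specification:

```python
from dataclasses import dataclass

@dataclass
class UisServiceRequest:
    """Illustrative service-oriented UIS request: constraints on the
    delivery outcome, not on a specific link resource."""
    user_id: str
    data_volume_gb: float          # total volume to move
    deadline_hours: float          # latest acceptable completion time
    destination: str               # end-to-end delivery endpoint
    min_rate_mbps: float = 0.0     # optional link-parameter constraint
    acceptable_bands: tuple = ("X", "Ka", "optical")

# "Deliver 25 GB from the platform to the science operations center
# within two hours."
req = UisServiceRequest(
    user_id="obs-42",
    data_volume_gb=25.0,
    deadline_hours=2.0,
    destination="science-ops-center",
)

def feasible(rate_mbps, hours_available, volume_gb):
    """Trivial check a provider might run per candidate resource:
    can this rate move the volume within the available time?"""
    return rate_mbps * 3600 * hours_available / 8000 >= volume_gb
```

Because the request names only constraints, the provider network is free to satisfy it with any combination of government, university, or commercial resources whose parameters pass such a feasibility check.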

A key architectural principle for realizing service-oriented requests in terrestrial networks involves the separation of concerns pertaining to network signaling and control data flows, which enable autonomous monitoring and control of resources, from those of the user data flows, which traverse the paths orchestrated by the signaling and control processes [9]. UIS is an emerging class of space communication service acquisition processes implemented through signaling and control protocols. In the SMN framework, signaling and control data flows will typically occur via high-availability space links. However, high-performance space links optimized for user service data flows may also carry control data. Under the UIS framework, the service provider pre-provisions a signaling channel to support the necessary handshaking process between the provider and any users wishing to acquire service. Once a user has secured a service commitment from the provider, the actual communication of user data will take place during the service execution phase on a data channel. This process is illustrated in Fig. 3. Service acquisition by definition always precedes service execution, and the delay between the two processes is determined by the scheduling agreement reached between the users and the provider during the service acquisition process. Users with existing service on any data channel may use it to simultaneously initiate a UIS process to acquire additional service in the future, as illustrated in Fig. 4. The UIS protocol facilitates the exchange of a user's request and the confirmation of the provisioning of resources. Therefore, it is a service management application-layer protocol at the provider–user interface. This process must be common to all service domains in terms of procedure and messaging content in order to integrate management and operations across all elements of a federated network.
However, how UIS messages are delivered across the signaling channel may vary depending on the operational environment. Under the UIS framework, a user will initiate a service request by sending a request (REQ) message to the provider over either a signaling channel or an existing data channel. A request can convey a specific desired service configuration or a range of acceptable parameters in terms of time, duration, data rate, coding, etc. In the latter case, the provider may narrow down the list of options when granting service. Upon receipt of a REQ message, the provider will determine, per network management decision, whether to grant the user's request. The request can be granted as is, granted with a more restricted parameter set, denied implicitly by lack of response (resulting in a user time-out), or denied explicitly by issuing a negative acknowledgment. The provider's response can be issued over the signaling channel or via an existing data channel. A UIS service acquisition process is nominally a two-way handshake (positive acknowledgment) process with user time-out. Each user must receive a confirmation from the provider within a time-out period to proceed with service execution. The provider may also have the option to explicitly cancel a prior grant to de-conflict with late-arriving, higher-priority requests. Furthermore, the provider may release or cancel a provisioned resource upon determining that the service has not been utilized for a certain period of time after the beginning of the service execution phase. A one-way handshake process may be used in cases of high communication delays with a high likelihood of a granted request. In rare circumstances, a one-way handshake process might be considered to support off-nominal events, like a spacecraft emergency. In such a scenario, a high-detection-probability, low-false-alarm signaling mechanism is typically used, and the user assumes the provider will correctly receive its request and grant service without further confirmation. The specific channels available and the UIS process and protocols will differ as dictated by communication constraints in different scenarios and environments. Wherever possible, UIS protocol messages and processes can be supported over different links. For example, the same UIS handshaking process and messages can be supported over a TDRSS demand access channel or a Prox-1 hailing channel, with the same messages carried over the different physical and link layers. The following sections provide examples of how UIS could be implemented within different scenarios.

Fig. 3 UIS service acquisition process using the signaling channel for conducting the UIS handshaking process

Fig. 4 UIS service acquisition process using the data channel for conducting the UIS handshaking process
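The nominal two-way handshake with user time-out can be sketched as follows. The message names (REQ/GRANT/NACK), the loopback channel, and the grant policy are all illustrative stand-ins, not UIS protocol definitions:

```python
import time

class UisUser:
    """Toy UIS user side: send REQ, wait for a grant, time out otherwise."""

    def __init__(self, channel, timeout_s=2.0):
        self.channel = channel
        self.timeout_s = timeout_s

    def acquire(self, request):
        self.channel.send(("REQ", request))
        deadline = time.monotonic() + self.timeout_s
        while time.monotonic() < deadline:
            msg = self.channel.poll()
            if msg and msg[0] == "GRANT":
                return msg[1]      # granted (possibly restricted) parameters
            if msg and msg[0] == "NACK":
                return None        # explicit denial
            time.sleep(0.01)
        return None                # implicit denial: user time-out

class LoopbackChannel:
    """Stand-in signaling channel whose provider grants every request,
    narrowing the user's open-ended rate to a specific value."""
    def __init__(self):
        self.inbox = []
    def send(self, msg):
        kind, req = msg
        if kind == "REQ":
            self.inbox.append(("GRANT", {**req, "rate_mbps": 10}))
    def poll(self):
        return self.inbox.pop(0) if self.inbox else None

user = UisUser(LoopbackChannel())
grant = user.acquire({"volume_gb": 5, "rate_mbps": "any"})
```

The grant narrowing (`"any"` replaced by a concrete rate) mirrors the provider's option, described above, to grant a request with a more restricted parameter set; returning `None` covers both the explicit and implicit denial paths.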

4 Direct-to-Earth (DTE) Scenarios 4.1 Low Earth Orbit (LEO) DTE Direct-to-Earth communication operations in the low Earth orbit (LEO) regime are typically characterized by long periods with no communication access due to geometrical line-of-sight constraints between ground assets and mission platforms. Many ground link resources are concentrated near the Earth's polar regions to support sun-synchronous and other polar orbits common to weather and other Earth-observing satellites. Due to phase differences between typical LEO orbital periods and the rotational speed of the Earth, access to ground link resources for missions in lower inclination orbits is less frequent. When ground stations are in view, there are still limitations to how many missions may be supported simultaneously. However, DTE link resources, such as ground-based omnidirectional or electronically steerable phased arrays, may enable more responsive DTE communication and higher availability. In the absence of high-availability communication links, LEO missions must implement sufficient onboard autonomy to sense and respond to scientific or engineering events of interest that require responsiveness on timescales less than the greater of the expected maximum pre-scheduled inter-link access time or the network service acquisition and execution time. Examples of such onboard autonomy may include increasing instrument data sampling rates in response to a transient solar flare or changing the spacecraft state into a safe mode if an engineering parameter exceeds its pre-defined limit. Currently, system and operation concepts must be designed to accommodate this increase in data volume without the ability to acquire more high-performance links in a timely fashion. Two possible scenarios involving UIS and DTE links are described. In both scenarios, a scientific or engineering event precipitating the need for service acquisition occurs at some point between pre-planned, high-performance service events.
The first scenario, depicted in Fig. 5, involves ground-based, high-gain link resources only. In this scenario, a UIS request for future, high-performance link services is inserted into a pre-scheduled service channel, assuming the user needs cannot be fully satisfied during this contact. The request for future services should be dispositioned and a response provided to the user platform within the contact access window.

Fig. 5 UIS request for future DTE high-performance link access during a pre-scheduled DTE contact

The second scenario, depicted in Fig. 6, involves the use of higher-availability ground link resources, such as omnidirectional or electronically steered phased arrays, in order to request access to higher-performance, high-gain link resources. If possible, the service request could be granted within the same contact. The utility of this scenario is greatly enhanced by adaptive link technologies, such as variable modulation, coding and data rates, and DTN. For example, a typical LEO access window is constrained by the line of sight between the ground link resources and the mission platform, and it may last ten minutes. Within that window, the range between the platform and ground link resource may vary by an order of magnitude. Adaptive link technologies would enable higher data rates as the range decreases. For a service request with a given data volume, this reduces the time required for service execution, which then forms a deadline constraint on the preceding high-availability service acquisition process.

Fig. 6 UIS request for DTE high-performance link access using higher-availability ground link resource

DTN protocols ensure reliable data delivery despite intermittent link availability [4]. This has three main benefits. First, the network-layer functionality allows the platform to determine where data should be sent when it gets to the ground station without any a priori knowledge at the ground station, and the store-and-forward nature of DTN provides automated rate buffering for any rate mismatches or disconnections over the end-to-end data delivery path. Second, it allows for a relaxation of constraints on link performance requirements because data delivery reliability is handled at the networking layer instead of at the link layer or application (file) layer. Third, the data from the mission platform is fractionated into bundles, which are generally smaller in data volume as compared to other common space protocol data units. These benefits combine to increase the quantity of viable access windows. From the UIS perspective, a scenario involving higher-availability ground link resources combined with adaptive link and DTN technologies may significantly improve both the responsiveness and efficiency of the space communication network. If the service request cannot be granted during the same contact, the UIS service acquisition process could still be completed by scheduling a new or modified contact to follow. This subsequent contact could again be provided by any provider's asset if it were part of a peering or federated service infrastructure.
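The payoff of rate adaptation within a pass can be put in rough numbers. Assuming free-space loss dominates, the supportable rate for a fixed link margin scales as 1/R², so an order-of-magnitude range change swings the rate by about a factor of 100. All figures below (pass geometry, peak rate) are illustrative, not from this chapter:

```python
def adaptive_rate_mbps(rate_at_min_range, r_min_km, r_km):
    """Free-space assumption: received power, and hence the supportable
    rate for a fixed margin, falls off as 1/R^2."""
    return rate_at_min_range * (r_min_km / r_km) ** 2

def pass_capacity_gb(rate_at_min_range, r_min_km, ranges_km, dt_s):
    """Integrate the range-dependent rate over sampled pass geometry."""
    bits = sum(adaptive_rate_mbps(rate_at_min_range, r_min_km, r) * 1e6 * dt_s
               for r in ranges_km)
    return bits / 8e9   # bits -> gigabytes

# Ten-minute pass sampled each minute; range closes from 2000 km to 400 km
# and opens again. Peak rate 100 Mbps at closest approach.
ranges = [2000, 1500, 1100, 800, 600, 400, 600, 800, 1100, 1500]
cap = pass_capacity_gb(rate_at_min_range=100, r_min_km=400,
                       ranges_km=ranges, dt_s=60)

# A fixed-rate link must be sized for the worst-case (maximum) range.
fixed = 100 * (400 / 2000) ** 2 * 1e6 * 600 / 8e9
```

In this toy geometry the adaptive link moves several times the volume of the worst-case fixed-rate link over the same window, which is the effect that tightens the deadline on the preceding service acquisition step.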

4.2 Deep-Space DTE For deep-space communication, the UIS framework can be supported by an explicit signaling channel implemented via a beaconing system or via opportunistic multiple spacecraft per aperture (OMSPA) technology [10]. The beacon approach, shown in Fig. 7, is particularly suitable for missions on long cruise or during extended periods of inactivity, such that it is more economical to use a low-complexity beacon signal that can be detected reliably on the ground without utilizing the 34- or 70-meter antennas. A preliminary beaconing concept was tested as early as 1999 with the Deep Space 1 (DS-1) spacecraft [11]. Due to the robustness of the beacon system, several deep-space missions (e.g., the Mars Science Laboratory and Juno) began utilizing a limited set of distinct beacon frequency tones during critical entry, descent, and landing and orbit-insertion phases to indicate the spacecraft's operational states. For UIS, however, a message-oriented process is envisioned using sequences of a tone "alphabet" to encode the UIS service request and confirmation messages. While all beaconing in deep space has so far been one-way only, an uplink beacon, when feasible, can be added to complete the two-way handshake. For assets deployed in the proximity of a high-coverage area such as the Mars region, where multiple spacecraft are tracked on a daily basis, OMSPA and multiple uplink per antenna (MUPA) technologies [12] can provide suitable downlink as well as uplink signaling channels for a UIS service acquisition process. The exact signaling format on the link layer and waveform could again be a simple tone-based alphabet or CCSDS-formatted telemetry, depending on resource availability. UIS messages could be captured and processed either in the closed-loop receiver, with timely frame/packet content extraction and delivery to network management, or recorded in the open-loop recorder and post-processed. Both approaches could be used depending on the desired turnaround time and ground resource availability. The common UIS service acquisition protocol is advantageous for missions operating across multiple space regimes, such as in the near-Earth regime immediately after launch followed by the lunar or deep-space regime, because the same operational procedure applies in terms of how communication services are acquired.
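A message-oriented beacon of this kind could encode a short UIS request as a sequence drawn from a small tone alphabet. The four-tone alphabet, the two-bits-per-tone mapping, and the opcode below are purely illustrative; no actual beacon format is defined in this chapter:

```python
# Illustrative 4-tone alphabet: each tone carries 2 bits.
TONES = ["T0", "T1", "T2", "T3"]

def encode_to_tones(payload: bytes):
    """Map each byte to four 2-bit tone symbols, most significant pair first."""
    seq = []
    for b in payload:
        for shift in (6, 4, 2, 0):
            seq.append(TONES[(b >> shift) & 0b11])
    return seq

def decode_from_tones(seq):
    """Inverse mapping: rebuild bytes from groups of four tones."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        b = 0
        for tone in seq[i:i + 4]:
            b = (b << 2) | TONES.index(tone)
        out.append(b)
    return bytes(out)

# A tiny hypothetical request: opcode 0x01 ("request contact"),
# parameter 0x19 = 25 (e.g., data volume in GB).
msg = bytes([0x01, 0x19])
tones = encode_to_tones(msg)   # eight tones for two bytes
```

A real tone-signaling scheme would add synchronization, error protection, and the high-detection-probability design mentioned earlier; the sketch only shows how a small alphabet can carry message content rather than a single state indication.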

4.3 Beyond LEO/Near Earth As mentioned in the previous section with respect to supporting deep-space missions in the early-cruise phase, missions beyond LEO and out to the Moon and to the Sun–Earth L1 and L2 Lagrange points can be supported with shared apertures at sizes of 18 meters or less. The smallest antenna size that can still maintain the desired links would be preferred, since it would have the widest beam width and, therefore, the largest coverage area for a particular distance from Earth. For example, all users on the near side of the Moon or in lunar orbit could be supported with MSPA and engage UIS to request a high-performance link at any time. The distinct advantage at these distances is that delays of no more than a few seconds would allow a two-way UIS protocol to finalize service acquisition quickly and potentially bring up a high-performance link immediately.
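The beam-width trade can be made concrete with the common 70·λ/D rule of thumb for a parabolic antenna's half-power beam width. The rule, the frequencies, and the dish sizes below are standard approximations and assumed values, not figures from this chapter:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def hpbw_deg(freq_hz, dish_diameter_m, k=70.0):
    """Half-power beam width (degrees) via the k*lambda/D rule of thumb."""
    return k * (C / freq_hz) / dish_diameter_m

def footprint_diameter_km(freq_hz, dish_diameter_m, range_km):
    """Diameter of the half-power footprint at a given range."""
    half_angle = math.radians(hpbw_deg(freq_hz, dish_diameter_m) / 2)
    return 2 * range_km * math.tan(half_angle)

X_BAND = 8.4e9            # Hz, typical deep-space X-band downlink (assumed)
LUNAR_RANGE_KM = 384_400  # mean Earth-Moon distance

beam_18m = hpbw_deg(X_BAND, 18.0)   # wider beam than a larger dish
beam_34m = hpbw_deg(X_BAND, 34.0)   # narrower: less simultaneous coverage
footprint = footprint_diameter_km(X_BAND, 18.0, LUNAR_RANGE_KM)
```

At X-band an 18 m dish yields roughly a 0.14° beam and a ~900 km half-power footprint at lunar distance, well under the ~3475 km lunar diameter, which illustrates why the smallest aperture that still closes the link is preferred: a smaller dish (or a lower frequency such as S-band, whose beam at 18 m roughly matches the ~0.5° apparent lunar disk) widens the shared footprint.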

Fig. 7 UIS framework for deep-space multiple access communication with beacon

5 Relay Scenarios 5.1 Earth Relay Previous papers have extensively described the Earth relay scenario combined with the hybrid Earth relay/DTE scenario [7]. Earth relays are able to provide full orbital coverage and could feasibly provide continuously available links. There is also the greater likelihood of multiple providers among international and commercial partners that could provide the total infrastructure. One provider's system may provide the high-availability link used for the UIS service acquisition process that acquires a different provider's high-performance link. For example, a high-availability RF link may be used to schedule access on a different relay's optical link. Similar to the TDRSS DAS, high-availability links could be pre-provisioned with enough simultaneous user capacity to allow each user's high-availability link service execution to begin immediately, without the per-use service acquisition process.

5.2 Mars Relay The current baseline of the Mars relay network (MRN) is based on pre-scheduled operations. However, due to the flexibility already designed into its primary protocol suite, the CCSDS Prox-1 space link protocol (data link, coding and synchronization, and physical layers), the MRN can be extended easily to support UIS service acquisition. Prox-1 uses a multiple-access link acquisition process, or "hailing process," to establish a link. Figure 8 shows UIS operations utilizing the same hailing channel as the UIS signaling channel. In this scenario, the user (the surface asset) will initiate a hailing sequence to establish a temporary connection with the provider (the relay orbiter) in order to exchange service request and handshake messages. This initial exchange on the hailing channel completes the UIS service acquisition phase. That phase is followed, immediately if resources are available or later otherwise, by the separate Prox-1 hailing process, triggered from either the orbiter or the surface asset, to kick off the service execution phase and establish a data channel. The application-layer UIS process remains essentially the same except that the signaling mechanism is enabled by the Prox-1 hailing process. If a user desires to acquire additional service during an existing data session, it can send the UIS message over the Prox-1 data channel as well. This operational scenario is completely in agreement with the general UIS framework described in earlier sections.
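The separation between the acquisition exchange on the hailing channel and the later hail that opens the data channel can be sketched as a small event sequence. The channel names, message strings, and grant policy are illustrative; this is not the Prox-1 wire format:

```python
class ProxLinkSim:
    """Toy two-phase flow: UIS acquisition over the hailing channel,
    then a separate hail to open the data channel for execution."""

    def __init__(self):
        self.log = []
        self.granted = None

    def uis_acquire(self, request):
        # Phase 1: temporary hailing-channel connection for the handshake.
        self.log.append("hail(acquisition)")
        self.log.append(f"REQ {request}")
        self.granted = request          # provider grants as-is in this sketch
        self.log.append("GRANT")
        self.log.append("hail-channel released")
        return self.granted

    def execute(self):
        # Phase 2: either side hails again to establish the data channel.
        if self.granted is None:
            raise RuntimeError("no service commitment")
        self.log.append("hail(execution)")
        self.log.append("data-channel established")
        self.log.append(f"transfer {self.granted}")
        return True

link = ProxLinkSim()
link.uis_acquire("rover->orbiter 2 Gb")
ok = link.execute()
```

The point of the sketch is the release of the hailing channel between the two phases: the service commitment survives the teardown, so execution can be deferred until the orbiter pass that actually carries the data.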

Fig. 8 UIS framework for multiple-access proximity relay networks

5.3 Lunar Relay The plans for long-term exploration and utilization of the Moon include the build-up of multiple spacecraft landing on and orbiting the Moon [13]. This is expected to be a collaborative effort combining human and robotic spacecraft provided by NASA, along with commercial and international partners. The scenarios described earlier in this chapter combine in this setting. Individual missions could be supported with DTE links from the lunar vicinity, and these missions could also become relays for other missions as part of a distributed space system (DSS). The inclusion of network-layer protocols for all data flows will initially facilitate the use of any link as a possible path toward data delivery. The addition of UIS concepts will provide the means for a scalable infrastructure that can be responsive to the wide variety of potential missions expected. The unique location of the Moon allows for a situation where the proposed approaches for Mars relay proximity links could be used, but the scheduling and provisioning of resources could be done by a combination of onboard and Earth resources. A surface or orbiting lunar user could submit UIS requests through a relay, but the service could be provided by a DTE link. A user could also submit requests through a DTE link and be provided relay services. This flexibility would require coordination between lunar orbiting and Earth provider systems, but the couple of seconds of time-of-flight delay for the messages would likely still be short enough to allow successful coordination.

6 DTE Relay Hybrids 6.1 LEO Generally speaking, high-performance DTE links impose fewer burdens on the user platform than high-performance geosynchronous relay links. This is due to the smaller distance between the platform and link resource and the common use of un-steered, low-gain antennas on the platform, as opposed to the range distances of several tens of thousands of kilometers and the gimbal-pointed antennas typically required on the platform for high-performance geosynchronous links. DTE communication architectures are anticipated to have lower responsiveness due to less access to ground link resources in typical LEO orbits as compared to geosynchronous Earth relay scenarios with global coverage. However, more responsive DTE mission concepts could be enabled by high-availability links provided by space relays without the increased user burden imposed by high-performance relay links. A hybrid SMN infrastructure combining the availability of relays with the high performance of DTE links would capitalize on the best attributes of both. It is already common for missions to have low-rate relay capabilities for health and safety, so the UIS process would take place over this existing capability. UIS would allow a user to request and receive service from whichever capable assets are available, and DTN would allow data to flow to the desired destination no matter which access point was used to connect to the larger network. In this scenario, the UIS service acquisition process occurs mainly over the high-availability, possibly pre-provisioned relay links, while the high-performance links are DTE or, if capable of meeting the service request, relay links. This scenario is depicted in Fig. 9.

Fig. 9 High-performance DTE service acquired through UIS over high-availability relay service

6.2 Distributed Space Systems

An interesting SMN case is the distributed space system (DSS), in which multiple spacecraft operate within a single link coverage area. For the DSS, communication between an individual spacecraft and the provider system may function through another node within the DSS, which acts as the SMN access point. The DSS node could also expand beyond being solely a data access point to providing the link for the UIS signaling traffic. In this way, the DSS node functions as a relay in a hybrid relay/DTE system. Individual nodes may be able to schedule DTE support or improved relay support via this node of the same DSS. For gateway nodes, those that bridge the spacecraft constellation with the ground network over DTE links, a beacon system or MSPA/MUPA can support UIS operations with the ground infrastructure. For much larger constellations with many assets spread over hundreds of thousands of kilometers or more, DTE/direct-from-Earth links will most likely be the primary communication system, and the same MSPA/MUPA approaches can apply very effectively. If the node functioning as the connection to Earth is supported on a high-performance DTE link, more complex UIS messaging may be used than is possible over a beacon or lower-rate multiple-access scheme. Once again, DTN provides increased flexibility for link selection and intermediate data storage, enabling user requests to be satisfied with a sparser infrastructure.
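MSPA depends on several spacecraft falling within a single ground antenna beam, so a rough feasibility check can be made from the beamwidth. The sketch below uses the common far-field approximation θ ≈ 1.22 λ/D; the frequency and angular offsets are invented examples, not values from the text.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def beamwidth_rad(freq_hz, dish_diameter_m, k=1.22):
    """Approximate antenna beamwidth; k ~ 1.22 is a common rule of thumb."""
    return k * (C / freq_hz) / dish_diameter_m

def fits_in_one_beam(offsets_rad, freq_hz, dish_m):
    """MSPA-style check: can all spacecraft share a single aperture's beam?"""
    half = beamwidth_rad(freq_hz, dish_m) / 2.0
    return all(abs(o) <= half for o in offsets_rad)

# Hypothetical cluster: angular offsets from boresight, in radians,
# seen by a 34 m dish at X-band (8.4 GHz).
beam = beamwidth_rad(8.4e9, 34.0)                   # roughly 1.3 mrad
ok = fits_in_one_beam([0.0, 2e-4, -3e-4], 8.4e9, 34.0)
```

A cluster spread wider than the beam would instead need multiple apertures or, as the text notes, a gateway node acting as the single point of contact.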

7 Conclusion

By recasting the space communication architecture as a mobile network with different access points, the space mobile network concept strives to keep mission operations concepts consistent even as missions move from the near-Earth environment out to deep space. In general, a mission could perform low-rate data transfers and network service requests over high-availability links, higher-rate transfers over high-performance links that can be brought up responsively to service requests, and networked communication to allow data flows between any two nodes connected to the same network. The specifics of how those links and service requests are realized will differ between environments, but these can be treated as the equivalent of “lower-layer” differences, so that mission applications can be developed against defined interfaces to the underlying functions.

Key next steps are to continue refining the concepts and terminology; to define functional and performance requirements for target environments and operational concepts; and to develop, model, and demonstrate protocols and implementations. Demonstrations via small satellites [14] or other test platforms, such as the SCaN Testbed [15], will also serve to identify requirements and opportunities. SMN also drives new requirements for spacecraft position, navigation, and timing (PNT) systems. Other next steps include continued efforts to meet those new requirements and to develop new SMN-enabled PNT capabilities [16]. The components can be implemented separately and build upon each other, allowing for a phased deployment of the space mobile network.

Acknowledgements This work is in support of NASA’s Space Communication and Navigation Program Office. The work carried out at the Jet Propulsion Laboratory, California Institute of Technology, was under a contract with NASA. C. J. Roberts would like to thank professors Tom Bradley and John Borky for their contributions in shaping the user-initiated services concept. D. J. Israel and C. J. Roberts acknowledge the contributions of Jacob Burke, Mark Sinkiat, and Jacob Barnes in defining the UIS relay and DTE scenarios. We also acknowledge Seema Vithlani and Katherine Schauer for their contributions in technical writing, technical editing, and graphics development.

References

1. Israel, D. J., Heckler, G. W., & Menrad, R. J. (2016, March). Space mobile network: A near Earth communications and navigation architecture. In 2016 IEEE Aerospace Conference (pp. 1–7).
2. Israel, D. J., Heckler, G. W., Menrad, R. J., Boroson, D., Robinson, B. S., Hudiburg, J., & Cornwell, D. M. (2016, May). Enabling communication and navigation technologies for future near Earth science missions. In 14th International Conference on Space Operations (p. 2303).
3. Burleigh, S., Hooke, A., Torgerson, L., Fall, K., Cerf, V., Durst, B., … Weiss, H. (2003, June). Delay-tolerant networking: An approach to interplanetary internet. IEEE Communications Magazine, 41(6), 128–136.
4. Rationale, scenarios, and requirements for DTN in space. Report Concerning Space Data System Standards, CCSDS 734.0-G-1. Green Book. Washington, D.C.: CCSDS. (2010).
5. Gitlin, T. A., & Horne, W. (2002). The NASA space network demand access system (DAS). In SpaceOps 2002 Conference (p. 50). Houston, Texas.
6. Proximity-1 space link protocol—Data link layer. Recommendation for Space Data System Standards, CCSDS 211.0-B-4. Blue Book. Issue 4. Washington, D.C.: CCSDS. (2006).
7. Roberts, C., Morgenstern, R., Israel, D., Borky, J., & Bradley, T. (2017, October). Preliminary results from a model-driven architecture methodology for development of an event-driven space communications service concept. In Space Terrestrial Internetworking Workshop, IEEE Wireless for Space and Extreme Environments. Montreal, Canada.
8. Hohpe, G., & Woolf, B. (2004). Enterprise integration patterns: Designing, building and deploying messaging solutions. Addison-Wesley.
9. Pentikousis, K., Denazis, S., et al. (2015, January). Software-defined networking (SDN): Layers and architecture terminology. Internet Research Task Force (IRTF) Request for Comments 7426. ISSN: 2070-1721.
10. Abraham, D., Finley, S., Heckman, D., Lay, N., Lush, C., & MacNeal, B. (2015, February 15). Opportunistic MSPA demonstration #1: Final report. IPN Progress Report 42-200.
11. Rayman, M., Varghese, P., Lehman, D., & Livesay, L. (2000). Results from the Deep Space 1 technology validation mission. Acta Astronautica, 47, 475. 50th International Astronautical Congress, Amsterdam, The Netherlands (1999, October 4–8).
12. Abraham, D. (2017, May 1–2). Progress toward simultaneous communications with multiple smallsats via a single antenna. In International SmallSat Conference, Session C-1. San Jose, California.
13. Hill, B. (2018). The next great steps. Space Policy Directive 1, 45th Space Congress.
14. Shaw, H., Israel, D., Roberts, C., Burke, J., Kang, J., & King, J. (2018, May–June). Space mobile network (SMN) user demonstration satellite (SUDS) for a practical on-orbit demonstration of user initiated services (UIS). In AIAA 15th International Conference on Space Operations. Marseille, France.
15. Mortensen, D., Roberts, C., & Reinhart, R. (2018, May–June). Automated spacecraft communications service demonstration using NASA’s SCaN Testbed. In AIAA 15th International Conference on Space Operations. Marseille, France.
16. Valdez, J. E., Ashman, B., Gramling, C., Heckler, G., & Carpenter, R. (2016). Navigation architecture for a space mobile network. In AAS Guidance, Navigation and Control Conference. Breckenridge, Colorado.

Creating a NASA Deep Space Optical Communications System

Leslie J. Deutsch, Stephen M. Lichten, Daniel J. Hoppe, Anthony J. Russo and Donald M. Cornwell

Abstract We expect data rates from deep space missions to increase by approximately one order of magnitude per decade for the next 50 years. The first order of magnitude improvement will come from existing plans for radio frequency (RF) communications, including enhancements to both spacecraft and Deep Space Network (DSN) facilities. The next two orders of magnitude are expected to come from the introduction of deep space optical communications. Studies indicate that optical receive apertures of between 8 and 12 m are desired. The large cost of dedicated receive telescopes makes this approach unrealistic, at least in the near term. The cost of large optical ground terminals is driven primarily by the cost of the optics and of a stable structure for the telescope. We propose a novel hybrid design in which existing DSN 34 m beam waveguide (BWG) radio antennas are modified to include an 8 m equivalent optical primary. By utilizing a low-cost segmented spherical mirror optical design, pioneered by the optical astronomy community, and by exploiting the extremely stable large radio aperture structures that already exist in the DSN, we can minimize both of these cost drivers for implementing large optical communications ground terminals. Two collocated hybrid RF/optical antennas could be arrayed to synthesize the performance of an 11.3 m receive aperture to support more capable or more distant space missions, or used separately to communicate with two optical spacecraft simultaneously. NASA is currently building six new 34 m BWG antennas in the DSN. The final two are planned to be built at the DSN Goldstone, California, and Canberra complexes. We are now investigating building these last two antennas as RF/optical hybrids. By delaying their operational dates by two years, we would be able to add the 8 m optical receive capability to these two antennas while fitting within existing budgetary constraints. This paper, which derives material from a paper the authors delivered at the SpaceOps 2018 conference [1], describes the hybrid antenna design, the technical challenges being addressed, and the plan for using this concept, together with ongoing work on optical flight terminals, to infuse operational optical communications into deep space missions. All included figures are reproduced here with permission of the American Institute of Aeronautics and Astronautics (AIAA), the publisher of the SpaceOps proceedings.

The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. © 2018. All rights reserved.

L. J. Deutsch (B) · S. M. Lichten · D. J. Hoppe
NASA/Jet Propulsion Laboratory/California Institute of Technology, Pasadena, USA
e-mail: [email protected]

A. J. Russo · D. M. Cornwell
National Aeronautics and Space Administration, Washington, USA

© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_3

1 Introduction

Over the next 50 years, the average downlink data rate from deep space missions will increase by an order of magnitude each decade [2]. There are many reasons for this expected increase. For one, we are ending the era of “fly-by” exploration and seeing an increase in orbiters and landers that remain at their targets for many years. There is also a trend toward higher-bandwidth instruments, inheriting technology applied first in Earth observation missions. An example of this trend is the NASA-ISRO Synthetic Aperture Radar (NISAR) instrument, which will downlink science data from Earth orbit at rates exceeding 3 Gbps, totaling more than 25 Tb/day. NASA’s plans for future solar system exploration include proposed missions carrying comparable instruments to Venus, Mars, and beyond to the outer planets. These instruments will require very large data rates and volumes to return rich data sets over large distances. Human exploration will likely require data rates of hundreds of Mbps from Mars distances. Additionally, future missions may comprise multiple spacecraft operating as constellations to provide improved science observations.

We are currently working to improve NASA’s existing radio communications infrastructure to provide the first of these additional orders of magnitude in downlink performance. The improvements for the first decade will come from a combination of new flight and ground systems that remove current bottlenecks [3, 4], increased use of Ka-band over X-band, and newer error-correcting codes. Optical communications will likely provide the next two orders of magnitude of performance improvement.

NASA has been developing technology for deep space optical communications since the 1980s [5], when the first laboratory demonstrations of multi-bit-per-photon systems were realized. The first demonstration with a spacecraft at lunar and greater distance was accomplished with the Galileo Optical Experiment [6] (GOPEX) in 1992. GOPEX used ground-based lasers to record pulses of light on the Galileo spacecraft’s camera as the spacecraft rotated during the transmission. JPL built the Optical Communications Test Laboratory (OCTL) on nearby Table Mountain in 2003 [7]. OCTL has been used in many demonstrations with spacecraft in Earth orbit, including the Japanese Optical Inter-orbit Communications Test Satellite and NASA’s Optical Payload for Lasercomm Science [8]. More recently, OCTL was used as one of three ground terminals in the Lunar Laser Communication Demonstration [9] (LLCD) carried to the Moon on NASA’s Lunar Atmosphere and Dust Environment Explorer (LADEE) spacecraft.
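As a back-of-the-envelope check on the figures above, the 25 Tb/day and 3 Gbps NISAR numbers imply roughly 2.3 hours of downlink contact per day, and the order-of-magnitude-per-decade trend can be extrapolated directly (both computations are illustrative only):

```python
# Daily contact time implied by 25 Tb/day at 3 Gbps (values from the text).
volume_bits = 25e12        # 25 Tb per day
rate_bps = 3e9             # 3 Gbps
contact_hours = volume_bits / rate_bps / 3600.0   # about 2.3 h of downlink per day

# One order of magnitude per decade: rate(t) = rate(0) * 10**(years / 10).
def projected_rate_bps(rate_now_bps, years):
    return rate_now_bps * 10 ** (years / 10.0)

rate_two_decades = projected_rate_bps(3e9, 20)    # two decades: 100x, i.e. 300 Gbps
```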


LLCD, along with the future Laser Communications Relay Demonstration [10] (LCRD), which will fly in geostationary Earth orbit, has paved the way for the infusion of optical communications in Earth orbit. We expect this infusion to happen readily in the next decade for two main reasons: (1) the demonstrations mentioned above have proven the technology and retired most of the risks associated with an operational capability, and (2) the ground-based infrastructure for Earth-orbiting optical communications can be realized with small (less than 0.5 m) aperture telescopes that are both readily available and relatively inexpensive. Operational optical communications in Earth orbit are expected to achieve at least an order of magnitude improvement over existing RF systems and deliver data rates in the tens to hundreds of Gbps.

In order to infuse optical communications into the operational deep space environment, we need to demonstrate the appropriate space terminal technology so as to retire the risks associated with this class of missions. We expect this to be accomplished with the planned flight of the Deep Space Optical Communications [11] (DSOC) terminal on the Psyche [12] mission. Psyche is planned to launch in 2022 on its journey to the metallic main-belt asteroid Psyche. The DSOC demonstration will occur in the first year or two of Psyche’s flight, using OCTL for the optical uplink and the Hale 200-inch (~5 m) Telescope at Palomar Observatory for the downlink.

Though DSOC is expected to retire the flight-side and system risks of deep space optical communications, making it ready to fly on other missions, we still need to develop an operational ground infrastructure. JPL’s analysis for NASA’s ongoing next-generation architecture studies shows that we will need the equivalent of an 8–12 m ground telescope to support the links expected to be needed for human missions to Mars.
The technology certainly exists to construct and operate telescopes of this size. Moreover, because the optical system does not require a fully diffraction-limited primary mirror, we may be able to construct these stations for substantially less than the cost of astronomical observatories of similar size. However, this still represents a substantial investment that would have to be made before mission designers could commit to using optical communications.

An obvious alternative to building dedicated communication telescopes is to share facilities with optical astronomers. However, some requirements for the communications system, including the need to operate in both daytime and nighttime, present logistical and operational difficulties at most facilities.

Another alternative is to add optical mirrors to existing or future DSN RF antennas. This has the potential of providing a lower cost solution, but comes with a host of interesting challenges. In fact, NASA has studied concepts for combining RF and optical systems on the same antenna structure since 2010 [13]. Recent experimental work and engineering studies (described below), along with innovative funding mechanisms, have resulted in such hybrid RF/optical antennas forming a major piece of NASA’s current strategy for infusing operational deep space optical communications. The following sections outline the concept in more detail and describe the steps necessary to advance it to operational reality.


2 The RF/Optical Antenna Concept

Figure 1 shows the basic design of the RF/optical hybrid antenna. It is based on the design of the DSN 34 m beam waveguide (BWG) RF antenna. BWG antennas have been operating in the DSN since 1992 [14] and have proven to have excellent pointing and stability. The basic idea of the hybrid concept is to place segmented optical mirrors on the inner area of the antenna’s primary RF reflector. These would be segments of a spherical optical primary mirror. We have chosen a spherical primary rather than a paraboloid to reduce the cost of the system and to take advantage of recent advances in spherical segmented mirrors in the optical astronomy community. These segments would be actuated on their edges to allow for figure compensation as the antenna points in elevation. Of course, this configuration necessitates a decrease in the RF performance of the antenna approximately equal to the reduced RF reflecting area. In order to reduce stray light issues, the spherical segments would be placed to avoid reflections from the four subreflector supports. This results in the four “pod” configuration shown in Fig. 1. The total area this allows is just over 50 m², the equivalent of an 8 m diameter monolithic mirror.

Fig. 1 Concept for RF/optical hybrid antennas

The RF subreflector mounted below the apex of the antenna would be redesigned so as to illuminate only the remaining RF panels. This will leave a hole in the center of the subreflector through which the signal reflected from the optical primary can pass to a new optical processing system on the apex platform. This system would include an optical receiver along with a lens assembly for correcting the spherical aberration. There would be no optical uplink on this hybrid antenna. Instead, a smaller (~1 m diameter) optical station somewhere nearby would supply the uplink. For the DSN Goldstone site, a 1 m telescope on Table Mountain would suffice. In any case, the cost of these smaller stations is not a driving factor in the infusion process.

The hybrid antenna would retain all of its RF capabilities, albeit at a loss of 0.6 dB resulting mainly from the loss of RF reflecting area on the primary, plus some other smaller losses that the current design minimizes. Both RF and optical signals could be received (and RF transmitted) simultaneously in operations. Hence, one antenna could service a deep space mission with simultaneous RF and optical downlinks, further reducing the cost of operations. In principle, different spacecraft could even be serviced at optical and RF simultaneously, provided that the optical user is located within the RF field of view that includes the RF user.

When performance beyond an 8 m equivalent optical aperture is required, two or more of these hybrid antennas could be arrayed. Two hybrid antennas would provide the optical performance of an 11.3 m diameter telescope. We have examined adding a larger surface area for the optical portion of the RF/optical hybrid, but beyond 8 m we would have to both increase the mass and distribute the loads outside the integral ring girder of the primary RF reflector. This would complicate the spherical aberration correction system and require a mechanical structure update for the additional mass and moments.
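The aperture equivalences quoted above follow from collecting area alone: just over 50 m² of segments matches an 8 m circular mirror, and arraying two antennas doubles the area, not the diameter. A quick check:

```python
import math

def equivalent_diameter_m(area_m2):
    """Diameter of a circular aperture with the given collecting area."""
    return 2.0 * math.sqrt(area_m2 / math.pi)

def arrayed_diameter_m(d_m, n):
    """Equivalent diameter of n arrayed apertures of diameter d (areas add)."""
    return d_m * math.sqrt(n)

single = equivalent_diameter_m(50.0)   # ~8 m from about 50 m^2 of segments
pair = arrayed_diameter_m(8.0, 2)      # ~11.3 m from two 8 m hybrids
```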

3 Experiments and Studies Completed Thus Far

In conjunction with the design effort for the RF/optical ground station described in the previous section, laboratory and field work were undertaken in a number of key areas. In this section, we briefly summarize this work.

In order to demonstrate the viability of using the RF antenna to track optical sources, a simple optical receiving system was installed on DSS-13, the Deep Space Network’s research and development 34 m beam waveguide antenna at Goldstone, CA. As depicted in Fig. 2, the system consists of a two-segment optical mirror assembly mounted through the RF panels to the backup structure of the antenna and a focal plane receiver mounted to the edge of the RF secondary.

Fig. 2 Optical demonstration system at DSS-13

The mirror assembly, depicted in the right half of Fig. 2, includes two hexagonal optical segments, each approximately 40 cm tip-to-tip. The radius of curvature of these spherical segments is approximately 24 m, creating a focus near the edge of the RF secondary. Actuators with submicron resolution allow tip/tilt and piston adjustment of the mirror pair, and allow the left mirror to be tipped and tilted individually with respect to the rightmost mirror. A SiO₂/TiO₂ coating, suitable for the open-air environment, is applied to the segments. The assembly also includes a target camera which images the focal plane assembly and its relative motion. The focal plane assembly, depicted in the right half of Fig. 3, contains a filter wheel and a large primary camera (red), visible at the rightmost position, as well as beam splitters, a fast video camera, and a pupil camera for imaging the segments themselves. A linear stage under the assembly allows for focus adjustment.

Fig. 3 Mirror assembly (left) and focal plane assembly (right)

The Moon and combination RF/optical sources such as Saturn were used to achieve initial alignment of the optical system and to co-align the RF and optical beams, as shown in Fig. 4. In this figure, the right panel shows a dual image of Saturn prior to co-alignment of the two optical segments.

Fig. 4 Images taken during the initial alignment sequence

The DSS-13 optical system and test sequence were designed to show acquisition of optical sources using the RF mount and pointing model, confirm the stability of the RF platform for optical tracking, and demonstrate the ability to position the mirror segments with micro-radian precision in an open-air environment. After initial alignment, various optical sources were tracked a few times a month, in a range of environmental conditions, over a period of a few years. A typical sequence included blind pointing to the optical source using the RF pointing model and applying initial actuator offsets to the mirror assembly based on a table look-up and the source elevation and azimuth. In nearly all instances, this simple method was capable of bringing the source images from both segments well into the 1 mrad field of view of the astronomical camera. Small adjustments were then made to co-align the two images and bring them to the center of the field. Example images of the star Vega are shown in Fig. 5. In this image, the 80% encircled-energy spot size is approximately 20 micro-radians and is determined by a combination of optical seeing and mirror segment quality. For the initial RF/optical ground station, the detector size encompasses approximately 50 micro-radians on the sky, and the goal for the experiments was to maintain the spot within this hypothetical detector throughout a typical track.

Fig. 5 Images of Vega prior to co-alignment (left), and after (right)

Throughout the alignment process, the ground station is, of course, tracking the source using the RF pointing model. After initial alignment of the spots, the co-alignment of the spots and the centroid is recorded throughout the track. Typical tracks showed a common-mode (to both segments) pointing error of several hundred micro-radians over an elevation range of tens of degrees. This is consistent with expectations based on the RF pointing model accuracy and the local tip/tilt environment of the antenna structure as a function of elevation angle [15, 16]. Differential-mode pointing error (the difference between segments) was typically several times smaller. This error is driven by mechanical details such as the design of the mounts used to fasten the actuators to the segments. Figure 6 shows a typical example of differential segment pointing using the Crab Nebula as the source. In this example, a differential pointing error of approximately 150 micro-radians is observed as an elevation range of 40°–60° is traversed. The response is nearly linear, smooth, and easily corrected using the segment actuators. This result is typical of those observed throughout the test campaign.

Fig. 6 Differential segment pointing during a track of the Crab Nebula

While co-alignment of the two segments in the demonstration system is a relatively simple matter, this is not necessarily the case for the full system, where 64 segments are involved. Demonstration of a pupil camera system to aid in initial segment alignment was also part of the DSS-13 experiment. The pupil camera concept is depicted in the leftmost pane of Fig. 7. The pupil camera is optically focused on the segments themselves as they are illuminated by the optical source. Light from the segments passes through a variable aperture prior to entering the camera. At the minimum aperture size, only light from properly aligned segments will pass to the camera. This is illustrated in the two right panes of the figure, where in the middle pane only one segment is properly aligned. In the final pane, both spots in the focal plane are atop one another and the pupil camera shows both segments fully illuminated. The character of the segment images is determined by a combination of mirror quality and local seeing conditions.
Furthermore, if the iris size is swept, the pupil image will show the light sweeping across the misaligned mirrors. The sweep direction and the pupil size can be used to determine both the magnitude and the direction of the mirror offset. This can be very valuable when initially aligning the 64 segments. For the two-segment demonstration, the pupil camera was used in a manual mode only. In future experiments, with more segments, we plan to fully automate the system.
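Because the differential pointing error is nearly linear in elevation, it can be removed with a simple calibration fit feeding the segment actuators. The sketch below is a minimal illustration of that idea; the calibration points (0 µrad at 40°, 150 µrad at 60°) are hypothetical values motivated by the Fig. 6 example, not measured data.

```python
def make_offset_model(cal_points):
    """Least-squares fit of offset = a*elev + b to
    (elevation_deg, offset_urad) calibration pairs."""
    n = len(cal_points)
    sx = sum(e for e, _ in cal_points)
    sy = sum(o for _, o in cal_points)
    sxx = sum(e * e for e, _ in cal_points)
    sxy = sum(e * o for e, o in cal_points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda elev_deg: a * elev_deg + b

# Hypothetical calibration: 0 urad at 40 deg elevation, 150 urad at 60 deg.
model = make_offset_model([(40.0, 0.0), (60.0, 150.0)])
correction_urad = -model(50.0)   # command the negative of the predicted drift
```

In practice the table look-up described in the text plays the role of this fit, with entries in both elevation and azimuth.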


Fig. 7 Pupil camera concept and sample segment images

Fig. 8 First structural mode of the antenna, and direct measurement of the resonance

Environmental conditions are a major factor in the implementation of an open-air optical communication ground station. In particular, wind can interact with the RF primary and secondary structures and induce pointing errors. A primary area of concern is the relative position of the RF primary and secondary, since each houses one of the two ends of the optical chain. Spot motion in the focal plane, and direct imaging of the focal plane assembly in wind using the target camera on the RF primary, directly measure this effect. While nominal wind conditions did not significantly affect our ability to maintain the spot inside the 50 micro-radian detector, winds in excess of 20 mph were significant. Figure 8 shows the first structural resonance of the antenna, a twisting mode of the subreflector structure at approximately 2.4 Hz. The right panel of the figure plots a direct measurement of this resonance obtained by processing video images of the spot motion in the focal plane. No other significant structural resonances were observed. It should be noted that this resonance is important for our experimental system, where the focal plane assembly is at the edge of the RF secondary, since the mode causes direct lateral motion there. For a centrally mounted optical assembly, such as that in the operational design, the effect of this resonance is greatly reduced. A fast steering mirror can easily compensate for low-frequency pointing errors such as those observed in the experiments at DSS-13, and such a mirror will surely be present in the aft optics of the operational system.

Recently, an upgraded focal plane assembly was installed at DSS-13 [17]. A block diagram and photograph of the unit are shown in Fig. 9. Like the first-generation assembly, the new version includes a test source, a number of beam splitters, a pupil camera, and the main camera. Additionally, a fast steering mirror is implemented, requiring a collimating mirror and an additional camera to close the steering-mirror loop. The system is currently on the antenna and has gone through initial alignment.

Fig. 9 Second-generation focal plane assembly

Along with the demonstration at DSS-13, there is an ongoing effort to characterize the optical channel at Goldstone [16]. The suite of instruments includes a particle profiler, night-time and daytime seeing monitors, a solar scintillometer, a sun photometer, a boundary-layer scintillometer, and a cloud camera. Many of these instruments have been in near-continuous operation for several years, as needed to build up a statistical model of the channel. Early experiments were also conducted to measure the dust contamination rate on the RF primary using a group of small mirrors and a handheld BRDF meter [17]. This information is needed to determine a mirror cleaning schedule and method for the operational system.

Laboratory work in support of the RF/optical ground station is also ongoing and includes mechanical/structural analysis, stray light analysis, optical design, and cryogenic work. For the proposed ground station, a prime-focus optical system is used, requiring a fully tippable 1 K cryogenic system. Photographs of an experimental sub-1 K tipping Dewar are shown in Fig. 10. The system uses an existing 4 K DSN Dewar with an additional 1 K absorption cooler stage. Laboratory tests have validated the basic mechanical design, the cooling capacity over the 90° range of elevation tipping angles, and the mechanical/optical stability of the detector mount.

A structural/mechanical analysis of the proposed ground station was performed [18]. The analysis considers the margins of safety for the existing structure, including the additional optical segments, as well as the resulting deformation of the RF surface and its impact on antenna efficiency. Figure 11 shows the Nastran model of the structure as well as the computed optical segment tip/tilt and piston as a function of elevation angle. These calculations are necessary to properly size the segment actuators for the final system. Two curves are shown on each plot, one for the baseline beam waveguide (BWG) implementation and a second for an implementation on one of the DSN’s high-efficiency 34 m antennas (HEF).
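The resonance measurement described above, in which the ~2.4 Hz mode was extracted from video of the spot motion, amounts to a spectral peak search. The sketch below reproduces the idea on a synthetic trace; the frame rate, drift, and signal are invented stand-ins for real centroid data.

```python
import cmath
import math

def dominant_frequency_hz(samples, sample_rate_hz):
    """Return the frequency of the largest non-DC spectral peak (naive DFT)."""
    n = len(samples)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * sample_rate_hz / n

# Synthetic spot-motion trace: a 2.4 Hz oscillation plus slow drift,
# sampled at 30 frames/s for 10 s (all values illustrative).
fs = 30.0
trace = [0.05 * i / 300 + math.sin(2 * math.pi * 2.4 * i / fs)
         for i in range(300)]
peak = dominant_frequency_hz(trace, fs)
```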


Fig. 10 Prototype 1 K tipping cryogenic package

Fig. 11 Structural model of the RF/Optical ground station

As a final example of additional work on the ground station, results from a recent stray light study are depicted in Fig. 12. For an optical communication system, the achievable data rate is a function of the available signal power density at the ground station and the background light level seen by the detector. Stray light from the Sun is one of the primary concerns when trying to track near the Sun during the day. While the Sun is not directly in the field of view of the communication detector, sunlight can be scattered off auxiliary structures into the detector. Dust contamination on the mirror segments and the raw mirror surface roughness also contribute, but are mitigated by the choice of materials in our proposed design. The left panel of the figure shows the stray light model, whereas the remaining two panels show the point source transmission (PST) for the system. The PST is the transfer function relating the detector power density to the plane wave source density incident from a given direction.
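The PST defined above can be exercised numerically: multiplying a source's irradiance by the PST at its off-axis angle gives the detector background that source contributes. The PST curve below is a purely hypothetical placeholder, not the modeled curve of Fig. 12; only the solar-constant value (~1361 W/m²) is a standard figure.

```python
import math

def pst(angle_deg, floor=1e-9, scale=1e-3, width_deg=0.5):
    """Hypothetical point-source transmission: strong near boresight,
    falling to a constant scatter floor at large off-axis angles."""
    return floor + scale * math.exp(-(angle_deg / width_deg) ** 2)

def background_w_m2(source_irradiance_w_m2, angle_deg):
    """Detector power density contributed by an off-axis source."""
    return source_irradiance_w_m2 * pst(angle_deg)

near = background_w_m2(1361.0, 1.0)    # Sun 1 deg off boresight
far = background_w_m2(1361.0, 30.0)    # Sun well off boresight: scatter floor
```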

5.1 Methodology

Concurrent engineering tasks have been used extensively to tackle end-to-end system optimisation, integrating from the beginning the lessons learned from former LS development and operations (mainly Ariane and Vega). The overall methodology is illustrated in Fig. 7.

Ariane 6 Launch System Operational Concept Main Drivers

93

To illustrate such activity, some insight is needed into the trade-off performed between a full-horizontal, a mixed horizontal/vertical and a full-vertical Operational Concept. Full horizontal assumes that both the launcher and the P/L are integrated in that position. Full vertical is an Ariane 5-type scenario. Horizontal/vertical is a mix of the two: most operations are performed horizontally, but the payload is integrated vertically and assembled on top of the launcher later, on the launch pad. For the trade-off, the following elements were studied concurrently, involving experts from operations, mechanical, avionics and fluidics disciplines as well as cost engineers:

– Launcher integration, P/L integration, CC verticalisation concept, launcher connections;
– Checkout policy;
– BAL concept.

For each element, the following criteria were scored:

– Recurrent cost (end to end);
– Time to market (maiden flight 2020, Full Operational Capability 2023) and fit to NRC;
– P/L customer satisfaction criteria;
– Flexibility to meet launch manifest dynamics;
– Robustness to contingencies;
– Adaptability.

Even though no showstopper was identified for the horizontal integration logic, it was found to be sub-optimal in terms of service to the customer, leading to the choice of the horizontal/vertical scenario.
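The trade-off described above can be sketched as a weighted scoring matrix over the quoted criteria. The criterion names follow the text, but the weights and 1-5 scores below are invented purely for illustration of the mechanics, not the programme's actual figures.

```python
# Hedged sketch of a weighted trade-off between the three integration concepts.
criteria = {  # criterion: weight (invented)
    "recurrent cost": 0.25, "time to market": 0.20,
    "customer satisfaction": 0.20, "flexibility": 0.15,
    "robustness": 0.10, "adaptability": 0.10,
}
scores = {  # concept: 1-5 score per criterion, same order as above (invented)
    "full horizontal":     [5, 4, 2, 3, 3, 4],
    "horizontal/vertical": [4, 4, 5, 4, 4, 4],
    "full vertical":       [2, 3, 4, 3, 4, 3],
}
weights = list(criteria.values())
totals = {concept: sum(w * v for w, v in zip(weights, s))
          for concept, s in scores.items()}
for concept, total in totals.items():
    print(f"{concept:20s} {total:.2f}")
```

With these illustrative numbers the mixed horizontal/vertical concept comes out on top, mirroring the conclusion reported in the text.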

5.2 Current “Design to Operations” Under Study

Achieving ambitious operations duration targets involves driving the design of the launcher and launch base elements towards operations efficiency. In this framework, the following issues have been identified:

a. Cryogenic connection system junction: quick-connection devices are under analysis to drastically reduce the operation durations observed with classic bolted-flange connection devices;
b. Optimisation of the central core vis-à-vis ESR mechanical dimension chains to avoid any ESR ripping or shimming operations during CC-ESR integration;
c. Parallel mechanical/electrical integration operations for CC integration and ESR assembly;
d. L/V final integration test in under 3.5 h;
e. No need for sensor recalibration once in the launch area.

94

P. D. Resta et al.

5.3 Challenges

The duration of each defined operation in the BAL has been set to fulfil the target of “BAL in 5 days”. The feasibility of carrying out the operations in the allocated time is still to be verified. The launch vehicle and launch complex facility designs, as well as the operational process itself, have to be challenged against current Launch Systems to achieve the timeline target. At the end of the design loops, the BAL operations will have to be reviewed in order to validate the final duration of each operation and the resources needed. Challenges already identified are:

• Design of the flooring/access to the liquid propulsion modules: the specified durations for flooring installation/removal and for internal inspection before removal are very short compared with current Ariane 5 experience;
• Design of dollies/rolling bases and associated interfaces to minimise the duration of stage unpacking, container repackaging, ULPM and LLPM relative positioning, etc.;
• Measurement facilities for ULPM and LLPM relative positioning.

6 Ariane 6 Launch System Operational Concept: Current Definition

Two main distinctions may be drawn between the various activities constituting the Ariane 6 LS Operational Concept:

a. Activities in mainland Europe;
b. Activities at the CSG:
   i. The operational cycle at the launch rate starts with launch vehicle integration, which is performed in four main steps:
      • L/V central core (CC) integration in a dedicated building (the BAL);
      • CC roll-out to the Mobile Gantry (MG) in the launch zone (ZL);
      • Strap-on solid rocket booster (ESR) roll-out to the MG;
      • Assembly of the CC with the ESRs on the MG at the ZL;
   ii. The launch cycle continues with mating of the upper composite (UC), where the payloads are accommodated;
   iii. Then the countdown and launch, followed by post-launch ZL refurbishment and reconfiguration of the launch complex and launch range to prepare for the next mission.

It is now confirmed that horizontal integration of the launch vehicle’s central core has introduced synergies in the end-to-end production chain: adopting the same manufacturing and integration processes for the stage (Europe) and the L/V (French Guiana) has significantly reduced time to market for Ariane 6, has reduced and consolidated the operational skills necessary to exploit the Launch System, and has shared them between the European and French Guiana production chains.

Strict application of the Deming Cycle (PDCA) model to LS fluidics functions, together with careful exploitation of the knowledge gained through Ariane 5 ECA exploitation, has yielded important savings for cryogenic and conventional fluid functions compared with Ariane 5. In this way, the launch rate can be reached with very limited upgrading of the existing propellant production facilities. Another achievement of the Ariane 6 development methodology is the simplification of flight hardware allowed by a different choice of launcher-to-ground cryogenic connection system, whose main functions and constraints can be summarised as follows:

• Fulfil the cryogenic (but not only) Ariane 6 fluid interface functions;
• Disconnect in positive time just after ESR ignition;
• Fulfil RAMS requirements.

Fig. 8 Cryogenic connection system architecture

The cryogenic connection system design (Fig. 8) is common to both the lower and upper liquid propulsion modules, allowing cost reductions by maximising hardware commonality. The Ariane 6 Launch System methodology gives priority to design to operations: minimising the launch campaign duration through improved flight segment/ground segment interfaces, and reviewing and optimising the launcher checkout logic to minimise complexity and degraded-mode treatment at the launch base. Finally, logistical constraints have been challenged and a critical item has been identified in the definition of the CSG road system, which requires limited refurbishment to enable free movement of personnel and non-dangerous items during hazardous

transfer operations. The subject is illustrated in Fig. 9, in which two separate main roads are shown, corresponding to the ESR transfer road and the CSG main road. The Ariane 6 Launch System is ready to become compatible with the new-generation CSG launch range (e.g. telemetry transmission by satellite, localisation by GNSS receivers).

Fig. 9 CSG road system between the solid booster (ESR) factory and the Ariane 6 launch pad: current main roads (dark blue) and the ESR transfer road (light blue). The red lines show a complementary alternative road path to decouple the ESR transfer operations even further from the main road traffic

7 Safety Aspects

Safety is a must in launch vehicle operations. The Ariane 6 processing cycle must comply with safety rules and regulations and, as far as possible, demonstrate its feasibility and robustness in the early stages of development. With this objective in mind, a collaborative work approach has been implemented by ESA involving all the stakeholders in the Ariane 6 project, namely ArianeGroup as launcher system design authority, CNES as launch base design definition authority and CNES/CSG as safety authority. A safety roadmap has been implemented to secure the preparation of the main development milestones:


Fig. 10 Safety cycle

– Identifying as soon as possible all the hazard scenarios impacting the safety of people and property, on the ground and in flight;
– Defining, validating and implementing consistent mitigation actions to ensure compliance with the law and regulations.

The safety roadmap is split into six-month cycles, as illustrated in Fig. 10. This new safety management approach has proved very effective for reaching a common understanding on the way forward for routine safety issues such as operations in the presence of pyrotechnic products, qualification rules for neutralisation chains, and close-range activity. For example, the arrival sequence at the launch zone will bring in the CC first and the ESRs afterwards, with the aim of optimising accessibility before ESR arrival and the overall launcher integration operations schedule.

8 Degraded-Mode Cases

Based on lessons learned from other operational Launch Systems, it has been observed that the Operational Concept’s elaboration shall take into account from the very beginning the management of degraded-mode cases, as a major contributor to the achievement of dependability (RAMS) objectives. With this objective in mind, the capacity to manage such cases has been clearly identified and integrated into the robustness score for the CONOPS options trade-offs. As a consequence, it could be concluded that minor degraded-mode cases/contingencies occurring during a launch campaign can be covered by the flexibility requirements applicable to the Launch System design:

– All minor degraded-mode cases can be treated under the Mobile Gantry, except nozzle and engine exchanges, which would require returning the central core to the BAL.
– Launcher System checkout logic principles are established and evaluated in coherence with the CONOPS elaboration.
– All operations are specified to be reversible (including CC assembly on the ESRs).
– No electrical ground support equipment is to be installed in the BAL.
– BAL integration areas are limited to one assembly line and one CC production line.
– No fairing disassembly facilities are required under the Mobile Gantry, as it has been ensured that minor contingencies will be treated via local fairing doors and access points, while in cases of major payload anomalies the P/L composite will be sent back to the encapsulation hall.

To further secure the achievement of dependability objectives, a Launch System risk analysis roadmap has been established by ESA with both the Launcher System and launch base design authorities to evaluate, as soon as operational sequences and preliminary designs are available, the risk mitigation actions to be implemented by the other segment in cases of major impacts on the segment that generated the failure mode.

Fig. 11 Launch system hazard analysis

Figure 11 shows the methodology implemented at LS level. This approach is of the utmost importance to securing the robustness of the Ariane 6 Launch System:


– Facing external environments (lightning, winds), in particular during the roll-out and countdown phases;
– Optimising, during cryogenic operations, the safety barriers/operational procedures with respect to the usual feared events such as geyser, water hammer and flash vaporisation effects;
– Cryogenic connection system disconnection and retraction: here the LS will benefit from a positive-time disconnection system, removing the need for purging devices onboard the launcher in flight while securing draining operations in the case of a last-moment abort.

Nevertheless, stringent LS failure mode analysis is to be performed to remove any design issues before combined testing.

9 Right First Time: Launch System Verification and Validation Logic

Defining an Operational Concept optimised to meet the operational-cost and launch rate constraints of the high-level requirements is accompanied by another very important constraint: time to market, meaning, in the case of Ariane 6, having the maiden flight by 2020 and the first operational flight at the beginning of 2021. This is a programmatic need of very high added value that has led the project to abandon the prototype development approach and enforce the right-first-time paradigm at all levels of the development chain. At Launch System level, this has meant ESA establishing a very thorough integrated verification plan covering verification of technical and operational performance levels at the same time. For each LS function contributing to operational performance (including dependability), the end-to-end verification and validation logic is being established in collaboration with the Launcher System and launch base design authorities, with the aim of guaranteeing that:

• Requirements are fully verified and validated at the lowest possible level in the product breakdown structure;
• Requirements are satisfactorily verified and validated at least once in the integrated logic;
• Testing is the preferred verification method, including scale testing when necessary.

The verification and validation logic covers not only H/W and S/W products but also the operational products themselves: a compound of operations plans, operations requirements expressed by the design authorities, operations procedures defined by the operations authorities, control software and databases. Each operational product is to be tested as soon as possible in the integrated verification and validation plan. LS validation comprises four main steps:

• Product-level testing (Category 1 tests);


• Early combined tests, defined by introducing Launch System test requirements into the Launcher System and launch base qualification testing, respectively;
• LS combined tests performed at the launch complex once its development activities have been completed;
• Maiden flight campaign.

Implementation of a right-first-time approach requires using the operational organisation and products designed for exploitation at least from Launch System combined testing onwards, and this is implemented in ESA’s test plans.

10 Comparison with Former Ariane Operations Plans

Ariane 6 is being developed with the objectives of providing users with high mass-lift performance, mission versatility, operational flexibility, a high launch rate and a low launch service cost. The following improvements with respect to former Ariane operations plans contribute to the final objective of better fulfilling user expectations during preparation, launch and post-launch operations:

– A Launcher System concept based on:
  • a design for manufacturing and operations > design-to-cost;
  • an industrial organisation based on the “extended enterprise” model;
  • a “production-pull” strategy.
– A launch complex concept based on:
  • a design allowing time-optimised launcher integration, checkout and countdown;
  • a launch pad configuration taking into account lessons learned from Ariane 5, in particular for the design of the gas exhaust ducts, which in addition allows future Ariane 6 evolution towards more powerful versions;
  • a launcher control and command room accommodating commercial off-the-shelf operational computing equipment together with a high-reliability redundant main computing core.
– A launch range adaptation based on:
  • customer-oriented payload processing services, including specific high-speed links between the spacecraft and its control bench;
  • a multi-launch-mission reconfiguration capability consistent with a high launch rate.
– A Launch System Operational Concept resulting in:
  • a time-optimised operations plan for launcher integration, payload upper composite assembly, checkouts and launch readiness;


  • a time-optimised launch countdown, including a shorter liquid-propellant filling process;
  • quicker delivery to the customer (i.e. within 30 min after separation) of the flight results summary at separation (orbital characteristics, attitude data);
  • quicker post-flight analysis (i.e. within two weeks of launch), allowing a high launch rate and ensuring early identification of flight non-conformances so that they can be treated as soon as possible.

11 Conclusion

The results achieved so far demonstrate the soundness of the Ariane 6 development methodology, notably the concurrent design of products and operations, with recurrent-cost checking against targets performed at each step of the development cycle. This is the basis for up-front confirmation of compliance with the cost requirements that form part of the Ariane 6 high-level requirements. The overall operations plan resulting from the CONOPS exercise has been presented and, for some challenging points, compared with former European Launch System operations plans, showing the differences and highlighting improvements with respect to user expectations.

Acknowledgements The ESA authors wish to thank ArianeGroup and CNES for this cooperative work effort, allowing the Ariane 6 time and cost targets to be met.


LUMIO: An Autonomous CubeSat for Lunar Exploration

Stefano Speretta, Angelo Cervone, Prem Sundaramoorthy, Ron Noomen, Samiksha Mestry, Ana Cipriano, Francesco Topputo, James Biggs, Pierluigi Di Lizia, Mauro Massari, Karthik V. Mani, Diogene A. Dei Tos, Simone Ceccherini, Vittorio Franzese, Anton Ivanov, Demetrio Labate, Leonardo Tommasi, Arnoud Jochemsen, Jānis Gailis, Roberto Furfaro, Vishnu Reddy, Johan Vennekens and Roger Walker

Abstract The Lunar Meteoroid Impact Observer (LUMIO) is one of the four projects selected within ESA’s SysNova competition to develop a small satellite for scientific and technology demonstration purposes, to be deployed by a mothership around the Moon. The mission utilizes a 12U form-factor CubeSat which carries the LUMIO-Cam, an optical instrument capable of detecting light flashes in the visible spectrum to continuously monitor and process meteoroid impacts. In this chapter, we describe the mission concept and focus on the performance of a novel navigation concept using Moon images taken as a byproduct of the LUMIO-Cam operations. This new approach will considerably limit the operations burden on ground, aiming at autonomous orbit-attitude navigation and control. Furthermore, an efficient and autonomous strategy for collection, processing, categorization, and storage of payload data is also described, to cope with the limited contact time and downlink bandwidth. Since all communications have to go via a lunar orbiter, all commands and telemetry/data will have to be forwarded to/from the mothership. This will prevent quasi-real-time operations and will be a first for CubeSats, as they have never flown without a direct link to Earth. This chapter was derived from a paper the authors delivered at the SpaceOps 2018 conference [1].

S. Speretta (B) · A. Cervone · P. Sundaramoorthy · R. Noomen · S. Mestry · A. Cipriano
Delft University of Technology, Delft, The Netherlands
e-mail: [email protected]

F. Topputo · J. Biggs · P. Di Lizia · M. Massari · K. V. Mani · D. A. Dei Tos · S. Ceccherini · V. Franzese
Politecnico di Milano, Milan, Italy

A. Ivanov
Space Center, Skolkovo Institute of Science and Technology, Moscow, Russia

D. Labate · L. Tommasi
Leonardo, Campi Bisenzio, Florence, Italy

A. Jochemsen · J. Gailis
Science and Technology AS, Oslo, Norway

R. Furfaro · V. Reddy
University of Arizona, Tucson, AZ, USA

J. Vennekens · R. Walker
European Space Agency, Noordwijk, The Netherlands

© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_6

Nomenclature

ADCS  Attitude determination and control system
CCD  Charge-coupled device
CCSDS  Consultative committee for space data systems
CDF  Concurrent Design Facility
CONOPS  Concept of operations
COTS  Commercial off the shelf
CRTBP  Circular restricted three-body problem
EOL  End of life
ESA  European Space Agency
FOV  Field of view
HIM  Halo injection maneuver
IMU  Inertial measurement unit
LUCE  Lunar CubeSat for exploration
LUMIO  Lunar Meteoroid Impact Observer
NIR  Near infrared
OBPDP  Onboard payload data processor
PCM  Plane change maneuver
RF  Radio frequency
ROM  Rough order of magnitude
SADA  Solar array drive assembly
SK  Station keeping
SMIM  Stable manifold injection maneuver
SNR  Signal-to-noise ratio
TCM  Trajectory correction maneuver
TRL  Technology readiness level
UHF  Ultra-high frequency

1 Introduction

The Lunar Meteoroid Impact Observer (LUMIO) was one of the proposals submitted to the ESA SysNova LUnar CubeSats for Exploration (LUCE) call [2]. SysNova is intended to generate new and innovative concepts and to verify their usefulness and feasibility quickly via short concurrent studies. LUMIO was selected as one of the four concurrent studies run by ESA, and it won the challenge ex aequo. An independent assessment conducted at ESA’s Concurrent Design Facility (CDF) has


shown that the mission is feasible, proving the value of LUMIO for future autonomous planetary exploration missions. The mission utilizes a CubeSat that carries the LUMIO-Cam, an optical instrument capable of detecting light flashes in the visible spectrum. Onboard data processing is implemented to minimize data downlink while still retaining the relevant scientific data. The mission implements a sophisticated orbit design: LUMIO is placed in a halo orbit about the Earth–Moon L2 point, from which permanent full-disk observation of the lunar farside is made. This prevents background noise due to Earthshine and permits obtaining high-quality scientific products. This chapter focuses on the concept of operations, which does not include a direct communication link to Earth, precluding the usual navigation and control techniques. LUMIO will be especially relevant as a precursor of autonomous missions to remote bodies which cannot rely on real-time commands. Furthermore, with a view to reducing mission cost, operations (and navigation) will be autonomous, as operations are one of the cost figures that do not scale linearly with satellite size [3]. In this chapter, we present the mission (Sect. 2), briefly describing also the SysNova LUCE challenge. We then describe the satellite design (Sect. 3) and the orbit design (Sect. 4), and concentrate on the mission concept of operations (Sect. 5) and the autonomous navigation system (Sect. 6). The concept presented throughout this chapter has also been independently verified by the ESA CDF team (Sect. 7), which suggested improvements to the mission.
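The halo orbit mentioned above lives in the circular restricted three-body problem (CRTBP, see the Nomenclature). As a minimal illustration of that dynamical model, the sketch below locates the Earth-Moon L2 point by finding where the rotating-frame accelerations balance on the Earth-Moon line; the mass parameter is a standard Earth-Moon value, not a number from this chapter.

```python
# Hedged sketch: locating Earth-Moon L2 in the CRTBP.
# Units are nondimensional: distances in Earth-Moon separations,
# with Earth at x = -mu and the Moon at x = 1 - mu.
mu = 0.012150585  # Moon mass / (Earth + Moon mass), standard value

def collinear_accel(x):
    """Net x-acceleration (centrifugal minus gravity) beyond the Moon."""
    return x - (1 - mu) / (x + mu) ** 2 - mu / (x - 1 + mu) ** 2

# Bisection for the equilibrium beyond the Moon (x > 1 - mu):
# just past the Moon lunar gravity dominates (negative), far out
# the centrifugal term dominates (positive).
lo, hi = 1.01, 1.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if collinear_accel(mid) < 0:
        lo = mid
    else:
        hi = mid

x_l2 = 0.5 * (lo + hi)
print(f"L2 at x = {x_l2:.4f} (Earth-Moon distances)")
```

The root lands at roughly x = 1.156, i.e. some 60,000 km beyond the Moon, which is the neighbourhood in which LUMIO's halo orbit is designed.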

2 Mission Description

LUMIO was one of the four competitive proposals selected for the ESA SysNova LUCE [1] study, which was aimed at identifying a viable low-cost concept using nanosatellites or CubeSats for interplanetary exploration. The LUCE call was, in particular, aimed at technology demonstration and the exploration of the Moon. The prize for this competitive study was the opportunity to review and advance the mission concept with ESA experts at the CDF at ESA/ESTEC. This independent mission verification was carried out; it showed that the mission is feasible and suggested a series of improvements, described in Sect. 7.

2.1 SysNova LUCE

The LUCE study is expected to enable future exploration missions around the Moon by pushing the following key technologies:

• Deployment and autonomous operation of a number of small satellites in lunar orbit, either as individual elements or as part of a distributed system, including localization and navigation aspects;


• Miniaturization of optical, RF, and other scientific payload instrumentation and associated technology flight demonstrations on CubeSat/nanosatellite platforms in lunar orbit;
• Remote sensing of the lunar surface and/or in-situ measurements in the lunar environment, and astronomical observations that could be made from lunar orbit and are not achievable by past, current, or planned lunar missions;
• Intersatellite communication links to a larger lunar communications orbiter for relay of data back to users on Earth and for tracking, telecommand, and control;
• Technologies directly useful for future human and robotic exploration missions and in need of flight demonstration in a representative environment.

The mission concept relies on a lunar orbiter which departs from Earth and reaches an elliptical (800–8000 km), high-inclination (50°–90°) orbit, where it deploys several smaller satellites (up to 24 kg) in a circular orbit around the Moon. This mother spacecraft solves most of the issues related to deployment in lunar orbit, and it also ensures communication with Earth, acting as a relay for the small satellites (it should be noted that, according to the SysNova LUCE challenge, no direct-to-Earth communication was allowed). This concept brings several constraints on the small spacecraft, especially from the communications point of view. The mother spacecraft is only visible in certain parts of the orbit (see Fig. 1), and it constrains communications to and from Earth as it can service only one satellite at a time, in time-division multiple access. Furthermore, the mother spacecraft does not have a known schedule, so the deployed satellites must act independently and be able to fulfill their goals without counting on a connection to ground. An additional communication blackout of up to 10 days should also be considered, in case of problems onboard the mother spacecraft.
One of the aims of SysNova LUCE is pushing the limits of technology, and autonomous operations will be an important technology to demonstrate for future missions.

2.2 Lunar Meteoroid Impact Observer

LUMIO is one of the four missions that were funded by ESA, and it is meant to observe, quantify, and characterize meteoroid impacts by detecting the impact flashes on the lunar farside. This will complement the knowledge gathered by Earth-based observations of the lunar nearside, thus synthesizing global information on the lunar meteoroid environment. The mission is designed to observe meteoroid impacts on the lunar farside for a continuous period (up to 14 consecutive days) to improve the existing statistics on meteoroids close to Earth. The Moon can be used as an impact target to gather these statistics, but Earth-based observations of lunar impact flashes are restricted to periods when the lunar nearside is illuminated between 10 and 50%. Observation of the night side of the Moon can be carried out whenever the illumination is less than 50%, which is the case for half of the lunar orbit. To achieve this, it was required


Fig. 1 Lunar orbiter communication window (see Sect. 3 for further details)

to select an orbit that maximizes visibility of the night side of the Moon (see Fig. 2 for more details).

Fig. 2 Improvement in observation time of the lunar farside

The mission uses a CubeSat that carries the LUMIO-Cam, an optical instrument capable of detecting light flashes in the visible spectrum. The LUMIO-Cam has a 1024 × 1024 pixel CCD, a 6° FOV, a 127 mm focal length, and a 55 mm aperture. Slight defocusing is applied to prevent detecting false positives. Onboard data processing is implemented to minimize data downlink while still retaining the relevant scientific data. The onboard payload data processor autonomously detects flashes in the images, and only those containing events are stored. The mission implements a sophisticated orbit design: LUMIO is placed in a halo orbit about the Earth–Moon L2 point, from which permanent full-disk observation of the lunar farside is made. This prevents background noise due to Earthshine and thus permits obtaining high-quality scientific products. Repetitive operations are foreseen, the orbit being in near 2:1 resonance with the Moon's orbit. Innovative full-disk optical autonomous navigation is proposed, and its performance is assessed and quantified.

The spacecraft is a 12U form-factor CubeSat (see Fig. 3 for further details) with a mass of 22 kg.

Fig. 3 LUMIO configuration

A novel onboard micropropulsion system is used for orbital control, de-tumbling, and reaction-wheel desaturation. Steady solar power generation is achieved with a solar array drive assembly, which also guarantees eclipse-free orbits. Accurate pointing is performed using reaction wheels, an IMU, star trackers, and fine sun sensors. Communication with the lunar orbiter is done in the UHF band using the CCSDS Proximity-1 link [4]. A lightweight structure with radiation shielding has been considered to minimize the impact of ionizing radiation on components, allowing mission cost to be reduced by relying on commercial parts wherever possible.
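From the LUMIO-Cam parameters quoted in the text (1024 × 1024 pixels, 6° FOV, 127 mm focal length, 55 mm aperture), a few first-order optical figures follow directly. The sketch below only restates that arithmetic; the implied sensor size is a derived figure, not a number from the chapter.

```python
import math

# LUMIO-Cam parameters from the text
n_pixels = 1024      # detector width, pixels
fov_deg = 6.0        # full field of view, degrees
focal_mm = 127.0     # focal length
aperture_mm = 55.0   # aperture diameter

# Angular resolution per pixel (uniform-sampling approximation)
ifov_arcsec = fov_deg / n_pixels * 3600.0
# Focal ratio of the optics
f_number = focal_mm / aperture_mm
# Detector side length implied by the focal length and FOV
sensor_mm = 2.0 * focal_mm * math.tan(math.radians(fov_deg / 2.0))

print(f"IFOV: {ifov_arcsec:.1f} arcsec/pixel")   # ~21.1 arcsec/pixel
print(f"f-number: f/{f_number:.2f}")             # ~f/2.31
print(f"implied sensor side: {sensor_mm:.1f} mm")
```

At roughly 21 arcsec per pixel, the full lunar disk (~0.5° as seen from the L2 region, scaled by the actual range) fits comfortably inside the 6° field, consistent with the full-disk observation strategy described above.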


To make such a mission possible, a propulsion system capable of a Δv of 154 m/s (see Sect. 4.2) will be required. Several commercial units have been evaluated, deeming such a system feasible but requiring a high level of customization. The required volume for such a system has been estimated at approximately 3U, with a wet mass of 5.6 kg.
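The propellant share of that 5.6 kg wet mass can be estimated with the Tsiolkovsky rocket equation. The sketch below uses the 22 kg spacecraft mass and 154 m/s Δv from the text, but the specific impulse of 100 s is an assumed figure for a small monopropellant micropropulsion system, not a number from the chapter.

```python
import math

def propellant_mass(m0_kg, delta_v, isp_s, g0=9.80665):
    """Tsiolkovsky rocket equation: propellant needed for a given delta-v."""
    return m0_kg * (1.0 - math.exp(-delta_v / (isp_s * g0)))

# 22 kg spacecraft, 154 m/s budget (from the text); Isp = 100 s is assumed.
mp = propellant_mass(22.0, 154.0, 100.0)
print(f"propellant mass: {mp:.1f} kg")  # roughly 3.2 kg
```

Around 3.2 kg of propellant under this assumption is consistent with a 5.6 kg wet propulsion module once tanks, valves, and thrusters are accounted for.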

3 System Design

The LUMIO spacecraft has been designed to perform with a high level of autonomy, particularly in the navigation, payload data processor, and CDHS subsystems. This choice was driven not only by the operational constraints with respect to the lunar orbiter, but also by the ambitious mission design. Additionally, a general zero-redundancy approach has been adopted for all subsystems, dictated by the tight mass and volume constraints and a CubeSat design-driven risk approach. In the subsystem design, a systematic trade-off procedure has been adopted, based on subsystem-specific performance criteria as well as standard performance, cost, and schedule criteria. Consistent design margins have been used for sizing the subsystems based on their development status: a standard mass margin of 5, 10, or 20% has been applied for a fully COTS solution, a COTS solution requiring modification, and a custom design, respectively. The most important system and subsystem requirements are summarized in Table 1.
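The 5/10/20% margin policy stated above can be applied mechanically in a mass budget. The sketch below is illustrative only: the margin tiers follow the text, but the component names and basic masses are invented (except the 5.6 kg propulsion wet mass quoted earlier).

```python
# Hedged sketch of the mass-margin policy: margin depends on maturity.
MARGIN = {"cots": 0.05, "modified_cots": 0.10, "custom": 0.20}

components = [  # (name, basic mass in kg, maturity) - illustrative entries
    ("LUMIO-Cam", 1.5, "custom"),
    ("UHF transceiver", 0.4, "cots"),
    ("propulsion", 5.6, "modified_cots"),
]

total = sum(mass * (1.0 + MARGIN[status]) for _, mass, status in components)
print(f"margined mass of listed items: {total:.2f} kg")
```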

3.1 Payload

The observation of the light flashes produced by meteoroid impacts on the lunar farside is performed through the LUMIO-Cam, the main payload of the LUMIO CubeSat. The impact flashes on the Moon can be modeled as black-body emissions [5], with temperatures between 2700 and 6000 K [6] and durations greater than 30 ms [7]. The lowest impact energies correspond to apparent magnitudes higher than six as seen from Earth. These characteristics drive the payload requirements, which are listed in Table 2. The camera detector and optics are driven by requirements PLD-001 to PLD-003, while requirements PLD-004 to PLD-007 constrain the payload's physical properties in terms of total mass, volume, power consumption, and storage, due to the need for compliance with low-resource CubeSat standards.

The baseline detector is the e2v L3Vision™ CCD201. This device is a 1024 × 1024 pixel frame-transfer sensor that uses a novel output arrangement, capable of operating at an equivalent output noise of less than one electron at frame rates of over 15 frames per second. This makes the sensor well suited for scientific imaging where the illumination is limited and the frame rate is high, as is the case for LUMIO.
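The black-body model quoted above fixes where the flash emission peaks. A quick application of Wien's displacement law to the 2700-6000 K temperature range shows why a visible/NIR detector suits these events; this sketch is standard physics, not an analysis from the chapter.

```python
# Wien's displacement law: peak emission wavelength of a black body.
WIEN_B = 2.897771955e-3  # m*K, Wien displacement constant

for temp_k in (2700.0, 6000.0):  # flash temperature range from the text
    peak_nm = WIEN_B / temp_k * 1e9
    print(f"T = {temp_k:.0f} K -> peak emission near {peak_nm:.0f} nm")
```

The coolest flashes peak in the near infrared (around 1070 nm) and the hottest in the visible (around 480 nm), bracketing the LUMIO-Cam's visible passband.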


Table 1 Main system and subsystem requirements

OVRSYS-001  The mass of the spacecraft shall be no greater than 24 kg
OVRSYS-002  The spacecraft volume shall not exceed that of a 12U CubeSat
OVRSYS-003  The system shall operate in a standalone mode for a period of 10 days without any communication
PROP-001  The propulsion system shall provide a minimum ΔV of 154.39 m/s for station keeping, orbital transfer, and end-of-life disposal, and a minimum total impulse of 72.91 Ns for de-tumbling and wheel desaturation maneuvers
PROP-002  The maximum thrust of the propulsion system shall be 500 mN
PROP-003  The propulsion system shall have a maximum thrusting time of 8 h per orbital transfer maneuver
ADCS-001  After separation from the lunar orbiter, the ADCS shall de-tumble the spacecraft from tip-off rates of up to 30 deg/s in each axis
ADCS-003  The ADCS shall point with an accuracy of less than 0.1° during the science and navigation phases
ADCS-005  The ADCS shall provide a minimum pointing stability of 79.90 arcsec/s during the science phase
ADCS-006  The ADCS shall provide a maximum slew rate of 1°/s
EPS-002  The EPS shall supply 22 W average and 36 W peak power to the subsystems in the parking orbit phase
EPS-004  The EPS shall supply 23 W average and 39 W peak power to the subsystems during the transfer phase
EPS-006  The EPS shall supply 27 W average and 46 W peak power to the subsystems in science mode
EPS-008  The EPS shall supply 22 W average and 42 W peak power to the subsystems in navigation mode
EPS-013  The EPS shall have a mass of no more than 3 kg
COMMS-001  The spacecraft shall receive telecommands from the lunar orbiter in the frequency range 390–405 MHz
COMMS-002  The spacecraft shall send telemetry to the lunar orbiter in the frequency range 435–450 MHz
COMMS-003  The spacecraft shall send payload data to the lunar orbiter in the frequency range 435–450 MHz
COMMS-007  The maximum available time for communication between the spacecraft and the lunar orbiter shall be 1 h per day
PDLPROC-01  The payload processor shall receive and process a maximum of 15 images per second from the payload
PDLPROC-02  The payload processor shall deliver a maximum of 13 MB of payload data per 29-day period to the COMMS for transmission to the lunar orbiter

LUMIO: An Autonomous CubeSat for Lunar Exploration

Table 2 LUMIO payload requirements

PLD-001  The payload shall detect flashes with energies between 10⁻⁶ and 10⁻¹ kT TNT
PLD-002  The payload shall detect flashes in the radiation spectrum between 450 and 890 nm
PLD-003  The image integration time shall be equal to or greater than 30 ms
PLD-004  The mass of the payload shall be no more than 4.5 kg
PLD-005  The maximum power consumption of the payload shall be no more than 10 W
PLD-006  The maximum size of the payload shall be 10 cm × 10 cm × 30 cm
PLD-007  The payload processor shall create less than 20 MB of science data per day

Table 3 Detector features

Image area: 13.3 mm × 13.3 mm
Active pixels: 1024 × 1024
Pixel size: 13.3 μm × 13.3 μm
Storage area: 13.3 mm × 13.3 mm
Low noise gain: 1–1000
Readout frequency: 15 MHz
Charge handling capacity: 80 ke−/pixel
Readout noise @ 1 MHz: —

Table 4 Optics features

Power generation (threshold: 100 W throughout surface operations). With body-fixed solar panels covering half the circumference of the Lander, higher latitudes require a precise landing orientation, while lower latitudes require horizontal panels (which necessitates deployment).

Terrain knowledge (metric: LROC SDNDTM stamps; threshold: see Fig. 2). Landing in areas where high-resolution terrain data is available removes the risk of landing on unknown terrain, which is challenging for both Lander and Rover.

Terrain roughness (metric: crater size frequency distribution (SFD), average slope, directionality of terrain; threshold: no craters of diameter greater than 20 m). Craters below 20-m diameter are avoided by onboard hazard avoidance algorithms.

Proximity to lunar highlands (metric: distance from boundary of Mare; threshold: clearance >400 km). To maintain reliability of range estimates from the onboard altimeter.

Direction of orbital approach (metric: orbital inclination; threshold: all inclinations ≥ latitude, incl. retrograde). To have flexibility in launch opportunity.


M. S. Menon et al.

Fig. 3 Landing site selection—algorithm for LTA

Fig. 4 Landing site selection process

However, the probability of occurrence of doublets is also very low [4], and hence the decision was taken to proceed while ignoring them. The LTA is fed with the three DTMs, and for each a binary image called a HazMap is obtained, in which black pixels are classified as unsafe/potentially hazardous and white pixels as safe. These HazMaps are the data inputs for further analysis. Subsequently, the hazard blobs are converted into equivalent circles: an equivalent diameter is computed from the area of each blob. A histogram of the equivalent diameters of the hazard blobs is plotted, and standard statistical parameters are extracted and analyzed. Based on these results (along with other mission considerations), it was decided to use IMBRIUM2 as the primary landing site (PLS) DTM and IMBRIUM as the secondary landing site (SLS) DTM. The next step is to select the best coordinates within the PLS-DTM; the same procedure will be carried out on the SLS-DTM to generate the SLS coordinate list. The landing dispersion ellipse for the Z-01 mission is bounded within a 2 km × 1.9 km

Use of Terrain-Based Analysis …


ellipse (3σ) with an azimuthal approach from 10° West of South. The probability of touchdown at any point inside this dispersion ellipse follows an asymmetric 2D Gaussian probability density function (PDF). This is represented as a mask with the non-uniform weights of the 2D Gaussian rendered on a gray scale (Fig. 5). Hereafter denoted the DEMask, it is convolved over the whole of the DTM HazMap. The 2D Gaussian-distributed weights ensure that, for each potential landing pixel, hazards close to the landing site contribute more risk. The few pixels with the lowest risk index are saved as backup landing sites, and the lowest-risk pixel is marked as the primary landing site (PLS). The corresponding portions of the HazMap, with a size corresponding to the bounding box of the DEMask (to cater to the Rover's serviceable range from the Lander) and centered on the PLS (referred to as HazMapLS), and of the DTM (referred to as LoSDTM), are passed on for the further analysis presented in Sects. 5 and 7.
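The two image-processing steps above, converting hazard blobs to equivalent diameters and convolving a Gaussian-weighted mask over the HazMap, can be sketched as follows. This is an illustration under assumed names and parameters, not the mission code:

```python
import math
from collections import deque

# (1) Label 8-connected hazard blobs in a binary HazMap and convert each
#     blob's pixel area into an equivalent diameter, d = 2*sqrt(area/pi).
# (2) Slide a 2D Gaussian mask (the DEMask idea) over the HazMap so that
#     hazards close to a candidate landing pixel contribute more risk.
def equivalent_diameters(hazmap, pixel_size_m=5.0):
    """hazmap: 2D list, 1 = hazardous (black), 0 = safe (white)."""
    rows, cols = len(hazmap), len(hazmap[0])
    seen = [[False] * cols for _ in range(rows)]
    diameters = []
    for r in range(rows):
        for c in range(cols):
            if hazmap[r][c] and not seen[r][c]:
                seen[r][c] = True
                area, queue = 0, deque([(r, c)])
                while queue:  # flood-fill one blob
                    y, x = queue.popleft()
                    area += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and hazmap[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                diameters.append(2.0 * math.sqrt(area * pixel_size_m ** 2 / math.pi))
    return diameters

def gaussian_mask(half, sigma_x, sigma_y):
    """Unnormalized 2D Gaussian weights on a (2*half+1)^2 grid."""
    return [[math.exp(-0.5 * ((dx / sigma_x) ** 2 + (dy / sigma_y) ** 2))
             for dx in range(-half, half + 1)]
            for dy in range(-half, half + 1)]

def risk_map(hazmap, mask):
    """Gaussian-weighted hazard density; the lowest-risk pixel would be the PLS."""
    rows, cols, half = len(hazmap), len(hazmap[0]), len(mask) // 2
    risk = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    y, x = r + dy, c + dx
                    if 0 <= y < rows and 0 <= x < cols and hazmap[y][x]:
                        risk[r][c] += mask[dy + half][dx + half]
    return risk
```

In practice the asymmetric Gaussian of the dispersion ellipse would use different sigmas along the approach azimuth; here the axes are left generic.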

3 Rover Operational Constraints for GPP

ECA is a micro-Rover with a mass of less than 10 kg. With the mass and volume constraints flowing down from mission management, operating the Rover poses challenges: each feature that could make operations easier also carries a cost in terms of mass, power consumption, or complexity of realization or operation. Onboard software, planning tools, and operational rules have been developed to compensate for these limitations and thus make the Rover more dependable as a system while meeting the mission objectives. The Rover, shown in Fig. 6, is an electric four-wheeled independently driven locomotion platform equipped with a stereo pair of cameras that can be articulated in both the elevation and azimuth axes. This stereo pair is used to perform visual odometry

Fig. 5 Landing dispersion ellipse mask (DEMask), a without applying the PDF and b after applying the Gaussian PDF


Fig. 6 ECA—the lunar exploration Rover developed by TeamIndus

(VO). An IMU is used for absolute roll and pitch determination as well as for relative heading knowledge; absolute heading is determined using a sun sensor. The Rover is solar-powered, with a single inclined solar panel, and has rechargeable batteries to support peak power and operations when the solar panel is not illuminated. The Rover's drive speed is roughly 6 cm/s. Monitoring and commanding of the Rover are done through a Lander relay link. The following characteristics pose operational constraints that must be respected while developing a drive plan.

3.1 Direction of Traverse

a. Power generation: Since the Rover has a solar panel on one side (the left), drive plans need to keep the sun on that side of the Rover for most of the traverse, unless there are hazards to be negotiated around. From Figs. 6 and 7, with a solar panel on the left-hand side, driving toward the South or Southwest is most preferred.

b. Camera-feature-sun geometry: To meet reliability requirements, images acquired for VO should not be washed out; the Rover shall not drive down-sun. Driving down-sun would also cause the Rover to cast its own shadow immediately ahead. These two factors must be respected to obtain reliable navigation estimates of the distance traversed and of the hazards in the path of traverse.
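The geometric constraints above reduce to a check on the sun's bearing relative to the drive direction. The following sketch assumes compass azimuths (clockwise from north) and an arbitrary down-sun exclusion cone; neither convention nor threshold is taken from the paper:

```python
# Illustration, not flight software: with the solar panel on the Rover's
# left side, the sun should sit to the left of the drive direction for
# power, and the Rover should not drive down-sun (sun directly behind),
# which would wash out VO images and cast the Rover's shadow ahead.
def rel_bearing(sun_az_deg: float, heading_deg: float) -> float:
    """Sun azimuth relative to the drive direction, wrapped to (-180, 180]."""
    d = (sun_az_deg - heading_deg) % 360.0
    return d - 360.0 if d > 180.0 else d

def heading_ok(sun_az_deg, heading_deg, down_sun_cone_deg=30.0):
    rel = rel_bearing(sun_az_deg, heading_deg)
    sun_on_left = rel < 0.0                      # negative = counterclockwise
    down_sun = abs(rel) > 180.0 - down_sun_cone_deg  # sun roughly behind
    return sun_on_left and not down_sun
```

For example, with the sun in the east, a southward drive keeps it on the panel (left) side, while a westward drive is down-sun and rejected.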


Fig. 7 Illustration of the average solar azimuth between dawn (D) + 2 days and D + 5 days

3.2 Safety Constraints

a. Temperature: The wheel motors, cameras, and onboard electronics need to be maintained within their operational temperature limits. As the sun's elevation increases in the forenoon, the ambient temperature approaches the upper operational limit. Hence, long drive distances are only possible on either side of local noon, and it is recommended to reduce the time spent driving when soil temperatures approach 90 °C.

b. Communication link: The Rover antenna should maintain line of sight with the stationary Lander antenna throughout the traverse to maintain control capability. Loss of communication with the Rover would lead to a permanent loss of the Rover itself, as there is no direct-to-Earth communication link.

c. Camera sensor: Direct exposure of the image sensor to sunlight can damage the cameras; the cameras shall not be pointed up-sun.

A number of candidate drive directions and camera-pointing strategies were studied and traded off against the navigation requirements (feature visibility for VO, nearby hazard identification) and safety requirements (sensor damage due to direct sun exposure) to arrive at a recommendation to drive toward the South or Southwest, keeping the cameras looking forward. This strategy works if the targeted drive is completed before the fifth day after sunrise. Though a NE drive with the cameras pointing NE/SW, as shown in Table 2, is feasible, in case of any delays the Rover's own shadow would fall ahead just after noon. Also, the Rover's initial traverse in proximity to the Lander


Table 2 Trade-off on Rover drive direction, based on Rover navigation and safety requirements

also needs to avoid the Lander’s shadow. These considerations make the last drive and camera pointing the safest for the mission.

4 Lander Localization

In this lunar mission, after the Lander has touched down on the Moon, the first step is to find the selenographic coordinates of the touchdown location. This process is called 'Lander localization' and is enabled by a set of onboard cameras and altimeters. The LRO-NAC images [1] are used as the reference atlas against which the images captured by the Lander's onboard cameras are compared. Lander localization is required to find the actual dispersion from the selected target landing point and to provide this location as the start point for the Rover's global path planning.

4.1 Source of Images

The Lander has three cameras, each operational during distinct phases of the descent. One of these cameras is mounted such that it can view the lunar surface from an altitude of 2000 m down to 100 m; its spatial resolution from 2000-m altitude is 1 m/pixel. To capture many identifiable features, an image useful for localizing the Lander must be taken from a sufficiently high altitude to obtain a larger field of view. Using some of the overlapping images collected by the LRO-NAC, NASA has generated digital terrain maps (DTMs) of some sites of interest on the Moon. These DTMs are used in conjunction with a tool called the Planet and Asteroid Natural Scene Generation Utility (PANGU), developed at the University of Dundee, to generate a synthetic scene from the DTM, which forms the reference atlas. The resolution of the LRO DTM is 5 m/pixel. It may be noted that the DTM has anywhere between 2.5- and 10-m error in elevation and 5-m error in map tie [3].


4.2 Accuracy Requirement for GPP

The 500-m Rover path needs to be clear of hazards, maintain a sun-positive heading, and be as straight and as short as possible; each heading change induces additional mission cost in terms of time and energy. The path should also ensure a continuous communication link with the Lander. Any uncertainty in the localization of the Lander, which serves as the start point of the Rover's traverse, leads to a requirement for a wide safe area (while simultaneously growing the hazard areas by as much as the uncertainty). Since the terrain data used for planning the Rover's path is at best 5 m/pixel, the target for Lander localization is to keep the uncertainty as close to 5 m (i.e., 1 px) as possible, so as to obtain as many viable paths as possible and ensure the success of the Rover's mission. The mission analysis team mandated that the maximum allowable uncertainty in knowing the Lander's touchdown location shall be less than 20 m with respect to the selenographic terrain grid.

4.3 Position Uncertainties of the Lander at Start of Lunar Descent

The worst-case uncertainties at the start of lunar descent, combining orbit-determination errors and the contributions of onboard sensor errors, were derived by the TeamIndus GNC group through Monte Carlo simulations: 180 m cross-track, 780 m in-track, and 1 m radial. Attitude knowledge errors are bounded to within 0.2° about all axes. The objective is to bring down these initial knowledge errors through a combination of onboard estimates and independent image-based correlation.

4.4 Feature Matching Method

Lander localization is done by matching features in the image taken by the Lander camera against the reference atlas image. Feature matching can be carried out using various image-comparison methods, such as the scale-invariant feature transform (SIFT) [6], speeded-up robust features (SURF) [7], and affine-SIFT (ASIFT) [8], which are used extensively in computer vision applications. Here, ASIFT is chosen for feature matching due to its superior performance under affine changes [9].


4.5 Algorithm

The process involved in the position estimation of the Lander is shown in the flowchart in Fig. 8. The position of the Lander is estimated by matching features between the images taken by the Lander and the reference images. The images taken by the Lander and downlinked to ground after touchdown are a record of the Lander's true position and orientation, represented by {X, Θ}. The reference images are generated using PANGU such that a virtual camera looks at approximately the same scene, using the position and orientation obtained from the inertial measurement unit (IMU) telemetry. Since the error in position at the start of descent is considerable, uncertainties in navigation estimates propagated using the IMU can only grow. The estimated position and orientation provided by the IMU are represented by {X̂_nav, Θ̂_nav}. The position error resulting from the uncertainty in orbit determination is given by

E_nav = X − X̂_nav    (1)

Figure 9 shows the true position and estimated position of the Lander; the ENU origin is at the center of the atlas-referenced DTM, in this case the IMBRIUM2

Fig. 8 Algorithm used to estimate the position of the Lander


Fig. 9 Estimated position of the Lander and the reference point w.r.t the features

dataset.¹ The uncertainty of the Lander's instantaneous position (at the time when an image would have been captured by its camera) is represented in the figure (exaggerated) by an ellipsoid. This uncertainty can be mapped to a corresponding dispersion ellipse around the landing site (LS) within which the Lander would have landed. The reference image is obtained from the estimated position X̂_nav and attitude Θ̂_nav. Since the estimated position can have an error (E), the altitude at which the reference image is taken is increased by a factor; the new position is called the reference position and is represented by X̂_ref. With the reference position and orientation, images are generated using PANGU. The true image and the reference image are matched for features. The image pixel coordinates of the matched features are converted into world coordinates, and thus the estimated position of the Lander (X̂_f) and of the reference point (X̂_fref) are obtained w.r.t. the features, as shown in Fig. 9. Translating X̂_f to the ENU frame gives the estimated position of the Lander in the ENU frame after localization (X̂_loc), as shown in Fig. 10. Considering only the horizontal position for the localization procedure,

X̂_loc = X̂_nav + Ê    (2)

where Ê acts as a correction term added to the position estimate. Thus, the error in the position estimates provided by the IMU is reduced by image processing, and the reduced error dispersion is indicated by the green ellipse around the true position of the Lander (X). Mathematically, the localization error (E_loc) is represented by:

¹ Lunar Orbital Data Explorer: http://ode.rsl.wustl.edu/moon/indexproductpage.aspx?product_id=NAC_DTM_IMBRIUM2&product_idGeo=15955656.


Fig. 10 Estimated position of the Lander after localization

Fig. 11 Feature matching of true and reference images

E_loc = X − X̂_loc    (3)
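The algebra of Eqs. (1)-(3) can be checked numerically with the first-iteration values from the example descent scenario (horizontal components only). Note that the correction Ê below is back-computed for illustration; in the real pipeline it comes from the ASIFT feature match:

```python
# Numerical illustration of Eqs. (1)-(3). The correction e_hat is a
# back-computed illustration, not the output of an actual feature match.
def vsub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def vadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

x_true = (127.0, 167.0)   # true horizontal position X
x_nav = (233.0, 882.0)    # estimated position from IMU/orbit determination

e_nav = vsub(x_true, x_nav)       # Eq. (1): E_nav = X - X_nav

e_hat = (-83.95, -692.97)         # correction recovered by image matching
x_loc = vadd(x_nav, e_hat)        # Eq. (2): X_loc = X_nav + E_hat
e_loc = vsub(x_true, x_loc)       # Eq. (3): E_loc = X - X_loc
```

The residual E_loc matches the (−22.05, −22.03) m error reported for the first iteration, two orders of magnitude below the initial in-track error.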

An example descent scenario is considered to validate the localization algorithm discussed. In Fig. 11, the true image from the LDS 2 camera (simulated using PANGU), with the Lander at an altitude of 1992 m and a horizontal position of (127, 167) m, is shown on the left half. The reference image obtained from PANGU at the estimated position (233, 882) m is shown on the right half. The straight lines correspond to feature matches between the true and reference images, detected using ASIFT. The resulting localization error was (−22.05, −22.03) m. It is important to note that the images used for matching need to be consistent with their source; i.e., an image generated synthetically cannot be used to match against an


actual image of a planetary surface. This was recognized after multiple simulations, where the accuracy was limited by the level of deterministic detail available in the terrain model used to generate synthetic scenes in PANGU. With a resolution of 5 m/px, the DTM lacked detail of its own once the Lander camera got close to the surface, at which point PANGU would add fractal features, which cannot be matched against (say) a 1 m/px LRO-NAC image. During the actual mission, LRO-NAC images will be matched with the images taken by the spacecraft; in the current scenario, PANGU-generated images have been used both for the spacecraft camera images and for the reference images used in testing.

4.6 Improvement by Using a Sequence of Images

The localization error resulting from the algorithm discussed above can be reduced by repeating the process using a sequence of images taken at different altitudes. An iterative method to validate the procedure is presented in Fig. 12. The localization error obtained from the previous iteration is taken as the navigation error at the start of the next iteration, and the procedure is repeated with further images at lower altitudes (Fig. 13). However, the reference-image altitude is not reduced below 1000 m, to avoid blurred images that could lead to failure of the feature-detection method. In a real scenario, X̂_loc is propagated using the IMU data to obtain the position estimate required for generating the reference image at a lower altitude.
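The contraction of the error across iterations can be illustrated with a toy model. This is a schematic of the loop in Fig. 12 only: the feature matcher is faked as recovering a fixed fraction of the remaining error, which is an assumption made purely to show the mechanism, not a property of ASIFT:

```python
# Toy model of the iterative refinement: each iteration's residual
# localization error becomes the navigation error fed into the next.
def iterate_localization(e_nav, altitudes, recovered_fraction=0.55):
    """Return the residual error after each image in the sequence."""
    errors, e = [], e_nav
    for _altitude in altitudes:                # one image per altitude step
        e_hat = tuple(-recovered_fraction * c for c in e)  # fake "match"
        e = tuple(c + h for c, h in zip(e, e_hat))         # residual error
        errors.append(e)
    return errors

# Altitudes taken from the six-image sequence in Table 3.
hist = iterate_localization((-106.0, -715.0),
                            [1992, 1111, 735, 542, 390, 328])
```

Each pass shrinks the residual, mirroring the monotonic decrease of E_loc across the iterations listed in Table 3.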

Fig. 12 Validation of localization algorithm using sequence of images


Fig. 13 Sequence of images used for localization

The objective of the iterations is to verify that the localization error meets requirements and provides an order-of-magnitude improvement over the navigation error obtained from the onboard estimate, i.e., the condition in Eq. (4):

‖E_loc‖ ≪ ‖E_nav‖    (4)

The location of the Lander corresponding to each image and the resulting localization errors are listed in Table 3. Note that the localization error was reduced to within 5 m after 3 iterations and further reduced to below 2.5 m after 6 iterations.

5 Global Path Planning for Rover Traverse

Once localization is complete, we need to generate the shortest, straightest Rover path that is clear of hazards, maintains a sun-positive direction, and yields a 500-m displacement,


Table 3 Location of image sequences and localization errors

Iteration | Altitude (m) | True Lander position (m) | Estimated position for reference image (m) | Localization error E_loc (m)
1 | 1992 | (127, 167) | (233, 882) | (−22.05, −22.03)
2 | 1111 | (61, 112) | (39, 90) | (−10.03, −7.98)
3 | 735 | (34, 90) | (25, 83) | (−4.7, −4.08)
4 | 542 | (21, 80) | (17, 76) | (−4.17, −3.77)
5 | 390 | (11, 71) | (8, 68) | (−2.92, −2.91)
6 | 328 | (7, 68) | (5, 66) | (−2.38, −2.32)

while avoiding potential hazards. These hazards are represented as a binary image called the HazMap, derived by combining the slope, shadow, topographically hazardous features (craters/hills), and LoS hazards, as presented in Sect. 2. Without loss of generality, the smallest hazard size accounted for is at the 5-m scale, due to the limited resolution of the available DTMs. The inputs to GPP are the HazMap derived in Sect. 2 and the touchdown coordinates. We crop a portion of the HazMap with dimensions 1000 m × 1000 m centered at the touchdown coordinates, hereafter referred to as IntHazMapLS. It is on this map that a path must be planned for the Rover through only the white (safe) areas. A path in the context of GPP is defined as a set of points connected by straight line segments; the points that define the geometry of the path are known as key points. The objective of GPP is to generate the best terrain-relative path, hereafter referred to as the GPath, which passes through only the white areas (hence avoiding potential hazards) and achieves a displacement of 500 m from the start point of the Rover traverse (the center pixel), while obeying the Rover operational constraints described in Sect. 3. The best path is defined as the path that meets or exceeds the R-POET parameter requirements described in Sect. 6. The GPath is converted into drive steps, which are then used to generate a schedule table. The schedule table describes the whole journey, with details about movement, turns, and wait periods for recharging the battery, cooling down the motors, etc., and is modeled through a discrete-event simulation (detailed in Sect. 6).

5.1 Selecting All Potential Destination Sectors

As the resolution of the hazard map is 5 m, it is almost certain that the final GPath will contain hazards that are not marked on the hazard map. If we were to choose a single destination as the target for path planning, we might discover a hazard blocking the path to that destination, making it unreachable. This outcome wastes resources and reduces the chances of a successful mission. As a result, we work with destination sets that encompass multiple destinations within a localized


sector. This allows flexibility later in the mission to target a different, but nearby, destination if the initially chosen target is unreachable, with minimal demand for the additional resources and time required for reassessment. Selection of the destination set (the set of reachable destinations) is controlled by four parameters: ϕ, Δ, α, and targetReachability, as described in Table 4. The sector containing the destination set must be close to the optimal angle of travel (within ϕ ± Δ) and must contain a satisfactory proportion of reachable points. Moreover, these points must be reachable via a relatively straight path (controlled by α). For the mission described in this paper, the chosen ϕ is toward selenographic south, which allows the Rover's solar panel to be illuminated by the sun for the duration of the traverse. The destination-finding algorithm looks for all sectors that possess a minimum proportion (targetReachability) of reachable destinations that can be reached with a relatively straight path. Each such sector and its reachable destinations are added to a viableSectorSet, the set of all viable sectors and the reachable destinations in each. Δ limits the deviation of the centering of a sector from the optimal angle ϕ. Figure 14 shows an example ϕ in red and a bounded search area, ϕ ± Δ. The area outside ϕ ± Δ is shaded black, as paths through that region would not meet the constraints described in Table 2. The search area is divided into sectors, where a sector is defined as any θ ± α relative to ϕ (an example sector is the region between the two blue lines in Fig. 14). In each sector, connected-component analysis is used to check the existence of a path, lying within that sector, from the Rover starting point (the center of the image) to every point on the circular sector perimeter (500-m radius). If the proportion of reachable points is at least targetReachability (i.e., the reachability is satisfactory), these reachable points and their sector are added to the viableSectorSet. The straightness of the path is controlled by the angular width of the sector (α). The value of θ is initially set to 0° and oscillates clockwise and then counterclockwise, in that order, in increments of α. The sector being searched moves like a

Table 4 Descriptions of variables used for global path planning

ϕ: The optimal Rover heading angle. This is chosen based on mission requirements for the ideal start time after Lander touchdown and the estimated time taken to complete the traverse.
Δ: The maximum allowable deviation of the destination sector from the optimal Rover heading angle.
α: The straightness coefficient, which controls the straightness of the final path. A smaller α results in a straighter path.
θ: The deviation of the current sector from the optimal angle. A sector with a lower |θ| is desired.
targetReachability: The minimum required reachability of destinations in a sector for it to be chosen as a viable sector. The reachability of a sector corresponds to the percentage of points on the periphery that are reachable (reachable_perimeter/total_perimeter).


Fig. 14 Illustration of the variables and parameters used in selecting the destination point set

pendulum in both directions until it has reached the bounds defined by ±Δ. Furthermore, to mitigate the risk of missing solutions due to a value of α that is too small, this pendulum-like oscillation is repeated for a range of increasing values of α. At the end of the algorithm, the viableSectorSet is a set of sectors with candidate paths. The oscillation process is illustrated in Fig. 15, in which all sectors with satisfactory reachability are shaded green. In the context of Fig. 15, the viableSectorSet would consist of all the green sectors and their respective reachable destinations. This set will be used to find the best path (described in Sect. 5.2).
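The pendulum-like sweep of θ can be sketched as a small generator. The sign convention below (clockwise as negative θ) is an assumption; the paper does not pin it down:

```python
# Sketch of the sector sweep: theta starts at 0 and oscillates clockwise
# then counterclockwise (assumed negative then positive), in increments
# of alpha, bounded by +/- delta.
def sweep_angles(delta: float, alpha: float):
    """Yield theta = 0, -a, +a, -2a, +2a, ... with |theta| <= delta."""
    yield 0.0
    k = 1
    while k * alpha <= delta:
        yield -k * alpha   # clockwise first (assumed sign)
        yield +k * alpha   # then counterclockwise
        k += 1

angles = list(sweep_angles(delta=75.0, alpha=25.0))
```

In the full algorithm this sweep would be repeated for a range of increasing α values, with each sector checked for targetReachability before being added to the viableSectorSet.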

5.2 Selecting the Best Path

The next step in GPP is to find the best path; details on evaluating the candidate paths against the requirements are given in Sect. 6. Figure 16 illustrates the pipeline used to find the bestPath. The destination-finding algorithm returns all viable destination points and their sectors. The number of potential destination points per sector is often high, and finding a path to every destination point in every sector is not viable; instead, the algorithm uses only a subset of a sector's reachable destinations to find that sector's shortest path. For each sector with more than five reachable points, a subset of five points, uniformly distributed across the arc of potential destinations, is chosen; this set of five or fewer points is termed chosenPoints. Figure 17 illustrates how the chosenPoints are selected: all marked points are reachable destinations within a sector, and the red points are the chosenPoints. For each sector, the path from the start to each point in that sector's chosenPoints is found (detailed in Sect. 5.3). Each path is smoothened (detailed in Sect. 5.4), and the


Fig. 15 Progression (left to right, top to bottom) of algorithm sweeping sectors to find destination points. All satisfactory reachability sectors that are added to viableSectorSet are filled in colour

Fig. 16 Flowchart for selection of bestPath


Fig. 17 Illustration of how chosenPoints are selected for a sector

shortest smoothened path from that sector is chosen—this path will be that sector’s shortestPath. Each sector now has a shortestPath. For each sector’s shortestPath, drive steps are created, a schedule is generated, and the strength of each shortestPath is evaluated. The shortestPath which best matches the requirements is chosen (detailed in Sect. 6).
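Picking up to five destinations spread uniformly across a sector's arc of reachable points can be done with a simple index computation. This is a minimal illustration, not the paper's implementation; it assumes the reachable points are ordered along the perimeter:

```python
# Choose up to k points evenly spaced across an ordered arc of reachable
# destinations (the "chosenPoints" subset described in the text).
def choose_points(reachable, k=5):
    n = len(reachable)
    if n <= k:
        return list(reachable)
    # evenly spaced indices from the first to the last reachable point
    return [reachable[round(i * (n - 1) / (k - 1))] for i in range(k)]
```

A path is then searched only to these representatives, keeping the per-sector search cost bounded regardless of how many perimeter points are reachable.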

5.3 Searching for a Single Path

An essential part of the GPP process is: given a sector with a start and a destination, search for the shortest path between the two points without exiting the sector. To do this, the binary image of the hazard-map sector is converted into an undirected graph. Each node represents a pixel in the image, each edge represents the cost of travel, and each node is connected to at most eight other nodes, one for each neighboring pixel. The cost function for an edge connecting nodes P_j and P_k is given in Eq. (5):

c(P_j, P_k) = 10 if P_j and P_k are both white pixels and directly adjacent
c(P_j, P_k) = 14 if P_j and P_k are both white pixels and directly diagonal
c(P_j, P_k) = ∞ if P_j or P_k is a black pixel    (5)

The shortest path is found using an informed search algorithm, D* Lite [5], which allows easy re-planning mid-journey if necessary. The heuristic (used by D* Lite) for a node P_i is the Cartesian distance from P_i to the destination. Paths found by this part of the process are referred to as the rawShortestPath. Figure 18 illustrates the rawShortestPaths found in each sector.
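The grid-to-graph search can be sketched with the cost function of Eq. (5) and the Cartesian-distance heuristic. Note that plain A* is substituted here for brevity; the paper's actual choice is D* Lite, whose incremental re-planning is what makes it attractive mid-journey:

```python
import heapq
import math

# Shortest path over a HazMap grid using the Eq. (5) edge costs
# (10 adjacent, 14 diagonal, impassable through black pixels) and the
# Cartesian-distance heuristic. A* stands in for D* Lite here.
def astar(hazmap, start, goal):
    """hazmap: 2D list, 0 = safe (white), 1 = hazard (black).
    Returns the path cost, or None if the goal is unreachable."""
    rows, cols = len(hazmap), len(hazmap[0])
    heuristic = lambda p: 10.0 * math.dist(p, goal)
    best = {start: 0}                      # best-known g-scores
    frontier = [(heuristic(start), start)]
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            return best[cur]
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                ny, nx = cur[0] + dy, cur[1] + dx
                if not (0 <= ny < rows and 0 <= nx < cols) or hazmap[ny][nx]:
                    continue               # black pixels: infinite cost
                g = best[cur] + (14 if dy and dx else 10)
                if g < best.get((ny, nx), float("inf")):
                    best[(ny, nx)] = g
                    heapq.heappush(frontier, (g + heuristic((ny, nx)), (ny, nx)))
    return None
```

The 10/14 integer costs are the usual approximation of 1 and √2 on an 8-connected grid; restricting the neighbor set to pixels inside the sector would add the "without exiting the sector" constraint.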


Fig. 18 Illustration of the rawShortestPath from each sector. The figure contains pairs of a sector without the path (above and filled in colour) and the same sector with the shortest path it yields (directly below)

5.4 Path Smoothening

A problem with the rawShortestPath is that it is not the actual shortest path. As can be seen in Fig. 19, the two waypoints can be connected by a straight line, which is shorter than following the rawShortestPath (blue path); however, the search algorithm can only find paths that traverse adjacent pixels, which yields the rawShortestPath (an artifact of pixelation). It is therefore necessary to process the rawShortestPath to obtain the actual shortest path, which consists of a set of key points that can be traversed along straight lines, as illustrated in Fig. 20. This makes the path more representative of reality, while also reducing the overall distance of travel, the number of turns, and the magnitude of the turn angles. The combination of key points and the straight lines between them creates a GPath.
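One common way to realize this smoothening, offered here as a plausible sketch rather than the paper's exact method, is greedy line-of-sight pruning: from each key point, jump to the farthest waypoint that can be reached by a straight line crossing only safe pixels:

```python
# Greedy line-of-sight smoothening of a rawShortestPath. The clearance
# check samples points along the segment; a Bresenham pixel walk would be
# the more rigorous choice.
def line_clear(hazmap, a, b, samples=50):
    """True if the straight segment a->b crosses only safe (0) pixels."""
    for i in range(samples + 1):
        t = i / samples
        y = round(a[0] + t * (b[0] - a[0]))
        x = round(a[1] + t * (b[1] - a[1]))
        if hazmap[y][x]:
            return False
    return True

def smoothen(hazmap, raw_path):
    """Collapse a pixel-adjacent rawShortestPath into key points."""
    key_points, i = [raw_path[0]], 0
    while i < len(raw_path) - 1:
        j = len(raw_path) - 1
        while j > i + 1 and not line_clear(hazmap, raw_path[i], raw_path[j]):
            j -= 1                 # back off until line of sight holds
        key_points.append(raw_path[j])
        i = j
    return key_points
```

On a hazard-free diagonal, the whole staircase of pixel steps collapses into a single straight segment between two key points.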


Fig. 19 rawShortestPath with two key points (in yellow) marked

Fig. 20 Key points (red) and their connecting lines (blue) make up the GPath


5.5 Drive Step Generation

The GPath is used to generate drive steps in which each step is below a maximum distance and each turn is below a maximum turn angle. These drive steps are input to a scheduler to produce a detailed schedule for the global path plan. This schedule can then be used by Rover operators as a guideline to drive the Rover and achieve the mission's goal while staying within the Rover's operational constraints.
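Splitting a GPath into bounded drive steps is mechanical. The limits and the initial heading below are assumptions for illustration, not mission values:

```python
import math

# Minimal drive-step generator: each GPath segment becomes a spot-turn
# (when the heading changes) followed by straight drives no longer than
# max_step_m. Initial heading is assumed to be 0 degrees.
def drive_steps(key_points, max_step_m=5.0):
    steps, heading = [], 0.0
    for (x0, y0), (x1, y1) in zip(key_points, key_points[1:]):
        seg_heading = math.degrees(math.atan2(y1 - y0, x1 - x0))
        turn = (seg_heading - heading + 180.0) % 360.0 - 180.0
        if abs(turn) > 1e-9:
            steps.append(("turn", round(turn, 3)))
            heading = seg_heading
        dist = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(dist / max_step_m))
        steps.extend([("drive", round(dist / n, 3))] * n)
    return steps
```

A cap on the turn magnitude per step (splitting large turns into several spot-turns) would follow the same pattern.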

5.6 Results and Discussion

Our initial quickfire approach [10] simply chose the first viable sector; our current method selects a solution from a set of many possibilities while matching all the criteria. The benefit of the current method returning more solutions is best illustrated visually. The algorithm was run for α ranging from 5 to 35 (both inclusive), with ϕ = 270°, Δ = 75°, and targetReachability = 20%. The images in Fig. 21 represent the best path the algorithm yields at every value of α for a sector that focuses on one area of the map. Our initial approach would have returned the α = 5 solution. However, the α = 5 solution has a path that deviates from the optimal direction for the whole starting stretch of the mission. Moreover, the Rover makes a sharp turn away from the sun near the top clustering of key points, which is highly detrimental early in the mission. Increasing α to 10 alleviates the latter issue, but the path still travels in a suboptimal direction most of the time. Further increasing α shows that, within approximately the same

Fig. 21 Illustration of rawShortestPaths (blue) and key points (red) from sectors with similar region coverage. GPath distances are in the yellow boxes

region, there is a path that is shorter in distance and yields motion in the optimal direction for more of the mission. GPP is done as part of planning for the mission, before the mission begins. Once the Lander reaches the Moon, these algorithms will be applied again, with the parameters to the algorithm, and hence the result, depending on the actual landing position.

6 Operation Scheduling

The Rover is operated at the level of a drive step. A single drive step may constitute a straight drive of a desired distance or a spot-turn to a desired heading. The prerequisite for each drive step is that a stereo pair of images is captured and downlinked to the operation team, to serve both as a position and attitude reference (for VO) and to help choose the direction and distance of the upcoming drive step. The operation team takes a finite amount of time to plan (up to 5 min), after which commands are uploaded. Once the commanded drive is executed (with duration decided by the drive command parameters), the Rover captures images from its cameras and transmits them to Earth. This triggers the start of the next drive cycle. A simplified power, thermal, kinematics and communication model of the Rover was developed, along with environmental models representing the thermal effects on the Rover and the motion of the Moon relative to the sun over time. Each drive step from the GPath is converted into a set of command parameters, which modify the state of the Rover and progress the environment models. A discrete event simulation (DES) can be set up to put the Rover model through the path under the environmental constraints, in order to obtain a quick evaluation of a given candidate path as well as to formulate a drive strategy. The DES, represented as a state machine, is shown in Fig. 22. This DES and its associated functionality, called the ‘Rover Planning and Operations Evaluation Tool’ (R-POET), provide the operation team with a prediction of the Rover’s health over the planned global path, the estimated time to complete the planned traverse and indicative distance targets to be met in each drive shift in order to complete the primary mission (500 m from the Lander) before thermally unsafe conditions take hold.
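The timing side of this drive cycle can be sketched as a trivial sequential model. All constants below are assumptions invented for the sketch (only the 5-min planning budget comes from the text); the real R-POET also models power, thermal state and communications.

```python
# Minimal timing sketch of the plan -> uplink -> drive -> image-downlink
# cycle described above. Durations and speed are illustrative assumptions.

PLAN_S, UPLINK_S, IMAGE_DOWNLINK_S = 300, 60, 120  # assumed overheads (s)
DRIVE_SPEED_MPS = 0.02                             # assumed rover speed

def traverse_time_s(step_distances_m):
    """Total time for a sequence of straight drive steps; each cycle must
    finish (images downlinked) before the next planning cycle starts."""
    t = 0.0
    for d in step_distances_m:
        t += PLAN_S + UPLINK_S + d / DRIVE_SPEED_MPS + IMAGE_DOWNLINK_S
    return t
```

Even this toy model shows why per-step overheads dominate short steps, which is one reason a scheduler bounds the number of drive steps as well as their length.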
Table 5 shows one of the outputs of the drive scheduler after parsing the path, with indicative shift-wise distance targets to be achieved by Rover operators in each successive 8-h shift. ‘Start’ for the simulation is defined as the time when the Rover begins its traverse after deployment and commissioning. ‘Shift start’ refers to the beginning of an 8-h Rover traverse shift from the perspective of the Rover operation team. A pre-mission use case of R-POET is to evaluate paths for their feasibility and efficiency. A sample Rover state profile is shown in Fig. 23, illustrating (i) distance traversed, as a percentage of the total, (ii) instantaneous solar array (SA) power generated as a percentage of the maximum power it can generate, (iii) battery state

Fig. 22 Discrete event simulation of R-POET represented as a state machine

Table 5 Sample drive schedule summary for the path plan to obtain a displacement of 500 m

Fig. 23 Estimated profile of the Rover states while following the global path planning

of charge (SoC) and (iv) heading angle of the Rover while negotiating the candidate path plan. The red arrow in Fig. 23 indicates a part of the drive plan where there is no progress, which can be attributed to an unfavorable heading (the solar panel is not oriented toward the current solar azimuth); hence, the Rover’s battery is replenished at a less than optimal rate. There are two ways to mitigate this: (i) command a spot-turn to align the Rover’s solar panel with the current solar azimuth, or (ii) discharge the battery further and get to the end of the drive step (as a favorable heading change is required immediately after that drive step). The operation team will have to pick one of these options in real time while respecting other operational factors, including the serviceability of the onboard hardware, the terrain and learning from prior drive cycles. The analysis was performed for the 21 candidate paths presented in Fig. 18 to evaluate their feasibility from an operational perspective. The parameters that were studied, and their relationship to operability and acceptable requirements, are detailed in Table 6. Due to the changing solar elevation, each path performs differently depending on the start time. The parameters were hence plotted for the 21 paths for different start times: (a) 24 h after dawn, (b) 48 h after dawn and (c) 60 h after dawn, to see if some paths are inefficient purely from a start-time perspective. The analysis (Fig. 24) brings out the two families of paths clearly: (a) Southerly and (b) Westerly. Paths 1, 3, 4, 6, 7, 10, 12, 13, 16, 17, 19 and 20 require the Rover to follow a southward direction to complete the traverse, and are called Southerly paths. These paths perform better with an earlier start. Path 1 is the only outlier: it does not progress beyond 10% of the traverse distance (a failure) for a post-24-h start, due to an unfavorable heading near the beginning.
This can be attributed to the small value of α, thus not routing the path over a straighter and more favorable stretch. All the other Southerly paths improve across all parameters as we delay the start time.

Table 6 R-POET parameters used to evaluate candidate paths

Completion time: The local time (in Earth days) after sunrise at which the Rover is expected to complete the traverse. Requirement: completion before Day 5 is preferred for thermal safety.

Drive duration: Time taken by the Rover to complete the traverse (in hours); the count starts after deployment and commissioning are complete. Requirement: lower durations are preferred, as they afford the operation team margin against an idealized execution of the drive plan.

Minimum battery state of charge (SoC): The lowest charge level of the Rover’s battery (in percentage) while attempting the traverse. Requirement: to be maintained above 40% at all times.

Total wait time: The cumulative time (in hours) spent by the Rover waiting to charge its battery, and hence not proceeding along the drive plan. Requirement: lower durations are preferred, as this denotes the efficiency of the path from the power generation perspective.

Distance covered: Expressed as a percentage of the total, this parameter is plotted to identify cases where the battery SoC goes below safe limits (or depletes to 0%) before completion of the drive. Requirement: 100%.

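The screening of candidate paths against these requirements can be sketched as a simple filter-and-rank pass. This is an illustration, not the R-POET code; the metric field names are invented stand-ins for the parameters listed in Table 6.

```python
# Hedged sketch: screen candidate paths against the Table 6 requirements,
# then prefer lower total wait time (and, to break ties, shorter duration).

def path_feasible(m):
    """m: dict with completion_day, min_soc_pct, distance_pct (names assumed)."""
    return (m["completion_day"] < 5        # thermal-safety preference
            and m["min_soc_pct"] >= 40     # battery SoC floor
            and m["distance_pct"] == 100)  # traverse actually completed

def rank_paths(candidates):
    ok = [c for c in candidates if path_feasible(c)]
    return sorted(ok, key=lambda c: (c["total_wait_h"], c["drive_duration_h"]))
```

Running such a filter per start time is what separates the Southerly and Westerly families in the analysis that follows.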
Fig. 24 Analysis of path metrics for candidate paths against identified R-POET parameters

Fig. 25 a Terrain effects on Lander/Rover LoS and b designation of a white (positive LoS) and black (negative LoS) pixel

Paths 2, 5, 8, 9, 11, 14, 15, 18 and 21 follow a Westerly direction and are called Westerly paths. All of them lead to early failures for start times of less than 42 h. For starts between 42 and 60 h, their cumulative wait time is greater than that of the Southerly paths, making the Southerly paths preferable for faster completion and for maintaining operational margin. As can be seen in Fig. 24, only with a start time of 66 h after dawn does it become feasible to attempt the Westerly paths, but by then the completion times are already close to the start of Day 5, making them risky, with hardly any margin for the operation team to accommodate anomalies and deviations from a nominal activity plan. In summary, R-POET provides a method to evaluate drive strategies and validate the compatibility of the Rover’s configuration with the variety of terrain-based challenges that can affect the operation team’s ability to accomplish its mission objectives, well before major development and test effort is committed. During the mission, the same tool provides an operational plan that serves as a strategic guide for Rover operators to minimize risks and achieve mission objectives using a strategy that has been studied, evaluated and well understood on the ground by the teams involved.

7 Engineering Recommendations for the Lander/Rover Communication System Design

In any planetary exploration mission, the most critical aspect is the Rover’s capability to negotiate the terrain. For the Rover, an important design consideration was the antenna height. The Rover’s antenna needs to be tall enough to guarantee a clear LoS between the Lander antenna and the Rover antenna throughout the traverse, which in turn guarantees a communication link with the Rover at all times. At the same time, a tight mass budget and a push toward simpler stow/deployment mechanisms constrained the antenna height. Thus, the design requirement is translated to

Fig. 26 Rover LoS maps in the vicinity of the touchdown point for different Lander antenna heights: a 1.4 m, b 2.5 m, c 3.5 m, d 4.5 m and e 5.5 m

the following problem statement: “Find the lowest antenna height which guarantees clear LoS for at least 80% of points within the circular terrain patch of 500-m radius centered about the preselected landing site on lunar terrain.” For the Z-01 mission, we have used IMBRIUM2 as the PLS-DTM, and the PLS coordinates are computed by the algorithm explained in Sect. 2. For the LoS studies, and to generate the antenna height design recommendation, a subset of the PLS-DTM the size of the bounding box of DEMask and centered around the PLS coordinates is extracted, called LoSDTM. As the mission objective is to complete a Rover traverse with a 500-m displacement from the Lander, for each potential touchdown point in LoSDTM (which then becomes the starting location of the Rover), hereafter referred to as CnLS, we extract a 1000 m × 1000 m DTM centered around the pixel (hereafter called CnLSDTM). On the CnLSDTM, the Rover always starts the mission at the center. For any point on the CnLSDTM given as input, and additional parameters such as the Lander and Rover antenna heights, the LoS algorithm computes whether the spatial line segment between the Lander antenna (at the center pixel) and the Rover antenna (at the input point) has any geometrical terrain interference, as shown in Fig. 26. If the LoS is clear, it outputs one; otherwise, zero. This procedure, when repeated on all pixels in the CnLSDTM (as any pixel can potentially be a Rover position in some path plan), produces a binary image called the LoS-HazMap for the CnLS. For the LoS health metric of a given Rover start location (which coincides with the Lander touchdown point), termed CnLS, in the LoS-HazMap, the Rover operational constraints on power from Sect. 3 need to be accounted for. Let the weighted average solar azimuth (with weights being the relative power generation rates) during the expected mission duration be called PPAz.
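The core LoS test described above can be sketched as sampling the antenna-to-antenna segment against the terrain. This is an illustrative sketch, not the mission code: it assumes `dtm[r][c]` gives elevation in meters on a uniform pixel grid, and the function name is invented.

```python
# Illustrative line-of-sight test between the Lander and Rover antennas
# over a DTM; returns 1 (white pixel, clear) or 0 (black pixel, blocked).

def los_clear(dtm, lander_rc, rover_rc, h_lander, h_rover, samples=100):
    (r0, c0), (r1, c1) = lander_rc, rover_rc
    z0 = dtm[r0][c0] + h_lander   # antenna-top elevations at both ends
    z1 = dtm[r1][c1] + h_rover
    for k in range(1, samples):
        t = k / samples
        r, c = round(r0 + (r1 - r0) * t), round(c0 + (c1 - c0) * t)
        # terrain must stay below the straight antenna-to-antenna segment
        if dtm[r][c] >= z0 + (z1 - z0) * t:
            return 0
    return 1
```

Applying this to every pixel of a CnLSDTM yields the binary LoS-HazMap; raising `h_lander`/`h_rover` until enough pixels come back clear mirrors the antenna-height trade described above.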
The preferred circular traversal sector, TravSect, centered at the CnLS, is fixed with a radius of 500 m, a central azimuth of PPAz and a sector angle equal to the maximal change in solar azimuth during the surface mission. Subsequently, a connected component analysis (CCA) is executed to check for eight-point connectivity, through white pixels, from the Rover’s initial location (CnLS) to potential final points on the periphery of TravSect. This information is recorded in a metric called the reachability index (RI), which is the percentage of the TravSect circular perimeter reachable from the CnLS. RI values derived by iterating through all the points on the LoSDTM are used to generate a LoS health map (Fig. 27) at the proposed landing site for a given antenna

Fig. 27 False-colored LoS health map with RI scale bar for LoSDTM

Fig. 28 Algorithm for line-of-sight analysis

height. On this image, a threshold is applied to grade points, with 5% or less reachability considered unacceptable (black) and anything above acceptable (white), producing a binary image called the LoS-HazMap. The percentage of white pixels on the LoSDTM is taken as the decision variable, and the Rover antenna height is used as a design variable, keeping all other parameters constant. The antenna height is increased until at least 80% of all points on the terrain are acceptable, and the corresponding antenna height is recommended to the design team. The LoS-HazMap is then combined with the HazMapLS by a bitwise AND operation to generate an integrated hazard map of the proposed landing location, referred to as the IntHazMap, which is passed on to the GPP planner for Rover path planning. A flowchart giving a brief overview of the process is shown in Fig. 28.
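The final combination step amounts to a per-pixel AND of two binary rasters. A one-line sketch, assuming both maps are same-sized 0/1 grids (the function name is illustrative):

```python
# A pixel of the integrated hazard map is usable only if it is acceptable
# in both the LoS-HazMap and the terrain HazMapLS (bitwise AND).

def int_haz_map(los_hazmap, hazmap_ls):
    return [[a & b for a, b in zip(row_los, row_ter)]
            for row_los, row_ter in zip(los_hazmap, hazmap_ls)]
```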

8 Conclusion and Future Work

The pre-launch processes and tools required to plan and execute the Rover’s mission have been developed and are being exercised in field tests with the operation team. The recommendations from the reachability analysis were incorporated in the Lander and Rover antenna designs. Terrain-based analyses have helped overcome the limitations of a low-cost Rover, demonstrating the ability to fulfill the mission objective of traversing 500 m well before the onset of local noon. The studies presented here are extensible and applicable to a wide variety of missions. Exploration of the lunar poles will involve communications with an Earth that sits very close to the horizon. Craters and hilly relief at the poles will pose LoS constraints for Rovers as well as Landers. Reachability analyses may be crucial to performing missions in these challenging locations on the Moon. Adding the LoS analysis to landing site selection can help reduce the risks to the Rover mission purely by landing in a region that has very high reachability. This operation would require heavy computation and has been earmarked to be performed later. The IMBRIUM and IMBRIUM2 DTMs were created in 2013. The LROC team has been able to improve the accuracy and quality of the DTMs since 2013 [3]. A request has been submitted to the LROC team to reprocess IMBRIUM and IMBRIUM2, thereby improving the absolute orientation, i.e., the map-tie error, and the elevation.

Acknowledgements The authors would like to thank the LROC team of Washington University in St. Louis and the LRO team at NASA GSFC for the amazing work done with LROC and the data products generated therefrom. Commercial space exploration companies have greatly benefited from these datasets, which have spurred investment in technologies that could get humans back to the Moon. Thanks also go to Mr. Natarajan, who has mentored the Guidance, Navigation & Controls group at TeamIndus, and to Mr. Vishesh Vatsal, Mr. Shyam Mohan and Ms. Deepana Gandhi, part of the GNC group, without whose work and help in proofreading this chapter may not have been possible.

References

1. Robinson, M. S., Brylow, S. M., Tschimmel, M., et al. (2010). Lunar reconnaissance orbiter camera (LROC) instrument overview. Space Science Reviews, 150, 81. https://doi.org/10.1007/s11214-010-9634-2.
2. Williams, J.-P., Paige, D. A., Greenhagen, B. T., & Sefton-Nash, E. (2017). The global surface temperatures of the Moon as measured by the diviner lunar radiometer experiment. Icarus, 283, 300–325. https://doi.org/10.1016/j.icarus.2016.08.012.
3. Henriksen, M. R., Manheim, M. R., Burns, K. N., Seymour, P., Speyerer, E. J., Deran, A., et al. (2017). Extracting accurate and precise topography from LROC narrow angle camera stereo observations. Icarus, 283, 122–137. https://doi.org/10.1016/j.icarus.2016.05.012.
4. Woronow, A. (1978). The expected frequency of doublet craters. Icarus, 34(2), 324–330. https://doi.org/10.1016/0019-1035(78)90170-7.

5. Koenig, S., & Likhachev, M. (2002). D* Lite. In Eighteenth national conference on artificial intelligence (pp. 476–483). Edmonton, Alberta, Canada: American Association for Artificial Intelligence. ISBN 0-262-51129-0.
6. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94.
7. Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), 346–359. https://doi.org/10.1016/j.cviu.2007.09.014.
8. Morel, J. M., & Yu, G. (2009). ASIFT: A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, 2(2), 438–469. https://doi.org/10.1137/080732730.
9. Wu, J., Cui, Z., Sheng, V. S., Zhao, P., Su, D., & Gong, S. (2013). A comparative study of SIFT and its variants. Measurement Science Review, 13(3), 122–131. https://doi.org/10.2478/msr-2013-0021.
10. Menon, M. S., Kothandhapani, A., Sundaram, N. S., Nagaraj, S., & Raghavan, V. (2018). Terrain-based analysis as a design and planning tool for operations of a lunar exploration rover for the TeamIndus lunar mission. In SpaceOps Conference (AIAA 2018-2494). https://doi.org/10.2514/6.2018-2494.

The Evolution of Interface Specification for Spacecraft Command and Control Eric Brenner, Ron Bolton, Chris Ostrum and A. Marquis Gacy

Abstract This paper describes an evolution from a traditional satellite commanding interface control document (ICD) to a service suite that provides real-time propagation and validation of interface changes. The TEL handling, elucidation, modification, and investigation service (THEMIS) is DigitalGlobe’s next-generation software suite providing runtime validation of satellite tasking built by mission planning systems and spacecraft engineers. It enables more efficient management of multiple baselines and changes through the lifecycle of a constellation mission. Through THEMIS, developers, spacecraft engineers, and systems engineers view, edit, and manage revisions of the spacecraft tasking interface. The interface specification is represented in JavaScript Object Notation (JSON) format, and configuration management is provided through a GitHub repository. Once generated, the ICDs are used in real-time operations. This approach has reduced interface interpretation errors by having a single service able to validate commanding generated by multiple sources against any interface baseline.

Nomenclature

API: Application programming interface
CI/CD: Continuous integration, continuous delivery
CP: Collection planning (constellation-wide image planning software)
DAF: Direct access facility
DG: DigitalGlobe
Effectivity: Time/version correlated ICD document
ICD: Interface control document
JSON: JavaScript object notation
LEOP: Launch and early operations
MCS: Mission control system
MPS: Mission planning system
OE: Operations engineer
REST: Representational state transfer
SCEng: Spacecraft engineering
SOA: Service-oriented architecture
TEL: Tasked event list
THEMIS: TEL handling, elucidation, modification, and investigation service; software service suite addressing TEL ICD management
UI: User interface

E. Brenner (B) · R. Bolton · C. Ostrum · A. Marquis Gacy
DigitalGlobe, 1300 W 120th Ave, Westminster, CO 80234, USA
e-mail: [email protected]
© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_8

1 Introduction

The software interface between satellites and the ground systems that command and monitor them is complex and mission critical. The design, development, integration, test, and runtime validation of this interface typically involve a large number of software development, systems, aerospace, and operations engineers. Often the engineering teams use different development methodologies and tools, and they work for different companies. This complexity has contributed to long schedules and high cost when integrating a satellite into the ground system. Generators of commanding on this interface include the following:

• DG mission planning system (MPS). The MPS generates the commanding of the spacecraft used in day-to-day image collection operations driven by DG business requirements.
• Direct access facility (DAF). The DAF generates commanding specifically for its image collection.
• DG spacecraft engineering (SCEng). SCEng generates ad hoc tasking, which can be used for things such as spacecraft maneuvering related to orbit changes and collision avoidance.
• DG mission control system (MCS). The MCS consolidates the commands from the aforementioned sources and converts them to spacecraft-specific commands. The MCS also generates commands that are used to monitor and maintain spacecraft health and safety.

It is important to note that each of these generators uses a subset of the commands, depending on its use case. This complicates both the specification and the validation of the interface. As with any systems interface to a complex hardware system, a satellite commanding interface undergoes a period of refinement from the initial specification through integration, and continuing until the mission ends. The interface itself is not as simple as a software API, REST endpoint, or website user interface. The interface controls electrical and mechanical operation of a device that, once deployed, cannot be seen or touched. The worst consequences of defects in this interface are much more costly than in most other systems, ranging into the hundreds of millions of dollars.

Though XTCE is an industry standard for satellite commanding [1, 2], it does not provide an abstraction of the command interface that is optimal for planning systems. DigitalGlobe operates satellite constellations with varying spacecraft vendors, creating a need for this abstraction. Abstracting the spacecraft-specific commanding into an event provides numerous benefits to the planning systems used in operations. With the abstraction, however, there must be an application residing in the MCS that is able to translate the events into the spacecraft commands. As with the TEL ICD, a developer in the MCS would read a document, generated by systems engineering and SCEng, detailing how each event must be translated into spacecraft-specific commands. The developer would then turn this document into code in an application called the command generator (CG), an error-prone process. With this application being separate from the planning system, a coupling between CG and the planning systems occurred. Even the trend and power of the “-as-a-service,” SOA, revolution is a complication when the interface is orders of magnitude more complex than that of a “microservice.” DigitalGlobe enables DAF customers to plan their image collection using a customized planning system to accomplish their mission, so the planning for the constellation is distributed across different instances and implementations. And, of course, efficient missions need to be able to task the satellites in creative and ever-changing ways without risking the space assets, which are long-lead-time, expensive to design, build and test, and expensive to deploy and maintain. Another inherent complexity of the spacecraft-to-ground-system interface is the result of the satellite deployment process itself. The complexity of the device and its operating environment requires that a satellite be brought into operations through several carefully monitored and controlled phases.
These phases handle the complexity of integration testing on the ground during carefully orchestrated events such as “Command Days,” “Launch Rehearsals,” and “Mission Rehearsals.” Similarly, once in orbit, the interface is used in a carefully prescribed manner to check out key systems before turning on other systems. For example, the interface used during the launch and early operations (LEOP) period is quite different from the commanding during initial operations. Initial operations are phased as the mission payload, e.g., the earth imaging sensors, is calibrated. Then, naturally, during the mission itself, the interface can be changed based on mission requirements or adjustments needed because of unexpected conditions of the satellite in orbit. These differences in the interface need to be managed as “effectivities,” periods in time in which a version of the interface is valid. Troubleshooting unexpected results during each phase requires the antithesis of “continuous integration and deploy” of a multitude of independent microservices. To put a fine point on the argument: in the case of design and implementation flaws in the interface that escape detection during integration, rebooting satellite systems or loading new firmware to the space vehicle is a high-risk and expensive consequence not soon to be improved by continuous integration and deployment. In summary, the following are the challenges of development, testing, and real-time validation of a satellite-to-ground interface:

• The interface, by the nature of the hardware and operating environment, is exceptionally complex.
• A large number of engineering groups, including vendors, with different areas of expertise and work styles must develop to a common, precise understanding of the interface.
• There is no industry-standard specification framework to normalize satellite commanding at a level of abstraction suitable for a planning system and satellite operations engineers.
• Satellites must be brought into service in a carefully orchestrated and methodical manner, adhering to the appropriate constraints of the commanding interface.
• The consequences of incorrect interpretation of the satellite interface can be extreme.
• The planning system and command generation are coupled.

2 A New Approach

DG has evolved the engineering practices of designing and operating the ground-system commanding of a satellite constellation in the following ways, listed below and described in detail in this section:

• Defined a “tasking” language for planning systems to present to the mission control system. This language normalizes many satellite operations and abstracts commanding to a macro level appropriate for image collection planning.
• Translated the tasking specifications into a machine-readable schema used in real-time operations to validate commanding parameters, types, and states.
• Extended the tasking specification to allow the translation of the abstracted events into spacecraft-specific commands.
• Created a web-based UI for viewing and editing the machine- (and human-) readable tasking specification.
• Integrated a configuration management repository for archival of, and access to, multiple versions of the interface.

2.1 Abstraction of Spacecraft-Specific Commanding

DG first abstracted satellite commanding into a text syntax called “Tasking,” which is generated by the mission planning systems, operators, or SCEng to command any satellite in the constellation (e.g., “Turn on narrowband transmitter”). In general, these high-level events are quite similar across unique spacecraft, even from different vendors. These tasking events are conceptually macros that are translated into vehicle-specific commands by the mission control system. A Tasked Event List (TEL) is a time-ordered sequence of these events. TELs are specified in terms of actions,

Fig. 1 Previous procedure for TEL ICD update

states and parameters. The TEL specification is defined by aerospace and systems engineers based on specifications provided by the satellite vendor. The TEL specification is the common commanding language that the planning systems and the spacecraft engineering operations teams use as an interface into the mission control system, where tasking is then translated and uplinked to the satellites. However, even with this abstraction, the TEL interface control document (TEL ICD) for the DG constellation is a 150+ page MS Word document (Fig. 1). The change record for this document alone is 13 pages. The document management tool shows 23 major version updates, and actually struggles to display the version history. Prior to THEMIS, this specification drove the engineering work of many software development teams, including teams at vendors and customers. It also drove the development of procedures used by spacecraft engineers and vehicle operators. All of these disparate teams develop to this interface in parallel, as it is being refined. Working against a large specification such as the TEL ICD is anathema to an agile development process. The problem is exacerbated when changes are constantly being flowed into the ICD during the development and test lifecycle of these teams. Further evolution of the engineering process was needed.
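To make the event abstraction concrete, the sketch below builds an entirely hypothetical TEL: the header fields, event names (`NB_XMTR`, `IMAGE_COLLECT`) and parameters are invented for illustration and are not DigitalGlobe's actual tasking vocabulary.

```python
import json

# A hypothetical TEL: a time-ordered sequence of abstracted events, each
# specified by an action, a state and parameters (all names illustrative).
tel = {
    "header": {"vehicle": "SAT-01", "icdVersion": "12.3"},
    "events": [
        {"time": "2018-05-28T12:00:00Z", "action": "NB_XMTR",
         "state": "ON", "parameters": {}},
        {"time": "2018-05-28T12:05:00Z", "action": "IMAGE_COLLECT",
         "state": "START", "parameters": {"durationSec": 90}},
    ],
}
print(json.dumps(tel, indent=2))
```

The point of the abstraction is that nothing here is vehicle-specific; the mission control system later expands each event into the commands of whichever spacecraft is being tasked.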

2.2 THEMIS Architecture

THEMIS is a modest next step at DG to improve the efficiency of the design, development, integration testing, and operations of the spacecraft-to-ground-systems TEL interface. In keeping with the architecture change DG is making [3], the implementation consists of two REST services: a web user interface to view, edit, and version-control the interface, and a backend service to validate real-time commanding against a JSON representation of the TEL ICD (Fig. 2). With the THEMIS project, the TEL ICD was translated by systems engineers into a JSON schema. JSON TELs generated by the planning systems or spacecraft

Fig. 2 High-level architecture vision of the system surrounding THEMIS

engineers are validated against this schema. JSON was selected over XML because of its processing efficiency [4]; in real-time operations this is a key value add, allowing quick reactions from operations personnel. The services that comprise THEMIS functionality are depicted above and include:

• THEMIS-UI (front end) provides a user interface to add, modify, and publish ICD versions. This is most commonly used by the systems engineers in the MCS and the SCEng team.
• DG’s private GitHub provides the storage of the TEL JSON schema and the version and configuration management functionality.
• The THEMIS backend provides REST endpoints for validation of TELs against the JSON schema, as well as an endpoint for retrieval of the JSON representation of the TEL ICD. Consumers of the information (MPS, DAF planning system, SCEng, and MCS) can use these endpoints in real time in the operational system as well as in the testing environments.

The systems that use THEMIS in real-time operations are depicted in orange, and include:

• DG’s MPS uses THEMIS for validation of TELs that it generates.
• DG customer DAF planning system(s) use THEMIS for validation of TELs they generate.

• SCEng uses THEMIS for validation of TELs that it generates for ad hoc operations.
• The MCS uses THEMIS for validation of common TELs that it modifies prior to final command translation, and uses the translation definition to generate command lists.

The systems that support THEMIS are depicted above, and include:

• Common config server. The common config server is used by THEMIS to provide those values that are common across multiple domains in the enterprise, such as the baseline ICD version or values specific to DG DAF customers.

2.3 Real-Time Validation TEL Validation is an integral part of ensuring the commanding of the vehicle does not risk the safety and health of the satellite. As with all good designs, the validation of the interface should be consolidated in one system to reduce the effort of correcting defects. Consolidation also decreases the cost and schedule required to add a spacecraft to the ground system. DG also focused on improving when validation occurs. By providing a single source of validation for all applications early in the flow, the services and users that generate TELs are able to verify the contents of the TELs they produce before distribution throughout the system. While this sounds like a rudimentary step, it is one we accomplished only by consolidating validation logic to a single service such as THEMIS, as opposed to having many teams interpret an ICD. Having disparate teams interpret the same document resulted in the validation of TELs being piecemeal through the system (Fig. 3). For instance, one app may validate the header information of the TEL while another would validate the parameter values within the TELs. Validation earlier in the processing of a TEL also aids operations troubleshooting errors that are occurring. Rather than finding an issue in one component and tracking the error down through many components to determine the source, operations know which component generated the TEL and can quickly triage, fix, and deploy. The new process of validation of TELs against the JSON TEL schema consists of four main tasks: primary header validation, event and schema collection, schema and predicate utilization, and result consolidation. Each task is required to proceed to the following step and allows early exiting when an error is found. This helps reduce processing time and allows operators the opportunity to make corrections to either the TEL being processed or the JSON TEL ICD, should an error case arise. 
The validation focuses on data types, ranges, and logic conditions pertaining to enumerations and their permutations. It is important to note the limits of THEMIS: it does not perform flight-rule-level checks (e.g., turning transmitters on prior to uplink or downlink); those checks are performed in a separate downstream component.

Primary header validation allows the software to verify that the key metadata values of a TEL are consistent, while further refining the set of schema validations to apply in the follow-on steps. This validation follows a simple white-list approach to ensure that a presented key falls into one of a set of known values or enumerations.
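A minimal sketch of such a white-list header check is given below in Python; the header keys (`telType`, `icdVersion`) and their allowed values are invented for this example and are not the actual THEMIS metadata fields.

```python
# Illustrative white-list validation of a TEL primary header.
# All field names and allowed values are assumptions, not THEMIS internals.

ALLOWED_TEL_TYPES = {"IMAGING", "DOWNLINK", "MAINTENANCE"}
ALLOWED_ICD_VERSIONS = {"1.0", "1.1", "2.0"}

def validate_primary_header(tel: dict) -> list:
    """Return a list of errors; an empty list means the header passed and
    validation may proceed to event and schema collection."""
    errors = []
    if tel.get("telType") not in ALLOWED_TEL_TYPES:
        errors.append(f"telType {tel.get('telType')!r} not in white-list")
    if tel.get("icdVersion") not in ALLOWED_ICD_VERSIONS:
        errors.append(f"icdVersion {tel.get('icdVersion')!r} not in white-list")
    return errors

# A bad header fails fast, before any schema work is attempted.
errs = validate_primary_header({"telType": "IMAGING", "icdVersion": "9.9"})
```

Failing at this first task gives the early-exit behavior described above: no event collection or schema application is attempted for a TEL whose header is not recognized.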

176

E. Brenner et al.

Fig. 3 Previous procedure for real-time TEL validation

Once the primary header has been validated, THEMIS continues the validation process and begins collecting both the event and schema sets. Building these sets consists of aggregating each type of TEL event into a dictionary of arrays keyed by the event name. This dictionary allows the software to process all unique occurrences of each event in the next step. In addition to collecting the events, THEMIS builds a similar dictionary-based set for the corresponding TEL ICD schemas. This schema set has a 1-to-1 relationship with the previously collected event set, and each schema is applied to its events in the next step.

Once the event and schema collections have been created, THEMIS applies each schema validation to the array of events collected previously. In effect, THEMIS iterates through the list of events of each type and ensures that each TEL event passes the appropriate schema validation and JSON predicate application specified by the ICD. These validations ensure that data types are correct, values fall into the appropriate ranges, and the logical conditions specified in the ICD domain-specific language are met.

The final step in TEL validation is to perform any result or error consolidation. This step allows the human operator to more easily view errors and take action should an error scenario arise (Fig. 4). Building an error message that provides key metadata about the TEL being processed, the error encountered, and the TEL event time allows the operator to more quickly acknowledge and act on the scenario. Once a TEL is successfully validated, it continues flowing through the command and control system, eventually being translated into vehicle commands.
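The collection and schema-application tasks can be sketched as follows; the event structure and the simple range table below stand in for the full JSON Schema and predicate evaluation specified by the ICD, and all names are illustrative.

```python
# Task 2: aggregate TEL events into a dictionary of arrays keyed by event name.
# Task 3: apply the matching ICD schema to every occurrence of each event.
# The (lo, hi) range table is a stand-in for real JSON Schema validation.

def collect_events(tel):
    by_name = {}
    for event in tel["events"]:
        by_name.setdefault(event["name"], []).append(event)
    return by_name

def apply_schemas(by_name, schemas):
    errors = []
    for name, events in by_name.items():
        schema = schemas.get(name)
        if schema is None:
            errors.append(f"no ICD schema for event {name!r}")
            continue
        for event in events:
            for param, (lo, hi) in schema.items():
                value = event["params"].get(param)
                if value is None or not (lo <= value <= hi):
                    errors.append(f"{name}.{param}={value!r} outside [{lo}, {hi}]")
    return errors

tel = {"events": [{"name": "SLEW", "params": {"rate_dps": 1.5}},
                  {"name": "SLEW", "params": {"rate_dps": 9.0}}]}
schemas = {"SLEW": {"rate_dps": (0.0, 4.0)}}   # assumed ICD range for the event
errors = apply_schemas(collect_events(tel), schemas)
```

Here the second SLEW occurrence violates its assumed range, so one consolidated error is produced while the first occurrence passes.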

The Evolution of Interface Specification for Spacecraft …

177

Fig. 4 New procedure for real-time TEL validation

2.4 UI/ICD Generation

After proving the benefits of the JSON validation of a TEL, DG is taking the next step: providing a user interface for systems engineers to generate a TEL ICD. While the ICD can be written directly in JSON (and has been thus far), the UI prevents simple syntax and structural errors within the JSON. The THEMIS-UI provides four main functions related to TEL ICD generation and maintenance:

1. Manage ICDs living in the GitHub repository:
   a. Add an ICD version.
   b. Request feedback on an ICD version edit.
   c. Publish an ICD version.
   d. Delete an ICD version.
2. Addition, removal, and modification of events within the ICD.
3. Addition, removal, and modification of parameters within the events.
4. Addition, removal, and modification of command translation information.

The ICD management functions interact directly with the THEMIS GitHub repository. A user can begin with a blank TEL ICD, or copy a pre-existing ICD already stored in THEMIS. When adding a TEL ICD, the THEMIS-UI creates a branch in the THEMIS repository. As the TEL ICD is developed over many months, work can be saved on this branch; each save commits to that branch. When a TEL ICD is ready for review, the user can request feedback, which initiates a pull request for the branch the ICD is on. Reviewers can then use the GitHub UI to view the markdown changes, leave comments, and approve or deny the change. When the changes have been approved, the user publishes the ICD through the THEMIS-UI, which merges the pull request into the master branch of THEMIS. When a commit is made to the master branch, the application is deployed through the DigitalGlobe CI/CD pipeline into all environments, both test and production (Fig. 5). At this point, applications can perform final integration with the new version of the TEL ICD, as it can be referenced with a configuration value prior to officially baselining the version. When all applications have been updated and validated, the OE which supports THEMIS can update the baseline ICD version in the common config server, which designates that version of the ICD as the production-use version for all applications. Should a user begin creating an ICD and decide to discard their changes, the THEMIS-UI allows for the deletion of an ICD; when deleted, the corresponding branch is removed from the GitHub repository.

Fig. 5 New procedure using THEMIS for TEL ICD updates

2.5 Data-Driven Spacecraft-Specific Commanding

In order to decouple these systems, the JSON tasking specification was extended to provide a machine-readable definition for the translation. A new application was then developed in the MCS to ingest the machine-readable tasking specification from THEMIS, parse the command definitions, and convert the TEL into the commands that are sent to the spacecraft. By using THEMIS as the central store for this information, the event definition in the TEL ICD and the command translation can be updated and deployed atomically, and this new application is unaffected when TEL ICD updates are deployed. As discussed previously, various versions of the ICD can be used via configuration parameters, allowing the planning system to validate the new ICD and command translation without impacting operations.
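A data-driven translator of this kind might be sketched as below; the shape of the translation table, the event fields, and the command mnemonics are assumptions for illustration, not the actual DigitalGlobe definitions.

```python
# Hypothetical machine-readable translation definition, as it might be served
# alongside the event definitions: each TEL event expands into an ordered list
# of command templates whose arguments are drawn from the event parameters.

TRANSLATION = {
    "DOWNLINK": [
        {"mnemonic": "XMTR_ON",  "args": []},
        {"mnemonic": "PLAYBACK", "args": ["start_time", "duration_s"]},
        {"mnemonic": "XMTR_OFF", "args": []},
    ],
}

def translate(event):
    """Expand one TEL event into spacecraft commands per the translation table."""
    commands = []
    for template in TRANSLATION[event["name"]]:
        args = tuple(event["params"][a] for a in template["args"])
        commands.append((template["mnemonic"], args))
    return commands

cmds = translate({"name": "DOWNLINK",
                  "params": {"start_time": "2018-10-26T12:00:00Z",
                             "duration_s": 300}})
```

Because the table lives in the ICD rather than in code, adding a spacecraft with different command mnemonics becomes a data change instead of a software change, which is the decoupling the text describes.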

3 Future Enhancements

As the THEMIS-UI is still being developed, DG will continue refining the architecture to ensure the needs of all stakeholders are met. GitHub provides an excellent interface for viewing in-line changes to the JSON ICD; however, the JSON TEL ICD can run to many thousands of lines. One improvement currently planned for the UI is an easier view into the changes made between versions of the TEL ICD, as opposed to reviewing thousands of lines of JSON.

While the machine-readable version of the ICD should be used for software, distributing the ICD to external teams still requires legacy documentation. DigitalGlobe intends to build a microservice that reads the TEL ICD and generates the required human-readable documentation. This document can contain other information, such as diagrams, that is not found within a JSON document. Using the JSON TEL ICD to generate the documentation provides a single "source of truth" for the machine definition of the interface and moves DG closer to a "code-as-documentation" mindset.

As DG continues to develop future systems to fly the constellation, another improvement would be to have the applications themselves read the TEL ICD and use the information found there to determine how to plan the events in the TEL. This is the final integration point needed for THEMIS to truly be the one source, and it would allow deprecation of the human-readable document version of the TEL ICD that is interpreted by external vendors.
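The planned documentation microservice could work along these lines; the ICD layout and the rendered output format below are assumptions for illustration.

```python
# Render a (hypothetical) machine-readable TEL ICD as human-readable text,
# keeping the JSON document as the single source of truth.

def render_icd(icd):
    lines = [f"TEL ICD version {icd['version']}", ""]
    for name in sorted(icd["events"]):
        lines.append(f"Event: {name}")
        for param, spec in icd["events"][name]["parameters"].items():
            lines.append(f"  {param} ({spec['type']}): "
                         f"range {spec['min']} to {spec['max']}")
        lines.append("")
    return "\n".join(lines)

doc = render_icd({"version": "2.0", "events": {
    "SLEW": {"parameters": {"rate_dps": {"type": "number",
                                         "min": 0, "max": 4}}}}})
```

A real service would also merge in hand-authored content (diagrams, narrative) that has no natural home in the JSON document, as noted above.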

4 Conclusion

As an enterprise continuously looking to improve the value delivered to customers, DG has already seen benefits from migrating to this new approach of single-source validation applications for critical components in the system. Inconsistencies have been found, and DG has been able to realign to ensure all applications are in sync. Validation occurs earlier in the process, allowing DG to ensure that the data flowing through the system is as accurate as possible and leading to efficiencies in operations.

As always, lessons have been learned along the way. In particular, the TEL ICD originated prior to JSON schema validation, and the document has historically contained additional information that had to be interpreted in order to build against the TEL ICD. Migrating this information into a structure such as JSON schema has proved difficult, but possible. As DG continues growing its constellation, this problem can be mitigated by defining the JSON TEL ICD in the appropriate structure.


By extending THEMIS to provide the definition of the events as well as how they are translated to commands, DG has reduced the cost to add a satellite to the constellation: months of development time have been removed in favor of defining the command translation in a machine-readable format. Overall, the benefits gained by DG have been well worth the pitfalls encountered in converting a legacy document into a new schema validation.

Acknowledgements All of this work was part of the direct research and development carried out at DigitalGlobe Inc., now part of MAXAR Technologies. We would like to thank all of our MAXAR Technologies co-workers for providing lessons learned and operational feedback involving the architecture, implementation, and operations of the multi-mission ground system.


Exploring the Benefits of a Model-Based Approach for Tests and Operational Procedures

R. de Ferluc, F. Bergomi and G. Garcia

Abstract Traditional practices for specifying tests and procedures for satellite manufacturing activities rely on handwritten specifications in Word or PDF documents, provided as inputs for validation and verification (V&V) activities such as on-board software validation, functional chain validation, and assembly, integration, and test validation. Several optimizations of the V&V process have already been identified, and new tools and practices are progressively being deployed in operational projects, but with limited impact on planning. In complement to these works, and based on a large background in model-based system engineering (MBSE), Thales Alenia Space has started to define a new approach to introduce more formalism into test and procedure activities, in particular using the open-source Capella modeling tool and its associated methodology, Arcadia. This work, performed through R&D activities, has highlighted the benefits of MBSE and led to the definition of an end-to-end MBSE approach for handling test and procedure specifications, which appears promising for reducing costs and schedule for new spacecraft design and test activities.

1 Introduction

The satellite industry, like other industrial domains, is looking to reduce the time to market of new products. At the same time, system complexity is rising, and satellites integrate more parts and more components to provide more functionality. These two trends are accompanied by a third, which consists of reducing both recurrent and non-recurrent costs related to the design, validation and verification (V&V), and manufacturing phases of satellites. In this context, Thales Alenia Space, in the scope of R&D studies (both internal and institutional), has started to analyze the situation from a global point of view and to devise a new approach that could bring a global optimization of the process. This chapter first describes the challenges identified at the outset, then presents the first and second optimization steps that were planned, and finally introduces the MBSE approach

R. de Ferluc (B) · F. Bergomi · G. Garcia Thales Alenia Space, 06156 Cannes, France e-mail: [email protected] © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_9


182

R. de Ferluc et al.

with some details about the expected benefits of such an approach. In addition, the discussion is illustrated by a small case study.

2 V&V Challenges in the Traditional V Cycle

The target objective can be summarized as: "we want to develop more complex systems with reduced costs and shorter schedules." To reach this objective, the first task is to analyze the traditional process and identify areas for optimization. This exercise is performed from a helicopter view, focusing on the theoretical process.

The global process is structured according to the V-cycle paradigm: the left-hand branch of the V represents the design and development activities, and the right-hand branch represents the validation and verification activities. This global process is depicted in Fig. 1. Ideally, the process starts with the involvement of operational teams, which have the responsibility to define the operational concepts. In practice, this step is generally performed directly by system teams based on the customer's needs. Once operational concepts are defined, mission planning and operational procedures can be specified at a high level of abstraction. System engineers have the responsibility to establish the system high-level design specification so as to satisfy the customer's requirements. At the same time, those engineers have to prepare assembly and integration activities and therefore define and specify test procedures for these activities.

The system is decomposed into several sub-systems, each one under the responsibility of a dedicated team or external provider. Sub-systems can be further decomposed. In this article, we focus on functional aspects, and we therefore consider primarily the payload and avionics perimeters. Here again, design activities have to

Fig. 1 Global process overview

Exploring the Benefits of a Model-Based …

183

be performed together with test preparation activities, leading to the definition of functional chain test sequence descriptions. At the end of the design branch, software and hardware components are specified so as to fulfill sub-system requirements. Testing of each SW or HW component is also prepared through test plan elaboration. Finally, components are developed and unit-tested to check the correctness of the development activities.

This is the end of the descending branch and the beginning of the ascending branch: validation and verification activities are performed from the lowest level (SW and HW components) up to the highest levels (sub-systems, system, and spacecraft in its operational environment). All those activities are based on inputs elaborated during the design and development phase and consist of translating procedure and test specifications into concrete procedure and test artifacts that can be executed, thanks to appropriate tooling and setup, in the various V&V steps. One important aspect highlighted in Fig. 1 is the use of document-based formalisms (Word, PDF, Visio, and Excel documents) to specify procedures and tests early in the process.

The company organization is generally aligned with this process. A classical organization is decomposed into independent departments, each one owning a given perimeter (at the system, sub-system, or component level), with independent teams in charge of design activities on the one hand and validation activities on the other. Moreover, each team in charge of validation and verification activities may be using specific tools perfectly matching its specific needs. These tools are often provided by other teams in the company. A typical organization is illustrated in Fig. 2.

Last but not least, the system reference database (SRDB) is used to centralize all the data related to the design of the satellite and to provide a reference for all stakeholders and all activities across the V cycle.
This implies that procedures and tests, specified earlier in textual documentation, have to be elaborated using the content of the reference database.

Fig. 2 Company organization


3 Step 1: Unifying Preparation Environments

One of the first identified optimization points is to unify the tools required to perform V&V activities. This rationalization obviously starts with the convergence of the procedure and test preparation environments. The objective is to unify the teams that develop, maintain, and support those tools. The process remains unchanged, but the V&V teams start to share common tools and practices, as illustrated in Fig. 3.

This approach has several benefits. First, it saves costs by rationalizing tool development; for instance, the connection with the reference database is implemented only once. The organization then evolves toward one large development team instead of several small ones, with a single help desk and support team. Second, as users share a common tool, collaboration between discipline engineers becomes possible, opening the door to exchanging pieces of procedures or tests, known as «Common Blocks» or «Primitives».

This approach also raises some challenges. It is not easy to decide which tool should become the unified one; this tool may not natively implement all functionalities needed to satisfy all user needs and may have to be improved to address discipline-specific needs. The development team faces multiple product owners, forcing priorities to be established in a consensual way, and end users have to be trained in the new tools and practices.

Fig. 3 Introducing a unified preparation environment


The major benefit of this rationalization is cost reduction. However, sharing a common tool has a limited impact on the overall planning of the entire development and V&V life cycle.

4 Step 2: Unifying Test Environments and Facilities

A second step becomes evident once the first step is established: unifying the test environments and facilities across all V&V phases. This step is based on the adoption of the European Ground Segment Common Core (EGS-CC), which is considered an opportunity to deploy seamless collaboration between European actors in the space domain and to reduce the costs associated with maintaining proprietary solutions. This step will optimize the execution of tests and procedures. As with step 1, it is expected to have a limited impact on planning.

5 Analyzing Planning Reduction Blocking Points

A deeper analysis of the global process and of the step 1 and step 2 optimizations shows that planning reduction is hard to achieve due to the central role of the system reference database in the process: the SRDB acts as a real bottleneck in the V cycle, meaning that no activity on the right-hand side of the V can start before the SRDB is populated. This situation arises because test and procedure elaboration relies on system reference database inputs, yet the database is elaborated at the end of the development phase and therefore arrives late in the process. Furthermore, it is elaborated without knowing exactly what the tests and procedures need; as such, it may contain too much content, or some required data may be missing. The content of the database then needs to be consolidated when tests and procedures start to be written. In some cases, a mismatch between user needs and the system definition is detected only at this stage, with possible impacts on the design, leading to schedule delays. Last but not least, the validation of the reference database is a challenge in itself, and additional tests may be defined solely to cover database validation.

Based on these observations, it appears that the global approach needs to be reconsidered: the system reference database should be derived from the test and procedure definitions. This implies several changes: (i) better formalization of the specification of tests and procedures all along the development phase, (ii) checking the match between the system design and the operational and test needs early in the process, (iii) synthesis of the content of the database from the formalized test and procedure specifications, and (iv) synthesis of test and procedure artifacts from the formalized test and procedure specifications.


All of this obviously requires adequate tooling. Based on Thales Alenia Space's background, model-based system engineering tools and practices appear to be good candidates for such an approach. Investigations have been performed in this direction through dedicated R&D studies.

6 Benefits of MBSE with Capella

Thales Alenia Space in France has been active in model-based system engineering since 2009, when it started using and deploying the Melody CCM tooling for on-board software design activities. Since then, Thales Alenia Space has gained maturity and experience with this approach, identifying when and where MBSE is most beneficial and how to profit from applying it, and smoothly extending the approach to larger perimeters and to new engineering and development teams [1]. Thales Alenia Space has also developed deep expertise in the extension and integration of model-based engineering environments through the development of code generators, viewpoints, and editors, sustained by Thales Group expertise and model-based engineering assets.

Today, model-based engineering is widely deployed in Thales Alenia Space thanks to a set of environments covering various engineering activities, from satellite system engineering down to the design of software validation strategies and of tests and procedures. Thales Alenia Space's ambition is to set up a coherent and harmonized model-based system engineering process, accompanied by efficient model-based engineering environments, supporting most of the system development life cycle activities throughout all development phases.

One of the cornerstones of this engineering environment is the open-source Capella toolset. Capella is a model-based system engineering environment resulting from the open-sourcing of the Thales-owned system engineering environment called Melody Advance. Formerly reserved for Thales internal usage, it is now available through the Polarsys Industry Working Group (http://polarsys.org/capella), which involves large industry players in the co-development and sharing of open-source tools for the development of embedded systems.
Capella is built over a core model that merges parts of UML2, SysML, functional analysis, and NAF concepts, and it drives the system engineering activities through three main levels (system analysis, logical architecture, and physical architecture). These activities are assisted and guided by a sound methodology that is both documented and implemented in the tool through modeling guides, cheat sheets, assistants, and validations.


7 End-to-End MBSE for Tests and Procedures

The end-to-end application of an MBSE approach covering all design and V&V activities of the V cycle has led to the definition of a new process, illustrated in Fig. 4. The general idea is to follow the Arcadia methodology supported by the Capella toolset: at every engineering level (operations, system, sub-systems, etc.), a specific abstraction is used to define both the design elements and the expected scenarios. Those scenarios, captured as dedicated models, represent the procedures or the tests in a semi-formal way; complete formalism is reached thanks to appropriate modeling guidelines.

Two additional modeling tools are used alongside Capella. These tools are part of the Thales Orchestra tool suite and provide specific features: Melody CCM, now integrated in the CCM4Space toolset, is dedicated to on-board software modeling, and Call is a modeling tool for defining scenarios for Melody CCM models. Together, Capella, Melody CCM, and Call cover most of the modeling needs for design activities. They make it possible to describe, early in the development phase, the procedure and test specifications in a formal way.

This approach has several benefits. First, the model of the procedure and test specification can only refer to actions that are provided by the design of the system. Second, if the design is changed, all impacted scenarios become invalid and need to be reworked against the new design. Coherence between the procedure and test specifications and the system design is therefore guaranteed. Relying on documentation generation from models makes it possible to maintain updated documentation that remains aligned with the design.
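This coherence guarantee can be pictured with a toy check: a scenario is valid only if each step it references exists in the design, so a design change immediately surfaces the scenarios that must be reworked. In practice Capella enforces this natively through its meta-model rather than via a script, and every name below is invented.

```python
# Toy coherence check between design and scenario models.
# Functions provided by the (hypothetical) system design:
design_functions = {"store OBCP", "execute OBCP", "report status"}

# Scenarios reference design functions; the second one references a
# function that the design no longer provides.
scenarios = {
    "check function integrity": ["store OBCP", "execute OBCP", "report status"],
    "legacy health check":      ["poll housekeeping", "report status"],
}

# Every scenario whose steps are not a subset of the design is invalid.
invalid = {name for name, steps in scenarios.items()
           if not set(steps) <= design_functions}
```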
Another benefit is the ability to deduce, from the set of procedure and test models, what should be defined in the system database: all required telecommands can be automatically inserted in the database, and some observability and commandability aspects can also be deduced. This approach permits restricting the content of the database to the strict minimum (Fig. 5).
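Deriving the database content can be pictured as a simple aggregation over the scenario models; the model structure and telecommand names below are invented for illustration.

```python
# Each (hypothetical) procedure model lists the telecommands it references;
# the SRDB then only needs the union of those references, nothing more.

procedures = [
    {"name": "upload OBCP", "telecommands": ["TC_FILE_COPY", "TC_OBCP_LOAD"]},
    {"name": "run OBCP",    "telecommands": ["TC_OBCP_START", "TC_OBCP_LOAD"]},
]

def required_telecommands(models):
    """Union of every telecommand referenced across all scenario models."""
    needed = set()
    for model in models:
        needed.update(model["telecommands"])
    return needed

srdb_content = sorted(required_telecommands(procedures))
```

Anything not in this derived set need not be populated, which is how the database content is kept to the strict minimum.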

Fig. 4 End-to-end MBSE process for tests and procedures


Fig. 5 Capella scenario example (taken from [2])

In parallel to the population of the database, tests can be executed without SRDB inputs, relying instead on the definition of the system (for instance, the OBSW IDS). Procedures and tests can be consolidated so as to be fully correct against the expected behavior. When the database is fully populated, no additional work is required to produce the final procedure and test artifacts: operational procedures and test scripts can be synthesized from the models and the content of the database. It becomes clear that all these benefits contribute to a significant reduction of the V&V schedule, accompanied by a significant enhancement of the quality of the information produced.

Of course, this approach comes with several big challenges. First, it is important to manage the modeling effort, which can be much more time-consuming than documentation production. It also requires a good organization in the company to ensure a sufficient level of training and coaching, and the complexity of the modeling tools can require a dedicated team for maintenance and evolution. The organizational impact has to be addressed, taking into account a completely different way of carrying out development activities (co-engineering, workflow, etc.). Finally, configuration and version management of models have to be set up, and tools to perform efficient diff/merge or impact analysis should be available.

8 Illustration of the Approach

To illustrate the discussion, a very simple example is provided. Let us consider a space system aiming to provide a given functionality. The process starts with the analysis of the customer's requirements, which leads to the definition of a functional analysis. In Capella, the functional analysis is performed at the system engineering level and


Fig. 6 Operational analysis with Capella

can be preceded by an operational analysis. The operational analysis is generally not performed in the space domain, but it is a real opportunity to involve operators early in the development life cycle to consolidate operational concepts. Figure 6 illustrates a simple operational analysis performed with Capella, in which two operational activities are defined by operators and their interactions are specified.

The operational concepts defined at this level can be considered the contract between system engineers and operators. By agreeing on it at the beginning of the process, operators gain confidence that the usability of the system will match their expectations. To avoid interference with design activities, operational concepts should be defined at a very high abstraction level. This is illustrated by the "check function integrity" interaction, which does not inject any constraints on the design. Of course, iterations can be performed during the life cycle to consolidate those operational concepts according to the system design.

The next step is the functional analysis performed at the system engineering level, where the perimeter of the system is elaborated. Focusing on the space segment, the design distinguishes the control centers and the spacecraft and specifies the functions of each entity. Some design trade-offs may be performed taking into account all possible constraints (orbit, ground-to-board link bandwidth, autonomy requirements, etc.); these trade-offs should have no impact on the operational concepts. In this use case, system engineers have decided that the "check function integrity" action will be performed on-board via an on-board control procedure, rather than on-ground by operators. This leads to a design involving OBCP-related functions (storage and management), as depicted in Fig. 7.
As defined in the methodology described above, tests and procedures can be initiated as soon as a system design exists. This can be done with the Capella toolset thanks to the scenario diagram, which allows, for example, specifying the procedure steps envisaged to realize the "check function integrity" activity, as illustrated in Fig. 8. This high-level procedure specification can be provided to the other disciplines to prepare test activities at every engineering level and can also be delivered to operators to let them know how the spacecraft will be operated.

Then, sub-systems are defined, and the functional analysis is refined a step further. A logical design is defined as an assembly of logical components reflecting a


Fig. 7 System functional analysis with Capella

Fig. 8 System scenario to specify procedure with Capella

logical decomposition of the system into sub-systems and components. Component exchanges and interfaces are also defined at this level. Capella allows mapping functional exchanges onto component exchanges and defining scenarios involving component exchanges, as illustrated in Figs. 9 and 10.

The logical design is the opportunity to perform important trade-offs without impacting the operational concepts and the functional scenarios. For instance, the functional exchange "upload OBCP file", which consists of transferring an OBCP file from the ground station to the spacecraft, can be performed in different ways:

– Using the file management service (PUS service 23): a copy file request, with the source file located on the ground and the destination file located in on-board storage.


Fig. 9 Logical system architecture with refined functional analysis and mapping on component interfaces

Fig. 10 Logical component interface scenario with Capella


Fig. 11 Describing logical component interfaces with Capella

– Using the file transfer service (CFDP): a put request corresponding to a file transfer initialization from the ground to on-board storage.

Those services are captured with Capella as component interfaces, as depicted in Fig. 11. This example illustrates that the same system could be operated in different ways to achieve the same functional objectives. Early specification of the scenario is an important input to consolidate procedures and to initialize test specifications, in coherence with the design: if the design changes, the impact on the procedures and tests is immediately identified; if a system feature (service, parameter, etc.) needed to elaborate a procedure or test specification is missing, the design can be consolidated in the early phases. Capella supports automatic transition from one engineering level to the next, ensuring a correct-by-construction approach in the refinement process all along the development life cycle.

After the logical architecture, a transition to the physical architecture allows the design of the spacecraft to be refined: the ICDs of the existing components are analyzed to check whether they satisfy the logical architecture, and a mapping between logical and physical design is performed. New equipment or software components can be specified, and their interfaces can be detailed with Capella. Providing usage scenarios associated with the IDS of the specified HW or SW parts gives the provider many indications of how the part will be used in the overall system and contributes to the consolidation of the quality of the product.

Other steps of the end-to-end MBSE process are not illustrated here, as they rely on proprietary tools: the transition from Capella to Melody CCM allows elaborating the on-board software design and specifying tests accordingly thanks to MBSE practices; test and procedure skeletons can be automatically generated from the modeled scenarios. The system reference database can be initialized based on the content of those


scenarios, and traceability between all stages of the process is supported by the tooling.
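The ground-to-space file transfer scenario above is built around the CFDP Put request. Its essential parameters can be sketched as a plain structure; the field subset follows the CCSDS Put.request primitive, but the class and the values are illustrative, not an API of any flight or ground library.

```python
from dataclasses import dataclass

@dataclass
class PutRequest:
    """Illustrative subset of the CCSDS CFDP Put.request parameters."""
    destination_entity_id: int   # CFDP entity receiving the file (e.g., the spacecraft)
    source_file: str             # file path on the ground segment
    destination_file: str        # target path in on-board storage
    acknowledged: bool = True    # Class 2 (reliable) vs Class 1 (unacknowledged)

# Ground-to-spacecraft file transfer initialization, as in the scenario above
req = PutRequest(destination_entity_id=12,
                 source_file="uplink/fsw_patch.bin",
                 destination_file="/obsw/patches/fsw_patch.bin")
```

A procedure or test derived from the modeled scenario would then only need to supply these parameters, which is what makes early interface specification so valuable.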

9 Conclusion

While steps 1 and 2 presented in this paper have enabled cost reductions in the V&V activities for satellite development, only a complete change of process and practices can bring a significant schedule reduction for these activities. Thales Alenia Space has defined an approach relying on the model-based system engineering paradigm and tools and has identified the benefits and challenges related to this new process. Experimentation has started on small perimeters and small case studies, and it is clear that such an approach will be deployed step by step, validating the concepts and developing the required tooling along the way. The innovative aspect of this work is to reconsider the process from a global point of view, with all the stakeholders, and to challenge today's practices. It is motivated by the conviction that global optimization will not always come from local optimizations. MBSE is considered an opportunity, and its progressive adoption by engineers at Thales Alenia Space is making the "end-to-end MBSE approach for procedure and test definition" possible in the short to medium term.


The Power of High-Fidelity, Mission-Level Modeling and Simulation to Influence Spacecraft Design and Operability for Europa Clipper

Eric W. Ferguson, Steve S. Wissler, Ben K. Bradley, Pierre F. Maldague, Jan M. Ludwinski and Christopher R. Lawler

Abstract NASA's planned Europa Clipper mission seeks to assess the habitability of Jupiter's moon Europa, which exhibits strong evidence of an ocean of liquid water underneath its icy crust. The sheer number of unique instruments, all of which require quiescent environments in order to operate, compounded with Jupiter's distance from the Sun and Earth, makes this planned mission challenging and resource-constrained. High-fidelity, mission-level simulations that model the spacecraft, ground, and environment from launch to end of mission with a given trajectory and mission plan have been employed early in the project life cycle to better understand the interactions between various components of the mission and how design changes impact the entire system. These simulations have already resulted in tangible benefits to the project by providing vital input to key spacecraft trades, assessing impacts to operability, and quantifying how well the scientific objectives of the mission can be achieved. Improvements to simulation performance and to the process by which information defining the system is gathered and built into models used by simulations have the potential to further expand the scope of their use on Europa Clipper and future missions.

1 Introduction

Modeling and simulation play a vital role in the design and operation of most, if not all, space missions. Applications employing modeling and simulation techniques vary considerably, from low-fidelity parameterized simulations trading different mission architectures during proposal development to highly complex hardware-in-the-loop simulations used for testing and command validation during operations. Mission-level software simulations integrated with high-fidelity spacecraft models have been used effectively for spacecraft activity planning on a number of missions [1, 2]. These types of simulations provide a means to predict the state of the spacecraft given an activity plan and to model the cumulative effect those activities have on mission resources (including, among many others, power, energy, pointing, data, and ground station tracking) over time.

E. W. Ferguson (B) · S. S. Wissler · B. K. Bradley · P. F. Maldague · J. M. Ludwinski · C. R. Lawler
NASA/Jet Propulsion Laboratory/California Institute of Technology, Pasadena, CA 91109, USA
e-mail: [email protected]
© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind's Future, https://doi.org/10.1007/978-3-030-11536-4_10

As simulation frameworks and languages have evolved and the processing power of computers has improved, mission-level activity planning simulations typically used during operations have begun to appear much earlier in the project life cycle. The Europa Clipper mission, one of the first flagship-class missions to adopt such simulations early in the NASA project life cycle, began running simulations based on current mission concepts in Pre-Phase A. Developers of these simulations were able to leverage models from previous missions in order to quickly build a simulation infrastructure at an extremely low cost. Despite a very small development team and limited budget, mission-level simulations have had immense benefits (many of which were unexpected) for the Europa Clipper project since they were established. Analyses based on simulation results have informed decisions during trade studies, hardware test plan development and resource allocation processes, and the development and verification of mission requirements.

Even though the approach of running high-fidelity simulations early and often has been extremely successful on Europa Clipper, improvements could be made to further enhance their utility on this and future missions. With these improvements, it is even possible that these simulations will smoothly evolve into the simulations used to integrate and validate spacecraft plans during operations.

Sections 2 and 3 introduce the Europa Clipper mission and present the Activity Plan Generator (APGen) simulation framework used to build and run the integrated mission-level simulations for the mission, respectively.
Section 4 describes the infrastructure wrapped around APGen that allows for quick ingestion of trajectory and design updates as well as comprehensive post-processing of simulation results. Section 5 provides concrete examples of how these simulations have been used for a diverse array of applications, along with some challenges faced by simulation developers thus far on the project. Section 6 provides a brief exploration of recommendations for future improvements, including how this infrastructure could be directly transferred to operations.

2 Europa Clipper Overview

2.1 Mission Design

As one of the top candidates in our solar system with the potential to harbor present-day extraterrestrial microbial life, Jupiter's moon Europa is an intriguing world to explore. Europa, roughly the size of Earth's moon, is hypothesized to possess the three main requirements for life: liquid water from a global-scale ocean, energy from tidal heating due to Europa's slightly elliptical orbit around Jupiter, and organic chemistry via interactions between materials on the surface and the ocean environment
[3]. Currently in its preliminary design phase and scheduled to launch in 2022, the Europa Clipper mission's primary scientific objectives are to map the surface and subsurface structure, constrain the average thickness of the ice shell, characterize the composition of the surface and atmosphere, understand the formation and evolution of geologic features, and search for and characterize any current activity [4].

To achieve these objectives, the project plans to send to Europa a robotic, solar-powered spacecraft carrying a suite of ten instruments: five remote sensing instruments covering the spectrum from thermal emissions to the ultraviolet, four in situ fields and particles instruments, and a two-channel radar. Instead of orbiting directly around Europa, where the spacecraft would be pummeled continuously by Jupiter's harsh radiation environment, the spacecraft would perform a series of repeated close flybys of Europa while in an elliptical orbit around Jupiter. This allows the spacecraft to dive quickly through the intense radiation zone near Europa while providing ample time during the remainder of the orbit to return the scientific data collected during each flyby [5]. Nonetheless, the brief dips into the radiation environment pose a threat to sensitive electronics and cause the solar arrays to degrade and produce less power as the mission progresses. The simulations constructed for Clipper take this environmental effect into account when computing solar array power output.

The current mission baseline consists of launching on NASA's Space Launch System (SLS) rocket on a 2.5-year direct trajectory to Jupiter followed by a 3.5-year tour composed of over 40 close flybys of Europa, with closest approach altitudes varying from several thousand kilometers to as low as 25 km (Fig. 1) [6].
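The solar-array degradation effect mentioned above can be illustrated with a toy model in which array output decays exponentially with accumulated radiation dose. Every constant below (initial power, dose per flyby, decay coefficient) is an assumed placeholder for illustration, not a Clipper value.

```python
import math

def array_power_w(p_initial_w: float, dose_krad: float, k_per_krad: float = 0.002) -> float:
    """Toy solar-array output model: exponential decay with cumulative radiation dose.
    All parameter values are illustrative assumptions, not mission numbers."""
    return p_initial_w * math.exp(-k_per_krad * dose_krad)

# Accumulate dose flyby by flyby (assumed 3 krad each) and watch available power shrink
power = [array_power_w(600.0, flyby * 3.0) for flyby in range(0, 41, 10)]
```

A mission-level simulation evaluates something like this at each point in the trajectory, so that late-tour activity plans are checked against a weaker array than early-tour plans.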
In the event the SLS is unavailable, an indirect trajectory requiring gravity assists from Venus and Earth and 4–5 years of additional cruise time would be used, as no commercially available launch vehicle can provide sufficient energy to travel directly to Jupiter. The spacecraft must therefore be designed to withstand the lengthier cruise and the lower minimum solar distance of an indirect trajectory. Both the direct and indirect trajectories can be run through integrated simulations to help uncover operational issues due to geometry (e.g., preventing the solar arrays from pointing directly at the Sun during turns to and from the trajectory maneuver attitude near the Venus gravity assist).

The prime phase of the mission, known as Tour, consists of two distinct flyby campaigns, the first of which focuses primarily on Europa's anti-Jovian hemisphere (Europa is tidally locked to Jupiter, and thus the same side always faces away from Jupiter), while the second campaign focuses on the sub-Jovian hemisphere (Fig. 2). Each campaign systematically takes regional-scale observations of its entire hemisphere, which will ultimately provide the global coverage needed to answer key questions about Europa's habitability by the end of the mission. Prior to the first campaign, a series of gravity-assist flybys of Ganymede reduces the orbit period to just over two weeks and sets up the lighting and velocity conditions for the first Europa flyby. This transition to the first Europa campaign also provides time to exercise the spacecraft and operations processes, as well as to calibrate instruments in preparation for operations at Europa. A transition from the first to the second Europa campaign using a series of Callisto flybys reshapes the spacecraft orbit so that it can observe the sub-Jovian side of Europa in the Sun.


Fig. 1 Baseline mission design for Europa Clipper. Direct trajectory to Jupiter showing both early and late launch dates (left) and multi-flyby prime mission orbits around Jupiter (right)

Fig. 2 Baseline mission timeline including mission phases along with a zoomed view of Tour (terms in red are names of trajectory segments; COT = Crank-Over-the-Top and Res = Resonant)

During Tour, orbits containing a targeted Jovian moon flyby are divided into sub-phases (Fig. 3). The time between 2 days before and 2 days after closest approach, known as the flyby period, is further broken down into three parts: approach, nadir, and departure. The Playback phase follows Departure and in turn ends 2 days prior to the closest approach of the next targeted flyby, at which point the cycle repeats. One such cycle is known as an encounter. Most of the mission's Jupiter orbits contain a targeted flyby; thus, the duration between flybys is typically just over 14 days. However, there


Fig. 3 Encounter phase definitions and typical timing of key activities during an encounter

are occasional "phasing orbits" which do not contain targeted flybys; in these cases, an encounter spans multiple orbits.

During Playback, the spacecraft spends most of its time pointing its high-gain antenna (HGA) at Earth to maximize data return and only occasionally departs from this orientation to perform activities that require specific pointing, such as orbit trim maneuvers and instrument calibrations. During Approach, instruments prepare for and begin to make episodic observations of Europa, which requires the spacecraft to turn to and from a nadir orientation (where the remote sensing instruments are pointed at Europa). After a final communication pass roughly a half-day prior to closest approach, the spacecraft turns to point at or near nadir, twisting around the nadir axis to allow full Sun to fall on the solar arrays, and does not turn back toward Earth until after closest approach.

The majority of instrument observations and, consequently, data accumulation occur within a 4-hour period around closest approach. Just prior to this period, the spacecraft twists about the nadir axis to align the along-track fields of view (FOVs) of the remote sensing instruments and the boresights of the in situ particle detection instruments with the spacecraft velocity vector at closest approach. Observations during departure generally mirror those during approach, with some differences driven by lighting conditions before and after closest approach and by the need to prepare for data playback. All spacecraft activities (including instrument observations) that occur during an encounter, and their effects on resources like spacecraft attitude, power, and data, are incorporated into the integrated Clipper simulations.
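The encounter structure described above can be sketched directly from a list of closest-approach epochs. The ±2-day flyby period and the roughly 4-hour peak-observation window come from the text; the exact 2-hour half-width used to split approach/nadir/departure is a schematic assumption.

```python
from datetime import datetime, timedelta

def encounter_phases(ca: datetime) -> dict:
    """Schematic sub-phase boundaries around one closest approach (CA)."""
    two_days = timedelta(days=2)
    two_hours = timedelta(hours=2)
    return {
        "approach":  (ca - two_days,  ca - two_hours),  # episodic Europa observations
        "nadir":     (ca - two_hours, ca + two_hours),  # ~4 h of peak data acquisition
        "departure": (ca + two_hours, ca + two_days),   # mirrors approach, preps playback
    }

# Hypothetical closest-approach epoch, for illustration only
phases = encounter_phases(datetime(2028, 3, 1, 12, 0))
```

In the real plan, the Playback phase simply fills the remainder of each roughly 14-day orbit between one departure and the next approach.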


2.2 Flight System Design

The Europa Clipper Flight System is composed of the spacecraft and the instrument payload (Fig. 4). Single-axis solar arrays provide power to the spacecraft during the majority of the mission, while large batteries are used during long eclipses (lasting up to 9.2 h) and periods when the spacecraft attitude is constrained such that the arrays cannot point directly at the Sun (e.g., maneuvers and flybys). Unfortunately, the periods of highest power demand are often the periods when the arrays cannot point toward the Sun. A unique aspect of the design is that the very high-frequency (VHF) and high-frequency (HF) radar antennas of the REASON instrument are mounted directly to the solar arrays. During REASON radar sounding, which occurs below 1000 km altitude over Europa, the solar arrays must remain fixed in the position that points the VHF antennas toward nadir.

A suite of one HGA, one medium-gain antenna (MGA), two low-gain antennas (LGAs), and three fan-beam antennas is used for communication. All antennas support X-band for uplink and downlink, but only the HGA supports Ka-band downlink, which is the primary return path for acquired science data. Additionally, the LGAs

Fig. 4 Europa Clipper flight system in its low-altitude science data acquisition configuration. Inset at upper right depicts spacecraft vault and external components including co-aligned remote sensing instruments (MISE, EIS, E-THEMIS, and UVS) and co-aligned particle detection instruments (MASPEX and SUDA)


and fan-beam antennas support the gravity science investigation conducted for a few hours around Europa closest approach. Reaction wheels are used for precise attitude control, while a bipropellant system is used both for coarse attitude control and for trajectory maneuvers. Highly accurate attitude knowledge is maintained through a pair of non-co-aligned stellar reference units (SRUs) and an inertial measurement unit (IMU).

Dual-redundant, non-volatile bulk data storage (BDS) units, each of which can hold at least 550 Gigabits (Gb), collect data from the instruments throughout the prime mission. The large capacity of these units is primarily driven by the fact that the Flight System may collect on the order of 100 Gb per flyby, and the downlink capability may not always support returning all of that data before the next flyby. An active heat redistribution system (HRS) employing a pumped fluid loop, combined with passive systems such as a louvered radiator and multi-layer insulation (MLI) blankets, provides thermal control for the spacecraft in both the hot environment early in cruise and the cold, energy-starved environment at Jupiter. Each spacecraft subsystem and its respective hardware components are modeled as part of the mission-level simulations produced for Clipper, with updates to the models continually incorporated as the design matures.
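The storage-sizing argument above can be made concrete with a back-of-the-envelope check: clearing ~100 Gb per flyby within a ~14-day encounter requires only a modest average return rate. The 100 Gb figure comes from the text; the Playback duration and daily tracking hours below are assumed placeholders.

```python
def required_rate_kbps(volume_gb: float, playback_days: float, track_hours_per_day: float) -> float:
    """Average downlink rate needed to clear one flyby's data before the next flyby.
    1 Gb = 1e6 kb."""
    seconds = playback_days * track_hours_per_day * 3600.0
    return volume_gb * 1e6 / seconds

# ~100 Gb per flyby, assumed ~10 days of Playback with 8 h of DSN tracking per day
rate = required_rate_kbps(100.0, 10.0, 8.0)  # ~347 kbps average
```

When a downlink outage or a weak geometry halves the available tracking, the required rate doubles, which is exactly the kind of carry-over the oversized BDS units are there to absorb.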

3 Simulation Framework and Heritage

The central tool used to build and run simulations of the Europa Clipper mission in its early stages was APGen, a modeling and simulation framework developed by NASA's Advanced Multi-Mission Operating System (AMMOS) organization [2]. APGen was originally developed to help mission planners create activity plans that did not oversubscribe critical resources such as data storage, electrical power, and fuel during mission operations. Mission engineers specify ground and spacecraft activities, resources, and their interactions through a non-standard but highly effective domain-specific language (DSL) tailored to increase code clarity and coding efficiency compared with conventional languages. The fundamental constructs within the DSL are parameterized activity types (much like classes in object-oriented programs) that can be instantiated within a plan, and resources, which collectively describe the full state of the simulation. Code written in the DSL defining the activities and resources in a plan for a particular mission is often referred to as an APGen "adaptation."

Activities and their effects on resources are modeled using APGen's discrete-event simulation engine, which only performs calculations at times specified by the simulation developer. This provides the capability to run simulations significantly faster than real time (Europa Clipper end-to-end simulations run at over 8000× real time). One feature of APGen and its DSL that has played a fundamental role in recent mission adaptations, including Europa Clipper, is a complete framework for expressing activity hierarchies. High-level activities, which typically represent science or engineering goals, can be expanded into lower-level activities representing the commands that implement those goals. Detailed multi-year mission simulations like Clipper's produce between 100,000 and a million activities; these massive simulations were not even possible until recent, significant upgrades were made to APGen.

Over the years, the APGen DSL has been used to formulate a number of subsystem models for various missions, many of which have been completely or partially reused by new missions [1]. Activities in the plan interact with these models via simple, well-defined interfaces. Additionally, APGen can interface with model libraries written outside of the APGen DSL, which has allowed developers to leverage powerful tools like SPICE [7] or JPL-standard models like the Multi-Mission Power Analysis Tool (MMPAT) with ease. Table 1 describes the subsystem models used on Europa Clipper and their heritage (if any) from previous mission adaptations.

On missions that adopted APGen early in its development, planners would place activities in the plan manually or through scripts outside of the DSL. Later, the DSL was enhanced to allow mission engineers to automatically add science and engineering activities to the plan based on customizable scheduling criteria [9]. This provided planners with the ability to build algorithms that queried the state of the plan (via resources) to determine time windows where specified scheduling constraints were met. New activities were then placed into the plan at appropriate times based on these windows. The deterministic observing geometry of orbital missions like Europa Clipper, compared with surface missions, makes such missions much more conducive to automated activity scheduling. Thus far, Europa Clipper simulations have relied almost entirely on automated APGen scheduling algorithms to build a notional plan of both science and engineering activities. As with the subsystem models, many of the scheduling algorithms, known colloquially as "schedulers," were based on heritage from previous missions.
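The mechanics described above, where activities post resource changes that the engine evaluates only at discrete times, can be sketched in a few lines of Python. Everything here (the function name, the event format) is an illustrative analogue, not APGen's DSL or API.

```python
import heapq

def simulate(events):
    """Minimal discrete-event engine: pop (time, resource, delta) events in time
    order and apply each delta to the named resource, recording the history."""
    events = list(events)          # copy so the caller's list is not mutated
    heapq.heapify(events)
    state, history = {}, []
    while events:
        t, resource, delta = heapq.heappop(events)
        state[resource] = state.get(resource, 0.0) + delta
        history.append((t, resource, state[resource]))
    return history

# A 100 W heater activity from t=0 to t=10 and a 5 Gb observation stored at t=4
hist = simulate([(0, "power_w", 100.0), (10, "power_w", -100.0), (4, "data_gb", 5.0)])
```

Because nothing is computed between events, a multi-year plan costs time proportional to its activity count rather than its duration, which is what makes the 8000× real-time figure plausible.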
While many previous uses of the APGen scheduling engine were focused on a single activity type or subsystem, Europa Clipper simulations are built from the ground up using automated schedulers. Complex logic based on various constraints, policies, and guidelines is frequently required when introducing activities to the Clipper mission plan, which is essentially empty at the beginning of a simulation run. Heuristics that had been introduced in early versions of the engine had to be removed and replaced with well-defined, predictably behaved algorithms that could be put to use with a minimum of difficulty by non-programmers. The resulting scheduling engine was perhaps the most significant ingredient to generating the Clipper simulations discussed in this paper. Without it, the construction of realistic activity plans would have been significantly more laborious.
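The window-based scheduling idea above can likewise be sketched: scan a sampled resource timeline for stretches where a constraint predicate holds, and treat those stretches as candidate placement windows. The resource, threshold, and sampling below are invented for illustration.

```python
def find_windows(timeline, predicate, min_length):
    """Return (start, end) index windows of a sampled resource timeline where
    predicate holds for at least min_length consecutive samples."""
    windows, start = [], None
    for i, value in enumerate(timeline):
        if predicate(value) and start is None:
            start = i
        elif not predicate(value) and start is not None:
            if i - start >= min_length:
                windows.append((start, i))
            start = None
    if start is not None and len(timeline) - start >= min_length:
        windows.append((start, len(timeline)))
    return windows

# Place an observation wherever battery state of charge stays above 60%
soc = [80, 75, 55, 50, 70, 72, 74, 90]
windows = find_windows(soc, lambda v: v > 60, min_length=3)  # [(4, 8)]
```

A real scheduler layers many such predicates (lighting, attitude, array position, etc.) and then chooses where inside each surviving window to instantiate activities.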


Table 1 Primary subsystem models used in the Europa Clipper APGen adaptation

Geometry: NASA's Navigation and Ancillary Information Facility (NAIF) SPICE-based geometry calculations, including basic range and angular separation as well as instrument field-of-view and constraint checking. Currently, 50+ geometry resources are calculated and tracked. Heritage: Deep Impact.

Ground station: Models characteristics and usage of Earth-based ground stations. Composed primarily of the Deep Space Network (DSN), but also includes additional assets from the European Space Agency (ESA), the Japan Aerospace Exploration Agency (JAXA), and NASA's Near Earth Network (NEN). Includes the ability to generate view periods, transmitter limits, allocation patterns, and ground station transmitter and receiver events. Heritage: Mars Polar Lander, Deep Impact, Mars Odyssey, Mars Reconnaissance Orbiter, Mars Exploration Rovers, Mars Science Laboratory, and InSight.

Telecommunications: Medium-fidelity telecom link model to support downlink modeling. Computes achievable data rates for X- and Ka-band transmitters for all low-gain, fan-beam, medium-gain, and high-gain antennas. In the future, this model may be replaced by an interface to JPL's Telecom Forecaster Predictor (TFP) software, which is used by most JPL projects. Heritage: New to Europa Clipper.

Data: High-fidelity model of instrument and engineering data production, onboard data storage, and playback of data to the DSN. Heritage: Mars Polar Lander, Deep Impact, and Phoenix.

Power: Models power loads from all spacecraft subsystems and instruments. Computes solar array output and battery state of charge. There is a medium-fidelity model as well as an interface to JPL's high-fidelity Multi-Mission Power Analysis Tool (MMPAT). MMPAT heritage: Mars Exploration Rovers, Deep Impact, Mars Science Laboratory, Phoenix, InSight, and Juno.

Guidance, Navigation and Control (GNC): Models commanded attitudes for communication, science observations, and trajectory maneuvers with high fidelity. Updated based on the algorithms and methods inherited from the Cassini mission and adapted for the flight software implementation on Clipper. An important input to other models, such as power, telecom link, and geometric constraint checking. Heritage: Deep Impact.

Solar array: Models solar array articulation for Sun-tracking and fixed modes, as well as hard-stop avoidance "flops". Heritage: New to Europa Clipper.

Radiation: Models the radiation environment's effects on the solar arrays. The Jupiter radiation environment model is based on a gridded approximation of the Galileo Interim Radiation Environment (GIRE) model [8]. Heritage: Galileo.

Propulsion: Medium-fidelity fuel usage model. Heritage: New to Europa Clipper.

Payload: Instrument models implemented in the APGen DSL based on specifications provided by the Europa science team. Models include instrument operating modes, power states, and data production. Heritage: New to Europa Clipper.

Mission operations: Models mission operations processes, timelines, and shift schedules. Heritage: Deep Impact.

4 Europa Clipper Mission Simulations

4.1 Simulation and Model Inputs

The foundation of all Clipper mission simulations is the spacecraft trajectory. Mission designers deliver new trajectories to the project periodically in the form of a SPICE SPK (ephemeris) kernel file, with ancillary files providing detailed information on maneuver times, magnitudes, and orbit determination error. Clipper simulations read the SPK directly, along with other kernels defining the positions of natural bodies and of ground stations on Earth, to provide a basis for performing geometric calculations during activity scheduling and modeling. Additional kernels defining spacecraft and instrument coordinate systems and instrument FOVs (FKs and IKs, respectively) were also created and then used by APGen for attitude scheduling as well as constraint checking. A new plan for the entire mission could be produced by APGen within a day or two of the delivery of a new trajectory. This gave the project the ability to quickly assess multiple trajectory candidates simultaneously and choose the trajectory that best met project requirements.

Many subsystem models used in the Clipper APGen adaptation were inherited from previous missions and were already designed to be multi-mission in nature. Therefore, mission engineers needed only to modify a model's configuration data to make it effective for Clipper. For example, the setup for the onboard data collection, storage, and downlink model inherited from the Deep Impact mission only required changes to simple parameters such as onboard storage size, recorded and real-time engineering data rates, data collection bin names, and compression rates. Early in concept development, certain model parameters could only be crudely estimated due to uncertainties in the spacecraft design. As a result, model parameters were often varied to understand how a given variation would affect the system and design margins.
The ease with which models could be reused from previous missions allowed a single engineer to build an integrated Clipper simulation modeling attitude, power, data collection, and data downlink within a few weeks. The main model development work was spent on instrument models, as these had to be built from scratch.
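The parameter-swapping reuse described above amounts to feeding a multi-mission model a new configuration block. A sketch of what such a block might look like for the onboard-data model; every value and key name here is an invented placeholder, not a Clipper or Deep Impact parameter.

```python
# Illustrative configuration for a reusable onboard-data model (all values invented)
data_model_config = {
    "storage_size_gbit": 550,         # onboard bulk data storage capacity
    "rt_engineering_kbps": 2.0,       # real-time engineering data rate
    "recorded_engineering_kbps": 6.0,
    "bins": ["science", "engineering", "nav"],          # data collection bin names
    "compression": {"science": 2.0, "engineering": 1.0, "nav": 1.0},
}

def effective_volume_gbit(raw_gbit: float, bin_name: str, cfg=data_model_config) -> float:
    """Stored volume after applying the bin's assumed compression ratio."""
    return raw_gbit / cfg["compression"][bin_name]
```

Adapting the model to a new mission then means editing this block rather than rewriting model logic, which is why a single engineer could stand up an integrated simulation in weeks.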


In order to accurately model energy demand and availability for the current flight system design, the simulation has to know about the hardware onboard and the various power states that each piece of hardware can assume. Since Pre-Phase A, Clipper simulations have interfaced with the official project Power Equipment List (PEL), which provides the Current Best Estimate (CBE) and Maximum Expected Value (MEV) power draw for each mode of each hardware component that consumes power. Initially, the data in the PEL was manually scraped from a spreadsheet and added to the simulation configuration. Now, accessing the latest PEL data is much more efficient, as it is pulled directly from the project's single-source-of-truth "System Model" through the transformation process described in Ref. [10].

As the project progressed through concept development, there were major changes in the spacecraft power design (for example, changing from radioisotope thermoelectric generators (RTGs) to solar arrays), and the ability to provide rapid simulations of the mission was crucial in sizing the solar arrays and batteries as well as in understanding the balance between data return strategies and battery depth of discharge.
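The PEL lookup described above reduces to a table of per-mode power draws that the simulation sums over whatever modes are currently commanded. The component names and wattages below are invented for illustration.

```python
# Hypothetical PEL fragment: component -> mode -> (CBE, MEV) power draw in watts
pel = {
    "sru":   {"on": (12.0, 14.4), "off": (0.0, 0.0)},
    "radar": {"sounding": (150.0, 180.0), "standby": (20.0, 24.0)},
}

def total_load_w(active_modes: dict, use_mev: bool = False) -> float:
    """Sum CBE (or MEV) draws over the currently commanded modes."""
    idx = 1 if use_mev else 0
    return sum(pel[comp][mode][idx] for comp, mode in active_modes.items())

load = total_load_w({"sru": "on", "radar": "standby"})  # 32.0 W at CBE
```

Running the same plan with MEV instead of CBE draws is one way to probe design margin, which mirrors the parameter-variation studies mentioned earlier.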

4.2 Activity Scheduling

Mission simulations for Europa Clipper are built in APGen by layering activities into the plan through a series of automated schedulers. The order in which these activities are layered is a function of priority and/or dependency. For example, schedulers responsible for creating activities that specify key orbital events and ground station view periods depend only on geometry and are thus normally the first to be run upon delivery of a trajectory. Similarly, critical engineering activities like orbit trim maneuvers are high priority and inflexible in their timing, which means they are scheduled prior to other activities like science observations.

Activities and their associated constraints are determined through conversations with systems engineers and subsystem experts. Based on the activity constraints, each scheduler generates windows where it is valid to schedule each activity and adds one or more activities within these windows. Additional logic in the scheduler is often necessary to determine the number of activities to add, as well as the best place to schedule an activity within the available constraint windows. In rare cases, a subset of the activity scheduling is delegated to another tool, which then feeds its results back into APGen. For example, the scheduling of the EIS NAC instrument's imaging and gimbal motion activities is provided to APGen by SIMPLEX, a tool developed and maintained by the Johns Hopkins University Applied Physics Laboratory (APL). Input parameters for some schedulers also provide a means to enforce different constraints or alter scheduling behavior. Each scheduler represents a separate instantiation of the APGen core. Simple wrapper scripts driven by a configuration file were developed to change the input parameters of the schedulers and to automatically trigger each scheduler when the previous one had completed.
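The wrapper-script chaining just described can be shown schematically: run schedulers in priority/dependency order, each receiving the plan produced so far and returning it with its own activities layered in. The scheduler names here are in-process stand-ins; the real wrappers launch separate APGen core instances.

```python
def run_pipeline(schedulers, plan=None):
    """Run schedulers in order; each layers its activities onto the growing plan."""
    plan = list(plan or [])
    for scheduler in schedulers:
        plan = scheduler(plan)
    return plan

# Stand-ins for real schedulers, layered in the order the text describes
def view_periods(plan):  return plan + ["view_period"]
def maneuvers(plan):     return plan + ["otm"]
def flyby_science(plan): return plan + ["observation"]

plan = run_pipeline([view_periods, maneuvers, flyby_science])
```

The ordering matters because later schedulers query the plan state left by earlier ones, e.g., science observations must avoid the maneuver windows already placed.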
The set of schedulers used for the Tour simulation of Europa Clipper generates approximately 165,000 activities, which would be onerous to build up


manually. Scheduled activities often expand all the way down to the command level and include those developed to support attitude commanding, instrument data collection, data downlink, trajectory correction maneuvers, and engineering maintenance activities. A brief description of each scheduler and the key constraints it enforces is provided in Table 2.

4.3 Final Simulation Run

Once all the schedulers have generated activities and saved them in activity plan files (APFs), APGen reads in all of the activities and kicks off a final simulation run. The various subsystem models within the simulation are triggered by the scheduled activities, which in turn modify the state of the simulation over time through changes to resource values. Every change to the state is recorded, stored, and ultimately provided as an output once the simulation run completes.

Not surprisingly, as the level of detail within the APGen Clipper adaptation has grown, so too has the time it takes to run the simulations. For current end-to-end mission simulations covering a 6-year time frame, the activity scheduling process takes about 24 hours to complete, and the final simulation takes 6–8 hours to run. A few potential opportunities to improve simulation run time are discussed in more detail in Sect. 6. Fortunately, modifications to subsystem models often do not affect activity scheduling. In these instances, only the final simulation needs to be re-run, which results in a significant saving of time.
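The run's primary product, a time-ordered list of state changes (Sect. 4.4), is then mined by post-processing scripts. A minimal sketch of extracting one resource's history from a TOL-like XML file using only the standard library; the element and attribute names are invented, since the real TOL schema is APGen's own.

```python
import xml.etree.ElementTree as ET

# Hypothetical TOL fragment; real APGen TOL files use their own schema.
tol_xml = """
<tol>
  <record time="2028-001T00:00:00" resource="battery_soc" value="95.0"/>
  <record time="2028-001T01:30:00" resource="battery_soc" value="88.5"/>
</tol>
"""

def state_changes(xml_text: str, resource: str):
    """Extract (time, value) pairs for one resource from a TOL-like XML string."""
    root = ET.fromstring(xml_text)
    return [(r.get("time"), float(r.get("value")))
            for r in root.iter("record") if r.get("resource") == resource]

soc = state_changes(tol_xml, "battery_soc")
```

At the 20 GB scale of a real end-to-end run, a streaming parser (e.g., `ET.iterparse`) would be used instead of loading the whole document, but the extraction logic is the same.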

4.4 Simulation Output and Visualization Products

The primary output of the APGen simulations for Clipper is a massive Extensible Markup Language (XML) time-ordered list (TOL) file, on the order of 20 GB for end-to-end simulation runs. This file contains all of the state changes that occur within the simulation, every constraint violation, and all of the data associated with each scheduled activity (e.g., start time, duration, and attributes). In addition to the XML TOL, a set of smaller files tailored to specific applications such as power and attitude reporting is produced. After each simulation run, the attitude report is automatically converted into an industry-standard SPICE C-kernel, which serves as an input to many other analysis tools on the project.

All of the raw data produced by simulations is useless unless it can be transformed into products digestible by analysts and, ultimately, project management. A variety of data visualization tools have been utilized to post-process simulation results and produce valuable products that can be disseminated to a wide array of users, including mission planners, engineering teams, and instrument teams. The tools used to produce these products range from homegrown scripts that generate extensive PDF reports containing plots and tables to third-party or NASA-sponsored interactive timeline and 3D visualization tools. One commonality between the cores of these tools is

The Power of High-Fidelity, Mission-Level Modeling …

207

Table 2 Activity schedulers used on Europa Clipper, listed in the order in which they are run

Ground station view periods: Determines the time periods where the spacecraft is in view of each ground station (DSN station or additional asset from ESA, JAXA, etc.). View periods are computed based upon a complex combination of station elevation, spacecraft range, and signal-to-noise ratio.

Geometric events: Calculates times of key geometric events such as eclipses, occultations, transits (e.g., Europa transit across Jupiter), and periapsis/apoapsis times based upon the spacecraft trajectory. This scheduler relies heavily upon SPICE and can run in parallel with the view period scheduler.

Trajectory correction maneuvers: Adds maneuvers to the plan given a table of maneuver times, magnitudes, and directions (a fixed direction is assigned to statistical maneuvers whose direction is not yet known). Scheduled activities include commands to turn to/from the maneuver attitude, position the solar arrays for the maneuver, and turn on/off propulsion hardware.

Flyby science and calibrations: One of the most complex schedulers; generates a schedule of instrument observations and calibrations around each flyby, including all of the required attitude changes between science observations and Earth communications. Each observation has its own scheduling constraints based upon lighting, spacecraft attitude, solar array positioning, etc.

Gravity science: Schedules X-band communication activities around closest approach of each flyby to aid in measuring the gravity field around Europa. The geometry and spacecraft attitude around closest approach necessitate switching between fan-beam and/or low-gain antennas to maintain a link with Earth.

Ground station allocations: Since actual allocations are not available far in advance, a notional schedule is generated based on station visibility, mission requirements for station coverage, and expected level of DSN commitment. For example, continuous 34-m station coverage is allocated for each maneuver in order to provide rapid maneuver assessment and precise reconstruction.

Avionics maintenance: Adds avionics activities that preprocess stored science data into downlink products, copy science data from the prime to the backup "hard drive," and perform radiation maintenance to prevent data corruption. These activities must be scheduled around science observations that produce data at a high rate.

Instrument calibrations (outside flyby period): Places instrument calibrations that have no requirement to be scheduled near a flyby into the plan. Different calibrations are scheduled with different frequencies, and each calibration has unique scheduling constraints primarily based upon geometry, lighting, and spacecraft attitude.

SUDA surveys: These unique science observations, which measure the dust environment of the Jovian system, are scheduled after calibrations because there are more opportunities to schedule surveys than there are for certain calibrations (i.e., surveys have more scheduling flexibility).

Downlinks: Schedules Ka-band downlink activities, which play back stored science and engineering data. Commands to change the downlink rate as a function of station elevation and to turn on/off telecom hardware are added using this scheduler.


E. W. Ferguson et al.

that they are completely independent of the simulation itself, so that other projects can also take advantage of their capabilities. What follows is a list of the primary visualization products currently generated from mission-level simulations, along with a brief description of each product and the tool used to generate it.

1. Interactive RAVEN Timelines

After a simulation completes, the output XML TOL file is automatically pushed to a server, divided into a plethora of resource and activity timelines, and stored in a database, which can be accessed by the Web-based Resource and Activities Visualization Engine (RAVEN) tool. Created and managed by NASA's Multimission Ground System and Services (MGSS) program, RAVEN provides a way to view and interact with timeline data via a Web browser. Each mission-level simulation generates thousands of individual timelines, which are organized into categories such as "Geometry," "DSN," and "Power." Users build a timeline view by first choosing the simulation data set(s) of interest and then adding individual timelines from that data applicable to their current analysis (Fig. 5). After one or more timelines have been added to the view, users can zoom in to the time periods of interest, customize how each timeline is displayed, reorder timelines, and save the view so it can be shared with others via a simple URL. Other useful features include overlaying timelines in a single "row" and saving a view layout so that it can be applied to other simulation data sets. On Europa Clipper, RAVEN views of mission simulations are used to answer questions in real time that crop up during meetings, illustrate relationships between events in the mission plan, capture potential time periods of concern where constraints or guidelines are violated, and even build presentation material for reviews. RAVEN gives users the power to efficiently explore the vast quantities of data associated with these simulations and make connections between aspects of the mission that otherwise would have eluded them.

2. PDF APGen Simulation Report

A PDF report consisting of hundreds of plots and tables summarizing the results of each simulation is produced automatically alongside every simulation run (Fig. 6). When a run has completed, the output TOL is split into a series of resource and activity timelines in a manner analogous to the process used prior to viewing in RAVEN. These timelines are then read into MATLAB®, where they are used to perform the computations necessary to build various plots and tables. Finally, these plots and tables are organized and merged into a LaTeX-generated PDF report using a generic script that can be leveraged by other users. This report offers a one-stop shop for project members to get high-level metrics on resources like data and power as well as detailed information about a particular encounter, subsystem, or instrument. The report is not intended to be read from start to finish; instead, users can jump to a particular section depending on their role or interest. Additional plots and tables can easily be added whenever a project member makes a new request. A completely separate PDF report has also been fashioned for the Clipper project to assess the satisfaction of science measurement requirements (see Sect. 5.7). This


Fig. 5 Resource and Activities Visualization Engine (RAVEN) timeline view of the entire Tour phase of Europa Clipper. This only represents a small subset of the data available from each simulation. Users can zoom in to regions of interest and add additional timelines as desired

Fig. 6 Stacked encounter timeline showing instrument activities around closest approach for every encounter (left) and diagram of Deep Space Network (DSN) passes in a single Jupiter orbit (right), two of the many plots included as part of the APGen simulation report

report was built using the same infrastructure and generic scripts as the APGen Simulation Report.

3. Interactive Tableau® Workbooks

Data visualizations built using Tableau® have also proven effective for investigating simulation results, especially when cross-comparing results from multiple simulation runs (Fig. 7). Report files specifically structured to work effectively with Tableau® are


Fig. 7 Examples of Tableau visualizations used on Clipper. Breakdown of instrument data by type (left) and comparison of onboard data between multiple simulation runs (right)

output directly from each simulation. Once the data is connected to Tableau®, users can make beautiful, data-rich charts in seconds, interactively filter data, and perform statistics based on the data. Europa Clipper has constructed a series of workbooks containing a number of charts that track different spacecraft resources like data and power. When a new simulation run has completed, the data can be connected to the workbook and immediately compared to previous results.

4. Cosmographia 3D Visualizations

Understanding the relationship of a spacecraft and its components to Earth, the Sun, and various other celestial objects as it turns, scans, or rolls to carry out activities that meet mission objectives is extremely difficult without a way to visualize the geometry in three dimensions. 3D visualizations driven directly from mission simulations using Cosmographia, a publicly available interactive tool that visualizes natural bodies within the solar system along with spacecraft trajectories, orientations, and observations, have been available to the project since Pre-Phase A (Fig. 8). Cosmographia was recently extended by NAIF so that SPICE files could be used directly to drive the positions and orientations of objects and display SPICE frames on objects within a visualization [11, 12]. After running an integrated Clipper simulation, the spacecraft attitude profile, solar array rotation profile, and instrument pointing profiles are automatically converted into SPICE C-kernels, which are then loaded into Cosmographia along with a 3D model of the spacecraft, its trajectory, and instrument observation times. When running Cosmographia, users have the freedom to jump to a specific time and place, zoom in on an object, and rotate the view to any desired angle. If desired, users can also script how the camera moves in space so the view does not have to be changed manually; this is especially useful when generating videos from the application.
3D visualizations using Cosmographia can be produced quite quickly once a simulation completes. Therefore, they have been used on a day-to-day basis during


Fig. 8 Screenshot from a Cosmographia video directly driven from simulation data showing a scan from a typical flyby of Europa Clipper produced for the project’s Mission Definition Review (MDR)

meetings as an engineering tool to ensure there is a mutual understanding among team members about the timing and geometries of specific spacecraft activities. For example, many project members didn’t fully grasp that the solar arrays would have vastly different positions relative to the spacecraft while approaching Europa on different flybys due to the varying solar phase angle between those flybys until seeing the geometry in Cosmographia. These visualizations are also sleek enough that some have been used for public outreach and project reviews including (to date) the Mission Concept Review (MCR), Mission Definition Review (MDR), and Flight System Preliminary Design Review (PDR).
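All of the products above begin from the same 20 GB XML TOL, which is far too large to parse into memory at once. A streaming parse with Python's standard library keeps memory bounded; the `<Record>`, `<TimeStamp>`, `<Resource>`, and `<Value>` element names below are invented for illustration, since the real TOL schema is not given in the text:

```python
import xml.etree.ElementTree as ET

def stream_resource_timeline(tol_path, resource_name):
    """Stream one resource's timeline out of a multi-gigabyte XML TOL.

    Assumes a hypothetical schema in which each state change is a
    <Record> element with <TimeStamp>, <Resource>, and <Value> children;
    the actual Clipper TOL schema may differ.
    """
    for _, elem in ET.iterparse(tol_path, events=("end",)):
        if elem.tag != "Record":
            continue
        if elem.findtext("Resource") == resource_name:
            yield elem.findtext("TimeStamp"), elem.findtext("Value")
        elem.clear()  # release the element so memory stays bounded
```

The same pattern underlies splitting the TOL into the per-resource timelines ingested by RAVEN and the report scripts.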

5 Applications of Mission Simulations During Project Formulation (Pre-Phase A to Phase B)

Since their inception in Pre-Phase A of the project lifecycle, mission-level integrated simulations using APGen have been applied to a diverse set of engineering challenges across the project. The applications of these simulations and their respective examples below are by no means an exhaustive list. However, they do provide a sense of the extensive role that mission-level simulations have played thus far on the project.


5.1 Mission Planning

The Mission Plan establishes the baseline strategy for successfully achieving mission objectives within the capabilities and constraints of the project systems, and subject to the project policies on the use of these systems. It does so by integrating the mission requirements from science, engineering, and navigation with the trajectory and ground station capabilities into a set of strategies and a time-ordered set of activities. For Clipper, the APGen mission simulations are the direct embodiment of the Mission Plan. On earlier projects, a significant manual effort was necessary to produce timelines of these plans that were consumable by non-mission planners, especially decision-makers. The combination of APGen with RAVEN allowed for the direct production of timelines that can be used virtually as-is to communicate the Mission Plan. Furthermore, the relatively easy-to-use Web interface of RAVEN allowed subsystem and instrument engineers to access the mission plan directly, to produce customized views of their activities, and to perform important tradeoffs that inform their designs.

Simulations have provided an effective means of assessing impacts to resources brought about by an evolution in understanding of the flight system design and how that might alter strategies employed within the Mission Plan. For example, as the payload design matured, data volume estimates grew substantially and the project recognized an increased risk of exceeding the onboard storage capacity. Mission planners were able to run simulations exercising different strategies, such as arraying ground stations and adding additional data rate switches during downlink tracks, to increase downlink capacity at times when the onboard storage was filling up. Although none of these strategies were employed at the time, project management could rest assured that there were ways to resolve data balance issues if data margin dipped below comfortable levels.
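The data balance assessment described above can be pictured as a simple fill/drain model of the onboard recorder. The sketch below is a toy stand-in for APGen's resource modeling, with illustrative units of GB per planning period:

```python
def storage_profile(events, capacity_gb):
    """Track onboard storage through a time-ordered list of per-period
    (collected_gb, downlinked_gb) pairs, flagging periods that overflow.

    A toy fill/drain model, not APGen's resource engine; the numbers and
    structure are illustrative only.
    """
    stored, profile, violations = 0.0, [], []
    for i, (collected, downlinked) in enumerate(events):
        stored = max(0.0, stored + collected - downlinked)
        profile.append(stored)
        if stored > capacity_gb:
            violations.append(i)  # candidate period for arraying / rate switches
    return profile, violations
```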

5.2 Project Trade Studies

Armed with high-fidelity simulations and tools to process and analyze results, the project could quickly assess whether a given trajectory and mission plan could meet mission objectives within the constraints imposed by the spacecraft design and environment. If mission objectives could not be met, the project sought out alternate solutions by modifying the spacecraft design, modifying the mission plan (with attendant impacts on operability), or reducing the scope of scientific objectives. The trade space of solutions could be explored by running multiple simulations to help quantify the costs and benefits of each solution. On Clipper, this technique was used on several key trade studies, including one aimed at investigating potential solutions to curb energy demand and thus prevent solar array growth. The types of solutions explored fell into three broad categories: reduction of power demand via changes to spacecraft hardware, enhanced power


Fig. 9 Solar array power generation over Tour for multiple simulation runs varying array size, cover glass thickness, radiation dosage seen by the spacecraft, and Jupiter environment model

generation capability through changes to solar cell cover glass thickness or revisions to the Jupiter environmental model, and operational changes such as lengthening each Jupiter orbit to provide additional time to recharge the battery after each flyby or reducing the number of downlinks late in the mission when power generation will be limited. Multiple APGen mission simulations were run with varying input parameters that reflected the proposed solutions. After cross-comparing simulation results using both RAVEN and Tableau®, the project quickly saw that the net energy gain of the operational solutions was limited when compared with other potential solutions. Moreover, the costs associated with longer orbits and fewer downlinks far outweighed the slight increase in available energy. Simulations also showed that using a new, more accurate Jupiter proton environment model compiled with the most recent data available led to a large improvement in energy margin (Fig. 9). Ultimately, the project decided to adopt the new model and add additional solar cells to the solar array yoke, which allowed the project to avoid having to add another solar panel to each wing. Many other project-wide and subsystem-specific trade studies were supported by the Clipper adaptation of APGen. All of these trades could be supported because analysis based on simulation results could be turned around relatively quickly (even more trades would likely have been supported if turnaround time could have been reduced further). A few additional examples of supported trades are listed and briefly described below.

(1) Spacecraft Configuration Trade—an early project trade to compare various spacecraft hardware and instrument configurations. Simulations helped show


that—for certain configurations—the restriction on solar array range of motion significantly reduced energy generation when it was most needed. (2) Additional Instrument Accommodation—NASA requested the project examine the resource cost of adding a laser altimeter to the instrument suite. Numerous simulations were run with various assumptions on instrument operational concepts and resource usage. The results were used to help determine that the additional science value of the instrument was not worth the associated resource impacts. (3) MISE Radiator Position—the Mapping Imaging Spectrometer for Europa (MISE) near-infrared instrument team needed to determine an optimal location for their passive radiator. APGen simulation of the spacecraft attitude throughout the mission showed the location on the spacecraft where bright bodies would pass over most frequently. Surprisingly, the radiator was designed to face the nadir direction toward Europa during flybys as that location provided the most protection from the brightest bodies like Jupiter and the Sun.
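Cross-comparing runs like those in the trades above reduces, at its simplest, to ranking variants by a figure of merit extracted from each run's timelines. A minimal sketch using worst-case battery state of charge as the figure of merit; the variant names are made up for illustration:

```python
def compare_runs(runs):
    """Rank trade-study variants by minimum battery state of charge.

    `runs` maps a variant name to its simulated SOC timeline (fractions of
    full charge). Returns (name, worst_soc) pairs, best variant first. The
    figure of merit and data shape are illustrative, not the project's tooling.
    """
    worst = {name: min(soc) for name, soc in runs.items()}
    return sorted(worst.items(), key=lambda kv: kv[1], reverse=True)
```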

5.3 Energy and Data Allocations

In order to determine the required energy generation and storage for the spacecraft, the project took a bottom-up approach by first deriving an energy allocation for the payload [13]. Setting energy allocations for a payload consisting of ten instruments, many of which have dynamic observation profiles that vary with each encounter, was challenging. These allocations needed to be relatively robust to changes in tour design while also providing an easy means to sub-allocate to individual instruments. APGen simulations were run on recent tour designs using the Design Reference Mission (DRM) behaviors, a set of spacecraft and instrument activities that are designed to achieve the mission objectives; the DRM thereby provides a flight system design envelope. The DRM runs provide the current best estimate (CBE) for payload energy consumption. From these simulations, the project set the payload allocation based on the mean + 1σ energy consumption of the payload per orbit and per flyby (25% contingency plus 10% project reserve was used to compute the final values). Since the simulations also tracked the operational modes of each individual instrument, payload energy sub-allocations were computed based on the mean + 1σ durations for each mode [13].

Data volume allocations for the payload were also based on simulation runs using recent trajectory designs. Per direction of the project, allocations were based on current telecom capability and onboard storage size. While a per-encounter statistical approach similar to what was used for energy allocations initially looked promising, systems engineers found it difficult to write useful requirements that would allow the payload to easily sub-allocate data to each instrument and report against their allocations. Instead, allocations were set based on two different portions of Tour: Transition to Europa Campaign 1 (TEC1) and the rest of Tour.
Although there is excess downlink capacity during TEC1 due to its long duration and limited periods


Fig. 10 Comparison of downlink capacity and total data downlinked during Tour starting at the beginning of Europa Campaign 1, which served as the basis for setting payload data volume allocation requirements

of high-volume instrument data acquisition, this capacity is not available for use later in the Tour. APGen simulations were run for three different tour designs, and the run with the limiting case for downlink capability for each segment of the Tour was used as the basis for setting allocations (holding 10% of the capability back as project reserve). Figure 10 shows the accumulated downlink capability from Europa Campaign 1 through the completion of Tour. Each simulation run also had CBE data estimates for the payload split out by instrument, so margin could be computed against the allocation and sub-allocations could be derived. In addition to ensuring allocations had sufficient margin between total data volume collected and data downlink capability, they had to be consistent with the spacecraft's onboard storage utilization. Simulations again confirmed that the current instrument operations plan during Tour using CBE data estimates did not violate the 60% project margin policy on maximum onboard storage utilization.
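The statistical allocation rule described above (mean + 1σ of the simulated consumption, then 25% contingency plus 10% project reserve) can be sketched directly. How the project stacks the two factors is not spelled out in the text, so the multiplicative stacking here is an assumption:

```python
from statistics import mean, stdev

def payload_allocation(per_flyby_energy_wh, contingency=0.25, reserve=0.10):
    """Derive a payload energy allocation from simulated per-flyby consumption.

    mean + 1 sigma of the simulated values, then 25% contingency plus 10%
    project reserve, applied multiplicatively (an assumed convention).
    """
    base = mean(per_flyby_energy_wh) + stdev(per_flyby_energy_wh)
    return base * (1.0 + contingency) * (1.0 + reserve)
```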

5.4 Hardware Design

Clipper's solar arrays rotate about a single axis, but unfortunately the design could not support a full 360° range of motion. To protect the arrays from over-rotating and twisting the cables running from the spacecraft, hard stops needed to be installed. Depending on the design, these hard stops would reduce the solar array range of


motion by between 10° and 40°. However, the location of these hard stops was rather flexible, which meant systems engineers could pick the best range of solar array rotation angles relative to the spacecraft to restrict based on operational considerations. Two separate analyses were performed using APGen simulations to help determine the best location for the hard stops and the impact of decreasing the array range of motion on energy generation. First, a simulation of the entire Tour was run based on the mission plan and its resultant attitude and solar array strategies, assuming a 360° range of motion. The solar array rotation angle over time, one of the outputs of the simulation, was processed and binned in 5° increments to determine how much time the arrays spent at different rotation angles (Fig. 11). Not surprisingly, for the majority of the Tour the solar arrays point in the same general direction as the high-gain antenna (HGA) (defined as 0°) because the Earth and the Sun never stray too far apart when viewed from Jupiter. Moreover, the arrays spent very little time facing the direction opposite the HGA (180°), which made this region ideal for the necessary array exclusion zone. Additional simulations were run that extended the size of the exclusion zone from 10° to 40° to see how this would impact battery state of charge (SOC). For certain flybys where solar array geometry was poor, SOC decreased with reduced range of motion. The impact was significant enough to require that the arrays be designed with at least a 350° range of motion. Ultimately, an exclusion zone between 165° and 175° was chosen because the arrays would need to be stowed in the 180° position for launch.
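The binning behind Fig. 11 amounts to accumulating dwell time per 5° rotation-angle bin. A minimal sketch, assuming the simulation output has already been reduced to (angle, duration) samples:

```python
from collections import Counter

def bin_rotation_angles(samples, bin_deg=5):
    """Accumulate time spent per solar-array rotation-angle bin.

    `samples` is a list of (angle_deg, duration_hr) pairs from the simulated
    rotation profile; angles are normalized to [0, 360). A simplified
    stand-in for the post-processing behind Fig. 11, with invented units.
    """
    hist = Counter()
    for angle, dt in samples:
        b = int((angle % 360) // bin_deg) * bin_deg  # lower edge of the bin
        hist[b] += dt
    return dict(hist)
```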

5.5 Hardware Test Plan Development

Given the criticality of the solar arrays providing sufficient power to the spacecraft and the implications of their size for spacecraft mass and agility, detailed design work on the arrays had already begun early in Phase B. As part of that work, thermal engineers had to develop a comprehensive test profile for qualification testing of the solar arrays and solar array-mounted hardware, like the REASON VHF and HF antennas, that would not be temperature controlled. The test profile had to define the number of thermal cycles to perform over different temperature ranges by determining the expected number of cycles that would occur during the mission and on the ground during pre-launch testing. In order to comply with design principles, the profile had to include at least three times the number of expected cycles. Estimating the number of thermal cycles at different temperature ranges that occur during the mission is challenging, as the temperature of the solar arrays during the mission is dependent on both the trajectory (solar distance and eclipses) and any planned spacecraft activities that may cause the solar arrays to point off the Sun. Given that the solar arrays can only rotate about one axis, activities that require a specific spacecraft attitude can cause the arrays to diverge from pointing directly at the Sun. Specific examples of such activities during the mission are flyby science activities near Europa closest approach and periodic instrument calibrations. Also, certain activities require the solar arrays to remain still to prevent disturbances (e.g.,


Fig. 11 Cumulative time solar array spends at different rotation angles over the Tour binned in 5° increments. The time spent turning and in the nominal attitude pointing the high-gain antenna (HGA) at Earth has been filtered out

EIS NAC imaging) or remain fixed at a specific solar array rotation angle with respect to the spacecraft body (e.g., REASON flyby operations and trajectory correction maneuvers). Fortunately, a notional set of planned spacecraft activities for the entire mission had already been incorporated into mission-level simulations using APGen. Since one of the outputs of these simulations was the solar array pointing profile, the time and duration of each event where the solar arrays pointed off the Sun were quickly determined. In order to assess the severity of each solar array off-point event from a thermal perspective, these events were binned by spacecraft-Sun range and maximum off-point angle. Figure 12 shows a plot of all the simulated off-point events while Clipper is in orbit around Jupiter. Note that most off-point events are under a few hours, while there are some outlier events around 10 hours in duration. These longer instances of especially poor solar array geometry occur during flybys, when the spacecraft attitude is fixed such that the remote sensing instruments are pointed nadir and the in situ particle detection instruments are pointed toward the spacecraft velocity vector. Simulation results of the solar array off-point events were combined with eclipse time and duration data based on the reference trajectory to determine predicted thermal cycles for the mission. Additional margin was added to the results to account for uncertainties in the simulated mission profile. Thermal engineers then converted these results into their test profile as seen in Table 3, which consists of a series of discrete thermal cycle profiles with varying temperature ranges.

Fig. 12 Maximum off-Sun angle and duration of each off-Sun-point event while in orbit around Jupiter

Table 3 Solar array qualification test profile accounting for predicted thermal cycles during the mission and on the ground prior to launch (PP = Planetary Protection)

Profile | T min (°C) | T max (°C) | Delta-T (°C) | No. of cycles | 3× life | Test medium
PP thermal cycles | 25 | 150 | 125 | 1 | 3 | LN2
PF hot thermal cycles | 25 | 120 | 95 | 3 | 9 | LN2
PF cold thermal cycles | −240 | −125 | 115 | 3 | 9 | LHe
Thermal cycles: up to 2AU | −5 | 100 | 105 | 5 | 16 | LN2
Thermal cycles: 2–4AU | −80 | −20 | 60 | 3 | 9 | LN2
Thermal cycles: 5AU, Tour, some eclipse thermal cycles (short) | −180 | −120 | 60 | 88 | 263 | LN2
Eclipse thermal cycles (long) | −240 | −125 | 115 | 35 | 106 | LHe
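The 3×-life rule described above converts predicted mission cycles into required test cycles. A sketch of that conversion; how margin is applied before tripling is a guess at one reasonable convention, not the project's documented bookkeeping:

```python
import math

def qualification_cycles(predicted_cycles, margin=0.0, life_factor=3):
    """Convert predicted mission thermal cycles into a test cycle count.

    Applies an uncertainty margin, rounds up, then multiplies by the 3x-life
    factor required by the design principles described in the text. The
    margin convention here is assumed for illustration.
    """
    expected = math.ceil(predicted_cycles * (1.0 + margin))
    return expected * life_factor
```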


5.6 Operability

Operability is often overlooked early in the project lifecycle. However, designing a system that is difficult to operate can adversely affect mission performance, increase operations cost, and decrease reliability. Finding operational problems early is extremely beneficial, as late hardware design changes are expensive or may not even be possible. During early project formulation, the Europa Clipper project committed to considering operability when making design decisions by establishing a project policy on operability and staffing an operability engineer [14]. Mission simulations built using APGen, a tool originally designed for operations, have helped enforce the operability policy by identifying operational concerns stemming from the trajectory, spacecraft, instrument, and initial operations designs since Pre-Phase A.

For example, simulations of launch showed potential problems with the communication link between the spacecraft and Earth during post-launch critical events such as solar array deployment. Just after launch, the spacecraft will separate from the upper stage and continue its "barbeque roll" to keep the spacecraft thermally stable until it can obtain an inertial reference and transition to 3-axis control. While rolling, the best antenna for communicating with Earth will alternate between the two low-gain antennas (LGAs), and without an inertial reference, the spacecraft would not know when to switch the signal from one antenna to the other. Moreover, once the spacecraft establishes inertial reference and 3-axis control, it must point the HGA toward the Sun to protect thermally sensitive spacecraft components. This attitude places the Earth in a location that is nearly 90° away from the boresights of the LGAs. Having little to no capability to assess spacecraft health and command the spacecraft in the event of a problem during such a critical phase of the mission is an extreme concern for operations planning.
Since this operability issue was caught early, spacecraft engineers have examined different solutions, including moving the locations of the LGAs on the spacecraft or adjusting the attitude strategy (as long as that strategy is also thermally safe). Some additional examples from the Clipper project where simulations and subsequent analyses have contributed to operability are provided in the following list:

(1) Timelines produced by simulation runs showed the need for automation in the maneuver design process due to the criticality and short turnaround times of orbit trim maneuvers (OTMs).

(2) Integrated plans scheduled by APGen showed how to reconcile a number of conflicting activities during the flyby period, which should reduce the need for manual negotiation and activity planning.

(3) Analysis of the positions of bright bodies like the Sun and Jupiter relative to the spacecraft given its orientation over the mission revealed there were no good locations to place instrument radiators. This led to instrument design selections that included cryocoolers, thus avoiding operational mitigations requiring an adjustment of attitude to prevent over-heating the instrument (such mitigations were required on the Cassini mission).


(4) Simulations provided evidence that the post-flyby sorting of recorded science data required by the proposed BDS design could be accommodated with minimal impact to science opportunities and data playback [14].

5.7 Requirements Verification

Ensuring the concept of operations and the trajectory design can meet the science needs of the mission is one of the primary charters of mission designers, and mission-level simulations have proven to be an essential ingredient toward achieving this goal. After agreeing on the high-level scientific objectives of the mission, scientists quantified the amount and types of data each instrument would need to collect to meet these objectives and captured these needs as science measurement requirements. Evaluating achievement of over 100 science measurement requirements is not trivial, especially when coupled with 70+ geometric conditional requirements and a trajectory that is tweaked and redesigned annually. These factors make measurement requirement evaluation by inspection, or by scripts spread across instrument teams, infeasible. A software tool called Verification of Europa Requirements Integrating Tour and Science (VERITaS) was built to aid mission planners and trajectory designers by quickly and automatically evaluating all science measurement requirements based on simulations performed using APGen. VERITaS excels at assessing different trajectories and operations strategies to determine the level to which they meet measurement requirements. Written in MATLAB®, VERITaS analyzes requirement satisfaction and margin against those requirements based on a planned set of instrument observations and a spacecraft attitude profile provided by APGen. With these inputs, VERITaS performs the various computations necessary to check the requirements against the plan, including complex geometrical and instrument surface coverage calculations. VERITaS makes widespread use of NAIF's SPICE toolkit for many of its geometric calculations and frame transformations [7, 11]. The incorporation of SPICE allows standard SPICE kernels containing trajectory, attitude, and frame information used across the project to be ingested.
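In spirit, each such check filters the observation plan by a requirement's geometric conditions and compares the qualifying total against a threshold. The sketch below uses invented field names and a duration-based requirement for illustration; the real VERITaS checks involve surface-coverage geometry computed via SPICE:

```python
def check_requirement(observations, min_total_hours, max_altitude_km, min_sun_elev_deg):
    """Evaluate one illustrative science measurement requirement.

    `observations` is a list of dicts with 'hours', 'altitude_km', and
    'sun_elev_deg' keys (all invented names); qualifying observation time
    must reach `min_total_hours`.
    """
    qualifying = sum(
        o["hours"] for o in observations
        if o["altitude_km"] <= max_altitude_km
        and o["sun_elev_deg"] >= min_sun_elev_deg
    )
    margin = qualifying - min_total_hours
    return {"pass": margin >= 0, "qualifying_hours": qualifying, "margin": margin}
```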
Instrument observations are supplied to VERITaS via a simple time-ordered-list CSV file, and the spacecraft attitude and solar array orientation, along with the independently pointable EIS NAC, are provided as SPICE C-kernels. A flowchart depicting the requirement assessment workflow is shown in Fig. 13. In order to check an observation plan against requirements, VERITaS must also read in the requirements. Science measurement requirements are represented in the format of the Mission Science Traceability Alignment Framework (M-STAF), an Excel workbook that organizes Level-2 measurement requirements by instrument and relates them to corresponding geometry requirements (e.g., altitude, lighting, or ground speed) and science themes (e.g., ocean properties). While the M-STAF is useful as a stand-alone product on the project [16], its unique style of organizing requirements makes it indispensable for interpreting and assessing science measurement requirements.

The Power of High-Fidelity, Mission-Level Modeling …

Fig. 13 Flowchart of inputs and outputs to the Verification of Europa Requirements Integrating Tour and Science (VERITaS) tool for science measurement requirement compliance assessment. The ability for VERITaS to ingest fault timelines in a Monte Carlo simulation is also shown, but is an optional feature [15]

For ingestion into VERITaS, the Excel sheet is reduced to a simple text file with a special syntax that VERITaS knows how to interpret. Once VERITaS has a requirement set and an observation plan, it begins by computing sensor coverage of Europa’s surface for each remote sensing instrument using a purpose-built coverage tool. The coverage tool was built from scratch to accommodate input of SPICE kernels (including instrument attitude C-kernels), input of an observation timeline, and the generation and saving of coverage on a fine temporal scale. The fine temporal scale is necessary to take advantage of VERITaS’s ability to ingest simulated fault timelines (see Sect. 5.8 for more information on the faulted Monte Carlo analysis). VERITaS then churns through each science measurement requirement, applying the necessary geometry and lighting constraints, and determines if and when each requirement is met. All information coupled to each requirement is saved together for future reference, including the science theme, observation technique type, geometry constraints and their requirement IDs, pass/fail status, and the margin with which the requirement is met. Multiple plots are also automatically generated and saved in the process. Figure 14 contains an example visible imaging coverage map and the corresponding accumulation of surface coverage with each additional flyby of Europa over the course of the mission. Figure 15 highlights how this information can be used to provide insight into how much margin exists on a requirement given the current observation plan and trajectory. Analyzing a VERITaS run can be done quickly by looking at an automatically generated Excel sheet and a PDF report full of both summary and detailed statistics and charts. The Excel sheet lists each requirement, provides a short description of it, notes whether it passed and, if so, when it passed, and even color-codes the requirement based on whether it was achieved. The PDF report contains plots and tables for each requirement and is automatically compiled using LaTeX. The plots and tables provide background information and can quickly be incorporated into presentations and papers. As an example, Fig. 16 shows a histogram of how many requirements are met by encounter. Together these output products give the Europa project a quick assessment of which measurement requirements are failing or are difficult to meet. The use of VERITaS in conjunction with APGen simulations, and their integration into the mission planning and trajectory design process, has catalyzed updates to the mission concept of operations, initiated trajectory design tweaks, and, most often, triggered updates to the measurement requirements themselves [15]. The tool has allowed for rapid requirement evaluation, on a near-weekly basis, in the midst of an evolving flight system design and operations concept.

E. W. Ferguson et al.

Fig. 14 Combined global-scale coverage of Europa for the EIS NAC and EIS WAC instruments colored by resolution. Constraints have been applied for altitude, pixel scale, and lighting

Fig. 15 Solid blue line shows the combined EIS NAC and EIS WAC visible imaging surface coverage of Europa over each encounter, cumulatively by encounter; the coverage requirement is met at E30. Coverage accumulations are also shown at different levels of resolution of interest

Fig. 16 Histogram of when each science measurement requirement is predicted to be met, providing insight into margin and robustness
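The core evaluation loop (accumulate coverage from observations that satisfy a geometry constraint, then report pass/fail and margin) can be sketched in a few lines. The data structures, constraint fields, and requirement ID below are illustrative stand-ins, not the actual VERITaS or M-STAF representations, and the real tool is written in MATLAB®, not Python.

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    req_id: str
    theme: str              # e.g., "ocean properties"
    min_coverage_pct: float # required cumulative surface coverage
    max_altitude_km: float  # geometry constraint on each observation

@dataclass
class Observation:
    instrument: str
    coverage_pct: float     # coverage contributed by this observation
    altitude_km: float

def evaluate(req, observations):
    """Accumulate coverage only from observations that meet the geometry
    constraint, then report pass/fail plus margin against the requirement."""
    coverage = sum(o.coverage_pct for o in observations
                   if o.altitude_km <= req.max_altitude_km)
    return {"req_id": req.req_id,
            "passed": coverage >= req.min_coverage_pct,
            "margin": coverage - req.min_coverage_pct}

req = Requirement("L2-IMG-001", "ocean properties", 80.0, 1000.0)
obs = [Observation("NAC", 50.0, 900.0),
       Observation("WAC", 40.0, 800.0),
       Observation("NAC", 30.0, 2000.0)]  # fails the altitude constraint
print(evaluate(req, obs))
# {'req_id': 'L2-IMG-001', 'passed': True, 'margin': 10.0}
```

A real evaluator would replace the scalar `coverage_pct` with gridded surface coverage and apply the full set of geometry and lighting constraints per requirement.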

5.8 Fault Sensitivity Analysis

In addition to assessing a nominal mission, VERITaS can aid in analyzing the effect of fault rate and recovery time on requirement satisfaction and margin by ingesting simulated fault timelines [17]. As depicted in Fig. 13, thousands of simple fault timelines can be ingested by VERITaS and analyzed in a Monte Carlo process, effectively simulating thousands of “missions” with varying fault rates and recovery times.

Fig. 17 Example results of a Monte Carlo faulted VERITaS analysis showing the probability of each measurement requirement being met given the fault rate and recovery assumptions in each column [17]

This capability is especially useful because Europa Clipper’s destination, Jupiter, presents a high-radiation environment. Fault timelines are generated using historical spacecraft safing and transient fault rates along with models of the Jovian radiation environment based on data from Pioneer, Voyager, Galileo, and Juno. The probability of a fault occurring is a function of time and the flight system’s proximity to Jupiter. Still, there is uncertainty in the exact fault rate each instrument and the spacecraft will see in operations, so a range of fault rates has been parametrically assessed to date in order to bound the problem [17]. The goal of this type of analysis is threefold: (1) assess the robustness of the science requirements in the presence of transient faults and safings, (2) inform the design of the recovery time capability of the flight system, and (3) ensure the Tour duration is adequate to achieve the science objectives in the presence of potential disruptions. Thousands of fault timelines have been run through VERITaS to simulate the loss of science data due to these putative transient faults. An example of the resulting information is shown in Fig. 17, which illustrates how the probability of a particular requirement being met can be reported for each assumption about fault rate and recovery time. This has become a powerful tool for steering the design of spacecraft recovery time, operations concepts, and even modifications to the measurement requirements themselves.
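The faulted Monte Carlo idea can be sketched with a toy model in which each flyby’s science is independently lost with some probability. The fault model, flyby counts, and threshold below are illustrative and far simpler than the radiation-environment-driven timelines described above.

```python
import random

def simulate_mission(n_flybys, fault_prob_per_flyby, rng):
    """One synthetic mission: a flyby's science is lost if a transient
    fault or safing occurs during it. Returns the count of good flybys."""
    return sum(rng.random() > fault_prob_per_flyby for _ in range(n_flybys))

def prob_requirement_met(needed_flybys, n_flybys, fault_prob,
                         trials=10000, seed=0):
    """Monte Carlo estimate of the probability that enough flybys survive
    faults for a (coverage-style) requirement to be met."""
    rng = random.Random(seed)
    met = sum(simulate_mission(n_flybys, fault_prob, rng) >= needed_flybys
              for _ in range(trials))
    return met / trials

# Hypothetical requirement: 30 good flybys out of a 45-flyby tour,
# evaluated across a parametric range of fault rates.
for rate in (0.05, 0.15, 0.30):
    p = prob_requirement_met(30, 45, rate)
    print(f"fault rate {rate:.2f}: P(requirement met) = {p:.3f}")
```

Sweeping the fault rate and the recovery-time assumptions (here folded into the per-flyby loss probability) reproduces the column-by-column structure of Fig. 17.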


5.9 Europa Lander Concept

Thanks to the multi-mission infrastructure built up by past projects, the Clipper APGen adaptation was set up much more quickly than its predecessors. Improvements made for Clipper are already proving useful for other mission concepts currently in development, especially Europa Lander. The Europa Lander simulation framework, as described in Ref. [18], resulted directly from the Clipper APGen adaptation. The Carrier and Relay Stage in the 2017 reference Lander mission, which would orbit Europa and provide relay support for the Lander, was directed to have as much heritage hardware as possible from Clipper, so the Clipper adaptation was a natural jumping-off point for the Carrier. Common components, which accounted for about half of the lines of code in the Clipper adaptation, were reorganized into a “Generic” directory from which both adaptations could import code, and which can form a more formal basis for all future APGen adaptations. The main generic components are: command and data handling (CDH), DSN, geometry, GNC, ground station, mission operations system (MOS), power, propulsion, radiation, sequence, solar arrays, telecom, telemetry, and thermal. The planned Psyche mission, whose adaptation will begin soon, should be able to reuse all of these components with minimal modifications, inheriting almost everything necessary to run integrated simulations (with the exception of the project-specific science instrument models).
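The shared-component structure can be sketched as follows: each mission adaptation is assembled from the same generic subsystem components plus its own instrument models. The `build_adaptation` helper and the instrument names are hypothetical; only the generic component list comes from the text above.

```python
# Generic subsystem components shared by all adaptations (from the text).
GENERIC_COMPONENTS = [
    "cdh", "dsn", "geometry", "gnc", "ground_station", "mos",
    "power", "propulsion", "radiation", "sequence",
    "solar_arrays", "telecom", "telemetry", "thermal",
]

def build_adaptation(mission, instrument_models):
    """Assemble a mission adaptation from the generic components plus the
    project-specific science instrument models (illustrative only)."""
    return {"mission": mission,
            "components": GENERIC_COMPONENTS + sorted(instrument_models)}

clipper = build_adaptation("Europa Clipper", ["camera_nac", "camera_wac"])
lander = build_adaptation("Europa Lander", ["lander_cam", "seismometer"])

# Everything except the instrument models is common to both adaptations.
shared = set(clipper["components"]) & set(lander["components"])
print(len(shared))  # 14 generic components shared between adaptations
```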

6 Improving Future Mission Simulations

6.1 Simulation Performance

For the most part, project customers of simulation results and analyses were pleased by how quickly APGen adaptation developers could turn around requests. Nevertheless, there have been a few cases where lengthy simulation run times have impacted decision-making timelines. Full simulation runs including all activity scheduling can take up to 1.5 days to complete. Although multiple simulations can run in parallel, each simulation uses on the order of 40 GB of RAM, so even with several high-powered machines the number of concurrent simulations has been limited. There are several pending enhancements, to both the APGen framework and the adaptation, that together have the potential to improve simulation performance by an order of magnitude or more. After taking a hard look at the core APGen framework, developers determined that one way to increase speed would be to identify as much information as possible at “compile time” (i.e., the moment when APGen parses the adaptation). If more of the information embedded in the adaptation is collected and organized prior to execution, the number of CPU cycles, and consequently the simulation run time, drops substantially. Work is currently ongoing to improve the APGen source code so that it can efficiently parse the adaptation and thoroughly analyze its content before execution. Test cases running a prototyped version of this “faster APGen” on a subset of the Clipper adaptation have already shown a run time improvement by a factor of 3.4.

As the Clipper adaptation builds up its activity plan, it runs a series of activity schedulers, each of which invokes a separate instance of APGen. In order to schedule the next set of activities correctly, based on resource constraints, many of these schedulers must first model the current state of the plan, which involves calculating the effect each activity in the plan has on various resources. In the current adaptation, many of these calculations are unnecessary because they involve resources that turn out not to be needed when evaluating the constraint windows suitable for adding new activities to the plan. In addition, each scheduler often has to recalculate the same resource values even when they have not changed, since each scheduler is a separate APGen process. Additional logic is being introduced into the adaptation so that resource calculations are performed only when necessary. Switches are now provided that turn subsystem models on and off depending on whether resources within those models need to be calculated. Recent updates to the APGen framework also allow specific resource values to be saved into “partial” XML TOLs, which can then be read into subsequent instances of APGen. This functionality is used between many schedulers within the Clipper adaptation and has already reduced run time by more than a factor of 2. There remain plenty of additional opportunities to modify the adaptation to take more advantage of these performance-enhancing capabilities.
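The reuse of unchanged resource calculations between scheduler passes amounts to memoization: if the plan snapshot is unchanged, a later pass can reuse the earlier result instead of recomputing it. The sketch below illustrates the idea with `functools.lru_cache`; it is a stand-in for, not a description of, APGen’s partial-TOL mechanism, and the `Plan` class and cost model are invented.

```python
import functools

class Plan:
    """Hashable snapshot of an activity plan, so unchanged plans compare
    (and hash) identically across scheduler passes."""
    def __init__(self, activities):
        self.activities = tuple(sorted(activities))
    def __hash__(self):
        return hash(self.activities)
    def __eq__(self, other):
        return self.activities == other.activities

CALLS = {"power": 0}  # counts how often the expensive model actually runs

@functools.lru_cache(maxsize=None)
def resource_profile(plan, resource):
    """Expensive resource modeling; cached so a second scheduler pass over
    an unchanged plan reuses the prior result instead of recomputing."""
    CALLS[resource] += 1
    return sum(len(a) for a in plan.activities)  # stand-in for real modeling

p = Plan(["flyby_obs", "downlink"])
resource_profile(p, "power")   # computed once
resource_profile(p, "power")   # cache hit: no recomputation
print(CALLS["power"])          # 1
```

The on/off switches described above correspond to simply never calling `resource_profile` for resources a given scheduler does not need.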

6.2 Transformation of Models, Activities, and Constraints to Simulation Inputs

The vast majority of the time spent developing and refining the Clipper APGen adaptation was devoted to understanding which activities comprise the mission plan, the constraints associated with each activity, and how those activities behave and interact with the rest of the system. Mission planners, who were the primary developers of the adaptation, would gather this information from each subsystem via meetings or documentation that was often vague or incomplete. Next, mission planners would manually create activity definitions and schedulers within APGen based on their interpretation of the gathered information and run a simulation. Simulation results would be reviewed by subsystem experts, and inevitably some of the definitions and scheduling logic would prove inaccurate. This led to one or more iterations between developers (planners) and customers (subsystems) before a satisfactory plan was produced. One way to prevent “lost in translation” issues would be to have subsystems develop their own models, activity definitions, and constraints within the simulation framework. This can be challenging, as learning the APGen DSL is non-trivial and subsystem experts usually do not have the time or resources to devote to building additional system-level models for their subsystem. However, APGen does provide a way to interface with model libraries outside the APGen DSL, and this has been done successfully in the past for both power and telecom models. Another solution would be to have project members describe activities and model behavior in a more standard modeling language that could then be translated automatically into APGen for simulation execution. Since the inception of the project, Europa Clipper has emphasized and invested in Model-Based Systems Engineering (MBSE) practices that employ common modeling languages such as SysML to describe various aspects of the system design [19]. Data captured in these languages serves as a single source of truth (SSoT) for the project that other tools can pull from with the knowledge that they are receiving reliable information [20]. MBSE efforts on Clipper have had success in some areas, but little energy has been devoted to providing a simple way for subsystems to capture behavioral information that can then be easily queried by downstream simulation tools like APGen. Currently, the only information the APGen adaptation pulls directly from the SSoT Europa Clipper “System Model” is the PEL (see Sect. 6.1). Nonetheless, capturing information about activities, constraints, and behaviors using MBSE seems promising and should be pursued more vigorously, as it has the potential to save hours of simulation development time and prevent misunderstandings between teams.
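The proposed translation path can be sketched as follows: a structured export from the system model (stood in for here by a plain dictionary) is mechanically converted into activity-definition text. The field names and the output syntax are hypothetical, representing neither the actual SysML schema nor the APGen DSL.

```python
# A dictionary stands in for a machine-readable export from the project's
# system model (the SSoT); all field names here are illustrative only.
SSOT_EXPORT = {
    "NAC_Image": {"duration_s": 120, "power_w": 35.0,
                  "constraint": "altitude_km <= 1000"},
    "Downlink":  {"duration_s": 28800, "power_w": 80.0,
                  "constraint": "dsn_in_view"},
}

def generate_activity_definition(name, spec):
    """Translate one SSoT activity record into a text block in the style
    of a planning-tool activity definition (hypothetical syntax)."""
    return (f"activity {name} {{\n"
            f"  duration = {spec['duration_s']} s;\n"
            f"  power_load = {spec['power_w']} W;\n"
            f"  scheduling_constraint = \"{spec['constraint']}\";\n"
            f"}}")

for name, spec in SSOT_EXPORT.items():
    print(generate_activity_definition(name, spec))
```

The point of such a generator is that subsystem engineers edit only the SSoT record; the simulation input is regenerated, never hand-maintained.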

6.3 Parameterization and Automation

A single run of a mission simulation represents a single point design with a specific trajectory, spacecraft hardware configuration, and concept of operations. During preliminary design, engineers are challenged to explore the design space in order to find solutions that best meet mission objectives and constraints. One effective means of exploring this space is to run multiple simulations, each of which tweaks different aspects of the mission design. Clipper found success using this method on targeted trade studies, but only a small number of design parameters were available to change, and runs were typically kicked off manually. Building an architecture that easily exposes design parameters as simulation inputs would allow for a more thorough and efficient exploration of the design space. Monte Carlo methods could be applied to automatically run simulations that vary design parameters, in order to gauge which aspects of the design are the most driving. To effectively compare results from the multitude of simulations that would be run, a highly organized and efficient data storage infrastructure would be needed. The Clipper project has begun to tackle this “big data” problem through the use of industry-standard unstructured databases as well as search and data analytics tools. However, much work remains to improve the current infrastructure so it can support storing and analyzing multiple simulation runs on a much larger scale.
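A parameter sweep of the kind envisioned here can be sketched as follows. `run_simulation` is a trivial stand-in for a full mission simulation, and the parameter names, values, and figure of merit are all illustrative.

```python
import itertools
import random

def run_simulation(params, seed):
    """Stand-in for a full mission simulation: returns a toy figure of
    merit (data returned, with some run-to-run dispersion)."""
    rng = random.Random(seed)
    data = params["flybys"] * params["downlink_kbps"] * rng.uniform(0.9, 1.1)
    return {"params": params, "seed": seed, "data_returned": data}

# Exposed design parameters and their candidate values.
design_space = {"flybys": [40, 45, 50], "downlink_kbps": [50, 100]}

results = []
for combo in itertools.product(*design_space.values()):
    params = dict(zip(design_space.keys(), combo))
    for seed in range(3):  # a few Monte Carlo draws per design point
        results.append(run_simulation(params, seed))

best = max(results, key=lambda r: r["data_returned"])
print(best["params"])
```

Tagging every result record with its parameters and seed, as above, is what makes the runs cross-comparable once they land in a shared results database.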


7 Potential Use in Operations

Integrated simulations are expected to play a crucial role in operations for Europa Clipper, and plans are already in the works to leverage as much capability as possible from the Clipper APGen adaptation. In fact, the operations process currently envisioned for Clipper revolves around planning and validating spacecraft behaviors at the activity level. Once planners agree on a course of action, the activities within the plan would automatically decompose into their respective spacecraft commands. Although the operations design is still evolving rapidly, given that the project is in Phase B, some key aspects of the design include: (1) a centralized modeling and simulation environment used to perform all spacecraft and instrument planning and validation, (2) rapid, continuous integration and validation of plans, and (3) highly automated processes to generate, validate, and approve final uplink products (e.g., commands and sequences).

A limitation of many previous mission efforts was that the evolution of the uplink products from plan to sequences went through tool discontinuities as the level of detail increased. This led to an iterative and often error-prone planning process, as the intent and behavior (such as resource usage) captured at the activity level regularly failed to match the delivered sequences. These inconsistencies would only get caught late in the planning process, which meant sequences had to be re-delivered and revalidated within extremely short time frames. With an integrated activity planning tool like APGen, however, an increasing level of detail can be readily accommodated by capturing a more sophisticated algorithmic description within the definition of each activity. This means that the implemented plan is the direct ancestor of the sequence products that eventually get sent to the flight system. Such continuity ensures that the rationale for the planned strategies is not lost as a result of switching between tools in the uplink process.

One mission where the continuity between the plan and uplink products was successfully maintained was Deep Impact. On Deep Impact, APGen was likewise used starting in Phase B to build integrated simulations of the comet Tempel-1 encounter. The APGen adaptation was originally based on the preliminary design, but as actual flight system and flight software capabilities became available, the adaptation evolved from notional activities and design behaviors to actual spacecraft commands and measured behaviors. The commands produced by the adaptation were then validated by running integrated, system-level tests using hardware-in-the-loop test beds.

Even now, in the early stages of the project, Clipper’s APGen adaptation already contains an extraordinary level of detail about the planned spacecraft behavior for the entire mission. Most activities already decompose to the command level, and in some cases those commands are already being vetted and validated by subsystem experts. The current adaptation already has many of the qualities operators are looking for in a planning tool for Clipper. Simply evolving the Clipper adaptation and enhancing its capabilities where needed could significantly reduce cost both during development and in operations. Whether or not the adaptation is used directly, the knowledge embedded in it about the system and how it behaves is readily available and will undoubtedly serve as the foundation for any future operational planning tool.
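The activity-to-command decomposition described above can be sketched as a recursive expansion over an activity tree. The activity and command names below are hypothetical, not actual Clipper commands.

```python
def expand(activity):
    """Recursively decompose an activity tree into the flat command list
    that would eventually be radiated to the flight system."""
    if "commands" in activity:  # leaf: directly commandable
        return list(activity["commands"])
    return [cmd for child in activity["children"] for cmd in expand(child)]

# Hypothetical activity hierarchy for one science observation.
flyby_obs = {
    "name": "EuropaFlybyScience",
    "children": [
        {"name": "PointInstruments",
         "commands": ["GNC_SLEW_TO_TARGET", "GNC_HOLD_ATTITUDE"]},
        {"name": "ImageSurface",
         "commands": ["CAM_POWER_ON", "CAM_TAKE_IMAGE", "CAM_POWER_OFF"]},
    ],
}

print(expand(flyby_obs))
# ['GNC_SLEW_TO_TARGET', 'GNC_HOLD_ATTITUDE',
#  'CAM_POWER_ON', 'CAM_TAKE_IMAGE', 'CAM_POWER_OFF']
```

Because the plan and the command products come from the same tree, validating the plan validates the direct ancestor of what is uplinked, which is the continuity argument made above.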

8 Conclusion

Mission-level simulations have had a profound impact on the Europa Clipper project, the spacecraft, and the mission design. The unprecedented level of detail in the simulations has permitted managers and engineers alike to gain insights into the design and operations of Clipper that would otherwise have gone unnoticed until much later in the project life cycle. Early in the design process, results from simulations gave mission operations engineers “a seat at the table” by providing quantitative evidence of operability concerns with the design. As the project progressed, the scope of simulations only expanded. Since their creation, end-to-end simulations have been used on Clipper daily to support mission planning, trades, hardware design and testing, and requirements development, and even to help prove the mission can meet its bold scientific objectives. Ultimately, direct descendants of these simulations may drive the central planning process during operations. With high-fidelity simulations, the entire mission could be flown thousands, if not millions, of times before the spacecraft ever reaches the launch pad. If these simulations provide sufficient fidelity, run quickly, and produce results that are easily searchable and cross-compared, engineers could efficiently explore a large design space and produce a better design. One could even imagine using automation and computer intelligence techniques in conjunction with simulations to discover the top design candidates. Current simulations on Clipper represent a step in this direction, and future missions should consider adopting mission-level simulations early, thereby advancing their capabilities toward this goal.

Acknowledgements The material in this work originates from a paper presented at the SpaceOps 2018 conference in Marseille, France [21].
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The authors would like to thank Laureano Cangahuala, Nathan Strange, Kelli McCoy, and Dave Mohr for taking the time to provide a thorough technical review of this paper. Also, a special thanks to Adam Roberts for providing additional feedback on grammar and style.

References

1. Wissler, S., Maldague, P. F., Rocca, J., & Seybold, C. (2006). Deep impact sequence planning using multi-mission adaptable planning tools with integrated spacecraft models. In AIAA 9th International Conference on Space Operations (SpaceOps). Rome, Italy. https://doi.org/10.2514/6.2006-5869.
2. Mitchell, A., et al. (2004). MAPGEN: Mixed-initiative planning and scheduling for the Mars Exploration Rover mission. IEEE Intelligent Systems, 19(1), 8–12.
3. McKinnon, W. B., Pappalardo, R. T., & Khurana, K. K. (2009). Europa: Perspectives on an ocean world. In R. T. Pappalardo, et al. (Eds.), Europa (pp. 697–710). Tucson: University of Arizona Press.
4. Pappalardo, R. T., et al. (2015, September). Science objectives for the Europa Clipper mission concept: Investigating the potential habitability of Europa. In European Planetary Science Congress (Vol. 10). EPSC2015-156, Nantes, France.
5. Buffington, B., Campagnola, S., & Petropoulos, A. (2012). Europa multiple-flyby trajectory design. In AIAA/AAS Astrodynamics Specialist Conference. Minneapolis, MN, August 13–16, 2012. https://doi.org/10.2514/6.2012-5069.
6. Lam, T., Buffington, B., Campagnola, S., Scott, C., & Ozimek, M. (2018). A robust mission tour for NASA’s planned Europa Clipper mission. In 2018 Space Flight Mechanics Meeting, AIAA SciTech Forum. Kissimmee, FL, January 8–12, 2018. https://doi.org/10.2514/6.2018-0202.
7. Acton, C. H. (1996). Ancillary data services of NASA’s navigation and ancillary information facility. Planetary and Space Science, 44(1), 65–70.
8. Garrett, H. B., Martinez-Sierra, L. M., & Evans, R. (2015, October). Updating the Jovian proton radiation environment. Pasadena, CA: JPL Publication 15-9, Jet Propulsion Laboratory, National Aeronautics and Space Administration.
9. Maldague, P. F., Wissler, S., Lenda, M., & Finnerty, D. (2014). APGEN scheduling: 15 years of experience in planning automation. In AIAA 13th International Conference on Space Operations (SpaceOps). Pasadena, CA, May 5–9, 2014. https://doi.org/10.2514/6.2014-1809.
10. Cole, B., & Dinkel, K. (2016). Multidisciplinary model transformation through simplified intermediate representations. In IEEE Aerospace Conference. Big Sky, MT, March 5–12, 2016. https://doi.org/10.1109/aero.2016.7500656.
11. Acton, C., Bachman, N., Semenov, B., & Wright, E. (2018). A look toward the future in the handling of space science mission geometry. Planetary and Space Science, 150, 9–12. https://doi.org/10.1016/j.pss.2017.02.013.
12. Semenov, B. V. (2018). WebGeocalc and Cosmographia: Modern tools to access OPS SPICE data. In AIAA 15th International Conference on Space Operations (SpaceOps). Marseille, France. https://doi.org/10.2514/6.2018-2366.
13. Oaida, B., Lewis, K., Ferguson, E., Day, J., & McCoy, K. (2018). A statistical approach to payload energy management for NASA’s Europa Clipper mission. In IEEE Aerospace Conference. Big Sky, MT, March 3–10, 2018.
14. Signorelli, J., Bindschadler, D. L., Schimmels, K. A., & Huh, S. M. (2018). Operability engineering for the Europa Clipper mission: Formulation phase results and lessons. In AIAA 15th International Conference on Space Operations (SpaceOps). Marseille, France. https://doi.org/10.2514/6.2018-2629.
15. Buffington, B., et al. (2017). Evolution of trajectory design requirements on NASA’s planned Europa Clipper mission. In 68th International Astronautical Congress (IAC), IAC-17-C1.7.8. Adelaide, Australia, September 25–29, 2017.
16. Susca, S., Jones-Wilson, L. L., & Oaida, B. V. (2017). A framework for writing measurement requirements and its application to the planned Europa mission. In IEEE Aerospace Conference. Big Sky, MT, March 4–11, 2017. https://doi.org/10.1109/aero.2017.7943689.
17. McCoy, K., et al. (2018). Assessing the science robustness of the Europa Clipper mission: Science sensitivity model. In IEEE Aerospace Conference. Big Sky, MT, March 3–10, 2018.
18. Lawler, C. R., Wissler, S. S., Kulkarni, T., Ferguson, E. W., & Maldague, P. F. (2018). Europa lander concept: High fidelity system modeling informing flight system and concept of operations years before launch. In AIAA 15th International Conference on Space Operations (SpaceOps). Marseille, France. https://doi.org/10.2514/6.2018-2413.
19. Bayer, T. J., et al. (2012). Model based systems engineering on the Europa mission concept study. In IEEE Aerospace Conference. Big Sky, MT, March 3–10, 2012. https://doi.org/10.1109/aero.2012.6187337.
20. Dubos, G. F., Coren, D. P., Kerzhner, A., Chung, S. H., & Castet, J. (2016). Modeling of the flight system design in the early formulation of the Europa project. In IEEE Aerospace Conference. Big Sky, MT, March 5–12, 2016.
21. Ferguson, E. W., Wissler, S. S., Bradley, B. K., Maldague, P. F., Ludwinski, J. M., & Lawler, C. R. (2018). Improving spacecraft design and operability for Europa Clipper through high-fidelity, mission-level modeling and simulation. In AIAA 15th International Conference on Space Operations (SpaceOps). Marseille, France. https://doi.org/10.2514/6.2018-2469.

Attitude Control Optimization of a Two-CubeSat Virtual Telescope in a Highly Elliptical Orbit Reza Pirayesh, Asal Naseri, Fernando Moreu, Steven Stochaj, Neerav Shah and John Krizmanic

Abstract This paper investigates a novel approach for attitude control of two satellites acting as a virtual telescope. The Virtual Telescope for X-Ray Observations (VTXO) is a mission exploiting two 6U CubeSats operating in precision formation. The goal of the VTXO project is to develop a space-based X-ray imaging telescope with high angular resolution. In this scheme, one CubeSat carries a diffractive lens and the other carries an imaging device, supporting a focal length of 100 m. The attitude control algorithms are required to keep the two spacecraft aligned with the Crab Nebula during observations. To meet this goal, attitude measurements from the gyros and the star trackers are fused in an extended Kalman filter that feeds a robust hybrid controller, and the energy consumption and accuracy of the attitude control are optimized for this mission using neural networks and a multi-objective genetic algorithm.

Nomenclature

i      Inclination
Ω      Right ascension of the ascending node
ω      Argument of perigee
a      Semimajor axis

R. Pirayesh · F. Moreu University of New Mexico, Albuquerque, NM 87131, USA A. Naseri (B) Space Dynamics Laboratory, Logan, UT 84321, USA e-mail: [email protected] S. Stochaj New Mexico State University, Las Cruces, NM 88003, USA N. Shah NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA J. Krizmanic University of Maryland, Baltimore County, Baltimore, MD 21250, USA © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_11


e      Eccentricity
f      True anomaly
h      Angular momentum about the center of mass
G      Gravitational constant
M      Mass of the Earth
r      Distance between the Earth and satellite
I      Moment of inertia
Rbo    Rotation matrix
Fi     Inertial reference frame
Fo     Orbital reference frame
Fb     Body reference frame
T      Torque
P      Proportional gain
D      Derivative gain

1 Introduction

Formation flying involves two or more spacecraft whose dynamic states are coupled through a common control law. In particular, at least one member of the set must (1) track a desired state relative to another member, and (2) do so with a tracking control law that depends, at minimum, on the state of that other member. Formation flying, a key factor in spacecraft formations and rendezvous, is investigated in many space missions, including virtual telescopes. The role of formation flying is to keep the spacecraft accurately aligned so that they can perform a specific mission. The European Space Agency (ESA) has been developing a mission, PROBA-3 [1, 2], that enables pointing toward the Sun through a formation in a highly elliptical orbit; this formation keeps the relative position error on the order of millimeters and the attitude angular error on the order of arcseconds. SIMBOL-X [3], an ESA mission, is a virtual X-ray telescope with 10-arcsecond accuracy. Other virtual telescope missions are the X-ray Milli-Arc-Second Structure Imager (MASSIM) [4] and the New Worlds Observer (NWO) [5] exoplanet mission. Calhoun [6] investigated formation flying with noise in the measurements. Woffinden [7] and Okasha [8] investigated the guidance, navigation, and control (GNC) problem for orbital rendezvous with noisy measurements and used an extended Kalman filter (EKF) to reduce the noise. Since the 2012 publication of the National Research Council study NASA Space Technology Roadmaps and Priorities, X-ray optical systems have been listed as a “game-changing” technology: X-rays are linked to the universe’s highest-energy phenomena, they are capable of producing images of greater resolution, and, when used in space missions, they provide the ability to understand physical processes at a greater level.
This paper investigates a novel approach to formation flying of satellites applied to the future Virtual Telescope for X-ray Observations (VTXO) mission. The VTXO mission is a collaboration between two educational institutions (NMSU, UNM) and NASA GSFC and supports NASA’s Space Technology Mission Directorate (STMD) and Science Mission Directorate (SMD). VTXO is a sub-arcsecond-resolution X-ray telescope that utilizes two 6U CubeSats flying in formation in highly elliptical orbits [10–12]. The two CubeSats carry a phased Fresnel lens (with a focal length in excess of 100 m [9]) and an X-ray detector, mounted on a leader and a follower, respectively. These two vehicles are designed to fly in a formation approximating a rigid telescope. The main objective of the mission is to investigate technologies that will enable a next-generation X-ray telescope. The mission requires very precise alignment and attitude determination (sub-arcsecond to milli-arcsecond) to enable imaging at a higher quality than currently available. This will be made possible through relative motion and attitude control methods that enable formation flying of CubeSats. The mission design for VTXO calls for the two vehicles to hold a rigid formation near apogee, during which time the two spacecraft perform scientific observations for a short period (1–3 h). While away from apogee, the two vehicles reposition themselves for the next iteration of the observations. Hence, each orbit consists of three major phases: the formation stabilization phase, the development phase, and the scientific phase. The high-precision alignment requirements for the mission call for precise knowledge of both spacecraft’s positions relative to one another. The second aspect of formation flying is attitude determination and control. In the formation stabilization phase, the CubeSats are stabilized as they pass perigee and transition into the next orbit phase; only a torque counteracting the gravity gradient is applied to the satellites, to limit the drift of the angular velocities away from zero. In the development phase, coarse attitude control is applied to provide enough attitude accuracy for the scientific phase.
In the scientific phase, precision attitude control takes place: the two satellites point at the Crab Nebula for two hours. To meet this goal, three main design steps are taken based on the desired period and the angular precision the satellites must maintain during formation. These steps are designing the orbits and the corresponding phases within each of them, the control algorithms, and the filter that reduces the sensor noise. The attitude control is based on quaternion models of the two satellites. In this model, different sources of noise and disturbance are included [7]: the space environment (gravity gradient torques, random accelerations, the J2 gravity model, and torques accounting for drag, solar pressure, higher-order gravity terms, etc.), the measurement sensors, and the actuator torques. The attitude control and EKF designs account for the noise of the IMU and the star tracker, and the navigation part of the control system uses the EKF to estimate the angles and angular velocities of the satellites from the noisy sensor data. Unknown initial conditions and noise in the dynamical system lead to different errors and energy consumption values for the same controller parameters. This is not acceptable, since the spacecraft are power limited and the goal of the mission is to obtain the least error. As a result, there are two objective functions to minimize: the energy consumption and the error. A heuristic optimization method, the multi-objective genetic algorithm, is used to find the controllers' optimal parameters in the development phase and the scientific phase. The initial conditions are not known in the development phase. Similarly, different ratios of total errors to total


energy consumption are desired for each mission. Consequently, the authors used a neural network to estimate the optimal controllers’ parameters based on the initial quaternion, initial velocity, and different ratios of errors to energy consumption.

2 Orbit Design

The orbits with respect to the Crab Nebula are shown in Fig. 1. The orbits of the follower and the leader are designed based on the position of the Crab Nebula. The orbits are placed in the same plane, and both apogees lie on the line connecting the Crab Nebula to the center of the Earth, so that the satellites have more time to observe the Crab Nebula. The Crab Nebula's right ascension and declination are 5 h 34 min 31.94 s and 22°, respectively. The orbits have the same right ascension of the ascending node, argument of perigee, and inclination. In addition, the orbits must have the same semimajor axis in order to have the same period. The only difference between the orbits is their eccentricity. The leader and the follower are both in geostationary transfer orbits. The eccentricity of the follower's orbit is designed to include a 6-min buffer between the times the follower and the leader pass the point where the orbits intersect, avoiding a collision between the satellites. A larger difference between the eccentricities results in a lower risk of collision, since the satellites would have larger relative distances, but it also increases the energy needed to hold the desired 100-m relative distance, because the separation between the two apogees grows. The orbital elements are given in Table 1.

Fig. 1 Orbits

Table 1 Orbital elements

              i, rad    Ω, rad    ω, rad    a, km     e
    Follower  0.34      0         4.6743    24,320    0.7125
    Leader    0.34      0         4.6743    24,320    0.7336
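The numbers in Table 1 can be sanity-checked numerically. The plain-Python sketch below (the Earth GM constant and helper names are our assumptions, not from the paper) computes the common orbital period implied by the shared semimajor axis, and the apogee-radius gap produced by the two eccentricities:

```python
import math

MU = 398600.4418  # assumed Earth GM, km^3/s^2

def period_s(a_km):
    """Orbital period from the semimajor axis (Kepler's third law)."""
    return 2.0 * math.pi * math.sqrt(a_km**3 / MU)

def apogee_km(a_km, e):
    """Apogee radius a*(1 + e)."""
    return a_km * (1.0 + e)

a = 24320.0                      # km, identical for both spacecraft (Table 1)
T_hours = period_s(a) / 3600.0   # same period for leader and follower
gap = apogee_km(a, 0.7336) - apogee_km(a, 0.7125)  # apogee separation, km
```

With these values the shared period is roughly ten and a half hours, and the two apogees differ by a few hundred kilometers, which is the separation the relative-distance controller has to work against.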


3 Modeling the Satellites

This research modeled the system in four steps: (A) modeling the system with Euler angles: the dynamics are needed in terms of Euler angles, since the desired angular velocity and the desired angular acceleration are known as functions of the Euler angles; (B) modeling the system with quaternions: the satellite dynamics, including the system noise, are derived in terms of quaternions; (C) sensor fusion: the EKF is added to the system of equations to (1) reduce the sensor noise and (2) reach the desired resolution; and (D) learning and optimization: a multi-objective genetic algorithm and a neural network are applied to the system to optimize energy and errors.

3.1 Modeling the System with Euler Angles

The Earth coordinate frame is the Earth-centered inertial (ECI) frame, and the frames used for the satellites are the Local-Vertical-Local-Horizontal (LVLH) frames. The attitude dynamics of the two satellites are derived in terms of the Euler angles. The rotational equation of motion for each satellite as a rigid body in space is

$$I\dot{\boldsymbol{\omega}}^{bi}_b = -\boldsymbol{\omega}^{bi}_b \times I\boldsymbol{\omega}^{bi}_b + \frac{3GM}{r^3}\,\boldsymbol{o}_3 \times I\boldsymbol{o}_3 + \boldsymbol{\tau} \quad (1)$$

τ is the input torque generated by the reaction wheels and random space noises, which are gravity gradient torques, random accelerations, the J2 gravity model, and torques accounting for drag, solar pressure, higher-order gravity terms, etc. In the LVLH frame, the vector $\boldsymbol{o}_3$ is the nadir vector, i.e., $\boldsymbol{o}_3 = -\boldsymbol{r}/r$, which is the third column of the rotation matrix $R_{bo}$. $R_{bo}$ represents the rotation from $F_o$, the orbital frame, to $F_b$, the body-fixed frame. $\boldsymbol{o}_1$ points in the direction of the velocity vector, and $\boldsymbol{o}_2$ completes the right-handed triad. If the 1-2-3 rotation sequence from $F_o$ to $F_b$ is chosen, then the gravity gradient torque is

$$R_{bo} = R_3(\theta_3)R_2(\theta_2)R_1(\theta_1)$$

$$\boldsymbol{g}_b = \frac{3GM}{r^3}\begin{bmatrix}(I_3 - I_2)\,c_1 c_2\,(s_1 c_3 + c_1 s_2 s_3)\\ (I_3 - I_1)\,c_1 c_2\,(s_1 s_3 + c_1 s_2 c_3)\\ (I_2 - I_1)\,(s_1 s_3 - c_1 s_2 c_3)(s_1 c_3 + c_1 s_2 s_3)\end{bmatrix} \quad (2)$$

where "c" represents the cosine of a rotation, "s" the sine, and the subscripts the axis of rotation. $\boldsymbol{\omega}^{bi}_b$ is the angular velocity of $F_b$ with respect to $F_i$, obtained from

$$\boldsymbol{\omega}^{bi}_b = \boldsymbol{\omega}^{bo}_b + \boldsymbol{\omega}^{oi}_b \quad (3)$$


If we consider small angles, assume the orbits to be circular, and assume free motion without any torque on the satellites, the set of linearized equations used in many references is [9]

$$\begin{aligned} I_1\ddot{\theta}_1 + (I_2 - I_3 - I_1)\,\omega_c\dot{\theta}_3 - 4(I_3 - I_2)\,\omega_c^2\theta_1 &= 0\\ I_2\ddot{\theta}_2 + 3(I_1 - I_3)\,\omega_c^2\theta_2 &= 0\\ I_3\ddot{\theta}_3 + (I_3 + I_1 - I_2)\,\omega_c\dot{\theta}_1 + (I_2 - I_1)\,\omega_c^2\theta_3 &= 0\\ \omega_c &= \sqrt{\frac{GM}{r^3}} \end{aligned} \quad (4)$$

As the phrase "linearized equations" suggests, these equations are only valid for small Euler angles. Thus, the nonlinear equations are developed:

$$\begin{aligned} \boldsymbol{\omega}^{oi}_b &= R_{bo}\,\boldsymbol{\omega}^{oi}_o, \qquad \boldsymbol{\omega}^{oi}_o = \begin{bmatrix}0\\ -\dot{f}\\ 0\end{bmatrix}\\ \dot{f} &= \frac{h}{r^2}, \qquad h = \sqrt{p\mu}, \qquad p = a(1 - e^2)\\ \ddot{f} &= \frac{2\mu^2}{h^3}\,(1 + e\cos f)\,(-e\dot{f}\sin f) \end{aligned} \quad (5)$$

Hence, the angular velocity and angular acceleration, using Eqs. (3) and (5), are

$$\begin{aligned} \dot{\boldsymbol{\omega}}^{bi}_b &= M\ddot{\boldsymbol{\theta}} + \boldsymbol{G}(\theta_1,\theta_2,\theta_3,\dot{\theta}_1,\dot{\theta}_2,\dot{\theta}_3,\dot{f}) + \ddot{f}\,\boldsymbol{b}(\theta_1,\theta_2,\theta_3)\\ \boldsymbol{\omega}^{bi}_b &= \begin{bmatrix}c_2 c_3\dot{\theta}_1 + s_3\dot{\theta}_2\\ -c_2 s_3\dot{\theta}_1 + c_3\dot{\theta}_2\\ s_2\dot{\theta}_1 + \dot{\theta}_3\end{bmatrix} - \dot{f}\begin{bmatrix}s_1 s_2 c_3 + c_1 s_3\\ c_1 c_3 - s_1 s_2 s_3\\ -s_1 c_2\end{bmatrix}\\ \boldsymbol{\theta} &= \begin{bmatrix}\theta_1\\ \theta_2\\ \theta_3\end{bmatrix}\\ M &= \begin{bmatrix}c_2 c_3 & s_3 & 0\\ -c_2 s_3 & c_3 & 0\\ s_2 & 0 & 1\end{bmatrix} \qquad \boldsymbol{b} = \begin{bmatrix}-(s_1 s_2 c_3 + s_3 c_1)\\ s_1 s_2 s_3 - c_3 c_1\\ s_1 c_2\end{bmatrix} \end{aligned} \quad (6)$$
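A property of the kinematics matrix M worth noting: its determinant reduces to cos θ₂, so the matrix loses rank at θ₂ = π/2. The plain-Python sketch below (helper names are ours) builds M for the 1-2-3 sequence and checks this:

```python
import math

def kinematics_matrix(theta):
    """M(theta) mapping Euler-angle rates to body angular velocity
    for the 1-2-3 rotation sequence (Eq. 6)."""
    _, t2, t3 = theta
    c2, s2 = math.cos(t2), math.sin(t2)
    c3, s3 = math.cos(t3), math.sin(t3)
    return [[ c2 * c3, s3, 0.0],
            [-c2 * s3, c3, 0.0],
            [ s2,      0.0, 1.0]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# det M = c2*c3*c3 + s3*c2*s3 = cos(theta_2): singular at theta_2 = pi/2,
# which is why the design switches to quaternions for the controller.
d_generic  = det3(kinematics_matrix([0.3, 0.4, 0.5]))
d_singular = det3(kinematics_matrix([0.0, math.pi / 2, 0.0]))
```

This singularity is exactly the reason, discussed below Eq. (8), that the quaternion model is used for analysis and control.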


We use $\boldsymbol{\omega}^{bi}_b$ and $\dot{\boldsymbol{\omega}}^{bi}_b$ from Eq. (6) to find the equations of motion. Thus, the final equations are

$$\ddot{\boldsymbol{\theta}} = M^{-1}\left[-\boldsymbol{G}(\dot{\theta}_1,\dot{\theta}_2,\dot{\theta}_3,\theta_1,\theta_2,\theta_3,\dot{f}) - \ddot{f}\,\boldsymbol{b}(\theta_1,\theta_2,\theta_3) + I^{-1}\left(-\boldsymbol{\omega}^{bi}_b \times I\boldsymbol{\omega}^{bi}_b + \frac{3GM}{r^3}\,\boldsymbol{o}_3 \times I\boldsymbol{o}_3 + \boldsymbol{\tau}\right)\right] \quad (7)$$

$$\ddot{\boldsymbol{\theta}} = \boldsymbol{N}(\dot{\theta}_1,\dot{\theta}_2,\dot{\theta}_3,\theta_1,\theta_2,\theta_3,\dot{f},\ddot{f}) + M^{-1}I^{-1}\boldsymbol{\tau} \quad (8)$$

Since the relative position controller acts on the leader, the leader is not on its natural orbit. Hence, $f_L$, $\dot{f}_L$, and $\ddot{f}_L$ are unknown in Eq. (5) (subscripts "L" and "F" denote the leader and the follower satellites, respectively). However, the distance d between the satellites is known, and during formation control the deviation of the relative distance from 100 m is assumed negligible. Since the line connecting the satellites is parallel to the line connecting the Crab Nebula and the Earth, a triangle can be formed to find $f_L$, $\dot{f}_L$, and $\ddot{f}_L$:

$$f_L = \arcsin\!\left(\frac{d\sin f_F}{r_L}\right) + f_F \quad (9)$$

$f_L$ is close to $f_F$, since $r_L$ is much larger than the distance between the satellites; nevertheless, the $\arcsin$ term is kept in the equations for accuracy. The rate $\dot{f}_L$ is

$$\dot{f}_L = \dot{f}_F\left[\frac{d}{r_L}\left(\sin f_F - e_F\cos f_F\right) + 1\right] \approx \dot{f}_F \quad (10)$$

and $\ddot{f}_L \approx \ddot{f}_F$. In Eq. (7), the matrix M is singular at $\theta_2 = \pi/2$ rad. Therefore, Eq. (8) cannot be used to analyze and control the satellites; quaternions are used instead, since they do not suffer from this singularity. However, Eq. (6) is still used to find the desired angular velocity and angular acceleration, which appear later in the controller design.
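Equation (9) is easy to evaluate numerically. The sketch below uses illustrative values of our own choosing (a 100-m baseline and a leader radius of 40,000 km) to show how small the offset between $f_L$ and $f_F$ actually is:

```python
import math

def leader_true_anomaly(f_follower, d_km, r_leader_km):
    """Eq. (9): leader true anomaly from the follower's, assuming the
    baseline stays parallel to the Earth-Crab Nebula line."""
    return math.asin(d_km * math.sin(f_follower) / r_leader_km) + f_follower

f_F = 1.0                                       # rad, example follower anomaly
f_L = leader_true_anomaly(f_F, 0.1, 40000.0)    # 100 m baseline near apogee
offset = f_L - f_F                              # on the order of microradians
```

The microradian-level offset confirms the text's remark that $f_L \approx f_F$, while keeping the arcsin term costs nothing computationally.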

3.2 Modeling the System with Quaternions

The quaternion $\boldsymbol{q}^{bI}$ represents the orientation of the body frame with respect to the inertial frame, expressed in the body frame. $R_{bI}$ is the matrix transformation from the inertial frame to the body frame in terms of this orientation. Hence, the attitude dynamics model in terms of quaternions is

$$\dot{\boldsymbol{q}}^{bI} = \frac{1}{2}E\!\left(\boldsymbol{q}^{bI}\right)\boldsymbol{\omega}^{bI}_b \quad (11)$$

$$\boldsymbol{q} = \begin{bmatrix}q_1\\ q_2\\ q_3\\ q_4\end{bmatrix} \qquad E(\boldsymbol{q}) = \begin{bmatrix}q_4 I_{3\times 3} + [\boldsymbol{q}_{1:3}\times]\\ -\boldsymbol{q}_{1:3}^T\end{bmatrix} \quad (12)$$

$$\dot{\boldsymbol{\omega}}^{bI}_b = J^{-1}\left(-[\boldsymbol{\omega}^{bI}_b\times]\,J\boldsymbol{\omega}^{bI}_b + \boldsymbol{\tau}\right) \quad (13)$$

$$\boldsymbol{\tau} = \boldsymbol{\tau}_{in} + \boldsymbol{\tau}_g + \boldsymbol{\tau}_d, \qquad \boldsymbol{\tau}_g = \frac{3GM}{r^3}\,\boldsymbol{o}_3 \times I_b\boldsymbol{o}_3 \quad (14)$$
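The quaternion kinematics of Eq. (11) can be sketched in a few lines of plain Python. The code below uses the scalar-last convention of Eq. (12) and a forward-Euler integrator with renormalization, chosen for brevity rather than fidelity; all names are ours:

```python
import math

def qdot(q, w):
    """Eq. (11): qdot = 0.5 * E(q) * w, scalar-last quaternion [q1,q2,q3,q4]."""
    q1, q2, q3, q4 = q
    wx, wy, wz = w
    # E(q) = [q4*I + [q_{1:3} x]; -q_{1:3}^T], expanded row by row
    return [0.5 * ( q4 * wx - q3 * wy + q2 * wz),
            0.5 * ( q3 * wx + q4 * wy - q1 * wz),
            0.5 * (-q2 * wx + q1 * wy + q4 * wz),
            0.5 * (-q1 * wx - q2 * wy - q3 * wz)]

def propagate(q, w, t, dt=1e-4):
    """Forward-Euler propagation with renormalization (illustrative only)."""
    for _ in range(int(round(t / dt))):
        dq = qdot(q, w)
        q = [qi + dt * dqi for qi, dqi in zip(q, dq)]
        n = math.sqrt(sum(qi * qi for qi in q))
        q = [qi / n for qi in q]
    return q

# Spinning about the body z-axis at 0.1 rad/s for 1 s rotates by 0.1 rad,
# so half that angle appears in the quaternion's vector part.
q_end = propagate([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.1], 1.0)
```

A real implementation would use a higher-order integrator or the closed-form quaternion exponential, but the half-angle structure is already visible here.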

The term $\boldsymbol{\tau}_d$ corresponds to disturbances in the space environment, including gravity gradient torques, random accelerations, the J2 gravity model, and torques accounting for drag, solar pressure, higher-order gravity terms, etc. It is modeled as uncorrelated white noise with the autocorrelation function

$$E\left[\boldsymbol{\tau}_d(t)\,\boldsymbol{\tau}_d(t')^{*}\right] = \sigma_{\dot{\omega}}\,I_{3\times 3}\,\delta(t - t') \quad (15)$$

The variance is defined by a trial and error process outlined by Lear [14]. The term τ in corresponds to the controller input so that the satellites’ quaternions reach the desired quaternions.

3.3 Sensors

The gyro measures the satellites' angular velocity, and the star tracker measures the satellites' orientation. The gyro model is

$$\tilde{\boldsymbol{\omega}} = \delta R(\boldsymbol{\varepsilon}_\omega)\left[\{I_{3\times 3} + \mathrm{diag}(\boldsymbol{f}_\omega)\}\boldsymbol{\omega} + \boldsymbol{b}_\omega + \boldsymbol{w}_\omega\right] \quad (16)$$

The superscript ~ indicates a measurement, ω is the angular velocity of the satellite, $\boldsymbol{\varepsilon}_\omega$ is the misalignment, $\boldsymbol{f}_\omega$ contains the scale-factor biases, $\boldsymbol{b}_\omega$ is the bias, and $\boldsymbol{w}_\omega$ is white noise with covariance

$$E\left[\boldsymbol{w}_\omega(t)\,\boldsymbol{w}_\omega(t')^T\right] = B\,\delta(t - t'), \qquad B = \sigma_g^2 I_{3\times 3} \quad (17)$$
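The per-axis content of the gyro model in Eq. (16) can be sketched as follows. The misalignment rotation $\delta R(\boldsymbol{\varepsilon}_\omega)$ is deliberately omitted, and all names and default values are our assumptions:

```python
import random

def gyro_measure(w_true, scale=(0.0, 0.0, 0.0), bias=(0.0, 0.0, 0.0),
                 sigma=0.0, rng=random):
    """Per-axis part of Eq. (16): (1 + f_w) * w + b_w + white noise.
    The misalignment rotation dR(eps_w) is left out for brevity."""
    return [(1.0 + f) * w + b + sigma * rng.gauss(0.0, 1.0)
            for w, f, b in zip(w_true, scale, bias)]

# With the noise term off, only the scale factor and bias corrupt the truth:
w_meas = gyro_measure([0.1, 0.0, -0.2],
                      scale=(0.01, 0.0, 0.0),   # 1% scale error on x
                      bias=(0.0, 0.002, 0.0))   # 2 mrad/s bias on y
```

Setting `sigma > 0` adds the white-noise term of Eq. (17); the EKF described later estimates the bias term online.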


$\boldsymbol{b}_\omega$ is defined as

$$\dot{\boldsymbol{b}}_\omega = -\frac{\boldsymbol{b}_\omega}{\tau_b} + \boldsymbol{w}_b \quad (18)$$

where $\boldsymbol{w}_b$ is white noise with variance $\sigma_b^2$. The star tracker model is [8]

$$\tilde{\boldsymbol{q}}^{sI} = \delta\boldsymbol{q}(\boldsymbol{\upsilon}_s) \otimes \delta\boldsymbol{q}(\boldsymbol{\varepsilon}_s) \otimes \boldsymbol{q}^{sb} \otimes \boldsymbol{q}^{bI} \quad (19)$$

where $\boldsymbol{\upsilon}_s$ is sensor white noise with covariance C, and $\boldsymbol{\varepsilon}_s$ is the misalignment, defined as

$$\dot{\boldsymbol{\varepsilon}}_s = -\frac{\boldsymbol{\varepsilon}_s}{\tau_s} + \boldsymbol{w}_s \quad (20)$$

where $\boldsymbol{w}_s$ is white noise with variance $\sigma_s^2$. The star tracker model can also be represented in terms of its states as

$$\tilde{\boldsymbol{z}}_s = R_{sb}\boldsymbol{\theta}_b + \boldsymbol{\varepsilon}_s + \boldsymbol{\upsilon}_s \quad (21)$$

Here, the subscript "b" represents the body frame of the satellites and "s" the body frame of the star tracker. $\boldsymbol{\theta}_b$ is obtained from the relationship

$$\delta\boldsymbol{q}^{bI} = \hat{\boldsymbol{q}}^{bI+} \otimes \left(\hat{\boldsymbol{q}}^{bI-}\right)^{-1} \quad (22)$$

The superscript "+" represents the value after the filter estimation (discussed further in the navigation section), "−" the value before the estimation, and ∧ an estimate. For small rotations, the following holds:

$$\delta\boldsymbol{q} \approx \begin{bmatrix}\boldsymbol{\theta}/2\\ 1\end{bmatrix} \quad (23)$$

3.4 Actuators

The torques required to control the attitude of the satellites are applied by the reaction wheels. The angular momentum of the reaction wheels evolves as

$$\dot{\boldsymbol{h}} = -[\boldsymbol{\omega}\times]\boldsymbol{h} - \boldsymbol{\tau}_{in} \quad (24)$$
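Equation (24) as code, a plain-Python sketch with function names of our own choosing:

```python
def wheel_momentum_rate(h, w, tau_in):
    """Eq. (24): hdot = -[w x] h - tau_in for the reaction-wheel momentum.
    h: wheel angular momentum, w: body rate, tau_in: commanded torque."""
    hx, hy, hz = h
    wx, wy, wz = w
    cross = [wy * hz - wz * hy,   # [w x] h, written out component-wise
             wz * hx - wx * hz,
             wx * hy - wy * hx]
    return [-c - t for c, t in zip(cross, tau_in)]

# With the body at rest, the wheels absorb exactly the commanded torque:
hdot = wheel_momentum_rate([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1e-3, 0.0, 0.0])
```

The gyroscopic term $-[\boldsymbol{\omega}\times]\boldsymbol{h}$ only matters once the wheels have stored momentum while the body is rotating.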


Fig. 2 GNC schematic

As a result, $\dot{\boldsymbol{h}}$ is the wheel torque applied to the satellites and $\boldsymbol{\tau}_{in}$ is the commanded control input. The reaction wheels generate torque for a commanded desired torque as

$$\boldsymbol{\tau}_{in} = \delta R(\boldsymbol{\varepsilon}_\tau)\left[\{I_{3\times 3} + \mathrm{diag}(\boldsymbol{f}_\tau)\}\hat{\boldsymbol{\tau}}_{desired} + \boldsymbol{b}_\tau + \boldsymbol{w}_\tau\right] \quad (25)$$

where $\boldsymbol{\varepsilon}_\tau$ is the misalignment, $\boldsymbol{f}_\tau$ the scale-factor bias, $\boldsymbol{b}_\tau$ the bias, $\boldsymbol{w}_\tau$ white noise, and $\hat{\boldsymbol{\tau}}_{desired}$ the desired commanded torque.

3.5 GNC Design

The goal of the guidance, navigation, and control (GNC) subsystem is first to define a desired trajectory for the system (in our case, the attitude of the satellites) and then to control the system efficiently toward this desired trajectory, given that the sensors are noisy. The GNC model drives the system to the desired values (Fig. 2).

1. Guidance

Each satellite is controlled separately, so there is no data fusion between the satellites. During the scientific phase, each satellite keeps its orientation parallel to the line connecting the center of the Earth and the Crab Nebula for two hours. Accordingly, the desired quaternion for both satellites is constant:

$$\boldsymbol{q}^{bI} = \begin{bmatrix}-0.5591\\ 0.0158\\ -0.0106\\ 0.8289\end{bmatrix}$$

and the corresponding Euler angles for the 1-2-3 rotation sequence are




$$\begin{bmatrix}\theta_1\\ \theta_2\\ \theta_3\end{bmatrix} = \begin{bmatrix}-2.0235^\circ\\ 0.8173^\circ\\ -68.0144^\circ\end{bmatrix}$$

2. Navigation

The navigation model uses an extended Kalman filter (EKF) to estimate the states optimally. The dynamics model used for propagating the states is

$$\dot{\hat{\boldsymbol{q}}}^{bI} = \frac{1}{2}E\!\left(\hat{\boldsymbol{q}}^{bI}\right)\left(\tilde{\boldsymbol{\omega}}^{bI}_b - \hat{\boldsymbol{b}}_\omega\right) \quad (27)$$

$$\dot{\hat{\boldsymbol{b}}}_\omega = -\frac{\hat{\boldsymbol{b}}_\omega}{\tau_b} \quad (28)$$

$$\dot{\hat{\boldsymbol{\varepsilon}}}_s = -\frac{\hat{\boldsymbol{\varepsilon}}_s}{\tau_s} \quad (29)$$

The model for this filter is

$$\begin{aligned} \dot{\boldsymbol{x}} &= \boldsymbol{f}(\boldsymbol{x}, \boldsymbol{\tau}_{in}, \boldsymbol{w}, t), \quad \boldsymbol{w} \sim N(0, B)\\ \boldsymbol{y}_k &= \boldsymbol{h}(\boldsymbol{x}) + \boldsymbol{\upsilon}_k, \quad \boldsymbol{\upsilon}_k \sim N(0, C)\\ \text{Initialize:}\quad & \hat{\boldsymbol{x}}_0,\ \hat{P}_0\\ \text{Propagation (states):}\quad & \dot{\hat{\boldsymbol{x}}} = \boldsymbol{f}(\hat{\boldsymbol{x}}, \boldsymbol{\tau}_{in}, t)\\ \text{Propagation (covariance):}\quad & \hat{P}^-_k = \hat{\varphi}\,\hat{P}_{k-1}\,\hat{\varphi}^T + \hat{Q}\\ \text{Gain:}\quad & \hat{K}_k = \hat{P}^-_k\hat{H}^T_k\left[\hat{H}_k\hat{P}^-_k\hat{H}^T_k + R\right]^{-1}\\ \text{Update:}\quad & \hat{\boldsymbol{x}}^+_k = \hat{\boldsymbol{x}}^-_k + \hat{K}_k\left[\boldsymbol{y}_k - \boldsymbol{h}(\hat{\boldsymbol{x}}^-_k)\right]\\ & \hat{P}^+_k = \left[I - \hat{K}_k\hat{H}_k\right]\hat{P}^-_k \end{aligned} \quad (30)$$
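The gain-update-covariance cycle of Eq. (30) is easiest to see in scalar form. The sketch below is a one-dimensional stand-in with names of our own, not the paper's 9-state filter:

```python
def kf_update(x_minus, P_minus, y, H, R):
    """One scalar measurement update from Eq. (30):
    Kalman gain, state correction, covariance reduction."""
    K = P_minus * H / (H * P_minus * H + R)      # gain
    x_plus = x_minus + K * (y - H * x_minus)     # state update
    P_plus = (1.0 - K * H) * P_minus             # covariance update
    return x_plus, P_plus, K

# Equal prior and measurement variance -> the update splits the difference:
x, P, K = kf_update(x_minus=0.0, P_minus=1.0, y=2.0, H=1.0, R=1.0)
```

With equal confidence in the prior and the measurement, the gain is 0.5, the estimate moves halfway toward the measurement, and the variance halves; the full filter does the same with 9-dimensional matrices.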

The navigation state of each satellite comprises nine elements:

$$\hat{\boldsymbol{x}} = \left[\boldsymbol{\theta}_b \ \ \boldsymbol{b}_\omega \ \ \boldsymbol{\varepsilon}_s\right]_9 \quad (31)$$

The model does not include angular velocities, since the attitude model used in the filter is in the model replacement mode [13]. The measurements violate the normalization constraint of the quaternions, so a multiplicative error is used to overcome this problem. As a result, instead of four elements of quaternions, three components of orientation θ b are selected for the states [11]. The quaternions are updated with the following equation


$$\hat{\boldsymbol{q}}^{bI+} = \delta\boldsymbol{q}^{bI}(\boldsymbol{\theta}_b) \otimes \hat{\boldsymbol{q}}^{bI-} \quad (32)$$

The state transition matrix $\hat{\varphi}$ used in the filter is $e^{F\Delta t}$, which can be approximated by a fourth-order Taylor series:

$$\varphi = e^{F\Delta t} \approx I + F\Delta t + \frac{F^2\Delta t^2}{2!} + \frac{F^3\Delta t^3}{3!} + \frac{F^4\Delta t^4}{4!} \quad (33)$$

where $F = \left.\dfrac{\delta \boldsymbol{f}}{\delta \boldsymbol{x}}\right|_{\hat{\boldsymbol{x}}^-_k}$.
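The truncation error of the expansion in Eq. (33) can be checked in the scalar case, where $e^{F\Delta t}$ has a closed form (the helper name is ours):

```python
import math

def phi_taylor(a, dt, order=4):
    """Eq. (33) in scalar form: e^(a*dt) via a truncated Taylor series.
    The filter applies the same expansion to the Jacobian matrix F."""
    term, total = 1.0, 1.0
    for n in range(1, order + 1):
        term *= a * dt / n   # builds (a*dt)^n / n! incrementally
        total += term
    return total

phi = phi_taylor(0.3, 0.1)   # compare against the exact math.exp(0.03)
```

For a typical filter step, where $|F|\Delta t \ll 1$, the fourth-order remainder is on the order of $(F\Delta t)^5/5!$, which is negligible relative to the process noise.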

The discrete process noise matrix is

$$\hat{Q} = \begin{bmatrix}Q_g & 0\\ 0 & Q_n\end{bmatrix}_{9\times 9} \quad (34)$$

The gyro process noise matrix $Q_g$ is approximated by treating the gyro noise as internal and a random-walk process:

$$Q_g = \begin{bmatrix}\sigma_{xg}^2\,\Delta t & 0 & 0\\ 0 & \sigma_{yg}^2\,\Delta t & 0\\ 0 & 0 & \sigma_{zg}^2\,\Delta t\end{bmatrix} \quad (35)$$

$Q_n$ represents the bias process noise, defined as

$$Q_n = \begin{bmatrix}\sigma_b^2\,I_{3\times 3}\left(1 - e^{-2\Delta t/\tau_b}\right) & 0_{3\times 3}\\ 0_{3\times 3} & \sigma_s^2\,I_{3\times 3}\left(1 - e^{-2\Delta t/\tau_s}\right)\end{bmatrix}_{6\times 6} \quad (36)$$

The measurement sensitivity matrix $\hat{H}_k$ is

$$\hat{H}_k = \left.\frac{\delta \boldsymbol{h}(\boldsymbol{x})}{\delta \boldsymbol{x}}\right|_{\hat{\boldsymbol{x}}^-_k} = \left[R_{3\times 3}(\hat{\boldsymbol{q}}^{sb}) \ \ 0_{3\times 3} \ \ I_{3\times 3}\right]_{3\times 9} \quad (37)$$

3. Control

For the development and scientific phases, two controllers are utilized to control the attitude: a proportional-derivative (PD) controller and a robust sliding mode controller. In the dynamical model, a disturbance on the inertia matrix is assumed:

$$\tilde{J} = J + \text{disturbance}, \qquad \text{disturbance} = 0.5\,J\begin{bmatrix}\cos(t)\\ \sin(t)\\ 0.5\end{bmatrix} \quad (38)$$

The PD controller is


$$\hat{\boldsymbol{\tau}}_{desired} = P\left(\boldsymbol{\theta}_{desired}\right) + D\left(\hat{\boldsymbol{\omega}}^{bI}_{desired} - \hat{\boldsymbol{\omega}}^{bI}\right) \quad (39)$$

where the desired angular offset is obtained from the small-difference orientation property of quaternions:

$$\begin{bmatrix}\boldsymbol{\theta}_{desired}\\ 1\end{bmatrix} = \hat{\boldsymbol{q}}_{desired} \otimes \left(\hat{\boldsymbol{q}}^{bI}\right)^{-1} \quad (40)$$
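A minimal sketch of the PD law of Eqs. (39)-(40) using quaternion algebra. The scalar-last convention is used, the small-angle factor of 2 is absorbed into the gain, and all function names are ours:

```python
def quat_mul(a, b):
    """Hamilton product, scalar-last convention [x, y, z, w]."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return [aw * bx + bw * ax + ay * bz - az * by,
            aw * by + bw * ay + az * bx - ax * bz,
            aw * bz + bw * az + ax * by - ay * bx,
            aw * bw - ax * bx - ay * by - az * bz]

def quat_inv(q):
    """Inverse of a unit quaternion: negate the vector part."""
    return [-q[0], -q[1], -q[2], q[3]]

def pd_torque(q, q_des, w, w_des, kp, kd):
    """Eqs. (39)-(40): attitude error from the error quaternion's vector
    part, then a proportional-derivative law (illustrative sketch)."""
    dq = quat_mul(q_des, quat_inv(q))
    theta_err = dq[0:3]                 # small-angle attitude error
    return [kp * te + kd * (wd - wi)
            for te, wd, wi in zip(theta_err, w_des, w)]

# Zero torque when attitude and rate already match the targets:
tau = pd_torque([0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0], [0, 0, 0],
                kp=0.1208, kd=0.3786)   # gains from Table 3
```

The gains plugged in here are the neural-network outputs reported later in Table 3; the real controller also passes the result through the wheel model of Eq. (25).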

To define the sliding mode controller (SMC), first the quaternion difference is defined as

$$\delta\boldsymbol{q} = \begin{bmatrix}\delta\boldsymbol{q}_{1:3}\\ \delta q_4\end{bmatrix} \quad (41)$$

Then, the sliding mode vector is defined as

$$\hat{\boldsymbol{s}} = \left(\hat{\boldsymbol{\omega}}^{bI} - \hat{\boldsymbol{\omega}}^{bI}_{desired}\right) + k\,\mathrm{sign}(\delta\hat{q}_4)\,\delta\hat{\boldsymbol{q}}_{1:3} \quad (42)$$

Therefore, the SMC is

$$\hat{\boldsymbol{\tau}}_{in} = J\left(\dot{\hat{\boldsymbol{\omega}}}^{bI}_{desired} - \frac{k}{2}\left[|\delta\hat{q}_4|\left(\hat{\boldsymbol{\omega}}^{bI} - \hat{\boldsymbol{\omega}}^{bI}_{desired}\right) + \mathrm{sign}(\delta\hat{q}_4)\,\delta\hat{\boldsymbol{q}}_{1:3} \times \left(\hat{\boldsymbol{\omega}}^{bI} + \hat{\boldsymbol{\omega}}^{bI}_{desired}\right)\right] - A\,\bar{\boldsymbol{s}}\right) + \hat{\boldsymbol{\omega}}^{bI} \times J\hat{\boldsymbol{\omega}}^{bI} \quad (43)$$

where A is a positive definite matrix and $\bar{\boldsymbol{s}}$ is defined with a saturation function as

$$\bar{s}_i = \mathrm{sat}(s_i, \varepsilon_i), \quad i = 1, 2, 3 \quad (44)$$

where $s_i$ is the ith element of the sliding mode vector and $\varepsilon_i$ is a small positive number. In the formation stabilization phase, to reduce the drift of the angular velocities from zero, the gravity gradient torque is counteracted by an antigravity torque applied to the satellites:

$$\hat{\boldsymbol{\tau}}_{in} = -\hat{\boldsymbol{\tau}}_g \quad (45)$$
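The sliding surface of Eq. (42) and the boundary-layer saturation of Eq. (44) in code; this is a sketch, and the example numbers are arbitrary:

```python
def sat(s, eps):
    """Eq. (44): saturated sign, linear inside the boundary layer eps.
    The boundary layer trades tracking error for reduced chattering."""
    if abs(s) <= eps:
        return s / eps
    return 1.0 if s > 0.0 else -1.0

def sliding_vector(w, w_des, dq_vec, dq4, k):
    """Eq. (42): s = (w - w_des) + k * sign(dq4) * dq_{1:3}."""
    sgn = 1.0 if dq4 >= 0.0 else -1.0
    return [(wi - wd) + k * sgn * dqi
            for wi, wd, dqi in zip(w, w_des, dq_vec)]

# A small rate error plus a small attitude error, with k from Table 3:
s = sliding_vector([0.01, 0.0, 0.0], [0.0, 0.0, 0.0],
                   [0.02, 0.0, 0.0], 0.999, k=1.324)
```

Driving every component of s to zero simultaneously nulls the rate error and the attitude error; the saturation keeps the resulting torque command from chattering near the surface.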

4. Optimization and learning phase

To find the optimal controller parameters, two algorithms are utilized: a multi-objective genetic algorithm and a neural network [15]. The multi-objective genetic algorithm minimizes the defined objective functions by heuristically searching for the optimum variables or parameters affecting them. The objective functions are the RMS error of the controllers and the total energy consumed during control. The result of this optimization is an optimal Pareto front, representing


a set of points showing how each objective value changes against the other. These data are later used to train a neural network, which yields a function estimating the optimal controller parameters from the network inputs. The neural network then supplies the optimal parameters in the experiment phase, which simulates the satellites in their orbits at the moment they must decide which parameters to use to guarantee low error and low energy consumption. The inputs and outputs differ between the development phase and the scientific phase; the following sections explain both the optimization phase and the learning phase in detail.

(1) Development phase

The schematic of the GNC system in the development phase is shown in Fig. 3. In this paper, we generate 100 initial conditions around the approximately estimated initial condition at the beginning of the development phase, in order to better estimate the optimal parameters for the satellites' actual state in orbit. We thus assume approximate prior knowledge of the possible initial conditions; in practice, the measured initial condition is read and the corresponding optimal parameters are chosen. As a result, some prior information is incorporated in the training. The data generated for the angles and the angular velocities have variances of 0.07 and 0.2, respectively.
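The Pareto-extraction idea can be illustrated with a toy non-dominated filter over (energy, RMS error) pairs. This is a stand-in for the genetic algorithm's output, and the candidate points are made up:

```python
def pareto_front(points):
    """Non-dominated subset for two minimized objectives (energy, error).
    A point is kept if no other point is at least as good in both."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# (energy J, RMS error rad): two candidates are strictly worse trade-offs.
candidates = [(0.1, 0.20), (0.2, 0.10), (0.3, 0.05),
              (0.25, 0.15), (0.15, 0.25)]
front = pareto_front(candidates)
```

The surviving points form the trade-off curve shown in the Pareto-front figures: spending more energy buys a lower error, and no point on the front is improvable in both objectives at once.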

Fig. 3 GNC system of the development phase

Fig. 4 Pareto front of the PD controller (objective 1: energy (J); objective 2: RMS error (rad))

Optimization phase

In the development phase, the objective functions of the optimization algorithm are the total energy consumed during attitude control and the RMS error over the last 5 s of the controllers:

$$\text{Objective functions} = \text{total energy and RMS error in the last 5 s} \quad (46)$$

The parameters to be determined during the optimization are the PD controller gains, the SMC parameters, and the control duration. The error over the last few seconds (here defined as 5 s) is the important quantity, since a low value there means the development phase ends with a low error, so the scientific phase can start with low errors. The Pareto fronts of this optimization for one specific initial quaternion and angular velocity are shown in the following figures for the PD controller and the SMC. As the Pareto-front figures illustrate, when objective 1, the total energy consumption, increases, objective 2, the RMS error in the last 5 s, decreases, and vice versa: consuming more energy reduces the error. This optimization is run many times for different initial quaternions and angular velocities. Next, in the neural network step, the initial quaternions used in the optimization phase are transformed into Euler angles with a 3-2-1 sequence to reduce the number of inputs to the neural network, and thus the processing time of its learning phase (Figs. 4 and 5).

Learning phase

The neural network is designed with three hidden layers, with ten neurons in the first layer, five in the second, and three in the last. There are seven inputs, with three outputs when using the PD controller and four outputs


Fig. 5 Pareto front of the SMC (objective 1: energy (J); objective 2: RMS error (rad))

Fig. 6 Performance of the PD controller (best validation MSE 0.075933 at epoch 698)

when using the SMC. The inputs are the three Euler angles, the three angular velocities, and the ratio of RMS error to energy consumption obtained during the optimization phase. With the PD controller, the outputs of the neural network are the proportional and derivative gains and the duration of the development phase; with the SMC, they are the three controller parameters and the duration of the development phase. In the training phase, the number of epochs is set to 1000 and the maximum number of validation failures to 6000. The performance plots of the PD controller and the SMC are shown in Figs. 6 and 7, respectively; they demonstrate the high accuracy of the neural network's estimates of the controller parameters and the development duration.
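The forward pass of the small fully connected network described above can be sketched as follows; the 7-10-5-3 shape matches the text, but the weights here are uniform placeholders, not trained values:

```python
import math

def mlp_forward(x, layers):
    """Forward pass of a small fully connected tanh network.
    layers: list of (weight_matrix, bias_vector) pairs."""
    for W, b in layers:
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

def const_layer(n_out, n_in, w=0.01):
    """Placeholder layer: uniform weights, zero biases (not trained)."""
    return ([[w] * n_in for _ in range(n_out)], [0.0] * n_out)

# 7 inputs (3 Euler angles, 3 rates, error/energy ratio) -> 10 -> 5 -> 3
net = [const_layer(10, 7), const_layer(5, 10), const_layer(3, 5)]
y = mlp_forward([1.0] * 7, net)   # e.g. kp, kd, development duration
```

A real implementation would train the weights against the Pareto-front data with the epoch and validation-failure limits given in the text; only the inference step is shown here.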

Fig. 7 Performance of the SMC (best validation MSE 0.0041867 at epoch 30)

Table 2 Input to the neural network

    Initial quaternion:          [−0.4028, 0.0776, −0.8484, 0.3594]ᵀ
    Initial angular velocity:    [−0.0353, −0.0788, 0.8009]ᵀ
    Ratio of error to energy:    1.86

Table 3 Output of the neural network

    PD controller parameters:    kp = 0.1208, kd = 0.3786
    SMC parameters:              k = 1.3240, G = 0.0166, eps = 0.4488
    Duration (PD, SMC):          0.4755 min, 1.0904 min

Tables 2 and 3 show the optimal controller parameters and development-phase duration estimated by the neural network for the PD controller and the SMC. As Table 3 shows, the development-phase duration is similar for both controllers, and the controller gains are modest. In the results section, these parameters are tested in the experiment phase.

(2) Scientific phase

The schematic of the GNC system in the scientific phase is shown in Fig. 8.

Optimization

In the scientific phase, since the initial conditions are close to the desired ones, the objective functions are optimized for only one initial condition: the noisy desired quaternions and the noisy zero angular velocities. The duration of the scientific phase is set to one hour. The objective function is


Fig. 8 GNC system of the scientific phase

Fig. 9 Pareto front of the PD controller (energy (J) vs. RMS error (rad))

$$\text{Objective functions} = \text{energy consumption and RMS error during one hour} \quad (47)$$

The Pareto fronts of the PD controller and the SMC are shown in Figs. 9 and 10, which plot the RMS error versus the energy consumption.

Learning phase

The neural network, similar to the development phase, is designed with two hidden layers, with ten neurons in the first layer and three in the last.

Fig. 10 Pareto front of the SMC (energy (J) vs. RMS error (rad))

Fig. 11 PD controller's performance (best validation MSE 0.043454 at epoch 584)

There is only one input, with two outputs when using the PD controller and three outputs when using the SMC. The input is the ratio of RMS error to energy consumption obtained during the optimization phase. With the PD controller, the outputs of the neural network are the proportional and derivative gains; with the SMC, they are the three controller parameters. In the training phase, the number of epochs is set to 1000 and the maximum number of validation failures to 6000. The performance plots of the PD controller and the SMC are shown in Figs. 11 and 12; the high performance demonstrates the accuracy of the neural network estimator. Table 4 shows the parameters obtained for the experiment phase.


Fig. 12 Performance of the SMC (best validation MSE 0.012472 at epoch 21)

Table 4 Optimal parameters

    Input to the neural network (ratio of error to energy):
        Ratio_PD = 6.58, Ratio_SMC = 26.14
    SMC optimal parameters:    k = 2.9947, G = 0.0193, eps = 0.2601
    PD optimal parameters:     kp = 0.1208, kd = 0.3786

4 Results

In the development phase, all subsystems except the camera are on, to provide sufficient attitude accuracy for the next phase. In the scientific phase, all subsystems are on and the camera images the Crab Nebula's X-ray emissions. In the next phase, the controller switches to the passive antigravity gradient torque control, with the sensors, the camera, and the filter off. Table 5 summarizes the phases.

Table 5 Phases

             Controller                     Sensors and filters    Camera
    Phase 1  SMC/PD                         On                     Off
    Phase 2  SMC/PD                         On                     On
    Phase 3  Antigravity gradient torque    Off                    Off


4.1 Phase 1

In this phase, which lasts 2 min, the camera and the filter are off, but the controller is in the loop. The responses of both controllers for the follower are shown below. The results for the leader are very similar, since the same GNC design is implemented on it (Figs. 13 and 14).

Fig. 13 Development phase with the PD controller (quaternions vs. time, min)

Fig. 14 Development phase with the SMC (quaternions vs. time, min)


4.2 Phase 2

In this phase, the camera is on and the satellites observe the Crab Nebula for one hour. The following plots show the responses in both the quaternions and the Euler angles (Figs. 15, 16, 17, and 18), and the RMS Euler angle errors are shown in Figs. 19 and 20. The RMS accuracy over the one-hour observation is 0.1738° for the SMC and 0.2219° for the PD controller. The results show fairly similar responses for both controllers. Considering the disturbance injected into the inertia matrix, these results demonstrate the robustness of the PD controller. However, unlike SMCs, PD controllers are not guaranteed to be robust against large disturbances. The robustness of the SMC comes at the price of chattering in the controller output fed into the actuators, the reaction wheels.

Fig. 15 Scientific phase with SMC (quaternions q1–q4 vs. time, h)

Fig. 16 Scientific phase with PD controller (quaternions q1–q4 vs. time, h)

Fig. 17 Scientific phase with SMC—Euler angles (roll, pitch, yaw in deg vs. time, min)

Fig. 18 Scientific phase with PD controller—Euler angles (roll, pitch, yaw in deg vs. time, min)

4.3 Phase 3

In this phase, the antigravity gradient torque is applied to reduce the drift of the angular velocity from zero. The third angular velocity component nonetheless grows, while the other two fluctuate around zero. The result for this phase is similar for both the PD controller and the SMC (Fig. 21).

Fig. 19 Scientific phase with SMC—error (errors in deg vs. time, h)

Fig. 20 Scientific phase with PD controller—error (errors in deg vs. time, h)

Fig. 21 Formation stabilization phase (angular velocities, rad/s vs. time, h)


5 Conclusion

VTXO demonstrates key technologies that must be developed to keep spacecraft in formation. The high accuracy demanded by this mission requires further development of more challenging technologies, such as more advanced control algorithms, more efficient filters [16, 17], and additional sensors. The approach developed here provides high accuracy despite the noise sources influencing the attitude and, hence, the attitude accuracy. To jointly optimize the energy consumption and the error, a multi-objective genetic algorithm is applied to the GNC. In future work, the relative position between the satellites will no longer be assumed constant; this distance must also be controlled to maintain the 100-m baseline, and accounting for it will yield higher accuracy. The researchers are exploring nonlinear tracking control of the relative position [18], and future stages of this work include integrating this relative position controller with the attitude control to reach higher accuracy in both relative position and attitude.

References

1. Sánchez-Maestro, R., Agenjo-Díaz, A., & Cropp, A. (2013). PROBA-3 formation flying high performance control. In 5th International Conference on Spacecraft Formation Flying Missions and Technologies (SFFMT 2013), Munich, Germany, 29–31 May 2013.
2. Peyrard, J., Olmos, D. E., Agenjo, A., Kron, A., & Cropp, A. (2014). Design and prototyping of PROBA-3 formation flying system. International Journal of Space and Engineering.
3. Clédassou, R., & Ferrando, P. (2005, December). A new generation hard X-ray formation flying mission. Journal of Experimental Astronomy, 20, 421–434.
4. Skinner, G. K., Arzoumanian, Z., Cash, W. C., et al. (2008). The milliarc-second structure imager (MASSIM): A new concept for a high angular resolution X-ray telescope. In Space Telescopes and Instrumentation 2008: Ultraviolet to Gamma Ray, vol. 7011 of Proceedings of SPIE, Marseille, France, June 2008.
5. Cash, W., Oakley, P., Turnbull, M., Glassman, T., Lo, A., Polidan, R., Kilston, S., & Noecker, C. (2008). The new worlds observer: Scientific and technical advantages of external occulters. In SPIE, 7010, 1Q.
6. Calhoun, P. C., & Shah, N. (2012). Covariance analysis of astrometric alignment estimation architectures for precision dual spacecraft formation flying. In AIAA Guidance, Navigation, and Control Conference, 13–16 August 2012, Minneapolis, Minnesota.
7. Woffinden, D., & Geller, D. (2007, September–October). Relative angles-only navigation and pose estimation for autonomous orbital rendezvous. Journal of Guidance, Control, and Dynamics, 30(5).
8. Okasha, M., & Newman, B. (2011). Relative motion guidance, navigation and control for autonomous orbital rendezvous. In AIAA Guidance, Navigation, and Control Conference, 08–11 August 2011, Portland, Oregon.
9. Schacher, M. (2009). Optimal PID control in case of random initial conditions. PAMM, 9(1), 573–574.
10. Pirayesh, R., Naseri, A., Stochaj, S., Shah, N., & Krizmanic, J. (2018). Attitude control of a two-CubeSat virtual telescope in highly elliptical orbits. In 2018 AIAA Guidance, Navigation, and Control Conference (p. 0866).


R. Pirayesh et al.

11. Naseri, A., Pirayesh, R., Adcock, R. K., Stochaj, S. J., Shah, N., & Krizmanic, J. (2018). Formation flying of a two-CubeSat virtual telescope in a highly elliptical orbit. In 2018 SpaceOps Conference (p. 2633).
12. Kyle, R., Stochaj, S., Shah, N., Krizmanic, J., & Naseri, A. (2017). VTXO—Virtual telescope for X-ray observations. In 9th International Workshop on Satellite Constellations and Formation Flying, 19–21 June 2017, University of Colorado Boulder.
13. Hall, D. C. (2011, April 4). Spacecraft attitude dynamics and control.
14. Lear, W. M. (1985, September). Kalman filtering techniques. NASA Johnson Space Center, Mission Planning and Analysis Division, Houston, TX, JSC-20688.
15. Crassidis, J. L., & Junkins, J. L. (2004). Optimal estimation of dynamic systems (1st ed.). Boca Raton, FL: CRC Press LLC.
16. Ainscough, T., Zanetti, R., Christian, J., & Spanos, P. D. (2014). Q-method extended Kalman filter. Journal of Guidance, Control, and Dynamics, 38(4), 752–760.
17. Sullivan, J., & D'Amico, S. (2017). Nonlinear Kalman filtering for improved angles-only navigation using relative orbital elements. Journal of Guidance, Control, and Dynamics, 40(9), 2183–2200.
18. Adcock, R., & Naseri, A. (2018). Comparison of linear and nonlinear dynamics of a virtual telescope. In AIAA Region IV Student Conference, April 2018.

Part II

Ground Systems and Networks

The Cassini/Huygens Navigation Ground Data System: Design, Implementation, and Operations

R. M. Beswick

Abstract The highly successful Cassini/Huygens mission conducted almost 20 years of scientific research across both its journey through the solar system and its 13-year reconnaissance of the Saturnian system. This operational effort was orchestrated by the Cassini/Huygens Spacecraft Navigation team on a network of computer systems that met a requirement for no more than two minutes of unplanned downtime a year (99.9995% availability). The work of spacecraft navigation involved rigorous requirements for accuracy and completeness, often carried out under uncompromising time pressures, and arose from a complex interplay between several teams within the Cassini Project, conducted on the Ground Data System. To support the Navigation function, a fault-tolerant, secure, high-reliability/high-availability computational environment was necessary for operations data processing. This paper discusses the design, implementation, re-implementation, and operation of the Navigation Ground Data System. Systems analysis and performance tuning, informed by a review of science goals and user consultation, shaped the initial launch and cruise configuration requirements; those requirements were subsequently upgraded to support the demanding orbital tour of the Saturn system. Configuration management was integrated with fault-tolerant design and security engineering, according to the cornerstone principles of Confidentiality, Integrity, and Availability, and strategic design approaches such as Defense in Depth, Least Privilege, and Vulnerability Removal. This approach included security benchmarks and validation to meet strict confidence levels. The resulting computational environment was a secure, modular system that met its reliability metrics and experienced almost no downtime throughout tour operations.

R. M. Beswick
NASA/Jet Propulsion Laboratory/California Institute of Technology, Pasadena, CA 91109, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind's Future, https://doi.org/10.1007/978-3-030-11536-4_12



Nomenclature

CIA: Confidentiality, integrity, and availability (foundational security principles)
CIS: Center for Internet Security
CISscan: CIS internal host security benchmark scanner
CM: Configuration management
DOS: Denial-of-service attack
DSN: Deep Space Network
DR: Disaster recovery
ECC: Emergency Control Center, a GCC facility for DR support
GDS: Ground Data System
GCC: Goldstone Communications Complex, located in California, one of three main DSN sites
HP-UX: Hewlett-Packard (Unix System V-based OS) for Hewlett-Packard computers
IGNITE: System imaging and installation software for HP-UX systems
LAN: Local area network
Linux: Open-source OS derived from Unix, System V-based OS
OS: Operating system
MTTR: Mean Time To Restore
MMNAV: Multi-Mission Navigation operations coordinating organization
NAS: Network-attached storage
NESSUS: Nessus network security scanner from Tenable Network Security, Inc.
N+1: System with one redundant component for every point of failure
NFS: Network File System
ORT: Operational readiness test
QoS: Quality of Service
RAID: Redundant array of inexpensive disks
RLOGIN: Remote log-in
RCP: Remote copy
RSYNC: Remote synchronization (file distribution) program
Solaris: Sun (Unix System V-based OS) for Sun computers
SFOC: Space Flight Operations Facility, the mission-critical spacecraft operations building at JPL
SSH: Secure Shell communications replacement for RLOGIN, RCP, and other "R" commands
SYSTEMIMAGER: System imaging and installation software for Linux computers
TMR: Triple modular redundant (three redundant components for every point of failure)


1 Introduction

The Cassini/Huygens mission launched in October of 1997 and began a seven-year journey across the solar system culminating in the entry of the spacecraft into Saturnian orbit in July of 2004. Over the next thirteen years, it would orbit Saturn 293 times, conducting nearly 200 targeted flybys of Saturn's moons, as well as extensive surveys of the planet and its rings, until the dramatic ending of the mission in September of 2017. The work of spacecraft navigation for Cassini involved rigorous requirements for accuracy and completeness often carried out under uncompromising critical time pressures that arose from a complex interplay among several teams within the Cassini Project, conducted on the Navigation Ground Data System. These teams carried out numerous activities to fulfill their mission objectives: perform detailed analysis of complex spacecraft data sets; review results and process resultant navigation computations in an efficient manner; convert, distribute, and correlate such results with the Spacecraft Operations team, the Science team, the Real-Time Operations team, as well as other parts of the project; and then correlate against further data from the spacecraft and uplink resultant instructions to the Cassini spacecraft—in some cases within a two- to three-hour period, with little margin for error. (These processes are discussed in more detail in several excellent papers, Antreasian et al. [1, 2].) The computational requirements to support this navigation function led to the design, implementation, and successful operation of a high-reliability/high-availability computational environment orchestrated on a network of computer systems used by the Cassini/Huygens Navigation team. It had an initial hard requirement for no more than two hours of unplanned downtime a year during the Saturn orbital tour, often stated in the computer industry as 99.97% availability.
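These availability figures translate to annual downtime by simple arithmetic. As a quick sketch (standard industry conversion, not code from the mission):

```python
# Availability <-> annual downtime conversion. Illustrative only; the mission's
# requirements were stated as figures, not computed by this script.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def availability_from_downtime(minutes_down_per_year: float) -> float:
    """Availability (percent) implied by a given annual unplanned downtime."""
    return 100.0 * (1.0 - minutes_down_per_year / MINUTES_PER_YEAR)

def downtime_from_availability(availability_pct: float) -> float:
    """Allowed annual downtime (minutes) for a given availability percentage."""
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)

# Two hours (120 min) of unplanned downtime a year is roughly 99.977%,
# i.e., the "99.97% availability" hard requirement cited here.
print(availability_from_downtime(120))
# 99.9995% availability allows roughly 2.6 minutes of downtime a year.
print(downtime_from_availability(99.9995))
```

The conversion makes clear how steep the later requirement was: tightening from two hours to about two minutes a year removes roughly 98% of the original downtime budget.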
By the end of the Prime Mission, this was increased to a soft requirement of 99.9995%—less than two minutes of unplanned downtime per year—to support these data processing needs through the end of the mission. This paper looks at the history of these efforts through a theme: that good design efforts, taken aggressively early on, will lead to better implementation and, as a consequence, greatly improved operations. The fundamental challenges faced in this effort will be familiar to many in aerospace: limited funding and manpower with aggressive mission requirements demanding ingenious design and considerable care to meet such objectives. In order to understand the evolution of this system engineering process, it may be helpful to consider this effort against the overall timeline of the Cassini/Huygens mission. Launched on October 15, 1997, the Cassini/Huygens spacecraft, a joint mission between the National Aeronautics and Space Administration, the European Space Agency, and the Italian Space Agency (NASA/ESA/ASI), spent the next seven years on a journey crossing the solar system, until its arrival and entry into Saturnian orbit on July 1, 2004. Cassini performed a total of four planetary flybys on its journey to Saturn. This included two Venus flybys on April 26, 1998 and June 24, 1999, an Earth flyby on August 18, 1999, and a flyby of Jupiter, coupled with a dual science mission with the Galileo spacecraft, on December 30, 2000. Upon its arrival


in Saturn orbit, the Cassini spacecraft began a thirteen-year aggressive tour of the Saturnian system, with detailed examinations of the rings, moons (especially Titan), and Saturn's atmosphere. This would also involve 127 targeted flybys of Titan, 22 of Enceladus, five of Dione, four of Rhea, and one each of Iapetus, Hyperion, and Phoebe. The exhaustive study of the cloud-covered moon Titan also encompassed the deployment of the Huygens probe to survey the atmosphere and land on its surface on January 14, 2005. Finally, the Cassini spacecraft itself flew the dramatic Grand Finale, orbiting Saturn between the upper atmosphere and its innermost ring, the D-ring, 22 times, culminating in the atmospheric entry of the spacecraft on September 15, 2017, ending the mission. Against this backdrop, a focused effort at systems engineering took place. The twenty-plus years of the Cassini mission required a "long-view" approach to the design and implementation of hardware and software systems. Unlike most flight projects, this was a "marathon," not a "sprint." Over a large part of the mission timeline, this implementation took place incrementally, with minor improvements being financed, evaluated, and then placed into operations. Periods of low spacecraft activity (during the long transits between the inner and outer planets of the solar system) afforded the opportunity for large-scale overhauls of the Ground Data System used by Navigation. These overhauls emphasized top-to-bottom review, design, and formal upgrade planning. Such comprehensive overhauls examined and defined systems requirements, the state of the art of current computational offerings from major vendors, systems analysis to determine what offerings met the needs of Navigation, and budgetary planning to determine what and how these improvements would be implemented.
We look at these efforts both as a strategic systems model, finalized early on, that served Navigation well over the course of the mission, and as tactical considerations that may inform future systems implementers and system administrators. Some of these general tactical considerations are discussed in the "Operations and Lessons Learned" section at the end of this paper, which may prove especially useful in future design efforts. This system engineering process involved several associated elements in its design [3–5]. Configuration management (CM) was an integral and critical element in the system configuration. Software and hardware elements were rigorously standardized—a clearly defined computing environment that had precise controls for which operating system and software sets would be installed on which hardware platforms. Furthermore, a greatly clarified system model simplified administration by ensuring that each machine had a well-defined configuration with clearly understood interactions with other computational components. The importance of this "building block" approach cannot be overstated. In time, it allowed complex tasks that once took hours and days (on the same hardware) to be finished in minutes, and it greatly simplified troubleshooting of errors during system faults, while allowing for greater confidence in the update, repair, and deployment of new computational capability. Fault tolerance was integral to the system engineering of the Ground Data System. Making certain that the system was reliable enough to be used and available, on a 24-h basis, to the Navigation team was the central consideration in the reliability requirements for the Navigation system. Both Quality of Service (QoS), the metrics


used to show how well the system would perform overall, and Mean Time to Restore (MTTR), the average time to restore (or recover) a system after a fault, were used in the evaluation of system capability. Fault tolerance and speed of restoration in the event of a fault were core engineering constraints for the system design. Security was another critical aspect of reliability. No matter how accurate the results or how rapidly such results were generated by the computational system, if modified by a security compromise, those results would be of little use to the Navigation team. Security is a feature of reliability in this fault-tolerant systems design [5, 6]. Indeed, a system hardened (fault tolerant) against intelligent actors will often prove robust against numerous “natural” random failures as well. This particular security model used the cornerstone principles of confidentiality, integrity, and availability (CIA), along with classic design principles of Defense in Depth, Least Privilege, and Vulnerability Removal to serve as a framework for security fault tolerance (system hardening). Security benchmarks and standardized testing tools, along with other regression testing, validated that the computational system met certain confidence levels for the Navigation core reliability requirements. The pre-launch environment which emerged out of the mission planning phase for Cassini was limited in its early design. Its ad hoc design, although supported by skilled system administrators, had emerged out of an environment of harsh cost constraints in terms of both hardware and personnel support. As the mission moved out of its formulation phases, it became clear that significant work was necessary to make the systems design robust for launch operations support. After struggles with hardware and software, and evaluation of the trade space of differing design approaches, a simple and effective system architecture model emerged. 
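The reliability quantities introduced above relate to one another through standard formulas. The sketch below is textbook reliability arithmetic with illustrative numbers, not the project's actual QoS tooling:

```python
# Steady-state availability from failure/repair behavior, and the payoff of
# redundancy. Standard reliability formulas; all numbers are illustrative.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR): driving MTTR down raises availability
    just as effectively as making failures rarer."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def parallel(unit_availability: float, n_units: int) -> float:
    """Availability of a redundant group that works if ANY unit works,
    e.g. an N+1 pair (n_units=2) or TMR-style sparing (n_units=3),
    assuming independent failures."""
    return 1.0 - (1.0 - unit_availability) ** n_units

# A hypothetical unit failing every ~2000 h and taking 4 h to restore:
a = availability(mtbf_hours=2000.0, mttr_hours=4.0)  # ~0.998 alone
print(parallel(a, 2))  # as an N+1 pair: ~0.999996
```

The example shows why both MTTR reduction and N+1 sparing appear as core engineering constraints here: either one alone falls short of a "minutes per year" downtime budget, but together they reach it.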
It had, as mentioned above, rigorous CM and an atomic software and operating system update process with few single points of failure. Security was an integrated part of this design from the ground up. This model (in varying forms) would be used effectively throughout the rest of the mission and would serve to inspire numerous other system design efforts. As the mission moved toward launch in October 1997, a core configuration based on this model emerged. As part of the increasing focus on critical launch operations, numerous launch reviews and operational readiness tests (ORTs) would encourage the development of this system architecture model into a cleaned-up, solid system implementation. During this time, a working disaster recovery (DR) facility was implemented at a backup remote site (this was the only time during the mission when such a capability would be implemented for Navigation due to significant costs in staffing, training, and software and hardware support). Considerable backup capability existed in the Navigation setup—however, after much evaluation, it became clear that the funding and personnel did not exist to continue this effort after the initial launch period. Although desirable, an effective remote DR facility that could provide the needed continuity of operations for Navigation simply could not be afforded. After the hectic and successful launch period, the first real opportunity to do a full overhaul arose during the Inner Cruise period (1997–2000). Although functional, the systems used at launch needed an overhaul to meet the goals for a simplified systems architecture identified before launch. Numerous desired hardware improvements for both systems and the underlying networks were also implemented. Two significant milestones comprised this early period of the mission. The first involved the upgrade of the configuration management process whereby each system, including the network firewall protecting the perimeter, was upgraded to an extremely secure configuration, with every possible security vulnerability removed. The second was the significantly upgraded computer security posture undertaken to ensure a safe gravity-assist flyby of Earth on August 18, 1999. The systems architecture model implemented in the overhaul greatly aided in this effort, allowing for ease of change and confidence in the configuration to meet the enhanced demands of this period, while the hardware overhauls allowed the replacement of aging hardware with newer and more fault-tolerant gear. A serious program of systems analysis and performance comparison would be undertaken from 2000–2002 to determine how to meet the requirements of the Navigation team during the upcoming Saturn tour. This involved requirements analysis and subsequent requirements specification for the Navigation system for the Saturn orbital tour. Once the needs of the Navigation team during the Saturn tour were understood, we could create a system that could support those needs. This would involve a refinement of the system architecture model, taking advantage of changes and improvements in technology since the start of the mission, and evaluating the current state of systems design. Consequently, the rest of the Outer Cruise (2001–2004) was a time of transition. At the same time, a significant incremental upgrade was undertaken (involving some cleverness) to all of the current navigational systems, giving faster network, disk, and computer systems, at no additional hardware cost to the Cassini Project.
This upgrade was important because it gave expanded, necessary capability to the Navigation team as they began planning the aggressive Saturn tour campaign, while funding for the full-scale overhaul (being reviewed and evaluated at that time) for Saturn tour would not be available for nearly three years. This was a deliberate "long game" decision—to make sure the largest quantity of the fastest, most capable hardware would be available for the most computationally intensive stage of the mission. This interim hardware upgrade served to ameliorate much of this spartan approach to budgetary self-restraint. After a considerable process of hardware evaluation, determining what current hardware and software platform could meet Navigation systems requirements in the most cost-effective manner possible, final decisions could be made. This involved performance evaluation and systems analysis of four workstation and server vendors, and three different file system storage vendors. Benchmarking of test and Navigation software was performed while the rest of the system was evaluated as a whole. The finalists for workstation, server, and file system server were chosen, and these platforms would be used successfully for the rest of the mission. During this time as well, the model for the staffing of operations, particularly for the many critical events such as maneuvers and encounters, was also planned out and would serve to greatly improve efficiency. The system support personnel would prove to be crucial for mission success, intensely focused on resolving the most crucial systems problems in the minimum time.
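As an illustration of this kind of platform benchmarking, the harness below sketches a best-of-N wall-clock timing comparison. The workload and figures are hypothetical stand-ins; the actual evaluation ran Navigation flight software and SPEC-based FP suites:

```python
# Minimal benchmarking harness sketch: time a floating-point workload on each
# candidate platform and compare best-of-N runs. Workload is illustrative only.
import time

def fp_workload(n: int = 200_000) -> float:
    """Stand-in floating-point kernel (floating-point throughput being the
    central metric of concern for navigation software)."""
    total = 0.0
    x = 1.0
    for i in range(1, n):
        x = (x * 1.0000001) % 1000.0
        total += x / i
    return total

def benchmark(repeats: int = 3) -> float:
    """Best-of-N wall-clock time; taking the minimum reduces timing noise
    from other activity on the machine."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fp_workload()
        best = min(best, time.perf_counter() - start)
    return best

print(f"best run: {benchmark():.3f} s")  # compare this figure across vendors
```

Running the same harness and workload on each vendor's loaner hardware yields directly comparable numbers, which is the essence of the evaluation described above.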


Once this overhaul process had concluded and the new system upgrades had been brought online, no significant downtime, planned or unplanned, would occur. For the next fourteen years (2003–2017), from the end of the Outer Cruise, through Saturn Orbit Insertion in 2004, until the mission's end with the Grand Finale in 2017, these systems performed almost without a fault. As a part of the modular design of this system architecture, individual units (workstations, disk drives, server systems) could be added or removed without impacting the system as a whole. This design allowed for the incremental upgrade of faster, more capable components, almost never shutting down the whole computational environment. This robust design was put to the test when external organizations acquired a large amount of floor space in the JPL Space Flight Operations Facility (SFOC), where Cassini was located, and proceeded to perform heavy construction on the building itself, requiring multiple moves of the Navigation Ground Data System—in the middle of critical spacecraft operations! This experience, in the author's opinion, does not bear repeating. Even with this unnecessary stress, all Navigation requirements were met over the course of the aggressive Saturn orbital tour. Thanks to this incremental upgrade approach, the system was several times greater in speed, memory, and disk capacity by the end of the mission (EOM). The modular nature of the system allowed significant expansion in short order, as well as the movement of the environment. This served well for the Grand Finale, when multiple systems were added for media presentation capabilities at multiple sites, as well as post-EOM, when significant additional disk storage and backup capability were implemented to meet closeout mission analysis requirements. This system environment served the Cassini Navigation team and its needs successfully for more than twenty years.
This paper documents the evolution of this system and its design, from its pre-launch organic configuration through its redesigns to its final robust configuration for Saturn tour operations. A timeline of events is shown in Table 1. While the evolution of hardware and software capabilities is an interesting story, the design itself proved highly effective and remained remarkably constant, with only small improvements over this time frame. Key observations and lessons learned from this process will be examined here, both as strategic guidance and as tactical considerations. This review also compares these efforts with industry "best practices" for fault tolerance, disaster recovery, and security, and offers an example for similar engineering efforts. While a focused Ground Data System for a flagship interplanetary mission may benefit most directly from this example, there is significant applicability to other cases where a large engineering team must process and analyze large amounts of data in a precise, efficient, and secure manner under tight time constraints.


Table 1 Timeline of events for Cassini Navigation

System Engineering Events:
- Launch, October 15, 1997
- Inner Cruise Upgrades
- Interim (Post-Jupiter) upgrades
- Reconfiguration systems analysis
- Orbital Tour Complete Overhaul – Implement Block Ia
- Continuous Upgrade Process – Block Ib
- Block Ic – Overhaul complete
- Block II – 1st fileserver upgrade
- Block III – Primary compute server update
- Block IV – Backup server update
- Block V – 2nd fileserver upgrade, final systems update
- End of Mission, September 15, 2017, closeout begins

Cassini Project Events:
- Launch, October 15, 1997
- 1st Venus Flyby, April 26, 1998
- 2nd Venus Flyby, June 24, 1999, & Earth Flyby, August 18, 1999
- Jupiter Flyby, December 30, 2000
- Saturn Orbit Insertion, Prime Mission begins, July 1, 2004
- Huygens Titan Landing, January 14, 2005
- End of Prime, start of Equinox Mission, September 2008
- Saturnian Equinox (northern spring begins), August 2009
- End of Equinox, start of Solstice Mission, September 2010
- Grand Finale, End of Mission, September 15, 2017

2 History of the Navigation Ground Data System

And you may find yourself, In another part of the world …
And you may ask yourself, well, How did I get here?
—Talking Heads, Once in a Lifetime [7]

2.1 Pre-launch Configuration

The initial computational environment for the Cassini Navigation Ground Data System grew out of the environment used for the initial project formulation. This development environment had been subject to several years of harsh cost constraints in terms of both computer and network hardware and adequate personnel support and had developed organically through the mission development phases. One year prior to launch, with the prospect of impending ORTs and many other attendant launch concerns, it was clear that improvement would be necessary to prepare the environment. This computational environment, growing organically out of the mission design phases, comprised a dozen Hewlett-Packard workstations and two Sun Microsystems workstations. These machines were situated on a heterogeneous Ethernet local area network (LAN) built and used by the Navigation section for Multi-Mission Navigation operations (known as the MMNAV network). This network, among the oldest


networks on the JPL campus, had developed in an unplanned manner. Aside from the computer systems, the network hardware was itself a mixture of old and modern components, spanning Thicknet (10BASE5), Thinnet (10BASE2), twisted-pair (10BASE-T), and fiber (10BASE-F) Ethernet configurations, repeaters, switches, and routers, located in five separate network rooms (all but one controlled by other organizations), scattered across three buildings. In addition, the operating systems, security, and version control on the individual computer systems were confused and inconsistent. As a result, it was difficult to lock down machines in a CM and security context, to operate in a secure manner, or to determine the cause of a crash.1 It was clear that improvements were needed. The effort to improve this computational environment initially was far more reactive than proactive in approach. While there was a strong desire to get to the root of these problems and resolve them, it would take some effort to gain traction and start solving problems before they would become a concern. Indeed, this effort took on the characteristics of the mythical Hercules slaying the Lernaean Hydra, or more contemporaneously, a rather large game of “whack-a-mole” [8], for as one problem would be resolved, two or more would be uncovered. However, the long-term push on the part of the systems administration staff (as well as increasing the systems administration staff) would slowly begin to bear fruit, as improvements in CM, security, and fault tolerance would begin to show forth in a more reliable systems design. One of the first focuses of these long-term efforts was version control for both operating system releases (and their attendant patches) and software versions. 
Initially, the workstations used a variety of operating system releases (HP-UX 9.05 and 9.07 for the HP workstations, Solaris 1.1.3 and 1.1.4 for the Sun workstations), with a difficult-to-determine set of patches applied to each system. Because software versions varied on each workstation as well, it was difficult to determine which software set was running on which OS and patch configuration. This was a very difficult environment to improve. Making matters worse, several Navigation staff members had a strong desire to expand cross-mounting of all the workstations to help provide a workaround for this version control issue. This approach was based on the idea that if all conflicting versions of the software could be run on all the machines, a working set could (hopefully) emerge. Though misguided, this approach demonstrated one of the first needs for a paradigm change in how these systems would be configured.
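The danger of universal cross-mounting can be made concrete with standard series/parallel reliability arithmetic. The numbers below are illustrative, not measurements from the MMNAV network:

```python
# If every workstation NFS-mounts every other one, ALL machines must be up for
# any of them to work: a series system. A "star" around a central server pair
# depends only on that pair. Standard reliability math; numbers illustrative.

def series(unit_availability: float, n_units: int) -> float:
    """Availability when all n units must be up (full cross-mounting)."""
    return unit_availability ** n_units

def warm_backup_pair(unit_availability: float) -> float:
    """Primary plus warm backup: down only if both servers are down,
    assuming independent failures."""
    return 1.0 - (1.0 - unit_availability) ** 2

a = 0.999  # assume each machine alone is 99.9% available (hypothetical)
print(series(a, 14))        # 14 fully cross-mounted workstations: roughly 0.986
print(warm_backup_pair(a))  # central server pair: roughly 0.999999
```

Even with quite reliable individual machines, chaining fourteen of them in series multiplies their small failure probabilities into days of expected outage per year, while the redundant central pair stays within minutes: the quantitative version of the single-point-of-failure argument that motivated the centralized design.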

1 In one particularly egregious case, this author was headed out one evening to enjoy a two-week winter break in Oregon, and made the poor personal decision to answer "just one more call" on his office line. After some discussion with the caller, it was determined that the MMNAV network had gone silent. Over a twelve-hour period, this author assembled a team of four system and network administrators from three separate organizations to help isolate, and finally replace, what turned out to be a bad optical Ethernet transceiver. Sadly, this ended up canceling the trip, as the author missed his transportation and promptly got sick for a week. This unfortunately would not resolve the issue either, as the same transceiver would fail again the following May (at least there was a suspicion of where to look) [42]. It would take a network upgrade/overhaul some years later (see Section IV for more details) to finally put these problems to rest.


Indeed, what had become clear upon examination of the problem was that what was needed was a single "known good" working version of the software—not twenty differing problematic software versions, each working in slightly divergent ways. In addition, the cross-mounting proposed for this environment had a fairly serious flaw: Like many other Unix operating systems of the time, the implementation of the Network File System (NFS) under HP-UX was not robust—if one such workstation failed, it could bring down all the other workstations. Each workstation on the network would become a single point of failure for the entire network! Resolving just these initial concerns spanned many months, but the experience served as a catalyst for systems design for the rest of the mission. The proposed multiple-software, cross-mounting approach would be opposed vigorously by the System Administration team (particularly this author) until the Navigation staff members came to appreciate the advantages of a centralized software repository. This was a hard-fought and hard-won struggle. This design, with a carefully watched set of centralized servers, would be far superior to each workstation, domino-like, acting as a single point of failure. This alternative came to comprise the systems design used in several layers of the computational environment employed for Navigation. Each workstation for the Navigation team was reinstalled with the same set of operating system software (HP and Sun) and patch clusters, and then regression tested to ensure that the flight software would function as intended to design specifications. This configuration had only two sources of software: the individual operating system and its attendant local software stored with each machine, and the software provided from one central location. The two most powerful workstations were set up as identical central file and compute servers, configured in a classic "star" configuration as shown in Fig.
1 (for other types of networks this may also be described as a “home run” configuration). One would run as a warm backup of the other, and these two machines, a primary and a backup, served as the single repository of all software, including Flight Operations ground software, not directly a part of the operating system of the individual workstations. This configuration had several key system-level improvements over the previous system. It was based on principles seen in distributed systems design [9]; all users of the system saw the same files and software, all non-operating system software could be updated in a single operation, at a single location. To be clear, several of these design ideas may not seem this unusual or radical in the current age of virtual machines and cloud-based computing services (or as stated in the old parlance, of client–server computing). Many of the issues examined here are the same problems that have been worked by virtual and cloud computing services as well. In similar fashion, as mentioned above, it became clear that the security of these systems needed to follow a similar push—in this case to implement security by design. Although SSH and encrypted TELNET were, and had been available for several years, most Navigation engineers still defaulted to using the non-encrypted versions of RLOGIN and TELNET. Even though by this point SSH provided a significantly superior user experience in both usability (especially with its window tunneling capability) and security, it was not generally used. This was mostly due to familiarity, and the setup for SSH not having been configured the same on each

The Cassini/Huygens Navigation Ground Data System …


Fig. 1 Types of network topology

machine. By incorporating the standardization of security into our CM process, we could deploy systems where the users could be assured that these encrypted communication tools worked as expected, and in fact, in the case of SSH, worked vastly better than the insecure tools they replaced. Once the user community saw how well the new services worked and gained confidence that these services would work as expected, we were able to disable the non-encrypted versions of these tools entirely on the Navigation computer systems. (It would take many more years before this goal was accomplished on the JPL campus as a whole.)

Table 2 conveys the initial hardware loadout that was used by the Cassini Navigation team. As mentioned above, most of the Navigation workstations through the post-Jupiter interim upgrade were HP 715/75 systems. There were a number of ill-recorded hardware swaps early in this period when two HP 715/50 workstations were swapped out and the internal disk drives were upgraded to 2 GB FWD. The drive terminology follows standard industry convention, in this case Fast Single-Ended—10 MB/s (FSE) and Fast Wide Differential—20 MB/s (FWD), and denotes the architecture and, more importantly for this discussion, the aggregate speed of the drive bus. The two HP 735/125 workstations became central file and compute servers as mentioned, and the majority of the external disk storage was relocated to these machines. The SPEC-based FP and SPEC-based FP throughput values are representations of a metric used to compare the Floating Point peak processing and throughput (average over time) processing capabilities of these systems (Floating Point processing capability is the central metric of concern for Navigation software).
How these values were derived will be discussed in the “Orbital Tour Reconfiguration” subsection, under Strategic Considerations, but it should be noted that this is an effort to come up with a single metric to compare all the systems used over the 25-year timespan covered by this paper.2

2 No general systems metric exists over this timespan, and it represents the combination of results from the Standard Performance Evaluation Corporation [25], specific Navigation benchmark utilities (NBODY), and performance comparison of Navigation software on differing hardware platforms to come up with this scale.
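The retirement of the clear-text services described above, once SSH had earned the team's confidence, amounts to commenting them out of the inetd configuration and reloading inetd. The following is a minimal sketch, not the actual Cassini procedure—service names follow common Unix conventions of the era, and the function writes a ".new" file for review rather than editing in place:

```shell
# Hypothetical sketch: comment out the clear-text remote-access services
# (telnet, rlogin via "login", rsh via "shell") in an inetd.conf.
disable_cleartext_services() {
    conf=$1
    sed -e 's/^\(telnet[[:space:]]\)/#\1/' \
        -e 's/^\(login[[:space:]]\)/#\1/' \
        -e 's/^\(shell[[:space:]]\)/#\1/' \
        "$conf" > "$conf.new"
}

# After review, one would install the edited file and signal inetd to
# re-read its configuration, e.g.:
#   mv /etc/inetd.conf.new /etc/inetd.conf
#   kill -HUP "$(cat /var/run/inetd.pid)"   # pid file path varies by system
```

The review step matters: on a flight-operations network, silently cutting off a service someone still depends on is itself an outage.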


R. M. Beswick

Table 2 Initial hardware configuration for Cassini Navigation

Model          CPU                 RAM      Disk         SPEC-based FP   SPEC-based FP throughput
Sun IPC 4/40   L64801 25 MHz       16 MB    120 MB FSE   0.01270         0.01521
HP 715/75      PA-7100 75 MHz      128 MB   1-2 GB FWD   0.09863         0.1189
HP 715/100     PA-7100LC 100 MHz   128 MB   1-2 GB FWD   0.1315          0.1585
HP 735/125     PA-7150 125 MHz     256 MB   2 GB FWD     0.1716          0.2077

Total network disk storage (on HP 735/125 servers): 4-8 GB FWD; 8-16 GB FWD (including backup)
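To illustrate how a single composite scale like this gets used, the ratio of two SPEC-based FP figures gives the relative floating-point capability of two machines. The values below are taken from Table 2; the helper itself is only an illustration, not part of the paper's actual tooling:

```shell
# Ratio of two figures on the same scale = relative FP capability.
relative_fp() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# e.g., an HP 735/125 (0.1716) versus the Sun IPC 4/40 (0.01270):
#   relative_fp 0.1716 0.01270   # roughly a 13.5x difference
```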

Strategic Considerations3: Simplicity of Design

In a like manner, a similar lofty goal came from the principle of Simplicity of Design, also known more popularly as the KISS principle [10]. In the Navigation context, that meant the systems design should stress simplicity of function, and each workstation, instead of having a customized set of software, should be capable of running all the software needed for Navigation operations (i.e., be interchangeable with all other Navigation workstations). In the event of a failure, or perhaps just a desire to run software on a faster machine, all machines would behave in the same manner. One machine could be swapped out for another transparently, the only possible difference being in speed/capability. A similar paradigm appeared in the Industrial Revolution, where the invention of interchangeable parts greatly improved efficiency. As a bonus, having identically configured machines proved invaluable for troubleshooting problems and greatly simplified configuration management throughout the rest of the mission. In order to improve security fault tolerance, these efforts would be combined with the goal of one (security) fault-tolerant machine model that would be the best implementation of a secure system setup, designed to be secure from the ground up. Although this idea would take many years to develop into its final implementation [6], the idea of baking good security into the computational environment from the start would take shape in these early years.

2.2 Launch to the Planets

Once this initial round of environmental fixes was accomplished, a significant mission milestone was achieved. On October 15, 1997, capping off a year of increasingly heightened activity, the Cassini/Huygens spacecraft began its journey to Saturn

3 Valuable digressions to go into detail about relevant key areas of high-level systems design or systems engineering of interest will be denoted in the text as Strategic Considerations, while areas of relevant technical concerns (particularly covering areas of interest for systems administration) will be called out in the text as Tactical Considerations. In addition, the “Observations and Lessons Learned” section will cover a number of such observations.


Fig. 2 Cassini launch and Cruise

(detailed in Fig. 2). The launch reviews and ORTs would come to consume all the available staff time (and then some) of the Navigation team and the Navigation system administrators. Much systems engineering effort would go into putting the improved configuration into practice and ensuring that it would provide a reliable environment for launch. Although the system model would change little from the approaches discussed in the last section, the efforts undertaken would be in areas of refinement. This would include setting up an operations area in the previously mentioned JPL Space Flight Operations Facility (SFOC) to house the computational assets of the Navigation team, moving all computational assets to the SFOC to ensure the most stable network and power environment for these critical computers, and finalizing the planned changes to the two systems serving as file and compute servers (including centralizing all the file system space and greatly increasing the backup of these systems). This can be surprisingly hard to do under conditions of high user stress, but these details were going to be vital for the implementation of the new system model and for the rest of the mission. These changes and the challenges of support would provide further good examples for later, more formal designs to come. Each HP workstation was upgraded to HP-UX 9.07, and each Sun workstation to SunOS 1.1.4, with care taken to ensure each machine was operating at the same patch level. Several additional machines were deployed to support launch functions, with some systems dedicated to support an effort to ensure that the spacecraft trajectory would not return to Earth. In addition,


the systems administration staff had not only the concern for support of a complex network of production Unix systems but also the demands of being a critical part of an active Flight Operations team. In this operational environment, the highest possible effort to solve problems was expected. Troubleshooting and successful resolution of problems, as they occurred, was critical to the operation of the spacecraft (as well as doing everything possible to resolve potential problems before they became problems). A former Flight Director for NASA, Gene Kranz, described the required attitude of such personnel: “…they fully understand that the price of their admission is Excellence, and that a Spartan set of standards will govern their conduct…. Failure does not exist in the lexicon of a flight controller. The universal characteristic of a controller is that he will never give up until he has an answer or another option.” [11] While high-stress periods during the mission would recur, meeting the challenge of these initial efforts served as a powerful impetus for the many improvements to the system in the years to come. As a part of this effort, and for the only time during the mission, a full-scale disaster recovery effort was put in place. This was to ensure that in the event of a major disaster, such as a major earthquake or a large brush fire in Southern California, a backup control site could provide a limited spacecraft navigation capability. An off-site backup facility was activated at the Goldstone Communications Complex (GCC) Emergency Control Center (ECC), located in the Mojave Desert, some 200 miles northeast of JPL’s main campus in Pasadena. The ECC comprised several Navigation workstations, with attendant local file system support, synchronized with the Navigation file servers in SFOC.
This site, directly tied into the Deep Space Network’s (DSN) array of radio telescopes at Goldstone, would allow operations staff to receive and uplink commands to the spacecraft, even with the loss of support of the infrastructure at JPL. Several ORT exercises were executed at the ECC, involving the deployment of Navigation team members and other operations support staff for days to the remote site. This involved daily plane flights to and from the on-site airstrip, as well as use of the GCC dormitory facilities. In addition, due to the limited network bandwidth available to Goldstone, keeping this site synchronized with the main Navigation operations computers became a real challenge. While we hoped that these changes could be made over the limited network connection, it turned out that numerous data tapes (and a surprising number of replacements for the tape drives4 ) would have to be used to keep the systems updated. (At one point this author almost had to be restrained from running “just one more set of backup tapes…” to the very remote site!) The costs of maintaining a separate set of hardware, including personnel, training, and transportation costs, would bring an end to this effort shortly after launch.

4 During this time, nearly 75% of all 4-mm DDS-2 tape drives (the standard used for backup for Cassini Navigation at the time) shipped to the remote site would fail shortly after arrival. After some investigation, we suspected that the poor desert roads and high altitudes (nearly 5000 ft. in some locations) were probably contributing factors. Some improvement was achieved by flying both the tapes and the drives to the site in carry-on luggage.


Tony Mendez: …There are only bad options. It’s about finding the best one.
Stansfield Turner: You don’t have a better bad idea than this?
Jack O’Donnell: This is the best bad idea we have, sir. By far.
—Argo, Warner Brothers [12]

Tactical Considerations: risk management, disaster management, and recovery

This decision would not stop numerous attempts over the mission life span by other organizations to encourage us to support or purchase solutions and hardware that could not meet our needs for disaster recovery. Explaining why we were not interested in these (non)solutions would prove to be a time-consuming activity indeed. Money spent on a solution that will not actually solve the problem is money wasted. To be clear, few people were more interested in an effective DR solution for the Cassini Navigation team than this author—however, from our own detailed analysis, it was clear that a better solution could not be achieved without a significant increase to our budget. In a detailed study conducted as a response to one such external request to join a collective DR effort in August of 2003 [13], the issues were laid out succinctly: A cursory review of the most likely potential failure modes reveals those areas where single points of failure are most problematic. They can be broken down by types:

• Software/OS fault (bug, hacker, worm/virus/trojan)
• Electro/mechanical (hardware) fault
• Prolonged power failure
• Water intrusion (most likely through defect in fire suppression system)
• Fire
• Earthquake
• Terrorist action.

These failure modes can thereby be described by the area they affect:

• Cassini Navigation Central Disk Array
• Loss of room
• Loss of building
• Loss of JPL.

From these categories, some scale of the likelihood of failure should be addressed (i.e., the likelihood of hardware failure of the disk array should be much lower than the likelihood of the loss of the laboratory). In addition, it should be considered what action might be taken to mitigate the failure mode. This study examined the risk management profile of the various modes of failure [13]. In the citation above and chart below, the term disk array is synonymous with fileserver, and the failure case of a loss of a workstation was not considered of interest due to the modular design that we have discussed. A total loss of the central fileserver would be the least bad case that would halt the Cassini Navigation GDS. Table 3 presents the risk profile.


Table 3 Risk profile of various failure modes of navigation environment
(Rows are ordered from highest to lowest likelihood; columns run from lowest consequence, loss of the disk array, to highest consequence, loss of JPL.)

Failure mode | Loss of disk array | Loss of room | Loss of building | Loss of JPL
Software/OS fault | Backup disk array – other vendor | Backup disk array – from other vendor or other room, to (worst case) | Backup disk array – at ECC/GCC | Backup disk array – at ECC/GCC
Hardware fault | Backup disk array | Backup disk array – other room / cluster disk in other room, to (worst case) | Backup disk array – at ECC/GCC | Backup disk array – at ECC/GCC
Power failure, prolonged | Run new circuit to disk array | Run new circuit – from other room | Backup disk array – at ECC/GCC | Backup disk array – at ECC/GCC
Water (flooding) or fire a | Backup disk array / cluster disk across room | Backup disk array – other room / cluster disk in other room, to (worst case) | Backup disk array – at ECC/GCC | Backup disk array – at ECC/GCC
Earthquake | Backup disk array / cluster disk across room | Backup disk array – other room / cluster disk in other room, to (worst case) | Backup disk array – at ECC/GCC | Backup disk array – at ECC/GCC
Terrorist act b | Backup disk array / cluster disk across room | Backup disk array – other room / cluster disk in other room, to (worst case) | Backup disk array – at ECC/GCC | Backup disk array – at ECC/GCC

a In many cases, one often follows hard upon the other
b This is a category which may include (or preclude) cyber warfare or other military action

The traditional levels of low, moderate, and high risk (denoted in green, yellow, and red, respectively) follow the criticality scale used in GSFC-STD-0002 [14] and elsewhere. In addition, high (red)-risk items have solutions that could be implemented only with a significant increase to the Navigation operations budget—or with significant outside assistance. The “Software/OS fault” category comprises computer security issues as well as random failure, in keeping with the concept of security fault tolerance noted previously. The “loss of room” category was especially hard to categorize, for in some rooms the failure of a sprinkler or other minor water or fire hazard could happen for a variety of reasons. Moreover, during this period, the loss of one of a number of rooms in the SFOC (due to single points of failure) would result in the elimination of Spacecraft Operations Capability at JPL and require the activation of the Goldstone ECC center. (The issues discussed here have happily been resolved.) The activation of the ECC in an emergency required at a minimum a 24-h turnaround period and the deployment of significant numbers of trained staff to the remote site under potentially difficult conditions. In this examination, it became clear that there was a significant difference between providing backup support at other locations on the JPL campus and providing an off-site backup capability. A simple copy of the Navigation file system would not enable Navigation operations to continue. The minimum capacity necessary to support backup operations had to include an uplink and downlink capability to the spacecraft (found only at the Goldstone DSN site), as well as enough personnel to replicate the other parts of the Spacecraft team; otherwise the Navigation team would


not be able to do anything with their files! It was this huge increase in cost between a local backup and full “warm-site” DR support at the ECC (and the lack of any appealing options between these two cases) that led to the decision not to invest any further effort in this capability.

2.3 Inner Cruise Upgrades

After the successful launch of the Cassini/Huygens spacecraft and the attendant immediate operations, a second round of system improvements was begun. As mentioned, key systems design concerns became clear before launch, and it was prudent that these improvements should be made. The Sun workstations were retired, and the HP workstations would be upgraded to HP-UX 10.20 in a precise, scripted manner by following an extensive checklist of software loads and patching instructions from an initial install. With thousands of patches to HP-UX 10.20’s core operating system and application software (seemingly following an Agile/DevOps-like software engineering paradigm), extensive version control became critical to ensure that the same operating system, software sets, and patch sets were installed on all the HP workstations. HP would release updates to its operating systems and application software on a quarterly basis, associating particular patch bundles (containing thousands of patches) with these releases. In order to guarantee that the same set of software would be installed on a similar set of workstations, the same media, with the same part number and date of production, would have to be installed, with the same associated patch media, on all machines. Otherwise conflicts would arise between patch sets (this made reading the Release Notes compulsory). This was much more time consuming, but it did produce a more stable and well-defined computing environment. This experience would later prove useful in dealing with the very large number of software sets and patches in the open-source Linux operating systems used for the Saturn tour. The primary and secondary fileservers at the heart of the Navigation computing environment also had the RSYNC software package installed and configured so that they could remain in lockstep with each other.
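The nightly RSYNC mirror can be sketched in a few lines. Host and path names below are invented, and the deployment details (cron schedule, exact rsync options) are assumptions rather than the actual Cassini configuration:

```shell
# Pull a full, exact copy of the primary's tree onto the backup server.
# -a preserves permissions/ownership/times; --delete makes the mirror
# exact, so files removed on the primary disappear from the backup too.
mirror_primary() {
    src=$1   # e.g. navprimary:/nav/ (trailing slash copies contents)
    dst=$2   # destination tree on the backup server
    rsync -a --delete "$src" "$dst"
}

# Run nightly from cron on the backup server, e.g.:
#   0 0 * * *  /usr/local/bin/mirror_primary navprimary:/nav/ /nav/
```

Note that --delete is what makes the copy a true mirror, and also what makes the nightly run a point of no return for recovering files deleted on the primary.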
Instead of complicated manual cross-checking to ensure that the same program sets and files were available and identical on both systems, the backup server would automatically mirror the primary server every night, so that the secondary server would contain a day-old copy of everything on the primary server, allowing for much improved fault tolerance (albeit with a several-minute failover in the event of a failure). This fault tolerance would be far better than a possibly haphazard switchover in a failure situation to a backup configuration where files that had not been copied would have to be pulled from backup tapes—a process that could take hours or days. In addition, this promoted a degree of “user-proofing” in the Navigation environment. With this configuration, if important files or directories were corrupted or deleted, it was no longer necessary to refer to the backup tapes to recover prior versions, possibly days or weeks old. As long as the affected user or users became aware of the problem


within one day, the files could be reverted to their prior state. This did have the limitation that at midnight (or whenever the process actually started), the opportunity for recovery would be lost as the mirror overwrote the previous day’s copy. This was a real improvement to the system at low cost, namely the sacrifice of nearly half of the disk capacity of the Navigation GDS. This would prove to be a wise decision. After the significant time expenditure involved with the update to the HP workstations, we realized that a better mechanism had to be used to install these systems. The process involved a complex checklist of operating system media, application software media, patch media, and carefully chosen install options. Then the installation had to be tested to verify that it matched the desired configuration. The entire process on a slow workstation (HP 715/75) would take a minimum of ten hours if everything went correctly. After some discussion with HP-UX vendor support services and systems administration support staff, we learned that HP’s software package IGNITE supported whole-system imaging [15]. This software set was different from the Solaris Jumpstart tool or the Red Hat Kickstart tool, which automate the process of installing a system through a package-based installer (a tool that runs the installation steps for you automatically). IGNITE instead made a complete software “golden image” which could be used to install an exact clone of a system on another workstation. Although it took some time to work through the complex setup, it ensured at a very atomic level that machines had exactly the same software set. Moreover, deploying a new workstation from “bare metal” (a completely blank state) to a fully functional setup took under 90 min, rather than the previously cited ten hours.
This was of tremendous benefit to troubleshooting operations problems, because one could ensure that there were no differences between the entire systems on two separate machines, thereby eliminating whole classes of possible faults arising from variances in the operating system/patch level or system configuration. (The largest regret was the realization of how much time could have been saved if this tool had been used for system setup earlier.) It also became clear that not only could clones of workstations be made in a rapid manner, but such clones could be very tightly secured. Instead of having to perform numerous customizations repeatedly to secure an entire network, it was now reasonably easy to deploy a workstation already configured in a highly secure manner. File system permissions and process controls, even difficult-to-configure setups such as secure remote monitoring and process accounting, could be performed once and cloned many times. Now all Cassini Navigation machines could be configured in an identical and secure manner. This allowed for true “Defense in Depth”—as Cheswick, Bellovin, and Rubin discuss in [16], that of redundant systems providing multiple layers of protection (much like fault-tolerant redundant systems), rather than a system with a strong defensive layer (such as a firewall) but insecure in interior configuration. Now it would be possible to efficiently have a strong exterior and a strong interior.
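The kind of whole-system cross-checking that cloning makes possible can be illustrated with a simple manifest comparison: fingerprint every file under a tree on each machine and diff the results; an empty diff means the trees are byte-identical. This is only a sketch of the idea, not the actual verification procedure used on Cassini:

```shell
# Produce a sorted checksum manifest of a directory tree. Running this
# on two cloned machines and diffing the outputs reveals any file that
# differs between them.
manifest() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k3 )
}

# e.g., on each clone:  manifest /opt/nav > /tmp/manifest.$(hostname)
# then compare:         diff manifest.hostA manifest.hostB
```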


During this time, the firewall utilized for the entire MMNAV network was upgraded, as its security needed an overhaul. A Cisco stateful packet-filtering firewall was put in place, along with a significantly upgraded mechanism for transfer of email, SSH, and encrypted TELNET. After a review of the system communication requirements, and some significant testing, we could reduce the number of open TCP and UDP ports (communication channels) on this firewall from several thousand to a mere handful. The network configuration restrictions would be considered in a similar manner to those developed for the secured “building block” workstations. Much of the effort for this upgrade involved convincing staff members that numerous protocols, such as X-Windows or network printing, did not and should not go through the firewall. Although the network would still need organizational and hardware improvements, its frontline security (the “front door”) would be significantly improved. This work would soon prove its considerable utility. The greatest concern for the computer security profile of the Navigation environment came in the months before August 1999, when the Cassini/Huygens spacecraft flew by the Earth for a course correction on its way to Saturn. In order to cover all bases, both an accident and a deliberate, malicious act that could put the spacecraft on a course to return to Earth were considered and evaluated. Similar to the lead-up to launch, a period of maximum effort would ensue—this time focused on the computer security of the spacecraft and its Ground Data System. Of particular focus was the most crucial component, the Navigation team and its Ground Data System. For several months before the August 1999 encounter, outside auditors and consultants, counterterrorism experts, and even bomb-sniffing dogs were used to look for any possible flaw in the computing environment. An erroneous or deliberate command could cause great trouble, possibly ending the mission.
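The default-deny posture just described—block everything, then open only the handful of reviewed services—can be illustrated with a hypothetical packet-filter rule set. It is shown in a generic ipfilter-like syntax for concreteness; the actual device was a Cisco firewall with its own configuration language, and the host names here are invented:

```
# Everything is denied unless explicitly permitted below.
block in  all
block out all

# The reviewed handful of services, stateful where supported:
pass in proto tcp from any to navgate  port = 22 keep state   # SSH
pass in proto tcp from any to mailhost port = 25 keep state   # SMTP
# ...one line per reviewed data flow; X11, printing, etc. stay blocked.
```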
During this time, the system model described previously was implemented, comprising a greatly hardened network firewall setup and the highly secured cloned workstations and servers. Insecure systems were either removed from the network or upgraded. Very detailed auditing was set up, and advanced logging provided a better picture of what was happening to these systems. As seen in the discussion and references [5, 6], the workstation became a building block for the rest of the network, and the firewall setup was designed around data flows and needs similar to this building block model. Although time consuming to set up, this configuration was fully operational. Each workstation and server contained an identical set of software and was configured in a nearly identical manner—to the most restrictive security settings that the Navigation team members would accept and still accomplish their engineering objectives. Likewise, the security settings for the firewall were nearly identical—every extraneous data flow or open port was examined and turned off if unnecessary. All systems were patched to the same known good (and almost current5) set of security updates. These machines would comprise a trustworthy setup.

5 These two terms are not, as popularly believed, the same. Patches and other software updates may introduce bugs, especially in the currently popular Agile/DevOps software engineering paradigms. Absent other testing schemes, it is wise to adopt a “wait and see” approach to patching of critical software systems—especially in times of stress when major computer security bugs are announced.


Strategic Considerations: Security design suboptimality—the limits of perfection

The experience of the Earth flyby was useful in other ways for making clear principles of security design. While the Navigation environment would be sound, environments that are secure to truly optimal standards are rare indeed. One of the trade-offs that must be made involves time, effort, and available resources. Effort would be spent fixing the biggest problems, and the problems that admitted of expedient or quick fixes. (This is a sort of scheduling 80/20 rule, or Pareto principle—working from most critical to least critical concerns and, in parallel, from easiest to most time consuming.) The problems that are serious must be fixed, or one does not have a working setup. A good, secure means of logging into a system remotely is one such feature that must work correctly. Likewise, simple problems that can be fixed quickly should be resolved. Resetting a user’s password is usually a quick fix—even if a system administrator is busy, this is a short task. Absent infinite amounts of time, effort, and resources, however, the security setup will be suboptimal. From this author’s own experience, there is always another setting to be tightened down; there is always another improvement or fix that could be implemented. One must do the best one can with what one has. One such hypothetical example was the case of “the Mad Navigator.” How do you prevent someone, tasked to fly the spacecraft, from commanding the spacecraft to crash (in this case on Earth)? Just changing security settings to prevent individual engineers from clobbering one another’s files was considered unworkable: users of the system had to work together, trust one another, and sometimes modify each other’s work. Moreover, there were many clever ways one might produce a solution that could pass review but, instead of sending the spacecraft on a “good” trajectory, would cause it to crash.
Several approaches were investigated to find a way to verify independently that the Navigation solution and the commands uplinked to the spacecraft were valid and “good.” None was found that did not involve the same software and computer systems that would be used to generate the initial trajectory. Furthermore, no scientific calculator or other disconnected computer system could perform some stages of this validation.6 Ultimately, the best solution to this problem involved a careful cross-checking of several people’s independent solutions. This problem has similarities to those found with pilots of large airliners—at some point one simply has to trust the people who fly one’s equipment.
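The cross-checking of several people's independent solutions can be caricatured in a few lines of shell. A real comparison would be numerical, within tolerances, on trajectory parameters rather than a byte-for-byte file compare, and all names here are invented:

```shell
# Flag any independently produced solution file that disagrees with the
# first one. Agreement among independent solutions is the evidence of a
# "good" product; a single dissenter triggers review.
cross_check() {
    ref=$1; shift
    for f in "$@"; do
        if ! cmp -s "$ref" "$f"; then
            echo "DISAGREEMENT: $f differs from $ref"
            return 1
        fi
    done
    echo "all solutions agree"
}
```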

Even large companies can make mistakes! A patch failure that causes a home computer to stop working can be painful. A similar failure on a critical operations machine could end the mission. An examination of the release schedule for Microsoft and Intel software and firmware updates during the Meltdown/Spectre vulnerability disclosures in January 2018 may be instructive.

6 Versions of this problem involving significantly larger numbers of actors are considered in the case of the Byzantine Generals Problem, or more simply Byzantine failure, while solutions to this problem are classified as Byzantine Fault Tolerance [43].


2.4 Outer Cruise Reconfiguration and Performance Specification

Another significant series of changes to the computational environment began in the second half of 2000. These involved both an interim set of upgrades and an exhaustive period of requirements specification and review in which the Navigation performance requirements for the Saturn orbital tour were finalized. From these requirements, the systems engineering requirements for the Navigation Ground Data System were derived. The interim upgrades were important due to the nature of the funding regimes that existed during different phases of the mission for Navigation. In Cassini’s overall budget, after the initial equipment purchase, one major upgrade was slated to occur during the mission. After extensive consultation among the Navigation team, it was decided to delay this upgrade to the last possible moment. This was a deliberate decision (one not followed by the rest of the Cassini Project) to focus on the long-term needs of the Navigation team. The team endured spartan levels of computational capability for several years, but in return could take advantage of the fastest and most powerful hardware during the most computationally intensive part of the mission. This decision meant that the Navigation team did not have sufficient computational capability to complete the orbital tour for nearly three years—in fact it would not be possible until the final hardware upgrades were completed a few months before Saturn Orbit Insertion. (This is discussed further in “Orbital Tour Reconfiguration.”) The interim upgrades undertaken during this period (at no or very little cost to the project) would serve to ameliorate some of this deliberate budgetary self-restraint. The first of this series of interim upgrades came with the merger of our network operations capability (and networks) with the younger, but much larger, organization running the JPL Flight Operations Network.
Discussions concerning this merger had been taking place for several years. The MMNAV systems and network administration staff were concerned about this change for two major reasons. First, this network infrastructure would no longer be under the control of our systems administration staff: we would lose control over a critical part of the infrastructure necessary for our MMNAV network (used for Flight Operations support of Cassini/Huygens, Galileo, Deep Impact, Stardust, and missions to Mars). Second, this network would be directly connected to the Flight Operations firewall, which, although newer and more robust, supported far more users and projects, and hence more security openings in its configuration. This was a step backward for security. Changes would involve more bureaucracy and less autonomy. Instead of being able to fix most problems directly as they occurred, MMNAV staff would have to notify others for problem resolution (sometimes as one complaint among many). Instead of direct ownership of the problem, we would not even have visibility into how our mission-critical network would be built and operated. However, these concerns were allayed by the one factor we were hard pressed to compete with: cost. As mentioned above, for several years we had been subject to harsh cost constraints. While our staff had the technical capability to support our networks, the stratospheric cost of networking hardware at the time put all but the most crucial repairs far out of reach. In comparison, the JPL Flight Operations Network had sufficient money and authority to modernize and maintain the networks under its purview, including ours. Although this author would like to stress the values of organizational ownership and autonomy, the opposing argument was too hard to ignore. In time, this merger eased these concerns somewhat, as numerous long-needed improvements in capability and performance were implemented when funding to upgrade, reroute, and standardize the network became available.7

This network hardware upgrade was not the only improvement in this period. The computational environment used by the Navigation team was also significantly upgraded. The rest of the Cassini Project had already begun its hardware upgrades for Saturn tour operations, moving from a large number of high-performance HP workstations to newer Sun workstations that met or exceeded the necessary performance requirements. There was a long history of Sun workstation use and familiarity in operations, and most of the Ground Data software used was primarily based around the Sun hardware platform—but during the initial Project hardware purchases, HP-UX systems had an advantage in performance and cost that was hard to beat. As a result, many of these HP workstations became available, and the best of them were considerably more powerful than the workstations and servers then used by Navigation. This spare high-end hardware would help fill the gap for the remainder of the Outer Cruise. It was a significant, no-cost upgrade. (Although the machines would take several months to reconfigure and restore, their value was many multiples of the author's salary.) As a result, the average computational performance of the Navigation team's workstations improved by a factor of three, and that of the servers by a factor of six, while available main memory (RAM) and disk storage were quadrupled in size. This upgrade was not unappreciated.
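The decision described above, to defer the one budgeted upgrade to the last possible moment, leans on the industry growth curve discussed later in this chapter: hardware bought later buys disproportionately more capability for the same money. A minimal sketch of that arithmetic, assuming the textbook two-year doubling period of Moore's Law (the figures here are illustrative, not the mission's actual numbers):

```python
# Illustrative only: relative performance of hardware purchased after a
# deferral of n years, assuming performance doubles every two years
# (the textbook Moore's Law figure).  Not mission data.

def deferred_speedup(years_deferred: float, doubling_period: float = 2.0) -> float:
    """Performance of later-bought hardware relative to buying today."""
    return 2.0 ** (years_deferred / doubling_period)

if __name__ == "__main__":
    for n in (2, 4, 6):
        print(f"defer {n} years -> ~{deferred_speedup(n):.0f}x the performance")
```

Under this assumption, deferring six years buys roughly eight times the capability for the same budget, which is the essence of the trade the team made against several lean years of interim hardware.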
Table 4 shows the interim hardware configuration used by the Cassini Navigation team through the end of the Outer Cruise period. (The main overhaul discussed in the next section would proceed in parallel with this setup.) As mentioned above, 20 of the Navigation workstations of this period were HP J210-class systems, configured so that the internal drives were (software) mirrored. Two HP J282 (dual-processor) workstations served as central file and compute servers, like the previous HP 735/125 systems, and the external disk storage would continue to be attached to these machines. This setup would not last long, as two HP J2240 workstations were found. These also carried Ultra-Wide Single-Ended (UWSE) drives, rated at 40 MB/s (twice as fast as FWD). This was a treasure trove—the two systems were brand new, an extraordinary find, each worth around $55,000 in 2001 US dollars. This additional capability would be of great use over the next few years. During this time frame, we also began to examine what sort of computational environment would be necessary for the Saturn orbital tour. There were two significant phases to the development of the Saturn orbital tour performance requirements.

7 Not being able to program your own network switches for such things as server failover can be irksome when trying to explain to another engineer why a particular complex setup is necessary; however, it is hard indeed to ignore a nearly immediate 100-fold upgrade from 10 Mb/s Ethernet to 1000 Mb/s Ethernet.

The Cassini/Huygens Navigation Ground Data System …


Table 4 Post-Jupiter flyby interim hardware upgrade for Cassini Navigation

Model     CPU              RAM     Disk         SPEC-based FP  SPEC-based FP throughput
HP J210   PA-7200 120 MHz  512 MB  4x2 GB FWD   0.2274         0.3728
HP J282   PA-8000 180 MHz  1 GB    4x2 GB FWD   0.7080         no estimate available
HP J2240  PA-8200 236 MHz  1.5 GB  4x2 GB UWSE  0.9648         1.3279

Total network disk storage (on HP J2240 servers): 36 GB FWD; 72 GB FWD including backup

A formal specification process was undertaken for the Navigation Software System [17], itself derived from spacecraft operational requirements [18]. This first effort derived a set of functional requirements necessary for the Navigation team to conduct orbital tour operations; however, the set was incomplete and demanded further expansion. A series of lengthy meetings with contributing members of the Navigation team in 2002 worked through these functional requirements and led to a set that provided considerable clarification and expansion, captured performance concerns, and gave a good set of foundational metrics with which to design the Navigation computational environment [19]. Much of the effort to come up with the concise and effective system model discussed above proved very useful in these discussions and became a part of these requirements. The central requirement, which undergirded many of the other requirements and the system design choices, stated:

4.18 The Navigation Hardware and Operating System Software shall be benchmarked in its current operational [Launch/Cruise] state, on both client and server systems, using evaluation tools including industry standard CPU benchmarks as well as the NBODY (Section 312) suite of benchmarks. These benchmarks should be used to determine and obtain hardware that at a base level is five times (5×) faster than the current Navigation Hardware [ten times (10×) goal] for Tour operations. [19]
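Requirement 4.18 reduces to a mechanical check: benchmark the current operational state, then require candidate hardware to clear a 5x floor (10x goal) on every metric. A sketch of that check follows; the benchmark names and numbers are hypothetical placeholders, not the actual NBODY or SPEC results:

```python
# Hypothetical baseline from the Launch/Cruise operational state:
# an NBODY runtime (lower is better) and a SPEC-style FP score (higher is better).
BASELINE = {"nbody_sec": 20.0, "spec_fp": 0.23}

def speedup_over_baseline(candidate: dict) -> float:
    """Worst-case speedup of a candidate machine across both metrics."""
    time_ratio = BASELINE["nbody_sec"] / candidate["nbody_sec"]  # runtime: smaller is faster
    spec_ratio = candidate["spec_fp"] / BASELINE["spec_fp"]      # score: larger is faster
    return min(time_ratio, spec_ratio)

def meets_requirement(candidate: dict, factor: float = 5.0) -> bool:
    return speedup_over_baseline(candidate) >= factor

candidate = {"nbody_sec": 3.0, "spec_fp": 1.5}        # hypothetical tour-era machine
print(meets_requirement(candidate))                   # clears the 5x floor
print(meets_requirement(candidate, factor=10.0))      # but not the 10x goal
```

Taking the minimum across metrics enforces the "at a base level" wording: a machine that is fast on one benchmark but slow on another does not pass.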

This requirement, absent more stringent ones on specific software sets, served as the fundamental benchmark for system performance for the Navigation Ground Data System. (Other core requirements provided similar metrics for system availability and specific capability and are examined in the appendix.) These requirements were a useful yardstick for the systems evaluation process. They can be succinctly summed up thus:
• The goal for the Navigation Ground Data System was hardware at least five times faster than the then-current Navigation hardware, and preferably ten times faster.
• Components would be modular and interchangeable, with one workstation for each engineer, so that each engineer could do all work on their own workstation without interference from other engineers.
• Each workstation and server would be capable of meeting the performance requirements for Navigation.


• The system would allow for reversion to any system state through the mission lifetime, to review and analyze prior Navigation results. (This is accomplished through backups and backup hardware.)
• Individual workstation and fileserver storage capability would be determined through an extensive series of user interviews. The Navigation computational environment would have 150 GB of local file system storage and enough data storage for all files and necessary Navigation deliveries until the Prime End of Mission (EOM)—estimated at 3 TB.8 This storage capability would be configurable to scale easily to five times its capacity for future growth.

8 These estimates were, as all such estimates are, woefully inadequate. See the "Observations and Lessons Learned" section and the derivation of this estimate in the appendix for more detail.

From these Navigation Hardware performance requirements, we could examine current hardware and operating system software offerings from major systems vendors to determine which could best meet Cassini Navigation's needs. This is the subject of the next section.

(1) Strategic Considerations: risk management, disaster management, and recovery

Planning for such a mission as this is an exercise fraught with difficulty. Predicting hardware and software changes more than a decade downstream, and making purchasing decisions based on those predictions, can be extraordinarily challenging. During this evaluation process, it was very clear that a wrong decision could lock the Navigation team into a suboptimal, perhaps even unsupported, design for a decade or more. Such a solution might look very promising at first.9

9 See: Amiga, NeXT, VMS, SGI-IRIX.

One of the difficulties in such "future proofing" exercises comes from the nature of the computer industry itself. The ever-increasing memory and processing power of computer systems has followed the exponential growth curve of Moore's Law for more than fifty years—namely, that every two years (sometimes incorrectly cited as eighteen months) aggregate computer processor performance will double [20, 21]. The even more aggressive corollary for disk drive storage, Kryder's Law, states that every year aggregate disk storage will double [22]. (The Navigation experience is discussed later; see the "Observations and Lessons Learned" section for more detail.) Even absent any other change in the computer industry, such growth would be very difficult to plan around. The computer marketplace can change dramatically in a short time and upset both assumptions and well-thought-out plans. Being too conservative can cause one to miss dramatic improvements in particular technologies; being too aggressive can cause one to invest in bad ideas. In addition, corporate economics play a role. In our experience, it is very hard for anyone to compete with the race to the bottom line. Detailed systems analysis of the particular hardware and software offerings from computer vendors can be helpful, not to mention detailed analysis of the long-term health of the parent corporation itself. Having the broadest possible capability to change solutions and adapt is very helpful—however, as in the case faced by


the Cassini Navigation team, there may not be much of an option in that regard. Ultimately, it comes down to an exercise in risk management. These points bear noting: although the evaluation process discussed in the next section worked well, this author was convinced (wrongly) that a specific solution would provide the most performance and benefit for our network. Ultimately, the metrics told a different story—and we went with those metrics. As it turned out, that was the correct decision. Caveat Emptor!

(2) Tactical Considerations: The two great virtues of a system administrator

There are specific approaches and character traits that are beneficial for a system administrator or systems engineer working in such an environment as this. The famous software engineer Larry Wall proclaimed that the three great virtues of a programmer are "laziness," "impatience," and "hubris" [23]. Similarly, two traits that have proven themselves to be real virtues for a system administrator are "ownership" and "cleverness." During the operational part of the Cassini Project, and especially during the several upgrades to the computational environment, the fact that the System Administration team was embedded with the Navigation team greatly improved user assistance and the computational environment as a whole. Embedding meant much better visibility into user needs and problems, and the SAs gained a much better understanding of how the system was being used. Continual interaction meant that long-standing complex issues were resolved for the benefit of all users, while the system design improved almost automatically. Being able to assign responsibility, ask for correction, and give compliments and criticism greatly improved both the responsiveness and the personal investment of the Systems Administration staff. The concept that "I am responsible for the success of the Navigation team's computer systems" is a powerful one.
Likewise, to be effective, system administrators must be clever and able to think on their feet. In the event of a problem, being able to extract oneself from difficulty can be a crucial skill. Many operations issues are resolved under pressure, and sometimes a system administrator is called upon to produce a technical miracle. As an example, during the interim Outer Cruise upgrade described above, the spartan computational resources available were a source of great concern, even though the advantages of the long-term planning we had done were clear. Cleverness came to the fore when the time came to take possession of the hardware. Even after we had investigated the available hardware systems and worked hard to finalize the transfer of this hardware, the entire approach was imperiled when the hardware was inadvertently shipped to an off-site storage facility controlled by an external organization. Fortunately for us, this equipment was still owned by the Cassini Project. (A lack of ownership or concern might have ended the whole interim upgrade at this point.) With some cleverness, however, the storage facility where the hardware was kept was discovered, and an inspection tour of the hardware was obtained. Notebook in hand, taking "just a few more minutes" than expected to jot down serial and property numbers would prove invaluable! While the external organization would feign lack of knowledge (possibly really having


lack of knowledge) about hardware that it was storing, a detailed memo with the location, serial, and property numbers of the hardware, and a clear request to have such hardware moved to a specific location, could not be ignored. With more effort, the hardware was relocated once again—this time to serve the needs of Navigation.10

2.5 Orbital Tour Reconfiguration

Once these requirements had been defined and the interim hardware upgrade was complete, the time came to evaluate how we would meet them. A process of systems analysis ensued to determine which hardware and software platforms could meet Navigation's needs, and meet them in the most cost-efficient manner possible. Our goal was to determine the minimum number and best kind of "building block" types necessary to construct the new Navigation computational environment. Our fundamental model was a "star" configuration (see Fig. 1), with numerous workstations (one per engineer) combined with a small number of compute and file servers providing a centralized file system and process hosting. Simplicity of design was key—as was getting the fastest and most robust "building block" computers to make up that design. Performance evaluation and software tuning were conducted on a number of vendor-loaned test systems, first to consider how software performing similar analytical techniques would perform on these machines, and then to determine how the actual Navigation software sets would perform (where available). The Navigation software packages were built from C, Fortran 77, and Fortran 90 codebases at the time. They consisted of highly optimized code for Orbit Determination, Trajectory Analysis, and Flight Path Control. These packages were not optimized equally for every supported hardware platform—some major workstation platforms had only a few packages built for them, or none at all. As a result, while there were several interesting choices available during this period, the initial evaluation was limited to workstations from Sun, HP (two separate lines—the PA-RISC- and Itanium-based workstations), and Linux systems running on Dell workstations.
We had some experience with these systems, and we had demonstrated at least some capability in building NBODY (our system benchmarking software) on them. To be clear: there were a number of other interesting choices that had potential, and it was clear that, whatever system decision was made, our codebase could, and would, be ported to that system. (This author was particularly intrigued by DEC Alpha systems running Digital Unix, SGI-IRIX systems, and IBM POWER systems, all of which in one form or another were available in our test laboratory.)

10 The astute observer will note the central benchmark requirement denoted as [4.18] in Ref. [19] and the success of the interim upgrades in increasing performance, "… by a factor of three, and the servers by a factor of six." Increasing the speed and capability of the "…current operational [Launch/Cruise] state, on both client and server systems" meant that the goalposts would be moved significantly higher for the tour requirements. This may not have been accidental.


These vendor offerings were evaluated for benchmark performance of test and navigation software, software compatibility with the Navigation flight software, overall product reputation, expected hardware and software modularity, and ease of maintenance. Fault tolerance and security of the hardware and operating system software were also considered as part of this evaluation. A number of factors comprised each metric. We examined hardware and software modularity and ease-of-maintenance concerns to see how easy repair and replacement of hardware and software components of the various systems would be. (This was an area of considerable interest for System Administration, as it would directly impact our maintenance efforts throughout the life of the hardware.) We examined each offering's software compatibility with existing Navigation and Flight software: how much would the software need to be modified to function on the end system? Product reputation (how long the system could be expected to last) and the long-term viability of the product line (how long the company could be expected to support the particular product, along with the corporate stock price) were evaluated to determine the overall quality of each vendor's offerings. Among these workstations, a few key factors stood out. Sun workstations and servers were used extensively at JPL, and all of the software that would be used in the mission would be available for the Sun platform. The two HP offerings would ease the transition from the then-current HP-UX PA-RISC systems: HP-UX could run on both systems (Itanium and the newer PA-8700 RISC processor), HP binaries would work on both systems under HP-UX (albeit under emulation on Itanium), and Itanium would support Red Hat Linux as well. The Dell x86 Red Hat system had the fundamental advantage of cost (including lower-cost parts). We concurrently examined fileserver architectures.
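An evaluation across metrics like these can be organized as a simple weighted scoring matrix. The sketch below is purely illustrative: the criteria mirror the text, but the weights, vendor names, and scores are invented for the example and are not the team's actual figures.

```python
# Hypothetical weighted scoring across the evaluation criteria named in
# the text.  All weights, names, and scores are invented for illustration.
WEIGHTS = {
    "benchmark_performance": 0.35,
    "software_compatibility": 0.25,
    "product_reputation": 0.15,
    "modularity_maintenance": 0.15,
    "fault_tolerance_security": 0.10,
}

def overall_score(scores: dict) -> float:
    """Weighted sum of per-criterion scores (each on a 0-10 scale)."""
    return sum(WEIGHTS[criterion] * s for criterion, s in scores.items())

vendors = {
    "Vendor A": {"benchmark_performance": 8, "software_compatibility": 9,
                 "product_reputation": 7, "modularity_maintenance": 7,
                 "fault_tolerance_security": 7},
    "Vendor B": {"benchmark_performance": 9, "software_compatibility": 6,
                 "product_reputation": 6, "modularity_maintenance": 8,
                 "fault_tolerance_security": 6},
}

for name, scores in sorted(vendors.items(), key=lambda kv: -overall_score(kv[1])):
    print(f"{name}: {overall_score(scores):.2f}")
```

In the team's actual process, as the following paragraphs describe, total system cost ultimately dominated the decision.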
Four different fileserver vendors were consulted: Hitachi, EMC, NetApp, and Sun (with a spectrum of choices for a Sun workstation environment). While differing architecture choices were examined from the principal vendors, total cost, rather than performance, served as the deciding factor. Only NetApp had a price point low enough to allow us to purchase a dedicated fileserver. Sun would have allowed us to examine other options, but system performance would have been much lower and barely acceptable. There was little choice but to go with NetApp. In addition to systems performance, the most crucial metric in our evaluations was overall system cost. Studies of each of the vendor offerings were undertaken to provide a compatible interface with other parts of Spacecraft Operations. From these studies, estimates were developed of total cost, as well as a breakdown of year-to-year costs for a three-year implementation plan [24]. Table 5 summarizes the nominal configuration we studied: an initial purchase of some 27 workstations and servers, a network fileserver, a Sun workstation for interfacing with the rest of the Cassini Project, and backup. Some elements in this cost analysis are included for comparison only: the low-end pricing for the Sun vendor server-attached disk had performance that was barely acceptable, while the high end of the HP PA workstation offering was outside of our acceptable price range. (These numbers are in 2002 dollars and have been intentionally "blanderized." The cost information contained in this document is of a budgetary


Table 5 Selected vendor hardware platforms total cost comparison (2002 dollars)

Model                 CPU / RAM                        Server Attached Disk  Network Attached Storage
Sun 2000: Solaris 8   USPARC III 1.05 GHz / 1 GB RAM   $477,000 - $507,000   $547,000 - $577,000
Dell PC: Red Hat 7.3  P-4 x86 3.0 GHz / 1 GB RAM       $303,000 - $344,000   $372,000 - $413,000
HP zx2000: HP 11.0    Itanium 2 1.0 GHz / 1 GB RAM     $468,000 - $573,000   $537,000 - $642,000
HP C3700: HP 11.0     PA-8700 750 MHz / 1 GB RAM       $586,000 - $863,000   $655,000 - $932,000

and planning nature and is intended for informational purposes only. It does not constitute a commitment on the part of JPL and/or Caltech.) After much review of our findings, the lowest-cost option was chosen. (As it would turn out, in the end that was the right decision.)

(1) Strategic Considerations: Performance Benchmarking and the Derivation of a Useful Scale for Performance

The performance benchmarking that we undertook helped not only to determine the performance of Navigation software on these systems, but also served as a valuable tool in the development of a single performance scale with which to compare hardware. The most comprehensive computer industry performance benchmarks, produced by the Standard Performance Evaluation Corporation (SPEC) [25], are redesigned and reissued about every five years and are not generally compatible between versions. However, combined with other performance metrics, and with systems that have been tested under multiple benchmarks, a rough linear comparison between differing machines can be determined. The performance of Navigation computations, which mostly use floating-point arithmetic, provides one such benchmark and allows for such a comparison. The processing time of the Navigation benchmarking tool NBODY, as well as the processing times of select Navigation software batch jobs, could be combined with SPEC Floating Point and Floating Point Throughput (average over time) results for the same hardware to give an overall performance rating and comparison between systems. It also allows, via machines that were tested under multiple SPEC performance generations (i.e., SPEC CPU2000 and CPU2006), a comparison between those generations. This is a rough, informal, linear estimate that looks only at the performance of select Navigation software and that is based on the SPEC CPU2006 baseline (i.e., it should be comparable to the published results for machines tested against the CPU2006 baseline).
This gives some metric for comparison of differing systems' performance for Navigation (in many cases, it is the only such metric of its kind). It is especially useful for the comparison listings noted in this paper. During this analysis, the true difficulty of this comparison effort became clear. As a benchmarking process, the NBODY software set, based on C, Fortran 77, and Fortran 90 codebases, was recompiled for each platform. However, as a part of this compilation, the decision of how much optimization to use for the NBODY benchmarks caused this author no little consternation. The point was to have an effective


Table 6 Interim hardware comparison

Model               CPU              NBODYC     NBODYF90   NBODYF77   SPEC-based FP  SPEC-based FP thp
HP J210: HP 10.20   PA-7200 120 MHz  23.3 sec   21.8 sec   18.8 sec   0.2274         0.3728
HP J282: HP 10.20   PA-8000 180 MHz  Not Eval.  Not Eval.  Not Eval.  0.7080         No estimate available
HP J2240: HP 10.20  PA-8200 236 MHz  8.3 sec    5.1 sec    5.1 sec    0.9648         1.3279

benchmark—the same measuring stick used on every platform that we wished to test. But what was being tested here? Compiled-code performance? Assembly-code performance? What about specific processor optimizations that could give some software a significant speed boost? Isn't that what is sought here? Should we use optimized compilers, or standard compilers that might give lower performance? But then how can one have a meaningful comparison between these disparate machines? In the current age of limited processor choices, with little differentiation between optimized x86 machine-code execution, this distinction may not be as clear, but at the time of this evaluation four entirely different machine architectures, with very different processor performance characteristics, were under consideration.11 After some lengthy consideration, we decided to use a standard set of compilers. But the GCC and NAGware Fortran compilers used were better optimized for some platforms than for others. Code compiled with no optimizations would run at a significant disadvantage, in some cases more than four times slower than the optimized case. We determined for each of these cases to compile with "-O2" optimization, as that was the maximum-level "best effort" that provided most of the same compile-time settings. This seemed to be "the best bad idea" we could find. We have presented the performance comparison for the initial hardware state for Cassini Navigation and the Interim Cruise upgrade in Tables 2 and 4. In Table 6, we examine SPEC-based performance estimates and NBODY results for the same workstations used in the Interim Cruise upgrade. Table 7 reviews the same metrics, this time for the comparison workstations, with "best effort" optimization applied to the build of the NBODY software set for each particular hardware platform. The lowest-cost option, Dell workstations and servers running Red Hat Linux, was chosen, along with a 3 TB Network Appliance 840 to serve as a central fileserver.
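The composite scale described above can be sketched as a normalization of each metric against a reference machine. The paper does not give the exact combination formula, so the geometric mean used below is this sketch's assumption; the sample numbers are the HP J210 (reference) and HP J2240 rows of Table 6.

```python
# Sketch of a composite performance scale: normalize NBODY runtimes and
# SPEC-style scores to a reference machine, then combine with a geometric
# mean.  The combination formula is an assumption of this sketch; the
# numbers come from the HP J210 and HP J2240 rows of Table 6.
from statistics import geometric_mean

REFERENCE = {"nbody_sec": 23.3, "spec_fp": 0.2274, "spec_fp_thp": 0.3728}  # HP J210

def composite(machine: dict) -> float:
    """Overall speed relative to the reference machine (>1 means faster)."""
    ratios = [
        REFERENCE["nbody_sec"] / machine["nbody_sec"],    # runtime: smaller is faster
        machine["spec_fp"] / REFERENCE["spec_fp"],        # score: larger is faster
        machine["spec_fp_thp"] / REFERENCE["spec_fp_thp"],
    ]
    return geometric_mean(ratios)

hp_j2240 = {"nbody_sec": 8.3, "spec_fp": 0.9648, "spec_fp_thp": 1.3279}
print(f"HP J2240 relative to HP J210: ~{composite(hp_j2240):.1f}x")
```

Because each ratio is oriented so that larger means faster, a single number falls out that can be laid alongside published SPEC results on the same rough scale.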
Due to the market economics over the rest of the mission, this solution would prove to be the correct one, as PC workstation and server performance would increase dramatically faster than any of the other options examined here. (HP PA workstations would cease production, the Sun processor line would end up only being maintained

11 One may recall the protracted struggles in the microprocessor industry at this time between CISC—complex instruction set computer (x86 processors), its branch-off to EPIC—explicitly parallel instruction computing (Itanium), and RISC—reduced instruction set computer (PA-RISC, ARM, others). CISC and RISC had radically different processor architectures, while EPIC attempted to merge some traits of the other two. Code compiled for these differing processors might have significantly different performance characteristics in different areas—much like trying to compare the performance of a heavy-duty truck and a sports car. All of these processor types were in this evaluation.


R. M. Beswick

Table 7 Selected vendor hardware platforms comparison

Model                 CPU                  NBODYC    NBODYF90  NBODYF77  SPEC-based FP  SPEC-based FP thp
Sun 2000: Solaris 8   USPARC III 1.05 GHz  1.72 sec  1.72 sec  1.72 sec  4.50           4.21
Dell PC: Red Hat 7.3  P-4 x86 3.0 GHz      0.86 sec  0.96 sec  0.92 sec  6.47           6.86
HP zx2000: HP 11.0    Itanium 2 1.0 GHz    1.48 sec  0.73 sec  0.73 sec  7.96           7.29
HP C3700: HP 11.0     PA-8700 750 MHz      2.3 sec   2.4 sec   1.8 sec   3.24           3.43

as a legacy mode, and Itanium would prove to be one of Intel's more spectacular miscalculations.) The new Navigation computing environment is discussed in the next section.

(2) Tactical Considerations: Operations Staffing Planning

As a part of the support of the Navigation computational environment, the system administration staff were an important "human" component. They were essential to the continuous operations schedule and were the most valuable pieces of the Navigation environment. As noted in the introduction, this was a "marathon," not a "sprint," and much as in that running activity, differing strategies had to be used to complete the course successfully. Not only did the staff plan maintenance operations on the Navigation system (and they were severely constrained as to when the system could be shut down for overall maintenance), they were also scheduled for on-site and on-call support of the Navigation computer systems. Almost all of the time (by design), this was an uneventful, even boring, activity. However, it was critical that system administration staff be close at hand to resolve any problems that might arise. As noted, there was very little time margin in some of the orbital tour schedules: during critical periods, Navigation engineers might have only a two- to three-hour window to evaluate complex data sets, perform analysis, and turn around engineering results for delivery to other operations teams for uplink to the spacecraft. Any systems problems that prevented or delayed them from performing their job during these critical time frames could cause serious difficulty for mission operations. At the peak of Cassini's operational activities, these critical periods would grow to nearly a seven-day-a-week, sixteen-or-more-hour-a-day schedule, supported by only two full-time system administrators with a few part-time associates from other spacecraft navigation operations teams.
The scheduling difficulties in such nearly round-the-clock support were severe—this was a long-term mission with a four-year primary tour, and the usual strategies of a shorter mission would not work. With no end in sight, burnout could be a serious problem. We created a scheduling paradigm to help maintain the health and sanity of the system administration staff throughout even the most demanding parts of the mission. As a part of this, several scheduling approaches helped to ameliorate the difficulties of even the busiest parts of the schedule. Figure 3 shows a particularly busy period, before the release of the Huygens probe on December 25, 2004. This was a challenging support period, comprising the support of three system administrators: two, in gray, covering operations support periods in some cases


Fig. 3 Flight Operations schedule for Saturn tour—Huygens probe release

exceeding 24 h in length, working in concert with a third, in orange, who provided support for other (non-Cassini) Navigation systems tasked with providing DSN support services (ensuring backup capability so that tracking data necessary for Navigation would be available during this critical period). A few key scheduling paradigms helped ease this busy schedule:
• Every six days, a system administrator was given a "day" off. (Such terms were somewhat flexible in the more round-the-clock scheduling periods, but it was a minimum 24-h rest period.)
• System administrators were kept to the same sleep schedule. This helped avoid the jet-lag effects of a widely varying sleep schedule and did a great deal to improve the tolerability of a difficult schedule. (This could induce interesting schedule diversions, as seen above—and sleep patterns.)
• Whenever possible, shifts were kept to under ten hours.
In addition, other longer-term scheduling paradigms helped to ameliorate fatigue and prevent burnout:
• Taking vacation time was encouraged—and this is one area where the long-arc scheduling of the mission was very helpful, with hour-by-hour schedules (as shown above) going out six months or more. Longer-term Navigation plans extending through the mission phases also helped greatly. With some effort toward not scheduling everyone's leave at the same time, and shoring up exceptionally busy periods, system administrators could take significant yearly vacation time.
• Mission-critical events would often fall on government holiday periods (such as the Christmas–New Year's schedule shown above) and would require extensive


support. System administrators were compensated for working these holiday periods, and a year-long "holiday-reservation" system allowed them to ensure that they would have the chance to observe most of the holiday periods they needed to. This system would prove to be a popular one and would be adopted by other JPL organizations.
• Other leave time, for family emergencies or other critical personal issues, was encouraged.12
These concepts, along with flexibility on the part of the system administrators and their scheduling, helped a small support staff (never greater than four or five people) provide coverage for thousands of critical shifts over the course of the Cassini mission—many at highly unusual times and dates.
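The rest-day rule described above (a guaranteed day off every six days, staggered so that coverage never lapses) can be sketched mechanically. A toy version follows; the staff names, start date, and staggering rule are this sketch's inventions, and the real schedules were built by hand, hour by hour.

```python
# Toy sketch of a staggered rest-day rotation: each administrator gets
# every sixth day off, offset so that the two are never off together.
# Names, start date, and the offset rule are invented for illustration.
from datetime import date, timedelta

STAFF = ("SA-1", "SA-2")
REST_EVERY = 6  # a "day" off every six days, per the text

def on_duty(day_index: int) -> list:
    """Who is on duty on day `day_index` of the rotation."""
    crew = []
    for k, admin in enumerate(STAFF):
        offset = k * (REST_EVERY // len(STAFF))   # stagger the rest days
        if (day_index + offset) % REST_EVERY != REST_EVERY - 1:
            crew.append(admin)
    return crew

start = date(2004, 12, 20)   # illustrative week around the Huygens release
for i in range(7):
    print((start + timedelta(days=i)).isoformat(), on_duty(i))
```

The offset guarantees the two rest days land on different days of the cycle, so at least one administrator is always on duty while each still rests every sixth day.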

2.6 Continuous Tour Operations

Once this systems analysis and evaluation process was concluded, through the remainder of the Outer Cruise until several months prior to Cassini's arrival and entry into Saturnian orbit on July 1, 2004, most time was spent on the systems implementation phase of this upgrade process and on the preparation for continuous tour operations. Like a long drawn-out Christmas morning, it was a period of unwrapping, testing, and setting up of the new hardware platforms. With some effort, we had managed to get approval initially for the following hardware platforms:

Initial Tour Configuration (pre-Saturn Orbit Insertion—Block I):
Red Hat Linux PC Workstations and Servers:
• 15 high-end Dell precision 350, P4 3.06 GHz workstations, 1.5 GB RAM.
• 4 low-end Dell precision 350 workstations, P4, 2.66 GHz, 1 GB RAM.
• 2 Dell precision 650 servers, dual Xeon, 2.8 GHz, 3 GB RAM.
Sun Solaris Workstations (SFOC Software Interface Support):
• 3 Sun Ultra 10 workstations, 440 MHz UltraSparc II, 1 GB RAM.13
Network Server Hardware:
• Network Appliance 840, 3 TB Network Disk Array—Central NFS Server.
Additional purchases over the next year and a half, along with some additional systems from the project, would finish up this first round of purchasing:

Tour Configuration (Total—Block I):

12 For example, this author took days off to support several graduate final exams.
13 We did not have to pay for these, and they were not part of our evaluation. They were provided for interface with other Cassini Project Operations teams and are included for completeness.

The Cassini/Huygens Navigation Ground Data System …

293

Red Hat Linux PC Workstations and Servers:
• 15 high-end Dell precision 350, P4 3.06 GHz workstations, 1.5 GB RAM.
• 5 high-end Dell precision 360, P4 3.60 GHz workstations, 1.5 GB RAM.
• 1 high-end Dell precision 370, P4-EE 3.40 GHz workstation, 1.5 GB RAM.
• 4 low-end Dell precision 350 workstations, P4, 2.66 GHz, 1 GB RAM.
• 3 Dell precision 650 servers, dual Xeon, 2.8 GHz, 3 GB RAM.
Sun Solaris Workstations (SFOC Software Interface Support):
• 7 Sun Ultra 10 workstations, 440 MHz UltraSparc II, 1 GB RAM.
Network Server Hardware:
• Network Appliance 840, 3 TB (Mirrored) Network Disk Array—Central NFS Server.
Operating Systems:
• Red Hat Linux 8.0
• Solaris 9
Major Software:
• MMNAV ODP/TRAJ (Orbit Determination/Trajectory Analysis Program set) version T1.6
• NAIF 56 (SPICE toolkit)
• OPNAV 8.3/ONIPS 2.3 (Optical Navigation Software set)

This system design architecture is depicted in Fig. 4. While the components would be significantly upgraded over time, the basic configuration would not change. As noted, this environment was designed in a modular fashion. Hardware components (especially workstations) could be added or changed out without disrupting the whole computational environment. Many upgrades and changes could be performed even on the Network Appliance central file server without requiring maintenance downtime, and this would prove to be a real blessing. This modularity would allow a continuous program of hardware upgrade and replacement without major downtime. This program was tremendously successful, meeting all performance and reliability requirements.

Hardware was installed in a series of block upgrades (much as is seen in other areas of aerospace), with each block upgrade offering hardware improvement and possibly expanded capability. These block upgrades continued throughout the tour. Each update replaced approximately one-fifth of the Navigation workstations per year. People requiring the highest possible performance for their tasks were given the new upgrades; the replaced machines would then be compared against the other Navigation systems, and the next fastest group of machines would replace other, slower machines. The only significant planned system downtime required over the length of the orbital tour occurred during the second fileserver update (in Block V) because


Fig. 4 Cassini Navigation system diagram

the disk shelves from the original (Block I) purchase were not supported on the new fileserver—hence all of the fileserver data had to be mirrored over to the new system. While this could be run in parallel with the previous server, two hours of downtime was required when the new fileserver was switched in from the old.

Significant software updates happened over the thirteen-year tour as well. Although the schedules differed, we categorize these software changes in the context of these block updates, to provide some indication of software milestones during this period. Cassini Navigation had a very large collection of software to meet its diverse requirements, and capturing all the myriad changes over this period is not practical here. We note here the most important changes in the core Navigation software. The switchover from the MMNAV ODP/TRAJ Navigation software set, used for the core Navigation engineering tasks of Flight Path Control, Orbit Determination, and Trajectory Analysis (Fortran 77 and Fortran 90 binaries), to the MONTE software set (C++ and Python binaries and applications) was a significant long-term effort. The benefits of newer, expanded feature sets were traded off against slower performance and larger object files. The software set used for the Optical Navigation subsystem saw significant development, and the NAIF SPICE libraries and toolkits, used for space mission planning, underwent several iterations. Furthermore, the Linux operating system for the workstation and server systems underwent many updates, from Red Hat 8 to Red Hat Enterprise 7.22.

We exceeded our performance requirements for tour operations. While the underlying architecture design is documented in "General System Design Principles," there were several key design features. We met our performance and reliability requirements directly in the following ways:


• All Linux systems would have the performance to handle all Navigation tasks (not all flight software would be expected to be ported to these systems—this was explicitly allowed in the functional requirements).
• Estimates indicated that initial performance was close to ten times the Cruise configuration in most application runs, and all showed performance greater than five times the Cruise configuration.
• The system was built in a "star" configuration (see Fig. 1), with multiple "hot" redundant systems and full mirroring of all server data. Clones of all systems could be deployed to a new system in under 15 min, much as with the IGNITE cloning software, here using the Linux equivalent application set, SYSTEMIMAGER [26].
• Although the Network Appliance would not have a backup (we could not afford more than one), it would be an N+1 redundant system. While not as redundant as a true Triple Modular Redundant (TMR) configuration, with three redundant components for every point of failure (such as might be found in airliner avionics [27]), it would still represent a gold standard for reliability and high availability. It was the core of our network design. Every component that could serve as a single point of failure, such as a disk drive, a network card, power supply, communications bus, and even power strip, would have at least dual, or N+1 redundant (one extra), components that would take over in the event of failure.14
• An additional, "day-old" synchronized backup server kept a copy of the entire file system, via RSYNC, from the Network Appliance. This protected against a differing set of failure cases, and it provided an additional layer of fault protection to the system.
• A spare office (two at our peak size in our Block III upgrades) with a high-end Dell workstation, a Sun Solaris workstation, and a color printer was available. A user could simply go from their office to the spare office and continue working in the event of a workstation failure. This would prove useful in numerous fault diagnosis problems, or merely to support presentation activities or cleared guests.
• Spare keyboards, mice, disk drives, disk controller cards, graphics cards, and even motherboards were available in the event of a hardware failure. Parts could be replaced and the system brought online, and then the vendor could be contacted for replacement.
• A CD, DVD, and eventually Blu-Ray (Block IV) writing capability was available for archival by users of selected Navigation data sets.
• Full (Level-0) tape backups were made to a tape library with AIT-3 (Block I), AIT-4 (Block Ib), and LTO-5 (Block IV) capability, directly connected to the Network Appliance fileserver (for speed) on a regular basis through the end of mission. These backup tapes were not to be overwritten. These tapes, together with the

14 This would be another example of "cleverness" and "ownership"—while having a redundant network connection running in parallel is par for the course for such a critical fileserver, it took considerable effort (as well as late-night discussions with the facility electrician) to get a second circuit installed, powering the N+1 redundant power strip, connected to a different Power Distribution Unit. This meant that a power failure would have to impact not only two different rooms, but two different wings of the building before the server would lose power.
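The value of the N+1 approach, including the second power circuit described in the footnote, can be illustrated with standard reliability arithmetic. The numbers below are purely illustrative, not measured Cassini figures; the point is that, assuming independent failures, a component with availability A backed by one spare fails only when both units fail:

```python
def redundant_availability(a, spares=1):
    """Availability of a component with `spares` independent backups:
    the set fails only if all (1 + spares) units fail at once."""
    return 1 - (1 - a) ** (1 + spares)

single = 0.999                         # one unit: roughly 8.8 h/yr down
dual = redundant_availability(single)  # N+1 pair: 1 - (0.001)**2
print(f"single: {single}, dual: {dual:.6f}")
```

This is also why the duplicated power feed on an independent circuit mattered: redundancy only pays off when the failure modes of the two units are actually independent.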


Fig. 5 Saturn orbital tour operations

Network Appliance snapshot mechanism, made it almost impossible to lose a file on the fileserver through the life of the mission.

After Cassini's long voyage, this system would be put to use. After nearly seven years in space, the Cassini/Huygens spacecraft finally approached Saturn in June of 2004. A heightened pattern of activity had already been underway for several months, resembling the ramp-up to launch that had taken place seven years before. However, compared to launch, this time the standard Navigation computer would be more than five times as fast as the best of the computers used in Outer Cruise, and more than thirty times as fast as the computers used for launch. They would have six times as much main RAM memory, and a whopping three hundred times as much network disk storage as the ones used at launch. Although much harder to quantify, the system was far more secure and fault tolerant, with many components possessing at least N+1 redundancy in the event of a failure. A secure system model had been evaluated and improved several times, and numerous drills had taken place to test the failover of critical components by the System Administration team. After arrival at Saturn, the orbital tour would begin in earnest and would go on to start a pattern of, in some cases, nearly daily critical operations. The aggressive orbital tour would succeed beyond all expectations. Figure 5 shows the 293 orbits of the Saturn tour.

(1) Block I updates: Beginning of Tour Operations

The first series of updates to the Navigation system yielded a significant performance improvement to the systems used in Navigation. These initial updates, described above, were delivered as part of the main set of hardware procurements for the overhaul of the Navigation computational environment in FY 2003, 2004, and 2005.
They represented the largest such outlay during the mission: a total of 21 "high-end" workstations (so called because they represented the peak performance generally available in an engineering x86 workstation) and four "low-end" workstations (so designated because they were pedestrian x86 workstations that served for general use). In addition, three Xeon x86 dual-processor server systems were purchased, with twice as much memory and processing power as the individual engineering workstations, to serve as centralized CPU servers and run dedicated central network services for the entire Navigation environment—such systems as a real-time tracking data repository and monitoring system. Combined with the central Network Appliance fileserver and the backup systems, this represented a quantum leap from the prior Navigation computer systems. Table 8 describes the performance characteristics of these systems, with the NBODY values derived from the systems analysis effort compared against the equivalent workstation.

Table 8 Cassini Navigation Block I—initial updates

Model (OS) | CPU | RAM | Disk | SPEC-based FP | SPEC-based FP throughput
Dell P350 (Red Hat 8.0) | P-4 x86, 3.06 GHz | 1.5 GB | 150x2 GB U360 | 6.473 | 6.858
Dell P360 (Red Hat 8.0) | P-4 x86, 3.60 GHz | 1.5 GB | 150x2 GB U360 | 7.63 | no estimate available
Dell P370 (Red Hat 8.0) | P-4EE x86, 3.40 GHz | 1.5 GB | 150x2 GB U360 | 8.73 | no estimate available
Dell P650 (Red Hat 8.0) | Xeon x86, 2.8 GHz | 3 GB | 150x2 GB U360 | 5.956 | 7.287
NBODY benchmark times: NBODY ⇒ NBODYC: 0.86 sec; NBODYF90: 0.96 sec; NBODYF77: 0.92 sec
Total network disk storage (on NetApp 840 server): 1.5 TB FC; 3.0 TB FC (including mirror)

(2) Block II updates: Prime Mission Incremental Update

The first "upgrade in place," and second overall, series of updates to the Navigation system gave a significant performance improvement to the systems used in Navigation during the key years of the Cassini Prime Mission. These incremental updates were delivered as part of a set of hardware procurements for the overhaul of the Navigation computational environment in FY 2006, 2007, and 2008. They comprised a total of nine "high-end" workstation upgrades over this three-year period, and one server upgrade—a Xeon x86 dual-processor server system that offered upgraded performance for our primary compute server. Combined with a performance and capacity upgrade to the central Network Appliance fileserver, this represented a healthy leap in size, performance, and capability. The Navigation team would grow to its largest size, with more than 30 user workstations dedicated to Navigation numerical analysis and engineering. This full operational environment encompassed one of the largest Navigation teams ever assembled.

Tour Configuration (Total—Block II):
Red Hat Linux PC Workstations and Servers:
• 29 high-end Dell precision 350–390, and T3400, x86, 3.06–3.80 GHz workstations, 1.5 GB RAM.
• 4 low-end Dell precision 350 workstations, P4, 2.66 GHz, 1 GB RAM.
• 3 Dell precision 650 servers, dual Xeon, 2.8 GHz, 3 GB RAM.
• 1 Dell precision 670 server, dual Xeon, 3.7 GHz, 3 GB RAM.


Sun Solaris Workstations (SFOC Software Interface Support):
• 7 Sun Ultra 10 workstations, 440 MHz UltraSparc II, 1 GB RAM.
Network Server Hardware:
• Network Appliance FAS 3020, 8 TB (Mirrored) Network Disk Array—Central NFS Server.
Operating Systems:
• Red Hat Enterprise 4.4
• Solaris 9
Major Software:
• MMNAV ODP/TRAJ version T2.6
• NAIF/SPICE 62
• OPNAV 8.4/ONIPS 2.4

These workstations would also be the first Linux systems capable of true 64-bit processing. This was a concern further downstream, as we wanted to migrate to a 64-bit-capable version of Red Hat Linux for greater numerical processing bandwidth (see below). Sadly, this was a capability we had enjoyed with our HP-UX systems from the interim upgrade, and lost with our Tour overhaul.

Of interest: During this period, a single point of failure was discovered in the Network Appliance fileserver that was not (nor could be) protected by a redundant part: the disk shelves (the boxes holding the disks). It would turn out, as discovered after six weeks of investigation, that there was a small hairline crack in one of the breadboards in the shelf itself. Fortunately, this problem was discovered in routine maintenance, the shelf replaced, and the problem resolved (to some incredulity from the Network Appliance technicians).

Table 9 describes the performance characteristics of the workstation and server systems, as well as the central fileserver.

Table 9 Cassini Navigation Block II updates

Model (OS) | CPU | RAM | Disk | SPEC-based FP | SPEC-based FP throughput
Dell P380 (RH Ent 4.4) | P965E x86-64, 3.73 GHz | 1.5 GB | 150x2 GB U360 | 12.40 | 21.70
Dell P390 (RH Ent 4.4) | DC2E x86-64, 3.60 GHz | 1.5 GB | 150x2 GB U360 | 19.50 | no estimate available
Dell P670 (RH Ent 4.4) | Xeon x86-64, 3.7 GHz | 3 GB | 150x2 GB U360 | 11.04 | 17.57
Total network disk storage (on NetApp FAS 3020 server): 8.0 TB FC; 16.0 TB FC (including mirror)


(3) Block III updates: Equinox Mission Incremental Update

The third series of updates to the Navigation system would provide significant performance improvements to the systems used in Navigation. These incremental updates were delivered as part of a set of hardware procurements for the overhaul of the Navigation computational environment in FY 2009, 2010, and 2011. They comprised a total of 11 "high-end" workstation upgrades over this three-year period, and two server upgrades—Xeon x86-64 dual-processor systems that offered upgraded performance for our primary and secondary compute servers. These upgrades, along with a slight decrease in the Navigation team size during this period, would ensure that all of our systems would be able to run 64-bit operating systems and applications. This would prove essential for the upgrade to Red Hat Enterprise 5.8 that would be part of the next block upgrade for the Solstice Mission. Combined with a capacity upgrade to the central Network Appliance fileserver, this represented an increase in total performance and capability. Additionally, with the Sun SFOC workstation hardware no longer supported, those systems were upgraded to new systems as well.

Tour Configuration (Total—Block III):
Red Hat Linux PC Workstations and Servers:
• 25 high-end Dell precision 390, and T3400–T3500, x86-64, 3.60–3.80 GHz workstations, 1.5 GB RAM.
• 2 low-end Dell precision 350 workstations, P4, 2.66 GHz, 1 GB RAM.
• 3 Dell precision T7500 servers, dual Xeon, 3.2 GHz, 24 GB RAM.
Sun Solaris Workstations (SFOC Software Interface Support):
• 4 Sun Ultra 45 workstations, 1.6 GHz UltraSparc IIIi, 4 GB RAM.
Network Server Hardware:
• Network Appliance FAS 3020, 20 TB Network Disk Array—Central NFS Server.
Operating Systems:
• Red Hat Enterprise 4.5
• Solaris 10
Major Software:
• MMNAV ODP/TRAJ version T2.6
• MONTE 8.2.0
• NAIF/SPICE 63
• OPNAV 9.0/ONIPS 3.0

With the Prime Mission over, two significant system changes were made during this period. The overall operational segment lengths (the lengths of the critical event sequences), based on several years of successful tour operations and the ever-increasing speed of the Navigation computers, were decreased significantly, and the expected reliability of the system was increased from 99.97% to a soft requirement of 99.995%—about twenty-six minutes of unplanned downtime per year. Moreover, as another soft requirement, the total downtime (planned and unplanned) was expected not to exceed the original 99.97% (two to three hours of downtime) per year under these critical schedules. Due to the high availability demonstrated over this period, and the successful completion of all of the Prime Mission objectives, it was decided that a full mirror of the central NetApp was no longer required—enabling a quick upgrade to the system storage capacity. Table 10 describes the performance characteristics of the workstation and server systems, as well as the central fileserver.

Table 10 Cassini Navigation Block III updates

Model (OS) | CPU | RAM | Disk | SPEC-based FP | SPEC-based FP throughput
Dell T3400 (RH Ent 4.5) | E8500 x86-64, 3.2 GHz | 1.5 GB | 150x2 GB SAS | 21.40 | 31.65
Dell T3500 (RH Ent 4.5) | W3570 x86-64, 3.2 GHz | 1.5 GB | 150x2 GB SAS | 38.10 | 95.24
Dell T7500 (RH Ent 4.5) | Xeon x86-64, 3.2 GHz | 24 GB | 300x2 GB SAS | 40.42 | 208.0
Total network disk storage (on NetApp FAS 3020 server): 20.0 TB FC (does not include some spare and raw disk)

(4) Block IV updates: Solstice Mission Incremental Update

The fourth series of updates to the Navigation system would provide significant performance improvements to the systems used in Navigation. These incremental updates were delivered as part of a set of hardware procurements for the overhaul of the Navigation computational environment in FY 2012 and 2013. They comprised a total of 8 "high-end" workstation upgrades over this two-year period that offered upgraded performance and, with the later-model T3600, increased memory to support enhanced MONTE applications. With the decrease in the size of the team in the Solstice Mission, we were able to retire the last two "low-end" workstations. These systems ran Red Hat Enterprise 5.8 and were fully 64-bit compliant. A capacity upgrade occurred on the Network Appliance, bringing the total disk to 28 TB.

Tour Configuration (Total—Block IV):
Red Hat Linux PC Workstations and Servers:
• 25 high-end Dell precision 390, and T3400–T3600, x86-64, 3.60–3.80 GHz workstations, 1.5 GB RAM.
• 2 Dell precision T7500 servers, dual Xeon, 3.2 GHz, 24 GB RAM.
• 1 Dell precision T3600, Xeon, 3 GB RAM.
Sun Solaris Workstations (SFOC Software Interface Support):
• 4 Sun Ultra 45 workstations, 1.6 GHz UltraSparc IIIi, 4 GB RAM.
Network Server Hardware:
• Network Appliance FAS 3020, 28 TB Network Disk Array—Central NFS Server.


Operating Systems:
• Red Hat Enterprise 5.8
• Solaris 10
Major Software:
• MMNAV ODP/TRAJ version T3.11.4
• MONTE 51
• NAIF/SPICE 63
• OPNAV 10.0/ONIPS 4.0

Table 11 describes the performance characteristics of the workstation and server systems. As an interesting side note, from this point on, the unstated goal of purchasing the fastest processors (or as close as practicable) for these systems would become increasingly difficult due to changes in the Dell product line—in an effort to continue, Navigation would begin purchasing systems from the higher-end (hence more expensive) T5XXX line.

Table 11 Cassini Navigation Block IV updates

Model (OS) | CPU | RAM | Disk | SPEC-based FP | SPEC-based FP throughput
Dell T3500 (RH Ent 5.8) | W3570 x86-64, 3.2 GHz | 1.5 GB | 150x2 GB SAS | 38.10 | 95.24
Dell T3600 (RH Ent 5.8) | E5-1660 x86-64, 3.3 GHz | 8 GB | 150x2 GB SAS | 76.80 | 210.0
Total network disk storage (on NetApp FAS 3020 server): 28.0 TB SAS/FC (does not include some spare and raw disk)

(5) Block V updates: Solstice Mission Final Update

The fifth and last series of updates to the Navigation system would provide the final performance improvements to the systems used in Navigation, as well as a significant capacity increase to the Navigation fileserver. These incremental updates were delivered as part of a set of hardware procurements for the overhaul of the Navigation computational environment in FY 2014 and FY 2015. They comprised a total of 14 "high-end" workstation upgrades over this two-year period that offered upgraded performance and would serve to fully replace all prior user workstations. These systems ran Red Hat Enterprise 7.22 and were fully 64-bit compliant. A capacity and performance upgrade occurred on the Network Appliance, bringing the total disk to 43 TB.

Tour Configuration (Total—Block V):
Red Hat Linux PC Workstations and Servers:
• 17 high-end Dell precision 390, T5610, and T5810, x86-64, 3.60–3.80 GHz workstations, 1.5 GB RAM.
• 2 Dell precision T7500 servers, dual Xeon, 3.2 GHz, 24 GB RAM.
• 1 Dell precision T3600, Xeon, 3 GB RAM.


Sun Solaris Workstations (SFOC Software Interface Support):
• 4 Sun Ultra 45 workstations, 1.6 GHz UltraSparc IIIi, 4 GB RAM.
Network Server Hardware:
• Network Appliance FAS 2240, 43 TB Network Disk Array—Central NFS Server.
Operating Systems:
• Red Hat Enterprise 7.22
• Solaris 10
Major Software:
• MMNAV ODP/TRAJ version T3.11.4
• MONTE 123
• NAIF/SPICE 63
• OPNAV 10.0/ONIPS 4.0

By the end of the hardware procurements for the mission, the Navigation team was at its highest system performance. Compared to prior machines, the standard Navigation computer would be more than ninety times as fast as the best of the computers used in Outer Cruise, and more than six hundred times as fast as the best computers used for launch. They would have more than sixty-four times as much main RAM memory, and fully ten thousand times as much network disk storage as the ones used at launch. Table 12 describes the performance characteristics of the workstation and server systems.

However, this impressive system would be tested during this period. Due to the descope of the Cassini mission and other external facility issues, other organizations acquired a large amount of floor space in the JPL Space Flight Operations Facility (SFOC), where Cassini was located, and proceeded to perform heavy construction on the building itself. This would require multiple moves of the Navigation Ground Data System, eventually resettling in a different wing of the facility—all of this accomplished in the middle of ongoing critical Spacecraft Operations. The modular design of the Navigation computing environment would enable components to be shut down (in some cases upgraded), moved to their new location, and brought back online, without any total system downtime. This would require more than a bit of clever planning and effort on the part of the System Administration staff, and this experience was one of the most difficult of the mission. In addition, in the midst of this heavy construction, the entire SFOC building was accidentally powered down, for the first time in thirty years, causing multiple equipment failures and considerable mission stress. However, once again the robust Navigation system was rapidly brought back online without harm to the systems or the data sets of the Navigation team. Even with this unnecessary stress, all Navigation requirements were met over the course of the aggressive orbital tour. (To be clear, the Navigation GDS was brought up within the downtime hard limit of two to three hours of unplanned downtime per year. The soft requirement was not met by this "act of nature." This author will claim the failure was out of scope.)
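The downtime budgets quoted in this chapter, for both the original 99.97% requirement and the later 99.995% soft requirement, follow from a one-line calculation:

```python
HOURS_PER_YEAR = 8760

def downtime_per_year_h(availability):
    """Allowed downtime, in hours per year, for a given availability fraction."""
    return HOURS_PER_YEAR * (1 - availability)

print(f"99.97%  -> {downtime_per_year_h(0.9997):.2f} h/yr")          # about 2.6 h
print(f"99.995% -> {downtime_per_year_h(0.99995) * 60:.1f} min/yr")  # about 26 min
```

The 99.97% figure matches the "two to three hours" budget cited above; 99.995% allows roughly twenty-six minutes of unplanned downtime per year.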


Table 12 Cassini Navigation Block V updates

Model (OS) | CPU | RAM | Disk | SPEC-based FP | SPEC-based FP throughput
Dell T5610 (RH Ent 7.22) | E5-2687 x86-64, 3.3 GHz | 16 GB | 150x2 GB SAS | 80.48 | 244.9
Dell T5810 (RH Ent 7.22) | E5-1620 x86-64, 3.5 GHz | 16 GB | 150x2 GB SAS | 90.20 | 247.5
Total network disk storage (on NetApp FAS 2240 server): 43.0 TB SAS/FC (does not include some spare and raw disk)
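Taking the best high-end workstation SPEC-based FP figure from each of Tables 8 through 12 gives a rough, back-of-the-envelope view of workstation speedup across the five blocks (a summary of the tabulated values, not an official benchmark comparison):

```python
# Best high-end workstation SPEC-based FP per block, from Tables 8-12.
spec_fp = {
    "Block I (Dell P370)": 8.73,
    "Block II (Dell P390)": 19.50,
    "Block III (Dell T3500)": 38.10,
    "Block IV (Dell T3600)": 76.80,
    "Block V (Dell T5810)": 90.20,
}

baseline = spec_fp["Block I (Dell P370)"]
for block, fp in spec_fp.items():
    print(f"{block}: {fp:6.2f} ({fp / baseline:.1f}x Block I)")
```

Within the tour alone, the standard workstation thus gained roughly an order of magnitude in floating-point performance; the larger factors quoted in the text are measured against the still older launch and Outer Cruise machines.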

2.7 Grand Finale and Closeout—Epilogue

Once the last series of block updates (Block V) was concluded at the end of FY 2015, no further hardware updates to the Cassini Navigation systems were conducted. The mission was entering the final phases of its orbital tour. This did not reduce the system engineering challenges, however. After several years of "nominal" spacecraft activity, much of the systems administration and system engineering work had become well understood and somewhat routine. As the spacecraft came to the end of its Solstice Mission and prepared to enter the sequences leading to its fiery end in the Grand Finale, a number of new requirements and new concerns, some from the end of mission, some from unexpected sources and causes, would make this a very interesting time in the mission.

The first significant new problem involved the usage of the central fileserver. Although far larger than even its early tour configuration, it had limits to its storage. In March of 2017, it was noticed that several of the volumes dedicated to operations were filling up at an unusually high and growing rate. This growth was not confined to a single case, but spread across entire operations directories—growing at a rate exceeding 10% of the total file system per month. This monthly growth was greater than the entire yearly usage seen in previous mission phases. (Unusual growth patterns were not an entirely irregular event—one evening several years before, an engineer had to be asked if he understood the consequences of adding 100 GB of data files to the system every hour; with some consideration, the engineer realized that he needed to change his analysis technique.15) As it would turn out, this growth was an anomaly, caused by the need to handle several large groups of data sets for archiving—the extra space would be released after several months.
This showcased the critical need, discussed further in the "Observations and Lessons Learned" section, to have adequate margin—especially on such a common, highly subscribed area. (This problem would be further ameliorated when analysis showed that even under the worst rate of growth, the system would not exhaust its storage capacity until after the End of Mission on September 15, 2017 [28].) The resolution of this issue would provide some breathing room for our fileserver storage as the project moved into the last operational arc of the Cassini spacecraft.
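The kind of margin analysis referred to above can be sketched as follows. The numbers are illustrative (only the 43 TB Block V capacity comes from this chapter; the 12 TB in use is an assumption for the example):

```python
import math

def months_until_full(capacity_tb, used_tb, growth_tb_per_month):
    """Whole months of growth that fit in the remaining capacity."""
    return math.floor((capacity_tb - used_tb) / growth_tb_per_month)

# 43 TB fileserver, an assumed 12 TB in use, growing at 10% of the
# total file system per month (the worst rate observed above):
print(months_until_full(43.0, 12.0, 0.10 * 43.0))  # -> 7
```

Seven months of margin at the worst observed rate, counted from March of 2017, would carry the volume past the September 15, 2017 End of Mission, which is the shape of the argument made above.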

15 To be clear, we would have been happy to support him; we just needed a larger account if he wanted to continue… .


Of course, this was not actually the end of mission support activities. Although the final closeout activities had been discussed, the full nature of the events, and the work effort that would take place after the end of Cassini, would not be immediately clear. There are few projects of similar scale and scope that one can compare plans with. While it was understood that Navigation data processing would continue as part of the effort to finalize years of Flight Operations, the new needs and heightened state of activity would be unexpected. Indeed, the following year was a surprisingly busy time.

After the end of Spacecraft Operations at Saturn following the Grand Finale, the final closeout process for Cassini ran for the following year. For Navigation, this would prove to be a period of significant activity. Several complete backups of the whole system were made, and we reviewed and archived key system files and processes. Indeed, this closeout period was significantly busier than the mission-critical periods leading up to the mission's end. When the time came to finally retire key pieces of hardware, archiving processes were in full swing, and Navigation team engineers would begin to make complex requests to find missing files and deliveries, some from many years prior. As noted, Navigation had requirements to be able to restore all prior configurations and software states, to "support the Cassini Mission from Launch to EOM" [19], and this had led to the retention of a small collection of prior computer hardware and operational environments. Several times, Navigation team members inquired into the possibility of recovering one or more file sets, from the early tour in 2004, or worse still, the Earth flyby in 1999. Indeed, although hardware had been kept, it was not clear that, even if the data were recovered, it would be of use without the analytic tools to read and understand the data.
In order to answer these requests, an entire system state would have to be recovered, perhaps on a hardware platform and OS framework that had not been used in many years. This recovery effort would take several days of system administrator time for the easy solutions—perhaps a week or more to recover a system prior to SOI. Theoretically, it would work, but it could take considerable time and effort to implement. Several meetings encouraged these engineers to see if they had a copy of the necessary information, or could find or reproduce the data in some other way. Happily, at the time of this writing, after considerable search of the DVD and Blu-Ray archives, it has not proved necessary to return to a prior system state or (the even more dreaded) a system state on a different hardware platform. This will be discussed further below.

In a similar manner, the concerns about file system growth would redouble. While some unusual growth in file system usage had been identified and resolutions implemented, a request was made for an additional 20% (4 TB at the time) of file storage for an analysis task covering the whole orbital tour. It was crucial for the final closeout effort, but quite difficult to respond to. This one request, by one member of the Navigation team, was 33% larger than the whole raw fileserver used for Saturn arrival. Fortunately, with some care in moving volumes of files around, it proved possible to meet this request. However, this continued to be another area of concern for the Navigation system. Part of the problem lay in the significant needs for computational capability and intermediate storage space for the Navigation team

The Cassini/Huygens Navigation Ground Data System …

305

during this closeout period. As the closeout period for the Navigation team ran throughout the rest of the following year, a significant portion of the Navigation systems could not be shut down until the very end of the closeout activity. In fact, these systems were processing data at near maximum capacity, with significant contention between users leading to complaints that the system was sluggish under the load. This caused the System Administration staff, especially this author, no little stress. As an example, with less than a month remaining in this 12-month period, a request was made for yet another 1 TB of file storage. Through careful moving and shuffling of storage, using the very last reserves on the file system, this request was met. As noted in the previous section, the modular design of the Navigation computing environment enabled components of this environment to be shut down and moved without impacting the rest of the Navigation environment. This fault-tolerant configuration was put to another test when, in the midst of this closeout activity, most of the Navigation team systems were forced to move. What served to make this closeout effort even more difficult, and what caused this move, was the loss of the data center (which housed many of the primary and backup systems) months ahead of the planned shutdown date. Once again, outside organizations needed to perform heavy construction on the building itself, requiring a hasty, and ultimately successful, move of the contents of the Navigation data center to several other locations. This did involve downtime; fortunately, the one silver lining was that there was no spacecraft or Spacecraft Operations to support during the move.
The shutdown and disposition of such a network of systems (the backups, the backup systems, the archives, spare and redundant hardware, and even the paperwork accumulated over 25 years of continuous operations) was a non-trivial effort. One cannot simply turn off the lights on a whole environment such as this! Considerable time was spent, in addition to the Navigation team's own efforts, archiving setups and configuration information, as well as system design information (this paper is a part of that effort). Finally, once it became clear that no further requests for recovery of data (or, more significantly, of previous systems and system configurations) from the backups would be required, the spare hardware was shut down and removed, and the backups were boxed up for final disposition. This was in effect a museum, a timeline of workstation and server hardware and backup media used over the previous quarter century (some of which would be used and examined for this report; see Chapter III for a comparison of these systems). At this point, after some discussion with the remaining Navigation team leads, and after several years of ground operations and nearly twenty years of Spacecraft Operations, operational support for the Ground Data System came to an end.


R. M. Beswick

Strategic Considerations: Final Disposition of the Cassini Navigation Ground Data System

It became clear that we wanted to be able to review, in some capacity, the results from the mission, as well as to examine the software and models used. To that end, the following long-term strategy was undertaken:

• Three full backups of the file system would be taken: one immediately after the end of mission operations after the Grand Finale, one a month later, and one after the conclusion of closeout and shutdown activities (so that a version of all last-minute changes to the system would be saved). Several copies of each backup will be stored for the foreseeable future. In addition, long-lived archive systems were made available for three different levels of review of Navigation data, software, and results.
• The primary server and file system support infrastructure (including the network file server and backup system) are being maintained for 12 months to provide near-term support for any follow-on questions.
• An archive server, with sufficient file system storage, will host a copy of all the files and data sets of the Navigation computational environment for an indefinite time (at least five years) to allow for ease of examination and recovery of particular data sets.
• Extensive archives of particular data sets, scripts, and files will be supported for the foreseeable future on the JPL Navigation archive service.

With these three different types of archive systems, as well as the backup tapes, future interest in the data and environment of the Navigation team on the Ground Data System can be easily supported [29].

3 General System Design Principles

With the performance, capability, and reliability requirements that were put together in the requirements specification and review period during the Outer Cruise (as well as the significant operational experience gained from launch, SOI, Cruise, and orbital tour operations), a framework was assembled to serve as an archetype for our design. We have attempted to give some feel for the issues and the pace (as well as some cogent examples) of working on the design, implementation, and operations of the Navigation system during the challenging years it was in operation. In this section, we move from specific chronological case studies to highlights of general engineering principles for these systems. As mentioned previously, these efforts have been the subject of several good papers, and so we focus on key issues here, giving depth where necessary, but not seeking to replicate prior work [3–5]. The aim is to examine the construction of this modular design, with particular attention to the construction and setup of the "building blocks" of


this design, and the connections that link them together and to the wider Flight Operations infrastructure. The principal areas of examination for these modules are fault tolerance, security fault tolerance, and high availability and reliability. These three terms are interlinked and depend on each other: a security fault-tolerant system will most likely be fault tolerant against general failure (as opposed to intelligent actors), and a fault-tolerant system will likely be much more highly available and reliable. This was a limited effort based on the resources we had; the concerns that go into the truly robust design of a modern airliner avionics system are far greater [27]. Nonetheless, it is hoped that these paradigms offer examples of good system design. These principles, combined with good CM and a willingness to deeply examine the hardware and software choices offered by vendors against this framework, enabled us not only to meet our requirements, but also to greatly simplify troubleshooting, system repair, system installation and reconfiguration, and regular maintenance.

3.1 Fault Tolerance

Integrating fault tolerance into the underlying modular design from the beginning would prove, time and time again, to be the correct approach. Rather than being an add-on to an already established design, fault tolerance was considered in the initial specifications and in vendor purchasing decisions: from whom to buy, and what. These fault-tolerant principles included the already discussed modularity, in which the system architecture is divided into separate modules that, upon a failure, or even just for maintenance or an upgrade, can be replaced with a different module. Another is fail-fast, in which a system component either works correctly or stops immediately. This often goes hand in hand with independent failure, in which each module functions so that its failure does not affect the other modules in the system. This in turn works in concert with redundancy and repair, in which spare modules are configured so that when one module fails, another can replace it almost instantly, while the failed module is cycled out to be repaired or replaced offline. Also important is the idea of design diversity (more important still in security fault tolerance), which holds that using hardware and software from different groups is the best way to obtain redundancy against differing types of failure [30]. Implementing these approaches produced a more fault-tolerant system. The workstations and servers were, as much as possible, identically configured (this would prove a challenge in the world of commodity PC hardware). Higher-quality, more reliable parts were chosen for these systems (e.g., SCSI or SAS drives rather than lower cost IDE or SATA drives), and spare parts were purchased for the components most likely to fail (keyboards, mice, disk drives, disk controller cards, graphics cards, and even motherboards). Modular system designs were preferred.
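The fail-fast and redundancy-and-repair ideas above can be sketched in a few lines. The sketch below is purely illustrative, and none of these names come from the Cassini software: a module that detects an internal fault stops immediately rather than producing bad output, and an identically configured spare takes over while the failed unit is cycled out for repair.

```python
# Illustrative sketch (not mission code): fail-fast modules backed by
# identically configured spares, with failed units cycled out for repair.

class ModuleFailure(Exception):
    """Raised immediately when a module detects an internal fault (fail-fast)."""

def run_with_spares(task, modules):
    """Try each configured module in turn; a failed module is cycled out
    for offline repair while the next (spare) module takes over."""
    failed = []
    for module in modules:
        try:
            return module(task), failed
        except ModuleFailure:
            failed.append(module)  # independent failure: others keep working
    raise RuntimeError("all modules (including spares) have failed")

# Hypothetical example: a primary that fails fast, and a clone spare.
def primary(task):
    raise ModuleFailure("self-check failed")

def spare(task):
    return f"processed {task}"

result, repaired = run_with_spares("orbit determination run", [primary, spare])
print(result)          # prints "processed orbit determination run"
print(len(repaired))   # prints 1: one module cycled out for repair
```

Because the spare is a clone of the primary, the caller never needs to know which unit actually did the work, which is the disentanglement property discussed below.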
Indeed, the HP workstations from the Interim upgrade were very modular, with power supplies, disks, and even motherboards capable of being swapped out rapidly. Likewise, the Network Appliance was a fully modular design with the


ability to replace power supplies, disks, network cards, and network and power connections without shutting down the machine. The PC workstations were carefully chosen to be as part-similar to each other as possible (difficult indeed on the commodity PC platform, where part and even hardware manufacturers were swapped out at will by the vendor to get the lowest possible price). This could be accomplished only partially, by defining the exact configuration in purchasing and, wherever possible, buying in bulk (really the only effective approach, and only if the hardware was bought at exactly the same time). Furthermore, these systems were clones, so in the manner of a RAID array they could be swapped in for a failed system. Moreover, a spare office was available in the event of a failure, so that an engineer could simply go to the spare office and continue his or her work. This disentangled the system problem from the ability of the Navigation engineer to complete a task, allowing the system administrators to fix the problem in a more routine manner. Using an implementation of the secure "golden image," each of the workstations and servers was configured in exactly the same way, differing only in minor details. The HP-UX IGNITE [15] and the Linux equivalent, SYSTEMIMAGER [26], utilities, as mentioned, were invaluable in this effort. This greatly improved CM, as each system was a clone of the others. If a problem occurred on one machine and not on another, this was a strong indication of a non-operating-system fault, and many root causes of failure could be eliminated in this manner. By design, all non-ephemeral user files were stored on the central server, not on individual workstations.
If critical files were suspected of having been overwritten or corrupted due to a bug or system compromise, the system configuration could be cryptographically validated, file by file, against the "golden image," and could be restored to its pristine state in a matter of minutes, faster even than the brisk "bare-metal" install process. Indeed, the entire computational environment could be updated to a new clone image in a matter of hours, each interchangeable machine sharing exactly the same features. This is conceptually similar to the CM and fault tolerance ideas used for the central network servers; both were essential parts that had to be trusted. As Mark Twain succinctly put it: Behold the fool saith, "Put not all thine eggs in the one basket" – which is but a manner of saying, "Scatter your money and your attention;" but the wise man saith, "Put all your eggs in the one basket and – watch that basket!" [31]
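The file-by-file cryptographic validation described above can be sketched with a simple hash manifest. This is a minimal illustration assuming SHA-256 digests; the actual tools and digest algorithms used for the Cassini golden image are not specified here.

```python
# Sketch of file-by-file cryptographic validation against a "golden image":
# build a digest manifest from the trusted image, then report any deployed
# file whose contents no longer match. (Illustrative only; assumes SHA-256.)
import hashlib
import os

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Record a digest for every file under the golden image tree."""
    manifest = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def validate(root, manifest):
    """Return the relative paths whose contents no longer match the image.
    (Assumes the files still exist; deletions need a separate check.)"""
    return [rel for rel, digest in manifest.items()
            if sha256_of(os.path.join(root, rel)) != digest]
```

Any path reported by `validate` would then be restored from the trusted image, rather than repaired in place.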

Both the system "golden image" and the central fileserver(s) were treated similarly: both were very carefully built and very carefully controlled. The system "golden image" was precisely configured, updated, and backed up. Extensive security testing, both internal and against the purposefully limited network connections to this clone image, was undertaken to ensure that a robust and secure system would be deployed to these "building block" machines. Much time could be dedicated to ensuring this system worked correctly because only one system image was being evaluated. (This was coupled with significant verification that the image was correctly and completely deployed to the end systems.)


The central fileserver, in both its original UNIX server and Network Appliance incarnations, was a single point of failure. This was initially mitigated by the fact that the HP UNIX servers had exact mirrored backups of their files; they were even hardware clones (the same parts, down to the part numbers, for all critical system components). The Network Appliance server was more reliable by an order of magnitude, and critical components could be swapped out without shutting down the server. In addition to its role as a central server, considerable file storage was set aside for multiple versions of the software used for Navigation operations. Care was taken that, instead of having to delete older versions of the software (as had been necessary to save space on the previous servers), many complete versions could be stored. Although there was only one "known good" version of the software in use by default, this version could be compared with prior versions to examine any changes in behavior. Another useful feature of this server was the ability to create "snapshot" copies of the file system, allowing for revision control of the entire server. These "snapshots" recorded only changes to the file system over time, and hence on (mostly) static file systems occupied only a small percentage of the original file system. Even better, this significantly "user-proofed" the computational environment, making the recovery of accidentally deleted files and configurations easy, as long as the user realized his or her mistake within a month.
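The Network Appliance snapshots were a vendor feature operating at the block level, but the underlying idea (storing only what changed) can be illustrated at the file level with hard links, as in this hypothetical sketch. Like rsync-style snapshot schemes, it assumes updates replace files rather than modify them in place.

```python
# Conceptual sketch of space-efficient snapshots: unchanged files are
# hard-linked into the snapshot tree, so the data blocks are shared and a
# snapshot of a mostly static file system occupies little extra space.
# Caveat: hard links share content, so this only behaves like a snapshot
# when updates *replace* files (new inode) rather than edit them in place.
import os
import shutil

def snapshot(live, snapdir):
    """Create a hard-link copy of the live tree under snapdir."""
    os.makedirs(snapdir)
    for dirpath, _, files in os.walk(live):
        target = os.path.join(snapdir, os.path.relpath(dirpath, live))
        os.makedirs(target, exist_ok=True)
        for name in files:
            # Share the existing data blocks; nothing is copied.
            os.link(os.path.join(dirpath, name), os.path.join(target, name))

def restore_file(snapdir, live, relpath):
    """Recover an accidentally deleted file from a snapshot."""
    shutil.copy2(os.path.join(snapdir, relpath), os.path.join(live, relpath))
```

This is the "user-proofing" property in miniature: recovering a deleted file is a copy out of the snapshot tree, with no trip to the backup tapes.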

3.2 Security and Security Fault Tolerance

This systems engineering problem becomes far more acute when one considers, not random chance, but intelligent action (and actors) as a failure mode. This has been the subject of several papers by this author [4–6], as well as several excellent books by others [16, 32–34] and guides from such organizations as the National Security Agency (NSA) [35], the Center for Internet Security (CIS) [36], and the National Institute of Standards and Technology (NIST) [37]. A succinct summary follows. We consider security an aspect of reliability, in that systems hardened against intelligent actors will often prove robust against natural random failure as well. This security design model takes on a principle found in architecture: the strength of a system can often be improved not by what one adds, but by what one takes away. The security design of the Navigation computational environment is based on the cornerstone security principles of confidentiality, integrity, and availability (CIA). These principles define the key concerns in securing a system, not in terms of technique or subsystem, but in terms of what about the system needs protecting (e.g., to evaluate a bank's security environment, consider the things that need to be protected; in this case, the valuables stored in the bank vault). For the Navigation computational environment, these traits manifest themselves differently than in military or financial systems. While confidentiality is critical in military systems (where in some cases the destruction of the system


is more desirable than the unauthorized release of information), and of very high priority for some financial systems (where customer data is not only a crucial business concern, but may also be covered by strong legal regulations for the control of customer information), for Navigation it is of lower concern. Confidential information in the Navigation environment consists of access control information such as passwords and password databases; detailed information on system and network configurations that could be used by an adversary to subvert security; and information critical to the control of mission operations, or not yet cleared for public release (because of scientific embargo or other reasons). Integrity is a crucial trait in Navigation computing, as unauthorized or improper modification of the system could lead to very serious problems. As noted above, no matter how accurate or how rapidly results were generated by the computational system, if modified by a security compromise those results would be of little use to the Navigation team. Availability in the Navigation computational system, alongside the fault-tolerant reliability constraints, carried the additional concern that someone might try to deny access to data sets or computer services. This was a crucial trait: the reliability requirements we discuss show how essential it was that the system could be used 24 hours a day, 365 days a year, for the thirteen-year length of the orbital tour. These traits were combined with ideas from security systems, such as Defense in Depth, Least Privilege, and Vulnerability Removal. Defense in Depth considers that, instead of a single defensive stronghold or choke point, a series of mutual, overlapping defenses is a much more robust implementation, and provides a form of fault-tolerant redundancy. Should attackers get past the outer perimeter, they will encounter layer upon layer of defenses rendering further progress futile. This is a moving target.
As noted, trade-offs must be made between time, effort, and available resources. There are always more layers of defense and optimizations that can be added, given enough time, money, and diligence. Least Privilege considers that processes and users should be given the rights and access needed to do the tasks they must perform, and no more. This forecloses many possibilities for manipulating user access, or a process, into doing things that were never intended to be allowed. Consider a network printing service that has access to a central spooling directory as a privileged user: with some effort, it may be possible to use it to gain access to other directories on the system, overwrite critical files, or even obtain remote access to the system. By instead restricting the privilege of the printing service, the printing utility can continue to function without enhanced access, removing its capacity to be misused. This has many similarities to ideas found in finance and accounting, such as a safe deposit box at a bank that requires two people to open it, so that no one person can abscond with the contents. Vulnerability Removal considers that by simply removing vulnerabilities, we avoid many of the issues with those vulnerabilities, and sometimes other problems as well. If there is no email server on a system, a worm that attacks email servers will have no effect. Likewise, if there is no web server on the system, there is no need to monitor it and keep it patched against exploits. Preventing potential vulnerabilities in buggy


software from being installed, along with the software itself, can save much effort and time in attendant security and maintenance later on. As an analogy, this is not merely locking a side door into a building, but sealing it up so that it cannot be used. Included with these approaches were security benchmarks and validation to meet strict confidence levels. At a high level, the process of hardening (or tightening down) a system considered these areas of concern:

(1) External Network
• Shut down/remove all unnecessary network services
• Implement and configure a firewall that blocks all unnecessary network services

(2) Internal to System
• Shut down all unnecessary system services
• Remove or restrict all unnecessary software that gives system privileges
• Review and restrict local and network file system access
• Configure logging
• Limit installation of new software and strongly scrutinize software offering a network service

(3) User Controls
• Monitor closely who is allowed access to the system, and limit their actions without preventing them from doing their jobs
• Carefully control user installation of software so that the security controls do not end up being circumvented

As a part of this security process, benchmarking utilities were used to closely evaluate the security state of the system and compare states over time. Mechanisms for determining metrics to evaluate a system's security level were considered integral to these efforts. A number of software tools were utilized in the security design process. With these tools, it would be possible to establish a baseline security level and ensure that patching, operating system upgrades, and changes continued to meet those metrics. NMAP [38] and NESSUS [39] are two network port scanning and network service analysis tools used to determine the security of the external network connections of a system. NMAP would determine what network services or ports would respond on a given system, and NESSUS would determine whether those network services or ports were vulnerable to a catalog of hacking techniques. (If one thinks of a computer on a network as akin to a building on a street, NMAP will attempt to find all the openings, and NESSUS will try to open all the doors and windows.) Likewise, an internal host security scanner from the Center for Internet Security (CIS), known as CISscan, would examine the internal security state of a system, looking for bad permission settings, configurations, or unusual running software or system services that might pose a security problem or weakness. It would serve to generate a consistent set of metrics to evaluate host security across a variety of operating system platforms, enabling comparisons between differing machines. Moreover, its evaluations would


represent an effort to serve as an industry standard set of “best practices” guidelines, and be very well documented—so that the security implications (and how to change them) of each tested system component would be well understood. As we saw with performance benchmarking, the metrics would provide a vendor-independent scale to measure (and improve) the security of these systems. Reference [5] examines this in detail.
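What a connect-style port scan of the kind NMAP performs actually does can be illustrated in a few lines. This sketch is conceptual only; a real scanner is far faster, stealthier, and more thorough. It simply tries to open a TCP connection to each candidate port and records which ones answer.

```python
# Conceptual sketch of a connect-style port scan (the simplest of the
# techniques a tool like NMAP offers): attempt a TCP connection to each
# candidate port and record which ones accept. Illustrative only.
import socket

def scan(host, ports, timeout=0.5):
    """Return the subset of ports accepting TCP connections on host."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                open_ports.append(port)
    return open_ports
```

Running such a check against one's own hosts after a configuration change is a quick way to confirm that only the intended services answer, which is exactly how the external-network hardening steps above were verified.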

3.3 High Availability and Reliability

In the design of a highly available and reliable computational environment, in addition to fault-tolerant hardware and software models, several conceptual goals from other software systems provided foundational models. One of these archetypes was borrowed from the field of advanced distributed systems: as seen in distributed database systems, the principles of Atomicity, Consistency, Isolation, and Durability (ACID) were viewed as desirable goals for this mission-critical system [30]. In this systems model (which is not itself a database), the paradigms of Atomicity and Consistency were considered the most crucial: all machines would see exactly the same essential software state; running software would produce equivalent results, even on differing hardware; and any changes would be committed to the whole system, or not at all. Isolation is considered here to be synonymous with security fault tolerance (i.e., the goal that system software, and particularly user processes and actions, not interfere with one another at all). Durability is considered to be synonymous with system Integrity, Availability, and (to some degree) Consistency, as well as with general fault tolerance, such that any committed file system or software changes will survive power loss, crashes, or errors. These system design ideals were not achievable in the manner of transactions in a distributed database system; rather, they were goals embodied in how the computational system was built and organized: the NFS fileserver design, the setup of file systems, the installation of software, and even operating systems (sometimes, especially in the beginning, based on the lengthy efforts of the system administrators). The idea is that, from the user perspective, these would be guiding principles in how the system would operate.
The objective is a system having characteristics seen in database systems that promote reliability. Combined with the ideas from fault tolerance and security fault tolerance, especially in hardware and software design, these would serve in promoting the design of a reliable and highly available system.
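A small file-system example of the Atomicity and Durability goals: committing an update by writing to a temporary file, syncing, and renaming, so that readers see either the old contents or the new, never a torn write. This is an illustrative sketch, not mission code; it relies on the POSIX guarantee that a rename within a single file system is atomic.

```python
# Sketch of an all-or-nothing file update in the spirit of the ACID goals
# above: readers observe either the complete old contents or the complete
# new contents, even across a crash mid-write. (Illustrative only.)
import os
import tempfile

def atomic_write(path, data: bytes):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # durability: force the bytes to disk
        os.replace(tmp, path)      # atomicity: the all-or-nothing commit
    except BaseException:
        os.unlink(tmp)             # a failed write leaves the old file intact
        raise
```

The temporary file must live on the same file system as the target, since a cross-device rename is neither atomic nor, in general, possible.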


4 Observations and Lessons Learned

In this section, key observations from this long effort are examined, in the hope that some technical points, learned experience, and wisdom may be passed on.

4.1 Performance Comparison The systems cited in this paper were evaluated as part of their use in Navigation, and their performance was categorized as discussed above, in both hardware metrics (operating system, processor speed and type, main memory, disk size and speed) and an artificial performance metric based off of Navigation software benchmarking combined with an industry standard benchmark (SPEC CPU [25]). We reproduce some of the previous metrics cited above in Fig. 6 (note use of log scale) and in Table 13 to enable a comparison of the computer systems used across the whole mission. This may prove instructive.

4.2 Useful Concepts

Over such an effort, a number of caveats, miscellaneous points of wisdom, and foundational ideas have emerged. Some of these considerations may prove useful in the design of future systems of similar scope, or in other areas entirely. Some may already be familiar from Systems Administration. As with all such injunctions, your mileage may vary.

(1) "Disks are cheap." Even in 1996, the cost of disk storage was much, much cheaper than the system administrator and Navigation engineering time necessary to work around insufficient file system space, or to recover a file from a backup tape instead of online storage. Acting on this intuition proved to be the single greatest improvement to system reliability accomplished throughout the mission.

(2) "Buy the best and largest storage you can afford." Re-examining the painstaking effort to gather capacity requirements for disk storage, even after putting a large margin in place for the Navigation systems, it is striking how far from those estimates actual file system use has been. The estimates simply failed to accurately predict usage even across the Prime Mission, and the new techniques and new software sets implemented since those functional requirements have made such estimates even less accurate. It would have been more cost-effective to have simply skipped this effort and used the money


[Figure 6 (bar chart, log scale): for each system from the Sun IPC (1993) through the Dell T5810 (2015), CPU clock speed (MHz) and SPEC 2006-based floating-point performance (peak and throughput) are plotted; the numeric values appear in Table 13.]

Fig. 6 Comparison of hardware performance of computer systems used in Cassini Navigation

saved to buy more storage. One may note that, at one point, the entire Navigation team managed to cross the solar system using less storage than was available on the original iPhone.

(3) "Make your life as easy as possible in the event of a crisis." Complicated disaster recovery and failover procedures will be much less helpful during a crisis, when the pressure is on and users are screaming. Recovery procedures should be simple, must be tested, and must work.

(4) "Never delete anything as a system administrator, for you will regret it later." This goes hand in hand with the first point. The time saved in not having to recreate a prior configuration to solve numerous classes of questions and problems is almost magical. Many were the times that the author removed something, only to regret it hours or even minutes later. Furthermore, if a user asks to have something deleted, it is likely that a request for restoration from a backup will follow. Instead of possibly creating a bigger problem, save a backup copy or move it aside. See #1 above.


Table 13 Comparison of computer systems used in Cassini Navigation

Model                     CPU                       RAM     Disk            SPEC-based FP   SPEC-based FP throughput
Sun IPC 4/40              L64801 25 MHz             16 MB   120 MB FSE      0.01270         0.01521
HP 715/75                 PA-7100 75 MHz            128 MB  1-2 GB FWD      0.09863         0.1189
HP 715/100                PA-7100LC 100 MHz         128 MB  1-2 GB FWD      0.1315          0.1585
HP 735/125                PA-7150 125 MHz           256 MB  2 GB FWD        0.1716          0.2077
HP J210                   PA-7200 120 MHz           512 MB  4x2 GB FWD      0.2274          0.3728
HP J282                   PA-8000 180 MHz           1 GB    4x2 GB FWD      0.7080          no estimate available
HP J2240                  PA-8200 236 MHz           1.5 GB  4x2 GB UWS      0.9648          1.3279
Dell P350: Red Hat 8.0    P-4 x86 3.06 GHz          1.5 GB  150x2 GB U360   6.473           6.858
Dell P360: Red Hat 8.0    P-4 x86 3.60 GHz          1.5 GB  150x2 GB U360   7.63            no estimate available
Dell P370: Red Hat 8.0    P-4EE x86 3.40 GHz        1.5 GB  150x2 GB U360   8.73            no estimate available
Dell P650: Red Hat 8.0    Xeon x86 2.8 GHz          3 GB    150x2 GB U360   5.956           7.287
Dell P380: RH Ent 4.4     P965E x86-64 3.73 GHz     1.5 GB  150x2 GB U360   12.40           21.70
Dell P390: RH Ent 4.4     DC2E x86-64 3.60 GHz      1.5 GB  150x2 GB U360   19.50           no estimate available
Dell P670: RH Ent 4.4     Xeon x86-64 3.7 GHz       3 GB    150x2 GB U360   11.04           17.57
Dell T3400: RH Ent 4.5    E8500 x86-64 3.2 GHz      1.5 GB  150x2 GB SAS    21.40           31.65
Dell T3500: RH Ent. 4.5   W3570 x86-64 3.2 GHz      1.5 GB  150x2 GB SAS    38.10           95.24
Dell T7500: RH Ent 4.5    W5580 x86-64 3.2 GHz      24 GB   300x2 GB SAS    40.42           208.0
Dell T3600: RH Ent. 5.8   E5-1660 x86-64 3.3 GHz    8 GB    150x2 GB SAS    76.80           210.0
Dell T5610: RH Ent. 7.22  E5-2687 x86-64 3.3 GHz    16 GB   150x2 GB SAS    80.48           244.9
Dell T5810: RH Ent. 7.22  E5-1620 x86-64 3.5 GHz    16 GB   150x2 GB SAS    90.20           247.5

(5) "Failures that are expected should not cause a problem for users or system administrators." In a fault-tolerant environment like the one described here, there are certain classes of problems that should not be a cause for alarm for users or system administrators. The environment functions much better when the recovery mechanisms disentangle the user's ability to keep working from the repairs necessary to resolve the underlying problem. Hot-swappable components that come into service automatically, such as disks in a RAID array, the pre-configured spare office noted above, or a spare network printer, will increase the happiness of both users and system administrators, for both can continue with what they were working on without having to involve the other.


(6) "Sufficient margin is magic." Much of the work described in this paper had in the background a continuous effort on the part of the Systems Administration team to obtain margin: spare parts, spare computers, and spare disk storage. The difference this makes in handling problems, unusual events, or unusual requests is night and day. It can turn a difficult problem, such as an important backup and recovery effort, into a matter of much simpler effort.

(7) "Ownership is key." There may be times when merging with a larger group is the best option, as we discussed with the merger of the Navigation network, or it may be the only option; it may save significant money, or enable things that could not be done previously. However, having control and autonomy over a system, a network, a room, or an organization gives one options, and encourages a higher standard of care and due diligence. It is yours. Independence may allow one to avoid the lemming-like fate of others in a large organization subjected to a bad design or policy.

(8) "Today is Saint Crispin's Day—there is no such word as 'surrender'…" In Flight Operations, there is no such word as "surrender." We have commented on the importance of key character traits such as "ownership" and "cleverness," but the importance of the right attitude in the System Administration staff cannot be overemphasized. The stubborn refusal to accept failure is vital for people who serve in this role [40]. Much depends on the right outlook and spirit of the people who keep these critical systems running. Apollo Flight Director Peter Frank codified this attitude as, "To recognize that the greatest error is not to have tried and failed, but that in trying we did not give it our best effort." [11]

5 Conclusion and Final Notes

We have described the efforts to support and maintain the computer systems used by the Cassini Navigation team: the approach taken to their design and implementation, and the very successful impact it had on the operations of the mission. From the beginning of this effort, an ongoing process of improvement was coupled with a strict pace of operational objectives: launch, cruise, Saturn arrival, orbital tour, Grand Finale, and closeout. A number of design constraints were imposed and worked forward with consideration of industry best practices, while attempting to anticipate what changes the future might bring. Although the implementations described in this paper have changed over the course of the mission and will change in the future, the principles and process shown may provide long-lasting guidance. Although the work conducted on such systems can be crucial, and the penalties for failure very high, in many cases there is a lack of significant resources to perform such tasks.

The Cassini/Huygens Navigation Ground Data System …

317

A number of trends that have emerged in the last few years, such as the rise of cloud computing and virtual systems, have become much more significant, but we have not examined them closely. Such systems do have the potential to lower costs, if they can meet the reliability and availability needs of their users; no public cloud provider yet does so [41]. Such an effort has yet to be seen at the length and scope described here. It is hoped that future efforts, regardless of the underlying architecture, carefully consider similar questions of design and work to build systems that will meet the often aggressive requirements of future missions. Ideally, the work we document here will also find use with other organizations and projects facing similar challenges.

Acknowledgements No work of this size can be carried out alone. The editors who assisted with this document did tremendous service, and a considerable debt is due to William Owen, Zachary Porcu, Duane Roth, and Sean Wagner; without their aid, the author could not imagine successfully accomplishing such an effort. I would also like to acknowledge the help and assistance of the very many individuals involved with the Cassini Project over the years, without whose help this effort would not have been possible. In particular, the author would like to commend all the system and network administrators who have been a part of the Cassini Navigation team [40] and directly supported the effort described in this paper: Charles W. Rhoades, Jaime C. Mantel, Scott E. Fullner, Tomas Y. Hsieh, Katherine D. Nakazono, David M. Bajot, Dimitri Gerasimatos, Frank Yu, Elizabeth Real, Jae H. Lee, and Navnit “Nick” C. Patel. “But we in it shall be remembered – we few, we happy few, we band of brothers” (and sisters). To them we give our deepest thanks.
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. Reference to any specific commercial product, process, or service by trade name, trademark, manufacturer or otherwise, does not constitute or imply its endorsement by the US Government or the Jet Propulsion Laboratory, California Institute of Technology. Copyright 2018 California Institute of Technology. US Government sponsorship acknowledged.

Appendix—Key Requirements for Tour

As there is considerable interest in the evaluation process used to make the hardware decisions for the orbital tour, we now discuss the key functional requirements that drove our system choices and show how they led to the design of the Navigation computational system. We have already discussed the central requirement of enhanced speed and processing. Other core requirements provided similar metrics for system availability:

4.8 The Navigation Hardware and Operating System Software shall provide 99.97% [i.e. 2–3 h of unplanned downtime per year] uptime capability. (from 5.1.3.7, 3.2.1.1)

4.9 The Navigation Computer System shall be configured to have a mean-time to restore overall system functionality of 30 min during critical periods and 60 min during non-critical periods. While this does not imply that all subsystems will be functional, all systems necessary to fulfill the NAV operational requirements will be restored in this period.


4.11 The Navigation Hardware and Operating System Software shall provide 24-7 uptime capability. (5.1.3.8, 3.2.1.11) [19]
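The bracketed downtime figure in requirement 4.8 follows directly from the availability percentage. A small illustrative calculation (ours, not from the paper) shows the relationship:

```python
# Illustrative sketch of the arithmetic behind requirement 4.8; the constants
# and function name here are ours, not from the source document.
HOURS_PER_YEAR = 365.25 * 24  # 8766 h

def annual_downtime_hours(availability: float) -> float:
    """Unplanned downtime budget, in hours per year, at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR

budget = annual_downtime_hours(0.9997)  # ~2.63 h/year, i.e. the "2-3 h" in 4.8
```

With the 30–60 min mean-time-to-restore of requirement 4.9, such a budget allows only a handful of restore events per year before the uptime requirement is broken.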

Requirements on systems design also incorporated the learned experience of our systems model and the importance of CM that had evolved since launch:

2.3 The design of the Navigation Hardware shall use an approach which stresses interface simplicity. (5.1.1.4, 3.1.1.8, 3.2.1.3)

2.8 Each Navigation Engineer shall have a Navigation Workstation located in the Navigation Mission Support Area (MSA). (5.1.1.8, 3.1.1.55)

2.9 The Navigation Workstations shall be connected to the Navigation Computer Servers by the Multi-Mission Navigation (MMNAV) Local Area Network (LAN). (5.1.1.9, 3.1.1.55, 3.1.1.57)

2.10 The Navigation Hardware shall support local file storage and retrieval capabilities. (5.1.1.10, 3.1.1.43, 3.2.1.5)

These concerns would be coupled with fault-tolerant and modular design considerations, especially those promoting redundancy and resiliency:

2.1 The Navigation Hardware shall support the Cassini Mission from Launch to EOM. (5.1.1.1, 3.1.1.4, 3.1.1.5, 3.2.1.1)

2.17 The Navigation Hardware and Operating System Software shall be designed such that to the maximum extent feasible, single points of failure shall be eliminated in favor of multiple-redundant sub-systems.

2.18 The Navigation Hardware and Operating System Software shall be designed such that, to the maximum extent feasible, degraded performance shall be accommodated in preference to non-operating states in the event of component failure. (from 57-98)

2.20 The Navigation Hardware and Operating System Software should be designed so that, by the use of modular components and software systems, whole-system copying and configuration management utilities, such as Solaris Jumpstart, HP Ignite, Linux kickstart/system imager or the like, and/or spare/redundant systems and components, to the maximum extent feasible, system maintenance issues are minimized in terms of effort and time. [19]

Some design requirements promoted the idea of interchangeable components, in particular that each workstation or server could be exchanged efficiently with any other Navigation workstation or server:

4.10 The Navigation Hardware and Operating System Software shall provide a capability to cluster the primary servers used by the Navigation Computer System to permit a failover from a faulty server to a functional server rapidly and without modification of the other parts of the Navigation Computer System in under one minute [under one second goal]. This means that, aside from any operations or software runs in progress during such a failure, no data will be corrupted or lost.

4.17 The Navigation Hardware and Operating System Software should be capable of meeting the functional requirements for performance on each user’s personal workstation. [Rationale: the previous model of having centralized compute servers doing most of the processing work led to underutilized personal workstations and very overloaded central servers. As compute costs have come down significantly over time, it is desired to shift more of the processing load to the individual workstations and leave the central servers as file and configuration servers.] (5.1.3.11, 3.2.1.1) [19]
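The failover behavior required by 4.10 is typically driven by heartbeat monitoring. The following is a minimal sketch of that idea only; all names are ours, and a real cluster product layers fencing and shared-storage handoff on top of this loop:

```python
# Hedged sketch of heartbeat-driven failover in the spirit of requirement 4.10.
# monitor() and its callbacks are illustrative names, not from the source.
import time

def monitor(primary_alive, promote_standby, timeout_s=5.0, poll_s=1.0):
    """Promote the standby once the primary misses heartbeats for timeout_s."""
    last_seen = time.monotonic()
    while True:
        if primary_alive():                        # heartbeat check succeeded
            last_seen = time.monotonic()
        elif time.monotonic() - last_seen > timeout_s:
            promote_standby()                      # fail over to the spare
            return
        time.sleep(poll_s)
```

In practice the one-minute bound of 4.10 is dominated not by detection but by how quickly storage and network identities can be remapped to the surviving server.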


Some helpful ideas were introduced that would prove useful in increasing the robustness of the computational environment:

4.13 The Navigation Hardware shall be configured to include an online “hot-spare” workstation and server, so that in the event of a server-cluster failure or a workstation failure the recovery time is limited only by the time necessary to change the configuration of the workstation or server to its new mode serving as an operational replacement, within 60 min of the outage of the prime workstation. This should not preclude additional like spares being so configured if feasible. (4571-7) [19]

This would end up proving a highly useful requirement: it forced the question of maintaining a spare capability, which allowed for rapid resolution of problems. This will be discussed in more detail in the next section.

Separate performance requirements were levied against specific software sets that were viewed as more technically difficult than the first general requirement (4.18) given above. They were considered more operationally stringent than previous requirements [17] and included upgraded constraints specific to the orbital tour that could serve as a good benchmark for generalized system performance:

4.1 The Navigation Hardware and Software shall be capable of updating the ephemerides of all nine major Saturnian satellites in a single run. This is an encounter requirement. (from 3.1.1.42) from (6421-43) Rationale: Needed to reduce the time required to update the spacecraft ephemeris. Note: This implies a temporary working filespace of at least 1 GB to complete this task.

4.2 The Navigation Hardware shall be capable of updating an orbit determination solution, including the satellite ephemerides update spanning 4 years (see 1.3.2 above), and estimations of 150 bias params, 6 stochastic params (up to 200 batches) for 50,000 data points within 5 min [1 min goal] of receipt of the input NAVIO tracking data file. (from 5.1.3.2, 3.1.1.29, 3.2.1.9, 4.2.3.1 among others)

4.3 The Navigation Hardware shall be capable of performing at least 5 iterations of a maneuver design update computer run within 1 min [10 s goal] of receipt of the final orbit estimate. (from 5.1.3.3, 3.1.1.30, 3.2.1.10)

4.4 The Navigation Hardware shall be capable of running the LAMBIC software, capable of simulating a maximum set of 180 maneuvers and 100 encounters with 100 K-matrix files, with a 1000-sample Monte Carlo run using a baseline tour maneuver strategy, in 15 min or less. (from 4.3.1.10) from (6421-23).
4.5 The Navigation Hardware shall be capable of processing up to 200 ISS pictures per day at a peak rate of 30 pictures per hour. (5.1.3.4) from (3.1.2.37)

4.6 The Navigation Hardware shall be capable of producing optical navigation picture schedules for one month of the mission within two working days, where the two working days also include the analysts’ time. (5.1.3.5, 3.1.2.35, 3.2.2.4)

4.7 The Navigation Hardware shall provide convenient on-line access to the prior six months of optical navigation images. (3.2.2.7) Note: this requirement is probably met under most configurations by 4.15. [19]
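To see why a bound like the 5 min in requirement 4.2 was achievable, a rough operation count helps. The following back-of-the-envelope estimate is ours alone: it assumes a batch least-squares filter solved via normal equations, whereas the actual ODP algorithms are more elaborate.

```python
# Back-of-the-envelope sizing for requirement 4.2 (our assumption: a batch
# least-squares estimate via normal equations; not the actual ODP algorithm).
m = 50_000   # tracking data points
n = 150      # bias parameters estimated

# Cost to form the normal matrix A^T A plus a Cholesky-style factorization.
flops_per_iteration = 2 * m * n**2 + n**3 / 3      # ~2.25e9 floating-point ops
seconds_per_iteration = flops_per_iteration / 1e9  # at a ~1 GFLOPS-class CPU
# A few seconds per iteration on early-2000s hardware leaves most of the
# 5 min budget for I/O, the stochastic batches, and multiple passes.
```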

Individual workstation and file server disk storage capabilities were determined through an extensive series of user interviews. These became minimal system requirements for the central file server:


4.15 Data Storage Requirements

A. The Navigation Hardware and Operating System Software shall be configured to provide an offline backup of all Navigation online systems at least twice a week.

B. The Navigation Hardware and Software (this could involve changes to the archive S/W) shall be configured to provide a long-term offline archive/backup capability in a stable media [a CD-ROM writer or some other similar stable media].

C. The Navigation Hardware and Operating System Software shall be configured to provide online data storage for all navigation delivery files and files necessary to duplicate such deliveries until the prime EOM. (from 3.2.1.1)

Note: Part C implies a necessary critical disk space of at least 1 TB (3 TB total) as follows:
ODP: 150 MB per run × 150 runs + 50 GB additional/overhead ≈ 100 GB
TRAJ: 11.75 GB between two probe deliveries and twelve project deliveries
MAN: TCM: 70 MB per maneuver × 150 runs ≈ 51.3 GB
+ LAMBIC: 10 MB per encounter × 50 encounters × 5 analysts ≈ 2.5 GB
+ CATO: 10 MB per encounter × 50 encounters × 5 analysts ≈ 2.5 GB
Total MAN: ~60 GB
OMAS: 25,000 pictures × 2 MB per picture ≈ 50 GB
SA: 100 GB software C/M repository (current + old flight software)
+ 50 GB workstation and server OS software + turn-key images
+ 100 GB file system overhead
+ 471.3 GB snapshot space for the above disk area ≈ 943.5 GB
Total minimum disk space: 950 GB
Total critical disk space (assuming 70% utilization): 1350 GB
Total critical disk space, including online mirror: 2670 GB (across two or more systems)

D. The Navigation Hardware and Operating System Software shall be configured in such a manner as to allow online data storage to be easily scaled up to five times its capacity, in order to provide for future growth. [19]

And for the individual workstations:

4.19 The Navigation Hardware should be capable of storing, at a minimum, 150 GB of data in a local file system and 150 GB for the mirror of local data, for a total of 300 GB on each individual workstation, and have 2 GB of memory (RAM) capacity. [Rationale: in order to fulfill 4.17, and noting the general requirements of 4.18, these specifications round out the performance requirements noted previously.] [19]
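The sizing arithmetic in Note C of requirement 4.15 can be checked with a short script. The per-subsystem figures are the rounded totals the requirement itself quotes; the variable names and the 70%-utilization/mirroring steps are our reconstruction of the stated calculation:

```python
# Reproduces the disk sizing in requirement 4.15 Note C from the rounded
# per-subsystem totals quoted there (all figures in GB; names are ours).
components = {
    "ODP":  100,     # 150 MB/run x 150 runs + 50 GB overhead
    "TRAJ": 11.75,   # probe + project trajectory deliveries
    "MAN":  60,      # TCM + LAMBIC + CATO combined
    "OMAS": 50,      # 25,000 pictures x 2 MB each
    "SA":   250,     # C/M repository + OS images + filesystem overhead
}
working = sum(components.values())   # ~472 GB of live data
with_snapshots = 2 * working         # snapshot space roughly mirrors the data
critical = with_snapshots / 0.70     # keep disks below 70% utilization
mirrored = 2 * critical              # plus a full online mirror
# The requirement quotes 950 GB, 1350 GB, and 2670 GB for these stages,
# consistent with the above to within its rounding.
```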

These requirements served as a useful yardstick for the systems evaluation process. Against this background, we would be able to consider the hardware choices critically.


References

1. Antreasian, P. G., Ardalan, S. M., Beswick, R. M., Criddle, K. E., Ionasescu, R., Jacobson, R. A., et al. (2008). Orbit determination processes for the navigation of the Cassini/Huygens mission. In AIAA-2008-3433, SpaceOps Conference, Heidelberg, Germany, May 12–16, 2008. https://doi.org/10.2514/6.2008-3433.
2. Williams, P. N., Gist, E. M., Goodson, T. D., Hahn, Y., Stumpf, P. W., & Wagner, S. V. (2008). Orbit control operations for the Cassini-Huygens mission. In AIAA-2008-3429, SpaceOps Conference, Heidelberg, Germany, May 12–16, 2008. https://doi.org/10.2514/6.2008-3429.
3. Beswick, R., Antreasian, P., Gillam, S., Hahn, Y. H., Roth, D., & Jones, J. (2008). Navigation ground data system engineering for the Cassini/Huygens mission. In AIAA 2008-3247, SpaceOps 2008 Conference, Heidelberg, Germany, May 12–16, 2008. https://doi.org/10.2514/6.2008-3247.
4. Beswick, R. M., & Roth, D. C. (2012). A gilded cage: Cassini/Huygens Navigation ground data system engineering for security. In AIAA 2012-1267202, SpaceOps 2012 Conference, Stockholm, Sweden, June 11–15, 2012. https://doi.org/10.2514/6.2012-1267202.
5. Beswick, R. M. (2017). Computer security as an engineering practice: A system engineering discussion. In IEEE: 6th International Conference on Space Mission Challenges for Information Technology (SMC-IT), September 27–29, 2017. https://doi.org/10.1109/smc-it.2017.18.
6. Beswick, R. M. (2018). Computer security as an engineering practice: A system engineering discussion. In Advances in Science, Technology and Engineering Systems Journal (ASTESJ), vol. Special Issue 5, no. Multidisciplinary Sciences and Engineering, p. (to be published).
7. Byrne, D., Frantz, C., Weymouth, T., & Harrison, J. (1980). Composers, Once in a lifetime [sound recording]. Sire Records.
8. Wikipedia, Whac-A-Mole, [online encyclopedia], Wikimedia Foundation, December 15, 2017. [Online]. http://en.wikipedia.org/wiki/Whac-A-Mole. Accessed March 28, 2018.
9. Coulouris, G., Dollimore, J., & Kindberg, T. (2005). Distributed systems, concepts and design (4th ed., p. 519). New York: Addison-Wesley.
10. Rich, B. R. (1995). Clarence Leonard (Kelly) Johnson, 1910–1990. In A biographical memoir (p. 231), National Academy of Sciences, National Academies Press, Washington, D.C.
11. Kranz, G. (2009). Failure is not an option: Mission control from Mercury to Apollo 13 and beyond (p. 392). New York: Simon & Schuster.
12. Affleck, B. (2012). Argo. [Film]. USA: Warner Brothers.
13. Beswick, R. M. (2003). Response to RFA #3, of review for Cassini Navigation, of 28 August 2003. IOM 312.D/006-2003, Jet Propulsion Laboratory, NASA, Pasadena, CA, October 15, 2003.
14. Goddard Technical Standard, Risk management reporting, GSFC-STD-0002, Goddard Space Flight Center, NASA, Greenbelt, MD, May 8, 2009.
15. Hewlett Packard Enterprise, HP Ignite-UX, Hewlett Packard Enterprise Development. (2018). [Online]. https://www.hpe.com/us/en/product-catalog/detail/pip.4077173.html. Accessed March 30, 2018.
16. Cheswick, W. R., Bellovin, S. M., & Rubin, A. D. (2003). Firewalls and internet security: Repelling the wily hacker (2nd ed., pp. 10–14). New York: Addison-Wesley.
17. Ekelund, J. E. (2000). Functional requirements document for the navigation software system—Encounter version. 699-SCO/NAV-FRD-501-ENC, Jet Propulsion Laboratory, NASA, Pasadena, CA, April 25, 2000.
18. Jones, J. (1992). Navigation requirements reference document for Cassini, 699-500-4. Jet Propulsion Laboratory, NASA, Pasadena, CA, December 1992.
19. Beswick, R. M. (2002). Cassini Navigation hardware requirements. IOM 312.D/007-2002, Jet Propulsion Laboratory, NASA, Pasadena, CA, September 30, 2002.
20. Moore, G. E. (1965, April 19). Cramming more components onto integrated circuits. Electronics, 38(8), 114–117.


21. Intel, Excerpts from a conversation with Gordon Moore: Moore’s Law, Intel Corporation. (2005). http://large.stanford.edu/courses/2012/ph250/lee1/docs/Excepts_A_Conversation_with_Gordon_Moore.pdf. Accessed March 30, 2018.
22. Walter, C. (2005, August). Kryder’s law (pp. 32–33). Scientific American.
23. Wall, L., Christiansen, T., & Schwartz, R. (1996, September). Programming perl (2nd ed.). O’Reilly & Associates.
24. Beswick, R. M. (2002). Initial product evaluation for Cassini Navigation upgrades. IOM 312.D/008-2002, Jet Propulsion Laboratory, NASA, Pasadena, CA, November 24, 2002.
25. Standard Performance Evaluation Corporation, SPEC: Standard Performance Evaluation Corporation, March 1, 2018. [Online]. https://www.spec.org. Accessed March 30, 2018.
26. Finley, B. E. (2015). SystemImager, September 2, 2015. [Online]. https://github.com/finley/SystemImager/wiki. Accessed March 30, 2018.
27. Yeh, Y. C. (2001). Safety critical avionics for the 777 primary flight controls system. In IEEE—Digital avionics systems, Daytona Beach, FL, DASC. 20th Conference, October 14–18, 2001. https://doi.org/10.1109/dasc.2001.963311.
28. Beswick, R. M. (2017). Cassini Navigation file server storage estimates through EOM. IOM 392K-17-001, Jet Propulsion Laboratory, NASA, Pasadena, CA, March 10, 2017.
29. Beswick, R. M. (2018). Final disposition of Cassini assets. IOM 392K-18-002, Jet Propulsion Laboratory, NASA, Pasadena, CA, September 24, 2018.
30. Gray, J., & Siewiorek, D. P. (1991, September). High-availability computer systems (pp. 39–48). Los Alamitos, CA: IEEE Computer Society. https://doi.org/10.1109/2.84898.
31. Twain, M. (1894). Pudd’nhead Wilson. New York City: Charles L. Webster & Co.
32. Skoudis, E., & Liston, T. (2006). Counter hack reloaded: A step-by-step guide to computer attacks and effective defenses (2nd ed.). New York: Prentice Hall.
33. Bishop, M. (2003). Computer security, art and science (pp. 344–345). New York: Addison-Wesley.
34. Anderson, R. J. (2008). Security engineering: A guide to building dependable distributed systems (2nd ed.). New York: Wiley.
35. Information Assurance Directorate, Operating Systems guidance, National Security Agency, [Online]. https://www.iad.gov/iad/library/ia-guidance/security-configuration/operatingsystems/index.cfm. Accessed April 20, 2017.
36. Center for Internet Security, CIS—Center for Internet Security, CIS, [Online]. http://www.cisecurity.org. Accessed March 30, 2018.
37. National Vulnerability Database, National Checklist Program Repository, National Institute of Standards and Technology, [Online]. https://nvd.nist.gov/ncp/repository. Accessed March 30, 2018.
38. NMAP, Nmap, [Online]. http://www.nmap.org. Accessed March 30, 2018.
39. Nessus, Tenable security, Tenable Inc, [Online]. http://www.tenable.com/products. Accessed March 30, 2018.
40. Shakespeare, W. (1599). Henry V, Act IV, Scene III. [Performance].
41. CloudSquare, CloudHarmony—Service status (comparison), CloudSquare, March 30, 2018. [Online]. https://cloudharmony.com/status. Accessed March 30, 2018.
42. Beswick, R. M. (1997). Saturday, May 24th, [MMNAV NAV-OPS LAN] NETDOWN, JPL NETDOWN report (MMNAV NAV-OPS archive: email distribution list), Pasadena, CA, Saturday, May 24, 1997.
43. Castro, M., & Liskov, B. (2002, November). Practical Byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4), 398–461. https://doi.org/10.1145/571637.571640.

Ground Enterprise Transformation at NESDIS

Steven R. Petersen

Abstract This paper describes major changes in the architecture and capability of the Ground Enterprise that operates and sustains the National Oceanic and Atmospheric Administration’s (NOAA’s) weather satellites. Operated by NOAA’s National Environmental Satellite Data and Information Service (NESDIS), the Ground Enterprise supports satellite systems in both polar and geosynchronous orbits. These include the legacy Polar Operational Environmental Satellite (POES) and Geostationary Operational Environmental Satellite (GOES) systems as well as the new Geostationary Operational Environmental Satellite Series R (GOES-R) and Joint Polar Satellite System (JPSS) systems. Additionally, NESDIS participates with domestic and international partners in satellite operations, product generation, and distribution involving a variety of platforms. Analysis over the last three years defined an evolved architecture for the Ground Enterprise that integrates its elements together for greater effectiveness and efficiency. Based on common services, the evolved architecture complements the strengths of the new GOES-R and JPSS ground systems with investments to modernize the remainder of the existing infrastructure and exploit new technologies. These changes will enable the Ground Enterprise to process and distribute new sources of data with greater agility, flexibility, and efficiency at reduced cost. They also provide economies of scale across the entire enterprise and enable more straightforward implementation of security measures. This paper describes the translation of the architecture analysis into time-phased investment plans. These plans address the migration to a set of enterprise algorithms based on common physics implementations, a modernized data archive potentially leveraging the cloud, shared product generation and distribution services for more efficient operations, and a mission science network to promote greater collaboration across the science community.
The paper also describes the status of initial investments in mission support tools.

S. R. Petersen (B) National Oceanic and Atmospheric Administration, Silver Spring, MD 20910, USA e-mail: [email protected] © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_13


1 Background

The National Environmental Satellite Data and Information Service (NESDIS) acquires, operates, and sustains geosynchronous and polar weather satellites for the National Oceanic and Atmospheric Administration [1]. These platforms provide key observations for the National Weather Service and a variety of other US users and international partners. Historically, due to partnerships and acquisition strategies, NOAA’s satellite missions were developed as stovepipe systems. Over the last two years, NESDIS launched the first two satellites in the new Geostationary Operational Environmental Satellite Series R (GOES-R) [2], now known as GOES-16 and GOES-17, as well as the first satellite of the new Joint Polar Satellite System (JPSS) [3], now known as NOAA-20. With these launches, NESDIS began full operation of the two new ground segments developed to support them. These systems joined the infrastructure previously fielded to support the legacy Geostationary Operational Environmental Satellite (GOES) and Polar Operational Environmental Satellite (POES) systems, which are still operating and forecast to fly out in the 2020s, as well as other NESDIS and partner missions. NOAA’s stovepipe architectures supporting the legacy GOES and POES satellites provide limited sharing of common standards, services, or functionality. These systems generally use dedicated component resources crafted to perform a single mission. While these designs are well thought out and generally perform their intended functions well, they frequently lack provisions for sharing with other missions that need similar services. This has led to high aggregate acquisition costs, due to redundant functionality with each new program, and high operations and maintenance costs.
The two new ground systems offer improved architectures that leverage some common services; however, they were still fielded as stand-alone capabilities that do not interface with each other nor fully share common support infrastructures.

2 Organization

In January 2015, NESDIS activated the Office of Satellite Ground Services (OSGS) [4]. OSGS was created to centralize development and sustainment of future NESDIS satellite ground capabilities. OSGS leads the transition to an Integrated Ground Enterprise (IGE) that offers cost-effective, secure, agile, and sustainable support for NESDIS missions. The IGE consists of the two new ground segments, elements of the legacy capability, and targeted new investments. The ultimate goal is to never again buy an entire new ground system for a future satellite constellation. The remainder of this paper describes how OSGS and the other NESDIS offices are executing this transition. The role of the concept of operations [5] and the process used to create the Enterprise Architecture are illustrated in some detail, as these items can be employed by any organization seeking to transform itself. When fully developed, the IGE will provide a suite of common ground services enabling (1) reduction of


mission ground systems costs and (2) accelerated deployment of capabilities. In the process, NESDIS will position itself as a data provider or data engine that leverages an IT-centric approach more closely related to Silicon Valley values than to the traditional aerospace industry.

3 Top-Level Objectives and Approach

Mission ground system costs will be reduced and deployment speed will be improved by (1) eliminating redundant acquisition and development of common ground system functionality; (2) sharing common but underutilized infrastructure resources across satellite programs and implementing standards for future development; and (3) streamlining ground operations by eliminating redundant operations and embracing automation to require fewer support staff.

Planning for the transition to the IGE is accomplished through three sets of activities: (1) development of an overarching IGE concept of operation and a set of Level 1 requirements (completed); (2) development of an Enterprise Architecture describing the current and to-be IGE states (95% complete); and (3) development and implementation of investment plans to transform the enterprise from its current state to the full IGE (in work). Figure 1 illustrates the NESDIS Ground Enterprise.

4 Concept of Operation and Use Cases

The IGE concept of operation was completed in February 2015. It describes the capabilities and attributes that the future Integrated Ground Enterprise (IGE) will possess and illustrates the application of these capabilities and attributes across 12 use cases. It provides the fundamental rationale for creating the IGE. Table 1 describes the capabilities and attributes of the IGE. The twelve use cases in the concept of operations illustrate the benefits of the IGE approach to all major groups of stakeholders, from application developers to operators to end users. The goal is to socialize the benefits of the IGE and enable all stakeholders to see themselves within the new construct. Table 2 lists the use cases.

Compared to the existing stand-alone approach, the IGE concept of operation brings significant impacts to traditional approaches to design, development, sustainment, operations, security, and staffing. Table 3 lists some of these impacts. As NESDIS moves to fully integrate the new ground systems into a shared-services approach, we are already experiencing some of these impacts, including challenges in end-to-end testing of segments that were intended to function together but were developed and validated in isolation from each other. As the IGE matures, NESDIS will need an Integration and Test (I&T) environment that accurately mimics the full operational environment, so that changes can be adequately tested to confirm that they do not create negative impacts on any of the systems that use common services.

Fig. 1 NESDIS Ground Enterprise [6]



Table 1 Capabilities and attributes of the IGE concept of operation [7] Capability/ Attribute

Description

Enterprise governance

IGE is a shared resource that stretches across all of NESDIS and is governed as an enterprise resource; all stakeholders have a voice

Enterprise management

Capabilities include situational awareness (health and status) and the ability to move resources from one use to another

Enterprise funding

All using organizations provide baseline IGE requirements and NESDIS requests the necessary funding

Shared infrastructure

An infrastructure of network, compute, storage, and software resources are shared and dynamically managed to meet NESDIS requirements. This maximizes resource utilization, improves operational flexibility, and reduces O&M costs through standardization

Ubiquitous data access

IGE provides a MetaData Registry describing available data and how to access it. The IGE data access architecture includes data from non-NOAA satellites, data from non-satellite sensors (e.g., in situ sensors), as well as external data

End-to-End lifecycle data management

Data management includes acquisition, quality control, validation, reprocessing, storage, retrieval, dissemination, and long-term preservation activities. IGE provides common services for many elements of the process

Isolation of impacts

IGE provides separations between the users of the shared resources IGE also enforces isolation of impacts by use of controlled interfaces

Hardware agnostic

IGE supports infrastructure as a service (IaaS). This approach breaks the dependency between hardware and software

Location agnostic

IGE is a distributed system where functionality may be implemented anywhere and migrate to new locations without impact to the users. This improved resource management, continuity of operations (COOP) and failover functionality

Acquisition approach agnostic

IGE enables a range of acquisition approaches for adding additional resources, capabilities, or applications

Service-oriented approach

In the IGE service-oriented architecture (SOA), every IT resource is accessible as a service. Each service is discrete and interacts with the enterprise through defined interfaces, so service implementations can be changed with limited impact to the enterprise

Maximum reuse of common services

Missions are incentivized to reuse existing services instead of creating redundant functionality, and common services are well documented in an enterprise registry. By policy and contract, the Government will own full data rights for all IGE common services (except for Commercial Off The Shelf (COTS) components). This includes the full source code and unlimited rights for reuse at no cost

Use of standards

IGE resources, interfaces, data, and metadata formats use non-proprietary standards, with preference for International or consortium-based standards that have broad deployment and proven success. This avoids expensive dependencies on single vendor solutions

Support for automation

IGE provides workflow automation, rules engines, and other automation tools as common services
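A rules engine of the kind offered as a common service can be reduced to a list of condition/action pairs; this minimal sketch uses invented event fields and actions:

```python
# Each rule pairs a condition with an action; the engine runs every action
# whose condition matches the incoming event.
def make_engine(rules):
    def run(event):
        return [action(event) for condition, action in rules if condition(event)]
    return run

# Hypothetical ground-system rules, for illustration only.
rules = [
    (lambda e: e["type"] == "pass_complete",
     lambda e: f"trigger ingest for {e['satellite']}"),
    (lambda e: e.get("quality", 1.0) < 0.8,
     lambda e: f"open quality alert for {e['satellite']}"),
]

run = make_engine(rules)
print(run({"type": "pass_complete", "satellite": "NOAA-20", "quality": 0.7}))
# → ['trigger ingest for NOAA-20', 'open quality alert for NOAA-20']
```

Keeping the rules as data, separate from the engine, is what lets a shared automation service serve many missions: each mission contributes its own rule set.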

S. R. Petersen

Security as infrastructure

IGE provides information security as an integral feature of the infrastructure and meets NIST security requirements

Warehousing and restoring

Warehousing saves the current state of a user’s profile and resources to storage so that the resources can be freed for other users
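Warehousing and restoring can be sketched as serializing a user's profile to storage and releasing the allocated resources; the class and field names here are hypothetical:

```python
import json, os, tempfile

class Workspace:
    """Illustrative user workspace holding allocated resources and profile state."""
    def __init__(self, user, profile=None):
        self.user = user
        self.profile = profile or {}
        self.cpus_allocated = 4          # stands in for real infrastructure

    def warehouse(self, store_dir):
        """Persist the profile state and free the resources for other users."""
        path = os.path.join(store_dir, f"{self.user}.json")
        with open(path, "w") as f:
            json.dump(self.profile, f)
        self.cpus_allocated = 0          # resources returned to the shared pool
        return path

    @classmethod
    def restore(cls, store_dir, user):
        """Rebuild the workspace from the warehoused state."""
        with open(os.path.join(store_dir, f"{user}.json")) as f:
            return cls(user, json.load(f))

store = tempfile.mkdtemp()
ws = Workspace("analyst1", {"theme": "dark", "last_job": "cal-val-42"})
ws.warehouse(store)                      # state saved, CPUs freed
restored = Workspace.restore(store, "analyst1")
print(restored.profile["last_job"])      # → cal-val-42
```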

Table 2 Concept of operation use cases [8]

• Routine satellite operations
• Integration of a new satellite mission
• Transition of a NASA research satellite mission to NOAA operations
• Integration of an external data source
• New data product requirement
• New algorithm development
• Algorithm sustainment
• Calibration and Validation (Cal/Val) support
• Governance of common services
• Automation of a ground system function
• Adding a new common capability to IGE
• Reprocessing

5 Level 1 Requirements

The Level 1 requirements describe the capabilities that the IGE must provide; they are listed in Table 4. During architecture development, these requirements are allocated across nine functional segments in the business architecture and subsequently decomposed into Level 2 requirements and below.

Ground Enterprise Transformation at NESDIS

Table 3 Impacts of adopting the IGE concept of operation [9]

Government integrator and infrastructure provider: In a traditional acquisition, a prime contractor provides an end-to-end system, integrating its elements. With IGE, the government provides infrastructure resources and takes greater responsibility for integration and for end-to-end system requirements

Staffing and skills: The shared infrastructure introduces new tasks and challenges to the existing organization; staffing and skills need to evolve

Shared resources: Redundancy, testing, and other technical mitigations are needed to increase the reliability and availability of the shared resources

Security: Capability provisioned across the enterprise reduces the need to implement a redundant set of controls within each application

Satellite ground system architecture and design: Each new satellite ground capability added to IGE must comply with Enterprise Architecture principles such as avoiding duplicative services, being location agnostic, using non-proprietary standards, and adhering to the enterprise's standard, open interfaces

Mission acquisition: Missions constitute incremental modifications and additions to the existing Ground Enterprise. New missions will be required to use IGE capabilities, and NESDIS will have to provide greater technical support to new mission developers

Flight/ground integration: IGE provides a documented interface between flight and ground segments; new flight systems must integrate to these interfaces

Development and deployment of ground functionality: IGE will provide development, test, and deployment resources to developers. This saves time and money by providing an environment that closely mimics the operational one, access to operational and test data, and common services. OSGS must make resources available to developers when needed. Use of common ground services and resources may increase integration risk, requiring increased integration testing

Mission operations: Operators will interact with a single, common ground operator interface, resulting in lower training costs and better efficiency. Since IGE is (largely) location agnostic, there will be no dependency between location and tasking. Most ground system tasks will be executable from any ground system location

Sustainment: In IGE, there is a large pool of shared resources that are continually allocated to meet changing requirements. This enables a dynamic sustainment approach tied to IT resources rather than systems. Individual IT resources or classes of resources can be refreshed independent of the systems that use those resources. This enables capability refresh that can quickly leverage opportunities to adopt technologies that deliver benefits across the enterprise

Requirements: The process of requirements flow-down to the ground system will be revised to enable enterprise-level ground system solutions. Level 1 ground requirements will mandate use of IGE. Mission-unique requirements not already supported by common ground services will be built and integrated into the IGE in an enterprise-like fashion, enabling their reuse by future missions

Table 4 Level 1 requirements for the NESDIS Ground Enterprise [10]

NGE_L1_2: The NGE shall be developed as an enterprise-level system based on an open, flexible, and adaptable Enterprise Architecture
NGE_L1_3: The NGE shall support satellite and instrument operations of NOAA-operated satellites
NGE_L1_4: The NGE shall configure, monitor, and control the ground network to support operations
NGE_L1_5: The NGE shall support a suite of simulators that includes spacecraft, instrument, and ground-link simulators
NGE_L1_6: The NGE shall acquire, process, store, and disseminate data
NGE_L1_7: The NGE shall support real-time operational data processing and (multi-)mission-level reprocessing of older data simultaneously
NGE_L1_8: The NGE shall provide an enterprise-level capability to disseminate operational data and products to the user community
NGE_L1_9: The NGE shall provide ground operations support for space-based data relay, direct broadcast, and rebroadcast services provided by or administered by NOAA
NGE_L1_10: The NGE shall support operational product development, refinement, and transition of science to operations
NGE_L1_11: The NGE shall support collaboration and technology transfer with user communities
NGE_L1_12: The NGE shall provide communication services for missions supported by NOAA
NGE_L1_13: The NGE shall implement NESDIS-required DHS and NIST security directives in a manner consistent with NOAA/NESDIS IT Security policies
NGE_L1_14: The NGE shall meet continuity of operation requirements according to Federal Continuity Directives FCD 1 and FCD 2
NGE_L1_15: The NGE shall provide the IT infrastructure to support NGE life cycle activities
NGE_L1_16: The NGE shall support mission-specific data latency and availability requirements

6 Enterprise Architecture (EA) Process

The architecture development activity leverages best practices such as those described in the Federal Enterprise Architecture Framework (FEAF) [11] and The Open Group Architecture Framework (TOGAF) [12]. The IGE Enterprise Architecture development follows the TOGAF development lifecycle process: business mission analysis and modeling come first, followed by system architecture design, technical solution analysis, and system development and/or acquisition. To provide more business, operational, and engineering detail for the TOGAF lifecycle process, DoDAF (DoD Architecture Framework) Enterprise Architecture Views are developed, especially for TOGAF's information systems architecture. TOGAF is a framework that provides an enterprise approach for designing, planning, implementing, and governing an enterprise information technology architecture, relying heavily on modularization, standardization, and existing, proven technologies and products. As of 2016, TOGAF was employed by 80% of Global 50 companies and 60% of Fortune 500 companies.

As shown in Fig. 2 and summarized in Table 5, the TOGAF EA is developed through a phased approach consisting of four iterative cycles: architecture context, architecture delivery, transition planning, and architecture governance. Each phase (A through H) of the process is driven by the requirements management process to ensure that every stage of a TOGAF-based project meets business requirements. The Architecture Development Method (ADM) [14] is an iterative method over the whole process, between phases, and within phases. Each iteration considers

Fig. 2 TOGAF’s phased approach for EA development [13]

enterprise coverage, the level of detail to be developed, the time horizon, and architecture asset reuse (including previous ADM iterations, other frameworks, system models, industry models, etc.). IGE EA development has completed the first round of the four iterative TOGAF cycles, with the exception of portions of Phases E and F, which are still in work, and has produced the following EA products for IGE design and planning.

6.1 Architecture Context Circle

• IGE Concept of Operations, v1.0 released in Dec. 2015; v1.2 released in May 2017
• IGE Requirements (NGE Level 2 requirements), v0.1 (draft) in Dec. 2015; v1.0 released in October 2017.

6.2 Architecture Delivery Circle

• IGE Business Architecture Report, v3.0, September 2016

Table 5 TOGAF's architecture development phases

A: Architecture vision

Set the scope, constraints, and expectations for a TOGAF-based project, create the architecture vision, validate the business context, and create the statement of architecture work

B: Business architecture (BA)

Develop baseline and target architectures and analyze the business gaps

C: Information systems architecture (ISA)

Develop baseline and target architectures and analyze the system gaps

D: Technology architecture (TA)

Develop baseline and target architectures and analyze the technology gaps

E: Opportunities and solutions

Perform initial implementation planning, conduct investment analysis, and identify major implementation projects

F: Migration planning

Analyze costs, benefits, and risks, and develop detailed implementation and migration plan

G: Implementing governance

Provide architectural oversight for the implementation, and ensure that the implementation project conforms to the architecture

H: Architecture change management

Provide continual monitoring and a change management process to ensure that the architecture responds to the needs of the enterprise

• IGE EA Views Baseline Package, v1.0, released in May 2017
• IGE Technology Architecture (TA) document, v1.6, released in August 2018.

6.3 Transition Planning Circle

• Transition & Sequence (T&S) Plan, v1.0, released in Sept. 2016
• Implementation Plan, under development (target date: June 2018).

Architecture Governance Circle:

• EA Governance Document, v1.0, released in May 2017

7 Three-Tier Results: BA, ISA, and TA

To develop a service-oriented and customer-focused Enterprise Architecture, the team developed the IGE EA through a descriptive model of the existing/legacy architecture, which is used to make informed command decisions on operations, training, personnel, and acquisition in order to transition and sequence to a Target ("To-Be") Architecture. For its architecture delivery cycle, TOGAF provides a top-down approach comprising three tiers of Enterprise Architecture development: it starts from high-level business activities and ends at technology reference models for hardware and software standards and preferences. By following this three-tier TOGAF approach, the OSGS IGE EA team developed three Enterprise Architectures (BA, ISA, and TA), described in the following sections. Because the IGE enterprise is very large and complex, comprising many separate but interlinked components within an overall collaborative business framework, the architecture development method was tailored accordingly.

7.1 Business Architecture (BA)

The BA defines the business strategy, governance, organization, and business processes. More specifically, the BA develops the target business architecture that describes how the enterprise operates to achieve the business goals, respond to strategic drivers, and address the concerns of stakeholders. The BA also identifies architecture roadmap components based upon business gaps between the baseline and target architectures. During BA development, in addition to the business architecture report, DoDAF Operational Views (OVs) and Data and Information Views (DIVs) were developed to support the business description. The BA Report defines nine functional segments of the NESDIS Architecture; these are described in Table 6. The IGE BA Report identifies operational activities (and sub-activities) for each functional segment, and Table 7 identifies the top-level operational activities for each.

7.2 Information Systems Architecture (ISA)

The ISA develops data models and the structure of application systems and services. More specifically, the ISA develops the target information systems (data and application) architecture, which addresses the business processes supported by IT and serves as the interface between IT-related and non-IT-related processes. DoDAF DIVs, high-level Systems Views (SVs), and Services Views (SvcVs) were developed to provide ISA details through data modeling and application systems/services structure diagrams and descriptions.

Table 6 IGE business architecture functional segments [15]

MOP, Mission operations: Plans, schedules, and manages satellite flight and ground operations. This includes satellite and instrument commanding, maintaining situational awareness, operating antennas, allocating ground resources, managing flight software, and verifying commands and flight software using simulations

SCM, Satellite communications: Provides space-ground communications and the routing of data received from the satellite to mission partners. This includes sending commands, product data for rebroadcast, memory loads, and software updates to the satellite, and receiving mission, rebroadcast, housekeeping, and unique payload data from the satellite

DMS, Data management services: Provides product generation, product distribution, and product quality assessments for environmental data products. Product generation ingests data and manages mission support data and production of data products. Product distribution disseminates data for exploitation by end users and for permanent archival storage. Product quality monitors products, maintains metrics, and generates alerts

DSS, Data stewarding support: Supports data stewarding, which may include modifications to data (e.g., metadata updates, replacements, deletions, and/or removals) and access restrictions, by providing long-term storage and catalog services as well as other IT infrastructure services. This includes ingesting submission information packages and supporting permanent, secure archival access

SAS, Science application support: Provides infrastructure environments and associated software, tools, and data to support algorithm development and management and data product calibration and validation

EMO, Enterprise management and operations: Manages the operation of the mission systems and services (processes, hardware, software, and data). Includes mission-based situational awareness, capacity management, integrated change control, configuration management, and maintenance of the operational assets of the NGE. Enterprise operations include customer support (service desk), field terminal support, and I&T

NES, NGE engineering and sustainment: Provides the planning, systems engineering, program and project management, and miscellaneous support functions that sustain each NGE mission system

INF, Infrastructure: Procures and provides the compute, storage, and network capabilities for each NGE mission system

SMT, Security management: Develops and implements policies and procedures that protect NGE assets and operates security services that provide access control, incident management, and IT security diagnostics and mitigation across the NGE

Table 7 Operational activities [16]

Mission Operations (MOP):
• Fleet Planning and Scheduling (FPS): Long-range and intermediate planning and tactical scheduling of space and ground fleet assets
• Flight Operations (FO): 24 × 7 support services for satellite operations
• Ground Operations (GO): 24 × 7 support services for ground operations supporting satellite operations and space-ground communications

Satellite Communications (SCM):
• Space-Ground Communications (SGC): Provides the communications services between the ground and flight segments
• Data Routing (DR): Relays raw data products to subscribing consumers

Data Management Services (DMS):
• Data Ingest: Collects, prepares, and makes available the data needed for product generation. This includes ingesting mission and unique payload services data and extracting direct readout data
• Product Generation (PG): Processes instrument detector samples (RDR/L0), performs radiometric calibration and geometric correction (SDR/TDR/L1b), assembles rebroadcast data sets, and generates higher-level products (EDR/L2+). It uses algorithms developed and maintained by the instrument manufacturers and the science community
• Product Distribution (PD): Disseminates the real-time and near-real-time NOAA data products. It includes user and subscription management
• Product Quality (PQ): Assesses and monitors products and production, assesses the quality of data products, and manages data product consumer communications
• Data Analytics (DA): Analyzes data products, manually characterizes environmental information, provides discussion support analysis, and distributes analysis products to consumers

Data Stewarding Support (DSS):
• Data Stewardship (DS): Applies rigorous analyses and oversight to ensure that environmental data sets meet the needs of users. This includes documenting measurement and processing practices (metadata); providing feedback on observing system performance; inter-comparison of data sets for validation; reprocessing (incorporating new data, applying new algorithms, performing bias corrections, and integrating/blending data sets from different sources or observing systems); and recommending corrective action for errant or non-optimal operations
• Storage Support (SS): Ensures permanent, secure archival of environmental data. It ingests data sets, provides archival storage, and supports retrieval of archived data sets. It also administers storage support and provides common security, IT infrastructure, and IT management services

Science Application Support (SAS):
• Science R2O (R2O): Facilitates the transfer of satellite observations of the land, atmosphere, ocean, and climate from scientific research and development into routine operations, and offers state-of-the-art data, products, and services to decision makers. Findings are shared with partners and stakeholders to promote creative thinking about methods that would use satellite data to obtain better information about the Earth and its environment
• Algorithm Management (AM): Develops deployable algorithm software and provides algorithm change management and testing. This service supports the development of both mission-specific and enterprise science algorithms
• Data Product Cal/Val (CV): Supports instrument calibration and data product validation activities

Enterprise Management and Operations (EMO):
• Enterprise Management (EM): Manages the enterprise systems and services (processes, hardware, software, and data). It provides situational awareness, capacity management, integrated change control, configuration management, and maintenance of the operational assets of the NGE
• NGE Operations (NO): Provides customer support (service desk), field terminal support, and integration and test

NGE Engineering and Sustainment (NES):
• NGE Management (NM): Provides investment planning and management and the program and project management necessary to sustain NGE capabilities, transition new systems into the NGE, and evolve existing capabilities into an integrated Ground Enterprise
• NGE Sustainment (NS): Analyzes NGE sustainment needs and supports sustainment investment decisions
• NGE Engineering (NE): Provides systems engineering and supports science operational support and launch and early operations
• Continual Service Improvement (CSI): Evaluates business processes and services and creates and implements process and service improvement plans

Infrastructure (INF):
• Infrastructure (INF): Provides common IT environments, development suites, common infrastructure configurations (CIC), and other IT capabilities (compute, storage, and network infrastructure) across the NGE

Security Management (SMT):
• Security Engineering (SI): Provides facility security, IT security diagnostics and mitigation, IT security risk management (including assessment and authorization), network IT security, personnel security, and spacecraft communication security
• Security Operations (SO): Provides access control, configuration and security status monitoring, diagnostics and mitigation, and incident management

7.3 Technology Architecture (TA)

The TA describes the software and hardware capabilities required to support the deployment of IGE data and application services. More specifically, the TA forms the basis for the EA implementation and establishes building blocks for functionality, standards, and interoperability. The key contents of the TA include the technologies, standards, specifications, and protocols used in the enterprise. The TA tailored by the IGE EA team has two major parts: the Infrastructure Reference Model (IRM) and the Application Reference Model (ARM).

Infrastructure Reference Model (IRM): The IRM is the taxonomy-based reference model for categorizing IT infrastructure and the facilities and networks that host it. The IRM is developed to enable sharing and reuse of IT infrastructure to reduce costs, increase interoperability across the government and its partners, and guide the selection of service components for efficient acquisition and deployment. The IRM captures the IT infrastructure (the equipment, systems, software, and services used in common across the enterprise regardless of mission, program, or project) and the standards used by those infrastructure elements.

Application Reference Model (ARM): The ARM is the taxonomy-based reference model for categorizing software applications that support the delivery of service capabilities. The ARM is created to document the technologies, standards, specifications, and protocols currently in use and to guide their selection for the future. The ARM captures the applications (mission-, program-, or project-specific software that resides upon the IT infrastructure and helps fulfill a business function) and the standards used by those applications.

Both the ARM and IRM technology profiles are highly matured in terms of data [17].
The application and interface standards necessary to support information technology acquisitions are sufficiently usable in both the ARM and the IRM. Usage status is still under development in the standards view, which covers both ARM and IRM standards. The standards view currently provides usage status recommendations and a standardized justification for 100 of the 371 identified standards. A justification includes standardized assessments of technical maturity, availability, vulnerability, and product implementation, among others, for any given standard (Fig. 3).

Fig. 3 Technology architecture organization [18]

In addition to the TA document, which includes hardware and software technologies, standards, and usage status, DoDAF low-level SVs (e.g., Level 3/4 SV-1s) were also developed to support TA deployment. A TA user guide illustrates how to apply the TA during various stages of system development and operation, as shown in Table 8.
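A standards-view entry with its standardized justification might be modeled as a small record; the field names, statuses, and example standards below are assumptions for illustration, not the actual IGE schema:

```python
from dataclasses import dataclass, field

@dataclass
class StandardJustification:
    """Hypothetical model of one standards-view entry."""
    standard: str                     # e.g. "CCSDS 131.0-B"
    usage_status: str                 # e.g. "recommended", "deprecated"
    assessments: dict = field(default_factory=dict)

def summarize(entries):
    """Count entries per usage status, as a registry report might."""
    counts = {}
    for e in entries:
        counts[e.usage_status] = counts.get(e.usage_status, 0) + 1
    return counts

view = [
    StandardJustification("CCSDS 131.0-B", "recommended",
                          {"technical_maturity": "high", "vulnerability": "low"}),
    StandardJustification("FTP", "deprecated",
                          {"technical_maturity": "high", "vulnerability": "high"}),
]
print(summarize(view))   # → {'recommended': 1, 'deprecated': 1}
```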

8 Enterprise Architecture Views

8.1 DoDAF Views for IGE EA Development

The IGE EA Views Baseline Package v1.0 (see Table 10) was prepared and released in May 2017 for NESDIS Ground Enterprise SEs and PMs to use in their system development and program/project management. Of the IGE EA Views, the DIVs, OVs, and SVs baseline the current NGE ("As-Is"); only SvcVs were developed for the target NGE ("To-Be" or IGE). In the next version, more "To-Be" DIVs, OVs, SVs, and SvcVs will be developed and included (Table 9). Note that in the released baseline version (v1.0), the "As-Is" systems correspond to FY2019 and the target "To-Be" services to FY2022+. In addition, a baseline package briefing was developed and released together with the baseline package, which includes:

Table 8 Usage of the TA [19] (primary stakeholder(s) shown in parentheses)

Pre-acquisition phase:
• Evaluate compatibility: Assess whether a project is compatible with the technical architecture (lead system engineer; decision-making body technical lead)
• Identify enterprise gaps: Identify standards or technologies that are missing from the technical architecture (decision-making body technical lead)

Acquisition phase:
• Requirements development: Develop requirements for a new program that complies with the technical architecture (lead system engineer)
• Request for Proposal (RFP): Include the technical architecture in the RFP (program manager)
• Source selection: Include compatibility with the technical architecture in the source selection criteria (program manager)
• Development: Assess the system design for compatibility with the technical architecture (program manager)

Sustainment phase:
• Sustainment planning: Identify deprecated standards and technologies that should be considered for upgrade (sustainment manager)
• Sustainment execution: Identify opportunities for improving compatibility with the technical architecture (sustainment developer)

Enterprise Architecture phase:
• Maintain the technical architecture: Update the technical architecture to reflect new standards and technologies (Enterprise Architecture team)
• Track usage of standards and technologies: Maintain accurate knowledge of the standards and technologies in use across the NESDIS enterprise (Enterprise Architecture team)
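The pre-acquisition activities of evaluating compatibility and identifying enterprise gaps amount to checking a project's proposed technologies against the TA's approved and deprecated standards. A minimal sketch, with invented example lists:

```python
def evaluate_compatibility(proposed, approved, deprecated):
    """Split a project's proposed technologies into compliant items,
    deprecated items needing attention, and gaps missing from the TA."""
    report = {"compliant": [], "needs_waiver": [], "enterprise_gap": []}
    for tech in proposed:
        if tech in deprecated:
            report["needs_waiver"].append(tech)
        elif tech in approved:
            report["compliant"].append(tech)
        else:
            report["enterprise_gap"].append(tech)  # candidate TA addition
    return report

# Hypothetical standards lists for illustration only.
approved = {"HTTPS", "CCSDS 131.0-B", "netCDF-4"}
deprecated = {"FTP"}
print(evaluate_compatibility(["HTTPS", "FTP", "QUIC"], approved, deprecated))
# → {'compliant': ['HTTPS'], 'needs_waiver': ['FTP'], 'enterprise_gap': ['QUIC']}
```

The "enterprise_gap" bucket corresponds to the "identify enterprise gaps" activity: anything a project proposes that the TA does not yet cover is a candidate for addition to the architecture rather than an automatic rejection.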

Table 9 Enterprise Architecture view categories [20]

Data and Information View (DIV)

Describes current NGE from a data and information perspective. It identifies mission, security, and support data elements at three abstraction levels and describes their aggregation

Operational View (OV)

Describes the existing NGE from an operational and business perspective. It identifies the operational performers, roles, activities, and requirements at multiple levels of abstraction and describes their aggregation. It identifies NGE data flows among the performers and activities, describes business processes and links requirements to the elements responsible for their satisfaction

Systems View (SV)

Describes the current NGE from a systems perspective. It describes baseline “As-Is” NGE systems at three levels of abstraction and identifies the resource interaction among NGE systems at each level of abstraction

Services View (SvcV)

Describes the desired “To-Be” services (future NGE) and their interaction. It identifies service access points, realized interfaces, and the resource flow among the services

• Introduction of IGE EA Baseline
• Introduction of EA Views
• Use of EA tool (a.k.a. MagicDraw)
• Future work for the baseline release
• Change control process.

8.2 Using EA Views for Analysis, Planning, and Development

The EA activity is an integral part of the overall enterprise engineering effort, which encompasses the systems architecting and systems engineering activities associated with the concept analysis, design, acquisition/implementation, and sustainment of the individual component systems and services that make up the Enterprise, as shown in Fig. 4. As outlined in Fig. 4, EA products (ConOps, EA Views, and T&S Plan) provide guidance and criteria for enterprise requirements finalization, implementation plan development, and investment analysis and decisions. Program/project managers and system engineers follow the enterprise requirements, implementation plan, and EA views for their system requirements development and system design. Both EA and SA products are followed during system development and used for system testing and acceptance evaluation. Program/project managers and system engineers associated

Fig. 4 Architecture-to-engineering [22]

with NESDIS Ground Enterprise development are required to use the EA views directly for activities including, but not limited to:

• System Design, Development/Acquisition, and Sustainment
• System Configuration Management (SCM)
• Analysis of Alternatives Studies
• Impact Analyses.

8.3 Benefits of Using GEARS EA Views

IGE Views deliver direct and indirect contributions to the organization's goals and objectives. The EA Views provide a system development and operation analysis tool that facilitates better design and analysis to support all phases of the systems engineering life cycle and the acquisition life cycle of new and emerging technologies. Benefits exist in the following areas (useful not only at the enterprise level but also at the system/project level):

• Project/Program/Portfolio Analysis and Management
• System Configuration Management
• Requirements Engineering
• Systems Design, Acquisition, Development, and Testing
• IT Management and Decision Making
• Reduction of IT Complexity and Redundancy

• Elimination of duplicative common services • Promotion of Open, Responsive, Transparent IT functionality • Risk reduction in project delivery, systems operations, and mission accomplishment • Operational and Maintenance (O&M) cost reduction • Ready availability and accessibility of enterprise design documentation • Reduction of solution delivery time and associated development costs • Ability to generate and maintain a common, accepted vision of the future • Flexibility in expansion of the enterprise • System acquisition Strategy and Planning • Key Initiatives (pilots/demonstrations) • Policy Definition and Refinement.

9 Harvesting Actionable Results from the Architecture

9.1 Development Principles

The IGE evolution from the current NGE will meet all existing and envisioned future performance requirements by:

• Employing shared common services
• Implementing an evolutionary strategy, small pieces at a time, while continuing operations
• Fielding early enterprise elements to demonstrate the value of common services
• Considering return on investment: it may not be cost-effective to migrate all functions to shared common services.

This ensures that the integrity of existing operational systems is maintained and that return from investments is achieved along the way. In addition, this approach has the advantage of being adaptable to changing budgets and priorities, allowing NESDIS the flexibility to accelerate or delay aspects of the transition as needed. More specifically, the GEARS development approach will take the following actions for IGE development, as shown in Fig. 5:

• Decreasing the amount of mission-unique code to reduce overall maintenance and sustainment costs
• Creating shared common services to reduce the time and cost of adding new products and missions
• Adopting common hardware standards to enable economies of scale in tech refresh and simpler security hygiene
• Using the OSGS Infrastructure Reference Model (IRM) and Application Reference Model (ARM) as standards and guidance for software/hardware acquisition and/or development and sustainment efforts.

Ground Enterprise Transformation at NESDIS

343

Fig. 5 IGE common services [23]

The impact of increasing utilization of common services on sustainment costs as the enterprise evolves toward the IGE is illustrated in Fig. 5. Duplicative functions and supporting applications in each mission will be collapsed into a set of common services with standardized interfaces. Any residual functions will be retained as mission-unique services, driven by mission-unique requirements that are not encompassed by the overarching IGE mission requirements. One enterprise design employing a service-based architecture and shared resources, potentially structured in the layered building-block approach, is shown in Fig. 6.
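The collapse of duplicative per-mission functions behind a standardized interface can be sketched as follows. This is an illustrative Python sketch only; the class and method names (`ProductDistributionService`, `deliver`, and so on) are hypothetical and do not come from any NESDIS baseline.

```python
from abc import ABC, abstractmethod

class ProductDistributionService(ABC):
    """Standardized interface for a shared common service."""

    @abstractmethod
    def distribute(self, product_id: str, subscriber: str) -> str:
        ...

class CommonDistribution(ProductDistributionService):
    """One enterprise implementation serving every mission."""

    def distribute(self, product_id: str, subscriber: str) -> str:
        return f"queued {product_id} for {subscriber}"

class MissionUniqueDistribution(ProductDistributionService):
    """Residual mission-unique service, retained only where a requirement
    (here, a notional legacy broadcast format) is not covered by the
    enterprise implementation."""

    def distribute(self, product_id: str, subscriber: str) -> str:
        return f"queued {product_id} for {subscriber} via legacy broadcast"

def deliver(service: ProductDistributionService, product_id: str) -> str:
    # Missions depend on the interface, never on a concrete implementation,
    # so moving a mission onto the common service touches no mission code.
    return service.distribute(product_id, "NWS")
```

Because callers see only the interface, swapping a mission from its residual service to the common one is transparent to the mission's own code, which is the property that makes incremental migration practical.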

9.2 Results of the Business Architecture Analysis

The business architecture development effort and the associated gap analysis identified potential near-term and long-term investment opportunities on the path to the desired end state of the target Enterprise Architecture. These include new capabilities, existing mission-specific capabilities that should be combined and applied enterprise-wide, existing capabilities that should be enhanced and applied across the enterprise, and existing capabilities that are already appropriate for the target Business Architecture. The following candidate business-process migration opportunities were identified during the development of the NGE Business Architecture:

Fig. 6 IGE structured as a SOA-based services provider [24]

344 S. R. Petersen


• Infrastructure—Investment in enterprise infrastructure capabilities migrates mission-specific capabilities to the enterprise, with attendant savings in acquisition, maintenance, and sustainment.
• Algorithm R&D—Investment in enterprise algorithm research and development capabilities consolidates mission-focused algorithm sustainment and maintenance activities into an enterprise capability that supports algorithm research, development, deployment, maintenance, and sustainment, while reducing the transition time from research to operations and the number of algorithm versions that need to be supported.
• Product Generation—Investment in enterprise product generation capabilities consolidates mission-focused product generation into an enterprise-wide product generation capability, projected to be built on enterprise infrastructure services, that reduces the costs of COTS licensing, system sustainment, and maintenance.
• Enterprise Engineering—Investment in enterprise engineering capabilities consolidates mission-focused management activities into enterprise engineering services providing enterprise-level tools and processes for configuration management, requirements management, acquisition management, facility management, quality management, and risk management, with savings in engineering, licensing, and system sustainment and maintenance.

9.3 Investment Priorities

Table 11 assesses the potential ROI by mission function. Mission functions are a high-level classification of the functions carried out by the current Ground Enterprise. The assessment is based on the degree to which a given system service is independent of mission-unique features:
• Downstream services (i.e., archive and distribution), which are independent of the original satellite mission data, are ranked high;
• Spacecraft/instrument-specific services (i.e., command and data acquisition and L1/SDR data processing), which are highly dependent on mission-specific details, are ranked low;
• Services such as mission operations, ingest, and product generation, which provide mission-specific tailoring of capabilities common to all missions, are ranked medium.

Table 12 summarizes the analysis accomplished by the IGE implementation team and groups future investments into phases.
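The ranking heuristic above is simple enough to state as a lookup; the category labels in this sketch are illustrative paraphrases of the three cases in the text, not terms from the GEARS documents.

```python
def roi_rank(mission_independence: str) -> str:
    """Map a service's independence from mission-unique features to an
    ROI rank, following the heuristic described in the text."""
    ranks = {
        "independent": "High",        # e.g., archive, distribution
        "tailored-common": "Medium",  # e.g., mission operations, ingest
        "mission-specific": "Low",    # e.g., command and data acquisition
    }
    return ranks[mission_independence]
```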


Table 10 GEARS EA views baseline package v1.0 [21]

IGE EA views | Descriptions
DIV-1: Conceptual data model | NGE high-level data concepts and their relationships are modeled in the views
OV-2: Operational resource flow description | A description of the NGE data flows exchanged between operational activities
OV-5a: Operational activity decomposition tree | NGE capabilities and activities (operational activities) are organized into different categories (service areas) and summarized in a hierarchical structure
SV-1: Systems interface description | The identification of systems, system items, and their interconnections
SV-3: Systems-systems matrix | The relationships (tabular summary) among systems in a given architectural description
SvcV-1: Services context description | The identification of services, service items, and their interconnections at different levels of detail
SvcV-2: Services resource flow description | A description of resource flows exchanged between primary services

Table 11 ROI potential by mission function [25]

Mission function | ROI
Mission operations | Medium
Command and data acquisition | Low
Mission data routing | Medium
Data ingest | Medium
Mission data preprocessing | Low
Product generation | Medium
Product distribution | High
Archive | High

10 Investments in Migration to Service-Based Enterprise

The following sections describe ongoing and planned investments in the product, science, and operations support service areas listed above. As these investments unfold, NESDIS will decide whether or not to move some or all mission capabilities to a cloud (public, private, and/or hybrid). Commercial cloud solutions offer well-recognized benefits: continuous technology refresh conducted in a highly efficient fashion; spare capacity that can be accessed when needed but paid for only while actually in use; and common services that are well defined and documented. Constraints include concerns about security, reliability, and the cost of porting existing applications to a cloud environment. Individual assessments of the benefits and risks to specific mission functions are underway in the pilot experiments included below.

Table 12 IGE investment phases [26]

Enterprise initiative group | Enterprise initiative | Mission unique | Investment priority | Investment phase
Enterprise product services | Data processing services | Medium | 1 | 1
Enterprise product services | Product generation services | Low | 2 | 1
Enterprise product services | Product user services | Low | 3 | 1
Enterprise science services | Environmental information services | Low | 1 | 1
Enterprise science services | Cal/Val services | Medium | 2 | 2
Enterprise science services | Algorithm services | Medium | 3 | 2
Enterprise operations support services | Ground operations support services | Medium | 1 | 1
Enterprise operations support services | Planning and scheduling services | Medium | 2 | 2
Enterprise operations support services | Flight operations support services | Medium | 3 | 3
Enterprise operations services | Flight operations services | High | 1 | 2
Enterprise operations services | Contact services | Medium | 2 | 3
Enterprise operations services | Common operating picture services | Medium | 3 | 3
Enterprise data relay services | Sp/Gnd Comm services | High | 1 | 2
Enterprise data relay services | Routing services | Low | 2 | 3

As NESDIS implements the projects described below, it also has to change the way it integrates development and operations activities. Because the common services approach employs multiple services to simultaneously serve multiple missions, changes in a service will require testing that addresses end-to-end mission flows rather than just the impacts on one specific mission. The enterprise needs a fully capable set of development and integration and test (I&T) environments, along with the capacity to schedule the data sources needed to drive them. This is fundamentally different from the way development and integration have been handled in the past.
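The shift from per-mission testing to end-to-end mission-flow testing can be sketched as follows. The service functions and mission names here are hypothetical stand-ins, not IGE interfaces; the point is only that one shared service chain is exercised once per mission flow.

```python
def end_to_end_flow(ingest, product_gen, distribute, raw: str) -> str:
    """Drive a complete mission flow through a chain of shared services.

    Under common services, a change to any one service must be validated
    against every mission's end-to-end flow, not just one mission.
    """
    return distribute(product_gen(ingest(raw)))

# Hypothetical stand-ins for shared common services:
def ingest(raw: str) -> str:
    return raw.strip()

def product_gen(data: str) -> str:
    return f"product({data})"

def distribute(product: str) -> str:
    return f"delivered:{product}"

# Two missions sharing the same services; an I&T run covers both flows.
missions = {"GOES": " goes-raw ", "JPSS": " jpss-raw "}
results = {name: end_to_end_flow(ingest, product_gen, distribute, raw)
           for name, raw in missions.items()}
```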

10.1 Enterprise Product Services: Environmental Satellite Processing and Distribution System (ESPDS) [27]

Over the past year, ESPDS began product generation and distribution operations supporting the newly launched GOES-R and JPSS satellites. It also assumed responsibility for distributing products from the legacy satellites previously served by the legacy Data Distribution System (DDS). ESPDS modernizes the existing Environmental Satellite Processing Center (ESPC), operated by the NESDIS Office of Satellite and Product Operations, with an enterprise solution that meets the needs of the legacy, GOES-R, S-NPP/JPSS, and GCOM-W satellite programs, with scalability to support future environmental satellites. It includes modernization of the Product Generation (PG), Product Distribution (PD), and Infrastructure segments of the ESPC and provides environmental satellite data and services to a growing user community including the NOAA Line Offices (NWS, NMFS, NOS, NIC, NESDIS, etc.), DoD (AFWA, NAVO, etc.), and other U.S. and international users (government, universities, foreign partners, etc.). ESPDS provides a scalable and secure infrastructure as a foundational building block upon which all other system functions and services reside. It leverages common infrastructure and processing services, reducing redundancy and costs while simplifying operations, maintenance, monitoring, and security. The ESPDS user portal provides self-service subscription and search capabilities across all NESDIS products, eliminating the labor required by the previous manual subscription method. Approved users can manage their data access details (product customization, selection, and transfer method) and exercise greater data discovery (via the online catalog).
ESPDS is built on a service-oriented architecture (SOA) that provides the following benefits: (1) Extensibility: the loose coupling of services allows new functionality to be added to the system without impacting existing capabilities; (2) Reusability: ESPDS services will be usable for future integration, benefitting future government systems; (3) Modularity: ESPDS services can be upgraded and replaced easily.


ESPDS is fielded at the primary and backup ESPC sites. The primary ESPC site is the NOAA Satellite Operations Facility (NSOF) in Suitland, MD; the new ESPC backup site is the Consolidated Back Up (CBU) facility in Fairmont, WV. As part of internal planning to expand partner collaboration and data sharing, NESDIS is budgeting for significant increases in ESPDS product generation and distribution capacity. These will be implemented as part of the move to an IGE. The ESPDS Program is conducting a pilot project entitled “NDE Proving Ground” to prototype candidate architectures for product generation as a service in a commercial (AWS) cloud environment. It is assessing cloud efficacy, performance, scalability, and maintainability, as well as the flexibility to execute multiple types of algorithms. The team also recognizes that even if the cloud proves unsuitable for operations, it may offer a viable approach for development, test, and collaborative research capabilities leveraging common frameworks.

10.2 Enterprise Product Services: Comprehensive Large Array-Data Stewardship System (CLASS) [28]

CLASS provides long-term, secure storage of NOAA-approved data, information, and metadata, and enables access to these holdings through both human and machine-to-machine interfaces. Capabilities are provided in three primary functional areas as defined by the Open Archival Information System Reference Model (OAIS-RM): (1) Ingest—mechanisms by which data, information, and metadata are transferred to and organized within the storage system; (2) Archival Storage—common enterprise means for data, information, and metadata to be stored by the system, and the capability to refresh, migrate, transform, update, and otherwise manage these holdings as part of the preservation process; (3) Access—common enterprise access capability enabling users to identify, find, and retrieve the data and information of particular interest. As an enterprise solution, CLASS reduces the cost growth associated with storing environmental data sets by consolidating stovepipe, legacy archival storage systems and relieving data owners of archival storage-related system development and operation issues. CLASS does not support near-real-time or mission-critical product delivery. CLASS consists of two fully replicated storage nodes hosted by NOAA's National Centers for Environmental Information (NCEI), located at Asheville, NC, and Boulder, CO. Receipt nodes are located at the NOAA Satellite Operations Facility (NSOF) in Suitland, MD, and the Consolidated Backup Facility (CBU) in Fairmont, WV. The current capacity of the CLASS system is 20 PB, with projected growth to 53 PB by 2020. CLASS recently completed development and is currently in operation supporting the following users:
– NOAA Geostationary Operational Environmental Satellites Series R (GOES-R)
– NOAA Joint Polar Satellite System (JPSS)

– NOAA Polar-orbiting Operational Environmental Satellites (POES)
– US Department of Defense (DoD) polar-orbiting satellites
– NOAA Geostationary Operational Environmental Satellites (GOES)
– Canadian Space Agency's Synthetic Aperture Radar Satellites (Radarsat)
– European Meteorological Operational Satellite (MetOp) Program
– Ocean Surface Topography Mission (OSTM) Jason-2 and Jason-3.
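The three OAIS-RM functional areas that CLASS implements can be sketched as a minimal archive node. This is purely illustrative of the reference model's Ingest/Archival Storage/Access split; the class and method names are assumptions, not CLASS's real interfaces.

```python
class ArchiveNode:
    """Minimal sketch of the three OAIS-RM functional areas."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def ingest(self, object_id: str, payload: bytes) -> None:
        # Ingest: transfer data into and organize it within the store.
        self._store[object_id] = payload

    def migrate(self, object_id: str, transform) -> None:
        # Archival Storage: refresh/migrate/transform holdings as part
        # of the preservation process.
        self._store[object_id] = transform(self._store[object_id])

    def access(self, object_id: str) -> bytes:
        # Access: let users identify, find, and retrieve holdings.
        return self._store[object_id]
```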

Beginning in FY19, CLASS will implement changes designed to reduce costs and move the architecture closer to the common services approach of the IGE. Currently, CLASS supports two full-service nodes. Each can ingest data, which is then automatically replicated to the other node via custom software. Sustaining this replication capability is costly. The new CLASS project changes the current architecture to a simpler one that limits ingest to a primary node only, with one-way data replication to a second node dedicated purely to backup. In addition to reducing costs, this change paves the way for a future decision to move the second node to the cloud. Future planned investments will also implement a more flexible approach to ingesting data sets, opening the archive to broader adoption as a universal solution across NOAA.
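The simplified topology can be sketched as a primary node that is the sole ingest point, pushing every object one way to a backup-only node. The names here are illustrative assumptions, not CLASS interfaces; the sketch only shows the architectural change from symmetric two-way replication to asymmetric one-way replication.

```python
class BackupNode:
    """Backup-only node: receives one-way replication, never ingests."""

    def __init__(self) -> None:
        self.holdings: dict[str, bytes] = {}

class PrimaryNode:
    """Single ingest point; every ingest is pushed one way to the backup,
    replacing the costly custom two-way replication software."""

    def __init__(self, backup: BackupNode) -> None:
        self.holdings: dict[str, bytes] = {}
        self.backup = backup

    def ingest(self, object_id: str, payload: bytes) -> None:
        self.holdings[object_id] = payload
        # One-way replication: primary -> backup only.
        self.backup.holdings[object_id] = payload
```

Because the backup node has no ingest path of its own, it is a pure consumer of the replication stream, which is also what makes relocating it (for example, to a cloud) a contained decision.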

10.3 Enterprise Product Services: Flexible Ingest

Currently, the NGE ingests data through a variety of systems, each designed to accommodate a specific source in a fixed fashion. As part of the move to the IGE, NESDIS is planning a flexible ingest project that will provide data ingest as a service. This service will easily accommodate a variety of data sources, including non-satellite sources, and provide the data to product generation and distribution services. One key element of the project is an early evaluation of the feasibility of using cloud capabilities to implement this function.

10.4 Enterprise Science Services: Enterprise Algorithms [29]

As originally built and deployed as part of stand-alone ground processing systems, single-purpose algorithms produced satellite data products specific to an instrument type and mission. Multiple versions of algorithms and associated software created similar products. Since the single-purpose algorithms were developed for specific instruments and missions, these versions are generally run on different product generation systems. These algorithms were easy to develop, easy to test, and simple to deliver and maintain as individual algorithms. As a group, however, they were more expensive in the long run: more systems to maintain in operations; longer algorithm and system development time, with more "from scratch" development and little leveraging across missions due to differing requirements; higher algorithm maintenance costs, since each algorithm for each satellite type needs funding to be maintained; minimal software reuse; and many sources of ancillary data required for the multiple algorithms, with no standardization across algorithms.

In light of these inefficiencies, NESDIS has chosen to move to enterprise algorithms. An enterprise algorithm uses the same scientific methodology (i.e., physical basis, including assumptions) and software base to create the same classification of product from differing input data (satellite, in situ, or ancillary). Each time an unfamiliar observation source is acquired, interfaces (metadata, data structures, and formats) have to be written. But over time, as new observations are introduced into NOAA, the process of adding external sources of data will become easier because of the tools and interfaces already established.

Benefits of this approach include:
– Consistent products with similar characteristics and performance can be generated for different instruments to support end-user operations
– Alignment with the National Weather Service implementation strategy of multi-sensor algorithms and products
– Algorithms independent of the satellite sensor input data, supporting current and future NOAA operational satellites
– The same algorithm can process different resolutions of like sensors
– No need to retrain users for continuity missions
– Sustained product calibration across continuity missions
– Fewer algorithms and processing systems to install and maintain
– A streamlined transition to operations for new satellite missions.

This new approach brings some challenges of its own, including algorithm inter-dependencies; more rigorous testing requirements; a need for flexible systems and processes that accommodate new and updated requirements; and alignment of algorithm delivery schedules across projects. NESDIS has adopted a two-phase strategy to instantiate enterprise algorithms within the IGE.
The first phase, initiated in July 2017, focuses on algorithms that run on existing legacy product generation systems. By September 2020, all these algorithms will be converted to enterprise algorithms running on enterprise product generation systems, likely ESPDS, the GOES-R Ground Segment, and potentially the cloud. Evaluation of cloud suitability forms a key part of the early work and includes pilots specifically structured to address the concerns referenced above. At the end of Phase one, the legacy product generation systems will be decommissioned. In Phase two, all the Level 2 algorithms will be converted to algorithm services, consistent with the IGE architecture. Phase two will be completed by September 2025. Based on our analysis to date, enterprise algorithms will benefit developers, operators, and users.
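The enterprise-algorithm pattern (one science core, thin per-sensor input interfaces) can be sketched as follows. The example uses NDVI purely as an illustrative algorithm, and the adapter functions and band names are assumptions made for this sketch, not a NESDIS implementation.

```python
def enterprise_ndvi(reflectance_red: float, reflectance_nir: float) -> float:
    """One shared science core: same physical basis and software base,
    regardless of which instrument supplied the reflectances."""
    return (reflectance_nir - reflectance_red) / (reflectance_nir + reflectance_red)

# Per-sensor adapters translate each instrument's native record into the
# common inputs; only these thin interfaces are written for a new source.
def from_abi(record: dict) -> float:
    # Hypothetical ABI record layout for this sketch.
    return enterprise_ndvi(record["band2"], record["band3"])

def from_viirs(record: dict) -> float:
    # Hypothetical VIIRS record layout for this sketch.
    return enterprise_ndvi(record["I1"], record["I2"])
```

With identical inputs, both adapters yield identical products, which is what sustains product consistency across continuity missions without retraining users.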

10.5 Enterprise Science Services: Mission Science Network (MSN) [30]

Development of the Mission Science Network began in 2016. MSN is an IT platform that will provide enterprise services addressing the following objectives:


– Deliver cost-effective, secure, cloud-based infrastructure to support research through operations
– Enable research and development of scientific data and applications
– Support operational availability for product generation
– Manage data through its full lifecycle, from creation to preservation
– Provide access to NOAA's data, information, and services.

The MSN focuses primarily on the science needs of the Center for Satellite Applications and Research (STAR) and the National Centers for Environmental Information (NCEI). MSN will be implemented through the following tasks:
– Connect existing systems between STAR and NCEI
– Harness existing N-Wave connectivity
– Consolidate systems to obtain efficiencies of scale and long-term cost savings
– Migrate data and applications, and shut down systems in NCEI-MD and NCEI-MS
– Consolidate existing STAR systems into internal MSN clouds at STAR
– Deploy IT services that support the entire science enterprise
– Determine best-of-breed capabilities between NCEI and STAR
– Leverage open-source applications wherever possible
– Develop an agile, scalable, and secure architecture for future science missions
– Prepare NCEI and STAR systems for future migration into the public cloud.

Initial Operational Capability (IOC) is scheduled for September 2019, with Full Operational Capability (FOC) in September 2022.

10.6 Enterprise Operations Support Services: Enterprise Tools—Active Risk Manager (ARM), Enterprise Configuration Management Tool (ECMT)

OSGS has deployed two support tools to enable better management of NESDIS projects and operations. The first tool, Active Risk Manager (ARM), allows all project managers to document and manage their issues, risks, and opportunities. As currently configured, it enables managers at all levels to gain enterprise views of the risks within their line of responsibility. It helps the organization prioritize and allocate resources to mitigate the risks and exploit the opportunities. The Enterprise Configuration Management Tool is deployed at the NOAA Satellite Operations Facility (NSOF) in Suitland, MD. ECMT was originally developed by the GOES-R project and now hosts configuration information on most NESDIS satellite programs, with the exception of JPSS, which is expected to migrate over the next few years.


10.7 Cloud Evaluations

In addition to the development efforts described above, NESDIS is examining ways to leverage commercial cloud capabilities for the new service-based architecture. Initial efforts consist of small pilot projects focused on answering the following questions:
1. Security: How can sufficient operational security, and the associated Authority to Operate, be obtained with a cloud implementation?
2. Data Ingest: What architecture and native services are appropriate for common ingest, cataloging, and discovery functions?
3. Product Generation: What is the best way to implement and operate the enterprise algorithms that will deliver the products required by users?
4. Product Distribution: What is the optimum architecture to meet latency needs at the lowest possible cost?
5. Science Development: How should science development activities be hosted in the cloud?

11 Managing Change

This paper has described technical and management activities across NESDIS focused on implementing the transition from stand-alone to integrated ground systems. As illustrated in the concept of operations, the use of shared infrastructure embedded in an Integrated Ground Enterprise brings change that affects many elements of traditional system acquisition, operation, and sustainment. The good news is that the technologies to accomplish this transformation already exist (with the possible exception of some security aspects of cloud operations) and have been successfully applied to similarly complex problems in other industries; there are no known technical showstoppers. The Office of System Architecture and Advanced Planning (OSAAP) established a governance process for the architecture, and it is working well. OSAAP organized and chairs a weekly forum known as the NESDIS Enterprise Architecture Council (NEAC). The NEAC coordinates and provides awareness of IGE-related activities and serves as an excellent conduit for information dissemination. The biggest challenges associated with this transformation are its impacts on the people who develop and operate these capabilities. Team members across the spectrum of Ground Enterprise jobs have been successful for many years doing things the traditional way. Now we are asking them to recognize that the traditional way is no longer scalable or affordable, and to adopt new roles, job definitions, and tools. The transformation is, above all, about change management.


12 Toward the Future

OSGS is driving the development of a future Integrated Ground Enterprise that offers numerous advantages over the traditional stand-alone approach to system development and operation. Activities to realize the desired end state are well underway, including completion of a concept of operations and most of the Enterprise Architecture analyses. Complementing these activities, OSGS is fielding pioneering enterprise product generation, distribution, and archive systems that fully support the recently launched new generations of polar and geosynchronous environmental observation satellites. Supporting all these endeavors is a communication and outreach approach designed to address the challenges of change management.

Acknowledgements Dr. Scott Turner (Aerospace Corporation) played an essential role in the development of the Concept of Operations. Mr. Mike Greico (Vencore) leads the Enterprise Architecture Team. The government Chief Architect is X Li (OSGS). The opinions expressed in this paper are entirely those of the author.

References

1. "About NESDIS—NOAA's Satellite and Information Service". NESDIS.NOAA.gov. National Oceanic and Atmospheric Administration. 29 Feb 2016. Web. 24 Mar 2016.
2. "GOES-R Ground Segment Overview". GOES-R.gov. GOES-R Series Program Office. Web. 24 Mar 2016.
3. "About JPSS—JPSS Ground System". JPSS.NOAA.gov. National Oceanic and Atmospheric Administration. Web. Mar 2016.
4. "NESDIS. Offices. NOAA OSGS—Office of Satellite Ground Services". NESDIS.NOAA.gov. National Oceanic and Atmospheric Administration. Web. 24 Mar 2016.
5. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. Office of Satellite Ground Services. NESDIS Ground Enterprise Architecture Services (GEARS) Concept of Operations. Silver Spring, MD: Office of Satellite Ground Services, Feb 2015. Print.
6. "NESDIS. Offices. NOAA OSGS—Office of Satellite Ground Services. About OSGS". NESDIS.NOAA.gov. National Oceanic and Atmospheric Administration. Web. 24 Mar 2016.
7. (GEARS) Concept of Operations, pp. 11–20.
8. Ibid., pp. 20–37.
9. Ibid., pp. 39–45.
10. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. Office of Satellite Ground Services. NESDIS Ground Enterprise Architecture Services (GEARS) Requirements. Silver Spring, MD: Office of Satellite Ground Services, Sep 2017. Print.
11. United States. Office of Management and Budget. Federal Enterprise Architecture Framework Version 2. Washington, DC, Jan 2013. Print.
12. "TOGAF Version 9.1". OpenGroup.org. The Open Group. 2013. Web. 24 Mar 2016.
13. "TOGAF 9.1. Part II: Architecture Development Method. Introduction to the ADM." OpenGroup.org. The Open Group. 1999–2011. Web. 24 Mar 2016.
14. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. Office of Satellite Ground Services. Systems Engineering Division. NESDIS Ground Enterprise Architecture (EA) Views and Usage Document. Silver Spring, MD: Office of Satellite Ground Services, Nov 2017. Print.
15. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. Office of Satellite Ground Services. Systems Engineering Division. NESDIS Ground Enterprise Architecture Services (GEARS) Technology Architecture, OSGS TA 5350.16A, Version 1.5. Silver Spring, MD: Office of Satellite Ground Services, Mar 2018. Print. pp. 13–15.
16. Ibid., pp. 15–17.
17. Ibid.
18. GEARS Technology Architecture, p. 13.
19. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. Office of Satellite Ground Services. Systems Engineering Division. NESDIS Ground Enterprise Architecture Services (GEARS) Technology Architecture Users Guide, OSGS TA 5320.05A, Version 1.0. Silver Spring, MD: Office of Satellite Ground Services, Jul 2018. Print. p. 6.
20. Ground Enterprise Architecture (EA) Views and Usage Document, p. 19.
21. Ibid., p. 20.
22. Ibid., p. 21.
23. Ibid., p. 11.
24. Ibid., p. 9.
25. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. Office of Satellite Ground Services. Draft NESDIS Ground Enterprise Architecture Services (GEARS) Transition & Sequencing Plan: A Migration Strategy for the NESDIS Ground Enterprise (NGE), p. B-2. Silver Spring, MD: Office of Satellite Ground Services, Dec 2015. Print. p. 42.
26. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. Office of Satellite Ground Services. Integrated Ground Enterprise (IGE) Implementation Plan (IGEIP), Version 1.0. Silver Spring, MD: Office of Satellite Ground Services, Dec 2017. Print. p. 30.
27. "NESDIS. Offices. NOAA OSGS—Office of Satellite Ground Services. Sustain." NESDIS.NOAA.gov. National Oceanic and Atmospheric Administration. Web. 24 Mar 2016.
28. Ibid.
29. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. NESDIS-TEC-6510.1 Enterprise Algorithms White Paper. Silver Spring, MD: NOAA, May 2017. Print.
30. United States. Department of Commerce. National Oceanic and Atmospheric Administration. National Environmental Satellite Data and Information Service. 2Q FY18 Program Management Review—Mission Science Network. Silver Spring, MD: NOAA, Mar 2018. Print.

CNES Mission Operations System Roadmap: Towards Rationalisation and Efficiency with ISIS

Paul Gélie, Helene Pasquier and Yves Labrune

Abstract In the mid-2000s, CNES was operating around 15 space vehicles using five different mission operations systems. The rationale was that each spacecraft type or product line had its own mission operations system, so as to have an optimised system for each kind of vehicle. This led to 'local' optima. This paper aims at showing how CNES has chosen to improve its global efficiency for mission operations system development and for spacecraft operations. To achieve better development and operations cost optimisation, as well as an optimised organisation of its space operations, CNES studied around 2006 the development of a new mission operations system that would be usable for all future missions operated at CNES. The decision to proceed with this development was taken in 2010. This new system is developed in the frame of the CNES ISIS project (Initiative for Space Innovative Standard). The ISIS project aims at optimising CNES space systems development by standardising, for CNES missions, the platforms together with the mission operations system. The payload and the payload operations and data system are out of the scope of the ISIS project, as they are specific to each mission; ISIS only deals with the interfaces towards the payload world. The whole ISIS project is achieved in partnership with two spacecraft manufacturers: TAS (Thales Alenia Space) and ADS (Airbus Defence and Space). The new mission operations ground system is called LP CCC ISIS (French acronym for the ISIS command control centre product line). The paper explains how the ISIS project was born at CNES, the rationale of the project, the area and mission types it covers, and the objectives it follows, so that the context in which the LP CCC ISIS is being developed is fully understood. The ISIS project description also gives the rationale for optimising space systems development and unifying the operations concept at CNES.
This project is designed as the follow-on of the successful CNES/TAS (Thales Alenia Space) mini-satellite product line called PROTEUS. The main PROTEUS concepts are recycled in ISIS and completed with all the CNES, TAS, and ADS past experience in various space systems development and operations. The paper shows the major role given to various CCSDS and ECSS standards in this process. With the context explained, the paper then addresses the objectives of the LP CCC ISIS

P. Gélie (B) · H. Pasquier · Y. Labrune
CNES, Toulouse 31401, France
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind's Future, https://doi.org/10.1007/978-3-030-11536-4_14


(for instance, better performances to anticipate the evolution of space systems, more automation to reduce operations costs, and security at the level required by defence missions). It also addresses the various foreseen uses of the LP CCC ISIS, which are not limited to the mission operations system but also cover, for instance, test benches for instrument and satellite AIT (assembly, integration and tests) or a remote analysis toolbox for on-call operators. This reflects the will to optimise software developments in all fields connected with satellite monitoring and control. The paper then describes the main concepts on which the LP CCC ISIS relies and how they help fulfil the various objectives assigned to this new product. The main topics here are service-oriented architecture (SOA), the CCSDS Mission Operations standard and the splitting of the software into components. This is completed with a high-level technical description of the LP CCC ISIS, showing all the functions covered by the software, how they are organised and what is to be done when the LP CCC ISIS is to be adapted to a new mission. The industrial organisation for the development and integration/qualification phases is described, highlighting the expected benefits of this new product line: for instance, how different systems for different uses can be built from the LP CCC ISIS components, or the advantages of using the LP CCC ISIS software in various contexts and not only in the mission operations system. The paper gives an overview of the development planning and of the client missions at CNES for the LP CCC ISIS. A first evaluation of the benefits those missions have seen from using the ISIS standard and the LP CCC ISIS is presented, bearing in mind that CNES is only in the assembly and tests phase, as the first launch has not yet taken place. The paper touches on the links with the European Ground Systems Common Core (EGS-CC) initiative.
EGS-CC pursues similar objectives, and exchanges have thus taken place between the two initiatives. CNES has been a member of the EGS-CC steering board and of the EGS-CC system engineering team from the beginning.

CNES Mission Operations System Roadmap …

Nomenclature

ADS          Airbus Defence and Space
AIT          Assembly, Integration and Testing
AIV          Assembly, Integration and Validation
AOCS         Attitude and Orbit Control System
API          Application Programming Interface
BD           Base de Données (database)
CCC          Centre de Commande Contrôle (command control centre)
CCSDS        Consultative Committee for Space Data Systems
CGS          Control Ground System
CNES         Centre National d’Etudes Spatiales (French space agency)
COO          Centre d’Orbitographie Opérationnelle (operational orbitography centre)
COP1         Command Operation Procedure 1
COR          Centre d’Opérations Réseau (station network operation centre)
COTS         Commercial Off-The-Shelf
DLR          Deutsches Zentrum für Luft- und Raumfahrt (German space and aeronautics national agency)
EAR          Export Administration Regulations
ECSS         European Cooperation for Space Standardisation
EGS-CC       European Ground Systems Common Core
EGSE         Electrical Ground Support Equipment
ESA          European Space Agency
FCP          Flight Control Procedure
FDS          Flight Dynamics System
GCP          Ground Control Procedure
GEO          Geostationary Orbit
GUI          Graphical User Interface
HEO          Highly Elliptical Orbit
ISIS         Initiative for Space Innovative Standard
ITAR         International Traffic in Arms Regulations
ITU          International Telecommunication Union
JPL          Jet Propulsion Laboratory
Kbps         Kilobits per second
LEO          Low Earth Orbit
LOS          Loi sur les Opérations Spatiales (French space operations law)
LP CCC ISIS  Ligne de Produit Centre de Commande Contrôle ISIS (ISIS command control centre product line)
MAL          Message Abstraction Layer
MB           Megabytes
Mbits/s      Megabits per second
MCS          Mission Control System
MEO          Medium Earth Orbit
MGS          Mission Ground System
MO           CCSDS Mission Operations standard
M&C          Monitoring and Control
NASA         National Aeronautics and Space Administration
OBCP         On-Board Control Procedure
OS           Operating System
PL           Payload
PROTEUS      Plateforme Réutilisable pour l’Observation de la Terre, les Télécommunications et les Usages Scientifiques (reusable platform for Earth observation, telecommunications and scientific uses)
PUS          Packet Utilisation Standard
SDB          Satellite Database
SLE          Space Link Extension
SOA          Service-Oriented Architecture
SOO          Sequence Of Operations
TAS          Thales Alenia Space
TC           TeleCommand
TM           TeleMetry
TOMS         CNES numerical satellite simulator product line
VM           Virtual Machine
W/S          WorkStation
XML          eXtensible Markup Language
XTCE         XML Telemetric and Command Exchange

1 Introduction

In the middle of the 2000s, CNES was operating around 15 space vehicles using five different mission operations systems. The rationale was that each spacecraft type or product line had its own mission operations system, so as to have an optimised system for each kind of vehicle. This led to ‘local’ optima. This paper shows how CNES has chosen to improve its global efficiency for mission operations system development and for spacecraft operations. In order to better optimise development and operations costs, as well as the organisation of its space operations, CNES studied around 2006 the development of a new mission operations system usable for all future missions operated at CNES. The decision to carry out this development was taken in 2010. This new system is developed within the CNES ISIS project (Initiative for Space Innovative Standard). The ISIS project aims at optimising CNES space systems development by standardising, for CNES missions, the platforms together with the mission operations system. The ISIS project is conducted by CNES in partnership with two major satellite manufacturers, TAS and ADS. All ISIS documents are discussed and signed jointly by the three partners. The paper presents the ISIS project and the new CNES mission operations system. It then briefly explains the links between ISIS and EGS-CC before concluding.

2 The ISIS Project at CNES

2.1 ISIS Rationale and Scope

ISIS began as a study of a follow-on to the CNES PROTEUS low-cost satellite family. The PROTEUS system was developed during the 1990s. Six satellites have been built, three of which are still in flight. This system was made of:
• A platform and its interfaces, designed for multipurpose usage.
• A command/control ground segment and its interfaces, compliant with the platform, including small TM/TC stations.


Everything related to the payload is out of the PROTEUS scope; only the interfaces with the payload or with, for instance, the payload processing centre are part of the PROTEUS perimeter. Each mission had to design its payload and its payload programming and processing centres to be compliant with the PROTEUS resources and interfaces. The platform/command control ground segment package was reused as is, or with only slight modifications, for each mission. Having designed a reusable system was a major contributor to the low-cost objectives of PROTEUS for the missions. Cost reduction is one of the main ISIS objectives, and the idea is to recycle the main PROTEUS concepts as they have proven successful. In the frame of ISIS, however, no new platform has been designed as was done for PROTEUS; instead, it was decided to write a CNES internal standard (hence the ISIS acronym: Initiative for Space Innovative Standard). This standard contains specifications for both the platform and the command control ground segment, and it applies to all future CNES missions. The standard is both:
• A whole set of specifications that a mission can reuse to write its own system specification, platform specification, board-to-ground interfaces, command control ground segment specifications and ground interfaces. Each mission slightly adapts the specifications to its needs.
• A set of specifications from which ISIS products can be developed and reused from one mission to another.
Applying the standard to all CNES missions is how cost savings can be achieved, by reusing standard-compliant products from one mission to another. Applying the standard also leads to a single operations concept, making it easier for operations teams to switch from one mission to another. The ISIS scope is the same as the PROTEUS one, that is to say platform, command control ground segment and interfaces with the payload world.
The only difference concerns the stations, as ISIS uses the existing CNES multi-missions stations network instead of developing dedicated stations. The ISIS technical area of concern is basically made of all the means and services recurrently involved in a space mission, for both operations and satellite development. It is commonly agreed that those services no longer make a significant difference between major space companies and are no longer subject to competition. This situation makes it possible to define standards facilitating reuse and rationalisation. The main components this standardisation can focus on are:
• The platform services for the payload: mechanical/thermal/propulsion, electrical and power, data handling, payload data management, dynamics and AOCS, software and operations.
• The control ground segment (CGS), which may be multi-missions.
• The external interfaces: satellite-launcher interface, platform-payload interface and interface between the control ground segment and the mission ground segment.
• AIV, test beds and simulators, system database.


Fig. 1 ISIS project scope

In Fig. 1, the elements in the ISIS scope are shown in green, the payload world elements in purple and the reused stations network in blue.

2.2 ISIS Covered Missions

ISIS has been designed to cover a wide variety of missions. Only telecommunication missions and interplanetary missions are out of the ISIS scope. Many other kinds of missions can be covered, for instance LEO, MEO and HEO missions. After studying all those types of missions and defining a segmentation for each technical part of the system, ISIS is currently developing the LEO missions’ aspects, as the first ISIS missions are all LEO missions. If another kind of mission is to be taken into account in the future, additional studies will be made to develop the segments not yet covered. The whole set of ISIS segments will thus be developed step by step, according to mission opportunities.


2.3 ISIS Optimisation Processes

The rationale for optimising the space system development process as well as the operations with ISIS is to apply a precisely defined standard to all CNES missions, thus allowing:
• Application of standard and homogeneous processes for successive space systems developments.
• Reuse of products.
• Reuse of validation statuses on products, thus optimising the tests to be conducted.
• Application of a standard operations organisation, allowing operators to be interchanged easily between missions.
• Reuse of operational procedures, limiting the workload to prepare successive mission operations.
This can be summarised as limiting the workload by maximising the reuse of specifications, products, processes and validation statuses from one mission to another in all recurrent fields.

2.4 ISIS High-Level Objectives

The ISIS standards aim at helping space mission providers to propose high-performance systems as well as secure and cost-effective development and operation processes.

2.4.1 Cost Reduction

It is a strong requirement that the ISIS approach lead to significant cost reductions from the missions’ point of view. The main drivers for cost reduction are:
• Development efficiency: the right inputs at the right time, the right tool for the right study.
• Operation efficiency, to limit the manpower required during tests and operations.
• Simplicity and robustness, to limit the knowledge required to operate the system (test or flight procedures development, for example) and the flight domain to be validated.
• Wide reuse of system components without modification, which implies a wide and well-adapted flight domain and versatility, but also documentation to help missions use them in the best way.
Cost reduction shall be considered from a point of view common to all missions: an option may not be optimal for a given mission but must be optimal for the missions taken together.


Cost reduction shall also be considered from a point of view common to all mission phases: a service may be expensive during development but cost-efficient during operation. The cost of standardisation shall be taken into account in the economic rationale, considering that, in the short term, ISIS applies only to a limited number of approved missions.
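This trade-off between per-mission and cross-mission optimisation can be made concrete with a toy cost model. All figures below are hypothetical and purely illustrative; they are not CNES cost data:

```python
def cumulative_cost(n_missions, dev_cost, adapt_cost, standardisation_cost=0.0):
    """Total cost of n missions: one initial development, then one
    (cheaper) adaptation per subsequent mission, plus an optional
    one-off standardisation effort."""
    if n_missions == 0:
        return standardisation_cost
    return standardisation_cost + dev_cost + (n_missions - 1) * adapt_cost

# Hypothetical figures (arbitrary units): bespoke systems essentially
# redevelop for each mission, while the product line pays a one-off
# standardisation cost and then cheap per-mission adaptations.
bespoke = cumulative_cost(4, dev_cost=10.0, adapt_cost=10.0)
product_line = cumulative_cost(4, dev_cost=10.0, adapt_cost=2.0,
                               standardisation_cost=8.0)
```

With these (invented) numbers the product line is more expensive for the first mission alone but clearly cheaper over four missions, which is exactly the "optimal for the missions taken together" argument above.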

2.4.2 Secure Developments

In the early phases of mission development, it is useful to know the typical performances, development plans and costs of product lines and associated processes of previously flight-proven solutions, in order to ease the system architecture definition process and to set up a well-defined and reliable development plan. Consolidating such flight-proven reusable product lines, together with inputs defining how to use them, is a major way to secure developments and is the basis of the ISIS approach.

2.4.3 Support Innovation

Innovation is not the main goal of the ISIS approach. However, standards and technology evolution as well as obsolescence imply permanent changes, and it is often difficult, for cost and risk reasons, for an isolated mission to embed innovation. ISIS brings out solutions to help innovation be taken into account and to rationalise the way it is introduced.

2.4.4 Improved Flexibility and Flight Domain

Flexibility and flight domain improvement are a way to enlarge the domain where generic product lines can be used, and ultimately a gain in efficiency. Flexibility must be understood as the ability of product lines (on-board or ground) to adapt locally to particular needs without changing the whole product line outline. Flight domain improvement must be understood as the ability of parts of product lines to be used in wider mission conditions (LEO/GEO, radiation level, for example). This implies enlarging the capacity of single solutions, or breaking down the performance ranges into optimised segments. ISIS shall support flexibility and flight domain improvement.


2.4.5 Compatibility with International Standards and Regulations

As the French national space agency, CNES has a duty to promote international and European standards and regulations, especially those CNES was involved in defining. At the same time, such compatibility guarantees better reusability, which is an important source of cost reduction. Regulations that restrict exportation must also be taken into account. The following guidelines were given to ISIS:
• ISIS-based products shall be based on state-of-the-art technologies.
• ISIS-based products shall comply with international, European and state-of-the-art standards (CCSDS, ECSS…).
• ISIS shall consider and make the most of other European standardisation initiatives.
• ISIS-based products shall comply with classical space regulations: ITU, the law on space debris and the French LOS.
• Components under ITAR, EAR and other regulations limiting product exportation shall be exhaustively identified and shall not be used in key building blocks.
• French export regulation constraints shall be identified and taken into account to avoid restrictions on ISIS products exportation.
• ISIS safety rules shall be compliant with the requirements of the major launch sites.
Compliance with these standards has two benefits. First, it guarantees that the solutions developed by the ISIS partners will be compatible with other customers’ needs. Second, it guarantees that developments required by or for other customers will be close to the ISIS standard and will contribute to the ISIS partners’ convergence effort. So far, ISIS has produced 60 documents that can be reused by the different missions; among them, 30 are tailored ECSS documents.

2.4.6 Support European Industry Product Lines

It is another CNES duty to support the European industry. ISIS must contribute to this goal by establishing partnerships with the prime contractors in order to support product lines and platform equipment improvement and competitiveness. The ISIS benefits must also be profitable for customers other than CNES.

2.4.7 Missions

The ISIS standard shall be compatible with the following classes of missions:
• French institutional: defence, multi-lateral science or institutional cooperation.
• European Space Agency missions (except interplanetary).
• Export.


2.5 Position of the LP CCC ISIS in the ISIS Project

At the same time ISIS was being defined, CNES was considering the development of a new command control centre software product line that would be used by all future CNES missions. The aim was to have a single software product for future command control centres, in order to optimise the missions’ costs for the command control ground segment. When the ISIS standard emerged, it became obvious that this new product line had to be compliant with it: the ISIS standard, applied to all future CNES missions, was the ideal framework for using a single product line for operations. This is how the new CNES command control centre software product line came to be designed as the LP CCC ISIS. The LP CCC ISIS is thus an ISIS-standard-compliant software product line, reusable for the command control centre of any ISIS-compliant mission. It is developed in the frame of the CNES ISIS project.

2.6 ISIS Operational Concept

The ISIS mission command chain and the ISIS mission data chain are based on the following concepts.

2.6.1 Mission Command Chain

Mission operators in the mission centres are responsible for payload operations, whereas operators in the control centre are responsible for platform operations, as well as for maintaining the satellite and the control ground segment in operational condition. Operations commanded by mission operators from the mission centres are called mission operations. Operations performed by satellite operators from the control centre are called satellite operations; they include maintenance operations on the payload. The mission command chain is based on the following concept (Fig. 2). Mission planning is performed by mission operators. For some missions, payload activities can be planned independently from platform operations (power resources are large enough, the payload includes its own data storage and transmission means, no specific attitude profiles are needed). In this case, mission operators provide the control centre with payload commands only. If payload activation is conditioned by a platform resource level (power, for example), these limitations have to be taken into account in the mission planning process.


Fig. 2 Mission command chain

When payload activities are coupled with platform activities (e.g. specific attitude profiles are needed), mission operation files include requests to the control centre to perform platform operations. Requests are high-level commands and are translated into telecommands by the control centre. The control centre plans the satellite contacts with the ground stations and processes the operation files to generate telecommand plans. These TC plans can then be sent to the destination satellite the next time it is in contact with a ground station. If the mission has no specific needs and there is no coupling with payload operations, standard attitude and downlink management operations can be performed by the control centre in an automated way. In any case, the control centre and the platform are responsible for satellite safety. Ground and on-board mechanisms guarantee that requests coming from the mission will never put the satellite at risk or (as far as possible) in safe mode. Depending on the mission characteristics and the nature of the erroneous programming, the result may vary from rejection of the operation files to mission interruption. The presence of control operators is not necessary to process a mission operation file. The process is automated: mission operators receive a failure report in case of problem, together with all the information necessary to follow the plan acceptance and execution by the satellite through satellite telemetry. In case of problem, however, no real-time reaction can come from the control centre alone.
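The request-to-telecommand translation step can be sketched as follows. The request kinds, command mnemonics and expansion table below are invented for illustration; in the real system the translation is driven by mission-specific data, and the whole-file rejection mirrors the safety behaviour described above:

```python
from dataclasses import dataclass

@dataclass
class Request:
    """High-level request from a mission operation file (hypothetical model)."""
    kind: str          # e.g. "PAYLOAD_CMD" or "ATTITUDE_PROFILE"
    payload: bytes

# Hypothetical expansion table: each request kind expands into one or more
# platform telecommands understood by the satellite.
EXPANSIONS = {
    "PAYLOAD_CMD":      lambda r: [b"TC_PL_FWD" + r.payload],
    "ATTITUDE_PROFILE": lambda r: [b"TC_AOCS_SLEW" + r.payload, b"TC_AOCS_HOLD"],
}

def build_tc_plan(requests):
    """Translate an operation file into a flat telecommand plan, rejecting
    the whole file if any request is unknown (satellite safety first)."""
    plan = []
    for req in requests:
        expand = EXPANSIONS.get(req.kind)
        if expand is None:
            raise ValueError(f"operation file rejected: unknown request {req.kind!r}")
        plan.extend(expand(req))
    return plan

plan = build_tc_plan([Request("PAYLOAD_CMD", b"\x01"),
                      Request("ATTITUDE_PROFILE", b"\x02")])
```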

2.6.2 Mission Data Chain

The data distribution chain is based on the following concept (Fig. 3). Several options are possible:
• All or part of the payload data are recorded and transmitted to the ground through the same means as platform data (platform memory and S-band stations).
• Mission data are recorded and transmitted through a multi-missions X-band data service managed by control operators.
• Mission data are recorded and transmitted by mission-specific equipment, considered as a payload instrument, and dedicated ground stations managed by mission operators.


Fig. 3 Data distribution chain

Mission data management is performed by mission operators. Downlink operations are planned in the mission plan consistently with payload operations. Raw (and, if necessary, ciphered) mission data are retrieved by the mission centres. Mission-specific data processing is performed before delivering the products to end users. Mission data should include all the platform ancillary data necessary for mission data processing.

3 The LP CCC ISIS

3.1 Objectives

The main objectives of the LP CCC ISIS are:
• An upgradeable architecture, adaptable to various missions at limited cost, even for non-ISIS missions.
• The capacity to fix bugs or insert adaptations while modifying and delivering only a limited amount of source code.
• A distributed architecture and software scalability, so as to accommodate various hardware deployments and to allow upgrading the hardware and adapting the deployment, for example when adding a satellite to a multi-missions centre.
• Better performances, to anticipate the evolution of space systems.
• More automation, to reduce operation costs.
• Security at the level required by defence missions.
• Building on the best of CNES operations concepts from the past 20–30 years.
All these items contribute to the step forward CNES wants to take in its satellite operations.


3.2 Various Possible Deployments

The new ground system product line shall be used for:
• Civilian and military mission control systems, including nanosatellites and constellations (these last two points being, at this time, only under study).
• Main control room/spacecraft experts room.
• Standalone flight dynamics workstations.
• Expert workstations (offline telemetry processing and analysis).
• On-call operator laptop (access to MCS data for first-level diagnosis).
• Simulator test bench procedure execution and telemetry monitoring.
• AIT/on-board equipment test bench (preparation and execution of spacecraft tests or instrument tests).
This intent to design the software for various usage contexts from the very beginning of the development reflects the will to optimise all software developments in the field of satellite command control and testing, but also to have homogeneous practices and tools for the different roles: designers, operators, experts and AIT teams.

3.3 Main Concepts

The key features of the ground system product line are:
• To be able to build a multi-missions, multi-satellites control centre: the TC sending process is devoted to one satellite, while TM visualisation is able to process, monitor and display several satellite flows. This capacity allows a single hardware infrastructure to serve several satellites belonging to different missions, or a multi-satellites mission.
• To be able to run several independent working sessions in parallel for a mission, so as to gather in the same control centre both operational activities and test activities (new procedure validation, for instance, or validation of a new software release before using it on the operational line).
• To be able to easily cope with new missions: the product line is organised as a set of generic software components that address common requirements. Additional components can be added to address mission-specific requirements.
• To rely on a service-oriented architecture, which enables easy software adaptability. The service-oriented architecture is based on the CCSDS Mission Operations standard.
• To be compliant with the security constraints imposed by defence missions. The architecture and the technologies used for the ISIS baseline shall meet the security requirements of a defence mission.


• To provide software components reusable outside of the product line. For example, TM parameter processing, TM visualisation, PUS services, etc. can be reused for payload control centres, for on-board equipment validation, and so on.
• To have full automation of the control centre: every function can be activated by the automation subsystem, either directly or through a procedure.
• To have the same language used for FCPs, OBCPs and ground automation procedures.
• To have operational data divided into two categories: time-based data (non-relational time-stamped data like telemetry, command logs, events…) and documents (complex version-controlled files like procedures, mimics, memory images, reports…). Time-based data is immediately available.
All those features aim at maximum flexibility in all the phases of a mission (design, development, validation and operations).
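The split between time-based data and documents can be sketched with a minimal in-memory model. This is illustrative only, not the real LP CCC ISIS interface: time-based data is append-only and queried by time window, while documents are retrieved by version:

```python
import bisect

class DataStore:
    """Toy model of the two operational data categories: time-based data
    (time-stamped samples) and documents (version-controlled files)."""
    def __init__(self):
        self._series = {}     # name -> sorted list of (timestamp, value)
        self._documents = {}  # name -> list of successive versions

    def append_sample(self, name, timestamp, value):
        bisect.insort(self._series.setdefault(name, []), (timestamp, value))

    def query(self, name, start, end):
        """Time-based data is immediately available over a [start, end] window."""
        samples = self._series.get(name, [])
        lo = bisect.bisect_left(samples, (start,))
        hi = bisect.bisect_right(samples, (end, float("inf")))
        return samples[lo:hi]

    def store_document(self, name, content):
        """Each store creates a new version; versions are numbered from 1."""
        versions = self._documents.setdefault(name, [])
        versions.append(content)
        return len(versions)

    def get_document(self, name, version=None):
        versions = self._documents[name]
        return versions[-1] if version is None else versions[version - 1]

# e.g. telemetry samples arriving out of order, and a versioned procedure:
ds = DataStore()
for t, v in [(2, 20.0), (1, 10.0), (3, 30.0)]:
    ds.append_sample("battery_v", t, v)
```

The parameter name "battery_v" is invented for the example.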

3.4 High-Level Technical Description

The new ISIS ground system product line provides the whole set of control centre functions. This chapter describes the functions, the architecture and the performances, beginning with an overview of the standards used in the LP CCC ISIS.

3.4.1 CCSDS and ECSS Standards in the LP CCC ISIS

CCSDS Standards

The Consultative Committee for Space Data Systems (CCSDS) was formed in 1982 by the major space agencies of the world to provide a forum for discussion of common problems in the development and operation of space data systems. Since its establishment, it has been actively developing recommendations for data- and information-systems standards to promote interoperability and cross support among cooperating space agencies, to enable multi-agency spaceflight collaboration (both planned and contingency) and new capabilities for future missions. Additionally, CCSDS standardisation reduces the cost burden of spaceflight missions by allowing cost sharing between agencies and cost-effective commercialisation. The LP CCC ISIS relies on CCSDS standards for:
• Space link: CCSDS TM and TC space data link protocols.
• MCS/ground station communications: CCSDS Space Link Extension services.
• MCS software architecture: CCSDS Mission Operations services and XTCE.
The product line design relies on the CCSDS Mission Operations standard (MO), which follows the principle of service-oriented architecture (SOA) and defines a set


of end-to-end services and a common object model to describe the data. The standard also defines a MAL (Message Abstraction Layer) to provide independence from the message encoding and transport technologies.
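The transport-independence idea behind the MAL can be illustrated with a small sketch. The class names and the loopback binding below are invented for illustration and are not the CCSDS MO API: the application layer speaks only in abstract interaction patterns, and a pluggable binding turns them into concrete messages:

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Abstract transport binding: the application layer never sees it directly."""
    @abstractmethod
    def send(self, encoded: bytes) -> bytes: ...

class LoopbackTransport(Transport):
    """Stand-in binding for the sketch; a real binding could ride on a
    message bus instead, without touching the application layer."""
    def __init__(self, provider):
        self.provider = provider
    def send(self, encoded):
        return self.provider(encoded)

class MalRequest:
    """Toy request/response interaction pattern: encode, submit, decode."""
    def __init__(self, transport):
        self.transport = transport
    def invoke(self, operation: str, body: str) -> str:
        encoded = f"{operation}|{body}".encode()
        return self.transport.send(encoded).decode()

# A provider-side operation, reachable through whatever binding is configured.
def provider(encoded: bytes) -> bytes:
    operation, body = encoded.decode().split("|", 1)
    return body.encode() if operation == "ECHO" else b"UNKNOWN"

consumer = MalRequest(LoopbackTransport(provider))
```

Swapping `LoopbackTransport` for another `Transport` subclass changes the wire technology while `MalRequest` and the provider logic stay untouched, which is the point of the abstraction layer.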

ECSS Standards

ECSS is a cooperative effort of the European Space Agency, national space agencies and European industry associations for the purpose of developing and maintaining common standards. Requirements in these standards are defined in terms of what shall be accomplished, rather than in terms of how to organise and perform the necessary work. This allows existing organisational structures and methods to be applied where they are effective, and the structures and methods to evolve as necessary without rewriting the standards. Most of the ECSS documents of the engineering branch have been tailored in ISIS. The product line relies on ECSS standards for:
• ECSS-E-70-31 mission database layout, used to define the conceptual data model of the system database.
• ECSS-E-70-01 spacecraft on-board control procedures language definition, used to derive two specification documents: the specification of OBCP user needs and the specification of the procedure edition and validation user needs.
• ECSS-E-70-32 procedure language definition, used to specify the procedure language for satellite and ground operations.
• ECSS-E-70-41 PUS-A board-to-ground interface. ISIS provides CNES missions with a tailored PUS specification that simplifies the use of PUS services and complies with current implementations of PUS. Each mission can then select PUS services and subtypes according to its needs.
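As an illustration of the PUS board-to-ground interface, a telemetry source packet carries its service type and subtype at fixed offsets just after the 6-byte CCSDS primary header. The sketch below assumes the PUS-A data field header layout; a tailored mission profile may place fields differently:

```python
def pus_service(packet: bytes):
    """Extract (service type, service subtype) from a PUS telemetry source
    packet: 6-byte CCSDS primary header, then a data field header whose
    second and third octets carry type and subtype (PUS-A layout assumed)."""
    if len(packet) < 9:
        raise ValueError("packet too short for a PUS data field header")
    return packet[7], packet[8]

# A housekeeping parameter report carries service 3, subtype 25 in PUS;
# the primary header and trailing data bytes are zero-filled placeholders.
packet = bytes(6) + bytes([0x10, 3, 25]) + b"\x00\x00"
```

Routing packets on (type, subtype) pairs like this is what lets a mission enable only the PUS services it has selected in its tailoring.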

3.4.2 LP CCC ISIS Functions

The functional analysis of the ISIS ground system product line has identified the following high-level functional sub-systems:
• Monitoring and control.
• Data management.
• Visualisation.
• Automation.
• Flight dynamics.
• Mission operation support.
• Infrastructure.

Monitoring and control

The monitoring and control functional subsystem is composed of functions necessary to communicate with ground stations according to the CCSDS SLE standard, functions to


receive and process telemetry, functions to generate and send commands (encrypted or not) and functions to manage the ECSS PUS services.

Data management

The data management functional subsystem (also named DataStore) contains the functions related to data archiving, management and retrieval. All the data received and produced in the control centre are handled in the DataStore, which is the exclusive storage place inside the control centre: stored data are, for instance, telemetry packets and parameters, sent commands, events and operational products (procedures, mimics, sequences of operations…). This subsystem also manages the system database, made from the satellite database delivered by the satellite manufacturer and completed by various data such as derived parameters, monitoring rules and system parameters.

Visualisation

The visualisation functional subsystem consists of a graphical user interface for satellite mission monitoring. It allows creating graphical elements like mimics, curves, tables and history views, displaying time-stamped data stored in the DataStore on these graphical elements, and applying local monitoring. Various kinds of data can be visualised together on a single view to allow cross analyses. This subsystem can be used both during real-time operations and during offline operations like telemetry replay.

Automation

The automation functional subsystem gathers all functions related to control centre operations automation: procedure and sequence-of-operations preparation and execution. The sequence-of-operations preparation relies on an automatic computation of the activities based on predefined rules adapted to each mission.

Flight dynamics

The flight dynamics functional subsystem is responsible for all processing dealing with attitude and orbit management.
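A minimal sketch of the telemetry-processing path described above: extracting one parameter from a packet and applying out-of-limit monitoring. The offset, encoding and limits are hypothetical; in the real system this layout knowledge comes from the satellite database:

```python
import struct

def extract_parameter(packet: bytes, offset: int, fmt: str = ">f"):
    """Pull one raw parameter out of a telemetry packet at a fixed byte
    offset (offset and format are illustrative, normally database-driven)."""
    return struct.unpack_from(fmt, packet, offset)[0]

def check_limits(value, low, high):
    """Simple out-of-limit monitoring: returns 'NOMINAL', 'LOW' or 'HIGH'."""
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NOMINAL"

# Hypothetical battery-voltage parameter: big-endian float at offset 10,
# monitored against invented limits of 24.0-33.6 V.
tm = bytes(10) + struct.pack(">f", 31.2)
voltage = extract_parameter(tm, 10)
status = check_limits(voltage, low=24.0, high=33.6)
```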
Mission operation support

The mission operation support functional subsystem gathers functions related to events management and configuration, on-board software and image management, report creation and report template management, and operational tasks and notes management.

Infrastructure

The infrastructure functional subsystem comprises functions related to the low-level layers of the control centre, typically dealing with user authentication and access rights management, internal communication management, system configuration, file exchange with external entities, printing and hardware management.


Fig. 4 ISIS product line layers overview

3.4.3

LP CCC ISIS Architecture

Layers and Components

In the LP CCC ISIS architecture, each subsystem is divided into several components (on average around ten components per subsystem). All these components are stateless and provide services. Components call the services of other components according to their needs, and all interactions are asynchronous. As a result, a component can easily be replaced by another implementation if necessary, for instance to take a mission specificity into account. It also means that only the components concerned by an adaptation are modified while the others remain identical: only the modified components are delivered in a new release, instead of the whole subsystem as in monolithic architectures.

In addition, the LP CCC ISIS architecture is a layered architecture in which all the applications sit in the upper layer (the application layer) and the infrastructure sits in the lower layer. This lower layer contains the communication component (libcom) and the directory service, which holds all the information needed to access the various services. This is illustrated in Figs. 4 and 5. The libcom allows components to exchange messages in order to interact; for instance, there are messages for activating a service, messages for telemetry diffusion, etc. The directory service indicates which services are available and where they can be called. The libcom relies on ZeroMQ and the directory service on Redis. Both are compliant with the CCSDS Mission Operations standard.
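The directory-service pattern described above can be sketched in a few lines of pure Python. This is only an illustration of the principle (register a service endpoint, look it up, route a message to it): all class and service names are invented, the real libcom uses ZeroMQ messaging, and the real directory is Redis-backed.

```python
# Minimal in-memory stand-in for the ISIS directory service (really
# Redis-backed) and for libcom message routing (really ZeroMQ).
# All names here are illustrative, not the actual ISIS API.

class DirectoryService:
    """Maps a service name to the endpoint where it can be called."""
    def __init__(self):
        self._services = {}

    def register(self, service, endpoint):
        self._services[service] = endpoint

    def lookup(self, service):
        return self._services.get(service)


class Libcom:
    """Routes activation messages to the endpoint found in the directory."""
    def __init__(self, directory):
        self.directory = directory
        self.handlers = {}  # endpoint -> callable standing in for a component

    def bind(self, endpoint, handler):
        self.handlers[endpoint] = handler

    def call(self, service, message):
        endpoint = self.directory.lookup(service)
        if endpoint is None:
            raise LookupError(f"service {service!r} not registered")
        return self.handlers[endpoint](message)


# A stateless "storage" component registers itself; a client then calls it
# by service name only, without knowing where the component is deployed.
directory = DirectoryService()
libcom = Libcom(directory)
directory.register("datastore.archive", "tcp://vm-storage:5555")
libcom.bind("tcp://vm-storage:5555",
            lambda msg: {"status": "archived", "n": len(msg)})

reply = libcom.call("datastore.archive", [b"tm-packet-1", b"tm-packet-2"])
```

Because callers address services by name rather than by location, swapping a component for a mission-specific implementation only requires re-registering the name with a new endpoint.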


P. Gélie et al.

Fig. 5 Layered architecture

The libcom implements the MAL part of the standard, and the directory service implements part of the standard's common services. The MAL provides standard interaction patterns independent of the transport technology; if necessary, the transport technology can be changed without impacting the application layer.

Inside the application layer, we distinguish three types of components (see Fig. 6):

• Daemons, which are permanently active.
• Processings, which are launched on demand and stopped when their work is finished.
• GUIs, which are dedicated to the man–machine interface.

For instance, the DataStore components, which must always be available for storage, are daemons, whereas the station connection component, which is needed only during a satellite pass over a station, is a processing launched just before the pass and stopped after it.

Besides components, the application layer can also contain Python scripts. Scripts can be used instead of components when something lightweight is to be done. In particular, users can write scripts themselves if they need to add a simple processing without requesting the industrial development of a component, or if they need to quickly add a first level of processing while waiting for a later implementation as a component.
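As an illustration of the script option, a user-written Python script might add a simple first level of processing (here, a derived parameter) on top of the exposed service APIs before a full component exists. Every name below — the DataStore client, its `latest` method, the parameter names — is hypothetical, chosen only to show the idea.

```python
# Hypothetical user script: derive a battery-margin value from two
# telemetry parameters exposed by a DataStore-style API, without waiting
# for an industrial component to be developed. API names are illustrative.

def latest_samples(datastore, names):
    """Fetch the latest sample of each named parameter."""
    return {name: datastore.latest(name) for name in names}

def battery_margin(datastore):
    samples = latest_samples(datastore, ["BAT_VOLTAGE", "BAT_VOLTAGE_MIN"])
    return samples["BAT_VOLTAGE"] - samples["BAT_VOLTAGE_MIN"]

# Stand-in for the real DataStore client, for demonstration only:
class FakeDataStore:
    def __init__(self, values):
        self._values = values
    def latest(self, name):
        return self._values[name]

store = FakeDataStore({"BAT_VOLTAGE": 28.4, "BAT_VOLTAGE_MIN": 24.0})
margin = battery_margin(store)
```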


Fig. 6 Detail of the application layer

The last important elements are the component configuration files, which make it possible to tune the behaviour of the software. Concerning the activation of the processing chains, nothing is hard-coded in the components. Every task in the control centre has to be activated from procedures written in Python. The whole operational concept is therefore implemented through procedures and can be tuned by the operators, who are responsible for writing them. These procedures are called GCPs. This is possible because all services provided by the components are exposed through APIs. Scripts can also use these exposed APIs, which is why scripts may replace components on some occasions.

Concerning the deployment of the software, the LP CCC ISIS is a virtualised, distributed application, deployed on a topology of VMs over which the components are distributed. The contractor delivers a standard topology, but the customer can adapt it when installing the software, according to mission needs. For subsystem development and validation, the contractor uses VirtualBox to create and manage VMs; for operational deployment, CNES uses VMware. Distributing the application over VMs makes it possible to form logical groups of components so as to avoid resource conflicts between components, and to tune the resources of each VM according to the needs of each logical group. Distributing the software over VMs also guarantees easy scalability of the application: for example, storage components can be duplicated to spread the workload. Virtualisation is used to be able to deploy on various hardware and operating systems; for instance, it will be used to deploy the software on the experts' laptops under Windows. In the future, virtualisation will also make it easier to manage obsolescence and to avoid porting the software to follow OS evolutions.

Fig. 7 Main internal interfaces
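A GCP could then look like an ordinary Python function calling the exposed service APIs. Everything below (object names, methods, the "fake" components) is hypothetical; the point is only that the operational logic lives in an operator-written procedure rather than being hard-coded in the components.

```python
# Hypothetical GCP (ground control procedure) sketch: none of this is the
# real ISIS API. The operational sequence is expressed in Python and calls
# component services through their exposed APIs.

def gcp_prepare_pass(station, tm_chain, log):
    """Procedure run shortly before a satellite pass."""
    steps = []
    station.connect()            # open the link to the ground station
    steps.append("station connected")
    tm_chain.start()             # start the telemetry processing chain
    steps.append("TM chain started")
    log.event("PASS_READY")      # raise an event for the operators
    steps.append("event raised")
    return steps

# Minimal fakes standing in for the exposed component APIs:
class FakeComponent:
    def __init__(self):
        self.calls = []
    def connect(self):
        self.calls.append("connect")
    def start(self):
        self.calls.append("start")
    def event(self, name):
        self.calls.append(name)

station, tm_chain, log = FakeComponent(), FakeComponent(), FakeComponent()
done = gcp_prepare_pass(station, tm_chain, log)
```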

Main Internal Interfaces

Figure 7 shows the leading role of the storage subsystem. Each subsystem relies on the DataStore to store or retrieve all useful data:

• Telemetry and sent commands, for monitoring and control.
• Procedures, sequences of operations and operational rules, for automation.
• Visualisation elements, for the corresponding component.
• Context data, for flight dynamics.
• Configuration data and execution context, for infrastructure.
• Report templates, memory images and event and alert configurations, for mission operations support.

Besides, each component can generate an event to report a particular behaviour to the user. Finally, the satellite database definition is distributed to all components that need to manage the telemetry and telecommand structure.


Fig. 8 Main external interfaces

Main External Interfaces

The main external interfaces are shown in Fig. 8. Four external entities are identified:

• The payload control centre, which receives from the control centre all the data necessary to manage the mission: flyby data, telemetry, satellite configuration, and orbit and attitude data. In return, the payload control centre provides the mission operations file defining the mission commands to send to the satellite.
• The ground stations network, which is responsible for establishing the link with the satellite (telemetry and command exchanges) and which provides data related to the ground stations and to the external environment (for instance, solar activity).
• The satellite simulator, whose main interfaces are telemetry sending and command receipt.
• Finally, the satellite manufacturer, who receives telemetry from the command centre for monitoring purposes and who supplies data related to the spacecraft, mainly the satellite database and the on-board software.

Flight Dynamics Subsystem Architecture

The FDS is made of a framework into which flight dynamics algorithms are integrated. The framework is in charge, on one side, of the integration and management of the interfaces for the flight dynamics algorithms and, on the other side, of the management of the interface with the other LP CCC ISIS components. The special feature of this subsystem is that the framework is developed as an ISIS component, while the flight dynamics algorithms are developed within another CNES project called SIRIUS, which aims at modernising all the CNES flight dynamics software. The FDS assembly is thus made from the ISIS FDS framework component, called the FDS toolkit, and from the various flight dynamics algorithms integrated with it. The resulting FDS is dedicated to a mission, and the list of flight dynamics algorithms is mission specific. Cost optimisation is obtained here through the reuse, as is, of the FDS toolkit from one mission to another, and by the fact that most of the algorithms are also reused from one mission to another; mission tuning is done mainly through the algorithms. Figure 9 shows the various items of the FDS, together with the infrastructure layer of the LP CCC ISIS.

Fig. 9 FDS architecture

3.4.4

ISIS Data Flow Architecture and ISIS PUS Services

Data flows cover both telemetry/telecommand (TM/TC) and remote monitoring/remote command (RM/RC) flows. TM/TC connections with the stations are compliant with the CCSDS Space Link Extension (SLE) standard. SLE can be replaced by other connection protocols; for instance, a simple TCP/IP connection is foreseen for AIT needs (Fig. 10).

Fig. 10 ISIS data flows

RM/RC flows are dedicated to the remote control of the stations. At CNES, this possibility comes in addition to the standard remote control of the stations already performed by the network operations centre. It allows some automatic checks to be made on the station through the RM during the sending of a TC procedure, for instance, and actions to be taken on the station through the RC inside a TC procedure if necessary, which would not be possible if an interaction with the network centre were mandatory. RM and RC are only available when the station is booked for the mission. RM and RC are defined in the system database, so that they are decommutated or encoded through the same mechanisms as those used for TM/TC. When they are distributed inside the LP CCC ISIS system, these TM/TC and RM/RC data are translated into standard CCSDS MO data, transiting on the MAL-architectured data bus.

Concerning the PUS implementation, some services have been customised for the ISIS standard (see Fig. 11), but for every mission it is possible to add any specific services needed (for AOCS TCs, for example).
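The per-mission customisation point can be pictured as a registry mapping PUS (service, subtype) pairs to handlers, where a mission adds its own entries alongside the standard ones. This pure-Python sketch uses invented names and an invented mission-specific service number; it is not the actual ISIS implementation.

```python
# Sketch of a PUS service registry: standard ISIS services are registered
# first, then a mission can add its own (service, subtype) handlers.
# Class names, handler shapes and the mission service number are illustrative.

class PusRegistry:
    def __init__(self):
        self._handlers = {}

    def register(self, service, subtype, handler):
        self._handlers[(service, subtype)] = handler

    def dispatch(self, service, subtype, packet):
        handler = self._handlers.get((service, subtype))
        if handler is None:
            raise KeyError(f"no handler for PUS ({service},{subtype})")
        return handler(packet)


registry = PusRegistry()
# Standard service 1 (Telecommand Verification); subtype 1 taken here as
# "acceptance success" for the sake of the example:
registry.register(1, 1, lambda pkt: ("TC accepted", pkt["tc_id"]))
# Hypothetical mission-specific addition, e.g. an AOCS report service:
registry.register(212, 1, lambda pkt: ("AOCS report", pkt["mode"]))

result = registry.dispatch(1, 1, {"tc_id": 42})
```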


Fig. 11 Standard ISIS PUS services

No   Name
1    Telecommand Verification
2    Device Command Distribution
3    Housekeeping and Diagnostic Data Reporting
4    Parameter Statistics Reporting
5    Event Reporting
9    Time Management
11   On-board Operations Scheduling
12   On-board Monitoring
13   Large Data Transfer
14   Packet Forwarding Control
15   On-board Storage and Retrieval
17   Test
18   On-board Operations Procedures
19   Event-Action
140  Parameter Management
142  Functional Monitoring
144  File Management
170  Dwell Acquisitions

3.4.5

LP CCC ISIS Performances

The performance requirements of the product line have been defined according to mission needs, taking into account feedback from the missions operated at CNES and from new mission needs. They cover user access, real-time and deferred telemetry, and other generic requirements. The main requirements deal with:

• Multi-satellite capabilities.
• The capacity of a single control centre hardware instance to allow routine operations on up to six satellites belonging to one or several missions.
• Growth potential: the control centre architecture shall allow operations on six satellites, but it shall also have the growth potential to be used for monitoring and control of a constellation of around 50 satellites, with an adapted data storage capacity but otherwise the same performance level.


Generic requirements:

• The system shall be able to support up to 50 users in parallel while meeting the specified performance requirements.
• The system shall support the possibility to start and manage at least 30 user-dedicated applications from within one client.
• It shall be possible to display at least 1000 telemetry parameters at 30 different operational places in real time (main control room/spacecraft experts room).
• The long-term trend analysis of 50 parameters over 1 year shall take no more than 10 min.

TM monitoring and archiving performances:

• 50 MB of raw recorded platform telemetry shall be available for the beginning of archiving less than 10 min after the LOS 0° of a pass.
• Monitoring and archiving of 50 MB of raw recorded platform telemetry shall be done in less than 5 min.

Telemetry parallel connections:

• The system shall be able to receive and archive all real-time telemetry data received from up to ten parallel connections (e.g. SLE service instances) at an overall rate of at least 20 Mbit/s.

Processing and archiving per second:

• The system shall be able to process and archive at least 100,000 monitoring parameter samples per second.

Telecommand data transmission:

• The system shall be able to transmit telecommand data at a rate of 64 kbit/s; the target performance is 512 kbit/s.

Procedure sending performances:

• The time necessary for generating (sending) a procedure containing 20 telecommands, once all the necessary entries have been given by the operator, shall be less than 4 s.
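A quick back-of-envelope reading of these figures helps put them in perspective; the arithmetic below simply restates the requirements, it adds no data of its own.

```python
# Back-of-envelope check of the performance figures quoted above.

samples_per_second = 100_000                 # processing/archiving requirement
budget_us = 1_000_000 / samples_per_second   # time budget per sample, in µs

overall_rate_mbit = 20                       # aggregate real-time TM rate
connections = 10                             # parallel SLE-like connections
avg_per_connection_mbit = overall_rate_mbit / connections

# 50 MB of recorded telemetry archived in under 5 min:
archive_rate_mb_s = 50 / (5 * 60)            # minimum sustained MB/s
```

So the archiving pipeline has roughly a 10 µs budget per parameter sample, and each of the ten telemetry connections carries on average 2 Mbit/s.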

3.4.6

Technologies and COTS

In order to meet all the CNES requirements (in terms of performance, genericity, modularity, maintainability and ease of deployment), an innovative technical solution was designed, breaking with legacy system solutions. Some of the key choices are:

• CCSDS MO, which covers communication, archiving and M&C. ISIS also needs automation, visualisation, flight dynamics, on-board software management, reporting, deployment, file transfer, supervision, etc.; ISIS-specific services based on the MO framework have therefore been created.


• A lightweight communication library: ZeroMQ.
• Simple, efficient storage for time series: Google LevelDB.
• Python as the common scripting language: FCPs and GCPs will be written in Python. Native Python access to all the ISIS services (mostly implemented in C++) is provided.
• GUIs rely on Qt and JavaScript.
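LevelDB stores entries sorted by key, so retrieving a time series by parameter and time range reduces to a range scan over well-chosen keys. The sketch below illustrates that key layout in pure Python, with a dict standing in for LevelDB; the actual ISIS storage schema is not public, so the key format here is an assumption made for illustration.

```python
import struct

# Key layout sketch: parameter name + separator + big-endian timestamp,
# so that keys sort first by parameter, then chronologically -- the
# property a LevelDB-style ordered store exploits for efficient range
# scans. The format is illustrative, not the ISIS schema.

def make_key(param: str, t_ms: int) -> bytes:
    return param.encode() + b"\x00" + struct.pack(">Q", t_ms)

store = {}  # stand-in for LevelDB; keys are sorted explicitly when scanning

def put(param, t_ms, value):
    store[make_key(param, t_ms)] = value

def scan(param, t0_ms, t1_ms):
    """Return values for one parameter within [t0_ms, t1_ms]."""
    lo, hi = make_key(param, t0_ms), make_key(param, t1_ms)
    return [store[k] for k in sorted(store) if lo <= k <= hi]

put("BAT_VOLTAGE", 1000, 28.1)
put("BAT_VOLTAGE", 2000, 28.3)
put("BAT_VOLTAGE", 3000, 28.2)
put("BUS_CURRENT", 1500, 1.7)

window = scan("BAT_VOLTAGE", 1000, 2500)
```

Big-endian timestamp encoding matters here: it makes byte-wise key order coincide with chronological order, which is what keeps a range scan cheap.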

3.4.7

LP CCC ISIS Adaptation to a Mission

The LP CCC ISIS is made of a reusable set of components developed from the ISIS standard set of specifications and from CNES's past experience in command control centre developments. Several possibilities exist to cope with mission-specific needs:

• In the development phase:

– First, if possible, the reusable ISIS components are enhanced with mission-specific capabilities, provided that the component remains compatible with all other ISIS missions. For instance, ISIS does not use CCSDS segmentation for TCs, but one ISIS mission needs it, so segmentation has been added to the encoding chain as a configurable option that a mission can enable or not.
– Secondly, if the mission-specific need is not covered by an ISIS component, a mission-specific component is developed in compliance with the technologies and architectural principles of the LP CCC ISIS, and is designed to run and interact with the ISIS components. If the mission-specific function is already covered by an ISIS component that does not match the precise mission need, the mission-specific component replaces the existing ISIS component for that mission and is deployed in its place in the mission control centre. If the mission-specific function does not exist in the set of reusable ISIS components, the mission-specific component is deployed together with all the reusable ISIS components in the mission control centre.

• During the assembly, integration and validation of the mission control centre:

– The ISIS configuration files must be tuned with the mission-specific data and configuration values. In particular, each mission comes with its own satellite database.
– In addition, scripts and procedures can also be used to fulfil certain mission-specific needs.

Figure 12 shows that inside a mission centre, we can have reused ISIS items (in grey) and mission-specific items that can be added if necessary (in blue). The whole infrastructure is common to all missions; only the application layer can include mission-specific items.

Fig. 12 Addition of mission-specific items

The flight dynamics system is adapted to a mission slightly differently, because it is built by assembling the generic ISIS framework together with flight dynamics algorithms that are developed outside the ISIS development. The adaptation to a mission is done on the algorithm side, while the generic ISIS framework is reused as is for all missions. This framework interfaces on one side with the flight dynamics algorithms and on the other side with all the components of the LP CCC ISIS. Of course, if the flight dynamics algorithms of a mission need it, additional features can be added to the ISIS framework, in such a way that it remains compatible with all other missions and thus remains generic. The flight dynamics algorithms are designed to be adapted to a mission in the same way as ISIS components:

• Either the algorithm is enhanced with new capabilities, with upward compatibility for all missions.
• Or a new algorithm is developed and deployed for the mission that needs it.

The ISIS generic framework for the FDS is called the FDS toolkit.

3.5 Global Operational Scenario Inside the LP CCC ISIS

The typical routine activity inside an ISIS command control centre can be described as follows:


• Every week, the sequence of operations is automatically computed from mission-specific rules. As a result, the programme for the next three or four weeks is established and can be made operational after validation and possible manual adjustments.
• Once operational, this sequence of operations is automatically executed by a scheduler. All operations are launched according to their start date and to the status of the previous activities. The whole sequence of operations can be monitored through a GUI showing the operations coloured according to their status.
• Typical operations automatically started through the sequence of operations are:

– The processing of the mission programming file, which is sent at a frequency decided for each mission by the mission centre. It is processed to prepare the procedure to be uploaded to the satellite for payload programming.
– The flight dynamics computations, which take place automatically every day. They make it possible to monitor the orbit and prepare a manoeuvre if needed, and to monitor and, if needed, program the attitude. As a result, data files are produced and distributed inside the command control centre and towards external centres, such as the mission centre. In addition, parameters for AOCS telecommands are computed and sent within the control centre to the components in charge of computing the telecommands and preparing the TC files to be uploaded.
– The daily preparation of the FCPs to be sent on board during the satellite passes of the day. Only FCPs validated on a satellite simulator are available to be sent on board.
– The start of all monitoring and control functions before a satellite pass.
– The acquisition, processing, visualisation if necessary, and archiving of the real-time telemetry during the pass. It is used to drive the TC sending when telemetry parameter values must be checked.
– The sending to the satellite of the TC procedures prepared for the pass.
– The acquisition, processing and archiving, after the pass, of the on-board recorded telemetry that has been downloaded.
– The stopping of all monitoring and control functions once the pass is finished.
– The raising of alarms processed by the control centre during the recorded telemetry processing.

• Offline operations are generally not automatic. They include, for instance, preparing and validating FCPs, preparing and validating telemetry views, updating the system database, etc.

As a result, except during critical operations or for offline preparation, there is no operator in the control centre. People are on call, and the alarm management software calls them when an alarm is raised.
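The scheduling rule described above — launch an operation once its start time has passed and its predecessors have completed — can be sketched in a few lines. The data model (field names, statuses) is invented for illustration and is not the ISIS sequence-of-operations format.

```python
# Toy sequence-of-operations scheduler: an operation is launched when its
# begin time is reached AND all its predecessor operations are done.
# Field names and statuses are illustrative, not the ISIS data model.

from dataclasses import dataclass, field

@dataclass
class Operation:
    name: str
    begin: int                       # scheduled begin time (abstract ticks)
    predecessors: list = field(default_factory=list)
    status: str = "PENDING"          # PENDING -> RUNNING -> DONE

def tick(operations, now):
    """Launch every pending operation that is due and whose predecessors
    have all completed; return the names of the launched operations."""
    by_name = {op.name: op for op in operations}
    launched = []
    for op in operations:
        ready = all(by_name[p].status == "DONE" for p in op.predecessors)
        if op.status == "PENDING" and op.begin <= now and ready:
            op.status = "RUNNING"
            launched.append(op.name)
    return launched

ops = [
    Operation("start_mc_functions", begin=10),
    Operation("acquire_rt_tm", begin=12, predecessors=["start_mc_functions"]),
]

first = tick(ops, now=11)    # only the first operation is due and ready
ops[0].status = "DONE"       # pretend it completed
second = tick(ops, now=12)   # now the dependent operation can start
```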

3.6 Development Industrial Organisation

The development has been entrusted to an industrial consortium led by ATOS. In this consortium:


• ATOS is in charge of:

– The system architecture.
– The system quality management.
– The security.
– The development of all functions except the monitoring and control, PUS and flight dynamics functions.
– The development of the mission-specific components for the first client mission.
– The AIV of the whole system before delivery to CNES for acceptance tests.

• Thales Services is in charge of the development of the monitoring and control and flight dynamics functions.
• Telespazio-Vega is in charge of the development of the PUS components.
• Spacebel is involved in the development of the automation function.
• SCASSI provides support on the security aspects.
• Telespazio has a limited involvement in the development of the visualisation and mission operation support functions.

In this organisation, ATOS is responsible for building and coordinating the whole release perimeter. Subcontractors develop and validate the functions they are in charge of, in an environment created and delivered by ATOS. They then deliver their subsystems to ATOS, which is in charge of assembling all the subsystems and of the complete validation of the various functions through system validation scenarios and functional loops.

3.7 Schedule and First Client Missions

CNES has already received and accepted three releases of the LP CCC ISIS. Two more releases are expected before the software is complete:

• One in July 2018.
• One in February 2019.

The first client mission is the French defence mission CERES, to be launched by 2020. The second client mission is the NASA (JPL)/CNES altimetry mission SWOT, to be launched by 2021. The third client mission is the DLR/CNES methane-sensing mission Merlin, to be launched by 2023.


3.8 Expected Benefits from the LP CCC ISIS for Missions

By developing the LP CCC ISIS, CNES expects to reduce the overall cost of ownership of its command control system for the next 30 years. The return on investment is expected to be achieved after the third mission because:

• Specific developments are expected to be very light for each ISIS mission.
• CNES expects to reduce the amount of testing mission after mission, as it will be possible to take validation statuses into account without replaying all tests.
• CNES expects to reuse procedures from one mission to another, thus simplifying operation preparation.
• It will be possible to optimise the operations teams, as the system will be identical for all missions and highly automated.
• After the first three known missions, CNES will have gone through the validation of very different kinds of missions, covering most of the foreseen configurations: a highly secured defence mission in an isolated control centre, and two scientific missions in cooperation with other agencies that will be gathered in the same multi-mission control centre.

At the time this paper is written, we have already developed the specific components for the first mission, specified the specific components for the second mission, and produced a first analysis of the specific components for the third mission. Through these three missions, we have shown that we need around five mission-specific components per mission, out of 60 reusable ISIS components. In addition, these specific components are much smaller than the ISIS components. We have therefore validated that applying the ISIS standard to our missions gives the expected result in terms of reusability of the LP CCC ISIS software. As we are still in the development phase, it is too early to have feedback about the validation and operations optimisations, but we are confident that our objectives are reachable.

4 Link with EGS-CC

4.1 EGS-CC Brief Presentation

The European Ground Systems-Common Core (EGS-CC) is a European initiative to develop a common infrastructure to support space systems monitoring and control in pre- and post-launch phases for all mission types. This is expected to bring a number of benefits, such as a seamless transition from spacecraft assembly, integration and testing (AIT) to mission operations, reduced cost and risk, support for the modernisation of legacy systems and the promotion of the exchange of ancillary implementations across organisations.


The European Space Agency (ESA) discussed with large European system integrators, including Airbus Defence and Space, Thales Alenia Space (France and Italy) and OHB System, the possibility of a collaboration to develop a European Ground Systems Common Core (EGS-CC) which would provide a common infrastructure to support space systems monitoring and control in pre- and post-launch phases. The French and German national space agencies, CNES and DLR, also signalled their desire to join the initiative, and a Memorandum of Understanding was finalised in support of the EGS-CC initiative.

EGS-CC and the LP CCC ISIS fulfil similar objectives, and exchanges have thus taken place between the two initiatives. CNES has been a member of the EGS-CC steering board and of the EGS-CC system engineering team from the beginning. ISIS development started before EGS-CC. CNES took part in the EGS-CC system engineering team during the specification phase and brought some inputs from the ISIS data model definition. However, the two data models differ in their scope (EGS-CC has a wider scope and addresses wider satellite manufacturing definitions) and in some concepts and data definitions. In addition, a study called the ISIS/EGS-CC convergence analysis was performed in 2015, during the EGS-CC phase B, to analyse the two systems and see which types of collaboration or convergence could be possible. The analysis covered inter-operability, compatibility, software reuse and cross-fertilisation.

4.2 EGS-CC/ISIS Convergence Study

The convergence analysis aims at identifying potential approaches and measures which would enable:

• Inter-operability: EGS-CC-based applications interfacing with (elements of) the ISIS system.
• Compatibility: both systems supporting the same interfaces to external systems.
• Software reuse: reusing software elements developed for one project in the other.
• Cross-fertilisation: technology feedback, use of standards, guidelines, etc.

4.2.1

Topics

The convergence has been analysed by addressing the following topics:

• Scope: functional and non-functional features, target applications.
• Concepts: fundamental objectives, approaches, constraints, operational and architectural concepts.
• Data model: semantic-level definition of data.
• Product structure: system decomposition, internal organisation and basic architecture.


• Technology: stack of technologies and third-party products adopted.
• External interfaces: definition and implementation.
• Standards: adoption, compliance and tailoring.

4.2.2

Functional Scope

The functional scope analysis revealed that:

• The functional scopes are largely overlapping but also complementary.
• The main overlaps are:

– TM/TC processing.
– Data management.
– Automation.
– Visualisation and reporting.
– Basic infrastructure.

• ISIS-specific features are:

– Flight dynamics.
– Memory images management.
– Station control.

• EGS-CC-specific features are:

– Separation between monitoring and control modelling/processing and adaptation (e.g. TM/TC) and test facilities. This separation makes it possible to address a wide range of applications (from an instrument development system to a satellite constellation).

• Non-functional requirements are generally comparable or equivalent, except for:

– The redundancy approach and design.
– The software criticality management.

4.2.3

Concepts and Data Model

The basic functions of monitoring and control are the fundamental ones of an M&C system and are therefore common to EGS-CC and ISIS, but there are fundamental differences in the basic system concepts associated with them. The main differences concern:

• The monitoring and control model.
• The live, playback, replay and retrieval functions.
• The messages and events.
• The adaptation and deployment.
• The data storage.
• The time management.

The main (but not the only) driver for these differences is the difference in the scope of applicability (types of missions). Other areas are conceptually more similar:

• The component types and approach.
• The TM/TC data management.

4.2.4

Architecture

There are some major differences in the system architecture and structure. The differences brought to light by the study are:

• Infrastructure and communication.
• System integration approach.
• Data access.
• System-of-systems support.
• Processing distribution.
• User interfaces.

Other areas are conceptually more similar:

• The component-based design.
• The service-oriented architecture.

4.2.5

Technology

There are major differences in the selected technology baseline, notably for software development tools. The differences concern:

• Programming language.
• Components and integration framework.
• User interface framework.
• File management.

Some technologies or products are nevertheless common:

• Inter-process communication is based on ZeroMQ.
• Short-term archive is based on LevelDB.
• User directory is based on OpenLDAP.


4.2.6


External Interfaces and Standards

There are fundamental differences in the approach adopted by the two systems for interfaces and standards:

• EGS-CC defines generic interfaces, adhering to relevant standards where possible. Adaptation to a specific implementation of external systems is target specific.
• ISIS specifies many more interfaces, assuming a very specific target.

However, the interfaces implementing international standards can be made interoperable and/or compatible. These interfaces are:

• TM/TC data definition (E-70-31, XTCE).
• M&C services (MO).
• TM/TC data (SLE).

4.2.7

Scenarios Evaluation

Possible convergence scenarios have been evaluated according to relevance and feasibility. Only the scenarios scoring at least medium on both criteria are listed below:

• Inter-operability of the ISIS FDS and Expert W/S with EGS-CC.
• Compatibility of an EGS-CC-based CCS with ISIS.
• Reuse of the ISIS communication component (in the EGS-CC MO).
• Technology feedback (in particular for the MO implementation).
• EGS-CC extensions (communication inspector and media manager).

4.2.8

Convergence Proposals

Finally, considering the results of the convergence analysis, the convergence proposals raised by the EGS-CC consortium aim at ensuring and verifying the compatibility of the EGS-CC external interfaces with the relevant ISIS interfaces (for both the inter-operability and compatibility scenarios). As a consequence of the convergence proposals, technical information about the use of common technologies and standards will need to be exchanged; the form this exchange will take remains to be agreed. The convergence proposals identified by the consortium are:

• Definition and support of an import/export format enabling the exchange of TM/TC data definitions between ISIS and EGS-CC-based systems (e.g. definitions from an EGS-CC-based EGSE system could be transferred directly to an ISIS control centre).
• Joint development and cross-validation of a 'CCSDS MO API' supporting the provider and consumer sides of M&C services. This aims at:

– Ensuring the inter-operability of ISIS and EGS-CC-based systems.


– Promoting the adoption of CCSDS MO as a standardised interface for the provision of M&C services on ground.

This CCSDS MO API is currently being developed within the EGS-CC contract.

5 Conclusion

CNES initiated the ISIS studies around 2007. The ISIS standard was used for the first time by a mission around 2009, and every new CNES mission now uses it. A command control product is under development and will be complete by the beginning of February 2019. Three client missions exist, and CNES expects others in the future. After around ten years, a first ISIS system will have been assembled and made operational. CNES is now on its way to getting its return on investment and to widely deploying the LP CCC ISIS software for the various uses it is designed for. The work of the CNES and industrial teams is about to be rewarded.


Return Link Service Provider (RLSP) Acknowledgement Service to Confirm the Detection and Localization of the SAR Galileo Alerts M. Fontanier, H. Ruiz and C. Scaleggi

Abstract The French Space Agency, CNES, has contributed to the international Cospas/Sarsat program since its creation in 1982. This program is a cooperation of 43 states and agencies committed to detecting and locating radio beacons activated by persons, aircraft or vessels in distress. Within this consortium, the return link service provider, RLSP, will be the facility responsible for the establishment of the return link messages and their coordination with the Galileo system, interfacing on one side with the Cospas/Sarsat system and on the other side with the Galileo Ground Mission Segment (GMS). The first version of the RLSP will enable the return link service provision, including the acknowledgement service, separated into two types:
• Type 1—system acknowledgement (Galileo sends the message automatically when the alert has been received and located),
• Type 2—rescue coordination centre (RCC) acknowledgement (the RLSP transmits the message after authorization from the responsible RCC).
Through the acknowledgement service provision, the RLSP will play a very important role in the Cospas/Sarsat network because, for the first time ever, it will be possible to send feedback messages to the beacons that sent a distress signal, thus closing the loop with the beacons. Considering these functionalities and the interfaces to put in place with the Galileo and Cospas/Sarsat networks, the European Commission entrusted CNES with managing the development of the whole system and with operating the RLSP against demanding objectives. After a brief recall of the RLSP functions and the Cospas/Sarsat system, this paper presents the numerous technologies and methods put in place to guarantee the performance and the high availability of the system (99.95%) in order to ensure operations on a 7 d/24 h basis. The infrastructure and COTS used or developed to implement the RLSP functionalities are described.

M. Fontanier (B) · H. Ruiz
CNES DNO/SA/LMC, CNES, Toulouse 31401, France
e-mail: [email protected]
H. Ruiz
e-mail: [email protected]
C. Scaleggi
CNES DNO/SA/CS, CNES, Toulouse 31401, France
e-mail: [email protected]
© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_15


Design concepts such as redundancy, scalability, virtualization, real time/non-real time, the database and the Web server are detailed. The paper highlights the integration of all these components and their interfaces with external entities: the GMS communicates through a ciphered network, the Cospas/Sarsat network via a VPN, and the rescue coordination centres via the Internet. More than 250 RCCs across the world will connect to the RLSP website to acknowledge distress beacons. Consequently, the architecture is key to the success of this project. A security tradeoff involving national and international actors was made between the architecture and the security measures so that the RLSP can be connected both to a closed, secured environment such as Galileo and to the outside world via the Internet. The main outcomes of this tradeoff are presented in this article. The paper demonstrates how the CNES concept of operations for the RLSP addresses the European Commission’s high-level objectives mentioned above, making the RLSP state-of-the-art in modern technology. Finally, the paper concludes with some important lessons learned from the accreditation, integration and qualification phases and the first months of operations of this system.

Nomenclature

C/S, CS   Cospas/Sarsat
CNES      French National Space Agency
COTS      Commercial off-the-shelf
EC        European Commission
FMCC      French Mission Control Centre
FW        Firewall
GMS       Ground mission segment
GNSS      Global navigation satellite system
KPI       Key performance indicator
MCC       Mission Control Centre
MEO       Medium-altitude Earth Orbit
MEOLUT    MEO Local User Terminal
MEOSAR    MEO Search and Rescue
MF        Message field in Cospas/Sarsat standard interface. MF#: MF Number
MGF       Message generation facility (GMS component)
MMI       Man–machine interface
RCC       Rescue coordination centre
RLM       Return link message
RLMR      Return link message request
RLS       Return link service
RLSP      Return link service provider
S/W, SW   Software
SAR       Search and Rescue
SGSC      SAR/Galileo Service Centre
SGDSP     SAR/Galileo Data Service Provider
SIT       Subject indicator type (each SIT number corresponds to a standardised message exchanged between Cospas/Sarsat components)
SPF       Service product facility (GMS component)
MPLS      Multiprotocol label switching

1 Introduction
The Search and Rescue (SAR) service of Galileo introduces a new SAR function called the MEOSAR return link service, which will be provided through a dedicated facility, the Return Link Service Provider (RLSP). In its first version, the Return Link Service (RLS) provision is separated into two types:
• Type 1—system acknowledgement: Galileo sends return link messages (RLM) automatically when an alert has been received and located
• Type 2—rescue coordination centre (RCC) acknowledgement: the RLSP transmits the RLM after authorization from the responsible RCC.
Through the acknowledgement RLS provision, the RLSP will play a very important role in the Cospas/Sarsat system, providing for the first time the possibility for the SAR system to send feedback to beacons in distress. It is important to note that Galileo, through the RLSP, is the only GNSS committed to enabling the return link service. While the acknowledgement service is already a great improvement over current Cospas/Sarsat operations, additional services are also under study. The RLSP centre infrastructure is being developed under a contract for the European Commission by a consortium led by GMV Spain and composed of GMV, PROSICA and AMOSSYS.

2 Galileo SAR Service
Galileo supports the Search and Rescue service by equipping its satellites with a SAR payload that relays beacon distress signals towards Cospas/Sarsat (1) (Fig. 1). Relayed signals from the different satellites are received by one or several Medium-altitude Earth Orbit Local User Terminals (MEOLUTs) (2). The MEOLUT is in charge of determining the location of the beacon, either by demodulating the beacon message or by processing the times of arrival (TOA) and the Doppler shifts (frequencies of arrival, FOA) of the received signals. The MEOLUT then sends the estimated beacon position and other relevant data to a mission control centre (MCC) (3).


Fig. 1 Galileo SAR service concept

This MCC communicates with the rescue coordination centre (RCC) (4) and with the return link service provider (RLSP) through the French MCC (5A); the MCC sends a return link message request. The RCC can also send a return link message request to the RLSP (5B). The RCC is in charge of launching a rescue operation (6). The RLSP sends the return link message (acknowledgement or other service) to the GMS (7). The GMS inserts the RLM data in the C-band mission uplink (8). The return link message is downlinked to the beacon in the I/NAV signal in E1B (9).
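The end-to-end flow above implies that each return link message request moves through a well-defined sequence of states (the "Manage RLM states" function described later in this paper). As an illustration only, the state names and transitions in the following sketch are hypothetical and are not taken from the actual RLSP design:

```python
class RLMRequest:
    """Toy lifecycle model for a return link message request.

    States and transitions are illustrative only; the real RLSP
    state model is not published in this paper.
    """

    TRANSITIONS = {
        "created":      {"scheduled", "cancelled"},
        "scheduled":    {"sent_to_gms", "cancelled"},
        "sent_to_gms":  {"disseminated", "rejected"},
        "disseminated": set(),   # terminal
        "rejected":     set(),   # terminal
        "cancelled":    set(),   # terminal
    }

    def __init__(self, request_id: str):
        self.request_id = request_id
        self.state = "created"

    def advance(self, new_state: str) -> None:
        # Refuse any transition not listed for the current state.
        if new_state not in self.TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state


# Nominal acknowledgement path: created -> scheduled -> sent -> disseminated.
req = RLMRequest("RLM-0001")
for step in ("scheduled", "sent_to_gms", "disseminated"):
    req.advance(step)
```

Keeping the allowed transitions in one table makes illegal sequences (e.g. disseminating a request that was never sent to the GMS) fail loudly instead of corrupting the request's history.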

3 RLSP Main Functions
The associated functionalities of the RLSP are identified by blocks, as shown in Fig. 2. These functionalities are divided as follows:
1. Manage interfaces
The purpose of this function is to manage the information coming from the different external interfaces: the FMCC, the RCCs and the GMS.


Fig. 2 RLSP high-level functions


Fig. 3 RLSP functional external interface areas

• Interface with the FMCC: This interface is responsible for the communications between the RLSP and the French Mission Control Centre (FMCC). This centre, hosted and operated by CNES Toulouse on a 7 d/24 h basis, exchanges two types of data: alert data and system information. The RLSP receives from the FMCC, through an RLM_Request message, information relative to a distress event coming from an activated beacon. The RLSP transmits the SAR/Galileo status and ephemeris received from the GMS.
• Interface with the RCCs: This interface is responsible for the communications between the RLSP and the rescue coordination centres. These centres can ask the RLSP to send acknowledgement messages (RLM Acknowledgement Type 2). This is done through an authenticated Web portal.
• Interface with the GMS: This interface manages the communications with the GMS. The communications are implemented via the message generation facility (MGF) for return link messages in real time and via the service product facility (SPF) for status and ephemeris information in non-real time (Fig. 3).
2. Manage return link messages
• Generate and manage RLMR: the RLM_Request can be received from different sources (FMCC and RCC)
• Calculate time of RLMR transmission: compute the time at which the RLM message should be transmitted
• Generate list of satellites for dissemination: the RLSP has to provide the list of satellites visible from the position of the beacon at a specific time


• Decode beacon message
• Manage RLM states.
3. Manage SAR/Galileo Status Information
• Manage SAR/Galileo status received from the GMS
• Manage SAR/Galileo ephemeris received from the GMS.
4. Produce statistical information
• This functionality generates statistical information to be used in future reports.
5. Manage RLSP system
• Store/archive data: one of the main functions is to store and archive the data generated by the centre
• Manage log files, configure and supervise the RLSP, synchronize the RLSP with UTC time.
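The "generate list of satellites for dissemination" function amounts to a visibility computation: given the beacon position and the satellite positions, keep the satellites above a minimum elevation mask. The sketch below is a simplified illustration only; the spherical-Earth model, the pre-computed ECEF satellite positions and the 5° mask are assumptions of this example, not the RLSP algorithm:

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth approximation

def geodetic_to_ecef(lat_deg: float, lon_deg: float, r: float = EARTH_RADIUS_KM):
    """Convert latitude/longitude on a spherical Earth to an ECEF vector (km)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

def elevation_deg(beacon_ecef, sat_ecef) -> float:
    """Elevation of the satellite above the beacon's local horizon (degrees)."""
    los = [s - b for s, b in zip(sat_ecef, beacon_ecef)]  # line of sight
    up = beacon_ecef  # on a sphere, local 'up' is the position vector itself
    dot = sum(l * u for l, u in zip(los, up))
    norm = math.hypot(*los) * math.hypot(*up)
    return math.degrees(math.asin(dot / norm))

def visible_satellites(lat_deg, lon_deg, sats_ecef, mask_deg=5.0):
    """Return the names of the satellites above the elevation mask."""
    beacon = geodetic_to_ecef(lat_deg, lon_deg)
    return [name for name, pos in sats_ecef.items()
            if elevation_deg(beacon, pos) >= mask_deg]

# Galileo satellites orbit at roughly 29600 km radius; positions are made up.
sats = {"GSAT-A": (29600.0, 0.0, 0.0),   # directly overhead for a beacon at (0, 0)
        "GSAT-B": (0.0, 29600.0, 0.0)}   # below the horizon seen from (0, 0)
print(visible_satellites(0.0, 0.0, sats))  # only GSAT-A
```

An operational implementation would of course use the SAR/Galileo ephemerides received from the GMS and a proper geodetic model rather than these toy inputs.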

4 Technologies and Design
This section presents the tools selected to make the RLSP state-of-the-art, together with the design concepts adopted to guarantee a long lifetime for the RLSP service. Note that these technologies were chosen during the conceptual phase and endorsed during the qualification campaign. In addition, the need to enhance configuration management was raised by the security team; for this purpose, the Git tool has been installed on the platform to track changes to the configuration files of COTS such as PgPool, the WAF/firewall, NTP …
Common Functionalities

Procured software                 Selected COTS
Operating system                  RedHat
Secure shell (SSH) software       OpenSSH
Network time protocol software    NTPd
Virtualization software           VMware/vSphere virtualization software
Backup software                   Bacula
Monitoring tool                   Nagios
Antivirus software                ClamAV
File transfer software            vsftpd
Configuration manager             Git, Ansible
Directory services software       OpenLDAP
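As an aside, custom checks can be added to a Nagios-style monitoring setup as small plugins that follow the standard exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL). The disk-usage check below is a generic illustration with hypothetical thresholds, not one of the actual RLSP checks:

```python
import shutil

def check_disk(path: str = "/", warn_pct: float = 80.0, crit_pct: float = 90.0):
    """Nagios-plugin-style disk check: returns (exit_code, status_line)."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * (usage.total - usage.free) / usage.total
    if used_pct >= crit_pct:
        return 2, f"DISK CRITICAL - {used_pct:.1f}% used on {path}"
    if used_pct >= warn_pct:
        return 1, f"DISK WARNING - {used_pct:.1f}% used on {path}"
    return 0, f"DISK OK - {used_pct:.1f}% used on {path}"

# A real plugin would print the status line and exit with the returned code;
# Nagios reads the first line of output and maps the exit code to a state.
code, line = check_disk()
print(line)
```

The same pattern (one short status line plus a three-valued exit code) covers the CPU, memory, swap, process, RAID, NTP and SSH checks mentioned in the text.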


For monitoring, the open-source solution Nagios has been selected for its built-in capabilities: it allows monitoring all kinds of server parameters, ranging from basic variables such as CPU load, memory use, swap usage, logged-in users and disk space, to more complicated ones such as the status of selected running processes, RAID, NTP and SSH. This is the main reason for using Nagios. It also alerts users when things go wrong and alerts them again when the problem has been resolved. Moreover, it generates information that can be visualized in a Web browser, which is very useful for operating the RLSP on a 7 d/24 h basis.
Linux RedHat has been chosen for its wide use and because it is the Linux-based operating system with the longest support for each release. The operating systems of the VMs will be updated manually through a dedicated server on which Ansible is installed.
Regarding the virtualization software, VMware has been selected because the performance and robustness of VMware infrastructures are much higher than those of other available virtualization software. The updates of the VMware Tools and of the virtual hardware will be performed manually through the vCenter Server. Note that the HW infrastructure is a combination of several physical servers and virtual machines:
– Operational functionalities (CORE) are hosted on physical servers to benefit from computation power and high performance
– Management services (MMI, Web) are hosted on virtualized machines to benefit from VMware maintenance capabilities
Scalability, Extendibility and Flexibility

Procured software          Selected COTS
Virtualization software    VMware/vSphere virtualization software
Web server                 Node.js
Backup software            Bacula

Scalability, extendibility and flexibility are a must for the RLSP service. This need does not only come from the programmatic context, but also from:
– The expected evolution of the service
– The increase in the number of users, such as the ATC users
– New interfaces, such as the GNSS service centre.
The need for scalability mainly concerns the design of the centre and the dimensioning of aspects such as server capabilities, the number of I/F ports or rack capacities. At this stage of the project, sufficient margins have been taken into account to allow the growth of the service. For example, Bacula allows the project to maintain high performance even with a much larger volume of data to back up. In addition, the tool is able to cope with a combination of virtual and physical environments like that of the RLSP.


Operational tools

Procured software                       Selected COTS
Search server based on Lucene           Elasticsearch
Tool for managing events and logs       Logstash
Dashboard for Elasticsearch/Logstash    Kibana

One of the design objectives was to implement a system that is as autonomous as possible, running on a 7 d/24 h basis but operated with human control during working hours only. Consequently, the sequencing of processing has been implemented within the RLSP software, which makes it possible to transmit return link requests coming from a beacon in distress to the Galileo mission segment even if nobody is working on the RLSP. In addition, when the operational team needs to investigate problems that occurred during non-working hours, Kibana provides a powerful visualization engine whose main features are the analysis of streaming data, time-based comparisons, a variety of charts and maps, and a powerful search syntax.
Availability
The availability budget given to the RLSP system is 99.95% of the time, corresponding to a maximum outage of about 4 h a year. This high-reliability requirement imposes a design with a long mean time between failures (MTBF) and a short mean time to repair (MTTR). These properties must be built into the system using proven off-the-shelf hardware when available, redundancy when required, and the other principles described below. The twin design principles of modularity and distribution ensure that failures in one part of a distributed, modular system do not adversely affect other parts. The deployment of a virtual architecture aims to guarantee as much continuous operation as possible, reducing recovery time in case of failure to a minimum. Since the mean time to repair directly affects availability, the RLSP has been designed to minimize the time required to diagnose and fix problems:

– System module failures will generate alarms
– Diagnostic routines will facilitate finding the source of problems
– Spare parts will be inventoried strategically
– Records will be maintained to aid in the continuous improvement of the system reliability.

In order to minimize the time it takes for the RLSP to recover after a power failure, uninterruptible power and connectivity have been supplied to the critical equipment. For example:
– Servers will have a double power supply connected to two different PDUs
– Servers will be equipped with two different network interface cards
– Two different HBA cards will also be provided for the disk array connection.
Additionally, the RLSP architecture is robust to failures in order not only to meet the availability requirement, but also to ensure that managed data are not lost; consequently, servers will have a RAID 1 configuration for their hard disks and RAID 5 for the data storage.
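The relationship between the availability budget and the MTBF/MTTR design drivers above is simple arithmetic; the following sketch (with illustrative figures only) shows how a 99.95% budget translates into allowed yearly downtime:

```python
HOURS_PER_YEAR = 365.25 * 24  # ~8766 h

def allowed_downtime_hours(availability: float) -> float:
    """Yearly outage allowance implied by an availability budget."""
    return (1.0 - availability) * HOURS_PER_YEAR

def availability_from_mtbf_mttr(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of a repairable system."""
    return mtbf_h / (mtbf_h + mttr_h)

# A 99.95% budget allows roughly 4.4 h of outage per year, consistent
# with the ~4 h figure quoted in the text.
print(f"{allowed_downtime_hours(0.9995):.2f} h/year")

# Illustration: one failure per year (MTBF ~ 8766 h) repaired within
# 4 h stays inside the budget.
print(f"{availability_from_mtbf_mttr(HOURS_PER_YEAR, 4.0):.6f}")
```

This is why the design focuses on both terms at once: redundancy lengthens the effective MTBF, while alarms, diagnostics and strategically inventoried spares shorten the MTTR.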


Finally, the RLSP system operates in high-availability mode (cluster recovery mechanism), which guarantees that if an active server, firewall, operational database or LDAP server fails (is unplugged or ceases to function), the other node becomes active without the service being affected by the failure. Even with all these measures in place, the failover tests identified the need to connect the Mistral equipment to static transfer switches (STS) so that this equipment, too, has uninterruptible power.
Non-Real Time/Real Time

Procured software                 Selected COTS
Operating system                  RedHat
Network time protocol software    NTPd
Monitoring tool                   Nagios

The non-real-time component implements the functionalities in interface with the GMS and the FMCC (see the other sections of this article for more details). For the implementation of these functions, a Java development has been chosen in order to reuse as much code as possible from other projects. The real-time component also implements interfaces, but in addition performs the time computations for RLMR transmission; it manages the RLM states and computes the information for dissemination. This component is developed using C, C++ and shell scripting for the same reasons.
Relational Database

Procured software              Selected COTS
Operating system               RedHat
Secure shell (SSH) software    OpenSSH
Virtualization software        VMware/vSphere virtualization software
Relational database            PostgreSQL, PGPool

PostgreSQL has been selected as it is widely used in business environments and is supported by a large open-source community. PostgreSQL offers many advantages:
– There is no licensing cost associated with the software
– It has an important community of professional contributors
– It has lower maintenance and tuning requirements than the leading proprietary databases
– It offers very good reliability and stability
– It is extensible: the source code is available to all at no charge
– It is designed for high-volume environments.


Besides, PostgreSQL is the default database deployed on CNES projects. As a lesson learnt from the operational qualification campaign, PostgreSQL used as a database cluster is one of the heavier COTS to operate, as it requires a more complex and manual recovery procedure than the other components in case of failure.
Web Server (RCC)

Procured software                 Selected COTS
Operating system                  RedHat
Virtualization software           VMware/vSphere virtualization software
Web server                        Node.js
Web application firewall (WAF)    Fortinet FortiWeb
File transfer software            vsftpd

The interface between the rescue coordination centres (RCC) and the RLSP system is provided through a Web portal protected by a Web application firewall (WAF). The architecture is based on a Node.js Web server. This technology was chosen because it is the most common approach and was expected to generate fewer problems while offering higher security. This solution also makes for a lightweight and efficient Web server, fully compatible with the most common browsers used around the world (more than 250 RCCs will be able to use this interface). The technology/language used is JavaScript. This component runs in a virtualized environment because of the advantages provided by this kind of architecture and because the functionalities provided by these COTS are not subject to specific timing requirements.

5 High-Level Architecture
The RLSP project challenge was to find the best way to integrate all the components presented above in an architecture that at the same time allows secure interfaces with external entities. As mentioned above, the RLSP communicates with the Galileo segment through a ciphered network, with the Cospas/Sarsat network via a VPN, and with the rescue coordination centres via the Internet. Consequently, the architecture is key to the success of this project. A security tradeoff involving national and international actors was made between the architecture and the security measures so that the RLSP could be connected both to a closed, secured environment such as Galileo and to the outside world via the Internet. The RLSP service network transmitting all operational data flows is segregated into three different functional environments:
– RLSP core real-time processing chain
– RLSP core non-real-time processing chain
– RLSP Web server and MMI/monitoring and control.


In terms of security, however, this has materialized in two main zones:
• On the left side of Fig. 4 is the RLSP Web, which allows the connection of external users to the platform. Infrastructure in this zone is virtualized. Also in this zone is the management infrastructure, composed of all the auxiliary elements needed to manage the whole platform:

– The monitoring tool
– The console for managing the virtualization environment
– The directory server
– The log server
– The backup server.

• On the right side of the figure are the CORE RLSP components, with their two subcomponents, one for real time (RT) and one for non-real time (NRT). Infrastructure in this zone is based on physical servers.
In this network architecture, it is important to highlight that these two security zones have separate network hardware (switches); within a security domain, VLANs are used to configure and segregate the different networks, to prevent intruders from the Internet from accessing an unauthorized zone and then pivoting onto the Galileo interface. These service network areas are separated by active security mechanisms (firewalls). Note that these chains cannot be completely separated, because some data flows need to be transmitted between them; however, data transmission between the areas can be restricted to the minimum needed for the RLSP functionalities. This separation is also applied to the management networks (for monitoring and control), although these networks do not come into contact with the outer protecting firewalls. The pair of firewalls interfacing with the FMCC firewall has several functions:
– Implementing VPN tunnelling towards the FMCC
– No VPN tunnelling for the RCCs (more than 250 RCCs)
– Providing secure communication and, to a certain extent, segregation between the different service networks: the RLSP RT branch, the NRT branch and the Web server
– Providing additional security measures for the traffic coming from the FMCC firewall.
The firewall clusters installed between the RLSP core RT and NRT servers and the corresponding Mistral devices provide an additional barrier towards the GMS. Note that a test platform reflecting the operational one is also deployed on CNES premises. Virtualization is used much more extensively there, as it facilitates maintenance and the validation of software upgrades on the one hand, while on the other hand the need to perform real-time calculations is less important.


Fig. 4 Logical architecture


6 Operations Concepts
The design drivers presented above, such as flexibility, scalability and automation, have led to an RLSP operational concept reviewed and accepted by the CNES operational team already in charge of the FMCC operations. All the ensuing procedures are written so as to guarantee the proper execution and continuity of the service. The RLSP service will be operated on a 7 d/24 h basis. Routine operation is automatic, including routing and long-term data archiving. The maintenance approach, based on the test platform for validating upgrades during the operational phase, allows upgrading the system without interrupting the provision of the service.
Operational Roles/Working Positions
The RLSP team will be composed of a number of expert staff available at CNES. The following profiles, among others, will be covered by the team:

Profile                            Required experience
Operations preparation engineer    System development with focus on operational aspects
Test engineer                      Test, verification and validation of complex systems
(HS/R&D/security) engineer         Engineering activities related to HS/R&D/security, respectively
Operations engineer                Daily operations of complicated systems and on-site first-level maintenance

Outside of working hours, the need is to supervise the system (HW or RLSP service anomalies) in order to be able to call the on-duty operations engineer if needed. Therefore, only limited information dedicated to system supervision is displayed on a remote screen area supervised on a 24 h/7 d basis. Consequently, the specific RLSP team can be on site during working hours only, without impacting continuous service delivery.
Operational Procedures
The operational procedures are based on the “Installation, Operations and Maintenance Document” delivered by GMV, and act as the operator’s guideline for the proper operation of the RLSP. Any task to be performed in the RLSP operational chain is documented in detail. Moreover, it is crucial that this information be immediately available to all the RLSP operators. At the time of writing, these procedures are written and include:
– The way activities have to be performed, to maintain consistency with technical operations and to always perform operations in the same manner
– The processes for maintaining, calibrating and using the equipment


– The quality control and quality assurance
– The specific security procedures (local SecOps).
During the operational qualification phase, these procedures were enhanced and validated.
Operational Scenarios
The section below lists all the operational scenarios identified for the nominal operation of the RLSP. The scenarios are classified depending on whether they are devoted to the operation or to the maintenance of the RLSP infrastructure. In both cases, routine and non-routine tasks are considered.

Scenarios (operator intervention indicated in parentheses):

Level 0: RLSP interfaces and messages management
  Level 1: Routine tasks
    Level 2: Produce statistical information
      Report generation: weekly, monthly, yearly and on-demand (Yes)
    Level 2: Manage interfaces
      Interface to Galileo GMS (No)
      Interface to the FMCC (No)
      Interface to the RCC (No)
    Level 2: Manage return link messages
      Generate and manage RLMR (No/Yes optional)
      Calculate time of RLMR transmission (No)
      Generate list of satellites for dissemination (No/Yes optional)
      Decode beacon message (No)
      Manage RLM states (No)
    Level 2: Manage SAR/Galileo Status information
      Manage SAR/Galileo status (No)
      Manage SAR/Galileo ephemeris (No)

Level 0: RLSP infrastructure operation and maintenance
  Level 1: Non-routine tasks
    Level 2: Simulators
      GMS simulator (Yes)
      FMCC simulator (Yes)
      RCC simulator (Yes)
    Level 2: Contingency scenarios, manage interfaces
      GMS NRT interface failure (Yes)
      GMS RT interface failure (Yes)
  Level 1: Routine tasks
    Level 2: Manage RLSP system
      Store/archive data (No)
      Manage log files (No)
      Configure and supervise RLSP (Yes)
      Synchronize RLSP with UTC time and GST time (No)
    Level 2: Monitoring and control
      RLSPINF infrastructure monitoring (Yes)
  Level 1: Non-routine tasks
    Level 2: Administrative
      Preventive maintenance (Yes)
      SW updates and configuration control (Yes)
    Level 2: Anomaly
      Anomaly reporting (Yes)

Key Performance Indicators
The RLSP system provides the operational team with the capability of generating reports over a period of time (weekly/monthly/yearly and over a period defined by the operator) computing statistics on the following information:
– The number of RLM_Requests
– The list of RLM Request_IDs with status “closed” and, for each, the number of RLMR repetitions and, among these, the numbers of disseminated, rejected and cancelled repetitions (in order to have indicators per beacon)
– The number of RLMRs rejected by the GMS
– The number of RLMRs cancelled
– The number of disseminated RLMs
– The percentage of disseminated RLMs compared to the number of RLM demands


– The percentage of RLMRs sent to the GMS compared to the number of RLM_Requests created
– The average time to disseminate an RLM (output time minus input time).
Figure 5 shows an example of the RLSP MMI with standard statistics on RLM requests. The KPIs will allow the operational team to ensure that the performance of the RLSP meets the targets established by the European Commission throughout the lifetime of the system:
– Transmission to the GMS of a given RLM request as soon as possible, and no later than 1 s after the reception of the relevant interface message from the FMCC
– The ability to transmit up to 30 RLM requests per minute to the GMS
– The ability to handle peaks of at least eight new RLM requests per minute.
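The report statistics listed above are straightforward aggregations over the stored RLM records. The sketch below illustrates the idea on a toy record structure; the field names and statuses are hypothetical and do not reflect the actual RLSP data model:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class RLMRecord:
    """Toy record for one return link message request (illustrative fields)."""
    request_id: str
    created: datetime
    status: str                              # 'disseminated', 'rejected' or 'cancelled'
    disseminated_at: Optional[datetime] = None

def kpi_report(records):
    """Aggregate a few of the statistics named in the text."""
    diss = [r for r in records if r.status == "disseminated"]
    delays = [(r.disseminated_at - r.created).total_seconds() for r in diss]
    total = len(records)
    return {
        "rlm_requests": total,
        "disseminated": len(diss),
        "rejected": sum(r.status == "rejected" for r in records),
        "cancelled": sum(r.status == "cancelled" for r in records),
        "disseminated_pct": 100.0 * len(diss) / total if total else 0.0,
        "avg_dissemination_s": sum(delays) / len(delays) if delays else None,
    }

t0 = datetime(2019, 1, 1, 12, 0, 0)
sample = [
    RLMRecord("R1", t0, "disseminated", t0 + timedelta(seconds=30)),
    RLMRecord("R2", t0, "disseminated", t0 + timedelta(seconds=90)),
    RLMRecord("R3", t0, "rejected"),
    RLMRecord("R4", t0, "cancelled"),
]
print(kpi_report(sample))  # 50% disseminated, average delay 60 s
```

Computing the averages and percentages from archived records in this way is what lets the same code serve the weekly, monthly, yearly and on-demand report periods.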

7 Security Impacts on the RLSP Development Plan
Since the beginning of the project, we have anticipated that the security topics are closely linked to the design and operational concepts, and this has been fully confirmed during the last phases. The major topics are detailed hereafter. First, the volume of logs generated by the system was revised upwards and forced us to increase the disk space; consequently, the backup procedures were updated accordingly. At the time of writing, a tradeoff is ongoing to find the right logging level: sufficient data to support investigations, but not so much that the system saturates and becomes difficult to operate. The accreditation process leading to the connection to the Galileo operational chain has imposed deliveries at several stages of the development process, such as test reports, bug-fixing evidence, validation control documents, conformity matrices … Finally, we observe that the development planning is frequently challenged by the accreditation milestones. For instance, independent security penetration tests have been organized in addition to the usual validation phases foreseen in a project. This is not the end of these activities, which will continue through the correct implementation of cyber-security policies that will enhance the safety and resiliency of the RLSP system and network.

8 Conclusion
At the time of writing, the RLSP system has been validated on CNES premises using both simulators and the real interfaces of the external entities (GMS, FMCC and RCC). During the operational qualification, the operators were trained and increased their skills on the procedures in order to fulfil the high availability objectives. Patches are under development to fix issues and take some changes into account.

Fig. 5 MMI standard report page

410 M. Fontanier et al.

Return Link Service Provider (RLSP) Acknowledgement Service …


At this stage of the project, one of the most important lessons learned concerns the build-up of compliance with the security constraints defined in the accreditation process. This activity shall not be underestimated, as it is time-consuming and may have impacts on operational concepts, even when security concerns are involved and considered from the definition phase of the architecture. Today the accreditation process is still ongoing. Before the RLSP service commissioning (mid-2019), the complete loop will be validated using the Galileo mission segment in order to check the completeness of the key performance indicators.

Acknowledgements All of the authors thank the other members of the RLSP project: the European Commission, GMV Spain and the Cospas-Sarsat team, for providing useful information from their various publications.

Automated Techniques for Routine Monitoring and Contingency Detection of LEO Spacecraft Operations Ed Trollope, Richard Dyer, Tiago Francisco, James Miller, Mauro Pagan Griso and Alessandro Argemandy

Abstract The flight control teams of two low Earth orbit missions at EUMETSAT present an overview of the automated tools and methodologies being used to analyse and report on spacecraft health including trend analysis, data mining and outlier detection. A qualitative analysis of the techniques is provided based on in-flight experience, and proposals for future development of such toolsets are presented. This paper focuses on the experiences of the Copernicus Sentinel-3 and EPS MetOp flight control teams in using the EUMETSAT CHART framework, which allows engineers to define automated reports and perform ad hoc analysis on large data sets with multiple input sources. Arguments are also presented regarding whether or not it may be appropriate for future missions to consider applying some of these techniques directly on-board as an extension of the currently in-place FDIR mechanisms.

Nomenclature

CHART  Component Health Analysis & Reporting Tool
EDAC   Error Detection and Correction
EPS    EUMETSAT Polar System
FCT    Flight Control Team
FDIR   Failure Detection, Isolation and Recovery
GOME   Global Ozone Monitoring Experiment (an instrument aboard MetOp)
GNSS   Global Navigation Satellite System
HKTM   Housekeeping Telemetry
IASI   Infrared Atmospheric Sounding Interferometer (an instrument aboard MetOp)
LEO    Low Earth Orbit
MCS    Mission Control System
NANU   Notice Advisory to NAVSTAR Users
NRT    Near Real-Time
SLSTR  Sea and Land Surface Temperature Radiometer (an instrument aboard Sentinel-3)
SRAL   SAR Radar Altimeter (an instrument aboard Sentinel-3)
SVM    Service Module (the platform components of MetOp, distinct from the Payload Module)
TM     Telemetry
WODB   Working Operational DataBase

E. Trollope (B) · R. Dyer · T. Francisco · J. Miller Flight Operations Department, EUMETSAT, 64293 Darmstadt, Hessen, Germany e-mail: [email protected] M. P. Griso Operations & Services to Users, Serco Services GmbH, 64293 Darmstadt, Hessen, Germany A. Argemandy Engineering Services, LSE Space GmbH, 64293 Darmstadt, Hessen, Germany © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_16

1 Introduction

For space missions where the duration of direct communication with the spacecraft is limited, such as low Earth orbit missions where passes are limited to ~10 min every ~90 min, the ability to rapidly analyse and diagnose spacecraft health in the event of an on-board contingency is critical. This urgency is mitigated to some extent by the robust design of modern spacecraft and the increase in on-board automation for error handling (FDIR, EDAC, etc.) as presented in Ref. [1], but further improvements to the ground segment can still be made, bringing real benefits to investigations. There is also a cost benefit to space agencies in reducing the investigation time, or in avoiding engineers being called out to investigate, outside of normal working hours, scenarios that are in actuality mundane.

The routine monitoring of the health of a modern spacecraft is based on two central pillars: predefined ground and on-board “limits” on telemetry, and the generation of “events” by the spacecraft. While these concepts provide a very robust means of identifying and understanding the status of a spacecraft at any given time and the sequence of events that led to that status, they must be complemented by careful analysis of the data for any concerning trends or unusual behaviour that would not be detected by these two means alone. A clear distinction is drawn between the four time frames in which the routine monitoring of LEO missions must be performed:

(1) monitoring of the real-time health status;
(2) evaluating the health between communications passes in near real-time (NRT);
(3) evaluating any new or evolving trends with potential short-term consequences;
(4) monitoring of the long-term spacecraft performance.

Real-time status checking may require an immediate (within a matter of minutes or hours) response from ground, depending on the level of autonomy involved with the mission in question. However, playback of the telemetry accumulated since the previous communications pass is unlikely to complete before the end of the pass, with corresponding impacts on the nature and timeframe of the subsequent NRT analysis. Similarly, analysis of new or developing trends typically takes a longer term view, and thus can afford to be slightly more laborious in terms of processing time and may be performed with a reduced frequency, e.g. once per week or once per month [2]. For the EPS MetOp and Copernicus Sentinel-3 missions at EUMETSAT, the real-time monitoring is done exclusively via the mission control system (MCS); NRT monitoring and analysis through a combination of the MCS and the component health analysis and reporting tool (CHART); and weekly trending analysis is performed exclusively with CHART. Long-term performance analysis is performed in conjunction with industry on a quarterly/biannual basis, but is considered to be outside the scope of this paper.

Having adequate tools in place to access and analyse spacecraft data is a fundamental component of the knowledge management strategy of any space mission. It is noted, therefore, that the benefits of spacecraft data analysis tools should include those identified for the “correct implementation of a [Knowledge Management] system” for a space project in Ref. [3], and consequently these goals are adopted throughout this paper to evaluate the tools described:

(1) a common way of working on information;
(2) error minimisation in operation;
(3) quick answer/reaction time;
(4) ease the arrival of newcomers.

2 Real-Time Analysis Techniques for Spacecraft Health

A typical pass duration for LEO missions is of the order of 10 min. Consequently, it is important that key parameters be checked immediately against their expected values, which is achieved via the well-established concept of high and low monitoring limits (the so-called Out-Of-Limits, or OOL, approach). A description of the evolution and limitations of the OOL approach widely used today can be found in Ref. [4]. These issues are compounded by the trends noted in Ref. [5] regarding the reduction in available manpower contrasting with an increase in the diversity of commanding and housekeeping telemetry parameters, which is exemplified by the two missions covered by this paper (see Table 1): with one decade between the launch dates of the initial satellite in each series, we can see a doubling of both the number of telecommands and telemetry parameters and a halving of the available manpower to define and maintain suitable monitoring limit sets (with similar implications for the NRT and trending analysis). Clearly, it is impossible to manually check tens of thousands of parameters in the time frame available, but while all HKTM is equal from the perspective of data volume, some HKTM is more equal than others in the eyes of the mission control system (MCS) and flight control team (FCT).

While it is widely acknowledged that manual checks cannot be performed [6, 7], there is no such limitation on performing automated checks by the MCS. However,

Table 1 Comparison of monitoring and commanding data set sizes for MetOp and Sentinel-3

                                 MetOp     Sentinel-3
Year of first launch             2006      2016
# HKTM parameters                21,000    46,000
# Ground monitoring limit sets   6000      1000
# Telecommands                   2000      4000
# SOEs                           13        7

these checks must also be defined and maintained on the ground, and the reduction in manpower combined with the increase in on-board autonomy has led increasingly to the prioritisation of a subset of key parameters in the definition of ground monitoring limits by the ground teams. The statistics for MetOp and Sentinel-3 presented in Table 1 are in line with the observation in Ref. [5] that even automated checks are not performed on the majority of parameters in real time. There is also an argument to be made that it is preferable to have a smaller number of key parameters OOL that clearly identify the nature of a problem at a given time, such as an instrument’s mode, rather than have all of the parameters that are not at their usual values as a result of the problem go OOL at the same time. This argument is strongly supported by our criterion of achieving a quick answer/reaction time.

For communications passes in which commanding is possible, which is true every orbit for MetOp but only for 2 of 14 orbits per day for Sentinel-3, it is clearly mandatory that the automated monitoring be performed by the MCS. Arguments can be made, especially for communications passes in which no commanding is possible, to augment the monitoring done by the MCS with other toolsets, although these seem more likely to be NRT. However, such augmentation should keep in mind, in particular, our previously acknowledged criterion of maintaining a common way of working on information. Some possible augmentations are discussed later.

The automated checks performed by the MCS conform extremely well to all four of our acknowledged knowledge management criteria, with just one exception: responding to contingency scenarios. The MCS checks end, by design, with the notification that “something” is not as expected. The FCT must then rely on other documentation and tools to identify the actual anomaly and determine the appropriate course of action; a series of actions that potentially involve the contingency operations guide, flight control procedures, the anomaly processing tool, a catalogue of how previous operations were handled, and one or more tools for comparing telemetry series against those seen on previous occasions (e.g. CHART). Although the usage of each of these tools and the process that connects them is well defined, work is ongoing at EUMETSAT to reduce the number of toolsets required; a change that clearly benefits all four of our knowledge management goals.


3 Near Real-Time Analysis Techniques for Spacecraft Health

In addition to the real-time data reported during a pass, LEO spacecraft must dump all the HKTM (and science data) collected throughout the orbit. This data must in turn be ingested into the appropriate ground systems and checked in a similar manner to the real-time data. Additional checks can also be performed because the time constraints are not as severe as for the real-time analysis, but not to such an extent that significant delays in the processing are acceptable: it is obviously necessary to avoid processing delays so as to alert the FCT of any significant detections as soon as possible, and the ultimate constraint is that all orbital processing must be completed before the next pass begins at the very latest, to prevent a backlog from developing. In addition to storing the raw and calibrated data received, the CHART architecture processes the data to support additional analysis; this includes the calculation of orbital statistics, which can subsequently be run through an outlier detection algorithm, or through other algorithms defined by the FCT, whose results are stored alongside the spacecraft data in the CHART database.

The nature of 24/7 monitoring in the control room is such that the monitoring performed by the MCS must highlight any spacecraft parameters that require immediate action or investigation by the engineering team, and any monitoring that highlights minor deviations from nominal behaviour must not interfere with that. However, distinguishing between the two is no small challenge. Figure 1 illustrates a real-life example of this: on 11 August 2014, a small jump was reported in the attitude errors of MetOp-B, which was immediately corrected by the spacecraft itself without any need for ground intervention, and the values reported were well within the defined safe ranges. The deviation, therefore, did not trigger any OOLs in the real-time monitoring performed by the MCS, but was detected via the outlier detection algorithm, whose results go directly to the engineering team rather than to the controllers. Consequently, the event was investigated during working hours and identified as a suspected micrometeorite or very small debris impact, without an unnecessary call-out of the on-call engineer. It is also noted that while it would have been possible for this particular event to have been detected by tightening the ground monitoring limits to values closer to those seen during nominal operations, this event demonstrates why it would not have been appropriate to do so.

Fig. 1 Small attitude jump on MetOp-B due to a suspected micrometeorite or very small debris impact on 2014-08-11, detected via a novelty detection algorithm

3.1 Automated Detection of On-Board Events by Ground Systems

The tools that the authors use provide various automatically generated events that were originally monitored manually, a very tedious process in some cases, and that were subsequently coded into algorithms that can easily be run against the data set automatically or on request. An example of this is an anomaly encountered by the MetOp team in which the data write pointer of a solid-state recorder would randomly skip over an address block and write into the subsequent block, leaving old data in the read path. This did not cause any loss of recorded data, but the block of old data caused processing errors due to the timestamps going backwards. When the error first occurred, these processing errors were not understood and were not attributed to the jump of the pointer, which was not monitored; it is very hard to spot a one-block jump among many blocks. The problem was quickly analysed and understood, and the then-daily manual memory write and read step checks were replaced by autonomous checking in CHART. The algorithm had to take into account:

(1) telemetry values not being updated regularly; sometimes the same value is contained in consecutive frames;
(2) data gaps, ignoring artificial jumps (false positives);
(3) wraparound of the counter.

This cannot fix the problem, but it does reduce the investigation time, as a notification of the event is emailed to the FCT automatically by CHART as soon as the problem occurs.
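A check of this kind can be sketched as follows. This is an illustrative re-implementation, not the CHART code: the function name, the gap threshold and the input format (time-ordered (timestamp, pointer) pairs) are assumptions made for this sketch.

```python
def pointer_skips(samples, modulus, max_gap_s=60.0):
    """Detect write-pointer block skips in a list of (timestamp_s, value)
    telemetry samples, ordered by time.

    modulus   -- wraparound value of the pointer counter
    max_gap_s -- comparisons across longer gaps are ignored, since data
                 gaps would otherwise produce artificial jumps (false positives)
    """
    skips = []
    prev_t, prev_v = None, None
    for t, v in samples:
        if prev_v is not None and t - prev_t <= max_gap_s:
            step = (v - prev_v) % modulus  # wraparound-safe increment
            if step > 1:                   # 0 = value repeated, 1 = nominal advance
                skips.append(t)
        prev_t, prev_v = t, v
    return skips
```

Taking the step modulo the counter size makes a wrap from the last block back to zero look like a nominal advance of one, while repeated values (step 0) and samples separated by a data gap are silently ignored, covering the three cases listed above.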

3.2 Outlier Detection at EUMETSAT

Outlier detection algorithms such as that presented in Ref. [4] can be used to identify unusual behaviour that may be indicative of developing problems, without needing telemetry to go outside the defined ground or on-board limits. Reference [6] provides an excellent summary of the different implementations used in the spacecraft operations domain and their evolution up to 2016. The four-dimensional Euclidean distance method is just one example of novelty detection, and has proved to be especially effective at detecting changes in behaviour which occur over short timescales. Although a very slow, continuous trend since launch may not be detected at first, ultimately it would lead to many (or all) results being flagged as outliers, as shown in Fig. 2. Such trends and seasonal patterns can be compensated for by pre-processing the data before inputting it into the outlier detection algorithm, by means described in Refs. [8, 9], but the details of the methodology are outside the scope of this paper.

Fig. 2 Example of a a HKTM parameter with a long, slow trend and b the resulting output from inserting one week of data into the CHART outlier detection tool, showing yellow, orange and red outliers that would trigger an investigation by the FCT. Training data is presented in black, nominal target data in green

Outlier detection was initially performed using ESOC’s novelty detection algorithms as described in Ref. [4]: our teams compiled the training and target data sets using CHART, transferred them to an ESOC server for analysis, and retrieved the results via FTP. This approach proved very successful and our teams were very happy with the results it produced; however, we ultimately needed to develop an independent approach:

• The interface between EUMETSAT and ESOC was not an operational one: there was no on-call support, service level agreement, etc. This could occasionally lead to outages of the service;
• We lacked some visibility of the results of the algorithm, such as is provided by the plot in Fig. 2, for example. Such figures are very important for engineers to understand why a novelty has been detected;
• We had no ability to tune the algorithms to different parameter types; e.g. stable 8-bit parameters need to be handled slightly differently to 16-bit parameters;
• There was a strong desire to integrate the algorithms into the CHART tool so that we could take full advantage of its reporting and analysis capabilities;
• Occasionally, an event would not be detected.
Moving to an in-house algorithm enabled the team to have a clearer understanding of why a given event was not detected, enabling an update of the toolset to compensate.

It was considered preferable to avoid spending a large amount of effort developing new algorithms, knowing that similar work was already being done by other teams. Our intention was first to utilise an off-the-shelf toolkit and combine it with the existing CHART architecture, with a particular focus on the so-called one-class support vector machine methodology. The scikit-learn™ toolkit looked promising initially; however, it was found that two parameters (nu and gamma) needed tuning for each HKTM parameter being analysed. A solution to this is described in Ref. [10], but at the time the FCT lacked the expertise to perform this tuning, and time constraints meant that it was actually easier to implement a known algorithm from scratch in order to avoid unpredictable learning obstacles. This algorithm was based on the semi-supervised 4D Euclidean distance to the nearest neighbour approach, as described in Refs. [4, 11]. The detailed implementation is outside the scope of this paper, but while the algorithm is very straightforward it is also very effective. As our approach relies on the SOE-led selection of parameters and applicable training data ranges, effort is instead being focused on developing and improving the tools available to the FCT for identifying and maintaining these ranges. Internal guidelines state that “training data sets should generally only cover cases when the satellite and payload are in their fully nominal status and stabilised thermally” so as to avoid false negatives due to identifying anomaly signatures as nominal behaviour, as illustrated in Fig. 3.

Fig. 3 Temperature data (orbital min, max, average) from the MetOp-A GOME instrument, with shaded areas highlighting the data identified for use as training data by the outlier detection tool

The team uses the outlier detection tool to monitor approximately 200 parameters on each of the MetOp satellites. Outliers detected by the algorithm are scored against a novelty threshold value and divided into colour-coded levels of severity (green, yellow, orange and red), as illustrated in Fig. 2. We consider that investigation is needed if any orange or red flags are raised, or if more than 25% of the test data in a given week raises yellow flags. Our teams also note that although the computations are done in 4D, an estimated 95% of outliers show up in the Std.-Dev. versus Mean plots, so these are the most useful in reports for understanding why flags have been raised.

The selection of parameters to be monitored via outlier detection methods must also consider the “granularity” of the parameter, which can be thought of as its position within the hierarchy of the satellite data, as illustrated in Fig. 4. If the parameter is too high in the hierarchy, the usefulness of the detection is lost, as it is unclear exactly where the problem lies within the unit, and the monitoring will trigger very often. Conversely, if the parameter is too low in the hierarchy there are too many parameters to maintain easily, and a single on-board error can lead to a larger number of detections than is optimal for operational purposes.

Fig. 4 Pros and cons of selecting different parameters for monitoring by outlier detection

The authors also note that it is useful to apply pre-filtering of detections by grouping parameters into “master” and “slave” sets, according to their level of influence upon each other:

• A master parameter is one which is explicitly set by telecommand or on-board software, for example the mode of an instrument.
• A slave parameter is one which is affected by a master.

The relationship between master and slave parameters can be one:one, many:one, one:many or many:many. The algorithm first performs novelty detection on the master parameters; if a novelty is detected on a master parameter, then novelty detection is not performed on any associated slave parameters over the respective time period.

Our teams found that outlier detection can only be reliably implemented for any given subsystem once we have a good-sized data set of nominal behaviour, i.e. it cannot easily be prepared before launch or used immediately after it, requiring some months to establish the requisite nominal data sets, assuming the reliance on actual data is maintained. This observation is in line with a study performed by ESA in partnership with KU Leuven and S.A.T.E. (Systems & Advanced Technologies Engineering, an Italian company), in which one year of training data from the Venus Express mission was used [5, 9]. Being unable to rely on such techniques from launch is not a problem per se,1 however, along with the fact that such techniques do not produce results specific enough to be used for on-board FDIR, it is another reason that they can only be considered complementary to the traditional OOL concept. Any effort spent in developing such techniques must be made in addition to the effort that must be spent establishing the OOL concept, as they cannot replace it, and is in conflict with our knowledge management objective of maintaining a common way of working on information. Some data could be used from ground test campaigns or artificially generated, or in the case of multi-satellite systems taken and adapted from those of an earlier satellite, but at the risk of introducing false positives. This has not yet been attempted by the EUMETSAT teams, but comparisons can be drawn between the data sets of the already-flying satellites MetOp-A and MetOp-B, as illustrated by Fig. 5.

It was observed that the algorithm was oversensitive when the nearest neighbours in the nominal data were dense and undersensitive when they were sparse. This problem arises because the nominal data sets generated from operations typically have fuzzy edges. The effect of this is illustrated in Fig. 6, where a sparse region of training data at point “A” is in close proximity to one or more densely populated regions.
This can lead to classification of target data falling within the sparsely populated “gap” as a novelty, because the X nearest neighbours are in densely populated regions. Increasing the value of X used by the algorithm until the problem of false positives is eliminated leads to target data, such as the point identified near “C”, being classed as “green” instead of “yellow” or “red”, as would be desirable. To work around this issue, the authors adapted the algorithm such that it autonomously relaxes the novelty thresholds when the nearest neighbours in the training data set are closely packed and tightens them when they are sparse. This eliminates the need to manually tune the algorithm to each monitored item individually. The authors note that outlier detection within CHART at EUMETSAT is currently only implemented for MetOp, and will be extended to Sentinel-3 in the future.
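The overall approach can be illustrated with a short sketch. The distance computation follows the 4D nearest-neighbour idea of Refs. [4, 11]; the clamped, density-scaled threshold is a simplified stand-in for the adaptive behaviour described above, and all names and tuning values (`factor`, `lo`, `hi`, `k`) are hypothetical, not the CHART implementation.

```python
import math

def _dist(a, b):
    """Euclidean distance between two equal-length vectors
    (here: 4D orbital statistics of min, max, mean, std)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_spacing(train, k=3):
    """Mean distance from each training vector to its k nearest training
    neighbours: small where nominal data is densely packed, large where sparse."""
    spacing = []
    for i, p in enumerate(train):
        d = sorted(_dist(p, q) for j, q in enumerate(train) if j != i)
        spacing.append(sum(d[:k]) / min(k, len(d)))
    return spacing

def detect_novelties(train, targets, factor=3.0, lo=0.05, hi=0.5):
    """Flag target vectors whose distance to the nearest nominal vector
    exceeds an adaptive threshold.  Clamping the density-scaled threshold
    between lo and hi relaxes it where nominal data is closely packed
    (avoiding oversensitivity) and tightens it where nominal data is sparse."""
    spacing = local_spacing(train)
    flagged = []
    for t in targets:
        # Distance to the nearest training vector, plus that vector's local density.
        d, s = min((_dist(t, p), spacing[i]) for i, p in enumerate(train))
        threshold = min(max(factor * s, lo), hi)
        if d > threshold:
            flagged.append(t)
    return flagged
```

A real deployment would additionally normalise each statistic to the training data and map scores onto the green/yellow/orange/red severity levels; those steps are omitted here for brevity.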

1 From experience, the expertise, size and focus of both the Flight Control and Industrial Support teams are at their maximum shortly after launch, and a shortage of resources for trend analysis and monitoring is less of an issue.

Fig. 5 Comparison of the first six years of one in-flight temperature on MetOp-A (blue) and MetOp-B (dates offset by 2192 days)



Fig. 6 Example of a data set containing a sparsely populated region in the training data set at “A” in close proximity to a densely packed cluster of data at “B”, and a data point in the target data set around the edges of the nominal data set at “C”

3.3 Lessons Learned from Events Detected

3.3.1 Switch-off of IASI Compensation Device

On 18 December 2014, a spurious switch-off of the IASI Compensation Device (used to prevent exported torque due to the cube corner in the Michelson interferometer) occurred on MetOp-B. The IASI CD was added very late in the MetOp programme, so there is no direct monitoring of it in the spacecraft telemetry: its status must be inferred from other parameters. The outlier detection tool spotted a reduction in the power consumption of the subsystem controlling the CD and a simultaneous increase in the Y-axis wheel currents (Fig. 7). Following the subsequent investigation, the impact on products turned out to be favourable, due to the removal of ghosting artefacts from vibrations; this had always been suspected, but had not been confirmed until then. The impact on the spacecraft platform was confirmed as acceptable, and so switching the device off was proposed as the new operational baseline at the IASI Revue d’Exploitation.

Inferring or confirming anomalies through indirect means is not uncommon in spacecraft operations. For example, a recurrent anomaly on the Sentinel-3A OLCI instrument must similarly have its origin and status inferred from other parameters. Using the CHART architecture, the FCT can convert the checks associated with these inferences into automated algorithms to improve response times during contingency scenarios.


Fig. 7 Monitoring of the a IASI subsystem power and b Y -axis wheel currents allowed the detection of a switch-off of the IASI Compensation Device

3.3.2 Impact of MHS Switch-off on the MetOp-B Service Module

In September 2015, the MHS instrument on MetOp-B was switched off. This reduced the disturbances from the scan mirror mechanism, and so the overall estimated rates recorded by the service module were also reduced (see Fig. 8), which was detected by the outlier detection tool.

Fig. 8 A reduction in rates reported by the MetOp-B SVM in 2015, due to a switch-off of the MHS instrument

As with the micrometeorite detection described above, this event did not trigger any OOL alarms on the service module, as the reduction in rates was not problematic for the spacecraft. In other words, while it was not a “false positive” detection, it was nevertheless a detection of a secondary effect of a genuine non-nominal spacecraft status that was not in itself non-nominal. It is important to note that, as a service module (SVM) parameter, if it had triggered OOL alarms then investigation of the SVM alarm would have been prioritised over those raised by the payload (MHS). The authors, therefore, distinguish between primary and secondary monitoring tools, as follows:

• A primary monitoring tool monitors for critical events and parameters requiring an immediate investigation or response from the ground at any time of day or night.
• A secondary monitoring tool checks for lower priority “novelties” and unexpected behaviour which is to be investigated during office hours.

We suggest that adopting this distinction from the design phase of future missions onwards could be of benefit: by not defining MCS limits for less critical parameters, but instead defining “secondary monitoring tool limits” in parallel, a more versatile and comprehensive monitoring approach could be achieved. In this context, the authors conclude that while outlier detection is extremely useful as a secondary monitoring tool, it is not yet ready to be included in primary monitoring toolsets.
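Purely as an illustration of this distinction, a routing rule for detections might look like the following; the channel names and the policy itself are hypothetical, not an actual EUMETSAT configuration:

```python
def route_detection(severity, source):
    """Route a detection to the appropriate monitoring channel.

    severity -- one of "red", "orange", "yellow", "green"
    source   -- "primary" (e.g. an MCS OOL/event check) or
                "secondary" (e.g. an outlier detection tool)

    Primary-tool detections reach the 24/7 control room regardless of
    severity; secondary-tool detections are queued for office hours,
    with orange/red flags prioritised.
    """
    if source == "primary":
        return "control-room alarm (24/7)"
    if severity in ("red", "orange"):
        return "office-hours investigation (priority)"
    return "office-hours investigation (routine)"
```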

4 Offline Trend Analysis Techniques for Spacecraft Health

In addition to confirming that no parameters are currently OOL and that behaviour is not significantly “novel” according to outlier detection, the FCT must evaluate any new or evolving trends with potential short-term consequences, such as a parameter that is currently within limits but is rapidly heading towards them. Detecting such trends allows the FCT to take action before such limits are reached, potentially preventing an outage of mission data or more serious events from occurring.


Fig. 9 Sentinel-3A SLSTR Blackbody temperature started rising towards the upper operational limit roughly 6 months after launch, before stabilising again

The Sentinel-3 SLSTR blackbody temperature has an upper operational limit of 305 K and was configured to remain at approximately 302 K following the commissioning phase. For roughly 6 months after launch, the temperature was stable at 302 ± 0.5 K, but then started rising towards the limit before levelling off again, as shown in Fig. 9; a trend that is a nominal seasonal effect due to the varying distance between the Earth and the Sun. Automatically identifying changes in trends such as this is highly valuable for flight operations teams.
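A minimal sketch of trend detection towards a limit, assuming per-orbit mean values as input; the helper name and the simple least-squares extrapolation are illustrative, not the CHART implementation:

```python
def orbits_until_limit(values, limit):
    """Fit a least-squares line through recent per-orbit means and
    extrapolate to estimate how many orbits remain until the operational
    limit would be crossed.  Returns None if the trend is flat or moving
    away from the limit."""
    n = len(values)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, values)) / sxx
    last = values[-1]
    if slope == 0 or (limit - last) * slope <= 0:
        return None  # flat, or trending away from the limit
    return (limit - last) / slope
```

The sign check `(limit - last) * slope` handles upper and lower limits symmetrically: the estimate is only produced when the fitted slope actually points towards the limit.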

4.1 Pre-processing of HKTM with CHART

When working on data sets with tens of thousands of parameters that produce a sample every few seconds over a decade or more, it is clear that data reduction techniques must be a fundamental part of any analysis toolset. The first technique utilised within CHART is that of orbital statistics: reducing all values of a given parameter over a period of ~100 min to just four values (minimum, maximum, mean and standard deviation). The nature of LEO missions makes such time windows a natural choice, given that the environmental effects are relatively constant from one orbit to the next, with one notable exception: radiation. The orbits of MetOp and Sentinel-3 are such that their ground tracks repeat after 29 days (412 orbits) and 27 days (385 orbits) respectively, known as a “repeat cycle”. This means that every repeat cycle the spacecraft experiences the same number of crossings of the infamous South Atlantic Anomaly, making it a very suitable time window for many parameters with a link to the geographic position of the spacecraft, such as the behaviour of on-board software errors or the mode budget of the Sentinel-3 radar altimeter SRAL.

CHART is an extensible toolset, such that the FCT can define new algorithms to store pre-processed data for later retrieval and analysis, in addition to the generation of events and notifications as mentioned previously. It also allows for the overlaying of data from different time periods, allowing for an easy comparison of behaviour during an event with that seen during previous events. Such algorithms can be scheduled to run over any time period and with any frequency, allowing the FCT to develop specialised analysis tools according to their needs. A simplified overview of the process is illustrated in Fig. 10, the detailed design being outside the scope of this paper.

Fig. 10 Simplified overview of CHART data pre-processing concept

Our teams note that this pre-processing allows for significant data reduction through orbital or temporal statistics or derived parameters, e.g. NEDT. Consequently, the monitoring of one derived parameter can be equivalent to monitoring a much larger number of on-board parameters, reducing the number of checks that must be performed and subsequently the number of alarms/detections that may trigger in parallel.

Our teams found that the vast majority of detections by the outlier detection algorithms tested were false positives, which we defined as any behaviour that is not indicative of an anomaly on-board. The teams currently spend approximately one hour per week per spacecraft investigating and eliminating false positives. This can be reduced by changing the thresholds used to classify an outlier, but ultimately a trade-off must be accepted between too many false positives and the risk of missing something useful or important. Some false positives were identified as being due to missing data from the orbital set, often due to changes on board which were expected, but also sometimes due to ground issues.
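The orbital-statistics reduction described at the start of this subsection can be sketched as follows. The orbit period value is illustrative (~100 min), and grouping samples by dividing the timestamp is a simplification of however CHART actually delimits orbits:

```python
import math

def orbital_statistics(samples, orbit_period_s=6080.0):
    """Reduce raw (timestamp_s, value) samples to per-orbit statistics:
    (orbit index, min, max, mean, standard deviation, sample count).

    The sample count is kept alongside the four statistics so that
    downstream filters can reject orbits affected by data gaps.
    """
    orbits = {}
    for t, v in samples:
        orbits.setdefault(int(t // orbit_period_s), []).append(v)
    stats = []
    for orbit in sorted(orbits):
        vals = orbits[orbit]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats.append((orbit, min(vals), max(vals), mean, math.sqrt(var), len(vals)))
    return stats
```

With a sample every few seconds, this reduces roughly a thousand raw values per orbit to a single 4D vector, which is exactly the input format assumed by the nearest-neighbour novelty detection described in Sect. 3.2.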
Rather than adding the number of samples within the orbital set as an additional dimension to the algorithm, an alternative solution has been adopted in which the tool checks that the total number of samples is above a predefined percentage of the expected number (taken as the mean number of data points per sample in the training data for a given parameter); samples with too few points are removed from the data being examined for that orbit. An example of this is illustrated in Fig. 11. It should be noted that, in the event of a genuine on-board event causing the total loss of data from a particular parameter, the loss would be detected by other means (e.g. OOL), so removing the parameter in this way does not risk introducing significant false negatives to the toolset. We also get false positives from fairly stable, low-resolution (8-bit) parameters. We overcome this by working on raw data where it is available—it is normalised

Automated Techniques for Routine Monitoring …

429

Fig. 11 a Spike in Metop-A MHS channel 3 NEDT on 2017-06-15, and b plot of orbital average vs orbital standard deviation of Metop-A MHS channel 3 NEDT throughout 2017 with the outlier event shown in (a) circled in red, and a second outlier circled in green caused by a gap in data during the orbit that is filtered out

to the training data (such that the min/max/mean/standard deviation of the training data are all in the range 0–1), so the actual values are abstract. By working on the raw data instead, we can simply skip any parameters for which the training data lies within a 2 LSB range and the test data is within 1 LSB of the training data.
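These two filters — the sample-count check and the low-resolution skip rule — can be sketched as follows, under assumed data structures (CHART's internal representation is not shown in the paper):

```python
def filter_incomplete_orbits(orbits, expected_count, min_fraction=0.8):
    """Drop orbital samples with too few points, as described in the text.

    expected_count is taken as the mean number of points per orbit in the
    training data for this parameter; min_fraction is an assumed threshold.
    """
    return [o for o in orbits if o["count"] >= min_fraction * expected_count]

def is_low_resolution_stable(train_min, train_max, test_min, test_max, lsb):
    """True when a parameter should be skipped: its training data spans no
    more than 2 LSB and its test data stays within 1 LSB of the training
    range (applied to raw, unnormalised values)."""
    narrow_training = (train_max - train_min) <= 2 * lsb
    within_band = (test_min >= train_min - lsb) and (test_max <= train_max + lsb)
    return narrow_training and within_band
```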

4.2 Post-processing of HKTM with CHART

4.2.1 Data Subsampling and Automated Reporting

For extremely long durations, CHART also offers the ability to subsample both the original and reduced data sets. This very useful feature is applied automatically by the CHART plotting tool (GUI) whenever a user attempts to plot more data samples than the plot has pixels in width, allowing engineers to use a single tool either to retrieve all data for a short duration to analyse a specific feature or anomaly, or to plot the same data over the entire length of the mission. Figure 12 utilises several of these features, showing orbital statistics subsampled 3:1 and multiple years of data presented offset such that each month is aligned vertically with the same dates of the other years. Parameters like this can pose a variety of challenges for OOL and outlier detection concepts, including long-term and seasonal trends and considerable noise—here the variations within a single orbit, or from one orbit to the next, are far greater than the underlying trends for several months at a time. Such challenges can be overcome; however, a simple visualisation like this makes it very easy for an engineer to visually compare behaviour from one year to the next and predict generic future behaviour accordingly. CHART provides the FCT with the ability to produce automated reports containing plots like this, either on a fixed schedule or on request. These reports are built from templates which are defined by the FCT and make use of an extensible library of

Fig. 12 Orbital average values (subsampled 3:1) for Sentinel-3A SLSTR blackbody heater temperature in 2016 (red), 2017 (green) and 2018 (blue)

430 E. Trollope et al.


Fig. 13 Example of a normalised temperature with anomaly event, and b actual temperature with same event

widgets that retrieve the stored (pre-processed) data and can manipulate and format it for presentation [12]. This allows for a "middle ground" between automated detection methods and manual checks, such that the team can define a series of checks they wish to perform on a daily/weekly basis and have CHART automatically prepare the reports for the engineer to review. This also helps facilitate the training and handover of new team members, as the reports are already defined, so a new engineer does not need to learn how to find the parameters or analyses they should be reviewing on a case-by-case basis.
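The width-aware decimation described above might be sketched as follows; the min/max bucketing strategy is an illustrative assumption (it keeps short spikes visible), as the paper does not detail CHART's actual implementation:

```python
def subsample_for_plot(values, plot_width_px):
    """Subsample a series whenever there are more samples than pixels,
    returning one (min, max) pair per pixel column so that short spikes
    are not lost by naive decimation."""
    n = len(values)
    if n <= plot_width_px:
        return [(v, v) for v in values]  # nothing to decimate
    out = []
    for px in range(plot_width_px):
        lo = px * n // plot_width_px
        hi = max(lo + 1, (px + 1) * n // plot_width_px)
        bucket = values[lo:hi]
        out.append((min(bucket), max(bucket)))
    return out
```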

4.2.2 Anomaly Spotting via Normalisation of Temperature Gradients

When the MetOp team defined one such report, it was decided that thermistors on the same board, which sit at different absolute temperatures, should be normalised on the graphs, allowing the graphs to show greater detail and hence improving the visibility of deviations in the temperatures over the reporting period. The temperatures were seen to evolve


together as expected for the same panel/area, but then the team noticed a divergence of the temperatures, as can be seen in Fig. 13. The two graphs in Fig. 13 cover the same period and temperature parameter, and they highlight quite well what normalisation of these temperatures brings to the reports. Investigation also showed a rise in power consumption for the board at the time of the temperature increases, with no change to output power. By analysing the actual temperature variations in relation to the thermistor positions on the board (the deviation being smaller further from the heating source), the team could quickly narrow down the affected area and hence the possible components at fault. As this board is in hot redundancy with another power board, the problem did not cause any outage to the subsystem, and the events have since reduced in frequency and duration. With improved automation and algorithms, the power rise and the individual temperature deviations from the "norm" could be flagged automatically, without relying on visual inspection of the graphs.
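The normalisation used for such graphs can be sketched minimally; the exact scheme used in the MetOp reports is an assumption, but a per-thermistor min-max mapping to the 0–1 range matches the behaviour described:

```python
def normalise(series):
    """Min-max normalise one thermistor's temperatures to the 0-1 range so
    that thermistors at different absolute temperatures can share one graph
    and small relative divergences become visible."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # constant series map to all zeros
    return [(v - lo) / span for v in series]
```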

5 Additional Data Sources

The authors argue that analysis tools for spacecraft operations should not focus exclusively on a single data source—the spacecraft telemetry. In addition to the systematic reduction of large data sets to ensure usability, through techniques such as those summarised by Ref. [13], and the automated detection of predefined signatures described above, a fundamental principle of modern data mining is to combine data from as many sources as possible that can benefit the understanding of the timing or causes of target phenomena—internal, external and public. Other satellites within a constellation or mission, including those launched at different times (such as the three MetOp and four Sentinel-3 spacecraft), can provide valuable reference data for a comparative analysis of observed phenomena. The US Coast Guard "Notice Advisory To NAVSTAR Users" (NANU) bulletins, which are distributed via email and are available online, frequently contain all the information necessary to explain error messages generated by on-board GNSS units, either in advance of or shortly after an outage of a GPS satellite. As these bulletins are generated from a machine-readable template, it is relatively straightforward for modern analysis tools to ingest them automatically. Other data sources include outputs from the mission planning and flight dynamics systems and their associated processes and logs; for example, on both MetOp and Sentinel-3 the relative orbit number within a repeat cycle determines whether or not the spacecraft will fly through the South Atlantic Anomaly, and to what extent. This information can be taken as an additional dimension within outlier detection algorithms or included in plots and reports. Outlier detection algorithms can be effectively applied to these inputs too, as demonstrated by Ref. [14].
The authors consider that a comprehensive analysis performed by engineers will always take such additional sources into account, and so the analysis tools should be designed to perform similarly.
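Because NANU bulletins follow a machine-readable template, ingesting them can be sketched with a simple pattern match. The field layout below approximates the bulletin header style and should be validated against current Coast Guard bulletins before use:

```python
import re

# Illustrative pattern based on the NANU bulletin header style; the exact
# field layout is an assumption to be checked against real bulletins.
NANU_HEADER = re.compile(
    r"NANU TYPE:\s*(?P<type>\w+).*?"
    r"NANU NUMBER:\s*(?P<number>[\w-]+).*?"
    r"PRN:\s*(?P<prn>\d+)",
    re.DOTALL,
)

def parse_nanu(text):
    """Extract the key fields needed to correlate a GPS satellite outage
    with error messages from an on-board GNSS unit."""
    m = NANU_HEADER.search(text)
    return m.groupdict() if m else None
```

The extracted PRN and outage identifier could then be matched against the timestamps of on-board GNSS error events, as the text suggests.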


6 Future Developments

6.1 Application of Techniques Developed Across Toolsets

The authors note that algorithms developed to support the identification of non-discrete parameters (e.g. see Ref. [9]) could potentially be applied to the maintenance and enhancement of monitoring limits within the MCS/WODB. However, the authors advise caution with such an application, for the reasons described above. Following the observations related to the distribution of discrete and non-discrete parameters in Ref. [9], a similar analysis was performed against Sentinel-3 telemetry from 1st January 2018 to 30th April 2018, with similar results, in that the majority of TM parameters (ca. 50%) can be considered constant. The authors foresee a use of this analysis in helping to eliminate TM parameters for which outlier detection is not appropriate, namely any parameters which can be described as non-discrete. At this point, it is a task for the FCT to manually identify the most significant parameters and to apply the appropriate monitoring strategy. The aforementioned analysis indeed drastically reduces the number of parameters to be managed, without losing any relevant information that can be acquired by other means, such as on-board events. This approach is also beneficial in terms of computational cost, especially with spacecraft WODBs becoming increasingly large (Fig. 14).
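The distinct-value analysis behind Fig. 14 can be sketched as follows; the parameter names are illustrative, while the real analysis ran over the full Sentinel-3A HKTM set:

```python
from collections import defaultdict

def classify_parameters(samples):
    """Count distinct reported values per HKTM parameter and separate
    constants (a single distinct value) from the rest, mirroring the
    distinct-value analysis summarised in Fig. 14."""
    distinct = defaultdict(set)
    for name, value in samples:
        distinct[name].add(value)
    constant = {n for n, vals in distinct.items() if len(vals) == 1}
    varying = set(distinct) - constant
    return constant, varying
```

Constant parameters could then be excluded from outlier detection up front, reducing both false positives and computational cost.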

Fig. 14 Count of HKTM series reporting a given range of distinct values, observed in Sentinel-3A HKTM between 1st January and 30th April 2018


6.2 Merging of Toolsets

In order to improve the efficiency of contingency response, CHART could be expanded to contain a database of further information and of actions that need to be executed once certain predefined TM signatures and patterns are detected. The system could then inform the FCT via email with a report of what happened, together with the information deemed relevant before and after the anomaly detection. Currently, missions like Sentinel-3 use an offline tool that provides a list of signatures which must be verified manually, using CHART or the MCS. Merging this functionality into CHART would make the detection and identification of anomalies much quicker and allow for more efficient data collection, as well as reducing the number of tools required. In keeping with the theme of expanding the CHART system to incorporate tools that currently exist offline, new functionality could be added to allow users, be they the FCT or other partners such as the EUMETSAT science teams, to create operation requests via an online portal. These requests would require the input of the date of the activity and of what needs to be done. The FCT would then need to accept the request via the same portal and provide the system with additional information, ranging from which units are going to be affected, what spacecraft events might be generated and whether any OOLs are expected, to what procedures will be used, and even identification of potential anomalous outcomes of the request. Having this information available within the same tool used to generate periodic reports would allow for faster and more efficient reporting. Once ingested by the system, all of this information could be used to prepare a special operations request the FCT can follow—another task currently performed manually using OTS software. Once the operation is completed, the system could also generate an automatic report based on the inputs given by the FCT at acceptance level.
Since the data would be available to the system, all associated processes checking for and reporting on anomalies would know that certain OOLs are expected during this period and, instead of raising unnecessary alarms and informing the FCT, would flag them as expected.
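Such expected-OOL suppression could be sketched as follows; the tuple layout for the planned-operation windows is an assumption modelled on the postulated operations-request portal:

```python
from datetime import datetime

def classify_ool(event_time, parameter, planned_windows):
    """Label an out-of-limit event as 'expected' when it falls inside a
    planned-operation window that declared that parameter, else 'alarm'.

    planned_windows: list of (start, end, {expected parameter names}),
    as might be captured from the operations-request portal at acceptance.
    """
    for start, end, expected in planned_windows:
        if start <= event_time <= end and parameter in expected:
            return "expected"
    return "alarm"
```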

6.3 On-board Datamining by Spacecraft

The authors note that it would theoretically be possible to move the outlier detection checks currently performed on ground onto the spacecraft itself, just as spacecraft currently perform limit monitoring. This would require that the outlier detection algorithms be sufficiently improved that false positives are mostly eliminated, and that the necessary computing power is affordable. Such an "outlier detection system" on future missions could be linked to common spacecraft functionalities that are already available today, such as event reporting, event-action responses and on-board monitoring. Once configured properly, the new functionality might allow for more on-board autonomy and faster resolution of problems. A wide variety of complexity could be


Fig. 15 Potential concept for on-board exploitation of outlier detection

made available in the checks or signatures identified, and similarly in what the spacecraft would do when these are detected—ranging from generating an event in telemetry, requesting an additional report or enabling a diagnostic packet, up to reconfiguring or switching off the unit. Such functionality could even be expanded to allow for autonomous decision making, based on the data that is available on-board. For example, when a signature was identified, the spacecraft could check whether or not there was an upcoming communication pass and respond differently depending on the answer. If this functionality were available today, such a decision tree would be possible on a spacecraft that utilises the Packet Utilisation Standard (PUS), which allows for the definition of custom services. Assuming the flight control team could configure such a service to generate on-board events, these events could then be linked to event actions and on-board monitoring. The example given above could be implemented within PUS as shown in Fig. 15. It is questionable whether the cost associated with moving such systems on-board LEO satellites would be worthwhile. For most missions, this would be pointless: the results are too vague to attach an FDIR action to, there is no real requirement for improved timeliness of such monitoring compared to what is already possible today, and an on-board implementation would be far more costly and less flexible than one on ground. However, it might be useful for operating a very large constellation, where economies of scale in software development may help rebalance the equation. Major issues with TM/TC frequency and ground station allocation for such large constellations lead to the assumption that the satellites would normally be muted from communications with ground controllers, but could request ground intervention whenever an outlier is detected.
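The decision tree described in the example could take the following shape; the thresholds, severities and action names are assumptions for illustration, not a flown design:

```python
def on_signature_detected(severity, seconds_to_next_pass):
    """Illustrative on-board decision tree for an outlier signature, of the
    kind that could be built from PUS events, event-actions and on-board
    monitoring. All thresholds and action names are assumed values."""
    if severity == "critical":
        return "switch_to_redundant_unit"
    if seconds_to_next_pass < 600:
        # Ground contact is imminent: just report the event in telemetry.
        return "raise_event"
    # No pass soon: gather extra evidence for the next downlink.
    return "enable_diagnostic_packet"
```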


7 Conclusion

The authors believe that automated techniques for the routine monitoring and contingency detection of LEO operations should be divided into primary and secondary toolsets. Primary toolsets, embedded within the MCS itself, should be used solely to identify critical changes in spacecraft health that require an immediate intervention from ground. Secondary toolsets should be used to support the detections of the primary toolsets through such means as the identification of secondary effects and of lower priority trends or deviations in spacecraft behaviour that can be investigated during working hours. Such a distinction could be useful in the design phase of the ground segment of future missions. The authors also believe that secondary monitoring toolsets should not be designed exclusively for the goal of data analysis, but should also support operations planning and reporting, thereby strengthening the knowledge management strategy of the mission. The authors believe that outlier detection algorithms are extremely useful in identifying anomalies before they develop into serious problems on-board, and further enhancements to the toolsets available today can be expected to be of great benefit to flight operations teams. In particular, the algorithms tested at EUMETSAT have proved too sensitive for use as a primary monitoring tool, but are very well suited to use as a secondary monitoring tool.

Acknowledgements The authors extend their thanks to Mike Elson, Paul Raval, and the other members of the EUMETSAT TSS team for their support and hard work in developing the CHART toolset, and to the Data Analytics Team within the Advanced Mission Concepts Section of ESA ESOC for their cooperation and support in the initial stages of the EUMETSAT integration of novelty detection concepts into CHART.
Ed Trollope would also like to thank Richard Dyer, for his work on the implementation of the EUMETSAT outlier detection algorithm, plus Nico Feldmann, Ry Evill and Jonathan Schulster for their support in the development of CHART-S3. The authors would also like to thank Helene Pasquier for her editorial support and advice, building on the SpaceOps 2018 conference paper by the same authors (Ref. [15]).

References

1. Francisco, T., Trollope, E., Montero, D., & Ventimiglia, L. (2018). What it has been like to fly and operate Europe's ocean and land watcher, Copernicus Sentinel 3. In SpaceOps Conference 2018. Marseille, France: AIAA. https://doi.org/10.2514/6.2018-2416.
2. Galet, G. (2017). Data mining: Using machine learning for spacecraft housekeeping purpose. In SpaceOps Workshop 2017. Moscow, Russia: AIAA.
3. Holm, J., & Moura, D. (2003). Draft position paper on knowledge management in space activities. In 54th International Astronautical Congress. Bremen, Germany: IAF.
4. Martínez-Heras, J. A., Donati, A., Kirsch, M. G., & Schmidt, F. (2012). New telemetry monitoring paradigm with novelty detection. In SpaceOps Conference 2012 (pp. 11–15). Stockholm, Sweden: AIAA. https://doi.org/10.2514/6.2012-1275123.
5. Evans, E., Martinez, J., Korte-Stapff, M., Brighenti, A., Brighenti, C., & Biancat, J. (2016). Data mining to drastically improve spacecraft telemetry checking: An engineer's approach. In SpaceOps Conference 2016 (p. 2397). Daejeon, Korea: AIAA. https://doi.org/10.2514/6.2016-2397.
6. O'Meara, C., Schlag, L., Faltenbacher, L., & Wickler, M. (2016). ATHMoS: Automated telemetry health monitoring system at GSOC using outlier detection and supervised machine learning. In SpaceOps Conference 2016 (p. 2347). Daejeon, Korea: AIAA. https://doi.org/10.2514/6.2016-2347.
7. Gil, J. C., Narula, N., & Lopez, T. (2014). There can be only one: Heterogeneous satellite fleet automated operations with a single tool and language, the MEASAT case. In SpaceOps Conference 2014 (p. 1924). Pasadena, USA: AIAA. https://doi.org/10.2514/6.2014-1924.
8. Fernández, M. M., Yue, Y., & Weber, R. (2017). Telemetry anomaly detection system using machine learning to streamline mission operations. In 6th International Conference on Space Mission Challenges for Information Technology (SMC-IT) (pp. 70–75). IEEE.
9. Evans, E., Martinez, J., Korte-Stapff, M., Vandenbussche, B., Royer, P., & De Ridder, J. (2016). Data mining to drastically improve spacecraft telemetry checking: A scientist's approach. In SpaceOps Conference 2016 (p. 2398). Daejeon, Korea: AIAA. https://doi.org/10.2514/6.2016-2398.
10. Fuertes, S., Pilastre, B., & D'Escrivan, S. (2018). Performance assessment of NOSTRADAMUS and other machine learning-based telemetry monitoring systems on a spacecraft anomalies database. In SpaceOps Conference 2018. Marseille, France: AIAA. https://doi.org/10.2514/6.2018-2559.
11. Iverson, D. L. (2008). System health monitoring for space mission operations. In 2008 IEEE Aerospace Conference (pp. 1–8). Big Sky, USA: IEEE. https://doi.org/10.1109/aero.2008.4526646.
12. Schulster, J., Evill, R., Rogissart, J., Phillips, S., Dyer, R., & Feldmann, N. (2018). CHARTing the future—An offline data analysis and reporting toolkit to support automated decision-making in flight operations. In SpaceOps Conference 2018 (p. 2637). Marseille, France: AIAA. https://doi.org/10.2514/6.2018-2637.
13. Wang, C., Chen, M. H., Schifano, E., Wu, J., & Yan, J. (2016). Statistical methods and computing for big data. Statistics and Its Interface, 9(4), 399–414. https://doi.org/10.4310/SII.2016.v9.n4.a1.
14. Martínez-Heras, J., Boumghar, R., & Donati, A. (2016). Log novelty detection system. In SpaceOps Conference 2016 (p. 2432). Daejeon, Korea: AIAA. https://doi.org/10.2514/6.2016-2432.
15. Trollope, E., Dyer, R., Francisco, S., Miller, J., Griso, M. P., & Argemandy, A. (2018). Analysis of automated techniques for routine monitoring and contingency detection of in-flight LEO operations at EUMETSAT. In SpaceOps Conference 2018 (p. 2532). Marseille, France: AIAA. https://doi.org/10.2514/6.2018-2532.

The Added Value of Advanced Feature Engineering and Selection for Machine Learning Models in Spacecraft Behavior Prediction

Ying Gu, Gagan Manjunatha Gowda, Praveen Kumar Jayanna, Redouane Boumghar, Luke Lucas, Ansgar Bernardi and Andreas Dengel

Abstract This paper describes the approach of one of the top-ranked prediction models at the Mars Express Power Challenge. Advanced feature engineering methods and information mining from the Mars Express Orbiter open data constitute an important step, during which domain knowledge is incorporated. The available data describe the thermal subsystem power consumption and the operational context of the Mars Express Orbiter. The power produced by the solar panels and that consumed by the orbiter's platform are well known to operators, as opposed to the power consumption of the thermal subsystem, which reacts to keep subsystems within a given range of working temperatures. The residual power is then available for scientific observation. This paper presents an iterative and interactive pipeline framework which uses machine learning to predict the thermal power consumption more accurately. The prediction model, along with the estimate of the thermal power consumption, also provides insight into the effect of the operational context, which could help operators to better exploit spacecraft resources and thereby prolong mission life.

Y. Gu (B) · G. M. Gowda · P. K. Jayanna · A. Bernardi · A. Dengel German Research Centre for Artificial Intelligence, 67663 Kaiserslautern, Germany e-mail: [email protected] Y. Gu DTMS, 55118 Mainz, Germany G. M. Gowda · P. K. Jayanna · A. Dengel CS Dept, TU Kaiserslautern, Germany R. Boumghar · L. Lucas European Space Agency - ESOC, 64293 Darmstadt, Germany © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_17
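The pipeline shape the abstract describes (engineer context features, fit a model, predict thermal power) can be illustrated minimally. The field names and the plain least-squares model below are simplifications assumed for this sketch, not the authors' actual feature set or learner:

```python
def engineer_features(record):
    """Toy feature-engineering step: turn one raw context record into model
    features. The fields (Sun distance, eclipse seconds) are illustrative
    stand-ins for the Mars Express context data, not the actual schema."""
    return [
        1.0 / record["sun_distance_au"] ** 2,  # solar-flux proxy
        record["eclipse_s"] / 3600.0,          # eclipse hours in the interval
    ]

def fit_linear(features, targets):
    """Ordinary least squares via the normal equations, solved with Gaussian
    elimination; a stand-in for the more elaborate models used in practice."""
    n, d = len(features), len(features[0]) + 1
    X = [[1.0] + row for row in features]  # prepend a bias column
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(d)] for i in range(d)]
    b = [sum(X[k][i] * targets[k] for k in range(n)) for i in range(d)]
    for col in range(d):  # forward elimination with partial pivoting
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            A[r] = [arc - f * acc for arc, acc in zip(A[r], A[col])]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w

def predict(w, feats):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], feats))
```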


440

Y. Gu et al.

Nomenclature

ε      Root mean square error
ĉ_xy   Predicted value for the xth timestep in the fourth Martian year of the yth parameter
r_xy   Reference value for the xth timestep in the fourth Martian year of the yth parameter
N      Total number of evaluated measurements, x ∈ [1, N]
M      Total number of predicted parameters, y ∈ [1, M]

…12°, and Sun-Probe-Earth angle >3°.
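From the nomenclature, the challenge's evaluation metric is the root mean square error over all N timesteps and M parameters; a plausible reconstruction (the typeset equation did not survive extraction) is:

```latex
\varepsilon = \sqrt{\frac{1}{N \cdot M}\sum_{y=1}^{M}\sum_{x=1}^{N}\left(\hat{c}_{xy} - r_{xy}\right)^{2}}
```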

Recommendations Emerging from an Analysis of NASA’s …

497

Fig. 17 Comparison of baseline, "RF-Only," and "Combined RF-Optical" options from a downlink volume perspective (based on 2016 best guess mission set scenario)

Table 2 Optical flight terminal assumptions by location

General locale    | Aperture size (m) | Optical power (W) | Slot width (ns) | PPM range | Code rate
SEL 2             | 0.22              | 4.0               | 0.2             | 16–128    | 0.66
Moon & ELL 2      | 0.11              | 0.5               | 0.2             | 16–128    | 0.66
Planetary (Mars)  | 0.50              | 50.0              | 0.12            | 16–256    | 0.66
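The Table 2 assumptions imply an approximate achievable data rate per terminal. A rough, illustrative calculation (ignoring guard time, dead time and other overheads, so an upper bound only) is:

```python
import math

def ppm_data_rate_bps(ppm_order, slot_width_s, code_rate, guard_slots=0):
    """Rough PPM downlink data rate: log2(M) coded bits per symbol, one
    symbol every (M + guard) slots. Guard time and other overheads are
    ignored by default, so this is an upper-bound illustration only."""
    bits_per_symbol = math.log2(ppm_order)
    symbol_time = (ppm_order + guard_slots) * slot_width_s
    return code_rate * bits_per_symbol / symbol_time

# Mars terminal assumptions from Table 2: 256-PPM, 0.12 ns slots, rate-0.66 code.
rate = ppm_data_rate_bps(256, 0.12e-9, 0.66)
```

Under these assumptions the Mars terminal yields on the order of 170 Mbps before overheads.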

the duration of the originally requested Ka-band downlink tracks such that the same data volume could be achieved. Figure 18 gives an idea of the size of this postulated optical communications mission set as a function of time. At its height, the projected optical mission set constituted roughly a quarter of the total deep space mission set. While optical communications were substituted for high-rate Ka-band links on these missions, their X-band engineering telemetry downlinks were left as originally modeled, since such mission-critical data was deemed too important to leave to the mercy of the weather and other potential sources of optical link interference. And, a reduced backup Ka-band downlink capability was assumed for times when the Sun-Earth-Probe angle and/or the Sun-Probe-Earth angle were not large enough to enable a reliable link. Similarly, no optical communications uplinks were assumed—either for mission-critical commanding or for high-rate uplink. Aside from weather vulnerability, the close

498

D. S. Abraham et al.

Fig. 18 Number of spacecraft assumed capable of optical downlink as a function of time

proximity of the atmosphere at the beginning of an optical link leads to photon scattering that, when propagated out over interplanetary distances, leaves far too few photons reaching their intended target. With current technology, the optical uplink powers needed to compensate for this effect are neither feasible nor safe (particularly from the vantage point of anything flying overhead in the intervening space). Hence, the initial "Combined RF-Optical" study option assumed an array of 2-to-3 34 m, X/X and 34/32 GHz antennas per complex to provide high-rate uplink (~50 Mbps) to human Mars exploration assets at maximum Mars distance (as well as a backup Ka-band downlink capability for small SEP and SPE angles). Even with this antenna addition for uplink, the "Combined RF-Optical" option assumed 11 fewer 34 m antennas than the "RF-Only" option. And, without the same sort of high-rate RF bandwidth drivers (given their conversion into optical missions), the remaining Ka-band links could all be accomplished at 34/32 GHz rather than having to upgrade the antennas to also operate at 40/37 GHz. The "Combined RF-Optical" option, however, did retain the 70 m antennas due to the smaller number of 34 m antennas available for X-band arraying during critical events and spacecraft emergencies. Figures 16 and 17 (shown previously) indicate that the "Combined RF-Optical" option outperforms the "RF-Only" option in loading simulations from both a downlink-hour and a downlink-volume perspective—generally remaining above the average support calibration line in both cases. Because of the difference in achievable data rates, the optical track hours required to achieve the same data volume returns are an order of magnitude less than the Ka-band track hours required in the "RF-Only" option. And, a look at optical antenna-hour demand versus supply in Fig. 19 suggests that the three optical ground stations assumed for global coverage could easily meet the antenna-hour demand associated with the postulated optical communications mission set. Figure 20, however, does remind us that optical communications can be


Fig. 19 Demand versus supply, assuming three 12 m optical ground stations

subjected to significant outages when the angle between the probe and the Sun gets small relative to an observer on Earth and when the angle between the Earth and the Sun gets small relative to the probe that is looking for a laser beacon from Earth to guide the pointing of its optical downlink. Mission designers will need to be sure to design around such events.

5 Pass-2: Updating the Mission Set Scenarios and Refining the Options

5.1 Updating the Mission Set Scenarios

Because roughly a year transpired between the definition and modeling of the 2016 mission set scenarios and the presentation of the final Pass-1 findings, each of the scenarios was updated in Pass-2 to reflect the addition of new missions, down-selections for competitively bid program lines, mission slips, mission cancellations, and changes in mission requirements. In addition, SCaN management requested pursuit of a "what if" scenario that assumed half the maximum uplink and downlink rates assumed in the original "Best Guess" for the human Mars exploration missions. So, two 2017 "Best Guess" scenarios appear in many of the figures: "2017 Best Guess—OA" with the original assumptions and "2017 Best Guess—JSA" with the half-max-rate assumptions. Also, in the "2017 Best Guess—JSA," we assumed that the high-rate human Mars exploration downlinks would use 32 GHz (rather than


Fig. 20 Example occasions when SEP angles < 12° and SPE angles < 3°

…4 (n-MSPA) are currently underway—as are open-loop recording techniques for allowing downlink via unscheduled, opportunistic use of other already scheduled antenna beams (i.e., OMSPA). Efforts are also underway to allow more flexible serial uplink swapping between downlink-MSPA'd spacecraft. And, techniques for enabling simultaneous multiple uplinks per antenna (MUPA) are currently under investigation. The next 10 years also appear to be characterized by a steep increase in maximum downlink rates. After that, such rates begin to level off at around 250 Mbps. Most of the steep increase appears to be driven by high-rate observatory-class missions. However, by the end of the next decade, postulated human exploration missions and the associated relay infrastructure become the dominant data rate drivers. And, because these high-rate missions at least partially occur at maximum Mars distance, their projected end-to-end link difficulties (data rate × square of distance) exceed current levels by almost two orders of magnitude. The sheer difficulty associated with closing such links may be inhibiting mission concept designers from designing to any higher data rates, causing the plots of average and maximum data rates to level off in the 2030s and beyond. But, the available allocated RF spectrum may also be dissuading mission concept designers from postulating data rates much higher than 250 Mbps, since such rates would use up most, if not all, of the entire deep space Ka-band allocation. Beyond the RF spectrum issue, these increasing data rates and associated link difficulties also pose a potential capacity issue. As we showed in Pass-1 of the Deep Space Capacity Study, a "brute-force" approach to closing such links at Ka-band requires arraying roughly 6-to-7 34 m antennas (depending upon whether a "hot backup" antenna is included for human missions).
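The link-difficulty metric used above is simple enough to state directly. The 6 Mbps "current" rate below is an illustrative round number, not a value taken from the study's mission set; the 250 Mbps figure and the difficulty definition come from the text:

```python
def link_difficulty(data_rate_bps, distance_m):
    """End-to-end link difficulty as defined in the study: data rate times
    the square of the link distance."""
    return data_rate_bps * distance_m ** 2

AU = 1.496e11  # metres
current = link_difficulty(6e6, 2.67 * AU)    # illustrative present-day Mars-class link
future = link_difficulty(250e6, 2.67 * AU)   # postulated human-era trunk link at max Mars range
ratio = future / current
```

At equal distance the ratio reduces to the rate ratio (about 42x here), which is consistent with the "almost two orders of magnitude" growth the text describes. The same metric also shows why the cross-link idea discussed later helps: halving a relay's data rate halves its link difficulty.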
When these antennas are tied up with such arraying, there are then fewer antennas available to service all the rest of the spacecraft in other sky locations. And, in the current baseline plan for the future, the DSN has only four 34 m antennas per Complex. Hence, when we look at antenna-hour plots and loading simulation results for the 2030s and beyond, we see significant capacity shortfalls (sometimes exacerbated by the unavailability of certain frequency band capabilities, such as 37–37.5 GHz). To the extent that the primary drivers for the most difficult links are the Mars Areostationary Relays needed for the postulated human Mars exploration era, we

13 In addition to beam sharing, SCaN/DSN is working to mitigate contention periods by developing additional large-antenna cross-support arrangements with other space agencies and universities. It is also working to foster reliance on less DSN-intensive navigation techniques, particularly with respect to the cubeSat users.


showed in Pass-2 that a cross-link between the relays could allow each one to send down (to Earth) half the data that one with the requisite Mars surface and/or Phobos view would otherwise be sending down. Hence, each relay could send that data down at half the original data rate, making the link about half as difficult to close. By applying some of the same MSPA techniques needed to minimize peak asset contention during the 2020s, it would then be possible to simultaneously recover the data from both relays. Thus, with only 2–3 additional antennas beyond what is currently planned per complex, the same total downlink volume could be recovered as with the 6-to-7 additional antennas per complex considered in Pass-1. The cross-link could also be applied to redistributing data uplinked to one or the other of the two relays, eliminating the need to array two or more antennas at Ka-band to achieve the required 30 Mbps uplink. With the cross-link, two of the three antennas needed for downlink could each transmit half the uplink data at half the data rate. The data arriving at each of the relays could then be routed as needed via the cross-link. Loading simulations suggest that adoption of the preceding dual-trunk downlink and uplink architecture for human Mars exploration, in addition to the four 34 m antennas and one 70 m antenna per Complex already in the baseline plan, would enable satisfaction of the aggregate mission set's capacity requirements. However, this architecture ignores the allocated spectrum bandwidth issue alluded to in the downlink rate trend discussion above. At either 32 or 37 GHz, there are only 500 MHz available or suggested, respectively. One could pursue use of both frequencies, but that would add significant cost and still create a 1000 MHz bandwidth limit. By instead using optical communications in a dual-trunk link architecture, one could avoid the RF spectrum bandwidth issue.
And, by making use of one RF-optical hybrid antenna and one additional RF antenna per complex, one could satisfy the uplink requirement while having a reduced rate Ka-band backup for when the SunEarth-Probe and Sun-Probe-Earth angles get too small. One open issue with this dual-trunk optical option involves determining whether or not two, simultaneous pulse-position-modulated optical downlinks can be successfully MSPA’d. To sidestep this open issue, we also examined an unsymmetrical, dual-trunk link combined RF-optical option. In this option, two additional RF antennas accompany the RF-optical hybrid antenna. One Mars relay trunk link comes down at Ka-band to an array of the two RF antennas and the RF component of the RF-optical hybrid antenna. The other comes down to the optical component of the RF-optical antenna. With this design, there are more than enough RF-capable antennas to provide the dualtrunk uplink, and there is the possibility of defaulting to an MSPA’d RF dual-trunk arrayed downlink in the event of an optical communications failure or when the SunEarth-Probe and Sun-Probe-Earth angles get too small. Hence, the unsymmetrical option would potentially provide very high resilience to failures. And, if run in the RF MSPA’d mode along with the optical, this option could provide even higher data return than required in the study. While it is probably too soon to settle on any particular path, a couple of things are clear. First, load-sharing cross-links between relays can significantly reduce required antenna/aperture numbers at Earth, provided that the Earth ground station or arrayed


stations make use of MSPA—something that is needed anyway to reduce asset scheduling contention over the next ten years. Cross-link technology, in terms of both the physical link and the management of data across the link, merits further investment. Second, advances in technology can significantly change which path may look most promising. For instance, we routinely assume throughout this study that it is possible to array antennas at Ka-band to achieve the required downlink, but demonstrations of such capability are few and far between. Similarly, in Pass-1, we assumed it possible to array Ka-band antennas to achieve the required uplink, but that, too, is a work in progress. In the optical realm, we assumed that spacecraft laser powers as high as 50 W could be achieved—but, again, getting spacecraft laser powers to those levels will take a lot more time and effort. We also pointed out the open issue regarding the viability of optical MSPA, upon which the feasibility of the dual-trunk optical option hinges. Meanwhile, recent developments regarding new high-efficiency nanowire superconducting photon detectors might render the dual-trunk approach to optical unnecessary. Similarly, investment in the development of new, higher-power 34 GHz transmitters could render the dual-trunk approach to uplink unnecessary. If both of these latter cases were to prove true, then a single RF-optical hybrid antenna per complex, in addition to what is already in the baseline plan, might be all that would be needed for routine human Mars exploration support. So, a modest investment in all of the above technologies might be a necessary prerequisite for finding the right path. However, with the early 2030s in mind for operational capability, time is short. Historically, it takes several years to design the type of spacecraft that might operate as relays, and just building a single 34 m antenna takes a couple of years.
We do not have much time to invest in and evolve technologies before choosing a path. So, we ought to start now.

Acknowledgements The authors would like to thank Wallace Tai for his sponsorship of this paper and Lincoln Wood for his derivation of the optimal areostationary relay spacing for cross-links. The research described in this paper was carried out by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. The research was supported by NASA's Space Communications and Navigation (SCaN) program.

References

1. Abraham, D. S., MacNeal, B. E., Heckman, D. P., Chen, Y., Wu, J. P., Tran, K., et al. (2018). Recommendations emerging from an analysis of NASA's deep space communications capacity. In SpaceOps 2018 Conference, CAN-06, Marseille, France, 30 May 2018. Published by the American Institute of Aeronautics and Astronautics. URL https://arc.aiaa.org/doi/abs/10.2514/6.2018-2528.
2. Tai, W., Abraham, D., & Cheung, K.-M. (2018). Mars planetary network for human exploration era—Potential challenges and solutions. In SpaceOps 2018 Conference, CAN-03, Marseille, France, 29 May 2018 (cited pre-publication).
3. Abraham, D. S. (2002). Identifying future mission drivers on the deep space network. In SpaceOps 2002 Conference, T3-64, Houston, Texas, October 2002. URL https://arc.aiaa.org/


doi/10.2514/6.2002-T3-64 (cited 5 January 2018).
4. MacNeal, B. E., Abraham, D. S., Hastrup, R. C., Wu, J. P., Machuzak, R. J., Heckman, D. P., et al. (2009). Mission set analysis tool for assessing future demands on NASA's deep space network. In IEEE Aerospace Conference 2009, April 2009. URL http://ieeexplore.ieee.org/document/4839377/ (cited 5 January 2018).
5. Cheung, K.-M., & Abraham, D. S. (2012). End-to-end traffic flow modeling of the integrated SCaN network. Interplanetary Network Progress Report, 42-189 (Jet Propulsion Laboratory, California Institute of Technology, May 15, 2012). URL https://ipnpr.jpl.nasa.gov/progress_report/42-189/title.htm (cited 5 January 2018).
6. Chen, Y., Abraham, D. S., Heckman, D. P., Kwok, A., MacNeal, B. E., Tran, K., & Wu, J. P. (2016). Architectural and operational considerations emerging from hybrid RF-optical network loading simulations. In Proceedings SPIE 9739, Free-Space Laser Communication and Atmospheric Propagation XXVIII, 97390P (15 March 2016). URL https://www.spiedigitallibrary.org/conference-proceedings-of-spie/9739/1/Architectural-and-operational-considerations-emerging-from-hybrid-RF-optical-network/10.1117/12.2213594.full (cited 10 January 2018).
7. Morabito, D., & Abraham, D. (2018). Multiple uplinks per antenna (MUPA) signal acquisition schemes. In SpaceOps 2018 Conference, CAN-09, Marseille, France, 31 May 2018 (cited pre-publication).
8. Abraham, D. S., Finley, S. G., Heckman, D. P., Lay, N. E., Lush, C. M., & MacNeal, B. E. (2015). Opportunistic MSPA demonstration #1: Final report. Interplanetary Network Progress Report, 42-200 (Jet Propulsion Laboratory, California Institute of Technology, February 15, 2015). URL https://ipnpr.jpl.nasa.gov/progress_report/42-200/title.htm (cited 19 February 2018).
9. Towfic, Z., Heckman, D., Morabito, D., Rogalin, R., Okino, C., & Abraham, D. (2018). Simulation and analysis of opportunistic MSPA for multiple cubesat deployments. In SpaceOps 2018 Conference, CAN-01, Marseille, France, 28 May 2018 (cited pre-publication).
10. Biswas, A., Kovalik, J., Srinivasan, M., Shaw, M., Piazolla, S., Wright, M. W., & Farr, W. H. (2016). Deep space laser communication. In Proceedings of SPIE 9739, Free-Space Laser Communication and Atmospheric Propagation XXVIII, 97390Q (15 March 2016). URL https://www.spiedigitallibrary.org/conference-proceedings-of-spie/9739/97390Q/Deep-space-laser-communications/10.1117/12.2217428.full (cited 2 March 2018).
11. Tai, ibid.

Statistical Methods for Outlier Detection in Space Telemetries

Clémentine Barreyre, Loic Boussouf, Bertrand Cabon, Béatrice Laurent and Jean-Michel Loubes

Abstract Satellite monitoring is an important task to prevent satellite failures. For this purpose, a large number of time series are analyzed in order to detect anomalies. In this paper, we provide a review of such analyses, focusing on methods that rely on feature extraction. In particular, we set up features based on fixed functional bases (Fourier, wavelets, kernel bases…) and data-based bases (PCA, KPCA). The outlier detection methods we apply to those features can be distance- or density-based. Those algorithms will be tested on real telemetry data.

1 Introduction

Analyzing functional data and detecting outliers have become increasingly important subjects over the years, and the space industry is increasingly facing these issues as well. Indeed, unlike other complex systems, satellites do not allow direct hardware inspection once in orbit, yet a single failure in a component or a subsystem may be fatal. Consequently, it is highly important to check the behavior of the satellite throughout its life cycle to detect divergences as soon as possible.

C. Barreyre (B) · L. Boussouf · B. Cabon Airbus Defence and Space, Z.I. Palays, 31 rue des Cosmonautes, 31400 Toulouse, France e-mail: [email protected] L. Boussouf e-mail: [email protected] B. Cabon e-mail: [email protected] B. Laurent Institut de Mathématiques de Toulouse (UMR 5219), INSA Toulouse, Université de Toulouse, 135 avenue de Rangueil, 31400 Toulouse, France J.-M. Loubes Institut de Mathématiques de Toulouse (UMR 5219), Université Paul Sabatier, Université de Toulouse, 135 avenue de Rangueil, 31400 Toulouse, France © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_20


The behavior of a satellite is first checked during tests on the ground, and then in flight via the measurement of thousands of health parameters, called telemetries. Most of these signals can be considered as functions depending on time and sampled at a high rate, leading to a large number of observed values. In order to automate the monitoring of all these data, existing processing chains were developed by experts based on physical knowledge. They are run frequently to flag any misbehavior, but they rely on a posteriori human analysis. However, because of the increasing complexity of the concerned systems and the increasing number of data, designing custom-made processing for all data is currently impossible; hence, developing automatic data analysis has become essential. The automatic monitoring of satellite data has already been addressed by ESA [1] (European Space Agency), CNES [2] (Centre National d'Etudes Spatiales, the French space agency), and JAXA (the Japanese space agency) to detect real-time anomalies in telemetries. In the aeronautic field, many health monitoring studies have been done, such as those performed by Thomas [3], Rabenoro [4], and Abdel-Sayed [5]. All these studies address anomaly detection with different approaches in various frameworks: first reducing the dimension of the data to concentrate the information, and then isolating the anomalies.

Given the variety of all the outlier detection methods that have been studied in similar applications, we choose to apply some of them to our data. For this reason, in this paper, we tackle the issue of outlier detection by reviewing statistical methods to analyze typical functional data deriving from satellites that may present some anomaly, in unsupervised settings. Due to the complexity of defining what an outlier is, we aim at extracting the main behaviors and identifying the observations that differ from them. Hence, we want to build features that characterize the normal behavior. For this purpose, we consider dimension reduction techniques to overcome the high-dimensional aspect of the problem. For instance, we refer to various clustering applications such as Antoniadis [6], Tarpey [7], and Auder [8], and default detection applications such as Pan [9], where the dimension reduction is done on coefficients arising from projections onto functional bases, mostly deriving from wavelet decompositions. Features can be extracted from such reduction techniques, and they are robust to irregularities in the data, such as noise, irregular sampling, or missing data. The features arising from these projections are expected to concentrate the information in a small number of coefficients. As in the previously cited studies, several bases will be tested to build our features. Then, on each feature set, we apply several reference outlier detection techniques (see, e.g., [10]), which we compare with a new method that takes into account the temporal order of the curves, called the temporal nearest-neighbors dissimilarity. Those methods are applied on simulated data to ease the direct comparison of all the projection bases and outlier detection methods on a common example. Then, the methods and the results of the common comparison are validated on real telemetries as well as radio-frequency equipment test data. The first part of the paper will


be dedicated to the data description, the second part to the dimension reduction, and the third part to the outlier detection techniques. The methods are applied to real telemetry data in the last section of this paper.

2 Data

2.1 Description

The data analyzed here are telemetry data. A telemetry is an observation of a continuous signal t → X(t) sampled at some instants t_j, with j = 1, …, T. Such signals are numerous, each characterizing a component or parameter of the satellite; a satellite may have as many as 10,000 telemetries or more. The sampling time can be very short and varies from one telemetry to another, with most telemetries being sampled every 30 s. This means that a full year of one telemetry is represented by more than a million instants. In order to build a novelty detection method in an unsupervised setting, we must build confidence areas where the good behavior is mainly located, while anomalies lie outside this area. Hence, we need to extract representative features of these curves that exhibit a pattern representing the normal behavior. The telemetries can be very different and vary a lot depending on whether the satellite is a telecommunication or an observation satellite. Telecommunication satellites are mostly located on a geostationary orbit. They always stay above the same location on Earth, completing an orbit each day. Consequently, most of the telemetries coming from telecommunication satellites exhibit patterns that are replicated on a daily basis. These patterns can evolve through the year because of seasonality. Also, geostationary satellites experience eclipses twice a year during periods of approximately 15 days. The behavior of some telemetries can be significantly impacted by these events. Observation satellites are in low Earth orbit (LEO). Unlike telecommunication satellites, their location above Earth changes all the time, and it takes a few days for the satellite to scan the whole planet. They often have an inner periodicity which is shorter than a day, with no annual evolution.
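As a back-of-the-envelope check of the sampling figures above (our own sketch, not from the paper; only the 30 s period is quoted in the text):

```python
# One telemetry sampled every 30 s accumulates over a million values per year.
SECONDS_PER_DAY = 24 * 3600

def samples_per_year(sampling_period_s: float, days: int = 365) -> int:
    """Observed values for one telemetry over `days` days at the given period."""
    return int(days * SECONDS_PER_DAY / sampling_period_s)

n_samples = samples_per_year(30)   # 1,051,200: "more than a million instants"
```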
For this purpose, we present in the following some typical examples and the different kinds of anomalies that occur most frequently.

2.2 Main Kinds of Signals

The following examples are real satellite data. The methods we develop will be tested on those telemetries in Sect. 5.


1. Non-evolutive telemetry. This first telemetry, represented in Fig. 1, is not yearly periodic and exhibits changing behavior. This telemetry contains a large number of anomalies; hence, it is an interesting use case for applying outlier detection algorithms.

2. Evolutive telemetry. This second telemetry, represented in Fig. 2, has a yearly trend. Unlike the previous example, there are no obvious anomalies in this dataset. The outlier detection here will help us highlight the changes that can be found in this telemetry.

3. Gain versus Frequency. For this third example, we consider the Gain versus Frequency test, which is commonly used to characterize the performance of telecommunication channels. The principle of the test is to measure the gain between the input and the output at all frequencies within the channel bandwidth; it is an excellent means to verify the impact of the passive units and the active components on the expected performance. The ideal response of the Gain-Frequency tests is a constant gain function over the

Fig. 1 First telemetry

Fig. 2 Second telemetry


Fig. 3 Example of Gain-Frequency tests (left plot), small steps as local anomalies (middle plot), and ripples as global anomalies (right plot)

frequency bandwidth. The goal is to find the most altered results among the curves resulting from the Gain-Frequency tests. The signal can be altered locally (steps, spikes…) or globally. Figure 3 illustrates these tests and shows what their anomalies look like.

2.3 Anomaly Description

By definition, an anomaly is an event that differs from the usual behavior of the data. As it is unusual, it is infrequent and therefore difficult to detect. Indeed, defining a normal region that encompasses every possible normal behavior is very difficult. In addition, the boundary between normal and anomalous behavior is often not precise. Our task is therefore very difficult, so we will focus on specific types of anomalies. As a matter of fact, space telemetry anomalies can be classified into three categories.

• Local anomalies: These anomalies occur over a limited time duration. They are characterized by discontinuities of the time series at some breaking points.
• Pattern anomalies: One of the patterns of the telemetry does not exhibit the expected shape.
• Periodicity anomalies: The frequency, or the phase, of the physical phenomenon changes.

We can see that these anomalies can have several different origins: anomalies in duration, anomalies in frequency. The anomalies can be due to real phenomena in flight, but they can also be the consequence of measurement errors. Thus, a single method is unlikely to highlight all those types of anomalies with a small rate of false positives. As we do not know the past anomalies on the telemetries, we use a simulated


example, close to our data, on which we insert several types of anomalies. It will enable us to evaluate and compare the performances of our methods on a common application. In a second step, these methods are validated on real telemetries.

2.4 Simulated Example

In order to compare the different methodologies we apply in this paper, we have created a simulated example to ease the validation of each method. This telemetry has a daily and a yearly periodicity, to reproduce the difficulties we may find in real telemetries. We used a combination of periodic signals and given patterns to generate this telemetry. We simulated 240 days of telemetry in order to get a significant number of individuals after splitting the signal into days. Several types of anomalies have been introduced. The shape of this telemetry is represented in Fig. 4. In this figure, the evolution of the repetitive patterns through the days is easily noticeable; from one day to another, the patterns do not evolve much. We simulated three local anomalies, which are represented in Fig. 5. The first two are local spikes on the signal. The third anomaly corresponds to an increase in the amplitude of the noise over a short time period. We also added pattern and periodicity anomalies, represented in Fig. 6. The anomalies can be minor, such as an amplitude attenuation, or stronger, such as having two patterns within a given day instead of one. Consequently, we have seven anomalies in total. These anomalies have been chosen in order to investigate whether both the obvious ones and the minor ones can be detected by our algorithms. In the next section, we first define how to extract the features for highlighting the anomalies.
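A toy generator in the spirit of this simulated example (our own sketch: the authors' exact signal model and anomaly parameters are not given in the text, so every constant below, including the 5 min sampling, is invented):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_days = 288, 240                         # 5 min sampling, 240 simulated days
t = np.arange(n_days * p) / p                # time in days
daily = np.sin(2 * np.pi * t)                # daily pattern
yearly = 0.5 * np.sin(2 * np.pi * t / 365)   # slow yearly modulation
x = daily * (1 + yearly) + 0.05 * rng.normal(size=t.size)

x[50 * p + 100] += 3.0                                      # local spike anomaly
x[120 * p + 50:120 * p + 90] += 0.4 * rng.normal(size=40)   # noise-amplitude anomaly
x[200 * p:201 * p] *= 0.5                                   # pattern (attenuation) anomaly

days = x.reshape(n_days, p)                  # one row per day, as in Sect. 3
```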

Fig. 4 Simulated telemetry and its zoomed version


Fig. 5 Simulated local anomalies

Fig. 6 Simulated pattern anomalies


3 Anomaly Detection with Feature Extraction

3.1 Methodology

Our aim is to extract appropriate features that convey relevant information, so as to characterize anomalies as functions that deviate from these main features. Since the data we monitor are functional data, the features are built by projection onto functional bases. Let x(t) ∈ ℝ, with t = 1, …, T, be a univariate telemetry time series. At first, the time series is cut into n intervals of length p, with n × p ≤ T and (n + 1) × p > T. For geostationary satellites, the telemetries can be cut into days. For observation satellites, several time durations, such as the orbit period, can be tested. We then have to analyze the matrix X = [X_1, …, X_n]^T, of dimension n × p. The idea is to reduce the dimension of the vectors X_i, i = 1, …, n, by summarizing the information contained in these vectors with d < p properly chosen features (θ_{i,1}, …, θ_{i,d}). It is important to recall that those features are built in order to best highlight the anomalies. We use the following methodology. After splitting the data into regular intervals, the vectors X_i can be seen as observations of functional data. This corresponds to the observation model X_{i,j} = X_i(t_j) + ε_{i,j}, i = 1, …, n, j = 1, …, p, where ε_{i,j} is a random noise whose variance σ² is unknown, and we suppose that X_i ∈ L²([0, 1]). Without loss of generality, we assume that for all j, t_j ∈ [0, 1]. Moreover, if the telemetries are regularly sampled, we can assume that t_j = j/p. There exist many ways to represent functional data in a reduced dimension, and some bases are more adapted to some types of functions: we can consider, for instance, Fourier bases, wavelet bases, etc. Auder and Fischer [8] introduced some well-known functional bases in order to apply functional clustering. Our problem is to identify several types of anomalies.
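The splitting step can be sketched as follows (our illustration; the interval length p would be one day for a geostationary satellite, and the sizes used here are hypothetical):

```python
import numpy as np

# Cut a length-T series into n complete rows of length p (n*p <= T < (n+1)*p);
# the incomplete tail, if any, is discarded.
def split_into_intervals(x: np.ndarray, p: int) -> np.ndarray:
    n = len(x) // p
    return x[: n * p].reshape(n, p)          # the matrix X, of dimension n x p

x = np.arange(1000.0)                        # toy series with T = 1000
X = split_into_intervals(x, p=96)            # hypothetical daily blocks at 15 min
```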
The functional features have to carry relevant information on the anomalies, such that any outlier detection algorithm has a chance to detect them. We proceed by identifying some features that are able to accentuate anomalies in a reduced dimension. Then, the feature selection is made in order to maximize the variance explained by those features.

3.2 Projection Onto Functional Bases

Assume that for all i = 1, …, n, X_i ∈ L²([0, 1]). These functions can thus be represented in an orthonormal functional basis of L²([0, 1]) (see Ramsay [11] for more details). Then, if (φ_λ)_{λ≥1} is an orthonormal basis of L²([0, 1]),

X_i(t) = Σ_{λ=1}^{∞} θ_{i,λ} φ_λ(t),  with θ_{i,λ} = ⟨X_i, φ_λ⟩    (1)

where ⟨·, ·⟩ denotes the inner product in L²([0, 1]). In practice, the coefficients θ_{i,λ}, λ = 1, …, d, are estimated by their empirical counterparts

θ̂_{i,λ} = (1/p) Σ_{j=1}^{p} X_{i,j} φ_λ(t_j).    (2)

1. Fixed bases

The Fourier basis. When the data is periodic, analyzing the frequency properties of the signal is widely applied in signal processing. We know that we can decompose the functions X_i according to

X_i(t) = Σ_{λ=0}^{+∞} θ_{i,λ} φ_λ(t)

where t ∈ [0, 1]. In this case, the functional basis (φ_λ)_{λ≥0} is the usual Fourier basis. We have, for k ∈ ℕ*:

φ_0(t) = 1,
φ_{2k}(t) = √2 cos(2πkt),    (3)
φ_{2k+1}(t) = √2 sin(2πkt).
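As an illustration of Eqs. (2)–(3), a minimal sketch (our own, not the authors' code) that builds the first d Fourier features for each row of X, with t_j = j/p:

```python
import numpy as np

# Empirical Fourier coefficients of Eq. (2) in the basis of Eq. (3):
# phi_0 = 1, phi_{2k} = sqrt(2) cos(2 pi k t), phi_{2k+1} = sqrt(2) sin(2 pi k t).
def fourier_features(X: np.ndarray, d: int) -> np.ndarray:
    n, p = X.shape
    t = np.arange(1, p + 1) / p                    # t_j = j / p
    basis = [np.ones(p)]                           # phi_0
    k = 1
    while len(basis) < d:
        basis.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        if len(basis) < d:
            basis.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        k += 1
    Phi = np.stack(basis, axis=1)                  # p x d design matrix
    return X @ Phi / p                             # theta_hat of Eq. (2), n x d

# A pure k = 1 cosine should load (almost) entirely on phi_2.
X = np.sqrt(2) * np.cos(2 * np.pi * np.arange(1, 97) / 96)[None, :]
theta = fourier_features(X, d=5)
```

Here `theta[0, 1]` is close to 1 while the other coefficients are close to 0, which is the concentration of information that the variance criterion described next exploits.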

We now have to select d features properly. Let S_d(X_i) = Σ_{λ=0}^{d} θ_{i,λ} φ_λ(t) be the partial Fourier series, and let θ_{·,λ} = (θ_{1,λ}, …, θ_{n,λ}) be the coefficients of all the curves for the λth feature. In practice, the vector θ_{·,λ} is unknown and is estimated by θ̂_{·,λ} as defined in (2). We know that the larger d is, the better the approximation becomes. Based on this, we choose the smallest d that satisfies the condition

Σ_{λ=0}^{d} Var(θ̂_{·,λ}) / Σ_{λ=0}^{p} Var(θ̂_{·,λ}) ≥ 0.99.

This means that the first d features represent more than 99% of the variance of all the features we can compute. Finally, we get θ̂_i = (θ̂_{i,0}, …, θ̂_{i,d}) as the remaining data to analyze, for i = 1, …, n.

Example 1. We applied the Fourier decomposition to the simulated example introduced in Sect. 2.4. The first five coefficients from this projection already represent more than 99% of the variance. To illustrate the kind of anomalies that could be highlighted by the Fourier basis, we have represented the first three components in Fig. 7. The nominal days are represented as green dots, the local anomalies in pink, the periodicity anomaly in orange, and the pattern anomalies in red. As


Fig. 7 Coefficients in the first three functions of the Fourier basis

a first remark, we can notice that those features do not appear as an agglomerate of points but as a continuous line, summarizing the yearly trend of this telemetry. As we can see, the local anomalies seem unlikely to be detected through the Fourier basis, whereas the other anomalies appear as outliers. The local anomalies in pink only slowly start to stray from the nominal curves in the following features. The anomaly detection will help us confirm this hypothesis.

Wavelet bases. A wavelet basis is defined from a pair of functions (φ, ψ), respectively called the father and mother wavelets. We consider only compactly supported wavelets. We denote by supp(ψ) the support of ψ. For all j ≥ 0, k ∈ ℤ, let ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k), and for all j ≥ 0, let Λ(j) = {k ∈ ℤ; [−k, 2^j − k] ∩ supp(ψ) ≠ ∅}. The functions (φ, ψ_{j,k}, j ≥ 0, k ∈ Λ(j)) form an orthonormal basis of L²([0, 1]). The easiest example to present is the Haar basis, where φ = 1_{[0,1]} and ψ = 1_{[0,1/2[} − 1_{[1/2,1[}. In this case, Λ(j) = {0, 1, …, 2^j − 1} and |Λ(j)| = 2^j, for all j ≥ 0. The Haar wavelets are not continuous, and hence sometimes inappropriate for practical purposes. Many other wavelet bases can be considered, such as the Daubechies symmlet bases (see Daubechies [12] for more details). A function X ∈ L²([0, 1]) can be represented by its expansion onto a wavelet basis

X(t) = α φ(t) + Σ_{j≥0} Σ_{k∈Λ(j)} β_{j,k} ψ_{j,k}(t).
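For the Haar basis above, the coefficients can be computed with the classical pyramid algorithm. A minimal sketch (ours, assuming the interval is sampled at 2^J points; these discrete orthonormal coefficients match the θ̂ of Eq. (2) only up to a scaling factor):

```python
import numpy as np

# Discrete Haar pyramid: returns the scaling coefficient alpha and the detail
# coefficients beta_{j,k}, k = 0..2^j - 1, for levels j = J-1 (finest) .. 0.
def haar_pyramid(x):
    x = np.asarray(x, dtype=float)
    J = int(np.log2(len(x)))
    assert len(x) == 2 ** J, "this sketch assumes a power-of-two length"
    details = {}
    a = x.copy()
    for j in reversed(range(J)):                        # finest to coarsest
        details[j] = (a[0::2] - a[1::2]) / np.sqrt(2)   # beta_{j,k}
        a = (a[0::2] + a[1::2]) / np.sqrt(2)            # running averages
    return a[0], details                                # alpha, all detail levels

# A step at mid-interval shows up only in the coarsest detail beta_{0,0}.
alpha, beta = haar_pyramid([1, 1, 1, 1, -1, -1, -1, -1])
```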

We extract, from the estimated wavelet coefficients (α̂_i, β̂_{i,j,k}, j ≥ 0, k ∈ Λ(j)) of our functions X_i, a small number of suitably chosen coefficients. In the wavelet case,


the coefficients are indexed by the level j and the position k. Thus, it does not make sense to take the first d coefficients, since they are not ordered in one dimension. The variance analysis is likely to encourage us to keep only the coefficients corresponding to the first levels. However, we know that the first levels of wavelet coefficients are not designed to catch local events; consequently, those events are unlikely to be highlighted in these levels. Thus, we have to consider larger levels as well. Coifman et al. [13] proposed to pick only a finite number of wavelets by keeping d pairs of coefficients (j_1, k_1), …, (j_d, k_d) such that ∪_{m=1}^{d} supp(ψ_{j_m,k_m}) = [0, 1]. Auder et al. [8] noticed that it is possible to keep all the position coefficients from a same level. In order to identify the anomaly types that can be detected by each wavelet level, we will analyze them separately. As the first levels contain a small number of coefficients, we group the levels for which j ∈ {0, 1, 2} together. The features of the following levels will be taken entirely. Finally, the feature sets we get can be

θ̂_i^{012} = (α̂, β̂_{0,0}, β̂_{1,0}, β̂_{1,1}, β̂_{2,0}, …, β̂_{2,3});

or, for some j > 2, θ̂_i^{j} = (β̂_{j,0}, …, β̂_{j,2^j−1}). Several values of j > 2 will be tested.

Example 2. We would like to choose the levels that contain information on the anomalies. To get an idea of the pathologies detected by each level, we represent all the coefficients of a given wavelet level in order to extract relevant information on each anomaly type. For pattern anomalies, the small levels are expected to better explain the global changes. To this end, we represent, for a full level, one boxplot per feature, and we add the corresponding values for each anomaly as a continuous line. For pattern anomalies, we can see in Fig. 8 that in the small levels only one pattern appears clearly as an outlier (refer to the lines in red). For the two other pattern anomalies, it is impossible to conclude based on this figure. We can also see that the periodicity anomaly starts to appear as abnormal at level j = 2, and it deviates even more from the other data in the coefficients for which j = 3. For the local anomalies, we consider levels 4 and 5, which we represent in Fig. 9. We can see that the fourth level is the first level where local anomalies start to appear clearly as outliers. It is even more evident in the fifth level. We can see that the different levels enable us to detect several types of anomalies. Levels larger than j = 5 return false positives, and the number of features becomes too large.

Kernel basis. We consider in this paragraph a basis obtained from the Gaussian kernel eigenfunctions. For a given γ > 0, the Gaussian kernel is defined as K_γ(s, t) = exp(−γ(s − t)²). The Gaussian kernel is universal in the sense that its eigenfunctions can represent any regular function (see Steinwart [14]). Thanks to this property, it is possible to


Fig. 8 Wavelet coefficients with j ≤ 4 and details for pattern anomalies (in red) and periodicity anomaly (in orange)

Fig. 9 Wavelet coefficients with j = 4, 5 and details for the local anomalies. Different markers to identify the local anomalies


reduce the dimension of the observations by keeping only the first d eigenfunctions, with 1 < d < p. When the telemetry exhibits a pattern reproducing itself regularly, this basis is relevant. As the basis is fixed, if a single pattern has a different shape, the features are likely to be highly impacted. The Gaussian kernel K_γ is a Mercer kernel (a continuous, symmetric, positive-definite function). Thanks to Mercer's theorem [15], we know that K_γ admits the following representation:

K_γ(s, t) = Σ_{λ=1}^{∞} α_λ φ_λ(s) φ_λ(t).

The functions (φ_λ)_{λ≥1} are the eigenfunctions of the kernel K_γ, and (α_λ)_{λ≥1} form a decreasing positive series such that Σ_{λ=1}^{∞} α_λ < +∞. Thanks to Steinwart [14], we know that the eigenvalues (α_λ)_{λ≥1} decrease exponentially. This property justifies the fact that we can retain only a finite number d of eigenfunctions. As for the Fourier basis, we retain the first d coefficients that encompass more than 99% of the variance of the coefficients. The function X_i, i = 1, …, n, can be approximated by X_{i,d}, corresponding to its projection onto the first d eigenfunctions (see Schölkopf and Smola [16] for more details):

X_{i,d}(t) = Σ_{λ=1}^{d} θ_{i,λ} φ_λ(t), where θ_{i,λ} = ∫_0^1 X_i(t) φ_λ(t) dt.    (4)

In practice, for all i = 1, …, n, we estimate the coefficients (θ_{i,λ})_{λ=1,…,d} by the empirical coefficients (θ̂_{i,λ})_{λ=1,…,d} as defined in Eq. (2). The remaining vector to analyze is θ̂_i = (θ̂_{i,1}, …, θ̂_{i,d}).

Example 3. For the simulated example, we test several Gaussian kernels with γ = 10, 100, 10,000. In this case, the first four functions we get from this kernel basis already represent more than 99% of the variance. To illustrate the anomalies that can be highlighted by this basis, we plot in Fig. 10 the first three components corresponding to the kernel eigenfunctions with γ = 100; the other options give similar results. The colors are the same as previously. Once again, the local anomalies (in pink) do not appear as outliers, whereas the others do.

2. Data-dependent bases

Principal component analysis (PCA) is a very powerful way to reduce the dimension of the data by retrieving the vectors that best capture the variance of the data. If all the portions of the signal are very similar, we will see that, by taking a small number of features, we are likely to summarize the data well.
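As a sketch of this PCA reduction (our own toy illustration; the 99% threshold is the one used for the fixed bases above, and all data parameters are invented):

```python
import numpy as np

# Keep the first d principal components whose cumulated explained variance
# exceeds 99%, and return the corresponding scores (features).
def pca_features(X: np.ndarray, var_ratio: float = 0.99):
    Xc = X - X.mean(axis=0)                        # center the days
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)            # variance ratio per component
    d = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    return Xc @ Vt[:d].T, d                        # n x d scores, components kept

# Toy data: noisy mixtures of two fixed daily patterns, so two components suffice.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 64)
patterns = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
X = rng.normal(size=(100, 2)) @ patterns + 0.01 * rng.normal(size=(100, 64))
scores, d = pca_features(X)
```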


C. Barreyre et al.

Fig. 10 Coefficients in the three first functions of the kernel basis for γ = 100
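The kernel-basis features above can be sketched numerically: the eigenvectors of the Gram matrix of K_γ on the sampling grid approximate the eigenfunctions φ_λ, and a Riemann sum approximates the integral in (4). The data, grid size, and parameter choices below are our own illustrative assumptions, not the chapter's simulated telemetry.

```python
import numpy as np

# Hypothetical sketch: approximate the eigenfunctions (phi_lambda) of the
# Gaussian kernel K_gamma on a grid of p points, then project each curve
# onto the d first eigenfunctions covering 99% of the coefficient variance.
rng = np.random.default_rng(0)
n, p, gamma = 50, 100, 100.0
t = np.linspace(0.0, 1.0, p)

# Simulated periodic curves with small noise and one amplitude anomaly.
X = np.sin(2 * np.pi * 3 * t)[None, :] + 0.01 * rng.standard_normal((n, p))
X[10] *= 3.0

# Eigenvectors of the Gram matrix approximate the eigenfunctions; the
# eigenvalues play the role of the decreasing series (alpha_lambda).
K = np.exp(-gamma * (t[:, None] - t[None, :]) ** 2)
alpha, phi = np.linalg.eigh(K)
alpha, phi = alpha[::-1], phi[:, ::-1]  # descending eigenvalues

# Empirical coefficients theta_hat (Riemann sum for the integral in (4)).
theta = X @ phi / p

# Retain the d first coefficients covering 99% of the variance.
var = theta.var(axis=0)
d = int(np.searchsorted(np.cumsum(var) / var.sum(), 0.99)) + 1
features = theta[:, :d]
print(features.shape)
```

The resulting (n, d) feature matrix is what the outlier detection methods of Sect. 4 operate on.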

The kernel principal component analysis (KPCA) was introduced by Schölkopf et al. [17]. The idea is to build a nonlinear form of PCA. Let ψ : ℝ^p → ℝ^D be a continuous function, and denote ψ̄ = (1/n) Σ_{i=1}^n ψ(X_i). The KPCA finds the eigenvectors of the matrix

$$C_n = \frac{1}{n}\sum_{i=1}^{n} \left(\psi(X_i) - \bar\psi\right)\left(\psi(X_i) - \bar\psi\right)^T.$$

As we can see, if ψ = Id, then C_n is the empirical covariance matrix; therefore, the principal component analysis is a particular case of the kernel principal component analysis. Let K be the Gram matrix defined as

$$K_{il} = \left\langle \psi(X_i) - \bar\psi,\, \psi(X_l) - \bar\psi \right\rangle = \left(\psi(X_i) - \bar\psi\right)^T \left(\psi(X_l) - \bar\psi\right).$$

Finding the eigenvalues α_λ and the eigenvectors $V_\lambda = \sum_{i=1}^n \theta_{i,\lambda}(\psi(X_i) - \bar\psi)$ of the matrix C_n can be done by solving the eigenvalue problem

$$n\alpha_\lambda \theta_\lambda = K\theta_\lambda. \tag{5}$$

The projection of a test point ψ(X_*) onto the eigenvector V_λ is equal to γ_λ V_λ, where

$$\gamma_\lambda = \sum_{i=1}^{n} \theta_{i,\lambda} \left\langle \psi(X_i) - \bar\psi,\, \psi(X_*) - \bar\psi \right\rangle. \tag{6}$$

Statistical Methods for Outlier Detection in Space Telemetries


Example 4  Let (φ_λ, λ ∈ Λ) be an orthonormal family of functions of L²([0, 1]), and

$$\psi(X_i) = \left(\frac{1}{p}\sum_{j=1}^{p} X_i(t_j)\varphi_\lambda(t_j)\right)_{\lambda\in\Lambda} = (\hat\theta_{i,\lambda})_{\lambda\in\Lambda},$$

where ψ : ℝ^p → ℝ^d and d = |Λ|. Then, ψ(X_i) returns the projection coefficients of X_i in an orthonormal basis, as defined earlier. Then we get

$$\psi(X_i) - \bar\psi = \begin{pmatrix} \hat\theta_{i,1} \\ \vdots \\ \hat\theta_{i,d} \end{pmatrix} - \begin{pmatrix} \bar{\hat\theta}_1 \\ \vdots \\ \bar{\hat\theta}_d \end{pmatrix} = \hat{\boldsymbol\theta}_i - \bar{\hat{\boldsymbol\theta}}.$$

The covariance matrix becomes $C_n = \frac{1}{n}\sum_{i=1}^n (\hat\theta_i - \bar{\hat\theta})(\hat\theta_i - \bar{\hat\theta})^T$. In this case, we can see that the KPCA applies a PCA on the features corresponding to projections in an orthonormal basis. We can take, for example, a Gaussian kernel to recover the results of the previous section. We can also use projections on other fixed bases, such as the Fourier or wavelet bases.

Example 5  We apply both the standard PCA and the KPCA with a Gaussian kernel K(x, y) = exp(−γ(x − y)²), with γ = 100 as in the kernel basis section. In the PCA case, the three first components we get from this basis already represent more than 99% of the variance, whereas for the KPCA we need to retain the four first principal components. The three first components for both cases are represented in Fig. 11. We can see that the pattern anomalies and the periodicity anomaly are clearly isolated. However, we also notice that the local anomalies appear more isolated than what we observed with the Fourier and the Gaussian kernel bases.

Fig. 11 Coefficients in the three first functions of the PCA basis (left) and the KPCA component basis (right)
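A minimal KPCA sketch following the eigenvalue problem (5): compute a Gram matrix between curves, double-center it (the Gram-matrix counterpart of subtracting ψ̄), and diagonalize. The curves, the kernel scaling, and the number of components kept are our own illustrative choices.

```python
import numpy as np

# KPCA sketch on illustrative curves: Gaussian kernel between observations,
# double-centred Gram matrix, then the eigenproblem n * alpha * theta = K * theta.
rng = np.random.default_rng(1)
n, p, gamma, d = 60, 100, 100.0, 3
t = np.linspace(0.0, 1.0, p)
X = np.sin(2 * np.pi * 3 * t)[None, :] + 0.05 * rng.standard_normal((n, p))

# Gram matrix with K(x, y) = exp(-gamma * ||x - y||^2 / p) (our scaling choice).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).mean(axis=2)
K = np.exp(-gamma * sq)

# Double centring plays the role of subtracting the mean psi_bar.
J = np.full((n, n), 1.0 / n)
Kc = K - J @ K - K @ J + J @ K @ J

eigval, theta = np.linalg.eigh(Kc)
eigval, theta = eigval[::-1], theta[:, ::-1]  # descending order

# Normalise so the eigenvectors V_lambda have unit norm in feature space,
# then project every observation onto the d first components.
theta_d = theta[:, :d] / np.sqrt(np.maximum(eigval[:d], 1e-12))
components = Kc @ theta_d
print(components.shape)
```

With a linear kernel this reduces to the standard PCA, as noted in the text.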


3.3 Frequency Spectrum

In the analysis of telemetries, changes in the period can be interesting to catch. By analyzing the periodogram, also called the frequency spectrum of the signal, some non-periodicity anomalies can be detected. Instead of computing our features on the original signal, it is possible to build them on the frequency spectrum. The frequency spectrum has been detailed by Schlindwein et al. [18]. Let X be an arbitrary function of length p, and denote by X(t) the t-th sample of the sequence. This function can be decomposed in a Fourier basis using the discrete Fourier transform

$$d(\nu) = \frac{1}{p}\sum_{t=0}^{p-1} X(t)e^{2\pi i \nu t}.$$

The periodogram of the function highlights the frequencies corresponding to the function's period. It is equal to I(ν) = p|d(ν)|². If the daily signal is periodic with an unknown period, then it is more relevant to apply the functional decomposition to the periodogram than to the initial values X_i: the periodogram is able to catch changes in the period without being disturbed by changes in the phase. We can then look for pattern anomalies in the frequency spectrum.

Example 6  We often represent the logarithm of the spectrum rather than the spectrum itself. In our case, the periodogram is represented in Fig. 12. As we can see, the periodogram makes it possible to catch local anomalies as well as periodicity anomalies. Then, we can extract the features (e.g., on kernel bases or the principal components). The local anomalies appear clearly in Fig. 13 as outliers. However, as we have built our simulated signal from periodic functions, we must validate this hypothesis on real data.

Fig. 12 Periodogram on simulated data


Fig. 13 Coefficients in the three first functions of the kernel basis and on the functional principal component basis computed on the periodogram
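The discrete Fourier transform and the periodogram above can be sketched with NumPy's FFT. Note that np.fft computes the unnormalized sum, hence the 1/p factor; the test signal below is an illustrative assumption.

```python
import numpy as np

# Sketch of the periodogram I(nu) = p * |d(nu)|^2 computed with the FFT.
p = 256
t = np.arange(p)
x = np.sin(2 * np.pi * 8 * t / p)  # 8 cycles over the window

d = np.fft.rfft(x) / p             # discrete Fourier coefficients d(nu)
I = p * np.abs(d) ** 2             # periodogram

# The dominant (non-DC) frequency bin matches the number of cycles.
print(int(np.argmax(I[1:])) + 1)   # → 8
```

Outlier-detection features can then be extracted from I (or log I) exactly as they are from the raw daily curves.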

3.4 Curve Registration

When extracting the information conveyed by curves, an additional difficulty may come from the fact that the curves are not aligned. Indeed, when looking for a meaningful representative function that characterizes the common behavior of the sample, capturing its inner characteristics, such as trends, local extrema, and inflection points, can be really relevant. In fact, the curves can be subject to both amplitude (variation on the y-axis) and phase (variation on the x-axis) variations with respect to the common pattern, as pointed out in [11] or [19], for instance. Hence, in the last two decades, there has been a growing interest in statistical methodologies and algorithms that align the curves prior to any analysis in order to remove this variability. Among all the methods, we refer, for instance, to [20–22] and references therein. Hereafter, we focus on warping functions γ, defined as follows. Denote by G the set of all diffeomorphisms γ such that

$$G = \{\gamma : [0, 1] \to [0, 1] \mid \gamma(0) = 0,\ \gamma(1) = 1\}.$$

A function γ ∈ G warps the time, and X_i ∘ γ denotes the time-warping of X_i by γ. Given two functions f_1 and f_2, the classical dynamic time warping algorithm finds the warping function that minimizes inf_{γ∈G} ||f_1 − f_2 ∘ γ||, where ||·|| denotes the standard L²([0, 1]) norm. This method is well adapted when we want to align two curves; to align more than two curves within the same algorithm, a new method was implemented.

Denote by $q_i = \mathrm{sgn}(X_i'(t))\sqrt{|X_i'(t)|}$ the square-root slope function (SRSF) of X_i. It can be shown that the SRSF of X_i ∘ γ, which we denote (q_i, γ), is equal to $q_i(\gamma(t))\sqrt{|\gamma'(t)|}$. Then, we can prove that ||q_1 − q_2|| = ||(q_1, γ) − (q_2, γ)||. Thanks to this property,


we can define a distance accounting for the time warping. We call this distance the y-distance D_y:

$$D_y(f_1, f_2) = \inf_{\gamma \in G} \left\| q_1 - (q_2 \circ \gamma)\sqrt{\gamma'} \right\|.$$

The Karcher means μ_f and μ_q are the functions that minimize

$$\mu_f = \underset{f \in L^2}{\operatorname{argmin}} \sum_{i=1}^{n} D_y(f, X_i)^2, \qquad \mu_q = \underset{q \in L^2}{\operatorname{argmin}} \sum_{i=1}^{n} \inf_{\gamma_i \in G} \left\| q - (q_i \circ \gamma_i)\sqrt{\gamma_i'} \right\|^2.$$

The idea of the algorithm is to start with the empirical mean of all functions and to warp all the curves towards this mean. At each iteration, we compute the new Karcher means and update the warping functions. The algorithm stops after a given number of iterations, or when the increment $\|\frac{1}{n}\sum_{i=1}^n (q_i \circ \gamma_i)\sqrt{\gamma_i'}\|$ is small enough. The aligned functions we get from the output of the algorithm are X̃_i = X_i ∘ γ_i*. These methods can be really useful when the patterns have a small phase or scale gap that can be ignored.

Example 7  We align the curves obtained from our simulated data thanks to the function align_fPCA of the R package fdasrvf. This function takes a set of curves, aligns them using warping functions as described earlier, and then computes the functional principal component analysis on the aligned curves. Figure 14 represents the original data and their warped version after 12 iterations of the algorithm. After computing the FPCA on the aligned data, we keep the seven first coefficients in order to retain 99% of the variance. We choose to represent the three first FPCA components before and after the warping algorithm. As we can see in Fig. 14, aligning the data generates some noise in the components we get. Some of the curves can be warped so much that they no longer look like the original ones; these extreme deformations can generate outliers that were not true outliers before the time warping procedure. However, in Fig. 15, we notice that the local anomalies stand out much more from the nominal data. The outlier detection methods will help us determine which setting is the best for the outlier detection purpose.
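The SRSF and its key isometry property can be checked numerically; the curves, the warping function, and the discretization below are illustrative assumptions (a production implementation would use a package such as fdasrvf, as in the text).

```python
import numpy as np

# Numerical sketch of the SRSF and of the isometry
# ||q1 - q2|| = ||(q1, gamma) - (q2, gamma)||.
def srsf(f, t):
    """Square-root slope function q = sgn(f') * sqrt(|f'|)."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

t = np.linspace(0.0, 1.0, 2001)
f1 = np.sin(2 * np.pi * t)
f2 = np.sin(2 * np.pi * t + 0.3)
gamma_t = t ** 2                   # a diffeomorphism of [0, 1]
dgamma = np.gradient(gamma_t, t)   # gamma'(t) >= 0

q1, q2 = srsf(f1, t), srsf(f2, t)
# SRSF of f o gamma: q(gamma(t)) * sqrt(gamma'(t))
q1w = np.interp(gamma_t, t, q1) * np.sqrt(dgamma)
q2w = np.interp(gamma_t, t, q2) * np.sqrt(dgamma)

norm = lambda q: float(np.sqrt(np.sum(q ** 2) * (t[1] - t[0])))
print(norm(q1 - q2), norm(q1w - q2w))  # equal up to discretization error
```

The two printed norms agree up to discretization error, which is exactly what makes D_y a well-defined distance under warping.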

4 Anomaly Detection

In this section, we detail several methods to detect outliers using the features that have been previously selected. First, it is important to define what an outlier is. Several definitions are used in the literature; the most famous is probably that of Hawkins [23]: "An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism."


Fig. 14 Original curves and their aligned versions

Fig. 15 Three first coefficients of the PCA applied on the original and aligned data

Based on this definition, we try to build a mathematical framework in order to detect outliers automatically. For this purpose, two points of view can be considered.
• An outlier is an observation that lies at an abnormal distance from the other observations. Distance-based methods are adapted to this point of view.
• An outlier is an observation situated in a low-density area. Then density-based methods and minimum volume set estimation can be applied.
We develop outlier detection methods based on these two points of view. We will start with distance-based methods, and then density-based methods, inspired by Chandola's anomaly detection classification [10]. All the outlier detection methods are applied to the feature sets defined in our previous examples, which we summarize as follows.
(1) The first Fourier coefficients, representing at least 99% of the variance.
(2) The coefficients deriving from the Daubechies wavelet levels j ∈ {0, 1, 2}.
(3) The coefficients deriving from the Daubechies wavelet level j = 3.
(4) The coefficients deriving from the Daubechies wavelet level j = 4.
(5) The coefficients deriving from the Daubechies wavelet level j = 5.
(6) The coefficients deriving from the projections onto a Gaussian kernel basis with γ = 100, representing at least 99% of the variance.
(7) The PCA coefficients, representing at least 99% of the variance.
(8) The KPCA coefficients, where the kernel is a Gaussian kernel with γ = 100, representing at least 99% of the variance.
(9) The coefficients deriving from the projections of the periodogram onto a kernel basis (we use the same kernel as for set 6).
(10) The PCA coefficients computed on the periodogram.
(11) The PCA coefficients computed on the aligned data.
For each application, we also present the results obtained when we aggregate the novelties detected on every feature set.
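The aggregation step amounts to taking the union of the indices flagged on each feature set; the per-set detections below are illustrative placeholders, not results from the chapter.

```python
# Sketch of the aggregation: an observation is flagged as a novelty if it
# is detected on at least one feature set (indices here are illustrative).
detections = {
    "fourier": {12, 57},
    "kernel": {12, 33, 57},
    "periodogram_pca": {12, 33, 57, 80},
}
aggregated = set().union(*detections.values())
print(sorted(aggregated))  # → [12, 33, 57, 80]
```

This is why the aggregated row of the result tables can find all anomalies while accumulating the false alarms of every feature set.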

4.1 Anomaly Detection Using Distance Methods

Some methods to detect outliers in functional data are based on distances between the curves. Usually, the distance refers to the Euclidean distance on the raw data. As we have reduced the dimension of the data to get d coefficients for each curve, the distance we use is the Euclidean distance built on the features:

$$d(\hat{\boldsymbol\theta}_i, \hat{\boldsymbol\theta}_j) = \sqrt{\sum_{\lambda=1}^{d} (\hat\theta_{i,\lambda} - \hat\theta_{j,\lambda})^2} = \|\hat{\boldsymbol\theta}_i - \hat{\boldsymbol\theta}_j\|.$$

Having defined this feature distance, we can now apply distance-based methods for outlier detection.
1. Hierarchical ascending clustering
Hierarchical ascending clustering is a widely used distance-based clustering method. Let D be the distance matrix of all our features, where D_{ij} = d(θ̂_i, θ̂_j), with i, j = 1, …, n. The usual clustering, as it is often applied, is unlikely to isolate outlier clusters. However, we know that if the intergroup similarity is the "single linkage", the outliers are likely to be situated in very small clusters.
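This single-linkage strategy can be sketched as follows; the Gaussian features and the number of clusters are our own illustrative assumptions (the text selects the number of clusters automatically with the gap statistic).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Single-linkage sketch: outliers end up in very small clusters, so we
# flag everything outside the biggest cluster (illustrative features).
rng = np.random.default_rng(2)
features = rng.normal(0.0, 1.0, size=(60, 4))
features[5] += 10.0   # two artificial outliers far from the bulk
features[42] -= 10.0

Z = linkage(features, method="single")
labels = fcluster(Z, t=3, criterion="maxclust")  # here we fix 3 clusters

biggest = np.bincount(labels).argmax()
outliers = np.flatnonzero(labels != biggest)
print(sorted(outliers.tolist()))  # → [5, 42]
```

With single linkage, the bulk of the data is merged long before the two far points join the tree, so cutting the dendrogram leaves the outliers in singleton clusters.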

Table 1  Hierarchical clustering on simulated data

Features               Number of clusters   Anomalies (pattern-period-local, of 3-1-3)   False alarms
1 - Fourier            3                    1-1-0                                        0
2 - Wavelet (0,1,2)    3                    1-1-0                                        0
5 - Wavelet (5)        13                   2-1-3                                        6
6 - Kernel (γ = 100)   3                    1-1-0                                        0
7 - PCA                4                    2-1-0                                        0
8 - KPCA               4                    2-1-0                                        0
9 - Periodo + ker      6                    1-1-3                                        0
10 - Periodo + PCA     5                    1-1-3                                        0
11 - Shape + PCA       2                    0-1-0                                        0
Aggregating            –                    All anomalies detected (3-1-3)               6

All we have to do is to set the number of clusters so as to isolate the outliers. Tibshirani et al. [24] introduced the gap statistic, a criterion to set the number of clusters automatically. For each feature set defined earlier, we apply this automatic hierarchical clustering on the simulated data and label as anomalous the individuals that are not in the biggest cluster. The results we get with the simulated data are reported in Table 1. For almost all feature types, the anomalies are in singleton clusters. The fifth wavelet level feature set generates some misdetections, but almost all anomalies can be found there. We first notice that, for all the other feature sets, we have detected nothing but real anomalies. We can see that the feature sets computed on the periodogram seem to be the best features for detecting outliers. The automatic criterion for selecting the best number of clusters is difficult to set up: several numbers of clusters are possible, and as we do not know the number of anomalies in advance, it is difficult to fix it properly. However, aggregating all the results of the clustering provides good results: all anomalies are found at least once.
2. Temporal nearest neighbors dissimilarity
For the telemetry application, we have chosen to consider the signal as repetitions of curves. However, there is a logical order in these curves that can be taken into account. Thus, we have developed a novel distance-based method to handle this property. In this section, we use a distance that we denote the kernel distance d_K, which depends on a kernel K. This kernel has to be chosen so as to satisfy the following property.


For all x, y, h ∈ ℝ, K(x, y) = K(x + h, y + h). Kernels such as Laplacian kernels and Gaussian kernels can be considered. The kernel distance we choose is d_K(θ̂_i, θ̂_j) = K(i, j) × d(θ̂_i, θ̂_j). Then we define the temporal nearest neighbors dissimilarity (TNND) Δ as

$$\Delta_i = \sum_{j < i} d_K(\hat{\boldsymbol\theta}_i, \hat{\boldsymbol\theta}_j).$$

This dissimilarity is motivated by the fact that some of the telemetries are repetitions of patterns that slowly evolve through time. Thus, a day of telemetry must be compared to the other days by giving more importance to the most recent previous days.

Proposition 1  Suppose that θ̂_i = θ̂_{i−1} + ε_i, with ε_i ∈ ℝ^d. As the telemetry evolves slowly, we suppose that, if there is no anomaly, ||ε_i|| < η for all i = 1, …, n, for some η > 0. Then, if K is a Laplacian kernel, K(x, y) = exp(−ρ|x − y|), where ρ > 0, we can show that

$$\Delta_i \le \eta\, \frac{e^{-\rho}}{(1 - e^{-\rho})^2}. \tag{7}$$

Proof  For all j = 1, …, i − 1, we have

$$d(\hat{\boldsymbol\theta}_{i-j}, \hat{\boldsymbol\theta}_i) = \Big\| \sum_{l=1}^{j} \varepsilon_{i-l} \Big\| \le \sum_{l=1}^{j} \|\varepsilon_{i-l}\| \le j \times \eta.$$

Thus, we have

$$\Delta_i = \sum_{j=1}^{i-1} K(i - j, i) \times d(\hat{\boldsymbol\theta}_{i-j}, \hat{\boldsymbol\theta}_i) \tag{8}$$

$$\le \eta \times \sum_{j=1}^{i-1} j\, e^{-\rho j} \le \eta \times \frac{e^{-\rho}}{(1 - e^{-\rho})^2}. \tag{9}$$

In the case where no anomaly occurs, we can estimate η by the maximum ||ε_i|| observed. It is also possible to consider that at most a proportion α of the observations are outliers. In this case, we can estimate η by the 1 − α quantile of the ||ε_i||, i = 1, …, n, and detect outliers when Δ_i exceeds the resulting bound. This method can only be applied to telemetry data. However, we notice that, in some cases, it can also be generalized to test data, because several curves can come from the same test, where an input parameter evolves at each iteration.

Example 8  Let us suppose that at most 2% of the observations are anomalous. Then we fix η = q_{0.98}(||ε||). The results for each basis are reported in Table 2. We can see that the third wavelet level is able to raise all the anomalies with no false alarm when the threshold is set thanks to the 98% quantile. Figure 16 shows the evolution

Table 2  TNND results on simulated data

Features                  Anomalies (pattern-period-local, of 3-1-3)   False alarms
1 - Fourier               2-1-0                                        0
2 - Wavelet (lev 0,1,2)   3-1-0                                        0
3 - Wavelet (lev 3)       3-1-3                                        0
4 - Wavelet (lev 4)       2-1-3                                        0
6 - Kernel (γ = 100)      2-1-0                                        0
7 - PCA                   2-1-0                                        0
8 - KPCA                  2-1-0                                        0
9 - Periodo + kernel      0-0-3                                        0
10 - Periodo + PCA        0-0-3                                        0
11 - Shape + PCA          3-1-1                                        1
Aggregating               All anomalies detected                       1

Fig. 16 TNND and anomaly detected with the threshold set as 98% quantile for the third level of wavelet (left) and the periodogram + PCA (right)

of Δ for this feature set. By analyzing the evolution of the TNND for the other feature spaces, we can see that we could have detected all, or almost all, the anomalies by setting the threshold to a lower value. Figure 16 also shows the results for the PCA built on the periodogram, for example. We also notice that only the feature set built on the aligned data generates false alarms.
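The TNND computation can be sketched directly from its definition; the slowly drifting features, the anomalous day index, and the value of ρ below are illustrative assumptions.

```python
import numpy as np

# TNND sketch with a Laplacian kernel K(i, j) = exp(-rho * |i - j|):
# each day is compared to the previous ones, recent days weighing more.
rng = np.random.default_rng(3)
n, d, rho = 100, 5, 1.0
theta = np.cumsum(0.01 * rng.standard_normal((n, d)), axis=0)  # slow drift
theta[70] += 2.0  # one abnormal day

tnnd = np.zeros(n)
for i in range(1, n):
    j = np.arange(i)
    dist = np.linalg.norm(theta[i] - theta[j], axis=1)  # feature distance
    tnnd[i] = np.sum(np.exp(-rho * (i - j)) * dist)     # kernel-weighted sum

# Flag days whose dissimilarity exceeds the 98% quantile of the scores.
threshold = np.quantile(tnnd, 0.98)
print(int(np.argmax(tnnd)))  # → 70
```

The geometric decay of the Laplacian kernel is what yields the bound (7): nominal days accumulate only small, exponentially discounted distances.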


4.2 Local Outlier Factor

The Local Outlier Factor (LOF) is a score introduced by Breunig et al. [25] to detect outlier data. In addition to detecting outliers, it returns a score measuring how anomalous each observation is. This approach is also related to density-based clustering, and it inspired the ESA Novelty software [1]. The score is local, since it depends on how isolated the object is with respect to its surrounding neighborhood. Suppose we have n objects x_1, …, x_n to cluster, and let k be the number of neighbors to consider. To simplify the notations, we suppose that for x_1, x_2, x_3 all different, d(x_1, x_2) ≠ d(x_1, x_3).
– Let d_k(p) be the k-distance of an object p, i.e., the distance such that k objects are closer to p than d_k(p) and the remaining points are situated further. Let N_k(p) be the k nearest neighbors of the object p.
– The reachability distance of an object p with respect to an object o is then defined as r_k(p, o) = max{d_k(o), d(p, o)}. If p and o are sufficiently close, the distance between them is replaced by the k-distance of o.
– The local reachability density of p is then defined as

$$lr_k(p) = \frac{k}{\sum_{o \in N_k(p)} r_k(p, o)}.$$

– The Local Outlier Factor is then defined as

$$\mathrm{LOF}_k(p) = \frac{1}{k}\sum_{o \in N_k(p)} \frac{lr_k(o)}{lr_k(p)}.$$

In other words, the LOF compares the local reachability density of a given object with those of its nearest neighbors. When this score is close to 1, the object is distributed in the same way as its neighbors; when the LOF is large, the corresponding object is likely to be an outlier.

Example 9  In Table 3, we report the results obtained when we consider as anomalous all the observations with LOF > 1.5, for k = 10 neighbors. We can see that, once again, all the anomalies were found at least once. The results are comparable to, and slightly less satisfying than, those obtained with the TNND. The choice of the threshold is important, and we can see that the detection varies a lot from one feature set to another. In Fig. 17, we represent the two cases where the LOF exhibits the greatest and the smallest values. With the kernel features built on the periodogram, we can detect all anomalies, at the cost of some false alarms. On the other hand, with the same threshold, the LOF computed on the KPCA

Table 3  LOF results on simulated data

Features                  Anomalies (pattern-period-local, of 3-1-3)   False alarms
1 - Fourier               1-1-0                                        0
2 - Wavelet (lev 0,1,2)   2-1-0                                        0
4 - Wavelet (lev 4)       1-1-0                                        0
5 - Wavelet (lev 5)       1-1-3                                        2
6 - Kernel (γ = 100)      1-1-0                                        0
7 - PCA                   2-1-0                                        0
8 - KPCA                  0-1-0                                        0
9 - Periodo + kernel      3-1-3                                        4
10 - Periodo + PCA        2-1-3                                        1
11 - Shape + PCA          2-1-1                                        2
Aggregating               All anomalies detected                       8

Fig. 17 LOF and anomaly detected—threshold at 1.5

features detects only one anomaly. To explore several choices of threshold, we plot for each feature set the ROC curve. Those curves can be found in Fig. 18. As we can see, the features based on the kernel basis on the spectrum are the best choices in this example.
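The LOF rule of Example 9 (k = 10 neighbors, flag LOF > 1.5) can be sketched with scikit-learn, which stores −LOF in negative_outlier_factor_; the Gaussian features and the planted outlier below are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# LOF sketch on illustrative features with one clear outlier.
rng = np.random.default_rng(4)
features = rng.normal(0.0, 1.0, size=(200, 3))
features[17] = [8.0, 8.0, 8.0]

lof = LocalOutlierFactor(n_neighbors=10)
lof.fit(features)
scores = -lof.negative_outlier_factor_  # LOF_k(p) for each observation

flagged = np.flatnonzero(scores > 1.5)
print(flagged.tolist())
```

Instead of a fixed 1.5 threshold, the scores can also be swept to produce the ROC curves of Fig. 18.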

4.3 Density Approach

Anomalies are rare events, and we would like to find them in our coefficients θ̂_i ∈ ℝ^d. The idea is to find a measurable subset G of ℝ^d, of probability greater than γ and of volume as small as possible; Thomas et al. [3] gave a definition of this property. In other words, we would like to solve the following problem


Fig. 18 ROC curves for the LOF computed on several feature sets

$$\min_{G \in \mathcal{B}(\mathbb{R}^d)} \left\{ \mu(G) \mid \mathbb{P}(\hat{\boldsymbol\theta} \in G) > \gamma \right\},$$

where μ(G) is the Lebesgue measure of G. Here, B(ℝ^d) denotes the set of all measurable subsets of ℝ^d, and P is the distribution of the coefficients θ̂. In fact, G is a density level set; see Cadre et al. [26] for further details. By truncating the density at a level γ, we are able to isolate the rare events.

The One-Class SVM is one of the most famous algorithms that performs outlier detection by estimating the minimum volume set. It was introduced by Schölkopf et al. [27], and was already applied by Shawe-Taylor et al. [28] to time series data. Let (x_1, …, x_n) ∈ ℝ^{d×n} be a sample of data. As in SVM classification, the idea of the One-Class SVM is to transform the data with a nonlinear function ψ. After transformation, the problem is to find the separating hyperplane, orthogonal to a vector w, that defines a minimum volume containing a given proportion of the data. It isolates the individuals situated in low-density regions with a margin ρ; those points are considered as outliers. The values of w and ρ are the solutions of the optimization problem

$$\begin{aligned} \min_{w, \xi, \rho}\ & \frac{1}{2}\|w\|^2 - \rho + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i \\ \text{s.t.}\ & \langle w, \psi(x_i) \rangle \ge \rho - \xi_i, \quad \forall i = 1, \dots, n, \\ & \xi_i \ge 0, \quad \forall i = 1, \dots, n. \end{aligned} \tag{10}$$

The parameter ν ∈ ]0, 1] is fixed by the user. It represents an upper bound for the proportion of anomalies highlighted by the One-Class SVM. The decision function of this problem is h(x) = sgn(⟨w, ψ(x)⟩ − ρ). It returns 1 if the individual is in a


high-density region. Let K(x, y) = ⟨ψ(x), ψ(y)⟩ be a kernel. Problem (10) is convex and can be solved through its dual:

$$\begin{aligned} \min_{\alpha \in \mathbb{R}^n}\ & \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j K(x_i, x_j) \\ \text{s.t.}\ & 0 \le \alpha_i \le \frac{1}{\nu n}, \quad \sum_{i=1}^{n} \alpha_i = 1. \end{aligned} \tag{11}$$

The points x_i such that α_i ≠ 0 are called the support vectors. We can remark that ν is also a lower bound for the fraction of support vectors. These support vectors define the limit between outliers and nominal data. The decision function becomes $h(x) = \mathrm{sgn}\big(\sum_{i=1}^n \alpha_i K(x_i, x) - \rho\big)$. The advantage of the One-Class SVM is that the nominal set is defined only by the support vectors; thus, the computation time for new values is much faster than for many other algorithms. Usually, we choose a Gaussian kernel K_γ as defined in Sect. 2.2.1. This method is widely used for anomaly detection; NOSTRADAMUS is based on it [2]. Thomas et al. [3] propose a new algorithm to tune the parameters γ and ν in order to find a minimum volume set of a desired level λ. Other methods exist to estimate the minimum volume set, such as isolation forests, introduced by Liu et al. [29].

Example 10  In Table 4, we can see that the One-Class SVM systematically generates false-positive detections. This can be explained by the fact that it returns at least 5% of outliers, whereas we truly have less than 3% of anomalies. Even if we set ν to a very small level, for example 1%, we would always have a large number of false positives. It can also be explained by the fact that, in most feature sets, the features are not distributed according to an agglomerate of points; hence, it is more difficult to estimate the minimum volume set. Moreover, we detect fewer anomalies than with the other methods in general. The periodogram seems once again to be the best feature set, since all the anomalies were found with it. These results can also be justified by the fact that, for most feature sets, the features are ordered according to their temporal occurrence: there are no clear high-density areas, and density methods are expected to be less efficient than the TNND.

Table 4  OCSVM results on simulated data, with ν = 0.05 and γ = 1/d

Features                  Anomalies (pattern-period-local, of 3-1-3)   False alarms
1 - Fourier               1-0-1                                        13
3 - Wavelet (lev 3)       1-1-3                                        14
5 - Wavelet (lev 5)       1-1-3                                        7
6 - Kernel (γ = 100)      1-1-1                                        15
7 - PCA                   1-0-2                                        12
8 - KPCA                  0-0-0                                        20
9 - Periodo + kernel      3-1-3                                        6
10 - Periodo + PCA        1-0-0                                        18
11 - Shape + FPCA         3-1-1                                        11
Aggregating               All anomalies detected                       78

4.4 Conclusion on the Methods

Given a simulated telemetry, we are able to draw several conclusions on the efficiency of the different methods.
• The distance methods can be applied when a telemetry evolves through the year.
• The One-Class SVM is less efficient, since the trending effect does not always generate high-density areas.
• The temporal nearest neighbors dissimilarity is the best method to use in the given example, but the parameter η has to be chosen coherently with the percentage of anomalies.
For each outlier detection method, at least one feature set was able to raise all anomalies with only a few false alarms. The TNND coupled with the features built on the periodogram seems to be the best compromise in this case. All the results from the simulated data are shown in Fig. 19. All these methods have to be implemented on real data to confirm, or refine, our results.

Fig. 19 True-positive rate over false-positive rate for all methods


5 Validation Using Satellite Data

We have validated our methods on simulated telemetry. However, as we noticed in Sect. 2.2, the telemetries can have various behaviors, and consequently we would also like to test the methods on real data. For each dataset, we comment on the results and illustrate those of the methods that return the most interesting results for the given use case.

5.1 Telemetry

1. First application
We are dealing with the first telemetry described earlier. As we have seen in Sect. 2.2, some outliers are expected to be detected, such as spikes, drops in the amplitude of the signal, and pattern anomalies. On some portions, the signal can be altered. We have labeled each portion of signal by hand and computed all the features developed in Sect. 3. However, this telemetry does not have a daily periodicity, which matters because the repetitive patterns are not all scaled in the same way (see Fig. 1). Thus, we focus on the features built on the periodogram and on the aligned data. As expected, the hierarchical clustering returns a very small number of outliers, because of the impossibility of fixing the number of clusters properly. The One-Class SVM returns many false alarms; as some anomalies occur several times, it seems also that the OCSVM considers those anomalies as "nominal". The best results are obtained with the Local Outlier Factor computed on the kernel basis, on the periodogram. We have labeled some of the days represented in Fig. 20 in order to visualize the efficiency of this algorithm.

2. Second application
The previous example was a critical one; usually, a telemetry dataset contains only a few outliers. For instance, the second application does not contain known anomalies, and the algorithms will help us detect changes in the behavior of this telemetry. The telemetry varies over the year, and its daily portions are repetitive; thus, any functional decomposition can be informative in this case. As the patterns are really regular, it seems unnecessary to use the periodogram or the aligned data. The principal component analysis seems to provide good results, as do the wavelet and kernel features. As the telemetry varies, the temporal nearest neighbor dissimilarity is the most powerful method. We represent the values of the statistic Δ computed on the PCA coefficients and on the fourth wavelet level. These results appear in Fig. 21. We can see that both feature sets highlight more or less the same outliers, but not


Fig. 20 Local Outlier Factor of the kernel features built on the periodogram, first telemetry application

Fig. 21 TNND on two feature sets

with the same importance. Figure 22 shows some examples of anomalies found thanks to this method. Local changes, as well as changes in the behavior of the telemetry, can be detected.

5.2 Gain Frequency Tests

In this section, we apply the algorithms on test data instead of telemetry. As we have extracted functional features, it is important to validate this work directly on functional data. The temporal nearest neighbor dissimilarity cannot be used in this


Fig. 22 Anomalies found with TNND, with both PCA and wavelet features

case, because the curves are not temporally ordered. We could have used it if, for example, the equipment had been tested under several conditions, for instance if a curve were produced each time the temperature was increased. This will be done in a further study. We compute the One-Class SVM and the LOF on all the feature sets described in Sect. 3. Once again, two different methods are necessary to detect the different types of outliers. As in a previous study (see [30]), we can see that the best feature basis is the principal component analysis. Both methods (LOF and One-Class SVM) give satisfying results, but the OCSVM again generates many false positives, as we can see in Fig. 23. In this case, this can be explained by the fact that the alterations can only decrease the values of the curves; thus, a "perfect" GF curve can be seen as an outlier, since it is an upper boundary for all the GF curves. The Local Outlier Factor gives similar results, with fewer false alarms; therefore, this method seems more adapted to our problem. The fifth wavelet level highlights the expected local anomalies described in Fig. 3 (Fig. 24).

5.3 Application to a Full Archive of Satellite Telemetry

This work has enabled us to benchmark a large number of methods for outlier detection on space telemetry data. In order to validate these results on real telemetry data, we have applied one of these methods on the full archive of a satellite. This archive contains 12 years of more than 1700 continuous signals, sampled every 24 s; it represents nearly 1 terabyte of data. In a first step, we have benchmarked different storage solutions, such as HDFS, Parquet, Hive, and HBase. We chose Parquet for its good performance in terms of time consumption for


Fig. 23 One-Class SVM (ν = 0.05) on the principal component features

Fig. 24 LOF (k = 20 neighbors) on the principal component features. Threshold at 1.5

reading and processing the data, and because it is one of the easiest to set up for data ingestion. See Boussouf [31] for further information on the implementation and infrastructure details. We applied the kernel feature computation along with the TNND computation, as this combination had shown good performance on simulated data and is really easy to implement in a distributed framework. We applied the same algorithm on all the telemetries, whatever their type or the equipment they come from. In a first step, we needed to decommute the telemetries; this step proved really time-consuming, as it took us 60 days to decommute the whole archive. Storing the data into Parquet takes approximately 9 days, and we were surprised that the outlier detection processing on this archive took only 4 h and 30 min to run, which is really short compared to the previous steps.

Statistical Methods for Outlier Detection in Space Telemetries


Fig. 25 Example of outliers detected in the archive of the satellite, in red. In orange, the same day one year earlier is shown for comparison. On the left, a local anomaly has been raised, whereas the outlier on the right might be a false alarm, due to the shape of the telemetry, which is not suited to our method

The results are very satisfying: the outliers raised by our algorithm correspond to real novelties, such as pattern events and local events. We have also noticed that our method is less powerful on non-periodic signals; this issue will be treated in future work. Figure 25 shows a local event detected by the algorithm, together with an example of the weakness of our algorithm on non-periodic signals.

6 Conclusion

We have explored several methods for unsupervised outlier detection in functional data. We have seen that no single method is likely to bring out all anomalies. However, by aggregating the feature sets, we were able to detect all anomalies in all the sources of data. We have also seen that aggregating the methods sometimes generates many false alarms, as for the OCSVM. Even in these extreme cases, the outlier detection methods still improve on the manual review, since they enable the expert to focus on a reduced number of observations. In addition, these methods can be more or less sensitive to given properties of the data, in particular its periodicity. We can conclude the following.
• The One-Class SVM seems to generate more false alarms. However, if we can learn from the results of the classification for a while, it would be possible to switch to a semi-supervised approach and then to a supervised model. This method is most efficient when the percentage of anomalies is known and the data is distributed as an agglomerate of points.
• The Local Outlier Factor is very efficient when the data is not evolutive. It is often the best method to detect local anomalies, as for the Gain-Frequency data. It can be used with any features.


• The TNND is the best method to characterize the evolutive properties of telemetry. However, the percentage of outliers has to remain as small as possible.

We can also conclude on the features to be used in each case.
• The kernel basis, PCA, KPCA, and the low-level wavelet features can be used as soon as the curves are similar, as for the test data. In this case, PCA and KPCA provide satisfying results for raising pattern anomalies.
• The periodogram features are very efficient when the data has a changing period, as in the first telemetry example. When the telemetry is very regular, they are unnecessary.
• The high-level wavelets are useful to detect local anomalies, as are the periodogram features.
• Aligning the data can be useful when, for example, some patterns appear randomly in an otherwise constant telemetry. If the scale of the repetitive patterns changes randomly, it is better to use the periodogram.

By combining the type of anomalies we want to detect with the data type, we can find the best method for outlier detection. The feedback of our experts on this work will help us improve the method, especially on non-periodic data, as we found in our full-archive analysis. We have also demonstrated the performance of our method on a full archive of telemetries, showing that it is easy and fast to run in a distributed environment, enabling big data applications. A further step would be to use the correlation between several telemetries as a feature, in order to detect abnormal correlations across several parameters.
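The decision guidance above can be condensed into a simple lookup; the helper below is purely illustrative (the function and category names are ours, not the paper's):

```python
# Hypothetical decision helper condensing the conclusions above: given the
# anomaly type sought and a property of the telemetry, suggest an
# outlier-detection method and a feature basis.

def suggest(anomaly: str, data: str) -> tuple[str, str]:
    method = {
        "local": "LOF",                 # best for local anomalies
        "pattern": "One-Class SVM",     # agglomerated, similar curves
        "evolutive": "TNND",            # long-term novelties
    }[anomaly]
    features = {
        "similar_curves": "PCA / KPCA / kernel basis / low-level wavelets",
        "changing_period": "periodogram",
        "local_events": "high-level wavelets or periodogram",
    }[data]
    return method, features

assert suggest("evolutive", "similar_curves")[0] == "TNND"
assert "periodogram" in suggest("pattern", "changing_period")[1]
```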

References

1. Martínez-Heras, J.-A., Donati, A., Kirsch, M. G., & Schmidt, F. (2012). New telemetry monitoring paradigm with novelty detection. In SpaceOps Conference (pp. 11–15). Stockholm, Sweden.
2. Fuertes, S., Picart, G., Tourneret, J.-Y., Chaari, L., Ferrari, A., & Richard, C. (2016). Improving spacecraft health monitoring with automatic anomaly detection techniques. In 14th International Conference on Space Operations (p. 2430).
3. Thomas, A., Feuillard, V., & Gramfort, A. (2015). Calibration of one-class SVM for MV set estimation. In IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 1–9).
4. Rabenoro, T. (2015). Outils statistiques de traitement d'indicateurs pour le diagnostic et le pronostic des moteurs d'avions. Ph.D. thesis, Université Paris 1 Panthéon-Sorbonne.
5. Abdel-Sayed, M., Duclos, D., Faÿ, G., Lacaille, J., & Mougeot, M. (2016). NMF-based decomposition for anomaly detection applied to vibration analysis. 6, 73–81.
6. Antoniadis, A., Brossat, X., Cugliari, J., & Poggi, J.-M. (2013). Clustering functional data using wavelets. International Journal of Wavelets, Multiresolution and Information Processing, 11(01), 1350003.
7. Tarpey, T., & Kinateder, K. K. J. (2003). Clustering functional data. Journal of Classification, 20(1), 93–114.
8. Auder, B., & Fischer, A. (2012). Projection-based curve clustering. Journal of Statistical Computation and Simulation, 82(8), 1145–1168.


9. Pan, J., Chen, J., Zi, Y., Li, Y., & He, Z. (2016). Mono-component feature extraction for mechanical fault diagnosis using modified empirical wavelet transform via data-driven adaptive Fourier spectrum segment. Mechanical Systems and Signal Processing, 72, 160–183.
10. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 15.
11. Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis (2nd ed.). New York: Springer Series in Statistics, Springer.
12. Daubechies, I. (1992). Ten lectures on wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics (Vol. 61). Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).
13. Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718.
14. Steinwart, I. (2001). On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2, 67–93.
15. Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London.
16. Schölkopf, B., & Smola, A. J. (2000). Learning with kernels. Cambridge: MIT Press.
17. Schölkopf, B., Smola, A. J., & Müller, K.-R. (2005). Kernel principal component analysis. Artificial Neural Networks.
18. Schlindwein, F., & Evans, D. (1992). Autoregressive spectral analysis as an alternative to fast Fourier transform analysis of Doppler ultrasound signals. Doppler Physics and Signal Processing.
19. Vantini, S. (2012). On the definition of phase and amplitude variability in functional data analysis. Test, 21, 676–696.
20. Wang, K., & Gasser, T. (1997). Alignment of curves by dynamic time warping. Annals of Statistics, 25, 1251–1276.
21. Gamboa, F., Loubes, J.-M., & Maza, E. (2007). Semi-parametric estimation of shifts. Electronic Journal of Statistics, 1, 616–640.
22. Dimeglio, C., Gallón, S., Loubes, J.-M., & Maza, E. (2014). A robust algorithm for template curve estimation based on manifold embedding. Computational Statistics & Data Analysis, 70, 373–386.
23. Hawkins, D. M. (1980). Identification of outliers (Vol. 11). Berlin: Springer.
24. Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411–423.
25. Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In ACM SIGMOD Record (Vol. 29-2, pp. 93–104). ACM.
26. Cadre, B., Pelletier, B., & Pudlo, P. (2013). Estimation of density level sets with a given probability content. Journal of Nonparametric Statistics, 25(1), 261–272.
27. Schölkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., & Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 1443–1471.
28. Shawe-Taylor, J., & Zlicar, B. (2015). Novelty detection with one-class support vector machines. Advances in Statistical Models for Data Analysis.
29. Liu, F., Ting, K., & Zhou, Z.-H. (2008). Isolation forest. Data Mining.
30. Barreyre, C., Laurent, B., Loubes, J.-M., & Cabon, B. (2016). Détection d'événements atypiques dans des données fonctionnelles. Les Journées de la Statistique.
31. Boussouf, L., Bergelin, B., Scudeler, D., Graydon, H., Stamminger, J., Rosnet, P., et al. (2018). Big data based operations for space systems. In 2018 SpaceOps Conference (p. 2506).

Part III

Mission Execution

In-Orbit Experience of the Gaia and LISA Pathfinder Cold Gas Micro-propulsion Systems

Jonas Marie, Federico Cordero, David Milligan, Eric Ecale and Philippe Tatry

Abstract This paper presents in-flight experience of the cold gas micro-propulsion systems (MPS) used on-board the Gaia and LISA Pathfinder spacecraft. Gaia is an ESA Science cornerstone mission tasked with mapping one billion stars in the Milky Way to unprecedented precision. It is also expected to discover and chart hundreds of thousands of new objects, including near-Earth asteroids, exoplanets, brown dwarfs and quasars. The Gaia spacecraft was designed and built by Airbus Defence and Space. After a flawless launch on 19 Dec 2013, it was sent on the circa 1.5 million km journey to L2 via a Soyuz Fregat burn. Additional delta-V was realized via a sequence of technically demanding orbit transfer manoeuvres using on-board chemical thrusters in thrust vectoring mode. Since early 2014, Gaia has been operating in a halo orbit around the second Sun-Earth Lagrange point, which provides the stable thermal environment, free of Earth eclipses, needed for the payload to function accurately. Starting in parallel to this and lasting six months, the spacecraft was fully commissioned and brought gradually up to the highest operational mode. The unique rate stability requirements for Gaia's science mode (the standard deviation of its rate error is equivalent to one rotation every 420 years) led to a high number of bespoke units, including a 106-CCD focal plane assembly, telescope-in-the-loop AOCS control and a cold gas micro-propulsion system developed by Thales Alenia Space Italia (TAS-I) and Leonardo Company. Continuously compensating for the solar radiation pressure torque in order to maintain an undisturbed scanning law of the celestial sphere, a 3-year data set of MPS housekeeping telemetry was collected that offers the unique opportunity to characterize the long-term performance of this novel fine-pointing actuation system (thrust range 1 µN to 1000 µN at 0.1 µN resolution) in thermally stable conditions. Planned disturbances such as station keeping manoeuvres using coarse chemical propulsion, as well as unplanned disturbances due to environmental effects such as micro-meteoroid impacts, also help characterize the cold gas system performance under stress due to sudden increases in thrust demand. LISA Pathfinder (LPF) is an ESA mission that demonstrates technologies needed for a planned ESA gravitational wave observatory. The LPF spacecraft, designed and built by Airbus Defence and Space, places two test masses in a nearly perfect gravitational free-fall, and controls and measures their relative motion with unprecedented accuracy. The laser interferometer measures the relative position and orientation of the masses to an accuracy of less than 0.01 nanometres, a technology shown to be sensitive enough to detect gravitational waves by the planned follow-on ESA mission, the Laser Interferometer Space Antenna (LISA). Launched on 03 Dec 2015, LPF reached its operational orbit around L1 in early 2016, where it underwent payload commissioning. The same MPS used by Gaia was selected for LPF fine attitude control to realize the extremely accurate free-fall trajectory of its test masses. After reaching the destination orbit, the propulsion module was separated. From this point, the MPS was also the main actuator for station keeping manoeuvres, where the cold gas thrusters are operated in "open-loop", realizing demanded forces. Approaching its nominal end-of-mission lifetime, LPF carried out dedicated test operations to characterize MPS performance under non-nominal conditions. The aim of this paper is to present the in-flight results that have been derived from post-processing of the Gaia and LPF housekeeping telemetry archives in terms of micro-thruster performance. MPS off-nominal events encountered in-flight on individual or both spacecraft will be presented, as well as the mitigation actions put in place to restore nominal conditions.

J. Marie (B): LSE Space GmbH, Darmstadt, Germany, e-mail: [email protected]
F. Cordero: Telespazio Vega, Darmstadt, Germany
D. Milligan: ESA/ESOC, Darmstadt, Germany
E. Ecale, P. Tatry: Airbus Defence and Space, Toulouse, France

© Springer Nature Switzerland AG 2019. H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind's Future, https://doi.org/10.1007/978-3-030-11536-4_21

Nomenclature

ARM: Apogee raising manoeuvre
CACS: Composite attitude control subsystem
CPS: Chemical propulsion system
DFACS: Drag-free attitude control subsystem
DN: De-nutation phase
DRS: Disturbance reduction system
DS: De-spin phase
FSS: Fine sun sensors
GN2: Gaseous nitrogen
IS: Inertial measurement system
JPL: Jet Propulsion Laboratory
LPF: LISA Pathfinder
LTP: LPF technology package
MFS: Mass flow sensor
MPE: Micro-propulsion electronics
MPS: Micro-propulsion system
MT: Micro-thruster
OBCP: On-board control procedure
PLM: Payload module
PRM: Chemical propulsion module
SAA: Solar aspect angle
SCM: Science module
SRP: Solar radiation pressure
TV: Thruster valve
VPU: Video processing unit

1 Introduction

The aim of this paper is to present the in-flight results derived from post-processing of the Gaia and LPF housekeeping telemetry archives in terms of micro-propulsion system (MPS) performance. Significant MPS off-nominal events encountered in-flight on one or both spacecraft will be presented, as well as the mitigation actions put in place to restore nominal conditions.

1.1 Gaia

The Gaia mission, designed and built by Airbus Defence and Space, was launched on 19 December 2013 on a Soyuz-Fregat ST from Kourou, French Guiana, and is operated by the European Space Operations Centre (ESOC) in Darmstadt, Germany. Gaia is an ESA cornerstone mission which builds on the proven principles of ESA's Hipparcos mission to solve one of the most difficult yet deeply fundamental challenges in modern astronomy: to create an extraordinarily precise three-dimensional map of about one billion stars, down to star magnitude 20, throughout our Galaxy and beyond [1]. The Gaia spacecraft contains a relatively large fraction of bespoke units, due largely to the incredible precision requirements. Gaia's digital camera is the largest ever flown in space, containing 106 CCDs which are around 90% light efficient (cf. ~20% for a typical terrestrial camera). The telescope is linked to the on-board attitude control system, providing precise rate measurements and allowing an overall control rate error equivalent to one rotation every 420 years (Fig. 1). Moving parts on-board are strictly minimised (e.g. no reaction wheels and no mechanically steerable antenna). The data are downlinked through a novel electromagnetically steerable phased array antenna, and attitude control is provided by a micro-propulsion system making its first flight use on Gaia. An atomic clock is used for precise time-stamping. The goal of the Gaia mission is to perform global


Fig. 1 Gaia S/C reference frame

astrometry. For this purpose, the satellite is equipped with two telescopes with astrometric viewing directions (referred to as lines of sight 1 and 2 in Fig. 2), separated by a large angle, called the basic angle, of 106°. The two telescopes share the same focal plane, in which the detectors are located. With this arrangement, the positions of stars can be measured and compared between the two lines of sight, which are separated by 106° and arranged in a plane perpendicular to the spacecraft spin axis [2]. The spacecraft performs a continuous scanning motion around the spin axis, which allows the fields of view to sweep across the sky at a nominal speed of 1°/min (i.e. one revolution every 6 h). The sun aspect angle of 45° between the satellite spin axis and the spacecraft-sun line has been selected as a compromise between optimum measurement accuracy and system design constraints (payload protection from direct sun illumination, thermal and power constraints, etc.). The spin axis rotates around the sun direction with a period of approximately 63 days. This slow precession allows full coverage of the sky within a few months. The viewing directions of the two astrometric instruments therefore describe great circles on the sky, observing each star on several great circles during the nominal mission lifetime at L2. The spacecraft is designed with a high level of on-board autonomy, but the nature of the mission dictates that a relatively large amount of ground station contact is necessary to downlink the data that Gaia generates. From its Lissajous orbit around the second Lagrange point (L2), the spacecraft observes continuously during the nominal 5-year science mission, using solid state mass memory to store


Fig. 2 Gaia’s sky scanning law

data on-board whilst out of contact with the ground. Gaia is in daily contact with ESOC via ESA’s 35 m deep space ground station network in Cebreros (Spain), New Norcia (Australia) and Malargüe (Argentina). Since the amount of science data generated on-board varies according to the area of the sky being scanned, the amount of time needed to downlink it varies as well. This is particularly relevant during periods when the spacecraft is scanning along the galactic plane where stellar densities are very high. In these periods, maximum ground station coverage is requested [3].
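The scanning-law geometry described in this section (spin at 1°/min, sun aspect angle of 45°, precession of the spin axis about the sun direction roughly every 63 days) can be illustrated with a small sketch. The sun-fixed frame and the function below are our own simplification for illustration, not flight software:

```python
import numpy as np

# Illustrative geometry of Gaia's scanning law (values from the text):
SAA_DEG = 45.0            # sun aspect angle between spin axis and sun line
SPIN_PERIOD_H = 6.0       # 360 deg at 1 deg/min
PRECESSION_DAYS = 63.0    # spin-axis revolution about the sun direction

def spin_axis(t_days: float) -> np.ndarray:
    """Unit spin-axis vector at time t, in a frame whose x-axis points
    to the sun (a simplified, sun-fixed frame, assumed for illustration)."""
    phi = 2.0 * np.pi * t_days / PRECESSION_DAYS   # precession phase
    saa = np.radians(SAA_DEG)
    return np.array([np.cos(saa),
                     np.sin(saa) * np.cos(phi),
                     np.sin(saa) * np.sin(phi)])

# The spin axis stays on a 45 deg cone around the sun direction.
for t in (0.0, 10.0, 31.5):
    a = spin_axis(t)
    assert abs(np.linalg.norm(a) - 1.0) < 1e-9
    assert abs(np.degrees(np.arccos(a[0])) - SAA_DEG) < 1e-6
```

Sweeping the fields of view at the 6 h spin period while this axis precesses is what lets the instruments trace great circles that gradually cover the whole sky.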

1.2 LISA Pathfinder

The LISA Pathfinder mission, designed and built by Airbus Defence and Space, was launched on 3 December 2015 by a Vega launcher from Kourou, French Guiana, and was operated by ESOC in Darmstadt, Germany, until its planned decommissioning on 18 July 2017. The launcher injected LPF into an elliptical low Earth orbit (LEO), from which LPF made its way to the Earth-Sun L1 Lagrange point through a series of delta-V manoeuvres using its chemical propulsion module (PRM). Fifty days after launch, when LPF reached L1, the PRM separated from the science module (SCM), leaving the cold gas micro-propulsion system (the same equipment used by Gaia) as the only actuator available for attitude and orbit control. At this point in the mission, the MPS was used to de-nutate and de-spin (from 5 to 0°/s) the SCM


and to conduct the necessary trimming delta-V manoeuvres to place the spacecraft at the exact point around L1. For these manoeuvres, and for the final de-orbiting manoeuvre, the MPS thrusters were used to generate a force of up to 500 µN each. The core of the LPF technology package (LTP) is the inertial measurement system (IS), with two test masses (TMs) placed in free fall at the ends of a short laser interferometer arm, which measures their relative position and orientation to an accuracy of less than 0.01 nm. This technology has been shown to be sensitive enough to detect gravitational waves by the planned follow-on ESA mission, the Laser Interferometer Space Antenna (LISA). In LPF, the TMs were, however, insensitive to gravitational waves because of the reduced arm length (37 cm), but sensitive to the differential acceleration, g, arising from parasitic forces such as solar radiation pressure, spacecraft IR radiation pressure, micro-meteorites or the self-gravity of the satellite. To shield the TMs from these perturbations, the relative position between the spacecraft and the TMs is constantly controlled by adjusting the spacecraft position using the MPS thrusters. This control strategy, forcing the spacecraft to follow the test masses, is called drag-free. In drag-free operations, the main perturbation force to compensate is the combined solar and spacecraft IR radiation pressure, which for LPF was ~26 µN, requiring a constant compensating force of ~9 µN from each of the MPS thrusters. During station keeping manoeuvres, conducted with the TMs electrostatically suspended, this force was increased up to 116 µN, to achieve a net force of ~40 µN in the sun direction, compatible with the TM electrostatic actuator authority. A peculiarity of the LPF cold gas MPS is its unidirectional thruster configuration, with all thrusters generating acceleration towards the sun; full 6-degree-of-freedom (DOF) control is, however, restored when the spacecraft is in steady state by using the sun as a virtual thruster.
As an additional technology experiment, LPF also embarked a colloidal MPS supplied by JPL. These thrusters were used to test alternative drag-free control schemes and thruster technology. LPF exhibited flawless in-orbit performance, matching very well the performance predicted by simulations. The level of g measured between the TMs was 5.2 fm s−2 Hz−1/2 in the bandwidth 0.7–20 mHz. Such performance was possible because of the unique hardware on-board LPF, including the cold gas MPS, which continuously supported the drag-free mode for the entire mission, interrupted only by the necessary station keeping manoeuvres, scheduled every 1 or 2 weeks, and by the time allocated to the colloidal MPS [4–6].

2 AOCS Design

2.1 Gaia

The Gaia AOCS architecture is based on a fully redundant set of equipment.

Actuators:


• A bi-propellant chemical propulsion system (CPS): 2 × 8 10 N thrusters used in cold redundancy for orbit maintenance and attitude control (circa 350 kg MON + MMH);
• A micro-propulsion system (MPS), internally redundant and used in cold redundancy. The nominal science attitude control mode uses the cold gas MPS.

Sensors:

• Three fine sun sensors (FSS) used in hot redundancy (triple majority voting);
• Three gyro packages. Each gyro package comprises two fully independent co-aligned channels (i.e. a fibre optic gyroscope sensor plus associated electronics per channel), with the channels used in cold redundancy;
• The payload module (PLM): very precise rate measurements, produced by the seven video processing units (VPUs), are fed into the AOCS control loop to achieve the stringent rate stability requirements of the nominal science mode;
• Two autonomous star trackers (A-STR T006868700, Galileo Avionica S.p.A.) used in cold redundancy.

The Gaia AOCS software consists of different control modes, selected according to the operations to be carried out:

• The sun acquisition mode (SAM) realises the initial attitude acquisition in case of safe mode. It makes use of the gyroscopes and the fine sun sensors for attitude sensing and the bi-propellant chemical propulsion system for attitude control.
• The inertial guidance mode (IGM) is a transient mode between CPS-controlled modes and MPS-controlled modes. In this mode, the spacecraft is controlled with the CPS and follows attitude profiles defined by ground. The attitude estimation is realised by hybridization of the gyroscope and star tracker measurements through the gyro-stellar estimator (GSE).
• The orbit control mode (OCM) is designed to realise the orbit correction manoeuvres with the CPS. It uses the same set of sensors as the IGM.
• In the transition mode (TSM), the attitude sensing principle is the same as in IGM, but the control is ensured by the MPS.
The role of this mode is to acquire the stable dynamic conditions required to initialise the instrument acquisitions.
• The normal mode (NM) is based on the A-STR, the PLM and the MPS. NM is the Gaia science operational mode. The attitude estimation is realised by hybridization of the PLM and star tracker measurements through the astro-stellar estimator (ASE). In NM, the AOCS autonomously commands sub-mode transitions to progressively put the science payload in the control loop and to minimise the attitude and rate errors with respect to the ground-commanded scan law.

The PLM features three different modes which are used successively to progressively bring the payload into the AOCS control loop. The NM sub-mode zoom astro phase (ZAP) is entered at TSM exit. When the convergence criteria in terms of star flow and attitude/rate error are met, the AOCS switches to the next higher sub-mode, gate astro phase (GAP), which performs the star speed measurements with smaller readout windows and larger integration times. The convergence criteria can now be refined, as the payload can detect fainter objects and


Fig. 3 Gaia AOCS and PLM modes (Courtesy of Airbus Defence & Space)

hence provides more accurate rate measurements to the ASE in larger quantity. The highest NM sub-mode, normal astro phase (NAP), finally enables the most accurate pointing performance, which is used for science generation. In NM/NAP, Gaia follows the ground-commanded scanning law guidance (SLG). The main indicators of how accurately this guidance profile is followed are the ASE angle and rate errors. They provide the difference between the demanded attitude/rate profiles and the ASE-measured attitude/rate [7] (Fig. 3).

2.2 LPF

The LPF AOCS architecture is based on a redundant set of equipment.

Actuators:

• A bi-propellant chemical propulsion system (CPS, ~1260 kg MON + MMH) carried by the detachable propulsion module (PRM): 2 × 4 10 N thrusters and a 400 N main engine with redundant actuators, used in cold redundancy during LEOP and the transfer phase to L1, for apogee raising manoeuvres (ARM) and attitude control;
• A cold gas MPS carried by the science module: 2 × 6 0–1 mN micro-thrusters (MT), controllable in 0.1 µN steps, used in warm redundancy;
• A colloidal MPS, part of the disturbance reduction system (DRS) supplied by JPL: 8 thrusters of 5–30 µN (non-redundant).

Sensors:

• Three sun sensors used in hot redundancy (triple majority voting);
• Two ring laser gyro packages, providing three-axis rate measurements, mounted in a skewed configuration to allow consistency checking, used during LEOP and transfer to L1 up to the PRM/SCM separation;
• Two autonomous star trackers used in cold redundancy, mounted in a skewed configuration;
• The LPF technology package, comprising the TMs, the laser interferometer, a capacitive position/attitude measurement system, an electrostatic suspension actuation system for each TM, electrostatic discharge UV lamps and a radiation monitor. When not in free fall, the TMs were either suspended by the electrostatic actuators or grabbed by the grabbing mechanism for hardware safety reasons.

Fig. 4 LISA Pathfinder (LPF) AOCS modes: stand-by; CACS modes (chemical propulsion); MPACS and DFACS modes (cold gas micro-propulsion); DRS modes (colloidal micro-propulsion)

The LPF AOCS is designed around three distinct mission phases and divides the functional modes into three primary groups (Fig. 4):
• Composite attitude control subsystem (CACS), comprising the set of modes used during the composite phase of the mission, from launcher separation in LEOP until PRM/SCM separation in the vicinity of the L1 point:
– Composite sun acquisition mode (CSAM), to damp the initial rates after launcher separation to zero and acquire the sun on the +Z-axis. Also used as the mode for spin-up prior to PRM/SCM separation;
– Composite normal mode (CNM), to implement inertial pointing using a gyro-stellar attitude estimation filter to maintain the commanded attitude;
– Composite delta-V mode (CDVM), to control the attitude during the 400 N main engine burns;
– Composite thruster control mode (CTCM), to provide delta-V using the 10 N thrusters only;
• Micro-propulsion attitude control subsystem (MPACS), comprising the set of modes for the SCM after PRM release, when the drag-free modes are not in use, i.e. when the TMs are mechanically grabbed:
– Sun acquisition mode (MSAM), used directly after PRM separation, firstly to remove the spacecraft rates and then to acquire and maintain a sun-pointing attitude;


– Normal mode—on station (MNM—OS) is the main steady-state MPACS mode, providing attitude control and guidance capability;
– Normal mode—station keeping (MNM—SK) is used to perform the orbit correction manoeuvres with the cold gas thrusters;
– Normal mode—science transition (MNM—ST) establishes the conditions required for entry into the drag-free modes implemented by the DFACS;
• Drag-free attitude control subsystem (DFACS), comprising the set of modes used to carry out the main mission science objectives, including not only the drag-free attitude control modes themselves, but also a series of support modes or sub-modes to manage activities such as de-caging, grabbing/ungrabbing of the TMs or fine calibration of the micro-propulsion systems:
– Attitude mode (DF_ATT) is the first mode used after handover from MPACS. It only controls the attitude of the spacecraft, while the TMs are mechanically grabbed;
– Accelerometer mode (DF_ACC_X) is a kind of safe mode for DFACS operations, as it uses high-voltage (i.e. high force/torque) TM electrostatic actuation to suspend the test masses with respect to the spacecraft;
– DRS mode (DF_DRS) is used during the mission periods where the control of the spacecraft and the test masses is handed over to the disturbance reduction system (DRS), the JPL experiment implementing alternative attitude control and drag-free modes using the colloidal thrusters;
– Normal mode (DF_NOM_X) is the first mode that uses drag-free control on some of the test mass degrees of freedom, to facilitate a smooth transition from accelerometer mode to the science modes;
– Science mode 1 (DF_SCI1_X) is the baseline science mode, using drag-free control on designated test mass coordinates and the laser interferometer as TM position and attitude sensor;
– Science mode 2 (DF_SCI2_X) is the backup science mode, the same as DF_SCI1_X but with the capacitive sensor for TM position and attitude;
– Drift mode (DF_DRIFTX) is a special science mode based on DF_SCI1_X; it switches off the control on dedicated TM coordinates, which drift freely for a fixed period of time and are then "kicked back" towards the centre via electrostatic actuation.

The DFACS modes implement the control laws for 15 degrees of freedom (DOFs), 6 for each TM plus the 3 of the S/C attitude, separating the S/C attitude control, the TM1/TM2 suspension control and the drag-free control by bandwidth, as shown in Fig. 5 (MBW is the measurement bandwidth).

Fig. 5 LPF test mass and S/C control bandwidths (the suspension, attitude and drag-free control loops are separated in frequency, spanning roughly 0.01 mHz up to the ~0.15 Hz measurement bandwidth; the drag-free degrees of freedom are controlled to zero with the cold gas MPS, the suspended degrees of freedom with capacitive forces, and the remaining degrees of freedom keep the S/C sun pointing)

3 Cold Gas Micro-propulsion

3.1 Overview

One novel feature of the Gaia AOCS is the cold gas MPS. This subsystem was specifically designed for the Gaia mission by Thales Alenia Space Italia and Leonardo Company in cooperation with the Gaia prime contractor Airbus Defence and Space (ADS), Toulouse. It was also selected for the LPF (ESA), Euclid (ESA) and Microscope (ESA/CNES) missions. The MPS consists of micro-thrusters which are fed by gaseous nitrogen (GN2) and controlled by a micro-propulsion electronics unit (MPE). High-pressure gas tanks hold the nitrogen used to feed the thrusters at a regulated pressure. The proportional micro-thrusters can generate a quasi-continuous thrust in the range 0–1 mN, with a resolution of 0.1 µN. This very low level of thrust is required to achieve the performance mandatory for high-accuracy pointing missions such as Gaia and Euclid and high-accuracy drag-free missions such as LPF and Microscope (Fig. 6). Each micro-thruster contains a calorimetric mass flow sensor (MFS) and a thruster valve (TV) whose opening level is driven through piezo-discs. The voltage applied to

Fig. 6 Simplified MPS control logic


J. Marie et al.

the piezo-discs allows the GN2 throughput to be controlled. The thrust of each micro-thruster is controlled in a closed loop running at 40 Hz. The MPE software commands each piezo-disc voltage (in the range 0–200 V) as a function of the difference between the measured mass flow and the target mass flow, which is derived from the demanded force coming from the AOCS. The opening level of the micro-thruster (MT) piezo-electric valve is controlled by two voltages. DAC1 is a static base voltage that is commanded from ground to be close to the opening voltage. DAC2 is the regulation voltage that is autonomously driven by the MPE to achieve the desired thrust level. A simplified model illustrating the relation between the DAC voltages and the thrust level is shown in Fig. 7. The MFS is a silicon chip with five integrated resistors, encapsulated inside a sealed container through which the nitrogen gas of the cold gas system flows. The working concept is that of a thermo-anemometric flow meter: when the gas flows across the chip, a temperature gradient is applied. The resulting resistance imbalance produces a signal in the measurement circuit that can be correlated with the gas flux. In order to provide the expected fine performance, the MFS voltage output for zero mass flow is calibrated at thruster switch-on and checked routinely by ground command. The MFS offset calibration mechanism acquires the MFS output when the thruster valve is closed. This offset is then taken into account by the MPE software during thrust generation. The recorded offset is used until a new calibration is performed [8].

Fig. 7 DAC voltages driving the thrust level
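A minimal sketch of the regulation scheme described above, under stated assumptions: DAC1 is the ground-set base voltage near the valve opening point, DAC2 integrates the mass-flow error at 40 Hz, and the MFS zero offset is captured with the valve closed and subtracted from every reading. All gains and scales are invented; this is illustrative only, not the flight MPE software.

```python
# Toy model (NOT flight software) of the 40 Hz closed-loop thrust
# regulation: DAC1 static base voltage + DAC2 autonomous regulation,
# with MFS zero-offset calibration performed at valve-closed condition.
DAC_RANGE_V = 200.0  # commandable piezo-disc voltage range, 0-200 V

class MpeThrustLoop:
    """Sketch of the piezo-valve thrust regulation loop."""

    def __init__(self, dac1_v, gain=0.8):
        self.dac1_v = dac1_v  # static base voltage, commanded from ground
        self.dac2_v = 0.0     # regulation voltage, driven autonomously
        self.offset = 0.0     # MFS output recorded at zero mass flow
        self.gain = gain      # integral gain (invented value)

    def calibrate_offset(self, mfs_raw_closed_valve):
        """Record the MFS output while the thruster valve is closed."""
        self.offset = mfs_raw_closed_valve

    def step(self, target_flow, mfs_raw):
        """One control step; returns the total piezo-disc voltage."""
        error = target_flow - (mfs_raw - self.offset)
        # DAC2 integrates the mass-flow error, clamped so that
        # DAC1 + DAC2 stays within the commandable range.
        self.dac2_v = min(max(self.dac2_v + self.gain * error, 0.0),
                          DAC_RANGE_V - self.dac1_v)
        return self.dac1_v + self.dac2_v

# Toy valve model: flow opens linearly above a 98 V opening voltage,
# and the raw MFS reading carries a constant offset of 3 units.
loop = MpeThrustLoop(dac1_v=100.0)
loop.calibrate_offset(3.0)
flow = 0.0
for _ in range(300):
    volts = loop.step(0.5, flow + 3.0)   # demand 0.5 flow units
    flow = 0.05 * max(0.0, volts - 98.0)
```

After a few hundred steps the loop settles on the demanded mass flow; without the offset calibration the same loop would converge to a biased flow, which is why the offset is re-checked routinely in flight.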


3.2 Gaia MPS

The Gaia MPS implements a cold redundancy concept where each unit is backed up by a redundant equivalent, i.e. 2 × 6 MTs are operated by two MPEs in cold redundancy, where each branch of six MTs is controlled by one MPE. A mixed-branch thruster configuration is possible, but requires both MPEs to be operated in parallel. Two high-pressure gas tanks hold the nitrogen which is used to feed the thrusters at a regulated pressure of 1 bar (Fig. 8). In Gaia’s orbital position at L2, the main source of external attitude disturbance is the solar radiation pressure (SRP). Gaia is constantly spinning while the SAA is maintained at 45°. Two out of the six MPS thrusters are used to control the rotation speed around the S/C X-axis. The remaining four thrusters mainly have Y/Z thrust direction components and are evenly distributed to compensate for SRP one after the other over the course of one revolution. The resulting force profile per thruster and the sinusoidal torque demand for Y and Z are shown in Fig. 9. The difference in maximum thrust between the units is due to non-symmetries of the illuminated side of the sun-shield, Y/Z components of the CoG and individual thruster characteristics (MFS sensitivity, thrust vector alignment, etc.).

The Gaia MPS has been operating near-continuously for 4.5 years and has performed excellently, meeting and surpassing the demanding rate stability requirements in the 0.1 mas/s range. It is currently being operated on the B-side. The A-side is fully operational and ready to be used in case of contingency. During the commissioning and routine phases, a few operational challenges were encountered and overcome.

1. MT3B scale factor

Shortly after first in-flight usage in early 2014, an unexpected value for the MFS calibration offset for one of the A-side thrusters (MT#3) was reported, showing an

Fig. 8 Gaia MPS layout


Fig. 9 MPS commanded thrust and torque output

increase by a factor of ~10 compared to the previous calibration. AOCS behaviour indicated that the affected unit provided more thrust than demanded (by a factor of 5–10, depending on the commanded thrust level). Root cause investigation showed that the signature correlates with a failure of the MFS heater circuitry. The resulting sensitivity degradation leads to a reduced MFS output, which was compensated in closed loop by the MPE through an increase in piezo-disc voltage in order to achieve its target mass flow. Thanks to the flexibility of the on-board AOCS software, it was possible to deal with this problem in-flight without a software change: the commanded thrust level for the affected thruster was de-scaled by ground command to compensate for the observed increase. The corrective measure achieved the desired result, and performance as well as cold gas consumption went back to nominal levels. However, the failure proved to be intermittent when it disappeared after an MPS power toggle in July 2014. The de-scaling fix was removed, but the failure re-occurred in mid-2014, triggering a safe mode due to an increase of the cold gas usage rate. It was then decided to continue operations using the B-branch. Branch A is still available as a backup and can be operated in both states, i.e. if the scale factor off-nominal event is present, the de-scaling fix needs to be applied. If the scale factor off-nominal event is absent, the cold gas usage rate monitoring threshold is widened to allow enough time to manually apply the de-scaling fix should the off-nominal event re-occur. In addition, a second backup solution was prepared where a mixed-branch thruster configuration can be driven by both MPEs in parallel in order to isolate a thruster declared as failed.

2. MFS offset drift

Later in the mission, it was seen that the MFS offset was drifting over time.
While this was expected to some extent, the rate at which the offset was changing was greater than expected, and led ground to perform more frequent calibrations (for a period these were performed close to weekly). This was done since the propellant usage increases when the MFS offset is not well calibrated. Since calibration could only be performed with the cold gas flow stopped (and therefore not controlling


Fig. 10 Gaia MPS MFS offset drift (B-side)

the spacecraft), such calibrations had to be performed under chemical propulsion system-based control, which impacts science return. In the meantime, ground developed a new method to perform offset calibrations, allowing the spacecraft a short period of free drift, which is less disturbing for science than going to CPS control. Though the free-drift calibration is a non-routine way to calibrate in-orbit, it proved to be an efficient in-orbit operation. However, as is also apparent from Fig. 10, over a period of a few months the offset drift has stabilised, meaning that such measures are no longer necessary. A non-invasive method was also developed to infer indirectly, via the orbit determination solution, whether a significant offset drift had occurred (since the excess propellant usage causes a parasitic effect on the orbit). This monitoring remains active, though as said above the MPS is now performing fully nominally. On LPF, offset drifts of similar magnitude have not been observed.

3. Autonomous LCL reset

In April 2016, an MPE-B internal LCL triggered, leaving the MFS heaters unpowered. The resulting loss of MFS sensitivity led to an increased thrust demand for all thrusters. Despite the increased cold gas consumption, the AOCS was able to maintain the guidance profile. Ground recovered the off-nominal event by manually resetting the internal LCLs before the FDIR threshold for the cold gas usage rate was surpassed. This fix immediately restored nominal performance. After a second occurrence of the same off-nominal event a year later had triggered a unit FDIR, an on-board control procedure (OBCP) was developed which detects the spurious LCL triggering based on the housekeeping telemetry (HKTM) signature and resets the LCLs before any


FDIR reaction is carried out. A precondition for OBCP activation was a patch of the MPE S/W that would allow a LCL reset command in the routine operating mode. Unwanted regression effects of such a patch were ruled out by validation on the Gaia avionics model and by an in-flight test on the LPF MPE as part of the end-of-life (EOL) activities. In early 2018, the first LCL trip-off was autonomously recovered within a few minutes, limiting the operational impact to a minor temporary disturbance of the rate profile.

4. MPS performance during routine phase

During most of the routine operations phase, the MPS has shown a very stable performance, which resulted in less frequent calibration activities than originally anticipated. The main periodic activities foreseen for the MPS were calibration of the MFS zero offset, DAC1 updates and venting of the low-pressure section of the unused branch. The latter was foreseen in case of pressure build-up due to pressure regulator leakage, but was only executed once, shortly after launch. Since then the pressure on the unused branch has remained stable and was mainly driven by gradual changes in the ambient temperature. Since the MFS zero offset drift has stabilised, an offset calibration is routinely performed in conjunction with each station-keeping manoeuvre, which is executed at a bi-monthly rate. The impact of MFS accuracy on the thrust noise can be shown by analysing the stability of the MFS output for a constant commanded thrust. This commanded thrust is continuously changing when a thruster is used by the AOCS in closed loop. However, for approximately 50% of each of Gaia’s 6 h revolutions, every micro-thruster is commanded at a fixed bias of 1 µN when its torque components are not required to counteract SRP. The standard deviation of the MFS output over these time periods was selected to assess MFS steady-state performance and converted into an equivalent thrust noise (Fig. 11).
The look-up table for correlating cold gas mass flow and resulting force magnitude was taken from the Gaia MPS ground model used for AOCS simulations on the flight validation bench. The processed 6 h slots show a sigma better than 0.5 µN over the course of two years for all B-side thrusters, and long-term stability without any apparent change in performance. These figures are also in line with the pre-launch estimate of 0.53 µN. The DAC2 voltage is monitored by ground to ensure nominal thrust resolution over the full thrust range within the commandable DAC2 voltage range from 0 to 200 V. Since the DAC2 voltage is autonomously commanded by the MPE in closed loop, the only way to calibrate the thrust range is through a modification of the DAC1 voltage, which should be close to the opening voltage to achieve the full thrust range through the DAC2 voltage range. A ground monitoring procedure of the MT valve temperatures was supposed to trigger an update of the DAC1 voltage, because ground testing had shown a correlation between MT valve temperature and opening voltage. However, the stable thermal environment of Gaia at L2 only caused a minor valve temperature variation in the order of 10 °C, so that DAC1 has not required updating since commissioning. This can be illustrated by the stability of the measured DAC2 voltage, which drives the micro-thruster piezo-valve opening level, as a function of the commanded thrust. Over


Fig. 11 Gaia MPS thrust noise due to MFS performance

a period of 1.5 years, five data sets of 6 h each were sampled months apart to illustrate the evolution of this parameter pair. The results for thrusters 1–4 are shown in Fig. 12. Thrusters 5 and 6 are not used to compensate for SRP, but only to control Gaia’s rotation rate. The resulting thrust range in routine operations (1–7 µN at a 1 µN resolution) does not yield meaningful results in this context. The hysteresis shape of the plots is due to the nature of the reverse piezo-electric effect as the valve is gradually opened and then closed again over the course of one revolution.
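The steady-state noise assessment described above can be sketched as follows: during the portions of a revolution in which a thruster is held at the fixed 1 µN bias, the standard deviation of the MFS output is converted to an equivalent thrust noise via a mass-flow-to-force look-up table. The table values, bias point and noise level below are invented for illustration; the real look-up table came from the Gaia MPS ground model.

```python
# Sketch of the MFS steady-state thrust-noise assessment. The look-up
# table, bias point and sensor noise level are INVENTED for illustration.
import numpy as np

# Hypothetical look-up table: MFS output (mV) -> thrust (uN).
MFS_MV = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
THRUST_UN = np.array([0.0, 2.5, 5.0, 7.5, 10.0])

def equivalent_thrust_noise(mfs_mv_samples):
    """1-sigma thrust noise (uN) from MFS samples taken while the
    commanded thrust is held constant."""
    thrust = np.interp(mfs_mv_samples, MFS_MV, THRUST_UN)
    return float(np.std(thrust))

# Simulated 6 h slot sampled at 1 Hz around an assumed 20 mV reading
# (the 1 uN bias point in this toy table), with 5 mV of sensor noise.
rng = np.random.default_rng(1)
samples = 20.0 + rng.normal(0.0, 5.0, size=6 * 3600)
sigma = equivalent_thrust_noise(samples)
```

With these invented numbers the equivalent noise comes out at roughly 0.25 µN, i.e. comfortably inside the sub-0.5 µN band the flight data showed.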

3.3 LPF MPS

The LPF cold gas MPS uses a rebuild of the Gaia equipment, with additional branches to implement full redundancy, also at pipe/tank level, and allowing control of the tank gas mass distribution around the TMs, to reduce the disturbing gravity gradient during usage. The LPF micro-propulsion consists of the nitrogen storage and feed assemblies and two micro-thruster branches (N + R), each one commanded by one micro-propulsion electronics unit. As shown in Figs. 13 and 14, there are three feed branches (FB), one branch equipped with two N2 tanks, that could all be used for the two micro-thruster subsets by properly configuring the low-pressure latching valves (LPLV). Operationally, only the nominal chain of micro-thrusters is active while the other is kept in warm redundancy, with the MPE powered on and thrusters warmed up, ready for firing.


Fig. 12 Gaia commanded thrust versus DAC2 readback

Fig. 13 Simplified sketch of the cold gas tank and micro-thruster location around the LPF test masses


Fig. 14 LPF cold gas MPS layout

During operations, to balance tank usage and control the gravity gradient on the TMs, the active feed branch was regularly swapped via ground command.

1. Priming and commissioning

During the transfer phase to L1, with the PRM still attached, the cold gas MPS was primed and commissioned. Special attention was paid to this activity, to ensure that the performance of the feed branches and of the main and redundant micro-thruster chains was suitable for the subsequent phases of the mission, after PRM separation. The commissioning consisted of firing each thruster, both in sequence and all at the same time, up to 500 µN, a level expected to be reached during the SCM de-spin and during the correction manoeuvres for L1 insertion. The resulting thruster


Fig. 15 LPF cold gas regulated pressure evolution during MPS commissioning

force could be derived via the rate measured on the composite spacecraft, exploiting the CACS-CNM attitude control dead-band, in which the chemical propulsion was inactive. Figure 15 shows the commissioning of the chain-A thrusters, started with FB1 for the first two steps and then continued with FB2. The start of regulation by the pressure regulator in branch FB1 was lower than expected, approaching the FDIR threshold (0.8 bar). In addition, the re-pressurisation of the feed line after the firing was too slow (of the order of hours instead of the expected 15 min). This branch was very cold, with the nearest temperature measured at −15 °C. Branch FB1 was thus brought to a higher temperature, after which the pressure regulator started regulation with the expected time delay. The anomaly was attributed to icing, as it definitively disappeared after the temperature increase.

2. SCM de-nutation/de-spin

The PRM/SCM composite spacecraft was spun up in CACS-CSAM to 5°/s around the Z-axis for gyroscopic stabilisation before PRM separation. Following PRM release from the SCM, the de-nutation (DN) and de-spin (DS) phase was executed in MPACS-MSAM mode using the chain-A micro-thrusters. SCM rates were successfully reduced on all spacecraft axes in 7 h 20 min, which was within the expected range, though more time than expected was spent in DN. The slightly unexpected behaviour did not have any adverse consequence on operations, and the likely cause has been identified in spacecraft mass properties (in particular, cross products of inertia) different from the values measured on ground. The sun angles were kept within 9°, well within the target of 40°. Figure 16 shows the measured spin rate around the spacecraft


Fig. 16 LPF SCM de-nutation and de-spin with MPS after PRM separation

body Z-axis and the transversal components due to nutation, which were first reduced to zero.

3. L1 Correction Manoeuvres

After PRM separation, LPF successfully executed three delta-V manoeuvres with the cold gas MPS for L1 orbit insertion, in MPACS MNM-SK mode with the chain-A micro-thrusters firing at ~500 µN for 23 h (25 cm/s), 14 h (15 cm/s) and 4 h (4.5 cm/s). Figure 17 shows the start of the third manoeuvre, where thruster 5 showed an anomalous under-performance (>10%) lasting a few minutes. This under-performance had no impact on the manoeuvre itself and could only be investigated later in the mission, during the end-of-life tests before LPF decommissioning. The problem was due to the missing in-flight re-calibration of the MPE DAC1 base voltage (foreseen on Gaia but not on LPF), leading to the saturation of the MPE DAC2 regulation voltage. The re-calibration procedure was conducted during the end-of-life tests and thruster performance could be recovered.

4. Drag-Free and Station Keeping Operations

Following the correction manoeuvres, the scientific exploitation of LPF started immediately. In this phase, the cold gas MPS was used at a much lower thrust level (~9 µN per thruster for drag-free operations, with peaks of 116 µN for station-keeping operations). It demonstrated good performance and behaviour in terms of noise [9, 10], on average 0.15 µN Hz^(−1/2) down to 1 mHz, and of MFS (mass flow sensor) offsets, which were stable over the whole mission within +60 to −60 mV, showing a slow increase when operative and a decrease when in warm-up, as shown in Figs. 18 and 19. However,
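As a rough cross-check of the quoted manoeuvre figures, the delta-V of a constant-thrust burn is n·F·t/m for n thrusters firing in parallel. The thruster count and composite spacecraft mass used below are illustrative assumptions, not values given in the text.

```python
# Order-of-magnitude check of the manoeuvre figures. The thruster count
# (6) and spacecraft mass (~1000 kg) are ASSUMPTIONS for illustration.
def delta_v_cm_s(n_thrusters, thrust_uN, hours, mass_kg):
    """Delta-V (cm/s) of a constant-thrust burn: n * F * t / m."""
    impulse_Ns = n_thrusters * thrust_uN * 1e-6 * hours * 3600.0
    return impulse_Ns / mass_kg * 100.0

# Six thrusters at ~500 uN for 23 h on an assumed ~1000 kg spacecraft
# give a delta-V on the order of the quoted 25 cm/s.
dv = delta_v_cm_s(6, 500.0, 23.0, 1000.0)
```

The point of the check is only that micro-Newton thrusters need tens of hours to accumulate centimetre-per-second delta-Vs, consistent with the burn durations listed above.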


Fig. 17 LPF start of orbit correction manoeuvre Nr.3 with MPS

Fig. 18 LPF cold gas MPS chain-A MFS calibration evolution

the magnitude of offset changes stayed in the order of tens of mV, i.e. well below the values seen on Gaia MFS units [11].


Fig. 19 LPF cold gas MPS chain-B MFS calibration evolution

4 Conclusion

Although the MPS is a new propulsion technology with highly challenging performance requirements, its in-orbit behaviour has been excellent, meeting Gaia’s and LPF’s unparalleled requirements in terms of attitude and rate stability. Most of the MPS calibration activities were only required in the commissioning phase and in non-nominal operational conditions. In routine operations, the MPS requires a low maintenance effort and offers a high degree of autonomy. The impact of calibration activities on Gaia operations could be mitigated by developing new flight operation control procedures. Gaia’s attitude and orbit control subsystem design and in-orbit behaviour have been robust towards the MPS off-nominal events observed during more than 4 years in orbit. All of them have been mitigated thanks to the AOCS software flexibility, the redundant system architecture, new flight procedures developed by ground and the capabilities of on-board control procedures. The cold gas micro-thrusters demonstrated robustness and accurate commanding resolution, featuring low noise and thrust stability. One of the most compelling demonstrations of the achievable performance is the accomplishment of the LPF science mission, which provided science results beyond expectations thanks in part to the performance of the micro-thrusters.

Acknowledgements Spacecraft operations are a team effort requiring the support of multidiscipline expertise across ESA, industry and the Gaia/LPF scientific consortia. The authors gratefully acknowledge the excellent contributions of the wider Gaia/LPF MOC team, the Gaia/LPF


Project, the Project Scientists and Mission Manager at ESTEC, the respective Science Operations Centers, and the Airbus Defence & Space, Thales Alenia Space Italia and Leonardo Company experts to the Gaia/LPF endeavour and to the work behind this paper.

References

1. Rudolph, et al. Gaia mission operations concept and launch and early orbit phase—In-orbit experience. In AIAA 2016-1751, SpaceOps, Pasadena, CA.
2. Milligan, et al. ESA’s billion star surveyor—Flight operations experience from Gaia’s first 1.5 years. IAC-15,B6,3,2,x30679. In 66th IAC 2015, Jerusalem, Israel.
3. Milligan, et al. Flying ESA’s ultra-precise Gaia mission. In AIAA 2016-2444, SpaceOps, Daejeon, Korea.
4. Povoleri, & Watt, M. (2017). LISA Pathfinder in-orbit experience: LEOP and on-station operations. In 10th ESA Conference on GNC Systems, Salzburg, Austria.
5. Armano, et al. (2016). Sub-femto-g free fall for space-based gravitational wave observatories: LISA Pathfinder results. Physical Review Letters, 116.
6. Rudolph, et al. LISA Pathfinder mission operations concept and launch and early orbit phase—In-orbit experience. In AIAA 2016-2414, SpaceOps, Daejeon, Korea.
7. Ecale, et al. Gaia AOCS in-flight results. In GNC 2014, Porto, Portugal.
8. Di Marco, et al. (2015). Gaia: First year flight operations experience. In AAS-15-133, AAS GN&C Conference, Breckenridge, Colorado, USA.
9. Martino, & Plagnol. (2016). LISA Pathfinder cold gas thrusters. In LISA Symposium, Zurich, Switzerland.
10. Fallerini, et al. (2018). Micro-propulsion for scientific missions. In Space Propulsion 2018, Seville, Spain.
11. Armano, et al. (2018). Beyond the required LISA free-fall performance: New LISA Pathfinder results down to 20 µHz. Physical Review Letters, 120, 061101.
12. Marie, et al. In-orbit experience of the Gaia and LISA Pathfinder cold gas micro-propulsion systems. In AIAA 2018-2716, SpaceOps, Marseille, France.

The Cassini Mission: Reconstructing Thirteen Years of the Most Complex Gravity-Assist Trajectory Flown to Date Julie Bellerose, Duane Roth, Zahi Tarzi and Sean Wagner

Abstract Cassini launched in 1997 and completed its prime mission, its Equinox first extended mission, and its Solstice second extended mission. Since its arrival at Saturn in 2004, Cassini completed almost 300 orbits around the planet. Over the span of the mission, significant improvements were made to the ephemerides of all the major satellites and to the Saturn gravitational and pole models. These improvements enabled better trajectory reconstructions throughout the timeframe of the mission, albeit using about one hundred different models of the Saturn system. Now that the mission is over, this paper reports on the uniform reconstruction of the entire Cassini orbital mission, which uses one consistent Saturn system model and one set of satellite ephemerides throughout. We discuss the challenges of undertaking this task and the comparison strategies for choosing the best Cassini trajectory for its very final delivery.

1 Introduction

Cassini spent almost twenty years in space, thirteen of which were devoted to exploring the Saturn system, starting in July 2004. The trajectory was designed to explore the Saturn system, with a focus on its biggest moon, Titan. After four years of nominal (Prime) mission, two extensions brought its operations to September 2017. The extended missions were named to correspond with the applicable season at Saturn. The prime mission included forty-five flybys of Titan, four of Enceladus, and nine of other icy satellites. After the completion of the four-year prime tour in July 2008, during which Huygens was released to Titan and Cassini’s trajectory went through Saturn’s magnetotail and completed orbits at higher inclination, NASA extended the mission until September 2010 [1]. The first extension, the Equinox mission, from July 2008 to September 2010, was focused on changes to the Saturnian system brought on by the onset of equinox, on August

J. Bellerose (B) · D. Roth · Z. Tarzi · S. Wagner
NASA/Jet Propulsion Laboratory/California Institute of Technology, Pasadena, CA 91109, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_22



11, 2009. The Equinox mission added twenty-six Titan flybys and twelve more Enceladus and icy satellite flybys. By then, the uncertainty on Titan’s position was already only a few hundred meters while the uncertainties for Enceladus, Rhea, and Dione ephemerides were a few kilometers. The trajectory’s inclination gradually lowered over one and a half years, leading to an equatorial phase where numerous icy moon flybys could be performed. A few orbit inclination changes were included to fully map Titan’s surface using Cassini’s radar instrument [2]. The Solstice mission, from 2010 to 2017, added forty-six Titan flybys, twelve Enceladus, and twelve other icy satellite flybys. Cassini came back to the equatorial plane and remained there until June 2012, then returned to higher inclinations in three different inclination phases that lasted three years, to come back to its last equatorial phase in 2016. Cassini’s orbit inclination then gradually increased for the Grand Finale. In the last ten months of the mission, the orbit period changed from nearly thirty-two days to about seven days or less. After the penultimate Titan flyby (T125), Cassini’s trajectory was altered such that it grazed the outer edge of the F-rings at twenty descending node crossings and then, after the final Titan flyby (T126), passed through the narrow gap between the D-ring and Saturn’s cloud tops every 6.4 days twenty-two times. In order to dispose of the spacecraft in accordance with NASA’s planetary protection requirements, the trajectory was designed so that the spacecraft would enter Saturn’s atmosphere on its final orbit and vaporize. This happened on September 15, 2017 [3]. The planet’s orientation in time with respect to the Sun is shown at the bottom of Fig. 1. Figure 1 also summarizes the satellite encounters throughout its prime and two extended missions. In total, Cassini targeted 163 satellite encounters. 
Cassini’s trajectory reconstructions are already publicly available on the NAIF Web site [4]; deliveries were usually made every few months. As the Saturn system and spacecraft error modeling were refined over the years, the inputs to those reconstructions also evolved. In particular, they include different models for Saturn’s gravity, pole, and the ephemerides of ten of Saturn’s moons. The Cassini Navigation team received Saturn system models from the JPL Solar System Dynamics (SSD) group every six months on average. Throughout the thirteen years of the mission, one can find at least forty different models of the Saturn system used in the making of the Cassini trajectory reconstructions during operations. In addition, since navigating Cassini can also involve refining the Saturn system model, the Navigation team also estimated the Saturn system from time to time in between SSD model upgrades. In total, the currently available reconstructions include more than one hundred different values for the Saturn gravity harmonic coefficients, pole angles, and states of the ten major satellites. We worked closely with the SSD group to get the latest and greatest model of the giant planet and its satellites, in order to provide a reconstructed trajectory for Cassini with a single uniform model. The uniform reconstruction needed to be done in three phases: first gather the inputs, then build all trajectory reconstruction environments for defined time spans, and finally reconstruct all of them. There have been many challenges in doing so. Besides ingesting thirteen years of data, the navigation process improved over time and was modernized through the use of different navigation software. In the


Fig. 1 Cassini mission science profile, 2004–2017 [3]

next sections, we detail the strategy used to undertake this uniform reconstruction task and explain the associated challenges. In the last section, we discuss analyses and results for the final reconstruction delivery made in June 2018, comparing this uniform reconstruction to the ones delivered during Cassini’s operational years.

2 Uniform Reconstruction Structure

2.1 Navigation Background

Cassini’s trajectory is one of the most complex trajectories ever flown. The reference trajectory was optimized for propellant consumption and satellite encounters to maximize science returns [1–3, 5]. The Navigation team’s task was to return the Cassini trajectory to the reference trajectory at the times of flybys, allowing deviations between encounters. This was performed in two steps: the orbit determination (OD) team estimated Cassini’s trajectory and related parameters, while the flight path control (Maneuver) team designed the orbit trim maneuvers (OTM) to meet flyby times and geometry. The OD team’s activities ranged from trajectory error analysis for upcoming operations to reconstructing past trajectories for spacecraft calibrations and science investigations.


Fig. 2 Navigation arc definition

Cassini’s navigation structure focused on two flybys at a time, where the time span was referred to as an “arc”. To give some background, an example arc is illustrated in Fig. 2. In this figure, the black line represents the main arc, revYFB2, starting at the Saturn apoapse prior to the first satellite flyby (the nth revolution around Saturn, named revY here, and flyby number 1, or FB1) and ending after the maneuver following the second flyby (FB2 in the figure). Three maneuvers were usually designed between two encounters; the first two maneuvers following the first encounter were deterministic, with nonzero delta-Vs in the reference trajectory, while the third was statistical, to clean up errors prior to a given flyby. The overlaps between arcs provided some validation during operations (green and black lines overlapping, or black and red lines overlapping, in Fig. 2). The reconstruction deliveries started at the arc epoch and ended at the next arc epoch, without overlap. A number of papers have been published describing the workings and performance of both the OD and Maneuver teams [6–11]. During operations, on the OD side, the software suite was built to gather and build the required inputs for a new arc. Then, another set of scripts would take care of the data fitting and parameter estimation over the time span of the arc. As tracking data came in, usually Doppler and range, the spacecraft and dynamical environment were updated and the Cassini trajectory propagated so that predictions and observations could be compared. Optical images were also used to refine the overall knowledge of Saturn’s pole, gravitational field, and major satellite ephemerides. A linearized least-squares filter was used to estimate selected spacecraft and Saturn system parameters. Other parameters, such as the Saturn ephemeris and Earth platform parameters, were not estimated but were included as consider parameters to account for their uncertainties.
For most parameters, a priori values were constrained from the previous arc. The estimation process was iterated to manage nonlinearities. This operational approach served as the basis for the uniform reconstruction strategy, with a high degree of automation desired for time saving. During the mission, Saturn gravitational harmonic coefficients, pole angles, and satellite states were estimated for certain periods of time. Figure 3 shows the evolution of Saturn’s pole right ascension since Tc (end of 2004). This variation is representative of many Saturn system parameters estimated across the mission.
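The estimation scheme just described (batch linearized least squares with a priori constraints, iterated to manage nonlinearities) can be sketched generically. This is not the Cassini OD software, and consider-parameter handling is omitted; the demo model, gains and data are invented. Symbols: x is the estimated parameter vector, H the observation partials, W the data weight matrix and P0 the a priori covariance.

```python
# Generic iterated (Gauss-Newton) linearized least squares with an
# a priori constraint. Illustrative sketch, NOT the Cassini OD filter;
# consider parameters are omitted and the demo model is invented.
import numpy as np

def iterated_lsq(residual_fn, jac_fn, x_prior, prior_cov, meas_cov, iters=8):
    """Minimize ||r(x)||^2_W + ||x - x_prior||^2_{P0^-1} iteratively."""
    x = x_prior.copy()
    W = np.linalg.inv(meas_cov)
    P0inv = np.linalg.inv(prior_cov)
    for _ in range(iters):
        r = residual_fn(x)                 # observed minus computed
        H = jac_fn(x)                      # observation partials w.r.t. x
        lhs = H.T @ W @ H + P0inv          # normal-equations information
        rhs = H.T @ W @ r + P0inv @ (x_prior - x)
        x = x + np.linalg.solve(lhs, rhs)  # linearized update
    return x

# Demo: estimate (a, b) of y = a*sin(b*t) from noisy observations,
# starting from an a priori guess close to the truth.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
y = 2.0 * np.sin(0.5 * t) + rng.normal(0.0, 0.01, t.size)

res = lambda x: y - x[0] * np.sin(x[1] * t)
jac = lambda x: np.column_stack([np.sin(x[1] * t),
                                 x[0] * t * np.cos(x[1] * t)])
x_hat = iterated_lsq(res, jac,
                     x_prior=np.array([1.8, 0.45]),
                     prior_cov=np.eye(2),
                     meas_cov=0.01**2 * np.eye(t.size))
```

The a priori term plays the role described in the text: when the data weakly constrain a parameter, the solution stays near the value carried over from the previous arc.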


[Fig. 3 plot: right ascension varying between about 40.570° and 40.586° over the period from early 2005 to early 2016.]

Fig. 3 Saturn pole right ascension angle in degrees in time, with values taken at epoch January 1, 2010, from Saturn models delivered throughout the mission by the JPL SSD group. Note that Saturn’s declination has a much smaller variation

Note that Saturn’s right ascension uncertainty is not shown, as it is in the third decimal place or smaller. Earlier Saturn models included larger uncertainties. For instance, Titan’s position uncertainty was reduced from near 200 km to less than 5 km by Tc. With the mission done, a single uniform Saturn model can be used, which eliminates the Saturn parameter variations and provides uniform uncertainties throughout the mission. In total, 172 arcs were delivered during the mission. One of the requests for the uniform reconstruction was to keep the same epochs as the reconstructions delivered during the mission. As a result, the “arc” operational approach was retained for the uniform reconstruction. It also makes comparisons with existing Cassini trajectories easier, since each reconstructs the same time span. The epoch chosen for this full-mission reconstruction is just after Saturn Orbit Insertion (SOI), on July 1, 2004 14:00 ET. The end time includes the last data received from the spacecraft just before its disintegration in Saturn’s atmosphere on September 15, 2017 11:54 ET. For the uniform reconstruction effort, we were able to reduce the number of arcs to 157, as some of the previous deliveries were made for every Saturn revolution until the OD team became confident that fits could be obtained for many empty revolutions (i.e. without any satellite flybys). As described in the next section, the uniform reconstruction was ninety percent automated, and the time necessary for an end-to-end reconstruction was about 160 h.

580

J. Bellerose et al.

2.2 Inputs Preparation

One particular challenge in re-reconstructing the mission was the use of two different software sets. In 2012, Cassini’s navigation efforts transitioned from the legacy Orbit Determination Program (ODP) [12] to the new Python-based Mission Analysis, Operations, and Navigation Toolkit Environment (MONTE) developed at JPL [13]. The Navigation team performed parallel operations for almost three years before making the transition, from spring 2009 to mid-January 2012. The transition happened during a “downtime” toward the end of almost two years of distant Titan flybys and about ten flybys of Enceladus and other icy moons (from July 2010 to May 2012). Although invisible to the end users of Cassini’s trajectory, the T80 reconstruction was done with ODP, followed by T81 in MONTE. This extended experiment and testing phase gave confidence in the conversion and process implementation, and the legacy ODP was quickly dropped after the formal transition.

Although this change modernized Cassini’s navigation, it also meant that all inputs from 2004 to 2012 needed to be converted into MONTE format; this had not been done during the parallel testing or since the transition. In addition, all arcs for that period needed to be organized with the appropriate structure so that precise orbit determination could be performed in MONTE. We discuss some of that data conversion below.

The amount of Cassini data cannot be overstated. Over the thirteen years at Saturn, the spacecraft flew by a satellite, most often Titan, about once every month, and executed 360 maneuvers out of 492 designed, or about two and a half per month on average. This also means thirteen years and almost three months of data to ingest. Some inputs needed to be assembled, while others had to be converted into the appropriate format and distributed to each arc.
Table 1 gives a sense of the navigation data volume, along with the conversion required prior to starting the uniform reconstruction itself. We detail each of those inputs below.

The tracking data, ionospheric and tropospheric calibrations, and Earth orientation parameter (EOP) files were the easiest to work with. All those files had been saved, and MONTE has direct format conversion commands available, so simple short scripts were used to convert and merge all the required files. The tracking file is 530 MB in size, containing radiometric data for nearly 5000 tracks using the Deep Space Network (DSN). Each of those tracks includes an average of 6 h of coherent Doppler data every 60 s and range integrated over 5 min.

Data edits for all arcs had to be merged; this was critical in speeding up the reconstruction and allowing a smooth automated process. Without them, manual iterations and edits would be needed for 157 arcs, which means checking for corrupted data points five to ten times during a single arc reconstruction. These edits had been recorded during operations to remove or ignore data that were bad, or biased due to various antenna manipulations and dynamical mis-modeling. Although some of those edits needed to be fetched from the legacy arcs, MONTE already has a conversion command for those inputs. This produced a merged file of about 10,000 edits.
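The edit-merging step lends itself to a short script. The sketch below is illustrative only — the record format and function name are hypothetical, not MONTE’s actual interface — but it shows the de-duplicate-and-sort logic such a merge needs:

```python
def merge_data_edits(edit_texts):
    """Merge per-arc radiometric data-edit records into one list.

    Each input string holds one record per line in a hypothetical format:
    'station data_type start_epoch end_epoch action'. Duplicates across
    legacy (ODP) and MONTE arcs are dropped, and the result is sorted by
    start epoch so a single merged edit file can drive all 157 arcs.
    """
    edits = set()
    for text in edit_texts:
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):  # skip blanks and comments
                edits.add(line)
    # ISO-ordered epoch strings sort correctly as plain text.
    return sorted(edits, key=lambda rec: rec.split()[2])

legacy_arc = "DSS-25 doppler 2004-182T00:00 2004-182T01:00 ignore\n"
monte_arc = ("DSS-25 doppler 2004-182T00:00 2004-182T01:00 ignore\n"
             "DSS-14 range 2004-200T02:00 2004-200T03:00 ignore\n")
merged = merge_data_edits([legacy_arc, monte_arc])  # duplicate collapses to one
```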

The Cassini Mission: Reconstructing Thirteen Years …

581

Table 1 Navigation data required for the uniform reconstruction

Inputs | Volume | Conversion characteristics
Tracking data (Deep Space Network) | ~5000 passes | Merge all tracking data
Earth atmospheric calibrations | 80,000 entries (ionosphere); 485,000 entries (troposphere) | Merge monthly calibration files for the ionospheric and tropospheric calibrations
Radiometric data edits | 10,000 edits | Fetch and merge edits from 121 ODP arcs and 51 MONTE arcs
Optical images for navigation | 2243 | Directly ingested by MONTE (609 pre-SOI images not processed)
Small thruster events (turns, spacecraft momentum maintenance) | 2253 | Make a new file per arc, for all 157 arcs; reset all a priori uncertainties
Encounters where thrusting was used | 80 | Query telemetry and build an acceleration profile for each, verify begin and end turns for each, include stochastics at flyby
Spacecraft attitude files | 102 | Convert sequence NAIF-format attitude files (c-kernels) to MONTE; two reconstructed attitude files added for safings

Optical pictures used in navigation (opnavs) totaled more than 2200 post-SOI. Most of those navigation pictures were obtained early in the mission for science investigations and for estimating the ephemerides of Saturn and its satellites. The Picture Sequence File (PSF) is the text file used in mission operations containing information on the spacecraft camera, the camera pointing, and the time and location of objects in a picture. Cassini’s PSF includes 164 pictures of Titan during the prime mission and none in the two extended missions. Enceladus and most of the other large icy satellites were imaged 100–200 times in the prime mission, about half as often during the Equinox mission, and only a dozen times during the Solstice mission. By the end of the prime mission, optical navigation was mainly used to maintain knowledge of the Saturn satellites’ ephemerides, with a few exceptions for particular encounters of interest [14, 15]. Unlike the radiometric data, opnavs were already merged, and the star catalog just needed to be updated. Since the uniform reconstruction uses a single Saturn system model, the opnavs were not used in the overall estimation. However, we looked at possible biases in the optical observations over the mission timeline, discussed in Sect. 3.


The most time-intensive task was remaking the small forces files, covering uses of the small reaction control subsystem (RCS) thrusters to adjust the spacecraft’s orientation for science observations or momentum management (but not for the OTMs), and the acceleration models for satellite flybys where thrusting was used to maintain or adjust the attitude. Each arc points to the corresponding “rcs_dv” file including all small thrusting events for the arc span.

A uniform error model was also required throughout: 0.5 mm/s uncertainty for all thrusting relying on telemetry and not visible in the Doppler signature, and 0.25 mm/s for momentum management using less than 1 mm/s. Those small forces not along the Earth line were rare in the prime mission but became common by the Solstice mission, making up almost all of the small spacecraft events by the Grand Finale. The default uncertainty was set to 1.2 mm/s for all other events; since those were along the Earth line and thus visible in the tracking data, the least-squares estimation usually reduces their uncertainty to less than 0.1 mm/s.

About eighty satellite encounters had been performed in RCS mode to maintain attitude during close approach. This was usually done for Titan encounters below 1200 km altitude (raised to 1300 km later in the mission) in order to counter the effect of Titan’s upper atmosphere. Instead of inputting potentially hundreds of small burns during a given encounter, an acceleration profile was built from the predicted and then telemetered encounter. The telemetry was queried again and reprocessed for ingestion in the appropriate software language. Three different servers needed to be used in the telemetry queries over the entire mission. A few flybys had missing telemetry for various reasons, but they were documented well enough to be repaired. The OD team had implemented the RCS updater script to also refresh attitude files when used.
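The uniform error model above maps directly to a small rule. The helper below is a hypothetical sketch (not the actual OD filter setup) encoding the stated a priori 1σ values of 0.5, 0.25, and 1.2 mm/s:

```python
def apriori_sigma_mm_s(delta_v_mm_s, on_earth_line, is_momentum_mgmt):
    """Return the a priori 1-sigma uncertainty (mm/s) for a small-force event.

    Rules follow the uniform error model described in the text:
    - Earth-line events are visible in the Doppler data, so a loose 1.2 mm/s
      prior is used and the filter tightens it (typically below 0.1 mm/s).
    - Momentum-management events under 1 mm/s get 0.25 mm/s.
    - All other telemetry-only events with no Doppler signature get 0.5 mm/s.
    """
    if on_earth_line:
        return 1.2
    if is_momentum_mgmt and delta_v_mm_s < 1.0:
        return 0.25
    return 0.5
```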
As all the RCS files were being built, the newest predicted c-kernel attitude files were updated for the appropriate sequences. Cassini had already adopted a philosophy of not using reconstructed attitudes, since the differences from the predicted ones were negligible. Hence, all reconstructions are done with the latest predicted attitude files, apart from adding reconstructed attitudes for two safing events: September 11, 2007 and November 2, 2010. The first safing occurred after the 1648 km Iapetus flyby; the second was caused by corrupted files. During safing, the spacecraft turns off all non-necessary power loads, turns to a Sun-pointed attitude, and switches communication to the low-gain antenna.

2.3 Uniform Reconstruction Tool

With 157 arcs to reconstruct, automation becomes critical. The main requirements are to:

(1) Set up a given arc with the appropriate inputs for the arc time span, including the spacecraft states and covariance at epoch mapped from the previous arc.
(2) Once this is achieved, estimate the spacecraft parameters.
(3) Do this for all 157 arcs, and allow reruns while keeping local modifications.
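Requirement (1) — feeding each arc the state and covariance mapped from the end of the previous arc — is the heart of the chaining. A minimal sketch, with hypothetical data structures standing in for the actual MONTE arc setup:

```python
def reconstruct_mission(arcs, estimate_arc):
    """Chain arc reconstructions end to end.

    'arcs' is an ordered list of arc descriptors (dicts here, hypothetical);
    'estimate_arc' stands in for the per-arc setup-and-fit step. Each arc is
    initialized from the state and covariance mapped from the previous arc's
    converged solution, so one arc feeds the following one.
    """
    state = arcs[0]["epoch_state"]
    covariance = arcs[0]["epoch_covariance"]
    results = []
    for arc in arcs:
        # Set up the arc inputs and fit the tracking data for this span.
        solution = estimate_arc(arc, state, covariance)
        results.append(solution)
        # Map the converged state/covariance to the next arc's epoch.
        state = solution["final_state"]
        covariance = solution["final_covariance"]
    return results
```

Rerunning a single edited arc then only requires re-entering the loop at that arc, since everything downstream is driven by the mapped state and covariance.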


To satisfy (3), a function was implemented to avoid rewriting a given arc directory and only update the spacecraft states and errors from the previous arc. Hence, if all arcs were reconstructed but one of them needed to be edited, the user could re-reconstruct just that arc and then continue on (as one arc feeds the following one). The OD software was reused as much as possible and wrapped around the Cassini arc database. This resulted in an “auto_recon” script, whose workings are schematized in Fig. 4.

In this figure, the “start” position assumes inputs have been converted and assembled. The tool loops through an OD arc database to fetch arc names, epochs, and end times, and records the next arc name and epoch for appropriate spacecraft state mapping. At the same time, the satellite encounters and the maneuvers encompassing the arc are identified; their respective event times are recorded and stored as data cutoffs for later use. Finally, the RCS file and appropriate sequence attitude files are listed. After this arc initialization, or “advance_arc init” in Fig. 4, the main configuration “Options” file is built, pointing to the paths of the appropriate inputs. Among those, the Saturn system model files, tracking files, Earth parameters and calibration files, data edits and weights, and RCS flyby acceleration models are brought in. The states, radioisotope thermoelectric generator (RTG) heat, attitude, filter setup, and mapping files are placed in a local inputs folder. This is done in “advance_arc final” in Fig. 4.

By this point, the first data fit is ready to be started. In “odfit”, the trajectory is propagated from the spacecraft states at epoch and the force model from the inputs described above. The raw observables (Doppler and range) are read, and the computed observables are generated from the integrated trajectory.
The navigation filter then performs the least-squares estimation of the parameters indicated in the off-white box at the bottom of Fig. 4. We list estimated and consider parameters with their uncertainties in Table 2. The process is then iterated, updating the data cutoff until the end of the arc is reached and the solution converges (the “iter” loop in Fig. 4).

Once a solution is converged, the next step is to verify the newly reconstructed Cassini trajectory against the existing delivered one and resolve any potential issues; if the position and velocity differences are too large, something must have gone wrong. As the auto_recon tool runs, this verification can be done for multiple arcs at a time, if not the entire reconstructed mission. A number of arcs needed to be reworked because the position and velocity differences were too large; a position difference above a few tens of kilometers at arc epoch is suspicious. Most often, bad radiometric data was throwing off the trajectory fit: safing events had initially been overlooked, some data cutoffs were too distant for the arc to converge, some RCS turns prior to Tc had been modeled through non-gravitational accelerations instead and thus were not listed in the usual files, and random other errors occurred. In some cases, the second or third arc following a corrupted one had recovered, and the automated process could continue.

Hence, after all 157 arcs were built and converged, the reconstruction could be rerun with all necessary fixes by only updating the state and associated covariance from the previous arc. While this doesn’t save much processing time, it preserved the particular setup and edits for specific arcs that needed more care or uncommon changes.
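The verification step can be pictured as a simple threshold check on sampled position differences; the sketch below is illustrative (function names and the 30 km tolerance are assumptions based on the “few tens of kilometers” guideline above):

```python
def verify_arc(reconstructed, delivered, times, pos_tol_km=30.0):
    """Compare a reconstructed trajectory against the delivered one.

    'reconstructed' and 'delivered' are callables returning a position
    vector (km) at a given time; 'times' are the sample epochs. Returns
    (ok, worst_diff_km): ok is False when any sampled position difference
    exceeds the tolerance, flagging the arc for rework.
    """
    worst = 0.0
    for t in times:
        r_new, r_old = reconstructed(t), delivered(t)
        diff = sum((a - b) ** 2 for a, b in zip(r_new, r_old)) ** 0.5
        worst = max(worst, diff)
    return worst <= pos_tol_km, worst
```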

Fig. 4 Automated uniform reconstruction tool diagram



Table 2 Estimated and consider parameters and associated uncertainties (columns: Estimated, A priori 1 σ error, Consider, A priori 1 σ error; the first estimated entries are the Cassini states)

Table 1 Inter-individual distances by class between groups and areas (pairings such as OG1 ↔ IA2 and OG2 ↔ IA2, in cm), grouped into the classes 40–120, 120–360 and >360 cm; most recorded distances fall in the >360 cm class

Table 1 presents an assessment of inter-individual distances by class. A collection of map data in continuous mode (from video recordings) allows dynamic monitoring of changes in zones. Otherwise, an allocentric data record (subjects in relation to each other) was taken every 20 min in the selected time slot (12:00–14:00). These records indicate relative positions in the delimited spaces (I, Po, S, Pu). The results show that the subjects are relatively distant from each other, most frequently more than 360 cm. This distance can aid concentration or visual alertness on control screens (H/S interactions) in separate operations (G1 and G2), with a behavioral strategy of avoiding interference when subjects are in audio communications (H/Hi or H/He interactions). In this spatial organization, the AI is located at the interface, occupying the personal (40–120 cm) or social (120–360 cm) space with one or many operators, thus ensuring connections from one area to another.

Figure 5 shows the analysis of allocentric data based on orientations relative to each operator. The assessment of the number of subjects in an operator’s visual field indicates the level of visual isolation, in addition to the inter-individual distances.

788

C. Tafforin et al.

Fig. 5 Visual orientations of participants to Galileo post-launch operations according to time

It also delimits auditory isolation. The results show that operators adapt their behavior to the configurations of the system environment and the human environment in which they interact, in a positive way, so as to maintain their efficiency and conviviality at work.

3.3 Pleiades Training Session

To complement these data and demonstrate the completeness of the methodological tools that could be applied, additional observations were made during a system qualification test using a Training, Operations and Maintenance Simulator (TOMS) [23]. Performed in the Pleiades control center, they focused on the interactions between pairs of subjects so as to give a qualitative and quantitative description of their cooperation.

Figure 6a gives the levels of behavioral interactions, or flow rate in acts/minute, for the two control operators (OC1 and OC2) at the control station. The results show differences in activity between the two subjects. In spite of the common task, there are behavioral profiles specific to each operator that can be linked to individual differences, differences in experience, or specific functions within the pair. These variables can have a positive impact in this combination of training and coaching.

Figure 6b gives the level of behavioral interactions for the two mission engineers (IM1 and IM2). The results also show differences between the subjects and highlight more active and passive cooperation within the pair. This combination is effective given the success of the tests. At the individual level, the qualitative and quantitative behavioral profile serves as a picture with objective validation of efficiency at work based on training procedures.

Table 2 provides a quantitative assessment of the behavioral patterns using an enlarged catalog of interactions (Io: object interaction; Iv: visual interaction; Ip:

Ethological Approach of the Human Factors from Space …

789

Fig. 6 Interaction levels of a trainee pair during a Pleiades training session

Table 2 Sequences of behavioral interactions of a control operator during a Pleiades training session (absolute frequency of transitions between interactions over 2 hours for one operator)

verbal interaction; Ib: body interaction; As: action on the system; EXT: outside intervention; SEP: pair separation). The results show that the sequence of acts follows the pattern: system interaction → verbal interaction → visual interaction ↔ verbal interaction. This interactive sequence makes it possible to describe, in another form, the cooperative nature of a pair during training. This pattern of behavioral sequences can also be observed at the individual level in routine operations. The occurrence of stereotyped sequences, as an indicator of automatisms, can be combined with warning signals to maintain optimal vigilance in real situations.
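The transition counts behind a table like Table 2 can be computed from a time-ordered list of coded acts. A minimal sketch, using the interaction codes from the catalog above:

```python
from collections import Counter

def transition_counts(acts):
    """Count transitions between successive coded behavioral acts.

    'acts' is a time-ordered list of codes (e.g., 'As', 'Ip', 'Iv' from the
    catalog in the text). The result is the raw material of a transition
    table: how often each act is immediately followed by each other act.
    """
    return Counter(zip(acts, acts[1:]))

# Example: an observation session coded as a sequence of acts.
session = ["As", "Ip", "Iv", "Ip", "As", "Ip"]
counts = transition_counts(session)
dominant = counts.most_common(1)  # most frequent transition pair
```

Sorting the counter by frequency surfaces stereotyped sequences such as the system → verbal → visual pattern described above.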


Fig. 7 Artistic view of an envisaged future NOC

4 Feedback and Perspectives of Application

Positive feedback from CNES was expressed on the application of the ethological approach to the new configuration of operational NOC areas. The ethological analysis related to NOC zones helped to validate the new NOC implementation that was being studied at that time. The former central area (GC)—which the coordinating engineers were supposed to use but which was not actually used, as shown by the ethological data—could move closer to other Galileo areas as allocated spatial positions in an envisaged future NOC (Fig. 7). Some feedback came from the operators themselves because of their interest in improving their optimal rhythm and working space. The ethological approach, by providing insight on actual actions, interactions, movements, positions, distances and orientations, could help answer some very concrete questions operators have, such as:

• In routine operations, how can we anticipate fatigue and not compensate for it on a shift day that is not adapted to operational comfort?
• In control rooms, how can we maintain optimal behavioral activities while efficiently monitoring satellite activities at the same time?
• In space operations, whether training or real, how can we use an efficient methodological tool to enhance human behavior quality while decreasing HF risks?


Fig. 8 Scenario of behavioral variation, fatigue indicator in anticipation of energy inputs (cyclical process)

As a first point, a perspective of application would be to focus observations on behavioral descriptors such as collateral activities by measuring their occurrences on an ongoing basis, in order to determine the cyclical variations considered nominal. These could then be correlated with the operational context so as to help, for instance, in assessing operator well-being in the frame of routine operations. Observations at night—a disturbed rhythm in reference to the day rhythm punctuated by usual meal times—would define optimal scenarios (Fig. 8). Observed increasing activity could be a positive strategy to compensate for fatigue and maintain alertness, as a link between human behavior and HF.

As a second point, a perspective of improvement would be to enlarge the field of observations through the use of video recordings and/or a professional system for the capture, analysis, and presentation of observational data. Behavioral descriptors would be checked following an encoding process specific to the ethological method. If there is no security camera for collecting video data, an adapted application could be to use the software in situ, directly integrating descriptive data and any other source of information, such as operational conditions, for correlations. The monitoring screens in Fig. 9 show a possible association between the visualization of the reactivity sequences of the satellites and the visualization of active sequences of operator behavior in the same time intervals. This would be a logistic support development adapted to HF monitoring, in the frame of a longitudinal analysis.

As a third point, an innovative application would be to integrate an expert user of the ethological tools within teams in routine operations on a regular basis and/or during training and/or at the early steps of satellite operations (Fig. 10).
This would make it possible, on the one hand, to carry out an exhaustive study by applying specific assessments with the appropriate logistical support and, on the other hand, to generate warnings based on ethological data, helping to prevent operator errors and thus guaranteeing the success of space operations. This ethologist’s support could even help in operators’ assessment in the frame of their individual certification process.


Fig. 9 Association over time between satellite visibility slots on the control monitors and the controllers’ behavior monitoring

A few other possible applications of this ethological tool are listed hereunder:

• Observation of operators while assembling spacecraft or carrying out tests in clean rooms during an Assembly, Integration and Test (AIT) campaign, with a possible goal of enhancing their working conditions.
• Integration of an expert user of the ethological tools within teams in spacecraft contingency operations, or observation by the expert of behaviors during spacecraft contingency situations via video recordings, aiming at finding ways of enhancing communication for problem resolution.

Future relevant studies will contribute to providing significant results, an implemented database, lessons learned, experience sharing and training validation and, as a result, to strengthening the link between human behavior and HF.


Fig. 10 Ethological tools for space operations

Acknowledgements The authors wish to thank CNES for its financial support within the «Life Sciences» thematic area and from the CNES Operations division. Additionally, the authors are grateful to all the participants in the study.

References

1. Tafforin, C., Thon, B., Guell, A., & Campan, R. (1989). Astronaut behavior in orbital flight situation: Preliminary ethological analysis. Aviation, Space and Environmental Medicine, 60, 949–956.
2. Tafforin, C. (1994). Synthesis of ethological studies on behavioural adaptation of the astronaut to space flight conditions. Acta Astronautica, 32(2), 131–142.
3. Tafforin, C. (1996). Initial moments of adaptation to microgravity of human orientation behavior, in parabolic flight conditions. Acta Astronautica, 38(12), 963–971.
4. Tafforin, C. (2002). Ethological observations on a small group of wintering members at Dumont d’Urville station (Terre Adélie). Antarctic Science, 14(4), 310–318.
5. Tafforin, C. (2005). Ethological indicators of isolated and confined teams in the perspective of Mars missions. Aviation, Space and Environmental Medicine, 76, 1083–1087.
6. Tafforin, C. (2009). From summer to winter at Concordia station in Antarctica: A pilot study for preparing missions to Mars. In Proceedings of the 60th International Astronautical Congress, IAC-09-A1.1.6.
7. Tafforin, C. (2011). The ethological approach as a new way of investigating behavioural health in the Arctic. Journal of Circumpolar Health, 70(2), 109–112.


8. Tafforin, C. (2013). The MARS-500 crew in daily life activities: An ethological study. Acta Astronautica, 91, 69–76.
9. Tafforin, C. (2015). Behavior, isolation and confinement. In D. Beysens & J. Van Loon (Eds.), Generation and application of extra-terrestrial environment on Earth (Vol. 6, Chap. 26, pp. 265–271). River Publishers.
10. Tafforin, C., & Gerebtzoff, D. (2010). A software-based solution for research in space ethology. Aviation, Space and Environmental Medicine, 81, 951–956.
11. Tafforin, C. (1993). Ethological analysis of human coordinated movements in weightlessness: Comparison between the first and last days of short-term space flights. Journal of Human Movement Study, 24, 119–133.
12. Tafforin, C., & Campan, R. (1994). Ethological experiments of human orientation behavior within a three-dimensional space—in microgravity. Advances in Space Research, 14(8), 415–418. Oxford, UK: Pergamon Press.
13. Tafforin, C., Vinokhodova, A. G., Chekalina, A. I., & Gushin, V. (2015). Correlation of etho-social and psycho-social data from MARS-500 interplanetary simulation. Acta Astronautica, 111, 19–28.
14. Johannes, B., Sitev, A. S., Vinokhodova, A. G., Salnitski, V. P., Savchenko, E. G., Artyukhova, A. E., et al. (2015). Wireless monitoring of changes in crew relations during long-duration mission simulation. PLoS ONE, 10(8), online August 7, 2015.
15. Tafforin, C., & Giner Abati, F. (2016). Interaction and communication abilities in a multicultural crew simulating living and working habits at the Mars desert research station. Antrocom Journal of Anthropology, 12(2), 97–110.
16. Tafforin, C., & Delvolvé, N. (1991). Les activités collatérales de l’astronaute: Indices d’adaptation comportementale à la situation d’impesanteur [Collateral activities of the astronaut: Indices of behavioral adaptation to weightlessness]. Sciences et Motricité, 14, 27–33.
17. Tamponnet, C. (1996). Life-support systems for Lunar missions. Advances in Space Research, 18(11), 103–110.
18. Tafforin, C. (2018). From human behavior to human factor in aerospace, astronautics and space operations. Aviation and Aeronautical Science, 1(1), 1–3.
19. Tafforin, C. (2015). The ethological tool for human factors assessment in the interactive space operations. In 4th Human Dependability Workshop, ESOC-ESA, Darmstadt, Germany, October 6–7, 2015.
20. Galet, G., Tafforin, C., & Michel, S. (2017). Feedback after a first ethological experience, human factor in operations. In 5th Human Dependability Workshop, ESA-ESTEC, Noordwijk, Netherlands, November 14–16, 2017.
21. Galet, G. (2014). Optimization/evolution of the operational trades & skills. In SpaceOps 2014 Conference Proceedings, Pasadena, CA, USA, May 5–9, 2014.
22. Hall, E. T. (1971). La dimension cachée [The Hidden Dimension]. Paris, France: Seuil.
23. Strzepek, A., Estevem, F., Salas, S., Millet, B., & Darnes, H. (2016). A training, operations and maintenance simulator (TOMS) made to serve the MERLIN mission. In SpaceOps 2016 Conference Proceedings, Daejeon, Korea, May 16–20, 2016.

Enhanced Awareness in Space Operations Using Web-Based Interactive Multipurpose Dynamic Network Analysis

Redouane Boumghar, Rui Nuno Neves Madeira, Alessandro Donati, Ioannis Angelis, José Fernando Moreira Da Silva, Jose Antonio Martinez Heras and Jonathan Schulster

Abstract The dynamic network analysis (DNA) interactive visualization tool is a graph-based tool that gives space operations staff the ability to comprehend the complex relationships at stake in many different kinds of problems. The added value of DNA is exposed through different use cases applied to spacecraft operations. Operations engineers have shown an enhanced level of awareness when able to visualize the dynamics of their problems. Tables, text, and numbers represent the way we communicate, but graph layouts and images represent, more efficiently, the way we think and mind-map problems. Graphs also present patterns that our eyes are made to detect easily. By enabling the sharing of these mind maps and their semantics, we show how spacecraft issues can be detected earlier and, thanks to better insight, how they are solved more efficiently.

R. Boumghar (B) · R. N. N. Madeira · A. Donati: European Space Agency - ESOC, 64293 Darmstadt, Germany
I. Angelis: SCISYS Deutschland GmbH, 64293 Darmstadt, Germany
J. F. M. D. Silva · J. A. M. Heras: Solenix, 64293 Darmstadt, Germany
J. Schulster: LSE Space, 64293 Darmstadt, Germany

© Springer Nature Switzerland AG 2019. H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_31

795

796

R. Boumghar et al.

Nomenclature

AI   Artificial Intelligence
DNA  Dynamic Network Analysis
ESA  European Space Agency
HK   House Keeping
JSON JavaScript Object Notation
ML   Machine Learning
SVG  Scalable Vector Graphics
TM   Telemetry
UI   User Interface
UX   User Experience

1 Introduction

1.1 Rationale

The work of space operations staff is driven by an important load of interconnected information. Thermal engineers, for instance, use large and complex thermal networks, like Sentinel-3’s thermal network [1], which contains hundreds of hot objects connected in groups as abstract representations of thermal nodes for a mathematical thermal model. Furthermore, spacecraft operations engineers create synthetic parameters by linking multiple telemetry parameters through mathematical expressions, and spacecraft analysts and engineers also get access to the output of higher-level algorithms providing highly connected information, such as dependencies between all pairs of spacecraft parameters [2]. In many problems, engineers have to deal with many coupled pieces of information: time, value, influence, status (e.g., being out of limit or not for a TM parameter), meta-data, external constraints (e.g., planning constraints), etc.

Tables, text, and numbers represent the way we communicate, but graph layouts and images represent, more efficiently, the way we think and mind-map problems. By enabling the sharing of mind maps and their semantics, issues are detected earlier and, thanks to better insight, solved more efficiently. To avoid information overload and permit clear awareness of complex systems, the European Space Agency’s Data Analytics Team for Operations (DATO) created a visualization tool called DNA (dynamic network analysis) [3], aimed at showing the composition of any kind of network using a fluid, dynamic and interactive view, to provide the ability to comprehend complex relationships at stake in many different kinds of problems.

Enhanced Awareness in Space Operations Using Web-Based …

797

1.2 The DNA Approach

Changing the point of view is a key action required to build robust comprehension of a situation. The example given by the spacecraft thermal network of Sentinel 3A (S3A) reveals important advantages. This thermal network is encoded in a large text file, in a format similar to an adjacency list, which is hard to navigate manually. By parsing this file into a standard format (JSON for graphs), as illustrated in Fig. 1, it was possible to render it with DNA, thus changing the point of view.

The view provided by DNA is dynamic, and users can interact with the graph by pulling and dragging nodes, clicking for information, or searching for nodes. If a dragged node is connected to another group of nodes, it drags the whole group to a new position, changing the whole view in a user-driven organization. Basic visual checks then become possible. Thermal nodes and hot objects, having different semantics, were represented with different colors and shapes, making inconsistent connections obvious.

In spacecraft operations, there are several scenarios where this kind of analysis may play a relevant role. Data mining techniques also output relations, such as correlation matrices, which can be visualized as graphs using DNA. DNA leverages visual features that are easily interpreted by the human eye. Moreover, actions on the graph, like adding nodes and edges, can be saved and shared among several users. DNA adapts the visualization using a high-level configuration grammar to easily tackle problems of different natures (e.g., correlation graphs, thermal network organization, parameter checking). DNA has also seen relevant use by data scientists visualizing the parameters of machine learning predictive models. A multi-semantics graph layout was developed using web-based technology to enable multi-user interactions and seamless integration. DNA represents the dependency relations or other kinds of connections among different objects.
The objects composing such representations can be complex, and the information carried by their links may be complex as well. One of the main advantages of a network view is that it provides a topological overview of any problem, improving awareness. Connected information of multiple kinds already exists in operations, as in many other different domains.

Fig. 1 File formats illustration; on the left the former format, on the right the JSON-based format used by DNA

Graphs are inherently present in various forms in operators' everyday life. In domains where structured text and numbers are the only representation of information, a lot of effort has to be made to understand links and build awareness. DNA's goal is to transform textual relational information into a visual layout, where attributes such as shapes and colors are leveraged to represent the different dimensions of a problem. Reducing dimensions is important to overcome the problem of information overload presented in Sect. 2. The technologies used during the prototyping phase and the final choices are presented before the data interfaces are exposed in Sect. 3. Section 4 shows typical use cases where DNA has already proven to be useful, while Sect. 5 highlights the interesting consequences of its use. The extensive user feedback and foreseen usage translate into a new set of features to be developed, which Sect. 6 presents before concluding in Sect. 7.

2 Enhanced Awareness Through Dynamics

2.1 Information Overload

Spacecraft operators' teams are organized by specialization. Spacecraft engineers have a good understanding of their subsystems, as they are organized to focus on only a specific one. With time and experience, engineers shift their focus across different subsystems and therefore develop a more acute understanding of all the relations between subsystems, with more insight into the possible causes of anomalies. Their knowledge is composed of semantics and ontologies, links between causes and effects, and influencing factors that are sometimes non-obvious at first. A graph representation made of nodes and edges is a compact way to represent dependencies, whether they are correlations, causal, or logical. All the resources of a plotting tool must be used to tackle the high number of dimensions that operators need to deal with. It is interesting to see how different information can be encoded in a chart or a graph using attributes such as color, shape, thickness, and annotations. The survey [4] goes through the different problems of mapping N-dimensional information down to M dimensions, with N > M, in order to satisfy design trade-offs between information, simplicity, and accuracy.
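To make this dimension reduction concrete, the mapping of node attributes onto the few available visual channels can be sketched as a small style function. All subsystem names, colors, and shapes here are invented examples, not DNA's actual configuration.

```python
# Hypothetical mapping from an N-dimensional node description to the few
# visual channels (color, shape, size) a graph view can display at once.
SUBSYSTEM_COLORS = {"thermal": "#d62728", "power": "#1f77b4", "aocs": "#2ca02c"}
TYPE_SHAPES = {"telemetry": "ellipse", "telecommand": "rectangle", "event": "diamond"}

def node_style(node):
    """Map node attributes to visual channels; unknown values fall back
    to neutral defaults so the graph always renders."""
    return {
        "color": SUBSYSTEM_COLORS.get(node.get("subsystem"), "#7f7f7f"),
        "shape": TYPE_SHAPES.get(node.get("type"), "ellipse"),
        "size": 10 + 2 * node.get("degree", 0),  # more connections, bigger node
    }

style = node_style({"subsystem": "thermal", "type": "event", "degree": 3})
print(style)
```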

2.2 Visualization Technologies

2.2.1 Exploration and Prototypes

GraphViz [5] proposes the dot language to describe nodes and edges, both of which can hold information as well as drawing attributes. Several algorithms are proposed to render static images from files written in the dot language, and the many different layouts available cover a very large number of applications. But when the user wants to retain good access to details while still viewing a large graph, capabilities such as dynamic zoom become necessary, which is not possible with a static image, where information may have been rendered at an inappropriate resolution.

Fig. 2 A sample network is viewed using different force-directed layout algorithms; left: Atlas2 [8], center: repulsion, right: Barnes-Hut algorithm [9]

With the numerous libraries and tools that now exist, it has become easier to build dynamic visual representations that also enable user feedback and interaction with the data. The HTML5 library D3.js [6] provides ways to draw documents shaped by the data itself. Several derivative and inspired solutions have emerged, such as VIS.js [7], which makes it easy to produce a graph. Figure 2 shows one sample network of nodes and edges drawn using different layout algorithms. The choice of layout is very important and can change the understanding of the whole graph. On the left of Fig. 2 is the Atlas2 [8] force-directed layout, which expands the cliques¹, or partially connected groups of nodes, of the graph to avoid overlaid edges; it gives clarity within the cliques but lacks discrimination between the groups. In the repulsion layout, in the center of Fig. 2, these connected groups of nodes form circles that do not represent the structure of the connections. Another relevant web-based graph library is Cytoscape.js [10]. This library is based on the popular Cytoscape [11] desktop tool, one of the most widely used graph applications in several areas such as biology. Inspired by the features of its desktop counterpart, Cytoscape.js provides advanced graph querying, calculation, styling, and manipulation capabilities, making it a very powerful library out of the box.
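For intuition about the layouts compared in Fig. 2, a deliberately naive force-directed iteration (pairwise repulsion plus spring attraction along edges) can be written in a few lines. The constants and damping below are arbitrary, and this is not an implementation of any specific published algorithm.

```python
import math
import random

def force_layout(nodes, edges, iters=200, k=1.0, seed=0):
    """Tiny force-directed layout sketch: all node pairs repel, edges pull
    their endpoints together, positions are updated with heavy damping."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    for _ in range(iters):
        disp = {n: [0.0, 0.0] for n in nodes}
        for a in nodes:                          # repulsion between all pairs
            for b in nodes:
                if a == b:
                    continue
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d                    # stronger push when closer
                disp[a][0] += f * dx / d
                disp[a][1] += f * dy / d
        for a, b in edges:                       # spring attraction along edges
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k                        # stronger pull when farther
            disp[a][0] -= f * dx / d
            disp[a][1] -= f * dy / d
            disp[b][0] += f * dx / d
            disp[b][1] += f * dy / d
        for n in nodes:                          # damped position update
            pos[n][0] += 0.01 * disp[n][0]
            pos[n][1] += 0.01 * disp[n][1]
    return pos

pos = force_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "a")])
```

Even this toy version shows the behavior discussed above: the connected triangle tends to settle into a compact clique, while the isolated node is pushed away from it.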
Finally, WebGL-based 3D visualization libraries were also considered and reviewed, both for their performance and for how they compare with, for example, animated SVG in D3.js or canvas in Cytoscape.js, but also to see whether a 3D space layout would help manage the information overload problem. The main library explored in this area was Three.js [12] (Fig. 4), which provides a higher-level API for WebGL.

¹A clique is a subset of nodes of the graph such that every two distinct nodes in the clique are adjacent, i.e., directly connected.


Fig. 3 Dependency network using the color dimension to represent additional information. On the left, colors represent each node's subsystem. On the right, no colors are used, so it becomes harder to understand which nodes belong to which subsystem, yet the structure of the dependencies between telemetry parameters is still visible

[Diagram labels from Fig. 5 (technology stack): Front-End (Angular graph visualization, UIkit 3 and Angular Material UI components, Cytoscape and Cytoscape plugins for additional graph functionality) exchanging graph data via REST requests with the Back-End (Django, ArangoDB).]

Fig. 4 Dependency network in 3D using 3d-force-graph [13], which is based on Three.js. Colors represent different spacecraft subsystems

2.2.2 Choices

The development of the DNA application was an iterative process that started with proofs of concept and performance demos using the previously mentioned libraries. Because several libraries and tools were involved in the process, this section describes the technology used in the current version of the application (Fig. 5) and the motivations behind the choices.

Fig. 5 Technology stack composing the latest version of the DNA application

Cytoscape.js was adopted as the front-end library for graph representation and manipulation because the many functions it offers out of the box allowed for faster feature implementation. The library also supports graph-based calculations, which the application will use in the future, and a collection of extensions such as the navigator and edge creation. It also showed better performance compared to the default behaviors of other libraries. The first applications were based on plain HTML and JavaScript and quickly grew in size and complexity. To impose a cleaner structure and improve the development process, the Angular framework was introduced. This choice was based on the modern design and paradigms of Angular, and also on the experience already present in the team. To hasten the development of the web UI itself, the component library UIkit was used for its simplicity and ease of use. With the adoption of Angular, components from Angular Material were also introduced. As the front-end side grew in complexity and features, it was necessary to extend the data API to dynamic web services. For this aspect, the following software is used:

• Django [14]: a high-level web framework written in Python. It permits fast creation of web services and facilitates the connection to the best-known databases. The fact that it is written in Python also offers the possibility to use the numerous Python libraries.

• ArangoDB [15]: a graph database that follows the NoSQL paradigm and proposes a multi-model approach. It permits representing the data directly in its final form, a graph. It also offers built-in graph functions that were not available in other graph databases such as Neo4j.


3 Data Interfaces

Since DNA was envisioned as a web application, the natural modern way to encode the networks used is JSON. JSON is a lightweight data format used in many different domains and applications. It is based on a subset of the JavaScript programming language, and its main feature is a simplicity that makes it easy for both humans and machines to read and write. The files used differ slightly depending on their origin but follow the same high-level structure of two JSON arrays. One contains JSON objects describing the edges, or links; the edge object itself contains the ids of the nodes being linked and the value of the link. The other array contains the nodes, which are also described by JSON objects with several fields, such as id, label, description, hierarchy level, element type, etc. With the evolution of the tool, the files also started incorporating additional fields related to graph metadata, such as filters to apply immediately when the graph is viewed, the list of node fields that can be edited, the type of the graph, etc.

A recurrent issue with the format of the graph was that there were slight variations depending on the origin of the data, and each tool and library that we tested had its own specific way to input a graph. To cope with this, several slightly different algorithms had to be created on the front-end side, applied after loading the graph. To reduce the impact of these transformations, and to facilitate the development of more advanced features such as advanced search and off-line calculations, it was necessary to develop a back end based on the previously mentioned technology: Django and ArangoDB. Data in ArangoDB are organized in document collections; for each graph, there are two collections, one for the nodes and another for the edges. Inside each collection there are JSON objects representing the nodes or edges; these are similar to the previous file-based ones, with the main difference being the mandatory native Arango fields, _id and _rev. An additional collection then contains the information for each graph, describing how to compose networks from the other document collections. This new way of interacting with the data brought several benefits: the graph representation was streamlined to follow the ArangoDB conventions, so that calculations while importing, manipulating, and storing data use this representation, and a set of functions can then transform the data for the front end depending on the technology in use there, which for now is mainly Cytoscape.js. It also opened the possibility to query the minimum necessary data for a node and its neighbors instead of requesting a whole graph, an important feature to enable interaction and/or incremental loading of large graphs.
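A minimal example of such a graph document follows. The field names beyond id, label, element type, and hierarchy are illustrative rather than the exact DNA schema.

```python
import json

# Illustrative graph document following the high-level structure described
# above: metadata, a nodes array, and an edges array (field names beyond
# those mentioned in the text are invented).
graph_doc = {
    "metadata": {
        "type": "thermal_network",
        "editable_fields": ["label", "description"],
        "initial_filters": [{"field": "value", "min": 0.5}],
    },
    "nodes": [
        {"id": "HO_1", "label": "Heater 1", "element_type": "hot_object", "hierarchy": 1},
        {"id": "TN_1", "label": "Node 1", "element_type": "thermal_node", "hierarchy": 2},
    ],
    "edges": [
        {"from": "HO_1", "to": "TN_1", "value": 0.8},
    ],
}
encoded = json.dumps(graph_doc)
```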


4 Use Cases

4.1 Thermal Network, Simulation Configurable Model

One of the most important aspects of keeping the satellite in a stable configuration while in orbit is to closely monitor its thermal behavior. A thermal representation of the satellite is therefore of great importance to support the engineers in evaluating the satellite from a thermal perspective under different circumstances, e.g., during eclipse or sun pointing. Such a representation can be achieved using the thermal network (TNET) generic model, developed by ESA/ESOC as part of a suite of several generic simulation models. The thermal network (TNET) provides a highly configurable model allowing the simulation of any satellite thermal network. The TNET provides the capability to simulate the thermal evolution of the satellite in terms of heating and cooling different satellite areas at specific temperature reference points. The number of contributors and their characteristics are specified in a configuration file. The latter is composed of a set of hot objects, each one contributing a configurable amount of heat and influencing a corresponding thermal node, while each thermal node consists of a set of hot objects, each one associated with a default temperature contribution. The TNET configuration in the simulators is implemented based on the available satellite-related documentation, i.e., thermal ICDs, in order to realistically mimic the characteristics of all hot objects in terms of active and passive units, data handling and monitoring interfaces, rise and fall temperature rates, and the scalability of the thermal nodes. Based on the different satellite modes, pointing laws, and satellite layout, each thermal node has the following characteristics:

• Identifier
• Base Temperature (°C)
• Default Temperature (°C)
• Rise Rate (°C/sec)
• Fall Rate (°C/sec)
• Number of Related Hot Objects
• HotObjectId, NominalContribution (°C)
• HotObjectId, NominalContribution (°C)
• …

For each hot object, the thermal node calculates the actual hot object temperature contribution based on the status of the model that the hot object is connected to, which is then reflected in the actual thermal node temperature reported in TM. The DNA tool facilitates the verification of such complicated networks, as it offers the possibility to visualize and navigate a graph representing thermal configuration files of more than 3000 lines.
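The contribution logic described above can be sketched as follows. The real TNET model also applies rise and fall rates over time, which are omitted here, and the node and hot-object names are invented.

```python
def thermal_node_temperature(node, hot_object_status):
    """Steady-state sketch: the reported node temperature is its base
    temperature plus the nominal contribution of every active hot object."""
    temp = node["base_temp_c"]
    for ho_id, contribution in node["contributions"].items():
        if hot_object_status.get(ho_id, False):  # hot object currently on?
            temp += contribution
    return temp

node = {"id": "TN_RADIATOR", "base_temp_c": -20.0,
        "contributions": {"HO_HEATER_1": 15.0, "HO_TX": 5.0}}
print(thermal_node_temperature(node, {"HO_HEATER_1": True, "HO_TX": False}))  # -5.0
```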


4.2 Telemetry Dependency Analysis

Dependency Finder [2] is an algorithm that explores the possibility of extracting dependency knowledge automatically from telemetry parameter data. It is part of the applied research carried out by the ESA Data Analytics Team for Operations (DATO). Dependency Finder is designed as a generic method based on conditional probabilities and is currently implemented as a prototype. Its result consists of a large list of pairs among thousands of parameters, each pair with a degree of dependency between 0 and 1. Mars Express Flight Control Engineers report finding it useful to display this extracted knowledge as a dependency graph. It gives better insight than a large table, and the dynamic graph provided by DNA is more navigable and enables the visualization of dependencies in an intuitive way.
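The idea of a 0-to-1 degree of dependency based on conditional probabilities can be illustrated with a toy score. The actual Dependency Finder algorithm [2] is more elaborate; this sketch only conveys the flavor, and the parameter values are invented.

```python
from collections import Counter

def dependency_degree(x, y):
    """Toy dependency score: average over values v of x of the largest
    conditional probability P(y == w | x == v). Returns 1.0 when y is a
    function of x, and lower values when y varies independently of x."""
    pairs = Counter(zip(x, y))
    x_counts = Counter(x)
    best = {}
    for (xv, _yv), c in pairs.items():
        best[xv] = max(best.get(xv, 0), c)
    return sum(best[xv] / x_counts[xv] for xv in x_counts) / len(x_counts)

heater_on = [0, 0, 1, 1, 0, 1, 0, 1]
temp_trend = ["down", "down", "up", "up", "down", "up", "down", "up"]
print(dependency_degree(heater_on, temp_trend))  # 1.0, fully determined
```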

4.3 Random Forest Feature Importance

The random forest [16] algorithm is a machine learning approach based on an ensemble of random decision trees. After training, the relative importance of the input features can be computed, and the relation between each parameter and the set of features can be drawn as shown in Fig. 6. The feature colors shown in this figure distinguish the features extracted from the mission operations plans (purple) from those derived from the flight dynamics information about solar aspect angles (orange). An organized version of this view, recently proposed in [17], permits grouping all the features of each domain and showing the distribution of feature-origin influence per target parameter: a very useful view for understanding the dynamics of the spacecraft thermal subsystem.

Fig. 6 Visualizing the most important features after the training of a random forest model. The zoom enables the user to get more information. Green squares are the target parameters, circles their sources of influence: purple for operations' command-related features, orange for sun aspect angle-related features
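Turning such per-target importances into the node and edge structure drawn in Fig. 6 is straightforward. The parameter and feature names below are invented, and the "plan"/"saa" origin labels simply mirror the purple and orange groups of the figure.

```python
def importances_to_graph(importances, origin_of):
    """Build a nodes/edges structure from {target: {feature: weight}} maps,
    tagging each feature node with its domain of origin for coloring."""
    nodes, edges = {}, []
    for target, feats in importances.items():
        nodes[target] = {"id": target, "group": "target", "shape": "square"}
        for feat, weight in feats.items():
            nodes.setdefault(feat, {"id": feat,
                                    "group": origin_of.get(feat, "unknown"),
                                    "shape": "circle"})
            edges.append({"from": feat, "to": target, "value": weight})
    return {"nodes": list(nodes.values()), "edges": edges}

graph = importances_to_graph(
    {"NTTA0101": {"cmd_count": 0.6, "saa_x": 0.4}},
    {"cmd_count": "plan", "saa_x": "saa"},
)
```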

5 Outcome and Discussion

5.1 Important User Feedback

During the project, several UX experiments were conducted. Operators and engineers who saw the network describing their own problem could recognize patterns from the third time they saw their graph. Users did not have access to the visualization between the experiments, and colors were changed, but the shapes, which depend on the structure of the problem (such as a group of telemetry parameters related to a reaction wheel), were recognizable. This shows that this type of visualization easily brings awareness of the structure of the problem through visual pattern recognition.

The graphical visualization of the Sentinel-3 thermal network, as used in the satellite simulator, allowed for a rapid evaluation of the structure and connection of links between 'hot objects' (heaters) and 'thermal nodes' (thermistors and thermostats), compared to sifting through thousands of lines of thermal model code. By focusing on 'knots' of dense nodes (Fig. 3) in the visualization and concentrating on the links that connected the most nodes, it was possible to check the expected proximity between heaters, the thermistors or thermostats used to control them, and the monitoring thermistors used for functional monitoring of heater performance. This approach was used to identify a typographic error within the thermal model code linking one heater with unrelated thermistors. The model code was checked by hand and found to have a 'cross-wired' heater, which was then fixed via a software problem report specifying the change required. For the specialist subsystem engineer, intuitive knowledge of the most dissipative equipment and largest heaters on the satellite allows rapid confirmation, or 'sanity check', that these correspond to the 'hot object' nodes with the most connections, i.e., those causing the largest influence on surrounding objects.
A rapid survey of these against the known list of the most dissipative units and heaters made it possible to quickly reveal any significant errors in the modeling of the power values assigned to each object.

5.2 Current Status of the Tool

Currently, the tool provides several features developed mainly for the dependency finder and thermal network use cases, but they can also be applied to the other ones. The main features are as follows:


General and specific filtering possibilities. After a graph has been loaded, it is helpful to be able to filter out unwanted nodes or edges in order to better focus on a task. For this, several filters are provided that can be applied to most graphs, such as filtering based on the value of an edge or on whether a node has any edges. Besides this, DNA also provides dynamic filters based on the different groups per use case: hot object and thermal node for the thermal network use case, or telemetry, telecommand, and event filters in the dependency use case.

Advanced color and shape defining algorithms. DNA uses the styling possibilities of Cytoscape.js by passing a final configuration to the library that is based on metadata information from the graph and the application. That is, a graph-specific configuration overwrites a graph-type configuration, which in turn overwrites the default configuration. This allows a configuration specific to a graph and, at the same time, helpful defaults based on graph types or other heuristics. An example of this is the color generation for the dependency finder graphs, which is based on the group and hierarchy of the node.

Graph modifications. A very important feature for the thermal network case is the ability to modify the graph. For this, the tool provides several models to create, delete, and modify nodes based on the type of the graph or the configuration loaded at the moment. Besides this, the user is also able to change the layout algorithm of the graph, redo the layout, and resize the nodes according to their current number of edges.

Loading based on a node and its neighbors for larger graphs. One of the greatest challenges encountered while developing the tool was the necessity of coping with very large graphs.
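The three-level styling cascade described above maps naturally onto Python's collections.ChainMap, where earlier maps shadow later ones. The keys and values are illustrative; in DNA the resolved configuration is ultimately passed to Cytoscape.js.

```python
from collections import ChainMap

# Lookup order: graph-specific settings, then graph-type defaults,
# then application-wide defaults (all values here are invented;
# "cose" and "concentric" are standard Cytoscape.js layout names).
app_defaults   = {"layout": "cose", "node_color": "#7f7f7f", "edge_width": 1}
type_defaults  = {"node_color": "#1f77b4"}      # e.g. for dependency graphs
graph_specific = {"layout": "concentric"}       # stored with one graph

style = ChainMap(graph_specific, type_defaults, app_defaults)
print(style["layout"], style["node_color"], style["edge_width"])
```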
For some use cases, namely the dependency finder, the produced graphs have a very large number of elements, for example more than 2000 nodes and 20,000 edges in the case of the Mars Express data, even after filtering by a dependency threshold. This is a big challenge because we want users to be able to freely explore the graph without being overwhelmed by the number of elements, and the application must maintain an acceptable level of performance so that interaction is not affected. Early development focused on two topics: improving and creating more filters to be applied to the graph, and trying to improve the performance of the application for displaying big graphs. Performance was improved by fine-tuning the transformations and interactions with the library used and by testing several other visualization libraries. Although there are some differences between alternatives, all libraries have trouble managing and rendering a high number of elements, especially under layout and interaction constraints. The WebGL-based approaches seem promising, but the level of interaction with the elements is not optimal. One of the disadvantages of filtering is that it generates waiting time before the filter is applied and rendered.

Fig. 7 Overview of the application while visualizing the data from the dependency finder algorithm

In the current application, visible in Fig. 7, the tab named "load", on the left, provides an additional feature: a table of nodes that can be browsed and filtered. The concept is that when a graph exceeds a certain threshold of elements, instead of loading it completely at the beginning, only the node list is loaded, and from there the user can select which nodes to load and with which adjacency level, that is, which level of neighborhood should also be loaded. For example, level 0 means only the node itself, level 1 the node and its neighbors, level 2 includes neighbors of neighbors, and so on. Although this does not help much in the case of completely free data exploration, it is a good approach for investigation cases where we have an idea of where to start looking.
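The adjacency-level loading described above amounts to a breadth-first expansion limited to a given depth, which can be sketched as follows (the graph itself is a toy example).

```python
from collections import deque

def neighborhood(adj, start, level):
    """Return the set of nodes within `level` hops of `start`:
    level 0 is only the node itself, level 1 adds direct neighbors,
    level 2 adds neighbors of neighbors, and so on."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == level:          # do not expand beyond the requested level
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(sorted(neighborhood(adj, "a", 2)))  # ['a', 'b', 'c']
```

Served from the back end, such a query returns only the requested slice of a large graph, keeping both the payload and the rendering workload small.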

6 Future Work

Based on the described use cases and the outcome of the development, several topics for future work have been defined. The first is to implement advanced graph calculations, both when importing the graph and available on request in the front end. For this, both the Cytoscape.js algorithms and ArangoDB queries can be used. This will allow better filtering and querying in the front end, for example by highlighting the shortest path between certain nodes or groups (to answer the question: what is the relation between these two elements?), or by filtering based on the importance of nodes.

Secondly, more ways to represent the available data can be introduced, since depending on the use case and situation they might help in dealing with the information overload. The two main visualizations to be focused on are a tabular view of nodes and edges and their fields, which can be helpful for checking details and for reporting, and the 3D view shown before.

Another important improvement for the tool is a separate component, or menu, to interact with the available graphs and their details: being able to see when a graph was last modified, load old versions of it, see general statistics before loading, etc. This is especially important once the user base and the number of available graphs start growing.

On the topic of the multiuser experience, there are many common UI and UX elements that should be implemented, such as identifying who changed something in the graph, a mechanism to approve these changes, the possibility to write comments, the ability to store different configurations per graph, and making the interface intuitive so that only very little effort is necessary to introduce the tool to new users. Moreover, there is also work to be done on deploying the application, understanding the needs of new users, and capturing outcomes such as the ones described in Sect. 5. The deployment of the application is planned to be possible either standalone or integrated in a mission analysis tool such as WebMUST [18], OpsWeb [19], or OpenMCT [20].

The current version of DNA does not include time information. But there are ways to animate the graph visualization and, like a movie, play the evolution of links. A better understanding of the current relationships can sometimes be gained by navigating through time. This idea emerged from the very early prototypes, when spacecraft engineers wanted to replay the evolution of telemetry dependencies around the time of a known anomaly. This is foreseen to help spot the telemetry concerned and speed up the search for the cause of the anomaly.

7 Conclusion

Space operations staff work in a world made of interconnected information, yet no visualization tool clearly permitted visualizing the links, the relationships, and their topology. DNA has proven that graph-based visualization is more efficient than an adjacency list-based approach when it comes to verifying and investigating issues. Thanks to this demonstrator, users could express additional needs, not foreseen at the beginning, which open up brand-new development ideas.

DNA is built to be modular and integrable as a component in other dashboard projects. In such future integrations, the workflow of users has to be rethought: start with raw telemetry, move to an aggregated chart, check the dependencies of the telemetry parameter in a graph, get a list of these dependencies, recheck the raw telemetry for them, and so on. This may require several information exchanges between different components.

The future of automated data analysis systems is motivated by the usability of machine learning techniques and the simplicity, nowadays, of building complex deep learning models. DNA plays an important role in giving a comprehensive visualization of the parametrization and evolution of complex artificial intelligence models. Mature core functions still need to be improved in order for users to be able to check, reorganize, search, and understand the relationships of a complex network of objects. DNA acts as a facilitator, enabling effortless navigation for faster comprehension, which has already proved to be useful. By showing this demonstrator, we hope to raise attention to the topic in space operations and to start building, in an open collaborative way, the next version of DNA.

Acknowledgements DNA would not have been developed without the valuable feedback of many users who gave their time to share their issues and participate in the UX design experiments. The authors express particular gratitude to the following users, members of the European Space Operations Center: Gustavo Baldo Carvalho, Juan Rafael Garcia Blanco, Vadims Kairiss, Max Pignede, Luke Lucas, Mario Castro De Lera, Marco Zambianchi, Peter Collins, Ana Piris, Thomas Godard.

References

1. Massimo, S. D., Ignacio, M., Silvio, D., & Gerard, C. (2015). Sentinel 3 - Spacecraft thermal control: Design, analysis and verification approach.
2. Martinez-Heras, J., Lucas, L., & Donati, A. (2018). Dependency finder: Surprising relationships in telemetry. In 15th International conference on space operations, SpaceOps.
3. Boumghar, R., Madeira, R. N. N., Angelis, I., Moreira Da Silva, J. F., Martinez Heras, J. A., Schulster, J., & Donati, A. (2018). Enhanced awareness in space operations using multipurpose dynamic network analysis. In 15th International conference on space operations, SpaceOps.
4. Chan, W. W.-Y. (2006). A survey on multivariate data visualization.
5. Gansner, E. R., & North, S. C. (2000). An open graph visualization system and its applications to software engineering. Software - Practice and Experience, 30(11), 1203–1233.
6. D3.js - Data-Driven Documents. (2018). https://d3js.org/.
7. VIS.js - A dynamic, browser-based visualization library. (2018). https://visjs.org/.
8. Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLOS ONE, 9(6), 1–12. https://doi.org/10.1371/journal.pone.0098679.
9. Barnes, J., & Hut, P. (1986). A hierarchical O(N log N) force-calculation algorithm. Nature, 324, 446–449. https://doi.org/10.1038/324446a0.
10. Franz, M., Lopes, C. T., Huck, G., Dong, Y., Sumer, O., & Bader, G. D. (2016). Cytoscape.js: A graph theory library for visualisation and analysis. Bioinformatics, 32(2), 309–311. https://doi.org/10.1093/bioinformatics/btv557.
11. Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498–2504. https://doi.org/10.1101/gr.1239303.
12. Three.js - A cross-browser JavaScript library and Application Programming Interface (API) used to create and display animated 3D computer graphics in a web browser. (2010). https://threejs.org/.
13. 3d-force-graph - 3D force-directed graph component using ThreeJS/WebGL. (2017). https://github.com/vasturiano/3d-force-graph.
14. Django - The web framework for perfectionists with deadlines. (2017). https://djangoproject.com/.
15. ArangoDB - A production-ready, highly available, multi-model NoSQL graph-oriented database. (2018). https://www.arangodb.com/.
16. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324.
17. Breskvar, M., Kocev, D., Levatić, J., Osojnik, A., Petković, M., Simidjievski, N., Ženko, B., Boumghar, R., & Lucas, L. (2017). Predicting thermal power consumption of the Mars Express satellite with machine learning. In 2017 6th International conference on space mission challenges for information technology (SMC-IT) (pp. 88–93). https://doi.org/10.1109/SMC-IT.2017.22.
18. Silva, J., & Donati, A. (2016). WebMUST evolution. In 14th International conference on space operations (p. 2433).
19. Faerber, N., Ojala, K., da Silva, J., Scholz, A., Brach, L., Demirsan, A., & Boumghar, R. (2018). A grammar-based timeline for increasing fleet situational awareness. In 15th International conference on space operations, SpaceOps.
20. Trimble, J. P., & Rinker, G. (2016). Open source next generation visualization software for interplanetary missions.

Space Education and Awareness in South Africa—Programmes, Initiatives, Achievements, Challenges and Issues S. G. Magagula and J. Witten

Abstract South Africa's involvement in space science started at the dawn of the "Space Age". Before this, South Africa had been involved in astronomy since 1820, when the first permanent astronomical observatory (and scientific institution) in the southern hemisphere was completed at the Cape of Good Hope. Despite this rich history, South Africa has unfortunately, for various reasons, not been able to fully exploit the benefits of space technology and its applications to meet the challenges it faces. One reason in particular is a lack of awareness and understanding among planners, decision-makers, and users about the potential benefits of space technology in the planning and implementation of socio-economic development plans. In recent years, South Africa has made the development and cultivation of a domestic space industry a priority, citing the critical roles that space science, technology, and innovation play in economic growth and socio-economic development. However, to ensure a long-term viable space programme and a growing space industry, it is imperative that South Africa builds the necessary capacity to support such an industry; hence the need to create awareness and an appreciation for STEM-based careers at all levels of society, which, for the school-going population, is likely to translate into an increased uptake and appreciation of science, technology, engineering and mathematics (STEM). In South Africa, the Department of Science and Technology (DST) plays a leading role in the implementation of space science and technology activities. Current space-related projects include the Satellite Build Programme, Operation Phakisa, and the CubeSats Development Programme. There are also a number of local programmes and initiatives to promote space education and awareness.
These are initiated, supported and implemented by various organizations, from the DST and its agencies through to the private sector and industry, as well as non-governmental and non-profit organizations. With regard to reaching out to schools, the DST conducts Space Weeks and other science festivals, as well as numerous initiatives by the local Radio Amateurs Clubs working with schools on high-altitude ballooning projects. Nonetheless, it has been noted that although there are much effort and resources expended on STEM awareness, there is notable lack of its intended impact. The role of educators/teaching staff will therefore be S. G. Magagula (B) · J. Witten South African National Space Agency, Pretoria, Gauteng 0127, South Africa e-mail: [email protected] © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_32

811

812

S. G. Magagula and J. Witten

addressed focusing on their contribution to the goals of such STEM programmes. This paper discusses the various space-based STEM programmes that have been put in place to promote space education awareness and the contribution made by various role players. Furthermore, this paper will showcase a concept for a new approach to space awareness activities and its potential to drive STEM awareness programmes in the country. Lastly, this paper aims to recognize the achievements of South Africa in promoting space education and awareness in the country and the related issues and impediments to pursue these programmes. The paper also emphasizes the importance of introducing space education programmes in schools not only for the children but for the teachers in particular.

1 Introduction
South Africa's space history reaches back to 1820, when the first permanent astronomical observatory (and scientific institution) in the southern hemisphere was completed at the Cape of Good Hope. South Africa's space exploration history, however, started with the dawn of the Space Age in 1958, when the National Telecommunications Research Laboratory (TRL) of the Council for Scientific and Industrial Research (CSIR) agreed to operate and maintain a Minitrack station at Esselen Park in South Africa on behalf of the American Naval Research Laboratory (NRL). In 1960, operations were transferred from Esselen Park to Hartebeesthoek, and the Johannesburg (Joburg) Satellite Tracking and Data Acquisition Network station (JOBURG STADAN) was born. This became one of the busiest network stations in the Goddard Space Flight Center satellite Tracking, Telemetry and Command (TT&C) network. It was eventually equipped with three receiving links at 136 MHz, and later S-band, and two powerful VHF transmitting systems. This facility is now known to the international community as HBK, operated by the South African National Space Agency (SANSA) Space Operations Directorate. From the 1950s to the 1970s, satellites were tracked from this station to determine the effects of the upper atmosphere on their orbits.

Alongside other programmes, South Africa initiated its first space programme in the 1980s. The objective of this programme was to develop an Earth observation satellite and a launcher, and all the necessary facilities to support these activities. Satellite integration and testing facilities were constructed at Grabouw in the Western Cape, about 70 km east of Cape Town. The launch facility was established at Arniston, on the Cape south coast, about 180 km east of Cape Town; this allowed for east to polar launches. Considerable capabilities were developed in South African industry to support this programme. The programme was discontinued in 1994, before any satellites were launched, and the facilities were decommissioned or repurposed. In recent years, South Africa has made the development and cultivation of a domestic space industry a priority, citing the critical roles that science, technology and innovation play in economic growth and socio-economic development.


In 2008, the Department of Science and Technology (DST) developed a “Ten-Year Innovation Plan” (see http://www.sansa.org.za/publications) whose purpose was to help drive South Africa's transformation towards a knowledge-based economy. In this plan, five grand challenges were identified, one of which was space science and technology. In the same year, 2008, the DST set out to establish a national space agency. This was realized with the approval of the National Space Agency Bill, which paved the way for the establishment of the South African National Space Agency, launched in December 2010. The DST plays a leading role in the implementation of space science and technology activities. The department's investment in space is organized around two main initiatives: the South African National Space Agency and the Square Kilometre Array (SKA) programme, which includes the Karoo Array Telescope (MeerKAT).

2 Space Awareness
It is commonly understood that space science and technology continue to contribute to sustainable development and offer many benefits to humankind. For South Africa's space programme to be meaningful to the general public, public awareness of the benefits of space technology and its manifold application products and services will have to be created. No technology platform is embraced without a wide understanding of the platform, and awareness and advocacy programmes will therefore be vital to the development of South Africa's space programme. For the South African public to appreciate space science, a general appreciation of the science, technology, engineering and mathematics (STEM) fields is necessary. In order to fulfil this mandate, the South African Agency for Science and Technology Advancement (SAASTA) was formed. SAASTA is a business unit of the National Research Foundation (NRF), with a mandate to advance public awareness, appreciation and engagement of science, technology, engineering, mathematics and innovation (STEMI) in South Africa. SAASTA contributes to the DST's Ten-Year Innovation Plan (2008) and Youth into Science Strategy (2006) [ref.] and thus plays a key role in contributing to the National System of Innovation (NSI). Space awareness in South Africa is pursued along two thrusts: firstly, to increase the interest in and appreciation of science among the youth; secondly, to create public awareness of the benefits that space creates in addressing day-to-day societal needs.


2.1 Programmes
South Africa has well-established science engagement platforms. Currently, SANSA and other space entities leverage these existing platforms to engage with the public at different levels and create awareness about space science and technology. SANSA has two directorates that existed before the establishment of the agency: the CSIR Satellite Applications Centre (CSIR-SAC), now SANSA Space Operations, and the Hermanus Magnetic Observatory, now SANSA Space Science. These entities had been participating in the activities listed below, creating space awareness at various levels of the community.

(1) World Space Week
World Space Week (WSW), held from 4 to 10 October, was declared in 1999 by the United Nations General Assembly to celebrate each year, at the international level, the contribution of space science and technology to the betterment of society. SAASTA's involvement in space science education, outreach and awareness dates back to 2003, when World Space Week was implemented in South Africa for the first time. WSW is viewed as a public awareness project aimed at the following:
1. Profiling South African institutions and their achievements (milestones) in the peaceful uses of outer space and space technologies;
2. Capacity building, training and education (including careers);
3. Applications and benefits derived from the peaceful use of outer space and technology; and
4. Popularizing space science to the broader South African society.
Since SANSA started operating in 2011, the agency has taken the lead role in the creation of space science awareness in the country. SANSA has continued to work with SAASTA in order to establish synergies between what SAASTA has achieved and SANSA's mandate.

(2) SANSA Space Science Holiday Programme (Winter and Summer Schools)
SANSA Space Science hosts a holiday programme for children aged 6–12 years during December.

(3) SANSA Space Science Joint Space Weather Camp
The Joint Space Weather Camp is a collaborative programme between SANSA, the University of Alabama in Huntsville (UAH) and the German Aerospace Centre (DLR). It is an annual camp that takes place in two countries every year, attracting a limited number of tertiary-level students from each participating country.

(4) BACAR—Balloon Carrying Amateur Radio payload
The BACAR project is a STEM educational initiative run by the Secunda Amateur Radio Club. The Secunda amateurs work with school learners in and around


Secunda to design and build functioning models of satellites and launch them on a high-altitude balloon into a space-like environment. In this project, learners are introduced to basic electronics, digital electronics, microcontroller programming, radio communications and more. As they gain practical experience, there is the additional benefit that the learning contributes to their school curriculum. The programme is designed to stimulate an interest in STEM. In the BACAR project, SANSA partners with the Secunda Amateur Radio Club by supplementing the tracking function with SANSA's mobile VHF/UHF antenna. SANSA also exposes the learners to the ground station in order to create awareness about the entire space system.

2.2 Initiatives
There are a number of annual events that have been initiated by the Department of Science and Technology; however, none of these is focused purely on space. SANSA is still to develop a comprehensive science awareness programme/project. The SANSA Science Advancement team is, however, committed to participating in the events listed below to increase the exposure of space science and technology.

(1) National Science Week
National Science Week (NSW) is an initiative of the Department of Science and Technology aimed at celebrating science countrywide, and it involves various stakeholders and role players conducting science-related activities during this week. NSW is normally celebrated in the first week of the third school term. Each year, a different theme is chosen and published. The various stakeholders then prepare and offer activities around the theme to the target audiences.

(2) Science Festivals
Science festivals are mass-participation public events comprising science, engineering and technology (SET) activities that celebrate science in a festive, fun-filled and exciting way. The festivals funded by the Department of Science and Technology are (for more information refer to http://www.saasta.ac.za/science-festivals):
1. Eding International Science Festival;
2. Mpumalanga Festival;
3. Science Tube;
4. Science Unlimited;
5. Scifest Africa;
6. ScopeX;
7. Rural Education Festival;
8. ZuluFest;
9. STEM Community Day.


(3) Science Competitions
1. AstroQuiz;
2. FameLab;
3. National Schools Debates;
4. National Science Olympiad;
5. Natural Science Olympiad;
6. SA Science Lens;
7. Young Science Communicators.

3 Space Education
Space education in South Africa has largely been offered at the tertiary level as well as in industry. Over the last 10 years, there have been a number of indigenous projects focusing on satellite engineering, which are explained below.

3.1 Upstream Programmes
Upstream programmes are educational activities typically focused on the engineering disciplines involved in space-based and terrestrial systems and subsystems.

(1) Stellenbosch University SUNSAT Project
Stellenbosch University was the first institution of higher learning to reintroduce space education after the initial South African space programme was stopped. The university started the SUNSAT project in its Electronic and Electrical Engineering Department in 1992. The aims of the project were to train engineers for a future South African space industry, challenge graduate students, inspire schoolchildren in science and foster international cooperation [1]. SUNSAT became Africa's first indigenous (locally built) orbiting satellite, and the project went on to produce more than 100 master's and Ph.D. degrees combined [1].

(2) Institute for Satellite and Software Applications (ISSA)
ISSA was an initiative started in 1998, funded by the then Department of Communications. It offered postgraduate qualifications in satellite engineering, software engineering and information technology. This programme ended around 2005 and produced more than 500 postgraduates.

(3) Sumbandila Satellite Project
SumbandilaSat was the second South African satellite built in-country. The technology demonstrator was built by SUNSPACE (in 15 months), managed by the


University of Stellenbosch and launched from Baikonur, Kazakhstan, in September 2009. It weighs approximately 83 kg, orbits at around 505 km and images in six spectral bands with a ground sample distance (GSD) of 6.25 m. The success of the project lies in the educational development of the team: 20 master's and 2 Ph.D. students, with an additional 8 interns in satellite engineering, were trained.

(4) CPUT/French South African Institute of Technology (F’SATI)
The Cape Peninsula University of Technology (CPUT) hosts the region's premier nano-satellite programme, which has developed Africa's first nano-satellite, ZACUBE-1, and is currently developing what will be the most advanced South African CubeSat to date, ZACUBE-2. The programme is hosted by the French South African Institute of Technology and the African Space Innovation Centre. It is strategically aligned with the National Space Strategy and is funded as a key human resource development programme by the Department of Science and Technology and the National Research Foundation. To date, the satellite programme has graduated more than 60 master's students and has developed a suite of communications products that are being marketed internationally through Clyde Space.

3.2 Downstream Programmes
Downstream programmes are educational activities focused on satellite data, the processing thereof and the resulting applications.

(1) The Fundisa Disk
“In 2006, the department of education introduced geographic information systems (GIS) as part of the Grade 10 geography syllabus for the first time. It has meanwhile been extended to Grades 11 and 12” [2]. In response to the introduction of GIS in the school curriculum, SANSA put together an intervention to introduce geography educators to remote sensing and GIS. The disc contains Quantum GIS (QGIS), a GIS software package, and datasets from SumbandilaSat, SPOT and Landsat, which SANSA distributes free of charge to students. From 2009 to 2013, hard drives called “Fundisa Disks” were sent to schools and universities with an assortment of data. The Fundisa programme is funded by the DST. What started out as the Fundisa Disk targeting universities, from July 2008, as part of addressing the four pillars of data democracy for developing countries, was later adapted to become the “Fundisa Resources”, comprising the Fundisa Disk for universities, the Fundisa Disk School Edition and the Fundisa Portal. For more information, see: http://fundisa.sansa.org.za.


3.3 Recent Initiatives
(1) Plans to develop a SANSA Science Engagement Strategy
The Science Engagement Implementation Plan (https://www.saasta.ac.za/saasta_wp/wp-content/uploads/2018/03/Science-Engagement-Strategy-Implementation-Plan-Approved.pdf) states that “individual DST entities and specialized service delivery units should embark on targeted science engagement initiatives meant to raise awareness among the intended beneficiaries of their service offerings”. To fulfil this mandate, two of SANSA's managing directors have been tasked with guiding the development of a SANSA science engagement strategy. This strategy should address space awareness and education and will be aligned with the DST's Science Engagement Strategy.

(2) SANSA STEMI Outreach Project
SANSA does not have a comprehensive national STEMI programme reaching all nine provinces of the republic and encompassing the complete space value chain, with regard to upstream and downstream activities. Existing STEMI activities and programmes are measured in terms of the size of the audience present at an event. A better way to measure impact needs to be developed, one that is better aligned with the outcome of the programme. There are approximately 5800 secondary schools and 3900 combined (primary and secondary) schools in the country [ref.]. If we were able to reach only 100 schools around the country per year, it would take close to 100 years to reach them all. The vision of the programme is to create a human capital pipeline that is competent in STEMI-based skills. The mission is to enhance STEMI awareness and education by using space as a theme and a representative technology platform that engages school-going learners between 14 and 16 years old, and then to track their progress through to graduation from a tertiary institution.

1. Fundamental requirements for the programme
In order to create a sustainable outreach programme, the following requirements need to be fulfilled:
• Programme implementation through partnerships (non-profit and private entities).
• The programme needs to be scalable.
• Learning needs to be inspired by technology hardware that is robust in design and easy to use.
• A funding model that will ensure long-term/perpetual implementation.
• A measurement system that measures the development per individual over time, not merely the number of individuals at any given point in time.


(3) Programme Realization
An assessment of the service industry was done with a view to creating viable partnerships with possible implementers. The implementers had to be South African non-profit organizations, have access to technology platforms and hardware that could be used safely in classrooms, and have a track record of success with skills development projects. A partnership was formalized through a Memorandum of Understanding, signed in October 2017, with a local non-profit organization called the Meta Economic Development Organization (MEDO) Space. This agreement set out the roles and responsibilities of each party in implementing a nationwide STEMI outreach programme. MEDO has a track record of successful STEM programmes for private and public institutions, both in South Africa and in the USA. Their strength lies in their innovative technology developed in-house, called X-in-a-Box (for more information see https://xinabox.cc). X-in-a-Box was developed with the aim of exposing and enabling the youth (from school-going learners to university students) to build various configurations of miniaturized satellites (flatsat, cookiesat, etc.). Through this activity, participants learn basic principles of coding, data collection and management, and data analysis and interpretation, progressing to developing functional platforms that perform actual scientific missions. X-in-a-Box is essentially a range of chips/miniature PC boards which provide the basic functions of satellite sub-systems. It leverages a standardized modular philosophy and a standard connectivity solution, free of wiring or soldering, which allows for simple integration of various payloads. An extensive range of sensors/payloads that can be incorporated into the satellite design has also been developed. These chips are placed in standardized satellite kits for the learners to build.
The modules are programmed using a range of languages and interface with the Raspberry Pi (as well as the Zero), DragonBoard, MinnowBoard, BeagleBoard and even the micro:bit.

(4) The Pilot Project
Due to MEDO's track record and experience in STEM(I) programmes, it was decided to initially implement SANSA's STEMI programme on a small scale as a proof of concept, and to grow the programme over a 3-year period in terms of the number of schools and national coverage. The programme commenced in 2017 in the Western Cape with ten schools, supported by corporate sponsorship.

(5) SANSA STEMI Outreach Programme for Grade 7–9 learners: Phase 1A and 1B
The programme consists of at least two phases. A learning management system (LMS) consisting of numerous tasks of varying complexity is used to assess the learners' aptitude and competence in using and applying the technology to their physical environment. In Phase 1, the learners build WeatherSats to study the weather in the troposphere. Each school will receive kits, a radio ground station and balloon launch packages. Some of the assembled kits allocated to each school will be launched with the balloons, and the data will be collected using a ground station.
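The modular, solder-free kit philosophy described above can be illustrated with a short sketch. The class and module names below are illustrative stand-ins, not the actual X-in-a-Box driver API:

```python
# Hypothetical sketch of the modular kit idea: independent sensor
# "modules" plug into a flatsat object, which polls whatever is
# attached. Names are illustrative, not the real X-in-a-Box API.

class Module:
    """A plug-in sensor module: a name plus a zero-argument read function."""
    def __init__(self, name, read_fn):
        self.name = name
        self.read = read_fn

class FlatSat:
    """Aggregates attached modules and samples them all in one pass."""
    def __init__(self):
        self.modules = []

    def attach(self, module):
        # No wiring or soldering: integration is just registration.
        self.modules.append(module)

    def sample(self):
        return {m.name: m.read() for m in self.modules}

sat = FlatSat()
sat.attach(Module("temperature_C", lambda: 21.5))   # stubbed classroom readings
sat.attach(Module("pressure_hPa", lambda: 1013.2))
print(sat.sample())  # {'temperature_C': 21.5, 'pressure_hPa': 1013.2}
```

Moving from a classroom simulation to real hardware would mean swapping each stubbed lambda for a driver call, which mirrors the kit's plug-and-play integration.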


Data collected from prior balloon launches will be used to demonstrate the output and results of each module. Each component of the kit has a purpose (satellite subsystem functions), and learners (and teachers) receive the appropriate training.

i. Phase 1A activities
• Connecting sensors to build a WeatherSat;
• Programming of all the sensors;
• Ground-level data collection and analysis while moving the WeatherSat around indoors, outdoors and away from the school.

ii. Phase 1A outputs
• Trained teachers to guide and help learners in the build process and to use the kits as teaching aids;
• Learners with programming knowledge and basic skills;
• Learners equipped with an understanding of a system and the data value chain;
• Learners with a basic understanding of data analysis skills.

iii. Phase 1B activities
• Route (trajectory) planning and launching the WeatherSats on balloons;
• Collecting flight data and analysing the results at altitudes up to 5 km.

iv. Phase 1B outputs
• Introduction to situational awareness skills;
• Observing the data value chain (system) in action;
• Analysis of real-time operational data;
• Presentations of data (temperature vs. altitude vs. time, etc.);
• Training aids (and data) for the school to use in the curriculum.
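The temperature-versus-altitude analysis in Phase 1B can be sketched with a short example. The flight values below are illustrative, not data from an actual launch:

```python
# Illustrative Phase 1B analysis: estimate the lapse rate (temperature
# change per kilometre of altitude) from balloon flight samples using a
# least-squares fit. The flight values are made up for the example and
# roughly follow the standard atmosphere's -6.5 C/km gradient.

def lapse_rate(samples):
    """Least-squares slope of temperature (deg C) vs altitude (km)."""
    n = len(samples)
    mean_alt = sum(a for a, _ in samples) / n
    mean_tmp = sum(t for _, t in samples) / n
    num = sum((a - mean_alt) * (t - mean_tmp) for a, t in samples)
    den = sum((a - mean_alt) ** 2 for a, _ in samples)
    return num / den

# (altitude_km, temperature_C) pairs up to the 5 km Phase 1B ceiling
flight = [(0.0, 15.0), (1.0, 8.5), (2.0, 2.0),
          (3.0, -4.5), (4.0, -11.0), (5.0, -17.5)]
print(round(lapse_rate(flight), 2))  # -6.5
```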

The two levels of activity (1A and 1B) above have been designed to link indoor and desktop activities with eventual satellite launches. The LMS has a secondary purpose of tracking the performance of individual learners on the programme. Although not all the participants will make it through to Phase 2, qualifying teams/individuals as well as the runners-up will be added to the “pipeline” database that will track their performance through secondary school to university.

(6) SANSA STEMI Outreach Programme for Grade 7–9 learners: Phase 2 and 3 Description
The top teams will enter Phases 2 and 3, where they will have a wider selection of sensors to be used for their own self-defined weather-based missions (Phase 2) and actual satellite missions in low Earth orbit (Phase 3). In Phase 2, the teams' satellites are launched on high-altitude weather balloons, up to 35 km. The data collected will be analysed to understand the troposphere, the ozone layer and the stratosphere. Depending on the availability of funding and launch opportunities for Phase 3, the learning from Phases 1 and 2 will be incorporated into the design of an experimental mission. The satellite will be launched into (extreme) low Earth orbit between 150


and 300 km. At such altitudes, the satellite will operate for a short period (5–30 days), collecting data in the thermosphere as required by the mission.

i. Phase 2 and 3 activities
• Project planning;
• Mission planning;
• Connecting sensors to build more sophisticated configurations;
• Programming of all the sensors;
• Data collection and analysis;
• Failure/fault investigation (if any);
• Reporting and presentation of data.

ii. Phase 2 and 3 outputs
• Basic project management skills;
• Team-working skills;
• Learners with more extensive programming knowledge and skills;
• Learners equipped with deeper data analysis skills;
• Learners equipped with deeper system analysis skills;
• Learners equipped with enhanced communication skills.

4 Limitations
This paper may not have fully captured some of the space awareness and education efforts, either because they were published under the umbrella of science awareness and education or because of a lack of documentation of the efforts. Some of the people who were instrumental in these efforts may not have been reached, as some have since left the space or science industry and others were unavailable for various reasons.

5 Challenges and Issues
According to The Economist [3], “Many of South Africa's education problems have their roots seated in the era of the Apartheid regime. The Bantu Education Act of 1953 set out to ensure that white citizens received a better education than black people, who were, according to Hendrik Verwoerd, the future prime minister then in charge of education, to be educated only enough to be “hewers of wood and drawers of water””. After South Africa's first democratic election in 1994 and under Nelson Mandela's presidency, the government expanded access to schooling. It also replaced a school system segregated by race with one divided by wealth.


The Economist [3] cites Nic Spaull of Stellenbosch University as stating that “South Africa has the most unequal school system in the world. The gap in test scores between the top 20% of schools and the rest is wider than in almost every other country. Of 200 black pupils who start school, just one can expect to do well enough to study engineering. Ten white kids can expect the same result”. In order to address these challenges, Kahn [4] states that “the Department of Education in 1999 then set out to develop a Strategy for improving school science and mathematics”. There were, however, challenges with regard to science and mathematics educators. Kahn [4] cites the DoE (2001: 6) as stating that “Although 85% of mathematics educators were professionally qualified as educators, only 50% had specialized in mathematics in their training. Similarly, while 84% of science educators were professionally qualified, only 42% were qualified in science. An estimated 8000 mathematics and 8200 science educators needed to be targeted for in-service training to address the lack of subject knowledge”. Mathematics and science are generally perceived to be difficult subjects to learn. Having teachers who are not trained to teach these subjects makes an already dire situation even worse. This leads to a decline in the uptake of STEM-based subjects year on year.

Subject             Number wrote 2016   Number passed 2016   Number wrote 2017   Number passed 2017   Change (passed at 30% and above)
Mathematics         265,810             135,958              245,103             127,197              −8761
Physical science    192,618             119,427              179,561             116,862              −2565

Guaranteed engineering entry   Number above requirements 2016   % of total 2016   Number above requirements 2017   % of total 2017   Change
Mathematics (80%)              7974                             3.0               6726                             2.7               −1248
Physical science (70%)         17,143                           8.9               16,531                           9.2               −612

Guaranteed computer science entry   Number above requirements 2016   % of total 2016   Number above requirements 2017   % of total 2017   Change
Mathematics (70%)                   18,075                           6.8               16,565                           6.8               −1510

Source: 2017 School subject report; 2016 Diagnostic report; UCT. Numbers are approximate, based on supplied data.


As there are many challenges and issues that can be discussed, for the purposes of this paper, the following are the most relevant.

5.1 Promotion Criteria in the Schooling System
In February 2018, the South African Minister of Basic Education published a proposal (National Gazette No. 41456, Vol. 632, 23 February 2018) to drop the promotion requirement or “pass mark” for general education and training to the following:
• 40% in four subjects, one of which is a home language;
• Any three subjects at 30%;
• A condonation of 2% in one subject if it will lead to a pass.
This is a further drop from the current promotion requirements, which were already low. The current requirements are as follows:
• 50% or more in one language at home language level;
• 40% or more in the second required official language at first additional language level;
• 40% or more in mathematics;
• 40% or more in any three of the other required subjects, including natural sciences, life orientation, social sciences, arts and culture, and economic management sciences.
These constant drops in the promotion requirements have a knock-on effect on the education value chain, in that schools would churn out matriculants who do not fulfil the entry requirements for university and are underprepared for other types of tertiary education on offer. Many of these learners may not even qualify for entrance to university/tertiary education, which further reduces the uptake of STEM-based careers; this conflicts with the many efforts to drive South Africa's transformation towards a knowledge-based economy.
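To make the proposed rule concrete, here is a hedged sketch that encodes it as a check over a learner's marks. The subject names, marks and condonation search below are illustrative assumptions, not an official Department of Basic Education algorithm:

```python
# Hypothetical encoding of the proposed promotion ("pass mark") rule:
# 40% in four subjects, one of which is a home language; any three
# further subjects at 30%; a 2% condonation in one subject if it
# leads to a pass. All names and marks are illustrative.

def _passes(marks, home_language):
    at40 = [s for s, m in marks.items() if m >= 40]
    if home_language not in at40 or len(at40) < 4:
        return False
    # Four subjects at 40%: the home language plus any three others.
    chosen = [home_language] + [s for s in at40 if s != home_language][:3]
    remaining = [m for s, m in marks.items() if s not in chosen]
    # Any three of the remaining subjects at 30%.
    return sum(1 for m in remaining if m >= 30) >= 3

def meets_promotion_rule(marks, home_language):
    if _passes(marks, home_language):
        return True
    # Try the 2% condonation on each subject in turn.
    for subject in marks:
        adjusted = {**marks, subject: marks[subject] + 2}
        if _passes(adjusted, home_language):
            return True
    return False

marks = {"isiZulu": 45, "English": 41, "Mathematics": 38,
         "Natural Sciences": 42, "Social Sciences": 33,
         "Life Orientation": 30, "EMS": 31}
print(meets_promotion_rule(marks, "isiZulu"))  # True (Mathematics condoned to 40)
```

Even this toy version shows how permissive the proposed rule is: a learner failing mathematics at 38% is still promoted once the condonation applies.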

5.2 The Artificial Split Between Astronomy and Space Science and Technology
According to Smith [5], “the term “space science” encompasses a much wider arena of topics than simply astronomy or satellite-based research. The popular conception of the topic includes rockets and the technologies needed for rocket and shuttle launches, their control and tracking, and also the technologies for new instruments”. In South Africa, however, space science and astronomy are seen as completely different fields. What came to be known as the “Space Age” only started in 1957 with the launch of Sputnik. Although South Africa has been involved in space science


since the dawn of the “Space Age”, the split was magnified by the maturity of the Square Kilometre Array (SKA) bid and the infancy of SANSA at that time. This has left the space science community, including SANSA, unable to investigate and leverage the platforms that have been created by the astronomy community.

5.3 Lack of a Coordinated Approach to Science and Space Science Engagement
As stated in Sect. 2.2 above, although there are numerous regular events planned annually on the science engagement calendar, entities like SANSA that represent space are mere participants at this stage. A coordinated approach would enable space to be the main theme of an event, yet enjoy the same support and exposure from public sponsors if placed on the calendar together with other events.

5.4 Communication of Space Science and Technology
There are a number of aspects of science communication that pose challenges to successfully enhancing and promoting education and awareness of space science and technology in South Africa. The source of the message and its vehicle are the traditional stumbling blocks. The science and innovations themselves are not well articulated by the actual scientists and engineers, and are not communicated to a broad enough (national) audience. This problem is further exacerbated when developments are reported on by journalists with no background in science or science communication.

5.5 Insufficient Funding Allocated to Space Science and Technology Awareness and Education Programmes
Due to budgetary constraints, new programmes to advance awareness and education of space are not easily funded. This has led SANSA to adopt a different approach with the SANSA STEMI outreach programme with MEDO, which leverages corporate sponsorship.

Space Education and Awareness in South Africa—Programmes …


6 Conclusions and Recommendations

6.1 Space Awareness Programmes and Initiatives

There is clearly a mature foundation of initiatives and programmes focused on STEM engagement with the public of all ages. SANSA’s participation in these events is critical to space science and technology awareness and to stimulating curiosity and enthusiasm in learners and the broader community.

6.2 Space Education Programmes and Initiatives

The SANSA STEMI outreach programme is the first SANSA programme that aims to showcase the complete value chain of space to a nationwide audience and to equip learners with the basic understanding and skills needed to enter the knowledge-based economy. Better coordination with the DST and other public institutions is required to grow the footprint of this outreach programme more rapidly. Current programmes are based at Centres of Excellence seated at tertiary institutions, whose limited budgets constrain their educational development programmes. Increased funding is required to ramp up the facilities and staffing at these Centres of Excellence, which would enable the programmes to be expanded to incorporate a larger number of local, regional and international development programmes. Greater inter-ministerial collaboration is required to eliminate duplication, strengthen funding commitments and support, and effectively streamline the implementation and rollout of programmes.

6.3 Issues and Challenges

There are numerous challenges facing South Africa as a developing nation, which impact our strategic objectives and priorities as a country. Science awareness and education have been made a priority, and progress in this area has been steady but slow. The efforts made by public and private entities in the science, technology, engineering and innovation industries need to be complemented by an absorptive capacity in these industries to attract the skills being developed.


References

1. Steyn, W. H. (2017). A historical overview of the University of Stellenbosch’s satellite projects over the past 25 years. Cape Town: s.n.
2. Scheepers, D. (2009). GIS in the geography curriculum. In Position IT (pp. 40–45).
3. The Economist. (2017). [Online]. Available at: https://www.economist.com/news/middle-east-and-africa/21713858-why-it-bottom-class-south-africa-has-one-worlds-worst-education. Accessed March 15, 2018.
4. Kahn, M. P. (n.d.). Science, technology, engineering and mathematics (STEM) in South Africa. Cape Town, South Africa: Australian Council of Learned Academies.
5. Smith, H. A. (2000). Challenges in affecting US attitudes towards space science. Cambridge: Harvard-Smithsonian Center for Astrophysics.
6. Anon. (2014). DST national space science and technology programme strategy. S.l.: s.n.
7. Anon. (2015). Education statistics in South Africa 2013. S.l.: Department of Basic Education.
8. Joubert, M. (2001). Priorities and challenges for science communication in South Africa. Pretoria: Sage Publications.
9. McCarthy, J., & Oliphant, R. (2013). Mathematics outcomes in South African schools: What are the facts? What should be done? Johannesburg: The Centre for Development and Enterprise.
10. Reddy, V., Gastrow, M., Juan, A., & Roberts, B. (2013). Public attitudes to science in South Africa. South African Journal of Science, 109(1/2), 1–8.

Educational Outreach and International Collaboration Through ARISS: Amateur Radio on the International Space Station Frank H. Bauer, David Taylor, Rosalie A. White and Oliver Amend

Abstract The Amateur Radio on the International Space Station (ARISS) payload was first deployed and operated on the International Space Station (ISS) about two weeks after the first ISS expedition crew arrived, and it has been continuously operational since that time. This makes ARISS the first operational payload and the first educational outreach program on the ISS (Bauer et al. in Proceedings from the World Space Congress) [1]; (Conley et al. in Proceedings from the World Space Congress) [2]. ARISS provides a unique, once-in-a-lifetime educational opportunity for youth to conduct a ten-minute question-and-answer interview directly with crew members on board ISS. This is accomplished using the ARISS amateur radio systems on ISS, through the support of ISS crew members who have obtained their amateur radio licenses, and through hundreds of ARISS international volunteers around the world. These volunteers mentor the schools, help set up ham radio equipment in the schools, and then prepare the students to conduct the contact with the ISS crew. ARISS, an international working group consisting almost entirely of dedicated volunteers, partners with the National Aeronautics and Space Administration (NASA), the Center for the Advancement of Science in Space (CASIS), and the other ISS international space agencies to engage schools and students in educational opportunities that enable them to explore space and learn about wireless technology.

F. H. Bauer (B) ARISS, Silver Spring, MD 20905, USA e-mail: [email protected] D. Taylor ARISS, Columbia, MD 21046, USA R. A. White ARISS, Bloomington, IN 47403, USA O. Amend ARISS, Bremen, Germany © Springer Nature Switzerland AG 2019 H. Pasquier et al. (eds.), Space Operations: Inspiring Humankind’s Future, https://doi.org/10.1007/978-3-030-11536-4_33



F. H. Bauer et al.

Acronyms/Abbreviations

ARRL  American Radio Relay League
AMSAT  Radio Amateur Satellite Corporation
APRS  Automatic Position Reporting System™
ARISS  Amateur Radio on the International Space Station
AZ  Arizona
BATC  British Amateur Television Club
BPSK  Binary Phase Shift Keying
CA  California
CD  Compact Disk
CDM  Children’s Discovery Museum
CASIS  Center for the Advancement of Science in Space
CIRC  Central Illinois Radio Club
CLC  Challenger Learning Center
CSA  Canadian Space Agency
DATV  Digital Amateur Television
DVB-S, DVB-S2  Digital Video Broadcasting via Satellite
ESA  European Space Agency
EVA  Extra Vehicular Activity
FGB  Functional Cargo Block
FL  Florida
FM  Frequency Modulation
GSNEO  Girl Scouts of North East Ohio
HamTV  Amateur Radio Television
IN  Indiana
ISS  International Space Station
JSL  Joint Station LAN
JAXA  Japan Aerospace Exploration Agency
KSC  Kennedy Space Center
LAN  Local Area Network
MA  Massachusetts
MHz  MegaHertz
MO  Missouri
NASA  National Aeronautics and Space Administration
OH  Ohio
ROC  Republic of China
Roscosmos  Russian Space Agency
SAREX  Shuttle Amateur Radio Experiment
SCaN  Space Communication and Navigation
SSTV  Slow Scan Television
STEAM  Science, Technology, Engineering, Arts, and Mathematics
STEM  Science, Technology, Engineering, and Mathematics
STS  Space Transportation System
TDRSS  Tracking Data Relay Satellite System
TX  Texas
UBA  Union Royale Belge des Amateurs-Emetteurs
UHF  Ultra High Frequency
UK  United Kingdom
US  United States
USB  Universal Serial Bus
UTC  Universal Coordinated Time
VDC  Volts Direct Current
VHF  Very High Frequency
VITA  Vitality, Innovation, Technology and Ability

1 Introduction

ARISS inspires, engages, and educates youth in the fields of science, technology, engineering, arts, and mathematics (STEAM) by giving them an opportunity to talk directly with the on-orbit crew via amateur radio. Through the ARISS ham radio connections, the students ask the ISS crew questions about life in space, career opportunities, or other space-related topics. Students can fully engage in the ARISS contact by helping set up an amateur radio ground station at the school and then using that station to talk directly with the onboard crew member for approximately ten minutes, the duration of an ISS overhead pass. Preparation for the experience motivates youth to learn about radio waves, space technology, ISS research, science, geography, and the space environment. In many cases, the students help write press releases, convey ARISS activities through social media, and give presentations on the contact to their fellow students and to the local community. ARISS youth activities span many youth educational domains, including public and charter schools and universities (K-16), scout groups, museums, libraries, after-school programs, and national or international events. Over the years, students worldwide have asked the ISS crew a myriad of questions to better understand what it is like to live, work, and conduct research in space. Using ARISS as their communication muse, students have become surrogate explorers of the universe through their connection with the astronauts and cosmonauts on ISS. To date, ARISS has conducted over 1200 school contacts, with about 70–100 performed per year. Through ARISS, educators engage their students in many hands-on activities that provide numerous STEAM learning experiences. These can include learning about wireless technology and then helping set up the amateur radio (wireless) ground station that will be used to conduct their contact with the ISS crew.
They can learn about orbital mechanics and orbit prediction and then use this information and computer programs to point their school’s antennas during the contact. Students can use the knowledge learned in journalism class to develop press releases and interact with the press. They can learn about ISS science experiments and then conduct science experiments that emulate experiments currently being performed


on ISS. Additionally, they can develop experimental hardware systems that can be flown to ISS, either as an in-cabin experiment that is connected to one of the radio stations inside ISS or integrated into an ARISS satellite that is deployed by the ISS crew. ARISS comprises an all-volunteer international team of amateur radio operators, educators, and supporters whose expertise spans space flight hardware development, space agency flight operation support, education, ham radio ground station installation and operation, and media communication, among others. ARISS designs and develops the amateur radio hardware that is flown to the ISS and is employed by the ISS crew to communicate with the students. ARISS selects host organizations to conduct ARISS contacts through its contact proposal process. The ARISS team prepares host organizations for their contact. Preparations include ensuring the school conducts the educational initiatives they submitted as part of their ARISS proposal. ARISS technical and educational mentor volunteers guide the school in the installation and operation of the ham radio ground station to be used on contact day. They also support the development of student experiments and initiatives to further students’ interest in amateur radio and expand their STEAM knowledge. In addition to the voice contacts with the onboard crew, many other capabilities exist and are being employed by the ARISS team to inspire, engage, and educate students in STEAM subjects. The ARISS international team often transmits images from ISS, also called slow-scan television or SSTV. These pictures can be received directly from the ISS as it flies overhead. Pictures received can be uploaded to an ARISS Web site for all to see. ARISS also supports ham radio digital communications using a system similar to Twitter or text messaging.
This digital packet radio system has been used by schools to send commands to robots hundreds of miles away, emulating how space agencies communicate with their robotic space vehicles on the Moon and Mars. During some ARISS voice contacts, video downlinks are also conducted using the ARISS-developed HamTV system on amateur radio S-band frequencies. ARISS has also developed satellites that have been hand-deployed by the ISS crew. These satellites include SuitSat-1, ARISSat-1, and several CubeSats. Many of these satellites embed student voices for automatic downlink. They also include student-developed projects, artwork, pictures, and spaceflight experiments. Because ARISS is a hands-on, grassroots experience, students are engaged and educated in STEAM fields and are inspired to pursue STEAM-related career choices. Approximately 15,000–100,000 students are touched directly by an ARISS contact each year, and tens of millions of the public witness contacts either directly or through the news media.
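The packet radio capability described above uses APRS over amateur radio. As a rough illustration of the message format a student project might assemble, the sketch below builds an APRS-style text message in the common TNC2 monitor notation; the callsigns, the message text, and the helper name are illustrative assumptions, not part of any ARISS flight software.

```python
def aprs_message(src_call, addressee, text, msg_no=None):
    """Format an APRS text message in TNC2 monitor notation (simplified sketch).

    An APRS message payload carries a 9-character, space-padded addressee
    field between colons, optionally followed by '{' and a message number
    that the addressee can acknowledge.
    """
    addr = addressee.upper().ljust(9)[:9]      # pad/truncate addressee to 9 chars
    payload = ":" + addr + ":" + text
    if msg_no is not None:
        payload += "{" + str(msg_no)           # sender's message number
    # TNC2 monitor form: SOURCE>DEST,PATH:payload ('ARISS' is the ISS digipeater alias)
    return "{}>APRS,ARISS:{}".format(src_call.upper(), payload)

# Hypothetical school station N0CALL messaging W1AW through the ISS digipeater
pkt = aprs_message("N0CALL", "W1AW", "Hello from our school via ISS", msg_no=1)
print(pkt)
```

Real stations would additionally wrap this payload in AX.25 framing before transmission; the sketch stops at the human-readable monitor form.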

2 History of Amateur Radio in Human Spaceflight

Amateur radio has had a significant human presence in space. The first human spaceflight mission with amateur radio on board was the STS-9 Space Shuttle Columbia mission in 1983. At that time, astronaut Owen Garriott, W5LFL, provided an unprecedented level of excitement in the amateur radio community by talking with ham radio operators on the ground using a specially developed VHF FM ham radio transceiver. These modest beginnings 35 years ago led to frequent school group contacts with the astronauts on board 25 space shuttle missions in the mid-1980s and during most of the 1990s as part of the US-led Shuttle Amateur Radio Experiment (SAREX) project. The 1985 STS-61A Spacelab D1 mission kicked off the first international ham radio venture, when a German-led station was flown and operated by two German-licensed and one Dutch-licensed radio amateur. During that space shuttle mission, they operated a voice repeater and, when the crew was not available, employed an automated voice uplink recorder to capture the voices of those who contacted the radio station. Soon, international ham radio operations would become the norm. As human spaceflight moved from Shuttle sorties to more permanent operations in space stations, amateur radio followed. Both the Mir space station and the ISS planned to include an international crew complement, so ham radio developments and operations transformed from single-country efforts (e.g., the US or Germany) into an international operation. For the Russian space station Mir, Sergey Samburov from Russia led a team that included ham radio enthusiasts from Germany and the USA, among them the US SAREX team. SAREX performed numerous school contacts with US astronauts on board Mir in the late 1990s. In 1996, while the ISS was still in its early stages of development, the ARISS team was formed to support international ham radio development and operations on ISS. The ARISS international working group was established to develop and operate the ham radio station systems on ISS as a single coordinated entity. In this way, a single focus and entry point were created for all amateur radio activities on the ISS and into the international space agencies.
With the formulation of ARISS, the foundational development and operation processes honed during the SAREX program, together with the development and operation experiences in Germany and Russia, have evolved into an international development and operation program that leverages all the strengths and resources of an international team. As a result, the international space agencies and ARISS sponsors have been quite pleased with the positive educational impact that ARISS provides to students worldwide. Since its beginnings, ARISS and its predecessors have pioneered several new and exciting communication capabilities on human spaceflight vehicles. These include:
• The first human-tended amateur radio in space (1983).
• The first communications between astronauts and people outside official NASA channels (1983).
• The first pictures uplinked and downlinked to the Space Shuttle (1985).
• The first astronaut–student interviews (1990).
• The first computer-to-computer e-mails from the Shuttle (1990).
• The first television uplink to the Shuttle (1991).
• The first backup communications during a NASA satellite (TDRSS) outage (1992).
• The first spacesuit satellite, SuitSat-1/RadioSkaf, deployed from ISS (2006).
• Safety protocols, developed with NASA and Roscosmos, that enable the deployment of satellites from ISS (2006).


• The first independent education video downlinks using HamTV (2016).
• All ISS expedition crews (expeditions 1–57) have used the ARISS radio systems to conduct thousands of interviews with school students and thousands of contacts with ham radio operators around the world.

3 Organization

ARISS is an international working group consisting of delegations from 12 countries, including several countries in Europe as well as Japan, Russia, Canada, and the USA. The organization is run almost entirely by volunteers from the national amateur radio organizations (e.g., the American Radio Relay League, or ARRL, in the USA) and the international AMSAT (Radio Amateur Satellite Corporation) organizations (e.g., AMSAT-NA in North America). Since ARISS is international in scope, each team coordinates locally with its respective space agency—the Canadian Space Agency (CSA), the European Space Agency (ESA), the Japan Aerospace Exploration Agency (JAXA), the Russian Space Agency (Roscosmos), and NASA. Globally, ARISS International coordinates as a team through working group meetings, teleconferences, and electronic mail. ARISS International consists of five regions (Canada, Europe, Japan, Russia, and the USA), which parallel the five ISS space agencies. The ARISS International Board includes the following officers: Chair, Vice-Chair, and Secretary/Treasurer. Each region has two voting delegates except Europe, which has four delegates because of the many countries and regional space organizations that comprise Europe (see Fig. 1). These delegates are chosen from the national amateur radio societies and AMSAT organizations. The international team donates approximately $5 million per year of in-kind support to the ISS program, primarily through technical and educational volunteer support to the schools, flight hardware development, and contact operation support. In the USA, the primary ARISS sponsors are NASA’s Space Communication and Navigation (SCaN) organization and CASIS. They provide financial support to fund the cost of ARISS operation support in Houston, which covers contact scheduling, crew training, and crew ham radio licensing.
In 2017, the international team formed 97 partnerships with schools (grades K-12 and universities), boy scouts, girl scouts, museums, libraries, US national parks, and camps. A snapshot of some of the larger organizations and venues that collaborated with ARISS in 2017 [3] includes the following:

USA
• World Genesis Foundation, Goodyear, AZ.
• McBride High School and area Middle Schools, Long Beach, CA.
• Blair Pointe Upper Elementary School, Peru, IN.
• NSTA Convention and Council of State Science Supervisors (CSSS), Los Angeles, CA.


ARISS-I Officers: Chair, Frank Bauer; Vice-Chair, Oliver Amend; Secretary/Treasurer, Rosalie White.

ARISS Delegates (with their respective space agencies):
• ARISS-Canada (Canadian Space Agency): C. Latawiec, G. MacDonell
• ARISS-Europe (European Space Agency): O. Amend, E. D’Andria, J. P. Courjaud, B. Husken
• ARISS-Russia (Roscosmos/Energia): S. Samburov, T. Kolmykova
• ARISS-Japan (Japan Aerospace Exploration Agency): S. Endo, M. Tsuji
• ARISS-USA (National Aeronautics and Space Administration): D. Taylor, R. White

Fig. 1 ARISS International (ARISS-I) working group and its relationships with the international space agencies

• SCaN, NASA Glenn Research Center, and Girl Scouts of NE Ohio, Brook Park, OH.
• Tuskegee Youth in Aviation at ISS R&D Conference, Washington DC.
• Youth at Frontiers of Flight Museum, Dallas, TX.
• SCaN TDRS-M Launch Public Outreach Effort, KSC Visitor Center, Titusville, FL.
• Chiddix Junior High School, Normal, IL.
• Meadows Elementary School, Manhattan Beach, CA.
• Burleson High School, Burleson, TX.
• Heart of America Council, Boy Scouts of America “Scouting 500,” Kansas City, MO.
• Red Sox STEM Day, Fenway Park, Boston, MA.
• Fleet Science Center’s BE WISE, San Diego, CA.

Europe
• Paolo Nespoli’s VITA Mission.
• The 14th Elementary School Katerini, Greece.
• Girl and Boy Scouts at VCP-Bundeszeltplatz (a national campground), Großzerlang, Germany.
• Youngsters on the Air (YOTA), Gilwell Park, UK.
• Tallaght Community School, Dublin, Ireland.
• SUMMA Aldapeta, Donostia/San Sebastián, Spain.


• Primary School 21st of May, Podgorica, Montenegro.

Russia
• About Gagarin from Space (multiple school STEM activities).
• Ufa State Aviation Technical University, 85th Anniversary, Ufa.

Canada
• Ecole College Park School in Saskatoon, Saskatchewan.
• Kugluktuk High School, Kugluktuk, Nunavut.

Japan/Asia
• Takaishi City Central Public Hall, Takaishi City, Japan.
• Shirokawa Elementary School, Seiyo, Japan.
• Taipei Municipal Ximen Elementary School, Taipei, Taiwan, R.O.C.

4 Program Objectives

ARISS strives to meet space agency education goals of strengthening the future workforce by attracting and retaining students in STEAM disciplines and by engaging the public in the human spaceflight activities and research occurring in the ISS National Laboratory. The preparation for ARISS contacts exposes students, the public, and the ISS crew members to amateur radio. Young people are then exposed to human spaceflight by direct contact with crew members on board the ISS. Astronauts and cosmonauts benefit from these contacts as they speak with people who are not solely involved with their ISS mission, reducing feelings of isolation during their long stay in space. Opportunities exist for experimentation and for the evaluation of new technology as it relates to ARISS. ARISS also provides a contingency communication network for NASA and the ISS crew. The increase in public awareness of space initiatives, research, and amateur radio benefits the next generation by promoting interest in the fields of science, technology, engineering, and mathematics.

5 Contact Proposal Process

In the USA, proposals to host an ARISS contact are requested twice per year; a similar process is used by ARISS Europe. Each proposal is reviewed by a group of education specialists. The proposals are ranked using a rubric that weighs the strength of the education proposal, the venue size, the impact the activity will have on students and the general community, and the team’s ability to successfully conduct the proposed education program and ARISS contact. Organizations whose proposals are accepted in each cycle will host an ARISS contact 6–12 months later. The ARISS proposal process is a two-step process:


Fig. 2 ARISS proposal process

• Step 1—Completion and approval of the host organization’s education plan.
• Step 2—Completion and approval of the host organization’s equipment plan.

A detailed description of the proposal process, including the latest proposal window dates, proposal forms, and a downloadable US proposal guide, can be obtained by visiting the US contact proposal section of the ARISS Web site: http://www.ariss.org/hosting-an-ariss-contact-in-the-us.html. The ARISS proposal process is outlined in Fig. 2. All information to submit an ARISS proposal during the two windows of opportunity is on the ARISS Web site: www.ariss.org. Once a host’s proposal is submitted, the ARISS team evaluates the education plan. Approved education plans move forward with the assignment of an ARISS technical mentor, orientation session attendance, and submittal of the host’s ham radio equipment plan. Schools are encouraged to start their education plan with the students immediately after the ham radio equipment plan has been approved. After this, schools learn their contact “best week” about 2–3 months prior to their contact. Approximately 4–6 weeks prior to the ARISS event, several contact date/time opportunities are generated by the operation team. This information is shared with the school group to arrive at a prioritization of contact times. These priorities, as well as specific information about the school and the questions to be asked by the students, are forwarded to the ISS mission control team. Student questions and names for the contact, normally about 24 questions and 12 students, are forwarded about 2–3 weeks before the contact. Approximately one week prior to the event, the ISS mission control team provides the rise and set time for the event and the name of the crew member who will participate. All this time, the whole school is engaged in active learning and preparing for the contact, including the setup of the ham radio station.
Prior to and during contact day, the operation mentor and the school team are in constant communication, sharing and confirming orbital data, synchronizing timing, sharing information on contact success, and compiling metrics from the contact. As the contact time approaches, all at the school are ready and excitedly waiting for the crackle of the radio when the ISS crew member calls the school to start the connection. Contacts are available year-round but depend on the work schedule of the crews on the ISS. The operation team attempts to schedule one to two contacts per week. During weeks involving extravehicular activity (EVA) or visiting vehicles, the ISS crew is usually not available for school contacts. The ARISS contact is an inspiring once-in-a-lifetime event that the students and those in the audience will never forget. But that is just the “icing on the cake.” The educational program, proposed by the school and successfully implemented, will serve as a lifelong STEAM foundation for the students. The contact preparations, unique to ARISS, provide real-world examples of STEAM concepts that the students learn through lectures from real scientists and engineers and through hands-on activities and problem solving. This “inspire, engage, educate” method has propelled many students into STEAM careers that they could not have imagined prior to the start of their ARISS journey.
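The orbital data exchanged between mentor and school feeds the antenna tracking described earlier. As a minimal sketch of how look angles are computed, the snippet below converts an observer and a sub-satellite point to Earth-centred coordinates and rotates the range vector into the observer's local frame; it assumes a spherical Earth and a known sub-satellite point (real tracking software propagates orbital elements instead), and all names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth simplification


def ecef(lat_deg, lon_deg, alt_km):
    """Convert latitude/longitude/altitude to Earth-centred Cartesian (km)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    r = EARTH_RADIUS_KM + alt_km
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


def look_angles(obs_lat, obs_lon, sat_lat, sat_lon, sat_alt_km=408.0):
    """Azimuth/elevation (degrees) and range (km) from a sea-level observer
    to a satellite above a given sub-satellite point (~408 km is a typical
    ISS altitude)."""
    ox, oy, oz = ecef(obs_lat, obs_lon, 0.0)
    sx, sy, sz = ecef(sat_lat, sat_lon, sat_alt_km)
    dx, dy, dz = sx - ox, sy - oy, sz - oz
    lat, lon = math.radians(obs_lat), math.radians(obs_lon)
    # Rotate the range vector into the observer's east-north-up (ENU) frame.
    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    rng = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(east, north)) % 360.0
    elevation = math.degrees(math.asin(up / rng))
    return azimuth, elevation, rng


# Satellite directly overhead: elevation is 90 degrees, range equals altitude.
print(look_angles(0.0, 0.0, 0.0, 0.0))
```

A school station would run such a calculation repeatedly during the pass, steering the antenna rotator to the computed azimuth and elevation.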

6 Educational Impact

The radio contact is the culmination of a long series of classroom projects, space science and engineering activities, community involvement, and public relations that produce a spirit of teamwork. There is a sense of accomplishment that results from the school and the students setting up and conducting the ISS ham contact themselves. The students better understand how NASA and the other international space agencies conduct science on ISS. The unique, hands-on nature of the amateur radio contact provides the incentive to learn about orbital mechanics, space flight, and radio operations. A coordinating teacher said, “To see the kids’ eyes light up, it was worth everything we’ve done to see that. This is something they will remember for the rest of their lives.” Some examples of the in-depth ability of ARISS to inspire, engage, and educate youth in 2017 include:

• In the months prior to their ARISS contact, students from SUMMA Aldapeta, Donostia/San Sebastián, Spain, visited the exhibition “Zientzia Astea” (Week of Science, Technology and Innovation) (Fig. 3). They also enjoyed two radio workshops put together by area radio amateurs, where activities ranged from experiments with antennas to developing model rockets.

Fig. 3 Students from SUMMA Aldapeta School in Spain learn about rocketry

• Students at Chiddix Junior High School in Normal, IL, USA, participated in an ARISS contact with Joe Acaba on the ISS. A 13-year-old, Dhruv Rebba, KC9ZJX, made the initial call to begin the ARISS contact. The mayor awarded him a proclamation of achievement for spearheading the contact for the school; he worked with teachers who partnered with the city’s Challenger Learning Center (CLC), Children’s Discovery Museum (CDM), and Central Illinois Radio Club (CIRC). CLC guided students in robotics. CDM helped students attend ISS Adventure Camp and built an ISS model underwater at the high school’s swimming pool. CIRC taught how communications work between ground ham radio equipment and ARISS equipment. One hundred twenty-five students watched the contact. The event was live-streamed to over 800 students in classrooms in the school district.

• With the help of staff from NASA Space Communication and Navigation (SCaN), NASA Glenn Research Center, and Girl Scouts of Northeast Ohio (GSNEO), USA, 300+ girl scouts engaged in educational activities involving a microgravity drop tower, lunar robotics, NASA space communications and navigation, amateur radio, a vacuum chamber, electronics, an ISS glove box, straw rockets, energy bead bracelets, and more (Figs. 4 and 5). The girls were exposed to many STEM-related career options, and speakers from GSNEO, NASA, and ARISS gave talks. Leaders reported that the ARISS contact was the highlight of camp week. Social media generated a great deal of interest; e.g., SCaN uploaded the GSNEO poster, and it garnered 5185 impressions.

Fig. 4 Camp educational activities: Girl Scouts of North East Ohio, USA

Fig. 5 Girl scout designed patch

• Over 2500 youth from schools all over New England (including from low-income towns) came to Fenway Park for STEM, ARISS, and a major league ball game. Weeks before, educators were sent STEM action packs for use in classrooms. Astronaut Paolo Nespoli tweeted, “One astronaut and 2000 future ones! Had fun talking to you.”

• Some of the precontact educational preparations for the World Genesis Foundation contact in Quartzsite, AZ, included a “Radio Science Day” that brought youth from six schools together for a hands-on experience in amateur radio and radio science at 8 indoor and outdoor stations with 20 interactive exhibits. More than 250 youth and 30 volunteers came together for the full-day event, involving students from throughout the rural county and teachers from across the southwest USA. The students also completed 8 weeks of amateur radio instruction in school; 11 students between the ages of 9 and 15 were newly licensed, with another 10–20 expected to test in the following two months. The organizers are also launching an amateur radio club for youth from Quartzsite, Cibola, Ehrenberg, Bouse, Brenda and Hope, Arizona, and, also, Blythe and Palo Verde, California.

• ESA astronaut Paolo Nespoli made his third trip to the ISS in July 2017 on the Vitality, Innovation, Technology and Ability (VITA) education mission (Fig. 6). For the third time, he participated in ARISS contacts. During this mission, he set a record for the total number of ARISS contacts performed, reaching 85 schools, and performed seven successful Ham Video events during school contacts, with more during European Researchers’ Night and other ARISS events.

Fig. 6 Paolo Nespoli talking to students using the ARISS ham radio system as part of the VITA education mission

Contact metrics each year are amazing, and 2017 and 2018 were no different. Tables 1 and 2 show that over 100,000 students participate in ARISS each year, even with only a partial 2018 data set (through October 22). Nearly 170,000 people witnessed an ARISS contact either directly or indirectly in 2017; from January 1 to October 22, 2018, nearly 400,000 did so. Direct metrics mean that individuals were in the room during the contact. For students, direct means they participated in some or all of the planned ARISS educational curriculum. Indirect means that the students, educators, or public witnessed the contact live, either through the school’s closed-circuit TV system or via livestreamed Internet. Note that these metrics do not include other informal educational activities that usually occur during ARISS contacts. In a teachable moment, ham radio operators bring their radios into other schools within the ISS contact flight path, allowing those students to hear the contact directly via the ARISS published downlink frequency of 145.80 MHz.
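Receiving that 145.80 MHz downlink is itself a small physics lesson: stations must allow for Doppler shift as the ISS passes. A back-of-the-envelope sketch using the first-order Doppler formula follows; the range-rate figures are rough illustrative assumptions, not ARISS-published values.

```python
C = 299_792_458.0   # speed of light, m/s
F0 = 145.800e6      # ARISS VHF voice downlink, Hz

def observed_freq(range_rate_m_s):
    """First-order Doppler: positive range rate means the ISS is receding,
    so the observed downlink frequency is lower than the transmitted one."""
    return F0 * (1.0 - range_rate_m_s / C)

# The ISS orbits at roughly 7.7 km/s; near the horizon the line-of-sight
# (range) rate can approach several km/s, shifting the downlink by a few kHz.
approaching = observed_freq(-7000.0)   # ISS approaching the station
receding = observed_freq(+7000.0)      # ISS receding from the station
print(round(approaching - F0), round(receding - F0))
```

The resulting swing of a few kilohertz over a pass is why tracking software on FM receivers steps the tuned frequency during an ISS contact.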

7 ARISS System Capabilities The ability to inspire, engage, and educate students through ARISS could only be accomplished through the support and goodwill of the space agencies and the crews on board ISS. It also could not be accomplished without the ham radio systems on board ISS and the ground communication systems run by volunteer amateur radio operators. ARISS has strategically placed ham radio systems in multiple locations on the ISS to facilitate ease of use by the ISS crew, which encourages more frequent contact opportunities. Moreover, the diverse locations in the Russian and US segments of ISS also improve our backup communication capability.

F. H. Bauer et al.

Table 1 January 1–December 31, 2017, participation metrics

                         Direct                               Indirect
Period                   Students  Educators  Public   Total  Students  Educators   Public    Total
January 1–March 31           6917        387     786    8090      6626        431     2635     9692
April 1–June 30              2047        288     576    2911      2197         81     1795     4073
July 1–September 30        16,093        516    2869  19,478      3681        432     1216     5329
October 1–December 31        9463        854    3027  13,344    54,679       1432   48,683  104,794
2017 totals                34,520       2045    7258  43,823    67,183       2376   54,329  123,888

Table 2 January 1–October 22, 2018, participation metrics

                         Direct                               Indirect
Period                   Students  Educators  Public   Total  Students  Educators   Public    Total
January 1–March 31           7037        531    1926    9494    35,592        334  119,198  155,124
April 1–June 30            11,301        852  22,460  34,613    47,723       1950  145,058  194,731
July 1–September 30          5272        814    1617    7703      4523        549     3937     9009
October 1–October 22         4595        159     289    5043      4547         69     4545     9161
2018 totals (to Oct 22)    28,205       2356  26,292  56,853    92,385       2902  272,738  368,025
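The quarterly rows in these tables roll up into the annual totals. As a quick sanity check, the sketch below (with the 2017 figures transcribed from Table 1) recomputes the column sums and the grand totals:

```python
# Quarterly ARISS participation metrics transcribed from Table 1 (2017).
# Each row: (students, educators, public) for one quarter.
direct_2017 = [
    (6917, 387, 786),    # January 1 - March 31
    (2047, 288, 576),    # April 1 - June 30
    (16093, 516, 2869),  # July 1 - September 30
    (9463, 854, 3027),   # October 1 - December 31
]
indirect_2017 = [
    (6626, 431, 2635),
    (2197, 81, 1795),
    (3681, 432, 1216),
    (54679, 1432, 48683),
]

def totals(rows):
    """Column-wise sums plus the grand total across all three categories."""
    students, educators, public = (sum(col) for col in zip(*rows))
    return students, educators, public, students + educators + public

print(totals(direct_2017))    # matches the published row: (34520, 2045, 7258, 43823)
print(totals(indirect_2017))  # matches the published row: (67183, 2376, 54329, 123888)
```

The same roll-up reproduces the 2018 totals in Table 2, which is also how the stranded "11,301" direct-student figure for April–June can be cross-checked against the 28,205 annual total.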


Complementing voice communications with the crew, ARISS has embedded many other communication modes and capabilities in the onboard radio systems and has flown numerous experiments on ISS and on ISS-deployed satellites. These efforts further pique students’ interest in STEAM education. They also actively engage the worldwide community of more than one million amateur radio operators, many of whom have a keen interest in communicating through ISS. In addition, the experiments flown as part of ARISS enable the program to continue to push the state of the art in wireless communication technology and, in some cases, space exploration. The following paragraphs describe the ARISS system capabilities on ISS and the experiments flown as part of ARISS, including those flown as hand-deployed satellites, in support of these education and engagement ideals.

7.1 Current Onboard Radio Systems The ARISS ham radio equipment currently resides in two locations inside the ISS, with antennas located in several locations outside the ISS (see Fig. 7). Ham radio operations were initially performed inside the Functional Cargo Block (FGB), named Zarya, using an antenna previously employed for docking operations. Once the other modules were fully operational and the ARISS antennas were installed, the ham radio equipment was moved into those modules: a station in the Russian Service Module, called Zvezda, in late 2003 and another in the European Columbus module in 2009. The radio in Zvezda is a JVC Kenwood D710 VHF/UHF mobile radio that supports voice and digital packet radio operations at power levels up to 50 W. The Columbus radio system includes a pair of Ericsson handheld radios that support either VHF or UHF voice, plus a packet module for digital communications. Only one radio, VHF or UHF, can be hooked up at a time. The Ericsson radios are the original models flown to ISS in 2000, prior to crew habitation, and output 5 W. School contacts are performed using FM voice on the VHF band with a 145.80 MHz public downlink frequency; at times, UHF is used instead. To support multi-mode, multi-operation contacts on ISS and to ensure robust signal links for backup communications, four multi-band ham radio antennas, supporting HF, VHF, UHF, L-band, and S-band, were installed around the periphery of the Russian Service Module. A single VHF/UHF antenna and two L-band/S-band antennas were installed near nadir on Columbus. The current onboard equipment supports voice operations; digital packet radio operations using the Automatic Packet Reporting System (APRS), which resembles text messaging; picture uplink and downlink using a technique called slow-scan television (SSTV); and a digital amateur radio television system, called HamTV, that enables video downlink on S-band.
HamTV is coupled with the two-way FM voice communication system to provide simultaneous video downlink on S-band and two-way FM voice communications on VHF or UHF for school contacts.
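Of the modes above, APRS is the most text-like: packets are short, human-readable frames relayed through the ISS digipeater. As a rough illustration (the call sign and position below are invented, and real frames travel as AX.25, not plain text), a TNC2-style monitor frame can be pulled apart like this:

```python
# Toy parser for a TNC2-style APRS monitor frame of the kind digipeated
# through the ISS packet system. Call sign and coordinates are made up;
# "ARISS" is the digipeater alias, and the asterisk marks it as used.
frame = "N0CALL-9>CQ,ARISS*:=4903.50N/07201.75W-Greetings via the ISS digipeater"

def parse_aprs(frame: str) -> dict:
    header, payload = frame.split(":", 1)       # header vs information field
    source, rest = header.split(">", 1)         # sender call sign
    dest, *path = rest.split(",")               # destination, then digi path
    return {"source": source, "destination": dest, "path": path, "payload": payload}

pkt = parse_aprs(frame)
print(pkt["source"])  # N0CALL-9
print(pkt["path"])    # ['ARISS*']
```

A real ground station would hand the decoded payload to an APRS client for mapping; the point here is only how little structure separates a "space text message" from ordinary ham packet radio.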


Fig. 7 ARISS locations on the ISS: the Columbus module (COL), the FGB (Zarya), and the Service Module (Zvezda)

7.2 Slow-Scan Television (SSTV) Picture Uplink and Downlink The Russian ARISS team runs occasional SSTV special events from the Zvezda module, using the JVC Kenwood D710 radio and a laptop running an SSTV software package called MMSSTV. In 2017, SSTV events included a series of 12 images celebrating the 20th anniversary of the ARISS program (see Fig. 8). The images, received by school groups and ham radio operators, depicted historical events from the 20 years that ARISS has existed. SSTV special events also include a yearly celebration of Cosmonautics Day, or Yuri’s Night (Fig. 9), commemorating the launch of Yuri Gagarin, the first human in space, on April 12, 1961. ARISS also celebrated the 40th anniversary of the Apollo–Soyuz joint mission in July 2015. A team in Poland creates special diplomas (certificates) for each of these events, which are distributed to all who received images. Selected images from various events can be viewed at: http://spaceflightsoftware.com/ARISS_SSTV/index.php. Receiving SSTV is very simple: even an iPhone with a free SSTV app can decode the audio from a radio tuned to the SSTV downlink and display the image on its screen.
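The reason a phone app listening to a radio’s speaker can reconstruct an image is that analog SSTV encodes each pixel’s brightness as an audio tone, conventionally sweeping from 1500 Hz (black) to 2300 Hz (white). A minimal sketch of that mapping (real modes, such as those used for ARISS events, add sync pulses, line timing, and color, all omitted here):

```python
# Minimal illustration of the analog SSTV principle: luminance maps
# linearly onto an audio frequency between 1500 Hz (black) and 2300 Hz
# (white). Mode-specific timing and color encoding are omitted.
BLACK_HZ, WHITE_HZ = 1500.0, 2300.0

def pixel_to_tone(luma: float) -> float:
    """Map luminance in [0, 1] to the SSTV audio frequency in Hz (encoder side)."""
    return BLACK_HZ + (WHITE_HZ - BLACK_HZ) * luma

def tone_to_pixel(freq_hz: float) -> float:
    """Inverse mapping used by a decoder listening to the downlink audio."""
    return (freq_hz - BLACK_HZ) / (WHITE_HZ - BLACK_HZ)

print(pixel_to_tone(0.0))     # 1500.0 - black pixel
print(pixel_to_tone(1.0))     # 2300.0 - white pixel
print(tone_to_pixel(1900.0))  # 0.5 - mid-gray
```

Because the image is carried entirely in the audio band, any FM receiver tuned to 145.80 MHz, from a school station to a handheld scanner, can capture an SSTV event.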


Fig. 8 ARISS 20th anniversary commemorative

Fig. 9 Cosmonautics day commemorative


7.3 HamTV Downlink Capability The HamTV hardware supports the downlink of standard-definition digital amateur television (DATV) and voice on S-band frequencies (~2.4 GHz) from the Columbus module (see Figs. 10, 11, and 12). The onboard system employs a 10 W HamTV transmitting module built by Kayser Italia for ESA and the ARISS team, an onboard video camera, and one of the L-band/S-band ARISS antennas that were externally affixed to the Columbus module prior to its launch. The HamTV downlink is received using a specially developed DVB-S receiver/analyzer software system, called Minitioune, through a set of “chained” ground stations that stream their downlink video into the British Amateur Television Club (BATC) video server for viewing by a school and by the public. Most ground station setups can only receive 3–5 min of good video, so the chained-station concept extends the video downlink timespan to cover the entire ten-minute school contact pass. Operational HamTV was first employed live by astronaut Tim Peake on February 11, 2016, for one of the UK schools and was used for most of the contacts that Tim Peake performed. Paolo Nespoli continued its use, performing seven successful school video events during his VITA mission.
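The chained-station idea amounts to a simple scheduling problem: each station only sees the ISS for part of the pass, so the video server switches to whichever station currently has a good view. A sketch of that selection logic (the station names and visibility windows below are invented for illustration):

```python
# Sketch of the "chained ground station" concept behind HamTV coverage:
# each station receives only a few minutes of good video, so the stream
# follows whichever station currently sees the ISS. Station names and
# visibility windows are hypothetical.
windows = {                 # (start, end) in seconds from the start of the pass
    "Lisbon":   (0, 240),
    "Poitiers": (180, 420),
    "Matera":   (360, 600),
}

def active_station(t: float):
    """Return a station whose window covers time t, or None during a gap."""
    for name, (start, end) in windows.items():
        if start <= t <= end:
            return name
    return None

# A ten-minute school contact is covered end to end by the chain:
for t in (60, 300, 540):
    print(t, active_station(t))
```

In practice the handover is coordinated through the BATC video server rather than computed locally, but the overlapping-windows picture is the essence of why three or four modest stations can cover a full pass.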

Fig. 10 Tim Peake HamTV downlink during ARISS contact


Fig. 11 Samantha Cristoforetti with HamTV

Fig. 12 Minitioune DVB-S receiver/analyzer software


7.4 Interoperable Radio System: The Future Radio System for Columbus and Zvezda The interoperable radio system represents a significant upgrade in capability, robustness, and versatility compared to the current ARISS radio systems in the Columbus and Zvezda modules [4]. It will significantly improve the Columbus module’s signal reception with a much more powerful radio: the JVC Kenwood D710 GA VHF/UHF transceiver (Fig. 13) will provide up to 25 W of output power, compared to 5 W from the current Ericsson. The radio system will be interoperable and certified to operate across both the US and Russian operating segments of ISS. Two systems will be deployed on ISS, one in Columbus and the other in Zvezda. A key element of this new radio system is the multi-voltage power supply, which will support multiple ISS input voltages. It also has several power outlets, with multiple voltages available on each, and four 5 VDC USB ports, allowing it to support a wide range of current and future ham radio components. Overall, the planned enhancements include: • A higher-power downlink for improved school contacts, SSTV downlinks, and APRS operations. • Improved audio and contact time during school events. • Reduced HamTV downtime through a dedicated power supply. • More robust software/firmware in the radio system to prevent accidental reprogramming and reduce repeater-mode command steps. • A new JVC Kenwood D710 cooling fan that supports continuous repeater-mode operations. • Sharing of hardware resources across modules when failures occur. • Common operations and training across the US and Russian segments. • Support for future system developments, including: – A HamTV slide show module/dedicated camera. – iPad SSTV operations. – Additional radio systems.
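The headline improvement is transmit power, and power ratios are conventionally compared in decibels. Going from the 5 W Ericsson to the 25 W D710 GA works out to roughly a 7 dB stronger downlink:

```python
import math

# Downlink power improvement of the interoperable radio system (25 W)
# over the current Columbus Ericsson handhelds (5 W), in decibels.
def power_gain_db(p_new_watts: float, p_old_watts: float) -> float:
    return 10.0 * math.log10(p_new_watts / p_old_watts)

gain = power_gain_db(25.0, 5.0)
print(f"{gain:.1f} dB")  # 7.0 dB - a fivefold increase in radiated power
```

All else being equal (antennas, feedline, path), that margin translates directly into cleaner audio at modest school ground stations and more robust SSTV and APRS reception.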

Fig. 13 JVC Kenwood D710 GA transceiver


8 ARISS Hand-Deployed Satellites The ARISS team pioneered the idea of deploying satellites from the ISS with its first hand-deployed satellite, SuitSat-1, deployed in 2006. At the time, there was great angst within the space agencies about deploying passive satellites from ISS, primarily because such satellites could pose a high risk of future collision with ISS. The ARISS international team, facilitated by the safety experts at NASA and the Russian space agency, was able to develop a deployment plan for passive satellites that protects the safety of the ISS crew and the ISS vehicle while allowing a satellite (or multiple satellites) to be safely and successfully deployed from ISS. This capability is now codified in a set of safety rules that have been used on all ISS-deployed satellites since SuitSat-1, including the many university, government, and commercial CubeSats deployed from ISS each year. The following sections describe the satellites that ARISS has developed and deployed from the ISS, with a description of how the international team has embedded STEAM-related opportunities in these satellites for K-16 youth all over the world.

8.1 SuitSat-1/RadioSkaf SuitSat-1, also known in Russia as RadioSkaf (Fig. 14), is a pioneering amateur radio satellite that was designed, developed, and fabricated by the ARISS team. The amateur radio components for this mission were launched and delivered to the ISS in September 2005. The Expedition 12 crew assembled the satellite using a discarded Russian Orlan spacesuit and the uploaded amateur radio hardware. Schools worldwide submitted artwork to be flown on the satellite; over three hundred items were compiled, and two copies of the artwork CD were delivered to the ISS. One was inserted into the spacesuit prior to its deployment, and the other was made available for viewing by the ISS crew. An ARISS slow-scan TV image was included on board and was transmitted to Earth, providing additional opportunities for school children and ham radio operators to participate in this exciting project. The suit was deployed during an EVA on February 3, 2006. SuitSat-1 transmitted student greetings in various languages, along with special words in several languages that could be decoded. Those who heard SuitSat and reported the special words and messages or recorded image telemetry were eligible to receive a certificate acknowledging their accomplishment. A SuitSat Web site was designed to allow participants to enter their data and track the satellite; the site received 10 million hits. SuitSat-1 received extensive media coverage from around the world: the NASA Education Portal, Reader’s Digest, Popular Science, NPR (All Things Considered), Japan CQ, CNN, MSNBC, AstroNet (Poland), CBS, Discovery Channel, and Al Jazeera, to name a few. The spacesuit satellite captured the imagination of many worldwide because it was the first satellite ever deployed from ISS, because of its extensive


Fig. 14 SuitSat-1 shortly after deployment

international educational outreach, and because a floating spacesuit in space (this one without a person in it) evoked the visions of many science fiction movies, past and present. A “Chicken Little” contest was held for students and adults to guess when SuitSat would reenter Earth’s atmosphere. On September 7, 2006, at 16:00 UTC, SuitSat-1 reentered the atmosphere approximately 1400 km south-southwest of Western Australia. The winners of the three categories (K-8, 9-12, and adult) received a special certificate commemorating their award.

8.2 ARISSat-1/RadioSkaf-B/KEDR ARISSat-1, also known in Russia as RadioSkaf-B and KEDR, was initially planned to be a follow-on ham radio system in an Orlan spacesuit, or SuitSat-2 [5]. But the suit became unavailable in 2009, so the ARISS team focused instead on developing a series of four hand-deployed spacecraft, each of 30 kg mass with dimensions 550 × 550 × 400 mm. To date, only one of the spacecraft, ARISSat-1, has flown. The ARISSat series was developed as a STEAM outreach and technology demonstration platform. Each vehicle could support up to three student experiments; ARISSat-1 carried one, a vacuum experiment from Kursk University.


STEAM outreach activities also included SSTV downlink images using onboard cameras, student voice greetings with 24 messages in 15 languages, and participation in a “Secret Word” contest. Data downlinks included voice telemetry for listening to spoken data, digital downlinks of the Kursk experiment, and spacecraft data downlinks. Like SuitSat-1, a “Chicken Little” contest was conducted to predict reentry. ARISSat-1 technologies included the first software-defined transponder, a maximum-power-point-tracker power management system to maximize solar panel power for spacecraft operations, and the demonstration of a new data protocol, BPSK-1000, designed to downlink a significant amount of data despite signal nulls and fading. A special Web site was developed to support the mission and to provide a more comprehensive educational outreach program with mission news, ground system software, and contest highlights. ARISSat-1 was deployed on August 3, 2011, and reentered the Earth’s atmosphere on January 4, 2012. Over the duration of the mission, 3500 images were uploaded to the archive and nearly 3000 students engaged in ARISSat-1 activities. Prior to deployment, while still on board ISS, ARISSat-1 was connected to one of the ARISS antennas in Zvezda to commemorate the 50th anniversary of the first human spaceflight by Russian cosmonaut Yuri Gagarin.

8.3 Tanusha CubeSats ARISS Russia, in collaboration with the Southwest State University in Kursk, Russia, is developing a series of educational CubeSats called Tanusha [6]. The first two, Tanusha 1 and 2 (Fig. 15), were developed by students at Southwest State University and were hand-deployed by cosmonauts Fyodor Yurchikhin and Sergey Ryazansky on August 17, 2017, during a spacewalk. These two CubeSats are performing cluster flight experiments over UHF amateur radio communication links. A second pair, Tanusha 3 and 4, were also developed by Southwest State University students and were launched and hand-deployed by cosmonaut Sergey Prokopyev on August 15, 2018. Tanusha 3 and 4 support more comprehensive cluster flight student experimentation, including attitude control and cluster flight communication and coordination with Tanusha 1 and 2 when those satellites are within the Tanusha 3/4 cluster’s field of view. Similar to SuitSat-1 and ARISSat-1, the Tanusha CubeSats also transmit grade school student greetings in Russian, English, Spanish, and Chinese via the UHF amateur radio transmitter. The ARISS Russia team is also partnering with other schools in Russia to develop and deploy CubeSats for educational purposes.


Fig. 15 Tanusha 1 and 2 hand-deployed CubeSats

9 Experimentation Since its inception, ARISS has supported a number of student experiments on board ISS or on ARISS-developed satellites. The most recent in-cabin ARISS student experiment is MarconISSta [7], which performs on-orbit spectrum analyses of the amateur radio satellite frequency bands. It takes its name from marconista, Italian for a ship’s radio operator. The payload is an educational initiative proposed by the Technical University of Berlin. It monitors segments of the amateur radio frequency spectrum in the VHF, UHF, L-, and S-bands to analyze the current use and availability of these bands for satellite communication. One aspect of this effort is to identify frequency interference “hotspots” that could impact amateur radio satellites and ARISS. MarconISSta was integrated into the existing ARISS setup, using the ARISS antennas on the Columbus module, and the experiment is being conducted in 2018. It is also part of an electrical engineering PhD thesis by the MarconISSta Principal Investigator.


Fig. 16 MarconISSta LimeSDR

The MarconISSta experiment objectives are: • Gather radio spectrum data and analyze spectrum use in the following bands: VHF 145.80–146.00 MHz; UHF 435.00–438.00 MHz; L-band 1260.00–1270.00 MHz; and S-band 2400.00–2450.00 MHz. • Analyze the ARISS antenna radiation patterns based on multiple measurements. • Student education. The MarconISSta key hardware includes: • RF couplers, which allow the ARISS VHF/UHF radios and MarconISSta to share the VHF/UHF antenna system. • LimeSDR (Fig. 16), a software-defined radio that serves as the radio spectrum receiver. • Astro Pi, a Raspberry Pi-based single-board computer that serves as the onboard computer and supports experiment data processing. Its Joint Station LAN (JSL) network connection enables MarconISSta commanding and data downlink through the ISS capabilities. • A USB charger to support the LimeSDR and Astro Pi power requirements. The above hardware is interfaced with the ARISS antennas and radio systems to conduct MarconISSta experimentation (see Fig. 17). MarconISSta was launched on May 21, 2018, and integrated with the ARISS radios and antennas on August 13, 2018 (see Fig. 18). MarconISSta spectrum monitoring is underway, and initial experimental results are expected to be published soon [8].
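Spectrum monitoring of this kind boils down to repeated power-spectrum estimates over the bands of interest: digitize a block of baseband samples, take an FFT, and look for bins that stand out above the noise floor. A toy sketch with synthetic samples (the real payload digitizes the ARISS antennas with a LimeSDR; the carrier and sample rate here are invented):

```python
import numpy as np

# Toy version of the spectrum survey MarconISSta performs: estimate the
# power spectrum of a block of complex baseband samples and locate the
# strongest emitter. The "received" signal is synthetic: a carrier 25 kHz
# above the tuned center frequency, buried in noise.
rng = np.random.default_rng(0)
fs = 200_000.0   # sample rate, Hz (hypothetical)
n = 4096
t = np.arange(n) / fs
samples = (np.exp(2j * np.pi * 25_000 * t)
           + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)))

spectrum = np.fft.fftshift(np.fft.fft(samples) / n)          # normalized FFT
power_db = 20 * np.log10(np.abs(spectrum) + 1e-12)           # bin power in dB
freqs = np.fft.fftshift(np.fft.fftfreq(n, 1 / fs))           # bin frequencies

peak = freqs[np.argmax(power_db)]
print(f"strongest signal at {peak / 1e3:.1f} kHz from band center")  # 25.0 kHz
```

Sweeping the SDR’s center frequency across the VHF, UHF, L-, and S-band segments and accumulating such spectra over many orbits is what turns this per-block measurement into a geographic map of interference “hotspots.”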

Fig. 17 MarconISSta interface to ARISS radios and antennas (block diagram: the ARISS VHF/UHF and L-/S-band antennas feed the ARISS radio systems and, via an RF coupler, the LimeSDR; the Astro Pi processes the data and connects to the JSL, with a USB charger providing auxiliary power)

Fig. 18 MarconISSta on-orbit setup


10 ARISS Future Endeavors The ARISS international team continues to add breadth and depth to its program to inspire, engage, and educate youth in STEAM subjects through amateur radio on ISS. ARISS has several strategic partnerships and initiatives in place and is pursuing several others. Two examples are described below: a major upgrade of our HamTV system, and the expansion of our amateur radio STEAM program into deep space.

10.1 HamTV-2—The Next Generation The superb STEAM impact of the existing HamTV system, and the potential for expanded experimentation and operations, have raised significant interest in an enhanced HamTV system. This next-generation HamTV system [9] would not be bound to the DVB-S standard. The ARISS team would like it to be adaptable, to support multiple operating modes (e.g., voice, high-definition television, and high-speed data) and, potentially, to be a two-way (uplink and downlink) system. Due to international band plan restrictions and the current ARISS microwave antennas on board ISS, the uplink of HamTV-2 signals to the ISS is limited to the 23-cm band (1240–1300 MHz), and the downlink has been coordinated in the 13-cm band (around 2400 MHz). HamTV-2 would replace the HamTV system in the Columbus module and would make use of the available ARISS L-/S-band antennas on the Earth-pointing side of the module. A connection to the ISS LAN would enable the system to use the ISS ancillary broadcast data, supporting time synchronization and providing ISS position and orientation information. Another benefit of ISS LAN access is that it would enable ARISS to control HamTV-2 automatically, without significant crew interaction: prepared videos or images could be uploaded and transmission operations scheduled. Connections to the interoperable radio system would offer various possibilities for interesting combined experiments for schools. With a high-speed data downlink capability, the HamTV-2 system becomes a fascinating tool for future student experiments. For example, HamTV-2 could be combined with the ESA education Astro Pi, a software-defined radio (SDR), and the L-/S-band ARISS antennas for student frequency band experimentation. HamTV-2 could also be interfaced with CASIS Space Station Explorers’ student payloads to conduct expanded student research with minimal crew interaction.
With a dedicated, high-speed uplink and downlink, the STEAM opportunities are endless.


10.2 Evolving Amateur Radio STEAM Education into Deep Space Crew-Tended Vehicles Whether on the Space Shuttle, on the Mir space station, or on ISS, astronauts and cosmonauts have realized a tremendous psychological boost through their talks with students and the public via ham radio. As humans venture into deep space and onto commercial space stations, the ARISS team is interested in working with the space agencies and commercial teams to ensure that ham radio is integrated into these vehicles and ready to support the onboard crews, and, through these endeavors, to inspire, engage, and educate future generations to pursue STEAM careers. In March 2006, amateur radio operators from AMSAT Germany tracked and received data from Voyager 1 using the 20-m antenna at Bochum, at a distance of 14.7 billion km; the data was checked and verified against data from the Deep Space Network. Since then, amateur radio operators from around the world have tracked and received data from a Venus radar, the Cassini spacecraft orbiting Saturn, and missions around Mars and in lunar orbit. These achievements demonstrate that extending ham radio from ISS in low Earth orbit to human spaceflight missions in deep space is feasible. ARISS is considering educational payloads that may be included on future deep space missions. A ham radio infrastructure on the lunar-orbiting Gateway (Fig. 19), like the ARISS systems on ISS, is one example. Others could include a repeater on the Moon, a remote amateur television system, and a Mars ham radio CubeSat. These would generate interest among students, encouraging participation in amateur radio projects and in space, science, and technology fields. Through these projects and the extensive ham radio volunteer network, future generations of students will be inspired to pursue careers in math, science, and communication technology. An international lunar-orbiting Gateway is in the planning stages.
Employing capabilities developed by NASA for deep space missions together with international and commercial efforts, this cislunar space station would be used as a staging point for a Deep Space Transport on missions to deep space destinations, including Mars. NASA and the international space agencies are open to new ideas concerning the construction and use of the Gateway outpost. At the request of ESA Education, ARISS international team members and an international consortium of AMSAT organizations submitted four education idea documents to ESA for the future Gateway. Those ideas included:

1. Amateur radio systems on Gateway.
2. Primary (K-8) education ideas.
3. Secondary (high school) education ideas.
4. University education ideas.
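What makes the Bochum reception of Voyager 1 so striking, and what any deep space ham link must overcome, is free-space path loss, which grows with the square of distance. Using the standard formula FSPL(dB) = 20·log10(d_km) + 20·log10(f_MHz) + 32.45, and assuming an X-band downlink near 8415 MHz (the distance is the 14.7 billion km quoted above; the exact frequency is an assumption here):

```python
import math

# Free-space path loss for the Bochum amateur reception of Voyager 1.
# FSPL(dB) = 20*log10(d_km) + 20*log10(f_MHz) + 32.45
# 14.7e9 km is the distance quoted in the text; 8415 MHz is an assumed
# X-band downlink frequency for illustration.
def fspl_db(distance_km: float, freq_mhz: float) -> float:
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.45

loss = fspl_db(14.7e9, 8415.0)
print(f"{loss:.0f} dB")  # 314 dB
```

For comparison, the same formula gives roughly 140 dB for a VHF school contact with the ISS at ~500 km slant range, which is why closing a deep space link takes large dishes and very low data rates rather than a handheld radio.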

The ARISS team eagerly awaits the next steps to engage in this endeavor. Our proven low-cost, high-payoff STEAM outreach program on ISS can, with modifications, do the same on deep space human spaceflight missions.


Fig. 19 Lunar orbiting Gateway

11 Conclusions The ARISS program represents the first and longest-running educational outreach payload on board the ISS. The ISS ham radio activity is one of the most exciting and stimulating educational outreach programs in space, providing students a once-in-a-lifetime opportunity to talk to a crew member on board ISS and learn what it is like to live and work there. Each year, this program has enabled tens of thousands of students to participate and learn about science, technology, and amateur radio. And, through these endeavors, ARISS inspires, engages, and educates future generations to pursue STEAM careers. ARISS and its predecessors have jump-started countless students’ careers, touched millions from all walks of life, and, in some cases, become a local or international phenomenon. Participants in our contacts have ranged from disadvantaged students living in poverty to heads of state. Along the way, we have been story-lined in National Geographic’s Mars prequel, were part of the IMAX Space Station 3D film, and appeared in numerous television shows and commercials, including the History Channel’s Ice Pilots series. We have participated in the search for Amelia Earhart’s plane. And we were the first to realize a science fiction movie vision with the deployment of SuitSat-1. The above is just the tip of the iceberg of what this team has accomplished in twenty years. ARISS is now on the cusp of moving our program, and our youth, into deep space. Can you imagine what we will accomplish in the next twenty years? Acknowledgements The authors would like to acknowledge our space agency sponsors NASA, ESA, CSA, JAXA, and Roscosmos, who have been instrumental in realizing the ARISS educational outreach program. We would also like to thank our amateur radio sponsors, the national amateur radio


societies, including the ARRL in the USA and the international AMSAT organizations, including AMSAT-NA. Special recognition goes to the international ARISS volunteer team for their tireless efforts in making ARISS such a successful, low-cost STEAM initiative. Finally, we want to thank the NASA SCaN organization and CASIS for their sustained support and guidance to further expand our educational programs.

References

1. Bauer, F. H., McFadin, L., Steiner, M., & Conley, C. (2002). Amateur Radio on the International Space Station—The first operational payload on the ISS. In Proceedings from the World Space Congress.
2. Conley, C., Bauer, F. H., Brown, D., & White, R. (2002). Amateur Radio on the International Space Station—The first educational outreach program on the ISS. In Proceedings from the World Space Congress.
3. Jackson, C., Bauer, F. H., & White, R. (2017). ARISS 2017 annual report. https://www.dropbox.com/s/hg2ap3bzkuhxl9e/Annual%20Report%202017%20Final.pdf?dl=0
4. Bauer, F. H., White, R. A., & Taylor, D. (2018, May). Educational outreach and international collaboration through ARISS: Amateur radio on the international space station. In SpaceOps 2018. Marseille, France.
5. Baines, B. (2010, October). ARISSat-1 overview. In AMSAT Space Symposium. Orlando, FL.
6. Samburov, S., & Kolmykova, T. (2017, October). Russian Tanusha CubeSats. In ARISS-International Meeting. Rome, Italy.
7. Buscher, M. (2017, October). MarconISSta. In ARISS-International Meeting. Rome, Italy.
8. Buscher, M., Bauer, A., Manuel Diez, J., Malte Gräfje, T., Kobow, L., Planitzer, T., et al. (2018, October). Flight results of MarconISSta—An RF spectrum analyzer aboard the ISS to improve frequency sharing and satellite operations. In 69th International Astronautical Congress. Bremen, Germany.
9. Amend, O., Bauer, F. H., Coujaud, J. P., Togolati, P., Carrai, F., Morgan, C., et al. (2018, October). Amateur Radio on ISS—Next generation HamTV system. In 69th International Astronautical Congress. Bremen, Germany.