Analytic Combinatorics for Multiple Object Tracking [1st ed.] 9783030611903, 9783030611910

The book shows that the analytic combinatorics (AC) method encodes the combinatorial problems of multiple object tracking.

English Pages XVI, 221 [228], Year 2021

Table of contents :
Front Matter ....Pages i-xvi
Introduction to Analytic Combinatorics and Tracking (Roy Streit, Robert Blair Angle, Murat Efe)....Pages 1-21
Tracking One Object (Roy Streit, Robert Blair Angle, Murat Efe)....Pages 23-47
Tracking a Specified Number of Objects (Roy Streit, Robert Blair Angle, Murat Efe)....Pages 49-79
Tracking a Variable Number of Objects (Roy Streit, Robert Blair Angle, Murat Efe)....Pages 81-112
Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters (Roy Streit, Robert Blair Angle, Murat Efe)....Pages 113-143
Wither Now and Why (Roy Streit, Robert Blair Angle, Murat Efe)....Pages 145-153
Back Matter ....Pages 155-221


Roy Streit Robert Blair Angle Murat Efe

Analytic Combinatorics for Multiple Object Tracking


Roy Streit Metron, Inc Reston, VA, USA

Robert Blair Angle Metron, Inc Reston, VA, USA

Murat Efe Department of Electrical and Electronics Engineering Ankara University Golbasi, Ankara, Turkey

ISBN 978-3-030-61190-3    ISBN 978-3-030-61191-0 (eBook)
https://doi.org/10.1007/978-3-030-61191-0

© Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

To my loving wife, Nancy
Roy Streit

To Dorothy and Parker, my all
Robert Blair Angle

To Gülşah, Oğuzhan and Ali Emre
Murat Efe

Preface

Solving simple enumeration problems is fun and natural for most folks. When the problems become complicated or subtle, however, solving them becomes tedious and the fun disappears. At the end of a long and difficult enumeration, a seed of doubt often floats unhappily to mind, “Did I/we overlook any terms, or double count them?” It takes time and patience, and careful (bordering sometimes on fanatical) attention to detail, to convince oneself that everything is fine. And then comes the task of convincing others the solution is correct, a necessary step if solving the problem is part of your job.

Problems posed in the language of analytic combinatorics (AC) and generating functions (GFs) do not suffer from this “enumeration doubt” because the enumerations are embedded—exactly—in the derivatives of the GF, and the GF is determined by the fundamental assumptions of the problem, that is, from first principles. The doubt is focused where it belongs, on the fidelity of the GF model of the problem, and not on whether or not an enumerated list is complete. Such confidence is justified since derivatives are well understood.

This book applies the GF method of AC to the diverse family of multiobject tracking filters. These filters resolve combinatorial enumeration problems (often a measurement-to-object assignment problem) to achieve their goal of estimating object states, but combinatorial problems are seen as secondary to the estimation goals. AC turns these priorities upside down, so to speak, by organizing tracking filters according to their GFs. This organization reveals a surprising degree of unity among the filters, a unity that is provided by the technical method (GFs), not the goal. This is one of the many benefits of AC to tracking.

There are many other benefits of AC to tracking as well, and they are spelled out in this book. The emphasis throughout is on constructive methods for deriving the exact Bayesian posterior distributions from the GF that defines the track filtering problem. Closing a Bayesian recursion typically requires approximating the exact posterior, and of approximations there is no end. To keep the focus on AC, approximations are discussed in fairly broad terms.


AC adds considerable perspective to the many problems that plague the harder tracking applications. The price that must be paid to see the advantages of AC is that of learning a new language—the language of AC and its workhorse method, the GF. The DSP community knows GFs as z-transforms, the discrete analog of Fourier transforms, but the tracking community has drifted from the DSP and control theory communities. In any event, three appendices are provided in an effort to make the book self-contained and accessible to developers and practitioners. The references reflect the authors’ personal interests. They are pointers to the broader literature and should be seen in this light.

Acknowledgments The authors are indebted to many institutions and individuals for their support over the years needed for the ideas in this book to evolve and mature. The first two authors thank Metron, Inc. for creating and sustaining a culture that encourages mathematical research and innovation. The book might never have been started or finished without it. We thank Dr. Lawrence Stone (Metron) for reading and commenting on an early draft of parts of the book. We thank Dr. Keith Davidson (Office of Naval Research) for a writing grant that kick-started it. A series of invitations from several institutions over the last half dozen years enabled the first author to pursue a line of research that became this book. Prof. Dr. Wolfgang Koch and Dr. Felix Govaers (FKIE/Fraunhofer, Germany) invited him to participate in the Sensor Data Fusion (SDF) Workshops in Bonn. Collaborative research with them and with Dr. Christoph Degen (FKIE/Fraunhofer) stimulated many ideas that reached fruition in this book. Their support is greatly appreciated. The AC derivation of the IPDA filter, an important theme in this book, was the result of an invitation by Prof. Taek Lyul Song (Hanyang University, South Korea) to engage in collaborative university research with him and the inventor of IPDA, Prof. Darko Mušicki, in the Spring of 2013 as part of the “Brain Korea 21+ Project” supported by the Korea Ministry of Education (2013.03–2017.02). Prof. T. Kirubarajan (McMaster University, Hamilton, Canada) supported visits to his Department to give university lectures, the contents of which all enter this book in one way or another. Special thanks also go to Prof. Juan Manuel Corchado (University of Salamanca, Spain) who invited the first author to give a Keynote lecture at the International Society of Information Fusion (ISIF) Conference in Salamanca in 2014, a lecture that inspired the very first thought of the possibility of writing a book such as this one. The third author thanks members of the Estimation, Tracking and Fusion Research Laboratory at Ankara University whose support and extra time on the industry projects made working on this book possible for him. All of us thank Dr. Ali Önder Bozdoğan for his valuable contributions and support at the early stages of this book project.


We all thank our families for their love and support. It is absolutely true that we could not have written this book without them. They held our families together, and gave us the strength and the time to write this book.

Reston, USA        Roy Streit
Reston, USA        Robert Blair Angle
Ankara, Turkey     Murat Efe
August 2020

Contents

1 Introduction to Analytic Combinatorics and Tracking ...... 1
  1.1 Introduction ...... 1
  1.2 The Benefits of Analytic Combinatorics to Tracking ...... 2
  1.3 Sensor and Object Models in Tracking ...... 3
  1.4 Likelihood Functions and Assignments ...... 4
  1.5 A First Look at Generating Functions for Tracking Problems ...... 5
    1.5.1 Statement A—Object Existence and Detection ...... 6
    1.5.2 Statement B—Gridded Measurements ...... 8
    1.5.3 Statement C—Gridded Object State and the Genesis of Tracking ...... 10
  1.6 Generating Functions for Bayes Theorem ...... 12
    1.6.1 GF of the Bayes Posterior Distribution ...... 13
    1.6.2 Bayes Inference in Statement A ...... 14
    1.6.3 Bayes Inference in Statement B ...... 14
    1.6.4 Bayes Inference in Statement C ...... 15
  1.7 Other Models of Object Existence and Detection ...... 16
    1.7.1 Multiple Object Existence Models ...... 16
    1.7.2 Random Number of Object Existence Models ...... 17
    1.7.3 False Alarms ...... 19
  1.8 Organization of the Book ...... 20
  References ...... 21

2 Tracking One Object ...... 23
  2.1 Introduction ...... 23
  2.2 AC and Bayes Theorem ...... 24
  2.3 Setting the Stage ...... 26
  2.4 Bayes-Markov Single-Object Filter ...... 28
    2.4.1 BM: Assumptions ...... 28
    2.4.2 BM: Generating Functional ...... 30
    2.4.3 BM: Exact Bayesian Posterior Distribution ...... 30
  2.5 Tracking in Clutter—The PDA Filter ...... 31
    2.5.1 PDA: Assumptions ...... 31
    2.5.2 PDA: Generating Functional ...... 32
    2.5.3 PDA: Exact Bayesian Posterior Distribution ...... 33
    2.5.4 PDA: Closing the Bayesian Recursion ...... 35
    2.5.5 PDA: Gating—Conditioning on Subsets of Measurements ...... 35
  2.6 Object Existence—The IPDA Filter ...... 37
    2.6.1 IPDA: Assumptions ...... 37
    2.6.2 IPDA: Generating Functional ...... 38
    2.6.3 IPDA: Exact Bayesian Posterior Distribution ...... 39
    2.6.4 IPDA: Closing the Bayesian Recursion ...... 41
  2.7 Linear-Gaussian Filters ...... 41
    2.7.1 The Classical Kalman Filter ...... 41
    2.7.2 Linear-Gaussian PDA: Without Gating ...... 42
    2.7.3 Linear-Gaussian PDA: With Gating ...... 43
  2.8 Numerical Example: IPDA ...... 44
  References ...... 47

3 Tracking a Specified Number of Objects ...... 49
  3.1 Introduction ...... 49
  3.2 Joint Probabilistic Data Association (JPDA) Filter ...... 51
    3.2.1 Multivariate Generating Functional ...... 52
    3.2.2 Exact Bayes Posterior Probability Distribution via AC ...... 53
    3.2.3 Measurement Assignments and Cross-Derivative Terms ...... 56
    3.2.4 Closing the Bayesian Recursion ...... 57
    3.2.5 Number of Assignments ...... 58
    3.2.6 Measurement Gating ...... 59
  3.3 Joint Integrated Probabilistic Data Association (JIPDA) Filter ...... 60
    3.3.1 Integrated State Space ...... 61
    3.3.2 Generating Functional ...... 62
    3.3.3 Exact Bayes Posterior Probability Distribution via AC ...... 62
    3.3.4 Closing the Bayesian Recursion ...... 66
  3.4 Resolution/Merged Measurement Problem—JPDA/Res Filter ...... 67
  3.5 Numerical Examples: Tracking with Unresolved Objects ...... 70
    3.5.1 JPDA/Res Filter with Weak and Strong Crossing Tracks ...... 73
    3.5.2 JPDA/Res with Parallel Object Tracks ...... 74
    3.5.3 Discussion of Results ...... 78
  References ...... 78

4 Tracking a Variable Number of Objects ...... 81
  4.1 Introduction ...... 81
  4.2 Superposition of Multiple Object States ...... 82
    4.2.1 General Considerations ...... 82
    4.2.2 Superposition with Non-identical Object Models ...... 83
  4.3 JPDAS: Superposition with Identical Object Models ...... 84
    4.3.1 Information Loss Due to Superposition ...... 84
    4.3.2 Generating Functional of the Bayes Posterior ...... 85
    4.3.3 Probability Distribution ...... 86
    4.3.4 Intensity Function and Closing the Bayesian Recursion ...... 88
    4.3.5 Intensity Function and the Complex Step Method ...... 90
  4.4 CPHD: Superposition with an Unknown Number of Objects ...... 90
    4.4.1 Markov Chain for Number of Objects ...... 91
    4.4.2 Probabilistic Mixture GFL ...... 92
    4.4.3 Bayes Posterior GFL ...... 93
    4.4.4 Posterior GF of Object Count ...... 94
    4.4.5 Exact Bayes Conditional Probability ...... 94
    4.4.6 Intensity Function ...... 95
    4.4.7 Closing the Bayesian Recursion ...... 95
  4.5 State-Dependent Models for Object Birth, Death, and Spawning ...... 97
    4.5.1 New Object Birth Process ...... 97
    4.5.2 Darwinian Object Survival Process ...... 98
    4.5.3 Object Spawning (Branching) ...... 99
  4.6 PHD: A Poisson Intensity Filter ...... 101
  4.7 Numerical Examples ...... 105
    4.7.1 JPDAS Filter ...... 109
    4.7.2 PHD Filter ...... 110
    4.7.3 Discussion of Results ...... 111
  References ...... 111

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters ...... 113
  5.1 Introduction ...... 114
  5.2 Multi-Bernoulli (MB) Filter ...... 115
    5.2.1 Prior and Predicted Processes: JIPDA with Superposition ...... 115
    5.2.2 GF of Predicted Number of Objects ...... 116
    5.2.3 Predicted Multiobject PDF ...... 117
    5.2.4 Predicted Multiobject Intensity Function ...... 117
    5.2.5 GFL of the MB Filter ...... 117
    5.2.6 GFL of the MB Posterior Process ...... 118
    5.2.7 Exact MB Posterior Process Is an MBM ...... 119
    5.2.8 Interpretation of the Posterior Mixture ...... 120
    5.2.9 Posterior Probability Distribution ...... 121
    5.2.10 Intensity Function of the Posterior Process ...... 121
    5.2.11 GF of the Number of Existing Objects—MB Filter ...... 122
    5.2.12 Closing the Multi-Bernoulli Bayesian Recursion ...... 122
  5.3 Multi-Bernoulli Mixture (MBM) Filter ...... 123
    5.3.1 GFL of the MBM Process at Scan k − 1 ...... 123
    5.3.2 GFL of the MBM Predicted Process at Scan k ...... 125
    5.3.3 GF of the Predicted Aggregate Number of Objects in the MBM ...... 125
    5.3.4 Probability Distribution of Predicted MBM Multiobject State ...... 126
    5.3.5 GFL of the Joint MBM Process ...... 126
    5.3.6 GFL of the MBM Bayes Posterior Process ...... 127
    5.3.7 MHT-Style Hypotheses ...... 129
    5.3.8 GF of Aggregate Object Number—MBM Filter ...... 129
    5.3.9 Intensity of the MBM Posterior ...... 130
    5.3.10 Closing the Bayesian Recursion for MBM Filters ...... 130
  5.4 Labeled MBM Filter ...... 131
    5.4.1 Labels in Analytic Combinatorics ...... 132
    5.4.2 GFL of the LMBM Filter ...... 133
    5.4.3 Track-Oriented LMBM and Closing the Bayesian Recursion ...... 135
  5.5 Multiple Hypothesis Tracking (MHT) Filter ...... 136
  5.6 Conjugate Families ...... 138
  5.7 Numerical Example: JIPDAS Filter ...... 138
  References ...... 143

6 Wither Now and Why ...... 145
  6.1 To Count or Not to Count, that Is the Question ...... 145
  6.2 Low Hanging Fruit ...... 147
  6.3 Techniques for High Computational Complexity Problems ...... 149
  6.4 Higher Level Fusion and Combinatorial Optimization ...... 151
  References ...... 153

Appendix A: Generating Functions for Random Variables ...... 155
Appendix B: Generating Functionals for Finite Point Processes ...... 169
Appendix C: Mathematical Methods ...... 195
Glossary ...... 213
Index ...... 217

Acronyms

AC        Analytic Combinatorics
AD        Automatic Differentiation
AOU       Area of Uncertainty
BM        Bayes-Markov
BMD       Bayes-Markov-Detect
CCM       Constrained Conditional Modeling
CDF       Cumulative Distribution Function
CIT       Cauchy Integral Theorem
CPHD      Cardinalized Probability Hypothesis Density
DSP       Digital Signal Processing
EGF       Exponential Generating Function
ESP       Elementary Symmetric Polynomial
FISST     Finite Set Statistics
GCE       Grand Canonical Ensemble
GF        Generating Function (see PGF)
GFL       Generating Functional (see PGFL)
IID       Independent and Identically Distributed
ILP       Integer Linear Programming
IPDA      Integrated Probabilistic Data Association
JIPDA     Joint Integrated Probabilistic Data Association
JIPDAS    Joint Integrated Probabilistic Data Association with Superposition
JPDA      Joint Probabilistic Data Association
JPDA/Res  Joint Probabilistic Data Association with Resolution Model
JPDAS     Joint Probabilistic Data Association with Superposition
LMB       Labeled Multi-Bernoulli
LMBM      Labeled Multi-Bernoulli Mixture
MB        Multi-Bernoulli
MBM       Multi-Bernoulli Mixture
MHT       Multiple Hypothesis Tracking
NLP       Natural Language Processing
PDA       Probabilistic Data Association
PDF       Probability Density Function
PET       Positron Emission Tomography
PGF       Probability GF (often abbreviated GF)
PGFL      Probability GFL (often abbreviated GFL)
PHD       Probability Hypothesis Density
PMB       Poisson Multi-Bernoulli
PMBM      Poisson Multi-Bernoulli Mixture
PMF       Probability Mass Function
PPP       Poisson Point Process
RFS       Random Finite Set
ROC       Radius of Convergence
RTS       Rauch-Tung-Striebel
SNR       Signal-to-Noise Ratio
SQL       Structured Query Language
TBD       Track Before Detect
TPM       Transition Probability Matrix

Chapter 1

Introduction to Analytic Combinatorics and Tracking

“Se vogliamo che tutto rimanga com’è bisogna che tutto cambi.” (“If we want things to stay as they are, things will have to change.”) Spoken by Tancredi in Il Gattopardo, by Giuseppe Tomasi di Lampedusa. (The Leopard, in English editions)

Abstract  Analytic combinatorics and generating functions are introduced in the context of multiple object tracking. Their utility is demonstrated by several examples of increasing model complexity. The simplest is for an object that may or may not exist and, if it exists, may or may not be detected. To ease the discussion, the examples use gridded spaces; continuous spaces are reserved for later chapters. The variety and computational complexity of tracking filters is understood clearly when seen through the lens of generating functions and the methods of analytic combinatorics.

Keywords  Multiple object tracking · Analytic combinatorics · Combinatorial enumeration · Combinatorial optimization · Generating function · Bayesian filter · Gridded data

1.1 Introduction

Combinatorics is an extraordinarily diverse field of classical mathematics. Many seemingly unrelated problems fall within its purview, ranging from charming puzzles like Rubik’s cube to practical applications like the multiple object tracking problems that are the subject of this book. Many are easy to state but very challenging to solve, which makes them interesting in themselves. Questions in enumerative combinatorics begin with the words, how many? For example, given a starting configuration of Rubik’s cube, how many solutions have exactly n moves (rotations)? Enumerative questions are often coupled with combinatorial optimization questions such as what is the “best” solution? Most people would say the best solution of Rubik’s cube is the one with the fewest moves, but what is best depends on the goal. This chapter gives a high-level overview of multiple object tracking problems and the pivotal role that combinatorics plays in their formulation and solution.

Object tracking is a statistical estimation problem. It is formulated mathematically in terms of a likelihood function whose structure depends on the models required in the application. More often than not, the higher the fidelity of the mathematical model, the more complicated the likelihood function and the greater the computational complexity of the estimation algorithm, or tracking filter as such algorithms are called. A veritable smorgasbord of Bayesian tracking filters is available in the literature nowadays. Each filter is derived from a likelihood function that has–at its core–a combinatorial problem. Some problems are stated naturally in the language of enumeration and others in terms of optimization.

The thesis of this book is that the fundamental properties of tracking filters are best understood by examining them through the lens of analytic combinatorics (AC). The AC method is an old, but ever modern, trick that converts combinatorial statements into exactly equivalent statements about generating functions¹ (GFs). The trick applies to both enumeration and optimization problems. At first glance, AC is counterintuitive and yet, as Wilf writes [1], “generating function arguments often give satisfying feelings of naturalness, […] as well as usually offering the best route to finding exact or approximate formulas.” This book validates Wilf’s observation for multiple object tracking filters.

¹ In digital signal processing, GFs are known as discrete-time generalizations of Fourier transforms. See Appendix A, p. 156.

1.2 The Benefits of Analytic Combinatorics to Tracking

The AC approach makes it easy to understand what distinguishes the different filters by comparing the individual algebraic factors of the GFs of the likelihood functions. Each factor has a specific combinatorial interpretation, and their product uniquely determines the fundamental structure of the filter. Using a chemistry analogy, factors are “elements” and their product is the “molecular formula” of the filter.

AC also gives a simple explanation of why it is that two seemingly similar filters may have very different computational complexities. As shown in Appendix A, the mathematical form of a filter is found by taking derivatives of the filter’s GF. The complexity of a filter is determined by the number of distinct terms in the derivative. As calculus students discover, the derivatives of similar looking functions can have very different numbers of terms. Simply by counting the number of terms in a derivative, practicing engineers can see in detail how a proposed change in the model alters the likelihood function and how that, in turn, affects the filter complexity.

AC enables approximations to be computed for very high computational complexity tracking filters using established classical applied mathematics. Further discussion is given in Sect. 6.3 of Chap. 6, but it suffices here to say that the derivatives of the GF are written as Cauchy integrals (this is an exact equivalence), and then the saddle point method is applied to compute numerical approximations to the integrals. The saddle point method is a standard tool used in applied mathematics and physics for asymptotic analysis. The topic is the subject of ongoing work.
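The term-counting idea can be made concrete with a computer algebra system. The following sketch is not from the book; it assumes the SymPy library and uses two toy generating functions chosen only for illustration. The coupled form (a sum inside a product) loosely mirrors the assignment structure that arises in multiobject filters, while the separable form does not; their mixed derivatives have very different numbers of terms.

```python
import sympy as sp

w1, w2, w3 = sp.symbols('w1 w2 w3')
a = sp.symbols('a1:4')
b = sp.symbols('b1:4')
c = sp.symbols('c1:4')

# "Coupled" GF: each factor is a sum over the same indeterminates.
G_coupled = ((a[0]*w1 + a[1]*w2 + a[2]*w3)
             * (b[0]*w1 + b[1]*w2 + b[2]*w3)
             * (c[0]*w1 + c[1]*w2 + c[2]*w3))

# "Separable" GF: each factor depends on its own indeterminate.
G_separable = (a[0] + a[1]*w1) * (b[0] + b[1]*w2) * (c[0] + c[1]*w3)

for name, G in (("coupled", G_coupled), ("separable", G_separable)):
    d = sp.expand(sp.diff(G, w1, w2, w3))          # mixed derivative
    print(name, "terms in mixed derivative:", len(sp.Add.make_args(d)))
# coupled: 6 terms (3! assignment-like terms); separable: 1 term
```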

1.3 Sensor and Object Models in Tracking

Single-object tracking filters estimate an object’s state using sensor data collected over a sequence of (non-overlapping) time intervals called scans. What constitutes the state of an object depends on the application, but it often comprises kinematic properties such as position and velocity. Object states are modeled as points, that is, objects appear as point sources in the sensor output. This modeling assumption has practical implications, e.g., in some surveillance applications, it requires that objects be neither too close nor too far from the sensor.² Multiple object tracking filters estimate the multiobject state, which comprises the state of every object. The number of objects is stipulated in some filters and estimated in others, in which case the number of objects is part of the multiobject state.

Object motion is only partially predictable when some agent (e.g., a human pilot) is controlling them. This well-known and thorny modeling problem arises in other fields, too (e.g., control theory). It is treated here by assuming that object motion is governed by a random process whose probability distribution models the many sources of uncertainty in the object motion. Satisfying the assumption in practice is often a nontrivial exercise in model development—the model must incorporate not only the statistical nature of the inherent and unavoidable variability in object motion due to “system noise,” but it must also incorporate plausible models of the various unknown deterministic inputs from a controller (e.g., the pilot mentioned above).

Sensor data comprise the outputs of an embedded sensor signal processor that analyzes the raw signal observed over a scan time interval and produces sensor measurements. Point sources are imperfectly resolved, so to obtain the point measurements used in most (but not all, see [2, Chap. 6]) tracking filters, the signal processor estimates point measurements in a manner that is consistent with the sensor’s point spread function and outputs the estimated points to the tracker. Point measurements are produced for each detected object. When several objects are detected, the sensor measurement is actually a measurement set. Sensor measurements are noisy and, ideally, are accompanied with estimates of their accuracy (old fashioned “error bars” deduced from the point spread function). It is often possible to model the probability distributions of the measurements if the design of the sensor and the physics of the raw signal are known. In the absence of physics-based models, sensor behavior is modeled by subjective probabilistic methods. Sensor models, however developed, are incorporated into the tracking filter.

In practice, the measurement set, which ideally contains only measurements of objects, is corrupted by random insertions and deletions, or “indels” as they are termed in computational sequencing problems (e.g., genetics). Insertions are measurements that do not correspond to objects. They are often called false alarms or clutter. Deletions are false dismissals and correspond to objects for which there is no measurement. In surveillance applications (e.g., radar), indels are frequently modeled as statistical artifacts of the sensor signal processor caused by low signal-to-noise ratio (SNR). Signal processors use low detection thresholds to detect weak signals (e.g., objects with small radar cross-sections) at the cost of increasing the number of threshold crossings, or false alarms. Deletions result when weak signals are undetected because the detection statistic did not cross the detection threshold, or because they are “lost” in the point spread function of a strong signal.

In practical problems, the fundamental reality of sensor data is captured by a single word, uncertainty: Which measurements originate from objects, and which are false alarms? Which objects contribute sensor measurements, and which are undetected? These uncertainties are the foundation of the combinatorial problems discussed in this book. AC and its workhorse, the GF, systematically organize the uncertainties in tracking problems and lead to viable and useful tracking filters.

² Extended objects appear larger than the sensor’s point spread function (Sect. 2.3 of Chap. 2).

1.4 Likelihood Functions and Assignments

The probability distributions of the measurement and object processes determine the joint object-measurement distribution. The distribution is complicated when measurements and objects can be combined in more than one way without violating constraints imposed by the application. The allowed combinations are called feasible combinations. The joint likelihood function is the sum of the probabilities of the events defined by the feasible combinations. Even when the feasible combinations are describable in simple down-to-earth language, the sums themselves can be large, elaborate expressions whose summands are cumbersome and unwieldy, even tedious. In short, any inherent simplicity of the underlying joint likelihood function is obscured in a veritable haze of formulae in the enumerated sums.

The feasible combinations for point objects are assumed to be those that satisfy the “at most one measurement per object per scan” rule. The rule is a model of the output of the sensor signal processor; it was first stated explicitly in [3]. If there is more than one sensor, the rule is applied to each one. Given the rule, it is meaningful to define a “label,” or indicator variable that specifies whether or not a given measurement corresponds to an object, and if it does, which one. In this language, the sensor measurement set is unlabeled. The rule systematizes the problem but does nothing to clear away the formulaic haze.

GFs clear away the haze by encoding joint likelihood functions into astonishingly concise, exactly equivalent algebraic expressions. They transform complicated sums in the likelihood function into products of algebraic factors. The resulting GFs are often so small that they fit on a single line of text, sometimes with room to spare. The expressive economy of GFs extends to Bayesian statistics. As will be seen, the GF of the exact Bayesian tracking filter is given by a ratio of derivatives of the GF of the joint likelihood function.

Two paths emerge in the tracking literature. One enumerates the measurement-to-object assignments and develops a Bayesian tracking filter that includes them all and neglects the influence of none. This is the classical Bayesian path, and it is the path followed in most of the book. The other path seeks to find the best set of assignments, that is, the best measurement labels, and then derives a Bayesian filter conditioned on those assignments. This is a combinatorial optimization problem. Both paths are Bayesian and both are amenable to AC methods. (See Sect. 6.4 of Chap. 6.)

The next section introduces GFs as models for several very simplified tracking and tracking-related problems. It illustrates how closed-form expressions for the joint GFs are deduced directly from the tracking assumptions. Successful applications of AC in tracking share this story.

1.5 A First Look at Generating Functions for Tracking Problems

Multiple object tracking problems are combinatorial problems in probabilistic clothing. To make the point and also show that GFs are valuable models in tracking, three basic descriptive statements, or hypotheses, about objects and measurements are examined in this section. Elements of these statements appear in many tracking filters, frequently in complicated settings involving multiple objects and measurements, which is the focus of the book. The goal of this section is simple, yet remarkable—it is to show that these statements are perfectly encoded by a mathematical expression called a GF. After reading the examples, readers will understand the aptness of Wilf’s aphorism [1, p. 1], “A generating function is a clothesline on which we hang up a sequence of numbers for display.”

The first hypothesis is Statement A. It concerns the uncertainties of object existence and detection. It is interpreted as a counting problem and modeled as a GF. The second is Statement B and it extends Statement A by including the possibility of a measurement. Careful attention is paid to the critical step that maps the measurement into an equivalent counting problem. The third hypothesis is Statement C, where Statement B is extended by including the possibility of object state and it too is mapped into a counting problem. Since Statement C is about two entities, object and measurement, it brings the discussion to a point where Bayesian inference is possible.

The following subsections discuss, in turn, the encoding of Statements A–C into joint GFs. Considerable insight into GF methods and the AC way of thinking is gained by using the GFs to explore the close relationships between the problems. For example, a simple algebraic procedure reduces the GF of Statement C to the GF of B, and the GF of B to that of A.


The important problem of false alarms is not considered in this section to keep the discussion simple. False alarms add distinctive terms to the joint GFs. They are discussed in Sect. 1.7, among other things. Section 1.6 derives exact Bayesian posterior distributions for Statements A–C from the joint GFs derived in the next three subsections.

1.5.1 Statement A—Object Existence and Detection

The first statement describes an extremely spartan setting in which the logical relationship between object existence and sensor detection is modeled but without specifying either the object state or measurement.

Statement A. At most one object exists and, if it does exist, the sensor may or may not generate one measurement.

Let N and M denote the number of objects that exist and the number of measurements, respectively. Philosophers wrestle with the nature of existence, but here existence is strictly about counting: the object is said to exist if N = 1 and not to exist if N = 0. The number of objects is assumed to be a random {0, 1} integer, an assumption that justifies calling χ = Pr{N = 1} the object existence probability. The statement that at most one object exists is equivalent to Pr{N = 0} = 1 − χ and Pr{N ≥ 2} = 0. By definition (see Appendix A), the GF for N is

$$ G_N(z) = \sum_{n=0}^{\infty} \Pr\{N = n\}\, z^n = 1 - \chi + \chi z , \qquad (1.1) $$

where z is the indeterminate variable of the GF for N. The linear equation (1.1) perfectly characterizes the statement that there is at most one object, but it says nothing about the number of measurements. It is the GF of the marginal distribution of N for the joint random variable (N, M). If the object does not exist, then N = 0 and G_{M|N=0}(w) = Pr{M = 0 | N = 0} w^0 ≡ 1, where w is the indeterminate variable for M. If it does exist, then N = 1. Denote the conditional probability that it generates a measurement (i.e., is detected) by ρ = Pr{M = 1 | N = 1}. The GF of M conditioned on N = 1 is

$$ G_{M|N=1}(w) = \sum_{m=0}^{\infty} \Pr\{M = m \mid N = 1\}\, w^m = 1 - \rho + \rho w . \qquad (1.2) $$

The conditional GF is different from the GF of the marginal distribution of M, which is given by

$$ G_M(w) = 1 - \chi\rho + \chi\rho w . \qquad (1.3) $$


To see this, note that the only way that a measurement is observed is if the object exists and is detected. The probability of this event is χρ, and it is the coefficient of w. No measurements can occur in one of two ways, and they account for the coefficient of w^0 ≡ 1. More than one measurement cannot occur, so the coefficients of w^k, k ≥ 2, are zero.

The single variable GFs (1.1)–(1.3) do not capture the if-then structure of Statement A. Said differently, they do not model the relationship between N and M. What is needed is the GF of the joint variable (N, M). The joint GF is a bivariate function of z and w. By definition (Appendix A, Eq. (A.19)),

$$ \begin{aligned} G_{NM}(z, w) &= \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} \Pr\{N = n, M = m\}\, z^n w^m \\ &= (1 - \chi)\, z^0 w^0 + 0 \cdot z^0 w^1 + \chi (1 - \rho)\, z^1 w^0 + \chi\rho\, z^1 w^1 \\ &= 1 - \chi + \chi (1 - \rho) z + \chi\rho z w \\ &= 1 - \chi + \chi z (1 - \rho + \rho w) . \end{aligned} \qquad (1.4) $$

As a check, verify that G_{NM}(1, 1) = 1. The GFs of the marginal distributions of N and M are

$$ G_N(z) = G_{NM}(z, 1) \quad \text{and} \quad G_M(w) = G_{NM}(1, w) , \qquad (1.5) $$

as is verified by comparison with (1.1) and (1.3), respectively.³ In small problems like (1.4), the GF is an explicit sum over all possibilities. In large problems, closed-form expressions for the GF can sometimes be derived from the modeling assumptions. In such cases, expanding the GF into a power series about the origin gives the probabilities as the series coefficients.

The following example uses Bayesian methods to derive a GF as a composition of conditional GFs. It is a general result that applies to this problem and to other problems of the kind that often arise in tracking. Using the definition of conditional probability, the first equation in (1.4) is rewritten in the form

$$ G_{NM}(z, w) = \sum_{n=0}^{\infty} \Pr\{N = n\}\, z^n \sum_{m=0}^{\infty} \Pr\{M = m \mid N = n\}\, w^m = \sum_{n=0}^{\infty} \Pr\{N = n\}\, z^n\, G_{M|N=n}(w) . \qquad (1.6) $$

Assume now that M is the sum of N independent and identically distributed (IID) random variables. Thus, the GF of M conditioned on N = n is the product of the n identical GFs (see Appendix A), that is,

$$ G_{M|N=n}(w) = \left( G_{M|N=1}(w) \right)^n . \qquad (1.7) $$

Substituting into (1.6) gives

$$ G_{NM}(z, w) = \sum_{n=0}^{\infty} \Pr\{N = n\} \left( z\, G_{M|N=1}(w) \right)^n = G_N\!\left( z\, G_{M|N=1}(w) \right) . \qquad (1.8) $$

³ The identities (1.5) are general, see Appendix A, Eqs. (A.22) and (A.23).

Consequently, substituting (1.1) and (1.2) into (1.8) gives (1.4), as is verified by calculation. Intuitively, the composition (1.8) can be interpreted as representing the statement, “If N , then M.”
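The verification just described is easy to automate. The following sketch is illustrative only; it assumes the SymPy library and the symbol names chosen here. It builds the joint GF (1.4), confirms that it equals the composition (1.8), and reads off the marginals (1.5).

```python
import sympy as sp

z, w, chi, rho = sp.symbols('z w chi rho')

# Joint GF of (N, M) for Statement A, Eq. (1.4)
G_NM = 1 - chi + chi*z*(1 - rho + rho*w)

# Composition check, Eq. (1.8): G_NM(z, w) = G_N(z * G_{M|N=1}(w))
G_N = 1 - chi + chi*z                     # Eq. (1.1)
G_M_given_N1 = 1 - rho + rho*w            # Eq. (1.2)
composed = G_N.subs(z, z*G_M_given_N1)
assert sp.simplify(composed - G_NM) == 0

# Marginals, Eq. (1.5): set the other indeterminate to 1
assert sp.simplify(G_NM.subs(w, 1) - G_N) == 0
assert sp.simplify(G_NM.subs(z, 1) - (1 - chi*rho + chi*rho*w)) == 0   # Eq. (1.3)

# Normalization check: G_NM(1, 1) = 1
assert sp.simplify(G_NM.subs({z: 1, w: 1})) == 1
print("Statement A GF checks pass")
```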

1.5.2 Statement B—Gridded Measurements

Statement A is “data starved,” meaning that most problems are formulated in a larger context in which objects have states and sensors produce measurements of the state. Statement B adds a point measurement to Statement A, which is otherwise unchanged. Statement C of the next subsection adds both a measurement and an object state. The added measurement information is mapped into a counting problem. The mapping is done with a gridded measurement space. The same mapping works intuitively for continuous spaces, but technical details (see Appendix B) get in the way of the flow of ideas, so only gridded spaces are treated here. Later chapters in the book do not use gridded spaces.

Statement B. At most one object exists and, if it does exist, the sensor may or may not generate a random measurement Y.

The measurement space Y is partitioned into a finite number of non-overlapping grid cells labeled 1, ..., R. The measurement Y is conceptualized as the nonnegative integer-valued random vector Y_{1:R} = (Y_1, ..., Y_R), where Y_r is the number of measurements in cell r. The random number of measurements M is

$$ M = \sum_{r=1}^{R} Y_r . \qquad (1.9) $$

The object is detected or undetected according to whether M = 1 or M = 0, respectively. If M = 1, the measurement is in one and only one of the R grid cells, so one of the variables Y_r is one, and the others are zero. If M = 0, the object is undetected, and the variables Y_r are all zero. The indeterminate variables for Y_{1:R} are denoted by w_{1:R}, respectively. Since no measurement is generated when the object does not exist, N = 0, the GF of Y_{1:R} conditioned on N = 0 is G_{Y_{1:R}|N=0}(w_{1:R}) = 1.

Suppose now that N = 1, that is, the object exists and no information is known about its state. From (1.9), the probability of detection is the conditional probability

$$ \rho = \Pr\{M = 1 \mid N = 1\} . \qquad (1.10) $$


The complement, 1 − ρ = Pr{M = 0 | N = 1}, is the probability that the object is undetected even though it exists. Conditioned on object existence and detection, the probability that the measurement is in cell r is

$$ p_r = \Pr\{Y_r = 1 \mid N = 1, M = 1\} , \qquad \sum_{r=1}^{R} p_r = 1 . \qquad (1.11) $$

The GF of Y_{1:R} conditioned on N = 1 is a formidable looking sum over the 2^R possible outcomes of Y_{1:R} (see Appendix A); however, appearances are deceiving—the sum collapses to a linear function because only R + 1 events have nonzero probability. Using (1.10) and (1.11),

$$ \begin{aligned} G_{Y_{1:R}|N=1}(w_{1:R}) &\equiv \sum_{k_1, \ldots, k_R = 0}^{1} \Pr\{Y_{1:R} = k_{1:R} \mid N = 1\}\, w_1^{k_1} \cdots w_R^{k_R} \\ &= \Pr\{M = 0 \mid N = 1\} + \sum_{r=1}^{R} \Pr\{M = 1 \mid N = 1\} \Pr\{Y_r = 1 \mid N = 1, M = 1\}\, w_r \\ &= 1 - \rho + \rho \sum_{r=1}^{R} p_r w_r . \end{aligned} \qquad (1.12) $$

As a check, verify that G_{Y_{1:R}|N=1}(1_{1:R}) = 1, where 1_{1:R} is the vector of all ones. Combining the conditional GF with the joint GF (1.4) gives

$$ G_{N Y_{1:R}}(z, w_{1:R}) = 1 - \chi + \chi z \left( 1 - \rho + \rho \sum_{r=1}^{R} p_r w_r \right) . \qquad (1.13) $$

To see this, parallel the Bayesian derivation of (1.4) but replace expression (1.6) with

$$ G_{N Y_{1:R}}(z, w_{1:R}) = \sum_{n=0}^{\infty} \Pr\{N = n\}\, z^n\, G_{Y_{1:R}|N=n}(w_{1:R}) . \qquad (1.14) $$

Assuming that

$$ G_{Y_{1:R}|N=n}(w_{1:R}) = \left( G_{Y_{1:R}|N=1}(w_{1:R}) \right)^n \qquad (1.15) $$

gives

$$ G_{N Y_{1:R}}(z, w_{1:R}) = G_N\!\left( z\, G_{Y_{1:R}|N=1}(w_{1:R}) \right) , \qquad (1.16) $$

where G_N(·) is given by (1.1). Equation (1.15) holds trivially for Statement B since N is at most one. Expanding (1.16) gives (1.13).

Cross-checks and marginals give insight into the joint GF (1.13). Besides verifying that G_{N Y_{1:R}}(1, 1_{1:R}) = 1, the marginal GF of N and the cell count Y_r is found by substituting w_j = 1, j ≠ r,

$$ G_{N Y_r}(z, w_r) \equiv G_{N Y_{1:R}}(z, 1, \ldots, 1, w_r, 1, \ldots, 1) = 1 - \chi + \chi z \left( 1 - \rho p_r + \rho p_r w_r \right) . \qquad (1.17) $$


The result accords with intuition and is the same as (1.4) but with ρ replaced by ρ p_r and w by w_r. The number of measurements M is the sum (1.9). The GF of the sum (see Appendix A) is found from (1.13) by substituting w_{1:R} = w 1_{1:R}, that is, by letting w_r = w for all r. The result is, after simplifying,

$$ G_{N Y_{1:R}}(z, w 1_{1:R}) = G_{NM}(z, w) , \qquad (1.18) $$

where the right-hand side is the GF (1.4) of Statement A, as expected.
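As with Statement A, these identities can be verified symbolically. The sketch below is an illustration only; it assumes SymPy, a small grid with R = 3 cells, and symbolic cell probabilities. It constructs (1.13) and confirms the marginal (1.17) and the reduction (1.18) to the Statement A GF.

```python
import sympy as sp

z, w, chi, rho = sp.symbols('z w chi rho')
w1, w2, w3 = sp.symbols('w1 w2 w3')
p1, p2 = sp.symbols('p1 p2')
p3 = 1 - p1 - p2                       # enforce p1 + p2 + p3 = 1, Eq. (1.11)
ws, ps = (w1, w2, w3), (p1, p2, p3)

# Joint GF of (N, Y_{1:3}) for Statement B, Eq. (1.13)
G_NY = 1 - chi + chi*z*(1 - rho + rho*sum(p*wr for p, wr in zip(ps, ws)))

# Marginal GF of (N, Y_1), Eq. (1.17): set w2 = w3 = 1
G_NY1 = G_NY.subs({w2: 1, w3: 1})
assert sp.simplify(G_NY1 - (1 - chi + chi*z*(1 - rho*p1 + rho*p1*w1))) == 0

# Summing out the cells, Eq. (1.18): w1 = w2 = w3 = w recovers the GF (1.4)
G_NM = G_NY.subs({w1: w, w2: w, w3: w})
assert sp.simplify(G_NM - (1 - chi + chi*z*(1 - rho + rho*w))) == 0
print("Statement B GF checks pass")
```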

1.5.3 Statement C—Gridded Object State and the Genesis of Tracking

Statement B asserts only that an object may or may not exist, but it ignores the object’s state. Including object state and a probabilistic model that relates it to sensor measurements is fundamental to Bayesian object tracking as a statistical estimation problem.

Statement C. At most one random object exists and, if it does exist in state X, the sensor may or may not generate a random measurement Y.

The measurement counting variables Y_{1:R} and notation of Statement B are retained here. The object space X is partitioned into a finite number of non-overlapping grid cells labeled 1, ..., S. As was done with the measurement Y, the object state X is conceptualized as a nonnegative integer-valued random vector X_{1:S} = (X_1, ..., X_S), where X_s is the random number of objects in cell s. The total number of objects N is

$$ N = \sum_{s=1}^{S} X_s . \qquad (1.19) $$

The object exists or not depending on whether N = 1 or N = 0, respectively. If N = 1, the object is in one and only one of the S state cells, so one of the variables X_s is equal to one and the others are zero. If N = 0, the object does not exist and all the variables X_s are zero. The indeterminate variables for X_{1:S} are denoted, respectively, by z_{1:S} = (z_1, ..., z_S). If the object does not exist, then the GF of X_{1:S} is G_{X_{1:S}}(z_{1:S}) = 1.

Suppose now that the object exists, i.e., N = 1. From (1.19), the probability that the object exists is χ = Pr{N = 1}, and the probability that it does not is 1 − χ = Pr{N = 0}. The probability that the object is in cell s conditioned on object existence is

$$ \mu_s = \Pr\{X_s = 1 \mid N = 1\} , \qquad \sum_{s=1}^{S} \mu_s = 1 . \qquad (1.20) $$


The probabilities μ_{1:S} = (μ_1, ..., μ_S) are conditioned only on object existence, whereas the measurement probabilities (p_1, ..., p_R) given by (1.11) are conditioned on both object existence and detection. The GF for the object state X_{1:S} is the sum over all 2^S possible outcomes of X_{1:S}. As was the case with the measurement GF, the sum collapses because only S + 1 events have nonzero probability.

$$ \begin{aligned} G_{X_{1:S}}(z_{1:S}) &= \sum_{j_1, \ldots, j_S = 0}^{1} \Pr\{X_{1:S} = j_{1:S}\}\, z_1^{j_1} \cdots z_S^{j_S} \\ &= \Pr\{N = 0\} + \sum_{s=1}^{S} \Pr\{N = 1\} \Pr\{X_s = 1 \mid N = 1\}\, z_s \\ &= 1 - \chi + \chi \sum_{s=1}^{S} \mu_s z_s , \end{aligned} \qquad (1.21) $$

a result that is perfectly analogous to the measurement GF (1.12).

Conditional probabilities model the relationship between measurements and object state. The probability of detecting an object, conditional on its existence in state cell s, is

$$ \rho_s = \Pr\{M = 1 \mid X_s = 1\} , \qquad (1.22) $$

where M is defined by (1.9). If the object is detected, and it is in state cell s, then it generates a measurement in cell r with probability

$$ p_{r|s} = \Pr\{Y_r = 1 \mid X_s = 1, M = 1\} , \qquad \sum_{r=1}^{R} p_{r|s} = 1 , \quad s = 1, \ldots, S . $$

If the object is not detected, the conditional probabilities are undefined. Mimicking the derivation of the GF (1.12), conditioning not on object existence N = 1 but on the object existing specifically in cell s, that is, on X_s = 1, gives the conditional GF

$$ \begin{aligned} G_{Y_{1:R}|X_s=1}(w_{1:R}) &\equiv \sum_{k_1, \ldots, k_R = 0}^{1} \Pr\{Y_{1:R} = k_{1:R} \mid X_s = 1\}\, w_1^{k_1} \cdots w_R^{k_R} \\ &= \Pr\{M = 0 \mid X_s = 1\} + \sum_{r=1}^{R} \Pr\{M = 1 \mid X_s = 1\} \Pr\{Y_r = 1 \mid X_s = 1, M = 1\}\, w_r \\ &= 1 - \rho_s + \rho_s \sum_{r=1}^{R} p_{r|s} w_r . \end{aligned} \qquad (1.23) $$

The remaining issue is how to merge the GF for object state (1.21) and the GF for the conditional measurement (1.23). The GF of the joint random variable (X_{1:S}, Y_{1:R}) is

$$ G_{X_{1:S} Y_{1:R}}(z_{1:S}, w_{1:R}) = 1 - \chi + \chi \sum_{s=1}^{S} \mu_s z_s \left( 1 - \rho_s + \rho_s \sum_{r=1}^{R} p_{r|s} w_r \right) . \qquad (1.24) $$

It is easier to verify (1.24) than to deduce it from the definition. Verification requires showing that the monomials in (1.24) are the only terms in the definition of the GF with nonzero coefficients and, moreover, that the coefficients are the correct probabilities. The following table lists the monomials and their coefficients:


Monomial    Coefficient           Event
1           1 − χ                 Object does not exist
z_s         χ μ_s (1 − ρ_s)       Object exists, is in cell s, but is not detected
z_s w_r     χ μ_s ρ_s p_{r|s}     Object exists, is in cell s, and is detected in cell r

The first column lists the monomials with nonzero coefficients, the second lists the coefficients, and the third identifies the events to which the monomials correspond. It is self-evident that the probabilities of these events are the coefficients in the second column. No other events have nonzero probability, so (1.24) is indeed the GF of (X_{1:S}, Y_{1:R}). As a check, verify that G_{X_{1:S} Y_{1:R}}(1_{1:S}, 1_{1:R}) = 1.

A by-product of the joint GF (1.24) is that it yields explicit formulae for probabilities in Statement B. From (1.19), the joint GF of N and Y_{1:R} is found by putting z_s ≡ z for all s. Substituting z_{1:S} = z 1_{1:S} into (1.24) gives

$$ \begin{aligned} G_{N Y_{1:R}}(z, w_{1:R}) &= G_{X_{1:S} Y_{1:R}}(z 1_{1:S}, w_{1:R}) \\ &= 1 - \chi + \chi z \left( 1 - \sum_{s=1}^{S} \mu_s \rho_s + \sum_{r=1}^{R} \left( \sum_{s=1}^{S} \mu_s \rho_s p_{r|s} \right) w_r \right) . \end{aligned} \qquad (1.25) $$

Comparing the coefficients in this expression to those of (1.13) in Statement B gives the probability of detection (1.10),

$$ \rho = \sum_{s=1}^{S} \mu_s \rho_s , \qquad (1.26) $$

and the probability of a measurement occurring in cell r (1.11),

$$ p_r = \frac{\sum_{s=1}^{S} \mu_s \rho_s p_{r|s}}{\sum_{s=1}^{S} \mu_s \rho_s} . \qquad (1.27) $$

Both expressions are expected values over the object states that appear in the GF for Statement C but are absent from the GF for Statement B. Going another step in this direction, the joint GF of N and M is found by substituting w1:R = w11:R into (1.25) and using (1.26) and (1.27). The result is the joint GF (1.4), as expected.
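The chain of reductions from Statement C back to Statement B can also be checked numerically. The sketch below is illustrative only; it assumes SymPy and uses small grids (S = 2, R = 2) with arbitrary numeric probabilities chosen for the example, not taken from the book. It builds (1.24), collapses the object cells to recover (1.25), and confirms the Statement B parameters (1.26) and (1.27).

```python
import sympy as sp

S, R = 2, 2
z = sp.Symbol('z')
zs = sp.symbols('z1:3')                    # z1, z2
ws = sp.symbols('w1:3')                    # w1, w2

# Arbitrary numeric model parameters (illustrative only)
chi = sp.Rational(9, 10)
mu = [sp.Rational(3, 10), sp.Rational(7, 10)]           # sums to 1, Eq. (1.20)
rho_s = [sp.Rational(4, 5), sp.Rational(1, 2)]
p_rs = [[sp.Rational(1, 4), sp.Rational(3, 4)],          # rows sum to 1
        [sp.Rational(2, 3), sp.Rational(1, 3)]]

# Joint GF of (X_{1:S}, Y_{1:R}), Eq. (1.24)
G_XY = 1 - chi + chi*sum(
    mu[s]*zs[s]*(1 - rho_s[s] + rho_s[s]*sum(p_rs[s][r]*ws[r] for r in range(R)))
    for s in range(S))

# Collapse object cells (z_s = z for all s) to get Eq. (1.25)
G_NY = G_XY.subs({zs[0]: z, zs[1]: z})

# Statement B parameters recovered from Statement C, Eqs. (1.26) and (1.27)
rho = sum(mu[s]*rho_s[s] for s in range(S))
p_r = [sum(mu[s]*rho_s[s]*p_rs[s][r] for s in range(S))/rho for r in range(R)]
G_NY_statementB = 1 - chi + chi*z*(1 - rho + rho*sum(p_r[r]*ws[r] for r in range(R)))
assert sp.simplify(G_NY - G_NY_statementB) == 0
print("Statement C reduces to Statement B as expected")
```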

1.6 Generating Functions for Bayes Theorem

Bayesian inference is the application of Bayes theorem to obtain the posterior probability distribution of one or more random variables conditioned on the outcomes of other random variables. Discussions of Bayesian inference are widely available and are not repeated here. Section 2.2 contains a brief review of Bayes theorem. For discussion of the GF form of Bayes theorem, see [4] or Sect. A.3 of Appendix A.

The first subsection derives the GF for the Bayes posterior distribution from the joint GF of two nonnegative random integer variables N and M. The form of the Bayesian GF is particularly easy to understand in this case. The Bayesian GFs corresponding to Statements A–C are derived in the following three subsections.

The Bayesian GF is written in two notations. One uses derivatives, while the other specifies coefficients in a power series. They differ mathematically by a factorial (cf. (1.31)). The coefficient form is more readable in many problems.

1.6.1 GF of the Bayes Posterior Distribution

The posterior distribution of N conditioned on M = m is

$$ \Pr\{N = n \mid M = m\} = \frac{\Pr\{N = n, M = m\}}{\Pr\{M = m\}} , \qquad (1.28) $$

where Pr{M = m} = Σ_{n≥0} Pr{N = n, M = m} is the marginal distribution of M. The posterior corresponds to a random variable whose probability distribution is proportional to a “slice” of the joint distribution at M = m. The denominator is the normalizing constant (called the “partition function” in physics).

The numerator of (1.28) is a coefficient in a double power series expansion of the GF about the origin. Using standard multivariate calculus notation for mixed derivatives,

$$ \Pr\{N = n, M = m\} = \frac{1}{n!\, m!}\, G_{NM}^{(n,m)}(0, 0) . $$

From (1.5), G_M(w) = G_{NM}(1, w), so the denominator of (1.28) is the derivative

$$ \Pr\{M = m\} = \frac{1}{m!}\, G_M^{(m)}(0) = \frac{1}{m!}\, G_{NM}^{(0,m)}(1, 0) . $$

Dividing the numerator by the denominator gives

$$ \Pr\{N = n \mid M = m\} = \frac{1}{n!}\, \frac{G_{NM}^{(n,m)}(0, 0)}{G_{NM}^{(0,m)}(1, 0)} . \qquad (1.29) $$

Multiplying both sides by z^n and summing from n = 0 to ∞ shows that the GF of the Bayes posterior distribution (1.28) is

$$ G_{N|M=m}(z) = \frac{G_{NM}^{(0,m)}(z, 0)}{G_{NM}^{(0,m)}(1, 0)} . \qquad (1.30) $$

In words, the Bayesian GF is the normalized derivative of the joint GF. For any integers n ≥ 0 and m ≥ 0, denote the coefficient of the monomial z^n w^m in the Taylor series expansion of G_{NM}(z, w) about the origin by [z^n w^m] G_{NM}(z, w).


Since G_{NM}(z, w) is an analytic function in z and w with a radius of convergence greater than or equal to one in each variable (see Sect. A.3 of Appendix A),

$$ [z^n w^m]\, G_{NM}(z, w) \equiv \frac{1}{n!\, m!}\, G_{NM}^{(n,m)}(0, 0) . \qquad (1.31) $$

By the definition of generating function, the coefficient is Pr{N = n, M = m}. Thus,

$$ \Pr\{N = n \mid M = m\} = \frac{[z^n w^m]\, G_{NM}(z, w)}{[w^m]\, G_{NM}(1, w)} , \qquad (1.32) $$

and the GF of the Bayes posterior distribution is

$$ G_{N|M=m}(z) = \frac{[w^m]\, G_{NM}(z, w)}{[w^m]\, G_{NM}(1, w)} . \qquad (1.33) $$

This is the coefficient form of the Bayesian GF. It differs from (1.30) only in notation. The notation generalizes in a straightforward way to any number of variables.
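The coefficient form (1.33) translates directly into a few lines of computer algebra. The following sketch is illustrative only; it assumes SymPy and the helper name posterior_gf introduced here for convenience. It extracts the posterior GF of Statement A from the joint GF (1.4) for an observed measurement count m.

```python
import sympy as sp

z, w, chi, rho = sp.symbols('z w chi rho')
G_NM = 1 - chi + chi*z*(1 - rho + rho*w)      # joint GF, Eq. (1.4)

def posterior_gf(G, m):
    """GF of N conditioned on M = m via the coefficient form, Eq. (1.33)."""
    numer = sp.Poly(sp.expand(G), w).coeff_monomial(w**m)    # [w^m] G(z, w)
    denom = numer.subs(z, 1)                                  # [w^m] G(1, w)
    return sp.simplify(numer/denom)

print(posterior_gf(G_NM, 1))   # equals z: the object must exist if a measurement occurred
print(posterior_gf(G_NM, 0))   # equals (1 - chi + chi*(1 - rho)*z) / (1 - chi*rho)
```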

1.6.2 Bayes Inference in Statement A

Object existence is inferred from the number of measurements. Substituting the joint GF (1.4) into (1.30) with M = 1 and M = 0 gives, respectively,

$$ G_{N|M=1}(z) = z , \qquad G_{N|M=0}(z) = \frac{1 - \chi}{1 - \chi\rho} + \frac{\chi (1 - \rho)}{1 - \chi\rho}\, z . $$

The first equation agrees with the logic, which says that the object must exist if there is a measurement. The second equation says that the object may or may not exist if there is no measurement. In particular, the coefficient χ(1 − ρ)/(1 − χρ) is the posterior probability that the object exists.

1.6.3 Bayes Inference in Statement B

Adding the measurement changes the joint GF to (1.13). The nature of the measurement is such that at most one of the cell variables Y_r = 1. (To see this analytically, note that the mixed partial derivative of (1.13) with respect to w_r and w_s, r ≠ s, is zero.) The posterior GFs for Y_r = 1 and Y_s = 0, s ≠ r, are ratios of mixed derivatives.

The event Y_r = 1 corresponds to R individual events whose indeterminates are w_r^1 = w_r and w_κ^0 = 1, κ ≠ r. Their product is the univariate monomial w_r. Using (1.13) and the appropriate multivariate version of (1.33) gives

$$ G_{N|Y_r=1}(z) = \frac{[w_r]\, G_{N Y_1 \cdots Y_R}(z, w_{1:R})}{[w_r]\, G_{N Y_1 \cdots Y_R}(1, w_{1:R})} = \frac{\chi\rho p_r\, z}{\chi\rho p_r} = z . \qquad (1.34) $$

The event Y_1 = · · · = Y_R = 0 corresponds to R events whose indeterminates are w_r^0 = 1 for r = 1, ..., R. The product is the constant monomial 1. Using (1.13) again gives the conditional GF

$$ G_{N|Y_1 = \cdots = Y_R = 0}(z) = \frac{[1]\, G_{N Y_1 \cdots Y_R}(z, w_{1:R})}{[1]\, G_{N Y_1 \cdots Y_R}(1, w_{1:R})} = \frac{1 - \chi + \chi (1 - \rho) z}{1 - \chi\rho} . \qquad (1.35) $$

This GF is identical to the G_{N|M=0}(z) of Statement A. The derivative forms of these conditional GFs are equivalent but less readable, e.g., the numerator of (1.34) is $G_{N Y_1 \cdots Y_R}^{(0, \ldots, 1_r, \ldots, 0)}(z, 0_{1:R})$. The other derivatives are similar and are not given.

1.6.4 Bayes Inference in Statement C

Adding information about object state and its relationship to the measurement paints a different picture. The joint GF is (1.24). For $Y_r = 1$, calculation gives

$$G_{X_{1:S}|Y_r=1}(z_{1:S}) = \frac{[w_r]\, G_{X_{1:S} Y_{1:R}}(z_{1:S}, w_{1:R})}{[w_r]\, G_{X_{1:S} Y_{1:R}}(1_{1:S}, w_{1:R})} = \frac{\sum_{s=1}^{S} \mu_s \rho_s p_{r|s}\, z_s}{\sum_{s=1}^{S} \mu_s \rho_s p_{r|s}}\,. \tag{1.36}$$

A crosscheck of this result is to verify that, since $N = \sum_{s=1}^{S} X_s$, it reduces to $G_{N|Y_r=1}(z) = z$ for $z_1 = \cdots = z_S \equiv z$. (See Eq. (A.36) of Appendix A.) The coefficient of $z_s$ in (1.36) is the posterior probability that the object is in cell s,

$$\Pr\{X_s = 1 \mid Y_r = 1\} = \frac{\mu_s \rho_s p_{r|s}}{\sum_{s'=1}^{S} \mu_{s'} \rho_{s'} p_{r|s'}}\,. \tag{1.37}$$

The presence of a measurement implies the object exists, so χ is absent from (1.37). If there is no measurement, then $Y_{1:R} = 0_{1:R}$, where $0_{1:R}$ is a vector of all zeros. Using the joint GF (1.24) gives Bayes theorem,

$$G_{X_{1:S}|Y_{1:R}=0_{1:R}}(z_{1:S}) = \frac{G_{X_{1:S} Y_{1:R}}^{(0_{1:S},\,0_{1:R})}(z_{1:S}, 0_{1:R})}{G_{X_{1:S} Y_{1:R}}^{(0_{1:S},\,0_{1:R})}(1_{1:S}, 0_{1:R})} = \frac{1 - \chi + \chi \sum_{s=1}^{S} \mu_s (1-\rho_s)\, z_s}{1 - \chi + \chi \sum_{s=1}^{S} \mu_s (1-\rho_s)}\,. \tag{1.38}$$

The coefficient of $z_s$ is the probability of object existence in state cell s conditioned on no measurement,

$$\Pr\{X_s = 1 \mid Y_{1:R} = 0_{1:R}\} = \frac{\chi\, \mu_s (1-\rho_s)}{1 - \chi + \chi \sum_{s'=1}^{S} \mu_{s'} (1-\rho_{s'})}\,. \tag{1.39}$$

The constant term of (1.38) is the posterior probability that the object does not exist,

$$\Pr\{N = 0 \mid Y_{1:R} = 0_{1:R}\} = \frac{1 - \chi}{1 - \chi + \chi \sum_{s'=1}^{S} \mu_{s'} (1-\rho_{s'})}\,. \tag{1.40}$$

The last two expressions reduce to Pr{N = 1|M = 0} and Pr{N = 0|M = 0}, respectively, for S = R = 1.

1.7 Other Models of Object Existence and Detection

This section enriches the object existence and detection model in Statement A in several ways that are relevant to tracking. The examples show the power of GFs to model challenging problems and perform exact Bayesian inference. They are precursors of certain aspects of tracking filters discussed in subsequent chapters. Only the numbers of objects and measurements are considered here.

1.7.1 Multiple Object Existence Models

Adding to Statement A, suppose there are exactly k objects that independently may or may not exist, and if they do exist, each object generates at most one measurement. Assume the same existence and detection probabilities for all objects. The definitions of N and M are modified so that they denote the total numbers of objects and measurements, respectively. Mutual independence means that the joint GF is the product of the individual GFs (see Sect. A.3 of Appendix A),

$$G_{NM|k}(z, w) = \bigl(1 - \chi + \chi(1-\rho)z + \chi\rho z w\bigr)^k = \bigl(G_{NM|k=1}(z, w)\bigr)^k\,. \tag{1.41}$$

Note that $G_{NM|k=1}(z, w)$ is identical to (1.4). Expanding (1.41) in a double power series about the origin gives, since the power series terminates at k terms,

$$G_{NM|k}(z, w) = \sum_{n=0}^{k} \sum_{m=0}^{k} \Pr\{N = n, M = m \mid k\}\, z^n w^m\,. \tag{1.42}$$

The series coefficients can be found using symbolic algebra packages. Asymptotic methods are necessary for very large values of k.


Example 1 Let k = 20, so at most 20 objects exist. The existence and detection probabilities are the same for all object models, χ = 0.9 and ρ = 0.8. Then $G_{NM|k=20}(z, w) = (0.1 + 0.18z + 0.72zw)^{20}$. If there are M = 15 measurements, then the GF of the Bayes posterior distribution on the number of objects, N, is

$$G_{N|M=15,k=20}(z) = \frac{[w^{15}]\, G_{NM|k=20}(z, w)}{[w^{15}]\, G_{NM|k=20}(1, w)} = z^{15} \left(\frac{0.1 + 0.18z}{0.28}\right)^{5}$$
$$= 0.00581 z^{15} + 0.0523 z^{16} + 0.188 z^{17} + 0.339 z^{18} + 0.305 z^{19} + 0.110 z^{20}\,,$$

where the coefficients are rounded to three significant digits. There are 15 measurements, and objects have a nonzero detection probability; hence, powers of z lower than 15 are absent. The posterior probability that all 20 objects exist is 0.110, or 11.0%. The most likely number of objects is 18, which has a probability of 33.9%.
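As noted above, the series coefficients can be found with a symbolic algebra package. The short sketch below is illustrative only; it uses the parameter values of Example 1 (χ = 0.9, ρ = 0.8, k = 20, M = 15) and reproduces the quoted posterior coefficients.

```python
# Reproducing Example 1: expand G(z, w) = (0.1 + 0.18 z + 0.72 z w)**20 and condition on M = 15.
import sympy as sp

z, w = sp.symbols('z w')
G = (sp.Rational(1, 10) + sp.Rational(18, 100)*z + sp.Rational(72, 100)*z*w)**20

num = sp.expand(G).coeff(w, 15)          # [w^15] G(z, w), a polynomial in z
post = sp.expand(num / num.subs(z, 1))   # posterior GF G_{N|M=15,k=20}(z)

for n in range(15, 21):
    print(n, float(post.coeff(z, n)))    # 0.00581, 0.0523, 0.188, 0.339, 0.305, 0.110
```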

1.7.2 Random Number of Object Existence Models

In this section, another twist is added to Statement A. Suppose that the number of object-measurement existence models in the previous section is a random variable, K. Let $G_K(v) = \sum_{k \ge 0} \Pr\{K = k\}\, v^k$ denote its GF, where v is the indeterminate variable. It is assumed that object existence and detection are independent of the number of object models. Multiplying (1.41) by $\Pr\{K = k\}\, v^k$ and adding gives the GF of (K, N, M) as

$$G_{KNM}(v, z, w) = \sum_{k=0}^{\infty} \Pr\{K = k\}\, G_{NM|K=k}(z, w)\, v^k = \sum_{k=0}^{\infty} \Pr\{K = k\}\, \bigl[v\,(1 - \chi + \chi(1-\rho)z + \chi\rho z w)\bigr]^k$$
$$= G_K\bigl(v\,(1 - \chi + \chi(1-\rho)z + \chi\rho z w)\bigr) = G_K\bigl(v\, G_{NM|K=1}(z, w)\bigr)\,. \tag{1.43}$$

Continuing the composition is rather fun—using the expressions (1.41) with k = 1 and (1.8) gives

$$G_{KNM}(v, z, w) = G_K\Bigl(v\, G_N\bigl(z\, G_{M|N=1}(w)\bigr)\Bigr)\,. \tag{1.44}$$


As a check, note that $G_{KNM}(1, 1, 1) = 1$. Compositions of GFs are a frequent occurrence in multiple object tracking problems. If K is Poisson distributed with expected number $\Lambda_O$, then $\Pr\{K = k\} = e^{-\Lambda_O} \Lambda_O^k / k!$ and

$$G_K(v) = \sum_{k=0}^{\infty} \frac{e^{-\Lambda_O} \Lambda_O^k}{k!}\, v^k = \exp(-\Lambda_O + \Lambda_O v)\,. \tag{1.45}$$

Substituting into (1.43) gives the GF of (K, N, M) with Poisson distributed K as

$$G_{KNM}(v, z, w) = \exp\bigl(-\Lambda_O + \Lambda_O (1-\chi) v + \chi \Lambda_O (1-\rho) v z + \chi \Lambda_O \rho\, v z w\bigr)\,. \tag{1.46}$$

The probabilities $\Pr\{K = k, N = n, M = m\}$ are the coefficients of the multivariate power series of (1.46) expanded about the origin. The GF of the marginal distribution of (N, M) is found by putting v = 1. Thus,

$$G_{NM}(z, w) = G_K\Bigl(G_N\bigl(z\, G_{M|N=1}(w)\bigr)\Bigr) = \exp\bigl(-\chi \Lambda_O + \chi \Lambda_O (1-\rho) z + \chi \Lambda_O \rho\, z w\bigr) \tag{1.47}$$

is the GF of the behavior of the system when the number of object models is random.

Example 2 The parameters are the same as in Example 1 of Sect. 1.7.1 except that the number of object models is changed from exactly 20 to Poisson distributed with a mean of $\Lambda_O = 20$. The GF is, using (1.47),

$$G_{NM}(z, w) = \exp(-18 + 3.6z + 14.4wz)\,.$$

Given the same number of measurements as in Example 1, namely 15, the GF of the Bayes posterior distribution and its series expansion are

$$G_{N|M=15}(z) = \frac{[w^{15}]\, G_{NM}(z, w)}{[w^{15}]\, G_{NM}(1, w)} = z^{15} e^{3.6(z-1)}$$
$$= 0.027 z^{15} + 0.098 z^{16} + 0.18 z^{17} + 0.21 z^{18} + 0.19 z^{19} + 0.14 z^{20} + 0.083 z^{21} + 0.042 z^{22} + 0.019 z^{23} + 0.0076 z^{24} + \cdots\,.$$

Coefficients are rounded to two significant digits. As in Example 1, there are at least 15 objects, but now—with a random number of models—any number of objects can exist as long as only 15 are detected. The probability that 23 objects exist, for example, is 0.019, or 1.9%. The most likely number of objects is 18, the same as in Example 1, but the uncertainty is evidently larger, that is, the probability of 18 objects is reduced to 21%, which is a third less than the 33.9% in Example 1, and the posterior distribution is flatter.
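Because the posterior GF in Example 2 is $z^{15} e^{3.6(z-1)}$, the posterior number of objects is 15 plus a Poisson random variable with mean 3.6. A two-line check (illustrative only; it uses scipy's Poisson pmf) reproduces the quoted coefficients.

```python
# Example 2 check: the coefficient of z^(15+j) in z^15 * exp(3.6*(z-1)) is the Poisson(3.6) pmf at j.
from scipy.stats import poisson

for j in range(10):
    print(15 + j, round(poisson.pmf(j, 3.6), 3))   # 0.027, 0.098, 0.177, 0.212, 0.191, ...
```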


1.7.3 False Alarms

False alarms, or clutter, are measurements that correspond to no object; they are "insertions" into the measurement set. They arise in many different applications. As mentioned in Sect. 1.3, they are often statistical artifacts of the sensor signal processor caused by low SNR. In some applications, such as electronic sensing devices, they are a residual (physical) background process known as "dark current." No matter how it arises, the false alarm process is assumed to be independent of object and object-measurement processes. Suppose the number of false alarms is Poisson distributed with mean number $\Lambda_C$, so its GF is $G_C(w) = \exp(-\Lambda_C + \Lambda_C w)$. The indeterminate variable w is used because false alarms are added to the total measurement count. False alarms change the definition of the number of measurements—M is now the sum of the number of measurements generated by objects and the number that are false alarms. With this new definition of M, the GF of the sum of independent processes is the product of their GFs (see Appendix B). With a Poisson distributed object process, (1.47) becomes

$$G_{NM}(z, w) = G_C(w)\, G_{NM}(z, w) = \exp\bigl(-\Lambda_C + \Lambda_C w - \chi \Lambda_O + \chi \Lambda_O (1-\rho) z + \chi \Lambda_O \rho\, z w\bigr)\,. \tag{1.48}$$

The probabilities $\Pr\{N = n, M = m\}$ are given by the coefficients of the bivariate power series of (1.48) expanded about the origin.

Example 3 The parameters of Example 2 of Sect. 1.7.2 are unchanged, but now the model is extended to include Poisson distributed false alarms with mean number $\Lambda_C = 3$. The joint GF is, using (1.48),

$$G_{NM}(z, w) = \exp(-21 + 3.6z + 3w + 14.4wz)\,. \tag{1.49}$$

Conditioning on 15 measurements gives the GF of the Bayes posterior distribution,

$$G_{N|M=15}(z) = \frac{[w^{15}]\, G_{NM}(z, w)}{[w^{15}]\, G_{NM}(1, w)} = \left(\frac{3 + 14.4z}{17.4}\right)^{15} e^{3.6(z-1)} \tag{1.50}$$
$$= \cdots + 0.017 z^{11} + 0.040 z^{12} + 0.078 z^{13} + 0.12 z^{14} + 0.16 z^{15} + 0.17 z^{16} + 0.15 z^{17} + 0.11 z^{18} + 0.072 z^{19} + 0.040 z^{20} + 0.020 z^{21} + \cdots\,.$$

Coefficients are rounded to two significant digits. Terms with coefficients less than 0.01, or 1%, are not listed. Fewer than 15 objects are possible in this example because there is a nonzero probability that all measurements are false alarms. The maximum likelihood estimate of the number of objects is now 16, and the uncertainty in the estimate is larger than in Example 2, that is, the 17% probability of 16 objects is less than the 21% probability of 18 objects in Example 2.

The Bayesian GF (1.50) encodes the number of objects. It is a product of 16 terms, each of which is itself a GF; hence, the posterior distribution is the sum of 16


independent random variables (see Appendix A). One is Poisson distributed with a mean of 3.6 objects. The other 15 are Bernoulli variables with success probability 14.4/17.4 ≈ 0.83. The Bayesian GFs of Examples 1 and 2 are products of GFs and, therefore, are the GFs of sums of independent random variables. (In the chemistry analogy above, the “elements” are the GFs of Bernoulli and Poisson variates.)
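The decomposition just described can be checked numerically: the posterior object count in Example 3 is the independent sum of a Poisson variable with mean 3.6 and a binomial variable with 15 trials and success probability 14.4/17.4. The sketch below is illustrative only; it convolves the two pmfs with numpy and scipy and recovers the coefficients listed under (1.50).

```python
# Example 3 check: posterior N = Poisson(3.6) + Binomial(15, 14.4/17.4), per the factorization of (1.50).
import numpy as np
from scipy.stats import poisson, binom

n_max = 40
pois = poisson.pmf(np.arange(n_max + 1), 3.6)
bino = binom.pmf(np.arange(16), 15, 14.4 / 17.4)
post = np.convolve(bino, pois)                 # pmf of the independent sum

for n in range(11, 22):
    print(n, round(post[n], 3))                # 0.017, 0.040, 0.078, 0.12, 0.16, 0.17, 0.15, ...
```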

1.8 Organization of the Book The rest of the book adds meat to the bare bones outlined in this chapter to show that AC is well suited to model diverse problems in multiple object tracking. Much of the discussion will be novel to readers unfamiliar with AC and GFs, so a relaxed writing style is used throughout the book. The emphasis on constructive mathematical methods and algorithms means that unnecessary abstractions and details are relegated to the references. The goal is to build the intuition and insight needed by practitioners for independent study. The book is largely self-contained. The book proceeds in stages. Chapter 2 is all about the classic Bayes-Markov filter and the well-known family of PDA (probabilistic data association) and IPDA (integrated PDA) filters. These are single-object filters, and the focus is on how to formulate and derive them using generating functions and the AC method. Chapter 3 extends these methods to the JPDA (joint PDA) and JIPDA (joint IPDA) filters for tracking multiple objects, assuming that the number of objects is known. These chapters are best read as a “bridge” between very different combinatorial styles—the standard enumerative method and the methods of AC. Chapter 4 is devoted to a family of superpositional, or intensity, filters called CPHD (cardinalized probability hypothesis density) filters. They are based on cluster point processes. The connection to the traditional JPDA filter is clearly and convincingly revealed in two simple steps. The first step applies superposition to the JPDA filter. This step has many lively implications that are discussed carefully. The superposed JPDA filter is called the JPDAS filter. The second step assumes the number of objects in JPDAS is a random integer with a known GF. The result is the CPHD filter. It is called the PHD filter if the number of objects is Poisson distributed. The mathematical forms of the generating functions of the CPHD and the JPDAS filters show clearly the similarity of these filters and, at the same time, sharply delineate the differences between them. Chapter 5 is devoted to a family of intensity filters based on multi-Bernoulli (MB) point processes. Loosely speaking, they are more structured than CPHD filters. There are several varieties. The simplest MB filter is shown to be identical to the JIPDA filter with object superposition, i.e., to the JIPDAS filter. MB mixture (MBM) filters are a much richer class of filters. Their association hypotheses are the same as that of a multiple hypothesis tracking (MHT) filter, but the probability structures differ because of object superposition. Labeled MBM (LMBM) filters add labels to preserve track continuity from scan to scan. In AC generating function terms, labels


are additional indeterminate variables. It is shown that the LMBM filter is equivalent to an MHT filter in which objects have the same state space. Chapter 6 begins with a road map showing the relationships between the filters derived by AC in this book. AC enables established mathematical methods—new to tracking—to be brought to bear on high computational complexity problems. Chapter 6 outlines these possibilities, along with other directions for future research. The appendices are written in a relaxed and pedagogical style, with an emphasis on insight and understanding. Appendix A is a tutorial on GFs for random variables on the nonnegative integers. Both univariate and multivariate GFs are discussed. Appendix B transitions AC methods from GFs for discrete random variables to similar concepts for finite point processes. Appendix C gives an overview of several mathematical methods used in the book.

References

1. Herbert S. Wilf. generatingfunctionology. 3rd Ed. AK Peters/CRC Press, 2005.
2. Lawrence D. Stone, Roy L. Streit, Thomas L. Corwin, and Kristine L. Bell. Bayesian Multiple Target Tracking. 2nd Ed. Artech House, 2013.
3. Oliver E. Drummond. Multiple-Object Estimation. Ph.D. Thesis, University of California, Los Angeles, USA, 1975.
4. Norman L. Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan. Discrete Multivariate Distributions. Wiley, New York, 1997.

Chapter 2

Tracking One Object

“Great things are not done by impulse, but by a series of small things brought together.” Vincent Van Gogh, in a letter to Theo van Gogh

Abstract The classic Bayes-Markov (BM), probabilistic data association (PDA), and integrated PDA (IPDA) filters are derived using the analytic combinatorics (AC) method. The probability generating functional (GFL) of the BM filter is an integral, obtained as the limit of a Riemann sum of a discrete problem discussed in Chap. 1. The GFL of the PDA filter is the product of two GFLs, one for a possibly undetected but always present BM object, and the other for point process clutter. The GFL of the IPDA filter is similar, but includes object existence as a combinatorial option. The AC approach clarifies the combinatorial structure of the IPDA filter. The clarity of the AC approach also reveals a previously unrecognized approximation in PDA with gating. The real benefits of AC are to be found in the much more complex tracking problems discussed in later chapters—their GFLs are built around the single-object GFLs developed in this chapter. Keywords Bayes theorem · Bayes-Markov (BM) filter · PDA filter · IPDA filter · Gating · Analytic combinatorics · Generating function · Generating functional

2.1 Introduction

Measurement-to-object assignment problems arise in single-object tracking problems when clutter is involved. Assignment problems are combinatorial problems, as discussed in the first part of Chap. 1. Practical implementations of tracking filters with assignment problems must—at some stage—solve or approximate the solution of a combinatorial problem. The generating function (GF) method of AC discussed in Chap. 1 is an exact model for assignment problems. Moreover, the ratio of derivatives of the GF gives


the exact Bayesian filter. As will be seen in later chapters of this book, AC methods are very general and provide broadly useful techniques applicable to many filters. Several well-known single-object tracking filters are presented in this chapter. They come in several flavors, but all (except the first) have a combinatorial component that arises either from an assumption about the number of measurements an object can generate in the sensor, or whether or not the object even exists. In single-object, single-scan tracking problems, explicit enumeration of the combinatorial model is typically straightforward, and, consequently, the utility of the AC method in these filters is minimal. This very simplicity, however, provides a gentle introduction to the AC techniques that are the foundation of this book. The AC method is, at its core, essentially the same for all filters addressed in the book. In addition to their pedagogical value, the GFs introduced in this chapter are fundamental building blocks that reappear in more sophisticated problems presented in later chapters. The journey begins by introducing the GFL of Bayes theorem in Sect. 2.2. The GFL is mathematically equivalent to Bayes theorem. The Bayes-Markov (BM), the probabilistic data association (PDA), and the integrated PDA (IPDA) filters follow from this result, as seen in Sects. 2.4–2.6, respectively. Gating is incorporated into the GFL via conditioning. A previously unrecognized approximation in PDA with gating is given in Sect. 2.5.5. The results support the chemistry analogy mentioned in Sect. 1.2 of Chap. 1 that interprets the GFL of Bayes theorem as the key “element” around which the GFLs of other “molecular” filters are built. The PDA and IPDA filters accommodate general BM object motion models, although they are often stated with linear-Gaussian assumptions. These familiar special forms are given in Sect. 2.7. The illustrative example for the IPDA filter given in Sect. 2.8 uses linear-Gaussian models, but only for convenience. Appendices A and B review analytic combinatorics and generating functions. These appendices eschew unnecessary abstractions and generalizations in favor of gaining intuition and understanding but, notwithstanding, they contain a lot material that is not needed in this chapter. Readers should consult them as needed. Those with no background in AC are encouraged to read Sects. A.1–A.3 of Appendix A, and Sects. B.1 and B.2 of Appendix B.

2.2 AC and Bayes Theorem Bayes theorem is a sound mathematical foundation for the accumulation of evidence over time, which makes it well suited to object tracking problems. It is briefly reviewed here. Readers seeking greater mathematical formality can find it in any of a number of excellent texts, e.g., [1, 2]. Readers seeking to learn its rich and storied history will find it in [3]. Let x ∈ X and y ∈ Y represent two events for which a joint PDF p(x, y) on X × Y is known. Bayes theorem states that


$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}\,, \tag{2.1}$$

where p(x) is known as the prior PDF, while p(x | y) is the posterior PDF. Both are defined on X. The prior characterizes probabilistically all knowledge of x at a time prior to assimilating the event y. The function p(y | x) is a PDF for y ∈ Y when x is fixed. However, in many problems, y is considered fixed (because, e.g., y is a specified measurement), and p(y | x) is treated as a function of x. In this case, p(y | x) is called the likelihood function of x. The posterior incorporates y into the prior to update probabilistic knowledge of x. The denominator p(y) is the normalizing factor1 for the numerator  of (2.1) that makes the posterior a true PDF in x for fixed y. Explicitly, p(y) = X p(y | x) p(x) dx. As y is fixed, the normalizing factor is a constant, and it is often convenient to think of the posterior as the scaled product of the prior and the likelihood, p(x | y) ∝ p(y | x) p(x). Part of the beauty of the Bayesian method is that conditionally independent measurements can be processed sequentially as they arrive (rather than in batch mode), and there is no need to reprocess old data once they are already processed. Indeed, assuming y1 and y2 are conditionally independent given x, p(x | y1 , y2 ) ∝ p(y1 , y2 | x) p(x) = p(y2 | x) p(y1 | x) p(x) ∝ p(y2 | x) p(x | y1 ) ,

(2.2)

where the equality follows from the definition of conditional independence. Sequential processing of y1 and y2 is justified by noting that (2.2) is of the same form as (2.1), but with p(x | y1) in the role of the prior distribution before incorporating y2.

GFL for Bayes Theorem. The GF of Statement C in Chap. 1 with χ = 1 and ρs ≡ 1 for all s is, from (1.24),

$$G_{X_{1:S} Y_{1:R}}(z_{1:S}, w_{1:R}) = \sum_{s=1}^{S} \mu_s z_s \left( \sum_{r=1}^{R} p_{r|s}\, w_r \right). \tag{2.3}$$

Recall that $\{z_s\}_{s=1}^{S}$ and $\{w_r\}_{r=1}^{R}$ are indeterminate variables, and that the numbers $\{\mu_s\}_{s=1}^{S}$ and $\{p_{r|s}\}_{r,s=1}^{R,S}$ are probabilities. With appropriate definitions for μs and pr|s, as well as for the indeterminates, the expression (2.3) is a Riemann sum. Let X denote the union of the S grid cells, and Y the union of the R grid cells. These cells form a partition for a Riemann sum for two nested single integrals. The inner integral is over Y, and the outer one is over X. For simplicity, assume the cells in X are the same size, and similarly for the cells in Y. Denote the sizes of the cells by $\Delta_X$ and $\Delta_Y$. Let $x_s$ and $y_r$ denote specified points in the grid cells comprising X and Y, respectively. Let

¹ It is called the "partition function" in statistical physics.


$$\mu_s = p^{\mathrm{step}}(x_s)\, \Delta_X \qquad p_{r|s} = p^{\mathrm{step}}(y_r \mid x_s)\, \Delta_Y \qquad z_s = h(x_s) \qquad w_r = g(y_r)\,.$$

The PDFs $p^{\mathrm{step}}(x)$ and $p^{\mathrm{step}}(y|x)$ are taken to be step functions on X and Y, respectively, with steps that coincide with the grid cells; hence, multiplying them by cell size gives probabilities. Substituting these expressions into (2.3) gives

$$G_{XY}\bigl(h(x_{1:S}), g(y_{1:R})\bigr) = \sum_{s=1}^{S} h(x_s)\, p^{\mathrm{step}}(x_s) \left( \sum_{r=1}^{R} g(y_r)\, p^{\mathrm{step}}(y_r \mid x_s)\, \Delta_Y \right) \Delta_X\,.$$

Fix X and Y. Then $\Delta_X \to 0$ and $\Delta_Y \to 0$ as the numbers of cells R → ∞ and S → ∞. The "small cell limit" of the Riemann sum is the nested integral

$$\Psi^{\mathrm{BM}}(h, g) = \int_X h(x)\, p(x) \left( \int_Y g(y)\, p(y \mid x)\, dy \right) dx \tag{2.4}$$
$$= \int_X \int_Y h(x)\, g(y)\, p(x)\, p(y \mid x)\, dy\, dx\,, \tag{2.5}$$

where in the limit the step functions p step (x) → p(x) and p step (y|x) → p(y|x). See Sect. B.2 of Appendix B for a more detailed discussion of the limit. This expression is the GFL of one sample of a bivariate PDF p(x, y). It is equivalent to a singleton point process that generates exactly one point (x, y) ∈ X × Y. Bayes theorem is derived from it as a ratio of derivatives of  BM (h, g); see Sect. B.8 of Appendix B for details (or see below in this chapter, Sect. 2.4.3). Without doubt, the GFL approach to Bayes theorem employs a lot of mathematical machinery. The added value of the GFL approach shows itself in later chapters when the event spaces are vastly more complex, and the GFLs that characterize them are highly nontrivial. The GFL (2.5) is remarkable in one very specific way—it is a complete mathematical statement of the problem. The result is embedded in a much larger event space that comprises arbitrarily large (finite) numbers of x’s and y’s. This particular GFL implies that the only events of this larger space with nonzero probability are those with exactly one x and one y. In the more challenging problems treated in later chapters, it is very advantageous mathematically to incorporate into the equations assumptions that must otherwise be stated in words.

2.3 Setting the Stage

The goal of Bayesian filtering is to estimate the posterior distribution of the current state of the object, given all measurements up to and including the current time.


The dynamics of the object, i.e., its evolution over time, is modeled as a function of the state. The state is represented as a numerical vector, typically with kinematic components, e.g., position and velocity. Phrases such as “object motion,” “object dynamics,” and “motion model” are interpreted as changes in object state, regardless of the actual nature of the state vector. The object moves in continuous time and, thus, has a well-defined state x(t) at all times t in some nonempty closed time interval. The state space X is the collection of all possible states. Unless otherwise stated, in the remainder of the book it is assumed that X is a continuous region of interest, i.e., an open subset of Rdim(X) . An individual sensor measurement y is a point in the measurement space Y. It is typically found by “peak picking,” i.e., by thresholding the sensor response surface. Responses that are above threshold are interpolated (by the sensor signal processor) to estimate one or more local “peaks.” The locations of the peaks are the point measurements. A “missed object detection” is said to occur when the entire response surface is below threshold, so that no point is returned. Losses and trade-offs are associated with thresholding. For example, setting the threshold low increases the object detection probability at the cost of increasing the clutter density. Unless otherwise stated, it is assumed that the measurement space Y is a continuous space; typically, Y is a specified open subset of the vector space Rdim(Y) . The spaces X and Y need not be copies of the same space; for example, when tracking spatial position in latitude-longitude space via bearings-only measurements, X is a subset of R2 , while Y is a subset of R1 . A scan measurement set is a collection of sensor point measurements made in a given time interval called a scan. Sensor scans form a sequence of non-overlapping time intervals. The scan time is defined to be the last time in the interval. Due to peak picking, a scan measurement set can be empty. It is assumed that (i) each measurement in a scan measurement set is associated with at most one object, and (ii) at most one measurement in the set is due to the object of interest. Scan measurement sets are available at the scan times t1 < t 2 < · · · < t K ,

(2.6)

for some K ≥ 1. The time t0 , where t0 < t1 , is a reference time at which no measurements are available. For k = 0, 1, . . . , K , the corresponding (ground truth) object state is denoted xk ≡ x(tk ) ∈ X. An object that generates a measurement in Y is said to be detected (by the sensor); if not, it is said to be undetected—the sensor is said to have “missed” a detection. The probability of detecting an object in state x in scan k is denoted Pdk (x). Spurious measurements unrelated to an object are called by several names. False alarms and “clutter” are perhaps the most common. The number and locations of the clutter points vary from scan to scan. In this book, clutter is modeled as a finite point process (Appendix B).


2.4 Bayes-Markov Single-Object Filter

2.4.1 BM: Assumptions

Discussions of the classical Bayes-Markov filter and tracking are widely available in venerable older texts, e.g., [4–6], as well as more recent ones, e.g., [7–11]. The filter is derived from the joint probability distribution, whose general (non-parametric) form is determined by the following modeling assumptions:

Continuous object existence. Exactly one object is present at all times t in the continuous time interval [t0, tK].

Object is causally independent of the sensor. Objects "cause" measurements in the sensor, but not vice versa.²

Markovian state evolution. Object state is a continuous-time Markov process on the state space X.

Known object prior. The prior PDF μ0(x0) for object state x0 ∈ X at reference time t0 is specified.

Measurements are sequentially independent, conditional on object state. For tj and tk, with j ≠ k, the sensor measurements yj and yk are independent, conditional on the object states xj ∈ X and xk ∈ X.

Exactly one object measurement per scan. The sensor generates exactly one object measurement yk ∈ Y at time tk, k ≥ 1. (No missed detections.)

No clutter. There are no spurious sensor measurements in Y unrelated to the object.

These assumptions are severe in that they virtually eliminate combinatorial issues from the tracking problem. If peak picking is used to estimate point measurements, BM filters are useful for high SNR problems in which only one peak of the sensor response surface exceeds threshold. Lower SNR problems require lower thresholds, which leads to practical concerns (e.g., clutter) that introduce combinatorial elements to the tracking problem. These concerns are outside the purview of the BM filter.

Recursive Bayesian filtering is Bayes theorem with an additional probability propagation step to handle the time evolution of object state. The simple result (2.2) is the apparatus which enables sequential, or recursive, updating of the posterior PDF as measurements are received. Three classes of functions are assumed known:

Prior object state PDF: $\mu_0(x_0)$, (2.7)
Markovian object motion models: $\{p_k(x_k \mid x_{k-1})\}_{k=1}^{K}$, (2.8)
Measurement likelihoods: $\{p_k(y_k \mid x_k)\}_{k=1}^{K}$. (2.9)

² More precisely, object state is an a priori random variable (i.e., parent node) in causal Bayesian network representations of the joint object-sensor probability distribution [12–15].


The prior μ0 (x0 ) is a PDF that characterizes knowledge of the object state x0 at reference time t0 . Intuitively, it is the initial condition for the stochastic motion models (2.8) that describe the evolution of object state from scan to scan. The Markovian assumption is equivalent to pk (xk | x0:k−1 , y1:k−1 ) = pk (xk | xk−1 ) .

(2.10)

In other words, given all past states and measurements, the current state xk depends only on the object state xk−1 at the previous time step tk−1 . The measurement likelihoods relate the current measurement yk to the current object state xk . The conditional independence assumption says that pk (yk | x0:k , y1:k−1 ) = pk (yk | xk ) .

(2.11)

Thus, given the current object state xk, the corresponding measurement yk is conditionally independent of all past object states and measurements. These assumptions allow for a recursive algorithm in which, at each iteration, only the current measurement is used to update knowledge of object state. The recursion for the BM filter is formally defined by Algorithm 2.1:

ALGORITHM 2.1. Bayes-Markov filter for single-object tracking

Given:
• Prior PDF at reference time t0: μ0(x0), x0 ∈ X
• Measurements at each scan: y1, . . . , yK, yk ∈ Y

For iteration k = 1, . . . , K and xk ∈ X:
• Predicted object state: $\mu_k^-(x_k) = \int_X p_k(x_k \mid x_{k-1})\, \mu_{k-1}(x_{k-1})\, dx_{k-1}$
• Predicted measurement: $\rho_k(y_k) = \int_X p_k(y_k \mid x_k)\, \mu_k^-(x_k)\, dx_k$
• Posterior PDF update: $p_k(x_k \mid y_k) = \dfrac{p_k(y_k \mid x_k)\, \mu_k^-(x_k)}{\rho_k(y_k)}$
• Posterior PDF is the prior PDF for next iteration: $\mu_k(x_k) \equiv p_k(x_k \mid y_k)$

The posterior update is also known as the measurement update and the information update. The recursion notation does not show it explicitly, but the information update pk(xk | yk) is more accurately written as pk(xk | y1:k). In practice, it is necessary to parameterize the PDFs in the recursion in some way for calculation. For example, under linear-Gaussian parametric assumptions, the Bayes-Markov filter is exactly equivalent to the Kalman filter. Different parameterizations lead to the important topic of "closing the Bayesian recursion" to prevent the


number of parameters in the posterior from growing without bound. See Sects. 2.5.4 and 2.6.4 for further discussion of these topics. The goal of the BM filter is to compute the posterior PDF, not to compute point estimates. Nonetheless, applications often call for point estimates of object state, together with an accuracy estimate. These can be computed from the posterior PDF in various ways, e.g., by choosing the point estimate to be the global maximum of the posterior with an elliptical area of uncertainty (AOU) determined by the observed information at the maximum. These topics lie outside the scope of this book.
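The integrals in Algorithm 2.1 are evaluated either in closed form (the Kalman filter of Sect. 2.7.1) or numerically. The sketch below is a minimal, hypothetical discretization of the recursion on a one-dimensional grid; the random-walk motion model and Gaussian likelihood are assumed purely for illustration and are not from the book's examples.

```python
# A minimal grid (histogram) implementation of the Bayes-Markov recursion of Algorithm 2.1.
# Assumptions (illustration only): 1-D state, random-walk motion with std q, Gaussian likelihood with std r.
import numpy as np

def gaussian(u, sigma):
    return np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bm_filter(grid, prior, measurements, q=1.0, r=0.5):
    dx = grid[1] - grid[0]
    trans = gaussian(grid[:, None] - grid[None, :], q)   # p_k(x_k | x_{k-1}) on the grid
    mu = prior.copy()
    for y in measurements:
        mu_pred = trans @ mu * dx                        # predicted object state PDF
        like = gaussian(y - grid, r)                     # measurement likelihood p_k(y_k | x_k)
        rho = np.sum(like * mu_pred) * dx                # predicted measurement density
        mu = like * mu_pred / rho                        # posterior, reused as the next prior
    return mu

grid = np.linspace(-10, 10, 401)
prior = gaussian(grid, 2.0)                              # mu_0
posterior = bm_filter(grid, prior, measurements=[0.4, 0.9, 1.3])
print(grid[np.argmax(posterior)])                        # MAP estimate of the final state
```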

2.4.2 BM: Generating Functional

The Bayes-Markov joint object state-measurement PDF at scan k is $p_k(x_k, y_k) = \mu_k^-(x_k)\, p_k(y_k \mid x_k)$, where the prior PDF is the predicted state PDF $\mu_k^-(\cdot)$. Its bivariate GFL is, using (2.5) with p(x) replaced by $\mu_k^-(x_k)$,

$$\Psi_k^{\mathrm{BM}}(h_k, g_k) = \int_X \int_Y h_k(x_k)\, g_k(y_k)\, \mu_k^-(x_k)\, p_k(y_k \mid x_k)\, dy_k\, dx_k\,, \tag{2.12}$$

where hk(·) and gk(·) are the indeterminate functions at scan k. The fact that the object is always present and produces exactly one measurement per scan is implicit in (2.12) as the GFL is bilinear in hk and gk. Indeed, setting hk(·) ≡ z and gk(·) ≡ w for indeterminates z and w gives $\Psi_k^{\mathrm{BM}}(z, w) = zw$. In what follows, the subscripts k are dropped from indeterminate functions h(·) and g(·), as well as "dummy" integration variables x and y.

2.4.3 BM: Exact Bayesian Posterior Distribution

The "secular" method described in Sect. B.5 of Appendix B is used to differentiate GFLs. Given the measurement yk ∈ Y, let

$$g_\delta(y) = \beta\, \delta_{y_k}(y)\,, \tag{2.13}$$

where β ∈ R and $\delta_{y_k}(y)$ is the Dirac delta at the point yk. Define

$$\Psi_k^{\mathrm{BM}}(h, \beta) \equiv \Psi_k^{\mathrm{BM}}(h, g_\delta) = \beta \int_X h(x)\, \mu_k^-(x)\, p_k(y_k \mid x)\, dx\,. \tag{2.14}$$

The GFL of the Bayesian posterior process is the normalized derivative (see (B.67))




$$\Psi_k^{\mathrm{BM}}(h \mid y_k) = \frac{\frac{d}{d\beta}\, \Psi_k^{\mathrm{BM}}(h, \beta)\big|_{\beta=0}}{\frac{d}{d\beta}\, \Psi_k^{\mathrm{BM}}(1, \beta)\big|_{\beta=0}} = \frac{\int_X h(x)\, \mu_k^-(x)\, p_k(y_k \mid x)\, dx}{\int_X \mu_k^-(x)\, p_k(y_k \mid x)\, dx}\,. \tag{2.15}$$

Let xk be an arbitrary point in X, and define

$$h_\delta(x) = \alpha\, \delta_{x_k}(x)\,, \tag{2.16}$$

where α ∈ R. The Bayesian posterior PDF is the derivative

$$p_k(x_k \mid y_k) = \frac{d}{d\alpha}\, \Psi_k^{\mathrm{BM}}(h_\delta \mid y_k)\Big|_{\alpha=0} = \frac{d}{d\alpha} \left[ \frac{\alpha\, \mu_k^-(x_k)\, p_k(y_k \mid x_k)}{\int_X \mu_k^-(x)\, p_k(y_k \mid x)\, dx} \right]_{\alpha=0} = \frac{p_k(y_k \mid x_k)\, \mu_k^-(x_k)}{\rho_k(y_k)}\,. \tag{2.17}$$

As expected, (2.17) is identical to the posterior PDF update in Algorithm 2.1.

2.5 Tracking in Clutter—The PDA Filter 2.5.1 PDA: Assumptions In many practical applications, clutter and missed detections are significant impediments to object state estimation. In this section, a more realistic filter that incorporates these inconvenient truths is introduced, namely, the probabilistic data association (PDA) filter [8, 16]. Like the BM filter, the PDA filter assumes exactly one object is present at all times t in the time interval [t0 , t K ]. In contrast with the BM assumptions, however, the object may or may not be detected (i.e., generate a measurement) at any given scan k ≥ 1. Recall that Pdk (x) ≤ 1 is the probability that the sensor generates a measurement at scan k from an object at x ∈ X. Clutter comprises point measurements in the same space Y as object measurements. The numbers and locations of clutter points vary from scan to scan; they are modeled by a simple finite point process (see Sect. B.3 of Appendix B) whose GFL is assumed known. The clutter process is assumed to be independent of the object-measurement process.


The PDA filter superposes the object and clutter measurement processes. Superposition models the fact that sensor measurements are unlabeled, i.e., the origin (clutter or object) of each measurement is unknown. The absence of labels is the source of the measurement assignment problem. The first five assumptions of the PDA filter are identical to those of the BM filter (Sect. 2.4.1). Because the conditional independence assumptions are unchanged, (2.10) and (2.11) still hold. The last two assumptions, however, are modified: At most one object measurement per scan. The sensor generates at most one object measurement yk ∈ Y at time tk , k ≥ 1. (Missed detections are possible.) Independent clutter. At each scan, the sensor measurement process is the superposition of two independent processes: the object measurement process and the clutter measurement process. The modifications inject a nontrivial combinatorial assignment problem into the PDA filtering problem. The problem is to decide which measurements, if any, are due to the object, and which are due to clutter. Such assignments are absent from the BayesMarkov filter because each scan comprises exactly one measurement, and it must be the object measurement because there are no missed detections. Explicit enumeration of PDA assignments is not difficult. Indeed, given M measurements {y1 , . . . , y M }, there are M + 1 possible assignments: either (i) the object is not detected, or (ii) measurement ym is generated by the object for some m = 1, . . . , M and the other measurements are clutter. Indeed, this enumeration was used in the original derivation of the PDA filter [16]. The first derivation of the PDA filter using the AC method was given in [17].

2.5.2 PDA: Generating Functional

The PDA GFL comprises two ingredients: the joint object state-measurement GFL $\Psi_k^{\mathrm{BMD}}(h, g)$ and the clutter measurement GFL $\Psi_k^{\mathrm{C}}(g)$, both defined below. Because the clutter and object-measurement processes are independent on Y, the GFL of the superposed object-clutter process is the product of the GFLs (see Appendix B):

$$\Psi_k^{\mathrm{PDA}}(h, g) = \Psi_k^{\mathrm{C}}(g)\, \Psi_k^{\mathrm{BMD}}(h, g)\,. \tag{2.18}$$

The additional clutter term $\Psi_k^{\mathrm{C}}(g)$ clearly increases the computational complexity of the derivatives of this GFL compared to those of the BM filter. This reflects the increased combinatorial complexity of the PDA filter.

Bayes-Markov-Detect (BMD) Model. The object model allows the possibility of missed detections. Given the function $Pd_k(x)$, the GFL for the BMD model is

$$\Psi_k^{\mathrm{BMD}}(h, g) = \int_X h(x)\, \mu_k^-(x) \left[ 1 - Pd_k(x) + Pd_k(x) \int_Y g(y)\, p_k(y \mid x)\, dy \right] dx\,. \tag{2.19}$$


It reduces to the GFL of the BM filter if and only if $Pd_k(x) \equiv 1$, x ∈ X. Since $\Psi_k^{\mathrm{BMD}}(0, g) = 0$ for all indeterminate functions g(y), the only "terms" in the GFL (2.19) involve indeterminates of the form h(x) or h(x)g(y). They correspond, in the former case, to exactly one object present and no measurement and, in the latter case, to exactly one object and one measurement. As per the BMD assumptions, these are the only possibilities due to the "at most one object-generated measurement per scan" restriction. The GFL correctly and exactly encodes this information.

Clutter. Clutter measurements are modeled according to a Poisson point process (PPP). At each time tk, the number of clutter measurements Ck is Poisson distributed (see (A.2)) with mean $\lambda_k^c$ and, given Ck, the individual clutter points are IID with PDF $p_k^c(y)$. From (B.11), the clutter GFL is given by

$$\Psi_k^{\mathrm{C}}(g) = \exp\left( -\lambda_k^c + \lambda_k^c \int_Y g(y)\, p_k^c(y)\, dy \right). \tag{2.20}$$

The indeterminate function g in kC (g) is the same g as in kBMD (h, g). This is due to the fact that clutter measurements and object-generated measurements are superposed in the measurement space Y. The Poisson clutter assumption is a traditional model. Other clutter processes are discussed at the end of the next section. PDA = BMD + Clutter. From (2.18), the GFL of the full object-clutter PDA process is given explicitly as

$$\Psi_k^{\mathrm{PDA}}(h, g) = \exp\left( -\lambda_k^c + \lambda_k^c \int_Y g(y)\, p_k^c(y)\, dy \right) \times \int_X h(x)\, \mu_k^-(x) \left[ 1 - Pd_k(x) + Pd_k(x) \int_Y g(y)\, p_k(y \mid x)\, dy \right] dx\,. \tag{2.21}$$

It is important to notice that this GFL captures all of the PDA assumptions into one stand-alone expression—they are an intrinsic part of the mathematical statement and do not need to be stated in words.

2.5.3 PDA: Exact Bayesian Posterior Distribution

Taking the derivative of the GFL (2.21) demonstrates that the exact measurement-to-object enumeration mentioned above is encoded in the derivatives of the GFL. The scan measurement set is yk ⊂ Y. Suppose first that yk ≠ ∅, and let yk = {y1, y2, . . . , yM}, M ≥ 1. Define the delta train

$$g_\delta(y) = \sum_{m=1}^{M} \beta_m\, \delta_{y_m}(y)\,, \qquad y \in Y,\ \ \beta = (\beta_1, \beta_2, \ldots, \beta_M) \in \mathbb{R}^M\,. \tag{2.22}$$

The presence of clutter necessitates a delta train of length M for PDA (compare to the BM filter (2.13)). Define


$$\Psi_k^{\mathrm{PDA}}(h, \beta) \equiv \Psi_k^{\mathrm{PDA}}(h, g_\delta) = \exp\left( -\lambda_k^c + \lambda_k^c \sum_{m=1}^{M} \beta_m\, p_k^c(y_m) \right) \times \int_X h(x)\, \mu_k^-(x) \left[ 1 - Pd_k(x) + Pd_k(x) \sum_{m=1}^{M} \beta_m\, p_k(y_m \mid x) \right] dx\,. \tag{2.23}$$

The posterior object state GFL is the normalized derivative (see Eq. (B.67))

$$\Psi_k^{\mathrm{PDA}}(h \mid y_k) = \frac{\frac{d^M}{d\beta_1 \cdots d\beta_M}\, \Psi_k^{\mathrm{PDA}}(h, \beta)\big|_{\beta=0}}{\frac{d^M}{d\beta_1 \cdots d\beta_M}\, \Psi_k^{\mathrm{PDA}}(1, \beta)\big|_{\beta=0}}\,. \tag{2.24}$$

Let

$$L_k^{\mathrm{PDA}}(x \mid y_k) = \bigl(1 - Pd_k(x)\bigr)\, \mu_k^-(x) + \sum_{m=1}^{M} \frac{Pd_k(x)\, \rho_k(y_m)}{\lambda_k^c\, p_k^c(y_m)}\, p_k(x \mid y_m)\,, \tag{2.25}$$

where $\mu_k^-(x)$, $\rho_k(y)$, and $p_k(x \mid y)$ are defined in the BM recursion, Algorithm 2.1. The normalized cross-derivative on the right-hand side of (2.24) is the GFL of the exact Bayes posterior for the PDA filter. Computing the derivative of (2.23) directly, or by using (C.42) in Appendix C, and normalizing it gives

$$\Psi_k^{\mathrm{PDA}}(h \mid y_k) = \frac{\int_X h(x)\, L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}{\int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}\,. \tag{2.26}$$

As in (2.16), let $h_\delta(x) = \alpha\, \delta_{x_k}(x)$. The PDA posterior PDF is

$$p_k(x_k \mid y_k) = \frac{d}{d\alpha}\, \Psi_k^{\mathrm{PDA}}(h_\delta \mid y_k)\Big|_{\alpha=0} = \frac{L_k^{\mathrm{PDA}}(x_k \mid y_k)}{\int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}\,. \tag{2.27}$$

Define the normalizing constant

$$s_k(y_k) \equiv \int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx\,. \tag{2.28}$$

It is seen that the exact Bayesian posterior is a probabilistic mixture density:

$$p_k(x_k \mid y_k) = \frac{1 - Pd_k(x_k)}{s_k(y_k)}\, \mu_k^-(x_k) + \sum_{m=1}^{M} \frac{Pd_k(x_k)\, \rho_k(y_m)}{\lambda_k^c\, p_k^c(y_m)\, s_k(y_k)}\, p_k(x_k \mid y_m)\,. \tag{2.29}$$

The mixture weights correspond to specific measurement assignment probabilities, and conversely. The weight of the first component, μ− k (·), corresponds to the missed detection event, i.e., it is the predicted object state. The remaining M weights are


posterior probabilities; thus, the weight of pk(· | ym) is the probability that ym is object generated and the remaining measurements are clutter. Suppose, finally, that the scan measurement set is yk = ∅. In this case, the secular method uses gδ(y) ≡ 0, so that the GFL of the Bayes posterior is the same as (2.26) but with $L_k^{\mathrm{PDA}}(x \mid y_k = \emptyset) = (1 - Pd_k(x))\, \mu_k^-(x)$. It follows that the posterior PDF is the first term in (2.29) with $s_k(y_k = \emptyset) = \int_X L_k^{\mathrm{PDA}}(x \mid y_k = \emptyset)\, dx$.

Other Clutter Processes. The statistical nature of the clutter process impacts the mathematical form of the exact Bayesian filter. To see this, it is only necessary to verify that the derivatives of the function (2.23) depend on the GFL of the clutter process. Interested readers may wish to investigate exact filters for the cluster processes (B.5)–(B.10) in Appendix B. Methods for closing the Bayesian recursion depend on the form of the exact filter. This is discussed in the next section for Poisson clutter. Further discussion is outside the scope of the book.

2.5.4 PDA: Closing the Bayesian Recursion

Assume M measurements are reported in the next scan at iteration k + 1. Each of the M + 1 components of the posterior-turned-prior defined by (2.29) will, in turn, generate M + 1 posterior components, giving a total of (M + 1)(M + 1) components after iteration k + 1. In general, assuming Mk measurements are reported at scan k, k = 1, . . . , K, the exact posterior after the final scan is a probabilistic mixture PDF with $\prod_{k=1}^{K} (M_k + 1)$ components. The size of the mixture grows without bound, so it is inaccurate to describe it as a "closed" Bayesian recursion. To close the PDA Bayesian recursion, it is necessary to stipulate a parametric form for the prior (with a finite number of parameters) and then approximate the posterior with a distribution of the same form (and with the same number of parameters). The original PDA filter [16] restricts the BM filter to linear-Gaussian distributions with constant probability of detection Pdk(xk) ≡ Pdk, which makes (2.29) a Gaussian mixture. The Bayesian recursion approximates the mixture with a single Gaussian whose mean and covariance are matched to the mixture.

2.5.5 PDA: Gating—Conditioning on Subsets of Measurements

To reduce computational complexity, the PDA filter may be restricted to measurements from specified subsets of Y called validation gates. Gates may change from scan to scan; the gate at scan k is denoted by $\Gamma_k$. The probability that a given clutter point falls within the gate $\Gamma_k$ is $P_k^{c/\mathrm{gtd}} = \int_{\Gamma_k} p_k^c(y)\, dy$. Assume that $P_k^{c/\mathrm{gtd}} > 0$. The Poisson clutter process restricted to $\Gamma_k$ is a Poisson process whose GFL is [18]




$$\Psi_k^{\mathrm{C/gtd}}(g) = \exp\left( -\lambda_k^{c/\mathrm{gtd}} + \lambda_k^{c/\mathrm{gtd}} \int_{\Gamma_k} g(y)\, p_k^{c/\mathrm{gtd}}(y)\, dy \right), \tag{2.30}$$

where $\lambda_k^{c/\mathrm{gtd}} = P_k^{c/\mathrm{gtd}}\, \lambda_k^c$ is the expected number of clutter points in the gate, and the conditional PDF is $p_k^{c/\mathrm{gtd}}(y) = (P_k^{c/\mathrm{gtd}})^{-1}\, p_k^c(y)$, $y \in \Gamma_k$. Conditioning on the validation gate changes the object detection probability. To see this, define the events $D_k \equiv \{\text{object detected at scan } k\}$ and $E_k \equiv \{\text{object detected at scan } k \text{ and the object measurement is in } \Gamma_k\}$. Then:

$$Pd_k^{\mathrm{gtd}}(x) \equiv \Pr\{E_k \mid x\} = \Pr\{E_k, D_k \mid x\} = \Pr\{D_k \mid x\}\, \Pr\{E_k \mid D_k, x\} = Pd_k(x) \int_Y \Pr\{E_k \mid D_k, y, x\}\, p(y \mid D_k, x)\, dy = Pd_k(x) \int_{\Gamma_k} p_k(y \mid x)\, dy = Pd_k(x)\, P_k^{\mathrm{gtd}}(x)\,, \tag{2.31}$$

where $P_k^{\mathrm{gtd}}(x) \equiv \int_{\Gamma_k} p_k(y \mid x)\, dy > 0$. The first equality in the last line follows from the fact that $\Pr\{E_k \mid D_k, y, x\}$ equals one if $y \in \Gamma_k$ and zero otherwise. The gated conditional measurement likelihood function is

$$p_k^{\mathrm{gtd}}(y \mid x) = \bigl(P_k^{\mathrm{gtd}}(x)\bigr)^{-1}\, p_k(y \mid x)\,, \qquad y \in \Gamma_k,\ x \in X\,. \tag{2.32}$$

The GFL for the gated BMD is (2.19) with Y replaced by the validation gate $\Gamma_k$:

$$\Psi_k^{\mathrm{BMD/gtd}}(h, g) = \int_X h(x)\, \mu_k^-(x) \left[ 1 - Pd_k^{\mathrm{gtd}}(x) + Pd_k^{\mathrm{gtd}}(x) \int_{\Gamma_k} g(y)\, p_k^{\mathrm{gtd}}(y \mid x)\, dy \right] dx\,. \tag{2.33}$$

The object process and the gated clutter process are independent, so the GFL of the gated PDA is the product of their GFLs,

$$\Psi_k^{\mathrm{PDA/gtd}}(h, g) = \Psi_k^{\mathrm{C/gtd}}(g)\, \Psi_k^{\mathrm{BMD/gtd}}(h, g)\,. \tag{2.34}$$

The exact Bayesian posterior is a ratio of cross-derivatives of this GFL. The size of the derivatives is equal to the number of measurements that fall inside the gate. With obvious substitutions, e.g., Pdkgtd (x) replaces Pdk (x), the posterior distribution is identical to the PDA update (2.29).


2.6 Object Existence—The IPDA Filter 2.6.1 IPDA: Assumptions In applications, a common complaint about the BM and PDA single-object tracking filters is that they do not directly indicate when the object ceases to be present. To meet this need, object presence and absence decisions are usually performed by a statistical hypothesis test, e.g., a sequential probability ratio test or a track-beforedetect (TBD) technique that is external to the tracker and often running on data from the sensor response surface. In the context of AC and counting filters, the issue is conceived in combinatorial terms: N , the number of objects, is either N = 0 or N = 1. Modeling this possibility in the language of AC leads to the IPDA filter. By interpreting N = 0 as “the object does not exist” and N = 1 as “the object exists,” the existence problem is a combinatorial problem that is well suited to AC. The GFL of the exact IPDA filter illuminates and clarifies the event space of the original derivation [19]. Further discussion and developments of single-object IPDA tracking are given in [10, 20, 21]. The first AC derivation of the IPDA filter was given in [22]. The IPDA assumptions are the same as the PDA assumptions, but modified to accommodate object existence: At most one object exists. At most one object exists at any time t ∈ [t0 , t K ]. The number of objects, N (t), is a binary {0, 1} random variable. Markovian object existence. N (t) is a continuous-time two-state Markov chain. Known object existence prior. The prior probability of object existence, χ0 , at reference time t0 is specified, where χ0 = Pr{N (t0 ) = 1}. At most one object measurement per scan. The object, if it exists, generates at most one sensor measurement yk ∈ Y at time tk , k ≥ 1. The functions (2.7)–(2.9) are used in IPDA with the understanding that they are defined if and only if the object exists, i.e., N = 1. For example, the prior PDF for object state x0 ∈ X at time t0 is specified, assuming the object exists. It is written in a conditional form as μ0 (x0 |N0 = 1), where N0 ≡ N (t0 ). The “other” conditional prior, namely, μ0 (x0 |N0 = 0), is undefined. It is natural, but not quite correct, to think of the IPDA event space as the Cartesian product B × X, where the set B = {0, 1}. If this were the case, the conditional prior μ0 (x0 |N0 = 0) would be well defined. The event space for IPDA is actually L ≡ {∅} ∪ X. The prior is a discrete-continuous distribution that consists of the probability Pr{∅} ≡ 1 − χ0 and the PDF μ0 (x0 ) ≡ μ0 (x0 |N0 = 1). Let Nk ≡ N (tk ), k = 0, 1, . . . , K . Existence is a continuous-time Markov chain, by assumption, so Nk is a discrete-time Markov chain with states in the set B = {0, 1}. The transition probability matrix from scan k − 1 to scan k, written in row stochastic form [23, 24], is

$$A_{k-1} = \begin{pmatrix} \pi_{k-1}^0 & 1 - \pi_{k-1}^0 \\ 1 - \pi_{k-1}^1 & \pi_{k-1}^1 \end{pmatrix}, \tag{2.35}$$


where $\pi_{k-1}^0$ and $\pi_{k-1}^1$ are the probabilities that the chain stays in state 0 and 1, respectively, when transitioning from scan k − 1 to scan k. The prior probability of existence χ0 is determined by the application, e.g., by the detector of the sensor signal processor. The value χ0 = 0.5 is a common choice. The predicted existence probability $\chi_k^- \equiv \Pr\{N_k = 1 \mid y_{1:k-1}\}$ is determined by the Markov chain via the vector-matrix product

$$\bigl(1-\chi_k^- \quad \chi_k^-\bigr) = \bigl(1-\chi_{k-1} \quad \chi_{k-1}\bigr) \begin{pmatrix} \pi_{k-1}^0 & 1 - \pi_{k-1}^0 \\ 1 - \pi_{k-1}^1 & \pi_{k-1}^1 \end{pmatrix}, \tag{2.36}$$

so that $\chi_k^- = \pi_{k-1}^1\, \chi_{k-1} + (1 - \pi_{k-1}^0)(1 - \chi_{k-1})$. The posterior probability of object existence at time tk is $\chi_k \equiv \Pr\{N_k = 1 \mid y_{1:k}\}$, k ≥ 1.

Numerical values for the transition matrix $A_{k-1}$ are application-dependent. For example, if an object cannot "leap into existence," then setting $\pi_{k-1}^0 = 1$ for k ≥ 1 prevents the chain from exiting state N = 0 once it enters—nonexistence is an "absorbing" state. Existence is a "transient" state if $\pi_{k-1}^1 < 1$, in which case the predicted existence probability $\chi_k^- = \pi_{k-1}^1\, \chi_{k-1}$ decreases monotonically to zero as k → ∞ (assuming an empty measurement set at each scan). A different example is one in which the object count can alternate between N = 0 and N = 1, so that both states are "recurrent." In some applications, but not all, this would imply that one and the same object is intermittently counted. The transition matrix (2.35) of the Markov chain and the object motion models (2.8) constitute a Markovian dynamical model on the space L. The form is completely general, since any Markovian process on L is equivalent to a two-state Markov chain and a conditional PDF on X. (Details are omitted.) The "unified" Markovian model embeds the combinatorial problem in a convenient, familiar, and useful filtering notation and avoids mentioning the Markov chain explicitly, e.g., see [7, Chap. 4] and the references therein. The AC approach brings the combinatorics forward and directly into the spotlight, if you will, so the IPDA model is used here. It is important to note that existence probabilities can be state dependent. This case is treated in Sect. 4.5.2.

2.6.2 IPDA: Generating Functional

The GFL for IPDA comprises two ingredients: the GFL of the joint object state-measurement process, defined below, and the GFL of the clutter process $\Psi_k^{\mathrm{C}}(g)$, defined in (2.20). The processes are independent, so the GFL corresponding to the full object-clutter IPDA process is the product

$$\Psi_k^{\mathrm{IPDA}}(h, g) = \Psi_k^{\mathrm{C}}(g)\, G_{\chi_k^-}^{\mathrm{Bernoulli}}\bigl(\Psi_k^{\mathrm{BMD}}(h, g)\bigr)\,, \tag{2.37}$$

where

$$G_{\chi_k^-}^{\mathrm{Bernoulli}}(z) = 1 - \chi_k^- + \chi_k^-\, z\,. \tag{2.38}$$

The Bernoulli GF is introduced in (A.13), while $\Psi_k^{\mathrm{BMD}}(h, g)$ is defined in (2.19). Setting h = 0 in the joint object state-measurement GFL $G_{\chi_k^-}^{\mathrm{Bernoulli}}(\Psi_k^{\mathrm{BMD}}(h, g))$ yields $1 - \chi_k^-$ for all indeterminate functions g(y). Furthermore, the indeterminate components of the terms of this GFL are of the form h(x) or h(x)g(y). This corresponds to the three IPDA possibilities: (i) no object exists, (ii) exactly one object exists and generates no measurement, and (iii) exactly one object exists and generates exactly one measurement. If the predicted probability of object existence $\chi_k^- = 1$, then the GFL in (2.37) reduces to that of the PDA filter given in (2.18) for iteration k. Furthermore, if $\pi_k^0 = 0$ and $\pi_k^1 = 1$, then IPDA is PDA over all iterations k ≥ 1 if the Markov chain is initialized at the reference time t0 by χ0 = 1.

The similarities and differences between the PDA and IPDA filters are readily seen by rewriting the PDA GFL in (2.18). The generating function for the number of objects present at any given time under the PDA assumptions is always exactly one, so the GF of object number is $G_k^{\mathrm{Identity}}(z) \equiv z$. Then

$$\Psi_k^{\mathrm{PDA}}(h, g) = \Psi_k^{\mathrm{C}}(g)\, G_k^{\mathrm{Identity}}\bigl(\Psi_k^{\mathrm{BMD}}(h, g)\bigr)\,. \tag{2.39}$$

Compare this with the IPDA GFL (2.37).

2.6.3 IPDA: Exact Bayesian Posterior Distribution

Suppose first that the scan measurement set yk ≠ ∅. Let yk = {y1, . . . , yM}, M ≥ 1. Substituting the Dirac delta train gδ given by (2.22) into (2.37) gives

$$\Psi_k^{\mathrm{IPDA}}(h, \beta) \equiv \Psi_k^{\mathrm{IPDA}}(h, g_\delta) = \exp\left( -\lambda_k^c + \lambda_k^c \sum_{m=1}^{M} \beta_m\, p_k^c(y_m) \right) \times \left[ 1 - \chi_k^- + \chi_k^- \int_X h(x)\, \mu_k^-(x) \left( 1 - Pd_k(x) + Pd_k(x) \sum_{m=1}^{M} \beta_m\, p_k(y_m \mid x) \right) dx \right]. \tag{2.40}$$

The GFL of the Bayes posterior is the normalized derivative (see Eq. (B.67))

$$\Psi_k^{\mathrm{IPDA}}(h \mid y_k) = \frac{\frac{d^M}{d\beta_1 \cdots d\beta_M}\, \Psi_k^{\mathrm{IPDA}}(h, \beta)\big|_{\beta=0}}{\frac{d^M}{d\beta_1 \cdots d\beta_M}\, \Psi_k^{\mathrm{IPDA}}(1, \beta)\big|_{\beta=0}} = \frac{1 - \chi_k^- + \chi_k^- \int_X h(x)\, L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}{1 - \chi_k^- + \chi_k^- \int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}\,, \tag{2.41}$$


where $L_k^{\mathrm{PDA}}(x \mid y_k)$ is defined in Eq. (2.25). The exact IPDA Bayesian posterior is a discrete-continuous probability distribution. Let xk ∈ X and $h_\delta(x) = \alpha\, \delta_{x_k}(x)$. The continuous part of the distribution for Nk = 1 is proportional to a PDF on X. Calculating derivatives of (2.41) gives

$$p_k(x_k, N_k = 1 \mid y_k) = \frac{d}{d\alpha}\, \Psi_k^{\mathrm{IPDA}}(h_\delta \mid y_k)\Big|_{\alpha=0} = \frac{\chi_k^-\, L_k^{\mathrm{PDA}}(x_k \mid y_k)}{1 - \chi_k^- + \chi_k^- \int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}\,. \tag{2.42}$$

Setting h(x) ≡ 0 in (2.41) gives the posterior probability of nonexistence,

$$\Pr\{N_k = 0 \mid y_k\} = \frac{1 - \chi_k^-}{1 - \chi_k^- + \chi_k^- \int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx} \equiv 1 - \chi_k\,. \tag{2.43}$$

To verify that the posterior is a discrete-continuous probability distribution, integrate (2.42) over all xk ∈ X to obtain the posterior probability of existence

$$\Pr\{N_k = 1 \mid y_k\} = \frac{\chi_k^- \int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}{1 - \chi_k^- + \chi_k^- \int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx} \equiv \chi_k \tag{2.44}$$

and then add (2.43). IPDA conditioned on object existence, Nk = 1, is identical to PDA. To see this, use Bayes theorem with (2.42) and (2.44):

$$p_k(x_k \mid N_k = 1, y_k) = \frac{p_k(x_k, N_k = 1 \mid y_k)}{\Pr\{N_k = 1 \mid y_k\}} = \frac{L_k^{\mathrm{PDA}}(x_k \mid y_k)}{\int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx}\,. \tag{2.45}$$

The last equality is the same as the PDA posterior (2.27). Suppose, finally, that yk = ∅. The secular method uses gδ(y) = 0, so the GFL of the posterior is (2.41) but with $L_k^{\mathrm{PDA}}(x \mid y_k = \emptyset) = (1 - Pd_k(x))\, \mu_k^-(x)$; hence,

$$p_k(x_k, N_k = 1 \mid y_k = \emptyset) = \frac{\chi_k^-\, L_k^{\mathrm{PDA}}(x_k \mid y_k = \emptyset)}{1 - \chi_k^- + \chi_k^- \int_X L_k^{\mathrm{PDA}}(x \mid y_k = \emptyset)\, dx} \tag{2.46}$$

and

$$\Pr\{N_k = 0 \mid y_k = \emptyset\} = \frac{1 - \chi_k^-}{1 - \chi_k^- + \chi_k^- \int_X L_k^{\mathrm{PDA}}(x \mid y_k = \emptyset)\, dx}\,. \tag{2.47}$$

These expressions are effectively special cases of (2.42) and (2.43).


2.6.4 IPDA: Closing the Bayesian Recursion

The Bayesian recursion for the IPDA filter is closed by exploiting the procedure used to close the PDA filter, with modifications to accommodate the existence variable Nk. The exact posterior existence probability is, from (2.44),

$$\chi_k = \frac{\chi_k^-\, s_k(y_k)}{1 - \chi_k^- + \chi_k^-\, s_k(y_k)}\,, \tag{2.48}$$

where $s_k(y_k) = \int_X L_k^{\mathrm{PDA}}(x \mid y_k)\, dx$. As noted in (2.45), the continuous component conditioned on object existence is the exact posterior PDF of the PDA filter. The PDA Bayes recursion closure procedure applied to (2.45) yields an approximate PDF of the same form as the PDA prior; denote it by $\hat{p}_k(x_k \mid N_k = 1, y_k)$ and let

$$\hat{p}_k(x_k, N_k = 1 \mid y_k) = \chi_k\, \hat{p}_k(x_k \mid N_k = 1, y_k)\,. \tag{2.49}$$

The Bayes recursion for the IPDA filter is closed by (2.48) and (2.49). In practice, the integrals defining sk (yk ) are often difficult to compute (e.g., if Pdk (x) is not constant), in which case it is also necessary to approximate sk (yk ).
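The scalar part of the closed IPDA recursion, the existence probability, can be written in a few lines. The sketch below is illustrative only; it chains the Markov prediction (2.36) with the update (2.48), and it assumes the quantity s_k(y_k) has already been computed (or approximated) from the scan's measurements as in (2.25) and (2.28). The numerical values are hypothetical.

```python
# Existence-probability recursion of the IPDA filter: predict with (2.36), update with (2.48).
def predict_existence(chi_prev, pi0, pi1):
    """chi_k^- = pi1 * chi_{k-1} + (1 - pi0) * (1 - chi_{k-1})."""
    return pi1 * chi_prev + (1.0 - pi0) * (1.0 - chi_prev)

def update_existence(chi_pred, s_k):
    """chi_k = chi_k^- * s_k / (1 - chi_k^- + chi_k^- * s_k), Eq. (2.48)."""
    return chi_pred * s_k / (1.0 - chi_pred + chi_pred * s_k)

chi = 0.5                           # prior existence probability chi_0
for s_k in [1.8, 0.4, 0.1]:         # hypothetical values of s_k(y_k) for three scans
    chi_pred = predict_existence(chi, pi0=1.0, pi1=0.98)
    chi = update_existence(chi_pred, s_k)
    print(round(chi, 3))
```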

2.7 Linear-Gaussian Filters

2.7.1 The Classical Kalman Filter

Let N(x; μ, Σ) denote the PDF at x of a multivariate Gaussian with mean vector μ and covariance matrix Σ; explicitly,

$$\mathrm{N}(x; \mu, \Sigma) = |2\pi \Sigma|^{-\frac{1}{2}} \exp\left( -\tfrac{1}{2}\, (x - \mu)^T \Sigma^{-1} (x - \mu) \right). \tag{2.50}$$

The state and measurement spaces are, respectively, $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$. Under standard linear-Gaussian assumptions, the PDFs (2.7)–(2.9) are

$$\mu_0(x_0) = \mathrm{N}\bigl(x_0;\, \hat{x}_{0|0}, P_{0|0}\bigr) \tag{2.51}$$
$$p_k(x_k \mid x_{k-1}) = \mathrm{N}(x_k;\, F_{k-1} x_{k-1}, Q_{k-1}) \tag{2.52}$$
$$p_k(y_k \mid x_k) = \mathrm{N}(y_k;\, H_k x_k, R_k)\,. \tag{2.53}$$

The vector \hat{x}_{0|0} and covariance matrix P_{0|0} initialize the filter. The matrices F_k are often called the process, or system, matrices, and Q_k are the process noise covariance matrices. The matrices H_k are measurement matrices, and R_k are the measurement error covariance matrices. They are specified (real) matrices of compatible dimensions. The PDFs (2.51)–(2.53) are often written as deterministic functions with independent, additive, zero-mean Gaussian noise. For example, (2.53) is equivalent to y_k = H_k x_k + w_k, where w_k ∼ N(0, R_k). The PDFs in the Bayes-Markov recursion are (see, e.g., [9, 25, 26]):

Predicted object state: μ_k^-(x_k) = N(x_k; \hat{x}_{k|k-1}, P_{k|k-1}) , \qquad (2.54)
Predicted measurement: ρ_k(y_k) = N(y_k; \hat{y}_{k|k-1}, S_k) , \qquad (2.55)
Posterior PDF: p_k(x_k \mid y_k) = N(x_k; \hat{x}_{k|k}, P_{k|k}) , \qquad (2.56)

where the mean vectors and covariance matrices are

\hat{x}_{k|k-1} = F_{k-1} \hat{x}_{k-1|k-1} \qquad (2.57)
\hat{y}_{k|k-1} = H_k \hat{x}_{k|k-1} \qquad (2.58)
\hat{x}_{k|k} = \hat{x}_{k|k-1} + P_{k|k-1} H_k^T S_k^{-1} (y_k - \hat{y}_{k|k-1}) \qquad (2.59)
P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_{k-1} \qquad (2.60)
S_k = H_k P_{k|k-1} H_k^T + R_k \qquad (2.61)
P_{k|k} = P_{k|k-1} - P_{k|k-1} H_k^T S_k^{-1} H_k P_{k|k-1} \,. \qquad (2.62)

The Kalman filter is a closed Bayesian recursion because, at every iteration, the posterior and the prior are multivariate Gaussians on X = Rn .
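The recursion (2.57)–(2.62) translates directly into a few lines of linear algebra. The sketch below is illustrative only; array shapes and variable names are assumptions, not the book's notation.

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Time update, eqs. (2.57) and (2.60)."""
    return F @ x, F @ P @ F.T + Q

def kalman_update(x_pred, P_pred, y, H, R):
    """Measurement update, eqs. (2.58), (2.59), (2.61), (2.62)."""
    y_pred = H @ x_pred                      # predicted measurement (2.58)
    S = H @ P_pred @ H.T + R                 # innovation covariance (2.61)
    K = P_pred @ H.T @ np.linalg.inv(S)      # gain P_{k|k-1} H^T S^{-1}
    x_post = x_pred + K @ (y - y_pred)       # (2.59)
    P_post = P_pred - K @ S @ K.T            # algebraically equal to (2.62)
    return x_post, P_post, y_pred, S
```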

2.7.2 Linear-Gaussian PDA: Without Gating

Common practice (but unnecessary) is to assume the probability of detection is constant, that is, Pd_k(x) ≡ p_d for all x ∈ X and k = 1, …, K, and to assume the clutter intensity is uniform over a specified bounded region of regard R ⊂ Y and constant over time, so that λ_k^c ≡ λ^c and the clutter PDF p_k^c(y) ≡ V^{-1}, y ∈ R, where V is the volume of R. The PDF p_k(x_k | y_m) is the Kalman filter posterior distribution conditioned on the assumption that y_m is object-originated. Thus, (2.56) implies that

p_k(x_k \mid y_m) = N(x_k; \hat{x}_{k|k}^m, P_{k|k}) , \qquad (2.63)

where \hat{x}_{k|k}^m ≡ \hat{x}_{k|k-1} + P_{k|k-1} H_k^T S_k^{-1} (y_m - \hat{y}_{k|k-1}). The exact PDA posterior (2.29) is


p_k(x_k \mid y_k) = \frac{1 - p_d}{s_k(y_k)}\, μ_k^-(x_k) + \sum_{m=1}^{M} \frac{V p_d}{λ^c\, s_k(y_k)}\, ρ_k(y_m)\, p_k(x_k \mid y_m)
= \frac{1 - p_d}{s_k(y_k)}\, N(x_k; \hat{x}_{k|k-1}, P_{k|k-1}) + \sum_{m=1}^{M} \frac{V p_d}{λ^c\, s_k(y_k)}\, N(y_m; \hat{y}_{k|k-1}, S_k)\, N(x_k; \hat{x}_{k|k}^m, P_{k|k}) , \qquad (2.64)

where the second equality follows from (2.54), (2.55), and (2.63). The normalizing constant is

s_k(y_k) = 1 - p_d + \frac{V p_d}{λ^c} \sum_{m=1}^{M} N(y_m; \hat{y}_{k|k-1}, S_k) . \qquad (2.65)

The PDA filter closes the Bayesian recursion by approximating the posterior mixture PDF with a single Gaussian whose mean vector, \hat{x}_{k|k}, and covariance matrix, P_{k|k}, are matched to the mean and covariance of the exact posterior. Since the Gaussian mixture (2.64) has constant coefficients, \hat{x}_{k|k} is the weighted sum of the mixture component means, and P_{k|k} is a weighted sum called the "spread of the means." Detailed expressions are given in [8, 9, 16]. Thus, the PDA filter closes the Bayesian recursion with the approximation p_k(x_k \mid y_k) = N(x_k; \hat{x}_{k|k}, P_{k|k}).
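Under the constant-p_d, uniform-clutter assumptions of this subsection, the PDA update and its moment-matched closure can be sketched as follows. This is an illustrative implementation only, not the authors' code, and the helper names are invented.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def pda_update(x_pred, P_pred, ys, H, R, pd, lam_c, V):
    """One PDA scan update, eqs. (2.64)-(2.65), closed by moment matching.
    ys is an array of shape (M, dim_y); M = 0 is allowed."""
    y_pred = H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # mixture weights: one "no object-originated measurement" term plus one per measurement
    g = np.array([mvn.pdf(y, mean=y_pred, cov=S) for y in ys])
    w = np.concatenate(([1.0 - pd], (V * pd / lam_c) * g))
    w = w / w.sum()                          # division by s_k(y_k), eq. (2.65)
    means = [x_pred] + [x_pred + K @ (y - y_pred) for y in ys]
    covs = [P_pred] + [P_pred - K @ S @ K.T] * len(ys)
    # moment matching: single Gaussian with the mixture mean and "spread of the means"
    x_post = sum(wi * mi for wi, mi in zip(w, means))
    P_post = sum(wi * (Ci + np.outer(mi - x_post, mi - x_post))
                 for wi, mi, Ci in zip(w, means, covs))
    return x_post, P_post
```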

2.7.3 Linear-Gaussian PDA: With Gating

The gate is typically a suitably chosen contour (and its interior) of the predicted measurement PDF (2.55). Thus, for some threshold γ > 0, the gate is defined by

Γ_k = \{ y ∈ Y : (y - \hat{y}_{k|k-1})^T S_k^{-1} (y - \hat{y}_{k|k-1}) ≤ γ \} .

The traditional assumptions of Sect. 2.7.2 are retained here. In this case, the simplified gated PDA GFL is

Ψ_k^{\mathrm{PDA/gtd}}(h, g) = \exp\!\left( -λ_k^{c/\mathrm{gtd}} + \frac{λ^c}{V} \int_{Γ_k} g(y)\, dy \right) \int_X h(x)\, μ_k^-(x) \left[ 1 - p_d P_k^{\mathrm{gtd}}(x) + p_d \int_{Γ_k} g(y)\, p_k(y \mid x)\, dy \right] dx .

An additional approximation that is often found in the literature assumes that P_k^{\mathrm{gtd}}(x) ≡ p_g is constant for all x ∈ X and k = 1, …, K. In this case, the posterior object state PDF becomes


p_k(x_k \mid y_k) = \frac{1 - p_d p_g}{s_k(y_k)}\, N(x_k; \hat{x}_{k|k-1}, P_{k|k-1}) + \sum_{m=1}^{M} \frac{V p_d}{λ^c\, s_k(y_k)}\, N(y_m; \hat{y}_{k|k-1}, S_k)\, N(x_k; \hat{x}_{k|k}^m, P_{k|k}) , \qquad (2.66)

with normalizing constant

s_k(y_k) = 1 - p_d p_g + \frac{V p_d}{λ^c} \sum_{m=1}^{M} N(y_m; \hat{y}_{k|k-1}, S_k) .

The approximation P_k^{\mathrm{gtd}}(x) ≡ p_g is reasonably accurate for gates with sufficiently large volume. For smaller gates, however, the dependence on object state in the function P_k^{\mathrm{gtd}}(·) is more pronounced and should not be ignored. To close the recursion, the posterior mixture PDF is approximated with a single Gaussian as described in Sect. 2.7.2.
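The elliptical gate itself reduces to a single Mahalanobis-distance test. A common (but not mandatory) choice sets γ to a chi-square quantile of the measurement dimension; the snippet below makes that assumption explicit and is illustrative only.

```python
import numpy as np
from scipy.stats import chi2

def in_gate(y, y_pred, S, gate_prob=0.99):
    """True if y falls inside the elliptical validation gate of this subsection,
    with the threshold gamma taken as a chi-square quantile (an assumption)."""
    gamma = chi2.ppf(gate_prob, df=len(y))
    d = y - y_pred
    return float(d @ np.linalg.solve(S, d)) <= gamma
```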

2.8 Numerical Example: IPDA

A simulated scenario for the IPDA filter is examined in this section. Elements of X and Y will be represented using boldface text (e.g., x and y) so that they will not be confused with their 2D spatial components x and y. All spatial units are in meters, and all time units are in seconds. The spatial region of interest is R = [−2500, 2500] × [−2500, 2500] ⊂ R^2. A point object moves in continuous time through R at a constant speed of 50 m/s. Its state space X ⊂ R^4 comprises position and velocity components; an element x ∈ X is represented by x = (x, ẋ, y, ẏ)^T. The object begins in state x_0 = (−1700, 0, 1500, −50)^T at time t_0 = 0. It moves at this velocity for 30 s, turns at 3° per second for the next 30 s, moves in the positive x direction for 30 s, and then "disappears" at time 90 s with last x-y position of approximately (700, −1000)^T. From a simulation point of view, this means simply that the object no longer produces sensor measurements. A sensor provides spatial x-y measurements at 1 s intervals starting at time t_1 = 1 and ending at time t_K ≡ t_120 = 120; that is, Δt = 1. The measurement space Y is a subset of R^2, and an element y ∈ Y is represented by the 2D spatial vector y = (x, y)^T. At each scan time t_k, the object is in state x_k ∈ X, and it induces a sensor measurement y_k ∈ Y with probability p_d = 0.9. Given that the object is detected, a measurement y_k is generated according to the linear-Gaussian PDF p_k(y_k | x_k) = N(y_k; H x_k, R), where


Fig. 2.1 IPDA position estimates are given by the green-trending-to-blue line, where green and blue correspond to χ_k = 1 and χ_k = 0, respectively. Black ellipses depict the IPDA 99% position error ellipses centered at the position estimate (black "x") at scans 10, 20, …, 120, along with pink, blue, and brown ellipses at scans 1, 3, and 5, respectively. The black dotted line is the true object position, and gray dots are clutter points from the last five scans, superposed. The red circle depicts a 3σ_M measurement window to provide a scale of reference



H_k ≡ H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad R_k ≡ R = σ_M^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad (2.67)

with σ_M = 40. In the simulation, no measurement fell outside the region R. Simulated clutter measurements are realizations of a homogeneous PPP with uniform clutter intensity over R. Specifically, the mean number of clutter measurements λ_k^c ≡ λ^c = 225 is constant over all scans, and the clutter PDF p_k^c(y) ≡ 1/Vol(R) = 4 × 10^{−8} for y ∈ R. Thus, in each scan, there is an average of approximately 0.4 clutter measurements in a 3σ_M measurement window. Linear-Gaussian assumptions are adopted for object motion. Equations (2.7)–(2.9) become (2.51)–(2.53), respectively. The prior object state μ_0(·) at time t_0 is assumed to be normally distributed with mean \hat{x}_{0|0} = x_0 and diagonal covariance matrix P_{0|0} = diag(200^2, 10^2, 200^2, 10^2). The object motion model is assumed stationary, so that F_k ≡ F and Q_k ≡ Q are constant over all scans, where


Fig. 2.2 Estimated IPDA object existence probability versus time (scan number). The red diamonds at the bottom of the figure mark “missed detections,” that is, scans in which the sensor did not produce an object measurement



F = \begin{pmatrix} 1 & Δt & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & Δt \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad Q = σ_p^2 \begin{pmatrix} Δt^3/3 & Δt^2/2 & 0 & 0 \\ Δt^2/2 & Δt & 0 & 0 \\ 0 & 0 & Δt^3/3 & Δt^2/2 \\ 0 & 0 & Δt^2/2 & Δt \end{pmatrix}

with σ p = 5. No gating is performed. The IPDA prior probability of existence is set to χ0 = 0.5, and the IPDA transition matrix parameters given in (2.35) are set to πk0 = 1 and πk1 = 0.99 for k ≥ 1. Tracking filter parameters λc , pd , and σ M are matched to those of the simulation. Figure 2.1 displays the output of the simulated IPDA scenario. Ground truth object position in x-y space is given by the black dotted line; the gray dots are the superposed clutter realizations over the last five scans (clutter realizations are independent from scan to scan); and pink, blue, brown, and black ellipses are the IPDA 99% error ellipses centered at the IPDA spatial estimate (black “x”) at scans 1, 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, and 120. The IPDA spatial estimates over all 120 scans are given by the green-to-blue line; pure green corresponds to the IPDA


existence probability χk = 1, while pure blue corresponds to χk = 0. For intuition, the red circle depicts a 3σ M measurement window. Figure 2.2 shows the estimated object existence probability. The red diamonds at the bottom of the figure represent scans in which there were no object-induced sensor measurements.
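A minimal sketch of how one scan of the synthetic data described above could be generated (Poisson clutter uniform on R plus an object-induced point with probability p_d). The constants are those stated in the text; the code itself and the function name are illustrative assumptions, not the simulation used for the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
region = np.array([[-2500.0, 2500.0], [-2500.0, 2500.0]])    # R
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
R = 40.0**2 * np.eye(2)                                      # sigma_M = 40
pd, lam_c = 0.9, 225

def simulate_scan(x_true, object_present=True):
    """Return one scan measurement set: clutter points plus, possibly, one object point."""
    n_clutter = rng.poisson(lam_c)
    points = [rng.uniform(region[:, 0], region[:, 1], size=(n_clutter, 2))]
    if object_present and rng.random() < pd:
        points.append(rng.multivariate_normal(H @ x_true, R)[None, :])
    return np.vstack(points)
```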

References

1. Bernard Lindgren. Statistical theory. 4th Ed. Chapman and Hall, 1993.
2. Edwin T Jaynes. Probability theory: The logic of science. Cambridge University Press, 2003.
3. Sharon B McGrayne. The theory that would not die: how Bayes' rule cracked the enigma code, hunted down Russian submarines, & emerged triumphant from two centuries of controversy. Yale University Press, 2011.
4. Arthur Gelb. Applied optimal estimation. MIT Press, 1974.
5. Arthur E Bryson and Yu-Chi Ho. Applied optimal control, revised printing. Hemisphere, New York, 1975.
6. BDO Anderson and JB Moore. Optimal filtering. Prentice Hall, 1979.
7. Lawrence D Stone, Roy L Streit, Thomas L Corwin, and Kristine L Bell. Bayesian multiple target tracking. 2nd Ed. Artech House, 2014.
8. Y Bar-Shalom and TE Fortmann. Tracking and data association. Academic Press, 1988.
9. Yaakov Bar-Shalom, Peter K Willett, and Xin Tian. Tracking and data fusion: A handbook of algorithms. YBS Publishing, Storrs, CT, 2011.
10. Subhash Challa, Mark R Morelande, Darko Mušicki, and Robin J Evans. Fundamentals of object tracking. Cambridge University Press, 2011.
11. Branko Ristic, Sanjeev Arulampalam, and Neil Gordon. Beyond the Kalman filter: Particle filters for tracking applications. Artech House, 2003.
12. J. Pearl. Probabilistic reasoning in intelligent systems. Morgan Kaufmann, Elsevier Science, 1988.
13. Richard E Neapolitan. Probabilistic reasoning in expert systems: theory and algorithms. Wiley, 1990.
14. Finn V Jensen. An introduction to Bayesian networks. Springer, 1996.
15. Joe Whittaker. Graphical models in applied multivariate statistics. Wiley, 1990.
16. Yaakov Bar-Shalom and Edison Tse. Tracking in a cluttered environment with probabilistic data association. Automatica, 11(5):451–460, 1975.
17. Roy Streit. Generating function derivation of the PDA filter. In 17th International Conference on Information Fusion (FUSION). IEEE, 2014.
18. Roy L Streit. Poisson point processes: imaging, tracking, and sensing. Springer Science, 2010.
19. Darko Musicki, Robin Evans, and Srdjan Stankovic. Integrated probabilistic data association. IEEE Transactions on Automatic Control, 39(6):1237–1241, 1994.
20. Darko Musicki and Sofia Suvorova. Tracking in clutter using IMM-IPDA-based algorithms. IEEE Transactions on Aerospace and Electronic Systems, 44(1):111–126, 2008.
21. Subhash Challa, Ba-Ngu Vo, and Xuezhi Wang. Bayesian approaches to track existence–IPDA and random sets. In 2002 5th International Conference on Information Fusion (FUSION). IEEE, 2002.
22. Darko Mušicki, Taek Lyul Song, and Roy L Streit. Generating function derivation of the IPDA filter. In 2014 Sensor Data Fusion: Trends, Solutions, Applications (SDF), pages 1–6, 2014.
23. F S Hillier and G J Lieberman. Introduction to operations research. 10th Ed. Open Access, 2015.
24. Bruno Sericola. Markov chains: theory and applications. John Wiley & Sons, 2013.
25. Andrew H Jazwinski. Stochastic processes and filtering theory. Academic Press 1970, Dover Edition, 2007.
26. Yaakov Bar-Shalom and Xiao-Rong Li. Estimation and tracking: principles, techniques, and software. YBS Publishing, Storrs, CT, 1995.

Chapter 3

Tracking a Specified Number of Objects

“Finished products are for decadent minds.” Isaac Asimov Second Foundation, 1953

Abstract The joint probabilistic data association (JPDA) and joint integrated probabilistic data association (JIPDA) filters are presented and derived from the AC point of view. JPDA stipulates the number of objects to be tracked, so it requires an external track management system to initiate and terminate object tracks. JIPDA is an extension of JPDA that “integrates” track initiation and termination into the tracking function. JIPDA stipulates a maximum number of objects to be tracked. Unresolved object models are incorporated into JPDA using the AC method. Examples are given. Keywords Joint probabilistic data association (JPDA) · Joint integrated probabilistic data association (JIPDA) · Merged measurements · Object resolution · JPDA/Res

3.1 Introduction

Multiple object tracking is a new world compared to the single-object tracking problems discussed in the first two introductory chapters. What makes it new is the greatly increased difficulty of the assignment problems that are encountered. Assignments in the single-object PDA filter are straightforward because there are precisely two hypotheses for every measurement—it originates either from the object or from clutter. On the other hand, assignments in the IPDA filter are harder because it includes an "existential doubt" about the object hypothesis. AC explicates both filters in a unified and transparent manner, as shown in Chap. 2. In contrast, assignments in multiobject tracking involve a multitude of hypotheses about measurements and objects, thus greatly increasing their number and the complexity of the problem. Joint probabilistic data association (JPDA) problems are


a natural extension of PDA to N known objects. In JPDA, a measurement originates either from one (and only one) of the objects or from clutter. Assuming that every object is detected (Pd = 1) and no clutter is present, there are N ! possible assignments. Including clutter and possibly undetected objects greatly increases the number of possible assignments and, at the same time, makes them increasingly complex and structured. Joint integrated probabilistic data association (JIPDA) is an established method that is now several decades old. It is a natural extension of PDA to at most N known objects. JIPDA is particularly interesting because it includes an existential doubt about the existence of every object. Both JPDA and JIPDA filters are derived for objects with different state spaces. The added generality can sometimes be useful, for example, when objects maneuver and switch between multiple motion models. It also happens to be helpful in later chapters when discussing the pros and cons of object state superposition. The benefits of AC to multiobject tracking. The assignment problems of JPDA and JIPDA begin to test the capabilities of AC. They are first steps into the wheelhouse of AC, and this chapter demonstrates the many merits of the AC method. AC for JPDA and JIPDA contributes something new to these established methods by forcing a really close look at the consequences of the underlying assumptions. For JPDA, it clarifies the way gating for multiple objects has to be done. The close look also reveals that, for well-separated objects with non-overlapped measurement gates, the trackers in the JPDA model are not independent if the clutter process is anything other than Poisson! This is because gate-level clutter processes are independent if and only if the overall clutter process is Poisson [1]. For JIPDA, AC greatly clarifies the “existence” model as well as the nature of the discrete-continuous state space on which the Bayes posterior distribution is defined. With the AC approach, it is possible to discuss the general case of multiple possibly existing objects with insight and understanding, perhaps for the first time. It is a much more interesting state space than that of JPDA. The AC method seamlessly encodes the logical assumptions of the assignment problem into a mathematical form that is well matched to modern software for automatic symbolic calculation. A big problem in multiple object tracking is the merged measurement, or unresolved object, problem. In this problem, two or more objects are in close enough proximity to each other that they generate fewer measurements than they would if they were well separated. This happens because the object point spread functions overlap/merge on the sensor response surface. It is a combinatorial assignment problem with a twist—it assigns the same measurement to two or more objects, depending on their joint state. In this chapter, a “smart” JPDA for N = 2 objects with possibly unresolved measurements is developed and some simulation results are presented. The general case of N ≥ 3 objects is more involved and will be treated elsewhere.


3.2 Joint Probabilistic Data Association (JPDA) Filter

The JPDA filter is a classical Bayesian multiobject estimator that determines the posterior joint PDF on the Cartesian product of the object state spaces, conditioned on all the available measurements. It is impractical for many applications because the exact filter is NP-hard: no known implementation avoids a runtime that grows combinatorially with the number of objects and measurements. The JPDA filter is described in many places. The early papers [2–4] lay the foundation, but [5] is more commonly cited as it provides practical application tips and applies JPDA to a passive sonar tracking problem. On the other hand, [6] presents the first AC derivation of JPDA, and [7] provides more discussion on the AC-JPDA connection. The JPDA filter is designed for N ≥ 1 objects. Objects are assumed independent of each other and causally independent of the measurement process. Each object generates at most one measurement in any given scan, and all object measurements are in the space Y. Measurements are unlabeled, i.e., which object generated which measurement is unknown. Measurements are assigned to at most one object, and the measurement likelihood function depends only on the object to which it is assigned. Measurements not assigned to an object are assigned to a clutter process. The PDA notation is adjusted to accommodate N objects. The state space of object n is denoted by X^n, n = 1, …, N. In many discussions, the object state spaces are taken to be identical, i.e., X^n ≡ X; however, the restriction is unnecessary. The state space for the N objects is the Cartesian product X^1 × ··· × X^N. The a priori PDF at scan k is specified as the product μ_{k−1}^1(·) ··· μ_{k−1}^N(·), where μ_{k−1}^n(·) is the a priori PDF of object n on X^n. A superscript n is added to already named PDA functions to indicate that they depend on the object. Thus, for object n at scan k, the predicted PDF for object n state x ∈ X^n is μ_k^{n−}(x), the probability of object detection is Pd_k^n(x), the Markov transition function is p_k^n(x | ·), the conditional probability of y ∈ Y is p_k^n(y | x), and the predicted measurement PDF is ρ_k^n(y). Distinct object-induced measurements are mutually independent given an object-measurement assignment and the corresponding object states. Let {y_1, …, y_M}, M ≥ 1, represent a collection of measurements, and let x^1, …, x^N represent the object states at scan k. Suppose, for instance, that exactly r ≥ 1 objects generated measurements, and that these measurements are y_{σ_1}, …, y_{σ_r}. Given a (feasible) object-measurement assignment θ ≡ {object τ_j generated measurement y_{σ_j} | 1 ≤ j ≤ r}, the measurement likelihood function at scan k factors into a product of individual object likelihood functions:

p_k(y_{σ_1}, …, y_{σ_r} \mid x^{τ_1}, …, x^{τ_r}, θ) = \prod_{j=1}^{r} p_k^{τ_j}(y_{σ_j} \mid x^{τ_j}) .


An additional factor is needed to account for the probability that the unassigned measurements are generated by the clutter process.

3.2.1 Multivariate Generating Functional

The assumptions of the single-object PDA filter are satisfied by every object. Individual object-measurement processes are therefore BMD processes. The GFL of the BMD model for object n at scan k is written, from (2.19),

Ψ_k^{\mathrm{BMD}(n)}(h_n, g) = \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n) \left[ 1 - Pd_k^n(x^n) + Pd_k^n(x^n) \int_Y g(y)\, p_k^n(y \mid x^n)\, dy \right] dx^n , \qquad (3.1)

where h_n : X^n → C and g : Y → C are indeterminate functions for object n and measurements, respectively. Technically, these indeterminate functions should have subscripts k, but they are unnecessary here and are omitted. Object measurements are superposed with clutter measurements in Y whose GFL is Ψ_k^C(g). The clutter process is assumed independent of the object processes, so the GFL of JPDA for N objects at scan k is

Ψ_k^{\mathrm{JPDA}}(h_{1:N}, g) = Ψ_k^C(g) \prod_{n=1}^{N} Ψ_k^{\mathrm{BMD}(n)}(h_n, g) . \qquad (3.2)

The compact functional form expands considerably upon substituting analytical expressions for the constituent GFLs. Under the traditional assumption of Poisson clutter with GFL (2.20),

Ψ_k^{\mathrm{JPDA}}(h_{1:N}, g) = \exp\!\left( -λ_k^c + λ_k^c \int_Y g(y)\, p_k^c(y)\, dy \right) \prod_{n=1}^{N} \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n) \left[ 1 - Pd_k^n(x^n) + Pd_k^n(x^n) \int_Y g(y)\, p_k^n(y \mid x^n)\, dy \right] dx^n . \qquad (3.3)

This expression is identical to (2.21) for N = 1. Letting h_ℓ(x) = 1, ℓ ∈ {1, …, N}\{n}, gives the marginal GFL of object n as

Ψ_k^{\mathrm{JPDA}}(h_n, g) ≡ Ψ_k^{\mathrm{JPDA}}(1, …, 1, h_n, 1, …, 1, g) = Ψ_k^C(g)\, Ψ_k^{\mathrm{BMD}(n)}(h_n, g) \prod_{ℓ=1,\, ℓ≠n}^{N} Ψ_k^{\mathrm{BMD}(ℓ)}(1, g) . \qquad (3.4)

The first two terms on the right-hand side correspond to the GFL of the PDA filter for object n in clutter. The presence of the other terms shows that marginalizing over all but one object does not remove the combinatorial influence of other objects. In other words, marginalizing does not reduce JPDA to PDA.
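The claim that GF manipulations are well matched to symbolic software is easy to check on a toy case. The sketch below builds the secularized JPDA generating function for N = 2 objects and M = 2 measurements with scalar placeholders (q_n for the "object n not detected" mass, a_nm for the detection-times-likelihood mass, c_m for the clutter density at y_m; these symbols are illustrative, not the book's notation) and lets SymPy take the cross-derivative.

```python
import sympy as sp

b1, b2, lam = sp.symbols('beta1 beta2 lambda_c')
c1, c2, q1, q2 = sp.symbols('c1 c2 q1 q2')
a11, a12, a21, a22 = sp.symbols('a11 a12 a21 a22')

# clutter exponential times one linear factor per object, cf. (3.3) after secularization
G = sp.exp(-lam + lam * (b1 * c1 + b2 * c2)) \
    * (q1 + b1 * a11 + b2 * a12) * (q2 + b1 * a21 + b2 * a22)

posterior = sp.expand(sp.diff(G, b1, b2).subs({b1: 0, b2: 0}))
print(len(posterior.as_ordered_terms()))   # 7 monomials, one per feasible assignment
```

Each of the seven surviving monomials corresponds to one feasible assignment of the two measurements to {object 1, object 2, clutter}; this is the count η(2, 2) = 7 obtained in Sect. 3.2.5.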


3.2.2 Exact Bayes Posterior Probability Distribution via AC

In this section, the cross-derivative of the GFL is merely an analytical tool for deriving the Bayes posterior PDF of the JPDA filter. The tracking context is absent because it is subsumed inside the differentiation procedures. It is not gone, however. It is shown (in the next section) that each and every term in the normalized cross-derivative has a specific probabilistic interpretation in the JPDA filter. Bayes posterior GFL. The scan measurement set is y_k ⊂ Y. Suppose first that y_k ≠ ∅, and let y_k = {y_1, y_2, …, y_M}, M ≥ 1. Define the delta train exactly as in (2.22) and substitute it into (3.3) to obtain the secularized function

Ψ_k^{\mathrm{JPDA}}(h_{1:N}, β) ≡ \exp\!\left( -λ_k^c + λ_k^c \sum_{m=1}^{M} β_m\, p_k^c(y_m) \right) \prod_{n=1}^{N} \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n) \left[ 1 - Pd_k^n(x^n) + Pd_k^n(x^n) \sum_{m=1}^{M} β_m\, p_k^n(y_m \mid x^n) \right] dx^n . \qquad (3.5)

The normalized cross-derivative (see (B.67))

Ψ_k^{\mathrm{JPDA}}(h_{1:N} \mid y_k) = \frac{ \left. \dfrac{d^M}{dβ_1 \cdots dβ_M}\, Ψ_k^{\mathrm{JPDA}}(h_{1:N}, β) \right|_{β=0} }{ \left. \dfrac{d^M}{dβ_1 \cdots dβ_M}\, Ψ_k^{\mathrm{JPDA}}(1_{1:N}, β) \right|_{β=0} } \qquad (3.6)

is the GFL of the exact Bayes posterior distribution of the JPDA filter. By inspection, the numerator is the cross-derivative of the product of an exponential of a linear function of β and N linear functions of β. Computing its cross-derivative is, therefore, a straightforward exercise in calculus.¹ Let Θ_{M×N} be the set of all M × N matrices whose elements are all either zero or one, and whose row and column sums are also either zero or one. Define Ñ ≡ min{M, N}, so that Ñ is the number of rows or columns of θ, depending on which is smaller. Let θ = [θ_{mn}] ∈ Θ_{M×N}. For κ = 0, 1, …, Ñ, define

Θ(κ) = \left\{ θ ∈ Θ_{M×N} : \sum_{m=1}^{M} θ_{mn} = 1 \text{ for exactly } κ \text{ columns } n ∈ \{1, …, N\} \right\} . \qquad (3.7)

Thus, Θ(κ) is the subset of Θ_{M×N} with exactly κ columns that sum to one. Let the set I(θ) ≡ {i_1, …, i_κ} denote the indices of the columns of θ that sum to one, and let J(θ) ≡ {j_1, …, j_{N−κ}} be the indices of the remaining columns that sum to zero. Then,

I(θ) ∩ J(θ) = ∅ \quad \text{and} \quad I(θ) ∪ J(θ) = \{1, …, N\} . \qquad (3.8)

¹ It is evaluated using the formula (C.37) derived in Sect. C.4 of Appendix C. Like many derivatives, the details are messy.


For n ∈ I(θ), define m_θ(n) to be the index of the (unique) nonzero row of column n. The matrix θ has the entry θ_{m_θ(n),n} = 1. With this notation, the GFL of the exact JPDA posterior is

Ψ_k^{\mathrm{JPDA}}(h_{1:N} \mid y_k) = \frac{1}{C} \sum_{κ=0}^{Ñ} \sum_{θ ∈ Θ(κ)} \left( \prod_{n ∈ J(θ)} \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n) \left( 1 - Pd_k^n(x^n) \right) dx^n \right)
\times \left( \prod_{n ∈ I(θ)} \frac{1}{λ_k^c\, p_k^c(y_{m_θ(n)})} \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n)\, Pd_k^n(x^n)\, p_k^n(y_{m_θ(n)} \mid x^n)\, dx^n \right) , \qquad (3.9)

where the normalizing constant C ≡ C(y_k) is

C = \sum_{κ=0}^{Ñ} \sum_{θ ∈ Θ(κ)} \left( \prod_{n ∈ J(θ)} \int_{X^n} μ_k^{n-}(x^n) \left( 1 - Pd_k^n(x^n) \right) dx^n \right) \left( \prod_{n ∈ I(θ)} \frac{1}{λ_k^c\, p_k^c(y_{m_θ(n)})} \int_{X^n} μ_k^{n-}(x^n)\, Pd_k^n(x^n)\, p_k^n(y_{m_θ(n)} \mid x^n)\, dx^n \right) . \qquad (3.10)

To see this, use the formula (C.37) to find the cross-derivative in the numerator of (3.6). The denominator of (3.6) is the numerator evaluated at h_n(x^n) ≡ 1, x^n ∈ X^n. The nonzero constant c_0(0) = \exp(-λ_k^c) \prod_{m=1}^{M} λ_k^c\, p_k^c(y_m) defined by (C.38) depends only on the measurements. Canceling it gives (3.9)–(3.10). Bayes posterior probability distribution. The posterior PDF at a given point (x^1, …, x^N) ∈ X^1 × ··· × X^N is found by substituting the weighted Dirac deltas h_δ^n(·) = α_n δ_{x_k^n}(·), α_n ∈ C, into (3.9). This gives the secular function Ψ_k^{\mathrm{JPDA}}(α \mid y_k), where α = (α_1, …, α_N). The Dirac deltas kill integrals by sampling them, so that, by inspection, the secular function is proportional to the product α_1 ··· α_N. The cross-derivative with respect to α at α = 0 is therefore the proportionality constant. The exact Bayes posterior PDF is, again by inspection,

p_k(x_k^{1:N} \mid y_k) = \frac{1}{C} \sum_{κ=0}^{Ñ} \sum_{θ ∈ Θ(κ)} \left( \prod_{n ∈ J(θ)} μ_k^{n-}(x_k^n) \left( 1 - Pd_k^n(x_k^n) \right) \right) \left( \prod_{n ∈ I(θ)} \frac{ μ_k^{n-}(x_k^n)\, Pd_k^n(x_k^n)\, p_k^n(y_{m_θ(n)} \mid x_k^n) }{ λ_k^c\, p_k^c(y_{m_θ(n)}) } \right) . \qquad (3.11)

The number of terms in this sum is equal to the number of feasible assignments and is discussed in Sect. 3.2.5. Finally, suppose yk = ∅. In this case, evaluating the GFL (3.3) at g(y) = 0 gives

Ψ_k^{\mathrm{JPDA}}(h_{1:N}, 0) = \exp(-λ_k^c) \prod_{n=1}^{N} \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n) \left( 1 - Pd_k^n(x^n) \right) dx^n . \qquad (3.12)

Normalizing this GFL by its value at h_n(x^n) ≡ 1, x^n ∈ X^n, substituting the weighted Dirac deltas h_δ^n(·) = α_n δ_{x_k^n}(·), and taking the normalized cross-derivative with respect to α at α = 0 give the Bayes posterior PDF

p_k(x_k^{1:N} \mid y_k = ∅) = \prod_{n=1}^{N} \frac{ μ_k^{n-}(x_k^n) \left( 1 - Pd_k^n(x_k^n) \right) }{ s_k^n(y_k = ∅) } , \qquad (3.13)

where s_k^n(y_k = ∅) = \int_{X^n} μ_k^{n-}(x^n) \left( 1 - Pd_k^n(x^n) \right) dx^n .

Exact PDA filter as a special case. For N = 1, the exact Bayes posterior PDF represented by (3.11) and (3.13) is identical to corresponding expressions for the PDA filter given in Chap. 2. This is self-evident for y_k = ∅. If y_k ≠ ∅, that is, if M ≥ 1, then Ñ = min{M, N} = 1 and Θ(0) = {0_M}, where 0_M is the zero vector in R^M, and Θ(1) = {e_1, …, e_M} is the set of "one-hot" basis vectors for R^M. The sum over Θ(0) gives the constant term of the PDA filter, and the sum over Θ(1) gives the remaining terms. Alternative form of the posterior GFL. The GFL (3.9) is a probabilistic mixture of GFLs. To see this, note that the events {Θ(κ), 0 ≤ κ ≤ Ñ} are disjoint and their union is the set of feasible JPDA assignments. For 0 ≤ κ ≤ Ñ, let θ be an arbitrarily specified matrix in Θ(κ). Define the event conditioned GFLs on the space X^1 × ··· × X^N by

Ψ_k^{\mathrm{JPDA}}(h_{1:N} \mid y_k, θ) = \prod_{n ∈ J(θ)} \frac{ \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n) \left( 1 - Pd_k^n(x^n) \right) dx^n }{ \int_{X^n} μ_k^{n-}(x^n) \left( 1 - Pd_k^n(x^n) \right) dx^n }
\times \prod_{n ∈ I(θ)} \frac{ \int_{X^n} h_n(x^n)\, μ_k^{n-}(x^n)\, Pd_k^n(x^n)\, p_k^n(y_{m_θ(n)} \mid x^n)\, dx^n }{ \int_{X^n} μ_k^{n-}(x^n)\, Pd_k^n(x^n)\, p_k^n(y_{m_θ(n)} \mid x^n)\, dx^n } . \qquad (3.14)

They evaluate to one if h_n(x^n) ≡ 1 for all n ∈ I(θ) ∪ J(θ) = {1, …, N} and x^n ∈ X^n. There are exactly κ indices in I(θ) and N − κ indices in J(θ). Define the probabilities

\Pr\{θ \mid y_k\} = \frac{1}{C'(y_k)} \left\{ \exp(-λ_k^c) \left( λ_k^c \right)^{M-κ} \prod_{\substack{m=1 \\ m ∉ \{m_θ(n) : n ∈ I(θ)\}}}^{M} p_k^c(y_m) \right\}
\times \prod_{n ∈ J(θ)} \int_{X^n} μ_k^{n-}(x^n) \left( 1 - Pd_k^n(x^n) \right) dx^n \; \prod_{n ∈ I(θ)} \int_{X^n} μ_k^{n-}(x^n)\, Pd_k^n(x^n)\, p_k^n(y_{m_θ(n)} \mid x^n)\, dx^n , \qquad (3.15)


where the normalizing constant C'(y_k) is chosen so that

\sum_{κ=0}^{Ñ} \sum_{θ ∈ Θ(κ)} \Pr\{θ \mid y_k\} = 1 . \qquad (3.16)

The three major terms in the expression are interpreted as follows. The term in braces in (3.15) is the probability that the Poisson clutter process generated the M − κ points y_k \ {y_{m_θ(n)} : n ∈ I(θ)}. The second term is a product over J(θ), and it is the probability that the N − κ objects with indices in J(θ) are not detected. The third term is a product over I(θ). It is the probability that the remaining κ objects generate one measurement each—to be precise, that object n ∈ I(θ) generates measurement y_{m_θ(n)}. The normalizing constant, C'(y_k), is the sum of the likelihoods, so the quantities (3.15) specify the PMF of a discrete random variable on the set of feasible assignments θ ∈ Θ_{M×N}. It is conditioned on the data y_k. (It is used in combinatorial optimization to estimate the most likely set of measurement-to-object assignments.) With these definitions, the GFL (3.9) for JPDA is written

Ψ_k^{\mathrm{JPDA}}(h_{1:N} \mid y_k) = \sum_{κ=0}^{Ñ} \sum_{θ ∈ Θ(κ)} \Pr\{θ \mid y_k\}\; Ψ_k^{\mathrm{JPDA}}(h_{1:N} \mid y_k, θ) . \qquad (3.17)

The Bayes posterior GFL is, by construction, a probabilistic mixture of GFLs, one for each term in the exact JPDA filter. See Appendix B, Sect. B.3.
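To make the mixture (3.17) concrete, the feasible assignment matrices and their probabilities (3.15)–(3.16) can be enumerated directly for small problems. The sketch below is illustrative (brute force, so only viable for small M and N); the inputs `miss`, `lik`, and `clutter_int` stand for the integrals ∫μ^{n-}(1−Pd^n)dx, ∫μ^{n-} Pd^n p^n(y_m|x)dx, and λ^c p^c(y_m), respectively, and are assumed to have been computed elsewhere.

```python
import numpy as np
from itertools import product

def feasible_assignments(M, N):
    """All M x N 0-1 matrices theta with row and column sums at most one."""
    for choice in product(range(-1, N), repeat=M):        # -1 means "clutter"
        assigned = [n for n in choice if n >= 0]
        if len(assigned) == len(set(assigned)):           # at most one measurement per object
            theta = np.zeros((M, N), dtype=int)
            for m, n in enumerate(choice):
                if n >= 0:
                    theta[m, n] = 1
            yield theta

def assignment_pmf(miss, lik, clutter_int):
    """Normalized Pr{theta | y_k} over all feasible theta, cf. (3.15)-(3.16)."""
    M, N = lik.shape
    thetas = list(feasible_assignments(M, N))
    weights = []
    for theta in thetas:
        detected = theta.sum(axis=0).astype(bool)
        w = np.prod(miss[~detected])                      # undetected objects
        for n in np.flatnonzero(detected):
            m = int(np.flatnonzero(theta[:, n])[0])
            w *= lik[m, n] / clutter_int[m]               # object term replaces a clutter term
        weights.append(w)
    weights = np.array(weights)
    return thetas, weights / weights.sum()

# toy example: two objects, two measurements -> 7 feasible assignments
thetas, pmf = assignment_pmf(miss=np.array([0.1, 0.1]),
                             lik=np.array([[0.8, 0.05], [0.02, 0.7]]),
                             clutter_int=np.array([0.3, 0.3]))
print(len(thetas), pmf.round(3))
```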

3.2.3 Measurement Assignments and Cross-Derivative Terms The traditional derivation of the JPDA filter starts by defining the feasible measurement-to-object assignment events, and then assembles the Bayes posterior distribution from the probabilities of these events. The AC derivation, on the other hand, does not define assignment events. It starts by characterizing the statistical properties of objects and measurements in a GFL, and then derives the filter as the normalized cross-derivative of the GFL. The “look and feel” of the two derivations are very different, and yet they lead to the same filter. This suggests that the AC approach to tracking is not merely a different way to derive exact Bayesian filters such as JPDA; it is an alternative way to conceptualize these filters combinatorially. In the AC derivation of the JPDA filter, assignment events are defined after the exact posterior is derived. The right question to ask is “Are these events identical to the feasible assignment events of the traditional derivation?” The answer is yes, of course, since the GFL fully characterizes the exact JPDA filter. The analytic proof that there is a one-to-one correspondence between cross-derivative terms and assignment events is a corollary of the proof of (3.17) that the GFL is a probabilistic mixture defined on the collection of all feasible assignments.


Index constraints appear automatically in AC methods as part of the differentiation procedure. They model assumptions that are handcrafted into the enumerative derivation. For example:

1. Leibniz's product rule for cross-derivatives requires the entries in the matrices θ ∈ Θ_{M×N} to be binary (zero or one), and it limits the maximum number of nonzero entries per row to one. This sets the stage for assignment problems.
2. The linearity of the object secular functions Ψ_k^{\mathrm{BMD}(n)}(h_n, β), that is, the factors in the product (3.5), imposes the vital requirement that columns of θ ∈ Θ_{M×N} contain at most one nonzero entry. These constraints act together to make m_θ(n) a well-defined function with domain I(θ), i.e., the columns of θ that sum to one.
3. The function m_θ(n) is the AC manifestation of the "at most one measurement per object per scan" rule. This fundamental rule is embedded in the GFL, where it silently disappears, and then reappears naturally in the cross-derivative.
4. The bound Ñ = min{M, N} on the maximum number of columns of θ that sum to one is the analytical equivalent of the simple fact that it is impossible to assign more measurements to objects than there are objects, and conversely.

3.2.4 Closing the Bayesian Recursion

The prior PDF of JPDA at scan k is the product of the single-object priors, μ_{k−1}^1(·) ··· μ_{k−1}^N(·). The number of terms in the exact Bayes posterior distribution (3.11) is equal to the number of feasible assignments. Not only is this an impractically large sum in many applications, but it does not factor into a product of N terms and so is not of the same form as the prior. To close the Bayesian recursion, the JPDA filter approximates the posterior by the product of its single-object marginal PDFs.² Explicitly,

p_k(x_k^{1:N} \mid y_k) = μ_k^1(x_k^1 \mid y_k) ··· μ_k^N(x_k^N \mid y_k) , \qquad (3.18)

where (using an obvious notation)

μ_k^n(x^n \mid y_k) ≡ \int_{X^1 × ··· × X^N \setminus X^n} p_k(x^{1:N} \mid y_k)\, dx^{1:N} \setminus dx^n

(3.19)

is the marginal PDF of the exact Bayesian posterior integrated over all object state spaces except X^n. The GFL of the marginal Bayes process is given by (3.4). Under linear-Gaussian assumptions for object motion and measurement likelihood function, and assuming constant detection probabilities, the integrals in the normalizing constant C in (3.11) can be evaluated explicitly. The result is that the exact posterior is a large Gaussian mixture. Consequently, the marginal PDFs are also Gaussian mixtures. The marginal PDF for object n is approximated in the same

² Often called mean field approximations in the physics and machine learning communities.

way as done in the single-object PDA filter, that is, by approximating the mixture by a single Gaussian with the same mean and covariance matrix as the posterior mixture implicit in the GFL (3.4). As mentioned above, the JPDA posterior PDF does not factor into a product of N terms. The objects themselves are independent, by assumption, so why does the Bayesian estimate of their joint state suggest otherwise? This perhaps counterintuitive result is due entirely to what is sometimes termed as “assignment interference.” Conditioning the sum (3.11) on one assignment θ ∈ (κ) reduces it to the one term, the θ term, and this term factors into the product form that intuition expects.

3.2.5 Number of Assignments

Let η(n, m) represent the number of ways to assign m measurements to n objects in the JPDA filter. Then

η(n, m) = \sum_{κ=0}^{\min(n,m)} \binom{n}{κ} \binom{m}{κ}\, κ! . \qquad (3.20)

∞ ∞ n=0

m=0

η(n, m)

z n wm . n!m!

(3.21)

The EGF of the number of feasible JPDA assignments (3.20) is A(z, w) = ew+z+wz .

(3.22)

To prove (3.22), it suffices to show that the coefficients of the double power series expansion of G(z, w) about z = w = 0 are the correct numbers. Calculation gives    dn+m  w+z+wz  dm  w n  e e (1 + w)  =  dz n dw m dw m w=z=0 w=0   m  w = m! w e (1 + w)n ≡ η(n, m).

(3.23)

3.2 Joint Probabilistic Data Association (JPDA) Filter

59

The coefficient of w m is found by multiplying the series expansion of ew about w = 0 and the binomial expansion of (1 + w)n . Simplifying the resulting sum gives η(n, m). Details are omitted.

3.2.6 Measurement Gating To reduce computational complexity, validation gates are often employed for each object to eliminate measurements that are deemed “highly unlikely” to have been generated by the object. Let kn denote the gate for object n at scan k. The GFL for object n with gated measurements has the same form as (2.33), but with obvious changes, namely, (3.24) kBMD(n)/gtd (h n , g) =    n/gtd n/gtd n n n h n (x n )μn− g(y) pkn/gtd (y|x n ) dy dx n , k (x ) 1 − Pdk (x ) + Pdk (x ) Xn

kn

where the gated probability of detection is (compare to (2.31))  Pdkn/gtd (x n ) = Pdkn (x n )

kn

pkn (y | x n ) dy ≡ Pdkn (x n )Pkn/gtd (x n ) ,

(3.25)

and where the gated measurement likelihood function is (compare to (2.32))  −1 pkn/gtd (y|x n ) = Pkn/gtd (x n ) pkn (y | x n ) ,

y ∈ kn , x n ∈ Xn .

(3.26)

The likelihood functions are truncated, that is, they are identically zero outside the validation gates. The same indeterminate g(y) is used for all objects since kn ⊂ Y. The clutter process is restricted to the union of the object validation gates, N

k = ∪ kn ⊂ Y. n=1

(3.27)

If two or more validation gates overlap, the corresponding gated clutter processes are not independent because the clutter process restricted to the nonempty intersection contributes to them all. Surprisingly, even if the gates are pairwise disjoint, the gated clutter processes are independent if and only if the clutter process on k is a Poisson point process [1, Th. 2.4.V]. By using the union k of the individual object gates as the JPDA validation gate, JPDA is able to incorporate any clutter process whose GFL is known. It is widely accepted in practice that tracking independent objects that are “well separated” in both state and measurement space can be done independently. In the

60

3 Tracking a Specified Number of Objects

JPDA formulation of the problem, this statement is theoretically correct if and only if the clutter process is Poisson. Object processes are mutually independent regardless of the gating, provided the object-specific gates kn are specified using only information from the predicted distribution of each object. Object-measurement processes are independent of the clutter process on Y and, hence, are independent of the clutter process restricted to the gate union k . The GFL of the gated JPDA is the product kJPDA/gtd (h 1:N, g) = kC/gtd (g)

N n=1

kBMD(n)/gtd (h n , g) .

(3.28)

For N = 1, the GFL reduces to the GFL for gated PDA (2.34). The exact Bayesian posterior of the gated JPDA filter is derived as a ratio of cross-derivatives of (3.28) in the same way as done for the ungated filter. Using the traditional Poisson clutter process gives the exact Bayes posterior distribution (3.11), with obvious changes in notation to reflect gated variables. More aggressive gating to a specific object falls to the object likelihood functions, which are zero outside their respective gates. Measurements that are outside an object’s gate give zero contribution to the Bayesian posterior. Implementations require bookkeeping to avoid computing terms that are zeroed by the gated object likelihood functions.

3.3 Joint Integrated Probabilistic Data Association (JIPDA) Filter The JIPDA filter “integrates” a detection capability into JPDA by extending the multiple object continuous state space of JPDA to a multiple object discrete-continuous state space. The discrete component enables JIPDA to model object existence. It is, like JPDA, a classical Bayesian estimator and is conditioned on all the available measurements. It was first derived in [9–11]. A random set style derivation was given in [12, 13]. The AC derivation of JIPDA was first given in [14] and also later in [7]. Let N denote the specified number of IPDA-style object GFL models, so that it is also the maximum number of objects. The JPDA notation for objects and measurements is retained for JIPDA. The IPDA existence models and notation given in Chap. 2 are extended to multiple objects, with indices added to make everything object-specific. The existence of object n is modeled as a continuous-time two-state Markov chain, N n (t), with states in the set B = {0, 1}. If N n (t) = 1, object n is said to exist at time t; if N n (t) = 0, object n is said not to exist. Objects are independent, by assumption, so the Markov chains are independent. The existence variable Nkn ≡ N n (tk ) for object n at time tk is a discrete-time Markov chain on B. Existence variables are written with subscripts or superscripts or both, so they should not be confused with the specified number of object models N .

3.3 Joint Integrated Probabilistic Data Association (JIPDA) Filter

61

The transition probability matrix for object n is row stochastic, 0n 0n 1−πk−1 πk−1 , = 1n 1n 1−πk−1 πk−1 

Ank−1

(3.29)

0n 1n where πk−1 and πk−1 are the probabilities that the chain stays in state 0 and 1, respectively, when transitioning from scan k − 1 to scan k. The value χ0n = 0.5 is a common choice for the prior probability of existence. The predicted existence probability χkn− ≡ Pr{Nkn = 1 | y1:k−1 } is determined by the Markov chain via the vector-matrix product, as in (2.36). The posterior probability that object n exists at time tk is χkn ≡ Pr{Nkn = 1 | y1:k }, k ≥ 1. It is important to note that existence probabilities can be state dependent. This case is treated in Sect. 4.5.2 of Chap. 4.

3.3.1 Integrated State Space The state space for JIPDA with at most N objects is complicated not because it is a discrete-continuous space, but because objects that do not exist cannot have a continuous state space. The complication is seen even in its simplest form for N = 1, as discussed in Sect. 2.6.1. The “integrated” JIPDA state space is the union of 2 N Cartesian products: N

LN =

Xn 1 × · · · × X n κ ,

(3.30)

κ=0 1≤n 1 > 1 − wr , or vice versa, then the unresolved measurement is more likely to be nearer the strong object than

3.5 Numerical Examples: Tracking with Unresolved Objects

73

the weak one. This model is a not-unreasonable approximation to the behavior of a peak picking measurement process that thresholds the sensor response surface (see Sect. 2.3 of Chap. 2). The resolution function is the probability that the two objects are resolved when they are in states x1 and x2 . It is assumed to have the Gaussian form  1 r (x , x ) ≡ 1 − exp − 2 2σres  1 = 1 − exp − 2 2σres 1

2

# # # H (x1 − x2 )#2



 1 (x − x 2 )2 + (y 1 − y 2 )

(3.71)  2

,

where the constant σres > 0 is specified. The form of (3.71), in conjunction with the linear-Gaussian object modeling assumptions, allows the Bayesian posterior in (3.59) to be calculated analytically as a weighted mixture of Gaussian PDFs.3 Calculating the mean and covariance of a Gaussian mixture is straightforward. Thus, no approximations are used, until the Bayesian recursion is closed with a single Gaussian whose first and second moments match those of the Gaussian mixture. Measurement gating. To increase computational efficiency, only measurements that lie within the gate k ≡ k1 ∪ k2 ∪ kunres are passed to the tracking filter. The gates k1 and k2 are defined for objects one and two individually in the standard way as outlined in Sect. 2.7.3, with γ = 0.995. The gate   T    −1  unres unres unres H Pk|k−1 y − yˆk|k−1 ≤ 0.995 HT + R kunres ≡ y ∈ Y : y − yˆk|k−1 is designed to encompass potential unresolved measurements. The gate mean,   unres 1 2 1 2 , ≡ wr yˆk|k−1 + (1 − wr ) yˆk|k−1 = H wr xˆk|k−1 + (1 − wr )xˆk|k−1 yˆk|k−1

(3.72)

is the weighted average of the two predicted object measurements. The gate covariance matrix, unres 1 2 ≡ wr2 Pk|k−1 + (1 − wr )2 Pk|k−1 , (3.73) Pk|k−1 takes into account the independence of predicted object states.

3.5.1 JPDA/Res Filter with Weak and Strong Crossing Tracks In this scenario, two objects move through R with constant velocities, eventually crossing paths at the origin as shown in Fig. 3.2. Object one (red) begins in state  T x01 ≈ −2000 31.5 −352.7 5.6 at time t0 = 0, and it moves at this constant velocity for 120 s. Object two (blue) mirrors object one. It begins in state x02 ≈ 3 Some

components of the mixture have negative weights, but the mixture PDF is strictly positive.

74

3 Tracking a Specified Number of Objects

 T −2000 31.5 352.7 −5.6 , and it moves at this constant velocity for 120 s. The objects cross at the origin at a 20◦ angle. The region of interest is R = [−2000, 2000] × [−1000, 1000]. The mean number of clutter measurements λck ≡ λc = 250 is constant over all scans. Thus, in each scan, there is an average of approximately 2.2 clutter measurements in a 3σ M measurement window. The unresolved weighting factor is chosen to be wr = 10/11 ≈ 0.91. Under the relative signal $ return % strength interpretation, the signal return strength of object one wr is 10 log10 1−wr = 10 dB higher than that of object two. The tracker process noise standard deviation is set to σ p = 0.5. The resolution parameter is chosen to be σres = 235.7. This value for σres gives a resolution probability of approximately 0.5 when the objects are around 275 m apart, and the resolution probability drops to 0 when the objects eventually cross. The filter outputs are depicted in Fig. 3.2. Ground truth object position in x-y space is given by the black dashed line, the gray dots are the superposed clutter realizations over the last five scans (clutter realizations are independent from scan to scan), and red/blue ellipses are the 99% error ellipses centered at the tracker spatial estimate (“x”) for each object at scans 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, and 120. The tracker spatial estimates over all 120 scans are given by the red/blue lines. Red/blue asterisks represent individual object measurements. Green asterisks are unresolved/merged measurements. For reference, the dark gray circle in the lower right-hand corner of each plot depicts a 3σ M measurement window.

3.5.2 JPDA/Res with Parallel Object Tracks In this section, the two objects move in close proximity to one another for an extended period of time. The object motion is quite similar to that in the IPDA numerical example in Sect. 2.8, and tracking results are shown in Figs. 3.3 and 3.4. Both objects move through R at a constant speed of 50 m/s. Object one begins  T in state x01 = −1700 0 1830 −50 at time t0 = 0. It moves at this constant velocity for 15 s, turns at 3◦ per second counterclockwise for the next 30 s, and then moves in the positive x-direction for 75 s. Object two mirrors this process. For T  k = 0, 1, . . . , K , if xk1 = xk1 x˙k1 yk1 y˙k1 is the state of object one at scan k, then T  xk2 = xk1 x˙k1 −yk1 − y˙k1 is the state of object two at scan k. Thus, the two objects remain at a constant distance of 6σ M = 300 from one another for the final 75 s. The region of interest spans R = [−2000, 3000] × [−3000, 3000]. The mean number of clutter measurements λck ≡ λc = 100 is constant over all scans. Thus, in each scan, there is an average of approximately 0.24 clutter measurements in a 3σ M measurement window. The tracker process noise standard deviation is set to σ p = 4, and the resolution parameter is chosen to be σres = 182.6. This value for σres gives a resolution probability of approximately 0.75 when the objects are moving in parallel (i.e., 300 m apart).

3.5 Numerical Examples: Tracking with Unresolved Objects

75

1 ground truth JPDA obj 1 estimate JPDA obj 2 estimate unresolved measurement

y (km)

0.5

0

-0.5

-1 -2

-1.5

-1

-0.5

0

0.5

1

1.5

2

x (km) 1 ground truth JPDA/Res obj 1 estimate JPDA/Res obj 2 estimate unresolved measurement

y (km)

0.5

0

-0.5

-1 -2

-1.5

-1

-0.5

0

0.5

1

1.5

2

x (km)

Fig. 3.2 Crossing objects: (Top)—Standard JPDA tracker. No specific attention to unresolved measurements. Returns from red object are ten times stronger than blue object. (Bottom)—JPDA/Res tracker. Unresolved measurements are considered. Returns from red object are ten times stronger than blue object

Figures 3.3 and 3.4 represent two different scenarios. The annotations in the figures are the same as in Fig. 3.2. In the first scenario, the unresolved weighting factor is wr = 10/11 ≈ 0.91. Under the relative signal $ return % strength interpretation, wr the signal return strength of object one is 10 log10 1−wr = 10 dB higher than that of object two and, thus, most of the green unresolved measurements are close to object two, as seen in Fig. 3.3. In the second scenario, the unresolved weighting factor is wr = 0.5, i.e., the signal return strengths of both objects are equal−−neither of the objects is favored. Tracking results for both the standard JPDA and JPDA/Res trackers are displayed in Fig. 3.4.

76

3 Tracking a Specified Number of Objects

ground truth JPDA obj 1 estimate JPDA obj 2 estimate unresolved measurement

2

y (km)

1

0

-1

-2 -2

-1

0

1

2

3

4

2

3

4

x (km)

ground truth JPDA/Res obj 1 estimate JPDA/Res obj 2 estimate unresolved measurement

2

y (km)

1

0

-1

-2 -2

-1

0

1

x (km) Fig. 3.3 Parallel objects: (Top)—Standard JPDA tracker. No specific attention to unresolved measurements. Returns from red object are ten times stronger than blue object. (Bottom)—JPDA/Res tracker. Unresolved measurements are considered. Returns from red object are ten times stronger than blue object

3.5 Numerical Examples: Tracking with Unresolved Objects

77

ground truth JPDA obj 1 estimate JPDA obj 2 estimate unresolved measurement

2

y (km)

1

0

-1

-2 -2

-1

0

1

2

3

4

2

3

4

x (km)

ground truth JPDA/Res obj 1 estimate JPDA/Res obj 2 estimate unresolved measurement

2

y (km)

1

0

-1

-2 -2

-1

0

1

x (km) Fig. 3.4 Parallel objects: (Top)—Standard JPDA tracker. No specific attention to unresolved measurements. Return strengths from both objects are equal. (Bottom)—JPDA/Res tracker. Unresolved measurements are considered. Return strengths from both objects are equal

78

3 Tracking a Specified Number of Objects

3.5.3 Discussion of Results The standard JPDA filter inherently assumes that a given measurement is either clutter, or it originates from exactly one of the objects of interest. Thus, in the unresolved measurement case where an object’s return is buried under the return of another object or where objects are close enough so that they fall into a single resolution cell, the standard JPDA filter model is hopelessly mismatched to the reality of the data—it lacks model fidelity. The mismatch is clearly seen in the upper subplots of Figs. 3.2 and 3.3. In both the crossing and parallel scenarios, since the red object produces stronger returns than the blue, the unresolved measurement more often “favors” the red object than the blue; thus, the blue object is left without a measurement. Therefore, the existing unresolved measurement is, for all practical purposes, assigned to the red object when performing the measurement update, whereas the blue object’s estimate is extrapolated based on its motion model and the previous measurement update. In contrast, the JPDA/Res filter allows for the possibility that only one signal return (measurement) could mean that the two objects are unresolved. The various ways this can happen are accounted for by the six terms (3.60)–(3.65). Thus, despite increased estimation error due to inflated covariance, both objects are successfully tracked as shown in the bottom subplots of Figs. 3.2 and 3.3. It is also seen in the bottom subplot of Fig. 3.2 that the blue object is extrapolated without being drawn off, or seduced, by clutter, as it was in the upper plot. This is rather remarkable, and somewhat unexpected. The explanation is the improved fidelity of the model. JPDA/Res also improves tracking performance when both objects produce equal strength returns and, hence, neither object is favored. As illustrated in the top subplot of Fig. 3.4, although standard JPDA maintains two tracks in the presence of unresolved measurements, the tracks switch at the start of the parallel motion and then switch back later. This is an undesirable situation in, e.g., radar tracking where maintaining track IDs is of utmost importance. Such an outcome is not observed with JPDA/Res, however, as can be seen in the lower subplot of Fig. 3.4. The reason, again, is that the likelihood function has six terms (3.60)–(3.65) to account for the way point measurements can be generated by the sensor.

References 1. Daryl J. Daley and David Vere-Jones. An introduction to the theory of point processes. Vol. I. Elementary Theory and Methods. Springer, 2003. 2. Yaakov Bar-Shalom, Thomas E Fortmann, and Molly Scheffe. Joint probabilistic data association for multiple targets in clutter. In Proc. Conf. on Information Sciences and Systems, pages 404–409, 1980. 3. Thomas E Fortmann, Yaakov Bar-Shalom, and Molly Scheffe. Multi-target tracking using joint probabilistic data association. In 19th IEEE Conference on Decision and Control, pages 807–812, 1980.

References

79

4. Shozo Mori, Chee-Yee Chong, Edison Tse, and Richard P Wishner. Tracking and classifying multiple targets without a priori identification. IEEE Transactions on Automatic Control, 31(5):401–409, 1986. 5. Yaakov Bar-Shalom and Xiao-Rong Li. Estimation and tracking- Principles, techniques, and software. YBS Publishing Storrs, CT, 1995. 6. Roy L Streit. Saddle point method for JPDA and related filters. In 2015 18th International Conference on Information Fusion (Fusion), pages 1680–1687, 2015. 7. Roy L Streit. How I learned to stop worrying about a thousand and one filters and love analytic combinatorics. In 2017 IEEE Aerospace Conference, Big Sky, Montana, pages 1–21, 2017. 8. P Flajolet and R Sedgewick. Analytic combinatorics. Cambridge University Press, 2009. 9. Darko Mušicki and Robin Evans. Joint integrated probabilistic data association: JIPDA. IEEE Transactions on Aerospace and Electronic Systems, 40(3):1093–1099, 2004. 10. Darko Mušicki and Barbara La Scala. Multi-target tracking in clutter without measurement assignment. IEEE Transactions on Aerospace and Electronic Systems, 44(3):877–896, 2008. 11. Taek Lyul Song, Hyoung Won Kim, and Darko Mušicki. Iterative joint integrated probabilistic data association. In Proceedings of the 16th International Conference on Information Fusion, pages 1714–1720, 2013. 12. Subhash Challa, Ba-Ngu Vo, and Xuezhi Wang. Bayesian approaches to track existence–IPDA and random sets. In 2002 5th International Conference on Information Fusion (FUSION). IEEE, 2002. 13. Subhash Challa, Mark R Morelande, Darko Mušicki, and Robin J Evans. Fundamentals of object tracking. Cambridge University Press, 2011. 14. Darko Mušicki, Taek Lyul Song, and Roy L Streit. Generating function derivation of the IPDA filter. In 2014 Sensor Data Fusion: Trends, Solutions, Applications (SDF), pages 1–6, 2014. 15. Frederick E Daum and Robert J Fitzgerald. Importance of resolution in multiple-target tracking. In Signal and Data Processing of Small Targets 1994, volume 2235, pages 329–338. International Society for Optics and Photonics, 1994. 16. Kuo-Chu Chang and Yaakov Bar-Shalom. Joint probabilistic data association for multitarget tracking with possibly unresolved measurements and maneuvers. IEEE Transactions on Automatic control, 29(7):585–594, 1984. 17. Wolfgang Koch and Günter Van Keuk. Multiple hypothesis track maintenance with possibly unresolved measurements. IEEE Transactions on Aerospace and Electronic Systems, 33(3):883–892, 1997. 18. Daniel Svensson, Martin Ulmke, and Lars Danielsson. Joint probabilistic data association filter for partially unresolved target groups. In 2010 13th International Conference on Information Fusion, pages 1–8, 2010. 19. Daniel Svensson, Martin Ulmke, and Lars Danielsson. Multitarget sensor resolution model for arbitrary target numbers. In Signal and Data Processing of Small Targets 2010, volume 7698, 2010. 20. Daniel Svensson, Martin Ulmke, and Lars Hammarstrand. Multitarget sensor resolution model and joint probabilistic data association. IEEE Transactions on Aerospace and Electronic Systems, 48(4):3418–3434, 2012. 21. Christoph Degen. Probability generating functions and their application to target tracking. PhD thesis, University of Bonn, Germany, 2017.

Chapter 4

Tracking a Variable Number of Objects

“Dubito, ergo sum, vel, quod idem est, cogito, ergo sum.” (“I doubt, therefore I am—or what is the same—I think, therefore I am.”) René Descartes La Recherche de la Vérité par La Lumiere Naturale, 1647

Abstract Bayesian estimators of a time-varying number of objects and their states are developed. The estimators are based on object superposition, a fundamental concept in finite point processes that is adapted to model multiobject state. The concept of superposition is introduced carefully step by step, starting with its application to the JPDA filter for a specified number of objects. This JPDAS filter lays the groundwork for the extension to a random number of objects, leading quickly and painlessly to the CPHD filter. This natural approach shows the many close connections between the CPHD and the JPDAS filters. A numerical device, new to tracking applications, called the complex step method is used to evaluate the intensity function to machine precision “for free” merely by evaluating the GFL of the Bayes posterior process in complex arithmetic. Examples are presented. Keywords Intensity function · Count density · State superposition · JPDAS intensity filter · PHD intensity filter · CPHD intensity filter · Complex step method · Multi-Bernoulli filter

4.1 Introduction

The JIPDA tracking filter developed in Chap. 3 is an exact Bayesian estimator of a time-varying number of objects and their states, given the reasonable assumption that a maximum number of objects, $N_{\max}$, is specified. It provides explicit expressions for calculating virtually any probability distribution needed in applications. For example, conditioned on the measurements up through the current scan, the PMF of the total number of objects that exist is given by (3.44). The state space (3.30) is complicated,
true enough, but the show-stopper is that the high computational complexity of JIPDA makes it unsuitable for many applications. In this chapter, lower computational complexity Bayesian multiobject filters are derived from higher complexity filters by superposing object states. Exact Bayesian filters derived via object superposition have several important features: (i) they cannot increase the complexity; (ii) they cannot increase the information content of the filter output; and (iii) they are inherently consistent with the object and measurement assumptions of the original filter. These properties are immediate consequences of the fact that the GFL of a superposed process is, loosely speaking, the “diagonal” of the GFL of the joint non-superposed process (Sect. B.10 of Appendix B). This chapter shows that the concept of superposition is broadly useful. When applied to JPDA, it leads to a new low-complexity filter called JPDAS. Extending JPDAS to random numbers of objects leads to the cardinalized probability hypothesis density (CPHD) filter, of which the PHD filter is a widely known special case. The GFL of the CPHD is derived as a mixture of GFLs of JPDAS filters. Many interesting and novel connections between these filters follow from the mixture GFL. The intensity function is a key concept in superposition. It is a summary statistic for point processes, and it is often used to close Bayesian recursions. Surprisingly, it can be evaluated numerically almost “for free” using a novel complex step method to find the derivative of the secular function of the exact Bayesian filter. The method is approximate, true, but it is accurate to machine precision, fast, and numerically stable. It is described in Sect. 4.3.5. The technique makes particle implementations of multiobject intensity filters more attractive than previously. The numerical example in Sect. 4.7.1 is, perhaps, the first use of the complex step method in tracking.

4.2 Superposition of Multiple Object States

4.2.1 General Considerations

Superposing object states onto a common space is a familiar practice in many fields—think of the theater-level plotting tables that display the geographic distribution of a diversity of objects, for example. The basic concept assumes that object state spaces are copies of a common state space, $\mathcal{X}$. When the spaces are incommensurate, e.g., they have different dimensionalities, it is necessary to map object states to points in $\mathcal{X}$ in order to superpose them. The superposition of the object states defines the multiobject state. The points in the superposition are unlabeled, that is, which point corresponds to the state of which object is unknown. The multiobject state is, therefore, inherently less informative than the list of object-specific states. The multiobject state space is the grand canonical ensemble (GCE) of all multiobject states; see (B.15) in Appendix B. It is denoted by $E(\mathcal{X})$. Random variables whose realizations are in the GCE event space are termed finite point processes.


Probability distributions for a finite point process are defined over the events in E(X). The goal of Bayesian analysis is, therefore, to determine the posterior PDF of a finite point process on the GCE, E(X), conditioned on a set of sensor point measurements in the measurement space Y. The dimensionality of the GCE is so large that the PDF on this space is represented by one or more summary statistics. The most commonly used statistics are the intensity and the pair correlation functions (see Appendix B). These statistics support the intuition that superposed processes are often good representations of the multiobject state. In contrast, marginalized processes (see Sect. B.9) can conflate the distributional support of well-separated objects.

4.2.2 Superposition with Non-identical Object Models

Recall that the GFL of the JPDA filter for $N$ objects with clutter is, from (3.2),

$$\Psi_k^{\mathrm{JPDA}}(h_{1:N}, g) = \Psi_k^{C}(g) \prod_{n=1}^{N} \Psi_k^{\mathrm{BMD}(n)}(h_n, g), \qquad (4.1)$$

where $\Psi_k^{C}(g)$ is the GFL (2.20) of the Poisson clutter process and $\Psi_k^{\mathrm{BMD}(n)}(h_n, g)$ is the GFL (3.1) of the BMD process of object $n$. It is assumed here that the state space of all $N$ objects is $\mathcal{X}$, but the JPDA models are otherwise unchanged. It allows different objects, so the transition (motion) functions, measurement likelihoods, detection probability functions, and prior PDFs depend on $n$, the object index, and the scan index $k$, e.g., the prior PDF is $\mu_{k-1}^{n}(x)$, $n = 1, \ldots, N$. For a general discussion of superposition, see (B.72) of Appendix B. The GFL of the superposed process is found by setting $h_n(x) = h(x)$, $n = 1, \ldots, N$, in the GFL of the unsuperposed process. Let

$$\Psi_k^{\mathrm{JPDA/s}}(h, g) \equiv \Psi_k^{\mathrm{JPDA}}(h, \ldots, h, g) = \Psi_k^{C}(g) \prod_{n=1}^{N} \Psi_k^{\mathrm{BMD}(n)}(h, g). \qquad (4.2)$$

Given the scan measurement set $y_k = \{y_1, y_2, \ldots, y_M\}$, $M \ge 1$, and the Dirac delta train (cf. Eq. (2.22)),

$$g_\delta(y) = \sum_{m=1}^{M} \beta_m\, \delta_{y_m}(y), \quad y \in \mathcal{Y},\ \beta = (\beta_1, \beta_2, \ldots, \beta_M) \in \mathbb{R}^M, \qquad (4.3)$$

the GFL of the Bayes posterior process is the normalized cross-derivative,

$$\Psi_k^{\mathrm{JPDA/s}}(h \mid y_k) = \frac{\left. \frac{d}{d\beta} \Psi_k^{\mathrm{JPDA/s}}(h, \beta) \right|_{\beta=0}}{\left. \frac{d}{d\beta} \Psi_k^{\mathrm{JPDA/s}}(1, \beta) \right|_{\beta=0}}, \qquad (4.4)$$


where $\Psi_k^{\mathrm{JPDA/s}}(h, \beta) \equiv \Psi_k^{\mathrm{JPDA/s}}(h, g_\delta)$ is a secularized function, and the notation $\frac{d}{d\beta} \equiv \frac{\partial^M}{\partial\beta_1 \cdots \partial\beta_M}$ is used for enhanced readability. Explicitly,

$$\Psi_k^{\mathrm{JPDA/s}}(h, \beta) = \exp\!\left( -\lambda_k^c + \lambda_k^c \sum_{m=1}^{M} \beta_m\, p_k^c(y_m) \right) \times \prod_{n=1}^{N} \int_{\mathcal{X}} h(x)\, \mu_k^{n-}(x) \left[ 1 - P_{d_k}^{n}(x) + P_{d_k}^{n}(x) \sum_{m=1}^{M} \beta_m\, p_k(y_m \mid x) \right] dx. \qquad (4.5)$$

This function is identical to (3.5) but with $h_n \equiv h$ and $\mathcal{X}_n \equiv \mathcal{X}$. By inspection, it is the product of an exponential of a linear function of $\beta$ and $N$ (non-identical) linear functions of $\beta$. Its cross-derivative has the same computational complexity as the cross-derivative in JPDA. The message is clear—superposing objects does not always reduce complexity.

Superposition does not change the number of objects. To see this, denote the GF of the number of objects in the superposed process by $G_N^{\mathrm{JPDA/s}}(z)$, where $z$ is the indeterminate. It is found (see Sect. B.3 of Appendix B) by substituting $h(x) \equiv z$ for all $x \in \mathcal{X}$ into (4.4). Since $z^N$ factors out of the analytic expression (4.5), the cross-derivatives with respect to $\beta$ cancel in the ratio, giving $G_N^{\mathrm{JPDA/s}}(z) = z^N$. Therefore, with probability one, exactly $N$ objects are present at scan $k$, a result in perfect accord with the JPDA assumption.

4.3 JPDAS: Superposition with Identical Object Models

Assume further that the $N$ object models are identical. The object transition functions, measurement likelihoods, detection probabilities, and prior PDFs are now independent of the object index. Dropping the object index $n$ from the model notation gives, e.g., $P_{d_k}^{n}(x) \equiv P_{d_k}(x)$ and $\mu_k^{n-}(x) \equiv \mu_k^{-}(x)$. As shown below, this additional assumption significantly reduces the computational complexity of the exact Bayesian posterior process.

4.3.1 Information Loss Due to Superposition

The assumption that the a priori PDF $\mu_{k-1}(x)$ is the same for all objects is a radical change. It profoundly alters the role of the prior. With the new assumption, one prior must serve for all $N$ objects simultaneously, and not just one, as previously had been the case.

To justify using one prior, recall the AC interpretation of the classic Bayes-Markov filter, wherein the object state is interpreted as a sample in a histogram with cell
probabilities determined by the prior. The histogram model still holds, but now there are N objects. Before superposition, N object-specific histograms have one sample each. After superposition, one histogram has all N samples. Objects are independent and, by assumption, the prior PDF is the same for all objects, so that one histogram is the count record of N IID samples from the prior PDF. Said differently, object superposition is the equivalent of pooling and IID sampling “with replacement.” Well-separated objects are therefore represented by a multimodal prior, that is, by a prior with one mode for each object. Multimodality is one reason why superposition is effective. On the other hand, IID sampling with replacement is a problem for object tracking because more than one of the N IID samples can be from the same mode, thereby violating the at most one measurement per object rule. To ensure objects are properly “present and accounted for,” the sampling procedure should be without replacement. Superposition loses information because, implicitly, it is based on sampling with replacement.

4.3.2 Generating Functional of the Bayes Posterior

Given the "sameness" assumption, $\Psi_k^{\mathrm{BMD}(n)}(h, g) \equiv \Psi_k^{\mathrm{BMD}}(h, g)$ and the GFL of the superposition is

$$\Psi_k^{\mathrm{JPDAS}}(h, g) = \Psi_k^{C}(g) \left[ \Psi_k^{\mathrm{BMD}}(h, g) \right]^{N}, \qquad (4.6)$$

where $\Psi_k^{\mathrm{BMD}}(h, g)$ is identical to (2.19), repeated here for easy reference:

$$\Psi_k^{\mathrm{BMD}}(h, g) = \int_{\mathcal{X}} h(x)\, \mu_k^{-}(x) \left[ 1 - P_{d_k}(x) + P_{d_k}(x) \int_{\mathcal{Y}} g(y)\, p_k(y \mid x)\, dy \right] dx. \qquad (4.7)$$

Defining $\Psi_k^{\mathrm{JPDAS}}(h, \beta) = \Psi_k^{\mathrm{JPDAS}}(h, g_\delta)$, where $g_\delta$ is given by (4.3), and rearranging terms gives

$$\Psi_k^{\mathrm{JPDAS}}(h, \beta) = \exp\!\left( -\lambda_k^c + \sum_{m=1}^{M} \beta_m\, \lambda_k^c\, p_k^c(y_m) \right) \times \left[ \int_{\mathcal{X}} h(x)\, \mu_k^{-}(x)\bigl( 1 - P_{d_k}(x) \bigr)\, dx + \sum_{m=1}^{M} \beta_m \int_{\mathcal{X}} h(x)\, \mu_k^{-}(x)\, P_{d_k}(x)\, p_k(y_m \mid x)\, dx \right]^{N}. \qquad (4.8)$$

This expression is the product of an exponential of a linear function of $\beta$ and—because the object models are identical—the $N$th power of a linear function of $\beta$. Its cross-derivative is given by Eq. (C.47) in Appendix C. As a shorthand notation, let

$$A(x) = \mu_k^{-}(x)\bigl( 1 - P_{d_k}(x) \bigr), \qquad (4.9)$$
$$B(x; y_m) = \frac{\mu_k^{-}(x)\, P_{d_k}(x)\, p_k(y_m \mid x)}{\lambda_k^c\, p_k^c(y_m)}, \quad m = 1, \ldots, M. \qquad (4.10)$$


The GFL of the Bayes posterior process is the normalized cross-derivative of (4.8) evaluated at $\beta = 0$,

$$\Psi_k^{\mathrm{JPDAS}}(h \mid y_k) = \frac{c(y_k)}{C_k(M, N)} \sum_{\kappa=0}^{\min\{N, M\}} (N)_\kappa \left( \int_{\mathcal{X}} h(x)\, A(x)\, dx \right)^{N-\kappa} \times S_\kappa^{(M)}\!\left( \int_{\mathcal{X}} h(x)\, B(x; y_1)\, dx,\; \ldots,\; \int_{\mathcal{X}} h(x)\, B(x; y_M)\, dx \right), \qquad (4.11)$$

where $C_k(M, N)$ is the normalization constant and

$$c(y_k) = \exp\bigl( -\lambda_k^c \bigr) \prod_{m=1}^{M} \lambda_k^c\, p_k^c(y_m) \qquad (4.12)$$

is the probability that the $M$ measurements are generated by the clutter process. The elementary symmetric polynomials (ESPs) in $M$ variables $S_\kappa^{(M)}(\cdot)$ are defined by (C.48) in Appendix C. The GFL (4.11) reduces to the GFL of the PDA filter (2.25)–(2.26) for $N = 1$ since $S_0^{(M)}(\cdot) = 1$ and $S_1^{(M)}(\cdot)$ is the sum of its arguments. As noted in Appendix C, the cross-derivative (C.47) comprises at most $2^M$ terms. However, due to superposition, its computational complexity is merely $O(MN)$ via the recursion given by Eqs. (C.50)–(C.51). This is in sharp contrast to the exponential computational complexity of standard (non-superposed) JPDA; see Eq. (3.20).

If the scan measurement set is empty, that is, $y_k = \emptyset$, then the JPDAS posterior GFL is given by

$$\Psi_k^{\mathrm{JPDAS}}(h \mid y_k = \emptyset) = \frac{\left( \int_{\mathcal{X}} h(x)\, \mu_k^{-}(x)\bigl( 1 - P_{d_k}(x) \bigr)\, dx \right)^{N}}{\left( \int_{\mathcal{X}} \mu_k^{-}(x)\bigl( 1 - P_{d_k}(x) \bigr)\, dx \right)^{N}}. \qquad (4.13)$$
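The $O(MN)$ cost quoted above comes from evaluating the ESPs recursively rather than by enumeration. A minimal sketch of that recursion in Python (this is not the book's Appendix C code; the sample values and the brute-force check are illustrative only):

```python
import itertools
import math

def esp(values, max_order):
    """Elementary symmetric polynomials S_0, ..., S_max_order of `values`.

    Uses the recursion S_kappa^{(m)} = S_kappa^{(m-1)} + b_m * S_{kappa-1}^{(m-1)},
    so the cost is O(M * max_order) for M = len(values).
    """
    S = [1.0] + [0.0] * max_order          # S[kappa] before any argument is processed
    for b in values:
        # sweep kappa downward so each argument b enters every S[kappa] exactly once
        for kappa in range(max_order, 0, -1):
            S[kappa] += b * S[kappa - 1]
    return S

# brute-force check on a small example
vals = [0.3, 1.7, 0.2, 0.9]
S = esp(vals, max_order=len(vals))
for kappa in range(len(vals) + 1):
    brute = sum(math.prod(c) for c in itertools.combinations(vals, kappa))
    assert abs(S[kappa] - brute) < 1e-12
print(S)
```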

4.3.3 Probability Distribution

The PDF of JPDAS is defined on the Cartesian product space $\mathcal{X}^N$. It is evaluated by substituting a weighted Dirac delta train for $h(x)$ and taking the cross-derivative. (See Sect. B.5 of Appendix B.) As noted above, exactly $N$ objects are present with probability one, so delta trains of length less than or greater than $N$ yield zero probabilities. Let $\{x_1, \ldots, x_N\}$ denote a set of object states. For $\alpha = (\alpha_1, \ldots, \alpha_N)$, let

$$h_\delta(x) = \sum_{n=1}^{N} \alpha_n\, \delta_{x_n}(x). \qquad (4.14)$$

Substituting into (4.11) gives the secular function; it is a multivariate polynomial in the components of $\alpha$,

$$\Psi_k^{\mathrm{JPDAS}}(\alpha \mid y_k) = \frac{c(y_k)}{C_k(M, N)} \sum_{\kappa=0}^{\min\{N, M\}} (N)_\kappa \left( \sum_{n=1}^{N} \alpha_n A(x_n) \right)^{N-\kappa} \times S_\kappa^{(M)}\!\left( \sum_{n=1}^{N} \alpha_n B(x_n; y_1),\; \ldots,\; \sum_{n=1}^{N} \alpha_n B(x_n; y_M) \right). \qquad (4.15)$$

The coefficient of the monomial $\prod_{n=1}^{N} \alpha_n = \alpha_1 \cdots \alpha_N$ is the conditional probability:

$$p_k(\{x_1, \ldots, x_N\} \mid y_k) = \bigl[ \alpha_1 \cdots \alpha_N \bigr]\, \Psi_k^{\mathrm{JPDAS}}(\alpha \mid y_k). \qquad (4.16)$$

The general expression for the coefficient is complicated and of little interest here, so it is omitted.

Example 1 M = 1. In this case, $y_k = \{y_1\}$ and $c(y_k) = \exp(-\lambda_k^c)\,\lambda_k^c\, p_k^c(y_1)$. Since $\min\{N, M\} = 1$, the coefficient (4.16) is the sum of two terms whose ESPs are $S_0^{(1)}(x) = 1$ and $S_1^{(1)}(x) = x$. The normalization constant is

$$C_k(1, N) = \exp(-\lambda_k^c)\,\lambda_k^c\, p_k^c(y_1) \left( \int_{\mathcal{X}} A(x)\, dx \right)^{N} + N \exp(-\lambda_k^c)\,\lambda_k^c\, p_k^c(y_1) \left( \int_{\mathcal{X}} A(x)\, dx \right)^{N-1} \int_{\mathcal{X}} B(x; y_1)\, dx. \qquad (4.17)$$

The $\kappa = 0$ term, without $C_k(1, N)$, is

$$T_k(0) \equiv \bigl[ \alpha_1 \cdots \alpha_N \bigr]\, c(y_k)\, (N)_0 \left( \sum_{n=1}^{N} \alpha_n A(x_n) \right)^{N} = \exp(-\lambda_k^c)\,\lambda_k^c\, p_k^c(y_1)\, N! \prod_{n=1}^{N} A(x_n). \qquad (4.18)$$

To see this, apply the multinomial expansion. It is the product of two probabilities. The first is the probability that the Poisson clutter process generated one point, and the point is $y_1$. The second is the (joint) probability that the $N$ objects are in states $\{x_1, \ldots, x_N\}$ and are undetected. The $N!$ is a by-product of superposition and the multinomial expansion. Similarly, the $\kappa = 1$ term is, after some algebra,

$$T_k(1) \equiv \bigl[ \alpha_1 \cdots \alpha_N \bigr]\, c(y_k)\, (N)_1 \left( \sum_{n=1}^{N} \alpha_n A(x_n) \right)^{N-1} \left( \sum_{n=1}^{N} \alpha_n B(x_n; y_1) \right) = \exp(-\lambda_k^c)\, N! \sum_{n=1}^{N} \left( \prod_{\substack{n'=1,\ n' \neq n}}^{N} A(x_{n'}) \right) \mu_k^{-}(x_n)\, P_{d_k}(x_n)\, p_k(y_1 \mid x_n). \qquad (4.19)$$

It is the probability that no clutter is generated and that exactly one object is detected, it is in state $x_n$, and it generates the measurement $y_1$. The sum over all $N$ objects is a consequence of superposition, since the measurement is unassigned. The normalized sum,

$$p_k(\{x_1, \ldots, x_N\} \mid y_k) = \frac{T_k(0) + T_k(1)}{C_k(1, N)}, \qquad (4.20)$$

is the joint PDF for the JPDAS filter with $N$ objects given one measurement.

4.3.4 Intensity Function and Closing the Bayesian Recursion

Summary statistics for the superposed process are the same as for any single point process. The most commonly used statistic is the "count density" function, often termed the intensity function. The intensity is the expected number of objects per unit state space at each point $\bar x \in \mathcal{X}$. It has the same units as a probability density on $\mathcal{X}$, but it is a very different function. This important distinction is discussed in Sect. B.6 of Appendix B.

As noted in Sect. B.6 of Appendix B, the intensity function at an arbitrarily specified point $\bar x \in \mathcal{X}$ is the derivative of the GFL at $\bar x$ evaluated at one, not zero, so the secular function for the intensity at $\bar x$ employs the Dirac delta $h_\delta(x) = 1 + \bar\alpha\, \delta_{\bar x}(x)$, $\bar\alpha \in \mathbb{R}$. Using it in (4.11) gives

$$\Psi_k^{\mathrm{JPDAS}}(\bar\alpha \mid y_k) = \Psi_k^{\mathrm{JPDAS}}(1 + \bar\alpha\,\delta_{\bar x} \mid y_k) = \frac{c(y_k)}{C_k(M, N)} \sum_{\kappa=0}^{\min\{N, M\}} (N)_\kappa \left( \bar\alpha A(\bar x) + \int_{\mathcal{X}} A(x)\, dx \right)^{N-\kappa} \times S_\kappa^{(M)}\!\left( \bar\alpha B(\bar x; y_1) + \int_{\mathcal{X}} B(x; y_1)\, dx,\; \ldots,\; \bar\alpha B(\bar x; y_M) + \int_{\mathcal{X}} B(x; y_M)\, dx \right). \qquad (4.21)$$

By inspection, this secular function is a (univariate) polynomial in $\bar\alpha$. The intensity at the point $\bar x \in \mathcal{X}$ is the coefficient of the linear term,

$$I_k^{\mathrm{JPDAS}}(\bar x \mid y_k) = \bigl[ \bar\alpha \bigr]\, \Psi_k^{\mathrm{JPDAS}}(\bar\alpha \mid y_k). \qquad (4.22)$$

The general analytic expression for this coefficient is cumbersome and is omitted. The intensity function simplifies to

$$I_k^{\mathrm{JPDAS}}(\bar x \mid y_k = \emptyset) = \frac{N\, \mu_k^{-}(\bar x)\bigl( 1 - P_{d_k}(\bar x) \bigr)}{\int_{\mathcal{X}} \mu_k^{-}(x)\bigl( 1 - P_{d_k}(x) \bigr)\, dx} \qquad (4.23)$$

when the scan measurement set is empty. Since $\int_{\mathcal{X}} I_k^{\mathrm{JPDAS}}(x \mid y_k)\, dx = N$, dividing by $N$ gives the exact Bayes updated PDF as

$$\mu_k^{\mathrm{JPDAS}}(x \mid y_k) = \frac{1}{N}\, I_k^{\mathrm{JPDAS}}(x \mid y_k). \qquad (4.24)$$

This step closes the Bayesian recursion for JPDAS.


Example 2 M = 1. Following the manner of Example 1 in Sect. 4.3.3, the intensity function comprises two terms. The normalization constant $C_k(1, N)$ is unchanged from (4.17). The term for $\kappa = 0$ is

$$S_k(0) = \bigl[ \bar\alpha \bigr]\, c(y_k)\, (N)_0 \left( \bar\alpha A(\bar x) + \int_{\mathcal{X}} A(x)\, dx \right)^{N} = N \exp(-\lambda_k^c)\,\lambda_k^c\, p_k^c(y_1) \left( \int_{\mathcal{X}} A(x)\, dx \right)^{N-1} A(\bar x). \qquad (4.25)$$

The term for $\kappa = 1$ is, after canceling the denominator of $B(\bar x; y_1)$,

$$S_k(1) = \bigl[ \bar\alpha \bigr]\, c(y_k)\, (N)_1 \left( \bar\alpha A(\bar x) + \int_{\mathcal{X}} A(x)\, dx \right)^{N-1} \left( \bar\alpha B(\bar x; y_1) + \int_{\mathcal{X}} B(x; y_1)\, dx \right)$$
$$= N \exp(-\lambda_k^c) \left( \int_{\mathcal{X}} A(x)\, dx \right)^{N-1} \mu_k^{-}(\bar x)\, P_{d_k}(\bar x)\, p_k(y_1 \mid \bar x) + N(N-1) \exp(-\lambda_k^c) \left( \int_{\mathcal{X}} A(x)\, dx \right)^{N-2} A(\bar x) \int_{\mathcal{X}} \mu_k^{-}(x)\, P_{d_k}(x)\, p_k(y_1 \mid x)\, dx. \qquad (4.26)$$

Each contribution to the sum $S_k(0) + S_k(1)$ is the product of $N + 1$ probabilities:

• $S_k(0)$ is the product of the probability of exactly one clutter-induced measurement at $y_1$, the probability that $N - 1$ objects are undetected, and the probability that one object is undetected in state $\bar x$. There are $N$ such products, all identical due to superposition, hence the factor of $N$.
• The first term of $S_k(1)$ is the product of the probability that no clutter is generated, the probability that $N - 1$ objects are undetected, and the probability that one object in state $\bar x$ generates $y_1$. The multiple of $N$ is due, again, to superposition.
• The second term of $S_k(1)$ is the product of the probability that no clutter is generated, the probability that $N - 2$ objects are undetected, the probability that one object at $\bar x$ is undetected, and the probability that one object generates $y_1$. There are $2\binom{N}{2} = N(N-1)$ such products, all identical.

The intensity function of JPDAS at an arbitrary point $\bar x \in \mathcal{X}$ is the normalized sum,

$$I_k^{\mathrm{JPDAS}}(\bar x \mid y_k = \{y_1\}) = \frac{S_k(0) + S_k(1)}{C_k(1, N)}. \qquad (4.27)$$

Integrating over all $\mathcal{X}$ gives the expected number of objects in $\mathcal{X}$ as

$$\int_{\mathcal{X}} I_k^{\mathrm{JPDAS}}(x \mid y_k = \{y_1\})\, dx = N, \qquad (4.28)$$

as required per JPDA assumptions. To check this, substitute (4.25), (4.26), and (4.17), integrate over X, and simplify.
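The suggested check can also be carried out numerically. The sketch below does so on a coarse grid for a one-dimensional surrogate state space; the prior, likelihood, and all parameter values are invented purely for illustration.

```python
import numpy as np

# one-dimensional surrogate state space discretized on a grid
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

N = 4                                                 # number of objects (JPDAS assumption)
mu = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)         # predicted PDF mu_k^-(x)
pd = 0.85                                             # detection probability
lam_c, pc_y1 = 3.0, 0.05                              # clutter rate and clutter PDF at y1
lik = np.exp(-0.5 * (x - 1.0) ** 2 / 0.5) / np.sqrt(2 * np.pi * 0.5)   # p_k(y1 | x)

A = mu * (1.0 - pd)                                   # Eq. (4.9)
B = mu * pd * lik / (lam_c * pc_y1)                   # Eq. (4.10)
intA = A.sum() * dx
intB = B.sum() * dx
c_y = np.exp(-lam_c) * lam_c * pc_y1                  # Eq. (4.12) with M = 1

C = c_y * intA**N + N * c_y * intA**(N - 1) * intB    # Eq. (4.17)
S0 = N * c_y * intA**(N - 1) * A                      # Eq. (4.25) on the grid
S1 = (N * np.exp(-lam_c) * intA**(N - 1) * mu * pd * lik
      + N * (N - 1) * np.exp(-lam_c) * intA**(N - 2) * A
      * (mu * pd * lik).sum() * dx)                   # Eq. (4.26)

intensity = (S0 + S1) / C                             # Eq. (4.27)
print(intensity.sum() * dx)                           # integrates to N, Eq. (4.28)
```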


4.3.5 Intensity Function and the Complex Step Method

Computing the intensity function is an essential part of practical filter implementation when superposition is employed. Symbolically, this requires differentiating the GFL of the Bayes posterior process, e.g., Eq. (4.22) for JPDAS. Whether or not the symbolic derivative is difficult depends on the function, naturally. As it happens, a remarkably simple method, called the complex step method, can compute extremely accurate numerical values of the derivative of an analytic function at a point of analyticity. It is due to [1], although it has a longer history [2, 3]. For details and an explanation of why it works, see Sect. C.5 of Appendix C. The bottom line is that numerical values of the intensity at an arbitrary point $\bar x \in \mathcal{X}$ can be computed with the same computational effort that is needed to compute the secular function itself. Loosely speaking, the complex step method evaluates the secular function in complex arithmetic, and its derivative, the intensity function, comes "for free" with no additional computational cost.

Complex step for JPDAS intensity. By inspection, the secular function for JPDAS (4.21) is analytic in a neighborhood of $\bar\alpha = 0$ in the complex plane, $\mathbb{C}$. Using the complex step method to compute its derivative at $\bar\alpha = 0$ gives the intensity of JPDAS at $\bar x \in \mathcal{X}$. The numerical calculation is straightforward—in complex arithmetic, evaluate $\Psi_k^{\mathrm{JPDAS}}(\bar\alpha \mid y_k)$ at the point $\bar\alpha = \epsilon\, i$, where $i = \sqrt{-1}$ and $\epsilon$ is a small positive number, say $10^{-8}$. The intensity at $\bar x$ is

$$I_k^{\mathrm{JPDAS}}(\bar x \mid y_k) \approx \operatorname{Im}\!\left[ \frac{\Psi_k^{\mathrm{JPDAS}}(\epsilon\, i \mid y_k)}{\epsilon} \right], \qquad (4.29)$$

where $\operatorname{Im}(z)$ is the imaginary part of $z \in \mathbb{C}$. The error of the approximation is $O(\epsilon^2)$, so it is highly accurate. The computational cost of evaluating the JPDAS intensity function is dominated by the cost of calculating the $M + 1$ integrals in (4.21). The algorithm based on (C.51) computes numerical values of the ESPs with complexity $O(MN)$. The additional cost of using complex arithmetic is low and fixed using the complex step method, making particle filter implementations of the JPDAS intensity filter practical. The complex step method is employed for intensity estimation in the example in Sect. 4.7.1.
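A minimal sketch of the mechanics, with a stand-in secular function whose derivative at zero is known in closed form (the function and constants below are invented for illustration; they are not $\Psi_k^{\mathrm{JPDAS}}$):

```python
import numpy as np

def complex_step_derivative(f, eps=1e-9):
    """Derivative of an analytic function f at 0 via the complex step:
    f'(0) ~= Im(f(i*eps)) / eps.  The truncation error is O(eps**2) and there
    is no subtractive cancellation, so eps can be taken extremely small."""
    return np.imag(f(1j * eps)) / eps

# stand-in secular function: an exponential of a linear term times linear factors,
# mimicking the structure of (4.21); its exact derivative at 0 is a + b + c.
a, b, c = 0.7, 1.3, 0.4
f = lambda alpha: np.exp(c * alpha) * (1.0 + a * alpha) * (1.0 + b * alpha)

exact = a + b + c
approx = complex_step_derivative(f)
print(exact, approx)        # agree to machine precision
```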

4.4 CPHD: Superposition with an Unknown Number of Objects

Random models of the number of objects and their states are needed in order to develop Bayesian estimates of object number. A finite point process is one such model. The AC approach combines a GF for the random number of objects with a GFL for the object states. These filters are called cardinalized probability hypothesis density (CPHD) filters. (The term "hypothesis density" originated from earlier independent work [4] that sought an additive theory of evidence accrual.)

To streamline the presentation of ideas, this section discusses a version of the CPHD filter that does not include a model for new objects that may appear from one scan to the next. It also does not include a model for object termination. Both capabilities are in the original filter [5]. These additional models are discussed in Sect. 4.5, along with an object spawning model which is not in [5].

4.4.1 Markov Chain for Number of Objects

The number of objects $N(t)$ is assumed to be a continuous-time Markov chain on the nonnegative integers $\{0, 1, 2, \ldots\}$ with specified transition function. The number of objects at scan time $t_k$ is denoted $N_k = N(t_k)$. The sequence $(N_k)$ is a discrete-time Markov chain whose transition probability matrix (TPM) from scan $k-1$ to scan $k$ is $A_{k-1} = [\pi_{k-1}(i, j)]_{i,j=0}^{\infty}$, where the transition probabilities are

$$\pi_{k-1}(i, j) = \Pr\{N_k = j \mid N_{k-1} = i\}. \qquad (4.30)$$

The matrix $A_{k-1}$ can be specified to suit application needs, but ideally it is derived from a continuous-time transition function. In any event it is a row stochastic matrix, i.e., the sum of each row is one.

Denote the Bayesian posterior PMF of the number of objects $N_{k-1}$ at scan $k-1$ by the row vector $\chi_{k-1} = \bigl( \chi_{k-1}(n) \bigr)_{n=0}^{\infty}$, where $\chi_{k-1}(n) = \Pr\{N_{k-1} = n\}$. It is conditioned on all the sensor data up to and including scan $k-1$. Its GF is

$$G_{k-1}^{N}(z) = \sum_{n=0}^{\infty} \chi_{k-1}(n)\, z^n, \qquad (4.31)$$

where $z$ is the indeterminate variable. The predicted number of objects at scan $k$ is denoted by $N_k^-$. Its PMF is the row vector $\chi_k^- = \bigl( \chi_k^-(n) \bigr)_{n=0}^{\infty}$, where $\chi_k^-(n) = \Pr\{N_k^- = n\}$. Because of the Markov chain assumption, it is given by the row-matrix product $\chi_k^- = \chi_{k-1} A_{k-1}$, so

$$\chi_k^-(n) = \sum_{i=0}^{\infty} \chi_{k-1}(i)\, \pi_{k-1}(i, n), \quad n = 0, 1, 2, \ldots. \qquad (4.32)$$

The predicted PMF $\chi_k^-$ is the a priori PMF for object number and is used in the Bayesian update for scan $k$. The GF of the predicted PMF is, by definition,

$$G_k^{N^-}(z) = \sum_{n=0}^{\infty} \chi_k^-(n)\, z^n, \qquad (4.33)$$

where $z$ is the indeterminate.


New objects may appear and existing objects may terminate during the time interval between scan times tk−1 and tk . These birth and death processes, as they are termed, are often constants, in which case they may be incorporated into the continuous-time object number process N (t). An example is given in Sect. 4.6. In general, however, they are functions of object state, in which case they are part of the state dynamical model. Section 4.5 presents the general case.
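As a small numerical illustration of the prediction step (4.32), the sketch below propagates a truncated cardinality PMF through a hypothetical TPM; the matrix entries and PMF values are invented for illustration only.

```python
import numpy as np

# posterior cardinality PMF chi_{k-1}(n), truncated at n = 4 objects
chi_prev = np.array([0.05, 0.20, 0.40, 0.25, 0.10])

# illustrative row-stochastic TPM A_{k-1}: the object count tends to persist,
# with small probabilities of gaining or losing one object between scans
N = len(chi_prev)
A = np.zeros((N, N))
for i in range(N):
    A[i, i] = 0.90
    if i > 0:
        A[i, i - 1] = 0.05
    if i < N - 1:
        A[i, i + 1] = 0.05
A /= A.sum(axis=1, keepdims=True)      # renormalize the edge rows

chi_pred = chi_prev @ A                # predicted PMF chi_k^-, Eq. (4.32)
print(chi_pred, chi_pred.sum())        # sums to one
```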

4.4.2 Probabilistic Mixture GFL

The starting point for the multiobject state model is the GFL for JPDAS and the law of total probability as it applies to GFLs. The assumptions of the JPDAS model are adopted here. With a random number $N_k$ of objects at scan $k$, the GFL of JPDAS is now the conditional GFL

$$\Psi_k^{\mathrm{JPDAS/N}}(h, g \mid N_k) = \Psi_k^{C}(g) \left[ \Psi_k^{\mathrm{BMD}}(h, g) \right]^{N_k}, \quad N_k \ge 0. \qquad (4.34)$$

Multiplying by $\Pr\{N_k^- = n\} = \chi_k^-(n)$ and summing over all $n$ give the multiobject GFL of the CPHD filter at scan $k$ as

$$\Psi_k^{\mathrm{CPHD}}(h, g) = \sum_{n=0}^{\infty} \chi_k^-(n)\, \Psi_k^{\mathrm{JPDAS/N}}(h, g \mid n) \qquad (4.35)$$
$$= \sum_{n=0}^{\infty} \chi_k^-(n)\, \Psi_k^{C}(g) \left[ \Psi_k^{\mathrm{BMD}}(h, g) \right]^{n} = \Psi_k^{C}(g)\, G_k^{N^-}\!\bigl( \Psi_k^{\mathrm{BMD}}(h, g) \bigr). \qquad (4.36)$$

To see that the mixture (4.35) is a valid GFL, partition the GCE of the multiobject measurement space $E(\mathcal{X}) \times E(\mathcal{Y})$ into the disjoint subspaces $E(n) \equiv \{x_{1:n} : x_{1:n} \subset \mathcal{X}\} \times E(\mathcal{Y})$, where $x_{1:n} \equiv \{x_1, \ldots, x_n\}$ (see Appendix B, Sect. B.1). Since $\cup_{n=0}^{\infty} E(n) = E(\mathcal{X}) \times E(\mathcal{Y})$ and $E(n) \cap E(m) = \emptyset$ for $n \neq m$, the law of total probability applies. From (B.21) in Appendix B, the GFL of a general point process on $E(\mathcal{X}) \times E(\mathcal{Y})$ is the probabilistic mixture

$$\Psi^{\mathrm{Mix}}(h, g) = \sum_{n=0}^{\infty} \Pr\{N = n\}\, \Psi^{\mathrm{Mix/N}}(h, g \mid n), \qquad (4.37)$$

where $\Psi^{\mathrm{Mix/N}}(h, g \mid n)$ is the joint process conditioned on $n$. In this case,

$$\Psi^{\mathrm{Mix/N}}(h, g \mid n) \equiv \Psi_k^{\mathrm{JPDAS/N}}(h, g \mid n). \qquad (4.38)$$

Finally, moving $\Psi_k^{C}(g)$ outside the sum and substituting $z = \Psi_k^{\mathrm{BMD}}(h, g)$ into (4.33) gives (4.36).


Connection to IPDA and JIPDA. The GFL (4.36) is identical to the GFL (2.37) for IPDA in the special case

$$G_k^{N^-}(z) = G_{\chi_k^-(1)}^{\mathrm{Bernoulli}}(z) = \chi_k^-(0) + \chi_k^-(1)\, z, \qquad (4.39)$$

where $\chi_k^-(0) = 1 - \chi_k^-(1)$. More generally, for a given $N$ and

$$G_k^{N^-}(z) = \left[ G_{\chi_k^-(1)}^{\mathrm{Bernoulli}}(z) \right]^{N} = \bigl( \chi_k^-(0) + \chi_k^-(1)\, z \bigr)^{N}, \qquad (4.40)$$

it is seen using (3.33) and (3.34) that the GFL for CPHD reduces to the GFL of JIPDA with superposition (JIPDAS) and identical object models. A full discussion is given below in Sect. 5.2.1 of Chap. 5.

4.4.3 Bayes Posterior GFL

Assume the scan measurement set is $y_k = \{y_1, \ldots, y_M\}$, where $M \ge 1$. The case $y_k = \emptyset$ was addressed for the JPDAS posterior GFL in (4.13) and is handled in an analogous manner here. To streamline the presentation, in the remainder of the chapter the case $y_k = \emptyset$ is not explicitly addressed and is left to the reader. Given the Dirac delta train (4.3), the GFL of the exact Bayes posterior process is the normalized cross-derivative,

$$\Psi_k^{\mathrm{CPHD}}(h \mid y_k) = \frac{\left. \frac{d}{d\beta} \Psi_k^{\mathrm{CPHD}}(h, \beta) \right|_{\beta=0}}{\left. \frac{d}{d\beta} \Psi_k^{\mathrm{CPHD}}(1, \beta) \right|_{\beta=0}}, \qquad (4.41)$$

where

$$\Psi_k^{\mathrm{CPHD}}(h, \beta) \equiv \Psi_k^{\mathrm{CPHD}}\!\left( h, \sum_{m=1}^{M} \beta_m\, \delta_{y_m} \right). \qquad (4.42)$$

The cross-derivative is determined by direct calculation if an analytic closed-form expression for the GF of the predicted number of objects $G_k^{N^-}(z)$ is known. Two examples are given in the next subsection. Otherwise, if $G_k^{N^-}(z)$ is known only as a power series, the cross-derivative is calculated term-by-term. The power series form of the GFL of CPHD gives several interesting identities that closely link the likelihood structure of the exact CPHD posterior to that of JPDAS/N. Substituting (4.35) into (4.41) gives

$$\Psi_k^{\mathrm{CPHD}}(h \mid y_k) = \frac{1}{C_k^{\mathrm{CPHD}}(M)} \sum_{n=0}^{\infty} \chi_k^-(n) \left. \frac{d}{d\beta} \Psi_k^{\mathrm{JPDAS/N}}(h, \beta \mid n) \right|_{\beta=0}, \qquad (4.43)$$


where the normalizing constant $C_k^{\mathrm{CPHD}}(M)$ is the denominator of (4.41). The derivative appeared earlier in (4.8) and (4.11). Multiplying and dividing each term by the constant $C_k(M, n)$ gives the GFL of the Bayes posterior process as a weighted sum of JPDAS/N posterior processes:

$$\Psi_k^{\mathrm{CPHD}}(h \mid y_k) = \sum_{n=0}^{\infty} \frac{\chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}\, \Psi_k^{\mathrm{JPDAS/N}}(h \mid n, y_k). \qquad (4.44)$$

This expression is general. For $n = 0$, $\Psi_k^{\mathrm{JPDAS/N}}(h \mid 0, y_k) = 1$.

4.4.4 Posterior GF of Object Count

The PMF of the number of objects is found by substituting $h(x) \equiv z$. As remarked above, Bayesian processing does not change the JPDAS/N assumption that $n$ objects are present, so that

$$G_k^{\mathrm{CPHD}}(z \mid y_k) = \sum_{n=0}^{\infty} \frac{\chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}\, z^n, \qquad (4.45)$$

where $z$ is the indeterminate. It follows immediately that the PMF of the exact Bayes posterior process is $\chi_k^{\mathrm{CPHD}} \equiv \bigl( \chi_k^{\mathrm{CPHD}}(n \mid y_k) \bigr)_{n=0}^{\infty}$, where

$$\chi_k^{\mathrm{CPHD}}(n \mid y_k) = \frac{\chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}. \qquad (4.46)$$

The derivative of the GF (4.45) evaluated at $z = 1$,

$$E[N \mid y_k] = \left. \frac{d}{dz} G_k^{\mathrm{CPHD}}(z \mid y_k) \right|_{z=1} = \sum_{n=1}^{\infty} \frac{n\, \chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}, \qquad (4.47)$$

is the expected number of objects in the exact Bayes posterior process.
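A minimal sketch of (4.46)–(4.47), with invented values standing in for the predicted PMF and for the JPDAS/N constants $C_k(M, n)$ that would in practice come from (4.11):

```python
import numpy as np

chi_pred = np.array([0.05, 0.22, 0.41, 0.24, 0.08])      # predicted PMF chi_k^-(n)
C = np.array([1.0e-4, 3.2e-4, 6.0e-4, 7.1e-4, 7.5e-4])   # hypothetical C_k(M, n)

unnorm = chi_pred * C                                 # numerator of (4.46)
C_cphd = unnorm.sum()                                 # normalizing constant C_k^CPHD(M)
chi_post = unnorm / C_cphd                            # posterior PMF, Eq. (4.46)
expected_N = np.arange(len(chi_post)) @ chi_post      # expected count, Eq. (4.47)
print(chi_post, expected_N)
```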

4.4.5 Exact Bayes Conditional Probability

The conditional probability of a given set of states $\{x_1, \ldots, x_n\} \subset \mathcal{X}$ is found from this result by substituting the Dirac delta train (4.14) (with $N$ replaced by $n$) and extracting the coefficient of the monomial, as was done in (4.16). As mentioned there, the coefficient is identically zero if the summation index is anything other than $n$. Therefore,

$$p_k^{\mathrm{CPHD}}(\{x_1, \ldots, x_n\} \mid y_k) = \frac{\chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}\, \bigl[ \alpha_1 \cdots \alpha_n \bigr]\, \Psi_k^{\mathrm{JPDAS/N}}(\alpha \mid n, y_k) \qquad (4.48)$$
$$= \frac{\chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}\, p_k^{\mathrm{JPDAS/N}}(\{x_1, \ldots, x_n\} \mid n, y_k). \qquad (4.49)$$

This result is general. It does not appear to have been reported elsewhere.

4.4.6 Intensity Function

The intensity function at a point $\bar x \in \mathcal{X}$ is calculated as it was for JPDAS, by substituting $h(x) = 1 + \bar\alpha\,\delta_{\bar x}(x)$, giving

$$I_k^{\mathrm{CPHD}}(\bar x \mid y_k) = \bigl[ \bar\alpha \bigr]\, \Psi_k^{\mathrm{CPHD}}(\bar\alpha \mid y_k) \equiv \bigl[ \bar\alpha \bigr]\, \Psi_k^{\mathrm{CPHD}}(1 + \bar\alpha\,\delta_{\bar x} \mid y_k). \qquad (4.50)$$

The analytic expression for the coefficient is determined by differentiating the secular function with respect to $\bar\alpha$ at zero. An equivalent alternative expression is found by substituting into (4.44) and using the identity (4.22) to obtain

$$I_k^{\mathrm{CPHD}}(\bar x \mid y_k) = \sum_{n=0}^{\infty} \frac{\chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}\, \bigl[ \bar\alpha \bigr]\, \Psi_k^{\mathrm{JPDAS/N}}(\bar\alpha \mid n, y_k) \qquad (4.51)$$
$$= \sum_{n=0}^{\infty} \frac{\chi_k^-(n)\, C_k(M, n)}{C_k^{\mathrm{CPHD}}(M)}\, I_k^{\mathrm{JPDAS/N}}(\bar x \mid n, y_k). \qquad (4.52)$$

An interesting aspect of this expression is that it facilitates computing only a few terms in the full Bayesian posterior expression for the intensity; see the end of the next section. As discussed in Sect. 4.3.5, the numerical value of the CPHD intensity function at $\bar x$ is calculated numerically to very high accuracy by using the complex step method to evaluate the desired terms in the series. In practice, there are only a finite number of terms in the series, one for each object. If this number is $N_{\max}$, the total computational complexity is bounded above by $O(M N_{\max}^2)$; see Appendix C, Eq. (C.51).

4.4.7 Closing the Bayesian Recursion

The CPHD intensity filter is initialized at scan $k$ by assuming that the multiobject state is a finite point process called a cluster process (see Appendix B, Sect. B.2). The GFL of this prior process is defined by

$$\Psi_{k-1}^{\mathrm{prior}}(h) = G_{k-1}^{N}\!\left( \int_{\mathcal{X}} h(x)\, \mu_{k-1}(x)\, dx \right), \qquad (4.53)$$

where $G_{k-1}^{N}(z)$ is the (prior) GF of the number of objects at scan $k$,

$$G_{k-1}^{N}(z) = \sum_{n=0}^{\infty} \chi_{k-1}(n)\, z^n,$$

$\chi_{k-1}$ is the PMF of the number of objects, and $\mu_{k-1}(x)$ is the PDF of the IID state sample points that comprise the multiobject state. Both the PMF and the PDF are conditioned on all measurements up to and including scan $k-1$. Together they characterize the prior distribution of the Bayesian recursion. The prior intensity function at scan $k$ is defined by $I_{k-1}(x) = E_{\chi_{k-1}}[N]\, \mu_{k-1}(x)$, where $E_{\chi_{k-1}}[N]$ is the expected number of objects. It follows that the intensity function is the expected number of objects per unit state space at $x \in \mathcal{X}$. The predicted multiobject state is the cluster process whose GFL is

$$\Psi_k(h) = G_k^{N^-}\!\left( \int_{\mathcal{X}} h(x)\, \mu_k^{-}(x)\, dx \right), \qquad (4.54)$$

where $G_k^{N^-}(\cdot)$ is the predicted GF given by (4.33) and $\mu_k^{-}(x)$ is the predicted PDF. That this is a cluster process model was not mentioned in the mixture derivation (4.35) of the GFL for the CPHD filter. It is seen clearly in (4.36) by setting $g(y) = 1$, for in that case $\Psi_k^{\mathrm{BMD}}(h, 1) = \int_{\mathcal{X}} h(x)\, \mu_k^{-}(x)\, dx$. The cluster process model arises from the "sameness" assumption needed for superposition and the usual assumption that objects are independent.

The exact Bayesian posterior CPHD process is not a cluster process because its GFL does not have the requisite form. This follows from (4.44), since the posterior GFL of JPDAS/N is not, by inspection of (4.11), a power of $N$. To close the Bayesian recursion, it is necessary to approximate the exact posterior process by a cluster process. The approximating cluster process is defined by

$$\Psi_k^{\mathrm{CPHD/Bayes}}(h) \equiv G_k^{\mathrm{CPHD}}\!\left( \int_{\mathcal{X}} h(x)\, \mu_k^{\mathrm{CPHD}}(x)\, dx \;\Big|\; y_k \right), \qquad (4.55)$$

where $G_k^{\mathrm{CPHD}}(\cdot \mid y_k)$ is the GF of the exact Bayes posterior process given by (4.45) and the updated PDF is defined by

$$\mu_k^{\mathrm{CPHD}}(x) = \frac{I_k^{\mathrm{CPHD}}(x \mid y_k)}{E[N \mid y_k]}, \qquad (4.56)$$

where the numerator is the intensity function of the exact Bayes posterior process given by (4.52) and the denominator is the expected number of objects in the exact Bayes posterior (4.47). Alternative intensity. The CPHD intensity function (4.52) is a weighted sum of JPDAS/N intensity functions. In applications, situations may arise in which it is known (from other information sources) that there are precisely n objects, or perhaps that the number of objects is somewhere between n 1 and n 2 . In such cases, it may be worthwhile to limit the sum to only these terms. This limited sum, properly normalized, can then be used in place of the update (4.56).

4.5 State-Dependent Models for Object Birth, Death, and Spawning

Several additional AC models can be added to the intensity filters to enhance their capability and modeling fidelity. The new object and object termination models are part of the original CPHD filter [5], while the spawning model is not and was added much later [6, 7]. Objects may spawn in more than one way, so several different object spawning models are possible. Thus, the AC spawning model presented here may differ from those in the published literature.

4.5.1 New Object Birth Process

A random number of new objects may appear at scan $k$. Each object is modeled as a BMD process, so it may or may not generate a measurement. Let $\mu_k^{\mathrm{new}}(x)$ and $P_{d_k}^{\mathrm{new}}(x)$, $x \in \mathcal{X}$, denote the PDF of an object's "birth" state and initial detection probability, respectively. The GFL is assumed to be the same for all new objects:

$$\Psi_k^{\mathrm{BMD^{new}}}(h, g) = \int_{\mathcal{X}} h(x)\, \mu_k^{\mathrm{new}}(x) \left[ 1 - P_{d_k}^{\mathrm{new}}(x) + P_{d_k}^{\mathrm{new}}(x) \int_{\mathcal{Y}} g(y)\, p_k(y \mid x)\, dy \right] dx.$$

The new object process is assumed to be a cluster point process that is independent of other processes. Denote the GF of the number of new objects by $G_k^{N^{\mathrm{new}}}(z)$. Including new objects in the CPHD filter gives the GFL

$$\Psi_k^{\mathrm{CPHD}}(h, g) = \Psi_k^{C}(g)\, G_k^{N^-}\!\bigl( \Psi_k^{\mathrm{BMD}}(h, g) \bigr)\, G_k^{N^{\mathrm{new}}}\!\bigl( \Psi_k^{\mathrm{BMD^{new}}}(h, g) \bigr). \qquad (4.57)$$

After the initialization at scan k, the “sameness” assumptions are applied to all objects in later scans, e.g., they follow the same Markov motion model and have the same detection probabilities.


4.5.2 Darwinian Object Survival Process

Object survival can be modeled in two ways (probably more). In the "early" termination model, objects that exist at scan $k-1$ do not have an opportunity to transition to scan $k$. In the "late" model, objects can transition from scan $k-1$ to scan $k$ but may not survive the transition. The late model is used in this subsection. The early model is used in the next subsection.

Termination is modeled as a counting procedure that counts one if the object still exists at scan $k$ and zero if not. Denote the survival probability by $\rho_k(x)$. The predicted PDF of object state at scan $k$ is $\mu_k^{-}(x)$, $x \in \mathcal{X}$, so the survival model defines a Bernoulli random integer on $\{0, 1\}$ whose GF is

$$G_k^{\mathrm{Darwin}}(z) = 1 - \hat\rho_k + \hat\rho_k\, z, \qquad (4.58)$$

where $z$ is the indeterminate, and

$$\hat\rho_k = \int_{\mathcal{X}} \rho_k(x)\, \mu_k^{-}(x)\, dx \qquad (4.59)$$

is the expected survival probability. The PDF of the state of a surviving object is

$$\hat\mu_k(x) = \frac{\rho_k(x)\, \mu_k^{-}(x)}{\int_{\mathcal{X}} \rho_k(x')\, \mu_k^{-}(x')\, dx'}, \quad x \in \mathcal{X}. \qquad (4.60)$$

It follows that the GFL of a surviving object is a Bernoulli point process (see (B.6)) with GFL

$$\Psi_k^{\mathrm{Darwin}}(h) = G_k^{\mathrm{Darwin}}\!\left( \int_{\mathcal{X}} h(x)\, \hat\mu_k(x)\, dx \right) = 1 - \hat\rho_k + \hat\rho_k \int_{\mathcal{X}} h(x)\, \hat\mu_k(x)\, dx. \qquad (4.61)$$

A surviving object may or may not be detected by a sensor, so the GFL of the joint object-measurement process is, using (4.7) with $\mu_k^{-}(x)$ replaced by $\hat\mu_k(x)$,

$$\widehat\Psi_k^{\mathrm{BMD}}(h, g) = \int_{\mathcal{X}} h(x)\, \hat\mu_k(x) \left[ 1 - P_{d_k}(x) + P_{d_k}(x) \int_{\mathcal{Y}} g(y)\, p_k(y \mid x)\, dy \right] dx. \qquad (4.62)$$

To incorporate the survival model into CPHD, replace the GFL for the BMD model in (4.34) with $G_k^{\mathrm{Darwin}}\bigl( \widehat\Psi_k^{\mathrm{BMD}}(h, g) \bigr)$ and follow the same steps, to obtain

$$\Psi_k^{\mathrm{CPHD/Darwin}}(h, g) = \Psi_k^{C}(g)\, G_k^{N^-}\!\Bigl( G_k^{\mathrm{Darwin}}\bigl( \widehat\Psi_k^{\mathrm{BMD}}(h, g) \bigr) \Bigr). \qquad (4.63)$$


This reduces to (4.36) for ρk (x) = 1. The GF of the predicted number of objects is unchanged from (4.33). The combinatorics of the problem is seen in the expression (4.63), which is colorfully described as follows: A star (or object, or particle) is born at x at time tk−1 from the population μk−1 (x). It may or may not survive the transition to time tk (Darwin), but if it does, it may or may not be detected (the D in BMD). If detected, it generates exactly one measurement at time tk . Many stars are born—the number, N , is random—and they are all born free (IID). Their stories are collected (superposed) and pooled along with stories from other populations (clutter). Their history is told by Eq. (4.63).
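In a particle implementation, (4.59)–(4.60) amount to reweighting the predicted particles by the survival probability. A minimal sketch under that reading (the survival function and the particle cloud below are invented for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# predicted particles and (normalized) weights representing mu_k^-(x)
particles = rng.normal(loc=0.0, scale=100.0, size=(5000, 2))
weights = np.full(len(particles), 1.0 / len(particles))

# illustrative state-dependent survival probability rho_k(x): objects far from
# the origin are slightly less likely to survive
rho = 0.98 * np.exp(-np.linalg.norm(particles, axis=1) / 2000.0)

rho_hat = np.sum(rho * weights)                            # expected survival, Eq. (4.59)
survivor_weights = rho * weights / np.sum(rho * weights)   # survivor PDF weights, Eq. (4.60)
print(rho_hat, survivor_weights.sum())
```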

4.5.3 Object Spawning (Branching)

Branching processes. Markov branching processes are classical continuous-time models for spawning. In the Yule process, an object lives for an exponentially distributed length of time and then splits into two objects. In birth and death processes, objects are born and existing ones die at time rates proportional to the number of objects. In both cases, the GF can be derived explicitly [8, Chap. III]. Branching diffusion processes are extensions that include the notion of object state, and they are relevant to tracking a spawning number of moving objects. They too have a large, established literature (see [8, Chap. VI] and the references therein). A discrete-time spawning model, rudimentary compared to the broader literature, is presented in this subsection.

Spawning in CPHD. In tracking applications, objects may fragment, or spawn new objects, due solely to natural physical processes (e.g., a stony meteorite may break apart during its fall to earth). Controlled objects may divide by choice (e.g., when a taxi discharges riders). In both instances, once spawned, objects are assumed to be indistinguishable and to move independently. When it occurs, spawning is assumed to happen precisely at time $t_{k-1}$. The "sameness" assumption implies that all objects, spawned or not, transition to scan $k$ with the same Markovian motion model, and survive or not according to the same probability $\rho_k(\cdot)$.

The spawning number of an object in state $x_{k-1}$ at time $t_{k-1}$ is the number of objects that exist immediately after the object spawns. The number is assumed to be greater than or equal to one. An object is said to have spawned if its spawning number is greater than one, and not to have spawned if it is exactly one. Its GF is denoted by $G_{k-1}^{\mathrm{spa}|x_{k-1}}(z)$. The conditioning on $x_{k-1}$ is flexible. For example, if $\mathcal{X}$ is partitioned into two sets, say, $\mathcal{X}_L$ and $\mathcal{X}_R$, then an object at $x_{k-1} \in \mathcal{X}_L$ cannot spawn if $G_{k-1}^{\mathrm{spa}|x_{k-1}}(z) = z$, while an object at $x_{k-1} \in \mathcal{X}_R$ can become either one or two objects if, say, $G_{k-1}^{\mathrm{spa}|x_{k-1}}(z) = 0.9z + 0.1z^2$.

Let $x_{k-1}^{\mathrm{spa}} \in \mathcal{X}$ denote the state of an object spawned at time $t_{k-1}$ by an object at $x_{k-1} \in \mathcal{X}$. Denote the conditional PDF of $x_{k-1}^{\mathrm{spa}}$ by $\mu_{k-1}^{\mathrm{spa}}(x_{k-1}^{\mathrm{spa}} \mid x_{k-1})$. The Markov object motion model applies to all objects, so the predicted PDF of a spawned object at scan $k$ conditioned on $x_{k-1}$ is

$$\mu_k^{\mathrm{spa}-}\!\bigl( x_k^{\mathrm{spa}} \mid x_{k-1} \bigr) = \int_{\mathcal{X}} \mu_{k-1}^{\mathrm{spa}}\!\bigl( x_{k-1}^{\mathrm{spa}} \mid x_{k-1} \bigr)\, p_k\!\bigl( x_k^{\mathrm{spa}} \mid x_{k-1}^{\mathrm{spa}} \bigr)\, dx_{k-1}^{\mathrm{spa}}. \qquad (4.64)$$

The probability that an object at scan $k-1$ continues to exist at scan $k$ given that it is in state $x_k$ is $\rho_k(x_k)$. Survival applies equally to all objects. The "Darwinian" survival of an object spawned at scan $k-1$ by an object at $x_{k-1}$ is Bernoulli distributed; its conditional GF is

$$G_k^{\mathrm{Darwin}|x_{k-1}}(z) = 1 - \hat\rho_k^{\mathrm{spa}|x_{k-1}} + \hat\rho_k^{\mathrm{spa}|x_{k-1}}\, z, \qquad (4.65)$$

where

$$\hat\rho_k^{\mathrm{spa}|x_{k-1}} = \int_{\mathcal{X}} \rho_k\bigl( x_k^{\mathrm{spa}} \bigr)\, \mu_k^{\mathrm{spa}-}\!\bigl( x_k^{\mathrm{spa}} \mid x_{k-1} \bigr)\, dx_k^{\mathrm{spa}}. \qquad (4.66)$$

The corresponding Bernoulli point process at scan $k$ is

$$\Psi_k^{\mathrm{Darwin}|x_{k-1}}(h) = G_k^{\mathrm{Darwin}|x_{k-1}}\!\left( \int_{\mathcal{X}} h\bigl( x_k^{\mathrm{spa}} \bigr)\, \hat\mu_k^{\mathrm{spa}}\!\bigl( x_k^{\mathrm{spa}} \mid x_{k-1} \bigr)\, dx_k^{\mathrm{spa}} \right), \qquad (4.67)$$

where

$$\hat\mu_k^{\mathrm{spa}}(x_k \mid x_{k-1}) = \frac{\rho_k(x_k)\, \mu_k^{\mathrm{spa}-}(x_k \mid x_{k-1})}{\int_{\mathcal{X}} \rho_k(x)\, \mu_k^{\mathrm{spa}-}(x \mid x_{k-1})\, dx}. \qquad (4.68)$$

k

=

X

(h, g)

(4.69)

  h(xk )μˆ kspa xk |xk−1 1 − Pdk (xk ) + Pdk (xk ) g(y) pk (y|xk ) dy dx k . Y

Multiplying by the prior PDF of xk−1 , and superposing all spawned objects and clutter, gives the GFL for the CPHD with spawning objects as kCPHD/Spawn (h, g) N = kC (g) G k−1



spa|x

X



Darwin|x k−1

μk−1 (xk−1 ) G k−1k−1 G k



BMDspa|x k−1

k

(h, g)



(4.70)

dx k−1 .

The combinatorics of the spawning model can be read directly from the GFL. In this − N model, spawning replaces object number prediction, that is, G kN (z) ≡ G k−1 (z).


If there is no spawning, then $G_{k-1}^{\mathrm{spa}|x_{k-1}}(z) = z$ and the notation simplifies so that $x_{k-1}^{\mathrm{spa}} \equiv x_{k-1}$ and $\mu_{k-1}^{\mathrm{spa}}(x_{k-1} \mid x_{k-1}) \equiv \mu_{k-1}(x_{k-1})$. The left-hand side of (4.64) is just $\mu_k^{-}(x_k)$, so that (4.70) reduces to (4.63). Further details are omitted.

Nota Bene. The GF of the spawning model $G_{k-1}^{\mathrm{spa}|x_{k-1}}(z)$ is also a model for early object termination. If, say, $G_{k-1}^{\mathrm{spa}|x_{k-1}}(z) = 0.1 + 0.9z$, then an object at $x_{k-1}$ in scan $k-1$ will be terminated with probability 0.1 and never have the opportunity to transition. This model is different from the late termination model in the previous subsection. The models are not incompatible, so both can be used in the same problem.

4.6 PHD: A Poisson Intensity Filter

The PHD filter is a special case of the CPHD filter. It is a Bayesian recursion whose prior at scan $k$ is assumed to be a Poisson point process whose GFL is

$$\Psi_{k-1}(h) = \exp\!\left( -\lambda_{k-1} + \lambda_{k-1} \int_{\mathcal{X}} h(x)\, \mu_{k-1}(x)\, dx \right), \qquad (4.71)$$

where $\lambda_{k-1}$ is the expected number of objects and $\mu_{k-1}(x)$ is the PDF of object state. Both are conditioned on all measurements up to and including scan $k-1$. To close the Bayesian recursion, the posterior process at scan $k$ must be approximated by a process whose GFL has the same form.

Clutter process. The PHD filter assumes, as do many filters in the literature, that clutter at every scan is a Poisson process. The clutter GFL at scan $k$ is, from (2.20),

$$\Psi_k^{C}(g) = \exp\!\left( -\lambda_k^c + \lambda_k^c \int_{\mathcal{Y}} g(y)\, p_k^c(y)\, dy \right). \qquad (4.72)$$

The Poisson form is necessary for the PHD filter.

Predicted process. A Poisson process subjected to object motion, as modeled by the transition function $p_k(x_k \mid x_{k-1})$, is a Poisson process [9] whose PDF $\mu_k^{-}(\cdot)$ is the predicted object state at scan $k$. Motion does not alter the number of objects. Subjecting this process to state-dependent survival with probability function $\rho_k(x_k)$ gives another Poisson process [9]. The GFL of the surviving process at scan $k$ is

$$\hat\Psi_k^{-}(h) = \exp\!\left( -\hat\lambda_k^{-} + \hat\lambda_k^{-} \int_{\mathcal{X}} h(x)\, \hat\mu_k^{-}(x)\, dx \right), \qquad (4.73)$$

where the expected number of objects that survive the transition to scan $k$ is

$$\hat\lambda_k^{-} = \lambda_{k-1} \int_{\mathcal{X}} \rho_k(x)\, \mu_k^{-}(x)\, dx \qquad (4.74)$$

and their PDF is

$$\hat\mu_k^{-}(x) = \frac{\rho_k(x)\, \mu_k^{-}(x)}{\int_{\mathcal{X}} \rho_k(x')\, \mu_k^{-}(x')\, dx'}. \qquad (4.75)$$

New object process. Superposed with the predicted process is a new object process. It too is assumed to be Poisson with mean number $\lambda_k^{\mathrm{new}}$ and object PDF $\mu_k^{\mathrm{new}}(x)$. The superposed process is Poisson with mean and PDF given, respectively, by

$$\tilde\lambda_k^{-} = \hat\lambda_k^{-} + \lambda_k^{\mathrm{new}} \qquad (4.76)$$
$$\tilde\mu_k^{-}(x) = \frac{\hat\lambda_k^{-}\, \hat\mu_k^{-}(x) + \lambda_k^{\mathrm{new}}\, \mu_k^{\mathrm{new}}(x)}{\hat\lambda_k^{-} + \lambda_k^{\mathrm{new}}}. \qquad (4.77)$$

That the combined process persists as a Poisson process from scan $k-1$ to scan $k$ is a tribute to the remarkable properties of the Poisson process.

GFL of the object-measurement process. Each object, when present, is assumed to generate measurements in accordance with the BMD model. The GFL is given by (4.36). Given the PHD model assumptions, the GF of the predicted object number is

$$G_k^{N^-}(z) = \exp\!\left( -\tilde\lambda_k^{-} + \tilde\lambda_k^{-}\, z \right), \qquad (4.78)$$

where $z$ is the indeterminate. Define the predicted intensity function by

$$I_k^{\mathrm{PHD}-}(x) = \tilde\lambda_k^{-}\, \tilde\mu_k^{-}(x). \qquad (4.79)$$

The GFL of the BMD process is given by (4.7). Substituting and rearranging terms gives the explicit form of the joint GFL as

$$\Psi_k^{\mathrm{PHD}}(h, g) = \exp\!\left( -\lambda_k^c - \tilde\lambda_k^{-} + \lambda_k^c \int_{\mathcal{Y}} g(y)\, p_k^c(y)\, dy + \int_{\mathcal{X}} h(x)\, I_k^{\mathrm{PHD}-}(x)\bigl( 1 - P_{d_k}(x) \bigr)\, dx + \int_{\mathcal{X}} \int_{\mathcal{Y}} h(x)\, g(y)\, I_k^{\mathrm{PHD}-}(x)\, P_{d_k}(x)\, p_k(y \mid x)\, dy\, dx \right). \qquad (4.80)$$

It is proportional to the exponential of a sum of one bilinear and two linear functionals in $h$ and $g$. As a check, note that $\Psi_k^{\mathrm{PHD}}(1, 1) = 1$.

GFL of the exact Bayes posterior process. Given the measurement set $y_k = \{y_1, \ldots, y_M\}$, the secularized function is

$$\Psi_k^{\mathrm{PHD}}(h, \beta) = \Psi_k^{\mathrm{PHD}}\!\left( h, \sum_{m=1}^{M} \beta_m\, \delta_{y_m} \right) = \exp\!\left( -\lambda_k^c - \tilde\lambda_k^{-} + \sum_{m=1}^{M} \beta_m\, \lambda_k^c\, p_k^c(y_m) + \int_{\mathcal{X}} h(x)\, I_k^{\mathrm{PHD}-}(x)\bigl( 1 - P_{d_k}(x) \bigr)\, dx + \sum_{m=1}^{M} \beta_m \int_{\mathcal{X}} h(x)\, I_k^{\mathrm{PHD}-}(x)\, P_{d_k}(x)\, p_k(y_m \mid x)\, dx \right). \qquad (4.81)$$


The normalized cross-derivative with respect to $\beta$ at $\beta = 0$ is the GFL of the exact Bayes posterior process. Arranging terms gives

$$\Psi_k^{\mathrm{PHD}}(h \mid y_k) = \Psi_k^{\mathrm{und}}(h) \prod_{m=1}^{M} \frac{\lambda_k^c\, p_k^c(y_m) + \int_{\mathcal{X}} h(x)\, I_k^{\mathrm{PHD}-}(x)\, P_{d_k}(x)\, p_k(y_m \mid x)\, dx}{\lambda_k^c\, p_k^c(y_m) + \int_{\mathcal{X}} I_k^{\mathrm{PHD}-}(x)\, P_{d_k}(x)\, p_k(y_m \mid x)\, dx}, \qquad (4.82)$$

where $\Psi_k^{\mathrm{und}}(h)$ is the GFL of a Poisson process with mean and PDF given by

$$\lambda_k^{\mathrm{und}} = \int_{\mathcal{X}} \bigl( 1 - P_{d_k}(x) \bigr)\, I_k^{\mathrm{PHD}-}(x)\, dx \quad \text{and} \quad \mu_k^{\mathrm{und}}(x) = \frac{\bigl( 1 - P_{d_k}(x) \bigr)\, I_k^{\mathrm{PHD}-}(x)}{\int_{\mathcal{X}} \bigl( 1 - P_{d_k}(x') \bigr)\, I_k^{\mathrm{PHD}-}(x')\, dx'},$$

respectively. The product form of the GFL (4.82) means that it is the superposition of $M + 1$ independent processes. The "und" process corresponds to objects that are undetected. The other $M$ terms are, by inspection, Bernoulli processes.

Closing the Bayesian recursion. The prior distribution of the Bayesian PHD intensity recursion is a Poisson point process. The exact Bayes posterior process is not Poisson, so it must be approximated by a Poisson process to close the Bayesian recursion. Whatever the choice, the approximation must be invoked at each and every iteration of the filter, so the resulting intensity filter cannot be an exact filter. The PHD intensity filter makes a natural choice for the Poisson process—it chooses the Poisson process whose intensity function matches the exact Bayes posterior process.

The intensity function of the exact posterior process is derived from the product form of the GFL. The secular function,

$$\Psi_k^{\mathrm{PHD}}(\bar\alpha \mid y_k) \equiv \Psi_k^{\mathrm{PHD}}\bigl( 1 + \bar\alpha\,\delta_{\bar x}(x) \mid y_k \bigr), \qquad (4.83)$$

is a product of exponential functions, and its derivative at $\bar\alpha = 0$ is the intensity of the exact Bayesian posterior process. (See Sect. B.8 of Appendix B.) A more economical approach is to take the logarithmic derivative instead; see Appendix B, Eq. (B.70). The intensity function of the exact Bayes posterior process is

$$I_k^{\mathrm{PHD}}(x) = \bigl( 1 - P_{d_k}(x) \bigr)\, I_k^{\mathrm{PHD}-}(x) + \sum_{m=1}^{M} \frac{I_k^{\mathrm{PHD}-}(x)\, P_{d_k}(x)\, p_k(y_m \mid x)}{\lambda_k^c\, p_k^c(y_m) + \int_{\mathcal{X}} I_k^{\mathrm{PHD}-}(x')\, P_{d_k}(x')\, p_k(y_m \mid x')\, dx'}. \qquad (4.84)$$

This expression defines the Bayesian update of the intensity function, and this intensity defines the updated Poisson process of the PHD filter.

Interpretation of the recursion. The filter update is explained intuitively as follows. The probability that the measurement $y_m$ originates from an object whose state lies in a small neighborhood of $x \in \mathcal{X}$ of volume $|x|$ is the ratio

$$\frac{I_k^{\mathrm{PHD}-}(x)\, P_{d_k}(x)\, p_k(y_m \mid x)\, |x|}{\lambda_k^c\, p_k^c(y_m) + \int_{\mathcal{X}} I_k^{\mathrm{PHD}-}(x')\, P_{d_k}(x')\, p_k(y_m \mid x')\, dx'}.$$

4 Tracking a Variable Number of Objects

By assumption there is at most one object per measurement, and the measurements are conditionally independent, so the sum from m = 1 to M is the expected number of objects in the neighborhood |x| of x that generate measurements. The expected number of objects in |x| that do not generate measurements is   − 1− Pdk (x) IkPHD (x)|x| . The sum of these numbers is the expected total number of objects in |x|. This number is also written as IkPHD (x)|x|. Equating these two expressions and dividing by |x| gives a recursion for object intensity, namely, (4.84). Relationship to imaging filters. The PHD filter recursion is not new to signal processing and imaging. Essentially the same recursion is known in image analysis as the famous Richardson-Lucy algorithm that was originally proposed in 1972 [10] and 1974 [11] for de-blurring astronomical (stellar) images. In positron emission tomography (PET), it is known as the Shepp-Vardi algorithm [12] that was derived in 1982 for medical imaging. The connections between the PHD intensity filter and PET imaging are detailed in [13]. Pair correlation function. Because of the Poisson approximation, the PHD intensity filter does not preserve the correlation structure of the exact Bayesian posterior process. Some of the information lost in the approximation is partly captured in the pair correlation function. As discussed in Sect. B.12 of Appendix B, the concept of pair correlation differs from the correlation function of a time series. Pair correlation functions are evaluated at two distinct points x j ∈ X, j = 1, 2. As seen from Eq. (B.88), they require evaluating first- and second-order moments using the GFL of the process. The GFL of the exact Bayes posterior PHD process at scan k is given by (4.82), so the pair correlation function is ρPHD k (x 1 , x 2 ) =

IkPHD (x1 , x2 ) . Ik (x1 )IkPHD (x2 )

(4.85)

PHD

The two denominator terms are intensities, so they are the first-order moments: m [1] (x j ) ≡ IkPHD (x j ) ,

j = 1, 2 .

(4.86)

The numerator is the second-order moment m [2] (x1 , x2 ). Substitute the Dirac delta train h δ (x) = 1 + α1 δx1 (x) + α2 δx2 (x) into (4.82) and, with α = (α1 , α2 ), define the secular function kPHD (α|yk ) = kPHD (h δ |yk ) ⎛ und = exp ⎝−λund k + λk

 j=1,2

⎞ ⎠ α j μund k (x j )

M  m=1

⎛ ⎝1 +



⎞ α j Wm (x j )⎠ ,

j=1,2

(4.87)

4.6 PHD: A Poisson Intensity Filter

105

where −

Wm (x) =

IkPHD (x)Pdk (x) pk (ym | x)  , − c c λk pk (ym ) + X IkPHD (x)Pdk (x) pk (ym | x) dx

x ∈ X.

(4.88)

 d2  kPHD (α|yk ) α1 =α2 =0 dα1 dα2 M = IkPHD (x1 )IkPHD (x2 ) − Wm (x1 )Wm (x2 ).

(4.89)

The cross-derivative with respect to α at α = 0 is IkPHD (x1 , x2 ) =

m=1

As kPHD (α|yk ) is the product of M linear functions and the exponential of a linear function, the derivative (4.89) can be computed via the general expression (C.37) in Appendix C (although it may be more enlightening to take the derivative by hand). Dividing (4.89) by IkPHD (x1 )IkPHD (x2 ) gives the pair correlation function of the exact PHD posterior process M ρPHD k (x 1 , x 2 ) = 1 −

m=1

Wm (x1 )Wm (x2 ) . (x1 )IkPHD (x2 )

PHD

Ik

(4.90)

This expression was first derived in [14]. After closing the recursion, the pair correlation function is that of a Poisson process and is identically equal to one. Repulsive processes. The pair correlation function of the exact PHD posterior process is less than or equal to one for any two points, since the correction term on the right-hand side of (4.90) is nonnegative. Intuitively, this means that a pair of points is less likely to occur jointly at the locations x1 and x2 than for a Poisson process with the same intensity function [15, p. 31]. The PHD intensity filter is, after closing the Bayes recursion, the Poisson process with the same intensity of the exact Bayes posterior. General “repulsive” and “attractive” finite point processes are defined in [15, p. 83]. The definitions are equivalent to a collection of simultaneous inequalities for multiple point correlations. The first (and simplest) of these inequalities is that the pair correlation is less than or greater than one, respectively, for repulsive and attractive processes. The result (4.90) shows that the first inequality holds for the exact PHD posterior process. Whether or not the exact PHD posterior process is repulsive in the strict sense is not known.

4.7 Numerical Examples In this simulated scenario, six point objects move in continuous time at constant velocities. Each object “exists” at all times in the closed interval [t0 , t K ] = [0, 240] seconds. Two different CPHD filters are used to process the same simulated dataset.

106

4 Tracking a Variable Number of Objects

The first filter incorporates all a priori knowledge about object number/existence, namely, that the number of objects N (t) remains constant at six during the time − interval [0, 240]. One implication of this is that G kN (z) = z 6 in the CPHD GFL given in Eq. (4.36). This very restricted version of the CPHD filter is exactly equivalent to the JPDAS filter; see Eq. (4.6). The second filter is the well-known PHD filter; see Sect. 4.6. Both filters are reduced to their bare essentials: the tracking filters do not incorporate birth, death, or spawning models, and the Markovian TPM Ak−1 in (4.30) is the infinite-dimensional identity matrix defined by πk−1 (i, j) = δi ( j), where this delta is the Kronecker delta. All object motion is confined to a 2D spatial region of interest R = [−2000, 2000] × [−1000, 1000] ⊂ R2 . Elements of the (superposed) object state space X and measurement space Y are represented using boldface text (e.g., x and y) so that they will not be confused with their 2D spatial components, x and y. All spatial units are in meters, and all time units are in seconds. The state space X ⊂ R4 comprises position and velocity components; an element x ∈ X is represented in coordinate form as x = (x x˙ y y˙ )T . A sensor provides spatial x-y measurements at 1 s intervals starting at time t1 = 1 and ending at time t K = t240 = 240; that is, t = 1. The measurement space Y ⊂ R2 ; an element y ∈ Y is represented by the 2D spatial vector y = (x y)T . At each scan k, if a given object is in state xk ∈ X, it induces a sensor measurement yk ∈ Y with probability Pdk (xk ) ≡ pd = 0.9. Given that the object is detected, a measurement yk is generated according to the linear-Gaussian PDF pk (yk | xk ) = N(yk ; H xk , R), where Hk ≡ H =

1000 0010



and Rk ≡ R = σ 2M

10 01

(4.91)

with σ M = 65. Simulated clutter measurements are realizations of a homogeneous PPP with uniform clutter intensity over R ⊆ Y. Specifically, the mean number of clutter measurements λck ≡ λc = 75 is constant over all scans, and the clutter PDF 1 = 1.25 × 10−7 for y ∈ R and k = 1, . . . , K . Thus, in each scan, there pkc (y) ≡ Vol(R) is an average of approximately 0.66 clutter measurements in a 3σ M measurement window. Both tracking filters are implemented as particle filters. Linear-Gaussian assumptions are adopted for object motion. Equations (2.8)–(2.9) become (2.52)–(2.53), respectively. The object motion model is assumed stationary, so that Fk ≡ F and Q k ≡ Q are constant over all scans, where ⎛

1 ⎜0 F =⎜ ⎝0 0

t 1 0 0

0 0 1 0

⎛ 3 ⎞ t t 2 0 32 ⎜ t 2 t 0⎟ 2 ⎟ and Q = σ 2 ⎜ p⎜ t ⎠ ⎝ 0 0 1 0 0

0 0 t 3 3 t 2 2

⎞ 0 ⎟ 0 ⎟ t 2 ⎟ ⎠ 2 t

4.7 Numerical Examples

107

with σ p = 0.5. This relatively small value of σ p helps to enforce “nearly straightline” motion; see Chap. 3, Fig. 3.1. The prior μ0 (·) is a Gaussian mixture model (GMM) defined by 6  1  μ0 (x) = (4.92) N x; x0n , P0|0 , 6 n=1 where x0n is the ground truth location of object n at reference time t0 = 0, and P0|0 = diag(1502 , 32 , 1502 , 32 ). The tracking filter parameters λc , pd , and σ M are matched to those of the simulation, and the prior expected number of objects at reference time t0 = 0 is set to λ0 = 6 in the PHD filter. (For the JPDAS tracker, this is not needed as the object number is always fixed at six.) No gating is performed. Multiobject state estimate. The following process is used to extract multiobject state estimates. It is important to point out that this does not affect the tracking filters at all. State estimation is performed separately for visualization purposes only. At each scan k, the filter’s state particles are fit to a multivariate Gaussian mixture model (GMM) comprising six components.1 The output of the fitting is a collection of six weighted Gaussians  n n n  (4.93) wk , μk , k n=1,...,6 with corresponding state PDF p(x) =

6 

  wkn N x; μnk , kn .

(4.94)

n=1

The six object state estimates are given by each μnk ∈ R4 , and the corresponding covariances are given by kn ∈ R4×4 . For scan k + 1, k ≥ 1, the algorithm is initialized with the GMM estimates (4.93) from the previous scan. For scan k = 1, the algorithm is initialized with the GMM defined by (4.92). It is important to note that there is no principled way to link the components of the GMMs over scans. There is no object continuity, so to speak. Labeled MBM filters are relevant in this regard—see Sect. 5.4 of the next chapter. Description of Figs. 4.1 and 4.2. Superposed object intensity functions.The upper subplots of Figs. 4.1 and 4.2 are the tracker outputs for the JPDAS and PHD filters, respectively. Ground truth object position in x-y space is given by the six dashed lines, and the gray dots are the superposed clutter realizations over the last five scans (clutter realizations are independent from scan to scan). The gray circles represent the six GMM μnk estimates for each scan k (spatial components only), while the black ellipses are the corresponding 99% spatial error ellipses derived from the fitted estimates  Kn , n = 1, . . . , 6, for each object at the final scan K = 240. For intuition, the brown dashed circle depicts a 3σ M measurement window. 1 The

GMMs are estimated via the fitgmdist function in Matlab® R2017b.

108

4 Tracking a Variable Number of Objects 1

ground truth GMM means GMM covs final scan

y (km)

0.5

0

-0.5

-1 -2

-1.5

-1

-0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

x (km) 1

y (km)

0.5

0

-0.5

-1 -2

-1.5

-1

-0.5

0

x (km) Fig. 4.1 JPDAS filter output. (Top)—A priori knowledge about object number/existence is assumed. Number of objects is constant and no birth, death, or spawning model is employed. (Bottom)—Heat map depicts how well the particles are concentrated around the ground truth after resampling

Heat maps. The lower subplots of Figs. 4.1 and 4.2 are the corresponding particle filter heat maps. The extended 2D spatial region of interest R = [−2200, 2200] × [−1100, 1100] is divided into 1000 × 500 grid cells. After resampling at each scan k, the number of particles falling into each grid cell is calculated. These values are then summed over all K = 240 scans.

4.7 Numerical Examples

109

1

ground truth GMM means GMM covs final scan

y (km)

0.5

0

-0.5

-1 -2

-1.5

-1

-0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

x (km) 1

y (km)

0.5

0

-0.5

-1 -2

-1.5

-1

-0.5

0

x (km) Fig. 4.2 Standard PHD filter output. (Top)—No birth, death, or spawning model is employed. (Bottom)—Heat map depicts how well the particles are concentrated around the ground truth after resampling

4.7.1 JPDAS Filter The JPDAS tracker output is summarized in Fig. 4.1. At each scan k, the Bayesian recursion is closed as in (4.24). The intensity function IkJPDAS (· | yk ) is calculated via the complex step method as described in (4.29) with  = 10−9 . (See Sect. 4.3.5 and Appendix C, Sect. C.5 for more on the complex step method.)

110

4 Tracking a Variable Number of Objects

4.7.2 PHD Filter The PHD tracker output is summarized in Fig. 4.2. Figure 4.3 provides summary object count statistics. In the upper subplot,  red line is the posterior estimate  the solid of the expected number of objects E N (tk ) | yk . This value (see Eq. (4.47)) is estimated via summing the posterior particle intensities (before resampling); then it is used to close the Bayesian recursion. The dashed red line is the estimated mean N¯est over all K = 240 scans (an “ergodic mean”). For this particular simulated 12 est num objects est mean = 5.364 actual num objects Poisson 2

est num objects

10 8 6 4 2 0 20

40

60

80

100

120

140

160

180

200

220

240

scan 0.3 estimated Poisson

probability

0.25 0.2 0.15 0.1 0.05 0 0

2

4

6

8

10

12

14

16

Nest Fig. 4.3 (Top)—PHD filter estimates of the number of objects. The mean of these estimates over all K = 240 scans is 5.364; it is depicted by the red dashed line. The blue dashed lines at 6 ± 4.89 are the 2σ lines for a Poisson variate with mean 6. (Bottom)—Red histogram (normalized) of all K = 240 PHD object count estimates (rounded to the nearest integer). Blue PMF of a Poisson variate with mean 5.364

4.7 Numerical Examples

111

scenario, the overall mean is biased low at N¯est ≈ 5.364. For comparison, the solid blue line represents the actual number of objects N (tk ) = 6, while the √ dashed blue lines represent two Poisson standard deviations from the mean: 6 ± 2 6 ≈ 6 ± 4.9. In the lower subplot of Fig. 4.3, the red bars are a histogram representation,   rounded to the nearest integer, of the PHD filter estimates E N (tk ) | yk that make up the solid red line in the upper subplot. For comparison, the blue bars represent the PMF values for a Poisson with mean N¯est .

4.7.3 Discussion of Results Qualitatively speaking, there is no significant difference in the PHD and JPDAS tracker outputs shown in Figs. 4.1 and 4.2. Both use object superposition, and both are special cases of the CPHD filter. Although they make very different assumptions about the number of objects, their filter output intensity functions are very similar. This suggests that superposition is a stronger influence on filter intensity estimates than the PMF of object number. This result has implications for post-processing algorithms that first estimate object number from the integral of the intensity function, and then extract that number of point estimates from the intensity function. In this example, the results show that superposition is a good visualization tool for situation assessment. The tracking capabilities of the PHD and CPHD filters are studied extensively elsewhere using more refined software than is used in this illustrative example, so it is not repeated here. The utility of JPDAS for track estimation has not yet been investigated. The PHD filter estimates the number of objects present. Figure 4.3 shows that these estimates are biased low with a mean of 5.364 instead of 6.0, which is ground truth. The figure shows that the estimates are approximately Poisson distributed about this biased mean. In the lower subplot of Fig. 4.3, the histogram (red) of the PHD estimates of object number has less spread than the Poisson PMF (blue) with mean 5.364. The (rounded) PHD estimates of object number lie in the interval [2, 10]. The probability that a single draw from a Poisson random variate with mean N¯est ≈ 5.364 lies in this interval is approximately 0.95.

References 1. William Squire and George Trapp. Using complex variables to estimate derivatives of real functions. SIAM Review, 40(1):110–112, 1998. 2. James N Lyness and Cleve B Moler. Numerical differentiation of analytic functions. SIAM Journal on Numerical Analysis, 4(2):202–210, 1967. 3. Nicholas Higham. Differentiation with(out) a difference. https://sinews.siam.org/Details-Page/ differentiation-without-a-difference, 2018 (accessed June 25, 2020).

112

4 Tracking a Variable Number of Objects

4. C Larrabee Winter and Michael C Stein. An additive theory of Bayesian evidence accrual. Technical report, Technical Report, No LA-UR-93-3336, Los Alamos National Lab NM Analysis and Assesment Div., 1987. 5. Ronald PS Mahler. Multitarget Bayes filtering via first-order multitarget moments. IEEE Transactions on Aerospace and Electronic systems, 39(4):1152–1178, 2003. 6. Daniel S Bryant, Emmanuel D Delande, Steven Gehly, Jérémie Houssineau, Daniel E Clark, and Brandon A Jones. The CPHD filter with target spawning. IEEE Transactions on Signal Processing, 65(5):13124–13138, 2016. 7. Malin Lundgren, Lennart Svensson, and Lars Hammarstrand. A CPHD filter for tracking with spawning models. IEEE Journal of Selected Topics in Signal Processing, 7(3):496–507, 2013. 8. Krishna B Athreya and Peter Ney. Branching Processes. Dover Publications, 2004. 9. Roy L Streit. Poisson point processes: imaging, tracking, and sensing. Springer Science, 2010. 10. William H Richardson. Bayesian-based iterative method of image restoration. J. Opt. Soc. Am., 62(1):55–59, 1972. 11. Leon B Lucy. An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79:745, 1974. 12. Lawrence A Shepp and Yehuda Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging, 1(2):113–122, 1982. 13. Roy L Streit. PHD intensity filtering is one step of a MAP estimation algorithm for positron emission tomography. In 2009 12th International Conference on Information Fusion, pages 308–315, 2009. 14. A Onder Bozdogan, Roy Streit, and Murat Efe. Reduced Palm intensity for track extraction. IEEE Transactions on Aerospace and Electronic Systems, 52(5):2376–2396, 2016. 15. Jesper Møller and Rasmus P Waagepetersen. Statistical inference and simulation for spatial point processes. CRC Press, 2003.

Chapter 5

Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

“I don’t know half of you half as well as I should like; and I like less than half of you half as well as you deserve.” Spoken by Bilbo Baggins in The Fellowship of the Ring by J. R. R. Tolkien.

Abstract The power of the AC method to unify and clarify concepts is demonstrated. The multi-Bernoulli (MB) filter is derived from JIPDA by superposing object states (JIPDAS). The posterior process is a multi-Bernoulli mixture (MBM) process, and the hypothesis set is relatively simple—it comprises the set of feasible assignments of measurements to individual Bernoulli processes in the MB filter. The multi-Bernoulli mixture filter is also derived. The prior is an MBM process, and the posterior is again an MBM process, but the hypothesis set is now much more complicated. Said simply, the complication is that a Bernoulli process can be part of more than one MB process in the mixture. Since each MB process has a unique set of feasible assignments, the hypothesis set for the MBM filter is the union of the feasible assignment sets for the MB processes that comprise the mixture. The MBM process is labeled (LMBM) using an additional set of indeterminate variables called labels. The individual Bernoulli processes are paired with unique labels, so the LMBM facilitates using the hypothesis list together with merge/cap/split/prune methods to link Bernoulli processes over scans. The LMBM and the multiple hypothesis tracking (MHT) filters are built from the same hypotheses. For one scan, LMBM filters are shown to be equivalent to the subclass of MHT filters in which objects have the same state space. Keywords JIPDA filter · JIPDA with superposition (JIPDAS) · Multi-Bernoulli (MB) filter · MB mixture · Labeled MBM · Hypothesis-oriented LMBM · Trackoriented LMBM · Multiple hypothesis tracking (MHT) · Examples

© Springer Nature Switzerland AG 2021 R. Streit et al., Analytic Combinatorics for Multiple Object Tracking, https://doi.org/10.1007/978-3-030-61191-0_5

113

114

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

5.1 Introduction The family of multi-Bernoulli filters is derived in this chapter from the JIPDA filter by applying superposition and assuming a random number of objects. The flow of the discussion parallels in many ways that of Chap. 4 which uses essentially the same methodology to move step-by-step from the JPDA filter to the CPHD filter. The basic multi-Bernoulli (MB) filter is built on the JIPDA existence model, so it is a kind of track-before-detect estimator of object number. Each of the constituent Bernoulli models in the MB filter is, in effect, a model of a sequential detector for one object. Sequential detection methods accumulate evidence over multiple scans to determine whether or not the object exists. In contrast, the CPHD filter is built on the JPDA model which is hard-edged and allows no doubt—objects that are “definitely” present or absent. Intuitively speaking, the CPHD filter lacks the nuanced structure of multi-Bernoulli filters and is, in effect, an estimator of the number of object detection decisions. The multi-Bernoulli filters are more circumspect—and probably more realistic—than the CPHD filter. The MB filter is the starting point. There are many interesting variations of the basic filter, so it is treated in detail in Sect. 5.2. The GFLs of Bernoulli and MB point processes are given in Appendix B by, respectively, (B.6) and (B.8). The more complex multi-Bernoulli mixture (MBM) filter is presented in Sect. 5.3. The GFL of an MBM point process is a probabilistic sum of the GFLs of several MB processes; such sums are defined in Appendix B (see (B.21)) and justified by the total probability theorem. MBM processes are called “generalized” multi-Bernoulli processes in [1]. Section 5.4 discusses labeled MB (LMB) and labeled MBM (LMBM) filters. Labels are defined in AC terms as new indeterminate variables that are incorporated into the GFL of the unlabeled point process. Labeled GFLs reduce to the unlabeled GFLs when the labels are set equal to one. Section 5.5 discusses the relationship of LMBM filters and classic MHT filters. The concept of conjugate families of distributions for multiple object tracking is reviewed in Sect. 5.6. A numerical example of the JIPDAS filter is given in Sect. 5.7. The benefits of AC to MBM filters. The notational burden of multi-Bernoulli filters is well known, and daunting. The AC approach does not, of course, avoid all such burden, but it substantially reduces it. The way that AC reorganizes the presentation makes the concepts much more accessible and, moreover, it completely clarifies the event space. AC also reveals close connections to the JIPDA filter. The analogy seems to be JPDA is to CPHD as JIPDA is to MBM. Another benefit is that the nature and meaning of labels is starkly seen in their interpretation as indeterminate variables. A final, significant benefit is that the LMBM filters and the established family of MHT filters are seen to be essentially identical, differing only in their use of object superposition. This statement is shown only for one scan, but the argument is convincing. Relationships between MHT and trajectory LMBM filters await further study.

5.2 Multi-Bernoulli (MB) Filter

115

5.2 Multi-Bernoulli (MB) Filter Virtually all the notation and assumptions used in Sect. 3.2 of Chap. 3 for the JIPDA filter are adopted throughout the rest of this chapter. Extensions and modifications to the notation are duly noted as needed.

5.2.1 Prior and Predicted Processes: JIPDA with Superposition Before superposition. The distribution for JIPDA with N objects that may or may not exist at scan k − 1 is, by assumption, a multi-Bernoulli process with Nk−1 models. It is parameterized by the list (3.31), namely,    n n χk−1 , μnk−1 (xk−1 ) : n = 1, . . . , Nk−1 ,

(5.1)

n where xk−1 ∈ Xn . The Bayesian recursion assumes that Nk−1 ≤ Jmax , where Jmax ≥ 1 is a specified maximum  nsize  of an MB process. xk denote, respectively, the predicted existence probability Let χ˜ kn− and μ˜ n− k and PDF of object n after Markovian transition from scan k − 1 to scan k. After the n− state-dependent   “Darwinian” survival test (see Sect. 4.5.2), they are denoted by χk n− n and μk xk . This process is superposed with an independent new object process at scan k. The new object process is assumed to be multi-Bernoulli with Nknew ≥ 0 Bernoulli models. The list (5.1) is augmented with the parameters for the new objects. Let Nk = Nk−1 + Nknew . The JIPDA predicted process at scan k is an MB process

   n χkn− , μn− k (x k ) : n = 1, . . . , Nk ,

xkn ∈ Xn .

(5.2)

The processes are independent, so the GFL of the predicted process is 

MultipleB−

k

h

1:Nk



=

 Nk  n=1



 1−

χkn−

+

χkn−

h (x n

Xn

n

n n )μn− k (x ) dx

.

(5.3)

A different indeterminate function h n (·) is needed for each space Xn . With superposition. To superpose the predicted JIPDA process, we proceed in the same way as was done with JPDA, that is, assume that Xn ≡ X and set h n (·) = h(·). After Markovian transition, Darwinian survival, and the superposition of an MB new object process, the parameter list of the MB process is specified by the list    χkn− , μn− k (x) : n = 1, . . . , Nk , x ∈ X .

(5.4)

116

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

The PDFs in this list have a common domain, unlike the PDFs in (5.2). The GFL is −

kMB (h) =

  Nk  (x) dx . 1 − χkn− + χkn− h(x)μn− k n=1

(5.5)

X

This is the GFL of the predicted object process of the JIPDAS filter. After incorporating clutter and the BMD model, it is identical to the multi-Bernoulli GFL given below (see Eq. (5.17)). An alternative model for the new object process is a Poisson point process. Superposing it with the existing MB model makes the prior into what is termed a Poisson multi-Bernoulli (PMB) process. In some ways a PMB process is a better prior than the multi-Bernoulli alone. The PMB model causes no combinatorial difficulty, but it does change the algebra of the prediction and Bayesian update steps, and it affects how the Bayesian recursion is closed [2, 3]. Because it interferes with the flow of the discussion in this chapter, the PMB new object model is not pursued here. A new object model, whether MB or PMB, is specified at each scan.

5.2.2 GF of Predicted Number of Objects Setting h(x) = z for all x ∈ X in (5.5) gives the GF of the predicted number of objects as  Nk   − G MB 1 − χkn− + χkn− z . (5.6) k (z) = n=1

The PMF is essentially unimodal, except for ties for the largest probability [4]. The probability that R ≥ 0 objects exist is the coefficient of z R , that is,

− Pr{R} = z R G MB k (z) =

wk− (n 1 , . . . , n R ) ,

(5.7)

1≤ n 1 < ··· Nk .

5.2.4 Predicted Multiobject Intensity Function The intensity of the predicted process is −

IkMB (x) =

Nk n=1

χkn− μn− k (x) .

(5.15)

To see this, note that the GFL of the predicted process is the product (5.5) of the GFLs of Nk independent Bernoulli processes. The intensity of their superposition is, therefore, the sum of the intensities of the Bernoulli processes.

5.2.5 GFL of the MB Filter Bayesian processing using AC methods uses the GFL of the joint multiobjectmeasurement process at scan k. The GFLs of the prior and predicted MB processes are described in Sect. 5.2.1.

118

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

By assumption, objects follow a Markov motion model and are possibly undetected, so the GFL of an object is given by the BMD model (3.1). The Nk Bernoulli models are parametric models in the prior process and are not subject to the “sameness” assumptions of the CPHD. In other words, the BMD models may have different detection probabilities and measurement likelihood functions. The n th BMD model is, after superposition, k

BMD(n)

   n− n n n (h, g) = h(x)μk (x) 1 − Pdk (x) + Pdk (x) g(y) pk (y|x) dy dx . X

Y

(5.16) Object measurements and clutter are superposed, so the GFL of the multi-Bernoulli filter is  Nk   1 − χkn− + χkn− kBMD(n) (h, g) , kMB (h, g) = kC (g) (5.17) n=1

where kC (g) is the GFL of the clutter process. The multi-Bernoulli GFL is identical to the GFL of JIPDA with superposition if the clutter process is Poisson, that is,       kMB h, g = kJIPDAS h, g ≡ kJIPDA h, . . . , h, g .

(5.18)

To see this, substitute h n (·) = h(·) into (3.35) on page 62 (with N replaced by Nk ): k

JIPDA



   Nk  c c c 1 − χkn− h, . . . , h, g = exp −λk + λk g(y) pk (y) dy 

 + χkn−

X

Y

(5.19)

n=1

   n n n h(x)μn− (x) 1 − Pd (x) + Pd (x) g(y) p (y | x) dy dx . k k k k Y

This expression is identical to (5.17). The identity also holds for any specified clutter process kC (g), as is seen by using it in place of the Poisson GFL in (5.19).

5.2.6 GFL of the MB Posterior Process   The secular function of kMB h, g for the measurements yk = {y1 , . . . , y M }, M ≥ 1 is   M (5.20) βm δ ym . kMB (h, β) = kMB h, m=1 The GFL of the Bayes posterior process is the normalized cross-derivative 

kMB (h | yk ) =

d  MB (h, β)β=0 dβ k  d  MB (1, β)β=0 dβ k

,

(5.21)

5.2 Multi-Bernoulli (MB) Filter

119 M

d where the notation dβ ≡ dβ1d···dβ M is used for enhanced readability. Thus, for h n = h for all n, the GFL of the exact Bayes posterior process is

N    1 kMB h | yk = χ κ=0 θ∈ (κ) C (yk )       n 1 − χkn− + χkn− h(x)μn− (x) 1 − Pd (x) dx k k X

n∈J (θ)

×

 

n∈I(θ)

χkn− c c λk pk (ym θ (n) )





X

n h(x)μn− k (x)Pdk (x)

pkn (ym θ (n)

| x) dx ,

(5.22)

where the index matrices (κ) are the same as in JPDA, the normalizing constant C χ (yk ) is given by (3.38) on page 63, and N  = min{M, Nk }. The function m θ (n) ∈ {1, . . . , M} is the index of the measurement assigned to object n if n ∈ I(θ ) and is undefined otherwise. It plays an important role when closing the Bayesian recursion. If the measurement set is empty, that is, yk = ∅, then the posterior MB GFL is k

MB



   N n   1 − χkn− + χkn− X h(x)μn− k (x) 1 − Pdk (x) dx    h | yk = ∅ = . (5.23) n 1 − χkn− + χkn− X μn− k (x) 1 − Pdk (x) dx n=1

This is the superposed version of the JIPDA posterior GFL as given by (3.39).

5.2.7 Exact MB Posterior Process Is an MBM The exact Bayes posterior GFL is a probabilistic mixture of GFLs of multi-Bernoulli processes, i.e., it is an MBM process. To see this, write the normalization constant in (5.22) as N  w (κ, θ ) , (5.24) C χ (yk ) = κ=0

θ∈ (κ)

where w (κ, θ ) =

      n 1 − χkn− + χkn− μn− (x) 1 − Pd (x) dx k k n∈J (θ)

×

 

n∈I(θ)

χkn− c c λk pk (ym θ (n) )



X

X

 n n μn− (x)Pd (x) p (y | x) dx . k k m θ (n) k

(5.25)

Dividing and multiplying the (κ, θ ) term in (5.22) by w (κ, θ ) gives   N  kMB h | yk = κ=0

θ∈ (κ)

  w(κ, θ | yk ) kMB h | yk , κ, θ ,

(5.26)

120

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

where for each pair of indices (κ, θ ), w (κ, θ ) w(κ, θ | yk ) =  N   κ=0

θ∈ (κ)

(5.27)

w (κ, θ )

are nonnegative “updated” weights and 



k h | yk , κ, θ = MB

 1 − χkn− + χkn−





1 − χkn− +  n− n n  X h(x)μk (x)Pdk (x) pk (ym θ (n) | x) dx  n− × n n X μk (x)Pdk (x) pk (ym θ (n) | x) dx n∈I(θ) n∈J (θ)



n− n X h(x)μk (x) 1 − Pdk (x) dx    n χkn− X μn− k (x) 1 − Pdk (x) dx

(5.28)

is the GFL of an MB process with Nk independent Bernoulli processes. The GFL of the Bayes posterior process (5.26) is, thus, an MBM process.

5.2.8 Interpretation of the Posterior Mixture The posterior MB process (5.26) is a mixture over all feasible assignments of measurements and objects. The mixture GFL (5.26) is the first of several examples of MBMs in this chapter. As with other applications of AC in tracking, the GFL is a complete and concise statement of the underlying combinatorial problem. Mixture GFLs arise from the total probability theorem (see (B.21)). Thus, the weight w(κ, θ | yk ) in the mixture (5.26) is the probability of the event (κ, θ ). This event shows up in the derivative (5.22), but its first appearance is in the derivative of the GFL of JPDA, namely, (3.9). The event (κ, θ ) corresponds to an assignment of measurements to object models. To see which assignment, let denote the set of {0, 1} matrices of size M × Nk whose row and column sums are all either zero or one (cf. Sect. 3.2.3). Let (κ) ⊂ comprise those matrices with exactly κ columns that sum to one. For θ ∈ (κ), objects that exist and generate measurements are identified by the index set I(θ ), while Bernoulli models that do not generate measurements (either because the object does not exist or it exists but is not detected) correspond to the indices in J (θ ). The index sets I(θ ) and J (θ ) partition the set {1, . . . , Nk }, and these partitions characterize the feasible assignments. Thus, the mixture GFL (5.26) is indeed appropriately described as a sum over feasible assignments.

5.2 Multi-Bernoulli (MB) Filter

121

5.2.9 Posterior Probability Distribution To evaluate the PDF of the multiobject state x ≡ {x1 , . . . , x R }, R ≥ 1, substitute the Dirac delta train (5.10) into the multi-Bernoulli mixture GFL (5.26) to obtain the secular function    kMB α | yk ≡ kMB rR=1 αr δxr N  = κ=0

   yk

θ∈ (κ)

  w(κ, θ | yk ) kMB α | yk , κ, θ ,

(5.29)

where the summands are      kMB α | yk , κ, θ ≡ kMB rR=1 αr δxr  yk , κ, θ . Thus, (5.29) is a weighted sum of products of linear functions of α. The coefficient of the monomial α1 · · · α R is the posterior PDF,   N  pkMB x | yk = κ=0



 w(κ, θ | yk ) α1 · · · α R kMB α | yk , κ, θ . (5.30)

θ∈ (κ)

As in (5.13), the indicated coefficients are easily found by expanding the products term-by-term. It can also be calculated by taking the cross-derivative with respect to α term-by-term. The final expression is of little intrinsic interest and is omitted.

5.2.10 Intensity Function of the Posterior Process The multi-Bernoulli processes in the mixture are not independent. Nonetheless, since the expectation operator is linear, the intensity of their sum is the sum of their intensities. Following the method in (5.15), IkMB (x | yk ) =

N  κ=0

θ∈ (κ)

  w(κ, θ | yk ) IkMB x | yk , κ, θ ,

(5.31)

where MB

Ik



  n χkn− μn− k (x) 1 − Pdk (x)    n x | yk , κ, θ = n n n 1 − χkn− + χkn− X μn− k (x ) 1 − Pdk (x ) dx n∈J (θ) 



+

n∈I(θ)



n n μn− k (x)Pdk (x) pk (ym θ (n) | x) . n− n n n n n n X μk (x )Pdk (x ) pk (ym θ (n) | x ) dx

(5.32)

If J (θ ) or I(θ ) is empty, the corresponding sum is zero. In particular, if the measurement set yk is empty, then

122

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

MB

Ik



Nk  x | yk = ∅ = n=1

  n χkn− μn− k (x) 1 − Pdk (x)    n. n n n 1 − χkn− + χkn− X μn− k (x ) 1 − Pdk (x ) dx

(5.33)

The probabilistic interpretation of this expression is straightforward.

5.2.11 GF of the Number of Existing Objects—MB Filter Proceed in the usual way by letting h(x) = z for all x ∈ X. The result is the GF of the number of objects that exist conditioned on the data:   G MB k z | yk

(5.34)  N n− n  1−χkn− + z χk X μk (x) 1 − Pdk (x) dx κ    = w(κ, θ | yk ) z . n 1−χkn− + χkn− X μn− k (x) 1 − Pdk (x) dx κ=0 θ∈ (κ) n∈J (θ)  n−





This GF is a polynomial of degree at most Nk in z. The factor z κ occurs in the (κ, θ ) summand because the conditioning event is that κ objects exist and generate measurements. The conditioning stipulates that the other object models do not generate measurements, hence the product of Bernoulli processes. The coefficient of z R ,

  Pr{Nk = R | yk } = z R G MB k z | yk ,

(5.35)

is the probability that R objects exist. The predicted probability (5.7) is, by comparison, a much simpler expression.

5.2.12 Closing the Multi-Bernoulli Bayesian Recursion The Bayesian recursion of the MB filter is closed by approximating the GFL of the exact posterior MBM process (5.26) with the GFL of a single MB process parameterized in the same way as the MB process (5.4) specified at scan k − 1. After closure/approximation, the updated parameter list is    χkn , μnk (x) : n = 1, . . . , Nk , x ∈ X ,

(5.36)

which corresponds to the GFL   Nk  n n n 1 − χk + χ k k (h | yk ) ≈ h(x)μk (x) dx . MB

n=1

X

There is more than one way to close the Bayesian recursion.

(5.37)

5.2 Multi-Bernoulli (MB) Filter

123

The approximation proposed in [5] was shown in [6] to exhibit significant bias in the estimated number of object models. A modified multi-Bernoulli approximation designed to mitigate that bias was also proposed in [6]. The bias-compensated multiobject filter is called the cardinality balanced multi-Bernoulli filter. Comparative tracking performance of these filters and more conventional, track-oriented multiple hypothesis tracking (MHT) filters can be found in [7]. An alternative approach is a combinatorial optimization strategy that exploits the fact that the posterior process is an MBM, i.e., a probabilistic mixture of the GFLs of the feasible assignments. The basic idea is to close the recursion by choosing the multi-Bernoulli process with the largest weight. (The JIPDAS numerical example in Sect. 5.7 employs this strategy.) When there are many assignments with weights that are nearly as large as the maximum weight, the mixture can be thresholded to retain several high scoring assignments. The reduced list can be merged/pruned in some suitable manner to find a multi-Bernoulli process to close the Bayesian recursion. Such methods are handicapped by the absence of some way to identify the individual Bernoulli processes. As shown in Sect. 5.4, adding indeterminate variables (called labels) to the GFLs can help with these issues.

5.3 Multi-Bernoulli Mixture (MBM) Filter The prior process in an MBM filter is defined to be a weighted sum of MB processes, i.e., an MBM process. The individual Bernoulli processes that comprise an MB in the mixture are the objects to which measurements are (potentially) assigned. For this reason, the MBs in the MBM are referred to as association hypotheses. The mixture prior adds a serious complication to the model. The complication— and it is an important one—is that the MB processes that constitute the mixture are themselves collections of Bernoulli processes and, moreover, they can share Bernoullis. Consequently, the MB processes generate different sets of measurementto-object assignments. The feasible assignments characterize the “at most one measurement per object rule” within each hypothesized set of “eligible” objects. Closing the Bayesian recursion of MBM filters encounters the difficulties of the same or similar kind as those encountered by their more widely known cousins, the MHT filters. Taken together, the similarities highlight the potential utility of labels. Labels are added to MBM filters in the next section, Sect. 5.4.

5.3.1 GFL of the MBM Process at Scan k−1 Hypotheses as they are defined here do not include the assignments of measurements to objects. This usage is different from that of most, if not all, MHT filters. With the AC method, however, it is natural to define hypotheses as subsets of objects and then

124

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

add the assignments later. It is shown below in Sect. 5.3.7 that MHT-style hypotheses are “AC triplets” here. Let Nmax denote a specified maximum number of objects to be tracked. Each object may or may not exist, so Nmax is the maximum number of Bernoulli processes. Hypotheses are subsets of these processes, so there are 2 Nmax possible hypotheses. In practice, the Bayesian recursion is constrained to carry a maximum number of hypotheses, NHmax , so that NHmax ≤ 2 Nmax . Each such hypothesis corresponds to an MB process in the MBM. H ≥ 1 denote the number of MB processes in the MBM at scan k − 1, and Let Nk−1 let   H (5.38) Hk−1 = h1 , . . . , h Nk−1 denote the association hypotheses, where hypothesis hn ∈ Hk−1 is a subset of the n , and let indices {1, . . . , Nmax }. Denote the size of hn by Jk−1   n . hn = hn ( j) : j = 1, . . . , Jk−1

(5.39)

Conditioned on association hypothesis hn , only object processes with indices in hn are eligible to be assigned a measurement. See Fig. 5.1 for a depiction of the hypothesis notation at scan k. The MBM point process at scan k − 1 is specified by the doubly indexed list 

    hn ( j) hn ( j) n H : n = 1, . . . , Nk−1 . χk−1 , μk−1 (x) : j = 1, . . . , Jk−1 (5.40) H }, define the PMF of the mixing proThe probabilities {Pr k−1 {hn }, n = 1, . . . , Nk−1 portions of the MBM. The notation allows different MB processes to share individual H ≤ NHmax ≤ 2 Nmax . The MB size conBernoulli processes. From the constraints, Nk−1 n straint, Jk−1 ≤ Jmax , is imposed on each hypothesis. Pr k−1 {hn },

Fig. 5.1 MBM hypothesis notation example for scan k. Analogous notation is used for scan k − 1

5.3 Multi-Bernoulli Mixture (MBM) Filter

125

5.3.2 GFL of the MBM Predicted Process at Scan k The Bernoulli object processes in the MB process defined by the association hypothesis hn undergo Markovian transition and Darwinian survival. A new MB process with Jkn,new ≥ 0 Bernoullis is then superposed with the MB of hypothesis hn . Let n + Jkn,new . The hypothesis list can also be augmented with a list of new Jkn = Jk−1 H + NkHnew , where MB hypotheses, Hknew . Let Hk = Hk−1 ∪ Hknew , and let NkH = Nk−1 Hnew new ≥ 0 is the number of MB hypotheses in Hk . After Markovian transition, DarNk winian survival, object birth, and hypothesis augmentation, the MBM process to be updated at scan k is parameterized by the list      h ( j)− h ( j)− Pr − χk n , μk n (x) : j = 1, . . . , Jkn : n = 1, . . . , NkH . k {hn }, (5.41) H hypothesis probabilities are proportional to those in (5.40); they are The first Nk−1 renormalized (if necessary) to accommodate new hypothesis probabilities. The GFL defined by the list (5.41) is H

MBM−

k

(h) =

Nk n=1

Jk    h ( j)− h ( j)− h ( j)− 1−χk n + χk n h(x)μk n (x) dx . n

Pr − k {hn }

X

j=1

(5.42) As mentioned earlier, new object processes can also be modeled as Poisson processes. It is shown in [8] that the exact Bayes posterior is a PMBM process. The extra algebra detracts from the discussion, so Poisson new object models are not presented. Details are given in [2, 8].

5.3.3 GF of the Predicted Aggregate Number of Objects in the MBM Setting h(x) = z in (5.42) gives the GF of the total number of objects, Ntotal , in the MBM as NkH Jkn    h ( j)− h ( j)− − MBM− 1−χk n , (5.43) Pr k {hn } + z χk n G k (z) = n=1

j=1 −

where z is the indeterminate. As a sanity check, note that G MBM (1) = 1. The GF is the k probabilistic sum of the GFs of the number of objects conditional on the hypotheses. The expected number is

126

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

E[Ntotal ] =

 d MBM−  G k (z) dz z=1 H

=

Nk

n

Pr − k {hn }

n=1

Jk

h ( j)−

χk n

,

(5.44)

j=1

a result that accords well with intuition.

5.3.4 Probability Distribution of Predicted MBM Multiobject State Let x ≡ {x1 , . . . , x R }, R ≥ 1. Substituting the Dirac delta train (5.10) into (5.42) gives the secular function H

MBM−

k

(α) ≡

Nk

Jk    R h ( j)− h ( j)− h ( j)− 1 − χk n + χk n αr μk n (xr ) . n

Pr − k {hn }

n=1

r =1

j=1

(5.45) The PDF at x is the coefficient of the monomial α1 · · · α R . Hence, H

MBM−

pk

(x) =

Nk

k   R 

h ( j)− h ( j)− h ( j)− 1−χk n α1 · · · α R + χk n αr μk n (xr ) .

Jn

Pr − k {hn }

n=1

r =1

j=1

Using (5.14) and rearranging terms gives H

MBM−

pk

(x) =

Nk



n=1

1≤ n 1 =···=n R ≤Jkn

− Pr − k {hn } wk (n 1 , . . . , n R | hn )

R 

h (nr )−

μk n

(xr ) ,

r =1

(5.46) where the weights conditioned on the n th association hypothesis are wk− (n 1 , . . . , n R | hn ) =

 R  Jkn  h ( j)− 1 − χk n j=1

r =1

h (nr )−

χk n

h (nr )−

1 − χk n

.

(5.47)

These are the same weights as in (5.8) but conditioned on hypothesis hn .

5.3.5 GFL of the Joint MBM Process The GFL of the MBM filter builds on the derivation of the GFL (5.17) for the MB filter. Superposing clutter and using the predicted MBM process (5.42) gives the

5.3 Multi-Bernoulli Mixture (MBM) Filter

127

GFL kMBM (h, g)

(5.48) NkH

= kC (g)



Pr − k {hn }

n=1

 Jkn

h ( j)−

1−χk n

h ( j)−

+ χk n

BMD(hn ( j))

k

 (h, g) ,

j=1

where the BMD process is, extending the notation of (5.16), BMD(hn ( j))

k

(h, g) (5.49)    h ( j)− h ( j) h ( j) h ( j) (x) 1 − Pdk n (x) + Pdk n (x) g(y) pk n (y|x) dy dx . = h(x)μk n X

Y

A more insightful way to write (5.48) is H

k

MBM

(h, g) =

Nk

MB Pr − k {hn } k (h, g | hn ) ,

(5.50)

n=1

where the conditional GFLs are defined by Jk    h ( j)− h ( j)− BMD(hn ( j)) k (h, g | hn ) = k (g) 1 − χk n + χk n k (h, g) . n

MB

C

(5.51)

j=1

Except for the conditioning on the mixture component hn , this GFL is identical to the multi-Bernoulli GFL (5.17). From this fact alone, it is clear that the exact posterior process of the MBM filter is an MBM filter.

5.3.6 GFL of the MBM Bayes Posterior Process Let the scan measurement set be yk = {y1 , . . . , y M }, M ≥ 1. The case yk = ∅ is handled in a similar fashion to that of the MB and JIPDA filters in Sects. 5.2.6 and 3.3.3, respectively. To streamline the presentation, the case yk = ∅ is not explicitly addressed for MBM filters. The GFL of the MBM process is the normalized derivative, 

k

MBM

(h | yk ) =

d  MBM (h, β)β=0 dβ k  d  MBM (1, β)β=0 dβ k

128

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

 NkH n=1

=  H Nk

n=1 NkH

=



  d MB  Pr − k {hn } dβ k h, β | hn β=0   d MB  Pr − k {hn } dβ k 1, β | hn β=0

  Pr k {hn | yk } kMB h | yk , hn ,

(5.52)

n=1

where the mixture probabilities are   d MB  Pr − k {hn } dβ k 1, β | hn β=0 Pr k {hn | yk } =  H   Nk Pr − {hn  } d  MB 1, β | hn    n =1

k



(5.53)

β=0

k

and, using the normalized derivative (5.21) of the posterior GFL for an MB process, 

k (h | yk , hn ) = MB



d  MB h, β | hn β=0 dβ k   d  MB 1, β | hn β=0 dβ k

.

(5.54)

The next step is to expand the hypothesis conditioned GFLs in (5.52) into a sum of assignment conditioned GFLs. Conditioning the assignment mixture (5.26) on the assignments that correspond to the hypotheses requires refining the notation. Let agg Nk denote the total number of distinct Bernoulli processes in the entire MBM; analytically, agg (5.55) Nk = |∪n hn | ≤ Jmax NHmax . agg

Let be the set of all M × Nk matrices with entries in {0, 1} and whose row and column sums are less than or equal to one. Let (κ, hn ) denote the subset of matrices in such that exactly κ of the columns corresponding to indices in hn sum to one, and all other columns sum to zero; thus, a total of κ columns sum to one and these columns correspond to Bernoulli processes indexed in hn . Conditioning the assignment mixture (5.26) on the association hypothesis hn and substituting the result into (5.52) gives H

k

MBM

(h | yk ) =



Nk (hn ) N n=1



  Pr k {hn , κ, θ | yk } kMB h | yk , hn , κ, θ , (5.56)

κ=0 θ∈ (κ,hn )

where the exact posterior mixture probabilities are Pr k {hn , κ, θ | yk } = Pr k {hn | yk } w(κ, θ | yk , hn ) ,

(5.57)

and N  (hn ) = min{M, Jkn }. These probabilities are interpreted in an MHT filter context in the next subsection.

5.3 Multi-Bernoulli Mixture (MBM) Filter

129

Every matrix θ in the set of matrices (κ, hn ) has κ columns that sum to one; the definition of these columns depends on the hypothesis hn . The posterior process is, by inspection, an MBM process. The GFL of the conditional MB in the MBM,   kMB h | yk , hn , κ, θ =

h ( j)−



  hn ( j)− h ( j) (x) 1− Pdk n (x) dx X h(x)μk   h ( j)−  hn ( j)− h ( j) χk n (x) 1 − Pdk n (x) dx X μk

 1−χkhn ( j)− + χkhn ( j)−

1−χk n +   hn ( j)− h ( j) h ( j)   (x) Pdk n (x) pk n ym θ (hn ( j)) | x dx X h(x)μk , (5.58) ×  hn ( j)−  hn ( j) hn ( j)  y μ (x) Pd (x) p | x dx m (h ( j)) θ n k k k j∈I(θ,hn ) X

j∈J (θ,hn )

is the hypothesis-dependent version of (5.28). The set I(θ, hn ) is defined so that j ∈ I(θ, hn ) if and only if the column of θ corresponding to hn ( j) sums to one. Similarly, j ∈ J (θ, hn ) if and only if the column of θ corresponding to hn ( j) sums to zero. Thus, |I(θ, hn ) ∪ J (θ, hn )| = Jkn for all matrices θ ∈ (κ, hn ). This expression, like other hypothesis-dependent expressions, is messy because the notation must accommodate hypotheses that share individual Bernoulli processes.

5.3.7 MHT-Style Hypotheses It was mentioned in Sect. 5.3.1 that the hypotheses depicted in Fig. 5.1 are not MHTstyle hypotheses. In this subsection, the description of MHT hypotheses as “triplets” is made explicit. The sum (5.56) is a probabilistic combination of conditional GFLs. Each GFL is conditioned on a unique event (hn , κ, θ ). These events are referred to as AC triplets or triples. The set of feasible triplets {(hn , κ, θ )} partitions the feasible event space. Each variable in a triplet plays a specific role. Each MHT-style hypothesis specifies a particular set of assignments of measurements to objects. The set of MHT-style hypotheses also partitions the feasible event space. A detailed discussion is outside the scope of the book, but it is clear that the set of MHT-style hypotheses correspond one-to-one to the set of AC triplets {(hn , κ, θ )}. Said differently, every MHT-style hypothesis is an AC triplet (hn , κ, θ ), and vice versa, and (5.57) is its probability.

5.3.8 GF of Aggregate Object Number—MBM Filter The MBM posterior GF of the number of objects is given by evaluating (5.56) and (5.58) for h(x) = z. The result is a probabilistically weighted sum of the GFs  of object number for the hypotheses. The conditional GFL, kMB h | yk , hn , κ, θ |h=z ,

130

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

is a polynomial in z of degree at most Jkn . The probabilistically weighted sum over all hypotheses is, therefore, a polynomial of degree at most maxn {Jkn }, a result that accords well with intuition. Further details are omitted because they are straightforward extensions of what is done for the MB filter in (5.34).

5.3.9 Intensity of the MBM Posterior The intensity function is conveniently stated using the expansion (5.56) and the expression (5.58). The Bernoulli processes in (5.58) are independent; hence the intensity of their superposition is the sum of their intensities. Let x ∈ X. Then

MB

Ik



x | yk , hn , κ, θ +



  h ( j)− hn ( j)− h ( j) μk (x) 1− Pdk n (x) =   hn ( j)− h ( j)−  hn ( j)− h ( j) + χk n (x) ˜ 1 − Pdk n (x) ˜ dx˜ j∈J (θ,hn ) 1−χk X μk

χk n

h ( j)−





j∈I (θ,hn )

μk n

h ( j)

h ( j) 

(x) Pdk n

(x) pk n

Pdk n

(x) ˜ pk n

hn ( j)− (x) ˜ X μk

h ( j)

h ( j) 

ym θ (hn ( j)) | x



.  ym θ (hn ( j)) | x˜ dx˜

(5.59)

Except for conditioning, this expression is the same as (5.32). Using this expression and (5.56) gives H

MBM

Ik

(x | yk ) =



Nk (hn ) N n=1



  Pr k {hn , κ, θ | yk } IkMB x | yk , hn , κ, θ .

(5.60)

κ=0 θ∈ (κ,hn )

5.3.10 Closing the Bayesian Recursion for MBM Filters Mixture reduction is typically performed by pruning, capping, and merging various parts of the posterior MBM. The set of all posterior hypotheses, or AC triplets, {(hn , κ, θ )}, is, in general, much larger than the number of predicted hypotheses. If the exact posterior MBM (5.56) is larger than the hypothesis list size constraint NHmax , or if any of the MB components in the mixture is larger than the maximum size Jmax , then mixture reduction methods are needed to find an MBM that satisfies the constraints and, thereby, closes the Bayesian recursion. The first step involves capping the number of hypotheses so that at most NHmax are retained. In simpler problems this can be done by thresholding the hypothesis probabilities. After capping, the mixture probabilities are renormalized. Similarly, the Bernoulli processes in each MB are thresholded based on the exish ( j) tence probabilities χk n , so that at most Jmax are retained. The same Bernoulli process can be purged in one MB process and retained in another.

5.3 Multi-Bernoulli Mixture (MBM) Filter

131

Simple “greedy” style closure procedures have serious deficiencies. In difficult problems they are inefficient and will often fail. Problems arise even in fairly simple problems. As indicated in the JIPDAS example in Sect. 5.7 below, greedy methods cannot be used without care. There are many varieties of these kinds of methods, but they are outside the scope of this book. Numerical values of the intensity for (5.56) as given by (5.60) are easily computed by the complex step method; see Appendix C, Sect. C.5, as well as Sect. 4.3.5. Alternatively, the symbolic derivative can also be derived and the derivative evaluated numerically. Object state estimates can be extracted from the intensity function using any of several heuristics that range from simple to sophisticated. As in earlier chapters, track extraction is outside the scope of the book. Because of superposition, the Bernoulli processes in an MBM are unidentifiable, so there is no obvious way to associate a Bernoulli process at scan k with an earlier process at scan k − 1. Mixture reduction methods are, therefore, applicable only to an MBM at a single scan k. To use them for multiple scans it is necessary to make the Bernoulli processes identifiable. Further details on mixture reduction methods applied to MBM filters, as well as track extraction methods for estimating specific object states and a measure of estimation error, are given in various papers; see, e.g., [6, 8, 9].

5.4 Labeled MBM Filter Object superposition can significantly reduce computational complexity in many problems, but it always comes with a price—information loss, as discussed in Chap. 4, Sect. 4.3.1. In particular, superposition makes it difficult to estimate the state of individual objects and to connect these estimates from one scan to the next to form object tracks, or trajectories, over multiple scans. There are two ways to study trajectory problems using AC methods. One is to label the Bernoulli processes and carry the labels forward from one scan to the next, retaining the labels and using them to link object state estimates over multiple scans. It is studied by a number of researchers [8, 10–12]. The other approach derives the trajectory GFL for one object over a sequence of scans, and then superposes the object trajectories to obtain the multiobject trajectory GFL. Scan to scan object linkage is established before superposition, thereby reducing the potential utility of labels. The exact Bayesian interval smoothing and lagged filters are derived from the trajectory GFL. These multiobject trajectory filters are analogs of the single-object RTS smoothing filters [13]. The trajectory GFL approach is studied in [14]. It is the subject of ongoing work and will be presented elsewhere.

132

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters

5.4.1 Labels in Analytic Combinatorics Loosely speaking, for the tracking problems considered in this chapter, labels retain information that is lost by superposing object states. In more technical AC terms, labels are the indeterminates of missing (latent) variables. Definitions. Let G N (z) be the GF of a nonnegative random integer N , where z is the indeterminate. The random nonnegative integer L is said to label N if the bivariate GF of N and L satisfies (5.61) GNL (z, ) = GN (z ) , where is the indeterminate variable for L. The label space is the range of L, i.e., a subset of the integers {0, 1, . . .}. The right-hand side of (5.61) encodes the fact that Pr{L = N } = 1; that is,   1 dn+m G (z,

) Pr{N = n, L = m} = n!m!  N L n m dz d z= =0  Pr{N = n} if n = m = (5.62) 0...... otherwise. The GFL of a labeled cluster point process with PDF p(x) on X is defined by (h, ) ≡ GNL

 X

    h(x) p(x) dx, = GN X h(x) p(x) dx ,

(5.63)

where h is the indeterminate function. Example: Labeled generating function. A random integer N is the sum of three variates, two Bernoulli and one Poisson, whose GFs are, respectively, G B (z) = 15 + 4 z and G P (z) = e−3+3z . Assuming independence, the GF of N is G N (z) = ( 51 + 5 4 2 −3+3z z) e . To find the combinations of the Bernoulli and Poisson variates that give, 5 say, N = 3, make the GFs into bivariate GFs by labeling them (z, j ) = GBlabel j

+ 45 j z , j = 1, 2,   GPlabel (z, 3 ) = exp −3 + 3 3 z , 1 5

where j is the indeterminate for the random nonnegative integer L j , j = 1, 2, 3. Let = ( 1 , 2 , 3 ) and L = (L 1 , L 2 , L 3 ). As a check, note that the GFs are equal to one when = 1 ≡ (1, 1, 1) and z = 1. The labeled GFs are identifiable, including the otherwise indistinguishable Bernoulli variables. The GF of N labeled by L is G NL (z, ) =

1 5

+ 45 1 z

 1 5

 + 45 2 z e−3+3 3 z .

(5.64)

The coefficient of z 3 in the series expansion about zero is [z 3 ] G NL (z, ) =

 96

50 1 2 3

+

36

2 50 1 3

+

36

2 50 2 3

+

9 3

50 3



e−3 .

(5.65)

5.4 Labeled MBM Filter

133

Setting = 1 gives Pr{N = 3} = [z 3 ] G NL (z, 1) = 177e−3 /50 ≈ 0.176. The four monomial terms in (5.65) identify the four ways that N can equal three. For example,

2 23 = 01 12 23 labels the event in which the first Bernoulli variate is zero, the second is 36 −3 one, and the Poisson variate is two; moreover, the coefficient 50 e of the monomial is the joint probability of that event. The GF of L conditioned on N = 3 is the ratio (see (A.28)) G L|N =3 ( ) =

[z 3 ] G NL (z, ) = [z 3 ] G NL (z, 1)

96

177 1 2 3

+

36

2 177 1 3

+

36

2 177 2 3

+

9 3

. 177 3

The conditional probability that the Poisson variate equals 3 and the Bernoulli variates 9 equal 0 is 177 ≈ 0.051. The monomials identify the combinations that comprise the event N = 3, and the coefficients are the probabilities of the combinations. Example: Labeled Bernoulli point processes. Let χ denote the probability that an object exists, so that G N (z) = 1 − χ + χ z. The labeled Bernoulli point process is   LBernoulli (h, ) = 1 − χ + χ X h(x) p(x) dx ,

(5.66)

where p(x) is the PDF of the process and is the indeterminate for a random integer L that takes values in the space {0, 1}. If there are N ≥ 1 Bernoulli processes with existence probabilities χn and PDFs pn (x), each process gets it own label  LMultipleB (h, ) =

N    1 − χn + χn n X h(x) pn (x) dx , n=1

(5.67)

where = ( 1 , . . . , N ) is an indeterminate label vector. As in the previous example, the labels are distinct even if the Bernoulli processes are identical.

5.4.2 GFL of the LMBM Filter Labeling a Bernoulli process is very simple in principle; merely replace the indeterminate h by the product h. Labeling an MB process is just as easy, although a different label is needed for each Bernoulli. When labeling an MBM process, however, it is necessary to ensure that every Bernoulli in the mixture is labeled once and only once. The notation of (5.48) and (5.49) uniquely identifies every Bernoulli appearing in the hypothesis list, so the simple labeling method works. h ( j) Let k n denote the label/indeterminate for the Bernoulli process indexed by hn ( j) in scan k. If the same Bernoulli process appears in different hypotheses, then it is assigned the same label, as is easily verified. The vector of indeterminates for hypothesis hn is   h (J n ) h h (1) (5.68)

k n = k n , . . . , k n k .

134

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters h

The labels in k n are necessarily distinct, but (to emphasize the point) the same label can appear in the label vector of a different hypothesis. Let   h

k = k n : n = 1, . . . , NkH

(5.69)

denote the list of association hypothesis vector labels. The labeled version of the GFL of the MBM is, using (5.48) and (5.67), kLMBM (h, k , g) NkH

= kC (g)



(5.70) Pr − k {hn }

n=1

  hn ( j)  h ( j)− h ( j)− BMD(hn ( j)) 1−χk n

k h, g . + χk n k Jkn

j=1

This GFL uniquely labels every Bernoulli in the MBM regardless of how many hypotheses it is in. The labeled version of the GFL of the Bayes posterior process can be derived de novo from this GFL. It is more simply found from (5.56) by h ( j) replacing h with the appropriately labeled product k n h, that is, H

k

LMBM

(h, k | yk ) =



Nk N (hn ) n=1

  h Pr k {hn , κ, θ | yk } kLMB h, k n | yk , hn , κ, θ ,

κ=0 θ∈ (κ,hn )

(5.71) where the conditional GFL of the LMB (labeled MB) filter is, using (5.58),   h kLMB h, k n | yk , hn , κ, θ =

    1−χkhn ( j)− + χkhn ( j)− hk n ( j) h(x)μhk n ( j)− (x) 1− Pdkhn ( j) (x) dx X   h ( j)− h ( j)−  hn ( j)− h ( j) 1−χk n + χk n (x) 1 − Pdk n (x) dx j∈J (θ) X μk     hk n ( j) h(x)μhk n ( j)− (x) Pdkhn ( j) (x) pkhn ( j)− ym θ (hn ( j)) | x dx X . (5.72) ×  hn ( j)−  h ( j) h ( j)−  ym θ (hn ( j)) | x dx (x) Pdk n (x) pk n j∈I(θ) X μk

The GFL (5.71) is referred to as the hypothesis-oriented LMBM (HO/LMBM) because it accounts for all (feasible) assignments in every hypothesis. Let 1agg be the list of vectors of all ones that is matched to the dimensions of the hypothesis list Hk . As sanity checks, note that kLMBM (h, 1agg | yk ) = kMBM (h | yk ) and that kLMBM (1, 1agg | yk ) = 1. Perhaps the most important feature of the GFL (5.71) is that it is a polynomial of degree at most one in each label. It inherits this property from (5.72) since it, too, is a polynomial of degree at most one in any given label.

5.4 Labeled MBM Filter

135

5.4.3 Track-Oriented LMBM and Closing the Bayesian Recursion The LMBM Bayes recursion can be closed in the same manner as done in Sect. 5.3.10 for the unlabeled MBM filter; however, this procedure is unappealing now because it does not exploit the labels. Now that labels are available, it is possible to isolate the terms that contribute to each of the Bernoulli processes. This makes it possible to track individual objects by exploiting their labels. All that is necessary, in principle, is that objects retain their labels as they transition from scan k − 1 to scan k, and any new objects that are born are assigned labels that are different from the labels of objects that already exist. (It is also possible, given exogenous information, to assign the label of an existing object to a new object.) Said differently, labels are “bundled” with objects but they do not undergo Markovian transition. The GFL of what is termed the track-oriented LMBM (TO/LMBM) filter for a specified object of interest is found by marginalizing over all other object labels. Let

* denote the label of the object of interest, and let 1agg ( *) denote the list of vectors of labels that are all equal to one except for the label *. The coefficient of * in the marginal GFL for object * is the Bayes updated existence probability for object *,  

χk * = * kLMBM 1, 1agg ( *) | yk .

(5.73)

Similarly, the Bayes updated GFL for the state of object *, conditional on its existence, is 

LMBM  h, 1agg ( *) | yk

* k TO/LMBM . k (h | yk , *) = LMBM  agg (5.74)

* k 1, 1 ( *) | yk The denominator is identical to (5.73). The numerator comprises only those hypotheses that include the Bernoulli process *; hypotheses that do not include it are not part of the coefficient of *. The track-oriented GFL clearly has far fewer terms than the full expression. The PDF for object *, conditional on its existence, is the derivative of the secular function. Explicitly, let x ∈ X, substitute h(·) = αδx (·), and differentiate: μ * k (x) =

 d TO/LMBM  k (αδx | yk , *) . α=0 dα

(5.75)

The use of the marginal distribution to close the Bayesian recursion is reminiscent of the mean field method used in the classic PDA filter (see [7]). The final step in closing the recursion is to update the hypothesis probabilities. Let 1agg (hn ) denote the list of vectors of labels that are all equal to one except for the set of labels in the hypothesis hn . Define the monomials h

h (1)

k n ≡ k n

h (Jkn )

· · · k n

, n = 1, . . . , NkH .

(5.76)

136

Then

5 Multi-Bernoulli Mixture and Multiple Hypothesis Tracking Filters



hn LMBM  agg 1, 1 (hn ) | yk k k Pr{hn | yk } =  H   Nk hν LMBM 1, 1agg (hν ) | yk ν=1 k k

(5.77)

is the (marginal) Bayes updated probability for each predicted hypothesis hn in the list (5.41). Applying (5.57) gives the full posterior PMF over the set of all posterior hypothesis triples {(hn , κ, θ )}. This set can be reduced, for example, by retaining the NHmax highest probability hypotheses and renormalizing accordingly. Practical implementations of MBM, LMBM, and MHT filters use prune/cap/ merge/split methods to retain a limited number of “high quality” associations. There is no one-size-fits-all way to do this and, as a result, there are as many filters as there are implementations. These techniques are often quite sophisticated, but they fall into the general category of combinatorial optimization and are outside the scope of this book.

5.5 Multiple Hypothesis Tracking (MHT) Filter The classical MHT approach to multiple object tracking dates to 1979 [15]. Conceptually, exact MHT enumerates all possible associations of measurements to objects over several scans. The goal is to estimate connected object trajectories (linked tracks) over time. This is the same goal as that of LMBM filters. In this section, however, the ability of LMBM and MHT filters to estimate object trajectories is severely curtailed by restricting the discussion to one scan only. The obvious distinction between the MBM and MHT filters is that MBM superposes object states and MHT does not. The LMBM filter goes a step further by labeling objects. This raises important questions about the relationship of MHT to these filters. The purpose of this section is to use AC to give convincing answers to these questions, and to do so in a relatively painless and straightforward way using the joint GFLs from which the filters are derived. As pointed out above, hypotheses are subsets of objects. Thus, MHT and MBM filters begin with the same hypotheses Hk . Markovian transition, object birth, and hypothesis augmentation can be performed in the same way for both filters, although MHT allows the Bernoulli processes to have different state spaces and, hence, different indeterminate functions. The predicted process for MHT is a modified version of (5.41). Recall that hypothesis hn ∈ Hk comprises the indices    hn = hn (1), . . . , hn Jkn .

(5.78)

The parameter list for the predicted MHT process is     h ( j)− h ( j)− hn ( j)  Pr − xk : j = 1, . . . , Jkn : n = 1, . . . , NkH , χk n , μk n k {hn }, (5.79)

5.5 Multiple Hypothesis Tracking (MHT) Filter

137

h ( j)

where xk n ∈ Xhn ( j) . Designate the indeterminate function for the Bernoulli process with index hn ( j) as h hn ( j) , and denote the set of all such indeterminate functions for processes in the hypothesis list by h Hk . The joint GFL of the MHT filter is kMHT (h Hk , g)

(5.80) NkH

= kC (g)

n=1

Pr − k {hn }

 Jkn

h ( j)−

1−χk n

h ( j)−

+ χk n

BMD(hn ( j))

k



h hn ( j) , g



.

j=1

MHT filters specify and update the list (5.79), but do not explicitly formulate the GFL of the underlying discrete-continuous distribution. The GFL (5.80) is justified by noticing that it is a probabilistic sum of the GFLs of JIPDA filters, one for each of the NkH hypothesis. To see this, apply (3.33)–(3.34) to the set of Bernoullis that comprise one hypothesis, say hn . Doing this for each hypothesis establishes the result. The set of all such JIPDA hypotheses is identical to the set of MHT-style hypotheses, as discussed in Sect. 5.3.7. Further details are omitted. If the objects have the same state space, so that they can be superposed, setting all the indeterminates equal to h reduces it, by inspection, to (5.48). Thus, the MBM filter is identical to the MHT filter with superposition. Note that the GFL of the clutter process is assumed to be the same in both filters, but it need not be Poisson. Comparing (5.80) to (5.70) is revealing. If the state spaces are the same, then the mapping h ( j) (5.81) h hn ( j) ←→ k n h is one-to-one. Consequently, for one scan update, there is a one-to-one correspondence between the terms in the GFLs of MHT and LMBM. The role of labels as a kind of indicator function, first noted in (5.62), is thus inherited by the LMBM filter. To all intents and purposes, the MHT and LMBM filters are identical for one scan when the state spaces are identical. The clutter process is the same in both filters, but it need not be Poisson. MHT filters allow objects to have different state spaces, a capability that is sometimes needed if objects are maneuvering. LMBM filters cannot do the same because superposition requires object state spaces to be identical. If the LMBM filter can be modified in some way to allow it to cope with different spaces, it would then be mathematically equivalent to MHT. Similar statements may pertain to comparisons between MHT over multiple scans and labeled trajectory estimation problems. This is an interesting and much more complicated problem. As mentioned above at the beginning of this section, the topic is outside the scope of the present book. The different conceptualizations of MHT and LMBM filters may facilitate, with more or less ease, the development of new and improved methods (e.g., merging/splitting/etc.) for revising the AC triplet hypothesis list to close the Bayesian recursion. Moreover, the GFL formulation of MHT gives an alternative understanding of the problem and, hence, may possibly lead to improved approximations.


5.6 Conjugate Families

In traditional Bayesian statistics, the prior and posterior probability distributions are conjugate if both are in the same family of distributions. As applied to parameterized distributions, e.g., the exponential family, it requires the prior and posterior to have exactly the same number of parameters. The Bayesian recursion called the linear-Gaussian Kalman filter is closed because the family of Gaussian distributions is a conjugate family in this strict sense. The concept of a "multiobject conjugate family" is a relaxed version of conjugacy that allows the prior and posterior to be of different sizes, as long as they are in the same family. As proved in the previous subsection, MBM processes are a conjugate family under this more relaxed definition since, if the prior is an MBM process, the predicted and Bayes posterior processes are also MBM processes. The family of PMBM processes is another example of a multiobject conjugate family [16]. The concept of multiobject conjugate families shines a much-needed light into the tedious algebraic complexity of MBM filters, but it is nonetheless not as powerful as the classical concept. It is seen by inspection of (5.56) that the size of the posterior MBM process is larger than the size of the prior MBM, so the number of parameters needed to specify the exact MBM filter grows with each step of the recursion. Although the posterior MBM process has an exact closed algebraic form, the recursion is not closed in the classical Bayesian sense.
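To make the growth concrete, the following back-of-the-envelope sketch (ours, not from the text) counts the assignment hypotheses that a single MB component spawns in one measurement update under the usual at-most-one-measurement-per-object model; this count is the factor by which the exact MBM mixture multiplies at each scan before any approximation is applied.

    from math import comb, factorial

    def num_assignments(num_objects: int, num_meas: int) -> int:
        """Feasible assignments when each object generates at most one measurement,
        each measurement is assigned to at most one object, and leftovers are clutter."""
        return sum(comb(num_objects, j) * comb(num_meas, j) * factorial(j)
                   for j in range(min(num_objects, num_meas) + 1))

    components = 1
    for scan in range(1, 4):                 # 4 Bernoulli objects, 5 measurements per scan
        components *= num_assignments(4, 5)
        print(f"scan {scan}: exact MBM components = {components}")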

5.7 Numerical Example: JIPDAS Filter

A simulated scenario to illustrate the JIPDAS filter is presented in this section. The tracking filter is implemented using a four-component, unlabeled, "bare-bones" Gaussian MB model with no object birth. The posterior at each scan is calculated analytically; particle filters are not employed in this section. The Bayes recursion for the JIPDAS filter is closed with a garden-variety greedy-style method. These kinds of methods are discussed briefly in Sect. 5.3.10. The basic steps taken here to ensure stability in closing the Bayesian recursion for JIPDAS are outlined below. "Industrial strength" methods are required for difficult problems, but they are outside the scope of this book.

Simulated scenario. In the simulated scenario, four point objects move in continuous time at constant velocities. Each object $n \in \{1, 2, 3, 4\}$ "exists" at all times in the closed interval $[0, T_n]$ seconds; the values for $T_n$ are the colored numbers in the upper subplot of Fig. 5.2. Object n does not exist at any time $t > T_n$. Nonexistence in this case is equivalent to the fact that the object no longer induces sensor measurements. All object motion is confined to a 2D spatial region of interest $R = [-2000, 2000] \times [-1000, 1000] \subset \mathbb{R}^2$. Elements of the (superposed) object state space $\mathcal{X}$ and measurement space $\mathcal{Y}$ will be represented using boldface text (e.g., $\mathbf{x}$ and $\mathbf{y}$) so that they


will not be confused with their 2D spatial components, x and y. All spatial units are in meters, and all time units are in seconds. The state space $\mathcal{X} \subset \mathbb{R}^4$ comprises position and velocity components; an element $\mathbf{x} \in \mathcal{X}$ is represented in coordinate form as $\mathbf{x} = (x\ \dot{x}\ y\ \dot{y})^T$. A sensor provides spatial x-y measurements at 1 s intervals starting at time $t_1 = 1$ and ending at time $t_K = t_{240} = 240$; that is, $\Delta t = 1$. The measurement space $\mathcal{Y}$ is a subset of $\mathbb{R}^2$, and an element $\mathbf{y} \in \mathcal{Y}$ is represented by the 2D spatial vector $\mathbf{y} = (x\ y)^T$. At each scan k, if object n exists in state $\mathbf{x} \in \mathcal{X}$, it induces a sensor measurement $\mathbf{y}_k \in \mathcal{Y}$ with constant probability $P_{d\,k}^n(\mathbf{x}) \equiv p_d = 0.9$. Given that the object is detected, a measurement $\mathbf{y}_k$ is generated according to the linear-Gaussian PDF $p_k^n(\mathbf{y}_k \mid \mathbf{x}) = \mathcal{N}(\mathbf{y}_k; H\mathbf{x}, R)$, where

$$H_k^n \equiv H_k \equiv H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad\text{and}\quad R_k^n \equiv R_k \equiv R = \sigma_M^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{5.82}$$

with $\sigma_M = 40$. Simulated clutter measurements are realizations of a homogeneous PPP with uniform clutter intensity over $R \subseteq \mathcal{Y}$. Specifically, the mean number of clutter measurements $\lambda_k^c \equiv \lambda^c = 50$ is constant over all scans, and the clutter PDF is $p_k^c(\mathbf{y}) \equiv \frac{1}{\mathrm{Vol}(R)} = 1.25 \times 10^{-7}$ for $\mathbf{y} \in R$ and $k = 1, \ldots, 240$. Thus, in each scan, there is an average of approximately 0.28 clutter measurements in a $3\sigma_M$ measurement window.

JIPDAS filter. The prior at scan k is a four-component Gaussian MB of the form

$$\mathcal{M}_{k-1} = \left\{ \left( \chi_{k-1}^n,\ \mathcal{N}(\mathbf{x};\, \mu_{k-1}^n,\, \Sigma_{k-1}^n) \right) \right\}_{n=1,2,3,4}. \tag{5.83}$$

Notice that the PDFs are single Gaussians, not mixtures. For k = 1, the initial probabilities of existence $\chi_0^n$ are each set to 0.5, while the initial Gaussian parameters are set to $\mu_0^n = \mathbf{x}_0^n$ and $\Sigma_0^n = \mathrm{diag}(100^2, 3^2, 100^2, 3^2)$, where $\mathbf{x}_0^n$ is the (ground truth) location of object n at reference time $t_0 = 0$. Linear-Gaussian assumptions are adopted for object motion. Equation (2.8) becomes (2.52) for n = 1, 2, 3, 4. The object motion model is assumed stationary, so that $F_k \equiv F$ and $Q_k \equiv Q$ are constant over all scans, where

$$F = \begin{pmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad Q = \sigma_p^2 \begin{pmatrix} \frac{\Delta t^3}{3} & \frac{\Delta t^2}{2} & 0 & 0 \\ \frac{\Delta t^2}{2} & \Delta t & 0 & 0 \\ 0 & 0 & \frac{\Delta t^3}{3} & \frac{\Delta t^2}{2} \\ 0 & 0 & \frac{\Delta t^2}{2} & \Delta t \end{pmatrix}$$

with $\sigma_p = 1.5$. Tracking filter parameters $\lambda^c$, $p_d$, and $\sigma_M$ are matched to those of the simulation. For each object n and scan k, the filter survival probability is set to a constant value of $\rho_k(\mathbf{x}) \equiv \rho_s = 0.99$ for all $\mathbf{x} \in \mathcal{X}$. Thus, the predicted object existence probability is given by
$$\chi_k^{n-} = \rho_s\, \chi_{k-1}^n. \tag{5.84}$$
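For readers who want to see the bookkeeping, the fragment below implements the prediction step just described for one Bernoulli component. It is a minimal sketch of Eqs. (2.52) and (5.84) under the stated constant-velocity model, assuming numpy; the initial values are illustrative, not taken from the scenario.

    import numpy as np

    dt, sigma_p, rho_s = 1.0, 1.5, 0.99

    # Constant-velocity transition and process-noise matrices (state = [x, xdot, y, ydot]).
    F = np.array([[1, dt, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, dt],
                  [0, 0, 0, 1]], dtype=float)
    q = np.array([[dt**3 / 3, dt**2 / 2],
                  [dt**2 / 2, dt]])
    Q = sigma_p**2 * np.block([[q, np.zeros((2, 2))],
                               [np.zeros((2, 2)), q]])

    def predict(chi, mu, Sigma):
        """Predict one Bernoulli component: the existence probability decays by rho_s,
        and the Gaussian moments propagate through the linear-Gaussian motion model."""
        return rho_s * chi, F @ mu, F @ Sigma @ F.T + Q

    chi0 = 0.5
    mu0 = np.array([-1500.0, 3.0, 500.0, -2.0])            # illustrative initial mean
    Sigma0 = np.diag([100.0**2, 3.0**2, 100.0**2, 3.0**2])
    chi1, mu1, Sigma1 = predict(chi0, mu0, Sigma0)
    print(chi1, mu1[:2])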


There is an important practical reason for setting $\rho_s$ to a value strictly less than 1. Indeed, if $\rho_s = 1$ and $\chi_{k-1}^n = 1$ for some k and n, then $\chi_k^{n-} = \rho_s \chi_{k-1}^n = \chi_{k-1}^n = 1$. Inspection of Eq. (5.28) then shows that $\chi_k^n = 1$ regardless of the assignment $(\kappa, \theta)$, and regardless of whether $n \in I(\theta)$ or $n \in J(\theta)$.¹ This simple fact, in conjunction with the greedy approach to closing the Bayesian recursion (see next paragraph), implies that once any measurement is associated in the posterior to any object n at any scan k, then $\chi_{k'}^n = 1$ for all scans $k' \geq k$.

The exact JIPDAS posterior is an MBM; see Sect. 5.2.7. Closing the Bayesian recursion reduces the posterior to the (single) multi-Bernoulli form (5.83). This is done by employing a greedy strategy of the kind outlined in the final paragraph of Sect. 5.2.12. No approximations are used; the posterior weights in (5.27) are computed for every feasible $(\kappa, \theta)$ pair. The MB corresponding to the pair with the largest weight $w(\kappa, \theta \mid y_k)$ is selected as the posterior (with a minor modification to the probabilities of existence, described in the next paragraph). Two important points regarding this strategy need to be made. First, calculating the weights for every feasible assignment pair $(\kappa, \theta)$ is impractical for higher clutter rates and/or more potential objects. Second, a myopic approach that ignores all but one feasible object-measurement assignment is doomed to failure in complicated scenarios. In the relatively low clutter environment of this example with N = 4 Bernoulli objects, it works.

Once the posterior MB is selected, one more modification is used. The PDFs in the posterior are Gaussians; indeed, for fixed $(\kappa, \theta)$, measurement updates are the Kalman updates given by Eqs. (2.54) and (2.56). The final step in closing the recursion is to replace the posterior probability of existence for each n with
$$\bar{\chi}_k^n = \max\left(\chi_{\min},\ \chi_k^n\right), \tag{5.85}$$

where the threshold $\chi_{\min} = 0.1$. What is the purpose of thresholding below at $\chi_{\min}$? Before thresholding, if no measurement is associated to object n in the assignment $(\kappa, \theta)$, then from (5.28) and (5.84), the posterior probability of existence is given by

$$\chi_k^n = \frac{(1 - p_d)\,\rho_s}{1 - p_d\, \rho_s\, \chi_{k-1}^n}\, \chi_{k-1}^n .$$

Assume $\chi_{k-1}^n$ is close to 0. Then the denominator in the above expression is approximately 1, and $\chi_k^n \approx (1 - p_d)\rho_s\, \chi_{k-1}^n = 0.099\, \chi_{k-1}^n$. A series of scans in which greedy hypothesis selection assigns no measurement to object n sends $\chi_k^n$ rapidly toward 0 as the iteration number k increases. The recursion falls into a hole from which simple greedy strategies are unable to escape. The greedy method chooses the single best $(\kappa, \theta)$. This assignment maximizes the weight given in (5.27), but if $\chi_k^{n-} = \rho_s \chi_{k-1}^n$ is close to 0, then assigning a measurement to object n (i.e., $n \in I(\theta)$) will make the product in (5.25) nearly 0. As a result, it is highly unlikely that the optimal $\theta$ assigns a measurement to n.

¹ For a fixed assignment $(\kappa, \theta)$, $n \in I(\theta)$ always implies a unity posterior probability of existence.
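The "hole" described above, and the effect of the $\chi_{\min}$ floor of (5.85), are easy to see numerically. The toy loop below (ours) simply iterates the missed-detection update with the quoted parameter values, with and without the floor.

    pd, rho_s, chi_min = 0.9, 0.99, 0.1

    def missed_detection_update(chi_prev: float) -> float:
        """Posterior existence probability when no measurement is assigned to the object."""
        chi_pred = rho_s * chi_prev
        return (1.0 - pd) * chi_pred / (1.0 - pd * chi_pred)

    chi_raw, chi_floored = 0.5, 0.5
    for scan in range(1, 6):
        chi_raw = missed_detection_update(chi_raw)
        chi_floored = max(chi_min, missed_detection_update(chi_floored))
        print(f"scan {scan}: raw = {chi_raw:.5f}, floored = {chi_floored:.5f}")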


[Fig. 5.2 (Top) – JIPDAS tracker output. pd = 0.9, priors are single Gaussians, not mixtures, posterior at each scan is calculated analytically, i.e., no particle filter is employed. The filter survival probability is set constant at 0.99. (Bottom) – Heat map to illustrate the tracker uncertainty when an object ceases to exist. Both panels plot y (km) against x (km); the top panel's legend distinguishes ground truth tracks from posterior means.]

After closing the recursion, the JIPDAS posterior is of the same form as (5.83), namely,

$$\mathcal{M}_k = \left\{ \left( \bar{\chi}_k^n,\ \mathcal{N}(\mathbf{x};\, \mu_k^n,\, \Sigma_k^n) \right) \right\}_{n=1,2,3,4}. \tag{5.86}$$

Figures. Figure 5.2 comprises two subplots. The upper subplot is the JIPDAS tracker output. Ground truth object position in x-y space is given by the four dashed lines, and the gray dots are the superposed clutter realizations over the last five scans (clutter realizations are independent from scan to scan). The gray circles represent the 2D spatial components of the four posterior mean estimates $\{\mu_k^n\}$ for each scan k; gray


[Fig. 5.3 JIPDAS filter estimate of the number of objects. The plot shows the actual and estimated object number (0–4) versus scan.]

circles are displayed if and only if the estimated probability of existence exceeds 0.5. The colored numbers represent the final scans $\{T_n\}$ at which the objects existed, and the corresponding colored circles represent the (ground truth) 2D spatial locations of the objects at their final scans. For intuition, the brown dashed circle depicts a $3\sigma_M$ measurement window. The lower subplot of Fig. 5.2 is the corresponding JIPDAS heat map. The extended 2D spatial region of interest $R = [-2200, 2200] \times [-1100, 1100]$ is divided into $800 \times 400$ grid cells. After each scan k, a collection of "particles" is drawn from the posterior $\mathcal{M}_k$ given in (5.86). The number drawn from each MB component n is directly proportional to its (thresholded) probability of existence $\bar{\chi}_k^n$. The number of particles falling into each grid cell is calculated. These values are then summed over all 240 scans. The heat map is displayed on a log scale in order to accentuate the tracker uncertainty when an object ceases to exist. The resulting cloudy-looking "particle fans" are reminiscent of and related to Fig. 3.1 in Chap. 3. (Smaller values of $\bar{\chi}_k^n$ reduce the visibility of the fans.) Figure 5.3 gives the posterior JIPDAS filter estimate of the number of objects before applying the threshold in (5.85). The expected number of objects is given simply by the sum of the four pre-thresholded values $\chi_k^n$.
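Returning to the heat map in the lower panel of Fig. 5.2, a compressed sketch of its construction is given below. It is our paraphrase of the procedure just described, assuming numpy; the grid is reduced and only one component and one scan are shown to keep it short.

    import numpy as np

    rng = np.random.default_rng(0)
    nx, ny = 200, 100                      # reduced grid for illustration (the example uses 800 x 400)
    counts = np.zeros((ny, nx))
    x_edges = np.linspace(-2200, 2200, nx + 1)
    y_edges = np.linspace(-1100, 1100, ny + 1)

    def accumulate(chi_bar, mu, Sigma, n_base=2000):
        """Draw particles from one Gaussian MB component, in proportion to its
        (thresholded) existence probability, and bin their positions."""
        n = int(round(n_base * chi_bar))
        pts = rng.multivariate_normal(mu, Sigma, size=n)   # state samples [x, xdot, y, ydot]
        h, _, _ = np.histogram2d(pts[:, 2], pts[:, 0], bins=[y_edges, x_edges])
        counts[:] += h

    # One illustrative scan and component; in the example this is repeated for all
    # four components and summed over all 240 scans.
    accumulate(chi_bar=0.1,
               mu=np.array([800.0, 3.0, -300.0, -1.0]),
               Sigma=np.diag([150.0**2, 3.0**2, 150.0**2, 3.0**2]))
    log_heat = np.log1p(counts)            # displayed on a log scale, as in Fig. 5.2 (bottom)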


References

1. Ba-Tuong Vo and Ba-Ngu Vo. Labeled random finite sets and multi-object conjugate priors. IEEE Transactions on Signal Processing, 61(13):3460–3475, 2013.
2. Karl Granström, Lennart Svensson, Yuxuan Xia, Jason Williams, and Ángel F. García-Fernández. Poisson multi-Bernoulli mixtures for sets of trajectories. arXiv preprint arXiv:1912.08718, 2019.
3. Yuxuan Xia, Karl Granström, Lennart Svensson, Ángel F. García-Fernández, and Jason L. Williams. Multi-scan implementation of the trajectory Poisson multi-Bernoulli mixture filter. arXiv preprint arXiv:1912.01748, 2019.
4. Stephen M. Samuels. On the number of successes in independent trials. The Annals of Mathematical Statistics, 36(4):1272–1278, 1965.
5. Ronald P. S. Mahler. Statistical Multisource-Multitarget Information Fusion. Artech House, Inc., 2007.
6. Ba-Tuong Vo, Ba-Ngu Vo, and Antonio Cantoni. The cardinality balanced multi-target multi-Bernoulli filter and its implementations. IEEE Transactions on Signal Processing, 57(2):409–423, 2008.
7. Jason L. Williams. Marginal multi-Bernoulli filters: RFS derivation of MHT, JIPDA, and association-based MeMBer. IEEE Transactions on Aerospace and Electronic Systems, 51(3):1664–1687, 2015.
8. Ángel F. García-Fernández, Jason L. Williams, Karl Granström, and Lennart Svensson. Poisson multi-Bernoulli mixture filter: direct derivation and implementation. IEEE Transactions on Aerospace and Electronic Systems, 54(4):1883–1901, 2018.
9. Yuxuan Xia, Karl Granström, Lennart Svensson, and Ángel F. García-Fernández. An implementation of the Poisson multi-Bernoulli mixture trajectory filter via dual decomposition. In 2018 21st International Conference on Information Fusion (FUSION), pages 1–8, 2018.
10. Stephan Reuter, Ba-Tuong Vo, Ba-Ngu Vo, and Klaus Dietmayer. The labeled multi-Bernoulli filter. IEEE Transactions on Signal Processing, 62(12):3246–3260, 2014.
11. Yuxuan Xia, Karl Granström, Lennart Svensson, Ángel F. García-Fernández, and Jason L. Williams. Extended target Poisson multi-Bernoulli mixture trackers based on sets of trajectories. In 2019 22nd International Conference on Information Fusion (FUSION), pages 1–8, 2019.
12. Edson H. Aoki, Pranab K. Mandal, Lennart Svensson, Yvo Boers, and Arunabha Bagchi. Labeling uncertainty in multitarget tracking. IEEE Transactions on Aerospace and Electronic Systems, 52(3):1006–1020, 2016.
13. Herbert E. Rauch, F. Tung, and Charlotte T. Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3(8):1445–1450, 1965.
14. Roy L. Streit. Interval/smoothing filters for multiple object tracking via analytic combinatorics. In 2017 20th International Conference on Information Fusion (Fusion), pages 1–8, 2017.
15. Donald Reid. An algorithm for tracking multiple targets. IEEE Transactions on Automatic Control, 24(6):843–854, 1979.
16. Ángel F. García-Fernández, Yuxuan Xia, Karl Granström, Lennart Svensson, and Jason L. Williams. Gaussian implementation of the multi-Bernoulli mixture filter. In 2019 22nd International Conference on Information Fusion (FUSION), pages 1–8, 2019.

Chapter 6

Wither Now and Why
(Footnote: Civil rights speech by W. E. B. Du Bois, at Johnson C. Smith University, Charlotte, NC, 1960.)

"Nothing is as practical as a good theory." James Clerk Maxwell, apocryphal

Abstract The utility of AC and counting methods to show the essential unity of multiple object tracking filters is briefly reviewed. Several directions for future work are also discussed. Two methods are proposed for significantly reducing the high computational complexity of cross-derivatives when used in conjunction with particle filters. The AC method is also applicable to solving integer linear programs. ILPs are used in combinatorial optimization problems in tracking, specifically, the multiframe assignment problem, and in higher level information fusion. Two problems of this kind are highlighted: natural language processing and multisensor conflict resolution. Keywords Multiobject trajectory tracking · Multiscan tracking · Multiple unresolved objects · Merged measurements · Smoothing and fixed point PHD intensity filters · Multisensor tracking · Spatial and temporal pair correlation · Palm track extraction · Saddle point approximation · Multicomplex step method · Multiframe assignment · Combinatorial optimization · High-level information fusion

6.1 To Count or Not to Count, that Is the Question

Combinatorial problems lie at the heart of many problems in multiple object tracking, and indeed many problems in information fusion. Each combination constitutes a collection of switches that, depending on which way they are thrown, define a "switch-free context" in which a mathematical model is formulated and solved. Think of the switching process as a kind of SQL (structured query language) in which the retrieved "data" specify a mathematical model to be solved by standard methods.


All too often, even well designed systems are opaque and difficult to understand, making them time-consuming to extend and to maintain. AC is not going to solve these problems, but it can help. Its great strength—when it is applicable—is the extreme economy of its models. Such models effectively organize and guide the complicated details that often ensue in particular instances. The benefits of AC to multiple object tracking are of this kind. The topics examined in this book all deal in one way or another with questions of object existence, and of what their state is if they do exist. AC formulates these questions in a natural way using generating functions. All problems are, in effect, converted into counting problems. The conversion to counting problems works well but is of little value when, as in classical single object tracking with no clutter, all the switches have been set in place and there is no model "doubt." The value of counting begins to show when there are multiple objects that exist, but may or may not be assigned a measurement. The combinatorial flavor of this problem is inescapable when clutter is present. This is the JPDA filter presented in Chap. 3, and it is an unhappy experience (probably) for many who encounter it for the first time. And then things get worse. The combinatorial problem becomes even more daunting when each object may or may not exist, and may or may not be assigned a measurement even if it does exist. This is the JIPDA filter, also discussed in Chap. 3.

The benefits of AC to tracking. AC brings a powerful modeling tool, namely generating functions, to multiobject tracking. The problems are not made easier using AC, but AC does make it possible to see—and read—the "big picture" in a single, compact, one-line mathematical expression. Generating functions are defined using formal variables called indeterminates. Analytic combinatorics usually treats indeterminates as complex variables defined on some suitable "dual" spatial domain.¹ The compact expressions for JPDA and JIPDA, for example, give significant and fresh insights into the modeling assumptions. Some were previously overlooked, as mentioned earlier in this book. GFLs also suggest various approximation methods that are established in other fields, but are new to tracking. Along with alternative ways to compute, the compact formulae suggest alternative ways to conceptualize and assess new possibilities.

One pathway starts with JPDA and leads to JPDAS, or JPDA with object superposition. It is then a seemingly small step (but surely a big one at the time) to randomize the object number. AC makes the formulation very easy, given the generating function of object number. The result is the CPHD filter and its special Poisson case, the PHD filter. These superpositional filters build directly on the notion of counting. They estimate the spatial object intensity function, which is defined as the expected number of objects per unit state space. The simplifying assumptions that are needed for superposition, and the attendant information loss that it entails, are reviewed earlier in Chap. 4. The other pathway starts with JIPDA and leads to JIPDAS, or JIPDA with object superposition. Randomizing the number of Bernoulli object models, given the generating function of the number of models, leads to the multi-Bernoulli (MB) filter.

¹ In signal processing, the domain would be termed a Fourier spectral domain over spatial frequency, or wavenumber.


[Fig. 6.1 AC derivations of the filters presented in the book and their connections to each other. The recoverable box labels are: Bayes-Markov (BM), Eq. 2.4; Bayes-Markov + Detection (BMD), Eq. 2.19; PDA = BMD + Clutter, Eq. 2.21; IPDA = PDA + Existence, Eqs. 2.37/40; JPDA = PDA with N objects, Eq. 3.3; JIPDA = IPDA with N objects, Eq. 3.35; JPDAS = JPDA + Superposition, Eqs. 4.6/8; CPHD = JPDAS + Random N, Eq. 4.36; PHD = CPHD with Poisson point process, Eq. 4.80; MB = JIPDA + Superposition, Eqs. 5.17/19; MBM = MB + Hypotheses, Eq. 5.52; LMBM = MBM + Labels, Eq. 5.71; MHT = JIPDA + Hypotheses, Eq. 5.80; MHT without spatial diversity is marked equivalent to LMBM via Eq. 5.81.]

There are many ways to hypothesize about the data, and each hypothesis is an MB filter. This key insight makes the MB filter the genesis of the mixture MB, or MBM, filter. The mixture is over the number of hypotheses. The determined researcher will find that the MBM filter is the foot of an invigorating uphill slope. It leads naturally to the idea of labeling the individual Bernoullis and using the labels to connect object tracks from scan to scan. This is the LMBM filter, and AC greatly facilitates understanding the details. AC also shows the relation of the LMBM filter to the MHT filter. As discussed earlier, for one scan the LMBM filter is equivalent to an MHT filter in which objects have the same state space. Both pathways of AC derivations of the filters presented in the book and their connections to each other are shown in Fig. 6.1 along with the corresponding equation numbers in order to provide the full picture.

6.2 Low Hanging Fruit

Using AC to model many of the best-known tracking filters in a unified manner is edifying, and clarifying, and also sometimes revealing of new relationships between them. It also holds promise, because it suggests new avenues for research that are, speaking metaphorically, as enticing as the aroma of fresh-baked French baguettes on a warm spring morning in Paris.


Further work is needed to demonstrate the utility of AC in diverse problems. Certain applications are currently active areas of research, such as the tracking of multiple extended objects and multiple sensor tracking. Recent work of the authors [1, 2] explores hybrid filters—part JPDA and part PHD—for these problems. While such filters may be classed as low hanging fruit, they are also known and there is little need to say more about them here. Other hybrid filters are discussed in [3], and surely others still remain to be discovered. The list below highlights applications that are relatively unexplored by the methods proposed in this book, and are deemed likely to be worthy of inclusion in a future companion book. The same holds for the topics presented in the remaining sections of this chapter. They are predictions and, like all predictions, they come with the usual disclaimers: The list may be incomplete, and topics on the list are not guaranteed to lead to successful applications of AC.

Multiobject trajectory filters. Multiscan, or multiframe, filters are well studied in the single-object-no-clutter tracking world. They are often called "smoothing" or "retrodictive" filters when used to estimate object state at scan times earlier than the current scan. The trade-space is time delay against reduced estimation error. The exact same concept pertains to multiple object tracking in clutter, but now the problem is significantly harder because of the combinatorics. Assignment errors drive state estimation errors, so the trade-space now is reduced assignment error against computational complexity (due to the multiframe processing). MHT and LMBM filters are presented in Chap. 5 in their single scan form for essentially didactic purposes, but both are multiscan filters. Trajectory filters, mentioned at the beginning of Sect. 5.4, are also of this kind. From the point of view of AC, however, the design of multiscan filters begins with the GFL of the sequential data. The conceptualization is basic to stochastic methods in tracking. The method is very much in the spirit of the book to this point: Find the GFL of one object's trajectory over the multiscan sequence—it endures over the entire sequence or not, as Darwin prescribes—and then superpose an ensemble of object trajectories. This is the core stochastic model. It is superposed with other processes at each scan: a clutter process and a new object process. The GFL for the multiscan problem is derived in [4].

Tracking with multiple unresolved objects. The AC formulation of the problem for two objects was presented in Sect. 3.4 of Chap. 3, together with an example in Sect. 3.5. The model there assumes that two objects are always present, but they may or may not be resolvable. Weakening the assumption from "exactly two objects are always present" to "at most two objects are present" leads to a different, but as yet unstudied, filter. The general problem with N known objects is much more complicated and also remains to be investigated by these methods. Of particular interest, possibly purely academic, is the performance of the superposed object filter, JPDAS, of Chap. 4 when there are merged measurements. Superpositional filters with possibly merged measurements appear not to have been studied.

Spatial and temporal correlations. AC methods enable the calculation of statistics that are unavailable using other methods. In principle, the pair and higher order


correlation functions are derived analytically from the GFL of the Bayes posterior process. The pair correlation function is computationally tractable, but its potential use in tracking remains to be fully explored. A prime example relates to sequential track extraction, or "matching pursuit" (to adopt a term from the machine learning community). Intensity filters that use object superposition (such as JPDAS, CPHD, PHD, JIPDAS, MBM, LMBM) employ various techniques to extract object tracks (i.e., point estimates) and an associated estimation error (i.e., area of uncertainty). The pair correlation function provides a Bayesian method to extract track estimates sequentially [5, 6], one at a time, from the point processes that are conditioned (in the Bayesian sense) on the earlier extracted object states. The conditional processes are Palm processes. The problem with this theoretically "proper" procedure is that the conditional intensity functions are insufficiently stable from scan to scan, so the resulting point estimates are unreliable compared to heuristic methods. An alternative to the single scan intensity filters considered in [5] is to use the intensity functions of trajectory filters. They are based on multiple scans and consequently may be stable enough to make the Bayesian matching pursuit approach to track extraction the method of choice. In non-superposed traditional filters such as JPDA, it is reasonable to expect to be able to use the pair correlation function to prove, rather than merely demonstrate with examples, that JPDA has a track coalescence problem. It is also reasonable to think that the pair correlation function could be used to determine whether or not adding a merged measurement model improves the coalescence problem.

6.3 Techniques for High Computational Complexity Problems

Tracking filters derived by AC methods require evaluating the cross-derivative of the GFL. The size of the derivative is, in most cases, at least as large as the number of measurements that condition the Bayes posterior process. Even with the help of modern symbolic software, it is often the case that the computational complexity of the cross-derivative is so high that it is impractical to evaluate exactly in practical applications.²

² For example, the exact JPDA filter is NP-hard.

Thus, approximation is inevitable in many problems. There are two broad ways to approximate a problem. One is to approximate the problem itself, replacing it with another problem that is solved exactly (or to high accuracy). This is an excellent strategy in some problems. The risk is that changing the problem may compromise the utility of the computed solution for the problem at hand. Demonstrating the efficacy of the surrogate solution is necessarily done on a case by case basis. Consequently, this strategy is not considered further. The other strategy is to seek computationally efficient and accurate approximations of the calculation at hand, which in this case is the cross-derivative. This strategy has two very different branches, both of which are suitable for particle filters. The


idea is to evaluate the cross-derivative of the GFL of the exact Bayes posterior at the points that define the particles. If this can be done efficiently, then particle filters can be used without modification. If the cross-derivative calculation is exact, then the only errors in the filter are those committed by the particle filter approximation (and whatever approximation is done to close the Bayesian recursion). Saddle point approximations. The secular function of the GFL of the Bayes posterior is an analytic function of several complex variables. Its cross-derivative can be found directly, of course, but it is also exactly equivalent to a multivariate Cauchy integral (see Appendix C). If there is a large parameter involved, asymptotic expressions for the integral can be derived by the saddle point method and, under mild assumptions, the accuracy improves as the parameter gets larger. What is commonly known as the saddle point approximation is the first term in a series; higher order terms can improve the approximation. Tracking applications lack a large parameter, so the saddle point approximation is not asymptotic to the exact particle weight. An initial study [7] that was limited to the JPDA filter demonstrated that the ratio of the (first term only) saddle point approximate particle weight to the exact particle weight was approximately normally distributed with a mean μspt and standard deviation σspt < 0.01. With particle filter methods, μspt is a proportionality constant and, thus, does not affect particle filter performance. The small (1%) deviations of particle weights from the exact values were not systematic, and preliminary results confirmed that they did not affect filter performance. The complexity of the saddle point approximation grows as the cube of the number of measurements. This is nontrivial complexity, but small compared to NP-hard calculations. The application of the saddle point method to probability and combinatorics is extensively studied in the remarkable book [8]. The method is also an established tool in applied mathematics [9]. (In ray theory for electromagnetic and acoustic wave propagation, frequency is the large parameter.) Multicomplex step derivatives. The complex step method in Sect. C.5 of Appendix C for computing the derivative of an analytic function of one variable to machine precision is surprising on first encounter. How can the numerical value of the derivative of a function, f , be computed by evaluating f at only one point? The answer lies in evaluating it for a slightly complex number, as shown in Sect. C.5, but the feeling of magic lingers in the air. What is perhaps even more surprising is that by extending the domain of definition of a multivariate function appropriately to a “multicomplex space,” a multicomplex step method will compute the cross-derivative, again to machine precision. Moreover, it can compute any mixed derivative. The complex step method is merely the simplest of these methods. The cross-derivative of the secular function is needed in AC applications to tracking. It is evaluated at zero to find probabilities of particles, and at one to find intensity functions and other expectations at a specified point. Readers are referred to Appendix C for the basic one variable derivative. To evaluate the cross-derivative, it is necessary to evaluate the secular function on a multicomplex space. This space has venerable and classical mathematical roots and is detailed in [10]. Its use for


computing mixed higher order derivatives, of which the cross-derivative is one, is discussed carefully in [11]. The accuracy of the classical central-difference method for computing numerical derivatives [12] is comparable to that of the multicomplex step method. As discussed in [13], the central-difference method requires $2^M$ function evaluations to approximate an order-M cross-derivative. The computation is potentially expensive, depending on the secular function. Applications of these methods to problems in multiple object tracking remain to be investigated. Nonetheless, they are promising methods that may have the potential to contribute to solving difficult problems.
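As a concrete illustration of the simplest member of this family, the one-variable complex step method can be written in a few lines. This is our sketch—the test function and step size are arbitrary choices—but it shows why there is no subtractive cancellation: the derivative is read off the imaginary part of a single function evaluation.

    import cmath

    def complex_step_derivative(f, x, h=1e-20):
        """f'(x) to near machine precision for real-analytic f, using one complex
        evaluation: f'(x) ~ Im(f(x + i*h)) / h (no subtractive cancellation)."""
        return f(complex(x, h)).imag / h

    f = lambda z: cmath.exp(z) / (cmath.cos(z) ** 3 + cmath.sin(z) ** 3)  # standard test function
    x0 = 1.5
    print(complex_step_derivative(f, x0))
    # Compare with a central difference, which loses digits to cancellation:
    h = 1e-6
    print((f(x0 + h).real - f(x0 - h).real) / (2 * h))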

6.4 Higher Level Fusion and Combinatorial Optimization

As discussed in Sect. 1.4 of Chap. 1, there are two broad Bayesian paths in the tracking literature. One path computes the probability distribution over the multiple object state, conditional on the measurements. As seen repeatedly throughout this book, the posterior distribution is a sum of many terms, each of which corresponds to a discrete combinatorial variable that is relevant to the problem, e.g., the assignments of measurements to objects. The discrete variables connect to the other Bayesian path in tracking, namely, combinatorial optimization problems. It begins with the same probabilistic foundation as the tracking filters, but the goal is to find the most probable of the discrete combinatorial variables in the posterior distribution. The change of goals is very significant because achieving them requires solving discrete optimization problems, and this in turn requires using specialized mathematical techniques that are rarely used in continuous problems. Integer linear programming (ILP) is perhaps the best known of these techniques. As shown in [14], ILPs can be recast in the native language of AC, namely, generating functions. Tracking problems are often thought of as lower level fusion when considered in the broader context of information fusion. As it happens, ILPs are also used extensively in higher level fusion problems. The jump from lower level fusion, i.e., tracking, to higher level information fusion is negotiated entirely by ILP, but the language of AC permeates both levels. The links may be strictly mathematical, but the connections may well run deeper. The remainder of this section presents disparate problems that are linked by AC and ILP.

Multiframe assignment. In tracking problems it is reasonable to identify specific measurements as coming from specific objects, provided that objects really do generate point measurements in the sensor and satisfy the "at most one measurement per object rule." As discussed in Sect. 2.3 of Chap. 2, the validity of these assumptions depends on the sensor response surface and factors like SNR and the point spread function. The multiframe assignment problem is to find the best assignments of measurements to objects over a sequence of scans. Its formulation as an ILP is documented


in several papers; see [15, 16] and the references therein. For large problems the ILP is solved (approximately) by a Lagrangian relaxation algorithm. The generating function for this ILP and the ILP for the closely related set partitioning problem are derived in [14]. Measurement gating is especially simple in the optimization model, and it has the salient property of immediately reducing the size of the ILP. Higher level fusion. Higher level information fusion constitutes a collection of extremely diverse, important problems that are grouped together by general consensus. Regrettably, and unlike in tracking, there do not appear to be any well-defined “canonical” high-level fusion problems. ILPs are fundamental tools used to model and solve several important high-level problems. Perhaps they can serve as a normative mathematical language in which to cast seemingly dissimilar problems to discover previously unseen relationships between them. In any event, the following problems are of current interest and are posed and solved as ILPs. • Natural language processing (NLP): The constrained conditional modeling (CCM) approach to NLP [17, 18] begins with the “local” models of data. These models are typically hidden Markov models (HMMs) that are carefully developed and make expert use of grammar rules, language usage, etc. Higher order Markovian models can capture non-local “declarative” knowledge in ways that the local models do not. Higher order models require large data sets and are difficult to train. The CCM approach uses linear integer constraints, not high-order models, to encode declarative information into the HMM decision process. This approach reduces to solving an appropriately formulated ILP. A careful exposition of the CCM method is outside the scope of this book. An insightful illustrative example is given in [19]. • Multiple graph conflict reconciliation: These problems are proposed in [20, 21] to solve the approximate common subgraph (ACS) problem. Their goal is to combine, or fuse, the outputs of “intelligent” sensors whose outputs are represented as relational graphs in which vertices (nodes) and edges have object attributes. Examples include images of road networks, feature-aided tracking, and robot navigation [22]. Hard-soft information fusion problems can also take this form [23], especially if NLP is involved. The problem as posed is especially interesting because the sensor output graphs are noisy, both in reported attributes and network connectivity. To find the ACS, it is, therefore, necessary to use a trade space that exchanges graphical structure against probabilities of the reported attributes and edges. It is also necessary to find the ACS in a way that respects transitivity constraints when there are three or more sensors. The generating function of their ILP is derived in [14].
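For orientation only, the single-frame, single-hypothesis core of these assignment problems can be posed and solved in a few lines with an off-the-shelf solver. This is a toy sketch, not the multiframe ILP or its Lagrangian relaxation, and the cost matrix is invented for illustration.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Negative log-likelihood "costs" of assigning measurement j to object i (invented numbers).
    cost = np.array([[4.0, 1.5, 9.0],
                     [2.0, 6.0, 3.5],
                     [7.5, 8.0, 1.0]])

    rows, cols = linear_sum_assignment(cost)       # optimal one-to-one assignment
    print(list(zip(rows, cols)), cost[rows, cols].sum())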


References

1. Roy L. Streit. JPDA intensity filter for tracking multiple extended objects in clutter. In 2016 19th International Conference on Information Fusion, pages 1477–1484. IEEE, 2016.
2. R. Blair Angle and Roy L. Streit. Multisensor JiFi tracking of extended objects. In 2019 22nd International Conference on Information Fusion, pages 1–8, 2019.
3. Christoph Degen. Probability generating functions and their application to target tracking. PhD thesis, University of Bonn, Germany, 2017.
4. Roy L. Streit. Interval/smoothing filters for multiple object tracking via analytic combinatorics. In 2017 20th International Conference on Information Fusion, pages 1–8, 2017.
5. A. Onder Bozdogan, Roy L. Streit, and Murat Efe. Reduced Palm intensity for track extraction. IEEE Transactions on Aerospace and Electronic Systems, 52(5):2376–2396, 2016.
6. A. Onder Bozdogan, Murat Efe, and Roy L. Streit. A new heuristic for multisensor PHD filter. In 17th International Conference on Information Fusion, pages 1–7, 2014.
7. Roy L. Streit. Saddle point method for JPDA and related filters. In 2015 18th International Conference on Information Fusion, pages 1680–1687, 2015.
8. P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009.
9. Carl M. Bender and Steven A. Orszag. Advanced Mathematical Methods for Scientists and Engineers I: Asymptotic Methods and Perturbation Theory. Springer Science, 2013.
10. Griffith B. Price. An Introduction to Multicomplex Spaces and Functions. M. Dekker, New York, 1991.
11. Gregory Lantoine, Ryan P. Russell, and Thierry Dargent. Using multicomplex variables for automatic computation of high-order derivatives. ACM Transactions on Mathematical Software (TOMS), 38(3):1–21, 2012.
12. Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. SIAM, 2019.
13. Roy L. Streit. A technique for deriving multitarget intensity filters using ordinary derivatives. Journal of Advances in Information Fusion, 9(1):3–12, 2014.
14. Roy L. Streit. Analytic combinatorics and labeling in high level fusion and multihypothesis tracking. In 2018 21st International Conference on Information Fusion (FUSION), pages 1–5, 2018.
15. Aubrey B. Poore. Multidimensional assignment formulation of data association problems arising from multitarget and multisensor tracking. Computational Optimization and Applications, 3(1):27–57, 1994.
16. Aubrey B. Poore and Nenad Rijavec. A new class of methods for solving data association problems arising from multiple target tracking. In 1991 American Control Conference, pages 2303–2304. IEEE, 1991.
17. Ming-Wei Chang, Lev Ratinov, and Dan Roth. Structured learning with constrained conditional models. Machine Learning, 88(3):399–431, 2012.
18. Dan Roth and Wen-tau Yih. A linear programming formulation for global inference in natural language tasks. Technical report, 2004.
19. Wen-tau Yih. Global inference using integer linear programming. Supplemental document for reference [2], 2004.
20. Gregory Tauer, Rakesh Nagi, and Moises Sudit. The graph association problem: mathematical models and a Lagrangian heuristic. Naval Research Logistics (NRL), 60(3):251–268, 2013.
21. Gregory Tauer and Rakesh Nagi. A map-reduce Lagrangian heuristic for multidimensional assignment problems with decomposable costs. Parallel Computing, 39(11):653–668, 2013.
22. Tim Bailey, Eduardo Mario Nebot, J. K. Rosenblatt, and Hugh F. Durrant-Whyte. Data association for mobile robot navigation: a graph theoretic approach. In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), volume 3, pages 2512–2517, 2000.
23. Ketan Date, Geoff A. Gross, Sushant Khopkar, Rakesh Nagi, and Kedar Sambhoos. Data association and graph analytical processing of hard and soft intelligence data. In Proceedings of the 16th International Conference on Information Fusion, pages 404–411, 2013.

Appendix A Generating Functions for Random Variables

"A generating function is a clothesline on which we hang up a sequence of numbers for display." In generatingfunctionology, by Herbert Wilf

Abstract Generating functions (GFs) for a finite number of random variables are defined, and their basic properties are reviewed. The GF form of Bayes Theorem is presented. Differentiation is seen as a decoding method that recovers the underlying probability distribution. GFs for histograms with a random number of samples are presented, as a precursor to GFs for finite point processes. Keywords Generating function · Multivariate generating function · Bayesian inference · Histograms · Random histograms

A.1 Introduction

The basic concepts and computational capabilities of GFs for a finite number of random variables are reviewed. The discussion makes no direct reference to object tracking. An informal and relaxed writing style is adopted to facilitate insight and understanding, and to support independent study and algorithm development. GFs are the workhorses of AC—they are powerful mathematical tools for Bayesian inference and quantitative analysis in high-dimensional problems. The material in this appendix is gathered from many sources; none of it is new. Section A.2 starts with basic definitions and first concepts for GFs in one random variable. The GFs of several commonly used discrete probability distributions are listed. Section A.3 covers bivariate GFs and how they are used to express Bayes Theorem. This section is an easy read, but Sect. A.4 is not. It can be skipped on a first


reading since it presents the multivariate versions of the bivariate results discussed in Sect. A.3. Section A.5 presents GFs for histograms with random numbers of samples. Histograms are important because they lead to the GFs of finite point processes. Scholium. The published literature on GFs is frustrating for beginning readers. Many general textbooks, e.g., [1], give definitions and first properties, but they often go no deeper. Literature in the "middle" is harder to find. The introductory textbook [2] is an excellent start. The charmingly written book [3] goes a step deeper. Yet another step toward the deep end is the small book [4]. All three are well written, largely complementary, and suitable for independent study. Many interesting discussions of GFs and their applications in population dynamics and cascade modeling are found in two justly famous texts on branching processes, [5, 6]. Finally, three books from the deep end of the pool are the research tomes [7, 8, 9]. The first of these [7] is especially interesting for its pedagogical style.

A.2 Definitions and Basic Properties for One Variable

Encoding a sequence. The GF of the sequence $(a_0, a_1, a_2, \ldots)$ is defined by
$$G_A(z) = \sum_{k=0}^{\infty} a_k z^k. \tag{A.1}$$

The numbers $a_k$ may be real or complex. The variable z is called the indeterminate, and it is complex valued. In enumerative combinatorial problems, the index k is often the "size" of a "configuration" and $a_k \geq 0$ is the number of configurations of size k. Because convergence of the GF is irrelevant, $G_A(z)$ is called a formal power series in z. Wilf [3] charmingly speaks of it as a clothesline (see quote above). The digital signal processing community calls $G_A(z^{-1})$ the z-transform of the sequence. In many applications, $(a_0, a_1, a_2, \ldots)$ is a probability sequence, i.e.,
$$\sum_{k=0}^{\infty} a_k = 1 \quad\text{and}\quad a_k \geq 0. \tag{A.2}$$

In this case, G(z) is called the probability generating function (PGF) [1] of the nonnegative random integer A, where $a_k = \Pr\{A = k\}$. When the context is clear, PGFs are called simply GFs. Probabilities (decoding). Substituting z = 1 into (A.1) and using (A.2) shows that the series $G_A(z)$ converges for z = 1. Thus, by well-known results (Sect. C.1) in complex analysis, the radius of convergence of $G_A(z)$ is at least one, and the power series converges for every z in the open disk $D_0 = \{z : |z| < 1\}$. It is analytic there, meaning that it is infinitely differentiable at every point in $D_0$, and the series can be differentiated term-by-term. Differentiating k times and evaluating at z = 0 gives
$$a_k = \frac{1}{k!}\, G_A^{(k)}(0) \equiv \frac{1}{k!} \left. \frac{d^k}{dz^k} G_A(z) \right|_{z=0} \quad \text{for } k = 0, 1, 2, \ldots. \tag{A.3}$$

In words, GFs encode sequences and differentiation decodes the GFs. A frequently used alternative notation is
$$a_k = \left[ z^k \right] G_A(z) \quad \text{for } k = 0, 1, 2, \ldots. \tag{A.4}$$

This form is the derivative (A.3) but without the factorial. Factorial moments. The GFs of most commonly used random variables have a radius of convergence that is strictly greater than one, so they are analytic inside a (sufficiently small) neighborhood around z = 1. The examples listed at the end of this subsection are differentiable at z = 1. This is very convenient in practice because, in these cases, the derivatives of $G_A(z)$ evaluated at z = 1 correspond to the factorial moments [1] of A. For example, the first derivative of $G_A(z)$ evaluated at z = 1 is
$$G_A'(1) = \left. \frac{d}{dz} \sum_{k=0}^{\infty} a_k z^k \right|_{z=1} = \sum_{k=1}^{\infty} k\, a_k \equiv E[A]. \tag{A.5}$$
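As a quick check of the encode/decode mechanics in (A.1)–(A.5), the sketch below (ours, assuming the sympy library and an arbitrary three-point PMF) recovers the probabilities by differentiation at zero and the mean by differentiation at one.

    import sympy as sp

    z = sp.symbols('z')
    G = sp.Rational(2, 10) + sp.Rational(5, 10) * z + sp.Rational(3, 10) * z**2  # GF of a toy PMF

    # Decode the PMF: a_k = G^(k)(0) / k!   (Eq. A.3)
    pmf = [sp.diff(G, z, k).subs(z, 0) / sp.factorial(k) for k in range(3)]
    print(pmf)                       # [1/5, 1/2, 3/10]

    # First factorial moment (the mean): E[A] = G'(1)   (Eq. A.5)
    print(sp.diff(G, z).subs(z, 1))  # 11/10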

The mean is the first factorial moment. In these cases too, the nth derivative is
$$G_A^{(n)}(z) = \sum_{k=n}^{\infty} (k)_n\, a_k\, z^{k-n}, \quad n = 0, 1, 2, \ldots, \tag{A.6}$$

where the falling factorial is defined by $(k)_n = k(k-1)\cdots(k-n+1)$ for $n \geq 1$ and $(k)_0 = 1$. Evaluating at z = 1 gives the factorial moment
$$m_{[n]} = G_A^{(n)}(1) \equiv E\!\left[(A)_n\right] = E\!\left[A(A-1)\cdots(A-n+1)\right]. \tag{A.7}$$

This expression holds when the radius of convergence of $G_A(z)$ is greater than one. A corollary is that all the factorial moments exist when the GF is analytic in a neighborhood of z = 1. When the radius of convergence is exactly one, a little more care is needed. Abel's Theorem [10] applied to (A.6) states that if the series $\sum_{k=n}^{\infty} (k)_n a_k$ converges, then the left limit at z = 1 taken on the real line is
$$\lim_{z \to 1^-} G_A^{(n)}(z) = \sum_{k=n}^{\infty} (k)_n\, a_k \equiv m_{[n]}. \tag{A.8}$$

In words, the factorial moments are limits of derivatives from inside the unit disk, provided the moments exist, i.e., the series (A.8) converges. Nota Bene. Not all random integers have finite factorial moments. Suppose, for example, that A is zeta-distributed. Its PMF is defined by $a_k = \frac{6}{\pi^2 k^2}$, $k = 1, 2, \ldots$, and $a_0 = 0$. The GF,
$$G_A(z) = \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{z^k}{k^2}, \tag{A.9}$$


converges to one at z = 1, but it is not continuous there since $\lim_{z \to 1^+} G_A(z) = \infty$. The series converges at z = 1, so Abel's Theorem holds and $G_A(z)$ is left continuous at z = 1, that is, $\lim_{z \to 1^-} G_A(z) = G_A(1)$. In contrast, the first derivative of $G_A(z)$ at z = 1 is proportional to the harmonic series, so it diverges and Abel's Theorem does not hold. In other words, the zeta distribution does not have an expected value (like Cauchy's PDF on the real line). What is happening in this example is that the GF of the zeta distribution (A.9) has a singularity known as a branch point at z = 1. More specifically, it is a logarithmic branch point [11]. The GFs encountered in this book all have a radius of convergence greater than one, so no special care is needed to evaluate their derivatives at z = 1.

Probabilistic mixtures of GFs. Let E denote the event space of the random variable A in (A.2). For $J \geq 1$, let the events $\mathcal{E}_j$, $j = 1, \ldots, J$, form a partition of E, so that $\Pr\{\mathcal{E}_1\} + \cdots + \Pr\{\mathcal{E}_J\} = 1$. If $\Pr\{\mathcal{E}_j\} > 0$, then by the law of total probability [1],

J j=1

Pr{ j } Pr{A = k |  j } .

(A.10)

Let ak j = Pr{A = k |  j }. Substituting (A.10) into (A.1) and interchanging the sums over k and j gives J Pr{ j } G A| j (z) , (A.11) G A (z) = j=1

where G A| j (z) =

∞ k=0

ak j z k ,

j = 1, . . . , J

(A.12)

is the GF of the random variable A conditioned on event  j . GFs of frequently occurring discrete probability distributions. • Bernoulli(β), the number of successes/failures (heads/tails) in one trial (coin toss), with β = Pr{success} and α = Pr{failure} = 1 − β: G Bernoulli (z) = α + βz.

(A.13)

• Binomial(β, n), the number of successes in n IID trials of a Bernoulli(β) variable:  n G Binomial (z) = (α + βz)n = G Bernoulli (z) .

(A.14)

• Multi-Bernoulli(β1 , . . . , βn ), the number of heads in n independent Bernoulli(βk ) trials: n (A.15) G MultiBernoulli (z) = (αk + βk z) , k=1

where αk + βk = 1, αk ≥ 0, βk ≥ 0. • Geometric(β), the number of failures in a sequence of IID Bernoulli(β) trials until the first success:

Appendix A: Generating Functions for Random Variables

G Geometric (z) =

∞ k=0

(1 − β)k β z k =

159

1 β , |z| < 1 − (1 − β)z 1−β

(A.16)

• Negative binomial, NB(r, β), the number of successes in a sequence of IID Bernoulli(β) trials until r failures occur: 1 − β r ∞ k + r − 1 1 G NB (z) = (1 − β)r β k z k = (A.17) , |z| < k=0 k 1 − βz β • Poisson(λ) with mean number λ ≥ 0: G Poisson (z) =

∞ k=0

e−λ

λk k z = e−λ+λz . k!

(A.18)

Several of these examples illustrate an important result shown in the next section, namely, the GF of the sum of independent and identically distributed (IID) random integers is the product of their GFs. The GFs of the Poisson, binomial, geometric, and negative binomial distributions are special cases of, or the limit of, distributions in a larger class of Panjer probability distributions, as they are known in actuarial science. Other distributions in the Panjer class are being applied in tracking problems. For further discussion of Panjer distributions, see [12, 13].
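A small symbolic check of this product rule—and of the multi-Bernoulli GF (A.15), which recurs throughout the tracking chapters—can be written as follows (our sketch, assuming sympy and arbitrary success probabilities):

    import sympy as sp

    z = sp.symbols('z')
    betas = [sp.Rational(1, 2), sp.Rational(9, 10), sp.Rational(3, 10)]

    # GF of the number of successes = product of the individual Bernoulli GFs (A.15).
    G = sp.Integer(1)
    for b in betas:
        G *= (1 - b) + b * z
    G = sp.expand(G)

    pmf = [G.coeff(z, k) for k in range(len(betas) + 1)]
    print(pmf, sum(pmf))                          # PMF of the count; sums to 1
    print(sp.diff(G, z).subs(z, 1), sum(betas))   # mean equals the sum of the betas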

A.3 Bivariate Generating Functions

The joint probability GF of the nonnegative random integers A and B is
$$G_{AB}(z_1, z_2) = \sum_{k, j = 0}^{\infty} \Pr\{A = k, B = j\}\, z_1^k z_2^j. \tag{A.19}$$

The GF is analytic on the open disk $D_0$ in each of the indeterminates $z_1$ and $z_2$ separately; that is, it is analytic in $z_1$ given a fixed value of $z_2$ with $|z_2| < 1$, and vice versa. Therefore, by [14, Hartogs' Thm.], $G_{AB}(z_1, z_2)$ is jointly analytic on the bidisc $D_0^2 \equiv \{(z_1, z_2) : |z_1| < 1, |z_2| < 1\}$. This is liberating—it justifies differentiating the series (A.19) term-by-term any number of times at any point in $D_0^2$. The GF is convergent at the point $z_1 = z_2 = 1$, so by Abel's Theorem, it is left continuous there. Differentiating the GF term-by-term gives the joint probabilities,
$$\Pr\{A = k, B = j\} = \frac{1}{k!\, j!}\, G_{AB}^{(k,j)}(0, 0) \tag{A.20}$$
$$\equiv \left[ z_1^k z_2^j \right] G_{AB}(z_1, z_2). \tag{A.21}$$


Both the derivative and coefficient forms for the probabilities are convenient, depending on the situation. GFs for marginal distributions. The GFs of the marginal distributions are
$$G_A(z_1) = G_{AB}(z_1, 1) \tag{A.22}$$
$$G_B(z_2) = G_{AB}(1, z_2). \tag{A.23}$$

To see (A.22), for instance, substitute $z_2 = 1$ into (A.19) and verify that
$$G_{AB}(z_1, 1) = \sum_{k, j = 0}^{\infty} \Pr\{A = k, B = j\}\, z_1^k = \sum_{k=0}^{\infty} \left( \sum_{j=0}^{\infty} \Pr\{A = k, B = j\} \right) z_1^k = \sum_{k=0}^{\infty} \Pr\{A = k\}\, z_1^k = G_A(z_1).$$
The last step holds by definition, since $\Pr\{A = k\} \equiv \sum_{j=0}^{\infty} \Pr\{A = k, B = j\}$.

GFs for sums. The GF of the sum of the random variables A and B is the "diagonal" of the joint GF. To see this, let $z_1 = z_2 = z$ in (A.19) and verify that
$$G_{AB}(z, z) = \sum_{n=0}^{\infty} \Bigg( \sum_{\substack{k, j \geq 0 \\ k+j=n}} \Pr\{A = k, B = j\} \Bigg) z^n \tag{A.24}$$
$$= \sum_{n=0}^{\infty} \Pr\{A + B = n\}\, z^n \equiv G_{A+B}(z), \tag{A.25}$$

where the last expression is the GF of A + B. If A and B are independent, the GF of their sum is the product of their GFs; that is, $G_{A+B}(z) = G_A(z) G_B(z)$. To see this, let $\Pr\{A = k\} = a_k$ and $\Pr\{B = k\} = b_k$. By the independence assumption, $\Pr\{A = k, B = j\} = a_k b_j$ for all k and j, so
$$\Pr\{A + B = n\} = \sum_{\substack{k+j=n \\ k, j \geq 0}} \Pr\{A = k, B = j\} = \sum_{r=0}^{n} a_r b_{n-r}.$$

Substituting into (A.25) gives
$$G_{A+B}(z) = \sum_{k=0}^{\infty} \left( \sum_{r=0}^{k} a_r b_{k-r} \right) z^k = \left( \sum_{k=0}^{\infty} a_k z^k \right) \left( \sum_{k=0}^{\infty} b_k z^k \right) = G_A(z)\, G_B(z), \tag{A.26}$$

where $G_A(z)$ and $G_B(z)$ are the GFs of A and B, respectively. GFs for Bayes Theorem. The GF of the posterior distribution of A conditioned on B = j was derived from Bayes Theorem in Chap. 1, Sect. 1.6.1. It was shown there to be the normalized derivative of the joint GF. Restated in the notation of this subsection, the GF is
$$G_{A|B=j}(z_1) = \frac{G_{AB}^{(0,j)}(z_1, 0)}{G_{AB}^{(0,j)}(1, 0)} \tag{A.27}$$
or, equivalently, as [7, Prop. III.1],
$$G_{A|B=j}(z_1) = \frac{\left[ z_2^j \right] G_{AB}(z_1, z_2)}{\left[ z_2^j \right] G_{AB}(1, z_2)}, \tag{A.28}$$

where the coefficient form (A.21) is used. These results for bivariate GFs extend to multivariate GFs.
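The mechanics of (A.27)–(A.28) can be verified symbolically on a tiny joint distribution. The sketch below (ours, assuming sympy; the joint GF is an arbitrary example with dependent A and B) extracts the coefficient of $z_2^j$ and normalizes it.

    import sympy as sp

    z1, z2 = sp.symbols('z1 z2')

    # An arbitrary dependent joint PMF on {0,1} x {0,1}, encoded as a joint GF (Eq. A.19).
    G = (sp.Rational(3, 10) + sp.Rational(1, 10) * z1
         + sp.Rational(2, 10) * z2 + sp.Rational(4, 10) * z1 * z2)

    j = 1                                                   # condition on the observation B = j
    numer = G.coeff(z2, j)                                  # [z2^j] G_AB(z1, z2)
    posterior_gf = sp.simplify(numer / numer.subs(z1, 1))   # Eq. (A.28)
    print(posterior_gf)                                     # 2*z1/3 + 1/3

    # Decode the posterior PMF of A given B = 1.
    print([posterior_gf.coeff(z1, k) for k in range(2)])    # [1/3, 2/3]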

A.4 Multivariate Generating Functions

For completeness, this section gives the multivariate versions of the bivariate results in the previous section. A discussion of GFs for general multivariate statistical analysis, including Bayes Theorem, is presented in [15]. The joint GF of M ≥ 1 nonnegative random integers $A_1, \ldots, A_M$ is defined by
$$G_{A_1 \cdots A_M}(z_1, \ldots, z_M) = \sum_{k_1, \ldots, k_M = 0}^{\infty} \Pr\{A_1 = k_1, \ldots, A_M = k_M\}\, z_1^{k_1} \cdots z_M^{k_M}. \tag{A.29}$$

If $A_1, \ldots, A_M$ are independent, then $\Pr\{A_1 = k_1, \ldots, A_M = k_M\} = \Pr\{A_1 = k_1\} \cdots \Pr\{A_M = k_M\}$ and the joint GF factors, that is,
$$G_{A_1 \cdots A_M}(z_1, \ldots, z_M) = \prod_{m=1}^{M} \left( \sum_{k_m = 0}^{\infty} \Pr\{A_m = k_m\}\, z_m^{k_m} \right) \equiv G_{A_1}(z_1) \cdots G_{A_M}(z_M). \tag{A.30}$$
The joint GF is analytic in each indeterminate $z_m$ on the open unit disk $D_0$ provided only that the values of the other indeterminates are less than one in magnitude. Therefore, just as in the bivariate case, it is jointly analytic [14] on the polydisc $D_0^M \equiv \{(z_1, \ldots, z_M) : |z_1| < 1, \ldots, |z_M| < 1\}$. Term-by-term differentiation of the series (A.29) is justified at any point in $D_0^M$. The derivative may or may not exist at the point $(z_1, \ldots, z_M) = (1, \ldots, 1) \equiv 1_M$, as a multivariate version of the zeta distribution shows. When the derivative does exist at $1_M$, then it is left continuous there, by Abel's Theorem, and the derivative is the left limit of its term-by-term derivative.


A more compact notation for the general multivariate GF is
$$G_{A_{1:M}}(z_{1:M}) = \sum_{k_{1:M} = 0}^{\infty} \Pr\{A_{1:M} = k_{1:M}\}\, z_{1:M}^{k_{1:M}}. \tag{A.31}$$

Differentiating the series term-by-term, evaluating at the origin, and dividing by the appropriate product of factorials gives the joint probability. Two different but closely related notations are used:
$$\Pr\{A_{1:M} = k_{1:M}\} = \frac{1}{k_1! \cdots k_M!}\, G_{A_1 \cdots A_M}^{(k_1, \ldots, k_M)}(z_1, \ldots, z_M)\Big|_{z_1 = \cdots = z_M = 0} \tag{A.32}$$
$$= \left[ z_1^{k_1} \cdots z_M^{k_M} \right] G_{A_1 \cdots A_M}(z_1, \ldots, z_M). \tag{A.33}$$

The derivative expression is natural, but it is cumbersome in some situations. The coefficient form is often a less intrusive notation. Marginal distributions. The GFs of the univariate marginal distributions of the joint GF are, for $m = 1, \ldots, M$,
$$G_{A_m}(z_m) = G_{A_1 \cdots A_M}(1, \ldots, 1, z_m, 1, \ldots, 1), \tag{A.34}$$

as is seen by paralleling the steps of the bivariate case. The GF of the marginal distributions of any number of variables is similar. For example, G A1 A M (z 1 , z M ) = G A1 ··· A M (z 1 , 1, . . . , 1, z M )

(A.35)

is the GF of the marginal probability distribution summed over all but the first and last random variables. In words, to integrate out one or more random variables, set their indeterminates to one in the joint GF. The resulting expression is the GF of the distribution of the variables that remain. Sums. The GF of the sum A ≡ A1 + · · · + A M is the diagonal of the joint GF, i.e., G A (z) = G A1 ··· A M (z, . . . , z) .

(A.36)

This result and its derivation generalize those of the bivariate case (A.25). Details are omitted. If A1 , . . . , A M are independent, then G A (z) =

M m=1

G Am (z) ,

(A.37)

where G Am (z) are the univariate GFs defined in (A.30) and (A.34). Bayes Theorem. The GF for the posterior distribution given by Bayes Theorem is very similar to the bivariate case. Let Am = {A = k ≥ 0,  = 1, . . . , M,  = m} denote an event that specifies values for all variables A1:M except Am . The GF of Am conditioned on the event Am is, in coefficient form,

Appendix A: Generating Functions for Random Variables

G Am |Am (z m ) =

163

 1  k1 km−1 km+1 z 1 · · · z m−1 z m+1 · · · z kMM G A1 ··· A M (z 1 , . . . , z M ) , c

(A.38)

where c is a scale constant that is chosen so that the GF equals one for z m = 1. Despite what it seems at first glance, the right-hand side of (A.38) is a series in z m alone. The coefficients of the joint GF are all nonnegative, so evaluating it at z m = 1 gives an explicit expression for the constant c > 0. Changing the conditioning from Am to events that specify values for several variables does not change the fundamental character of the Bayesian expression (A.38). Remarkably, all that is required is to extract the appropriate series from the joint GF and normalize it by its value at one. The appropriate series is the coefficient of the monomial whose powers are the values of the conditioning variables. This coefficient is a multivariate series in the indeterminates of those and only those variables that are not conditioned on. If the random variables A1 , . . . , A M are independent, and the conditioning events specify values for some of the variables, Bayes Theorem is essentially trivial (except for the confidence the general theorem provides). The GF factors in this case, as shown in (A.30). It is straightforward to verify analytically, using the derivative and coefficient forms above, that the GF of the Bayes posterior distribution is the product of the GFs of the variables left unspecified in the conditioning.
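A two-variable toy case makes the marginalization and diagonal rules concrete. The sketch below (Python/sympy, our tool choice) uses a deliberately dependent pair, $\Pr\{A_1 = 0, A_2 = 0\} = \Pr\{A_1 = 1, A_2 = 1\} = 1/2$, a hypothetical example, so that the diagonal (A.36) is visibly different from the product of marginals (A.37), which holds only under independence.

```python
# Marginalization (A.34) and the diagonal rule (A.36) for a dependent pair;
# Python/sympy is our tool choice and the joint PMF is a hypothetical example.
import sympy as sp

z, z1, z2 = sp.symbols('z z1 z2')
G_joint = (1 + z1 * z2) / 2                      # Pr{(0,0)} = Pr{(1,1)} = 1/2

G_A1 = sp.expand(G_joint.subs(z2, 1))            # (A.34): (1 + z1)/2, a fair Bernoulli
G_A2 = sp.expand(G_joint.subs(z1, 1))            # (1 + z2)/2
G_sum = sp.expand(G_joint.subs({z1: z, z2: z}))  # (A.36): (1 + z**2)/2, mass on {0, 2}

# (A.37) requires independence; here the product of the marginal GFs differs
# from the GF of the sum, which is exactly the signature of the dependence.
G_prod = sp.expand(G_A1.subs(z1, z) * G_A2.subs(z2, z))
assert sp.expand(G_sum - G_prod) != 0
```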

A.5 Generating Functions for Random Histograms

The humble histogram is a multivariate random variable. For example, think of the $M = 37$ pockets of a French roulette wheel as the cells of a histogram. In one trial, or roll, the ball falls at random (in theory) into one and only one pocket. The histogram records the number of times the ball falls into each pocket over a number of trials. The histogram count record is an integer valued random vector of length 37. More generally, think of a random variable whose outcomes, or samples, are mapped into one of $M$ histogram cells. Let $p_m = \Pr\{\text{cell } m\}$ be the probability that a sample is mapped to cell $m$. Let $z_m$ denote the indeterminate for cell $m$. The GF of the $M$ random integers for one sample is that of an $M$ category Bernoulli distribution,
\[
G_{\text{MCBernoulli}}(z_1, \ldots, z_M) \;=\; \sum_{m=1}^{M} p_m z_m . \tag{A.39}
\]
This follows from the definition (A.29), for if a sample falls in cell $m$, then $k_m = 1$ and the other indices are zero, so $z_1^{k_1} \cdots z_M^{k_M} = z_m$. Let $N$ denote the total number of trials recorded by an $M$ category histogram. The trials are assumed IID. By independence, the GF of the sum is the product of the GFs of one trial. The samples are also identically distributed, so the GF of the histogram is that of a multinomial distribution,
\[
G_{\text{Multinomial}}(z_1, \ldots, z_M \mid N) \;=\; \Bigl( \sum_{m=1}^{M} p_m z_m \Bigr)^{N} . \tag{A.40}
\]
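As a quick illustration of (A.40), the coefficients of the expanded multinomial GF are exactly the multinomial probabilities. The sketch below (Python/sympy, our tool choice; the three-cell example and its probabilities are hypothetical placeholders) extracts one such coefficient and compares it with the familiar multinomial formula.

```python
# The coefficient of z1^k1 z2^k2 z3^k3 in (A.40) is the multinomial probability;
# Python/sympy is our tool choice and the cell probabilities are placeholders.
import sympy as sp

z1, z2, z3 = sp.symbols('z1 z2 z3')
p = [sp.Rational(1, 2), sp.Rational(1, 3), sp.Rational(1, 6)]
N = 4
G = sp.expand((p[0] * z1 + p[1] * z2 + p[2] * z3) ** N)     # Eq. (A.40)

k = (1, 2, 1)
coeff = G.coeff(z1, k[0]).coeff(z2, k[1]).coeff(z3, k[2])
multinomial = (sp.factorial(N)
               / (sp.factorial(k[0]) * sp.factorial(k[1]) * sp.factorial(k[2]))
               * p[0]**k[0] * p[1]**k[1] * p[2]**k[2])
assert sp.simplify(coeff - multinomial) == 0
```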

Suppose now that $N$ is random and that its GF is
\[
G_N(z) \;=\; \sum_{n=0}^{\infty} \Pr\{N = n\}\, z^n . \tag{A.41}
\]
Multiplying (A.40) by $\Pr\{N = n\}$ and adding gives the GF of a histogram with a random number of samples:
\[
G_{\text{Histogram}}(z_1, \ldots, z_M) \;=\; \sum_{n=0}^{\infty} \Pr\{N = n\} \Bigl( \sum_{m=1}^{M} p_m z_m \Bigr)^{n} \;=\; G_N\!\Bigl( \sum_{m=1}^{M} p_m z_m \Bigr) . \tag{A.42}
\]
This expression is a composition of GFs. Using (A.39),
\[
G_{\text{Histogram}}(z_1, \ldots, z_M) \;=\; G_N\bigl( G_{\text{MCBernoulli}}(z_1, \ldots, z_M) \bigr) . \tag{A.43}
\]
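The composition (A.42)-(A.43) is also easy to validate by simulation. The sketch below (Python/numpy, our tool choice) takes a geometric $N$ and a three-cell histogram, both hypothetical choices, and compares the empirical value of $E[z_1^{K_1} z_2^{K_2} z_3^{K_3}]$ with $G_N(\sum_m p_m z_m)$ at one test point.

```python
# A Monte Carlo sanity check of the composition (A.42)-(A.43); Python/numpy is
# our tool choice, and the geometric N, cell probabilities, and test point are
# illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])            # cell probabilities p_m
beta = 0.4                               # parameter of the geometric N
z = np.array([0.7, 0.9, 0.4])            # a test point for the indeterminates

def G_N(s):                              # GF of N ~ Geometric(beta) on {0, 1, ...}
    return beta / (1.0 - (1.0 - beta) * s)

trials = 50_000
vals = np.empty(trials)
for t in range(trials):
    n = rng.geometric(beta) - 1          # numpy's geometric starts at 1
    counts = rng.multinomial(n, p)       # histogram of n IID cell assignments
    vals[t] = np.prod(z ** counts)       # z1^K1 * z2^K2 * z3^K3

empirical = vals.mean()                  # estimate of E[z1^K1 z2^K2 z3^K3]
analytic = G_N(np.dot(p, z))             # G_N(sum_m p_m z_m), Eq. (A.42)
print(empirical, analytic)               # agree to Monte Carlo accuracy
```

Any other GF for $N$ can be swapped into `G_N` without touching the rest of the check, which is the practical content of the composition rule.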

The distribution of $N$ strongly influences the distribution of the random histogram, as the next two examples show.

Poisson histograms. If $N$ is Poisson distributed with mean $\lambda$, then, using (A.18),
\[
G_{\text{PoiHistogram}}(z_1, \ldots, z_M) \;=\; \exp\Bigl( -\lambda + \lambda \sum_{m=1}^{M} p_m z_m \Bigr) . \tag{A.44}
\]
The joint GF factors into a product of GFs as
\[
G_{\text{PoiHistogram}}(z_1, \ldots, z_M) \;=\; \prod_{m=1}^{M} \exp\bigl( -\lambda p_m + \lambda p_m z_m \bigr) . \tag{A.45}
\]
Each factor depends on only one indeterminate, which proves that the cell counts are independent. Moreover, the form of the factors shows that they are Poisson distributed with means $\lambda p_1, \ldots, \lambda p_M$. That the cell counts of a multinomial distribution with a Poisson number of trials are independent Poisson variables [16, Sect. 2.9] is counterintuitive on first encounter. The Poisson assumption may facilitate principled mean field models and approximations, but it can also be lazy in that it is sometimes imposed solely for the convenience of having independent variables. (Making a Poisson assumption without justification is referred to as a Poisson gambit in [16] because it carries mismodeling risks that need to be independently assessed.)

A numerical example is helpful. For a fair French roulette wheel, $M = 37$ and the pocket probabilities are $p_m = 1/37$, so the generating function $G_{\text{Wheel}}(z_1, \ldots, z_{37})$ is (A.44) with $M = 37$. Suppose $N = 4$ independent trials are performed and the counts for cells (pockets) 9, 16, and 25 are, respectively, 1, 2, and 1. The counts for the other cells are zero. The order is irrelevant. Denote this event by $O_{121}$. Let the vector $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_{37})$ have components 9, 16, and 25 equal to, respectively, 1, 2, and 1, and equal to zero otherwise. The derivative form of the probability of the event is
\[
\Pr\{O_{121}\} \;=\; \frac{1}{2} \left. \frac{d^4}{dz_9\, dz_{16}^2\, dz_{25}}\, G_{\text{Wheel}}(z_1, \ldots, z_{37}) \right|_{z_1 = \cdots = z_{37} = 0} \;=\; \frac{1}{2}\, e^{-\lambda} \lambda^4\, p_9\, p_{16}^2\, p_{25} , \tag{A.46}
\]
where the divisor $2 = \varepsilon! \equiv \varepsilon_1! \cdots \varepsilon_{37}! = 1!\, 2!\, 1!$ is the product of factorials required by (A.32).
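Symbolic differentiation makes the derivative form (A.46) a mechanical computation. The sketch below (Python/sympy, our tool choice) runs the same calculation on a hypothetical 5-pocket wheel so the output stays readable; nothing changes for $M = 37$ except run time.

```python
# A scaled-down symbolic check of (A.46); Python/sympy is our tool choice, and
# the 5-pocket wheel with counts (1, 2, 1) in cells 1-3 is a placeholder example.
import sympy as sp

lam = sp.symbols('lam', positive=True)
z = sp.symbols('z1:6')                          # z1, ..., z5
p = [sp.Rational(1, 5)] * 5                     # fair wheel: p_m = 1/M

G = sp.exp(-lam + lam * sum(pm * zm for pm, zm in zip(p, z)))   # Eq. (A.44)

# (1/2) * d^4 G / (dz1 dz2^2 dz3), evaluated at z = 0, as in (A.46).
deriv = sp.diff(G, z[0], 1, z[1], 2, z[2], 1)
prob = sp.simplify(deriv.subs({zm: 0 for zm in z}) / 2)

closed_form = sp.exp(-lam) * lam**4 * p[0] * p[1]**2 * p[2] / 2
assert sp.simplify(prob - closed_form) == 0
```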

The usual enumerative derivation of this result multiplies $\Pr\{N = 4\}$ by $(4!/2!)\, p_9\, p_{16}^2\, p_{25}$, where the coefficient $4!/2!$ accounts for the irrelevance of order in the four trials that gave the cell counts. On the other hand, differentiation is avoided altogether by making use of the fact that the cell counts are independent given a Poisson assumption for $N$. The probability of $O_{121}$ is the product of the probabilities of 37 events: {one ball in cell 9}, {two balls in cell 16}, {one ball in cell 25}, and zero balls in the other 34 cells. They are Poisson variates with, respectively, parameters $\lambda p_9$, $\lambda p_{16}$, $\lambda p_{25}$, and $\lambda p_m$, $m \notin \{9, 16, 25\}$; thus,
\[
\Pr\{O_{121}\} \;=\; \bigl( e^{-\lambda p_9} \lambda p_9 \bigr) \Bigl( e^{-\lambda p_{16}} \frac{(\lambda p_{16})^2}{2!} \Bigr) \bigl( e^{-\lambda p_{25}} \lambda p_{25} \bigr) \prod_{m \notin \{9, 16, 25\}} e^{-\lambda p_m} ,
\]
which is merely a rearrangement of (A.46). The GF of the Bayes posterior distribution conditioned on $O_{121}$ is trivial. Because of independence, it is the product of the Poisson GFs of the 34 cells left unspecified by the conditioning event.

Geometric($\beta$) histograms. Alternatively, if $N$ is distributed as Geometric($\beta$), then, using (A.16), the joint GF is
\[
G_{\text{GeoHistogram}}(z_1, \ldots, z_M) \;=\; \frac{\beta}{1 - (1 - \beta) \sum_{m=1}^{M} p_m z_m} . \tag{A.47}
\]

The GF of the Bayes posterior is derived from the joint GF using (A.38). To avoid needless generality, consider French roulette and the event $O_{121}$ from the Poisson histogram example. The mixed derivative is
\[
\frac{d^4}{dz_9\, dz_{16}^2\, dz_{25}}\, G_{\text{GeoHistogram}}(z_1, \ldots, z_M) \;=\; \frac{24\, \beta\, (1 - \beta)^4\, p_9\, p_{16}^2\, p_{25}}{\Bigl( 1 - (1 - \beta) \sum_{m=1}^{M} p_m z_m \Bigr)^{5}} . \tag{A.48}
\]
The probability of $O_{121}$ is the derivative evaluated at $z_1 = \cdots = z_{37} = 0$ and divided by 2. The GF of the posterior distribution conditioned on $O_{121}$ is proportional to (A.48) evaluated at $z_9 = z_{16} = z_{25} = 0$. Using obvious notation, the Bayes GF is

\[
G_{\text{BayesGeoHistogram}}(z_{\notin \{9,16,25\}}) \;=\; \frac{c}{\Bigl( 1 - (1 - \beta) \sum_{m \notin \{9,16,25\}} p_m z_m \Bigr)^{5}} , \tag{A.49}
\]
where $c$ is chosen so the right-hand side evaluates to one for $z_{\notin \{9,16,25\}} = 1$. Substituting out the constant and arranging terms gives
\[
G_{\text{BayesGeoHistogram}}(z_{\notin \{9,16,25\}}) \;=\; \left( \frac{\hat{\beta}}{1 - \bigl( 1 - \hat{\beta} \bigr) \sum_{m \notin \{9,16,25\}} \hat{p}_m z_m} \right)^{5} , \tag{A.50}
\]

where
\[
\hat{\beta} \;=\; 1 - (1 - \beta) \sum_{m' \notin \{9,16,25\}} p_{m'} , \qquad
\hat{p}_m \;=\; \frac{p_m}{\sum_{m' \notin \{9,16,25\}} p_{m'}} \quad \text{for } m \notin \{9, 16, 25\}.
\]
It follows from (A.50) that the Bayes posterior distribution is the sum of five IID 34-category Geometric($\hat{\beta}$) distributions, which is equivalent to a 34-category negative binomial NB($5, 1 - \hat{\beta}$) distribution.

Probability of cell counts greater than one. Events like $O_{121}$ that have one or more cells with a count greater than one are rare when $N$ is fixed and $M$ is sufficiently large; in fact, their relative abundance goes to zero as $M \to \infty$. The first sample falls in cell $m_1$ with probability $p_{m_1}$. Let $P_1 = 1$. Because samples are IID, for $n = 2, \ldots, N$, the probability that the $n$th sample falls in a cell $m_n$ that is different from the preceding cells $m_1, \ldots, m_{n-1}$ is $P_n \equiv 1 - \sum_{k=1}^{n-1} p_{m_k}$, assuming that $M > N$. Thus, the probability that all $N$ cell counts are equal to one is $P \equiv P_1 P_2 \cdots P_N$. Assuming the cell probabilities $p_m \to 0$ as $M \to \infty$, it follows, since $N$ is fixed, that $P \to 1$. In words, for sufficiently large $M$ and fixed $N$, histogram cell counts are either one or zero with high probability.
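The exponent in (A.48)-(A.50) is easy to get wrong by hand, because conditioning on four observed counts size-biases the geometric prior. The sketch below (Python/sympy, our tool choice; a hypothetical 5-cell histogram stands in for the roulette wheel) differentiates (A.47) directly and confirms the fifth power in the posterior GF.

```python
# A scaled-down symbolic check of (A.47)-(A.50); Python/sympy is our tool choice,
# and the 5-cell fair histogram with counts (1, 2, 1) in cells 1-3 is a placeholder.
import sympy as sp

beta = sp.symbols('beta', positive=True)
z = sp.symbols('z1:6')                       # z1, ..., z5
p = [sp.Rational(1, 5)] * 5                  # fair cells, p_m = 1/5

s = sum(pm * zm for pm, zm in zip(p, z))
G = beta / (1 - (1 - beta) * s)              # Eq. (A.47)

# Mixed derivative d^4/(dz1 dz2^2 dz3), as in (A.48), then set z1 = z2 = z3 = 0.
deriv = sp.diff(G, z[0], 1, z[1], 2, z[2], 1)
kernel = sp.simplify(deriv.subs({z[0]: 0, z[1]: 0, z[2]: 0}))

# Normalize at z4 = z5 = 1 to get the Bayes posterior GF, as in (A.49)-(A.50).
G_post = sp.simplify(kernel / kernel.subs({z[3]: 1, z[4]: 1}))

# The posterior carries a fifth power: the sum of five IID Geometric(beta_hat)
# histograms on the two remaining cells.
t = p[3] * z[3] + p[4] * z[4]
beta_hat = 1 - (1 - beta) * (p[3] + p[4])
target = (beta_hat / (1 - (1 - beta) * t))**5
assert sp.simplify(G_post - target) == 0
```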

References

1. Bernard Lindgren. Statistical theory. 4th Ed. Chapman and Hall/CRC, 1993.
2. Robert A Beeler. How to Count: An Introduction to Combinatorics and Its Applications. Springer, 2015.
3. Herbert S Wilf. generatingfunctionology. AK Peters/CRC Press, 2005.
4. Sergei K Lando. Lectures on generating functions. American Mathematical Soc., 2003.
5. KB Athreya and PE Ney. Branching processes. Springer-Verlag, 1972 (Dover Publications, 1972).
6. Theodore E Harris. The theory of branching processes. Springer-Verlag, 1963 (Dover Publications, 2002).
7. P Flajolet and R Sedgewick. Analytic combinatorics. Cambridge University Press, 2009.
8. Robin Pemantle and Mark C Wilson. Analytic combinatorics in several variables. Cambridge University Press, 2013.
9. RL Graham, M Grötschel, and L Lovász. Handbook of combinatorics, Volume I and II. MIT Press, 1995.
10. Niels Abel. Untersuchungen über die Reihe. Journal für Math, Theorem IV(1):311-339, 1826.
11. Leonard C Maximon. The dilogarithm function for complex argument. Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, 459(2039):2807-2819, 2003.
12. Harry H Panjer. Recursive evaluation of a family of compound distributions. ASTIN Bulletin: The Journal of the IAA, 12(1):22-26, 1981.
13. Michael Fackler. Panjer class united - one formula for the Poisson, binomial, and negative binomial distribution. In Proc. ASTIN Colloquium, Helsinki. International Actuarial Association, 2009.
14. Lipman Bers. Introduction to several complex variables: lectures. Courant Institute of Mathematical Sciences, New York University, 1964.
15. Norman Lloyd Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan. Discrete multivariate distributions, volume 165. Wiley, New York, 1997.
16. Roy L Streit. Poisson point processes: imaging, tracking, and sensing. Springer Science, 2010.

Appendix B Generating Functionals for Finite Point Processes

“Dimidium facti, qui coepit, habet; sapere aude, incipe.” (“He who has begun is half done; dare to know; begin.”) Horace, in a letter addressed to Lollius (First Book of Letters, 20 BCE)

Abstract Generating functionals (GFLs) and Bayesian inference for finite point processes are studied in this appendix. Cluster processes are discussed first because their GFLs are the limits of GFs of random histograms. The GFL for general processes is then defined, and the underlying probability distribution is recovered from the GFL by differentiation. Bivariate and multivariate finite point processes are defined, and the GFL of the Bayes posterior point process is derived. It is central to recursive Bayesian estimation. The GFL for partially observed point process realizations is derived. Partial realizations occur in sequential statistical estimation problems when point estimates are obtained recursively. Partially observed processes are Palm processes. Intensity and pair correlation functions are derived from the GFL. The GFL of the superposition of non-independent point processes is derived from the “diagonal” of the joint GFL. Keywords Finite point process · Generating functional · Variational and set derivatives · Secular function · Bivariate finite point process · Bayes Theorem · Palm process · Intensity function · Pair correlation function · Marginalization · Superposition


B.1 Introduction

A point process is a mathematical model of points that are distributed randomly in a specified space, X. It is a finite point process if the number of points in a realization is finite with probability one. The space is usually Rd , d ≥ 1, or a subset thereof.2 Said more carefully, a finite point process is a random variable whose realizations are finite multisets of X, where a multiset is a set in which duplicate elements are permitted.3 The order of the elements in a multiset is irrelevant. The event space of a finite point process is the grand canonical ensemble, denoted E(X). This appendix draws on material from many sources. Little of it is new. The GFs for random variables in Appendix A are generalized and extended to include finite point processes. The generalization changes the way some things are written mathematically. While the changes are not entirely cosmetic, they are nevertheless essentially superficial—beneath them, much remains the same. Section B.2 begins with the GFs of random histograms. The basic idea is to interpret the sum pm z m in the argument of the GF of the random histogram (A.42) as a Riemann sum. In the limit as cell size goes to zero, the sum goes to the GFL (generating functional) of a finite point process. The method is intuitive and insightful, drawing as it does on the concept of Riemann sums. The shift from GFs to GFLs was presented in [4]. Section B.3 defines GFLs for finite point processes in a less intuitive way, but one that is more amenable to analysis. Sections B.4 and B.5 define variational derivatives of GFLs. These derivatives give the probability distribution of the process. Exact expressions for the derivatives are found using the “secular” method, which involves only ordinary mixed derivatives. Consequently, the GFL derivatives can be evaluated exactly using symbolic mathematical software. Several examples are given. Section B.6 discusses the intensity function and the higher order factorial moments. It is also shown how to evaluate them using the secular method. Sections B.7 and B.8 discuss bivariate finite point processes and their GFLs. It is shown that the Bayes posterior point process is characterized by the normalized derivative of its GFL. This result parallels a similar result for random variables. Section B.9 discusses marginalizing bi- and multivariate point processes using GFLs, a topic that is important for Bayesian analysis. Section B.10 discusses the superposition of point processes that are not necessarily independent. This topic is important in practical applications. Section B.11 discusses Palm processes, which are potentially important for sequential estimation problems (or matching pursuit, as it is often called in machine learning). Section B.12 discusses pair correlation, a topic that almost always comes up when doing Bayesian analysis.

2. Finite point processes are also defined for discrete spaces, discrete-continuous spaces, and general abstract spaces. For further discussion, see [1, p. 2] and [2, p. 4].
3. According to Donald E. Knuth [3, p. 36], the concept of multiset is ancient, but the term multiset was coined only in the 1970s by N. G. de Bruijn.


Fig. B.1 Depiction of a random histogram with $M$ cells partitioning a space $X$, with GF $\Psi(h_{\text{step}})$ given in Eq. (B.3). Cell probabilities are $p(x_m)$, $m = 1, \ldots, M$, and the corresponding cell indeterminate variables are $z_{1:M}$

B.2 Cluster Point Processes

The goal of this section is to give an intuitive understanding of the GFL that is more rigorously defined in Sect. B.3 for general point processes. Cluster processes are perfectly suited to this purpose because their GFLs are obtained as a limit of the GFs of random histograms (Appendix A) as the number of histogram cells, $M$, goes to infinity. Figure B.1 depicts the histogram and variables used in the GF. The cells form a partition of the space of interest, $X$. The point $x_m \in \text{cell}(m)$ is a point interior to the cell (e.g., the centroid, if the cell is convex). The number $\Delta_m$ is the cell size (e.g., length or volume). The probability vector $p_{1:M}$ becomes a step PDF, $p_{\text{step}}(x)$, $x \in X$, whose value, $p_{\text{step}}(x_m)$, for $x_m \in \text{cell}(m)$ is chosen so that
\[
p_m \;=\; \Pr\{\text{cell}(m)\} \;=\; p_{\text{step}}(x_m)\, \Delta_m , \quad m = 1, \ldots, M. \tag{B.1}
\]

The indeterminate vector $z_{1:M}$ becomes a complex-valued step function $h_{\text{step}}(\cdot)$, and its step values are the indeterminates of the cells, that is,
\[
h_{\text{step}}(x) \;=\; \sum_{m=1}^{M} z_m\, \mathbf{1}\{x \in \text{cell}(m)\}, \tag{B.2}
\]
where $\mathbf{1}\{x \in \text{cell}(m)\}$ is the indicator function for cell($m$). Substituting into the second equation of the histogram GF (A.42) gives
\[
\Psi(h_{\text{step}}) \;\equiv\; G_N\Bigl( \sum_{m=1}^{M} h_{\text{step}}(x_m)\, p_{\text{step}}(x_m)\, \Delta_m \Bigr) . \tag{B.3}
\]
As a check, note that $\Psi(h_{\text{step}}) = 1$ if the indeterminates $h_{\text{step}}(x_m) = 1$ for all $m$. Since $G_N(\cdot)$ is analytic and therefore continuous, the limit as $M \to \infty$ can be brought inside the function $G_N(\cdot)$ if $h_{\text{step}}$ is no greater than one in magnitude. The limit exists if the cell sizes $\Delta_m$ go uniformly to zero, assuming that the step functions $p_{\text{step}}(\cdot)$ converge to an integrable function $p(\cdot)$, and the step functions $h_{\text{step}}(\cdot)$ converge to a locally integrable function $h(\cdot)$ which is bounded in magnitude by 1. (A function is locally integrable if it is integrable over all compact subsets of its domain.) This assumption is re-examined at the end of this section. The limit is denoted
\[
\Psi(h) \;\equiv\; \lim_{M \to \infty} G_N\Bigl( \sum_{m=1}^{M} h_{\text{step}}(x_m)\, p_{\text{step}}(x_m)\, \Delta_m \Bigr) \;=\; G_N\Bigl( \int_X h(x)\, p(x)\, dx \Bigr) , \tag{B.4}
\]

where the dependence of $h_{\text{step}}$, $p_{\text{step}}$, and $\Delta_m$ on $M$ is suppressed for readability. The check holds in the limit, i.e., $\Psi(h) = 1$ for $h(x) = 1$, $x \in X$. This Riemann sum derivation applies to the class of processes called cluster processes [1]. The name "cluster" is explained by how realizations are simulated: The number of points $N = n$ is randomly sampled according to the probability distribution whose GF is $G_N(z)$. Given $n$, the points are IID samples of the cluster PDF $p(x)$. This class encompasses nearly all the point processes used in tracking.

The function $\Psi(\cdot)$ evaluates to a number for every locally integrable, complex-valued indeterminate function $h$ defined on $X$ and satisfying $|h(x)| \le 1$ for all $x \in X$. It is termed a functional in the literature to emphasize that its argument is a function. In this text, it is called a GFL and, occasionally, a PGFL (probability generating functional). GFLs for point processes are denoted by the symbol $\Psi(\cdot)$, and GFs for random variables are denoted by $G(\cdot)$.

Examples of cluster processes. Several GFLs for point processes of interest in tracking arise from the GFs listed in Appendix A (Sect. A.2). Points in these processes are IID samples of a random variable with PDF $p(x)$ on the space $X$.

• Singleton process generates one point with probability one: $G_{\text{Singleton}}(z) = z$ and
\[
\Psi_{\text{Singleton}}(h) \;=\; \int_X h(x)\, p(x)\, dx \tag{B.5}
\]

• Bernoulli(β, p) process generates one point with probability β and no point with probability α = 1 − β:

 h(x) p(x) dx

Bernoulli (h) = G Bernoulli X  h(x) p(x) dx =α+β



(B.6)

X

• Binomial(β, n, p) process with n independent Bernoulli(β) distributed trials:

Appendix B: Generating Functionals for Finite Point Processes

173



Binomial (h) = G Binomial h(x) p(x) dx X n

 h(x) p(x) dx = α+β

(B.7)

X

• Multi-Bernoulli(β1:n , p1:n ) process with n independent Bernoulli(βk ) trials with Pr{one point} = βk with PDF pk (·) and Pr{no point} = αk = 1 − βk :

MultiBernoulli (h) =

 n

αk + βk h(x) pk (x) dx k=1

(B.8)

X

• Geometric process with Geometric(β) distributed trials:



Geometric (h) = G Geometric =

1 − (1 − β)

h(x) p(x) dx X

β  X

h(x) p(x) dx

(B.9)

• Negative binomial process with NB(r, β) distributed trials:





h(x) p(x) dx

NB (h) = G NB X r

1−β  = 1 − β X h(x) p(x) dx

(B.10)

• Poisson process, with Poisson(λ) distributed trials:





h(x) p(x) dx

Poisson (h) = G Poisson X

 = exp −λ + λ h(x) p(x) dx

(B.11)

X

The zeta cluster point process is defined via the GF (A.9). It is excluded from the list because of the absence of known applications. Why indeterminate functions are assumed locally integrable. The limit in the argument of G N (·) in (B.4) is a well-defined integral if (i) the step functions pstep (·) defined by (B.1) converge to a PDF p(x) and (ii) the indeterminate step functions h step (·) defined by (B.2) converge to a locally integrable function h(x) bounded in magnitude by 1 (i.e., |h(x)| ≤ 1 for all x ∈ X). It is reasonable to assume that the functions pstep (·) converge to a PDF since, in practice, the underlying probability function p(x) for the locations of points in X is continuous. In this case, the process is a “simple” finite point process because the points in a realization are distinct with probability one. Not all point processes are

174

Appendix B: Generating Functionals for Finite Point Processes

simple, however, and in the more general case the limiting distribution is a probability measure, d P(x). These distinctions are discussed more carefully in the next section. The indeterminate function h satisfies no regularity conditions; indeed, it need not even be bounded or locally integrable. Nonetheless, the next two sections show that such restrictions can be imposed on h and the derivatives of the resulting GFL can be defined in such a way that the major goal of AC is achieved, namely, to embed the probability distribution of a finite point process into a GFL and to recover the distribution by differentiating it. The procedure is mathematically rigorous and preserves the convenience of writing (B.4) as an integral.

B.3

Generating Functionals (GFLs) for Finite Point Processes

A general finite point process is simulated by a three-step procedure. The first step samples a random integer, N , called the canonical number that determines the number of points in the realization. The probability mass function (PMF) of N is Pr{n}, n ≥ 0. In the second step, if N = 0, the realization is the empty set, ∅. If N = n ≥ 1, a sample is drawn from a random variable whose PDF, p(x1 , . . . , xn | n), is symmetric, i.e.,   (B.12) p(x1 , . . . , xn | n) = p xσ (1) , . . . , xσ (n) | n for all permutations σ of the integers 1, . . . , n. (In statistics, the arguments of p are said to be exchangeable.) The sample is an n-tuple, or vector, x1:n = (x1 , . . . , xn ) ∈ the n-tuple becomes the multiset x1:n = {x1 , . . . , xn } ⊂ X. Xn . In the third step,  Permuted vectors xσ (1) , . . . , xσ (n) give the same multiset x1:n , so the PDF of x1:n given n is p (x1:n | n) =



  p xσ (1) , . . . , xσ (n) | n = n! p (x1 , . . . , xn | n) .

(B.13)

σ

The notation makes no distinction between PDFs of multisets and PDFs of vectors, but the context will make clear which is intended. The vector x1:n is a realization of what is termed, for want of a better name, the random finite “tuple” process on X. The event space L(X) is the union of finite length vectors, ∞   (B.14) L(X) = ∪ x1:n : x1:n ∈ Xn , n=0

where x1:0 ≡ ∅. An unspecified random realization is denoted by x1:N . Tuple processes are useful in applications in which the order of the components identifies, or labels, them in some way. The event space (B.14) is termed the label space of X. The multiset x1:n is a realization of the finite point process on X. The event space E(X) is the union of multisets,

Appendix B: Generating Functionals for Finite Point Processes

175



E(X) = ∪ { x1:n : x1:n ⊂ X} ,

(B.15)

n=0

where x1:0 ≡ ∅. The mapping x1:n → x1:n of vectors in L(X) to multisets in E(X) is not one-to-one because it does not preserve the order of the components of x1:n . An unspecified random realization is denoted by x1:N . In the physics community, E(X) is called the grand canonical ensemble (GCE) of X. The canonical number of a multiset is the number of elements counted with multiplicity, e.g., the canonical number of {1, 3, 1, 2, 2} is five. Most applications involve only simple finite point processes. These are processes whose realizations are sets with probability one and multisets with probability zero.4 The GFL of a general simple finite point process is defined by

(h) = Pr{0} +

∞  n=1

 Pr{n} X

 ···

X

h(x1 ) · · · h(xn ) p(x1 , . . . , xn | n) dx 1:n , (B.16)

where dx 1:n ≡ dx 1 · · · dx n and h : X → C is a locally integrable function such that |h(x)| ≤ 1 for all x. No physical units are associated with h(x), so the integrals are unitless and (h) is well defined. The series is absolutely convergent because the integrals are bounded in magnitude by one and n≥0 Pr{n} = 1. If the process is not simple, the probability distribution has point masses, and the integrals are with respect to a probability measure d P(x1:n | n). Such processes are useful in applications when the space X is discrete or comprises both discrete and continuous elements. The general case is discussed in [8, 9]. Evaluating the GFL for constant indeterminate functions h gives the GF of N .  n Pr{n}z ≡ G N (z). In particular, Let h z (x) ≡ z for all x ∈ X. Then (h z ) = ∞ n≥0

(1) = 1. IID point processes. A simple finite point process is IID if the points in a realization are independent and identically distributed conditional on the number of points, so that p(x1:n | n) = p(x1 |n) · · · p(xn |n) , n ≥ 1.

(B.17)

Substituting the product (B.17) into the definition (B.16), it is seen that the n-fold integrals reduce to products of integrals and, thus, the GFL of an IID simple finite point process is

(h) = Pr{0} +

∞  n=1

4 Random



n

Pr{n}

h(x) p(x|n) dx X

.

(B.18)

finite sets (RFSs) and finite set statistics (FISST) [5] exclude multisets from the event space. They should not be confused with simple finite point processes [6, Sect. 2.13], [7].

176

Appendix B: Generating Functionals for Finite Point Processes

More generally, if an IID process is not simple, the conditional probability distributions have jump discontinuities, and the joint cumulative (probability) distribution functions (CDFs), denoted P(· | n), factor as d P(x1 , . . . , xn | n) = d P(x1 | n) · · · d P(xn | n) , n ≥ 1.

(B.19)

TheGFL for general IID processes is the same as (B.18) but with the integral replaced by X h(x) d P(x|n). Cluster processes. The processes derived by the limiting procedure are IID processes in which points are samples of the same distribution; for simple cluster processes, p(x | n) ≡ p(x) for all n and x ∈ X. Substituting this into (B.18), it is seen that the GF of a simple cluster process is identical to (B.4). For general cluster processes, the probability measure is the same for all n, that is, d P(x|n) ≡ d P(x). Probabilistic mixtures of GFLs. Let the events  j , j = 1, . . . , J, J ≥ 1, partition the event space E(X). Assume that Pr{ j } > 0 for all j. Then, by the law of total probability [10], Pr{1 } + · · · + Pr{ J } = 1, and the PDF in the definition of the GFL (B.16) is J

p(x1 , . . . , xn | n) =

j=1

Pr{ j } p(x1 , . . . , xn | n,  j ).

(B.20)

Let (h |  j ) denote the conditional GFL of the point process conditioned on  j . It is defined as in (B.16), but uses the conditional PDF p(x1 , . . . , xn | n,  j ). Substituting (B.20) into (B.16) and interchanging sums gives the GFL of the point process as a probabilistic mixture of event conditioned GFLs,

(h) =

J j=1

Pr{ j } (h |  j ).

(B.21)

For an example, see Eq. (3.17). It also holds for countably infinite partitions of the event space. The result for random variables takes the same basic form; see (A.11).

B.4

Derivatives of GFLs

General finite point processes are characterized by their GFLs [8]. This section verifies the general result for simple processes by showing that the event probabilities of the process are the limits of variational derivatives of (h). Variational derivatives are defined as in the Calculus of Variations (see Appendix C.3). They are called functional derivatives when the space X is abstract. In tracking applications they are often referred to as “set” derivatives. After introducing definitions and the traditional symbolic methods, the discussion switches gears, and moves in refreshing new directions that are suitable for imple-

Appendix B: Generating Functionals for Finite Point Processes

177

mentation using modern symbolic software. These methods eliminate the tedium of hand calculation for derivatives. For a fixed δ > 0, assume |h(x)| ≤ 1 − δ for all x ∈ X (the case δ = 0 will be addressed later in this section). Let qk : X → C, k = 1, 2, . . . be a sequence of infinitely differentiable test functions for the Dirac delta at the point x1 ∈ X. (See Appendix C, Sect. C.2.) Each function qk is also bounded, so for |ε| sufficiently small, |h(x) + εqk (x)| < 1 for all x ∈ X. Thus, for fixed h and qk , (h + εqk ) is an analytic function of ε ∈ C inside a sufficiently small radius disk. Definition. Let (h) be the GFL of a simple finite point process, (B.16). Let {qk }∞ k=1 be a test sequence for the Dirac delta at the point x1 ∈ X. Let ε ∈ C. The derivative of (h) at h with respect to the set {x1 } is the limit of variational derivatives,

{x1 } (h) ≡

 d d 

(h) ≡ lim

(h + εqk )  . k→∞ ε=0 d{x1 } dε

(B.22)

Let x1:n ≡ {x1 , . . . , xn }, n ≥ 1, be a set, i.e., xi = x j for i = j. The derivative with respect to x1:n is defined recursively:

{x1 ,x2 ,...,xn } (h) =

d  {x1 ,x2 ,...,xn−1 } 

(h) , n ≥ 2. d{xn }

(B.23)

The derivatives, called set derivatives, depend on h and the set x1:n . Derivatives with respect to multisets are undefined for simple point processes because they are zero probability events. The superscript notation is similar to the derivative notation used in multivariate calculus; see, e.g., (A.20) and (A.32). Example: Simple cluster processes. Substituting h + εqk into the GFL (B.4) gives



(h + εqk ) = G N

X



 h(x) + εqk (x) p(x) dx .

(B.24)

Differentiating with respect to ε and substituting into (B.22) gives

{x1 }

G N





(h) = lim h(x) p(x) dx qk (x) p(x) dx k→∞ X X

 h(x) p(x) dx p(x1 ) , = G N

(B.25)

X

where G N (·) is the derivative of G N (·). Differentiating again, but at a different point x2 = x1 and with a Dirac test sequence qk centered at x2 not x1 , gives

178

Appendix B: Generating Functionals for Finite Point Processes

 

{x1 ,x2 } (h) = lim G N h(x) p(x) dx p(x1 ) qk (x) p(x) dx k→∞ X X

  h(x) p(x) dx p(x1 ) p(x2 ) . (B.26) = GN X

Continuing in this way for every point in the set x1:n = {x1 , x2 , . . . , xn } gives

x1:n

(h) =

G (n) N



h(x) p(x) dx

X

p(x1 ) p(x2 ) · · · p(xn ) ,

(B.27)

where G (n) N (·) is the nth derivative of G N (·). Evaluating at h = 0 gives

x1:n (0) = G (n) N (0) p(x 1 ) p(x 2 ) · · · p(x n ) . Substituting G (n) N (0) = n! Pr{n} from (A.3) and the identity (B.13) gives

x1:n (0) = n! Pr{n} p(x1 ) p(x2 ) · · · p(xn ) = Pr{n} p(x1:n | n) .

(B.28)

The event probabilities are the derivatives of the GFL; consequently, the GFL characterizes cluster processes. Derivatives of general finite point processes. To find the derivative with respect to a point x1 , change the dummy variable names in (B.16) and write

(h + εqk ) = Pr{0} +

∞ 



ν

Pr{ν}

ν=1

Xν j=1

h(s j ) + εqk (s j ) p(s1 : ν | ν) ds1 : ν .

The function (h + εqk ) is analytic for sufficiently small |ε|. Differentiating term-by-term with respect to ε gives ∞ ν 

ν    d  qk (s j )

(h + εqk ) = Pr{ν} h(si ) p(s1:ν | ν) ds1:ν . ε=0 ν dε ν=1 j=1 X i=1, i= j Interchanging the order of the derivative and the integrals is justified here by the dominated convergence theorem ([11, Thm. 2.27b]) because, for each ν, the magnitude of the derivative of the integrand with respect to ε is bounded above by a constant multiple of the (integrable) PDF p(s1 : ν | ν). The limit k → ∞ of the Dirac test sequence is the derivative at x1 . The limit samples the jth integral at x1 , so that

{x1 } (h) =

∞  ν=1

Pr{ν}

ν   j=1

ν Xν−1 i=1, i= j

h(si ) p(. . . , s j−1 , x1 , s j+1 , . . . | ν) ds1:ν \ds j .

In the jth summand, since the PDF p(·) is symmetric (cf. (B.12)), permute s1 and x1 , and then relabel s1 as s j to obtain

Appendix B: Generating Functionals for Finite Point Processes



{x1 }

(h) =

ν



∞ 

ν Pr{ν}

ν=1

Xν−1

179

h(s j ) p(x1 , s2 , . . . , sν | ν) ds2 : ν .

(B.29)

j=2

For ν = 1, the product over j is taken to be one and the integral is omitted. The derivative with respect to the set x1:n ≡ {x1 , . . . , xn }, n ≥ 1,

(B.30)

is defined by the recursion (B.23). Omitting details, the derivative is

x1:n (h) =

∞ 

ν(ν − 1) · · · (ν − n + 1) Pr{ν}

ν=n

ν



×

Xν−n

(B.31)

h(s j ) p(x1 , . . . , xn , sn+1 , . . . , sν | ν) dsn+1 : ν .

j=n+1

For ν = n, the product is taken to be one and the integral is omitted. Setting h(x) ≡ 0 for all x ∈ X reduces (B.31) to a constant:

x1:n (0) = n! Pr{n} p(x1 , . . . , xn | n) = Pr{n} p(x1:n | n) = p(x1:n ) .

(B.32)

Thus, the unconditional probability p(x1:n ) of the event x1:n is the derivative x1:n (0). This verifies that general (simple) finite point processes are characterized by their GFLs. (A measure-theoretic derivation of this result is given in [8, Sect. 4].) Derivatives at h ≡ 1. For an indeterminate z, setting h(x) ≡ z in (B.31) gives

x1:n (z) =

∞ ν=n

ν(ν − 1) · · · (ν − n + 1) Pr{ν} p(x1 , . . . , xn |ν) z ν−n .

(B.33)

Equivalently, by Bayes Theorem,

x1:n (z) = p(x1 , . . . , xn )

∞ ν=n

ν(ν − 1) · · · (ν − n + 1) Pr{ν|x1 , . . . , xn }z ν−n .

This expression is the nth derivative of G(z) ≡ p(x1 , . . . , xn )

∞ ν=0

Pr{ν|x1 , . . . , xn }z ν .

The series has a radius of convergence at least one since G(1) = p(x1 , . . . , xn ) < ∞. Applying Abel’s Theorem to (B.33) as in Appendix A gives, reverting to (B.33), lim− x1:n (z) =

z→1

∞ ν=n

ν(ν − 1) · · · (ν − n + 1) Pr{ν} p(x1 , . . . , xn |ν) .

(B.34)

180

Appendix B: Generating Functionals for Finite Point Processes

Throughout this book, if the radius of convergence of the GF x1:n (z) is exactly equal to one, then (B.35)

x1:n (1) ≡ lim− x1:n (z) . z→1

Phrases such as “evaluating the derivative at h = 1” should be interpreted to mean, in these cases, taking a left-hand limit. For example, compare expression (B.34) above to expression (B.45) below. These technicalities are not an issue in this book as all the examples have a radius of convergence strictly greater than one. Set derivatives as mixed variational derivatives. In the above development, the derivative with respect to x1:n is calculated recursively, one point at a time. Derivatives with respect to one point are first-order derivatives, so the recursion is equivalent to performing several independent variational perturbations simultaneously:

x1:n (h) = lim

k→∞

 n dn 

h+ ε j qk j  , j=1 ε1 =···=εn = 0 dε1 · · · dεn

(B.36)

where the sequence {qk j (·)}∞ k=1 , j = 1, . . . , n, is a test sequence for the Dirac delta δx j (·) at the point x j .

B.5

Secular Functions

The GFL of a simple finite point process is an analytic function of a linear functional. It is shown in this section that set derivatives can be expressed as mixed first-order derivatives of an ordinary multivariate function obtained from (h) by “perturbing” h with a weighted Dirac delta train (see (B.37) below). Functions obtained in this way are called the secular5 functions of (h). The method was first outlined in [12]. Secular functions and their derivatives are computed by hand or by using modern symbolic software packages (e.g., Mathematica). Exact derivatives give exact event likelihood functions. Algorithms for evaluating many likelihood functions have high computational complexity, i.e., they are NP-hard calculations; consequently, algorithms for computing derivatives of their GFLs also have high computational complexity. For intractable problems, secular functions are a starting point for deriving bounded complexity approximations using established techniques from applied mathematics, e.g., the saddle point method, or newer methods such as the complex step method. The basic idea is to interchange the derivative and the limit in (B.36). For a fixed indeterminate function h, define the secular function (· ; h) : Cn → C by n

(ε; h) ≡ lim h + k→∞

5 The

j=1

ε j qk j ,

(B.37)

adjective secular signifies only that the functions are not functionals, even though they are derived from functionals, and that their partial derivatives are defined in the usual way.

Appendix B: Generating Functionals for Finite Point Processes

181

where ε = (ε1 , . . . , εn ) ∈ Cn . When using test sequences for Dirac deltas [13], the integrals of the test functions are evaluated before taking the limit—it is not correct to take the limit inside the integral. The derivative of the secular function is the set derivative:

x1:n (h) =

 dn 

(ε; h) . ε1 = ··· =εn = 0 dε1 · · · dεn

(B.38)

The result (B.38) holds for simple point processes [12]. Secular functions are defined for more general processes too; see [6, 14]. Secular functions take attractive forms for IID processes. From (B.18),

 n

h+

j=1

ε j qk j

= Pr{0}+

∞ 

Pr{n  }

n  =1



X

h(x) +

n j=1

n  ε j qk j (x) p(x|n  ) dx .

The limit as k → ∞ is the secular function of (h) for x1:n = {x1 , . . . , xn },

(ε; h) = Pr{0} +

∞ 





Pr{n }

n  =1

X



h(x) p(x|n ) dx +

n j=1

n  ε j p(x j |n ) . 

For cluster processes p(x|n  ) = p(x), and the secular function simplifies to



(ε; h) = G N

X

h(x) p(x) dx +

n j=1

ε j p(x j ) ,

(B.39)

where G N (·) is the GF of the canonical number N . For h = 0,

(ε; 0) = G N

n j=1

ε j p(x j ) .

(B.40)

The mixed derivative of this function at ε = 0 is the probability of x1:n . Another special case, h = 1, gives the intensity; see (B.50) in Sect. B.6. The mixed first-order derivative of (ε; h) with respect to ε j , j = 1, . . . , n, at zero is the set derivative (B.31) of (h) with respect to x1:n . Secular functions are known explicitly for many point processes; see the examples in Sect. B.2. Derivatives of secular functions are computed manually or by symbolic mathematical software. Alternatively, numerical derivatives can be computed by automatic differentiation (AD) with computational complexity that is provably a small multiple (three or four) of the lowest possible complexity [15]. Example: Singleton process. This process generates exactly one point, G N (z) = z. The probability of the empty set ∅ is zero, as is verified by evaluating the GFL (B.5) at h = 0. To find the probability of the set x1:n , n ≥ 1, substitute a test sequence for the Dirac delta train n ε j δx j (x) , (B.41) h(x) = j=1

182

Appendix B: Generating Functionals for Finite Point Processes

into the GFL (B.5). The test sequence limit gives the secular function as

Singleton (ε) ≡ Singleton (ε; 0) =

n j=1

ε j p(x j ) .

(B.42)

The mixed first-order derivative with respect to ε1 , . . . , εn is nonzero if and only if n = 1, thus confirming analytically that the singleton process generates exactly one point with probability one. For n = 1, the derivative evaluated at ε1 = 0 is p(x1 ). The GFL (B.5) is unique—no other GFL gives exactly the same event probabilities. It is worth pointing out that the GFL of the singleton process, like the GFL of every finite point process, gives a probability for every event in E(X). No event is impossible, only zero probability.

B.6

Intensity Function and Other Summary Statistics

Gaining an intuitive understanding of the structure of finite point processes is handicapped by the high and variable dimensionality of the event space. The problem worsens for bi- and multivariate processes, especially when they are correlated (colloquially speaking). Nonetheless, insight is often gained from summary statistics, that is, lower dimensional statistics derived from the full process. The moment generating functions of a finite point process are the derivatives of the GFL of the process evaluated at h = 1. The first-order moment at the point x1 ∈ X is found by evaluating (B.29) at h = 1; explicitly, m [1] (x1 ) = {x1 } (1) =

∞ n=1

n Pr{n} p(x1 | n).

(B.43)

If the process is a cluster process, then p(x1 |n) ≡ p(x1 ) for all n and m [1] (x1 ) =

∞ n=1

n Pr{n} p(x1 ) = E[N ] p(x1 ) ,

(B.44)

where E[N ] is the expected number of points in a realization. The first moment is the expected number of points per unit state space at x1 and, for this reason, it is also called the intensity function. (Nota Bene. Not every simple point process has an intensity function, e.g., the zeta point process. Such processes do not arise in tracking applications.) Intensity functions have exactly the same units as PDFs on X, e.g., number per unit measure (area, volume, etc.) of X ⊂ Rd . Nonetheless, they are very different functions, and it is a mistake to conceptualize intensity as a PDF. This is true even when E[N ] = 1. The intensity function is equivalent to a PDF if and only if the number of points in the process is exactly one, and not merely one in expectation. When the derivative is taken with respect to n ≥ 1 (distinct) points x1:n , the moments are denoted by m [n] (x1:n ). Evaluating the derivatives (B.31) at h = 1 gives

Appendix B: Generating Functionals for Finite Point Processes

m [n] (x1:n ) = x1:n (1) =

∞ ν=n

183

ν(ν − 1) · · · (ν − n + 1) Pr{ν} p(x1:n | ν) , (B.45)

where p(x1:n | ν) is the marginal PDF, i.e., the integral of p(x1:ν | ν) over xn+1:ν . If, for example, the process is a cluster process, the nth factorial moment function is m [n] (x1:n ) = E[(N )n ] p(x1 ) · · · p(xn ) ,

(B.46)

 where E[(N )n ] ≡ ∞ ν=n ν(ν − 1) · · · (ν − n + 1) Pr{ν} is the nth factorial moment of the canonical number, N , of the number of points of the process. The factorial moment (B.45) is also the probability of the set {x1:n }. This lovely fact takes the form m [n] (x1:n ) = p(x1:n ) = p({x1:n }).

(B.47)

To see this, note that the sum in the series (B.45) is over all numbers ν ≥ n of points. This means that realizations may have any number of points as long as n of them are x1:n . These n points are unordered, hence the falling factorial coefficient (ν)n . The factorial moments are marginal probabilities. An informal understanding is given in [16, Eq. (5.4.12)]. Let dx i = (xi , xi + dx i ), i = 1, . . . , n, denote both an infinitesimal subset and its size (measure). Then m [n] (x1:n ) dx 1 · · · dx n = x1:n (1) dx 1 · · · dx n ⎧ ⎫ ⎨ at least n points in the realization ⎬ = Pr and there is exactly one point in each . (B.48) ⎩ ⎭ infinitesimal dx i , i = 1, . . . , n It is distinguished from the set probability [16, Eq. (5.4.13)], p(x1:n ) dx 1 · · · dx n = x1:n (0) dx 1 · · · dx n ⎧ ⎫ ⎨ exactly n points in the realization ⎬ = Pr with exactly one point in each . ⎩ ⎭ infinitesimal dx i , i = 1, . . . , n

(B.49)

For further discussion, see the treatise [16, Chap. 5]. The intensity function can be computed by the secular function method. If the radius of convergence of G N (z) is strictly greater than one (as is generally the case in practice and is always the case in this book), then substituting h = 1 into (B.39) gives the secular function

k

(ε) ≡ (ε; 1) = G N 1 + ε j p(x j ) . (B.50) j=1

The ordinary mixed first-order derivative m [k] (x1:k ) =

 dk 

(ε) ε1 =···=εk =0 dε1 · · · dεk

(B.51)

184

Appendix B: Generating Functionals for Finite Point Processes

is the kth factorial moment function at x1:k . If, on the other hand, the radius of convergence of G N (z) is exactly equal to one, then evaluating the derivatives requires taking limits from inside the unit polydisc [8]. By Abel’s Theorem, m [k] (x1:k ) = lim− z→1



 k dk  GN z + ε j p(x j )  . j=1 ε1 =···=εk =0 dε1 · · · dεk

(B.52)

The details are straightforward and are omitted.

B.7

Bivariate Finite Point Processes

Realizations of a bivariate finite point process are generated by a natural extension of the three-step procedure used for one process (see Sect. B.3). Only bivariate processes are discussed in this appendix to highlight the basic concepts and avoid cumbersome notation. General multivariate processes present no additional technical difficulty. A bivariate finite point process comprises two finite point processes. Denote their event spaces by E(X) and E(Y), where X ⊆ Rdx , dx ≥ 1, and Y ⊆ Rd y , d y ≥ 1. Realizations of a bivariate finite point process are ordered pairs (x1:n , y1:m ) ∈ E(X) × E(Y), n ≥ 0, m ≥ 0.

(B.53)

Bivariate point processes are not to be confused with univariate point processes whose realizations are in E(X × Y). In the first step of the simulation, a random pair of integers (N , M) gives the numbers of points in X and Y, respectively. The joint PMF is Pr{n, m}, n ≥ 0, m ≥ 0, and its GF is G NM (z 1 , z 2 ) =

∞ ∞ n=0

m=0

Pr{n, m} z 1n z 2m .

(B.54)

In the second step, if n ≥ 1 and m ≥ 1, a sample is drawn from a random variable defined on Xn × Ym whose PDF p(x1:n , y1:m | n, m) is semi-symmetric, i.e.,   p(x1:n , y1:m | n, m) = p xσ (1) , . . . , xσ (n) , yτ (1) , . . . , yτ (m) | n, m

(B.55)

for all permutations σ and τ of the integers {1, . . . , n} and {1, . . . , m}, respectively. The notation is adapted in obvious ways to accommodate special cases: (i) for n = m = 0, the realization is (∅, ∅); (ii) for n = 0 and m ≥ 1, the realization is (∅, y1:m ), where y1:m is a sample vector of a random variable with symmetric PDF p(y1:m | 0, m); and (iii) for n ≥ 1 and m = 0, the realization is (x1:n , ∅), where x1:n is a sample of a random variable with symmetric PDF p(x1:n | n, 0). In the third step, the ordered pair of vectors (x1:n , y1:m ) becomes an ordered pair of multisets (x1:n , y1:m ), where x1:n ⊂ X and y1:m ⊂ Y. Permuting vector components gives the same multiset, so the PDF of (x1:n , y1:m ) is

Appendix B: Generating Functionals for Finite Point Processes

p(x1:n , y1:m | n, m) =

 σ,τ

185

  p xσ (1) , . . . , xσ (n) , yτ (1) , . . . , yτ (m) | n, m

= n!m! p (x1:n , y1:m | n, m) .

(B.56)

The argument determines whether p(·| n, m) is a PDF for multisets or for vectors. The GFL for a multivariate point process requires a complex-valued indeterminate function for each process. A bivariate process therefore has two, namely, h : X → C and g : Y → C. The GFL is the probabilistic sum

(h, g) ≡

∞  ∞ 

Pr{n, m} (h, g| n, m) ,

(B.57)

n=0 m=0

where (h, g| n, m) is the GFL of a point process whose (multiset) realizations have n points in X and m points in Y. For n = m = 0, (h, g| 0, 0) ≡ 1. Otherwise,   h(x1 ) · · ·h(xn ) g(y1 ) · · ·g(ym ) p(x1:n , y1:m |n, m) dx1:n dy1:m ,

(h, g|n, m) = Xn Y m

(B.58) where integrals over X and Y are omitted when either n = 0 or m = 0, respectively. Evaluating (h, g) for the constant functions gives the bivariate GF of the numbers (N , M). Let h z1 (x) = z 1 and gz2 (y) = z 2 for all x ∈ X and y ∈ Y. Then

(h z1 , gz2 | n, m) = z 1n z 2m and (h z1 , gz2 ) = G NM (z 1 , z 2 ). Derivatives. The mixed derivative of (h, g) with respect to the sets x1:n ⊂ X and y1:m ⊂ Y is denoted by x1:n ,y1:m (h, g). It is defined as a nested derivative, that is, the derivative is taken first with respect to y1:m for fixed h, and then with respect to x1:n for fixed g. Differentiation is commutative—derivatives can be taken in any order, first with respect to points in x1:n and then y1:m , or vice versa, or permuted in any of (n + m)! ways. It is also linear, so the derivative of a sum is the sum of derivatives. Switching dummy subscripts to n  and m  to avoid confusion with the sizes of the sets x1:n and y1:m , and then differentiating (B.57) term-by-term gives

x1:n ,y1:m (h, g) =

∞  ∞ 

Pr{n  , m  } x1:n ,y1:m (h, g| n  , m  ) .

(B.59)

n  =0 m  =0

The derivative x1:n ,y1:m (h, g| n  , m  ) evaluated at h = g = 0 is found using the secular function technique. Let α = (α1 , . . . , αn ) ∈ Cn and β = (β1 , . . . , βm ) ∈ Cm . Substituting test function sequences for the weighted Dirac delta trains h δ (x) = gδ (y) =

n j=1

m

k=1

α j δx j (x) , x ∈ X,

(B.60)

βk δ yk (y) ,

(B.61)

y ∈ Y,

186

Appendix B: Generating Functionals for Finite Point Processes

expanding the products as multiple sums, and taking the test sequence limit gives the secular function of the summand of (B.59) as the formidable looking sum, 



(α, β | n , m ) =

n 

m 



n



α jr

j1 ,..., jn  = 1 k1 ,..., km  = 1 r =1

m

βks p(x j1 : jn , yk1 :km | n  , m  ).

s=1

The sum collapses under differentiation—the only terms in the mixed first-order derivative with respect to α1 , . . . , αn and β1 , . . . , βm that are nonzero when evaluated at zero are those with n  = n and m  = m. These nonzero terms correspond to indices ( j1 , . . . , jn ) = σ and (k1 , . . . , km ) = τ that are permutations of 1, . . . , n and 1, . . . , m, respectively. Using the symmetries (B.55) and (B.56), dm dn

(α, β| n, m) dα1 · · · dαn dβ1 · · · dβm   ≡ p(x j1 : jn , yk1 :km | n, m)

x1:n ,y1:m (0, 0| n, m) =

( j1 ,..., jn )=σ

(B.62)

(k1 ,..., km )=τ

= n!m! p(x1:n , y1:m | n, m) = p(x1:n , y1:m | n, m),

(B.63)

which proves that (B.57) is the GFL. In words, there are n!m! nonzero terms in the double sum (B.62), and they are all equal, so (B.59) implies that x1:n ,y1:m (0, 0) = p(x1:n , y1:m ). Independent cluster processes. The general expression (B.57) simplifies considerably when the two processes are independent. If both processes are also simple cluster processes, the PDF factors as (subscripts added for clarity) p(x1:n , y1:m | n, m) = pX (x1:n | n) pY (y1:m | m) n m = pX (x j ) pY (yk ) , j=1

k=1

(B.64)

where products are defined to be one for n = 0 or m = 0. Multiplying (B.58) by Pr{n, m} and summing over n and m gives, using the bivariate GF (B.54) for (N , M),

(h, g) = G NM

 X

h(x) pX (x) dx,

 Y

 g(y) pY (y) dy .

(B.65)

The GFL factors further if the random integers N and M are independent, for then G NM (z 1 , z 2 ) = G N (z 1 ) G M (z 2 ); in this case, the point processes are independent. The secular function for the probability of the pair (x1:n , y1:m ) is found by substituting the weighted Dirac delta trains (B.60) and (B.61) into (B.65), giving

(α, β; 0, 0) ≡ G NM

n j=1

α j pX (x j ),

m k=1

βk pY (yk ) .

(B.66)

Appendix B: Generating Functionals for Finite Point Processes

187

The mixed derivative (n,m) (α, β; 0, 0)|α=0,β=0 is the set derivative x1:n ,y1:m (h, g) evaluated at h = 0 and g = 0. More generally, given indeterminate functions h and g, the secular function

(α, β; h, g) is found by substituting h + h δ and g + gδ into (B.65), where h δ and gδ are the Dirac delta trains (B.60) and (B.61). Details are straightforward and are omitted.

B.8

Bayesian Posterior Finite Point Processes

Bayes Theorem determines the probability distribution of posterior finite point processes that are determined by conditional events. If the process is bivariate, the conditioning is on a realization (observation) of one of the two processes. If the process is univariate and the realizations themselves are only partially observed, then the conditioning is on the points that are observed (see Sect. B.11 below). Either way, it is seen that the conditional GFLs of the posterior processes are normalized derivatives of the GFL. The notation for bivariate point processes in Sect. B.7 is used here. The Bayes posterior point process conditioned on y1:m is a point process on X with GFL given by the normalized derivative

Bayes (h| y1:m ) =

∅,y1:m (h, 0) ,

∅,y1:m (1, 0)

(B.67)

where ∅,y1:m (h, 0) denotes the functional derivative of (h, g) with respect to the points in y1:m , but not the points in x1:n , and evaluated at g = 0. The GF of the number of points N in the posterior process is the GFL evaluated at the constant function h z (x) ≡ z for all x ∈ X. Thus, G N |y1:m (z) = Bayes (h z | y1:m ). The conditional probability that N = n is equal to G (n) N |y1:m (0)/n! . The derivatives of (B.67) yield the conditional probability distribution. The derivative of ∅,y1:m (h, g) with respect to x1:n is x1:n ,y1:m (h, g), and so x1:n ,y1:m (0, 0) = p(x1:n , y1:m ). On the other hand, the denominator of (B.67) is the derivative with respect to y1:m of (1, g), which is the GFL of the marginal process (cf. (B.71) below in Sect. B.9), evaluated at g = 0. The derivative is  

∅,y1:m (1, g) 

g=0

= p(y1:m ) .

(B.68)

Substituting for the numerator and denominator gives the derivative of (B.67) as   x1:n

Bayes (h| y1:m ) 

h=0

x1:n ,y1:m (0, 0)

∅,y1:m (1, 0) p(x1:n , y1:m ) = p(x1:n | y1:m ) . = p(y1:m )

=

(B.69)

188

Appendix B: Generating Functionals for Finite Point Processes

Therefore, (B.67) is the PGFL of the Bayes posterior finite point process on X. Example: Intensity of Bayes posterior process as logarithmic derivative. Let x¯ ∈ X and ε ∈ C. The GFL of the Bayes posterior process is given by (B.67). The intensity function is, using (B.51) with k = 1,   d

Bayes 1 + εδx¯ (x)| y1:m  ε=0 dε d ∅,y1:m (1 + εδx¯ (x) , 0)  =  ε=0 dε

∅,y1:m (1, 0)   d log ∅,y1:m 1 + εδx¯ (x) , 0  . = ε=0 dε

¯ = m [1] (x)

(B.70)

Thus, the intensity of the posterior process can be calculated as a logarithmic derivative of ∅,y1:m .

B.9

Marginalizing a Bivariate Point Process

A multivariate finite point process is marginalized over one or more of its constituent processes by integrating those processes out of the problem. The marginalized processes’ influence remains, but is subsumed in the integral. Marginalization is encountered in the normalization constant (B.68) in Bayes Theorem. Bivariate processes are discussed here. The extension to the multivariate case is self-evident. Let ST (h, g) be the GFL of a bivariate process (S, T ) with event space E(X) × E(Y). Integrating the bivariate process over T gives the marginal process S, whose GFL S (h) is given by ST (h, 1); i.e., it is the bivariate GFL evaluated for g(y) = 1, y ∈ Y. Substituting into (B.57) and using (B.58) gives

ST (h, 1) = =

∞ ∞ n=0 ∞ n=0

Pr{n, m} ST (h, 1| n, m)  ∞ Pr{n} Pr{m|n} h(x1 ) · · · h(xn ) p(x1:n | n, m) dx 1:n ,

where p(x1:n | n, m) = the integrals gives

ST (h, 1) = = =

∞ n=0

∞ n=0

∞ n=0

m=0

m=0

 Ym

Xn

p(x1:n , y1:m | n, m) dy 1:m . Moving the sum over m inside 

Pr{n} X

n

Pr{n} X

n

Pr{n} Xn

h(x1 ) · · · h(xn ) h(x1 ) · · · h(xn )

∞ m=0

∞

m=0

Pr{m|n} p(x1:n | n, m) dx 1:n

p(x1:n , m| n) dx 1:n

h(x1 ) · · · h(xn ) p(x1:n | n) dx 1:n .

Appendix B: Generating Functionals for Finite Point Processes

189

From the definition (B.16) of the univariate PGF,

S (h) = ST (h, 1) .

(B.71)

Similarly, the GFL of the marginal process T is T (g) = ST (1, g).

B.10

Superposition of Bivariate Finite Processes

Superposing the processes onto a single common space—when that can be done— is often helpful. The GFL of the superposed bivariate process is derived in this section without assuming that the component processes S and T are independent. The general result may be new (it is stated without proof in [17]). Bivariate processes are discussed—the extension to multivariate processes is self-evident. Realizations of bivariate finite point processes are ordered pairs of sets in the Cartesian product of event spaces, E(X) × E(Y). The bivariate process is superposed by mapping a bivariate event, say (x1:n , y1:m ), to the set x1:n ∪ y1:m . The union comprises points in both X and Y. For the union to be a well-defined event for a univariate point process, it is necessary that X ≡ Y. This univariate process, when it is defined, is called the superposed process. Let ST (h, g) denote the joint GFL of a bivariate finite point process defined on the space X2 ≡ X × X. The indeterminates h and g are functions of x ∈ X. The superposed (univariate) process is a point process on X, and its GFL is

\[
\Psi_{\text{Super}}(h) = \Psi_{ST}(h, h) . \tag{B.72}
\]

To see this, substitute g = h in (B.57) and rearrange the double sum to obtain

\[
\Psi_{ST}(h, h) = \sum_{n,\, m \ge 0} \Pr\{n, m\}\, \Psi_{ST}(h, h \,|\, n, m) \tag{B.73}
\]
\[
\phantom{\Psi_{ST}(h, h)} = \sum_{k=0}^{\infty} \sum_{\substack{n + m = k \\ n,\, m \ge 0}} \Pr\{n, m\}\, \Psi_{ST}(h, h \,|\, n, m) , \tag{B.74}
\]

where, from (B.58), since Y = X,

\[
\Psi_{ST}(h, h \,|\, n, m) = \int_{X^{n+m}} h(x_1)\cdots h(x_n)\, h(y_1)\cdots h(y_m)\, p(x_{1:n}, y_{1:m} \,|\, n, m)\, dx_{1:n}\, dy_{1:m} . \tag{B.75}
\]

Since k = n + m, define the conditional probabilities π{n|k} = Pr{n, k − n}/Pr{k}, where Pr{k} = Σ_{n=0}^{k} Pr{n, k − n}, so that Σ_{n=0}^{k} π{n|k} = 1. Define the k-tuple ξ_{1:k} = (x_{1:n}, y_{1:m}) and the corresponding set ξ_{1:k} = x_{1:n} ∪ y_{1:m}. Interchanging in (B.74) the sum


over n + m and the integral over X^{n+m} ≡ X^k, and using π{n|k}, gives

\[
\Psi_{ST}(h, h) = \sum_{k=0}^{\infty} \Pr\{k\} \int_{X^k} h(\xi_1)\cdots h(\xi_k) \left\{ \sum_{n=0}^{k} \pi\{n|k\}\, p(\xi_{1:n}, \xi_{n+1:k} \,|\, n, k-n) \right\} d\xi_{1:k} ,
\]

where the sum in braces is the conditional probability given k of the ordered k-tuple ξ_{1:k}. The conditional probability of the set ξ_{1:k} is the sum

\[
p(\xi_{1:k} \,|\, k) = \sum_{\mu} \sum_{n=0}^{k} \pi\{n|k\}\, p\bigl(\xi_{\mu(1):\mu(n)}, \xi_{\mu(n+1):\mu(k)} \,\big|\, n, k-n\bigr) , \tag{B.76}
\]

where μ is a permutation of {1, . . . , k}. Define the symmetrized joint PDF on X^k by

\[
p_{\text{Symmetric}}(\xi_{1:k} \,|\, n, k) = \frac{1}{k!} \sum_{\mu} p\bigl(\xi_{\mu(1):\mu(n)}, \xi_{\mu(n+1):\mu(k)} \,\big|\, n, k-n\bigr) . \tag{B.77}
\]

It is invariant, or symmetric, under permutations of all k of its arguments even though the bivariate PDF is only semi-symmetric (B.55). Interchanging the sums in (B.76) and using (B.77) gives

\[
p(\xi_{1:k} \,|\, k) = k! \sum_{n=0}^{k} \pi(n|k)\, p_{\text{Symmetric}}(\xi_{1:k} \,|\, n, k) \equiv k!\, p_{\text{Super}}(\xi_{1:k} \,|\, k) . \tag{B.78}
\]

The probabilistic mixture p_Super(ξ_{1:k} | k) is a permutation invariant PDF on X^k. It uniquely determines the GFL Ψ_{ST}(h, h), which in turn uniquely characterizes a finite point process. The set probabilities of that process, (B.78), are those of the superposed process. Therefore, Ψ_Super(h) = Ψ_{ST}(h, h) is the GFL of the superposition.

The result (B.72) holds whether or not the processes are independent. If they are independent, then Ψ_Super(h) is the product of the GFLs of the processes. To see this, note that by independence

\[
p(x_{1:n}, y_{1:m} \,|\, n, m) = p_S(x_{1:n} \,|\, n)\, p_T(y_{1:m} \,|\, m) , \tag{B.79}
\]

and therefore, (B.75) factors as Ψ_{ST}(h, h | n, m) = Ψ_S(h | n) Ψ_T(h | m). Independence also implies that Pr{n, m} = Pr_S{n} Pr_T{m}. Substituting these two expressions into (B.73) gives

\[
\Psi_{\text{Super}}(h) = \sum_{n,\, m \ge 0} \Pr_S\{n\}\, \Pr_T\{m\}\, \Psi_S(h \,|\, n)\, \Psi_T(h \,|\, m) \equiv \Psi_S(h)\, \Psi_T(h) . \tag{B.80}
\]

Independence gives a simple form for the GFL of a superposed process, but the permutation invariant PDF of the process is found by symmetrization, as was done in the general result. Special cases are aesthetically pleasing, e.g., the superposition


of independent Poisson processes is a Poisson process whose intensity function is the sum of the intensities of the superposed processes.

Superposing point processes is analogous to summing multivariate random variables. To see this, compare the GFLs (B.72) and (B.80) with, respectively, the GFs (A.25) and (A.26).

Example: Multiple Bernoulli and Multi-Bernoulli. The joint GFL of n ≥ 1 independent, but not identical, Bernoulli processes defined on X_k, k = 1, . . . , n, is

\[
\Psi_{\text{MultipleB}}(h_1, \ldots, h_n) = \prod_{k=1}^{n} \left( a_k + b_k \int_{X_k} h_k(x_k)\, p_k(x_k)\, dx_k \right) , \tag{B.81}
\]

where the indeterminate functions h_1, . . . , h_n and the process PDFs p_k(·) are defined on X_k, respectively, and the Bernoulli probabilities are b_k and a_k = 1 − b_k. The processes cannot be superposed unless the spaces are identical (or, alternatively, they are mapped onto a common space X). Suppose that X_k = X for all k. In that case, the GFL of the superposed process on X is found by setting h_1 = · · · = h_n ≡ h, so that

\[
\Psi_{\text{MultiBernoulli}}(h) = \prod_{k=1}^{n} \left( a_k + b_k \int_{X} h(x)\, p_k(x)\, dx \right) . \tag{B.82}
\]

The GFL confirms analytically that the multi-Bernoulli process is the superposition of multiple Bernoulli processes, provided they are defined on the same space X.
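Evaluating (B.82) at the constant function h_z(x) ≡ z gives the cardinality GF G_N(z) = ∏_k (a_k + b_k z), so the distribution of the number of points is obtained by multiplying n linear polynomials. The sketch below, not part of the text, does this with NumPy; the Bernoulli probabilities are made-up values used only for illustration.

```python
# Sketch: cardinality distribution of a multi-Bernoulli process from its GF,
# G_N(z) = prod_k (a_k + b_k z), obtained from (B.82) with h(x) = z.
import numpy as np

b = np.array([0.9, 0.7, 0.4])                    # illustrative existence probabilities b_k
pmf = np.array([1.0])                            # coefficients of the empty product
for bk in b:
    pmf = np.convolve(pmf, [1.0 - bk, bk])       # multiply by (a_k + b_k z)

# pmf[n] = P{N = n}; the coefficients sum to one.
print(pmf, pmf.sum())
```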

B.11 Sequential Bayesian Estimation—Palm Processes

Palm and reduced Palm processes are Bayesian inference problems in which the conditioning event is a partially observed realization of a known point process. The observed points are the conditioning set, and the unobserved points, if any, comprise the point process of interest. The unobserved point process is called the reduced Palm process. Its GFL is derived in this section.

Partial realizations arise in sequential statistical estimation problems in which points are "extracted" one at a time from the Bayesian posterior process. After the first point is estimated (or extracted), the next point to be estimated is in principle conditioned on the estimated first point, etc. In multiple object tracking applications, for example, the points are object states.

Let x_{1:n} ⊂ X be a partially observed realization of a univariate finite point process on X whose GFL is Ψ(h). Realizations which contain x_{1:n} as a subset define the event space of the Palm process. Deleting the points x_{1:n} from the realizations gives the event space of the reduced Palm process. The GFL of the reduced Palm process corresponding to x_{1:n} is the normalized derivative

\[
\Psi_{\text{Palm}}(h \,|\, x_{1:n}) = \frac{\Psi_{x_{1:n}}(h)}{\Psi_{x_{1:n}}(1)} . \tag{B.83}
\]


To see this, let x^c_{1:n^c}, n^c ≥ 0, be a realization of the reduced Palm process. The points are distinct with probability one, and they are distinct from x_{1:n} with probability one. Differentiating both sides of (B.83) with respect to x^c_{1:n^c} and evaluating at h = 0 gives

\[
\Psi_{\text{Palm}}^{x^c_{1:n^c}}(0 \,|\, x_{1:n}) = \frac{\Psi_{x_{1:n} \cup x^c_{1:n^c}}(0)}{\Psi_{x_{1:n}}(1)} . \tag{B.84}
\]

The numerator of the right-hand side is the probability density of the combined event x_{1:n} ∪ x^c_{1:n^c}. The denominator is the factorial moment m^{[n]}(x_{1:n}), which from (B.48) is the marginal probability p(x_{1:n}) of the set x_{1:n}. Thus,

\[
\Psi_{\text{Palm}}^{x^c_{1:n^c}}(0 \,|\, x_{1:n}) = \frac{p\bigl(x_{1:n} \cup x^c_{1:n^c}\bigr)}{p(x_{1:n})} = p\bigl(x^c_{1:n^c} \,\big|\, x_{1:n}\bigr) . \tag{B.85}
\]

The last step (Bayes Theorem) establishes (B.83). The GF of the number of points in the reduced Palm process is the GFL on the constant functions h_z(x) = z for all x ∈ X, that is, G_{N|x_{1:n}}(z) = Ψ_Palm(h_z | x_{1:n}).

Palm conditioning with bivariate processes works the same way. Let Ψ(h, g) denote the joint GFL, and denote the observed points by (x_{1:n}, y_{1:m}). Then

The last step (Bayes Theorem) establishes (B.83). The GF of the number of points in the reduced Palm process is the GFL on the constant functions h z (x) = z for all x ∈ X, that is, G N |x1:n (z) = Palm (h z |x1:n ). Palm conditioning with bivariate processes works the same way. Let (h, g) denote the joint GFL, and denote the observed points by (x1:n , y1:m ). Then

Palm (h, g| x1:n , y1:m ) =

x1:n ,y1:m (h, g)

x1:n ,y1:m (1, 1)

(B.86)

is the GFL of the reduced bivariate Palm process.

Examples: Reduced Palm for Poisson and binomial processes. Given one observed point, say x ∈ X, the derivative of the Poisson GFL (B.11) is Ψ^{\{x\}}_Poisson(h) = λ p(x) Ψ_Poisson(h). Substituting into (B.83) gives Ψ_Palm(h | {x}) = Ψ_Poisson(h). For observed subsets x_{1:n} of X, it follows that

\[
\Psi_{\text{Palm}}(h \,|\, x_{1:n}) = \Psi_{\text{Poisson}}(h) \tag{B.87}
\]

and, thus, the reduced Palm process of a Poisson process is identical to the Poisson process. Said differently, partially observed realizations of a Poisson process do not provide information about the locations of unobserved points in the process.

Differentiating the GFL of the binomial process (B.7) with n trials gives

\[
\Psi_{\text{Binomial}}^{\{x\}}(h) = n\beta\, p(x) \left( \alpha + \beta \int_X h(x')\, p(x')\, dx' \right)^{n-1} .
\]

Normalizing the derivative as done in (B.83) shows that the Palm process is the same binomial point process but with n − 1 trials. This result accords well with intuition.

B.12 Pair Correlation Functions

Bi- and multivariate finite point processes are often correlated in surprising and unanticipated ways. Bayes conditional point processes are prominent examples—


independence assumptions that hold a priori are not preserved in the posterior probability distribution. Quantitative insight into the point-to-point correlation is sometimes gleaned from the pair correlation function. For distinct points x_1 and x_2 in X, it is defined as the ratio of factorial moments [18]

\[
\rho(x_1, x_2) = \frac{m^{[2]}(x_1, x_2)}{m^{[1]}(x_1)\, m^{[1]}(x_2)} . \tag{B.88}
\]

The pair correlation of a Poisson process is ρ(x_1, x_2) = 1. From (B.44), the first-order moments are m^{[1]}(x_i) = λ p(x_i), i = 1, 2. Using (B.47), the second factorial moment is m^{[2]}(x_1, x_2) = λ² p(x_1) p(x_2). Substituting into (B.88) gives ρ(x_1, x_2) = 1.

Pair correlation should not be confused with the concept of statistical correlation in time series analysis. They are not the same concepts. They have different numerical ranges and are scaled differently: pair correlation is nonnegative and can be larger than one; it is identically equal to one for Poisson processes, which are often described as "completely spatially random" [19]; and it is identically zero if and only if the process has at most one point with probability one. The pair correlation function of a point process is used to compare it to the benchmark Poisson process having exactly the same intensity function. For instance, if ρ(x_1, x_2) > 1, then a pair of points is more likely to occur (jointly) at x_1 and x_2 than in realizations of a Poisson process of the same intensity [18, p. 31].

Higher order multi-point correlation functions are defined as

\[
\rho(x_1, \ldots, x_k) = \frac{m^{[k]}(x_1, \ldots, x_k)}{m^{[1]}(x_1) \cdots m^{[1]}(x_k)} , \tag{B.89}
\]

where the k points in x_{1:k} are distinct [18]. The multi-point correlation function is identically equal to one if the process is Poisson.

Example: Binomial processes. Given two distinct points x_1 and x_2 in X, the derivative of the binomial GFL (B.7) is

\[
\Psi_{\text{Binomial}}^{\{x_1, x_2\}}(h) = n(n-1)\beta^2 p(x_1)\, p(x_2) \left( \alpha + \beta \int_X h(x')\, p(x')\, dx' \right)^{n-2} .
\]

Evaluating at h = 1 gives m^{[2]}(x_1, x_2) = n(n−1)β² p(x_1) p(x_2). The product of the first moments is m^{[1]}(x_1) m^{[1]}(x_2) = n²β² p(x_1) p(x_2). Dividing gives the pair correlation function ρ(x_1, x_2) = 1 − 1/n. Since ρ(x_1, x_2) < 1 for all x_1 and x_2, a pair of points is less likely to occur jointly at x_1 and x_2 than in realizations of a Poisson process of the same intensity, namely, λ = nβp(x). This is counterintuitive for small n since the points in both processes are IID.


References

1. Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes, Vol. II: General Theory and Structure. 2nd Ed. Springer, 2008.
2. Alan Karr. Point Processes and Their Statistical Inference. CRC Press, 1991.
3. Donald E. Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 2nd Ed. Addison-Wesley, 1997.
4. Roy L. Streit. Intensity filters on discrete spaces. IEEE Transactions on Aerospace and Electronic Systems, 50(2):1590–1599, 2014.
5. Ronald P. S. Mahler. Advances in Statistical Multisource-Multitarget Information Fusion. Artech House, 2014.
6. Christoph Degen. Probability Generating Functions and Their Application to Target Tracking. PhD thesis, University of Bonn, Germany, 2017.
7. Roy L. Streit. How I learned to stop worrying about a thousand and one filters and love analytic combinatorics. In 2017 IEEE Aerospace Conference, Big Sky, Montana, pages 1–21, 2017.
8. José E. Moyal. The general theory of stochastic population processes. Acta Mathematica, 108:1–31, 1962.
9. Maurice S. Bartlett and David G. Kendall. On the use of the characteristic functional in the analysis of some stochastic processes occurring in physics and biology. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 47, pages 65–76, 1951.
10. Bernard Lindgren. Statistical Theory. 4th Ed. Chapman and Hall/CRC, 1993.
11. Gerald Folland. Real Analysis: Modern Techniques and Their Applications. 2nd Ed. John Wiley, 1999.
12. Roy L. Streit. A technique for deriving multitarget intensity filters using ordinary derivatives. J. Adv. Inf. Fusion, 9(1):3–12, 2014.
13. Michael J. Lighthill. An Introduction to Fourier Analysis and Generalised Functions. Cambridge University Press, 1958.
14. Christoph Degen, Roy L. Streit, and Wolfgang Koch. On the functional derivative with respect to the Dirac delta. In 2015 Sensor Data Fusion: Trends, Solutions, Applications (SDF), pages 1–8, 2015.
15. Andreas Griewank, Lutz Lehmann, Hernan Leovey, and Marat Zilberman. Automatic evaluations of cross-derivatives. Mathematics of Computation, 83(285):251–274, 2014.
16. Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes, Vol. I: Elementary Theory and Methods. Springer, 2003.
17. Roy L. Streit. Poisson Point Processes: Imaging, Tracking, and Sensing. Springer, 2010.
18. Jesper Møller and Rasmus Plenge Waagepetersen. Statistical Inference and Simulation for Spatial Point Processes. CRC Press, 2003.
19. John Kingman. Poisson Processes. Clarendon Press, 1993.

Appendix C Mathematical Methods

“Theories thus become instruments, not answers to enigmas, in which we can rest.” William James Pragmatism (1907)

Abstract Several widely used techniques and methods in control theory, optimization, applied mathematics, and physics are reviewed. Topics include Dirac deltas and delta trains, the calculus of variations, mixed and cross-derivatives, the complex step method, and automatic differentiation.

Keywords Dirac delta · Dirac delta train · Calculus of variations · Variational derivative · Cross-derivatives · Complex step method · Automatic differentiation

C.1 Complex Variables

This section reviews basic results from the theory of complex variables that are used or mentioned in the text. Many textbooks are available for one complex variable, e.g., [1] is a classic. Books on several complex variables are less widely known. Book [2] influenced the presentation here, but more recent books are now available.

C.1.1 One Variable

The univariate function f : C → C is analytic at a given point a ∈ C if and only if, in a neighborhood of a, f(ξ) can be written as a convergent power series,


\[
f(\xi) = \sum_{n=0}^{\infty} f_n(a)\, (\xi - a)^n , \qquad f_n(a) \in \mathbb{C} . \tag{C.1}
\]

If f(ξ) is analytic at a, the series converges for every ξ inside a disk of radius R > 0 about the point a, namely D(a; R) ≡ {ξ : |ξ − a| < R}. The largest value of R for which the series converges in D(a; R) is the radius of convergence (ROC). The derivatives of f(ξ) exist for all orders at every point in D(a; R). Since f is analytic, they are found by differentiating (C.1) term-by-term. For example, the first derivative,

\[
f'(\xi) \equiv f^{(1)}(\xi) = \sum_{n=1}^{\infty} n\, f_n(a)\, (\xi - a)^{n-1} , \tag{C.2}
\]

is analytic at every point ξ in D(a; R). Evaluating the kth derivative at ξ = a gives

\[
f_k(a) = \frac{1}{k!}\, f^{(k)}(a) , \qquad k \ge 0 , \tag{C.3}
\]

which shows that (C.1) is the Taylor series of f expanded about a.

The univariate Cauchy integral formula states that, for every point z ∈ D(a; R),

\[
f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{\xi - z}\, d\xi , \tag{C.4}
\]

where C is any circle that lies wholly inside the disk D(a; R) and encloses z. Differentiating both sides of (C.4) k times with respect to z gives

\[
f^{(k)}(z) = \frac{k!}{2\pi i} \oint_C \frac{f(\xi)}{(\xi - z)^{k+1}}\, d\xi . \tag{C.5}
\]

Interchanging the order of the derivative and the integral is justified, as it is a consequence of the analyticity of f in the disk D(a; R).

Cauchy's integral formula is equivalent to a statement about Fourier trigonometric series. To see this, let a = z = 0 for simplicity, and parameterize the circle by setting ξ = r e^{iθ}, where θ ∈ [0, 2π) and r < R. Substituting dξ = i r e^{iθ} dθ into (C.5) gives

\[
f^{(k)}(0) = \frac{k!}{2\pi r^k} \int_0^{2\pi} f\bigl(r e^{i\theta}\bigr)\, e^{-ik\theta}\, d\theta . \tag{C.6}
\]

The kth Fourier series coefficient of the function g(θ ) ≡ f (r eiθ ) is, therefore, equal to Cauchy’s integral.
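Formula (C.6) is also the basis for extracting Taylor (and hence probability) coefficients numerically: sampling f on a circle of radius r and taking a discrete Fourier transform approximates f^(k)(0)/k!. A minimal NumPy sketch follows; it is not part of the text, and the test function, radius, and sample count are illustrative choices.

```python
# Sketch: approximate Taylor coefficients f^(k)(0)/k! via (C.6), i.e., by an FFT of
# samples of f on a circle. The test function, radius, and sample count are illustrative.
import math
import numpy as np

f = np.exp                      # stand-in analytic function; exact coefficients are 1/k!
M, r = 64, 1.0                  # number of samples on the circle and its radius
theta = 2 * np.pi * np.arange(M) / M

# DFT of the samples: coeffs[k] ~ r^k * f^(k)(0) / k!  (trapezoidal rule on (C.6)/k!)
coeffs = np.fft.fft(f(r * np.exp(1j * theta))) / M / r ** np.arange(M)

print([round(coeffs[k].real, 6) for k in range(5)])        # ~ 1, 1, 0.5, 0.166667, 0.041667
print([round(1 / math.factorial(k), 6) for k in range(5)])
```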


C.1.2 Several Variables

Let a ≡ (a_1, . . . , a_m) ∈ C^m and ξ ≡ (ξ_1, . . . , ξ_m) ∈ C^m, where m ≥ 1. The multivariate complex-valued function f : C^m → C is analytic at a ∈ C^m if and only if, in a neighborhood of a, it can be written as a convergent multivariate power series,

\[
f(\xi) = \sum_{n_1=0}^{\infty} \cdots \sum_{n_m=0}^{\infty} f_{n_1,\ldots,n_m}(a)\, (\xi_1 - a_1)^{n_1} \cdots (\xi_m - a_m)^{n_m} , \tag{C.7}
\]

where the coefficients f_{n_1,…,n_m}(a) ∈ C. This is equivalent to saying that f(ξ) is the uniform limit of multivariate polynomials in the complex variables ξ_1, . . . , ξ_m in a sufficiently small neighborhood of a. For every i, the function f(. . . , ξ_i, . . .) is analytic in ξ_i, given fixed values of the variables {ξ_j : j ≠ i}, and it has a positive radius of convergence, R_i > 0. The multivariate function f(ξ) is analytic inside what is termed the polydisc D(1:m) ≡ D(a_1; R_1) × · · · × D(a_m; R_m) ⊂ C^m. The converse is also true [2, Hartogs' Thm.]; that is, if the function is analytic in each variable separately on the disk D(a_i; R_i), then it is jointly analytic⁶ on the polydisc D(1:m). This result is important in AC because it means that mixed derivatives and iterated contour integrals can be interchanged freely. The generating functions encountered in this book do not stray outside the boundary of the polydisc.

Let z ≡ (z_1, . . . , z_m) ∈ D(1:m). The multivariate Cauchy integral formula [2] is

\[
f(z) = \frac{1}{(2\pi i)^m} \oint_{C_1} \cdots \oint_{C_m} \frac{f(\xi_1, \ldots, \xi_m)}{(\xi_1 - z_1) \cdots (\xi_m - z_m)}\, d\xi_1 \cdots d\xi_m , \tag{C.8}
\]

where the C_i are circles that lie, respectively, inside the disks D(a_i; R_i) and encircle, respectively, the points z_i. By Hartogs' Theorem, the function f(ξ) is analytic, and the magnitude of the integrand is bounded over the compact domain of integration, so Fubini's Theorem holds. Consequently, the multiple integral in (C.8) is equivalent to m iterated contour integrals that can be evaluated in any order. Differentiating both sides of (C.8) gives the Cauchy integral formula for mixed partial derivatives,

\[
f^{(k_1,\ldots,k_m)}(z) = \frac{k_1! \cdots k_m!}{(2\pi i)^m} \oint_{C_1} \cdots \oint_{C_m} \frac{f(\xi_1, \ldots, \xi_m)}{(\xi_1 - z_1)^{k_1+1} \cdots (\xi_m - z_m)^{k_m+1}}\, d\xi_1 \cdots d\xi_m , \tag{C.9}
\]

where the integers k_i ≥ 0, i = 1, . . . , m. As in the univariate case, interchanging the order of derivatives and integrals is a consequence of the analyticity of f in the polydisc.

⁶ The converse is false for real variables. A two-variable counter-example is g(x, y) = xy/(x² + y²). A three-dimensional plot of g for (x, y) ∈ [−1, 1] × [−1, 1] is visually convincing.


Fig. C.1 Test sequence h_k(s) = N(s | x, k⁻²), k ≥ 1, for the Dirac delta at x = 0. Depicted functions are for k = 1, 2, 3, 5, and 11.

C.2 Dirac Deltas and Trains of Dirac Deltas

The Dirac delta is a rigorously defined concept that is used frequently in digital signal processing (DSP), applied mathematics, and physics. Let δ_x(·) denote the Dirac delta at a point x ∈ R. It is an operator that is defined by the property

\[
\delta_x(f) = f(x) \tag{C.10}
\]

for continuous functions f(x). By insisting that operators are integrals, one might try to interpret the Dirac delta as a function having the property that

\[
\int_{-\infty}^{\infty} \delta_x(s)\, f(s)\, ds = f(x) . \tag{C.11}
\]

However, no properly defined function can satisfy (C.11) for every function f. For this reason, the Dirac delta is not referred to as the Dirac delta function. In DSP and other fields, the Dirac delta at x is defined as the limit of the integrals of a sequence of functions called test functions, h_k(·), k = 1, 2, . . . , chosen so that

\[
\lim_{k \to \infty} \int_{-\infty}^{\infty} h_k(s)\, f(s)\, ds = f(x) . \tag{C.12}
\]


Many test sequences satisfy this requirement. A widely used choice is to take h_k(s) equal to the PDF of a Gaussian random variable with mean x and variance k⁻², that is, h_k(s) = N(s | x, k⁻²). Figure C.1 depicts several such Gaussian test functions for the Dirac delta at x = 0. The proof of the limit (C.12) for continuous f is omitted. A more insightful asymptotic expansion is

\[
\int_{-\infty}^{\infty} N\bigl(s \,|\, x, k^{-2}\bigr)\, f(s)\, ds = f(x) + \frac{f''(x)}{2k^2} + O\bigl(k^{-4}\bigr) . \tag{C.13}
\]

This expansion assumes that f(s) is twice differentiable at s = x and satisfies certain other conditions. The derivation is a standard application of Laplace's method (also known as the saddle point method) and is omitted. Higher order terms in the expansion can also be computed if desired [3]. The test sequence method for Dirac deltas and other generalized functions is discussed in many places, often tailored to specific communities of interest. Book [4] is a classic treatment by a distinguished aero-acoustician.

Dirac delta trains. A train of weighted Dirac deltas is a sum of the form

\[
\delta_{x_{1:n}}(s) = \sum_{j=1}^{n} w_j\, \delta_{x_j}(s) , \tag{C.14}
\]

where x_1, . . . , x_n are distinct points in R and the weights w_j ∈ C. For each point x_j, let {h_{kj}(·)}_{k=1}^{∞} be a test sequence for the Dirac delta δ_{x_j}(·). A test sequence for the delta train is

\[
h_k^{\text{train}}(s) \equiv \sum_{j=1}^{n} w_j\, h_{kj}(s) . \tag{C.15}
\]

A widely used choice is h_{kj}(s) = N(s | x_j, k⁻²). The limit

\[
\lim_{k \to \infty} \int_{-\infty}^{\infty} h_k^{\text{train}}(s)\, f(s)\, ds = \sum_{j=1}^{n} w_j\, f(x_j) \tag{C.16}
\]

generalizes (C.12).

Dirac delta in two or more dimensions. The Dirac delta is defined at a point (x, y) ∈ R² by the property that

\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \delta_{(x,y)}(s, t)\, f(s, t)\, ds\, dt = f(x, y) . \tag{C.17}
\]

A bivariate test sequence φ_k(·), k = 1, 2, 3, . . . , is defined analogously to the one-dimensional case, but the test functions are bivariate. A common choice is

\[
\varphi_k(s, t) = h_k(s)\, g_k(t) , \qquad (s, t) \in \mathbb{R}^2 , \tag{C.18}
\]


where {h_k(s)}_{k=1}^{∞} and {g_k(t)}_{k=1}^{∞} are test sequences for univariate Dirac deltas at x and y, respectively. Regardless of how the test sequence φ_k is defined, the bivariate Dirac delta is the product of univariate deltas,

\[
\delta_{(x,y)}(s, t) = \delta_x(s)\, \delta_y(t) . \tag{C.19}
\]

The product (C.19) is well defined even when x = y because the univariate deltas δ_x(s) and δ_y(t) are defined on different copies of R. Multivariate Dirac deltas at points in R^m, m ≥ 2, are defined in the same manner and satisfy the multivariate analog of (C.19). Trains of multivariate Dirac deltas are defined as in the one-dimensional case and satisfy the multivariate analog of (C.16).

Scholium. The pointwise limit of a test sequence for a one-dimensional delta is

\[
\tilde{\delta}_x(s) = \begin{cases} 0 & \text{if } s \ne x \\ \infty & \text{if } s = x . \end{cases} \tag{C.20}
\]

This limit is used in [5, p. 693] to define the Dirac delta. The limit (C.20) is misleading because it suggests—incorrectly—that interchanging the limit and the integral is mathematically valid. This is not the case, however, since

\[
\lim_{k\to\infty} \int h_k\, f \;\ne\; \int \Bigl(\lim_{k\to\infty} h_k\Bigr) f = \int \tilde{\delta}_x\, f =
\begin{cases} 0 & \text{if the integral is Lebesgue} \\ \infty & \text{if the integral is Riemann} \end{cases} \tag{C.21}
\]

for continuous functions f. The message is clear: the integrals in (C.12) must be evaluated before the test sequence limit is taken.
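The limit (C.12) and the correction term in (C.13) are easy to check numerically: integrate a Gaussian test function against a smooth f for increasing k and compare with f(x) + f''(x)/(2k²). The sketch below is not part of the text; the choice f(s) = cos(s) and the evaluation point are illustrative.

```python
# Sketch: numerical check of (C.12)-(C.13) with Gaussian test functions
# h_k(s) = N(s | x, k^{-2}). The test function f and the point x are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = np.cos                    # f(s) = cos(s), so f''(x) = -cos(x)
x = 0.3

for k in (1, 2, 4, 8):
    integral, _ = quad(lambda s: norm.pdf(s, loc=x, scale=1.0 / k) * f(s),
                       -np.inf, np.inf)
    expansion = f(x) - np.cos(x) / (2 * k**2)     # f(x) + f''(x)/(2k^2)
    print(k, integral, expansion)                  # the two columns agree to O(k^-4)
```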

C.3 Calculus of Variations

Functional differentiation is part of the calculus of variations, a collection of established methods in optimal control and physics. The brachistochrone problem (1697) is famous as the first problem in the field. A lively discussion is given in [6]. The basic ideas are presented here.

A bead slides under the force of gravity and without friction down a wire (lying in a vertical plane). The bead starts with zero velocity at the point (L, L) in the first quadrant and ends at the origin (0, 0). The bead's descent time from start to finish depends on the shape of the wire, which is described by the function y = y(x). The question is, "What shape gives the fastest time?" From elementary physics, the descent time is given by the integral

\[
J(y) = \int_0^L \sqrt{\frac{1 + (y')^2}{2 g y}}\, dx , \tag{C.22}
\]


where y' = y'(x) = dy/dx and g is the acceleration due to gravity. J(y) is a functional because it evaluates to a real number for any function y(·). The brachistochrone problem is to find the shape y(·) for which J(y) is a minimum.

In one traditional telling of the story, Isaac Newton learned about the problem one evening and solved it overnight using the following method. Let y_0(·) be the optimal shape, with y_0(L) = L and y_0(0) = 0. Let a shape variation, denoted by η(·), be given. For arbitrary real numbers ε the variation must be such that the perturbed shape y_0 + εη satisfies the boundary conditions, hence η(0) = η(L) = 0. If the shape y_0(·) is optimal, then J(y_0 + εη) ≥ J(y_0) for all ε. The variational derivative of J at y_0 with respect to the variation η is defined by

\[
\frac{d}{d\eta} J(y_0) \equiv \frac{d}{d\varepsilon} J(y_0 + \varepsilon\,\eta)\Big|_{\varepsilon=0} . \tag{C.23}
\]

A necessary but not sufficient condition for J(·) to have a local minimum at y_0 is

\[
\frac{d}{d\varepsilon} J(y_0 + \varepsilon\,\eta)\Big|_{\varepsilon=0} = 0 \tag{C.24}
\]

for every possible variation η(·). From the fact that (C.24) holds for all variations η(·), a differential equation is derived for y_0(·). Solving the equation gives the optimal shape (a cycloid). See [6] for further details.

In the language of the calculus of variations, functional derivatives of the GFL are limits of variational derivatives of exactly the same kind as (C.24). The variations of interest in GFLs are the test functions that define a Dirac delta [4].
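The definition (C.23) can be evaluated numerically for any functional that admits quadrature. The sketch below is not part of the text; it uses a toy functional J(y) = ∫₀¹ ((y')² + y²) dx rather than the brachistochrone integrand (which is singular at the lower endpoint), and compares a finite-difference estimate of dJ(y₀ + εη)/dε at ε = 0 against the analytic Gateaux derivative. The trial shape and variation are illustrative.

```python
# Sketch: numerical variational derivative (C.23) for a toy functional
# J(y) = \int_0^1 ((y')^2 + y^2) dx.  The trial shape y0 and variation eta are illustrative.
import numpy as np

x = np.linspace(0.0, 1.0, 2001)

def integrate(f):
    """Trapezoidal rule on the fixed grid x."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

def J(y):
    dy = np.gradient(y, x)
    return integrate(dy**2 + y**2)

y0 = x * (1.0 - x)                  # trial shape with y0(0) = y0(1) = 0
eta = np.sin(np.pi * x)             # admissible variation, eta(0) = eta(1) = 0

eps = 1e-6                          # central difference in epsilon, cf. (C.23)
dJ_numeric = (J(y0 + eps * eta) - J(y0 - eps * eta)) / (2 * eps)

# Analytic Gateaux derivative of this J: \int (2 y0' eta' + 2 y0 eta) dx.
dJ_analytic = integrate(2 * np.gradient(y0, x) * np.gradient(eta, x) + 2 * y0 * eta)

print(dJ_numeric, dJ_analytic)      # the two values agree to quadrature accuracy
```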

C.4 Mixed and Cross-Derivatives

The cross-derivative of a multivariate probability generating function is proportional to the probability of a specified event. The mathematical form of the cross-derivative is typically a large and complicated expression whose terms are meaningful probabilities in their own right—they correspond one-to-one to the probabilities of a set of mutually exclusive and exhaustive events whose union is the specified event.

The "multi-index" α = (α_1, . . . , α_m), m ≥ 1, is an ordered m-tuple, or list, of nonnegative integers. In the matrix-vector context, this notation is abused by writing α as a vector in R^m ≡ R^{m×1}. Define |α| = α_1 + · · · + α_m. The mixed partial derivative with index α of a differentiable function f : C^m → C is

\[
f^{\alpha}(x) = \frac{d^{|\alpha|}}{dx_1^{\alpha_1} \cdots dx_m^{\alpha_m}}\, f(x) , \tag{C.25}
\]

where x = (x_1, . . . , x_m).


Leibniz product rule for general mixed derivatives. Let f_r : C^m → C be given differentiable functions, r = 0, 1, . . . , n, and let

\[
f(x) = f_0(x) \prod_{r=1}^{n} f_r(x) . \tag{C.26}
\]

The Leibniz rule for the mixed derivative of order α of f(x) is

\[
f^{\alpha} = \sum_{\beta_1 + \cdots + \beta_n \le \alpha} \binom{\alpha}{\beta_1 \cdots \beta_n}\, f_0^{\alpha - \beta_1 - \cdots - \beta_n}\, f_1^{\beta_1} \cdots f_n^{\beta_n} , \tag{C.27}
\]

where β_1, . . . , β_n are multi-indices of the same length as α. The multinomial coefficient is defined by

\[
\binom{\alpha}{\beta_1 \cdots \beta_n} = \frac{\alpha!}{\beta_1! \cdots \beta_n!\, (\alpha - \beta_1 - \cdots - \beta_n)!} , \tag{C.28}
\]

where the multi-index factorial is defined component-wise, e.g., α! = α_1! · · · α_m!.

Leibniz rule for cross-derivatives. A cross-derivative is a mixed derivative with multi-index α ≤ ℓ = (1, . . . , 1) ≡ 1_m, where multi-index inequalities are defined component-wise. In general, there are 2^m cross-derivatives of f(x). The cross-derivative of f with multi-index ℓ = 1_m is

\[
f^{\ell}(x) = \frac{d^m}{dx_1 \cdots dx_m}\, f(x) . \tag{C.29}
\]

The Leibniz product rule applied to the function f(x) in (C.26) gives

\[
f^{\ell}(x) = \sum_{\vartheta_1 + \cdots + \vartheta_n \le \ell} f_0^{\ell - \vartheta_1 - \cdots - \vartheta_n}(x)\, f_1^{\vartheta_1}(x) \cdots f_n^{\vartheta_n}(x) , \tag{C.30}
\]

where ϑ_t ≡ (θ_{1t}, . . . , θ_{mt}) are multi-indices of length m. It is convenient for the purposes of this book to write the multi-index ϑ_t as the m×1 column vector θ_t, so that θ_t ≡ (θ_{1t}, . . . , θ_{mt})^T ∈ N^{m×1}, t = 1, . . . , n. Define the m × n matrix θ = [θ_{st}] by standing the column vectors θ_t side-by-side; explicitly, in partitioned form,

\[
\theta = \bigl[\, \theta_1 \,|\, \cdots \,|\, \theta_n \,\bigr] . \tag{C.31}
\]

The multi-index constraint in (C.30) implies trivially that the index θ_{st} is either zero or one, for all s and t, and that the row sums of θ are either zero or one. The column sums of θ are at most m. Let Θ_{m×n} denote the set of m × n matrices θ corresponding to the terms in the multi-index sum (C.30). In this notation, the expression

\[
f^{\ell}(x) = \sum_{\theta = [\theta_1 | \cdots | \theta_n] \in \Theta_{m \times n}} f_0^{\ell - \theta_1 - \cdots - \theta_n}(x)\, f_1^{\theta_1}(x) \cdots f_n^{\theta_n}(x) \tag{C.32}
\]

is equivalent to the cross-derivative (C.30).

Fundamental example. Functions that frequently arise in applications of AC to tracking are deceptively simple looking:

\[
f_0(x) = \exp\Bigl( a_0 + \sum_{s=1}^{m} a_s x_s \Bigr) \tag{C.33}
\]
\[
f_t(x) = b_{0t} + \sum_{s=1}^{m} b_{st}\, x_s , \qquad t = 1, \ldots, n , \tag{C.34}
\]

where the coefficients a_s and b_{st}, for s = 0, 1, . . . , m and t = 1, . . . , n, are arbitrarily specified complex numbers. The cross-derivative of their product is a special case of (C.30). Surprisingly, the cross-derivative is an NP-hard calculation.

For t ≥ 1, f_t(x) is a linear function of x, so the cross-derivative f_t^{θ_t}(x) ≡ 0 if the column vector θ_t has more than one component equal to one. Consequently, with the functions (C.33)–(C.34), the row and column sums of any matrix θ ∈ Θ_{m×n} are at most one. The additional constraint reduces the number of terms in the general sum (C.30). For the remainder of this section, the set Θ_{m×n} is defined to be the set of m × n {0, 1}-matrices with row sums and column sums no greater than 1. For k = 0, 1, . . . , let Θ(k) denote the subset of matrices in Θ_{m×n} with exactly k columns that sum to one.

For instance, let m = 2 and n = 3. Enumeration reveals that there are 13 terms in the sum (C.30), each of which corresponds to a matrix θ. Then, in this example,

\[
\Theta(0) = \left\{ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \right\} \tag{C.35a}
\]
\[
\Theta(1) = \left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right\} \tag{C.35b}
\]
\[
\Theta(2) = \left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \right\} . \tag{C.35c}
\]

The subsets Θ(k) partition Θ_{2×3}, that is, Θ_{2×3} = Θ(0) ∪ Θ(1) ∪ Θ(2). For θ ∈ Θ(k), k ≥ 1, let the set I(θ) = {i_1, . . . , i_k} denote the indices of the columns of θ that sum to one, and let J(θ) = {j_1, . . . , j_{n−k}} be the remaining indices of columns that sum to zero. For k = 0, I(θ) ≡ ∅ and J(θ) ≡ {1, 2, . . . , n}. Then,

\[
I(\theta) \cap J(\theta) = \emptyset \quad \text{and} \quad I(\theta) \cup J(\theta) = \{1, \ldots, n\} \tag{C.36}
\]

for all k ≥ 0. For t ∈ I(θ), define m_θ(t) to be the index of the (unique) nonzero row of column t. With this definition, the matrix θ has the entry θ_{m_θ(t),t} = 1.
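The sets Θ(k) are easy to enumerate mechanically, which is a useful sanity check for small m and n. The sketch below, not part of the text, builds Θ_{m×n} as all {0,1} matrices with row and column sums at most one, groups them by k, and recovers the 13 matrices of the m = 2, n = 3 example; the function name is an illustrative choice.

```python
# Sketch: enumerate the index matrices Theta(k) for the m = 2, n = 3 example.
from itertools import product

def theta_sets(m, n):
    """Group {0,1} m-by-n matrices with row/column sums <= 1 by k = number of unit columns."""
    sets = {}
    for flat in product((0, 1), repeat=m * n):
        theta = [flat[i * n:(i + 1) * n] for i in range(m)]          # m rows of length n
        row_ok = all(sum(row) <= 1 for row in theta)
        col_sums = [sum(theta[s][t] for s in range(m)) for t in range(n)]
        if row_ok and all(c <= 1 for c in col_sums):
            sets.setdefault(sum(col_sums), []).append(theta)
    return sets

sets = theta_sets(2, 3)
print({k: len(v) for k, v in sets.items()})    # {0: 1, 1: 6, 2: 6} -> 13 matrices in all
```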


As seen in the example, the sets Θ(k) are disjoint and their union is Θ_{m×n}. The maximum value of k is min{m, n}. The cross-derivative of the product of the functions specified by (C.33)–(C.34) is

\[
f^{\ell}(x) = c_0(x) \sum_{k=0}^{\min\{m,n\}} \sum_{\theta \in \Theta(k)} \left( \prod_{t \in J(\theta)} f_t(x) \right) \left( \prod_{t \in I(\theta)} \frac{b_{m_\theta(t),\,t}}{a_{m_\theta(t)}} \right) , \tag{C.37}
\]

where

\[
c_0(x) = f_0(x) \prod_{s=1}^{m} a_s . \tag{C.38}
\]

Multiplying throughout by c_0(x) and canceling terms gives an expression that is valid for all coefficients a_s.

To see that (C.37) holds, use the partition {Θ(k)}_{k=0}^{min{m,n}} of Θ_{m×n} to rewrite (C.32) as

\[
f^{\ell}(x) = \sum_{k=0}^{\min\{m,n\}} \sum_{\theta = [\theta_1 | \cdots | \theta_n] \in \Theta(k)} f_0^{\ell - \theta_1 - \cdots - \theta_n}(x)\, f_1^{\theta_1}(x) \cdots f_n^{\theta_n}(x) . \tag{C.39}
\]

Since θ_t = (θ_{1t}, . . . , θ_{mt})^T, the column sum constraint implies that Σ_{s=1}^{m} θ_{st} is either zero or one. The indices in the set {t : Σ_{s=1}^{m} θ_{st} = 0} correspond to functions in the list {f_1, . . . , f_n} that are not differentiated. These functions are linear; from (C.34), the cross-derivative of f_t(x) is

\[
f_t^{\theta_t}(x) = f_t^{(\theta_{1t},\ldots,\theta_{mt})}(x) = f_t(x)^{\,1 - \sum_{s=1}^{m}\theta_{st}} \prod_{s=1}^{m} b_{st}^{\,\theta_{st}} .
\]

Taking the product over t gives

\[
f_1^{\theta_1}(x) \cdots f_n^{\theta_n}(x)
= \prod_{t=1}^{n} \left[ f_t(x)^{\,1 - \sum_{s=1}^{m}\theta_{st}} \prod_{s=1}^{m} b_{st}^{\,\theta_{st}} \right]
= \prod_{t \in J(\theta)} f_t(x) \prod_{t=1}^{n} \prod_{s=1}^{m} b_{st}^{\,\theta_{st}}
= \prod_{t \in J(\theta)} f_t(x) \prod_{t \in I(\theta)} b_{m_\theta(t),\,t} . \tag{C.40}
\]

Since f_0(x) specified by (C.33) is the exponential of a linear function,

\[
f_0^{\ell - \theta_1 - \cdots - \theta_n}(x)
= f_0(x) \prod_{s=1}^{m} a_s^{\,1 - \sum_{t=1}^{n}\theta_{st}}
= f_0(x)\, a_1 \cdots a_m \prod_{s=1}^{m} a_s^{-\sum_{t=1}^{n}\theta_{st}}
= c_0(x) \prod_{s=1}^{m} \prod_{t=1}^{n} a_s^{-\theta_{st}}
= c_0(x) \prod_{t \in I(\theta)} a_{m_\theta(t)}^{-1} . \tag{C.41}
\]

Multiplying this expression by (C.40) gives the cross-derivative (C.37).

Special case: n = 1. The set of index matrices is Θ_{m×1}. The set Θ(0) comprises the zero matrix for R^{m×1} ≡ R^m. The set Θ(1) = {e_1, . . . , e_m} comprises the "one-hot" basis vectors of R^m. From (C.37), the cross-derivative of f(x) = f_0(x) f_1(x) is

\[
f^{\ell}(x) = c_0(x) \left\{ \sum_{\Theta(0)} (\cdot) + \sum_{\Theta(1)} (\cdot) \right\}
= c_0(x) \left( f_1(x) + \sum_{s=1}^{m} \frac{b_{s1}}{a_s} \right) , \tag{C.42}
\]

where c_0(x) is given by (C.38).

Special case: n = 2, m ≥ 2. The index matrices are of size m × 2. The set Θ(0) comprises the zero matrix in R^{m×2}. The set Θ(1) comprises 2m matrices, namely,

\[
\Theta(1) = \{(e_1, 0_m), \ldots, (e_m, 0_m)\} \cup \{(0_m, e_1), \ldots, (0_m, e_m)\} , \tag{C.43}
\]

where 0_m is the zero vector in R^{m×1}. The set Θ(2) comprises m(m − 1) matrices,

\[
\Theta(2) = \{(e_i, e_j) : i, j = 1, \ldots, m,\ i \ne j\} . \tag{C.44}
\]

The cross-derivative of f = f_0 f_1 f_2 is, from (C.37),

\[
\begin{aligned}
f^{\ell}(x) &= c_0(x) \left\{ \sum_{\Theta(0)} (\cdot) + \sum_{\Theta(1)} (\cdot) + \sum_{\Theta(2)} (\cdot) \right\} \\
&= c_0(x) \left( f_1(x)\, f_2(x) + f_2(x) \sum_{j=1}^{m} \frac{b_{j1}}{a_j} + f_1(x) \sum_{j=1}^{m} \frac{b_{j2}}{a_j} + \sum_{\substack{i,\, j = 1 \\ i \ne j}}^{m} \frac{b_{i1}\, b_{j2}}{a_i\, a_j} \right) . \tag{C.45}
\end{aligned}
\]

The sum over Θ(1) breaks into two sums, one for each set in the union (C.43).
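For small m and n, formula (C.37) and its special cases can be checked symbolically against direct differentiation. The sketch below is not part of the text; it uses SymPy with m = 2, n = 3 and illustrative symbol names, enumerating the matrices of Θ_{m×n} exactly as in the example above.

```python
# Sketch: symbolic check of (C.37) for small m and n by comparing it with direct
# differentiation of f(x) = f0(x) f1(x) ... fn(x). Symbol names are illustrative.
import itertools
import sympy as sp

m, n = 2, 3
x = sp.symbols(f'x1:{m + 1}')                     # (x1, x2)
a = sp.symbols(f'a0:{m + 1}')                     # (a0, a1, a2)
b = sp.Matrix(m + 1, n, lambda s, t: sp.Symbol(f'b{s}{t + 1}'))

f0 = sp.exp(a[0] + sum(a[s] * x[s - 1] for s in range(1, m + 1)))
f = [b[0, t] + sum(b[s, t] * x[s - 1] for s in range(1, m + 1)) for t in range(n)]

direct = sp.diff(f0 * sp.Mul(*f), *x)             # d^m f / dx1 ... dxm, computed directly

c0 = f0 * sp.Mul(*a[1:])
total = sp.S.Zero
for flat in itertools.product((0, 1), repeat=m * n):
    rows = [flat[i * n:(i + 1) * n] for i in range(m)]
    cols = list(zip(*rows))
    if any(sum(r) > 1 for r in rows) or any(sum(c) > 1 for c in cols):
        continue                                  # theta must lie in Theta_{m x n}
    J = [t for t in range(n) if sum(cols[t]) == 0]
    I = [t for t in range(n) if sum(cols[t]) == 1]
    term = sp.Mul(*[f[t] for t in J])
    for t in I:
        s = cols[t].index(1) + 1                  # row index m_theta(t)
        term *= b[s, t] / a[s]
    total += term

print(sp.simplify(direct - c0 * total))           # prints 0
```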

Corollary. The total number of terms in the cross-derivative (C.37) is reduced significantly when f_t(x) ≡ g(x), 1 ≤ t ≤ n, where

\[
g(x) = b_0 + \sum_{s=1}^{m} b_s x_s . \tag{C.46}
\]


The cross-derivative of f(x) = f_0(x) g^n(x) is

\[
f^{\ell}(x) = c_0(x) \sum_{k=0}^{\min\{m,n\}} (n)_k\, g^{\,n-k}(x)\, S_k^{(m)}\!\left( \frac{b_1}{a_1}, \ldots, \frac{b_m}{a_m} \right) , \tag{C.47}
\]

where the "falling factorial" is defined for k ≥ 1 by (n)_k = n(n − 1) · · · (n − k + 1) and for k = 0 by (n)_0 = 1. The coefficient c_0(x) is given by (C.38), and S_k^{(m)}(·) is the kth elementary symmetric polynomial (ESP) in m ≥ 1 variables. ESPs are defined for x = (x_1, . . . , x_m) by [7, p. 189]

\[
S_k^{(m)}(x) = \sum_{1 \le i_1 < \cdots < i_k \le m} x_{i_1} \cdots x_{i_k} , \qquad k \ge 1 ,
\]
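The ESPs needed in (C.47) can all be computed at once as the coefficients of the polynomial ∏_s (1 + x_s z), which is how they are usually evaluated in practice. A short NumPy sketch follows; it is not part of the text, and the argument values are illustrative.

```python
# Sketch: compute all elementary symmetric polynomials S_k^{(m)}(x) at once as the
# coefficients of prod_s (1 + x_s * z). The argument values are illustrative.
import numpy as np

def esp(values):
    """Return [S_0, S_1, ..., S_m] for the given values (S_0 = 1)."""
    coeffs = np.array([1.0])
    for v in values:
        coeffs = np.convolve(coeffs, [1.0, v])     # multiply by (1 + v*z)
    return coeffs

print(esp([2.0, 3.0, 5.0]))    # [1, 10, 31, 30]: S_1 = 2+3+5, S_2 = 6+10+15, S_3 = 30
```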