Probability and Random Variables: Theory and Applications [1st ed. 2022] 3030976785, 9783030976781

This book discusses diverse concepts and notions – and their applications – concerning probability and random variables

English Pages 508 [506] Year 2022

Table of contents :
Preface
Contents
1 Preliminaries
1.1 Set Theory
1.1.1 Sets
1.1.2 Set Operations
1.1.3 Laws of Set Operations
1.1.4 Uncountable Sets
1.2 Functions
1.2.1 One-to-One Correspondence
1.2.2 Metric Space
1.3 Continuity of Functions
1.3.1 Continuous Functions
1.3.2 Discontinuities
1.3.3 Absolutely Continuous Functions and Singular Functions
1.4 Step, Impulse, and Gamma Functions
1.4.1 Step Function
1.4.2 Impulse Function
1.4.3 Gamma Function
1.5 Limits of Sequences of Sets
1.5.1 Upper and Lower Limits of Sequences
1.5.2 Limit of Monotone Sequence of Sets
1.5.3 Limit of General Sequence of Sets
References
2 Fundamentals of Probability
2.1 Algebra and Sigma Algebra
2.1.1 Algebra
2.1.2 Sigma Algebra
2.2 Probability Spaces
2.2.1 Sample Space
2.2.2 Event Space
2.2.3 Probability Measure
2.3 Probability
2.3.1 Properties of Probability
2.3.2 Other Definitions of Probability
2.4 Conditional Probability
2.4.1 Total Probability and Bayes' Theorems
2.4.2 Independent Events
2.5 Classes of Probability Spaces
2.5.1 Discrete Probability Spaces
2.5.2 Continuous Probability Spaces
2.5.3 Mixed Spaces
References
3 Random Variables
3.1 Distributions
3.1.1 Random Variables
3.1.2 Cumulative Distribution Function
3.1.3 Probability Density Function and Probability Mass Function
3.2 Functions of Random Variables and Their Distributions
3.2.1 Cumulative Distribution Function
3.2.2 Probability Density Function
3.3 Expected Values and Moments
3.3.1 Expected Values
3.3.2 Expected Values of Functions of Random Variables
3.3.3 Moments and Variance
3.3.4 Characteristic and Moment Generating Functions
3.3.5 Moment Theorem
3.4 Conditional Distributions
3.4.1 Conditional Probability Functions
3.4.2 Expected Values Conditional on Event
3.4.3 Evaluation of Expected Values via Conditioning
3.5 Classes of Random Variables
3.5.1 Normal Random Variables
3.5.2 Binomial Random Variables
3.5.3 Poisson Random Variables
3.5.4 Exponential Random Variables
References
4 Random Vectors
4.1 Distributions of Random Vectors
4.1.1 Random Vectors
4.1.2 Bi-variate Random Vectors
4.1.3 Independent Random Vectors
4.2 Distributions of Functions of Random Vectors
4.2.1 Joint Probability Density Function
4.2.2 Joint Probability Density Function: Method of Auxiliary Variables
4.2.3 Joint Cumulative Distribution Function
4.2.4 Functions of Discrete Random Vectors
4.3 Expected Values and Joint Moments
4.3.1 Expected Values
4.3.2 Joint Moments
4.3.3 Joint Characteristic Function and Joint Moment Generating Function
4.4 Conditional Distributions
4.4.1 Conditional Probability Functions
4.4.2 Conditional Expected Values
4.4.3 Evaluation of Expected Values via Conditioning
4.5 Impulse Functions and Random Vectors
References
5 Normal Random Vectors
5.1 Probability Functions
5.1.1 Probability Density Function and Characteristic Function
5.1.2 Bi-variate Normal Random Vectors
5.1.3 Tri-variate Normal Random Vectors
5.2 Properties
5.2.1 Distributions of Subvectors and Conditional Distributions
5.2.2 Linear Transformations
5.3 Expected Values of Nonlinear Functions
5.3.1 Examples of Joint Moments
5.3.2 Price's Theorem
5.3.3 General Formula for Joint Moments
5.4 Distributions of Statistics
5.4.1 Sample Mean and Sample Variance
5.4.2 Chi-Square Distribution
5.4.3 t Distribution
5.4.4 F Distribution
References
6 Convergence of Random Variables
6.1 Types of Convergence
6.1.1 Almost Sure Convergence
6.1.2 Convergence in the Mean
6.1.3 Convergence in Probability and Convergence in Distribution
6.1.4 Relations Among Various Types of Convergence
6.2 Laws of Large Numbers and Central Limit Theorem
6.2.1 Sum of Random Variables and Its Distribution
6.2.2 Laws of Large Numbers
6.2.3 Central Limit Theorem
References
Appendix Answers to Selected Exercises
Index

Iickho Song So Ryoung Park Seokho Yoon

Probability and Random Variables: Theory and Applications

Iickho Song School of Electrical Engineering Korea Advanced Institute of Science and Technology Daejeon, Korea (Republic of)

So Ryoung Park School of Information, Communications, and Electronics Engineering The Catholic University of Korea Bucheon, Korea (Republic of)

Seokho Yoon College of Information and Communication Engineering Sungkyunkwan University Suwon, Korea (Republic of)

ISBN 978-3-030-97678-1 ISBN 978-3-030-97679-8 (eBook) https://doi.org/10.1007/978-3-030-97679-8 Translation from the Korean language edition: “Theory of Random Variables” by Iickho Song © Saengneung 2020. Published by Saengneung. All Rights Reserved. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To our kin and academic ancestors and families and to all those who appreciate and enjoy the beauty of thinking and learning To Professors Souguil J. M. Ann, Myung Soo Cha, Saleem A. Kassam, and Jordan M. Stoyanov for their invisible yet enlightening guidance

Preface

This book is a translated version, with some revisions, from Theory of Random Variables, originally written in Korean by the first author in the year 2020. This book is intended primarily for those who try to advance one step further beyond the basic level of knowledge and experience on probability and random variables. At the same time, this book would also be a good resource for experienced scholars to review and refine familiar concepts. For these purposes, the authors have included definitions of basic concepts in clear terms, key advanced concepts in mathematics, and diverse concepts and notions of probability and random variables with a significant number of examples and exercise problems. The organization of this book is as follows: Chap. 1 describes the theory of sets and functions. The unit step function and impulse function, to be used frequently in the following chapters, are also discussed in detail, and the gamma function and binomial coefficients in the complex domain are introduced. In Chap. 2, the concept of sigma algebra is discussed, which is the key for defining probability logically. The notions of probability and conditional probability are then discussed, and several classes of widely used discrete and continuous probability spaces are introduced. In addition, important notions of probability mass function and probability density function are described. After discussing another important notion of cumulative distribution function, Chap. 3 is devoted to the discussion on the notions of random variables and moments, and also for the discussion on the transformations of random variables. In Chap. 4, the concept of random variables is generalized into random vectors, also referred to as joint random variables. Transformations of random vectors are discussed in detail. The discussion on the applications of the unit step function and impulse function in random vectors in this chapter is a unique trait of this book. 
Chapter 5 focuses on the discussion of normal random variables and normal random vectors. The explicit formula of joint moments of normal random vectors, another uniqueness of this book, is delineated in detail. Three statistics from normal samples and three classes of impulsive distributions are also described in this chapter. In Chap. 6, the authors briefly describe the fundamental aspects of the convergence of random variables. The central limit theorem, one of the most powerful and useful results with practical applications, is among the key expositions in this chapter.

The uniqueness of this book includes, but is not limited to, interesting applications of impulse functions to random vectors, exposition of the general formula for the product moments of normal random vectors, discussion on gamma functions and binomial coefficients in the complex space, detailed procedures to the final answers for almost all results presented, and a substantially useful and extensive index for finding subjects more easily. A total of more than 320 exercise problems are included, of which a complete solution manual for all the problems is available from the authors through the publisher. The authors feel sincerely thankful that, as is needed for the publication of any book, the publication of this book became a reality thanks to a huge amount of help from many people to the authors in a variety of ways. Unfortunately, the authors could mention only some of them explicitly: to the anonymous reviewers for constructive and helpful comments and suggestions, to Bok-Lak Choi and Seung-Ki Kim at Saengneung for allowing the use of the original Korean title, to Eva Hiarapi and Yogesh Padmanaban at Springer Nature for extensive editorial assistance, and to Amelia Youngwha Song Pegram and Yeonwha Song Wratil for improving the readability. In addition, the research grant 2018R1A2A1A05023192 from Korea Research Foundation was an essential support in successfully completing the preparation of this book. The authors would feel rewarded if everyone who spends time and effort wisely in reading and understanding the contents of this book enjoys the pleasure of learning and advancing one step further. Thank you! Daejeon, Korea (Republic of) Bucheon, Korea (Republic of) Suwon, Korea (Republic of) January 2022

Iickho Song So Ryoung Park Seokho Yoon

Chapter 1

Preliminaries

Sets and functions are key concepts that play an important role in understanding probability and random variables. In this chapter, we discuss those concepts that will be used in later chapters.

1.1 Set Theory

In this section, we introduce and review some concepts and key results in the theory of sets (Halmos 1950; Kharazishvili 2004; Shiryaev 1996; Sommerville 1958).

1.1.1 Sets

Definition 1.1.1 (abstract space) The collection of all entities is called an abstract space, a space, or a universal set.

Definition 1.1.2 (element) The smallest unit that comprises an abstract space is called an element, a point, or a component.

Definition 1.1.3 (set) Given an abstract space, a grouping or collection of elements of the abstract space is called a set.

An abstract space, often denoted by Ω or S, consists of elements or points, the smallest entities that we shall discuss. In the strict sense, a set is the collection of elements that can be clearly defined mathematically. For example, the collection of ‘people who are taller than 1.5 m’ is a set. On the other hand, the collection of ‘tall people’ is not a set because ‘tall’ is not mathematically clear. Yet, in fuzzy set theory, such a vague collection is also regarded as a set by adopting the concept of membership function. Abstract spaces and sets are often represented with braces { } with all elements explicitly shown, e.g., {1, 2, 3}; with the property of the elements described, e.g., {ω : 10 < ω < 20π}; {a_i}; or {a_i}_{i=1}^n.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. I. Song et al., Probability and Random Variables: Theory and Applications, https://doi.org/10.1007/978-3-030-97679-8_1

Example 1.1.1 The result of signal processing in binary digital communication can be represented by the abstract space Ω = {0, 1}. The collection {A, B, ..., Z} of capital letters of the English alphabet and the collection S = {(0, 0, ..., 0), (0, 0, ..., 1), ..., (1, 1, ..., 1)} of binary vectors are also abstract spaces. ♦

Example 1.1.2 In the abstract space Ω = {0, 1}, 0 and 1 are elements. The abstract space of seven-dimensional binary vectors contains 2^7 = 128 elements. ♦

Example 1.1.3 The set A = {1, 2, 3, 4} can also be depicted as, for example, A = {ω : ω is a natural number smaller than 5}. ♦

Definition 1.1.4 (point set) A set with a single point is called a point set or a singleton set.

Example 1.1.4 The sets {0}, {1}, and {2} are point sets. ♦
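The two ways of writing a set described above (listing the elements, or stating a property they satisfy) can be tried out in a few lines of Python. The sketch below rebuilds the sets of Examples 1.1.2 to 1.1.4; the variable names are illustrative choices and are not notation from the text.

```python
from itertools import product

# Set given by listing its elements explicitly (Example 1.1.3).
A_listed = {1, 2, 3, 4}

# The same set given by a property: natural numbers smaller than 5.
# range(1, 5) yields 1, 2, 3, 4.
A_by_property = {w for w in range(1, 5)}

# Abstract space of seven-dimensional binary vectors (Example 1.1.2).
binary_vectors = set(product((0, 1), repeat=7))

# A point set (singleton set) has exactly one element (Example 1.1.4).
point_set = {0}

print(A_listed == A_by_property)  # True
print(len(binary_vectors))        # 128
print(len(point_set))             # 1
```

Python's built-in `set` only models finite sets, so a set such as {ω : 10 < ω < 20π} of real numbers cannot be enumerated this way.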



Consider an abstract space Ω and a set G of elements from Ω. When the element ω does and does not belong to G, it is denoted by

ω ∈ G    (1.1.1)

and ω ∉ G, respectively. Sometimes ω ∈ G is expressed as G ∋ ω, and ω ∉ G as G ∌ ω.

Example 1.1.5 For the set A = {0, 1}, we have 0 ∈ A and 2 ∉ A. ♦



Definition 1.1.5 (subset) If all the elements of a set B belong to another set A, then the set B is called a subset of A, which is expressed as B ⊆ A or A ⊇ B. When B is not a subset of A, it is expressed as B ⊈ A or A ⊉ B.

Example 1.1.6 When A = {0, 1, 2, 3}, B = {0, 1}, and C = {2, 3}, it is clear that B ⊆ A and A ⊇ C. The set A is not a subset of B because some elements of A are not elements of B. In addition, B ⊈ C and B ⊉ C. ♦

Example 1.1.7 Any set is a subset of itself. In other words, A ⊆ A for any set A. ♦

Definition 1.1.6 (equality) If all the elements of A belong to B and all the elements of B belong to A, then A and B are called equal, which is written as A = B.

Example 1.1.8 The set A = {ω : ω is a multiple of 25, larger than 15, and smaller than 99} is equal to B = {25, 50, 75}, and C = {1, 2, 3} is equal to D = {3, 1, 2}. In other words, A = B and C = D. ♦
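The membership, subset, and equality relations of Examples 1.1.5 to 1.1.8 can be checked mechanically. The following is a minimal sketch using Python's built-in `set` type, where `in` plays the role of ∈ and `<=` the role of ⊆; the name `multiples` is an illustrative choice.

```python
A = {0, 1, 2, 3}
B = {0, 1}
C = {2, 3}

# Membership as in Example 1.1.5, applied to the set {0, 1}.
print(0 in B, 2 not in B)         # True True

# Subset relations of Example 1.1.6: B ⊆ A and A ⊇ C.
print(B <= A, A >= C)             # True True

# A is not a subset of B; neither of B and C contains the other.
print(A <= B, B <= C, B >= C)     # False False False

# Equality as in Example 1.1.8: a property-defined set against a listed one.
# range(16, 99) covers the integers larger than 15 and smaller than 99.
multiples = {w for w in range(16, 99) if w % 25 == 0}
print(multiples == {25, 50, 75})  # True

# The order in which elements are listed does not matter.
print({1, 2, 3} == {3, 1, 2})     # True
```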

Definition 1.1.7 (proper subset) When B ⊆ A and B ≠ A, the set B is called a proper subset of A, which is denoted by B ⊂ A or A ⊃ B.

Example 1.1.9 The set B = {0, 1} is a proper subset of A = {0, 1, 2, 3}; that is, B ⊂ A. ♦

In some cases, ⊆ and ⊂ are used interchangeably.

Theorem 1.1.1 We have

A = B ⇔ A ⊆ B, B ⊆ A.    (1.1.2)

In other words, two sets A and B are equal if and only if A ⊆ B and B ⊆ A. As we shall see later in the proof of Theorem 1.1.4, Theorem 1.1.1 is especially useful for proving the equality of two sets.

Definition 1.1.8 (empty set) A set with no point is called an empty set or a null set, and is denoted by ∅ or { }.

Note that the empty set ∅ = { } is different from the point set {0} composed of one element 0. One interesting property of the empty set is shown in the theorem below.

Theorem 1.1.2 An empty set is a subset of any set.

Example 1.1.10 For the sets A = {0, 1, 2, 3} and B = {1, 5}, we have ∅ ⊆ A and { } ⊆ B. ♦

Definition 1.1.9 (finite set; infinite set) A set with a finite or an infinite number of elements is called a finite or an infinite set, respectively.

Definition 1.1.10 (set of natural numbers; set of integers; set of real numbers) We will often denote the sets of natural numbers, integers, and real numbers by

J+ = {1, 2, ...},    (1.1.3)

J = {..., −1, 0, 1, ...},    (1.1.4)

and

R = {x : x is a real number},    (1.1.5)

respectively.

Example 1.1.11 The set {1, 2, 3} is a finite set and the null set { } = ∅ is also a finite set. The set {ω : ω is a natural number, 0 < ω < 10} is a finite set and {ω : ω is a real number, 0 < ω < 10} is an infinite set. ♦

Example 1.1.12 The sets J+, J, and R are infinite sets. ♦
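Theorem 1.1.1 says that equality of two sets can be established by checking inclusion in both directions, and Theorem 1.1.2 says that the empty set is a subset of any set. Both can be illustrated on small finite sets. In the sketch below, the helper `equal_by_double_inclusion` is a hypothetical name introduced only for this illustration.

```python
def equal_by_double_inclusion(a, b):
    """Return True when a ⊆ b and b ⊆ a, following Theorem 1.1.1."""
    return a <= b and b <= a

A = {0, 1, 2, 3}
B = {0, 1}
empty = set()

# Theorem 1.1.1: mutual inclusion agrees with equality.
print(equal_by_double_inclusion({1, 2, 3}, {3, 1, 2}))  # True
print(equal_by_double_inclusion(A, B))                  # False

# Definition 1.1.7: B is a proper subset of A (B ⊆ A and B ≠ A).
print(B < A)                                            # True

# Theorem 1.1.2: the empty set is a subset of any set (Example 1.1.10).
print(empty <= A, empty <= {1, 5})                      # True True

# The empty set differs from the point set {0}.
print(empty == {0})                                     # False
```

A check on particular finite sets illustrates the statements but does not prove them; in a proof, the same two-step pattern is applied to arbitrary elements.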



Definition 1.1.11 (interval) An infinite set composed of all the real numbers between two distinct real numbers is called an interval or an interval set.

Let a < b and a, b ∈ R. Then, the sets {ω : ω ∈ R, a ≤ ω ≤ b}, {ω : ω ∈ R, a < ω < b}, {ω : ω ∈ R, a ≤ ω < b}, and {ω : ω ∈ R, a < ω ≤ b} are denoted by [a, b], (a, b), [a, b), and (a, b], respectively. The sets [a, b] and (a, b) are called closed and open intervals, respectively, and the sets [a, b) and (a, b] are both called half-open and half-closed intervals.

Example 1.1.13 The set [3, 4] = {ω : ω ∈ R, 3 ≤ ω ≤ 4} is a closed interval and the set (2, 5) = {ω : ω ∈ R, 2 < ω < 5} is an open interval. The sets (4, 5] = {ω : ω ∈ R, 4 < ω ≤ 5} and [1, 5) = {ω : ω ∈ R, 1 ≤ ω < 5} are both half-closed intervals and half-open intervals. ♦

Definition 1.1.12 (collection of sets) When all the elements of a ‘set’ are sets, the ‘set’ is called a set of sets, a class of sets, a collection of sets, or a family of sets.

A class, collection, and family of sets are also simply called class, collection, and family, respectively. A collection with one set is called a singleton collection. In some cases, a singleton set denotes a singleton collection, just as a set sometimes denotes a collection.

Example 1.1.14 When A = {1, 2}, B = {2, 3}, and C = { }, the set D = {A, B, C} is a collection of sets. The set E = {(1, 2], [3, 4)} is a collection of sets. ♦

Example 1.1.15 Assume the sets A = {1, 2}, B = {2, 3}, C = {4, 5}, and D = {{1, 2}, {4, 5}, 1, 2, 3}. Then, A ⊆ D, A ∈ D, B ⊆ D, B ∉ D, C ⊈ D, and C ∈ D. Here, D is a set but not a collection of sets. ♦

Example 1.1.16 The collections A = {{3}} and B = {{1, 2}} are singleton collections and C = {{1, 2}, {3}} is not a singleton collection. ♦

Definition 1.1.13 (power set) The class of all the subsets of a set is called the power set of the set. The power set of Ω is denoted by 2^Ω.

Example 1.1.17 The power set of Ω = {3} is 2^Ω = {∅, {3}}. The power set of Ω = {4, 5} is 2^Ω = {∅, {4}, {5}, Ω}. For a set with n elements, the power set is a collection of 2^n sets. ♦
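The power set of a small finite set can be enumerated directly, which also makes the count 2^n easy to confirm. The sketch below is one possible construction with `itertools`; the function name `power_set` is an illustrative choice. Because Python sets can only contain hashable elements, sets that appear as elements of other sets are represented by `frozenset`, which is an implementation detail of this example rather than part of the definitions.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return the collection of all subsets of omega as a set of frozensets."""
    elements = list(omega)
    subsets = chain.from_iterable(
        combinations(elements, r) for r in range(len(elements) + 1)
    )
    return {frozenset(s) for s in subsets}

# Example 1.1.17: the power set of {4, 5} is {∅, {4}, {5}, {4, 5}}.
P = power_set({4, 5})
expected = {frozenset(), frozenset({4}), frozenset({5}), frozenset({4, 5})}
print(P == expected)                              # True

# A set with n elements has 2^n subsets (here n = 5).
print(len(power_set({1, 2, 3, 4, 5})) == 2 ** 5)  # True

# Example 1.1.15: D has both sets and numbers as elements, so being
# a subset of D (<=) and being an element of D (in) are different questions.
A, B, C = frozenset({1, 2}), frozenset({2, 3}), frozenset({4, 5})
D = {A, C, 1, 2, 3}
print(A <= D, A in D)                             # True True
print(B <= D, B in D)                             # True False
print(C <= D, C in D)                             # False True
```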

Fig. 1.1 Set A and its complement A^c

1.1.2 Set Operations

Definition 1.1.14 (complement) For an abstract space Ω and its subset A, the complement of A, denoted by $A^c$ or $\bar{A}$, is defined by

\[ A^c = \{\omega : \omega \notin A,\ \omega \in \Omega\}. \tag{1.1.6} \]

Figure 1.1 shows a set and its complement via a Venn diagram.

Example 1.1.18 It is easy to see that $\Omega^c = \emptyset$ and $(B^c)^c = B$ for any set B. ♦

Example 1.1.19 For the abstract space Ω = {0, 1, 2, 3} and B = {0, 1}, we have $B^c$ = {2, 3}. The complement of the interval¹ A = (−∞, 1] is $A^c$ = (1, ∞). ♦

Definition 1.1.15 (union) The union or sum, denoted by A ∪ B or A + B, of two sets A and B is defined by

\[ A \cup B = A + B = \{\omega : \omega \in A \text{ or } \omega \in B\}. \tag{1.1.7} \]

That is, A ∪ B denotes the set of elements that belong to at least one of A and B. Figure 1.2 shows the union of A and B via a Venn diagram. More generally, the union of $\{A_i\}_{i=1}^{n}$ is² denoted by

\[ \bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n. \tag{1.1.8} \]

Example 1.1.20 If A = {1, 2, 3} and B = {0, 1}, then A ∪ B = {0, 1, 2, 3}. ♦

Example 1.1.21 For any two sets A and B, we have B ∪ B = B, $B \cup B^c = \Omega$, B ∪ Ω = Ω, B ∪ ∅ = B, A ⊆ (A ∪ B), and B ⊆ (A ∪ B). ♦

¹ Because an interval assumes the set of real numbers by definition, it is not necessary to specify the abstract space when we consider an interval.
² We often use braces also to denote a number of items in a compact way. For example, $\{A_i\}_{i=1}^{n}$ here represents $A_1, A_2, \ldots, A_n$.

Fig. 1.2 Sum A ∪ B of A and B

Fig. 1.3 Intersection A ∩ B of A and B

Example 1.1.22 We have A ∪ B = B when A ⊆ B, and (A ∪ B) ⊆ C when A ⊆ C and B ⊆ C. In addition, for four sets A, B, C, and D, we have (A ∪ B) ⊆ (C ∪ D) when A ⊆ C and B ⊆ D. ♦ Definition 1.1.16 (intersection) The intersection or product, denoted by A ∩ B or AB, of two sets A and B is defined by A ∩ B = {ω : ω ∈ A and ω ∈ B}.

(1.1.9)

That is, A ∩ B denotes the set of elements that belong to both A and B simultaneously. The Venn diagram for the intersection of A and B is shown in Fig. 1.3. Meanwhile,

\[ \bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n \tag{1.1.10} \]

denotes the intersection of $\{A_i\}_{i=1}^{n}$.

Example 1.1.23 For A = {1, 2, 3} and B = {0, 1}, we have A ∩ B = AB = {1}. The intersection of the intervals [1, 3) and (2, 5] is [1, 3) ∩ (2, 5] = (2, 3). ♦ Example 1.1.24 For any two sets A and B, we have B ∩ B = B, B ∩ B c = ∅, B ∩ Ω = B, B ∩ ∅ = ∅, (A ∩ B) ⊆ A, and (A ∩ B) ⊆ B. We also have A ∩ B = A when A ⊆ B. ♦ Example 1.1.25 For three sets A, B, and C, we have (A ∩ B) ⊆ (A ∩ C) when B ⊆ C. ♦
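For finite sets, the complement, union, and intersection defined above map directly onto Python's built-in set operators. The sketch below reuses the sets of Examples 1.1.19, 1.1.20, and 1.1.23; the variable names are chosen here for illustration only.

```python
# Abstract space and sets taken from Examples 1.1.19, 1.1.20, and 1.1.23.
omega = {0, 1, 2, 3}
A = {1, 2, 3}
B = {0, 1}

complement_B = omega - B   # B^c relative to the abstract space omega
union_AB = A | B           # A ∪ B
intersection_AB = A & B    # A ∩ B

print(complement_B, union_AB, intersection_AB)
```

Note that a complement only makes sense once the abstract space is fixed, which is why `omega` appears explicitly in the code.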

Fig. 1.4 Partition $\{A_1, A_2, \ldots, A_6\}$ of S

Definition 1.1.17 (disjoint) If A and B have no element in common, that is, if A ∩ B = AB = ∅, then the sets A and B are called disjoint or mutually exclusive.

Example 1.1.26 The sets C = {1, 2, 3} and D = {4, 5} are mutually exclusive. The sets A = {1, 2, 3, 4} and B = {4, 5, 6} are not mutually exclusive because A ∩ B = {4} ≠ ∅. The intervals [1, 3) and [3, 5] are mutually exclusive, and [3, 4] and [4, 5] are not mutually exclusive. ♦

Definition 1.1.18 (partition) A collection of subsets of S is called a partition of S when the subsets in the collection are collectively exhaustive and every pair of subsets in the collection is disjoint.

Specifically, the collection $\{A_i\}_{i=1}^{n}$ is a partition of S if both

\[ \text{(collectively exhaustive)}: \quad \bigcup_{i=1}^{n} A_i = S \tag{1.1.11} \]

and

\[ \text{(disjoint)}: \quad A_i \cap A_j = \emptyset \tag{1.1.12} \]

for all i ≠ j are satisfied. The singleton collection {S} composed only of S is not regarded as a partition of S. Figure 1.4 shows a partition $\{A_1, A_2, \ldots, A_6\}$ of S.

Example 1.1.27 When A = {1, 2}, the collection {{1}, {2}} is a partition of A. Each of the five collections {{1}, {2}, {3}, {4}}, {{1}, {2}, {3, 4}}, {{1}, {2, 3}, {4}}, {{1, 2}, {3}, {4}}, and {{1, 2}, {3, 4}} is a partition of B = {1, 2, 3, 4} while neither {{1, 2, 3}, {3, 4}} nor {{1, 2}, {3}} is a partition of B. ♦

Example 1.1.28 The collection {A, ∅} is a partition of A, and {[3, 3.3), [3.3, 3.4], (3.4, 3.6], (3.6, 4)} is a partition of the interval [3, 4). ♦

Example 1.1.29 For A = {1, 2, 3}, obtain all the partitions without the null set.

Solution Because A has three elements, every set in a partition of A into nonempty sets has one or two elements. Thus, the collections {{1}, {2, 3}}, {{2}, {1, 3}}, {{3}, {1, 2}}, and {{1}, {2}, {3}} are the desired partitions. ♦
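The two conditions (1.1.11) and (1.1.12) can be checked mechanically for finite sets. The sketch below is one possible way to do so; the helper names `is_partition` and `partitions` are introduced here for illustration. Note that `is_partition` tests only the two displayed conditions, so the exclusion of the singleton collection {S} has to be applied separately, as is done when the partitions of {1, 2, 3} are listed.

```python
from itertools import combinations


def is_partition(blocks, s):
    """Check conditions (1.1.11) and (1.1.12) for finite sets."""
    blocks = [frozenset(b) for b in blocks]
    exhaustive = frozenset().union(*blocks) == frozenset(s)
    disjoint = all(not (a & b) for a, b in combinations(blocks, 2))
    return exhaustive and disjoint


def partitions(items):
    """Yield every way of splitting a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give `first` a block of its own.
        yield [[first]] + smaller


# Partitions of {1, 2, 3} other than the singleton collection {A}.
nontrivial = [p for p in partitions([1, 2, 3]) if len(p) > 1]
for p in nontrivial:
    print(p)
```

The list `nontrivial` can be compared with the four collections found in Example 1.1.29.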

Fig. 1.5 Difference A − B between A and B

Definition 1.1.19 (difference) The difference A − B, also denoted by A \ B, is defined as

\[ A - B = \{\omega : \omega \in A \text{ and } \omega \notin B\}. \tag{1.1.13} \]

Figure 1.5 shows A − B via a Venn diagram. Note that we have

\[ A - B = A \cap B^c = A - AB. \tag{1.1.14} \]

Example 1.1.30 For A = {1, 2, 3} and B = {0, 1}, we have A − B = {2, 3} and B − A = {0}. The differences between the intervals [1, 3) and (2, 5] are [1, 3) − (2, 5] = [1, 2] and (2, 5] − [1, 3) = [3, 5]. ♦

Example 1.1.31 For any set A, we have $\Omega - A = A^c$, A − Ω = ∅, A − A = ∅, A − ∅ = A, $A - A^c = A$, (A + A) − A = ∅, and A + (A − A) = A. ♦

Definition 1.1.20 (symmetric difference) The symmetric difference, denoted by A Δ B, between two sets A and B is the set of elements which belong only to A or only to B.

From the definition of symmetric difference, we have

\[ A \Delta B = (A - B) \cup (B - A) = \left(A \cap B^c\right) \cup \left(A^c \cap B\right) = (A \cup B) - (A \cap B). \tag{1.1.15} \]

Figure 1.6 shows the symmetric difference A Δ B via a Venn diagram.

Example 1.1.32 For A = {1, 2, 3, 4} and B = {4, 5, 6}, we have A Δ B = {1, 2, 3} ∪ {5, 6} = {1, 2, 3, 4, 5, 6} − {4} = {1, 2, 3, 5, 6}. The symmetric difference between the intervals [1, 3) and (2, 5] is [1, 3) Δ (2, 5] = ([1, 3) − (2, 5]) ∪ ((2, 5] − [1, 3)) = [1, 2] ∪ [3, 5]. ♦

Fig. 1.6 Symmetric difference A Δ B between A and B

Example 1.1.33 For any set A, we have A Δ A = ∅, A Δ ∅ = ∅ Δ A = A, $A \Delta \Omega = \Omega \Delta A = A^c$, and $A \Delta A^c = A^c \Delta A = \Omega$. ♦

Example 1.1.34 It follows that A − B ≠ B − A in general and A Δ B = B Δ A, and that A Δ B = A − B when B ⊆ A. ♦

Example 1.1.35 (Sveshnikov 1968) Show that every element of $A_1 \Delta A_2 \Delta \cdots \Delta A_n$ belongs only to an odd number of the sets $\{A_i\}_{i=1}^{n}$.

Solution Let us prove the result by mathematical induction. When n = 1, it is self-evident. When n = 2, every element of $A_1 \Delta A_2$ is an element only of $A_1$, or only of $A_2$, by definition. Next, assume that every element of $C = A_1 \Delta A_2 \Delta \cdots \Delta A_n$ belongs only to an odd number of the sets $\{A_i\}_{i=1}^{n}$. Then, by the definition of Δ, every element of $B = C \Delta A_{n+1}$ belongs only to $A_{n+1}$ or only to C. In other words, every element of B belongs only to $A_{n+1}$ or only to an odd number of the sets $\{A_i\}_{i=1}^{n}$, concluding the proof. ♦

Interestingly, the set operation similar to the addition of numbers is not the union of sets but rather the symmetric difference (Kuratowski and Mostowski 1976): subtraction is the inverse operation for addition and symmetric difference is its own inverse operation while no inverse operation exists for the union of sets. More specifically, for two sets A and B, there exists only one C, which is C = A Δ B, such that A Δ C = B. This is clear because A Δ (A Δ B) = B and A Δ C = B.
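The odd-membership property of Example 1.1.35 and the self-inverse property of the symmetric difference can be spot-checked on finite sets, where Python's `^` operator is the symmetric difference. The sketch below is only a numerical check on sets chosen here for illustration, not a substitute for the induction argument; the helper names are hypothetical.

```python
from functools import reduce


def sym_diff_all(sets):
    """Return A1 Δ A2 Δ ... Δ An for a list of finite sets."""
    return reduce(lambda x, y: x ^ y, sets, set())


def in_odd_number(element, sets):
    """True when `element` belongs to an odd number of the given sets."""
    return sum(element in s for s in sets) % 2 == 1


# Illustrative sets (the first two are those of Example 1.1.32).
sets = [{1, 2, 3, 4}, {4, 5, 6}, {1, 4, 6, 7}]
result = sym_diff_all(sets)
odd_members = {x for x in set().union(*sets) if in_odd_number(x, sets)}
print(result == odd_members)

# Symmetric difference is its own inverse: A Δ (A Δ B) = B.
A, B = sets[0], sets[1]
print(A ^ (A ^ B) == B)
```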

1.1.3 Laws of Set Operations

Theorem 1.1.3 For the operations of union and intersection, the following laws apply:

1. Commutative law

\[ A \cup B = B \cup A \tag{1.1.16} \]
\[ A \cap B = B \cap A \tag{1.1.17} \]

2. Associative law

\[ (A \cup B) \cup C = A \cup (B \cup C) \tag{1.1.18} \]
\[ (A \cap B) \cap C = A \cap (B \cap C) \tag{1.1.19} \]

3. Distributive law

\[ (A \cup B) \cap C = (A \cap C) \cup (B \cap C) \tag{1.1.20} \]
\[ (A \cap B) \cup C = (A \cup C) \cap (B \cup C) \tag{1.1.21} \]

Note that the associative and distributive laws are for the same and different types of operations, respectively.

Example 1.1.36 Assume three sets A = {0, 1, 2, 6}, B = {0, 2, 3, 4}, and C = {0, 1, 3, 5}. It is easy to check the commutative and associative laws. Next, (1.1.20) holds true as it is clear from (A ∪ B) ∩ C = {0, 1, 2, 3, 4, 6} ∩ {0, 1, 3, 5} = {0, 1, 3} and (A ∩ C) ∪ (B ∩ C) = {0, 1} ∪ {0, 3} = {0, 1, 3}. In addition, (1.1.21) holds true as it is clear from (A ∩ B) ∪ C = {0, 2} ∪ {0, 1, 3, 5} = {0, 1, 2, 3, 5} and (A ∪ C) ∩ (B ∪ C) = {0, 1, 2, 3, 5, 6} ∩ {0, 1, 2, 3, 4, 5} = {0, 1, 2, 3, 5}. ♦

Generalizing (1.1.20) and (1.1.21) for a number of sets, we have

\[ B \cap \left( \bigcup_{i=1}^{n} A_i \right) = \bigcup_{i=1}^{n} \left( B \cap A_i \right) \tag{1.1.22} \]

and

\[ B \cup \left( \bigcap_{i=1}^{n} A_i \right) = \bigcap_{i=1}^{n} \left( B \cup A_i \right), \tag{1.1.23} \]

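respectively.

Theorem 1.1.4 When $A_1, A_2, \ldots, A_n$ are subsets of an abstract space S, we have

\[ \left( \bigcup_{i=1}^{n} A_i \right)^c = S - \bigcup_{i=1}^{n} A_i = \bigcap_{i=1}^{n} \left( S - A_i \right) = \bigcap_{i=1}^{n} A_i^c \tag{1.1.24} \]

and

\[ \left( \bigcap_{i=1}^{n} A_i \right)^c = \bigcup_{i=1}^{n} A_i^c. \tag{1.1.25} \]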

Proof Let us prove the theorem by using (1.1.2).

(1) Proof of (1.1.24). Let $x \in \left( \bigcup_{i=1}^{n} A_i \right)^c$. Then, $x \notin \bigcup_{i=1}^{n} A_i$, and therefore x is not an element of any $A_i$. This implies that x is an element of every $A_i^c$ or, equivalently, $x \in \bigcap_{i=1}^{n} A_i^c$. Therefore, we have

\[ \left( \bigcup_{i=1}^{n} A_i \right)^c \subseteq \bigcap_{i=1}^{n} A_i^c. \tag{1.1.26} \]

Next, assume $x \in \bigcap_{i=1}^{n} A_i^c$. Then, $x \in A_i^c$ for every i, and therefore x is not an element of $A_i$ for any i: in other words, $x \notin \bigcup_{i=1}^{n} A_i$. This implies $x \in \left( \bigcup_{i=1}^{n} A_i \right)^c$. Therefore, we have

\[ \bigcap_{i=1}^{n} A_i^c \subseteq \left( \bigcup_{i=1}^{n} A_i \right)^c. \tag{1.1.27} \]

From (1.1.2), (1.1.26), and (1.1.27), the result (1.1.24) is proved.

(2) Proof of (1.1.25). Replacing $A_i$ with $A_i^c$ in (1.1.24) and using (1.1.26), we have $\left( \bigcup_{i=1}^{n} A_i^c \right)^c = \bigcap_{i=1}^{n} \left( A_i^c \right)^c = \bigcap_{i=1}^{n} A_i$, where the last equality holds because $\left( A_i^c \right)^c = A_i$. Taking the complement completes the proof. ♠

Example 1.1.37 Consider S = {1, 2, 3, 4} and its subsets $A_1$ = {1}, $A_2$ = {2, 3}, and $A_3$ = {1, 3, 4}. Then, we have $(A_1 + A_2)^c = \{1, 2, 3\}^c = \{4\}$, which is the same as $A_1^c A_2^c$ = {2, 3, 4} ∩ {1, 4} = {4}. Similarly, we have $(A_1 A_2)^c = \emptyset^c = S$, which is the same as $A_1^c + A_2^c$ = {2, 3, 4} ∪ {1, 4} = S. In addition, $(A_2 + A_3)^c = S^c = \emptyset$ is the same as $A_2^c A_3^c$ = {1, 4} ∩ {2} = ∅. Finally, $(A_1 A_3)^c = \{1\}^c = \{2, 3, 4\}$ is the same as $A_1^c + A_3^c$ = {2, 3, 4} ∪ {2} = {2, 3, 4}. ♦

Example 1.1.38 For three sets A, B, and C, show that A = B − C when A ⊆ B and C = B − A.

Solution First, assume A = B. Then, because C = B − A = ∅, we have B − C = B − ∅ = A. Next, assume A ⊂ B. Then, $C = B \cap A^c$ and $C^c = (B \cap A^c)^c = B^c \cup A$ from (1.1.14) and (1.1.25), respectively. Thus, using (1.1.14) and (1.1.20), we get $B - C = B \cap C^c = B \cap (B^c \cup A) = (B \cap B^c) \cup (B \cap A) = \emptyset \cup A = A$. ♦
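The identities (1.1.24) and (1.1.25) can be verified numerically for the sets of Example 1.1.37, here taking all three subsets at once. This is a finite check under the stated sets only; the helper `complement` and the variable names are assumptions made for this sketch.

```python
def complement(a, space):
    """Complement of `a` relative to the abstract space `space`."""
    return space - a


S = {1, 2, 3, 4}
A = [{1}, {2, 3}, {1, 3, 4}]   # A1, A2, A3 of Example 1.1.37

# (1.1.24): complement of the union equals intersection of complements.
lhs_union = complement(set().union(*A), S)
rhs_union = set(S).intersection(*(complement(a, S) for a in A))

# (1.1.25): complement of the intersection equals union of complements.
# Each A_i is a subset of S, so intersecting with S does not change the result.
lhs_inter = complement(set(S).intersection(*A), S)
rhs_inter = set().union(*(complement(a, S) for a in A))

print(lhs_union == rhs_union, lhs_inter == rhs_inter)
```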

1.1.4 Uncountable Sets

Definition 1.1.21 (one-to-one correspondence) A relationship between two sets in which each element of either set is assigned with only one element of the other is called a one-to-one correspondence.

The notion of one-to-one correspondence will be redefined in Definition 1.2.9. Based on the set $J_+$ of natural numbers defined in (1.1.3) and the concept of one-to-one correspondence, let us define a countable set.

Definition 1.1.22 (countable set) A set is called countable or denumerable if we can find a one-to-one correspondence between the set and a subset of $J_+$.

The elements of a countable set can be indexed as $a_1, a_2, \ldots, a_n, \ldots$. It is easy to see that finite sets are all countable sets.

Example 1.1.39 The sets {1, 2, 3} and {1, 10, 100, 1000} are both countable because a one-to-one correspondence can be established between these two sets and the subsets {1, 2, 3} and {1, 2, 3, 4}, respectively, of $J_+$. ♦

Example 1.1.40 The set J of integers is countable because we can establish a one-to-one correspondence as

\[ \begin{array}{ccccccccc} 0 & -1 & 1 & -2 & 2 & \cdots & -n & n & \cdots \\ \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \cdots & \updownarrow & \updownarrow & \cdots \\ 1 & 2 & 3 & 4 & 5 & \cdots & 2n & 2n+1 & \cdots \end{array} \tag{1.1.28} \]

between J and $J_+$. Similarly, it is easy to see that the sets {ω : ω is a positive even number} and $\{2, 4, \ldots, 2^n, \ldots\}$ are countable sets by noting the one-to-one correspondences 2n ↔ n and $2^n$ ↔ n, respectively. ♦

Theorem 1.1.5 The set

\[ Q = \{q : q \text{ is a rational number}\} \tag{1.1.29} \]

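of rational numbers is countable.

Proof (Method 1) The set of rational numbers can be expressed as $\left\{ \frac{p}{q} : q \in J_+,\ p \in J \right\}$. Consider the sequence

\[ a_{ij} = \begin{cases} \dfrac{0}{1}, & i = j = 0; \\[1ex] -\dfrac{i-j+1}{j}, & i = 1, 2, \ldots,\ j = 1, 2, \ldots, i; \\[1ex] \dfrac{j-i}{2i-j+1}, & i = 1, 2, \ldots,\ j = i+1, i+2, \ldots, 2i. \end{cases} \tag{1.1.30} \]

In other words, consider

\[ \begin{array}{ll} i = 0: & \frac{0}{1} \\[0.5ex] i = 1: & -\frac{1}{1},\ \frac{1}{1} \\[0.5ex] i = 2: & -\frac{2}{1},\ -\frac{1}{2},\ \frac{1}{2},\ \frac{2}{1} \\[0.5ex] i = 3: & -\frac{3}{1},\ -\frac{2}{2},\ -\frac{1}{3},\ \frac{1}{3},\ \frac{2}{2},\ \frac{3}{1} \\ & \vdots \end{array} \tag{1.1.31} \]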

Reading this sequence downward from the first row and ignoring repetitions, we will have a one-to-one correspondence between the sets of rational numbers and natural numbers.

(Method 2) Assume integers x ≠ 0 and y, and denote the rational number $\frac{y}{x}$ by the coordinates (x, y) on a two-dimensional plane. Reading the integer coordinates as (1, 0) → (1, 1) → (−1, 1) → (−1, −1) → (2, −1) → (2, 2) → ··· while skipping a number if it had previously appeared, we have a one-to-one correspondence between $J_+$ and Q. ♠

Theorem 1.1.6 Countable sets have the following properties:
(1) A subset of a countable set is countable.
(2) There exists a countable subset for any infinite set.
(3) If the sets $A_1, A_2, \ldots$ are all countable, then $\bigcup_{i \in J_+} A_i = \bigcup_{i=1}^{\infty} A_i = \lim_{n \to \infty} \bigcup_{i=1}^{n} A_i$ is also countable.

Proof (1) For a finite set, it is obvious. For an infinite set, denote a countable set by $A = \{a_1, a_2, \ldots\}$. Then, a subset of A can be expressed as $B = \{a_{n_1}, a_{n_2}, \ldots\}$ and we can find a one-to-one correspondence $i \leftrightarrow a_{n_i}$ between $J_+$ and B.
(2) We can choose a countable subset $\{a_1, a_2, \ldots\}$ arbitrarily from an infinite set.
(3) Consider the sequence $B_1, B_2, \ldots$ defined as

\[ B_1 = A_1 = \{b_{1j}\}, \tag{1.1.32} \]
\[ B_2 = A_2 - A_1 = \{b_{2j}\}, \tag{1.1.33} \]
\[ B_3 = A_3 - (A_1 \cup A_2) = \{b_{3j}\}, \tag{1.1.34} \]
\[ \vdots \]

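Clearly, $B_1, B_2, \ldots$ are mutually exclusive and $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$. Because $B_i \subseteq A_i$, the sets $B_1, B_2, \ldots$ are all countable from Property (1). Next, arrange the elements of $B_1, B_2, \ldots$ as

\[ \begin{array}{cccccccc} b_{11} & \rightarrow & b_{12} & & b_{13} & \rightarrow & b_{14} & \cdots \\ & \swarrow & & \nearrow & & \swarrow & & \\ b_{21} & & b_{22} & & b_{23} & & b_{24} & \cdots \\ \downarrow & \nearrow & & \swarrow & & & & \\ b_{31} & & b_{32} & & b_{33} & & b_{34} & \cdots \\ \vdots & & \vdots & & \vdots & & \vdots & \ddots \end{array} \tag{1.1.35} \]

and read them in the order as directed by the arrows, which represents a one-to-one correspondence between $J_+$ and $\bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i$. ♠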
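The two enumerations used in the proofs of Theorems 1.1.5 and 1.1.6 can be sketched as Python generators. The first follows the rows of (1.1.30) and skips repetitions, as in Method 1. The second sweeps the index pairs (i, j) along anti-diagonals, which is a simpler variant of the zigzag path in (1.1.35) rather than the exact arrow order; it visits every pair exactly once all the same. The function names are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import islice


def rationals():
    """Yield every rational number once, row by row as in (1.1.30)."""
    seen = set()
    i = 0
    while True:
        if i == 0:
            row = [Fraction(0, 1)]
        else:
            negatives = [Fraction(-(i - j + 1), j) for j in range(1, i + 1)]
            positives = [Fraction(j - i, 2 * i - j + 1)
                         for j in range(i + 1, 2 * i + 1)]
            row = negatives + positives
        for q in row:
            if q not in seen:      # ignore repetitions such as 2/2 = 1/1
                seen.add(q)
                yield q
        i += 1


def index_pairs():
    """Yield all pairs (i, j) with i, j >= 1 along the lines i + j = 2, 3, ..."""
    total = 2
    while True:
        for i in range(1, total):
            yield (i, total - i)
        total += 1


print(list(islice(rationals(), 7)))
print(list(islice(index_pairs(), 6)))
```

Pairing the k-th output of either generator with the natural number k gives the one-to-one correspondence with $J_+$ that the proofs rely on.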

Property (3) of Theorem 1.1.6 also implies that a countably infinite union of countable sets is a countable set.

Example 1.1.41 (Sveshnikov 1968) Show that the Cartesian product $A_1 \times A_2 \times \cdots \times A_n = \left\{ \left( a_{1 i_1}, a_{2 i_2}, \ldots, a_{n i_n} \right) \right\}$ of a finite number of countable sets is countable.

Solution It suffices to show that the Cartesian product A × B is countable when A and B are countable. Denote two countable sets by $A = \{a_1, a_2, \ldots\}$ and $B = \{b_1, b_2, \ldots\}$. If we arrange the elements of the Cartesian product A × B as $(a_1, b_1), (a_1, b_2), (a_2, b_1), (a_1, b_3), (a_2, b_2), (a_3, b_1), \ldots$, then it is apparent that the Cartesian product is countable. ♦

Example 1.1.42 Show that the set of finite sequences from a countable set is countable.

Solution The set $B_k$ of finite sequences with length k from a countable set A is equivalent to the k-fold Cartesian product $A^k = \left\{ (b_1, b_2, \ldots, b_k) : b_j \in A \right\}$ of A. Then, $B_k$ is countable from Example 1.1.41. Next, the set of finite sequences is the countable union $\bigcup_{k=1}^{\infty} B_k$, which is countable from (3) of Theorem 1.1.6. ♦

Example 1.1.43 The set $A = \left\{ \frac{b}{a} : a, b \in Q_1 \right\}$ is countable, where $Q_1 = Q - \{0\}$ with Q the set of rational numbers. ♦

Example 1.1.44 The set $\Upsilon_T$ of infinite binary sequences with a finite period is countable. First, note that there exist³ two sequences with period 2, six sequences with period 3, . . ., at most $2^k - 2$ sequences with period k, . . .. Based on this observation, we can find the one-to-one correspondence ··· 00 ··· → 1, ··· 11 ··· → 2, ··· 0101 ··· → 3, ··· 1010 ··· → 4, ··· 001001 ··· → 5, ··· 010010 ··· → 6, ··· 100100 ··· → 7, ··· 011011 ··· → 8, ··· 101101 ··· → 9, ··· between $\Upsilon_T$ and $J_+$. ♦

³ We assume, for example, ··· 1010 ··· and ··· 0101 ··· are different.

Definition 1.1.23 (uncountable set) When no one-to-one correspondence exists between a set and a subset of $J_+$, the set is called uncountable or non-denumerable.

As it has already been mentioned, finite sets are all countable. On the other hand, some infinite sets are countable and some are uncountable.

Theorem 1.1.7 The interval set $[0, 1] = R_{[0,1]} = \{x : 0 \le x \le 1\}$, i.e., the set of real numbers in the interval [0, 1], is uncountable.

Proof We prove the theorem by contradiction. Letting $a_{ij} \in \{0, 1, \ldots, 9\}$, the elements of the set $R_{[0,1]}$ can be expressed as $0.a_{i1} a_{i2} \cdots a_{in} \cdots$. Assume $R_{[0,1]}$ is countable: in other words, assume all the elements of $R_{[0,1]}$ are enumerated as

\[ \alpha_1 = 0.a_{11} a_{12} \cdots a_{1n} \cdots \tag{1.1.36} \]
\[ \alpha_2 = 0.a_{21} a_{22} \cdots a_{2n} \cdots \tag{1.1.37} \]
\[ \vdots \]
\[ \alpha_n = 0.a_{n1} a_{n2} \cdots a_{nn} \cdots \tag{1.1.38} \]
\[ \vdots \]

Now, consider a number $\beta = 0.b_1 b_2 \cdots b_n \cdots$, where $b_i \ne a_{ii}$ and $b_i \in \{0, 1, \ldots, 9\}$. Then, it is clear that $\beta \in R_{[0,1]}$. We also have $\beta \ne \alpha_1$ because $b_1 \ne a_{11}$, $\beta \ne \alpha_2$ because $b_2 \ne a_{22}$, ···: in short, β is not equal to any $\alpha_i$. In other words, although β is an element of $R_{[0,1]}$, it is not included in the enumeration, which contradicts the assumption that all the numbers in $R_{[0,1]}$ have been enumerated. Therefore, $R_{[0,1]}$ is uncountable. ♠

Example 1.1.45 The set $\Upsilon = \{\upsilon = (a_1 a_2 \cdots) : a_i \in \{0, 1\}\}$ of one-sided infinite binary sequences is uncountable, which can be shown via a method similar to the proof of Theorem 1.1.7. Specifically, assume Υ is countable, and all the elements $\upsilon_i$ of Υ are arranged as

\[ \upsilon_1 = (a_{11} a_{12} \cdots a_{1n} \cdots) \tag{1.1.39} \]
\[ \upsilon_2 = (a_{21} a_{22} \cdots a_{2n} \cdots) \tag{1.1.40} \]
\[ \vdots \]

where $a_{ij} \in \{0, 1\}$. Denote the complement of a binary digit x by $\bar{x}$. Then, the sequence $(\bar{a}_{11} \bar{a}_{22} \cdots)$ produces a contradiction. Therefore, Υ is uncountable. ♦

Example 1.1.46 (Gelbaum and Olmsted 1964) Consider the closed interval U = [0, 1]. The open interval $D_{11} = \left( \frac{1}{3}, \frac{2}{3} \right)$ is removed from U in the first step, two open intervals $D_{21} = \left( \frac{1}{9}, \frac{2}{9} \right)$ and $D_{22} = \left( \frac{7}{9}, \frac{8}{9} \right)$ are removed from the remaining region $\left[ 0, \frac{1}{3} \right] \cup \left[ \frac{2}{3}, 1 \right]$ in the second step, . . ., $2^{k-1}$ open intervals of length $3^{-k}$ are removed in the k-th step, . . .. The limit of the remaining region C of this procedure is called the Cantor set or Cantor ternary set.

The procedure can equivalently be described as follows: Denote an open interval with the starting point

\[ \zeta_{1k} = \frac{1}{3^k} + \sum_{j=0}^{k-1} \frac{2 c_j}{3^j} \tag{1.1.41} \]

and ending point

\[ \zeta_{2k} = \frac{2}{3^k} + \sum_{j=0}^{k-1} \frac{2 c_j}{3^j} \tag{1.1.42} \]

by $A_{2c_0, 2c_1, 2c_2, \ldots, 2c_{k-1}}$, where $c_0 = 0$ and $c_j = 0$ or 1 for j = 1, 2, . . . , k − 1. Then, at the k-th step in the procedure of obtaining the Cantor set C, we are removing the $2^{k-1}$ open intervals $A_{2c_0, 2c_1, 2c_2, \ldots, 2c_{k-1}}$ of length $3^{-k}$. Specifically, we have $A_0 = \left( \frac{1}{3}, \frac{2}{3} \right)$ when k = 1; $A_{0,0} = \left( \frac{1}{9}, \frac{2}{9} \right)$ and $A_{0,2} = \left( \frac{7}{9}, \frac{8}{9} \right)$ when k = 2; $A_{0,0,0} = \left( \frac{1}{27}, \frac{2}{27} \right)$, $A_{0,0,2} = \left( \frac{7}{27}, \frac{8}{27} \right)$, $A_{0,2,0} = \left( \frac{19}{27}, \frac{20}{27} \right)$, and $A_{0,2,2} = \left( \frac{25}{27}, \frac{26}{27} \right)$ when k = 3; ···. ♦

The Cantor set C described in Example 1.1.46 has the following properties:
(1) The set C can be expressed as $C = \bigcap_{i=1}^{\infty} B_i$, where $B_1 = [0, 1]$, $B_2 = \left[ 0, \frac{1}{3} \right] \cup \left[ \frac{2}{3}, 1 \right]$, $B_3 = \left[ 0, \frac{1}{9} \right] \cup \left[ \frac{2}{9}, \frac{3}{9} \right] \cup \left[ \frac{6}{9}, \frac{7}{9} \right] \cup \left[ \frac{8}{9}, 1 \right]$, . . ..
(2) The set C is an uncountable and closed set.
(3) The length of the union of the open intervals removed when obtaining C is $\frac{1}{3} + \frac{2}{3^2} + \frac{4}{3^3} + \cdots = \frac{1/3}{1 - 2/3} = 1$. Consequently, the length of C is 0.
(4) The set C is the set of ternary real numbers between 0 and 1 that can be represented without using 1. In other words, every element of C can be expressed as $\sum_{n=1}^{\infty} \frac{x_n}{3^n}$, $x_n \in \{0, 2\}$.
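The endpoint formulas (1.1.41) and (1.1.42) can be evaluated exactly with rational arithmetic. The sketch below lists the open intervals removed at step k and adds up the removed length over the first few steps; the function names `removed_intervals` and `removed_length` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product


def removed_intervals(k):
    """Open intervals removed at step k, from (1.1.41) and (1.1.42)."""
    intervals = []
    for tail in product((0, 1), repeat=k - 1):
        c = (0,) + tail   # c_0 = 0 and c_j in {0, 1} for j = 1, ..., k - 1
        offset = sum(Fraction(2 * cj, 3 ** j) for j, cj in enumerate(c))
        start = Fraction(1, 3 ** k) + offset
        end = Fraction(2, 3 ** k) + offset
        intervals.append((start, end))
    return intervals


def removed_length(steps):
    """Total length removed in steps 1, 2, ..., `steps`."""
    return sum(end - start
               for k in range(1, steps + 1)
               for start, end in removed_intervals(k))


print(removed_intervals(2))
print(removed_length(5))
```

After K steps the removed length equals $1 - (2/3)^K$, which tends to 1 as K grows, in line with Property (3).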

In Sect. 1.3.3, the Cantor set is used as the basis for obtaining a singular function.

Example 1.1.47 (Gelbaum and Olmsted 1964) The Cantor set C considered in Example 1.1.46 has a length 0. A Cantor set with a length greater than 0 can be obtained similarly. For example, consider the interval [0, 1] and a constant α ∈ (0, 1]. In the first step, an open interval $\left( \frac{1}{2} - \frac{\alpha}{4}, \frac{1}{2} + \frac{\alpha}{4} \right)$ of length $\frac{\alpha}{2}$ is removed. In the second step, an open interval each of length $\frac{\alpha}{8}$ is removed at the center of the two closed intervals remaining. In the third step, an open interval each of length $\frac{\alpha}{32}$ is removed at the center of the four closed intervals remaining. . . .. Then, this Cantor set is a set of length 1 − α because the sum of lengths of the regions removed is $\frac{\alpha}{2} + \frac{\alpha}{4} + \frac{\alpha}{8} + \cdots = \alpha$. A Cantor set with a non-zero length is called the Smith-Volterra-Cantor set or fat Cantor set. ♦

Table 1.1 Finite, infinite, countable, and uncountable sets

            Countable (enumerable, denumerable)        Uncountable (non-denumerable)
Finite      Finite. Example: {3, 4, 5}
Infinite    Countably infinite. Example: {1, 2, . . .}  Uncountably infinite. Example: (0, 1]

Example 1.1.48 As shown in Table 1.1, the term countable set denotes a finite set or a countably infinite set, and the term infinite set denotes a countably infinite set or an uncountably infinite, simply called uncountable, set. ♦

Definition 1.1.24 (almost everywhere) In real space, when the length of the union of countably many intervals is arbitrarily small, a set of points that can be contained in the union is called a set of length 0. In addition, ‘at all points except for a set of length 0’ is called ‘almost everywhere’, ‘almost always’, ‘almost surely’, ‘with probability 1’, ‘almost certainly’, or ‘at almost every point’.

In the integer or discrete space (Jones 1982), ‘almost everywhere’ denotes ‘all points except for a finite set’.

Example 1.1.49 The intervals [1, 2) and (1, 2) are the same almost everywhere. The sets {1, 2, . . .} and {2, 3, . . .} are the same almost everywhere. ♦

Definition 1.1.25 (equivalence) If we can find a one-to-one correspondence between two sets M and N, then M and N are called equivalent, or of the same cardinality, and are denoted by M ∼ N.

Example 1.1.50 For the two sets A = {4, 2, 1, 9} and B = {8, 0, 4, 5}, we have A ∼ B. ♦

Example 1.1.51 If A ∼ B, then $2^A \sim 2^B$. ♦

Example 1.1.52 (Sveshnikov 1968) Show that A ∪ B ∼ A when A is an infinite set and B is a countable set.

Solution First, arbitrarily choose a countably infinite set $A_0 = \{a_0, a_1, \ldots\}$ from A. Because $A_0 \cup B$ is also a countably infinite set, we have a one-to-one correspondence between $A_0 \cup B$ and $A_0$: denote the one-to-one correspondence by g(x). Then,

\[ h(x) = \begin{cases} g(x), & x \in A_0 \cup B, \\ x, & x \notin A_0 \cup B \end{cases} \tag{1.1.43} \]

is a one-to-one correspondence between A ∪ B and A: that is, A ∪ B ∼ A. ♦

Fig. 1.7 The function $y = \tan \pi \left( x - \frac{1}{2} \right)$ showing the equivalence (0, 1) ∼ R between the interval (0, 1) and the set R of real numbers

Example 1.1.53 The set $J_+$ of natural numbers is equivalent to the set J of integers. The set of irrational numbers is equivalent to the set R of real numbers from Exercise 1.16. ♦

Example 1.1.54 It is interesting to note that the set of irrational numbers is not closed under certain basic operations, such as addition and multiplication, while the much smaller set of rational numbers is closed under such operations. ♦

Example 1.1.55 The Cantor and Smith-Volterra-Cantor sets considered in Examples 1.1.46 and 1.1.47, respectively, are both equivalent to the set R of real numbers. ♦

Example 1.1.56 As shown in Fig. 1.7, it is easy to see that (0, 1) ∼ R via $y = \tan \pi \left( x - \frac{1}{2} \right)$. It is also clear that [a, b] ∼ [c, d] when a < b and c < d because a point x between a and b, and a point y between c and d, have the one-to-one correspondence $y = \frac{d-c}{b-a} (x - a) + c$. ♦

Example 1.1.57 The set of real numbers R is uncountable. The intervals [a, b], [a, b), (a, b], and (a, b) are all uncountable for any real numbers a and b > a from Theorem 1.1.7 and Example 1.1.56. ♦

Theorem 1.1.8 When A is equivalent to a subset of B and B is equivalent to a subset of A, then A is equivalent to B.

Example 1.1.58 As we have observed in Theorem 1.1.5, the set $J_+$ of natural numbers is equivalent to the set Q of rational numbers. Similarly, the subset $Q_3 = \left\{ t : t = \frac{q}{3},\ q \in J \right\}$ of Q is equivalent to the set J of integers. Therefore, recollecting that $J_+$ is a subset of J and $Q_3$ is a subset of Q, Theorem 1.1.8 dictates that J ∼ Q. ♦
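The equivalence between the natural numbers and the integers stated in Example 1.1.53 can be made concrete with the correspondence (1.1.28), in which 2n is paired with −n and 2n + 1 with n. The sketch below writes that pairing in both directions; the function names are assumptions made here for illustration.

```python
def natural_to_integer(k):
    """Map k in {1, 2, 3, ...} to an integer following (1.1.28)."""
    if k < 1:
        raise ValueError("k must be a natural number")
    # Even k = 2n goes to -n, odd k = 2n + 1 goes to n.
    return -(k // 2) if k % 2 == 0 else k // 2


def integer_to_natural(z):
    """Inverse map: n >= 0 goes to 2n + 1 and -n < 0 goes to 2n."""
    return 2 * z + 1 if z >= 0 else -2 * z


print([natural_to_integer(k) for k in range(1, 8)])
```

Because each function undoes the other, every integer is reached exactly once, which is what a one-to-one correspondence requires.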


Example 1.1.59 It is interesting to note that, although J+ is a proper subset of J, which is in turn a proper subset of Q, the three sets J+ , J, and Q are all equivalent. For finite sets, on the other hand, such an equivalence is impossible when one set is a proper subset of the other. This exemplifies that the infinite and finite spaces sometimes produce different results. ♦

1.2 Functions

In this section, we will introduce and briefly review some key concepts within the theory of functions (Ito 1987; Royden 1989; Stewart 2012).

Definition 1.2.1 (mapping) A relation f that assigns to every element of a set Ω exactly one element of another set A is called a function or mapping and is often denoted by f : Ω → A.

For the function f : Ω → A, the sets Ω and A are called the domain and codomain, respectively, of f.

Example 1.2.1 Assume the domain Ω = [−1, 1] and the codomain A = [−2, 1]. The relation that connects all the points in [−1, 0) of the domain with −1 in the codomain, and all the points in [0, 1] of the domain with 1 in the codomain is a function. ♦

Example 1.2.2 Assume the domain Ω = [−1, 1] and the codomain A = [−2, 1]. The relation that connects all the points in [−1, 0) of the domain with −1 in the codomain, and all the points in (0, 1] of the domain with 1 in the codomain is not a function because the point 0 in the domain is not connected with any point in the codomain. In addition, the relation that connects all the points in [−1, 0] of the domain with −1 in the codomain, and all the points in [0, 1] of the domain with 1 in the codomain is not a function because the point 0 in the domain is connected with more than one point in the codomain. ♦

Definition 1.2.2 (set function) A function whose domain is a collection of sets is called a set function.

Example 1.2.3 Let the domain be the power set $2^C$ = {∅, {3}, {4}, {5}, {3, 4}, {3, 5}, {4, 5}, {3, 4, 5}} of C = {3, 4, 5}. Define a function f(B) for B ∈ $2^C$ as the number of elements in B. Then, f is a set function, and we have f({3}) = 1, f({3, 4}) = 2, and f({3, 4, 5}) = 3, for example. ♦

Definition 1.2.3 (image) For a function f : Ω → A and a subset G of Ω, the set

\[ f(G) = \{a : a = f(\omega),\ \omega \in G\}, \tag{1.2.1} \]

which is a subset of A, is called the image of G (under f).

Fig. 1.8 Image f(G) ⊆ A of G ⊆ Ω for a function f : Ω → A

Fig. 1.9 Range f(Ω) ⊆ A of function f : Ω → A

Definition 1.2.4 (range) For a function f : Ω → A, the image f(Ω) is called the range of the function f.

The image f(G) of G ⊆ Ω and the range f(Ω) are shown in Figs. 1.8 and 1.9, respectively.

Example 1.2.4 For the domain Ω = [−1, 1] and the codomain A = [−10, 10], consider the function $f(\omega) = \omega^2$. The image of the subset $G_1 = \left( -\frac{1}{2}, \frac{1}{2} \right)$ of the domain Ω is $f(G_1)$ = [0, 0.25), and the image of $G_2$ = (0.1, 0.2) is $f(G_2)$ = (0.01, 0.04). ♦

Example 1.2.5 The image of G = {{3}, {3, 4}} in Example 1.2.3 is f(G) = {1, 2}. ♦

Example 1.2.6 Consider the domain Ω = [−1, 1] and codomain A = [−2, 1]. Assume a function f for which all the points in [−1, 0) ⊆ Ω are mapped to −1 ∈ A and all the points in [0, 1] ⊆ Ω are mapped to 1 ∈ A. Then, the range f(Ω) = f([−1, 1]) of f is {−1, 1}, which is different from the codomain A. In Example 1.2.3, the range of f is {0, 1, 2, 3}. ♦

As we observed in Example 1.2.6, the range and codomain are not necessarily the same.

Definition 1.2.5 (inverse image) For a function f : Ω → A and a subset H of A, the subset

\[ f^{-1}(H) = \{\omega : f(\omega) \in H\}, \tag{1.2.2} \]

shown in Fig. 1.10, of Ω is called the inverse image of H (under f).
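For a finite domain, the image (1.2.1) and the inverse image (1.2.2) reduce to simple set comprehensions. The sketch below applies them to the set function of Example 1.2.3, which counts the elements of each subset of C = {3, 4, 5}. The helper names `image` and `inverse_image`, and the use of `frozenset` to represent the members of the power set, are choices made for this illustration.

```python
from itertools import chain, combinations


def image(f, subset):
    """Image f(G) of a finite subset G of the domain, as in (1.2.1)."""
    return {f(w) for w in subset}


def inverse_image(f, domain, targets):
    """Inverse image of `targets` under f over a finite domain, as in (1.2.2)."""
    return {w for w in domain if f(w) in targets}


# Domain of Example 1.2.3: the power set of C = {3, 4, 5}.
C = [3, 4, 5]
domain = {frozenset(s) for s in chain.from_iterable(
    combinations(C, r) for r in range(len(C) + 1))}

G = {frozenset({3}), frozenset({3, 4})}
print(image(len, G))                      # image of G under the counting function
print(image(len, domain))                 # range of the counting function
print(inverse_image(len, domain, {3}))    # subsets of C with three elements
```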

Fig. 1.10 Inverse image $f^{-1}(H)$ ⊆ Ω of H ⊆ A for a function f : Ω → A

Example 1.2.7 Consider the function $f(\omega) = \omega^2$ with domain Ω = [−1, 1] and codomain A = [−10, 10]. The inverse image of a subset $H_1$ = (−0.25, 1) of codomain A is $f^{-1}(H_1)$ = (−1, 1), and the inverse image of $H_2$ = (−0.25, 0) is $f^{-1}(H_2) = f^{-1}((-0.25, 0)) = \emptyset$. ♦

Example 1.2.8 In Example 1.2.3, the inverse image of H = {3} is $f^{-1}(H)$ = {{3, 4, 5}}. ♦

Definition 1.2.6 (surjection) When the range and codomain of a function are the same, the function is called an onto function, a surjective function, or a surjection.

If the range and codomain of a function are not the same, that is, if the range is a proper subset of the codomain, then the function is called an into function.

Definition 1.2.7 (injection) When the inverse image for every element of the codomain of a function has at most one element, i.e., when the inverse image for every element of the range of a function has only one element, the function is called an injective function, a one-to-one function, a one-to-one mapping, or an injection.

In Definition 1.2.7, ‘... function has at most one element, i.e., ...’ can be replaced with ‘... function is a null set, a singleton set, or a singleton collection of sets, i.e., ...’, and ‘... has only one element, ...’ with ‘... is a singleton set or a singleton collection of sets, ...’.

Example 1.2.9 For the domain Ω = [−1, 1] and the codomain A = [0, 1], consider the function $f(\omega) = \omega^2$. Then, f is a surjective function because its range is the same as the codomain, and f is not an injective function because, for any non-zero point of the range, the inverse image has two elements. ♦

Example 1.2.10 For the domain Ω = [−1, 1] and the codomain A = [−2, 2], consider the function f(ω) = ω. Then, because the range [−1, 1] is not the same as the codomain, the function f is not a surjection. Because the inverse image of every element in the range is a singleton set, the function f is an injection. ♦

Example 1.2.11 For the domain Ω = {{1}, {2, 3}} and the codomain A = {3, {4}, {5, 6, 7}}, consider the function f({1}) = 3, f({2, 3}) = {4}. Because the range {3, {4}} is not the same as the codomain, the function f is not a surjection. Because the inverse image of every element in the range has only one⁴ element, the function f is an injection. ♦
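On a finite domain, Definitions 1.2.6 and 1.2.7 can be tested directly. The sketch below uses finite analogues of Examples 1.2.9 and 1.2.10 (the three-point domain {−1, 0, 1} stands in for the interval [−1, 1], which is an assumption made here because intervals cannot be enumerated) and the function of Example 1.2.11 stored as a lookup table. The helper names are hypothetical.

```python
def is_surjective(f, domain, codomain):
    """True when the range of f over `domain` equals `codomain`."""
    return {f(w) for w in domain} == set(codomain)


def is_injective(f, domain):
    """True when no two points of `domain` share the same value of f."""
    values = [f(w) for w in domain]
    return len(values) == len(set(values))


def square(w):
    return w * w


def identity(w):
    return w


finite_domain = {-1, 0, 1}   # finite stand-in for the interval [-1, 1]

# Function of Example 1.2.11, with sets represented as frozensets.
f_table = {frozenset({1}): 3, frozenset({2, 3}): frozenset({4})}
codomain_11 = {3, frozenset({4}), frozenset({5, 6, 7})}

print(is_surjective(square, finite_domain, {0, 1}), is_injective(square, finite_domain))
print(is_surjective(f_table.get, f_table.keys(), codomain_11),
      is_injective(f_table.get, f_table.keys()))
```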

⁴ The inverse image of the element {4} of the range is not {2, 3} but {{2, 3}}, which has only one element {2, 3}.

1.2.1 One-to-One Correspondence

The notions of one-to-one mapping and one-to-one correspondence defined in Definitions 1.2.7 and 1.1.21, respectively, can be alternatively defined as in the following definitions:

Definition 1.2.8 (one-to-one mapping) A mapping is called one-to-one if the inverse image of every singleton set in the range is a singleton set.

Definition 1.2.9 (one-to-one correspondence) When the inverse image of every element in the codomain is a singleton set, the function is called a one-to-one correspondence.

A one-to-one correspondence is also called a bijection, a bijective function, or a bijective mapping. A bijective function is a surjection and an injection at the same time. A one-to-one correspondence is a one-to-one mapping for which the range and codomain are the same. For a one-to-one mapping that is not a one-to-one correspondence, the range is a proper subset of the codomain.

Example 1.2.12 For the domain Ω = [−1, 1] and the codomain A = [−1, 1], consider the function f(ω) = ω. Then, f is a surjective function because the range is the same as the codomain, and f is an injective function because, for every point of the range, the inverse image is a singleton set. In other words, f is a one-to-one correspondence and a bijective function. ♦

Theorem 1.2.1 When f is a one-to-one correspondence, we have

\[ f(A \cup B) = f(A) \cup f(B), \tag{1.2.3} \]
\[ f(A \cap B) = f(A) \cap f(B), \tag{1.2.4} \]
\[ f^{-1}(C \cup D) = f^{-1}(C) \cup f^{-1}(D), \tag{1.2.5} \]

and

\[ f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D) \tag{1.2.6} \]

for subsets A and B of the domain and subsets C and D of the range.

Proof Let us show (1.2.5) only. First, when $x \in f^{-1}(C \cup D)$, we have f(x) ∈ C or f(x) ∈ D. Then, because $x \in f^{-1}(C)$ or $x \in f^{-1}(D)$, we have $x \in f^{-1}(C) \cup f^{-1}(D)$. Next, when $x \in f^{-1}(C) \cup f^{-1}(D)$, we have f(x) ∈ C or f(x) ∈ D. Thus, we have f(x) ∈ C ∪ D, and it follows that $x \in f^{-1}(C \cup D)$. ♠

Theorem 1.2.1 implies that, if f is a one-to-one correspondence, not only can the images f(A ∪ B) and f(A ∩ B) be expressed in terms of the images f(A) and f(B), but the inverse images $f^{-1}(C \cup D)$ and $f^{-1}(C \cap D)$ can also be expressed in terms of the inverse images $f^{-1}(C)$ and $f^{-1}(D)$. Generalizing (1.2.3)–(1.2.6), we have

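\[ f\left( \bigcup_{i=1}^{n} A_i \right) = \bigcup_{i=1}^{n} f(A_i), \tag{1.2.7} \]

\[ f\left( \bigcap_{i=1}^{n} A_i \right) = \bigcap_{i=1}^{n} f(A_i), \tag{1.2.8} \]

\[ f^{-1}\left( \bigcup_{i=1}^{m} C_i \right) = \bigcup_{i=1}^{m} f^{-1}(C_i), \tag{1.2.9} \]

and

\[ f^{-1}\left( \bigcap_{i=1}^{m} C_i \right) = \bigcap_{i=1}^{m} f^{-1}(C_i) \tag{1.2.10} \]

if f is a one-to-one correspondence, where $\{A_i\}_{i=1}^{n}$ are subsets of the domain and $\{C_i\}_{i=1}^{m}$ are subsets of the range.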
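The set identities of Theorem 1.2.1 can be checked on a small finite example. The sketch below is only an illustration under assumed data: the maps `f` and `g`, the sets `A`, `B`, `C`, `D`, and the helper names `image` and `preimage` are hypothetical and do not come from the text.

```python
def image(f, subset):
    """Image f(S) of a subset S of the domain; f is a dict from points to values."""
    return {f[x] for x in subset}


def preimage(f, subset):
    """Inverse image of a subset of the codomain under the dict-based map f."""
    return {x for x, y in f.items() if y in subset}


# Hypothetical one-to-one correspondence between two four-element sets.
f = {1: "a", 2: "b", 3: "c", 4: "d"}
A, B = {1, 2}, {2, 3}
C, D = {"a", "b"}, {"b", "d"}

# All four identities (1.2.3)-(1.2.6) evaluated on this example.
bijection_ok = (
    image(f, A | B) == image(f, A) | image(f, B)
    and image(f, A & B) == image(f, A) & image(f, B)
    and preimage(f, C | D) == preimage(f, C) | preimage(f, D)
    and preimage(f, C & D) == preimage(f, C) & preimage(f, D)
)

# Hypothetical non-injective map: the image of an intersection can be
# strictly smaller than the intersection of the images.
g = {1: "a", 2: "a"}
intersection_fails = image(g, {1} & {2}) != image(g, {1}) & image(g, {2})

print(bijection_ok, intersection_fails)
```

The second check suggests why injectivity matters for the intersection identity (1.2.4); a finite check of this kind illustrates the theorem but does not prove it.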

1.2.2 Metric Space

Definition 1.2.10 (distance function) A function d satisfying the three conditions below for every three points p, q, and r is called a distance function or a metric.
(1) $d(p, q) = d(q, p)$.
(2) $d(p, q) > 0$ if $p \neq q$ and $d(p, q) = 0$ if $p = q$.
(3) $d(p, q) \le d(p, r) + d(r, q)$.
Here, $d(p, q)$ is called the distance between p and q.

Example 1.2.13 For two elements a and b in the set R of real numbers, consider the function $d(a, b) = |a - b|$. Then, we have $|a - b| = |b - a|$, $|a - b| > 0$ when $a \neq b$, and $|a - b| = 0$ when $a = b$. We also have $|a - c| + |c - b| \ge |a - b|$ from $(|\alpha| + |\beta|)^2 - |\alpha + \beta|^2 = 2(|\alpha||\beta| - \alpha\beta) \ge 0$ for real numbers α and β. Therefore, the function $d(a, b) = |a - b|$ is a distance function. ♦

Example 1.2.14 For two elements a and b in the set R of real numbers, consider the function $d(a, b) = (a - b)^2$. Then, we have $(a - b)^2 = (b - a)^2$, $(a - b)^2 > 0$ when $a \neq b$, and $(a - b)^2 = 0$ when $a = b$. Yet, because $(a - c)^2 + (c - b)^2 = (a - b)^2 + 2(c - a)(c - b) < (a - b)^2$ when $a < c < b$, the function $d(a, b) = (a - b)^2$ is not a distance function. ♦

Definition 1.2.11 (metric space; neighborhood; radius) A set is called a metric space if a distance is defined for every two points in the set. For a metric space X with distance function d, the set of all points q such that $q \in X$ and $d(p, q) < r$ is called a neighborhood of p and denoted by $N_r(p)$, where r is called the radius of $N_r(p)$.

Example 1.2.15 For the metric space $X = \{x : -2 \le x \le 5\}$ and distance function $d(a, b) = |a - b|$, the neighborhood of 0 with radius 1 is $N_1(0) = (-1, 1)$, and $N_3(0) = [-2, 3)$ is the neighborhood of 0 with radius 3. ♦

Definition 1.2.12 (limit point; closure) A point p is called a limit point of a subset E of a metric space if every neighborhood of p contains at least one point of E different from p. The union $\bar{E} = E \cup E_L$ of E and the set $E_L$ of all the limit points of E is called the closure or enclosure of E.

Example 1.2.16 For the metric space $X = \{x : -2 \le x \le 5\}$ and distance function $d(a, b) = |a - b|$, consider a subset $Y = (-1, 2]$ of X. The set of all the limit points of Y is $[-1, 2]$ and the closure of Y is $(-1, 2] \cup [-1, 2] = [-1, 2]$. ♦

Definition 1.2.13 (support) The closure of the set $\{x : f(x) \neq 0\}$ is called the support of the function $f(x)$.

Example 1.2.17 The support of the function

\[ f(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & |x| > 1 \end{cases} \tag{1.2.11} \]

is $[-1, 1]$ as shown in Fig. 1.11. ♦

Fig. 1.11 A function with support [−1, 1]

Example 1.2.18 The value of the function f (x) = sin x is 0 when x = nπ for n integer. Yet, the support of f (x) is the set R of real numbers. ♦
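The three conditions of Definition 1.2.10 can be tested numerically on a finite set of points, which reproduces the conclusions of Examples 1.2.13 and 1.2.14. This is a minimal sketch: the helper name `is_metric_on` and the sample points are assumptions made for illustration, and passing the check on finitely many points is only a necessary condition, not a proof.

```python
import itertools


def is_metric_on(points, d, tol=1e-12):
    """Check conditions (1)-(3) of a distance function on a finite point set."""
    for p, q in itertools.product(points, repeat=2):
        if abs(d(p, q) - d(q, p)) > tol:      # (1) symmetry
            return False
        if p == q and abs(d(p, q)) > tol:     # (2) d(p, p) = 0
            return False
        if p != q and d(p, q) <= 0:           # (2) positivity for p != q
            return False
    for p, q, r in itertools.product(points, repeat=3):
        if d(p, q) > d(p, r) + d(r, q) + tol:  # (3) triangle inequality
            return False
    return True


# Hypothetical sample points in R.
points = [-2.0, -0.5, 0.0, 1.0, 3.5]
abs_ok = is_metric_on(points, lambda a, b: abs(a - b))
square_ok = is_metric_on(points, lambda a, b: (a - b) ** 2)
print(abs_ok, square_ok)
```

With these points, the squared difference fails the triangle inequality, for instance with a = −2, c = 0, and b = 3.5, in line with the condition a < c < b used in Example 1.2.14.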


1.3 Continuity of Functions

When $\{x_n\}_{n=1}^{\infty}$ is a decreasing sequence and $x_n > x$, we denote it by $x_n \downarrow x$. When $\{x_n\}$ is an increasing sequence and $x_n < x$, it is denoted by $x_n \uparrow x$. Let us also use $x^- = \lim_{\varepsilon \downarrow 0}(x - \varepsilon) = \lim_{\varepsilon \uparrow 0}(x + \varepsilon)$ and $x^+ = \lim_{\varepsilon \downarrow 0}(x + \varepsilon) = \lim_{\varepsilon \uparrow 0}(x - \varepsilon)$. In other words, $x^-$ denotes a number smaller than, and arbitrarily close to, x; and $x^+$ is a number greater than, and arbitrarily close to, x.

For a function f and a point $x_0$, when $f(x_0)$ exists, $\lim_{x \to x_0} f(x) = f(x_0^-) = f(x_0^+)$ exists, and $f(x_0) = \lim_{x \to x_0} f(x)$, the function f is called continuous at point $x_0$. When a function is continuous at every point in an interval, the function is called continuous on the interval. Let us now discuss the continuity of functions (Johnsonbaugh and Pfaffenberger 1981; Khaleelulla 1982; Munroe 1971; Olmsted 1961; Rudin 1976; Steen and Seebach 1970) in more detail.

1.3.1 Continuous Functions

Definition 1.3.1 (continuous function) If, for every positive number ε and every point $x_0$ in a region S, there exists a positive number $\delta(x_0, \varepsilon)$ such that $|f(x) - f(x_0)| < \varepsilon$ for all points x in S when $|x - x_0| < \delta(x_0, \varepsilon)$, then the function f is called continuous on S.

In other words, for some point $x_0$ in S and some positive number ε, if there exists at least one point x in S such that $|x - x_0| < \delta(x_0, \varepsilon)$ yet $|f(x) - f(x_0)| \ge \varepsilon$ for every positive number $\delta(x_0, \varepsilon)$, the function f is not continuous on S.

Example 1.3.1 Consider the function

\[ u(x) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0. \end{cases} \tag{1.3.1} \]

Let $x_0 = 0$ and $\varepsilon = 1$. For $x = \frac{\delta}{2}$ with a positive number δ, we have $|x - x_0| = \frac{\delta}{2} < \delta$, yet $|u(x) - u(x_0)| = 1 \ge \varepsilon$. Thus, u is not continuous on R. ♦

Theorem 1.3.1 If f is continuous at $x_p \in E$, g is continuous at $f(x_p)$, and $h(x) = g(f(x))$ when $x \in E$, then h is continuous at $x_p$.

Proof Because g is continuous at $f(x_p)$, for every $\varepsilon > 0$, there exists a number η such that $|g(y) - g(f(x_p))| < \varepsilon$ when $|y - f(x_p)| < \eta$ for $y \in f(E)$. In addition, because f is continuous at $x_p$, there exists a number δ such that $|f(x) - f(x_p)| < \eta$ when $|x - x_p| < \delta$ for $x \in E$. In other words, for every positive number ε, there exists a number δ such that, with $y = f(x)$, $|h(x) - h(x_p)| = |g(y) - g(f(x_p))| < \varepsilon$ when $|x - x_p| < \delta$ for $x \in E$. Therefore, h is continuous at $x_p$. ♠


Definition 1.3.2 (uniform continuity) If, for every positive number ε, there exists a positive number $\delta(\varepsilon)$ such that $|f(x) - f(x_0)| < \varepsilon$ for all points x and $x_0$ in a region S when $|x - x_0| < \delta(\varepsilon)$, then the function f is called uniformly continuous on S.

In other words, if there exist at least one each of x and $x_0$ in S for a positive number ε such that $|x - x_0| < \delta(\varepsilon)$ yet $|f(x) - f(x_0)| \ge \varepsilon$ for every positive number $\delta(\varepsilon)$, then the function f is not uniformly continuous on S.

The difference between uniform continuity and continuity lies in the order of choosing the numbers $x_0$, δ, and ε. Specifically, for continuity, $x_0$ and ε are chosen first and then $\delta(x_0, \varepsilon)$ is chosen, and thus $\delta(x_0, \varepsilon)$ is dependent on $x_0$ and ε. On the other hand, for uniform continuity, ε is chosen first, $\delta(\varepsilon)$ is chosen next, and then $x_0$ is chosen last, in which $\delta(\varepsilon)$ is dependent only on ε and not on x or $x_0$. In short, the dependence of δ on $x_0$ is the key difference. When a function f is uniformly continuous, we can make $f(x_1)$ arbitrarily close to $f(x_2)$ for every two points $x_1$ and $x_2$ by moving these two points together. A uniformly continuous function is always a continuous function, but a continuous function is not always uniformly continuous. In other words, uniform continuity is a stronger or more strict concept than continuity.

Example 1.3.2 For the function $f(x) = x$ in S = R, let $\delta = \varepsilon$ with $\varepsilon > 0$. Then, when $|x - y| < \delta$, because $|f(x) - f(y)| = |x - y| < \delta = \varepsilon$, $f(x) = x$ is uniformly continuous. As shown in Exercise 1.23, the function $f(x) = \sqrt{x}$ is uniformly continuous on the interval $(0, \infty)$. The function $f(x) = \frac{1}{x}$ is uniformly continuous for all intervals $(a, \infty)$ with $a > 0$. On the other hand, as shown in Exercise 1.22, it is not uniformly continuous on the interval $(0, \infty)$. The function $f(x) = \tan x$ is continuous but not uniformly continuous on the interval $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. ♦

Example 1.3.3 In the interval $S = (0, \infty)$, consider $f(x) = x^2$. For a positive number ε and a point $x_0$ in S, let $a = x_0 + 1$ and $\delta = \min\left(1, \frac{\varepsilon}{2a}\right)$. Then, for a point x in S, when $|x - x_0| < \delta$, we have $|x - x_0| < 1$ and $x < x_0 + 1 = a$ because $\delta \le 1$. We also have $x_0 < a$. Now, we have $|x^2 - x_0^2| = (x + x_0)|x - x_0| < 2a\delta \le 2a \cdot \frac{\varepsilon}{2a} = \varepsilon$ because $\delta \le \frac{\varepsilon}{2a}$, and thus f is continuous. On the other hand, let $\varepsilon = 1$, assume a positive number δ, and choose $x_0 = \frac{1}{\delta}$ and $x = x_0 + \frac{\delta}{2}$. Then, we have $|x - x_0| = \frac{\delta}{2} < \delta$ but $|x^2 - x_0^2| = \left|\left(\frac{1}{\delta} + \frac{\delta}{2}\right)^2 - \frac{1}{\delta^2}\right| = 1 + \frac{\delta^2}{4} > 1 = \varepsilon$, implying that f is not uniformly continuous. Note that, as shown in Exercise 1.21, the function $f(x) = x^2$ is uniformly continuous on a finite interval. ♦

Theorem 1.3.2 A function f is uniformly continuous on S if

\[ |f(x) - f(y)| \le M|x - y| \tag{1.3.2} \]

for a number M and every x and y in S.

In Theorem 1.3.2, the inequality (1.3.2) and the number M are called the Lipschitz inequality and Lipschitz constant, respectively.
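The counterexample in Example 1.3.3 can be reproduced numerically. The sketch below evaluates, for a few assumed values of δ, the pair $x_0 = 1/\delta$ and $x = x_0 + \delta/2$ chosen in that example; the function name `gap_for_square` is hypothetical.

```python
def gap_for_square(delta):
    """Return (|x - x0|, |x^2 - x0^2|) for x0 = 1/delta and x = x0 + delta/2."""
    x0 = 1.0 / delta
    x = x0 + delta / 2.0
    return abs(x - x0), abs(x * x - x0 * x0)


# However small delta is, the two points are closer than delta,
# yet the values of f(x) = x^2 differ by 1 + delta^2 / 4 > 1.
for delta in (1.0, 0.1, 0.01):
    distance, difference = gap_for_square(delta)
    print(delta, distance < delta, difference > 1.0)
```

Very small values of δ are avoided here only because the subtraction of two large floating-point squares loses precision; the algebraic identity in the example holds for every δ > 0.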


Example 1.3.4 Consider S = R and $f(x) = 3x + 7$. For a positive number ε, let $\delta = \frac{\varepsilon}{3}$. Then, we have $|f(x) - f(x_0)| = 3|x - x_0| < 3\delta = \varepsilon$ when $|x - x_0| < \delta$ for every two points x and $x_0$ in S. Thus, f is uniformly continuous on S. ♦

As a special case of the Heine-Cantor theorem, we have the following theorem:

Theorem 1.3.3 If a function is differentiable and has a bounded derivative, then the function is uniformly continuous.

1.3.2 Discontinuities

Definition 1.3.3 (type 1 discontinuity) When the three values $f(x^+)$, $f(x^-)$, and $f(x)$ all exist and at least one of $f(x^+)$ and $f(x^-)$ is different from $f(x)$, the point x is called a type 1 discontinuity point or a jump discontinuity point, and the difference $f(x^+) - f(x^-)$ is called the jump or saltus of f at x.

Example 1.3.5 The function

\[ f(x) = \begin{cases} x + 1, & x > 0, \\ 0, & x = 0, \\ x - 1, & x < 0 \end{cases} \tag{1.3.3} \]

shown in Fig. 1.12 is type 1 discontinuous at x = 0 and the jump is 2. ♦

Fig. 1.12 An example of a type 1 discontinuous function at x = 0

Definition 1.3.4 (type 2 discontinuity) If at least one of $f(x^+)$ and $f(x^-)$ does not exist, then the point x is called a type 2 discontinuity point.

Example 1.3.6 The function

\[ f(x) = \begin{cases} \cos\frac{1}{x}, & x \neq 0, \\ 0, & x = 0 \end{cases} \tag{1.3.4} \]

shown in Fig. 1.13 is type 2 discontinuous at x = 0. ♦

Fig. 1.13 An example of a type 2 discontinuous function at x = 0: $f(x) = \cos\frac{1}{x}$ for $x \neq 0$ and $f(0) = 0$

Example 1.3.7 The function

\[ f(x) = \begin{cases} 1, & x = \text{rational number}, \\ 0, & x = \text{irrational number} \end{cases} \tag{1.3.5} \]

is type 2 discontinuous at any point x, and the function

\[ f(x) = \begin{cases} x, & x = \text{rational number}, \\ 0, & x = \text{irrational number} \end{cases} \tag{1.3.6} \]

is type 2 discontinuous almost everywhere: that is, at all points except x = 0. ♦

Example 1.3.8 The function

\[ f(x) = \begin{cases} \frac{1}{q}, & x = \frac{p}{q},\ p \in \mathbb{J} \text{ and } q \in \mathbb{J}_{+} \text{ are coprime}, \\ 0, & x = \text{irrational number} \end{cases} \tag{1.3.7} \]

is⁵ continuous almost everywhere: that is, f is continuous at all points except at rational numbers. The discontinuities are all type 2 discontinuities. ♦

Example 1.3.9 Show that the function

\[ f(x) = \begin{cases} \sin\frac{1}{x}, & x \neq 0, \\ 0, & x = 0 \end{cases} \tag{1.3.8} \]

is type 2 discontinuous at x = 0 and continuous at $x \neq 0$.

Solution Because $f(0^+)$ and $f(0^-)$ do not exist, $f(x)$ is type 2 discontinuous at x = 0. Next, noting that $|\sin x - \sin y| = 2\left|\sin\frac{x-y}{2}\cos\frac{x+y}{2}\right| \le 2\left|\sin\frac{x-y}{2}\right| \le |x - y|$, we have $|\sin x - \sin y| < \varepsilon$ when $|x - y| < \delta = \varepsilon$. Therefore, $\sin x$ is uniformly continuous. In addition, $\frac{1}{x}$ is continuous at $x \neq 0$. Thus, from Theorem 1.3.1, $f(x)$ is continuous at $x \neq 0$. ♦

5 This function is called Thomae's function.

1.3.3 Absolutely Continuous Functions and Singular Functions

Definition 1.3.5 (absolute continuity) Consider a finite collection $\{(a_k, b_k)\}_{k=1}^{n}$ of non-overlapping intervals with $a_k, b_k \in (-c, c)$ for a positive number c. If there exists a number $\delta = \delta(c, \varepsilon)$ such that

\[ \sum_{k=1}^{n} |f(b_k) - f(a_k)| < \varepsilon \tag{1.3.9} \]

when $\sum_{k=1}^{n} |b_k - a_k| < \delta$ for all positive numbers c and ε, then the function f is called an absolutely continuous function.

Example 1.3.10 The functions $f_1(x) = x^2$ and $f_2(x) = \sin x$ are both absolutely continuous, and $f_3(x) = \frac{1}{x}$ is absolutely continuous for x > 0. ♦

If a function $f(x)$ is absolutely continuous on a finite interval (a, b), then there exists an integrable function $f'(x)$ satisfying

\[ f(b) - f(a) = \int_{a}^{b} f'(x)\,dx, \tag{1.3.10} \]

where $-\infty < a < b < \infty$ and $f'(x)$ is the derivative of $f(x)$ at almost every point. The converse also holds true. Note that, if $f(x)$ is not absolutely continuous, the derivative does not satisfy (1.3.10) even when the derivative of $f(x)$ exists at almost every point.

Theorem 1.3.4 If a function has a bounded derivative almost everywhere and is integrable on a finite interval, and the right and left derivatives of the function exist at the points where the derivatives do not exist, then the function is absolutely continuous.

Definition 1.3.6 (singular function) A continuous, but not absolutely continuous, function is called a singular function.

In other words, a continuous function is either an absolutely continuous function or a singular function.

Example 1.3.11 Denote by $D_{ij}$ the j-th interval that is removed at the i-th step in the procedure of obtaining the Cantor set C in Example 1.1.46, where $i = 1, 2, \ldots$ and $j = 1, 2, \ldots, 2^{i-1}$. Draw $2^n - 1$ line segments

\[ \left\{ y = \frac{2j - 1}{2^i} : x \in D_{ij};\ j = 1, 2, \ldots, 2^{i-1};\ i = 1, 2, \ldots, n \right\} \tag{1.3.11} \]

parallel to the x-axis on an (x, y) coordinate plane. Next, draw a straight line each from the point (0, 0) to the left endpoint of the nearest line segment and from the point (1, 1) to the right endpoint of the nearest line segment. For every line segment, draw a straight line from the right endpoint to the left endpoint of the nearest line segment on the right-hand side. Let the function resulting from this procedure be $\phi_n(x)$. Then, $\phi_n(x)$ is continuous on the interval (0, 1), and is composed of $2^n$ line segments of height $2^{-n}$ and slope $\left(\frac{3}{2}\right)^n$ connected with $2^n - 1$ horizontal line segments. Figure 1.14 shows $\phi_1(x)$ and $\phi_2(x)$.

Fig. 1.14 The first two functions $\phi_1(x)$ and $\phi_2(x)$ of $\{\phi_n(x)\}_{n=1}^{\infty}$ converging to the Cantor function $\phi_C(x)$

The limit

\[ \phi_C(x) = \lim_{n \to \infty} \phi_n(x) \tag{1.3.12} \]

of the sequence $\{\phi_n(x)\}_{n=1}^{\infty}$ is called a Cantor function or Lebesgue function. ♦
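The limit function can be evaluated directly from the ternary digits of x, which is a common way to compute the Cantor function. The following is a minimal sketch rather than the construction used in the text: the function name `cantor` and the truncation parameter `depth` are assumptions, and exact rational arithmetic is used to avoid rounding in the digit extraction.

```python
from fractions import Fraction


def cantor(x, depth=48):
    """Approximate the Cantor function at x in [0, 1] from its ternary digits.

    Digits 0 and 2 are mapped to binary digits 0 and 1; the first ternary
    digit 1 marks a removed interval, on which the function is constant.
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    y = Fraction(0)
    scale = Fraction(1, 2)      # weight of the current binary digit
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:          # x lies in a removed interval D_ij
            return y + scale
        if digit == 2:
            y += scale
        scale /= 2
    return y


print(cantor(Fraction(1, 3)), cantor(Fraction(2, 3)), cantor(Fraction(1, 9)))
print(float(cantor(Fraction(1, 4))))
```

On the first removed interval the sketch returns 1/2, and on the interval removed around 1/9 to 2/9 it returns 1/4, matching the horizontal segments of $\phi_1(x)$ and $\phi_2(x)$ in Fig. 1.14.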



The Cantor function $\phi_C(x)$ described in Example 1.3.11 can be expressed as

\[ \phi_C(x) = \begin{cases} 0.\frac{c_1}{2}\frac{c_2}{2}\cdots, & x \in C, \\ y \text{ shown in (1.3.11)}, & x \in [0, 1] - C \end{cases} \tag{1.3.13} \]

when a point x in the Cantor set C discussed in Example 1.1.46 is written as

\[ x = 0.c_1 c_2 \cdots \tag{1.3.14} \]

in a ternary number. Now, the image of $\phi_C(x)$ is a subset of [0, 1]. In addition, because the number

\[ x = 0.(2b_1)(2b_2)\cdots \tag{1.3.15} \]

is clearly a point in C such that $\phi_C(x) = y$ when $y \in [0, 1]$ is expressed in a binary number as $y = 0.b_1 b_2 \cdots$, we have $[0, 1] \subseteq \phi_C(C)$. Therefore the range of $\phi_C(x)$ is [0, 1]. Some properties of the Cantor function $\phi_C(x)$ are as follows:


(1) The Cantor function $\phi_C(x)$ is a non-decreasing function with range [0, 1] and no jump discontinuity. Because there can be no discontinuity except for jump discontinuities in non-increasing and non-decreasing functions, $\phi_C(x)$ is a continuous function.
(2) Let E be the set of points represented by (1.1.41) and (1.1.42). Then, the function $\phi_C(x)$ is an increasing function at $x \in C - E$ and is constant in some neighborhood of every point $x \in [0, 1] - C$.
(3) As observed in Example 1.1.46, the length of $[0, 1] - C$ is 1, and $\phi_C(x)$ is constant at $x \in [0, 1] - C$. Therefore, the derivative of $\phi_C(x)$ is 0 almost everywhere.

Example 1.3.12 (Salem 1943) The Cantor function $\phi_C(x)$ considered in Example 1.3.11 is a non-decreasing singular function. Obtain an increasing singular function.

Solution Consider the line segment PQ connecting $P(x, y)$ and $Q(x + \Delta_x, y + \Delta_y)$ on a two-dimensional plane, where $\Delta_x > 0$ and $\Delta_y > 0$. Let the point R have the coordinate $\left(x + \frac{\Delta_x}{2}, y + \lambda_0 \Delta_y\right)$ with $0 < \lambda_0 < 1$. Denote the replacement of the line segment PQ with the two line segments PR and RQ by 'transformation of PQ via $T(\lambda_0)$'. Now, starting from the line segment OA between the origin O(0, 0) and the point A(1, 1), consider a sequence $\{f_n(x)\}_{n=0}^{\infty}$ defined by

$f_0(x)$ = line segment OA,
$f_1(x)$ = transformation of $f_0(x)$ via $T(\lambda_0)$,
$f_2(x)$ = transformation of each of the two line segments composing $f_1(x)$ via $T(\lambda_0)$,
$f_3(x)$ = transformation of each of the four line segments composing $f_2(x)$ via $T(\lambda_0)$,
⋮

In other words, $f_m(x)$ is increasing from $f_m(0) = 0$ to $f_m(1) = 1$ and is composed of $2^m$ line segments with the x coordinates of the end points $\left\{\frac{k}{2^m}\right\}_{k=1}^{2^m - 1}$. Figure 1.15 shows $\{f_m(x)\}_{m=0}^{3}$ with $\lambda_0 = 0.7$. Assume we represent the x coordinate of the end points of the line segments composing $f_m(x)$ as $x = \sum_{j=1}^{m} \frac{\theta_j}{2^j} = 0.\theta_1 \theta_2 \cdots \theta_m$ in a binary number, where $\theta_j \in \{0, 1\}$. Then, the y coordinate can be written as

\[ y = \sum_{k=1}^{m} \theta_k \prod_{j=1}^{k-1} \lambda_{\theta_j}, \tag{1.3.16} \]

where $\lambda_1 = 1 - \lambda_0$ and we assume $\prod_{j=1}^{k-1} \lambda_{\theta_j} = 1$ when k = 1. The limit $f(x) = \lim_{m \to \infty} f_m(x)$ of the sequence $\{f_m(x)\}_{m=1}^{\infty}$ is an increasing singular function. ♦

Fig. 1.15 The first four functions in the sequence $\{f_m(x)\}_{m=0}^{\infty}$ converging to an increasing singular function ($\lambda_0 = 0.7 = 1 - \lambda_1$)
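The geometric construction of Example 1.3.12 can be followed step by step in code. The sketch below applies the transformation $T(\lambda_0)$ repeatedly along the binary digits of x: at each step the current segment is split at its horizontal midpoint, and the ordinate of the new vertex R is taken at the fraction $\lambda_0$ of the current rise. The function name `salem` and the iteration limit `depth` are assumptions for illustration only.

```python
def salem(x, lam0=0.7, depth=60):
    """Approximate the limit function of Example 1.3.12 at x in [0, 1].

    [y_lo, y_hi] is the vertical extent of the segment currently containing x;
    each binary digit of x selects the left or the right half of that segment.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    y_lo, y_hi = 0.0, 1.0
    for _ in range(depth):
        y_mid = y_lo + lam0 * (y_hi - y_lo)   # ordinate of the new vertex R
        x *= 2.0
        if x >= 1.0:                          # binary digit 1: right half
            x -= 1.0
            y_lo = y_mid
        else:                                 # binary digit 0: left half
            y_hi = y_mid
    return y_lo


print(salem(0.5), salem(0.25), salem(0.75))
```

With $\lambda_0 = 0.7$, the vertices at x = 1/2, 1/4, and 3/4 come out near 0.7, 0.49, and 0.91, which is what the first two transformations of the segment OA produce.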



Let us note that the convolution6 of two absolutely continuous functions always results in an absolutely continuous function while the convolution of two singular functions may sometimes result not in a singular function but in an absolutely continuous function (Romano and Siegel 1986).

1.4 Step, Impulse, and Gamma Functions

In this section, we describe the properties of unit step, impulse (Challifour 1972; Gardner 1990; Gelfand and Moiseevich 1964; Hoskins and Pinto 2005; Kanwal 2004; Lighthill 1980), and gamma functions (Artin 1964; Carlson 1977; Zayed 1996) in detail.

6 The integral $\int_{-\infty}^{\infty} g(x - v) f(v)\,dv = \int_{-\infty}^{\infty} g(v) f(x - v)\,dv$ is called the convolution of f and g, and is usually denoted by $f * g$ or $g * f$.

1.4.1 Step Function

Definition 1.4.1 (unit step function) The function

\[ u(x) = \begin{cases} 0, & x < 0, \\ 1, & x > 0 \end{cases} \tag{1.4.1} \]

is called the unit step function, step function, or Heaviside function and is also denoted by $H(x)$.

In (1.4.1), the value $u(0)$ is not defined: usually, $u(0)$ is chosen as 0, $\frac{1}{2}$, 1, or any value $u_0$ between 0 and 1. Figure 1.16 shows the unit step function with $u(0) = \frac{1}{2}$. In some cases, the unit step function with value α at x = 0 is denoted by $u_\alpha(x)$, with $u_-(x)$, $u(x)$, and $u_+(x)$ denoting the cases of α = 0, $\frac{1}{2}$, and 1, respectively. The unit step function can be regarded as the integral of the impulse or delta function that will be considered in Sect. 1.4.2.

The unit step function $u(x)$ with $u(0) = \frac{1}{2}$ can be represented as the limit

\[ u(x) = \lim_{\alpha \to \infty} \left\{ \frac{1}{2} + \frac{1}{\pi} \tan^{-1}(\alpha x) \right\} \tag{1.4.2} \]

or

\[ u(x) = \lim_{\alpha \to \infty} \frac{1}{1 + e^{-\alpha x}} \tag{1.4.3} \]

of a sequence of continuous functions. We also have

\[ u(x) = \frac{1}{2} + \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\sin(\omega x)}{\omega}\,d\omega. \tag{1.4.4} \]
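The two limits (1.4.2) and (1.4.3) can be observed numerically by evaluating the continuous approximations for a large but finite α. This is a minimal sketch: the function names `step_arctan` and `step_logistic` and the chosen values of α and x are assumptions for illustration.

```python
import math


def step_arctan(x, alpha):
    """Continuous approximation of u(x) from (1.4.2) for a finite alpha."""
    return 0.5 + math.atan(alpha * x) / math.pi


def step_logistic(x, alpha):
    """Continuous approximation of u(x) from (1.4.3) for a finite alpha.

    |alpha * x| is kept below about 700 so that math.exp does not overflow.
    """
    return 1.0 / (1.0 + math.exp(-alpha * x))


alpha = 1000.0
for x in (-0.5, 0.0, 0.5):
    print(x, step_arctan(x, alpha), step_logistic(x, alpha))
```

Both approximations return exactly 1/2 at x = 0 for every α, which is consistent with the choice $u(0) = \frac{1}{2}$ stated for these representations.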

Fig. 1.16 Unit step function with $u(0) = \frac{1}{2}$

As we have observed in (1.4.2) and (1.4.3), the unit step function can be defined alternatively by first introducing a step-convergent sequence, also called the Heaviside convergent sequence or Heaviside sequence. Specifically, employing the notation⁷

\[ \langle a(x), b(x) \rangle = \int_{-\infty}^{\infty} a(x) b(x)\,dx, \tag{1.4.5} \]

a sequence $\{h_m(x)\}_{m=1}^{\infty}$ of real functions that satisfy

\[ \lim_{m \to \infty} \langle f(x), h_m(x) \rangle = \int_{0}^{\infty} f(x)\,dx = \langle f(x), u(x) \rangle \tag{1.4.6} \]

for every sufficiently smooth function $f(x)$ in the interval $-\infty < x < \infty$ is called a step-convergent sequence or a step sequence, and its limit

\[ \lim_{m \to \infty} h_m(x) \tag{1.4.7} \]

is called the unit step function.

Example 1.4.1 If we let $u(0) = \frac{1}{2}$, then $u(x) = 1 - u(-x)$ and $u(a - x) = 1 - u(x - a)$. ♦

Example 1.4.2 We can obtain⁸

\[ u(x - |a|) = u(x + a)\,u(x - a) = u\left(x^2 - a^2\right) u(x), \tag{1.4.8} \]

\[ u((x - a)(x - b)) = \begin{cases} 0, & \min(a, b) < x < \max(a, b), \\ 1, & x < \min(a, b) \text{ or } x > \max(a, b), \end{cases} \tag{1.4.9} \]

and

\[ u\left(x^2\right) = u(|x|) = \begin{cases} u(0), & x = 0, \\ 1, & x \neq 0 \end{cases} \tag{1.4.10} \]

from the definition of the unit step function. ♦

Example 1.4.3 Let $u(0) = \frac{1}{2}$. Then, the min function $\min(t, s) = \begin{cases} t, & t \le s, \\ s, & t \ge s \end{cases}$ can be expressed as

\[ \min(t, s) = t\,u(s - t) + s\,u(t - s). \tag{1.4.11} \]

In addition, we have

\[ \min(t, s) = \int_{0}^{t} u(s - y)\,dy \tag{1.4.12} \]

for $t \ge 0$ and $s \ge 0$. Similarly, the max function $\max(t, s) = \begin{cases} t, & t \ge s, \\ s, & t \le s \end{cases}$ can be expressed as

\[ \max(t, s) = t\,u(t - s) + s\,u(s - t). \tag{1.4.13} \]

Recollecting (1.4.12), we also have $\max(t, s) = t + s - \min(t, s)$, i.e.,

\[ \max(t, s) = t + s - \int_{0}^{t} u(s - y)\,dy \tag{1.4.14} \]

for $t \ge 0$ and $s \ge 0$. ♦

7 When we also take complex functions into account, the notation $\langle a(x), b(x) \rangle$ is defined as $\langle a(x), b(x) \rangle = \int_{-\infty}^{\infty} a(x) b^{*}(x)\,dx$.

8 In (1.4.8), it is implicitly assumed $u(0) = 0$ or 1.



The unit step function is also useful in expressing piecewise continuous functions as single-line formulas.

Example 1.4.4 The function

\[ F(x) = \begin{cases} x^2, & 0 < x < 1, \\ 3, & 1 < x < 2, \\ 0, & \text{otherwise} \end{cases} \tag{1.4.15} \]

can be written as $F(x) = x^2 u(x) - \left(x^2 - 3\right) u(x - 1) - 3u(x - 2)$. ♦
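The step-function expressions (1.4.11), (1.4.13), and the single-line formula of Example 1.4.4 can be evaluated directly. The sketch below assumes the convention $u(0) = \frac{1}{2}$ used in Example 1.4.3; the function names are hypothetical.

```python
def u(x, u0=0.5):
    """Unit step function with the value u0 assigned at x = 0."""
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return u0


def min_step(t, s):
    """min(t, s) written with step functions as in (1.4.11)."""
    return t * u(s - t) + s * u(t - s)


def max_step(t, s):
    """max(t, s) written with step functions as in (1.4.13)."""
    return t * u(t - s) + s * u(s - t)


def F(x):
    """Single-line form of the piecewise function in Example 1.4.4."""
    return x ** 2 * u(x) - (x ** 2 - 3) * u(x - 1) - 3 * u(x - 2)


for t, s in [(1.0, 2.0), (3.0, -1.0), (2.0, 2.0)]:
    print(min_step(t, s) == min(t, s), max_step(t, s) == max(t, s))
print(F(-1.0), F(0.5), F(1.5), F(3.0))
```

The case t = s shows why $u(0) = \frac{1}{2}$ is convenient here: the two terms each contribute half of the common value.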



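Example 1.4.5 Consider the function

\[ h_m(x) = \begin{cases} 1, & x > \frac{1}{2m}, \\ mx + \frac{1}{2}, & -\frac{1}{2m} \le x \le \frac{1}{2m}, \\ 0, & x < -\frac{1}{2m} \end{cases} \tag{1.4.16} \]

shown in Fig. 1.17. Then, we have

\[ \lim_{m \to \infty} \int_{-\infty}^{\infty} f(x) h_m(x)\,dx = \lim_{m \to \infty} \left\{ \int_{-\frac{1}{2m}}^{\frac{1}{2m}} \left(mx + \frac{1}{2}\right) f(x)\,dx + \int_{\frac{1}{2m}}^{\infty} f(x)\,dx \right\} = \int_{0}^{\infty} f(x)\,dx = \langle f(x), u(x) \rangle \]

because $\left|\int_{-\frac{1}{2m}}^{\frac{1}{2m}} f(x)\,dx\right| \le \frac{1}{m} \max_{|x| \le \frac{1}{2m}} |f(x)| \to 0$ and $\left|m \int_{-\frac{1}{2m}}^{\frac{1}{2m}} x f(x)\,dx\right| \le \max_{|x| \le \frac{1}{2m}} |x f(x)| \to 0$ when $m \to \infty$ and $f(0) < \infty$. In other words, the sequence $\{h_m(x)\}_{m=1}^{\infty}$ is a step-convergent sequence and its limit is $\lim_{m \to \infty} h_m(x) = u(x)$. ♦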

The unit step function we have described so far is defined in the continuous space. In the discrete space, the unit step function can similarly be defined.

Fig. 1.17 A function in the step-convergent sequence $\{h_m(x)\}_{m=1}^{\infty}$

Definition 1.4.2 (unit step function in discrete space) The function

\[ \tilde{u}(x) = \begin{cases} 1, & x = 0, 1, \ldots, \\ 0, & x = -1, -2, \ldots \end{cases} \tag{1.4.17} \]

is called the unit step function in discrete space.

Note that, unlike the unit step function $u(x)$ in continuous space for which the value $u(0)$ is not defined uniquely, the value $\tilde{u}(0)$ is defined uniquely as 1. In addition, for any non-zero real number a, $u(|a|x)$ is equal to $u(x)$ except possibly at x = 0 while $\tilde{u}(|a|x)$ and $\tilde{u}(x)$ are different⁹ at infinitely many points when $|a| < 1$.

1.4.2 Impulse Function Although an impulse function is also called a generalized function or a distribution, we will use the terms impulse function and generalized function in this book and reserve the term distribution for another concept later in Chap. 2. An impulse function can be introduced in three ways. The first one is to define an impulse function as the symbolic derivative of the unit step function. The second way is to define an impulse function via basic properties and the third is to take the limit of an impulse-convergent sequence.

9 In this case, we assume that $\tilde{u}(x)$ is defined to be 0 when x is not an integer.

1.4.2.1 Definitions

Fig. 1.18 Ramp functions $r_2(t)$, $r_1(t)$, and $r_{0.5}(t)$

As shown in Fig. 1.18, the ramp function

\[ r_a(t) = \begin{cases} 0, & t \le 0, \\ \frac{t}{a}, & 0 \le t \le a, \\ 1, & t \ge a \end{cases} \tag{1.4.18} \]

is a continuous function, but not differentiable. Yet, it is differentiable everywhere except at t = 0 and a. Similarly, the rectangular function  pa (t) =

0, t < 0 or t > a, 1 ,0 2m

(1.4.29)

and

\[ s_m(x) = \begin{cases} m^2 x + m, & -\frac{1}{m} \le x \le 0, \\ -m^2 x + m, & 0 \le x \le \frac{1}{m}, \\ 0, & |x| > \frac{1}{m} \end{cases} \tag{1.4.30} \]

are both delta-convergent sequences.

Example 1.4.9 For a non-negative function $f(x)$ with $\int_{-\infty}^{\infty} f(x)\,dx = 1$, the sequence $\{m f(mx)\}_{m=1}^{\infty}$ is an impulse sequence. ♦
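The behavior described in Example 1.4.9 can be illustrated numerically: for a sequence of the form $m f(mx)$, the integral against a smooth function φ should approach φ(0) as m grows. The sketch below is only an illustration under stated assumptions: f is taken to be a Gaussian density, φ is the cosine function, and the integral is approximated by a midpoint rule on a truncated interval; the names `f` and `pairing` are hypothetical.

```python
import math


def f(x):
    """Assumed non-negative function with unit integral (a Gaussian)."""
    return math.exp(-x * x) / math.sqrt(math.pi)


def pairing(m, phi, half_width=8.0, n=160000):
    """Midpoint-rule approximation of the integral of m f(m x) phi(x) dx."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += m * f(m * x) * phi(x)
    return total * h


# For phi = cos, the exact value is exp(-1 / (4 m^2)), which tends to cos(0) = 1.
for m in (1, 50):
    print(m, pairing(m, math.cos))
```

For this particular choice of f and φ the integral has the closed form $e^{-1/(4m^2)}$, so the approach to φ(0) = 1 can be compared with an exact value.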

The impulse function can now be defined based on the impulse-convergent sequence as follows:

Definition 1.4.6 (impulse function) The limit

\[ \lim_{m \to \infty} s_m(x) = \delta(x) \tag{1.4.31} \]

of a delta-convergent sequence $\{s_m(x)\}_{m=1}^{\infty}$ is called an impulse function.

Example 1.4.10 Sometimes, we have $\delta(0) \to -\infty$ as in (1.4.27). ♦

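1.4.2.2 Properties

We have

\[ \langle \delta(x), \phi(x) \rangle = \phi(0), \tag{1.4.32} \]

\[ \left\langle \delta^{(k)}(x), \phi(x) \right\rangle = (-1)^k \left\langle \delta(x), \phi^{(k)}(x) \right\rangle, \tag{1.4.33} \]

\[ \langle f(x)\delta(x), \phi(x) \rangle = \langle \delta(x), f(x)\phi(x) \rangle, \tag{1.4.34} \]

and

\[ \langle \delta(ax), \phi(x) \rangle = \left\langle \delta(x), \frac{1}{|a|} \phi\left(\frac{x}{a}\right) \right\rangle \tag{1.4.35} \]

when φ and f are sufficiently smooth functions. Letting a = −1 in (1.4.35), we get $\langle \delta(-x), \phi(x) \rangle = \langle \delta(x), \phi(-x) \rangle = \phi(0)$. Therefore

\[ \delta(-x) = \delta(x) \tag{1.4.36} \]

from (1.4.32). In other words, $\delta(x)$ is an even function.

Example 1.4.11 For the minimum function $\min(t, s) = t\,u(s - t) + s\,u(t - s)$ introduced in (1.4.11), we get $\frac{\partial}{\partial t} \min(t, s) = u(s - t)$ and

\[ \frac{\partial^2}{\partial s \partial t} \min(t, s) = \delta(s - t) = \delta(t - s) \tag{1.4.37} \]

by noting that $t\delta(s - t) - s\delta(t - s) = t\delta(s - t) - t\delta(s - t) = 0$. Similarly, for $\max(t, s) = t\,u(t - s) + s\,u(s - t)$ introduced in (1.4.13), we have $\frac{\partial}{\partial t} \max(t, s) = u(t - s)$ and

\[ \frac{\partial^2}{\partial s \partial t} \max(t, s) = -\delta(s - t) = -\delta(t - s) \tag{1.4.38} \]

by noting again that $t\delta(s - t) - s\delta(t - s) = t\delta(s - t) - t\delta(s - t) = 0$. ♦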

Let us next introduce the concept of a test function and then consider the product of a function and the n-th order derivative $\delta^{(n)}(x)$ of the impulse function.

Definition 1.4.7 (test function) A real function φ satisfying the two conditions below is called a test function.
(1) The function $\phi(x)$ is differentiable infinitely many times at every point $x = (x_1, x_2, \ldots, x_n)$.
(2) There exists a finite number A such that $\phi(x) = 0$ for every point $x = (x_1, x_2, \ldots, x_n)$ satisfying $x_1^2 + x_2^2 + \cdots + x_n^2 > A$.

A function satisfying condition (1) above is often called a $C^{\infty}$ function. Definition 1.4.7 allows a test function in the n-dimensional space: however, we will consider mainly one-dimensional test functions in this book.

Example 1.4.12 The function $\phi(x) = \exp\left(-\frac{a^2}{a^2 - x^2}\right) u(a - |x|)$ shown in Fig. 1.23 is a test function. ♦

Fig. 1.23 A test function $\phi(x) = \exp\left(-\frac{a^2}{a^2 - x^2}\right) u(a - |x|)$

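Theorem 1.4.1 If a function f is differentiable n times consecutively, then

\[ f(x)\delta^{(n)}(x - b) = (-1)^n \sum_{k=0}^{n} (-1)^k\, {}_{n}C_{k}\, f^{(n-k)}(b)\, \delta^{(k)}(x - b), \tag{1.4.39} \]

where ${}_{n}C_{k}$ denotes the binomial coefficient.

Proof Let b = 0 and assume a test function $\phi(x)$. We get $\left\langle f(x)\delta^{(n)}(x), \phi(x) \right\rangle = \int_{-\infty}^{\infty} f(x)\delta^{(n)}(x)\phi(x)\,dx = \int_{-\infty}^{\infty} \{f(x)\phi(x)\}\delta^{(n)}(x)\,dx$, i.e.,

\[ \begin{aligned} \left\langle f(x)\delta^{(n)}(x), \phi(x) \right\rangle &= \left[ \{f(x)\phi(x)\}\, \delta^{(n-1)}(x) \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \{f(x)\phi(x)\}'\, \delta^{(n-1)}(x)\,dx \\ &\;\;\vdots \\ &= (-1)^n \int_{-\infty}^{\infty} \{f(x)\phi(x)\}^{(n)} \delta(x)\,dx \end{aligned} \tag{1.4.40} \]

because $\phi(x) = 0$ for $x \to \pm\infty$. Now, (1.4.40) can be written as

\[ \begin{aligned} \left\langle f(x)\delta^{(n)}(x), \phi(x) \right\rangle = (-1)^n \Big\{ & \left\langle f^{(n)}(0)\delta(x), \phi(x) \right\rangle + (-1)^{-1} n \left\langle f^{(n-1)}(0)\delta^{(1)}(x), \phi(x) \right\rangle \\ & + (-1)^{-2} \frac{n(n-1)}{2!} \left\langle f^{(n-2)}(0)\delta^{(2)}(x), \phi(x) \right\rangle \\ & + \cdots + (-1)^{-n} \left\langle f(0)\delta^{(n)}(x), \phi(x) \right\rangle \Big\} \end{aligned} \tag{1.4.41} \]

using (1.4.33) because $\{f(x)\phi(x)\}^{(n)} = \sum_{k=0}^{n} {}_{n}C_{k}\, f^{(n-k)}(x)\phi^{(k)}(x)$. The result (1.4.41) is the same as the symbolic expression (1.4.39). ♠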

The result (1.4.39) implies that the product of a sufficiently smooth function $f(x)$ and $\delta^{(n)}(x - b)$ can be expressed as a linear combination of $\left\{\delta^{(k)}(x - b)\right\}_{k=0}^{n}$ with the coefficient of $\delta^{(k)}(x - b)$ being the product of the number $(-1)^{n-k}\, {}_{n}C_{k}$ and¹² the value $f^{(n-k)}(b)$ of $f^{(n-k)}(x)$ at x = b.

Example 1.4.13 From (1.4.39), we have

\[ f(x)\delta(x - b) = f(b)\delta(x - b) \tag{1.4.42} \]

when n = 0. ♦

Example 1.4.14 Rewriting (1.4.39) specifically for easier reference, we have

\[ f(x)\delta'(x - b) = -f'(b)\delta(x - b) + f(b)\delta'(x - b), \tag{1.4.43} \]

\[ f(x)\delta''(x - b) = f''(b)\delta(x - b) - 2f'(b)\delta'(x - b) + f(b)\delta''(x - b), \tag{1.4.44} \]

and

\[ f(x)\delta'''(x - b) = -f'''(b)\delta(x - b) + 3f''(b)\delta'(x - b) - 3f'(b)\delta''(x - b) + f(b)\delta'''(x - b) \tag{1.4.45} \]

when n = 1, 2, and 3, respectively. ♦
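The coefficients in (1.4.39) are easy to tabulate once the derivative values of f at b are known. The following sketch, with the hypothetical function name `delta_product_coeffs`, returns the coefficient of each $\delta^{(k)}(x - b)$ and can be compared with (1.4.43) to (1.4.45); the choice $f(x) = e^x$ at b = 0, for which every derivative equals 1, is an assumption made only to expose the signs and binomial factors.

```python
from math import comb


def delta_product_coeffs(derivs_at_b):
    """Coefficients of delta^(k)(x - b), k = 0..n, in f(x) delta^(n)(x - b).

    derivs_at_b[j] is assumed to hold f^(j)(b) for j = 0..n, so that the
    k-th coefficient is (-1)^(n + k) * C(n, k) * f^(n - k)(b) as in (1.4.39).
    """
    n = len(derivs_at_b) - 1
    return [(-1) ** (n + k) * comb(n, k) * derivs_at_b[n - k] for k in range(n + 1)]


# With f(x) = exp(x) and b = 0 all derivative values are 1.
print(delta_product_coeffs([1, 1]))        # pattern of (1.4.43)
print(delta_product_coeffs([1, 1, 1]))     # pattern of (1.4.44)
print(delta_product_coeffs([1, 1, 1, 1]))  # pattern of (1.4.45)
```

The printed lists follow the sign patterns (−, +), (+, −2, +), and (−, +3, −3, +) of the three displayed formulas.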

Example 1.4.15 From (1.4.43), we get $\delta'(x) \sin x = -(\cos 0)\delta(x) + (\sin 0)\delta'(x) = -\delta(x)$. ♦

Theorem 1.4.2 For non-negative integers m and n, we have $x^m \delta^{(n)}(x) = 0$ and $x^m \delta^{(n)}(x) = (-1)^m \frac{n!}{(n - m)!} \delta^{(n-m)}(x)$ when m > n and m ≤ n, respectively.

Theorem 1.4.2 can be obtained directly from (1.4.39).

Theorem 1.4.3 The impulse function $\delta(f(x))$ of a function f can be expressed as

\[ \delta(f(x)) = \sum_{m=1}^{n} \frac{\delta(x - x_m)}{|f'(x_m)|}, \tag{1.4.46} \]

where $\{x_m\}_{m=1}^{n}$ denotes the real simple zeroes of f.

Proof Assume that function f has one real simple zero $x_1$, and consider a sufficiently small interval $I_{x_1} = (\alpha, \beta)$ with $\alpha < x_1 < \beta$. Because $x_1$ is the simple zero of f, we have $f'(x_1) \neq 0$. If $f'(x_1) > 0$, then $f(x)$ increases from $f(\alpha)$ to $f(\beta)$ as x moves from α to β. Consequently, $u(f(x)) = u(x - x_1)$ and $\frac{d}{dx} u(f(x)) = \delta(x - x_1)$ on the interval $I_{x_1}$. On the other hand, we have $\frac{d}{dx} u(f(x)) = \frac{du(f(x))}{df(x)} \frac{df(x)}{dx} = \delta(f(x)) \left. \frac{df(x)}{dx} \right|_{\{x : f(x) = 0\}} = \delta(f(x)) f'(x_1)$. Thus, we get

\[ \delta(f(x)) = \frac{\delta(x - x_1)}{f'(x_1)}. \tag{1.4.47} \]

Similarly, if $f'(x_1) < 0$, we have $u(f(x)) = u(x_1 - x)$ and $\delta(f(x)) = -\frac{\delta(x - x_1)}{f'(x_1)}$ on the interval $I_{x_1}$. In other words,

\[ \delta(f(x)) = \frac{\delta(x - x_1)}{|f'(x_1)|}. \tag{1.4.48} \]

Extending (1.4.48) to all the real simple zeroes of f, we get (1.4.46). ♠

12 Note that $(-1)^{n+k} = (-1)^{n-k}$.

Example 1.4.16 From (1.4.46), we can get

\[ \delta(ax + b) = \frac{1}{|a|} \delta\left(x + \frac{b}{a}\right) \tag{1.4.49} \]

and $\delta\left(x^3 + 3x\right) = \frac{1}{3}\delta(x)$. ♦
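Formula (1.4.46) can be checked numerically by replacing the impulse function with a narrow Gaussian and integrating against a smooth function. This is a sketch under stated assumptions, not a derivation: the Gaussian `nascent_delta`, its width `eps`, the midpoint rule, and the choices $f(x) = (x - 1)(x - 2)$ and $\phi(x) = \cos x$ are all illustrative. For this f, the simple zeroes are 1 and 2 with $|f'(1)| = |f'(2)| = 1$, so (1.4.46) predicts the value $\cos 1 + \cos 2$.

```python
import math


def nascent_delta(y, eps):
    """Narrow Gaussian of unit area used as a stand-in for delta(y)."""
    return math.exp(-0.5 * (y / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))


def pair_delta_of_f(f, phi, lo, hi, eps=1e-3, n=300000):
    """Midpoint-rule approximation of the integral of delta_eps(f(x)) phi(x) dx."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += nascent_delta(f(x), eps) * phi(x)
    return total * h


approx = pair_delta_of_f(lambda x: (x - 1.0) * (x - 2.0), math.cos, 0.0, 3.0)
predicted = math.cos(1.0) / 1.0 + math.cos(2.0) / 1.0  # phi(x_m) / |f'(x_m)| summed
print(approx, predicted)
```

The two printed numbers agree to several decimal places; the small remaining difference comes from the finite width of the Gaussian.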

Example 1.4.17 Based on (1.4.46), it can be shown that $\delta((x - a)(x - b)) = \frac{1}{b - a}\{\delta(x - a) + \delta(x - b)\}$ when b > a, $\delta(\tan x) = \delta(x)$ when $-\frac{\pi}{2} < x < \frac{\pi}{2}$, and $\delta(\cos x) = \delta\left(x - \frac{\pi}{2}\right)$ when $0 < x < \pi$. ♦

For the function¹³ $\delta'(f(x)) = \left. \frac{d\delta(v)}{dv} \right|_{v = f(x)}$ or

\[ \delta'(f(x)) = \frac{d\delta(f(x))}{df(x)}, \tag{1.4.50} \]

we similarly have the following theorem:

Theorem 1.4.4 We have

\[ \delta'(f(x)) = \sum_{m=1}^{n} \frac{1}{|f'(x_m)|} \left\{ \frac{f''(x_m)}{\{f'(x_m)\}^2} \delta(x - x_m) + \frac{\delta'(x - x_m)}{f'(x_m)} \right\}, \tag{1.4.51} \]

where $\{x_m\}_{m=1}^{n}$ denote the real simple zeroes of $f(x)$.

Proof We recollect that

\[ \frac{\delta'(x - x_m)}{f'(x)} = \frac{f''(x_m)}{\{f'(x_m)\}^2} \delta(x - x_m) + \frac{1}{f'(x_m)} \delta'(x - x_m) \tag{1.4.52} \]

from (1.4.43) because $\left. \left\{\frac{1}{f'(x)}\right\}' \right|_{x = x_m} = \left. -\frac{f''(x)}{\{f'(x)\}^2} \right|_{x = x_m} = -\frac{f''(x_m)}{\{f'(x_m)\}^2}$. Recollect also that

\[ \frac{d}{dx} \delta(f(x)) = \sum_{m=1}^{n} \frac{\delta'(x - x_m)}{|f'(x_m)|} \tag{1.4.53} \]

from (1.4.46). Then, because $\delta'(f) = \frac{d\delta(f)}{df} = \frac{dx}{df} \frac{d\delta(f)}{dx}$, we have

\[ \delta'(f(x)) = \frac{1}{f'(x)} \sum_{m=1}^{n} \frac{\delta'(x - x_m)}{|f'(x_m)|} \tag{1.4.54} \]

from (1.4.53). Now, employing (1.4.52) into (1.4.54) results in

\[ \delta'(f(x)) = \sum_{m=1}^{n} \frac{1}{|f'(x_m)|} \left\{ \frac{f''(x_m)}{\{f'(x_m)\}^2} \delta(x - x_m) + \frac{\delta'(x - x_m)}{f'(x_m)} \right\}, \tag{1.4.55} \]

completing the proof. ♠

When the real simple zeroes of a sufficiently smooth function $f(x)$ are $\{x_m\}_{m=1}^{n}$, Theorem 1.4.3 indicates that $\delta(f(x))$ can be expressed as a linear combination of $\{\delta(x - x_m)\}_{m=1}^{n}$ and Theorem 1.4.4 similarly indicates that $\delta'(f(x))$ can be expressed as a linear combination of $\{\delta(x - x_m), \delta'(x - x_m)\}_{m=1}^{n}$.

Example 1.4.18 The function $f(x) = (x - 1)(x - 2)$ has two simple zeroes x = 1 and 2. We thus have $\delta'((x - 1)(x - 2)) = 2\delta(x - 1) + 2\delta(x - 2) - \delta'(x - 1) + \delta'(x - 2)$ because $f'(1) = -1$, $f''(1) = 2$, $f'(2) = 1$, and $f''(2) = 2$. ♦

Example 1.4.19 The function $f(x) = \sinh 2x$ has one simple zero x = 0. Then, we get $\delta'(\sinh 2x) = \frac{1}{2}\left\{\frac{1}{2}\delta'(x) + 0\right\} = \frac{1}{4}\delta'(x)$ from $f'(0) = 2\cosh 0 = 2$ and $f''(0) = 4\sinh 0 = 0$. ♦

13 Note that $g'(f(x)) = \left[\frac{dg(y)}{dy}\right]_{y = f(x)} \neq \frac{d}{dx} g(f(x))$. In other words, $g'(f(x))$ denotes $\left[\frac{dg(y)}{dy}\right]_{y = f(x)}$ or $\frac{dg(f)}{df}$, but not $\frac{d}{dx} g(f(x))$.
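The coefficients in Theorem 1.4.4 depend only on $f'(x_m)$ and $f''(x_m)$ at each simple zero, so Examples 1.4.18 and 1.4.19 can be reproduced with a few lines of arithmetic. In the sketch below the function name `delta_prime_of_f_coeffs` is hypothetical, and the zeroes and the two derivative functions are supplied by the caller rather than computed.

```python
import math


def delta_prime_of_f_coeffs(zeros, d1, d2):
    """For each simple zero x_m, return (x_m, c0, c1) such that delta'(f(x))
    contains c0 * delta(x - x_m) + c1 * delta'(x - x_m), following (1.4.51).

    d1 and d2 are callables returning f'(x) and f''(x).
    """
    result = []
    for xm in zeros:
        f1, f2 = d1(xm), d2(xm)
        c0 = f2 / (abs(f1) * f1 ** 2)   # coefficient of delta(x - x_m)
        c1 = 1.0 / (abs(f1) * f1)       # coefficient of delta'(x - x_m)
        result.append((xm, c0, c1))
    return result


# f(x) = (x - 1)(x - 2): f'(x) = 2x - 3 and f''(x) = 2.
print(delta_prime_of_f_coeffs([1.0, 2.0], lambda x: 2.0 * x - 3.0, lambda x: 2.0))

# f(x) = sinh(2x): f'(x) = 2 cosh(2x) and f''(x) = 4 sinh(2x).
print(delta_prime_of_f_coeffs(
    [0.0], lambda x: 2.0 * math.cosh(2.0 * x), lambda x: 4.0 * math.sinh(2.0 * x)))
```

The first call gives the coefficient pairs (2, −1) at x = 1 and (2, 1) at x = 2, and the second gives (0, 1/4) at x = 0, in agreement with the two examples.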

1.4.3 Gamma Function In this section, we address definitions and properties of the factorial, binomial coefficient, and gamma function (Andrews 1999; Wallis and George 2010).

1.4.3.1

Factorial and Binomial Coefficient

Definition 1.4.8 (falling factorial; factorial) We call

46

1 Preliminaries

 [m]k =

1, k = 0, m(m − 1) · · · (m − k + 1), k = 1, 2, . . .

(1.4.56)

the k falling factorial of m. The number of enumeration of n distinct objects is n! = [n]n

(1.4.57)

for n = 1, 2, . . ., where the symbol ! is called the factorial. Consequently, 0! = 1.

(1.4.58)

Example 1.4.20 If we use each of the five numbers $\{1, 2, 3, 4, 5\}$ once, we can generate $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$ five-digit numbers. ♦

Definition 1.4.9 (permutation) The number of ordered arrangements with $k$ different items from $n$ different items is
\[ {}_nP_k = [n]_k \tag{1.4.59} \]
for $k = 0, 1, \ldots, n$, and ${}_nP_k$ is called the $(n, k)$ permutation.

Example 1.4.21 If we choose two numbers from $\{2, 3, 4, 5, 6\}$ and use each of the two numbers only once, then we can make ${}_5P_2 = 20$ two-digit numbers. ♦

Theorem 1.4.5 The number of arrangements with $k$ different items from $n$ different items is $n^k$ if repetitions are allowed.

Example 1.4.22 We can make $10^4$ passwords of four digits with $\{0, 1, \ldots, 9\}$. ♦

Definition 1.4.10 (combination) The number of ways to choose $k$ different items from $n$ items is
\[ {}_nC_k = \frac{n!}{(n-k)!\,k!} \tag{1.4.60} \]
for $k = 0, 1, \ldots, n$, and ${}_nC_k$, written also as $\binom{n}{k}$, is called the $(n, k)$ combination.

The symbol ${}_nC_k$ shown in (1.4.60) is also called the binomial coefficient, and satisfies
\[ {}_nC_k = {}_nC_{n-k}. \tag{1.4.61} \]
From (1.4.59) and (1.4.60), we have
\[ {}_nC_k = \frac{{}_nP_k}{k!}. \tag{1.4.62} \]
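The relations (1.4.56)–(1.4.62) can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later so that `math.perm` and `math.comb` are available; the helper name `falling_factorial` is introduced here only for illustration.

```python
import math

def falling_factorial(m: int, k: int) -> int:
    """[m]_k = m (m-1) ... (m-k+1), with [m]_0 = 1, as in (1.4.56)."""
    result = 1
    for i in range(k):
        result *= m - i
    return result

# (1.4.57): n! = [n]_n
print(falling_factorial(5, 5), math.factorial(5))
# (1.4.59): nPk = [n]_k
print(falling_factorial(5, 2), math.perm(5, 2))
# (1.4.62): nCk = nPk / k!
print(math.perm(10, 2) // math.factorial(2), math.comb(10, 2))
# (1.4.61): nCk = nC(n-k)
print(math.comb(10, 2), math.comb(10, 8))
```

Integer arithmetic is used throughout, so the comparisons are exact rather than approximate.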


Example 1.4.23 We can choose two different numbers from the set $\{0, 1, \ldots, 9\}$ in ${}_{10}C_2 = 45$ different ways. ♦

Theorem 1.4.6 For $n$ repetitions of choosing one element from $\{\omega_1, \omega_2, \ldots, \omega_m\}$, the number of results in which we have $n_i$ of $\omega_i$ for $i = 1, 2, \ldots, m$ is
\[ \binom{n}{n_1, n_2, \ldots, n_m} = \begin{cases} \dfrac{n!}{n_1!\,n_2!\cdots n_m!}, & \text{if } \sum\limits_{i=1}^{m} n_i = n, \\ 0, & \text{otherwise}, \end{cases} \tag{1.4.63} \]
where $n_i \in \{0, 1, \ldots, n\}$. The left-hand side $\binom{n}{n_1, n_2, \ldots, n_m}$ of (1.4.63) is called the multinomial coefficient.

Proof We have
\[ \binom{n}{n_1}\binom{n-n_1}{n_2}\cdots\binom{n-n_1-n_2-\cdots-n_{m-1}}{n_m} = \frac{n!}{n_1!\,(n-n_1)!}\,\frac{(n-n_1)!}{n_2!\,(n-n_1-n_2)!}\cdots\frac{(n-n_1-n_2-\cdots-n_{m-1})!}{n_m!} = \frac{n!}{n_1!\,n_2!\cdots n_m!} \tag{1.4.64} \]
because the number of desired results is the number that $\omega_1$ occurs $n_1$ times among $n$ occurrences, $\omega_2$ occurs $n_2$ times among the remaining $n - n_1$ occurrences, $\cdots$, and $\omega_m$ occurs $n_m$ times among the remaining $n - n_1 - n_2 - \cdots - n_{m-1}$ occurrences. ♠

The multinomial coefficient is clearly a generalization of the binomial coefficient.

Example 1.4.24 Let $A = \{1, 2, 3\}$, $B = \{4, 5\}$, and $C = \{6\}$ in rolling a die. When the rolling is repeated 10 times, the number of results in which $A$, $B$, and $C$ occur four, five, and one times, respectively, is $\binom{10}{4, 5, 1} = 1260$. ♦

1.4.3.2 Gamma Function

For $\alpha > 0$,
\[ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx \tag{1.4.65} \]
is called the gamma function, which satisfies
\[ \Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1) \tag{1.4.66} \]
and
\[ \Gamma(n) = (n-1)! \tag{1.4.67} \]
when $n$ is a natural number. In other words, the gamma function can be viewed as a generalization of the factorial.

Let us now consider a further generalization. When $\alpha < 0$ and $\alpha \ne -1, -2, \ldots$, we can define the gamma function as
\[ \Gamma(\alpha) = \frac{\Gamma(\alpha + k + 1)}{\alpha(\alpha+1)(\alpha+2)\cdots(\alpha+k)}, \tag{1.4.68} \]

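where $k$ is the smallest integer such that $\alpha + k + 1 > 0$. Next, for a complex number $z$, let14
\[ (z)_n = \begin{cases} 1, & n = 0, \\ z(z+1)\cdots(z+n-1), & n = 1, 2, \ldots. \end{cases} \tag{1.4.69} \]
Then, for non-negative integers $\alpha$ and $n$, we can express the factorial as
\[ \alpha! = \frac{(\alpha+n)(\alpha+n-1)\cdots(\alpha+1)\alpha(\alpha-1)\cdots 1}{(\alpha+1)(\alpha+2)\cdots(\alpha+n)} = \frac{(\alpha+n)!}{(\alpha+1)_n} \tag{1.4.70} \]
from (1.4.67)–(1.4.69). Rewriting (1.4.70) as $\alpha! = \frac{(n+1)_\alpha\, n!}{(\alpha+1)_n}$ and subsequently as
\[ \alpha! = \frac{n^{\alpha} n!}{(\alpha+1)_n}\,\frac{(n+1)_\alpha}{n^{\alpha}}, \tag{1.4.71} \]
we have
\[ \alpha! = \lim_{n\to\infty} \frac{n^{\alpha} n!}{(\alpha+1)_n} \tag{1.4.72} \]
because $\lim\limits_{n\to\infty} \frac{(n+1)_\alpha}{n^{\alpha}} = \lim\limits_{n\to\infty} \left(1 + \frac{1}{n}\right)\left(1 + \frac{2}{n}\right)\cdots\left(1 + \frac{\alpha}{n}\right) = 1$. Based on (1.4.72), the gamma function for a complex number $\alpha$ such that $\alpha \ne 0, -1, -2, \ldots$ can be defined as
\[ \Gamma(\alpha) = \lim_{n\to\infty} \frac{n^{\alpha-1} n!}{(\alpha)_n}, \tag{1.4.73} \]
which can also be written as
\[ \Gamma(\alpha) = \lim_{n\to\infty} \frac{n^{\alpha} n!}{(\alpha)_{n+1}} \tag{1.4.74} \]
because $\lim\limits_{n\to\infty} \frac{n^{\alpha-1} n!}{(\alpha)_n} = \lim\limits_{n\to\infty} \frac{(\alpha+n)\, n^{\alpha-1} n!}{(\alpha)_{n+1}} = \lim\limits_{n\to\infty} \frac{n^{\alpha} n!}{(\alpha)_{n+1}}\left(1 + \frac{\alpha}{n}\right)$.

14 Here, $(z)_n$ is called the rising factorial, ascending factorial, rising sequential product, upper factorial, Pochhammer's symbol, Pochhammer function, or Pochhammer polynomial, and is the same as Appell's symbol $(z, n)$.

Now, recollecting $\Gamma(1) = \lim\limits_{n\to\infty} \frac{n^{0} n!}{(1)_n} = \lim\limits_{n\to\infty} \frac{n!}{n!} = 1$ and $(\alpha+1)_n = (\alpha)_n \frac{\alpha+n}{\alpha}$, we have $\Gamma(\alpha+1) = \lim\limits_{n\to\infty} \frac{n^{\alpha} n!}{(\alpha+1)_n} = \lim\limits_{n\to\infty} \frac{\alpha n}{\alpha+n} \lim\limits_{n\to\infty} \frac{n^{\alpha-1} n!}{(\alpha)_n}$ from (1.4.73), and therefore
\[ \Gamma(\alpha+1) = \alpha\Gamma(\alpha), \tag{1.4.75} \]
which is the same as (1.4.66). Based on (1.4.75), we can obtain $\lim\limits_{p\to 0} p\Gamma(p) = \lim\limits_{p\to 0} \Gamma(p+1)$, i.e.,
\[ \lim_{p\to 0} p\Gamma(p) = 1. \tag{1.4.76} \]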
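The limit definition (1.4.73) can be explored numerically for real $\alpha > 0$. The sketch below is only an illustration: the function name `gamma_euler_limit` is hypothetical, and logarithms (`math.lgamma`, `math.log`) are used so that $n!$ does not overflow for large $n$.

```python
import math

def gamma_euler_limit(alpha: float, n: int) -> float:
    """Finite-n value of n^(alpha-1) n! / (alpha)_n from (1.4.73), for alpha > 0."""
    log_value = (alpha - 1.0) * math.log(n) + math.lgamma(n + 1)
    # Subtract the log of the rising factorial (alpha)_n = alpha (alpha+1) ... (alpha+n-1).
    log_value -= sum(math.log(alpha + k) for k in range(n))
    return math.exp(log_value)

# The finite-n values approach math.gamma(alpha) as n grows.
for n in (10, 1_000, 100_000):
    print(n, gamma_euler_limit(2.5, n), math.gamma(2.5))
```

The convergence is slow (the error shrinks roughly like $1/n$), which is why the limit form is mainly of theoretical use.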

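In parallel, when $a \ge b$, we have $\lim\limits_{n\to\infty} (cn)^{b-a} \frac{\Gamma(cn+a)}{\Gamma(cn+b)} = \lim\limits_{n\to\infty} (cn)^{b-a}(cn+a-1)(cn+a-2)\cdots(cn+b) = \lim\limits_{n\to\infty} (cn)^{b-a}(cn)^{a-b}$, i.e.,
\[ \lim_{n\to\infty} (cn)^{b-a} \frac{\Gamma(cn+a)}{\Gamma(cn+b)} = 1. \tag{1.4.77} \]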

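We can similarly show that (1.4.77) also holds true for $a < b$. The gamma function $\Gamma(\alpha)$ is analytic at all points except at $\alpha = 0, -1, \ldots$. In addition, noting that $\lim\limits_{\alpha\to -k} (\alpha+k)\Gamma(\alpha) = \lim\limits_{\alpha\to -k} \lim\limits_{n\to\infty} \frac{(\alpha+k)\, n^{\alpha-1} n!}{(\alpha)_n}$ or
\[ \begin{aligned} \lim_{\alpha\to -k} (\alpha+k)\Gamma(\alpha) &= \lim_{n\to\infty} \frac{n^{-k-1} n!}{(-k)(-k+1)\cdots(-k+n-1)} \\ &= \frac{(-1)^k}{k!} \lim_{n\to\infty} \frac{n!}{n^{k+1}(n-k-1)!} \\ &= \frac{(-1)^k}{k!} \lim_{n\to\infty} \frac{(n-k)(n-k+1)\cdots n}{n^{k+1}} \\ &= \frac{(-1)^k}{k!} \lim_{n\to\infty} \left(1 - \frac{k}{n}\right)\left(1 - \frac{k-1}{n}\right)\cdots\left(1 - \frac{0}{n}\right) \\ &= \frac{(-1)^k}{k!} \end{aligned} \tag{1.4.78} \]
for $k = 0, 1, \ldots$ because $(-k)(-k+1)\cdots(-k+n-1) = (-k)(-k+1)\cdots(-k+k-1)(-k+k+1)\cdots(-k+n-1) = (-1)^k k!\,(n-k-1)!$, the residue of $\Gamma(\alpha)$ is $\frac{(-1)^{-\alpha}}{(-\alpha)!}$ at the simple pole $\alpha \in \{0, -1, -2, \ldots\}$.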


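As we shall show later in (1.A.31)–(1.A.39), we have (Abramowitz and Stegun 1972)
\[ \Gamma(1-x)\Gamma(x) = \frac{\pi}{\sin \pi x} \tag{1.4.79} \]
for $0 < x < 1$, which is called the Euler reflection formula. Because we have $\Gamma(\epsilon - n)\Gamma(n+1-\epsilon) = \frac{\pi}{\sin \pi\{(n+1)-\epsilon\}} = (-1)^n \frac{\pi}{\sin \pi\epsilon}$ when $x = n+1-\epsilon$ and $\Gamma(-\epsilon)\Gamma(1+\epsilon) = \frac{\pi}{\sin(\pi + \pi\epsilon)} = -\frac{\pi}{\sin \pi\epsilon}$ when $x = 1 + \epsilon$ from (1.4.79), we have
\[ \Gamma(\epsilon - n) = (-1)^{n-1} \frac{\Gamma(-\epsilon)\Gamma(1+\epsilon)}{\Gamma(n+1-\epsilon)}. \tag{1.4.80} \]
Replacing $x$ with $\frac{1}{2} + x$ and $\frac{3}{4} + x$ in (1.4.79), we have
\[ \Gamma\left(\frac{1}{2} - x\right)\Gamma\left(\frac{1}{2} + x\right) = \frac{\pi}{\cos \pi x} \tag{1.4.81} \]
and
\[ \Gamma\left(\frac{1}{4} - x\right)\Gamma\left(\frac{3}{4} + x\right) = \frac{\sqrt{2}\,\pi}{\cos \pi x - \sin \pi x}, \tag{1.4.82} \]
respectively.

Example 1.4.25 We get
\[ \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi} \tag{1.4.83} \]
with $x = \frac{1}{2}$ in (1.4.79). ♦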
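The reflection formula (1.4.79) and its shifted forms (1.4.81) and (1.4.82) are easy to spot-check with the standard-library `math.gamma`. This is a minimal sketch; the helper names and the sample point $x = 0.1$ are chosen here only for illustration.

```python
import math

def reflection_lhs(x: float) -> float:
    """Left-hand side of (1.4.79): Gamma(1 - x) Gamma(x)."""
    return math.gamma(1.0 - x) * math.gamma(x)

def reflection_rhs(x: float) -> float:
    """Right-hand side of (1.4.79): pi / sin(pi x)."""
    return math.pi / math.sin(math.pi * x)

x = 0.1
# (1.4.79) at a sample point in (0, 1)
print(reflection_lhs(0.3), reflection_rhs(0.3))
# (1.4.81): Gamma(1/2 - x) Gamma(1/2 + x) = pi / cos(pi x)
lhs_81 = math.gamma(0.5 - x) * math.gamma(0.5 + x)
rhs_81 = math.pi / math.cos(math.pi * x)
# (1.4.82): Gamma(1/4 - x) Gamma(3/4 + x) = sqrt(2) pi / (cos(pi x) - sin(pi x))
lhs_82 = math.gamma(0.25 - x) * math.gamma(0.75 + x)
rhs_82 = math.sqrt(2.0) * math.pi / (math.cos(math.pi * x) - math.sin(math.pi * x))
print(lhs_81, rhs_81, lhs_82, rhs_82)
# (1.4.83): Gamma(1/2) = sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))
```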

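Example 1.4.26 By recollecting (1.4.75) and (1.4.83), we can obtain $\Gamma\left(\frac{1}{2} + k\right) = \left(\frac{1}{2} + k - 1\right)\left(\frac{1}{2} + k - 2\right)\cdots\frac{1}{2}\,\Gamma\left(\frac{1}{2}\right) = \frac{1\times 3\times\cdots\times(2k-1)}{2^k}\sqrt{\pi}$, i.e.,
\[ \Gamma\left(\frac{1}{2} + k\right) = \frac{\Gamma(2k+1)}{2^{2k}\,\Gamma(k+1)}\sqrt{\pi} = \frac{(2k)!}{2^{2k}\,k!}\sqrt{\pi} \tag{1.4.84} \]
for $k \in \{0, 1, \ldots\}$. Similarly, we get
\[ \Gamma\left(\frac{1}{2} - k\right) = (-1)^k \frac{2^{2k}\,k!}{(2k)!}\sqrt{\pi} \tag{1.4.85} \]
for $k \in \{0, 1, \ldots\}$ using $\Gamma\left(\frac{1}{2}\right) = \left(\frac{1}{2} - 1\right)\left(\frac{1}{2} - 2\right)\cdots\left(\frac{1}{2} - k\right)\Gamma\left(\frac{1}{2} - k\right) = (-1)^k \frac{1\times 3\times\cdots\times(2k-1)}{2^k}\,\Gamma\left(\frac{1}{2} - k\right) = (-1)^k \frac{(2k)!}{2^{2k}\,k!}\,\Gamma\left(\frac{1}{2} - k\right)$. ♦

From (1.4.84) and (1.4.85), we have
\[ \Gamma\left(\frac{1}{2} - k\right)\Gamma\left(\frac{1}{2} + k\right) = (-1)^k \pi, \tag{1.4.86} \]
which is the same as (1.4.81) with $x$ an integer $k$. We can obtain $\Gamma\left(\frac{3}{2}\right) = \frac{1}{2}\Gamma\left(\frac{1}{2}\right) = \frac{\sqrt{\pi}}{2}$, $\Gamma\left(\frac{5}{2}\right) = \frac{3}{2}\Gamma\left(\frac{3}{2}\right) = \frac{3\sqrt{\pi}}{4}$, $\Gamma\left(\frac{7}{2}\right) = \frac{5}{2}\Gamma\left(\frac{5}{2}\right) = \frac{15\sqrt{\pi}}{8}$, $\cdots$ from (1.4.84), and $\Gamma\left(-\frac{1}{2}\right) = -2\Gamma\left(\frac{1}{2}\right) = -2\sqrt{\pi}$, $\Gamma\left(-\frac{3}{2}\right) = -\frac{2}{3}\Gamma\left(-\frac{1}{2}\right) = \frac{4\sqrt{\pi}}{3}$, $\Gamma\left(-\frac{5}{2}\right) = -\frac{2}{5}\Gamma\left(-\frac{3}{2}\right) = -\frac{8\sqrt{\pi}}{15}$, $\cdots$ from (1.4.85). Some of such values are shown in Table 1.2. We can rewrite (1.4.79) as
\[ \Gamma(1-x)\Gamma(1+x) = \frac{\pi x}{\sin \pi x} \tag{1.4.87} \]
by recollecting $\Gamma(x+1) = x\Gamma(x)$ shown in (1.4.75). In addition, using (1.4.79)–(1.4.82), (1.4.86), and (1.4.87), we have15 $\Gamma\left(\frac{1}{3}\right)\Gamma\left(\frac{2}{3}\right) = \frac{2\pi}{\sqrt{3}}$, $\Gamma\left(\frac{1}{4}\right)\Gamma\left(\frac{3}{4}\right) = \sqrt{2}\,\pi$, and $\Gamma\left(\frac{1}{6}\right)\Gamma\left(\frac{5}{6}\right) = 2\pi$.

Note also that, by letting $v = t^{\beta}$ when $\beta > 0$, we have $\int_0^{\infty} t^{\alpha} \exp\left(-t^{\beta}\right) dt = \int_0^{\infty} v^{\frac{\alpha}{\beta}} e^{-v} \frac{1}{\beta} v^{\frac{1}{\beta}-1}\,dv = \frac{1}{\beta}\int_0^{\infty} v^{\frac{\alpha+1}{\beta}-1} e^{-v}\,dv = \frac{1}{\beta}\Gamma\left(\frac{\alpha+1}{\beta}\right)$. Subsequently, we have
\[ \int_0^{\infty} t^{\alpha} \exp\left(-t^{\beta}\right) dt = \frac{1}{|\beta|}\Gamma\left(\frac{\alpha+1}{\beta}\right) \tag{1.4.88} \]
because $\int_0^{\infty} t^{\alpha} \exp\left(-t^{\beta}\right) dt = \int_{\infty}^{0} v^{\frac{\alpha}{\beta}} e^{-v} \frac{1}{\beta} v^{\frac{1}{\beta}-1}\,dv = -\frac{1}{\beta}\Gamma\left(\frac{\alpha+1}{\beta}\right)$ by letting $v = t^{\beta}$ when $\beta < 0$.

When $\alpha \in \{-1, -2, \ldots\}$, we have (Artin 1964)
\[ \Gamma(\alpha+1) \to \pm\infty. \tag{1.4.89} \]
More specifically, the value $\Gamma\left(\alpha + 1^{\pm}\right) = \alpha^{\pm}!$ can be expressed as
\[ \Gamma\left(\alpha + 1^{\pm}\right) \to \pm(-1)^{\alpha+1}\infty. \tag{1.4.90} \]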
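The closed forms (1.4.84) and (1.4.85) give $\Gamma\left(\frac{1}{2} + k\right)$ as a rational multiple of $\sqrt{\pi}$. A minimal sketch of that computation follows; the function name `half_integer_coeff` and the convention of passing a signed integer $k$ are assumptions made here for illustration.

```python
import math
from fractions import Fraction

def half_integer_coeff(k: int) -> Fraction:
    """Rational c with Gamma(1/2 + k) = c * sqrt(pi).

    Uses (1.4.84) for k >= 0 and (1.4.85), with m = -k, for k < 0.
    """
    if k >= 0:
        return Fraction(math.factorial(2 * k), 4**k * math.factorial(k))
    m = -k
    return Fraction((-1) ** m * 4**m * math.factorial(m), math.factorial(2 * m))

# Compare the exact coefficients with floating-point values of the gamma function.
for k in range(-4, 5):
    exact = half_integer_coeff(k)
    print(k, exact, float(exact) * math.sqrt(math.pi), math.gamma(0.5 + k))
```

The product of the coefficients for $k$ and $-k$ is $(-1)^k$, which is (1.4.86) divided by $\pi$.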

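15 Here, $\sqrt{\pi} \approx 1.7725$, $\frac{2\pi}{\sqrt{3}} \approx 3.6276$, and $\sqrt{2}\,\pi \approx 4.4429$. In addition, we have $\Gamma\left(\frac{1}{8}\right) \approx 7.5339$, $\Gamma\left(\frac{1}{7}\right) \approx 6.5481$, $\Gamma\left(\frac{1}{6}\right) \approx 5.5663$, $\Gamma\left(\frac{1}{5}\right) \approx 4.5908$, $\Gamma\left(\frac{1}{4}\right) \approx 3.6256$, $\Gamma\left(\frac{1}{3}\right) \approx 2.6789$, $\Gamma\left(\frac{2}{3}\right) \approx 1.3541$, $\Gamma\left(\frac{3}{4}\right) \approx 1.2254$, $\Gamma\left(\frac{4}{5}\right) \approx 1.1642$, $\Gamma\left(\frac{5}{6}\right) \approx 1.1288$, $\Gamma\left(\frac{6}{7}\right) \approx 1.1058$, and $\Gamma\left(\frac{7}{8}\right) \approx 1.0897$.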


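For instance, we have $\lim\limits_{\alpha\downarrow -1} \Gamma(\alpha+1) = +\infty$, $\lim\limits_{\alpha\uparrow -1} \Gamma(\alpha+1) = -\infty$, $\lim\limits_{\alpha\downarrow -2} \Gamma(\alpha+1) = -\infty$, $\lim\limits_{\alpha\uparrow -2} \Gamma(\alpha+1) = +\infty$, $\cdots$.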

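Finally, when $\alpha - \beta$ is an integer, $\alpha < 0$, and $\beta < 0$, consider a number $v$ for which both $\alpha - v$ and $\beta - v$ are natural numbers. Specifically, let $v = \min(\alpha, \beta) - k$ for $k = 1, 2, \ldots$: for instance, $v \in \{-\pi - 1, -\pi - 2, \ldots\}$ when $\alpha = -\pi$ and $\beta = 1 - \pi$, and $v \in \{-6.4, -7.4, \ldots\}$ when $\alpha = -3.4$ and $\beta = -5.4$. Rewriting
\[ \frac{\Gamma(\alpha+1)}{\Gamma(\beta+1)} = \frac{\Gamma(\alpha+1)}{\Gamma(v)}\,\frac{\Gamma(v)}{\Gamma(\beta+1)} = \frac{\alpha(\alpha-1)\cdots v}{\beta(\beta-1)\cdots v} = \frac{(-1)^{\alpha-v+1}(-v)(-v-1)\cdots(-\alpha+1)(-\alpha)}{(-1)^{\beta-v+1}(-v)(-v-1)\cdots(-\beta+1)(-\beta)} = (-1)^{\alpha-\beta}\,\frac{\Gamma(-v+1)}{\Gamma(-\alpha)}\,\frac{\Gamma(-\beta)}{\Gamma(-v+1)}, \]
we get
\[ \frac{\Gamma(\alpha+1)}{\Gamma(\beta+1)} = (-1)^{\alpha-\beta}\,\frac{\Gamma(-\beta)}{\Gamma(-\alpha)}. \tag{1.4.91} \]
Based on (1.4.91), it is possible to obtain $\lim\limits_{x\downarrow\beta,\, y\downarrow\alpha} \frac{\Gamma(y+1)}{\Gamma(x+1)} = \lim\limits_{x\uparrow\beta,\, y\uparrow\alpha} \frac{\Gamma(y+1)}{\Gamma(x+1)} = (-1)^{\alpha-\beta}\frac{\Gamma(-\beta)}{\Gamma(-\alpha)}$ and $\lim\limits_{x\downarrow\beta,\, y\uparrow\alpha} \frac{\Gamma(y+1)}{\Gamma(x+1)} = \lim\limits_{x\uparrow\beta,\, y\downarrow\alpha} \frac{\Gamma(y+1)}{\Gamma(x+1)} = (-1)^{\alpha-\beta+1}\frac{\Gamma(-\beta)}{\Gamma(-\alpha)}$: in essence, we have16
\[ \begin{cases} \dfrac{\Gamma(\alpha^{\pm})}{\Gamma(\beta^{\pm})} = (-1)^{\alpha-\beta}\,\dfrac{\Gamma(-\beta+1)}{\Gamma(-\alpha+1)}, \\[2ex] \dfrac{\Gamma(\alpha^{\pm})}{\Gamma(\beta^{\mp})} = (-1)^{\alpha-\beta+1}\,\dfrac{\Gamma(-\beta+1)}{\Gamma(-\alpha+1)}. \end{cases} \tag{1.4.92} \]
This expression is quite useful when we deal with permutations and combinations of negative integers, as we shall see later in (1.A.3) and Table 1.4.

Example 1.4.27 We have $\frac{\Gamma(1-\pi)}{\Gamma(2-\pi)} = (-1)^{-1}\frac{\Gamma(\pi-1)}{\Gamma(\pi)} = \frac{1}{1-\pi}$ from (1.4.91). We can easily get $\frac{\Gamma(-5^{+})}{\Gamma(-4^{+})} = -\frac{\Gamma(5)}{\Gamma(6)} = -\frac{1}{5}$ from (1.4.91) or (1.4.92). ♦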
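For non-integer negative arguments, both sides of (1.4.91) are finite and can be compared directly. The following sketch is an illustration under that restriction; the function names are hypothetical, and `round` is used only to recover the integer $\alpha - \beta$ from floating-point inputs.

```python
import math

def ratio_direct(alpha: float, beta: float) -> float:
    """Left-hand side of (1.4.91): Gamma(alpha + 1) / Gamma(beta + 1)."""
    return math.gamma(alpha + 1.0) / math.gamma(beta + 1.0)

def ratio_reflected(alpha: float, beta: float) -> float:
    """Right-hand side of (1.4.91): (-1)^(alpha - beta) Gamma(-beta) / Gamma(-alpha)."""
    difference = round(alpha - beta)           # alpha - beta is assumed to be an integer
    sign = -1.0 if difference % 2 else 1.0
    return sign * math.gamma(-beta) / math.gamma(-alpha)

# The pair from Example 1.4.27 and the pair alpha = -3.4, beta = -5.4 from the text.
print(ratio_direct(-math.pi, 1 - math.pi), ratio_reflected(-math.pi, 1 - math.pi), 1 / (1 - math.pi))
print(ratio_direct(-3.4, -5.4), ratio_reflected(-3.4, -5.4))
```

At negative integers the left-hand side is a ratio of poles, so only the right-hand side (or the one-sided limits in (1.4.92)) can be evaluated there.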

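Example 1.4.28 It is known (Abramowitz and Stegun 1972) that $\Gamma(j) \approx -0.1550 - j0.4980$, where $j = \sqrt{-1}$. Now, recollect $\Gamma(-z) = -\frac{\Gamma(1-z)}{z}$ from $\Gamma(1-z) = -z\Gamma(-z)$. Thus, using the Euler reflection formula (1.4.79), we have $\Gamma(z)\Gamma(-z) = -\frac{\Gamma(z)\Gamma(1-z)}{z}$, i.e.,
\[ \Gamma(z)\Gamma(-z) = -\frac{\pi}{z \sin \pi z} \tag{1.4.93} \]
for $z \ne 0, \pm 1, \pm 2, \ldots$. Next, because $e^{\pm jx} = \cos x \pm j \sin x$, we have $\sin x = \frac{1}{2j}\left(e^{jx} - e^{-jx}\right)$, $\Gamma(\bar{z}) = \overline{\Gamma(z)}$, and $\Gamma(1-j) = -j\Gamma(-j)$. Thus, we have $\Gamma(j)\Gamma(-j) = |\Gamma(j)|^2 = -\frac{\pi}{j \sin \pi j}$, i.e.,
\[ |\Gamma(j)|^2 = \frac{2\pi}{e^{\pi} - e^{-\pi}}, \tag{1.4.94} \]
where $\frac{2\pi}{e^{\pi} - e^{-\pi}} \approx 0.2720$ and $0.1550^2 + 0.4980^2 \approx 0.2720$. ♦

16 When we use this result, we assume $\alpha^{+}$ and $\beta^{+}$ unless specified otherwise.

Table 1.2 Values of $\Gamma(z)$ for $z = -\frac{9}{2}, -\frac{7}{2}, \ldots, \frac{9}{2}$

$z$        $\Gamma(z)$
$-9/2$     $-\frac{32\sqrt{\pi}}{945}$
$-7/2$     $\frac{16\sqrt{\pi}}{105}$
$-5/2$     $-\frac{8\sqrt{\pi}}{15}$
$-3/2$     $\frac{4\sqrt{\pi}}{3}$
$-1/2$     $-2\sqrt{\pi}$
$1/2$      $\sqrt{\pi}$
$3/2$      $\frac{\sqrt{\pi}}{2}$
$5/2$      $\frac{3\sqrt{\pi}}{4}$
$7/2$      $\frac{15\sqrt{\pi}}{8}$
$9/2$      $\frac{105\sqrt{\pi}}{16}$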



Example 1.4.29 If we consider only the region $\alpha > 0$, then the gamma function (Zhang and Jian 1996) exhibits the minimum value $\Gamma(\alpha_0) \approx 0.8856$ at $\alpha = \alpha_0 \approx 1.4616$, and is convex downward because $\Gamma''(\alpha) > 0$. ♦
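Because the gamma function is convex on $\alpha > 0$, its minimizer can be located by a simple bracketing search. The sketch below uses a ternary search on the interval $[1, 2]$; the search method, the interval, and the name `argmin_convex` are choices made here for illustration and are not taken from the cited reference.

```python
import math

def argmin_convex(f, lo: float, hi: float, iterations: int = 200) -> float:
    """Ternary search for the minimizer of a convex function f on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2      # the minimizer cannot lie to the right of m2
        else:
            lo = m1      # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)

alpha0 = argmin_convex(math.gamma, 1.0, 2.0)
print(alpha0, math.gamma(alpha0))
```

Since $\Gamma(1) = \Gamma(2) = 1$ and the function is convex, the minimum on $\alpha > 0$ must lie between 1 and 2, which justifies the starting bracket.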

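1.4.3.3 Beta Function

The beta function is defined as17
\[ \tilde{B}(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx \tag{1.4.95} \]
for complex numbers $\alpha$ and $\beta$ such that $\mathrm{Re}(\alpha) > 0$ and $\mathrm{Re}(\beta) > 0$. In this section, let us show
\[ \tilde{B}(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \tag{1.4.96} \]
We have $\tilde{B}(\alpha, \beta+1) = \int_0^1 x^{\alpha-1}(1-x)(1-x)^{\beta-1}\,dx = \tilde{B}(\alpha, \beta) - \tilde{B}(\alpha+1, \beta)$ and $\tilde{B}(\alpha, \beta+1) = \left.\frac{1}{\alpha} x^{\alpha}(1-x)^{\beta}\right|_{x=0}^{1} + \frac{\beta}{\alpha}\int_0^1 x^{\alpha}(1-x)^{\beta-1}\,dx = \frac{\beta}{\alpha}\tilde{B}(\alpha+1, \beta)$. Thus,
\[ \tilde{B}(\alpha, \beta) = \frac{\alpha+\beta}{\beta}\,\tilde{B}(\alpha, \beta+1), \tag{1.4.97} \]
which can also be obtained as
\[ \tilde{B}(\alpha, \beta+1) = \int_0^1 x^{\alpha+\beta-1}\left(\frac{1-x}{x}\right)^{\beta} dx = \left.\frac{x^{\alpha+\beta}}{\alpha+\beta}\left(\frac{1-x}{x}\right)^{\beta}\right|_{x=0}^{1} + \frac{\beta}{\alpha+\beta}\int_0^1 x^{\alpha+\beta}\left(\frac{1-x}{x}\right)^{\beta-1}\frac{dx}{x^2} = \frac{\beta}{\alpha+\beta}\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx = \frac{\beta}{\alpha+\beta}\,\tilde{B}(\alpha, \beta). \]

17 The right-hand side of (1.4.95) is called the Eulerian integral of the first kind.

Using (1.4.97) repeatedly, we get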

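\[ \begin{aligned} \tilde{B}(\alpha, \beta) &= \frac{(\alpha+\beta)(\alpha+\beta+1)}{\beta(\beta+1)}\,\tilde{B}(\alpha, \beta+2) \\ &= \frac{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)}{\beta(\beta+1)(\beta+2)}\,\tilde{B}(\alpha, \beta+3) \\ &\;\;\vdots \\ &= \frac{(\alpha+\beta)_n}{(\beta)_n}\,\tilde{B}(\alpha, \beta+n), \end{aligned} \tag{1.4.98} \]
which can be expressed as
\[ \begin{aligned} \tilde{B}(\alpha, \beta) &= \frac{(\alpha+\beta)_n}{n!}\,\frac{n!}{(\beta)_n}\int_0^n \left(\frac{t}{n}\right)^{\alpha-1}\left(1 - \frac{t}{n}\right)^{\beta+n-1}\frac{dt}{n} \\ &= \frac{n^{\beta-1} n!}{(\beta)_n}\,\frac{(\alpha+\beta)_n}{n^{\alpha+\beta-1} n!}\int_0^n t^{\alpha-1}\left(1 - \frac{t}{n}\right)^{\beta+n-1} dt \end{aligned} \tag{1.4.99} \]
after some manipulations. Now, because $\lim\limits_{n\to\infty}\left(1 - \frac{t}{n}\right)^{\beta+n-1} = \lim\limits_{n\to\infty}\left(1 + \frac{-t}{n}\right)^{n} \lim\limits_{n\to\infty}\left(1 - \frac{t}{n}\right)^{\beta-1} = e^{-t}$, if we let $n \to \infty$ in (1.4.99) and recollect the defining equation (1.4.73) of the gamma function, we get
\[ \tilde{B}(\alpha, \beta) = \frac{\Gamma(\beta)}{\Gamma(\alpha+\beta)}\int_0^{\infty} t^{\alpha-1} e^{-t}\,dt. \tag{1.4.100} \]
Next, from (1.4.95) with $\beta = 1$, we get $\tilde{B}(\alpha, 1) = \int_0^1 t^{\alpha-1}\,dt = \frac{1}{\alpha}$. Using this result in (1.4.100) with $\beta = 1$, we get
\[ \frac{1}{\alpha} = \frac{\Gamma(1)}{\Gamma(\alpha+1)}\int_0^{\infty} t^{\alpha-1} e^{-t}\,dt. \tag{1.4.101} \]
Therefore, recollecting $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ and $\Gamma(1) = 1$, we have18
\[ \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt = \Gamma(\alpha). \tag{1.4.102} \]
From (1.4.100) and (1.4.102), we get (1.4.96). Note that (1.4.100)–(1.4.102) implicitly dictate that the defining equation (1.4.65) of the gamma function $\Gamma(\alpha)$ for $\alpha > 0$ is a special case of (1.4.73).

18 The left-hand side of (1.4.102) is called the Eulerian integral of the second kind. In Exercise 1.38, we consider another way to show (1.4.96).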
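The identity (1.4.96) and the recursion (1.4.97) can be checked numerically for real parameters. The sketch below approximates the integral (1.4.95) with a plain midpoint rule; the rule, the step count, and the function names are assumptions made here for illustration, and the parameters are kept at $\alpha, \beta \ge 1$ so that the integrand has no endpoint singularity.

```python
import math

def beta_integral(alpha: float, beta: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of (1.4.95) on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0)
    return h * total

def beta_from_gamma(alpha: float, beta: float) -> float:
    """Right-hand side of (1.4.96)."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

a, b = 2.5, 3.5
print(beta_integral(a, b), beta_from_gamma(a, b))
# Recursion (1.4.97): B(a, b) = (a + b) / b * B(a, b + 1)
print(beta_integral(a, b), (a + b) / b * beta_integral(a, b + 1.0))
```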

1.5 Limits of Sequences of Sets

In this section, the properties of infinite sequences of sets are discussed. The exposition in this section will be the basis, for instance, in discussing the σ-algebra in Sect. 2.1.2 and the continuity of probability in Appendix 2.1.

1.5.1 Upper and Lower Limits of Sequences

Let us first consider the limits of sequences of numbers before addressing the limits of sequences of sets. When $a_i \le u$ and $a_i \ge v$ for every choice of a number $a_i$ from the set $A$ of real numbers, the numbers $u$ and $v$ are called an upper bound and a lower bound, respectively, of $A$.

Definition 1.5.1 (least upper bound; greatest lower bound) For a subset $A$ of real numbers, the smallest among the upper bounds of $A$ is called the least upper bound of $A$ and is denoted by $\sup A$, and the largest among the lower bounds of $A$ is called the greatest lower bound and is denoted by $\inf A$. For a sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers, the least upper bound and greatest lower bound are written as
\[ \sup_{n\ge 1} x_n = \sup\{x_n : n \ge 1\} \tag{1.5.1} \]
and
\[ \inf_{n\ge 1} x_n = \inf\{x_n : n \ge 1\}, \tag{1.5.2} \]

respectively. When there exists no upper bound and no lower bound of $A$, it is denoted by $\sup A \to \infty$ and $\inf A \to -\infty$, respectively.

Example 1.5.1 For the set $A = \{1, 2, 3\}$, the least upper bound is $\sup A = 3$ and the greatest lower bound is $\inf A = 1$. For the sequence $\{a_n\} = \{0, 1, 0, 1, \ldots\}$, the least upper bound is $\sup a_n = 1$ and the greatest lower bound is $\inf a_n = 0$. ♦

Definition 1.5.2 (limit superior; limit inferior) For a sequence $\{x_n\}_{n=1}^{\infty}$,
\[ \limsup_{n\to\infty} x_n = \overline{x}_n = \inf_{n\ge 1}\,\sup_{k\ge n} x_k \tag{1.5.3} \]
and
\[ \liminf_{n\to\infty} x_n = \underline{x}_n = \sup_{n\ge 1}\,\inf_{k\ge n} x_k \tag{1.5.4} \]
are called the limit superior and limit inferior, respectively.

Example 1.5.2 For the sequences $x_n = \frac{1}{\sqrt{n}}\left(\sqrt{4n+3} + (-1)^n\sqrt{4n-5}\right)$, $y_n = \frac{5}{\sqrt{n}} + (-1)^n$, and $z_n = 3\sin\frac{n\pi}{2}$, we have $\overline{x}_n = 4$ and $\underline{x}_n = 0$, $\overline{y}_n = 1$ and $\underline{y}_n = -1$, and $\overline{z}_n = 3$ and $\underline{z}_n = -3$. ♦

Definition 1.5.3 (limit) For a sequence $\{x_n\}_{n=1}^{\infty}$, $\lim\limits_{n\to\infty} x_n = y$ and $\limsup\limits_{n\to\infty} x_n = \liminf\limits_{n\to\infty} x_n = y$ are the necessary and sufficient conditions of each other, where $y$ is called the limit of the sequence $\{x_n\}_{n=1}^{\infty}$.

Example 1.5.3 For the three sequences
\[ x_n = \begin{cases} \sqrt{2}, & n = 1, \\ \sqrt{2 + x_{n-1}}, & n = 2, 3, \ldots, \end{cases} \tag{1.5.5} \]
\[ y_n = \begin{cases} \sqrt{n}, & n = 1, 2, \ldots, 9, \\ 3, & n = 10, 11, \ldots, 100, \\ \frac{3n}{5n+4}, & n = 101, 102, \ldots, \end{cases} \tag{1.5.6} \]
and
\[ z_n = \frac{1}{2n}\left(\sqrt{4n+3} + (-1)^n\sqrt{4n-5}\right), \tag{1.5.7} \]
we have19 $\overline{x}_n = \underline{x}_n = \lim\limits_{n\to\infty} x_n = 2$, $\overline{y}_n = \underline{y}_n = \lim\limits_{n\to\infty} y_n = \frac{3}{5}$, and $\overline{z}_n = \underline{z}_n = \lim\limits_{n\to\infty} z_n = 0$. ♦

Example 1.5.4 None of the three sequences $x_n$, $y_n$, and $z_n$ considered in Example 1.5.2 has a limit because the limit superior is different from the limit inferior. ♦
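The limit superior and limit inferior in (1.5.3) and (1.5.4) can be illustrated numerically by taking the supremum and infimum over a long but finite tail of the sequence. This is only an approximation of the definitions, not an exact computation; the function name `tail_sup_inf` and the window $10^4 \le k \le 2\times 10^4$ are choices made here for illustration.

```python
import math

def tail_sup_inf(seq, start: int, stop: int):
    """Max and min of seq(k) for start <= k <= stop: a finite-window stand-in
    for sup_{k >= n} x_k and inf_{k >= n} x_k in (1.5.3) and (1.5.4)."""
    values = [seq(k) for k in range(start, stop + 1)]
    return max(values), min(values)

def x(n: int) -> float:
    """x_n of Example 1.5.2 (n >= 2 keeps 4n - 5 non-negative)."""
    return (math.sqrt(4 * n + 3) + (-1) ** n * math.sqrt(4 * n - 5)) / math.sqrt(n)

def z(n: int) -> float:
    """z_n of Example 1.5.2."""
    return 3.0 * math.sin(n * math.pi / 2.0)

x_sup, x_inf = tail_sup_inf(x, 10_000, 20_000)
z_sup, z_inf = tail_sup_inf(z, 10_000, 20_000)
print(x_sup, x_inf)   # close to 4 and 0
print(z_sup, z_inf)   # close to 3 and -3
```

For both sequences the tail supremum and tail infimum stay apart, which is consistent with Example 1.5.4.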

Fig. 1.24 Increasing sequence $\{B_n\}_{n=1}^{\infty}$: $B_{n+1} \supset B_n$ (nested sets $B_1, B_2, B_3, \cdots$)

19 The value $x_n = 2$ can be obtained by solving the equation $x_n = \sqrt{2 + x_n}$.

1.5.2 Limit of Monotone Sequence of Sets

Definition 1.5.4 (increasing sequence; decreasing sequence; non-decreasing sequence; non-increasing sequence) For a sequence $\{B_n\}_{n=1}^{\infty}$ of sets, assume that $B_{n+1} \supseteq B_n$ for every natural number $n$. If $B_{n+1} = B_n$ for at least one $n$, then the sequence is called a non-decreasing sequence; otherwise, it is called an increasing sequence. A non-increasing sequence and a decreasing sequence are defined similarly by replacing $B_{n+1} \supseteq B_n$ with $B_{n+1} \subseteq B_n$.

Example 1.5.5 The sequences $\left\{\left[1, 2 - \frac{1}{n}\right)\right\}_{n=1}^{\infty}$ and $\{(-n, a)\}_{n=1}^{\infty}$ are increasing sequences. The sequences $\left\{\left[1, 1 + \frac{1}{n}\right)\right\}_{n=1}^{\infty}$ and $\left\{\left(1 - \frac{1}{n}, 1 + \frac{1}{n}\right)\right\}_{n=1}^{\infty}$ are decreasing sequences. ♦

Increasing and decreasing sequences are sometimes referred to as strictly increasing and strictly decreasing sequences, respectively. A non-decreasing sequence or a non-increasing sequence is also called a monotonic sequence. Although increasing and non-decreasing sequences are slightly different from each other, they are used interchangeably when it does not cause an ambiguity. Similarly, decreasing and non-increasing sequences will be often used interchangeably. Figures 1.24 and 1.25 graphically show increasing and decreasing sequences, respectively.

Definition 1.5.5 (limit of monotonic sequence) We call
\[ \lim_{n\to\infty} B_n = \bigcup_{i=1}^{\infty} B_i \tag{1.5.8} \]
for a non-decreasing sequence $\{B_n\}_{n=1}^{\infty}$ and
\[ \lim_{n\to\infty} B_n = \bigcap_{i=1}^{\infty} B_i \tag{1.5.9} \]
for a non-increasing sequence $\{B_n\}_{n=1}^{\infty}$ the limit set or limit of $\{B_n\}_{n=1}^{\infty}$.

The limit $\lim\limits_{n\to\infty} B_n$ denotes the set of points contained in at least one of $\{B_n\}_{n=1}^{\infty}$ and in every set of $\{B_n\}_{n=1}^{\infty}$ when $\{B_n\}_{n=1}^{\infty}$ is a non-decreasing and a non-increasing sequence, respectively.

Fig. 1.25 Decreasing sequence $\{B_n\}_{n=1}^{\infty}$: $B_{n+1} \subset B_n$ (nested sets $B_1, B_2, B_3, \cdots$)

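Example 1.5.6 The sequence $\left\{\left[1, 2 - \frac{1}{n}\right)\right\}_{n=1}^{\infty}$ considered in Example 1.5.5 has the limit $\lim\limits_{n\to\infty} B_n = \bigcup\limits_{n=1}^{\infty}\left[1, 2 - \frac{1}{n}\right) = [1, 1) \cup \left[1, \frac{3}{2}\right) \cup \cdots$ or
\[ \lim_{n\to\infty} B_n = [1, 2) \tag{1.5.10} \]
because $\left\{\left[1, 2 - \frac{1}{n}\right)\right\}_{n=1}^{\infty}$ is a non-decreasing sequence. Likewise, the limit of the non-decreasing sequence $\{(-n, a)\}_{n=1}^{\infty}$ is $\lim\limits_{n\to\infty} B_n = \bigcup\limits_{n=1}^{\infty}(-n, a) = (-\infty, a)$. ♦

Example 1.5.7 The sequence $\left\{\left[a, a + \frac{1}{n}\right)\right\}_{n=1}^{\infty}$ is a non-increasing sequence and has the limit $\lim\limits_{n\to\infty} B_n = \bigcap\limits_{n=1}^{\infty}\left[a, a + \frac{1}{n}\right)$ or
\[ \lim_{n\to\infty} B_n = [a, a], \tag{1.5.11} \]
which is a singleton set $\{a\}$. The non-increasing sequence $\left\{\left(1 - \frac{1}{n}, 1 + \frac{1}{n}\right)\right\}_{n=1}^{\infty}$ has the limit $\lim\limits_{n\to\infty} B_n = \bigcap\limits_{n=1}^{\infty}\left(1 - \frac{1}{n}, 1 + \frac{1}{n}\right) = (0, 2) \cap \left(\frac{1}{2}, \frac{3}{2}\right) \cap \cdots = [1, 1]$, also a singleton set. Note that
\[ \lim_{n\to\infty}\left[0, \frac{1}{n}\right) = \{0\} \tag{1.5.12} \]
is different from
\[ \left[0, \lim_{n\to\infty}\frac{1}{n}\right) = \emptyset \tag{1.5.13} \]
in (1.5.11). ♦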

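Example 1.5.8 Consider the set $S = \{x : 0 \le x \le 1\}$, sequence $\{A_i\}_{i=1}^{\infty}$ with $A_i = \left\{x : \frac{1}{i+1} < x \le 1\right\}$, and sequence $\{B_i\}_{i=1}^{\infty}$ with $B_i = \left\{x : 0 < x < \frac{1}{i}\right\}$. Then, because $\{A_i\}_{i=1}^{\infty}$ is a non-decreasing sequence, we have $\lim\limits_{n\to\infty} A_n = \left\{x : \frac{1}{2} < x \le 1\right\} \cup \left\{x : \frac{1}{3} < x \le 1\right\} \cup \cdots = \{x : 0 < x \le 1\}$ and $S = \{0\} \cup \lim\limits_{n\to\infty} A_n$. Similarly, because $\{B_i\}_{i=1}^{\infty}$ is a non-increasing sequence, we have $\lim\limits_{n\to\infty} B_n =$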

{x : 0 < x < 1} ∩ x : 0 < x < 21 ∩ · · · = {x : 0 < x ≤ 0} = ∅. ♦   ∞   ∞ Example 1.5.9 The sequences 1 + n1 , 2 n=1 and 1 + n1 , 2 n=1 of interval sets are both non-decreasing sequences limits (1, 2) and (1, 2], respectively. The  ∞  with the  ∞ sequences a, a + n1 n=1 and a, a + n1 n=1 are both non-increasing sequences with the limits (a, a] = ∅ and [a, a] = {a}, respectively. ♦     ∞ ∞ Example 1.5.10 The sequences 1 − n1 , 2 n=1 and 1 − n1 , 2 n=1 are both nonincreasing sequences with the limits [1, 2) and [1, 2], respectively. The sequences




 ∞  ∞  1, 2 − n1 n=1 and 1, 2 − n1 n=1 are both non-decreasing sequences with the limits (1, 2) and [1, 2), respectively. ♦     ∞ ∞ Example 1.5.11 The sequences 1 + n1 , 3 − n1 n=1 and 1 + n1 , 3 − n1 n=1 are both non-decreasing sequences with  the common Similarly, the non  limit (1, 3).   ∞ ∞ decreasing sequences 1 + n1 , 3 − n1 n=1 and 1 + n1 , 3 − n1 n=1 both have the limit20 (1, 3). ♦     ∞ ∞ Example 1.5.12 The four sequences 1 − n1 , 3 + n1 n=1 , 1 − n1 , 3 + n1 n=1 ,  ∞  ∞   1 − n1 , 3 + n1 n=1 , and 1 − n1 , 3 + n1 n=1 are all non-increasing sequences with the common limit21 [1, 3]. ♦

1.5.3 Limit of General Sequence of Sets

We have discussed the limits of monotonic sequences in Sect. 1.5.2. Let us now consider the limits of general sequences. First, note that any element in a set of an infinite sequence belongs to
(1) every set,
(2) every set except for a finite number of sets,
(3) infinitely many sets except for other infinitely many sets, or
(4) a finite number of sets.

Keeping these four cases in mind, let us define the lower bound and upper bound sets of general sequences.

Definition 1.5.6 (lower bound set) For a sequence of sets, the set of elements belonging to at least almost every set of the sequence is called the lower bound or lower bound set of the sequence, and is denoted by22 $\liminf\limits_{n\to\infty}$ or by $\varliminf\limits_{n\to\infty}$.

20 In short, irrespective of the type of the parentheses, the limit is in the form of '$(a, \ldots$' for an interval when the beginning point is of the form $a + \frac{1}{n}$, and the limit is in the form of '$\ldots, b)$' for an interval when the end point is of the form $b - \frac{1}{n}$.
21 In short, irrespective of the type of the parentheses, the limit is in the form of '$[a, \ldots$' for an interval when the beginning point is of the form $a - \frac{1}{n}$, and the limit is in the form of '$\ldots, b]$' for an interval when the end point is of the form $b + \frac{1}{n}$.
22 The abbreviation inf stands for infimum or inferior.

Let us express the lower bound set $\liminf\limits_{n\to\infty} B_n = \underline{B}_n$ of the sequence $\{B_n\}_{n=1}^{\infty}$ in terms of set operations. First, note that
\[ G_i = \bigcap_{k=i}^{\infty} B_k \tag{1.5.14} \]
is the set of elements belonging to $B_i, B_{i+1}, \ldots$: in other words, $G_i$ is the set of elements belonging to all the sets of the sequence except for at most $(i-1)$ sets, possibly $B_1, B_2, \ldots$, and $B_{i-1}$. Specifically, $G_1$ is the set of elements belonging to all the sets of the sequence, $G_2$ is the set of elements belonging to all the sets except possibly for the first set, $G_3$ is the set of elements belonging to all the sets except possibly for the first and second sets, $\ldots$. This implies that an element belonging to any of $\{G_i\}_{i=1}^{\infty}$ is an element belonging to almost every set of the sequence. Therefore, if we collect all the elements belonging to at least one of $\{G_i\}_{i=1}^{\infty}$, or if we take the union of $\{G_i\}_{i=1}^{\infty}$, the result would be the set of elements in every set except for a finite number of sets. In other words, the set of elements belonging to at least almost every set of the sequence $\{B_n\}_{n=1}^{\infty}$, or the lower bound set $\liminf\limits_{n\to\infty} B_n$ of $\{B_n\}_{n=1}^{\infty}$, can be expressed as
\[ \liminf_{n\to\infty} B_n = \bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k, \tag{1.5.15} \]

which is sometimes denoted by $\{\text{eventually } B_n\}$ or $\{\text{ev. } B_n\}$.

Example 1.5.13 For the sequence
\[ \{B_n\}_{n=1}^{\infty} = \{0, 1, 3\}, \{0, 2\}, \{0, 1, 2\}, \{0, 1\}, \{0, 1, 2\}, \{0, 1\}, \ldots \tag{1.5.16} \]
of finite sets, obtain the lower bound set.

Solution First, 0 belongs to all sets, 1 belongs to all sets except for the second set, and 2 and 3 do not belong to infinitely many sets. Thus, the lower bound of the sequence is $\{0, 1\}$, which can be confirmed by $\liminf\limits_{n\to\infty} B_n = \bigcup\limits_{i=1}^{\infty}\bigcap\limits_{k=i}^{\infty} B_k = \bigcup\limits_{i=1}^{\infty} G_i = \{0, 1\}$ using $G_1 = \bigcap\limits_{k=1}^{\infty} B_k = \{0\}$, $G_2 = \bigcap\limits_{k=2}^{\infty} B_k = \{0\}$, $G_3 = \bigcap\limits_{k=3}^{\infty} B_k = \{0, 1\}$, $G_4 = \bigcap\limits_{k=4}^{\infty} B_k = \{0, 1\}$, $\ldots$. ♦

Definition 1.5.7 (upper bound set) For a sequence of sets, the set of elements belonging to infinitely many sets of the sequence is called the upper bound or upper bound set of the sequence, and is denoted by23 $\limsup\limits_{n\to\infty}$ or by $\varlimsup\limits_{n\to\infty}$.

23 The abbreviation sup stands for supremum or superior.

Because an element belonging to almost every $B_n^c$ belongs to a finite number of $B_n$ and the converse is also true, we have $\limsup\limits_{n\to\infty} B_n = \left(\bigcup\limits_{i=1}^{\infty}\bigcap\limits_{k=i}^{\infty} B_k^c\right)^c$, i.e.,
\[ \limsup_{n\to\infty} B_n = \bigcap_{i=1}^{\infty}\bigcup_{k=i}^{\infty} B_k, \tag{1.5.17} \]
which is alternatively written as $\limsup\limits_{n\to\infty} B_n = \{\text{infinitely often } B_n\}$ with 'infinitely often' also written as i.o. It is noteworthy that
\[ \{\text{finitely often } B_n\} = \{\text{infinitely often } B_n\}^c = \bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k^c, \tag{1.5.18} \]
where 'finitely often' is often written as f.o.

Example 1.5.14 Obtain the upper bound set for the sequence $\{B_n\} = \{0, 1, 3\}, \{0, 2\}, \{0, 1, 2\}, \{0, 1\}, \{0, 1, 2\}, \{0, 1\}, \ldots$ considered in Example 1.5.13.

Solution Because 0, 1, and 2 belong to infinitely many sets and 3 belongs to one set, the upper bound set is $\{0, 1, 2\}$. This result can be confirmed as $\limsup\limits_{n\to\infty} B_n = \bigcap\limits_{i=1}^{\infty}\bigcup\limits_{k=i}^{\infty} B_k = \bigcap\limits_{i=1}^{\infty} H_i = \{0, 1, 2\}$ by noting that $H_1 = \bigcup\limits_{k=1}^{\infty} B_k = \{0, 1, 2, 3\}$, $H_2 = \bigcup\limits_{k=2}^{\infty} B_k = \{0, 1, 2\}$, $H_3 = \bigcup\limits_{k=3}^{\infty} B_k = \{0, 1, 2\}$, $H_4 = \bigcup\limits_{k=4}^{\infty} B_k = \{0, 1, 2\}$, $\ldots$. Similarly, assuming $\Omega = \{0, 1, 2, 3, 4\}$ for example, we have $B_1^c = \{2, 4\}$, $B_2^c = \{1, 3, 4\}$, $B_3^c = \{3, 4\}$, $B_4^c = \{2, 3, 4\}$, $B_5^c = \{3, 4\}$, $B_6^c = \{2, 3, 4\}$, $\ldots$, and thus $\bigcap\limits_{k=1}^{\infty} B_k^c = \{4\}$, $\bigcap\limits_{k=2}^{\infty} B_k^c = \{3, 4\}$, $\bigcap\limits_{k=3}^{\infty} B_k^c = \{3, 4\}$, $\ldots$. Therefore, the upper bound can be obtained also as $\limsup\limits_{n\to\infty} B_n = \left(\bigcup\limits_{i=1}^{\infty}\bigcap\limits_{k=i}^{\infty} B_k^c\right)^c = (\{4\} \cup \{3, 4\} \cup \{3, 4\} \cup \cdots)^c = \{3, 4\}^c = \{0, 1, 2\}$. ♦
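For a sequence of sets that is eventually periodic, such as the one in Examples 1.5.13 and 1.5.14, the sets $G_i$ and $H_i$ in (1.5.15) and (1.5.17) can be computed exactly, because every tail $B_i, B_{i+1}, \ldots$ consists of the rest of the finite prefix plus the repeating cycle. The sketch below relies on that assumption; the representation as `prefix` and `cycle` lists and all function names are introduced here for illustration.

```python
def tail_sets(prefix, cycle, i: int):
    """Distinct sets occurring among B_i, B_{i+1}, ... (i is 1-indexed)."""
    return prefix[i - 1:] + cycle

def G(prefix, cycle, i: int) -> set:
    """G_i: intersection of B_k over k >= i, as in (1.5.14)."""
    return set.intersection(*tail_sets(prefix, cycle, i))

def H(prefix, cycle, i: int) -> set:
    """H_i: union of B_k over k >= i."""
    return set.union(*tail_sets(prefix, cycle, i))

def lim_inf(prefix, cycle) -> set:
    """Union of G_i as in (1.5.15); G_i no longer changes once i passes the prefix."""
    return set.union(*(G(prefix, cycle, i) for i in range(1, len(prefix) + 2)))

def lim_sup(prefix, cycle) -> set:
    """Intersection of H_i as in (1.5.17); H_i no longer changes once i passes the prefix."""
    return set.intersection(*(H(prefix, cycle, i) for i in range(1, len(prefix) + 2)))

# Sequence (1.5.16): {0,1,3}, {0,2}, then {0,1,2}, {0,1} repeated forever.
prefix = [{0, 1, 3}, {0, 2}]
cycle = [{0, 1, 2}, {0, 1}]
print(lim_inf(prefix, cycle), lim_sup(prefix, cycle))
```

A naive truncation to the first $N$ sets would not work here, since the intersection over a truncated tail near the end is just the last few sets; the periodic representation avoids that pitfall.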



Let us note that in an infinite sequence of sets, any element belonging to almost every set belongs to infinitely many sets and thus
\[ \liminf_{n\to\infty} B_n \subseteq \limsup_{n\to\infty} B_n \tag{1.5.19} \]
is always true. On the other hand, as we mentioned before, an element belonging to infinitely many sets may or may not belong to the remaining infinitely many sets: for example, we can imagine an element belonging to all the odd-numbered sets but not in any even-numbered set. Consequently, an element belonging to infinitely many sets does not necessarily belong to almost every set. In short, in some cases we have
\[ \limsup_{n\to\infty} B_n \nsubseteq \liminf_{n\to\infty} B_n, \tag{1.5.20} \]
which, together with (1.5.19), confirms the intuitive observation that the upper bound is not smaller than the lower bound.

In Definition 1.5.5, we addressed the limit of monotonic sequences. Let us now extend the discussion to the limits of general sequences of sets.

Definition 1.5.8 (convergence of sequence; limit set) If
\[ \limsup_{n\to\infty} B_n \subseteq \liminf_{n\to\infty} B_n \tag{1.5.21} \]
holds true for a sequence $\{B_n\}_{n=1}^{\infty}$ of sets, then
\[ \limsup_{n\to\infty} B_n = \liminf_{n\to\infty} B_n = B \tag{1.5.22} \]
from (1.1.2) using (1.5.19) and (1.5.21). In such a case, the sequence $\{B_n\}_{n=1}^{\infty}$ is said to converge to $B$, which is denoted by $B_n \to B$ or $\lim\limits_{n\to\infty} B_n = B$. The set $B$ is called the limit set or limit of $\{B_n\}_{n=1}^{\infty}$.

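The limit of monotonic sequences described in Definition 1.5.5 is in agreement with Definition 1.5.8: let us confirm this fact. Assume $\{B_n\}_{n=1}^{\infty}$ is a non-decreasing sequence. Because $\bigcap\limits_{k=i}^{n} B_k = B_i$ for any $i$, we have $\bigcap\limits_{k=i}^{\infty} B_k = \lim\limits_{n\to\infty}\bigcap\limits_{k=i}^{n} B_k = \lim\limits_{n\to\infty} B_i = B_i$, with which we get $\liminf\limits_{n\to\infty} B_n = \bigcup\limits_{i=1}^{\infty}\bigcap\limits_{k=i}^{\infty} B_k$ as
\[ \liminf_{n\to\infty} B_n = \bigcup_{i=1}^{\infty} B_i = \lim_{n\to\infty} B_n. \tag{1.5.23} \]
We also have $\bigcup\limits_{k=i}^{\infty} B_k = \lim\limits_{n\to\infty}\bigcup\limits_{k=i}^{n} B_k = \lim\limits_{n\to\infty} B_n$ from $\bigcup\limits_{k=i}^{n} B_k = B_n$ for any value of $i$. Thus, we have $\limsup\limits_{n\to\infty} B_n = \bigcap\limits_{i=1}^{\infty}\bigcup\limits_{k=i}^{\infty} B_k = \bigcap\limits_{i=1}^{\infty}\lim\limits_{n\to\infty} B_n$, consequently resulting in
\[ \limsup_{n\to\infty} B_n = \lim_{n\to\infty} B_n = \liminf_{n\to\infty} B_n. \tag{1.5.24} \]
Next, assume $\{B_n\}_{n=1}^{\infty}$ is a non-increasing sequence. Then, $\liminf\limits_{n\to\infty} B_n = \bigcup\limits_{i=1}^{\infty}\bigcap\limits_{k=i}^{\infty} B_k = \bigcup\limits_{i=1}^{\infty}\lim\limits_{n\to\infty} B_n = \lim\limits_{n\to\infty} B_n$ because $\bigcap\limits_{k=i}^{\infty} B_k = \lim\limits_{n\to\infty}\bigcap\limits_{k=i}^{n} B_k = \lim\limits_{n\to\infty} B_n$ from $\bigcap\limits_{k=i}^{n} B_k = B_n$ for any $i$. We also have $\limsup\limits_{n\to\infty} B_n = \bigcap\limits_{i=1}^{\infty}\bigcup\limits_{k=i}^{\infty} B_k = \bigcap\limits_{i=1}^{\infty} B_i = \lim\limits_{n\to\infty} B_n$ because $\bigcup\limits_{k=i}^{\infty} B_k = \lim\limits_{n\to\infty}\bigcup\limits_{k=i}^{n} B_k = B_i$ from $\bigcup\limits_{k=i}^{n} B_k = B_i$ for any $i$.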

 ∞  1 − n1 , 3 − n1 n=1 . Example 1.5.15 Obtain the limit of {Bn }∞ n=1 =       , . . ., Solution First, because B1 = (0, 2), B2 = 21 , 25 , B3 = 23 , 83 , B4 = 43 , 11 4 ∞ ∞ ∞  5 we have G 1 = ∩ Bk = [1, 2), G 2 = ∩ Bk = 1, 2 , · · · and H1 = ∪ Bk = (0, 3), k=1 k=2 k=1 ∞ ∞ 1  H2 = ∪ Bk = 2 , 3 , · · · . Therefore, the lower bound is lim inf Bn = ∪ G i = k=2



n→∞

i=1

[1, 3), the upper bound is lim sup Bn = ∩ Hi = [1, 3), and the limit is lim Bn = n→∞

i=1

n→∞


[1, that  the limits  are all24 [1, 3) for the sequences  show  3). 1We can1  similarly ∞ ∞ 1 1 ∞ ♦ 1 − n , 3 − n n=1 , 1 − n , 3 − n n=1 , and 1 − n1 , 3 − n1 n=1 .   ∞ Example 1.5.16 Obtain the limit of the sequence 1 + n1 , 3 + n1 n=1 of intervals.       , B4 = 54 , 13 , . . ., Solution First, because B1 = (2, 4), B2 = 23 , 27 , B3 = 43 , 10 3 4 ∞ ∞ ∞ 3  we have G 1 = ∩ Bk = (2, 3], G 2 = ∩ Bk = 2 , 3 , · · · and H1 = ∪ Bk = (1, 4), k=1 k=2 k=1 ∞ ∞  7 H2 = ∪ Bk = 1, 2 , · · · . Thus, the lower bound is lim inf Bn = ∪ G i = (1, 3], the k=2

n→∞



i=1

upper bound is lim sup Bn = ∩ Hi = (1, 3], and the limit set is lim Bn = (1, 3]. n→∞ i=1 n→∞   ∞ We can similarly show that the limits of the sequences 1 + n1 , 3 + n1 n=1 ,  ∞  ∞   ♦ 1 + n1 , 3 + n1 n=1 , and 1 + n1 , 3 + n1 n=1 are all (1, 3].

Appendices

Appendix 1.1 Binomial Coefficients in the Complex Space

For the factorial $n! = n(n-1)\cdots 1$ defined in (1.4.57), the number $n$ is a natural number. Based on the gamma function addressed in Sect. 1.4.3, we extend the factorial into the complex space, which will in turn be used in the discussion of the permutation and binomial coefficients (Riordan 1968; Tucker 2002; Vilenkin 1971) in the complex space.

(A) Factorials and Permutations in the Complex Space

Recollecting that
\[ \alpha! = \Gamma(\alpha+1) \tag{1.A.1} \]
from $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ shown in (1.4.75), the factorial $p!$ can be expressed as
\[ p! = \begin{cases} \pm\infty, & p \in J_-, \\ \Gamma(p+1), & p \notin J_- \end{cases} \tag{1.A.2} \]
for a complex number $p$, where $J_- = \{-1, -2, \ldots\}$ denotes the set of negative integers. Therefore, $0! = \Gamma(1) = 1$ for $p = 0$.

24 As it is mentioned in Examples 1.5.11 and 1.5.12, when the lower end value of an interval is in the form of $a - \frac{1}{n}$ and $a + \frac{1}{n}$, the limit is in the form of '$[a, \ldots$' and '$(a, \ldots$', respectively. In addition, when the upper end value of an interval is in the form of $b + \frac{1}{n}$ and $b - \frac{1}{n}$, the limit is in the form of '$\ldots, b]$' and '$\ldots, b)$', respectively, for both open and closed ends.

Example 1.A.1 From (1.A.2), it is easy to see that $(-2)! = \pm\infty$ and that $\left(-\frac{1}{2}\right)! = \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$ from (1.4.83). ♦

For the permutation ${}_nP_k = \frac{n!}{(n-k)!}$ defined in (1.4.59), it is assumed that $n$ is a non-negative integer and $k = 0, 1, \ldots, n$. Based on (1.4.92) and (1.A.2), the permutation ${}_pP_q = \frac{\Gamma(p+1)}{\Gamma(p-q+1)}$ can now be generalized as
\[ {}_pP_q = \begin{cases} \dfrac{\Gamma(p+1)}{\Gamma(p-q+1)}, & p \notin J_- \text{ and } p - q \notin J_-, \\[1.5ex] (-1)^q\,\dfrac{\Gamma(-p+q)}{\Gamma(-p)}, & p \in J_- \text{ and } p - q \in J_-, \\[1.5ex] 0, & p \notin J_- \text{ and } p - q \in J_-, \\ \pm\infty, & p \in J_- \text{ and } p - q \notin J_- \end{cases} \tag{1.A.3} \]
for complex numbers $p$ and $q$, where the expression $(-1)^q\frac{\Gamma(-p+q)}{\Gamma(-p)}$ in the second line of the right-hand side can also be written as $(-1)^q\,{}_{-p+q-1}P_q$.

Example 1.A.2 For any number $z$, ${}_zP_0 = 1$ and ${}_zP_1 = z$. ♦

Example 1.A.3 It follows that
\[ {}_0P_z = \begin{cases} 0, & z \text{ is a natural number}, \\ \dfrac{1}{\Gamma(1-z)}, & \text{otherwise}, \end{cases} \tag{1.A.4} \]
\[ {}_1P_z = \begin{cases} 0, & z = 2, 3, \ldots, \\ \dfrac{1}{\Gamma(2-z)}, & \text{otherwise}, \end{cases} \tag{1.A.5} \]
\[ {}_zP_z = \begin{cases} \pm\infty, & z \in J_-, \\ \Gamma(z+1), & z \notin J_-, \end{cases} \tag{1.A.6} \]
and
\[ {}_{-1}P_z = \begin{cases} (-1)^z\,{}_zP_z, & z = 0, 1, \ldots, \\ \pm\infty, & \text{otherwise} \end{cases} = \begin{cases} (-1)^z\,\Gamma(z+1), & z = 0, 1, \ldots, \\ \pm\infty, & \text{otherwise} \end{cases} \tag{1.A.7} \]
from (1.A.3). ♦

Using (1.A.3), we can also get ${}_{-2}P_{-0.3} = \pm\infty$, ${}_{-0.1}P_{1.9} = 0$, ${}_0P_3 = \frac{\Gamma(1)}{\Gamma(-2)} = 0$, ${}_{\frac{1}{2}}P_3 = \frac{\Gamma\left(\frac{3}{2}\right)}{\Gamma\left(-\frac{3}{2}\right)} = \frac{3}{8}$, ${}_{\frac{1}{2}}P_{0.8} = \frac{\Gamma\left(\frac{3}{2}\right)}{\Gamma(0.7)} = \frac{\sqrt{\pi}}{2\Gamma(0.7)}$, ${}_3P_{\frac{1}{2}} = \frac{\Gamma(4)}{\Gamma\left(\frac{7}{2}\right)} = \frac{16}{5\sqrt{\pi}}$, ${}_3P_{-\frac{1}{2}} = \frac{\Gamma(4)}{\Gamma\left(\frac{9}{2}\right)} = \frac{32}{35\sqrt{\pi}}$, ${}_3P_{-2} = \frac{\Gamma(4)}{\Gamma(6)} = \frac{1}{20}$, ${}_{-\frac{1}{2}}P_3 = \frac{\Gamma\left(\frac{1}{2}\right)}{\Gamma\left(-\frac{5}{2}\right)} = -\frac{15}{8}$, and ${}_{-2}P_3 = \frac{\Gamma(-1)}{\Gamma(-4)} = (-1)^3\frac{\Gamma(5)}{\Gamma(2)} = -24$. Table 1.3 shows some values of the permutation ${}_pP_q$.
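The case analysis in (1.A.3) translates directly into code for real arguments. The following is a sketch under stated assumptions: it handles real (not complex) $p$ and $q$, the names `is_negative_integer` and `generalized_permutation` are hypothetical, and `math.inf` is returned where the text writes $\pm\infty$, so the sign of the infinite value is not represented.

```python
import math

def is_negative_integer(x: float) -> bool:
    """True when x belongs to J_- = {-1, -2, ...}."""
    return x < 0 and float(x).is_integer()

def generalized_permutation(p: float, q: float) -> float:
    """pPq following the four cases of (1.A.3), for real p and q."""
    p_neg = is_negative_integer(p)
    diff_neg = is_negative_integer(p - q)
    if not p_neg and not diff_neg:
        return math.gamma(p + 1.0) / math.gamma(p - q + 1.0)
    if p_neg and diff_neg:
        # q = p - (p - q) is an integer here, so (-1)^q is well defined.
        sign = -1.0 if round(q) % 2 else 1.0
        return sign * math.gamma(-p + q) / math.gamma(-p)
    if diff_neg:
        return 0.0          # p not in J_-, p - q in J_-
    return math.inf         # p in J_-, p - q not in J_- (stands for ±∞)

print(generalized_permutation(-2, 3))     # -24, as in the text
print(generalized_permutation(0.5, 3))    # 3/8
print(generalized_permutation(0, 3))      # 0
print(generalized_permutation(3, -2))     # 1/20
```

The gamma function is only ever evaluated away from its poles, because the branches that would hit a pole are exactly the ones (1.A.3) resolves to $0$ or $\pm\infty$.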


Table 1.3 Some values of the permutation p Pq (Here, ∗ denotes ±∞) q −2 p

−2 − 23 −1 − 21 0 1 2

1 3 2

2

∗ −4 ∗ 4 3 1 2 4 15 1 6 4 35 1 12

− 23

−1

− 21

0

1 2

1

3 2

2

∗ √ −2 π ∗ √ π

−1 −2 ∗ 2 1

∗ 0 ∗ √ π

1 1 1 1 1

∗ 0 ∗ 0

−2 − 23 −1 − 21 0

∗ 0 ∗ 0 − 2√1 π

6

1 2

0

− 41 0

4 √ 3 √ π π 4 8√ 15 π √ π 8 32√ 105 π

√2 √π π 2 4 √ 3√π 3 π 8 16 √ 15 π

2 3 1 2 2 5 1 3

1 1 1 1

√1 √π π 2 2 √ π √ 3 π 4 8 √ 3 π

1 3 2

2

√1 π √ 3 π 4 √4 π

15 4

2 3 4

0

3 4

2

(B) Binomial Coefficients in the Complex Space

For the binomial coefficient ${}_nC_k = \frac{n!}{(n-k)!\,k!}$ defined in (1.4.60), $n$ and $k$ are non-negative integers with $n \ge k$. Based on the gamma function described in Sect. 1.4.3, we can define the binomial coefficient in the complex space: specifically, employing (1.4.92), the binomial coefficient ${}_pC_q = \frac{\Gamma(p+1)}{\Gamma(p-q+1)\Gamma(q+1)}$ for $p$ and $q$ complex numbers can be defined as described in Table 1.4.

Table 1.4 The binomial coefficient ${}_pC_q = \frac{\Gamma(p+1)}{\Gamma(p-q+1)\Gamma(q+1)}$ in the complex space

Is $p \in J_-$?   Is $q \in J_-$?   Is $p - q \in J_-$?   ${}_pC_q$
No                No                No                    $\frac{\Gamma(p+1)}{\Gamma(p-q+1)\Gamma(q+1)}$
Yes               Yes               No                    $(-1)^{p-q}\frac{\Gamma(-q)}{\Gamma(p-q+1)\Gamma(-p)} = (-1)^{p-q}\,{}_{-q-1}C_{p-q}$
Yes               No                Yes                   $(-1)^{q}\frac{\Gamma(-p+q)}{\Gamma(-p)\Gamma(q+1)} = (-1)^{q}\,{}_{-p+q-1}C_{q}$
Yes               Yes               Yes                   $0$
No                Yes               No                    $0$
No                No                Yes                   $0$
Yes               No                No                    $\pm\infty$

Note. Among the three numbers $p$, $q$, and $p - q$, it is possible that only $p - q$ is not a negative integer (e.g., $p = -2$, $q = -3$, and $p - q = 1$) and only $q$ is not a negative integer (e.g., $p = -3$, $q = 2$, and $p - q = -5$), but it is not possible that only $p$ is not a negative integer. In other words, when $q$ and $p - q$ are both negative integers, $p$ is also a negative integer.

Example 1.A.4 When both $p$ and $p - q$ are negative integers and $q$ is a non-negative integer, the binomial coefficient ${}_pC_q = (-1)^q\,{}_{-p+q-1}C_q$ can be expressed also as
\[ {}_pC_q = (-1)^q\,{}_{-p+q-1}C_{-p-1}. \tag{1.A.8} \]
Now, when $p$ is a negative non-integer real number and $q$ is a non-negative integer, the binomial coefficient can be written as ${}_pC_q = \frac{\Gamma(p+1)}{\Gamma(p-q+1)}\frac{1}{\Gamma(q+1)} = (-1)^{p-p+q}\frac{\Gamma(-p+q)}{\Gamma(-p)}\frac{1}{\Gamma(q+1)} = (-1)^q\frac{(-p+q-1)!}{(-p-1)!\,q!}$ or as
\[ {}_pC_q = (-1)^q\,{}_{-p+q-1}C_q \tag{1.A.9} \]
by recollecting $\frac{\Gamma(\alpha+1)}{\Gamma(\beta+1)} = (-1)^{\alpha-\beta}\frac{\Gamma(-\beta)}{\Gamma(-\alpha)}$ shown in (1.4.91) for $\alpha - \beta$ an integer, $\alpha < 0$, and $\beta < 0$. The two formulas (1.A.8) and (1.A.9) are the same as ${}_{-r}C_x = (-1)^x\,{}_{r+x-1}C_x$, which we will see in (2.5.15) for a negative real number $-r$ and a non-negative integer $x$. ♦

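Example 1.A.5 From Table 1.4, we promptly get ${}_zC_0 = {}_zC_z = 1$ and ${}_zC_1 = {}_zC_{z-1} = z$. In addition, ${}_0C_z = {}_0C_{-z}$ and ${}_1C_z = {}_1C_{1-z} = \frac{1}{\Gamma(2-z)\Gamma(1+z)}$ can be expressed as

$${}_0C_z = \frac{1}{\Gamma(1-z)\Gamma(1+z)} = \begin{cases} 1, & z = 0, \\ 0, & z = \pm 1, \pm 2, \ldots, \\ \frac{1}{\Gamma(1-z)\Gamma(1+z)}, & \text{otherwise} \end{cases} \tag{1.A.10}$$

and

$${}_1C_z = \begin{cases} 1, & z = 0, 1, \\ 0, & z = -1, \pm 2, \pm 3, \ldots, \\ \frac{1}{\Gamma(2-z)\Gamma(1+z)}, & \text{otherwise}, \end{cases} \tag{1.A.11}$$

respectively. ♦

We can similarly obtain$^{25}$ ${}_{-3}C_{-2} = {}_{-3}C_{-1} = \frac{(-3)!}{(-2)!(-1)!} = 0$, ${}_{-3}C_{2} = {}_{-3}C_{-5} = \frac{(-3)!}{(-5)!\,2!} = (-1)^{2}\frac{4!}{2!\,2!} = {}_4C_2$, ${}_{-7}C_{3} = {}_{-7}C_{-10} = \frac{(-7)!}{(-10)!\,3!} = (-1)^{3}\frac{9!}{6!\,3!} = -\,{}_9C_3$, and ${}_{\frac{5}{2}}C_{2} = {}_{\frac{5}{2}}C_{\frac{1}{2}} = \frac{\Gamma(\frac{7}{2})}{\Gamma(3)\Gamma(\frac{3}{2})} = \frac{15}{8}$. Table 1.5 shows some values of the binomial coefficient.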
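The case analysis of Table 1.4 can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions: the names `is_negative_integer` and `binomial` are hypothetical, the arguments are taken to be real (not complex), and ±∞ is represented by `math.inf` without tracking the sign.

```python
import math


def is_negative_integer(x):
    """True when x is one of -1, -2, ... (the set J_- of Table 1.4)."""
    return x < 0 and float(x).is_integer()


def binomial(p, q):
    """Generalized binomial coefficient pCq for real p and q, following Table 1.4."""
    a = is_negative_integer(p)
    b = is_negative_integer(q)
    c = is_negative_integer(p - q)
    if not (a or b or c):
        # Regular case: no gamma function is evaluated at a pole.
        return math.gamma(p + 1) / (math.gamma(p - q + 1) * math.gamma(q + 1))
    if a and b and not c:
        # (-1)^(p-q) * C(-q-1, p-q); here p - q is a non-negative integer.
        return (-1) ** round(p - q) * binomial(-q - 1, p - q)
    if a and c and not b:
        # (-1)^q * C(-p+q-1, q); here q is a non-negative integer.
        return (-1) ** round(q) * binomial(-p + q - 1, q)
    if a and not b and not c:
        # Only p is a negative integer: the value is +/- infinity (sign not tracked).
        return math.inf
    # Remaining rows of Table 1.4 all give 0.
    return 0.0
```

For instance, `binomial(-3, 2)` and `binomial(-7, 3)` reproduce the values ${}_4C_2 = 6$ and $-{}_9C_3 = -84$ obtained above, and `binomial(2.5, 2)` gives 15/8 up to rounding.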

Example 1.A.6 Obtain the series expansion of $h(z) = (1+z)^p$ for p a real number.

Solution First, when p ≥ 0 or when p < 0 and |z| < 1, we have $(1+z)^p = \sum_{k=0}^{\infty} \frac{h^{(k)}(0)}{k!} z^k = \sum_{k=0}^{\infty} \frac{1}{k!}\, p(p-1)\cdots(p-k+1) z^k$, i.e.,

$$(1+z)^p = \sum_{k=0}^{\infty} {}_pC_k\, z^k. \tag{1.A.12}$$

Note that (1.A.12) is the same as the binomial expansion

$$(1+z)^p = \sum_{k=0}^{p} {}_pC_k\, z^k \tag{1.A.13}$$

of $(1+z)^p$ because ${}_pC_k = 0$ for k = p + 1, p + 2, . . . when p is 0 or a natural number. Next, recollecting (1.A.12) and $(1+z)^p = z^p\left(1 + \frac{1}{z}\right)^p$, we get

$$(1+z)^p = \sum_{k=0}^{\infty} {}_pC_k\, z^{p-k} \tag{1.A.14}$$

for p < 0 and |z| > 1. Combining (1.A.12) and (1.A.14), we eventually get

$$(1+z)^p = \begin{cases} \sum_{k=0}^{\infty} {}_pC_k\, z^k, & \text{for } p \ge 0 \text{ or for } p < 0,\ |z| < 1, \\ \sum_{k=0}^{\infty} {}_pC_k\, z^{p-k}, & \text{for } p < 0,\ |z| > 1, \end{cases} \tag{1.A.15}$$

$(1+z)^p = 2^p$ for p < 0 and z = 1, and $(1+z)^p \to \infty$ for p < 0 and z = −1. Note that the term ${}_pC_k$ in (1.A.15) is always finite because the case of only p being a negative integer among p, k, and p − k is not possible when k is an integer. ♦

$^{25}$ The cases ${}_{-1}C_z$ and ${}_{-2}C_z$ are addressed in Exercise 1.39.

Table 1.5 Values of binomial coefficient ${}_pC_q$ (Here, ∗ denotes ±∞)

| p \ q | −2 | −3/2 | −1 | −1/2 | 0 | 1/2 | 1 | 3/2 | 2 |
|---|---|---|---|---|---|---|---|---|---|
| −2 | 1 | ∗ | 0 | ∗ | 1 | ∗ | −2 | ∗ | 3 |
| −3/2 | 0 | 1 | 0 | 0 | 1 | 0 | −3/2 | 0 | 15/8 |
| −1 | −1 | ∗ | 1 | ∗ | 1 | ∗ | −1 | ∗ | 1 |
| −1/2 | 0 | −1/2 | 0 | 1 | 1 | 0 | −1/2 | 0 | 3/8 |
| 0 | 0 | −2/(3π) | 0 | 2/π | 1 | 2/π | 0 | −2/(3π) | 0 |
| 1/2 | 0 | −1/8 | 0 | 1/2 | 1 | 1 | 1/2 | 0 | −1/8 |
| 1 | 0 | −4/(15π) | 0 | 4/(3π) | 1 | 4/π | 1 | 4/(3π) | 0 |
| 3/2 | 0 | −1/16 | 0 | 3/8 | 1 | 3/2 | 3/2 | 1 | 3/8 |
| 2 | 0 | −16/(105π) | 0 | 16/(15π) | 1 | 16/(3π) | 2 | 16/(3π) | 1 |

Example 1.A.7 Because ${}_{-1}C_k = (-1)^k$ for k = 0, 1, . . . as shown in (1.E.27), we get

$$\frac{1}{1+z} = \begin{cases} \sum_{k=0}^{\infty} (-1)^k z^k, & |z| < 1, \\ \sum_{k=0}^{\infty} (-1)^k z^{-1-k}, & |z| > 1 \end{cases} = \begin{cases} 1 - z + z^2 - z^3 + \cdots, & |z| < 1, \\ \frac{1}{z} - \frac{1}{z^2} + \frac{1}{z^3} - \cdots, & |z| > 1 \end{cases} \tag{1.A.16}$$

from (1.A.15) with p = −1. ♦



Example 1.A.8 Employing (1.A.15) and the result for ${}_{-2}C_k$ shown in (1.E.28), we get

$$\frac{1}{(1+z)^2} = \begin{cases} \sum_{k=0}^{\infty} (-1)^k (k+1) z^k, & |z| < 1, \\ \sum_{k=0}^{\infty} (-1)^k (k+1) z^{-2-k}, & |z| > 1 \end{cases} = \begin{cases} 1 - 2z + 3z^2 - 4z^3 + \cdots, & |z| < 1, \\ \frac{1}{z^2} - \frac{2}{z^3} + \frac{3}{z^4} - \cdots, & |z| > 1. \end{cases} \tag{1.A.17}$$

Alternatively, from $\frac{1}{(1+z)^2} = \frac{1}{1-(-2z-z^2)} = \sum_{k=0}^{\infty} \left(-2z - z^2\right)^k$ for $\left|-2z - z^2\right| < 1$, we have

$$\frac{1}{(1+z)^2} = 1 + \left(-2z - z^2\right) + \left(4z^2 + 4z^3 + z^4\right) + \left(-8z^3 + \cdots\right) + \cdots, \tag{1.A.18}$$

which can be rewritten as

$$1 - 2z + \left(-z^2 + 4z^2\right) + \left(4z^3 - 8z^3\right) + \cdots = 1 - 2z + 3z^2 - 4z^3 + \cdots \tag{1.A.19}$$

by changing the order in the addition. The result$^{26}$ (1.A.19) is the same as (1.A.17) for |z| < 1. ♦
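The two branches of (1.A.17) can be checked numerically with truncated sums. The following is a minimal sketch; the function name `inverse_square_series` and the truncation length `terms` are assumptions made for illustration only.

```python
def inverse_square_series(z, terms=200):
    """Truncated series of 1/(1+z)^2 following the two branches of (1.A.17)."""
    if abs(z) < 1:
        # Branch |z| < 1: sum of (-1)^k (k+1) z^k
        return sum((-1) ** k * (k + 1) * z ** k for k in range(terms))
    if abs(z) > 1:
        # Branch |z| > 1: sum of (-1)^k (k+1) z^(-2-k)
        return sum((-1) ** k * (k + 1) * z ** (-2 - k) for k in range(terms))
    raise ValueError("the series in (1.A.17) is not used on |z| = 1")


# Compare the truncated sums with the closed form at a few sample points.
for z in (0.5, 3, 0.3 + 0.4j):
    print(z, inverse_square_series(z), 1 / (1 + z) ** 2)
```

The sample points are arbitrary; points close to |z| = 1 would need many more terms for the same accuracy.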

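$^{26}$ In writing (1.A.19) from (1.A.18), we assume $\left\{\left|-2z - z^2\right| < 1,\ 0 < |\mathrm{Re}(z) + 1| \le \sqrt{2}\right\}$, a proper subset of the region |z| < 1. Here, $\left\{\left|-2z - z^2\right| < 1,\ 0 < |\mathrm{Re}(z) + 1| \le \sqrt{2}\right\}$ is the right half of the dumbbell-shaped region $\left|-2z - z^2\right| < 1$, which is a proper subset of the rectangle $\left\{|\mathrm{Im}(z)| \le \frac{1}{2},\ |\mathrm{Re}(z) + 1| \le \sqrt{2}\right\}$.

(C) Two Equalities for Binomial Coefficients

Theorem 1.A.1 For γ ∈ {0, 1, . . .} and any two numbers α and β, we have

$$\sum_{m=0}^{\gamma} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \binom{\alpha+\beta}{\gamma}, \tag{1.A.20}$$

which is called Chu-Vandermonde convolution or Vandermonde convolution.

Theorem 1.A.1 is proved in Exercise 1.35. The result (1.A.20) is the same as the Hagen-Rothe identity

$$\sum_{m=0}^{\gamma} \frac{\alpha-\gamma c}{\alpha-mc}\binom{\alpha-mc}{\gamma-m}\frac{\beta}{\beta+mc}\binom{\beta+mc}{m} = \frac{\alpha+\beta-\gamma c}{\alpha+\beta}\binom{\alpha+\beta}{\gamma} \tag{1.A.21}$$

with c = 0 and Gauss' hypergeometric theorem

$${}_2F_1(a, b; c; 1) = \frac{\Gamma(c)\Gamma(c-a-b)}{\Gamma(c-a)\Gamma(c-b)}, \quad \mathrm{Re}(c) > \mathrm{Re}(a+b) \tag{1.A.22}$$

with a = γ, b = α + β − γ, and c = α + β + 1. In (1.A.22),

$${}_2F_1(a, b; c; z) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n}\frac{z^n}{n!}, \quad \mathrm{Re}(c) > \mathrm{Re}(b) > 0 \tag{1.A.23}$$

is the hypergeometric function$^{27}$, and can be expressed also as

$${}_2F_1(a, b; c; z) = \frac{1}{B(b, c-b)} \int_0^1 x^{b-1}(1-x)^{c-b-1}(1-zx)^{-a}\,dx \tag{1.A.24}$$

in terms of Euler's integral formula.

$^{27}$ The function ${}_2F_1$ is also called Gauss' hypergeometric function, and a special case of the generalized hypergeometric function ${}_pF_q(\alpha_1, \alpha_2, \ldots, \alpha_p; \beta_1, \beta_2, \ldots, \beta_q; z) = \sum_{k=0}^{\infty} \frac{(\alpha_1)_k (\alpha_2)_k \cdots (\alpha_p)_k}{(\beta_1)_k (\beta_2)_k \cdots (\beta_q)_k}\frac{z^k}{k!}$. Also, note that ${}_2F_1\left(1, 1; \frac{3}{2}; \frac{1}{2}\right) = \frac{\pi}{2}$.

Example 1.A.9 In (1.A.20), assume α = 2, β = 1/2, and γ = 2. Then, the left-hand side is $\binom{2}{2}\binom{1/2}{0} + \binom{2}{1}\binom{1/2}{1} + \binom{2}{0}\binom{1/2}{2} = 1 + 2\,\frac{\left(\frac{1}{2}\right)!}{\left(-\frac{1}{2}\right)!\,1!} + \frac{\left(\frac{1}{2}\right)!}{\left(-\frac{3}{2}\right)!\,2!} = 1 + 1 + \frac{\left(\frac{1}{2}\right)\left(-\frac{1}{2}\right)}{2!} = \frac{15}{8}$ and the right-hand side is $\binom{5/2}{2} = \frac{\left(\frac{5}{2}\right)!}{\left(\frac{1}{2}\right)!\,2!} = \frac{15}{8}$. ♦

Example 1.A.10 Consider the case β = n for $\sum_{m=0}^{n} \binom{\alpha}{\gamma-m}\binom{\beta}{m}$, where α is not a negative integer and n ∈ {0, 1, . . .}. When γ = 0, 1, . . . , n, we have $\sum_{m=0}^{n} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \sum_{m=0}^{\gamma} \binom{\alpha}{\gamma-m}\binom{\beta}{m} + \sum_{m=\gamma+1}^{n} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \binom{\alpha+\beta}{\gamma}$ noting that $\binom{\alpha}{\gamma-m} = \frac{\alpha!}{(\alpha-\gamma+m)!}\frac{1}{(\gamma-m)!} = 0$ for m = γ + 1, γ + 2, . . . due to (γ − m)! = ±∞. Similarly, when γ = n + 1, n + 2, . . ., we have $\sum_{m=0}^{n} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \sum_{m=0}^{\gamma} \binom{\alpha}{\gamma-m}\binom{\beta}{m} - \sum_{m=n+1}^{\gamma} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \binom{\alpha+\beta}{\gamma}$ because $\binom{\beta}{m} = \binom{n}{m} = \frac{n!}{(n-m)!\,m!} = 0$ from (n − m)! = ±∞ for m = n + 1, n + 2, . . .. In short, we have

$$\sum_{m=0}^{n} \binom{\alpha}{\gamma-m}\binom{n}{m} = \binom{\alpha+n}{\gamma} \tag{1.A.25}$$

for γ ∈ {0, 1, . . .} when n ∈ {0, 1, . . .}. ♦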



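Theorem 1.A.2 We have

$$\sum_{m=0}^{\gamma} [m]_{\zeta} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = [\beta]_{\zeta} \binom{\alpha+\beta-\zeta}{\gamma-\zeta} \tag{1.A.26}$$

for ζ ∈ {0, 1, . . . , γ} and γ ∈ {0, 1, . . .}.

Proof (Method 1) Let us employ the mathematical induction. First, when ζ = 0, (1.A.26) holds true for any values of γ ∈ {0, 1, . . .}, α, and β from Theorem 1.A.1. Assume (1.A.26) holds true when ζ = ζ0: in other words, for any value of γ ∈ {0, 1, . . .}, α, and β, assume

$$\sum_{m=0}^{\gamma} [m]_{\zeta_0} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \sum_{m=\zeta_0}^{\gamma} [m]_{\zeta_0} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = [\beta]_{\zeta_0} \binom{\alpha+\beta-\zeta_0}{\gamma-\zeta_0} \tag{1.A.27}$$

holds true. Then, noting that $(m+1)\binom{\beta}{m+1} = \beta\binom{\beta-1}{m}$ and $[m+1]_{\zeta_0+1} = (m+1)[m]_{\zeta_0}$, we get $\sum_{m=0}^{\gamma} [m]_{\zeta_0+1} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \sum_{m=\zeta_0+1}^{\gamma} [m]_{\zeta_0+1} \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \sum_{m=\zeta_0}^{\gamma-1} [m+1]_{\zeta_0+1} \binom{\alpha}{\gamma-m-1}\binom{\beta}{m+1} = \sum_{m=\zeta_0}^{\gamma-1} [m]_{\zeta_0} \binom{\alpha}{\gamma-1-m}(m+1)\binom{\beta}{m+1}$ from (1.A.27), i.e.,

$$\begin{aligned} \sum_{m=0}^{\gamma} [m]_{\zeta_0+1} \binom{\alpha}{\gamma-m}\binom{\beta}{m} &= \beta \sum_{m=\zeta_0}^{\gamma-1} [m]_{\zeta_0} \binom{\alpha}{\gamma-1-m}\binom{\beta-1}{m} \\ &= \beta[\beta-1]_{\zeta_0} \binom{\alpha+\beta-1-\zeta_0}{\gamma-1-\zeta_0} \\ &= [\beta]_{\zeta_0+1} \binom{\alpha+\beta-(\zeta_0+1)}{\gamma-(\zeta_0+1)}. \end{aligned} \tag{1.A.28}$$

The result (1.A.28) implies that (1.A.26) holds true also when ζ = ζ0 + 1 if (1.A.26) holds true when ζ = ζ0. In short, (1.A.26) holds true for ζ ∈ {0, 1, . . .}.

(Method 2) Noting (Charalambides 2002; Gould 1972) that $[m]_{\zeta}\binom{\beta}{m} = [m]_{\zeta}\frac{[\beta]_{\zeta}[\beta-\zeta]_{m-\zeta}}{[m]_{\zeta}(m-\zeta)!} = [\beta]_{\zeta}\binom{\beta-\zeta}{m-\zeta}$, we can rewrite (1.A.26) as $\sum_{m=\zeta}^{\gamma} \binom{\alpha}{\gamma-m}\binom{\beta-\zeta}{m-\zeta} = \binom{\alpha+\beta-\zeta}{\gamma-\zeta}$, which is the same as the Chu-Vandermonde convolution $\sum_{k=0}^{\gamma-\zeta} \binom{\alpha}{\gamma-\zeta-k}\binom{\beta-\zeta}{k} = \binom{\alpha+\beta-\zeta}{\gamma-\zeta}$. ♠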

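It is noteworthy that (1.A.26) holds true also when ζ ∈ {γ + 1, γ + 2, . . .}, in which case the value of (1.A.26) is 0.

Example 1.A.11 Assume α = 7, β = 3, γ = 6, and ζ = 2 in (1.A.26). Then, the left-hand side is $2\binom{7}{4}\binom{3}{2} + 6\binom{7}{3}\binom{3}{3} = 420$ and the right-hand side is $6\binom{8}{4} = 420$. ♦

Example 1.A.12 Assume α = −4, β = −1, γ = 3, and ζ = 2 in (1.A.26). Then, the left-hand side is $2\binom{-4}{1}\binom{-1}{2} + 6\binom{-4}{0}\binom{-1}{3} = 2 \times \frac{(-4)!}{(-5)!\,1!}\frac{(-1)!}{(-3)!\,2!} + 6 \times \frac{(-4)!}{(-4)!\,0!}\frac{(-1)!}{(-4)!\,3!} = -14$ and the right-hand side is $(-1)(-2)\binom{-7}{1} = 2 \times \frac{(-7)!}{(-8)!\,1!} = -14$. ♦

Example 1.A.13 The identity (1.A.26) holds true also for non-integer values of α or β. For example, when α = 1/2, β = −1/2, γ = 2, and ζ = 1, the left-hand side is $\binom{1/2}{1}\binom{-1/2}{1} + 2\binom{1/2}{0}\binom{-1/2}{2} = \frac{\left(\frac{1}{2}\right)!}{\left(-\frac{1}{2}\right)!\,1!}\frac{\left(-\frac{1}{2}\right)!}{\left(-\frac{3}{2}\right)!\,1!} + 2\,\frac{\left(\frac{1}{2}\right)!}{\left(\frac{1}{2}\right)!\,0!}\frac{\left(-\frac{1}{2}\right)!}{\left(-\frac{5}{2}\right)!\,2!} = \frac{1}{2}$ and the right-hand side is $-\frac{1}{2} \times \binom{-1}{1} = -\frac{1}{2} \times \frac{\Gamma(0)}{\Gamma(-1)\Gamma(2)} = -\frac{1}{2} \times \frac{(-1)!}{(-2)!\,1!} = \frac{1}{2}$. ♦

Example 1.A.14 Denoting the unit imaginary number by $j = \sqrt{-1}$, assume α = e − j, β = π + 2j, γ = 4, and ζ = 2 in (1.A.26). Then, the left-hand side is $0 + 0 + 2\binom{\alpha}{2}\binom{\beta}{2} + 6\binom{\alpha}{1}\binom{\beta}{3} + 12\binom{\alpha}{0}\binom{\beta}{4} = \frac{1}{2}\alpha(\alpha-1)\beta(\beta-1) + \alpha\beta(\beta-1)(\beta-2) + \frac{1}{2}\beta(\beta-1)(\beta-2)(\beta-3) = \frac{1}{2}\beta(\beta-1)\left\{\alpha(\alpha-1) + 2\alpha(\beta-2) + (\beta-2)(\beta-3)\right\} = \frac{1}{2}\beta(\beta-1)\left\{\alpha^2 + \alpha(2\beta-5) + (\beta-2)(\beta-3)\right\} = \frac{1}{2}\beta(\beta-1)(\alpha+\beta-2)(\alpha+\beta-3)$ and the right-hand side is also $\beta(\beta-1)\binom{\alpha+\beta-2}{2} = \beta(\beta-1)\frac{(\alpha+\beta-2)!}{(\alpha+\beta-4)!\,2!} = \frac{1}{2}\beta(\beta-1)(\alpha+\beta-2)(\alpha+\beta-3)$. ♦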


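Example 1.A.15 When ζ = 1 and ζ = 2, (1.A.26) can be specifically written as

$$\sum_{m=0}^{\gamma} m \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \beta\binom{\alpha+\beta-1}{\gamma-1} = \frac{\beta\gamma}{\alpha+\beta}\binom{\alpha+\beta}{\gamma} \tag{1.A.29}$$

and

$$\sum_{m=0}^{\gamma} m(m-1) \binom{\alpha}{\gamma-m}\binom{\beta}{m} = \beta(\beta-1)\binom{\alpha+\beta-2}{\gamma-2} = \frac{\beta(\beta-1)\gamma(\gamma-1)}{(\alpha+\beta)(\alpha+\beta-1)}\binom{\alpha+\beta}{\gamma}, \tag{1.A.30}$$

respectively. The two results (1.A.29) and (1.A.30) will later be useful for obtaining the mean and variance of the hypergeometric distribution in Exercise 3.68. ♦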
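The identity (1.A.26), and with ζ = 0 the Chu-Vandermonde convolution (1.A.20), can be checked numerically. The sketch below is an illustration under stated assumptions: it evaluates $\binom{x}{k}$ for a non-negative integer k as $[x]_k / k!$, which agrees with the gamma-function definition for such k, and the function names are hypothetical. Exact rational arithmetic is used when the inputs are integers or `Fraction` objects.

```python
import math
from fractions import Fraction


def falling_factorial(x, k):
    """[x]_k = x (x-1) ... (x-k+1), with [x]_0 = 1."""
    out = Fraction(1)
    for i in range(k):
        out = out * (x - i)
    return out


def binomial(x, k):
    """Binomial coefficient for arbitrary x and integer k (0 when k < 0)."""
    if k < 0:
        return 0
    return falling_factorial(x, k) / math.factorial(k)


def lhs(alpha, beta, gamma, zeta):
    """Left-hand side of (1.A.26)."""
    return sum(
        falling_factorial(m, zeta) * binomial(alpha, gamma - m) * binomial(beta, m)
        for m in range(gamma + 1)
    )


def rhs(alpha, beta, gamma, zeta):
    """Right-hand side of (1.A.26)."""
    return falling_factorial(beta, zeta) * binomial(alpha + beta - zeta, gamma - zeta)


# Parameters of Examples 1.A.11 to 1.A.13.
print(lhs(7, 3, 6, 2), rhs(7, 3, 6, 2))
print(lhs(-4, -1, 3, 2), rhs(-4, -1, 3, 2))
print(lhs(Fraction(1, 2), Fraction(-1, 2), 2, 1), rhs(Fraction(1, 2), Fraction(-1, 2), 2, 1))
```

Complex parameters such as those of Example 1.A.14 can be passed as Python `complex` values, in which case the two sides agree only up to floating-point rounding.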

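(D) Euler Reflection Formula

We now prove the Euler reflection formula

$$\Gamma(1-x)\Gamma(x) = \frac{\pi}{\sin \pi x} \tag{1.A.31}$$

for 0 < x < 1 mentioned in (1.4.79). First, if we let $x = \frac{s}{s+1}$ in the defining equation (1.4.95) of the beta function, we get

$$\tilde{B}(\alpha, \beta) = \int_0^{\infty} \frac{s^{\alpha-1}}{(s+1)^{\alpha+\beta}}\,ds. \tag{1.A.32}$$

Using (1.A.32) and $\Gamma(\alpha)\Gamma(\beta) = \Gamma(\alpha+\beta)\tilde{B}(\alpha, \beta)$ from (1.4.96) will lead us to

$$\Gamma(1-x)\Gamma(x) = \int_0^{\infty} \frac{s^{x-1}}{s+1}\,ds \tag{1.A.33}$$

for α = x and β = 1 − x. To obtain the right-hand side of (1.A.33), we consider the contour integral

$$\int_C \frac{z^{x-1}}{z-1}\,dz \tag{1.A.34}$$

in the complex space. The contour C of the integral in (1.A.34) is shown in Fig. 1.26, a counterclockwise path along the outer circle. As there exists only one pole z = 1 inside the contour C, we get

$$\int_C \frac{z^{x-1}}{z-1}\,dz = 2\pi j \tag{1.A.35}$$

from the residue theorem $\int_C \frac{z^{x-1}}{z-1}\,dz = 2\pi j \operatorname*{Res}_{z=1} \frac{z^{x-1}}{z-1}$. Consider the integral along C in four segments. First, we have $z = Re^{j\theta}$ and $dz = jRe^{j\theta}d\theta$ over the segment from $z_1 = Re^{j(-\pi+\epsilon)}$ to $z_2 = Re^{j(\pi-\epsilon)}$ along the circle with radius R. Second, we have $z = re^{j(\pi-\epsilon)}$ and $dz = e^{j(\pi-\epsilon)}dr$ over the segment from $z_2 = Re^{j(\pi-\epsilon)}$ to $z_3 = pe^{j(\pi-\epsilon)}$ along the straight line toward the origin. Third, we have $z = pe^{j\theta}$ and $dz = jpe^{j\theta}d\theta$ over the segment from $z_3 = pe^{j(\pi-\epsilon)}$ to $z_4 = pe^{j(-\pi+\epsilon)}$ clockwise along the circle with radius p. Fourth, we have $z = re^{j(-\pi+\epsilon)}$ and $dz = e^{j(-\pi+\epsilon)}dr$ over the segment from $z_4 = pe^{j(-\pi+\epsilon)}$ to $z_1 = Re^{j(-\pi+\epsilon)}$ along the straight line out of the origin. Thus, we have

$$\begin{aligned} \int_C \frac{z^{x-1}}{z-1}\,dz &= \int_{-\pi+\epsilon}^{\pi-\epsilon} \frac{\left(Re^{j\theta}\right)^{x-1} jRe^{j\theta}}{Re^{j\theta}-1}\,d\theta + \int_R^p \frac{\left(re^{j(\pi-\epsilon)}\right)^{x-1} e^{j(\pi-\epsilon)}}{re^{j(\pi-\epsilon)}-1}\,dr \\ &\quad + \int_{\pi-\epsilon}^{-\pi+\epsilon} \frac{\left(pe^{j\theta}\right)^{x-1} jpe^{j\theta}}{pe^{j\theta}-1}\,d\theta + \int_p^R \frac{\left(re^{j(-\pi+\epsilon)}\right)^{x-1} e^{j(-\pi+\epsilon)}}{re^{j(-\pi+\epsilon)}-1}\,dr, \end{aligned} \tag{1.A.36}$$

which can be written as

$$\begin{aligned} \int_C \frac{z^{x-1}}{z-1}\,dz &= \int_{-\pi+\epsilon}^{\pi-\epsilon} \frac{jR^x e^{j\theta x}}{Re^{j\theta}-1}\,d\theta + \int_R^p \frac{r^{x-1}e^{j(\pi-\epsilon)x}}{re^{j(\pi-\epsilon)}-1}\,dr \\ &\quad + \int_{\pi-\epsilon}^{-\pi+\epsilon} \frac{jp^x e^{j\theta x}}{pe^{j\theta}-1}\,d\theta + \int_p^R \frac{r^{x-1}e^{j(-\pi+\epsilon)x}}{re^{j(-\pi+\epsilon)}-1}\,dr \end{aligned} \tag{1.A.37}$$

after some steps. When x > 0 and p → 0, the third term in the right-hand side of (1.A.37) is $\lim_{p \to 0} \int_{\pi-\epsilon}^{-\pi+\epsilon} \frac{jp^x e^{j\theta x}}{pe^{j\theta}-1}\,d\theta = 0$. Similarly, the first term in the right-hand side of (1.A.37) vanishes because $\left|\frac{jR^x e^{j\theta x}}{Re^{j\theta}-1}\right| = \frac{R^x}{\sqrt{R^2 - 2R\cos\theta + 1}} \le \frac{R^x}{R-1} \to 0$ when x < 1 and R → ∞. Therefore, (1.A.37) can be written as

$$\begin{aligned} \int_C \frac{z^{x-1}}{z-1}\,dz &= 0 + \int_{\infty}^{0} \frac{r^{x-1}e^{j\pi x}}{-r-1}\,dr + 0 + \int_0^{\infty} \frac{r^{x-1}e^{-j\pi x}}{-r-1}\,dr \\ &= \left(e^{j\pi x} - e^{-j\pi x}\right) \int_0^{\infty} \frac{r^{x-1}}{r+1}\,dr \\ &= 2j \sin \pi x \int_0^{\infty} \frac{r^{x-1}}{r+1}\,dr \end{aligned} \tag{1.A.38}$$

for 0 < x < 1 when R → ∞, p → 0, and ε → 0. In short, we have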


Fig. 1.26 The contour C of  x−1 integral C zz−1 dz, where 0< p 0, y = 0.

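(1.A.55)

In other words,

$$\int_0^1 \frac{\partial}{\partial x} f(x, y)\,dy = \begin{cases} \int_0^1 \left(\frac{3x^2}{y^2} - \frac{2x^4}{y^3}\right) \exp\left(-\frac{x^2}{y}\right) dy, & x \ne 0, \\ \int_0^1 0\,dy, & x = 0 \end{cases} = \begin{cases} \left(1 - 2x^2\right) \exp\left(-x^2\right), & x \ne 0, \\ 0, & x = 0, \end{cases}$$

and thus

$$\frac{d}{dx} \int_0^1 f(x, y)\,dy \ne \int_0^1 \frac{\partial}{\partial x} f(x, y)\,dy. \tag{1.A.56}$$

♦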

Example 1.A.21 (Gelbaum and Olmsted 1964) Consider the Cantor function $\phi_C(x)$ discussed in Example 1.3.11 and f(x) = 1 for x ∈ [0, 1]. Then, both the Riemann-Stieltjes and Lebesgue-Stieltjes integrals produce $\int_0^1 f(x)\,d\phi_C(x) = \left[f(x)\phi_C(x)\right]_0^1 - \int_0^1 \phi_C(x)\,df(x) = \phi_C(1) - \phi_C(0) - 0$, i.e.,

$$\int_0^1 f(x)\,d\phi_C(x) = 1 \tag{1.A.57}$$

while the Lebesgue integral results in $\int_0^1 f(x)\phi_C'(x)\,dx = \int_0^1 0\,dx = 0$. ♦



(C) Sum of Powers of Two Real Numbers

Theorem 1.A.3 If the sum α + β and product αβ of two numbers α and β are both integers, then

$$\alpha^n + \beta^n = \text{integer} \tag{1.A.58}$$

for n ∈ {0, 1, . . .}.

Proof Let us prove the theorem via mathematical induction. It is clear that (1.A.58) holds true when n = 0 and 1. When n = 2, $\alpha^2 + \beta^2 = (\alpha+\beta)^2 - 2\alpha\beta$ is an integer. Assume $\alpha^n + \beta^n$ are all integers for n = 1, 2, . . . , k − 1. Then,

$$\alpha^k + \beta^k = (\alpha+\beta)^k - {}_kC_1\,\alpha\beta\left(\alpha^{k-2} + \beta^{k-2}\right) - {}_kC_2\,(\alpha\beta)^2\left(\alpha^{k-4} + \beta^{k-4}\right) - \cdots - \begin{cases} {}_kC_{\frac{k-1}{2}}\,(\alpha\beta)^{\frac{k-1}{2}}(\alpha+\beta), & k \text{ is odd}, \\ {}_kC_{\frac{k}{2}}\,(\alpha\beta)^{\frac{k}{2}}, & k \text{ is even}, \end{cases} \tag{1.A.59}$$

which implies that $\alpha^n + \beta^n$ is an integer when n = k because the binomial coefficient ${}_kC_j$ is always an integer for j = 0, 1, . . . , k when k is a natural number. In other words, if αβ and α + β are both integers, then $\alpha^n + \beta^n$ is also an integer when n ∈ {0, 1, . . .}. ♠
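Theorem 1.A.3 can be illustrated numerically. Note that the sketch below does not follow the binomial expansion (1.A.59) used in the proof; it uses instead the standard two-term recurrence $s_n = (\alpha+\beta)s_{n-1} - \alpha\beta\, s_{n-2}$ for $s_n = \alpha^n + \beta^n$, which is swapped in here because it is shorter to code. The function name `power_sums` is an illustrative assumption.

```python
def power_sums(s, p, n_max):
    """Return [alpha^n + beta^n for n = 0..n_max], given s = alpha + beta and p = alpha * beta.

    Uses s_n = s * s_{n-1} - p * s_{n-2} with s_0 = 2 and s_1 = s,
    so integer s and p give integer results at every step.
    """
    sums = [2, s]
    for _ in range(2, n_max + 1):
        sums.append(s * sums[-1] - p * sums[-2])
    return sums[: n_max + 1]


# alpha, beta = (1 +/- sqrt(5)) / 2 have sum 1 and product -1.
print(power_sums(1, -1, 6))
```

With α + β = 1 and αβ = −1, neither α nor β is an integer, yet every $\alpha^n + \beta^n$ in the printed list is.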

(D) Differences of Geometric Sequences

Theorem 1.A.4 Consider the difference

$$D_n = \alpha a^n - \beta b^n \tag{1.A.60}$$

of two geometric sequences, where α > 0, a > b > 0, a ≠ 1, and b ≠ 1. Let

$$r = \frac{\ln \frac{(1-b)\beta}{(1-a)\alpha}}{\ln \frac{a}{b}}. \tag{1.A.61}$$

Then, the sequence $\{D_n\}_{n=1}^{\infty}$ has the following properties:

(1) For 0 < a < 1, $D_n$ is the largest at n = r and n = r + 1 if r is an integer and at n = ⌈r⌉ if r is not an integer.
(2) For a > 1, $D_n$ is the smallest at n = r and n = r + 1 if r is an integer and at n = ⌈r⌉ if r is not an integer.

Proof Consider the case 0 < a < 1. Then, from $D_{n+1} - D_n = b^n(1-b)\beta - a^n(1-a)\alpha$, we have $D_{n+1} > D_n$ for n < r, $D_{n+1} = D_n$ for n = r, and $D_{n+1} < D_n$ for n > r and, subsequently, (1). We can similarly show (2). ♠

Example 1.A.22 The sequence $\{\alpha a^n - \beta b^n\}_{n=1}^{\infty}$

(1) is increasing and decreasing if a > 1 and 0 < a < 1, respectively, when $\frac{(1-b)\beta}{(1-a)\alpha} < \frac{a}{b}$,
(2) is first decreasing and then increasing and first increasing and then decreasing if a > 1 and 0 < a < 1, respectively, when $\frac{(1-b)\beta}{(1-a)\alpha} > \frac{a}{b}$, and
(3) is increasing and decreasing if a > 1 and 0 < a < 1, respectively, when $\frac{(1-b)\beta}{(1-a)\alpha} = \frac{a}{b}$ or, equivalently, when $\alpha a - \beta b = \alpha a^2 - \beta b^2$.

♦

Example 1.A.23 Assume α = β. Then,

(1) $\{a^n - b^n\}_{n=1}^{\infty}$ is an increasing and decreasing sequence when a > 1 and a + b < 1, respectively,
(2) $\{a^n - b^n\}_{n=1}^{\infty}$ is a decreasing sequence and $a - b = a^2 - b^2$ when a + b = 1, and
(3) $\{a^n - b^n\}_{n=1}^{\infty}$ is a sequence that first increases and then decreases with the maximum at n = ⌈r⌉ if r is not an integer and at n = r and n = r + 1 if r is an integer when a + b > 1 and 0 < a < 1.

♦

Example 1.A.24 Assume α = β = 1, a = 0.95, and b = 0.4. Then, we have $0.95 - 0.4 < 0.95^2 - 0.4^2 < 0.95^3 - 0.4^3$ and $0.95^3 - 0.4^3 > 0.95^4 - 0.4^4 > \cdots$ because $\lceil r \rceil = \left\lceil \frac{\ln \frac{0.6}{0.05}}{\ln \frac{0.95}{0.4}} \right\rceil \approx \lceil 2.87 \rceil = 3$. ♦
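The location of the extremum in Theorem 1.A.4 can be cross-checked by brute force. The sketch below is a hedged illustration for the case 0 < a < 1 with non-integer r; the names `peak_index` and `argmax_difference` and the search bound `n_max` are assumptions introduced for this example.

```python
import math


def peak_index(alpha, beta, a, b):
    """Return (r, ceil(r)) with r as defined in (1.A.61)."""
    r = math.log((1 - b) * beta / ((1 - a) * alpha)) / math.log(a / b)
    return r, math.ceil(r)


def argmax_difference(alpha, beta, a, b, n_max=50):
    """Brute-force index n in 1..n_max maximizing D_n = alpha a^n - beta b^n."""
    return max(range(1, n_max + 1), key=lambda n: alpha * a ** n - beta * b ** n)


# Parameters of Example 1.A.24.
r, n_peak = peak_index(1, 1, 0.95, 0.4)
print(r, n_peak, argmax_difference(1, 1, 0.95, 0.4))
```

When r happens to be an integer, the theorem gives two maximizers, n = r and n = r + 1, and the brute-force search reports only the first of them.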

(E) Selections of Numbers with No Number Unchosen

The number of ways to select r different elements from a set of n distinct elements is ${}_nC_r$. Because every element will be selected as many times as any other element, each of the n elements will be selected ${}_nC_r \times \frac{r}{n} = {}_{n-1}C_{r-1}$ times over the ${}_nC_r$ selections. Each of the n elements will be included at least once if we choose appropriately

$$m_1 = \left\lceil \frac{n}{r} \right\rceil \tag{1.A.62}$$

selections among the ${}_nC_r$ selections. For example, assume the set {1, 2, 3, 4, 5} and r = 2. Then, we have $m_1 = \left\lceil \frac{5}{2} \right\rceil = 3$, and thus, each of the five elements is included at least once in the three selections (1, 2), (3, 4), and (4, 5). Next, it is possible that one or more elements will not be included if we consider ${}_{n-1}C_r$ selections or less among the total ${}_nC_r$ selections. For example, for the set {1, 2, 3, 4, 5} and r = 2, in some choices of ${}_4C_2 = 6$ selections or less such as (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), and (3, 4) among the total of ${}_5C_2 = 10$ selections, the element 5 is not included. On the other hand, each of the n elements will be included at least once in any

$$m_2 = 1 + {}_{n-1}C_r \tag{1.A.63}$$

selections. Here, we have

$${}_{n-1}C_r = \left(1 - \frac{r}{n}\right) {}_nC_r = {}_nC_r - {}_{n-1}C_{r-1}. \tag{1.A.64}$$

The identity (1.A.64) implies that the number of ways for a specific element not to be included when selecting r elements from a set of n distinct elements is the same as the following two numbers:

(1) The number of ways to select r elements from a set of n − 1 distinct elements.
(2) The difference between the number of ways to select r elements from a set of n distinct elements and that for a specific element to be included when selecting r elements from a set of n distinct elements.
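For small n, the roles of $m_1$ in (1.A.62) and $m_2$ in (1.A.63) can be confirmed by exhaustive enumeration. The following sketch is only practical for small sets, since it enumerates all collections of selections; the helper names are hypothetical.

```python
from itertools import combinations
from math import ceil, comb


def covers(chosen, n):
    """True when the chosen selections together include every element 1..n."""
    return set().union(*chosen) == set(range(1, n + 1))


def some_choice_covers(n, r, m):
    """True when at least one choice of m selections includes every element."""
    selections = list(combinations(range(1, n + 1), r))
    return any(covers(c, n) for c in combinations(selections, m))


def every_choice_covers(n, r, m):
    """True when every choice of m selections includes every element."""
    selections = list(combinations(range(1, n + 1), r))
    return all(covers(c, n) for c in combinations(selections, m))


n, r = 5, 2
m1 = ceil(n / r)          # (1.A.62)
m2 = 1 + comb(n - 1, r)   # (1.A.63)
print(m1, some_choice_covers(n, r, m1), some_choice_covers(n, r, m1 - 1))
print(m2, every_choice_covers(n, r, m2), every_choice_covers(n, r, m2 - 1))
```

For n = 5 and r = 2 this reports that three well-chosen selections suffice while two never do, and that any seven selections cover the set while some six do not.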


(F) Fubini's theorem

Theorem 1.A.5 When the function f(x, y) is continuous on A = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, we have

$$\iint_A f(x, y)\,dx\,dy = \int_c^d \int_a^b f(x, y)\,dx\,dy = \int_a^b \int_c^d f(x, y)\,dy\,dx. \tag{1.A.65}$$

In addition, we have

$$\iint_A f(x, y)\,dx\,dy = \int_a^b \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy\,dx \tag{1.A.66}$$

if f(x, y) is continuous on A = {(x, y) : a ≤ x ≤ b, $g_1(x)$ ≤ y ≤ $g_2(x)$} and both $g_1$ and $g_2$ are continuous on [a, b].

(G) Partitions of Numbers

A representation of a natural number as the sum of natural numbers is also called a partition. Denote the number of partitions for a natural number n as the sum of k natural numbers by M(n, k). Then, the number N(n) of partitions for a natural number n can be expressed as

$$N(n) = \sum_{k=1}^{n} M(n, k). \tag{1.A.67}$$

As we can see, for example, from

1: {1},
2: {2}, {1,1},
3: {3}, {2,1}, {1,1,1},
4: {4}, {3,1}, {2,2}, {2,1,1}, {1,1,1,1}, (1.A.68)

we have N(1) = M(1, 1) = 1, N(2) = M(2, 1) + M(2, 2) = 2, and N(3) = M(3, 1) + M(3, 2) + M(3, 3) = 3. In addition, M(4, 1) = 1, M(4, 2) = 2, M(4, 3) = 1, and M(4, 4) = 1. In general, the number M(n, k) satisfies

$$M(n, k) = M(n-1, k-1) + M(n-k, k). \tag{1.A.69}$$

Example 1.A.25 We have M(5, 3) = M(5 − 1, 3 − 1) + M(5 − 3, 3) = M(4, 2) + M(2, 3) = 2 + 0 = 2 from (1.A.69). ♦

Theorem 1.A.6 Denote the least common multiple of k consecutive natural numbers 1, 2, . . . , k by $\tilde{k}$. Let the quotient and remainder of n when divided by k be $Q_k$ and $R_k$, respectively. If we write

$$n = \tilde{k} Q_{\tilde{k}} + R_{\tilde{k}}, \tag{1.A.70}$$

then the number M(n, k) can be expressed as

$$M(n, k) = \sum_{i=0}^{k-1} c_{i,k}\left(R_{\tilde{k}}\right) Q_{\tilde{k}}^{i}, \quad R_{\tilde{k}} = 0, 1, \ldots, \tilde{k} - 1 \tag{1.A.71}$$

in terms of $\tilde{k}$ polynomials of order k − 1 in $Q_{\tilde{k}}$, where $\left\{c_{i,k}(\cdot)\right\}_{i=0}^{k-1}$ are the coefficients of the polynomial.

Based on Theorem 1.A.6, we can obtain M(n, 1) = 1,

$$M(n, 2) = \begin{cases} \frac{n-1}{2}, & n \text{ is odd}, \\ \frac{n}{2}, & n \text{ is even}, \end{cases} \tag{1.A.72}$$

$$12\,M(n, 3) = \begin{cases} n^2, & R_6 = 0; \\ n^2 - 1, & R_6 = 1, 5; \\ n^2 - 4, & R_6 = 2, 4; \\ n^2 + 3, & R_6 = 3, \end{cases} \tag{1.A.73}$$

and

$$144\,M(n, 4) = \begin{cases} n^3 + 3n^2, & R_{12} = 0, \\ n^3 + 3n^2 - 20, & R_{12} = 2, \\ n^3 + 3n^2 + 32, & R_{12} = 4, \\ n^3 + 3n^2 - 36, & R_{12} = 6, \\ n^3 + 3n^2 + 16, & R_{12} = 8, \\ n^3 + 3n^2 - 4, & R_{12} = 10, \\ n^3 + 3n^2 - 9n + 5, & R_{12} = 1, 7, \\ n^3 + 3n^2 - 9n - 27, & R_{12} = 3, 9, \\ n^3 + 3n^2 - 9n - 11, & R_{12} = 5, 11, \end{cases} \tag{1.A.74}$$

for example. Table 1.8 shows the 60 polynomials of order four in $Q_{60}$ for the representation of M(n, 5).
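The recursion (1.A.69) can be used directly to tabulate M(n, k) and to compare against the closed forms (1.A.73) and (1.A.74). The sketch below is illustrative only; the boundary convention M(0, 0) = 1 and the dictionary names are assumptions introduced to make the recursion terminate.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def M(n, k):
    """Number of partitions of n into exactly k natural numbers, via (1.A.69)."""
    if n == 0 and k == 0:
        return 1  # assumed convention: the empty sum is the one partition of 0 into 0 parts
    if n <= 0 or k <= 0 or k > n:
        return 0
    return M(n - 1, k - 1) + M(n - k, k)


def N(n):
    """Total number of partitions of n, as in (1.A.67)."""
    return sum(M(n, k) for k in range(1, n + 1))


# Constant offsets of (1.A.73), indexed by the remainder R_6.
OFFSET_3 = {0: 0, 1: -1, 5: -1, 2: -4, 4: -4, 3: 3}
# Constant offsets of (1.A.74) for even n, indexed by the remainder R_12.
OFFSET_4_EVEN = {0: 0, 2: -20, 4: 32, 6: -36, 8: 16, 10: -4}
# Constant offsets of (1.A.74) for odd n (these rows also carry the term -9n).
OFFSET_4_ODD = {1: 5, 7: 5, 3: -27, 9: -27, 5: -11, 11: -11}

for n in range(1, 25):
    assert 12 * M(n, 3) == n * n + OFFSET_3[n % 6]
    if n % 2 == 0:
        assert 144 * M(n, 4) == n ** 3 + 3 * n ** 2 + OFFSET_4_EVEN[n % 12]
    else:
        assert 144 * M(n, 4) == n ** 3 + 3 * n ** 2 - 9 * n + OFFSET_4_ODD[n % 12]

print([N(n) for n in range(1, 5)], M(5, 3))
```

The same function could be used to spot-check individual rows of Table 1.8 by evaluating M(60Q + r, 5) for chosen Q and r.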


Table 1.8 Coefficients $\left\{c_{j,5}(r)\right\}_{j=4}^{0}$ in $M(n, 5) = c_{4,5}(R_{60})Q_{60}^4 + c_{3,5}(R_{60})Q_{60}^3 + c_{2,5}(R_{60})Q_{60}^2 + c_{1,5}(R_{60})Q_{60} + c_{0,5}(R_{60})$

| r | $c_{4,5}(r)$, $c_{3,5}(r)$, $c_{2,5}(r)$, $c_{1,5}(r)$, $c_{0,5}(r)$ |
|---|---|
| 0 | 4500, 750, 25/2, −5/2, 0 |
| 1 | 4500, 1050, 115/2, 1/2, 0 |
| 2 | 4500, 1350, 235/2, 3/2, 0 |
| 3 | 4500, 1650, 385/2, 17/2, 0 |
| 4 | 4500, 1950, 565/2, 29/2, 0 |
| 5 | 4500, 2250, 775/2, 55/2, 1 |
| 6 | 4500, 2550, 1015/2, 81/2, 1 |
| 7 | 4500, 2850, 1285/2, 123/2, 2 |
| 8 | 4500, 3150, 1585/2, 167/2, 3 |
| 9 | 4500, 3450, 1915/2, 229/2, 5 |
| 10 | 4500, 3750, 2275/2, 295/2, 7 |
| 11 | 4500, 4050, 2665/2, 381/2, 10 |
| 12 | 4500, 4350, 3085/2, 473/2, 13 |
| 13 | 4500, 4650, 3535/2, 587/2, 18 |
| 14 | 4500, 4950, 4015/2, 709/2, 23 |
| 15 | 4500, 5250, 4525/2, 855/2, 30 |
| 16 | 4500, 5550, 5065/2, 1011/2, 37 |
| 17 | 4500, 5850, 5635/2, 1193/2, 47 |
| 18 | 4500, 6150, 6235/2, 1387/2, 57 |
| 19 | 4500, 6450, 6865/2, 1609/2, 70 |
| 20 | 4500, 6750, 7525/2, 1845/2, 84 |
| 21 | 4500, 7050, 8215/2, 2111/2, 101 |
| 22 | 4500, 7350, 8935/2, 2393/2, 119 |
| 23 | 4500, 7650, 9685/2, 2707/2, 141 |
| 24 | 4500, 7950, 10465/2, 3039/2, 164 |
| 25 | 4500, 8250, 11275/2, 3405/2, 192 |
| 26 | 4500, 8550, 12115/2, 3791/2, 221 |
| 27 | 4500, 8850, 12985/2, 4213/2, 255 |
| 28 | 4500, 9150, 13885/2, 4657/2, 291 |
| 29 | 4500, 9450, 14815/2, 5139/2, 333 |
| 30 | 4500, 9750, 15775/2, 5645/2, 377 |
| 31 | 4500, 10050, 16765/2, 6191/2, 427 |
| 32 | 4500, 10350, 17785/2, 6763/2, 480 |
| 33 | 4500, 10650, 18835/2, 7377/2, 540 |
| 34 | 4500, 10950, 19915/2, 8019/2, 603 |
| 35 | 4500, 11250, 21025/2, 8705/2, 674 |
| 36 | 4500, 11550, 22165/2, 9421/2, 748 |
| 37 | 4500, 11850, 23335/2, 10183/2, 831 |
| 38 | 4500, 12150, 24535/2, 10977/2, 918 |
| 39 | 4500, 12450, 25765/2, 11819/2, 1014 |
| 40 | 4500, 12750, 27025/2, 12695/2, 1115 |
| 41 | 4500, 13050, 28315/2, 13621/2, 1226 |
| 42 | 4500, 13350, 29635/2, 14583/2, 1342 |
| 43 | 4500, 13650, 30985/2, 15597/2, 1469 |
| 44 | 4500, 13950, 32365/2, 16649/2, 1602 |
| 45 | 4500, 14250, 33775/2, 17755/2, 1747 |
| 46 | 4500, 14550, 35215/2, 18901/2, 1898 |
| 47 | 4500, 14850, 36685/2, 20103/2, 2062 |
| 48 | 4500, 15150, 38185/2, 21347/2, 2233 |
| 49 | 4500, 15450, 39715/2, 22649/2, 2418 |
| 50 | 4500, 15750, 41275/2, 23995/2, 2611 |
| 51 | 4500, 16050, 42865/2, 25401/2, 2818 |
| 52 | 4500, 16350, 44485/2, 26853/2, 3034 |
| 53 | 4500, 16650, 46135/2, 28367/2, 3266 |
| 54 | 4500, 16950, 47815/2, 29929/2, 3507 |
| 55 | 4500, 17250, 49525/2, 31555/2, 3765 |
| 56 | 4500, 17550, 51265/2, 33231/2, 4033 |
| 57 | 4500, 17850, 53035/2, 34973/2, 4319 |
| 58 | 4500, 18150, 54835/2, 36767/2, 4616 |
| 59 | 4500, 18450, 56665/2, 38629/2, 4932 |

Exercises

Exercise 1.1 Show that $B^c \subseteq A^c$ when A ⊆ B.

Exercise 1.2 Show that $\left(\bigcap_{i=1}^{\infty} A_i\right)^c = \bigcup_{i=1}^{\infty} A_i^c$ for a sequence $\{A_i\}_{i=1}^{\infty}$ of sets.

Exercise 1.3 Express the difference A − B in terms only of intersection and symmetric difference, and the union A ∪ B in terms only of intersection and symmetric difference.


n Exercise 1.4 Consider a sequence {Ai }i=1 of finite sets. Show

|A1 ∪ A2 ∪ · · · ∪ An | =



|Ai | −

i

      Ai ∩ A j  +  Ai ∩ A j ∩ A k  i< j

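i< j 0. Using the results, obtain $\sum_{k=0}^{\infty} {}_pC_{2k+1}$ and $\sum_{k=0}^{\infty} {}_pC_{2k}$ when p > 0.

Exercise 1.42 Obtain the series expansions of $g_1(z) = (1+z)^{\frac{1}{2}}$ and $g_2(z) = (1+z)^{-\frac{1}{2}}$.

Exercise 1.43 For non-negative numbers α and β such that α + β ≠ 0, show that

$$\frac{\alpha\beta}{\alpha+\beta} \le \min(\alpha, \beta) \le \frac{2\alpha\beta}{\alpha+\beta} \le \sqrt{\alpha\beta} \le \frac{\alpha+\beta}{2} \le \max(\alpha, \beta). \tag{1.E.29}$$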

References M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972) G.E. Andrews, R. Askey, R. Roy, Special Functions (Cambridge University, Cambridge, 1999) E. Artin, The Gamma Function (Translated by M Rinehart, and Winston Butler) (Holt, New York, 1964) B.C. Carlson, Special Functions of Applied Mathematics (Academic, New York, 1977) J.L. Challifour, Generalized Functions and Fourier Analysis: An Introduction (W. A. Benjamin, Reading, 1972) C.A. Charalambides, Enumerative Combinatorics (Chapman and Hall, New York, 2002) W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd edn. (McGraw-Hill, New York, 1990) B.R. Gelbaum, J.M.H. Olmsted, Counterexamples in Analysis (Holden-Day, San Francisco, 1964)


I.M. Gelfand, I. Moiseevich, Generalized Functions (Academic, New York, 1964) H.W. Gould, Combinatorial Identities (Morgantown Printing, Morgantown, 1972) R.P. Grimaldi, Discrete and Combinatorial Mathematics, 3rd edn. (Addison-Wesley, Reading, 1994) P.R. Halmos, Measure Theory (Van Nostrand Reinhold, New York, 1950) R.F. Hoskins, J.S. Pinto, Theories of Generalised Functions (Horwood, Chichester, 2005) K. Ito (ed.), Encyclopedic Dictionary of Mathematics (Massachusetts Institute of Technology, Cambridge, 1987) R. Johnsonbaugh, W.E. Pfaffenberger, Foundations of Mathematical Analysis (Marcel Dekker, New York, 1981) D.S. Jones, The Theory of Generalised Functions, 2nd edn. (Cambridge University, Cambridge, 1982) R.P. Kanwal, Generalized Functions: Theory and Applications (Birkhauser, Boston, 2004) K. Karatowski, A. Mostowski, Set Theory (North-Holland, Amsterdam, 1976) S.M. Khaleelulla, Counterexamples in Topological Vector Spaces (Springer, Berlin, 1982) A.B. Kharazishvili, Nonmeasurable Sets and Functions (Elsevier, Amsterdam, 2004) M.J. Lighthill, An Introduction to Fourier Analysis and Generalised Functions (Cambridge University, Cambridge, 1980) M.E. Munroe, Measure and Integration, 2nd edn. (Addison-Wesley, Reading, 1971) I. Niven, H.S. Zuckerman, H.L. Montgomery, An Introduction to the Theory of Numbers, 5th edn. (Wiley, New York, 1991) J.M.H. Olmsted, Advanced Calculus (Appleton-Century-Crofts, New York, 1961) S.R. Park, J. Bae, H. Kang, I. Song, On the polynomial representation for the number of partitions with fixed length. Math. Comput. 77(262), 1135–1151 (2008) J. Riordan, Combinatorial Identities (Wily, New York, 1968) F.S. Roberts, B. Tesman, Applied Combinatorics, 2nd edn. (CRC, Boca Raton, 2009) J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New York, 1986) K.H. Rosen, J.G. Michaels, J.L. Gross, J.W. Grossman, D.R. Shier, Handbook of Discrete and Combinatorial Mathematics (CRC, New York, 2000) H.L. 
Royden, Real Analysis, 3rd edn. (Macmillan, New York, 1989) W. Rudin, Principles of Mathematical Analysis, 3rd edn. (McGraw-Hill, New York, 1976) R. Salem, On some singular monotonic functions which are strictly increasing. Trans. Am. Math. Soc. 53(3), 427–439 (1943) A.N. Shiryaev, Probability, 2nd edn. (Springer, New York, 1996) N.J.A. Sloane, S. Plouffe, Encyclopedia of Integer Sequences (Academic, San Diego, 1995) D.M.Y. Sommerville, An Introduction to the Geometry of N Dimensions (Dover, New York, 1958) R.P. Stanley, Enumerative Combinatorics, Vols. 1 and 2 (Cambridge University Press, Cambridge, 1997) L.A. Steen, J.A. Seebach Jr., Counterexamples in Topology (Holt, Rinehart, and Winston, New York, 1970) J. Stewart, Calculus: Early Transcendentals, 7th edn. (Brooks/Coles, Belmont, 2012) A.A. Sveshnikov (ed.), Problems in Probability Theory, Mathematical Statistics and Theory of Random Functions (Dover, New York, 1968) G.B. Thomas, Jr., R.L. Finney, Calculus and Analytic Geometry, 9th edn. (Addison-Wesley, Reading, 1996) J.B. Thomas, Introduction to Probability (Springer, New York, 1986) A. Tucker, Applied Combinatorics (Wiley, New York, 2002) N.Y. Vilenkin, Combinatorics (Academic, New York, 1971) W.D. Wallis, J.C. George, Introduction to Combinatorics (CRC, New York, 2010) A.I. Zayed, Handbook of Function and Generalized Function Transformations (CRC, Boca Raton, 1996) S. Zhang, J. Jian, Computation of Special Functions (Wiley, New York, 1996)

Chapter 2

Fundamentals of Probability

Probability theory is a branch of measure theory. In measure theory and probability theory, we consider set functions of which the values are non-negative real numbers with the values called the measure and probability, respectively, of the corresponding set. In probability theory (Ross 1976, 1996), the values are in addition normalized to exist between 0 and 1: loosely speaking, the probability of a set represents the weight or size of the set. The more common concepts such as area, weight, volume, and mass are other examples of measure. As we shall see shortly, by integrating probability density or by adding probability mass, we can obtain probability. This is similar to obtaining mass by integrating the mass density or by summing point mass.

2.1 Algebra and Sigma Algebra

We first address the notions of algebra and sigma algebra (Bickel and Doksum 1977; Leon-Garcia 2008), which are the bases in defining probability.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 I. Song et al., Probability and Random Variables: Theory and Applications, https://doi.org/10.1007/978-3-030-97679-8_2

2.1.1 Algebra

Definition 2.1.1 (algebra) A collection $\mathcal{A}$ of subsets of a set S satisfying the two conditions

$$\text{if } A \in \mathcal{A} \text{ and } B \in \mathcal{A}, \text{ then } A \cup B \in \mathcal{A} \tag{2.1.1}$$

and

$$\text{if } A \in \mathcal{A}, \text{ then } A^c \in \mathcal{A} \tag{2.1.2}$$

is called an algebra of S.

Example 2.1.1 The collection $\mathcal{A}_1 = \{\{1\}, \{2\}, S_1, \emptyset\}$ is an algebra of $S_1 = \{1, 2\}$, and $\mathcal{A}_2 = \{\{1\}, \{2, 3\}, S_2, \emptyset\}$ is an algebra of $S_2 = \{1, 2, 3\}$. ♦

Example 2.1.2 From de Morgan's law, (2.1.1), and (2.1.2), we get $A \cap B \in \mathcal{A}$ when $A \in \mathcal{A}$ and $B \in \mathcal{A}$ for an algebra $\mathcal{A}$. Subsequently, we have

$$\bigcup_{i=1}^{n} A_i \in \mathcal{A} \tag{2.1.3}$$

and

$$\bigcap_{i=1}^{n} A_i \in \mathcal{A} \tag{2.1.4}$$

when $\{A_i\}_{i=1}^{n}$ are all the elements of the algebra $\mathcal{A}$. ♦

The theorem below follows from Example 2.1.2.

Theorem 2.1.1 An algebra is closed under a finite number of set operations.

When $A_i \in \mathcal{A}$, we always have $A_i \cap A_i^c = \emptyset \in \mathcal{A}$ and $A_i \cup A_i^c = S \in \mathcal{A}$, expressed as a theorem below.

Theorem 2.1.2 If $\mathcal{A}$ is an algebra of S, then $S \in \mathcal{A}$ and $\emptyset \in \mathcal{A}$.

In other words, a collection is not an algebra of S if the collection does not include ∅ or S.

Example 2.1.3 Obtain all the algebras of S = {1, 2, 3}.

Solution The collections $\mathcal{A}_1 = \{S, \emptyset\}$, $\mathcal{A}_2 = \{S, \{1\}, \{2, 3\}, \emptyset\}$, $\mathcal{A}_3 = \{S, \{2\}, \{1, 3\}, \emptyset\}$, $\mathcal{A}_4 = \{S, \{3\}, \{1, 2\}, \emptyset\}$, and $\mathcal{A}_5 = \{S, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{2, 3\}, \{1, 3\}, \emptyset\}$ are the algebras of S. ♦

Example 2.1.4 Assume $\mathbb{J}_+ = \{1, 2, \ldots\}$ defined in (1.1.3), and consider the collection $\mathcal{A}_1$ of all the sets obtained from a finite number of unions of the sets {1}, {2}, . . . each containing a single natural number. Now, $\mathbb{J}_+$ is not an element of $\mathcal{A}_1$ because it is not possible to obtain $\mathbb{J}_+$ from a finite number of unions of the sets {1}, {2}, . . .. Consequently, $\mathcal{A}_1$ is not an algebra of $\mathbb{J}_+$. ♦


Definition 2.1.2 (generated algebra) For a collection C of subsets of a set, the smallest algebra to which all the element sets in C belong is called the algebra generated from C and is denoted by A (C). The implication of A (C) being the smallest algebra is that any algebra to which all the element sets of C belong also contains all the element sets of A (C) as its elements. Example 2.1.5 When S = {1, 2, 3}, the algebra generated from C = {{1}} is A (C) = A1 = {S, {1}, {2, 3}, ∅} because A2 = {S, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, ∅} con♦ tains all the elements of A1 = {S, {1}, {2, 3}, ∅}. Example 2.1.6 For the collection C = {{a}} of S = {a, b, c, d}, the algebra generated from C is A (C) = {∅, {a}, {b, c, d}, S}. ♦ ∞ Theorem 2.1.3 Let A be an algebra of a set S, and {Ai }i=1 be a sequence of sets in ∞ A. Then, A contains a sequence {Bi }i=1 of sets such that

Bm ∩ Bn = ∅

(2.1.5)

for m = n and ∞



i=1

i=1

∪ Bi = ∪ Ai .

(2.1.6)

Proof The theorem can be proved similarly as in (1.1.32)–(1.1.34).



Example 2.1.7 Assume the algebra A = {{1}, {2, 3}, S, ∅} of S = {1, 2, 3}. When A1 = {1}, A2 = {2, 3}, and A3 = S, we have B1 = {1} and B2 = {2, 3}. ♦

Example 2.1.8 Consider the algebra {∅, {a}, {b}, {a, b}, {c, d}, {b, c, d}, {a, c, d}, S} of S = {a, b, c, d}. For A1 = {a, b} and A2 = {b, c, d}, we have B1 = {a, b} and B2 = {c, d}, or B1 = {a} and B2 = {b, c, d}. ♦

Example 2.1.9 Assume the algebra considered in Example 2.1.8. If A1 = {a, b}, A2 = {b, c, d}, and A3 = {a, c, d}, then B1 = {a}, B2 = {b}, and B3 = {c, d}. ♦

Example 2.1.10 Assume the algebra A({{1}, {2}, . . .}) generated from {{1}, {2}, . . .}, and let Ai = {2, 4, . . . , 2i}. Then, we have B1 = {2}, B2 = {4}, . . ., i.e., Bi = {2i} for i = 1, 2, . . .. It is clear that $\bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i$. ♦
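One common way to obtain the disjoint sequence of Theorem 2.1.3 is to remove from each set everything already covered by the earlier sets. The sketch below assumes this construction, B_n = A_n − (A_1 ∪ · · · ∪ A_{n−1}), which may or may not be the exact one in (1.1.32)–(1.1.34); the function name `disjointify` is hypothetical. Each B_n results from a finite number of set operations, so it stays in the algebra.

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_n = A_n minus the union of A_1, ..., A_{n-1}."""
    covered = set()
    result = []
    for a in sets:
        a = set(a)
        result.append(a - covered)  # keep only the points not seen before
        covered |= a
    return result


# Example 2.1.8: A1 = {a, b}, A2 = {b, c, d}
print(disjointify([{"a", "b"}, {"b", "c", "d"}]))

# Example 2.1.10, truncated to the first five sets A_i = {2, 4, ..., 2i}
A = [set(range(2, 2 * i + 1, 2)) for i in range(1, 6)]
print(disjointify(A))
```

For Example 2.1.8 this reproduces the first of the two choices given there; as Examples 2.1.8 and 2.1.9 show, the disjoint sequence is not unique.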



2.1.2 Sigma Algebra

In some cases, the results in finite and infinite spaces are different. For example, although the result from a finite number of set operations on the elements of an algebra is an element of the algebra, the result from an infinite number of set operations is not always an element of the algebra. This is similar to the fact that adding a finite number of rational numbers always results in a rational number while adding an infinite number of rational numbers sometimes results in an irrational number.

Example 2.1.11 Assume S = {1, 2, . . .}, a collection C of finite subsets of S, and the algebra A(C) generated from C. Then, S is an element of A(C) although S can be obtained from only an infinite number of unions of the element sets in A(C). On the other hand, the set {2, 4, . . .}, a set that can also be obtained from only an infinite number of unions of the element sets in A(C), is not an element of A(C). In other words, while a finite number of unions of the element sets in A(C) would result in an element of A(C), an infinite number of unions of the element sets in A(C) is not guaranteed to be an element of A(C). ♦

As is clear from the example above, an algebra is unfortunately not closed under a countable number of set operations. We now define the notion of σ-algebra by adding one desirable property to algebra.

Definition 2.1.3 (σ-algebra) An algebra that is closed under a countable number of unions is called a sigma algebra or σ-algebra. In other words, an algebra F is a σ-algebra if

$$\bigcup_{i=1}^{\infty} A_i \in \mathcal{F} \tag{2.1.7}$$

for all element sets A1, A2, . . . of F.

A sigma algebra is closed under a countable, i.e., finite and countably infinite, number of set operations while an algebra is closed under only a finite number of set operations. A sigma algebra is still an algebra, but the converse is not necessarily true. An algebra and a σ-algebra are also called an additive class of sets and a completely additive class of sets, respectively.

Example 2.1.12 For finite sets, an algebra is also a sigma algebra. ♦

Example 2.1.13 For a σ-algebra F of S, we always have S ∈ F and ∅ ∈ F from Theorem 2.1.2 because a σ-algebra is an algebra. ♦

Example 2.1.14 The collection

$$\mathcal{F} = \{\emptyset, \{a\}, \{b\}, \{c\}, \{d\}, \{a, b\}, \{a, c\}, \{a, d\}, \{b, c\}, \{b, d\}, \{c, d\}, \{a, b, c\}, \{a, b, d\}, \{a, c, d\}, \{b, c, d\}, S\} \tag{2.1.8}$$

of sets from S = {a, b, c, d} is a σ-algebra. ♦



When the collection of all possible outcomes is finite as in a single toss of a coin or a single rolling of a pair of dice, the limit, i.e., the infinite union in (2.1.7), does not have significant implications and an algebra is also a sigma algebra. On the other


hand, when an algebra contains infinitely many element sets, the result of an infinite number of unions of the element sets of the algebra does not always belong to the algebra because an algebra is not closed under an infinite number of set operations. Such a case occurs when the collection of all possible outcomes is from, for instance, an infinite toss of a coin or an infinite rolling of a pair of dice.

Example 2.1.15 The space Υ = {a = (a1, a2, . . .) : ai ∈ {0, 1}} of one-sided binary sequences is an uncountable set as discussed in Example 1.1.45. Consider the algebra

$$\mathcal{A}_{\Upsilon} = \mathcal{A}(G_{\Upsilon}) \tag{2.1.9}$$

generated from the collection $G_{\Upsilon} = \{\{a_i\} : a_i \in \Upsilon\}$ of singleton sets $\{a_i\}$. Then, some useful countably infinite sets such as

$$\Upsilon_T = \{\text{periodic binary sequences}\} \tag{2.1.10}$$

described in Example 1.1.44 are not elements of the algebra $\mathcal{A}_{\Upsilon}$ because $\Upsilon_T$, an infinite set whose complement is also infinite, cannot be obtained by a finite number of set operations on the element sets of $G_{\Upsilon}$. ♦

Example 2.1.16 Assuming Ω = J+, consider the algebra $\mathcal{A}_N = \mathcal{A}(G)$ generated from the collection G = {{1}, {2}, . . .}. Clearly, $J_+ \in \mathcal{A}_N$ and $\emptyset \in \mathcal{A}_N$. On the other hand, the set {2, 4, . . .} of even numbers is not an element of $\mathcal{A}_N$ because the set of even numbers cannot be obtained by a finite number of set operations on the element sets of G, as we have already mentioned in Example 2.1.11. Therefore, $\mathcal{A}_N$ is an algebra, but is not a σ-algebra, of J+. ♦

Example 2.1.17 Assuming Ω = Q, the set of rational numbers, let $\mathcal{A}_N = \mathcal{A}(G)$ be the algebra generated from the collection G = {{1}, {2}, . . .} of singleton sets of natural numbers. Clearly, $\mathcal{A}_N$ is not a σ-algebra of Q. Note that the set J+ = {1, 2, . . .} of natural numbers is not an element of $\mathcal{A}_N$ because J+ cannot be obtained by a finite number of set operations on the sets of G. ♦

Example 2.1.18 For Ω = [0, ∞), consider the collection

$$\mathcal{F}_1 = \{[a, b), [a, \infty) : 0 \le a \le b < \infty\} \tag{2.1.11}$$

of intervals [a, b) and [a, ∞) with 0 ≤ a ≤ b < ∞, and the collection $\mathcal{F}_2$ obtained from a finite number of unions of the intervals in $\mathcal{F}_1$. We have $[a, a) = \emptyset \in \mathcal{F}_1$ and $[a, \infty) = \Omega \in \mathcal{F}_1$ with a = 0. Yet, although $[a, b) \in \mathcal{F}_1$ for 0 ≤ a ≤ b < ∞, we have $[a, b)^c = [0, a) \cup [b, \infty) \notin \mathcal{F}_1$ for 0 < a < b < ∞. Thus, $\mathcal{F}_1$ is not an algebra of Ω. On the other hand, $\mathcal{F}_2$ is an algebra¹ of Ω because a finite number of unions of the elements in $\mathcal{F}_1$ is an element of $\mathcal{F}_2$, the complement of every element in $\mathcal{F}_2$ is an element of $\mathcal{F}_2$, $\emptyset \in \mathcal{F}_2$, and $\Omega \in \mathcal{F}_2$. However, $\mathcal{F}_2$ is not a σ-algebra of Ω = [0, ∞) because $\bigcap_{n=1}^{\infty} A_n = \{0\}$, for instance, is² not an element of $\mathcal{F}_2$ although $A_n = \left[0, \frac{1}{n}\right) \in \mathcal{F}_2$ for n = 1, 2, . . .. ♦

¹ Here, if the condition ‘0 ≤ a ≤ b < ∞’ is replaced with ‘0 ≤ a < b < ∞’, $\mathcal{F}_2$ is not an algebra because the null set is not an element of $\mathcal{F}_2$.

Example 2.1.19 Assuming Ω = R, consider the collection A of results obtained from finite numbers of unions of intervals (−∞, a], (b, c], and (d, ∞) with b ≤ c. Then, A is an algebra of R but is not a σ-algebra because $\bigcap_{n=1}^{\infty} \left(b - \frac{1}{n}, c\right] = [b, c]$ is not an element of A. ♦

Example 2.1.20 Assume the σ-algebras $\{\mathcal{F}_i\}_{i=1}^{\infty}$ of a set Ω. Then, $\bigcap_{n=1}^{\infty} \mathcal{F}_n$ is a σ-algebra. However, $\bigcup_{n=1}^{\infty} \mathcal{F}_n$ is not always a σ-algebra. For example, for Ω = {ω1, ω2, ω3}, consider the two σ-algebras F1 = {∅, {ω1}, {ω2, ω3}, Ω} and F2 = {∅, {ω2}, {ω1, ω3}, Ω}. Then, F1 ∩ F2 = {∅, Ω} is a σ-algebra, but the collection F1 ∪ F2 = {∅, {ω1}, {ω2}, {ω2, ω3}, {ω1, ω3}, Ω} is not even an algebra. As another example, consider the sequence

$$\mathcal{F}_1 = \{\emptyset, \Omega, \{\omega_1\}, \Omega - \{\omega_1\}\}, \tag{2.1.12}$$

$$\mathcal{F}_2 = \{\emptyset, \Omega, \{\omega_2\}, \Omega - \{\omega_2\}\}, \tag{2.1.13}$$

$$\vdots$$

of sigma algebras of Ω = {ω1, ω2, . . .}. Then,

$$\bigcup_{n=1}^{\infty} \mathcal{F}_n = \{\emptyset, \Omega, \{\omega_1\}, \{\omega_2\}, \ldots, \Omega - \{\omega_1\}, \Omega - \{\omega_2\}, \ldots\} \tag{2.1.14}$$

is not an algebra. ♦
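The three-point case in Example 2.1.20 is small enough to verify directly. The sketch below is illustrative only: the labels "w1", "w2", "w3" stand for ω1, ω2, ω3, and `is_algebra` and `missing_union` are hypothetical helper names. Because Ω is finite here, checking the algebra conditions is the same as checking the σ-algebra conditions.

```python
def is_algebra(collection, universe):
    """Check: universe included, closed under complement and pairwise union."""
    sets = set(collection)
    if universe not in sets:
        return False
    return all(universe - a in sets and a | b in sets
               for a in sets for b in sets)


def missing_union(collection):
    """Return one pair of member sets whose union is not a member, if any."""
    sets = set(collection)
    for a in sets:
        for b in sets:
            if a | b not in sets:
                return a, b
    return None


omega = frozenset({"w1", "w2", "w3"})
F1 = {frozenset(), frozenset({"w1"}), frozenset({"w2", "w3"}), omega}
F2 = {frozenset(), frozenset({"w2"}), frozenset({"w1", "w3"}), omega}

print(is_algebra(F1 & F2, omega))   # intersection of the two collections
print(is_algebra(F1 | F2, omega))   # union of the two collections
print(missing_union(F1 | F2))       # a witness pair for the failure
```

The witness returned for F1 ∪ F2 is a pair such as {ω1} and {ω2}, whose union {ω1, ω2} belongs to neither F1 nor F2.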

Definition 2.1.4 (generated σ-algebra) Consider a collection G of subsets of Ω. The smallest σ-algebra that contains all the element sets of G is called the σ-algebra generated from G and is denoted by σ(G).

The implication of the σ-algebra σ(G) being the smallest σ-algebra is that any σ-algebra which contains all the elements of G will also contain all the elements of σ(G).

Example 2.1.21 For S = {a, b, c, d}, the σ-algebra generated from C = {{a}} is σ(C) = {∅, {a}, {b, c, d}, S}. ♦

Example 2.1.22 For the uncountable set Υ = {a = (a1, a2, . . .) : ai ∈ {0, 1}} of one-sided binary sequences, consider the algebra $\mathcal{A}(G_{\Upsilon})$ and σ-algebra $\sigma(G_{\Upsilon})$ generated from $G_{\Upsilon} = \{\{a_i\} : a_i \in \Upsilon\}$. Then, as we have observed in Example 2.1.15, the collection

$$\Upsilon_T = \{\text{periodic binary sequences}\} \tag{2.1.15}$$

is not included in $\mathcal{A}(G_{\Upsilon})$ because every element set of $\mathcal{A}(G_{\Upsilon})$ is either a finite set or the complement of a finite set, while $\Upsilon_T$ is an infinite set whose complement is also infinite. On the other hand, we have

$$\Upsilon_T \in \sigma(G_{\Upsilon}) \tag{2.1.16}$$

by the definition of a sigma algebra. ♦

² This result is from (1.5.11).
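On a finite set, the generated algebra of Definition 2.1.2 and the generated σ-algebra of Definition 2.1.4 coincide, and both can be computed by repeatedly adding complements and unions until nothing new appears. The following is a minimal sketch under that finite-set assumption; the function name `generated_algebra` is hypothetical. It does not apply to infinite spaces such as Υ in Example 2.1.22.

```python
def generated_algebra(generators, universe):
    """Close the generators under complement and union (finite universe only)."""
    universe = frozenset(universe)
    sets = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)            # snapshot, since `sets` grows below
        for a in current:
            for b in current:
                for new in (universe - a, a | b):
                    if new not in sets:
                        sets.add(new)
                        changed = True
    return sets


# Examples 2.1.6 and 2.1.21: C = {{a}} on S = {a, b, c, d}
sigma_C = generated_algebra([{"a"}], {"a", "b", "c", "d"})
print(sorted(sorted(s) for s in sigma_C))
```

The loop terminates because the power set of a finite universe is finite, and the result is the smallest algebra containing the generators since only sets forced by the closure rules are ever added.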

Based on the concept of σ -algebra, we will discuss the notion of probability space in the next section. In particular, the concept of σ -algebra plays a key role in the continuous probability space.

2.2 Probability Spaces

A probability space (Gray and Davisson 2010; Loeve 1977) is the triplet (Ω, F, P) of an abstract space Ω, called the sample space; a sigma algebra F, called the event space, of the sample space; and a set function P, called the probability measure, assigning a number in [0, 1] to each of the element sets of the event space.

2.2.1 Sample Space

Definition 2.2.1 (random experiment) An experiment that can be repeated under perfect control, yet the outcome of which is not known in advance, is called a random experiment or, simply, an experiment.

Example 2.2.1 Tossing a coin is a random experiment because it is not possible to predict the exact outcome even under perfect control of the environment. ♦

Example 2.2.2 Making a product in a factory can be modelled as a random experiment because even the same machine would not be able to produce two identical products. ♦

Example 2.2.3 Although the law of inheritance is known, it is not possible to know exactly, for instance, the color of the eyes of a baby in advance. A probabilistic model is more appropriate. ♦

Example 2.2.4 In any random experiment, the procedure, observation, and model should be described clearly. For example, tossing a coin can be described as follows:
• Procedure. A coin will be thrown upward and fall freely down to the floor.
• Observation. When the coin stops moving, the face upward is observed.
• Model. The coin is symmetric and previous outcomes do not influence future outcomes. ♦


Definition 2.2.2 (sample space) The collection of all possible outcomes of an experiment is called the sample space of the experiment.

The sample space, often denoted by S or Ω, is basically the same as the abstract space in set theory.

Definition 2.2.3 (sample point) An element of the sample space is called a sample point or an elementary outcome.

Example 2.2.5 In tossing a coin, the sample space is S = {head, tail} and the sample points are head and tail. In rolling a fair die, the sample space is S = {1, 2, 3, 4, 5, 6} and the sample points are 1, 2, . . . , 6. In the experiment of rolling a die until a certain number appears, the sample space is S = {1, 2, . . .} and the sample points are 1, 2, . . . when the observation is the number of rolls. ♦

Example 2.2.6 In the experiment of choosing a real number between a and b randomly, the sample space is Ω = (a, b). ♦

The sample spaces in Example 2.2.5 are countable sets, which are often called discrete sample spaces or discrete spaces. The sample space Ω = (a, b) considered in Example 2.2.6 is an uncountable space and is called a continuous sample space or continuous space. A finite dimensional vector space from a discrete space is, again, a discrete space. On the other hand, it should be noted that an infinite dimensional vector space from a discrete space, which is called a sequence space, is a continuous space. A mixture of discrete and continuous spaces is called a mixed sample space or a hybrid sample space.

Let us generally denote by I the index set such as the set R of real numbers, the set

$$\mathbb{R}_0 = \{x : x \ge 0\} \tag{2.2.1}$$

of non-negative real numbers, the set

$$J_k = \{0, 1, \ldots, k - 1\} \tag{2.2.2}$$

of integers from 0 to k − 1, the set J+ = {1, 2, . . .} of natural numbers, and the set J = {. . . , −1, 0, 1, . . .} of integers. Then, the product space $\prod_{t \in I} \Omega_t$ of the sample spaces $\{\Omega_t, t \in I\}$ can be described as

$$\prod_{t \in I} \Omega_t = \{\text{all } \{a_t, t \in I\} : a_t \in \Omega_t\}, \tag{2.2.3}$$

which can also be written as $\Omega^{I}$ if it incurs no confusion or if $\Omega_t = \Omega$.

Definition 2.2.4 (discrete combined space) For combined random experiments on two discrete sample spaces Ω1 of size m and Ω2 of size n, the sample space Ω = Ω1 × Ω2 of size mn is called a discrete combined space.


Example 2.2.7 When a coin is tossed and a die is rolled at the same time, the sample space is the discrete combined space S = {(head, 1), (head, 2), . . . , (head, 6), (tail, 1), (tail, 2), . . . , (tail, 6)} of size 2 × 6 = 12. ♦

Example 2.2.8 When two coins are tossed once or a coin is tossed twice, the sample space is the discrete combined space S = {(head, head), (head, tail), (tail, head), (tail, tail)} of size 2 × 2 = 4. ♦

Example 2.2.9 Assume the sample space Ω = {0, 1} and let Ω1 = Ω2 = · · · = Ωk = Ω. Then, the space $\prod_{i=1}^{k} \Omega_i = \Omega^k = \{(a_1, a_2, \ldots, a_k) : a_i \in \{0, 1\}\}$ is the k-fold Cartesian space of Ω, a space of binary vectors of length k, and an example of a product space. ♦

Example 2.2.10 Assume the discrete space Ω = {a1, a2, . . . , am}. The space $\Omega^k$ = {all vectors b = (b1, b2, . . . , bk) : bi ∈ Ω} of k-dimensional vectors from Ω, i.e., the k-fold Cartesian space of Ω, is a discrete space as we have already observed indirectly in Examples 1.1.41 and 1.1.42. On the other hand, the sequence spaces $\Omega^{J}$ = {all infinite sequences {. . . , c−1, c0, c1, . . .} : ci ∈ Ω} and $\Omega^{J_+}$ = {all infinite sequences {d1, d2, . . .} : di ∈ Ω} are continuous spaces, although they are derived from a discrete space. ♦
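The finite product spaces in Examples 2.2.7 to 2.2.9 can be listed explicitly with the Cartesian product from the Python standard library. This is a small illustrative sketch; the variable names are arbitrary and k = 3 is chosen only to keep the output short.

```python
from itertools import product

coin = ["head", "tail"]
die = [1, 2, 3, 4, 5, 6]

# Example 2.2.7: one coin toss combined with one roll of a die
coin_and_die = list(product(coin, die))

# Example 2.2.8: two coin tosses
two_coins = list(product(coin, repeat=2))

# Example 2.2.9 with k = 3: binary vectors of length 3
binary_vectors = list(product([0, 1], repeat=3))

print(len(coin_and_die), coin_and_die[:3])
print(len(two_coins), two_coins)
print(len(binary_vectors), binary_vectors)
```

The sizes are the products of the sizes of the component spaces, in line with Definition 2.2.4. The sequence spaces of Example 2.2.10 cannot be enumerated this way because they are uncountable.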

2.2.2 Event Space Definition 2.2.5 (event space; event) A sigma algebra obtained from a sample space is called an event space, and an element of the event space is called an event. An event in probability theory is roughly another name for a set in set theory. Nonetheless, a non-measurable set discussed in Appendix 2.4 cannot be an event and not all measurable sets are events: again, only the element sets of an event space are events. Example 2.2.11 Consider the sample space S = {1, 2, 3} and the event space C = {{1}, {2, 3}, S, ∅}. Then, the subsets {1}, {2, 3}, S, and ∅ of S are events. However, the other subsets {2}, {3}, {1, 3}, and {1, 2} of S are not events. ♦ As we can easily observe in Example 2.2.11, every event is a subset of the sample space, but not all the subsets of a sample space are events. Definition 2.2.6 (elementary event) An event that is a singleton set is called an elementary event. Example 2.2.12 For a coin toss, the sample space is S = {head, tail}. If we assume the event space F = {S, ∅, {head}, {tail}}, then the sets S, {head}, {tail}, and ∅ are events, among which {head} and {tail} are elementary events. ♦


It should be noted that only a set can be an event. Therefore, no element of a sample space can be an event, and only a subset of a sample space can possibly be an event.

Example 2.2.13 For rolling a die with the sample space S = {1, 2, 3, 4, 5, 6}, the element 1, for example, of S can never be an event. The subset {1, 2, 3} may sometimes be an event: specifically, the subset {1, 2, 3} is and is not an event for the event spaces F = {{1, 2, 3}, {4, 5, 6}, S, ∅} and F = {{1, 2}, {3, 4, 5, 6}, S, ∅}, respectively. ♦

In addition, when a set is an event, the complement of the set is also an event even if it cannot happen.

Example 2.2.14 Consider an experiment of measuring the current through a circuit in the sample space R. If the set {a : a ≤ 1000 mA} is an event, then the complement {a : a > 1000 mA} will also be an event even if the current through the circuit cannot exceed 1000 mA. ♦

For a sample space, several event spaces may exist. For a sample space Ω, the collection {∅, Ω} is the smallest event space and the power set $2^{\Omega}$, the collection of all the subsets of Ω as described in Definition 1.1.13, is the largest event space.

Example 2.2.15 For a game of rock-paper-scissors with the sample space Ω = {rock, paper, scissors}, the smallest event space is $\mathcal{F}_S = \{\emptyset, \Omega\}$ and the largest event space is $\mathcal{F}_L = 2^{\Omega}$ = {∅, {rock}, {paper}, {scissors}, {rock, paper}, {paper, scissors}, {scissors, rock}, Ω}. ♦

Let us now briefly describe why we base the probability theory on the more restrictive σ-algebra, and not on algebra. As we have noted before, when the sample space is finite, we could also have based the probability theory on algebra because an algebra is basically the same as a σ-algebra. However, if the probability theory is based on algebra when the sample space is an infinite set, it becomes impossible to take some useful sets³ as events and to consider the limit of events, as we can see from the examples below.

Example 2.2.16 If the event space is defined not as a σ-algebra but as an algebra, then the limit is not an element of the event space for some sequences of events. For example, even when all finite intervals (a, b) are events, no singleton set is an event because a singleton set cannot be obtained from a finite number of set operations on intervals (a, b). In a more practical scenario, even if “The voltage measured is between a and b (V).” is an event, “The voltage measured is a (V).” would not be an event if the event space were defined as an algebra. ♦

³ Among those useful sets is the set $\Upsilon_T$ = {all periodic binary sequences} considered in Example 2.1.15.


As we have already mentioned in Sect. 2.1.2, when the sample space is finite, the limit is not so crucial and an algebra is a σ-algebra. Consequently, the probability theory could also be based on algebra. When the sample space is an infinite set and the event space is composed of an infinite number of sets, however, an algebra is not closed under an infinite number of set operations. In such a case, the result of an infinite number of operations on sets, i.e., the limit of a sequence, is not guaranteed to be an event. In short, the fact that a σ-algebra is closed under a countable number of set operations is the reason why we adopt the more restrictive σ-algebra as an event space, and not the more general algebra.

An event space is a collection of subsets of the sample space closed under a countable number of unions. We can show, for instance via de Morgan’s law, that an event space is closed also under a countable number of other set operations such as difference, complement, intersection, etc. It should be noted that an event space is closed for a countable number of set operations, but not for an uncountable number of set operations. For example, the set $\bigcup_{r=1}^{\infty} H_r$ is also an event when $\{H_r\}_{r=1}^{\infty}$ are all events, but the set

$$\bigcup_{r \in [0, 1]} B_r \tag{2.2.4}$$

may or may not be an event when $\{B_r : r \in [0, 1]\}$ are all events.

Let us now discuss in some detail the condition

$$\bigcup_{i=1}^{\infty} B_i \in \mathcal{F} \tag{2.2.5}$$

shown originally in (2.1.7), where $\{B_n\}_{n=1}^{\infty}$ are all elements of the event space F. This condition implies that the event space is closed under a countable number of union operations. Recollect that the limit $\lim_{n \to \infty} B_n$ of $\{B_n\}_{n=1}^{\infty}$ is defined as

$$\lim_{n \to \infty} B_n = \begin{cases} \bigcup_{i=1}^{\infty} B_i, & \{B_n\}_{n=1}^{\infty} \text{ is a non-decreasing sequence}, \\ \bigcap_{i=1}^{\infty} B_i, & \{B_n\}_{n=1}^{\infty} \text{ is a non-increasing sequence} \end{cases} \tag{2.2.6}$$

in (1.5.8) and (1.5.9). It is clear that a sequence $\{B_n\}_{n=1}^{\infty}$ of events in F will also satisfy

$$\lim_{n \to \infty} B_n \in \mathcal{F}, \quad \{B_n\}_{n=1}^{\infty} \text{ is a non-decreasing sequence} \tag{2.2.7}$$

when the sequence satisfies (2.2.5).

Example 2.2.17 Let a sequence $\{H_n\}_{n=1}^{\infty}$ of events in an event space F be non-decreasing. As we have seen in (1.5.8) and (2.2.6), the limit of $\{H_n\}_{n=1}^{\infty}$ can be expressed in terms of the countable union as $\lim_{n \to \infty} H_n = \bigcup_{n=1}^{\infty} H_n$. Because a countable number of unions of events results in an event, the limit $\lim_{n \to \infty} H_n$ is an event. For example, when $\left\{\left[1, 2 - \frac{1}{n}\right]\right\}_{n=1}^{\infty}$ are all events, the limit [1, 2) of this non-decreasing sequence of events is an event. In addition, when finite intervals of the form (a, b) are all events, the limit (−∞, b) of $\{(-n, b)\}_{n=1}^{\infty}$ will also be an event. Similarly, assume a non-increasing sequence $\{B_n\}_{n=1}^{\infty}$ of events in F. The limit of this sequence, $\lim_{n \to \infty} B_n = \bigcap_{n=1}^{\infty} B_n$ as shown in (1.5.9) or (2.2.6), will also be an event because it is a countable intersection of events. Therefore, if finite intervals (a, b) are all events, any singleton set {a}, the limit of $\left\{\left(a - \frac{1}{n}, a + \frac{1}{n}\right)\right\}_{n=1}^{\infty}$, will also be an event. ♦
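The closure under countable intersections used in Example 2.2.17 follows from closure under complements and countable unions via de Morgan’s law. A short sketch of the argument, written for a generic sequence $\{B_i\}_{i=1}^{\infty}$ of events in an event space $\mathcal{F}$:

```latex
\begin{align*}
B_i \in \mathcal{F} \ \text{for all } i
  &\;\Longrightarrow\; B_i^{c} \in \mathcal{F} \ \text{for all } i
     && \text{(closed under complement)} \\
  &\;\Longrightarrow\; \bigcup_{i=1}^{\infty} B_i^{c} \in \mathcal{F}
     && \text{(closed under countable union, (2.2.5))} \\
  &\;\Longrightarrow\; \bigcap_{i=1}^{\infty} B_i
     = \left( \bigcup_{i=1}^{\infty} B_i^{c} \right)^{c} \in \mathcal{F}
     && \text{(de Morgan's law, complement again)}
\end{align*}
```

Closure under set difference then follows in the same way from $A - B = A \cap B^{c}$.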

Example 2.2.18 Let us show the equivalence of (2.2.5) and (2.2.7). We have already observed that (2.2.7) holds true for a collection of events satisfying (2.2.5). Let us thus show that (2.2.5) holds true for a collection of events satisfying (2.2.7). Consider a sequence $\{G_i\}_{i=1}^{\infty}$ of events chosen arbitrarily in F and let $H_n = \bigcup_{i=1}^{n} G_i$. Each $H_n$ is an event because it is a finite union of events. Then, $\bigcup_{n=1}^{\infty} G_n = \bigcup_{n=1}^{\infty} H_n$ and $\{H_n\}_{n=1}^{\infty}$ is a non-decreasing sequence. In addition, because $\{H_n\}_{n=1}^{\infty}$ is a non-decreasing sequence, we have $\bigcup_{i=1}^{\infty} H_i = \lim_{n \to \infty} H_n$ from (2.2.6). Therefore, $\bigcup_{n=1}^{\infty} G_n = \bigcup_{n=1}^{\infty} H_n = \lim_{n \to \infty} H_n \in \mathcal{F}$. In other words, for any sequence $\{G_i\}_{i=1}^{\infty}$ of events in F satisfying (2.2.7), we have

$$\bigcup_{n=1}^{\infty} G_n \in \mathcal{F}. \tag{2.2.8}$$

In essence, the two conditions (2.2.5) and (2.2.7) are equivalent, which implies that (2.2.7), instead of (2.2.5), can be employed as one of the requirements for a collection to be an event space. Similarly, instead of (2.2.5),

$$\lim_{n \to \infty} B_n \in \mathcal{F}, \quad \{B_n\}_{n=1}^{\infty} \text{ is a non-increasing sequence} \tag{2.2.9}$$

can be employed as one of the requirements for a collection to be an event space. ♦

When the sample space is the space of real numbers, the representative of continuous spaces, we now consider a useful event space based on the notion of the smallest σ-algebra described in Definition 2.1.4.

Definition 2.2.7 (Borel σ-algebra) When the sample space is the real line R, the sigma algebra

$$\mathcal{B}(\mathbb{R}) = \sigma(\text{all open intervals}) \tag{2.2.10}$$

generated from all open intervals (a, b) in R is called the Borel algebra, Borel sigma field, or Borel field of R. The members of the Borel field, i.e., the sets obtained from a countable number of set operations on open intervals, are called Borel sets.


Example 2.2.19 It is possible to see that singleton sets $\{x\} = \bigcap_{n=1}^{\infty} \left(x - \frac{1}{n}, x + \frac{1}{n}\right)$, half-open intervals [x, y) = (x, y) ∪ {x} and (x, y] = (x, y) ∪ {y}, and closed intervals [x, y] = (x, y) ∪ {x} ∪ {y} are all Borel sets after some set operations. In addition, half-open intervals $[x, +\infty) = (-\infty, x)^c$ and (−∞, x] = (−∞, x) ∪ {x}, and open intervals $(x, \infty) = (-\infty, x]^c$ are also Borel sets. ♦

The Borel σ-algebra B(R) is the most useful and widely-used σ-algebra on the set of real numbers, and contains all finite and infinite open, closed, and half-open intervals, singleton sets, and the results from set operations on these sets. On the other hand, the Borel σ-algebra B(R) is different from the collection of all subsets of R. In other words, there exist some subsets of real numbers which are not contained in the Borel σ-algebra. One such example is the Vitali set discussed in Appendix 2.4.

When the sample space is the real line R, we choose the Borel σ-algebra B(R) as our event space. At the same time, when a subset Ω of real numbers is the sample space, the Borel σ-algebra

$$\mathcal{B}(\Omega) = \{G : G = H \cap \Omega, \ H \in \mathcal{B}(\mathbb{R})\} \tag{2.2.11}$$

of Ω is assumed as the event space. Note that when the sample space is a discrete subset A of the set of real numbers, Borel σ-algebra B(A) of A is the same as the power set $2^A$ of A.
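The last remark can be justified in two steps. The sketch below assumes that the discrete subset A is countable, as discrete spaces are in this chapter, and uses only (2.2.11) and the fact from Example 2.2.19 that singleton sets are Borel sets:

```latex
\begin{align*}
&\text{For each } x \in A: \quad \{x\} = \{x\} \cap A \ \text{with} \ \{x\} \in \mathcal{B}(\mathbb{R})
  \;\Longrightarrow\; \{x\} \in \mathcal{B}(A), \\
&\text{For each } G \subseteq A: \quad G = \bigcup_{x \in G} \{x\}
  \ \text{is a countable union of elements of } \mathcal{B}(A)
  \;\Longrightarrow\; G \in \mathcal{B}(A).
\end{align*}
```

Hence $2^{A} \subseteq \mathcal{B}(A)$, and the reverse inclusion holds because every element of $\mathcal{B}(A)$ is a subset of A.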

2.2.3 Probability Measure

We now consider the notion of probability measure, the third element of a probability space.

Definition 2.2.8 (measurable space) The pair (Ω, F) of a sample space Ω and an event space F is called a measurable space.

Let us again mention that when the sample space S is countable or discrete, we usually assume the power set of the sample space as the event space: in other words, the measurable space is $\left(S, 2^S\right)$. When the sample space S is uncountable or continuous, we assume the event space described by (2.2.11): in other words, the measurable space is (S, B(S)).

Definition 2.2.9 (probability measure) On a measurable space (Ω, F), a set function P assigning a real number $P(B_i)$ to each set $B_i \in \mathcal{F}$ under the constraint of the following four axioms is called a probability measure or simply probability:

Axiom 1.

$$P(B_i) \ge 0. \tag{2.2.12}$$


Axiom 2.

$$P(\Omega) = 1. \tag{2.2.13}$$

Axiom 3. When a finite number of events $\{B_i\}_{i=1}^{n}$ are mutually exclusive,

$$P\left(\bigcup_{i=1}^{n} B_i\right) = \sum_{i=1}^{n} P(B_i). \tag{2.2.14}$$

Axiom 4. When a countable number of events $\{B_i\}_{i=1}^{\infty}$ are mutually exclusive,

$$P\left(\bigcup_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} P(B_i). \tag{2.2.15}$$

The probability measure P is a set function assigning a value P(G), called probability and also denoted by P{G} and Pr{G}, to an event G. A probability measure is also called a probability function, probability distribution, or distribution.

Axioms 1–4 are also intuitively appealing. The first axiom that a probability should be not smaller than 0 is in some sense chosen arbitrarily like other measures such as area, volume, and weight. The second axiom is a mathematical expression that something happens from an experiment or some outcome will result from an experiment. The third axiom is called additivity or finite additivity, and implies that the probability of the union of events with no common element is the sum of the probability of each event, which is similar to the case of adding areas of nonoverlapping regions.

Axiom 4 is called the countable additivity, which is an asymptotic generalization of Axiom 3 into the limit. This axiom is the key that differentiates the modern probability theory developed by Kolmogorov from the elementary probability theory. When evaluating the probability of an event which can be expressed, for example, only by the limit of events, Axiom 4 is crucial: such an asymptotic procedure is similar to obtaining the integral as the limit of a series.

It should be noted that (2.2.14) does not guarantee (2.2.15). In some cases, (2.2.14) is combined into (2.2.15) by viewing Axiom 3 as a special case of Axiom 4.

If our definition of probability is based on the space of an algebra, then we may not be able to describe, for example, some probability resulting from a countably infinite number of set operations. To guarantee the existence of the probability in such a case as well, we need sigma algebra which guarantees the result of a countably infinite number of set operations to exist within our space.

From the axioms of probability, we can obtain

$$P(\emptyset) = 0, \tag{2.2.16}$$

$$P\left(B^c\right) = 1 - P(B), \tag{2.2.17}$$


and

$$P(B) \le 1, \tag{2.2.18}$$

where B is an event. In addition, based on the axioms of probability and (2.2.16), the only probability measure is P(Ω) = 1 and P(∅) = 0 in the measurable space (Ω, F) with event space F = {Ω, ∅}.

Example 2.2.20 In a fair⁴ coin toss, consider the sample space Ω = {head, tail} and event space $\mathcal{F} = 2^{\Omega}$. Then, we have⁵ $P(\text{head}) = P(\text{tail}) = \frac{1}{2}$, P(Ω) = 1, and P(∅) = 0. ♦

Example 2.2.21 Consider the sample space Ω = {0, 1} and event space F = {{0}, {1}, Ω, ∅}. Then, an example of the probability measure on this measurable space (Ω, F) is $P(0) = \frac{3}{10}$, $P(1) = \frac{7}{10}$, P(∅) = 0, and P(Ω) = 1. ♦

In a discrete sample space, the power set of the sample space is assumed to be the event space and the event space may contain all the subsets of the discrete sample space. On the other hand, if we choose the event space containing all the possible subsets of a continuous sample space, not only will we be confronted with contradictions⁶ but the event space will be too large to be useful. Therefore, when dealing with continuous sample spaces, we choose an appropriate event space such as the Borel sigma algebra mentioned in Definition 2.2.7 and assign a probability only to those events. In addition, in discrete sample spaces, assigning probability to a singleton set is meaningful whereas it is not useful in continuous sample spaces.

Example 2.2.22 Consider a random experiment of choosing a real number between 0 and 1 randomly. If we assign the probability to a number, then the probability of choosing any number will be zero because there are uncountably many real numbers between 0 and 1. However, it is not possible to obtain the probability of, for instance, ‘the number chosen is between 0.1 and 0.5’ when the probability of choosing a number is 0. In essence, in order to have a useful model in continuous sample space, we should assign probabilities to interval sets, not to singleton sets. ♦

Definition 2.2.10 (probability space) The triplet (Ω, F, P) of a sample space Ω, an event space F of Ω, and a probability measure P is called a probability space.

It is clear from Definition 2.2.10 that the event space F can be chosen as an algebra instead of a σ-algebra when the sample space S is finite, because an algebra is the same as a σ-algebra in finite sample spaces.

⁴ Here, fair means ‘head and tail are equally likely to occur’.

⁵ Because the probability measure P is a set function, P({k}) and P({head}), for instance, are the exact expressions. Nonetheless, the expressions P(k), P{k}, P(head), and P{head} are also used.

⁶ The Vitali set $V_0$ discussed in Definition 2.A.12 is a subset in the space of real numbers. Denote the rational numbers in (−1, 1) by $\{\alpha_i\}_{i=1}^{\infty}$ and assume the translation operation $T_t(x) = x + t$. Then, the events $\left\{T_{\alpha_i} V_0\right\}_{i=1}^{\infty}$ will produce a contradiction.
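For a finite sample space with the power set as the event space, a probability measure is fixed by the probabilities of the sample points, and the axioms can be checked exhaustively. The sketch below does this for the measure of Example 2.2.21; the helper names `power_set` and `make_measure` are hypothetical, and exact fractions are used so that the equalities are not disturbed by rounding.

```python
from fractions import Fraction
from itertools import combinations


def power_set(omega):
    """Return all subsets of a finite sample space as frozensets."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make_measure(point_mass):
    """Build P with P(event) = sum of the point probabilities in the event."""
    def P(event):
        return sum((point_mass[w] for w in event), Fraction(0))
    return P


# Example 2.2.21: P(0) = 3/10 and P(1) = 7/10 on the sample space {0, 1}
point_mass = {0: Fraction(3, 10), 1: Fraction(7, 10)}
omega = frozenset(point_mass)
events = power_set(omega)
P = make_measure(point_mass)

axiom1 = all(P(b) >= 0 for b in events)                      # (2.2.12)
axiom2 = P(omega) == 1                                       # (2.2.13)
axiom3 = all(P(a | b) == P(a) + P(b)                         # (2.2.14), n = 2
             for a in events for b in events if not (a & b))
empty_rule = P(frozenset()) == 0                             # (2.2.16)
complement_rule = all(P(omega - b) == 1 - P(b) for b in events)  # (2.2.17)

print(axiom1, axiom2, axiom3, empty_rule, complement_rule)
```

Because the sample space is finite, Axiom 4 adds nothing beyond Axiom 3 here; the distinction matters only for infinite sample spaces.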


2.3 Probability

We now discuss the properties of probability and alternative definitions of probability (Gut 1995; Mills 2001).

2.3.1 Properties of Probability

Assume a probability space (Ω, F, P) and events $\{B_i \in \mathcal{F}\}_{i=1}^{\infty}$. Then, based on (2.2.12)–(2.2.15), we can obtain the following properties:

Property 1. We have

$$P(B_i) \le P\left(B_j\right) \ \text{if} \ B_i \subseteq B_j \tag{2.3.1}$$

and

$$P\left(\bigcup_{i=1}^{\infty} B_i\right) \le \sum_{i=1}^{\infty} P(B_i). \tag{2.3.2}$$

Property 2. If $\{B_i\}_{i=1}^{\infty}$ is a countable partition of the sample space Ω, then

$$P(G) = \sum_{i=1}^{\infty} P(G \cap B_i) \tag{2.3.3}$$

for G ∈ F.

Property 3. Denoting the sum over $\binom{n}{r} = \frac{n!}{r!(n-r)!}$ ways of choosing r events from $\{B_i\}_{i=1}^{n}$ by $\sum_{i_1 < i_2 < \cdots < i_r} P\left(B_{i_1} B_{i_2} \cdots B_{i_r}\right)$, we have

[. . .]

. . . $\frac{1}{2}$, $y < x + \frac{1}{2}$, $x < \frac{1}{2}$, and $A_2 = \frac{1}{8}$ is the area of the region 0 < x < 1, 0 < y < 1, x > y, $x > \frac{1}{2}$, $x < y + \frac{1}{2}$, $y < \frac{1}{2}$. A similar problem will be discussed in Example 3.4.2. ♦


Fig. 2.4 Bertrand’s paradox. Solution 1

Example 2.3.8 Assume a collection of n integrated circuits (ICs), among which m are defective ones. When an IC is chosen randomly from the collection, obtain the probability α1 that the IC is defective.

Solution The number of ways to choose one IC among n ICs is ${}_n C_1$ and to choose one among m defective ICs is ${}_m C_1$. Thus, $\alpha_1 = \frac{{}_m C_1}{{}_n C_1} = \frac{m}{n}$. ♦

In obtaining the probability with the classical definition, it is important to enumerate the number of desired outcomes correctly, which is usually a problem of combinatorics. Another important consideration in the classical definition is the condition of equally likely outcomes, which will become more evident in Example 2.3.9.

Example 2.3.9 When a fair¹¹ die is rolled twice, obtain the probability $P_7$ that the sum of the two faces is 7.

Solution (Solution 1) There are 11 possible cases {2, 3, . . . , 12} for the sum. Therefore, one might say that $P_7 = \frac{1}{11}$.

(Solution 2) If we write the outcomes of the two rolls as {x, y}, x ≥ y ignoring the order, then we have 21 possible cases {1, 1}, {2, 1}, {2, 2}, . . ., {6, 6}. Among the 21 cases, the three possibilities {4, 3}, {5, 2}, {6, 1} yield a sum of 7. Based on this observation, $P_7 = \frac{3}{21} = \frac{1}{7}$.

(Solution 3) There are 36 equally likely outcomes (x, y) with x, y = 1, 2, . . . , 6, among which the six outcomes (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1) yield a sum of 7. Therefore, $P_7 = \frac{6}{36} = \frac{1}{6}$.

Among the three solutions, neither the first nor the second is correct because the cases in these two solutions are not equally likely. For instance, in the first solution, the sums 2 and 3 are not equally likely. Similarly, {1, 1} and {2, 1} are not equally likely in the second solution. In short, Solution 3 is the correct solution. ♦

¹¹ Here, ‘fair’ means ‘1, 2, 3, 4, 5, and 6 are equally likely to occur’.

In addition, especially when the number of possible outcomes is infinite, the procedure, observation, and model, mentioned in Example 2.2.4, of an experiment should be clearly depicted in advance.

Example 2.3.10 On a circle C of radius r, we draw a chord AB randomly. Obtain the probability $P_B$ that the length l of the chord is no shorter than $\sqrt{3}\,r$. This problem is called Bertrand’s paradox.

Fig. 2.5 Bertrand’s paradox. Solution 2

Fig. 2.6 Bertrand’s paradox. Solution 3

Solution (Solution 1) Assume that the center point $M$ of the chord is chosen randomly. As shown in Fig. 2.4, $l \ge \sqrt{3}r$ is satisfied if the center point $M$ is located in or on the circle $C_1$ of radius $\frac{r}{2}$ with center the same as that of $C$. Thus, $P_B = \frac{1}{4}\pi r^2 \left(\pi r^2\right)^{-1} = \frac{1}{4}$.

(Solution 2) Assume that the point $B$ is selected randomly on the circle $C$ with the point $A$ fixed. As shown in Fig. 2.5, $l \ge \sqrt{3}r$ is satisfied when the point $B$ is on the shorter arc $DE$, where $D$ and $E$ are the two points $\frac{2}{3}\pi r$ apart from $A$ along $C$ in two directions. Therefore, we have $P_B = \frac{2}{3}\pi r (2\pi r)^{-1} = \frac{1}{3}$ because the length of the shorter arc $DE$ is $\frac{2}{3}\pi r$.

(Solution 3) Assume that the chord $AB$ is drawn orthogonal to a diameter $FK$ of $C$. As shown in Fig. 2.6, we then have $l \ge \sqrt{3}r$ if the center point $M$ is located between the two points $H$ and $G$, located $\frac{r}{2}$ apart from $K$ toward $F$ and from $F$ toward $K$, respectively. Therefore, $P_B = \frac{r}{2r} = \frac{1}{2}$.

This example illustrates that the experiment should be described clearly to obtain the probability appropriately with the classical definition. ♦
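The three answers of Example 2.3.10 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original text: the function name `bertrand_estimates`, the number of trials, and the seed are illustrative assumptions, and each model is simulated exactly as the corresponding solution describes the experiment.

```python
import math
import random


def bertrand_estimates(trials=100_000, r=1.0, seed=0):
    """Estimate P(l >= sqrt(3) r) under the three chord-drawing models."""
    rng = random.Random(seed)
    threshold = math.sqrt(3.0) * r
    hits = [0, 0, 0]
    for _ in range(trials):
        # Model 1: the midpoint M is uniform over the disc (rejection sampling).
        while True:
            x, y = rng.uniform(-r, r), rng.uniform(-r, r)
            if x * x + y * y <= r * r:
                break
        len1 = 2.0 * math.sqrt(max(0.0, r * r - (x * x + y * y)))
        # Model 2: A is fixed and B is uniform on the circle; the chord
        # subtends a central angle theta, so l = 2 r |sin(theta / 2)|.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        len2 = 2.0 * r * abs(math.sin(theta / 2.0))
        # Model 3: the chord is orthogonal to a fixed diameter and its
        # midpoint is uniform along that diameter.
        d = rng.uniform(-r, r)
        len3 = 2.0 * math.sqrt(max(0.0, r * r - d * d))
        for i, length in enumerate((len1, len2, len3)):
            if length >= threshold:
                hits[i] += 1
    return tuple(h / trials for h in hits)
```

Under these assumptions the three estimates should settle near $\frac{1}{4}$, $\frac{1}{3}$, and $\frac{1}{2}$, respectively, which makes the dependence on the description of the experiment concrete.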

2.3.2.2 Definition Via Relative Frequency

Probability can also be defined in terms of the relative frequency of desired outcomes in a number of repetitions of a random experiment. Specifically, the relative frequency of a desired event $A$ can be defined as

$$q_n(A) = \frac{n_A}{n}, \tag{2.3.23}$$

where $n_A$ is the number of occurrences of $A$ and $n$ is the number of trials. In many cases, the relative frequency $q_n(A)$ converges to a value as $n$ becomes larger, and the probability $P(A)$ of $A$ can be defined as the limit

$$P(A) = \lim_{n \to \infty} q_n(A) \tag{2.3.24}$$

of the relative frequency. One drawback of this definition is that the limit shown in (2.3.24) can be obtained only by an approximation in practice. Example 2.3.11 The probability that a person of a certain age will survive for a year will differ from year to year, and thus it is difficult to use the classical definition of probability. As an alternative in such a case, we assume that the tendency in the future will be the same as that so far, and then compute the probability as the relative frequency based on the records over a long period for the same age. This method is often employed in determining an insurance premium. ♦
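As a small illustration of (2.3.23) and of the approximation issue noted for (2.3.24), the sketch below simulates rolls of a fair die and returns the relative frequency of an event. The function name `relative_frequency`, the choice of a die, and the seed are assumptions made only for this illustration.

```python
import random


def relative_frequency(event, n, seed=0):
    """Return q_n(A) = n_A / n for n simulated rolls of a fair die."""
    rng = random.Random(seed)
    # n_A counts the trials whose outcome belongs to the event A.
    n_a = sum(1 for _ in range(n) if rng.randint(1, 6) in event)
    return n_a / n
```

For the event $\{2, 4, 6\}$, the value returned for a large $n$ is close to, but in general not exactly equal to, $\frac{1}{2}$: any finite number of trials gives only an approximation of the limit.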

2.4 Conditional Probability

Conditional probability (Helstrom 1991; Shiryaev 1996; Sveshnikov 1968; Weirich 1983) is one of the most important notions and powerful tools in probability theory. It is the probability of an event when the occurrence of another event is assumed. Conditional probability is quite useful especially when only partial information is available and, even when we can obtain some probability directly, we can often obtain the same result more easily by using conditioning in many situations.

Definition 2.4.1 (conditional probability) The probability of an event under the assumption that another event has occurred is called conditional probability. Specifically, the conditional probability, denoted by $P(A|B)$, of event $A$ under the assumption that event $B$ has occurred is defined as

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \tag{2.4.1}$$

when $P(B) > 0$.

In other words, the conditional probability $P(A|B)$ of event $A$ under the assumption that event $B$ has occurred is the probability of $A$ with the sample space $\Omega$ replaced by the conditioning event $B$. Often, the event $A$ conditioned on $B$ is denoted by $A|B$. From (2.4.1), we easily get

$$P(A|B) = \frac{P(A)}{P(B)} \tag{2.4.2}$$

when $A \subseteq B$ and

$$P(A|B) = 1 \tag{2.4.3}$$

when $B \subseteq A$.

Example 2.4.1 When the conditioning event $B$ is the sample space $\Omega$, we have $P(A|\Omega) = P(A)$ because $A \cap B = A \cap \Omega = A$ and $P(B) = P(\Omega) = 1$. ♦

Example 2.4.2 Consider the rolling of a fair die. Assume that we know the outcome is an even number. Obtain the probability that the outcome is 2.

Solution Let $A = \{2\}$ and $B = \{2, 4, 6\}$. Then, because $A \cap B = A$, $P(A \cap B) = P(A) = \frac{1}{6}$, and $P(B) = \frac{1}{2} \neq 0$, we have $P(A|B) = \frac{1}{6}\left(\frac{1}{2}\right)^{-1} = \frac{1}{3}$. Again, $P(A|B)$ is the probability of $A = \{2\}$ when $B = \{\text{an even number}\} = \{2, 4, 6\}$ is assumed as the sample space. ♦

Example 2.4.3 The probability for any child to be a girl is $\alpha$ and is not influenced by other children. Assume that Dr. Kim has two children. Obtain the probabilities in the following two separate cases: (1) the probability $p_1$ that the younger child is a daughter when Dr. Kim says “The elder one is a daughter”, and (2) the probability $p_2$ that the other child is a daughter when Dr. Kim says “One of my children is a daughter”.

Solution We have $p_1 = P(D_2|D_1) = \frac{P(D_1 \cap D_2)}{P(D_1)} = \frac{\alpha^2}{\alpha} = \alpha$, where $D_1$ and $D_2$ denote the events of the first and second child being a daughter, respectively. Similarly, because $P(C_A) = P(D_1 \cap D_2) + P(D_1 \cap B_2) + P(B_1 \cap D_2) = 2\alpha - \alpha^2$, we get $p_2 = P(C_B|C_A) = \frac{P(C_A \cap C_B)}{P(C_A)} = \frac{\alpha^2}{2\alpha - \alpha^2} = \frac{\alpha}{2 - \alpha}$, where $B_1$ and $B_2$ denote the events of the first and second child being a boy, respectively, and $C_A$ and $C_B$ denote the events of one and the other child being a daughter, respectively. ♦
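The two answers of Example 2.4.3 can be reproduced by enumerating the four (elder, younger) combinations and applying (2.4.1) directly. This is only a sketch: the function name `two_children` and the labels `"D"` (daughter) and `"B"` (boy) are hypothetical choices for the illustration.

```python
from fractions import Fraction
from itertools import product


def two_children(alpha):
    """Return (p1, p2) of Example 2.4.3 for girl probability alpha."""
    # Probability of every (elder, younger) pair; children do not
    # influence each other, so the pair probabilities are products.
    pair_prob = {}
    for elder, younger in product("DB", repeat=2):
        p_elder = alpha if elder == "D" else 1 - alpha
        p_younger = alpha if younger == "D" else 1 - alpha
        pair_prob[(elder, younger)] = p_elder * p_younger
    p_both = pair_prob[("D", "D")]
    # Conditioning event of case (1): the elder child is a daughter.
    p_elder_daughter = sum(p for (e, _), p in pair_prob.items() if e == "D")
    # Conditioning event of case (2): at least one child is a daughter.
    p_some_daughter = sum(p for pair, p in pair_prob.items() if "D" in pair)
    return p_both / p_elder_daughter, p_both / p_some_daughter
```

With `alpha = Fraction(1, 2)` the function returns the pair $\left(\frac{1}{2}, \frac{1}{3}\right)$, in agreement with $p_1 = \alpha$ and $p_2 = \frac{\alpha}{2 - \alpha}$.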

2.4.1 Total Probability and Bayes’ Theorems

Theorem 2.4.1 When $P(B_i | B_1 B_2 \cdots B_{i-1}) > 0$ for $i = 2, 3, \ldots, n$, the probability of the intersection $\bigcap_{i=1}^{n} B_i = B_1 B_2 \cdots B_n$ of $\{B_i\}_{i=1}^{n}$ can be written as

$$P\left(\bigcap_{i=1}^{n} B_i\right) = P(B_1) P(B_2 | B_1) \cdots P(B_n | B_1 B_2 \cdots B_{n-1}), \tag{2.4.4}$$

which is called the multiplication theorem. Similarly, the probability of the union $\bigcup_{i=1}^{n} B_i$ can be expressed as

$$P\left(\bigcup_{i=1}^{n} B_i\right) = P(B_1) + P\left(B_1^c B_2\right) + \cdots + P\left(B_1^c B_2^c \cdots B_{n-1}^c B_n\right), \tag{2.4.5}$$

which is called the addition theorem.

Note that (2.4.5) is the same as (2.3.4). Next, (2.4.4) can be written as

$$P(B_1 \cap B_2) = P(B_1) P(B_2 | B_1) = P(B_2) P(B_1 | B_2) \tag{2.4.6}$$

when $n = 2$. Now, from $1 = P(\Omega | B_2) = P\left(B_1^c \cup B_1 | B_2\right) = P\left(B_1^c | B_2\right) + P(B_1 | B_2)$, we have $P\left(B_1^c | B_2\right) = 1 - P(B_1 | B_2)$. Using this result and (2.4.6), (2.4.5) for $n = 2$ can be written as $P(B_1 \cup B_2) = P(B_1) + P\left(B_1^c B_2\right) = P(B_1) + P(B_2) P\left(B_1^c | B_2\right) = P(B_1) + P(B_2)\{1 - P(B_1 | B_2)\}$, i.e.,

$$P(B_1 \cup B_2) = P(B_1) + P(B_2) - P(B_2) P(B_1 | B_2) = P(B_1) + P(B_2) - P(B_1 \cap B_2), \tag{2.4.7}$$

which is the same as (2.3.5).

Example 2.4.4 Assume a box with three red and two green balls. We randomly choose one ball and then another without replacement. Find the probability $P_{RG}$ that the first ball is red and the second ball is green.

Solution (Method 1) The number of all possible ways of choosing two balls is ${}_5\mathrm{P}_2$, among which that of the desired outcome is ${}_3\mathrm{P}_1 \times {}_2\mathrm{P}_1$. Therefore, $P_{RG} = \frac{{}_3\mathrm{P}_1 \times {}_2\mathrm{P}_1}{{}_5\mathrm{P}_2} = \frac{3}{10}$.

(Method 2) Let $A = \{\text{the first ball is red}\}$ and $B = \{\text{the second ball is green}\}$. Then, we have $P(A) = \frac{3}{5}$. In addition, $P(B|A)$ is the conditional probability that the second ball is green when the first ball is red, and is thus the probability of choosing a green ball among the remaining two red and two green balls after a red ball has been chosen: in other words, $P(B|A) = \frac{2}{4}$. Consequently, $P(AB) = P(B|A)P(A) = \frac{3}{10}$. ♦

Consider two events $A$ and $B$. The event $A$ can be expressed as

$$A = AB \cup AB^c, \tag{2.4.8}$$

based on which we have $P(A) = P(AB) + P(AB^c) = P(A|B)P(B) + P(A|B^c)P(B^c)$, i.e.,

$$P(A) = P(A|B)P(B) + P\left(A|B^c\right)\{1 - P(B)\} \tag{2.4.9}$$

from (2.2.14) because $AB$ and $AB^c$ are mutually exclusive. The result (2.4.9) shows that the probability of $A$ is the weighted sum of the conditional probabilities of $A$ when $B$ and $B^c$ are assumed, with the weights being the probabilities of the conditioning events $B$ and $B^c$, respectively.


The result (2.4.9) is quite useful when a direct calculation of the probability of an event is not straightforward.

Example 2.4.5 In Box 1, we have two white and four black balls. Box 2 contains one white and one black ball. We randomly take one ball from Box 1, put it into Box 2, and then randomly take one ball from Box 2. Find the probability $P_W$ that the ball taken from Box 2 is white.

Solution Let the events of a white ball from Box 1 and a white ball from Box 2 be $C$ and $D$, respectively. Then, because $P(C) = \frac{1}{3}$, $P(D|C) = \frac{2}{3}$, $P(D|C^c) = \frac{1}{3}$, and $P(C^c) = 1 - P(C) = \frac{2}{3}$, we get $P_W = P(D) = P(D|C)P(C) + P(D|C^c)P(C^c) = \frac{4}{9}$. ♦

Example 2.4.6 The two numbers of the upward faces are added after rolling a pair of fair dice. Obtain the probability $\alpha$ that 5 appears before 7 when we continue the rolling until the outcome is 5 or 7.

Solution Let $A_n$ and $B_n$ be the events that the outcome is neither 5 nor 7 and that the outcome is 5, respectively, at the $n$-th rolling for $n = 1, 2, \ldots$. Then, $P(A_n) = 1 - P(5) - P(7) = \frac{13}{18}$ from $P(B_n) = P(5) = \frac{4}{36} = \frac{1}{9}$ and $P(7) = \frac{6}{36} = \frac{1}{6}$. Now, $\alpha = P(B_1 \cup (A_1 B_2) \cup (A_1 A_2 B_3) \cup \cdots) = \sum_{n=1}^{\infty} P(A_1 A_2 \cdots A_{n-1} B_n)$ from (2.4.5) because $\{A_1 A_2 \cdots A_{n-1} B_n\}_{n=1}^{\infty}$ are mutually exclusive, where we assume $A_0 B_1 = B_1$. Here,

$$P(A_1 A_2 \cdots A_{n-1} B_n) = P(B_n | A_1 A_2 \cdots A_{n-1}) P(A_1 A_2 \cdots A_{n-1}) = \frac{1}{9}\left(\frac{13}{18}\right)^{n-1}. \tag{2.4.10}$$

Therefore, $\alpha = \frac{1}{9}\sum_{n=1}^{\infty}\left(\frac{13}{18}\right)^{n-1} = \frac{2}{5}$. ♦

Example 2.4.7 Example 2.4.6 can be viewed in a more intuitive way as follows: Consider two mutually exclusive events $A$ and $B$ from a random experiment and assume the experiments are repeated. Then, the probability that $A$ occurs before $B$ can be obtained as

$$\frac{\text{probability of } A}{\text{probability of } A \text{ or } B} = \frac{P(A)}{P(A) + P(B)}. \tag{2.4.11}$$

Solving Example 2.4.6 based on (2.4.11), we get $P(5 \text{ appears before } 7) = \frac{1}{9}\left(\frac{1}{9} + \frac{1}{6}\right)^{-1} = \frac{2}{5}$. ♦
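The agreement between the series of Example 2.4.6 and the shortcut (2.4.11) can be verified with exact rational arithmetic. In the sketch below, the helper `sum_probability` and the truncation of the series at 199 terms are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def sum_probability(target):
    """P(sum of two fair dice equals target), by enumerating 36 outcomes."""
    outcomes = list(product(range(1, 7), repeat=2))
    favorable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favorable, len(outcomes))


p5 = sum_probability(5)        # P(B_n)
p7 = sum_probability(7)
p_neither = 1 - p5 - p7        # P(A_n)

# Truncated version of the series obtained from (2.4.10).
partial_sum = sum(p5 * p_neither ** (n - 1) for n in range(1, 200))

# Closed form (2.4.11).
closed_form = p5 / (p5 + p7)
```

Here `closed_form` is exactly $\frac{2}{5}$, and `partial_sum` differs from it only by the neglected tail of the geometric series.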



Let us now generalize the number of conditioning events in (2.4.9). Assume a collection $\{B_j\}_{j=1}^{n}$ of mutually exclusive events and let $A \subseteq \bigcup_{j=1}^{n} B_j$. Then, $P(A) = P\left(\bigcup_{j=1}^{n} (A \cap B_j)\right)$ can be expressed as

$$P(A) = \sum_{j=1}^{n} P(AB_j) \tag{2.4.12}$$

because $A = A \cap \left(\bigcup_{j=1}^{n} B_j\right) = \bigcup_{j=1}^{n} (A \cap B_j)$ and $\{A \cap B_j\}_{j=1}^{n}$ are all mutually exclusive. Now, recollecting that $P(AB_j) = P(A | B_j) P(B_j)$, we get the following theorem called the total probability theorem:

Theorem 2.4.2 We have

$$P(A) = \sum_{j=1}^{n} P(A | B_j) P(B_j) \tag{2.4.13}$$

when $\{B_j\}_{j=1}^{n}$ is a collection of disjoint events and $A \subseteq \bigcup_{j=1}^{n} B_j$.

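Example 2.4.8 Let $A = \{1, 2, 3\}$ in the experiment of rolling a fair die. When $B_1 = \{1, 2\}$ and $B_2 = \{3, 4, 5, 6\}$, we have $P(A) = \frac{1}{2} = 1 \times \frac{1}{3} + \frac{1}{4} \times \frac{2}{3}$ because $P(B_1) = \frac{1}{3}$, $P(B_2) = \frac{2}{3}$, $P(A | B_1) = \frac{P(AB_1)}{P(B_1)} = \frac{1}{3} \times \left(\frac{1}{3}\right)^{-1} = 1$, and $P(A | B_2) = \frac{P(AB_2)}{P(B_2)} = \frac{1}{6} \times \left(\frac{2}{3}\right)^{-1} = \frac{1}{4}$. Similarly, when $B_1 = \{1\}$ and $B_2 = \{2, 3, 5\}$, we get $P(A) = \frac{1}{2} = 1 \times \frac{1}{6} + \frac{2}{3} \times \frac{1}{2}$ from $P(B_1) = \frac{1}{6}$, $P(B_2) = \frac{1}{2}$, $P(A | B_1) = \frac{P(AB_1)}{P(B_1)} = 1$, and $P(A | B_2) = \frac{P(AB_2)}{P(B_2)} = \frac{2}{3}$. ♦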
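Both decompositions of Example 2.4.8 follow from (2.4.13) and can be checked by direct enumeration on the fair die. The helper names `prob`, `cond`, and `total_probability` below are hypothetical and serve only this sketch.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # sample space of a fair die


def prob(event):
    """Classical probability of an event on the fair die."""
    return Fraction(len(OMEGA & set(event)), len(OMEGA))


def cond(a, b):
    """Conditional probability P(A|B) as defined in (2.4.1)."""
    return prob(set(a) & set(b)) / prob(b)


def total_probability(a, blocks):
    """Right-hand side of (2.4.13) for disjoint blocks covering A."""
    return sum(cond(a, b) * prob(b) for b in blocks)
```

Calling `total_probability({1, 2, 3}, [{1, 2}, {3, 4, 5, 6}])` and `total_probability({1, 2, 3}, [{1}, {2, 3, 5}])` both give $\frac{1}{2}$; in the second call the blocks do not cover the whole sample space, but they still cover $A$, which is all the theorem requires.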

Example 2.4.9 Assume a group comprising 60% women and 40% men. Among the women, 45% play violin, and 25% of the men play violin. A person chosen randomly from the group plays violin. Find the probability that the person is a man.

Solution Denote the events of a person being a man and a woman by $M$ and $W$, respectively, and playing violin by $V$. Then, using (2.4.1) and (2.4.13), we get $P(M|V) = \frac{P(MV)}{P(V)} = \frac{P(V|M)P(M)}{P(V|M)P(M) + P(V|W)P(W)} = \frac{10}{37}$ because $M^c = W$, $P(W) = 0.6$, $P(M) = 0.4$, $P(V|W) = 0.45$, and $P(V|M) = 0.25$. ♦

Consider a collection $\{B_i\}_{i=1}^{n}$ of events and an event $A$ with $P(A) > 0$. Then, we have

$$P(B_k | A) = \frac{P(A | B_k) P(B_k)}{P(A)} \tag{2.4.14}$$

because $P(B_k | A) = \frac{P(B_k A)}{P(A)}$ and $P(B_k A) = P(A | B_k) P(B_k)$ from the definition of conditional probability. Now, combining the results (2.4.13) and (2.4.14) when the events $\{B_i\}_{i=1}^{n}$ are all mutually exclusive and $A \subseteq \bigcup_{i=1}^{n} B_i$, we get the following result called the Bayes’ theorem:

Theorem 2.4.3 We have

$$P(B_k | A) = \frac{P(A | B_k) P(B_k)}{\sum_{j=1}^{n} P(A | B_j) P(B_j)} \tag{2.4.15}$$

when $\{B_j\}_{j=1}^{n}$ is a collection of disjoint events, $A \subseteq \bigcup_{j=1}^{n} B_j$, and $P(A) \neq 0$.

It should be noted in applying Theorem 2.4.3 that, when $A$ is not a subset of $\bigcup_{j=1}^{n} B_j$, using (2.4.15) to obtain $P(B_k | A)$ will yield an incorrect value. This is because $P(A) \neq \sum_{j=1}^{n} P(A | B_j) P(B_j)$ when $A$ is not a subset of $\bigcup_{j=1}^{n} B_j$. To obtain $P(B_k | A)$ correctly in such a case, we must use (2.4.14), i.e., (2.4.15) with the denominator $\sum_{j=1}^{n} P(A | B_j) P(B_j)$ replaced back with $P(A)$.

Example 2.4.10 In Example 2.4.5, obtain the probability that the ball drawn from Box 1 is white when the ball drawn from Box 2 is white.

Solution Using the results of Example 2.4.5 and the Bayes’ theorem, we have $P(C|D) = \frac{P(CD)}{P(D)} = \frac{P(D|C)P(C)}{P(D)} = \frac{1}{2}$. ♦

Example 2.4.11 For the random experiment of rolling a fair die, assume $A = \{2, 4, 5, 6\}$, $B_1 = \{1, 2\}$, and $B_2 = \{3, 4, 5\}$. Obtain $P(B_2 | A)$.

Solution We easily get $P(A) = \frac{2}{3}$. On the other hand, $\sum_{j=1}^{2} P(A | B_j) P(B_j) = \frac{1}{2}$ from $P(B_1) = \frac{1}{3}$, $P(B_2) = \frac{1}{2}$, $P(A | B_1) = \frac{P(AB_1)}{P(B_1)} = \frac{1}{6} \times \left(\frac{1}{3}\right)^{-1} = \frac{1}{2}$, and $P(A | B_2) = \frac{P(AB_2)}{P(B_2)} = \frac{1}{3} \times \left(\frac{1}{2}\right)^{-1} = \frac{2}{3}$. In other words, because $A = \{2, 4, 5, 6\}$ is not a subset of $B_1 \cup B_2 = \{1, 2, 3, 4, 5\}$, we have $P(A) \neq \sum_{j=1}^{2} P(A | B_j) P(B_j)$. Thus, we would get $P(B_2 | A) = \frac{P(A|B_2)P(B_2)}{\sum_{j=1}^{2} P(A | B_j) P(B_j)} = \frac{2}{3}$, an incorrect answer, if we use (2.4.15) carelessly. The correct answer $P(B_2 | A) = \frac{P(A|B_2)P(B_2)}{P(A)} = \frac{1}{2}$ can be obtained by using (2.4.14) in this case. ♦
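The pitfall of Example 2.4.11 suggests guarding any implementation of (2.4.15) with an explicit subset check. The following sketch is one way to do so on the fair die; the function names `bayes` and `posterior` are assumptions introduced for the illustration.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # sample space of a fair die


def prob(event):
    return Fraction(len(OMEGA & set(event)), len(OMEGA))


def cond(a, b):
    return prob(set(a) & set(b)) / prob(b)


def bayes(a, blocks, k):
    """P(B_k | A) via (2.4.15); refuses to run when A is not covered."""
    if not set(a) <= set().union(*blocks):
        raise ValueError("A is not a subset of the union of the B_j")
    denominator = sum(cond(a, b) * prob(b) for b in blocks)
    return cond(a, blocks[k]) * prob(blocks[k]) / denominator


def posterior(a, b):
    """P(B | A) via (2.4.14), valid whenever P(A) > 0."""
    return cond(a, b) * prob(b) / prob(a)
```

For $A = \{2, 4, 5, 6\}$ with $B_1 = \{1, 2\}$ and $B_2 = \{3, 4, 5\}$, `bayes` raises an error instead of returning the incorrect $\frac{2}{3}$, while `posterior(A, B2)` returns the correct $\frac{1}{2}$.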

Let us consider an example for the application of the Bayes’ theorem.

Example 2.4.12 Assume four boxes with 2000, 500, 1000, and 1000 parts of a machine, respectively. The probability of a part being defective is 0.05, 0.4, 0.1, and 0.1, respectively, for the four boxes.

(1) When a box is chosen at random and then a part is picked randomly from the box, calculate the probability that the part is defective.
(2) Assuming the part picked is defective, calculate the probability that the part is from the second box.
(3) Assuming the part picked is defective, calculate the probability that the part is from the third box.

Solution Let $A$ and $B_i$ be the events that the part picked is defective and the part is from the $i$-th box, respectively. Then, $P(B_i) = \frac{1}{4}$ for $i = 1, 2, 3, 4$. In addition, the value of $P(A | B_2)$, for instance, is 0.4 because $P(A | B_2)$ denotes the probability that a part picked is defective when it is from the second box.

(1) Noting that $\{B_i\}_{i=1}^{4}$ are all disjoint, we get $P(A) = \sum_{i=1}^{4} P(A | B_i) P(B_i) = \frac{1}{4} \times 0.05 + \frac{1}{4} \times 0.4 + \frac{1}{4} \times 0.1 + \frac{1}{4} \times 0.1$, i.e.,

$$P(A) = \frac{13}{80} \tag{2.4.16}$$

from (2.4.13).

(2) The probability to obtain is $P(B_2 | A)$. We get $P(B_2 | A) = \frac{P(A|B_2)P(B_2)}{\sum_{j=1}^{4} P(A | B_j) P(B_j)} = \frac{0.4 \times \frac{1}{4}}{\frac{13}{80}} = \frac{8}{13}$ as shown in (2.4.15) because $P(A) = \frac{13}{80}$ from (2.4.16), $P(B_2) = \frac{1}{4}$, and $P(A | B_2) = 0.4$.

(3) Similarly, we get$^{12}$ $P(B_3 | A) = \frac{0.1 \times \frac{1}{4}}{\frac{13}{80}} = \frac{2}{13}$ from $P(B_3 | A) = \frac{P(A|B_3)P(B_3)}{P(A)}$, $P(B_3) = \frac{1}{4}$, $P(A | B_3) = 0.1$, and $P(A) = \frac{13}{80}$. ♦

If we calculate similarly the probabilities for the first and fourth boxes and then add the four values in Example 2.4.12, we will get 1: in other words, we have $\sum_{i=1}^{4} P(B_i | A) = P(\Omega | A) = 1$.
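The numbers of Example 2.4.12, including the remark that the four posterior probabilities add up to 1, can be reproduced in a few lines of exact arithmetic. The variable names in this sketch are illustrative only.

```python
from fractions import Fraction

# P(B_i): each of the four boxes is chosen with the same probability.
priors = [Fraction(1, 4)] * 4
# P(A | B_i): probability that a part from box i is defective.
defect_rates = [Fraction(5, 100), Fraction(40, 100),
                Fraction(10, 100), Fraction(10, 100)]

# Total probability theorem (2.4.13).
p_defective = sum(d * p for d, p in zip(defect_rates, priors))

# Bayes' theorem (2.4.15) for every box.
posteriors = [d * p / p_defective for d, p in zip(defect_rates, priors)]
```

Here `p_defective` equals $\frac{13}{80}$ and `posteriors` is $\left[\frac{1}{13}, \frac{8}{13}, \frac{2}{13}, \frac{2}{13}\right]$. Note that the numbers of parts in the boxes play no role, because the box is chosen first with equal probability.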

$^{12}$ Here, $\frac{13}{80} = 0.1625$, $\frac{8}{13} \approx 0.6154$, and $\frac{2}{13} \approx 0.1538$.

2.4.2 Independent Events

Assume two boxes. Box 1 contains one red ball and two green balls, and Box 2 contains two red balls and four green balls. If we pick a ball randomly after choosing a box with probability $P(\text{Box 1}) = p = 1 - P(\text{Box 2})$, then we have $P(\text{red ball}) = P(\text{red ball} | \text{Box 1}) P(\text{Box 1}) + P(\text{red ball} | \text{Box 2}) P(\text{Box 2}) = \frac{1}{3}p + \frac{1}{3}(1 - p) = \frac{1}{3}$ and $P(\text{green ball}) = \frac{2}{3}$. Note that

$$P(\text{red ball}) = P(\text{red ball} | \text{Box 1}) = P(\text{red ball} | \text{Box 2}) \tag{2.4.17}$$

and $P(\text{green ball}) = P(\text{green ball} | \text{Box 1}) = P(\text{green ball} | \text{Box 2})$: whichever box we choose or whatever value the probability of choosing a box is, the probability that the ball picked is red and green is $\frac{1}{3}$ and $\frac{2}{3}$, respectively. In other words, the choice of a box does not influence the probability of the color of the ball picked. On the other hand, if Box 1 contains one red ball and two green balls and Box 2 contains two red balls and one green ball, the choice of a box will influence the probability of the color of the ball picked. Such an influence is commonly represented by the notion of independence.

Definition 2.4.2 (independence of two events) If the probability $P(AB)$ of the intersection of two events $A$ and $B$ is equal to the product $P(A)P(B)$ of the probabilities of the two events, i.e., if

$$P(AB) = P(A)P(B), \tag{2.4.18}$$

then $A$ and $B$ are called independent (of each other) or mutually independent.

Example 2.4.13 Assume the sample space $S = \{1, 2, \ldots, 9\}$ and $P(k) = \frac{1}{9}$ for $k = 1, 2, \ldots, 9$. Consider the events $A = \{1, 2, 3\}$ and $B = \{3, 4, 5\}$. Then, $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{3}$, and $P(AB) = P(3) = \frac{1}{9}$, and therefore $P(AB) = P(A)P(B)$. Thus, $A$ and $B$ are independent of each other. Likewise, for the sample space $S = \{1, 2, \ldots, 6\}$, the events $C = \{1, 2, 3\}$ and $D = \{3, 4\}$ are independent of each other when $P(k) = \frac{1}{6}$ for $k = 1, 2, \ldots, 6$. ♦

When one of two events has probability 1 as the sample space $S$ or 0 as the null set $\emptyset$, the two events are independent of each other because (2.4.18) holds true.

Theorem 2.4.4 An event with probability 1 or 0 is independent of any other event.

Example 2.4.14 Assume the sample space $S = \{1, 2, \ldots, 5\}$ and let $P(k) = \frac{1}{5}$ for $k = 1, 2, \ldots, 5$. Then, no two sets, excluding $S$ and $\emptyset$, are independent of each other. When $P(1) = \frac{1}{10}$, $P(2) = P(3) = P(4) = \frac{1}{5}$, and $P(5) = \frac{3}{10}$ for the sample space $S = \{1, 2, \ldots, 5\}$, the events $A = \{3, 4\}$ and $B = \{4, 5\}$ are independent because $P(A)P(B) = P(4)$ from $P(A) = \frac{2}{5}$, $P(B) = \frac{1}{2}$, and $P(4) = \frac{1}{5}$. ♦

In general, two mutually exclusive events are not independent of each other: on the other hand, we have the following theorem from Theorem 2.4.4:

Theorem 2.4.5 If at least one event has probability 0, then two mutually exclusive events are independent of each other.

Example 2.4.15 For the sample space $S = \{1, 2, 3\}$, let the power set $2^S = \{\emptyset, \{1\}, \{2\}, \ldots, S\}$ be the event space. Assume the probability measure $P(1) = 0$, $P(2) = \frac{1}{3}$, and $P(3) = \frac{2}{3}$. Then, the events $\{2\}$ and $\{3\}$ are mutually exclusive, but not independent of each other because $P(2)P(3) = \frac{2}{9} \neq 0 = P(\emptyset)$. On the other hand, the events $\{1\}$ and $\{2\}$ are mutually exclusive and, at the same time, independent of each other. ♦
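The claims of Examples 2.4.13 to 2.4.15 can be tested mechanically with the product rule (2.4.18). In the sketch below, a probability measure on a finite sample space is represented by a dictionary from sample points to exact fractions; the names `measure`, `independent`, and `nontrivial_events` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def measure(event, pmf):
    """P(event) for a finite sample space with point probabilities pmf."""
    return sum((pmf[k] for k in event), Fraction(0))


def independent(a, b, pmf):
    """Check the product rule (2.4.18) exactly."""
    return measure(a & b, pmf) == measure(a, pmf) * measure(b, pmf)


def nontrivial_events(pmf):
    """All events except the null set and the whole sample space."""
    points = sorted(pmf)
    return [set(c) for size in range(1, len(points))
            for c in combinations(points, size)]


uniform9 = {k: Fraction(1, 9) for k in range(1, 10)}
uniform5 = {k: Fraction(1, 5) for k in range(1, 6)}
skewed5 = {1: Fraction(1, 10), 2: Fraction(1, 5), 3: Fraction(1, 5),
           4: Fraction(1, 5), 5: Fraction(3, 10)}
zero_point = {1: Fraction(0), 2: Fraction(1, 3), 3: Fraction(2, 3)}
```

For instance, `independent({3, 4}, {4, 5}, skewed5)` holds, whereas an exhaustive search over `nontrivial_events(uniform5)` finds no independent pair, as stated in Example 2.4.14.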


From $P(A^c) = 1 - P(A)$ and the definition (2.4.1) of conditional probability, we can show the following theorem:

Theorem 2.4.6 If the events $A$ and $B$ are independent of each other, then $A$ and $B^c$ are also independent of each other, $P(A|B) = P(A)$, and $P(B|A) = P(B)$.

Example 2.4.16 Assume the sample space $S = \{1, 2, \ldots, 6\}$ and probability measure $P(k) = \frac{1}{6}$ for $k = 1, 2, \ldots, 6$. The events $A = \{1, 2, 3\}$ and $B = \{3, 4\}$ are independent of each other as we have already observed in Example 2.4.13. Here, $B^c = \{1, 2, 5, 6\}$ and thus $P(B^c) = \frac{2}{3}$ and $P(AB^c) = P(\{1, 2\}) = \frac{1}{3}$. In other words, $A$ and $B^c$ are independent of each other because $P(AB^c) = P(A)P(B^c)$. We also have $P(A|B) = \frac{1}{2} = P(A)$ and $P(B|A) = \frac{1}{3} = P(B)$. ♦

Definition 2.4.3 (independence of a number of events) The events $\{A_i\}_{i=1}^{n}$ are called independent of each other if they satisfy

$$P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i) \tag{2.4.19}$$

for every finite subset $J$ of $\{1, 2, \ldots, n\}$.

Example 2.4.17 When $A$, $B$, and $C$ are independent of each other with $P(AB) = \frac{1}{3}$, $P(BC) = \frac{1}{6}$, and $P(AC) = \frac{2}{9}$, obtain the probability of $C$.

Solution First, $P(A) = \frac{2}{3}$ because $P(B)P(C) = \frac{2}{27\{P(A)\}^2} = \frac{1}{6}$ from $P(B) = \frac{1}{3P(A)}$ and $P(C) = \frac{2}{9P(A)}$. Thus, $P(C) = \frac{1}{3}$ from $P(C) = \frac{2}{9P(A)}$. ♦

A number of events $\{A_i\}_{i=1}^{n}$ with $n = 3, 4, \ldots$ may or may not be independent of each other even when $A_i$ and $A_j$ are independent of each other for every possible pair $\{i, j\}$. When only all pairs of two events are independent, the events $\{A_i\}_{i=1}^{n}$ with $n = 3, 4, \ldots$ are called pairwise independent.

Example 2.4.18 For the sample space $\Omega = \{1, 2, 3, 4\}$ of equally likely outcomes, consider $A_1 = \{1, 2\}$, $A_2 = \{2, 3\}$, and $A_3 = \{1, 3\}$. Then, $A_1$ and $A_2$ are independent of each other, $A_2$ and $A_3$ are independent of each other, and $A_3$ and $A_1$ are independent of each other because $P(A_1) = P(A_2) = P(A_3) = \frac{1}{2}$, $P(A_1 A_2) = P(\{2\}) = \frac{1}{4}$, $P(A_2 A_3) = P(\{3\}) = \frac{1}{4}$, and $P(A_3 A_1) = P(\{1\}) = \frac{1}{4}$. However, $A_1$, $A_2$, and $A_3$ are not independent of each other because $P(A_1 A_2 A_3) = P(\emptyset) = 0$ is not equal to $P(A_1) P(A_2) P(A_3) = \frac{1}{8}$. ♦

Example 2.4.19 A malfunction of a circuit element does not influence that of another circuit element. Let the probability for a circuit element to function normally be $p$. Obtain the probabilities $P_S$ and $P_P$ that the circuit will function normally when $n$ circuit elements are connected in series and in parallel, respectively.

Solution When the circuit elements are connected in series, every circuit element should function normally for the circuit to function normally. Thus, we have

$$P_S = p^n. \tag{2.4.20}$$

On the other hand, when the circuit elements are connected in parallel, the circuit will function normally if at least one of the circuit elements functions normally. Therefore, we get

$$P_P = 1 - (1 - p)^n \tag{2.4.21}$$

because the complement of the event that at least one of the circuit elements functions normally is the event that all elements are malfunctioning. Note that $1 - (1 - p)^n > p^n$ for $n = 2, 3, \ldots$ when $p \in (0, 1)$, with equality for $n = 1$. ♦
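The two formulas (2.4.20) and (2.4.21) translate directly into code. The function names below are assumptions for this sketch, and the elements are taken to be independent with a common probability `p` of functioning normally, as in Example 2.4.19.

```python
def series_reliability(p, n):
    """P_S of (2.4.20): all n independent elements must function."""
    return p ** n


def parallel_reliability(p, n):
    """P_P of (2.4.21): at least one of n independent elements functions."""
    return 1.0 - (1.0 - p) ** n
```

For example, with `p = 0.9` and `n = 3` the series circuit functions with probability about 0.729 and the parallel circuit with probability about 0.999. For a single element both formulas reduce to `p`, and for two or more elements with $p \in (0, 1)$ the parallel value is strictly larger.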

2.5 Classes of Probability Spaces

In this section, we introduce the notions of probability mass functions and probability density functions (Kim 2010), which are equivalent to the probability measure for the description of a probability space and are more convenient tools when managing mathematical operations such as differentiation and integration.

2.5.1 Discrete Probability Spaces

In a discrete probability space, in which the sample space $\Omega$ is a countable set, we normally assume $\Omega = \{0, 1, \ldots\}$ or $\Omega = \{1, 2, \ldots\}$ with the event space $\mathcal{F} = 2^{\Omega}$.

Definition 2.5.1 (probability mass function) In a discrete probability space, a function $p(\omega)$ assigning a real number to each sample point $\omega \in \Omega$ and satisfying

$$p(\omega) \ge 0, \quad \omega \in \Omega \tag{2.5.1}$$

and

$$\sum_{\omega \in \Omega} p(\omega) = 1 \tag{2.5.2}$$

is called a probability mass function (pmf), a mass function, or a mass.

From (2.5.1) and (2.5.2), we have

$$p(\omega) \le 1 \quad \text{for every } \omega \in \Omega. \tag{2.5.3}$$

Example 2.5.1 For the sample space $\Omega = J_0 = \{0, 1, \ldots\}$ and pmf

$$p(x) = \begin{cases} \frac{1}{3}, & x = 0; \\ c, & x = 1; \\ 2c, & x = 2; \\ 0, & \text{otherwise}, \end{cases} \tag{2.5.4}$$

determine the constant $c$.

Solution From (2.5.2), $\sum_{x \in \Omega} p(x) = \frac{1}{3} + 3c = 1$. Thus, $c = \frac{2}{9}$. ♦



The probability measure $P$ can be expressed as

$$P(F) = \sum_{\omega \in F} p(\omega) \tag{2.5.5}$$

for $F \in \mathcal{F}$ in terms of the pmf $p$. Conversely, the pmf $p$ can be written as

$$p(\omega) = P(\{\omega\}) \tag{2.5.6}$$

in terms of the probability measure $P$. Note that a pmf is defined for sample points and the probability measure for events.

Both the probability measure $P$ and pmf $p$ can be used to describe the randomness of the outcomes of an experiment. Yet, the pmf is easier than the probability measure to deal with, especially when mathematical operations such as sum and difference are involved. Some of the typical examples of pmf are provided in the examples below.

Example 2.5.2 For the sample space $\Omega = \{x_1, x_2\}$ and a number $\alpha \in (0, 1)$, the function

$$p(x) = \begin{cases} 1 - \alpha, & x = x_1, \\ \alpha, & x = x_2 \end{cases} \tag{2.5.7}$$

is called a two-point pmf. ♦

Definition 2.5.2 (Bernoulli trial) An experiment with two possible outcomes, i.e., an experiment for which the sample space has two elements, is called a Bernoulli experiment or a Bernoulli trial.

Example 2.5.3 When $x_1 = 0$ and $x_2 = 1$ in the two-point pmf (2.5.7), we have

$$p(x) = \begin{cases} 1 - \alpha, & x = 0, \\ \alpha, & x = 1, \end{cases} \tag{2.5.8}$$

which is called the binary pmf or Bernoulli pmf. The binary distribution is usually denoted by $b(1, \alpha)$, where 1 signifies the number of Bernoulli trials and $\alpha$ represents the probability of the desired event or success. ♦

Example 2.5.4 In the experiment of rolling a fair die, assume the events $A = \{1, 2, 3, 4\}$ and $A^c = \{5, 6\}$. Then, if we choose $A$ as the desired event, the distribution of $A$ is $b\left(1, \frac{2}{3}\right)$. ♦

Example 2.5.5 When the sample space is $\Omega = J_n = \{0, 1, \ldots, n - 1\}$, the pmf

$$p(k) = \frac{1}{n}, \quad k \in J_n \tag{2.5.9}$$

is called a uniform pmf. ♦

Example 2.5.6 For the sample space $\Omega = J_0 = \{0, 1, \ldots\}$ and a number $\alpha \in (0, 1)$, the pmf

$$p(k) = (1 - \alpha)^k \alpha, \quad k \in \Omega \tag{2.5.10}$$

is called a geometric pmf. The distribution represented by the geometric pmf (2.5.10) is called the geometric distribution with parameter $\alpha$ and denoted by $\mathrm{Geom}(\alpha)$. ♦

When a Bernoulli trial with probability $\alpha$ of success is repeated until the first success, the distribution of the number of failures is $\mathrm{Geom}(\alpha)$. In some cases, the function $p(k) = (1 - \alpha)^{k-1}\alpha$ for $k \in \{1, 2, \ldots\}$ with $\alpha \in (0, 1)$ is called the geometric pmf. In such a case, the distribution of the number of repetitions is $\mathrm{Geom}(\alpha)$ when a Bernoulli trial with probability $\alpha$ of success is repeated until the first success.

Example 2.5.7 Based on the binary pmf discussed in Example 2.5.3, let us introduce the binomial pmf. Consider the sample space $\Omega = J_{n+1} = \{0, 1, \ldots, n\}$ and a number $\alpha \in (0, 1)$. Then, the function

$$p(k) = {}_n\mathrm{C}_k\, \alpha^k (1 - \alpha)^{n-k}, \quad k \in J_{n+1} \tag{2.5.11}$$

is called a binomial pmf and the distribution is denoted by $b(n, \alpha)$. ♦

In (2.5.11), the number ${}_n\mathrm{C}_r = \frac{n!}{(n-r)!\,r!}$, also denoted by $\binom{n}{r}$, is the coefficient of $x^r y^{n-r}$ in the expansion of $(x + y)^n$, and thus called the binomial coefficient as we have described in (1.4.60). Figure 2.7 shows the envelopes of binomial pmf for some values of $n$ when $\alpha = 0.4$. The binomial pmf is discussed in more detail in Sect. 3.5.2.

Example 2.5.8 For the sample space $\Omega = J_0 = \{0, 1, \ldots\}$ and a number $\lambda \in (0, \infty)$, the function

$$p(k) = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k \in J_0 \tag{2.5.12}$$

is called a Poisson pmf and the distribution is denoted by $P(\lambda)$. ♦
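The pmfs introduced in Examples 2.5.6 to 2.5.8 can be written down directly and checked against the normalization condition (2.5.2). The function names in this sketch are illustrative assumptions, and the geometric pmf is taken in the form in which $k = 0, 1, \ldots$ counts the failures before the first success.

```python
import math


def geometric_pmf(k, alpha):
    """(1 - alpha)^k alpha for k = 0, 1, ... (number of failures)."""
    return (1.0 - alpha) ** k * alpha


def binomial_pmf(k, n, alpha):
    """Binomial pmf (2.5.11) for k = 0, 1, ..., n."""
    return math.comb(n, k) * alpha ** k * (1.0 - alpha) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson pmf (2.5.12) for k = 0, 1, ..."""
    return math.exp(-lam) * lam ** k / math.factorial(k)
```

Summing each pmf over its sample space (truncated where the space is infinite) gives a value numerically close to 1. As a further check, `poisson_pmf(0, 0.5)` is $\frac{1}{\sqrt{e}} \approx 0.61$, and for $\lambda = 3$ the values at $k = 2$ and $k = 3$ coincide.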

Fig. 2.7 Envelopes of binomial pmf $p(k)$ versus $k$ for $\alpha = 0.4$ and $n = 10, 50, 100, 150$

Fig. 2.8 Poisson pmf for $\lambda = 0.5$ and $\lambda = 3$ (for $\lambda = 0.5$, $p(0) = \frac{1}{\sqrt{e}} \approx 0.61$)

For the Poisson pmf (2.5.12), recollecting $\frac{p(k+1)}{p(k)} = \frac{\lambda}{k+1}$, we have $p(0) \le p(1) \le \cdots \le p(\lambda - 1) = p(\lambda) \ge p(\lambda + 1) \ge p(\lambda + 2) \ge \cdots$ when $\lambda$ is an integer, and $p(0) \le p(1) \le \cdots \le p(\lfloor\lambda\rfloor - 1) \le p(\lfloor\lambda\rfloor)$ and $p(\lfloor\lambda\rfloor) \ge p(\lfloor\lambda\rfloor + 1) \ge \cdots$ when $\lambda$ is not an integer, where the floor function $\lfloor x \rfloor$ is defined following (1.A.44) in Appendix 1.2. Figure 2.8 shows two examples of Poisson pmf. The Poisson pmf will be discussed in more detail in Sect. 3.5.3.


Example 2.5.9 For the sample space $\Omega = J_0 = \{0, 1, \ldots\}$, $r \in (0, \infty)$, and $\alpha \in (0, 1)$, the function

$$p(x) = {}_{-r}\mathrm{C}_x\, \alpha^r (\alpha - 1)^x, \quad x \in J_0 \tag{2.5.13}$$

is called a negative binomial (NB) pmf, and the distribution with the pmf (2.5.13) is denoted by $\mathrm{NB}(r, \alpha)$. ♦

When $r = 1$, the NB pmf (2.5.13) is the geometric pmf discussed in Example 2.5.6. The NB pmf with $r$ a natural number and a real number is called the Pascal pmf and Polya pmf, respectively.

The meaning of $\mathrm{NB}(r, \alpha)$ and the formula of the NB pmf vary depending on whether the sample space is $\{0, 1, \ldots\}$ or $\{r, r + 1, \ldots\}$, whether $r$ represents a success or a failure, or whether $\alpha$ is the probability of success or failure. In (2.5.13), the parameters $r$ and $\alpha$ represent the number and probability of success, respectively. When a Bernoulli trial with the probability $\alpha$ of success is repeated until the $r$-th success, the distribution of the number of failures is $\mathrm{NB}(r, \alpha)$.

We clearly have $\sum_{x=0}^{\infty} p(x) = 1$ because $\sum_{x=0}^{\infty} {}_{-r}\mathrm{C}_x (\alpha - 1)^x = (1 + \alpha - 1)^{-r} = \alpha^{-r}$ from (1.A.12) with $p = -r$ and $z = \alpha - 1$. Now, the pmf (2.5.13) can be written as $p(x) = {}_{r+x-1}\mathrm{C}_x\, \alpha^r (1 - \alpha)^x$ or, equivalently, as

$$p(x) = {}_{r+x-1}\mathrm{C}_{r-1}\, \alpha^r (1 - \alpha)^x, \quad x \in J_0 \tag{2.5.14}$$

using$^{13}$ ${}_{-r}\mathrm{C}_x = \frac{1}{x!}(-r)(-r - 1) \cdots (-r - x + 1)$, i.e.,

$${}_{-r}\mathrm{C}_x = (-1)^x\, {}_{r+x-1}\mathrm{C}_x. \tag{2.5.15}$$

Note that we have

$$\sum_{x=0}^{\infty} {}_{r+x-1}\mathrm{C}_x (1 - \alpha)^x = \alpha^{-r} \tag{2.5.16}$$

because $\sum_{x=0}^{\infty} {}_{r+x-1}\mathrm{C}_x (1 - \alpha)^x = \sum_{x=0}^{\infty} \frac{(r+x-1)!}{(r-1)!\,x!} (1 - \alpha)^x = \sum_{x=0}^{\infty} {}_{r+x-1}\mathrm{C}_{r-1} (1 - \alpha)^x$ and $\sum_{x=0}^{\infty} p(x) = 1$. Letting $x + r = y$ in (2.5.14), we get

$$p(y) = {}_{y-1}\mathrm{C}_{r-1}\, \alpha^r (1 - \alpha)^{y-r}, \quad y = r, r + 1, \ldots \tag{2.5.17}$$

when $r$ is a natural number, which is called the NB pmf sometimes. Here, note that ${}_{x+r-1}\mathrm{C}_x \big|_{x=y-r} = {}_{y-1}\mathrm{C}_{y-r} = {}_{y-1}\mathrm{C}_{r-1}$.

$^{13}$ Here, $(-r)(-r - 1) \cdots (-r - x + 1) = (-1)^x r(r + 1) \cdots (r + x - 1)$. Equation (2.5.15) can also be obtained based on Table 1.4.
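The equivalence between (2.5.13) and (2.5.14), which rests on the identity (2.5.15), can be checked numerically for a natural number $r$. The helper `gen_binom`, which evaluates the generalized coefficient ${}_{p}\mathrm{C}_x = \frac{1}{x!}p(p-1)\cdots(p-x+1)$, and the two pmf functions are hypothetical names used only in this sketch.

```python
import math
from fractions import Fraction


def gen_binom(p, x):
    """Generalized binomial coefficient p(p-1)...(p-x+1) / x!."""
    numerator = Fraction(1)
    for i in range(x):
        numerator *= (p - i)
    return numerator / math.factorial(x)


def nb_pmf(x, r, alpha):
    """NB pmf in the form (2.5.13)."""
    return gen_binom(-r, x) * alpha ** r * (alpha - 1) ** x


def nb_pmf_alt(x, r, alpha):
    """NB pmf in the form p(x) = C(r+x-1, x) alpha^r (1-alpha)^x."""
    return math.comb(r + x - 1, x) * alpha ** r * (1 - alpha) ** x
```

With `alpha` given as a `Fraction`, both functions return identical exact values for every `x`, and for `r = 1` they reduce to $(1 - \alpha)^x \alpha$, the geometric pmf on $\{0, 1, \ldots\}$.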


2.5.2 Continuous Probability Spaces Let us now consider the continuous probability space with the measurable space (Ω, F) = (R, B (R)): in other words, the sample space Ω is the set R of real numbers and the event space is the Borel field B (R). Definition 2.5.3 (probability density function) In a measurable space (R, B (R)), a real-valued function f , with the two properties f (r ) ≥ 0,

r ∈Ω

(2.5.18)

and  Ω

f (r )dr = 1

(2.5.19)

is called a probability density function (pdf), a density function, or a density. Example 2.5.10 Determine the constant c when the pdf is f (x) = 41 , c, and 0 for x ∈ [0, 1), [1, 2), and [0, 2)c , respectively. ∞ 1 2 Solution From (2.5.19), we have −∞ f (r )dr = 0 41 dr + 1 c dr = 14 + c = 1. Thus, c = 43 . ♦ The value f (x) of a pdf f does not represent the probability P({x}). Instead, the set function P defined in terms of f as  f (r )dr,

P(F) =

F ∈ B (R)

(2.5.20)

F

is the probability measure of the probability space on which f is defined. Note that (2.5.20) is a counterpart of (2.5.5). While we have (2.5.6), an equation describing the pmf in terms of the probability measure in the discrete probability space, we do not have its counterpart in the continuous probability space, which would describe the pdf in terms of the probability measure. Do the integrals in (2.5.19) and (2.5.20) have any meaning? For interval events or finite unions of interval events, we can adopt the Riemann integral as in most engineering problems and calculations. On the other hand, the Riemann integral has some caveats including that the order of the limit and integral for a sequence of functions is not interchangeable. In addition, the Riemann integral is not defined in some cases. For example, when  f (r ) =

1, r ∈ [0, 1], 0, otherwise,

(2.5.21)

it is not possible to obtain the Riemann integral of f (r ) over the set F = {r : r is an irrational number, r ∈ [0, 1]}. Fortunately, such a caveat can be overcome

2.5 Classes of Probability Spaces

131

by adopting the Lebesgue integral. Compared to the Riemann integral, the Lebesgue integral has the following three important advantages:

(1) The Lebesgue integral is defined for any Borel set.
(2) The order of the limit and integral can almost always be interchanged in the Lebesgue integral.
(3) When a function is Riemann integrable, it is also Lebesgue integrable, and the results are known to be the same.

Like the pmf, the pdf is defined on the points in the sample space, not on the events. On the other hand, unlike the pmf p(·) for which p(ω) directly represents the probability P({ω}), the value f(x0) at a point x0 of the pdf f(x) is not the probability at x = x0. Instead, f(x0) dx represents the probability for the arbitrarily small interval [x0, x0 + dx). While the value of a pmf cannot be larger than 1 at any point, the value of a pdf can be larger than 1 at some points. In addition, the probability of a countable event is 0 even when the value of the pdf is not 0 in the continuous space: for the pdf

$$ f(x) = \begin{cases} 2, & x \in [0, 0.5], \\ 0, & \text{otherwise}, \end{cases} \tag{2.5.22} $$

we have P({a}) = 0 for any point a ∈ [0, 0.5]. On the other hand, if we assume a very small interval around a point, the probability of that interval can be expressed as the product of the value of the pdf and the length of the interval. For example, for a pdf f with f(3) = 4, the probability P([3, 3 + dx)) of an arbitrarily small interval [3, 3 + dx) near 3 is

$$ f(3)\,dx = 4\,dx. \tag{2.5.23} $$
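The relation between a pdf value and the probability of a small interval can be checked numerically. The following is a minimal sketch for the pdf (2.5.22); the function names `pdf` and `interval_probability`, the midpoint rule, and the number of steps are our own illustrative choices, not part of the text.

```python
def pdf(x: float) -> float:
    """The pdf (2.5.22): value 2 on [0, 0.5] and 0 elsewhere."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0


def interval_probability(lo: float, hi: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of pdf over [lo, hi)."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (k + 0.5) * width) for k in range(steps)) * width


a = 0.2
for dx in (1e-1, 1e-2, 1e-3):
    # Compare P([a, a + dx)) with the product pdf(a) * dx.
    print(dx, interval_probability(a, a + dx), pdf(a) * dx)

# A single point is an interval of length 0, so its probability is 0
# although pdf(a) = 2 there.
print(interval_probability(a, a))
```

Because this particular pdf is constant around a = 0.2, the two printed columns agree up to rounding; for a general continuous pdf the agreement would hold only approximately, improving as dx shrinks.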

This implies that, just as we can obtain the probability of an event by adding the probability mass over all points in the event in discrete probability spaces, we can obtain the probability of an event by integrating the probability density over all points in the event in continuous probability spaces. Some of the widely-used pdf's are shown in the examples below.

Example 2.5.11 When a < b, the pdf

$$ f(r) = \frac{1}{b-a}\, u(r-a)\, u(b-r) \tag{2.5.24} $$

shown in Fig. 2.9 is called a uniform pdf or a rectangular pdf, and its distribution is denoted by^14 U(a, b). ♦

The probability measure of U[0, 1] is often called the Lebesgue measure.

^14 Notations U[a, b], U[a, b), U(a, b], and U(a, b) are all used interchangeably.

Fig. 2.9 The uniform pdf (constant value 1/(b − a) for r between a and b)

Fig. 2.10 The exponential pdf (curves for λ = 1 and λ = 2)

Example 2.5.12 (Romano and Siegel 1986) Countable sets are all of Lebesgue measure 0. Some uncountable sets such as the Cantor set C described in Example 1.1.46 are also of Lebesgue measure 0. ♦

Example 2.5.13 The pdf

$$ f(r) = \lambda e^{-\lambda r} u(r) \tag{2.5.25} $$

shown in Fig. 2.10 is called an exponential pdf with λ > 0 called the rate of the pdf. The exponential pdf with λ = 1 is called the standard exponential pdf. The exponential pdf will be discussed again in Sect. 3.5.4. ♦

Example 2.5.14 The pdf

$$ f(r) = \frac{\lambda}{2} e^{-\lambda |r|} \tag{2.5.26} $$

with λ > 0, shown in Fig. 2.11, is called a Laplace pdf or a double exponential pdf, and its distribution is denoted by L(λ). ♦

Example 2.5.15 The pdf

$$ f(r) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(r-m)^2}{2\sigma^2} \right) \tag{2.5.27} $$

shown in Fig. 2.12 is called a Gaussian pdf or a normal pdf, and its distribution is denoted by N(m, σ²). ♦

Fig. 2.11 The Laplace pdf (curves for λ = 1 and λ = 2, with peak values 0.5 and 1, respectively)

Fig. 2.12 The normal pdf (curves for σ = σ1 and σ = σ2 with σ1 < σ2, both centered at r = m)

When m = 0 and σ² = 1, the normal pdf is called the standard normal pdf. The normal distribution is sometimes called the Gauss-Laplace distribution, de Moivre-Laplace distribution, or the second Laplace distribution (Lukacs 1970). The normal pdf will be addressed again in Sect. 3.5.1 and its generalizations into multidimensional spaces in Chap. 5.

Example 2.5.16 For a positive number α and a real number β, the function

$$ f(r) = \frac{\alpha}{\pi}\, \frac{1}{(r-\beta)^2 + \alpha^2} \tag{2.5.28} $$

shown in Fig. 2.13 is called a Cauchy pdf and the distribution is denoted by C(β, α). ♦

The Cauchy pdf is also called the Lorentz pdf or Breit-Wigner pdf. We will mostly consider the case β = 0, with the notation C(α) in this book.

Example 2.5.17 The pdf

$$ f(r) = \frac{r}{\alpha^2} \exp\left( -\frac{r^2}{2\alpha^2} \right) u(r) \tag{2.5.29} $$

shown in Fig. 2.14 is called a Rayleigh pdf. ♦

Example 2.5.18 When f(v) = a v exp(−v²) u(v) is a pdf, obtain the value of a.

Solution From $\int_{-\infty}^{\infty} f(v)\,dv = a \int_{0}^{\infty} v \exp(-v^2)\,dv = \frac{a}{2} = 1$, we get a = 2. ♦
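The normalization in Example 2.5.18 can also be confirmed numerically. The sketch below is only an illustration under our own assumptions: the helper names `integrate` and `unnormalized`, the midpoint rule, and the truncation of the infinite range at v = 10 (beyond which exp(−v²) is negligible) are not from the text.

```python
import math


def integrate(g, lo: float, hi: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * width) for k in range(steps)) * width


def unnormalized(v: float) -> float:
    """The function v * exp(-v^2) * u(v), i.e. the pdf of Example 2.5.18 with a = 1."""
    return v * math.exp(-v * v) if v >= 0 else 0.0


# The integral over [0, infinity) is truncated at 10 (an assumption of this sketch).
area = integrate(unnormalized, 0.0, 10.0)
a = 1.0 / area  # constant that makes the total probability equal to 1
print(area, a)
```

The printed area should be close to 1/2 and the constant close to 2, in agreement with the closed-form solution.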

Fig. 2.13 The Cauchy pdf (curves for α = 1 and α = 2, with peak values 1/π and 1/(2π), respectively)

Fig. 2.14 The Rayleigh pdf (curves for α = α1 and α = α2 with α1 < α2)

Fig. 2.15 The logistic pdf (curves for k = k1 and k = k2 with k1 > k2)

Example 2.5.19 The pdf (Balakrishnan 1992)

$$ f(r) = \frac{k e^{-kr}}{\left( 1 + e^{-kr} \right)^2} \tag{2.5.30} $$

shown in Fig. 2.15 is called a logistic pdf, where k > 0. ♦

Example 2.5.20 The pdf

$$ f(r) = \frac{1}{\beta^{\alpha} \Gamma(\alpha)}\, r^{\alpha-1} \exp\left( -\frac{r}{\beta} \right) u(r) \tag{2.5.31} $$

shown in Fig. 2.16 is called a gamma pdf and the distribution is denoted by G(α, β), where α > 0 and β > 0. It is clear from (2.5.25) and (2.5.31) that the gamma pdf with α = 1 is the same as an exponential pdf with rate λ = 1/β. ♦
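The remark that the gamma pdf with α = 1 reduces to an exponential pdf can be illustrated with a few evaluations. This is a small sketch with function names of our own choosing (`gamma_pdf`, `exponential_pdf`); it compares the two formulas only at points r > 0, so that the convention for the unit step function at r = 0 does not matter.

```python
import math


def gamma_pdf(r: float, alpha: float, beta: float) -> float:
    """The gamma pdf (2.5.31) for r > 0; returns 0 for r <= 0 in this sketch."""
    if r <= 0:
        return 0.0
    return r ** (alpha - 1) * math.exp(-r / beta) / (beta ** alpha * math.gamma(alpha))


def exponential_pdf(r: float, lam: float) -> float:
    """The exponential pdf (2.5.25) with rate lam."""
    return lam * math.exp(-lam * r) if r >= 0 else 0.0


beta = 0.5
for r in (0.1, 1.0, 2.5):
    # With alpha = 1, the gamma pdf should match the exponential pdf of rate 1 / beta.
    print(r, gamma_pdf(r, 1.0, beta), exponential_pdf(r, 1.0 / beta))
```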

Fig. 2.16 The gamma pdf (curves for α = 0.5, α = 1, and α = 2 with β = 1)

Example 2.5.21 The pdf

$$ f(r) = \frac{r^{\alpha-1} (1-r)^{\beta-1}}{\tilde{B}(\alpha, \beta)}\, u(r)\, u(1-r) \tag{2.5.32} $$

shown in Fig. 2.17 is called a beta pdf and the distribution is denoted by B(α, β), where α > 0 and β > 0. ♦

In (2.5.32), $\tilde{B}(\alpha, \beta)$ is the beta function described in (1.4.95). Unless a confusion arises regarding the beta function $\tilde{B}(\alpha, \beta)$ and the beta distribution B(α, β), we often use B(α, β) for both the beta function and beta distribution.

Table 2.1 shows some general properties of the beta pdf f(r) shown in (2.5.32). When α = 1 and β > 1, the pdf f(r) is decreasing in (0, 1), f(0) = β, and f(1) = 0. When α > 1 and β = 1, the pdf f(r) is increasing in (0, 1), f(0) = 0, and f(1) = α. In addition, because

$$ f'(r) = \frac{r^{\alpha-2} (1-r)^{\beta-2}}{\tilde{B}(\alpha, \beta)} \{ \alpha - 1 - (\alpha + \beta - 2) r \}\, u(r)\, u(1-r), \tag{2.5.33} $$

the pdf f(r) increases and decreases in $\left( 0, \frac{\alpha-1}{\alpha+\beta-2} \right)$ and $\left( \frac{\alpha-1}{\alpha+\beta-2}, 1 \right)$, respectively, and f(0) = f(1) = 0 when α > 1 and β > 1. In other words, when α > 1 and β > 1, the pdf f(r) is a unimodal function, and has its maximum at $r = \frac{\alpha-1}{\alpha+\beta-2}$ between 1/2 and 1 if α > β; at r = 1/2 if α = β; and at $r = \frac{\alpha-1}{\alpha+\beta-2}$ between 0 and 1/2 if α < β. The maximum point is closer to 0 when α is closer to 1 or when β is larger, and it is closer to 1 when β is closer to 1 or when α is larger. Such a property of the beta pdf can be used in the order statistics (David and Nagaraja 2003) of discrete distributions.

Example 2.5.22 The pdf of the distribution B(1/2, 1/2) is

$$ f(r) = \frac{1}{\pi \sqrt{r(1-r)}}\, u(r)\, u(1-r). \tag{2.5.34} $$

Letting r = cos² v, we have $\int_0^1 f(r)\,dr = \int_{\pi/2}^{0} \frac{-2 \cos v \sin v}{\pi \cos v \sin v}\,dv = 1$. The pdf (2.5.34) is also called the inverse sine pdf. ♦
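The location of the maximum of the beta pdf for α > 1 and β > 1 can be checked against a brute-force grid search. The following sketch uses names of our own (`beta_function`, `beta_pdf`, `numerical_mode`) and expresses the beta function through the gamma function, Γ(α)Γ(β)/Γ(α + β), which is a standard identity; the grid size is an arbitrary choice.

```python
import math


def beta_function(a: float, b: float) -> float:
    """Beta function via the identity Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(r: float, a: float, b: float) -> float:
    """The beta pdf (2.5.32) on the open interval (0, 1); 0 elsewhere in this sketch."""
    if not 0.0 < r < 1.0:
        return 0.0
    return r ** (a - 1) * (1 - r) ** (b - 1) / beta_function(a, b)


def numerical_mode(a: float, b: float, grid: int = 20_000) -> float:
    """Grid point in (0, 1) at which the beta pdf is largest."""
    points = [(k + 0.5) / grid for k in range(grid)]
    return max(points, key=lambda r: beta_pdf(r, a, b))


for a, b in [(2, 3), (2, 5), (7, 2)]:
    # Compare the grid maximizer with the closed form (a - 1) / (a + b - 2).
    print((a, b), numerical_mode(a, b), (a - 1) / (a + b - 2))

# For Example 2.5.22, the normalizing constant of B(1/2, 1/2) should equal pi.
print(beta_function(0.5, 0.5), math.pi)
```

The grid maximizer can differ from the closed-form value only by about one grid spacing, which is enough to see that the mode lies below 1/2 when α < β and above 1/2 when α > β.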

Fig. 2.17 The beta pdf (curves for (α, β) = (0.7, 0.3), (1, 3), (2, 3), (2, 5), (7, 2), and (3, 1))

Table 2.1 Characteristics of a beta pdf f(r), 0 < r < 1

... > 0 and x ∈ X. In addition, let E ∈ M and

$$ I_E(s) = \sum_{i=1}^{n} c_i\, \mu(E \cap B_i). $$

Then, for a non-negative and measurable function f,

$$ \int_E f\,d\mu = \sup I_E(s) \tag{2.A.29} $$

is called the Lebesgue integral.

In Definition 2.A.11, the supremum is taken over all measurable simple functions s such that 0 ≤ s ≤ f. In the meantime, when the function f is not always positive, the Lebesgue integral can be defined as

$$ \int_E f\,d\mu = \int_E f^{+}\,d\mu - \int_E f^{-}\,d\mu \tag{2.A.30} $$

if at least one of $\int_E f^{+}\,d\mu$ and $\int_E f^{-}\,d\mu$ is finite, where f⁺ = max(f, 0) and f⁻ = −min(f, 0). Note that f = f⁺ − f⁻ and that f⁺ and f⁻ are measurable functions. If both $\int_E f^{+}\,d\mu$ and $\int_E f^{-}\,d\mu$ are finite, then $\int_E f\,d\mu$ is finite and the function f is called Lebesgue integrable on E for μ, which is expressed as f ∈ L(μ) on E.

Based on mensuration by parts, the Riemann integral is the sum of products of the value of a function in an arbitrarily small interval composing the integral region and the length of the interval. On the other hand, the Lebesgue integral is the sum of products of the value of a function and the measure of the set in the domain corresponding to an arbitrarily small interval in the range of the function. The Lebesgue integral exists not only for all Riemann integrable functions but also for other functions, while the Riemann integral on a bounded interval exists only when the function is bounded and continuous almost everywhere.

Some of the properties of the Lebesgue integral are as follows:

(1) If a function f is measurable on E and bounded and μ(E) is finite, then f ∈ L(μ) on E.
(2) If the measure μ(E) is finite and a ≤ f ≤ b, then $a\mu(E) \le \int_E f\,d\mu \le b\mu(E)$.
(3) If f, g ∈ L(μ) on the set E and f(x) ≤ g(x) for x ∈ E, then $\int_E f\,d\mu \le \int_E g\,d\mu$.
(4) If f ∈ L(μ) on the set E and c is a finite constant, then $\int_E c f\,d\mu = c \int_E f\,d\mu$ and cf ∈ L(μ).
(5) If f ∈ L(μ) on the set E, then |f| ∈ L(μ) and $\left| \int_E f\,d\mu \right| \le \int_E |f|\,d\mu$.
(6) If a function f is measurable on the set E and μ(E) = 0, then $\int_E f\,d\mu = 0$.
(7) If a function f is Lebesgue integrable on X and $\varphi(A) = \int_A f\,d\mu$ on A ∈ M, then φ is additive on M.
(8) Let A ∈ M, B ⊆ A, and μ(A − B) = 0. Then, $\int_A f\,d\mu = \int_B f\,d\mu$.
(9) Consider a sequence $\{f_n\}_{n=1}^{\infty}$ of measurable functions such that $\lim_{n \to \infty} f_n(x) = f(x)$ for E ∈ M and x ∈ E. If there exists a function g ∈ L(μ) such that |f_n(x)| ≤ g(x), then $\lim_{n \to \infty} \int_E f_n\,d\mu = \int_E f\,d\mu$.
(10) If a function f is Riemann integrable on [a, b], then f is Lebesgue integrable and the Lebesgue integral with the Lebesgue measure is the same as the Riemann integral.
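To make the supremum over simple functions in (2.A.29) more concrete, the following sketch evaluates the sum $\sum_i c_i \mu(E \cap B_i)$ for one particular family of simple functions below f(x) = x² on E = [0, 1] with the Lebesgue measure. The choice of f, the dyadic simple functions s(x) = ⌊N x²⌋ / N, and the function name `simple_function_integral` are all our own assumptions for illustration; the sketch relies on the fact that the level set {x ∈ [0, 1] : x² ≥ t} is the interval [√t, 1], whose measure is 1 − √t.

```python
import math


def simple_function_integral(levels: int) -> float:
    """Integral of the simple function s(x) = floor(levels * x^2) / levels on [0, 1].

    s can be written as (1 / levels) times the sum over k = 1, ..., levels of the
    indicator of {x : x^2 >= k / levels}, and each such set has Lebesgue
    measure 1 - sqrt(k / levels).
    """
    return sum(1.0 - math.sqrt(k / levels) for k in range(1, levels + 1)) / levels


for n in (2, 4, 8, 14):
    # Refining the partition of the range (levels = 2^n) raises the value toward 1/3.
    print(2 ** n, simple_function_integral(2 ** n))
```

Each simple function satisfies 0 ≤ s ≤ f, so every printed value stays below 1/3, the value of the Riemann integral of x² over [0, 1], and the values increase toward it as the range is partitioned more finely. This is consistent with property (10), although it checks only one family of simple functions rather than the full supremum.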

Appendix 2.4 Non-measurable Sets

Assume the open unit interval J = (0, 1) and the set Q of rational numbers in the real space R. Consider the translation operator T_t : R → R such that T_t(x) = x + t for x ∈ R. Consider the countable set Γ_t = T_t Q, i.e.,

$$ \Gamma_t = \{ t + q : q \in Q \}. \tag{2.A.31} $$

For example, we have Γ_5445 = {q + 5445 : q ∈ Q} = Q and Γ_π = {q + π : q ∈ Q} when t = 5445 and t = π, respectively. It is clear that

$$ \Gamma_t \cap J \ne \emptyset \tag{2.A.32} $$

because we can always find a rational number q such that 0 < t + q < 1 for any real number t. We have Γ_t = {t + q : q ∈ Q} = {s + (t − s) + q : q ∈ Q} = {s + q′ : q′ ∈ Q} = Γ_s when t − s is a rational number, and Γ_t ∩ Γ_s = ∅ when t − s is an irrational number. Based on this observation, consider the collection

$$ K = \{ \Gamma_t : t \in R, \text{ distinct } \Gamma_t \text{ only} \} \tag{2.A.33} $$

of sets (Rao 2004). Then, we have the following facts:

(1) The collection K is a partition of R.
(2) There exists only one rational number t for Γ_t ∈ K.
(3) There exist uncountably many sets in K.
(4) For two distinct sets Γ_t and Γ_s in K, the number t − s is not a rational number.


Definition 2.A.12 (Vitali set) Based on the axiom of choice^17 and (2.A.32), we can obtain an uncountable set

$$ V_0 = \{ x : x \in \Gamma_t \cap J,\ \Gamma_t \in K \}, \tag{2.A.34} $$

where, for each Γ_t ∈ K, x represents one number chosen from Γ_t ∩ J, i.e., a number in the interval (0, 1) that is an element of Γ_t. The set V_0 is called the Vitali set.

Note that the points in the Vitali set V_0 are all in the interval (0, 1) and have a one-to-one correspondence with the sets in K. Denoting the enumeration of all the rational numbers in the interval (−1, 1) by $\{\alpha_i\}_{i=1}^{\infty}$, we get the following theorem:

Theorem 2.A.8 For the Vitali set V_0,

$$ (0, 1) \subseteq \bigcup_{i=1}^{\infty} T_{\alpha_i} V_0 \subseteq (-1, 2) \tag{2.A.35} $$

holds true.

Proof First, −1 < α_i + x < 2 because −1 < α_i < 1 and any point x in V_0 satisfies 0 < x < 1. In other words, $T_{\alpha_i} x \in (-1, 2)$, and therefore

$$ \bigcup_{i=1}^{\infty} T_{\alpha_i} V_0 \subseteq (-1, 2). \tag{2.A.36} $$

Next, for any point x in (0, 1), x ∈ Γ_t with an appropriately chosen t as we have observed in (2.A.32). Then, we have Γ_t = Γ_x because x − t is a rational number. Now, denoting the point in Γ_x ∩ V_0 by y, we have y = x + q for some q ∈ Q because Γ_x ∩ V_0 ≠ ∅, and therefore x − y ∈ Q. Here, x − y is a rational number in (−1, 1) because 0 < x, y < 1 and, consequently, we can put x − y = α_i: in other words, $x = y + \alpha_i = T_{\alpha_i} y \in T_{\alpha_i} V_0$. Thus, we have

$$ (0, 1) \subseteq \bigcup_{i=1}^{\infty} T_{\alpha_i} V_0. \tag{2.A.37} $$

Subsequently, we get (2.A.35) from (2.A.36) and (2.A.37). ♠

Theorem 2.A.9 The sets $\{T_{\alpha_i} V_0\}_{i=1}^{\infty}$ are all mutually exclusive: in other words,

$$ \left( T_{\alpha_i} V_0 \right) \cap \left( T_{\alpha_j} V_0 \right) = \emptyset \tag{2.A.38} $$

for i ≠ j.

^17 The axiom of choice can be expressed as “For any set A, there exists a choice function f : 2^A → A such that f(B) ∈ B for every non-empty set B ⊆ A.” The axiom of choice can be phrased in various expressions, and that in Definition 2.A.12 is based on “If we assume a partition P_S of S composed only of non-empty sets, then there exists a set B for which the intersection with any set in P_S is a singleton set.”


Proof We prove the theorem by contradiction. When i ≠ j or, equivalently, when α_i ≠ α_j, assume that $(T_{\alpha_i} V_0) \cap (T_{\alpha_j} V_0)$ is not a null set. Letting one element of the intersection be y, we have y = x + α_i = x′ + α_j for x, x′ ∈ V_0. It is clear that Γ_x = Γ_x′ because x − x′ = α_j − α_i is a rational number. Thus, x = x′ because V_0 contains exactly one point of each set in K, and therefore α_i = α_j: this is contradictory to α_i ≠ α_j. Consequently, $(T_{\alpha_i} V_0) \cap (T_{\alpha_j} V_0) = \emptyset$. ♠

Theorem 2.A.10 No set in $\{T_{\alpha_i} V_0\}_{i=1}^{\infty}$ is Lebesgue measurable: in other words, $T_{\alpha_i} V_0 \notin M(\mu)$ for any i.

Proof We prove the theorem by contradiction. Assume that the sets $\{T_{\alpha_i} V_0\}_{i=1}^{\infty}$ are measurable. Then, from the translation invariance^18 of a measure, they have the same measure. Denoting the Lebesgue measure of $T_{\alpha_i} V_0$ by $\mu(T_{\alpha_i} V_0) = \beta$, we have

$$ \mu((0, 1)) \le \mu\left( \bigcup_{i=1}^{\infty} T_{\alpha_i} V_0 \right) \le \mu((-1, 2)) \tag{2.A.39} $$

from (2.A.35). Here, μ((0, 1)) = 1 and μ((−1, 2)) = 3. In addition, we have $\mu\left( \bigcup_{i=1}^{\infty} T_{\alpha_i} V_0 \right) = \sum_{i=1}^{\infty} \mu\left( T_{\alpha_i} V_0 \right)$, i.e.,

$$ \mu\left( \bigcup_{i=1}^{\infty} T_{\alpha_i} V_0 \right) = \sum_{i=1}^{\infty} \beta \tag{2.A.40} $$

because $\{T_{\alpha_i} V_0\}_{i=1}^{\infty}$ is a collection of mutually exclusive sets as we have observed in (2.A.38). Combining (2.A.39) and (2.A.40) leads us to

$$ 1 \le \sum_{i=1}^{\infty} \beta \le 3, \tag{2.A.41} $$

which can be satisfied neither with β = 0 nor with β ≠ 0. Consequently, no set in $\{T_{\alpha_i} V_0\}_{i=1}^{\infty}$, including V_0, is Lebesgue measurable. ♠

^18 For any real number x, the measure of A = {a} is the same as that of A + x = {a + x}.

Exercises

Exercise 2.1 Obtain the algebra generated from the collection C = {{a}, {b}} of the set S = {a, b, c, d}.

Exercise 2.2 Obtain the σ-algebra generated from the collection C = {{a}, {b}} of the set S = {a, b, c, d}.


Exercise 2.3 Obtain the sample space S in the following random experiments:
(1) An experiment measuring the lifetime of a battery.
(2) An experiment in which an integer n is selected in the interval [0, 2] and then an integer m is selected in the interval [0, n].
(3) An experiment of checking the color of, and the number written on, a ball selected randomly from a box containing two red, one green, and two blue balls denoted by 1, 2, . . . , 5, respectively.

Exercise 2.4 When P(A) = P(B) = P(AB), obtain P(AB^c + BA^c).

Exercise 2.5 Consider rolling a fair die. For A = {1}, B = {2, 4}, and C = {1, 3, 5, 6}, obtain P(A ∪ B), P(A ∪ C), and P(A ∪ B ∪ C).

Exercise 2.6 Consider the events A = (−∞, r] and B = (−∞, s] with r ≤ s in the sample space of real numbers.
(1) Express C = (r, s] in terms of A and B.
(2) Show that B = A ∪ C and A ∩ C = ∅.

Exercise 2.7 When ten distinct red and ten distinct black balls are randomly arranged into a single line, find the probability that red and black balls are placed in an alternating fashion.

Exercise 2.8 Consider two branches between two nodes in a circuit. One of the two branches is a resistor and the other is a series connection of two resistors. Obtain the probability that the two nodes are disconnected assuming that the probability for a resistor to be disconnected is p and disconnection in a resistor is not influenced by the status of other resistors.

Exercise 2.9 Show that A^c and B are independent of each other and that A^c and B^c are independent of each other when A and B are independent of each other.

Exercise 2.10 Assume the sample space S = {1, 2, 3} and event space F = 2^S. Show that no two events, except S and ∅, are independent of each other for any probability measure such that P(1) > 0, P(2) > 0, and P(3) > 0.

Exercise 2.11 For two events A and B, show the following:
(1) If P(A) = 0, then P(AB) = 0.
(2) If P(A) = 1, then P(AB) = P(B).

Exercise 2.12 Among 100 lottery tickets sold each week, one is a winning ticket. When a ticket costs 10 euros and we have 500 euros, does buying 50 tickets in one week bring us a higher probability of getting the winning ticket than buying one ticket over 50 weeks?

Exercise 2.13 In rolling a fair die twice, find the probability that the sum of the two outcomes is 7 when we have 3 from the first roll.

Exercise 2.14 When a pair of fair dice are rolled once, find P(a − 2b < 0), where a and b are the face values of the two dice with a ≥ b.

Exercise 2.15 When we choose subsets A, B, and C from D = {1, 2, . . . , k} randomly, find the probability that C ∩ (A − B)^c = ∅.

Exercise 2.16 Denote the four vertices of a regular tetrahedron by A, B, C, and D. In each movement from one vertex to another, the probability of arriving at another vertex is 1/3 for each of the three vertices. Find the probabilities p_{n,A} and p_{n,B} that we arrive at A and B, respectively, after n movements starting from A. Obtain the values of p_{10,A} and p_{10,B} when n = 10.

Exercise 2.17 A box contains N balls each marked with a number 1, 2, . . ., and N, respectively. Each of N students with identification (ID) numbers 1, 2, . . ., and N, respectively, chooses a ball randomly from the box. If the number marked on the ball and the ID number of the student are the same, then it is called a match.
(1) Find the probability of no match.
(2) Using conditional probability, obtain the probability in (1) again.
(3) Find the probability of k matches.

Exercise 2.18 In the interval [0, 1] on a line of real numbers, two points are chosen randomly. Find the probability that the distance between the two points is shorter than 1/2.

Exercise 2.19 Consider the probability space composed of the sample space S = {all pairs (k, m) of natural numbers} and probability measure

$$ P((k, m)) = \alpha (1-p)^{k+m-2}, \tag{2.E.1} $$

where α is a constant and 0 < p < 1.
(1) Determine the constant α. Then, obtain the probability P((k, m) : k ≥ m).
(2) Obtain the probability P((k, m) : k + m = r) as a function of r ∈ {2, 3, . . .}. Confirm that the result is a probability measure.
(3) Obtain the probability P((k, m) : k is an odd number).

Exercise 2.20 Obtain P(A ∩ B), P(A|B), and P(B|A) when P(A) = 0.7, P(B) = 0.5, and P([A ∪ B]^c) = 0.1.

Exercise 2.21 Three people shoot at a target. Let the event of a hit by the i-th person be A_i for i = 1, 2, 3 and assume the three events are independent of each other. When P(A_1) = 0.7, P(A_2) = 0.9, and P(A_3) = 0.8, find the probability that only two people will hit the target.

Exercise 2.22 In testing circuit elements, let A = {defective element} and B = {element identified as defective}, and P(B|A) = p, P(B^c|A^c) = q, P(A) = r, and P(B) = s. Because the test is not perfect, two types of errors could occur: a false negative, ‘a defective element is identified to be fine’; or a false positive, ‘a functional element is identified to be defective’. Assume that the production and testing of the elements can be adjusted such that the parameters p, q, r, and s are very close to 0 or 1.
(1) For each of the four parameters, explain whether it is more desirable to make it closer to 0 or 1.
(2) Describe the meaning of the conditional probabilities P(B^c|A) and P(B|A^c).
(3) Describe the meaning of the conditional probabilities P(A^c|B) and P(A|B^c).
(4) Given the values of the parameters p, q, r, and s, obtain the probabilities in (2) and (3).
(5) Obtain the sample space of this experiment.

Exercise 2.23 For three events A, B, and C, show the following results without using Venn diagrams:
(1) P(A ∪ B) = P(A) + P(B) − P(AB).
(2) P(A ∪ B ∪ C) = P(A) + P(B) + P(C) + P(ABC) − P(AB) − P(AC) − P(BC).
(3) Union upper bound, i.e.,

$$ P\left( \bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} P(A_i). \tag{2.E.2} $$

Exercise 2.24 For the sample space S and events E, F, and $\{B_i\}_{i=1}^{\infty}$, show that the conditional probability satisfies the axioms of probability as follows:
(1) 0 ≤ P(E|F) ≤ 1.
(2) P(S|F) = 1.
(3) $P\left( \bigcup_{i=1}^{\infty} B_i \,\middle|\, F \right) = \sum_{i=1}^{\infty} P(B_i | F)$ when the events $\{B_i\}_{i=1}^{\infty}$ are mutually exclusive.

Exercise 2.25 Assume an event B and a collection $\{A_i\}_{i=1}^{n}$ of events, where $\{A_i\}_{i=1}^{n}$ is a partition of the sample space S.
(1) Explain whether or not A_i and A_j for i ≠ j are independent of each other.
(2) Obtain a partition of B using $\{A_i\}_{i=1}^{n}$.
(3) Show the total probability theorem $P(B) = \sum_{i=1}^{n} P(B|A_i) P(A_i)$.
(4) Show the Bayes' theorem $P(A_k|B) = \frac{P(B|A_k) P(A_k)}{\sum_{i=1}^{n} P(B|A_i) P(A_i)}$.

Exercise 2.26 Box 1 contains two red and three green balls and Box 2 contains one red and four green balls. Obtain the probability of selecting a red ball when a ball is selected from a randomly chosen box.

Exercise 2.27 Box i contains i red and (n − i) green balls for i = 1, 2, . . . , n. Choosing Box i with probability $\frac{2i}{n(n+1)}$, obtain the probability of selecting a red ball when a ball is selected from the box chosen.

Exercise 2.28 A group of people elects one person via rock-paper-scissors. If there is only one person who wins, then the person is chosen; otherwise, the rock-paper-scissors is repeated. Assume that the probabilities of rock, paper, and scissors are each 1/3 for every person and not affected by other people. Obtain the probability p_{n,k} that n people will elect one person in k trials.

Exercise 2.29 In an election, Candidates A and B will get n and m votes, respectively. When n > m, find the probability that Candidate A will always have more counts than Candidate B during the ballot-counting.

Exercise 2.30 A type O cell is cultured at time 0. After one hour, the cell will become

$$ \begin{cases} \text{two type O cells}, & \text{probability} = \frac{1}{4}, \\ \text{one type O cell, one type M cell}, & \text{probability} = \frac{2}{3}, \\ \text{two type M cells}, & \text{probability} = \frac{1}{12}. \end{cases} \tag{2.E.3} $$

A new type O cell behaves like the first type O cell and a type M cell will disappear in one hour, where a change is not influenced by any other change. Find the probability β_0 that no type M cell will appear until n + 1/2 hours from the starting time.

Exercise 2.31 Find the probability of the event A that 5 or 6 appears k times when a fair die is rolled n times.

Exercise 2.32 Consider a communication channel for signals of binary digits (bits) 0 and 1. Due to the influence of noise, two types of errors can occur as shown in Fig. 2.19: specifically, 0 and 1 can be identified to be 1 and 0, respectively. Let the transmitted and received bits be X and Y, respectively. Assume a priori probability of P(X = 1) = p for 1 and P(X = 0) = 1 − p for 0, and the effect of noise on a bit is not influenced by that on other bits. Denote the probability that the received bit is i when the transmitted bit is i by P(Y = i|X = i) = p_ii for i = 0, 1.
(1) Obtain the probabilities p_10 = P(Y = 0|X = 1) and p_01 = P(Y = 1|X = 0) that an error occurs when bits 1 and 0 are transmitted, respectively.
(2) Obtain the probability that an error occurs.
(3) Obtain the probabilities P(Y = 1) and P(Y = 0) that the received bit is identified to be 1 and 0, respectively.
(4) Obtain all a posteriori probabilities P(X = j|Y = k) for j = 0, 1 and k = 0, 1.
(5) When p = 0.5, obtain P(X = 1|Y = 0), P(X = 1|Y = 1), P(Y = 1), and P(Y = 0) for a symmetric channel with p_00 = p_11.

Fig. 2.19 A binary communication channel (transmitted bits 0 and 1 are mapped to received bits 0 and 1 with transition probabilities p_00 = 1 − p_01, p_01, p_10, and p_11 = 1 − p_10)

Exercise 2.33 Assume a pile of n integrated circuits (ICs), among which m are defective ones. When an IC is chosen randomly from the pile, the probability that the IC is defective is α_1 = m/n as shown in Example 2.3.8.
(1) Assume we pick one IC and then one more IC without replacing the first one back to the pile. Obtain the probabilities α_{1,1}, α_{0,1}, α_{1,0}, and α_{0,0} that both are defective, the first one is not defective and the second one is defective, the first one is defective and the second one is not defective, and neither the first nor the second one is defective, respectively.
(2) Now assume we pick one IC and then one more IC after replacing the first one back to the pile. Obtain the probabilities α_{1,1}, α_{0,1}, α_{1,0}, and α_{0,0} again.
(3) Assume we pick two ICs randomly from the pile. Obtain the probabilities β_0, β_1, and β_2 that neither is defective, one is defective and the other is not defective, and both are defective, respectively.

Exercise 2.34 Box 1 contains two old and three new erasers and Box 2 contains one old and six new erasers. We perform the experiment “choose one box randomly and pick an eraser at random” twice, during which we discard the first eraser picked.
(1) Obtain the probabilities P_2, P_1, and P_0 that both erasers are old, one is old and the other is new, and both erasers are new, respectively.
(2) When both erasers are old, obtain the probability P_3 that one is from Box 1 and the other is from Box 2.

Exercise 2.35 The probability for a couple to have k children is αp^k with 0 < p < 1.
(1) The color of the eyes being brown for a child is of probability b and is independent of that of other children. Obtain the probability that the couple has r children with brown eyes.
(2) Assuming that a child being a girl or a boy is of probability 1/2, obtain the probability that the couple has r boys.
(3) Assuming that a child being a girl or a boy is of probability 1/2, obtain the probability that the couple has at least two boys when the couple has at least one boy.

Exercise 2.36 For the pmf $p(x) = {}_{r+x-1}C_{r-1}\, \alpha^{r} (1-\alpha)^{x}$, x ∈ J_0, introduced in (2.5.14), show

$$ \lim_{r \to \infty} p(x) = \frac{\lambda^{x}}{x!} e^{-\lambda}, \tag{2.E.4} $$

which implies $\lim_{r \to \infty} NB\left( r, \frac{r}{r+\lambda} \right) = P(\lambda)$.

Exercise 2.37 A person plans to buy a car of price N units. The person has k units and wishes to earn the remaining from a game. In the game, the person wins and loses 1 unit when the outcome is a head and a tail, respectively, from a toss of a coin with probability p for a head and q = 1 − p for a tail. Assuming 0 < k < N and the person continues the game until the person earns enough for the car or loses all the money, find the probability that the person loses all the money. This problem is called the gambler's ruin problem.

Exercise 2.38 A large number of bundles, each with 25 tulip bulbs, are contained in a large box. The bundles are of type R5 and R15 with portions 3/4 and 1/4, respectively. A type R5 bundle contains five red and twenty white bulbs and a type R15 bundle contains fifteen red and ten white bulbs. A bulb, chosen randomly from a bundle selected at random from the box, is planted.
(1) Obtain the probability p_1 that a red tulip blossoms.
(2) Obtain the probability p_2 that a white tulip blossoms.
(3) When a red tulip blossoms, obtain the conditional probability that the bulb is from a type R15 bundle.

Exercise 2.39 For a probability space with the sample space Ω = J_0 = {0, 1, . . .} and pmf

$$ p(x) = \begin{cases} 5c^2 + c, & x = 0; \\ 3 - 13c, & x = 1; \\ c, & x = 2; \\ 0, & \text{otherwise}; \end{cases} \tag{2.E.5} $$

determine the constant c.

Exercise 2.40 Show that

$$ \frac{x}{1 + x^2}\, \phi(x) < Q(x) < \frac{1}{2} \exp\left( -\frac{x^2}{2} \right) \tag{2.E.6} $$

for x > 0, where φ(x) denotes the standard normal pdf, i.e., (2.5.27) with m = 0 and σ² = 1, and

$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left( -\frac{t^2}{2} \right) dt. \tag{2.E.7} $$

Exercise 2.41 Balls with colors C_1, C_2, . . ., C_n are contained in k boxes. Let the probability of choosing Box B_j be P(B_j) = b_j and that of choosing a ball with color C_i from Box B_j be P(C_i|B_j) = c_ij, where $\sum_{i=1}^{n} c_{ij} = 1$ and $\sum_{j=1}^{k} b_j = 1$. A box is chosen first and then a ball is chosen from the box.
(1) Show that, if $\{c_{i1} = c_{i2} = \cdots = c_{ik}\}_{i=1}^{n}$, the color of the ball chosen is independent of the choice of a box, i.e., $\left\{ \left\{ P(C_i B_j) = P(C_i) P(B_j) \right\}_{i=1}^{n} \right\}_{j=1}^{k}$, for any values of $\{b_j\}_{j=1}^{k}$.
(2) When n = 2, k = 3, b_1 = b_3 = 1/4, and b_2 = 1/2, express the condition for P(C_1 B_1) = P(C_1)P(B_1) to hold true in terms of {c_11, c_12, c_13}.

Exercise 2.42 Boxes 1, 2, and 3 contain four red and five green balls, one red and one green balls, and one red and two green balls, respectively. Assume that the probabilities of the event B_i of choosing Box i are P(B_1) = P(B_3) = 1/4 and P(B_2) = 1/2. After a box is selected, a ball is chosen randomly from the box. Denote the events that the ball is red and green by R and G, respectively.
(1) Are the events B_1 and R independent of each other? Are the events B_1 and G independent of each other?
(2) Are the events B_2 and R independent of each other? Are the events B_3 and G independent of each other?

Exercise 2.43 For the sample space Ω = {1, 2, 3, 4} with $\{P(i) = \frac{1}{4}\}_{i=1}^{4}$, consider A_1 = {1, 3, 4}, A_2 = {2, 3, 4}, and A_3 = {3}. Are the three events A_1, A_2, and A_3 independent of each other?

Exercise 2.44 Consider two consecutive experiments with possible outcomes A and B for the first experiment and C and D for the second experiment. When P(AC) = 1/3, P(AD) = 1/6, P(BC) = 1/6, and P(BD) = 1/3, are A and C independent of each other?

Exercise 2.45 Two people make an appointment to meet between 10 and 11 o'clock. Find the probability that they can meet assuming that each person arrives at the meeting place between 10 and 11 o'clock independently and waits only up to 10 minutes.

Exercise 2.46 Consider two children. Assume any child can be a girl or a boy equally likely. Find the probability p_1 that both are boys when the elder is a boy and the probability p_2 that both are boys when at least one is a boy.

Exercise 2.47 There are three red and two green balls in Box 1, and four red and three green balls in Box 2. A ball is randomly chosen from Box 1 and put into Box 2. Then, a ball is picked from Box 2. Find the probability that the ball picked from Box 2 is red.

Exercise 2.48 Three people A, B, and C toss a coin each. The person whose outcome is different from those of the other two wins. If the three outcomes are the same, then the toss is repeated.
(1) Show that the game is fair, i.e., the probability of winning is the same for each of the three people.

(2) Find the probabilities that B wins exactly eight times and at least eight times when the coins are tossed ten times, not counting the number of no winner.

Exercise 2.49 A game called mighty can be played by three, four, or five players. When it is played with five players, 53 cards are used by adding one joker to a deck of 52 cards. Among the 53 cards, the ace of spades is called the mighty, except when the suit of spades^19 is declared the royal suit. In the play, ten cards are distributed to each of the five players $\{G_i\}_{i=1}^{5}$ and the remaining three cards are left on the table, face side down. Assume that what Player G_1 murmurs is always true and consider the two cases (A) Player G_1 murmurs “Oh! I do not have the joker.” and (B) Player G_1 murmurs “Oh! I have neither the mighty nor the joker.” For convenience, let the three cards on the table be Player G_6. Obtain the following probabilities and thereby confirm Table 2.2:
(1) Player G_i has the joker.
(2) Player G_i has the mighty.
(3) Player G_i has either the mighty or the joker.
(4) Player G_i has at least one of the mighty and the joker.
(5) Player G_i has both the mighty and the joker.

^19 When the suit of spades is declared the royal suit, the ace of diamonds, not the ace of spades, becomes the mighty.

Table 2.2 Some probabilities in the game mighty

(A) Player G_1 murmurs “Oh! I do not have the joker.”
Probability | Player G_1 | Players {G_i}, i = 2, ..., 5 | On the table (G_6)
(1) Probability of having the joker | 0 | 10/43 = 260/1118 | 3/43 = 78/1118
(2) Probability of having the mighty | 5/26 = 215/1118 | 105/559 = 210/1118 | 63/1118
(3) Probability of having either the mighty or joker | 5/26 = 215/1118 | 190/559 = 380/1118 | 135/1118
(4) Probability of having at least one of the mighty and joker | 5/26 = 215/1118 | 425/1118 | 69/559 = 138/1118
(5) Probability of having both the mighty and joker | 0 | 45/1118 | 3/1118

(B) Player G_1 murmurs “Oh! I have neither the mighty nor the joker.”
Probability | Player G_1 | Players {G_i}, i = 2, ..., 5 | On the table (G_6)
(1) Probability of having the joker | 0 | 10/43 = 70/301 | 3/43 = 21/301
(2) Probability of having the mighty | 0 | 10/43 = 70/301 | 3/43 = 21/301
(3) Probability of having either the mighty or joker | 0 | 110/301 | 40/301
(4) Probability of having at least one of the mighty and joker | 0 | 125/301 | 41/301
(5) Probability of having both the mighty and joker | 0 | 15/301 | 1/301

Exercise 2.50 In a group of 30 men and 20 women, 40% of men and 60% of women play piano. When a person in the group plays piano, find the probability that the person is a man.

Exercise 2.51 The probabilities that an automobile passing through a toll gate is a car, a truck, and a bus are 0.5, 0.3, and 0.2, respectively. Find the probability that 30 cars, 15 trucks, and 5 buses have passed when 50 automobiles have passed the toll gate.

References

N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
P.J. Bickel, K.A. Doksum, Mathematical Statistics (Holden-Day, San Francisco, 1977)
H.A. David, H.N. Nagaraja, Order Statistics, 3rd edn. (Wiley, New York, 2003)
R.M. Gray, L.D. Davisson, An Introduction to Statistical Signal Processing (Cambridge University Press, Cambridge, 2010)
A. Gut, An Intermediate Course in Probability (Springer, New York, 1995)
C.W. Helstrom, Probability and Stochastic Processes for Engineers, 2nd edn. (Prentice-Hall, Englewood Cliffs, 1991)
S. Kim, Mathematical Statistics (in Korean) (Freedom Academy, Paju, 2010)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn. (Prentice Hall, New York, 2008)
M. Loeve, Probability Theory, 4th edn. (Springer, New York, 1977)
E. Lukacs, Characteristic Functions, 2nd edn. (Griffin, London, 1970)
T.M. Mills, Problems in Probability (World Scientific, Singapore, 2001)
M.M. Rao, Measure Theory and Integration, 2nd edn. (Marcel Dekker, New York, 2004)
V.K. Rohatgi, A.K.Md.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New York, 1986)
S.M. Ross, A First Course in Probability (Macmillan, New York, 1976)
S.M. Ross, Stochastic Processes, 2nd edn. (Wiley, New York, 1996)
A.N. Shiryaev, Probability, 2nd edn. (Springer, New York, 1996)
A.A. Sveshnikov (ed.), Problems in Probability Theory, Mathematical Statistics and Theory of Random Functions (Dover, New York, 1968)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
P. Weirich, Conditional probabilities and probabilities given knowledge of a condition. Philos. Sci. 50(1), 82–95 (1983)
C.K. Wong, A note on mutually independent events. Am. Stat. 26(2), 27–28 (1972)

Chapter 3

Random Variables

Based on the description of probability in Chap. 2, let us now introduce and discuss several topics on random variables: namely, the notions of the cumulative distribution function, expected values, and moments. We will then discuss conditional distribution and describe some of the widely-used distributions.

3.1 Distributions

Let us start by introducing the notion of the random variable and its distribution (Gardner 1990; Leon-Garcia 2008; Papoulis and Pillai 2002). In describing the distributions of random variables, we adopt the notion of the cumulative distribution function, which is a useful tool in characterizing the probabilistic properties of random variables.

3.1.1 Random Variables

Generally, a random variable is a real function whose domain is a sample space. The range of a random variable X : Ω → R on a sample space Ω is S_X = {x : x = X(s), s ∈ Ω} ⊆ R. In fact, a random variable is not a variable but a function; yet, it is customary to call it a variable. In many cases, a random variable is denoted by an upper-case letter such as X, Y, ....

Definition 3.1.1 (random variable) For a sample space Ω of the outcomes from a random experiment, a function X that assigns a real number x = X(ω) to each ω ∈ Ω is called a random variable.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 I. Song et al., Probability and Random Variables: Theory and Applications, https://doi.org/10.1007/978-3-030-97679-8_3


For a more precise definition of a random variable, we need the concept of a measurable function.

Definition 3.1.2 (measurable function) Given a probability space (Ω, F, P), a real-valued function g that maps the sample space Ω into the set R of real numbers is called a measurable function when the condition

$$\text{if } B \in \mathcal{B}(\mathbb{R}), \text{ then } g^{-1}(B) \in \mathcal{F} \qquad (3.1.1)$$

is satisfied.

Example 3.1.1 A real-valued function g for which g^{-1}(D) is a Borel set for every open set D is called a Borel function, and is a measurable function. ♦

The random variable can be redefined as follows:

Definition 3.1.3 (random variable) A random variable is a measurable function defined on a probability space.

In general, showing whether or not a function is a random variable is rather complicated. However, (A) every function g : Ω → R on a probability space (Ω, F, P) whose event space F is the power set of Ω is a random variable, and (B) almost all functions that we will deal with, such as continuous functions, polynomials, the unit step function, trigonometric functions, limits of measurable functions, and the min and max of measurable functions, are random variables.

Example 3.1.2 (Romano and Siegel 1986) For the sample space Ω = {1, 2, 3} and event space F = {Ω, ∅, {3}, {1, 2}}, consider the function g such that g(1) = 1, g(2) = 2, and g(3) = 3. Then, g is not a random variable because g^{-1}({1}) = {1} ∉ F although {1} ∈ B(R). ♦

Random variables can be classified into the following three classes:

Definition 3.1.4 (discrete random variable; continuous random variable; hybrid random variable) A random variable is called a discrete random variable, continuous random variable, or hybrid random variable when the range is a countable set, an uncountable set, or the union of an uncountable set and a countable set, respectively.

A hybrid random variable is also called a mixed-type random variable. A discrete random variable with finite range is sometimes called a finite random variable. The probabilistic characteristics of a continuous random variable and a discrete random variable can be described by the pdf and pmf, respectively. In the meantime, based on Definitions 1.1.22 and 3.1.4, the range of a discrete random variable can be assumed to be a subset of {0, 1, ...} or {1, 2, ...}.
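Returning to Example 3.1.2, the measurability condition (3.1.1) can be checked mechanically when the sample space is finite. The sketch below is only an illustration under that finiteness assumption: because g^{-1}(B) depends only on the part of B inside the range of g, it is enough to test every subset of the range. The helper names `preimage` and `is_measurable`, and the second function `indicator_of_3`, are hypothetical and do not come from the text.

```python
from itertools import combinations


def preimage(g, omega, target):
    """Return g^{-1}(target) = {w in omega : g(w) in target} as a frozenset."""
    return frozenset(w for w in omega if g(w) in target)


def is_measurable(g, omega, event_space):
    """Check that g^{-1}(B) is an event for every subset B of the finite range of g."""
    values = sorted({g(w) for w in omega})
    events = {frozenset(e) for e in event_space}
    for size in range(len(values) + 1):
        for subset in combinations(values, size):
            if preimage(g, omega, set(subset)) not in events:
                return False
    return True


omega = {1, 2, 3}
event_space = [set(), {3}, {1, 2}, {1, 2, 3}]


def identity(w):
    """The function g of Example 3.1.2."""
    return w


def indicator_of_3(w):
    """A hypothetical function that only distinguishes {3} from {1, 2}."""
    return 1 if w == 3 else 0


print(is_measurable(identity, omega, event_space))        # False
print(is_measurable(indicator_of_3, omega, event_space))  # True
```

The first function fails because the inverse image of {1} is {1}, which is not an event; the second passes because all of its inverse images are ∅, {3}, {1, 2}, or Ω.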


Example 3.1.3 When the outcome from a rolling of a fair die is n, let $X_1(n) = n$ and

$$X_2(n) = \begin{cases} 0, & n \text{ is an odd number}, \\ 1, & n \text{ is an even number}. \end{cases} \qquad (3.1.2)$$

Then, $X_1$ and $X_2$ are both discrete random variables. ♦



Example 3.1.4 The random variables L, Θ, and D defined below are all continuous random variables. (1) When (x, y) denotes the coordinates of a randomly selected point Q inside the unit circle centered at the origin O, the length $L(Q) = \sqrt{x^2 + y^2}$ of OQ and the angle $\Theta(Q) = \tan^{-1}\frac{y}{x}$ formed by OQ and the positive x-axis. (2) The difference $D(r) = |r - \tilde{r}|$ between a randomly chosen real number r and its rounded integer $\tilde{r}$. ♦

Example 3.1.5 Assume the response g ∈ {responding, not responding, busy} in a phone call. Then, the length of a phone call is a random variable and can be expressed as

$$X(g) = t \qquad (3.1.3)$$

for t ≥ 0. Here, because P(X = 0) > 0 and X is continuous on (0, ∞), X is a hybrid random variable. ♦

3.1.2 Cumulative Distribution Function

Let X be a random variable defined on the probability space (Ω, F, P). Denote the range of X by A and denote the inverse image of B by $X^{-1}(B)$ for B ⊆ A. Then, we have

$$P_X(B) = P\left(X^{-1}(B)\right), \qquad (3.1.4)$$

which implies that the probability of an event is equal to the probability of the inverse image of the event. Based on (3.1.4) and the probability measure P of the original probability space (Ω, F, P), we can obtain the probability measure $P_X$ of the probability space induced by the random variable X.

Example 3.1.6 Consider a rolling of a fair die and assume $P(\omega) = \frac{1}{6}$ for ω ∈ Ω = {1, 2, ..., 6}. Define a random variable X by X(ω) = −1 for ω = 1, X(ω) = −2 for ω = 2, 3, 4, and X(ω) = −3 for ω = 5, 6. Then, we have A = {−3, −2, −1}. Logically, $X^{-1}(\{-3\}) = \{5, 6\}$, $X^{-1}(\{-2\}) = \{2, 3, 4\}$, and $X^{-1}(\{-1\}) = \{1\}$. Now, the probability measure of the random variable X can be obtained as $P_X(\{-3\}) = P\left(X^{-1}(\{-3\})\right) = P(\{5, 6\}) = \frac{1}{3}$, $P_X(\{-2\}) = P(\{2, 3, 4\}) = \frac{1}{2}$, and $P_X(\{-1\}) = P(\{1\}) = \frac{1}{6}$. ♦

We now describe in detail the distribution of a random variable based on (3.1.4), and then define a function with which the distribution can be managed more conveniently. Consider a random variable X defined on the probability space (Ω, F, P) and the range A ⊆ R of X. When B ∈ B(A), the set

$$X^{-1}(B) = \{\omega : X(\omega) \in B\} \qquad (3.1.5)$$

is a subset of Ω and, at the same time, an element of the event space F due to the definition of a random variable. Based on the set $X^{-1}(B)$ shown in (3.1.5), the distribution of the random variable X can be defined as follows:

Definition 3.1.5 (distribution) The set function

$$P_X(B) = P\left(X^{-1}(B)\right) = P\left(\{\omega : X(\omega) \in B\}\right) \qquad (3.1.6)$$

for B ∈ B(A) represents the probability measure of X and is called the distribution of the random variable X, where A is the range of X and B(A) is the Borel field of A.

In essence, the distribution of X is a function representing the probabilistic characteristics of the random variable X. The probability measure $P_X$ in (3.1.6) induces a new probability space (A, B(A), $P_X$): a consequence is that we can now deal not with the original probability space (Ω, F, P) but with the equivalent probability space (A, B(A), $P_X$), where the sample points are all real numbers. Figure 3.1 shows the relationship (3.1.6).

[Fig. 3.1 The distribution of random variable X: the set $X^{-1}(B) \subseteq \Omega$ is mapped by X onto $B \subseteq \mathbb{R}$, with $P_X(B) = P(\{\omega : X(\omega) \in B\}) = P(X^{-1}(B))$]

The distribution of a random variable can be described by the pmf or pdf as we have observed in Chap. 2. First, for a discrete random variable X with range A, the pmf $p_X$ of X can be obtained as

$$p_X(x) = P_X(\{x\}), \quad x \in A \qquad (3.1.7)$$

from the distribution $P_X$, which in turn can be expressed as

$$P_X(B) = \sum_{x \in B} p_X(x), \quad B \in \mathcal{B}(A) \qquad (3.1.8)$$

in terms of the pmf $p_X$. For a continuous random variable X with range A and pdf $f_X$, we have

$$P_X(B) = \int_B f_X(x)\,dx, \quad B \in \mathcal{B}(A), \qquad (3.1.9)$$

which is the counterpart of (3.1.8): note that the counterpart of (3.1.7) does not exist for a continuous random variable.

Definition 3.1.6 (cumulative distribution function) Assume a random variable X on a sample space Ω, and let $A_x = \{s : s \in \Omega, X(s) \le x\}$ for a real number x. Then, we have $P(A_x) = P(X \le x) = P_X((-\infty, x])$ and the function

$$F_X(x) = P_X((-\infty, x]) \qquad (3.1.10)$$

is called the distribution function or cumulative distribution function (cdf) of the random variable X.

The cdf $F_X(x)$ denotes the probability that X is located in the half-open interval (−∞, x]. For example, $F_X(2)$ is the probability that X is in the half-open interval (−∞, 2], i.e., the probability of the event {−∞ < X ≤ 2}. The pmf and cdf for a discrete random variable and the pdf and cdf for a continuous random variable can be expressed in terms of each other, as we shall see in (3.1.24), (3.1.32), and (3.1.33) later.

The probabilistic characteristics of a random variable can be described by the cdf, pdf, or pmf: these three functions are all frequently referred to as the distribution function, probability distribution function, or probability function. In some cases, only the cdf is called the distribution function, and probability function in the strict sense only indicates the probability measure P as mentioned in Sect. 2.2.3. In some fields such as statistics, the name distribution function is frequently used while the name cdf is widespread in other fields including engineering.

Example 3.1.7 Let the outcome from a rolling of a fair die be X. Then, we can obtain the cdf $F_X(x) = P(X \le x)$ of X as

$$F_X(x) = \begin{cases} 1, & x \ge 6, \\ \frac{i}{6}, & i \le x < i + 1,\ i = 1, 2, 3, 4, 5, \\ 0, & x < 1, \end{cases} \qquad (3.1.11)$$

which is shown in Fig. 3.2. ♦
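The induced measure of Example 3.1.6 and the step-shaped cdf of Example 3.1.7 can both be reproduced on a finite sample space. The following is a minimal sketch, assuming the measure P is given as a dictionary from outcomes to probabilities; the function names `induced_pmf` and `cdf` are illustrative choices and not notation from the text.

```python
from fractions import Fraction


def induced_pmf(P, X):
    """Push P forward through X: p_X(x) = P(X^{-1}({x})), as in (3.1.6) and (3.1.7)."""
    pmf = {}
    for outcome, prob in P.items():
        value = X(outcome)
        pmf[value] = pmf.get(value, Fraction(0)) + prob
    return pmf


def cdf(pmf, x):
    """F_X(x) = P_X((-inf, x]) for a discrete random variable, as in (3.1.10)."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


# Fair die: P(omega) = 1/6 for omega = 1, ..., 6.
P = {omega: Fraction(1, 6) for omega in range(1, 7)}


def X(omega):
    """The random variable of Example 3.1.6."""
    if omega == 1:
        return -1
    if omega in (2, 3, 4):
        return -2
    return -3


p_X = induced_pmf(P, X)
print(p_X[-3], p_X[-2], p_X[-1])  # 1/3 1/2 1/6

# Example 3.1.7: X(omega) = omega, so the cdf is a staircase with steps of 1/6.
p_die = induced_pmf(P, lambda omega: omega)
print(cdf(p_die, 2.5))  # 1/3
```

Exact fractions are used so that the values can be compared directly with those in the examples.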



[Fig. 3.2 The cdf $F_X(x)$ of the number X resulting from a rolling of a fair die]

[Fig. 3.3 The cdf $F_Y(x)$ of the coordinate Y chosen randomly in the interval [0, 1]]

Example 3.1.8 Let the coordinate Y be a number chosen randomly in the interval [0, 1]. Then, P(Y ≤ x) = 1, x, and 0 when x ≥ 1, 0 ≤ x < 1, and x < 0, respectively. Therefore, the cdf of Y is

$$F_Y(x) = \begin{cases} 1, & x \ge 1, \\ x, & 0 \le x < 1, \\ 0, & x < 0. \end{cases} \qquad (3.1.12)$$

Figure 3.3 shows the cdf $F_Y(x)$. ♦



Theorem 3.1.1 The cdf is a non-decreasing function: that is, $F(x_1) \le F(x_2)$ when $x_1 < x_2$ for a cdf F. In addition, we have F(∞) = 1 and F(−∞) = 0.

From the definition of the cdf and probability measure, it is clear that

$$P(X > x) = 1 - F_X(x) \qquad (3.1.13)$$

because $P(A^c) = 1 - P(A)$ and that

$$P(a < X \le b) = F_X(b) - F_X(a) \qquad (3.1.14)$$

for a ≤ b. In addition, at a discontinuity point $x_D$ of a cdf $F_X(x)$, we have $F_X(x_D) = F_X\left(x_D^+\right)$ and

$$F_X(x_D) - F_X\left(x_D^-\right) = P(X = x_D) \qquad (3.1.15)$$

for a discrete or a hybrid random variable as shown in Fig. 3.4. On the other hand, the probability of one point is 0 for a continuous random variable: in other words, we have

$$P(X = x) = 0 \qquad (3.1.16)$$

and

$$F_X(x) - F_X\left(x^-\right) = 0 \qquad (3.1.17)$$

for a continuous random variable X.

[Fig. 3.4 An example of the cdf of a hybrid random variable, with $F_X\left(x_D^+\right) = F_X(x_D)$ and a jump of size $P(X = x_D)$ above $F_X\left(x_D^-\right)$ at $x_D$]

Theorem 3.1.2 The cdf is continuous from the right. That is,

$$F_X(x) = \lim_{\varepsilon \to 0} F_X(x + \varepsilon), \quad \varepsilon > 0 \qquad (3.1.18)$$

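for a cdf $F_X$.

Proof Consider a sequence $\{\alpha_i\}_{i=1}^{\infty}$ such that $\alpha_{i+1} \le \alpha_i$ and $\lim_{i \to \infty} \alpha_i = 0$. Then,

$$\lim_{\varepsilon \to 0} F_X(x + \varepsilon) - F_X(x) = \lim_{i \to \infty} \{F_X(x + \alpha_i) - F_X(x)\} = \lim_{i \to \infty} P\left(X \in (x, x + \alpha_i]\right). \qquad (3.1.19)$$

Now, we have $\lim_{i \to \infty} P\left(X \in (x, x + \alpha_i]\right) = \lim_{i \to \infty} P_X\left((x, x + \alpha_i]\right) = P_X\left(\lim_{i \to \infty} (x, x + \alpha_i]\right)$ from (2.A.1) because $\{(x, x + \alpha_i]\}_{i=1}^{\infty}$ is a monotonic sequence. Subsequently, we have $P_X\left(\lim_{i \to \infty} \{(x, x + \alpha_i]\}\right) = P_X\left(\cap_{i=1}^{\infty} (x, x + \alpha_i]\right) = P_X(\emptyset)$ from (1.5.9) and $\cap_{i=1}^{\infty} (x, x + \alpha_i] = \emptyset$ as shown, for instance, in Example 1.5.9. In other words,

$$\lim_{i \to \infty} P\left(X \in (x, x + \alpha_i]\right) = 0, \qquad (3.1.20)$$

completing the proof. ♠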

Example 3.1.9 (Loeve 1977) Let the probability measure and corresponding cdf be P and F, respectively. When g is an integrable function,

$$\int g\,dP \quad \text{or} \quad \int g\,dF \qquad (3.1.21)$$

is called the Lebesgue-Stieltjes integral and is often written as, for instance,

$$\int_{[a,b)} g\,dP = \int_a^b g\,dF. \qquad (3.1.22)$$

When F(x) = x for x ∈ [0, 1], the measure P is called the Lebesgue measure as mentioned in Definition 2.A.7, and

$$\int_{[a,b)} g\,dx = \int_a^b g\,dx \qquad (3.1.23)$$

is the Lebesgue integral. If g is continuous on [a, b], then the Lebesgue-Stieltjes integral $\int_a^b g\,dF$ is the Riemann-Stieltjes integral, and the Lebesgue integral $\int_a^b g\,dx$ is the Riemann integral. ♦

As we have already seen in Examples 3.1.7 and 3.1.8, subscripts are used to distinguish the cdf's of several random variables as in $F_X$ and $F_Y$. In addition, when the cdf $F_X$ and pdf $f_X$ are for the random variable X with the distribution $P_X$, this is denoted by $X \sim P_X$, $X \sim F_X$, or $X \sim f_X$. For example, X ∼ P(λ) means that the random variable X follows the Poisson distribution with parameter λ, X ∼ U[a, b) means that the distribution of the random variable X is the uniform distribution over [a, b), and $Y \sim f_Y(t) = e^{-t}u(t)$ means that the pdf of the random variable Y is $f_Y(t) = e^{-t}u(t)$.

Theorem 3.1.3 A cdf may have at most countably many jump discontinuities.

Proof Assume the cdf F(x) is discontinuous at $x_0$. Denote by $D_n$ the set of discontinuities with the jump in the half-open interval $\left(\frac{1}{n+1}, \frac{1}{n}\right]$, where n is a natural number. Then, the number of elements in $D_n$ is at most n, because otherwise F(∞) − F(−∞) > 1. In other words, there exists at most one discontinuity with jump between $\frac{1}{2}$ and 1, at most two discontinuities with jump between $\frac{1}{3}$ and $\frac{1}{2}$, ..., at most n − 1 discontinuities with jump between $\frac{1}{n}$ and $\frac{1}{n-1}$, .... Therefore, the number of discontinuities is at most countable. ♠


Theorem 3.1.3 is a special case of the more general result that a function which is continuous from the right-hand side or left-hand side at all points, as well as a monotonic real function, may have at most countably many jump discontinuities.

Based on the properties of the cdf, we can now redefine the continuous, discrete, and hybrid random variables as follows:

Definition 3.1.7 (discrete random variable; continuous random variable; hybrid random variable) A continuous, discrete, or hybrid random variable is a random variable whose cdf is a continuous function, a step-like function, or a discontinuous but not step-like function, respectively.

Here, when a function is increasing only at some points and is constant in a closed interval not containing the points, the function is called a step-like function. The cdf shown in Fig. 3.4 is an example of the cdf of a hybrid random variable: it is not continuous at the point $x_D$.

3.1.3 Probability Density Function and Probability Mass Function

In characterizing the probabilistic properties of continuous and discrete random variables, we can use a pdf and a pmf, respectively. In addition, the cdf can also be employed for the three classes of random variables: the continuous, discrete, and hybrid random variables. Let us denote the cdf of a random variable X by $F_X$, the pdf by $f_X$ when X is a continuous random variable, and the pmf by $p_X$ when X is a discrete random variable. Then, the cdf $F_X(x) = P_X((-\infty, x])$ can be expressed as

$$F_X(x) = \begin{cases} \int_{-\infty}^{x} f_X(y)\,dy, & \text{if } X \text{ is a continuous random variable}, \\ \sum_{y=-\infty}^{x} p_X(y), & \text{if } X \text{ is a discrete random variable}. \end{cases} \qquad (3.1.24)$$

When X is a hybrid random variable, we have for 0 < α < 1

$$F_X(x) = \alpha \sum_{k=-\infty}^{x} p_X(k) + (1 - \alpha) \int_{-\infty}^{x} f_X(y)\,dy, \qquad (3.1.25)$$

which is sufficiently general for us to deal with in this book. Note that, as described in Appendix 3.1, the most general cdf is a weighted sum of an absolutely continuous function, a discrete function, and a singular function.

The probability $P_X(B) = \int_B dF_X(x)$ of an event B can be obtained as

$$P_X(B) = \begin{cases} \int_B f_X(x)\,dx, & \text{for a continuous random variable}, \\ \sum_{x \in B} p_X(x), & \text{for a discrete random variable}. \end{cases} \qquad (3.1.26)$$

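Example 3.1.10 Consider a Rayleigh random variable R. Then, from the pdf $f_R(x) = \frac{x}{\alpha^2} \exp\left(-\frac{x^2}{2\alpha^2}\right) u(x)$, the cdf $F_R(x) = \int_{-\infty}^{x} \frac{t}{\alpha^2} \exp\left(-\frac{t^2}{2\alpha^2}\right) u(t)\,dt$ is easily obtained as

$$F_R(x) = \left\{1 - \exp\left(-\frac{x^2}{2\alpha^2}\right)\right\} u(x). \qquad (3.1.27)$$

When α = 1, the probability of the event {1 < R < 2} is $\int_1^2 f_R(t)\,dt = F_R(2) - F_R(1) = \frac{1}{\sqrt{e}} - e^{-2} \approx 0.4712$. ♦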
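The value in Example 3.1.10 can be checked in two ways: from the closed-form cdf (3.1.27) and by integrating the pdf numerically. The sketch below uses a hand-written composite Simpson rule so that it needs only the standard library; the function names and the choice of 1000 subintervals are illustrative assumptions.

```python
import math


def rayleigh_pdf(x, alpha=1.0):
    """Rayleigh pdf (x / alpha^2) exp(-x^2 / (2 alpha^2)) u(x)."""
    if x < 0:
        return 0.0
    return x / alpha**2 * math.exp(-x**2 / (2 * alpha**2))


def rayleigh_cdf(x, alpha=1.0):
    """Rayleigh cdf (3.1.27)."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-x**2 / (2 * alpha**2))


def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


by_cdf = rayleigh_cdf(2.0) - rayleigh_cdf(1.0)
by_integral = simpson(rayleigh_pdf, 1.0, 2.0)
print(round(by_cdf, 4), round(by_integral, 4))  # 0.4712 0.4712
```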

Theorem 3.1.4 The cdf $F_X$ satisfies

$$1 - F_X(x) = F_X(-x) \qquad (3.1.28)$$

when the pdf $f_X$ is an even function.

Proof First, $P(X > x) = \int_x^{\infty} f_X(y)\,dy = \int_{-x}^{-\infty} f_X(-t)(-dt) = \int_{-\infty}^{-x} f_X(t)\,dt = F_X(-x)$ because $f_X(x) = f_X(-x)$. Recollecting (3.1.13), we get (3.1.28). ♠

Example 3.1.11 Consider the pdf's $f_L(x) = \frac{k e^{-kx}}{\left(1 + e^{-kx}\right)^2}$ for k > 0 of the logistic distribution (Balakrishnan 1992) and $f_D(x) = \frac{\lambda}{2} e^{-\lambda |x|}$ for λ > 0 of the double exponential distribution. The cdf's of these distributions are

$$F_L(x) = \frac{1}{1 + e^{-kx}} \qquad (3.1.29)$$

and

$$F_D(x) = \begin{cases} \frac{1}{2} e^{\lambda x}, & x \le 0, \\ 1 - \frac{1}{2} e^{-\lambda x}, & x \ge 0, \end{cases} \qquad (3.1.30)$$

respectively, for which Theorem 3.1.4 is easily confirmed. ♦
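The confirmation mentioned at the end of Example 3.1.11 can be carried out numerically. The following sketch evaluates the gap |1 − F(x) − F(−x)| for the cdf's (3.1.29) and (3.1.30) at a handful of arbitrarily chosen points; the parameter values and the helper `max_symmetry_gap` are illustrative assumptions.

```python
import math


def logistic_cdf(x, k=1.5):
    """Logistic cdf (3.1.29)."""
    return 1.0 / (1.0 + math.exp(-k * x))


def double_exponential_cdf(x, lam=0.7):
    """Double exponential cdf (3.1.30)."""
    if x <= 0:
        return 0.5 * math.exp(lam * x)
    return 1.0 - 0.5 * math.exp(-lam * x)


def max_symmetry_gap(cdf, points):
    """Largest value of |1 - F(x) - F(-x)| over the given points; zero under (3.1.28)."""
    return max(abs(1.0 - cdf(x) - cdf(-x)) for x in points)


points = [-3.0, -1.2, 0.0, 0.4, 2.5]
print(max_symmetry_gap(logistic_cdf, points) < 1e-12)            # True
print(max_symmetry_gap(double_exponential_cdf, points) < 1e-12)  # True
```

Both gaps vanish up to floating-point rounding, as (3.1.28) predicts for even pdf's.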

Example 3.1.12 We have the cdf

$$F_C(x) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1}\left(\frac{x - \beta}{\alpha}\right) \qquad (3.1.31)$$

for the Cauchy distribution with pdf $f_C(r) = \frac{\alpha}{\pi} \left\{(r - \beta)^2 + \alpha^2\right\}^{-1}$ shown in (2.5.28). ♦

From (3.1.24), we can easily see that the pdf and pmf can be obtained as

$$f_X(x) = \frac{d}{dx} F_X(x) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} P(x < X \le x + \varepsilon) \qquad (3.1.32)$$

and

$$p_X(x_i) = F_X(x_i) - F_X(x_{i-1}) \qquad (3.1.33)$$

from the cdf when X is a continuous random variable and a discrete random variable, respectively.

For a discrete random variable, a pmf is normally used. Yet, we can also define the pdf of a discrete random variable using the impulse function as we have observed in (2.5.37). Specifically, let the cdf and pmf of a discrete random variable X be $F_X$ and $p_X$, respectively. Then, based on $F_X(x) = \sum_{x_i \le x} p_X(x_i) = \sum_i p_X(x_i)\,u(x - x_i)$, we can regard

$$f_X(x) = \frac{d}{dx} \sum_i p_X(x_i)\,u(x - x_i) = \sum_i p_X(x_i)\,\delta(x - x_i) \qquad (3.1.34)$$

as the pdf of X.

Example 3.1.13 For the pdf f(x) = 2x for x ∈ [0, 1] and 0 otherwise, sketch the cdf.

Solution Obtaining the cdf $F(x) = \int_{-\infty}^{x} f(t)\,dt$, we get

$$F(x) = \begin{cases} 0, & x < 0; \\ x^2, & 0 \le x < 1; \\ 1, & x \ge 1, \end{cases} \qquad (3.1.35)$$

which is shown in Fig. 3.5 together with the pdf. ♦

[Fig. 3.5 The cdf F(x) for the pdf f(x) = 2x u(x) u(1 − x)]

Example 3.1.14 For the pdf

$$f(x) = \frac{1}{2}\{u(x) - u(x - 1)\} + \frac{1}{3}\delta(x - 1) + \frac{1}{6}\delta(x - 2), \qquad (3.1.36)$$

obtain and sketch the cdf.

Solution First, we get the cdf $F(x) = \int_{-\infty}^{x} f(t)\,dt$ as

$$F(x) = \begin{cases} 0, & x < 0; \\ \frac{x}{2}, & 0 \le x < 1; \\ \frac{5}{6}, & 1 \le x < 2; \\ 1, & 2 \le x, \end{cases} \qquad (3.1.37)$$

which is shown in Fig. 3.6 together with the pdf (3.1.36). ♦

[Fig. 3.6 The pdf $f(x) = \frac{1}{2}\{u(x) - u(x - 1)\} + \frac{1}{3}\delta(x - 1) + \frac{1}{6}\delta(x - 2)$ and cdf F(x)]

Example 3.1.15 Let X be the face of a die from a rolling. Then, the cdf of X is $F_X(x) = \frac{1}{6} \sum_{i=1}^{6} u(x - i)$, from which we get the pdf

$$f_X(x) = \frac{1}{6} \sum_{i=1}^{6} \delta(x - i) \qquad (3.1.38)$$

of X by differentiation. In addition,

$$p_X(i) = \begin{cases} \frac{1}{6}, & i = 1, 2, \ldots, 6, \\ 0, & \text{otherwise} \end{cases} \qquad (3.1.39)$$

is the pmf of X. ♦


Example 3.1.16 The function (2.5.37) addressed in Example 2.5.23 is the pdf of a hybrid random variable. ♦

Example 3.1.17 A box contains G green and B blue balls. Assume we take one ball from the box n times without¹ replacement. Obtain the pmf of the number X of green balls among the n balls taken from the box.

Solution We easily get the probability of X = k as

$$P(X = k) = \frac{{}_{B}C_{n-k}\;{}_{G}C_{k}}{{}_{G+B}C_{n}} \qquad (3.1.40)$$

for {0 ≤ k ≤ G, 0 ≤ n − k ≤ B} or, equivalently, for max(0, n − B) ≤ k ≤ min(n, G). Thus, the pmf of X is

$$p_X(k) = \begin{cases} \frac{{}_{B}C_{n-k}\;{}_{G}C_{k}}{{}_{G+B}C_{n}}, & k = \check{k}, \check{k} + 1, \ldots, \min(n, G), \\ 0, & \text{otherwise}, \end{cases} \qquad (3.1.41)$$

where² $\check{k} = \max(0, n - B)$. In addition, (3.1.41) will become $p_X(k) = 1$ for k = 0 and 0 for k ≠ 0 when G = 0, and $p_X(k) = 1$ for k = n and 0 for k ≠ n when B = 0. The distribution with replacement of the balls will be addressed in Exercise 3.5. ♦

¹ The distribution of X is a hypergeometric distribution.
² Here, 'max(0, n − B) ≤ k ≤ min(n, G)' can be replaced with 'all integers k' by noting that ${}_{p}C_{q} = 0$ for q < 0 or q > p when p is a non-negative integer and q is an integer from Table 1.4.

For a random variable X with pdf $f_X$ and cdf $F_X$, noting that the cdf is continuous from the right-hand side, the probability of the event $\{x_1 < X \le x_2\}$ shown in (3.1.14) can be obtained as

$$P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1) = \int_{x_1^+}^{x_2^+} f_X(x)\,dx. \qquad (3.1.42)$$

Example 3.1.18 For the random variable Z with pdf $f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$, we have $P(|Z| \le 1) = \int_{-1}^{1} f_Z(z)\,dz \approx 0.6827$, P(|Z| ≤ 2) ≈ 0.9545, and P(|Z| ≤ 3) ≈ 0.9973. ♦

Using (3.1.42), the value F(∞) = 1 mentioned in Theorem 3.1.1 can be confirmed as $\int_{-\infty}^{\infty} f(x)\,dx = P(-\infty < X \le \infty) = F(\infty) = 1$. Let us mention that although $P(x_1 \le X < x_2) = \int_{x_1^-}^{x_2^-} f_X(x)\,dx$, $P(x_1 \le X \le x_2) = \int_{x_1^-}^{x_2^+} f_X(x)\,dx$, and $P(x_1 < X < x_2) = \int_{x_1^+}^{x_2^-} f_X(x)\,dx$ are slightly different from (3.1.42), these four probabilities are all equal to each other unless the pdf $f_X$ contains impulse functions at $x_1$ or $x_2$.

As observed, for instance, in Example 3.1.15, considering a continuous random variable with the pdf is very similar to considering a discrete random variable with the pmf. Therefore, we will henceforth focus on discussing a continuous random variable with the pdf. One final point is that

$$\lim_{x \to \pm\infty} f(x) = 0 \qquad (3.1.43)$$

and

$$\lim_{x \to \pm\infty} f'(x) = 0 \qquad (3.1.44)$$

hold true for all the pdf's f we will discuss in this book.
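Two of the computations in this subsection lend themselves to a quick numerical check: the hypergeometric pmf (3.1.41) of Example 3.1.17 and the normal probabilities of Example 3.1.18. The sketch below relies only on the standard library (`math.comb` and `math.erf`); the box composition G = 5, B = 7, n = 4 is a hypothetical choice made purely for illustration, and the identity P(|Z| ≤ c) = erf(c/√2) is a standard fact that is not derived in the text.

```python
import math
from fractions import Fraction


def hypergeometric_pmf(k, G, B, n):
    """p_X(k) from (3.1.41): k green balls among n drawn without replacement."""
    if k < max(0, n - B) or k > min(n, G):
        return Fraction(0)
    return Fraction(math.comb(B, n - k) * math.comb(G, k), math.comb(G + B, n))


def prob_abs_standard_normal_within(c):
    """P(|Z| <= c) for Z ~ N(0, 1), written with the error function."""
    return math.erf(c / math.sqrt(2.0))


# Hypothetical box: 5 green balls, 7 blue balls, 4 draws.
G, B, n = 5, 7, 4
pmf = [hypergeometric_pmf(k, G, B, n) for k in range(n + 1)]
print(sum(pmf))  # 1

# Degenerate cases noted after (3.1.41).
print(hypergeometric_pmf(0, 0, 7, 4))  # 1 (no green balls: X = 0 surely)
print(hypergeometric_pmf(4, 5, 0, 4))  # 1 (no blue balls: X = n surely)

print([round(prob_abs_standard_normal_within(c), 4) for c in (1, 2, 3)])
# [0.6827, 0.9545, 0.9973]
```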

3.2 Functions of Random Variables and Their Distributions

In this section, when the cdf $F_X$, pdf $f_X$, or pmf $p_X$ of a random variable X is known, we obtain the probability functions of a new random variable Y = g(X), where g is a measurable function (Middleton 1960).

3.2.1 Cumulative Distribution Function

First, the cdf $F_Y(v) = P(Y \le v) = P(g(X) \le v)$ of Y = g(X) can be obtained as

$$F_Y(v) = P(x : g(x) \le v,\ x \in A), \qquad (3.2.1)$$

where A is the sample space of the random variable X. Using (3.2.1), the pdf or pmf of Y can be obtained subsequently: specifically, we can obtain the pdf of Y as

$$f_Y(v) = \frac{d}{dv} F_Y(v) \qquad (3.2.2)$$

when Y is a continuous random variable, and the pmf of Y as

$$p_Y(v) = F_Y(v) - F_Y(v - 1) \qquad (3.2.3)$$

when Y is a discrete random variable. The result (3.2.3) is for a random variable whose range is a subset of integers as described after Definition 3.1.4: more generally, we can write it as

$$p_Y(v_i) = F_Y(v_i) - F_Y(v_{i-1}) \qquad (3.2.4)$$

when $A = \{v_1, v_2, \ldots\}$ instead of A = {0, 1, ...}.

Example 3.2.1 Obtain the cdf $F_Y$ of Y = aX + b in terms of the cdf $F_X$ of X, where a ≠ 0.

Solution We have the cdf $F_Y(y) = P(Y \le y) = P(aX + b \le y)$ as

$$F_Y(y) = \begin{cases} P\left(X \le \frac{y - b}{a}\right), & a > 0, \\ P\left(X \ge \frac{y - b}{a}\right), & a < 0 \end{cases} = \begin{cases} F_X\left(\frac{y - b}{a}\right), & a > 0, \\ P\left(X = \frac{y - b}{a}\right) + 1 - F_X\left(\frac{y - b}{a}\right), & a < 0 \end{cases} \qquad (3.2.5)$$

by noting that the set {Y ≤ y} is equivalent to the set {aX + b ≤ y}. ♦



Example 3.2.2 When the random variable X has the cdf

$$F_X(x) = \begin{cases} 0, & x \le 0; \\ x, & 0 \le x \le 1; \\ 1, & x \ge 1; \end{cases} \qquad (3.2.6)$$

the cdf of Y = 2X + 1 is

$$F_Y(y) = \begin{cases} 0, & y \le 1; \\ \frac{y - 1}{2}, & 1 \le y \le 3; \\ 1, & y \ge 3; \end{cases} \qquad (3.2.7)$$

which are shown in Fig. 3.7. ♦

[Fig. 3.7 The cdf $F_X(x)$ of X and cdf $F_Y(y)$ of Y = 2X + 1]

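Example 3.2.3 For a continuous random variable X with cdf $F_X$, obtain the cdf of $Y = \frac{1}{X}$.

Solution We get

$$F_Y(y) = P\left(X(yX - 1) \ge 0\right) = \begin{cases} P\left(X \le 0 \text{ or } X \ge \frac{1}{y}\right), & y > 0, \\ P(X \le 0), & y = 0, \\ P\left(\frac{1}{y} \le X \le 0\right), & y < 0 \end{cases} = \begin{cases} F_X(0) + 1 - F_X\left(\frac{1}{y}\right), & y > 0, \\ F_X(0), & y = 0, \\ F_X(0) - F_X\left(\frac{1}{y}\right), & y < 0, \end{cases} \qquad (3.2.8)$$

by noting that $\left\{\frac{1}{X} \le y\right\} = \left\{X \le yX^2\right\} = \{(yX - 1)X \ge 0\}$. ♦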



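Example 3.2.4 Obtain the cdf of $Y = aX^2$ in terms of the cdf $F_X$ of X when a > 0.

Solution Because the set {Y ≤ y} is equivalent to the set $\{aX^2 \le y\}$, the cdf of Y can be obtained as $F_Y(y) = P(Y \le y) = P\left(X^2 \le \frac{y}{a}\right)$, i.e.,

$$F_Y(y) = \begin{cases} 0, & y < 0, \\ P\left(-\sqrt{\frac{y}{a}} \le X \le \sqrt{\frac{y}{a}}\right), & y \ge 0, \end{cases} \qquad (3.2.9)$$

which can be rewritten as

$$F_Y(y) = \begin{cases} 0, & y < 0, \\ F_X\left(\sqrt{\frac{y}{a}}\right) - F_X\left(-\sqrt{\frac{y}{a}}\right) + P\left(X = -\sqrt{\frac{y}{a}}\right), & y \ge 0 \end{cases} = \left\{F_X\left(\sqrt{\frac{y}{a}}\right) - F_X\left(-\sqrt{\frac{y}{a}}\right) + P\left(X = -\sqrt{\frac{y}{a}}\right)\right\} u(y) \qquad (3.2.10)$$

in terms of the cdf $F_X$ of X. In (3.2.10), it is assumed u(0) = 1. ♦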

Example 3.2.5 Based on the result of Example 3.2.4, obtain the cdf of $Y = X^2$ when $F_X(x) = \left(1 - \frac{2}{3} e^{-x}\right) u(x)$.

Solution For convenience, let $\alpha = \ln \frac{2}{3}$. Then, $e^{\alpha} = \frac{2}{3}$. Recollecting that $P(X = 0) = F_X\left(0^+\right) - F_X\left(0^-\right) = \frac{1}{3} - 0 = \frac{1}{3}$ in (3.2.10), we have $F_Y(x) = 0$ when x < 0 and $F_Y(0) = \{1 - \exp(\alpha)\} - \{1 - \exp(\alpha)\} + P(X = 0) = \frac{1}{3}$ when x = 0. When x > 0, recollecting that $F_X\left(\sqrt{x}\right) = \left\{1 - \exp\left(-\sqrt{x} + \alpha\right)\right\} u\left(\sqrt{x}\right) = 1 - \exp\left(-\sqrt{x} + \alpha\right)$ and $F_X\left(-\sqrt{x}\right) = \left\{1 - \exp\left(\sqrt{x} + \alpha\right)\right\} u\left(-\sqrt{x}\right) = 0$, we get $F_Y(x) = F_X\left(\sqrt{x}\right) - F_X\left(-\sqrt{x}\right) = 1 - \exp\left(-\sqrt{x} + \alpha\right)$. In summary,

$$F_Y(x) = \left(1 - \frac{2}{3} e^{-\sqrt{x}}\right) u(x), \qquad (3.2.11)$$

which is shown in Fig. 3.8 together with $F_X(x)$. ♦

[Fig. 3.8 The cdf $F_X(x) = \left(1 - \frac{2}{3} e^{-x}\right) u(x)$ and cdf $F_Y(x) = \left(1 - \frac{2}{3} e^{-\sqrt{x}}\right) u(x)$ of $Y = X^2$]
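The general formula (3.2.10) and the closed form (3.2.11) can be compared numerically. In the sketch below, the cdf of X and its point mass P(X = x) are passed as two separate callables, which is an implementation assumption needed because the jump of a hybrid random variable cannot be recovered from a single cdf evaluation; the function names are hypothetical.

```python
import math


def cdf_of_a_x_squared(F_X, point_mass, a):
    """Return the cdf of Y = a*X**2 for a > 0, following (3.2.10) with u(0) = 1."""
    def F_Y(y):
        if y < 0:
            return 0.0
        r = math.sqrt(y / a)
        return F_X(r) - F_X(-r) + point_mass(-r)

    return F_Y


def F_X(x):
    """cdf of Example 3.2.5: (1 - (2/3) e^{-x}) u(x), with a jump of 1/3 at x = 0."""
    return 1.0 - (2.0 / 3.0) * math.exp(-x) if x >= 0 else 0.0


def point_mass(x):
    """P(X = x) for the X of Example 3.2.5: only x = 0 carries mass."""
    return 1.0 / 3.0 if x == 0 else 0.0


def closed_form(y):
    """The result (3.2.11)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-math.sqrt(y)) if y >= 0 else 0.0


F_Y = cdf_of_a_x_squared(F_X, point_mass, a=1.0)
for y in (-1.0, 0.0, 0.25, 1.0, 9.0):
    print(y, abs(F_Y(y) - closed_form(y)) < 1e-12)  # True on every line
```

At y = 0 the two cdf terms cancel and the value 1/3 comes entirely from the point mass, mirroring the computation of $F_Y(0)$ in the solution.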

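Example 3.2.6 Express the cdf $F_Y$ of $Y = \sqrt{X}$ in terms of the cdf $F_X$ of X when P(X < 0) = 0.

Solution We have $F_Y(y) = \begin{cases} 0, & y < 0, \\ P\left(X \le y^2\right), & y \ge 0, \end{cases}$ i.e.,

$$F_Y(y) = F_X\left(y^2\right) u(y) \qquad (3.2.12)$$

from $F_Y(y) = P(Y \le y) = P\left(\sqrt{X} \le y\right)$. ♦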

Example 3.2.7 Recollecting that the probability for a singleton set is 0 for a continuous random variable X, the cdf of Y = |X| can be obtained as $F_Y(y) = P(Y \le y) = P(|X| \le y)$, i.e.,

$$F_Y(y) = \begin{cases} 0, & y < 0, \\ P(-y \le X \le y), & y \ge 0 \end{cases} = \begin{cases} 0, & y < 0, \\ F_X(y) - F_X(-y) + P(X = -y), & y \ge 0 \end{cases} = \{F_X(y) - F_X(-y)\}\,u(y) \qquad (3.2.13)$$

in terms of the cdf $F_X$ of X. Examples of the cdf $F_X(x)$ and $F_Y(y)$ are shown in Fig. 3.9. ♦

[Fig. 3.9 The cdf $F_X(x)$ of X and the cdf $F_Y(y)$ of Y = |X|]

Example 3.2.8 When the cdf of the input X to the limiter

$$g(x) = \begin{cases} b, & x \ge b, \\ x, & -b \le x \le b, \\ -b, & x < -b \end{cases} \qquad (3.2.14)$$

is $F_X$, obtain the cdf $F_Y$ of the output Y = g(X).

Solution First, when y < −b and y ≥ b, we have $F_Y(y) = 0$ and $F_Y(y) = F_Y(b) = 1$, respectively. Next, when −b ≤ y < b, we have $F_Y(y) = F_X(y)$ from $F_Y(y) = P(Y \le y) = P(X \le y)$. Thus, we eventually have

$$F_Y(y) = \begin{cases} 1, & y \ge b, \\ F_X(y), & -b \le y < b, \\ 0, & y < -b, \end{cases} \qquad (3.2.15)$$

which is continuous from the right-hand side at any point y and discontinuous at y = ±b in general. ♦

Example 3.2.9 Obtain the cdf of Y = g(X) when X ∼ U(−1, 1) and

$$g(x) = \begin{cases} \frac{1}{2}, & x \ge \frac{1}{2}, \\ x, & -\frac{1}{2} \le x < \frac{1}{2}, \\ -\frac{1}{2}, & x < -\frac{1}{2}. \end{cases} \qquad (3.2.16)$$

Solution The cdf of Y = g(X) can be obtained as

$$F_Y(y) = \begin{cases} 1, & y \ge \frac{1}{2}, \\ \frac{1}{2}(y + 1), & -\frac{1}{2} \le y < \frac{1}{2}, \\ 0, & y < -\frac{1}{2} \end{cases} \qquad (3.2.17)$$

using (3.2.15), which is shown in Fig. 3.10. ♦
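The limiter results (3.2.15) and (3.2.17) can be sketched as follows. The code builds the output cdf from an arbitrary input cdf and, as a cross-check, counts how many points of a fine deterministic grid over (−1, 1) are mapped by the limiter to values not exceeding y. The grid-based check and all function names are illustrative assumptions rather than a method taken from the text.

```python
def limiter(x, b):
    """The limiter g(x) of (3.2.14): clip x to the interval [-b, b]."""
    return max(-b, min(b, x))


def limiter_output_cdf(F_X, b):
    """Return the cdf of Y = g(X) following (3.2.15)."""
    def F_Y(y):
        if y >= b:
            return 1.0
        if y < -b:
            return 0.0
        return F_X(y)

    return F_Y


def uniform_cdf(x):
    """cdf of X ~ U(-1, 1)."""
    return min(max((x + 1.0) / 2.0, 0.0), 1.0)


def grid_fraction(y, b, n=10000):
    """Fraction of n midpoints of equal cells in (-1, 1) whose limited value is <= y."""
    hits = 0
    for k in range(n):
        x = -1.0 + (k + 0.5) * 2.0 / n
        if limiter(x, b) <= y:
            hits += 1
    return hits / n


F_Y = limiter_output_cdf(uniform_cdf, 0.5)
print(F_Y(-0.5), F_Y(0.0), F_Y(0.5))  # 0.25 0.5 1.0
print(abs(F_Y(0.2) - grid_fraction(0.2, 0.5)) < 1e-3)  # True
```

The jumps of size 1/4 at y = ±1/2 correspond to the probability mass that the limiter collects from the clipped tails of X.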



[Fig. 3.10 The cdf $F_X(x)$, limiter g(x), and cdf $F_Y(y)$ of Y = g(X) when X ∼ U(−1, 1)]

3.2.2 Probability Density Function

Let us first introduce the following theorem which is quite useful in dealing with the differentiation of an integrated bi-variate function:

Theorem 3.2.1 Assume that a(x) and b(x) are differentiable functions and that both g(t, x) and $\frac{\partial}{\partial x} g(t, x)$ are continuous in x and t. Then, we have

$$\frac{d}{dx} \int_{a(x)}^{b(x)} g(t, x)\,dt = g(b(x), x) \frac{db(x)}{dx} - g(a(x), x) \frac{da(x)}{dx} + \int_{a(x)}^{b(x)} \frac{\partial g(t, x)}{\partial x}\,dt, \qquad (3.2.18)$$

which is called Leibnitz's rule.

Example 3.2.10 Assume a(x) = x, $b(x) = x^2$, and g(t, x) = 2t + x. Then, $\frac{d}{dx} \int_{a(x)}^{b(x)} g(t, x)\,dt = 4x^3 + 3x^2 - 4x$ from $\int_x^{x^2} (2t + x)\,dt = x^4 + x^3 - 2x^2$. On the other hand, $\int_{a(x)}^{b(x)} \frac{\partial g(t, x)}{\partial x}\,dt = x^2 - x$ from $\frac{\partial g(t, x)}{\partial x} = 1$. Therefore, $g(b(x), x) \frac{db(x)}{dx} - g(a(x), x) \frac{da(x)}{dx} + \int_{a(x)}^{b(x)} \frac{\partial g(t, x)}{\partial x}\,dt = 2x\left(2x^2 + x\right) - 3x + x^2 - x = 4x^3 + 3x^2 - 4x$ from $g(b(x), x) = 2x^2 + x$, $g(a(x), x) = 3x$, $\frac{da(x)}{dx} = 1$, and $\frac{db(x)}{dx} = 2x$. ♦

3.2.2.1 One-to-One Transformations

We attempt to obtain the cdf $F_Y(y) = P(Y \le y)$ of Y = g(X) when the pdf of X is $f_X$. First, if g is differentiable and increasing, then the cdf $F_Y(y) = F_X\left(g^{-1}(y)\right)$ is

$$F_Y(y) = \int_{-\infty}^{g^{-1}(y)} f_X(t)\,dt \qquad (3.2.19)$$

because $\{Y \le y\} = \left\{X \le g^{-1}(y)\right\}$, where $g^{-1}$ is the inverse of g. Thus, the pdf of Y = g(X) is $f_Y(y) = \frac{d}{dy} F_Y(y) = f_X\left(g^{-1}(y)\right) \frac{dg^{-1}(y)}{dy}$, i.e.,

$$f_Y(y) = f_X(x) \frac{dx}{dy} \qquad (3.2.20)$$

with $x = g^{-1}(y)$. Similarly, if g is differentiable and decreasing, then the cdf of Y is

$$F_Y(y) = \int_{g^{-1}(y)}^{\infty} f_X(t)\,dt \qquad (3.2.21)$$

from $F_Y(y) = P(Y \le y) = P\left(X \ge g^{-1}(y)\right)$, and the pdf is $f_Y(y) = \frac{d}{dy} F_Y(y) = -f_X\left(g^{-1}(y)\right) \frac{dg^{-1}(y)}{dy}$, i.e.,

$$f_Y(y) = -f_X(x) \frac{dx}{dy}. \qquad (3.2.22)$$

Combining (3.2.20) and (3.2.22), we have the following theorem:

Theorem 3.2.2 When g is a differentiable and decreasing function or a differentiable and increasing function, the pdf of Y = g(X) is

$$f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{x = g^{-1}(y)}, \qquad (3.2.23)$$

where $f_X$ is the pdf of X.

The result (3.2.23) can be written as $f_Y(y) = \frac{f_X\left(g^{-1}(y)\right)}{\left|g'\left(g^{-1}(y)\right)\right|}$, as $f_Y(y) = \left. f_X(x) \left|\frac{dx}{dy}\right| \right|_{x = g^{-1}(y)}$, or as

$$f_Y(y)\,|dy| = f_X(x)\,|dx|. \qquad (3.2.24)$$

The formula (3.2.24) represents the conservation or invariance of probability: the probability $f_X(x)|dx|$ of the region |dx| of the random variable X is the same as the probability $f_Y(y)|dy|$ of the region |dy| of the random variable Y when the region |dy| of Y is the image of the region |dx| of X under the function Y = g(X).

Example 3.2.11 For a non-zero real number a, let Y = aX + b. Then, noting that the inverse function of y = g(x) = ax + b is $x = g^{-1}(y) = \frac{y - b}{a}$ and that $\left|g'\left(g^{-1}(y)\right)\right| = |a|$, we get

$$f_{aX+b}(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right) \qquad (3.2.25)$$

from (3.2.23). This result is the same as $f_{aX+b}(y) = \frac{d}{dy} F_{aX+b}(y)$, the derivative of the cdf (3.2.5) obtained in Example 3.2.1. Figure 3.11 shows the pdf $f_X(x)$ and pdf $f_Y(y) = \frac{1}{2} u(y - 1) u(3 - y)$ of Y = 2X + 1 when X ∼ U[0, 1). ♦

[Fig. 3.11 The pdf $f_X(x)$ and pdf $f_Y(y)$ of Y = 2X + 1 when X ∼ U[0, 1)]
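Theorem 3.2.2 can be turned into a small function that builds $f_Y$ from $f_X$, the inverse $g^{-1}$, and the derivative $g'$. The sketch below assumes that the caller supplies a strictly monotone, differentiable g through those two callables; the name `transformed_pdf` is hypothetical. It is applied to the affine map of Example 3.2.11 with X ∼ U[0, 1).

```python
def transformed_pdf(f_X, g_inverse, g_prime):
    """Return the pdf of Y = g(X) for strictly monotone differentiable g, per (3.2.23)."""
    def f_Y(y):
        x = g_inverse(y)
        return f_X(x) / abs(g_prime(x))

    return f_Y


def uniform_pdf(x):
    """pdf of X ~ U[0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


# Increasing case: Y = 2X + 1, so f_Y(y) = 1/2 on [1, 3).
f_up = transformed_pdf(uniform_pdf, lambda y: (y - 1.0) / 2.0, lambda x: 2.0)
print(f_up(2.0), f_up(0.0), f_up(3.5))  # 0.5 0.0 0.0

# Decreasing case: Y = -2X + 1, so f_Y(y) = 1/2 on (-1, 1].
f_down = transformed_pdf(uniform_pdf, lambda y: (y - 1.0) / -2.0, lambda x: -2.0)
print(f_down(0.0))  # 0.5
```

The absolute value in (3.2.23) is what lets the same function serve the increasing and decreasing cases.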

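Example 3.2.12 Obtain the pdf of Y = cX when X ∼ G(α, β) and c > 0.

Solution Using (2.5.31) and (3.2.25), we get

$$f_{cX}(y) = \frac{1}{c} \frac{1}{\beta^{\alpha} \Gamma(\alpha)} \left(\frac{y}{c}\right)^{\alpha - 1} \exp\left(-\frac{y}{c\beta}\right) u\left(\frac{y}{c}\right) = \frac{1}{(c\beta)^{\alpha} \Gamma(\alpha)}\, y^{\alpha - 1} \exp\left(-\frac{y}{c\beta}\right) u(y). \qquad (3.2.26)$$

In other words, cX ∼ G(α, cβ) when X ∼ G(α, β) and c > 0. ♦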



Example 3.2.13 Consider Y = X1 . Because the inverse function of y = g(x) =    is x = g −1 (y) = 1y and g  g −1 (y)  = y 2 , we get 1 f X1 (y) = 2 f X y

  1 y

1 x

(3.2.27)

from (3.2.23), which can also be obtained by differentiating (3.2.8). Figure 3.12 shows the pdf f X (x) and pdf f Y (y) of Y = X1 when X ∼ U [0, 1). ♦ Example 3.2.14 When X ∼ C(α), obtain the distribution of Y = α 1 , we get π x 2 +α2   1 then X ∼ C α1 .

Solution Noting that f X (x) = other words, if X ∼ C(α),

Example 3.2.15 Express the pdf f Y of Y =



f Y (y) =

1 1 απ y 2 + 12 α

1 . X

from (3.2.27). In ♦

X in terms of the pdf f X of X . √ Solution When y < 0, there to y = x, and thus f Y (y) = 0. When √ is no solution 2  y > 0, the solution to y = x is x = y and g (x) = 2√1 x . Therefore,

182

3 Random Variables fX (x)

fY (y)

1

1

0

1

x

Fig. 3.12 The pdf f X (x) and pdf f Y (y) of Y =

0 1 X

1

y

when X ∼ U [0, 1)

  f √ X (y) = 2y f X y 2 u(y),

(3.2.28)

  which is the same as f √ X (y) = 2y f X y 2 u(y) + FX (0)δ(y), obtainable by dif  ferentiating F√ X (y) = FX y 2 u(y) shown in (3.2.12), except at y = 0. Note √ that, for X to be meaningful, we should have P(X < 0) = 0. Thus, when X is a continuous random variable, we have FX (0) = P(X ≤ 0)  ∞= P(X = 0) = 0 and, consequently, FX (0)δ(y) = 0. We then easily obtain3 −∞ f √ X (y)dy = ∞ ∞ ∞  2 0 2y f X y dy = 0 f X (t)dt = −∞ f X (t)dt = 1 because f X (x) = 0 for x < 0 from P(X < 0) = 0. ♦ √ Example 3.2.16 Obtain the pdf of Y = X when the pdf X is ⎧ 0 ≤ x < 1, ⎨ x, f X (x) = 2 − x, 1 ≤ x < 2, ⎩ 0, x < 0 or x ≥ 2.

(3.2.29)

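Solution From (3.2.29), we have

$$f_X\left(y^2\right) = \begin{cases} y^2, & 0 \le y^2 < 1, \\ 2 - y^2, & 1 \le y^2 < 2, \\ 0, & y^2 < 0 \text{ or } y^2 \ge 2, \end{cases} \tag{3.2.30}$$

and thus, from (3.2.28),

$$f_Y(y) = \begin{cases} 2y^3, & 0 \le y < 1, \\ 2y\left(2 - y^2\right), & 1 \le y < \sqrt{2}, \\ 0, & \text{otherwise}. \end{cases} \tag{3.2.31}$$

♦

Example 3.2.17 Express the pdf $f_Y$ of $Y = e^X$ in terms of the pdf $f_X$ of $X$.

Solution When $y \le 0$, there is no solution to $y = e^x$, and thus $f_Y(y) = 0$. When $y > 0$, the solution to $y = e^x$ is $x = \ln y$ and $g'(x) = e^x$. We therefore get

$$f_{e^X}(y) = \frac{1}{y} f_X(\ln y)\, u(y), \tag{3.2.32}$$

assuming $u(0) = 0$. ♦

Example 3.2.18 When $X \sim N\left(m, \sigma^2\right)$, obtain the distribution of $Y = e^X$.

Solution Noting that $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-m)^2}{2\sigma^2}\right\}$, we get

$$f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}\, y} \exp\left\{-\frac{(\ln y - m)^2}{2\sigma^2}\right\} u(y), \tag{3.2.33}$$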

which is called the log-normal pdf. Figure 3.14 shows the pdf $f_X(x)$ of $X \sim N\left(0, \sigma^2\right)$ and the pdf (3.2.33) of the log-normal random variable $Y = e^X$. ♦

3.2.2.2 General Transformations

We have discussed the probability functions of $Y = g(X)$ in terms of those of $X$ when the transformation $g$ is a one-to-one correspondence, via (3.2.23) in the previous section.


We now extend our discussion into the more general case where the transformation $y = g(x)$ has multiple solutions.

Theorem 3.2.3 When the solutions to $y = g(x)$ are $x_1, x_2, \ldots$, that is, when $y = g(x_1) = g(x_2) = \cdots$, the pdf of $Y = g(X)$ is obtained as

$$f_Y(y) = \sum_{i=1}^{\infty} \frac{f_X(x_i)}{\left|g'(x_i)\right|}, \tag{3.2.34}$$

where $f_X$ is the pdf of $X$. We now consider some examples for the application of the result (3.2.34).

Example 3.2.19 Obtain the pdf of $Y = aX^2$ for $a > 0$ in terms of the pdf $f_X$ of $X$.

Solution If $y < 0$, then the solution to $y = ax^2$ does not exist. Thus, $f_Y(y) = 0$. If $y > 0$, then the solutions to $y = ax^2$ are $x_1 = \sqrt{\frac{y}{a}}$ and $x_2 = -\sqrt{\frac{y}{a}}$. Thus, we have $\left|g'(x_1)\right| = \left|g'(x_2)\right| = 2a\sqrt{\frac{y}{a}}$ from $g'(x) = 2ax$ and, subsequently,

$$f_{aX^2}(y) = \frac{1}{2\sqrt{ay}} \left\{ f_X\left(\sqrt{\frac{y}{a}}\right) + f_X\left(-\sqrt{\frac{y}{a}}\right) \right\} u(y), \tag{3.2.35}$$

which is, as expected, the same as the result obtainable by differentiating the cdf (3.2.10) of $Y = aX^2$. ♦

Example 3.2.20 When $X \sim N(0, 1)$, we can easily obtain the pdf$^4$ $f_Y(y) = \frac{1}{\sqrt{2\pi y}} \exp\left(-\frac{y}{2}\right) u(y)$ of $Y = X^2$ by noting that $f_X(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$. ♦
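As a rough numerical illustration of (3.2.35) and Example 3.2.20, the sketch below evaluates the two-solution formula with $a = 1$ and $X \sim N(0, 1)$, and compares a Monte Carlo estimate of $P(0.5 < X^2 \le 1.5)$ with a midpoint-rule integral of the pdf. The function names, the interval, the sample size, and the seed are illustrative assumptions and do not come from the text.

```python
import math
import random


def pdf_of_a_x_squared(f_x, a, y):
    """Evaluate (3.2.35): the pdf of Y = a * X**2 (a > 0) at y, given the pdf f_x of X."""
    if y <= 0:
        return 0.0  # no solution of y = a * x**2 for y < 0; the value at y = 0 is set to 0 here
    root = math.sqrt(y / a)  # the two solutions are +root and -root
    return (f_x(root) + f_x(-root)) / (2.0 * math.sqrt(a * y))


def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def chi_square_1_pdf(y):
    """Closed form quoted in Example 3.2.20."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y) if y > 0 else 0.0


# Monte Carlo estimate of P(0.5 < X**2 <= 1.5) for X ~ N(0, 1).
rng = random.Random(0)
n_samples = 200_000
hits = sum(1 for _ in range(n_samples) if 0.5 < rng.gauss(0.0, 1.0) ** 2 <= 1.5)
mc_prob = hits / n_samples

# Midpoint-rule integral of the pdf (3.2.35) over the same interval.
steps = 1_000
h = 1.0 / steps
pdf_integral = h * sum(
    pdf_of_a_x_squared(std_normal_pdf, 1.0, 0.5 + (i + 0.5) * h) for i in range(steps)
)

print(mc_prob, pdf_integral)
```

The two printed numbers should agree to within sampling error, which is what one would expect if (3.2.35) is the pdf of $X^2$.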

Example 3.2.21 Express the pdf $f_Y$ of $Y = |X|$ in terms of the pdf $f_X$ of $X$.

Solution When $y < 0$, there is no solution to $y = |x|$, and thus $f_Y(y) = 0$. When $y > 0$, the solutions to $y = |x|$ are $x_1 = y$ and $x_2 = -y$, and $\left|g'(x)\right| = 1$. Thus, we get

$$f_Y(y) = \left\{ f_X(y) + f_X(-y) \right\} u(y), \tag{3.2.36}$$

which is the same as

$$f_Y(y) = \frac{d}{dy} \left[ \left\{ F_X(y) - F_X(-y) \right\} u(y) \right] = \left\{ f_X(y) + f_X(-y) \right\} u(y) \tag{3.2.37}$$

obtained by differentiating the cdf $F_Y(y)$ in (3.2.13), and then, noting that $\left\{F_X(y) - F_X(-y)\right\} \delta(y) = \left\{F_X(0) - F_X(0)\right\} \delta(y) = 0$. ♦

4 This pdf is called the central chi-square pdf with the degree of freedom of 1. The central chi-square pdf, together with the non-central chi-square pdf, is discussed in Sect. 5.4.2.

Example 3.2.22 When $X \sim U[-\pi, \pi)$, obtain the pdf and cdf of $Y = a \sin(X + \theta)$, where $a > 0$ and $\theta$ are constants.

Solution First, we have $f_Y(y) = 0$ for $|y| > a$. When $|y| < a$, letting the two solutions to $y = g(x) = a \sin(x + \theta)$ in the interval $[-\pi, \pi)$ of $x$ be $x_1$ and $x_2$, we have $f_X(x_1) = f_X(x_2) = \frac{1}{2\pi}$. Thus, recollecting that $\left|g'(x)\right| = |a \cos(x + \theta)| = \sqrt{a^2 - y^2}$, we get

$$f_Y(y) = \frac{1}{\pi\sqrt{a^2 - y^2}}\, u(a - |y|) \tag{3.2.38}$$

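from (3.2.34). Next, let us obtain the cdf $F_Y(y)$. When $0 \le y \le a$, letting $\alpha = \sin^{-1} \frac{y}{a}$ and $0 \le \alpha < \frac{\pi}{2}$, we have $x_1 = \alpha - \theta$ and $x_2 = \pi - \alpha - \theta$ and, consequently, $F_Y(y) = P(Y \le y) = P(-\pi \le X \le x_1) + P(x_2 \le X < \pi)$. Now, from $P(-\pi \le X \le x_1) = \frac{1}{2\pi}(x_1 + \pi)$ and $P(x_2 \le X < \pi) = \frac{1}{2\pi}(\pi - x_2)$, we have $F_Y(y) = \frac{1}{2\pi}(2\pi + 2\alpha - \pi)$, i.e.,

$$F_Y(y) = \frac{1}{2} + \frac{1}{\pi} \sin^{-1} \frac{y}{a}. \tag{3.2.39}$$

When $-a \le y \le 0$, letting $\beta = \sin^{-1} \frac{y}{a}$ and $-\frac{\pi}{2} \le \beta < 0$, we have $x_1 = \beta - \theta$, $x_2 = -\pi - \beta - \theta$, and $x_1 - x_2 = \pi + 2\beta$, and thus the cdf is $F_Y(y) = P(Y \le y) = P(x_2 \le X \le x_1) = \frac{1}{2\pi}(\pi + 2\beta)$, i.e.,

$$F_Y(y) = \frac{1}{2} + \frac{1}{\pi} \sin^{-1} \frac{y}{a}. \tag{3.2.40}$$

Combining $F_Y(y) = 0$ for $y \le -a$, $F_Y(y) = 1$ for $y \ge a$, (3.2.39), and (3.2.40), we get$^5$

$$F_Y(y) = \begin{cases} 0, & y \le -a, \\ \frac{1}{2} + \frac{1}{\pi} \sin^{-1} \frac{y}{a}, & |y| \le a, \\ 1, & y \ge a. \end{cases} \tag{3.2.41}$$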
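The cdf (3.2.41) can be checked informally by simulation. The following sketch draws $X$ uniformly on $[-\pi, \pi)$, forms $Y = a \sin(X + \theta)$, and compares the empirical cdf with the closed form. The values $a = 2$ and $\theta = 0.7$, the sample size, the seed, and the function names are arbitrary illustrative assumptions.

```python
import math
import random


def cdf_a_sin(y, a):
    """cdf (3.2.41) of Y = a * sin(X + theta) for X ~ U[-pi, pi) and a > 0."""
    if y <= -a:
        return 0.0
    if y >= a:
        return 1.0
    return 0.5 + math.asin(y / a) / math.pi


def empirical_cdf_a_sin(y, a, theta, n_samples=200_000, seed=1):
    """Fraction of simulated values of Y that do not exceed y."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_samples):
        x = rng.uniform(-math.pi, math.pi)
        if a * math.sin(x + theta) <= y:
            count += 1
    return count / n_samples


a, theta = 2.0, 0.7  # illustrative values
for y in (-1.5, 0.0, 1.0):
    print(y, cdf_a_sin(y, a), empirical_cdf_a_sin(y, a, theta))
```

Note that the closed form does not depend on $\theta$, so changing `theta` in the simulation should leave the empirical values essentially unchanged.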

The cdf (3.2.41) can of course be obtained from the pdf (3.2.38) by integration: specifically, from $F_Y(y) = \int_{-\infty}^{y} \frac{u(a - |t|)}{\pi\sqrt{a^2 - t^2}}\, dt$, we get $F_Y(y) = 0$ when $y \le -a$, $F_Y(y) = \frac{1}{\pi} \int_{-\frac{\pi}{2}}^{\sin^{-1} \frac{y}{a}} \frac{a \cos\theta}{a \cos\theta}\, d\theta = \frac{1}{\pi} \left( \sin^{-1} \frac{y}{a} + \frac{\pi}{2} \right) = \frac{1}{2} + \frac{1}{\pi} \sin^{-1} \frac{y}{a}$ when $-a \le y \le a$, and $F_Y(y) = \frac{1}{2} + \frac{1}{\pi} \sin^{-1} 1 = 1$ when $y \ge a$. Figure 3.15 shows the pdf $f_X(x)$, pdf $f_Y(y)$, and cdf $F_Y(y)$ when $a = 2$. Exercise 3.4 discusses a slightly more general problem. ♦

5 If $a < 0$, $a$ will be replaced with $|a|$.

Fig. 3.15 The pdf $f_X(x)$, pdf $f_Y(y)$ of $Y = 2 \sin(X + \theta)$, and cdf $F_Y(y)$ of $Y$ when $X \sim U[-\pi, \pi)$

Example 3.2.23 For a continuous random variable $X$ with cdf $F_X$, obtain the cdf, pdf, and pmf of $Z = \mathrm{sgn}(X)$, where

$$\mathrm{sgn}(x) = u(x) - u(-x) = 2u(x) - 1 = \begin{cases} 1, & x > 0, \\ 0, & x = 0, \\ -1, & x < 0 \end{cases} \tag{3.2.42}$$

is called the sign function. First, we have the cdf $F_Z(z) = P(Z \le z) = P(\mathrm{sgn}(X) \le z)$ as

$$F_Z(z) = \begin{cases} 0, & z < -1, \\ P(X \le 0), & -1 \le z < 1, \\ 1, & z \ge 1 \end{cases} = F_X(0) u(z + 1) + \{1 - F_X(0)\} u(z - 1), \tag{3.2.43}$$

and thus the pdf $f_Z(z) = \frac{d}{dz} F_Z(z)$ of $Z$ is

$$f_Z(z) = F_X(0) \delta(z + 1) + \{1 - F_X(0)\} \delta(z - 1). \tag{3.2.44}$$

In addition, we also have

$$p_Z(z) = \begin{cases} F_X(0), & z = -1, \\ 1 - F_X(0), & z = 1, \\ 0, & \text{otherwise} \end{cases} \tag{3.2.45}$$

as the pmf of $Z$. ♦

3.2.2.3 Finding Transformations

We have so far discussed obtaining the probability functions of Y = g(X ) when the probability functions of X and g are given. We now briefly consider the inverse problem of finding g when the cdf’s FX and FY are given or, equivalently, finding the function that transforms X with cdf FX into Y with cdf FY . The problem can be solved by making use of the uniform distribution as the intermediate step: i.e., FX (x) → uniform distribution → FY (y).

(3.2.46)

Specifically, assume that the cdf $F_X$ and the inverse $F_Y^{-1}$ of the cdf $F_Y$ are continuous and increasing. Letting

$$Z = F_X(X), \tag{3.2.47}$$

we have $X = F_X^{-1}(Z)$ and $\{F_X(X) \le z\} = \left\{X \le F_X^{-1}(z)\right\}$ because $F_X$ is continuous and increasing. Therefore, the cdf of $Z$ is $F_Z(z) = P(Z \le z) = P(F_X(X) \le z) = P\left(X \le F_X^{-1}(z)\right) = F_X\left(F_X^{-1}(z)\right) = z$ for $0 \le z < 1$.$^6$ In other words, we have

$$Z \sim U[0, 1). \tag{3.2.48}$$

Next, consider

$$V = F_Y^{-1}(Z). \tag{3.2.49}$$

Then, recollecting (3.2.48), we get the cdf $P(V \le y) = P\left(F_Y^{-1}(Z) \le y\right) = P(Z \le F_Y(y)) = F_Z(F_Y(y)) = F_Y(y)$ of $V$ because $F_Z(x) = x$ for $x \in (0, 1)$. In other words, when $X \sim F_X$, we have $V = F_Y^{-1}(Z) = F_Y^{-1}(F_X(X)) \sim F_Y$, which is summarized as the following theorem:

Theorem 3.2.4 The function that transforms a random variable $X$ with cdf $F_X$ into a random variable with cdf $F_Y$ is $g = F_Y^{-1} \circ F_X$.

Figure 3.16 illustrates some of the interesting results such as

$$X \sim F_X \rightarrow F_X(X) \sim U[0, 1), \tag{3.2.50}$$

$$Z \sim U[0, 1) \rightarrow F_Y^{-1}(Z) \sim F_Y, \tag{3.2.51}$$

and

$$X \sim F_X \rightarrow F_Y^{-1}(F_X(X)) \sim F_Y. \tag{3.2.52}$$

6 Here, because $F_X$ is a continuous function, $F_X\left(F_X^{-1}(z)\right) = z$ as it is discussed in (3.A.26).

Fig. 3.16 Transformation of a random variable $X$ with cdf $F_X$ into $Y$ with cdf $F_Y$: $X \sim F_X \rightarrow Z = F_X(X) \sim U[0, 1) \rightarrow V = F_Y^{-1}(F_X(X)) \sim F_Y$

Theorem 3.2.4 can be used in the generation of random numbers, for instance.

Example 3.2.24 From $X \sim U[0, 1)$, obtain the Rayleigh random variable $Y \sim f_Y(y) = \frac{y}{\alpha^2} \exp\left(-\frac{y^2}{2\alpha^2}\right) u(y)$.

Solution Because the cdf of $Y$ is $F_Y(y) = \left\{1 - \exp\left(-\frac{y^2}{2\alpha^2}\right)\right\} u(y)$, the function we are looking for is $g(x) = F_Y^{-1}(x) = \sqrt{-2\alpha^2 \ln(1 - x)}$ as we can easily see from (3.2.51). In other words, if $X \sim U[0, 1)$, then $Y = \sqrt{-2\alpha^2 \ln(1 - X)}$ has the cdf $F_Y(y) = \left\{1 - \exp\left(-\frac{y^2}{2\alpha^2}\right)\right\} u(y)$. Note that we conversely have $V = 1 - \exp\left(-\frac{Y^2}{2\alpha^2}\right) \sim U(0, 1)$. ♦
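The transformation of Example 3.2.24 can be sketched in a few lines. The code below applies $g(x) = \sqrt{-2\alpha^2 \ln(1 - x)}$ to uniform random numbers and compares the sample mean with $\alpha\sqrt{\pi/2}$, the standard value of the Rayleigh mean (a known fact not stated in this section). The value $\alpha = 1.5$, the seed, and the function names are illustrative assumptions.

```python
import math
import random


def rayleigh_cdf(y, alpha):
    """cdf of the Rayleigh random variable in Example 3.2.24."""
    return 1.0 - math.exp(-y * y / (2.0 * alpha * alpha)) if y > 0 else 0.0


def rayleigh_from_uniform(x, alpha):
    """g(x) = F_Y^{-1}(x) = sqrt(-2 alpha^2 ln(1 - x)) for 0 <= x < 1."""
    return math.sqrt(-2.0 * alpha * alpha * math.log(1.0 - x))


rng = random.Random(2)
alpha = 1.5  # illustrative parameter
samples = [rayleigh_from_uniform(rng.random(), alpha) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, alpha * math.sqrt(math.pi / 2.0))
```

Because `random.random()` returns values in $[0, 1)$, the argument of the logarithm stays in $(0, 1]$ and the square root is always defined.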

Example 3.2.25 For $X \sim U(0, 1)$, consider the desired pmf

$$p_Y(y_n) = P(Y = y_n) = \begin{cases} p_n, & n = 1, 2, \ldots, \\ 0, & \text{otherwise}. \end{cases} \tag{3.2.53}$$

Then, letting $p_0 = 0$, the integer $Y$ satisfying

$$\sum_{k=0}^{Y-1} p_k < X \le \sum_{k=0}^{Y} p_k \tag{3.2.54}$$

is the random variable with the desired pmf (3.2.53). ♦
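The selection rule (3.2.54) amounts to locating a uniform random number among the partial sums of the pmf. A minimal sketch is shown below, assuming a finite pmf $(p_1, p_2, p_3) = (0.2, 0.5, 0.3)$ chosen only for illustration; the helper name `index_from_uniform` is hypothetical.

```python
import bisect
import itertools
import random


def index_from_uniform(x, probs):
    """Return the 1-based n with p_1 + ... + p_{n-1} < x <= p_1 + ... + p_n, as in (3.2.54)."""
    cumulative = list(itertools.accumulate(probs))
    n = bisect.bisect_left(cumulative, x) + 1  # first n whose partial sum is >= x
    return min(n, len(probs))  # guard against rounding when x exceeds the last partial sum


probs = [0.2, 0.5, 0.3]  # illustrative pmf p_1, p_2, p_3
rng = random.Random(3)
n_samples = 100_000
counts = [0] * len(probs)
for _ in range(n_samples):
    counts[index_from_uniform(rng.random(), probs) - 1] += 1
relative_freq = [c / n_samples for c in counts]
print(relative_freq)
```

The relative frequencies should be close to the entries of `probs`, in line with the claim that the selected integer has the pmf (3.2.53).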

3.3 Expected Values and Moments

The probabilistic characteristics of a random variable can be most completely described by the distribution via the cdf, pdf, or pmf of the random variable. On the other hand, the distribution is not available in some cases, and we may wish to summarize the characteristics as a few numbers in other cases.

In this section, we attempt to introduce some of the key notions for use in such cases. Among the widely employed representative values, also called central values, for describing the probabilistic characteristics of a random variable and a distribution are the mean, median, and mode (Beckenbach and Bellam 1965; Bickel and Doksum 1977; Feller 1970; Hajek 1969; McDonough and Whalen 1995).

Definition 3.3.1 (mode) For a random variable $X$ with pdf $f_X$ or pmf $p_X$, if

$$f_X(x_{\mathrm{mod}}) \ge f_X(x) \text{ for } X \text{ a continuous random variable, or} \tag{3.3.1}$$

$$p_X(x_{\mathrm{mod}}) \ge p_X(x) \text{ for } X \text{ a discrete random variable} \tag{3.3.2}$$

holds true for all real numbers $x$, then the value $x_{\mathrm{mod}}$ is called the mode of $X$.

The mode is the value that could happen most frequently among all the values of a random variable. In other words, the mode is the most probable value or, equivalently, the value at which the pmf or pdf of a random variable is maximum.

Definition 3.3.2 (median) The value $\alpha$ satisfying both $P(X \le \alpha) \ge \frac{1}{2}$ and $P(X \ge \alpha) \ge \frac{1}{2}$ is called the median of the random variable $X$.

Roughly speaking, the median is the value at which the cumulative probability is 0.5. When the distribution is symmetric, the point of symmetry of the cdf is the median. The median is one of the quantiles of order $p$, or $100p$ percentile, defined as the number $\xi_p$ satisfying $P(X \le \xi_p) \ge p$ and $P(X \ge \xi_p) \ge 1 - p$ for $0 < p < 1$. For a random variable $X$ with cdf $F_X$, we have

$$p \le F_X\left(\xi_p\right) \le p + P\left(X = \xi_p\right). \tag{3.3.3}$$

Therefore, if $P\left(X = \xi_p\right) = 0$ as for a continuous random variable, the solution to $F_X(x) = p$ is $\xi_p$: the solution to this equation is unique when the cdf $F_X$ is a strictly increasing function, but otherwise, there exist many solutions, each of which is the quantile of order $p$. The median and mode are not unique in some cases. When there exist many medians, the middle value is regarded as the median in some cases.

Example 3.3.1 For the pmf $p_X(1) = \frac{1}{3}$, $p_X(2) = \frac{1}{2}$, and $p_X(3) = \frac{1}{6}$, because $P(X \le 2) = \frac{1}{3} + \frac{1}{2} = \frac{5}{6} \ge \frac{1}{2}$ and $P(X \ge 2) = \frac{1}{2} + \frac{1}{6} = \frac{2}{3} \ge \frac{1}{2}$, the median$^7$ is 2. For the uniform distribution over the set $\{1, 2, 3, 4\}$, any real number in the interval$^8$ $[2, 3]$ is the median, and the mode is 1, 2, 3, or 4. ♦

7 Note that if the median $x_{\mathrm{med}}$ is defined by $P(X \le x_{\mathrm{med}}) = P(X \ge x_{\mathrm{med}})$, we do not have the median in this pmf.

8 Note that if the median $x_{\mathrm{med}}$ is defined by $P(X \le x_{\mathrm{med}}) = P(X \ge x_{\mathrm{med}})$, any real number in the interval $(2, 3)$ is the median.

Example 3.3.2 For the distribution $N(1, 1)$, the mode is 1. When the pmf is $p_X(1) = \frac{1}{3}$, $p_X(2) = \frac{1}{2}$, and $p_X(3) = \frac{1}{6}$, the mode of $X$ is 2. ♦
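Definitions 3.3.1 and 3.3.2 can be applied mechanically to a finite pmf. The sketch below uses exact fractions to test the two median inequalities and to list the modes for the two pmf's of Examples 3.3.1 and 3.3.2. The function names and the dictionary representation of a pmf are illustrative assumptions.

```python
from fractions import Fraction


def is_median(pmf, a):
    """Check Definition 3.3.2: P(X <= a) >= 1/2 and P(X >= a) >= 1/2."""
    below = sum(p for x, p in pmf.items() if x <= a)
    above = sum(p for x, p in pmf.items() if x >= a)
    return below >= Fraction(1, 2) and above >= Fraction(1, 2)


def modes(pmf):
    """All values at which the pmf attains its maximum (Definition 3.3.1)."""
    peak = max(pmf.values())
    return sorted(x for x, p in pmf.items() if p == peak)


pmf_a = {1: Fraction(1, 3), 2: Fraction(1, 2), 3: Fraction(1, 6)}
pmf_b = {k: Fraction(1, 4) for k in (1, 2, 3, 4)}  # uniform over {1, 2, 3, 4}
print(is_median(pmf_a, 2), modes(pmf_a))
print(is_median(pmf_b, 2.5), modes(pmf_b))
```

For the uniform pmf, any argument in $[2, 3]$ passes the median test while values just outside the interval fail, which illustrates the non-uniqueness mentioned above.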

3.3.1 Expected Values

We now introduce the most widely used representative value, the expected value.

Definition 3.3.3 (expected value) For a random variable $X$ with cdf $F_X$, the value $E\{X\} = \int_{-\infty}^{\infty} x\, dF_X(x)$, i.e.,

$$E\{X\} = \begin{cases} \int_{-\infty}^{\infty} x f_X(x)\, dx, & X \text{ continuous random variable}, \\ \sum\limits_{x=-\infty}^{\infty} x\, p_X(x), & X \text{ discrete random variable} \end{cases} \tag{3.3.4}$$

is called the expected value or mean of $X$ if $\int_{-\infty}^{\infty} |x|\, dF_X(x) < \infty$.

The expected value is also called the stochastic average, statistical average, or ensemble average, and $E\{X\}$ is also written as $E(X)$ or $E[X]$.

Example 3.3.3 For $X \sim U[a, b)$, we have the expected value $E\{X\} = \int_a^b \frac{x}{b-a}\, dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$ of $X$. The mode of $X$ is any real number between $a$ and $b$, and the median is the same as the mean $\frac{a+b}{2}$. ♦

Example 3.3.4 (Stoyanov 2013) For unimodal random variables, the median usually lies between the mode and mean: an example of exception is shown here. Assume the pdf

$$f(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x \le c, \\ c\, e^{-\lambda(x-c)}, & x > c \end{cases} \tag{3.3.5}$$

of $X$ with $c \ge 1$ and $\frac{c^2}{2} + \frac{c}{\lambda} = 1$. Then, the mean, median, and mode of $X$ are $\mu = \frac{c^3}{3} + \frac{c^2}{\lambda} + \frac{c}{\lambda^2}$, 1, and $c$, respectively. If we choose $c > 1$ sufficiently close to 1, then $\lambda \approx 2$ and $\mu \approx \frac{13}{12}$, and the median is smaller than the mean and mode although $f(x)$ is unimodal. ♦

Theorem 3.3.1 (Stoyanov 2013) A necessary condition for the mean $E\{X\}$ to exist for a random variable $X$ with cdf $F$ is $\lim\limits_{x \to \infty} x\{1 - F(x)\} = 0$.

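Proof Rewrite $x\{1 - F(x)\}$ as $x\{1 - F(x)\} = x\left\{\int_{-\infty}^{\infty} f(t)\,dt - \int_{-\infty}^{x} f(t)\,dt\right\} = x \int_x^{\infty} f(t)\,dt$. Now, letting $E\{X\} = m$, we have

$$m = \int_{-\infty}^{x} t f(t)\,dt + \int_x^{\infty} t f(t)\,dt \ge \int_{-\infty}^{x} t f(t)\,dt + x \int_x^{\infty} f(t)\,dt \tag{3.3.6}$$

for $x > 0$ because $\int_x^{\infty} t f(t)\,dt \ge x \int_x^{\infty} f(t)\,dt$. Here, we should have $\lim\limits_{x \to \infty} x \int_x^{\infty} f(t)\,dt \to 0$ for (3.3.6) to hold true because $\lim\limits_{x \to \infty} \int_{-\infty}^{x} t f(t)\,dt = \int_{-\infty}^{\infty} t f(t)\,dt = m$ when $x \to \infty$. ♠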

Based on the result (3.E.2) shown in Exercise 3.1, we can show that (Rohatgi and Saleh 2001)

$$E\{X\} = \int_0^{\infty} P(X > x)\,dx - \int_{-\infty}^{0} P(X \le x)\,dx \tag{3.3.7}$$

for any continuous random variable $X$, dictating that a necessary and sufficient condition for $E\{|X|\} < \infty$ is that both $\int_0^{\infty} P(X > x)\,dx$ and $\int_{-\infty}^{0} P(X \le x)\,dx$ converge.
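The identity (3.3.7) can be illustrated numerically. The sketch below takes $X \sim N(1, 1)$, so that $E\{X\} = 1$, and evaluates the two integrals with a midpoint rule after truncating them at $\pm 12$. The choice of distribution, the truncation point, the step count, and the helper names are assumptions made only for this illustration.

```python
import math


def normal_cdf(x, mean, std):
    """cdf of N(mean, std^2) written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def midpoint_integral(func, lower, upper, steps):
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


mean, std = 1.0, 1.0  # illustrative choice X ~ N(1, 1), so E{X} = 1
limit = 12.0          # the tails beyond +-12 are negligible for this example
upper_part = midpoint_integral(lambda x: 1.0 - normal_cdf(x, mean, std), 0.0, limit, 12_000)
lower_part = midpoint_integral(lambda x: normal_cdf(x, mean, std), -limit, 0.0, 12_000)
mean_from_tails = upper_part - lower_part
print(mean_from_tails)
```

The printed value should be very close to the mean of the chosen distribution, with the small discrepancy coming from the quadrature and the truncation.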

3.3.2 Expected Values of Functions of Random Variables

Based on the discussions in the previous section, let us now consider the expected values of functions of a random variable. Let $F_Y$ be the cdf of $Y = g(X)$. Then, the expected value of $Y = g(X)$ can be expressed as

$$E\{Y\} = \int_{-\infty}^{\infty} y\, dF_Y(y). \tag{3.3.8}$$

In essence, the expected value of $Y = g(X)$ can be evaluated using (3.3.8) after we have obtained the cdf, pdf, or pmf of $Y$ from that of $X$. On the other hand, the expected value of $Y = g(X)$ can be evaluated as $E\{Y\} = E\{g(X)\} = \int_{-\infty}^{\infty} g(x)\, dF_X(x)$, i.e.,

$$E\{Y\} = \begin{cases} \int_{-\infty}^{\infty} g(x) f_X(x)\, dx, & \text{continuous random variable}, \\ \sum\limits_{x=-\infty}^{\infty} g(x)\, p_X(x), & \text{discrete random variable}. \end{cases} \tag{3.3.9}$$

While the first approach (3.3.8) of evaluating the expected value of $Y = g(X)$ requires that we first obtain the cdf, pdf, or pmf of $Y$ from that of $X$, the second approach (3.3.9) does not require the cdf, pdf, or pmf of $Y$. In the second approach, we simply multiply the pdf $f_X(x)$ or pmf $p_X(x)$ of $X$ with $g(x)$ and then integrate or sum without first having to obtain the cdf, pdf, or pmf of $Y$. In short, if there is no other reason to obtain the cdf, pdf, or pmf of $Y = g(X)$, the second approach is faster in the evaluation of the expected value of $Y = g(X)$.

Example 3.3.5 When $X \sim U[0, 1)$, obtain the expected value of $Y = X^2$.

Solution (Method 1) Based on (3.2.35), we can obtain the pdf $f_Y(y) = \frac{1}{2\sqrt{y}} \left\{ f_X\left(\sqrt{y}\right) + f_X\left(-\sqrt{y}\right) \right\} u(y) = \frac{1}{2\sqrt{y}} \{u(y) - u(y - 1)\}$ of $Y$. Next, using (3.3.8), we get $E\{Y\} = \int_0^1 \frac{y}{2\sqrt{y}}\, dy = \frac{1}{2} \int_0^1 \sqrt{y}\, dy = \frac{1}{3}$.

(Method 2) Using (3.3.9), we can directly obtain $E\{Y\} = \int_0^1 x^2\, dx = \frac{1}{3}$. ♦

From the definition of the expected value, we can deduce the following properties:

(1) When a random variable $X$ is non-negative, i.e., when $P(X \ge 0) = 1$, we have $E\{X\} \ge 0$.
(2) The expected value of a constant is the constant. In other words, if $P(X = c) = 1$, then $E\{X\} = c$.
(3) The expected value is a linear operator: that is, we have $E\left\{\sum\limits_{i=1}^{n} a_i g_i(X)\right\} = \sum\limits_{i=1}^{n} a_i E\{g_i(X)\}$.
(4) For any function $h$, we have $|E\{h(X)\}| \le E\{|h(X)|\}$.
(5) If $h_1(x) \le h_2(x)$ for every point $x$, then we have $E\{h_1(X)\} \le E\{h_2(X)\}$.
(6) For any function $h$, we have $\min(h(X)) \le E\{h(X)\} \le \max(h(X))$.

Example 3.3.6 Based on (3) above, we have $E\{aX + b\} = aE\{X\} + b$ when $a$ and $b$ are constants. ♦

Example 3.3.7 For a continuous random variable $X \sim U(1, 9)$ and $h(x) = \frac{1}{\sqrt{x}}$, compare $h(E\{X\})$, $E\{h(X)\}$, $\min(h(X))$, and $\max(h(X))$.

Solution We have $h(E\{X\}) = h(5) = \frac{1}{\sqrt{5}}$ from the result in Example 3.3.3 and $E\{h(X)\} = \int_{-\infty}^{\infty} h(x) f_X(x)\, dx = \int_1^9 \frac{1}{8\sqrt{x}}\, dx = \frac{1}{2}$ from (3.3.9). In addition, $\min(h(X)) = \frac{1}{\sqrt{9}} = \frac{1}{3}$ and $\max(h(X)) = \frac{1}{\sqrt{1}} = 1$. Therefore, $\min(h(X)) \le E\{h(X)\} \le \max(h(X))$, confirming (6).
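The two routes (3.3.8) and (3.3.9) of Example 3.3.5, and the numbers of Example 3.3.7, can be reproduced with a simple midpoint rule. This is only a numerical sketch; the helper `midpoint_integral` and the step count are illustrative assumptions.

```python
import math


def midpoint_integral(func, lower, upper, steps=100_000):
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


# Example 3.3.5: X ~ U[0, 1), Y = X**2.
# Method 1 integrates y * f_Y(y) as in (3.3.8); Method 2 integrates g(x) * f_X(x) as in (3.3.9).
method_1 = midpoint_integral(lambda y: y / (2.0 * math.sqrt(y)), 0.0, 1.0)
method_2 = midpoint_integral(lambda x: x * x, 0.0, 1.0)

# Example 3.3.7: X ~ U(1, 9), h(x) = 1 / sqrt(x), f_X(x) = 1/8 on (1, 9).
mean_of_h = midpoint_integral(lambda x: (1.0 / math.sqrt(x)) / 8.0, 1.0, 9.0)
h_of_mean = 1.0 / math.sqrt(5.0)
print(method_1, method_2, mean_of_h, h_of_mean)
```

The last two printed values differ, which illustrates that $E\{h(X)\}$ and $h(E\{X\})$ are in general not the same.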


0, 1 − zy > 0, 1 − y > 0} is the same as $\left\{(y, z) : z > 0,\ 0 < y < \min\left(1, \frac{1}{z}\right)\right\}$, the pdf of $Z = \frac{X}{Y}$ can be obtained as $f_Z(z) = \int_0^{\min\left(1, \frac{1}{z}\right)} y\, dy\, u(z)$, i.e.,

$$f_Z(z) = \begin{cases} 0, & z < 0, \\ \frac{1}{2}, & 0 < z \le 1, \\ \frac{1}{2z^2}, & z \ge 1. \end{cases} \tag{4.2.28}$$

Note that the value $f_Z(0)$ is not, and does not need to be, specified. Figure 4.10 shows the pdf (4.2.28) of $Z = \frac{X}{Y}$. ♦

Fig. 4.10 The pdf $f_Z(z)$ of $Z = \frac{X}{Y}$ when $X$ and $Y$ are i.i.d. with the marginal pdf $f(x) = u(x)u(1 - x)$
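As a quick plausibility check of (4.2.28), the sketch below simulates the ratio of two independent uniform random numbers and compares its empirical cdf with the cdf obtained by integrating (4.2.28), namely $z/2$ for $0 < z \le 1$ and $1 - \frac{1}{2z}$ for $z > 1$. That closed-form cdf is derived here for the illustration and is not quoted from the text; the seed, sample size, and names are also assumptions.

```python
import random


def ratio_cdf(z):
    """cdf obtained by integrating the pdf (4.2.28) of Z = X / Y."""
    if z <= 0:
        return 0.0
    if z <= 1:
        return z / 2.0
    return 1.0 - 1.0 / (2.0 * z)


rng = random.Random(4)
n_samples = 200_000
# 1 - random() lies in (0, 1], which avoids a zero denominator.
ratios = [rng.random() / (1.0 - rng.random()) for _ in range(n_samples)]
for z in (0.5, 1.0, 2.0):
    empirical = sum(1 for r in ratios if r <= z) / n_samples
    print(z, ratio_cdf(z), empirical)
```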

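4.2.3 Joint Cumulative Distribution Function

Theorem 4.2.1 is useful when we obtain the joint pdf $f_{\mathbf{Y}}$ of $\mathbf{Y} = \mathbf{g}(\mathbf{X})$ directly from the joint pdf $f_{\mathbf{X}}$ of $\mathbf{X}$. In some cases, it is more convenient and easier to obtain the joint pdf $f_{\mathbf{Y}}$ after first obtaining the joint cdf $F_{\mathbf{Y}}(\mathbf{y}) = P(\mathbf{Y} \le \mathbf{y}) = P(\mathbf{g}(\mathbf{X}) \le \mathbf{y})$ as

$$F_{\mathbf{Y}}(\mathbf{y}) = P\left(\mathbf{X} \in A_{\mathbf{y}}\right). \tag{4.2.29}$$

Here, $\mathbf{Y} \le \mathbf{y}$ denotes $\{Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_n \le y_n\}$, and $A_{\mathbf{y}}$ denotes the inverse image of $\mathbf{Y} \le \mathbf{y}$, i.e., the region of $\mathbf{X}$ such that $\{\mathbf{Y} \le \mathbf{y}\} = \left\{\mathbf{X} \in A_{\mathbf{y}}\right\}$. For example, when $n = 1$, if $g(x)$ is non-decreasing at every point $x$ and has an inverse function, we have $\left\{X \in A_y\right\} = \left\{X \le g^{-1}(y)\right\}$ as we observed in Chap. 3, and we get $F_Y(y) = F_X\left(g^{-1}(y)\right)$.

Example 4.2.12 When the joint pdf of $(X, Y)$ is

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right), \tag{4.2.30}$$

obtain the joint pdf of $Z = \sqrt{X^2 + Y^2}$ and $W = \frac{Y}{X}$.

Solution The joint cdf $F_{Z,W}(z, w) = P(Z \le z, W \le w)$ of $(Z, W)$ is

$$F_{Z,W}(z, w) = P\left(\sqrt{X^2 + Y^2} \le z,\ \frac{Y}{X} \le w\right) = \iint_{D_{zw}} \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) dx\, dy \tag{4.2.31}$$

for $z \ge 0$, where $D_{zw}$ is the union of the two fan shapes $\left\{(x, y) : x^2 + y^2 \le z^2,\ x > 0,\ y \le wx\right\}$ and $\left\{(x, y) : x^2 + y^2 \le z^2,\ x < 0,\ y \ge wx\right\}$ when $w > 0$ and also when $w < 0$. Changing the integration in the perpendicular coordinate system into that in the polar coordinate system as indicated in Fig. 4.1 and noting the symmetry of $f_{X,Y}$, we get $F_{Z,W}(z, w) = 2 \int_{\theta=-\frac{\pi}{2}}^{\theta_w} \int_{r=0}^{z} \frac{1}{2\pi\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right) r\, dr\, d\theta = \frac{1}{\pi\sigma^2} \left(\theta_w + \frac{\pi}{2}\right) \left[-\sigma^2 \exp\left(-\frac{r^2}{2\sigma^2}\right)\right]_{r=0}^{z}$, i.e.,

$$F_{Z,W}(z, w) = \frac{1}{2\pi} \left(\pi + 2\tan^{-1} w\right) \left\{1 - \exp\left(-\frac{z^2}{2\sigma^2}\right)\right\}, \tag{4.2.32}$$

where $\theta_w = \tan^{-1} w \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$.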
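The joint cdf (4.2.32) lends itself to a simple simulation check. The sketch below draws independent zero-mean normal pairs with common standard deviation $\sigma$, as implied by (4.2.30), counts how often a pair falls in the region $D_{zw}$, and compares the frequency with (4.2.32). The test point, sample size, seed, and function names are illustrative assumptions.

```python
import math
import random


def joint_cdf_zw(z, w, sigma):
    """Joint cdf (4.2.32) of Z = sqrt(X^2 + Y^2) and W = Y / X for z >= 0."""
    radial = 1.0 - math.exp(-z * z / (2.0 * sigma * sigma))
    return (math.pi + 2.0 * math.atan(w)) * radial / (2.0 * math.pi)


def empirical_joint_cdf(z, w, sigma, n_samples=200_000, seed=5):
    rng = random.Random(seed)
    count = 0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        y = rng.gauss(0.0, sigma)
        inside_disc = x * x + y * y <= z * z
        # The two fan shapes of D_zw, written without dividing by x.
        in_fans = (x > 0 and y <= w * x) or (x < 0 and y >= w * x)
        if inside_disc and in_fans:
            count += 1
    return count / n_samples


z, w, sigma = 1.5, 0.5, 1.0  # illustrative values
print(joint_cdf_zw(z, w, sigma), empirical_joint_cdf(z, w, sigma))
```

Expressing the event $\{Y/X \le w\}$ through the two fan shapes avoids a division by a value of $X$ that could be zero.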



Recollect that we obtained the total probability theorems (2.4.13), (3.4.8), and (3.4.9) based on $P(A|B)P(B) = P(AB)$ derived from (2.4.1). Now, extending the results into the multi-dimensional space, we similarly$^8$ have

$$P(A) = \begin{cases} \int_{\text{all } \mathbf{x}} P(A | \mathbf{X} = \mathbf{x}) f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}, & \text{continuous random vector } \mathbf{X}, \\ \sum\limits_{\text{all } \mathbf{x}} P(A | \mathbf{X} = \mathbf{x})\, p_{\mathbf{X}}(\mathbf{x}), & \text{discrete random vector } \mathbf{X}, \end{cases} \tag{4.2.33}$$

which are useful in obtaining the cdf, pdf, and pmf in some cases.

Example 4.2.13 Obtain the pdf of $Y = X_1 + X_2$ when $\mathbf{X} = (X_1, X_2)$ has the joint pdf $f_{\mathbf{X}}$.

Solution This problem has already been discussed in Example 4.2.4 based on the pdf. We now consider the problem based on the cdf.

(Method 1) Recollecting (4.2.33), the cdf $F_Y(y) = P(X_1 + X_2 \le y)$ of $Y$ can be expressed as

$$F_Y(y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P(X_1 + X_2 \le y\, |\, X_1 = x_1, X_2 = x_2)\, f_{\mathbf{X}}(x_1, x_2)\, dx_1\, dx_2. \tag{4.2.34}$$

Here, $\{X_1 + X_2 \le y\, |\, X_1 = x_1, X_2 = x_2\}$ does and does not hold true when $x_1 + x_2 \le y$ and $x_1 + x_2 > y$, respectively. Thus, we have

$$P(X_1 + X_2 \le y\, |\, X_1 = x_1, X_2 = x_2) = \begin{cases} 1, & x_1 + x_2 \le y, \\ 0, & x_1 + x_2 > y \end{cases} \tag{4.2.35}$$

and the cdf of $Y$ can be expressed as

$$F_Y(y) = \iint_{x_1 + x_2 \le y} f_{\mathbf{X}}(x_1, x_2)\, dx_1\, dx_2 = \int_{-\infty}^{\infty} \int_{-\infty}^{y - x_2} f_{\mathbf{X}}(x_1, x_2)\, dx_1\, dx_2. \tag{4.2.36}$$

Then, the pdf $f_Y(y) = \frac{\partial}{\partial y} F_Y(y)$ of $Y$ is

$$f_Y(y) = \int_{-\infty}^{\infty} f_{\mathbf{X}}(y - x_2, x_2)\, dx_2, \tag{4.2.37}$$

which can also be expressed as $f_Y(y) = \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, y - x_1)\, dx_1$. In obtaining (4.2.37), we used $\frac{\partial}{\partial y} \int_{-\infty}^{y - x_2} f_{\mathbf{X}}(x_1, x_2)\, dx_1 = f_{\mathbf{X}}(y - x_2, x_2)$ from Leibnitz's rule (3.2.18).

(Method 2) Referring to the region $A = \{(X_1, X_2) : X_1 + X_2 \le y\}$ shown in Fig. 4.11, the value $x_1$ of $X_1$ runs from $-\infty$ to $y - x_2$ when the value $x_2$ of $X_2$ runs from $-\infty$ to $\infty$. Thus we have$^9$ $F_Y(y) = P(Y \le y) = P(X_1 + X_2 \le y) = \iint_A f_{\mathbf{X}}(x_1, x_2)\, dx_1\, dx_2$, i.e.,

$$F_Y(y) = \int_{x_2=-\infty}^{\infty} \int_{x_1=-\infty}^{y - x_2} f_{\mathbf{X}}(x_1, x_2)\, dx_1\, dx_2, \tag{4.2.38}$$

and subsequently (4.2.37). ♦

8 Conditional distribution in random vectors will be discussed in Sect. 4.4 in more detail.

Fig. 4.11 The region $A = \{(X_1, X_2) : X_1 + X_2 \le y\}$ and the interval $(-\infty, y - x_2)$ of integration for the value $x_1$ of $X_1$ when the value of $X_2$ is $x_2$

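9 If the order of integration is interchanged, then $\int_{x_2=-\infty}^{\infty} \int_{x_1=-\infty}^{y - x_2} f_{\mathbf{X}}(x_1, x_2)\, dx_1\, dx_2$ will become $\int_{x_1=-\infty}^{\infty} \int_{x_2=-\infty}^{y - x_1} f_{\mathbf{X}}(x_1, x_2)\, dx_2\, dx_1$.

Example 4.2.14 Obtain the pdf of $Z = X + Y$ when $X$ with the pdf $f_X(x) = \alpha e^{-\alpha x} u(x)$ and $Y$ with the pdf $f_Y(y) = \beta e^{-\beta y} u(y)$ are independent of each other.

Solution We first obtain the pdf of $Z$ directly. The joint pdf of $X$ and $Y$ is $f_{X,Y}(x, y) = f_X(x) f_Y(y) = \alpha\beta e^{-\alpha x} e^{-\beta y} u(x) u(y)$. Recollecting that $u(y)u(z - y)$ is non-zero only when $0 < y < z$, the pdf of $Z = X + Y$ can be obtained as $f_Z(z) = \int_{-\infty}^{\infty} \alpha\beta e^{-\alpha(z - y)} e^{-\beta y} u(y) u(z - y)\, dy = \alpha\beta e^{-\alpha z} \int_0^z e^{(\alpha - \beta)y}\, dy\, u(z)$, i.e.,

$$f_Z(z) = \begin{cases} \frac{\alpha\beta}{\beta - \alpha} \left(e^{-\alpha z} - e^{-\beta z}\right) u(z), & \beta \ne \alpha, \\ \alpha^2 z e^{-\alpha z} u(z), & \beta = \alpha \end{cases} \tag{4.2.39}$$

from (4.2.37). Next, the cdf of $Z$ can be expressed as

$$F_Z(z) = \alpha\beta \int_{-\infty}^{\infty} \int_{-\infty}^{z - y} e^{-\alpha x} e^{-\beta y} u(x) u(y)\, dx\, dy \tag{4.2.40}$$

based on (4.2.38). Here, (4.2.40) is non-zero only when $\{x > 0, y > 0, z - y > 0\}$ due to $u(x)u(y)$. With this fact in mind and by noting that $\{y > 0, z - y > 0\} = \{z > y > 0\}$, we can rewrite (4.2.40) as

$$\begin{aligned} F_Z(z) &= \alpha\beta \int_0^z \int_0^{z - y} e^{-\alpha x} e^{-\beta y}\, dx\, dy\, u(z) \\ &= \beta \int_0^z \left\{1 - e^{-\alpha(z - y)}\right\} e^{-\beta y}\, dy\, u(z) \\ &= \begin{cases} \left\{1 - \frac{1}{\beta - \alpha} \left(\beta e^{-\alpha z} - \alpha e^{-\beta z}\right)\right\} u(z), & \beta \ne \alpha, \\ \left\{1 - (1 + \alpha z) e^{-\alpha z}\right\} u(z), & \beta = \alpha. \end{cases} \end{aligned} \tag{4.2.41}$$

By differentiating this cdf, we can obtain the pdf (4.2.39). ♦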
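The closing remark of Example 4.2.14 can be verified numerically: a central difference of the cdf (4.2.41) should reproduce the pdf (4.2.39), both for $\beta \ne \alpha$ and for $\beta = \alpha$. The sketch below does this at one point; the parameter values, the step size, and the function names are illustrative assumptions.

```python
import math


def pdf_sum_exp(z, alpha, beta):
    """pdf (4.2.39) of Z = X + Y for independent X and Y with pdf's alpha e^{-alpha x} u(x) and beta e^{-beta y} u(y)."""
    if z <= 0:
        return 0.0
    if alpha == beta:
        return alpha * alpha * z * math.exp(-alpha * z)
    return alpha * beta / (beta - alpha) * (math.exp(-alpha * z) - math.exp(-beta * z))


def cdf_sum_exp(z, alpha, beta):
    """cdf (4.2.41) of the same Z."""
    if z <= 0:
        return 0.0
    if alpha == beta:
        return 1.0 - (1.0 + alpha * z) * math.exp(-alpha * z)
    return 1.0 - (beta * math.exp(-alpha * z) - alpha * math.exp(-beta * z)) / (beta - alpha)


def central_difference(func, z, h=1e-5):
    return (func(z + h) - func(z - h)) / (2.0 * h)


for alpha, beta in ((1.0, 3.0), (2.0, 2.0)):  # illustrative parameters
    z = 0.8
    slope = central_difference(lambda t: cdf_sum_exp(t, alpha, beta), z)
    print(alpha, beta, slope, pdf_sum_exp(z, alpha, beta))
```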

Example 4.2.15 For a continuous random vector $(X, Y)$, let $Z = \max(X, Y)$ and $W = \min(X, Y)$. Referring to Fig. 4.12, we first have $F_Z(z) = P(\max(X, Y) \le z) = P(X \le z, Y \le z)$, i.e.,

$$F_Z(z) = F_{X,Y}(z, z). \tag{4.2.42}$$

Fig. 4.12 The region $\{(X, Y) : \max(X, Y) \le z\}$

Fig. 4.13 The region $\{(X, Y) : \min(X, Y) \le w\}$

Next, when $A = \{X \le w\}$ and $B = \{Y \le w\}$, we have

$$P(X > w, Y > w) = 1 - F_X(w) - F_Y(w) + F_{X,Y}(w, w) \tag{4.2.43}$$

from $P(X > w, Y > w) = P(A^c \cap B^c) = P((A \cup B)^c) = 1 - P(A \cup B) = 1 - P(A) - P(B) + P(A \cap B)$, $P(A) = P(X \le w) = F_X(w)$, $P(B) = P(Y \le w) = F_Y(w)$, and $P(A \cap B) = P(X \le w, Y \le w) = F_{X,Y}(w, w)$. Therefore, we get the cdf $F_W(w) = P(W \le w) = 1 - P(W > w) = 1 - P(\min(X, Y) > w) = 1 - P(X > w, Y > w)$ of $W$ as

$$F_W(w) = F_X(w) + F_Y(w) - F_{X,Y}(w, w), \tag{4.2.44}$$

which can also be obtained intuitively from Fig. 4.13. Note that the pdf $f_Z(z) = \frac{d}{dz} F_{X,Y}(z, z)$ of $Z = \max(X, Y)$ becomes

$$f_Z(z) = 2F(z) f(z) \tag{4.2.45}$$

and the pdf $f_W(w) = \frac{d}{dw} \left\{F_X(w) + F_Y(w) - F_{X,Y}(w, w)\right\}$ of $W = \min(X, Y)$ becomes

$$f_W(w) = 2\{1 - F(w)\} f(w) \tag{4.2.46}$$

when $X$ and $Y$ are i.i.d. with the marginal cdf $F$ and marginal pdf $f$. ♦

The generalization of Z = max(X, Y ) and W = min(X, Y ) discussed in Example 4.2.15 and Exercise 4.31 is referred to as the order statistic (David and Nagaraja 2003).
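For a concrete instance of (4.2.45) and (4.2.46), take $X$ and $Y$ i.i.d. uniform on $[0, 1)$, so that $f_Z(z) = 2z$ and $f_W(w) = 2(1 - w)$ on $(0, 1)$, with means $\frac{2}{3}$ and $\frac{1}{3}$. The sketch below compares these means with sample averages. The uniform choice, the seed, and the sample size are assumptions made only for the illustration.

```python
import random

rng = random.Random(6)
n_samples = 100_000
max_total = 0.0
min_total = 0.0
for _ in range(n_samples):
    x, y = rng.random(), rng.random()
    max_total += max(x, y)
    min_total += min(x, y)
mean_max = max_total / n_samples  # compare with the integral of z * 2z over (0, 1), i.e. 2/3
mean_min = min_total / n_samples  # compare with the integral of w * 2(1 - w) over (0, 1), i.e. 1/3
print(mean_max, mean_min)
```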

4.2.4 Functions of Discrete Random Vectors

Considering that the pmf, unlike the pdf, represents a probability, we now discuss functions of discrete random vectors.

Example 4.2.16 (Rohatgi and Saleh 2001) Obtain the pmf of $Z = X + Y$ and the pmf of $W = X - Y$ when $X \sim b(n, p)$ and $Y \sim b(n, p)$ are independent of each other.

Solution First, the pmf $P(Z = z) = \sum\limits_{k=0}^{n} P(X = k, Y = z - k)$ of $Z = X + Y$ can be obtained as $P(Z = z) = \sum\limits_{k=0}^{n} {}_nC_k\, p^k (1 - p)^{n-k}\, {}_nC_{z-k}\, p^{z-k} (1 - p)^{n-z+k} = \sum\limits_{k=0}^{n} {}_nC_k\, {}_nC_{z-k}\, p^z (1 - p)^{2n-z}$, i.e.,

$$P(Z = z) = {}_{2n}C_z\, p^z (1 - p)^{2n-z} \tag{4.2.47}$$

for $z = 0, 1, \ldots, 2n$, where we have used $\sum\limits_{k=0}^{n} {}_nC_k\, {}_nC_{z-k} = {}_{2n}C_z$ based on (1.A.25). Next, the pmf $P(W = w) = \sum\limits_{k=0}^{n} P(X = k + w, Y = k) = \sum\limits_{k=0}^{n} {}_nC_{k+w}\, {}_nC_k\, p^{2k+w} (1 - p)^{2n-2k-w}$ of $W = X - Y$ can be obtained as

$$P(W = w) = \left(\frac{p}{1 - p}\right)^w \sum_{k=0}^{n} {}_nC_{k+w}\, {}_nC_k\, p^{2k} (1 - p)^{2n-2k} \tag{4.2.48}$$

for $w = -n, -n + 1, \ldots, n$. ♦

Example 4.2.17 Assume that $X$ and $Y$ are i.i.d. with the marginal pmf $p(x) = (1 - \alpha)\alpha^{x-1} \tilde{u}(x - 1)$, where $0 < \alpha < 1$ and $\tilde{u}(x)$ is the discrete space unit step function defined in (1.4.17). Obtain the joint pmf of $(X + Y, X)$, and based on the result, obtain the pmf of $X$ and the pmf of $X + Y$.

Solution First we have $p_{X+Y,X}(v, x) = P(X + Y = v, X = x) = P(X = x, Y = v - x)$, i.e.,

$$p_{X+Y,X}(v, x) = (1 - \alpha)^2 \alpha^{v-2}\, \tilde{u}(x - 1)\, \tilde{u}(v - x - 1). \tag{4.2.49}$$

Thus, we have $p_{X+Y}(v) = \sum\limits_{x=-\infty}^{\infty} p_{X+Y,X}(v, x)$, i.e.,

$$p_{X+Y}(v) = (1 - \alpha)^2 \alpha^{v-2} \sum_{x=-\infty}^{\infty} \tilde{u}(x - 1)\, \tilde{u}(v - x - 1) \tag{4.2.50}$$

from (4.2.49). Now noting that $\tilde{u}(x - 1)\tilde{u}(v - x - 1) = 1$ for $\{x - 1 \ge 0, v - x - 1 \ge 0\}$ and 0 otherwise and that$^{10}$ $\{x : x - 1 \ge 0, v - x - 1 \ge 0\} = \{x : 1 \le x \le v - 1, v \ge 2\}$, we have $p_{X+Y}(v) = (1 - \alpha)^2 \alpha^{v-2} \sum\limits_{x=1}^{v-1} \tilde{u}(v - 2)$, i.e.,

$$p_{X+Y}(v) = (1 - \alpha)^2 (v - 1) \alpha^{v-2}\, \tilde{u}(v - 2). \tag{4.2.51}$$

Next, the pmf of $X$ can be obtained as $p_X(x) = \sum\limits_{v=-\infty}^{\infty} p_{X+Y,X}(v, x)$, i.e.,

$$p_X(x) = (1 - \alpha)^2 \alpha^{-2} \sum_{v=-\infty}^{\infty} \alpha^v\, \tilde{u}(x - 1)\, \tilde{u}(v - x - 1) \tag{4.2.52}$$

from (4.2.49), which can be rewritten as $p_X(x) = (1 - \alpha)^2 \alpha^{-2} \sum\limits_{v=x+1}^{\infty} \alpha^v\, \tilde{u}(x - 1)$, i.e.,

$$p_X(x) = (1 - \alpha)\alpha^{x-1}\, \tilde{u}(x - 1) \tag{4.2.53}$$

by noting that $\{v : x - 1 \ge 0, v - x - 1 \ge 0\} = \{v : v \ge x + 1, x \ge 1\}$ and $\tilde{u}(x - 1)\tilde{u}(v - x - 1) = 1$ for $\{x - 1 \ge 0, v - x - 1 \ge 0\}$ and 0 otherwise. ♦

10 Here, $v - x - 1 \ge 0$, for example, can more specifically be written as $v - x - 1 = 0, 1, \ldots$.

4.3 Expected Values and Joint Moments

For random vectors, we will describe here the basic properties (Balakrishnan 1992; Kendall and Stuart 1979; Samorodnitsky and Taqqu 1994) of expected values. New notions will also be defined and explored.

4.3.1 Expected Values

The expected values for random vectors can be described as, for example,

$$E\{g(\mathbf{X})\} = \int g(\mathbf{x})\, dF_{\mathbf{X}}(\mathbf{x}) \tag{4.3.1}$$

by extending the notion of the expected values discussed in Chap. 3 into multiple dimensions. Because the expectation is a linear operator, we have

$$E\left\{\sum_{i=1}^{n} a_i g_i(X_i)\right\} = \sum_{i=1}^{n} a_i E\{g_i(X_i)\} \tag{4.3.2}$$

for an $n$-dimensional random vector $\mathbf{X}$ when $\{g_i\}_{i=1}^{n}$ are all measurable functions. In addition,

$$E\left\{\prod_{i=1}^{n} g_i(X_i)\right\} = \prod_{i=1}^{n} E\{g_i(X_i)\} \tag{4.3.3}$$

when $\mathbf{X}$ is an independent random vector.

Example 4.3.1 Assume that we repeatedly roll a fair die until the number of even-numbered outcomes is 10. Let $N$ denote the number of rolls and $X_i$ denote the number of outcome $i$ when the repetition ends. Obtain the pmf of $N$, expected value of $N$, expected value of $X_1$, and expected value of $X_2$.

Solution First, the pmf of $N$ can be obtained as $P(N = k) = P(A_k \cap B) = P(A_k | B) P(B) = P(A_k) P(B) = \frac{1}{2}\, {}_{k-1}C_9 \left(\frac{1}{2}\right)^9 \left(\frac{1}{2}\right)^{k-1-9}$, i.e.,

$$P(N = k) = {}_{k-1}C_9 \left(\frac{1}{2}\right)^k \tag{4.3.4}$$

for $k = 10, 11, \ldots$, where $A_k = \{\text{an even number at the } k\text{-th rolling}\}$ and $B = \{9 \text{ times of even numbers until } (k - 1)\text{-st rolling}\}$. Using $\sum\limits_{x=0}^{\infty} {}_{r+x-1}C_x (1 - \alpha)^x = \alpha^{-r}$ shown in (2.5.16) and noting that $k\, {}_{k-1}C_9 = k \frac{(k-1)!}{(k-10)!\, 9!} = 10 \frac{k!}{(k-10)!\, 10!} = 10\, {}_kC_{10}$, we get $E\{N\} = \sum\limits_{k=10}^{\infty} k\, {}_{k-1}C_9 \left(\frac{1}{2}\right)^k = 10 \sum\limits_{k=10}^{\infty} {}_kC_{10} \left(\frac{1}{2}\right)^k = \frac{10}{2^{10}} \sum\limits_{j=0}^{\infty} {}_{j+10}C_{10} \left(\frac{1}{2}\right)^j = \frac{10}{2^{10}} \left(\frac{1}{2}\right)^{-11} = 20$. This result can also be obtained from the formula (3.E.27) of the mean of the NB distribution with the pmf (2.5.17) by using $(r, p) = \left(10, \frac{1}{2}\right)$. Subsequently, until the end, even numbers will occur 10 times, among which 2, 4, and 6 will occur equally likely. Thus, $E\{X_2\} = \frac{10}{3}$. Next, from $N = \sum\limits_{i=1}^{6} X_i$, we get $E\{N\} = \sum\limits_{i=1}^{6} E\{X_i\}$. Here, because $E\{X_2\} = E\{X_4\} = E\{X_6\} = \frac{10}{3}$ and $E\{X_1\} = E\{X_3\} = E\{X_5\}$, the expected value$^{11}$ of $X_1$ is $E\{X_1\} = \frac{1}{3} \left(20 - 3 \times \frac{10}{3}\right) = \frac{10}{3}$. ♦
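The expected values found in Example 4.3.1 can be cross-checked by direct simulation of the experiment. The sketch below repeats the stopping rule many times and averages the number of rolls and the counts of faces 1 and 2. The number of trials, the seed, and the function name `play_once` are illustrative assumptions.

```python
import random


def play_once(rng, target_evens=10):
    """Roll a fair die until `target_evens` even outcomes have appeared; return the face counts."""
    counts = [0] * 7  # counts[i] is the number of times face i appeared; index 0 is unused
    evens = 0
    while evens < target_evens:
        face = rng.randint(1, 6)
        counts[face] += 1
        if face % 2 == 0:
            evens += 1
    return counts


rng = random.Random(7)
n_trials = 20_000
total_rolls = 0
total_ones = 0
total_twos = 0
for _ in range(n_trials):
    counts = play_once(rng)
    total_rolls += sum(counts)
    total_ones += counts[1]
    total_twos += counts[2]
mean_rolls = total_rolls / n_trials  # compare with E{N} = 20
mean_ones = total_ones / n_trials    # compare with E{X_1} = 10/3
mean_twos = total_twos / n_trials    # compare with E{X_2} = 10/3
print(mean_rolls, mean_ones, mean_twos)
```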



11 The expected values of $X_1$ and $X_2$ can of course be obtained with the pmf's of $X_1$ and $X_2$ obtained already in Example 4.1.11.

4.3.2 Joint Moments

We now generalize the concept of moments discussed in Chap. 3 for random vectors. The moments for bi-variate random vectors will first be considered and then those for higher dimensions will be discussed.

4.3.2.1 Bi-variate Random Vectors

Definition 4.3.1 (joint moment; joint central moment) The expected value

$$m_{jk} = E\left\{X^j Y^k\right\} \tag{4.3.5}$$

is termed the $(j, k)$-th joint moment or product moment of $X$ and $Y$, and

$$\mu_{jk} = E\left\{(X - m_X)^j (Y - m_Y)^k\right\} \tag{4.3.6}$$

is termed the $(j, k)$-th joint central moment or product central moment of $X$ and $Y$, for $j, k = 0, 1, \ldots$, where $m_X$ and $m_Y$ are the means of $X$ and $Y$, respectively.

It is easy to see that $m_{00} = \mu_{00} = 1$, $m_{10} = m_X = E\{X\}$, $m_{01} = m_Y = E\{Y\}$, $m_{20} = E\left\{X^2\right\}$, $m_{02} = E\left\{Y^2\right\}$, $\mu_{10} = \mu_{01} = 0$, $\mu_{20} = \sigma_X^2$ is the variance of $X$, and $\mu_{02} = \sigma_Y^2$ is the variance of $Y$.

Example 4.3.2 The expected value $E\left\{X_1 X_2^3\right\}$ is the $(1, 3)$-rd joint moment of $\mathbf{X} = (X_1, X_2)$. ♦

Definition 4.3.2 (correlation; covariance) The $(1, 1)$-st joint moment $m_{11}$ and the $(1, 1)$-st joint central moment $\mu_{11}$ are termed the correlation and covariance, respectively, of the two random variables. The ratio of the covariance to the product of the standard deviations of two random variables is termed the correlation coefficient.

The correlation $m_{11} = E\{XY\}$ is often denoted$^{12}$ by $R_{XY}$, and the covariance $\mu_{11} = E\{(X - m_X)(Y - m_Y)\} = E\{XY\} - m_X m_Y$ by $K_{XY}$, $\mathrm{Cov}(X, Y)$, or $C_{XY}$. Specifically, we have

$$K_{XY} = R_{XY} - m_X m_Y \tag{4.3.7}$$

for the covariance, and

$$\rho_{XY} = \frac{K_{XY}}{\sqrt{\sigma_X^2 \sigma_Y^2}} \tag{4.3.8}$$

for the correlation coefficient.

Definition 4.3.3 (orthogonal; uncorrelated) When the correlation is 0 or, equivalently, when the mean of the product is 0, the two random variables are called orthogonal. When the mean of the product is the same as the product of the means or, equivalently, when the covariance or correlation coefficient is 0, the two random variables are called uncorrelated.

12 When there is more than one subscript, we need commas in some cases: for example, the joint pdf $f_{X,Y}$ of $(X, Y)$ should be differentiated from the pdf $f_{XY}$ of the product $XY$. In other cases, we do not need to use commas: for instance, $R_{XY}$, $\mu_{jk}$, $K_{XY}$, $\ldots$ denote relations among two or more random variables and thus are expressed without any comma.

In other words, when $R_{XY} = E\{XY\} = 0$, $X$ and $Y$ are orthogonal. When $\rho_{XY} = 0$, $K_{XY} = \mathrm{Cov}(X, Y) = 0$, or $E\{XY\} = E\{X\}E\{Y\}$, $X$ and $Y$ are uncorrelated.

Theorem 4.3.1 If two random variables are independent of each other, then they are uncorrelated, but the converse is not necessarily true.

In other words, there exist some uncorrelated random variables that are not independent of each other. In addition, when two random variables are independent and at least one of them has mean 0, the two random variables are orthogonal.

Theorem 4.3.2 The absolute value of a correlation coefficient is no larger than 1.

Proof From the Cauchy-Schwarz inequality $E^2\{XY\} \le E\left\{X^2\right\} E\left\{Y^2\right\}$ shown in (6.A.26), we get

$$E^2\{(X - m_X)(Y - m_Y)\} \le E\left\{(X - m_X)^2\right\} E\left\{(Y - m_Y)^2\right\}, \tag{4.3.9}$$

which implies $K_{XY}^2 \le \sigma_X^2 \sigma_Y^2$. Thus, $\rho_{XY}^2 \le 1$ and $|\rho_{XY}| \le 1$. ♠

Example 4.3.3 When the two random variables $X$ and $Y$ are related by $Y - m_Y = c(X - m_X)$ or $Y = cX + d$, we have $|\rho_{XY}| = 1$. ♦

4.3.2.2 Multi-dimensional Random Vectors
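The second half of Theorem 4.3.1 (uncorrelated does not imply independent) can be illustrated with a small discrete pair that is not taken from the text: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The sketch below computes the covariance exactly and shows that one joint probability differs from the product of the marginals. The choice of this pair and all names are illustrative assumptions.

```python
from fractions import Fraction

# Illustrative pair: X uniform on {-1, 0, 1} and Y = X**2.
joint_pmf = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def expectation(func):
    """Expected value of func(X, Y) under the joint pmf above."""
    return sum(p * func(x, y) for (x, y), p in joint_pmf.items())


mean_x = expectation(lambda x, y: x)
mean_y = expectation(lambda x, y: y)
covariance = expectation(lambda x, y: x * y) - mean_x * mean_y  # K_XY = R_XY - m_X m_Y, (4.3.7)

# Independence would require P(X = 0, Y = 0) = P(X = 0) P(Y = 0).
p_x0 = sum(p for (x, y), p in joint_pmf.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint_pmf.items() if y == 0)
p_x0_y0 = joint_pmf.get((0, 0), Fraction(0))
print(covariance, p_x0_y0, p_x0 * p_y0)
```

The covariance vanishes while the joint probability and the product of the marginals disagree, so the pair is uncorrelated but not independent.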

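Let $E\{\mathbf{X}\} = \mathbf{m}_{\mathbf{X}} = (m_1, m_2, \ldots, m_n)^T$ be the mean vector of $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$. In subsequent discussions, especially when we discuss joint moments of random vectors, we will often assume the random vectors are complex. The discussion on complex random vectors is almost the same as that on real random vectors on which we have so far focused.

Definition 4.3.4 (correlation matrix; covariance matrix) The matrix

$$\mathbf{R}_{\mathbf{X}} = E\left\{\mathbf{X}\mathbf{X}^H\right\} \tag{4.3.10}$$

is termed the correlation matrix and the matrix

$$\mathbf{K}_{\mathbf{X}} = \mathbf{R}_{\mathbf{X}} - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^H \tag{4.3.11}$$

is termed the covariance matrix or variance-covariance matrix of $\mathbf{X}$, where the superscript $H$ denotes the complex conjugate transpose, also called the Hermitian transpose or Hermitian conjugate.

The correlation matrix $\mathbf{R}_{\mathbf{X}} = \left[R_{ij}\right]$ is of size $n \times n$: the $(i, j)$-th element of $\mathbf{R}_{\mathbf{X}}$ is the correlation $R_{ij} = R_{X_i X_j} = E\left\{X_i X_j^*\right\}$ between $X_i$ and $X_j$ when $i \ne j$ and the second absolute moment $R_{ii} = E\left\{|X_i|^2\right\}$ when $i = j$. The covariance matrix $\mathbf{K}_{\mathbf{X}} = \left[K_{ij}\right]$ is also an $n \times n$ matrix: the $(i, j)$-th element of $\mathbf{K}_{\mathbf{X}}$ is the covariance $K_{ij} = K_{X_i X_j} = E\left\{(X_i - m_i)\left(X_j - m_j\right)^*\right\}$ of $X_i$ and $X_j$ when $i \ne j$ and the variance $K_{ii} = \mathrm{Var}(X_i)$ of $X_i$ when $i = j$.

Example 4.3.4 For a random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$ and an $n \times n$ linear transformation matrix $\mathbf{L} = \left[L_{ij}\right]$, consider the random vector

$$\mathbf{Y} = \mathbf{L}\mathbf{X}, \tag{4.3.12}$$

where $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$. Then, letting $\mathbf{m}_{\mathbf{X}} = \left(m_{X_1}, m_{X_2}, \ldots, m_{X_n}\right)^T$ be the mean vector of $\mathbf{X}$, the mean vector $\mathbf{m}_{\mathbf{Y}} = E\{\mathbf{Y}\} = \left(m_{Y_1}, m_{Y_2}, \ldots, m_{Y_n}\right)^T = E\{\mathbf{L}\mathbf{X}\} = \mathbf{L}E\{\mathbf{X}\}$ of $\mathbf{Y}$ can be obtained as

$$\mathbf{m}_{\mathbf{Y}} = \mathbf{L}\mathbf{m}_{\mathbf{X}}. \tag{4.3.13}$$

Similarly, denoting by $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{K}_{\mathbf{X}}$ the correlation and covariance matrices, respectively, of $\mathbf{X}$, the correlation matrix $\mathbf{R}_{\mathbf{Y}} = E\left\{\mathbf{Y}\mathbf{Y}^H\right\} = E\left\{\mathbf{L}\mathbf{X}(\mathbf{L}\mathbf{X})^H\right\} = \mathbf{L}E\left\{\mathbf{X}\mathbf{X}^H\right\}\mathbf{L}^H$ of $\mathbf{Y}$ can be expressed as

$$\mathbf{R}_{\mathbf{Y}} = \mathbf{L}\mathbf{R}_{\mathbf{X}}\mathbf{L}^H, \tag{4.3.14}$$

and the covariance matrix $\mathbf{K}_{\mathbf{Y}} = E\left\{(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^H\right\} = \mathbf{R}_{\mathbf{Y}} - \mathbf{m}_{\mathbf{Y}}\mathbf{m}_{\mathbf{Y}}^H = \mathbf{L}\left(\mathbf{R}_{\mathbf{X}} - \mathbf{m}_{\mathbf{X}}\mathbf{m}_{\mathbf{X}}^H\right)\mathbf{L}^H$ of $\mathbf{Y}$ can be expressed as

$$\mathbf{K}_{\mathbf{Y}} = \mathbf{L}\mathbf{K}_{\mathbf{X}}\mathbf{L}^H. \tag{4.3.15}$$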

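More generally, when $\mathbf{Y} = \mathbf{L}\mathbf{X} + \mathbf{b}$, we have $\mathbf{m}_Y = \mathbf{L}\mathbf{m}_X + \mathbf{b}$, $\mathbf{R}_Y = \mathbf{L}\mathbf{R}_X\mathbf{L}^H + \mathbf{L}\mathbf{m}_X\mathbf{b}^H + \mathbf{b}(\mathbf{L}\mathbf{m}_X)^H + \mathbf{b}\mathbf{b}^H$, and $\mathbf{K}_Y = \mathbf{L}\mathbf{K}_X\mathbf{L}^H$. In essence, the results (4.3.13)–(4.3.15), shown in Fig. 4.14 as a visual representation, dictate that the mean vector, correlation matrix, and covariance matrix of $\mathbf{Y} = \mathbf{L}\mathbf{X}$ can be obtained without first having to obtain the cdf, pdf, or pmf of $\mathbf{Y}$, an observation similar to that on (3.3.9). ♦

Example 4.3.5 Assume $\mathbf{m}_X = (1, 2)^T$ and $\mathbf{K}_X = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$ for $\mathbf{X} = (X_1, X_2)^T$. When $\mathbf{L} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$, obtain $\mathbf{m}_Y$ and $\mathbf{K}_Y$ for $\mathbf{Y} = \mathbf{L}\mathbf{X}$.

Solution We easily get the mean $\mathbf{m}_Y = \mathbf{L}\mathbf{m}_X = (3, 1)^T$ of $\mathbf{Y} = \mathbf{L}\mathbf{X}$. Next, the covariance matrix of $\mathbf{Y} = \mathbf{L}\mathbf{X}$ is

$$\mathbf{K}_Y = \mathbf{L}\mathbf{K}_X\mathbf{L}^H = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 5 \end{pmatrix}.$$

♦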
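The computation in Example 4.3.5 can be reproduced with a few lines of code. The following is a minimal sketch using plain nested lists; the helper names `matmul` and `transpose` are introduced here for illustration, and the Hermitian transpose reduces to the ordinary transpose because all entries are real.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose (equal to the Hermitian transpose for real matrices)."""
    return [list(row) for row in zip(*a)]

m_x = [[1], [2]]                  # mean vector of X as a column
k_x = [[2, -1], [-1, 1]]          # covariance matrix of X
l_mat = [[1, 1], [-1, 1]]         # linear transformation L

m_y = matmul(l_mat, m_x)                               # (4.3.13)
k_y = matmul(matmul(l_mat, k_x), transpose(l_mat))     # (4.3.15)
```

No distribution of $\mathbf{X}$ is needed: only the first and second moments enter, which is the point made after (4.3.15).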

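Fig. 4.14 The mean vector, correlation matrix, and covariance matrix of linear transformation (block diagram: $\mathbf{X}$ with $\{\mathbf{m}_X, \mathbf{R}_X, \mathbf{K}_X\}$ passes through $\mathbf{L}$ to give $\mathbf{Y} = \mathbf{L}\mathbf{X}$ with $\mathbf{m}_Y = \mathbf{L}\mathbf{m}_X$, $\mathbf{R}_Y = \mathbf{L}\mathbf{R}_X\mathbf{L}^H$, and $\mathbf{K}_Y = \mathbf{L}\mathbf{K}_X\mathbf{L}^H$)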

Theorem 4.3.3 The correlation and covariance matrices are Hermitian.$^{13}$

Proof From $R_{X_i X_j} = E\{X_i X_j^*\} = \left(E\{X_j X_i^*\}\right)^* = R^*_{X_j X_i}$ for the correlation of $X_i$ and $X_j$, the correlation matrix is Hermitian. Similarly, it is easy to see that the covariance matrix is also Hermitian by letting $Y_i = X_i - m_i$. ♠

13 A matrix $\mathbf{A}$ such that $\mathbf{A}^H = \mathbf{A}$ is called Hermitian.

Theorem 4.3.4 The correlation and covariance matrices of any random vector are positive semi-definite.

Proof Let $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$. Then, the correlation matrix $\mathbf{R}_X = E\{\mathbf{X}\mathbf{X}^H\}$ is positive semi-definite because $E\{|\mathbf{a}\mathbf{X}|^2\} = E\{\mathbf{a}\mathbf{X}\mathbf{X}^H\mathbf{a}^H\} = \mathbf{a}E\{\mathbf{X}\mathbf{X}^H\}\mathbf{a}^H \ge 0$. Letting $Y_i = X_i - m_i$, we can similarly show that the covariance matrix is positive semi-definite. ♠

Definition 4.3.5 (uncorrelated random vector) A random vector $\mathbf{X}$ is called an uncorrelated random vector if $E\{X_i X_j^*\} = E\{X_i\}E\{X_j^*\}$ for all $i$ and $j$ such that $i \ne j$.

For an uncorrelated random vector $\mathbf{X}$, we have the correlation matrix $\mathbf{R}_X = [R_{X_i X_j}]$ with

$$R_{X_i X_j} = \begin{cases} E\{|X_i|^2\}, & i = j, \\ E\{X_i\}E\{X_j^*\}, & i \ne j \end{cases} \tag{4.3.16}$$

and the covariance matrix $\mathbf{K}_X = [K_{X_i X_j}]$ with $K_{X_i X_j} = \sigma_{X_i}^2 \delta_{ij}$, where

$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \ne j \end{cases} \tag{4.3.17}$$

and

$$\delta_k = \begin{cases} 1, & k = 0, \\ 0, & k \ne 0 \end{cases} \tag{4.3.18}$$

are called the Kronecker delta function. In some cases, an uncorrelated random vector is referred to as a linearly independent random vector. In addition, a random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is called a linearly dependent random vector if there exists a vector $\mathbf{a} = (a_1, a_2, \ldots, a_n) \ne \mathbf{0}$ such that $a_1 X_1 + a_2 X_2 + \cdots + a_n X_n = 0$.

Definition 4.3.6 (random vectors uncorrelated with each other) When we have $E\{X_i Y_j^*\} = E\{X_i\}E\{Y_j^*\}$ for all $i$ and $j$, the random vectors $\mathbf{X}$ and $\mathbf{Y}$ are called uncorrelated with each other.

Note that even when $\mathbf{X}$ and $\mathbf{Y}$ are uncorrelated with each other, each of $\mathbf{X}$ and $\mathbf{Y}$ may or may not be an uncorrelated random vector, and even when $\mathbf{X}$ and $\mathbf{Y}$ are both uncorrelated random vectors, $\mathbf{X}$ and $\mathbf{Y}$ may be correlated.

Theorem 4.3.5 (McDonough and Whalen 1995) If the covariance matrix of a random vector is positive definite, then the random vector can be transformed into an uncorrelated random vector via a linear transformation.

Proof The theorem can be proved by noting that, when an $n \times n$ matrix $\mathbf{A}$ is a normal$^{14}$ matrix, we can take $n$ orthogonal unit vectors as the eigenvectors of $\mathbf{A}$ and that there exists a unitary$^{15}$ matrix $\mathbf{P}$ such that $\mathbf{P}^{-1}\mathbf{A}\mathbf{P} = \mathbf{P}^H\mathbf{A}\mathbf{P}$ is diagonal. Let $\{\lambda_i\}_{i=1}^n$ be the eigenvalues of the positive definite covariance matrix $\mathbf{K}_X$ of $\mathbf{X}$. Because a covariance matrix is Hermitian, $\{\lambda_i\}_{i=1}^n$ are all real. In addition, because $\mathbf{K}_X$ is positive definite, $\{\lambda_i\}_{i=1}^n$ are all larger than 0. Now, choose the eigenvectors corresponding to the eigenvalues $\{\lambda_i\}_{i=1}^n$ as

$$\{\mathbf{a}_i\}_{i=1}^n = \left\{(a_{i1}, a_{i2}, \ldots, a_{in})^T\right\}_{i=1}^n, \tag{4.3.19}$$

respectively, so that the eigenvectors are orthonormal, that is,

$$\mathbf{a}_i^H \mathbf{a}_j = \delta_{ij}. \tag{4.3.20}$$

Now, consider the unitary matrix

$$\mathbf{A} = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)^H = [a_{ij}^*] \tag{4.3.21}$$

composed of the eigenvectors (4.3.19) of $\mathbf{K}_X$. Because $\mathbf{K}_X \mathbf{a}_i = \lambda_i \mathbf{a}_i$, and therefore $\mathbf{K}_X(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n) = (\lambda_1\mathbf{a}_1, \lambda_2\mathbf{a}_2, \ldots, \lambda_n\mathbf{a}_n)$, the covariance matrix $\mathbf{K}_Y = \mathbf{A}\mathbf{K}_X\mathbf{A}^H$ of $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T = \mathbf{A}\mathbf{X}$ can be obtained as

$$\mathbf{K}_Y = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)^H (\lambda_1\mathbf{a}_1, \lambda_2\mathbf{a}_2, \ldots, \lambda_n\mathbf{a}_n) = [\lambda_i \mathbf{a}_i^H \mathbf{a}_j] = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) \tag{4.3.22}$$

from (4.3.15) using (4.3.20). In essence, $\mathbf{Y} = \mathbf{A}\mathbf{X}$ is an uncorrelated random vector. ♠

14 A matrix $\mathbf{A}$ such that $\mathbf{A}\mathbf{A}^H = \mathbf{A}^H\mathbf{A}$ is normal.

15 A matrix $\mathbf{A}$ is unitary if $\mathbf{A}^H = \mathbf{A}^{-1}$ or if $\mathbf{A}^H\mathbf{A} = \mathbf{A}\mathbf{A}^H = \mathbf{I}$. In the real space, a unitary matrix is referred to as an orthogonal matrix. A Hermitian matrix is always a normal matrix and a unitary matrix is always a normal matrix, but the converses are not necessarily true: for example, $\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ is normal, but is neither Hermitian nor unitary.

Let us proceed one step further from Theorem 4.3.5. Recollecting that the eigenvalues $\{\lambda_i\}_{i=1}^n$ of the covariance matrix $\mathbf{K}_X$ are all larger than 0, let

$$\tilde{\boldsymbol{\lambda}} = \mathrm{diag}\left(\frac{1}{\sqrt{\lambda_1}}, \frac{1}{\sqrt{\lambda_2}}, \ldots, \frac{1}{\sqrt{\lambda_n}}\right) \tag{4.3.23}$$

and consider the linear transformation

$$\mathbf{Z} = \tilde{\boldsymbol{\lambda}}\mathbf{Y} \tag{4.3.24}$$

of $\mathbf{Y}$. Then, the covariance matrix $\mathbf{K}_Z = \tilde{\boldsymbol{\lambda}}\mathbf{K}_Y\tilde{\boldsymbol{\lambda}}^H$ of $\mathbf{Z}$ is

$$\mathbf{K}_Z = \mathbf{I}. \tag{4.3.25}$$

In other words, $\mathbf{Z} = \tilde{\boldsymbol{\lambda}}\mathbf{Y} = \tilde{\boldsymbol{\lambda}}\mathbf{A}\mathbf{X}$ is a vector of uncorrelated unit-variance random variables. Figure 4.15 summarizes the procedure.

Fig. 4.15 Decorrelating into uncorrelated unit-variance random vectors (block diagram: $\mathbf{X}$ with $\mathbf{K}_X$, $|\mathbf{K}_X| > 0$, eigenvalues $\{\lambda_i\}_{i=1}^n$, $\mathbf{K}_X\mathbf{a}_i = \lambda_i\mathbf{a}_i$, and $\mathbf{a}_i^H\mathbf{a}_j = \delta_{ij}$ passes through $\tilde{\boldsymbol{\lambda}}\mathbf{A}$, with $\mathbf{A} = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)^H$, to give $\mathbf{Z} = \tilde{\boldsymbol{\lambda}}\mathbf{A}\mathbf{X}$ with $\mathbf{K}_Z = \mathbf{I}$)

Example 4.3.6 Assume that the covariance matrix of $\mathbf{X} = (X_1, X_2)^T$ is $\mathbf{K}_X = \begin{pmatrix} 13 & 12 \\ 12 & 13 \end{pmatrix}$. Find a linear transformation that decorrelates $\mathbf{X}$ into an uncorrelated unit-variance random vector.

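Solution From the characteristic equation $|\lambda\mathbf{I} - \mathbf{K}_X| = 0$, we get the two pairs $\left\{\lambda_1 = 25, \mathbf{a}_1 = \frac{1}{\sqrt{2}}(1 \;\; 1)^T\right\}$ and $\left\{\lambda_2 = 1, \mathbf{a}_2 = \frac{1}{\sqrt{2}}(1 \; -1)^T\right\}$ of eigenvalue and unit eigenvector of $\mathbf{K}_X$. With the linear transformation

$$\mathbf{C} = \tilde{\boldsymbol{\lambda}}\mathbf{A} = \begin{pmatrix} \frac{1}{\sqrt{25}} & 0 \\ 0 & \frac{1}{\sqrt{1}} \end{pmatrix} \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{\sqrt{50}}\begin{pmatrix} 1 & 1 \\ 5 & -5 \end{pmatrix}$$

constructed from the two pairs, the covariance matrix $\mathbf{K}_W = \mathbf{C}\mathbf{K}_X\mathbf{C}^H$ of $\mathbf{W} = \mathbf{C}\mathbf{X}$ is

$$\mathbf{K}_W = \frac{1}{50}\begin{pmatrix} 1 & 1 \\ 5 & -5 \end{pmatrix}\begin{pmatrix} 25 & 5 \\ 25 & -5 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

In other words, $\mathbf{C}$ is a linear transformation that decorrelates $\mathbf{X}$ into an uncorrelated unit-variance random vector. Note that $\mathbf{A} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ is a unitary matrix.

Meanwhile, for $\mathbf{B} = \frac{1}{5}\begin{pmatrix} -2 & 3 \\ 3 & -2 \end{pmatrix}$, the covariance matrix of $\mathbf{Y} = \mathbf{B}\mathbf{X}$ is

$$\mathbf{K}_Y = \mathbf{B}\mathbf{K}_X\mathbf{B}^H = \frac{1}{25}\begin{pmatrix} -2 & 3 \\ 3 & -2 \end{pmatrix}\begin{pmatrix} 10 & 15 \\ 15 & 10 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

In other words, like $\mathbf{C}$, the transformation $\mathbf{B}$ also decorrelates $\mathbf{X}$ into an uncorrelated unit-variance random vector. In addition, from $\mathbf{C}\mathbf{K}_X\mathbf{C}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{K}_X\mathbf{B}^H = \mathbf{I}$, we get $\mathbf{C}^H\mathbf{C} = \mathbf{B}^H\mathbf{B} = \frac{1}{25}\begin{pmatrix} 13 & -12 \\ -12 & 13 \end{pmatrix} = \mathbf{K}_X^{-1}$. ♦

Example 4.3.7 When $\mathbf{U} = (U_1, U_2)^T$ has mean vector $(10 \;\; 0)^T$ and covariance matrix $\begin{pmatrix} 4 & 1 \\ 1 & 1 \end{pmatrix}$, consider the linear transformation

$$\mathbf{V} = (V_1, V_2)^T = \mathbf{L}\mathbf{U} = \begin{pmatrix} -2 & 5 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} -2U_1 + 5U_2 \\ U_1 + U_2 \end{pmatrix}$$

of $\mathbf{U}$. Then, the mean vector of $\mathbf{V}$ is $E\{\mathbf{V}\} = \mathbf{L}E\{\mathbf{U}\} = \begin{pmatrix} -2 & 5 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 10 \\ 0 \end{pmatrix} = \begin{pmatrix} -20 \\ 10 \end{pmatrix}$. In addition, the covariance matrix of $\mathbf{V}$ is $\mathbf{K}_V = \mathbf{L}\mathbf{K}_U\mathbf{L}^H = \begin{pmatrix} 21 & 0 \\ 0 & 7 \end{pmatrix}$. ♦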

Example 4.3.6 implies that the decorrelating linear transformation is generally not unique.
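The non-uniqueness can be confirmed with exact arithmetic. The sketch below checks the two transformations of Example 4.3.6; to avoid square roots, it works with the scaled matrix $\sqrt{50}\,\mathbf{C}$, and the helper names `matmul` and `transpose` are illustrative choices made here.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose (equal to the Hermitian transpose for real matrices)."""
    return [list(row) for row in zip(*a)]

k_x = [[13, 12], [12, 13]]

# sqrt(50) * C has integer entries, so (sqrt(50) C) K_X (sqrt(50) C)^H = 50 K_W.
c_scaled = [[1, 1], [5, -5]]
k_w_scaled = matmul(matmul(c_scaled, k_x), transpose(c_scaled))

# B = (1/5) [[-2, 3], [3, -2]] handled exactly with fractions.
b = [[Fraction(-2, 5), Fraction(3, 5)], [Fraction(3, 5), Fraction(-2, 5)]]
k_y = matmul(matmul(b, k_x), transpose(b))

# B^H B should be the inverse of K_X, so (B^H B) K_X should be the identity.
gram_b = matmul(transpose(b), b)
check_inverse = matmul(gram_b, k_x)
```

Both `k_w_scaled` (equal to $50\mathbf{I}$) and `k_y` (equal to $\mathbf{I}$) show that two different matrices whiten the same $\mathbf{K}_X$.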

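4.3.3 Joint Characteristic Function and Joint Moment Generating Function

By extending the notion of the cf and mgf discussed in Sect. 3.3.4, we introduce and discuss the joint cf and joint mgf of multi-dimensional random vectors.

Definition 4.3.7 (joint cf) The function

$$\varphi_{\mathbf{X}}(\boldsymbol{\omega}) = E\left\{\exp\left(j\boldsymbol{\omega}^T\mathbf{X}\right)\right\} \tag{4.3.26}$$

is the joint cf of $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$, where $\boldsymbol{\omega} = (\omega_1, \omega_2, \ldots, \omega_n)^T$.

The joint cf $\varphi_{\mathbf{X}}(\boldsymbol{\omega})$ of $\mathbf{X}$ can be expressed as

$$\varphi_{\mathbf{X}}(\boldsymbol{\omega}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\exp\left(j\boldsymbol{\omega}^T\mathbf{x}\right)d\mathbf{x} \tag{4.3.27}$$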


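when $\mathbf{X}$ is a continuous random vector, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ and $d\mathbf{x} = dx_1\,dx_2\cdots dx_n$. Thus, the joint cf $\varphi_{\mathbf{X}}(\boldsymbol{\omega})$ is the complex conjugate of the multi-dimensional Fourier transform $\mathcal{F}\{f_{\mathbf{X}}(\mathbf{x})\}$ of the joint pdf $f_{\mathbf{X}}(\mathbf{x})$. Clearly, we can obtain the joint pdf as $f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^n}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\varphi_{\mathbf{X}}(\boldsymbol{\omega})\exp\left(-j\boldsymbol{\omega}^T\mathbf{x}\right)d\boldsymbol{\omega}$ by inverse transforming the joint cf.

Definition 4.3.8 (joint mgf) The function

$$M_{\mathbf{X}}(\mathbf{t}) = E\left\{\exp\left(\mathbf{t}^T\mathbf{X}\right)\right\} \tag{4.3.28}$$

is the joint mgf of $\mathbf{X}$, where $\mathbf{t} = (t_1, t_2, \ldots, t_n)^T$.

The joint mgf $M_{\mathbf{X}}(\mathbf{t})$ is the multi-dimensional Laplace transform $\mathcal{L}\{f_{\mathbf{X}}(\mathbf{x})\}$ of the joint pdf $f_{\mathbf{X}}(\mathbf{x})$ with $\mathbf{t}$ replaced by $-\mathbf{t}$. The joint pdf can be obtained from the inverse Laplace transform of the joint mgf.

The marginal cf and marginal mgf can be obtained from the joint cf and joint mgf, respectively. For example, the marginal cf $\varphi_{X_i}(\omega_i)$ of $X_i$ can be obtained as

$$\varphi_{X_i}(\omega_i) = \varphi_{\mathbf{X}}(\boldsymbol{\omega})\Big|_{\omega_j = 0 \text{ for all } j \ne i} \tag{4.3.29}$$

from the joint cf $\varphi_{\mathbf{X}}(\boldsymbol{\omega})$, and the marginal mgf $M_{X_i}(t_i)$ of $X_i$ as

$$M_{X_i}(t_i) = M_{\mathbf{X}}(\mathbf{t})\Big|_{t_j = 0 \text{ for all } j \ne i} \tag{4.3.30}$$

from the joint mgf $M_{\mathbf{X}}(\mathbf{t})$.

When $\mathbf{X}$ is an independent random vector, it is easy to see that the joint cf $\varphi_{\mathbf{X}}(\boldsymbol{\omega})$ is the product

$$\varphi_{\mathbf{X}}(\boldsymbol{\omega}) = \prod_{i=1}^{n}\varphi_{X_i}(\omega_i) \tag{4.3.31}$$

of marginal cf's from Theorem 4.1.3. Because cf's and distributions are related by one-to-one correspondences as we discussed in Theorem 3.3.2, a random vector whose joint cf is the product of the marginal cf's is an independent random vector.

Example 4.3.8 For an independent random vector $\mathbf{X}$, let $Y = \sum_{i=1}^{n} X_i$. Then, the cf $\varphi_Y(\omega) = E\left\{e^{j\omega Y}\right\} = E\left\{e^{j\omega X_1}e^{j\omega X_2}\cdots e^{j\omega X_n}\right\}$ of $Y$ can be expressed as

$$\varphi_Y(\omega) = \prod_{i=1}^{n}\varphi_{X_i}(\omega). \tag{4.3.32}$$

By inverse transforming the cf (4.3.32), we can get the pdf $f_Y(y)$ of $Y$, which is the convolution

$$f_{X_1 + X_2 + \cdots + X_n} = f_{X_1} * f_{X_2} * \cdots * f_{X_n} \tag{4.3.33}$$

of the marginal pdf's. The result (4.2.15) is a special case of (4.3.33) with $n = 2$. ♦

Example 4.3.9 Show that $Y = \sum_{i=1}^{n} X_i \sim P\left(\sum_{i=1}^{n}\lambda_i\right)$ when $\{X_i \sim P(\lambda_i)\}_{i=1}^{n}$ are independent of each other.

Solution The cf for the distribution $P(\lambda_i)$ is $\varphi_{X_i}(\omega) = \exp\left\{\lambda_i\left(e^{j\omega} - 1\right)\right\}$. Thus, the cf $\varphi_Y(\omega) = \prod_{i=1}^{n}\exp\left\{\lambda_i\left(e^{j\omega} - 1\right)\right\}$ of $Y = \sum_{i=1}^{n} X_i$ can be expressed as

$$\varphi_Y(\omega) = \exp\left\{\left(\sum_{i=1}^{n}\lambda_i\right)\left(e^{j\omega} - 1\right)\right\} \tag{4.3.34}$$

from (4.3.32), confirming the desired result. ♦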
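The result of Example 4.3.9 can also be checked numerically for $n = 2$ through the discrete counterpart of the convolution (4.3.33). In the sketch below, the parameter values $\lambda_1 = 1.5$ and $\lambda_2 = 2.5$ and the truncation at 30 terms are arbitrary choices made for illustration.

```python
import math

def poisson_pmf(lam, k):
    """pmf of the Poisson distribution P(lam) at the integer k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def convolve(p, q):
    """Discrete convolution of two pmf's supported on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

lam1, lam2, n_terms = 1.5, 2.5, 30       # assumed values for illustration
p1 = [poisson_pmf(lam1, k) for k in range(n_terms)]
p2 = [poisson_pmf(lam2, k) for k in range(n_terms)]

# For k < n_terms every product P(X1 = i) P(X2 = k - i) is included,
# so these entries of the convolution are not affected by the truncation.
conv = convolve(p1, p2)
target = [poisson_pmf(lam1 + lam2, k) for k in range(n_terms)]
max_error = max(abs(conv[k] - target[k]) for k in range(n_terms))
```

The value of `max_error` is at the level of floating-point round-off, in agreement with $X_1 + X_2 \sim P(\lambda_1 + \lambda_2)$.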



In Sect. 6.2.1, we will discuss again the sum of a number of independent random variables.

As we observed in (4.3.32), when $X$ and $Y$ are independent of each other, the cf of $X + Y$ is the product of the cf's of $X$ and $Y$. On the other hand, the converse does not hold true: when the cf of $X + Y$ is the product of the cf's of $X$ and $Y$, $X$ and $Y$ may or may not be independent of each other. Specifically, assume $X_1 \sim F_1$ and $X_2 \sim F_2$ are independent of each other, where $F_i$ is the cdf of $X_i$ for $i = 1, 2$. Then, $X_1 + X_2 \sim F_1 * F_2$ and, if $X_1$ and $X_2$ are absolutely continuous random variables with pdf $f_1$ and $f_2$, respectively, $X_1 + X_2 \sim f_1 * f_2$. Yet, even when $X_1$ and $X_2$ are not independent of each other, in some cases we have $X_1 + X_2 \sim f_1 * f_2$ (Romano and Siegel 1986; Stoyanov 2013; Wies and Hall 1993). Such cases include the non-independent Cauchy random variables and that shown in the example below.

Example 4.3.10 Assume the joint pmf $p_{X,Y}(x, y) = P(X = x, Y = y)$

$$\begin{array}{lll} p_{X,Y}(1, 1) = \frac{1}{9}, & p_{X,Y}(1, 2) = \frac{1}{18}, & p_{X,Y}(1, 3) = \frac{1}{6}, \\ p_{X,Y}(2, 1) = \frac{1}{6}, & p_{X,Y}(2, 2) = \frac{1}{9}, & p_{X,Y}(2, 3) = \frac{1}{18}, \\ p_{X,Y}(3, 1) = \frac{1}{18}, & p_{X,Y}(3, 2) = \frac{1}{6}, & p_{X,Y}(3, 3) = \frac{1}{9} \end{array} \tag{4.3.35}$$

of a discrete random vector $(X, Y)$. Then, the pmf $p_X(x) = P(X = x)$ of $X$ is $p_X(1) = p_X(2) = p_X(3) = \frac{1}{3}$, and the pmf $p_Y(y) = P(Y = y)$ of $Y$ is $p_Y(1) = p_Y(2) = p_Y(3) = \frac{1}{3}$, implying that $X$ and $Y$ are not independent of each other because, for example, $p_{X,Y}(1, 2) = \frac{1}{18} \ne \frac{1}{9} = p_X(1)p_Y(2)$. However, the mgf's of $X$ and $Y$ are both $M_X(t) = M_Y(t) = \frac{1}{3}\left(e^t + e^{2t} + e^{3t}\right)$ and the mgf of $X + Y$ is $M_{X+Y}(t) = \frac{1}{9}\left(e^{2t} + 2e^{3t} + 3e^{4t} + 2e^{5t} + e^{6t}\right) = M_X(t)M_Y(t)$. ♦

We have observed that when $X$ and $Y$ are independent of each other and $g$ and $h$ are continuous functions, $g(X)$ and $h(Y)$ are independent of each other in Theorem 4.1.3. We now discuss whether $g(X)$ and $h(Y)$ are uncorrelated or not when $X$ and $Y$ are uncorrelated.
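The joint pmf (4.3.35) of Example 4.3.10 is small enough to be examined exhaustively. The following sketch, written with exact fractions, checks the claims of the example and, as an additional observation made here rather than in the example, evaluates the covariance of $X$ and $Y$ and of the indicator functions $g(X) = 1\{X = 1\}$ and $h(Y) = 1\{Y = 2\}$, which are illustrative choices.

```python
from fractions import Fraction as F

joint = {
    (1, 1): F(1, 9),  (1, 2): F(1, 18), (1, 3): F(1, 6),
    (2, 1): F(1, 6),  (2, 2): F(1, 9),  (2, 3): F(1, 18),
    (3, 1): F(1, 18), (3, 2): F(1, 6),  (3, 3): F(1, 9),
}

p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (1, 2, 3)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (1, 2, 3)}

# Independence would require p_{X,Y}(x, y) = p_X(x) p_Y(y) at every point.
independent = all(joint[x, y] == p_x[x] * p_y[y] for (x, y) in joint)

# pmf of X + Y from the joint pmf, and the convolution of the marginals.
pmf_sum, pmf_conv = {}, {}
for (x, y), p in joint.items():
    pmf_sum[x + y] = pmf_sum.get(x + y, 0) + p
    pmf_conv[x + y] = pmf_conv.get(x + y, 0) + p_x[x] * p_y[y]

def expect(g):
    return sum(p * g(x, y) for (x, y), p in joint.items())

# Covariance of X and Y, and of the indicators 1{X = 1} and 1{Y = 2}.
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
cov_indicators = joint[1, 2] - p_x[1] * p_y[2]
```

The pmf of $X + Y$ coincides with the convolution of the marginal pmf's although `independent` is `False`. In addition, `cov_xy` is zero while `cov_indicators` is not, so functions of uncorrelated random variables can be correlated.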


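Theorem 4.3.6 When $X$ and $Y$ are independent, $g(X)$ and $h(Y)$ are uncorrelated. However, when $X$ and $Y$ are uncorrelated but not independent, $g(X)$ and $h(Y)$ are not necessarily uncorrelated.

Proof When $X$ and $Y$ are independent of each other, we have $E\{g(X)h(Y)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f_{X,Y}(x, y)\,dx\,dy = \int_{-\infty}^{\infty} g(x)f_X(x)\,dx\int_{-\infty}^{\infty} h(y)f_Y(y)\,dy = E\{g(X)\}E\{h(Y)\}$ and thus $g(X)$ and $h(Y)$ are uncorrelated. Next, when $X$ and $Y$ are uncorrelated but are not independent of each other, assume that $g(X)$ and $h(Y)$ are uncorrelated for every choice of $g$ and $h$. Then, we have $E\left\{e^{j(\omega_1 X + \omega_2 Y)}\right\} = E\left\{e^{j\omega_1 X}\right\}E\left\{e^{j\omega_2 Y}\right\}$ from $E\{g(X)h(Y)\} = E\{g(X)\}E\{h(Y)\}$ with $g(x) = e^{j\omega_1 x}$ and $h(y) = e^{j\omega_2 y}$. This result implies that the joint cf $\varphi_{X,Y}(\omega_1, \omega_2)$ of $X$ and $Y$ can be expressed as $\varphi_{X,Y}(\omega_1, \omega_2) = E\left\{e^{j(\omega_1 X + \omega_2 Y)}\right\} = E\left\{e^{j\omega_1 X}\right\}E\left\{e^{j\omega_2 Y}\right\} = \varphi_X(\omega_1)\varphi_Y(\omega_2)$ in terms of the marginal cf's of $X$ and $Y$, which contradicts the assumption that $X$ and $Y$ are not independent of each other. In short, we have

$$E\{XY\} = E\{X\}E\{Y\} \;\nRightarrow\; E\{g(X)h(Y)\} = E\{g(X)\}E\{h(Y)\},$$

i.e., when $X$ and $Y$ are uncorrelated but are not independent of each other, $g(X)$ and $h(Y)$ are not necessarily uncorrelated. ♠

The joint moments of random vectors can be easily obtained by using the joint cf or joint mgf as shown in the following theorem:

Theorem 4.3.7 The joint moment $m_{k_1 k_2 \cdots k_n} = E\left\{X_1^{k_1}X_2^{k_2}\cdots X_n^{k_n}\right\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_1^{k_1}x_2^{k_2}\cdots x_n^{k_n}f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}$ can be obtained as

$$m_{k_1 k_2 \cdots k_n} = j^{-K}\left.\frac{\partial^K \varphi_{\mathbf{X}}(\boldsymbol{\omega})}{\partial\omega_1^{k_1}\partial\omega_2^{k_2}\cdots\partial\omega_n^{k_n}}\right|_{\boldsymbol{\omega} = \mathbf{0}} \tag{4.3.36}$$

from the joint cf $\varphi_{\mathbf{X}}(\boldsymbol{\omega})$, where $K = \sum_{i=1}^{n} k_i$.

From the joint mgf $M_{\mathbf{X}}(\mathbf{t})$, we can obtain the joint moment $m_{k_1 k_2 \cdots k_n}$ also as

$$m_{k_1 k_2 \cdots k_n} = \left.\frac{\partial^K M_{\mathbf{X}}(\mathbf{t})}{\partial t_1^{k_1}\partial t_2^{k_2}\cdots\partial t_n^{k_n}}\right|_{\mathbf{t} = \mathbf{0}}. \tag{4.3.37}$$

As a special case of (4.3.37) for the two-dimensional random vector $(X, Y)$, we have

$$m_{kr} = \left.\frac{\partial^{k+r} M_{X,Y}(t_1, t_2)}{\partial t_1^k\,\partial t_2^r}\right|_{(t_1, t_2) = (0, 0)}, \tag{4.3.38}$$

where $M_{X,Y}(t_1, t_2) = E\{\exp(t_1 X + t_2 Y)\}$ is the joint mgf of $(X, Y)$.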
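The relation (4.3.38) can be illustrated numerically with $k = r = 1$. The sketch below reuses the joint pmf (4.3.35) of Example 4.3.10 and approximates the mixed partial derivative of the joint mgf by a central finite difference; the step size `h` and the finite-difference scheme are choices made here for illustration, not part of the theorem.

```python
import math

joint = {
    (1, 1): 1 / 9,  (1, 2): 1 / 18, (1, 3): 1 / 6,
    (2, 1): 1 / 6,  (2, 2): 1 / 9,  (2, 3): 1 / 18,
    (3, 1): 1 / 18, (3, 2): 1 / 6,  (3, 3): 1 / 9,
}

def mgf(t1, t2):
    """Joint mgf M_{X,Y}(t1, t2) = E{exp(t1 X + t2 Y)}."""
    return sum(p * math.exp(t1 * x + t2 * y) for (x, y), p in joint.items())

def mixed_moment_from_mgf(h=1e-3):
    """Central-difference approximation of d^2 M / (dt1 dt2) at (0, 0)."""
    return (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h * h)

m11_direct = sum(p * x * y for (x, y), p in joint.items())   # E{XY}
m11_from_mgf = mixed_moment_from_mgf()
```

The two values agree up to the discretization error of the finite difference, which is of order $h^2$.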


Example 4.3.11 (Romano and Siegel 1986) When the two functions $G$ and $H$ are equal, we have $F * G = F * H$. The converse does not always hold true, i.e., $F * G = F * H$ does not necessarily imply $G = H$. Let us consider an example. Assume

$$P(X = x) = \begin{cases} \frac{1}{2}, & x = 0, \\ \frac{2}{\pi^2(2n-1)^2}, & x = \pm(2n-1)\pi, \; n = 1, 2, \ldots, \\ 0, & \text{otherwise} \end{cases} \tag{4.3.39}$$

for a random variable $X$. Then, the cf $\varphi_X(t) = \frac{1}{2} + \sum_{n=1}^{\infty}\frac{4}{\pi^2(2n-1)^2}\cos\{(2n-1)\pi t\}$ of $X$ is a train of triangular pulses with period 2 and

$$\varphi_X(t) = 1 - |t| \tag{4.3.40}$$

for $|t| \le 1$. Meanwhile, when the distribution is

$$P(Y = x) = \begin{cases} \frac{4}{\pi^2(2n-1)^2}, & x = \pm\frac{(2n-1)\pi}{2}, \; n = 1, 2, \ldots, \\ 0, & \text{otherwise} \end{cases} \tag{4.3.41}$$

for $Y$, the cf is also a train of triangular pulses with period 4 and

$$\varphi_Y(t) = 1 - |t| \tag{4.3.42}$$

for $|t| \le 2$. It is easy to see that $\varphi_X(t) = \varphi_Y(t)$ for $|t| \le 1$ and that $|\varphi_X(t)| = |\varphi_Y(t)|$ for all $t$ from (4.3.40) and (4.3.42). Now, for a random variable $Z$ with the pdf

$$f_Z(x) = \begin{cases} \frac{1 - \cos x}{\pi x^2}, & x \ne 0, \\ \frac{1}{2\pi}, & x = 0, \end{cases} \tag{4.3.43}$$

we have the cf $\varphi_Z(t) = (1 - |t|)u(1 - |t|)$. Then, we have $\varphi_Z(t)\varphi_X(t) = \varphi_Z(t)\varphi_Y(t)$ and $F_Z(x) * F_X(x) = F_Z(x) * F_Y(x)$, but $F_X(x) \ne F_Y(x)$, where $F_X$, $F_Y$, and $F_Z$ denote the cdf's of $X$, $Y$, and $Z$, respectively. ♦
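The behavior of the three cf's in Example 4.3.11 can be inspected numerically. The sketch below evaluates truncated versions of the series for $\varphi_X$ and $\varphi_Y$ obtained from the pmf's (4.3.39) and (4.3.41); the truncation at 5000 terms and the sample points are arbitrary choices made here, so the values are approximations only.

```python
import math

def phi_x(t, terms=5000):
    """Truncated cf of X: atoms at 0 and at +-(2n - 1) pi, from (4.3.39)."""
    return 0.5 + sum(
        4 / (math.pi ** 2 * (2 * n - 1) ** 2) * math.cos((2 * n - 1) * math.pi * t)
        for n in range(1, terms + 1))

def phi_y(t, terms=5000):
    """Truncated cf of Y: atoms at +-(2n - 1) pi / 2, from (4.3.41)."""
    return sum(
        8 / (math.pi ** 2 * (2 * n - 1) ** 2) * math.cos((2 * n - 1) * math.pi * t / 2)
        for n in range(1, terms + 1))

def phi_z(t):
    """cf of Z: a single triangular pulse supported on |t| <= 1."""
    return max(0.0, 1.0 - abs(t))

inside = [0.0, 0.3, 0.9]      # sample points with |t| <= 1
outside = 1.5                 # sample point with 1 < |t| <= 2

gap_inside = max(abs(phi_x(t) - phi_y(t)) for t in inside)
gap_outside = abs(phi_x(outside) - phi_y(outside))
product_gap = max(abs(phi_z(t) * (phi_x(t) - phi_y(t))) for t in inside + [outside])
```

Within $|t| \le 1$ the two cf's agree up to the truncation error, at $t = 1.5$ they differ in sign, and the products with $\varphi_Z$ agree at all sample points because $\varphi_Z$ vanishes outside $|t| \le 1$.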

4.4 Conditional Distributions

In this section, we discuss conditional probability functions (Ross 2009) and conditional expected values mainly for bi-variate random vectors.


4.4.1 Conditional Probability Functions

We first extend the discussion on the conditional distribution explored in Sect. 3.4. When the event $A$ is assumed, the conditional joint cdf$^{16}$ $F_{Z,W|A}(z, w) = P(Z \le z, W \le w \,|\, A)$ of $Z$ and $W$ is

$$F_{Z,W|A}(z, w) = \frac{P(Z \le z, W \le w, A)}{P(A)}. \tag{4.4.1}$$

The conditional joint pdf$^{17}$ can be obtained as

$$f_{Z,W|A}(z, w) = \frac{\partial^2}{\partial z\,\partial w}F_{Z,W|A}(z, w) \tag{4.4.2}$$

by differentiating the conditional joint cdf $F_{Z,W|A}(z, w)$ with respect to $z$ and $w$.

16 As in other cases, the conditional joint cdf is also referred to as the conditional cdf if it does not cause any ambiguity.

17 The conditional joint pdf is also referred to as the conditional pdf if it does not cause any ambiguity.

Example 4.4.1 Obtain the conditional joint cdf $F_{X,Y|A}(x, y)$ and the conditional joint pdf $f_{X,Y|A}(x, y)$ under the condition $A = \{X \le x\}$.

Solution Recollecting that $P(X \le x, Y \le y \,|\, X \le x) = \frac{P(X \le x, Y \le y)}{P(X \le x)}$, we get the conditional joint cdf $F_{X,Y|A}(x, y) = F_{X,Y|X \le x}(x, y) = \frac{P(X \le x, Y \le y)}{P(X \le x)}$ as

$$F_{X,Y|X \le x}(x, y) = \frac{F_{X,Y}(x, y)}{F_X(x)}. \tag{4.4.3}$$

Differentiating the conditional joint cdf (4.4.3) with respect to $x$ and $y$, we get the conditional joint pdf $f_{X,Y|A}(x, y)$ as

$$f_{X,Y|X \le x}(x, y) = \frac{\partial}{\partial x}\left\{\frac{1}{F_X(x)}\frac{\partial}{\partial y}F_{X,Y}(x, y)\right\} = \frac{f_{X,Y}(x, y)}{F_X(x)} - \frac{f_X(x)}{F_X^2(x)}\frac{\partial}{\partial y}F_{X,Y}(x, y). \tag{4.4.4}$$

By writing $F_{Y|X \le x}(y)$ as $F_{Y|X \le x}(y) = \frac{P(X \le x, Y \le y)}{P(X \le x)}$, i.e.,

$$F_{Y|X \le x}(y) = \frac{F_{X,Y}(x, y)}{F_X(x)}, \tag{4.4.5}$$

we get

$$F_{Y|X \le x}(y) = F_{X,Y|X \le x}(x, y) \tag{4.4.6}$$

from (4.4.3) and (4.4.5). ♦


We now discuss the conditional distribution when the condition is expressed in terms of random variables.

Definition 4.4.1 (conditional cdf; conditional pdf; conditional pmf) For a random vector $(X, Y)$, $P(Y \le y \,|\, X = x)$ is called the conditional joint cdf, or simply the conditional cdf, of $Y$ given $X = x$, and is written as $F_{X,Y|X=x}(x, y)$, $F_{Y|X=x}(y)$, or $F_{Y|X}(y|x)$. For a continuous random vector $(X, Y)$, the derivative $\frac{\partial}{\partial y}F_{Y|X}(y|x)$, denoted by $f_{Y|X}(y|x)$, is called the conditional pdf of $Y$ given $X = x$. For a discrete random vector $(X, Y)$, $P(Y = y \,|\, X = x)$ is called the conditional joint pmf or the conditional pmf of $Y$ given $X = x$ and is written as $p_{X,Y|X=x}(x, y)$, $p_{Y|X=x}(y)$, or $p_{Y|X}(y|x)$.

The relationships among the conditional pdf $f_{Y|X}(y|x)$, joint pdf $f_{X,Y}(x, y)$, and marginal pdf $f_X(x)$ and those among the conditional pmf $p_{Y|X}(y|x)$, joint pmf $p_{X,Y}(x, y)$, and marginal pmf $p_X(x)$ are described in the following theorem:

Theorem 4.4.1 The conditional pmf $p_{Y|X}(y|x)$ can be expressed as

$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)} \tag{4.4.7}$$

when $(X, Y)$ is a discrete random vector. Similarly, the conditional pdf $f_{Y|X}(y|x)$ can be expressed as

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} \tag{4.4.8}$$

when $(X, Y)$ is a continuous random vector.

Proof For a discrete random vector, we easily get $p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}$. For a continuous random vector, we have $P(x - dx <$ … $> 0$ when $(X, Y)$ is a discrete random vector.

When $X$ and $Y$ are independent of each other, we have

$$F_{X|Y}(x|y) = F_X(x) \tag{4.4.18}$$

for every point $y$ such that $F_Y(y) > 0$,

$$f_{X|Y}(x|y) = f_X(x) \tag{4.4.19}$$

for every point $y$ such that $f_Y(y) > 0$, $F_{Y|X}(y|x) = F_Y(y)$ for every point $x$ such that $F_X(x) > 0$, and $f_{Y|X}(y|x) = f_Y(y)$ for every point $x$ such that $f_X(x) > 0$ because $F_{X,Y}(x, y) = F_X(x)F_Y(y)$ and $f_{X,Y}(x, y) = f_X(x)f_Y(y)$.

Example 4.4.4 Assume the pmf

$$p_X(x) = \begin{cases} \frac{1}{6}, & x = 3; \\ \frac{1}{2}, & x = 4; \\ \frac{1}{3}, & x = 5 \end{cases} \tag{4.4.20}$$

of $X$. Consider

$$Y = \begin{cases} 0, & \text{if } X = 4, \\ 1, & \text{if } X = 3 \text{ or } 5 \end{cases} \tag{4.4.21}$$

and $Z = X - Y$. Obtain the conditional pmf's $p_{X|Y}$, $p_{Y|X}$, $p_{X|Z}$, $p_{Z|X}$, $p_{Y|Z}$, and $p_{Z|Y}$, and the joint pmf's $p_{X,Y}$, $p_{Y,Z}$, and $p_{Z,X}$.

Solution We easily get $p_Y(y) = P(X = 4)$ for $y = 0$ and $p_Y(y) = P(X = 3 \text{ or } 5)$ for $y = 1$, i.e.,

$$p_Y(y) = \begin{cases} \frac{1}{2}, & y = 0, \\ \frac{1}{2}, & y = 1. \end{cases} \tag{4.4.22}$$

Next, because $Y = 1$ and $Z = X - Y = 2$ when $X = 3$, $Y = 0$ and $Z = X - Y = 4$ when $X = 4$, and $Y = 1$ and $Z = X - Y = 4$ when $X = 5$, we get $p_Z(z) = P(X = 3)$ for $z = 2$ and $p_Z(z) = P(X = 4 \text{ or } 5)$ for $z = 4$, i.e.,

$$p_Z(z) = \begin{cases} \frac{1}{6}, & z = 2, \\ \frac{5}{6}, & z = 4. \end{cases} \tag{4.4.23}$$

Next, because $Y = 1$ when $X = 3$, $Y = 0$ when $X = 4$, and $Y = 1$ when $X = 5$ from (4.4.21), we get

$$\begin{array}{lll} p_{Y|X}(0|3) = 0, & p_{Y|X}(0|4) = 1, & p_{Y|X}(0|5) = 0, \\ p_{Y|X}(1|3) = 1, & p_{Y|X}(1|4) = 0, & p_{Y|X}(1|5) = 1. \end{array} \tag{4.4.24}$$

Noting that $p_{X,Y}(x, y) = p_{Y|X}(y|x)p_X(x)$ from (4.4.17) and using (4.4.20) and (4.4.24), we get

$$\begin{array}{lll} p_{X,Y}(3, 0) = 0, & p_{X,Y}(4, 0) = \frac{3}{6}, & p_{X,Y}(5, 0) = 0, \\ p_{X,Y}(3, 1) = \frac{1}{6}, & p_{X,Y}(4, 1) = 0, & p_{X,Y}(5, 1) = \frac{1}{3}. \end{array} \tag{4.4.25}$$

Similarly, noting that $p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}$ from (4.4.17) and using (4.4.22) and (4.4.25), we get

$$\begin{array}{lll} p_{X|Y}(3|0) = 0, & p_{X|Y}(4|0) = 1, & p_{X|Y}(5|0) = 0, \\ p_{X|Y}(3|1) = \frac{1}{3}, & p_{X|Y}(4|1) = 0, & p_{X|Y}(5|1) = \frac{2}{3}. \end{array} \tag{4.4.26}$$

Meanwhile, $Y = 1$ and $Z = 2$ when $X = 3$, $Y = 0$ and $Z = 4$ when $X = 4$, and $Y = 1$ and $Z = 4$ when $X = 5$ from (4.4.21) and the definition of $Z$. Thus, we have

$$\begin{array}{lll} p_{Z|X}(2|3) = 1, & p_{Z|X}(2|4) = 0, & p_{Z|X}(2|5) = 0, \\ p_{Z|X}(4|3) = 0, & p_{Z|X}(4|4) = 1, & p_{Z|X}(4|5) = 1. \end{array} \tag{4.4.27}$$

Noting that $p_{X,Z}(x, z) = p_{Z|X}(z|x)p_X(x)$ from (4.4.17) and using (4.4.20) and (4.4.27), we get

$$\begin{array}{lll} p_{X,Z}(3, 2) = \frac{1}{6}, & p_{X,Z}(4, 2) = 0, & p_{X,Z}(5, 2) = 0, \\ p_{X,Z}(3, 4) = 0, & p_{X,Z}(4, 4) = \frac{3}{6}, & p_{X,Z}(5, 4) = \frac{1}{3}. \end{array} \tag{4.4.28}$$

Similarly, noting that $p_{X|Z}(x|z) = \frac{p_{X,Z}(x, z)}{p_Z(z)}$ from (4.4.17) and using (4.4.23) and (4.4.28), we get

$$\begin{array}{lll} p_{X|Z}(3|2) = 1, & p_{X|Z}(4|2) = 0, & p_{X|Z}(5|2) = 0, \\ p_{X|Z}(3|4) = 0, & p_{X|Z}(4|4) = \frac{3}{5}, & p_{X|Z}(5|4) = \frac{2}{5}. \end{array} \tag{4.4.29}$$

Finally, we have $p_{Z|Y}(2|0) = P(X - Y = 2 \,|\, Y = 0) = P(X = 2 \,|\, X = 4) = 0$ and $p_{Z|Y}(4|0) = P(X - Y = 4 \,|\, Y = 0) = P(X = 4 \,|\, X = 4) = 1$. In addition, noting that $\{X = 3\} \cap \{X = 3 \text{ or } 5\} = \{X = 3\}$, we have $p_{Z|Y}(2|1) = P(X - Y = 2 \,|\, Y = 1) = P(X = 3 \,|\, X = 3 \text{ or } 5) = \frac{P(X = 3)}{P(X = 3 \text{ or } 5)}$ as

$$p_{Z|Y}(2|1) = \frac{1}{3} \tag{4.4.30}$$

and $p_{Z|Y}(4|1) = P(X - Y = 4 \,|\, Y = 1) = \frac{P(X = 5)}{P(X = 3 \text{ or } 5)}$ as

$$p_{Z|Y}(4|1) = \frac{2}{3}. \tag{4.4.31}$$

Therefore, noting that $p_{Y,Z}(y, z) = p_{Z|Y}(z|y)p_Y(y)$ from (4.4.17) and using (4.4.22) and (4.4.31), we get

$$p_{Y,Z}(0, 2) = 0, \quad p_{Y,Z}(0, 4) = \frac{1}{2}, \quad p_{Y,Z}(1, 2) = \frac{1}{6}, \quad p_{Y,Z}(1, 4) = \frac{1}{3}. \tag{4.4.32}$$

We also get

$$p_{Y|Z}(0|2) = 0, \quad p_{Y|Z}(0|4) = \frac{3}{5}, \quad p_{Y|Z}(1|2) = 1, \quad p_{Y|Z}(1|4) = \frac{2}{5} \tag{4.4.33}$$

from $p_{Y|Z}(y|z) = \frac{p_{Y,Z}(y, z)}{p_Z(z)}$ by using (4.4.23) and (4.4.32). ♦
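Because $Y$ and $Z$ in Example 4.4.4 are functions of $X$, all the joint and conditional pmf's follow mechanically from $p_X$. The following sketch tabulates them with exact fractions; only the nonzero entries are stored, and the helper names `marginal` and `conditional` are illustrative choices made here.

```python
from fractions import Fraction as F

p_x = {3: F(1, 6), 4: F(1, 2), 5: F(1, 3)}          # pmf (4.4.20)

def y_of(x):
    """Y as defined in (4.4.21)."""
    return 0 if x == 4 else 1

# Joint pmf of (X, Y, Z) with Z = X - Y; coordinates 0, 1, 2 are X, Y, Z.
triples = {(x, y_of(x), x - y_of(x)): p for x, p in p_x.items()}

def marginal(indices):
    """Joint pmf of the selected coordinates, keyed by tuples."""
    out = {}
    for point, p in triples.items():
        key = tuple(point[i] for i in indices)
        out[key] = out.get(key, 0) + p
    return out

def conditional(target, given):
    """Nonzero values of P(target = t | given = g), keyed by (t, g)."""
    joint = marginal((target, given))
    cond = marginal((given,))
    return {(t, g): p / cond[(g,)] for (t, g), p in joint.items()}

p_x_given_y = conditional(0, 1)     # compare with (4.4.26)
p_x_given_z = conditional(0, 2)     # compare with (4.4.29)
p_z_given_y = conditional(2, 1)     # compare with (4.4.30) and (4.4.31)
p_y_given_z = conditional(1, 2)     # compare with (4.4.33)
p_yz = marginal((1, 2))             # compare with (4.4.32)
```

For instance, `p_x_given_z[(4, 4)]` evaluates to $\frac{3}{5}$ and `p_z_given_y[(2, 1)]` to $\frac{1}{3}$, in agreement with the values obtained above.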

Example 4.4.5 When the joint pdf of $(X, Y)$ is

$$f_{X,Y}(x, y) = \frac{1}{16}u(2 - |x|)u(2 - |y|), \tag{4.4.34}$$

obtain the conditional joint cdf $F_{X,Y|A}$ and the conditional joint pdf $f_{X,Y|A}$ for $A = \{|X| \le 1, |Y| \le 1\}$.

Solution First, we have $P(A) = \int_{-1}^{1}\int_{-1}^{1} f_{X,Y}(u, v)\,du\,dv = \frac{1}{16} \times 4 = \frac{1}{4}$. Next, for

$$P(X \le x, Y \le y, A) = \iint_{\{|u| \le 1,\, |v| \le 1,\, u \le x,\, v \le y\}} f_{X,Y}(u, v)\,du\,dv, \tag{4.4.35}$$

we get the following results by referring to Fig. 4.16: First, $P(X \le x, Y \le y, A) = P(A) = \frac{1}{4}$ when $x \ge 1$ and $y \ge 1$ and $P(X \le x, Y \le y, A) = P(\emptyset) = 0$ when $x \le -1$ or $y \le -1$. In addition, $P(X \le x, Y \le y, A) = \iint_{\{|u| \le 1,\, -1 \le v \le y\}} f_{X,Y}(u, v)\,du\,dv = \frac{1}{16} \times 2(y + 1) = \frac{1}{8}(y + 1)$ because $\{|u| \le 1, |v| \le 1, u \le x, v \le y\} = \{|u| \le 1, -1 \le v \le y\}$ when $x \ge 1$ and $-1 \le y \le 1$. We similarly get $P(X \le x, Y \le y, A) = \frac{1}{8}(x + 1)$ when $-1 \le x \le 1$ and $y \ge 1$, and $P(X \le x, Y \le y, A) = \frac{1}{16}(x + 1)(y + 1)$ when $-1 \le x \le 1$ and $-1 \le y \le 1$. Taking these results into account, we get the conditional joint cdf $F_{X,Y|A}(x, y) = \frac{P(X \le x, Y \le y, A)}{P(A)}$ as

$$F_{X,Y|A}(x, y) = \begin{cases} 1, & x \ge 1, \; y \ge 1, \\ \frac{1}{2}(y + 1), & x \ge 1, \; |y| \le 1, \\ \frac{1}{4}(x + 1)(y + 1), & |x| \le 1, \; |y| \le 1, \\ \frac{1}{2}(x + 1), & |x| \le 1, \; y \ge 1, \\ 0, & x \le -1 \text{ or } y \le -1. \end{cases} \tag{4.4.36}$$

Fig. 4.16 The region of integration to obtain the conditional joint cdf $F_{X,Y|A}(x, y|A)$ under the condition $A = \{|X| \le 1, |Y| \le 1\}$ when $f_{X,Y}(x, y) = \frac{1}{16}u(2 - |x|)u(2 - |y|)$ (axes $u$ and $v$, each marked at $\pm 1$ and $\pm 2$, with the point $(x, y)$)

We subsequently get the conditional joint pdf

$$f_{X,Y|A}(x, y) = \frac{1}{4}u(1 - |x|)u(1 - |y|) \tag{4.4.37}$$

by differentiating $F_{X,Y|A}(x, y)$. ♦



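Let $\mathbf{Y} = \mathbf{g}(\mathbf{X})$ and $\mathbf{g}^{-1}(\cdot)$ be the inverse of the function $\mathbf{g}(\cdot)$. We then have

$$f_{\mathbf{Z}|\mathbf{Y}}(\mathbf{z} \,|\, \mathbf{Y} = \mathbf{y}) = f_{\mathbf{Z}|\mathbf{X}}\left(\mathbf{z} \,|\, \mathbf{X} = \mathbf{g}^{-1}(\mathbf{y})\right), \tag{4.4.38}$$

which implies that, when the relationship between $\mathbf{X}$ and $\mathbf{Y}$ can be expressed via an invertible function, conditioning on $\mathbf{X} = \mathbf{g}^{-1}(\mathbf{y})$ is equivalent to conditioning on $\mathbf{Y} = \mathbf{y}$.

Example 4.4.6 Assume that $\mathbf{X}$ and $\mathbf{Y}$ are related as $\mathbf{Y} = \mathbf{g}(\mathbf{X}) = (X_1 + X_2, X_1 - X_2)$. Then, we have $f_{\mathbf{Z}|\mathbf{Y}}(\mathbf{z} \,|\, \mathbf{Y} = (3, 1)) = f_{\mathbf{Z}|\mathbf{X}}\left(\mathbf{z} \,|\, \mathbf{X} = \mathbf{g}^{-1}(3, 1)\right) = f_{\mathbf{Z}|\mathbf{X}}(\mathbf{z} \,|\, \mathbf{X} = (2, 1))$. ♦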

4.4.2 Conditional Expected Values

For one random variable, we have discussed the conditional expected value in (3.4.30). We now extend the discussion into random vectors with the conditioning event expressed in terms of random variables.

4.4.2.1 Conditional Expected Values in Random Vectors

Let $B = \{X = x\}$ in the conditional expected value $E\{Y|B\} = \int_{-\infty}^{\infty} y f_{Y|B}(y)\,dy$ shown in (3.4.30). Then, the conditional expected value $m_{Y|X} = E\{Y \,|\, X = x\}$ of $Y$ is

$$m_{Y|X} = \int_{-\infty}^{\infty} y f_{Y|X}(y|x)\,dy \tag{4.4.39}$$

when $X = x$.

Example 4.4.7 Obtain the conditional expected value $E\{X \,|\, Y = y\}$ in Example 4.4.2.

Solution We easily get $E\{X \,|\, Y = y\} = \int_0^1 \frac{6x^2(2 - x - y)}{4 - 3y}\,dx = \frac{1}{4 - 3y}\left[2(2 - y)x^3 - \frac{3}{2}x^4\right]_{x=0}^{1} = \frac{5 - 4y}{8 - 6y}$ for $0 < y < 1$. ♦

The conditional expected value $E\{Y|X\}$ is a function of $X$ and is thus a random variable with its value $E\{Y \,|\, X = x\}$ when $X = x$.

Theorem 4.4.4 The expected value $E\{Y\}$ can be obtained as

$$E\{E\{Y|X\}\} = E\{Y\} = \begin{cases} \int_{-\infty}^{\infty} E\{Y \,|\, X = x\} f_X(x)\,dx, & X \text{ is a continuous random variable}, \\ \sum\limits_{x=-\infty}^{\infty} E\{Y \,|\, X = x\} p_X(x), & X \text{ is a discrete random variable} \end{cases} \tag{4.4.40}$$

from the conditional expected value $E\{Y|X\}$.


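Proof Considering only the case of a continuous random vector, (4.4.40) can be shown easily as $E\{E\{Y|X\}\} = \int_{-\infty}^{\infty} E\{Y \,|\, X = x\} f_X(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f_{Y|X}(y|x)\,dy\, f_X(x)\,dx = \int_{-\infty}^{\infty} y \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = \int_{-\infty}^{\infty} y f_Y(y)\,dy = E\{Y\}$. ♠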
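The discrete case of (4.4.40) can be illustrated on a small table. The joint pmf in the sketch below is a hypothetical example chosen here only to make the arithmetic visible; it does not come from the text.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y), assumed only for illustration.
joint = {(0, 1): F(1, 8), (0, 3): F(3, 8), (1, 2): F(1, 4), (1, 6): F(1, 4)}

# Marginal pmf of X.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p

def cond_mean_y(x):
    """E{Y | X = x} = sum over y of y p_{X,Y}(x, y) / p_X(x)."""
    return sum(y * p for (a, y), p in joint.items() if a == x) / p_x[x]

mean_y_direct = sum(y * p for (_, y), p in joint.items())          # E{Y}
mean_y_iterated = sum(cond_mean_y(x) * p for x, p in p_x.items())  # E{E{Y|X}}
```

Here $E\{Y \,|\, X = 0\} = \frac{5}{2}$ and $E\{Y \,|\, X = 1\} = 4$, and averaging them with weights $p_X(0) = p_X(1) = \frac{1}{2}$ gives the same value $\frac{13}{4}$ as the direct computation of $E\{Y\}$.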

4.4.2.2 Conditional Expected Values for Functions of Random Vectors

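The function $\mathbf{g}(\mathbf{X}, \mathbf{Y})$ is a function of two random vectors $\mathbf{X}$ and $\mathbf{Y}$ while $\mathbf{g}(\mathbf{x}, \mathbf{Y})$ is a function of a vector $\mathbf{x}$ and a random vector $\mathbf{Y}$: in other words, $\mathbf{g}(\mathbf{x}, \mathbf{Y})$ and $\mathbf{g}(\mathbf{X}, \mathbf{Y})$ are different from each other. In addition, under the condition $\mathbf{X} = \mathbf{x}$, the conditional mean of $\mathbf{g}(\mathbf{X}, \mathbf{Y})$ is $E\{\mathbf{g}(\mathbf{X}, \mathbf{Y}) \,|\, \mathbf{X} = \mathbf{x}\} = \int_{\text{all } \mathbf{y}} \mathbf{g}(\mathbf{x}, \mathbf{y}) f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\,d\mathbf{y}$, which is the same as the conditional mean $E\{\mathbf{g}(\mathbf{x}, \mathbf{Y}) \,|\, \mathbf{X} = \mathbf{x}\} = \int_{\text{all } \mathbf{y}} \mathbf{g}(\mathbf{x}, \mathbf{y}) f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\,d\mathbf{y}$ of $\mathbf{g}(\mathbf{x}, \mathbf{Y})$. In other words,

$$E\{\mathbf{g}(\mathbf{X}, \mathbf{Y}) \,|\, \mathbf{X} = \mathbf{x}\} = E\{\mathbf{g}(\mathbf{x}, \mathbf{Y}) \,|\, \mathbf{X} = \mathbf{x}\} = \int_{\text{all } \mathbf{y}} \mathbf{g}(\mathbf{x}, \mathbf{y}) f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\,d\mathbf{y}. \tag{4.4.41}$$

Furthermore, for the expected value of the random vector $E\{\mathbf{g}(\mathbf{X}, \mathbf{Y}) \,|\, \mathbf{X}\}$, we have

$$E\{\mathbf{g}(\mathbf{X}, \mathbf{Y})\} = E\{E\{\mathbf{g}(\mathbf{X}, \mathbf{Y}) \,|\, \mathbf{X}\}\} = E\{E\{\mathbf{g}(\mathbf{X}, \mathbf{Y}) \,|\, \mathbf{Y}\}\} \tag{4.4.42}$$

from $E\{E\{\mathbf{g}(\mathbf{X}, \mathbf{Y}) \,|\, \mathbf{X}\}\} = \int_{\text{all } \mathbf{x}}\left\{\int_{\text{all } \mathbf{y}} \mathbf{g}(\mathbf{x}, \mathbf{y}) f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\,d\mathbf{y}\right\} f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x} = \int_{\text{all } \mathbf{y}}\int_{\text{all } \mathbf{x}} \mathbf{g}(\mathbf{x}, \mathbf{y}) f_{\mathbf{X},\mathbf{Y}}(\mathbf{x}, \mathbf{y})\,d\mathbf{x}\,d\mathbf{y} = \int_{\text{all } \mathbf{y}}\left\{\int_{\text{all } \mathbf{x}} \mathbf{g}(\mathbf{x}, \mathbf{y}) f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})\,d\mathbf{x}\right\} f_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y}$. The result (4.4.42) can be obtained also from Theorem 4.4.4. When $X$ and $Y$ are both one-dimensional, if we let $g(X, Y) = \left(Y - m_{Y|X}\right)^2$, we get the expected value

$$E\{g(X, Y) \,|\, X = x\} = E\left\{\left(Y - m_{Y|X}\right)^2 \,\Big|\, X = x\right\}, \tag{4.4.43}$$

which is called the conditional variance of $Y$ when $X = x$.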

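Example 4.4.8 (Ross 1976) Obtain the conditional expected value $E\left\{\exp\left(\frac{X}{2}\right) \,\Big|\, Y = 1\right\}$ when the joint pdf of $(X, Y)$ is $f_{X,Y}(x, y) = \frac{y}{2}e^{-xy}u(x)u(y)u(2 - y)$.

Solution From (4.4.10), we have $f_{X|Y}(x|1) = \frac{f_{X,Y}(x, 1)}{f_Y(1)} = \frac{\frac{1}{2}e^{-x}}{\int_0^{\infty}\frac{1}{2}e^{-x}\,dx} = e^{-x}$ for $0 < x < \infty$. Thus, from (4.4.41), we get $E\left\{\exp\left(\frac{X}{2}\right) \,\Big|\, Y = 1\right\} = \int_0^{\infty} e^{\frac{x}{2}}e^{-x}\,dx = 2$. ♦

When $\mathbf{g}(\mathbf{X}, \mathbf{Y}) = \mathbf{g}_1(\mathbf{X})\mathbf{g}_2(\mathbf{Y})$, from (4.4.41) we get

$$E\{\mathbf{g}_1(\mathbf{X})\mathbf{g}_2(\mathbf{Y}) \,|\, \mathbf{X} = \mathbf{x}\} = \mathbf{g}_1(\mathbf{x})E\{\mathbf{g}_2(\mathbf{Y}) \,|\, \mathbf{X} = \mathbf{x}\}, \tag{4.4.44}$$

which is called the factorization property (Gardner 1990). The factorization property implies that the random vector $\mathbf{g}_1(\mathbf{X})$ under the condition $\mathbf{X} = \mathbf{x}$, or equivalently $\mathbf{g}_1(\mathbf{x})$, is not probabilistic.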
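The value obtained in Example 4.4.8 can be cross-checked by numerical integration. The sketch below applies a midpoint rule to the joint pdf of that example; the truncation of the infinite range at 60 and the number of steps are choices made here, so the result is an approximation.

```python
import math

def joint_pdf(x, y):
    """Joint pdf of Example 4.4.8: (y / 2) exp(-x y) for x >= 0 and 0 <= y <= 2."""
    return 0.5 * y * math.exp(-x * y) if x >= 0 and 0 <= y <= 2 else 0.0

def integrate(f, a, b, steps=200000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

y0 = 1.0          # conditioning value Y = 1
upper = 60.0      # assumed truncation of the range 0 < x < infinity

# Marginal pdf f_Y(1), then E{exp(X / 2) | Y = 1} as in (4.4.41).
f_y = integrate(lambda x: joint_pdf(x, y0), 0.0, upper)
cond_mean = integrate(lambda x: math.exp(x / 2) * joint_pdf(x, y0), 0.0, upper) / f_y
```

The quantity `f_y` approximates $f_Y(1) = \frac{1}{2}$ and `cond_mean` approximates the value 2 found above.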

4.4.3 Evaluation of Expected Values via Conditioning

As we have observed in Sects. 2.4 and 3.4.3, we can obtain the probability and expected value more easily by first obtaining the conditional probability and conditional expected value with appropriate conditioning. Let us now discuss how we can obtain expected values for random vectors by first obtaining the conditional expected value with appropriate conditioning on random vectors.

Example 4.4.9 Consider the group $\{1, 1, 2, 2, \ldots, n, n\}$ of $n$ pairs of numbers. When we randomly delete $m$ numbers in the group, obtain the expected number of pairs remaining. For $n = 20$ and $m = 10$, obtain the value of the expected number.

Solution Denote by $J_m$ the number of pairs remaining after $m$ numbers have been deleted. After $m$ numbers have been deleted, we have $2n - m - 2J_m$ non-paired numbers. When we delete one more number, the number of pairs is $J_m - 1$ with probability $\frac{2J_m}{2n - m}$ or $J_m$ with probability $1 - \frac{2J_m}{2n - m}$. Based on this observation, we have

$$E\{J_{m+1} \,|\, J_m\} = (J_m - 1)\frac{2J_m}{2n - m} + J_m\left(1 - \frac{2J_m}{2n - m}\right) = J_m\frac{2n - m - 2}{2n - m}. \tag{4.4.45}$$

Noting that $E\{E\{J_{m+1} \,|\, J_m\}\} = E\{J_{m+1}\}$ and $E\{J_0\} = n$, we get $E\{J_{m+1}\} = E\{J_m\}\frac{2n - m - 2}{2n - m} = E\{J_{m-1}\}\frac{2n - m - 2}{2n - m}\cdot\frac{2n - m - 1}{2n - m + 1} = \cdots = E\{J_0\}\frac{2n - m - 2}{2n - m}\cdot\frac{2n - m - 1}{2n - m + 1}\cdots\frac{2n - 2}{2n} = \frac{(2n - m - 2)(2n - m - 1)}{2(2n - 1)}$, i.e.,

$$E\{J_m\} = \frac{(2n - m - 1)(2n - m)}{2(2n - 1)} = n\,\frac{{}_{2n-2}C_m}{{}_{2n}C_m}. \tag{4.4.46}$$

For $n = 20$ and $m = 10$, the value is $E\{J_{10}\} = \frac{29 \times 30}{2 \times 39} \approx 11.15$. ♦
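The closed form (4.4.46) can be compared with the recursion obtained from (4.4.45) and, for small $n$, with an exhaustive enumeration of all ways of deleting $m$ numbers. The sketch below does both; the function names are illustrative choices made here.

```python
from fractions import Fraction
from itertools import combinations

def expected_pairs_closed_form(n, m):
    """E{J_m} from (4.4.46)."""
    return Fraction((2 * n - m - 1) * (2 * n - m), 2 * (2 * n - 1))

def expected_pairs_recursion(n, m):
    """E{J_m} from E{J_0} = n and E{J_{k+1}} = E{J_k} (2n - k - 2) / (2n - k)."""
    value = Fraction(n)
    for k in range(m):
        value *= Fraction(2 * n - k - 2, 2 * n - k)
    return value

def expected_pairs_brute_force(n, m):
    """Average number of intact pairs over all equally likely deletions."""
    items = [(value, copy) for value in range(n) for copy in range(2)]
    total, count = 0, 0
    for deleted in combinations(items, m):
        broken = {value for value, _ in deleted}   # values that lost a member
        total += n - len(broken)
        count += 1
    return Fraction(total, count)

small_case = expected_pairs_brute_force(4, 3)
example_value = expected_pairs_closed_form(20, 10)     # 145/13, about 11.15
```

For $n = 4$ and $m = 3$ all three approaches give $\frac{10}{7}$, and the closed form reproduces the value $\frac{29 \times 30}{2 \times 39}$ of the example.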

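Example 4.4.10 (Ross 1996) We toss a coin repeatedly until head appears $r$ times consecutively. When the probability of a head is $p$, obtain the expected number of repetitions.

Solution Denote by $C_k$ the number of repetitions until the first appearance of $k$ consecutive heads. Then,

$$C_{k+1} = \begin{cases} C_k + 1, & (C_k + 1)\text{-st outcome is head}, \\ C_{k+1} + C_k + 1, & (C_k + 1)\text{-st outcome is tail}, \end{cases} \tag{4.4.47}$$

where $C_{k+1}$ on the right-hand side denotes the number of additional repetitions required after the tail, which has the same distribution as $C_{k+1}$ because the procedure starts anew. Let $\alpha_k = E\{C_k\}$ for convenience. Then, using (4.4.47) in

$$E\{C_{k+1}\} = E\{C_{k+1} \,|\, (C_k + 1)\text{-st outcome is head}\}P(\text{head}) + E\{C_{k+1} \,|\, (C_k + 1)\text{-st outcome is tail}\}P(\text{tail}), \tag{4.4.48}$$

we get

$$\alpha_{k+1} = (\alpha_k + 1)p + (\alpha_{k+1} + \alpha_k + 1)(1 - p). \tag{4.4.49}$$

Now, solving (4.4.49), we get $\alpha_{k+1} = \frac{1}{p}\alpha_k + \frac{1}{p} = \frac{1}{p^2}\alpha_{k-1} + \frac{1}{p} + \frac{1}{p^2} = \cdots = \frac{1}{p^k}\alpha_1 + \frac{1}{p} + \frac{1}{p^2} + \cdots + \frac{1}{p^k} = \sum_{i=1}^{k+1}\frac{1}{p^i}$ because $\alpha_1 = E\{C_1\} = 1 \times p + 2(1 - p)p + 3(1 - p)^2 p + \cdots = \frac{1}{p}$. In other words,

$$\alpha_k = \sum_{i=1}^{k}\frac{1}{p^i}. \tag{4.4.50}$$

A generalization of this problem, finding the mean time for a pattern, is discussed in Appendix 4.2. ♦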
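The result (4.4.50) can be checked against the recursion $\alpha_{k+1} = \frac{1}{p}(\alpha_k + 1)$ and against an independent computation. In the sketch below, the third function models the current run length of heads as the state of a Markov chain and solves for the mean hitting time of state $r$; this chain formulation is an alternative route introduced here for verification, not the method of the example.

```python
from fractions import Fraction

def mean_tosses_recursion(r, p):
    """alpha_r from alpha_1 = 1/p and alpha_{k+1} = (alpha_k + 1) / p."""
    alpha = 1 / p
    for _ in range(r - 1):
        alpha = (alpha + 1) / p
    return alpha

def mean_tosses_closed_form(r, p):
    """alpha_r = sum of 1 / p^i for i = 1, ..., r, as in (4.4.50)."""
    return sum(1 / p ** i for i in range(1, r + 1))

def mean_tosses_markov(r, p):
    """Mean hitting time of r consecutive heads via e_i = 1 + p e_{i+1} + (1 - p) e_0.

    Each e_i is written as a_i + b_i e_0 and resolved backward from e_r = 0.
    """
    a, b = Fraction(0), Fraction(0)          # coefficients of e_r
    for _ in range(r):
        a, b = 1 + p * a, p * b + (1 - p)    # coefficients of e_{i} from e_{i+1}
    return a / (1 - b)                       # e_0 = a_0 / (1 - b_0)

p_fair = Fraction(1, 2)
alpha_3 = mean_tosses_closed_form(3, p_fair)     # 2 + 4 + 8 = 14
```

For a fair coin and $r = 3$, all three functions return 14, that is, fourteen tosses are needed on average to see three heads in a row.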

4.5 Impulse Functions and Random Vectors

As we have observed in Examples 2.5.23 and 3.1.34, the unit step and impulse functions are quite useful in representing the cdf and pdf of discrete and hybrid random variables. In addition, the unit step and impulse functions can be used for obtaining joint cdf's and joint pdf's expressed in several formulas depending on the condition.

Example 4.5.1 Obtain the joint cdf $F_{X,X+a}$ and joint pdf $f_{X,X+a}$ of $X$ and $Y = X + a$ for a random variable $X$ with pdf $f_X$ and cdf $F_X$, where $a$ is a constant.

Solution The joint cdf $F_{X,X+a}(x, y) = P(X \le x, X \le y - a)$ of $X$ and $Y = X + a$ can be obtained as $F_{X,X+a}(x, y) = P(X \le x)$ for $x \le y - a$ and $F_{X,X+a}(x, y) = P(X \le y - a)$ for $x \ge y - a$, i.e.,

$$F_{X,X+a}(x, y) = F_X(\min(x, y - a)) = F_X(x)u(y - x - a) + F_X(y - a)u(x - y + a), \tag{4.5.1}$$

where it is assumed that $u(0) = \frac{1}{2}$. If we differentiate (4.5.1), we get the joint pdf $f_{X,X+a}(x, y) = \frac{\partial^2}{\partial x\,\partial y}F_{X,X+a}(x, y) = \frac{\partial}{\partial y}\{f_X(x)u(y - x - a) - F_X(x)\delta(y - x - a) + F_X(y - a)\delta(x - y + a)\}$ of $X$ and $Y = X + a$ as$^{18}$

$$f_{X,X+a}(x, y) = f_X(x)\delta(y - x - a) \tag{4.5.2}$$

by noting that $\delta(t) = \delta(-t)$ as shown in (1.4.36) and that $F_X(x)\delta(y - x - a) = F_X(y - a)\delta(x - y + a)$ from $F_X(t)\delta(t - a) = F_X(a)\delta(t - a)$ as observed in (1.4.42). The result (4.5.2) can be written also as $f_{X,X+a}(x, y) = f_X(y - a)\delta(y - x - a)$. Another derivation of (4.5.2) based on $F_{X,X+a}(x, y) = F_X(\min(x, y - a))$ in (4.5.1) is discussed in Exercise 4.72. ♦

18 Here, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_X(x)\delta(y - x - a)\,dy\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

Example 4.5.2 Obtain the joint cdf $F_{X,cX}$ and the joint pdf $f_{X,cX}$ of $X$ and $Y = cX$ for a continuous random variable $X$ with pdf $f_X$ and cdf $F_X$, where $c$ is a constant.

Solution For $c > 0$, the joint cdf $F_{X,cX}(x, y) = P(X \le x, cX \le y)$ of $X$ and $Y = cX$ can be obtained as

$$F_{X,cX}(x, y) = \begin{cases} P(X \le x), & x \le \frac{y}{c}, \\ P\left(X \le \frac{y}{c}\right), & x \ge \frac{y}{c} \end{cases} = F_X(x)u\left(\frac{y}{c} - x\right) + F_X\left(\frac{y}{c}\right)u\left(x - \frac{y}{c}\right). \tag{4.5.3}$$

For $c < 0$, the joint cdf is $F_{X,cX}(x, y) = P\left(X \le x, X \ge \frac{y}{c}\right)$, i.e.,

$$F_{X,cX}(x, y) = \begin{cases} 0, & x \le \frac{y}{c}, \\ P\left(\frac{y}{c} \le X \le x\right), & x > \frac{y}{c} \end{cases} = \left\{F_X(x) - F_X\left(\frac{y}{c}\right)\right\}u\left(x - \frac{y}{c}\right). \tag{4.5.4}$$

For $c = 0$, we have the joint cdf $F_{X,cX}(x, y) = P(X \le x, cX \le y)$ as

$$F_{X,cX}(x, y) = \begin{cases} 0, & y < 0, \\ P(X \le x), & y \ge 0 \end{cases} = F_X(x)u(y) \tag{4.5.5}$$

with $u(0) = 1$. Collecting (4.5.3)–(4.5.5), we eventually have


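\[ F_{X,cX}(x, y) = \begin{cases} \left\{ F_X(x) - F_X\!\left(\frac{y}{c}\right) \right\} u\!\left(x - \frac{y}{c}\right), & c < 0, \\ F_X(x)u(y), & c = 0, \\ F_X(x)\,u\!\left(\frac{y}{c} - x\right) + F_X\!\left(\frac{y}{c}\right) u\!\left(x - \frac{y}{c}\right), & c > 0. \end{cases} \tag{4.5.6} \]

We now obtain the joint pdf of $X$ and $Y = cX$ by differentiating (4.5.6). First, recollect that $F_X(x)\delta\!\left(x - \frac{y}{c}\right) = F_X\!\left(\frac{y}{c}\right)\delta\!\left(x - \frac{y}{c}\right)$ from $\delta(t) = \delta(-t)$ and $F_X(t)\delta(t - a) = F_X(a)\delta(t - a)$. Then, for $c < 0$, the joint pdf $f_{X,cX}(x, y) = \frac{\partial^2}{\partial x \partial y} F_{X,cX}(x, y)$ of $X$ and $Y = cX$ can be obtained as

\[ f_{X,cX}(x, y) = \frac{\partial}{\partial y}\left[ f_X(x)\,u\!\left(x - \frac{y}{c}\right) + \left\{ F_X(x) - F_X\!\left(\frac{y}{c}\right) \right\} \delta\!\left(x - \frac{y}{c}\right) \right] = -\frac{1}{c} f_X(x)\,\delta\!\left(x - \frac{y}{c}\right). \tag{4.5.7} \]

Similarly, we get $f_{X,cX}(x, y) = \frac{\partial}{\partial y}\left\{ f_X(x)\,u\!\left(\frac{y}{c} - x\right) - F_X(x)\,\delta\!\left(\frac{y}{c} - x\right) + F_X\!\left(\frac{y}{c}\right)\delta\!\left(x - \frac{y}{c}\right) \right\}$, i.e.,

\[ f_{X,cX}(x, y) = \frac{1}{c} f_X(x)\,\delta\!\left(\frac{y}{c} - x\right) \tag{4.5.8} \]

for $c > 0$, and $f_{X,cX}(x, y) = \frac{\partial^2}{\partial x \partial y} F_X(x)u(y) = f_X(x)\delta(y)$ for $c = 0$: in short,

\[ f_{X,cX}(x, y) = \begin{cases} f_X(x)\delta(y), & c = 0, \\ \frac{1}{|c|} f_X(x)\,\delta\!\left(x - \frac{y}{c}\right), & c \ne 0. \end{cases} \tag{4.5.9} \]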

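Note that $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,cX}(x, y)\,dy\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx \int_{-\infty}^{\infty} \delta(y)\,dy = 1$ for $c = 0$. For $c \ne 0$, we have $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,cX}(x, y)\,dy\,dx = \frac{1}{|c|}\int_{-\infty}^{\infty} f_X(x) \int_{-\infty}^{\infty} \delta\!\left(x - \frac{y}{c}\right) dy\,dx$. From $\int_{-\infty}^{\infty} \delta\!\left(x - \frac{y}{c}\right) dy = c\int_{\infty}^{-\infty} \delta(x - t)\,dt = -c$ for $c < 0$, and $\int_{-\infty}^{\infty} \delta\!\left(x - \frac{y}{c}\right) dy = c\int_{-\infty}^{\infty} \delta(x - t)\,dt = c$ for $c > 0$, we have $\int_{-\infty}^{\infty} \delta\!\left(x - \frac{y}{c}\right) dy = |c|$. Therefore, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,cX}(x, y)\,dy\,dx = 1$. ♦

Letting $a = 0$ in Example 4.5.1, $c = 1$ in Example 4.5.2, or $a = 0$ and $c = 1$ in Exercise 4.68, we get

\[ f_{X,X}(x, y) = f_X(x)\delta(x - y) \tag{4.5.10} \]

or $f_{X,X}(x, y) = f_X(x)\delta(y - x) = f_X(y)\delta(x - y) = f_X(y)\delta(y - x)$. Let us now consider the joint distribution of a random variable and its absolute value.

Example 4.5.3 Obtain the joint cdf $F_{X,|X|}$ and the joint pdf $f_{X,|X|}$ of $X$ and $Y = |X|$ for a continuous random variable $X$ with pdf $f_X$ and cdf $F_X$.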


Solution First, the joint cdf $F_{X,|X|}(x, y) = P(X \le x, |X| \le y)$ of $X$ and $Y = |X|$ can be obtained as (Bae et al. 2006)

\[ F_{X,|X|}(x, y) = \begin{cases} P(-y \le X \le x), & -y < x < y,\ y > 0, \\ P(-y \le X \le y), & x > y,\ y > 0, \\ 0, & \text{otherwise} \end{cases} = u(y + x)u(y - x)G_1(x, y) + u(y)u(x - y)G_1(y, y), \tag{4.5.11} \]

where

\[ G_1(x, y) = F_X(x) - F_X(-y) \tag{4.5.12} \]

satisfies $G_1(x, -x) = 0$. We have used $u(y + x)u(y - x)u(y) = u(y - x)u(y + x)$ in obtaining (4.5.11). Note that $F_{X,|X|}(x, y) = 0$ for $y < 0$ because $u(x + y)u(y - x) = 0$: the two arguments $x + y$ and $y - x$ cannot both be positive when their sum $2y$ is negative. Noting that

\[ \delta(y - a) f_X(y) = \delta(y - a) f_X(a), \tag{4.5.13} \]

$G_1(x, -x) = 0$, $\delta(x - y) = \delta(y - x)$, and $u(\alpha x) = u(x)$ for $\alpha > 0$, we get

\[ \begin{aligned} \frac{\partial}{\partial x} F_{X,|X|}(x, y) &= \delta(x + y)u(y - x)G_1(x, y) - u(y + x)\delta(y - x)G_1(x, y) \\ &\quad + u(x + y)u(y - x) f_X(x) + u(y)\delta(x - y)G_1(y, y) \\ &= \delta(x + y)u(-2x)G_1(x, -x) - u(2x)\delta(y - x)G_1(x, x) \\ &\quad + u(x + y)u(y - x) f_X(x) + u(x)\delta(x - y)G_1(x, x) \\ &= u(x + y)u(y - x) f_X(x) \end{aligned} \tag{4.5.14} \]

by differentiating (4.5.11) with respect to $x$. Consequently, the joint pdf $f_{X,|X|}(x, y) = \frac{\partial^2}{\partial x \partial y} F_{X,|X|}(x, y)$ of $X$ and $Y = |X|$ is

\[ \begin{aligned} f_{X,|X|}(x, y) &= \{\delta(x + y)u(y - x) + u(x + y)\delta(y - x)\} f_X(x) \\ &= \{\delta(x + y)u(-x) + \delta(x - y)u(x)\} f_X(x) \\ &= \{\delta(x + y) f_X(-y) + \delta(x - y) f_X(y)\}\, u(y). \end{aligned} \tag{4.5.15} \]

Obtaining the pdf $f_{|X|}(y) = \int_{-\infty}^{\infty} f_{X,|X|}(x, y)\,dx$ of $|X|$ from (4.5.15), we get

\[ f_{|X|}(y) = \int_{-\infty}^{\infty} \{\delta(x + y) f_X(-y) + \delta(x - y) f_X(y)\}\, u(y)\,dx = \{ f_X(-y) + f_X(y) \}\, u(y), \tag{4.5.16} \]

which is equivalent to that obtained in Example 3.2.7. From $u(x) + u(-x) = 1$, we also have $\int_{-\infty}^{\infty} f_{X,|X|}(x, y)\,dy = f_X(x)u(-x) + f_X(x)u(x) = f_X(x)$. ♦
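The result (4.5.16) can be checked numerically. The following Python sketch compares the pdf $\{f_X(-y) + f_X(y)\}u(y)$ with a simulation of $|X|$. The choice $X \sim N(1, 1)$, the helper names (`normal_pdf`, `abs_pdf`, `integrate`, `empirical_prob`), the seed, and the sample size are illustrative assumptions and do not come from the text.

```python
import math
import random


def normal_pdf(x, mean=1.0, std=1.0):
    """pdf f_X of the assumed example distribution N(mean, std^2)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def abs_pdf(y, f_x=normal_pdf):
    """pdf of |X| from (4.5.16): {f_X(-y) + f_X(y)} u(y)."""
    return f_x(-y) + f_x(y) if y > 0 else 0.0


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def empirical_prob(a, b, n=200_000, seed=0):
    """Fraction of simulated |X| values that fall in [a, b]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if a <= abs(rng.gauss(1.0, 1.0)) <= b)
    return hits / n


if __name__ == "__main__":
    # Total mass of the pdf of |X| (the tail beyond 12 is negligible here).
    print(integrate(abs_pdf, 0.0, 12.0))
    # P(0.5 <= |X| <= 1.5) from (4.5.16) versus simulation.
    print(integrate(abs_pdf, 0.5, 1.5), empirical_prob(0.5, 1.5))
```

The two probabilities printed on the last line should agree up to Monte Carlo error, which is one way to see that the two impulses in (4.5.15) carry the contributions of the negative and positive parts of $X$.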


Example 4.5.4 Using (4.5.16), it is easy to see that $Y = |X| \sim U(0, 1)$ when $X \sim U(-1, 1)$, $X \sim U(-1, 0)$, or $X \sim U[0, 1)$. ♦

When the Jacobian $J(\boldsymbol{g}(\boldsymbol{x}))$ is 0, Theorem 4.2.1 is not applicable as mentioned in the paragraph just before Sect. 4.2.1.2. We have $J(\boldsymbol{g}(\boldsymbol{x})) = 0$ if a function $\boldsymbol{g} = (g_1, g_2, \ldots, g_n)$ in the $n$-dimensional space is in the form, for example, of

\[ g_j(\boldsymbol{x}) = c_j g_1(\boldsymbol{x}) + a_j \tag{4.5.17} \]

for $j = 2, 3, \ldots, n$, where $\{a_j\}_{j=2}^{n}$ and $\{c_j\}_{j=2}^{n}$ are all constants. Let us discuss how we can obtain the pdf of $(Y_1, Y_2) = \boldsymbol{g}(\boldsymbol{X}) = (g_1(\boldsymbol{X}), g_2(\boldsymbol{X}))$ in the two dimensional case, where

\[ g_2(\boldsymbol{x}) = c g_1(\boldsymbol{x}) + a \tag{4.5.18} \]

with $a$ and $c$ constants. First, we choose an appropriate auxiliary variable, for example, $Z = X_2$ or $Z = X_1$ when $g_1(\boldsymbol{X})$ is not or is, respectively, in the form of $dX_2 + b$. Then, using Theorem 4.2.1, we obtain the joint pdf $f_{Y_1,Z}$ of $(Y_1, Z)$. We next obtain the pdf of $Y_1$ as

\[ f_{Y_1}(x) = \int_{-\infty}^{\infty} f_{Y_1,Z}(x, v)\,dv \tag{4.5.19} \]

from $f_{Y_1,Z}$. Subsequently, we get the joint pdf

\[ f_{Y_1,Y_2}(x, y) = \begin{cases} f_{Y_1}(x)\delta(y - a), & c = 0, \\ \frac{1}{|c|} f_{Y_1}(x)\,\delta\!\left(x - \frac{y - a}{c}\right), & c \ne 0 \end{cases} \tag{4.5.20} \]

of $(Y_1, Y_2) = (g_1(\boldsymbol{X}), c g_1(\boldsymbol{X}) + a)$ using (4.E.18).

Example 4.5.5 Obtain the joint pdf of $(Y_1, Y_2) = (X_1 - X_2, 2X_1 - 2X_2)$ and the joint pdf of $(Y_1, Y_3) = (X_1 - X_2, X_1 - X_2 + 2)$ when the joint pdf of $\boldsymbol{X} = (X_1, X_2)$ is $f_{\boldsymbol{X}}(x_1, x_2) = u(x_1)u(1 - x_1)u(x_2)u(1 - x_2)$.

Solution Because $g_1(\boldsymbol{X}) = Y_1 = X_1 - X_2$ is not in the form of $dX_2 + b$, we choose the auxiliary variable $Z = X_2$. Let us then obtain the joint pdf of $(Y_1, Z) = (X_1 - X_2, X_2)$. Noting that $X_1 = Y_1 + Z$, $X_2 = Z$, and the Jacobian $J\!\left(\frac{\partial(x_1 - x_2,\, x_2)}{\partial(x_1,\, x_2)}\right) = \begin{vmatrix} 1 & -1 \\ 0 & 1 \end{vmatrix} = 1$, the joint pdf of $(Y_1, Z)$ is

\[ f_{Y_1,Z}(y, z) = f_{\boldsymbol{X}}(x_1, x_2)\big|_{x_1 = y + z,\ x_2 = z} = u(y + z)u(1 - y - z)u(z)u(1 - z). \tag{4.5.21} \]

Fig. 4.17 The support $V_Y$ of $f_{X_1 - X_2, X_2}(y, z)$ when the support of $f_{\boldsymbol{X}}(x_1, x_2)$ is $V_X$. The intervals of integration in the cases $-1 < y < 0$ and $0 < y < 1$ are also represented as lines with two arrows

We can then obtain the pdf $f_{Y_1}(y) = \int_{-\infty}^{\infty} u(y + z)u(1 - y - z)u(z)u(1 - z)\,dz$ of $Y_1 = g_1(\boldsymbol{X}) = X_1 - X_2$ as

\[ f_{Y_1}(y) = \begin{cases} 0, & |y| > 1, \\ \int_{-y}^{1} dz, & -1 < y < 0, \\ \int_{0}^{1 - y} dz, & 0 < y < 1 \end{cases} = \begin{cases} 0, & |y| > 1, \\ 1 + y, & -1 < y < 0, \\ 1 - y, & 0 < y < 1 \end{cases} = (1 - |y|)\,u(1 - |y|) \tag{4.5.22} \]

by integrating $f_{Y_1,Z}(y, z)$, for which Fig. 4.17 is useful in identifying the integration intervals. Next, using (4.5.20), we get the joint pdf

\[ f_{Y_1,Y_2}(x, y) = \frac{1}{2}(1 - |x|)\,u(1 - |x|)\,\delta\!\left(x - \frac{y}{2}\right) \tag{4.5.23} \]

of $(Y_1, Y_2) = (X_1 - X_2, 2X_1 - 2X_2)$ and the joint pdf

\[ f_{Y_1,Y_3}(x, y) = (1 - |x|)\,u(1 - |x|)\,\delta(x - y + 2) \tag{4.5.24} \]

of $(Y_1, Y_3) = (X_1 - X_2, X_1 - X_2 + 2)$. Note that $\int_{-\infty}^{\infty} \delta\!\left(x - \frac{y}{2}\right) dy = \int_{\infty}^{-\infty} \delta(t)(-2\,dt) = 2\int_{-\infty}^{\infty} \delta(t)\,dt = 2$. ♦

We have briefly discussed how we can obtain the joint pdf and joint cdf in some special cases by employing the unit step and impulse functions. This approach is also quite fruitful in dealing with the order statistics and rank statistics.
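As a small check of Example 4.5.5, the marginal pdf (4.5.22) can be recovered by integrating the joint pdf (4.5.21) over the auxiliary variable numerically. The sketch below is only illustrative: the function names (`u`, `f_y1_z`, `f_y1_numeric`, `f_y1_closed`) and the grid size are assumptions introduced here.

```python
def u(t):
    """Unit step function; its value at 0 does not affect the integral."""
    return 1.0 if t > 0 else 0.0


def f_y1_z(y, z):
    """Joint pdf (4.5.21) of (Y1, Z) = (X1 - X2, X2)."""
    return u(y + z) * u(1 - y - z) * u(z) * u(1 - z)


def f_y1_numeric(y, steps=20000):
    """Midpoint-rule integral of (4.5.21) over z in (0, 1).

    The factor u(z)u(1 - z) makes the integrand vanish outside (0, 1).
    """
    h = 1.0 / steps
    return h * sum(f_y1_z(y, (i + 0.5) * h) for i in range(steps))


def f_y1_closed(y):
    """Closed form (4.5.22): (1 - |y|) u(1 - |y|)."""
    return max(0.0, 1.0 - abs(y))


if __name__ == "__main__":
    for y in (-1.5, -0.7, -0.25, 0.0, 0.3, 0.9, 1.2):
        print(y, f_y1_numeric(y), f_y1_closed(y))
```

Each printed pair should agree to within the grid resolution, which mirrors the two integration intervals $(-y, 1)$ and $(0, 1 - y)$ identified in the solution.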


Appendices

Appendix 4.1 Multinomial Random Variables

Let us discuss in more detail the multinomial random variables introduced in Example 4.1.4.

Definition 4.A.1 (multinomial distribution) Assume $n$ repetitions of an independent experiment of which the outcomes are a collection $\{A_i\}_{i=1}^{r}$ of disjoint events with probability $\{P(A_i) = p_i\}_{i=1}^{r}$, where $\sum_{i=1}^{r} p_i = 1$. Denote by $X_i$ the number of occurrences of event $A_i$. Then, the joint distribution of $\boldsymbol{X} = (X_1, X_2, \ldots, X_r)$ is called the multinomial distribution, and the joint pmf of $\boldsymbol{X}$ is

\[ p_{\boldsymbol{X}}(k_1, k_2, \ldots, k_r) = \frac{n!}{k_1! k_2! \cdots k_r!}\, p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r} \tag{4.A.1} \]

for $\{k_i \in \{0, 1, \ldots, n\}\}_{i=1}^{r}$ and $\sum_{i=1}^{r} k_i = n$.

The right-hand side of (4.A.1) is the coefficient of $\prod_{j=1}^{r} t_j^{k_j}$ in the multinomial expansion of $(p_1 t_1 + p_2 t_2 + \cdots + p_r t_r)^n$.

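Example 4.A.1 In a repetition of rolling of a fair die ten times, let $\{X_i\}_{i=1}^{3}$ be the numbers of $A_1 = \{1\}$, $A_2 = \{\text{an even number}\}$, and $A_3 = \{3, 5\}$, respectively. Then, the joint pmf of $\boldsymbol{X} = (X_1, X_2, X_3)$ is

\[ p_{\boldsymbol{X}}(k_1, k_2, k_3) = \frac{10!}{k_1! k_2! k_3!} \left(\frac{1}{6}\right)^{k_1} \left(\frac{1}{2}\right)^{k_2} \left(\frac{1}{3}\right)^{k_3} \tag{4.A.2} \]

for $\{k_i \in \{0, 1, \ldots, 10\}\}_{i=1}^{3}$ such that $\sum_{i=1}^{3} k_i = 10$. Based on this pmf, the probability of the event {three times of $A_1$, six times of $A_2$} $= \{X_1 = 3, X_2 = 6, X_3 = 1\}$ can be obtained as $p_{\boldsymbol{X}}(3, 6, 1) = \frac{10!}{3!\,6!\,1!} \left(\frac{1}{6}\right)^{3} \left(\frac{1}{2}\right)^{6} \left(\frac{1}{3}\right)^{1} = \frac{35}{1728} \approx 2.025 \times 10^{-2}$. ♦

Example 4.A.2 As in the binomial distribution, let us consider the approximation of the multinomial distribution in terms of the Poisson distribution. For $p_i \to 0$ and $n p_i \to \lambda_i$ when $n \to \infty$, we have

\[ \frac{n!}{k_r!} = \frac{n!}{\left(n - \sum_{i=1}^{r-1} k_i\right)!} = n(n - 1) \cdots \left(n - \sum_{i=1}^{r-1} k_i + 1\right) \approx n^{\sum_{i=1}^{r-1} k_i}, \]

$p_r = 1 - \sum_{i=1}^{r-1} p_i \approx \exp\left(-\sum_{i=1}^{r-1} p_i\right)$, and

\[ p_r^{k_r} = p_r^{\,n - \sum_{i=1}^{r-1} k_i} \approx \exp\left\{ -\left(n - \sum_{i=1}^{r-1} k_i\right) \sum_{i=1}^{r-1} p_i \right\} \approx \exp\left(-n \sum_{i=1}^{r-1} p_i\right) = \exp(-\lambda), \]

where $\lambda = \sum_{i=1}^{r-1} \lambda_i$. Based on these results, we can show that $p_{\boldsymbol{X}}(k_1, k_2, \ldots, k_r) = \frac{n!}{k_1! k_2! \cdots k_r!}\, p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r} \to \frac{n^{k_1} n^{k_2} \cdots n^{k_{r-1}}}{k_1! k_2! \cdots k_{r-1}!}\, p_1^{k_1} p_2^{k_2} \cdots p_{r-1}^{k_{r-1}} \exp(-\lambda)$, i.e.,

\[ p_{\boldsymbol{X}}(k_1, k_2, \ldots, k_r) \to \prod_{i=1}^{r-1} e^{-\lambda_i} \frac{\lambda_i^{k_i}}{k_i!}. \tag{4.A.3} \]

The result (4.A.3) with $r = 2$ is clearly the same as (3.5.19) obtained in Theorem 3.5.2 for the binomial distribution. ♦

Example 4.A.3 For $k_i = n p_i + O\left(\sqrt{n}\right)$ and $n \to \infty$, the multinomial pmf can be approximated as

\[ p_{\boldsymbol{X}}(k_1, k_2, \ldots, k_r) = \frac{n!}{k_1! k_2! \cdots k_r!}\, p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r} \approx \frac{1}{\sqrt{(2\pi n)^{r-1} p_1 p_2 \cdots p_r}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{r} \frac{(k_i - n p_i)^2}{n p_i} \right\}. \tag{4.A.4} \]

Consider the case of $r = 2$ in (4.A.4). Letting $k_1 = k$, $k_2 = n - k$, $p_1 = p$, and $p_2 = 1 - p_1 = q$, we get $p_{\boldsymbol{X}}(k_1, k_2) = p_{\boldsymbol{X}}(k, n - k)$ as

\[ \begin{aligned} p_{\boldsymbol{X}}(k_1, k_2) &\approx \frac{1}{\sqrt{2\pi npq}} \exp\left[ -\frac{1}{2} \left\{ \frac{(k - np)^2}{np} + \frac{(n - k - nq)^2}{nq} \right\} \right] \\ &= \frac{1}{\sqrt{2\pi npq}} \exp\left\{ -\frac{1}{2}\, \frac{q(k - np)^2 + p(np - k)^2}{npq} \right\} \\ &= \frac{1}{\sqrt{2\pi npq}} \exp\left\{ -\frac{(k - np)^2}{2npq} \right\}, \end{aligned} \tag{4.A.5} \]

which is the same as (3.5.16) of Theorem 3.5.1 for the binomial distribution. ♦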
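The pmf (4.A.1) and the approximation (4.A.4) are easy to evaluate directly. The following Python sketch computes the multinomial pmf exactly with rational arithmetic, reproduces the value $\frac{35}{1728}$ of Example 4.A.1, and compares the exact pmf with (4.A.4) at one point. The function names and the test point $n = 600$, $(k_1, k_2, k_3) = (100, 300, 200)$ are illustrative assumptions.

```python
import math
from fractions import Fraction


def multinomial_pmf(ks, ps):
    """Exact multinomial pmf (4.A.1) for counts ks and probabilities ps."""
    n = sum(ks)
    coeff = math.factorial(n)
    for k in ks:
        coeff //= math.factorial(k)  # each partial quotient is an integer
    prob = Fraction(coeff)
    for k, p in zip(ks, ps):
        prob *= Fraction(p) ** k
    return prob


def normal_approx(ks, ps):
    """Approximation (4.A.4) of the multinomial pmf for large n."""
    n = sum(ks)
    r = len(ps)
    quad = sum((k - n * float(p)) ** 2 / (n * float(p)) for k, p in zip(ks, ps))
    scale = (2.0 * math.pi * n) ** (r - 1) * math.prod(float(p) for p in ps)
    return math.exp(-0.5 * quad) / math.sqrt(scale)


if __name__ == "__main__":
    ps = (Fraction(1, 6), Fraction(1, 2), Fraction(1, 3))
    print(multinomial_pmf((3, 6, 1), ps))           # value in Example 4.A.1
    exact = float(multinomial_pmf((100, 300, 200), ps))
    print(exact, normal_approx((100, 300, 200), ps))
```

Using `Fraction` keeps the small-case result exact, while the comparison at $n = 600$ gives a feel for how close (4.A.4) is near the mean.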



The multinomial distribution (Johnson and Kotz 1972) is a generalization of the binomial distribution, and the special case $r = 2$ of the multinomial distribution is the binomial distribution. For the multinomial random vector $\boldsymbol{X} = (X_1, X_2, \ldots, X_r)$, the marginal distribution of $X_i$ is a binomial distribution. In addition, assuming $n - \sum_{i=1}^{s} X_{a_i}$ as the $(s + 1)$-st random variable, the distribution of the subvector $\boldsymbol{X}_s = (X_{a_1}, X_{a_2}, \ldots, X_{a_s})$ of $\boldsymbol{X}$ is also a multinomial distribution with the joint pmf

\[ p_{\boldsymbol{X}_s}(k_{a_1}, k_{a_2}, \ldots, k_{a_s}) = n!\, \frac{\left(1 - \sum_{i=1}^{s} p_{a_i}\right)^{n - \sum_{i=1}^{s} k_{a_i}}}{\left(n - \sum_{i=1}^{s} k_{a_i}\right)!} \prod_{i=1}^{s} \frac{p_{a_i}^{k_{a_i}}}{k_{a_i}!}. \tag{4.A.6} \]

Letting $s = 1$ in (4.A.6), we get the binomial pmf $p_{X_a}(k_a) = \frac{n!}{(n - k_a)!\, k_a!}\, p_a^{k_a} (1 - p_a)^{n - k_a}$. In addition, when a subvector of $\boldsymbol{X}$ is given, the conditional joint distribution of the random vector of the remaining random variables is also a multinomial distribution, which depends not on the individual remaining random variables but on the sum of the remaining random variables. For example, assume $\boldsymbol{X} = (X_1, X_2, X_3, X_4)$. Then, the joint distribution of $(X_2, X_4)$ when $(X_1, X_3)$ is given is a multinomial distribution, which depends not on $X_1$ and $X_3$ individually but on the sum $X_1 + X_3$. Finally, when $\boldsymbol{X} = (X_1, X_2, \ldots, X_r)$ has the pmf (4.A.1), it is known that

\[ E\left\{ X_i \mid X_{b_1}, X_{b_2}, \ldots, X_{b_{r-1}} \right\} = \frac{p_i}{1 - \sum_{j=1}^{r-1} p_{b_j}} \left( n - \sum_{j=1}^{r-1} X_{b_j} \right), \tag{4.A.7} \]

where $i$ is not equal to any of $\{b_j\}_{j=1}^{r-1}$. It is also known that we have the conditional expected value

\[ E\left\{ X_i \mid X_j \right\} = \frac{(n - X_j)\, p_i}{1 - p_j}, \tag{4.A.8} \]

the correlation coefficient

\[ \rho_{X_i, X_j} = -\sqrt{\frac{p_i p_j}{(1 - p_i)(1 - p_j)}}, \tag{4.A.9} \]

and $\mathrm{Cov}(X_i, X_j) = -n p_i p_j$ for $X_i$ and $X_j$ with $i \ne j$.

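The covariance and the conditional expected value (4.A.8) stated at the end of Appendix 4.1 can be verified by brute-force enumeration for a small case. The Python sketch below does this exactly with rational numbers; the values $n = 6$ and $(p_1, p_2, p_3) = \left(\frac{1}{6}, \frac{1}{2}, \frac{1}{3}\right)$, as well as the function names, are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(ks, ps):
    """Exact multinomial pmf (4.A.1)."""
    coeff = factorial(sum(ks))
    for k in ks:
        coeff //= factorial(k)
    prob = Fraction(coeff)
    for k, p in zip(ks, ps):
        prob *= Fraction(p) ** k
    return prob


def support(n, r):
    """All count vectors (k_1, ..., k_r) with k_1 + ... + k_r = n."""
    return [ks for ks in product(range(n + 1), repeat=r) if sum(ks) == n]


def covariance(n, ps, i, j):
    """Cov(X_i, X_j) computed by summing over the whole support."""
    pts = [(ks, multinomial_pmf(ks, ps)) for ks in support(n, len(ps))]
    e_i = sum(ks[i] * w for ks, w in pts)
    e_j = sum(ks[j] * w for ks, w in pts)
    e_ij = sum(ks[i] * ks[j] * w for ks, w in pts)
    return e_ij - e_i * e_j


def conditional_mean(n, ps, i, j, x_j):
    """E{X_i | X_j = x_j} computed from the joint pmf."""
    pts = [(ks, multinomial_pmf(ks, ps)) for ks in support(n, len(ps)) if ks[j] == x_j]
    return sum(ks[i] * w for ks, w in pts) / sum(w for _, w in pts)


if __name__ == "__main__":
    n, ps = 6, (Fraction(1, 6), Fraction(1, 2), Fraction(1, 3))
    print(covariance(n, ps, 0, 1), -n * ps[0] * ps[1])
    print(conditional_mean(n, ps, 0, 1, 2), (n - 2) * ps[0] / (1 - ps[1]))
```

Each printed pair consists of the enumerated value and the closed-form value, so the two entries on a line should coincide.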
Appendix 4.2 Mean Time to Pattern

Denote by $X_k$ the outcome of the $k$-th trial of an experiment with the pmf

\[ p_{X_k}(j) = p_j \tag{4.A.10} \]

for $j = 1, 2, \ldots$, where $\sum_{j=1}^{\infty} p_j = 1$. The number of trials of the experiment until a pattern $M = (i_1, i_2, \ldots, i_n)$ is observed for the first time is called the time to pattern $M$, which is denoted by $T = T(M) = T(i_1, i_2, \ldots, i_n)$. For example, when the sequence of the outcomes is $(6, 4, 9, 5, 5, 9, 5, 7, 3, 2, \ldots)$, the time to pattern $(9, 5, 7)$ is $T(9, 5, 7) = 8$. Now, let us obtain (Nielsen 1973) the mean time $E\{T(M)\}$ for the pattern $M$. First, when $M$ satisfies


\[ (i_1, i_2, \ldots, i_k) = (i_{n-k+1}, i_{n-k+2}, \ldots, i_n) \tag{4.A.11} \]

for $n = 2, 3, \ldots$ and $k = 1, 2, \ldots, n - 1$, the pattern $M$ overlaps and $L_k = (i_1, i_2, \ldots, i_k)$ is an overlapping piece or a bifix of $M$. For instance, $(A, B, C)$, $(D, E, F, G)$, $(S, S, P)$, $(4, 4, 5)$, and $(4, 1, 3, 3, 2)$ are non-overlapping patterns; and $(A, B, G, A, B)$, $(9, 9, 2, 4, 9, 9)$, $(3, 4, 3)$, $(5, 4, 5, 4, 5)$, and $(5, 4, 5, 4, 5, 4)$ are overlapping patterns. Note that the length $k$ of an overlapping piece can be longer than $\frac{n}{2}$ and that more than one overlapping piece may exist in a pattern as in $(5, 4, 5, 4, 5)$ and $(5, 4, 5, 4, 5, 4)$. In addition, when the overlapping piece is of length $k$, the elements in the pattern repeat with period $n - k$, that is, two elements separated by $n - k - 1$ other elements are the same: for instance, $M = (i_1, i_1, \ldots, i_1)$ when $k = n - 1$. A non-overlapping pattern can be regarded as an overlapping pattern with $k = n$.

(A) A Recursive Method

First, the mean time $E\{T(i_1)\} = \sum_{k=1}^{\infty} k P(T(i_1) = k) = \sum_{k=1}^{\infty} k \left(1 - p_{i_1}\right)^{k-1} p_{i_1}$ to pattern $i_1$ of length 1 is

\[ E\{T(i_1)\} = \frac{1}{p_{i_1}}. \tag{4.A.12} \]

When $M$ has $J$ overlapping pieces, let the lengths of the overlapping pieces be $K_0 < K_1 < \cdots < K_J < K_{J+1}$ with $K_0 = 0$ and $K_{J+1} = n$, and express $M$ as $M = \left(i_1, i_2, \ldots, i_{K_1}, i_{K_1+1}, \ldots, i_{K_2}, i_{K_2+1}, \ldots, i_{K_J}, i_{K_J+1}, \ldots, i_{n-1}, i_n\right)$. If we write the overlapping pieces $\left\{L_{K_j} = \left(i_1, i_2, \ldots, i_{K_j}\right)\right\}_{j=1}^{J}$ as

\[ \begin{array}{rrrrrrr} & & & & & i_1\ i_2\ \cdots & i_{K_1} \\ & & & i_1\ \cdots & i_{K_{2,1}+1} & i_{K_{2,1}+2}\ \cdots & i_{K_2} \\ & & & \vdots & \vdots & \vdots & \vdots \\ i_1\ i_2\ \cdots & i_{K_{J,2}+1} & i_{K_{J,2}+2} & \cdots & i_{K_{J,1}+1} & i_{K_{J,1}+2}\ \cdots & i_{K_J}, \end{array} \tag{4.A.13} \]

where $K_{\alpha,\beta} = K_\alpha - K_\beta$, then we have $i_m = i_{K_{b,a}+m}$ for $1 \le a \le b \le J$ and $m = 1, 2, \ldots, K_a - K_{a-1}$ because the values at the same column in (4.A.13) are all the same.

Denote by $T(A_1)$ the time to wait until the occurrence of $M_{+1} = (i_1, i_2, \ldots, i_n, i_{n+1})$ after the occurrence of $M$. Then, we have

\[ E\{T(M_{+1})\} = E\{T(M)\} + E\{T(A_1)\} \tag{4.A.14} \]

because $T(M_{+1}) = T(M) + T(A_1)$. Here, we can express $E\{T(A_1)\}$ as

\[ E\{T(A_1)\} = \sum_{x=1}^{\infty} E\{T(A_1) \mid X_{n+1} = x\}\, P(X_{n+1} = x). \tag{4.A.15} \]

Let us focus on the term $E\{T(A_1) \mid X_{n+1} = x\}$ in (4.A.15). First, when $x = i_{K_j+1}$ for example, denote by $\tilde{L}_{K_j+1} = \left(i_1, i_2, \ldots, i_{K_j}, i_{K_j+1}\right)$ the $j$-th overlapping piece with its immediate next element, and recollect (4.A.11). Then, we have

\[ M_{+1} = \left(i_1, i_2, \ldots, i_{n-K_j}, i_{n-K_j+1}, i_{n-K_j+2}, \ldots, i_n, i_{K_j+1}\right) = \left(i_1, i_2, \ldots, i_{n-K_j}, i_1, i_2, \ldots, i_{K_j}, i_{K_j+1}\right), \tag{4.A.16} \]

from which we can get $E\left\{T(A_1) \mid X_{n+1} = i_{K_j+1}\right\} = 1 + E\left\{T(M_{+1}) \mid \tilde{L}_{K_j+1}\right\}$. We can similarly get

\[ E\{T(A_1) \mid X_{n+1} = x\} = \begin{cases} 1 + E\left\{T(M_{+1}) \mid \tilde{L}_{K_0+1}\right\}, & x = i_{K_0+1}, \\ 1 + E\left\{T(M_{+1}) \mid \tilde{L}_{K_1+1}\right\}, & x = i_{K_1+1}, \\ \quad\vdots & \quad\vdots \\ 1 + E\left\{T(M_{+1}) \mid \tilde{L}_{K_J+1}\right\}, & x = i_{K_J+1}, \\ 1, & x = i_{n+1}, \\ 1 + E\{T(M_{+1})\}, & \text{otherwise} \end{cases} \tag{4.A.17} \]

when $i_1 = i_{K_0+1}$, $i_{K_1+1}$, $\ldots$, $i_{K_J+1}$, $i_{K_{J+1}+1} = i_{n+1}$ are all distinct. Here, recollecting

\[ E\left\{T(M_{+1}) \mid \tilde{L}_{K_j+1}\right\} = E\{T(M_{+1})\} - E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} \tag{4.A.18} \]

from $E\{T(M_{+1})\} = E\left\{T(M_{+1}) \mid \tilde{L}_{K_j+1}\right\} + E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\}$, we get

\[ \begin{aligned} E\{T(M_{+1})\} &= E\{T(M)\} + \sum_{j=0}^{J} p_{i_{K_j+1}} \left[ 1 + E\{T(M_{+1})\} - E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} \right] \\ &\quad + p_{i_{n+1}} \times 1 + \left( 1 - p_{i_{n+1}} - \sum_{j=0}^{J} p_{i_{K_j+1}} \right) \times \left[ 1 + E\{T(M_{+1})\} \right] \end{aligned} \tag{4.A.19} \]

from (4.A.14), (4.A.15), and (4.A.17). We can rewrite (4.A.19) as

\[ p_{i_{n+1}} E\{T(M_{+1})\} = E\{T(M)\} + 1 - \sum_{j=0}^{J} p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} \tag{4.A.20} \]

after some steps.


Let us next consider the case in which some are the same among $i_{K_0+1}$, $i_{K_1+1}$, $\ldots$, $i_{K_J+1}$, and $i_{n+1}$. For example, assume $a < b$ and $i_{K_a+1} = i_{K_b+1}$. Then, for $x = i_{K_a+1} = i_{K_b+1}$ in (4.A.15) and (4.A.17), the line '$1 + E\left\{T(M_{+1}) \mid \tilde{L}_{K_a+1}\right\}$, $x = i_{K_a+1}$' corresponding to the $K_a$-th piece among the lines of (4.A.17) will disappear because the longest overlapping piece in the last part of $M_{+1}$ is not $\tilde{L}_{K_a+1}$ but $\tilde{L}_{K_b+1}$. Based on this fact, if we follow steps similar to those leading to (4.A.19) and (4.A.20), we get

\[ p_{i_{n+1}} E\{T(M_{+1})\} = E\{T(M)\} + 1 - {\sum_{j}}' p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\}, \tag{4.A.21} \]

where ${\sum_{j}}'$ denotes the sum from $j = 0$ to $J$ letting all $E\left\{T\left(\tilde{L}_{K_a+1}\right)\right\}$ to 0 when $i_{K_a+1} = i_{K_b+1}$ for $0 \le a < b \le J + 1$. Note here that $\{K_j\}_{j=1}^{J}$ are the lengths of the overlapping pieces of $M$, not of $M_{+1}$. Note also that (4.A.20) is a special case of (4.A.21): in other words, (4.A.21) is always applicable. In essence, starting from $E\{T(i_1)\} = \frac{1}{p_{i_1}}$ shown in (4.A.12), we can successively obtain $E\{T(i_1, i_2)\}$, $E\{T(i_1, i_2, i_3)\}$, $\ldots$, $E\{T(M)\}$ based on (4.A.21).

Example 4.A.4 For i.i.d. random variables $\{X_k\}_{k=1}^{\infty}$ with the marginal pmf $p_{X_k}(j) = p_j$, obtain the mean time to $M = (5, 4, 5, 3)$.

Solution First, $E\{T(5)\} = \frac{1}{p_5}$. When $(5, 4)$ is $M_{+1}$, because $J = 0$, $i_{K_0+1} = 5$, and $i_{n+1} = 4$, we get

\[ {\sum_{j}}' p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} = p_{i_1} E\left\{T\left(\tilde{L}_1\right)\right\}, \tag{4.A.22} \]

i.e., $E\{T(5, 4)\} = \frac{1}{p_4}\left[ E\{T(5)\} + 1 - p_5 E\{T(5)\} \right] = \frac{1}{p_4 p_5}$. Next, when $(5, 4, 5)$ is $M_{+1}$, because $J = 0$ and $i_{K_0+1} = 5 = i_{n+1}$, we get ${\sum_{j}}' p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} = 0$. Thus, $E\{T(5, 4, 5)\} = \frac{1}{p_5}\left[ E\{T(5, 4)\} + 1 \right] = \frac{1}{p_4 p_5^2} + \frac{1}{p_5}$. Finally, when $(5, 4, 5, 3)$ is $M_{+1}$, because $J = 1$ and $K_1 = 1$, we have $i_{K_0+1} = i_1 = 5$, $i_{K_1+1} = i_2 = 4$, $i_{K_{J+1}+1} = i_4 = 3$, and

\[ {\sum_{j}}' p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} = p_5 E\{T(5)\} + p_4 E\{T(5, 4)\}, \tag{4.A.23} \]

i.e., $E\{T(5, 4, 5, 3)\} = \frac{1}{p_3}\left[ E\{T(5, 4, 5)\} + 1 - p_5 E\{T(5)\} - p_4 E\{T(5, 4)\} \right] = \frac{1}{p_3 p_4 p_5^2}$. ♦

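The recursion (4.A.21) used in Example 4.A.4 can be written as a short program that builds $E\{T(i_1)\}$, $E\{T(i_1, i_2)\}$, and so on, one element at a time. The following Python sketch is one possible reading of that recursion; the function name `mean_time_recursive` and the representation of the pmf as a dictionary are assumptions made for illustration.

```python
from fractions import Fraction


def mean_time_recursive(pattern, pmf):
    """Mean time to `pattern` via the recursion (4.A.21).

    pattern: tuple or string of symbols; pmf: dict mapping symbol -> probability.
    """
    n = len(pattern)
    # mean[m] holds E{T(i_1, ..., i_m)}; mean[1] = 1 / p_{i_1} by (4.A.12).
    mean = [Fraction(0), 1 / Fraction(pmf[pattern[0]])]
    for m in range(1, n):
        prefix, nxt = pattern[:m], pattern[m]
        # K_0 = 0 together with the bifix lengths K_1 < ... < K_J of the prefix.
        ks = [0] + [k for k in range(1, m) if prefix[:k] == prefix[m - k:]]
        restricted_sum = Fraction(0)
        for idx, k in enumerate(ks):
            sym = pattern[k]  # element that follows the piece of length k
            later = [pattern[kk] for kk in ks[idx + 1:]] + [nxt]
            if sym in later:
                continue  # term set to 0: a longer piece has the same next element
            restricted_sum += Fraction(pmf[sym]) * mean[k + 1]
        mean.append((mean[m] + 1 - restricted_sum) / Fraction(pmf[nxt]))
    return mean[n]


if __name__ == "__main__":
    die = {face: Fraction(1, 6) for face in range(1, 7)}
    # For a fair die, 1 / (p3 p4 p5^2) = 6^4.
    print(mean_time_recursive((5, 4, 5, 3), die))
```

For a fair die the printed value is $6^4$, in agreement with $\frac{1}{p_3 p_4 p_5^2}$ obtained in Example 4.A.4. Every intermediate mean time is stored in `mean`, which reflects the step-by-step nature of this method.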

(B) An Efficient Method

The result (4.A.21) is applicable always. However, as we have observed in Example 4.A.4, (4.A.21) possesses some inefficiency in the sense that we have to first obtain the expected values $E\{T(i_1)\}$, $E\{T(i_1, i_2)\}$, $\ldots$, $E\{T(i_1, i_2, \ldots, i_{n-1})\}$ before we can obtain the expected value $E\{T(i_1, i_2, \ldots, i_n)\}$. Let us now consider a more efficient method.

(B-1) Non-overlapping Patterns

When the pattern $M$ is non-overlapping, we have

\[ (i_1, i_2, \ldots, i_k) \ne (i_{n-k+1}, i_{n-k+2}, \ldots, i_n) \tag{4.A.24} \]

for every $k \in \{1, 2, \ldots, n - 1\}$. Based on this observation, let us show that

\[ \{T = j + n\} = \left\{ T > j,\ \left(X_{j+1}, X_{j+2}, \ldots, X_{j+n}\right) = M \right\}. \tag{4.A.25} \]

First, when $T = j + n$, the first occurrence of $M$ is $\left(X_{j+1}, X_{j+2}, \ldots, X_{j+n}\right)$, which implies that $T > j$ and

\[ \left(X_{j+1}, X_{j+2}, \ldots, X_{j+n}\right) = M. \tag{4.A.26} \]

Next, let us show that $T = j + n$ when $T > j$ and (4.A.26) holds true. If $k \in \{1, 2, \ldots, n - 1\}$ and $T = j + k$, then we have $X_{j+k} = i_n$, $X_{j+k-1} = i_{n-1}$, $\ldots$, $X_{j+1} = i_{n-k+1}$. This is a contradiction to $\left(X_{j+1}, X_{j+2}, \ldots, X_{j+k}\right) = (i_1, i_2, \ldots, i_k) \ne (i_{n-k+1}, i_{n-k+2}, \ldots, i_n)$ implied by (4.A.24) and (4.A.26). In short, for any value $k$ in $\{1, 2, \ldots, n - 1\}$, we have $T \ne j + k$ and thus $T \ge j + n$. Meanwhile, (4.A.26) implies $T \le j + n$. Thus, we get $T = j + n$.

From (4.A.25), we have

\[ P(T = j + n) = P\left( T > j,\ \left(X_{j+1}, X_{j+2}, \ldots, X_{j+n}\right) = M \right). \tag{4.A.27} \]

Here, the event $T > j$ is dependent only on $X_1, X_2, \ldots, X_j$ but not on $X_{j+1}, X_{j+2}, \ldots, X_{j+n}$, and thus

\[ P(T = j + n) = P(T > j)\, P\left( \left(X_{j+1}, X_{j+2}, \ldots, X_{j+n}\right) = M \right) = P(T > j)\, \hat{p}, \tag{4.A.28} \]

where $\hat{p} = p_{i_1} p_{i_2} \cdots p_{i_n}$. Now, recollecting that $\sum_{j=0}^{\infty} P(T = j + n) = 1$ and that $\sum_{j=0}^{\infty} P(T > j) = P(T > 0) + P(T > 1) + \cdots = \{P(T = 1) + P(T = 2) + \cdots\} + \{P(T = 2) + P(T = 3) + \cdots\} + \cdots = \sum_{j=0}^{\infty} j P(T = j)$, i.e.,

\[ \sum_{j=0}^{\infty} P(T > j) = E\{T\}, \tag{4.A.29} \]

we get $\hat{p} \sum_{j=0}^{\infty} P(T > j) = \hat{p} E\{T\} = 1$ from (4.A.28). Thus, we have $E\{T(M)\} = \frac{1}{\hat{p}}$, i.e.,

\[ E\{T(M)\} = \frac{1}{p_{i_1} p_{i_2} \cdots p_{i_n}}. \tag{4.A.30} \]

Example 4.A.5 For the pattern $M = (9, 5, 7)$, we have $E\{T(9, 5, 7)\} = \frac{1}{p_5 p_7 p_9}$. Thus, to observe the pattern $(9, 5, 7)$, we have to wait on the average until the $\frac{1}{p_5 p_7 p_9}$-th repetition. In tossing a fair die, we need to repeat $E\{T(3, 5)\} = \frac{1}{p_3 p_5} = 36$ times on the average to observe the pattern $(3, 5)$ for the first time. ♦
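The value $E\{T(3, 5)\} = 36$ for a fair die can also be estimated by simulation. The Python sketch below counts the number of trials until a pattern first appears and averages over many runs; the function names, the seed, and the number of runs are illustrative assumptions.

```python
import random


def time_to_pattern(pattern, draw):
    """Number of trials until `pattern` first appears; `draw()` gives one outcome."""
    target = tuple(pattern)
    window = []
    t = 0
    while True:
        t += 1
        window.append(draw())
        if len(window) > len(target):
            window.pop(0)  # keep only the most recent len(pattern) outcomes
        if tuple(window) == target:
            return t


def mean_time(pattern, faces=6, trials=20000, seed=1):
    """Monte Carlo estimate of E{T(pattern)} for a fair die with `faces` faces."""
    rng = random.Random(seed)
    total = sum(time_to_pattern(pattern, lambda: rng.randint(1, faces)) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # The outcome sequence from the start of Appendix 4.2 gives T(9, 5, 7) = 8.
    outcomes = iter((6, 4, 9, 5, 5, 9, 5, 7, 3, 2))
    print(time_to_pattern((9, 5, 7), lambda: next(outcomes)))
    # The estimate should be close to 1 / (p3 p5) = 36 from (4.A.30).
    print(mean_time((3, 5)))
```

The first call replays the fixed outcome sequence used to define the time to pattern, and the second gives a sample average that should fall near the value in (4.A.30) up to Monte Carlo error.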

(B-2) Overlapping Patterns

We next consider overlapping patterns. When $M$ is an overlapping pattern, construct a non-overlapping pattern

\[ M_x = (i_1, i_2, \ldots, i_n, x) \tag{4.A.31} \]

of length $n + 1$ by appropriately choosing $x$ as $x \notin \{i_1, i_2, \ldots, i_n\}$ or $x \notin \left\{i_{K_0+1}, i_{K_1+1}, \ldots, i_{K_J+1}\right\}$ (see footnote 19). Then, from (4.A.30), we have

\[ E\{T(M_x)\} = \frac{1}{p_x \hat{p}}. \tag{4.A.32} \]

When $x = i_{n+1}$, using (4.A.32) in (4.A.21), we get

\[ E\{T(M)\} = \frac{1}{\hat{p}} - 1 + {\sum_{j}}' p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} \tag{4.A.33} \]

by noting that $M_x$ in (4.A.31) and $M_{+1}$ in (4.A.21) are the same. Now, if we consider the case in which $M$ is not an overlapping pattern, the last term of (4.A.33) becomes ${\sum_{j}}' p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\} = p_{i_{K_0+1}} E\left\{T\left(\tilde{L}_{K_0+1}\right)\right\} = p_{i_1} E\{T(i_1)\} = 1$. Consequently, (4.A.33) and (4.A.30) are the same. Thus, for any overlapping or non-overlapping pattern $M$, we can use (4.A.33) to obtain $E\{T(M)\}$.

Footnote 19: More generally, we can interpret 'appropriate $x$' as 'any $x$ such that $(i_1, i_2, \ldots, i_n, x)$ is a non-overlapping pattern', and $x$ can be chosen even if it is not a realization of any $X_k$. For example, when $p_1 + p_2 + p_3 = 1$, we could choose $x = 7$.

Example 4.A.6 In the pattern $(9, 5, 1, 9, 5)$, we have $J = 1$, $K_1 = 2$, and $\tilde{L}_{K_1+1} = (9, 5, 1)$. Thus, from (4.A.30) and (4.A.33), we get $E\{T(9, 5, 1, 9, 5)\} = \frac{1}{p_1 p_5^2 p_9^2} - 1 + p_9 E\{T(9)\} + p_1 E\{T(9, 5, 1)\} = \frac{1}{p_1 p_5^2 p_9^2} + \frac{1}{p_5 p_9}$. Similarly, in the pattern $(9, 5, 9, 1, 9, 5, 9)$, we get $J = 2$, $K_1 = 1$, $K_2 = 3$, and $\tilde{L}_{K_1+1} = (9, 5)$ and $\tilde{L}_{K_2+1} = (9, 5, 9, 1)$. Therefore,

\[ \begin{aligned} E\{T(9, 5, 9, 1, 9, 5, 9)\} &= \frac{1}{p_1 p_5^2 p_9^4} - 1 + p_9 E\{T(9)\} + p_5 E\{T(9, 5)\} + p_1 E\{T(9, 5, 9, 1)\} \\ &= \frac{1}{p_1 p_5^2 p_9^4} + \frac{1}{p_5 p_9^2} + \frac{1}{p_9} \end{aligned} \tag{4.A.34} \]

from (4.A.30) and (4.A.33). ♦

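Comparing Examples 4.A.4 and 4.A.6, it is easy to see that we can obtain $E\{T(M)\}$ faster from (4.A.30) and (4.A.33) than from (4.A.21).

Theorem 4.A.1 For a pattern $M = (i_1, i_2, \ldots, i_n)$ with $J$ overlapping pieces, the mean time to $M$ can be obtained as

\[ E\{T(M)\} = \sum_{j=1}^{J+1} \frac{1}{p_{i_1} p_{i_2} \cdots p_{i_{K_j}}}, \tag{4.A.35} \]

where $K_1 < K_2 < \cdots < K_J$ are the lengths of the overlapping pieces with $K_{J+1} = n$.

Proof For convenience, let $\alpha_j = p_{i_{K_j+1}} E\left\{T\left(\tilde{L}_{K_j+1}\right)\right\}$ and $\beta_j = E\left\{T\left(L_{K_j}\right)\right\}$. Also let

\[ \epsilon_j = \begin{cases} 1, & \text{if } i_{K_j+1} \ne i_{K_m+1} \text{ for every value of } m \in \{j + 1, j + 2, \ldots, J\}, \\ 0, & \text{otherwise} \end{cases} \tag{4.A.36} \]

for $j = 0, 1, \ldots, J - 1$, and $\epsilon_J = 1$ by noting that the term with $j = J$ is always added in the sum in the right-hand side of (4.A.33). Then, we can rewrite (4.A.33) as

\[ E\{T(M)\} = \frac{1}{p_{i_1} p_{i_2} \cdots p_{i_n}} - 1 + \sum_{j=0}^{J} \alpha_j \epsilon_j. \tag{4.A.37} \]

Now, $\alpha_0 = p_{i_1} E\{T(i_1)\} = 1$ and $\alpha_j = \beta_j + 1 - \sum_{l=0}^{j-1} \alpha_l \epsilon_l$ for $j = 1, 2, \ldots, J$ from (4.A.21). Solving for $\{\alpha_j\}_{j=1}^{J}$, we get $\alpha_1 = \beta_1 + 1 - \epsilon_0$, $\alpha_2 = \beta_2 + 1 - (\epsilon_1 \alpha_1 + \epsilon_0 \alpha_0) = \beta_2 - \epsilon_1 \beta_1 + (1 - \epsilon_0)(1 - \epsilon_1)$, $\alpha_3 = \beta_3 + 1 - (\epsilon_2 \alpha_2 + \epsilon_1 \alpha_1 + \epsilon_0 \alpha_0) = \beta_3 - \epsilon_2 \beta_2 - \epsilon_1 (1 - \epsilon_2) \beta_1 + (1 - \epsilon_0)(1 - \epsilon_1)(1 - \epsilon_2)$, $\ldots$, and

\[ \begin{aligned} \alpha_J &= \beta_J - \epsilon_{J-1} \beta_{J-1} - \epsilon_{J-2}(1 - \epsilon_{J-1}) \beta_{J-2} - \cdots \\ &\quad - \epsilon_1 (1 - \epsilon_2)(1 - \epsilon_3) \cdots (1 - \epsilon_{J-1}) \beta_1 + (1 - \epsilon_0)(1 - \epsilon_1) \cdots (1 - \epsilon_{J-1}). \end{aligned} \tag{4.A.38} \]

Therefore,

\[ \begin{aligned} \sum_{j=0}^{J} \alpha_j \epsilon_j &= \beta_J + (\epsilon_{J-1} - \epsilon_{J-1}) \beta_{J-1} + \{\epsilon_{J-2} - \epsilon_{J-2}\epsilon_{J-1} - \epsilon_{J-2}(1 - \epsilon_{J-1})\} \beta_{J-2} + \cdots \\ &\quad + \{\epsilon_1 - \epsilon_1 \epsilon_2 - \epsilon_1 (1 - \epsilon_2) \epsilon_3 - \cdots - \epsilon_1 (1 - \epsilon_2)(1 - \epsilon_3) \cdots (1 - \epsilon_{J-1})\} \beta_1 \\ &\quad + \{\epsilon_0 + (1 - \epsilon_0) \epsilon_1 + (1 - \epsilon_0)(1 - \epsilon_1) \epsilon_2 + \cdots + (1 - \epsilon_0)(1 - \epsilon_1) \cdots (1 - \epsilon_{J-1})\}. \end{aligned} \tag{4.A.39} \]

In the right-hand side of (4.A.39), the second, third, $\ldots$, second last terms are all 0, and the last term is

\[ \begin{aligned} &\epsilon_0 + (1 - \epsilon_0)\epsilon_1 + (1 - \epsilon_0)(1 - \epsilon_1)\epsilon_2 + \cdots + (1 - \epsilon_0)(1 - \epsilon_1) \cdots (1 - \epsilon_{J-3})\epsilon_{J-2} \\ &\qquad + (1 - \epsilon_0)(1 - \epsilon_1) \cdots (1 - \epsilon_{J-2})\epsilon_{J-1} + (1 - \epsilon_0)(1 - \epsilon_1) \cdots (1 - \epsilon_{J-1}) \\ &= \epsilon_0 + (1 - \epsilon_0)\epsilon_1 + (1 - \epsilon_0)(1 - \epsilon_1)\epsilon_2 + \cdots + (1 - \epsilon_0)(1 - \epsilon_1) \cdots (1 - \epsilon_{J-3})\epsilon_{J-2} \\ &\qquad + (1 - \epsilon_0)(1 - \epsilon_1) \cdots (1 - \epsilon_{J-2}) \\ &\;\;\vdots \\ &= \epsilon_0 + (1 - \epsilon_0)\epsilon_1 + (1 - \epsilon_0)(1 - \epsilon_1) \\ &= 1. \end{aligned} \tag{4.A.40} \]

Thus, noting (4.A.40) and using (4.A.39) into (4.A.37), we get $E\{T(M)\} = \frac{1}{p_{i_1} p_{i_2} \cdots p_{i_n}} - 1 + \beta_J + 1$, i.e.,

\[ E\{T(M)\} = \frac{1}{p_{i_1} p_{i_2} \cdots p_{i_n}} + E\left\{T\left(L_{K_J}\right)\right\}. \tag{4.A.41} \]

Next, if we obtain $E\left\{T\left(L_{K_J}\right)\right\}$ after some steps similar to those for (4.A.41) by recollecting that the overlapping pieces of $L_{K_J}$ are $L_{K_1}, L_{K_2}, \ldots, L_{K_{J-1}}$, we have $E\left\{T\left(L_{K_J}\right)\right\} = \frac{1}{p_{i_1} p_{i_2} \cdots p_{i_{K_J}}} + E\left\{T\left(L_{K_{J-1}}\right)\right\}$. Repeating this procedure, and noting that $L_{K_1}$ is not an overlapping pattern, we get (4.A.35) by using (4.A.30). ♠

Example 4.A.7 Using (4.A.35), it is easy to get $E\{T(5, 4, 4, 5)\} = \frac{1 + p_4^2 p_5}{p_4^2 p_5^2}$, $E\{T(5, 4, 5, 4)\} = \frac{1 + p_4 p_5}{p_4^2 p_5^2}$, $E\{T(5, 4, 5, 4, 5)\} = \frac{1}{p_4^2 p_5^3} + \frac{1}{p_4 p_5^2} + \frac{1}{p_5}$, and $E\{T(5, 4, 4, 5, 4, 4, 5)\} = \frac{1}{p_4^4 p_5^3} + \frac{1}{p_4^2 p_5^2} + \frac{1}{p_5}$. ♦

Example 4.A.8 Assume a coin with $P(h) = p = 1 - P(t)$, where $h$ and $t$ denote head and tail, respectively. Then, the expected numbers of tosses until the first occurrences of $h$, $tht$, $htht$, $hthh$, and $hthhthh$ are $E\{T(h)\} = \frac{1}{p}$, $E\{T(tht)\} = \frac{1}{pq^2} + \frac{1}{q}$, $E\{T(htht)\} = \frac{1}{p^2 q^2} + \frac{1}{pq}$, $E\{T(hthh)\} = \frac{1}{p^3 q} + \frac{1}{p}$, and $E\{T(hthhthh)\} = \frac{1}{p^5 q^2} + \frac{1}{p^3 q} + \frac{1}{p}$, respectively, where $q = 1 - p$. ♦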
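Theorem 4.A.1 translates directly into a few lines of code: find every length $K_j$ for which the leading and trailing pieces of the pattern coincide (including $K_{J+1} = n$) and add the reciprocals of the corresponding probability products. The Python sketch below follows (4.A.35); the function names and the dictionary representation of the pmf are assumptions made for illustration.

```python
from fractions import Fraction


def bifix_lengths(pattern):
    """Lengths K_1 < ... < K_J of the overlapping pieces, followed by K_{J+1} = n."""
    n = len(pattern)
    return [k for k in range(1, n + 1) if pattern[:k] == pattern[n - k:]]


def mean_time_to_pattern(pattern, pmf):
    """Mean time to `pattern` from (4.A.35); pmf maps symbol -> probability."""
    total = Fraction(0)
    for k in bifix_lengths(pattern):
        prod = Fraction(1)
        for symbol in pattern[:k]:
            prod *= Fraction(pmf[symbol])
        total += 1 / prod
    return total


if __name__ == "__main__":
    coin = {"h": Fraction(1, 2), "t": Fraction(1, 2)}
    for pat in ("h", "tht", "htht", "hthh", "hthhthh"):
        print(pat, bifix_lengths(pat), mean_time_to_pattern(pat, coin))
```

For a fair coin this gives 2, 10, 20, 18, and 146 tosses for the five patterns of Example 4.A.8, in line with the expressions there evaluated at $p = q = \frac{1}{2}$. Unlike the recursive method, no mean times of shorter prefixes need to be computed first.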

Exercises

Exercise 4.1 Show that

\[ f(x) = \frac{\mu e^{-\mu x} (\mu x)^{n-1}}{(n - 1)!}\, u(x) \tag{4.E.1} \]

is the pdf of the sum of $n$ i.i.d. exponential random variables with rate $\mu$.

Exercise 4.2 A box contains three red and two green balls. We choose a ball from the box, discard it, and choose another ball from the box. Let $X = 1$ and $X = 2$ when the first ball is red and green, respectively, and $Y = 4$ and $Y = 3$ when the second ball is red and green, respectively. Obtain the pmf $p_X$ of $X$, pmf $p_Y$ of $Y$, joint pmf $p_{X,Y}$ of $X$ and $Y$, conditional pmf $p_{Y|X}$ of $Y$ given $X$, conditional pmf $p_{X|Y}$ of $X$ given $Y$, and pmf $p_{X+Y}$ of $X + Y$.

Exercise 4.3 For two i.i.d. random variables $X_1$ and $X_2$ with marginal distribution $P(1) = P(-1) = 0.5$, let $X_3 = X_1 X_2$. Are $X_1$, $X_2$, and $X_3$ pairwise independent? Are they independent?

Exercise 4.4 When the joint pdf of a random vector $(X, Y)$ is $f_{X,Y}(x, y) = a\left\{1 + xy\left(x^2 - y^2\right)\right\} u(1 - |x|)\,u(1 - |y|)$, determine the constant $a$. Are $X$ and $Y$ independent of each other? If not, obtain the correlation coefficient between $X$ and $Y$.

Exercise 4.5 A box contains three red, six green, and five blue balls. A ball is chosen randomly from the box and then replaced to the box after the color is recorded. After six trials, let the numbers of red and blue be $R$ and $B$, respectively. Obtain the conditional pmf $p_{R|B=3}$ of $R$ when $B = 3$ and conditional mean $E\{R \mid B = 1\}$ of $R$ when $B = 1$.


Exercise 4.6 Two binomial random variables $X_1 \sim b(n_1, p)$ and $X_2 \sim b(n_2, p)$ are independent of each other. Show that, when $X_1 + X_2 = x$ is given, the conditional distribution of $X_1$ is a hypergeometric distribution.

Exercise 4.7 Show that $Z = \frac{X}{X + Y} \sim U(0, 1)$ for two i.i.d. exponential random variables $X$ and $Y$.

Exercise 4.8 When the joint pdf of $\boldsymbol{X} = (X_1, X_2)$ is

\[ f_{\boldsymbol{X}}(x_1, x_2) = u\!\left(x_1 + \frac{1}{2}\right) u\!\left(\frac{1}{2} - x_1\right) u\!\left(x_2 + \frac{1}{2}\right) u\!\left(\frac{1}{2} - x_2\right), \tag{4.E.2} \]

obtain the joint pdf $f_{\boldsymbol{Y}}$ of $\boldsymbol{Y} = (Y_1, Y_2) = \left(X_1^2, X_1 + X_2\right)$. Based on the joint pdf $f_{\boldsymbol{Y}}$, obtain the pdf $f_{Y_1}$ of $Y_1 = X_1^2$ and pdf $f_{Y_2}$ of $Y_2 = X_1 + X_2$.

Exercise 4.9 When the joint pdf of $X_1$ and $X_2$ is $f_{X_1,X_2}(x, y) = \frac{1}{4} u(1 - |x|)\,u(1 - |y|)$, obtain the cdf $F_W$ and pdf $f_W$ of $W = \sqrt{X_1^2 + X_2^2}$.

Exercise 4.10 Two random variables $X$ and $Y$ are independent of each other with the pdf's $f_X(x) = \lambda e^{-\lambda x} u(x)$ and $f_Y(y) = \mu e^{-\mu y} u(y)$, where $\lambda > 0$ and $\mu > 0$. When $W = \min(X, Y)$ and

\[ V = \begin{cases} 1, & \text{if } X \le Y, \\ 0, & \text{if } X > Y, \end{cases} \tag{4.E.3} \]

obtain the joint cdf of $(W, V)$.

Exercise 4.11 Obtain the pdf of $U = X + Y + Z$ when the joint pdf of $X$, $Y$, and $Z$ is $f_{X,Y,Z}(x, y, z) = \frac{6u(x)u(y)u(z)}{(1 + x + y + z)^4}$.

Exercise 4.12 Consider the two joint pdf's (1) $f_{\boldsymbol{X}}(\boldsymbol{x}) = u(x_1)u(1 - x_1)u(x_2)u(1 - x_2)$ and (2) $f_{\boldsymbol{X}}(\boldsymbol{x}) = 2u(x_1)u(1 - x_2)u(x_2 - x_1)$ of $\boldsymbol{X} = (X_1, X_2)$, where $\boldsymbol{x} = (x_1, x_2)$. In each of the two cases, obtain the joint pdf $f_{\boldsymbol{Y}}$ of $\boldsymbol{Y} = (Y_1, Y_2) = \left(X_1^2, X_1 + X_2\right)$, and then, obtain the pdf $f_{Y_1}$ of $Y_1 = X_1^2$ and pdf $f_{Y_2}$ of $Y_2 = X_1 + X_2$ based on $f_{\boldsymbol{Y}}$.

Exercise 4.13 In each of the two cases of the joint pdf $f_{\boldsymbol{X}}$ described in Exercise 4.12, obtain the joint pdf $f_{\boldsymbol{Y}}$ of $\boldsymbol{Y} = (Y_1, Y_2) = \left(\frac{1}{2}\left(X_1^2 + X_2\right), \frac{1}{2}\left(X_1^2 - X_2\right)\right)$, and then, obtain the pdf $f_{Y_1}$ of $Y_1$ and pdf $f_{Y_2}$ of $Y_2$ based on $f_{\boldsymbol{Y}}$.

Exercise 4.14 Two random variables $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ are independent of each other. Show that $Z = X + Y$ and $W = \frac{X}{Y}$ are independent of each other and obtain the pdf of $Z$ and pdf of $W$.

Exercise 4.15 Denote the joint pdf of $\boldsymbol{X} = (X_1, X_2)$ by $f_{\boldsymbol{X}}$.
(1) Express the pdf of $Y_1 = \left(X_1^2 + X_2^2\right)^r$ in terms of $f_{\boldsymbol{X}}$.
(2) When $f_{\boldsymbol{X}}(x, y) = \frac{1}{\pi} u\left(1 - x^2 - y^2\right)$, show that the cdf $F_W$ and pdf $f_W$ of $W = \left(X_1^2 + X_2^2\right)^r$ are as follows:
1. $F_W(w) = u(w - 1)$ and $f_W(w) = \delta(w - 1)$ if $r = 0$.
2. $F_W(w) = \begin{cases} 0, & w \le 0, \\ w^{\frac{1}{r}}, & 0 \le w \le 1, \\ 1, & w \ge 1 \end{cases}$ and $f_W(w) = \frac{1}{r} w^{\frac{1}{r} - 1} u(w)u(1 - w)$ if $r > 0$.
3. $F_W(w) = \begin{cases} 0, & w < 1, \\ 1 - w^{\frac{1}{r}}, & w \ge 1 \end{cases}$ and $f_W(w) = -\frac{1}{r} w^{\frac{1}{r} - 1} u(w - 1)$ if $r < 0$.
(3) Obtain $F_W$ and $f_W$ when $r = \frac{1}{2}$, $1$, and $-1$ in (2).

Exercise 4.16 The marginal pdf of the three i.i.d. random variables $X_1$, $X_2$, and $X_3$ is $f(x) = u(x)u(1 - x)$.
(1) Obtain the joint pdf $f_{Y_1,Y_2}$ of $(Y_1, Y_2) = (X_1 + X_2 + X_3, X_1 - X_3)$.
(2) Based on $f_{Y_1,Y_2}$, obtain the pdf $f_{Y_2}$ of $Y_2$.
(3) Based on $f_{Y_1,Y_2}$, obtain the pdf $f_{Y_1}$ of $Y_1$.

Exercise 4.17 Consider i.i.d. random variables $X$ and $Y$ with marginal pmf $p(x) = (1 - \alpha)\alpha^{x-1}\tilde{u}(x - 1)$, where $0 < \alpha < 1$.
(1) Obtain the pmf of $X + Y$ and pmf of $X - Y$.
(2) Obtain the joint pmf of $(X - Y, X)$ and joint pmf of $(X - Y, Y)$.
(3) Using the results in (2), obtain the pmf of $X$, pmf of $Y$, and pmf of $X - Y$.
(4) Obtain the joint pmf of $(X + Y, X - Y)$, and using the result, obtain the pmf of $X - Y$ and pmf of $X + Y$. Compare the results with those obtained in (1).

Exercise 4.18 Consider Exercise 2.30. Let $R_n$ be the number of type O cells at $n + \frac{1}{2}$ minutes after the start of the culture. Obtain $E\{R_n\}$, the pmf $p_2(k)$ of $R_2$, and the probability $\eta_0$ that nothing will remain in the culture.

Exercise 4.19 Obtain the conditional expected value $E\{X \mid Y = y\}$ in Example 4.4.3.

Exercise 4.20 Consider an i.i.d. random vector $\boldsymbol{X} = (X_1, X_2, X_3)$ with marginal pdf $f(x) = e^{-x}u(x)$. Obtain the joint pdf $f_{\boldsymbol{Y}}(y_1, y_2, y_3)$ of $\boldsymbol{Y} = (Y_1, Y_2, Y_3)$, where $Y_1 = X_1 + X_2 + X_3$, $Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}$, and $Y_3 = \frac{X_1}{X_1 + X_2}$.

Exercise 4.21 Consider two i.i.d. random variables $X_1$ and $X_2$ with marginal pdf $f(x) = u(x)u(1 - x)$. Obtain the joint pdf of $\boldsymbol{Y} = (Y_1, Y_2)$, pdf of $Y_1$, and pdf of $Y_2$ when $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$.

Exercise 4.22 When $\boldsymbol{Y} = (Y_1, Y_2)$ is obtained from rotating clockwise a point $\boldsymbol{X} = (X_1, X_2)$ in the two dimensional plane by $\theta$, express the pdf of $\boldsymbol{Y}$ in terms of the pdf $f_{\boldsymbol{X}}$ of $\boldsymbol{X}$.

Exercise 4.23 Assume that the value of the joint pdf $f_{X,Y}(x, y)$ of $X$ and $Y$ is positive in a region containing $x^2 + y^2 < a^2$, where $a > 0$. Express the conditional joint cdf $F_{X,Y|A}$ and conditional joint pdf $f_{X,Y|A}$ in terms of $f_{X,Y}$ when $A = \left\{X^2 + Y^2 \le a^2\right\}$.


Exercise 4.24 The joint pdf of $(X, Y)$ is $f_{X,Y}(x, y) = \frac{1}{4} u(1 - |x|)\,u(1 - |y|)$. When $A = \left\{X^2 + Y^2 \le a^2\right\}$ with $0 < a < 1$, obtain the conditional joint cdf $F_{X,Y|A}$ and conditional joint pdf $f_{X,Y|A}$.

Exercise 4.25 Prove the following results:
(1) If $X$ and $Z$ are not orthogonal, then there exists a constant $a$ for which $Z$ and $X - aZ$ are orthogonal.
(2) It is possible that $X$ and $Y$ are uncorrelated even when $X$ and $Z$ are correlated and $Y$ and $Z$ are correlated.

Exercise 4.26 Prove the following results:
(1) If $X$ and $Y$ are independent of each other, then they are uncorrelated.
(2) If the pdf $f_X$ of $X$ is an even function, then $X$ and $X^2$ are uncorrelated but are not independent of each other.

Exercise 4.27 Show that

\[ E\left\{\max\left(X^2, Y^2\right)\right\} \le 1 + \sqrt{1 - \rho^2}, \tag{4.E.4} \]

where $\rho$ is the correlation coefficient between the random variables $X$ and $Y$ both with zero mean and unit variance.

Exercise 4.28 Consider a random vector $\boldsymbol{X} = (X_1, X_2, X_3)^T$ with covariance matrix $\boldsymbol{K}_{\boldsymbol{X}} = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$. Obtain a linear transformation making $\boldsymbol{X}$ into an uncorrelated random vector with unit variance.

Exercise 4.29 Obtain the pdf of $Y$ when the joint pdf of $(X, Y)$ is $f_{X,Y}(x, y) = \frac{1}{y} \exp\left(-y - \frac{x}{y}\right) u(x) u(y)$.

Exercise 4.30 When the joint pmf of $(X, Y)$ is

$$p_{X,Y}(x, y) = \begin{cases} \frac{1}{2}, & (x, y) = (1, 1), \\ \frac{1}{8}, & (x, y) = (1, 2) \text{ or } (2, 2), \\ \frac{1}{4}, & (x, y) = (2, 1), \\ 0, & \text{otherwise}, \end{cases} \tag{4.E.5}$$

obtain the pmf of $X$ and pmf of $Y$.

Exercise 4.31 For two i.i.d. random variables $X_1$ and $X_2$ with marginal pmf $p(x) = e^{-\lambda} \frac{\lambda^x}{x!} \tilde{u}(x)$, where $\lambda > 0$, obtain the pmf of $M = \max(X_1, X_2)$ and pmf of $N = \min(X_1, X_2)$.

Exercise 4.32 For two i.i.d. random variables $X$ and $Y$ with marginal pdf $f(z) = u(z) - u(z - 1)$, obtain the pdf's of $W = 2X$, $U = -Y$, and $Z = W + U$.

Exercises


Exercise 4.33 For three i.i.d. random variables $X_1$, $X_2$, and $X_3$ with marginal distribution $U\left[-\frac{1}{2}, \frac{1}{2}\right]$, obtain the pdf of $Y = X_1 + X_2 + X_3$ and $E\{Y^4\}$.

Exercise 4.34 The random variables $\{X_i\}_{i=1}^{n}$ are independent of each other with pdf's $\{f_i\}_{i=1}^{n}$, respectively. Obtain the joint pdf of $\{Y_k\}_{k=1}^{n}$, where $Y_k = X_1 + X_2 + \cdots + X_k$ for $k = 1, 2, \ldots, n$.

Exercise 4.35 The joint pmf of $X$ and $Y$ is

$$p_{X,Y}(x, y) = \begin{cases} \frac{x + y}{32}, & x = 1, 2, \ y = 1, 2, 3, 4, \\ 0, & \text{otherwise}. \end{cases} \tag{4.E.6}$$

(1) Obtain the pmf of $X$ and pmf of $Y$.
(2) Obtain $P(X > Y)$, $P(Y = 2X)$, $P(X + Y = 3)$, and $P(X \le 3 - Y)$.
(3) Discuss whether or not $X$ and $Y$ are independent of each other.

Exercise 4.36 For independent random variables $X_1$ and $X_2$ with pdf's $f_{X_1}(x) = u(x)u(1 - x)$ and $f_{X_2}(x) = e^{-x} u(x)$, obtain the pdf of $Y = X_1 + X_2$.

Exercise 4.37 Three Poisson random variables $X_1$, $X_2$, and $X_3$ with means 2, 1, and 4, respectively, are independent of each other.
(1) Obtain the mgf of $Y = X_1 + X_2 + X_3$.
(2) Obtain the distribution of $Y$.

Exercise 4.38 When the joint pdf of $X$, $Y$, and $Z$ is $f_{X,Y,Z}(x, y, z) = k(x + y + z)u(x)u(y)u(z)u(1 - x)u(1 - y)u(1 - z)$, determine the constant $k$ and obtain the conditional pdf $f_{Z|X,Y}(z|x, y)$.

Exercise 4.39 Consider a random variable with probability measure

$$P(X = x) = \begin{cases} \frac{\lambda^x e^{-\lambda}}{x!}, & x = 0, 1, 2, \ldots, \\ 0, & \text{otherwise}. \end{cases} \tag{4.E.7}$$

Here, $\lambda$ is a realization of a random variable $\Lambda$ with pdf $f_\Lambda(v) = e^{-v} u(v)$. Obtain $E\left\{e^{-\Lambda} \,\middle|\, X = 1\right\}$.

Exercise 4.40 When $U_1$, $U_2$, and $U_3$ are independent of each other, obtain the joint pdf $f_{X,Y,Z}(x, y, z)$ of $X = U_1$, $Y = U_1 + U_2$, and $Z = U_1 + U_2 + U_3$ in terms of the pdf's of $U_1$, $U_2$, and $U_3$.

Exercise 4.41 Let $(X, Y, Z)$ be the rectangular coordinate of a randomly chosen point in a sphere of radius 1 centered at the origin in the three dimensional space.
(1) Obtain the joint pdf $f_{X,Y}(x, y)$ and marginal pdf $f_X(x)$.
(2) Obtain the conditional joint pdf $f_{X,Y|Z}(x, y|z)$. Are $X$, $Y$, and $Z$ independent of each other?


Exercise 4.42 Consider a random vector $(X, Y)$ with joint pdf $f_{X,Y}(x, y) = c\, u(r - |x| - |y|)$, where $c$ is a constant and $r > 0$.
(1) Express $c$ in terms of $r$ and obtain the pdf $f_X(x)$.
(2) Are $X$ and $Y$ independent of each other?
(3) Obtain the pdf of $Z = |X| + |Y|$.

Exercise 4.43 Assume $X$ with cdf $F_X$ and $Y$ with cdf $F_Y$ are independent of each other. Show that $P(X \ge Y) \ge \frac{1}{2}$ when $F_X(x) \le F_Y(x)$ at every point $x$.

Exercise 4.44 The joint pdf of $(X, Y)$ is $f_{X,Y}(x, y) = c\left(x^2 + y^2\right) u(x) u(y) u\left(1 - x^2 - y^2\right)$.
(1) Determine the constant $c$ and obtain the pdf of $X$ and pdf of $Y$. Are $X$ and $Y$ independent of each other?
(2) Obtain the joint pdf $f_{R,\Theta}$ of $R = \sqrt{X^2 + Y^2}$ and $\Theta = \tan^{-1} \frac{Y}{X}$.
(3) Obtain the pmf of the output $Q = q(R, \Theta)$ of polar quantizer, where

$$q(r, \theta) = \begin{cases} k, & \text{if } 0 \le r \le \left(\frac{1}{2}\right)^{\frac{1}{4}}, \ \frac{\pi(k-1)}{8} \le \theta \le \frac{\pi k}{8}, \\ k + 4, & \text{if } \left(\frac{1}{2}\right)^{\frac{1}{4}} \le r \le 1, \ \frac{\pi(k-1)}{8} \le \theta \le \frac{\pi k}{8} \end{cases} \tag{4.E.8}$$

for $k = 1, 2, 3, 4$.

Exercise 4.45 Two types of batteries have the pdf $f(x) = 3\lambda x^2 \exp(-\lambda x^3) u(x)$ and $g(y) = 3\mu y^2 \exp(-\mu y^3) u(y)$, respectively, of lifetime with $\mu > 0$ and $\lambda > 0$. When the lifetimes of batteries are independent of each other, obtain the probability that the battery with pdf $f$ of lifetime lasts longer than that with $g$, and obtain the value when $\lambda = \mu$.

Exercise 4.46 Two i.i.d. random variables $X$ and $Y$ have marginal pdf $f(x) = e^{-x} u(x)$.
(1) Obtain the pdf each of $U = X + Y$, $V = X - Y$, $XY$, $\frac{X}{Y}$, $Z = \frac{X}{X + Y}$, $\min(X, Y)$, $\max(X, Y)$, and $\frac{\min(X, Y)}{\max(X, Y)}$.
(2) Obtain the conditional pdf of $V$ when $U = u$.
(3) Show that $U$ and $Z$ are independent of each other.

Exercise 4.47 Two Poisson random variables $X_1 \sim P(\lambda_1)$ and $X_2 \sim P(\lambda_2)$ are independent of each other.
(1) Show that $X_1 + X_2 \sim P(\lambda_1 + \lambda_2)$.
(2) Show that the conditional distribution of $X_1$ when $X_1 + X_2 = n$ is $b\left(n, \frac{\lambda_1}{\lambda_1 + \lambda_2}\right)$.

Exercise 4.48 Consider Exercise 2.17.
(1) Obtain the mean and variance of the number $M$ of matches.
(2) Assume that the students with matches will leave with their balls, and each of the remaining students will pick a ball again after their balls are mixed. Show that the mean of the number of repetitions until every student has a match is $N$.


Exercise 4.49 A particle moves back and forth between positions $0, 1, \ldots, n$. At any position, it moves to the previous or next position with probability $1 - p$ or $p$, respectively, after 1 second. At positions 0 and $n$, however, it moves only to the next position 1 and previous position $n - 1$, respectively. Obtain the expected value of the time for the particle to move from position 0 to position $n$.

Exercise 4.50 Let $N$ be the number of tosses of a coin with probability $p$ of head until we have two heads in the last three tosses: we let $N = 2$ if the first two outcomes are both heads. Obtain the expected value of $N$.

Exercise 4.51 Two people $A_1$ and $A_2$ with probabilities $p_1$ and $p_2$, respectively, of hit alternatingly fire at a target until the target has been hit two times consecutively.
(1) Obtain the mean number $\mu_i$ of total shots fired at the target when $A_i$ starts the shooting for $i = 1, 2$.
(2) Obtain the mean number $h_i$ of times the target has been hit when $A_i$ starts the shooting for $i = 1, 2$.

Exercise 4.52 Consider Exercise 4.51, but now assume that the game ends when the target is hit twice (i.e., consecutiveness is unnecessary). When $A_1$ starts, obtain the probability $\alpha_1$ that $A_1$ fires the last shot of the game and the probability $\alpha_2$ that $A_1$ makes both hits.

Exercise 4.53 Assume i.i.d. random variables $X_1, X_2, \ldots$ with marginal distribution $U[0, 1)$. Let $g(x) = E\{N\}$, where $N = \min\{n : X_n < X_{n-1}\}$ and $X_0 = x$. Obtain an integral equation for $g(x)$ conditional on $X_1$, and solve the equation.

Exercise 4.54 We repeat tossing a coin with probability $p$ of head. Let $X$ be the number of repetitions until head appears three times consecutively.
(1) Obtain a difference equation for $g(k) = P(X = k)$.
(2) Obtain the generating function $G_X(s) = E\left\{s^X\right\}$.
(3) Obtain $E\{X\}$. (Hint. Use conditional expected value.)

Exercise 4.55 Obtain the conditional joint cdf $F_{X,Y|A}(x, y)$ and conditional joint pdf $f_{X,Y|A}(x, y)$ when $A = \{x_1 < X \le x_2\}$.

Exercise 4.56 For independent random variables $X$ and $Y$, assume the pmf

$$p_X(x) = \begin{cases} \frac{1}{6}, & x = 3; \\ \frac{1}{2}, & x = 4; \\ \frac{1}{3}, & x = 5 \end{cases} \tag{4.E.9}$$

of $X$ and pmf

$$p_Y(y) = \begin{cases} \frac{1}{2}, & y = 0, \\ \frac{1}{2}, & y = 1 \end{cases} \tag{4.E.10}$$

of $Y$. Obtain the conditional pmf's $p_{X|Z}$, $p_{Z|X}$, $p_{Y|Z}$, and $p_{Z|Y}$ and the joint pmf's $p_{X,Y}$, $p_{Y,Z}$, and $p_{X,Z}$ when $Z = X - Y$.


Exercise 4.57 Two exponential random variables $T_1$ and $T_2$ with rate $\lambda_1$ and $\lambda_2$, respectively, are independent of each other. Let $U = \min(T_1, T_2)$, $V = \max(T_1, T_2)$, and $I$ be the smaller index, i.e., the index $I$ such that $T_I = U$.
(1) Obtain the expected values $E\{U\}$, $E\{V - U\}$, and $E\{V\}$.
(2) Obtain $E\{V\}$ using $V = T_1 + T_2 - U$.
(3) Obtain the joint pdf $f_{U,V-U,I}$ of $(U, V - U, I)$.
(4) Are $U$ and $V - U$ independent of each other?

Exercise 4.58 Consider a bi-variate beta random vector $(X, Y)$ with joint pdf

$$f_{X,Y}(x, y) = \frac{\Gamma(p_1 + p_2 + p_3)}{\Gamma(p_1)\Gamma(p_2)\Gamma(p_3)} x^{p_1 - 1} y^{p_2 - 1} (1 - x - y)^{p_3 - 1} u(x) u(y) u(1 - x - y), \tag{4.E.11}$$

where $p_1$, $p_2$, and $p_3$ are positive numbers. Obtain the pdf $f_X$ of $X$, pdf $f_Y$ of $Y$, conditional pdf $f_{X|Y}$, and conditional pdf $f_{Y|X}$. In addition, obtain the conditional pdf $f_{\left.\frac{Y}{1-X}\right| X}$ of $\frac{Y}{1-X}$ when $X$ is given.

Exercise 4.59 Assuming the joint pdf $f_{X,Y}(x, y) = \frac{1}{16} u(2 - |x|) u(2 - |y|)$ of $(X, Y)$, obtain the conditional joint cdf $F_{X,Y|B}$ and conditional joint pdf $f_{X,Y|B}$ when $B = \{|X| + |Y| \le 1\}$.

Exercise 4.60 Let the joint pdf of $X$ and $Y$ be $f_{X,Y}(x, y) = |xy| u(1 - |x|) u(1 - |y|)$. When $A = \{X^2 + Y^2 \le a^2\}$ with $0 < a < 1$, obtain the conditional joint cdf $F_{X,Y|A}$ and conditional joint pdf $f_{X,Y|A}$.

Exercise 4.61 For a random vector $X = (X_1, X_2, \ldots, X_n)$, show that

$$E\left\{X R^{-1} X^T\right\} = n, \tag{4.E.12}$$

where $R$ is the correlation matrix of $X$ and $R^{-1}$ is the inverse matrix of $R$.

Exercise 4.62 When the cf of $(X, Y)$ is $\varphi_{X,Y}(t, s)$, show that the cf of $Z = aX + bY$ is $\varphi_{X,Y}(at, bt)$.

Exercise 4.63 The joint pdf of $(X, Y)$ is

$$f_{X,Y}(x, y) = \frac{n!}{(i-1)!(k-i-1)!(n-k)!} F^{i-1}(x) \{F(y) - F(x)\}^{k-i-1} \{1 - F(y)\}^{n-k} f(x) f(y) u(y - x), \tag{4.E.13}$$

where $i$, $k$, and $n$ are natural numbers such that $1 \le i < k \le n$, $F$ is the cdf of a random variable, and $f(t) = \frac{d}{dt} F(t)$. Obtain the pdf of $X$ and pdf of $Y$.


Exercise 4.64 The number $N$ of typographical errors in a book is a Poisson random variable with mean $\lambda$. Proofreaders A and B find a typographical error with probability $p_1$ and $p_2$, respectively. Let $X_1$, $X_2$, $X_3$, and $X_4$ be the numbers of typographical errors found by Proofreader A but not by Proofreader B, by Proofreader B but not by Proofreader A, by both proofreaders, and by neither proofreader, respectively. Assume that the event of a typographical error being found by a proofreader is independent of that by another proofreader.
(1) Obtain the joint pmf of $X_1$, $X_2$, $X_3$, and $X_4$.
(2) Show that

$$\frac{E\{X_1\}}{E\{X_3\}} = \frac{1 - p_2}{p_2}, \qquad \frac{E\{X_2\}}{E\{X_3\}} = \frac{1 - p_1}{p_1}. \tag{4.E.14}$$

Now assume that the values of $p_1$, $p_2$, and $\lambda$ are not available.
(3) Using $X_i$ as the estimate of $E\{X_i\}$ for $i = 1, 2, 3$, obtain the estimates of $p_1$, $p_2$, and $\lambda$.
(4) Obtain an estimate of $X_4$.

Exercise 4.65 Show that the correlation coefficient between $X$ and $|X|$ is

$$\rho_{X|X|} = \frac{\int_{-\infty}^{\infty} |x| (x - m_X) f(x)\, dx}{\sqrt{\sigma_X^2} \sqrt{\sigma_X^2 + 4 m_X^{+} m_X^{-}}}, \tag{4.E.15}$$

where $m_X^{\pm}$, $f$, $m_X$, and $\sigma_X^2$ are the half means defined in (3.E.28), pdf, mean, and variance, respectively, of $X$. Obtain the value of $\rho_{X|X|}$ and compare it with what can be obtained intuitively in each of the following cases of the pdf $f_X(x)$ of $X$:
(1) $f_X(x)$ is an even function.
(2) $f_X(x) > 0$ only for $x \ge 0$.
(3) $f_X(x) > 0$ only for $x \le 0$.

Exercise 4.66 For a random variable $X$ with pdf $f_X(x) = u(x) - u(x - 1)$, obtain the joint pdf of $X$ and $Y = 2X + 1$.

Exercise 4.67 Consider a random variable $X$ and its magnitude $Y = |X|$. Show that the conditional pdf $f_{X|Y}$ can be expressed as

$$f_{X|Y}(x|y) = \frac{f_X(x)\delta(x + y)}{f_X(x) + f_X(-x)} u(-x) + \frac{f_X(x)\delta(x - y)}{f_X(x) + f_X(-x)} u(x) \tag{4.E.16}$$

for $y \in \{y \mid \{f_X(y) + f_X(-y)\} u(y) > 0\}$, where $f_X$ is the pdf of $X$. Obtain the conditional pdf $f_{Y|X}(y|x)$. (Hint. Use (4.5.15).)


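Exercise 4.68 Show that the joint cdf and joint pdf are

$$F_{X,cX+a}(x, y) = \begin{cases} F_X(x) u\left(\frac{y-a}{c} - x\right) + F_X\left(\frac{y-a}{c}\right) u\left(x - \frac{y-a}{c}\right), & c > 0, \\ F_X(x) u(y - a), & c = 0, \\ \left\{F_X(x) - F_X\left(\frac{y-a}{c}\right)\right\} u\left(x - \frac{y-a}{c}\right), & c < 0 \end{cases}$$

[…] shown in (3.2.35) and

$$f_{X_1 - X_2}(y) = \int_{-\infty}^{\infty} f_{X_1,X_2}(y + y_2, y_2)\, dy_2 \tag{4.E.22}$$

shown in (4.2.20).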

References

M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
J. Bae, H. Kwon, S.R. Park, J. Lee, I. Song, Explicit correlation coefficients among random variables, ranks, and magnitude ranks. IEEE Trans. Inform. Theory 52(5), 2233–2240 (2006)
N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
D.L. Burdick, A note on symmetric random variables. Ann. Math. Stat. 43(6), 2039–2040 (1972)
W.B. Davenport Jr., Probability and Random Processes (McGraw-Hill, New York, 1970)
H.A. David, H.N. Nagaraja, Order Statistics, 3rd edn. (Wiley, New York, 2003)
A.P. Dawid, Some misleading arguments involving conditional independence. J. R. Stat. Soc. Ser. B (Methodological) 41(2), 249–252 (1979)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd edn. (McGraw-Hill, New York, 1990)
S. Geisser, N. Mantel, Pairwise independence of jointly dependent variables. Ann. Math. Stat. 33(1), 290–291 (1962)
R.M. Gray, L.D. Davisson, An Introduction to Statistical Signal Processing (Cambridge University Press, Cambridge, 2010)
R.A. Horn, C.R. Johnson, Matrix Analysis (Cambridge University Press, Cambridge, 1985)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions (Wiley, New York, 1972)
S.A. Kassam, Signal Detection in Non-Gaussian Noise (Springer, New York, 1988)
S.M. Kendall, A. Stuart, Advanced Theory of Statistics, vol. II (Oxford University, New York, 1979)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn. (Prentice Hall, New York, 2008)
K.V. Mardia, Families of Bivariate Distributions (Charles Griffin and Company, London, 1970)
R.N. McDonough, A.D. Whalen, Detection of Signals in Noise, 2nd edn. (Academic, New York, 1995)
P.T. Nielsen, On the expected duration of a search for a fixed pattern in random data. IEEE Trans. Inform. Theory 19(5), 702–704 (1973)
A. Papoulis, S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn. (McGraw-Hill, New York, 2002)
V.K. Rohatgi, A.K.Md.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New York, 1986)
S.M. Ross, A First Course in Probability (Macmillan, New York, 1976)
S.M. Ross, Stochastic Processes, 2nd edn. (Wiley, New York, 1996)
S.M. Ross, Introduction to Probability Models, 10th edn. (Academic, Boston, 2009)
G. Samorodnitsky, M.S. Taqqu, Non-Gaussian Random Processes: Stochastic Models with Infinite Variance (Chapman and Hall, New York, 1994)
I. Song, J. Bae, S.Y. Kim, Advanced Theory of Signal Detection (Springer, Berlin, 2002)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
A. Stuart, J.K. Ord, Advanced Theory of Statistics: Vol. 1. Distribution Theory, 5th edn. (Oxford University, New York, 1987)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
Y.H. Wang, Dependent random variables with independent subsets. Am. Math. Mon. 86(4), 290–292 (1979)
G.L. Wies, E.B. Hall, Counterexamples in Probability and Real Analysis (Oxford University, New York, 1993)

Chapter 5

Normal Random Vectors

In this chapter, we consider normal random vectors in the real space. We first describe the pdf and cf of normal random vectors, and then consider the special cases of bivariate and tri-variate normal random vectors. Some key properties of normal random vectors are then discussed. The expected values of non-linear functions of normal random vectors are also investigated, during which an explicit closed form for joint moments is presented. Additional topics related to normal random vectors are then briefly described.

5.1 Probability Functions

Let us first describe the pdf and cf of normal random vectors (Davenport 1970; Kotz et al. 2000; Middleton 1960; Patel et al. 1976) in general. We then consider additional topics in the special cases of bi-variate and tri-variate normal random vectors.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 I. Song et al., Probability and Random Variables: Theory and Applications, https://doi.org/10.1007/978-3-030-97679-8_5

5.1.1 Probability Density Function and Characteristic Function

Definition 5.1.1 (normal random vector) A vector $X = (X_1, X_2, \ldots, X_n)^T$ is called an $n$ dimensional, $n$-variable, or $n$-variate normal random vector if it has the joint pdf

$$f_X(x) = \frac{1}{\sqrt{(2\pi)^n |K|}} \exp\left\{-\frac{1}{2} (x - m)^T K^{-1} (x - m)\right\}, \tag{5.1.1}$$

where $m = (m_1, m_2, \ldots, m_n)^T$ and $K$ is an $n \times n$ Hermitian matrix with the determinant $|K| \ge 0$. The distribution is denoted by $N(m, K)$.

When $m = (0, 0, \ldots, 0)^T$ and all the diagonal elements of $K$ are 1, the normal distribution is called a standard normal distribution. We will in most cases assume $|K| > 0$: when $|K| = 0$, the distribution is called a degenerate distribution and will be discussed briefly in Theorems 5.1.3, 5.1.4, and 5.1.5.

The distribution of a normal random vector is often called a jointly normal distribution, and a normal random vector is also called jointly normal random variables. It should be noted that 'jointly normal random variables' and 'normal random variables' are not the same. Specifically, the term 'jointly normal random variables' is a synonym for 'a normal random vector'. The term 'normal random variables', on the other hand, denotes several random variables with marginal normal distributions, which may or may not form a normal random vector. In fact, all the components of a non-Gaussian random vector may be normal random variables in some cases, as we shall see in Example 5.2.3 later, for instance.

Example 5.1.1 For a random vector $X \sim N(m, K)$, the mean vector, covariance matrix, and correlation matrix are $m$, $K$, and $R = K + m m^T$, respectively. ♦

Theorem 5.1.1 The joint cf of $X = (X_1, X_2, \ldots, X_n)^T \sim N(m, K)$ is

$$\varphi_X(\omega) = \exp\left(j m^T \omega - \frac{1}{2} \omega^T K \omega\right), \tag{5.1.2}$$

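where $\omega = (\omega_1, \omega_2, \ldots, \omega_n)^T$.

Proof Letting $\alpha = \{(2\pi)^n |K|\}^{-\frac{1}{2}}$ and $y = (y_1, y_2, \ldots, y_n)^T = x - m$, the cf $\varphi_X(\omega) = E\left\{\exp\left(j\omega^T X\right)\right\}$ of $X$ can be calculated as

$$\varphi_X(\omega) = \int_{x \in \mathbb{R}^n} \alpha \exp\left\{-\frac{1}{2}(x - m)^T K^{-1} (x - m)\right\} \exp\left(j\omega^T x\right) dx = \alpha \exp\left(j\omega^T m\right) \int_{y \in \mathbb{R}^n} \exp\left\{-\frac{1}{2}\left(y^T K^{-1} y - 2j\omega^T y\right)\right\} dy. \tag{5.1.3}$$

Now, recollecting that $\omega^T y = \left(\omega^T y\right)^T = y^T \omega$ because $\omega^T y$ is scalar and that $K = K^T$, we have $\left(y - jK^T\omega\right)^T K^{-1} \left(y - jK^T\omega\right) = \left(y^T K^{-1} - j\omega^T K K^{-1}\right)\left(y - jK^T\omega\right) = y^T K^{-1} y - j y^T K^{-1} K^T \omega - j\omega^T y - \omega^T K^T \omega$, i.e.,

$$\left(y - jK^T\omega\right)^T K^{-1} \left(y - jK^T\omega\right) = y^T K^{-1} y - 2j\omega^T y - \omega^T K \omega. \tag{5.1.4}$$

Thus, letting $z = (z_1, z_2, \ldots, z_n)^T = y - jK^T\omega$, recollecting that $\omega^T m = \left(\omega^T m\right)^T = m^T \omega$ because $\omega^T m$ is scalar, and using (5.1.4), we get

$$\begin{aligned} \varphi_X(\omega) &= \alpha \exp\left(j\omega^T m\right) \int_{y \in \mathbb{R}^n} \exp\left[-\frac{1}{2}\left\{\left(y - jK^T\omega\right)^T K^{-1} \left(y - jK^T\omega\right) + \omega^T K \omega\right\}\right] dy \\ &= \alpha \exp\left(j\omega^T m\right) \exp\left(-\frac{1}{2}\omega^T K \omega\right) \int_{z \in \mathbb{R}^n} \exp\left(-\frac{1}{2} z^T K^{-1} z\right) dz \\ &= \exp\left(j m^T \omega - \frac{1}{2}\omega^T K \omega\right) \end{aligned} \tag{5.1.5}$$

from (5.1.3). ♠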
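As an informal numerical check of Theorem 5.1.1, the closed form (5.1.2) can be compared with a sample average of $\exp\left(j\omega^T X\right)$. The following is only a sketch in standard-library Python for the bi-variate case: the function names, the values of `m`, `K`, and `omega`, the sample size, and the seed are illustrative assumptions, not part of the text.

```python
import cmath
import math
import random


def normal_cf(omega, m, K):
    """Closed-form cf exp(j m^T w - w^T K w / 2) as in (5.1.2)."""
    n = len(m)
    lin = sum(m[i] * omega[i] for i in range(n))
    quad = sum(omega[i] * K[i][j] * omega[j] for i in range(n) for j in range(n))
    return cmath.exp(1j * lin - 0.5 * quad)


def sample_bivariate(m, K, rng):
    """Draw one sample of N(m, K) for a 2x2 K using its Cholesky factor."""
    l11 = math.sqrt(K[0][0])
    l21 = K[1][0] / l11
    l22 = math.sqrt(K[1][1] - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (m[0] + l11 * z1, m[1] + l21 * z1 + l22 * z2)


def empirical_cf(omega, m, K, num_samples=20000, seed=0):
    """Sample average of exp(j w^T X) over draws of X ~ N(m, K)."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(num_samples):
        x = sample_bivariate(m, K, rng)
        total += cmath.exp(1j * (omega[0] * x[0] + omega[1] * x[1]))
    return total / num_samples


# Illustrative (assumed) parameters.
m = [1.0, -0.5]
K = [[2.0, 0.6], [0.6, 1.0]]
omega = [0.3, -0.7]

exact = normal_cf(omega, m, K)
approx = empirical_cf(omega, m, K)
print(exact, approx, abs(exact - approx))
```

The two values should agree up to the Monte Carlo error, which shrinks roughly as the inverse square root of the number of samples.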

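5.1.2 Bi-variate Normal Random Vectors

Let the covariance matrix of a bi-variate normal random vector $X = (X_1, X_2)$ be

$$K_2 = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \tag{5.1.6}$$

where $\rho$ is the correlation coefficient between $X_1$ and $X_2$ with $|\rho| \le 1$. Then, we have the determinant $|K_2| = \sigma_1^2 \sigma_2^2 \left(1 - \rho^2\right)$ of $K_2$, the inverse

$$K_2^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 \left(1 - \rho^2\right)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix} \tag{5.1.7}$$

of $K_2$, and the joint pdf

$$f_{X_1,X_2}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2\left(1 - \rho^2\right)} \left\{\frac{(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{(y - m_2)^2}{\sigma_2^2}\right\}\right] \tag{5.1.8}$$

of $(X_1, X_2)$. The distribution $N\left((m_1, m_2)^T, K_2\right)$ is often also denoted by $N\left(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho\right)$. In (5.1.7) and the joint pdf (5.1.8), it is assumed that $|\rho| < 1$: the pdf for $\rho \to \pm 1$ will be discussed later in Theorem 5.1.3.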
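To see that (5.1.8) is just (5.1.1) specialized to $n = 2$, the two forms can be evaluated side by side. The sketch below is a hedged illustration in standard-library Python; the function names and the test points are assumptions chosen for the example.

```python
import math


def bivariate_pdf(x, y, m1, m2, s1, s2, rho):
    """Bi-variate normal pdf written as in (5.1.8); requires |rho| < 1."""
    a = (x - m1) / s1
    b = (y - m2) / s2
    q = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))


def general_pdf_2d(x, y, m1, m2, s1, s2, rho):
    """Same density via (5.1.1), with K_2 from (5.1.6) and its inverse (5.1.7)."""
    k11, k12, k22 = s1 * s1, rho * s1 * s2, s2 * s2
    det = k11 * k22 - k12 * k12
    inv = [[k22 / det, -k12 / det], [-k12 / det, k11 / det]]
    d = (x - m1, y - m2)
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / math.sqrt((2.0 * math.pi) ** 2 * det)


# Illustrative (assumed) parameters and evaluation points.
params = (1.0, -2.0, 1.5, 0.7, -0.4)
for point in [(0.0, 0.0), (1.0, -2.0), (2.5, -1.0)]:
    print(point, bivariate_pdf(*point, *params), general_pdf_2d(*point, *params))
```

Both functions should print matching values at every point, since the quadratic form in (5.1.8) is $(x - m)^T K_2^{-1} (x - m)$ written out term by term.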


The contour or isohypse of the bi-variate normal pdf (5.1.8) is an ellipse: specifically, referring to Exercise 5.42, the equation of the ellipse containing $100\alpha\%$ of the distribution can be expressed as $\frac{(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{(y - m_2)^2}{\sigma_2^2} = -2\left(1 - \rho^2\right)\ln(1 - \alpha)$. As shown in Exercise 5.44, the major axis of the ellipse makes the angle

$$\theta = \frac{1}{2}\tan^{-1}\frac{2\rho\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} \tag{5.1.9}$$

with the positive $x$-axis. For $\rho > 0$, we have $0 < \theta < \frac{\pi}{4}$, $\theta = \frac{\pi}{4}$, and $\frac{\pi}{4} < \theta < \frac{\pi}{2}$ when $\sigma_1 > \sigma_2$, $\sigma_1 = \sigma_2$, and $\sigma_1 < \sigma_2$, respectively, as shown in Figs. 5.1, 5.2, and 5.3. Denoting the standard bi-variate normal pdf by $f_2$, we have

$$f_2(0, y) = f_2(0, 0) \exp\left\{-\frac{y^2}{2\left(1 - \rho^2\right)}\right\}, \tag{5.1.10}$$

where $f_2(0, 0) = \left(2\pi\sqrt{1 - \rho^2}\right)^{-1}$.
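The angle (5.1.9) and the contour level can be computed directly. The sketch below, in standard-library Python, uses `math.atan2` so that the branch of the inverse tangent gives the ranges of $\theta$ stated for $\rho > 0$; the use of `atan2` and all function names are assumptions of this illustration. It also checks that the direction $(\cos\theta, \sin\theta)$ attains the largest eigenvalue of $K_2$, which is what characterizes the major axis.

```python
import math


def major_axis_angle(s1, s2, rho):
    """Angle of (5.1.9); atan2 picks the branch with theta in (0, pi/2) for rho > 0."""
    return 0.5 * math.atan2(2.0 * rho * s1 * s2, s1 * s1 - s2 * s2)


def contour_level(alpha, rho):
    """Right-hand side -2 (1 - rho^2) ln(1 - alpha) of the ellipse equation."""
    return -2.0 * (1.0 - rho * rho) * math.log(1.0 - alpha)


def rayleigh_quotient(theta, s1, s2, rho):
    """v^T K_2 v for the unit vector v = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return s1 * s1 * c * c + 2.0 * rho * s1 * s2 * c * s + s2 * s2 * s * s


def largest_eigenvalue(s1, s2, rho):
    """Largest eigenvalue of the 2x2 covariance matrix K_2 in (5.1.6)."""
    tr = s1 * s1 + s2 * s2
    det = s1 * s1 * s2 * s2 * (1.0 - rho * rho)
    return 0.5 * (tr + math.sqrt(tr * tr - 4.0 * det))


for s1, s2 in [(2.0, 1.0), (1.0, 1.0), (1.0, 2.0)]:
    theta = major_axis_angle(s1, s2, 0.5)
    print(s1, s2, theta, rayleigh_quotient(theta, s1, s2, 0.5), largest_eigenvalue(s1, s2, 0.5))
```

For the three cases the printed angle should fall in $\left(0, \frac{\pi}{4}\right)$, equal $\frac{\pi}{4}$, and fall in $\left(\frac{\pi}{4}, \frac{\pi}{2}\right)$, respectively.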

Fig. 5.1 Contour of bi-variate normal pdf when $\rho > 0$ and $\sigma_1 > \sigma_2$, in which case $0 < \theta < \frac{\pi}{4}$

Fig. 5.2 Contour of bi-variate normal pdf when $\rho > 0$ and $\sigma_1 = \sigma_2$, in which case $\theta = \frac{\pi}{4}$

Fig. 5.3 Contour of bi-variate normal pdf when $\rho > 0$ and $\sigma_1 < \sigma_2$, in which case $\frac{\pi}{4} < \theta < \frac{\pi}{2}$

Example 5.1.2 By integrating the joint pdf (5.1.8) over $y$ and $x$, it is easy to see that we have $X \sim N\left(m_1, \sigma_1^2\right)$ and $Y \sim N\left(m_2, \sigma_2^2\right)$, respectively, when $(X, Y) \sim N\left(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho\right)$. In other words, two jointly normal random variables are also individually normal, which is a special case of Theorem 5.2.1. ♦

Example 5.1.3 For $(X, Y) \sim N\left(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho\right)$, $X$ and $Y$ are independent if $\rho = 0$. That is, two uncorrelated jointly normal random variables are independent, which will be generalized in Theorem 5.2.3. ♦

Example 5.1.4 From Theorem 5.1.1, the joint cf is $\varphi_{X,Y}(u, v) = \exp\left\{j\left(m_1 u + m_2 v\right) - \frac{1}{2}\left(\sigma_1^2 u^2 + 2\rho\sigma_1\sigma_2 uv + \sigma_2^2 v^2\right)\right\}$ for $(X, Y) \sim N\left(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho\right)$. ♦

Example 5.1.5 Assume that $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 1)$ are independent. Then, the pdf $f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2}x^2 - \frac{1}{2}(y - x)^2\right\} dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left\{-\left(x - \frac{y}{2}\right)^2 - \frac{1}{4}y^2\right\} dx = \frac{1}{2\sqrt{\pi}} \exp\left(-\frac{1}{4}y^2\right) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}v^2\right) dv$ of $Y = X_1 + X_2$ can eventually be obtained as

$$f_Y(y) = \frac{1}{2\sqrt{\pi}} \exp\left(-\frac{y^2}{4}\right) \tag{5.1.11}$$

using (4.2.37) and letting $v = \sqrt{2}\left(x - \frac{y}{2}\right)$. In other words, the sum of two independent, standard normal random variables is an $N(0, 2)$ random variable. In general, when $X_1 \sim N\left(m_1, \sigma_1^2\right)$, $X_2 \sim N\left(m_2, \sigma_2^2\right)$, and $X_1$ and $X_2$ are independent of each other, we have $Y = X_1 + X_2 \sim N\left(m_1 + m_2, \sigma_1^2 + \sigma_2^2\right)$. A further generalization of this result is expressed as Theorem 5.2.5 later. ♦

Example 5.1.6 Obtain the pdf of $Z = \frac{X}{Y}$ when $X \sim N(0, 1)$ and $Y \sim N(0, 1)$ are independent of each other.

Solution Because $X$ and $Y$ are independent of each other, we get the pdf $f_Z(z) = \int_{-\infty}^{\infty} |y| f_Y(y) f_X(zy)\, dy = \int_{-\infty}^{\infty} |y| \frac{1}{2\pi} \exp\left\{-\frac{1}{2}\left(z^2 + 1\right)y^2\right\} dy = \frac{1}{\pi}\int_0^{\infty} y \exp\left\{-\frac{1}{2}\left(z^2 + 1\right)y^2\right\} dy$ of $Z = \frac{X}{Y}$ using (4.2.27). Next, letting $\frac{1}{2}\left(z^2 + 1\right)y^2 = t$, we have the pdf of $Z$ as $f_Z(z) = \frac{1}{\pi}\frac{1}{z^2 + 1}\int_0^{\infty} e^{-t}\, dt = \frac{1}{\pi}\frac{1}{z^2 + 1}$. In other words, $Z$ is a Cauchy random variable. ♦
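The integral in Example 5.1.6 can also be evaluated numerically as a sanity check. The following sketch, in standard-library Python, applies a midpoint rule to $\int |y| f_Y(y) f_X(zy)\, dy$ and compares the result with the Cauchy pdf. The truncation of the integration range to $[-10, 10]$, the number of steps, and the function names are assumptions of this illustration.

```python
import math


def std_normal_pdf(t):
    """Standard normal pdf."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def ratio_pdf_numeric(z, half_width=10.0, steps=20000):
    """Midpoint-rule value of the integral of |y| f_Y(y) f_X(z y) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        y = -half_width + (k + 0.5) * h
        total += abs(y) * std_normal_pdf(y) * std_normal_pdf(z * y)
    return total * h


def cauchy_pdf(z):
    """Closed form 1 / (pi (z^2 + 1)) obtained in Example 5.1.6."""
    return 1.0 / (math.pi * (1.0 + z * z))


for z in [0.0, 0.5, -2.0]:
    print(z, ratio_pdf_numeric(z), cauchy_pdf(z))
```

The two columns should agree to several decimal places, since the integrand decays like $\exp\left\{-\frac{1}{2}\left(z^2 + 1\right)y^2\right\}$ and the truncated tails are negligible.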

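Based on the results in Examples 4.2.10 and 5.1.6, we can show the following theorem:

Theorem 5.1.2 When $(X, Y) \sim N\left(0, 0, \sigma_X^2, \sigma_Y^2, \rho\right)$, we have the pdf

$$f_Z(z) = \frac{\sigma_X\sigma_Y\sqrt{1 - \rho^2}}{\pi\left(\sigma_Y^2 z^2 - 2\rho\sigma_X\sigma_Y z + \sigma_X^2\right)} \tag{5.1.12}$$

and cdf

$$F_Z(z) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}\frac{\sigma_Y z - \rho\sigma_X}{\sigma_X\sqrt{1 - \rho^2}} \tag{5.1.13}$$

of $Z = \frac{X}{Y}$.

Proof Let $\alpha = \left(2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}\right)^{-1}$ and $\beta = \frac{1}{2\left(1 - \rho^2\right)}\left(\frac{z^2}{\sigma_X^2} - \frac{2\rho z}{\sigma_X\sigma_Y} + \frac{1}{\sigma_Y^2}\right)$. Using (4.2.27), we get $f_Z(z) = \int_{-\infty}^{\infty} |v| f_{X,Y}(zv, v)\, dv = \alpha\int_{-\infty}^{\infty} |v| \exp\left\{-\frac{1}{2\left(1 - \rho^2\right)}\left(\frac{z^2 v^2}{\sigma_X^2} - \frac{2\rho z v^2}{\sigma_X\sigma_Y} + \frac{v^2}{\sigma_Y^2}\right)\right\} dv = 2\alpha\int_0^{\infty} v \exp\left\{-\frac{v^2}{2\left(1 - \rho^2\right)}\left(\frac{z^2}{\sigma_X^2} - \frac{2\rho z}{\sigma_X\sigma_Y} + \frac{1}{\sigma_Y^2}\right)\right\} dv$, i.e.,

$$f_Z(z) = 2\alpha\int_0^{\infty} v \exp\left(-\beta v^2\right) dv. \tag{5.1.14}$$

Thus, noting that $\int_0^{\infty} v \exp\left(-\beta v^2\right) dv = \left. -\frac{1}{2\beta}\exp\left(-\beta v^2\right)\right|_0^{\infty} = \frac{1}{2\beta}$, we get the pdf of $Z$ as $f_Z(z) = \frac{\alpha}{\beta} = \frac{\sigma_X\sigma_Y\sqrt{1 - \rho^2}}{\pi}\left\{\sigma_Y^2\left(z - \frac{\rho\sigma_X}{\sigma_Y}\right)^2 + \sigma_X^2\left(1 - \rho^2\right)\right\}^{-1}$, which is the same as (5.1.12). Next, if we let $\tan\theta_z = \frac{\sigma_Y}{\sqrt{1 - \rho^2}\,\sigma_X}\left(z - \frac{\rho\sigma_X}{\sigma_Y}\right)$ for convenience, the cdf $F_Z(z) = \int_{-\infty}^{z} f_Z(t)\, dt$ of $Z$ can be obtained as $F_Z(z) = \frac{\sigma_X\sigma_Y\sqrt{1 - \rho^2}}{\pi\sigma_X^2\left(1 - \rho^2\right)}\int_{-\infty}^{z} b(t)\, dt = \frac{\sigma_X\sigma_Y\sqrt{1 - \rho^2}}{\pi\sigma_X^2\left(1 - \rho^2\right)} \frac{\sigma_X\sqrt{1 - \rho^2}}{\sigma_Y}\int_{-\frac{\pi}{2}}^{\theta_z} d\theta = \frac{1}{\pi}\left(\theta_z + \frac{\pi}{2}\right)$, leading to (5.1.13), where $b(t) = \left\{1 + \frac{\sigma_Y^2}{\sigma_X^2\left(1 - \rho^2\right)}\left(t - \frac{\rho\sigma_X}{\sigma_Y}\right)^2\right\}^{-1}$. ♠

Theorem 5.1.3 When $\rho \to \pm 1$, we have the limit

$$\lim_{\rho \to \pm 1} f_{X_1,X_2}(x, y) = \frac{\exp\left\{-\frac{(y - m_2)^2}{2\sigma_2^2}\right\}}{\sqrt{2\pi}\,\sigma_1\sigma_2}\, \delta\left(\frac{x - m_1}{\sigma_1} - \xi\frac{y - m_2}{\sigma_2}\right) \tag{5.1.15}$$

of the bi-variate normal pdf (5.1.8), where $\xi = \mathrm{sgn}(\rho)$.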


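Proof We can rewrite $f_{X_1,X_2}(x, y)$ as

$$f_{X_1,X_2}(x, y) = \sqrt{\frac{\alpha}{\pi}}\, \frac{1}{\sqrt{2\pi}\,\sigma_1\sigma_2} \exp\left\{-\frac{(y - m_2)^2}{2\sigma_2^2}\right\} \exp\left\{-\alpha\left(\frac{x - m_1}{\sigma_1} - \rho\frac{y - m_2}{\sigma_2}\right)^2\right\} \tag{5.1.16}$$

by noting that $\frac{(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{(y - m_2)^2}{\sigma_2^2} = \frac{(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \rho^2\frac{(y - m_2)^2}{\sigma_2^2} + \frac{(y - m_2)^2}{\sigma_2^2} - \rho^2\frac{(y - m_2)^2}{\sigma_2^2}$, i.e.,

$$\frac{(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{(y - m_2)^2}{\sigma_2^2} = \left(\frac{x - m_1}{\sigma_1} - \rho\frac{y - m_2}{\sigma_2}\right)^2 + \left(1 - \rho^2\right)\left(\frac{y - m_2}{\sigma_2}\right)^2, \tag{5.1.17}$$

where $\alpha = \left\{2\left(1 - \rho^2\right)\right\}^{-1}$. Now, noting that $\sqrt{\frac{\alpha}{\pi}}\exp\left(-\alpha x^2\right) \to \delta(x)$ for $\alpha \to \infty$ as shown in Example 1.4.6 and that $\alpha \to \infty$ for $\rho \to \pm 1$, we can obtain (5.1.15) from (5.1.16). ♠

Based on $f(x)\delta(x - b) = f(b)\delta(x - b)$ shown in (1.4.42) and the property $\delta(ax) = \frac{1}{|a|}\delta(x)$ shown in (1.4.49) of the impulse function, the degenerate pdf (5.1.15) can be expressed in various equivalent formulas. For instance, the term $\exp\left\{-\frac{(y - m_2)^2}{2\sigma_2^2}\right\}$ in (5.1.15) can be replaced with $\exp\left\{-\frac{(x - m_1)^2}{2\sigma_1^2}\right\}$ or $\exp\left\{-\xi\frac{(x - m_1)(y - m_2)}{2\sigma_1\sigma_2}\right\}$, and the term $\frac{1}{\sigma_1\sigma_2}\delta\left(\frac{x - m_1}{\sigma_1} - \xi\frac{y - m_2}{\sigma_2}\right)$ can be replaced with $\delta\left(\sigma_2(x - m_1) - \xi\sigma_1(y - m_2)\right)$.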
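The limiting step in the proof of Theorem 5.1.3 relies on $\sqrt{\frac{\alpha}{\pi}}\exp\left(-\alpha x^2\right)$ acting like $\delta(x)$ as $\alpha = \left\{2\left(1 - \rho^2\right)\right\}^{-1} \to \infty$. A small numerical illustration, sketched in standard-library Python, integrates this kernel against a test function $g$ and watches the result approach $g(0)$. The choice $g = \cos$ (for which the exact value of the integral is $\exp\left\{-\frac{1}{2}\left(1 - \rho^2\right)\right\}$), the integration window of ten standard deviations, and the function name are assumptions of this sketch.

```python
import math


def kernel_average(g, rho, steps=4000):
    """Midpoint-rule value of the integral of sqrt(alpha/pi) exp(-alpha x^2) g(x) dx."""
    alpha = 1.0 / (2.0 * (1.0 - rho * rho))
    sigma = math.sqrt(1.0 / (2.0 * alpha))  # standard deviation of the Gaussian kernel
    half_width = 10.0 * sigma               # the kernel is negligible outside this window
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += math.sqrt(alpha / math.pi) * math.exp(-alpha * x * x) * g(x)
    return total * h


for rho in [0.9, 0.99, 0.999]:
    print(rho, kernel_average(math.cos, rho), math.exp(-0.5 * (1.0 - rho * rho)))
```

As $\rho$ approaches 1 the printed averages should move toward $\cos(0) = 1$, which is the sifting behavior used to pass from (5.1.16) to (5.1.15).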

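5.1.3 Tri-variate Normal Random Vectors

For a standard tri-variate normal random vector $(X_1, X_2, X_3)$, let us denote the covariance matrix as

$$K_3 = \begin{bmatrix} 1 & \rho_{12} & \rho_{31} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{31} & \rho_{23} & 1 \end{bmatrix} \tag{5.1.18}$$

and the pdf as $f_3(x, y, z) = \frac{1}{\sqrt{8\pi^3 |K_3|}} \exp\left\{-\frac{1}{2}(x\ y\ z) K_3^{-1} (x\ y\ z)^T\right\}$, i.e.,

$$f_3(x, y, z) = \frac{1}{\sqrt{8\pi^3 |K_3|}} \exp\left[-\frac{1}{2|K_3|}\left\{\left(1 - \rho_{23}^2\right)x^2 + \left(1 - \rho_{31}^2\right)y^2 + \left(1 - \rho_{12}^2\right)z^2 + 2c_{12}xy + 2c_{23}yz + 2c_{31}zx\right\}\right], \tag{5.1.19}$$

where $c_{ij} = \rho_{jk}\rho_{ki} - \rho_{ij}$. Then, we have $f_3(0, 0, 0) = \left(8\pi^3 |K_3|\right)^{-\frac{1}{2}}$,

$$\begin{aligned} |K_3| &= 1 - \left(\rho_{12}^2 + \rho_{23}^2 + \rho_{31}^2\right) + 2\rho_{12}\rho_{23}\rho_{31} \\ &= \left(1 - \rho_{jk}^2\right)\left(1 - \rho_{ki}^2\right) - c_{ij}^2 \\ &= \alpha_{ij,k}^2\left(1 - \beta_{ij,k}^2\right), \end{aligned} \tag{5.1.20}$$

and

$$K_3^{-1} = \frac{1}{|K_3|}\begin{bmatrix} 1 - \rho_{23}^2 & c_{12} & c_{31} \\ c_{12} & 1 - \rho_{31}^2 & c_{23} \\ c_{31} & c_{23} & 1 - \rho_{12}^2 \end{bmatrix}, \tag{5.1.21}$$

where

$$\alpha_{ij,k} = \sqrt{\left(1 - \rho_{jk}^2\right)\left(1 - \rho_{ki}^2\right)} \tag{5.1.22}$$

and $\beta_{ij,k} = \frac{-c_{ij}}{\sqrt{\left(1 - \rho_{jk}^2\right)\left(1 - \rho_{ki}^2\right)}}$, i.e.,

$$\beta_{ij,k} = \frac{\rho_{ij} - \rho_{jk}\rho_{ki}}{\alpha_{ij,k}} \tag{5.1.23}$$

denotes the partial correlation coefficient between $X_i$ and $X_j$ when $X_k$ is given.

Example 5.1.7 Note that we have $\rho_{ij} = \rho_{jk}\rho_{ki}$ and $|K_3| = \alpha_{ij,k}^2 = \left(1 - \rho_{jk}^2\right)\left(1 - \rho_{ki}^2\right)$ when $\beta_{ij,k} = 0$. In addition, for $\rho_{ij} \to \pm 1$, we have $\rho_{jk} \to \mathrm{sgn}\left(\rho_{ij}\right)\rho_{ki}$ because $X_i \to \mathrm{sgn}\left(\rho_{ij}\right)X_j$. Thus, when $\rho_{ij} \to \pm 1$, we have $\beta_{ij,k} \to \mathrm{sgn}\left(\rho_{ij}\right)\frac{1 - \rho_{jk}^2}{1 - \rho_{jk}^2} = \mathrm{sgn}\left(\rho_{ij}\right)$ and $|K_3| \to -\left\{\rho_{jk} - \mathrm{sgn}\left(\rho_{ij}\right)\rho_{ki}\right\}^2 \to 0$. ♦

Note also that $f_3(0, y, z) = f_3(0, 0, 0)\exp\left\{-\frac{1}{2}(0\ y\ z)K_3^{-1}(0\ y\ z)^T\right\} = f_3(0, 0, 0)\exp\left[-\frac{1}{2|K_3|}\left\{\left(1 - \rho_{31}^2\right)y^2 + 2c_{23}yz + \left(1 - \rho_{12}^2\right)z^2\right\}\right]$, i.e.,

$$f_3(0, y, z) = f_3(0, 0, 0)\exp\left[-\frac{1}{2\left(1 - \beta_{23,1}^2\right)}\left\{\frac{y^2}{1 - \rho_{12}^2} - \frac{2\beta_{23,1}yz}{\alpha_{23,1}} + \frac{z^2}{1 - \rho_{31}^2}\right\}\right] \tag{5.1.24}$$

and

$$f_3(0, 0, z) = f_3(0, 0, 0)\exp\left\{-\frac{1 - \rho_{12}^2}{2|K_3|}z^2\right\}. \tag{5.1.25}$$

Example 5.1.8 Based on (5.1.24) and (5.1.25), we have $\int_{-\infty}^{\infty} h_1(z) f_3(0, 0, z)\, dz = \frac{1}{\sqrt{8\pi^3 |K_3|}}\int_{-\infty}^{\infty} h_1(z)\exp\left(-\frac{z^2}{2\epsilon_{23,1}^2}\right) dz = \frac{\epsilon_{23,1}}{\sqrt{8\pi^3 |K_3|}}\int_{-\infty}^{\infty} h_1\left(\epsilon_{23,1}w\right)\exp\left(-\frac{w^2}{2}\right) dw$, i.e.,

$$\begin{aligned} \int_{-\infty}^{\infty} h_1(z) f_3(0, 0, z)\, dz &= \frac{1}{\sqrt{8\pi^3\left(1 - \rho_{12}^2\right)}}\int_{-\infty}^{\infty} h_1\left(\frac{\sqrt{|K_3|}}{\sqrt{1 - \rho_{12}^2}}w\right)\exp\left(-\frac{w^2}{2}\right) dw \\ &= \frac{1}{2\pi\sqrt{1 - \rho_{12}^2}}\, E\left\{h_1\left(\frac{\sqrt{|K_3|}}{\sqrt{1 - \rho_{12}^2}}U\right)\right\} \end{aligned} \tag{5.1.26}$$

for a uni-variate function $h_1$, where $\epsilon_{ij,k}^2 = \left(1 - \beta_{ij,k}^2\right)\left(1 - \rho_{jk}^2\right)$ and $U \sim N(0, 1)$. We also have

$$\begin{aligned} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_2(y, z) f_3(0, y, z)\, dy\, dz &= \frac{1}{\sqrt{8\pi^3 |K_3|}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_2(y, z)\exp\left[-\frac{1}{2\left(1 - \beta_{23,1}^2\right)}\left\{\frac{y^2}{1 - \rho_{12}^2} - \frac{2\beta_{23,1}yz}{\alpha_{23,1}} + \frac{z^2}{1 - \rho_{31}^2}\right\}\right] dy\, dz \\ &= \frac{\alpha_{23,1}}{\sqrt{8\pi^3 |K_3|}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_2\left(\sqrt{1 - \rho_{12}^2}\,v, \sqrt{1 - \rho_{31}^2}\,w\right)\exp\left\{-\frac{v^2 - 2\beta_{23,1}vw + w^2}{2\left(1 - \beta_{23,1}^2\right)}\right\} dv\, dw \\ &= \frac{2\pi\alpha_{23,1}\sqrt{1 - \beta_{23,1}^2}}{\sqrt{8\pi^3 |K_3|}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_2\left(\sqrt{1 - \rho_{12}^2}\,v, \sqrt{1 - \rho_{31}^2}\,w\right)\left. f_2(v, w)\right|_{\rho = \beta_{23,1}} dv\, dw \\ &= \frac{1}{\sqrt{2\pi}}\, E\left\{h_2\left(\sqrt{1 - \rho_{12}^2}\,V_1, \sqrt{1 - \rho_{31}^2}\,V_2\right)\right\} \end{aligned} \tag{5.1.27}$$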


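for a bi-variate function $h_2$, where $(V_1, V_2) \sim N\left(0, 0, 1, 1, \beta_{23,1}\right)$. The two results (5.1.26) and (5.1.27) are useful in obtaining the expected values of some non-linear functions. ♦

Denote by $g_\rho(x, y, z)$ the standard tri-variate normal pdf $f_3(x, y, z)$ with the covariance matrix $K_3$ shown in (5.1.18) so that the correlation coefficients $\rho = (\rho_{12}, \rho_{23}, \rho_{31})$ are shown explicitly. Then, we have

$$g_\rho(-x, y, z) = \left. g_\rho(x, y, z)\right|_{1}, \tag{5.1.28}$$

$$g_\rho(x, -y, z) = \left. g_\rho(x, y, z)\right|_{2}, \tag{5.1.29}$$

$$g_\rho(x, y, -z) = \left. g_\rho(x, y, z)\right|_{3}, \tag{5.1.30}$$

and

$$\begin{aligned} \int_{-\infty}^{0}\int_0^{\infty}\int_0^{\infty} h(x, y, z) g_\rho(x, y, z)\, dx\, dy\, dz &= \int_{\infty}^{0}\int_0^{\infty}\int_0^{\infty} h(x, y, -t) g_\rho(x, y, -t)\, dx\, dy\, (-dt) \\ &= \int_0^{\infty}\int_0^{\infty}\int_0^{\infty} h(x, y, -z) \left. g_\rho(x, y, z)\right|_{3} dx\, dy\, dz \end{aligned} \tag{5.1.31}$$

for a tri-variate function $h(x, y, z)$. Here, $\left. \cdot\, \right|_{k}$ denotes the replacements of the correlation coefficients $\rho_{jk}$ and $\rho_{ki}$ with $-\rho_{jk}$ and $-\rho_{ki}$, respectively.

Example 5.1.9 We have

$$\beta_{ij,k} = \beta_{ji,k}, \tag{5.1.32}$$

$$\left. \beta_{ij,k}\right|_{i} = -\beta_{ij,k}, \qquad \left. \beta_{ij,k}\right|_{k} = \beta_{ij,k}, \tag{5.1.33}$$

$$\frac{\partial}{\partial\rho_{ij}}\sin^{-1}\beta_{ij,k} = \frac{1}{\sqrt{|K_3|}}, \tag{5.1.34}$$

and

$$\frac{\partial}{\partial\rho_{jk}}\sin^{-1}\beta_{ij,k} = -\frac{\rho_{ki} - \rho_{ij}\rho_{jk}}{\sqrt{|K_3|}\left(1 - \rho_{jk}^2\right)} \tag{5.1.35}$$

for the standard tri-variate normal distribution. ♦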

After steps similar to those used in obtaining (5.1.15), we can obtain the following theorem:


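Theorem 5.1.4 Letting $\xi_{ij} = \mathrm{sgn}\left(\rho_{ij}\right)$, we have

$$f_3(x, y, z) \to \frac{\exp\left(-\frac{1}{2}x^2\right)}{2\pi\sqrt{1 - \rho_{31}^2}}\exp\left[-\frac{\{z - \mu_1(x, y)\}^2}{2\left(1 - \rho_{31}^2\right)}\right]\delta\left(x - \xi_{12}y\right) \tag{5.1.36}$$

when $\rho_{12} \to \pm 1$, where $\mu_1(x, y) = \frac{1}{2}\xi_{12}\left(\rho_{23}x + \rho_{31}y\right)$. We subsequently have

$$f_3(x, y, z) \to \frac{\exp\left(-\frac{1}{2}x^2\right)}{\sqrt{2\pi}}\,\delta\left(x - \xi_{12}y\right)\delta\left(x - \xi_{31}z\right) \tag{5.1.37}$$

when $\rho_{12} \to \pm 1$ and $\rho_{31} \to \pm 1$.

Proof The proof is discussed in Exercise 5.41. ♠

In (5.1.36), we can replace $\rho_{31}^2$ with $\rho_{23}^2$ and $\exp\left(-\frac{1}{2}x^2\right)$ with $\exp\left(-\frac{1}{2}y^2\right)$. The 'mean' $\mu_1(x, y)$ of $X_3$, when $(X_1, X_2, X_3)$ has the pdf (5.1.36), can be written also as $\mu_1(x, y) = \frac{1}{2}\xi_{12}\left(\xi_{12}\rho_{31}x + \rho_{31}y\right) = \frac{1}{2}\xi_{12}\rho_{31}\left(\xi_{12}x + y\right)$ because, due to the condition $|K_3| \ge 0$, the result $\lim_{\rho_{12} \to \pm 1}|K_3| = -\left(\rho_{23} \mp \rho_{31}\right)^2$ requires that $\rho_{23} \to \rho_{12}\rho_{31}$, i.e., $\rho_{23} = \xi_{12}\rho_{31}$ when $\rho_{12} \to \pm 1$. In addition, because of the function $\delta\left(x - \xi_{12}y\right)$, the mean can further be rewritten as $\mu_1(x, y) = \frac{1}{2}\xi_{12}\rho_{31}\left(\xi_{12}x + \xi_{12}x\right) = \rho_{31}x$ or as $\mu_1(x, y) = \rho_{23}y$. Similarly, (5.1.37) can be expressed in various equivalent formulas: for instance, $\exp\left(-\frac{1}{2}x^2\right)$ can be replaced with $\exp\left(-\frac{1}{2}y^2\right)$ or $\exp\left(-\frac{1}{2}z^2\right)$ and $\delta\left(x - \xi_{31}z\right)$ can be replaced with $\delta\left(z - \xi_{31}x\right)$ or $\delta\left(y - \xi_{23}z\right)$.

The result (5.1.37) in Theorem 5.1.4 can be generalized as follows:

Theorem 5.1.5 For the pdf $f_X(x)$ shown in (5.1.1), we have

$$f_X(x) \to \frac{\exp\left\{-\frac{(x_1 - m_1)^2}{2\sigma_1^2}\right\}}{\sqrt{2\pi\sigma_1^2}}\prod_{i=2}^{n}\frac{1}{\sigma_i}\,\delta\left(\frac{x_1 - m_1}{\sigma_1} - \xi_{1i}\frac{x_i - m_i}{\sigma_i}\right) \tag{5.1.38}$$

when $\rho_{1j} \to \pm 1$ for $j = 2, 3, \ldots, n$, where $\xi_{1j} = \mathrm{sgn}\left(\rho_{1j}\right)$.

Note in Theorem 5.1.5 that, when $\rho_{1j} \to \pm 1$ for $j \in \{2, 3, \ldots, n\}$, the value of $\rho_{ij}$ for $i \in \{2, 3, \ldots, n\}$ and $j \in \{2, 3, \ldots, n\}$ is determined as $\rho_{ij} \to \rho_{1i}\rho_{1j} = \xi_{1i}\xi_{1j}$. In the tri-variate case, for instance, when $\rho_{12} \to 1$ and $\rho_{31} \to 1$, we have $\rho_{23} \to 1$ from $\lim_{\rho_{12}, \rho_{13} \to 1}|K_3| = -\left(1 - \rho_{23}\right)^2 \ge 0$.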
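The tri-variate identities of this section lend themselves to a quick numerical check. The sketch below, in standard-library Python, evaluates the determinant in (5.1.20), builds the inverse from the cofactor form (5.1.21), forms the partial correlation coefficient $\beta_{12,3}$ from (5.1.22) and (5.1.23), and evaluates the determinant at $\rho_{12} = \rho_{31} = 1$. The function names and the sample correlation values are assumptions chosen for illustration.

```python
import math


def k3(r12, r23, r31):
    """Covariance matrix K_3 of (5.1.18)."""
    return [[1.0, r12, r31], [r12, 1.0, r23], [r31, r23, 1.0]]


def det_k3(r12, r23, r31):
    """First line of (5.1.20)."""
    return 1.0 - (r12 ** 2 + r23 ** 2 + r31 ** 2) + 2.0 * r12 * r23 * r31


def inverse_k3(r12, r23, r31):
    """Inverse written as in (5.1.21), with c_ij = rho_jk rho_ki - rho_ij."""
    d = det_k3(r12, r23, r31)
    c12 = r23 * r31 - r12
    c23 = r31 * r12 - r23
    c31 = r12 * r23 - r31
    return [
        [(1.0 - r23 ** 2) / d, c12 / d, c31 / d],
        [c12 / d, (1.0 - r31 ** 2) / d, c23 / d],
        [c31 / d, c23 / d, (1.0 - r12 ** 2) / d],
    ]


def alpha_beta_12_3(r12, r23, r31):
    """alpha_{12,3} of (5.1.22) and partial correlation beta_{12,3} of (5.1.23)."""
    alpha = math.sqrt((1.0 - r23 ** 2) * (1.0 - r31 ** 2))
    beta = (r12 - r23 * r31) / alpha
    return alpha, beta


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


# Illustrative (assumed) correlation coefficients giving a positive definite K_3.
r = (0.5, -0.3, 0.2)
alpha, beta = alpha_beta_12_3(*r)
print(det_k3(*r), alpha ** 2 * (1.0 - beta ** 2))
print(matmul3(k3(*r), inverse_k3(*r)))
print(det_k3(1.0, 0.4, 1.0), -(1.0 - 0.4) ** 2)
```

The first line should show two equal numbers, the second should be close to the identity matrix, and the third illustrates that at $\rho_{12} = \rho_{31} = 1$ the determinant equals $-\left(1 - \rho_{23}\right)^2$, which is non-negative only when $\rho_{23} = 1$.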


5.2 Properties

In this section, we discuss the properties (Hamedani 1984; Horn and Johnson 1985; Melnick and Tenenbein 1982; Mihram 1969; Pierce and Dykstra 1969) of normal random vectors. Some of the properties we will discuss in this chapter are based on those described in Chap. 4. We will also present properties unique to normal random vectors.

5.2.1 Distributions of Subvectors and Conditional Distributions

For $X = (X_1, X_2, \ldots, X_n)^T \sim N(m, K)$, let us partition the covariance matrix $K$ and its inverse matrix $K^{-1}$ between the $s$-th and $(s + 1)$-st rows, and between the $s$-th and $(s + 1)$-st columns also, as

$$K = \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} \tag{5.2.1}$$

and

$$K^{-1} = \begin{pmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{21} & \Psi_{22} \end{pmatrix}. \tag{5.2.2}$$

Then, we have $K_{ii} = K_{ii}^T$ and $\Psi_{ii} = \Psi_{ii}^T$ for $i = 1, 2$, $K_{21} = K_{12}^T$, and $\Psi_{21} = \Psi_{12}^T$. We also have

$$\Psi_{11} = K_{11}^{-1} + K_{11}^{-1}K_{12}\,\xi^{-1}K_{21}K_{11}^{-1}, \tag{5.2.3}$$

$$\Psi_{12} = -K_{11}^{-1}K_{12}\,\xi^{-1}, \tag{5.2.4}$$

$$\Psi_{21} = -\xi^{-1}K_{21}K_{11}^{-1}, \tag{5.2.5}$$

and

$$\Psi_{22} = \xi^{-1}, \tag{5.2.6}$$

where $\xi = K_{22} - K_{21}K_{11}^{-1}K_{12}$.

Theorem 5.2.1 Assume $X = (X_1, X_2, \ldots, X_n)^T \sim N(m, K)$ and the partition of the covariance matrix $K$ as described in (5.2.1). Then, for a subvector $X_{(2)} = (X_{s+1}, X_{s+2}, \ldots, X_n)^T$ of $X$, we have (Johnson and Kotz 1972)

5.2 Properties

349

  X (2) ∼ N m(2) , K 22 ,

(5.2.7)

where $\boldsymbol{m}_{(2)} = (m_{s+1}, m_{s+2}, \ldots, m_n)^T$.

In other words, any subvector of a normal random vector is a normal random vector. Theorem 5.2.1 also implies that every element of a normal random vector is a normal random variable, which we have already observed in Example 5.1.2. However, it should again be noted that the converse of Theorem 5.2.1 does not hold true, as we can see in the example below.

Example 5.2.1 (Romano and Siegel 1986) Assume the joint pdf

\[
f_{X,Y}(x, y) = \begin{cases} 2g(x, y), & xy \ge 0, \\ 0, & xy < 0 \end{cases} \tag{5.2.8}
\]

of $(X, Y)$, where $g(x, y) = \frac{1}{2\pi}\exp\left\{-\frac{1}{2}\left(x^2 + y^2\right)\right\}$. Then, we have the marginal pdf

\[
f_X(x) = \begin{cases} \frac{1}{\pi}\exp\left(-\frac{x^2}{2}\right)\int_{-\infty}^{0}\exp\left(-\frac{y^2}{2}\right)dy, & x < 0, \\ \frac{1}{\pi}\exp\left(-\frac{x^2}{2}\right)\int_{0}^{\infty}\exp\left(-\frac{y^2}{2}\right)dy, & x \ge 0 \end{cases}
\;=\; \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right) \tag{5.2.9}
\]

of $X$. We can similarly show that $Y$ is also a normal random variable. In other words, although $X$ and $Y$ are both normal random variables, $(X, Y)$ is not a normal random vector. ♦

Theorem 5.2.2 Assume $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T \sim \mathcal{N}(\boldsymbol{m}, \boldsymbol{K})$ and the partition of the inverse $\boldsymbol{K}^{-1}$ of the covariance matrix $\boldsymbol{K}$ as described in (5.2.2). Then, we have the conditional distribution (Johnson and Kotz 1972)

\[
\mathcal{N}\left(\boldsymbol{m}_{(1)}^T - \left(\boldsymbol{x}_{(2)} - \boldsymbol{m}_{(2)}\right)^T\boldsymbol{\Psi}_{21}\boldsymbol{\Psi}_{11}^{-1},\; \boldsymbol{\Psi}_{11}^{-1}\right) \tag{5.2.10}
\]

of $\boldsymbol{X}_{(1)} = (X_1, X_2, \ldots, X_s)^T$ when $\boldsymbol{X}_{(2)} = (X_{s+1}, X_{s+2}, \ldots, X_n)^T = \boldsymbol{x}_{(2)} = (x_{s+1}, x_{s+2}, \ldots, x_n)^T$ is given, where $\boldsymbol{m}_{(1)} = (m_1, m_2, \ldots, m_s)^T$ and $\boldsymbol{m}_{(2)} = (m_{s+1}, m_{s+2}, \ldots, m_n)^T$.

Example 5.2.2 From the joint pdf (5.1.8) of $\mathcal{N}(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho)$ and the pdf of $\mathcal{N}(m_2, \sigma_2^2)$, the conditional pdf $f_{X|Y}(x|y)$ for a normal random vector $(X, Y) \sim \mathcal{N}(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho)$ can be obtained as

\[
f_{X|Y}(x|y) = \frac{1}{\sqrt{2\pi\sigma_1^2\left(1 - \rho^2\right)}}\exp\left\{-\frac{\left(x - m_{X|Y=y}\right)^2}{2\sigma_1^2\left(1 - \rho^2\right)}\right\}, \tag{5.2.11}
\]

where $m_{X|Y=y} = m_1 + \rho\frac{\sigma_1}{\sigma_2}(y - m_2)$. In short, the distribution of $X$ given $Y = y$ is $\mathcal{N}\left(m_{X|Y=y}, \sigma_1^2(1 - \rho^2)\right)$ for $(X, Y) \sim \mathcal{N}(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho)$. This result can also be obtained from (5.2.10) using $\boldsymbol{m}_{(1)}^T = m_1$, $\left(\boldsymbol{x}_{(2)} - \boldsymbol{m}_{(2)}\right)^T = y - m_2$, $\boldsymbol{\Psi}_{21} = -\frac{\rho}{\sigma_1\sigma_2(1 - \rho^2)}$, and $\boldsymbol{\Psi}_{11}^{-1} = \sigma_1^2(1 - \rho^2)$. ♦

Theorem 5.2.3 If a normal random vector is an uncorrelated random vector, then it is an independent random vector.

In general, two uncorrelated random variables are not necessarily independent, as we have discussed in Chap. 4. Theorem 5.2.3 tells us that two jointly normal random variables are independent of each other if they are uncorrelated, which can promptly be confirmed because we have $f_{X,Y}(x, y) = f_X(x)f_Y(y)$ when $\rho = 0$ in the two-dimensional normal pdf (5.1.8). In Theorem 5.2.3, the key point is that the two random variables are jointly normal (Wies and Hall 1993): in other words, if two normal random variables are not jointly normal but are only marginally normal, they may or may not be independent when they are uncorrelated.

Example 5.2.3 (Stoyanov 2013) Let $\phi_1(x, y)$ and $\phi_2(x, y)$ be two standard bivariate normal pdf's with correlation coefficients $\rho_1$ and $\rho_2$, respectively. Assume that the random vector $(X, Y)$ has the joint pdf

\[
f_{X,Y}(x, y) = c_1\phi_1(x, y) + c_2\phi_2(x, y), \tag{5.2.12}
\]

where $c_1 > 0$, $c_2 > 0$, and $c_1 + c_2 = 1$. Then, when $\rho_1 \ne \rho_2$, $f_{X,Y}$ is not a normal pdf and, therefore, $(X, Y)$ is not a normal random vector. Now, we have $X \sim \mathcal{N}(0, 1)$, $Y \sim \mathcal{N}(0, 1)$, and the correlation coefficient between $X$ and $Y$ is $\rho_{XY} = c_1\rho_1 + c_2\rho_2$. If we choose $c_1 = \frac{\rho_2}{\rho_2 - \rho_1}$ and $c_2 = \frac{\rho_1}{\rho_1 - \rho_2}$ for $\rho_1\rho_2 < 0$, then $c_1 > 0$, $c_2 > 0$, $c_1 + c_2 = 1$, and $\rho_{XY} = 0$. In short, although $X$ and $Y$ are both normal and uncorrelated with each other, they are not independent of each other because $(X, Y)$ is not a normal random vector. ♦

Based on Theorem 5.2.3, we can show the following theorem:

Theorem 5.2.4 For a normal random vector $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$, consider $k$ non-overlapping subvectors $\boldsymbol{X}_1 = \left(X_{i_1}, X_{i_2}, \ldots, X_{i_{n_1}}\right)$, $\boldsymbol{X}_2 = \left(X_{j_1}, X_{j_2}, \ldots, X_{j_{n_2}}\right)$, $\ldots$, $\boldsymbol{X}_k = \left(X_{l_1}, X_{l_2}, \ldots, X_{l_{n_k}}\right)$, where $\sum_{j=1}^{k} n_j = n$. If $\rho_{ij} = 0$ for every choice of $i \in S_a$ and $j \in S_b$, with $a \in \{1, 2, \ldots, k\}$, $b \in \{1, 2, \ldots, k\}$, and $a \ne b$, then $\boldsymbol{X}_1, \boldsymbol{X}_2, \ldots, \boldsymbol{X}_k$ are independent of each other, where $S_1 = \{i_1, i_2, \ldots, i_{n_1}\}$, $S_2 = \{j_1, j_2, \ldots, j_{n_2}\}$, $\ldots$, and $S_k = \{l_1, l_2, \ldots, l_{n_k}\}$.

Example 5.2.4 For a normal random vector $\boldsymbol{X} = (X_1, X_2, \ldots, X_5)$ with the covariance matrix

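\[
\boldsymbol{K} = \begin{pmatrix}
1 & 0 & \rho_{13} & \rho_{14} & 0 \\
0 & 1 & 0 & 0 & \rho_{25} \\
\rho_{31} & 0 & 1 & \rho_{34} & 0 \\
\rho_{41} & 0 & \rho_{43} & 1 & 0 \\
0 & \rho_{52} & 0 & 0 & 1
\end{pmatrix}, \tag{5.2.13}
\]

the subvectors $\boldsymbol{X}_1 = (X_1, X_3, X_4)$ and $\boldsymbol{X}_2 = (X_2, X_5)$ are independent of each other. ♦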
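The grouping in Example 5.2.4 can be found mechanically. The following is a minimal sketch, not part of the original text: it treats the components as nodes of a graph with an edge wherever the correlation coefficient is nonzero, and returns the connected components, which form a partition satisfying the hypothesis of Theorem 5.2.4. The function name `independent_groups` and the numeric values substituted for the nonzero entries of (5.2.13) are assumptions made only for illustration.

```python
def independent_groups(K, tol=0.0):
    """Return index groups (0-based) that are the connected components of
    the graph with an edge i-j whenever |K[i][j]| > tol for i != j."""
    n = len(K)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        stack, comp = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if not seen[j] and j != i and abs(K[i][j]) > tol:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(comp))
    return groups


# Hypothetical numeric values for the nonzero entries of (5.2.13).
r13, r14, r34, r25 = 0.3, -0.2, 0.5, 0.4
K = [
    [1.0, 0.0, r13, r14, 0.0],
    [0.0, 1.0, 0.0, 0.0, r25],
    [r13, 0.0, 1.0, r34, 0.0],
    [r14, 0.0, r34, 1.0, 0.0],
    [0.0, r25, 0.0, 0.0, 1.0],
]

# Convert to the 1-based component labels used in the text.
groups = [[i + 1 for i in g] for g in independent_groups(K)]
print(groups)
```

For a jointly normal vector, the groups returned this way are mutually independent by Theorem 5.2.4; for vectors that are only marginally normal, zero correlation would not justify that conclusion.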

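5.2.2 Linear Transformations

Let us first consider a generalization of the result obtained in Example 5.1.5 that the sum of two jointly normal random variables is a normal random variable.

Theorem 5.2.5 When the random variables $\left\{X_i \sim \mathcal{N}\left(m_i, \sigma_i^2\right)\right\}_{i=1}^{n}$ are independent of each other, we have

\[
\sum_{i=1}^{n} X_i \sim \mathcal{N}\left(\sum_{i=1}^{n} m_i,\; \sum_{i=1}^{n}\sigma_i^2\right). \tag{5.2.14}
\]

Proof Because the cf of $X_i$ is $\varphi_{X_i}(\omega) = \exp\left(jm_i\omega - \frac{1}{2}\sigma_i^2\omega^2\right)$, we can obtain the cf $\varphi_Y(\omega) = \prod_{i=1}^{n}\exp\left(jm_i\omega - \frac{1}{2}\sigma_i^2\omega^2\right) = \exp\left\{\sum_{i=1}^{n}\left(jm_i\omega - \frac{1}{2}\sigma_i^2\omega^2\right)\right\}$ of $Y = \sum_{i=1}^{n} X_i$ as

\[
\varphi_Y(\omega) = \exp\left\{j\left(\sum_{i=1}^{n} m_i\right)\omega - \frac{1}{2}\left(\sum_{i=1}^{n}\sigma_i^2\right)\omega^2\right\} \tag{5.2.15}
\]

using (4.3.32). This result implies (5.2.14). ♠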

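Generalizing Theorem 5.2.5 further, we have the following theorem that a linear transformation of a normal random vector is also a normal random vector:

Theorem 5.2.6 When $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T \sim \mathcal{N}(\boldsymbol{m}, \boldsymbol{K})$, we have $\boldsymbol{L}\boldsymbol{X} \sim \mathcal{N}\left(\boldsymbol{L}\boldsymbol{m}, \boldsymbol{L}\boldsymbol{K}\boldsymbol{L}^T\right)$ when $\boldsymbol{L}$ is an $n \times n$ matrix such that $|\boldsymbol{L}| \ne 0$.

Proof Let $\boldsymbol{Y} = \boldsymbol{L}\boldsymbol{X}$. First, we have $\boldsymbol{X} = \boldsymbol{L}^{-1}\boldsymbol{Y}$ because $|\boldsymbol{L}| \ne 0$, and the Jacobian of the inverse transformation $\boldsymbol{x} = \boldsymbol{g}^{-1}(\boldsymbol{y}) = \boldsymbol{L}^{-1}\boldsymbol{y}$ is $\left|\frac{\partial}{\partial\boldsymbol{y}}\boldsymbol{g}^{-1}(\boldsymbol{y})\right| = \left|\boldsymbol{L}^{-1}\right| = \frac{1}{|\boldsymbol{L}|}$. Thus, we have the pdf $f_{\boldsymbol{Y}}(\boldsymbol{y}) = \frac{1}{|\boldsymbol{L}|}\left. f_{\boldsymbol{X}}(\boldsymbol{x})\right|_{\boldsymbol{x} = \boldsymbol{L}^{-1}\boldsymbol{y}}$ of $\boldsymbol{Y}$ as

\[
f_{\boldsymbol{Y}}(\boldsymbol{y}) = \frac{\exp\left\{-\frac{1}{2}\left(\boldsymbol{L}^{-1}\boldsymbol{y} - \boldsymbol{m}\right)^T\boldsymbol{K}^{-1}\left(\boldsymbol{L}^{-1}\boldsymbol{y} - \boldsymbol{m}\right)\right\}}{|\boldsymbol{L}|\sqrt{(2\pi)^n|\boldsymbol{K}|}} \tag{5.2.16}
\]

from Theorem 4.2.1. Now, note that $\left(\boldsymbol{L}^{-1}\right)^T = \left(\boldsymbol{L}^T\right)^{-1}$, $\left|\boldsymbol{L}^T\right| = |\boldsymbol{L}|$, and $\left(\boldsymbol{L}^{-1}\boldsymbol{y} - \boldsymbol{m}\right)^T\boldsymbol{K}^{-1}\left(\boldsymbol{L}^{-1}\boldsymbol{y} - \boldsymbol{m}\right) = (\boldsymbol{y} - \boldsymbol{L}\boldsymbol{m})^T\left(\boldsymbol{L}^{-1}\right)^T\boldsymbol{K}^{-1}\boldsymbol{L}^{-1}(\boldsymbol{y} - \boldsymbol{L}\boldsymbol{m})$. In addition, letting $\boldsymbol{H} = \boldsymbol{L}\boldsymbol{K}\boldsymbol{L}^T$, we have $\boldsymbol{H}^{-1} = \left(\boldsymbol{L}^T\right)^{-1}\boldsymbol{K}^{-1}\boldsymbol{L}^{-1} = \left(\boldsymbol{L}^{-1}\right)^T\boldsymbol{K}^{-1}\boldsymbol{L}^{-1}$ and $|\boldsymbol{H}| = \left|\boldsymbol{L}\boldsymbol{K}\boldsymbol{L}^T\right| = |\boldsymbol{L}|^2|\boldsymbol{K}|$. Then, we can rewrite (5.2.16) as

\[
f_{\boldsymbol{Y}}(\boldsymbol{y}) = \frac{1}{\sqrt{(2\pi)^n|\boldsymbol{H}|}}\exp\left\{-\frac{1}{2}(\boldsymbol{y} - \boldsymbol{L}\boldsymbol{m})^T\boldsymbol{H}^{-1}(\boldsymbol{y} - \boldsymbol{L}\boldsymbol{m})\right\}, \tag{5.2.17}
\]

which implies $\boldsymbol{L}\boldsymbol{X} \sim \mathcal{N}\left(\boldsymbol{L}\boldsymbol{m}, \boldsymbol{L}\boldsymbol{K}\boldsymbol{L}^T\right)$ when $\boldsymbol{X} \sim \mathcal{N}(\boldsymbol{m}, \boldsymbol{K})$. ♠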
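As a numerical illustration of Theorem 5.2.6, the parameters $\boldsymbol{L}\boldsymbol{m}$ and $\boldsymbol{L}\boldsymbol{K}\boldsymbol{L}^T$ can be computed directly. The sketch below is not from the original text; the helper names are hypothetical, and the values ($\boldsymbol{m} = (10, 0)^T$, variances 4 and 1, correlation coefficient 0.5, and the sum/difference matrix $\boldsymbol{L}$) are chosen only as an example.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def linear_transform(L, m, K):
    """Return (L m, L K L^T), the mean and covariance of L X for X ~ N(m, K)."""
    mean = [sum(l * mi for l, mi in zip(row, m)) for row in L]
    cov = matmul(matmul(L, K), transpose(L))
    return mean, cov


# Illustrative values: means 10 and 0, variances 4 and 1, rho = 0.5,
# so the covariance is 0.5 * 2 * 1 = 1.
m = [10.0, 0.0]
K = [[4.0, 1.0], [1.0, 1.0]]
L = [[1.0, 1.0], [1.0, -1.0]]   # maps (X, Y) to (X + Y, X - Y)

mean, cov = linear_transform(L, m, K)
print(mean, cov)
```

The resulting correlation coefficient is `cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5`, which equals $\sqrt{3/7}$ for these values.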

Theorem 5.2.6 is a combined generalization of the facts that the sum of two jointly normal random variables is a normal random variable, as described in Example 5.1.5, and that the sum of a number of independent normal random variables is a normal random variable, as shown in Theorem 5.2.5.

Example 5.2.5 For $(X, Y) \sim \mathcal{N}(10, 0, 4, 1, 0.5)$, find the numbers $a$ and $b$ so that $Z = aX + bY$ and $W = X + Y$ are uncorrelated.

Solution Clearly, $\mathrm{E}\{ZW\} - \mathrm{E}\{Z\}\mathrm{E}\{W\} = \mathrm{E}\left\{aX^2 + bY^2 + (a + b)XY\right\} - 100a = 5a + 2b$ because $\mathrm{E}\{Z\} = 10a$ and $\mathrm{E}\{W\} = 10$. Thus, for any pair of real numbers $a$ and $b$ such that $5a + 2b = 0$, the two random variables $Z = aX + bY$ and $W = X + Y$ will be uncorrelated. ♦

Example 5.2.6 For a random vector $(X, Y) \sim \mathcal{N}(10, 0, 4, 1, 0.5)$, obtain the joint distribution of $Z = X + Y$ and $W = X - Y$.

Solution We first note that $Z$ and $W$ are jointly normal from Theorem 5.2.6. We thus only need to obtain $\mathrm{E}\{Z\}$, $\mathrm{E}\{W\}$, $\mathrm{Var}\{Z\}$, $\mathrm{Var}\{W\}$, and $\rho_{ZW}$. We first have $\mathrm{E}\{Z\} = 10$ and $\mathrm{E}\{W\} = 10$ from $\mathrm{E}\{X \pm Y\} = \mathrm{E}\{X\} \pm \mathrm{E}\{Y\}$. Next, we have the variance $\sigma_Z^2 = \mathrm{E}\left\{(X + Y - 10)^2\right\} = \mathrm{E}\left\{[(X - 10) + Y]^2\right\}$ of $Z$ as

\[
\sigma_Z^2 = \sigma_X^2 + 2\mathrm{E}\{XY - 10Y\} + \sigma_Y^2 = 7 \tag{5.2.18}
\]

from $\mathrm{E}\{XY\} - \mathrm{E}\{X\}\mathrm{E}\{Y\} = \rho\sigma_X\sigma_Y = 1$ and, similarly, the variance $\sigma_W^2 = 3$ of $W$. In addition, we also get $\mathrm{E}\{ZW\} = \mathrm{E}\left\{X^2 - Y^2\right\} = m_X^2 + \sigma_X^2 - \sigma_Y^2 = 103$ and, consequently, the correlation coefficient $\rho_{ZW} = \frac{103 - 100}{\sqrt{7}\sqrt{3}} = \sqrt{\frac{3}{7}}$ between $Z$ and $W$. Thus, we have $(Z, W) \sim \mathcal{N}\left(10, 10, 7, 3, \sqrt{\frac{3}{7}}\right)$. In passing, the joint pdf of $(Z, W)$ is

\[
f_{Z,W}(x, y) = \frac{1}{2\pi\sqrt{7}\sqrt{3}\sqrt{1 - \frac{3}{7}}}\exp\left[-\frac{1}{2\left(1 - \frac{3}{7}\right)}\left\{\frac{(x - 10)^2}{7} - 2\sqrt{\frac{3}{7}}\,\frac{x - 10}{\sqrt{7}}\,\frac{y - 10}{\sqrt{3}} + \frac{(y - 10)^2}{3}\right\}\right],
\]

i.e.,

\[
f_{Z,W}(x, y) = \frac{\sqrt{3}}{12\pi}\exp\left\{-\frac{1}{24}\left(3x^2 - 6xy + 7y^2 - 80y + 400\right)\right\}. \tag{5.2.19}
\]

The distribution of $(Z, W)$ can also be obtained from Theorem 5.2.6 more directly as follows: because $\boldsymbol{V} = \begin{pmatrix} Z \\ W \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \boldsymbol{L}\begin{pmatrix} X \\ Y \end{pmatrix}$, we have the mean vector $\mathrm{E}\{\boldsymbol{V}\} = \boldsymbol{L}\,\mathrm{E}\left\{\begin{pmatrix} X \\ Y \end{pmatrix}\right\} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 10 \\ 0 \end{pmatrix} = \begin{pmatrix} 10 \\ 10 \end{pmatrix}$ and the covariance matrix $\boldsymbol{K}_{\boldsymbol{V}} = \boldsymbol{L}\boldsymbol{K}\boldsymbol{L}^T = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 4 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 7 & 3 \\ 3 & 3 \end{pmatrix}$ of $\boldsymbol{V}$. Thus, $(Z, W) \sim \mathcal{N}\left(\begin{pmatrix} 10 \\ 10 \end{pmatrix}, \begin{pmatrix} 7 & 3 \\ 3 & 3 \end{pmatrix}\right) = \mathcal{N}\left(10, 10, 7, 3, \sqrt{\frac{3}{7}}\right)$. ♦

From Theorem 5.2.6, the linear combination $\sum_{i=1}^{n} a_iX_i$ of the components of a normal random vector $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ is a normal random variable. Let us again emphasize that, while Theorem 5.2.6 tells us that a linear transformation of jointly normal random variables produces jointly normal random variables, a linear transformation of random variables which are normal only marginally but not jointly is not guaranteed to produce normal random variables (Wies and Hall 1993). As we can see in Examples 5.2.7–5.2.9 below, when $\{X_i\}_{i=1}^{n}$ are all normal random variables but $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ is not a normal random vector, (A) the normal random variables $\{X_i\}_{i=1}^{n}$ are generally not independent even if they are uncorrelated, (B) the linear combination of $\{X_i\}_{i=1}^{n}$ may or may not be a normal random variable, and (C) the linear transformation of $\boldsymbol{X}$ is not a normal random vector.

Example 5.2.7 (Romano and Siegel 1986) Let $X \sim \mathcal{N}(0, 1)$ and $H$ be the outcome from a toss of a fair coin. Then, we have $Y \sim \mathcal{N}(0, 1)$ for the random variable

\[
Y = \begin{cases} X, & H = \text{head}, \\ -X, & H = \text{tail}. \end{cases} \tag{5.2.20}
\]

Now, because $\mathrm{E}\{X\} = 0$, $\mathrm{E}\{Y\} = 0$, and $\mathrm{E}\{XY\} = \mathrm{E}\{\mathrm{E}\{XY|H\}\} = \frac{1}{2}\left[\mathrm{E}\left\{X^2\right\} + \mathrm{E}\left\{-X^2\right\}\right] = 0$, the random variables $X$ and $Y$ are uncorrelated. However, $X$ and $Y$ are not independent because, for instance, $\mathrm{P}(|X| > 1)\mathrm{P}(|Y| < 1) > 0$ while $\mathrm{P}(|X| > 1, |Y| < 1) = 0$. In addition, $X + Y$ is not normal. In other words, even when $X$ and $Y$ are both normal random variables, $X + Y$ could be non-normal if $(X, Y)$ is not a normal random vector. ♦
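The behaviour described in Example 5.2.7 can be observed in a small simulation. This is a hedged sketch rather than part of the original text: the function name `simulate`, the sample size, and the seed are arbitrary choices, and the coin is assumed to be tossed independently of $X$.

```python
import random


def simulate(n=20000, seed=0):
    """Draw (X, Y) as in (5.2.20) and return the sample mean of X*Y
    together with the fraction of draws for which X + Y is exactly 0."""
    rng = random.Random(seed)
    sum_xy = 0.0
    zero_sum = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x if rng.random() < 0.5 else -x   # fair coin, independent of X
        sum_xy += x * y
        if x + y == 0.0:
            zero_sum += 1
    return sum_xy / n, zero_sum / n


corr_estimate, frac_zero = simulate()
print(corr_estimate, frac_zero)
```

The estimate of $\mathrm{E}\{XY\}$ should be close to 0, while about half of the draws give $X + Y = 0$ exactly; a point mass of this kind is incompatible with a normal distribution of positive variance, in line with the remark that $X + Y$ is not normal.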


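Example 5.2.8 (Romano and Siegel 1986) Let $X \sim \mathcal{N}(0, 1)$ and

\[
Y = \begin{cases} X, & |X| \le \alpha, \\ -X, & |X| > \alpha \end{cases} \tag{5.2.21}
\]

for a positive number $\alpha$. Then, $X$ and $Y$ are not independent. In addition, $Y$ is also a standard normal random variable because, for any set $B$ such that $B \in \mathcal{B}(\mathbb{R})$, we have

\[
\begin{aligned}
\mathrm{P}(Y \in B) &= \mathrm{P}(Y \in B \,|\, |X| \le \alpha)\mathrm{P}(|X| \le \alpha) + \mathrm{P}(Y \in B \,|\, |X| > \alpha)\mathrm{P}(|X| > \alpha) \\
&= \mathrm{P}(X \in B \,|\, |X| \le \alpha)\mathrm{P}(|X| \le \alpha) + \mathrm{P}(-X \in B \,|\, |X| > \alpha)\mathrm{P}(|X| > \alpha) \\
&= \mathrm{P}(X \in B \,|\, |X| \le \alpha)\mathrm{P}(|X| \le \alpha) + \mathrm{P}(X \in B \,|\, |X| > \alpha)\mathrm{P}(|X| > \alpha) \\
&= \mathrm{P}(X \in B).
\end{aligned} \tag{5.2.22}
\]

Now, the correlation coefficient $\rho_{XY} = \mathrm{E}\{XY\} = 2\int_0^{\alpha}x^2\phi(x)\,dx - 2\int_{\alpha}^{\infty}x^2\phi(x)\,dx$ between $X$ and $Y$ can be obtained as

\[
\rho_{XY} = 4\int_0^{\alpha}x^2\phi(x)\,dx - 1, \tag{5.2.23}
\]

where $\phi$ denotes the standard normal pdf. Letting $g(\alpha) = \int_0^{\alpha}x^2\phi(x)\,dx$, we can find a positive number $\alpha_0$ such that$^{1}$ $g(\alpha_0) = \frac{1}{4}$ because $g(0) = 0$, $g(\infty) = \frac{1}{2}$, and $g$ is a continuous function. Therefore, when $\alpha = \alpha_0$, $X$ and $Y$ are uncorrelated from (5.2.23). Meanwhile, because

\[
X + Y = \begin{cases} 2X, & |X| \le \alpha, \\ 0, & |X| > \alpha, \end{cases} \tag{5.2.24}
\]

$X + Y$ is not normal. ♦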
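The value of $\alpha_0$ in Example 5.2.8 can be located numerically. The sketch below is an illustration and not the author's method: it assumes the closed form $g(\alpha) = \Phi(\alpha) - \frac{1}{2} - \alpha\phi(\alpha)$, which follows from integration by parts, and solves $g(\alpha_0) = \frac{1}{4}$ by bisection. The function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g(alpha):
    """Integral of x^2 phi(x) over [0, alpha], obtained by parts."""
    return Phi(alpha) - 0.5 - alpha * phi(alpha)


def solve_alpha0(lo=0.0, hi=10.0, iters=100):
    """Bisection for g(alpha) = 1/4; g is non-decreasing on [0, inf)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha0 = solve_alpha0()
rho_xy = 4.0 * g(alpha0) - 1.0   # correlation coefficient from (5.2.23)
print(alpha0, rho_xy)
```

Bisection is used here only because $g$ is monotone and continuous; any root finder would serve equally well.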

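$^{1}$ Here, $\alpha_0 \approx 1.54$.

Example 5.2.9 (Stoyanov 2013) When $\boldsymbol{X} = (X, Y)$ is a normal random vector, the random variables $X$, $Y$, and $X + Y$ are all normal. Yet, the converse is not necessarily true. We now consider an example. Let the joint pdf of $\boldsymbol{X} = (X, Y)$ be

\[
f_{\boldsymbol{X}}(x, y) = \frac{1}{2\pi}\exp\left\{-\frac{1}{2}\left(x^2 + y^2\right)\right\}\left[1 + xy\left(x^2 - y^2\right)\exp\left\{-\frac{1}{2}\left(x^2 + y^2 + 2\epsilon\right)\right\}\right], \tag{5.2.25}
\]

where $\epsilon > 0$. Let us also note that

\[
\left|xy\left(x^2 - y^2\right)\exp\left\{-\frac{1}{2}\left(x^2 + y^2 + 2\epsilon\right)\right\}\right| \le 1 \tag{5.2.26}
\]

when $\epsilon \ge -2 + \ln 4 \approx -0.6137$ because $-4e^{-2} \le xy\left(x^2 - y^2\right)\exp\left\{-\frac{1}{2}\left(x^2 + y^2\right)\right\} \le 4e^{-2}$. Then, the joint cf of $\boldsymbol{X}$ can be obtained as

\[
\varphi_{\boldsymbol{X}}(s, t) = \exp\left(-\frac{s^2 + t^2}{2}\right) + \frac{st\left(s^2 - t^2\right)}{32}\exp\left(-\frac{s^2 + t^2}{4} - \epsilon\right), \tag{5.2.27}
\]

from which we can make the following observations:

(A) We have $X \sim \mathcal{N}(0, 1)$ and $Y \sim \mathcal{N}(0, 1)$ because $\varphi_X(t) = \varphi_{\boldsymbol{X}}(t, 0) = \exp\left(-\frac{1}{2}t^2\right)$ and $\varphi_Y(t) = \varphi_{\boldsymbol{X}}(0, t) = \exp\left(-\frac{1}{2}t^2\right)$.
(B) We have$^{2}$ $X + Y \sim \mathcal{N}(0, 2)$ because $\varphi_{X+Y}(t) = \varphi_{\boldsymbol{X}}(t, t) = \exp\left(-t^2\right)$.
(C) We have $X - Y \sim \mathcal{N}(0, 2)$ because $\varphi_{X-Y}(t) = \varphi_{\boldsymbol{X}}(t, -t) = \exp\left(-t^2\right)$.
(D) We have $\mathrm{E}\{XY\} = \left.\frac{\partial^2}{\partial t\,\partial s}\varphi_{\boldsymbol{X}}(s, t)\right|_{(s,t)=(0,0)} = 0$ and $\mathrm{E}\{X\} = \mathrm{E}\{Y\} = 0$. Therefore, $X$ and $Y$ are uncorrelated.
(E) As is clear from (5.2.25) or (5.2.27), the random vector $(X, Y)$ is not a normal random vector.

In other words, although $X$, $Y$, and $X + Y$ are all normal random variables, $(X, Y)$ is not a normal random vector. ♦

$^{2}$ Here, as we have observed in Exercise 4.62, when the joint cf of $\boldsymbol{X} = (X, Y)$ is $\varphi_{\boldsymbol{X}}(t, s)$, the cf of $Z = aX + bY$ is $\varphi_Z(t) = \varphi_{\boldsymbol{X}}(at, bt)$.

Theorem 5.2.7 A normal random vector with a positive definite covariance matrix can be linearly transformed into an independent standard normal random vector.

Proof Theorem 5.2.7 can be proved from Theorems 4.3.5, 5.2.3, and 5.2.6, or from (4.3.24) and (4.3.25). Specifically, for $\boldsymbol{X} \sim \mathcal{N}(\boldsymbol{m}, \boldsymbol{K})$ with $|\boldsymbol{K}| > 0$, let $\{\lambda_i\}_{i=1}^{n}$ be the eigenvalues of $\boldsymbol{K}$ and $\boldsymbol{a}_i$ the eigenvector corresponding to $\lambda_i$, and consider the matrix $\boldsymbol{A}$ and $\tilde{\boldsymbol{\lambda}} = \operatorname{diag}\left(\frac{1}{\sqrt{\lambda_1}}, \frac{1}{\sqrt{\lambda_2}}, \ldots, \frac{1}{\sqrt{\lambda_n}}\right)$ introduced in (4.3.21) and (4.3.23), respectively. Then, the mean vector of

\[
\boldsymbol{Y} = \tilde{\boldsymbol{\lambda}}\boldsymbol{A}(\boldsymbol{X} - \boldsymbol{m}) \tag{5.2.28}
\]

is $\mathrm{E}\{\boldsymbol{Y}\} = \tilde{\boldsymbol{\lambda}}\boldsymbol{A}\,\mathrm{E}\{\boldsymbol{X} - \boldsymbol{m}\} = \boldsymbol{0}$. In addition, as we can see from (4.3.25), the covariance matrix of $\boldsymbol{Y}$ is $\boldsymbol{K}_{\boldsymbol{Y}} = \boldsymbol{I}$. Therefore, using Theorem 5.2.6, we get $\boldsymbol{Y} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})$. In other words, $\boldsymbol{Y} = \tilde{\boldsymbol{\lambda}}\boldsymbol{A}(\boldsymbol{X} - \boldsymbol{m})$ is a vector of independent standard normal random variables. ♠

Example 5.2.10 Transform the random vector $\boldsymbol{U} = (U_1, U_2)^T \sim \mathcal{N}(\boldsymbol{m}, \boldsymbol{K})$ into an independent standard normal random vector when $\boldsymbol{m} = (10 \;\; 0)^T$ and $\boldsymbol{K} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$.

Solution The eigenvalues and corresponding eigenvectors of the covariance matrix $\boldsymbol{K}$ of $\boldsymbol{U}$ are $\lambda_1 = 3$, $\boldsymbol{a}_1 = \frac{1}{\sqrt{2}}(1 \;\; -1)^T$ and $\lambda_2 = 1$, $\boldsymbol{a}_2 = \frac{1}{\sqrt{2}}(1 \;\; 1)^T$. Thus, for the linear transformation $\boldsymbol{L} = \begin{pmatrix} \frac{1}{\sqrt{3}} & 0 \\ 0 & 1 \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$, i.e.,

\[
\boldsymbol{L} = \begin{pmatrix} \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \tag{5.2.29}
\]

the random vector $\boldsymbol{V} = \boldsymbol{L}\left(\boldsymbol{U} - (10 \;\; 0)^T\right)$ will be a vector of independent standard normal random variables: the covariance matrix of $\boldsymbol{V}$ is $\boldsymbol{K}_{\boldsymbol{V}} = \boldsymbol{L}\boldsymbol{K}_{\boldsymbol{U}}\boldsymbol{L}^H = \begin{pmatrix} \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. ♦

In passing, the following theorem is noted without a proof:

Theorem 5.2.8 If the linear combination $\boldsymbol{a}^T\boldsymbol{X}$ is a normal random variable for every vector $\boldsymbol{a} = (a_1, a_2, \ldots, a_n)^T$, then the random vector $\boldsymbol{X}$ is a normal random vector and the converse is also true.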
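Returning to the whitening transformation of Theorem 5.2.7 and Example 5.2.10, the construction $\boldsymbol{L} = \tilde{\boldsymbol{\lambda}}\boldsymbol{A}$ can be checked numerically. The following is a minimal sketch under the assumption that the rows of $\boldsymbol{A}$ are the normalized eigenvectors $\boldsymbol{a}_1^T$ and $\boldsymbol{a}_2^T$, which is consistent with (5.2.29); the eigenpairs are taken from the example rather than computed, and the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


K = [[2.0, -1.0], [-1.0, 2.0]]
lam = [3.0, 1.0]                       # eigenvalues quoted in Example 5.2.10
s = 1.0 / math.sqrt(2.0)
A = [[s, -s],                          # a_1^T
     [s, s]]                           # a_2^T
D = [[1.0 / math.sqrt(lam[0]), 0.0],   # diag(1/sqrt(lambda_i))
     [0.0, 1.0 / math.sqrt(lam[1])]]

L = matmul(D, A)
K_V = matmul(matmul(L, K), transpose(L))   # should be the identity matrix
print(L)
print(K_V)
```

Up to floating-point rounding, `K_V` is the $2 \times 2$ identity matrix, so the components of $\boldsymbol{V}$ are uncorrelated with unit variance and, being jointly normal, independent.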

5.3 Expected Values of Nonlinear Functions

In this section, expected values of some non-linear functions and joint moments (Bär and Dittrich 1971; Baum 1957; Brown 1957; Hajek 1969; Haldane 1942; Holmquist 1988; Kan 2008; Nabeya 1952; Song and Lee 2015; Song et al. 2020; Triantafyllopoulos 2003; Withers 1985) of normal random vectors are investigated. We first consider a few simple examples based on the cf and mgf.

5.3.1 Examples of Joint Moments

Let us start with some examples for obtaining joint moments of normal random vectors via cf and mgf.

Example 5.3.1 For the joint central moment $\mu_{ij} = \mathrm{E}\left\{(X - m_X)^i(Y - m_Y)^j\right\}$ of a random vector $(X, Y)$, we have observed that $\mu_{00} = 1$, $\mu_{01} = \mu_{10} = 0$, $\mu_{20} = \sigma_1^2$, $\mu_{02} = \sigma_2^2$, and $\mu_{11} = \rho\sigma_1\sigma_2$ in Sect. 4.3.2.1. In addition, it is easy to see that $\mu_{30} = \mu_{03} = 0$ and $\mu_{40} = 3\sigma_1^4$ when $(X, Y) \sim \mathcal{N}\left(0, 0, \sigma_1^2, \sigma_2^2, \rho\right)$ from (3.3.31). Now, based on the moment theorem, show that $\mu_{31} = 3\rho\sigma_1^3\sigma_2$, $\mu_{22} = \left(1 + 2\rho^2\right)\sigma_1^2\sigma_2^2$, and $\mu_{41} = \mu_{32} = 0$ when $(X, Y) \sim \mathcal{N}\left(0, 0, \sigma_1^2, \sigma_2^2, \rho\right)$.

Solution For convenience, let $C = \mathrm{E}\{XY\} = \rho\sigma_1\sigma_2$, $A = \frac{1}{2}\left(\sigma_1^2s_1^2 + 2Cs_1s_2 + \sigma_2^2s_2^2\right)$, and $A^{(ij)} = \frac{\partial^{i+j}}{\partial s_1^i\,\partial s_2^j}A$. Then, we easily have $\left.A^{(10)}\right|_{s_1=0,s_2=0} = \left.\left(\sigma_1^2s_1 + Cs_2\right)\right|_{s_1=0,s_2=0} = 0$, $\left.A^{(01)}\right|_{s_1=0,s_2=0} = \left.\left(\sigma_2^2s_2 + Cs_1\right)\right|_{s_1=0,s_2=0} = 0$, $A^{(20)} = \sigma_1^2$, $A^{(11)} = C$, $A^{(02)} = \sigma_2^2$, and $A^{(ij)} = 0$ for $i + j \ge 3$. Denoting the joint mgf of $(X, Y)$ by $M = M(s_1, s_2) = \exp(A)$ and employing the notation $M^{(ij)} = \frac{\partial^{i+j}M}{\partial s_1^i\,\partial s_2^j}$, we get $M^{(10)} = MA^{(10)}$, $M^{(20)} = M\left\{A^{(20)} + \left(A^{(10)}\right)^2\right\}$,

\[
\begin{aligned}
M^{(21)} &= M\left\{A^{(21)} + A^{(20)}A^{(01)} + 2A^{(11)}A^{(10)} + \left(A^{(10)}\right)^2A^{(01)}\right\} \\
&= M\left\{A^{(20)}A^{(01)} + 2A^{(11)}A^{(10)} + \left(A^{(10)}\right)^2A^{(01)}\right\},
\end{aligned} \tag{5.3.1}
\]

\[
M^{(31)} = M\left[3A^{(20)}\left\{A^{(11)} + A^{(10)}A^{(01)}\right\} + \left(A^{(10)}\right)^2\left\{3A^{(11)} + A^{(10)}A^{(01)}\right\}\right], \tag{5.3.2}
\]

\[
M^{(22)} = M\left\{A^{(20)}A^{(02)} + 2\left(A^{(11)}\right)^2 + 4A^{(11)}A^{(10)}A^{(01)} + A^{(20)}\left(A^{(01)}\right)^2 + \left(A^{(10)}\right)^2A^{(02)} + \left(A^{(10)}\right)^2\left(A^{(01)}\right)^2\right\}, \tag{5.3.3}
\]

\[
M^{(41)} = B_{41}M, \tag{5.3.4}
\]

and

\[
M^{(32)} = B_{32}M. \tag{5.3.5}
\]

Here,

\[
B_{41} = 3\left(A^{(20)}\right)^2A^{(01)} + 12A^{(20)}A^{(11)}A^{(10)} + 6A^{(20)}\left(A^{(10)}\right)^2A^{(01)} + 4A^{(11)}\left(A^{(10)}\right)^3 + \left(A^{(10)}\right)^4A^{(01)} \tag{5.3.6}
\]

and

\[
\begin{aligned}
B_{32} ={}& 6A^{(20)}A^{(11)}A^{(01)} + 3A^{(20)}A^{(02)}A^{(10)} + 6A^{(11)}\left(A^{(10)}\right)^2A^{(01)} + 6\left(A^{(11)}\right)^2A^{(10)} \\
&+ A^{(02)}\left(A^{(10)}\right)^3 + 3A^{(20)}A^{(10)}\left(A^{(01)}\right)^2 + \left(A^{(10)}\right)^3\left(A^{(01)}\right)^2.
\end{aligned} \tag{5.3.7}
\]

Recollecting that $M(0, 0) = 1$, we have $\mu_{31} = 3\rho\sigma_1^3\sigma_2$ from$^{3}$ (5.3.2), $\mu_{22} = \left(1 + 2\rho^2\right)\sigma_1^2\sigma_2^2$ from$^{4}$ (5.3.3), $\mu_{41} = \left.M^{(41)}\right|_{s_1=0,s_2=0} = 0$ from (5.3.4) and (5.3.6), and $\mu_{32} = \left.M^{(32)}\right|_{s_1=0,s_2=0} = 0$ from (5.3.5) and (5.3.7). ♦

In Exercise 5.15, it is shown that

\[
\mathrm{E}\left\{X_1^2X_2^2X_3^2\right\} = 1 + 2\left(\rho_{12}^2 + \rho_{23}^2 + \rho_{31}^2\right) + 8\rho_{12}\rho_{23}\rho_{31} \tag{5.3.8}
\]

for $(X_1, X_2, X_3) \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{K}_3)$. Similarly, it is shown in Exercise 5.18 that

\[
\mathrm{E}\{X_1X_2X_3\} = m_1\mathrm{E}\{X_2X_3\} + m_2\mathrm{E}\{X_3X_1\} + m_3\mathrm{E}\{X_1X_2\} - 2m_1m_2m_3 \tag{5.3.9}
\]

for a general tri-variate normal random vector $(X_1, X_2, X_3)$ and that

\[
\mathrm{E}\{X_1X_2X_3X_4\} = \mathrm{E}\{X_1X_2\}\mathrm{E}\{X_3X_4\} + \mathrm{E}\{X_1X_3\}\mathrm{E}\{X_2X_4\} + \mathrm{E}\{X_1X_4\}\mathrm{E}\{X_2X_3\} - 2m_1m_2m_3m_4 \tag{5.3.10}
\]

for a general quadri-variate normal random vector $(X_1, X_2, X_3, X_4)$. The results (5.3.8)–(5.3.10) can also be obtained via the general formula (5.3.51).

$^{3}$ More specifically, we have $\mu_{31} = \left.M^{(31)}\right|_{s_1=0,s_2=0} = \left.3MA^{(20)}A^{(11)}\right|_{s_1=0,s_2=0} = 3\sigma_1^2C = 3\rho\sigma_1^3\sigma_2$.

$^{4}$ More specifically, we have $\mu_{22} = \left.M^{(22)}\right|_{s_1=0,s_2=0} = \left.\left\{MA^{(20)}A^{(02)} + 2M\left(A^{(11)}\right)^2\right\}\right|_{s_1=0,s_2=0} = \sigma_1^2\sigma_2^2 + 2C^2 = \left(1 + 2\rho^2\right)\sigma_1^2\sigma_2^2$.

5.3.2 Price’s Theorem

We now discuss a theorem that is quite useful in evaluating the expected values of various non-linear functions such as the power functions, sign functions, and absolute values of normal random vectors. Denoting the covariance between $X_i$ and $X_j$ by

\[
\tilde{\rho}_{ij} = R_{ij} - m_im_j, \tag{5.3.11}
\]

where $R_{ij} = \mathrm{E}\{X_iX_j\}$ and $m_i = \mathrm{E}\{X_i\}$, the correlation coefficient $\rho_{ij}$ between $X_i$ and $X_j$ and the variance $\sigma_i^2$ of $X_i$ can be expressed as $\rho_{ij} = \frac{\tilde{\rho}_{ij}}{\sqrt{\tilde{\rho}_{ii}\tilde{\rho}_{jj}}}$ and $\sigma_i^2 = \tilde{\rho}_{ii}$, respectively.

Theorem 5.3.1 Let $\boldsymbol{K} = \left[\tilde{\rho}_{rs}\right]$ be the covariance matrix of an $n$-variate normal random vector $\boldsymbol{X}$. When $\{g_i(\cdot)\}_{i=1}^{n}$ are all memoryless functions, we have

\[
\frac{\partial^{\gamma_1}}{\partial\tilde{\rho}_{r_1s_1}^{k_1}\,\partial\tilde{\rho}_{r_2s_2}^{k_2}\cdots\partial\tilde{\rho}_{r_Ns_N}^{k_N}}\mathrm{E}\left\{\prod_{i=1}^{n}g_i(X_i)\right\} = \frac{1}{2^{\gamma_2}}\mathrm{E}\left\{\prod_{i=1}^{n}g_i^{(\gamma_3)}(X_i)\right\}, \tag{5.3.12}
\]

where $\gamma_1 = \sum_{j=1}^{N}k_j$, $\gamma_2 = \sum_{j=1}^{N}k_j\delta_{r_js_j}$, and $\gamma_3 = \sum_{j=1}^{N}\epsilon_{ij}k_j$. Here, $\delta_{ij}$ is the Kronecker delta function defined as (4.3.17), $N \in \left\{1, 2, \ldots, \frac{1}{2}n(n + 1)\right\}$, $g_i^{(k)}(x) = \frac{d^k}{dx^k}g_i(x)$, and $r_j \in \{1, 2, \ldots, n\}$ and $s_j \in \{1, 2, \ldots, n\}$ for $j = 1, 2, \ldots, N$. In addition, $\epsilon_{ij} = \delta_{ir_j} + \delta_{is_j} \in \{0, 1, 2\}$ denotes how many of $r_j$ and $s_j$ are equal to $i$ for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, N$, and satisfies $\sum_{i=1}^{n}\epsilon_{ij} = 2$.

Based on Theorem 5.3.1, referred to as Price’s theorem (Price 1958), let us describe how we can obtain the joint moments and the expected values of non-linear functions of normal random vectors in the three cases of $n = 1$, $2$, and $3$.

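5.3.2.1 Uni-Variate Normal Random Vectors

When $n = 1$ in Theorem 5.3.1, we have $N = 1$, $r_1 = 1$, $s_1 = 1$, and $\epsilon_{11} = 2$. Let $k = k_1\delta_{11}$, and use $m$ for $m_1 = \mathrm{E}\{X_1\}$ and $\tilde{\rho}$ for $\tilde{\rho}_{11} = \sigma^2 = \mathrm{Var}\{X_1\}$ by deleting the subscripts for brevity. We can then express (5.3.12) as

\[
\frac{\partial^k}{\partial\tilde{\rho}^k}\mathrm{E}\{g(X)\} = \left(\frac{1}{2}\right)^k\mathrm{E}\left\{g^{(2k)}(X)\right\}. \tag{5.3.13}
\]

Meanwhile, for the pdf $f_X(x)$ of $X$, we have

\[
\lim_{\tilde{\rho} \to 0}f_X(x) = \delta(x - m). \tag{5.3.14}
\]

Based on (5.3.13) and (5.3.14), let us obtain the expected value $\mathrm{E}\{g(X)\}$ for a normal random variable $X$.

Example 5.3.2 For a normal random variable $X \sim \mathcal{N}\left(m, \sigma^2\right)$, obtain $\tilde{\Upsilon} = \mathrm{E}\left\{X^3\right\}$.

Solution Letting $g(x) = x^3$, we have $g^{(2)}(x) = 6x$. Thus, we get $\frac{\partial}{\partial\tilde{\rho}}\tilde{\Upsilon} = \frac{1}{2}\mathrm{E}\left\{g^{(2)}(X)\right\} = 3\mathrm{E}\{X\} = 3m$, i.e., $\tilde{\Upsilon} = 3m\tilde{\rho} + c$ from (5.3.13) with $k = 1$, where $c$ is the integration constant. Subsequently, we have

\[
\mathrm{E}\left\{X^3\right\} = 3m\sigma^2 + m^3 \tag{5.3.15}
\]

because $c = m^3$ from $\tilde{\Upsilon} \to \int_{-\infty}^{\infty}x^3\delta(x - m)\,dx = m^3$ as $\tilde{\rho} \to 0$, recollecting (5.3.14). ♦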

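We now derive a general formula for the moment $\mathrm{E}\left\{X^a\right\}$. Let us use an underline as

\[
\underline{n} = \left\lfloor\frac{n}{2}\right\rfloor = \begin{cases} \frac{n-1}{2}, & n \text{ is odd}, \\ \frac{n}{2}, & n \text{ is even} \end{cases} \tag{5.3.16}
\]

to denote the quotient of a non-negative integer $n$ when divided by 2.

Theorem 5.3.2 For $X \sim \mathcal{N}(m, \tilde{\rho})$, we have

\[
\mathrm{E}\left\{X^a\right\} = \sum_{j=0}^{\underline{a}}\frac{a!}{2^j\,j!\,(a - 2j)!}\,\tilde{\rho}^{\,j}m^{a-2j} \tag{5.3.17}
\]

for $a = 0, 1, \ldots$.

Proof The proof is left as an exercise, Exercise 5.27. ♠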
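Formula (5.3.17) is straightforward to evaluate. The sketch below is an illustration with a hypothetical function name and arbitrary parameter values; it compares the cases $a = 3$ and $a = 4$ with the closed forms $3m\tilde{\rho} + m^3$ (see (5.3.15)) and $3\tilde{\rho}^2 + 6m^2\tilde{\rho} + m^4$.

```python
from math import factorial


def normal_moment(a, m, var):
    """E{X^a} for X ~ N(m, var), evaluated term by term from (5.3.17)."""
    return sum(
        factorial(a) / (2 ** j * factorial(j) * factorial(a - 2 * j))
        * var ** j * m ** (a - 2 * j)
        for j in range(a // 2 + 1)      # j = 0, ..., floor(a / 2)
    )


m, var = 1.5, 0.7                        # arbitrary illustrative values
third = normal_moment(3, m, var)
fourth = normal_moment(4, m, var)
print(third, 3 * m * var + m ** 3)
print(fourth, 3 * var ** 2 + 6 * m ** 2 * var + m ** 4)
```

With `m = 0` the odd moments vanish and the even moments reduce to $\frac{a!}{2^{a/2}(a/2)!}\tilde{\rho}^{a/2}$, which gives a quick sanity check of the implementation.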



Example 5.3.3 Using (5.3.17), we have

\[
\mathrm{E}\left\{X^4\right\} = 3\tilde{\rho}^2 + 6m^2\tilde{\rho} + m^4 \tag{5.3.18}
\]

for $X \sim \mathcal{N}(m, \tilde{\rho})$. When $m = 0$, (5.3.18) is the same as (3.3.32). ♦

5.3.2.2 Bi-variate Normal Random Vectors

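When $n = 2$, specific simpler expressions of (5.3.12) for all possible pairs $(n, N)$ are shown in Table 5.1. Let us consider the expected value $\mathrm{E}\{g_1(X_1)g_2(X_2)\}$ for a normal random vector $\boldsymbol{X} = (X_1, X_2)$ with mean vector $\boldsymbol{m} = (m_1, m_2)$ and covariance matrix $\boldsymbol{K} = \begin{pmatrix} \tilde{\rho}_{11} & \tilde{\rho}_{12} \\ \tilde{\rho}_{12} & \tilde{\rho}_{22} \end{pmatrix}$ assuming $n = 2$, $N = 1$, $r_1 = 1$, and $s_1 = 2$ in Theorem 5.3.1. Because $\epsilon_{11} = 1$ and $\epsilon_{21} = 1$, we can rewrite (5.3.12) as

\[
\frac{\partial^k}{\partial\tilde{\rho}_{12}^k}\mathrm{E}\{g_1(X_1)g_2(X_2)\} = \mathrm{E}\left\{g_1^{(k)}(X_1)\,g_2^{(k)}(X_2)\right\}. \tag{5.3.19}
\]

First, find a value $k$ for which the right-hand side $\mathrm{E}\left\{g_1^{(k)}(X_1)\,g_2^{(k)}(X_2)\right\}$ of (5.3.19) is simple to evaluate, and then obtain the expected value. Next, integrate the expected value with respect to $\tilde{\rho}_{12}$ to obtain $\mathrm{E}\{g_1(X_1)g_2(X_2)\}$. Note that, when $\tilde{\rho}_{12} = 0$, we have $\rho_{12} = \frac{\tilde{\rho}_{12}}{\sigma_1\sigma_2} = 0$ and therefore $X_1$ and $X_2$ are independent of each other from Theorem 5.2.3: this implies, from Theorem 4.3.6, that

\[
\left.\mathrm{E}\left\{g_1^{(k)}(X_1)\,g_2^{(l)}(X_2)\right\}\right|_{\tilde{\rho}_{12}=0} = \mathrm{E}\left\{g_1^{(k)}(X_1)\right\}\mathrm{E}\left\{g_2^{(l)}(X_2)\right\} \tag{5.3.20}
\]

for $k, l = 0, 1, \ldots$, which can be used to determine integration constants. In short, when we can easily evaluate the expected value of the product of the derivatives of $g_1$ and $g_2$ (e.g., when we have a constant or an impulse function after differentiating $g_1$ and/or $g_2$ a few times), Theorem 5.3.1 is quite useful in obtaining $\mathrm{E}\{g_1(X_1)g_2(X_2)\}$.

Table 5.1 Specific formulas of Price’s theorem for all possible pairs $(n, N)$ when $n = 2$

$(n, N)$ | $\left\{(r_j, s_j)\right\}_{j=1}^{N}$, $\left\{\delta_{r_js_j}\right\}_{j=1}^{N}$, $\left\{\left\{\epsilon_{ij}\right\}_{i=1}^{n}\right\}_{j=1}^{N}$: specific formula of (5.3.12)
$(2, 1)$ | $(r_1, s_1) = (1, 1)$, $\delta_{r_1s_1} = 1$, $\epsilon_{11} = 2$, $\epsilon_{21} = 0$: $\frac{\partial^k}{\partial\tilde{\rho}_{11}^k}\mathrm{E}\{g_1(X_1)g_2(X_2)\} = \left(\frac{1}{2}\right)^k\mathrm{E}\left\{g_1^{(2k)}(X_1)\,g_2(X_2)\right\}$
$(2, 1)$ | $(r_1, s_1) = (1, 2)$, $\delta_{r_1s_1} = 0$, $\epsilon_{11} = 1$, $\epsilon_{21} = 1$: $\frac{\partial^k}{\partial\tilde{\rho}_{12}^k}\mathrm{E}\{g_1(X_1)g_2(X_2)\} = \mathrm{E}\left\{g_1^{(k)}(X_1)\,g_2^{(k)}(X_2)\right\}$
$(2, 2)$ | $(r_1, s_1) = (1, 1)$, $(r_2, s_2) = (1, 2)$, $\delta_{r_1s_1} = 1$, $\delta_{r_2s_2} = 0$, $\epsilon_{11} = 2$, $\epsilon_{21} = 0$, $\epsilon_{12} = 1$, $\epsilon_{22} = 1$: $\frac{\partial^{k_1+k_2}}{\partial\tilde{\rho}_{11}^{k_1}\,\partial\tilde{\rho}_{12}^{k_2}}\mathrm{E}\{g_1(X_1)g_2(X_2)\} = \left(\frac{1}{2}\right)^{k_1}\mathrm{E}\left\{g_1^{(2k_1+k_2)}(X_1)\,g_2^{(k_2)}(X_2)\right\}$
$(2, 2)$ | $(r_1, s_1) = (1, 1)$, $(r_2, s_2) = (2, 2)$, $\delta_{r_1s_1} = 1$, $\delta_{r_2s_2} = 1$, $\epsilon_{11} = 2$, $\epsilon_{21} = 0$, $\epsilon_{12} = 0$, $\epsilon_{22} = 2$: $\frac{\partial^{k_1+k_2}}{\partial\tilde{\rho}_{11}^{k_1}\,\partial\tilde{\rho}_{22}^{k_2}}\mathrm{E}\{g_1(X_1)g_2(X_2)\} = \left(\frac{1}{2}\right)^{k_1+k_2}\mathrm{E}\left\{g_1^{(2k_1)}(X_1)\,g_2^{(2k_2)}(X_2)\right\}$
$(2, 3)$ | $(r_1, s_1) = (1, 1)$, $(r_2, s_2) = (1, 2)$, $(r_3, s_3) = (2, 2)$, $\delta_{r_1s_1} = 1$, $\delta_{r_2s_2} = 0$, $\delta_{r_3s_3} = 1$, $\epsilon_{11} = 2$, $\epsilon_{21} = 0$, $\epsilon_{12} = 1$, $\epsilon_{22} = 1$, $\epsilon_{13} = 0$, $\epsilon_{23} = 2$: $\frac{\partial^{k_1+k_2+k_3}}{\partial\tilde{\rho}_{11}^{k_1}\,\partial\tilde{\rho}_{12}^{k_2}\,\partial\tilde{\rho}_{22}^{k_3}}\mathrm{E}\{g_1(X_1)g_2(X_2)\} = \left(\frac{1}{2}\right)^{k_1+k_3}\mathrm{E}\left\{g_1^{(2k_1+k_2)}(X_1)\,g_2^{(k_2+2k_3)}(X_2)\right\}$

Example 5.3.4 For a normal random vector $\boldsymbol{X} = (X_1, X_2)$ with mean vector $\boldsymbol{m} = (m_1, m_2)$ and covariance matrix $\boldsymbol{K} = \begin{pmatrix} \tilde{\rho}_{11} & \tilde{\rho}_{12} \\ \tilde{\rho}_{12} & \tilde{\rho}_{22} \end{pmatrix}$, obtain $\tilde{\Upsilon} = \mathrm{E}\left\{X_1X_2^2\right\}$.

Solution With $k = 1$, $g_1(x) = x$, and $g_2(x) = x^2$ in (5.3.19), we get $\frac{d\tilde{\Upsilon}}{d\tilde{\rho}_{12}} = \mathrm{E}\left\{g_1^{(1)}(X_1)\,g_2^{(1)}(X_2)\right\} = \mathrm{E}\{2X_2\} = 2m_2$, i.e., $\tilde{\Upsilon} = 2m_2\tilde{\rho}_{12} + c$. Recollecting (5.3.20), we have $c = \left.\tilde{\Upsilon}\right|_{\tilde{\rho}_{12}=0} = m_1\left(\tilde{\rho}_{22} + m_2^2\right)$. Thus, we finally have

\[
\mathrm{E}\left\{X_1X_2^2\right\} = 2m_2\tilde{\rho}_{12} + m_1\left(\tilde{\rho}_{22} + m_2^2\right). \tag{5.3.21}
\]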


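The result (5.3.21) is the same as the result $\mathrm{E}\left\{WZ^2\right\} = 2m_2\rho\sigma_1\sigma_2 + m_1\left(\sigma_2^2 + m_2^2\right)$ for a random vector $(W, Z) = (\sigma_1X + m_1, \sigma_2Y + m_2)$, which we would obtain after some steps based on $\mathrm{E}\left\{XY^2\right\} = 0$ for $(X, Y) \sim \mathcal{N}(0, 0, 1, 1, \rho)$. In addition, when $X_1 = X_2 = X$, (5.3.21) is the same as (5.3.15). ♦

A general formula for the joint moment $\mathrm{E}\left\{X_1^aX_2^b\right\}$ is shown in the theorem below.

Theorem 5.3.3 The joint moment $\mathrm{E}\left\{X_1^aX_2^b\right\}$ can be expressed as

\[
\mathrm{E}\left\{X_1^aX_2^b\right\} = \sum_{j=0}^{\min(a,b)}\;\sum_{p=0}^{\underline{a-j}}\;\sum_{q=0}^{\underline{b-j}}\frac{a!\,b!\,\tilde{\rho}_{12}^{\,j}\,\tilde{\rho}_{11}^{\,p}\,\tilde{\rho}_{22}^{\,q}\,m_1^{a-j-2p}\,m_2^{b-j-2q}}{2^{p+q}\,j!\,p!\,q!\,(a - j - 2p)!\,(b - j - 2q)!} \tag{5.3.22}
\]

for $(X_1, X_2) \sim \mathcal{N}\left(m_1, m_2, \tilde{\rho}_{11}, \tilde{\rho}_{22}, \frac{\tilde{\rho}_{12}}{\sqrt{\tilde{\rho}_{11}\tilde{\rho}_{22}}}\right)$, where $a, b = 0, 1, \ldots$.

Proof A proof is provided in Appendix 5.1. ♠

Example 5.3.5 We can obtain $\mathrm{E}\left\{X_1^2X_2^3\right\} = \sum_{j=0}^{2}\sum_{p=0}^{\underline{2-j}}\sum_{q=0}^{\underline{3-j}}\frac{12\,\tilde{\rho}_{12}^{\,j}\,\tilde{\rho}_{11}^{\,p}\,\tilde{\rho}_{22}^{\,q}\,m_1^{2-j-2p}\,m_2^{3-j-2q}}{2^{p+q}\,j!\,p!\,q!\,(2 - j - 2p)!\,(3 - j - 2q)!}$, i.e.,

\[
\mathrm{E}\left\{X_1^2X_2^3\right\} = m_1^2m_2^3 + 3\tilde{\rho}_{22}m_1^2m_2 + \tilde{\rho}_{11}m_2^3 + 6\tilde{\rho}_{12}m_1m_2^2 + 3\tilde{\rho}_{11}\tilde{\rho}_{22}m_2 + 6\tilde{\rho}_{12}\tilde{\rho}_{22}m_1 + 6\tilde{\rho}_{12}^2m_2. \tag{5.3.23}
\]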

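♦

Theorem 5.3.4 For $(X_1, X_2) \sim \mathcal{N}\left(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho\right)$, the joint central moment $\mu_{ab} = \mathrm{E}\left\{(X_1 - m_1)^a(X_2 - m_2)^b\right\}$ can be obtained from (5.3.22) as (Johnson and Kotz 1972; Mills 2001; Patel and Read 1996)

\[
\mu_{ab} = \begin{cases} 0, & a + b \text{ is odd}, \\ \dfrac{a!\,b!}{2^{g+h+\xi}}\displaystyle\sum_{j=0}^{t}\frac{(2\rho\sigma_1\sigma_2)^{2j+\xi}\,\sigma_1^{2(g-j)}\,\sigma_2^{2(h-j)}}{(g - j)!\,(h - j)!\,(2j + \xi)!}, & a + b \text{ is even} \end{cases} \tag{5.3.24}
\]

for $a, b = 0, 1, \ldots$, and satisfies the recursion

\[
\mu_{ab} = (a + b - 1)\rho\sigma_1\sigma_2\,\mu_{a-1,b-1} + (a - 1)(b - 1)\left(1 - \rho^2\right)\sigma_1^2\sigma_2^2\,\mu_{a-2,b-2}, \tag{5.3.25}
\]

where $g$ and $h$ are the quotients of $a$ and $b$, respectively, when divided by 2; $\xi$ is the residue when $a$ or $b$ is divided by 2; and $t = \min(g, h)$. The factors $\sigma_1^{2(g-j)}\sigma_2^{2(h-j)}$ in (5.3.24) are what make it agree with (5.3.22) and with $\mu_{22} = \left(1 + 2\rho^2\right)\sigma_1^2\sigma_2^2$ of Example 5.3.1; they reduce to 1 for unit variances.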
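Theorems 5.3.3 and 5.3.4 can be cross-checked numerically. The following sketch is an illustration only: the function names and test values are arbitrary, `joint_moment` evaluates the triple sum (5.3.22), and `central_moment` applies the recursion (5.3.25), started from the univariate central moments (an assumption needed to terminate the recursion when $a = 0$ or $b = 0$).

```python
from math import factorial


def joint_moment(a, b, m1, m2, r11, r22, r12):
    """E{X1^a X2^b} from (5.3.22); r11, r22 are variances, r12 the covariance."""
    total = 0.0
    for j in range(min(a, b) + 1):
        for p in range((a - j) // 2 + 1):
            for q in range((b - j) // 2 + 1):
                num = (factorial(a) * factorial(b)
                       * r12 ** j * r11 ** p * r22 ** q
                       * m1 ** (a - j - 2 * p) * m2 ** (b - j - 2 * q))
                den = (2 ** (p + q) * factorial(j) * factorial(p) * factorial(q)
                       * factorial(a - j - 2 * p) * factorial(b - j - 2 * q))
                total += num / den
    return total


def central_moment(a, b, s1, s2, rho):
    """mu_ab from the recursion (5.3.25), with univariate base cases."""
    if a == 0 or b == 0:
        n, s = (b, s2) if a == 0 else (a, s1)
        if n % 2:
            return 0.0
        return factorial(n) / (2 ** (n // 2) * factorial(n // 2)) * s ** n
    first = (a + b - 1) * rho * s1 * s2 * central_moment(a - 1, b - 1, s1, s2, rho)
    if a < 2 or b < 2:
        return first            # the second term carries the factor (a-1)(b-1) = 0
    return first + ((a - 1) * (b - 1) * (1 - rho ** 2) * s1 ** 2 * s2 ** 2
                    * central_moment(a - 2, b - 2, s1, s2, rho))


m1, m2, r11, r22, r12 = 1.0, -2.0, 2.0, 0.5, 0.3      # arbitrary values
e23 = joint_moment(2, 3, m1, m2, r11, r22, r12)
e23_closed = (m1**2 * m2**3 + 3*r22*m1**2*m2 + r11*m2**3 + 6*r12*m1*m2**2
              + 3*r11*r22*m2 + 6*r12*r22*m1 + 6*r12**2*m2)   # (5.3.23)

s1, s2, rho = 1.3, 0.8, 0.4
mu42_sum = joint_moment(4, 2, 0.0, 0.0, s1**2, s2**2, rho * s1 * s2)
mu42_rec = central_moment(4, 2, s1, s2, rho)
print(e23, e23_closed, mu42_sum, mu42_rec)
```

Agreement between the zero-mean triple sum and the recursion for several pairs $(a, b)$ is a convenient consistency check on both formulas.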


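Example 5.3.6 When $a = 2g$, $b = 2h$, and $m_1 = m_2 = 0$, all the terms except for those satisfying $a - j - 2p = 0$ and $b - j - 2q = 0$ will be zero in (5.3.22), and thus we have

\[
\begin{aligned}
\mathrm{E}\left\{X_1^aX_2^b\right\} &= \sum_{j=0,2,\ldots}^{\min(a,b)}\frac{a!\,b!\,\tilde{\rho}_{12}^{\,j}\,\tilde{\rho}_{11}^{\frac{a-j}{2}}\,\tilde{\rho}_{22}^{\frac{b-j}{2}}\,m_1^0\,m_2^0}{2^{g+h-j}\,j!\left(\frac{a-j}{2}\right)!\left(\frac{b-j}{2}\right)!\,0!\,0!} \\
&= \sum_{j=0}^{\min(g,h)}\frac{a!\,b!\,(2\tilde{\rho}_{12})^{2j}\,\tilde{\rho}_{11}^{\,g-j}\,\tilde{\rho}_{22}^{\,h-j}}{2^{g+h}\,(2j)!\,(g - j)!\,(h - j)!},
\end{aligned} \tag{5.3.26}
\]

which, with $\tilde{\rho}_{12} = \rho\sigma_1\sigma_2$, $\tilde{\rho}_{11} = \sigma_1^2$, and $\tilde{\rho}_{22} = \sigma_2^2$, is the same as the second line in the right-hand side of (5.3.24). Similarly, when $a = 2g + 1$, $b = 2h + 1$, and $m_1 = m_2 = 0$, the result (5.3.22) is the same as the second line in the right-hand side of (5.3.24). ♦

Example 5.3.7 (Gardner 1990) Obtain $\tilde{\Upsilon} = \mathrm{E}\{\operatorname{sgn}(X_1)\operatorname{sgn}(X_2)\}$ for $\boldsymbol{X} = (X_1, X_2) \sim \mathcal{N}\left(0, 0, \sigma_1^2, \sigma_2^2, \rho\right)$.

Solution First, note that $\frac{d}{dx}g(x) = \frac{d}{dx}\operatorname{sgn}(x) = 2\delta(x)$ and that $\mathrm{E}\{\delta(X_1)\delta(X_2)\} = f(0, 0)$, where $f$ denotes the pdf of $\mathcal{N}\left(0, 0, \sigma_1^2, \sigma_2^2, \rho\right)$. Letting $k = 1$ in (5.3.19), we have $\frac{d\tilde{\Upsilon}}{d\tilde{\rho}} = \mathrm{E}\left\{g_1^{(1)}(X_1)\,g_2^{(1)}(X_2)\right\} = 4f(0, 0) = \frac{2}{\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}$, i.e.,

\[
\frac{d\tilde{\Upsilon}}{d\rho} = \frac{2}{\pi}\frac{1}{\sqrt{1 - \rho^2}} \tag{5.3.27}
\]

because $\tilde{\rho} = \rho\sigma_1\sigma_2$. Integrating this result, we get$^{5}$

\[
\mathrm{E}\{\operatorname{sgn}(X_1)\operatorname{sgn}(X_2)\} = \frac{2}{\pi}\sin^{-1}\rho + c. \tag{5.3.28}
\]

Subsequently, because $\left.\mathrm{E}\{\operatorname{sgn}(X_1)\operatorname{sgn}(X_2)\}\right|_{\rho=0} = \mathrm{E}\{\operatorname{sgn}(X_1)\}\mathrm{E}\{\operatorname{sgn}(X_2)\} = 0$ from (5.3.20), we finally have

\[
\mathrm{E}\{\operatorname{sgn}(X_1)\operatorname{sgn}(X_2)\} = \frac{2}{\pi}\sin^{-1}\rho. \tag{5.3.29}
\]

The result (5.3.29) implies that, when $X_1 = X_2$, we have $\mathrm{E}\{\operatorname{sgn}(X_1)\operatorname{sgn}(X_2)\} = \mathrm{E}\{1\} = 1$. Table 5.2 provides the expected values for some non-linear functions of $(X_1, X_2) \sim \mathcal{N}(0, 0, 1, 1, \rho)$. ♦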
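The result (5.3.29) lends itself to a quick Monte Carlo check. The sketch below is illustrative only: the function names, sample size, seed, and the value $\rho = 0.6$ are arbitrary, and the correlated pair is generated through the standard construction $X_2 = \rho X_1 + \sqrt{1 - \rho^2}\,Z$ with $Z$ independent of $X_1$, which is an assumption of this example rather than something stated in the text.

```python
import math
import random


def sign(x):
    """sgn(x) with sgn(0) = 0."""
    return (x > 0) - (x < 0)


def estimate_sign_product(rho, n=40000, seed=1):
    """Sample mean of sgn(X1) sgn(X2) for a standard bivariate normal pair."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    total = 0
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + c * rng.gauss(0.0, 1.0)   # correlation coefficient rho
        total += sign(x1) * sign(x2)
    return total / n


rho = 0.6
estimate = estimate_sign_product(rho)
exact = 2.0 / math.pi * math.asin(rho)            # right-hand side of (5.3.29)
print(estimate, exact)
```

Because the sign function removes the scale, the same estimate applies for any positive $\sigma_1$ and $\sigma_2$, in agreement with the fact that (5.3.29) depends on $\rho$ only.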

$^{5}$ Here, the range of $\sin^{-1}x$ is set as $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$.

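Table 5.2 Expected value $\mathrm{E}\{g_1(X_1)g_2(X_2)\}$ for some non-linear functions $g_1$ and $g_2$ of $(X_1, X_2) \sim \mathcal{N}(0, 0, 1, 1, \rho)$

\[
\begin{array}{c|cccc}
g_2(X_2) \;\backslash\; g_1(X_1) & X_1 & |X_1| & \operatorname{sgn}(X_1) & \delta(X_1) \\ \hline
X_2 & \rho & 0 & \sqrt{\frac{2}{\pi}}\,\rho & 0 \\
|X_2| & 0 & \frac{2}{\pi}\left(\rho\sin^{-1}\rho + \sqrt{1 - \rho^2}\right) & 0 & \frac{1}{\pi}\sqrt{1 - \rho^2} \\
\operatorname{sgn}(X_2) & \sqrt{\frac{2}{\pi}}\,\rho & 0 & \frac{2}{\pi}\sin^{-1}\rho & 0 \\
\delta(X_2) & 0 & \frac{1}{\pi}\sqrt{1 - \rho^2} & 0 & \frac{1}{2\pi\sqrt{1 - \rho^2}}
\end{array}
\]

Note. $\frac{d}{d\rho}\sin^{-1}\rho = \frac{1}{\sqrt{1 - \rho^2}}$. $\frac{d}{d\rho}\left(\rho\sin^{-1}\rho + \sqrt{1 - \rho^2}\right) = \sin^{-1}\rho$. $\mathrm{E}\{|X_i|\} = \sqrt{\frac{2}{\pi}}$. $\mathrm{E}\{\delta(X_i)\} = \frac{1}{\sqrt{2\pi}}$.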

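Denoting the pdf of a standard bi-variate normal random vector $(X_1, X_2)$ by $f_{\rho}(x, y)$, we have $f_{\rho}(-x, -y) = f_{\rho}(x, y)$ and $f_{\rho}(-x, y) = f_{\rho}(x, -y) = f_{-\rho}(x, y)$. Then, it is known (Kamat 1958) that the partial moment $[r, s] = \int_0^{\infty}\int_0^{\infty}x^ry^sf_{\rho}(x, y)\,dx\,dy$ is

\[
[r, s] = \begin{cases}
\frac{1}{4}\sqrt{\frac{2}{\pi}}\,(1 + \rho), & r = 1,\ s = 0, \\
\frac{1}{2\pi}\left\{\rho\left(\frac{\pi}{2} + \sin^{-1}\rho\right) + \sqrt{1 - \rho^2}\right\}, & r = 1,\ s = 1, \\
\frac{1}{2\pi}\left\{3\rho\left(\frac{\pi}{2} + \sin^{-1}\rho\right) + \left(2 + \rho^2\right)\sqrt{1 - \rho^2}\right\}, & r = 3,\ s = 1
\end{cases} \tag{5.3.30}
\]

and the absolute moment $\nu_{rs} = \mathrm{E}\left\{\left|X_1^rX_2^s\right|\right\}$ is

\[
\nu_{rs} = \frac{2^{\frac{r+s}{2}}}{\pi}\,\Gamma\!\left(\frac{r + 1}{2}\right)\Gamma\!\left(\frac{s + 1}{2}\right){}_2F_1\!\left(-\frac{1}{2}r, -\frac{1}{2}s; \frac{1}{2}; \rho^2\right). \tag{5.3.31}
\]

Here, ${}_2F_1(\alpha, \beta; \gamma; z)$ denotes the hypergeometric function introduced in (1.A.24). Based on (5.3.30) and (5.3.31), we can obtain (Johnson and Kotz 1972)

\[
\begin{aligned}
\nu_{11} &= \tfrac{2}{\pi}\left(\sqrt{1 - \rho^2} + \rho\sin^{-1}\rho\right), \\
\nu_{12} = \nu_{21} &= \sqrt{\tfrac{2}{\pi}}\left(1 + \rho^2\right), \\
\nu_{13} = \nu_{31} &= \tfrac{2}{\pi}\left\{\left(2 + \rho^2\right)\sqrt{1 - \rho^2} + 3\rho\sin^{-1}\rho\right\}, \\
\nu_{22} &= 1 + 2\rho^2, \\
\nu_{14} = \nu_{41} &= \sqrt{\tfrac{2}{\pi}}\left(3 + 6\rho^2 - \rho^4\right), \\
\nu_{23} &= \sqrt{\tfrac{8}{\pi}}\left(1 + 3\rho^2\right), \\
\nu_{15} = \nu_{51} &= \tfrac{2}{\pi}\left\{\left(8 + 9\rho^2 - 2\rho^4\right)\sqrt{1 - \rho^2} + 15\rho\sin^{-1}\rho\right\}, \\
\nu_{42} = \nu_{24} &= 3\left(1 + 4\rho^2\right), \\
\nu_{33} &= \tfrac{2}{\pi}\left\{\left(4 + 11\rho^2\right)\sqrt{1 - \rho^2} + 3\left(3 + 2\rho^2\right)\rho\sin^{-1}\rho\right\}.
\end{aligned}
\]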
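The closed forms above can be compared with a direct evaluation of (5.3.31). The sketch below is an illustration under stated assumptions: `hyp2f1` is a hypothetical helper that sums a truncated Gauss hypergeometric series (adequate for $|\rho| < 1$, and exact when the series terminates), and the value $\rho = 0.5$ is arbitrary.

```python
import math


def hyp2f1(a, b, c, z, terms=200):
    """Truncated Gauss series for 2F1(a, b; c; z), intended for |z| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total


def abs_moment(r, s, rho):
    """nu_rs = E{|X1^r X2^s|} for a standard bivariate normal pair, via (5.3.31)."""
    return (2.0 ** ((r + s) / 2.0) / math.pi
            * math.gamma((r + 1) / 2.0) * math.gamma((s + 1) / 2.0)
            * hyp2f1(-r / 2.0, -s / 2.0, 0.5, rho * rho))


rho = 0.5
root = math.sqrt(1.0 - rho * rho)
nu11 = abs_moment(1, 1, rho)
nu12 = abs_moment(1, 2, rho)
nu22 = abs_moment(2, 2, rho)
nu33 = abs_moment(3, 3, rho)
print(nu11, 2.0 / math.pi * (root + rho * math.asin(rho)))
print(nu12, math.sqrt(2.0 / math.pi) * (1.0 + rho * rho))
print(nu22, 1.0 + 2.0 * rho * rho)
```

When $r$ or $s$ is even, one of the upper parameters is a non-positive integer and the series terminates, which is why $\nu_{12}$, $\nu_{22}$, and similar moments are polynomials in $\rho$.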

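5.3.2.3 Tri-variate Normal Random Vectors

Let us briefly discuss the case $n = 3$ in Theorem 5.3.1. Letting $N = 3$, $r_1 = 1$, $s_1 = 2$, $r_2 = 2$, $s_2 = 3$, $r_3 = 3$, and $s_3 = 1$, we have $\epsilon_{11} = 1$, $\epsilon_{12} = 0$, $\epsilon_{13} = 1$, $\epsilon_{21} = 1$, $\epsilon_{22} = 1$, $\epsilon_{23} = 0$, $\epsilon_{31} = 0$, $\epsilon_{32} = 1$, and $\epsilon_{33} = 1$. Then, for $\tilde{\Upsilon} = \mathrm{E}\{g_1(X_1)g_2(X_2)g_3(X_3)\}$, we can rewrite (5.3.12) as

\[
\begin{aligned}
\frac{\partial^{k_1+k_2+k_3}}{\partial\tilde{\rho}_{12}^{k_1}\,\partial\tilde{\rho}_{23}^{k_2}\,\partial\tilde{\rho}_{31}^{k_3}}\tilde{\Upsilon}
&= \left(\frac{1}{2}\right)^{\sum_{j=1}^{3}k_j\delta_{r_js_j}}\mathrm{E}\left\{g_1^{(\epsilon_{11}k_1+\epsilon_{12}k_2+\epsilon_{13}k_3)}(X_1)\,g_2^{(\epsilon_{21}k_1+\epsilon_{22}k_2+\epsilon_{23}k_3)}(X_2)\,g_3^{(\epsilon_{31}k_1+\epsilon_{32}k_2+\epsilon_{33}k_3)}(X_3)\right\} \\
&= \mathrm{E}\left\{g_1^{(k_1+k_3)}(X_1)\,g_2^{(k_1+k_2)}(X_2)\,g_3^{(k_2+k_3)}(X_3)\right\}.
\end{aligned} \tag{5.3.32}
\]

In addition, similarly to (5.3.20), we have

\[
\left.\mathrm{E}\left\{g_1^{(j)}(X_1)\,g_2^{(k)}(X_2)\,g_3^{(l)}(X_3)\right\}\right|_{\tilde{\rho}_{12}=\tilde{\rho}_{23}=\tilde{\rho}_{31}=0} = \mathrm{E}\left\{g_1^{(j)}(X_1)\right\}\mathrm{E}\left\{g_2^{(k)}(X_2)\right\}\mathrm{E}\left\{g_3^{(l)}(X_3)\right\} \tag{5.3.33}
\]

and

\[
\left.\mathrm{E}\left\{g_1^{(j)}(X_1)\,g_2^{(k)}(X_2)\,g_3^{(l)}(X_3)\right\}\right|_{\tilde{\rho}_{31}=\tilde{\rho}_{12}=0} = \mathrm{E}\left\{g_1^{(j)}(X_1)\right\}\mathrm{E}\left\{g_2^{(k)}(X_2)\,g_3^{(l)}(X_3)\right\} \tag{5.3.34}
\]

for $j, k, l = 0, 1, \ldots$. For $\tilde{\rho}_{12} = \tilde{\rho}_{23} = 0$ and $\tilde{\rho}_{23} = \tilde{\rho}_{31} = 0$ as well, we can obtain formulas similar to (5.3.34). These formulas can all be used to determine $\mathrm{E}\{g_1(X_1)g_2(X_2)g_3(X_3)\}$.

Example 5.3.8 Obtain $\tilde{\Upsilon} = \mathrm{E}\{X_1X_2X_3\}$ for $\boldsymbol{X} = (X_1, X_2, X_3) \sim \mathcal{N}(\boldsymbol{m}, \boldsymbol{K})$ with $\boldsymbol{m} = (m_1, m_2, m_3)$ and $\boldsymbol{K} = \left[\tilde{\rho}_{ij}\right]$.

Solution From (5.3.32), we have $\frac{\partial}{\partial\tilde{\rho}_{12}}\tilde{\Upsilon} = \mathrm{E}\{X_3\} = m_3$, $\frac{\partial}{\partial\tilde{\rho}_{23}}\tilde{\Upsilon} = m_1$, and $\frac{\partial}{\partial\tilde{\rho}_{31}}\tilde{\Upsilon} = m_2$. Thus, $\tilde{\Upsilon} = m_3\tilde{\rho}_{12} + m_1\tilde{\rho}_{23} + m_2\tilde{\rho}_{31} + c$. Now, when $\tilde{\rho}_{12} = \tilde{\rho}_{23} = \tilde{\rho}_{31} = 0$, we have $\tilde{\Upsilon} = c = \mathrm{E}\{X_1\}\mathrm{E}\{X_2\}\mathrm{E}\{X_3\} = m_1m_2m_3$ as we can see from (5.3.33). Thus, we have

\[
\mathrm{E}\{X_1X_2X_3\} = m_1\tilde{\rho}_{23} + m_2\tilde{\rho}_{31} + m_3\tilde{\rho}_{12} + m_1m_2m_3. \tag{5.3.35}
\]

The result (5.3.35) is the same as (5.3.9), as (5.3.21) when $X_2 = X_3$, and as (5.3.15) when $X_1 = X_2 = X_3 = X$. For a zero-mean tri-variate normal random vector, (5.3.35) implies $\mathrm{E}\{X_1X_2X_3\} = 0$. ♦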


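Consider a standard tri-variate normal random vector $\boldsymbol{X} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{K}_3)$ and its pdf $f_3(x, y, z)$ as described in Sect. 5.1.3. For the partial moment

\[
[r, s, t] = \int_0^{\infty}\int_0^{\infty}\int_0^{\infty}x^ry^sz^tf_3(x, y, z)\,dx\,dy\,dz \tag{5.3.36}
\]

and absolute moment

\[
\nu_{rst} = \mathrm{E}\left\{\left|X_1^rX_2^sX_3^t\right|\right\}, \tag{5.3.37}
\]

it is known that we have$^{6}$

\[
[1, 0, 0] = \frac{1}{\sqrt{8\pi^3}}\left\{\frac{\pi}{2} + \sin^{-1}\beta_{23,1} + \rho_{12}\left(\frac{\pi}{2} + \sin^{-1}\beta_{31,2}\right) + \rho_{31}\left(\frac{\pi}{2} + \sin^{-1}\beta_{12,3}\right)\right\}, \tag{5.3.38}
\]

\[
[2, 0, 0] = \frac{1}{4\pi}\left\{\frac{\pi}{2} + \overset{c}{\sum}\sin^{-1}\rho_{ij} + \rho_{12}\sqrt{1 - \rho_{12}^2} + \rho_{31}\sqrt{1 - \rho_{31}^2} + (2\rho_{31}\rho_{12} - \rho_{23})\sqrt{1 - \rho_{23}^2} + \frac{|\boldsymbol{K}_3|\,\rho_{23}}{\sqrt{1 - \rho_{23}^2}}\right\}, \tag{5.3.39}
\]

\[
[1, 1, 0] = \frac{1}{4\pi}\left\{\rho_{12}\left(\frac{\pi}{2} + \overset{c}{\sum}\sin^{-1}\rho_{ij}\right) + \sqrt{1 - \rho_{12}^2} + \rho_{23}\sqrt{1 - \rho_{31}^2} + \rho_{31}\sqrt{1 - \rho_{23}^2}\right\}, \tag{5.3.40}
\]

\[
[1, 1, 1] = \frac{1}{\sqrt{8\pi^3}}\left\{\sqrt{|\boldsymbol{K}_3|} + \overset{c}{\sum}\left(\rho_{ij} + \rho_{jk}\rho_{ki}\right)\left(\frac{\pi}{2} + \sin^{-1}\beta_{ij,k}\right)\right\}, \tag{5.3.41}
\]

\[
\nu_{211} = \frac{2}{\pi}\left\{\left(\rho_{23} + 2\rho_{12}\rho_{31}\right)\sin^{-1}\rho_{23} + \left(1 + \rho_{12}^2 + \rho_{31}^2\right)\sqrt{1 - \rho_{23}^2}\right\}, \tag{5.3.42}
\]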

and    The last term |K 3 | ρ23 1 − ρ223 of [2, 0, 0] and the last two terms ρ23 1 − ρ223 + ρ31 1 − ρ231 of [1, 1, 0] given in (Johnson andKotz 1972, Kamat  1958) some references should be corrected 6

into

|K 3 |ρ23  1−ρ223

as in (5.3.39) and ρ23 1 − ρ231 + ρ31 1 − ρ223 as in (5.3.40), respectively.

5.3 Expected Values of Nonlinear Functions

 ν221 =

 2 1 + 2ρ212 + ρ223 + ρ231 + 4ρ12 ρ23 ρ31 − ρ223 ρ231 . π

In (5.3.40) and (5.3.41), the symbol c .

c +

367

(5.3.43)

denotes the cyclic sum: for example, we have

sin−1 ρi j = sin−1 ρ12 + sin−1 ρ23 + sin−1 ρ31 .

(5.3.44)

We will have $\binom{2+3-1}{3} = 4$, $\binom{3+3-1}{3} = 10$, and $\binom{4+3-1}{3} = 20$ different⁷ cases, respectively, of the expected value $E\{g_1(X_1) g_2(X_2) g_3(X_3)\}$ for two, three, and four options as the function $g_i$. For a standard tri-variate normal random vector $X = (X_1, X_2, X_3)$, consider the four functions $\{x, |x|, \mathrm{sgn}(x), \delta(x)\}$ as the options for $g_i(x)$. Among the 20 expected values, due to the symmetry of the standard normal distribution, the four expected values $E\{X_1 X_2\, \mathrm{sgn}(X_3)\}$, $E\{X_1\, \mathrm{sgn}(X_2)\, \mathrm{sgn}(X_3)\}$, $E\{\mathrm{sgn}(X_1)\, \mathrm{sgn}(X_2)\, \mathrm{sgn}(X_3)\}$, and $E\{X_1 X_2 X_3\}$ of products of three odd functions and the six expected values $E\{X_1 \delta(X_2) \delta(X_3)\}$, $E\{\mathrm{sgn}(X_1) |X_2| |X_3|\}$, $E\{\mathrm{sgn}(X_1) |X_2| \delta(X_3)\}$, $E\{\mathrm{sgn}(X_1) \delta(X_2) \delta(X_3)\}$, $E\{X_1 |X_2| |X_3|\}$, and $E\{X_1 |X_2| \delta(X_3)\}$ of products of one odd function and two even functions are zero. In addition, we easily get $E\{\delta(X_1) \delta(X_2) \delta(X_3)\} = f_3(0, 0, 0)$, i.e.,
$$E\{\delta(X_1) \delta(X_2) \delta(X_3)\} = \frac{1}{\sqrt{8\pi^3 |K_3|}} \tag{5.3.45}$$
based on (5.1.19), and the nine remaining expected values are considered in Exercises 5.21–5.23. The results of these ten expected values are summarized in Table 5.3. Meanwhile, some results shown in Table 5.3 can be verified using (5.3.38)–(5.3.43): for instance, we can reconfirm $E\{X_1 |X_2|\, \mathrm{sgn}(X_3)\} = \frac{1}{2} \frac{\partial \nu_{211}}{\partial \rho_{31}} = \frac{2}{\pi}\left(\rho_{12} \sin^{-1}\rho_{23} + \rho_{31}\sqrt{1-\rho_{23}^2}\right)$ via $\nu_{211}$ shown in (5.3.42) and $E\{X_1 X_2 |X_3|\} = \frac{1}{4} \frac{\partial \nu_{221}}{\partial \rho_{12}} = \sqrt{\frac{2}{\pi}}\,(\rho_{12} + \rho_{23}\rho_{31})$ via $\nu_{221}$ shown in (5.3.43).
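The two derivative identities quoted above can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the function names, the step size, and the correlation values are illustrative assumptions, and the formulas coded are (5.3.42) and (5.3.43) together with the two Table 5.3 entries they are said to reproduce.

```python
import math

def nu_211(r12, r23, r31):
    """Absolute moment nu_211 as written in (5.3.42)."""
    return (2 / math.pi) * ((r23 + 2 * r12 * r31) * math.asin(r23)
                            + (1 + r12 ** 2 + r31 ** 2) * math.sqrt(1 - r23 ** 2))

def nu_221(r12, r23, r31):
    """Absolute moment nu_221 as written in (5.3.43)."""
    return math.sqrt(2 / math.pi) * (1 + 2 * r12 ** 2 + r23 ** 2 + r31 ** 2
                                     + 4 * r12 * r23 * r31 - r23 ** 2 * r31 ** 2)

def central_diff(f, x, h=1e-5):
    """Central finite difference approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Illustrative correlation coefficients (assumed values; K_3 is positive definite).
r12, r23, r31 = 0.3, -0.2, 0.5

# E{X1 |X2| sgn(X3)}: half the derivative of nu_211 with respect to rho_31.
lhs1 = 0.5 * central_diff(lambda r: nu_211(r12, r23, r), r31)
rhs1 = (2 / math.pi) * (r12 * math.asin(r23) + r31 * math.sqrt(1 - r23 ** 2))

# E{X1 X2 |X3|}: a quarter of the derivative of nu_221 with respect to rho_12.
lhs2 = 0.25 * central_diff(lambda r: nu_221(r, r23, r31), r12)
rhs2 = math.sqrt(2 / math.pi) * (r12 + r23 * r31)

print(lhs1, rhs1)
print(lhs2, rhs2)
```

Because both moments are quadratic in the differentiated coefficient, the central difference agrees with the closed form up to rounding error.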

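⁷ This number is considered in (1.E.24).

Table 5.3 Expected values $E\{g_1(X_1) g_2(X_2) g_3(X_3)\}$ of some products for a standard tri-variate normal random vector $(X_1, X_2, X_3)$

$E\{\delta(X_1)\delta(X_2)\delta(X_3)\} = \frac{1}{\sqrt{8\pi^3 |K_3|}}$
$E\{X_1 X_2 |X_3|\} = \sqrt{\frac{2}{\pi}}\,(\rho_{12} + \rho_{23}\rho_{31})$
$E\{\delta(X_1)\delta(X_2)|X_3|\} = \frac{\sqrt{|K_3|}}{\sqrt{2\pi^3}\,(1-\rho_{12}^2)}$
$E\{\delta(X_1) X_2 X_3\} = \frac{1}{\sqrt{2\pi}}\,(\rho_{23} - \rho_{31}\rho_{12})$
$E\{\delta(X_1)\, \mathrm{sgn}(X_2)\, X_3\} = \frac{1}{\pi\sqrt{1-\rho_{12}^2}}\,(\rho_{23} - \rho_{31}\rho_{12})$
$E\{\delta(X_1) |X_2| |X_3|\} = \sqrt{\frac{2}{\pi^3}} \left\{ (\rho_{23} - \rho_{31}\rho_{12}) \sin^{-1}\beta_{23,1} + \sqrt{|K_3|} \right\}$
$E\{\delta(X_1)\, \mathrm{sgn}(X_2)\, \mathrm{sgn}(X_3)\} = \sqrt{\frac{2}{\pi^3}}\, \sin^{-1}\beta_{23,1}$
$E\{X_1 |X_2|\, \mathrm{sgn}(X_3)\} = \frac{2}{\pi} \left( \rho_{12} \sin^{-1}\rho_{23} + \rho_{31}\sqrt{1-\rho_{23}^2} \right)$
$E\{\mathrm{sgn}(X_1)\, \mathrm{sgn}(X_2)\, |X_3|\} = \sqrt{\frac{8}{\pi^3}} \left( \sin^{-1}\beta_{12,3} + \rho_{23} \sin^{-1}\beta_{31,2} + \rho_{31} \sin^{-1}\beta_{23,1} \right)$
$E\{|X_1 X_2 X_3|\} = \sqrt{\frac{8}{\pi^3}} \left\{ \sqrt{|K_3|} + \sum^{c} \left(\rho_{ij} + \rho_{jk}\rho_{ki}\right) \sin^{-1}\beta_{ij,k} \right\}$

5.3.3 General Formula for Joint Moments

For a natural number $n$, let
$$a = \{a_1, a_2, \ldots, a_n\} \tag{5.3.46}$$
with $a_i \in \{0, 1, \ldots\}$ a set of non-negative integers and let
$$l = \{l_{11}, l_{12}, \ldots, l_{1n}, l_{22}, l_{23}, \ldots, l_{2n}, \ldots, l_{n-1,n-1}, l_{n-1,n}, l_{nn}\} \tag{5.3.47}$$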

with $l_{ij} \in \{\ldots, -1, 0, 1, \ldots\}$ a set of integers. Given $a$ and $l$, define the collection
$$S_a = \left\{ l : \left\{\{l_{ij} \ge 0\}_{j=i}^{n}\right\}_{i=1}^{n},\ \{L_{a,k} \ge 0\}_{k=1}^{n} \right\} \tag{5.3.48}$$
of $l$, where
$$L_{a,k} = a_k - l_{kk} - \sum_{j=1}^{n} l_{jk} \tag{5.3.49}$$
for $k = 1, 2, \ldots, n$ and
$$l_{ji} = l_{ij} \tag{5.3.50}$$

for $j > i$. A general formula for the joint moments of normal random vectors can now be obtained as shown in the following theorem:

Theorem 5.3.5 For $X \sim N(m, K)$ with $m = (m_1, m_2, \ldots, m_n)$ and $K = [\tilde{\rho}_{ij}]$, we have
$$E\left\{\prod_{k=1}^{n} X_k^{a_k}\right\} = \sum_{l \in S_a} d_{a,l} \left(\prod_{i=1}^{n}\prod_{j=i}^{n} \tilde{\rho}_{ij}^{\,l_{ij}}\right) \left(\prod_{j=1}^{n} m_j^{L_{a,j}}\right), \tag{5.3.51}$$
where
$$d_{a,l} = 2^{-M_l} \left(\prod_{k=1}^{n} a_k!\right) \left(\prod_{i=1}^{n}\prod_{j=i}^{n} l_{ij}!\right)^{-1} \left(\prod_{j=1}^{n} L_{a,j}!\right)^{-1} \tag{5.3.52}$$
with $M_l = \sum_{i=1}^{n} l_{ii}$.

Proof The proof is shown in Appendix 5.1.

Note that, when any of the $\frac{1}{2}n(n+1)$ elements of $l$ or any of the $n$ elements of $\{L_{a,j}\}_{j=1}^{n}$ is a negative integer, we have $\left(\prod_{i=1}^{n}\prod_{j=i}^{n} l_{ij}!\right)^{-1}\left(\prod_{j=1}^{n} L_{a,j}!\right)^{-1} = 0$ because $(-k)! \to \pm\infty$ for $k = 1, 2, \ldots$. Therefore, the collection $S_a$ in $\sum_{l \in S_a}$ of (5.3.51) can be replaced with the collection of all sets of $\frac{1}{2}n(n+1)$ integers. Details for obtaining $E\{X_1 X_2 X_3^2\}$ based on Theorem 5.3.5 as an example are shown in Table 5.4 in the case of $a = \{1, 1, 2\}$.

Example 5.3.9 Based on Theorem 5.3.5, we easily get
$$E\{X_1 X_2 X_3\} = m_1 m_2 m_3 + \tilde{\rho}_{12} m_3 + \tilde{\rho}_{23} m_1 + \tilde{\rho}_{31} m_2 \tag{5.3.53}$$
and
$$E\{X_1 X_2 X_3 X_4\} = m_1 m_2 m_3 m_4 + m_1 m_2 \tilde{\rho}_{34} + m_1 m_3 \tilde{\rho}_{24} + m_1 m_4 \tilde{\rho}_{23} + m_2 m_3 \tilde{\rho}_{14} + m_2 m_4 \tilde{\rho}_{13} + m_3 m_4 \tilde{\rho}_{12} + \tilde{\rho}_{12}\tilde{\rho}_{34} + \tilde{\rho}_{13}\tilde{\rho}_{24} + \tilde{\rho}_{14}\tilde{\rho}_{23}. \tag{5.3.54}$$

Table 5.4 Element sets $l = \{l_{11}, l_{12}, l_{13}, l_{22}, l_{23}, l_{33}\}$ of $S_a$, $\{L_{a,j}\}_{j=1}^{3}$, coefficient $d_{a,l}$, and the terms in $E\{X_1 X_2 X_3^2\}$ for each of the seven element sets when $a = \{1, 1, 2\}$

No.   $\{l_{11}, l_{12}, l_{13}, l_{22}, l_{23}, l_{33}\}$   $\{L_{a,1}, L_{a,2}, L_{a,3}\}$   $d_{a,l}$   Term
1     {0, 0, 0, 0, 0, 0}     {1, 1, 2}     1     $m_1 m_2 m_3^2$
2     {0, 0, 0, 0, 1, 0}     {1, 0, 1}     2     $2\tilde{\rho}_{23} m_1 m_3$
3     {0, 0, 1, 0, 0, 0}     {0, 1, 1}     2     $2\tilde{\rho}_{13} m_2 m_3$
4     {0, 0, 1, 0, 1, 0}     {0, 0, 0}     2     $2\tilde{\rho}_{13}\tilde{\rho}_{23}$
5     {0, 1, 0, 0, 0, 0}     {0, 0, 2}     1     $\tilde{\rho}_{12} m_3^2$
6     {0, 0, 0, 0, 0, 1}     {1, 1, 0}     1     $\tilde{\rho}_{33} m_1 m_2$
7     {0, 1, 0, 0, 0, 1}     {0, 0, 0}     1     $\tilde{\rho}_{12}\tilde{\rho}_{33}$

Note that (5.3.53) is the same as (5.3.9) and (5.3.35), and (5.3.54) is the same as (5.3.10). ♦

When the mean vector is $0$ in Theorem 5.3.5, we have
$$E\left\{\prod_{k=1}^{n} X_k^{a_k}\right\} = \left(\prod_{k=1}^{n} a_k!\right) \sum_{l \in T_a} 2^{-M_l} \left(\prod_{i=1}^{n}\prod_{j=i}^{n} \frac{\tilde{\rho}_{ij}^{\,l_{ij}}}{l_{ij}!}\right), \tag{5.3.55}$$
where $T_a$ denotes the collection of $l$ such that $l_{11} + \sum_{k=1}^{n} l_{k1} = a_1$, $l_{22} + \sum_{k=1}^{n} l_{k2} = a_2$, $\ldots$, $l_{nn} + \sum_{k=1}^{n} l_{kn} = a_n$, and $\left\{\{l_{ij} \ge 0\}_{j=i}^{n}\right\}_{i=1}^{n}$. In other words, $T_a$ is the same as $S_a$ with $L_{a,k} \ge 0$ replaced by $L_{a,k} = 0$ in (5.3.48).

Theorem 5.3.6 We have $E\{X_1^{a_1} X_2^{a_2} \cdots X_n^{a_n}\} = 0$ for $X \sim N(0, K)$ when $\sum_{k=1}^{n} a_k$ is an odd number.

Proof Adding $l_{kk} + \sum_{j=1}^{n} l_{kj} = a_k$ for $k = 1, 2, \ldots, n$, we have $\sum_{k=1}^{n} a_k = M_l + \sum_{i=1}^{n}\sum_{j=1}^{n} l_{ij} = 2M_l + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} l_{ij}$, which is an even number. Thus, when $\sum_{k=1}^{n} a_k$ is an odd number, the collection $T_a$ is a null set and $E\{X_1^{a_1} X_2^{a_2} \cdots X_n^{a_n}\} = 0$. ♠
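The enumeration behind Theorem 5.3.5 can be sketched in code. The following is a brute-force illustration under stated assumptions rather than the book's own procedure: the function name `joint_moment`, the 0-based indexing, and the example mean vector and covariance matrix are hypothetical choices, and exact rational arithmetic is used so that the sum (5.3.51) with the coefficient (5.3.52) is evaluated without rounding.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def joint_moment(a, m, K):
    """E{prod_k X_k^{a_k}} for X ~ N(m, K), by enumerating S_a in (5.3.51)."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    # L_{a,k} >= 0 forces l_kk <= a_k // 2 and l_ij <= min(a_i, a_j) for i < j.
    ranges = [range((a[i] // 2 if i == j else min(a[i], a[j])) + 1)
              for i, j in pairs]
    total = Fraction(0)
    for values in product(*ranges):
        l = dict(zip(pairs, values))
        # (5.3.49) with l_jk = l_kj from (5.3.50); the sum over j includes j = k.
        L = [a[k] - l[(k, k)]
             - sum(l[(min(j, k), max(j, k))] for j in range(n))
             for k in range(n)]
        if min(L) < 0:
            continue  # l is not in S_a
        M = sum(l[(i, i)] for i in range(n))
        d = Fraction(1, 2 ** M)          # coefficient d_{a,l} of (5.3.52)
        for k in range(n):
            d *= factorial(a[k])
        for v in values:
            d /= factorial(v)
        for v in L:
            d /= factorial(v)
        term = d
        for (i, j), v in l.items():
            term *= K[i][j] ** v
        for k in range(n):
            term *= m[k] ** L[k]
        total += term
    return total

# Hypothetical example: mean vector and (positive definite) covariance matrix.
mean = (1, 2, 3)
cov = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(joint_moment((1, 1, 1), mean, cov))       # compare with (5.3.53)
print(joint_moment((1, 1, 2), mean, cov))       # sum of the seven terms of Table 5.4
print(joint_moment((1, 1, 1), (0, 0, 0), cov))  # odd total order, zero mean
```

With a zero mean vector only the element sets with every $L_{a,k} = 0$ contribute, so the last call illustrates Theorem 5.3.6. The enumeration grows quickly with the orders $a_k$ and is meant only for small cases.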

Example 5.3.10 For a zero-mean $n$-variate normal random vector, assume $a = \mathbf{1} = \{1, 1, \ldots, 1\}$. When $n$ is an odd number, $E\{X_1 X_2 \cdots X_n\} = 0$ from Theorem 5.3.6. Next, assume $n$ is even. Over a non-negative integer region, if $l_{kk} = 0$ for $k = 1, 2, \ldots, n$ and one of $\{l_{1k}, l_{2k}, \ldots, l_{nk}\} - \{l_{kk}\}$ is 1 and all the others are 0, we have $l_{kk} + \sum_{i=1}^{n} l_{ik} = 1$. Now, if $l \in T_a$, because $d_{a,l} = d_{\mathbf{1},l} = 1$ from (5.3.52) (here $M_l = 0$ and every factorial involved is $0!$ or $1!$), we have (Isserlis 1918)
$$E\{X_1 X_2 \cdots X_n\} = \sum_{l \in T_a} \prod_{i=1}^{n}\prod_{j=i}^{n} \tilde{\rho}_{ij}^{\,l_{ij}}. \tag{5.3.56}$$
Next, assigning 0, 1, and 0 to $l_{kk}$, one of $\{l_{1k}, l_{2k}, \ldots, l_{nk}\} - \{l_{kk}\}$, and all the others of $\{l_{1k}, l_{2k}, \ldots, l_{nk}\} - \{l_{kk}\}$, respectively, for $k = 1, 2, \ldots, n$ is the same as assigning 1 to each pair after dividing $\{1, 2, \ldots, n\}$ into $\frac{n}{2}$ pairs of two numbers. Here, a pair $(j, k)$ represents the subscript of $l_{jk}$. Now, recollecting that there are $\left(\frac{n}{2}\right)!$ possibilities for the same choice with a different order, the number of ways to divide $\{1, 2, \ldots, n\}$ into $\frac{n}{2}$ pairs of two numbers is $\binom{n}{2}\binom{n-2}{2} \cdots \binom{2}{2} \left\{\left(\frac{n}{2}\right)!\right\}^{-1} = \frac{n!}{2^{n/2} \left(\frac{n}{2}\right)!}$. In short, the number of elements in $T_a$, i.e., the number of non-zero terms on the right-hand side of (5.3.56), is $\frac{n!}{2^{n/2} \left(\frac{n}{2}\right)!} = (n-1)!!$. ♦
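The pairing argument of Example 5.3.10 can be made concrete with a short enumeration. This is an illustrative sketch, not the book's code: the helper names `pairings`, `isserlis`, and `double_factorial`, the 0-based indices, and the example covariance matrix are assumptions. For even $n$ the enumeration produces $n!/\{2^{n/2}(n/2)!\}$ pairings, for example 3 for $n = 4$ and 15 for $n = 6$.

```python
def pairings(items):
    """Yield every way to split `items` (even length) into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def isserlis(K):
    """E{X_1 ... X_n} for a zero-mean normal vector with covariance K, as in (5.3.56)."""
    n = len(K)
    if n % 2:
        return 0  # odd n: Theorem 5.3.6
    total = 0
    for pairing in pairings(list(range(n))):
        term = 1
        for i, j in pairing:
            term *= K[i][j]
        total += term
    return total

def double_factorial(k):
    """k!! for odd positive k (and 1 for k <= 0)."""
    return 1 if k <= 0 else k * double_factorial(k - 2)

# Hypothetical 4x4 covariance entries, used only to exercise the sum.
K4 = [[4, 1, 2, 3],
      [1, 5, 1, 2],
      [2, 1, 6, 1],
      [3, 2, 1, 7]]
print(isserlis(K4))  # K[0][1]*K[2][3] + K[0][2]*K[1][3] + K[0][3]*K[1][2]
print(len(list(pairings(list(range(6))))), double_factorial(5))
```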


5.4 Distributions of Statistics

Often, the terms sample and random sample (Abramowitz and Stegun 1972; Gradshteyn and Ryzhik 1980) are used to denote an i.i.d. random vector, especially in statistics. A function of a sample is called a statistic. With a sample⁸ $X = (X_1, X_2, \ldots, X_n)$ of size $n$, the mean $E\{X_i\}$ and the variance $\mathrm{Var}(X_i)$ of the component random variable $X_i$ are called the population mean and population variance, respectively. Unless stated otherwise, we assume that the population mean and population variance are $m$ and $\sigma^2$, respectively, for the samples considered in this section. We also denote by $E\{(X_i - m)^k\} = \mu_k$ the $k$-th population central moment of $X_i$ for $k = 0, 1, \ldots$.

5.4.1 Sample Mean and Sample Variance

Definition 5.4.1 (sample mean) The statistic
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{5.4.1}$$
for a sample $X = (X_1, X_2, \ldots, X_n)$ is called the sample mean of $X$.

Theorem 5.4.1 We have the expected value
$$E\{\bar{X}_n\} = m \tag{5.4.2}$$
and variance
$$\mathrm{Var}\{\bar{X}_n\} = \frac{\sigma^2}{n} \tag{5.4.3}$$
for the sample mean $\bar{X}_n$.

Proof First, $E\{\bar{X}_n\} = \frac{1}{n}\sum_{i=1}^{n} E\{X_i\} = m$. Using this result and the fact that $E\{X_i X_j\} = E\{X_i\} E\{X_j\} = m^2$ for $i \ne j$ and $E\{X_i X_j\} = E\{X_i^2\} = \sigma^2 + m^2$ for $i = j$, we have (5.4.3) from
$$\mathrm{Var}\{\bar{X}_n\} = E\{\bar{X}_n^2\} - E^2\{\bar{X}_n\} = E\left\{\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\right\} - m^2 = \frac{1}{n^2}\left[\sum_{i=1}^{n} E\{X_i^2\} + \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \ne i}}^{n} E\{X_i X_j\}\right] - m^2 = \frac{1}{n^2}\left[n(\sigma^2 + m^2) + n(n-1)m^2\right] - m^2 = \frac{\sigma^2}{n}.$$
♠

⁸ In several fields including engineering, the term sample is often used to denote an element $X_i$ of $X = (X_1, X_2, \ldots, X_n)$.

Example 5.4.1 (Rohatgi and Saleh 2001) Obtain the third central moment $\mu_3(\bar{X}_n)$ of the sample mean $\bar{X}_n$.

Solution The third central moment $\mu_3(\bar{X}_n) = E\{(\bar{X}_n - m)^3\} = \frac{1}{n^3} E\left\{\left[\sum_{i=1}^{n}(X_i - m)\right]^3\right\}$ of the sample mean of $X = (X_1, X_2, \ldots, X_n)$ can be expressed as
$$\mu_3(\bar{X}_n) = \frac{1}{n^3}\sum_{i=1}^{n} E\{(X_i - m)^3\} + \frac{3}{n^3}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \ne i}}^{n} E\{(X_i - m)^2 (X_j - m)\} + \frac{1}{n^3}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\substack{k=1 \\ i \ne j,\ j \ne k,\ k \ne i}}^{n} E\{(X_i - m)(X_j - m)(X_k - m)\}. \tag{5.4.4}$$
Now, noting that $E\{X_i - m\} = 0$ and that $X_i$ and $X_j$ are independent of each other for $i \ne j$, we have $E\{(X_i - m)^2 (X_j - m)\} = E\{(X_i - m)^2\} E\{X_j - m\} = 0$ for $i \ne j$ and $E\{(X_i - m)(X_j - m)(X_k - m)\} = E\{X_i - m\} E\{X_j - m\} E\{X_k - m\} = 0$ for $i \ne j$, $j \ne k$, and $k \ne i$. Thus, we have
$$\mu_3(\bar{X}_n) = \frac{\mu_3}{n^2} \tag{5.4.5}$$
from $\mu_3(\bar{X}_n) = \frac{1}{n^3}\sum_{i=1}^{n} E\{(X_i - m)^3\}$. ♦

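Definition 5.4.2 (sample variance) The statistic
$$W_n = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2 \tag{5.4.6}$$
is called the sample variance of $X = (X_1, X_2, \ldots, X_n)$.

Theorem 5.4.2 We have
$$E\{W_n\} = \sigma^2, \tag{5.4.7}$$
i.e., the expected value of sample variance is equal to the population variance.

Proof Let $Y_i = X_i - m$. Then, we have $E\{Y_i\} = 0$, $E\{Y_i^2\} = \sigma^2 = \mu_2$, and $E\{Y_i^4\} = \mu_4$. Next, letting $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \frac{1}{n}\sum_{i=1}^{n}(X_i - m)$, we have
$$\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2 = \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 \tag{5.4.8}$$
from $\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2 = \sum_{i=1}^{n} \left\{X_i - m - \left(\bar{X}_n - m\right)\right\}^2 = \sum_{i=1}^{n} \left\{Y_i - \frac{1}{n}\sum_{k=1}^{n}(X_k - m)\right\}^2$. In addition, because $\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 = \sum_{i=1}^{n} \left(Y_i^2 - 2\bar{Y} Y_i + \bar{Y}^2\right) = \sum_{i=1}^{n} Y_i^2 - 2\bar{Y}\sum_{i=1}^{n} Y_i + n\bar{Y}^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2$, we have
$$\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2. \tag{5.4.9}$$
Therefore, we have $E\left\{\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2\right\} = \sum_{i=1}^{n} E\{Y_i^2\} - nE\{\bar{Y}^2\} = n\sigma^2 - \frac{n}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E\{Y_i Y_j\} = n\sigma^2 - \frac{1}{n}\left[nE\{Y_i^2\} + 0\right] = (n-1)\sigma^2$ and $E\{W_n\} = E\left\{\frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2\right\} = \sigma^2$. ♠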
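Theorems 5.4.1 and 5.4.2 hold for any population with finite variance, so they can be checked exactly on a small discrete population by enumerating every possible sample. The sketch below is an illustration under assumptions: the function name `exact_sample_stats` and the three-point population are hypothetical, and exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product

def exact_sample_stats(values, probs, n):
    """Return (E{sample mean}, Var{sample mean}, E{sample variance}) exactly
    for an i.i.d. sample of size n from a finite population."""
    e_mean = e_mean_sq = e_w = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]            # independence: probabilities multiply
        xs = [Fraction(values[i]) for i in idx]
        xbar = sum(xs) / n           # sample mean (5.4.1)
        w = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance (5.4.6)
        e_mean += p * xbar
        e_mean_sq += p * xbar ** 2
        e_w += p * w
    return e_mean, e_mean_sq - e_mean ** 2, e_w

# Hypothetical population: P(X=0)=1/2, P(X=1)=1/4, P(X=3)=1/4.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
m = sum(p * v for p, v in zip(probs, values))                 # population mean
sigma2 = sum(p * (v - m) ** 2 for p, v in zip(probs, values)) # population variance

n = 3
mean_of_mean, var_of_mean, mean_of_w = exact_sample_stats(values, probs, n)
print(mean_of_mean, m)            # (5.4.2)
print(var_of_mean, sigma2 / n)    # (5.4.3)
print(mean_of_w, sigma2)          # (5.4.7)
```

Replacing the divisor `n - 1` by `n` in the sketch gives an expected value of $(n-1)\sigma^2/n$ instead of $\sigma^2$, in line with the proof above.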

Note that, due to the factor $n - 1$ instead of $n$ in the denominator of (5.4.6), the expected value of sample variance is equal to the population variance as shown in (5.4.7).

Theorem 5.4.3 (Rohatgi and Saleh 2001) We have the variance
$$\mathrm{Var}\{W_n\} = \frac{\mu_4}{n} - \frac{(n-3)\mu_2^2}{n(n-1)} \tag{5.4.10}$$

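of the sample variance $W_n$.

Proof Letting $Y_i = X_i - m$, we have $E\left\{\left(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right)^2\right\} = E\left\{\sum_{i=1}^{n}\sum_{j=1}^{n} Y_i^2 Y_j^2 - 2n\bar{Y}^2\sum_{i=1}^{n} Y_i^2 + n^2\bar{Y}^4\right\}$, i.e.,
$$E\left\{\left(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right)^2\right\} = n\mu_4 + n(n-1)\mu_2^2 - 2nE\left\{\bar{Y}^2\sum_{i=1}^{n} Y_i^2\right\} + n^2 E\left\{\bar{Y}^4\right\}. \tag{5.4.11}$$
In (5.4.11), $E\left\{\bar{Y}^2\sum_{i=1}^{n} Y_i^2\right\} = \frac{1}{n^2} E\left\{\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_i^2 Y_j Y_k\right\} = \frac{1}{n^2} E\left\{\sum_{i=1}^{n} Y_i^4 + \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \ne i}}^{n} Y_i^2 Y_j^2\right\}$ can be evaluated as
$$E\left\{\bar{Y}^2\sum_{i=1}^{n} Y_i^2\right\} = \frac{1}{n}\left\{\mu_4 + (n-1)\mu_2^2\right\} \tag{5.4.12}$$
and $E\left\{\bar{Y}^4\right\} = \frac{1}{n^4} E\left\{\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} Y_i Y_j Y_k Y_l\right\} = \frac{1}{n^4} E\left\{\sum_{i=1}^{n} Y_i^4 + 3\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \ne i}}^{n} Y_i^2 Y_j^2\right\}$ can be obtained as⁹
$$E\left\{\bar{Y}^4\right\} = \frac{1}{n^3}\left\{\mu_4 + 3(n-1)\mu_2^2\right\}. \tag{5.4.13}$$
Next, recollecting (5.4.9), (5.4.12), and (5.4.13), if we rewrite (5.4.11), we have $E\left[\left\{\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2\right\}^2\right] = E\left\{\left(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right)^2\right\} = n\mu_4 + n(n-1)\mu_2^2 - \frac{2n}{n}\left\{\mu_4 + (n-1)\mu_2^2\right\} + \frac{n^2}{n^3}\left\{\mu_4 + 3(n-1)\mu_2^2\right\}$, i.e.,
$$E\left[\left\{\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2\right\}^2\right] = \frac{(n-1)^2}{n}\mu_4 + \frac{(n-1)\left(n^2 - 2n + 3\right)}{n}\mu_2^2. \tag{5.4.14}$$
We get (5.4.10) from $\mathrm{Var}\{W_n\} = \frac{1}{(n-1)^2} E\left[\left\{\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2\right\}^2\right] - \mu_2^2$ using (5.4.7) and (5.4.14). ♠

⁹ In this formula, the factor 3 results from the three distinct cases of $i = j \ne k = l$, $i = k \ne j = l$, and $i = l \ne k = j$.

Theorem 5.4.4 We have
$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right) \tag{5.4.15}$$
and consequently
$$\frac{\sqrt{n}}{\sigma}\left(\bar{X}_n - \mu\right) \sim N(0, 1) \tag{5.4.16}$$
for a sample $X = (X_1, X_2, \ldots, X_n)$ from $N(\mu, \sigma^2)$.

Proof Recollecting the mgf $M(t) = \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$ of $N(\mu, \sigma^2)$ and using (5.E.23), the mgf of $\bar{X}_n$ can be obtained as $M_{\bar{X}_n}(t) = \left\{M\left(\frac{t}{n}\right)\right\}^n = \exp\left\{\mu t + \frac{1}{2}\left(\frac{\sigma}{\sqrt{n}}\right)^2 t^2\right\}$. Thus, we have (5.4.15), and (5.4.16) follows. ♠

Theorem 5.4.4 can also be shown from Theorem 5.2.5. More generally, we have $\bar{X}_n \sim N\left(\bar{\mu}, \frac{\bar{\sigma}^2}{n}\right)$ and $\frac{\sqrt{n}}{\bar{\sigma}}\left(\bar{X}_n - \bar{\mu}\right) \sim N(0, 1)$ when $\left\{X_i \sim N(\mu_i, \sigma_i^2)\right\}_{i=1}^{n}$ are independent of each other, where $\bar{\mu} = \frac{1}{n}\sum_{i=1}^{n}\mu_i$ and $\bar{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2$.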

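Theorem 5.4.5 The sample mean and sample variance of a normal sample are independent of each other.

Proof We first show that $A = \bar{X}_n$ and $B = (V_1, V_2, \ldots, V_n) = \left(X_1 - \bar{X}_n, X_2 - \bar{X}_n, \ldots, X_n - \bar{X}_n\right)$ are independent of each other. Letting $t' = (t, t_1, t_2, \ldots, t_n)$, the joint mgf $M_{A,B}(t') = E\left\{\exp\left(t\bar{X}_n + t_1 V_1 + t_2 V_2 + \cdots + t_n V_n\right)\right\} = E\left\{\exp\left(t\bar{X}_n + \sum_{i=1}^{n} t_i\left(X_i - \bar{X}_n\right)\right)\right\}$ of $A$ and $B$ is obtained as
$$M_{A,B}(t') = E\left[\exp\left\{\sum_{i=1}^{n} t_i X_i - \left(\sum_{i=1}^{n} t_i - t\right)\bar{X}_n\right\}\right] = E\left[\exp\left\{\sum_{i=1}^{n}\frac{1}{n}\left(nt_i + t - \sum_{j=1}^{n} t_j\right)X_i\right\}\right]. \tag{5.4.17}$$
Letting $\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i$, the joint mgf (5.4.17) can be expressed as $M_{A,B}(t') = \prod_{i=1}^{n} E\left\{\exp\left(X_i\,\frac{t + nt_i - n\bar{t}}{n}\right)\right\} = \prod_{i=1}^{n}\exp\left\{\mu\,\frac{t + nt_i - n\bar{t}}{n} + \frac{\sigma^2}{2}\left(\frac{t + nt_i - n\bar{t}}{n}\right)^2\right\} = \prod_{i=1}^{n}\exp\left[\frac{\mu t}{n} + \frac{\sigma^2 t^2}{2n^2} + \mu\left(t_i - \bar{t}\right) + \frac{\sigma^2}{2n^2}\left\{2nt\left(t_i - \bar{t}\right) + n^2\left(t_i - \bar{t}\right)^2\right\}\right] = \exp\left(\mu t + \frac{\sigma^2 t^2}{2n}\right)\prod_{i=1}^{n}\exp\left\{\frac{\sigma^2}{2}\left(t_i - \bar{t}\right)^2\right\}\exp\left\{\left(\mu + \frac{\sigma^2 t}{n}\right)\left(t_i - \bar{t}\right)\right\}$, or as
$$M_{A,B}(t') = \exp\left(\mu t + \frac{\sigma^2 t^2}{2n}\right)\exp\left\{\frac{\sigma^2}{2}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2\right\}\exp\left\{\left(\mu + \frac{\sigma^2 t}{n}\right)\sum_{i=1}^{n}\left(t_i - \bar{t}\right)\right\}. \tag{5.4.18}$$
Noting that $\sum_{i=1}^{n}\left(t_i - \bar{t}\right) = \sum_{i=1}^{n} t_i - n\bar{t} = 0$, we eventually have
$$M_{A,B}(t') = \exp\left(\mu t + \frac{\sigma^2 t^2}{2n}\right)\exp\left\{\frac{\sigma^2}{2}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2\right\}. \tag{5.4.19}$$
Meanwhile, because $A = \bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ as we have observed in Theorem 5.4.4, the mgf of $A$ is
$$M_A(t) = \exp\left(\mu t + \frac{\sigma^2}{2n}t^2\right). \tag{5.4.20}$$
Recollecting $\sum_{i=1}^{n}\left(t_i - \bar{t}\right) = 0$, the mgf of $B = (V_1, V_2, \ldots, V_n)$ can be obtained as $M_B(t) = E\left\{\exp\left(t_1 V_1 + t_2 V_2 + \cdots + t_n V_n\right)\right\} = E\left\{\exp\left(\sum_{i=1}^{n} t_i\left(X_i - \bar{X}_n\right)\right)\right\} = E\left\{\exp\left(\sum_{i=1}^{n} t_i X_i - \bar{t}\sum_{i=1}^{n} X_i\right)\right\} = \prod_{i=1}^{n} E\left[\exp\left\{X_i\left(t_i - \bar{t}\right)\right\}\right] = \prod_{i=1}^{n}\exp\left\{\mu\left(t_i - \bar{t}\right) + \frac{\sigma^2}{2}\left(t_i - \bar{t}\right)^2\right\} = \exp\left\{\mu\sum_{i=1}^{n}\left(t_i - \bar{t}\right) + \frac{\sigma^2}{2}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2\right\}$ or, equivalently, as
$$M_B(t) = \exp\left\{\frac{\sigma^2}{2}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2\right\}, \tag{5.4.21}$$
where $t = (t_1, t_2, \ldots, t_n)$. In short, from (5.4.19)–(5.4.21), the random variable $\bar{X}_n$ and the random vector $B = (V_1, V_2, \ldots, V_n)$ are independent of each other. Consequently, recollecting Theorem 4.1.3, the random variables $\bar{X}_n$ and $W_n = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2$, a function of $B = (V_1, V_2, \ldots, V_n)$, are independent of each other. ♠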

Theorem 5.4.5 is an important property of normal samples, and its converse is known also to hold true: if the sample mean and sample variance of a sample are independent of each other, then the sample is from a normal distribution. In the more general case of samples from symmetric marginal distributions, the sample mean and sample variance are known to be uncorrelated (Rohatgi and Saleh 2001).
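The remark on symmetric marginal distributions can be illustrated by computing the covariance between the sample mean and the sample variance exactly for two small discrete populations, one symmetric and one skewed. This is a sketch under assumptions: the function name `cov_mean_var` and both populations are hypothetical, and a zero covariance shows only that the two statistics are uncorrelated, which is weaker than the independence established for normal samples.

```python
from fractions import Fraction
from itertools import product

def cov_mean_var(values, probs, n):
    """Exact Cov(sample mean, sample variance) for an i.i.d. sample of size n."""
    e_a = e_w = e_aw = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]
        xs = [Fraction(values[i]) for i in idx]
        xbar = sum(xs) / n
        w = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        e_a += p * xbar
        e_w += p * w
        e_aw += p * xbar * w
    return e_aw - e_a * e_w

half, quarter = Fraction(1, 2), Fraction(1, 4)

# Symmetric population about 0 (assumed example).
cov_symmetric = cov_mean_var([-1, 0, 1], [quarter, half, quarter], 3)
# Skewed population (assumed example).
cov_skewed = cov_mean_var([0, 1, 3], [half, quarter, quarter], 3)

print(cov_symmetric)  # zero for the symmetric population
print(cov_skewed)     # non-zero for the skewed population
```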


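5.4.2 Chi-Square Distribution

Let us now consider the distributions of some statistics of normal samples.

Definition 5.4.3 (central chi-square pdf) The pdf
$$f(r) = \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\, r^{\frac{n}{2}-1}\exp\left(-\frac{r}{2}\right)u(r) \tag{5.4.22}$$
is called the (central) chi-square pdf with its distribution denoted by $\chi^2(n)$, where $n$ is called the degree of freedom.

The central chi-square pdf (5.4.22), an example of which is shown in Fig. 5.4, is the same as the gamma pdf $f(r) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, r^{\alpha-1}\exp\left(-\frac{r}{\beta}\right)u(r)$ introduced in (2.5.31) with $\alpha$ and $\beta$ replaced by $\frac{n}{2}$ and 2, respectively: in other words, $\chi^2(n) = G\left(\frac{n}{2}, 2\right)$.

Theorem 5.4.6 The square of a standard normal random variable is a $\chi^2(1)$ random variable.

Theorem 5.4.6 is proved in Example 3.2.20. Based on Theorem 5.4.6, we have $\frac{n}{\sigma^2}\left(\bar{X}_n - \mu\right)^2 \sim \chi^2(1)$ for a sample of size $n$ from $N(\mu, \sigma^2)$ and, more generally, $\frac{n}{\bar{\sigma}^2}\left(\bar{X}_n - \bar{\mu}\right)^2 \sim \chi^2(1)$ when $\left\{X_i \sim N(\mu_i, \sigma_i^2)\right\}_{i=1}^{n}$ are independent of each other.

Theorem 5.4.7 We have the mgf
$$M_Y(t) = (1 - 2t)^{-\frac{n}{2}}, \quad t < \frac{1}{2} \tag{5.4.23}$$
and moments
$$E\left\{Y^k\right\} = 2^k\,\frac{\Gamma\left(k + \frac{n}{2}\right)}{\Gamma\left(\frac{n}{2}\right)} \tag{5.4.24}$$
for a random variable $Y \sim \chi^2(n)$.

Fig. 5.4 A central chi-square pdf $f(r)$ for $n = 6$

Proof Using (5.4.22), the mgf $M_Y(t) = E\left\{e^{tY}\right\}$ can be obtained as
$$M_Y(t) = \int_0^\infty \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\, x^{\frac{n}{2}-1}\exp\left(-\frac{1-2t}{2}x\right)dx. \tag{5.4.25}$$
Now, letting $y = \frac{1-2t}{2}x$ for $t < \frac{1}{2}$, we have $x = \frac{2y}{1-2t}$ and $dx = \frac{2\,dy}{1-2t}$. Thus, recollecting (1.4.65), we obtain $M_Y(t) = \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\int_0^\infty \left(\frac{2y}{1-2t}\right)^{\frac{n}{2}-1} e^{-y}\,\frac{2\,dy}{1-2t} = (1-2t)^{-\frac{n}{2}}\,\frac{1}{\Gamma\left(\frac{n}{2}\right)}\int_0^\infty y^{\frac{n}{2}-1}e^{-y}\,dy$, which results in (5.4.23). The moments of $Y$ can easily be obtained as $E\left\{Y^k\right\} = \left.\frac{d^k}{dt^k}M_Y(t)\right|_{t=0} = (-2)^k\left(-\frac{n}{2}\right)\left(-\frac{n}{2}-1\right)\cdots\left(-\frac{n}{2}-(k-1)\right) = 2^k\,\frac{n}{2}\left(\frac{n}{2}+1\right)\cdots\left(\frac{n}{2}+k-1\right)$, resulting in (5.4.24). ♠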
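The moment formula (5.4.24) and the product form that appears at the end of the proof can be compared numerically. This is a minimal sketch with assumed function names; it uses the log-gamma function from the Python standard library to avoid overflow for large arguments.

```python
import math

def chi2_moment(k, n):
    """E{Y^k} for Y ~ chi^2(n) from (5.4.24), evaluated through log-gamma."""
    return 2.0 ** k * math.exp(math.lgamma(k + n / 2) - math.lgamma(n / 2))

def chi2_moment_product(k, n):
    """Same moment via the product 2^k (n/2)(n/2 + 1)...(n/2 + k - 1)."""
    out = 2.0 ** k
    for i in range(k):
        out *= n / 2 + i
    return out

n = 6  # degree of freedom used in Fig. 5.4
mean = chi2_moment(1, n)
variance = chi2_moment(2, n) - mean ** 2
print(mean, variance)                                   # n and 2n
print(chi2_moment(3, 5), chi2_moment_product(3, 5))     # two routes, same value
```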

Example 5.4.2 For $Y \sim \chi^2(n)$, we have the expected value $E\{Y\} = n$ and variance $\mathrm{Var}(Y) = 2n$ from (5.4.24). ♦

Example 5.4.3 (Rohatgi and Saleh 2001) For $X_n \sim \chi^2(n)$, obtain the limit distributions of $Y_n = \frac{X_n}{n^2}$ and $Z_n = \frac{X_n}{n}$.

Solution Recollecting that $M_{X_n}(t) = (1 - 2t)^{-\frac{n}{2}}$, we have $\lim_{n\to\infty} M_{Y_n}(t) = \lim_{n\to\infty} M_{X_n}\left(\frac{t}{n^2}\right) = \lim_{n\to\infty}\left(1 - \frac{2t}{n^2}\right)^{-\frac{n^2}{2t}\cdot\frac{t}{n}} = \lim_{n\to\infty}\exp\left(\frac{t}{n}\right) = 1$ and, consequently, $P(Y_n = 0) \to 1$. Similarly, we get $\lim_{n\to\infty} M_{Z_n}(t) = \lim_{n\to\infty}\left(1 - \frac{2t}{n}\right)^{-\frac{n}{2t}\cdot t} = e^t$ and, consequently, $P(Z_n = 1) \to 1$. ♦

When $\left\{X_i \sim \chi^2(k_i)\right\}_{i=1}^{n}$ are independent of each other, the mgf of $S_n = \sum_{i=1}^{n} X_i$ can be obtained as $M_{S_n}(t) = \prod_{i=1}^{n}(1 - 2t)^{-\frac{k_i}{2}} = (1 - 2t)^{-\frac{1}{2}\sum_{i=1}^{n} k_i}$ based on the mgf shown in (5.4.23) of $X_i$. This result proves the following theorem:

Theorem 5.4.8 When $\left\{X_i \sim \chi^2(k_i)\right\}_{i=1}^{n}$ are independent of each other, we have $\sum_{i=1}^{n} X_i \sim \chi^2\left(\sum_{i=1}^{n} k_i\right)$. In addition, if $\left\{X_i \sim N(\mu_i, \sigma_i^2)\right\}_{i=1}^{n}$ are independent of each other, then $\sum_{i=1}^{n}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 \sim \chi^2(n)$.

Definition 5.4.4 (non-central chi-square distribution) For independent normal random variables $\left\{X_i \sim N(\mu_i, \sigma^2)\right\}_{i=1}^{n}$ with an identical variance $\sigma^2$, the distribution of $Y = \sum_{i=1}^{n}\frac{X_i^2}{\sigma^2}$ is called the non-central chi-square distribution and is denoted by $\chi^2(n, \delta)$, where $n$ and $\delta = \sum_{i=1}^{n}\frac{\mu_i^2}{\sigma^2}$ are called the degree of freedom and non-centrality parameter, respectively.


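The pdf of $\chi^2(n, \delta)$ is
$$f(x) = \frac{1}{\sqrt{2^n\pi}}\, x^{\frac{n}{2}-1}\exp\left(-\frac{x+\delta}{2}\right)\sum_{j=0}^{\infty}\frac{(\delta x)^j}{(2j)!}\,\frac{\Gamma\left(j + \frac{1}{2}\right)}{\Gamma\left(j + \frac{n}{2}\right)}\,u(x). \tag{5.4.26}$$
Recollecting $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$ shown in (1.4.83), it is easy to see that (5.4.26) with $\delta = 0$ is the same as (5.4.22): in other words, $\chi^2(n, 0)$ is $\chi^2(n)$. In Exercise 5.32, it is shown that
$$E\{Y\} = n + \delta, \tag{5.4.27}$$
$$\sigma_Y^2 = 2n + 4\delta, \tag{5.4.28}$$
and
$$M_Y(t) = (1 - 2t)^{-\frac{n}{2}}\exp\left(\frac{\delta t}{1 - 2t}\right), \quad t\ \ldots$$

[…]

$$E\{Z\} = \delta\sqrt{\frac{n}{2}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}, \quad n > 1 \tag{5.4.44}$$
and variance
$$\mathrm{Var}\{Z\} = \frac{n\left(1 + \delta^2\right)}{n - 2} - \frac{n\delta^2}{2}\left\{\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\right\}^2, \quad n > 2 \tag{5.4.45}$$
of $Z \sim t(n, \delta)$.

Definition 5.4.7 (bi-variate t distribution) When the joint pdf of $(X, Y)$ is
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\left[1 + \frac{1}{n\left(1-\rho^2\right)}\left\{\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right\}\right]^{-\frac{n+2}{2}}, \tag{5.4.46}$$
the random vector $(X, Y)$ is called a bi-variate t random vector. The distribution, denoted by $t\left(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho, n\right)$, of $(X, Y)$ is called a bi-variate t distribution.

For $(X, Y) \sim t\left(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho, n\right)$, we have the means $E\{X\} = \mu_1$ and $E\{Y\} = \mu_2$, variances $\mathrm{Var}\{X\} = \frac{n}{n-2}\sigma_1^2$ and $\mathrm{Var}\{Y\} = \frac{n}{n-2}\sigma_2^2$ for $n > 2$, and correlation coefficient $\rho$. The parameter $n$ determines how fast the pdf (5.4.46) decays to 0 as $|x| \to \infty$ or as $|y| \to \infty$. When $n = 1$, the bi-variate t pdf is the same as the bi-variate Cauchy pdf, and the bi-variate t pdf converges to the bi-variate normal pdf as $n$ gets larger. In addition, we have $E\{X|Y\} = \rho s Y$, $E\{XY\} = \frac{\rho s n}{n-2}$, and
$$E\left\{X^2 \middle| Y\right\} = \frac{s^2}{n-1}\left[\left(1-\rho^2\right)n + \left\{1 + (n-2)\rho^2\right\}Y^2\right] \tag{5.4.47}$$
for $(X, Y) \sim t\left(0, 0, s^2, 1, \rho, n\right)$.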
5.4.4 F Distribution

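Definition 5.4.8 (central F pdf) The pdf
$$f(r) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,\frac{m}{n}\left(\frac{m}{n}r\right)^{\frac{m}{2}-1}\left(1 + \frac{m}{n}r\right)^{-\frac{m+n}{2}}u(r) \tag{5.4.48}$$
is called the central F pdf with the degree of freedom of $(m, n)$ and its distribution is denoted by $F(m, n)$.

The F distribution, together with the chi-square and t distributions, plays an important role in mathematical statistics. In Exercise 5.35, it is shown that the moment of $H \sim F(m, n)$ is
$$E\left\{H^k\right\} = \left(\frac{n}{m}\right)^k\frac{\Gamma\left(\frac{m}{2} + k\right)\Gamma\left(\frac{n}{2} - k\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \tag{5.4.49}$$
for $k = 1, 2, \ldots, \left\lceil\frac{n}{2}\right\rceil - 1$. Figure 5.6 shows the pdf of $F(4, 3)$.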
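A numerical sketch of (5.4.48) and (5.4.49) follows. It is an illustration under assumptions, not code from the book: the function names, the composite midpoint rule, the truncation of the integration range at $r = 100$, and the choice $(m, n) = (4, 10)$ are all illustrative.

```python
import math

def f_pdf(r, m, n):
    """Central F pdf (5.4.48); zero for r <= 0 because of the unit step u(r)."""
    if r <= 0:
        return 0.0
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2)
             - math.lgamma(n / 2))
    return math.exp(log_c + math.log(m / n)
                    + (m / 2 - 1) * math.log(m * r / n)
                    - (m + n) / 2 * math.log1p(m * r / n))

def f_moment(k, m, n):
    """E{H^k} from (5.4.49); the moment exists only for k < n/2."""
    if not k < n / 2:
        raise ValueError("moment exists only for k < n/2")
    return (n / m) ** k * math.exp(math.lgamma(m / 2 + k)
                                   + math.lgamma(n / 2 - k)
                                   - math.lgamma(m / 2)
                                   - math.lgamma(n / 2))

def integrate(f, a, b, steps):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

m, n = 4, 10
total = integrate(lambda r: f_pdf(r, m, n), 0.0, 100.0, 100_000)
mean_numeric = integrate(lambda r: r * f_pdf(r, m, n), 0.0, 100.0, 100_000)
print(total)                            # close to 1
print(mean_numeric, f_moment(1, m, n))  # both close to n / (n - 2)
```

For $k = 1$ the formula (5.4.49) reduces to $n/(n-2)$, which does not depend on $m$.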

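Theorem 5.4.12 (Rohatgi and Saleh 2001) We have
$$\frac{nX}{mY} \sim F(m, n) \tag{5.4.50}$$
when $X \sim \chi^2(m)$ and $Y \sim \chi^2(n)$ are independent of each other.

Proof Let $H = \frac{nX}{mY}$. Assuming the auxiliary variable $V = Y$, we have $X = \frac{m}{n}HV$ and $Y = V$. Because the Jacobian of the inverse transformation $(X, Y) = g^{-1}(H, V) = \left(\frac{m}{n}HV, V\right)$ is $J\left(g^{-1}(r, v)\right) = \left|\frac{\partial g^{-1}(r, v)}{\partial(r, v)}\right| = \left|\begin{matrix}\frac{m}{n}v & \frac{m}{n}r \\ 0 & 1\end{matrix}\right| = \frac{m}{n}v$, we have the joint pdf $f_{H,V}(r, v) = \frac{mv}{n}f_{X,Y}\left(\frac{m}{n}vr, v\right)$ of $(H, V)$ as
$$f_{H,V}(r, v) = \frac{mv}{n}f_X\left(\frac{m}{n}vr\right)f_Y(v). \tag{5.4.51}$$

Fig. 5.6 The pdf $f(r)$ of $F(4, 3)$ ($m = 4$, $n = 3$)

Now, the marginal pdf $f_H(r) = \int_{-\infty}^{\infty} f_{H,V}(r, v)\,dv = \int_{-\infty}^{\infty}\frac{mv}{n}f_X\left(\frac{m}{n}vr\right)f_Y(v)\,dv = \int_{-\infty}^{\infty}\frac{mv}{n}\,\frac{\left(\frac{1}{2}\right)^{\frac{m}{2}}}{\Gamma\left(\frac{m}{2}\right)}\left(\frac{m}{n}vr\right)^{\frac{m}{2}-1}\exp\left(-\frac{m}{2n}vr\right)\frac{\left(\frac{1}{2}\right)^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2}\right)}\,v^{\frac{n}{2}-1}\exp\left(-\frac{v}{2}\right)u(v)\,u\left(\frac{m}{n}vr\right)dv = \frac{\left(\frac{1}{2}\right)^{\frac{m+n}{2}}}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}}r^{\frac{m}{2}-1}\int_0^\infty v^{\frac{m+n}{2}-1}\exp\left\{-\frac{v}{2}\left(\frac{m}{n}r + 1\right)\right\}dv\;u(r)$ of $H$ can be obtained as¹⁰
$$f_H(r) = \left(\frac{1}{2}\right)^{\frac{m+n}{2}}\frac{1}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,\frac{m}{n}\left(\frac{m}{n}r\right)^{\frac{m}{2}-1}\int_0^\infty v^{\frac{m+n}{2}-1}\exp\left\{-\frac{v}{2}\left(\frac{m}{n}r + 1\right)\right\}dv\;u(r) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,\frac{m}{n}\left(\frac{m}{n}r\right)^{\frac{m}{2}-1}\left(1 + \frac{m}{n}r\right)^{-\frac{m+n}{2}}u(r) \tag{5.4.52}$$
by noting that $f_X(x) = \left(\frac{1}{2}\right)^{\frac{m}{2}}\left\{\Gamma\left(\frac{m}{2}\right)\right\}^{-1}x^{\frac{m}{2}-1}e^{-\frac{x}{2}}u(x)$ and $f_Y(y) = \left(\frac{1}{2}\right)^{\frac{n}{2}}\left\{\Gamma\left(\frac{n}{2}\right)\right\}^{-1}y^{\frac{n}{2}-1}e^{-\frac{y}{2}}u(y)$. ♠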

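Example 5.4.7 Show that $\frac{1}{X} \sim F(n, m)$ when $X \sim F(m, n)$.

Solution If we obtain the pdf $f_Y(y) = \frac{1}{y^2}\,\frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,\frac{m}{n}\left(\frac{m}{ny}\right)^{\frac{m}{2}-1}\left(1 + \frac{m}{ny}\right)^{-\frac{m+n}{2}}u\left(\frac{1}{y}\right) = \frac{1}{y^2}\,\frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}}y^{1-\frac{m}{2}}\left(\frac{ny + m}{ny}\right)^{-\frac{m+n}{2}}u(y) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}}y^{-\frac{m}{2}-1}(ny)^{\frac{m+n}{2}}(ny + m)^{-\frac{m+n}{2}}u(y)$ of $Y = \frac{1}{X}$ based on (3.2.27) and (5.4.48), we have $f_Y(y) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}}y^{\frac{n}{2}-1}\left(y + \frac{m}{n}\right)^{-\frac{m+n}{2}}u(y)$, i.e.,
$$f_Y(y) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,\frac{n}{m}\left(\frac{n}{m}y\right)^{\frac{n}{2}-1}\left(1 + \frac{n}{m}y\right)^{-\frac{m+n}{2}}u(y) \tag{5.4.53}$$
by noting that $u\left(\frac{1}{y}\right) = u(y)$. ♦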

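¹⁰ Here, note that $\int_0^\infty \left(\frac{1}{2}\right)^{\frac{m+n}{2}}\left\{\Gamma\left(\frac{m+n}{2}\right)\right\}^{-1}w^{\frac{m+n}{2}-1}e^{-\frac{w}{2}}\,dw = 1$ when we let $w = v\left(1 + \frac{m}{n}r\right)$.

Example 5.4.8 When $n \to \infty$, find the limit of the pdf of $F(m, n)$.

Solution Let us first rewrite the pdf $f_F(x)$ as
$$f_F(x) = \frac{1}{\Gamma\left(\frac{m}{2}\right)}\,\frac{1}{x}\,\underbrace{\frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\left(\frac{mx}{n}\right)^{\frac{m}{2}}}_{A}\left(1 + \frac{mx}{n}\right)^{-\frac{m+n}{2}}u(x). \tag{5.4.54}$$
Using (1.4.77), we have $\lim_{n\to\infty} A = \lim_{n\to\infty}\left(\frac{n}{2}\right)^{\frac{m}{2}}\left(\frac{mx}{n}\right)^{\frac{m}{2}} = \left(\frac{mx}{2}\right)^{\frac{m}{2}}$ and
$$\lim_{n\to\infty}\left(1 + \frac{m}{n}x\right)^{-\frac{m+n}{2}} = \lim_{n\to\infty}\left(1 + \frac{m}{n}x\right)^{-\frac{n}{2}}\left(1 + \frac{m}{n}x\right)^{-\frac{m}{2}} = \exp\left(-\frac{m}{2}x\right). \tag{5.4.55}$$
Thus, letting $a = \frac{m}{2}$, we get
$$\lim_{n\to\infty} f_F(x) = \frac{a}{\Gamma(a)}(ax)^{a-1}\exp(-ax)\,u(x). \tag{5.4.56}$$
In other words, when $n \to \infty$, $F(m, n) \to G\left(\frac{m}{2}, \frac{2}{m}\right)$, where $G(\alpha, \beta)$ denotes the gamma distribution described by the pdf (2.5.31). ♦

Example 5.4.9 In Example 5.4.8, we obtained the limit of the pdf of $F(m, n)$ when $n \to \infty$. Now, obtain the limit when $m \to \infty$. Then, based on the result, when $X \sim F(m, n)$ and $m \to \infty$, obtain the pdf of $\frac{1}{X}$.

Solution Rewrite the pdf $f_F(x)$ as
$$f_F(x) = \frac{1}{\Gamma\left(\frac{n}{2}\right)}\,\frac{1}{x}\,\underbrace{\frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)}\left(\frac{n}{n + mx}\right)^{\frac{n}{2}}}_{B}\,\underbrace{\left(\frac{mx}{n + mx}\right)^{\frac{m}{2}}}_{C}\,u(x). \tag{5.4.57}$$
Then, if we let $b = \frac{n}{2}$, we get
$$\lim_{m\to\infty} f_F(x) = \frac{1}{b\,\Gamma(b)}\left(\frac{b}{x}\right)^{b+1}\exp\left(-\frac{b}{x}\right)u(x) \tag{5.4.58}$$
noting that $\lim_{m\to\infty} B = \lim_{m\to\infty}\frac{\Gamma\left(\frac{m+n}{2}\right)}{\left(\frac{m}{2}\right)^{\frac{n}{2}}\Gamma\left(\frac{m}{2}\right)}\times\left(\frac{m}{2}\right)^{\frac{n}{2}}\left(\frac{n}{n + mx}\right)^{\frac{n}{2}} = \left(\frac{n}{2x}\right)^{\frac{n}{2}}$ from (1.4.77) and that $\lim_{m\to\infty} C = \lim_{m\to\infty}\left(1 + \frac{n}{mx}\right)^{-\frac{m}{2}} = \exp\left(-\frac{n}{2x}\right)$. Figure 5.7 shows the pdf of $F(m, 10)$ for some values of $m$, and Fig. 5.8 shows three pdf's¹¹ of $F(m, n)$ for $m \to \infty$. Next, for $m \to \infty$, the pdf $\lim_{m\to\infty} f_{\frac{1}{X}}(y) = \lim_{m\to\infty}\frac{1}{y^2}f_F\left(\frac{1}{y}\right) = \frac{1}{b\,\Gamma(b)}\,\frac{1}{y^2}(by)^{b+1}\exp(-by)\,u\left(\frac{1}{y}\right)$ of $\frac{1}{X}$ can be obtained as
$$\lim_{m\to\infty} f_{\frac{1}{X}}(y) = \frac{b}{\Gamma(b)}(by)^{b-1}\exp(-by)\,u(y). \tag{5.4.59}$$
In other words, $\frac{1}{X} \sim G\left(\frac{n}{2}, \frac{2}{n}\right)$ when $m \to \infty$. ♦

¹¹ The maximum is at $x = \frac{a-1}{a} = \frac{m-2}{m}$ in Fig. 5.7 and at $x = \frac{b}{b+1} = \frac{n}{n+2}$ in Fig. 5.8.

Fig. 5.7 The pdf $f_{F(m,10)}(x)$ for some values of $m$ (curves for $m = 10$, $m = 20$, $m = 100$, and $m \to \infty$, all with $n = 10$)

Fig. 5.8 The limit $\lim_{m\to\infty} f_{F(m,n)}(x)$ of the pdf of $F(m, n)$ (curves for $n = 0.5$, $n = 10$, and $n = 100$)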
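The two limits obtained in Examples 5.4.8 and 5.4.9 can be compared with the F pdf at a large finite degree of freedom. The following is a rough numerical sketch under assumptions: the function names, the evaluation point $x = 1$, and the value $10^6$ standing in for the limit are illustrative choices.

```python
import math

def f_pdf(x, m, n):
    """Central F pdf (5.4.48), evaluated in the log domain for large m or n."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2)
             - math.lgamma(n / 2))
    return math.exp(log_c + math.log(m / n)
                    + (m / 2 - 1) * math.log(m * x / n)
                    - (m + n) / 2 * math.log1p(m * x / n))

def limit_n_to_infinity(x, m):
    """Right-hand side of (5.4.56) with a = m/2."""
    a = m / 2
    return a / math.gamma(a) * (a * x) ** (a - 1) * math.exp(-a * x)

def limit_m_to_infinity(x, n):
    """Right-hand side of (5.4.58) with b = n/2."""
    b = n / 2
    return (b / x) ** (b + 1) * math.exp(-b / x) / (b * math.gamma(b))

x = 1.0
large = 10 ** 6  # stands in for the limit
print(f_pdf(x, 4, large), limit_n_to_infinity(x, 4))
print(f_pdf(x, large, 10), limit_m_to_infinity(x, 10))
```

The log-domain evaluation matters here: forming $(mx/n)^{m/2-1}$ directly would overflow for $m = 10^6$.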



Because $\frac{1}{X} \sim F(n, m)$ when $X \sim F(m, n)$ as we have observed in (5.4.53) and $F(m, n) \to G\left(\frac{m}{2}, \frac{2}{m}\right)$ for $n \to \infty$ as we have observed in Example 5.4.8, we have $\frac{1}{X} \sim G\left(\frac{n}{2}, \frac{2}{n}\right)$ for $m \to \infty$ when $X \sim F(m, n)$: Example 5.4.9 shows this result directly. In addition, based on this result and (3.2.26), we have $\frac{n}{2X} \sim G\left(\frac{n}{2}, 1\right)$ for $m \to \infty$ when $X \sim F(m, n)$.

Theorem 5.4.13 (Rohatgi and Saleh 2001) If $X = (X_1, X_2, \ldots, X_m)$ from $N\left(\mu_X, \sigma_X^2\right)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$ from $N\left(\mu_Y, \sigma_Y^2\right)$ are independent of each other, then
$$\frac{\sigma_Y^2\,W_{X,m}}{\sigma_X^2\,W_{Y,n}} \sim F(m-1, n-1) \tag{5.4.60}$$
and
$$\frac{\left(\bar{X}_m - \mu_X\right) - \left(\bar{Y}_n - \mu_Y\right)}{\sqrt{\frac{(m-1)W_{X,m}}{\sigma_X^2} + \frac{(n-1)W_{Y,n}}{\sigma_Y^2}}}\,\sqrt{\frac{m+n-2}{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}}} \sim t(m+n-2), \tag{5.4.61}$$
where $\bar{X}_m$ and $W_{X,m}$ are the sample mean and sample variance of $X$, respectively, and $\bar{Y}_n$ and $W_{Y,n}$ are the sample mean and sample variance of $Y$, respectively.

Proof From (5.4.41), we have $\frac{(m-1)}{\sigma_X^2}W_{X,m} \sim \chi^2(m-1)$ and $\frac{(n-1)}{\sigma_Y^2}W_{Y,n} \sim \chi^2(n-1)$. Thus, (5.4.60) follows from Theorem 5.4.12. Next, noting that $\bar{X}_m - \bar{Y}_n \sim N\left(\mu_X - \mu_Y, \frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}\right)$, $\frac{(m-1)}{\sigma_X^2}W_{X,m} + \frac{(n-1)}{\sigma_Y^2}W_{Y,n} \sim \chi^2(m+n-2)$, and these two statistics are independent of each other, we easily get (5.4.61) from Theorem 5.4.10. ♠

Definition 5.4.9 (non-central F distribution) When $X \sim \chi^2(m, \delta)$ and $Y \sim \chi^2(n)$ are independent of each other, the distribution of
$$H = \frac{nX}{mY} \tag{5.4.62}$$
is called the non-central F distribution with the degree of freedom of $(m, n)$ and non-centrality parameter $\delta$, and is denoted by $F(m, n, \delta)$.

The pdf of the non-central F distribution $F(m, n, \delta)$ is
$$f(x) = \frac{m^{\frac{m}{2}}n^{\frac{n}{2}}x^{\frac{m}{2}-1}}{\Gamma\left(\frac{n}{2}\right)\exp\left(\frac{\delta}{2}\right)}\sum_{j=0}^{\infty}\frac{\left(\frac{\delta mx}{2}\right)^j\,\Gamma\left(\frac{m+n+2j}{2}\right)}{j!\,\Gamma\left(\frac{m+2j}{2}\right)(mx + n)^{\frac{m+n+2j}{2}}}\,u(x). \tag{5.4.63}$$
Here, the pdf (5.4.63) for $\delta = 0$ indicates that $F(m, n, 0)$ is the central F distribution $F(m, n)$. In Exercise 5.36, we obtain the mean
$$E\{H\} = \frac{n(m + \delta)}{m(n - 2)} \tag{5.4.64}$$
for $n = 3, 4, \ldots$ and variance
$$\mathrm{Var}\{H\} = \frac{2n^2\left\{(m + \delta)^2 + (n - 2)(m + 2\delta)\right\}}{m^2(n - 4)(n - 2)^2} \tag{5.4.65}$$
for $n = 5, 6, \ldots$ of $F(m, n, \delta)$.


Appendices Appendix 5.1 Proof of General Formula for Joint Moments The general formula (5.3.51) for the joint moments of normal random vectors is proved via mathematical induction here. First note that ⎫ ⎧ ⎪ ⎪  n  ⎪ n ⎬ ⎨ * a * a⎪ ∂ a −1 a −1 j (5.A.1) E X k k = ai a j E X i i X j Xkk ⎪ ⎪ ∂ ρ˜i j ⎪ ⎪ k=1 k=1 ⎭ ⎩ k=i, j

for i = j and 

n * ∂ E X kak ∂ ρ˜ii k=1

 =

⎫ ⎪ ⎪ ⎬

⎧ ⎪ ⎪ ⎨

n * 1 ai (ai − 1) E X iai −2 X kak ⎪ ⎪ 2 ⎪ ⎪ k=1 ⎭ ⎩

(5.A.2)

k=i

for i = j from (5.3.12).

 (1) Let us show that (5.3.22) holds true. Express E X 1a X 2b as

E



X 1a X 2b



=

a− j b− j min(a,b) . ..

j

p

q

a− j−2 p

da,b, j, p,q ρ˜12 ρ˜11 ρ˜22 m 1

b− j−2q

m2

. (5.A.3)

p=0 q=0

j=0

   Then, when ρ˜12 = 0, we have E X 1a X 2b = E X 1a E X 2b , i.e., a b . .

p

q

a−2 p

da,b,0, p,q ρ˜11 ρ˜22 m 1

b−2q

m2

p=0 q=0

=

a .

p da, p ρ˜11

a−2 p m1

p=0

b .

q

b−2q

db,q ρ˜22 m 2

(5.A.4)

q=0

 p a−2 p a! because the coefficient of ρ˜11 m 1 is da, p = 2 p p!(a−2 when E X 1a is p)! expanded as we can see from (5.3.17). Thus, we get da,b,0, p,q =

2 p+q

a!b! . p!q!(a − 2 p)!(b − 2q)!

(5.A.5)

390

5 Normal Random Vectors

Next, from (5.A.1), we get   ∂ E X 1a X 2b = abE X 1a−1 X 2b−1 . ∂ ρ˜12

(5.A.6)

The left- and right-hand sides of (5.A.6) can be expressed as ∂ E ∂ ρ˜ 12

=



X 1a X 2b



min(a,b) + a− +j b− +j

=

p=0 q=0

j=1

min(a,b)−1 + a−1− + j b−1− +j j=0

p=0

q=0

j−1 p

q

j da,b, j, p,q ρ˜12 ρ˜11 ρ˜22 a−1− j−2 p

( j + 1)da,b, j+1, p,q m 1

b−1− j−2q

m2

(5.A.7)

and  E X 1a−1 X 2b−1 =

a−1− j b−1− j min(a−1,b−1) . . . j=0

p=0

j

p

q

da−1,b−1, j, p,q ρ˜12 ρ˜11 ρ˜22

q=0

a−1− j−2 p b−1− j−2q ×m 1 m2 ,

(5.A.8)

respectively, using (5.A.3). Taking into consideration that min(a, b) − 1 = min(a − 1, b − 1), we get da,b, j+1, p,q =

ab da−1,b−1, j, p,q j +1

(5.A.9)

from (5.A.6)–(5.A.8). Using (5.A.5) in (5.A.9) recursively, we obtain da,b, j+1, p,q =

a!b! , 2 p+q p!q!( j + 1)!(a − j − 1 − 2 p)!(b − j − 1 − 2q)! (5.A.10)

which is equivalent to the coefficient shown in (5.3.22). (2) We have so far shown that (5.3.51) holds true when n = 2, and that (5.3.51) holds true when n = 1 as shown in Exercise 5.27. Now, assume that (5.3.51) holds true when n = m − 1. Then, because   am−1  am E Xm E X 1a1 X 2a2 · · · X mam = E X 1a1 X 2a2 · · · X m−1 when ρ˜1m = ρ˜2m = · · · = ρ˜m−1,m = 0, we get = da1 ,l 1 dam ,lmm , i.e.,

2

da2 ,l 2

(5.A.11)

3 l1m →0,l2m →0,...,lm−1,m →0

Appendices

391

2 3 da2 ,l 2 l1m →0,l2m →0,...,lm−1,m →0 = da1 ,l 1 dam ,lmm m 1

=

ak !

k=1

2 Ml 2 ζm−1 (l) lmm ! (am − 2lmm )!η1,m−1

(5.A.12)

n + m−1 from (5.3.51) with n = m − 1 and (5.3.17), where Ml = lii , a1 = {ai }i=1 , a2 = i=1  m−1 m m 11 m−1 m , l 2 = l 1 ∪ {lim }i=1 , ζm (l) = li j !, and ηk,m = a1 ∪ {am }, l 1 = li j j=i i=1

m 1

i=1 j=i

L ak , j !. Here, the symbol → denotes a substitution: for example, α → β means

j=1

the substitution of α with β. Next, employing (5.A.1) with (i, j) = (1, m), (2, m), . . . , (m − 1, m), we will get 3 2 ai !am ! da2 ,l 2 lim →0,ai →ai −lim −1,am →am −lim −1 3 2 da2 ,l 2 lim →lim +1 = (ai − lim − 1)! (am − lim − 1)! (lim + 1)!

(5.A.13)

for i = 1, 2, . . . , m − 1 in a fashion similar to that leading to (5.A.10) from (5.A.6). By changing lim + 1 into lim in (5.A.13), we have da2 ,l 2 =

3 2 ai !am ! da2 ,l 2 lim →0,ai →ai −lim ,am →am −lim (ai − lim )! (am − lim )!lim !

(5.A.14)

for i = 1, 2, . . . , m − 1. Now, letting l2m = 0 in (5.A.14) with i = 1, we get 3 2 a1 !am ! da2 ,l 2 l1m →0,l2m →0,a1 →a1 −l1m ,am →am −l1m 2 3 da2 ,l 2 l2m →0 = . (a1 − l1m )! (am − l1m )!l1m !

(5.A.15)

Using (5.A.15) into (5.A.14) with i = 2, we obtain a1 ! (am − l2m )! a2 !am ! × (a2 − l2m )! (am − l2m )!l2m ! (a1 − l1m )! (am − l1m − l2m )!l1m ! 2 3 × da2 ,l 2 l1m →0,l2m →0,a1 →a1 −l1m ,a2 →a2 −l2m ,am →am −l1m −l2m . (5.A.16)

da2 ,l 2 =

Subsequently, letting l3m = 0 in (5.A.16), we get 3 2 a1 !a2 !am ! da2 ,l 2 l3m →0 = l1m !l2m ! (a1 − l1m )! (a2 − l2m )! (am − l1m − l2m )! 2 3 × da2 ,l 2 lkm →0 for k=1,2,3, a1 →a1 −l1m , a2 →a2 −l2m , am →am −l1m −l2m , (5.A.17)

392

5 Normal Random Vectors

which can be employed into (5.A.14) with i = 3 to produce da2 ,l 2 = 1 a3 !am ! a1 !a2 ! (am − l3m )! − l ! − l l !l ! − l (a (a )!l (a )! )! 3m 1m 2m 1 1m (a2 − l2m )! (am − l1m − l2m − l3m )! 2 3 3 3m 3m m da2 ,l 2 lkm →0, ak →ak −lkm for k=1,2,3; a →a −l −l −l , i.e., m

m

1m

2m

3m

 da2 ,l 2 = 



3 1

lkm ! 2 3 × da2 ,l 2 k=1

3 1



ak ! am !   3 3 1 + lkm ! (ak − lkm )! am − k=1

k=1

k=1

lkm →0, ak →ak −lkm for k=1,2,3; am →am −

3 +

lkm

.

(5.A.18)

k=1

3 2 If we repeat the steps above until we reach i = m − 1, using da2 ,l 2 lm−1,m →0 obtained by letting lm−1,m = 0 in (5.A.18) with i = m − 2 and recollecting (5.A.14) with i = m − 1, we will eventually get m−1 1

 ak ! am ! k=1 da2 ,l 2 = m−1   m−1  m−1 1 1 + lkm ! lkm ! (ak − lkm )! am − k=1 k=1 k=1 2 3 × da2 ,l 2 . m−1 + lkm →0, ak →ak −lkm for k=1,2,...,m−1; am →am −

(5.A.19)

lkm

k=1

Finally, noting that am − that

2

L a1 , j

m−1 +

lkm − 2lmm = am − lmm −

k=1

3

ak →ak −lkm for k=1,2,...,m−1

= a j − l jm − l j j −

m−1 +

m +

lkm = L a2 ,m

k=1

lk j = a j − l j j −

k=1

m + k=1

and

lk j = L a2 , j

for j = 1, 2, . . . , m − 1, if we combine (5.A.12) and (5.A.19), we can get m−1 1

 ak ! am ! k=1 = m−1   m−1  m−1 1 1 + lkm ! lkm ! (ak − lkm )! am −

da2 ,l 2

k=1

m−1 1

k=1

k=1

  m−1 + lkm ! (ak − lkm )! am − k=1 k=1 ×   m−1 + Ml 2 2 ζm−1 (l) lmm ! am − lkm − 2lmm !D2,2 k=1 m 1

=

ak !

k=1

2 Ml 2 ζm (l) η2,m (l)

,

(5.A.20)


which implies that (5.3.51) holds true also when n = m, where D2,2 = 2  3 L a1 , j ak →ak −lkm for k=1,2,...,m−1 !.

m−1 1 j=1

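Appendix 5.2 Some Integral Formulas

For the quadratic function

$$Q(\boldsymbol{x}) = \sum_{j=1}^{n}\sum_{i=1}^{n} a_{ij}x_i x_j \tag{5.A.21}$$

of $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)$, consider

$$J_n = \int_0^\infty\!\!\int_0^\infty\!\cdots\!\int_0^\infty \exp\{-Q(\boldsymbol{x})\}\,d\boldsymbol{x}, \tag{5.A.22}$$

where $d\boldsymbol{x} = dx_1\,dx_2\cdots dx_n$. When $n = 1$ with $Q(\boldsymbol{x}) = a_{11}x_1^2$, we easily get

$$J_1 = \frac{\sqrt{\pi}}{2\sqrt{a_{11}}} \tag{5.A.23}$$

for $a_{11} > 0$. When $n = 2$, assume $Q(\boldsymbol{x}) = a_{11}x_1^2 + a_{22}x_2^2 + 2a_{12}x_1x_2$, where $\Delta_2 = a_{11}a_{22} - a_{12}^2 > 0$. We then get

$$J_2 = \frac{1}{2\sqrt{\Delta_2}}\left(\frac{\pi}{2} - \tan^{-1}\frac{a_{12}}{\sqrt{\Delta_2}}\right), \tag{5.A.24}$$

which reduces to $J_2 = \frac{\pi}{4\sqrt{a_{11}a_{22}}}$, the product of two integrals of the form (5.A.23), when $a_{12} = 0$. In addition, when $n = 3$, assume $Q(\boldsymbol{x}) = a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 + 2a_{12}x_1x_2 + 2a_{23}x_2x_3 + 2a_{31}x_3x_1$, where $\Delta_3 = a_{11}a_{22}a_{33} - a_{11}a_{23}^2 - a_{22}a_{31}^2 - a_{33}a_{12}^2 + 2a_{12}a_{23}a_{31} > 0$ and $\{a_{ii} > 0\}_{i=1}^{3}$. Then, we will get

$$J_3 = \frac{\sqrt{\pi}}{4\sqrt{\Delta_3}}\left\{\frac{\pi}{2} + \sum^{c}\tan^{-1}\frac{a_{ij}a_{ki} - a_{ii}a_{jk}}{\sqrt{a_{ii}\Delta_3}}\right\} \tag{5.A.25}$$

after some manipulations, where $\sum^{c}$ denotes the cyclic sum defined in (5.3.44).

Now, recollect the standard normal pdf

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right) \tag{5.A.26}$$

and the standard normal cdf

$$\Phi(x) = \int_{-\infty}^{x}\phi(t)\,dt \tag{5.A.27}$$

defined in (3.5.2) and (3.5.3), respectively. Based on (5.A.23) or on $\int_{-\infty}^{\infty}\exp\left(-\alpha x^2\right)dx = \sqrt{\frac{\pi}{\alpha}}$ shown in (3.3.28), we get $\int_{-\infty}^{\infty}\phi^m(x)\,dx = \frac{1}{(2\pi)^{m/2}}\int_{-\infty}^{\infty}\exp\left(-\frac{mx^2}{2}\right)dx$, i.e.,

$$\int_{-\infty}^{\infty}\phi^m(x)\,dx = (2\pi)^{-\frac{m-1}{2}}\,m^{-\frac{1}{2}}. \tag{5.A.28}$$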
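A closed form such as the one for $J_2$ is easy to mistype, so a quick numerical sanity check can be helpful. The sketch below is only an illustration under stated assumptions: it evaluates $\frac{1}{2\sqrt{\Delta_2}}\left(\frac{\pi}{2} - \tan^{-1}\frac{a_{12}}{\sqrt{\Delta_2}}\right)$ and compares it with a plain midpoint-rule approximation of the double integral over a truncated quadrant. The function names, the truncation point `upper`, and the grid size `steps` are hypothetical choices made here, not part of the text.

```python
import math


def j2_closed_form(a11, a22, a12):
    """Closed form of J_2 for Q(x) = a11*x1^2 + a22*x2^2 + 2*a12*x1*x2."""
    delta2 = a11 * a22 - a12 ** 2
    if a11 <= 0 or delta2 <= 0:
        raise ValueError("Q must be positive definite (a11 > 0 and Delta_2 > 0)")
    root = math.sqrt(delta2)
    return (math.pi / 2 - math.atan(a12 / root)) / (2 * root)


def j2_midpoint(a11, a22, a12, upper=8.0, steps=400):
    """Midpoint-rule approximation of J_2 on the square [0, upper]^2.

    'upper' and 'steps' are illustrative values; the integrand is assumed
    to be negligible outside the square.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x1 = (i + 0.5) * h
        for j in range(steps):
            x2 = (j + 0.5) * h
            total += math.exp(-(a11 * x1 * x1 + a22 * x2 * x2 + 2 * a12 * x1 * x2))
    return total * h * h


if __name__ == "__main__":
    for coeffs in [(1.0, 2.0, 0.5), (1.0, 1.0, -0.5)]:
        print(coeffs, j2_closed_form(*coeffs), j2_midpoint(*coeffs))
```

With $a_{12} = 0$ the closed form reduces to the product of two one-dimensional integrals of the type $J_1$, which is a convenient special case to test first.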

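For $n = 0, 1, \ldots$, consider

$$I_n(a) = 2\pi\int_{-\infty}^{\infty}\Phi^n(ax)\,\phi^2(x)\,dx = \int_{-\infty}^{\infty}\Phi^n(ax)\exp\left(-x^2\right)dx. \tag{5.A.29}$$

Letting $n = 0$ in (5.A.29), we have

$$I_0(a) = \sqrt{\pi}, \tag{5.A.30}$$

and letting $a = 0$ in (5.A.29), we have

$$I_n(0) = \frac{\sqrt{\pi}}{2^n}. \tag{5.A.31}$$

Because $2m+1$ is an odd number for $m = 0, 1, \ldots$, we have

$$\int_{-\infty}^{\infty}\left\{\Phi(ax) - \frac{1}{2}\right\}^{2m+1}\exp\left(-x^2\right)dx = 0, \tag{5.A.32}$$

which can subsequently be expressed as $\sum_{i=0}^{2m+1}\left(-\frac{1}{2}\right)^{i}{}_{2m+1}\mathrm{C}_{i}\,I_{2m+1-i}(a) = 0$ from (5.A.29) and $\left\{\Phi(ax) - \frac{1}{2}\right\}^{2m+1} = \sum_{i=0}^{2m+1}\left(-\frac{1}{2}\right)^{i}{}_{2m+1}\mathrm{C}_{i}\,\Phi^{2m+1-i}(ax)$. This result in turn can be rewritten as

$$I_{2m+1}(a) = \sum_{i=1}^{2m+1}2^{-i}(-1)^{i+1}\,{}_{2m+1}\mathrm{C}_{i}\,I_{2m+1-i}(a) \tag{5.A.33}$$

for $m = 0, 1, \ldots$ after some steps. Thus, when $m = 0$, from (5.A.30) and (5.A.33), we get $I_1(a) = \frac{1}{2}I_0(a)$, i.e.,

$$I_1(a) = \frac{\sqrt{\pi}}{2}. \tag{5.A.34}$$

Similarly, when $m = 1$, from (5.A.30), (5.A.33), and (5.A.34), we get $I_3(a) = \frac{3}{2}I_2(a) - \frac{3}{4}I_1(a) + \frac{1}{8}I_0(a)$, i.e.,

$$I_3(a) = \frac{3}{2}I_2(a) - \frac{1}{4}I_0(a). \tag{5.A.35}$$

Next, recollecting that $\frac{d}{da}\Phi(ax) = x\phi(ax)$ and $\frac{d}{da}\Phi^2(ax) = 2x\Phi(ax)\phi(ax)$, if we differentiate $I_2(a)$ with respect to $a$ using Leibnitz's rule (3.2.18), integrate by parts, and then use (3.3.29), we get

$$\frac{d}{da}I_2(a) = 2\pi\int_{-\infty}^{\infty}2x\Phi(ax)\phi(ax)\phi^2(x)\,dx = \frac{2}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\Phi(ax)\,x\exp\left(-\frac{2+a^2}{2}x^2\right)dx$$

$$= \sqrt{\frac{2}{\pi}}\left\{\left[-\frac{\Phi(ax)}{2+a^2}\exp\left(-\frac{2+a^2}{2}x^2\right)\right]_{x=-\infty}^{\infty} + \frac{a}{2+a^2}\int_{-\infty}^{\infty}\phi(ax)\exp\left(-\frac{2+a^2}{2}x^2\right)dx\right\} = \frac{a}{\pi\left(2+a^2\right)}\int_{-\infty}^{\infty}\exp\left\{-\left(1+a^2\right)x^2\right\}dx,$$

i.e.,

$$\frac{d}{da}I_2(a) = \frac{a}{\pi\left(2+a^2\right)}\sqrt{\frac{\pi}{1+a^2}}. \tag{5.A.36}$$

Consequently, noting $\frac{d}{dx}\tan^{-1}x = \frac{1}{1+x^2}$ and $\frac{d}{da}\tan^{-1}\sqrt{1+a^2} = \frac{a}{\left(2+a^2\right)\sqrt{1+a^2}}$, we finally obtain

$$I_2(a) = \frac{1}{\sqrt{\pi}}\tan^{-1}\sqrt{1+a^2} \tag{5.A.37}$$

from (5.A.31) and (5.A.36), and then,

$$I_3(a) = \frac{3}{2\sqrt{\pi}}\tan^{-1}\sqrt{1+a^2} - \frac{\sqrt{\pi}}{4} \tag{5.A.38}$$

from (5.A.35) and (5.A.37). The results $\{J_k\}_{k=1}^{3}$ and $\{I_k(a)\}_{k=0}^{3}$ we have derived so far, together with $\phi'(x) = -x\phi(x)$, are quite useful in obtaining the moments of order statistics of standard normal distribution for small values of $n$.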
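As a rough check of the closed forms $I_1(a) = \frac{\sqrt{\pi}}{2}$, $I_2(a) = \frac{1}{\sqrt{\pi}}\tan^{-1}\sqrt{1+a^2}$, and $I_3(a) = \frac{3}{2}I_2(a) - \frac{\sqrt{\pi}}{4}$, the following sketch integrates $\Phi^n(ax)\exp\left(-x^2\right)$ numerically with a midpoint rule. The helper names and the integration window are assumptions made for this illustration only; $\Phi$ is computed from the error function in the standard library.

```python
import math


def std_normal_cdf(x):
    """Standard normal cdf Phi(x), written in terms of erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def i_n_numeric(n, a, half_width=8.0, steps=20000):
    """Midpoint-rule value of the integral of Phi^n(a x) exp(-x^2) over the real line.

    The integrand is truncated to [-half_width, half_width]; exp(-x^2)
    is negligible beyond that range for the illustrative defaults.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += std_normal_cdf(a * x) ** n * math.exp(-x * x)
    return total * h


def i2_closed(a):
    """I_2(a) = atan(sqrt(1 + a^2)) / sqrt(pi)."""
    return math.atan(math.sqrt(1.0 + a * a)) / math.sqrt(math.pi)


def i3_closed(a):
    """I_3(a) = (3/2) I_2(a) - sqrt(pi)/4."""
    return 1.5 * i2_closed(a) - math.sqrt(math.pi) / 4.0


if __name__ == "__main__":
    a = 1.3
    print(i_n_numeric(1, a), math.sqrt(math.pi) / 2)
    print(i_n_numeric(2, a), i2_closed(a))
    print(i_n_numeric(3, a), i3_closed(a))
```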


Appendix 5.3 Generalized Gaussian, Generalized Cauchy, and Stable Distributions

In many fields including signal processing, communication, and control, it is usually assumed that noise is a normal random variable. There are two reasons for this assumption. The first is the central limit theorem, which will be discussed in Chap. 6. According to the central limit theorem, the sum of random variables converges to a normal random variable under certain conditions, and the sum can reasonably be approximated by a normal random variable even when the conditions are not satisfied perfectly. We have already observed such a case in the Gaussian approximation of the binomial distribution in Theorem 3.5.16. In essence, the Gaussian model of noise does not deviate much from reality. The second reason is that, if we assume that noise is Gaussian, many schemes of communications and signal processing can be obtained in a simple way and the analysis of such schemes becomes relatively easy.

On the other hand, in some real environments, noise can be described only by non-Gaussian distributions. For example, it is reported that the low-frequency noise in the atmosphere and the noise environment in underwater acoustics can be modeled adequately only with non-Gaussian distributions. When noise is non-Gaussian, it would be necessary to adopt an adequate model other than the Gaussian model for the real environment in finding, for instance, signal processing techniques or communication schemes. Needless to say, in such an environment, we could still apply techniques obtained under the Gaussian assumption on noise at the cost of some, and sometimes significant, loss and/or unpredictability in the performance.

Among the non-Gaussian distributions, impulsive distributions, also called long-tailed or heavy-tailed distributions, constitute an important class. In general, when the tail of the pdf of a distribution is heavier (longer) than that of a normal distribution, the distribution is called an impulsive distribution. In impulsive distributions, noise of very large magnitude or absolute value (that is, values much larger or smaller than the median) can occur more frequently than in the normal distribution.

Let us here briefly discuss the generalized Gaussian distribution and the generalized Cauchy distribution. In addition, the class of stable distributions (Nikias and Shao 1995; Tsihrintzis and Nikias 1995), which is based on the generalized central limit theorem, will also be introduced. In passing, the generalized central limit theorem, not covered in this book, allows us to consider the convergence of sums of random variables whose variance is not finite and to which, therefore, the central limit theorem cannot be applied.

Fig. 5.9 The generalized normal pdf $f_{GG}(x)$ ($\sigma_G = 1$; curves for $k = 0.5$, $1$, $2$, $10$, and $\infty$)

(A) Generalized Gaussian Distribution

Definition 5.A.1 (generalized normal distribution) A distribution with the pdf

$$f_{GG}(x) = \frac{k}{2A_G(k)\Gamma\left(\frac{1}{k}\right)}\exp\left\{-\left(\frac{|x|}{A_G(k)}\right)^{k}\right\} \tag{5.A.39}$$

is called a generalized normal or generalized Gaussian distribution, where $k > 0$ and $A_G(k) = \sqrt{\frac{\sigma_G^2\,\Gamma\left(\frac{1}{k}\right)}{\Gamma\left(\frac{3}{k}\right)}}$.

As is also clear in Fig. 5.9, the pdf of the generalized normal distribution is a unimodal even function, defined by two parameters. The two parameters are the variance $\sigma_G^2$ and the rate $k$ of decay of the pdf. The generalized normal pdf is usefully employed in representing many pdf's by adopting appropriate values of $k$. For example, when $k = 2$, the generalized normal pdf is a normal pdf. When $k < 2$, the generalized normal pdf is an impulsive pdf: specifically, when $k = 1$, the generalized normal pdf is the double exponential pdf

$$f_D(x) = \frac{1}{\sqrt{2}\sigma_G}\exp\left(-\frac{\sqrt{2}|x|}{\sigma_G}\right). \tag{5.A.40}$$

The moment of a random variable $X$ with the pdf $f_{GG}(x)$ in (5.A.39) is

$$\mathsf{E}\left\{X^r\right\} = \sigma_G^r\,\frac{\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)\Gamma\left(\frac{r+1}{k}\right)}{\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.A.41}$$

when $r$ is even. In addition, recollecting (1.4.76), we have $\lim_{k\to\infty}\frac{3}{k}\Gamma\left(\frac{3}{k}\right) = 1$ and $\lim_{k\to\infty}\frac{1}{k}\Gamma\left(\frac{1}{k}\right) = 1$, and therefore

$$\lim_{k\to\infty}A_G(k) = \sqrt{3}\sigma_G \tag{5.A.42}$$

and $\lim_{k\to\infty}\frac{k}{2A_G(k)\Gamma\left(\frac{1}{k}\right)} = \frac{1}{2\sqrt{3}\sigma_G}$. Next, for $k\to\infty$, the limit of the exponential function in (5.A.39) is 1 when $|x| \le A_G(k)$, or equivalently when $|x| \le \sqrt{3}\sigma_G$, and 0 when $|x| > A_G(k)$. Therefore, for $k\to\infty$, we have

$$f_{GG}(x) \to \frac{1}{2\sqrt{3}\sigma_G}\,u\left(\sqrt{3}\sigma_G - |x|\right). \tag{5.A.43}$$

In other words, for $k\to\infty$, the limit of the generalized normal pdf is a uniform pdf as shown in Fig. 5.9.
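The special cases of the generalized normal pdf can be checked numerically. The sketch below, offered only as an illustration, codes $f_{GG}(x) = \frac{k}{2A_G(k)\Gamma(1/k)}\exp\{-(|x|/A_G(k))^k\}$ with $A_G(k) = \sigma_G\sqrt{\Gamma(1/k)/\Gamma(3/k)}$ and the even-order moment $\sigma_G^r\,\Gamma^{r/2-1}(1/k)\,\Gamma((r+1)/k)/\Gamma^{r/2}(3/k)$, then compares them with the normal pdf ($k = 2$), the double exponential pdf ($k = 1$), and the uniform limit (large $k$). All function names are hypothetical.

```python
import math


def a_g(k, sigma):
    """Scale factor A_G(k) = sigma * sqrt(Gamma(1/k) / Gamma(3/k))."""
    return sigma * math.sqrt(math.gamma(1.0 / k) / math.gamma(3.0 / k))


def gg_pdf(x, k, sigma=1.0):
    """Generalized normal pdf with decay rate k and standard deviation sigma."""
    a = a_g(k, sigma)
    return k / (2.0 * a * math.gamma(1.0 / k)) * math.exp(-((abs(x) / a) ** k))


def gg_moment(r, k, sigma=1.0):
    """r-th moment for even r, from the closed-form expression."""
    if r % 2 != 0:
        raise ValueError("the closed form is stated for even r")
    g1, g3 = math.gamma(1.0 / k), math.gamma(3.0 / k)
    return sigma ** r * g1 ** (r / 2 - 1) * math.gamma((r + 1.0) / k) / g3 ** (r / 2)


def normal_pdf(x, sigma):
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def double_exponential_pdf(x, sigma):
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


if __name__ == "__main__":
    print(gg_pdf(0.7, 2.0, 1.5), normal_pdf(0.7, 1.5))
    print(gg_pdf(0.7, 1.0, 1.5), double_exponential_pdf(0.7, 1.5))
    print(gg_pdf(0.0, 200.0), 1.0 / (2.0 * math.sqrt(3.0)))  # near the uniform limit
```

Setting $r = 2$ in `gg_moment` returns $\sigma_G^2$ for any $k$, which reflects that $\sigma_G^2$ is the variance regardless of the decay rate.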

(B) Generalized Cauchy Distribution

Definition 5.A.2 (generalized Cauchy distribution) A distribution with the pdf

$$f_{GC}(x) = \frac{\tilde{B}_c(k, v)}{\tilde{D}_c^{\,v+\frac{1}{k}}(x)} \tag{5.A.44}$$

is called the generalized Cauchy distribution and is denoted by $G_C(k, v)$. Here, $k > 0$, $v > 0$, $\tilde{B}_c(k, v) = \frac{k\,\Gamma\left(v+\frac{1}{k}\right)}{2v^{\frac{1}{k}}A_G(k)\Gamma(v)\Gamma\left(\frac{1}{k}\right)}$, and $\tilde{D}_c(x) = 1 + \frac{1}{v}\left(\frac{|x|}{A_G(k)}\right)^{k}$.

Figure 5.10 shows the generalized Cauchy pdf. When the parameter $v$ is finite, the tail of the generalized Cauchy pdf shows not an exponential behavior, but an algebraic behavior. Specifically, when $|x|$ is large, the tail of the generalized Cauchy pdf $f_{GC}(x)$ decreases in proportion to $|x|^{-(kv+1)}$. When $k = 2$ and $2v$ is an integer, the generalized Cauchy pdf is a $t$ pdf, and when $k = 2$ and $v = \frac{1}{2}$, the generalized Cauchy pdf is a Cauchy pdf

$$f_C(x) = \frac{\sigma_G}{\pi\left(x^2 + \sigma_G^2\right)}. \tag{5.A.45}$$

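When the parameters $\sigma_G^2$ and $k$ are fixed, we have $\lim_{v\to\infty}\tilde{D}_c^{\,v+\frac{1}{k}}(x) = \lim_{v\to\infty}\left\{1 + \frac{1}{v}\left(\frac{|x|}{A_G(k)}\right)^{k}\right\}^{v+\frac{1}{k}}$, i.e.,

$$\lim_{v\to\infty}\tilde{D}_c^{\,v+\frac{1}{k}}(x) = \exp\left\{\left(\frac{|x|}{A_G(k)}\right)^{k}\right\}. \tag{5.A.46}$$

In addition, $\lim_{v\to\infty}\tilde{B}_c(k, v) = \frac{k}{2A_G(k)\Gamma\left(\frac{1}{k}\right)}\lim_{v\to\infty}\frac{\Gamma\left(v+\frac{1}{k}\right)}{v^{\frac{1}{k}}\Gamma(v)} = \frac{k}{2A_G(k)\Gamma\left(\frac{1}{k}\right)}$ because $\lim_{v\to\infty}\frac{\Gamma\left(v+\frac{1}{k}\right)}{v^{\frac{1}{k}}\Gamma(v)} = 1$ from (1.4.77). Thus, for $v\to\infty$, the generalized Cauchy pdf converges to the generalized normal pdf. For example, when $k = 2$ and $v\to\infty$, the generalized Cauchy pdf is a normal pdf.

Fig. 5.10 The generalized Cauchy pdf $f_{GC}(x)$ ($v = 10$, $\sigma_G = 1$; curves for $k = 0.5$, $1$, $2$, $10$, and $\infty$)

Next, using $\lim_{p\to 0}p\Gamma(p) = \lim_{p\to 0}\Gamma(p+1) = 1$ shown in (1.4.76) and $\lim_{k\to\infty}A_G(k) = \sqrt{3}\sigma_G$ shown in (5.A.42), we get $\lim_{k\to\infty}\tilde{B}_c(k, v) = \lim_{k\to\infty}\frac{\Gamma(v)}{2A_G(k)\Gamma(v)\frac{1}{k}\Gamma\left(\frac{1}{k}\right)} = \frac{1}{2\sqrt{3}\sigma_G}$. In addition, $\lim_{k\to\infty}\tilde{D}_c^{\,v+\frac{1}{k}}(x)$ is 1 when $|x| < \sqrt{3}\sigma_G$ and $\infty$ when $|x| > \sqrt{3}\sigma_G$. In short, when $v$ is fixed and $k\to\infty$, we have

$$f_{GC}(x) \to \frac{1}{2\sqrt{3}\sigma_G}\,u\left(\sqrt{3}\sigma_G - |x|\right), \tag{5.A.47}$$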

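i.e., the limit of the generalized Cauchy pdf is a uniform pdf as shown also in Fig. 5.10. After some steps, we can obtain the $r$-th moment

$$\mathsf{E}\left\{X^r\right\} = v^{\frac{r}{k}}\sigma_G^r\,\frac{\Gamma\left(v-\frac{r}{k}\right)\Gamma\left(\frac{r+1}{k}\right)\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)}{\Gamma(v)\,\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.A.48}$$

for $vk > r$ when $r$ is an even number, and the variance

$$\sigma_{GC}^2 = \sigma_G^2\,v^{\frac{2}{k}}\,\frac{\Gamma\left(v-\frac{2}{k}\right)}{\Gamma(v)} \tag{5.A.49}$$

of the generalized Cauchy pdf.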
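The relations among the generalized Cauchy, Cauchy, and generalized normal pdf's can be illustrated numerically. The sketch below is a minimal illustration, assuming the pdf $f_{GC}(x) = \tilde{B}_c(k,v)\,\tilde{D}_c^{-(v+1/k)}(x)$ with $\tilde{B}_c(k,v) = \frac{k\Gamma(v+1/k)}{2v^{1/k}A_G(k)\Gamma(v)\Gamma(1/k)}$ and $\tilde{D}_c(x) = 1 + \frac{1}{v}(|x|/A_G(k))^k$. It works with logarithms of gamma functions so that large $v$ does not overflow; the function names and the test values are choices made here, not part of the text.

```python
import math


def a_g(k, sigma):
    """Scale factor A_G(k) = sigma * sqrt(Gamma(1/k) / Gamma(3/k))."""
    return sigma * math.sqrt(math.gamma(1.0 / k) / math.gamma(3.0 / k))


def gg_pdf(x, k, sigma=1.0):
    """Generalized normal pdf, used here as the v -> infinity reference."""
    a = a_g(k, sigma)
    return k / (2.0 * a * math.gamma(1.0 / k)) * math.exp(-((abs(x) / a) ** k))


def gc_pdf(x, k, v, sigma=1.0):
    """Generalized Cauchy pdf, evaluated in the log domain for stability."""
    a = a_g(k, sigma)
    log_b = (math.log(k) + math.lgamma(v + 1.0 / k) - math.log(2.0)
             - math.log(v) / k - math.log(a)
             - math.lgamma(v) - math.lgamma(1.0 / k))
    log_d = math.log1p((abs(x) / a) ** k / v)
    return math.exp(log_b - (v + 1.0 / k) * log_d)


def cauchy_pdf(x, sigma):
    return sigma / (math.pi * (x * x + sigma * sigma))


if __name__ == "__main__":
    print(gc_pdf(0.8, 2.0, 0.5, 1.3), cauchy_pdf(0.8, 1.3))   # k = 2, v = 1/2
    print(gc_pdf(0.8, 1.5, 1e6, 1.3), gg_pdf(0.8, 1.5, 1.3))  # large v
```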

(C) Stable Distribution

The class of stable distributions is also a useful class for modeling impulsive environments in a variety of scenarios. Unlike the generalized Gaussian and generalized Cauchy distributions, the stable distributions are defined by their cf's.

Definition 5.A.3 (stable distribution) A distribution with the cf

$$\varphi(t) = \exp\left[jmt - \gamma|t|^{\alpha}\left\{1 + j\beta\,\mathrm{sgn}(t)\,\omega(t, \alpha)\right\}\right] \tag{5.A.50}$$

is called a stable distribution. Here, $0 < \alpha \le 2$, $|\beta| \le 1$, $m$ is a real number, $\gamma > 0$, and

$$\omega(t, \alpha) = \begin{cases}\tan\frac{\alpha\pi}{2}, & \text{if } \alpha \ne 1,\\[2pt] \frac{2}{\pi}\log|t|, & \text{if } \alpha = 1.\end{cases} \tag{5.A.51}$$

In Definition 5.A.3, the numbers $m$, $\alpha$, $\beta$, and $\gamma$ are called the location parameter, characteristic exponent, symmetry parameter, and dispersion parameter, respectively. The location parameter $m$ represents the mean when $1 < \alpha \le 2$ and the median when $0 < \alpha \le 1$. The characteristic exponent $\alpha$ represents the weight or length of the tail of the pdf, with a smaller value denoting a longer tail or a higher degree of impulsiveness. The symmetry parameter $\beta$ determines the symmetry of the pdf with $\beta = 0$ resulting in a symmetric pdf. The dispersion parameter $\gamma$ plays a role similar to the variance of a normal distribution. For instance, the stable distribution is a normal distribution and the variance is $2\gamma$ when $\alpha = 2$. When $\alpha = 1$ and $\beta = 0$, the stable distribution is a Cauchy distribution.

Definition 5.A.4 (symmetric alpha-stable distribution) When the symmetry parameter $\beta = 0$, the stable distribution is called the symmetric $\alpha$-stable (S$\alpha$S) distribution. When the location parameter $m = 0$ and the dispersion parameter $\gamma = 1$, the S$\alpha$S distribution is called the standard S$\alpha$S distribution.

By inverse transforming the cf

$$\varphi(t) = \exp\left(-\gamma|t|^{\alpha}\right) \tag{5.A.52}$$

of the S$\alpha$S distribution with $m = 0$, we have the pdf

$$f(x) = \begin{cases}\dfrac{1}{\pi\gamma^{\frac{1}{\alpha}}}\displaystyle\sum_{k=1}^{\infty}\dfrac{(-1)^{k-1}}{k!}\,\Gamma(\alpha k+1)\sin\left(\dfrac{k\alpha\pi}{2}\right)\left(\dfrac{|x|}{\gamma^{\frac{1}{\alpha}}}\right)^{-\alpha k-1}, & \text{for } 0 < \alpha \le 1,\\[10pt] \dfrac{1}{\pi\alpha\gamma^{\frac{1}{\alpha}}}\displaystyle\sum_{k=0}^{\infty}\dfrac{(-1)^{k}}{(2k)!}\,\Gamma\left(\dfrac{2k+1}{\alpha}\right)\left(\dfrac{x}{\gamma^{\frac{1}{\alpha}}}\right)^{2k}, & \text{for } 1 \le \alpha \le 2.\end{cases} \tag{5.A.53}$$

Fig. 5.11 The pdf of S$\alpha$S distribution (curves for $\alpha = 0.6$, $1.0$, $1.4$, and $2.0$, centered at the location parameter $m$)

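It is known that the pdf (5.A.53) can be expressed more explicitly in a closed form when $\alpha = 1$ and $2$. Figure 5.11 shows pdf's of the S$\alpha$S distributions. Let us show that the two infinite series in (5.A.53) become the Cauchy pdf

$$f(x) = \frac{\gamma}{\pi\left(x^2 + \gamma^2\right)} \tag{5.A.54}$$

when $\alpha = 1$, and that the second infinite series of (5.A.53) is the normal pdf

$$f(x) = \frac{1}{2\sqrt{\pi\gamma}}\exp\left(-\frac{x^2}{4\gamma}\right) \tag{5.A.55}$$

when $\alpha = 2$. The first infinite series in (5.A.53) for $\alpha = 1$ can be expressed as $\frac{1}{\pi\gamma}\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k!}\Gamma(k+1)\sin\left(\frac{k\pi}{2}\right)\left(\frac{|x|}{\gamma}\right)^{-k-1} = \frac{1}{\pi\gamma}\left\{\left(\frac{\gamma}{|x|}\right)^{2} - \left(\frac{\gamma}{|x|}\right)^{4} + \left(\frac{\gamma}{|x|}\right)^{6} - \left(\frac{\gamma}{|x|}\right)^{8} + \cdots\right\} = \frac{1}{\pi\gamma}\sum_{k=0}^{\infty}(-1)^{k}\left(\frac{\gamma^2}{x^2}\right)^{k+1}$, i.e.,

$$\frac{1}{\pi\gamma}\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k!}\Gamma(k+1)\sin\left(\frac{k\pi}{2}\right)\left(\frac{|x|}{\gamma}\right)^{-k-1} = \frac{\gamma}{\pi\left(x^2 + \gamma^2\right)}, \tag{5.A.56}$$

which can also be obtained from the second infinite series of (5.A.53) as $\frac{1}{\pi\gamma}\sum_{k=0}^{\infty}\frac{(-1)^{k}}{(2k)!}\Gamma(2k+1)\left(\frac{x}{\gamma}\right)^{2k} = \frac{1}{\pi\gamma}\sum_{k=0}^{\infty}(-1)^{k}\left(\frac{x}{\gamma}\right)^{2k} = \frac{\gamma}{\pi\left(x^2+\gamma^2\right)}$. Next, noting that $\Gamma\left(\frac{2k+1}{2}\right) = \frac{(2k)!}{2^{2k}k!}\sqrt{\pi}$ shown in (1.4.84) and that $\sum_{k=0}^{\infty}\frac{(-x)^k}{k!} = e^{-x}$, the second infinite series of (5.A.53) for $\alpha = 2$ can be rewritten as $\frac{1}{2\pi\sqrt{\gamma}}\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}\Gamma\left(\frac{2k+1}{2}\right)\left(\frac{x}{\sqrt{\gamma}}\right)^{2k} = \frac{1}{2\sqrt{\pi\gamma}}\sum_{k=0}^{\infty}\frac{(-1)^k}{2^{2k}k!}\left(\frac{x^2}{\gamma}\right)^{k} = \frac{1}{2\sqrt{\pi\gamma}}\sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{x^2}{4\gamma}\right)^{k}$, i.e.,

$$\frac{1}{2\pi\sqrt{\gamma}}\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}\Gamma\left(\frac{2k+1}{2}\right)\left(\frac{x}{\sqrt{\gamma}}\right)^{2k} = \frac{1}{2\sqrt{\pi\gamma}}\exp\left(-\frac{x^2}{4\gamma}\right). \tag{5.A.57}$$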
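The two closed forms can be cross-checked against partial sums of the series. The sketch below, meant only as an illustration, evaluates a truncated version of the series $\frac{1}{\pi\alpha\gamma^{1/\alpha}}\sum_{k\ge 0}\frac{(-1)^k}{(2k)!}\Gamma\left(\frac{2k+1}{\alpha}\right)\left(\frac{x}{\gamma^{1/\alpha}}\right)^{2k}$ for $1 \le \alpha \le 2$ and compares it with the Cauchy pdf at $\alpha = 1$ and the normal pdf at $\alpha = 2$. The function name and the number of terms are assumptions; note that for $\alpha = 1$ the series is geometric and the partial sums converge only when $|x| < \gamma$.

```python
import math


def sas_pdf_series(x, alpha, gamma=1.0, terms=60):
    """Partial sum of the power series for the SaS pdf, for 1 <= alpha <= 2.

    'terms' is an illustrative truncation length. Coefficients are formed
    from log-gamma values to avoid overflow of the factorials.
    """
    if not 1.0 <= alpha <= 2.0:
        raise ValueError("this series is stated for 1 <= alpha <= 2")
    scale = gamma ** (1.0 / alpha)
    z = x / scale
    total = 0.0
    for k in range(terms):
        log_coef = math.lgamma((2 * k + 1) / alpha) - math.lgamma(2 * k + 1)
        term = math.exp(log_coef) * z ** (2 * k)
        total += term if k % 2 == 0 else -term
    return total / (math.pi * alpha * scale)


def cauchy_pdf(x, gamma):
    return gamma / (math.pi * (x * x + gamma * gamma))


def normal_pdf_dispersion(x, gamma):
    """Normal pdf with variance 2*gamma, i.e. exp(-x^2/(4 gamma)) / (2 sqrt(pi gamma))."""
    return math.exp(-x * x / (4.0 * gamma)) / (2.0 * math.sqrt(math.pi * gamma))


if __name__ == "__main__":
    print(sas_pdf_series(0.6, 1.0, gamma=2.0), cauchy_pdf(0.6, 2.0))
    print(sas_pdf_series(1.2, 2.0, gamma=0.5), normal_pdf_dispersion(1.2, 0.5))
```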

When $A \sim U\left[-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and an exponential random variable $B$ with mean 1 are independent of each other, it is known that

$$X = \frac{\sin(\alpha A)}{(\cos A)^{\frac{1}{\alpha}}}\left[\frac{\cos\{(1-\alpha)A\}}{B}\right]^{\frac{1-\alpha}{\alpha}} \tag{5.A.58}$$

is a standard S$\alpha$S random variable. This result is useful when generating random numbers obeying the S$\alpha$S distribution.

Definition 5.A.5 (bi-variate isotropic S$\alpha$S distribution) When the joint pdf of a random vector $(X, Y)$ can be expressed as

$$f_{X,Y}(x, y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left\{-\gamma\left(\omega_1^2 + \omega_2^2\right)^{\frac{\alpha}{2}}\right\}\exp\left\{-j\left(x\omega_1 + y\omega_2\right)\right\}d\omega_1\,d\omega_2, \tag{5.A.59}$$

the distribution of $(X, Y)$ is called the bi-variate isotropic S$\alpha$S distribution.

Expressing the pdf (5.A.59) of the bi-variate isotropic S$\alpha$S distribution in infinite series, we have

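$$f_{X,Y}(x, y) = \begin{cases}\dfrac{1}{\pi^2\gamma^{\frac{2}{\alpha}}}\displaystyle\sum_{k=1}^{\infty}\dfrac{1}{k!}\,2^{\alpha k}(-1)^{k-1}\,\Gamma^2\left(1+\dfrac{\alpha k}{2}\right)\sin\left(\dfrac{k\alpha\pi}{2}\right)\left(\dfrac{\sqrt{x^2+y^2}}{\gamma^{\frac{1}{\alpha}}}\right)^{-\alpha k-2}, & \text{for } 0 < \alpha \le 1,\\[10pt] \dfrac{1}{2\pi\alpha\gamma^{\frac{2}{\alpha}}}\displaystyle\sum_{k=0}^{\infty}\dfrac{1}{(k!)^2}\,\Gamma\left(\dfrac{2k+2}{\alpha}\right)\left(-\dfrac{x^2+y^2}{4\gamma^{\frac{2}{\alpha}}}\right)^{k}, & \text{for } 1 \le \alpha \le 2.\end{cases} \tag{5.A.60}$$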
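The generation formula (5.A.58) for standard S$\alpha$S random numbers lends itself to a short sketch. The code below is an illustration under stated assumptions rather than a reference implementation: it draws $A$ uniformly on $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ and $B$ from an exponential distribution with mean 1 using Python's `random` module, and applies $X = \frac{\sin(\alpha A)}{(\cos A)^{1/\alpha}}\left[\frac{\cos\{(1-\alpha)A\}}{B}\right]^{(1-\alpha)/\alpha}$. The function names `sas_from_pair` and `sas_samples` and the seeding scheme are hypothetical.

```python
import math
import random


def sas_from_pair(alpha, a, b):
    """Map one pair (a, b) to a standard SaS value.

    a is an angle in (-pi/2, pi/2) and b is a positive number; for random
    generation they are uniform and unit-mean exponential draws.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 2")
    ratio = math.cos((1.0 - alpha) * a) / b
    return (math.sin(alpha * a) / math.cos(a) ** (1.0 / alpha)
            * ratio ** ((1.0 - alpha) / alpha))


def sas_samples(alpha, n, seed=0):
    """Generate n pseudo-random standard SaS numbers (seed is illustrative)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        a = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
        b = rng.expovariate(1.0)
        if b == 0.0:  # extremely unlikely draw; skipped to avoid dividing by zero
            continue
        samples.append(sas_from_pair(alpha, a, b))
    return samples


if __name__ == "__main__":
    print(sas_samples(1.5, 5))
```

Two special cases make the mapping easy to sanity-check: for $\alpha = 1$ it reduces to $\tan A$, a standard Cauchy variable, and for $\alpha = 2$ it reduces to $2\sqrt{B}\sin A$, a normal variable with variance $2$, in agreement with the variance $2\gamma$ for $\gamma = 1$.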

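Example 5.A.1 Show that (5.A.60) represents a bi-variate Cauchy distribution and a bi-variate normal distribution for $\alpha = 1$ and $\alpha = 2$, respectively. In other words, show that the two infinite series of (5.A.60) become

$$f_{X,Y}(x, y) = \frac{\gamma}{2\pi\left(x^2 + y^2 + \gamma^2\right)^{\frac{3}{2}}} \tag{5.A.61}$$

when $\alpha = 1$ and that the second infinite series of (5.A.60) becomes

$$f_{X,Y}(x, y) = \frac{1}{4\pi\gamma}\exp\left(-\frac{x^2 + y^2}{4\gamma}\right) \tag{5.A.62}$$

when $\alpha = 2$.

Solution First, note that we have $\Gamma\left(\frac{2k+3}{2}\right) = \left(\frac{1}{2} + k\right)\Gamma\left(\frac{1}{2} + k\right) = \frac{(2k+1)!}{2^{2k+1}k!}\sqrt{\pi}$ from (1.4.75) and (1.4.84). Thus, recollecting that $(1+x)^{-\frac{3}{2}} = \sum_{k=0}^{\infty}{}_{-\frac{3}{2}}\mathrm{C}_{k}\,x^k$, i.e.,

$$(1+x)^{-\frac{3}{2}} = \sum_{k=0}^{\infty}\frac{(-1)^k(2k+1)!}{2^{2k}(k!)^2}\,x^k, \tag{5.A.63}$$

we get

$$\begin{aligned}
&\frac{1}{\pi^2\gamma^2}\sum_{k=1}^{\infty}\frac{2^k(-1)^{k-1}}{k!}\,\Gamma^2\left(\frac{k}{2}+1\right)\sin\left(\frac{k\pi}{2}\right)\left(\frac{\sqrt{x^2+y^2}}{\gamma}\right)^{-k-2}\\
&\quad= \frac{1}{\pi^2\left(x^2+y^2\right)}\left\{\frac{2^1\Gamma^2\left(\frac{3}{2}\right)}{1!}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right) - \frac{2^3\Gamma^2\left(\frac{5}{2}\right)}{3!}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^{3} + \frac{2^5\Gamma^2\left(\frac{7}{2}\right)}{5!}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^{5} - \frac{2^7\Gamma^2\left(\frac{9}{2}\right)}{7!}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^{7} + \cdots\right\}\\
&\quad= \frac{1}{\pi^2\left(x^2+y^2\right)}\sum_{k=0}^{\infty}\frac{(-1)^k2^{2k+1}}{(2k+1)!}\,\Gamma^2\left(\frac{2k+3}{2}\right)\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^{2k+1}\\
&\quad= \frac{1}{\pi\left(x^2+y^2\right)}\sum_{k=0}^{\infty}\frac{(-1)^k(2k+1)!}{2^{2k+1}(k!)^2}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^{2k+1}\\
&\quad= \frac{1}{2\pi\left(x^2+y^2\right)}\,\frac{\gamma}{\sqrt{x^2+y^2}}\sum_{k=0}^{\infty}\frac{(-1)^k(2k+1)!}{2^{2k}(k!)^2}\left(\frac{\gamma^2}{x^2+y^2}\right)^{k}\\
&\quad= \frac{\gamma}{2\pi\left(x^2+y^2\right)^{\frac{3}{2}}}\left(1+\frac{\gamma^2}{x^2+y^2}\right)^{-\frac{3}{2}}\\
&\quad= \frac{\gamma}{2\pi\left(x^2+y^2+\gamma^2\right)^{\frac{3}{2}}}
\end{aligned} \tag{5.A.64}$$

when $\alpha = 1$ from the first infinite series of (5.A.60). The result (5.A.64) can also be obtained as $\sum_{k=0}^{\infty}\frac{\Gamma(2k+2)}{2\pi\gamma^2(k!)^2}\left(-\frac{x^2+y^2}{4\gamma^2}\right)^{k} = \sum_{k=0}^{\infty}\frac{(2k+1)!\,(-1)^k}{2\pi\gamma^2(k!)^2\,2^{2k}}\left(\frac{x^2+y^2}{\gamma^2}\right)^{k} = \frac{1}{2\pi\gamma^2}\left(1+\frac{x^2+y^2}{\gamma^2}\right)^{-\frac{3}{2}}$, i.e.,

$$\sum_{k=0}^{\infty}\frac{\Gamma(2k+2)}{2\pi\gamma^2(k!)^2}\left(-\frac{x^2+y^2}{4\gamma^2}\right)^{k} = \frac{\gamma}{2\pi\left(x^2+y^2+\gamma^2\right)^{\frac{3}{2}}} \tag{5.A.65}$$

from the second infinite series of (5.A.60) using (5.A.63). Next, when $\alpha = 2$, from the second infinite series of (5.A.60), we get $\frac{1}{4\pi\gamma}\sum_{k=0}^{\infty}\frac{\Gamma(k+1)}{(k!)^2}\left(-\frac{x^2+y^2}{4\gamma}\right)^{k} = \frac{1}{4\pi\gamma}\sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{x^2+y^2}{4\gamma}\right)^{k}$, which is the same as (5.A.62) because $\sum_{k=0}^{\infty}\frac{(-x)^k}{k!} = e^{-x}$. ♦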

Exercises

Exercise 5.1 Assume a random vector $(X, Y)$ with the joint pdf

$$f_{X,Y}(x, y) = \frac{1}{\pi\sqrt{3}}\exp\left\{-\frac{2}{3}\left(x^2 + y^2\right)\right\}\cosh\left(\frac{2xy}{3}\right). \tag{5.E.1}$$

(1) Show that $X \sim N(0, 1)$ and $Y \sim N(0, 1)$.
(2) Show that $X$ and $Y$ are uncorrelated.
(3) Is $(X, Y)$ a bi-variate normal random vector?
(4) Are $X$ and $Y$ independent of each other?

Exercise 5.2 When $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 1)$ are independent of each other, obtain the conditional joint pdf of $X_1$ and $X_2$ given that $X_1^2 + X_2^2 < a^2$.

Exercise 5.3 Assume that $X \sim N(0, 1)$ and $Y \sim N(0, 1)$ are independent of each other.
(1) Obtain the joint pdf of $U = \sqrt{X^2 + Y^2}$ and $V = \tan^{-1}\frac{Y}{X}$.
(2) Obtain the joint pdf of $U = \frac{1}{2}(X + Y)$ and $V = \frac{1}{2}(X - Y)^2$.

Exercise 5.4 Obtain the conditional pdf's $f_{Y|X}(y|x)$ and $f_{X|Y}(x|y)$ when $(X, Y) \sim N(3, 4, 1, 2, 0.5)$.

Exercise 5.5 Obtain the correlation coefficient $\rho_{ZW}$ between $Z = X_1\cos\theta + X_2\sin\theta$ and $W = X_2\cos\theta - X_1\sin\theta$, and show that

$$0 \le \rho_{ZW}^2 \le \left(\frac{\sigma_1^2 - \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)^2 \tag{5.E.2}$$

when $X_1 \sim N\left(\mu_1, \sigma_1^2\right)$ and $X_2 \sim N\left(\mu_2, \sigma_2^2\right)$ are independent of each other.

Exercise 5.6 When the two normal random variables $X$ and $Y$ are independent of each other, show that $X + Y$ and $X - Y$ are independent of each other.

Exercise 5.7 Let us consider (5.2.1) and (5.2.2) when $n = 3$ and $s = 1$. Based on (5.1.18) and (5.1.21), show that $\boldsymbol{\Psi}_{22} - \boldsymbol{\Psi}_{21}\boldsymbol{\Psi}_{11}^{-1}\boldsymbol{\Psi}_{12}$ is equal to $\boldsymbol{K}_{22}^{-1} = \begin{pmatrix}1 & \rho_{23}\\ \rho_{23} & 1\end{pmatrix}^{-1}$.

Exercise 5.8 Consider the random variable

$$X = \begin{cases}Y, & \text{when } Z = +1,\\ -Y, & \text{when } Z = -1,\end{cases} \tag{5.E.3}$$

where $Z$ is a binary random variable with pmf $p_Z(1) = p_Z(-1) = 0.5$ and $Y \sim N(0, 1)$.
(1) Obtain the conditional cdf $F_{X|Y}(x|y)$.
(2) Obtain the cdf $F_X(x)$ of $X$ and determine whether or not $X$ is normal.
(3) Is the random vector $(X, Y)$ normal?
(4) Obtain the conditional pdf $f_{X|Y}(x|y)$ and the joint pdf $f_{X,Y}(x, y)$.

Exercise 5.9 For a zero-mean normal random vector $\boldsymbol{X}$ with covariance matrix $\begin{pmatrix}1 & \frac{1}{6} & \frac{1}{36}\\ \frac{1}{6} & 1 & \frac{1}{6}\\ \frac{1}{36} & \frac{1}{6} & 1\end{pmatrix}$, find a linear transformation to decorrelate $\boldsymbol{X}$.

Exercise 5.10 Let $\boldsymbol{X} = (X, Y)$ denote the coordinate of a point in the two-dimensional plane and $\boldsymbol{C} = (R, \Theta)$ be its polar coordinate. Specifically, as shown in Fig. 5.12, $R = \sqrt{X^2 + Y^2}$ is the distance from the origin to $\boldsymbol{X}$, and $\Theta = \angle\boldsymbol{X}$ is the angle between the positive $x$-axis and the line from the origin to $\boldsymbol{X}$, where we assume $-\pi < \Theta \le \pi$. Express the joint pdf of $\boldsymbol{C}$ in terms of the joint pdf of $\boldsymbol{X}$. When $\boldsymbol{X}$ is an i.i.d. random vector with marginal distribution $N\left(0, \sigma^2\right)$, prove or disprove that $\boldsymbol{C}$ is an independent random vector.

Exercise 5.11 For the limit pdf $\lim_{\rho\to\pm 1}f_{X_1,X_2}(x, y)$ shown in (5.1.15), show that $\int_{-\infty}^{\infty}\lim_{\rho\to\pm 1}f_{X_1,X_2}(x, y)\,dy = f_{X_1}(x)$ and $\int_{-\infty}^{\infty}\lim_{\rho\to\pm 1}f_{X_1,X_2}(x, y)\,dx = f_{X_2}(y)$.

Exercise 5.12 Consider a zero-mean normal random vector $(X_1, X_2, X_3)$ with covariance matrix $\begin{pmatrix}1 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 3\end{pmatrix}$. Obtain the conditional distribution of $X_3$ when $X_1 = X_2 = 1$.

Fig. 5.12 Polar coordinate $\boldsymbol{C} = (R, \Theta) = (|\boldsymbol{X}|, \angle\boldsymbol{X})$ for $\boldsymbol{X} = (X, Y)$

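Exercise 5.13 Consider the linear transformation $(Z, W) = (aX + bY, cX + dY)$ of $(X, Y) \sim N\left(m_X, m_Y, \sigma_X^2, \sigma_Y^2, \rho\right)$. When $ad - bc \ne 0$, find the requirement for $\{a, b, c, d\}$ for $Z$ and $W$ to be independent of each other.

Exercise 5.14 When $(X, Y) \sim N\left(0, 0, \sigma_X^2, \sigma_Y^2, \rho\right)$, we have $\mathsf{E}\left\{X^2\right\} = \sigma_X^2$ and $\mathsf{E}\left\{X^4\right\} = 3\sigma_X^4$. Based on these two results and (4.4.44), obtain $\mathsf{E}\{XY\}$, $\mathsf{E}\left\{X^2Y\right\}$, $\mathsf{E}\left\{X^3Y\right\}$, and $\mathsf{E}\left\{X^2Y^2\right\}$. Compare the results with those you can obtain from (5.3.22) or (5.3.51).

Exercise 5.15 For a standard tri-variate normal random vector $(X_1, X_2, X_3)$, denote the covariance by $\mathsf{E}\left\{X_iX_j\right\} - \mathsf{E}\left\{X_i\right\}\mathsf{E}\left\{X_j\right\} = \rho_{ij}$. Show that

$$\mathsf{E}\left\{X_1^2X_2^2X_3^2\right\} = 1 + 2\left(\rho_{12}^2 + \rho_{23}^2 + \rho_{31}^2\right) + 8\rho_{12}\rho_{23}\rho_{31} \tag{5.E.4}$$

based on the moment theorem. Show (5.E.4) based on Taylor series of the cf.

Exercise 5.16 When $(Z, W) \sim N\left(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho\right)$, obtain $\mathsf{E}\left\{Z^2W^2\right\}$.

Exercise 5.17 Denote the joint moment by $\mu_{ij} = \mathsf{E}\left\{X^iY^j\right\}$ for a zero-mean random vector $(X, Y)$. Based on the moment theorem, (5.3.22), (5.3.30), or (5.3.51), obtain $\mu_{51}$, $\mu_{42}$, and $\mu_{33}$ for a random vector $(X, Y) \sim N\left(0, 0, \sigma_1^2, \sigma_2^2, \rho\right)$.

Exercise 5.18 Using the cf, prove (5.3.9) and (5.3.10).

Exercise 5.19 Denote the joint absolute moment by $\nu_{ij} = \mathsf{E}\left\{|X|^i|Y|^j\right\}$ for a zero-mean random vector $(X, Y)$. By direct integration, show that $\nu_{11} = \frac{2}{\pi}\left(\sqrt{1-\rho^2} + \rho\sin^{-1}\rho\right)$ and $\nu_{21} = \sqrt{\frac{2}{\pi}}\left(1 + \rho^2\right)$ for $(X, Y) \sim N(0, 0, 1, 1, \rho)$. For $(X, Y) \sim N\left(0, 0, \pi^2, 2, \frac{1}{\sqrt{2}}\right)$, calculate $\mathsf{E}\{|XY|\}$.

Exercise 5.20 Based on (5.3.30), obtain $\mathsf{E}\{|X_1|\}$, $\mathsf{E}\{|X_1X_2|\}$, and $\mathsf{E}\left\{\left|X_1X_2^3\right|\right\}$ for $\boldsymbol{X} = (X_1, X_2) \sim N(0, 0, 1, 1, \rho)$. Show

$$\rho_{|X_1||X_2|} = \frac{2}{\pi-2}\left(\sqrt{1-\rho^2} + \rho\sin^{-1}\rho - 1\right). \tag{5.E.5}$$

Next, based on Price's theorem, show that

$$\mathsf{E}\left\{X_1u(X_1)X_2u(X_2)\right\} = \frac{1}{2\pi}\left\{\left(\frac{\pi}{2} + \sin^{-1}\rho\right)\rho + \sqrt{1-\rho^2}\right\}, \tag{5.E.6}$$

which implies$^{12}$ that $\mathsf{E}\{WYu(W)u(Y)\} = \frac{\sigma_W\sigma_Y}{2\pi}\left\{\rho\cos^{-1}(-\rho) + \sqrt{1-\rho^2}\right\}$ when $(W, Y) \sim N\left(0, 0, \sigma_W^2, \sigma_Y^2, \rho\right)$. In addition, when $W = Y$, we can obtain $\mathsf{E}\left\{W^2u(W)\right\} = \frac{1}{2}\sigma_W^2$ with $\rho = 1$ and $\sigma_Y = \sigma_W$, which can be proved by a direct integration as $\mathsf{E}\left\{W^2u(W)\right\} = \int_0^{\infty}\frac{x^2}{\sqrt{2\pi\sigma_W^2}}\exp\left(-\frac{x^2}{2\sigma_W^2}\right)dx = \frac{1}{2}\int_{-\infty}^{\infty}\frac{x^2}{\sqrt{2\pi\sigma_W^2}}\exp\left(-\frac{x^2}{2\sigma_W^2}\right)dx = \frac{1}{2}\sigma_W^2$.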

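Exercise 5.21 Show

$$\mathsf{E}\left\{X_1X_2|X_3|\right\} = \sqrt{\frac{2}{\pi}}\left(\rho_{12} + \rho_{23}\rho_{31}\right) \tag{5.E.7}$$

for a standard tri-variate normal random vector $(X_1, X_2, X_3)$.

Exercise 5.22 Based on Price's theorem, (5.1.26), and (5.1.27), show that$^{13}$

$$\mathsf{E}\left\{\delta(X_1)\delta(X_2)|X_3|\right\} = \frac{\sqrt{|\boldsymbol{K}_3|}}{\sqrt{2\pi^3}\left(1-\rho_{12}^2\right)}, \tag{5.E.8}$$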

c23 E {δ (X 1 ) X 2 X 3 } = √ , 2π c23 , E {δ (X 1 ) sgn (X 2 ) X 3 } =  π 1 − ρ212 

2  −1 |K | + c , sin β E {δ (X 1 ) |X 2 | |X 3 |} = 3 23 23,1 π3

(5.E.8) (5.E.9) (5.E.10)

(5.E.11)

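and

$$\mathsf{E}\left\{\delta(X_1)\,\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \sqrt{\frac{2}{\pi^3}}\sin^{-1}\beta_{23,1} \tag{5.E.12}$$

for a standard tri-variate normal random vector $(X_1, X_2, X_3)$, where $c_{ij} = \rho_{jk}\rho_{ki} - \rho_{ij}$. Then, show that

$$\mathsf{E}\left\{X_1|X_2|\,\mathrm{sgn}(X_3)\right\} = \frac{2}{\pi}\left(\rho_{12}\sin^{-1}\rho_{23} + \rho_{31}\sqrt{1-\rho_{23}^2}\right) \tag{5.E.13}$$

based on Price's theorem and (5.E.10).

Exercise 5.23 Using (5.3.38)–(5.3.43), show that

$$\mathsf{E}\left\{|X_1X_2X_3|\right\} = \sqrt{\frac{8}{\pi^3}}\left\{\sqrt{|\boldsymbol{K}_3|} + \sum^{c}\left(\rho_{ij} + \rho_{jk}\rho_{ki}\right)\kappa_{ijk}\right\}, \tag{5.E.14}$$

$$\mathsf{E}\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\} = \sqrt{\frac{8}{\pi^3}}\left(\kappa_{123} + \rho_{23}\kappa_{312} + \rho_{31}\kappa_{231}\right), \tag{5.E.15}$$

and

$$\mathsf{E}\left\{X_1^2\,\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \frac{2}{\pi}\,\frac{2\rho_{31}\rho_{12} - \rho_{23}\rho_{12}^2 - \rho_{23}\rho_{31}^2}{\sqrt{1-\rho_{23}^2}} + \frac{2}{\pi}\sin^{-1}\rho_{23} \tag{5.E.16}$$

for$^{14}$ a standard tri-variate normal random vector $(X_1, X_2, X_3)$, where $\kappa_{ijk} = \sin^{-1}\beta_{ij,k}$. Confirm (5.E.7) and (5.E.13).

Exercise 5.24 Confirm (5.E.8)$^{15}$ and (5.E.12) based on (5.E.15). Based on (5.E.16), obtain $\mathsf{E}\left\{X_1^2\delta(X_2)\delta(X_3)\right\}$ and confirm (5.E.10).

$^{12}$ Here, the range of the inverse cosine function $\cos^{-1}x$ is $[0, \pi]$, and $\cos\left(\sin^{-1}\rho\right) = \sqrt{1-\rho^2}$. Note that, letting $\frac{\pi}{2} + \sin^{-1}\rho = \theta$, we get $\cos\theta = \cos\left(\frac{\pi}{2} + \sin^{-1}\rho\right) = -\sin\left(\sin^{-1}\rho\right) = -\rho$ and, subsequently, $\theta = \cos^{-1}(-\rho)$. Thus, we have $\left(\frac{\pi}{2} + \sin^{-1}\rho\right)\rho = \rho\cos^{-1}(-\rho)$.

$^{13}$ Here, using $\mathsf{E}\left\{\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \frac{2}{\pi}\sin^{-1}\rho_{23}$ obtained in (5.3.29) and $\mathsf{E}\left\{\delta(X_1)\right\} = \frac{1}{\sqrt{2\pi}}$, we can obtain $\left.\mathsf{E}\left\{\delta(X_1)\,\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\}\right|_{\rho_{31}=\rho_{12}=0} = \mathsf{E}\left\{\delta(X_1)\right\}\mathsf{E}\left\{\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \sqrt{\frac{2}{\pi^3}}\sin^{-1}\rho_{23}$ when $\rho_{31} = \rho_{12} = 0$. This result is the same as $\left.\sqrt{\frac{2}{\pi^3}}\sin^{-1}\beta_{23,1}\right|_{\rho_{31}=\rho_{12}=0}$ from (5.E.12).

$^{14}$ We can easily get $\mathsf{E}\left\{X_1^2|X_2|\right\} = \left.\mathsf{E}\{|X_1X_2X_3|\}\right|_{X_3\to X_1} = \left.\mathsf{E}\{|X_1X_2X_3|\}\right|_{\rho_{31}=1} = \sqrt{\frac{8}{\pi^3}}\left\{0 + \frac{\pi}{2}\left(1+\rho^2\right)\right\} = \sqrt{\frac{2}{\pi}}\left(1+\rho^2\right)$ with (5.E.14). Similarly, with (5.E.15), it is easy to get $\mathsf{E}\{|X_3|\} = \left.\mathsf{E}\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\}\right|_{X_2\to X_1} = \left.\mathsf{E}\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\}\right|_{\rho_{12}=1} = \sqrt{\frac{8}{\pi^3}}\left(\sin^{-1}\frac{1-\rho_{23}^2}{1-\rho_{23}^2} + 2\rho_{23}\sin^{-1}0\right) = \sqrt{\frac{2}{\pi}}$ and $\mathsf{E}\left\{\mathrm{sgn}(X_1)X_2\right\} = \left.\mathsf{E}\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\}\right|_{\rho_{23}=1} = \sqrt{\frac{8}{\pi^3}}\left(0 + 0 + \rho_{12}\sin^{-1}1\right) = \sqrt{\frac{2}{\pi}}\rho_{12}$. Next, when $|\rho_{23}| \to 1$, we have $\rho_{31} \to \mathrm{sgn}(\rho_{23})\rho_{12}$ because $X_3 \to \mathrm{sgn}(\rho_{23})X_2$, and thus $\lim_{\rho_{23}\to 1}\left(2\rho_{31}\rho_{12} - \rho_{23}\rho_{12}^2 - \rho_{23}\rho_{31}^2\right) = 0$. Consequently, we get $\lim_{\rho_{23}\to 1}\frac{2\rho_{31}\rho_{12} - \rho_{23}\rho_{12}^2 - \rho_{23}\rho_{31}^2}{\sqrt{1-\rho_{23}^2}} = \lim_{\rho_{23}\to 1}\frac{-\rho_{12}^2 - \rho_{31}^2}{\frac{-2\rho_{23}}{2\sqrt{1-\rho_{23}^2}}} = \lim_{\rho_{23}\to 1}\frac{\left(\rho_{12}^2 + \rho_{31}^2\right)\sqrt{1-\rho_{23}^2}}{\rho_{23}} = 0$ in (5.E.16) using L'Hospital's theorem.

$^{15}$ Based on this result, we have $\int\frac{\sqrt{|\boldsymbol{K}_3|}}{1-\rho_{12}^2}\,d\rho_{12} = \sin^{-1}\beta_{12,3} + \rho_{23}\sin^{-1}\beta_{31,2} + \rho_{31}\sin^{-1}\beta_{23,1} + h(\rho_{23}, \rho_{31})$ for a function $h$.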

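Exercise 5.25 Find the coefficient of the term $\tilde{\rho}_{12}^{2}\,\tilde{\rho}_{22}\,\tilde{\rho}_{34}^{4}\,m_1m_4$ in the expansion of the joint moment $\mathsf{E}\left\{X_1^3X_2^4X_3^4X_4^5\right\}$ for a quadri-variate normal random vector $(X_1, X_2, X_3, X_4)$.

Exercise 5.26 Using Price's theorem, confirm that

$$\mathsf{E}\{|X_1X_2|\} = \frac{2}{\pi}\left(\sqrt{1-\rho^2} + \rho\sin^{-1}\rho\right) \tag{5.E.17}$$

for $(X_1, X_2) \sim N(0, 0, 1, 1, \rho)$. The result (5.E.17) is obtained with other methods in Exercises 5.19 and 5.20. When $(X_1, X_2) \sim N\left(0, 0, \sigma_1^2, \sigma_2^2, \rho\right)$ and $X_1 = X_2$, (5.E.17) can be written as $\mathsf{E}\left\{X_1^2\right\} = \left.\frac{2}{\pi}\left(\rho\sin^{-1}\rho + \sqrt{1-\rho^2}\right)\sigma_1\sigma_2\right|_{\rho=1,\,\sigma_2=\sigma_1} = \sigma_1^2$, implying $\mathsf{E}\left\{X^2\right\} = \sigma^2$ when $X \sim N\left(0, \sigma^2\right)$.

Exercise 5.27 Let us show some results related to the general formula (5.3.51) for the joint moments of normal random vectors.
(1) Confirm the coefficient

$$d_{a,j} = \frac{a!}{2^j\,j!\,(a-2j)!} \tag{5.E.18}$$

in (5.3.17).
(2) Recollecting (5.3.46)–(5.3.48), show that

$$\mathsf{E}\left\{X_1^{a_1}X_2^{a_2}X_3^{a_3}\right\} = \sum_{\boldsymbol{l}\in S_{\boldsymbol{a}}}d_{\boldsymbol{a},\boldsymbol{l}}\left(\prod_{i=1}^{3}\prod_{j=i}^{3}\tilde{\rho}_{ij}^{\,l_{ij}}\right)\left(\prod_{j=1}^{3}m_j^{L_{\boldsymbol{a},j}}\right) \tag{5.E.19}$$

for $\{a_i = 0, 1, \ldots\}_{i=1}^{3}$, where

$$d_{\boldsymbol{a},\boldsymbol{l}} = \frac{a_1!\,a_2!\,a_3!}{2^{\sum_{j=1}^{3}l_{jj}}\left(\prod_{i=1}^{3}\prod_{j=i}^{3}l_{ij}!\right)\left(\prod_{j=1}^{3}L_{\boldsymbol{a},j}!\right)} \tag{5.E.20}$$

when $n = 3$.
(3) Show that (5.3.51) satisfies (5.A.1) and (5.A.2).

Exercise 5.28 When $r$ is even, show the moment

$$\mathsf{E}\left\{X^r\right\} = \sigma_G^r\,\frac{\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)\Gamma\left(\frac{r+1}{k}\right)}{\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.E.21}$$

for the generalized normal random variable $X$ with the pdf (5.A.39).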

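Exercise 5.29 When $vk > r$ and $r$ is even, show the $r$-th moment

$$\mathsf{E}\left\{X^r\right\} = v^{\frac{r}{k}}\sigma_G^r\,\frac{\Gamma\left(v-\frac{r}{k}\right)\Gamma\left(\frac{r+1}{k}\right)\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)}{\Gamma(v)\,\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.E.22}$$

for the generalized Cauchy random variable $X$ with the pdf (5.A.44).

Exercise 5.30 Obtain the pdf $f_X(x)$ of $X$ from the joint pdf $f_{X,Y}(x, y) = \frac{\gamma}{2\pi}\left(x^2 + y^2 + \gamma^2\right)^{-\frac{3}{2}}$ shown in (5.A.61). Confirm that the pdf is the same as the pdf $f(r) = \frac{\alpha}{\pi}\left(r^2 + \alpha^2\right)^{-1}$ obtained by letting $\beta = 0$ in (2.5.28).

Exercise 5.31 Show that the mgf of the sample mean is

$$M_{\overline{X}_n}(t) = M^n\left(\frac{t}{n}\right) \tag{5.E.23}$$

for a sample $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ with marginal mgf $M(t)$.

Exercise 5.32 Obtain the mean and variance, and show the mgf

$$M_Y(t) = (1-2t)^{-\frac{n}{2}}\exp\left(\frac{\delta t}{1-2t}\right), \tag{5.E.24}$$

for $Y \sim \chi^2(n, \delta)$.

Exercise 5.33 Show the $r$-th moment

$$\mathsf{E}\left\{X^r\right\} = \begin{cases}\dfrac{k^{\frac{r}{2}}\,\Gamma\left(\frac{k-r}{2}\right)\Gamma\left(\frac{r+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{k}{2}\right)}, & \text{when } r < k \text{ and } r \text{ is even},\\[8pt] 0, & \text{when } r < k \text{ and } r \text{ is odd}\end{cases} \tag{5.E.25}$$

of $X \sim t(k)$.

Exercise 5.34 Obtain the mean and variance of $Z \sim t(n, \delta)$.

Exercise 5.35 For $H \sim F(m, n)$, show that

$$\mathsf{E}\left\{H^k\right\} = \left(\frac{n}{m}\right)^{k}\frac{\Gamma\left(\frac{m}{2}+k\right)\Gamma\left(\frac{n}{2}-k\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \tag{5.E.26}$$

for $k = 1, 2, \ldots, \left\lceil\frac{n}{2}\right\rceil - 1$.

Exercise 5.36 Obtain the mean and variance of $H \sim F(m, n, \delta)$.

Exercise 5.37 For i.i.d. random variables $X_1$, $X_2$, $X_3$, and $X_4$ with marginal distribution $N(0, 1)$, show that the pdf of $Y = X_1X_2 + X_3X_4$ is $f_Y(y) = \frac{1}{2}e^{-|y|}$.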

Exercise 5.38 Show that the distribution of $Y = -2\ln X$ is $\chi^2(2)$ when $X \sim U(0, 1)$. When $\{X_i \sim U(0, 1)\}_{i=1}^{k}$ are all independent of each other, show that $\sum_{i=1}^{k}\left(-2\ln X_i\right) \sim \chi^2(2k)$.

Exercise 5.39 Prove that

$$\overline{X}_n = \overline{X}_{n-1} + \frac{1}{n}\left(X_n - \overline{X}_{n-1}\right) \tag{5.E.27}$$

for the sample mean $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i$ with $\overline{X}_0 = 0$.
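The recursion $\overline{X}_n = \overline{X}_{n-1} + \frac{1}{n}\left(X_n - \overline{X}_{n-1}\right)$ of Exercise 5.39 is the usual way to update a sample mean one observation at a time without storing the whole sample. The sketch below is a numerical illustration only, not a proof; the function name `running_means` is hypothetical.

```python
def running_means(samples):
    """Return the sample means after each observation, using the recursion
    mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n with mean_0 = 0."""
    means = []
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        mean = mean + (x - mean) / n
        means.append(mean)
    return means


if __name__ == "__main__":
    data = [2.0, 4.0, 9.0]
    print(running_means(data))           # recursive means
    print(sum(data) / len(data))         # direct mean of the full sample
```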

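Exercise 5.40 Let us denote the $k$-th central moment of $X_i$ by $\mathsf{E}\left\{(X_i - m)^k\right\} = \mu_k$ for $k = 0, 1, \ldots$. Obtain the fourth central moment $\mu_4\left(\overline{X}_n\right)$ of the sample mean $\overline{X}_n$ for a sample $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$.

Exercise 5.41 Prove Theorem 5.1.4 by taking the steps described below.
(1) Show that the pdf $f_3(x, y, z)$ shown in (5.1.19) can be written as

$$f_3(x, y, z) = \frac{1}{\sqrt{8\pi^3|\boldsymbol{K}_3|}}\exp\left\{-\frac{(x + t_{12}y)^2}{2\left(1-\rho_{12}^2\right)}\right\}\exp\left\{-\frac{\left(1-t_{12}^2\right)y^2}{2\left(1-\rho_{12}^2\right)}\right\}\exp\left\{-\frac{1-\rho_{12}^2}{2|\boldsymbol{K}_3|}\left(z + b_{12}\right)^2\right\}, \tag{5.E.28}$$

where $t_{12} = \frac{q_{12}}{|\boldsymbol{K}_3|}$ and $b_{12} = \frac{c_{23}y + c_{31}x}{1-\rho_{12}^2}$ with $q_{12} = c_{12}\left(1-\rho_{12}^2\right) - c_{23}c_{31}$ and $c_{ij} = \rho_{jk}\rho_{ki} - \rho_{ij}$.
(2) Show that $\lim_{\rho_{12}\to\pm 1}t_{12} = -\xi_{12}$ and

$$\lim_{\rho_{12}\to\pm 1}\frac{1-\rho_{12}^2}{|\boldsymbol{K}_3|} = \frac{1}{1-\rho_{23}^2}. \tag{5.E.29}$$

Subsequently, using $\lim_{\alpha\to\infty}\sqrt{\frac{\alpha}{\pi}}\exp\left(-\alpha x^2\right) = \delta(x)$, show that

$$\lim_{\rho_{12}\to\pm 1}\frac{1}{\sqrt{8\pi^3|\boldsymbol{K}_3|}}\exp\left\{-\frac{(x + t_{12}y)^2}{2\left(1-\rho_{12}^2\right)}\right\} = \frac{\delta(x - \xi_{12}y)}{2\pi\sqrt{1-\rho_{23}^2}}, \tag{5.E.30}$$

where $\xi_{ij} = \mathrm{sgn}\left(\rho_{ij}\right)$.
(3) Show that $\lim_{\rho_{12}\to\pm 1}\frac{1-t_{12}^2}{1-\rho_{12}^2} = 1$, which instantly yields

$$\lim_{\rho_{12}\to\pm 1}\exp\left\{-\frac{\left(1-t_{12}^2\right)y^2}{2\left(1-\rho_{12}^2\right)}\right\} = \exp\left(-\frac{1}{2}y^2\right). \tag{5.E.31}$$

(4) Using (5.E.29), show that

$$\lim_{\rho_{12}\to\pm 1}\exp\left\{-\frac{1-\rho_{12}^2}{2|\boldsymbol{K}_3|}\left(z + b_{12}\right)^2\right\} = \exp\left\{-\frac{\left(z - \mu_1(x, y)\right)^2}{2\left(1-\rho_{23}^2\right)}\right\}, \tag{5.E.32}$$

where $\mu_1(x, y) = \frac{1}{2}\xi_{12}\left(\rho_{23}x + \rho_{31}y\right)$. Combining (5.E.30), (5.E.31), and (5.E.32) into (5.E.28), and noting that $\rho_{23} = \xi_{12}\rho_{31}$ when $\rho_{12} \to \pm 1$ and that $y$ can be replaced with $\xi_{12}x$ due to the function $\delta(x - \xi_{12}y)$, we get (5.1.36).
(5) Obtain (5.1.37) from (5.1.36).

Exercise 5.42 Assume $(X, Y)$ has the standard bi-variate normal pdf $\phi_2$.
(1) Obtain the pdf and cdf of $V = g(X, Y) = \frac{X^2 - 2\rho XY + Y^2}{2\left(1-\rho^2\right)}$.
(2) Note that $\phi_2(x, y) = c$ is equivalent to $x^2 - 2\rho xy + y^2 = c_1$, an ellipse, for positive constants $c$ and $c_1$. Show that $c_1 = -2\left(1-\rho^2\right)\ln(1-\alpha)$ for the ellipse containing $100\alpha\%$ of the distribution of $(X, Y)$.

Exercise 5.43 Consider $(X_1, X_2) \sim N(0, 0, 1, 1, \rho)$ and $g(x) = 2\beta\int_0^{\frac{x}{\alpha}}\phi(z)\,dz$, i.e.,

$$g(x) = \beta\left\{2\Phi\left(\frac{x}{\alpha}\right) - 1\right\}, \tag{5.E.33}$$

where $\alpha > 0$, $\beta > 0$, $\Phi$ is the standard normal cdf, and $\phi$ is the standard normal pdf. Obtain the correlation $R_Y = \mathsf{E}\{Y_1Y_2\}$ and correlation coefficient $\rho_Y$ between $Y_1 = g(X_1)$ and $Y_2 = g(X_2)$. Obtain the values of $\rho_Y$ when $\alpha^2 = 1$ and $\alpha^2 \to \infty$. Note that $g$ is a smoothly increasing function from $-\beta$ to $\beta$. When $\alpha = 1$, we have $\beta\left\{2\Phi(X_i) - 1\right\} \sim U(-\beta, \beta)$ because $\Phi(X) \sim U(0, 1)$ when $X \sim \Phi$ from (3.2.50).

Exercise 5.44 In Figs. 5.1, 5.2 and 5.3, show that the angle $\theta$ between the major axis of the ellipse and the positive $x$-axis can be expressed as (5.1.9).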

References

M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
J. Bae, H. Kwon, S.R. Park, J. Lee, I. Song, Explicit correlation coefficients among random variables, ranks, and magnitude ranks. IEEE Trans. Inform. Theory 52(5), 2233–2240 (2006)
W. Bär, F. Dittrich, Useful formula for moment computation of normal random variables with nonzero means. IEEE Trans. Automat. Control 16(3), 263–265 (1971)
R.F. Baum, The correlation function of smoothly limited Gaussian noise. IRE Trans. Inform. Theory 3(3), 193–197 (1957)
J.L. Brown Jr., On a cross-correlation property of stationary processes. IRE Trans. Inform. Theory 3(1), 28–31 (1957)
W.B. Davenport Jr., Probability and Random Processes (McGraw-Hill, New York, 1970)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd edn. (McGraw-Hill, New York, 1990)
I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products (Academic, New York, 1980)
J. Hajek, Nonparametric Statistics (Holden-Day, San Francisco, 1969)
J.B.S. Haldane, Moments of the distributions of powers and products of normal variates. Biometrika 32(3/4), 226–242 (1942)
G.G. Hamedani, Nonnormality of linear combinations of normal random variables. Am. Stat. 38(4), 295–296 (1984)
B. Holmquist, Moments and cumulants of the multivariate normal distribution. Stochastic Anal. Appl. 6(3), 273–278 (1988)
R.A. Horn, C.R. Johnson, Matrix Analysis (Cambridge University Press, Cambridge, 1985)
L. Isserlis, On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika 12(1/2), 134–139 (1918)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions (Wiley, New York, 1972)
A.R. Kamat, Incomplete moments of the trivariate normal distribution. Indian J. Stat. 20(3/4), 321–322 (1958)
R. Kan, From moments of sum to moments of product. J. Multivariate Anal. 99(3), 542–554 (2008)
S. Kotz, N. Balakrishnan, N.L. Johnson, Continuous Multivariate Distributions, 2nd edn. (Wiley, New York, 2000)
E.L. Melnick, A. Tenenbein, Misspecification of the normal distribution. Am. Stat. 36(4), 372–373 (1982)
D. Middleton, An Introduction to Statistical Communication Theory (McGraw-Hill, New York, 1960)
G.A. Mihram, A cautionary note regarding invocation of the central limit theorem. Am. Stat. 23(5), 38 (1969)
T.M. Mills, Problems in Probability (World Scientific, Singapore, 2001)
S. Nabeya, Absolute moments in 3-dimensional normal distribution. Ann. Inst. Stat. Math. 4(1), 15–30 (1952)
C.L. Nikias, M. Shao, Signal Processing with Alpha-Stable Distributions and Applications (Wiley, New York, 1995)
J.K. Patel, C.H. Kapadia, D.B. Owen, Handbook of Statistical Distributions (Marcel Dekker, New York, 1976)
J.K. Patel, C.B. Read, Handbook of the Normal Distribution, 2nd edn. (Marcel Dekker, New York, 1996)
D.A. Pierce, R.L. Dykstra, Independence and the normal distribution. Am. Stat. 23(4), 39 (1969)
R. Price, A useful theorem for nonlinear devices having Gaussian inputs. IRE Trans. Inform. Theory 4(2), 69–72 (1958)
V.K. Rohatgi, A.K. Md. E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New York, 1986)
I. Song, S. Lee, Explicit formulae for product moments of multivariate Gaussian random variables. Stat. Prob. Lett. 100, 27–34 (2015)
I. Song, S. Lee, Y.H. Kim, S.R. Park, Explicit formulae and implication of the expected values of some nonlinear statistics of tri-variate Gaussian variables. J. Korean Stat. Soc. 49(1), 117–138 (2020)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
K. Triantafyllopoulos, On the central moments of the multidimensional Gaussian distribution. Math. Sci. 28(2), 125–128 (2003)
G.A. Tsihrintzis, C.L. Nikias, Incoherent receiver in alpha-stable impulsive noise. IEEE Trans. Signal Process. 43(9), 2225–2229 (1995)
G.L. Wies, E.B. Hall, Counterexamples in Probability and Real Analysis (Oxford University, New York, 1993)
C.S. Withers, The moments of the multivariate normal. Bull. Austral. Math. Soc. 32(1), 103–107 (1985)

Chapter 6

Convergence of Random Variables

In this chapter, we discuss sequences of random variables and their convergence. The central limit theorem, one of the most important and widely-used results in many areas of the applications of random variables, will also be described.

6.1 Types of Convergence

In discussing the convergence of sequences of random variables (Grimmett and Stirzaker 1982; Thomas 1986), we consider whether every or almost every sequence is convergent and, if convergent, whether the sequences converge to the same value or to different values.

6.1.1 Almost Sure Convergence

Definition 6.1.1 (sure convergence; almost sure convergence) For every point $\omega$ of the sample space on which the random variable $X_n$ is defined, if

$$\lim_{n \to \infty} X_n(\omega) = X(\omega), \tag{6.1.1}$$

then the sequence $\{X_n\}_{n=1}^{\infty}$ is called surely convergent to $X$, and if

$$P\left(\omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right) = 1, \tag{6.1.2}$$

then the sequence $\{X_n\}_{n=1}^{\infty}$ is called almost surely convergent to $X$.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. I. Song et al., Probability and Random Variables: Theory and Applications, https://doi.org/10.1007/978-3-030-97679-8_6


Sure convergence is also called always convergence, everywhere convergence, or certain convergence. When a sequence $\{X_n\}_{n=1}^{\infty}$ is surely convergent to $X$, it is denoted by $X_n \xrightarrow{c.} X$, $X_n \xrightarrow{e.} X$, or $X_n \xrightarrow{s.} X$. Sure convergence implies that all the sequences are convergent for all $\omega$, yet the limit value of the convergence may depend on $\omega$.

Almost sure convergence is synonymous with convergence with probability 1, almost always convergence, almost everywhere convergence, and almost certain convergence. When a sequence $\{X_n\}_{n=1}^{\infty}$ is almost surely convergent to $X$, it is denoted by $X_n \xrightarrow{a.c.} X$, $X_n \xrightarrow{a.e.} X$, $X_n \xrightarrow{w.p.1} X$, or $X_n \xrightarrow{a.s.} X$. For an almost surely convergent sequence $\{X_n(\omega)\}_{n=1}^{\infty}$, we have $\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for any $\omega \in \tilde{\Omega}$ when $P(\tilde{\Omega}) = 1$ and $\tilde{\Omega} \subseteq \Omega$. Although a sequence $\{X_n(\omega)\}_{n=1}^{\infty}$ for which $\omega \notin \tilde{\Omega}$ may or may not converge, the set of such $\omega$ has probability 0: in other words, $P(\omega : \omega \notin \tilde{\Omega}, \omega \in \Omega) = 0$.

Example 6.1.1 Recollecting (1.5.17),

$$P(\text{i.o. } |X_n| > \varepsilon) = 0 \tag{6.1.3}$$

for every positive number $\varepsilon$ and

$$X_n \xrightarrow{a.s.} 0 \tag{6.1.4}$$

are the necessary and sufficient conditions of each other. ♦

When a sequence $\{X_n\}_{n=1}^{\infty}$ of random variables is almost surely convergent, almost every random variable in the sequence will eventually be located within a range of $2\varepsilon$ for any number $\varepsilon > 0$: although some random variables may not converge, the probability of $\omega$ for such random variables will be 0. The strong law of large numbers, which we will consider later in this chapter, is an example of almost sure convergence.

Example 6.1.2 (Leon-Garcia 2008) For a randomly chosen point $\omega \in [0, 1]$, assume that $P(\omega \in (a, b)) = b - a$ for $0 \le a \le b \le 1$. Now consider the five sequences of random variables $A_n(\omega) = \frac{\omega}{n}$, $B_n(\omega) = \omega\left(1 - \frac{1}{n}\right)$, $C_n(\omega) = \omega e^n$, $D_n(\omega) = \cos 2\pi n \omega$, and $H_n(\omega) = \exp\{-n(n\omega - 1)\}$.

The sequence $\{A_n(\omega)\}_{n=1}^{\infty}$ always converges to 0 for any value of $\omega \in [0, 1]$, and thus it is surely convergent to 0. The sequence $\{B_n(\omega)\}_{n=1}^{\infty}$ converges to $\omega$ for any value of $\omega \in [0, 1]$, and thus it is surely convergent to $\omega$ with the limit distribution $U[0, 1]$. The sequence $\{C_n(\omega)\}_{n=1}^{\infty}$ converges to 0 when $\omega = 0$ and diverges when $\omega \in (0, 1]$: in other words, it is not convergent. The sequence $\{D_n(\omega)\}_{n=1}^{\infty}$ converges to 1 when $\omega \in \{0, 1\}$ and oscillates between $-1$ and 1 when $\omega \in (0, 1)$: in other words, it is not convergent. When $n \to \infty$, $H_n(0) = e^n \to \infty$ for $\omega = 0$ and $H_n(\omega) \to 0$ for $\omega \in (0, 1]$: in other words, $\{H_n(\omega)\}_{n=1}^{\infty}$ is not surely convergent. However, because $P(\omega \in (0, 1]) = 1$, $\{H_n(\omega)\}_{n=1}^{\infty}$ converges almost surely to 0. ♦
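The behaviour of the five sequences in Example 6.1.2 can be inspected numerically. The following is a minimal sketch, assuming the sequences are read as $A_n = \omega/n$, $B_n = \omega(1 - 1/n)$, $C_n = \omega e^n$, $D_n = \cos 2\pi n\omega$, and $H_n = \exp\{-n(n\omega - 1)\}$; the function name `sequences` and the sample points are illustrative choices, not part of the text.

```python
import math

def sequences(omega, n):
    """Return (A_n, B_n, C_n, D_n, H_n) of Example 6.1.2 at the point omega."""
    a = omega / n                          # surely convergent to 0
    b = omega * (1 - 1 / n)                # surely convergent to omega
    c = omega * math.exp(n)                # diverges unless omega == 0
    d = math.cos(2 * math.pi * n * omega)  # oscillates for omega in (0, 1)
    h = math.exp(-n * (n * omega - 1))     # tends to 0 unless omega == 0
    return a, b, c, d, h

# Tabulate the sequences at a few points of [0, 1] for growing n.
for omega in (0.0, 0.3, 1.0):
    for n in (1, 5, 25):
        values = ", ".join(f"{v:.3e}" for v in sequences(omega, n))
        print(f"omega={omega:.1f}, n={n:2d}: {values}")
```

At $\omega = 0$ only the last entry grows without bound, which is the single point (of probability 0) where $H_n$ fails to converge.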


Example 6.1.3 (Stoyanov 2013) Consider a sequence $\{X_n\}_{n=1}^{\infty}$. When

$$\sum_{n=1}^{\infty} P(|X_n| > \varepsilon) < \infty \tag{6.1.5}$$

for $\varepsilon > 0$, it is easy to see that $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$ from the Borel-Cantelli lemma. In addition, even if we change the condition (6.1.5) into

$$\sum_{n=1}^{\infty} P(|X_n| > \varepsilon_n) < \infty \tag{6.1.6}$$

for a sequence $\{\varepsilon_n\}_{n=1}^{\infty}$ such that $\varepsilon_n \downarrow 0$, we still have $X_n \xrightarrow{a.s.} 0$. Now, when $\omega$ is a randomly chosen point in $[0, 1]$, for a sequence $\{X_n\}_{n=1}^{\infty}$ with

$$X_n(\omega) = \begin{cases} 0, & 0 \le \omega \le 1 - \frac{1}{n}, \\ 1, & 1 - \frac{1}{n} < \omega \le 1, \end{cases} \tag{6.1.7}$$

we have $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$. However, for any $\varepsilon_n$ such that $\varepsilon_n \downarrow 0$, if we consider a sufficiently large $n$, we have $P(|X_n| > \varepsilon_n) = P(X_n = 1) = \frac{1}{n}$ and thus $\sum_{n=1}^{\infty} P(|X_n| > \varepsilon_n) \to \infty$ for the sequence (6.1.7). In other words, (6.1.6) is a sufficient condition for the sequence to converge almost surely to 0, but not a necessary condition. ♦

Theorem 6.1.1 (Rohatgi and Saleh 2001; Stoyanov 2013) If $X_n \xrightarrow{a.s.} X$ for a sequence $\{X_n\}_{n=1}^{\infty}$, then

$$\lim_{n \to \infty} P\left(\sup_{m \ge n} |X_m - X| > \varepsilon\right) = 0 \tag{6.1.8}$$

holds true for every $\varepsilon > 0$, and the converse also holds true.

Proof If $X_n \xrightarrow{a.s.} X$, then we have $X_n - X \xrightarrow{a.s.} 0$. Thus, let us show that

$$X_n \xrightarrow{a.s.} 0 \tag{6.1.9}$$

and

$$\lim_{n \to \infty} P\left(\sup_{m \ge n} |X_m| > \varepsilon\right) = 0 \tag{6.1.10}$$

are the necessary and sufficient conditions of each other.

Assume (6.1.9) holds true. Let $A_n(\varepsilon) = \left\{\sup_{m \ge n} |X_m| > \varepsilon\right\}$, $C = \left\{\lim_{n \to \infty} X_n = 0\right\}$, and $B_n(\varepsilon) = C \cap A_n(\varepsilon)$. Then, $B_{n+1}(\varepsilon) \subseteq B_n(\varepsilon)$ and $\bigcap_{n=1}^{\infty} B_n(\varepsilon) = \emptyset$, and thus $\lim_{n \to \infty} P(B_n(\varepsilon)) = P\left(\bigcap_{n=1}^{\infty} B_n(\varepsilon)\right) = 0$. Recollecting $P(C) = 1$ and $P(C^c) = 0$, we get $P\left(C^c \cap A_n^c(\varepsilon)\right) \le P(C^c) = 0$ because $C^c \cap A_n^c(\varepsilon) \subseteq C^c$. We also have $P(B_n(\varepsilon)) = P(C \cap A_n(\varepsilon)) = 1 - P\left(C^c \cup A_n^c(\varepsilon)\right) = 1 - P(C^c) - P\left(A_n^c(\varepsilon)\right) + P\left(C^c \cap A_n^c(\varepsilon)\right) = P(A_n(\varepsilon)) + P\left(C^c \cap A_n^c(\varepsilon)\right)$, i.e.,

$$P(B_n(\varepsilon)) = P(A_n(\varepsilon)). \tag{6.1.11}$$

Therefore, we have (6.1.10).

Next, assume that (6.1.10) holds true. Letting $D(\varepsilon) = \left\{\limsup_{n \to \infty} |X_n| > \varepsilon > 0\right\}$, we have $P(D(\varepsilon)) = 0$ because $D(\varepsilon) \subseteq A_n(\varepsilon)$ for $n = 1, 2, \ldots$. In addition, because $C^c = \left\{\lim_{n \to \infty} X_n \ne 0\right\} \subseteq \bigcup_{k=1}^{\infty} \left\{\limsup_{n \to \infty} |X_n| > \frac{1}{k}\right\}$, we get $1 - P(C) \le \sum_{k=1}^{\infty} P\left(D\left(\frac{1}{k}\right)\right) = 0$ and, consequently, (6.1.9). ♠

To show that a sequence of random variables is almost surely convergent, it is necessary that either the distribution of $\omega$ or the relationship between $\omega$ and the random variables is available, or that the random variables are sufficiently simple to show the convergence.

Let us now consider a convergence weaker than almost sure convergence. For example, we may require most of the random variables in the sequence $\{X_n\}_{n=1}^{\infty}$ to be close to $X$ in the sense that $E\left\{(X_n - X)^2\right\}$ is small enough. Such a convergence focuses on the time instances, and its convergence or divergence is easier to show than that of almost sure convergence because it does not require convergence of all the sequences.

6.1.2 Convergence in the Mean

Definition 6.1.2 (convergence in the $r$-th mean) For a sequence $\{X_n\}_{n=1}^{\infty}$ and a random variable $X$, assume that the $r$-th absolute moments $\left\{E\{|X_n|^r\}\right\}_{n=1}^{\infty}$ and $E\{|X|^r\}$ are all finite. If

$$\lim_{n \to \infty} E\left\{|X_n - X|^r\right\} = 0, \tag{6.1.12}$$

then $\{X_n\}_{n=1}^{\infty}$ is said to converge to $X$ in the $r$-th mean, and is denoted by $X_n \xrightarrow{r} X$ or $X_n \xrightarrow{L^r} X$.


When $r = 2$, convergence in the $r$-th mean is called convergence in the mean square, and

$$\lim_{n \to \infty} E\left\{|X_n - X|^2\right\} = 0 \tag{6.1.13}$$

is written also as

$$\underset{n \to \infty}{\text{l.i.m.}}\, X_n = X, \tag{6.1.14}$$

where l.i.m. is the acronym of 'limit in the mean'.

Example 6.1.4 (Rohatgi and Saleh 2001) Assume the distribution

$$P(X_n = x) = \begin{cases} 1 - \frac{1}{n}, & x = 0, \\ \frac{1}{n}, & x = 1 \end{cases} \tag{6.1.15}$$

for a sequence $\{X_n\}_{n=1}^{\infty}$. Then, the sequence converges in the mean square to $X$ such that $P(X = 0) = 1$ because $\lim_{n \to \infty} E\left\{|X_n|^2\right\} = \lim_{n \to \infty} \frac{1}{n} = 0$.

Example 6.1.5 (Leon-Garcia 2008) We have observed that the sequence $B_n(\omega) = \left(1 - \frac{1}{n}\right)\omega$ in Example 6.1.2 converges surely to $\omega$. Now, because $\lim_{n \to \infty} E\left\{\left(B_n(\omega) - \omega\right)^2\right\} = \lim_{n \to \infty} E\left\{\left(\frac{\omega}{n}\right)^2\right\} = \lim_{n \to \infty} \frac{1}{3n^2} = 0$, the sequence $\{B_n(\omega)\}_{n=1}^{\infty}$ converges to $\omega$ also in the mean square.

Mean square convergence is easy to analyze and is also meaningful in engineering applications because the quantity $E\left\{|X_n - X|^2\right\}$ can be regarded as the power of an error. The Cauchy criterion shown in the following theorem allows us to see if a sequence converges in the mean square even when we do not know the limit $X$:

Theorem 6.1.2 A necessary and sufficient condition for a sequence $\{X_n\}_{n=1}^{\infty}$ to converge in the mean square is

$$\lim_{n, m \to \infty} E\left\{|X_n - X_m|^2\right\} = 0. \tag{6.1.16}$$

Example 6.1.6 Consider the sequence $\{X_n\}_{n=1}^{\infty}$ discussed in Example 6.1.4. Then, we have $\lim_{n, m \to \infty} E\left\{|X_n - X_m|^2\right\} = 0$ because $E\left\{|X_n - X_m|^2\right\} = 1 \times P(X_n = 0, X_m = 1) + 1 \times P(X_n = 1, X_m = 0) = \frac{1}{n} P(X_m = 0 \mid X_n = 1) + \frac{1}{m} P(X_n = 0 \mid X_m = 1)$, i.e.,

$$E\left\{|X_n - X_m|^2\right\} \le \frac{1}{n} + \frac{1}{m}. \tag{6.1.17}$$

Therefore, $\{X_n\}_{n=1}^{\infty}$ converges in the mean square. ♦

Example 6.1.7 We have $\lim_{n, m \to \infty} E\left\{|B_n - B_m|^2\right\} = \lim_{n, m \to \infty} E\left\{\left(\frac{1}{n} - \frac{1}{m}\right)^2 \omega^2\right\} = E\left\{\omega^2\right\} \lim_{n, m \to \infty} \left(\frac{1}{n} - \frac{1}{m}\right)^2 = 0$ for the sequence $\{B_n\}_{n=1}^{\infty}$ in Example 6.1.5.

Mean square convergence implies that more and more sequences are close to the limit $X$ as $n$ becomes larger. However, unlike in almost sure convergence, the sequences close to $X$ do not necessarily always stay close to $X$.

Example 6.1.8 (Leon-Garcia 2008) In Example 6.1.2, the sequence $H_n(\omega) = \exp\{-n(n\omega - 1)\}$ is shown to converge almost surely to 0. Now, because

$$\lim_{n \to \infty} E\left\{|H_n(\omega) - 0|^2\right\} = \lim_{n \to \infty} e^{2n} \int_0^1 \exp\left(-2n^2 \omega\right) d\omega = \lim_{n \to \infty} \frac{e^{2n}}{2n^2}\left\{1 - \exp\left(-2n^2\right)\right\} \to \infty,$$

the sequence $\{H_n(\omega)\}_{n=1}^{\infty}$ does not converge to 0 in the mean square.
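The growth of the second moment in Example 6.1.8 can be checked numerically. The sketch below compares the closed form $\frac{e^{2n}}{2n^2}\{1 - \exp(-2n^2)\}$ with a midpoint-rule approximation of $\int_0^1 H_n^2(\omega)\, d\omega$; the function names and the number of integration steps are illustrative assumptions.

```python
import math

def second_moment_closed(n):
    """Closed form of E{H_n^2} for omega uniform on [0, 1]."""
    return math.exp(2 * n) * (1 - math.exp(-2 * n * n)) / (2 * n * n)

def second_moment_numeric(n, steps=100_000):
    """Midpoint-rule approximation of the integral of H_n(omega)^2 over [0, 1]."""
    dw = 1.0 / steps
    total = 0.0
    for k in range(steps):
        omega = (k + 0.5) * dw
        total += math.exp(-2 * n * (n * omega - 1))  # H_n(omega)^2
    return total * dw

for n in (1, 2, 3, 5):
    print(n, second_moment_closed(n), second_moment_numeric(n))
```

Although $H_n(\omega) \to 0$ for every $\omega \in (0, 1]$, the printed second moments increase with $n$, in line with the conclusion of the example.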

6.1.3 Convergence in Probability and Convergence in Distribution

Definition 6.1.3 (convergence in probability) A sequence $\{X_n\}_{n=1}^{\infty}$ is said to converge stochastically, or converge in probability, to a random variable $X$ if

$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0 \tag{6.1.18}$$

for every $\varepsilon > 0$, and is denoted by $X_n \xrightarrow{p} X$.

Note that (6.1.18) implies that almost every sequence is within a range of $2\varepsilon$ at any given time but that the sequence is not required to stay in the range. However, (6.1.8) dictates that a sequence is required to stay within the range $2\varepsilon$ once it is inside the range. This can easily be confirmed by interpreting the meanings of $\{|X_n - X| > \varepsilon\}$ and $\left\{\sup_{m \ge n} |X_m - X| > \varepsilon\right\}$.

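Example 6.1.9 (Rohatgi and Saleh 2001) Assume the pmf

$$p_{X_n}(x) = \begin{cases} 1 - \frac{1}{n}, & x = 0, \\ \frac{1}{n}, & x = 1 \end{cases} \tag{6.1.19}$$

for a sequence $\{X_n\}_{n=1}^{\infty}$. Then, because

$$P(|X_n| > \varepsilon) = \begin{cases} 0, & \varepsilon \ge 1, \\ \frac{1}{n}, & 0 < \varepsilon < 1, \end{cases} \tag{6.1.20}$$

we have $\lim_{n \to \infty} P(|X_n| > \varepsilon) = 0$ and thus $X_n \xrightarrow{p} 0$. ♦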


Example 6.1.10 Assume a sequence $\{X_n\}_{n=1}^{\infty}$ with the pmf

$$P(X_n = x) = \begin{cases} \frac{1}{2n}, & x = 3, 4, \\ 1 - \frac{1}{n}, & x = 5. \end{cases} \tag{6.1.21}$$

We then have $X_n \xrightarrow{p} 5$ because

$$P(|X_n - 5| > \varepsilon) = \begin{cases} 0, & \varepsilon \ge 2, \\ \frac{1}{2n}, & 1 \le \varepsilon < 2, \\ \frac{1}{n}, & 0 < \varepsilon < 1, \end{cases} \tag{6.1.22}$$

and thus $\lim_{n \to \infty} P(|X_n - 5| > \varepsilon) = 0$.

Theorem 6.1.3 (Rohatgi and Saleh 2001) If a sequence $\{X_n\}_{n=1}^{\infty}$ converges to $X$ in probability and $g$ is a continuous function, then $\{g(X_n)\}$ converges to $g(X)$ in probability.

Theorem 6.1.3 requires $g$ to be a continuous function: note that the theorem may not hold true if $g$ is not a continuous function.

Example 6.1.11 (Stoyanov 2013) Assume $X_n \sim N\left(0, \frac{\sigma^2}{n}\right)$ and consider the unit step function $u(x)$ with $u(0) = 0$. Then, $\{X_n\}_{n=1}^{\infty}$ converges in probability to a random variable $X$ which is almost surely 0, and $u(X)$ is a random variable which is almost surely 0. However, because $u(X_n)$ is 0 and 1 each with probability $\frac{1}{2}$, we have $u(X_n) \overset{p}{\nrightarrow} u(X)$.



Definition 6.1.4 (convergence in distribution) If the cdf $F_n$ of $X_n$ satisfies

$$\lim_{n \to \infty} F_n(x) = F(x) \tag{6.1.23}$$

for all points at which the cdf $F$ of $X$ is continuous, then the sequence $\{X_n\}_{n=1}^{\infty}$ is said to converge weakly, in law, or in distribution to $X$, and is written as $X_n \xrightarrow{d} X$ or $X_n \xrightarrow{l} X$.

Example 6.1.12 For the cdf $F_n(x) = \int_{-\infty}^{x} \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\left(-\frac{n t^2}{2\sigma^2}\right) dt$ of $X_n$, we have

$$\lim_{n \to \infty} F_n(x) = \begin{cases} 0, & x < 0, \\ \frac{1}{2}, & x = 0, \\ 1, & x > 0. \end{cases} \tag{6.1.24}$$

Thus, $\{X_n\}_{n=1}^{\infty}$ converges weakly to a random variable $X$ with the cdf

$$F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0. \end{cases} \tag{6.1.25}$$


Note that although $\lim_{n \to \infty} F_n(0) = \frac{1}{2} \ne F(0) = 1$, the convergence in distribution does not require the convergence at discontinuity points of the cdf: in short, the convergence of $\{F_n(x)\}_{n=1}^{\infty}$ at the discontinuity point $x = 0$ of $F(x)$ is not a prerequisite for the convergence in distribution. ♦
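The pointwise limit in Example 6.1.12 can be evaluated directly, because $F_n$ is the cdf of a zero-mean normal random variable with variance $\sigma^2/n$ and can therefore be written with the error function. The following is a minimal sketch; the name `F_n` and the default $\sigma = 1$ are assumptions made for illustration.

```python
import math

def F_n(x, n, sigma=1.0):
    """cdf of N(0, sigma^2 / n) evaluated at x, written with the error function."""
    return 0.5 * (1.0 + math.erf(x * math.sqrt(n) / (sigma * math.sqrt(2.0))))

# F_n(x) approaches 0 for x < 0 and 1 for x > 0, while F_n(0) stays at 1/2.
for n in (1, 100, 10_000, 1_000_000):
    print(n, F_n(-0.01, n), F_n(0.0, n), F_n(0.01, n))
```

The middle column remains at $\frac{1}{2}$ for every $n$, which differs from $F(0) = 1$ only at the discontinuity point of $F$.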

6.1.4 Relations Among Various Types of Convergence

We now discuss the relations among various types of convergence discussed in previous sections. First, let $A$ = {collection of sequences almost surely convergent}, $D$ = {collection of sequences convergent in distribution}, $M_s$ = {collection of sequences convergent in the $s$-th mean}, $M_t$ = {collection of sequences convergent in the $t$-th mean}, $P$ = {collection of sequences convergent in probability}, and $t > s > 0$. Then, we have (Rohatgi and Saleh 2001)

$$D \supset P \supset M_s \supset M_t \tag{6.1.26}$$

and

$$P \supset A. \tag{6.1.27}$$

In addition, neither $A$ and $M_s$ nor $A$ and $M_t$ include each other.

Example 6.1.13 (Stoyanov 2013) Assume that the pdf of $X$ is symmetric and let $X_n = -X$. Then, because$^1$

$$X_n \overset{d}{=} X, \tag{6.1.28}$$

we have $X_n \xrightarrow{d} X$. However, because $P(|X_n - X| > \varepsilon) = P\left(|X| > \frac{\varepsilon}{2}\right) \nrightarrow 0$ when $n \to \infty$, we have $X_n \overset{p}{\nrightarrow} X$.



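Example 6.1.14 For the sample space $S = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, assume the event space $2^S$ and the uniform probability measure. Define $\{X_n\}_{n=1}^{\infty}$ by

$$X_n(\omega) = \begin{cases} 0, & \omega = \omega_3 \text{ or } \omega_4, \\ 1, & \omega = \omega_1 \text{ or } \omega_2. \end{cases} \tag{6.1.29}$$

Also let

$$X(\omega) = \begin{cases} 0, & \omega = \omega_1 \text{ or } \omega_2, \\ 1, & \omega = \omega_3 \text{ or } \omega_4. \end{cases} \tag{6.1.30}$$

Then, the cdf's of $X_n$ and $X$ are both

$$F(x) = \begin{cases} 0, & x < 0, \\ \frac{1}{2}, & 0 \le x < 1, \\ 1, & x \ge 1. \end{cases} \tag{6.1.31}$$

In other words, $X_n \xrightarrow{d} X$. Meanwhile, because $|X_n(\omega) - X(\omega)| = 1$ for $\omega \in S$ and $n \ge 1$, we have $P(|X_n - X| > \varepsilon) \nrightarrow 0$ for $n \to \infty$. Thus, $X_n$ does not converge to $X$ in probability.

$^1$ Here, $\overset{d}{=}$ means 'equal in distribution' as introduced in Example 3.5.18.

Example 6.1.15 (Stoyanov 2013) Assume that $P(X_n = 1) = \frac{1}{n}$ and $P(X_n = 0) = 1 - \frac{1}{n}$ for a sequence $\{X_n\}_{n=1}^{\infty}$ of independent random variables. Then, we have $X_n \xrightarrow{p} 0$ because $P(|X_n - 0| > \varepsilon) = P(X_n = 1) = \frac{1}{n} \to 0$ when $n \to \infty$ for any $\varepsilon \in (0, 1)$. Next, let $A_n(\varepsilon) = \{|X_n - 0| > \varepsilon\}$ and $B_m(\varepsilon) = \bigcup_{n=m}^{\infty} A_n(\varepsilon)$. Then, we have

$$P(B_m(\varepsilon)) = 1 - \lim_{M \to \infty} P(X_n = 0, \text{ for all } n \in [m, M]). \tag{6.1.32}$$

Noting that $\prod_{k=m}^{\infty} \left(1 - \frac{1}{k}\right) = 0$ for any natural number $m$, we get

$$P(B_m(\varepsilon)) = 1 - \lim_{M \to \infty} \left(1 - \frac{1}{m}\right)\left(1 - \frac{1}{m+1}\right) \cdots \left(1 - \frac{1}{M}\right) = 1 \tag{6.1.33}$$

because $\{X_n\}_{n=1}^{\infty}$ is an independent sequence. Thus, $\lim_{m \to \infty} P(B_m(\varepsilon)) \ne 0$ and, from Theorem 6.1.1, $X_n \overset{a.s.}{\nrightarrow} 0$.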
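The product in (6.1.33) telescopes, which can be confirmed with exact rational arithmetic. The sketch below assumes the independent sequence of Example 6.1.15; the function name `prob_all_zero` is a hypothetical helper.

```python
from fractions import Fraction

def prob_all_zero(m, M):
    """P(X_n = 0 for all n in [m, M]) for independent X_n with P(X_n = 0) = 1 - 1/n."""
    p = Fraction(1)
    for k in range(m, M + 1):
        p *= 1 - Fraction(1, k)
    return p

# For m >= 2 the product equals (m - 1) / M, which tends to 0 as M grows.
for M in (10, 100, 1000):
    print(M, prob_all_zero(5, M), 1 - prob_all_zero(5, M))
```

The second printed column, $1 - \frac{m-1}{M}$, approaches 1 for fixed $m$, matching $P(B_m(\varepsilon)) = 1$.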



Example 6.1.16 (Rohatgi and Saleh 2001; Stoyanov 2013) Based on the inequality $|x|^s \le 1 + |x|^r$ for $0 < s < r$, or on the Lyapunov inequality $\left(E\{|X|^s\}\right)^{\frac{1}{s}} \le \left(E\{|X|^r\}\right)^{\frac{1}{r}}$ for $0 < s < r$ shown in (6.A.21), we can easily show that

$$X_n \xrightarrow{L^r} X \;\Rightarrow\; X_n \xrightarrow{L^s} X \tag{6.1.34}$$

for $0 < s < r$. Next, if the distribution of the sequence $\{X_n\}_{n=1}^{\infty}$ is

$$P(X_n = x) = \begin{cases} n^{-\frac{r+s}{2}}, & x = n, \\ 1 - n^{-\frac{r+s}{2}}, & x = 0 \end{cases} \tag{6.1.35}$$

for $0 < s < r$, then $X_n \xrightarrow{L^s} 0$ because $E\left\{X_n^s\right\} = n^{\frac{s-r}{2}} \to 0$ for $n \to \infty$: however, because $E\left\{X_n^r\right\} = n^{\frac{r-s}{2}} \to \infty$, we have $X_n \overset{L^r}{\nrightarrow} 0$. In short,

$$X_n \xrightarrow{L^s} 0 \;\nRightarrow\; X_n \xrightarrow{L^r} 0 \tag{6.1.36}$$

for $r > s$. ♦

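Example 6.1.17 (Stoyanov 2013) If

$$P(X_n = x) = \begin{cases} \frac{1}{n}, & x = e^n, \\ 1 - \frac{1}{n}, & x = 0 \end{cases} \tag{6.1.37}$$

for a sequence $\{X_n\}_{n=1}^{\infty}$, then $X_n \xrightarrow{p} 0$ because $P(|X_n| < \varepsilon) = P(X_n = 0) = 1 - \frac{1}{n} \to 1$ for $\varepsilon > 0$ when $n \to \infty$. However, we have $X_n \overset{L^r}{\nrightarrow} 0$ because $E\left\{X_n^r\right\} = \frac{e^{nr}}{n} \to \infty$.

Example 6.1.18 (Rohatgi and Saleh 2001) Note first that a natural number $n$ can be expressed uniquely as

$$n = 2^k + m \tag{6.1.38}$$

with integers $k \in \{0, 1, \ldots\}$ and $m \in \left\{0, 1, \ldots, 2^k - 1\right\}$. Define a sequence $\{X_n\}_{n=1}^{\infty}$ by

$$X_n(\omega) = \begin{cases} 2^k, & \frac{m}{2^k} \le \omega < \frac{m+1}{2^k}, \\ 0, & \text{otherwise} \end{cases} \tag{6.1.39}$$

for $n = 1, 2, \ldots$ on $\Omega = [0, 1]$. Assume the pmf

$$P(X_n = x) = \begin{cases} \frac{1}{2^k}, & x = 2^k, \\ 1 - \frac{1}{2^k}, & x = 0. \end{cases} \tag{6.1.40}$$

Then, $\lim_{n \to \infty} X_n(\omega)$ does not exist for any choice $\omega \in \Omega$ and, therefore, the sequence does not converge almost surely. However, because

$$P(|X_n| > \varepsilon) = P(X_n > \varepsilon) = \begin{cases} 0, & \varepsilon \ge 2^k, \\ \frac{1}{2^k}, & 0 < \varepsilon < 2^k, \end{cases} \tag{6.1.41}$$

we have $\lim_{n \to \infty} P(|X_n| > \varepsilon) = 0$, and thus $X_n \xrightarrow{p} 0$. ♦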
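The sequence of Example 6.1.18 can be traced along a single sample point to see why it converges in probability yet not almost surely. The following is a minimal sketch; the helper names `decompose` and `X` and the choice $\omega = 0.3$ are illustrative assumptions.

```python
def decompose(n):
    """Return (k, m) with n = 2**k + m and 0 <= m < 2**k, as in (6.1.38)."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def X(n, omega):
    """X_n(omega) from (6.1.39): 2**k on the m-th dyadic interval of width 2**-k, else 0."""
    k, m = decompose(n)
    return 2 ** k if m / 2 ** k <= omega < (m + 1) / 2 ** k else 0

omega = 0.3
path = [X(n, omega) for n in range(1, 64)]
print(path)
# Indices n at which the path is nonzero: exactly one per block 2**k <= n < 2**(k + 1).
print([n for n in range(1, 64) if X(n, omega) != 0])
```

Each block $2^k \le n < 2^{k+1}$ contains exactly one index with $X_n(\omega) = 2^k$, so the path keeps returning to ever larger values, while the fraction of $\omega$ hit at any given $n$ shrinks as $2^{-k}$.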


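Example 6.1.19 (Stoyanov 2013) Consider a sequence $\{X_n\}_{n=1}^{\infty}$ with

$$P(X_n = x) = \begin{cases} 1 - \frac{1}{n^{\alpha}}, & x = 0, \\ \frac{1}{2n^{\alpha}}, & x = \pm n, \end{cases} \tag{6.1.42}$$

where $\alpha > 0$. Then, $\sum_{n=1}^{\infty} E\left\{|X_n|^{\frac{1}{2}}\right\} < \infty$ when $\alpha > \frac{3}{2}$ because $E\left\{|X_n|^{\frac{1}{2}}\right\} = n^{-\alpha + \frac{1}{2}}$. Letting $\alpha = \sqrt{\varepsilon}$ and $X = \sqrt{|X_n|}$ in the Markov inequality $P(X \ge \alpha) \le \frac{E\{X\}}{\alpha}$ introduced in (6.A.15), we have $P(|X_n| > \varepsilon) \le \varepsilon^{-\frac{1}{2}} E\left\{|X_n|^{\frac{1}{2}}\right\}$, and thus $\sum_{n=1}^{\infty} P(|X_n| > \varepsilon) < \infty$ when $\varepsilon > 0$. Now, employing the Borel-Cantelli lemma as in Example 6.1.3, we have $X_n \xrightarrow{a.s.} 0$. Meanwhile, $X_n \overset{L^2}{\nrightarrow} 0$ when $\alpha \le 2$ because $E\left\{|X_n|^2\right\} = \frac{1}{n^{\alpha - 2}}$. In essence, we have $X_n \xrightarrow{a.s.} 0$, yet $X_n \overset{L^2}{\nrightarrow} 0$ for $\alpha \in \left(\frac{3}{2}, 2\right]$. ♦

Example 6.1.20 (Stoyanov 2013) When $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$, we have $X_n + Y_n \xrightarrow{d} X + Y$ if the sequences $\{X_n\}_{n=1}^{\infty}$ and $\{Y_n\}_{n=1}^{\infty}$ are independent of each other. On the other hand, if the two sequences are not independent of each other, we may have different results. For example, assume $X_n \xrightarrow{d} X \sim N(0, 1)$ and let $Y_n = \alpha X_n$. Then, we have

$$Y_n \xrightarrow{d} Y \sim N\left(0, \alpha^2\right), \tag{6.1.43}$$

and the distribution of $X_n + Y_n = (1 + \alpha) X_n$ converges to $N\left(0, (1 + \alpha)^2\right)$. However, because $X \sim N(0, 1)$ and $Y \sim N\left(0, \alpha^2\right)$, the distribution of $X + Y$ is not necessarily $N\left(0, (1 + \alpha)^2\right)$. In other words, if the sequences $\{X_n\}_{n=1}^{\infty}$ and $\{Y_n\}_{n=1}^{\infty}$ are not independent of each other, it is possible that $X_n + Y_n \overset{d}{\nrightarrow} X + Y$ even when $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$.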



6.2 Laws of Large Numbers and Central Limit Theorem

In this section, we will consider the sum of random variables and its convergence. We will then introduce the central limit theorem (Davenport 1970; Doob 1949; Mihram 1969), one of the most useful and special cases of convergence.


6.2.1 Sum of Random Variables and Its Distribution

The sum of random variables is one of the key ingredients in understanding and applying the properties of convergence and limits. We have discussed the properties of the sum of random variables in Chap. 4. Specifically, the sum of two random variables as well as the cf and distribution of the sum of a number of random variables are discussed in Examples 4.2.4, 4.2.13, and 4.3.8. We now consider the sum of a number of random variables more generally.

Theorem 6.2.1 The expected value and variance of the sum $S_n = \sum_{i=1}^{n} X_i$ of the random variables $\{X_i\}_{i=1}^{n}$ are

$$E\{S_n\} = \sum_{i=1}^{n} E\{X_i\} \tag{6.2.1}$$

and

$$\text{Var}\{S_n\} = \sum_{i=1}^{n} \text{Var}\{X_i\} + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \text{Cov}\left(X_i, X_j\right), \tag{6.2.2}$$

respectively.

Proof First, it is easy to see that $E\{S_n\} = E\left\{\sum_{i=1}^{n} X_i\right\} = \sum_{i=1}^{n} E\{X_i\}$. Next, we have $\text{Var}\left\{\sum_{i=1}^{n} a_i X_i\right\} = E\left\{\left[\sum_{i=1}^{n} a_i \left(X_i - E\{X_i\}\right)\right]^2\right\}$, i.e.,

$$\begin{aligned} \text{Var}\left\{\sum_{i=1}^{n} a_i X_i\right\} &= E\left\{\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \left(X_i - E\{X_i\}\right)\left(X_j - E\{X_j\}\right)\right\} \\ &= \sum_{i=1}^{n} a_i^2\, \text{Var}\{X_i\} + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_i a_j\, \text{Cov}\left(X_i, X_j\right) \end{aligned} \tag{6.2.3}$$

when $\{a_i\}_{i=1}^{n}$ are constants. Letting $\{a_i = 1\}_{i=1}^{n}$, we get (6.2.2). ♠

Theorem 6.2.2 The variance of the sum $S_n = \sum_{i=1}^{n} X_i$ can be expressed as

$$\text{Var}\{S_n\} = \sum_{i=1}^{n} \text{Var}\{X_i\} \tag{6.2.4}$$

when the random variables $\{X_i\}_{i=1}^{n}$ are uncorrelated.


Theorem 6.2.1 says that the expected value of the sum of random variables is the sum of the expected values of the random variables. In addition, the variance of the sum of random variables is obtained by adding the sum of the covariances between two distinct random variables to the sum of the variances of the random variables. Theorem 6.2.2 dictates that the variance of the sum of uncorrelated random variables is simply the sum of the variances of the random variables.

Example 6.2.1 (Yates and Goodman 1999) Assume that the joint moments are $E\left\{X_i X_j\right\} = \rho^{|i - j|}$ for zero-mean random variables $\{X_i\}_{i=1}^{n}$. Obtain the expected value and variance of $Y_i = X_{i-2} + X_{i-1} + X_i$ for $i = 3, 4, \ldots, n$.

Solution Using (6.2.1), we have $E\{Y_i\} = E\{X_{i-2}\} + E\{X_{i-1}\} + E\{X_i\} = 0$. Next, from (6.2.2), we get

$$\begin{aligned} \text{Var}\{Y_i\} &= \text{Var}\{X_{i-2}\} + \text{Var}\{X_{i-1}\} + \text{Var}\{X_i\} + 2\left\{\text{Cov}(X_{i-2}, X_{i-1}) + \text{Cov}(X_{i-1}, X_i) + \text{Cov}(X_{i-2}, X_i)\right\} \\ &= 3\rho^0 + 2\rho^1 + 2\rho^1 + 2\rho^2 \\ &= 3 + 4\rho + 2\rho^2 \end{aligned} \tag{6.2.5}$$

because $\text{Var}\{X_i\} = \rho^0 = 1$.
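The result (6.2.5) can be reproduced by summing every entry of the covariance matrix of the three terms, which is what (6.2.2) amounts to. The sketch below assumes zero-mean variables with $\text{Cov}(X_i, X_j) = \rho^{|i-j|}$ as in Example 6.2.1; the function name is a hypothetical helper.

```python
def var_of_window_sum(rho, indices):
    """Var of the sum of X_i over `indices` when Cov(X_i, X_j) = rho ** |i - j|."""
    # Summing over all ordered pairs (i, j) gives the variances (i == j)
    # plus twice the covariances of the distinct pairs, as in (6.2.2).
    return sum(rho ** abs(i - j) for i in indices for j in indices)

for rho in (-0.2, 0.0, 0.5):
    print(rho, var_of_window_sum(rho, (3, 4, 5)), 3 + 4 * rho + 2 * rho ** 2)
```

The two printed values agree for each $\rho$, and the result does not depend on which three consecutive indices are used.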



Example 6.2.2 In a meeting of a group of $n$ people, each person attends with a gift. The name tags of the $n$ people are put in a box, from which each person randomly picks one name tag: each person gets the gift brought by the person on the name tag. Let $G_n$ be the number of people who receive their own gifts back. Obtain the expected value and variance of $G_n$.

Solution Let us define

$$X_i = \begin{cases} 1, & \text{when person } i \text{ picks her/his own name tag}, \\ 0, & \text{otherwise}. \end{cases} \tag{6.2.6}$$

Then,

$$G_n = \sum_{i=1}^{n} X_i. \tag{6.2.7}$$

For any person, the probability of picking her/his own name tag is $\frac{1}{n}$. Thus, $E\{X_i\} = 1 \times \frac{1}{n} + 0 \times \frac{n-1}{n} = \frac{1}{n}$ and $\text{Var}\{X_i\} = E\left\{X_i^2\right\} - E^2\{X_i\} = \frac{1}{n} - \frac{1}{n^2}$. In addition, because $P\left(X_i X_j = 1\right) = \frac{1}{n(n-1)}$ and $P\left(X_i X_j = 0\right) = 1 - \frac{1}{n(n-1)}$ for $i \ne j$, we have $\text{Cov}\left(X_i, X_j\right) = E\left\{X_i X_j\right\} - E\{X_i\} E\left\{X_j\right\} = \frac{1}{n(n-1)} - \frac{1}{n^2}$, i.e.,

$$\text{Cov}\left(X_i, X_j\right) = \frac{1}{n^2(n-1)}. \tag{6.2.8}$$

Therefore, $E\{G_n\} = \sum_{i=1}^{n} \frac{1}{n} = 1$ and $\text{Var}\{G_n\} = n\, \text{Var}\{X_i\} + n(n-1)\, \text{Cov}\left(X_i, X_j\right) = 1$. In short, for any number $n$ of the group, one person will get her/his own gift back on average. ♦

Theorem 6.2.3 For independent random variables $\{X_i\}_{i=1}^{n}$, let the cf and mgf of $X_i$ be $\varphi_{X_i}(\omega)$ and $M_{X_i}(t)$, respectively. Then, we have

$$\varphi_{S_n}(\omega) = \prod_{i=1}^{n} \varphi_{X_i}(\omega) \tag{6.2.9}$$

and

$$M_{S_n}(t) = \prod_{i=1}^{n} M_{X_i}(t) \tag{6.2.10}$$

as the cf and mgf, respectively, of $S_n = \sum_{i=1}^{n} X_i$.

Proof Noting that $\{X_i\}_{i=1}^{n}$ are independent of each other, we can easily obtain the cf $\varphi_{S_n}(\omega) = E\left\{e^{j\omega S_n}\right\} = E\left\{e^{j\omega(X_1 + X_2 + \cdots + X_n)}\right\} = \prod_{i=1}^{n} E\left\{e^{j\omega X_i}\right\}$, i.e.,

$$\varphi_{S_n}(\omega) = \prod_{i=1}^{n} \varphi_{X_i}(\omega) \tag{6.2.11}$$

as in (4.3.32). We can show (6.2.10) similarly. ♠

Theorem 6.2.4 For i.i.d. random variables $\{X_i\}_{i=1}^{n}$, let the cf and mgf of $X_i$ be $\varphi_X(\omega)$ and $M_X(t)$, respectively. Then, we have the cf

$$\varphi_{S_n}(\omega) = \varphi_X^n(\omega) \tag{6.2.12}$$

and the mgf

$$M_{S_n}(t) = M_X^n(t) \tag{6.2.13}$$

of $S_n = \sum_{i=1}^{n} X_i$.

Proof Noting that the random variables $\{X_i\}_{i=1}^{n}$ are all of the same distribution, Theorem 6.2.4 follows directly from Theorem 6.2.3. ♠

Example 6.2.3 When $\{X_i\}_{i=1}^{n}$ are i.i.d. with marginal distribution $b(1, p)$, obtain the distribution of $S_n = \sum_{i=1}^{n} X_i$.


Solution The mgf of $X_i$ is $M_X(t) = 1 - p + pe^t$ as shown in (3.A.47). Therefore, the mgf of $S_n$ is $M_{S_n}(t) = \left(1 - p + pe^t\right)^n$, implying that $S_n \sim b(n, p)$. ♦

Example 6.2.4 When $\{X_i\}_{i=1}^{n}$ are independent of each other with $X_i \sim b(k_i, p)$, obtain the distribution of $S_n = \sum_{i=1}^{n} X_i$.

Solution The mgf of $X_i$ is $M_{X_i}(t) = \left(1 - p + pe^t\right)^{k_i}$ as shown in (3.A.49). Thus, the mgf of $S_n$ is $M_{S_n}(t) = \prod_{i=1}^{n} \left(1 - p + pe^t\right)^{k_i}$, i.e.,

$$M_{S_n}(t) = \left(1 - p + pe^t\right)^{\sum_{i=1}^{n} k_i}. \tag{6.2.14}$$

This result implies $S_n \sim b\left(\sum_{i=1}^{n} k_i, p\right)$.

Example 6.2.5 We have shown that $S_n = \sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n} m_i, \sum_{i=1}^{n} \sigma_i^2\right)$ when $\left\{X_i \sim N\left(m_i, \sigma_i^2\right)\right\}_{i=1}^{n}$ are independent of each other in Theorem 5.2.5.

Example 6.2.6 Show that $S_n = \sum_{i=1}^{n} X_i \sim G\left(n, \frac{1}{\lambda}\right)$ when $\{X_i\}_{i=1}^{n}$ are i.i.d. with marginal exponential distribution of parameter $\lambda$.

Solution The mgf of $X_i$ is $M_X(t) = \frac{\lambda}{\lambda - t}$ as shown in (3.A.67). Thus, the mgf of $S_n$ is $M_{S_n}(t) = \left(\frac{\lambda}{\lambda - t}\right)^n$ and, therefore, $S_n \sim G\left(n, \frac{1}{\lambda}\right)$. ♦
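The conclusion of Example 6.2.3 can be cross-checked without mgf's: for independent integer-valued random variables, the pmf of the sum is the convolution of the pmf's, which is the counterpart of multiplying the mgf's. The following is a minimal sketch; the function names and the values $n = 6$, $p = 0.3$ are illustrative assumptions.

```python
from math import comb

def convolve(p, q):
    """pmf of the sum of two independent nonnegative-integer random variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pmf_sum_of_bernoulli(n, p):
    """pmf of S_n for n i.i.d. b(1, p) random variables, built by repeated convolution."""
    pmf = [1.0]  # pmf of the empty sum, which is 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, [1 - p, p])
    return pmf

n, p = 6, 0.3
binomial = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
for k, (a, b) in enumerate(zip(pmf_sum_of_bernoulli(n, p), binomial)):
    print(k, round(a, 6), round(b, 6))
```

The two columns coincide, in agreement with $S_n \sim b(n, p)$.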

Definition 6.2.1 (random sum) Assume that the support of the pmf of a random variable $N$ is a subset of $\{0, 1, \ldots\}$ and that the random variables $\{X_1, X_2, \ldots, X_N\}$ are independent of $N$. The random variable

$$S_N = \sum_{i=1}^{N} X_i \tag{6.2.15}$$

is called the random sum or variable sum, where we assume $S_0 = 0$.

The mgf of the random sum $S_N$ can be expressed as $M_{S_N}(t) = E\left\{e^{tS_N}\right\} = E\left\{E\left\{e^{tS_N} \mid N\right\}\right\} = \sum_{n=0}^{\infty} E\left\{e^{tS_N} \mid N = n\right\} p_N(n) = \sum_{n=0}^{\infty} E\left\{e^{tS_n}\right\} p_N(n)$, i.e.,

$$M_{S_N}(t) = \sum_{n=0}^{\infty} M_{S_n}(t)\, p_N(n), \tag{6.2.16}$$

where $p_N(n)$ is the pmf of $N$ and $M_{S_n}(t)$ is the mgf of $S_n = \sum_{i=1}^{n} X_i$.

Theorem 6.2.5 When the random variables $\{X_i\}_{i=1}^{N}$ are i.i.d. with marginal mgf $M_X(t)$, the mgf of the random sum $S_N = \sum_{i=1}^{N} X_i$ can be obtained as

$$M_{S_N}(t) = M_N\left(\ln M_X(t)\right), \tag{6.2.17}$$

where $M_N(t)$ is the mgf of $N$.

Proof Applying Theorem 6.2.4 in (6.2.16), we get $M_{S_N}(t) = \sum_{n=0}^{\infty} M_X^n(t)\, p_N(n) = \sum_{n=0}^{\infty} e^{n \ln M_X(t)}\, p_N(n) = E\left\{e^{N \ln M_X(t)}\right\}$, i.e.,

$$M_{S_N}(t) = M_N\left(\ln M_X(t)\right), \tag{6.2.18}$$

where $p_N(n)$ is the pmf of $N$. ♠

Meanwhile, if we write $\tilde{M}_N(z)$ as $E\left\{z^N\right\}$, the mgf (6.2.17) can be written as $M_{S_N}(t) = E\left\{e^{tS_N}\right\} = E\left\{E\left\{e^{tS_N} \mid N\right\}\right\} = \sum_{n=0}^{\infty} E\left\{e^{tS_N} \mid N = n\right\} P(N = n) = \sum_{n=0}^{\infty} E\left\{e^{tX_1 + tX_2 + \cdots + tX_n}\right\} P(N = n) = \sum_{n=0}^{\infty} E\left\{e^{tX_1}\right\} E\left\{e^{tX_2}\right\} \cdots E\left\{e^{tX_n}\right\} P(N = n) = \sum_{n=0}^{\infty} M_X^n(t)\, P(N = n)$, i.e.,

$$M_{S_N}(t) = \tilde{M}_N\left(M_X(t)\right) \tag{6.2.19}$$

using $\tilde{M}_N(g(t)) = E\left\{g^N(t)\right\} = \sum_{n=0}^{\infty} g^n(t)\, P(N = n)$.

Example 6.2.7 Assume that i.i.d. exponential random variables $\{X_n\}_{n=1}^{\infty}$ with mean $\frac{1}{\lambda}$ are independent of a geometric random variable $N$ with pmf $p_N(k) = (1 - \alpha)^{k-1} \alpha$ for $k \in \{1, 2, \ldots\}$. Obtain the distribution of the random sum $S_N = \sum_{i=1}^{N} X_i$.

Solution The mgf's of $N$ and $X_i$ are $M_N(t) = \frac{\alpha e^t}{1 - (1 - \alpha)e^t}$ and $M_X(t) = \frac{\lambda}{\lambda - t}$, respectively. Thus, the mgf of $S_N$ is $M_{S_N}(t) = \frac{\alpha \exp\left(\ln \frac{\lambda}{\lambda - t}\right)}{1 - (1 - \alpha) \exp\left(\ln \frac{\lambda}{\lambda - t}\right)}$, i.e.,

$$M_{S_N}(t) = \frac{\alpha\lambda}{\alpha\lambda - t}. \tag{6.2.20}$$

Therefore, $S_N$ is an exponential random variable with mean $\frac{1}{\alpha\lambda}$. This result is also in agreement with the intuitive interpretation that $S_N$ is the sum of, on average, $\frac{1}{\alpha}$ variables of mean $\frac{1}{\lambda}$. ♦
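The conclusion of Example 6.2.7 can be checked by simulation. The following is a rough Monte Carlo sketch, assuming the geometric pmf $(1 - \alpha)^{k-1}\alpha$ on $\{1, 2, \ldots\}$ and exponential terms of rate $\lambda$; the function names, seed, and parameter values $\alpha = 0.25$, $\lambda = 2$ are illustrative choices.

```python
import random

def sample_random_sum(alpha, lam, rng):
    """Draw one S_N with N geometric on {1, 2, ...} and X_i exponential of rate lam."""
    n = 1
    while rng.random() >= alpha:   # each trial succeeds with probability alpha
        n += 1
    return sum(rng.expovariate(lam) for _ in range(n))

def estimate_mean_and_var(alpha, lam, trials=50_000, seed=0):
    """Sample mean and sample variance of S_N over `trials` independent draws."""
    rng = random.Random(seed)
    xs = [sample_random_sum(alpha, lam, rng) for _ in range(trials)]
    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / (trials - 1)
    return mean, var

mean, var = estimate_mean_and_var(0.25, 2.0)
# An exponential variable with mean 1/(alpha*lam) = 2 has variance 4.
print(mean, var)
```

The estimates should lie close to the mean $\frac{1}{\alpha\lambda}$ and variance $\frac{1}{(\alpha\lambda)^2}$ of the exponential distribution identified by (6.2.20), up to sampling error.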


Theorem 6.2.6 When the random variables $\{X_i\}$ are i.i.d., we have the expected value

$$E\{S_N\} = E\{N\} E\{X\} \tag{6.2.21}$$

and the variance

$$\text{Var}\{S_N\} = E\{N\}\, \text{Var}\{X\} + \text{Var}\{N\}\, E^2\{X\} \tag{6.2.22}$$

of the random sum $S_N = \sum_{i=1}^{N} X_i$.

Proof (Method 1) From (6.2.17), we have $M'_{S_N}(t) = M'_N\left(\ln M_X(t)\right) \frac{M'_X(t)}{M_X(t)}$ and

$$M''_{S_N}(t) = M''_N\left(\ln M_X(t)\right) \left(\frac{M'_X(t)}{M_X(t)}\right)^2 + M'_N\left(\ln M_X(t)\right) \frac{M_X(t) M''_X(t) - \left(M'_X(t)\right)^2}{M_X^2(t)}. \tag{6.2.23}$$

Now, recollecting $M_X(0) = 1$, we get $E\{S_N\} = M'_{S_N}(0) = M'_N(0) M'_X(0)$, i.e.,

$$E\{S_N\} = E\{N\} E\{X\} \tag{6.2.24}$$

and $E\left\{S_N^2\right\} = M''_{S_N}(0) = M''_N(0) \left(M'_X(0)\right)^2 + M'_N(0) \left\{M''_X(0) - \left(M'_X(0)\right)^2\right\}$, i.e.,

$$E\left\{S_N^2\right\} = E\left\{N^2\right\} E^2\{X\} + E\{N\}\, \text{Var}\{X\}. \tag{6.2.25}$$

Combining (6.2.24) and (6.2.25), we have (6.2.22).

(Method 2) Because $E\{Y \mid N\} = \sum_{i=1}^{N} E\{X_i\} = N E\{X\}$ from (4.4.40) with $Y = S_N$, we get the expected value of $Y$ as $E\{Y\} = E\left\{E\{Y \mid N\}\right\} = E\left\{N E\{X\}\right\}$, i.e.,

$$E\{Y\} = E\{N\} E\{X\}. \tag{6.2.26}$$

Similarly, recollecting that $Y^2 = \sum_{i=1}^{N} X_i^2 + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i}}^{N} X_i X_j$, the second moment $E\left\{Y^2\right\} = E\left\{E\left\{Y^2 \mid N\right\}\right\}$ can be evaluated as

$$\begin{aligned} E\left\{Y^2\right\} &= E\left\{N E\left\{X^2\right\} + N(N-1) E^2\{X\}\right\} \\ &= E\{N\} \left(E\left\{X^2\right\} - E^2\{X\}\right) + E\left\{N^2\right\} E^2\{X\} \\ &= E\{N\}\, \text{Var}\{X\} + E\left\{N^2\right\} E^2\{X\}. \end{aligned} \tag{6.2.27}$$

From (6.2.26) and (6.2.27), we can obtain (6.2.22). ♠

Example 6.2.8 Assume that i.i.d. random variables $\left\{X_n \sim N\left(m, \sigma^2\right)\right\}_{n=1}^{\infty}$ are independent of $N \sim P(\lambda)$. Then, the random sum $S_N$ has the expected value $E\{S_N\} = \lambda m$ and variance $\text{Var}\{S_N\} = \lambda\left(\sigma^2 + m^2\right)$.

Let us note two observations. (1) When the random variable $N$ is a constant $n$: because $E\{N\} = n$ and $\text{Var}\{N\} = 0$, we have$^2$ $E\{S_n\} = n E\{X\}$ and $\text{Var}\{S_n\} = n\, \text{Var}\{X\}$ from Theorem 6.2.6. (2) When the random variable $X_i$ is a constant $x$: because $E\{X\} = x$, $\text{Var}\{X\} = 0$, and $S_N = xN$, we have $E\{S_N\} = x E\{N\}$ and $\text{Var}\{S_N\} = x^2\, \text{Var}\{N\}$.
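Theorem 6.2.6 can be verified exactly on a small discrete case by building the pmf of $S_N$ from (6.2.16) in its pmf form, that is, by mixing the $n$-fold convolutions of the pmf of $X$ with the weights $p_N(n)$. The sketch below uses rational arithmetic; the two pmf's and the function names are arbitrary illustrative choices.

```python
from fractions import Fraction as F

def moments(pmf):
    """Return (mean, variance) of a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum(x * x * p for x, p in pmf.items()) - mean ** 2
    return mean, var

def random_sum_pmf(pmf_N, pmf_X):
    """pmf of S_N = X_1 + ... + X_N for i.i.d. X_i independent of N."""
    total = {}
    for n, pn in pmf_N.items():
        dist = {0: F(1)}                    # pmf of S_0 = 0
        for _ in range(n):                  # n-fold convolution gives the pmf of S_n
            nxt = {}
            for s, ps in dist.items():
                for x, px in pmf_X.items():
                    nxt[s + x] = nxt.get(s + x, 0) + ps * px
            dist = nxt
        for s, ps in dist.items():          # mix with weight P(N = n)
            total[s] = total.get(s, 0) + pn * ps
    return total

pmf_N = {0: F(1, 4), 1: F(1, 4), 3: F(1, 2)}
pmf_X = {1: F(1, 3), 2: F(2, 3)}

mean_S, var_S = moments(random_sum_pmf(pmf_N, pmf_X))
mean_N, var_N = moments(pmf_N)
mean_X, var_X = moments(pmf_X)
print(mean_S, mean_N * mean_X)                          # both sides of (6.2.21)
print(var_S, mean_N * var_X + var_N * mean_X ** 2)      # both sides of (6.2.22)
```

Because the arithmetic is exact, the two sides of each identity print as identical fractions.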

6.2.2 Laws of Large Numbers

We now consider the limit of a sequence $\{X_i\}_{i=1}^{n}$ by taking into account the sum $S_n = \sum_{i=1}^{n} X_i$ for $n \to \infty$.

6.2.2.1 Weak Law of Large Numbers

Definition 6.2.2 (weak law of large numbers) When we have

$$\frac{S_n - a_n}{b_n} \xrightarrow{p} 0 \tag{6.2.28}$$

for two sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n > 0\}_{n=1}^{\infty}$ of real numbers such that $b_n \uparrow \infty$, the sequence $\{X_i\}_{i=1}^{\infty}$ is said to follow the weak law of large numbers.

In Definition 6.2.2, $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ are called the central constants and normalizing constants, respectively. Note that (6.2.28) can be expressed as

$$\lim_{n \to \infty} P\left(\left|\frac{S_n - a_n}{b_n}\right| \ge \varepsilon\right) = 0 \tag{6.2.29}$$

for every positive number $\varepsilon$.

$^2$ This result is the same as (6.2.1) and (6.2.4).


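Theorem 6.2.7 (Rohatgi and Saleh 2001) Assume a sequence of uncorrelated random variables $\{X_i\}_{i=1}^{\infty}$ with means $E\{X_i\} = m_i$ and variances $\text{Var}\{X_i\} = \sigma_i^2$. If

$$\sum_{i=1}^{\infty} \sigma_i^2 \to \infty, \tag{6.2.30}$$

then

$$\left(\sum_{i=1}^{n} \sigma_i^2\right)^{-1} \left(S_n - \sum_{i=1}^{n} m_i\right) \xrightarrow{p} 0. \tag{6.2.31}$$

In other words, the sequence $\{X_i\}_{i=1}^{\infty}$ satisfies the weak law of large numbers with $a_n = \sum_{i=1}^{n} m_i$ and $b_n = \sum_{i=1}^{n} \sigma_i^2$.

Proof Employing the Chebyshev inequality $P(|Y - E\{Y\}| \ge \varepsilon) \le \frac{\text{Var}\{Y\}}{\varepsilon^2}$ introduced in (6.A.16), we have

$$\begin{aligned} P\left(\left|S_n - \sum_{i=1}^{n} m_i\right| > \varepsilon \sum_{i=1}^{n} \sigma_i^2\right) &\le \left(\varepsilon \sum_{i=1}^{n} \sigma_i^2\right)^{-2} E\left[\left\{\sum_{i=1}^{n} (X_i - m_i)\right\}^2\right] \\ &= \left(\varepsilon^2 \sum_{i=1}^{n} \sigma_i^2\right)^{-1}. \end{aligned} \tag{6.2.32}$$

In short, $P\left(\left|S_n - \sum_{i=1}^{n} m_i\right| > \varepsilon \sum_{i=1}^{n} \sigma_i^2\right) \to 0$ when $n \to \infty$. ♠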

Example 6.2.9 (Rohatgi and Saleh 2001) If an uncorrelated sequence $\{X_i\}_{i=1}^{\infty}$ with mean $E\{X_i\} = m_i$ and variance $\text{Var}\{X_i\} = \sigma_i^2$ satisfies

$$\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2 = 0, \tag{6.2.33}$$

then $\frac{1}{n}\left(S_n - \sum_{i=1}^{n} m_i\right) \xrightarrow{p} 0$. This result, called the Markov theorem, can be easily shown with the steps similar to those in the proof of Theorem 6.2.7. Here, (6.2.33) is called the Markov condition. ♦

Example 6.2.10 (Rohatgi and Saleh 2001) Assume an uncorrelated sequence $\{X_i\}_{i=1}^{\infty}$ with identical distribution, mean $E\{X_i\} = m$, and variance $\text{Var}\{X_i\} = \sigma^2$. Then, because $\sum_{i=1}^{\infty} \sigma^2 \to \infty$, we have $\frac{1}{\sigma^2}\left(\frac{S_n}{n} - m\right) \xrightarrow{p} 0$ from Theorem 6.2.7. Here, $a_n = nm$ and $b_n = n\sigma^2$.


From now on, we assume $b_n = n$ in discussing the weak law of large numbers unless specified otherwise.

Example 6.2.11 (Rohatgi and Saleh 2001) For an i.i.d. sequence of random variables with distribution $b(1, p)$, noting that the mean is $p$ and the variance is $p(1 - p)$, we have $\frac{S_n}{n} \xrightarrow{p} p$ from Theorem 6.2.7 and Example 6.2.9. ♦

Example 6.2.12 (Rohatgi and Saleh 2001) For a sequence of i.i.d. random variables with marginal distribution $C(1, 0)$, we have $\frac{S_n}{n} \sim C(1, 0)$ as discussed in Exercise 6.3. In other words, because $\frac{S_n}{n}$ does not converge to 0 in probability, the weak law of large numbers does not hold for sequences of i.i.d. Cauchy random variables.

Example 6.2.13 (Rohatgi and Saleh 2001; Stoyanov 2013) For an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$, if the absolute mean $E\{|X_i|\}$ is finite, then $\frac{S_n}{n} \xrightarrow{p} E\{X_1\}$ when $n \to \infty$ from Theorem 6.2.7 and Example 6.2.9. This result is called Khintchine's theorem.
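The contrast between Examples 6.2.11 and 6.2.12 can be illustrated by estimating $P\left(\left|\frac{S_n}{n} - c\right| > \varepsilon\right)$ by simulation. The sketch below assumes that $C(1, 0)$ denotes the Cauchy distribution with unit scale centered at 0 (sampled here by the inverse-cdf method); the function names, seeds, thresholds, and sample sizes are illustrative choices.

```python
import math
import random

def exceed_prob(draw, center, n, eps, reps, rng):
    """Estimate P(|S_n / n - center| > eps) from `reps` independent sample means."""
    hits = 0
    for _ in range(reps):
        mean = sum(draw(rng) for _ in range(n)) / n
        if abs(mean - center) > eps:
            hits += 1
    return hits / reps

def bernoulli(p):
    """Return a sampler of b(1, p) random variables."""
    return lambda rng: 1.0 if rng.random() < p else 0.0

def standard_cauchy(rng):
    """Inverse-cdf sample of a Cauchy variable with location 0 and scale 1."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(1)
for n in (10, 100, 500):
    pb = exceed_prob(bernoulli(0.3), 0.3, n, 0.1, 400, rng)
    pc = exceed_prob(standard_cauchy, 0.0, n, 1.0, 400, rng)
    print(n, pb, pc)
```

For the Bernoulli case the estimated probability shrinks toward 0 as $n$ grows, whereas for the Cauchy case it stays near $\frac{1}{2}$, which is $P(|X| > 1)$ for a single such Cauchy variable.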

6.2.2.2 Strong Law of Large Numbers

Definition 6.2.3 (strong law of large numbers) When we have

\[ \frac{S_n - a_n}{b_n} \xrightarrow{a.s.} 0 \tag{6.2.34} \]

for two sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n > 0\}_{n=1}^{\infty}$ of real numbers such that $b_n \uparrow \infty$, the sequence $\{X_i\}_{i=1}^{\infty}$ is said to follow the strong law of large numbers.

Note that (6.2.34) implies

\[ P\left( \lim_{n\to\infty} \frac{S_n - a_n}{b_n} = 0 \right) = 1. \tag{6.2.35} \]

A sequence of random variables that follows the strong law of large numbers also follows the weak law of large numbers because almost sure convergence implies convergence in probability. As in the discussion of the weak law of large numbers, we often assume the normalizing constant $b_n = n$ also for the strong law of large numbers. We now consider sufficient conditions for a sequence $\{X_i\}_{i=1}^{\infty}$ to follow the strong law of large numbers when $b_n = n$.

Theorem 6.2.8 (Rohatgi and Saleh 2001) The sum $\sum_{i=1}^{\infty} (X_i - \mu_i)$ converges almost surely to 0 if

\[ \sum_{i=1}^{\infty} \sigma_i^2 < \infty \tag{6.2.36} \]

for a sequence $\{X_i\}_{i=1}^{\infty}$ with means $\{\mu_i\}_{i=1}^{\infty}$ and variances $\{\sigma_i^2\}_{i=1}^{\infty}$.

6.2 Laws of Large Numbers and Central Limit Theorem

Theorem 6.2.9 (Rohatgi and Saleh 2001) If $\sum_{i=1}^{n} x_i$ converges for a sequence $\{x_n\}_{n=1}^{\infty}$, then

\[ \lim_{n\to\infty} \frac{1}{b_n} \sum_{i=1}^{n} b_i x_i = 0 \]

for $\{b_n\}_{n=1}^{\infty}$ such that $b_n \uparrow \infty$. This result is called the Kronecker lemma.

Example 6.2.14 (Rohatgi and Saleh 2001) Let the means and variances be $\{\mu_i\}_{i=1}^{\infty}$ and $\{\sigma_i^2\}_{i=1}^{\infty}$, respectively, for independent random variables $\{X_i\}_{i=1}^{\infty}$. Then, we can easily show that

\[ \frac{1}{b_n} \left( S_n - E\{S_n\} \right) \xrightarrow{a.s.} 0 \tag{6.2.37} \]

from Theorems 6.2.8 and 6.2.9 when

\[ \sum_{i=1}^{\infty} \frac{\sigma_i^2}{b_i^2} < \infty, \quad b_i \uparrow \infty. \tag{6.2.38} \]

When $b_n = n$, (6.2.38) can be expressed as

\[ \sum_{n=1}^{\infty} \frac{\sigma_n^2}{n^2} < \infty, \tag{6.2.39} \]

which is called the Kolmogorov condition. ♦

Example 6.2.15 (Rohatgi and Saleh 2001) If the variances $\{\sigma_n^2\}_{n=1}^{\infty}$ of independent random variables are uniformly bounded, i.e., $\sigma_n^2 \le M$ for a finite number $M$, then $\frac{1}{n}\left( S_n - E\{S_n\} \right) \xrightarrow{a.s.} 0$ from the Kolmogorov condition because

\[ \sum_{n=1}^{\infty} \frac{\sigma_n^2}{n^2} \le \sum_{n=1}^{\infty} \frac{M}{n^2} < \infty. \tag{6.2.40} \]

Example 6.2.16 (Rohatgi and Saleh 2001) Based on the result in Example 6.2.15, it is easy to see that Bernoulli trials with probability of success $p$ satisfy the strong law of large numbers because the variance $p(1-p)$ is no larger than $\frac{1}{4}$.

Note that the Markov condition (6.2.33) and the Kolmogorov condition (6.2.38) are sufficient conditions but are not necessary conditions.

Theorem 6.2.10 (Rohatgi and Saleh 2001) If the fourth moment is finite for an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with mean $E\{X_i\} = \mu$, then

\[ P\left( \lim_{n\to\infty} \frac{S_n}{n} = \mu \right) = 1. \tag{6.2.41} \]

In other words, $\frac{S_n}{n}$ converges almost surely to $\mu$.

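Proof Let the variance of $X_i$ be $\sigma^2$. Then, we have

\[ E\left\{ \left[ \sum_{i=1}^{n} (X_i - \mu) \right]^4 \right\} = E\left\{ \sum_{i=1}^{n} (X_i - \mu)^4 \right\} + 3E\left\{ \sum_{i=1}^{n} (X_i - \mu)^2 \sum_{j=1, j \ne i}^{n} (X_j - \mu)^2 \right\} = nE\left\{ (X_1 - \mu)^4 \right\} + 3n(n-1)\sigma^4 \le cn^2 \tag{6.2.42} \]

for an appropriate constant $c$. From this result and the Bienayme-Chebyshev inequality (6.A.25), we get

\[ P\left( \left| \sum_{i=1}^{n} (X_i - \mu) \right| > n\varepsilon \right) \le \frac{1}{(n\varepsilon)^4} E\left\{ \left[ \sum_{i=1}^{n} (X_i - \mu) \right]^4 \right\} \le \frac{cn^2}{(n\varepsilon)^4} = \frac{c'}{n^2} \]

and, consequently, $\sum_{n=1}^{\infty} P\left( \left| \frac{S_n}{n} - \mu \right| > \varepsilon \right) < \infty$, where $c' = \frac{c}{\varepsilon^4}$. Therefore, letting $A_\varepsilon = \limsup_{n\to\infty} \left\{ \left| \frac{S_n}{n} - \mu \right| > \varepsilon \right\}$, we get

\[ P(A_\varepsilon) = 0 \tag{6.2.43} \]

from the Borel-Cantelli lemma discussed in Theorem 2.A.3. Now, $\{A_\varepsilon\}$ is an increasing sequence as $\varepsilon \to 0$, and converges to $\left\{ \omega : \lim_{n\to\infty} \left| \frac{S_n}{n} - \mu \right| > 0 \right\}$ or, equivalently, to $\left\{ \omega : \lim_{n\to\infty} \frac{S_n}{n} \ne \mu \right\}$: thus, we have $P\left( \lim_{n\to\infty} \frac{S_n}{n} \ne \mu \right) = P\left( \lim_{\varepsilon\to 0} A_\varepsilon \right) = \lim_{\varepsilon\to 0} P(A_\varepsilon) = 0$ from (6.2.43). Subsequently, we get (6.2.41).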
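The key step (6.2.42) can be checked exactly for a small case. The sketch below is illustrative only and assumes the marginal distribution $b(1, p)$, which the proof itself does not require; it enumerates all outcomes with exact rational arithmetic and compares the result with $nE\{(X_1 - \mu)^4\} + 3n(n-1)\sigma^4$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_of_sum(n, p):
    """Exact E{[sum_i (X_i - p)]^4} for n i.i.d. b(1, p) variables, by enumeration."""
    total = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        prob = Fraction(1)
        for x in outcome:
            prob *= p if x == 1 else 1 - p
        centered_sum = sum(outcome) - n * p
        total += prob * centered_sum ** 4
    return total


def closed_form(n, p):
    """n E{(X_1 - mu)^4} + 3 n (n - 1) sigma^4 for the b(1, p) marginal."""
    mu4 = p * (1 - p) ** 4 + (1 - p) * p ** 4  # fourth central moment
    sigma2 = p * (1 - p)                       # variance
    return n * mu4 + 3 * n * (n - 1) * sigma2 ** 2


if __name__ == "__main__":
    p = Fraction(1, 3)
    for n in range(1, 7):
        print(n, fourth_moment_of_sum(n, p), closed_form(n, p))
```

Under this assumption, one admissible constant in the bound $cn^2$ is $c = E\{(X_1 - \mu)^4\} + 3\sigma^4$, since $n \le n^2$ and $n(n-1) \le n^2$.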



Example 6.2.17 (Rohatgi and Saleh 2001) Consider an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ and a positive number $B$. If $P(|X_i| < B) = 1$ for every $i$, then $\frac{S_n}{n}$ converges almost surely to the mean $E\{X_i\}$ of $X_i$. This can be easily shown from Theorem 6.2.10 by noting that the fourth moment is finite when $P(|X_i| < B) = 1$. ♦

Theorem 6.2.11 (Rohatgi and Saleh 2001) Consider an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with mean $\mu$. If

\[ E\{|X_i|\} < \infty, \tag{6.2.44} \]

then

\[ \frac{S_n}{n} \xrightarrow{a.s.} \mu. \tag{6.2.45} \]

The converse also holds true.

Note that the conditions in Theorems 6.2.10 and 6.2.11 are on the fourth moment and absolute mean, respectively.

Example 6.2.18 (Stoyanov 2013) Consider an independent sequence $\{X_n\}_{n=2}^{\infty}$ with the pmf

\[ p_{X_n}(x) = \begin{cases} 1 - \frac{1}{n \log n}, & x = 0, \\ \frac{1}{2n \log n}, & x = \pm n. \end{cases} \tag{6.2.46} \]

Letting $A_n = \{|X_n| \ge n\}$ for $n \ge 2$, we get $\sum_{n=2}^{\infty} P(A_n) \to \infty$ because $P(A_n) = \frac{1}{n \log n}$. In other words, $\sum_{n=2}^{\infty} P(A_n)$ is divergent and $\{X_n\}_{n=2}^{\infty}$ are independent: therefore, the probability $P(|X_n| \ge n \text{ occurs i.o.}) = P\left( \left| \frac{X_n}{n} \right| \ge 1 \text{ occurs i.o.} \right) = P\left( \lim_{n\to\infty} \frac{S_n}{n} \ne 0 \right)$ of $\{A_n \text{ occurs i.o.}\}$ is 1, i.e.,

\[ P(|X_n| \ge n \text{ occurs i.o.}) = 1 \tag{6.2.47} \]

from the Borel-Cantelli lemma. In short, the sequence $\{X_n\}_{n=2}^{\infty}$ does not follow the strong law of large numbers. On the other hand, the sequence $\{X_n\}_{n=2}^{\infty}$, satisfying the Markov condition as

\[ \frac{1}{n^2} \sum_{k=2}^{n} \mathrm{Var}\{X_k\} \le \frac{2}{n^2 \log 2} + \frac{1}{n^2} \int_{3}^{n+1} \frac{x}{\log x} \, dx \le \frac{2}{n^2 \log 2} + \frac{(n-2)(n+1)}{n^2 \log n} \to 0 \tag{6.2.48} \]

from $\mathrm{Var}\{X_k\} = \frac{k}{\log k}$, follows the weak law of large numbers.
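The two quantities that drive Example 6.2.18 can be tabulated numerically. The following sketch is illustrative and uses hypothetical function names: it evaluates the Markov ratio $\frac{1}{n^2}\sum_{k=2}^{n} \frac{k}{\log k}$, the right-hand bound of (6.2.48), and the partial sums of $\sum P(A_k) = \sum \frac{1}{k \log k}$.

```python
import math


def markov_ratio(n):
    """(1/n^2) * sum_{k=2}^{n} Var{X_k} with Var{X_k} = k / log k."""
    return sum(k / math.log(k) for k in range(2, n + 1)) / n ** 2


def bound_6_2_48(n):
    """Right-most bound in (6.2.48), valid for n >= 3."""
    return 2 / (n ** 2 * math.log(2)) + (n - 2) * (n + 1) / (n ** 2 * math.log(n))


def borel_cantelli_partial_sum(n):
    """sum_{k=2}^{n} P(A_k) with P(A_k) = 1 / (k log k)."""
    return sum(1 / (k * math.log(k)) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, markov_ratio(n), bound_6_2_48(n), borel_cantelli_partial_sum(n))
```

The Markov ratio shrinks roughly like $\frac{1}{2 \log n}$, so it tends to 0 only slowly, while the partial sums of $P(A_k)$ keep growing (at a very slow, iterated-logarithm pace), which is what the Borel-Cantelli argument needs.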

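6.2.3 Central Limit Theorem

Let us now discuss the central limit theorem (Feller 1970; Gardner 2010; Rohatgi and Saleh 2001), the basis for the wide-spread and most popular use of the normal distribution. Assume a sequence $\{X_n\}_{n=1}^{\infty}$ and the sum $S_n = \sum_{k=1}^{n} X_k$. Assume

\[ \frac{1}{b_n} (S_n - a_n) \xrightarrow{l} Y \tag{6.2.49} \]

for appropriately chosen sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n > 0\}_{n=1}^{\infty}$ of constants. It is known that the distribution of the limit random variable $Y$ is always a stable distribution. For example, for an i.i.d. sequence $\{X_n\}_{n=1}^{\infty}$, we have $\frac{S_n}{\sqrt{n}} \sim N(0, 1)$ if $X_i \sim N(0, 1)$ and $\frac{S_n}{n} \sim C(1, 0)$ if $X_i \sim C(1, 0)$: the normal and Cauchy distributions are typical examples of the stable distribution. In this section, we discuss the conditions on which the limit random variable $Y$ has a normal distribution.

Example 6.2.19 (Rohatgi and Saleh 2001) Assume an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal distribution $b(1, p)$. Letting $a_n = E\{S_n\} = np$ and $b_n = \sqrt{\mathrm{Var}\{S_n\}} = \sqrt{np(1-p)}$, the mgf $M_n(t) = E\left\{ \exp\left( \frac{S_n - np}{\sqrt{np(1-p)}} t \right) \right\} = \prod_{i=1}^{n} E\left\{ \exp\left( \frac{X_i - p}{\sqrt{np(1-p)}} t \right) \right\}$ of $\frac{S_n - a_n}{b_n}$ can be obtained as

\[ M_n(t) = \exp\left( -\frac{npt}{\sqrt{np(1-p)}} \right) \left[ (1-p) + p \exp\left( \frac{t}{\sqrt{np(1-p)}} \right) \right]^n \]
\[ = \left[ (1-p) \exp\left( -\frac{pt}{\sqrt{np(1-p)}} \right) + p \exp\left( \frac{(1-p)t}{\sqrt{np(1-p)}} \right) \right]^n \]
\[ = \left[ 1 + \frac{t^2}{2n} + o\left( \frac{1}{n} \right) \right]^n. \tag{6.2.50} \]

Thus, $M_n(t) \to \exp\left( \frac{t^2}{2} \right)$ when $n \to \infty$ and, subsequently,

\[ P\left( \frac{S_n - np}{\sqrt{np(1-p)}} \le x \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left( -\frac{t^2}{2} \right) dt \tag{6.2.51} \]

because $\exp\left( \frac{t^2}{2} \right)$ is the mgf of $N(0, 1)$.

Theorem 6.2.12 (Rohatgi and Saleh 2001) For i.i.d. random variables $\{X_i\}_{i=1}^{\infty}$ with mean $m$ and variance $\sigma^2$, we have

\[ \frac{S_n - nm}{\sqrt{n\sigma^2}} \xrightarrow{l} Z, \tag{6.2.52} \]

where $Z \sim N(0, 1)$.

Proof Letting $Y_i = X_i - m$, we have $E\{Y_i\} = 0$ and $E\{Y_i^2\} = \sigma^2$. Also let $V_i = \frac{Y_i}{\sqrt{n\sigma^2}}$. Denoting the pdf of $Y_i$ by $f_Y$, the cf $\varphi_V(\omega) = E\left\{ \exp\left( \frac{j\omega Y_i}{\sqrt{n\sigma^2}} \right) \right\} = \int_{-\infty}^{\infty} \exp\left( \frac{j\omega y}{\sqrt{n\sigma^2}} \right) f_Y(y) \, dy = \int_{-\infty}^{\infty} \left\{ 1 + \frac{j\omega}{\sqrt{n\sigma^2}} y + \frac{1}{2} \left( \frac{j\omega}{\sqrt{n\sigma^2}} \right)^2 y^2 + \frac{1}{6} \left( \frac{j\omega}{\sqrt{n\sigma^2}} \right)^3 y^3 + \cdots \right\} f_Y(y) \, dy$ of $V_i$ can be obtained as

\[ \varphi_V(\omega) = 1 + \frac{j\omega}{\sqrt{n\sigma^2}} E\{Y\} + \frac{1}{2} \left( \frac{j\omega}{\sqrt{n\sigma^2}} \right)^2 E\{Y^2\} + \cdots = 1 - \frac{\omega^2}{2n} + o\left( \frac{1}{n} \right). \tag{6.2.53} \]

Next, letting $Z_n = \frac{S_n - nm}{\sqrt{n\sigma^2}} = \sum_{i=1}^{n} V_i$ and denoting the cf of $Z_n$ by $\varphi_{Z_n}$, we have $\lim_{n\to\infty} \varphi_{Z_n}(\omega) = \lim_{n\to\infty} \varphi_V^n(\omega) = \lim_{n\to\infty} \left\{ 1 - \frac{\omega^2}{2n} + o\left( \frac{1}{n} \right) \right\}^n$, i.e.,

\[ \lim_{n\to\infty} \varphi_{Z_n}(\omega) = \exp\left( -\frac{\omega^2}{2} \right) \tag{6.2.54} \]

from (6.2.53) because $\{V_i\}_{i=1}^{\infty}$ are independent. In short, the distribution of $Z_n = \frac{S_n - nm}{\sqrt{n\sigma^2}}$ converges to $N(0, 1)$ as $n \to \infty$. ♠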
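The convergence $M_n(t) \to \exp\left( \frac{t^2}{2} \right)$ in Example 6.2.19 can be observed numerically. The sketch below is illustrative (the function name is hypothetical) and evaluates the second line of (6.2.50) directly for increasing $n$.

```python
import math


def standardized_binomial_mgf(n, p, t):
    """mgf of (S_n - np) / sqrt(np(1-p)) for S_n ~ b(n, p), second line of (6.2.50)."""
    s = math.sqrt(n * p * (1 - p))
    base = (1 - p) * math.exp(-p * t / s) + p * math.exp((1 - p) * t / s)
    return base ** n


if __name__ == "__main__":
    t, p = 1.0, 0.3
    target = math.exp(t ** 2 / 2)  # mgf of N(0, 1) at t
    for n in (10, 1000, 100000):
        value = standardized_binomial_mgf(n, p, t)
        print(n, value, abs(value - target))
```

For $p \ne \frac{1}{2}$ the gap to $\exp\left( \frac{t^2}{2} \right)$ is expected to decay at a rate of order $\frac{1}{\sqrt{n}}$, because the first neglected term in the expansion involves the third central moment.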

Theorem 6.2.12 is one of the many variants of the central limit theorem, and can be derived from Lindeberg's central limit theorem introduced in Appendix 6.2.

Example 6.2.20 (Rohatgi and Saleh 2001) Assume an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal pmf $p_{X_i}(k) = (1-p)^k p \, \tilde{u}(k)$, where $0 < p < 1$ and $\tilde{u}(k)$ is the unit step function in discrete space defined in (1.4.17). We have $E\{X_i\} = \frac{q}{p}$ and $\mathrm{Var}\{X_i\} = \frac{q}{p^2}$, where $q = 1 - p$. Thus, it follows from Theorem 6.2.12 that

\[ P\left( \frac{\sqrt{n}\, p}{\sqrt{q}} \left( \frac{S_n}{n} - \frac{q}{p} \right) \le x \right) \to \Phi(x) \tag{6.2.55} \]

for $x \in \mathbb{R}$ when $n \to \infty$.

The central limit theorem is useful in many cases: however, it should also be noted that there do exist cases in which the central limit theorem does not apply.

Example 6.2.21 (Stoyanov 2013) Assume an i.i.d. sequence $\{Y_k\}_{k=1}^{\infty}$ with $P(Y_k = \pm 1) = \frac{1}{2}$, and let $X_k = \frac{\sqrt{15}}{4^k} Y_k$. Then, it is easy to see that $E\{S_n\} = 0$ and $\mathrm{Var}\{S_n\} = 1 - 16^{-n}$. In other words, when $n$ is sufficiently large, $\mathrm{Var}\{S_n\} \approx 1$. Meanwhile, because $|S_n| = |X_1 + X_2 + \cdots + X_n| \ge |X_1| - (|X_2| + |X_3| + \cdots + |X_n|) = \frac{\sqrt{15}}{4} - \frac{\sqrt{15}}{12} \left( 1 - \frac{1}{4^{n-1}} \right) \ge \frac{\sqrt{15}}{6} > \frac{1}{2}$, we have $P\left( |S_n| \le \frac{1}{2} \right) = 0$. Thus, $P(S_n \le x)$ does not converge to the standard normal cdf $\Phi(x)$ at some point $x$. This fact implies that $\{X_i\}_{i=1}^{\infty}$ does not satisfy the central limit theorem: the reason is that the random variable $X_1$ is so large that it virtually determines the distribution of $S_n$. ♦

The central limit theorem and laws of large numbers are satisfied for a wide range of sequences of random variables. As we have observed in Theorem 6.2.7 and Example 6.2.15, the laws of large numbers hold true for uniformly bounded independent sequences. As shown in Example 6.A.5 of Appendix 6.2, the central limit theorem holds true for an independent sequence even when the sum of variances diverges. Meanwhile, for an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$, noting that

\[ P\left( \left| \frac{S_n}{n} - m \right| > \varepsilon \right) = P\left( \frac{|S_n - nm|}{\sigma\sqrt{n}} > \frac{\varepsilon}{\sigma}\sqrt{n} \right) \approx 1 - P\left( |Z| \le \frac{\varepsilon}{\sigma}\sqrt{n} \right), \tag{6.2.56} \]

where $Z \sim N(0, 1)$, we can obtain the laws of large numbers directly from the central limit theorem. In other words, the central limit theorem is stronger than the laws of large numbers: yet, in the laws of large numbers we are not concerned with the existence of the second moment. In some independent sequences for which the central limit theorem holds true, on the other hand, the weak law of large numbers does not hold true.

Example 6.2.22 (Feller 1970; Rohatgi and Saleh 2001) Assume the pmf $P(X_k = k^\lambda) = P(X_k = -k^\lambda) = \frac{1}{2}$ for an independent sequence $\{X_k\}_{k=1}^{\infty}$, where $\lambda > 0$. Then, the mean and variance of $X_k$ are $E\{X_k\} = 0$ and $\mathrm{Var}\{X_k\} = k^{2\lambda}$, respectively. Now, letting $s_n^2 = \sum_{k=1}^{n} \mathrm{Var}\{X_k\} = \sum_{k=1}^{n} k^{2\lambda}$, we have

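\[ s_n^2 \ge \frac{n^{2\lambda+1} - 1}{2\lambda + 1} \tag{6.2.57} \]

from $\sum_{k=1}^{n} k^{2\lambda} \ge \int_{1}^{n} x^{2\lambda} \, dx$. Here, we can assume $n > 1$ without loss of generality, and if we let $n > \frac{2\lambda+1}{\varepsilon^2} + 1$, we have $\varepsilon^2 > \frac{2\lambda+1}{n-1}$ and

\[ \varepsilon^2 s_n^2 > \frac{2\lambda+1}{n-1} s_n^2 \ge \frac{2\lambda+1}{n-1} \cdot \frac{n^{2\lambda+1} - 1}{2\lambda+1} = \frac{n^{2\lambda+1} - 1}{n-1} = n^{2\lambda} + n^{2\lambda-1} + \cdots + 1 > n^{2\lambda}. \]

Therefore, for $n > \frac{2\lambda+1}{\varepsilon^2} + 1$, we have $|x_{kl}| > n^\lambda$ if $|x_{kl}| > \varepsilon s_n$. Noting in addition that $P(X_k = x)$ is non-zero only when $|x| \le n^\lambda$, we get

\[ \frac{1}{s_n^2} \sum_{k=1}^{n} \sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 \, p_{kl} = 0. \tag{6.2.58} \]

In short, the Lindeberg conditions³ are satisfied and the central limit theorem holds true.

3 Equations (6.A.5) and (6.A.6) in Appendix 6.2 are called the Lindeberg conditions.

Now, if we consider $s_n^2 = \sum_{k=1}^{n} k^{2\lambda} \le \int_{0}^{n} x^{2\lambda} \, dx$, i.e.,

\[ s_n^2 \le \frac{n^{2\lambda+1}}{2\lambda + 1} \tag{6.2.59} \]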

and (6.2.57), we can write sn ≈ * P a
0, M(t) = 1, t = 0, (6.A.3) ∞, t < 0. The function $M(t)$ is not an mgf. In other words, the limit of a sequence of mgf's is not necessarily an mgf. ♦

Example 6.A.4 Assume the pdf $f_n(x) = \frac{n}{\pi} \frac{1}{1 + n^2 x^2}$ of $X_n$. Then, the cdf is $F_n(x) = \frac{n}{\pi} \int_{-\infty}^{x} \frac{dt}{1 + n^2 t^2}$. We also have $\lim_{n\to\infty} f_n(x) = \delta(x)$ and $\lim_{n\to\infty} F_n(x) = u(x)$. These limits imply $\lim_{n\to\infty} P(|X_n - 0| > \varepsilon) = \int_{-\infty}^{-\varepsilon} \delta(x) \, dx + \int_{\varepsilon}^{\infty} \delta(x) \, dx = 0$ and, consequently, $\{X_n\}_{n=1}^{\infty}$ converges to 0 in probability.
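The tail probability in Example 6.A.4 also has a closed form, which makes the convergence in probability easy to tabulate. The sketch below is illustrative (the function name is hypothetical) and uses $F_n(x) = \frac{1}{2} + \frac{1}{\pi} \arctan(nx)$, so that $P(|X_n| > \varepsilon) = 1 - \frac{2}{\pi} \arctan(n\varepsilon)$.

```python
import math


def tail_prob(n, eps):
    """P(|X_n| > eps) for the pdf f_n(x) = n / (pi * (1 + n^2 x^2))."""
    return 1.0 - (2.0 / math.pi) * math.atan(n * eps)


if __name__ == "__main__":
    eps = 0.1
    for n in (1, 10, 100, 1000):
        print(n, tail_prob(n, eps))
```

For fixed $\varepsilon > 0$ the values decrease to 0 as $n$ grows, roughly like $\frac{2}{\pi n \varepsilon}$ for large $n$.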



Appendix 6.2 The Lindeberg Central Limit Theorem

The central limit theorem can be expressed in a variety of ways. Among those varieties, the Lindeberg central limit theorem is one of the most general ones and does not require the random variables to have identical distribution.

Theorem 6.A.1 (Rohatgi and Saleh 2001) For an independent sequence $\{X_i\}_{i=1}^{\infty}$, let the mean, variance, and cdf of $X_i$ be $m_i$, $\sigma_i^2$, and $F_i$, respectively. Let

\[ s_n^2 = \sum_{i=1}^{n} \sigma_i^2. \tag{6.A.4} \]

Appendices


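When the cdf $F_i$ is absolutely continuous, assume that the pdf $f_i(x) = \frac{d}{dx} F_i(x)$ satisfies

\[ \lim_{n\to\infty} \frac{1}{s_n^2} \sum_{i=1}^{n} \int_{|x - m_i| > \varepsilon s_n} (x - m_i)^2 f_i(x) \, dx = 0 \tag{6.A.5} \]

for every value of $\varepsilon > 0$. When $\{X_i\}_{i=1}^{\infty}$ are discrete random variables, assume the pmf $p_i(x) = P(X_i = x)$ satisfies⁵

\[ \lim_{n\to\infty} \frac{1}{s_n^2} \sum_{i=1}^{n} \sum_{|x_{il} - m_i| > \varepsilon s_n} (x_{il} - m_i)^2 p_i(x_{il}) = 0 \tag{6.A.6} \]

for every value of $\varepsilon > 0$, where $\{x_{il}\}_{l=1}^{L_i}$ are the jump points of $F_i$ with $L_i$ the number of jumps of $F_i$. Then, the distribution of

\[ \frac{1}{s_n} \left( \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} m_i \right) \tag{6.A.7} \]

converges to $N(0, 1)$ as $n \to \infty$.

5 As mentioned in Example 6.2.22 already, (6.A.5) and (6.A.6) are called the Lindeberg condition.

Example 6.A.5 (Rohatgi and Saleh 2001) Assume an independent sequence $\{X_k \sim U(-a_k, a_k)\}_{k=1}^{\infty}$. Then, $E\{X_k\} = 0$ and $\mathrm{Var}\{X_k\} = \frac{1}{3} a_k^2$. Let $|a_k| < A$ and $s_n^2 = \sum_{k=1}^{n} \mathrm{Var}\{X_k\} = \frac{1}{3} \sum_{k=1}^{n} a_k^2 \to \infty$ when $n \to \infty$. Then, from the Chebyshev inequality $P(|Y - E\{Y\}| \ge \varepsilon) \le \frac{\mathrm{Var}\{Y\}}{\varepsilon^2}$ discussed in (6.A.16), we get

\[ \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} x^2 F_k'(x) \, dx \le \frac{A^2}{s_n^2} \sum_{k=1}^{n} \frac{\mathrm{Var}\{X_k\}}{\varepsilon^2 s_n^2} = \frac{A^2}{\varepsilon^2 s_n^2} \to 0 \tag{6.A.8} \]

as $n \to \infty$ because $\frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} x^2 F_k'(x) \, dx \le \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} A^2 \frac{1}{2a_k} \, dx = \frac{A^2}{s_n^2} \sum_{k=1}^{n} P(|X_k| > \varepsilon s_n)$. Meanwhile, assume $\sum_{k=1}^{\infty} a_k^2 < \infty$, and let $s_n^2 \uparrow B^2$ for $n \to \infty$. Then, for a constant $k$, we can find $\varepsilon_k$ such that $\varepsilon_k B < a_k$, and we have $\varepsilon_k s_n < \varepsilon_k B$. Thus, $P(|X_k| > \varepsilon_k s_n) \ge P(|X_k| > \varepsilon_k B) > 0$. Based on this result, for $n \ge k$, we get

\[ \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|x| > \varepsilon_k s_n} x^2 F_j'(x) \, dx \ge \frac{s_n^2 \varepsilon_k^2}{s_n^2} \sum_{j=1}^{n} \int_{|x| > \varepsilon_k s_n} F_j'(x) \, dx = \frac{s_n^2 \varepsilon_k^2}{s_n^2} \sum_{j=1}^{n} P\left( |X_j| > \varepsilon_k s_n \right) \ge \varepsilon_k^2 P(|X_k| > \varepsilon_k s_n) > 0, \tag{6.A.9} \]

implying that the Lindeberg condition is not satisfied. In essence, in a sequence of uniformly bounded independent random variables, a necessary and sufficient condition for the central limit theorem to hold true is $\sum_{k=1}^{\infty} \mathrm{Var}\{X_k\} \to \infty$. ♦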
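The two regimes of Example 6.A.5 can be compared numerically, because for $X_k \sim U(-a_k, a_k)$ the truncated second moment has the closed form $\int_{c < |x| \le a_k} \frac{x^2}{2a_k} \, dx = \frac{a_k^3 - c^3}{3a_k}$ when $c < a_k$ and 0 otherwise. The sketch below is illustrative; the function name and the two choices of $\{a_k\}$ are assumptions made for the demonstration.

```python
import math


def lindeberg_sum(a, eps):
    """(1/s_n^2) * sum_k int_{|x| > eps*s_n} x^2 f_k(x) dx for X_k ~ U(-a_k, a_k)."""
    s2 = sum(ak ** 2 for ak in a) / 3.0   # s_n^2 = (1/3) * sum a_k^2
    c = eps * math.sqrt(s2)               # truncation level eps * s_n
    total = 0.0
    for ak in a:
        if ak > c:
            # Contribution of the part of the support outside [-c, c].
            total += (ak ** 3 - c ** 3) / (3.0 * ak)
    return total / s2


if __name__ == "__main__":
    # Variances sum to infinity: the Lindeberg sum vanishes once eps*s_n > a_k.
    print(lindeberg_sum([1.0] * 100, 0.5))
    # Variances are summable: the Lindeberg sum stays away from 0.
    print(lindeberg_sum([2.0 ** -k for k in range(1, 51)], 0.1))
```

In the first case the truncation level $\varepsilon s_n$ eventually exceeds every $a_k$, so the sum is exactly 0; in the second case $s_n$ stays bounded and the first few terms keep contributing.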

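Example 6.A.6 (Rohatgi and Saleh 2001) Assume an independent sequence $\{X_k\}_{k=1}^{\infty}$. Let $\delta > 0$, $\alpha_k = E\left\{ |X_k|^{2+\delta} \right\} < \infty$, and $\sum_{j=1}^{n} \alpha_j = o\left( s_n^{2+\delta} \right)$. Then, the Lindeberg condition is satisfied and the central limit theorem holds true. This can be shown easily as

\[ \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} x^2 F_k'(x) \, dx \le \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} \frac{|x|^{2+\delta}}{\varepsilon^\delta s_n^\delta} F_k'(x) \, dx \le \frac{1}{\varepsilon^\delta s_n^{2+\delta}} \sum_{k=1}^{n} \int_{-\infty}^{\infty} |x|^{2+\delta} F_k'(x) \, dx = \frac{1}{\varepsilon^\delta s_n^{2+\delta}} \sum_{k=1}^{n} \alpha_k \to 0 \tag{6.A.10} \]

because $x^2 < \frac{|x|^{2+\delta}}{\varepsilon^\delta s_n^\delta}$ from $|x|^\delta x^2 > |\varepsilon s_n|^\delta x^2$ when $|x| > \varepsilon s_n$. We can similarly show that the central limit theorem holds true in discrete random variables. ♦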

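The conditions (6.A.5) and (6.A.6) are the necessary conditions in the following sense: for a sequence $\{X_i\}_{i=1}^{\infty}$ of independent random variables, assume the variances $\{\sigma_i^2\}_{i=1}^{\infty}$ of $\{X_i\}_{i=1}^{\infty}$ are finite. If the pdf of $X_i$ satisfies (6.A.5) or the pmf of $X_i$ satisfies (6.A.6) for every value of $\varepsilon > 0$, then

\[ \lim_{n\to\infty} P\left( \frac{\overline{X}_n - E\left\{ \overline{X}_n \right\}}{\sqrt{\mathrm{Var}\left\{ \overline{X}_n \right\}}} \le x \right) = \Phi(x) \tag{6.A.11} \]

and

\[ \lim_{n\to\infty} P\left( \max_{1 \le k \le n} |X_k - E\{X_k\}| > n\varepsilon \sqrt{\mathrm{Var}\left\{ \overline{X}_n \right\}} \right) = 0, \tag{6.A.12} \]

and the converse also holds true, where $\overline{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the sample mean of $\{X_i\}_{i=1}^{n}$ defined in (5.4.1).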

Appendix 6.3 Properties of Convergence

(A) Continuity of Expected Values

When the sequence $\{X_n\}_{n=1}^{\infty}$ converges to $X$, the sequence $\{E\{X_n\}\}_{n=1}^{\infty}$ of expected values will also converge to the expected value $E\{X\}$, which is called the continuity of expected values. The continuity of expected values (Gray and Davisson 2010) is a consequence of the continuity of probability discussed in Appendix 2.1.

(1) Monotonic convergence. If $0 \le X_n \le X_{n+1}$ for every integer $n$, then $E\{X_n\} \to E\{X\}$ as $n \to \infty$. In other words, $E\left\{ \lim_{n\to\infty} X_n \right\} = \lim_{n\to\infty} E\{X_n\}$.
(2) Dominated convergence. If $|X_n| < Y$ for every integer $n$ and $E\{Y\} < \infty$, then $E\{X_n\} \to E\{X\}$ as $n \to \infty$.
(3) Bounded convergence. If there exists a constant $c$ such that $|X_n| \le c$ for every integer $n$, then $E\{X_n\} \to E\{X\}$ as $n \to \infty$.

(B) Properties of Convergence

We list some properties among various types of convergence. Here, $a$ and $b$ are constants.

(1) If $X_n \xrightarrow{p} X$, then $X_n - X \xrightarrow{p} 0$, $aX_n \xrightarrow{p} aX$, and $X_n - X_m \xrightarrow{p} 0$ for $n, m \to \infty$.
(2) If $X_n \xrightarrow{p} X$ and $X_n \xrightarrow{p} Y$, then $P(X = Y) = 1$.
(3) If $X_n \xrightarrow{p} a$, then $X_n^2 \xrightarrow{p} a^2$.
(4) If $X_n \xrightarrow{p} 1$, then $\frac{1}{X_n} \xrightarrow{p} 1$.
(5) If $X_n \xrightarrow{p} X$ and $Y$ is a random variable, then $X_n Y \xrightarrow{p} XY$.
(6) If $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} Y$, then $X_n \pm Y_n \xrightarrow{p} X \pm Y$ and $X_n Y_n \xrightarrow{p} XY$.
(7) If $X_n \xrightarrow{p} a$ and $Y_n \xrightarrow{p} b \ne 0$, then $\frac{X_n}{Y_n} \xrightarrow{p} \frac{a}{b}$.
(8) If $X_n \xrightarrow{d} X$, then $X_n + a \xrightarrow{d} X + a$ and $bX_n \xrightarrow{d} bX$ for $b \ne 0$.
(9) If $X_n \xrightarrow{d} a$, then $X_n \xrightarrow{p} a$. Therefore, $X_n \xrightarrow{d} a \Leftrightarrow X_n \xrightarrow{p} a$.
(10) If $|X_n - Y_n| \xrightarrow{p} 0$ and $Y_n \xrightarrow{d} Y$, then $X_n \xrightarrow{d} Y$. Based on this, it can be shown that $X_n \xrightarrow{d} X$ when $X_n \xrightarrow{p} X$.
(11) If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} a$, then $X_n \pm Y_n \xrightarrow{d} X \pm a$, $X_n Y_n \xrightarrow{d} aX$ for $a \ne 0$, $X_n Y_n \xrightarrow{p} 0$ for $a = 0$, and $\frac{X_n}{Y_n} \xrightarrow{d} \frac{X}{a}$ for $a \ne 0$.
(12) If $X_n \xrightarrow{r=2} X$, then $\lim_{n\to\infty} E\{X_n\} = E\{X\}$ and $\lim_{n\to\infty} E\left\{ X_n^2 \right\} = E\left\{ X^2 \right\}$.
(13) If $X_n \xrightarrow{L^r} X$, then $\lim_{n\to\infty} E\{|X_n|^r\} = E\{|X|^r\}$.
(14) If $X_1 > X_2 > \cdots > 0$ and $X_n \xrightarrow{p} 0$, then $X_n \xrightarrow{a.s.} 0$.

(C) Convergence and Limits of Products

Consider the product

\[ A_n = \prod_{k=1}^{n} a_k. \tag{6.A.13} \]

The infinite product $\prod_{k=1}^{\infty} a_k$ is called convergent to the limit $A$ when $A_n \to A$ and $A \ne 0$ for $n \to \infty$; divergent to 0 when $A_n \to 0$; and divergent when $A_n$ is not convergent to a non-zero value. The convergence of products is often related to the convergence of sums as shown below.

(1) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are positive, the convergence of $\prod_{k=1}^{\infty} a_k$ and that of $\sum_{k=1}^{\infty} \ln a_k$ are the necessary and sufficient conditions of each other.
(2) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are positive, the convergence of $\prod_{k=1}^{\infty} (1 + a_k)$ and that of $\sum_{k=1}^{\infty} a_k$ are the necessary and sufficient conditions of each other.
(3) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are non-negative, the convergence of $\prod_{k=1}^{\infty} (1 - a_k)$ and that of $\sum_{k=1}^{\infty} a_k$ are the necessary and sufficient conditions of each other.
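Property (2) of part (C) can be illustrated with two textbook choices of $\{a_k\}$. The sketch below is illustrative and uses a hypothetical function name: for $a_k = \frac{1}{k}$ the partial product telescopes to $n + 1$ and diverges together with $\sum \frac{1}{k}$, while for $a_k = \frac{1}{k^2}$ it approaches $\frac{\sinh \pi}{\pi}$, a classical value that is quoted here rather than derived in the text.

```python
import math


def partial_product(terms):
    """Return prod (1 + a_k) over the given terms a_k."""
    prod = 1.0
    for a in terms:
        prod *= 1.0 + a
    return prod


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        harmonic = partial_product(1.0 / k for k in range(1, n + 1))
        squares = partial_product(1.0 / k ** 2 for k in range(1, n + 1))
        print(n, harmonic, squares)
    print("sinh(pi)/pi =", math.sinh(math.pi) / math.pi)
```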

Appendix 6.4 Inequalities

In this appendix we introduce some useful inequalities (Beckenbach and Bellam 1965) in probability spaces.

(A) Inequalities for Random Variables

Theorem 6.A.2 (Rohatgi and Saleh 2001) If a measurable function $h$ is non-negative and $E\{h(X)\}$ exists for a random variable $X$, then

\[ P(h(X) \ge \varepsilon) \le \frac{E\{h(X)\}}{\varepsilon} \tag{6.A.14} \]

for $\varepsilon > 0$, which is called the tail probability inequality.

Proof Assume $X$ is a discrete random variable. Letting $P(X = x_k) = p_k$, we have $E\{h(X)\} = \left( \sum_{A} + \sum_{A^c} \right) h(x_k) p_k \ge \sum_{A} h(x_k) p_k$ when $A = \{k : h(x_k) \ge \varepsilon\}$: this yields $E\{h(X)\} \ge \varepsilon \sum_{A} p_k = \varepsilon P(h(X) \ge \varepsilon)$ and, subsequently, (6.A.14). ♠

Theorem 6.A.3 If $X$ is a non-negative random variable, then⁶

\[ P(X \ge \alpha) \le \frac{E\{X\}}{\alpha} \tag{6.A.15} \]

for $\alpha > 0$, which is called the Markov inequality.

The Markov inequality can be proved easily from (6.A.14) by letting $h(X) = |X|$ and $\varepsilon = \alpha$. We can show the Markov inequality also from $E\{X\} = \int_{0}^{\infty} x f_X(x) \, dx \ge \int_{\alpha}^{\infty} x f_X(x) \, dx \ge \alpha \int_{\alpha}^{\infty} f_X(x) \, dx = \alpha P(X \ge \alpha)$ by recollecting that a pdf is non-negative.

Theorem 6.A.4 The mean $E\{Y\}$ and variance $\mathrm{Var}\{Y\}$ of any random variable $Y$ satisfy

\[ P(|Y - E\{Y\}| \ge \varepsilon) \le \frac{\mathrm{Var}\{Y\}}{\varepsilon^2} \tag{6.A.16} \]

for any $\varepsilon > 0$, which is called the Chebyshev inequality.

Proof The random variable $X = (Y - E\{Y\})^2$ is non-negative. Thus, if we use (6.A.15), we get $P\left( [Y - E\{Y\}]^2 \ge \varepsilon^2 \right) \le \frac{1}{\varepsilon^2} E\left\{ [Y - E\{Y\}]^2 \right\} = \frac{\mathrm{Var}\{Y\}}{\varepsilon^2}$. Now, noting that $P\left( [Y - E\{Y\}]^2 \ge \varepsilon^2 \right) = P(|Y - E\{Y\}| \ge \varepsilon)$, we get (6.A.16). ♠
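The Markov inequality (6.A.15) and the Chebyshev inequality (6.A.16) can be checked exactly on a small discrete example. The sketch below is illustrative: the fair six-sided die and the helper names are assumptions chosen for the demonstration, and exact rational arithmetic is used so that both sides of each inequality can be compared without rounding.

```python
from fractions import Fraction

# pmf of a fair six-sided die (illustrative choice, not from the text)
PMF = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(h):
    """E{h(X)} under PMF."""
    return sum(h(x) * p for x, p in PMF.items())


def prob(event):
    """P(event(X)) under PMF."""
    return sum(p for x, p in PMF.items() if event(x))


MEAN = expect(lambda x: x)               # 7/2
VAR = expect(lambda x: (x - MEAN) ** 2)  # 35/12

if __name__ == "__main__":
    alpha, eps = 5, 2
    print("Markov:   ", prob(lambda x: x >= alpha), "<=", MEAN / alpha)
    print("Chebyshev:", prob(lambda x: abs(x - MEAN) >= eps), "<=", VAR / eps ** 2)
```

Both bounds hold with room to spare here, which is typical: the inequalities are valid for every distribution with the given mean or variance and are therefore often loose for a specific one.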

Theorem 6.A.5 (Rohatgi and Saleh 2001) The absolute mean $E\{|X|\}$ of any random variable $X$ satisfies

\[ \sum_{n=1}^{\infty} P(|X| \ge n) \le E\{|X|\} \le 1 + \sum_{n=1}^{\infty} P(|X| \ge n), \tag{6.A.17} \]

which is called the absolute mean inequality. Proof Let the pdf of a continuous random variable X be f X . Then, because E{|X |} = ∞  ∞  −∞ |x| f X (x)d x = k≤|x| 0.

\[ \sum_{k=0}^{\infty} k P(k \le |X| < k+1) \le E\{|X|\} \le \sum_{k=0}^{\infty} (k+1) P(k \le |X| < k+1). \tag{6.A.18} \]

Now, employing $\sum_{k=0}^{\infty} k P(k \le |X| < k+1) = \sum_{n=1}^{\infty} \sum_{k=n}^{\infty} P(k \le |X| < k+1) = \sum_{n=1}^{\infty} P(|X| \ge n)$ and $\sum_{k=0}^{\infty} (k+1) P(k \le |X| < k+1) = 1 + \sum_{k=0}^{\infty} k P(k \le |X| < k+1) = 1 + \sum_{n=1}^{\infty} P(|X| \ge n)$ in (6.A.18), we get (6.A.17). A similar procedure will show the result for discrete random variables.

Theorem 6.A.6 If $h$ is a convex⁷ function, then

\[ E\{h(X)\} \ge h(E\{X\}), \tag{6.A.19} \]

which is called the Jensen inequality.

Proof Let $m = E\{X\}$. Then, from the intermediate value theorem, we have

\[ h(X) = h(m) + (X - m) h'(m) + \frac{1}{2} (X - m)^2 h''(\alpha) \tag{6.A.20} \]

for $-\infty < \alpha < \infty$. Taking the expectation of the above equation, we get $E\{h(X)\} = h(m) + \frac{1}{2} h''(\alpha) \sigma_X^2$. Recollecting that $h''(\alpha) \ge 0$ and $\sigma_X^2 \ge 0$, we get $E\{h(X)\} \ge h(m) = h(E\{X\})$. ♠

7 A function $h$ is called convex or concave up when $h(tx + (1-t)y) \le t h(x) + (1-t) h(y)$ for every two points $x$ and $y$ and for every choice of $t \in [0, 1]$. A convex function is a continuous function with a non-decreasing derivative and is differentiable except at a countable number of points. In addition, the second order derivative of a convex function, if it exists, is non-negative.

Theorem 6.A.7 (Rohatgi and Saleh 2001) If the $n$-th absolute moment $E\{|X|^n\}$ is finite, then

\[ \left[ E\{|X|^s\} \right]^{\frac{1}{s}} \le \left[ E\{|X|^r\} \right]^{\frac{1}{r}} \tag{6.A.21} \]

for $1 \le s < r \le n$, which is called the Lyapunov inequality.

Proof Consider the bi-variable formula

\[ Q(u, v) = \int_{-\infty}^{\infty} \left( u |x|^{\frac{k-1}{2}} + v |x|^{\frac{k+1}{2}} \right)^2 f(x) \, dx, \tag{6.A.22} \]

where $f$ is the pdf of $X$. Letting $\beta_n = E\{|X|^n\}$, (6.A.22) can be written as $Q(u, v) = (u \ v) \begin{pmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{pmatrix} (u \ v)^T$. Now, we have $\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \ge 0$, i.e., $\beta_k^{2k} \le \beta_{k-1}^{k} \beta_{k+1}^{k}$ because $Q \ge 0$ for every choice of $u$ and $v$. Therefore, we have

\[ \beta_1^2 \le \beta_0^1 \beta_2^1, \quad \beta_2^4 \le \beta_1^2 \beta_3^2, \quad \cdots, \quad \beta_{n-1}^{2(n-1)} \le \beta_{n-2}^{n-1} \beta_n^{n-1} \tag{6.A.23} \]

with $\beta_0 = 1$. If we multiply the first $k - 1$ consecutive inequalities in (6.A.23), then we have $\beta_{k-1}^{k} \le \beta_k^{k-1}$ for $k = 2, 3, \ldots, n$, from which we can easily get $\beta_1 \le \beta_2^{\frac{1}{2}} \le \beta_3^{\frac{1}{3}} \le \cdots \le \beta_n^{\frac{1}{n}}$.
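The chain $\beta_1 \le \beta_2^{1/2} \le \cdots \le \beta_n^{1/n}$ and the determinant step $\beta_k^2 \le \beta_{k-1} \beta_{k+1}$ used in the proof can be checked numerically. The sketch below is illustrative: the four-point distribution and the function names are assumptions made for the demonstration.

```python
def abs_moments(values, probs, n):
    """Return [beta_1, ..., beta_n] with beta_k = E{|X|^k} for a discrete X."""
    return [sum(p * abs(x) ** k for x, p in zip(values, probs))
            for k in range(1, n + 1)]


def lyapunov_roots(values, probs, n):
    """Return [beta_k ** (1/k) for k = 1..n]."""
    betas = abs_moments(values, probs, n)
    return [b ** (1.0 / k) for k, b in enumerate(betas, start=1)]


if __name__ == "__main__":
    values = [-2.0, 0.5, 1.0, 3.0]   # support points (illustrative)
    probs = [0.1, 0.4, 0.3, 0.2]     # their probabilities
    print(lyapunov_roots(values, probs, 6))
```

The printed sequence is non-decreasing, and it approaches $\max |x| = 3$ as $k$ grows, as expected for a bounded random variable.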



Theorem 6.A.8 Let $g(x)$ be a non-decreasing and non-negative function for $x \in (0, \infty)$. If $\frac{E\{g(|X|)\}}{g(\varepsilon)}$ is defined, then

\[ P(|X| \ge \varepsilon) \le \frac{E\{g(|X|)\}}{g(\varepsilon)} \tag{6.A.24} \]

for ε > 0, which is called the generalized Bienayme-Chebyshev inequality.  Proof Let the cdf of X be F(x). Then, we get E{g(|X |)} ≥ g(ε) |x|≥ε d F(x) =   g(ε)P(|X | ≥ ε) by recollecting E{g(|X |)} = |x| 0, we have P(|X | ≥ ε) ≤

$\frac{E\{|X|^r\}}{\varepsilon^r}$ (6.A.25)

for $\varepsilon > 0$, which is called the Bienayme-Chebyshev inequality.

(B) Inequalities of Random Vectors

Theorem 6.A.10 (Rohatgi and Saleh 2001) For two random variables $X$ and $Y$, we have

\[ E^2\{XY\} \le E\left\{ X^2 \right\} E\left\{ Y^2 \right\}, \tag{6.A.26} \]

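which is called the Cauchy-Schwarz inequality.

Proof First, note that $E\{|XY|\}$ exists when $E\left\{ X^2 \right\} < \infty$ and $E\left\{ Y^2 \right\} < \infty$ because $|ab| \le \frac{a^2 + b^2}{2}$ for real numbers $a$ and $b$. Now, if $E\left\{ X^2 \right\} = 0$, then $P(X = 0) = 1$ and thus $E\{XY\} = 0$, implying that (6.A.26) holds true. Next, when $E\left\{ X^2 \right\} > 0$, recollecting that $E\left\{ (\alpha X + Y)^2 \right\} = \alpha^2 E\left\{ X^2 \right\} + 2\alpha E\{XY\} + E\left\{ Y^2 \right\} \ge 0$ for any real number $\alpha$, we have $\frac{E^2\{XY\}}{E\{X^2\}} - 2\frac{E^2\{XY\}}{E\{X^2\}} + E\left\{ Y^2 \right\} \ge 0$ by letting $\alpha = -\frac{E\{XY\}}{E\{X^2\}}$. This inequality is equivalent to (6.A.26). ♠

Theorem 6.A.11 (Rohatgi and Saleh 2001) For zero-mean independent random variables $\{X_i\}_{i=1}^{n}$ with variances $\left\{ \sigma_i^2 \right\}_{i=1}^{n}$, let $S_k = \sum_{j=1}^{k} X_j$. Then,

\[ P\left( \max_{1 \le k \le n} |S_k| > \varepsilon \right) \le \sum_{i=1}^{n} \frac{\sigma_i^2}{\varepsilon^2} \tag{6.A.27} \]

for $\varepsilon > 0$, which is called the Kolmogorov inequality.

Proof Let $A_0 = \Omega$, $A_k = \left\{ \max_{1 \le j \le k} |S_j| \le \varepsilon \right\}$ for $k = 1, 2, \ldots, n$, and $B_k = A_{k-1} \cap A_k^c = \{|S_1| \le \varepsilon, |S_2| \le \varepsilon, \ldots, |S_{k-1}| \le \varepsilon\} \cap \{\text{at least one of } |S_1|, |S_2|, \ldots, |S_k| \text{ is larger than } \varepsilon\}$, i.e.,

\[ B_k = \{|S_1| \le \varepsilon, |S_2| \le \varepsilon, \ldots, |S_{k-1}| \le \varepsilon, |S_k| > \varepsilon\}. \tag{6.A.28} \]

Then, $A_n^c = \bigcup_{k=1}^{n} B_k$ and $B_k \subseteq \{|S_{k-1}| \le \varepsilon, |S_k| > \varepsilon\}$. Recollecting the indicator function $K_A(x)$ defined in (2.A.27), we get $E\left\{ \left[ S_n K_{B_k}(S_k) \right]^2 \right\} = E\left\{ \left[ (S_n - S_k) K_{B_k}(S_k) + S_k K_{B_k}(S_k) \right]^2 \right\}$, i.e.,

\[ E\left\{ \left[ S_n K_{B_k}(S_k) \right]^2 \right\} = E\left\{ (S_n - S_k)^2 K_{B_k}(S_k) \right\} + E\left\{ \left[ S_k K_{B_k}(S_k) \right]^2 \right\} + E\left\{ 2 S_k (S_n - S_k) K_{B_k}(S_k) \right\}. \tag{6.A.29} \]

Noting that $S_n - S_k = X_{k+1} + X_{k+2} + \cdots + X_n$ and $S_k K_{B_k}(S_k)$ are independent of each other, that $E\{X_k\} = 0$, that $E\left\{ K_{B_k}(S_k) \right\} = P(B_k)$, and that $|S_k| \ge \varepsilon$ under $B_k$, we have $E\left\{ \left[ S_n K_{B_k}(S_k) \right]^2 \right\} = E\left\{ (S_n - S_k)^2 K_{B_k}(S_k) \right\} + E\left\{ \left[ S_k K_{B_k}(S_k) \right]^2 \right\} \ge E\left\{ \left[ S_k K_{B_k}(S_k) \right]^2 \right\}$, i.e.,

\[ E\left\{ \left[ S_n K_{B_k}(S_k) \right]^2 \right\} \ge \varepsilon^2 P(B_k) \tag{6.A.30} \]

from (6.A.29). Subsequently, using $\sum_{k=1}^{n} E\left\{ \left[ S_n K_{B_k}(S_k) \right]^2 \right\} = E\left\{ S_n^2 K_{A_n^c}(S_n) \right\} \le E\left\{ S_n^2 \right\} = \sum_{k=1}^{n} \sigma_k^2$ and (6.A.30), we get $\sum_{k=1}^{n} \sigma_k^2 \ge \varepsilon^2 \sum_{k=1}^{n} P(B_k) = \varepsilon^2 P\left( A_n^c \right)$, which is the same as (6.A.27). ♠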

Example 6.A.7 (Rohatgi and Saleh 2001) The Chebyshev inequality (6.A.16) with $E\{Y\} = 0$, i.e.,

\[ P(|Y| > \varepsilon) \le \frac{\mathrm{Var}\{Y\}}{\varepsilon^2} \tag{6.A.31} \]

is the same as the Kolmogorov inequality (6.A.27) with $n = 1$. ♦

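Theorem 6.A.12 Consider i.i.d. random variables $\{X_i\}_{i=1}^{n}$ with marginal mgf $M(t) = E\left\{ e^{tX_i} \right\}$. Let $Y_n = \sum_{i=1}^{n} X_i$ and $g(t) = \ln M(t)$. If we let the solution to $\alpha = n g'(t)$ be $t_r$ for a real number $\alpha$, then

\[ P(Y_n \ge \alpha) \le \exp\left[ -n \left\{ t_r g'(t_r) - g(t_r) \right\} \right], \quad t_r \ge 0 \tag{6.A.32} \]

and

\[ P(Y_n \le \alpha) \le \exp\left[ -n \left\{ t_r g'(t_r) - g(t_r) \right\} \right], \quad t_r \le 0. \tag{6.A.33} \]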

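The inequalities (6.A.32) and (6.A.33) are called the Chernoff bounds. When $t_r = 0$, the right-hand sides of the two inequalities (6.A.32) and (6.A.33) are both 1 from $g(t_r) = \ln M(t_r) = \ln M(0) = 0$: in other words, the Chernoff bounds simply say that the probability is no larger than 1 when $t_r = 0$, and thus the Chernoff bounds are more useful when $t_r \ne 0$.

Example 6.A.8 (Thomas 1986) Let $X \sim N(0, 1)$, $n = 1$, and $Y_1 = X$. From the mgf $M(t) = \exp\left( \frac{t^2}{2} \right)$, we get $g(t) = \ln M(t) = \frac{t^2}{2}$ and $g'(t) = t$. Thus, the solution to $\alpha = n g'(t) = t$ is $t_r = \alpha$. In other words, the Chernoff bounds can be written as

\[ P(X \ge \alpha) \le \exp\left( -\frac{\alpha^2}{2} \right), \quad \alpha \ge 0 \tag{6.A.34} \]

and

\[ P(X \le \alpha) \le \exp\left( -\frac{\alpha^2}{2} \right), \quad \alpha \le 0 \tag{6.A.35} \]

for $X \sim N(0, 1)$. ♦

Example 6.A.9 For $X \sim P(\lambda)$, assume $n = 1$ and $Y_1 = X$. From the mgf $M(t) = \exp\{\lambda(e^t - 1)\}$, we get $g(t) = \ln M(t) = \lambda(e^t - 1)$ and $g'(t) = \lambda e^t$. Solving $\alpha = n g'(t) = \lambda e^t$, we get $t_r = \ln \frac{\alpha}{\lambda}$. Thus, $t_r > 0$ when $\alpha > \lambda$, $t_r = 0$ when $\alpha = \lambda$, and $t_r < 0$ when $\alpha < \lambda$. Therefore, we have

\[ P(X \ge \alpha) \le e^{-\lambda} \left( \frac{e\lambda}{\alpha} \right)^{\alpha}, \quad \alpha \ge \lambda \tag{6.A.36} \]

and

\[ P(X \le \alpha) \le e^{-\lambda} \left( \frac{e\lambda}{\alpha} \right)^{\alpha}, \quad \alpha \le \lambda \tag{6.A.37} \]

because $n \left\{ t_r g'(t_r) - g(t_r) \right\} = \alpha \ln \frac{\alpha}{\lambda} - \alpha + \lambda$ from $g(t_r) = \lambda \left( \frac{\alpha}{\lambda} - 1 \right) = \alpha - \lambda$ and $t_r g'(t_r) = \alpha \ln \frac{\alpha}{\lambda}$.

Theorem 6.A.13 If $p$ and $q$ are both larger than 1 and $\frac{1}{p} + \frac{1}{q} = 1$, then

\[ E\{|XY|\} \le E^{\frac{1}{p}}\left\{ |X|^p \right\} E^{\frac{1}{q}}\left\{ |Y|^q \right\}, \tag{6.A.38} \]

which is called the Hölder inequality.

Theorem 6.A.14 If $p > 1$, then

\[ E^{\frac{1}{p}}\left\{ |X + Y|^p \right\} \le E^{\frac{1}{p}}\left\{ |X|^p \right\} + E^{\frac{1}{p}}\left\{ |Y|^p \right\}, \tag{6.A.39} \]

which is called the Minkowski inequality. It is easy to see that the Minkowski inequality is a generalization of the triangle inequality $|a - b| \le |a - c| + |c - b|$.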
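The Chernoff bounds (6.A.36) and (6.A.37) of Example 6.A.9 can be compared with exact Poisson tail probabilities. The sketch below is illustrative: the function names are hypothetical, $\alpha$ is restricted to positive integers so that the tails are finite sums, and $\lambda = 4$ is an arbitrary choice.

```python
import math


def poisson_pmf(lam, k):
    """P(X = k) for X ~ P(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_upper_tail(lam, alpha):
    """Exact P(X >= alpha) for an integer alpha >= 0."""
    return 1.0 - sum(poisson_pmf(lam, k) for k in range(alpha))


def poisson_lower_tail(lam, alpha):
    """Exact P(X <= alpha) for an integer alpha >= 0."""
    return sum(poisson_pmf(lam, k) for k in range(alpha + 1))


def chernoff_bound(lam, alpha):
    """Right-hand side of (6.A.36) and (6.A.37): e^{-lam} (e lam / alpha)^alpha."""
    return math.exp(-lam) * (math.e * lam / alpha) ** alpha


if __name__ == "__main__":
    lam = 4.0
    for alpha in (5, 8, 12):   # alpha >= lam: upper tail, (6.A.36)
        print(alpha, poisson_upper_tail(lam, alpha), chernoff_bound(lam, alpha))
    for alpha in (1, 2, 3):    # alpha <= lam: lower tail, (6.A.37)
        print(alpha, poisson_lower_tail(lam, alpha), chernoff_bound(lam, alpha))
```

At $\alpha = \lambda$ the bound equals 1, matching the remark that the Chernoff bounds are uninformative when $t_r = 0$; the bound tightens in relative terms as $\alpha$ moves away from $\lambda$.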

Exercises

Exercise 6.1 For the sample space $[0, 1]$, consider a sequence of random variables defined by

\[ X_n(\omega) = \begin{cases} 1, & \omega \le \frac{1}{n}, \\ 0, & \omega > \frac{1}{n} \end{cases} \tag{6.E.1} \]

and let $X(\omega) = 0$ for $\omega \in [0, 1]$. Assume the probability measure $P(a \le \omega \le b) = b - a$, the Lebesgue measure mentioned following (2.5.24), for $0 \le a \le b \le 1$. Discuss if $\{X_n(\omega)\}_{n=1}^{\infty}$ converges to $X(\omega)$ surely or almost surely.

Exercise 6.2 For the sample space $[0, 1]$, consider the sequence

\[ X_n(\omega) = \begin{cases} 3, & 0 \le \omega < \frac{1}{2n}, \\ 4, & 1 - \frac{1}{2n} < \omega \le 1, \\ 5, & \frac{1}{2n} < \omega < 1 - \frac{1}{2n} \end{cases} \tag{6.E.2} \]

and let $X(\omega) = 5$ for $\omega \in [0, 1]$. Assuming the probability measure $P(a \le \omega \le b) = b - a$ for $0 \le a \le b \le 1$, discuss if $\{X_n(\omega)\}_{n=1}^{\infty}$ converges to $X(\omega)$ surely or almost surely.

Exercise 6.3 When $\{X_i\}_{i=1}^{n}$ are independent random variables, obtain the distribution of $S_n = \sum_{i=1}^{n} X_i$ in each of the following five cases of the distribution of $X_i$.

(1) geometric distribution with parameter $\alpha$,
(2) $NB(r_i, \alpha)$,
(3) $P(\lambda_i)$,
(4) $G(\alpha_i, \beta)$, and
(5) $C(\mu_i, \theta_i)$.

Exercise 6.4 To what does $\frac{S_n}{n}$ converge in Example 6.2.10?

Exercise 6.5 Let $Y = \frac{X - \lambda}{\sqrt{\lambda}}$ for a Poisson random variable $X \sim P(\lambda)$. Noting that the mgf of $X$ is $M_X(t) = \exp\left\{ \lambda \left( e^t - 1 \right) \right\}$, show that $Y$ converges to a standard normal random variable as $\lambda \to \infty$.

Exercise 6.6 For a sequence $\{X_n\}_{n=1}^{\infty}$ with the pmf

\[ P(X_n = x) = \begin{cases} \frac{1}{n}, & x = 1, \\ 1 - \frac{1}{n}, & x = 0, \end{cases} \tag{6.E.3} \]

show that $X_n \xrightarrow{l} X$, where $X$ has the distribution $P(X = 0) = 1$.

Exercise 6.7 Discuss if the weak law of large numbers holds true for a sequence of i.i.d. random variables with marginal pdf $f(x) = \frac{1+\alpha}{x^{2+\alpha}} u(x - 1)$, where $\alpha > 0$.

Exercise 6.8 Show that $S_n = \sum_{k=1}^{n} X_k$ converges to a Poisson random variable with distribution $P(np)$ when $n \to \infty$ for an i.i.d. sequence $\{X_n\}_{n=1}^{\infty}$ with marginal distribution $b(1, p)$.

Exercise 6.9 Discuss the central limit theorem for an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal distribution $B(\alpha, \beta)$.

Exercise 6.10 An i.i.d. sequence $\{X_i\}_{i=1}^{n}$ has marginal distribution $P(\lambda)$. When $n$ is large enough, we can approximate as $S_n = \sum_{k=1}^{n} X_k \sim N(n\lambda, n\lambda)$. Using the continuity correction, obtain the probability $P(50 < S_n \le 80)$.

Exercise 6.11 Consider an i.i.d. Bernoulli sequence $\{X_i\}_{i=1}^{n}$ with $P(X_i = 1) = p$, a binomial random variable $M \sim b(n, p)$ which is independent of $\{X_i\}_{i=1}^{n}$, and $K = \sum_{i=1}^{n} X_i$. Note that $K$ is the number of successes in $n$ i.i.d. Bernoulli trials. Obtain the expected values of $U = \sum_{i=1}^{K} X_i$ and $V = \sum_{i=1}^{M} X_i$.

Exercise 6.12 The result of a game is independent of another game, and the probabilities of winning and losing are each $\frac{1}{2}$. Assume there is no tie. When a person wins, the person gets 2 points and then continues. On the other hand, if the person loses a round, the person gets 0 points and stops. Obtain the mgf, expected value, and variance of the score $Y$ that the person may get from the games.

Exercise 6.13 Let $P_n$ be the probability that we have more heads than tails in a toss of $n$ fair coins.

(1) Obtain $P_3$, $P_4$, and $P_5$.
(2) Obtain the limit $\lim_{n\to\infty} P_n$.

Exercise 6.14 For an i.i.d. sequence $\{X_n \sim N(0, 1)\}_{n=1}^{\infty}$, let the cdf of $\overline{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ be $F_n$. Obtain $\lim_{n\to\infty} F_n(x)$ and discuss whether the limit is a cdf or not.

Exercise 6.15 Consider $X_{[1]} = \min(X_1, X_2, \ldots, X_n)$ for an i.i.d. sequence $\{X_n \sim U(0, \theta)\}_{n=1}^{\infty}$. Does $Y_n = n X_{[1]}$ converge in distribution? If yes, obtain the limit cdf.

Exercise 6.16 The marginal cdf $F$ of an i.i.d. sequence $\{X_n\}_{n=1}^{\infty}$ is absolutely continuous. For the sequence $\{Y_n\}_{n=1}^{\infty} = \{n\{1 - F(M_n)\}\}_{n=1}^{\infty}$, obtain the limit $\lim_{n\to\infty} F_{Y_n}(y)$ of the cdf $F_{Y_n}$ of $Y_n$, where $M_n = \max(X_1, X_2, \ldots, X_n)$.

Exercise 6.17 Is the sequence of cdf's

\[ F_n(x) = \begin{cases} 0, & x < 0, \\ 1 - \frac{1}{n}, & 0 \le x < n, \\ 1, & x \ge n \end{cases} \tag{6.E.4} \]

convergent? If yes, obtain the limit.

Exercise 6.18 In the sequence $\{Y_i = X + W_i\}_{i=1}^{n}$, $X$ and $\{W_i\}_{i=1}^{n}$ are independent of each other, and $\left\{ W_i \sim N\left( 0, \sigma_i^2 \right) \right\}_{i=1}^{n}$ is an i.i.d. sequence, where $\sigma_i^2 \le \sigma_{\max}^2 < \infty$. We estimate $X$ via

\[ \hat{X}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i \tag{6.E.5} \]

and let the error be $\varepsilon_n = \hat{X}_n - X$.

(1) Express the cf, mean, and variance of $\hat{X}_n$ in terms of those of $X$ and $\{W_i\}_{i=1}^{n}$.
(2) Obtain the covariance $\mathrm{Cov}(Y_i, Y_j)$.
(3) Obtain the pdf $f_{\varepsilon_n}(\alpha)$ and the conditional pdf $f_{\hat{X}|X}(\alpha|\beta)$.
(4) Does $\hat{X}_n$ converge to $X$? If yes, what is the type of the convergence? If not, what is the reason?

Exercises

457

∞ Exercise 6.19 Assume an i.i.d. sequence {X i }i=1 with marginal pdf f (x) = −x+θ u(x − θ ). Show that e p

min (X 1 , X 2 , . . . , X n ) → θ

(6.E.6)

and that p

Y → 1+θ for Y =

1 n

n 

(6.E.7)

Xi .

i=1 p

Exercise 6.20 Show that max (X 1 , X 2 , . . . , X n ) −→ θ for an i.i.d. sequence ∞ {X i }i=1 with marginal distribution U (0, θ ). Exercise 6.21 Assume an i.i.d. sequence {X n }∞ n=1 with marginal cdf ⎧ ⎨ 0, x ≤ 0, F(x) = x, 0 < x ≤ 1, ⎩ 1, x > 1.

(6.E.8)

∞ Let {Yn }∞ n=1 and {Z n }n=1 be defined by Yn = max (X 1 , X 2 , . . . , X n ) and Z n = ∞ {Z n (1 − Yn ). Show that the sequence  n }n=1 converges in distribution to a random  −z u(z). variable Z with cdf F(z) = 1 − e

Exercise 6.22 For the sample space $\Omega = \{1, 2, \ldots\}$ and probability measure $P(n) = \frac{\alpha}{n^2}$, assume a sequence $\{X_n\}_{n=1}^{\infty}$ such that
\[ X_n(\omega) = \begin{cases} n, & \omega = n, \\ 0, & \omega \ne n. \end{cases} \tag{6.E.9} \]
Show that, as $n \to \infty$, $\{X_n\}_{n=1}^{\infty}$ converges to $X = 0$ almost surely, but does not converge to $X = 0$ in the mean square, i.e., $E\{(X_n - 0)^2\} \nrightarrow 0$.

Exercise 6.23 The second moment of an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ is finite. Show that $Y_n \xrightarrow{p} E\{X_1\}$ for $Y_n = \frac{2}{n(n+1)}\sum_{i=1}^{n} i X_i$.

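Exercise 6.24 For a sequence $\{X_n\}_{n=1}^{\infty}$ with
\[ P(X_n = x) = \begin{cases} 1 - \frac{1}{n}, & x = 0, \\ \frac{1}{n}, & x = 1, \end{cases} \tag{6.E.10} \]
we have $X_n \xrightarrow{r=2} 0$ because $\lim_{n\to\infty} E\{X_n^2\} = \lim_{n\to\infty} \frac{1}{n} = 0$. Show that the sequence $\{X_n\}_{n=1}^{\infty}$ does not converge almost surely.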


Exercise 6.25 Consider a sequence $\{X_i\}_{i=1}^{n}$ with a finite common variance $\sigma^2$. When the correlation coefficient between $X_i$ and $X_j$ is negative for every $i \ne j$, show that the sequence $\{X_i\}_{i=1}^{n}$ follows the weak law of large numbers. (Hint. Assume $Y_n = \frac{1}{n}\sum_{k=1}^{n}(X_k - m_k)$ for a sequence $\{X_i\}_{i=1}^{\infty}$ with mean $E\{X_i\} = m_i$. Then, it is known that a necessary and sufficient condition for $\{X_i\}_{i=1}^{\infty}$ to satisfy the weak law of large numbers is that
\[ E\left\{\frac{Y_n^2}{1 + Y_n^2}\right\} \to 0 \tag{6.E.11} \]
as $n \to \infty$.)

Exercise 6.26 For an i.i.d. sequence $\{X_i\}_{i=1}^{n}$, let $E\{X_i\} = \mu$, $\mathrm{Var}\{X_i\} = \sigma^2$, and $E\{X_i^4\} < \infty$. Find the constants $a_n$ and $b_n$ such that $\frac{V_n - a_n}{b_n} \xrightarrow{l} Z$ for $V_n = \sum_{k=1}^{n}(X_k - \mu)^2$, where $Z \sim \mathcal{N}(0, 1)$.

Exercise 6.27 When the sequence $\{X_k\}_{k=1}^{\infty}$ with $P(X_k = \pm k^{\alpha}) = \frac{1}{2}$ satisfies the strong law of large numbers, obtain the range of $\alpha$.

Exercise 6.28 Assume a Cauchy random variable $X$ with pdf
\[ f_X(x) = \frac{a}{\pi(x^2 + a^2)}. \tag{6.E.12} \]
(1) Show that the cf is $\varphi_X(w) = e^{-a|w|}$.
(2) Show that the sample mean of $n$ i.i.d. Cauchy random variables is a Cauchy random variable.

Exercise 6.29 Assume an i.i.d. sequence $\{X_i \sim P(0.02)\}_{i=1}^{100}$. For $S = \sum_{i=1}^{100} X_i$, obtain the value $P(S \ge 3)$ using the central limit theorem and compare it with the exact value.

Exercise 6.30 Consider the sequence of cdf's
\[ F_n(x) = \begin{cases} 0, & x \le 0, \\ x\left\{1 - \frac{\sin(2n\pi x)}{2n\pi x}\right\}, & 0 < x \le 1, \\ 1, & x \ge 1, \end{cases} \tag{6.E.13} \]
among which four are shown in Fig. 6.3. Obtain $\lim_{n\to\infty} F_n(x)$ and discuss if $\frac{d}{dx}\left\{\lim_{n\to\infty} F_n(x)\right\}$ is the same as $\lim_{n\to\infty}\left\{\frac{d}{dx} F_n(x)\right\}$.
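For Exercise 6.29, the comparison between the exact value and the central limit theorem value can be set up in a few lines. The Python sketch below is only one possible way to organize the computation. It assumes that $P(0.02)$ denotes the Poisson distribution with parameter 0.02 (the notation used elsewhere in this book), so that $S$ is Poisson with mean and variance 2; the continuity-corrected variant is an addition for comparison and is not asked for in the exercise. All function and variable names are illustrative.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(S <= k) for S ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

def std_normal_cdf(x: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, rate = 100, 0.02
lam = n * rate  # assumption: S ~ Poisson(2), so mean = variance = 2

# Exact value: P(S >= 3) = 1 - P(S <= 2).
exact = 1.0 - poisson_cdf(2, lam)

# Central limit theorem value: treat (S - 2)/sqrt(2) as standard normal.
clt = 1.0 - std_normal_cdf((3 - lam) / math.sqrt(lam))

# Same approximation with a continuity correction (threshold 2.5 instead of 3).
clt_cc = 1.0 - std_normal_cdf((2.5 - lam) / math.sqrt(lam))

print(exact, clt, clt_cc)
```

With a mean as small as 2 the normal approximation is visibly rough, which is the point of the comparison requested in the exercise.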

Fig. 6.3 Four cdf's $F_n(x) = x\left\{1 - \frac{\sin(2n\pi x)}{2n\pi x}\right\}$ for $n = 1, 4, 16$, and $64$ on $x \in [0, 1]$ (panels $F_1(x)$, $F_4(x)$, $F_{16}(x)$, and $F_{64}(x)$)

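Exercise 6.31 Assume an i.i.d. sequence $\{X_i \sim \chi^2(1)\}_{i=1}^{\infty}$. Then, we have $S_n \sim \chi^2(n)$, $E\{S_n\} = n$, and $\mathrm{Var}\{S_n\} = 2n$. Thus, letting $Z_n = \frac{1}{\sqrt{2n}}(S_n - n) = \sqrt{\frac{n}{2}}\left(\frac{S_n}{n} - 1\right)$, the mgf $M_n(t) = E\{e^{tZ_n}\} = \exp\left(-t\sqrt{\frac{n}{2}}\right)\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-\frac{n}{2}}$ of $Z_n$ can be obtained as
\[ M_n(t) = \left\{\exp\left(t\sqrt{\frac{2}{n}}\right) - t\sqrt{\frac{2}{n}}\exp\left(t\sqrt{\frac{2}{n}}\right)\right\}^{-\frac{n}{2}} \]
for $t$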
0 and < 0, respectively. (0) (0) (1) , α0(1) , α1(0) , α−2 , α−1 , α0(2) , sequence S = (x0 , x1 , . . .) → sequence α0(0) , α−1 (0) (1) (2) , α−2 , α−1 , α0(3) , α1(2) , α2(1) , α3(0) , . . .. α1(1) , α2(0) , α−3 Exercise 1.11 (1) f (n) = (k, l), where n = 2k × 3l · · · is the factorization of n in prime factors. k l (2) f (n) = (−1) , where n = 2k × 3l × 5m · · · is the factorization of n in prime m+1 factors. (3) For an element x = 0.α1 α2 · · · of the Cantor set, let f (x) = 0. α21 α22 · · · . (4) a sequence (α1 , α2 , . . .) of 0 and 1 → the number 0.α1 α2 · · · . Exercise 1.12 (1) When two intervals (a, b) and (c, d) are both finite, f (x) = x−a . c + (d − c) b−a When a is finite, b = ∞, and (c, d) is finite, f (x) = c + π2 (d − c)arctan(x − a). Similarly in other cases. (2) S1 → S2 , where S2 is an infinite sequence of 0 and 1 obtained by replacing 1 with (1, 0) and 2 with (1, 1) in an infinite sequence S1 = (a0 , a1 , . . .) of 0, 1, and 2. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 I. Song et al., Probability and Random Variables: Theory and Applications, https://doi.org/10.1007/978-3-030-97679-8


Answers to Selected Exercises

(3) Denote a number $x \in [0, 1)$ by $x = 0.a_1 a_2 \cdots$ in decimal system. Let us make consecutive 9's and the immediately following non-9 one digit into a group and each other digit into a group. For example, we have $x = 0.(1)(2)(97)(9996)(6)(5)(99997)(93)\cdots$. Write the number as $x = 0.(x_1)(x_2)(x_3)\cdots$. Then, letting $y = 0.(x_1)(x_3)(x_5)\cdots$ and $z = 0.(x_2)(x_4)(x_6)\cdots$, $f(x) = (y, z)$ is the desired one-to-one correspondence.

Exercise 1.14 From $(m, n)$ to $k$: $k = g(m, n) = m + \frac{1}{2}(m + n)(m + n + 1)$. From $k$ to $(m, n)$: $m = k - \frac{1}{2}a(a + 1)$, $n = a - m$, where $a$ is an integer such that $a(a + 1) \le 2k < (a + 1)(a + 2)$.

Exercise 1.15 The collection of intervals with rational end points in the space $\mathbb{R}$ of real numbers is countable.

Exercise 1.17 It is a rational number. It is a rational number. It is not always a rational number.

Exercise 1.18 (1) $\frac{a+b}{2}$. (2) $\frac{a+b}{2}$. (3) $a + \frac{b-a}{\sqrt{2}}$. (4) $c = \frac{2a+b}{3}$ and/or $d = \frac{a+2b}{3}$.
(5) Assume $a = 0.a_1 a_2 \cdots$ and $b = 0.b_1 b_2 \cdots$, where $a_i \in \{0, 1, \ldots, 9\}$ and $b_i \in \{0, 1, \ldots, 9\}$. Let $k = \arg\min_{i}\{b_i > a_i\}$ and $l = \arg\min_{i > k}\{b_i > 0\}$. Then, the number $c = 0.c_1 c_2 \cdots c_k \cdots c_{l-1} c_l = 0.b_1 b_2 \cdots b_k \cdots b_{l-1}(b_l - 1)$.
(6) The number $c$ that can be found by the procedure, after replacing $a$ with $g$, in (5), where $g = \frac{a+b}{2}$.

Exercise 1.19 Let $A = \{a_i\}_{i=0}^{\infty}$ and $a_i = 0.\alpha_1^{(i)}\alpha_2^{(i)}\alpha_3^{(i)}\cdots$. Assume the second player chooses $y_j = 4$ when $\alpha_{2j}^{(j)} \ne 4$ and $y_j = 6$ when $\alpha_{2j}^{(j)} = 4$. Then, for any sequence $x_1, x_2, \ldots$ of numbers the first player has chosen, the second player wins because the number $0.x_1 y_1 x_2 y_2 \cdots$ is not the same as any number $a_j$ and thus is not included in $A$.

Exercise 1.25 (1) $u(ax + b) = u\left(x + \frac{b}{a}\right)u(a) + u\left(-x - \frac{b}{a}\right)u(-a)$.
$u(\sin x) = \sum_{n=-\infty}^{\infty}\{u(x - 2n\pi) - u(x - (2n+1)\pi)\}$.
$u(e^x - \pi) = \begin{cases} 0, & x < \ln\pi, \\ 1, & x > \ln\pi \end{cases} = u(x - \ln\pi)$.
(2) $\int_{-\infty}^{x} u(t - y)\,dt = (x - y)u(x - y)$.

Exercise 1.27 $\delta'(x)\cos x = \delta'(x)$.

Exercise 1.28 $\int_{-2\pi}^{2\pi} e^{\pi x}\delta(x^2 - \pi^2)\,dx = \frac{\cosh \pi^2}{\pi}$. $\delta(\sin x) = \delta(x) + \delta(x - \pi)$.

Exercise 1.29 $\int_{-\infty}^{\infty}(\cos x + \sin x)\,\delta'(x^3 + x^2 + x)\,dx = 1$.

Exercise 1.31

∞  ∞ (1) 1 + n1 , 2 n=1 → (1, 2). (2) 1 + n1 , 2 n=1 → (1, 2].



∞   (3) 1, 1 + n1 n=1 → (1, 1] = ∅. (4) 1, 1 + n1 n=1 → [1, 1] = {1}.  

 ∞ ∞ (5) 1 − n1 , 2 n=1 → [1, 2). (6) 1 − n1 , 2 n=1 → [1, 2].     ∞ ∞ (7) 1, 2 − n1 n=1 → (1, 2). (8) 1, 2 − n1 n=1 → [1, 2). 1 1 Exercise 1.32 0 lim f n (x)d x = 0. lim 0 f n (x)d x = 21 . n→∞  1 n→∞ b Exercise 1.33 0 lim f n (x)d x = 0. lim 0 f n (x)d x = ∞. n→∞

n→∞


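Exercise 1.34 The number of all possible arrangements with ten distinct red balls and ten distinct black balls $= 20! \approx 2.43 \times 10^{18}$.

Exercise 1.41 When $p > 0$, ${}_pC_0 - {}_pC_1 + {}_pC_2 - {}_pC_3 + \cdots = 0$. When $p > 0$, ${}_pC_0 + {}_pC_1 + {}_pC_2 + {}_pC_3 + \cdots = 2^p$. When $p > 0$, $\sum_{k=0}^{\infty} {}_pC_{2k+1} = \sum_{k=0}^{\infty} {}_pC_{2k} = 2^{p-1}$.

Exercise 1.42 $(1 + z)^{\frac{1}{2}} = 1 + \frac{z}{2} - \frac{z^2}{8} + \frac{z^3}{16} - \cdots$.
\[ (1 + z)^{-\frac{1}{2}} = \begin{cases} 1 - \frac{1}{2}z + \frac{3}{8}z^2 - \frac{5}{16}z^3 + \frac{35}{128}z^4 - \cdots, & |z| < 1, \\ z^{-\frac{1}{2}} - \frac{1}{2}z^{-\frac{3}{2}} + \frac{3}{8}z^{-\frac{5}{2}} - \frac{5}{16}z^{-\frac{7}{2}} + \frac{35}{128}z^{-\frac{9}{2}} - \cdots, & |z| > 1. \end{cases} \]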
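As a small check on the Chapter 1 answers, the pairing map in the answer to Exercise 1.14, $k = g(m, n) = m + \frac{1}{2}(m+n)(m+n+1)$ with inverse determined by the integer $a$ satisfying $a(a+1) \le 2k < (a+1)(a+2)$, can be exercised in Python. This is a sketch under one assumption of ours: $a$ is computed as $\lfloor(\sqrt{8k+1} - 1)/2\rfloor$, which is one way to find the integer described in the answer. The function names `pair` and `unpair` are illustrative.

```python
from math import isqrt

def pair(m: int, n: int) -> int:
    """k = g(m, n) = m + (m + n)(m + n + 1)/2 for nonnegative integers m, n."""
    return m + (m + n) * (m + n + 1) // 2

def unpair(k: int):
    """Inverse map: find a with a(a+1) <= 2k < (a+1)(a+2), then m and n."""
    a = (isqrt(8 * k + 1) - 1) // 2   # assumption: closed form for the integer a
    m = k - a * (a + 1) // 2
    n = a - m
    return m, n

# Round trip on a small square of index pairs.
ok = all(unpair(pair(m, n)) == (m, n) for m in range(50) for n in range(50))
print(ok)
print([unpair(k) for k in range(6)])
```

The list printed last enumerates the pairs along successive anti-diagonals $m + n = 0, 1, 2, \ldots$, which is the ordering the formula encodes.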

Chapter 2 Fundamentals of Probability

Exercise 2.1 $\mathcal{F}(C) = \{\emptyset, \{a\}, \{b\}, \{a, b\}, \{c, d\}, \{b, c, d\}, \{a, c, d\}, S\}$.

Exercise 2.2 $\sigma(C) = \{\emptyset, \{a\}, \{b\}, \{a, b\}, \{c, d\}, \{b, c, d\}, \{a, c, d\}, S\}$.

Exercise 2.3 (1) Denoting the lifetime of the battery by $t$, $S = \{t : 0 \le t < \infty\}$.
(2) $S = \{(n, m) : (0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)\}$.
(3) $S = \{(1, \text{red}), (2, \text{red}), (3, \text{green}), (4, \text{green}), (5, \text{blue})\}$.

Exercise 2.4 $P(AB^c + BA^c) = 0$ when $P(A) = P(B) = P(AB)$.

Exercise 2.5 $P(A \cup B) = \frac{1}{2}$. $P(A \cup C) = \frac{2}{3}$. $P(A \cup B \cup C) = 1$.

Exercise 2.6 (1) $C = A^c \cap B$.

Exercise 2.7 The probability that red balls and black balls are placed in an alternating fashion $= \frac{2 \times 10! \times 10!}{20!} \approx 1.08 \times 10^{-5}$.

Exercise 2.8 $P(\text{two nodes are disconnected}) = p^2(2 - p)$.

Exercise 2.12 Buying 50 tickets in one week brings us a higher probability of getting the winning ticket than buying one ticket over 50 weeks ($\frac{1}{2}$ versus $1 - \left(\frac{99}{100}\right)^{50} \approx 0.395$).

Exercise 2.13 $\frac{1}{6}$.

Exercise 2.14 $\frac{1}{2}$.

Exercise 2.15 $P\left(C \cap (A - B)^c = \emptyset\right) = \frac{5}{8}$.

Exercise 2.16 $p_{n,A} = -\frac{1}{4}\left(-\frac{1}{3}\right)^{n-1} + \frac{1}{4}$. $p_{n,B} = \frac{1}{12}\left(-\frac{1}{3}\right)^{n-1} + \frac{1}{4}$.
$p_{10,A} = \frac{1}{4}\left\{1 - \left(-\frac{1}{3}\right)^{9}\right\} \approx 0.250013$. $p_{10,B} = \frac{1}{4}\left\{1 - \left(-\frac{1}{3}\right)^{10}\right\} \approx 0.249996$.

Exercise 2.17 (1), (2) probability of no match
\[ = \begin{cases} 0, & N = 1, \\ \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^N}{N!}, & N = 2, 3, \ldots. \end{cases} \]
(3) probability of $k$ matches
\[ = \begin{cases} \frac{1}{k!}\left\{\frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{N-k}}{(N-k)!}\right\}, & k = 0, 1, \ldots, N - 2, \\ 0, & k = N - 1, \\ \frac{1}{k!}, & k = N. \end{cases} \]


Exercise 2.18 43 . Exercise 2.19 (1) α = p 2 . P((k, m) : k ≥ m) = 2−1 p . (2) P((k, m) : k + m = r ) = p 2 (1 − p)r −2 (r − 1). (3) P((k, m) : k is an odd number) = 2−1 p . 3 . P(A|B) = 35 . P(B|A) = 37 . Exercise 2.20 P(A ∩ B) = 10 398 Exercise 2.21 Probability that only two will hit the target = 1000 . 1−s−q+qr p c c |A . P . = 1 − q or s−r Exercise 2.22 (4) P (B | A) = 1 − p or (B ) r 1−r (1−q)(1−r ) s−r p (1− p)r 1−s−q+qr c c P (A | B) = or s . P ( A |B ) = 1−s or 1−s . s (5) S = {A defective element is identified to be defective, A defective element is identified to be functional, A functional element is identified to be defective, A functional element is identified to be functional}.   Exercise 2.25 (1) Ai and A j are independent if P ( A  i ) = 0 or P A j = 0. Ai and A j are not independent if P ( Ai ) = 0 and P A j = 0. (2) partition of B = {B A1 , B A2 , . . . , B An }. 3 Exercise 2.26 P(red ball) = 25 × 21 + 15 × 21 = 10 . 2n+1 Exercise 2.27 P(red ball) = 3n .   n k−1 n . Exercise 2.28 pn,k = P(ends in k trials) = 1 − 3n−1 3n−1 . Exercise 2.29 Probability of Candidate A leading always = n−m n+m  1 2n −1 Exercise 2.30 β0 = 4 . n−k Exercise 2.31 P(A) = n Ck 23n . Exercise 2.32 (1) p10 = 1 − p11 . p01 = 1 − p00 . (2) P(error) = (1 − p00 ) (1 − p) + (1 − p11 ) p. (3) P(Y = 1) = pp11 + (1 − p) (1 − p00 ). P(Y = 0) = p (1 − p11 ) + (1 − p) p00 . ⎧ pp11 , j = 1, k = 1, ⎪ p)(1− p00 ) ⎪ pp11 +(1− ⎪ p(1− p11 ) ⎨ , j = 1, k = 0, p11 )+(1− p) p00 (4) P ( X = j| Y = k) = p(1− (1− p)(1− p00 ) , ⎪ pp11 +(1− p)(1− p00 ) j = 0, k = 1, ⎪ ⎪ ⎩ (1− p) p00 , j = 0, k = 0. p(1− p11 )+(1− p) p00 1 (5) P (Y = 1) = P (Y = 0) = 2 . P ( X = 1| Y = 0) = P ( X = 0| Y = 1) = 1 − p11 . P ( X = 1| Y = 1) = P ( X = 0| Y = 0) = p11 . . α0,1 = m(n−m) . α1,0 = m(n−m) . Exercise 2.33 (1) α1,1 = m(m−1) n(n−1) n(n−1) n(n−1) (n−m)(n−m−1) α0,0 = . n(n−1) . α˜ 1,0 = m(n−m) . α˜ 0,0 = (n−m) . (2) α˜ 1,1 = mn 2 . α˜ 0,1 = m(n−m) n2 n2 n2 (n−m)(n−m−1) 2m(n−m) m(m−1) . β1 = n(n−1) . β2 = n(n−1) . 
(3) β0 = n(n−1) 15 143 8 . P0 = 280 . P1 = 122 . (2) P3 = 15 . Exercise 2.34 (1) P2 = 280 280 r (1− p)( pb) Exercise 2.35 (1) P(r brown eye children) = (1− . r +1 p+ pb) 2

2

r

p) p (2) P(r boys) = 2(1− . (2− p)r +1 (3) P(at least two boys|at least one boy) =

p . 2− p


Exercise 2.37


k N q − qp p N . pk = 1− qp 3 7 . (2) p2 = 10 . (1) p1 = 10 1 c = 5. (2) 3c11 = 2c12 + c13 .

(3) $P(\text{15 red, 10 white} \mid \text{red flower}) = \frac{1}{2}$. Exercise 2.38 Exercise 2.39 Exercise 2.41

Exercise 2.42 (1) $B_1$ and $R$ are independent of each other. $B_1$ and $G$ are independent of each other.
(2) $B_2$ and $R$ are not independent of each other. $B_3$ and $G$ are not independent of each other.

Exercise 2.43 $A_1$, $A_2$, and $A_3$ are not mutually independent.

Exercise 2.44 $A$ and $C$ are not independent of each other.

Exercise 2.45 Probability of meeting $= \frac{60 \times 60 - \frac{50 \times 50}{2} \times 2}{60 \times 60} = \frac{11}{36} \approx 0.3056$.

Exercise 2.46 $p_1 = \frac{1}{2}$. $p_2 = \frac{1}{3}$.

Exercise 2.47 $P(\text{red ball} \mid \text{given condition}) = \frac{23}{40} = 0.575$.

Exercise 2.48 (2) $P(\text{one person wins eight times}) = {}_{10}C_8\left(\frac{1}{4}\right)^{8}\left(1 - \frac{1}{4}\right)^{10-8} = \frac{405}{4^{10}} \approx 3.8624 \times 10^{-4}$.
$P(\text{one person wins at least eight times}) = \sum_{k=8}^{10} {}_{10}C_k\left(\frac{1}{4}\right)^{k}\left(1 - \frac{1}{4}\right)^{10-k} \approx 4.1580 \times 10^{-4}$.

Exercise 2.50 Probability that a person playing piano is a man $= \frac{1}{2}$.

Exercise 2.51 $\binom{50}{30, 15, 5}\,0.5^{30}\,0.3^{15}\,0.2^{5} = \frac{50!}{30!\,15!\,5!}\,\frac{3^{15}}{2^{30}\,10^{15}\,5^{5}} \approx 3.125 \times 10^{-4}$.
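The two values quoted in the answer to Exercise 2.48 (2) can be reproduced with exact rational arithmetic. The following Python sketch assumes only what the answer itself states, namely that the number of wins is binomial with 10 trials and success probability $\frac{1}{4}$; the helper name `binom_pmf` is ours.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n: int, k: int, p: Fraction) -> Fraction:
    """Binomial pmf nCk p^k (1 - p)^(n - k), kept exact with Fraction."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = Fraction(1, 4)

# P(one person wins exactly eight times out of ten).
exactly_8 = binom_pmf(10, 8, p)

# P(one person wins at least eight times out of ten).
at_least_8 = sum(binom_pmf(10, k, p) for k in range(8, 11))

print(exactly_8, float(exactly_8))
print(at_least_8, float(at_least_8))
```

Keeping the values as fractions makes it easy to see that the first probability is exactly $405/4^{10}$, as written in the answer.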

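Chapter 3 Random Variables

Exercise 3.2 $F_{g(X)}(y) = \begin{cases} F_X(y + c), & y \ge 0, \\ F_X(y - c), & y < 0. \end{cases}$

Exercise 3.3 $F_Y(y) = \begin{cases} F_X(y - c), & y \ge c, \\ F_X(0) - P(X = 0), & -c \le y < c, \\ F_X(y + c), & y < -c. \end{cases}$

Exercise 3.4 Denoting the solutions to $y = a\sin(x + \theta)$ by $\{x_i\}_{i=1}^{\infty}$, $f_Y(y) = \frac{1}{\sqrt{a^2 - y^2}}\sum_{i=1}^{\infty} f_X(x_i)$.

Exercise 3.5 For $G \ne 0$ and $B \ne 0$, $p_X(k) = \begin{cases} {}_nC_k\left(\frac{G}{G+B}\right)^{k}\left(\frac{B}{G+B}\right)^{n-k}, & k = 0, 1, \ldots, n, \\ 0, & \text{otherwise}. \end{cases}$
For $G = 0$, $p_X(k) = \begin{cases} 1, & k = 0, \\ 0, & \text{otherwise}. \end{cases}$
For $B = 0$, $p_X(k) = \begin{cases} 1, & k = n, \\ 0, & \text{otherwise}. \end{cases}$

Exercise 3.6 pdf: $f_Y(y) = \begin{cases} \frac{1}{3\sqrt{y-1}}, & 1 < y \le 2; \\ \frac{1}{6\sqrt{y-1}}, & 2 < y \le 5; \\ 0, & \text{otherwise}. \end{cases}$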


(In the pdf, the set {1 < y ≤ 2, 2 < y ≤ 5} can be replaced with {1 < y ≤ 2, 2 < y < 5}, {1 < y < 2, 2 ≤ y ≤ 5}, {1 < y < 2, 2 ≤ y < 5}, {1 < y < 2, 2 < y ≤ 5}, or {1 < y <  2, 2 < y < 5}.) 2√ 0, y ≤ 1; y − 1, 1 ≤ y ≤ 2; 3 √ cdf: FY (y) = 1 1 + y − 1, 2 ≤ y ≤ 5; 1, y ≥ 5. 3 3 ⎧ 1 0, y < −18; , −18 ≤ y < −2; ⎨ 7 Exercise 3.7 cdf FY (y) = 73 , −2 ≤ y < 0; 47 , 0 ≤ y < 2; ⎩6 , 2 ≤ y < 18; 1, y ≥ 18. 7  −1 1 = n−2 , n = 3, 4, . . .. Exercise 3.8 E X Exercise 3.9 f Y (y) = FX (y)δ(y) + f X (y)u(y) = FX (0)δ(y) + f X (y)u(y), where FX is the cdf of X . ⎧ y ≤ 0, ⎪ ⎨ 0,  √  θ 1 Exercise 3.10 f Y (y) = e√ y cosh  θ y , 0 < y ≤ θ2 , √ ⎪ θ 1 ⎩ √ exp −θ y , y > 2 . 2e y θ Exercise 3.11 FX |b a} = 1+a 2 12 1 Exercise 3.13 P(950 ≤ R ≤ 1050) = 2 . Exercise 3.14 Let X be the time to take to the location of the appointment with cdf FX . Then, departing t ∗ minutes before the appointment time will incur the minimum k . cost, where FX (t ∗ ) = k+c Exercise 3.15 P (X ≤ α) = 13 u(α) + 23 u(α − π). P (2 ≤ X < 4) = 23 . P (X ≤ 0) = 13 .     Exercise 3.16 P (U > 0) = 21 . P |U | < 13 = 13 . P |U | ≥ 43 = 41 . 1  1 P 3 < U < 21 = 12 . 1 1 Exercise 3.17 P(A) = P(A ∪ B) = 32 . P(B) = P(A ∩ B) = 1024 . 1023 P (B c ) = 1024 . Exercise 3.18 (1) When L 1 = max (0, w A + w B − N ) and U1 = min (w A , w B ), ( N )( N −d )( N −w A ) for d = L 1 , L 1 + 1, . . . , U1 , P(D = d) = d wNA −d N w B −d . (w A )(w B ) (2) When L 2 = max (0, k − w B ), U2 = min (w A − d, k − d), L 3 = L 1 and U3 = min (w A + w B , N ), for k = L 3 , L 3 + 1, . . . , U3 U1  U2    N N −d  N −w A w A −d  P(K = k) = N 1 N d w i w B −d A −d (w A )(w B ) d=L 1 i=L 2  w B −d  w −k+2i (1 − p)k+w A −2d−2i . × k−d−i p B ⎧1 2 ⎨ 6 , k = 0, , d = 0, (3) P(D = d) = 13 P(K = k) = 23 , k = 1, , d = 1. ⎩1 3 , k = 2. 6 Exercise 3.19 (1) c = 2. (2) E{X } = 2. Exercise 3.20 (1) P(0 < X < 1) = 14 . P(1 ≤ X < 1.5) = 38 . 19 (2) μ = 24 . σ 2 = 191 . 576 2


 Exercise 3.21 (1) FW (w) = (2) W ∼ U [a, b). Exercise 3.22 f Y (y) = Exercise 3.23 f Y (y) =

1 (1−y)2 1 (1+y)2

0, w < a; 1, w ≥ b.

w−a , b−a

a ≤ w < b;

   u(y) − u y − 21 . y . f X 1+y

1 When X ∼ U [0, 1), f Y (y) = (1+y) 2 u(y). Exercise 3.25 f Z (z) = u(z) − u(z − 1). 2 2 Exercise 3.26 f Y (t) = (t+1) 2 u(t − 1). f Z (s) = (1−s)2 {u(s + 1) − u(s)}. Exercise 3.27 f Y (y) = (2 − 2y)u(y)u(1 − y). Exercise 3.28 (1) f Y (y) = √ 12 2 u (a − |y|).

(2) f Y (y) = √2 π

π

a −y

u(y)u(1 − y). 2

⎧ 0, y ≤ −1, ⎪ ⎪ ⎨4 4 −1 − cos y, −1 ≤ y ≤ 0, (3) With 0 ≤ cos−1 y ≤ π, FY (y) = 3 3π 2 −1 ⎪ 1 − 3π cos y, 0 ≤ y ≤ 1, ⎪ ⎩ 1, y ≥ 1. ∞ 1  1 Exercise 3.29 (1) f Y (y) = 1+y 2 . f X (xi ). (2) f Y (y) = π 1+y ( 2) i=1 1 (3) f Y (y) = π 1+y . ( 2) Exercise 3.30 When X ∼ U [0,1),  Y = − λ1 ln(1 − X ) ∼ FY (y) = 1 − e−λy u(y). Exercise 3.31 expected value: E{X } = 3.5. mode: 1, 2, . . . , or 6. median: any real number in the interval [3, 4]. = b. Exercise 3.32 c = 5 < 101 7 Exercise 3.33 E{X } = 0. Var{X } = λ22 . α Exercise 3.34 E{X } = α+β . Var{X } = (α+β)2αβ . (α+β+1) Exercise 3.36 f Y (y) = u(y + 2) − u(y + 1). f Z (z) = 21 {u(z + 4) − u(z + 2)}. f W (w) = 21 {u(w + 3) − u(w + 1)}. Exercise 3.37 pY (k) = 41 , k = 3, 4, 5, 6. p Z (k) = 41 , k = −1, 0, 1, 2. pW (r ) = 41 , r = ± 13 , 0, 15 . c   − c − 0 FX (x)d x, c ≥ 0, Exercise 3.39 (2) E X c = c, c < 0. c ∞  + {1 − FX (x)}d x + 0 FX (x)d x, c ≥ 0, (3) E X c = 0∞ c < 0. 0 {1 − FX (x)}d x, Exercise 3.40 (1) E{X } = λμ. Var{X } = (1 + λ)λμ.   Exercise 3.41 f X (x) = 4πρx 2 exp − 43 πρx 3 u(x). 1 Exercise 3.42 A = 16 . P(X ≤ 6) = 78 . Exercise 3.44 E{F(X )} = 21 . 1 Exercise 3.45 M(t) = t+1 . ϕ(ω) = 1+1jω . m 1 = −1. m 2 = 2. m 3 = −6. m 4 = 24. π t Exercise 3.46 mgf M(t) = π2 02 (tan x) π d x. Exercise 3.47 α = 2n−1 B|β| ˜(n ,n ). 2 2 1−y


2 2 √ Exercise 3.48 M R (t) = 1 + 2πσt exp σ 2t Φ (σt), where Φ is the standard normal cdf.  1 − 4y , 0 ≤ y < 1; 21 − 4y , 1 ≤ y < 2; Exercise 3.51 f Y (y) = 0, otherwise. Exercise 3.52 A cdf such that ∞   n      1 i 1 n+1 , (locaton of jump, height of jump) = a + (b − a) 2 2 i=1

n=0

and the interval between adjacent jumps are all the same.  Exercise 3.53 (1) a ≥ 0, a + 13 ≤ b ≤ −3a + 1 . ⎧ 1√ x, x < 0; ⎨ 0,  √ 4 √   0 ≤ x < 1; 1 (2) FY (x) = 24 11 x − 1 , 1 ≤ x < 4; 18 x + 5 , 4 ≤ x < 9; ⎩ 1, x ≥ 9. 1 P(Y = 1) = 6 . P(Y = 4) = 0.  5√ x, 0 ≤ x < 1; 0, 8  x < 0; Exercise 3.54 FY (x) = 1 √ x + 4 , 1 ≤ x < 16; 1, x ≥ 16. ⎧8 F (α), y < −2 or y > 2, ⎪ X ⎪ ⎨ −2 ≤ y < 0, FX (−2) + p X (1), Exercise 3.57 FY (y) = FX (−2) + p X (0) + p X (1), 0 ≤ y < 2, ⎪ ⎪ ⎩ y = 2, FX (2), where α is the only real root of y = x 3 − 3x when y > 2 or y < −2. Exercise 3.58 FY (x) = 0 for x < 0, x for 0 ≤ x < 1, and 1 for x ≥ 1. 2 Exercise 3.59 (1) For α(θ) = 21 , ϕ(ω) = exp − ω4 . 2 2 For α(θ) = cos2 θ, ϕ(ω) = exp − ω4 I0 ω4 .

(2) The normal pdf with mean 0 and variance 21 . (4) E {X 1 } = 0. Var {X 1 } = 21 . E {X 2 } = 0. Var {X 2 } = 21 .  2 Exercise 3.63 E{X } = π2 . E X 2 = π2 − 2. Exercise 3.65 Var{Y } = σ 2X + 4m +X m −X . . Exercise 3.67 pmf p(k) = (1 − α)k α for k ∈ {0, 1, . . .}: E{X } = 1−α α σ 2X = 1−α . 2 α pmf p(k) = (1 − α)k−1 α for k ∈ {1, 2, . . .}: E{X } = α1 . σ 2X = 1−α . α2 βγ αβγ(α+β−γ) Exercise 3.68 E{X } = α+β . Var{X } = (α+β)2 (α+β−1) . λn n n Exercise 3.70 For t < λ, MY (t) = (λ−t) n . E{Y } = λ . Var{Y } = λ2 .  y  2 , y ≥ 1, 1 − 2p − 2p + 1 1 . E{Y } = 2 p(1− Exercise 3.71 FY (y) = p) 0, y < 1. 

Exercise 3.74 In $b\left(10, \frac{1}{3}\right)$, $P_{10}(k)$ is the largest at $k = \left\lfloor\frac{11}{3}\right\rfloor = 3$. In $b\left(11, \frac{1}{2}\right)$, $P_{11}(k)$ is the largest at $k = 5, 6$.

Exercise 3.75 (1) $P_{01} = (0.995)^{1000} + {}_{1000}C_1(0.005)^{1}(0.995)^{999} \approx 0.0401$.
approximate value with (3.5.17): $\Phi(2.2417) - \Phi(1.7933) \approx 0.0242$.
approximate value with (3.5.18): $\Phi(2.4658) - \Phi(1.5692) \approx 0.0515$.
approximate value with (3.5.19): $\left(\frac{5^0}{0!} + \frac{5^1}{1!}\right)e^{-5} \approx 0.0404$.
(2) $P_{456} = {}_{1000}C_4(0.005)^{4}(0.995)^{996} + {}_{1000}C_5(0.005)^{5}(0.995)^{995} + {}_{1000}C_6(0.005)^{6}(0.995)^{994} \approx 0.4982$.
approximate value with (3.5.17): $2\{\Phi(0.4483) - 0.5\} \approx 0.3472$.
approximate value with (3.5.18): $2\{\Phi(0.6725) - 0.5\} \approx 0.4988$.
approximate value with (3.5.19): $\left(\frac{5^4}{4!} + \frac{5^5}{5!} + \frac{5^6}{6!}\right)e^{-5} \approx 0.4972$.

Exercise 3.77 (2) coefficient of variation $= \frac{1}{\sqrt{\lambda}}$. (3) skewness $= \frac{1}{\sqrt{\lambda}}$. kurtosis $= 3 + \frac{1}{\lambda}$.

Exercise 3.81 $f_Y(y) = \frac{y}{2}\left\{2u(y)u(1 - y) + u(y - 1)u\left(\sqrt{3} - y\right)\right\}$.
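The exact binomial values and the Poisson-type approximations quoted in the answer to Exercise 3.75 can be recomputed directly. The Python sketch below assumes, as the terms in the answer indicate, a binomial distribution with $n = 1000$ and $p = 0.005$, and reads the approximation labelled (3.5.19) as the Poisson approximation with $\lambda = np = 5$. The normal-type approximations (3.5.17) and (3.5.18) are not reproduced because their exact form is not restated in this section. The helper names are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(n: int, k: int, p: float) -> float:
    """Binomial pmf nCk p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson pmf e^{-lam} lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.005
lam = n * p  # Poisson parameter used in the approximation (assumed: lam = np)

# (1) P(0 or 1 successes): exact binomial vs. Poisson approximation.
p01_exact = sum(binom_pmf(n, k, p) for k in (0, 1))
p01_poisson = sum(poisson_pmf(k, lam) for k in (0, 1))

# (2) P(4, 5, or 6 successes): exact binomial vs. Poisson approximation.
p456_exact = sum(binom_pmf(n, k, p) for k in (4, 5, 6))
p456_poisson = sum(poisson_pmf(k, lam) for k in (4, 5, 6))

print(p01_exact, p01_poisson)
print(p456_exact, p456_poisson)
```

Both Poisson values land within about $10^{-3}$ of the exact ones, in line with the figures listed in the answer.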

Chapter 4 Random Vectors Exercise 4.2 p X (1) = 35 , p X (2) = 25 . pY (4) = 35 , pY (3) = 25 . 3 3 3 1 p X,Y (1, 4) = 10 , p X,Y (1, 3) = 10 , p X,Y (2, 4) = 10 , p X,Y (2, 3) = 10 . 1 1 3 1 pY |X (4|1) = 2 , pY |X (3|1) = 2 , pY |X (4|2) = 4 , pY |X (3|2) = 4 . p X |Y (1|4) = 21 , p X |Y (2|4) = 21 , p X |Y (1|3) = 34 , p X |Y (2|3) = 41 . 3 3 p X +Y (4) = 10 , p X +Y (5) = 25 , p X +Y (6) = 10 . Exercise 4.3 pairwise independent. not mutually independent. Exercise 4.4 a = 41 . X and Y are not independent of each other. ρ X Y = 0. 8 Exercise 4.5 p R|B=3 (0) = 27 , p R|B=3 (1) = 49 , p R|B=3 (2) = 29 , 1 p R|B=3 (3) = 27 . E{R|B = 1} = 53 . Exercise 4.8 ⎧ √ √ √1 , 0 < y1 < 1 , − y1 + 1 < y2 < y1 + 21 , ⎪ 2 y1 4 2 ⎪ ⎪ ⎨ √1 , 0 < y < 1 , √ y − 1 < y < −√ y + 1 , 1 1 2 1 y1 4 2 f Y (y1 , y2 ) = √ 2 √ 1 ⎪ √1 , 0 < y1 < 1 , − y1 − 1 < y2 < y − , 1 ⎪ 4 2 2 ⎪ ⎩ 2 y1 0, otherwise.   f Y1 (y) = √1y u(y)u 41 − y . f Y2 (y) = (1 − |y|)u(1 − |y|). ⎧ 0, w < 0, ⎪ ⎪ ⎪ ⎨ π w2 , 0 ≤ w < 1, 4 √ √ √ Exercise 4.9 FW (w) = π −1 w 2 −1 2 2 w + w − 1, 1 ≤ w < 2, − sin ⎪ 4 w ⎪ ⎪ √ ⎩ 1, w ≥ 2. ⎧π 0 ≤ w < 1, ⎪ 2 w, ⎨ √ √ π −1 w 2 −1 w, 1 ≤ w < 2, f W (w) = 2 4 − sin w ⎪ ⎩ 0, otherwise. ⎧ ⎨ 1 − e−(μ+λ)w , if w ≥ 0, v ≥ 1, μ Exercise 4.10 FW,V (w, v) = μ+λ 1 − e−(μ+λ)w , if w ≥ 0, 0 ≤ v < 1, ⎩ 0, otherwise. 2 3v Exercise 4.11 fU (v) = (1+v)4 u(v).     √   √  )  √ 1 u 1 − y1 u y2 − y1 u 1 − y2 + y1 . Exercise 4.12 (1) f Y y1 , y2 = 2u(y y1


⎧ 0 < y ≤ 1, ⎨ y, f Y1 (y) = 2√1 y u(y)u(1 − y). f Y2 (y) = 2 − y, 1 < y ≤ 2, ⎩ 0, otherwise.  √   √  (2) f Y (y1 , y2 ) = √1y1 u (y1 ) u 1 − y2 + y1 u y2 − 2 y1 . ⎧  1 0 < y2 ≤ 1, ⎨ y2 , √ − 1, 0 < y1 ≤ 1, y1 f Y2 (y2 ) = 2 − y2 , 1 < y2 ≤ 2, f Y1 (y1 ) = ⎩ 0, otherwise. 0, otherwise. Exercise 4.13 (1) f Y (y1 , y2 ) = √ y11+y2 u (y1 + y2 ) u (1 − y1 − y2 ) u (y1 − y2 ) u (1 − y1 + y2 ).√   √ 2 2y1 , 0 < y1 ≤ 21 ; 2 1 − 2y1 − 1 , 21 < y1 ≤ 1; f Y1 (y1 ) = 0, otherwise.  √  √  2 2y2 + 1, − 21 < y2 ≤ 0; 2 1 − 2y2 , 0 < y2 ≤ 21 ; f Y2 (y2 ) = 0, otherwise.   √ (2) f Y (y1 , y2 ) = √ y12+y2 u (y1 + y2 ) u (1 − y1 + y2 ) u y1 − y2 − y1 + y2 . ⎧ √  0 < y1 ≤ 21 , ⎨ 2 √8y1 + 1 − 1 , √ f Y1 (y1 ) = 2 8y1 + 1 − 1 − 4 2y1 − 1, 21 < y1 ≤ 1, ⎩ otherwise. ⎧ 0,√ 1 2y2 + 1, √ − < y2 ≤ − 18 , ⎨ 4 √ 2  1 f Y2 (y2 ) = 4 2y2 + 1 − 8y2 + 1 , − 8 < y2 ≤ 0, ⎩ 0, otherwise. z α1 +α2 −1 Exercise 4.14 f Z (z) = β α1 +α2 Γ (α1 +α2 ) exp − βz u(z). f W (w) =

Γ (α1 +α2 ) w α1 −1 Γ (α1 )Γ (α2 ) (1+w)α1 +α2

u(w).

− 21  y 2r1 − r −1 1 Exercise 4.15 (1) f Y1 (y1 ) = 2r1 u (y1 ) 1 1 y1 r y1r − y22 −y12r       1 1 fX dy2 . y1r − y22 , y2 + f X − y1r − y22 , y2 ⎧  ⎨ 1, w ≥ 1, 2w, w ∈ [0, 1], (3) For r = 21 , FW (w) = w 2 , w ∈ [0, 1], f W (w) = 0, otherwise. ⎩ 0, otherwise. ⎧  ⎨ 1, w ≥ 1, 1, w ∈ [0, 1], For r = 1, FW (w) = w, w ∈ [0, 1], f W (w) = 0, otherwise. ⎩ 0, otherwise.   0, w < 1, 0, w < 1, (w) = f For r = −1, FW (w) = W 1 − w −1 , w ≥ 1. w −2 , w > 1. Exercise 4.16 (1) f Y1 ,Y2 (y1 , y2 ) ⎧1 (y1 , y2 ) ∈ (1 : 3) ∪ (2 : 3), ⎪ 2 (y1 − |y2 |) , ⎪ ⎪ ⎪ (y1 , y2 ) ∈ (1 : 2) ∪ (2 : 1), ⎨ 1 − |y2 | , = 21 (3 − y1 − |y2 |) , (y1 , y2 ) ∈ (3 : 2) ∪ (3 : 1), ⎪ ⎪ (y1 , y2 ) ∈ (3 : 3), ⎪ 21 , ⎪ ⎩ 0, otherwise. (refer to Fig. A.1). (2) f Y2 (y) = (1 − |y|)u (1 − |y|).

Fig. A.1 The regions for $f_{Y_1,Y_2}(y_1, y_2)$ in Exercise 4.16 (regions (1 : 2), (1 : 3), (2 : 3), (3 : 3), (3 : 2), (3 : 1), and (2 : 1) in the $(y_1, y_2)$ plane)

⎧1 2 y , 0 ≤ y ≤ 1, ⎪ ⎪ ⎨2 2 −y + 3y − 23 , 1 ≤ y ≤ 2, (3) f Y1 (y) = 2 1 2 ≤ y ≤ 3, ⎪ ⎪ 2 (3 − y) , ⎩ 0, y ≤ 1, y ≥ 3. ˜ − 2). Exercise 4.17 (1) p X +Y (v) = (v − 1)(1 − α)2 αv−2 u(v |w| α u(|w|). ˜ p X −Y (w) = 1−α 1+α (2) p X −Y,X (w, x) = (1 − α)2 α2x−w−2 u(x ˜ − 1)u(x ˜ − w − 1). ˜ + y − 1)u(y ˜ − 1). p X −Y,Y (w, y) = (1 − α)2 αw+2y−2 u(w |w| x−1 α u(|w|). ˜ p (x) = (1 − α)α u(x ˜ − 1). (3) p X −Y (w) = 1−α X 1+α y−1 pY (y) = (1 − α)α u(y ˜ − 1).     − 1 u˜ v−w −1 . (4) p X +Y,X −Y (v, w) = (1 − α)2 αv−2 u˜ v+w 2 2 p X +Y (v) = (v − 1)(1 − α)2 αv−2 u(v ˜ − 2). p X −Y (w) = 1−α α|w| u(|w|). ˜ 1+α the same as the results in (1). ⎧ 1 1 , k = 3; ⎨ 64 , k = 4; 12  7 n 83 17 Exercise 4.18 E {Rn } = 6 . p2 (k) = 288 , k = 2; 36 , k = 1; ⎩ 9 , k = 0. 64 1 η0 = 3 . Exercise 4.19 E{X |Y = y} = 2 + y for y ≥ 0. Exercise 4.20 f Y ( y) = y12 y2 e−y1 u (y1 ) u (y2 ) u (1 − y2 ) u (y3 ) u (1 − y3 ).   u(y1 +y2 )       u 2 − y1 − y2 u y1 − y2 u 2 − y1 + y2 . Exercise 4.21 2 ⎧ f Y y1 , y2 = 0 < y1 ≤ 1, ⎨ y1 , f Y1 (y1 ) = 2 − y1 , 1 ≤ y1 < 2, ⎩ otherwise. ⎧ 0, ⎨ 1 + y2 , −1 < y2 ≤ 0, f Y2 (y2 ) = 1 − y2 , 0 ≤ y2 < 1, ⎩ 0, otherwise. Exercise 4.22 f Y1 ,Y2 (y1 , y2 ) = f X 1 ,X 2 (y1 cos θ − y2 sin θ, y1 sin θ + y2 cos θ).



Exercise 4.24 FX,Y |A (x, y) = ⎧ 1, region 1−3, ⎪ ⎪   ⎪ 1 π 2 2 ⎪ 1 − −xψ(x) + a , region 1−2, θ − a ⎪ x πa 2  2 ⎪  ⎪ 1 π 2 2 ⎪ 1 − −yψ(y) + a , region 1−4, θ − a ⎪ y ⎪ πa 2 2 ⎪ ⎪ 1 2 −1 y ⎪ 1 − πa 2 a cos a − yψ(y) ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ −xψ(x) + a 2 θx − π2 a 2 , region 1−5, y 1 a2 ⎪ x y − 2 θ y + 2 ψ(y) ⎪ ⎪ ⎪ πa 2 ⎪ 2 ⎪ ⎪ + x2 ψ(x) − a2 cos−1 ax + πa 2 , region 1−1 or 2−1, ⎪ ⎪ ⎪ ⎪ 2 ⎪ 1 ⎪ x y − a2 θx + x2 ψ(x) ⎪ 2 ⎪ πa ⎪ ⎪ 2 ⎪ ⎩ + 2y ψ(y) − a2 cos−1 ay + πa 2 , region 4−1 or 1−1, ⎧ 2 1 ⎪ x y − a2 cos−1 ay + 2y ψ(y) ⎪ 2 πa ⎪ ⎪ ⎪ ⎪ x a2 ⎪ + , region 2−1 or 3−1, ψ(x) + θ ⎪ x 2 ⎪ ⎪ 2 ⎪ 2 ⎪ x 1 a x −1 ⎨ 2 x y − cos + ψ(x) πa 2 a 2 2 ⎪ + 2y ψ(y) + a2 θ y , region 3−1 or 4−1, ⎪ ⎪ ⎪   ⎪ 1 π 2 2 ⎪ region 2−2, ⎪ πa 2 xψ(x) + a θx − 2 a  , ⎪ ⎪ ⎪ region 4−2, ⎪ πa1 2 yψ(y) + a 2 θ y − π2 a 2 , ⎪ ⎩ 0, otherwise. √ Here, ψ(t) = a 2 − t 2 , θw = cos−1 −ψ(w) , and ‘region’ is shown in Fig. A.2.  a  f X,Y |A (x, y) = πa1 2 u a 2 − x 2 − y 2 . v

Fig. A.2 The regions of $F_{X,Y|A}(u, v)$ in Exercise 4.24 (regions 1−1 to 1−5, 2−1 to 2−3, 3−1, 3−2, and 4−1 to 4−3 in the $(u, v)$ plane, with axis marks at $\pm a$)


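Exercise 4.28 a linear transformation transforming $X$ into an uncorrelated random vector:
\[ A = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix}. \]
a linear transformation transforming $X$ into an uncorrelated random vector with unit variance:
\[ \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} \\ \frac{1}{2\sqrt{3}} & \frac{1}{2\sqrt{3}} & \frac{1}{2\sqrt{3}} \end{pmatrix}. \]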

Exercise 4.29 f Y (y) = exp(−y)u(y). Exercise 4.30 p X (1) = 58 , p X (2) = 38 . pY (1) = 43 , pY (2) = 41 . Exercise 4.31 pmf of  M = max (X 1 ,X 2 ): m  m λk −2λ λm − λm! u(m). P(M = m) = e 2 ˜ m! k! k=0

pmf of N = min (X 1 ,X 2 ): ∞  n P(N = n) = e−2λ λn! 2

λk k!

k=n+1 = 21 {u(w)

+

λn n!

 u(n). ˜

Exercise 4.32 f W (w) − u(w − 2)}. fU (v) = u(v + 1) − u(v).  , −1 < z ≤ 0; 0, z ≤ −1 or z > 2; z+1 2 f Z (z) = 1 2−z , 0 < z ≤ 1; , 1 < z ≤ 2. 2 2 ⎧ 1  3 2 t + , − 23 ≤ t < − 21 , ⎪ 2 ⎪ ⎨ 23  13 − t 2, − 21 ≤ t < 21 , Exercise 4.33 f Y (t) = 41  . E Y 4 = 80  2 1 3 3 ⎪ ⎪ ⎩ 2 t − 2 , 2 ≤ 3t < 2 , 0, t > 2 or t < − 23 . Exercise 4.34 f Y ( y) = f X 1 (y1 ) f X 2 (y2 − y1 ) · · · f X n (yn − yn−1 ), where Y = (Y1 , Y2 , . . . , Yn ) and y = (y1 , y2 , . . . , yn ). , x = 1, 2. pY (y) = 3+2y , y = 1, 2, 3, 4. Exercise 4.35 (1) p X (x) = 2x+5 16 32 3 9 3 (2) P(X > Y ) = 32 . P(Y = 2X ) = 32 . P(X + Y = 3) = 16 . 1 P(X ≤ 3 − Y ) = 4 . (3) not independent.  0, y ≤ 0; 1 − e−y , 0 < y < 1; Exercise 4.36 f Y (y) = −y (e − 1)e , y ≥ 1.    Exercise 4.37 (1) MY (t) = exp 7 et − 1 . (2) Poisson distribution P(7). x+y+z Exercise 4.38 k = 23 . f Z |X,Y (z|x, y) = x+y+ 1 , 0 ≤ x, y, z ≤ 1. 2

Exercise 4.39 E{exp(−Λ)|X = 1} = 49 . (x) fU2 (y − x) fU3 (z − y). Exercise 4.40 f X,Y,Z (x, y, z) = fU 1   3 Exercise 4.41 (1) f X,Y (x, y) = 2π 1 − x 2 − y 2 u 1 − x 2 − y 2 .   f X (x) = 43 1 − x 2 u(1 − |x|).   1 2 2 2 (2) f X,Y |Z (x, y|z) = π 1−z . 2 u 1 − x − y − z ( ) not independent of each other.  x+r 2 , −r ≤ x ≤ 0, Exercise 4.42 (1) c = 2r12 . f X (x) = r r−x , 0 ≤ x ≤ r. r2 u(z)u(r − z). (2) not independent of each other. (3) f Z (z) = 2z r2


 √ 8 1 + 2x 2 1 − x 2 u(x)u(1 − x). Exercise 4.44 (1) c = π8 . f X (x) = f Y (x) = 3π X and Y are not independent of each other. (2) f R,θ (r, θ) = π8 r 3 , 0 ≤ r < 1, 0 ≤ θ < π2 . (3) p Q (q) = 18 , q = 1, 2, . . . , 8. Exercise 4.45 probability that the battery with pdf of lifetime f lasts longer than μ . When λ = μ, the probability is 21 . that with g= λ+μ Exercise 4.46 (1) fU (x) = xe−x u(x). f V (x) = 21 e−|x| . ∞ g u(w) f X Y (g) = 0 e−x e− x x1 d xu(g). f YX (w) = (1+w) 2. X W For Z = X +Y = 1+W , f Z (z) = u(z)u(1 − z).   f min(X,Y ) (z) = 2e−2z u(z). f max(X,Y ) (z) = 2 1 − e−z e−z u(z). 2 1 f min(X,Y ) (x) = (1+x) 2 u(x)u(1 − x). (2) f V |U (x|y) = 2y u(y)u(|y| − x). max(X,Y ) Exercise 4.48 (1) E{M} = 1. Var{M}  = 1.

n  1− p 1− p 1 n − 1 − , p = 21 , 2 p−1 p Exercise 4.49 expected value= 2 p−1 p = 21 . n2, 1+2 p− p2 . p2 (2− p) 2− p1 p2 + p12 p2 2− p p + p p2 Exercise 4.51 (1) μ1 = 2 p p − p2 p − p p2 + p2 p2 . μ2 = 2 p p − p2 1p 2− p 1p22+ p2 p2 . 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 p + p − p2 p + p2 p2 p + p − p p2 + p2 p2 (2) h 1 = 2 p p1 − p22 p −1 p2 p2 +1 p22 p2 . h 2 = 2 p p1 − p22 p −1 p2 p2 +1 p22 p2 . 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 p12 −2 p12 p2 + p1 p2 p12 (1− p2 ) Exercise 4.52 α1 = {1−(1− p1 )(1− p2 )}2 . α2 = {1−(1− p1 )(1− p2 )}2 . 1 Exercise 4.53 integral equation: g(x) = 1 + x g(y)dy. g(x) = e1−x . Exercise 4.54 (1) g(k) = g(k − 1)q + g(k − 2) pq + g(k − 3) p 2 q + δk3 p 3 . p3 s 3 1+ p+ p2 (2) G X (s) = 1−qs− pqs . 2 − p 2 qs 3 . (3) E{X } = p3 ⎧

Exercise 4.50 E{N } =

Exercise 4.55 FX,Y |A (x, y) = f X,Y |A (x, y) =

f X,Y (x,y) u FX (x2 )−FX (x1 )

⎪ ⎨ 0, ⎪ ⎩

FX,Y (x,y)−FX,Y (x1 ,y) , FX (x2 )−FX (x1 ) FX,Y (x2 ,y)−FX,Y (x1 ,y) , FX (x2 )−FX (x1 )

x < x1 , x1 ≤ x < x2 , x ≥ x2 .

(x − x1 ) u (x2 − x). 1 3 2 p X,Y (3, 0) = 12 , p X,Y (4, 0) = 12 , p X,Y (5, 0) = 12 , Exercise 4.56 1 3 2 p X,Y (3, 1) = 12 , p X,Y (4, 1) = 12 , p X,Y (5, 1) = 12 . 1 , x = 3, 4, 5; z = x − 1, x, p Z |X (z|x) = 2 0, otherwise. ⎧ 1 , x = 3, z = 2, 3, ⎪ ⎪ ⎨ 12 3 , x = 4, z = 3, 4, p X,Z (x, z) = 12 2 , x = 5, z = 4, 5, ⎪ ⎪ ⎩ 12 0, otherwise. p X |Z (3|2) = 1, p X |Z (3|3) = 14 , p X |Z (3|4) = 0, p X |Z (3|5) = 0, p X |Z (4|2) = 0, p X |Z (4|3) = 43 , p X |Z (4|4) = 35 , p X |Z (4|5) = 0, p X |Z (5|2) = 0, p X |Z (5|3) = 0, p X |Z (5|4) = 25 , p X |Z (5|5) = 1. p Z |Y (2|0) = 0, p Z |Y (3|0) = 16 , p Z |Y (4|0) = 36 , p Z |Y (5|0) = 13 , p Z |Y (2|1) = 16 , p Z |Y (3|1) = 36 , p Z |Y (4|1) = 13 , p Z |Y (5|1) = 0. 1 3 2 , pY,Z (0, 4) = 12 , pY,Z (0, 5) = 12 , pY,Z (0, 2) = 0, pY,Z (0, 3) = 12 1 3 2 pY,Z (1, 2) = 12 , pY,Z (1, 3) = 12 , pY,Z (1, 4) = 12 , pY,Z (1, 5) = 0.


pY |Z (0|2) = 0, pY |Z (0|3) = 14 , pY |Z (0|4) = 35 , pY |Z (0|5) = 1, pY |Z (1|2) = 1, pY |Z (1|3) = 43 , pY |Z (1|4) = 25 , pY |Z (1|5) = 0. 1 1 1 Exercise 4.57 (1) E{U } = λ1 +λ . E{V − U } = λ11 − λ12 + 2λ . λ2 λ1 +λ2 2 λ1 λ1 1 1 E{V } = λ1 + λ2 (λ1 +λ2 ) . (2) E{V } = λ1 + λ2 (λ1 +λ2 ) .  (3) fU,V −U,I (x, y, i) = λ1 λ2 e−(λ1 +λ2 )x δ(i − 1)e−λ2 y + δ(i − 2)e−λ1 y × u(x)u(y). (4) independent. Exercise 4.58 f X (x) = ΓΓ((pp11)Γ+(pp22++pp33)) x p1 −1 (1 − x) p2 + p3 −1 u(x)u(1 − x). f Y (y) = ΓΓ((pp21)Γ+(pp21++pp33)) y p2 −1 (1 − y) p3 + p1 −1 u(y)u(1 − y). f X |Y (x|y) = ΓΓ((pp11)Γ+(pp33)) x p1 −1 (1 − x − y) p3 −1 (1 − y)1− p1 − p3 u(x)u(y)u(1 − x − y). f Y |X (y|x) = ΓΓ((pp22)Γ+(pp33)) y p2 −1 (1 − x − y) p3 −1 (1 − x)1− p2 − p3 u(x)u(y)u(1 − x − y). p2 −1 (1 − z) p3 −1 x p1 −1 (1 − x) p2 + p3 −1 u(x)u(z)u(1 − x)u(1 − z). 1 + p2 + p3 ) f Y ,X (z, x) = Γ Γ( p(1p)Γ ( p2 )Γ ( p3 ) z 1−X

Γ ( p2 + p3 ) p2 −1 Y f 1−X (1 − z) p3 −1 u(x)u(z)u(1 − x)u(1 − z). | X (z|x) = Γ ( p2 )Γ ( p3 ) z Exercise 4.59 FX,Y |B (x, y) ⎧ 1, x ≥ 1, y ≥ 1, ⎪ ⎪ ⎪ ⎪ 0, x ≤ −1, y ≤ −1, ⎪ ⎪ ⎪ ⎪ or y ≤ −x − 1, ⎪ ⎪ ⎪ 1 2 ⎪ (x + 1) , −1 ≤ x ≤ 0, y ≥ x + 1, ⎪ 2 ⎪ ⎪ 1 2 ⎪ (y + 1) , −1 ≤ y ≤ 0, y ≤ x − 1, ⎪ 2  ⎪ ⎪ 1 ⎪ 0 ≤ x ≤ 1, y ≥ 1, ⎨ 2 2 − (1 − x)2 , = 21 2 − (1 − y)2 , x ≥ 1, 0 ≤ y ≤ 1, ⎪ 1 2 2 ⎪ 2 − (1 − x) , x ≤ 1, y ≤ 1, y ≥ −x + 1, − (1 − y) ⎪ ⎪ 2 ⎪ 1 2 ⎪ (x + y + 1) , x ≤ 0, y ≤ 0, y ≥ −x − 1, ⎪ ⎪ 4  ⎪ 1 2 2 ⎪ ⎪ (x + y + 1) , x ≥ 0, y ≤ 0, y ≥ x − 1, − 2x ⎪ 4  ⎪ 1 2 2 ⎪ ⎪ (x + y + 1) , x ≤ 0, y ≥ 0, y ≤ x + 1, − 2y ⎪ 4  ⎪ 1 2 ⎪ ⎪ (x + y + 1) ⎪   ⎩4 x ≥ 0, y ≥ 0, y ≤ −x + 1. −2 x 2 + y 2 , 1 f X,Y |B (x, y) = 2 u (1 − |x| − |y|). Exercise 4.60 FX,Y |A (x, y) ⎧ 1, region 1−3, ⎪ ⎪ ⎪ ⎪ 1 − 2a1 4 ψ 4 (x), region 1−2, ⎪ ⎪ ⎪ ⎪ 1 − 2a1 4 ψ 4 (y), region 1−4, ⎪ ⎪ ⎪ 1 4 4 ⎪ 1 − ψ (x) + ψ (y) , region 1−5, 4 ⎪

2a ⎪  2    4 ⎪ ⎪ 1 2 2 2 4 ⎪ a +x +y −2 x +y , region 1−1, ⎪ 4 ⎪  ⎨ 4a   1 2 2 2 4 region 2−1, = 4a 4 ψ (x) + y − 2y , ⎪ ⎪ 1 4 ⎪ 4 ψ (x), region 2−2, ⎪ 2a  ⎪  2  ⎪ ⎪ 1 2 2 4 ⎪ ψ (y) + x region 4−1, − 2x , ⎪ 4a 4 ⎪ ⎪ ⎪ 1 4 ⎪ ψ (y), region 4−2, ⎪ 2a 4  ⎪  ⎪ ⎪ 1 2 2 2 ⎪ ψ (x) − y , region 3−1, 4 ⎪ ⎩ 4a 0, otherwise, √ 2 2 where ψ(t) = a − t (refer to Fig.  A.2). f X,Y |A (x, y) = 2|xa 4y| u ψ 2 (x) − y 2 .


n! Exercise 4.63 f X (x) = (n−i)!(i−1)! F i−1 (x){1 − F(x)}n−i f (x). n! F k−1 (y){1 − F(y)}n−k f (y). f Y (y) = (n−k)!(k−1)! Exercise 4.64 (1) p X 1 ,X 2 ,X 3 ,X 4 (x1 , x2 , x3 , x4 ) ⎧ N  { p1 (1 − p2 )}x1 {(1 − p1 ) p2 }x2 ⎪ ⎪ ⎨ x1 ,x2 ,x3 ,x4 4  = × ( p1 p2 )x3 {(1 − p1 ) (1 − p2 )}x4 , if xi = N , ⎪ ⎪ i=1 ⎩ 0, otherwise. (X 2 +X 3 )(X 1 +X 3 ) X3 3 ˆ (3) pˆ 1 = X 2X+X . p ˆ = . λ = . (4) Xˆ 4 = XX1 X3 2 . 2 X 1 +X 3 X3 3 Exercise 4.65 (1) ρ X |X | = 0. (2) ρ X |X | = 1. (3) ρ X |X | = −1.

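Exercise 4.66 $f_{X,2X+1}(x, y) = \frac{1}{2}\{u(x) - u(x-1)\}\, \delta\left(\frac{y-1}{2} - x\right)$.

Exercise 4.67 For $x \in \{x \mid f_X(x) > 0\}$, $f_{Y|X}(y|x) = \{\delta(x+y) + \delta(x-y)\} u(y)$.

Exercise 4.69 (1) $F_{X,Y}(x, y) = \left[ \left\{F(x) - F\left(-\sqrt{y}\right)\right\} u\left(x+\sqrt{y}\right) - \left\{F(x) - F\left(\sqrt{y}\right)\right\} u\left(x-\sqrt{y}\right) \right] u(y) = \left\{ F\left(\min\left(x, \sqrt{y}\right)\right) - F\left(-\sqrt{y}\right) \right\} u(y)\, u\left(x+\sqrt{y}\right)$.
(2) $f_{X,Y}(x, y) = \frac{f(x)}{2\sqrt{y}} \left\{ \delta\left(x+\sqrt{y}\right) + \delta\left(x-\sqrt{y}\right) \right\} u(y) = \frac{f(x)}{2|x|}\, \delta\left(\sqrt{y} - |x|\right) u(y)$.
(3) $f_{X|Y}(x|y) = \frac{f(x)}{f\left(\sqrt{y}\right) + f\left(-\sqrt{y}\right)} \left\{ \delta\left(x+\sqrt{y}\right) + \delta\left(x-\sqrt{y}\right) \right\} u(y)$.

Exercise 4.71 $F_{X,Y}(x, y) = \{F_X(x) u(y-x) + F_X(y) u(x-y)\} u(y)$. $f_{X,Y}(x, y) = f_X(x) \{u(y)\delta(y-x) + u(y-x)\delta(y)\}$.

Exercise 4.73 $f_{X_1}(t) = 6t(1-t)u(t)u(1-t)$. $f_{X_2}(t) = \frac{3}{2}(1-|t|)^2 u(1-|t|)$.

Exercise 4.74 (1) $f_{\boldsymbol{Y}}(y_1, y_2) = \frac{1}{2|y_1|}\, u\left(\frac{y_2}{y_1} + y_1\right) u\left(1 - \frac{1}{2}\left(\frac{y_2}{y_1} + y_1\right)\right) u\left(\frac{y_2}{y_1} - y_1\right) u\left(1 - \frac{1}{2}\left(\frac{y_2}{y_1} - y_1\right)\right)$.
(2) $f_{Y_1}(y) = (1-|y|)u(1-|y|)$. $f_{Y_2}(y) = \frac{1}{2} u(1-|y|) \ln \frac{1+\sqrt{1-|y|}}{\sqrt{|y|}} = \frac{1}{4} u(1-|y|) \ln \frac{1+\sqrt{1-|y|}}{1-\sqrt{1-|y|}}$.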

Chapter 5 Normal Random Vectors Exercise 5.1 (3) The vector (X, Y ) is not a bi-variate normal random vector. (4) The random variables X and Y are not independent of each other.   exp{− 1 x 2 +y 2  u a 2 − x 2 − y 2 . 2 2 Exercise 5.2 f (x, y) = 2 ( )} X 1 ,X 2 |X 1 +X 2 |y|. 2 x 1 v (2) FX (x) = √2π −∞ exp − 2 dv. The random variable X is normal. (3) The vector (X, Y ) is not a normal random vector. (4) f X |Y (x|y) = 21 δ(x + y) + 21 δ(x − y). 2 f X,Y (x, y) = 21 {δ(x + y) + δ(x − y)} √12π exp − y2 . ⎛ 1 ⎞ √ √1 0 − 2 ⎜ 2 ⎟ Exercise 5.9 ⎝ √334 √434 √334 ⎠ 3 2 2 √ − √17 √17 17 Exercise 5.10 f C (r, θ) = r f X (r cos θ, r sin θ) u (r ) u(π − |θ|). The random vector C is an independent random vector when X is an i.i.d. random vector with marginal   distribution N 0, σ 2 . Exercise 5.12 The conditional distribution of X 3 when X 1 = X 2 = 1 is N (1, 2). 2 Exercise 5.13 acσ 2X + (bc + ad)ρσX σY + bdσY = 0.3 3 2 Exercise  2 2 5.14 E{X Y 2}= 2ρσ2X σY . E X Y = 0. E X Y = 3ρσ X σY . E X Y = 1 + 2ρ σ X σY . Exercise 5.16   2 2 2 2 E Z 2 W 2 = 1 + 2ρ2 σ12 σ22 + m 22 σ12 + m  1 σ2 + 2m 1 m4 2 2+ 4m 1 m 2 ρσ1 σ2 . 5 Exercise 5.17 μ51 = 15ρσ1 σ2 . μ42 = 3ρ 1 + 4ρ σ1 σ2 . μ33 = 3ρ 3 + 2ρ2 σ13 σ23 . Exercise 5.5 ρ Z W =



2 σ12 +σ22



Exercise 5.19 E{|X Y |} = 2 + 62 π.   Exercise 5.20 E {|X 1 |} = π2 . E {|X 1 X 2 |} = π2 1 − ρ2 + ρ sin−1 ρ .

 ! !   E ! X 1 X 23 ! = π2 3ρ sin−1 ρ + 2 + ρ2 1 − ρ2 . Exercise 5.25 23!4!4!5! 1 ×2!4! = 4320. Exercise 5.30 f X (x) = π x 2r+r 2 . ( ) Exercise 5.32 E{Y } = n + δ. σY2 = 2n + 4δ. Γ n−1  Exercise 5.34 E{Z } = δ Γ( n2 ) n2 for n > 1. (2) 2 Γ 2 n−1 n 1+δ 2 Var{Z } = (n−2 ) − nδ2 Γ (2 n2 ) for n > 2. (2) n(m+δ) for n > 2. Exercise 5.36 E{H } = m(n−2) 2n 2 {(m+δ)2 +(n−2)(m+2δ)} Var{H } = for n > 4. m 2 (n−4)(n−2)2   μ4 3(n−1)μ22 Exercise 5.40 μ4 X n = n 3 + n 3 .   Exercise 5.42 (1) pdf: f V (v) = e−v u(v). cdf: FV (v) = 1 − e−v u(v). Exercise 5.43 RY = ρY |α2 =1 =

6 π

sin−1 ρ2 .

2β 2 π

sin−1 ρ . ρ = Y 2 1+α sin−1 2 −1 ρY = π sin ρ.

sin−1

lim

α2 →0

ρ 1+α2 1 1+α2

.


Chapter 6 Convergence of Random Variables

Exercise 6.1 $\{X_n(\omega)\}$ converges almost surely, but not surely, to $X(\omega)$.

Exercise 6.2 $\{X_n(\omega)\}$ converges almost surely, but not surely, to $X(\omega)$.

Exercise 6.3 (1) $S_n \sim NB(n, \alpha)$. (2) $S_n \sim NB\left(\sum_{i=1}^{n} r_i, \alpha\right)$. (3) $S_n \sim P\left(\sum_{i=1}^{n} \lambda_i\right)$. (4) $S_n \sim G\left(\sum_{i=1}^{n} \alpha_i, \beta\right)$. (5) $S_n \sim C\left(\sum_{i=1}^{n} \mu_i, \sum_{i=1}^{n} \theta_i\right)$.

Exercise 6.4 $\frac{S_n}{n} \xrightarrow{p} m$.

Exercise 6.7 The weak law of large numbers holds true.

Exercise 6.10 $P(50 < S_n \le 80) = P(50.5 < S_n < 80.5) \approx 0.9348$.

Exercise 6.11 $E\{U\} = p(1-p) + np^2$. $E\{V\} = np^2$.

Exercise 6.12 mgf of $Y$: $M_Y(t) = \frac{1}{2-e^{2t}}$. expected value $= 2$. variance $= 8$.

Exercise 6.13 (1) $P_3 = P_5 = \frac{1}{2}$. $P_4 = \frac{5}{16}$. (2) $\lim_{n\to\infty} P_n = \frac{1}{2}$.
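The mean and variance quoted for Exercise 6.12 can be checked against the stated mgf $M_Y(t) = 1/(2-e^{2t})$ by differentiating at $t = 0$. The sketch below does this numerically with central differences; the step size `h` and the helper name `mgf` are assumptions chosen for illustration, not part of the book's solution.

```python
import math


def mgf(t):
    """Moment generating function stated in the answer to Exercise 6.12."""
    return 1.0 / (2.0 - math.exp(2.0 * t))


h = 1e-4  # finite-difference step (an arbitrary small value)

# First and second derivatives of the mgf at t = 0 by central differences.
m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / h**2

mean = m1
variance = m2 - m1**2
print(round(mean, 4), round(variance, 4))
```

The first derivative at zero gives the expected value and the second derivative gives the second moment, so the variance follows as their usual difference.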

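Exercise 6.14 $\lim_{n\to\infty} F_n(x) = u(x)$. It is a cdf.

Exercise 6.15 It is convergent. $F_{Y_n}(y) \to \left\{1 - \exp\left(-\frac{y}{\theta}\right)\right\} u(y)$.

Exercise 6.16 $\lim_{n\to\infty} F_{Y_n}(y) = u(y)$.

Exercise 6.17 It is convergent. $F_n(x) \to u(x)$.

Exercise 6.18 (1) $\varphi_{\hat{X}_n}(w) = \varphi_X(w) \prod_{i=1}^{n} \varphi_{W_i}\left(\frac{w}{n}\right)$. $E\left\{\hat{X}_n\right\} = E\{X\}$. $\mathrm{Var}\left\{\hat{X}_n\right\} = \mathrm{Var}\{X\} + \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2$.
(2) $\mathrm{Cov}\left(Y_i, Y_j\right) = \mathrm{Var}\{X\} + \sigma_i^2$ for $i = j$ and $\mathrm{Var}\{X\}$ for $i \ne j$.
(3) Denoting $\sigma^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2$, $f_{\varepsilon_n}(\alpha) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\alpha^2}{2\sigma^2}\right)$. $f_{\hat{X}_n|X}(\alpha|\beta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(\alpha-\beta)^2}{2\sigma^2}\right\}$.
(4) $\hat{X}_n$ is mean square convergent to $X$.

Exercise 6.26 $a_n = n\sigma^2$. $b_n = n\mu_4$.

Exercise 6.27 $\alpha \le 0$.

Exercise 6.29 Exact value: $P(S \ge 3) = 1 - 5e^{-2} \approx 0.3233$. Approximate value: $P(S \ge 3) = P\left(\frac{S-2}{\sqrt{2}} \ge \frac{1}{\sqrt{2}}\right) = P\left(Z \ge \frac{1}{\sqrt{2}}\right) \approx 0.2398$, or $P(S \ge 3) = P(S > 2.5) = P\left(Z > \frac{1}{2\sqrt{2}}\right) \approx 0.3618$.

Exercise 6.30 $\lim_{n\to\infty} F_n(x) = F(x)$, where $F$ is the cdf of $U[0, 1)$. $\frac{d}{dx}\left\{\lim_{n\to\infty} F_n(x)\right\} \ne \lim_{n\to\infty}\left\{\frac{d}{dx} F_n(x)\right\}$.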
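The three figures quoted for Exercise 6.29 can be reproduced with the standard library alone. The sketch below assumes, from the form $1 - 5e^{-2}$ and the standardization $(S-2)/\sqrt{2}$, that $S$ is Poisson with parameter 2; the helper names `poisson_tail` and `normal_tail` are hypothetical.

```python
import math


def poisson_tail(lam, k):
    """P(S >= k) for S ~ Poisson(lam), summed from the exact pmf."""
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return 1.0 - below


def normal_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


lam = 2.0  # assumed Poisson parameter (mean and variance both equal 2)

exact = poisson_tail(lam, 3)
approx_plain = normal_tail((3.0 - lam) / math.sqrt(lam))  # no continuity correction
approx_cc = normal_tail((2.5 - lam) / math.sqrt(lam))     # with continuity correction
print(round(exact, 4), round(approx_plain, 4), round(approx_cc, 4))
```

Comparing the two approximations with the exact tail shows how much of the gap the continuity correction closes or overshoots for such a small mean.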

Exercise 6.32 Distribution of points: Poisson with parameter $\mu p$. mean: $\mu p = 1.5$. variance: $\mu p = 1.5$.

Exercise 6.33 Distribution of the total information sent via the fax: geometric distribution with expected value $\frac{1}{\alpha\beta} = 4 \times 10^5$.

Exercise 6.34 Expected value: $E\{S_N\} = \frac{1}{p\lambda}$. variance: $\mathrm{Var}\{S_N\} = \frac{1}{p^2\lambda^2}$.


Exercise 6.35 Expected value of T : λ3 . Exercise 6.36 (1) lim Fn (x) = 21 . This limit is not a cdf. n→∞

(2) Fn (x) → 1. This limit is not a cdf. (3) Because G 2n (x) → 1 and G 2n+1 (x) → 0, {G n (x)}∞ n=1 is not convergent. Exercise 6.37 FYn (y) → u(y − β).


Index

A Absolute central moment, 406 normal variable, 406 Absolute continuity, 29 Absolutely continuous function, 32, 169 convolution, 32 Absolute mean, 250 logistic distribution, 251 Absolute mean inequality, 449 Absolute moment, 193, 364 Gaussian distribution, 196 tri-variate normal distribution, 366 Absolute value, 177 cdf, 177 pdf, 184 Abstract space, 1 Addition theorem, 118 Additive class of sets, 96 Additive function, 143 Additivity, 106 countable additivity, 106 finite additivity, 106 Algebra, 94 generated from C , 95 σ-algebra, 96 Algebraic behavior, 398 Algebraic number, 85 Almost always, 17, 110 Almost always convergence, 416 Almost certain convergence, 416 Almost everywhere, 17, 110 Almost everywhere convergence, 416 Borel-Cantelli lemma, 417 Almost sure convergence, 415 sufficient condition, 417

Almost surely, 17, 110 Always convergence, 415 Appell’s symbol, 48 Ascending factorial, 48 Associative law, 10 Asymmetry, 205 At almost all point, 17 At almost every point, 17 Auxiliary variable, 273, 275 Axiom of choice, 150

B Basis, 198 orthonormal basis, 198 Bayes’ theorem, 121, 154, 212, 303 Bernoulli distribution, 126 sum, 428 Bernoulli random variable, 428 Bernoulli trial, 126 Bertrand’s paradox, 114 Bessel function, 249 Beta distribution, 135 expected value, 245 mean, 245 moment, 245 variance, 245 Beta function, 53, 135 Bienayme-Chebyshev inequality, 451 Big O, 219 Bijection, 22 Bijective function, 22 Bijective mapping, 22 Binary distribution, 126 Binomial coefficient, 46, 65, 127

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 I. Song et al., Probability and Random Variables: Theory and Applications, https://doi.org/10.1007/978-3-030-97679-8


complex space, 63 Binomial distribution, 127 cdf, 219 cf, 207 expected value, 195, 207 Gaussian approximation, 220 kurtosis, 252 mean, 195, 207 skewness, 252 variance, 195, 207 Binomial expansion, 67 Binomial random variable, 218 sum, 429 Bi-variate Cauchy distribution, 403 Bi-variate Gaussian distribution, 403 Bi-variate isotropic symmetric α-stable distribution, 402 Bi-variate normal distribution, 403 Bi-variate random vector, 260 Bi-variate t distribution, 383 Bonferroni inequality, 110 Boole inequality, 108, 141 Borel-Cantelli lemma, 141, 425 almost everywhere convergence, 417 Borel field, 104, 146 Borel set, 104, 146 Borel σ algebra, 104, 146 Borel σ field, 104 Bound, 55 Bounded convergence, 447 Box, 144 Breit-Wigner distribution, 133 Buffon’s needle, 267

C Cantor function, 30 Cantor set, 16 Cantor ternary set, 16 Cardinality, 17 Cartesian product, 14 Cauchy criterion, 419 Cauchy distribution, 133 bi-variate Cauchy distribution, 403 cdf, 170 expected value, 196 generalized Cauchy distribution, 398 mean, 196 sum, 455 variance, 196 Cauchy equation, 225 Cauchy-Schwarz inequality, 290, 451 Cauchy’s integral theorem, 73

Ceiling function, 74 Central chi-square distribution, 377 expected value, 378 mean, 378 mgf, 377 moment, 378 sum, 378 variance, 378 Central constant, 432 Central F distribution, 384 expected value, 384 mean, 384 moment, 384 variance, 384 Central limit theorem, 217, 396, 438, 444 generalized, 396 Lindeberg central limit theorem, 439, 444 Central moment, 193 Central t distribution, 380 expected value, 410 mean, 410 moment, 410 variance, 410 Certain convergence, 415 Characteristic equation, 295 Characteristic exponent, 400 Characteristic function (cf), 199 binomial distribution, 207 geometric distribution, 200 joint cf, 295 marginal cf, 296 moment theorem, 206 negative binomial distribution, 200 negative exponential distribution, 246 normal distribution, 200 Poisson distribution, 208 random vector, 295 Chebyshev inequality, 449 Chernoff bound, 453 Chu-Vandermonde convolution, 69 C ∞ function, 41 Class, 4 completely additive class, 96 singleton class, 4 Classical definition, 112 Class of sets, 4 Closed interval, 4 Closed set, 146 Closure, 24 Codomain, 19 Coefficient of variation, 205 exponential distribution, 205

Poisson distribution, 252 Collection, 4 partition, 7 singleton collection, 4 Collection of sets, 4 Combination, 46 Combinatorics, 114 Commutative law, 9 Complement, 5 Completely additive class, 96 Complex conjugate transpose, 290 Complex random vector, 290 Component, 1 Concave, 450 Conditional cdf, 209 Conditional distribution, 208, 300 normal random vector, 349 Conditional expected value, 214, 307 expected value of conditional expected value, 307 Conditional joint cdf, 300 Conditional joint pdf, 300 Conditional pdf, 209 Conditional pmf, 208 Conditional probability, 116 Conditional rate of failure, 213, 226 Conditional variance, 308 Conjugate transpose, 290 Conservation law, 271 Continuity, 26, 139 absolute continuity, 29 expectation, 447 expected value, 447 probability, 140 uniform continuity, 26 Continuity correction, 221, 443 Continuity of expectation, 447 bounded convergence, 447 dominated convergence, 447 monotonic convergence, 447 Continuity of probability, 140 continuity from above, 139 continuity from below, 139 limit event, 140 Continuous function, 25 Continuous part, 230 Continuous random variable, 162, 169 Continuous random vector, 255 Continuous sample space, 100 Continuous space, 100 Contour, 340 Convergence, 30, 62 almost always convergence, 416

almost certain convergence, 416 almost everywhere convergence, 416 almost sure convergence, 415 always convergence, 415 bounded convergence, 447 certain convergence, 415 continuity of probability, 140 convergence of multiplication, 448 dominated convergence, 447 everywhere convergence, 415 in distribution, 421 in law, 421 in probability, 420 in the mean square, 419 in the r -th mean, 418 mean square convergence, 419 monotonic convergence, 447 probability function, 443 properties of convergence, 447 relationships among convergence, 422 set, 62 stochastic convergence, 420 sure convergence, 415 weak convergence, 421 with probability 1, 416 Convex, 450 Convolution, 32, 203, 276, 296 absolutely continuous function, 32 Chu-Vandermonde convolution, 69 singular function, 32 Vandermonde convolution, 69 Coordinate transformation, 405 Correlation, 289 Correlation coefficient, 289 Correlation matrix, 290 Countable set, 12 Counting, 45 Covariance, 289 Covariance matrix, 290, 291 Covering, 144 Cumulant, 204 Cumulative distribution function (cdf), 165, 227 absolute value, 177 binomial distribution, 219 Cauchy distribution, 170 complementary standard normal cdf, 218 conditional cdf, 209 conditional joint cdf, 300 discontinuous function, 242 double exponential distribution, 170 inverse, 176 inverse cdf, 232

joint cdf, 256 limit, 458 limiter, 177 linear function, 175 logistic distribution, 170 magnitude, 177 marginal cdf, 256 Poisson distribution, 222 Rayleigh distribution, 170 sign, 186 square, 176 square root, 177 standard normal distribution, 217 Cyclic sum, 367, 393

D Decorrelating normal random vector, 355 Decreasing sequence, 57 Degenerate bi-variate normal pdf, 342 Degenerate multi-variate normal pdf, 347 Degenerate tri-variate normal pdf, 347 Degree of freedom, 184, 377 Delta-convergent sequence, 39 Delta function, 33 de Moivre-Laplace theorem, 220 de Morgan’s law, 10 Density function, 130 Denumerable set, 12 Diagonal matrix, 293 Difference, 8 geometric sequence, 80 symmetric difference, 8 two random variables, 278 Discontinuity, 27 jump discontinuity, 27, 31, 168 type 1 discontinuity, 27 type 2 discontinuity, 27 Discontinuous part, 230 Discrete combined space, 100 Discrete part, 230 Discrete random variable, 162 Discrete random vector, 255 joint pmf, 262 marginal pmf, 262 Discrete sample space, 100 Discrete space, 100 unit step function, 36 Disjoint, 7 Dispersion parameter, 400 Distance, 23 Distance function, 23 Distribution, 36, 106, 164

Bernoulli distribution, 126 beta distribution, 135 binary distribution, 126 binomial distribution, 127 bi-variate Cauchy distribution, 403 bi-variate Gaussian distribution, 403 bi-variate isotropic SαS distribution, 402 bi-variate normal distribution, 403 bi-variate t distribution, 383 Breit-Wigner distribution, 133 Cauchy distribution, 133, 398 central chi-square distribution, 377 central F distribution, 384 central t distribution, 380 conditional distribution, 208, 300 de Moivre-Laplace distribution, 133 difference of two random variables, 278 double exponential distribution, 132, 397 exponential distribution, 132, 226 gamma distribution, 134 Gauss distribution, 133 Gaussian distribution, 133 Gauss-Laplace distribution, 133 geometric distribution, 127, 226, 252 heavy-tailed distribution, 396 hypergeometric distribution, 173, 327 impulsive distribution, 396 inverse of central F random variable, 385 Laplace distribution, 132, 397 lattice distribution, 231 limit distribution, 378 limit of central F distribution, 385, 386 logistic distribution, 134, 251 log-normal distribution, 183 long-tailed distribution, 396 Lorentz distribution, 133 multinomial distribution, 259, 316 negative binomial distribution, 129 negative exponential distribution, 246 non-central chi-square distribution, 379 non-central F distribution, 388 non-central t distribution, 382 normal distribution, 133, 217 Pascal distribution, 129 Poisson distribution, 128, 222 Polya distribution, 129 product of two random variables, 278 ratio of two normal random variables, 342 ratio of two random variables, 279 Rayleigh distribution, 133, 252 rectangular distribution, 131 second Laplace distribution, 133

stable distribution, 400, 438 standard normal distribution, 217 sum of two random variables, 276, 282 two-point distribution, 126 uniform distribution, 127, 131 Distribution function, 165 Distributive law, 10 Domain, 19 Dominated convergence, 447 Double exponential distribution, 132 cdf, 170 expected value, 245 mean, 245 variance, 245 Dumbbell-shaped region, 68

E Element, 1 Elementary event, 101 Elementary outcome, 100 Elementary set, 144 Ellipse, 412 Empty set, 3 Enclosure, 24 Ensemble average, 190 Enumerable set, 12 Equality, 2 in distribution, 226, 422 Equivalence, 17 Equivalence theorem, 18 Error function, 217 Euclidean space, 144 Eulerian integral of the first kind, 53 Eulerian integral of the second kind, 54 Euler reflection formula, 50, 72 Euler’s integral formula, 69 Even function, 41 Event, 101 elementary event, 101 independent, 123 Event space, 101 largest event space, 102 smallest event space, 102 Everywhere convergence, 415 Expectation, 190 continuity, 447 Expected value, 190, 287 beta distribution, 245 binomial distribution, 195, 207 Cauchy distribution, 196 central chi-square distribution, 378 central F distribution, 384

central t distribution, 410 conditional expected value, 214, 307 continuity, 447 double exponential distribution, 245 expected value of conditional expected value, 307 exponential distribution, 194 gamma distribution, 251 Gaussian distribution, 195 geometric distribution, 250 hypergeometric distribution, 251 magnitude, 250 negative binomial distribution, 250 non-central chi-square distribution, 379 non-central F distribution, 388 non-central t distribution, 383 Poisson distribution, 195 sample mean, 371 sign, 250 uniform distribution, 194 Exponential behavior, 398 Exponential distribution, 132, 224, 226 coefficient of variation, 205 expected value, 194 failure rate function, 226 hazard rate function, 226 kurtosis, 206 Markovian property, 224 mgf, 207 random number generation, 245 rate, 132 skewness, 205 standard exponential pdf, 132 sum, 429 variance, 194

F Factorial, 46 ascending factorial, 48 falling factorial, 46 rising factorial, 48 upper factorial, 48 Factorization property, 309 Failure rate function, 226 Falling factorial, 46 Family, 4 singleton family, 4 Family of sets, 4 Fat Cantor set, 17 Fibonacci number, 76 Field, 94 Borel field, 104

generated from C , 95 sigma field, 96 Finite additivity, 106 Finitely μ-measurable set, 145 Finitely often (f.o.), 61 Finite random variable, 162 Finite set, 3 Floor function, 76, 128 Fourier series, 208 Fourier transform, 199 step function, 87 Fubini’s theorem, 82 Function, 19 absolutely continuous function, 169 additive function, 143 Bessel function, 249 bijective function, 22 Cantor function, 30 cdf, 165 ceiling function, 74 cf, 199 characteristic function, 199 C ∞ function, 41 continuous function, 25 cumulative distribution function, 165 delta function, 33 distance function, 23 distribution function, 165 error function, 217 floor function, 76, 128 function of impulse function, 43 gamma function, 47 Gauss function, 76 generalized function, 36 Gödel pairing function, 86 Heaviside function, 33 hypergeometric function, 69, 364 impulse function, 33, 36 increasing singular function, 31 injection, 21 injective function, 21 into function, 21 Kronecker delta function, 293 Kronecker function, 293 Lebesgue function, 30 max, 35 measurable function, 147, 162 membership function, 2 mgf, 202 min, 35 moment generating function, 202 one-to-one correspondence, 22 one-to-one function, 21

one-to-one mapping, 21, 22 onto function, 21 pdf, 130 piecewise continuous function, 35 pmf, 125 Pochhammer function, 48 probability density function, 130 probability mass function, 125 set function, 19, 93 simple function, 147 singular function, 29, 169 step function, 33 step-like function, 169, 229 surjection, 21 surjective function, 21 test function, 41 Thomae’s function, 28 unit step function, 33 Function of impulse function, 43 Fuzzy set, 2

G Gambler’s ruin problem, 157 Gamma distribution, 134 expected value, 251 mean, 251 mgf, 251 sum, 455 variance, 251 Gamma function, 47 Gauss function, 76 Gaussian approximation, 220 binomial distribution, 220 multinomial distribution, 317 Gaussian distribution, 133 absolute moment, 196 bi-variate Gaussian distribution, 403 cf, 200 expected value, 195 generalized Gaussian distribution, 397 moment, 196 standard Gaussian distribution, 133 standard Gaussian pdf, 133 sum, 429 variance, 195 Gaussian noise, 396 Gaussian random vector, 337, 338 bi-variate Gaussian random vector, 339 multi-dimensional Gaussian random vector, 338 General formula, 360 joint moment, 362, 369

moment, 360 Generalized Bienayme-Chebyshev inequality, 451 Generalized Cauchy distribution, 398 moment, 410 Generalized central limit theorem, 396 Generalized function, 36 Generalized Gaussian distribution, 397 moment, 409 Generalized normal distribution, 397 moment, 409 Geometric distribution, 127, 226, 252 cf, 200 expected value, 250 mean, 250 skewness, 205 sum, 455 variance, 250 Geometric sequence, 80 difference, 80 Gödel pairing function, 86 Greatest lower bound, 55

H Hagen-Rothe identity, 69 Half-closed interval, 4 Half mean, 250, 334, 364 logistic distribution, 251 Half moment, 250, 364 Half-open interval, 4 Half-wave rectifier, 242, 335 Hazard rate function, 226 Heaviside convergence sequence, 33 Heaviside function, 33 Heaviside sequence, 33 Heavy-tailed distribution, 396 Heine-Cantor theorem, 27 Heredity, 99 Hermitian adjoint, 290 Hermitian conjugate, 290 Hermitian matrix, 292 Hermitian transpose, 290 Hölder inequality, 454 Hybrid random vector, 256 Hypergeometric distribution, 173, 327 expected value, 251 mean, 251 moment, 251 variance, 251 Hypergeometric function, 69, 364

I Image, 19 inverse image, 20 pre-image, 20 Impulse-convergent sequence, 39 Impulse function, 33, 36, 137 symbolic derivative, 41 Impulse sequence, 39 Impulsive distribution, 396 Incomplete mean, 250, 334, 364 Incomplete moment, 250, 364 Increasing singular function, 31 Independence, 123, 350 mutual, 124 pairwise, 124 Independent and identically distributed (i.i.d.), 269 Independent events, 123 a number of independent events, 124 Independent random vector, 266 several independent random vectors, 270 two independent random vectors, 269 Index set, 100 Indicator function, 147 In distribution, 226, 422 Inequality, 108, 448 absolute mean inequality, 449 Bienayme-Chebyshev inequality, 451 Bonferroni inequality, 110 Boole inequality, 108, 141 Cauchy-Schwarz inequality, 290, 451 Chebyshev inequality, 449 Chernoff bound, 453 generalized Bienayme-Chebyshev inequality, 451 Hölder inequality, 454 Jensen inequality, 450 Kolmogorov inequality, 452 Lipschitz inequality, 26 Lyapunov inequality, 423, 450 Markov inequality, 425, 449 Minkowski inequality, 454 tail probability inequality, 448 triangle inequality, 454 Infimum, 144 Infinite dimensional vector space, 100 Infinitely often (i.o.), 61 Infinite set, 3 Inheritance, 99 Injection, 21 Injective function, 21 In probability, 109 Integral, 79

Lebesgue integral, 79, 131, 148, 168 Lebesgue-Stieltjes integral, 79, 168 Riemann integral, 131, 148, 168 Riemann-Stieltjes integral, 79, 168 Intersection, 6 Interval, 4, 144 closed interval, 4 half-closed interval, 4 half-open interval, 4 open interval, 4 Interval set, 4 Into function, 21 Inverse cdf, 232 Inverse Fourier transform, 87 Inverse image, 20 Inverse of central F random variable, 385 Inverse operation, 9 Isohypse, 340

J Jacobian, 272 Jensen inequality, 450 Joint cdf, 256 conditional joint cdf, 300 Joint central moment, 289 Joint cf, 295 Joint mgf, 296 Joint moment, 289, 298 general formula, 362, 369 normal random vector, 356, 406 Joint pdf, 257 conditional joint pdf, 300 Joint pmf, 259 discrete random vector, 262 Joint random variables, 255 Jump discontinuity, 27, 31, 168

K Khintchine’s theorem, 434 Kolmogorov condition, 435 Kolmogorov inequality, 452 Kolmogorov’s strong law of large numbers, 436 Kronecker delta function, 293 Kronecker function, 293 Kronecker lemma, 435 Kurtosis, 205 binomial distribution, 252 exponential distribution, 206 Poisson distribution, 252

L Laplace distribution, 132 Laplace transform, 202 Largest event space, 102 Lattice distribution, 231 Laws of large numbers, 432 Least upper bound, 55 Lebesgue decomposition theorem, 230 Lebesgue function, 30 Lebesgue integral, 79, 131, 148, 168 Lebesgue length, 454 Lebesgue measure, 131, 146 Lebesgue-Stieltjes integral, 79, 168 Leibnitz’s rule, 179 Leptokurtic, 205 L’Hospital’s theorem, 408 Limit, 56, 57, 62 cdf, 458 central F distribution, 385, 386 limit inferior, 56 limit set, 62 limit superior, 56 lower limit, 59 negative binomial pmf, 157 pdf, 458 random variable, 415 upper limit, 60 Limit distribution, 378 central F distribution, 385, 386 inverse of F random variable, 387 Limiter, 177 cdf, 177 Limit event, 138 continuity of probability, 140 probability, 138, 139 Limit in the mean (l.i.m.), 419 Limit inferior, 56 Limit of central F distribution, 385, 386 Limit point, 24 Limit set, 57, 62 convergence, 62 monotonic sequence, 57 Limit superior, 56 Lindeberg central limit theorem, 439, 444 Lindeberg condition, 440, 445 Linearly dependent random vector, 293 Linearly independent random vector, 293 Linear transformation, 274 normal random vector, 351, 355 random vector, 293 Line mass, 264 Lipschitz constant, 26 Lipschitz inequality, 26

Location parameter, 400 Logistic distribution, 134 absolute mean, 251 cdf, 170 half mean, 251 moment, 251 Log-normal distribution, 183 Long-tailed distribution, 396 Lorentz distribution, 133 Lower bound, 55, 144 greatest lower bound, 55 Lower bound set, 59 Lower limit, 59 Lyapunov inequality, 423, 450

M Magnitude, 177 cdf, 177 expected value, 250 pdf, 184 variance, 250 Mapping, 19 bijective function, 22 one-to-one correspondence, 22 one-to-one mapping, 22 Marginal cdf, 256 Marginal cf, 296 Marginal mgf, 296 Marginal pdf, 256 Marginal pmf, 256 Markov condition, 433 Markovian property, 224 Markov inequality, 425, 449 Markov theorem, 433 Mass function, 125 Max, 35 symbolic derivative, 41 Mean, 190 beta distribution, 245 binomial distribution, 195, 207 Cauchy distribution, 196 central chi-square distribution, 378 central F distribution, 384 central t distribution, 410 double exponential distribution, 245 gamma distribution, 251 geometric distribution, 250 half mean, 250, 334, 364 hypergeometric distribution, 251 incomplete mean, 250, 334, 364 magnitude, 250 negative binomial distribution, 250

non-central chi-square distribution, 379 non-central F distribution, 388 non-central t distribution, 383 normal distribution, 195 Poisson distribution, 195 sample mean, 371 sign, 250 uniform distribution, 194 Mean square convergence, 419 Measurable function, 147, 162 Measurable set, 101 finitely μ-measurable set, 145 μ-measurable set, 145 Measurable space, 105, 147 Measure, 93, 143 Lebesgue measure, 131, 146 outer measure, 144 Measure space, 147 Measure theory, 93 Measure zero, 17, 110 Median, 189 uniform distribution, 190 Membership function, 2 Memoryless, 224 Metric, 23 Metric space, 24 Mighty, 159 Mild peak, 205 Min, 35 symbolic derivative, 41 Minkowski inequality, 454 Mixed probability measure, 137 Mixed random vector, 256 Mixed-type random variable, 162, 169 Mode, 189 uniform distribution, 190 Moment, 193 absolute moment, 193 beta distribution, 245 central chi-square distribution, 378 central F distribution, 384 central moment, 193 central t distribution, 410 cumulant, 204 Gaussian distribution, 196 general formula, 360 generalized Cauchy distribution, 410 generalized Gaussian distribution, 409 generalized normal distribution, 409 half moment, 250, 364 hypergeometric distribution, 251 incomplete moment, 250, 364 joint central moment, 289

joint moment, 289 logistic distribution, 251 moment theorem, 206, 298 normal distribution, 196, 249 partial moment, 250, 364 Rayleigh distribution, 249 Moment generating function (mgf), 202 central chi-square distribution, 377 exponential distribution, 207 gamma distribution, 251 joint mgf, 296 marginal mgf, 296 non-central chi-square distribution, 379 non-central t distribution, 383 random vector, 296 sample mean, 410 Moment theorem, 206, 298 Monotonic convergence, 447 Monotonic sequence, 57 limit set, 57 Monotonic set sequence, 57 Multi-dimensional Gaussian random vector, 338 Multi-dimensional normal random vector, 338 Multi-dimensional random vector, 255 Multinomial coefficient, 47, 259 Multinomial distribution, 259, 316 Gaussian approximation, 317 Poisson approximation, 316 Multiplication theorem, 118 Multi-variable random vector, 255 Multi-variate random vector, 255 Mutually exclusive, 7

N Negative binomial distribution, 129, 266 cf, 200 expected value, 250 limit, 157 mean, 250 skewness, 205 sum, 455 variance, 250 Negative exponential distribution, 246 Neighborhood, 24 Noise, 396 Gaussian noise, 396 normal noise, 396 Non-central chi-square distribution, 379 expected value, 379 mean, 379

mgf, 379 sum, 379 variance, 379 Non-central F distribution, 388 expected value, 388 mean, 388 variance, 388 Non-central t distribution, 382 expected value, 383 mean, 383 mgf, 383 variance, 383 Non-decreasing sequence, 57 Non-denumerable set, 15 Non-increasing sequence, 57 Non-measurable set, 101, 149 Normal distribution, 133, 217 absolute central moment, 406 bi-variate normal distribution, 403 bi-variate normal pdf, 383 complementary standard normal cdf, 218 degenerate bi-variate pdf, 342 degenerate multi-variate pdf, 347 degenerate tri-variate pdf, 347 generalized normal distribution, 397 mean, 195 moment, 196, 249 multi-variate normal pdf, 338 standard normal distribution, 133, 157, 217 sum, 429 variance, 195 Normalizing constant, 432 Normal matrix, 293 Normal noise, 396 Normal random vector, 337, 338 bi-variate normal random vector, 339 conditional distribution, 349 decorrelation, 355 joint moment, 356, 406 linear combination, 356 linear transformation, 351, 355 multi-dimensional normal random vector, 338 Null set, 3 Number of partition, 83

O One-to-one correspondence, 12, 22 One-to-one mapping, 22 One-to-one transformation, 274 pdf, 179, 274

Onto function, 21 Open interval, 4 Open set, 146 Order statistic, 135, 285, 315, 395 Orthogonal, 289 Orthogonal matrix, 293 Orthonormal basis, 198 Outer measure, 144

P Pairwise independence, 124 Parallel connection, 124 Partial correlation coefficient, 344 Partial moment, 250, 364 Partition, 7, 82 Pascal distribution, 129 Pascal’s identity, 89 Pascal’s rule, 89 Pattern, 310, 318 mean time, 318 Peakedness, 205 Percentile, 189 Permutation, 46 with repetition, 46 Piecewise continuous function, 35 Platykurtic, 205 Pochhammer function, 48 Pochhammer polynomial, 48 Pochhammer’s symbol, 48 Point, 1 Point mass, 263 Point set, 2 Poisson approximation, 221 multinomial distribution, 316 Poisson distribution, 128, 222, 297, 453 cdf, 222 cf, 208 coefficient of variation, 252 expected value, 195 kurtosis, 252 mean, 195 skewness, 252 sum, 455 variance, 195 Poisson limit theorem, 221 Poisson points, 223 Polar quantizer, 331 Polya distribution, 129 Population mean, 371 Population variance, 371 Positive definite, 293, 355 Positive semi-definite, 200, 292

A posteriori probability, 156 Posterior probability, 156 Power set, 4 equivalence, 17 equivalent, 17 Pre-image, 20 Price’s theorem, 358 A priori probability, 156 Prior probability, 156 Probability, 106 a posteriori probability, 156 a priori probability, 156 axioms, 106 classical definition, 112 conditional probability, 116 continuity, 140 limit event, 138, 139 posterior probability, 156 prior probability, 156 relative frequency, 115 Probability density, 93 Probability density function (pdf), 130, 169 absolute value, 184 beta pdf, 135 binomial pdf, 219 bi-variate Cauchy pdf, 383 bi-variate Gaussian pdf, 339 bi-variate normal pdf, 339 Breit-Wigner pdf, 133 Cauchy pdf, 133 central chi-square pdf, 184, 377 central F pdf, 384 central t pdf, 380 chi-square pdf, 377 conditional pdf, 209 cosine, 244 double exponential pdf, 132, 397 exponential function, 183 exponential pdf, 132 gamma pdf, 134 Gaussian pdf, 133 general transformation, 183 inverse, 181 joint pdf, 257 Laplace pdf, 132 limit, 458 linear function, 180 logistic pdf, 134 log-normal pdf, 183 Lorentz pdf, 133 magnitude, 184 marginal pdf, 256 normal pdf, 133

one-to-one transformation, 179, 274 Poisson pdf, 222 product pdf, 258 Rayleigh pdf, 133 rectangular pdf, 131 sign, 186 sine, 185, 242 square, 184 square root, 181 standard exponential pdf, 132 standard normal pdf, 157 tangent, 245 transformation finding, 187 uniform pdf, 131 unimodal pdf, 397 Probability distribution, 106 Probability distribution function, 165 Probability function, 106, 165 convergence, 443 Probability mass, 93 Probability mass function (pmf), 125, 169 Bernoulli pmf, 126 binary pmf, 126 binomial pmf, 127 conditional pmf, 208 geometric pmf, 127 joint pmf, 259 marginal pmf, 256 multinomial pmf, 259 negative binomial pmf, 129 Pascal pmf, 129 Poisson pmf, 128, 222 Polya pmf, 129 sign, 186 two-point pmf, 126 uniform pmf, 127 Probability measure, 105, 106, 126 mixed probability measure, 137 Probability space, 107 Product, 6 Product of two random variables, 278 Product pdf, 258 Product space, 100 Proper subset, 3

Q Quality control, 99 Quantile, 189

R Radius, 24

Random experiment, 99 model, 99 observation, 99 procedure, 99 Random number generation, 187 exponential distribution, 245 Rayleigh distribution, 188 Random Poisson points, 223 Random process, 256 Random sample, 371 Random sum, 429 expected value, 431 mgf, 430 variance, 431 Random variable, 161, 162 binomial random variable, 218 continuous random variable, 162, 169 convergence, 415 discrete random variable, 162, 169 exponential random variable, 224 finite random variable, 162 function of a random variable, 174 Gaussian random variable, 217 joint random variables, 255 limit, 415 mixed-type random variable, 162, 169 multinomial random variable, 316 negative exponential random variable, 246 normal random variable, 217 Poisson random variable, 222, 297, 453 random sum, 429 sum, 426 uniformly bounded, 435 variable sum, 429 Random vector, 255 bi-variate Gaussian random vector, 339 bi-variate normal random vector, 339 bi-variate random vector, 260 bi-variate t random vector, 383 cf, 295 complex random vector, 290 continuous random vector, 255 discrete random vector, 255 Gaussian random vector, 337, 338 hybrid random vector, 256 i.i.d. random vector, 269 independent random vector, 266 joint cf, 295 joint mgf, 296 linearly dependent random vector, 293 linearly independent random vector, 293 linear transformation, 293

mixed random vector, 256 multi-dimensional Gaussian random vector, 338 multi-dimensional normal random vector, 338 multi-dimensional random vector, 255 multi-variable random vector, 255 multi-variate random vector, 255 normal random vector, 337, 338 several independent random vectors, 270 two-dimensional normal random vector, 339 two independent random vectors, 269 two uncorrelated random vectors, 293 uncorrelated random vector, 292 Range, 19 Rank statistic, 315 Rate, 132, 224 Ratio of two normal random variables, 342 Ratio of two random variables, 279 Rayleigh distribution, 133, 252 cdf, 170 moment, 249 random number generation, 188 Rectangle, 144 Rectangular distribution, 131 Relative frequency, 115 Reproducing property, 38 Residue theorem, 73 Riemann integral, 131, 148, 168 Riemann-Stieltjes integral, 79, 168 Rising factorial, 48 Rising sequential product, 48 Rotation, 328

S Sample, 371 random sample, 371 Sample mean, 371 expected value, 371 mgf, 410 symmetric distribution, 376 variance, 371 Sample point, 100 Sample space, 100 continuous sample space, 100 discrete sample space, 100 mixed sample space, 100 Sample variance, 372 symmetric distribution, 376 variance, 373 Sequence space, 100

Series connection, 124 Set, 1 additive class of sets, 96 Borel set, 104, 146 Cantor set, 16 Cantor ternary set, 16 class of sets, 4 collection of sets, 4 complement, 5 convergence, 62 countable set, 12 denumerable set, 12 difference, 8 elementary set, 144 empty set, 3 enumerable set, 12 equivalence, 17 family of sets, 4 fat Cantor set, 17 finite set, 3 fuzzy set, 2 index set, 100 infinite set, 3 intersection, 6 interval, 4 interval set, 4 limit set, 57, 62 lower bound set, 59 lower limit, 59 measurable set, 101 non-denumerable set, 15 non-measurable set, 101, 149 null set, 3 open set, 146 point set, 2 power set, 4 product, 6 proper subset, 3 set of integers, 3 set of measure zero, 17, 110 set of natural numbers, 3 set of rational numbers, 12 set of real numbers, 3 set of sets, 4 singleton set, 2 Smith-Volterra-Cantor set, 17 subset, 2 sum, 5 symmetric difference, 8 uncountable set, 12, 15 union, 5 universal set, 1 upper bound set, 60

upper limit, 60 Vitali set, 105, 107, 150 Set function, 19, 93 Set of sets, 4 Several independent random vectors, 270 Sharp peak, 205 Sifting property, 38 Sigma algebra generated from G, 98 Sigma algebra (σ-algebra), 96 Sigma field, 96 generated from G, 98 Sign, 186 cdf, 186 expected value, 250 mean, 250 pdf, 186 pmf, 186 variance, 250 Sign statistic, 186 Simple function, 147 Simple zero, 43 Sine, 185 pdf, 185 Singleton class, 4 Singleton family, 4 Singleton set, 2 Singular function, 29, 32, 169 convolution, 32 increasing singular function, 31 Skewness, 205 binomial distribution, 252 exponential distribution, 205 geometric distribution, 205 negative binomial distribution, 205 Poisson distribution, 252 Small o, 219 Smallest event space, 102 Smith-Volterra-Cantor set, 17 Space, 1 abstract space, 1 continuous sample space, 100 discrete combined space, 100 discrete sample space, 100 event space, 101 finite dimensional vector space, 100 infinite dimensional vector space, 100 measurable space, 105, 147 measure space, 147 mixed sample space, 100 probability space, 107 product space, 100 sequence space, 100

Span, 231 Stable distribution, 400, 438 symmetric α-stable, 400 Standard deviation, 194 Standard Gaussian distribution, 133 Standard normal distribution, 133 cdf, 217 Standard symmetric α-stable distribution, 400 Statistic, 371 order statistic, 135, 285, 315, 395 rank statistic, 315 sign statistic, 186 Statistical average, 190 Statistically independent, 123 Step-convergent sequence, 33 Step function, 33 Fourier transform, 87 Step-like function, 169, 229 Stepping stones, 74 Stirling approximation, 237 Stirling number, 251 second kind, 251 Stirling’s formula, 237 Stochastic average, 190 Stochastic convergence, 420 Stochastic process, 256 Strictly decreasing sequence, 57 Strictly increasing sequence, 57 Strong law of large numbers, 434 Borel’s strong law of large numbers, 435 Kolmogorov’s strong law of large numbers, 436 Subset, 2 proper subset, 3 Sum, 5 Bernoulli random variable, 428 binomial random variable, 429 Cauchy random variable, 455 central chi-square random variable, 378 cf, 428 expected value, 426 exponential random variable, 429 gamma random variable, 455 Gaussian random variable, 429 geometric random variable, 455 mgf, 428 negative binomial random variable, 455 non-central chi-square random variable, 379 normal random variable, 429 Poisson random variable, 455 random sum, 429

variable sum, 429 variance, 426 Sum of two random variables, 276, 282 Support, 24 Sure convergence, 415 Surjection, 21 Surjective function, 21 Symbolic derivative, 36 impulse function, 41 max, 41 min, 41 Symbolic differentiation, 36 Symmetric α-stable distribution, 400 bi-variate isotropic SαS distribution, 402 standard symmetric α-stable distribution, 400 Symmetric channel, 156 Symmetric difference, 8 Symmetric distribution, 376 sample mean, 376 sample variance, 376 Symmetry parameter, 400

T Tail integral, 217 Tail probability inequality, 448 Tangent, 245 pdf, 245 Taylor approximation, 459 Taylor series, 208 Test function, 41 Theorem, 3 addition theorem, 118 ballot theorem, 155 Bayes’ theorem, 121, 154, 212, 303 Borel-Cantelli lemma, 141 Cauchy’s integral theorem, 73 central limit theorem, 217, 438, 444 de Moivre-Laplace theorem, 220 equivalence theorem, 18 Fubini’s theorem, 82 Gauss’ hypergeometric theorem, 69 Heine-Cantor theorem, 27 Kronecker lemma, 435 Lebesgue decomposition theorem, 230 moment theorem, 206 multiplication theorem, 118 Poisson limit theorem, 221 Price’s theorem, 358 residue theorem, 73 total probability theorem, 120, 210 Thomae’s function, 28

Total probability theorem, 120, 210 Transcendental number, 85 Transform Fourier transform, 199 inverse Fourier transform, 87 Laplace transform, 202 Transformation, 179, 187 coordinate transformation, 405 linear transformation, 274 one-to-one transformation, 179, 274 Transformation finding, 187 Transformation Jacobian, 272 Triangle inequality, 454 Two independent random vectors, 269 Two-point distribution, 126 Two-point pmf, 126 Type 1 discontinuity, 27 Type 2 discontinuity, 27

U Uncorrelated, 289, 350 Uncorrelated random vector, 292 Uncountable set, 12, 15 Uniform continuity, 26, 200 Uniform distribution, 127, 131 expected value, 194 mean, 194 median, 190 mode, 190 variance, 194 Uniformly bounded, 435 Unimodal pdf, 397 Union, 5 Union bound, 154 Union upper bound, 154 Unitary matrix, 293 Unit step function, 33 discrete space, 36 Universal set, 1 Upper bound, 55 least upper bound, 55 Upper bound set, 60 Upper factorial, 48 Upper limit, 60

V Vandermonde convolution, 69 Variable sum, 429 expected value, 431 mgf, 430 variance, 431

Variance, 194 beta distribution, 245 binomial distribution, 195, 207 Cauchy distribution, 196 central chi-square distribution, 378 central F distribution, 384 central t distribution, 410 conditional variance, 308 double exponential distribution, 245 exponential distribution, 194 gamma distribution, 251 Gaussian distribution, 195 geometric distribution, 250 hypergeometric distribution, 251 magnitude, 250 negative binomial distribution, 250 non-central chi-square distribution, 379 non-central F distribution, 388

non-central t distribution, 383 normal distribution, 195 Poisson distribution, 195 sample mean, 371 sample variance, 372, 373 sign, 250 uniform distribution, 194 Variance-covariance matrix, 290 Venn diagram, 5 Vitali set, 105, 107, 150

W Weak convergence, 421 Weak law of large numbers, 432 With probability 1, 17, 110 With probability 1 convergence, 416