Tell Me The Odds: A 15 Page Introduction To Bayes Theorem

By Scott Hartshorn

What Is In This Book?

This is intended to be a short book to help you understand Bayes Theorem without covering every detail. To keep it at a length that you can read through, trying out the examples, in under an hour, this book walks through only two examples.

The first example is a “toy example”, where an unknown die is drawn from a bag and you have to identify which die it was. Although just a toy example, this turns out to have a lot in common with some real life problems, such as identifying how many tanks the enemy has based on captured serial numbers.

The second example takes us into the seat of a spaceship about to fly through an asteroid field. Sure, the odds of successfully navigating that asteroid field may seem astronomical (at least according to your loudmouthed golden robot) but he might not know the full story. This example shows how you can include additional information in the Bayes Theorem calculation in multiple steps. In real life, this example might have applications to a person trying to figure out how much risk they have of heart disease if they have no family history and good physical fitness, but also have high blood pressure and are over 50 years old.

Your Free Gift

As a way of saying thank you for your purchase, I'm offering this free Bayes Theorem cheat sheet that's exclusive to my readers. This cheat sheet contains information about Bayes Theorem and key terminology, 6 easy steps to solve a Bayes Theorem problem, and an example to follow. This is a PDF document that I encourage you to print, save, and share. You can download it by going here

Bayes Theorem Introduction

Even though we are only doing two examples, Bayes Theorem is a straightforward topic to understand. The simplest explanation is that Bayes Theorem is the same type of probability you already know, just in reverse.

What does that mean? In your typical experience, you have likely seen situations where you have a known starting point and you are asked to calculate the probability of an outcome. For instance, if I know that I am holding a six sided die, what is the probability that I will roll a 3? Alternatively, if you know that you are innocent of a crime, what is the probability that the DNA test will come back positive anyway (a false positive)?

With both of those examples, we know the current state and want to calculate the probability of something in the future. In our typical experience, we have also seen situations where we have two events in a series. In that case, those probabilities get multiplied together. For instance, I could say “I have a bag that has 4 different dice in it, each with a different number of sides. One of those dice has 6 sides. If I draw a single die, what is the probability that I will draw the 6 sided die and roll a 3 with it on the first try?” In this case, the solution is to multiply the probability of drawing the 6 sided die (1 in 4) by the probability of rolling a 3 (1 in 6), for a total of 1 in 24.

With Bayes Theorem, we can ask the reverse of the same questions. Instead of “What are the odds of drawing a 6 sided die and rolling a 3?” it is “I rolled a 3, what are the odds that I had a 6 sided die?” (To actually solve that problem we would need to either know or estimate a little more information, namely what the other dice in the bag are.)
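To see the “forward” direction concretely, here is a minimal Python sketch of the two-event calculation above (the bag contents and the multiplication are exactly the ones from the example):

```python
# Forward probability: draw one of 4 dice, then roll a 3 with it.
p_draw_d6 = 1 / 4   # one 6 sided die among 4 dice in the bag
p_roll_3 = 1 / 6    # chance of rolling a 3 with a 6 sided die

# Two events in a series multiply together.
p_both = p_draw_d6 * p_roll_3
print(p_both)  # 0.041666..., i.e. 1 in 24
```

Bayes Theorem runs this same arithmetic in the other direction: starting from the observed roll and working back to which die was drawn.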

The reason we bother with Bayes Theorem is that we live in a world where we frequently see outcomes but have to guess at the initial events that caused those outcomes. Some knowledge is withheld from us due to circumstances, so we need to estimate it as best we can.

Bayes Theorem Equation

The equation for Bayes Theorem is

P(A|B) = P(B|A) × P(A) / P(B)

where P(A|B) is the probability of A given that B occurred, and P(B|A) is the probability of B given A.

This equation is not immediately understandable to me, so instead of focusing on that equation, it will be more intuitive to show how to actually solve the problems, and then show how the equation fits in. The easiest way to understand how to solve a Bayes Theorem problem is with a table. The first example we will show is a dice example, but the same type of table can be used to solve all types of Bayes Theorem problems.

1st Example – Identify A Die Drawn From A Bag

For this problem, assume that I randomly drew a die from a bag that contained 4 dice. The 4 dice each have a different number of sides: one has 4 sides, one has 5 sides, one has 6 sides, and one has 8 sides. If I roll that die and tell you the result without showing you the actual die, can you tell me what the die is, or at least give me probabilities for each die? For this example let's say that I rolled a 5.

Here you have access to the outcome, the roll of the die, but not the initial state. If we were asking the problem the other way it would be simple. If I told you that I was holding an 8 sided die and asked you the odds that I would roll a 5, it is easy. Those odds would be 1 in 8. In fact, the odds that I would roll any given number between 1 and 8 with the 8 sided die are 1 in 8. The odds for any given number with the 4, 5, and 6 sided dice are also easy: 1 in 4, 5, or 6 respectively, up to the maximum number on that die.

That was simple, and it turns out we can use our knowledge of probabilities going in one direction to calculate the odds going in the other direction. If we make a table of the probabilities of rolling a certain value for any given die, what we get is this

Roll   4 sided   5 sided   6 sided   8 sided
1      .25       .2        .167      .125
2      .25       .2        .167      .125
3      .25       .2        .167      .125
4      .25       .2        .167      .125
5      0         .2        .167      .125
6      0         0         .167      .125
7      0         0         0         .125
8      0         0         0         .125

Note that this table ignores the odds of actually drawing a specific die. It shows the odds for a given die once it has been drawn. Each die has its own column. Once I tell you which die I am holding, for instance the 5 sided die, you could delete all the other columns and tell me the odds of any given outcome based on that information. Bayes Theorem tells us that we can go the other direction also. If instead of telling you that I have a 5 sided die, I tell you that I rolled a 5 (with whatever die I had), we can delete all the blocks where the 5 wasn't rolled. (Basically deleting rows instead of columns.) What we are left with is

Roll   4 sided   5 sided   6 sided   8 sided
5      0         .2        .167      .125

What this tells us is that

If we had a 4 sided die, we had a 0% chance of rolling a 5
If we had a 5 sided die, we had a 20% chance of rolling a 5
If we had a 6 sided die, we had a 16.7% chance of rolling a 5
If we had an 8 sided die, we had a 12.5% chance of rolling a 5

Since I'm telling you that the 5 was the measured outcome, this is all that we need to worry about. Knowing the outcome of that event allows us to remove all the probabilities associated with other events.

However, we aren't quite done yet. With this type of probability, all the outcomes must sum up to 1.0 (i.e. 100% likelihood). That is because the outcome in question is no longer just possible, it actually happened. I actually did roll a 5, so there is a 100% chance that that event occurred.

With this Bayes Theorem example, the rows do not all sum to 1.0. So after we determine which cells we are keeping based on our known outcome, we need to adjust them to make them sum to 1.0, which is called normalizing. In this example, the outcomes only sum up to 49.2% (20 + 16.7 + 12.5). To make them sum to 1.0 we divide each of the probabilities by the sum of all the probabilities:

.2 / .492 = .407 = 40.7%
.167 / .492 = .339 = 33.9%
.125 / .492 = .254 = 25.4%

What this means is that, based on this single roll of a die, we have a 40.7% chance of having a 5 sided die, a 33.9% chance of having a 6 sided die, and a 25.4% chance of having an 8 sided die.
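If you want to check the arithmetic, here is a minimal Python sketch of the table method so far (the dice list is the one from the example; everything else is plain arithmetic):

```python
# Posterior for "which die did I draw?" after observing a roll of 5,
# assuming each of the 4 dice was equally likely to be drawn.
dice = [4, 5, 6, 8]  # number of sides on each die in the bag
roll = 5

# Likelihood of rolling a 5 with each die: 1/sides if the die can show a 5, else 0.
likelihoods = [1 / sides if roll <= sides else 0 for sides in dice]

# Divide by the sum so the kept probabilities total 1.0 (normalizing).
total = sum(likelihoods)
posterior = [lk / total for lk in likelihoods]

for sides, p in zip(dice, posterior):
    print(f"{sides} sided die: {p:.1%}")
# 4 sided die: 0.0%, 5 sided die: 40.7%, 6 sided die: 33.9%, 8 sided die: 25.4%
```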

But Wait, I Forgot To Account For The Initial Probabilities

Everything in the above paragraphs is correct for the example given. But something is missing for more complicated examples. To understand what that is, it is time to actually look at the Bayes Theorem equation shown below.

P(A|B) = P(B|A) × P(A) / P(B)

If we label the terms, what they are is

Prior, P(A): our initial estimate of the probability before we know the result of our data
Likelihood, P(B|A): the probability that any given initial condition would produce the result that we got
Normalizing Constant, P(B): the sum of the probabilities of all the conditions which satisfy our result
Posterior, P(A|B): the result that we are looking for
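In code, the whole equation is just a multiply and a divide per initial condition. Here is a minimal sketch (the function name and structure are mine, not from the book):

```python
def bayes_update(priors, likelihoods):
    """P(A|B) = P(B|A) * P(A) / P(B), computed for every initial condition A."""
    joint = [p * lk for p, lk in zip(priors, likelihoods)]  # P(B|A) * P(A)
    normalizing_constant = sum(joint)                       # P(B)
    return [j / normalizing_constant for j in joint]        # P(A|B)

# The dice example with equal priors: posterior after observing a roll of 5.
print(bayes_update([0.25, 0.25, 0.25, 0.25], [0, 0.2, 1 / 6, 0.125]))
# [0.0, 0.4067..., 0.3389..., 0.2542...]
```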

What we did in the dice example above was use the likelihood and the normalizing constant. I.e. we used the P(B|A) / P(B) part of the equation,

but did not explicitly use the P(A) part, the prior.

The P(B|A) part of the equation is the likelihood, which is the resulting value of any given cell. I.e. the likelihood for a 5 sided die was 20%. What this is saying is that if we assume we have a 5 sided die, the odds that we would roll a 5 are 20%. Likewise, the likelihoods for the 6 sided die and the 8 sided die are 16.7% and 12.5% respectively. The P(B) part of the equation is the normalizing constant, which is the sum of all of those likelihoods. In this case that was 49.2%. As we saw before, dividing by the normalizing constant means that the sum of all of the likelihoods is forced to equal 1.0.

What we did not explicitly use was the P(A) part of the equation, which is the prior. The prior is how you account for your initial estimate of probability before incorporating new data. In the example above, I told you that I had 4 dice in a bag and randomly drew one out. So it was reasonable to assume that I had equal odds of drawing out any single die. Since the initial odds for all of our outcomes were the same, the fact that we ignored the prior didn't affect the results, since the prior would have dropped out in the normalization.

However, it's possible, and even likely in many cases, that you won't have equal odds for all initial conditions in the prior. For the example above, you might assume that I was more likely to draw a 6 sided die than the 8 sided die because of the shape. Or, more reasonably, instead of 4 dice in the bag I might have 10 dice in the bag: one 4 sided die, two 5 sided dice, three 6 sided dice, and four 8 sided dice, in which case your initial estimate of probabilities would not be that there was an equal likelihood of drawing any given die. To relate that back to a real life example, if you are trying to estimate how many tanks an enemy country has, you might pick the most likely number and then estimate a bell curve around it for the other initial probability estimates. As a result, each individual starting state (i.e. whether they have 200 tanks or 400 tanks) might get a different initial probability.
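A bell-curve prior like that might be sketched as follows in Python (the candidate counts, center, and spread are made-up numbers, purely for illustration):

```python
import math

# Hypothetical prior over how many tanks the enemy has: a bell curve
# centered on the most likely estimate, evaluated at a few candidate counts.
candidate_counts = [100, 200, 300, 400, 500]
most_likely = 300    # assumed best guess
spread = 100         # assumed standard deviation of the estimate

weights = [math.exp(-((n - most_likely) ** 2) / (2 * spread ** 2))
           for n in candidate_counts]
total = sum(weights)
prior = [w / total for w in weights]

for n, p in zip(candidate_counts, prior):
    print(f"{n} tanks: {p:.1%}")
```

Each candidate starting state gets its own prior probability, exactly as each die gets its own column in the table.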

With Bayes Theorem, the way we account for whatever initial probability estimates we have is by multiplying the prior probability by each individual likelihood. I.e. if I have a 10% chance of having a 4 sided die, and I have a 25% chance of rolling a three assuming that I have a 4 sided die, then my odds of both having a 4 sided die and rolling a three are .1 × .25 = .025. So we can create the table again, accounting for both the prior probability and the likelihood of a given roll assuming you have that die. If we assume there is a 10%, 20%, 30%, and 40% chance of the 4, 5, 6, and 8 sided dice respectively, then what we are doing is multiplying each column of our original likelihood table

By the associated column in this initial probability table

4 sided   5 sided   6 sided   8 sided
.1        .2        .3        .4

The result is

Roll   4 sided   5 sided   6 sided   8 sided
1      .025      .04       .05       .05
2      .025      .04       .05       .05
3      .025      .04       .05       .05
4      .025      .04       .05       .05
5      0         .04       .05       .05
6      0         0         .05       .05
7      0         0         0         .05
8      0         0         0         .05

Importantly, if you sum every cell in this table, the total value is 1.0

Now if we say we rolled a 5, we can remove all the non-5 rolls,

Roll   4 sided   5 sided   6 sided   8 sided
5      0         .04       .05       .05

Sum up the remaining results

0 + .04 + .05 + .05 = .14

And divide by that sum to normalize those results

.04 / .14 = .286 = 28.6%
.05 / .14 = .357 = 35.7%
.05 / .14 = .357 = 35.7%
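Here is the same calculation carried out in a minimal Python sketch (the 10/20/30/40 priors and the dice are the ones assumed in this example):

```python
# Same calculation with an unequal prior: multiply each die's likelihood
# by its prior probability, keep the "rolled a 5" row, then normalize.
dice = [4, 5, 6, 8]
prior = [0.10, 0.20, 0.30, 0.40]  # assumed chance of having drawn each die
roll = 5

joint = [p * (1 / sides if roll <= sides else 0)
         for p, sides in zip(prior, dice)]

total = sum(joint)                # 0.14, the normalizing constant
posterior = [j / total for j in joint]

for sides, p in zip(dice, posterior):
    print(f"{sides} sided die: {p:.1%}")
# 4 sided die: 0.0%, 5 sided die: 28.6%, 6 sided die: 35.7%, 8 sided die: 35.7%
```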

The final result we get is that we have a 28.6% chance of having the 5 sided die, and a 35.7% chance of having each of the 6 and 8 sided dice.

A Visual Representation Of What We Just Did

Using the process above is good for actually solving the updated probabilities after incorporating new information. But for understanding exactly what happened, sometimes it is helpful to see it more graphically. The table below represents the likelihood table that we initially had when we had one of each of the four dice. This table is a 1x1 square, so it has a total area of 1.0. That represents a 100% probability. Thus the relative size of any given square represents how likely that outcome is.

In the table above, each of the columns has the same width. That is because every die is equally likely to be drawn. Since I have a 1 in 4 chance of drawing any single die, each column has 25% of the total area. Within a given column, all the rectangles have the same height. That is because, for any given die, you are equally likely to roll any number. However, the rectangles have different heights in different columns because different dice have different odds of rolling a given number. I.e. you have a 25% chance of rolling a 1 with a 4 sided die, but only a 20% chance of rolling it with the 5 sided die.

We can adjust the table above to account for any initial probabilities that we have (i.e. the prior). In the second part of the example above, we said we have a 10% chance of drawing the 4 sided die, a 20% chance of drawing the 5 sided die, a 30% chance of the 6 sided die, and a 40% chance of the 8 sided die. We can make the table reflect this by adjusting the width of the columns.

Now the blue column for the 4 sided die has 10% of the total area, and the widths of the columns for the other dice correspond to their initial probabilities. So far we haven't done Bayes Theorem. This is the same visual table you could make if you just had the different dice in a bag, and wanted to make a table of your odds of drawing any die and getting a specific roll with it.

We can start using Bayes Theorem by incorporating another result. Here we observe that we rolled a 5, so we can remove all outcomes that are not associated with rolling a 5. What remains is shown below

This gives us the relative likelihood of any given die resulting in the observed outcome. However, we don't want to just know the relative likelihood, we want to know the absolute probability. So we have to adjust our results so that they all add up to 100% probability again (i.e. normalize). We can do that by adjusting the relative width of the rectangles above and then stretching them all to have a total area of 1.0, as is shown below.

At this point, we have our result. If we measured the area of each of the columns, we would see that the 5 sided die has 28.6% of the total area, and the other two dice have 35.7% of the total area each. This is the same result we got using the table method. If we wanted to incorporate another roll, we could keep the resulting table above as our starting point, and incorporate the likelihoods of another roll. We will show that mathematically in the next example below.

2nd Example – Navigating An Asteroid Field

In the example above, we made an initial estimate of the probability that we had a given die and used Bayes Theorem to update that probability a single time. However, you are not limited to updating the probability a single time; you can continue updating it multiple times as long as you have new information to incorporate.

As an example, imagine you are a scruffy looking spaceship captain about to fly through an asteroid field. As you are about to enter the asteroid field, your “helpful” robot companion follows protocol and tells you that the odds of navigating the asteroid field are 1 in 3720, or approximately .0269%. But it doesn't know you very well, and if it had more information about your piloting skill, it would likely give you better odds. Bayes Theorem is how it can update that probability with new information.

In terms of Bayes Theorem, what were the odds the robot initially stated, the 1 in 3720? That was the initial estimate of your odds of success, i.e. the prior. How did the robot get that information? Likely he has records of thousands of pilots who flew into asteroid fields and how many came back out alive. I.e. he knows the odds for the population in general, but he might not know about you specifically. And there are some important factors that might affect the results. For instance, these might improve your odds

You have flown through asteroid fields multiple times before (perhaps smuggling illicit cargo in hidden storage compartments)
You have a skilled (if rather hirsute) co-pilot

But on the other hand, you are being chased and shot at by the galactic military, which will tend to lower your odds of survival. How can you account for all these factors? By using Bayes Theorem.

Let's start with the first piece of information, which is that you have flown through asteroid fields multiple times in the past. We need to make a likelihood table for how many of the people who survived and did not survive had previous experience flying through asteroid fields, and how many didn't. Based on the information from your golden android, that likelihood table is shown below

                         Did not survive   Survived
No previous experience   .98               .20
Previous experience      .02               .80

What we see is that of the people who didn't survive, 98% of them had never navigated an asteroid field before. And of the people who survived, only 20% had never flown through an asteroid field before. Now we can multiply that likelihood table by the initial probabilities (1 in 3720 people did survive, and 3719 in 3720 people did not survive) and get this new table

                         Did not survive   Survived
No previous experience   .979737           .000054
Previous experience      .019995           .000215

We get 4 resulting paired probabilities. From this, we can see that, of all the people flying into an asteroid field, the vast majority, 97.97%, are people doing it for the first time who will perish in the attempt. However, what we care about are people who are doing it for the second (or more) time, so we delete the row that doesn't match our situation. In this case, we keep the fact that we have previously flown through asteroid fields.

                         Did not survive   Survived
Previous experience      .019995           .000215

Summing those odds and normalizing gives us this result

Chance of surviving:      .01064
Chance of not surviving:  .98936

What we have calculated is that an experienced pilot has better odds of navigating the asteroid field. .01064 is just over 1 percent, or approximately 1 in 94. Now those might not be great odds, but they are certainly better than 1 in 3720.
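This first update is short enough to sketch in Python (the 1 in 3720 prior comes from the example, and the .80/.02 likelihoods are read off the experience table above, since 20% and 98% had no experience):

```python
# First Bayes update for the asteroid run: start from the population-wide
# prior (1 in 3720 survive) and fold in the pilot's previous experience.
p_survive = 1 / 3720
prior = {"survive": p_survive, "perish": 1 - p_survive}

# Likelihood of having flown through asteroid fields before, given each outcome.
p_experienced = {"survive": 0.80, "perish": 0.02}

joint = {k: prior[k] * p_experienced[k] for k in prior}
total = sum(joint.values())
posterior = {k: v / total for k, v in joint.items()}

print(f"survive: {posterior['survive']:.5f}")  # ~0.01064, about 1 in 94
```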

Swamping The Prior

This brings up an interesting point about Bayes Theorem, which is that if your initial odds are really low, the new odds after 1 piece of new data are likely to still be low. This is frequently seen in medical testing. If the odds that you have a rare disease are low, then the odds that you have the disease after a single positive test are probably still low (i.e. a false positive). After several tests, however, the initial probability becomes dominated by the new information, which is known as “swamping the prior”.

In this example, we have additional information we can incorporate into our probability calculation. Namely, we can account for the fact that we have a skilled co-pilot, but sadly also need to account for the fact that we are being pursued. The easiest way to include this is to just take the results from the previous step and treat them as the prior for this step. So our odds going into this second step are

.01064 that we will survive
.98936 that we will not survive

If we make a likelihood table of who survived based on their co-pilot status

We can then multiply by our prior and get

After discarding the events that don't fit with our situation, we are left with

Now in the previous examples, we normalized these results. And we could normalize again right now and it would work just fine. But the thing about normalizing is, as long as you do it in the last step, it doesn't matter whether you do it in the intermediate steps or not. Since we also need to adjust the odds based on the fact that the military is shooting at us, we can wait and normalize after doing that calculation. The only thing to be aware of after multiple iterations is that the probabilities can get so small that you start underrunning the available precision of whatever computer/calculator/robot you are using.
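As a quick check of that claim, here is a minimal Python sketch showing that normalizing at every step and normalizing only at the end give the same answer. The co-pilot and pursuit likelihood values here are made-up placeholders for illustration, not the book's actual tables:

```python
# One unnormalized Bayes step: multiply each state's probability
# by the likelihood of the observed evidence given that state.
def update(prior, likelihood):
    return [p * lk for p, lk in zip(prior, likelihood)]

def normalize(probs):
    total = sum(probs)
    return [p / total for p in probs]

prior = [0.01064, 0.98936]   # [survive, not survive] after the first update
copilot = [0.9, 0.3]         # hypothetical likelihoods, given each outcome
pursued = [0.5, 0.6]         # hypothetical likelihoods, given each outcome

# Normalize only once, at the end:
end_only = normalize(update(update(prior, copilot), pursued))
# Normalize after every step:
each_step = normalize(update(normalize(update(prior, copilot)), pursued))

print(end_only)
print(each_step)  # same posterior either way, up to floating point rounding
```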

So here let's skip normalizing and go straight into the third Bayes calculation, where we incorporate the fact that we are being pursued by the galactic government. Assume that the likelihood table looks like this

We can multiply that by the prior from the previous step and get these results

When we normalize, we get the final odds of survival of

Chance of surviving:      .06977
Chance of not surviving:  .93023

.06977 is 6.977%, which is approximately 1 in 14. Those are just the kind of heroic odds we need to impress any nearby princesses.

More About This Table Method

This book showed a table method of doing Bayes Theorem, which I think is a very intuitive method. This table method is not the only way to solve Bayes Theorem. We wasted some effort by generating likelihoods for outcomes that we immediately threw away. A refinement of this method would be to not construct the full table at each step. For instance, if you know you rolled a 5, you don't have to populate every roll and then discard most of them; you just need to generate the data associated with rolling a 5. No matter what refinements you include, though, you will have to calculate the odds of that outcome for every single possible initial state (i.e. each column). As a result, thinking of Bayes Theorem as a table where you are keeping certain rows based on the outcome you observe is a good way to remember how it works.

More Books

If you liked this book, you may be interested in checking out some of my other books. The full list with links is located here

Some that you may like are Bayes Theorem Examples – This book gives additional examples of how to use Bayes Theorem. It dives into some details that were not covered in this shorter book, such as how you can account for potential errors in your data.

Probability – A Beginner’s Guide To Permutations and Combinations – Which dives deeply into what the permutation and combination equations really mean, and how to understand permutations and combinations without having to just memorize the equations. It also shows how to solve problems that the traditional equations don’t cover, such as “If you have 20 basketball players, how many different ways you can split them into 4 teams of 5 players each?” (Answer 11,732,745,024)

Linear Regression and Correlation: Linear Regression is a way of simplifying a set of data into a single equation. For instance, we all know Moore’s law: that the number of transistors on a computer chip doubles every two years. This law was derived by using regression analysis to simplify the progress of dozens of computer manufacturers over the course of decades into a single equation. This book walks through how to do regression analysis, including multiple regression when you have more than one independent variable. It also demonstrates how to find the correlation between two sets of numbers.

And here is another, more advanced, book on Bayes Theorem by a different author: “Think Bayes” by Allen Downey. This book goes much deeper into complicated probability distributions for the priors and the likelihoods. I found the probability of any given spot getting hit by a paintball during a paintball competition to be interesting.

Thank You

Before you go, I'd like to say thank you for purchasing my eBook. I know you have a lot of options online to learn this kind of information. So a big thank you for downloading this book and reading all the way to the end. If you like this book, then I need your help. Please take a moment to leave a review for this book. It really does make a difference and will help me continue to write quality eBooks on Math, Statistics, and Computer Science.

P.S. I would love to hear from you. It is easy for you to connect with us on Facebook here

or on our web page here

But it's often better to have one-on-one conversations. So I encourage you to reach out over email with any questions you have or just to say hi! Simply write here:

~ Scott Hartshorn