Bayes' Theorem, summer babies and that funny | symbol
There’s an excellent article by John Allen Paulos (@johnallenpaulos) talking about a really interesting probability ‘paradox’ to do with summer births.
It’s not really a paradox (as with a lot of probability, it makes perfect sense once you think it through), but the “at least one boy” puzzle was one of my favourites growing up.
An excuse to talk about Bayes’ Theorem!
What Paulos’s article doesn’t do, though, is provide a nice, colourful table to illustrate either of the puzzles. In fairness, the seasons one would be a bit big, but the simple version? Let me put that right.
If we know 100 families with two children, we’d expect their distribution to look something like this((In real life, you’d also expect it to vary a bit - so although this is the most likely distribution, the chances of it being exactly this are actually very small - that’s a story for another day.)):
| | Older boy | Older girl | Total |
|---|---|---|---|
| Younger boy | 25 | 25 | 50 |
| Younger girl | 25 | 25 | 50 |
| Total | 50 | 50 | 100 |
That is, if you want to know the probability of the family being made up like mine - I have a younger brother - you look down the older boy column until you reach the younger boy row; there were 25 families that matched that description, out of 100 altogether. That’s 25/100 = 25% or one in four.
Paulos’s first example asks about the probability of the second child being a boy if you know the first child is a boy. “If you know…” is usually translated into maths as “Given” - which is something that comes up a lot in S1, and makes everyone cry (unless they’re up to speed with Bayes’ Theorem). You might even see it as $P(\text{two boys}|\text{older boy})$ - which you read as “the probability of ‘two boys’ given ‘older boy’”.
I’ll try to wipe those tears away with a Really Clear Table.
| | Older boy | Older girl | Total |
|---|---|---|---|
| Younger boy | **25** | 25 | 50 |
| Younger girl | **25** | 25 | 50 |
| Total | **50** | 50 | 100 |
We’re only interested (here) in families with an older boy. (Whenever you’ve got a “given” question, you want to highlight the events AFTER the | symbol - they’re the ones you know are possible).
We see there are 50 families in total there, and 25 of them have a second boy - the probability is $\frac{25}{50} = \frac{1}{2}$, as you’d expect.
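In symbols, the same sum looks like this. It uses the standard definition of conditional probability with the counts read straight off the table:

$$P(\text{two boys} \mid \text{older boy}) = \frac{P(\text{two boys and older boy})}{P(\text{older boy})} = \frac{25/100}{50/100} = \frac{1}{2}$$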
The tricky bit comes in Paulos’s second example, where you know at least one of the children is a boy - but not which one. Here, we’re looking for $P(\text{two boys}|\text{at least one boy})$ - and we want to highlight every family with at least one boy.
| | Older boy | Older girl | Total |
|---|---|---|---|
| Younger boy | **25** | **25** | 50 |
| Younger girl | **25** | 25 | 50 |
| Total | 50 | 50 | 100 |
Now it’s quite clear, I hope: there are 75 families with at least one boy; 25 of them have two boys. The probability of two boys given at least one is $\frac{25}{75} = \frac{1}{3}$.
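If you'd rather let a computer do the counting, here is a minimal Python sketch of the table method. It assumes, as the table does, that the four older/younger combinations are equally likely; the function and variable names are my own, purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Every (older, younger) combination, assumed equally likely.
families = list(product("BG", repeat=2))


def conditional(event, given):
    """P(event | given), found by counting equally likely families."""
    possible = [f for f in families if given(f)]   # the highlighted cells
    matching = [f for f in possible if event(f)]   # ...that also fit the event
    return Fraction(len(matching), len(possible))


def two_boys(family):
    return family == ("B", "B")


def older_boy(family):
    return family[0] == "B"


def at_least_one_boy(family):
    return "B" in family


print(conditional(two_boys, older_boy))         # 1/2
print(conditional(two_boys, at_least_one_boy))  # 1/3
```

The `given` argument plays the role of whatever comes after the | symbol: it picks out the families you highlight, and `event` is then counted only among those.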
The really clear table method works in almost all cases where you could use a Venn diagram - the only times you actually need a Venn diagram are when the question asks for it, or if there are more than two events.
The really odd thing about the summer babies problem
Imagine you ask someone about their kids and they say “I’ve got two kids – actually, I’m just off to pick my son up”. From the information you have, you’d say there was a $\frac{2}{3}$ chance that the other child was a daughter.
If instead they said “I’ve got two kids – actually, I’m just off to pick my son up from his birthday party,” the probability changes to a little over $\frac{1}{2}$ – even though the parent has said nothing about the other child.
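To see where "a little over a half" comes from, here is a rough Python sketch of the seasons version of the puzzle: two children, at least one of whom is a boy born in summer. It assumes that sexes and seasons are all equally likely and independent, which is my simplification rather than anything stated above. The birthday version works the same way, just with many more "seasons".

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
SEASONS = ("spring", "summer", "autumn", "winter")

# Each child is one of 8 equally likely (sex, season) pairs,
# so there are 64 equally likely two-child families.
children = list(product(SEXES, SEASONS))
families = list(product(children, repeat=2))


def is_summer_boy(child):
    return child == ("boy", "summer")


def is_girl(child):
    return child[0] == "girl"


# Highlight the families with at least one summer-born boy...
with_summer_boy = [f for f in families if any(is_summer_boy(c) for c in f)]
# ...and count how many of those also include a daughter.
with_daughter = [f for f in with_summer_boy if any(is_girl(c) for c in f)]

p_daughter = Fraction(len(with_daughter), len(with_summer_boy))
print(p_daughter)  # 8/15
```

Under those assumptions, 15 of the 64 families have a summer-born boy and 8 of those 15 include a daughter, so the chance is $\frac{8}{15}$, or about 53%.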
Bayes’ Theorem has a lot to answer for. What’s the probability there’s some ibuprofen in the bathroom?
- Edited 2021-01-03 for formatting.