A Co-Proof of the Birthday Problem
“[In this context] Co- just means ‘opposite’ — so a co-mathematician is a machine for turning theorems into ffee.” — Miles ([twit handle=”pozorvlak”])
Matt Parker ([twit handle = “standupmaths”]) laid down a challenge on Day 1 of the MathsJam conference: he said that proof by MathsJam was acceptable, because if it wasn’t true, you wouldn’t be giving a talk about it. It immediately became my mission to talk about something that isn’t true, but still looks plausible. If you like, a co-proof.
I picked the Birthday Problem — sometimes called a paradox, for reasons I’ve never been able to fathom — because it’s something everyone in the room would know about, and something I happened to have an erroneous proof of.
The Correct Proof
You may know the correct proof: The chances of two random people not sharing a birthday are $\frac{364}{365}$. For three people, that drops to $\frac{364}{365} \times \frac{363}{365}$, and so on: the first time you see it, it’s a great surprise that you only need 23 people for it to drop below a half (meaning you’ve a better-than-even chance of seeing two people with the same birthday).
I looked at it from a different angle. Think about the number of connections between people. If you have $n$ people, there are $k = \frac{n(n-1)}{2}$ possible pairings. Each of those has a probability of $\frac{364}{365}$ of being a mismatch, so the probability of there being no shared birthday among the $n$ people is $\left(\frac{364}{365}\right)^k$.
Where it gets interesting
This is where it gets interesting for me, as a Mathematical Ninja: if you write $\left(\frac{364}{365}\right)^k$ as $\left(1 - \frac{1}{365}\right)^k$, you see something familiar: it’s something to do with $e$. In fact, $\left(1-\frac{1}{365}\right)^{365}$ isn’t at all far off of $\frac{1}{e}$ — the approximation is an underestimate by about 0.14% ((In my MathsJam talk, I mistakenly gave this error as 1.4%.)) . If you had — for the sake of argument — $n = 23$ making $k= 253$ connections, you could give the probability of no shared birthday as $\left[\left(1-\frac{1}{365}\right)^{365}\right]^{\frac{253}{365}}$. As it happens — and this is by far the most interesting thing — $\frac{253}{365} \simeq 0.69315$. It’s $\ln(2)$, to five decimal places. So, the probability of having no matches would be slightly less than $\left( \frac{1}{e} \right)^{\ln(2)}$, which is $\frac{1}{2}$.
The answer from the approximation there works out to 0.4995. The answer for $n=23$ done properly is 0.4927 — meaning the erroneous version is off by 1.4%. 23 is the first number of people where it drops below a half.
Two observations: one, the ‘connections’ version is wrong, but two, it’s really not all that far off. Also, $\frac{253}{365} = \ln(2)$ (to five decimal places), how cool is that?
Where it all goes wrong
The error — as several people told me after the talk — comes from a sneaky assumption: that all of the connections are equally improbable. In fact, the probabilities of the links being matches changes depending on what you know about the other links — something the correct version accounts for.
As the conference wound down, fellow-delegate Kathryn ([twit handle = “KathrynHTaylor”]) and I made a concerted effort to estimate the error. It turns out that (when you expand the brackets) the powery approximation is off by approximately $\frac{n(n-1)(n-2)}{6 \times 365^2}$ — although that ignores terms in $365^3$ that are certainly significant by the time $n$ reaches 23.