While writing an obituary for George Box, I stumbled on something I thought was ingenious: a method for generating independent pairs of numbers drawn from the normal distribution.

I’ll concede: that’s not necessarily something that makes the average reader-in-the-street stop in their tracks and say “Wow!” In honesty, it would probably make the average reader-in-the-street rapidly become a reader-on-the-other-side-of-the-street. However, I thought an article on it might provide some insight into two mathematical minds: that of George Box, one of the greatest ((If not the greatest)) statisticians of the 20th century, and that of me, possibly the greatest mathematical hack of the 21st.

### How the Box-Muller transform works

If you want to apply the Box-Muller transform, you need two numbers drawn from a uniform distribution - so they’re equally likely to take any value between 0 and 1 (strictly above 0, in $U$’s case, so the logarithm in a moment makes sense). Let’s call these numbers $U$ and $V$. Box and Muller claim that if you work out

$$X = \sqrt{-2 \ln (U)} \cos (2\pi V)$$ and $$Y = \sqrt{-2 \ln (U)} \sin (2\pi V)$$

then $X$ and $Y$ are independent (information about one tells you nothing about the other) and normally distributed with a mean of 0 and a standard deviation of 1. I’m not going to prove that, because I don’t know how, but I can explain what’s happening.
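The recipe is short enough to try directly. Here’s a minimal Python sketch (my own, not from Box and Muller’s paper) that generates pairs and checks the headline claims - mean near 0, variance near 1, and no correlation between $X$ and $Y$:

```python
import math
import random

def box_muller(rng=random.random):
    """One pair of (claimed) independent standard normals from two uniforms."""
    u = 1.0 - rng()               # rng() is in [0, 1); shift to (0, 1] so log is safe
    v = rng()
    r = math.sqrt(-2.0 * math.log(u))
    theta = 2.0 * math.pi * v
    return r * math.cos(theta), r * math.sin(theta)

random.seed(1)
n = 100_000
pairs = [box_muller() for _ in range(n)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
mean_x = sum(xs) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
cov_xy = sum(x * y for x, y in pairs) / n - mean_x * (sum(ys) / n)
print(round(mean_x, 2), round(var_x, 2), round(cov_xy, 2))  # all near 0, 1 and 0
```

(Zero correlation isn’t the same as independence, of course, but it’s a cheap smoke test.)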

There’s a hint in my choices of letter: you might recognise that you could simplify these down to $X = R \cos(\theta)$ and $Y = R\sin(\theta)$ - polar coordinates, or the two perpendicular sides of a right-angled triangle with hypotenuse $R$. The $R = \sqrt{-2\ln(U)}$ is the distance from $(0,0)$: because $U$ is between 0 and 1, $\ln(U)$ is anywhere from $-\infty$ to 0 ((It’s exponentially distributed, since you ask.)). Multiplying by $-2$ turns it into a nice positive number (so you can take its square root), and taking the square root reins in the occasional huge values. For normally-distributed variables, you want the distances to clump up in the middle, which is exactly what this combination achieves; the 2 is what makes the standard deviation come out as exactly 1.

The $\theta = 2\pi V$ is much simpler: it just says ‘move in a random direction’.
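To see the clumping in action - and to back up that footnote - here’s a quick check (again, my own sketch) that $-2\ln(U)$, which is $R^2$, really does average out at 2:

```python
import math
import random

# -2 ln(U) is exponentially distributed with mean 2, so R = sqrt(-2 ln U)
# is usually modest and only occasionally large - distances clump in the middle.
random.seed(2)
n = 100_000
r2 = [-2.0 * math.log(1.0 - random.random()) for _ in range(n)]  # R^2 values
mean_r2 = sum(r2) / n
print(round(mean_r2, 1))  # close to 2
```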

### What Colin did next

My immediate thought was, ‘I wonder if I can use that to work out the probability tables for $z$-scores you get in formula books!’ What do you mean, that wasn’t your immediate thought? ((Weirdo!)) Long story short: the answer is no; I just wanted to show you my thought process and that not everything in maths works out as neatly as you’d like.

My insight was that the probability of generating an $X$ value smaller than some constant $k$ is just the probability of landing on a $(U, V)$ pair that gives such an $X$ - in other words, the area of the relevant region of the unit square. So far so obvious! In that case, it’s just a case of rearranging the formulas to get an expression for (say) $V$ in terms of $U$ and integrating to find the appropriate area.
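The insight itself is easy to confirm numerically. This sketch (mine, nothing to do with Box and Muller) estimates $P(X < k)$ by brute force over uniform $(U, V)$ pairs and compares it with the formula-book value $\Phi(k)$, computed via `math.erf`:

```python
import math
import random

def phi(k):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))

random.seed(3)
n = 100_000
k = 1.0
hits = 0
for _ in range(n):
    u = 1.0 - random.random()  # keep U strictly positive for the log
    v = random.random()
    x = math.sqrt(-2.0 * math.log(u)) * math.cos(2.0 * math.pi * v)
    if x < k:
        hits += 1
print(round(hits / n, 2), round(phi(k), 2))  # both close to 0.84
```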

So I tried that:

$$\sqrt{-2 \ln (U)} \cos(2\pi V) = k \\ \cos(2\pi V) = \sqrt{ \frac{k^2}{-2\ln(U)}} \\ V = \frac{1}{2\pi}\cos^{-1}\left( \sqrt{ \frac{k^2}{-2\ln(U)}} \right)$$

Yikes. I don’t fancy trying to integrate that - the arccos is bad enough, but the $\ln(U)$ on the bottom? Forget about it.

Let’s try the other way:

$$\sqrt{-2 \ln (U)} \cos(2\pi V) = k \\ -2\ln(U) = k^2 \sec^2(2\pi V) \\ U = e^{-\frac{k^2}{2}\sec^2(2\pi V)}$$

Curses! I don’t think that’s going to work, either. $e^{\sec^2 x}$ isn’t an integral I know how to do - so I’m stymied.
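There is a consolation prize, though: even if the integral resists pen and paper, the rearrangement still pins down the right region of the unit square, and a computer is happy to integrate over it. For $k > 0$ and a fixed $V$, the condition $X < k$ holds for every $U$ when $\cos(2\pi V) \le 0$, and for $U > e^{-\frac{k^2}{2}\sec^2(2\pi V)}$ otherwise - so summing those interval lengths over $V$ with a midpoint rule (my own sketch, not a standard recipe) recovers the $z$-table value:

```python
import math

def phi_via_region(k, n=100_000):
    """P(X < k) for k > 0, by integrating over V the length of the
    U-interval on which sqrt(-2 ln U) cos(2 pi V) < k (midpoint rule)."""
    total = 0.0
    for i in range(n):
        v = (i + 0.5) / n
        c = math.cos(2.0 * math.pi * v)
        if c <= 0.0:
            total += 1.0  # X <= 0 < k here, whatever U is
        else:
            # X < k  iff  U > exp(-(k^2 / 2) * sec^2(2 pi V))
            total += 1.0 - math.exp(-(k * k) / (2.0 * c * c))
    return total / n

print(round(phi_via_region(1.0), 4))  # the z-table gives 0.8413
```

Which matches the formula book - so the approach was sound, even if the closed form wasn’t forthcoming.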

Back to the drawing board, I’m afraid - this time, I didn’t get the cookie of a new maths discovery. But the difference between a poor mathematician and a decent mathematician is that the poor mathematician says “I got it wrong, I’m rubbish”, while the decent mathematician says either “Ah well. Next puzzle!” or “Ah well. Try again!”

The great mathematicians, of course, see right to the end of the puzzle before they start.