Generating functions and probability distributions

I was reminded recently about something that took me an embarrassingly long time to realise: that the binomial expansion and the binomial distribution were intrinsically linked.

I mean, sure, obviously they both have an nCr in them, but what of it? It’s just Pascal’s triangle.

And then someone mentioned generating functions in a way that made sudden sense to me, and I hope I can do the same for you.

Let’s start with a concrete example

Suppose we roll four standard six-sided dice and want to know how likely we are to throw any given number of sixes. That’s simple enough:

Sixes	Calculation	Probability
0	$^{4} C_{0} {(\frac{1}{6})}^{0} {(\frac{5}{6})}^{4}$	0.482
1	$^{4} C_{1} {(\frac{1}{6})}^{1} {(\frac{5}{6})}^{3}$	0.386
2	$^{4} C_{2} {(\frac{1}{6})}^{2} {(\frac{5}{6})}^{2}$	0.116
3	$^{4} C_{3} {(\frac{1}{6})}^{3} {(\frac{5}{6})}^{1}$	0.015
4	$^{4} C_{4} {(\frac{1}{6})}^{4} {(\frac{5}{6})}^{0}$	0.001

Now, if we look at the binomial expansion of $((\frac{5}{6}) + (\frac{1}{6}) x)^{4}$ , we get:

$^{4} C_{0} {(\frac{1}{6})}^{0} {(\frac{5}{6})}^{4} x^{0} +$
$^{4} C_{1} {(\frac{1}{6})}^{1} {(\frac{5}{6})}^{3} x^{1} +$
$^{4} C_{2} {(\frac{1}{6})}^{2} {(\frac{5}{6})}^{2} x^{2} +$
$^{4} C_{3} {(\frac{1}{6})}^{3} {(\frac{5}{6})}^{1} x^{3} +$
$^{4} C_{4} {(\frac{1}{6})}^{4} {(\frac{5}{6})}^{0} x^{4}$

… hey, look! Those calculations are identical to the ones in the table. The coefficient of the $x^{k}$ term is the probability of rolling $k$ sixes!

And this works in general – if you’ve got $n$ independent trials, each with a probability of success $p$ (and a probability of failure $q = 1 - p$), you want the coefficients of $(q + p x)^{n}$ . If the probabilities differ from trial to trial but the trials are still independent, you can work out $(q_{1} + p_{1} x) (q_{2} + p_{2} x) \dots$ in just the same way.
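If you'd rather let a computer do the expanding, here is a minimal Python sketch of the idea. The helper names `multiply` and `success_count_pgf` are my own inventions for illustration, and I've used exact fractions so that nothing gets lost to rounding until the final print.

```python
from fractions import Fraction


def multiply(a, b):
    """Multiply two polynomials stored as coefficient lists (index = power of x)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def success_count_pgf(probabilities):
    """Expand the product of (q_i + p_i x) over a list of independent trials."""
    poly = [Fraction(1)]
    for p in probabilities:
        poly = multiply(poly, [1 - p, p])  # one factor (q + p x) per trial
    return poly


# Four dice, each with a 1/6 chance of a six.
four_dice = success_count_pgf([Fraction(1, 6)] * 4)
print([round(float(c), 3) for c in four_dice])
# [0.482, 0.386, 0.116, 0.015, 0.001]
```

Passing a list of unequal probabilities, such as `[Fraction(1, 2), Fraction(1, 3)]`, handles the case where the trials have different chances of success.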

Other distributions

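It turns out you can do just the same thing for the geometric distribution: given $X \sim G e o (p)$ , the probability that the first success comes on trial $k$ is $P (X = k) = p q^{k - 1}$ for $k = 1, 2, 3, \dots$ , where $q = 1 - p$ .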

What would our generating function (GF) look like? $g_{p} (x) = p x + p q x^{2} + p q^{2} x^{3} + \dots$ .

That looks a lot like, well, a geometric series. The first term is $p x$ and the common ratio is $q x$ , so (as long as $|q x| < 1$ ) the sum of the series is $g_{p} (x) = \frac{p x}{1 - q x}$ . How lovely!
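As a rough numerical check (a sketch, not a proof), we can add up the first couple of hundred terms of the series and compare with the closed form. The function names and the choices $p = \frac{1}{6}$ and $x = 0.9$ are arbitrary assumptions of mine; any $x$ with $|q x| < 1$ should behave the same way.

```python
def geometric_pgf_partial(p, x, terms=200):
    """Sum the first `terms` terms of p x + p q x^2 + p q^2 x^3 + ..."""
    q = 1 - p
    return sum(p * q ** (k - 1) * x ** k for k in range(1, terms + 1))


def geometric_pgf_closed(p, x):
    """Closed form p x / (1 - q x), valid when |q x| < 1."""
    q = 1 - p
    return p * x / (1 - q * x)


p, x = 1 / 6, 0.9
partial = geometric_pgf_partial(p, x)
closed = geometric_pgf_closed(p, x)
print(partial, closed)  # the two values should agree to many decimal places
```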

And the Poisson? Well, if $Y \sim P o (λ)$ , then $P (Y = k) = e^{- λ} \frac{λ^{k}}{k!}$ .

If we write out the corresponding function, we get $e^{- λ} (1 + \frac{λ x}{1} + \frac{λ^{2} x^{2}}{2!} + \dots)$ , and that bracket is just $e^{λ x}$ . The GF is $p_{λ} (x) = e^{λ (x - 1)}$ .
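The same kind of check works here. This is only a sketch under my own assumptions: the function names are made up, and the values 2.5 for the Poisson parameter and 0.7 for $x$ are picked out of thin air.

```python
import math


def poisson_pgf_partial(lam, x, terms=60):
    """Sum the first `terms` terms of e^(-lam) * lam^k / k! * x^k, starting at k = 0."""
    return sum(
        math.exp(-lam) * lam ** k / math.factorial(k) * x ** k
        for k in range(terms)
    )


def poisson_pgf_closed(lam, x):
    """Closed form e^(lam (x - 1))."""
    return math.exp(lam * (x - 1))


lam, x = 2.5, 0.7
series_value = poisson_pgf_partial(lam, x)
closed_value = poisson_pgf_closed(lam, x)
print(series_value, closed_value)  # these should match closely
```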

Sanity checks and means

A quick sanity check for GFs is to see whether the probabilities sum to one – and that’s easy: just put in $x = 1$ and check the result is 1. (It is for all of these.)

There’s also a quick way to figure out means: if you differentiate the generating function and put in $x = 1$ , the dark magic of GFs gives you the mean. Can you figure out why?
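If you'd like to convince yourself numerically before working out the reason, here is a small sketch that tests both claims on the three GFs from this post. It estimates the derivative with a central difference rather than doing any calculus, so it checks the result without giving away the why. The helper `derivative_at_one`, the step size and the parameter values are all my own assumptions.

```python
import math


def derivative_at_one(g, h=1e-6):
    """Estimate g'(1) with a central difference (an approximation, not exact)."""
    return (g(1 + h) - g(1 - h)) / (2 * h)


n, p, lam = 4, 1 / 6, 2.5
q = 1 - p


def binomial_gf(x):
    return (q + p * x) ** n


def geometric_gf(x):
    return p * x / (1 - q * x)


def poisson_gf(x):
    return math.exp(lam * (x - 1))


for name, g, mean in [
    ("binomial", binomial_gf, n * p),
    ("geometric", geometric_gf, 1 / p),
    ("Poisson", poisson_gf, lam),
]:
    # g(1) should be 1, and the slope at 1 should be close to the known mean.
    print(name, g(1), derivative_at_one(g), mean)
```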
