Why is an outlier defined as 1.5 interquartile ranges outside of each quartile?

Great question, imagination!

The simple answer, I think, is that it’s a nice and easy thing to work out, and 1.5 interquartile ranges is quite a long way from the central box (if there’s no skew, the median splits the box in half, and 1.5 IQRs is three times the length of each half of the box in the box-and-whiskers plot).

### But why 1.5?

My suspicion is that it’s something to do with the normal distribution, because everything is something to do with the normal distribution ((well, nearly everything. That’s why it’s called normal.))

The $z$-scores for the quartiles are, inexplicably, not in the standard table EdExcel kindly give you for your A-level exams. However, I have a $z$-score calculator on my machine which tells me they’re $\pm 0.6745$ (to four significant figures).

One-and-a-half interquartile ranges above the upper quartile would take us to a $z$-score of 2.698, which corresponds to a tail probability of 0.0035.
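
In case the 2.698 looks like it came from nowhere, here is the arithmetic as I read it, writing $z_Q$ for the upper-quartile $z$-score (my label, not a standard one). On the $z$ scale the interquartile range runs from $-z_Q$ to $z_Q$, so it is $2z_Q$ wide:

$$
z_{\text{fence}} = z_Q + 1.5 \times 2z_Q = 4z_Q \approx 4 \times 0.6745 = 2.698
$$

So, for a normal distribution, the outlier boundary sits at exactly four times the quartile's $z$-score.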

That means, assuming the normal distribution is the true underlying model for your observations, the probability of a given observation lying in either ‘outlier’ zone (above or below) is twice that tail probability: between 0.5% and 1% - in fact, about 1/143, if R isn’t telling me porkies.
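
If you'd like to check those numbers yourself, here is a minimal sketch. It uses Python's standard-library `statistics.NormalDist` rather than the calculator and R mentioned above, so treat it as a stand-in for those tools; the variable names are my own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist()

# z-score of the upper quartile (the 75th percentile)
z_upper_quartile = std_normal.inv_cdf(0.75)

# By symmetry the lower quartile is at -z_upper_quartile,
# so the interquartile range on the z scale is twice the quartile z-score
iqr = 2 * z_upper_quartile

# The upper outlier boundary: 1.5 IQRs above the upper quartile
z_upper_fence = z_upper_quartile + 1.5 * iqr

# Probability of landing above the upper boundary, then in either tail
one_tail = 1 - std_normal.cdf(z_upper_fence)
both_tails = 2 * one_tail

print(f"quartile z-score:    {z_upper_quartile:.4f}")
print(f"upper fence z-score: {z_upper_fence:.3f}")
print(f"one-tail probability:  {one_tail:.4f}")
print(f"two-tail probability:  {both_tails:.4f} (about 1 in {1 / both_tails:.0f})")
```

This only says anything about data that really is normally distributed; skewed or heavy-tailed data will throw up 'outliers' a good deal more often.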

Intriguingly, $\sqrt{2}$ IQRs above and below the quartiles give an outlier probability of very close to 1% - but that’s just a bit harder to remember, a bit harder to work out, and - in the end - just as arbitrary.
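
To compare multipliers, here is a rough sketch, again assuming a normal model and using Python's `statistics.NormalDist`. The helper names `outlier_probability` and `multiplier_for` are made up for this illustration. The second one runs the calculation backwards, to see which multiplier would give a chosen outlier probability.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()            # standard normal: mean 0, sd 1
z_q = std_normal.inv_cdf(0.75)       # z-score of the upper quartile


def outlier_probability(k):
    """Two-tailed probability of lying more than k IQRs outside the quartiles."""
    z_fence = z_q + k * (2 * z_q)    # the IQR is 2 * z_q wide on the z scale
    return 2 * (1 - std_normal.cdf(z_fence))


def multiplier_for(p):
    """Number of IQRs beyond the quartiles that gives two-tailed probability p."""
    z_fence = std_normal.inv_cdf(1 - p / 2)
    return (z_fence - z_q) / (2 * z_q)


for k in (1.5, sqrt(2)):
    print(f"k = {k:.4f}: outlier probability = {outlier_probability(k):.4%}")

print(f"multiplier for exactly 1%: {multiplier_for(0.01):.4f}")
```

The multiplier that gives exactly 1% comes out a shade below $\sqrt{2}$, which is why the match is close rather than perfect.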