Anyone who has built software for a while knows that estimating how long something is going to take is *hard*. It’s hard to come up with an unbiased estimate of how long something will take when the work itself is fundamentally about *solving* something. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact.

Let’s say you estimate a project to take 1 week. Let’s say there are three equally likely outcomes: either it takes 1/2 week, or 1 week, or 2 weeks. The *median* outcome is actually the same as the estimate: 1 week, but the *mean* (aka *average*, aka *expected value*) is (1/2 + 1 + 2)/3 = 7/6 ≈ 1.17 weeks. The estimate is actually calibrated (unbiased) for the median (which is 1), but not for the mean.
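To make the arithmetic concrete, here is a tiny sketch using only Python’s standard library. The variable names are mine, chosen for illustration.

```python
from statistics import mean, median

# Three equally likely outcomes for a task estimated at 1 week.
outcomes_weeks = [0.5, 1.0, 2.0]

median_weeks = median(outcomes_weeks)  # the middle outcome
mean_weeks = mean(outcomes_weeks)      # (0.5 + 1 + 2) / 3 = 7/6

print(f"median: {median_weeks:.2f} weeks")
print(f"mean:   {mean_weeks:.2f} weeks")
```

The median matches the estimate, while the mean comes out at 7/6, roughly 17% above it.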

A reasonable model for the “blowup factor” (actual time divided by estimated time) would be something like a log-normal distribution. If the estimate is one week, then let’s model the real outcome as a random variable distributed according to a log-normal distribution centered on one week. This has the property that the median of the distribution is exactly one week, but the mean is larger, and increasingly so the more spread out the distribution is.
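Here is a rough simulation of that model, as a sketch rather than anything definitive. It assumes the log of the blowup factor is normal with mean `mu = 0` (so the median is 1 week) and standard deviation `sigma = 1`. That value of `sigma` is purely an assumption for illustration, not something measured. For a log-normal, the median is `exp(mu)` and the mean is `exp(mu + sigma**2 / 2)`.

```python
import math
import random
from statistics import mean, median

mu = 0.0     # log of the median blowup factor: exp(0) = 1 week
sigma = 1.0  # ASSUMED spread of the log blowup factor (illustrative only)

# Closed-form median and mean of a log-normal distribution.
theoretical_median = math.exp(mu)
theoretical_mean = math.exp(mu + sigma**2 / 2)

# Monte Carlo check with a fixed seed so the run is repeatable.
rng = random.Random(0)
samples = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]
sample_median = median(samples)
sample_mean = mean(samples)

print(f"theoretical median: {theoretical_median:.2f}, sampled: {sample_median:.2f}")
print(f"theoretical mean:   {theoretical_mean:.2f}, sampled: {sample_mean:.2f}")
```

With this assumed `sigma`, the theoretical mean is `exp(0.5)`, about 1.65 weeks, even though the median stays at exactly 1 week. A smaller `sigma` narrows the gap and a larger one widens it quickly.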