Anyone who built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about solving something. One pet theory I’ve had for a really long time, is that some of this is really just a statistical artifact.
Let’s say you estimate a project to take 1 week. Let’s say there are three equally likely outcomes: either it takes 1/2 week, or 1 week, or 2 weeks. The median outcome is actually the same as the estimate: 1 week, but the mean (aka average, aka expected value) is 7/6 = 1.17 weeks. The estimate is actually calibrated (unbiased) for the median (which is 1), but not for the the mean.
A reasonable model for the “blowup factor” (actual time divided by estimated time) would be something like a log-normal distribution. If the estimate is one week, then let’s model the real outcome as a random variable distributed according to the log-normal distribution around one week. This has the property that the median of the distribution is exactly one week, but the mean is much larger:
If we take the logarithm of the blowup factor, we end up with a plain old normal distribution centered around 0. This assumes the median blowup factor is 1x, and as you hopefully remember, log(1) = 0. However, different tasks may have different uncertainties around 0. We can model this by varying the σ parameter which corresponds to the standard deviation of the normal distribution:
Just to put some numbers on this: when log(actual / estimated) = 1 then the blowup factor is exp(1) = e = 2.72. It’s equally likely that a project blows up by a factor of exp(2) = 7.4 as it is that it completes in exp(-2) = 0.14 i.e. completes in 14% of the estimated time. Intuitively the reason the mean is so large is that tasks that complete faster than estimated have no way to compensate for the tasks that take much longer than estimated. We’re bounded by 0, but unbounded in the other direction.
Is this just a model? You bet! But I’ll get to real data shortly and show that this in fact maps to reality reasonably well using some empirical data.
So far so good, but let’s really try to understand what this means in terms of software estimation. Let’s say we look at the roadmap and it consists of 20 different software projects and we’re trying to estimate: how long is it going to take to complete all of them.
Here’s where the the mean becomes crucial. Means add, but medians do not. So if we want to get an idea of how long it will take to complete the sum of n projects, we need to look at the mean. Let’s say we have three different projects in the pipeline with the exact same σ = 1: