Anyone who built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about solving something. One pet theory I’ve had for a really long time, is that some of this is really just a statistical artifact.
Let’s say you estimate a project to take 1 week. Let’s say there are three equally likely outcomes: either it takes 1/2 week, or 1 week, or 2 weeks. The median outcome is actually the same as the estimate: 1 week, but the mean (aka average, aka expected value) is 7/6 = 1.17 weeks. The estimate is actually calibrated (unbiased) for the median (which is 1), but not for the the mean.
A reasonable model for the “blowup factor” (actual time divided by estimated time) would be something like a log-normal distribution. If the estimate is one week, then let’s model the real outcome as a random variable distributed according to the log-normal distribution around one week. This has the property that the median of the distribution is exactly one week, but the mean is much larger