Post-Factual Estimates
I have ‘written’ more code in the last six months than in the previous six years. It is the kind of line that sounds impressive until you think about it for a minute, at which point it falls apart. More code is not an achievement. It might be the opposite. I have also deleted more code than ever, thrown away more first drafts, and watched an agent burn a few hundred thousand tokens and produce two thousand lines of crap before my Barista Americano. The numbers are real. They just do not mean anything.
It strikes me that a lot of the numbers we used to steer by have stopped meaning what we thought they meant.
Revenue is vanity, profit is sanity
Discussing requirements with a finance team recently, I picked up a phrase that has aged better than most: revenue is vanity, profit is sanity. You can post a huge top-line number and still run the thing into the ground. The figure that flatters you in the meeting is rarely the figure that keeps the lights on. The two are related, loosely, right up until they are not, and the gap is where businesses die.
Software has always had its vanity metrics. Lines of code, the oldest joke in the trade, measuring progress by weight. Story points. Velocity. Commits. Pull requests. Bugs Closed. We knew they were rough proxies, but they roughly tracked the thing we cared about, which was effort turning into working software. Rough, but roughly honest.
AI has snapped that thread. Lines of code is now a measure of how fast your model ‘types’. A pull request count tells you how chatty your agent is. Story points estimate a world that no longer exists. And the newest one, tokens spent, measures precisely one thing: how much you spent or how long your 5 hour window is likely to last. None of them measure whether anything got better. They are all revenue. The profit, which is whether it shipped, whether it works, and whether it made something easier for a real person, is exactly the bit none of them capture.
When someone like Mark Zuckerberg starts telling everyone he is “coding again,” you are watching a vanity metric being performed at the top. Nobody means lines shipped. They mean: look, I am close to the magic.
The estimate from a different universe
The sharpest version of this is estimation, and it borders on the absurd.
I have taken to asking the model to estimate a piece of work the old-fashioned way before I start. It answers in the old currency, in a perfectly serious voice: three people, about five weeks, given the integrations and the testing. A sensible estimate. The estimate I would have written myself a year ago, and defended in a planning meeting with a straight face.
Then, with the work grounded properly, “I” built it. Not in five weeks. In an afternoon. Refined over a subsequent few hours. Reviewed by a spawned subagent.
In fairness to the estimate, it was accurate - for a world I no longer work in. The numbers from that world do not convert into this one. Call them post-factual estimates: the figure on the plan and the thing that actually happened now live in different universes, and the plan is the last to find out.
Counting the wrong thing, faster
We optimise whatever is easy to count. Lines, points, tickets, tokens. All easy. Whether a change actually took friction out of someone’s day is hard to count, so it tends not to get counted at all.
AI has made the easy numbers easier than ever to inflate, and done nothing for the hard one. The temptation is enormous - measure your AI “transformation” in tokens burned and pull requests opened, produce a lovely chart that goes up and to the right, and mistake the chart for the work. “Goodhart’s law” is the old warning that when a measure becomes a target, it stops being a good measure. AI is Goodhart’s law with a rocket strapped to it. The moment tokens spent becomes a target, it becomes a magnificent way of looking busy.
Professional services has lived with this one for years. You can run a firm at full utilisation, every person booked solid and every timesheet a satisfying shade of green, and still do almost nothing to drive real chargeability. Everyone is busy. Busy is easy to measure. Whether all that motion turned into something a client will happily pay for is the quieter, harder number, and it is the only one that counts when the year closes. Utilisation is revenue. Chargeability is profit. Tokens and pull requests are just the software version of a full timesheet.
What I would actually count
Unfortunately, I don’t have a tidy metric to replace them all. But I have one question I return to - What got easier this week? Not how much did we produce. What is less painful than it was on Monday, for a user, a team, a customer. That one is hard to game, because the people on the other end feel it whether you chart it or not.
So when someone posts their AI spend on social media, or a graph of pull requests, I struggle to care. Those are top-line figures. I wrote more code in six months than in six years - and the only numbers I would stand over are whether it shipped and what got easier.
Did you find this post helpful?
A little support goes a long way in helping me create more free, in-depth content like this.