AI Metrics


The company I work for has recently added a new metric to track developer performance - AI usage per epic. Apparently 80% of all epics must use AI to help develop the solution.

The main reason for this metric is to drive adoption of AI tooling across the development cycle. I don’t mind this as a general idea. In my AI disclaimer I mention I do use AI, and find it useful for quite a few things, but it is very much a sometimes food 🍪 - or in this case a sometimes tool.

As with most things in life, there are good parts and bad parts to a chosen tool like AI, but these days it seems like leaders in the IT industry have put their blinders on with regard to AI.

I guess it doesn’t help that headlines will take any positive spin on AI and blurt it out without doing the journalistic work of verifying the claims.

Take Fortune’s article (footnote 1) on Cursor creating a browser from scratch.

It completely fails to mention that the browser doesn’t build, has failing tests, imports most of its complex code from existing human-made open source projects, doesn’t meet the specifications of a browser, and pulls in additional libraries it doesn’t use. It is effectively a failed attempt to prove that AI can do it.

Despite all that they got the headline they were after.

Cursor used a swarm of AI agents powered by OpenAI to build and run a web browser for a week — with no human help. Here’s why developers are buzzing — Fortune 1

Why I don’t like this metric

This metric means we are no longer looking to measure the outcomes of the work, but instead are focusing on how that work is done. A similar metric would be to say all developers must use JetBrains products to write code because they are good at refactoring, despite the fact that many devs have great reasons to use other tools.

Imagine if this were done in another workplace, say a restaurant.

Tonight on Hell’s Kitchen - 80% of the meals you make must use this pot — Gordon Ramsay I assume 2

Seems a bit silly now, doesn’t it? Certainly that pot will be great for some meals, but I am not sure deep fried chicken will really benefit from that pot over a deep fryer.

Now some might say this example is a little bit didactic, and AI is certainly more versatile than a pot, but there should always be a primary focus on the end result rather than the tool used to get it.

What this means for developers

This sort of metric can cause burnout in developers, especially the experienced ones who have already developed a flow they like to work in.

When metrics are recorded and monitored, they are inevitably going to be tied to remuneration. It might be a minimal thing, but it is something the team will have on their mind and be concerned about. I have had managers come in and tell me that the metric doesn’t matter, but if that were the case it would not have been introduced.

This is not a good thing. You are taking the focus away from work, bringing in stress, and holding people to a standard that is not even related to the outcome a developer achieves.

What this means for the business

I think this is all about marketing. As a business that sells AI products there is a drive to say “Hey look everyone, we like AI so you should buy it from us”.

I think this is an okay goal, but you are potentially sacrificing the quality of work and the happiness of the team, and you may lose staff over it. Is that cost worth it? I don’t know.

Better metrics

I would love to see more focus on customer satisfaction, failure rates, and anything else that puts the result first. These are far closer to the king of all metrics (whether something is making, maintaining, or saving the business money) than metrics on the tooling.

Others that track output such as burndown and velocity will give a far better picture of the effects of AI, and could be used to weigh the benefits of AI usage against the costs of it.
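As a rough sketch of what that weighing could look like, here is a small Python example that groups epics by whether AI was used and compares average completed story points and deployment failure rate for each group. Everything in it is an assumption for illustration: the `Epic` fields, the function names, and especially the sample numbers, which are made up and not real data from any team.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Epic:
    """Hypothetical record of one epic; field names are assumptions."""
    name: str
    used_ai: bool
    story_points: int        # points completed, i.e. contribution to velocity
    deployments: int         # how many times the epic's work was deployed
    failed_deployments: int  # how many of those deployments failed


def summarise(epics):
    """Average points per epic and overall failure rate for a group."""
    total_deploys = sum(e.deployments for e in epics)
    total_failures = sum(e.failed_deployments for e in epics)
    return {
        "epics": len(epics),
        "avg_points": mean(e.story_points for e in epics),
        # Guard against a group with no deployments at all.
        "failure_rate": total_failures / total_deploys if total_deploys else 0.0,
    }


def compare_by_ai_usage(epics):
    """Split epics into AI / non-AI groups and summarise each one."""
    groups = {True: [], False: []}
    for epic in epics:
        groups[epic.used_ai].append(epic)
    return {
        "with_ai": summarise(groups[True]) if groups[True] else None,
        "without_ai": summarise(groups[False]) if groups[False] else None,
    }


# Made-up sample data, purely for illustration.
EPICS = [
    Epic("checkout-redesign", True, 21, 10, 2),
    Epic("search-indexing", True, 13, 5, 1),
    Epic("billing-migration", False, 18, 8, 1),
    Epic("audit-logging", False, 16, 4, 0),
]

report = compare_by_ai_usage(EPICS)
for group, stats in report.items():
    print(group, stats)
```

The point of the sketch is that AI usage becomes an input you compare outcomes against, rather than a target in its own right. With only a handful of epics the comparison would tell you very little, so treat it as a starting point rather than a verdict.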

Unfortunately this does not seem to be the way things are going. It seems to be more important to use AI than to use it well.

Such is the life of a modern developer I guess. We will find out in a few years if it was all worth it.

Footnotes

  1. Fortune’s article on an AI-built browser

  2. Imagine the flavour you could get from all the burnt on bits


