
Leave the token counting for Chuck E. Cheese.

This morning I came across a thread on Reddit asking, “Are people’s bosses really making them use AI tools?” I could point to more than a few mandates that have been handed down, but I love this anecdotal top comment:

"I spoke to a good friend a few days ago who told me that his CTO has explicitly stated that ‘# of tokens of code input into AI tool’ is being used as a metric of developer productivity.”

I am close to putting 2000 miles on my vehicle. 

What does that tell you? Not much. It doesn't even tell you how many miles were on it in the first place. Instead it conjures up a ton of questions that could be asked to make any sense of that data point.

Is it brand new, or just brand new to me? You can't even infer whether that's a lot of miles or hardly any at all. It doesn't suggest in the least if those are highway, city, or gravel-road miles.

There's nothing in that data to demonstrate my ability to drive or my style of driving. How many miles were above or below the speed limit? How did that affect my gas mileage? And what about the condition of the tires? Do I wear a seat belt while driving? Am I focused on driving, or am I constantly distracted by a device? Am I driving by myself, or are there other passengers in the vehicle with me?

Was I burning through expensive premium fuel just to sit in traffic, or cruising efficiently? How much of that mileage was spent making actual progress versus being stuck in gridlock? Did I take the most direct route, or drive in circles because I didn't know where I was going? How many times did I have to backtrack or take detours? Were there accidents or safety violations along the way?

And why am I driving at all? Do I need a vehicle or are there better options to consider like public transit or walking? In fact, yeah, how far am I driving on average per trip? And most importantly—was I even going somewhere that mattered? Or am I just driving around because I like being behind the wheel while listening to the new Big Wild album?

Now let’s go back to our CTO’s mandate.

What does tracking "# of tokens of code input into AI tool" tell you about a developer's productivity? Nothing. Nada. Zip. Zilch.

It doesn't even tell you how much code was already in the codebase when they started. And is this a greenfield project or are they working on legacy systems? From that metric you can't even infer if those tokens represent complex problem-solving or just basic boilerplate generation. It doesn't suggest in the least whether that's production code or experimental throwaway work. 

There's nothing in measuring token use to demonstrate coding ability or problem-solving skills. How much of that AI-generated code actually compiled? How did it affect code quality? And what about technical debt? Are they understanding what the AI produces or just copy-pasting blindly? Are they even reading the generated code before committing it? How did that code impact the platform? Did using AI improve the quality of the work?

Were they burning through expensive AI tokens just to generate trivial snippets they could have written faster by hand, or using it strategically for complex algorithms? How much of that token usage was spent solving real business problems versus just experimenting because they were bored? 

Were they hauling useful functionality that users actually needed, or just generating impressive-looking code that serves no purpose? Did they architect clean solutions, or generate massive amounts of code that had to be refactored, deleted, or completely rewritten? How many times did they have to debug AI-generated bugs versus shipping working software? How much time was spent fixing security vulnerabilities or performance issues that the AI introduced?

Do they need assistance for genuinely complex problems, or are there better approaches like pair programming, code reviews, or actually learning the fundamentals? How much cognitive capability are they building versus outsourcing? And most importantly: are they even solving problems that matter? Or, way more important, are they using AI simply so they don’t take a hit during their annual review?

Tracking usage alone means nothing except that people are using the AI platform that the CTO purchased. It’s a meaningless binary question that doesn’t provide any answers but instead creates a vacuum of information that requires more inquiry.

Number of days in school is not a measure of learning. Number of days sitting in a pew at church does not equal salvation. Number of years married isn't a measure of relationship quality. Number of channels on your TV package doesn't equal entertainment value. Number of contacts in your phone doesn't measure friendship. Number of degrees on your wall doesn't guarantee competence. Number of miles you've traveled doesn't make you worldly. Number of apps on your phone doesn't measure tech savviness. Number of meetings you attend doesn't equal contribution. Number of certifications you hold doesn't prove expertise. Number of subscriptions you pay for doesn't measure productivity. Number of items in your shopping cart doesn't equal wealth. Number of photos you take doesn't capture memories.

Just as the number "of tokens of code input into AI tool" doesn’t measure developer skill, indicate code quality, equal productivity, demonstrate problem-solving ability, show technical judgment, prove engineering competence, reflect business value delivered, measure understanding of the codebase, indicate innovation or creativity, equal meaningful contribution to the team, demonstrate ability to architect solutions, measure debugging skills, show capacity for code review and collaboration, indicate understanding of software design patterns, measure ability to write maintainable code, demonstrate security awareness, show performance optimization skills, indicate ability to work with legacy systems, measure capacity for technical leadership, or prove that any actual learning, growth, or value creation is happening at all.

I’ve written about developers because that’s the world this story comes from, but you can swap in any job type or role across every form of knowledge work. You and I and our work are worth far more than a yes/no measurement.

More importantly, what does "# of tokens of code input into AI tool" tell you about the CTO?

Here's what I can infer from years of leading and working with executives navigating change. This isn't a strategy—it's a common pattern we see when leaders feel pressure to show progress but lack the literacy to distinguish between activity and achievement.

This CTO is likely caught between board expectations and ground-level reality. They're measuring what's visible instead of what matters, which is completely understandable given the current hype cycle. The challenge is that this metric optimizes for the appearance of innovation while potentially undermining the actual work of software engineering. And I know these types of mandates and directives are happening outside of just the software engineering domain.

What this reveals is a leader who needs better AI literacy—not technical skills, but strategic fluency. They can't yet distinguish between a developer using AI to solve complex architecture problems and one generating boilerplate code. This isn't a character flaw; it's a common gap that my peers and I see across most organizations right now.

The real risk here isn't incompetence—it's incentivizing the wrong behaviors. When you measure token usage, you get token optimization. Developers will start prompting AI unnecessarily just to hit their numbers, which defeats the purpose of using AI strategically. It will also have the unintended consequence of turning your people into zombies.

This is exactly the kind of situation where a confidential conversation can help. No shame, no judgment—just clarity on what metrics actually indicate progress and how to build the organizational capacity to use AI thoughtfully.

What can you do about it?

If you're the developer stuck with this metric, you're not powerless. Document what actually matters. Keep a simple log of real problems solved, business value delivered, and code quality improvements—with or without AI. Build your own literacy around when AI helps versus when it hurts. And have that uncomfortable conversation with someone you trust. Frame it as wanting clarity on success, not challenging the directive. You're not alone in this confusion.

If you're the CTO reading this and recognizing yourself, there's still time to course-correct. Replace token counting with outcome measurement. Create or steal a framework to do this. Focus on code quality, problem-solving effectiveness, and actual business value delivered. Treat this like training for a 5K, not a 500m sprint.

Build your own AI literacy first! You can't guide what you don't understand. And create psychological safety for your team to tell you what's really happening. Say "we're all figuring this out together" and mean it.

Measuring tokens of code input into AI tools tells you as much about developer productivity as counting miles tells you about driving skill. Which is to say: absolutely nothing that matters.

Like trying to stomach the pizza at that rat hole “arcade” today, mandating use is the worst bet on your company's future that you can make. Stop counting tokens; stop counting, period! Start building literacy. Teach how to use AI creatively to build your people, their skills, and their capabilities. Your job is to build resilience, adaptability, and confidence in your people; your code, your products, and your company will be better for it.