Before the Next Pilot
An advice column for the recently inspired
Dear Monday Morning Tinkerer,
I just got back from the conference on agentic AI. Fantastic event—really clarified the picture. The tech is moving from assistant to autonomous agent, taking on real responsibilities within roles, eventually operating with humans in support. Every role affected by 2030, somewhere between 20 and 80 percent of existing work transformed. I feel inspired, challenged, and ready to act.
We’ve started the education process internally. Now I need to turn that energy into pilots. I have board support, a mandate to move, and a team that’s genuinely excited. My question is: how do I make sure the pilots actually make it into production, rather than becoming expensive proof-of-concepts that quietly disappear?
Sincerely, Inspired and Ready to Implement
Dear Inspired,
The energy is right. The direction is probably right. And the conference gave you something useful: a reason to move, and a room full of people who confirmed that the pressure is real.
What it didn’t give you is a tool for deciding where to start. That’s not a criticism of the conference; it’s a description of what conferences are for. The narrative about scale and inevitability is accurate as far as it goes. It just doesn’t resolve into a scoping decision. And the gap between “AI will transform 20 to 80 percent of every role” and “here is what we should build first” is where most pilots go wrong.
The field evidence is more complicated than the narrative implies, and more useful.
Most pilots don’t fail because the technology isn’t ready, or because the team wasn’t committed, or because the change management was underfunded. They fail for a structural reason that almost nobody names in advance. An LLM in a workflow doesn’t behave like software with a fixed cost: it reasons toward a solution, and that reasoning can expand, opening branches, retrying steps, accumulating context. If what the system is trying to achieve isn’t precisely defined, the reasoning drifts. Controls get added as failures surface. Tighten them enough and you’ve paid for agency and received a workflow with an unreliable narrator in the middle.[1] The business case collapses before the ambition does.
Most teams pass through the viable zone on the way to this outcome without recognising it, because nobody told them the viable zone had a shape. The more useful question isn’t why most fail. It’s what the survivors had in common.[2]
The pilots that made it into production share one feature that almost nobody names: the problem had been defined well enough that the world could confirm whether it was solved. Done wasn’t a judgment the system made. Something external confirmed it.
A change-in-reporting-line request is this shape. The request arrives, the records update, the confirmation sends, the loop closes. A support incident at first tier is this shape too: triage the issue, attempt the documented fixes, escalate if unresolved. Clear input, clear output, clear escalation path. These aren’t the pilots that look most impressive in a board presentation. They’re the ones quietly running in production while the ambitious ones are still accumulating controls.
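For readers who think better in code, the first-tier support shape can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real deployment: `run_first_tier`, `Outcome`, `confirmed_resolved`, and the toy fixes are all hypothetical names invented for the example. What it tries to show is that "done" is decided by a check outside the system, and that running out of documented fixes leads to escalation rather than more reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Outcome:
    status: str    # "resolved" or "escalated"
    attempts: int  # number of documented fixes actually tried


def run_first_tier(
    fixes: Iterable[Callable[[], None]],
    confirmed_resolved: Callable[[], bool],
    max_attempts: int = 3,
) -> Outcome:
    """Try documented fixes in order; an external check decides when we are done."""
    attempts = 0
    for fix in fixes:
        if attempts >= max_attempts:
            break
        fix()
        attempts += 1
        # "Done" is confirmed by something outside the system (a health
        # check, the customer), never by the system's own judgment.
        if confirmed_resolved():
            return Outcome("resolved", attempts)
    # Fixes or budget exhausted: hand off instead of reasoning further.
    return Outcome("escalated", attempts)


# Hypothetical usage: a toy incident where restarting the service fixes it.
state = {"service_up": False}


def clear_cache() -> None:
    pass  # documented fix that does not help in this toy case


def restart_service() -> None:
    state["service_up"] = True


outcome = run_first_tier([clear_cache, restart_service], lambda: state["service_up"])
print(outcome)  # Outcome(status='resolved', attempts=2)
```

The attempt budget and the external check are the whole design: the loop has exactly two exits, and neither depends on the system deciding for itself that it has finished.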
The survivors weren’t the pilots with better narratives about what AI would eventually become. They were the ones where someone had done the work of defining the problem precisely enough that the technology had something to solve, and the world had something to confirm.[3]
Back to the 20-to-80 percent. The figure is probably directionally right. It’s also the wrong unit for deciding where to start—and the breadth of the range is the tell. A message that says somewhere between a fifth and four-fifths of every role will be transformed is a message about inevitability and scale, not about sequencing. It implies you could begin anywhere, which is almost the opposite of what the failure data suggests.
Roles aren’t the constraint. Problems are—specifically, which problems have been defined precisely enough that a system can reach a solution the world will confirm, without an expensive harness holding it there. That target is smaller than the narrative implies. It’s also real, available now, and doesn’t require waiting for 2030.
The definitional work that produces this kind of problem is unglamorous. It doesn’t generate conference presentations or steering committee minutes.[4] But it determines almost everything else, because a terminal state you can point to is how you know the problem was actually defined, rather than just gestured at.
So before the next scoping meeting—before governance, before vendor selection, before the steering committee convenes—get the room to answer one question:
Can we describe what done looks like, and what happens when the system gets there?
If done is external, unambiguous, and confirmable without relying on the system’s own judgment, the pilot has a shape that production can absorb. Scope it. Run it. It will probably work.
If not yet, that’s the work—before the engineering starts. The conference gave you the urgency. The field evidence gives you the starting point.
Yours in productive constraint,
The Monday Morning Tinkerer
P.S. The pilots that survived almost never named terminal-state clarity as the selection criterion. They just happened to have it. That’s not a coincidence—it’s a pattern waiting to be applied on purpose.
The companion piece, The Art of Not Having an AI Strategy (Yet), covers the earlier problem: how to protect the space to experiment before the steering committee colonises it.
Hero Image: Google Gemini
If your organisation is building toward AI pilots and you want the definitional work done before the engineering starts, that’s the conversation I have. associates.evans-greenwood.com
[1] The structural reason for this (why agentic variance compounds with scope ambiguity and why controls don’t solve it) is in Inside the Language Machine.
[2] Deloitte’s 2025 survey of enterprise AI deployments found around 90 percent of LLM pilots failed to reach production; the MIT data runs in the same direction. Early evidence on agentic deployments suggests a similar pattern: Gartner estimates more than 40 percent of projects will be cancelled outright, and practitioner reports imply a substantially higher effective failure rate once stalled and lightly adopted deployments are included. What none of these sources asks, and what matters, is what distinguished the survivors. That’s a question about problem definition, not technology.
[3] This is the same dynamic described in When Narratives Can’t Absorb Contradiction: the pilots that survived weren’t the ones with better narratives about imagined utility; they were the ones discovering actual utility through deployment. The terminal state is what made that discovery legible.
[4] The upstream step here, establishing what a problem actually is before asking technology to solve it, is explored in more depth in ‘Reconstructing Work’ (Deloitte Review, Issue 21). The short version: a problem isn’t ready for automation until the terms are specified, the inputs and outputs articulated, and the context fully established. The terminal state is the test for whether that work has been done. See Evans-Greenwood, Peter, Harvey Lewis, and Jim Guszcza. “Reconstructing Work: Automation, Artificial Intelligence, and the Essential Role of Humans.” Deloitte Review, July 31, 2017. https://web.archive.org/web/20180202111312/https://www2.deloitte.com/insights/us/en/deloitte-review/issue-21/artificial-intelligence-and-the-future-of-work.html.

