Practice · 10 min read
The Mechanical Horse
When AI Speed Needs Human Direction
The Duke gave Quixote a wooden horse and told him it could fly. AI can genuinely move fast—but only if someone competent is steering.

The Horse That Couldn’t Fly
One of the novel’s funniest scenes: the Duke gives Quixote a wooden horse called Clavileño and tells him it’s a magical flying steed. Quixote mounts it, gets blindfolded, and the servants blow wind in his face and wave torches to simulate flight. He’s convinced he’s soaring through the sky.
He’s sitting on a wooden horse in a courtyard. Going absolutely nowhere.
Sancho, also blindfolded, has his doubts. “Master, I think we haven’t moved.” But the performance is convincing enough that even Sancho starts to wonder.
The Duke wasn’t trying to help Quixote get anywhere. He was entertaining his court. The whole spectacle existed for the benefit of the audience, not the riders. That’s the part worth remembering.
The Performance of Productivity
Every organization has its version of Clavileño right now. It looks like this:
A VP schedules an all-hands to announce the company’s AI strategy. There’s a demo—someone generates a feature with a coding assistant in real time. The audience gasps. The slide deck says “10x productivity.” A Slack channel called #ai-transformation gets created. Three months later, the team’s actual workflows haven’t changed, but the quarterly board deck has a page about AI adoption with impressive-sounding metrics.
This is AI theater: the organizational performance of being AI-forward without the substance of changed outcomes. It’s the wind in Quixote’s face—convincing, dramatic, and entirely manufactured.
The telltale sign isn’t that the tools don’t work. They do. The sign is that the conversation is about the tools instead of about results. When leadership talks more about which AI products the company is using than about which customer problems got solved faster, you’re watching Clavileño. When the metric is “percentage of teams with AI tool access” instead of “time from customer problem to shipped solution,” the horse is wooden.
And the blindfold works both ways. The team can’t see that they’re going nowhere—but the audience can’t see it either. Everyone is watching the spectacle, and the spectacle is genuinely impressive. Demos always are.
How the Blindfold Gets Applied
Nobody puts on the blindfold voluntarily. It happens in stages.
Stage one: the speed intoxication. AI tools arrive and output volume jumps immediately. PRs per week go up. Features-in-progress multiply. Dashboards turn green. This is real—the speed is genuine. But it creates an assumption that more output equals more progress, and nobody pauses to verify.
Stage two: the metric substitution. Because output is easy to measure and outcomes take months to observe, the organization starts celebrating the thing it can count. “We shipped 40% more features this quarter” becomes the headline. Whether those features moved retention, revenue, or satisfaction is a question for next quarter. The velocity metric becomes the proxy for the outcome metric—and proxies have a way of replacing what they were supposed to represent.
Stage three: the social pressure. Once velocity is the celebrated metric, slowing down to ask “is this the right thing?” feels like obstruction. The person who questions whether the team should build something becomes the person who’s “not embracing AI.” The blindfold isn’t just obscuring the view—it’s socially enforced. Taking it off carries a political cost.
Stage four: the theater becomes the strategy. The organization builds its narrative around AI-powered velocity. The board hears about it. Customers hear about it. Recruiter pitches mention it. Now the appearance of speed is load-bearing—admitting that outcomes haven’t changed would undermine the story everyone has invested in. The wooden horse becomes too important to question.
This is how a team ends up going nowhere fast. Not through incompetence, but through a series of individually reasonable steps that add up to an elaborate performance.
What Actually Works
Name the performance before it names you
Quixote mounted Clavileño before asking where it was going. But the more important detail is that the Duke never intended it to go anywhere. The destination was never part of the plan—the spectacle was the plan.
Sancho would have asked a different question. Not “can the horse fly?” but “who benefits from us believing it can?” In organizations, AI theater persists because it serves someone’s interests—usually the people who sponsored the initiative, the vendors who sold the tools, or the leaders who staked their reputation on transformation.
The PM’s job isn’t to refuse the horse. It’s to ask, in plain language, what outcome the horse is supposed to deliver—and to notice when nobody can answer. If the response is about capability (“it can generate code!”) rather than destination (“it will reduce our time-to-resolution by half”), you’re looking at a stage set, not a vehicle.
Separate the demo from the workflow
The Clavileño scene worked because it was a one-time performance. Nobody had to ride the horse to work every morning. AI demos have the same property—they showcase a tool’s best moment, under ideal conditions, with a preselected problem.
The gap between “impressive demo” and “changed daily workflow” is where most AI theater lives. A coding assistant that generates a clean component in a demo may require 45 minutes of debugging and refactoring to integrate into your actual codebase. A content generation tool that writes perfect marketing copy on stage may produce generic output when fed your real brand constraints.
When evaluating AI tools for your team, skip the demo. Instead, pick the most annoying recurring task your team actually does—the one people groan about—and try the tool on that, in the real environment, with real constraints. If it helps, you’ve found a genuine use. If it doesn’t, you’ve avoided adding a prop to the stage.
Build a blindfold detector
Sancho wanted to peek under the blindfold. He was told not to. The whole illusion depended on nobody checking whether they were actually moving.
In organizations, the blindfold is the absence of outcome data adjacent to velocity data. When your dashboard shows PRs merged, features shipped, and sprint velocity—but not user adoption of those features, not support tickets caused by those features, not revenue impact of those features—you’re riding blind.
The fix isn’t philosophical. It’s mechanical. For every velocity metric your team tracks, place the corresponding outcome metric directly next to it. Features shipped last quarter: 12. Features from that batch with measurable user adoption after 30 days: 3. That juxtaposition is the peek under the blindfold. It doesn’t require anyone to make a speech about outcomes vs. output. The numbers do the talking.
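If your team already pulls its dashboard numbers together in a script or spreadsheet, the pairing can be enforced mechanically: a velocity number simply has nowhere to live without a slot for its outcome twin. A minimal sketch in Python, with invented metric names and numbers:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricPair:
    """A velocity number that cannot be recorded without its outcome twin."""
    velocity_label: str
    velocity_value: float
    outcome_label: str
    outcome_value: Optional[float] = None  # None means "unmeasured"

def dashboard_row(pair: MetricPair) -> str:
    """Render the pair side by side, loudly flagging missing outcome data."""
    outcome = str(pair.outcome_value) if pair.outcome_value is not None else "UNMEASURED"
    return f"{pair.velocity_label}: {pair.velocity_value}  |  {pair.outcome_label}: {outcome}"

# Invented metric names and numbers, purely for illustration.
rows = [
    MetricPair("Features shipped last quarter", 12,
               "Features with measurable 30-day adoption", 3),
    MetricPair("PRs merged this sprint", 48,
               "Support tickets traced to those PRs", None),
]

for row in rows:
    print(dashboard_row(row))
```

The point isn’t the tooling; it’s that “UNMEASURED” gets printed in the same row, at the same size, as the number everyone wants to celebrate.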
If your organization resists adding outcome metrics next to velocity metrics, pay attention to that resistance. It tells you the blindfold is load-bearing.
Spot the “AI-powered” label game
The Duke didn’t just blindfold Quixote—he gave the performance a name. “Clavileño the Swift.” The label made the wooden horse sound like a real one.
Watch for the organizational equivalent: products, features, and initiatives that get the “AI-powered” prefix without a clear explanation of what AI actually changed. “AI-powered search” might mean a genuine semantic understanding improvement, or it might mean someone added an LLM call that rewords the query before sending it to the same search index. “AI-powered analytics” might mean automated insight detection, or it might mean a chatbot wrapper around existing dashboards.
The label isn’t the problem. Meaningless labels are. When something in your product gets the AI prefix, ask: what does this do that the non-AI version couldn’t? If the answer is specific and testable, great. If it’s vague—“it’s smarter,” “it understands context”—the label is a costume, not a capability.
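To make the distinction concrete, here is a minimal sketch of the “costume” pattern described above: a rewording call bolted onto the same keyword index the product already had. Every function, corpus entry, and query here is a hypothetical stand-in, not any particular product’s API:

```python
def reword_with_llm(query: str) -> str:
    # Hypothetical stand-in for a model call that rephrases the user's query.
    return f"documentation about {query}"

def keyword_search(query: str) -> list[str]:
    # Hypothetical stand-in for the keyword index the product already had.
    corpus = ["billing export guide", "reset your password", "team permissions"]
    return [doc for doc in corpus if any(word in doc for word in query.lower().split())]

def ai_powered_search(query: str) -> list[str]:
    # The only "AI" in this pipeline is a rewording step. The index, retrieval,
    # and ranking are unchanged.
    return keyword_search(reword_with_llm(query))

print(ai_powered_search("Billing export"))
```

The testable question writes itself: on a set of real queries, does this return anything the old search could not?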
Run a Direction Check that actually bites
Nobody measured whether Clavileño actually reached its destination. They measured the spectacle—the wind, the flames, the drama of the ride.
Most retrospective questions about AI are too polite to reveal theater. “How are we using AI tools?” invites comfortable answers. You need questions that create productive discomfort—the kind of discomfort that Sancho created by saying what nobody wanted to hear.
The Direction Check isn’t a survey. It’s a structured exercise designed to surface the gap between velocity and outcomes. Below is how to run one that actually reveals what’s happening, along with the warning signs that tell you it’s needed.
Signs You’re in AI Theater
Watch for these patterns—not as individual red flags, but as a constellation. Any organization will have one or two. When you see four or five, the horse is wooden.
- Throughput is up but customer metrics are flat—more shipped, same results
- AI discussions center on tools and capabilities rather than problems and outcomes
- The team has more demos than deployments—impressive showcases, few changed workflows
- Velocity metrics are celebrated in all-hands while outcome metrics live in a spreadsheet nobody opens
- Questioning the value of an AI initiative feels politically risky
- Products carry “AI-powered” labels without specific, testable claims
- Code reviews have become faster but shallower—volume overwhelms scrutiny
- The AI strategy is described in terms of adoption percentages, not customer impact
Each is a blindfolded ride on a horse that doesn’t fly.
The Direction Check
At your next sprint retrospective, run this exercise. It takes 30 minutes and replaces comfortable questions with ones that surface the real picture.
Step 1: The Velocity Inventory (5 min). List everything the team shipped in the last two sprints. Features, fixes, improvements—everything. Put the count on the board. This number will feel good. Let it.
Step 2: The Outcome Overlay (10 min). Next to each shipped item, write the user-facing metric it was supposed to move. Not the output metric (it shipped), the outcome metric (did users adopt it, did it reduce support tickets, did it improve retention). For items less than 30 days old, write “too early to tell”—that’s honest. For items older than 30 days, write the actual number. If nobody knows the number, write “unmeasured.”
Step 3: The Gap Count (5 min). Count how many items are marked “unmeasured,” then divide that count by the total number of shipped items. That fraction is your theater score (a rough way to compute it is sketched after the steps). If more than half your shipped work has no outcome data, you’re performing productivity—you don’t actually know if the horse is flying.
Step 4: The Uncomfortable Question (10 min). Ask the team: “If we had shipped half as many things but measured all of them, would we know more or less about whether we’re building the right product?” Let the silence sit. This question isn’t rhetorical. Discuss the answer honestly.
Step 5: One Change. Pick one thing to change for the next sprint. Not five things. One. Maybe it’s adding outcome tracking to the top three features. Maybe it’s declining one initiative that exists because it sounds AI-forward rather than because it solves a problem. Maybe it’s running a real user test on something the team assumed was working.
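If the shipped list already lives in a tracker export or a spreadsheet, the theater score from Step 3 is a few lines of arithmetic. A minimal sketch in Python; every item name, age, and number below is invented for illustration:

```python
# One way to compute Step 3's theater score, assuming each shipped item was
# annotated in Step 2 with an outcome number, "too early", or None (unmeasured).
shipped = [
    {"item": "Bulk CSV export",      "age_days": 45, "outcome": 0.04},         # 4% adoption
    {"item": "AI summary panel",     "age_days": 60, "outcome": None},         # nobody knows
    {"item": "Onboarding revamp",    "age_days": 20, "outcome": "too early"},  # honest
    {"item": "Billing page rewrite", "age_days": 90, "outcome": None},
]

# "Unmeasured" means old enough to judge, but nobody knows the number.
unmeasured = [s for s in shipped if s["age_days"] >= 30 and s["outcome"] is None]

theater_score = len(unmeasured) / len(shipped) if shipped else 0.0
print(f"shipped: {len(shipped)}  unmeasured: {len(unmeasured)}  "
      f"theater score: {theater_score:.0%}")
```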
The Direction Check works because it doesn’t argue against speed. It asks what the speed is producing. Sancho never told Quixote to stop riding. He just wanted to know where they were going.
Stop Simulating Flight
Clavileño couldn’t fly. But a real horse, with a real rider who knows the road, can cover enormous ground.
AI tools are real horses. They can genuinely move your product forward at speeds that weren’t possible before. But they need a rider who refuses the blindfold, an organization that values direction over spectacle, and the willingness to ask—in front of the Duke and his court—whether anyone has checked if the horse has left the courtyard.
The person who asks that question will be uncomfortable. They may be the only one in the room not applauding the performance. That’s exactly why the question matters.
Take off the blindfold. Steer the horse.
Put this into practice
Direction matters more than speed. The Decision Filter helps your team name the problem before AI generates the solution—so you’re steering, not performing.