Delegation is the AI Metric that Matters

Forget the benchmarks – the best way to track AI’s capabilities is to watch which decisions experts delegate to AI.

Some delegations are subtle, like the Claude 4.0 system prompt update that lets it search the web without asking for permission. Others are explicit, like a programmer using Claude Code’s “YOLO mode,” which lets the agent perform actions on its own¹.

Delegations signal improvements in models, better applications, or increased user trust (or, occasionally, user laziness).

User delegation is not just a key metric for product owners. It’s also central to the societal question of which decisions we’re comfortable delegating to AI.

Historical Parallel: Remember When Buying Things Online Felt Dangerous?

E-commerce acceptance was gradual, governed mostly by social and cultural norms, not technological innovation.

The technology parts were relatively easy. The first secure purchase took place in 1994, but as people logged on and internet retailers blossomed at the turn of the century, people weren’t rushing to buy online. By the year 2000, only 22% of Americans had purchased something online.

If you’re my age or older, you likely remember the novelty of e-commerce and how tentatively people embraced it. Shoppers cautiously used dedicated credit cards and limited their purchases to low-risk items, like books or CDs.

To earn shopper trust, e-commerce innovators introduced 3rd party trust models. PayPal allowed customers to buy from random sites without sharing their credit card. Amazon’s “A-to-Z Guarantee” made Amazon responsible for all 3rd party seller transactions. eBay’s rating system created a reputation economy where past performance was visible to all parties. And Apple’s iTunes and App Store both brought digital purchases out of the browser into specialized, controlled apps curated and managed by Apple.

Each innovation, combined with social experiences (knowing others who successfully buy online), gradually increased trust until online shopping was normal. By 2016, 79% of Americans were buying things online, with 15% shopping online at least once a week.

The last of the “touch-and-feel” categories – like grocery shopping – finally began to shift online during COVID quarantines.

Today, 30 years after the first secure online purchase, e-commerce is decidedly normal².

Plotting AI Delegation for Product Planning

One could plot the acceptance of e-commerce using four key measures:

Adoption: How many people are buying things online?
Frequency: How often are people buying things online?
Assortment: What types of things are people buying online?
Share: What percent of all their purchases are people buying online?

At first glance, frequency and share look to be getting at the same thing. But by looking at them together, you can understand if e-commerce is leading to more shopping overall. If share is flat but frequency is up, e-commerce is driving new purchase behaviors, not simply replacing offline habits.
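
To make the share-versus-frequency reasoning concrete, here is a minimal Python sketch. The `Period` class, the `reading` function, the labels it returns, and the 2-point tolerance for "flat" are all hypothetical choices made for illustration, not part of any established method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Purchase counts for one shopper (or cohort) over one time period."""
    online_purchases: int  # frequency: purchases made online
    total_purchases: int   # all purchases, online and offline

    @property
    def share(self) -> float:
        """Share: fraction of all purchases made online."""
        if self.total_purchases == 0:
            return 0.0
        return self.online_purchases / self.total_purchases


def reading(before: Period, after: Period, tolerance: float = 0.02) -> str:
    """Interpret a change in frequency and share between two periods.

    The tolerance for calling share "flat" is an arbitrary assumption.
    """
    frequency_up = after.online_purchases > before.online_purchases
    share_change = after.share - before.share
    if not frequency_up:
        return "no growth in online purchases"
    if abs(share_change) <= tolerance:
        # More online purchases at the same share means total purchases grew.
        return "new purchase behavior"
    if share_change > 0:
        return "replacing offline habits"
    return "offline purchases growing faster"
```

For example, going from 10 online purchases out of 50 to 14 out of 70 keeps share at 20% while frequency rises, so the sketch reads it as new purchase behavior rather than substitution.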

These metrics are well suited for assessing the adoption of AI. Instead of measuring purchases made online, we’re measuring decisions delegated to AI applications.

However, we can improve on assortment. For e-commerce, this captured what people were buying (say, books) and what they weren’t (cars) at a given moment. We too are interested in which behaviors are being delegated, but unlike online shopping, delegation isn’t binary. With e-commerce, you’re either purchasing an item online or you aren’t. But there are several degrees of AI delegation:

Avoidance: When you don’t trust AI to help with a task, so you fully abstain.
Supervision: When you trust AI to help with a task, but not to complete it successfully, so you review its output carefully.
Delegation: When you fully trust the AI to complete the task, on its own.

Let’s call this spectrum AI Posture, or just Posture.

Which leaves us with our delegation measures:

Adoption: How many people are using AI?
Frequency: How often are people using AI?
Share: What percent of all their tasks are people using AI for?
Assortment: What types of tasks are people using AI for?
Posture: To what degree are people delegating these tasks to AI?

I’ve reordered them a bit so that the macro measures come first, before we dive into specific task details.
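
As a rough sketch of how these five measures might be computed, suppose you had a log of tasks with the posture each user took. The `TaskEvent` record, the `delegation_measures` function, and the specific definitions (for instance, treating frequency as AI-assisted tasks per adopting user) are assumptions made for illustration; a real product would need to pick its own definitions and time windows.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskEvent:
    """One task a user performed (hypothetical log record)."""
    user: str
    task_type: str
    posture: str  # "avoidance", "supervision", or "delegation"


def delegation_measures(events, total_users):
    """Compute the five delegation measures from a task log.

    Any task not marked "avoidance" counts as using AI (an assumption).
    """
    ai_events = [e for e in events if e.posture != "avoidance"]
    ai_users = {e.user for e in ai_events}

    # Posture: per task type, how often each posture was taken.
    posture = defaultdict(Counter)
    for e in events:
        posture[e.task_type][e.posture] += 1

    return {
        # Adoption: fraction of all users who used AI at least once.
        "adoption": len(ai_users) / total_users if total_users else 0.0,
        # Frequency: AI-assisted tasks per adopting user over the log period.
        "frequency": len(ai_events) / len(ai_users) if ai_users else 0.0,
        # Share: fraction of all logged tasks that involved AI.
        "share": len(ai_events) / len(events) if events else 0.0,
        # Assortment: which task types involved AI at all.
        "assortment": sorted({e.task_type for e in ai_events}),
        "posture": {task: dict(counts) for task, counts in posture.items()},
    }
```

The first three values are the macro measures; assortment and posture break the same log down by task type.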

As an AI product owner, you can imagine keeping track of key user stories and plotting them along the posture spectrum. For example, we can read how an expert programmer uses coding agents and see they’re:

avoiding using AI for writing Rust or guiding the overall project;
supervising most tasks like test-case writing, project set-up, and edit-compile-test-debug cycles;
and delegating refactoring test cases and writing Bash code.
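
The programmer example above could be recorded as a simple mapping from user story to posture. This is only an illustrative sketch: the `Posture` enum, the `user_stories` dictionary, and `group_by_posture` are hypothetical names, and the story labels paraphrase the list above.

```python
from enum import Enum


class Posture(Enum):
    AVOIDANCE = "avoidance"
    SUPERVISION = "supervision"
    DELEGATION = "delegation"


# User stories from the expert-programmer example, tagged by posture.
user_stories = {
    "writing Rust": Posture.AVOIDANCE,
    "guiding the overall project": Posture.AVOIDANCE,
    "writing test cases": Posture.SUPERVISION,
    "project set-up": Posture.SUPERVISION,
    "edit-compile-test-debug cycles": Posture.SUPERVISION,
    "refactoring test cases": Posture.DELEGATION,
    "writing Bash code": Posture.DELEGATION,
}


def group_by_posture(stories):
    """Group user stories along the posture spectrum, preserving input order."""
    grouped = {posture: [] for posture in Posture}
    for story, posture in stories.items():
        grouped[posture].append(story)
    return grouped


if __name__ == "__main__":
    for posture, stories in group_by_posture(user_stories).items():
        print(f"{posture.value}: {', '.join(stories)}")
```

Keeping snapshots of this mapping over time would let you see which stories move from one posture to another.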

Mapping the posture of your users is a cousin to error analysis (finding the things your AI fails at), but a bit more nuanced: your app can perform a task, but does your user trust it?

This is where the e-commerce comparison shines: e-commerce slowly saw adoption grow as social and cultural norms changed. Technical concerns were table stakes, not accelerants. Only by plotting AI delegation can you begin to assess the challenge.

At some point, many of the supervised tasks above will be reliable enough to move into the delegation category. But there are tasks where many users will prefer to retain oversight (payment provider integration and authentication come to mind).

Finally, these are (of course) dependent on the user you’re targeting. A novice coder playing with Lovable or v0 is delegating like mad, perhaps because they’re incapable of adequately supervising a coding agent, perhaps because their end goal is a toy or personal tool, or both.

But if you’re not managing an AI product, the metric to watch is expert delegation. Watch what tasks experts begin to supervise or even delegate and you’ll have a better grasp of AI’s capabilities than those merely watching the benchmarks.

Individual & Social Delegation Disagreements Will Occur, Exposing Sensitive Issues

While we’ve embraced e-commerce, there are a few examples of things we might want to purchase online, but can’t due to societal norms. Many prescription medications can’t be ordered online, and others can only be ordered with regular assessments by doctors (virtual or otherwise).

Similarly, experts will delegate many tasks to AI that the general public wouldn’t trust AI to handle. For example, while we’re seeing increased usage of AI automation on the battlefield, there is pushback over whether it is either effective or ethical, with a wide range of opinions depending on the specific task being discussed.

Or perhaps society will wince at the idea of personalized AIs making end-of-life decisions for unconscious patients? I, for one, don’t think the death panel meme will go over better with the addition of AI – even if some ethicists think delegating end-of-life decisions could be a good idea (and let me be explicit: this paper was viewed as bold and questioned by many in the bioethics community).

It’s also interesting to watch where experts hesitate but society embraces. I’ve talked to many who would happily let an AI have access to their email accounts, if it meant they could delegate their most menial replies. Meanwhile, Anthropic’s safety team keeps publishing reports about how giving Claude email autonomy might result in it emailing the media or the feds with your misdeeds.

The gap between social acceptance and expert acceptance of delegating a given task to AI is a point of negotiation that will occur more often, in more domains over the coming years. Watch these points of friction to better understand the distribution of AI. First as a sign that AI is performing a task sufficiently against expert standards. And second, as a sign that either regulation will arrive or cultural innovations are needed to enable the technical ones.

¹ I appreciate that Anthropic called this flag --dangerously-skip-permissions.
² Even 3rd party trust mechanisms, once so crucial to customers, are valued much less. Apple being forced to allow alternate payment systems on their store feels like a natural bookend to the tale.
