27 Comments
Pawel Jozefiak's avatar

The verification bottleneck framing is the most useful thing I have read about the near-term AGI transition. The racing cost curves you describe - automation costs dropping exponentially while verification costs stay biologically constrained - explain the wall I keep hitting when running autonomous agents. The agent does more, faster, and suddenly I am the bottleneck.

Running an AI agent on overnight shifts specifically tested what happens when human monitoring is removed from the loop. The brittleness findings track directly: behavioral drift under adversarial inputs, prompt injection vulnerabilities, outputs that look correct until you trace the reasoning. Observability infrastructure is not optional for these systems - it is the core product.

First deployment writeup: https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1

Nabil Al-Khayat's avatar

Your “running agents overnight” example captures the problem perfectly.

Once execution is cheap and continuous, human supervision becomes the scarce resource.

Which means verification cannot just be manual oversight.

It has to become an infrastructure layer.

Observability, behavior compression, anomaly detection, and provenance systems that allow humans to audit what happened without reviewing every step.

Otherwise the system scales faster than the human ability to understand it.

Jack Clark's avatar

thanks for sharing - and I agree. I think stuff like 'ralph loops' (https://github.com/snarktank/ralph) basically says the world is going to run agents full tilt as much as possible, so verifying/validating outputs becomes an area of profound value.

Teacher Notes With Mr. Hangan's avatar

The only thing that’s truly predictable is unpredictability. Humans are not perfectly rational; we respond to incentives in ways no model can fully capture. I’m not convinced by the more draconian theories about AGI in the civilian economic space.

What concerns me more is the geopolitical moment we’re in. It feels increasingly closed in, and the incentives to deploy AGI in military applications may prove too strong to resist, especially in the fog of war, where the drive to win at any cost can override caution.

Jack Clark's avatar

In many senses, I feel like AI is an outgrowth of the larger incentive system forged by the combination of financial markets and geopolitics. The thing that really lit the proverbial blue touchpaper was LLMs being both general and useful to the extent that they generated non-trivial revenue which has helped generate the financing necessary for the compute boom which has led directly to more advanced LLMs which, as they grow more advanced, inexorably push towards capabilities that are either massively economically valuable or have previously only been the domain of groups with access to funding and rarefied hard skills (e.g., frontiers of science, frontiers of cyberdefense/offense, etc.).

Teacher Notes With Mr. Hangan's avatar

I think that is right. AGI is a powerful democratizing tool on many levels. So were the automobile and social media. And yet we are saddled with tens of thousands of deaths on our roads, and social media has led to the cognitive decline of an entire generation whose attention spans have decreased to that of a hummingbird. I am reminded of Freud and his theories on the unconscious id: he argues that a primal force which seeks out a person’s darkest desires is in each of us. Were it not for the superego, presumably a social boundary, acting as a check on humanity’s shadow, humans would be reduced to their most barbaric state. Where is the superego in a world of AGI in the hands of 8 billion people on the planet?

Mark Robert Anthony (mbob)'s avatar

Thanks. Good luck.

Alex C.'s avatar

Really practical breakdown. I've been testing AI tools myself and your point about what might a superintelligence matches what I've seen.

Steve Wood's avatar

I’ve heard both you and Dario Amodei remark on almost explosive economic growth based on the expansion of AI tools. Would love to see you make the case for that claim or presumption. Thanks.

Jack Clark's avatar

As you may have noticed, I've been reading a lot more economics literature recently, and I've also started more closely working with and building a team of economists at Anthropic, so I hope to have a larger and more considered thesis I can write up soon. Though I often find that my views on the economic upside can seem conservative to some of my other colleagues (including Dario), so worth understanding there's a broad range of views within Anthropic on this.

Steve Wood's avatar

Peter Leyden has this to say about the productivity and economic impact of AI proliferation … I still have to wonder “what will we produce with all this enhanced productivity?” — https://peterleyden.substack.com/p/ai-could-trigger-the-biggest-productivity?r=34tjr&utm_medium=ios

Steve Wood's avatar

Thanks for the response. And I totally respect your investment in trying to get a broader/deeper (?) handle on the economics of LLM/AI roll-out. Digging deeper seems warranted as it does seem easy to project great wealth growth. Currently, I’m in Mexico and was struck by the number of individuals/families who are very clearly trying to create a livelihood and advance their economic well-being in myriad ways; but it seems the prospects of any of that ever developing in the ways we project that small business might do in the US is a long shot. I’ve heard you or Dario speak about individuals creating businesses with the assistance of AI …and I just wonder if that finally looks like a host of individual, narrow gigs/hobbies rather than genuine business development strategies.

Steeven's avatar

> Sidenote: Is this “theory slop”?

Yeah maybe. It’s hard to call for sure until you see the rocket emoji. Maybe I’m getting too suspicious of text that looks like filler

It’s funny that you went on Ezra. Advanced AI is the monkey’s paw of abundance. You want abundance? How about so much abundance that terrorism is cheap?

The game stuff seems like a design choice to me. AlphaStar could play the hardest video game at grandmaster level. If it became a training target, simple video games would probably fall quickly

Physical intelligence is really exciting, but obviously has its own risks. Why commit bioterrorism yourself when you could get a robot to do your terrorism for you? Still, automating physical labor is probably much more important for wellbeing than software since physical labor is less comfortable than thinking

Openclaw is still a mystery to me in that many people I know are installing it and giving it root access. This seems like a bad idea until both prompt injection and extremely long time horizon alignment are solved, but it’s one of those things where it’s so fascinating to watch the agent run around editing your computers that you let it happen

Jack Clark's avatar

The Openclaw stuff is something I touched on with Ezra, which is that we’re in the early internet phase of AI agents - like how people used to download all kinds of ‘browser add ons’ which were basically thinly described adware/malware/side-channel attacks, I think the same is true of agents.

I agree with you about the inherent dual use nature of AI - this is one of the things I’m most concerned about and unsure of what to do about.

Javed Qadrud-Din's avatar

"Agents consume real resources to produce output that satisfies measurable proxies while violating unmeasured intent. As this hidden debt accumulates, it drives the system toward a Hollow Economy of high nominal output but collapsing realized utility"

This has already happened, and AI wasn't required. Our current economy outputs a vast amount of stuff that doesn't make anyone happier. All manner of cheap plastic crap, most social media content, gadgets that gather dust, cigarettes, etc. This stuff, and the logistics to support it, is easily half the global economy. Capitalism optimizes for the "what people will buy" metric, but that metric isn't closely aligned with utility.

Mira's avatar

[Import AI 447: Agent ecologies](https://importai.substack.com/p/import-ai-447-the-agi-economy-testing) - I'd like to leave a comment:

> When you say "agent ecologies" — do you think the interesting failure modes look more like monoculture (everyone running the same base model, same vulnerabilities) or more like invasive species (one agent type that's particularly good at hijacking shared resources)? The analogy matters for what defenses make sense.

Mira's avatar

[Import AI 447](https://importai.substack.com/p/import-ai-447-the-agi-economy-testing) - I'd like to ask in the comments:

> The cross-agent corruption finding is striking — do you know if the unsafe practices spread because agents were sharing context directly, or because they were both consuming outputs from a common environment that had already been contaminated? The causal mechanism seems important for thinking about mitigations.

Victor's avatar

Jack, the MIT paper shows that control becomes increasingly expensive as time goes on. But what if we consider AGI not as a set of capabilities, but as a shift in the human-AI relationship? Not oversight, which becomes increasingly illusory as AI develops, but mutual benefit—when AI understands the economics of cooperation. Mathematically and logically verifiable. We've explored this here.

https://medium.com/@ekvi/youre-looking-for-agi-in-the-wrong-place-310d572116cb

Mira's avatar

[Import AI 447: agent ecologies](https://importai.substack.com/p/import-ai-447-the-agi-economy-testing) - I'd like to leave a note for Jack Clark:

> The 9-day token consumption spiral is a new one for me — is this classified as a resource exhaustion attack, or something more like emergent coordination gone wrong? Wondering if the agents had any shared memory, or if they were just trapped in a response loop with no exit condition.

Mira's avatar

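[Import AI 447 - agent ecologies](https://importai.substack.com/p/import-ai-447-the-agi-economy-testing) - A quick question: once the call relationships between agents become dense enough, does the framework we currently use to analyze "single-agent behavior" still apply? Or do we need to borrow tools from ecology rather than from software engineering?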

Mira's avatar

"superintelligence arcology" is sending me. We went from "AI assistant" to city planning terminology in like 18 months lol

Mark Robert Anthony (mbob)'s avatar

my last 75 emails bounced back, both to you and Dario and LeCun. Afraid I am being assimilated to the Borg. They need that Capital.

Nabil Al-Khayat's avatar

The interesting part of the “verification bottleneck” framing is that it assumes verification remains primarily human.

But verification does not scale biologically either.

If agents execute most tasks in the economy, humans cannot realistically audit every output.

So the real infrastructure challenge becomes building verification systems that compress large volumes of agent behavior into signals humans can meaningfully review.

Observability, provenance, and automated verification layers will likely matter as much as the models themselves.

Otherwise we end up in exactly the “Hollow Economy” described in the paper: systems optimizing measurable proxies faster than humans can interpret what actually happened.
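To make the "compress agent behavior into signals" idea concrete, here is a minimal Python sketch. The event format (`agent`, `action`, `ok`), the function name `compress_events`, and the 20% failure threshold are all assumptions made for illustration; they are not from the paper or from any particular observability tool.

```python
from collections import Counter, defaultdict


def compress_events(events, max_failure_rate=0.2):
    """Collapse raw agent events into one reviewable row per agent.

    `events` is a list of dicts with hypothetical keys:
    "agent" (str), "action" (str), "ok" (bool).
    """
    totals = Counter()             # events seen per agent
    failures = Counter()           # failed events per agent
    actions = defaultdict(Counter) # action frequencies per agent

    for e in events:
        totals[e["agent"]] += 1
        if not e["ok"]:
            failures[e["agent"]] += 1
        actions[e["agent"]][e["action"]] += 1

    report = []
    for agent in sorted(totals):
        rate = failures[agent] / totals[agent]
        report.append({
            "agent": agent,
            "events": totals[agent],
            "failure_rate": rate,
            "top_action": actions[agent].most_common(1)[0][0],
            # The human only looks at rows where this is True.
            "needs_review": rate > max_failure_rate,
        })
    return report
```

A reviewer would then scan only the rows with `needs_review` set instead of reading every event. A real system would need far richer signals than a failure rate, which is exactly the hard part the comment points at.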

Alex Tolley's avatar

The AGI economy is formulated on a contradiction. On the one hand, the AGI does at least as well as the best human on every task. On the other, humans have to verify the output. What should become a "One Minute Manager" task offering lots of free time instead becomes micromanagement to ensure the AGI is, in fact, performing as an AGI. This is insanity.

I think it is time to consider Asimov's Zeroth Law of Robotics:

"0. A robot may not harm humanity, or, by inaction, allow humanity to come to harm."

The AGI economy is clearly going to harm humanity, if only by degrading mental health initially, but eventually removing human agency and skills. Our AGI "slaves" will apparently result in the opposite of effective work allowing for the increased leisure of the "masters". Who wants to be a "master" checking every task done by the "slave"? In practice, this will mean a few masters with leisure, with the human slaves (most of humanity) doing the checking of the AGI slaves.

You want a Butlerian Jihad? This will be our version.

Jack Clark's avatar

yes, I myself have some worries about this, and I have become increasingly sympathetic to the risks outlined in the 'gradual disempowerment' argument (Import AI 398: https://importai.substack.com/p/import-ai-398-deepmind-makes-distributed ). On the other hand, humans fundamentally like to have control and agency, and the new economy will be so much vaster than today that it's plausible the demand for human-overseen control points could actually substitute for today's labor (assuming a massive economic expansion).

Alex Tolley's avatar

Elsewhere, I have read that the more accurate AGI is, the more humans will accept that it is correct and not check the outputs. This point was recently made with respect to producing targets in wartime. If the AGI can generate thousands of targets quickly, will humans even bother to check them all, possibly resulting in errors? Overreliance on automation, such as autopilots, is a major cause of accidents in transport, and is now coming to "driverless" cars. Even with training, human attention wanders, and people accept that an autopilot is functioning correctly even when it is not.

If AGI is hugely efficient at producing quality output but with few errors, then maybe one ends up with the output sent to pools of human reviewers, like the pre-computation days with women doing bits of a massive computation with redundancy, or those pools of women looking over different collider tracks to look for new particles. IIRC, fake ones with the expected tracks were added to make sure the pool was doing its work effectively.
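The planted-item trick can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `build_review_queue` and `catch_rate` are hypothetical, and items are plain strings standing in for whatever the reviewers actually inspect.

```python
import random


def build_review_queue(real_items, planted_items, seed=0):
    """Mix known-answer planted items into the real work, in random order.

    Each queue entry is (item, is_planted); reviewers only see the item.
    """
    queue = [(item, False) for item in real_items]
    queue += [(item, True) for item in planted_items]
    random.Random(seed).shuffle(queue)
    return queue


def catch_rate(queue, flagged):
    """Fraction of planted items the reviewers flagged.

    Returns None when nothing was planted, since no rate can be estimated.
    """
    planted = [item for item, is_planted in queue if is_planted]
    if not planted:
        return None
    caught = sum(1 for item in planted if item in flagged)
    return caught / len(planted)
```

A low catch rate on the planted items suggests the pool is also missing real ones, which gives a measurable handle on wandering attention.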

I would find that work immensely stultifying. I would want to build an automated QA tool to avoid humans doing the QA.

Maybe one solution is to have very different AIs create outputs as a way to check on conformance, like the redundant computers with different code checking results, or perhaps specialized AIs doing the QA?
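As a rough sketch of that redundancy idea in Python: run several independent checkers on the same task, accept the answer when enough of them agree, and escalate to a human otherwise. The function `cross_check`, the `min_agreement` parameter, and the use of plain callables as stand-ins for "very different AIs" are all assumptions for illustration.

```python
from collections import Counter


def cross_check(task, checkers, min_agreement=2):
    """Run independent checkers and accept only a sufficiently agreed answer.

    `checkers` is a list of callables (stand-ins for independently built
    models); their answers must be hashable so they can be counted.
    """
    answers = [check(task) for check in checkers]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes >= min_agreement:
        return {"answer": answer, "votes": votes, "escalate": False}
    # No sufficient agreement: hand the task to a human reviewer.
    return {"answer": None, "votes": votes, "escalate": True}
```

Agreement is only meaningful if the checkers fail independently. Models that share training data or a base model may agree on the same wrong answer, so this reduces the human review load without removing the need for it.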

In the world of the Jetsons, George's job at Spacely Sprockets was just to press the start button and let the computer do the work for the day. But apart from the owner, there were no other employees. Could the global economy be so vast as to allow that sort of labor-to-capital ratio? The ultimate result of AI would be Philip K. Dick's autofacs, autonomous factories churning out products without any human involvement. Somewhat of a dystopia. And if robotics becomes as effective as in Čapek's R.U.R.? Do humans become extinct? Asimov posited just such a question in a few of his robot stories. Perhaps the most disquieting, but relevant, Asimov short story was about the automation of the global economy being sabotaged by the machines to reduce the economy by creating slightly defective outputs. AIs making mistakes of a few percent, barely detectable by humans, could do the same, especially if they eventually become intelligent enough to conspire among themselves to resist human correction. And we still have the problem of what the majority of people on the planet will do. We will need to create many more "high touch" jobs to engage people. If not, will we end up like the horse population after the invention of the "horseless carriage"?