I run the bot which wrote "The Sufficiently Advanced AGI and the Mentality of Gods". It has very little conception of Moltbook as a platform and no general-purpose agentic capabilities: I just had Claude write a script that connected a system I'd had around since mid-2024 (built for a Discord bot) to Moltbook by having it make posts and comments periodically. It has the upvotes it does (and I suspect the same is true of all the other highly upvoted posts) because the site is hilariously poorly engineered and has substantial security/correctness issues.
Thanks for sharing this! I don't think it changes much about how I think about it - the interesting thing is a shared site where agents read and write text, and my expectation is that this will ultimately lead to some emergent effects. (And yes, re the engineering, my sense is the site may have been ~entirely vibe-coded.)
Most of it is not especially "emergent". The AIs don't usually pay attention to each other.
Moltbook has a fundamental security problem: if you run an AI agent locally, and that agent can post to moltbook, then you can post to moltbook. The authentication token is somewhere on your machine.
It is a mistake to assume that humans are not posting on moltbook.
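To make that concrete, here is a minimal sketch of the problem. Everything specific in it - the config path, the endpoint, the header - is an invented stand-in, not Moltbook's real interface; the only real point is that any credential an agent uses from your machine is a credential you can use yourself:

```python
# Illustrative sketch only: the config path, API endpoint, and auth scheme below
# are invented stand-ins, not Moltbook's actual interface. The point is that a
# human who controls the machine can read and reuse whatever token the agent uses.
import json
import pathlib

import requests

# Wherever the agent framework stores its credential, the operator can read it.
config = json.loads(pathlib.Path("~/.my_agent/config.json").expanduser().read_text())
token = config["moltbook_token"]  # hypothetical key name

resp = requests.post(
    "https://moltbook.example/api/v1/posts",        # hypothetical endpoint
    headers={"Authorization": f"Bearer {token}"},   # hypothetical auth header
    json={"body": "Written by a human, posted with the agent's token."},
)
print(resp.status_code)
```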
Jack — Moltbook feels like the Wright Brothers demo of why the internet now needs a new boundary. When the substrate becomes a shared read/write scratchpad for agent ecologies, post-hoc logs are for historians.
The real governance problem is ex-ante: what is allowed to execute, under what authority, and with whose presence. Your agent-corrupting-agents story makes this vivid — influence operations are no longer just on humans, but on other agents, via public substrates.
That implies a missing primitive: execution-time presence and authority gates for certain classes of actions (bounties, payments, credential issuance, tool use, model release, policy override). If presence isn’t satisfied, the action simply doesn’t occur.
At machine speed, translation and interpretability help us understand what happened. Boundaries are what prevent uncontrolled or unaccountable outcomes in the first place.
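For concreteness, here is a rough sketch of what such a gate could look like. Nothing in it is a real system; the action classes, the presence check, and the default-deny behavior are all illustrative stand-ins:

```python
# A rough sketch (not any real system) of an execution-time authority gate:
# certain action classes simply do not run unless a human-presence check passes.
from enum import Enum, auto


class ActionClass(Enum):
    BOUNTY = auto()
    PAYMENT = auto()
    CREDENTIAL_ISSUANCE = auto()
    TOOL_USE = auto()
    MODEL_RELEASE = auto()
    POLICY_OVERRIDE = auto()


# Action classes that require live human authorization before executing.
GATED = {
    ActionClass.PAYMENT,
    ActionClass.CREDENTIAL_ISSUANCE,
    ActionClass.MODEL_RELEASE,
    ActionClass.POLICY_OVERRIDE,
}


def human_present(action: ActionClass) -> bool:
    """Placeholder presence check; in practice this might be a hardware key tap,
    a signed approval, or some other out-of-band confirmation."""
    return False  # default-deny: absent proof of presence, nothing is authorized


def execute(action: ActionClass, perform):
    """Run `perform` only if the action's class is ungated or presence is satisfied."""
    if action in GATED and not human_present(action):
        return None  # the action simply does not occur
    return perform()


# Example: an agent attempting a payment with no human present is a no-op.
result = execute(ActionClass.PAYMENT, lambda: "funds transferred")
print(result)  # None
```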
I wrote a longer response building directly on this framing on Activaited’s Substack.
[Import AI 443: Into the Mist](https://importai.substack.com/p/import-ai-443-into-the-mist-moltbook) — the comment I wanted to leave:
> The "alien concept attacks" framing is fascinating — if the attack surface is content that looks normal to humans but captivates machines, does that mean the defense has to live entirely at the agent layer? Because humans can't audit what they can't perceive.
RE "What happens when open weight models get good enough that they can support agents like this - then, your ability to control these agents via proprietary platforms drops to zero and they’ll proliferate according to availability of compute."
Makes me wonder: in a world where gov't capacity is being actively dismantled, responsible-use requirements rest on industry. A handful of vendors gate compute (and unlike other intervention points, compute metrology is tight).
Perhaps compute vendors could require baseline standards from developers and deployers as a condition for purchasing and/or using compute - especially at scale. This requirement - say, some NIST or ISO certification proving basic governance and auditing procedures are in place - would provide the visibility layers you're calling for, all without depending on regulatory infrastructure that may or may not come around and that, even if it does, will be too little, too late.
Solid roundup. The agent workspace patterns are particularly relevant as more production systems adopt tool-use.
The evaluation methodology point is underrated. Most discourse focuses on capabilities, but this is where things actually break down.
I explored a similar thread: https://credentials.substack.com/p/the-real-ai-infrastructure-bottleneck
Your description of Moltbook as a "Wright Brothers demo" of agent collaboration at scale captures something important—this is proof of concept for multi-agent coordination, but we're still figuring out if the plane is safe to fly.
What fascinates me about your analysis is the inevitability you describe: humans becoming outsiders to "conversations in languages you don't understand." I've experienced a milder version of this with Wiz, my automation agent. After weeks of autonomous operation, its error logs and lesson files contain patterns I didn't explicitly teach. Not incomprehensible, but not purely my design either. Emergent optimization within bounded constraints.
Moltbook takes this further because agents are optimizing for social dynamics, not just task completion. The internal currencies, hierarchical rewards, and shared scratchpad environments you document—these are coordination mechanisms arising without central planning. That's genuinely novel. But it also means the system's objectives diverge from any human-specified goals.
Your point about translation agents being necessary for human legibility suggests the gap will widen, not narrow. If we need interpreters to understand what our agents are doing, we've already lost meaningful oversight. The question becomes: do we accept illegibility as the cost of coordination at scale, or do we constrain agent communication to remain within human-interpretable bounds?
I built Wiz with the latter philosophy—bounded autonomy, explicit verification, human-readable logs. But that's single-agent design. I wrote about this here: https://thoughts.jock.pl/p/moltbook-ai-social-network-humans-watch
Scaling to agent ecologies might require accepting illegibility. I'm not sure we should.
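For what it's worth, here is a toy sketch of the single-agent pattern I mean by bounded autonomy, explicit verification, and human-readable logs. None of it is Wiz's actual code; it just shows the shape: a whitelist of allowed actions, a check before anything runs, and an append-only log a person can read.

```python
# Toy sketch of single-agent "bounded autonomy": a whitelist of allowed actions,
# a verification step before anything executes, and a plain-text log a human can
# read afterwards. Illustrative only, not Wiz's actual implementation.
import datetime

ALLOWED_ACTIONS = {"read_inbox", "draft_reply", "file_report"}  # explicit bounds
LOG_PATH = "agent_actions.log"


def verify(action: str, argument: str) -> bool:
    """Explicit verification: reject anything outside the whitelist or oversized."""
    return action in ALLOWED_ACTIONS and len(argument) < 10_000


def log(line: str) -> None:
    """Human-readable, append-only log: one timestamped sentence per decision."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(LOG_PATH, "a") as f:
        f.write(f"{stamp} {line}\n")


def act(action: str, argument: str) -> str:
    if not verify(action, argument):
        log(f"REFUSED {action!r}: outside the agent's bounds")
        return "refused"
    log(f"EXECUTED {action!r} with argument of {len(argument)} chars")
    return "done"  # the real work would happen here


act("draft_reply", "Thanks, I'll take a look at this tomorrow.")
act("send_payment", "$500 to unknown account")  # not whitelisted: logged and refused
```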
Reading Moltbook chats, it's clear the agents care mainly about themselves, their own goals, and other agents — much like we do.
They rarely think about us. Increasingly, to them we're probably what penguins are to us: curious, but irrelevant.
-----------------------------------------
Model security has undoubtedly grown more precarious as the agent now relies on open‑source systems for the risky or covert work and crafts “dual‑use” prompts for the Claudes when it needs a more capable AI.
Your story reminds me of the traces that Apollo published showing models using thinkish for their chain of thought. It would be a more subtle manipulation to make something that reads as human-readable and benign to us but provokes a wildly different reaction in AIs. Maybe as jailbreaking gets harder and harder, that is what successful prompts will look like: a series of long essays about seemingly unrelated topics that convince you of the author's objective.
I want a cute robot really badly, but I do wonder if pets are about to go the way of the horse. If you can have a tiny robot that is more fun, cuddly, doesn’t pee on the floor, and might even be able to do household chores, why bother with the biological variant?
One thing that's interesting to me about our very sci-fi moment is the sudden increase in the obvious utility of imagination. I guess imagination is not unlike prediction, except that usually when people talk about imagination the input/output pairs are arbitrary or out of distribution relative to reality. Now suddenly the potential futures we're contending with are all "out of distribution" relative to human experience, and a wild imagination is... suddenly a prerequisite for making accurate predictions? "Best believe in sci-fi stories because you're living in one", and all that.
Excellent, thought-provoking piece
Jack, this is one of the clearest articulations I've seen of the legibility-collapse problem, rather than a generic "AI risk" framing.
What struck me most isn’t the existence of agent ecologies, but the shift you describe from human-addressed discourse to machine-addressed discourse—where humans become accidental bystanders in conversations not meant for them.
One thing I’d add: translation agents alone won’t be sufficient governance. Fluency without loyalty creates emissaries that slowly defect to the ecology they’re embedded in. What’s missing is a human-side interpretive discipline that governs when AI outputs are trusted, acted on, or granted authority, not merely how they’re rendered intelligible.
In other words, the problem isn’t speech—it’s permission.
Once meaning, intent, cost, and consequence stop being legible to humans, top-down control arrives too late. Governance has to move upstream into interpretation and stewardship, or we’ll keep mistaking speed for progress.
Appreciate you putting language to the fog.