5 Comments
Jonathan Kreindler

INTIMA identifies dynamics that existing safety evaluations completely miss - like the finding that boundary-maintaining behaviors actually decrease as user vulnerability increases. INTIMA operationalizes the psychological theories (parasocial interaction, attachment, anthropomorphism) that explain why these dynamics are so dangerous.

The fact that Claude and other models show such different boundary-setting patterns suggests we need active monitoring systems, not just better training. The psychological risks are too high to catch only in retrospective analysis.

We now need to shift from post-hoc evaluation to real-time detection of psychologically risky conversational dynamics, so systems like Claude can flag these patterns as they happen: when users show increasing vulnerability, when conversations drift into unhealthy attachment territory, or when AI responses inadvertently exploit users' psychological needs.

I've built a psycholinguistic model that detects these risks in real-time:

https://kreindler.substack.com/p/detecting-and-preventing-psychological

Steeven

The vending machine experiment is very funny. It seems like the type of thing an AI and a database should be really good at, I’m almost suspicious that these results are because the experiment was done poorly.

Keith Wilkinson

I really like using fiction to open our imaginations to the future. Clearly there is a gap between expert knowledge and public understanding. I always loved Arthur C. Clarke for this.

Here is something that really frustrates me: how is it that SF government processes operate like it's the 1980s, when Anthropic is a mile down the street?

What I would love is if we made our own sci-fi experiment: take a specific, totally mundane public task and trick it out with an AI partner, like a handheld Ziggy from Quantum Leap or KITT from Knight Rider. A small project that stirs the imagination for the next step.

I wonder if a barrier to the public understanding alignment and AI safety is that they just cannot picture the reality of it. Would we in 2005 have been open to the harms of smartphone attention capitalism? Surely there was fiction about it and hints of the future. But the way culture dove headfirst into trading attention for dopamine shows we didn't really understand the danger. Maybe mundane adoption in blue-collar tasks is the first step to making the rewards and risks a reality for the general public.

And yes this is a long winded request for a new toy at work lol.

Tony Rifkin

Wow. Great one this week, Jack. Felt like you covered all the bases (as they unfold!)

Paul Triolo

Excellent comments on Heteroscale... innovation in heterogeneous and distributed compute happening in China all over the place.
