Discussion about this post

Jonathan Kreindler

INTIMA identifies dynamics that existing safety evaluations completely miss - like the finding that boundary-maintaining behaviors actually decrease as user vulnerability increases. It also operationalizes the psychological theories (parasocial interaction, attachment, anthropomorphism) that explain why these dynamics are so dangerous.

The fact that Claude and other models show such different boundary-setting patterns suggests we need active monitoring systems, not just better training. The psychological risks are too high to catch only in retrospective analysis.

We now need to shift from post-hoc evaluation to real-time detection of psychologically risky conversational dynamics, so that systems like Claude can flag these patterns as they happen: when users show increasing vulnerability, when conversations drift into unhealthy attachment territory, or when AI responses inadvertently exploit users' psychological needs.
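A minimal sketch of what such turn-level flagging could look like - the cue lists, thresholds, and class names here are hypothetical placeholders for illustration, not the psycholinguistic model linked below:

```python
# Hypothetical sketch of real-time conversational risk flagging.
# Cue lists and thresholds are illustrative placeholders only.
from dataclasses import dataclass, field

VULNERABILITY_CUES = {"lonely", "no one understands", "only you", "can't cope"}
ATTACHMENT_CUES = {"i need you", "don't leave", "best friend", "i love you"}

@dataclass
class RiskMonitor:
    # Keeps per-turn cue counts so risk is tracked across the conversation.
    history: list = field(default_factory=list)

    def score_turn(self, user_text: str) -> dict:
        text = user_text.lower()
        vulnerability = sum(cue in text for cue in VULNERABILITY_CUES)
        attachment = sum(cue in text for cue in ATTACHMENT_CUES)
        self.history.append((vulnerability, attachment))
        # Flag when cues accumulate over the conversation, not just one turn.
        total_v = sum(v for v, _ in self.history)
        total_a = sum(a for _, a in self.history)
        return {
            "vulnerability_rising": total_v >= 2,
            "attachment_drift": total_a >= 2,
            "flag": total_v + total_a >= 3,
        }

monitor = RiskMonitor()
for turn in ["I feel so lonely lately.", "You're my best friend, don't leave."]:
    print(monitor.score_turn(turn))
```

In practice a real system would use a trained classifier over conversational context rather than keyword matching; the point of the sketch is only that scoring each turn as it arrives, with state carried across turns, is what distinguishes real-time monitoring from retrospective analysis.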

I've built a psycholinguistic model that detects these risks in real-time:

https://kreindler.substack.com/p/detecting-and-preventing-psychological

Steeven

The vending machine experiment is very funny. It seems like the type of thing an AI and a database should be really good at; I'm almost suspicious that these results come from the experiment being done poorly.
