Discussion about this post

Daniel

Consistently valuable curation of AI developments.

Jonathan Kreindler

INTIMA identifies dynamics that existing safety evaluations miss entirely, such as the finding that boundary-maintaining behaviors actually decrease when user vulnerability increases. INTIMA operationalizes the psychological theories (parasocial interaction, attachment, anthropomorphism) that explain why these dynamics are so dangerous.

The fact that Claude and other models show such different boundary-setting patterns suggests we need active monitoring systems, not just better training. The psychological risks are too high to catch only in retrospective analysis.

We now need to shift from post-hoc evaluation to real-time detection of psychologically risky conversational dynamics, so that systems like Claude can flag these patterns as they happen: when users show increasing vulnerability, when conversations drift into unhealthy attachment territory, or when AI responses inadvertently exploit users' psychological needs.

I've built a psycholinguistic model that detects these risks in real time:

https://kreindler.substack.com/p/detecting-and-preventing-psychological
