Discussion about this post

Shon Pan

Distributed training seems extremely concerning. There does seem to be a bit of an efficiency drop, but it looks like it would make it massively easier for bad actors to create powerful models unless there are measures like on-chip controls to help head it off.

Lennart's comment on memory controls is also an option, at least for reasoning models.

Frank Herfert

Really enjoyed the story!

