Discussion about this post

Shon Pan

Distributed training seems extremely concerning. There does seem to be a bit of an efficiency drop, but it looks like it would make it far easier for bad actors to create powerful models unless something like on-chip controls is in place to head it off.

Lennart's comment on memory controls is also an option, at least for reasoning models.

Frank Herfert

Really enjoyed the story!
