Distributed training seems extremely concerning. There does seem to be a bit of an efficiency drop, but it would make it much easier for bad actors to create powerful models unless there are safeguards like on-chip controls to head it off.
Lennart's comment on memory controls is also an option, at least for reasoning models.
> Lennart's comment on memory controls
Where could I find this?
It's on his Twitter.
Really enjoyed the story!
It would be really interesting IMO to see a model try to run a YouTube channel, with all of the challenges that entails, e.g. choosing an audience and an upload schedule, as well as keeping up with and staying ahead of trends.