5 Comments

Distributed training seems extremely concerning. There does seem to be a bit of an efficiency drop, but it would massively permit bad actors to create powerful models unless there are things like on-chip controls to help head it off.

Lennart's comment on memory controls is also an option, at least for reasoning models.


> Lennart's comment on memory controls

Where could I find this?


It's on his Twitter.


Really enjoyed the story!


It would be really interesting IMO to see a model try to run a YouTube channel, with all of the challenges that entails: e.g. choosing an audience and an upload schedule, as well as keeping up with and staying ahead of trends.
