Discussion about this post

neuro morph

Predicting vs. Acting: https://arxiv.org/abs/2407.02446

Experimental confirmation of what many of us had reported anecdotally: RLHF makes models less creative and more predictable. Maybe a little of this is good, but too much leads to a boring and unhelpful model.

Diffusion Guided Language Modeling: https://arxiv.org/abs/2408.04220

Perhaps a better option is to go light on RLHF and combine it with other methods of steering outputs. Diffusion guidance is a novel and exciting approach to doing that.

Perry Simms

What an odd assumption underlies the text: that the State is, or should be, the owner and proprietor of all technological innovation.
