Predicting vs. Acting: https://arxiv.org/abs/2407.02446
Experimental confirmation of the observation many of us had reported anecdotally: RLHF makes models less creative, more predictable. Maybe a little of this is good, but too much leads to a boring and unhelpful model.
Diffusion Guided Language Modeling: https://arxiv.org/abs/2408.04220
Perhaps a better option is to go light on the RLHF and combine it with other methods of steering outputs. Diffusion guidance is a novel and exciting approach to doing that.
What an odd assumption underlying the text, that the State is, or should be, owner and proprietor of all technological innovation.
The Tech Tale has a very Hari Seldon feel to it, in microcosm.