Discussion about this post

Trelis Research

Wow, yeah, GSM-NoOp performance is a lot worse than GSM. I suppose models aren't trained on irrelevant data like this during SFT or RLHF.

Perry Simms

"By adding seemingly relevant but ultimately irrelevant information to problems, we demonstrate substantial performance drops (up to 65%) across all state-of-the-art models"

On the one hand, I feel embarrassed for not having thought of this myself.

On the other hand, I feel happy to realize that we still have a fast-growing tree of technology with 'easily' exploitable opportunities that are just waiting to be discovered.
