If we perfectly simulated the entire world and trained agents within it, what might we learn about ourselves and the experience of being "human"?
BLOOMChat and the OpenLLM Leaderboard show a simple trend in LLMs right now: size still matters. As training data improves, I suspect it will matter less.
LLaMA 65B still tops the leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard, though that may change as more evaluation types are added.