clem-benchmark
Supported by: DFKI

An open-source benchmark for evaluating chat-optimized language models as conversational agents through game play.

  • Code: GitHub clembench
  • Foundations of Computational Linguistics, Potsdam