Related Work
Since we released the first version of our framework (May 22nd 2023, on arXiv at https://arxiv.org/abs/2305.13455), we have become aware of several other projects that develop similar ideas (evaluating / exploring LLMs through self-play). We list them here. (If you want your project to be listed as well, get in touch!)
Game / Multi-Agent Frameworks for Evaluation
- Qiao et al., “GameEval: Evaluating LLMs on Conversational Games” (2023-08-19); https://arxiv.org/abs/2308.10032
- Li et al., “Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation” (2023-09-08); https://arxiv.org/abs/2309.04369
- Gong et al., “MindAgent: Emergent Gaming Interaction” (2023-09-18); https://arxiv.org/abs/2309.09971
- Wu et al., “SmartPlay: A Benchmark for LLMs as Intelligent Agents” (2023-10-02); https://arxiv.org/abs/2310.01557
- Zhou et al., “SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents” (2023-10-18); https://www.sotopia.world; https://arxiv.org/abs/2310.11667
Exploring Individual Games
- ChatGPT’s Information Seeking Strategy: Insights from the 20-Questions Game (Bertolazzi et al., INLG-SIGDIAL 2023); this great paper analyses a single game (20 Questions) in self-play and won the best paper award at the conference! (It was under submission at the same time as our paper, and was published at INLG 2023 before ours appeared at EMNLP 2023.)