In this paper we introduce a new game to crowd-source natural language referring expressions. By designing a two-player game, we can both collect and verify referring expressions directly within the game. To date, the game has produced a dataset containing 130,525 expressions, referring to 96,654 distinct objects, in 19,894 photographs of natural scenes. This dataset is larger and more varied than previous referring expression generation (REG) datasets and allows us to study referring expressions in real-world scenes. We provide an in-depth analysis of the resulting dataset. Based on our findings, we design a new optimization-based model for generating referring expressions and perform experimental evaluations on three test sets.


Sahar Kazemzadeh*, Vicente Ordonez*, Mark Matten, Tamara L. Berg. ReferItGame: Referring to Objects in Photographs of Natural Scenes.
Empirical Methods in Natural Language Processing (EMNLP) 2014. Doha, Qatar. October 2014.
[*] indicates equal author contribution.

@inproceedings{kazemzadeh2014referitgame,
  title     = {ReferItGame: Referring to Objects in Photographs of Natural Scenes},
  author    = {Sahar Kazemzadeh and Vicente Ordonez and Mark Matten and Tamara L. Berg},
  year      = {2014},
  booktitle = {EMNLP}
}

If either dataset is used, please reference the ReferItGame paper.


ImageCLEF Referring Expression Dataset:

MSCOCO Referring Expression Dataset: