A user is presented with an instruction, an image and a fill-in-the-blank template, and asked to fill in the blank.
Describe what happened immediately after this picture was taken.
- One or two seconds after this picture was taken, ____.
Describe the activity of the indicated person/people.
- Person A is ____.
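
As a rough illustration of the collection format above, the sketch below models a prompt as an instruction plus a template and substitutes an answer into the blank. The function name `fill_template`, the dictionary keys, and the use of `____` as the blank marker are assumptions made for this example, not part of any released dataset tooling.

```python
def fill_template(template: str, answer: str, blank: str = "____") -> str:
    """Substitute the first blank marker in a Madlibs-style template.

    Hypothetical helper: the blank marker "____" mirrors the examples
    shown on this page, not a documented file format.
    """
    if blank not in template:
        raise ValueError("template has no blank to fill")
    # Replace only the first blank so any later text is left untouched.
    return template.replace(blank, answer.strip(), 1)


# Illustrative prompt built from the example shown above.
prompt = {
    "instruction": "Describe the activity of the indicated person/people.",
    "template": "Person A is ____.",
}

filled = fill_template(prompt["template"], "standing around")
print(filled)  # Person A is standing around.
```

The image itself is left out here; in practice each prompt would also carry a reference to the image and, for object and person types, the indicated region.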
Following the way we collected the dataset, we now ask your algorithm to automatically generate the fill-in-the-blank description for an image, given our Madlibs prompt.
This is a new targeted multiple-choice question answering task for images. Each question has four choices: the correct answer and three distractors. The distractors are drawn from random images for the easy setting or from similar images for the hard setting.
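
The way a four-choice question might be assembled can be sketched as follows. This is a minimal sketch, assuming that the easy setting draws distractors from random images and the hard setting draws them from similar images; `build_question`, its parameters, and the answer pools are hypothetical names for illustration, and how similar images are retrieved is not covered here.

```python
import random


def build_question(correct, similar_pool, random_pool, difficulty="easy", seed=None):
    """Assemble one four-choice question (hypothetical helper).

    Assumption: "hard" takes distractors from answers written for similar
    images, "easy" takes them from answers written for random images.
    """
    if difficulty not in ("easy", "hard"):
        raise ValueError("difficulty must be 'easy' or 'hard'")
    rng = random.Random(seed)
    pool = similar_pool if difficulty == "hard" else random_pool
    # Drop duplicates and anything identical to the correct answer.
    candidates = [a for a in dict.fromkeys(pool) if a != correct]
    if len(candidates) < 3:
        raise ValueError("need at least three distinct distractors")
    choices = rng.sample(candidates, 3) + [correct]
    rng.shuffle(choices)
    return {"choices": choices, "answer_index": choices.index(correct)}


# Illustrative pools; real distractors would come from other images' answers.
question = build_question(
    correct="tennis court",
    similar_pool=["basketball court", "baseball field", "golf course", "stadium"],
    random_pool=["kitchen", "beach", "bedroom", "street"],
    difficulty="hard",
    seed=0,
)
print(question["choices"], question["answer_index"])
```

Shuffling the choices keeps the position of the correct answer from leaking information, and the fixed seed only makes the example reproducible.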
Type 1: image's scene
Describe the type of scene/place shown in this picture.
- The place is a(n) tennis court.
Type 2: image's emotion
Describe the emotional content of this picture.
- When I look at this picture, I feel hungry and hot.
Type 3: image's interesting
Describe the most interesting or unusual aspect of this picture.
- The most interesting aspect of this picture is the kites.
Type 4: image's past
Describe what happened immediately before this picture was taken.
- One or two seconds before this picture was taken, they slowed the horses.
Type 5: image's future
Describe what happened immediately after this picture was taken.
- One or two seconds after this picture was taken, they drove around.
Type 6: object's attribute
Describe the appearance of the indicated object.
- The car is white.
Type 7: object's affordance
Describe the function of the indicated object.
- People could relax on the couches.
Type 8: object's position
Describe the position of the indicated object.
- The bicycle is in front of the bus.
Type 9: person's attribute
Describe the appearance of the indicated person/people.
- Person A is a balding male.
Type 10: person's activity
Describe the activity of the indicated person/people.
- Person D is standing around.
Type 11: person's location
Describe the location of the indicated person/people.
- Person B is next to an elephant.
Type 12: pair's relationship
Describe the relationship between the indicated person and object.
- The person is putting food in the bowl.