![]() |
Not all content is created equal – as indicated by the descriptions people write (right). Some objects (e.g. man, baby, sling) seem to be more important than others (e.g. ladder, table, chair). Some attributes (e.g. beard) seem to be more important than others (e.g. shirt or glasses). Sometimes scene words are used (e.g. kitchen), and sometimes they aren’t. One goal of this workshop was to examine the complex relationship between images and their descriptions. |
| |
Abstract
This workshop explored learning to identify visually descriptive text, parsing this text and extracting statistical models, and using these models to 1) learn how people describe the visual world, 2) compose descriptions of images, and 3) build more relevant recognition systems in computer vision. It was an exciting opportunity to deal with large-scale text and image data, be exposed to cutting-edge techniques in computer vision, and interactively develop new strategies on the boundary between NLP and computer vision. Specific types of work included data collection, parsing, using Amazon's Mechanical Turk, building and using probabilistic models, and work on applications including image parsing, retrieval, and automatic sentence generation from images. |
| |
Publications
Alexander C. Berg, Tamara L. Berg, Hal Daumé III, Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Karl Stratos, Kota Yamaguchi. JHU-CLSP Summer Workshop Whitepaper, 2011.
Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Xufeng Han, Alyssa Mensch, Alexander C. Berg, Tamara L. Berg, Hal Daumé III. European Chapter of the Association for Computational Linguistics, EACL 2012.
Alexander C. Berg, Tamara L. Berg, Hal Daumé III, Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Aneesh Sood, Karl Stratos, Kota Yamaguchi. Computer Vision and Pattern Recognition, CVPR 2012.
Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Karl Stratos, Kota Yamaguchi, Yejin Choi, Hal Daumé III, Alexander C. Berg, Tamara L. Berg. North American Chapter of the Association for Computational Linguistics, NAACL 2012. |
| |
Related Data
SBU Captioned Photo Dataset (Images and Descriptions)
If photos or descriptions are used, please cite: Im2Text: Describing Images Using 1 Million Captioned Photographs. Vicente Ordonez, Girish Kulkarni, Tamara L. Berg. Neural Information Processing Systems (NIPS), 2011.
Pre-Processed Results (Small Sample of 1k Parsed Descriptions, Object Detections, Scene Classifications)
Detecting Visual Text Data
|
| |
Related Talks
Final Presentation - All
|