HW2 Image Retrieval Using Web Text Descriptions
Due March 10
In this homework you will train classifiers for color-based visual attributes.
Training images will be labeled automatically by mining the text descriptions associated
with web shopping images. You will then use these classifiers to retrieve images
displaying each attribute from a collection of images that lack the respective attribute
term in their text descriptions.
Data
We will use the same shopping images with associated text descriptions as in HW1 -- this time
using only the bag part of the collection. You can simply reuse the bag portion of the data
from the last homework or download it again here.
Part 1 - Collecting Training/Testing Data
We will be training attribute classifiers for 5 color terms ("black",
"brown", "red", "silver", and "gold"). In this part of the homework you will
automatically collect training and testing images from the bag dataset by
utilizing their pre-existing associated text descriptions.
- For each attribute term, find all of the images that contain that attribute
term in their associated text description. You should do this in MATLAB using
its string processing functionality (help strfun gives a list of useful string
processing functions). Note: you should match both upper- and lowercase
versions of each attribute term and should remove any punctuation (an example
function to do this is here).
- The images whose descriptions contain exactly one of the attribute terms will
form your training set. The rest of the images will form your testing set.
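The two steps above can be sketched as follows. This is a minimal illustration written in Python only to show the logic; the assignment itself asks for MATLAB string functions. The helper names and the toy descriptions are hypothetical, and matching terms as whole words (so that "redesigned" does not count as "red") is an assumption, not something the assignment specifies.

```python
import string

ATTRIBUTES = ["black", "brown", "red", "silver", "gold"]

def terms_in_description(text, attributes=ATTRIBUTES):
    """Return the set of attribute terms that occur as whole words in text."""
    # Lowercase, then replace every punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = set(text.lower().translate(table).split())
    return {a for a in attributes if a in words}

def split_train_test(descriptions):
    """descriptions: dict mapping image name -> description text.

    Returns (train, test): train maps image name -> its single attribute,
    test is a list of the remaining image names.
    """
    train, test = {}, []
    for name, text in descriptions.items():
        found = terms_in_description(text)
        if len(found) == 1:
            train[name] = next(iter(found))
        else:
            test.append(name)
    return train, test

# Toy descriptions, invented for illustration only.
toy = {
    "bag1.jpg": "Classic BLACK leather tote.",
    "bag2.jpg": "Red/gold clutch with chain strap",
    "bag3.jpg": "Canvas messenger bag",
}
train, test = split_train_test(toy)
```

In this toy example, bag1.jpg lands in the training set as "black", while bag2.jpg (two terms) and bag3.jpg (no terms) go to the testing set.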
Part 2 - Computing Image Descriptors
- Compute a color histogram descriptor for each image in your training and
testing sets. You should bin each dimension (h, s, and v) into 10 bins,
resulting in a 1000-dimensional descriptor (10 x 10 x 10 joint bins). This will be the
feature vector for your images used in the remainder of the homework. Note that the color
histogram should be computed in HSV space (hue-saturation-value). To convert from RGB to HSV
use the rgb2hsv command. Remember to normalize your histograms so that the histogram for each
image sums to 1 (but make sure you don't create any NaNs when you do this, e.g. by dividing
an all-zero histogram by its sum).
- For images that are grayscale you can set the color histogram descriptor to
be all zeros.
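As a rough illustration of the descriptor, here is a small Python sketch of a joint 10 x 10 x 10 HSV histogram (the assignment itself uses MATLAB and rgb2hsv; Python's standard colorsys module plays that role here). The function name and the flat bin ordering are assumptions made for this example, and pixels are assumed to be RGB triples scaled to [0, 1]. A grayscale image would be handled before this call by returning the all-zero vector.

```python
import colorsys

def hsv_histogram(pixels, bins=10):
    """pixels: iterable of (r, g, b) tuples with components in [0, 1].

    Returns a flat list of bins**3 values that sums to 1,
    or all zeros when there are no pixels.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Map each channel from [0, 1] to a bin index;
        # min() keeps a value of exactly 1.0 in the last bin.
        hi = min(int(h * bins), bins - 1)
        si = min(int(s * bins), bins - 1)
        vi = min(int(v * bins), bins - 1)
        hist[(hi * bins + si) * bins + vi] += 1.0
        count += 1
    if count == 0:
        return hist  # avoid 0/0, which would produce NaN
    return [x / count for x in hist]
```

The guard on an empty pixel list is the same check you need when normalizing: only divide when the histogram's sum is nonzero.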
Part 3 - Training Classifiers
In this part of the homework you will use RBF kernel SVMs to train attribute classifiers that
recognize images displaying a color-based visual characteristic. Positive examples for training an
attribute, e.g. "black", will consist of those images in your *training set* that have the attribute
in their text description. Negative examples will be the rest of the images in your *training set*.
- You should use the MATLAB LibSVM package located here as
your SVM implementation (description of LibSVM here).
Make sure to add the location of your LibSVM installation to your path
using the addpath command. In this package, SVM training is implemented
as a function called svmtrain, while SVM testing is implemented as svmpredict.
- Because your SVM will be sensitive to the choice of parameter values, split
your training set into two parts: 70% as a temporary training set, and 30% as a
tuning set. Use the tuning set to select good values for the SVM parameters C (the
cost, LibSVM option -c) and g (the RBF kernel's gamma, LibSVM option -g).
To do this you can search over a range
of reasonable values for C and g (perhaps 5-10 values between 0.01 and
1000). Here you will train your SVM on the temporary training set and evaluate
accuracy on the tuning set. The best parameter settings for each attribute will
be those that produce the best accuracy on the tuning set. Report the parameter settings
and best tuning accuracies in your write-up.
- Once you have found good parameter values (for each attribute), use the
entire training set (from Part 1) to train final SVM models (for each
attribute). Use the '-b 1' option to get probability values out of your SVM.
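The tuning procedure above can be sketched as follows, again in Python purely to illustrate the control flow. The function names are hypothetical, the log-spaced grid is one reasonable choice rather than a requirement, and evaluate(c, g) is a placeholder callback: in your MATLAB code it would train on the temporary training set and return accuracy on the tuning set.

```python
import random

def split_70_30(items, seed=0):
    """Shuffle a copy of items and split it 70% / 30%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def log_grid(low=0.01, high=1000.0, n=6):
    """n values spaced evenly on a log scale from low to high."""
    ratio = (high / low) ** (1.0 / (n - 1))
    return [low * ratio ** i for i in range(n)]

def grid_search(evaluate, c_values, g_values):
    """Return (best_accuracy, best_c, best_g).

    evaluate(c, g) is a hypothetical callback: it should train on the
    temporary training set and return accuracy on the tuning set.
    """
    best = None
    for c in c_values:
        for g in g_values:
            acc = evaluate(c, g)
            if best is None or acc > best[0]:
                best = (acc, c, g)
    return best
```

With the defaults, log_grid gives approximately 0.01, 0.1, 1, 10, 100, and 1000. With LibSVM, each callback invocation would pass an options string along the lines of '-t 2 -c 1 -g 0.1' to svmtrain, where -t 2 selects the RBF kernel.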
Part 4 - Retrieving Images without Attribute Annotations
Here we will retrieve images displaying visual attributes from our testing
set (images that do not contain exactly one of our attribute terms in their
text descriptions -- collected in Part 1).
- For each attribute, classify each image in your testing set using the
'-b 1' option to get a probability value.
- For each attribute, rank the test images according to their probability.
Create a web page for each attribute showing the most likely 200 images in
ranked order. If you've implemented everything correctly, the top-ranked images
should visibly display the attribute.
- Compute precision@k curves for each attribute for k=1...200.
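Precision@k is the fraction of the top k ranked images that actually display the attribute. The Python sketch below illustrates the ranking and the curve computation. It assumes you already have a 0/1 relevance judgment for each ranked image; the assignment does not say how those judgments are obtained, and the function names are hypothetical.

```python
def rank_by_probability(names, probs):
    """Sort image names by descending probability of the positive class."""
    order = sorted(range(len(names)), key=lambda i: probs[i], reverse=True)
    return [names[i] for i in order]

def precision_at_k(relevance, max_k=200):
    """relevance: list of 0/1 judgments in ranked order (best first).

    Returns [P@1, P@2, ..., P@K] with K = min(max_k, len(relevance)).
    """
    curve, hits = [], 0
    for k, rel in enumerate(relevance[:max_k], start=1):
        hits += rel
        curve.append(hits / k)
    return curve
```

For example, relevance judgments of 1, 0, 1, 1 for the top four images give precision values of 1, 1/2, 2/3, and 3/4.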
What to turn in
Hand in via email to cse595@gmail.com:
- PDF write-up including a description of what you implemented, your best parameter
values for each attribute and tuning set accuracies, and precision@k curves for each
attribute.
- 5 web pages showing the top 200 results for each attribute. Note that you don't need to
submit the images for these pages; just assume they are located in a directory named "bags"
in the same location as your web pages.
- Commented code.
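The result pages requested above only need to reference images by a path relative to the "bags" directory. A minimal sketch of generating such a page is shown below in Python for illustration; the function name and the page layout are assumptions, and any layout that shows the images in ranked order would do.

```python
import html

def results_page(attribute, ranked_names, image_dir="bags", top=200):
    """Return an HTML string showing the top-ranked images in order."""
    rows = []
    for rank, name in enumerate(ranked_names[:top], start=1):
        # Relative path, so the page works when "bags" sits next to it.
        src = html.escape(f"{image_dir}/{name}", quote=True)
        rows.append(f'<p>{rank}. <img src="{src}" alt="rank {rank}"></p>')
    title = html.escape(f"Top results for {attribute}")
    parts = [
        f"<html><head><title>{title}</title></head><body>",
        f"<h1>{title}</h1>",
        *rows,
        "</body></html>",
    ]
    return "\n".join(parts) + "\n"
```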
Extra Credit
- Train your color attributes on handbag data and use these models to retrieve shoes
displaying those attributes (10 points).
- Implement vector-quantized SIFT histogram (bag of visual words) descriptors. Train SVMs to
recognize shape-based attributes (15 points).