Wednesday, March 28, 2012

Semi-Supervised Hashing for Scalable Image Retrieval

Jun Wang, Columbia University
Sanjiv Kumar,Google Research
Shih-Fu Chang,Columbia University
CVPR, 2010
============================================
Hashing methods are popular because they allow approximate but highly efficient search.
There are two main families:

1.Unsupervised hashing
   Unsupervised hashing methods perform well under metric distances, but in image
   search, semantic similarity is usually given in terms of labeled pairs of images.

2.Supervised hashing
   Supervised hashing methods can handle such semantic similarity, but they are prone to
   overfitting when the labeled data is small or noisy.

Moreover, these methods are usually very slow to train. The paper proposes a semi-supervised hashing (SSH) method that addresses the problems mentioned above.

Background

1.LSH
A key ingredient of Locality Sensitive Hashing (LSH) is mapping similar samples to the same bucket with high probability.

A typical category of LSH functions consists of random projections followed by thresholding:

    h_k(x) = sign(w_k^T x + b_k)

where w_k is a random hyperplane and b_k a random threshold. Concatenating K such bits constructs a K-bit hash table

    H(x) = [h_1(x), ..., h_K(x)]

K should be large enough to reduce the size of each bucket. For random hyperplanes (with b_k = 0), the single-bit collision probability is

    Pr[ h_k(x_i) = h_k(x_j) ] = 1 - (1/pi) * arccos( x_i^T x_j / (||x_i|| ||x_j||) )

so the probability that all K bits collide decays exponentially with K.


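The random-projection scheme above can be sketched in a few lines of NumPy. This is a minimal illustration (function names and the toy data are my own, not from the paper); it draws K random hyperplanes and checks that a nearby point shares more hash bits than an unrelated one.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_lsh(dim, k):
    """Draw K random Gaussian hyperplanes; each one yields one hash bit."""
    w = rng.standard_normal((k, dim))
    return lambda x: (w @ x > 0).astype(np.uint8)  # K-bit binary code

h = make_lsh(dim=8, k=16)
x = rng.standard_normal(8)
x_near = x + 0.01 * rng.standard_normal(8)   # a slightly perturbed copy of x
x_far = rng.standard_normal(8)               # an unrelated point

# Nearby points collide on more bits than unrelated ones (with high probability).
agree_near = int((h(x) == h(x_near)).sum())
agree_far = int((h(x) == h(x_far)).sum())
```

Note that the guarantee is only probabilistic, which is why K and the number of hash tables have to be tuned in practice.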
2.SH
Due to the limitations of random-projection-based LSH, Spectral Hashing (SH) was proposed to design compact binary
codes for ANN search. SH minimizes the following objective:

    minimize   sum_ij W_ij * ||y_i - y_j||^2
    subject to y_i in {-1, 1}^K,  sum_i y_i = 0,  (1/n) * sum_i y_i y_i^T = I

where y_i = [h_1(x_i), ..., h_K(x_i)], k = 1...K, is the K-bit code of x_i and W_ij is the similarity between x_i and x_j.
This is an NP-hard problem; after relaxing the binary constraints, the optimization is solved using spectral graph analysis (eigenvectors of the graph Laplacian of W), which is presumably where the name "Spectral Hashing" comes from.
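The spectral relaxation can be sketched as follows: build a similarity graph W, take the bottom nontrivial eigenvectors of its Laplacian, and threshold them. This is only a sketch of the relaxation step (the function name and toy data are mine); the real SH additionally assumes a separable data distribution to encode out-of-sample points.

```python
import numpy as np

def spectral_codes(X, k, sigma=1.0):
    """Threshold the k bottom nontrivial Laplacian eigenvectors of a
    similarity graph over the rows of X (the relaxed SH solution)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))     # similarity graph W_ij
    D = np.diag(W.sum(1))
    L = D - W                            # graph Laplacian
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    Y = vecs[:, 1:k + 1]                 # skip the trivial constant eigenvector
    return (Y > 0).astype(np.uint8)      # binarize the relaxed solution

X = np.random.default_rng(1).standard_normal((20, 5))
codes = spectral_codes(X, k=4)           # one 4-bit code per row of X
```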

3.RBMs
Another approach learns deep belief networks by stacking Restricted Boltzmann Machines (RBMs) to obtain compact binary codes. A practical implementation of RBMs has two critical stages:
  
     (1) unsupervised pre-training
     (2) supervised fine-tuning

RBM-based binary encoding involves estimating a large number of weights, resulting in an extremely costly
training procedure and demanding sufficient training data for fine-tuning.

Semi-Supervised Hashing

Given a set of n points, X = {x_i}, i = 1...n, x_i ∈ R^D, in which a fraction of the pairs are associated with two categories of label information: M, the neighbor pairs (e.g. same label), and C, the non-neighbor pairs.


We want to maximize the empirical accuracy of the hash bits on the labeled pairs, roughly

    J(H) = sum_k [ sum_{(i,j) in M} h_k(x_i) h_k(x_j) - sum_{(i,j) in C} h_k(x_i) h_k(x_j) ]

i.e. each bit h_k should agree on neighbor pairs and disagree on non-neighbor pairs.

Relaxing Objective Function

Here a simple relaxation is used: we replace the sign of the projection with its signed magnitude, i.e. h_k(x) = sign(w_k^T x) becomes w_k^T x. With this relaxation, the new objective can be directly written as a function of W as

    J(W) = (1/2) * tr{ W^T X_l S X_l^T W }

where X_l is the matrix of labeled points and S is the pairwise label matrix (S_ij = 1 for (i,j) in M, S_ij = -1 for (i,j) in C, 0 otherwise).

Finally, under orthogonality constraints on W, we can get the optimal projections as the top K eigenvectors of the "adjusted covariance" matrix X_l S X_l^T (plus the regularization term the paper adds to keep the bits balanced).
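The relaxed solution above can be sketched with a plain eigendecomposition. This is a minimal sketch of that final step only (function names and the toy pairs are mine); it omits the regularization term the paper adds for bit balance.

```python
import numpy as np

def ssh_projections(X_l, pairs_M, pairs_C, k):
    """Relaxed SSH: build the pairwise label matrix S from neighbor pairs
    M and non-neighbor pairs C, then take the top-k eigenvectors of
    X_l S X_l^T as the projection directions W."""
    n = X_l.shape[1]                       # X_l is D x n, columns are labeled points
    S = np.zeros((n, n))
    for i, j in pairs_M:
        S[i, j] = S[j, i] = 1.0            # neighbor (same-label) pair
    for i, j in pairs_C:
        S[i, j] = S[j, i] = -1.0           # non-neighbor pair
    A = X_l @ S @ X_l.T                    # "adjusted covariance" matrix
    vals, vecs = np.linalg.eigh(A)         # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]            # top-k eigenvectors as columns of W

def hash_codes(W, X):
    return (W.T @ X > 0).astype(np.uint8)  # h_k(x) = sign(w_k^T x), as 0/1 bits

rng = np.random.default_rng(2)
X_l = rng.standard_normal((5, 8))          # 8 labeled points in R^5
W = ssh_projections(X_l, [(0, 1), (2, 3)], [(0, 4)], k=3)
codes = hash_codes(W, X_l)                 # one 3-bit code per labeled point
```

Because the solution is just an eigendecomposition of a D x D matrix, training is far cheaper than RBM-style learning.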

Experiments

The reported results are strong: SSH outperforms the other methods (LSH, SH, and RBM-based codes).


========================================
Comment

The paper proposes a smart and impressive method that takes advantage of both the supervised and unsupervised approaches to improve efficiency and accuracy.
It should become an important method in practical applications.
