Wednesday, March 28, 2012

Semi-Supervised Hashing for Scalable Image Retrieval

Semi-Supervised Hashing for Scalable Image Retrieval
Jun Wang, Columbia University
Sanjiv Kumar,Google Research
Shih-Fu Chang,Columbia University
CVPR, 2010
============================================
Hashing methods are popular because they allow approximate but highly efficient search. There are two existing families:

1.Unsupervised hashing
   Unsupervised hashing methods show good performance with metric distances but, in image
   search, semantic similarity is usually given in terms of labeled pairs of images.

2.Supervised hashing
   Supervised hashing methods can handle such semantic similarity, but they are prone to
   overfitting when labeled data is small or noisy.

These methods are also usually very slow to train. The paper proposes a semi-supervised hashing (SSH) method that addresses the problems mentioned above.

Background

1.LSH
A key ingredient of Locality Sensitive Hashing is mapping similar samples to the same bucket with high probability.

A typical category of LSH functions consists of random projections and thresholds:

    h(x) = sign(w^T x + b)

where w is a random hyperplane and b is a random threshold. Concatenating K such functions constructs a K-bit hash code,

    H(x) = [h_1(x), h_2(x), ..., h_K(x)]

K should be large enough to reduce the size of each bucket, and for random projections the collision probability is as follows:

    Pr[h(x_i) = h(x_j)] = 1 − (1/π) cos⁻¹( x_i^T x_j / (||x_i|| ||x_j||) )
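A minimal numpy sketch of this random-projection hashing (function name and shapes are my own):

    import numpy as np

    def lsh_hash(X, K, seed=0):
        # K-bit codes from h(x) = sign(w^T x + b), one column of W per bit
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(X.shape[1], K))     # random hyperplanes
        b = rng.normal(size=K)                   # random thresholds
        return (X @ W + b > 0).astype(np.uint8)  # n x K matrix of bits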


2.SH
Due to the limitations of random-projection-based LSH, SH was proposed to design compact binary
codes for ANN search. SH minimizes the following objective:

    min Σ_ij W_ij ||y_i − y_j||²
    s.t. y_i ∈ {−1, 1}^K, Σ_i y_i = 0, (1/n) Σ_i y_i y_i^T = I

where y_i is the K-bit code of x_i (k = 1...K) and W_ij is the similarity between x_i and x_j; the constraints force each bit to be balanced and the bits to be pairwise uncorrelated.
This is an NP-hard problem. After relaxing the binary constraints, the optimization is solved using spectral graph analysis (eigenvectors of the graph Laplacian), which is presumably where the name Spectral Hashing comes from.
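A sketch of that relaxed solution in numpy, assuming the similarity matrix fits in memory (this ignores SH's out-of-sample extension via Laplacian eigenfunctions):

    import numpy as np

    def spectral_codes(Wsim, K):
        # Wsim: n x n similarity matrix; returns n x K binary codes
        d = Wsim.sum(axis=1)
        L = np.diag(d) - Wsim              # unnormalized graph Laplacian
        vals, vecs = np.linalg.eigh(L)
        Y = vecs[:, 1:K + 1]               # smallest eigenvectors, skipping the trivial one
        return (Y > 0).astype(np.uint8)    # binarize the relaxed solution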

3.RBMs
Deep belief networks built by stacking Restricted Boltzmann Machines (RBMs) can also learn compact binary codes. A practical implementation of RBMs has two critical stages:
  
     (1) unsupervised pre-training
     (2) supervised finetuning.

RBM-based binary encoding involves estimating a large number of weights, which results in an extremely costly
training procedure and demands sufficient training data for fine-tuning.

Semi-Supervised Hashing

Given a set of n points X = {x_i}, i = 1...n, x_i ∈ R^D, in which a fraction of the pairs are associated with two categories of label information: a set M of neighbor pairs (e.g., images sharing a label) and a set C of non-neighbor pairs. With hash functions of the form h_k(x) = sign(w_k^T x), we want to maximize the empirical accuracy on the labeled pairs:

    J(W) = Σ_k [ Σ_{(x_i,x_j) ∈ M} h_k(x_i) h_k(x_j) − Σ_{(x_i,x_j) ∈ C} h_k(x_i) h_k(x_j) ]

Relaxing Objective Function

A simple relaxation is used here: replace the sign of the projection with its signed magnitude, h_k(x) ≈ w_k^T x.
With this relaxation, the new objective can be directly written as a function of W as

    J(W) = (1/2) tr{ W^T X_l S X_l^T W }

where X_l holds the labeled points and S encodes the pairwise labels (S_ij = 1 for pairs in M, −1 for pairs in C). Finally, adding a regularizer that maximizes the variance of the bits over all data, we can get

    J(W) = (1/2) tr{ W^T (X_l S X_l^T + η X X^T) W }

whose maximizer W consists of the top K eigenvectors of M = X_l S X_l^T + η X X^T.
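A minimal numpy sketch of that eigenvector solution (variable names are my own; columns of X and Xl are data points):

    import numpy as np

    def ssh_projections(X, Xl, S, K, eta=1.0):
        # W spans the top-K eigenvectors of M = Xl S Xl^T + eta X X^T
        mu = X.mean(axis=1, keepdims=True)
        Xc, Xlc = X - mu, Xl - mu          # center so the sign(w^T x) bits are balanced
        M = Xlc @ S @ Xlc.T + eta * (Xc @ Xc.T)
        vals, vecs = np.linalg.eigh(M)
        W = vecs[:, np.argsort(vals)[::-1][:K]]
        return W                           # codes are sign(W.T @ (x - mu))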

Experiments

From the reported results, SSH clearly outperforms the other methods discussed above.


========================================
Comment

The paper proposes a smart and impressive way to take advantage of both the supervised and the unsupervised approaches, improving efficiency and accuracy at the same time.
It should be an important method in practical applications.

Wednesday, March 21, 2012

Online dictionary learning for sparse coding

Online dictionary learning for sparse coding, Mairal et al.
ICML 2009
=============================================
Sparse coding is a method that models data vectors as sparse linear combinations of basis elements; it is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on learning the basis set, also called the dictionary, to adapt it to specific data. It proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples.

Contribution:
1.optimization of a smooth nonconvex objective function over a convex set
2.iterative online algorithm that solves this problem by efficiently minimizing at each step a
   quadratic surrogate function of the empirical cost over the set of constraints.
3.significantly faster than previous approaches to dictionary learning on both small and large datasets  
   of natural images

1.Problem Statement
2.Online Dictionary Learning:

1.Problem Statement

We define l(x, D) as the optimal value of the ℓ1-sparse coding problem:

    l(x, D) = min_α (1/2)||x − Dα||₂² + λ||α||₁

The empirical cost function averages this loss over the training samples:

    f_n(D) = (1/n) Σ_{i=1}^n l(x_i, D)

To keep the dictionary atoms from growing arbitrarily large, define a convex set of admissible dictionaries:

    C = { D ∈ R^(m×k) : ||d_j||₂ ≤ 1 for j = 1...k }

With the sparse coefficients fixed, minimizing over D ∈ C becomes a convex optimization problem; the online algorithm exploits this by minimizing a quadratic surrogate of the empirical cost over C at each step.
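A sketch of the inner lasso problem in plain numpy, using ISTA as a simple stand-in solver (the paper itself uses LARS-Lasso, as noted below):

    import numpy as np

    def ista(x, D, lam, n_iter=100):
        # min_a 0.5*||x - D a||^2 + lam*||a||_1 by proximal gradient descent
        L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            a -= D.T @ (D @ a - x) / L     # gradient step on the smooth part
            a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
        return a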


2.Online Dictionary Learning:

The sparse coding step is computed with a Cholesky-based implementation of the LARS-Lasso algorithm, which provides the whole regularization path.
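The main online loop can then be sketched as follows (a minimal reading of the paper's Algorithms 1 and 2, reusing the ista() helper above; not the authors' implementation):

    def online_dictionary_learning(X, k, lam=0.1):
        # columns of X are training samples; returns an m x k dictionary
        m, n = X.shape
        rng = np.random.default_rng(0)
        D = X[:, rng.choice(n, k, replace=False)].astype(float)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
        A = np.zeros((k, k))               # accumulates a a^T
        B = np.zeros((m, k))               # accumulates x a^T
        for i in range(n):
            x = X[:, i]
            a = ista(x, D, lam)            # sparse coding step
            A += np.outer(a, a)
            B += np.outer(x, a)
            for j in range(k):             # block coordinate dictionary update
                if A[j, j] < 1e-12:
                    continue
                u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)  # project onto ||d_j|| <= 1
        return D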

Optimizing the Algorithm

1.Handling Fixed-Size Datasets
When cycling several times over a fixed-size training set, it is impossible to store all the past coefficients; but since the same sample reappears, its old contribution to the accumulated statistics can be removed before adding the updated one.

2.Mini-Batch Extension
Speed up by using multiple signals at a time rather than only one.

3.Purging the Dictionary from Unused Atoms
Replace the dictionary atoms which are seldom used.

The authors also give a convergence proof, which is beyond my comprehension.
=====================================================
Pros and Cons:

Pros:
1. This is extraordinary work in the sparse coding area; it gives us a good, efficient way to apply sparse coding to real-world problems.

Cons:
1. The proofs are difficult to understand; it would be better to give more details, or even a tutorial, for the reader.

Wednesday, March 14, 2012

Aggregating local descriptors into a compact image representation

Aggregating local descriptors into a compact image representation,
Herve Jegou, et al.,
Proc. IEEE CVPR'10
================================================
The goal of this paper is to solve the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory
usage of the representation.

There are three main steps:
1. the representation, i.e., how to aggregate local image descriptors into a vector representation;
2. the dimensionality reduction of these vectors;
3. the indexing algorithm.

1. the representation

The paper proposes a descriptor, derived from both BOF and the Fisher kernel, that aggregates SIFT descriptors into a compact representation called VLAD (vector of locally aggregated descriptors). Given a codebook {c_1, ..., c_k} learned by k-means, each local descriptor x is assigned to its nearest centroid, and the residuals are accumulated per centroid:

    v_i = Σ_{x : NN(x) = c_i} (x − c_i)

The concatenated vector v = [v_1, ..., v_k] is subsequently L2-normalized: v := v/||v||₂.
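A compact numpy sketch of this aggregation (function and variable names are mine):

    import numpy as np

    def vlad(descriptors, centroids):
        # descriptors: (n, d) local SIFT vectors; centroids: (k, d) codebook
        k, dim = centroids.shape
        v = np.zeros((k, dim))
        d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        nn = d2.argmin(1)                  # nearest centroid per descriptor
        for i, x in zip(nn, descriptors):
            v[i] += x - centroids[i]       # accumulate residuals
        v = v.ravel()
        return v / max(np.linalg.norm(v), 1e-12)   # v := v / ||v||_2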


Then we need to transform an image vector into a code. The coding has the property that the nearest neighbors of a (non-encoded) query vector can be efficiently searched in a set of n encoded database vectors.

The ADC (asymmetric distance computation) approach is used for approximate nearest-neighbor search: database vectors are encoded with a product quantizer q while the query x stays un-encoded, and distances are estimated as d(x, q(y)).
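A sketch of the ADC lookup with a product quantizer, assuming precomputed sub-codebooks (shapes and names are my own):

    import numpy as np

    def adc_search(query, codes, codebooks):
        # codebooks: (m, ks, d_sub) centroids; codes: (n, m) centroid ids
        m, ks, d_sub = codebooks.shape
        # one table of query-to-centroid distances per subquantizer
        tables = np.stack([
            ((codebooks[j] - query[j*d_sub:(j+1)*d_sub]) ** 2).sum(axis=1)
            for j in range(m)
        ])
        # approximate squared distance = sum of per-subvector table entries
        dists = tables[np.arange(m), codes].sum(axis=1)
        return np.argsort(dists)           # database ids, nearest first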



2. the dimensionality reduction

Dimensionality reduction is an important step in approximate nearest neighbor search, as it impacts the subsequent indexing method. The standard tool for dimensionality reduction, principal component analysis (PCA), is used.
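A minimal PCA projection in numpy (the D′ notation follows the paper; the helper itself is mine):

    import numpy as np

    def pca_reduce(V, d_prime):
        # rows of V are VLAD vectors; project onto the top d' principal axes
        mu = V.mean(axis=0)
        U, s, Wt = np.linalg.svd(V - mu, full_matrices=False)
        return (V - mu) @ Wt[:d_prime].T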



3. the indexing algorithm.

In the end, the paper focuses on the joint optimization of reduction and indexing: choosing the reduced dimension
D′ under a fixed budget of B bits for representing the D-dimensional VLAD vector x, for instance B = 128 (16 bytes).


 =============================================
Comments:

Pros:
1. The paper is clear and cites much related work for reference and comparison.
2. It combines and modifies existing methods, and really achieves good performance.

Cons:
1. It mentions the Fisher kernel, but I can't find a clear explanation of its relation to the proposed method.

Thursday, March 8, 2012

Distinctive Image Features from Scale-Invariant Keypoints

Distinctive Image Features from Scale-Invariant Keypoints
David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., Canada
lowe@cs.ubc.ca
January 5, 2004
===============================================

This paper presents a method for extracting distinctive invariant features from
images that can be used to perform reliable matching between different views of
an object or scene.

There are four main steps in the feature extraction:

1. Scale-space extrema detection:
It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation. The formula is as follows:

    D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)

where L(x, y, σ) = G(x, y, σ) * I(x, y) is the input image convolved with a Gaussian of scale σ.

(Figure in the paper: adjacent levels of the Gaussian pyramid are subtracted, octave by octave, to produce the difference-of-Gaussian images.)
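A one-octave sketch of the DoG computation with scipy (the default σ = 1.6 follows the paper; the function itself is mine):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_pyramid(image, sigma=1.6, k=2 ** 0.5, n_scales=5):
        # L(x, y, s) for a ladder of scales, then D = L(k*s) - L(s)
        blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
                   for i in range(n_scales)]
        return [blurred[i + 1] - blurred[i] for i in range(n_scales - 1)]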


2. Keypoint localization:
At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability; unstable interest points (low contrast, or poorly localized along an edge) are deleted from the candidate list.


3. Orientation assignment:
One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations. The gradient magnitude and orientation are calculated as follows:

    m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )
    θ(x, y) = tan⁻¹( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )
The orientation histogram uses 36 bins to quantize the 360° range of orientations.
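A numpy sketch of the gradient computation and the 36-bin, magnitude-weighted histogram (omitting the Gaussian window of the real method for simplicity):

    import numpy as np

    def orientation_histogram(L, n_bins=36):
        # central differences matching the formulas above
        dx = L[1:-1, 2:] - L[1:-1, :-2]
        dy = L[2:, 1:-1] - L[:-2, 1:-1]
        mag = np.sqrt(dx ** 2 + dy ** 2)
        theta = np.arctan2(dy, dx)                       # in (-pi, pi]
        bins = ((theta + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
        return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)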

4. Keypoint descriptor:
The local image gradients are measured at the selected scale in the region around each keypoint. After
calculating the orientations and magnitudes, they are accumulated into orientation histograms summarizing the contents over 4x4 subregions, as the paper's figure shows.


These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
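A toy sketch of the 4x4-subregion x 8-bin accumulation (4 * 4 * 8 = 128 dimensions); the real descriptor also applies Gaussian weighting, trilinear interpolation, and clipping:

    import numpy as np

    def sift_descriptor(patch):
        # e.g. a 16x16 patch around the keypoint -> 4x4 cells of 4x4 pixels
        d = patch.shape[0] // 4
        desc = []
        for i in range(4):
            for j in range(4):
                cell = patch[i*d:(i+1)*d, j*d:(j+1)*d]
                dy, dx = np.gradient(cell.astype(float))
                theta = np.arctan2(dy, dx)
                mag = np.hypot(dx, dy)
                bins = ((theta + np.pi) / (2 * np.pi) * 8).astype(int) % 8
                desc.append(np.bincount(bins.ravel(), weights=mag.ravel(), minlength=8))
        v = np.concatenate(desc)                  # 128-dim vector
        return v / max(np.linalg.norm(v), 1e-12)  # normalize against illumination change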
====================================================
Comment
SIFT is a really powerful, state-of-the-art descriptor for handling scaling and rotation problems. It is a local feature with advantages that global features like color histograms don't have. However, SIFT does not use any color information anywhere in the paper.

Efficient Visual Search of Videos

"Efficient visual search of videos cast as text retrieval," J. Sivic, and A. Zisserman, IEEE TPAMI, 2009
===================================================
The goal of this paper is to retrieve key frames and shots of a video containing a particular object by employing a text retrieval approach.

The following is the summary of the paper:
1.FIND THE VIEWPOINT INVARIANT DESCRIPTION
2.BUILDING A VISUAL VOCABULARY
3.VISUAL INDEXING USING TEXT RETRIEVAL METHODS

Details:

1.FIND THE VIEWPOINT INVARIANT DESCRIPTION

The goal is to extract a description of an object from an image, which will be largely unaffected by a change in camera viewpoint (viewpoint invariant), the object’s scale, and scene illumination and will also be robust to some amount of partial occlusion.

There are two types of affine covariant regions, Shape Adapted (SA) and Maximally Stable (MS). The SA regions tend to be centered on corner-like features and the MS regions correspond to blobs of high contrast with respect to their surroundings such as a dark window on a gray wall. Both types of regions are represented by ellipses.

The 128-dimensional SIFT descriptor is used to represent each elliptical affine covariant region. Combining the SIFT descriptor with affine covariant regions gives region description vectors which are invariant to affine transformations of the image.

2.BUILDING A VISUAL VOCABULARY

The objective is to vector-quantize the descriptors into clusters, so that they can be treated as visual “words” and the methods of text retrieval can be employed. The vector quantization is carried out by K-means clustering, sketched below.
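A toy k-means quantizer in numpy (the original work clusters huge numbers of descriptors with far more care; names are mine):

    import numpy as np

    def build_vocabulary(descriptors, k, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
        for _ in range(n_iter):
            d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)          # nearest center per descriptor
            for j in range(k):
                pts = descriptors[labels == j]
                if len(pts):
                    centers[j] = pts.mean(0)
        return centers

    def quantize(descriptors, centers):
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return d2.argmin(1)                # visual-word id for each descriptor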

3.VISUAL INDEXING USING TEXT RETRIEVAL METHODS

Once we have the vocabulary from the step above, we can run text-search-style retrieval on the movie,
utilizing methods like a stop list of the most common visual words and tf-idf weighting to do the ranking.
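A sketch of the standard tf-idf weighting applied to bags of visual words (one list of word ids per key frame; frames are then ranked by normalized dot product with the query vector):

    import numpy as np

    def tfidf_vectors(word_ids_per_frame, vocab_size):
        N = len(word_ids_per_frame)
        tf = np.zeros((N, vocab_size))
        for f, ids in enumerate(word_ids_per_frame):
            for w in ids:
                tf[f, w] += 1
            tf[f] /= max(len(ids), 1)                  # n_id / n_d
        df = (tf > 0).sum(0)                           # frames containing each word
        idf = np.log(N / np.maximum(df, 1))
        v = tf * idf
        norms = np.linalg.norm(v, axis=1, keepdims=True)
        return v / np.maximum(norms, 1e-12)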

=============================================
Comment:

Pros
1. It is a good idea to handle video with the well-developed methods used on text.
2. The method proposed in this paper is fast and efficient.

Cons
1. The test data was too small.
2. There is no clear analysis and description of all the methods used in the paper.