"Efficient visual search of videos cast as text retrieval," J. Sivic, and A. Zisserman, IEEE TPAMI, 2009
===================================================
The goal of this paper is to retrieve the key frames and shots of a video that contain a particular object by employing a text retrieval approach.
The following is the summary of the paper:
1.FIND THE VIEWPOINT INVARIANT DESCRIPTION
2.BUILDING A VISUAL VOCABULARY
3.VISUAL INDEXING USING TEXT RETRIEVAL METHODS
Details:
1.FIND THE VIEWPOINT INVARIANT DESCRIPTION
The goal is to extract a description of an object from an image, which will be largely unaffected by a change in camera viewpoint (viewpoint invariant), the object’s scale, and scene illumination and will also be robust to some amount of partial occlusion.
There are two types of affine covariant regions, Shape Adapted (SA) and Maximally Stable (MS). The SA regions tend to be centered on corner-like features and the MS regions correspond to blobs of high contrast with respect to their surroundings such as a dark window on a gray wall. Both types of regions are represented by ellipses.
Each elliptical affine covariant region is represented by a 128-dimensional SIFT descriptor. Combining the SIFT descriptor with affine covariant regions gives region description vectors that are invariant to affine transformations of the image.
2.BUILDING A VISUAL VOCABULARY
The objective is to vector quantize the descriptors into clusters that act as visual “words”, so that methods from text retrieval can be employed. The vector quantization is carried out by K-means clustering.
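The clustering step can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the function names (`build_vocabulary`, `nearest_word`), the plain Euclidean distance, the random initialization, and the toy 2-D "descriptors" are all assumptions made for illustration. Real descriptors would be 128-dimensional SIFT vectors and the vocabulary would be far larger.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance between two descriptors (an assumed metric).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, centers):
    # Index of the closest cluster center, i.e. the visual word id.
    return min(range(len(centers)),
               key=lambda k: squared_distance(descriptor, centers[k]))

def build_vocabulary(descriptors, num_words, iterations=20, seed=0):
    # Standard K-means (Lloyd's algorithm): assign, then recompute means.
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[k] = [sum(m[i] for m in members) / len(members)
                              for i in range(dim)]
    return centers

# Toy 2-D descriptors forming two well-separated groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
vocabulary = build_vocabulary(descriptors, num_words=2)

# Quantization: every descriptor is replaced by its visual word id.
words = [nearest_word(d, vocabulary) for d in descriptors]
```

After this step, each frame can be treated as a "document" made of the word ids of its regions, which is what makes the text retrieval machinery applicable.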
3.VISUAL INDEXING USING TEXT RETRIEVAL METHODS
Once the vocabulary from the previous step is available, the video can be searched with the same kind of methods used for text search.
These methods include a stop word list, tf-idf weighting, and ranking of the retrieved frames.
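As a rough sketch of how these pieces fit together, the following plain Python example builds an inverted index over frames, drops stop-listed words, weights the rest by tf-idf, and ranks frames by the cosine similarity of their tf-idf vectors. The names `build_index` and `rank_frames`, the toy frames, and the choice of stop word are hypothetical and only for illustration. The way the stop list is chosen and any further re-ranking are left out.

```python
import math
from collections import Counter, defaultdict

def build_index(frames, stop_words=frozenset()):
    # frames: dict mapping frame id -> list of visual word ids.
    num_frames = len(frames)
    filtered = {f: [w for w in ws if w not in stop_words]
                for f, ws in frames.items()}
    # Document frequency: in how many frames each word occurs.
    doc_freq = Counter()
    for ws in filtered.values():
        doc_freq.update(set(ws))
    idf = {w: math.log(num_frames / n) for w, n in doc_freq.items()}
    inverted = defaultdict(dict)  # word id -> {frame id: tf-idf weight}
    norms = {}
    for f, ws in filtered.items():
        counts = Counter(ws)
        weights = {w: (c / len(ws)) * idf[w] for w, c in counts.items()}
        norms[f] = math.sqrt(sum(v * v for v in weights.values()))
        for w, v in weights.items():
            inverted[w][f] = v
    return inverted, idf, norms

def rank_frames(query_words, inverted, idf, norms):
    # Stop-listed and unseen words are not in idf, so they are skipped.
    counts = Counter(w for w in query_words if w in idf)
    total = sum(counts.values())
    if total == 0:
        return []
    query = {w: (c / total) * idf[w] for w, c in counts.items()}
    query_norm = math.sqrt(sum(v * v for v in query.values()))
    scores = defaultdict(float)
    for w, qv in query.items():
        # Only frames that contain the word are visited (inverted file).
        for f, v in inverted[w].items():
            scores[f] += qv * v
    ranked = [(f, s / (query_norm * norms[f]))
              for f, s in scores.items()
              if query_norm > 0 and norms[f] > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Toy frames; word 7 occurs everywhere and is treated as a stop word.
frames = {"f1": [1, 2, 2, 7], "f2": [3, 4, 7], "f3": [2, 5, 7, 7]}
inverted, idf, norms = build_index(frames, stop_words={7})
ranked = rank_frames([1, 2], inverted, idf, norms)
```

In this toy example the frame sharing both query words is ranked above the frame sharing only one, and the frame sharing none is never visited, which is the efficiency argument for using an inverted file.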
=============================================
Comment:
Pros
1. Handling video with methods from text retrieval is a good idea, since those methods are already well developed.
2. The method proposed in this paper is fast and efficient.
Cons
1. The test data set was too small.
2. The paper gives no clear analysis and description of all the methods it uses.