Wednesday, April 25, 2012

Nonlinear Dimensionality Reduction by Locally Linear Embedding

Nonlinear Dimensionality Reduction by Locally Linear Embedding
Sam T. Roweis and Lawrence K. Saul
Science, 2000
======================================================

Locally linear embedding (LLE) is an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs.

The figure below is a toy example which shows the dimension reduction from a 3-D input space to a 2-D manifold subspace. The local structure (neighborhood distances) of the data is preserved through the reduction.
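To make the toy example concrete, here is a minimal sketch using scikit-learn's LocallyLinearEmbedding on a synthetic swiss roll; the data set and parameter values (number of neighbors, sample size) are illustrative choices, not taken from the paper.

```python
# A minimal sketch of the 3-D swiss-roll -> 2-D example, using scikit-learn.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Sample points from a 2-D manifold (the "swiss roll") embedded in 3-D space.
X, color = make_swiss_roll(n_samples=1500, random_state=0)

# LLE with K nearest neighbors, embedding into 2 dimensions.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)      # Y has shape (1500, 2)

print(Y.shape)                # neighborhood structure of X is preserved in Y
```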

The local geometry of these patches is characterized by linear coefficients that reconstruct each data point from its neighbors. Reconstruction errors are measured by the cost function

E(W) = Σ_i | x_i − Σ_j W_ij x_j |^2

where the weight W_ij summarizes the contribution of the j-th point to the reconstruction of the i-th point, subject to the constraint Σ_j W_ij = 1.
Below is the illustration of the reconstruction: for a given point, select its neighbors, reconstruct the point with linear weights, and then preserve this property in the embedded coordinates by finding low-dimensional points y_i that minimize the embedding cost Φ(Y) = Σ_i | y_i − Σ_j W_ij y_j |^2 with the weights W_ij held fixed.
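As a rough sketch of the weight-solving step under the assumptions above (neighbors already chosen, weights summing to one), the least-squares problem for a single point can be solved through the local Gram matrix; the function name and the regularizer below are our own choices, not from the paper.

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Solve for weights (summing to 1) that best reconstruct x from its neighbors."""
    Z = neighbors - x                              # center the neighbors on x
    G = Z @ Z.T                                    # local K x K Gram matrix
    G = G + reg * np.trace(G) * np.eye(len(G))     # regularize in case G is singular
    w = np.linalg.solve(G, np.ones(len(G)))        # solve G w = 1
    return w / w.sum()                             # enforce the sum-to-one constraint

# Toy usage: one 3-D point and four hypothetical neighbors.
rng = np.random.default_rng(0)
x = rng.random(3)
neighbors = rng.random((4, 3))
w = reconstruction_weights(x, neighbors)
print(w, w.sum())                                  # weights sum to 1
```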



A common example is applying LLE to face images.

==================================================
Summary and Comment:

The main goal of LLE is neighborhood preservation, which is based on fundamental manifold properties: the local geometry around each sample and its neighbors is preserved. The advantage is that it ignores the large-scale global geometry, which can be more efficient than computing all pairwise distances.

But like many other manifold learning methods, it must assume that the data are well sampled and sufficient, and that each sample and its neighbors lie on or close to a locally linear patch (sub-plane) of the manifold. This assumption may be too strong and prevents the method from being applied to many problems in general.

Wednesday, April 11, 2012

Probabilistic latent semantic indexing

Probabilistic latent semantic indexing
T. Hofmann
SIGIR, 1999.
==================================================
This paper presents a statistical view of LSA which leads to a new model called Probabilistic Latent Semantic Analysis (PLSA). In contrast to standard LSA, the probabilistic variant has a sound statistical foundation and defines a proper generative model of the data.
According to Bayes' rule, we can get the following equations, where d is a document, w is a word, and z is a hidden topic:

P(d, w) = P(d) P(w|d),  with  P(w|d) = Σ_z P(w|z) P(z|d)

or equivalently P(d, w) = Σ_z P(z) P(d|z) P(w|z).

We want to minimize the error distance (measured by KL divergence) between the original data and the data reconstructed from the latent classes. This converts the original question into maximizing the following log-likelihood function:

L = Σ_d Σ_w n(d, w) log P(d, w)

where n(d, w) is the count of word w in document d.

Use the EM algorithm to solve the optimization problem.

E-step:

P(z|d, w) = P(z) P(d|z) P(w|z) / Σ_z' P(z') P(d|z') P(w|z')

M-step:

P(w|z) ∝ Σ_d n(d, w) P(z|d, w)
P(d|z) ∝ Σ_w n(d, w) P(z|d, w)
P(z) ∝ Σ_d Σ_w n(d, w) P(z|d, w)
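A compact numpy sketch of the EM updates above, written against a document-word count matrix n(d, w); the function name, initialization, and toy data are our own and not from the paper.

```python
import numpy as np

def plsa_em(n_dw, n_topics, n_iters=100, seed=0):
    """Fit PLSA by EM on a (D, W) document-word count matrix n_dw (a rough sketch)."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    # Random initialization of P(z), P(d|z), P(w|z), each normalized appropriately.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, D)); p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: P(z|d,w) proportional to P(z) P(d|z) P(w|z).
        p_z_dw = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        p_z_dw /= p_z_dw.sum(axis=0, keepdims=True) + 1e-12        # shape (Z, D, W)

        # M-step: re-estimate parameters from expected counts n(d,w) P(z|d,w).
        weighted = n_dw[None, :, :] * p_z_dw                       # shape (Z, D, W)
        p_w_z = weighted.sum(axis=1)                                # sum over documents
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_d_z = weighted.sum(axis=2)                                # sum over words
        p_d_z /= p_d_z.sum(axis=1, keepdims=True) + 1e-12
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum()

    return p_z, p_d_z, p_w_z

# Toy usage on a random count matrix: 20 documents, 50 words, 3 hidden topics.
counts = np.random.default_rng(1).integers(0, 5, size=(20, 50))
p_z, p_d_z, p_w_z = plsa_em(counts, n_topics=3)
print(p_w_z.shape)    # (3, 50): one P(w|z) distribution per hidden topic
```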

Note that the result of the traditional EM method depends heavily on the initial probabilities, so the author proposed tempered EM (TEM), which is derived from deterministic annealing. The formula in the E-step then becomes

P(z|d, w) ∝ P(z) [P(d|z) P(w|z)]^β

where β ≤ 1 is a control parameter that is gradually lowered based on held-out performance.
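Assuming the plsa_em sketch above, the tempered E-step would only change the posterior computation, roughly as follows; the value of β here is purely illustrative.

```python
import numpy as np

def tempered_e_step(p_z, p_d_z, p_w_z, beta=0.8):
    """Tempered posterior P(z|d,w): raise the likelihood part to the power beta.

    beta is illustrative; with beta = 1 this reduces to the ordinary E-step.
    """
    p_z_dw = p_z[:, None, None] * (p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
    return p_z_dw / (p_z_dw.sum(axis=0, keepdims=True) + 1e-12)
```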



The following is the illustration (the physical meaning) of the hidden topics.



P(w|z1), P(w|z2), ..., P(w|zn) can be viewed as basis vectors for representing a document. The reconstructed document must lie on the simplex spanned by these vectors, because P(w|d) is a convex combination of them.
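A tiny numerical illustration of this convexity argument (the numbers are made up): mixing the topic distributions with non-negative weights that sum to one keeps the reconstructed word distribution on the same simplex.

```python
import numpy as np

p_w_z = np.array([[0.7, 0.2, 0.1],    # P(w|z1)
                  [0.1, 0.3, 0.6]])   # P(w|z2)
p_z_d = np.array([0.25, 0.75])        # P(z|d): non-negative, sums to 1

p_w_d = p_z_d @ p_w_z                 # P(w|d) = sum_z P(w|z) P(z|d)
print(p_w_d, p_w_d.sum())             # a valid distribution on the same simplex
```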

============================================

Comment:

The physical meaning of this paper is strong and easy to understand. It is brilliant to introduce the idea of hidden topics and to build it on a probabilistic model. However, the paper does not give a good way to deal with unseen data.