Journal of Machine Learning Research, 2003
=====================================
Latent Dirichlet allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model in which each item of a collection is modeled as a finite mixture over an underlying set of topics. It aims to solve problems that pLSA does not handle well, such as assigning probabilities to unseen documents.
The graphical-model structure of LDA is shown below:
alpha is the parameter of the Dirichlet prior over per-document topic mixtures; beta is a k×V word-probability matrix, where k is the number of topics and V is the vocabulary size.
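The generative process these parameters define can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's code; the values of k, V, and alpha are made-up assumptions.

```python
# Minimal sketch of LDA's generative process (toy values for illustration).
import numpy as np

rng = np.random.default_rng(0)

k, V = 3, 5                               # number of topics, vocabulary size
alpha = np.full(k, 0.5)                   # Dirichlet prior over topic mixtures
beta = rng.dirichlet(np.ones(V), size=k)  # k x V word-probability matrix

def generate_document(n_words):
    theta = rng.dirichlet(alpha)          # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(k, p=theta)        # choose a topic for this word
        w = rng.choice(V, p=beta[z])      # choose a word from that topic
        words.append(w)
    return words

doc = generate_document(10)
```

Each document gets its own theta drawn from the Dirichlet, which is exactly what distinguishes LDA from pLSA's per-training-document mixtures.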
For each word, summing over the k topics gives how likely that word is under the combination of topics, with each topic's word probability weighted by the probability of that topic in the document.
The full model then takes the following form, giving the distribution of a document:
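In the paper's notation, the marginal distribution of a document w = (w_1, ..., w_N) integrates over the topic mixture theta and sums over the per-word topic assignments z_n:

\[
p(\mathbf{w} \mid \alpha, \beta)
= \int p(\theta \mid \alpha)
\left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta
\]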
The figures below illustrate how LDA uses the topic simplex to generate words:
The following shows the difference between pLSA and LDA:
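The contrast can be made concrete by writing out how each model produces a word. In pLSA, the topic mixture p(z | d) is a separate parameter for each training document d:

\[
\text{pLSA:}\quad p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)
\]

so the number of parameters grows linearly with the corpus, and there is no natural way to assign a mixture to an unseen document. LDA instead draws the mixture from a Dirichlet prior, theta ~ Dir(alpha), so any new document simply gets its own theta from the same prior.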
========================================================
Comment:
LDA elegantly solves the problems of pLSA and appears to outperform it. Compared to pLSA, however, its model is more complicated and harder to understand, and the amount of computation required for inference is another significant issue.