
Latent Dirichlet Allocation for Automatic Document Categorization

Machine Learning and Knowledge Discovery in Databases (2009)

Abstract

In this paper we introduce and evaluate a technique for applying latent Dirichlet allocation (LDA) to supervised semantic categorization of documents. In our setup, each category is assigned its own collection of topics, and for a labeled training document only topics from its category are sampled. Thus, compared to classical LDA, which processes the entire corpus at once, we essentially build a separate LDA model for each category with category-specific topics, and then put these topic collections together to form a unified LDA model. For an unseen document, the inferred topic distribution gives an estimate of how well the document fits into each category.
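The abstract describes the method only in prose. The following is a minimal Python sketch of the idea, under assumptions the abstract does not state: collapsed Gibbs sampling is used for both training and inference, every category gets the same number of topics (topic k belongs to category k // topics_per_cat), documents are lists of integer word ids, and a category's score is the sum of the inferred topic proportions over that category's topics. The function names train_category_lda and category_scores and all hyperparameter defaults are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of category-restricted LDA; not the authors' code.
# Assumes collapsed Gibbs sampling and a fixed number of topics per category.
import random


def train_category_lda(docs, labels, categories, topics_per_cat, vocab_size,
                       alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling in which each labeled document may only use
    the topics of its own category (topic k belongs to category
    k // topics_per_cat). Returns phi, the topic-word distributions of the
    unified model, one row per topic across all categories."""
    rng = random.Random(seed)
    n_topics = len(categories) * topics_per_cat
    cat_index = {c: i for i, c in enumerate(categories)}
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # total words per topic
    ndk = [[0] * n_topics for _ in docs]               # document-topic counts
    z, allowed = [], []

    # Random initialisation restricted to each document's category topics.
    for d, (doc, label) in enumerate(zip(docs, labels)):
        start = cat_index[label] * topics_per_cat
        topics = list(range(start, start + topics_per_cat))
        allowed.append(topics)
        zd = []
        for w in doc:
            k = rng.choice(topics)
            zd.append(k)
            nkw[k][w] += 1
            nk[k] += 1
            ndk[d][k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment from the counts.
                nkw[k][w] -= 1
                nk[k] -= 1
                ndk[d][k] -= 1
                # Usual collapsed Gibbs conditional, but only over allowed topics.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in allowed[d]]
                k = rng.choices(allowed[d], weights=weights)[0]
                z[d][i] = k
                nkw[k][w] += 1
                nk[k] += 1
                ndk[d][k] += 1

    return [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta)
             for w in range(vocab_size)] for k in range(n_topics)]


def category_scores(doc, phi, categories, topics_per_cat, alpha=0.1,
                    iters=100, seed=0):
    """Infer theta for an unseen document over ALL topics with phi held fixed,
    averaging over the second half of the sweeps, then sum theta over each
    category's topics to score that category."""
    rng = random.Random(seed)
    n_topics = len(phi)
    topics = list(range(n_topics))
    z = [rng.randrange(n_topics) for _ in doc]
    ndk = [0] * n_topics
    for k in z:
        ndk[k] += 1
    theta_sum = [0.0] * n_topics
    burn_in = iters // 2
    for it in range(iters):
        for i, w in enumerate(doc):
            ndk[z[i]] -= 1
            weights = [(ndk[k] + alpha) * phi[k][w] for k in topics]
            z[i] = rng.choices(topics, weights=weights)[0]
            ndk[z[i]] += 1
        if it >= burn_in:
            for k in topics:
                theta_sum[k] += (ndk[k] + alpha) / (len(doc) + n_topics * alpha)
    theta = [s / (iters - burn_in) for s in theta_sum]
    return {c: sum(theta[i * topics_per_cat:(i + 1) * topics_per_cat])
            for i, c in enumerate(categories)}


if __name__ == "__main__":
    # Toy data: words 0-2 are "sports" vocabulary, words 3-5 "politics".
    docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
    labels = ["sports", "sports", "politics", "politics"]
    cats = ["sports", "politics"]
    phi = train_category_lda(docs, labels, cats, topics_per_cat=2, vocab_size=6)
    print(category_scores([0, 1, 2, 1], phi, cats, topics_per_cat=2))
```

During training, restricting the sampler to allowed[d] is the only change from ordinary collapsed Gibbs LDA. At inference time all topics compete, so the inferred topic proportions themselves decide which category's topics best explain the document. On toy data this small, the scores can vary with the random seed.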

Links and Resources

BibTeX key: istván2009latent
