Identifying the Original Contribution of a Document via Language Modeling

Machine Learning and Knowledge Discovery in Databases (2009)

Abstract

One major goal of text mining is to provide automatic methods to help humans grasp the key ideas in ever-increasing text corpora. To this effect, we propose a statistically well-founded method for identifying the original ideas that a document contributes to a corpus, focusing on self-referential diachronic corpora such as research publications, blogs, email, and news articles. Our statistical model of passage impact defines (interesting) original content through a combination of impact and novelty, and the model is used to identify each document’s most original passages. Unlike heuristic approaches, the statistical model is extensible and open to analysis. We evaluate the approach both on synthetic data and on real data in the domains of research publications and news, showing that the passage impact model outperforms a heuristic baseline method.

Links and resources

BibTeX key: benyah2009identifying
