Improving Text Classification by Using Encyclopedia Knowledge.
In:
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, pp. 332-341.
2007.
Pu Wang, Jian Hu, Hua-Jun Zeng, Lijun Chen and Zheng Chen.
Abstract:
The exponential growth of text documents available on the Internet has created an urgent need for accurate, fast, and general-purpose text classification algorithms. However, the "bag of words" representation used for these classification methods is often unsatisfactory, as it ignores relationships between important terms that do not co-occur literally. To deal with this problem, we integrate background knowledge (in our application, Wikipedia) into the process of classifying text documents. The experimental evaluation on Reuters newsfeeds and several other corpora shows that our classification results with encyclopedia knowledge are much better than those of the baseline "bag of words" methods.
An Ontology-based Framework for Text Mining.
LDV Forum - GLDV Journal for Computational Linguistics and Language Technology, 20(1):87-112, 2005.
Stephan Bloehdorn, Philipp Cimiano, Andreas Hotho and Steffen Staab.
Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis.
Journal of Artificial Intelligence Research, 24:305-339, 2005.
Philipp Cimiano, Andreas Hotho and Steffen Staab.
A Brief Survey of Text Mining.
LDV Forum - GLDV Journal for Computational Linguistics and Language Technology, 20(1):19-62, 2005.
Andreas Hotho, Andreas Nürnberger and Gerhard Paaß.
Boosting for Text Classification with Semantic Features.
In:
Proceedings of the MSW 2004 workshop at the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 70-87.
2004.
Stephan Bloehdorn and Andreas Hotho.
Text Classification by Boosting Weak Learners based on Terms and Concepts.
In:
Proceedings of the Fourth IEEE International Conference on Data Mining, pp. 331-334.
IEEE Computer Society Press, 2004.
Stephan Bloehdorn and Andreas Hotho.
Clustering Ontologies from Text.
In:
Proceedings of the Conference on Language Resources and Evaluation (LREC).
ELRA - European Language Resources Association, Lisbon, Portugal, 2004.
Philipp Cimiano, Andreas Hotho and Steffen Staab.
Clustern mit Hintergrundwissen.
2004.
Andreas Hotho.
Clustern mit Hintergrundwissen.
Doctoral thesis, University of Karlsruhe, Universität Karlsruhe (TH), Institut AIFB, D-76128 Karlsruhe, 2004.
Studer/Gaul
Andreas Hotho.
Ontologies Improve Text Document Clustering.
In:
Proc. of the ICDM 03, The 2003 IEEE International Conference on Data Mining, pp. 541-544.
2003.
A. Hotho, S. Staab and G. Stumme.
WordNet improves text document clustering.
In:
Proc. of the SIGIR 2003 Semantic Web Workshop.
Toronto, Canada, 2003.
A. Hotho, S. Staab and G. Stumme.
Explaining Text Clustering Results using Semantic Structures.
In: N. Lavrač, D. Gamberger and H. B. Todorovski
(eds.):
Knowledge Discovery in Databases: PKDD 2003, 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 2838, series LNAI, pp. 217-228.
Springer, Heidelberg, 2003.
Andreas Hotho, Steffen Staab and Gerd Stumme.
Abstract:
Common text clustering techniques offer rather poor capabilities
for explaining to their users why a particular result has been
achieved. They have the disadvantage that they do not relate
semantically nearby terms and that they cannot explain how
resulting clusters are related to each other.
In this paper, we discuss a way of integrating a large thesaurus
and the computation of lattices of resulting clusters into common text clustering
in order to overcome these two problems.
As its major result, our approach achieves an explanation using an
appropriate level of granularity at the concept level as well as
an appropriate size and complexity of the explaining lattice of
resulting clusters.
Ontologies improve text document clustering.
In:
Proceedings of the 2003 IEEE International Conference on Data Mining, pp. 541-544 (Poster).
IEEE Computer Society, Melbourne, Florida, 2003.
Andreas Hotho, Steffen Staab and Gerd Stumme.
Automatic multi-label subject indexing in a multilingual environment.
In:
Proc. of the 7th European Conference in Research and Advanced Technology for Digital Libraries, ECDL 2003, vol. 2769, series LNCS, pp. 140-151.
Springer, 2003.
Boris Lauser and Andreas Hotho.
Ontology-based Text Document Clustering.
In:
Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'03 Conference held in Zakopane, pp. 451-452.
2003.
Steffen Staab and Andreas Hotho.
Conceptual Clustering of Text Clusters.
In:
Proceedings of FGML Workshop, pp. 37-45.
Special Interest Group of German Informatics Society (FGML, Fachgruppe Maschinelles Lernen der GI e.V.), 2002.
A. Hotho and G. Stumme.
Text Clustering Based on Good Aggregations.
Künstliche Intelligenz (KI), 16(4):48-54, 2002.
Andreas Hotho, Alexander Maedche and Steffen Staab.
Ontology-based Text Clustering.
In:
Proc. of the Workshop "Text Learning: Beyond Supervision" at IJCAI 2001. Seattle, WA, USA, August 6, 2001.
2001.
Andreas Hotho, Alexander Maedche and Steffen Staab.
Text Clustering Based on Good Aggregations.
In:
ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 607-608.
IEEE Computer Society, Washington, DC, USA, 2001.
Andreas Hotho, Alexander Maedche and Steffen Staab.
Automatic acquisition of hyponyms from large text corpora.
In:
Proceedings of the 14th conference on Computational linguistics, vol. 2, pp. 539-545.
Association for Computational Linguistics, Stroudsburg, PA, USA, 1992.
Marti A. Hearst.
Abstract:
We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested.
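To make the idea of a lexico-syntactic pattern concrete, here is a minimal sketch of a single "X such as A, B, and C" pattern. It is an illustration only, not the paper's implementation: the regular expression, the function name `extract_hyponyms`, and the simplification that the hypernym is the single word directly before "such as" are all assumptions made for this example.

```python
import re

# Assumption: the hypernym is approximated by the one word before "such as";
# the hyponym list runs up to the next period or semicolon.
SUCH_AS = re.compile(r"(\w+),? such as ([^.;]+)")

# Splits an enumeration on commas (optionally followed by "and"/"or")
# or on a bare "and"/"or" between items.
ITEM_SPLIT = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found via the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for item in ITEM_SPLIT.split(match.group(2)):
            item = item.strip()
            if item:
                pairs.append((item.lower(), hypernym))
    return pairs


pairs = extract_hyponyms(
    "He plays instruments such as guitars, drums, and flutes."
)
for hyponym, hypernym in pairs:
    print(f"{hyponym} is a kind of {hypernym}")
```

A real system would match noun phrases with a parser or chunker rather than single words, and would combine several such patterns; this sketch only shows why such patterns are easy to recognize in unrestricted text.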