Journal articles
Distributional measures as proxies for semantic relatedness.
Submitted for publication.
Saif Mohammad and Graeme Hirst.
Book chapters
Text Mining Scientific Papers: A Survey on FCA-Based Information Retrieval Research.
In:
P. Perner, editor,
Advances in Data Mining. Applications and Theoretical Aspects, pages 273-287.
Springer Berlin Heidelberg, 2012.
Jonas Poelmans, Dmitry I. Ignatov, Stijn Viaene, Guido Dedene and Sergei O. Kuznetsov.
Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of FCA-based IR research.
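The concept-lattice construction that this abstract relies on can be sketched in miniature. The snippet below is illustrative only: the toy document-term incidence and the function names are assumptions, not taken from the paper or from CORDIET. It enumerates all formal concepts of a binary context by closing the set of object intents under intersection.

```python
def extent(intent, context):
    """Objects whose attribute set contains every attribute in `intent`."""
    return {o for o, attrs in context.items() if intent <= attrs}

def all_concepts(context):
    """Enumerate all (extent, intent) pairs of a formal context by
    closing the set of object intents under pairwise intersection."""
    attrs_all = frozenset(set().union(*context.values()))
    intents = {attrs_all}  # intent paired with the (possibly empty) top-most extent
    object_intents = [frozenset(a) for a in context.values()]
    changed = True
    while changed:
        changed = False
        for oi in object_intents:
            for it in list(intents):
                new = oi & it
                if new not in intents:
                    intents.add(new)
                    changed = True
    # Sort concepts by extent size: specific concepts first, general last.
    return sorted(((frozenset(extent(i, context)), i) for i in intents),
                  key=lambda c: len(c[0]))

# Hypothetical document-term incidence, standing in for the indexed papers:
docs = {
    "doc1": {"fca", "retrieval"},
    "doc2": {"fca", "lattice"},
    "doc3": {"fca", "retrieval", "lattice"},
}
for ext, itn in all_concepts(docs):
    print(sorted(ext), sorted(itn))
```

Each printed pair is one node of the lattice: a maximal set of documents together with exactly the terms they all share, which is what makes the lattice browsable as a map of research streams.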
Master's thesis
Machine Learnability Analysis of Text Classifications in a Social Bookmarking Folksonomy.
Master's thesis (Bachelor Thesis), University of Kassel, Kassel, 2008.
Jens Illig.
Journal articles
Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis.
Journal of Artificial Intelligence Research, 24:305-339, 2005.
Philipp Cimiano, Andreas Hotho and Steffen Staab.
Conference articles
Wordnet improves text document clustering.
In:
Proc. SIGIR Semantic Web Workshop.
Toronto, 2003.
A. Hotho, S. Staab and G. Stumme.
Explaining Text Clustering Results using Semantic Structures.
In: N. Lavrač, D. Gamberger, H. Blockeel and L. Todorovski, editors,
Knowledge Discovery in Databases: PKDD 2003, 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 2838, series LNAI, pages 217-228.
Springer, Heidelberg, 2003.
Andreas Hotho, Steffen Staab and Gerd Stumme.
Common text clustering techniques offer rather poor capabilities for explaining to their users why a particular result has been achieved. They have the disadvantage that they do not relate semantically nearby terms and that they cannot explain how resulting clusters are related to each other. In this paper, we discuss a way of integrating a large thesaurus and the computation of lattices of resulting clusters into common text clustering in order to overcome these two problems. As its major result, our approach achieves an explanation using an appropriate level of granularity at the concept level as well as an appropriate size and complexity of the explaining lattice of resulting clusters.
Ontologies improve text document clustering.
In:
Proceedings of the 2003 IEEE International Conference on Data Mining, pages 541-544 (Poster).
IEEE Computer Society, Melbourne, Florida, 2003.
Andreas Hotho, Steffen Staab and Gerd Stumme.
Technical reports
Text Clustering Based on Background Knowledge.
Technical Report, University of Karlsruhe, Institute AIFB, 2003.
Andreas Hotho, Steffen Staab and Gerd Stumme.
Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. Standard partitional or agglomerative clustering methods efficiently compute results to this end. However, the bag-of-words representation used for these clustering methods is often unsatisfactory, as it ignores relationships between important terms that do not co-occur literally. Also, it is mostly left to the user to find out why a particular partitioning has been achieved, because it is only specified extensionally. In order to deal with these two problems, we integrate background knowledge into the process of clustering text documents.

First, we preprocess the texts, enriching their representations with background knowledge provided in a core ontology, in our application Wordnet. Then, we cluster the documents by a partitional algorithm. Our experimental evaluation on Reuters newsfeeds compares clustering results with pre-categorizations of news. In the experiments, improvements over the baseline due to background knowledge can be shown for many interesting tasks.

Second, the clustering partitions the large number of documents into a relatively small number of clusters, which may then be analyzed by conceptual clustering. In our approach, we applied Formal Concept Analysis. Conceptual clustering techniques are known to be too slow for directly clustering several hundreds of documents, but they give an intensional account of cluster results. They allow for a concise description of commonalities and distinctions of different clusters. With background knowledge they even find abstractions like “food” (vs. specializations like “beef” or “corn”). Thus, in our approach, partitional clustering first reduces the size of the problem so that it becomes tractable for conceptual clustering, which then facilitates the understanding of the results.
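The enrichment step the abstract describes, letting documents about “beef” and “corn” overlap on the shared superconcept “food”, can be sketched as follows. The miniature taxonomy below is a hypothetical stand-in for Wordnet hypernyms, not the actual preprocessing of the report:

```python
from collections import Counter

# Hypothetical miniature taxonomy standing in for Wordnet hypernym lookups.
HYPERNYMS = {"beef": "food", "corn": "food", "pork": "food"}

def enrich(tokens, taxonomy=HYPERNYMS):
    """Bag of words enriched with superconcepts: for every token that has
    a hypernym in the taxonomy, the hypernym is added as an extra term."""
    bag = Counter(tokens)
    for t in tokens:
        if t in taxonomy:
            bag[taxonomy[t]] += 1
    return bag

d1 = enrich(["beef", "price", "rises"])
d2 = enrich(["corn", "price", "rises"])
print(sorted(set(d1) & set(d2)))  # the two documents now share "food"
```

In the plain bag-of-words representation the two documents would overlap only on “price” and “rises”; after enrichment they also share the “food” dimension, which is what lets a subsequent partitional clusterer group them together.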
Conference articles
Conceptual Clustering of Text Clusters.
In: G. Kókai and J. Zeidler, editors,
Proc. Fachgruppentreffen Maschinelles Lernen (FGML 2002), pages 37-45.
2002.
A. Hotho and G. Stumme.