TY  - JOUR
AU  - Mohammad, Saif
AU  - Hirst, Graeme
T1  - Distributional measures as proxies for semantic relatedness
N1  - Submitted for publication
UR  - http://ftp.cs.toronto.edu/pub/gh/Mohammad+Hirst-2005.pdf
KW  - distributional
KW  - measure
KW  - measures
KW  - relatedness
KW  - semantic
KW  - similarity
KW  - text
ER  - 
TY  - CHAP
AU  - Poelmans, Jonas
AU  - Ignatov, Dmitry I.
AU  - Viaene, Stijn
AU  - Dedene, Guido
AU  - Kuznetsov, Sergei O.
A2  - Perner, Petra
T1  - Text Mining Scientific Papers: A Survey on FCA-Based Information Retrieval Research
T2  - Advances in Data Mining. Applications and Theoretical Aspects
PB  - Springer Berlin Heidelberg
PY  - 2012/
VL  - 7377
SP  - 273
EP  - 287
UR  - http://dx.doi.org/10.1007/978-3-642-31488-9_22
DO  - 10.1007/978-3-642-31488-9_22
KW  - FCA
KW  - IR
KW  - Mining
KW  - SOTA
KW  - Text
SN  - 978-3-642-31487-2
AB  - Formal Concept Analysis (FCA) is an unsupervised clustering technique and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the pdf-files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of the FCA-based IR research.
ER  - 
TY  - THES
AU  - Illig, Jens
T1  - Machine Learnability Analysis of Textclassifications in a Social Bookmarking Folksonomy
PY  - 2008/
PB  - University of Kassel
KW  - Illig
KW  - bachelor
KW  - classification
KW  - learning
KW  - machine
KW  - recommendations
KW  - text
ER  - 
TY  - JOUR
AU  - Cimiano, Philipp
AU  - Hotho, Andreas
AU  - Staab, Steffen
T1  - Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis
JO  - Journal of Artificial Intelligence Research
PY  - 2005/
VL  - 24
SP  - 305
EP  - 339
UR  - http://dblp.uni-trier.de/db/journals/jair/jair24.html#CimianoHS05
KW  - analysis
KW  - concept
KW  - fca
KW  - formal
KW  - hierarchies
KW  - hierarchy
KW  - learning
KW  - ontologies
KW  - ontology
KW  - text
ER  - 
TY  - CONF
AU  - Hotho, A.
AU  - Staab, S.
AU  - Stumme, G.
T1  - Wordnet improves text document clustering
T2  - Proc. SIGIR Semantic Web Workshop
CY  - Toronto
PY  - 2003/
UR  - http://www.kde.cs.uni-kassel.de/stumme/papers/2003/hotho2003wordnet.pdf
KW  - 2003
KW  - clustering
KW  - data
KW  - discovery
KW  - document
KW  - information
KW  - ir
KW  - kdd
KW  - kmeans
KW  - knowledge
KW  - mining
KW  - myown
KW  - retrieval
KW  - text
KW  - wordnet
N1  - Publications of Gerd Stumme
ER  - 
TY  - CONF
AU  - Hotho, Andreas
AU  - Staab, Steffen
AU  - Stumme, Gerd
A2  - Lavrač, Nada
A2  - Gamberger, Dragan
A2  - Blockeel, Hendrik
A2  - Todorovski, Ljupco
T1  - Explaining Text Clustering Results using Semantic Structures
T2  - Knowledge Discovery in Databases: PKDD 2003, 7th European Conference on Principles and Practice of Knowledge Discovery in Databases
PB  - Springer
CY  - Heidelberg
PY  - 2003/
VL  - 2838
SP  - 217
EP  - 228
UR  - http://www.kde.cs.uni-kassel.de/stumme/papers/2003/hotho2003explaining.pdf
KW  - 2003
KW  - analysis
KW  - clustering
KW  - concept
KW  - fca
KW  - formal
KW  - myown
KW  - ontologies
KW  - semantic
KW  - semantics
KW  - text
N1  - Publications of Gerd Stumme
AB  - Common text clustering techniques offer rather poor capabilities for explaining to their users why a particular result has been achieved. They have the disadvantage that they do not relate semantically nearby terms and that they cannot explain how resulting clusters are related to each other. In this paper, we discuss a way of integrating a large thesaurus and the computation of lattices of resulting clusters into common text clustering in order to overcome these two problems. As its major result, our approach achieves an explanation using an appropriate level of granularity at the concept level as well as an appropriate size and complexity of the explaining lattice of resulting clusters.
ER  - 
TY  - CONF
AU  - Hotho, Andreas
AU  - Staab, Steffen
AU  - Stumme, Gerd
T1  - Ontologies improve text document clustering
T2  - Proceedings of the 2003 IEEE International Conference on Data Mining
PB  - IEEE Computer Society
CY  - Melbourne, Florida
PY  - 2003/11/19/November 19-22
SP  - 541
EP  - 544
UR  - http://www.kde.cs.uni-kassel.de/stumme/papers/2003/hotho2003ontologies.pdf
KW  - 2003
KW  - clustering
KW  - data
KW  - kdd
KW  - mining
KW  - myown
KW  - ontologies
KW  - text
N1  - Publications of Gerd Stumme
N1  - Poster
ER  - 
TY  - RPRT
AU  - Hotho, Andreas
AU  - Staab, Steffen
AU  - Stumme, Gerd
T1  - Text Clustering Based on Background Knowledge
PB  - University of Karlsruhe, Institute AIFB
PY  - 2003/
VL  - 425
UR  - http://www.kde.cs.uni-kassel.de/stumme/papers/2003/hotho2003text.pdf
KW  - 2003
KW  - analysis
KW  - background
KW  - clustering
KW  - concept
KW  - fca
KW  - formal
KW  - knowledge
KW  - myown
KW  - ontologies
KW  - semantic
KW  - text
KW  - web
N1  - Publications of Gerd Stumme
N1  - Technical Report
AB  - Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. Standard partitional or agglomerative clustering methods efficiently compute results to this end. However, the bag-of-words representation used for these clustering methods is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. Also, it is mostly left to the user to find out why a particular partitioning has been achieved, because it is only specified extensionally. In order to deal with the two problems, we integrate background knowledge into the process of clustering text documents. First, we preprocess the texts, enriching their representations by background knowledge provided in a core ontology (in our application, Wordnet). Then, we cluster the documents by a partitional algorithm. Our experimental evaluation on Reuters newsfeeds compares clustering results with pre-categorizations of news. In the experiments, improvements of results by background knowledge compared to the baseline can be shown for many interesting tasks. Second, the clustering partitions the large number of documents into a relatively small number of clusters, which may then be analyzed by conceptual clustering. In our approach, we applied Formal Concept Analysis. Conceptual clustering techniques are known to be too slow for directly clustering several hundreds of documents, but they give an intensional account of cluster results. They allow for a concise description of commonalities and distinctions of different clusters. With background knowledge they even find abstractions like “food” (vs. specializations like “beef” or “corn”). Thus, in our approach, partitional clustering first reduces the size of the problem such that it becomes tractable for conceptual clustering, which then facilitates the understanding of the results.
ER  - 
TY  - CONF
AU  - Hotho, A.
AU  - Stumme, G.
A2  - Kókai, G.
A2  - Zeidler, J.
T1  - Conceptual Clustering of Text Clusters
T2  - Proc. Fachgruppentreffen Maschinelles Lernen (FGML 2002)
PY  - 2002/
SP  - 37
EP  - 45
UR  - http://www.kde.cs.uni-kassel.de/stumme/papers/2002/FGML02.pdf
KW  - 2002
KW  - analysis
KW  - clustering
KW  - concept
KW  - conceptual
KW  - fca
KW  - formal
KW  - myown
KW  - text
N1  - Publications of Gerd Stumme
ER  - 