TY - RPRT
AU - Hotho, Andreas
AU - Staab, Steffen
AU - Stumme, Gerd
A2 -
T1 - Text Clustering Based on Background Knowledge
PB - University of Karlsruhe, Institute AIFB
AD -
PY - 2003/
VL - 425
IS -
SP -
EP -
UR - http://www.kde.cs.uni-kassel.de/stumme/papers/2003/hotho2003text.pdf
M3 -
KW - 2003
KW - analysis
KW - background
KW - clustering
KW - concept
KW - fca
KW - formal
KW - knowledge
KW - myown
KW - ontologies
KW - semantic
KW - text
KW - web
L1 -
N1 - Publications of Gerd Stumme
N1 - Technical Report
N1 -
AB - Text document clustering plays an important role in providing intuitive
navigation and browsing mechanisms by organizing large amounts of information
into a small number of meaningful clusters. Standard partitional or agglomerative
clustering methods efficiently compute results to this end.
However, the bag-of-words representation used for these clustering methods is often
unsatisfactory as it ignores relationships between important terms that do not
co-occur literally. Also, it is mostly left to the user to find out why a particular partitioning
has been achieved, because it is only specified extensionally. In order to
deal with these two problems, we integrate background knowledge into the process of
clustering text documents.
First, we preprocess the texts, enriching their representations with background knowledge
provided in a core ontology (in our application, WordNet). Then, we cluster
the documents by a partitional algorithm. Our experimental evaluation on Reuters
newsfeeds compares clustering results with pre-categorizations of news. In the experiments,
improvements over the baseline due to background knowledge
can be shown for many interesting tasks.
Second, the clustering partitions the large number of documents into a relatively small
number of clusters, which may then be analyzed by conceptual clustering. In our approach,
we applied Formal Concept Analysis. Conceptual clustering techniques are
known to be too slow for directly clustering several hundred documents, but they
give an intensional account of cluster results. They allow for a concise description
of commonalities and distinctions of different clusters. With background knowledge
they even find abstractions like “food” (vs. specializations like “beef” or “corn”).
Thus, in our approach, partitional clustering first reduces the size of the problem
such that it becomes tractable for conceptual clustering, which then facilitates the
understanding of the results.
ER -

TY - CONF
AU - Stumme, Gerd
A2 - Bock, H.-H.
A2 - Polasek, W.
T1 - Attribute Exploration with Background Implications and Exceptions
T2 - Data Analysis and Information Systems. Statistical and Conceptual approaches. Proc. GfKl'95. Studies in Classification, Data Analysis, and Knowledge Organization 7
PB - Springer
CY - Heidelberg
PY - 1996/
M2 -
VL -
IS -
SP - 457
EP - 469
UR - http://www.kde.cs.uni-kassel.de/stumme/papers/1995/P1781-GfKl95.pdf
M3 -
KW - 1996
KW - acquisition
KW - analysis
KW - attribute
KW - background
KW - concept
KW - exploration
KW - fca
KW - formal
KW - implications
KW - knowledge
KW - lattices
KW - myown
L1 -
SN -
N1 - Publications of Gerd Stumme
N1 -
AB -
ER -