Community Assessment using Evidence Networks.
In:
Analysis of Social Media and Ubiquitous Data, volume 6904 of LNAI.
2011.
Folke Mitzlaff, Martin Atzmueller, Dominik Benz, Andreas Hotho and Gerd Stumme.
Empirical Comparison of Algorithms for Network Community Detection.
2010. arXiv:1004.3539.
Jure Leskovec, Kevin J. Lang and Michael W. Mahoney.
Detecting clusters or communities in large real-world graphs such as large
social or information networks is a problem of considerable interest. In
practice, one typically chooses an objective function that captures the
intuition of a network cluster as a set of nodes with better internal
connectivity than external connectivity, and then one applies approximation
algorithms or heuristics to extract sets of nodes that are related to the
objective function and that "look like" good communities for the application of
interest. In this paper, we explore a range of network community detection
methods in order to compare them and to understand their relative performance
and the systematic biases in the clusters they identify. We evaluate several
common objective functions that are used to formalize the notion of a network
community, and we examine several different classes of approximation algorithms
that aim to optimize such objective functions. In addition, rather than simply
fixing an objective and asking for an approximation to the best cluster of any
size, we consider a size-resolved version of the optimization problem.
Considering community quality as a function of its size provides a much finer
lens with which to examine community detection algorithms, since objective
functions and approximation algorithms often have non-obvious size-dependent
behavior.
A Survey of Accuracy Evaluation Metrics of Recommendation Tasks.
2935. Volume 10.
2009.
AEON - An approach to the automatic evaluation of ontologies.
Applied Ontology, 3(1-2):41-62, 2008.
Johanna Völker, Denny Vrandečić, York Sure and Andreas Hotho.
OntoClean is an approach towards the formal evaluation of taxonomic relations in ontologies. The application of OntoClean consists of two main steps. First, concepts are tagged according to meta-properties known as rigidity, unity, dependency and identity. Second, the tagged concepts are checked according to predefined constraints to discover taxonomic errors. Although OntoClean is well documented in numerous publications, it is still used rather infrequently due to the high costs of application. Especially, the manual tagging of concepts with the correct meta-properties requires substantial efforts of highly experienced ontology engineers. In order to facilitate the use of OntoClean and to enable the evaluation of real-world ontologies, we provide AEON, a tool which automatically tags concepts with appropriate OntoClean meta-properties and performs the constraint checking. We use the Web as an embodiment of world knowledge, where we search for patterns that indicate how to properly tag concepts. We thoroughly evaluated our approach against a manually created gold standard. The evaluation shows the competitiveness of our approach while at the same time significantly lowering the costs. All of our results, i.e. the tool AEON as well as the experiment data, are publicly available.
Evaluating tagging behavior in social bookmarking systems: metrics and design heuristics.
In:
GROUP '07: Proceedings of the 2007 International ACM Conference on Supporting Group Work, pages 351-360.
ACM, New York, NY, USA, 2007.
Umer Farooq, Thomas G. Kannampallil, Yang Song, Craig H. Ganoe, John M. Carroll and Lee Giles.
Improving Text Classification by Using Encyclopedia Knowledge.
In:
Seventh IEEE International Conference on Data Mining (ICDM 2007), pages 332-341.
2007.
Pu Wang, Jian Hu, Hua-Jun Zeng, Lijun Chen and Zheng Chen.
The exponential growth of text documents available on the Internet has created an urgent need for accurate, fast, and general purpose text classification algorithms. However, the "bag of words" representation used for these classification methods is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. In order to deal with this problem, we integrate background knowledge - in our application: Wikipedia - into the process of classifying text documents. The experimental evaluation on Reuters newsfeeds and several other corpora shows that our classification results with encyclopedia knowledge are much better than the baseline "bag of words" methods.
On How to Perform a Gold Standard based Evaluation of Ontology Learning.
In:
Proc. of ISWC-2006 International Semantic Web Conference.
Springer, LNCS, Athens, GA, USA, 2006.
Klaas Dellschaft and Steffen Staab.
A Survey of Ontology Evaluation Techniques.
In:
Proc. of 8th Int. multi-conf. Information Society, pages 166-169.
2005.
Janez Brank, Marko Grobelnik and Dunja Mladenić.
ROC Graphs: Notes and Practical Considerations for Researchers.
HP Laboratories, 2004.
T. Fawcett.
Evaluating Collaborative Filtering Recommender Systems.
ACM Transactions on Information Systems, 22(1):5-53, 2004.
J.L. Herlocker, J.A. Konstan, L.G. Terveen and J.T. Riedl.
Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering.
Machine Learning, 55(3):311-331, 2004.
Ying Zhao and George Karypis.
WordNet improves text document clustering.
In:
Proc. of the SIGIR 2003 Semantic Web Workshop.
Toronto, Canada, 2003.
A. Hotho, S. Staab and G. Stumme.
Comparing clusterings.
In:
Proc. of COLT 03.
2003.
Marina Meila.
Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions.
Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
Alexander Strehl and Joydeep Ghosh.
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.
Fuzzy Cluster Analysis.
1999.
Frank Höppner, Frank Klawonn, Rudolf Kruse and Thomas Runkler.
An examination of indexes for determining the number of clusters in binary data sets.
SFB "Adaptive Information Systems and Modeling in Economics and Management Science", 1999. Working Paper 29.
A. Weingessel, E. Dimitriadou and S. Dolnicar.
A geometric approach to cluster validity for normal mixtures.
Soft Computing - A Fusion of Foundations, Methodologies and Applications, 1(4):166-179, 1997.
J. C. Bezdek, W. Q. Li, Y. Attikiouzel and M. Windham.
We study indices for choosing the correct number of components in a mixture of normal distributions. Previous studies have been confined to indices based wholly on probabilistic models. Viewing mixture decomposition as probabilistic clustering (where the emphasis is on partitioning for geometric substructure) as opposed to parametric estimation enables us to introduce both fuzzy and crisp measures of cluster validity for this problem. We presume the underlying samples to be unlabeled, and use the expectation-maximization (EM) algorithm to find clusters in the data. We test 16 probabilistic, 3 fuzzy and 4 crisp indices on 12 data sets that are samples from bivariate normal mixtures having either 3 or 6 components. Over three run averages based on different initializations of EM, 10 of the 23 indices tested for choosing the right number of mixture components were correct in at least 9 of the 12 trials. Among these were the fuzzy index of Xie-Beni, the crisp Davies-Bouldin index, and two crisp indices that are recent generalizations of Dunn's index.
Clusteranalyse mit gemischt-skalierten Merkmalen: Abstrahierung vom Skalenniveau.
Allgemeines Statistisches Archiv, Vandenhoeck & Ruprecht in Göttingen, 81(3):249-265, 1997.
N. Fickel.
On the use of spreading activation methods in automatic information.
In:
SIGIR '88: Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval, pages 147-160.
ACM Press, New York, NY, USA, 1988.
G. Salton and C. Buckley.
Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.
Nonparametric statistics for the behavioral sciences.
1988.
S. Siegel and N.J. Castellan.