Maslov, S. & Redner, S.:
Promise and Pitfalls of Extending Google's PageRank Algorithm to Citation Networks. 2009
We review our recent work on applying the Google PageRank algorithm to find scientific "gems" among all Physical Review publications, and its extension to CiteRank, to find currently popular research directions. These metrics provide a meaningful extension to traditionally-used importance measures, such as the number of citations and journal impact factor. We also point out some pitfalls of over-relying on quantitative metrics to evaluate scientific quality.
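To make the idea in this abstract concrete, here is a minimal sketch of standard PageRank run on a citation graph. It is not the authors' implementation and does not cover CiteRank. The function name `pagerank`, the damping value, the convergence settings, and the toy network are all assumptions chosen for illustration.

```python
def pagerank(citations, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    `citations` maps each paper to the list of papers it cites.
    The damping value and stopping rule are illustrative assumptions.
    """
    nodes = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Papers that cite nothing spread their rank uniformly over all papers.
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        # Every citing paper splits its damped rank evenly among the papers it cites.
        for p, cited in citations.items():
            if cited:
                share = damping * rank[p] / len(cited)
                for c in cited:
                    new_rank[c] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy network: papers A and B both cite C, and C cites nothing.
toy = {"A": ["C"], "B": ["C"], "C": []}
ranks = pagerank(toy)
```

In this toy network the twice-cited paper C ends up with the largest score, and the scores sum to one.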
Butler, D.: Free journal-ranking tool enters citation market. In:
Nature 451 (2008), No. 7174, p. 6
Chang, M. & Poon, C. K.: Efficient phrase querying with common phrase index. In:
Inf. Process. Manage. 44 (2008), No. 2, pp. 756-769
In this paper, we propose a common phrase index as an efficient index structure to support phrase queries in a very large text database. Our structure is an extension of previous index structures for phrases and achieves better query efficiency with modest extra storage cost. Further improvement in efficiency can be attained by implementing our index according to our observation of the dynamic nature of the common word set. In experimental evaluation, a common phrase index using 255 common words improves query time by about 11% for queries overall and 62% for large queries (queries of long phrases) over an auxiliary nextword index, at only about 19% extra storage cost. Compared with an inverted index, the improvement is about 72% and 87% for overall and large queries respectively. We also propose implementing a common phrase index with a dynamic update feature. Our experiments show that this achieves a further improvement in time efficiency.
Zhao, Y. & Karypis, G.: Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering. In:
Machine Learning 55 (2004), No. 3, pp. 311-331
Bezdek, J. C.; Li, W. Q.; Attikiouzel, Y. & Windham, M.: A geometric approach to cluster validity for normal mixtures. In:
Soft Computing - A Fusion of Foundations, Methodologies and Applications 1 (1997), No. 4, pp. 166-179
We study indices for choosing the correct number of components in a mixture of normal distributions. Previous studies have been confined to indices based wholly on probabilistic models. Viewing mixture decomposition as probabilistic clustering (where the emphasis is on partitioning for geometric substructure) as opposed to parametric estimation enables us to introduce both fuzzy and crisp measures of cluster validity for this problem. We presume the underlying samples to be unlabeled, and use the expectation-maximization (EM) algorithm to find clusters in the data. We test 16 probabilistic, 3 fuzzy and 4 crisp indices on 12 data sets that are samples from bivariate normal mixtures having either 3 or 6 components. Over three run averages based on different initializations of EM, 10 of the 23 indices tested for choosing the right number of mixture components were correct in at least 9 of the 12 trials. Among these were the fuzzy index of Xie-Beni, the crisp Davies-Bouldin index, and two crisp indices that are recent generalizations of Dunn's index.
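As an illustration of one crisp index named in this abstract, the following sketch computes the Davies-Bouldin index in its usual textbook form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. Lower values indicate a better partition. The function name `davies_bouldin` and the toy data are assumptions, and this is not the experimental setup of the paper.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a crisp partition (lower is better).

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Centroid of each cluster, computed coordinate by coordinate.
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean Euclidean distance from cluster members to their centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical toy data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # labels follow the two groups
bad = davies_bouldin(points, [0, 1, 0, 1])   # labels cut across the groups
```

On this toy data the labeling that follows the two groups scores much lower than the labeling that cuts across them, which is the behavior used when comparing candidate numbers of clusters.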
Rand, W.: Objective criteria for the evaluation of clustering methods. In:
Journal of the American Statistical Association 66 (1971), No. 336, pp. 846-850