Information theoretic measures for clusterings comparison: is a correction for chance necessary?

N. Vinh, J. Epps, and J. Bailey. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 1073--1080. New York, NY, USA, ACM, (2009)

Abstract

Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, beside the class of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. We observe that the baseline for such measures, i.e. average value between random partitions of a data set, does not take on a constant value, and tends to have larger variation when the ratio between the number of data points and the number of clusters is small. This effect is similar in some other non-information theoretic based measures such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive the analytical formula for the expected mutual information value between a pair of clusterings, and then propose the adjusted version for several popular information theoretic based measures. Some examples are given to demonstrate the need and usefulness of the adjusted measures.
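The abstract names the ingredients of the correction but this entry does not reproduce the formulas. As a rough illustration only, the following Python sketch computes the mutual information between two labelings, its expected value when cluster sizes are fixed and assignments are random (a hypergeometric model), and a chance-adjusted score. The function names are hypothetical, the normalization by the larger of the two entropies is an assumption (other normalizations are possible), and the value returned in the degenerate case is an arbitrary convention chosen here, not something taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information (in nats) between two labelings of the same points."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum(
        (nij / n) * math.log(n * nij / (a[i] * b[j]))
        for (i, j), nij in joint.items()
    )


def expected_mutual_information(u, v):
    """Expected MI with cluster sizes fixed and overlaps hypergeometric."""
    n = len(u)
    sizes_u = list(Counter(u).values())
    sizes_v = list(Counter(v).values())
    emi = 0.0
    for ai in sizes_u:
        for bj in sizes_v:
            # Overlaps of size 0 contribute nothing, so start at 1.
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # Log of the hypergeometric probability of overlap nij.
                log_p = (
                    math.lgamma(ai + 1) + math.lgamma(bj + 1)
                    + math.lgamma(n - ai + 1) + math.lgamma(n - bj + 1)
                    - math.lgamma(n + 1) - math.lgamma(nij + 1)
                    - math.lgamma(ai - nij + 1) - math.lgamma(bj - nij + 1)
                    - math.lgamma(n - ai - bj + nij + 1)
                )
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    return emi


def adjusted_mutual_information(u, v):
    """(MI - E[MI]) / (max(H(U), H(V)) - E[MI]); normalization is an assumption."""
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    denom = max(entropy(u), entropy(v)) - emi
    if abs(denom) < 1e-12:
        return 1.0  # degenerate case; arbitrary convention for this sketch
    return (mi - emi) / denom


if __name__ == "__main__":
    same = adjusted_mutual_information([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
    mixed = adjusted_mutual_information([0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1])
    print(same, mixed)
```

In this sketch, two labelings that describe the same partition score close to 1 regardless of label names, while a pair whose raw mutual information falls below the chance baseline gets a negative score. The triple loop is fine for small examples but would be slow for large data sets.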

Description

Information theoretic measures for clusterings comparison

Links and resources

URL:

http://portal.acm.org/citation.cfm?id=1553511

BibTeX key:

vinh2009information

Cite this publication

@inproceedings{vinh2009information,
  abstract = {Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, beside the class of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. We observe that the baseline for such measures, i.e. average value between random partitions of a data set, does not take on a constant value, and tends to have larger variation when the ratio between the number of data points and the number of clusters is small. This effect is similar in some other non-information theoretic based measures such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive the analytical formula for the expected mutual information value between a pair of clusterings, and then propose the adjusted version for several popular information theoretic based measures. Some examples are given to demonstrate the need and usefulness of the adjusted measures.},
  added-at = {2011-02-04T16:10:16.000+0100},
  address = {New York, NY, USA},
  author = {Vinh, Nguyen Xuan and Epps, Julien and Bailey, James},
  biburl = {https://puma.uni-kassel.de/bibtex/2bed9702898bc8c50faa21eabd068b8d9/benz},
  booktitle = {ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning},
  description = {Information theoretic measures for clusterings comparison},
  doi = {10.1145/1553374.1553511},
  interhash = {ddd96b934438029873242aeabc26a201},
  intrahash = {bed9702898bc8c50faa21eabd068b8d9},
  isbn = {978-1-60558-516-1},
  keywords = {mutual_information comparison evaluation clustering},
  location = {Montreal, Quebec, Canada},
  pages = {1073--1080},
  publisher = {ACM},
  timestamp = {2011-02-04T16:10:16.000+0100},
  title = {Information theoretic measures for clusterings comparison: is a correction for chance necessary?},
  url = {http://portal.acm.org/citation.cfm?id=1553511},
  year = 2009
}

%0 Conference Paper
%1 vinh2009information
%A Vinh, Nguyen Xuan
%A Epps, Julien
%A Bailey, James
%B ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
%C New York, NY, USA
%D 2009
%I ACM
%K mutual_information comparison evaluation clustering
%P 1073--1080
%R 10.1145/1553374.1553511
%T Information theoretic measures for clusterings comparison: is a correction for chance necessary?
%U http://portal.acm.org/citation.cfm?id=1553511
%X Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, beside the class of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. We observe that the baseline for such measures, i.e. average value between random partitions of a data set, does not take on a constant value, and tends to have larger variation when the ratio between the number of data points and the number of clusters is small. This effect is similar in some other non-information theoretic based measures such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive the analytical formula for the expected mutual information value between a pair of clusterings, and then propose the adjusted version for several popular information theoretic based measures. Some examples are given to demonstrate the need and usefulness of the adjusted measures.
%@ 978-1-60558-516-1
