Statistical Comparisons of Classifiers over Multiple Data Sets
J. Demšar.
J. Mach. Learn. Res. (December 2006)

While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
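The Friedman test with a Nemenyi-style critical difference, as recommended above, can be sketched in a few lines. This is a minimal illustration, not code from the article. It assumes higher scores are better, ranks the classifiers on each data set with ties sharing the mean rank, and computes the Friedman statistic, its Iman-Davenport F correction, and the critical difference CD = q_alpha * sqrt(k(k+1)/(6N)). The function names and the example score table are hypothetical. The value 2.343 is assumed to be the Nemenyi critical value for k = 3 classifiers at alpha = 0.05, as tabulated in the article; p-values are not computed here.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank one data set's scores: rank 1 = best (highest); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def friedman(table: Sequence[Sequence[float]]) -> Tuple[List[float], float, float]:
    """table[i][j] = score of classifier j on data set i (hypothetical layout)."""
    n = len(table)     # N: number of data sets
    k = len(table[0])  # k: number of classifiers
    per_set = [average_ranks(row) for row in table]
    avg = [sum(r[j] for r in per_set) / n for j in range(k)]
    # Friedman statistic: 12N/(k(k+1)) * (sum_j R_j^2 - k(k+1)^2/4)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    # Iman-Davenport correction, F-distributed with (k-1, (k-1)(N-1)) degrees of freedom.
    denom = n * (k - 1) - chi2
    ff = math.inf if denom == 0 else (n - 1) * chi2 / denom
    return avg, chi2, ff


def critical_difference(q_alpha: float, k: int, n: int) -> float:
    """Average ranks differing by at least this much are significantly different."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Hypothetical accuracies of 3 classifiers on 4 data sets.
table = [
    [0.80, 0.75, 0.70],
    [0.90, 0.85, 0.88],
    [0.70, 0.72, 0.60],
    [0.60, 0.60, 0.55],
]
avg, chi2, ff = friedman(table)
print(avg, chi2, round(ff, 4))  # [1.375, 1.875, 2.75] 3.875 2.8182
print(round(critical_difference(2.343, k=3, n=4), 4))
```

In a CD diagram, the classifiers are placed on an axis by average rank, and groups whose average ranks differ by less than the critical difference are connected.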

URL

http://dl.acm.org/citation.cfm?id=1248547.1248548

@article{demvsar2006statistical,
  abstract = {While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.},
  acmid = {1248548},
  added-at = {2015-03-19T20:53:26.000+0100},
  author = {Dem\v{s}ar, Janez},
  biburl = {https://puma.uni-kassel.de/bibtex/293751bd0bfabffe38f799b9bb7f4c227/stephandoerfel},
  description = {Statistical Comparisons of Classifiers over Multiple Data Sets},
  interhash = {337f48d386c60bd13ce70021894680ef},
  intrahash = {93751bd0bfabffe38f799b9bb7f4c227},
  issn = {1532-4435},
  issue_date = {12/1/2006},
  journal = {J. Mach. Learn. Res.},
  keywords = {classification prediction significance testing},
  month = dec,
  numpages = {30},
  pages = {1--30},
  publisher = {JMLR.org},
  timestamp = {2015-03-19T20:53:26.000+0100},
  title = {Statistical Comparisons of Classifiers over Multiple Data Sets},
  url = {http://dl.acm.org/citation.cfm?id=1248547.1248548},
  volume = 7,
  year = 2006
}

%0 Journal Article
%1 demvsar2006statistical
%A Demsar, Janez
%D 2006
%I JMLR.org
%J J. Mach. Learn. Res.
%K classification prediction significance testing
%P 1-30
%T Statistical Comparisons of Classifiers over Multiple Data Sets
%U http://dl.acm.org/citation.cfm?id=1248547.1248548
%V 7
%X While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
