Three-Toed Sloth
Slow Takes from the Canopy (My Very Own Internet Tradition)
June 15, 2007

So You Think You Have a Power Law — Well Isn't That Special?
Regular readers who care about such things (I think there are about three of you) will recall that I have long had a thing about just how unsound many of the claims for the presence of power law distributions in real data are, especially those made by theoretical physicists, who, with some honorable exceptions, learn nothing about data analysis. (I certainly didn't.) I have even whined about how I should really be working on a paper about how to do all this right, rather than merely snarking in a weblog. As evidence that the age of wonders is not past (and, more relevantly, that I have productive collaborators), this paper is now loosed upon the world:
Aaron Clauset, CRS and M. E. J. Newman, "Power-law distributions in empirical data", arxiv:0706.1062, with code available in Matlab and R; forthcoming (2009) in SIAM Review.

Abstract: Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for power-law data, based on maximum likelihood methods and the Kolmogorov-Smirnov statistic. We also show how to tell whether the data follow a power-law distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
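For readers who want to see what "maximum likelihood plus Kolmogorov-Smirnov" looks like in the simplest case, here is a minimal Python sketch. It is not the Matlab or R code released with the paper. It assumes continuous data and a lower cutoff `xmin` that is already known, and the function names (`fit_alpha`, `ks_distance`, `sample_power_law`) are invented for this illustration. Choosing `xmin` from the data and testing goodness of fit properly take more work and are not shown.

```python
import math
import random


def fit_alpha(xs, xmin):
    """Maximum-likelihood exponent for a continuous power law.

    Assumes density proportional to x**(-alpha) for x >= xmin, with xmin
    given (not estimated). Returns (alpha_hat, approx_std_error, n_tail).
    """
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum
    # Large-sample approximation to the standard error of alpha_hat.
    stderr = (alpha - 1.0) / math.sqrt(n)
    return alpha, stderr, n


def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


def sample_power_law(n, xmin, alpha, rng):
    """Draw n synthetic power-law values by inverting the CDF."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(10_000, xmin=1.0, alpha=2.5, rng=rng)
    alpha_hat, se, n = fit_alpha(data, xmin=1.0)
    print(f"alpha_hat = {alpha_hat:.3f} +/- {se:.3f} (n = {n})")
    print(f"KS distance = {ks_distance(data, 1.0, alpha_hat):.4f}")
```

On synthetic data like this the estimate should land close to the true exponent. A small KS distance alone does not show that real data follow a power law; that needs a calibrated test and comparison against alternative distributions.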
The paper is deliberately aimed at physicists, so we assume some things that they know (like some of the mechanisms, e.g. critical fluctuations, which can lead to power laws), and devote extra detail to things they don't but which e.g. statisticians do know (such as how to find the cumulative distribution function of a standard Gaussian). In particular, we refrained from making a big deal about the need for an error-statistical approach to problems like this, but it definitely shaped our thinking.
Aaron has already posted about the paper, but I'll do so myself anyway. Partly this is to help hammer the message home, and partly this is because I am a basically negative and critical man, and this sort of work gives me an excuse to vent my feelings of spite under the pretense of advancing truth (unlike Aaron and Mark, who are basically nice guys and constructive scholars).
Here are the take-home points, none of which ought to be news, but which, taken together, would lead to a real change in the literature. (For example, half or more of each issue of Physica A would disappear.)
Because this is, of course, what everyone ought to do with a computational paper, we've put our code online, so you can check our calculations, or use these methods on your own data, without having to implement them from scratch. I trust that I will no longer have to referee papers where people use GnuPlot to draw lines on log-log graphs, as though that meant something, and that in five to ten years even science journalists and editors of Wired will begin to get the message.
Manual trackbacks: The Statistical Mechanic; Uncertain Principles; zs; LanguageLog; Science After Sunclipse; Philosophia Naturalis 11 (at Highly Allochthonous); Langreiter; blogs for industry ... blogs for the dead; Infectious Greed; No Free Lunch; Look Here First; TPMCafe; Science After Sunclipse; Cosmic Variance; Messy Matters
Power Laws; Enigmas of Chance; Complexity
Posted by crshalizi at June 15, 2007 13:00