Larsen, P O and von Ins, M. The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index. In Scientometrics, (84) 3: 575--603, Year 2010.
Denoyer, Ludovic and Gallinari, Patrick. The Wikipedia XML Corpus. In SIGIR Forum, Year 2006.
Liu, Vinci and Curran, James R. Web Text Corpus for Natural Language Processing. EACL. The Association for Computer Linguistics, Year 2006.
Lewis, D. D. and Yang, Y. and Rose, T. G. and Li, F. RCV1: A New Benchmark Collection for Text Categorization Research. In Journal of Machine Learning Research, (5) Apr: 361--397, Year 2004.
Halevy, Alon Y. and Madhavan, Jayant. Corpus-Based Knowledge Representation. IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9-15, 2003. editor(s) Gottlob, Georg and Walsh, Toby. 1567--1572, Morgan Kaufmann, Year 2003.
Resnik, Philip and Smith, Noah A. The Web as a parallel corpus. In Computational Linguistics, (29) 3: 349--380, MIT Press, Cambridge, MA, USA, Year 2003.
Jiang, Jay J. and Conrath, David W. Semantic similarity based on corpus statistics and lexical taxonomy. In CoRR, (cmp-lg/9709008) Year 1997.
Hearst, Marti A. Automatic acquisition of hyponyms from large text corpora. Proceedings of the 14th conference on Computational linguistics. (2) 539--545, Association for Computational Linguistics, Stroudsburg, PA, USA, Year 1992.
Francis, Winthrop Nelson and Kucera, Henry. Frequency Analysis of English Usage: Lexicon and Grammar. Houghton Mifflin, Year 1983.