The Wikipedia XML Corpus.
SIGIR Forum, 2006.
Ludovic Denoyer and Patrick Gallinari.
RCV1: A New Benchmark Collection for Text Categorization Research.
Journal of Machine Learning Research, 5(Apr):361-397, 2004.
D. D. Lewis, Y. Yang, T. G. Rose and F. Li.
Corpus-Based Knowledge Representation.
In: G. Gottlob and T. Walsh, editors, IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9-15, 2003, pages 1567-1572. Morgan Kaufmann, 2003.
Alon Y. Halevy and Jayant Madhavan.
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.
CoRR, cmp-lg/9709008, 1997.
Jay J. Jiang and David W. Conrath.