TIR 2010
7th International Workshop on Text-based Information Retrieval
in conjunction with DEXA 2010
University of Deusto
Bilbao, Spain
30 August - 3 September 2010
20 Newsgroups
Abstract
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
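As a rough illustration of how either archive might be read once downloaded, the Python sketch below unpacks a .tar.gz file and collects each message with its newsgroup name as the label. It assumes the archive unpacks into one subdirectory per newsgroup with one file per message, and that messages can be decoded as Latin-1; neither is stated on this page, and the function names extract_archive and load_messages are hypothetical.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive (e.g. mini_newsgroups.tar.gz) into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def load_messages(root_dir):
    """Return a list of (newsgroup, text) pairs.

    Assumed layout (not confirmed by the page): root_dir/<newsgroup>/<message file>.
    """
    messages = []
    for group in sorted(os.listdir(root_dir)):
        group_dir = os.path.join(root_dir, group)
        if not os.path.isdir(group_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(group_dir)):
            path = os.path.join(group_dir, name)
            if not os.path.isfile(path):
                continue
            # Latin-1 is an assumption; it maps every byte, so decoding never fails.
            with open(path, encoding="latin-1") as handle:
                messages.append((group, handle.read()))
    return messages
```

A call such as load_messages("mini_newsgroups") would then, under the stated layout assumption, yield one labelled entry per article.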
C. Kohlschütter, P. Fankhauser, and W. Nejdl. Proc. of the 3rd ACM International Conference on Web Search and Data Mining (WSDM 2010), New York City, NY, USA, 2010.