TIR 2010
7th International Workshop on Text-based Information Retrieval
in conjunction with DEXA 2010
University of Deusto
Bilbao, Spain
30 August - 3 September 2010
20 Newsgroups
Abstract
This data set consists of 20000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz A subset composed of 100 articles from each newsgroup. (1.9M; 6.2M uncompressed)
C. Kohlschütter, P. Fankhauser, and W. Nejdl. Proc. of 3rd ACM International Conference on Web Search and Data Mining New York City, NY USA (WSDM 2010)., (2010)
C. Luo, Y. Li, and S. Chung. Data & Knowledge Engineering68 (11):
1271 - 1288(2009)Including Special Section: Conference on Privacy in Statistical Databases (PSD 2008) - Six selected and extended papers on Database Privacy.