TIR 2010
7th International Workshop on Text-based Information Retrieval
in conjunction with DEXA 2010
University of Deusto
Bilbao, Spain
30 August - 3 September 2010
20 Newsgroups
Abstract
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed) A subset composed of 100 articles from each newsgroup.
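As a rough illustration of how the archives might be unpacked and sanity-checked, the Python sketch below extracts a .tar.gz file and counts the messages per newsgroup. It assumes the archive unpacks into one top-level directory containing one subdirectory per newsgroup, with one file per message; that layout is an assumption, not something stated on this page. The function names extract_archive and count_messages are hypothetical helpers, not part of any distributed tooling.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return dest_dir as a Path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return dest


def count_messages(root_dir):
    """Map each newsgroup directory name to the number of files it contains.

    Assumes (hypothetically) one subdirectory per newsgroup and one file
    per message directly inside it.
    """
    root = Path(root_dir)
    return {
        group.name: sum(1 for item in group.iterdir() if item.is_file())
        for group in sorted(root.iterdir())
        if group.is_dir()
    }


if __name__ == "__main__":
    # Assumed top-level directory name inside the archive: "mini_newsgroups".
    out = extract_archive("mini_newsgroups.tar.gz", "data")
    for name, n in count_messages(out / "mini_newsgroups").items():
        print(f"{name}\t{n}")
```

If the layout assumption holds, running this on mini_newsgroups.tar.gz should report 100 articles for each newsgroup, matching the description above.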
F. Beil, M. Ester, and X. Xu. KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 436-442. New York, NY, USA, ACM Press, (2002)
S. Bloehdorn and A. Hotho. Proceedings of the Fourth IEEE International Conference on Data Mining, pp. 331-334. IEEE Computer Society Press, (November 2004)
A. Hotho, A. Maedche, and S. Staab. Proc. of the Workshop "Text Learning: Beyond Supervision" at IJCAI 2001. Seattle, WA, USA, August 6, 2001, (2001)
S. Bloehdorn and A. Hotho. Proceedings of the MSW 2004 workshop at the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 70-87. (August 2004)
S. Bloehdorn, P. Cimiano, A. Hotho, and S. Staab. LDV Forum - GLDV Journal for Computational Linguistics and Language Technology 20 (1): 87-112 (May 2005)