TIR 2010
7th International Workshop on Text-based Information Retrieval
in conjunction with DEXA 2010
University of Deusto
Bilbao, Spain
30 August - 3 September 2010
20 Newsgroups
Abstract
This data set consists of 20000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
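As a quick sanity check after downloading either archive, one might count the messages per newsgroup without unpacking it. The sketch below is only illustrative: the function name `count_messages_per_group` is hypothetical, and it assumes the archive stores each message as a regular file inside a directory named after its newsgroup, a layout this page does not state.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_messages_per_group(archive_path):
    """Count regular files per parent directory inside a .tar.gz archive."""
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if member.isfile():
                # Assumed layout: <top-level dir>/<newsgroup name>/<message file>
                group = PurePosixPath(member.name).parent.name
                counts[group] += 1
    return counts


if __name__ == "__main__":
    import os

    path = "mini_newsgroups.tar.gz"  # assumed to be in the working directory
    if os.path.exists(path):
        for group, n in sorted(count_messages_per_group(path).items()):
            print(f"{group}\t{n}")
```

If the layout assumption holds, the mini archive should report 100 messages for each of the 20 newsgroups, matching the description above.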
F. Beil, M. Ester, and X. Xu. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 436--442. ACM Press, (2002)
I. Dhillon, Y. Guan, and J. Kogan. 2nd SIAM International Conference on Data Mining (Workshop on Clustering High-Dimensional Data and its Applications), (2002)
Y. Zhao and G. Karypis. CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management, pp. 515--524. New York, NY, USA, ACM Press, (2002)
S. Bloehdorn and A. Hotho. Proceedings of the Workshop on Text-based Information Retrieval (TIR-04) at the 27th German Conference on Artificial Intelligence, (September 2004)
A. Maedche and S. Staab. ECAI-2000 -- Proceedings of the 13th European Conference on Artificial Intelligence, pp. 321--325. IOS Press, Amsterdam, (2000)