This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.
20 Newsgroups
Abstract
This data set consists of 20000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz A subset composed of 100 articles from each newsgroup. (1.9M; 6.2M uncompressed)
Y. Song, L. Zhang, und C. Giles. CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge mining, Seite 93--102. New York, NY, USA, ACM, (2008)