This data set consists of 20000 messages taken from 20 Usenet newsgroups.
description of the data
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
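As a quick sanity check on a downloaded archive, the per-newsgroup article counts can be tallied straight from the tarball. The sketch below is only illustrative: it assumes each article is stored as a regular file inside a directory named after its newsgroup (a layout not stated above), and `count_articles` is a hypothetical helper name, not part of the data set distribution.

```python
import tarfile
from collections import Counter


def count_articles(archive_path):
    """Count regular files per parent directory in a tar archive.

    Assumption: each article is a regular file whose immediate parent
    directory is named after its newsgroup.
    """
    counts = Counter()
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            parts = member.name.strip("/").split("/")
            if len(parts) < 2:
                continue  # a file at the archive root has no newsgroup directory
            counts[parts[-2]] += 1
    return counts


if __name__ == "__main__":
    # Local path to the downloaded subset archive (adjust as needed).
    for group, n in sorted(count_articles("mini_newsgroups.tar.gz").items()):
        print(group, n)
```

If the layout assumption holds, the mini subset should report 100 articles for each of the 20 newsgroups.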
CiteXplore combines literature search with text mining tools for biology.
Search results are cross-referenced to EBI applications based on publication identifiers.
Links to full-text versions are provided where available.
Congnan Luo, Yanjun Li, and Soon M. Chung. Data & Knowledge Engineering 68(11):1271-1288 (2009). Including Special Section: Conference on Privacy in Statistical Databases PSD 2008 - Six selected and extended papers on Database Privacy.