PUMA bookmarks for /tag/dataset
https://puma.uni-kassel.de/tag/dataset
PUMA RSS Feed for /tag/dataset

140kit : The Free, Open Source Twitter Analytics Platform
  URL: http://140kit.com/
  User: hotho | Date: 2011-01-03T13:01:14+01:00
  Tags: collection dataset free open toread twitter

20 Newsgroups
  URL: http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html
  User: hotho | Date: 2008-04-12T15:32:12+02:00
  Tags: 20 dataset newsgroups text
  Description: 20 Newsgroups. Abstract: This data set consists of 20000 messages taken from 20 Usenet newsgroups. Information files: description of the data. Data files: 20_newsgroups.tar.gz (17.3M; 61.6M uncompressed); mini_newsgroups.tar.gz, a subset composed of 100 articles from each newsgroup (1.9M; 6.2M uncompressed).

A List of Social Tagging Datasets Made Available for Research
  URL: http://kmi.tugraz.at/staff/markus/datasets/
  User: hotho | Date: 2009-12-10T16:27:55+01:00
  Tags: tagging dataset

ACL Anthology Reference Corpus (ACL ARC)
  URL: http://acl-arc.comp.nus.edu.sg/
  User: hotho | Date: 2010-03-19T10:55:58+01:00
  Tags: acl dataset paper pdf text

ACM SIGKDD: Special Issue on Learning from Imbalanced Datasets
  URL: http://www.acm.org/sigs/sigkdd/explorations/issue.php?volume=6&issue=1&year=2004&month=06
  User: hotho | Date: 2007-01-28T16:19:49+01:00
  Tags: data dataset inbalanced learning svm

Aktienindizes Deutschland | Yahoo! FinanzenI
  URL: http://de.finance.yahoo.com/m8
  User: hotho | Date: 2009-12-21T15:03:20+01:00
  Tags: dataset finanz jpp

Algorithms for Large Data Sets: Lecture Notes & Slides
  URL: http://www.ee.technion.ac.il/courses/049011/index_files/Page337.html
  User: hotho | Date: 2006-06-23T07:42:47+02:00
  Tags: folien ir large dataset

Andrew McCallum's Code and Data
  URL: http://www.cs.umass.edu/~mccallum/code-data.html
  User: hotho | Date: 2006-05-11T09:55:41+02:00
  Tags: ie dataset bibliographic references cora
  Description: Cora Citation Matching [reference matching, object correspondence]. Text of citations hand-clustered into groups referring to the same paper.

AOL search data mirrors
  URL: http://www.gregsadetsky.com/aol-data/
  User: hotho | Date: 2006-10-07T11:43:25+02:00
  Tags: search dataset
  Description: This collection consists of ~20M web queries collected from ~650k users over three months. The data is sorted by anonymous user ID and sequentially arranged.

Benchmark Data Sets used in [RaeOnoMue01] and [MikRaeWesSchMue99]
  URL: http://ida.first.fraunhofer.de/projects/bench/benchmarks.htm
  User: hotho | Date: 2006-06-23T07:24:21+02:00
  Tags: dataset dm ida ml

Bibliography
  URL: http://www.site.uottawa.ca/~nat/Research/class_imbalance_bibli.html
  User: hotho | Date: 2006-09-19T12:09:43+02:00
  Tags: data dataset paper imbalance
  Description: Imbalance Problem

BibSonomy::faq
  URL: http://www.bibsonomy.org/faq#faq-dataset-1
  User: stumme | Date: 2008-11-28T11:01:10+01:00
  Tags: bibsonomy dataset dump

Billion Triple Challenge 2010 Dataset
  URL: http://km.aifb.kit.edu/projects/btc-2010/
  User: benz | Date: 2011-02-04T16:07:16+01:00
  Tags: billion_triple data dataset semantic semantic_web

Billion Triple Challenge 2010 Dataset
  URL: http://km.aifb.kit.edu/projects/btc-2010/
  User: hotho | Date: 2010-07-29T23:05:09+02:00
  Tags: 2010 billion challenge dataset semantic triple web

Call for Participation | Second Pascal Challenge on Large Scale Hierarchical Text classification
  URL: http://lshtc.iit.demokritos.gr/
  User: benz | Date: 2011-02-04T16:06:38+01:00
  Tags: 2011 challenge dataset dmoz text_classification wikipedia workshop
  Description: Following a successful first edition, we are pleased to announce the 2nd edition of the Large Scale Hierarchical Text Classification (LSHTC) Pascal Challenge. The LSHTC Challenge is a hierarchical text classification competition, using large datasets. This year’s challenge will increase the scale and the difficulty of the task, using data from Wikipedia (www.wikipedia.org), in addition to the ODP Web directory data (www.dmoz.org).

CLUTO - Family of Data Clustering Software Tools | Karypis Lab
  URL: http://glaros.dtc.umn.edu/gkhome/views/cluto
  User: hotho | Date: 2006-10-25T09:25:47+02:00
  Tags: clustering tools dataset dm ml

comp.lang.perl.modules | Google Groups
  URL: http://groups.google.com/group/comp.lang.perl.modules/browse_thread/thread/619db8926623c188/dd4500f068555338?lnk=st&q=perl+mysql+large+datasets&rnum=14&hl=en#dd4500f068555338
  User: hotho | Date: 2007-02-01T10:41:52+01:00
  Tags: perl large mysql dataset

CoPhIR - COntent-based Photo Image Retrieval
  URL: http://cophir.isti.cnr.it/
  User: hotho | Date: 2009-03-03T15:25:25+01:00
  Tags: audio dataset flickr ir multimedia search similarity

dataset
  URL: http://www.informatics.bangor.ac.uk/~kuncheva/activities/artificial_data.htm
  User: hotho | Date: 2006-05-24T14:14:08+02:00
  Tags: clustering dataset

Datasets
  URL: http://www.yr-bcn.es/webspam/datasets/
  User: hotho | Date: 2007-07-19T01:15:17+02:00
  Tags: dataset detection spam webspam

Datasets
  URL: http://www.niaad.liacc.up.pt/old/statlog/datasets.html
  User: hotho | Date: 2006-06-23T07:23:30+02:00
  Tags: statlog dataset dm ml

Datasets from transcripts of US Congressional floor debates
  URL: http://www.cs.cornell.edu/home/llee/data/convote.html
  User: hotho | Date: 2007-02-06T21:26:30+01:00
  Tags: classification dataset text
  Description: Congressional speech data

David Lee's Bookmarks for Corpus-based Linguists
  URL: http://devoted.to/corpora
  User: hotho | Date: 2008-04-29T15:03:05+02:00
  Tags: corpus dataset lecture nlp survey

Delve Datasets
  URL: http://www.cs.toronto.edu/~delve/data/datasets.html
  User: hotho | Date: 2006-06-23T07:18:31+02:00
  Tags: learning data delve dataset dm mining machine ml

ECML/PKDD Discovery Challenge 2006
  URL: http://www.ecmlpkdd2006.org/challenge.html
  User: hotho | Date: 2007-05-18T20:38:05+02:00
  Tags: KI2007WebMining dataset detection email spam

Enron Email Dataset
  URL: http://www.cs.cmu.edu/~enron/
  User: hotho | Date: 2007-05-18T20:38:46+02:00
  Tags: KI2007WebMining dataset email enron

Forum for Information Retrieval Evaluation (FIRE)
  URL: http://www.isical.ac.in/~fire/2010/data_download.html
  User: hotho | Date: 2011-01-07T17:52:13+01:00
  Tags: dataset evaluation information retrieval

Fundamental Clustering Problem Suite | Databionics
  URL: http://www.mathematik.uni-marburg.de/~databionics/en//?q=data
  User: hotho | Date: 2006-05-24T14:13:36+02:00
  Tags: clustering dataset
  Description: Fundamental Clustering Problem Suite

Geoffrey Sampson: Downloadable Resources
  URL: http://www.grsampson.net/Resources.html
  User: hotho | Date: 2008-04-29T12:09:45+02:00
  Tags: corpus dataset lecture nlp tm

Google Research Home
  URL: http://research.google.com/
  User: hotho | Date: 2008-01-22T10:27:09+01:00
  Tags: data dataset google research

hbz — Linked Open Data
  URL: http://www.hbz-nrw.de/projekte/linked_open_data/
  User: hotho | Date: 2010-03-16T08:22:23+01:00
  Tags: bibliothek data dataset library linked open

HepCorpus - Sinai
  URL: http://sinai.ujaen.es/wiki/index.php/HepCorpus#English_version
  User: hotho | Date: 2006-05-29T15:53:16+02:00
  Tags: text dataset corpus

Home - CKAN
  URL: http://ckan.net/
  User: hotho | Date: 2010-10-21T20:54:54+02:00
  Tags: dataset lod register semantic web

Home Page for 20 Newsgroups Data Set
  URL: http://people.csail.mit.edu/jrennie/20Newsgroups/
  User: hotho | Date: 2008-04-12T15:32:30+02:00
  Tags: 20 dataset newsgroups text
  Description: The 20 Newsgroups data set

ICT - Information and Communication Theory Group
  URL: http://ict.ewi.tudelft.nl/index.php?option=com_sections&id=178&Itemid=328
  User: hotho | Date: 2009-01-19T21:22:47+01:00
  Tags: dataset folksonomy librarything tagging

ICWSM 2009 - International AAAI Conference on Weblogs and Social Media
  URL: http://www.icwsm.org/2009/data/
  User: hotho | Date: 2008-10-23T20:45:36+02:00
  Tags: 2009 blog challenge conference data dataset social web

Index of /WBS/seb/datasets
  URL: http://www.aifb.uni-karlsruhe.de/WBS/seb/datasets/
  User: hotho | Date: 2007-09-20T12:10:48+02:00
  Tags: dataset relation

Infochimps Data Marketplace / Commons: Download Sell or Share Databases, statistics, data sets for free
  URL: http://infochimps.org/
  User: benz | Date: 2011-02-04T16:07:23+01:00
  Tags: data dataset datasets download search
  Description: Find and download data in any format, from financial to social networking to GIS data. Or sell data in our data marketplace, at a price you set. We have large data sets, spreadsheets, and databases packed with statistics.

Learning Question Classifiers
  URL: http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/
  User: hotho | Date: 2006-10-11T10:27:47+02:00
  Tags: qa classification dataset

LETOR: Benchmark Data Sets for Learning to Rank
  URL: http://research.microsoft.com/research/downloads/details/22a1b3e9-c5c6-4cfe-86f9-1d2ea1c199e8/details.aspx
  User: hotho | Date: 2007-04-17T09:15:32+02:00
  Tags: benchmark dataset ranking

LETOR: Benchmark Datasets for Learning to Rank
  URL: http://research.microsoft.com/users/tyliu/LETOR/
  User: hotho | Date: 2008-01-01T13:56:17+01:00
  Tags: benchmark dataset learning microsoft ranking

Linguist List - Web Resource Listings
  URL: http://www.linguistlist.org/sp/Texts.html
  User: hotho | Date: 2008-04-29T12:06:42+02:00
  Tags: corpus dataset lecture nlp

Lost Boy: SPARQLing the BBC Programme Catalogue
  URL: http://www.ldodds.com/blog/archives/000272.html
  User: hotho | Date: 2006-04-27T12:05:58+02:00
  Tags: data dataset rdf

Manuel Barbera, Corpus based computational linguistic resources. General: E-Texts (§ 2.3).
  URL: http://www.bmanuel.org/clr2_et.html
  User: hotho | Date: 2006-05-26T08:21:51+02:00
  Tags: text dataset corpus
  Description: Electronic Literary Text Archives.

Martin Hepp
  URL: http://www.heppnetz.de/eclassowl/
  User: hotho | Date: 2006-06-19T10:00:33+02:00
  Tags: ontology dataset

Measuring User Influence in Twitter
  URL: http://twitter.mpi-sws.org/
  User: hotho | Date: 2011-01-03T12:57:32+01:00
  Tags: dataset paper toread twitter

Mendeley's DataTEL Data Set | Mendeley Developers Portal
  URL: http://dev.mendeley.com/datachallenge/
  User: hotho | Date: 2010-11-14T15:54:58+01:00
  Tags: data dataset datatel mendeley set todo
  Description: Mendeley's DataTEL Data Set

Mining of Massive Datasets
  URL: http://i.stanford.edu/~ullman/mmds.html
  User: benz | Date: 2011-02-04T16:06:37+01:00
  Tags: data data_mining dataset massive

Mining of Massive Datasets
  URL: http://i.stanford.edu/~ullman/mmds.html
  User: hotho | Date: 2011-01-24T11:10:59+01:00
  Tags: book massive mining pdf slides dataset

Miscellaneous MATLAB Software, Data, Tricks and Demonstrations
  URL: http://theoval.sys.uea.ac.uk/matlab/default.html#benchmarks
  User: hotho | Date: 2006-06-23T09:00:57+02:00
  Tags: benchmark dataset dm matlab ml kernel
  Description: Gunnar Raetsch's Benchmark Datasets

MPQA Releases
  URL: http://www.cs.pitt.edu/mpqa/
  User: hotho | Date: 2010-03-17T11:31:14+01:00
  Tags: corpus dataset mpqa opinion

much.more
  URL: http://muchmore.dfki.de/resources_index.htm
  User: hotho | Date: 2006-04-07T10:58:58+02:00
  Tags: dataset corpus
  Description: A number of resources have been compiled within the context of the MuchMore project. These include: a bilingual, parallel medical corpus; corresponding queries and relevance assessments; evaluation sets of disambiguated terms for GermaNet and UMLS; an evaluation list for morphological decomposition of medical terms.

Multext
  URL: http://aune.lpl.univ-aix.fr/projects/multext/
  User: hotho | Date: 2007-11-16T17:36:20+01:00
  Tags: corpus dataset text

Multilabel Classification
  URL: http://mlkd.csd.auth.gr/multilabel.html
  User: hotho | Date: 2007-11-23T13:12:59+01:00
  Tags: classification dataset extension multilabel text tools weka
  Description: Multi-Label Classification

NEC Animal Dataset
  URL: http://ml.nec-labs.com/download/data/videoembed/
  User: hotho | Date: 2009-05-17T08:48:16+02:00
  Tags: animal dataset evaluation nec

Netflix Prize: Home
  URL: http://www.netflixprize.com/
  User: hotho | Date: 2006-10-05T22:08:28+02:00
  Tags: recommender movie dataset preis

Network data
  URL: http://www-personal.umich.edu/~mejn/netdata/
  User: hotho | Date: 2009-11-05T08:54:11+01:00
  Tags: data network research dataset

Obtaining corpora and text collections for biomedical natural language processing
  URL: http://compbio.uchsc.edu/corpora/obtaining.shtml
  User: hotho | Date: 2006-01-31T18:10:51+01:00
  Tags: dataset nlp bio

Omega Ontology: Home
  URL: http://omega.isi.edu/
  User: hotho | Date: 2006-06-14T06:19:56+02:00
  Tags: ontology omega dataset nlp

Online Data - Robert Shiller
  URL: http://www.econ.yale.edu/~shiller/data.htm
  User: hotho | Date: 2009-12-21T14:40:43+01:00
  Tags: dataset jpp

Online Social Network-dataset now available « Tore Opsahl
  URL: http://toreopsahl.com/2009/11/10/online-social-network-dataset-now-available/
  User: hotho | Date: 2010-04-30T15:43:34+02:00
  Tags: dataset network social

Pajek / How to: Convert text file datasets into Pajek format
  URL: http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/text2pajek.htm
  User: hotho | Date: 2007-01-26T13:34:34+01:00
  Tags: convert dataset pajek

Public Data Sets on Amazon Web Services (AWS)
  URL: http://aws.amazon.com/publicdatasets/
  User: hotho | Date: 2009-01-06T18:07:54+01:00
  Tags: amazon dataset ontology public

Researchers Yearn to Use AOL Logs, but They Hesitate - New York Times
  URL: http://www.nytimes.com/2006/08/23/technology/23search.html?ei=5088&en=cc878412ed34dad0&ex=1313985600&partner=rssnyt&emc=rss&pagewanted=all
  User: hotho | Date: 2007-02-19T12:49:31+01:00
  Tags: presse dataset aol

Semantic Matching
  URL: http://semanticmatching.org/
  User: hotho | Date: 2010-08-09T20:31:40+02:00
  Tags: dataset geonames wordnet
  Description: S-Match is an open source Java framework for semantic matching. It contains semantic matching, minimal semantic matching and structure preserving semantic matching algorithm implementations.

semantically_annotated_snapshot_of_wikipedia
  URL: http://www.yr-bcn.es/semanticWikipedia
  User: hotho | Date: 2009-04-09T10:41:38+02:00
  Tags: tagging dataset wikipedia semantic pos

Seuchen-Prognose: Forscher finden das Gesetz des Reisens - Wissenschaft - SPIEGEL ONLINE - Nachrichten
  URL: http://www.spiegel.de/wissenschaft/mensch/0,1518,397303,00.html
  User: hotho | Date: 2006-09-04T15:42:51+02:00
  Tags: bewegung dollar dataset reise vorhersagen

Show Us a Better Way: What public data is already available?
  URL: http://www.showusabetterway.co.uk/call/data.html
  User: hotho | Date: 2008-07-03T14:42:07+02:00
  Tags: data dataset public

SNAP: Network datasets: 476 million Twitter tweets
  URL: http://snap.stanford.edu/data/twitter7.html
  User: hotho | Date: 2010-12-05T19:59:23+01:00
  Tags: dataset network twitter

SNAP: Stanford Network Analysis Platform
  URL: http://snap.stanford.edu/
  User: hotho | Date: 2010-04-29T16:44:14+02:00
  Tags: analysis dataset network snap software stanford tools

Social Network Data
  URL: http://www.angela-bohn.de/data.html
  User: benz | Date: 2011-02-04T16:07:16+01:00
  Tags: data dataset sna social_network

Social Network Data
  URL: http://www.angela-bohn.de/data.html
  User: hotho | Date: 2010-07-21T17:13:35+02:00
  Tags: sna dataset

Social Spam Detection Benjamin Markines Ciro Cattuto Filippo Menczer
  URL: http://givealink.org/Site/socialspam.html
  User: hotho | Date: 2009-04-01T17:04:55+02:00
  Tags: detection dataset classification bibsonomy spam
  Description: Social Spam Detection

Some code and datasets
  URL: http://www.kyb.mpg.de/bs/people/pgehler/code/index.html
  User: hotho | Date: 2008-10-10T17:20:02+02:00
  Tags: clustering code matlab plsa dataset

SourceForge.net: Files
  URL: http://sourceforge.net/project/showfiles.php?group_id=5091&package_id=95362&release_id=399264
  User: hotho | Date: 2006-03-07T08:26:04+01:00
  Tags: weka text dataset
  Description: New text datasets (donated by George Forman) are available for download on Sourceforge:

Spam dataset
  URL: http://plg.uwaterloo.ca/~gvcormac/treccorpus07/
  User: benz | Date: 2011-02-04T16:07:08+01:00
  Tags: dataset spam

Spam Dataset Trec
  URL: http://plg1.cs.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo07
  User: hotho | Date: 2010-08-16T14:03:26+02:00
  Tags: dataset spam trec

Springer Exemplar
  URL: http://www.springerexemplar.com/
  User: hotho | Date: 2010-10-08T15:15:20+02:00
  Tags: dataset extraction springer term

Stack Overflow Creative Commons Data Dump - Blog – Stack Overflow
  URL: http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/
  User: benz | Date: 2011-02-04T16:06:58+01:00
  Tags: data dataset stackoverflow

Stanford Computer Science
  URL: http://cs.stanford.edu/research/project.php?id=121
  User: hotho | Date: 2007-07-19T01:31:59+02:00
  Tags: crawl dataset web

Summary - Scientext
  URL: http://scientext.msh-alpes.fr/scientext-site-en/spip.php?article1
  User: benz | Date: 2011-02-04T16:06:37+01:00
  Tags: dataset english french science scientext texts
  Description: Scientext is a new, on-line French and English corpus of scientific texts. The corpus includes 4.8 million running tokens in French, 13 million words of research articles in English (medicine and biology), and an English-language sub-corpus of French undergraduate students’ texts (1.1 million words). The corpus is organized to facilitate the linguistic study of authorial position and reasoning in scientific articles through phraseology and lexico-grammatical markers linked to causality.

Tastes, Ties, and Time: Facebook data release | Berkman Center
  URL: http://cyber.law.harvard.edu/node/4682
  User: hotho | Date: 2009-01-29T15:46:42+01:00
  Tags: Facebook dataset
  Description: llaboration with Harvard sociology graduate stu

The ClueWeb09 Dataset
  URL: http://boston.lti.cs.cmu.edu/Data/clueweb09/
  User: benz | Date: 2011-02-04T16:06:58+01:00
  Tags: clueweb dataset research web

The ClueWeb09 Dataset
  URL: http://boston.lti.cs.cmu.edu/Data/clueweb09/
  User: hotho | Date: 2009-07-03T09:29:44+02:00
  Tags: clueweb09 dataset web

The Financial Data Finder
  URL: http://fisher.osu.edu/fin/osudown.htm
  User: hotho | Date: 2009-12-21T14:42:44+01:00
  Tags: dataset jpp stock

The Linking Open Data cloud diagram
  URL: http://richard.cyganiak.de/2007/10/lod/
  User: hotho | Date: 2010-09-23T09:46:08+02:00
  Tags: cloud dataset linked open semantic web

The QWS Dataset
  URL: http://www.uoguelph.ca/~qmahmoud/qws/
  User: hotho | Date: 2007-12-07T21:02:40+01:00
  Tags: answer dataset question semantic service web

Trec Spam Corpus
  URL: http://plg.uwaterloo.ca/~gvcormac/treccorpus/
  User: hotho | Date: 2006-09-04T15:42:51+02:00
  Tags: trec spam set data dataset corpus

Trust network datasets - TrustLet
  URL: http://www.trustlet.org/wiki/Trust_network_datasets
  User: hotho | Date: 2008-02-14T09:48:49+01:00
  Tags: dataset network

UCI Machine Learning Repository
  URL: http://www.ics.uci.edu/~mlearn/MLRepository.html
  User: hotho | Date: 2006-06-23T07:18:45+02:00
  Tags: learning data dataset dm mining machine ml uci

Universität Bern - Departement Mathematik und Statistik - Datensätze (IMSV)
  URL: http://www.math-stat.unibe.ch/content/lehrveranstaltungen/skripten_etc/datasets_imsv/index_ger.html
  User: hotho | Date: 2009-12-21T14:53:57+01:00
  Tags: dataset jpp
  Description: von US-amerikanischen Ba

Useful Data Sets
  URL: http://pages.stern.nyu.edu/~adamodar/New_Home_Page/data.html
  User: hotho | Date: 2009-12-21T14:40:53+01:00
  Tags: jpp dataset

Web Community Dataset
  URL: http://affsys.com/experiments/HT2008/
  User: hotho | Date: 2008-06-21T20:33:47+02:00
  Tags: community dataset ht08 hypertext08 web

Web Information Retrieval / Natural Language Processing Group (WING) - NLP/IR resource page on aye
  URL: http://wing.comp.nus.edu.sg/portal/RPNLPIR/
  User: hotho | Date: 2007-03-23T15:16:48+01:00
  Tags: dataset information ir nlp resource retrieval web

Webscope from Yahoo! Labs
  URL: http://webscope.sandbox.yahoo.com/
  User: hotho | Date: 2009-10-23T10:00:30+02:00
  Tags: yahoo dataset

Welcome to the UCR Time Series Classification/Clustering Page
  URL: http://www.cs.ucr.edu/~eamonn/time_series_data/
  User: hotho | Date: 2006-06-02T18:24:45+02:00
  Tags: dataset
  Description: Welcome to the UCR Time Series Classification/Clustering Page

What is Twitter, a Social Network or a News Media? - WWW'10
  URL: http://an.kaist.ac.kr/traces/WWW2010.html
  User: benz | Date: 2011-02-04T16:07:23+01:00
  Tags: dataset twitter www www2010

Where's George? 
® 2.2http://www.wheresgeorge.com/hotho2006-09-04T15:42:51+02:00dollar dataset <a itemprop="url" data-versiondate="2006-09-04T15:42:51+02:00" href="http://www.wheresgeorge.com/" rel="nofollow" class="description-link">http://www.wheresgeorge.com/</a>Yahoo datasetshttp://www.stanford.edu/class/cs345a/YahooData.pdfhotho2009-03-13T16:26:34+01:00dataset yahoo <a itemprop="url" data-versiondate="2009-03-13T16:26:34+01:00" href="http://www.stanford.edu/class/cs345a/YahooData.pdf" rel="nofollow" class="description-link">http://www.stanford.edu/class/cs345a/YahooData.pdf</a>Yahoo! Learning to Rank Challenge -http://learningtorankchallenge.yahoo.com/hotho2010-02-26T13:47:48+01:00challenge learning rank search wettbewerb yahoo dataset <a itemprop="url" data-versiondate="2010-02-26T13:47:48+01:00" href="http://learningtorankchallenge.yahoo.com/" rel="nofollow" class="description-link">http://learningtorankchallenge.yahoo.com/</a>