TY - CONF AU - Sinha, Arnab AU - Shen, Zhihong AU - Song, Yang AU - Ma, Hao AU - Eide, Darrin AU - Hsu, Bo-June Paul AU - Wang, Kuansan A2 - Gangemi, Aldo A2 - Leonardi, Stefano A2 - Panconesi, Alessandro T1 - An Overview of Microsoft Academic Service (MAS) and Applications. T2 - WWW (Companion Volume) PB - ACM C1 - PY - 2015/ CY - VL - IS - SP - 243 EP - 246 UR - http://dblp.uni-trier.de/db/conf/www/www2015c.html#SinhaSSMEHW15 DO - KW - MSAC KW - dataset KW - toread L1 - SN - 978-1-4503-3473-0 N1 - N1 - AB - ER - TY - JOUR AU - Adomavicius, Gediminas AU - Zhang, Jingjing T1 - Impact of Data Characteristics on Recommender Systems Performance JO - ACM Trans. Manage. Inf. Syst. PY - 2012/04 VL - 3 IS - 1 SP - 3:1 EP - 3:17 UR - http://doi.acm.org/10.1145/2151163.2151166 DO - 10.1145/2151163.2151166 KW - characteristics KW - dataset KW - dependence KW - evaluation KW - model KW - recommender L1 - SN - N1 - Impact of data characteristics on recommender systems performance N1 - AB - This article investigates the impact of rating data characteristics on the performance of several popular recommendation algorithms, including user-based and item-based collaborative filtering, as well as matrix factorization. We focus on three groups of data characteristics: rating space, rating frequency distribution, and rating value distribution. A sampling procedure was employed to obtain different rating data subsamples with varying characteristics; recommendation algorithms were used to estimate the predictive accuracy for each sample; and linear regression-based models were used to uncover the relationships between data characteristics and recommendation accuracy. Experimental results on multiple rating datasets show the consistent and significant effects of several data characteristics on recommendation accuracy. ER - TY - JOUR AU - Zubiaga, Arkaitz AU - Fresno, Victor AU - Martinez, Raquel AU - Garcia-Plaza, Alberto P. 
T1 - Harnessing Folksonomies to Produce a Social Classification of Resources JO - IEEE Transactions on Knowledge and Data Engineering PY - 2012/ VL - 99 IS - PrePrints SP - EP - UR - http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.115 DO - 10.1109/TKDE.2012.115 KW - classification KW - delicious KW - folksonomy KW - tagging KW - toread KW - dataset L1 - SN - N1 - N1 - AB - ER - TY - JOUR AU - La Rowe, Gavin AU - Ambre, Sumeet AU - Burgoon, John AU - Ke, Weimao AU - Börner, Katy T1 - The Scholarly Database and its utility for scientometrics research JO - Scientometrics PY - 2009/ VL - 79 IS - 2 SP - 219 EP - 234 UR - http://dx.doi.org/10.1007/s11192-009-0414-2 DO - 10.1007/s11192-009-0414-2 KW - analysis KW - database KW - dataset KW - gaw KW - science KW - scientometrics KW - sdb KW - sota L1 - SN - N1 - N1 - AB - The Scholarly Database aims to serve researchers and practitioners interested in the analysis, modelling, and visualization of large-scale data sets. A specific focus of this database is to support macro-evolutionary studies of science and to communicate findings via knowledge-domain visualizations. Currently, the database provides access to about 18 million publications, patents, and grants. About 90% of the publications are available in full text. Except for some datasets with restricted access conditions, the data can be retrieved in raw or pre-processed formats using either a web-based or a relational database client. This paper motivates the need for the database from the perspective of bibliometric/scientometric research. It explains the database design, setup, etc., and reports the temporal, geographical, and topic coverage of data sets currently served via the database. Planned work and the potential for this database to become a global testbed for information science research are discussed at the end of the paper. 
ER - TY - JOUR AU - Capocci, Andrea AU - Caldarelli, Guido T1 - Folksonomies and clustering in the collaborative system CiteULike JO - Journal of Physics A: Mathematical and Theoretical PY - 2008/ VL - 41 IS - 22 SP - EP - UR - http://stacks.iop.org/1751-8121/41/224016 DO - KW - *** KW - citeulike KW - clustering KW - dataset KW - folksonomy KW - network KW - properties L1 - SN - N1 - N1 - AB - We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tri-partite graph whose nodes represent papers, users and tags connected by individual tag assignments. The semantics of tags is studied here, in order to uncover the hidden relationships between tags. We find that the clustering coefficient can be used to analyze the semantical patterns among tags. ER - TY - CONF AU - Caverlee, James AU - Webb, Steve A2 - T1 - A Large-Scale Study of MySpace: Observations and Implications for Online Social Networks T2 - Proceedings from the 2nd International Conference on Weblogs and Social Media (AAAI) PB - C1 - PY - 2008/ CY - VL - IS - SP - EP - UR - http://faculty.cs.tamu.edu/caverlee/pubs/caverlee08alarge.pdf DO - KW - analysis KW - dataset KW - myspace KW - networking KW - social L1 - SN - N1 - CiteULike: A Large-Scale Study of MySpace: Observations and Implications for Online Social Networks N1 - AB - ER - TY - CONF AU - Narayanan, Arvind AU - Shmatikov, Vitaly A2 - T1 - Robust De-anonymization of Large Sparse Datasets T2 - Proc. of the 29th IEEE Symposium on Security and Privacy PB - IEEE Computer Society C1 - PY - 2008/05 CY - VL - IS - SP - 111 EP - 125 UR - http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf DO - 10.1109/SP.2008.33 KW - anonymization KW - datamining KW - dataset KW - netflix KW - privacy KW - recommender KW - toread L1 - SN - N1 - N1 - AB - We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information. ER -
TY - CONF AU - Song, Yang AU - Zhang, Lu AU - Giles, C. Lee A2 - T1 - A sparse gaussian processes classification framework for fast tag suggestions T2 - CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management PB - ACM C1 - New York, NY, USA PY - 2008/ CY - VL - IS - SP - 93 EP - 102 UR - http://portal.acm.org/citation.cfm?id=1458098 DO - 10.1145/1458082.1458098 KW - bibsonomy KW - bookmarking KW - classification KW - dataset KW - ml KW - recommender KW - social KW - tag KW - tagging KW - taggingsurvey KW - toread L1 - SN - 978-1-59593-991-3 N1 - A sparse gaussian processes classification framework for fast tag suggestions N1 - AB - ER - TY - CONF AU - Hassan-Montero, Y. AU - Herrero-Solana, V. A2 - T1 - Improving Tag-Clouds as Visual Information Retrieval Interfaces T2 - InScit2006: International Conference on Multidisciplinary Information Sciences and Technologies PB - C1 - PY - 2006/ CY - VL - IS - SP - EP - UR - http://nosolousabilidad.com/hassan/improving_tagclouds.pdf DO - KW - clouds KW - dataset KW - del.icio.us KW - information KW - tag KW - tagging KW - taggingsurvey KW - toread KW - visual L1 - SN - N1 - N1 - AB - Tagging-based systems enable users to categorize web resources by means of tags (freely chosen keywords), in order to re-finding these resources later. Tagging is implicitly also a social indexing process, since users share their tags and resources, constructing a social tag index, so-called folksonomy. At the same time of tagging-based system, has been popularised an interface model for visual information retrieval known as Tag-Cloud. In this model, the most frequently used tags are displayed in alphabetical order. This paper presents a novel approach to Tag-Cloud's tags selection, and proposes the use of clustering algorithms for visual layout, with the aim of improve browsing experience. The results suggest that presented approach reduces the semantic density of tag set, and improves the visual consistency of Tag-Cloud layout. 
ER - TY - CONF AU - Liu, Vinci AU - Curran, James R. A2 - T1 - Web Text Corpus for Natural Language Processing. T2 - EACL PB - The Association for Computer Linguistics C1 - PY - 2006/ CY - VL - IS - SP - EP - UR - http://dblp.uni-trier.de/db/conf/eacl/eacl2006.html#LiuC06 DO - KW - corpus KW - dataset KW - web KW - synonym_detection KW - nlp L1 - SN - 1-932432-59-0 N1 - dblp N1 - AB - ER - TY - GEN AU - Narayanan, Arvind AU - Shmatikov, Vitaly A2 - T1 - How To Break Anonymity of the Netflix Prize Dataset JO - PB - C1 - PY - 2006/ VL - IS - SP - EP - UR - http://www.citebase.org/abstract?id=oai:arXiv.org:cs/0610105 DO - KW - Preis KW - anonymity KW - dataset KW - netflix KW - prize KW - recommender L1 - N1 - [cs/0610105] How To Break Anonymity of the Netflix Prize Dataset N1 - AB - We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. ER - TY - JOUR AU - McRae, K AU - Cree, G S AU - Seidenberg, M S AU - McNorgan, C T1 - Semantic feature production norms for a large set of living and nonliving things JO - Behav Res Methods PY - 2005/11 VL - 37 IS - 4 SP - 547 EP - 559 UR - http://www.ncbi.nlm.nih.gov/pubmed/16629288 DO - KW - dataset KW - grounding KW - ol KW - ontology KW - relation KW - semantic KW - toread L1 - SN - N1 - Semantic feature production norms for a large set ...[Behav Res Methods. 2005] - PubMed Result N1 - AB - Semantic features have provided insight into numerous behavioral phenomena concerning concepts, categorization, and semantic memory in adults, children, and neuropsychological populations. Numerous theories and models in these areas are based on representations and computations involving semantic features. 
Consequently, empirically derived semantic feature production norms have played, and continue to play, a highly useful role in these domains. This article describes a set of feature norms collected from approximately 725 participants for 541 living (dog) and nonliving (chair) basic-level concepts, the largest such set of norms developed to date. This article describes the norms and numerous statistics associated with them. Our aim is to make these norms available to facilitate other research, while obviating the need to repeat the labor-intensive methods involved in collecting and analyzing such norms. The full set of norms may be downloaded from www.psychonomic.org/archive. ER - TY - GEN AU - Newman, D.J. AU - Blake, C.L. AU - Merz, C.J. A2 - T1 - UCI Repository of machine learning databases JO - PB - C1 - PY - 1998/ VL - IS - SP - EP - UR - http://www.ics.uci.edu/~mlearn/MLRepository.html DO - KW - learning KW - data KW - dataset KW - dm KW - mining KW - machine KW - ml KW - uci L1 - N1 - UCI Machine Learning Repository N1 - AB - ER -