Publications

Semantic feature production norms for a large set of living and nonliving things

McRae, K.; Cree, G. S.; Seidenberg, M. S. & McNorgan, C.

Behav Res Methods, 37(4) 547-559 (2005) [pdf]

Semantic features have provided insight into numerous behavioral phenomena concerning concepts, categorization, and semantic memory in adults, children, and neuropsychological populations. Numerous theories and models in these areas are based on representations and computations involving semantic features. Consequently, empirically derived semantic feature production norms have played, and continue to play, a highly useful role in these domains. This article describes a set of feature norms collected from approximately 725 participants for 541 living (dog) and nonliving (chair) basic-level concepts, the largest such set of norms developed to date. This article describes the norms and numerous statistics associated with them. Our aim is to make these norms available to facilitate other research, while obviating the need to repeat the labor-intensive methods involved in collecting and analyzing such norms. The full set of norms may be downloaded from www.psychonomic.org/archive.

Robust De-anonymization of Large Sparse Datasets

Narayanan, A. & Shmatikov, V.

'Proc. of the 29th IEEE Symposium on Security and Privacy', IEEE Computer Society, [10.1109/SP.2008.33], 111-125 (2008) [pdf]

We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.

Improving Tag-Clouds as Visual Information Retrieval Interfaces

Hassan-Montero, Y. & Herrero-Solana, V.

'InScit2006: International Conference on Multidisciplinary Information Sciences and Technologies' (2006) [pdf]

Tagging-based systems enable users to categorize web resources by means of tags (freely chosen keywords), in order to re-find these resources later. Tagging is implicitly also a social indexing process, since users share their tags and resources, constructing a social tag index, the so-called folksonomy. Alongside tagging-based systems, an interface model for visual information retrieval known as the Tag-Cloud has been popularised. In this model, the most frequently used tags are displayed in alphabetical order. This paper presents a novel approach to Tag-Cloud's tag selection, and proposes the use of clustering algorithms for visual layout, with the aim of improving the browsing experience. The results suggest that the presented approach reduces the semantic density of the tag set, and improves the visual consistency of the Tag-Cloud layout.

Harnessing Folksonomies to Produce a Social Classification of Resources

Zubiaga, A.; Fresno, V.; Martinez, R. & Garcia-Plaza, A. P.

IEEE Transactions on Knowledge and Data Engineering, 99(PrePrints) (2012)

An Overview of Microsoft Academic Service (MAS) and Applications

Sinha, A.; Shen, Z.; Song, Y.; Ma, H.; Eide, D.; Hsu, B.-J. P. & Wang, K.

Gangemi, A.; Leonardi, S. & Panconesi, A., ed., 'WWW (Companion Volume)', ACM, 243-246 (2015) [pdf]

A sparse Gaussian processes classification framework for fast tag suggestions

Song, Y.; Zhang, L. & Giles, C. L.

'CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management', ACM, New York, NY, USA, [http://doi.acm.org/10.1145/1458082.1458098], 93-102 (2008) [pdf]