Publications

From Frequency to Meaning: Vector Space Models of Semantics

Turney, P. D. & Pantel, P.

(2010) [pdf]

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.

The Wisdom in Tweetonomies: Acquiring Latent Conceptual Structures from Social Awareness Streams

Wagner, C. & Strohmaier, M.

'Proc. of the Semantic Search 2010 Workshop (SemSearch2010)' (2010) [pdf]

Although one might argue that little wisdom can be conveyed in messages of 140 characters or less, this paper sets out to explore whether the aggregation of messages in social awareness streams, such as Twitter, conveys meaningful information about a given domain. As a research community, we know little about the structural and semantic properties of such streams, and how they can be analyzed, characterized and used. This paper introduces a network-theoretic model of social awareness streams, a so-called "tweetonomy", together with a set of stream-based measures that allow researchers to systematically define and compare different stream aggregations. We apply the model and measures to a dataset acquired from Twitter to study emerging semantics in selected streams. The network-theoretic model and the corresponding measures introduced in this paper are relevant for researchers interested in information retrieval and ontology learning from social awareness streams. Our empirical findings demonstrate that different social awareness stream aggregations exhibit interesting differences, making them amenable to different applications.

Semantic Categorization Using Simple Word Co-occurrence Statistics

Bullinaria, J.

ESSLLI Workshop on Distributional Lexical Semantics (2008)

Wissensverarbeitung und die Semantik der natürlichen Sprache: Wissensrepräsentation mit MultiNet

Helbig, H.

2008, Springer, Berlin, [10.1007/978-3-540-76278-2]

This book gives a comprehensive account of a methodology for interpreting natural-language expressions and representing their meaning. This methodology, Multilayered Extended Semantic Networks (MultiNet), is suited both to theoretical investigations and to the automatic processing of natural language on a computer. The results presented are embedded in a system of software tools that ensure practical use of the MultiNet representational means as a formalism for meaning representation. These tools include a workbench for the knowledge engineer, a translation system for automatically deriving meaning representations of natural-language sentences, and a workbench for the computational lexicographer.

Enhancing text clustering by leveraging Wikipedia semantics

Hu, J.; Fang, L.; Cao, Y.; Zeng, H.-J.; Li, H.; Yang, Q. & Chen, Z.

Myaeng, S.-H.; Oard, D. W.; Sebastiani, F.; Chua, T.-S. & Leong, M.-K., ed., 'SIGIR', ACM, 179-186 (2008) [pdf]

Social tags: meaning and suggestions

Suchanek, F. M.; Vojnovic, M. & Gunawardena, D.

'Proceeding of the 17th ACM conference on Information and knowledge management', CIKM '08, ACM, New York, NY, USA, [10.1145/1458082.1458114], 223-232 (2008) [pdf]

This paper aims to quantify two common assumptions about social tagging: (1) that tags are "meaningful" and (2) that the tagging process is influenced by tag suggestions. For (1), we analyze the semantic properties of tags and the relationship between the tags and the content of the tagged page. Our analysis is based on a corpus of search keywords, contents, titles, and tags applied to several thousand popular Web pages. Among other results, we find that the more popular tags of a page tend to be the more meaningful ones. For (2), we develop a model of how the influence of tag suggestions can be measured. From a user study with over 4,000 participants, we conclude that roughly one third of the tag applications may be induced by the suggestions. Our results would be of interest for designers of social tagging systems and are a step towards understanding how to best leverage social tags for applications such as search and information extraction.

Extracting semantic representations from word co-occurrence statistics: A computational study

Bullinaria, J. & Levy, J.

Behavior Research Methods 510 (2007)

The statistics of word cooccurrences: word pairs and collocations

Evert, S.

Unpublished doctoral dissertation, Institut für maschinelle Sprachverarbeitung, Universität Stuttgart (2004) [pdf]

Producing high-dimensional semantic spaces from lexical co-occurrence

Lund, K. & Burgess, C.

Behavior Research Methods Instruments and Computers, 28(2) 203-208 (1996) [pdf]