ConDist: A Context-Driven Categorical Distance Measure.
In: ECML PKDD 2015. 2015.
Markus Ring, Florian Otto, Martin Becker, Thomas Niebler, Dieter Landes and Andreas Hotho.

Linguistic Regularities in Sparse and Explicit Word Representations.
In: R. Morante and W.-t. Yih (editors): CoNLL, pages 171-180. ACL, 2014.
Omer Levy and Yoav Goldberg.

Characterizing Semantic Relatedness of Search Query Terms.
In: Proceedings of the 1st Workshop on Explorative Analytics of Information Networks (EIN2009). Bled, Slovenia, 2009.
Dominik Benz, Beate Krause, G. Praveen Kumar, Andreas Hotho and Gerd Stumme.

Evaluating Similarity Measures for Emergent Semantics of Social Tagging.
In: 18th International World Wide Web Conference, pages 641-650. 2009.
Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho and Gerd Stumme.

Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems.
2008.
Ciro Cattuto, Dominik Benz, Andreas Hotho and Gerd Stumme.

Locally Expandable Allocation of Folksonomy Tags in a Directed Acyclic Graph.
In: J. Bailey, D. Maier, K.-D. Schewe, B. Thalheim and X. S. Wang (editors): WISE, volume 5175 of Lecture Notes in Computer Science, pages 151-162. Springer, 2008.
Takeharu Eda, Masatoshi Yoshikawa and Masashi Yamamuro.

Extracting semantic relations from query logs.
In: KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 76-85. ACM, New York, NY, USA, 2007.
Ricardo Baeza-Yates and Alessandro Tiberi.

Measuring semantic similarity between words using web search engines.
In: WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 757-766. ACM, New York, NY, USA, 2007.
Danushka Bollegala, Yutaka Matsuo and Mitsuru Ishizuka.

Time-dependent semantic similarity measure of queries using historical click-through data.
In: WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 543-552. ACM Press, New York, NY, USA, 2006.
Qiankun Zhao, Steven C. H. Hoi, Tie-Yan Liu, Sourav S. Bhowmick, Michael R. Lyu and Wei-Ying Ma.

Detecting Similarities in Ontologies with the SOQA-SimPack Toolkit.
In: Y. Ioannidis, M. H. Scholl, J. W. Schmidt, F. Matthes, M. Hatzopoulos, K. Boehm, A. Kemper, T. Grust and C. Boehm (editors): 10th International Conference on Extending Database Technology (EDBT 2006), volume 3896 of Lecture Notes in Computer Science, pages 59-76. Springer, Munich, Germany, March 26-31, 2006.
Patrick Ziegler, Christoph Kiefer, Christoph Sturm, Klaus R. Dittrich and Abraham Bernstein.

From Distributional to Semantic Similarity.
PhD thesis, Institute for Communicating and Collaborative Systems, School of Informatics, University of Edinburgh, 2003.
James Richard Curran.

Lexical-semantic resources, including thesauri and WordNet, have been successfully incorporated into a wide range of applications in Natural Language Processing. However, they are very difficult and expensive to create and maintain, and their usefulness has been severely hampered by their limited coverage, bias and inconsistency. Automated and semi-automated methods for developing such resources are therefore crucial for further resource development and improved application performance. Systems that extract thesauri often identify similar words using the distributional hypothesis that similar words appear in similar contexts. This approach involves using corpora to examine the contexts each word appears in and then calculating the similarity between context distributions. Different definitions of context can be used, and I begin by examining how different types of extracted context influence similarity. To be of most benefit these systems must be capable of finding synonyms for rare words. Reliable context counts for rare events can only be extracted from vast collections of text. In this dissertation I describe how to extract contexts from a corpus of over 2 billion words. I describe techniques for processing text on this scale and examine the trade-off between context accuracy, information content and quantity of text analysed. Distributional similarity is at best an approximation to semantic similarity. I develop improved approximations motivated by the intuition that some events in the context distribution are more indicative of meaning than others. For instance, the object-of-verb context wear is far more indicative of a clothing noun than get. However, existing distributional techniques do not effectively utilise this information. The new context-weighted similarity metric I propose in this dissertation significantly outperforms every distributional similarity metric described in the literature.
Nearest-neighbour similarity algorithms scale poorly with vocabulary and context vector size. To overcome this problem I introduce a new context-weighted approximation algorithm with bounded complexity in context vector size that significantly reduces the system runtime with only a minor performance penalty. I also describe a parallelized version of the system that runs on a Beowulf cluster for the 2 billion word experiments. To evaluate the context-weighted similarity measure I compare ranked similarity lists against gold-standard resources using precision and recall-based measures from Information Retrieval, since the alternative, application-based evaluation, can often be influenced by distributional as well as semantic similarity. I also perform a detailed analysis of the final results using WordNet. Finally, I apply my similarity metric to the task of assigning words to WordNet semantic categories. I demonstrate that this new approach outperforms existing methods and overcomes some of their weaknesses.

Building Hypertext Links By Computing Semantic Similarity.
IEEE Transactions on Knowledge and Data Engineering, 11:713-730, 1999.
S.J. Green.

Measures of distributional similarity.
In: Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 25-32. Association for Computational Linguistics, Morristown, NJ, USA, 1999.
Lillian Lee.

Syntactic clustering of the Web.
Computer Networks and ISDN Systems, 29(8-13):1157-1166, 1997.
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse and Geoffrey Zweig.

Semantic similarity based on corpus statistics and lexical taxonomy.
CoRR, cmp-lg/9709008, 1997.
Jay J. Jiang and David W. Conrath.

The limitations of term co-occurrence data for query expansion in document retrieval systems.
Journal of the American Society for Information Science, 42(5):378-383, 1991.
Helen J. Peat and Peter Willett.