TY - CONF AU - Ring, Markus AU - Otto, Florian AU - Becker, Martin AU - Niebler, Thomas AU - Landes, Dieter AU - Hotho, Andreas A2 - ECMLPKDD2015 T1 - ConDist: A Context-Driven Categorical Distance Measure T2 - PB - C1 - PY - 2015/ CY - VL - IS - SP - EP - UR - DO - KW - 2015 KW - categorical KW - data KW - learning KW - measure KW - myown KW - similarity KW - unsupervised L1 - SN - N1 - N1 - AB - ER - TY - CONF AU - Levy, Omer AU - Goldberg, Yoav A2 - Morante, Roser A2 - Yih, Wen-tau T1 - Linguistic Regularities in Sparse and Explicit Word Representations. T2 - CoNLL PB - ACL C1 - PY - 2014/ CY - VL - IS - SP - 171 EP - 180 UR - http://dblp.uni-trier.de/db/conf/conll/conll2014.html#LevyG14 DO - KW - kallimachos KW - posts KW - representation KW - similarity KW - toread KW - word L1 - SN - 978-1-941643-02-0 N1 - N1 - AB - ER - TY - CONF AU - Benz, Dominik AU - Krause, Beate AU - Kumar, G. Praveen AU - Hotho, Andreas AU - Stumme, Gerd A2 - T1 - Characterizing Semantic Relatedness of Search Query Terms T2 - Proceedings of the 1st Workshop on Explorative Analytics of Information Networks (EIN2009) PB - C1 - Bled, Slovenia PY - 2009/10 CY - VL - IS - SP - EP - UR - DO - KW - 2009 KW - ecml KW - measures KW - myown KW - pkdd KW - similarity KW - workshop L1 - SN - N1 - N1 - AB - ER - TY - CONF AU - Markines, Benjamin AU - Cattuto, Ciro AU - Menczer, Filippo AU - Benz, Dominik AU - Hotho, Andreas AU - Stumme, Gerd A2 - T1 - Evaluating Similarity Measures for Emergent Semantics of Social Tagging T2 - 18th International World Wide Web Conference PB - C1 - PY - 2009/04 CY - VL - IS - SP - 641 EP - 641 UR - http://www2009.eprints.org/65/ DO - KW - 2009 KW - measure KW - myown KW - semantics KW - similarity KW - social KW - tagging KW - taggingsurvey KW - tagorapub L1 - SN - N1 - N1 - AB - Social bookmarking systems and their emergent information structures, known as folksonomies, are increasingly important data sources for Semantic Web applications. 
A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as navigation support, semantic search, and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures derived from established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity among tags and resources, considering different ways to aggregate annotations across users. After comparing how tag similarity measures predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity. ER - TY - GEN AU - Cattuto, Ciro AU - Benz, Dominik AU - Hotho, Andreas AU - Stumme, Gerd A2 - T1 - Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems JO - PB - C1 - PY - 2008/ VL - IS - SP - EP - UR - http://www.citebase.org/abstract?id=oai:arXiv.org:0805.2045 DO - KW - 2008 KW - analysis KW - learning KW - myown KW - ol KW - ontology KW - semantic KW - similarity KW - tag L1 - N1 - [0805.2045] Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems N1 - AB - Social bookmarking systems allow users to organise collections of resources on the Web in a collaborative fashion. 
The increasing popularity of these systems as well as first insights into their emergent semantics have made them relevant to disciplines like knowledge extraction and ontology learning. The problem of devising methods to measure the semantic relatedness between tags and characterizing it semantically is still largely open. Here we analyze three measures of tag relatedness: tag co-occurrence, cosine similarity of co-occurrence distributions, and FolkRank, an adaptation of the PageRank algorithm to folksonomies. Each measure is computed on tags from a large-scale dataset crawled from the social bookmarking system del.icio.us. To provide a semantic grounding of our findings, a connection to WordNet (a semantic lexicon for the English language) is established by mapping tags into synonym sets of WordNet, and applying there well-known metrics of semantic similarity. Our results clearly expose different characteristics of the selected measures of relatedness, making them applicable to different subtasks of knowledge extraction such as synonym detection or discovery of concept hierarchies. ER - TY - CONF AU - Eda, Takeharu AU - Yoshikawa, Masatoshi AU - Yamamuro, Masashi A2 - Bailey, James A2 - Maier, David A2 - Schewe, Klaus-Dieter A2 - Thalheim, Bernhard A2 - Wang, Xiaoyang Sean T1 - Locally Expandable Allocation of Folksonomy Tags in a Directed Acyclic Graph. 
T2 - WISE PB - Springer C1 - PY - 2008/ CY - VL - 5175 IS - SP - 151 EP - 162 UR - http://dblp.uni-trier.de/db/conf/wise/wise2008.html#EdaYY08 DO - KW - ol KW - similarity KW - toread L1 - SN - 978-3-540-85480-7 N1 - dblp N1 - AB - ER - TY - CONF AU - Baeza-Yates, Ricardo AU - Tiberi, Alessandro A2 - T1 - Extracting semantic relations from query logs T2 - KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining PB - ACM C1 - New York, NY, USA PY - 2007/ CY - VL - IS - SP - 76 EP - 85 UR - http://portal.acm.org/citation.cfm?id=1281204 DO - http://doi.acm.org/10.1145/1281192.1281204 KW - log KW - query KW - search KW - semantic KW - similarity L1 - SN - 978-1-59593-609-7 N1 - Extracting semantic relations from query logs N1 - AB - ER - TY - CONF AU - Bollegala, Danushka AU - Matsuo, Yutaka AU - Ishizuka, Mitsuru A2 - T1 - Measuring semantic similarity between words using web search engines T2 - WWW '07: Proceedings of the 16th international conference on World Wide Web PB - ACM C1 - New York, NY, USA PY - 2007/ CY - VL - IS - SP - 757 EP - 766 UR - DO - http://doi.acm.org/10.1145/1242572.1242675 KW - search KW - semantic KW - similarity KW - toread L1 - SN - 978-1-59593-654-7 N1 - N1 - AB - ER - TY - CONF AU - Zhao, Qiankun AU - Hoi, Steven C. H. AU - Liu, Tie-Yan AU - Bhowmick, Sourav S. AU - Lyu, Michael R. 
AU - Ma, Wei-Ying A2 - T1 - Time-dependent semantic similarity measure of queries using historical click-through data T2 - WWW '06: Proceedings of the 15th international conference on World Wide Web PB - ACM Press C1 - New York, NY, USA PY - 2006/ CY - VL - IS - SP - 543 EP - 552 UR - http://portal.acm.org/citation.cfm?id=1135858 DO - http://doi.acm.org/10.1145/1135777.1135858 KW - log KW - query KW - search KW - semantic KW - similarity L1 - SN - 1-59593-323-9 N1 - Time-dependent semantic similarity measure of queries using historical click-through data N1 - AB - It has become a promising direction to measure similarity of Web search queries by mining the increasing amount of click-through data logged by Web search engines, which record the interactions between users and the search engines. Most existing approaches employ the click-through data for similarity measure of queries with little consideration of the temporal factor, while the click-through data is often dynamic and contains rich temporal information. In this paper we present a new framework of time-dependent query semantic similarity model on exploiting the temporal characteristics of historical click-through data. The intuition is that more accurate semantic similarity values between queries can be obtained by taking into account the timestamps of the log data. With a set of user-defined calendar schema and calendar patterns, our time-dependent query similarity model is constructed using the marginalized kernel technique, which can exploit both explicit similarity and implicit semantics from the click-through data effectively. Experimental results on a large set of click-through data acquired from a commercial search engine show that our time-dependent query similarity model is more accurate than the existing approaches. Moreover, we observe that our time-dependent query similarity model can, to some extent, reflect real-world semantics such as real-world events that are happening over time. 
ER - TY - CONF AU - Ziegler, Patrick AU - Kiefer, Christoph AU - Sturm, Christoph AU - Dittrich, Klaus R. AU - Bernstein, Abraham A2 - Ioannidis, Yannis A2 - Scholl, Marc H. A2 - Schmidt, Joachim W. A2 - Matthes, Florian A2 - Hatzopoulos, Mike A2 - Boehm, Klemens A2 - Kemper, Alfons A2 - Grust, Torsten A2 - Boehm, Christian T1 - Detecting Similarities in Ontologies with the SOQA-SimPack Toolkit T2 - 10th International Conference on Extending Database Technology (EDBT 2006) PB - Springer C1 - Munich, Germany, March 26-31 PY - 2006/ CY - VL - 3896 IS - SP - 59 EP - 76 UR - DO - KW - ontology KW - similarity KW - semantic L1 - SN - N1 - N1 - AB - ER - TY - THES AU - Curran, James Richard T1 - From Distributional to Semantic Similarity PY - 2003/ PB - Institute for Communicating and Collaborative Systems School of Informatics University of Edinburgh SP - EP - UR - http://www.era.lib.ed.ac.uk/bitstream/1842/563/2/IP030023.pdf DO - KW - distributional KW - semantic KW - similarity KW - toread KW - wordnet L1 - N1 - N1 - AB - Lexical-semantic resources, including thesauri and WordNet, have been successfully incorporated into a wide range of applications in Natural Language Processing. However they are very difficult and expensive to create and maintain, and their usefulness has been severely hampered by their limited coverage, bias and inconsistency. Automated and semi-automated methods for developing such resources are therefore crucial for further resource development and improved application performance.
Systems that extract thesauri often identify similar words using the distributional hypothesis that similar words appear in similar contexts. This approach involves using corpora to examine the contexts each word appears in and then calculating the similarity between context distributions. Different definitions of context can be used, and I begin by examining how different types of extracted context influence similarity.
To be of most benefit these systems must be capable of finding synonyms for rare words. Reliable context counts for rare events can only be extracted from vast collections of text. In this dissertation I describe how to extract contexts from a corpus of over 2 billion words. I describe techniques for processing text on this scale and examine the trade-off between context accuracy, information content and quantity of text analysed.
Distributional similarity is at best an approximation to semantic similarity. I develop improved approximations motivated by the intuition that some events in the context distribution are more indicative of meaning than others. For instance, the object-of-verb context wear is far more indicative of a clothing noun than get. However, existing distributional techniques do not effectively utilise this information. The new context-weighted similarity metric I propose in this dissertation significantly outperforms every distributional similarity metric described in the literature.
Nearest-neighbour similarity algorithms scale poorly with vocabulary and context vector size. To overcome this problem I introduce a new context-weighted approximation algorithm with bounded complexity in context vector size that significantly reduces the system runtime with only a minor performance penalty. I also describe a parallelized version of the system that runs on a Beowulf cluster for the 2 billion word experiments.
To evaluate the context-weighted similarity measure I compare ranked similarity lists against gold-standard resources using precision and recall-based measures from Information Retrieval, since the alternative, application-based evaluation, can often be influenced by distributional as well as semantic similarity. I also perform a detailed analysis of the final results using WordNet.
Finally, I apply my similarity metric to the task of assigning words to WordNet semantic categories. I demonstrate that this new approach outperforms existing methods and overcomes some of their weaknesses.
ER - TY - JOUR AU - Green, S.J. T1 - Building Hypertext Links By Computing Semantic Similarity JO - IEEE Transactions on Knowledge and Data Engineering PY - 1999/ VL - 11 IS - SP - 713 EP - 730 UR - DO - KW - clustering KW - semantic KW - similarity L1 - SN - N1 - N1 - AB - ER - TY - CONF AU - Lee, Lillian A2 - T1 - Measures of distributional similarity T2 - Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics PB - Association for Computational Linguistics C1 - Morristown, NJ, USA PY - 1999/ CY - VL - IS - SP - 25 EP - 32 UR - http://portal.acm.org/citation.cfm?id=1034693&dl= DO - http://dx.doi.org/10.3115/1034678.1034693 KW - measure KW - similarity KW - toread L1 - SN - 1-55860-609-3 N1 - Measures of distributional similarity N1 - AB - ER - TY - JOUR AU - Broder, Andrei Z. AU - Glassman, Steven C. AU - Manasse, Mark S. AU - Zweig, Geoffrey T1 - Syntactic clustering of the Web JO - Computer Networks and ISDN Systems PY - 1997/10 VL - 29 IS - 8-13 SP - 1157 EP - 1166 UR - http://www.sciencedirect.com/science/article/B6TYT-3SP60S4-11/2/38f44c816ec8d69b406317de1629e56d DO - KW - Duplication KW - Fingerprints KW - Resemblance KW - Signatures KW - Similarity KW - Web KW - search L1 - SN - N1 - ScienceDirect - Computer Networks and ISDN Systems : Syntactic clustering of the Web N1 - AB - We have developed an efficient way to determine the syntactic similarity of files and have applied it to every document on the World Wide Web. Using this mechanism, we built a clustering of all the documents that are syntactically similar. Possible applications include a "Lost and Found" service, filtering the results of Web searches, updating widely distributed web-pages, and identifying violations of intellectual property rights. ER - TY - JOUR AU - Jiang, Jay J. AU - Conrath, David W. 
T1 - Semantic similarity based on corpus statistics and lexical taxonomy JO - CoRR PY - 1997/ VL - cmp-lg/9709008 IS - SP - EP - UR - DO - KW - corpus KW - semantic KW - similarity KW - wordnet L1 - SN - N1 - N1 - AB - ER - TY - JOUR AU - Peat, Helen J. AU - Willett, Peter T1 - The limitations of term co-occurrence data for query expansion in document retrieval systems JO - Journal of the American Society for Information Science PY - 1991/ VL - 42 IS - 5 SP - 378 EP - 383 UR - http://www.iro.umontreal.ca/~nie/IFT6255/Peat_Willett_QExp.pdf DO - 10.1002/(SICI)1097-4571(199106)42:5<378::AID-ASI8>3.0.CO;2-8 KW - expansion KW - ir KW - query KW - similarity KW - term L1 - SN - N1 - Wiley InterScience: Journal: Abstract N1 - AB - ER -