TY  - CONF
AU  - Ring, Markus
AU  - Otto, Florian
AU  - Becker, Martin
AU  - Niebler, Thomas
AU  - Landes, Dieter
AU  - Hotho, Andreas
A2  - 
T1  - ConDist: A Context-Driven Categorical Distance Measure
T2  - ECMLPKDD2015
PB  - 
C1  - 
PY  - 2015/
CY  -  
VL  - 
IS  - 
SP  - 
EP  - 
UR  - 
DO  - 
KW  - 2015
KW  - categorical
KW  - data
KW  - learning
KW  - measure
KW  - myown
KW  - similarity
KW  - unsupervised
L1  - 
SN  - 
N1  - 
N1  - 
AB  - 
ER  -

TY  - CONF
AU  - Levy, Omer
AU  - Goldberg, Yoav
A2  - Morante, Roser
A2  - Yih, Wen-tau
T1  - Linguistic Regularities in Sparse and Explicit Word Representations
T2  - CoNLL
PB  - ACL
C1  - 
PY  - 2014/
CY  -  
VL  - 
IS  - 
SP  - 171
EP  - 180
UR  - http://dblp.uni-trier.de/db/conf/conll/conll2014.html#LevyG14
DO  - 
KW  - kallimachos
KW  - posts
KW  - representation
KW  - similarity
KW  - toread
KW  - word
L1  - 
SN  - 978-1-941643-02-0
N1  - 
N1  - 
AB  - 
ER  -

TY  - CONF
AU  - Benz, Dominik
AU  - Krause, Beate
AU  - Kumar, G. Praveen
AU  - Hotho, Andreas
AU  - Stumme, Gerd
A2  - 
T1  - Characterizing Semantic Relatedness of Search Query Terms
T2  - Proceedings of the 1st Workshop on Explorative Analytics of Information Networks (EIN2009)
PB  - 
C1  - Bled, Slovenia
PY  - 2009/10
CY  -  
VL  - 
IS  - 
SP  - 
EP  - 
UR  - 
DO  - 
KW  - 2009
KW  - ecml
KW  - measures
KW  - myown
KW  - pkdd
KW  - similarity
KW  - workshop
L1  - 
SN  - 
N1  - 
N1  - 
AB  - 
ER  -

TY  - CONF
AU  - Markines, Benjamin
AU  - Cattuto, Ciro
AU  - Menczer, Filippo
AU  - Benz, Dominik
AU  - Hotho, Andreas
AU  - Stumme, Gerd
A2  - 
T1  - Evaluating Similarity Measures for Emergent Semantics of Social Tagging
T2  - 18th International World Wide Web Conference
PB  - 
C1  - 
PY  - 2009/04
CY  -  
VL  - 
IS  - 
SP  - 641
EP  - 641
UR  - http://www2009.eprints.org/65/
DO  - 
KW  - 2009
KW  - measure
KW  - myown
KW  - semantics
KW  - similarity
KW  - social
KW  - tagging
KW  - taggingsurvey
KW  - tagorapub
L1  - 
SN  - 
N1  - 
N1  - 
AB  - Social bookmarking systems and their emergent information structures, known as folksonomies, are increasingly important data sources for Semantic Web applications. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as navigation support, semantic search, and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures derived from established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity among tags and resources, considering different ways to aggregate annotations across users. After comparing how tag similarity measures predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity.
ER  -

TY  - GEN
AU  - Cattuto, Ciro
AU  - Benz, Dominik
AU  - Hotho, Andreas
AU  - Stumme, Gerd
A2  - 
T1  - Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems
JO  - 
PB  - 
C1  - 
PY  - 2008/
VL  - 
IS  - 
SP  - 
EP  - 
UR  - http://www.citebase.org/abstract?id=oai:arXiv.org:0805.2045
DO  - 
KW  - 2008
KW  - analysis
KW  - learning
KW  - myown
KW  - ol
KW  - ontology
KW  - semantic
KW  - similarity
KW  - tag
L1  - 
N1  - [0805.2045] Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems
N1  - 
AB  - Social bookmarking systems allow users to organise collections of resources on the Web in a collaborative fashion. The increasing popularity of these systems as well as first insights into their emergent semantics have made them relevant to disciplines like knowledge extraction and ontology learning. The problem of devising methods to measure the semantic relatedness between tags and characterizing it semantically is still largely open. Here we analyze three measures of tag relatedness: tag co-occurrence, cosine similarity of co-occurrence distributions, and FolkRank, an adaptation of the PageRank algorithm to folksonomies. Each measure is computed on tags from a large-scale dataset crawled from the social bookmarking system del.icio.us. To provide a semantic grounding of our findings, a connection to WordNet (a semantic lexicon for the English language) is established by mapping tags into synonym sets of WordNet, and applying there well-known metrics of semantic similarity. Our results clearly expose different characteristics of the selected measures of relatedness, making them applicable to different subtasks of knowledge extraction such as synonym detection or discovery of concept hierarchies.
ER  -

TY  - CONF
AU  - Eda, Takeharu
AU  - Yoshikawa, Masatoshi
AU  - Yamamuro, Masashi
A2  - Bailey, James
A2  - Maier, David
A2  - Schewe, Klaus-Dieter
A2  - Thalheim, Bernhard
A2  - Wang, Xiaoyang Sean
T1  - Locally Expandable Allocation of Folksonomy Tags in a Directed Acyclic Graph
T2  - WISE
PB  - Springer
C1  - 
PY  - 2008/
CY  -  
VL  - 5175
IS  - 
SP  - 151
EP  - 162
UR  - http://dblp.uni-trier.de/db/conf/wise/wise2008.html#EdaYY08
DO  - 
KW  - ol
KW  - similarity
KW  - toread
L1  - 
SN  - 978-3-540-85480-7
N1  - dblp
N1  - 
AB  - 
ER  -

TY  - CONF
AU  - Baeza-Yates, Ricardo
AU  - Tiberi, Alessandro
A2  - 
T1  - Extracting semantic relations from query logs
T2  - KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
PB  - ACM
C1  - New York, NY, USA
PY  - 2007/
CY  -  
VL  - 
IS  - 
SP  - 76
EP  - 85
UR  - http://portal.acm.org/citation.cfm?id=1281204
DO  - 10.1145/1281192.1281204
KW  - log
KW  - query
KW  - search
KW  - semantic
KW  - similarity
L1  - 
SN  - 978-1-59593-609-7
N1  - 
AB  - 
ER  -

TY  - CONF
AU  - Bollegala, Danushka
AU  - Matsuo, Yutaka
AU  - Ishizuka, Mitsuru
A2  - 
T1  - Measuring semantic similarity between words using web search engines
T2  - WWW '07: Proceedings of the 16th international conference on World Wide Web
PB  - ACM
C1  - New York, NY, USA
PY  - 2007/
CY  -  
VL  - 
IS  - 
SP  - 757
EP  - 766
UR  - 
DO  - 10.1145/1242572.1242675
KW  - search
KW  - semantic
KW  - similarity
KW  - toread
L1  - 
SN  - 978-1-59593-654-7
N1  - 
N1  - 
AB  - 
ER  -

TY  - CONF
AU  - Zhao, Qiankun
AU  - Hoi, Steven C. H.
AU  - Liu, Tie-Yan
AU  - Bhowmick, Sourav S.
AU  - Lyu, Michael R.
AU  - Ma, Wei-Ying
A2  - 
T1  - Time-dependent semantic similarity measure of queries using historical click-through data
T2  - WWW '06: Proceedings of the 15th international conference on World Wide Web
PB  - ACM Press
C1  - New York, NY, USA
PY  - 2006/
CY  -  
VL  - 
IS  - 
SP  - 543
EP  - 552
UR  - http://portal.acm.org/citation.cfm?id=1135858
DO  - 10.1145/1135777.1135858
KW  - log
KW  - query
KW  - search
KW  - semantic
KW  - similarity
L1  - 
SN  - 1-59593-323-9
N1  - 
AB  - It has become a promising direction to measure similarity of Web search queries by mining the increasing amount of click-through data logged by Web search engines, which record the interactions between users and the search engines. Most existing approaches employ the click-through data for similarity measure of queries with little consideration of the temporal factor, while the click-through data is often dynamic and contains rich temporal information. In this paper we present a new framework of time-dependent query semantic similarity model on exploiting the temporal characteristics of historical click-through data. The intuition is that more accurate semantic similarity values between queries can be obtained by taking into account the timestamps of the log data. With a set of user-defined calendar schema and calendar patterns, our time-dependent query similarity model is constructed using the marginalized kernel technique, which can exploit both explicit similarity and implicit semantics from the click-through data effectively. Experimental results on a large set of click-through data acquired from a commercial search engine show that our time-dependent query similarity model is more accurate than the existing approaches. Moreover, we observe that our time-dependent query similarity model can, to some extent, reflect real-world semantics such as real-world events that are happening over time.
ER  -

TY  - CONF
AU  - Ziegler, Patrick
AU  - Kiefer, Christoph
AU  - Sturm, Christoph
AU  - Dittrich, Klaus R.
AU  - Bernstein, Abraham
A2  - Ioannidis, Yannis
A2  - Scholl, Marc H.
A2  - Schmidt, Joachim W.
A2  - Matthes, Florian
A2  - Hatzopoulos, Mike
A2  - Boehm, Klemens
A2  - Kemper, Alfons
A2  - Grust, Torsten
A2  - Boehm, Christian
T1  - Detecting Similarities in Ontologies with the SOQA-SimPack Toolkit
T2  - 10th International Conference on Extending Database Technology (EDBT 2006)
PB  - Springer
C1  - Munich, Germany, March 26-31
PY  - 2006/
CY  -  
VL  - 3896
IS  - 
SP  - 59
EP  - 76
UR  - 
DO  - 
KW  - ontology
KW  - similarity
KW  - semantic
L1  - 
SN  - 
N1  - 
N1  - 
AB  - 
ER  -

TY  - THES
AU  - Curran, James Richard
T1  - From Distributional to Semantic Similarity
PY  - 2003/
PB  - Institute for Communicating and Collaborative Systems, School of Informatics, University of Edinburgh
SP  - 
EP  - 
UR  - http://www.era.lib.ed.ac.uk/bitstream/1842/563/2/IP030023.pdf
DO  - 
KW  - distributional
KW  - semantic
KW  - similarity
KW  - toread
KW  - wordnet
L1  - 
N1  - 
N1  - 
AB  - Lexical-semantic resources, including thesauri and WordNet, have been successfully incorporated into a wide range of applications in Natural Language Processing. However they are very difficult and expensive to create and maintain, and their usefulness has been severely hampered by their limited coverage, bias and inconsistency. Automated and semi-automated methods for developing such resources are therefore crucial for further resource development and improved application performance. Systems that extract thesauri often identify similar words using the distributional hypothesis that similar words appear in similar contexts. This approach involves using corpora to examine the contexts each word appears in and then calculating the similarity between context distributions. Different definitions of context can be used, and I begin by examining how different types of extracted context influence similarity. To be of most benefit these systems must be capable of finding synonyms for rare words. Reliable context counts for rare events can only be extracted from vast collections of text. In this dissertation I describe how to extract contexts from a corpus of over 2 billion words. I describe techniques for processing text on this scale and examine the trade-off between context accuracy, information content and quantity of text analysed. Distributional similarity is at best an approximation to semantic similarity. I develop improved approximations motivated by the intuition that some events in the context distribution are more indicative of meaning than others. For instance, the object-of-verb context wear is far more indicative of a clothing noun than get. However, existing distributional techniques do not effectively utilise this information. The new context-weighted similarity metric I propose in this dissertation significantly outperforms every distributional similarity metric described in the literature. Nearest-neighbour similarity algorithms scale poorly with vocabulary and context vector size. To overcome this problem I introduce a new context-weighted approximation algorithm with bounded complexity in context vector size that significantly reduces the system runtime with only a minor performance penalty. I also describe a parallelized version of the system that runs on a Beowulf cluster for the 2 billion word experiments. To evaluate the context-weighted similarity measure I compare ranked similarity lists against gold-standard resources using precision and recall-based measures from Information Retrieval, since the alternative, application-based evaluation, can often be influenced by distributional as well as semantic similarity. I also perform a detailed analysis of the final results using WordNet. Finally, I apply my similarity metric to the task of assigning words to WordNet semantic categories. I demonstrate that this new approach outperforms existing methods and overcomes some of their weaknesses.
ER  -

TY  - JOUR
AU  - Green, S.J.
T1  - Building Hypertext Links By Computing Semantic Similarity
JO  - IEEE Transactions on Knowledge and Data Engineering
PY  - 1999/
VL  - 11
IS  - 
SP  - 713
EP  - 730
UR  - 
DO  - 
KW  - clustering
KW  - semantic
KW  - similarity
L1  - 
SN  - 
N1  - 
N1  - 
AB  - 
ER  -

TY  - CONF
AU  - Lee, Lillian
A2  - 
T1  - Measures of distributional similarity
T2  - Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
PB  - Association for Computational Linguistics
C1  - Morristown, NJ, USA
PY  - 1999/
CY  -  
VL  - 
IS  - 
SP  - 25
EP  - 32
UR  - http://portal.acm.org/citation.cfm?id=1034693&dl=
DO  - 10.3115/1034678.1034693
KW  - measure
KW  - similarity
KW  - toread
L1  - 
SN  - 1-55860-609-3
N1  - 
AB  - 
ER  -

TY  - JOUR
AU  - Broder, Andrei Z.
AU  - Glassman, Steven C.
AU  - Manasse, Mark S.
AU  - Zweig, Geoffrey
T1  - Syntactic clustering of the Web
JO  - Computer Networks and ISDN Systems
PY  - 1997/10
VL  - 29
IS  - 8-13
SP  - 1157
EP  - 1166
UR  - http://www.sciencedirect.com/science/article/B6TYT-3SP60S4-11/2/38f44c816ec8d69b406317de1629e56d
DO  - 
KW  - Duplication
KW  - Fingerprints
KW  - Resemblance
KW  - Signatures
KW  - Similarity
KW  - Web
KW  - search
L1  - 
SN  - 
N1  - 
AB  - We have developed an efficient way to determine the syntactic similarity of files and have applied it to every document on the World Wide Web. Using this mechanism, we built a clustering of all the documents that are syntactically similar. Possible applications include a "Lost and Found" service, filtering the results of Web searches, updating widely distributed web-pages, and identifying violations of intellectual property rights.
ER  -

TY  - JOUR
AU  - Jiang, Jay J.
AU  - Conrath, David W.
T1  - Semantic similarity based on corpus statistics and lexical taxonomy
JO  - CoRR
PY  - 1997/
VL  - cmp-lg/9709008
IS  - 
SP  - 
EP  - 
UR  - 
DO  - 
KW  - corpus
KW  - semantic
KW  - similarity
KW  - wordnet
L1  - 
SN  - 
N1  - 
N1  - 
AB  - 
ER  -

TY  - JOUR
AU  - Peat, Helen J.
AU  - Willett, Peter
T1  - The limitations of term co-occurrence data for query expansion in document retrieval systems
JO  - Journal of the American Society for Information Science
PY  - 1991/
VL  - 42
IS  - 5
SP  - 378
EP  - 383
UR  - http://www.iro.umontreal.ca/~nie/IFT6255/Peat_Willett_QExp.pdf
DO  - 10.1002/(SICI)1097-4571(199106)42:5<378::AID-ASI8>3.0.CO;2-8
KW  - expansion
KW  - ir
KW  - query
KW  - similarity
KW  - term
L1  - 
SN  - 
N1  - 
AB  - 
ER  -