Publications
Proceedings of the Dagstuhl Seminar on Social Web Communities
2008, Alani, H.; Staab, S. & Stumme, G., ed., Schloss Dagstuhl
Evaluation Strategies for Learning Algorithms of Hierarchical Structures
Bade, K. & Benz, D.
'Proceedings of the 32nd Annual Conference of the German Classification Society - Advances in Data Analysis, Data Handling and Business Intelligence (GfKl 2008)', Studies in Classification, Data Analysis, and Knowledge Organization, Springer, Berlin-Heidelberg (2008)
Several learning tasks involve hierarchies. To evaluate the quality of a learned hierarchy, it is often compared with a "gold standard". We assembled various similarity metrics that have been proposed in different disciplines and compared them in a unified, interdisciplinary framework for hierarchical evaluation that is based on the distinction of three fundamental dimensions. Having identified deficiencies in measuring structural similarity, we propose three new measures for this purpose, either extending existing ones or based on new ideas. Experiments on an artificial dataset were performed to compare the different measures. Our results show that the measures vary greatly in their properties.
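The abstract does not spell out the individual measures, so the following is only an illustrative sketch of one simple structural comparison, not one of the measures proposed in the paper. It assumes both the learned hierarchy and the gold standard are acyclic child-to-parent maps and scores the learned one by precision and recall over the ancestor-descendant pairs it implies. The names `ancestor_pairs` and `ancestor_pair_f1` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: a hierarchy as a child -> parent map; roots map to None.
# The map is assumed to be acyclic (otherwise ancestor_pairs would not terminate).
Hierarchy = Dict[str, Optional[str]]


def ancestor_pairs(h: Hierarchy) -> Set[Tuple[str, str]]:
    """All (ancestor, descendant) pairs implied by the child -> parent links."""
    pairs: Set[Tuple[str, str]] = set()
    for node, parent in h.items():
        while parent is not None:
            pairs.add((parent, node))
            parent = h.get(parent)  # nodes missing from the map are treated as roots
    return pairs


def ancestor_pair_f1(learned: Hierarchy, gold: Hierarchy) -> float:
    """Harmonic mean of precision and recall of the learned pairs against the gold pairs."""
    learned_pairs, gold_pairs = ancestor_pairs(learned), ancestor_pairs(gold)
    hits = len(learned_pairs & gold_pairs)
    if hits == 0:
        return 0.0
    precision = hits / len(learned_pairs)
    recall = hits / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Usage: "cat" is attached directly to "animal" instead of to "mammal".
gold = {"animal": None, "mammal": "animal", "dog": "mammal", "cat": "mammal"}
learned = {"animal": None, "mammal": "animal", "dog": "mammal", "cat": "animal"}
print(ancestor_pair_f1(learned, gold))  # precision 1.0, recall 0.8, F1 = 8/9
```

All four learned pairs are correct, but the gold pair (mammal, cat) is missed, so recall drops to 0.8. A pair-based score like this does not distinguish errors near the root from errors near the leaves, which hints at why different measures can behave very differently.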
Analyzing Tag Semantics Across Collaborative Tagging Systems
Benz, D.; Grobelnik, M.; Hotho, A.; Jäschke, R.; Mladenic, D.; Servedio, V. D. P.; Sizov, S. & Szomszor, M.
Alani, H.; Staab, S. & Stumme, G., ed., 'Proceedings of the Dagstuhl Seminar on Social Web Communities' (2008)
The objective of our group was to exploit state-of-the-art Information Retrieval methods for finding associations and dependencies between tags, and for capturing and representing differences in the tagging behavior and vocabulary of various folksonomies, with the overall aim of better understanding the semantics of tags and the tagging process. To this end, we analyze the semantic content of tags in the Flickr and Delicious folksonomies. We find that: tag context similarity leads to meaningful results in Flickr, despite its narrow folksonomy character; the comparison of tags across Flickr and Delicious shows little semantic overlap, with Flickr tags associated more with visual aspects, whereas Delicious tags appear to be associated more with technological ones; the tag-tag space, equipped with the cosine similarity metric, contains regions characterized by high density; and the order of tags inside a post has semantic relevance.
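To make "tag context similarity" concrete, here is a minimal sketch assuming that a tag's context is the vector of raw counts of the other tags it co-occurs with in the same post, and that two tags are compared by the cosine of their context vectors. The exact vector construction and any weighting used in the study may differ; `context_vectors` and `cosine` are hypothetical names.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List


def context_vectors(posts: Iterable[List[str]]) -> Dict[str, Counter]:
    """Map each tag to a Counter of the other tags it co-occurs with in the same post."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tags in posts:
        # Ordered pairs of distinct tags in the post; duplicates within a post count once.
        for t1, t2 in permutations(set(tags), 2):
            vectors[t1][t2] += 1
    return vectors


def cosine(v1: Counter, v2: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


# Usage with three hypothetical posts.
posts = [["python", "programming", "code"],
         ["java", "programming", "code"],
         ["python", "snake", "reptile"]]
vecs = context_vectors(posts)
print(cosine(vecs["python"], vecs["java"]))  # 2 / (2 * sqrt(2)), about 0.707
```

"python" and "java" never appear together, yet they share the contexts "programming" and "code", which is exactly the kind of relation co-occurrence counts alone would miss.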
Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems
Cattuto, C.; Benz, D.; Hotho, A. & Stumme, G.
'Proceedings of the 3rd Workshop on Ontology Learning and Population (OLP3)', Patras, Greece, 39-43 (2008)
Social bookmarking systems allow users to organise collections of resources on the Web in a collaborative fashion. The increasing popularity of these systems, as well as first insights into their emergent semantics, has made them relevant to disciplines like knowledge extraction and ontology learning. The problem of devising methods to measure the semantic relatedness between tags, and of characterizing that relatedness semantically, is still largely open. Here we analyze three measures of tag relatedness: tag co-occurrence, cosine similarity of co-occurrence distributions, and FolkRank, an adaptation of the PageRank algorithm to folksonomies. Each measure is computed on tags from a large-scale dataset crawled from the social bookmarking system del.icio.us. To provide a semantic grounding of our findings, we establish a connection to WordNet (a semantic lexicon for the English language) by mapping tags to WordNet synonym sets and applying well-known metrics of semantic similarity to them. Our results clearly expose different characteristics of the selected measures of relatedness, making them applicable to different subtasks of knowledge extraction such as synonym detection or discovery of concept hierarchies.
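The FolkRank idea can be sketched as a PageRank-style weight spreading on the folksonomy graph, run once with a preference on the query tag and once with a uniform preference; the difference between the two runs ranks tags by relatedness. This is a simplified sketch, not the authors' implementation: the graph construction (each tag assignment adds weight 1 to its user-tag, tag-resource and user-resource edges), the damping factor `d = 0.7`, the preference boost, the fixed iteration count, and all function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Node = Tuple[str, str]  # (kind, name) with kind "u" (user), "t" (tag) or "r" (resource)
Graph = Dict[Node, Dict[Node, float]]


def build_graph(assignments: Iterable[Tuple[str, str, str]]) -> Graph:
    """Undirected weighted graph: each (user, tag, resource) adds 1 to all three pair edges."""
    adj: Graph = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        u, t, r = ("u", user), ("t", tag), ("r", resource)
        for a, b in ((u, t), (t, r), (u, r)):
            adj[a][b] += 1.0
            adj[b][a] += 1.0
    return adj


def spread_weights(adj: Graph, pref: Dict[Node, float], d: float, iters: int) -> Dict[Node, float]:
    """PageRank-style iteration: w <- d * (weight passed along edges) + (1 - d) * pref."""
    nodes = list(adj)
    w = {v: 1.0 / len(nodes) for v in nodes}
    strength = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) * pref[v] for v in nodes}
        for v in nodes:
            share = d * w[v] / strength[v]  # each node splits its weight over its edges
            for nb, weight in adj[v].items():
                new[nb] += share * weight
        w = new
    return w


def folkrank_tags(assignments, tag: str, d: float = 0.7, iters: int = 50) -> Dict[str, float]:
    """Score tags by (preference run - uniform baseline run); higher means more related to `tag`."""
    adj = build_graph(assignments)
    n = len(adj)
    uniform = {v: 1.0 / n for v in adj}
    raw = {v: 1.0 for v in adj}
    raw[("t", tag)] += n  # hypothetical boost for the query tag; KeyError if tag is unknown
    total = sum(raw.values())
    pref = {v: x / total for v, x in raw.items()}
    w_pref = spread_weights(adj, pref, d, iters)
    w_base = spread_weights(adj, uniform, d, iters)
    return {name: w_pref[(kind, name)] - w_base[(kind, name)] for kind, name in adj if kind == "t"}
```

Sorting the result, e.g. `sorted(folkrank_tags(data, "python").items(), key=lambda kv: kv[1], reverse=True)`, lists candidate related tags. Subtracting the baseline run removes the bias toward globally popular tags that a plain preference-weighted PageRank would show.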
Semantic Grounding of Tag Relatedness in Social Bookmarking Systems
Cattuto, C.; Benz, D.; Hotho, A. & Stumme, G.
Sheth, A. P.; Staab, S.; Dean, M.; Paolucci, M.; Maynard, D.; Finin, T. W. & Thirunarayan, K., ed., 'The Semantic Web -- ISWC 2008, Proc. Intl. Semantic Web Conference 2008', 5318, LNAI, Springer, Heidelberg, [http://dx.doi.org/10.1007/978-3-540-88564-1_39], 615-631 (2008)
Collaborative tagging systems have nowadays become important data sources for populating semantic web applications. For tasks like synonym detection and discovery of concept hierarchies, many researchers have introduced measures of tag similarity. Even though most of these measures appear very natural, their design often seems to be rather ad hoc, and the underlying assumptions about the notion of similarity are not made explicit. A more systematic characterization and validation of tag similarity in terms of formal representations of knowledge is still lacking. Here we address this issue and analyze several measures of tag similarity: each measure is computed on data from the social bookmarking system del.icio.us, and a semantic grounding is provided by mapping pairs of similar tags in the folksonomy to pairs of synsets in WordNet, where we use validated measures of semantic distance to characterize the semantic relation between the mapped tags. This exposes important features of the investigated similarity measures and indicates which ones are better suited in the context of a given semantic application.
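The grounding step can be illustrated with a toy stand-in for WordNet: a tiny, hand-made hypernym hierarchy (entirely hypothetical) and the plain shortest-path length between two nodes through their lowest common ancestor. The validated WordNet distance measures used in the paper are more refined than raw path length; this sketch only shows the shape of the computation for one mapped tag pair.

```python
from typing import Dict, List, Optional

# Hypothetical miniature hypernym hierarchy (child -> parent) standing in for WordNet synsets.
HYPERNYM: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}


def path_to_root(node: str) -> List[str]:
    """The node followed by all of its hypernyms up to the root."""
    path = [node]
    while HYPERNYM[path[-1]] is not None:
        path.append(HYPERNYM[path[-1]])
    return path


def taxonomic_distance(a: str, b: str) -> int:
    """Edges on the shortest path between a and b through their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    steps_in_b = {n: i for i, n in enumerate(pb)}
    for i, n in enumerate(pa):
        if n in steps_in_b:  # first shared node walking up from a is the lowest common ancestor
            return i + steps_in_b[n]
    raise ValueError("nodes share no common ancestor")


print(taxonomic_distance("dog", "cat"))  # 2 (via "animal")
print(taxonomic_distance("dog", "car"))  # 4 (via "entity")
```

A tag-similarity measure whose top-ranked pairs map to small distances here would be behaving more like a synonym or sibling detector than one whose pairs land far apart.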
ECML PKDD Discovery Challenge 2008 (RSDC'08)
2008, Hotho, A.; Benz, D.; Jäschke, R. & Krause, B., ed., Workshop at 18th Europ. Conf. on Machine Learning (ECML'08) / 11th Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'08)
Discovering Shared Conceptualizations in Folksonomies
Jäschke, R.; Hotho, A.; Schmitz, C.; Ganter, B. & Stumme, G.
Web Semantics: Science, Services and Agents on the World Wide Web, 6(1), 38-53 (2008)
Social bookmarking tools are rapidly emerging on the Web. In such systems, users set up lightweight conceptual structures called folksonomies. Unlike in ontologies, the shared conceptualizations in folksonomies are not formalized but rather implicit. We present a new data mining task, the mining of all frequent tri-concepts, together with an efficient algorithm for discovering these implicit shared conceptualizations. Our approach extends the data mining task of discovering all closed itemsets to three-dimensional data structures to allow for mining folksonomies. We provide a formal definition of the problem and present an efficient algorithm for its solution. Finally, we show the applicability of our approach on three large real-world examples.
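To make the notion of a tri-concept concrete, here is a brute-force sketch, not the paper's efficient algorithm. It treats a folksonomy as a set `Y` of (user, tag, resource) triples, enumerates every user set `U` and tag set `T`, derives the largest compatible resource set `R`, and keeps `(U, T, R)` when each component equals what the other two determine; that condition is equivalent to the box U × T × R being contained in `Y` and maximal. Using minimum sizes per dimension as the frequency thresholds is an assumption, the function names are hypothetical, and the enumeration is exponential, so it only suits toy data.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]  # (user, tag, resource)
Concept = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def all_subsets(items: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Every subset of items as a frozenset (exponential: toy data only)."""
    ordered = sorted(items)
    for k in range(len(ordered) + 1):
        for combo in combinations(ordered, k):
            yield frozenset(combo)


def tri_concepts(Y: Set[Triple], min_users: int = 1, min_tags: int = 1,
                 min_res: int = 1) -> Set[Concept]:
    users = {u for u, _, _ in Y}
    tags = {t for _, t, _ in Y}
    resources = {r for _, _, r in Y}

    # Derivation operators: the largest set in one dimension compatible with the other two.
    def users_for(T, R):
        return frozenset(u for u in users if all((u, t, r) in Y for t in T for r in R))

    def tags_for(U, R):
        return frozenset(t for t in tags if all((u, t, r) in Y for u in U for r in R))

    def res_for(U, T):
        return frozenset(r for r in resources if all((u, t, r) in Y for u in U for t in T))

    concepts: Set[Concept] = set()
    for U in all_subsets(users):
        for T in all_subsets(tags):
            R = res_for(U, T)
            # Maximal box: each component must equal the derivation of the other two.
            if users_for(T, R) == U and tags_for(U, R) == T:
                if len(U) >= min_users and len(T) >= min_tags and len(R) >= min_res:
                    concepts.add((U, T, R))
    return concepts


# Usage: two users tag two pages with "web"; u2 additionally tags r2 with "java".
Y = {("u1", "web", "r1"), ("u1", "web", "r2"), ("u2", "web", "r1"),
     ("u2", "web", "r2"), ("u2", "java", "r2")}
for concept in tri_concepts(Y):
    print(concept)
```

The example yields two shared conceptualizations: both users agree on "web" for both pages, while only u2 additionally files r2 under "java".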
Logsonomy -- A Search Engine Folksonomy
Jäschke, R.; Krause, B.; Hotho, A. & Stumme, G.
'Proceedings of the Second International Conference on Weblogs and Social Media (ICWSM 2008)', AAAI Press (2008)
In social bookmarking systems, users describe bookmarks by keywords called tags. The structure behind these social systems, called folksonomies, can be viewed as a tripartite hypergraph of user, tag and resource nodes. This underlying network shows specific structural properties that explain its growth and the possibility of serendipitous exploration. Search engines filter the vast information of the web. Queries describe a user’s information need. In response to the results displayed by the search engine, users click on the links of the result page because they expect the answer to be relevant. The click data can be represented as a folksonomy in which queries are descriptions of clicked URLs. This poster analyzes the topological characteristics of the resulting tripartite hypergraph of queries, users and bookmarks of two query logs and compares it to a snapshot of the folksonomy del.icio.us.
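A minimal sketch of the click-data-to-folksonomy mapping, assuming a log of (user or session id, query, clicked URL) records: each lowercased query word is treated as a tag that the user assigned to the clicked URL, giving one hyperedge (user, tag, URL) of the tripartite hypergraph. Splitting queries into words rather than using the whole query string as a single tag is an assumption, as are the names `clicks_to_tas` and `hypergraph_summary`; the summary only counts nodes and hyperedges, a starting point for a topological comparison with a folksonomy snapshot.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Click = Tuple[str, str, str]  # (user or session id, query string, clicked URL)
Assignment = Tuple[str, str, str]  # (user, tag, resource) hyperedge


def clicks_to_tas(clicks: Iterable[Click]) -> Set[Assignment]:
    """Treat each lowercased query word as a tag the user assigned to the clicked URL."""
    tas: Set[Assignment] = set()
    for user, query, url in clicks:
        for word in query.lower().split():
            tas.add((user, word, url))
    return tas


def hypergraph_summary(tas: Set[Assignment]) -> Dict[str, object]:
    """Node counts per partition, number of hyperedges, and the highest-degree tags."""
    tag_degree = Counter(t for _, t, _ in tas)
    return {
        "users": len({u for u, _, _ in tas}),
        "tags": len(tag_degree),
        "resources": len({r for _, _, r in tas}),
        "hyperedges": len(tas),
        "top_tags": tag_degree.most_common(3),
    }


# Usage with a hypothetical three-click log.
clicks = [("s1", "Python tutorial", "docs.python.org"),
          ("s1", "python", "python.org"),
          ("s2", "tutorial", "w3schools.com")]
print(hypergraph_summary(clicks_to_tas(clicks)))
```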
A Comparison of Social Bookmarking with Traditional Search
Krause, B.; Hotho, A. & Stumme, G.
Macdonald, C.; Ounis, I.; Plachouras, V.; Ruthven, I. & White, R. W., ed., '30th European Conference on IR Research, ECIR 2008', 4956, Lecture Notes in Computer Science, Springer, Glasgow, UK, 101-113 (2008)
The Anti-Social Tagger - Detecting Spam in Social Bookmarking Systems
Krause, B.; Schmitz, C.; Hotho, A. & Stumme, G.
'AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web', ACM, New York, NY, USA, [10.1145/1451983.1451998], 61-68 (2008)
The annotation of web sites in social bookmarking systems has become a popular way to manage and find information on the web. The community structure of such systems attracts spammers: recent post pages, popular pages or specific tag pages can be manipulated easily. As a result, searching or tracking recent posts does not deliver quality results annotated in the community, but rather unsolicited, often commercial, web sites. To retain the benefits of sharing one’s web content, spam-fighting mechanisms that can face the flexible strategies of spammers need to be developed.