Publications
Evaluation of Folksonomy Induction Algorithms
Strohmaier, M.; Helic, D.; Benz, D.; Körner, C. & Kern, R.
Transactions on Intelligent Systems and Technology (2012) [pdf]
Review of the state of the art: Discovering and Associating Semantics to Tags in Folksonomies
Garcia-Silva, A.; Corcho, O.; Alani, H. & Gomez-Perez, A.
Knowledge Engineering Review, 26(4) (2011)
This paper describes and compares the most relevant approaches for associating tags with semantics in order to make explicit the meaning of those tags. We identify a common set of steps that are usually considered across all these approaches and frame our descriptions according to them, providing a unified view of how each approach tackles the different problems that appear during the semantic association process. Furthermore, we provide some recommendations on (a) how and when to use each of the approaches according to the characteristics of the data source, and (b) how to improve results by leveraging the strengths of the different approaches.
Ontology Learning based on Text Mining and Social Evidence Sources
Weichselbraun, A.
(2011) [pdf]
Toponym Resolution in Social Media.
Ireson, N. & Ciravegna, F.
Patel-Schneider, P. F.; Pan, Y.; Hitzler, P.; Mika, P.; Zhang, L.; Pan, J. Z.; Horrocks, I. & Glimm, B., ed., 'ISWC 2010', 6496(), Lecture Notes in Computer Science, Springer, 370-385 (2010) [pdf]
User-generated content is increasingly being utilised as a source of information; however, each individual piece of content tends to contain low levels of information. In addition, such information tends to be informal and imperfect in nature, containing imprecise, subjective and ambiguous expressions. However, the content does not have to be interpreted in isolation, as it is linked, either explicitly or implicitly, to a network of interrelated content: it may be grouped or tagged with similar content, comments may be added by other users, or it may be related to other content posted at the same time or by the same author or members of the author's social network. This paper examines in general how ambiguous concepts within user-generated content can be assigned a specific, formal meaning by considering the expanding context of the information, i.e. other information contained within directly or indirectly related content, and specifically considers the issue of toponym resolution of locations.
Theoretical and Practical Perspectives on Ontology Learning from Folksonomies
Keller, C.
2010, Master's thesis, Universität Stuttgart
Multi-Domain Klassifikation basierend auf nutzergenerierten Metadaten
Meder, M.
2010, Master's thesis, Technische Universität Berlin
Ontology Learning
Cimiano, P.; Mädche, A.; Staab, S. & Völker, J.
Staab, S. & Studer, R., ed., 'Handbook on Ontologies', Springer Berlin Heidelberg, 245-267 (2009) [pdf]
Preliminary Results in Tag Disambiguation using DBpedia
Garcia, A.; Szomszor, M.; Alani, H. & Corcho, O.
'Knowledge Capture (K-Cap'09) - First International Workshop on Collective Knowledge Capturing and Representation - CKCaR'09' (2009) [pdf]
The availability of tag-based user-generated content for a variety of Web resources (music, photos, videos, text, etc.) has increased considerably in recent years. Users can assign tags freely and then use them to share and retrieve information. However, tag-based sharing and retrieval is not optimal because tags are plain text labels without an explicit or formal meaning, and hence polysemy and synonymy should be dealt with appropriately. To ameliorate these problems, we propose a context-based tag disambiguation algorithm that selects the meaning of a tag among a set of candidate DBpedia entries, using a common information retrieval similarity measure. The most similar DBpedia entry is selected as the one representing the meaning of the tag. We describe and analyze some preliminary results, and discuss current challenges in this area.
Ontology learning from domain specific web documents
Hazman, M.; El-Beltagy, S. R. & Rafea, A.
International Journal of Metadata, Semantics and Ontologies, 4() 24-33(10) (2009) [pdf]
Ontologies play a vital role in many web- and internet-related applications. This work presents a system for accelerating the ontology building process via semi-automatically learning a hierarchical ontology given a set of domain-specific web documents and a set of seed concepts. The methods are tested with web documents in the domain of agriculture. The ontology is constructed through the use of two complementary approaches. The presented system has been used to build an ontology in the agricultural domain using a set of Arabic extension documents and evaluated against a modified version of the AGROVOC ontology.
Semantically enriching folksonomies with FLOR
Angeletou, S.; Sabou, M. & Motta, E.
'Proceedings of the CISWeb Workshop, located at the 5th European Semantic Web Conference ESWC 2008' (2008) [pdf]
While the increasing popularity of folksonomies has led to a vast quantity of tagged data, resource retrieval in folksonomies is limited by being agnostic to the meaning (i.e., semantics) of tags. Our goal is to automatically enrich folksonomy tags (and implicitly the related resources) with formal semantics by associating them to relevant concepts defined in online ontologies. We introduce FLOR, a method that performs automatic folksonomy enrichment by combining knowledge from WordNet and online available ontologies. Experimentally testing FLOR, we found that it correctly enriched 72% of 250 Flickr photos.
Multilingual Evidence Improves Clustering-based Taxonomy Extraction.
Hjelm, H. & Buitelaar, P.
Ghallab, M.; Spyropoulos, C. D.; Fakotakis, N. & Avouris, N. M., ed., 'ECAI', 178(), Frontiers in Artificial Intelligence and Applications, IOS Press, 288-292 (2008) [pdf]
We present a system for taxonomy extraction, aimed at providing a taxonomic backbone in an ontology learning environment. We follow previous research in using hierarchical clustering based on distributional similarity of the terms in texts. We show that basing the clustering on a comparable corpus in four languages gives a considerable improvement in accuracy compared to using only the monolingual English texts. We also show that hierarchical k-means clustering increases the similarity to the original taxonomy, when compared with a bottom-up agglomerative clustering approach.
Semantify del.icio.us: Automatically Turn your Tags into Senses
Tesconi, M.; Ronzano, F.; Marchetti, A. & Minutoli, S.
'Proceedings of the Workshop Social Data on the Web (SDoW2008)' (2008) [pdf]
At present, tagging is experiencing wide diffusion as the most adopted way to collaboratively classify resources over the Web. In this paper, after a detailed analysis of the attempts made to improve the organization and structure of tagging systems as well as the usefulness of this kind of social data, we propose and evaluate the Tag Disambiguation Algorithm, mining del.icio.us data. It makes it easy to semantify the tags of the users of a tagging service: it automatically finds, for each tag, the related Wikipedia concept in order to describe Web resources through senses. On the basis of a set of evaluation tests, we analyze all the advantages of our sense-based way of tagging, proposing new methods to keep the set of users' tags more consistent or to classify the tagged resources on the basis of Wikipedia categories, YAGO classes or WordNet synsets. We also discuss how our semantified social tagging data are strongly linked to DBpedia and the datasets of the Linked Data community.
What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content
Auer, S. & Lehmann, J.
'ESWC', 503-517 (2007) [pdf]
Wikis are an established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating by far the largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content, and propose strategies for quality improvements with just minor modifications of the wiki systems currently in use.
Extracting semantic relations from query logs
Baeza-Yates, R. & Tiberi, A.
'KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining', ACM, New York, NY, USA, [http://doi.acm.org/10.1145/1281192.1281204], 76-85 (2007) [pdf]
In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws, shedding some light on the searching user behavior as well as on the distribution of topics that people want in the Web. The representation we introduce allows us to infer interesting semantic relationships between queries. Second, we provide an experimental analysis on the quality of these relations, showing that most of them are relevant. Finally we sketch an application that detects multitopical URLs.
How flickr helps us make sense of the world: context and content in community-contributed media collections
Kennedy, L.; Naaman, M.; Ahern, S.; Nair, R. & Rattenbury, T.
'MULTIMEDIA '07: Proceedings of the 15th international conference on Multimedia', ACM, New York, NY, USA, [10.1145/1291233.1291384], 631-640 (2007) [pdf]
The advent of media-sharing sites like Flickr and YouTube has drastically increased the volume of community-contributed multimedia resources available on the web. These collections have a previously unimagined depth and breadth, and have generated new opportunities – and new challenges – to multimedia research. How do we analyze, understand and extract patterns from these new collections? How can we use these unstructured, unrestricted community contributions of media (and annotation) to generate “knowledge”? As a test case, we study Flickr – a popular photo sharing website. Flickr supports photo, time and location metadata, as well as a light-weight annotation model. We extract information from this dataset using two different approaches. First, we employ a location-driven approach to generate aggregate knowledge in the form of “representative tags” for arbitrary areas in the world. Second, we use a tag-driven approach to automatically extract place and event semantics for Flickr tags, based on each tag’s metadata patterns. With the patterns we extract from tags and metadata, vision algorithms can be employed with greater precision. In particular, we demonstrate a location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset. The results suggest that community-contributed media and annotation can enhance and improve our access to multimedia resources – and our understanding of the world.
Ontology learning and population from text - algorithms, evaluation and applications.
Cimiano, P.
2006, Springer
A Survey of Ontology Evaluation Techniques
Brank, J.; Grobelnik, M. & Mladenić, D.
'Proc. of 8th Int. multi-conf. Information Society', 166-169 (2005)
An ontology is an explicit formal conceptualization of some domain of interest. Ontologies are increasingly used in various fields such as knowledge management, information extraction, and the semantic web. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion of application, typically in order to determine which of several ontologies would best suit a particular purpose. This paper presents a survey of the state of the art in ontology evaluation.
Automatic Acquisition of Taxonomies from Text: FCA meets NLP
Cimiano, P.; Staab, S. & Tane, J.
'Proceedings of the ECML / PKDD Workshop on Adaptive Text Extraction and Mining', Cavtat-Dubrovnik, Croatia, 10-17 (2003) [pdf]
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from domain-specific texts based on Formal Concept Analysis (FCA). Our approach is based on the assumption that verbs pose more or less strong selectional restrictions on their arguments. The conceptual hierarchy is then built on the basis of the inclusion relations between the extensions of the selectional restrictions of all the verbs, while the verbs themselves provide intensional descriptions for each concept. We formalize this idea in terms of FCA and show how our approach can be used to acquire a concept hierarchy for the tourism domain out of texts. We then evaluate our method by considering an already existing ontology for this domain.
A Graph Model for Unsupervised Lexical Acquisition
Widdows, D. & Dorow, B.
'COLING' (2002)
Knowledge Engineering: Principles and Methods
Studer, R.; Benjamins, V. R. & Fensel, D.
Data & Knowledge Engineering, 25(1-2) 161-197 (1998) [pdf]
This paper gives an overview of the development of the field of Knowledge Engineering over the last 15 years. We discuss the paradigm shift from a transfer view to a modeling view and describe two approaches which considerably shaped research in Knowledge Engineering: Role-limiting Methods and Generic Tasks. To illustrate various concepts and methods which evolved in recent years we describe three modeling frameworks: CommonKADS, MIKE, and PROTÉGÉ-II. This description is supplemented by discussing some important methodological developments in more detail: specification languages for knowledge-based systems, problem-solving methods, and ontologies. We conclude by outlining the relationship of Knowledge Engineering to Software Engineering, Information Integration and Knowledge Management.