Strohmaier, M.; Helic, D.; Benz, D.; Körner, C. & Kern, R.: Evaluation of Folksonomy Induction Algorithms. In:
ACM Transactions on Intelligent Systems and Technology (2012),
[Full text]
[BibTeX]
Garcia-Silva, A.; Corcho, O.; Alani, H. & Gomez-Perez, A.: Review of the state of the art: Discovering and Associating Semantics to Tags in Folksonomies. In:
Knowledge Engineering Review 26 (2011), No. 4,
[Abstract]
[BibTeX]
This paper describes and compares the most relevant approaches for associating tags with semantics in order to make explicit the meaning of those tags. We identify a common set of steps that are usually considered across all these approaches and frame our descriptions according to them, providing a unified view of how each approach tackles the different problems that appear during the semantic association process. Furthermore, we provide some recommendations on (a) how and when to use each of the approaches according to the characteristics of the data source, and (b) how to improve results by leveraging the strengths of the different approaches.
Weichselbraun, A.:
Ontology Learning based on Text Mining and Social Evidence Sources, 2011
[Full text]
[BibTeX]
Ireson, N. & Ciravegna, F.: Toponym Resolution in Social Media. In: Patel-Schneider, P. F.; Pan, Y.; Hitzler, P.; Mika, P.; Zhang, L.; Pan, J. Z.; Horrocks, I. & Glimm, B. (Eds.):
International Semantic Web Conference (1). Springer, 2010 (Lecture Notes in Computer Science 6496), pp. 370-385
[Full text] [Abstract]
[BibTeX]
Increasingly, user-generated content is being utilised as a source of information; however, each individual piece of content tends to contain low levels of information. In addition, such information tends to be informal and imperfect in nature, containing imprecise, subjective, ambiguous expressions. However, the content does not have to be interpreted in isolation as it is linked, either explicitly or implicitly, to a network of interrelated content; it may be grouped or tagged with similar content, comments may be added by other users, or it may be related to other content posted at the same time or by the same author or members of the author's social network. This paper generally examines how ambiguous concepts within user-generated content can be assigned a specific/formal meaning by considering the expanding context of the information, i.e. other information contained within directly or indirectly related content, and specifically considers the issue of toponym resolution of locations.
Keller, C.:
Theoretical and Practical Perspectives on Ontology Learning from Folksonomies. Universität Stuttgart, 2010
[BibTeX]
Meder, M.:
Multi-Domain Klassifikation basierend auf nutzergenerierten Metadaten [Multi-domain classification based on user-generated metadata]. Technische Universität Berlin, 2010
[BibTeX]
Brewster, C.; Jupp, S.; Luciano, J.; Shotton, D.; Stevens, R. D. & Zhang, Z.: Issues in learning an ontology from text. In:
BMC Bioinformatics 10 Suppl 5 (2009),
[Full text]
[Abstract]
[BibTeX]
BACKGROUND: Ontology construction for any domain is a labour intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how much could be done in a simple and relatively rapid manner using a corpus of journal papers. We used a sequence of pre-existing text processing steps, and here describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a number of hierarchies. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus. RESULTS: Using mainly automated techniques, we were able to construct an 18,055-term ontology-like structure with 73% recall of animal behaviour terms, but a precision of only 26%. We were able to clean unwanted terms from the nascent ontology using lexico-syntactic patterns that tested the validity of term inclusion within the ontology. We used the same technique to test for subsumption relationships between the remaining terms to add structure to the initially broad and shallow structure we generated. All outputs are available at http://thirlmere.aston.ac.uk/iffer/animalbehaviour/. CONCLUSION: We present a systematic method for the initial steps of ontology or structured vocabulary construction for scientific domains that requires limited human effort and can make a contribution both to ontology learning and maintenance. The method is useful both for the exploration of a scientific domain and as a stepping stone towards formally rigorous ontologies. The filtering of recognised terms from a heterogeneous corpus to focus upon those that are the topic of the ontology is identified to be one of the main challenges for research in ontology learning.
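The cleaning and subsumption-testing step described above relies on lexico-syntactic (Hearst-style) patterns. A minimal illustrative sketch follows; the pattern set and example phrases are assumptions, not the paper's actual configuration:

```python
import re

# Hearst-style patterns suggesting a (hypernym, hyponym) pair.
# Illustrative only; the paper's actual pattern set may differ.
# The boolean flag marks patterns where the hyponym appears first.
PATTERNS = [
    (re.compile(r"(\w[\w ]*?) such as (\w[\w ]*)"), False),
    (re.compile(r"(\w[\w ]*?), including (\w[\w ]*)"), False),
    (re.compile(r"(\w[\w ]*?) and other (\w[\w ]*)"), True),
]

def extract_subsumptions(sentence):
    """Return (broader_term, narrower_term) pairs suggested by the sentence."""
    pairs = []
    for pattern, swapped in PATTERNS:
        match = pattern.search(sentence)
        if match:
            a, b = match.group(1).strip(), match.group(2).strip()
            pairs.append((b, a) if swapped else (a, b))
    return pairs

print(extract_subsumptions("behaviours such as grooming"))
```

A real pipeline would run such patterns over a parsed corpus and aggregate match counts before accepting a subsumption relation.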
Cimiano, P.; Mädche, A.; Staab, S. & Völker, J.: Ontology Learning. In: Staab, S. & Studer, R. (Eds.):
Handbook on Ontologies. Springer Berlin Heidelberg, 2009 (International Handbooks on Information Systems), pp. 245-267
[Full text]
[BibTeX]
Garcia, A.; Szomszor, M.; Alani, H. & Corcho, O.: Preliminary Results in Tag Disambiguation using DBpedia.
Knowledge Capture (K-Cap'09) - First International Workshop on Collective Knowledge Capturing and Representation - CKCaR'09. 2009
[Full text] [Abstract]
[BibTeX]
The availability of tag-based user-generated content for a variety of Web resources (music, photos, videos, text, etc.) has largely increased in recent years. Users can assign tags freely and then use them to share and retrieve information. However, tag-based sharing and retrieval is not optimal due to the fact that tags are plain text labels without an explicit or formal meaning, and hence polysemy and synonymy should be dealt with appropriately. To ameliorate these problems, we propose a context-based tag disambiguation algorithm that selects the meaning of a tag among a set of candidate DBpedia entries, using a common information retrieval similarity measure. The most similar DBpedia entry is selected as the one representing the meaning of the tag. We describe and analyze some preliminary results, and discuss current challenges in this area.
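The "common information retrieval similarity measure" mentioned above is typically cosine similarity between the tag's co-occurrence context and each candidate entry's text. A hedged sketch of that idea; the candidate entries and context tags below are invented for illustration:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(tag_context, candidates):
    """Pick the candidate entry whose text is most similar to the tag's context.

    tag_context: list of co-occurring tags; candidates: dict name -> entry text.
    """
    ctx = Counter(tag_context)
    scored = {name: cosine(ctx, Counter(text.lower().split()))
              for name, text in candidates.items()}
    return max(scored, key=scored.get)

# Hypothetical candidate entries for the ambiguous tag "jaguar":
candidates = {
    "Jaguar_(animal)": "the jaguar is a big cat native to the americas",
    "Jaguar_Cars": "jaguar cars is a british manufacturer of luxury cars",
}
print(disambiguate(["cat", "wildlife", "americas"], candidates))
```

A production system would use the actual DBpedia abstracts and a TF-IDF weighting rather than raw counts.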
Hazman, M.; El-Beltagy, S. R. & Rafea, A.: Ontology learning from domain specific web documents. In:
International Journal of Metadata, Semantics and Ontologies 4 (2009), pp. 24-33
[Full text]
[Abstract]
[BibTeX]
Ontologies play a vital role in many web- and internet-related applications. This work presents a system for accelerating the ontology building process via semi-automatically learning a hierarchal ontology given a set of domain-specific web documents and a set of seed concepts. The methods are tested with web documents in the domain of agriculture. The ontology is constructed through the use of two complementary approaches. The presented system has been used to build an ontology in the agricultural domain using a set of Arabic extension documents and evaluated against a modified version of the AGROVOC ontology.
Plangprasopchok, A. & Lerman, K.: Constructing folksonomies from user-specified relations on Flickr. In: Quemada, J.; León, G.; Maarek, Y. S. & Nejdl, W. (Eds.):
WWW. ACM, 2009, pp. 781-790
[Abstract]
[BibTeX]
Automatic folksonomy construction from tags has attracted much attention recently. However, inferring hierarchical relations between concepts from tags has a drawback in that it is difficult to distinguish between more popular and more general concepts. Instead of tags we propose to use user-specified relations for learning folksonomy. We explore two statistical frameworks for aggregating many shallow individual hierarchies, expressed through the collection/set relations on the social photo-sharing site Flickr, into a common deeper folksonomy that reflects how a community organizes knowledge. Our approach addresses a number of challenges that arise while aggregating information from diverse users, namely noisy vocabulary, and variations in the granularity level of the concepts expressed. Our second contribution is a method for automatically evaluating learned folksonomy by comparing it to a reference taxonomy, e.g., the Web directory created by the Open Directory Project. Our empirical results suggest that user-specified relations are a good source of evidence for learning folksonomies.
Silva, L. D. & Jayaratne, L.: Semi-automatic extraction and modeling of ontologies using Wikipedia XML Corpus.
Second International Conference on the Applications of Digital Information and Web Technologies (ICADIWT '09). 2009, pp. 446-451
[Full text] [Abstract]
[BibTeX]
This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus derived from Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using natural language processing (NLP) and other machine learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling and also inspire further research on extracting ontologies from other semi-structured document corpora as well.
Stützer, S.:
Lernen von Ontologien aus kollaborativen Tagging-Systemen [Learning ontologies from collaborative tagging systems]. Kassel, University of Kassel, Master's thesis, 2009
[BibTeX]
Angeletou, S.; Sabou, M. & Motta, E.: Semantically enriching folksonomies with FLOR.
Proceedings of the CISWeb Workshop, co-located with the 5th European Semantic Web Conference (ESWC 2008). 2008
[Full text] [Abstract]
[BibTeX]
While the increasing popularity of folksonomies has led to a vast quantity of tagged data, resource retrieval in folksonomies is limited by being agnostic to the meaning (i.e., semantics) of tags. Our goal is to automatically enrich folksonomy tags (and implicitly the related resources) with formal semantics by associating them to relevant concepts defined in online ontologies. We introduce FLOR, a method that performs automatic folksonomy enrichment by combining knowledge from WordNet and online available ontologies. Experimentally testing FLOR, we found that it correctly enriched 72% of 250 Flickr photos.
García-Silva, A.; Gómez-Pérez, A.; Suárez-Figueroa, M. & Villazón-Terrazas, B.: A Pattern Based Approach for Re-engineering Non-Ontological Resources into Ontologies. 2008, pp. 167-181
[Full text] [Abstract]
[BibTeX]
With the goal of speeding up the ontology development process, ontology engineers are starting to reuse as much as possible available ontologies and non-ontological resources such as classification schemes, thesauri, lexicons and folksonomies, that already have some degree of consensus. The reuse of such non-ontological resources necessarily involves their re-engineering into ontologies. Non-ontological resources are highly heterogeneous in their data model and contents: they encode different types of knowledge, and they can be modeled and implemented in different ways. In this paper we present (1) a typology for non-ontological resources, (2) a pattern based approach for re-engineering non-ontological resources into ontologies, and (3) a use case of the proposed approach.
Hjelm, H. & Buitelaar, P.: Multilingual Evidence Improves Clustering-based Taxonomy Extraction. In: Ghallab, M.; Spyropoulos, C. D.; Fakotakis, N. & Avouris, N. M. (Eds.):
ECAI. IOS Press, 2008 (Frontiers in Artificial Intelligence and Applications 178), pp. 288-292
[Full text] [Abstract]
[BibTeX]
We present a system for taxonomy extraction, aimed at providing a taxonomic backbone in an ontology learning environment. We follow previous research in using hierarchical clustering based on distributional similarity of the terms in texts. We show that basing the clustering on a comparable corpus in four languages gives a considerable improvement in accuracy compared to using only the monolingual English texts. We also show that hierarchical k-means clustering increases the similarity to the original taxonomy, when compared with a bottom-up agglomerative clustering approach.
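The core of such clustering-based taxonomy extraction is agglomerative clustering over distributional term vectors. A toy, stdlib-only sketch of single-linkage agglomeration; the three term vectors are fabricated, whereas a real system would derive them from corpus co-occurrence counts (concatenated across languages, as the paper proposes):

```python
import math

def cosine(a, b):
    """Cosine similarity between two numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def agglomerate(vectors):
    """Bottom-up single-linkage clustering; returns the merge history.

    vectors: dict term -> toy distributional profile.
    """
    clusters = {frozenset([t]): [v] for t, v in vectors.items()}
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the most similar members (single link).
        best, best_sim = None, -1.0
        items = list(clusters.items())
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                sim = max(cosine(a, b) for a in items[i][1] for b in items[j][1])
                if sim > best_sim:
                    best, best_sim = (items[i][0], items[j][0]), sim
        a, b = best
        clusters[a | b] = clusters.pop(a) + clusters.pop(b)
        merges.append(sorted(a | b))
    return merges

# Fabricated co-occurrence counts over three context words.
vecs = {"cat": [5, 1, 0], "dog": [4, 2, 0], "car": [0, 1, 6]}
print(agglomerate(vecs)[0])  # the two animal terms merge first
```

The merge history corresponds to the dendrogram from which a taxonomic backbone is read off.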
Marinho, L. B.; Buza, K. & Schmidt-Thieme, L.: Folksonomy-Based Collabulary Learning. In: Sheth, A. P.; Staab, S.; Dean, M.; Paolucci, M.; Maynard, D.; Finin, T. W. & Thirunarayan, K. (Eds.):
International Semantic Web Conference. Springer, 2008 (Lecture Notes in Computer Science 5318), pp. 261-276
[Full text] [Abstract]
[BibTeX]
The growing popularity of social tagging systems promises to alleviate the knowledge bottleneck that slows down the full materialization of the Semantic Web since these systems allow ordinary users to create and share knowledge in a simple, cheap, and scalable representation, usually known as folksonomy. However, for the sake of knowledge workflow, one needs to find a compromise between the uncontrolled nature of folksonomies and the controlled and more systematic vocabulary of domain experts. In this paper we propose to address this concern by devising a method that automatically enriches a folksonomy with domain expert knowledge and by introducing a novel algorithm based on frequent itemset mining techniques to efficiently learn an ontology over the enriched folksonomy. In order to quantitatively assess our method, we propose a new benchmark for task-based ontology evaluation where the quality of the ontologies is measured based on how helpful they are for the task of personalized information finding. We conduct experiments on real data and empirically show the effectiveness of our approach.
Nazir, F. & Takeda, H.: Extraction and analysis of tripartite relationships from Wikipedia.
IEEE International Symposium on Technology and Society. 2008, pp. 1-13
[Full text] [Abstract]
[BibTeX]
Social aspects are critical in the decision making process for social actors (human beings). Social aspects can be categorized into social interaction, social communities, social groups or any kind of behavior that emerges from interlinking, overlapping or similarities between interests of a society. These social aspects are dynamic and emergent. Therefore, interlinking them in a social structure, based on bipartite affiliation network, may result in isolated graphs. The major reason is that as these correspondences are dynamic and emergent, they should be coupled with more than a single affiliation in order to sustain the interconnections during interest evolutions. In this paper we propose to interlink actors using multiple tripartite graphs rather than a bipartite graph which was the focus of most of the previous social network building techniques. The utmost benefit of using tripartite graphs is that we can have multiple and hierarchical links between social actors. Therefore in this paper we discuss the extraction, plotting and analysis methods of tripartite relations between authors, articles and categories from Wikipedia. Furthermore, we also discuss the advantages of tripartite relationships over bipartite relationships. As a conclusion of this study we argue based on our results that to build useful, robust and dynamic social networks, actors should be interlinked in one or more tripartite networks.
Siorpaes, K. & Hepp, M.: Games with a Purpose for the Semantic Web. In:
IEEE Intelligent Systems 23 (2008), No. 3, pp. 50-60
[Full text]
[Abstract]
[BibTeX]
Weaving the Semantic Web requires that humans contribute their labor and judgment for creating, extending, and updating formal knowledge structures. Hiding such tasks behind online multiplayer games presents the tasks as fun and intellectually challenging entertainment.
Tesconi, M.; Ronzano, F.; Marchetti, A. & Minutoli, S.: Semantify del.icio.us: Automatically Turn your Tags into Senses.
Proceedings of the Workshop Social Data on the Web (SDoW2008). 2008
[Full text] [Abstract]
[BibTeX]
At present, tagging is experiencing great diffusion as the most widely adopted way to collaboratively classify resources over the Web. In this paper, after a detailed analysis of the attempts made to improve the organization and structure of tagging systems as well as the usefulness of this kind of social data, we propose and evaluate the Tag Disambiguation Algorithm, mining del.icio.us data. It allows the tags of the users of a tagging service to be easily semantified: it automatically finds for each tag the related Wikipedia concept in order to describe Web resources through senses. On the basis of a set of evaluation tests, we analyze the advantages of our sense-based way of tagging, proposing new methods to keep the set of user tags more consistent and to classify the tagged resources on the basis of Wikipedia categories, YAGO classes or WordNet synsets. We also discuss how our semantified social tagging data are strongly linked to DBpedia and the datasets of the Linked Data community.
Zhou, G.; Zhang, M.; Ji, D. & Zhu, Q.: Hierarchical learning strategy in semantic relation extraction. In:
Information Processing & Management 44 (2008), No. 3, pp. 1008-1021
[Full text]
[Abstract]
[BibTeX]
This paper proposes a novel tree kernel-based method with rich syntactic and semantic information for the extraction of semantic relations between named entities. With a parse tree and an entity pair, we first construct a rich semantic relation tree structure to integrate both syntactic and semantic information. And then we propose a context-sensitive convolution tree kernel, which enumerates both context-free and context-sensitive sub-trees by considering the paths of their ancestor nodes as their contexts to capture structural information in the tree structure. An evaluation on the Automatic Content Extraction/Relation Detection and Characterization (ACE RDC) corpora shows that the proposed tree kernel-based method outperforms other state-of-the-art methods.
Auer, S. & Lehmann, J.: What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content.
ESWC. 2007, pp. 503-517
[Full text] [Abstract]
[BibTeX]
Wikis are established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating the by far largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content, and propose strategies for quality improvements with just minor modifications of the wiki systems being currently used.
Baeza-Yates, R. & Tiberi, A.: Extracting semantic relations from query logs.
KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2007, pp. 76-85
[Full text] [Abstract]
[BibTeX]
In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed after them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws, shedding some light on the searching user behavior as well as on the distribution of topics that people want in the Web. The representation we introduce allows us to infer interesting semantic relationships between queries. Second, we provide an experimental analysis on the quality of these relations, showing that most of them are relevant. Finally we sketch an application that detects multitopical URLs.
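The vector representation described, queries as distributions over clicked URLs, can be sketched in a few lines; the log entries below are invented for illustration:

```python
import math
from collections import defaultdict

def query_vectors(click_log):
    """Build query -> {url: clicks} vectors from (query, clicked_url) pairs."""
    vecs = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        vecs[query][url] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse click-count vectors."""
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical click-log entries:
log = [
    ("jaguar speed", "wiki/Jaguar"),
    ("big cats", "wiki/Jaguar"),
    ("big cats", "wiki/Lion"),
    ("used cars", "autos.example/listings"),
]
v = query_vectors(log)
print(round(cosine(v["jaguar speed"], v["big cats"]), 3))  # shared click -> related
```

Queries that attract clicks on the same URLs end up close in this space, which is the basis for inferring semantic relations between them.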
Benz, D. & Hotho, A.: Position Paper: Ontology Learning from Folksonomies. In: Hinneburg, A. (Ed.):
Workshop Proceedings of Lernen - Wissensentdeckung - Adaptivität (LWA 2007). Martin-Luther-Universität Halle-Wittenberg, 2007, pp. 109-112
[Full text] [Abstract]
[BibTeX]
The emergence of collaborative tagging systems with their underlying flat and uncontrolled resource organization paradigm has led to a large number of research activities focussing on a formal description and analysis of the resulting "folksonomies". An interesting outcome is that the characteristic qualities of these systems seem to be inverse to more traditional knowledge structuring approaches like taxonomies or ontologies: The latter provide rich and precise semantics, but suffer - amongst others - from a knowledge acquisition bottleneck. An important step towards exploiting the possible synergies by bridging the gap between both paradigms is the automatic extraction of relations between tags in a folksonomy. This position paper presents preliminary results of ongoing work to induce hierarchical relationships among tags by analyzing the aggregated data of collaborative tagging systems as a basis for an ontology learning procedure.
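A common baseline for inducing such hierarchical tag relations is the co-occurrence subsumption heuristic of Schmitz, which may or may not match the authors' eventual method; a sketch on fabricated tagging posts:

```python
from collections import Counter
from itertools import combinations

def induce_subsumptions(posts, threshold=0.8):
    """Schmitz-style heuristic (one common choice, not necessarily the
    paper's method): tag t subsumes tag s if s nearly always co-occurs
    with t, but not vice versa."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in posts:
        tag_count.update(set(tags))
        pair_count.update(combinations(sorted(set(tags)), 2))
    relations = []
    for (a, b), n in pair_count.items():
        # n / tag_count[b] is P(a | b): of posts tagged b, fraction also tagged a.
        if n / tag_count[b] >= threshold and n / tag_count[a] < threshold:
            relations.append((a, b))  # a is broader than b
        if n / tag_count[a] >= threshold and n / tag_count[b] < threshold:
            relations.append((b, a))
    return relations

# Fabricated posts: every "python" or "java" post also carries "programming".
posts = [
    {"programming", "python"},
    {"programming", "python"},
    {"programming", "python"},
    {"programming", "java"},
    {"programming"},
]
print(induce_subsumptions(posts))
```

The asymmetry of the conditional probabilities is what distinguishes a broader tag from a merely popular one.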
Damme, C. V.; Hepp, M. & Siorpaes, K.: FolksOntology: An Integrated Approach for Turning Folksonomies into Ontologies.
Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007). 2007, pp. 57-70
[Full text] [Abstract]
[BibTeX]
We can observe that the amount of non-toy domain ontologies is still very limited for many areas of interest. In contrast, folksonomies are widely in use for (1) tagging Web pages (e.g. del.icio.us), (2) annotating pictures (e.g. flickr), or (3) classifying scholarly publications (e.g. bibsonomy). However, such folksonomies cannot offer the expressivity of ontologies, and the respective tags often lack a context-independent and intersubjective definition of meaning. Also, folksonomies and other unsupervised vocabularies frequently suffer from inconsistencies and redundancies. In this paper, we argue that the social interaction manifested in folksonomies and in their usage should be exploited for building and maintaining ontologies. Then, we sketch a comprehensive approach for deriving ontologies from folksonomies by integrating multiple resources and techniques. In detail, we suggest combining (1) the statistical analysis of folksonomies, associated usage data, and their implicit social networks, (2) online lexical resources like dictionaries, Wordnet, Google and Wikipedia, (3) ontologies and Semantic Web resources, (4) ontology mapping and matching approaches, and (5) functionality that helps human actors in achieving and maintaining consensus over ontology element suggestions resulting from the preceding steps.
Kennedy, L.; Naaman, M.; Ahern, S.; Nair, R. & Rattenbury, T.: How flickr helps us make sense of the world: context and content in community-contributed media collections.
MULTIMEDIA '07: Proceedings of the 15th international conference on Multimedia. New York, NY, USA: ACM, 2007, pp. 631-640
[Full text] [Abstract]
[BibTeX]
The advent of media-sharing sites like Flickr and YouTube has drastically increased the volume of community-contributed multimedia resources available on the web. These collections have a previously unimagined depth and breadth, and have generated new opportunities – and new challenges – to multimedia research. How do we analyze, understand and extract patterns from these new collections? How can we use these unstructured, unrestricted community contributions of media (and annotation) to generate "knowledge"? As a test case, we study Flickr – a popular photo sharing website. Flickr supports photo, time and location metadata, as well as a light-weight annotation model. We extract information from this dataset using two different approaches. First, we employ a location-driven approach to generate aggregate knowledge in the form of "representative tags" for arbitrary areas in the world. Second, we use a tag-driven approach to automatically extract place and event semantics for Flickr tags, based on each tag's metadata patterns. With the patterns we extract from tags and metadata, vision algorithms can be employed with greater precision. In particular, we demonstrate a location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset. The results suggest that community-contributed media and annotation can enhance and improve our access to multimedia resources – and our understanding of the world.
Zhou, L.: Ontology learning: state of the art and open issues. In:
Information Technology and Management 8 (2007), No. 3, pp. 241-252
[Full text]
[Abstract]
[BibTeX]
Ontology is one of the fundamental cornerstones of the semantic Web. The pervasive use of ontologies in information sharing and knowledge management calls for efficient and effective approaches to ontology development. Ontology learning, which seeks to discover ontological knowledge from various forms of data automatically or semi-automatically, can overcome the bottleneck of ontology acquisition in ontology development. Despite the significant progress in ontology learning research over the past decade, there remain a number of open problems in this field. This paper provides a comprehensive review and discussion of major issues, challenges, and opportunities in ontology learning. We propose a new learning-oriented model for ontology development and a framework for ontology learning. Moreover, we identify and discuss important dimensions for classifying ontology learning approaches and techniques. In light of the impact of domain on choosing ontology learning approaches, we summarize domain characteristics that can facilitate future ontology learning effort. The paper offers a road map and a variety of insights about this fast-growing field.
Cimiano, P.:
Ontology learning and population from text - algorithms, evaluation and applications. Springer, 2006, pp. I-XXVIII, 1-347
[BibTeX]
Dellschaft, K. & Staab, S.: On How to Perform a Gold Standard based Evaluation of Ontology Learning.
Proceedings of ISWC-2006 International Semantic Web Conference. Athens, GA, USA: Springer, LNCS, 2006
[Full text] [Abstract]
[BibTeX]
In recent years several measures for the gold standard based evaluation of ontology learning were proposed. They can be distinguished by the layers of an ontology (e.g. lexical term layer and concept hierarchy) they evaluate. Judging those measures with a list of criteria we show that there exist some measures sufficient for evaluating the lexical term layer. However, existing measures for the evaluation of concept hierarchies fail to meet basic criteria. This paper presents a new taxonomic measure which overcomes the problems of current approaches.
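The simplest of the lexical-layer measures discussed is precision/recall of the learned term set against the gold standard's term set; a sketch on toy term sets (the sets are invented):

```python
def lexical_prf(learned_terms, gold_terms):
    """Lexical precision/recall/F1 of a learned ontology's term layer
    against a gold standard (the most basic of the measures surveyed)."""
    learned, gold = set(learned_terms), set(gold_terms)
    overlap = len(learned & gold)
    p = overlap / len(learned) if learned else 0.0
    r = overlap / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = lexical_prf({"cat", "dog", "car"}, {"cat", "dog", "bird", "fish"})
print(p, r)  # 2/3 of learned terms are in the gold standard; half of gold recovered
```

Evaluating the concept hierarchy, the paper's main concern, additionally requires comparing the taxonomic structure around each term, not just set overlap.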
Girju, R.; Badulescu, A. & Moldovan, D. I.: Automatic Discovery of Part-Whole Relations. In:
Computational Linguistics 32 (2006), No. 1, pp. 83-135
[Full text]
[Abstract]
[BibTeX]
An important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper presents a supervised, semantically intensive, domain independent approach for the automatic detection of part–whole relations in text. First an algorithm is described that identifies lexico-syntactic patterns that encode part–whole relations. A difficulty is that these patterns also encode other semantic relations, and a learning method is necessary to discriminate whether or not a pattern contains a part–whole relation. A large set of training examples have been annotated and fed into a specialized learning system that learns classification rules. The rules are learned through an iterative semantic specialization (ISS) method applied to noun phrase constituents. Classification rules have been generated this way for different patterns such as genitives, noun compounds, and noun phrases containing prepositional phrases to extract part–whole relations from them. The applicability of these rules has been tested on a test corpus obtaining an overall average precision of 80.95% and recall of 75.91%. The results demonstrate the importance of word sense disambiguation for this task. They also demonstrate that different lexico-syntactic patterns encode different semantic information and should be treated separately in the sense that different clarification rules apply to different patterns.
Grobelnik, M. & Mladenić, D.:
Knowledge Discovery for Ontology Construction, 2006
[Full text] [Abstract]
[BibTeX]
This chapter contains sections titled: Introduction; Knowledge Discovery; Ontology Definition; Methodology for Semi-automatic Ontology Construction; Ontology Learning Scenarios; Using Knowledge Discovery for Ontology Learning; Related Work on Ontology Construction; Discussion and Conclusion; Acknowledgments; References.
Snow, R.; Jurafsky, D. & Ng, A. Y.: Semantic Taxonomy Induction from Heterogenous Evidence.
ACL. The Association for Computer Linguistics, 2006
[Full text] [Abstract]
[BibTeX]
We propose a novel algorithm for inducing semantic taxonomies. Previous algorithms for taxonomy induction have typically focused on independent classifiers for discovering new single relationships based on hand-constructed or automatically discovered textual patterns. By contrast, our algorithm flexibly incorporates evidence from multiple classifiers over heterogenous relationships to optimize the entire structure of the taxonomy, using knowledge of a word's coordinate terms to help in determining its hypernyms, and vice versa. We apply our algorithm on the problem of sense-disambiguated noun hyponym acquisition, where we combine the predictions of hypernym and coordinate term classifiers with the knowledge in a preexisting semantic taxonomy (WordNet 2.1). We add 10,000 novel synsets to WordNet 2.1 at 84% precision, a relative error reduction of 70% over a non-joint algorithm using the same component classifiers. Finally, we show that a taxonomy built using our algorithm shows a 23% relative F-score improvement over WordNet 2.1 on an independent testset of hypernym pairs.
Biemann, C.: Ontology Learning from Text: A Survey of Methods. In:
LDV Forum 20 (2005), No. 2, pp. 75-93
[Full text]
[BibTeX]
Brank, J.; Grobelnik, M. & Mladenić, D.: A Survey of Ontology Evaluation Techniques.
Proceedings of the 8th International Multi-Conference Information Society. 2005, pp. 166-169
[Abstract]
[BibTeX]
An ontology is an explicit formal conceptualization of some domain of interest. Ontologies are increasingly used in various fields such as knowledge management, information extraction, and the semantic web. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion of application, typically in order to determine which of several ontologies would best suit a particular purpose. This paper presents a survey of the state of the art in ontology evaluation.
Cimiano, P.; Hotho, A. & Staab, S.: Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis. In:
Journal of Artificial Intelligence Research 24 (2005), Nr. 1, S. 305-339
[Volltext]
[BibTeX]
Cimiano, P.; Hotho, A. & Staab, S.: Comparing Conceptual, Divisive and Agglomerative Clustering for Learning Taxonomies from Text. In: de Mántaras, R. L. & Saitta, L. (Hrsg.):
ECAI 2004: Proceedings of the 16th European Conference on Artificial Intelligence, 22-27 August, Valencia, Spain. IOS Press, 2004, S. 435-439
[Kurzfassung]
[BibTeX]
The application of clustering methods for automatic taxonomy construction from text requires knowledge about the tradeoff between (i) their effectiveness (quality of result), (ii) their efficiency (run-time behaviour), and (iii) the traceability of the taxonomy construction for the ontology engineer. Along these lines, we present an original conceptual clustering method based on Formal Concept Analysis for automatic taxonomy construction and compare it with hierarchical agglomerative clustering and hierarchical divisive clustering.
Cimiano, P.; Staab, S. & Tane, J.: Automatic Acquisition of Taxonomies from Text: FCA meets NLP.
Proceedings of the ECML / PKDD Workshop on Adaptive Text Extraction and Mining. Cavtat-Dubrovnik, Croatia: 2003, S. 10-17
[Volltext] [Kurzfassung]
[BibTeX]
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from domain-specific texts based on Formal Concept Analysis (FCA). Our approach is based on the assumption that verbs pose more or less strong selectional restrictions on their arguments. The conceptual hierarchy is then built on the basis of the inclusion relations between the extensions of the selectional restrictions of all the verbs, while the verbs themselves provide intensional descriptions for each concept. We formalize this idea in terms of FCA and show how our approach can be used to acquire a concept hierarchy for the tourism domain out of texts. We then evaluate our method by considering an already existing ontology for this domain.
Widdows, D. & Dorow, B.: A Graph Model for Unsupervised Lexical Acquisition.
Proceedings of the 19th International Conference on Computational Linguistics (COLING). 2002
[BibTeX]
Omelayenko, B.: Learning of Ontologies for the Web: the Analysis of Existent Approaches.
Proceedings of the International Workshop on Web Dynamics, held in conjunction with the 8th International Conference on Database Theory (ICDT’01), London, UK. 2001
[Volltext] [Kurzfassung]
[BibTeX]
The next generation of the Web, called the Semantic Web, has to improve the Web with semantic (ontological) page annotations to enable knowledge-level querying and searches. Manual construction of these ontologies will require tremendous effort, which forces the future integration of machine learning with knowledge acquisition to enable highly automated ontology learning. In the paper we present the state of the art in the field of ontology learning from the Web to see how it can contribute to the task of semantic Web querying. We consider three components of the query processing system: natural language ontologies, domain ontologies and ontology instances. We discuss the requirements for machine learning algorithms to be applied for the learning of the ontologies of each type from Web documents, and survey the existent ontology learning and other closely related approaches.
Studer, R.; Benjamins, V. R. & Fensel, D.: Knowledge Engineering: Principles and Methods. In:
Data & Knowledge Engineering 25 (1998), Nr. 1-2, S. 161-197
[Volltext]
[Kurzfassung]
[BibTeX]
This paper gives an overview of the development of the field of Knowledge Engineering over the last 15 years. We discuss the paradigm shift from a transfer view to a modeling view and describe two approaches which considerably shaped research in Knowledge Engineering: Role-limiting Methods and Generic Tasks. To illustrate various concepts and methods which evolved in the last years we describe three modeling frameworks: CommonKADS, MIKE, and PROTÉGÉ-II. This description is supplemented by discussing some important methodological developments in more detail: specification languages for knowledge-based systems, problem-solving methods, and ontologies. We conclude with outlining the relationship of Knowledge Engineering to Software Engineering, Information Integration and Knowledge Management.