TY - CHAP AU - Haridas, Mandar AU - Caragea, Doina A2 - Meersman, Robert A2 - Dillon, Tharam A2 - Herrero, Pilar T1 - Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications T2 - On the Move to Meaningful Internet Systems: OTM 2009 PB - Springer CY - Berlin / Heidelberg PY - 2009/ VL - 5871 IS - SP - 1238 EP - 1245 UR - http://dx.doi.org/10.1007/978-3-642-05151-7_35 M3 - 10.1007/978-3-642-05151-7_35 KW - dmoz KW - genta11 KW - hierarchy KW - taxonomy KW - wordnet KW - ol_web2.0 KW - data_wikis KW - methods_concepthierarchy L1 - SN - N1 - SpringerLink - Abstract N1 - AB - The outgrowth of social networks in the recent years has resulted in opportunities for interesting data mining problems, such as interest or friendship recommendations. A global ontology over the interests specified by the users of a social network is essential for accurate recommendations. We propose, evaluate and compare three approaches to engineering a hierarchical ontology over user interests. The proposed approaches make use of two popular knowledge bases, Wikipedia and Directory Mozilla, to extract interest definitions and/or relationships between interests. More precisely, the first approach uses Wikipedia to find interest definitions, the latent semantic analysis technique to measure the similarity between interests based on their definitions, and an agglomerative clustering algorithm to group similar interests into higher level concepts. The second approach uses the Wikipedia Category Graph to extract relationships between interests, while the third approach uses Directory Mozilla to extract relationships between interests. Our results show that the third approach, although the simplest, is the most effective for building a hierarchy over user interests. ER - TY - CONF AU - Silva, L. De AU - Jayaratne, L. 
A2 - T1 - Semi-automatic extraction and modeling of ontologies using Wikipedia XML Corpus T2 - Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the PB - CY - PY - 2009/aug. M2 - VL - IS - SP - 446 EP - 451 UR - http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5273826&arnumber=5273871&count=156&index=116 M3 - 10.1109/ICADIWT.2009.5273871 KW - learning KW - ol_web2.0 KW - ontology KW - ontology_learning KW - semi_automatic KW - wikipedia KW - data_wikis L1 - SN - N1 - N1 - AB - This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus derived from Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using natural language processing (NLP) and other machine learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling and also inspire further research on extracting ontologies from other semi-structured document corpora as well. 
ER - TY - CONF AU - Grineva, Maria AU - Grinev, Maxim AU - Turdakov, Denis AU - Velikhov, Pavel A2 - T1 - Harnessing Wikipedia for Smart Tags Clustering T2 - Proceedings of the International Workshop on Knowledge Acquisition from the Social Web (KASW2008) PB - CY - PY - 2008/ M2 - VL - IS - SP - EP - UR - M3 - KW - clustering KW - ol_web2.0 KW - tags KW - wikipedia KW - methods_concepts KW - data_wikis L1 - SN - N1 - N1 - AB - The quality of the current tagging services can be greatly improved if the service is able to cluster tags by their meaning. Tag clouds clustered by higher level topics enable the users to explore their tag space, which is especially needed when tag clouds become large. We demonstrate TagCluster - a tool for automated tag clustering that harnesses knowledge from Wikipedia about semantic relatedness between tags and names of categories to achieve smart clustering. Our approach shows much better quality of clusters compared to the existing techniques that rely on tag co-occurrence analysis in the tagging service. ER - TY - CONF AU - Medelyan, O. AU - Legg, C. A2 - T1 - Integrating Cyc and Wikipedia: Folksonomy meets rigorously defined common-sense T2 - Proceedings of the WIKI-AI: Wikipedia and AI Workshop at the AAAI PB - CY - PY - 2008/ M2 - VL - 8 IS - SP - EP - UR - http://scholar.google.de/scholar.bib?q=info:hgFpsjJR__4J:scholar.google.com/&output=citation&hl=de&as_sdt=2000&ct=citation&cd=58 M3 - KW - cyc KW - ol_web2.0 KW - tag_concept_mapping KW - data_wikis L1 - SN - N1 - N1 - AB - Integration of ontologies begins with establishing mappings between their concept entries. We map categories from the largest manually-built ontology, Cyc, onto Wikipedia articles describing corresponding concepts. Our method draws both on Wikipedia’s rich but chaotic hyperlink structure and Cyc’s carefully defined taxonomic and common-sense knowledge. 
On 9,333 manual alignments by one person, we achieve an F-measure of 90%; on 100 alignments by six human subjects the average agreement of the method with the subject is close to their agreement with each other. We cover 62.8% of Cyc categories relating to common-sense knowledge and discuss what further information might be added to Cyc given this substantial new alignment. ER - TY - CONF AU - Nazir, F. AU - Takeda, H. A2 - T1 - Extraction and analysis of tripartite relationships from Wikipedia T2 - IEEE International Symposium on Technology and Society PB - CY - PY - 2008/06 M2 - VL - IS - SP - 1 EP - 13 UR - http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4559785 M3 - 10.1109/ISTAS.2008.4559785 KW - ol_web2.0 KW - ontology_learning KW - wikipedia KW - data_wikis L1 - SN - 978-1-4244-1669-1 N1 - N1 - AB - Social aspects are critical in the decision making process for social actors (human beings). Social aspects can be categorized into social interaction, social communities, social groups or any kind of behavior that emerges from interlinking, overlapping or similarities between interests of a society. These social aspects are dynamic and emergent. Therefore, interlinking them in a social structure, based on bipartite affiliation network, may result in isolated graphs. The major reason is that as these correspondences are dynamic and emergent, they should be coupled with more than a single affiliation in order to sustain the interconnections during interest evolutions. In this paper we propose to interlink actors using multiple tripartite graphs rather than a bipartite graph which was the focus of most of the previous social network building techniques. The utmost benefit of using tripartite graphs is that we can have multiple and hierarchical links between social actors. Therefore in this paper we discuss the extraction, plotting and analysis methods of tripartite relations between authors, articles and categories from Wikipedia. 
Furthermore, we also discuss the advantages of tripartite relationships over bipartite relationships. As a conclusion of this study we argue based on our results that to build useful, robust and dynamic social networks, actors should be interlinked in one or more tripartite networks. ER - TY - CONF AU - Auer, Sören AU - Lehmann, Jens A2 - T1 - What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content T2 - ESWC PB - CY - PY - 2007/ M2 - VL - IS - SP - 503 EP - 517 UR - http://www.springerlink.com/content/3131t21p634191n2/ M3 - KW - ol_web2.0 KW - ontology_learning KW - semantics KW - wiki KW - data_wikis L1 - SN - N1 - N1 - AB - Wikis are established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating the by far largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content, and propose strategies for quality improvements with just minor modifications of the wiki systems being currently used. ER - TY - CONF AU - Ponzetto, Simone Paolo AU - Strube, Michael A2 - T1 - Deriving a Large-Scale Taxonomy from Wikipedia. 
T2 - AAAI PB - AAAI Press CY - PY - 2007/ M2 - VL - IS - SP - 1440 EP - 1445 UR - http://dblp.uni-trier.de/db/conf/aaai/aaai2007.html#PonzettoS07 M3 - KW - download KW - ol_web2.0 KW - online KW - ontology KW - taxonomy KW - wikipedia KW - methods_concepthierarchy KW - data_wikis L1 - SN - 978-1-57735-323-2 N1 - dblp N1 - AB - We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexicosyntactic matching. As a result we are able to derive a large scale taxonomy containing a large amount of subsumption, i.e. isa, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, as well as computing semantic similarity between words in benchmarking datasets. ER - TY - CONF AU - Strube, Michael AU - Ponzetto, Simone Paolo A2 - T1 - WikiRelate! Computing Semantic Relatedness Using Wikipedia. T2 - AAAI PB - AAAI Press CY - PY - 2006/ M2 - VL - IS - SP - EP - UR - http://www.dit.unitn.it/~p2p/RelatedWork/Matching/aaai06.pdf M3 - KW - ol_web2.0 KW - semantic_relatedness KW - wikipedia KW - wikirelate KW - data_wikis L1 - SN - N1 - dblp N1 - AB - Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. 
We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts. ER - TY - CHAP AU - Ruiz-Casado, Maria AU - Alfonseca, Enrique AU - Castells, Pablo A2 - Montoyo, Andrés A2 - Muñoz, Rafael A2 - Métais, Elisabeth T1 - Automatic Extraction of Semantic Relationships for WordNet by Means of Pattern Learning from Wikipedia T2 - Natural Language Processing and Information Systems PB - Springer CY - Berlin / Heidelberg PY - 2005/ VL - 3513 IS - SP - 233 EP - 242 UR - http://dx.doi.org/10.1007/11428817_7 M3 - 10.1007/11428817_7 KW - ol_web2.0 KW - patterns KW - wikipedia KW - wordnet KW - data_wikis KW - methods_relations L1 - SN - N1 - SpringerLink - Abstract N1 - AB - This paper describes an automatic approach to identify lexical patterns which represent semantic relationships between concepts, from an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 1200 new relationships that did not appear in WordNet originally. The precision of these relationships ranges between 0.61 and 0.69, depending on the relation. ER -