D. Sánchez. The Knowledge Engineering Review 24 (04):
Ontology Learning is defined as the set of methods used to build an ontology from scratch, or to enrich or adapt an existing one, in a semi-automatic fashion using heterogeneous information sources. This data-driven procedure uses text, electronic dictionaries, linguistic ontologies and structured and semi-structured information to acquire knowledge. Recently, with the enormous growth of the Information Society, the Web has become a valuable source of information for almost every possible domain of knowledge. This has motivated researchers to start considering the Web as a valid repository for Information Retrieval and Knowledge Acquisition. However, the Web suffers from problems that are not typically observed in classical information repositories: human-oriented presentation, noise, untrusted sources, high dynamicity and overwhelming size. Even so, it also presents characteristics that can be interesting for knowledge acquisition: due to its huge size and heterogeneity, it has been assumed that the Web approximates the real distribution of information in humankind. The present work introduces a novel approach for ontology learning, with new methods for knowledge acquisition from the Web. The present proposal is distinguished from previous works by the adaptation of several well-known learning techniques to the Web corpus and by the exploitation of particular characteristics of the Web environment, which together compose an automatic, unsupervised and domain-independent approach. With respect to the ontology building process, the following methods have been developed: i) extraction and selection of domain-related terms, organising them in a taxonomical way; ii) discovery and labelling of non-taxonomical relationships between concepts; iii) additional methods for improving the final structure, including the detection of named entities, class features and multiple inheritance, as well as a certain degree of semantic disambiguation.
The full learning methodology has been implemented in a distributed agent-based fashion, providing a scalable solution. It has been evaluated for several clearly distinct domains of knowledge, obtaining good-quality results. Finally, several direct applications have been developed, including automatic structuring of digital libraries and web resources, and ontology-based Web Information Retrieval.
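
As an illustration of item i) above, the selection of domain-related terms, the following minimal sketch shows one way Web-scale statistics could be used to score candidate terms against a domain keyword. It is not taken from the work itself: the scoring function (pointwise mutual information estimated from page counts), the names `web_relatedness` and `select_terms`, the threshold, and all page counts are assumptions made for illustration. The counts are passed in by the caller, so no search engine is queried.

```python
import math


def web_relatedness(hits_term, hits_domain, hits_joint, total_pages):
    """Pointwise mutual information (in bits) estimated from page counts.

    All arguments are hypothetical page counts supplied by the caller:
    pages containing the term, pages containing the domain keyword,
    pages containing both, and the assumed size of the corpus.
    """
    if min(hits_term, hits_domain, hits_joint) <= 0:
        # No usable evidence of co-occurrence.
        return float("-inf")
    p_term = hits_term / total_pages
    p_domain = hits_domain / total_pages
    p_joint = hits_joint / total_pages
    return math.log2(p_joint / (p_term * p_domain))


def select_terms(candidates, hits_domain, total_pages, threshold):
    """Keep candidates whose score reaches the threshold, best first.

    `candidates` maps each term to a pair (hits_term, hits_joint).
    """
    scored = {
        term: web_relatedness(h_term, hits_domain, h_joint, total_pages)
        for term, (h_term, h_joint) in candidates.items()
    }
    kept = [term for term, score in scored.items() if score >= threshold]
    return sorted(kept, key=lambda term: scored[term], reverse=True)


# Invented counts, for illustration only.
example_candidates = {
    "biosensor": (1_000, 500),      # rare term, often seen with the domain
    "homepage": (500_000, 5_000),   # frequent term, independent of the domain
    "unseen": (2_000, 0),           # never co-occurs with the domain
}
selected = select_terms(
    example_candidates, hits_domain=10_000, total_pages=1_000_000, threshold=1.0
)
```

With these invented counts, a frequent but domain-independent term scores near zero and is discarded, while a term that co-occurs with the domain keyword far more often than chance would predict is kept. A real system would also have to cope with the noise and dynamicity of Web counts mentioned above.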