Text Mining Scientific Papers: A Survey on FCA-Based Information Retrieval Research
Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 that mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus of terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and a survey of FCA-based IR research.
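The pipeline described above (plain text, thesaurus-based indexing, concept lattice) can be sketched on a toy scale. This is a minimal illustration, not CORDIET: it assumes the papers are already plain text, replaces Lucene indexing with case-insensitive substring matching against a hypothetical thesaurus, and treats each document as an object and each thesaurus term as an attribute. All formal concepts are then obtained by closing the document intents under intersection. The names `THESAURUS`, `DOCS`, `build_context` and `concepts` are hypothetical.

```python
# Hypothetical thesaurus: each FCA-related term maps to the surface forms counted as a hit.
THESAURUS = {
    "concept lattice": ["concept lattice", "galois lattice"],
    "query refinement": ["query refinement", "query expansion"],
    "ontology": ["ontology", "ontologies"],
}

# Hypothetical plain-text papers (in practice, text extracted from the PDF files).
DOCS = {
    "p1": "We build a Galois lattice for query refinement.",
    "p2": "An ontology is derived from the concept lattice.",
    "p3": "Query expansion with ontologies.",
}


def build_context(docs, thesaurus):
    """Formal context as {document id: set of thesaurus terms occurring in its text}."""
    context = {}
    for doc_id, text in docs.items():
        lowered = text.lower()
        context[doc_id] = frozenset(
            term for term, forms in thesaurus.items()
            if any(form in lowered for form in forms)
        )
    return context


def concepts(context, attributes):
    """All formal concepts (extent, intent), obtained by closing object intents under intersection."""
    intents = {frozenset(attributes)}  # the full attribute set is the empty intersection
    for row in context.values():
        intents |= {row & intent for intent in intents}
    result = []
    for intent in intents:
        extent = frozenset(doc for doc, row in context.items() if intent <= row)
        result.append((extent, intent))
    # Order from the most specific concepts (small extents) to the most general.
    return sorted(result, key=lambda c: (len(c[0]), sorted(c[0])))


for extent, intent in concepts(build_context(DOCS, THESAURUS), THESAURUS):
    print(sorted(extent), sorted(intent))
```

Intersecting object intents is adequate for a handful of documents; larger collections would call for a dedicated algorithm such as Next-Closure.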
Publication Analysis of the Formal Concept Analysis Community
We present an analysis of the publication and citation networks of all previous editions of the three conferences most relevant to the FCA community: ICFCA, ICCS and CLA. Using data mining methods from FCA and graph analysis, we investigate patterns and communities among authors, we identify and visualize influential publications and authors, and we give a statistical summary of the conferences’ history.
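As a rough illustration of the graph side of such an analysis, the sketch below ranks publications by how often they are cited within the corpus and groups authors into connected components of the co-authorship graph. Both measures are simple stand-ins chosen for this example, not necessarily the methods used in the analysis; the records in `PAPERS` and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: paper id -> (authors, cited paper ids). Not real data.
PAPERS = {
    "P1": (["u1", "u2"], []),
    "P2": (["u3"], ["P1"]),
    "P3": (["u2", "u4"], ["P1", "P2"]),
    "P4": (["u5"], ["P1", "X9"]),  # X9 lies outside the corpus and is ignored
}


def citation_counts(papers):
    """How often each paper is cited by other papers of the same corpus."""
    counts = Counter({pid: 0 for pid in papers})
    for _, refs in papers.values():
        counts.update(ref for ref in refs if ref in papers)
    return counts


def coauthor_communities(papers):
    """Connected components of the co-authorship graph, a crude notion of community."""
    graph = defaultdict(set)
    for authors, _ in papers.values():
        for a in authors:
            graph[a].update(b for b in authors if b != a)
    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


print(citation_counts(PAPERS).most_common(1))
print(coauthor_communities(PAPERS))
```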
Computing iceberg concept lattices with TITANIC
We introduce the notion of iceberg concept lattices and show their use in knowledge discovery in databases. Iceberg lattices are a conceptual clustering method that is well suited for analyzing very large databases. They also serve as a condensed representation of frequent itemsets, as a starting point for computing bases of association rules, and as a visualization method for association rules. Iceberg concept lattices are based on the theory of Formal Concept Analysis, a mathematical theory with applications in data analysis, information retrieval, and knowledge discovery. We present a new algorithm called TITANIC for computing (iceberg) concept lattices. It is based on data mining techniques with a level-wise approach. In fact, TITANIC can be used for a more general problem: computing arbitrary closure systems when the closure operator comes along with a so-called weight function. The use of weight functions for computing closure systems has not been discussed in the literature up to now. Applications providing such a weight function include association rule mining, functional dependencies in databases, conceptual clustering, and ontology engineering. The algorithm is experimentally evaluated and compared with Ganter's Next-Closure algorithm. The evaluation shows a significant gain in efficiency, especially for weakly correlated data.
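The level-wise, weight-function view of closures described above can be sketched as follows, using the number of objects that contain an itemset (its support count) as the weight. Key sets are taken to be candidates whose weight is strictly smaller than that of each immediate subset, and the closure of a set X is X together with every attribute whose addition leaves the weight unchanged. This is a simplified sketch in the spirit of TITANIC, not the published algorithm: for brevity it counts every candidate's weight directly on the data and omits TITANIC's optimizations for avoiding such passes. The function names are hypothetical.

```python
from itertools import combinations


def weight(itemset, objects):
    """Weight function: number of objects (rows) that contain every item of itemset."""
    return sum(1 for row in objects if itemset <= row)


def closure(itemset, attributes, objects):
    """Closure via the weight function: add each attribute that leaves the weight unchanged."""
    w = weight(itemset, objects)
    return itemset | {a for a in attributes - itemset if weight(itemset | {a}, objects) == w}


def candidates_from(keys):
    """Apriori-style join: (k+1)-sets all of whose k-subsets are key sets."""
    key_set = set(keys)
    result = set()
    for a, b in combinations(keys, 2):
        union = a | b
        if len(union) == len(a) + 1 and all(union - {x} in key_set for x in union):
            result.add(union)
    return sorted(result, key=sorted)


def iceberg_intents(objects, min_support):
    """Closed itemsets whose support (fraction of objects) is at least min_support."""
    objects = [frozenset(row) for row in objects]
    attributes = frozenset().union(*objects)
    min_count = min_support * len(objects)
    weights = {frozenset(): len(objects)}
    keys = [frozenset()]
    level = [frozenset([a]) for a in sorted(attributes)]
    while level:  # one pass per itemset size k = 1, 2, ...
        new_keys = []
        for cand in level:
            weights[cand] = weight(cand, objects)
            # Key set: weight strictly below that of every immediate subset.
            is_key = all(weights[cand] < weights[cand - {x}] for x in cand)
            if is_key and weights[cand] >= min_count:
                new_keys.append(cand)
        keys.extend(new_keys)
        level = candidates_from(new_keys)
    # Every frequent closed set is the closure of one of its frequent key sets.
    return {closure(k, attributes, objects) for k in keys if weights[k] >= min_count}


print(iceberg_intents([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], 0.5))
```

In the small example, attribute `a` occurs in every object, so it is absorbed into the closure of each key set rather than appearing as a key itself.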
Attribute Exploration on the Web
We propose an approach for supporting attribute exploration by web information retrieval, in particular by posing appropriate queries to search engines, crowdsourcing systems, and the linked open data cloud. We discuss the general assumptions underlying this approach and the degree to which they can be taken for granted.
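A minimal sketch of the exploration loop, assuming Ganter's Next-Closure is used to enumerate candidate premises in lectic order. The `oracle` is a hypothetical stand-in for the web sources mentioned above (search engines, crowdsourcing, linked open data): here it simply looks up a small, fully known universe of objects (the numbers 1 to 10 described by parity, primality and squareness) and either returns a counterexample or accepts the implication. A real web-backed oracle could return incomplete or wrong answers; this sketch assumes a truthful one. All function names are hypothetical.

```python
def intent_closure(attrs, objects, attributes):
    """A'' in the current context: attributes shared by all objects having every attribute in attrs."""
    result = set(attributes)
    for row in objects:
        if attrs <= row:
            result &= row
    return frozenset(result)


def implication_closure(attrs, implications):
    """Smallest superset of attrs that respects every accepted implication."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def next_closure(current, order, close):
    """Ganter's Next-Closure: the lectically next closed set after `current`, or None."""
    prefix = set(current)
    for i in reversed(range(len(order))):
        m = order[i]
        if m in prefix:
            prefix.remove(m)
        else:
            candidate = close(prefix | {m})
            # Accept only if the closure adds no attribute that precedes m.
            if all(order.index(x) >= i for x in candidate - prefix):
                return candidate
    return None


def explore(attributes, oracle, objects=()):
    """Attribute exploration: returns (accepted implications, final list of objects)."""
    attributes = frozenset(attributes)
    order = sorted(attributes)
    objects = [frozenset(o) for o in objects]
    basis = []
    current = frozenset()
    while current is not None:
        while True:
            implied = intent_closure(current, objects, attributes)
            if implied == current:
                break  # current is an intent of the context; nothing to ask
            counterexample = oracle(current, implied)
            if counterexample is None:
                basis.append((current, implied))  # implication confirmed
                break
            objects.append(frozenset(counterexample))  # refutes current -> implied
        current = next_closure(current, order, lambda x: implication_closure(x, basis))
    return basis, objects


def describe(n):
    """Hypothetical attribute lookup for the number n (stands in for a web query)."""
    attrs = {"even" if n % 2 == 0 else "odd"}
    if n in (2, 3, 5, 7):
        attrs.add("prime")
    if n in (1, 4, 9):
        attrs.add("square")
    return frozenset(attrs)


UNIVERSE = [describe(n) for n in range(1, 11)]


def oracle(premise, conclusion):
    """Return a known object that has the premise but not the whole conclusion, else None."""
    for obj in UNIVERSE:
        if premise <= obj and not conclusion <= obj:
            return obj
    return None


basis, objects = explore({"even", "odd", "prime", "square"}, oracle)
for premise, conclusion in basis:
    print(sorted(premise), "->", sorted(conclusion - premise))
```

The accepted implications form the canonical (Duquenne-Guigues) basis of the final context. With this universe the loop should accept only two implications, with premises {prime, square} and {even, odd}, since no number from 1 to 10 has either pair of attributes.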