Text Mining Scientific Papers: A Survey on FCA-Based Information Retrieval Research

Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 that mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus of terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of FCA-based IR research.
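The abstract does not describe CORDIET's internals, so the following is only a minimal sketch of the underlying idea, under simplifying assumptions. Papers become objects and thesaurus terms become attributes. A paper is taken to "have" a term when the term occurs as a case-insensitive substring of its text, which stands in for the Lucene index. Concepts are enumerated naively by closing every term subset. `build_context`, `all_concepts` and the toy corpus are hypothetical.

```python
from itertools import combinations

def build_context(docs, thesaurus):
    """Formal context: map each paper id to the thesaurus terms it mentions.

    Case-insensitive substring matching is an assumed stand-in for a Lucene index.
    """
    return {
        doc_id: frozenset(t for t in thesaurus if t.lower() in text.lower())
        for doc_id, text in docs.items()
    }

def extent(context, terms):
    """Derivation B': papers that mention every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)

def intent(context, papers, thesaurus):
    """Derivation A': terms mentioned by every paper in `papers`."""
    shared = set(thesaurus)
    for d in papers:
        shared &= context[d]
    return frozenset(shared)

def all_concepts(context, thesaurus):
    """All formal concepts (extent, intent), found by closing every term subset.

    Exponential in the number of terms, so only suitable for toy thesauri.
    """
    concepts = set()
    for k in range(len(thesaurus) + 1):
        for subset in combinations(sorted(thesaurus), k):
            a = extent(context, frozenset(subset))
            concepts.add((a, intent(context, a, thesaurus)))
    return concepts

# Hypothetical toy corpus and thesaurus.
docs = {
    "p1": "FCA for information retrieval",
    "p2": "Iceberg lattices and FCA",
    "p3": "Web search and information retrieval",
}
thesaurus = {"FCA", "information retrieval", "lattice"}
context = build_context(docs, thesaurus)
concepts = all_concepts(context, thesaurus)
```

Each (extent, intent) pair is one node of the concept lattice. Ordering the pairs by extent inclusion gives the lattice diagram used to browse the literature.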

Publication Analysis of the Formal Concept Analysis Community

We present an analysis of the publication and citation networks of all previous editions of the three conferences most relevant to the FCA community: ICFCA, ICCS and CLA. Using data mining methods from FCA and graph analysis, we investigate patterns and communities among authors, we identify and visualize influential publications and authors, and we give a statistical summary of the conferences’ history.
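The abstract does not name the specific graph measures used. As an illustration only, the sketch below treats citation counts (in-degree in the citation graph) as a crude influence signal. It also treats connected components of the co-authorship graph as the simplest possible notion of an author community. The function names and toy data are hypothetical and do not represent the authors' data or method.

```python
from collections import Counter, defaultdict

def citation_counts(citations):
    """In-degree of each paper in a citation graph given as (citing, cited) pairs."""
    return Counter(cited for _citing, cited in citations)

def coauthor_components(papers):
    """Connected components of the co-authorship graph (authors linked by a shared paper)."""
    graph = defaultdict(set)
    for authors in papers.values():
        for a in authors:
            graph[a] |= set(authors) - {a}
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        components.append(comp)
    return components

# Hypothetical toy data: paper -> authors, and (citing, cited) pairs.
papers = {"P1": ["A", "B"], "P2": ["B", "C"], "P3": ["D"]}
citations = [("P2", "P1"), ("P3", "P1"), ("P3", "P2")]
counts = citation_counts(citations)
components = coauthor_components(papers)
```

Richer community detection or FCA over an author-by-paper context would replace the connected-components step. The input shapes stay the same.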

Computing iceberg concept lattices with TITANIC

We introduce the notion of iceberg concept lattices and show their use in knowledge discovery in databases. Iceberg lattices are a conceptual clustering method that is well suited for analyzing very large databases. They also serve as a condensed representation of frequent itemsets, as a starting point for computing bases of association rules, and as a visualization method for association rules. Iceberg concept lattices are based on the theory of Formal Concept Analysis, a mathematical theory with applications in data analysis, information retrieval, and knowledge discovery. We present a new algorithm called TITANIC for computing (iceberg) concept lattices. It is based on data mining techniques with a level-wise approach. In fact, TITANIC can be used for a more general problem: computing arbitrary closure systems when the closure operator comes along with a so-called weight function. The use of weight functions for computing closure systems has not previously been discussed in the literature. Applications providing such a weight function include association rule mining, functional dependencies in databases, conceptual clustering, and ontology engineering. The algorithm is experimentally evaluated and compared with Ganter's Next-Closure algorithm. The evaluation shows a significant gain in efficiency, especially for weakly correlated data.
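The following is a minimal Python sketch of the level-wise idea. It assumes the weight function is support counting over a list of transactions, which corresponds to association rule mining, one of the applications named above. The sketch searches for frequent key sets level by level, where a key set is a set whose support is lower than that of every proper subset. It then closes each key set using only the weight function. Unlike TITANIC as published, it recounts every support directly from the data instead of inferring weights from already-known key sets. It therefore shows the principle, not the algorithm's efficiency gains. `support`, `iceberg_intents` and the toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Weight function: number of transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def iceberg_intents(transactions, minsupp):
    """Frequent closed itemsets (intents of the iceberg concept lattice).

    Level-wise search for frequent key sets: X is a key set when its support is
    lower than that of every proper subset; since support is antimonotone, it
    suffices to compare against the subsets obtained by dropping one item.
    """
    items = sorted(set().union(*transactions))
    weight = {frozenset(): len(transactions)}
    level = [frozenset()]  # frequent key sets of the current size
    all_keys = [frozenset()]
    while level:
        previous = set(level)
        candidates = {k | {i} for k in level for i in items if i not in k}
        level = []
        for cand in candidates:
            subsets = [cand - {x} for x in cand]
            if not all(sub in previous for sub in subsets):
                continue  # Apriori-style pruning: subsets of key sets are key sets
            s = support(transactions, cand)
            if s >= minsupp and s < min(weight[sub] for sub in subsets):
                weight[cand] = s
                level.append(cand)
        all_keys.extend(level)
    # Closure via the weight function: add every item that leaves the weight unchanged.
    return {
        k | frozenset(i for i in items if support(transactions, k | {i}) == weight[k])
        for k in all_keys
        if weight[k] >= minsupp
    }

# Hypothetical toy transactions.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
print(sorted("".join(sorted(c)) for c in iceberg_intents(transactions, 2)))
# ['a', 'ab', 'ac']
```

Lowering `minsupp` to 1 also yields the closed set `abc`, which has support 1. This illustrates how the threshold cuts off the lower part of the full concept lattice.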

Attribute Exploration on the Web

We propose an approach for supporting attribute exploration by web information retrieval, in particular by posing appropriate queries to search engines, crowdsourcing systems, and the linked open data cloud. We discuss the general assumptions underlying this approach and the degree to which they can be taken for granted.
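The abstract does not give an algorithm. Below is a minimal sketch of the classical attribute exploration loop, which enumerates sets in lectic order with Ganter's Next-Closure step. Here the human expert is replaced by an `oracle` callback. `make_oracle` and the toy universe are hypothetical stand-ins for search engine, crowdsourcing or linked-data queries, and no real web API is called. The sketch assumes the oracle answers correctly and returns only genuine counterexamples, which is exactly the kind of assumption the abstract says must be examined.

```python
def ctx_closure(objects, attrs, A):
    """A'' in the current context: attributes common to all objects having A."""
    common = set(attrs)
    for g_attrs in objects.values():
        if A <= g_attrs:
            common &= g_attrs
    return frozenset(common)

def impl_closure(L, A):
    """Smallest superset of A closed under every implication in L."""
    A = set(A)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in L:
            if premise <= A and not conclusion <= A:
                A |= conclusion
                changed = True
    return frozenset(A)

def next_closed(A, attrs, L):
    """Next-Closure step: the lectically next L-closed set after A, or None."""
    for i in reversed(range(len(attrs))):
        m = attrs[i]
        if m in A:
            continue
        prefix = set(attrs[:i])
        B = impl_closure(L, (A & prefix) | {m})
        if not (B - A) & prefix:  # no new attribute smaller than m
            return B
    return None

def explore(attrs, objects, oracle):
    """Attribute exploration with `oracle` in place of a human expert.

    oracle(premise, conclusion) returns None to accept the implication,
    or (name, attribute_set) as a counterexample.
    """
    objects, L = dict(objects), []
    A = frozenset()
    while A is not None:
        while (C := ctx_closure(objects, attrs, A)) != A:
            answer = oracle(A, C)
            if answer is None:
                L.append((A, C))  # accepted implication A -> A''
                break
            name, g_attrs = answer
            g_attrs = frozenset(g_attrs)
            if not (A <= g_attrs and not C <= g_attrs):
                raise ValueError(f"{name!r} is not a counterexample")
            objects[name] = g_attrs
        A = next_closed(A, attrs, L)
    return L, objects

def make_oracle(universe):
    """Hypothetical stand-in for web queries: search a known universe of objects."""
    def oracle(premise, conclusion):
        for name, g_attrs in universe.items():
            if premise <= g_attrs and not conclusion <= g_attrs:
                return name, g_attrs
        return None
    return oracle

universe = {
    "duck": {"bird", "flies", "swims"},
    "penguin": {"bird", "swims"},
    "eagle": {"bird", "flies"},
    "fish": {"swims"},
}
attrs = ["bird", "flies", "swims"]
L, objects = explore(attrs, {}, make_oracle(universe))
```

Starting from an empty context, the oracle supplies the counterexamples penguin, eagle and fish. The only implication accepted is that everything that flies is a bird. A real web oracle could return wrong or incomplete answers, so in practice the validity check and the final implication list would still need human review.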