Closed patterns meet n-ary relations.
ACM Transactions on Knowledge Discovery from Data, 3(1):1-36, 2009.
Loïc Cerf, Jérémy Besson, Céline Robardet and Jean-François Boulicaut.
Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing this task to n-ary relations (n ≥ 2) is a timely challenge. It may be important for many applications, for example when adding a time dimension to the popular objects × features binary case. The generality of the task (no assumption is made on the relation arity or on the size of its attribute domains) makes it computationally challenging. We introduce an algorithm called Data-Peeler. From an n-ary relation, it extracts all closed n-sets satisfying given piecewise (anti-)monotonic constraints. This new class of constraints generalizes both monotonic and anti-monotonic constraints. In the special case of ternary relations, Data-Peeler outperforms the state-of-the-art algorithms CubeMiner and Trias by orders of magnitude. These performance gains come from a new enumeration strategy that efficiently enforces the closedness property. The relevance of the extracted closed n-sets is assessed on real-life 3- and 4-ary relations. Beyond natural 3- or 4-ary relations, expanding a relation with an additional attribute can help enforce rather abstract constraints such as robustness with respect to binarization. Furthermore, a collection of closed n-sets is shown to be an excellent starting point for computing a tiling of the dataset.
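To make the notion of a closed n-set concrete, the sketch below checks it directly from its definition on a tiny objects × features × time relation: an n-set is a "cube" when the Cartesian product of its sets lies entirely inside the relation, and it is closed when no single element can be added to any one of its sets while staying inside the relation. This is only a naive, definition-level illustration under that assumed definition; the function names (`is_cube`, `is_closed`, `closed_nsets_brute_force`) are hypothetical, and the brute-force enumeration is exponential and bears no relation to Data-Peeler's enumeration strategy.

```python
from itertools import combinations, product


def is_cube(relation, nset):
    """True if every tuple of the Cartesian product of the sets in nset is in relation."""
    return all(t in relation for t in product(*nset))


def is_closed(relation, domains, nset):
    """True if nset is a cube and no element of any domain can be added to the
    corresponding set while keeping the product inside the relation."""
    if not is_cube(relation, nset):
        return False
    for i, domain in enumerate(domains):
        for element in domain - nset[i]:
            extended = list(nset)
            extended[i] = nset[i] | {element}
            if is_cube(relation, extended):
                return False  # nset can be extended along attribute i
    return True


def nonempty_subsets(domain):
    """Yield every non-empty subset of domain (exponential; toy sizes only)."""
    items = sorted(domain)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def closed_nsets_brute_force(relation, domains):
    """List all closed n-sets whose sets are all non-empty, by exhaustive search."""
    candidates = product(*(list(nonempty_subsets(d)) for d in domains))
    return [nset for nset in candidates if is_closed(relation, domains, nset)]


# Toy ternary relation: objects x features x time points.
R = {("o1", "a", "t1"), ("o1", "b", "t1"), ("o2", "a", "t1"),
     ("o2", "b", "t1"), ("o1", "a", "t2"), ("o2", "a", "t2")}
D = [{"o1", "o2"}, {"a", "b"}, {"t1", "t2"}]

for nset in closed_nsets_brute_force(R, D):
    print(nset)
```

On this toy relation, `o1` and `o2` behave identically, so every closed 3-set contains both objects; only the feature and time sets vary.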
Data-Peeler: Constraint-based Closed Pattern Mining in n-ary Relations.
In: Proc. SIAM International Conference on Data Mining (SDM'08), pages 37-48.
2008.
Loïc Cerf, Jérémy Besson, Céline Robardet and Jean-François Boulicaut.
Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms that extract frequent closed sets are now available. Generalizing this task to n-ary relations (n ≥ 2) is a timely challenge. It may be important for many applications, e.g., when adding a time dimension to the popular objects × features binary case. The generality of the task (no assumption is made on the relation arity or on the size of its attribute domains) makes it computationally challenging. We introduce an algorithm called Data-Peeler. From an n-ary relation, it extracts all closed n-sets satisfying given piecewise (anti-)monotonic constraints. This new class of constraints generalizes both monotonic and anti-monotonic constraints. In the special case of ternary relations, Data-Peeler outperforms the state-of-the-art algorithms CubeMiner and Trias by orders of magnitude. These performance gains come from a new enumeration strategy that enables efficient closedness checking. An original application to a real-life 4-ary relation is used to assess the relevance of constraint-based mining of closed n-sets.
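The practical value of (anti-)monotonic constraints is pruning: if a search node already fixes some elements (`present`) and may still add others (`candidates`), every n-set below it lies between `present` and `present | candidates`. The sketch below shows one common way to exploit this, assuming each argument of the constraint is either monotonic (evaluate it on the largest reachable set) or anti-monotonic (evaluate it on the smallest reachable set); if even this optimistic evaluation fails, the whole subtree can be discarded. The constraint `objects_dominate_periods` (at least twice as many objects as time points) is a hypothetical example that is monotonic in the objects and anti-monotonic in the time points, so it is neither monotonic nor anti-monotonic as a whole. This is an illustration in the spirit of the abstract, not Data-Peeler's code, and the general piecewise class is broader: it also covers a variable occurring several times with different behaviours, which this sketch does not handle.

```python
def may_contain_valid_nset(constraint, kinds, present, candidates):
    """Optimistic pruning test for one search node.

    present[i]: elements already in the i-th set of every n-set below the node.
    candidates[i]: elements that may still be added to the i-th set.
    kinds[i]: "monotonic" or "antimonotonic", the constraint's behaviour in argument i.
    Returns False only when no n-set between present and present | candidates
    can satisfy the constraint.
    """
    bounds = []
    for kind, pres, cand in zip(kinds, present, candidates):
        if kind == "monotonic":
            bounds.append(pres | cand)  # growing this argument can only help
        elif kind == "antimonotonic":
            bounds.append(pres)  # shrinking this argument can only help
        else:
            raise ValueError(f"unknown kind: {kind}")
    return constraint(*bounds)


def objects_dominate_periods(objects, features, periods):
    """Hypothetical constraint: at least twice as many objects as time points."""
    return len(objects) >= 2 * len(periods)


# Features do not appear in the constraint, so either kind is safe for them.
KINDS = ("monotonic", "antimonotonic", "antimonotonic")

# The node can still reach ({o1, o2}, ..., {t1}), which satisfies 2 >= 2: keep exploring.
print(may_contain_valid_nset(objects_dominate_periods, KINDS,
                             ({"o1"}, {"a"}, {"t1"}), ({"o2"}, set(), {"t2"})))
```

Because the test only needs the two bounds of the node, its cost does not depend on how many n-sets the subtree contains.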
Trend Detection in Folksonomies.
In: Y. S. Avrithis, Y. Kompatsiaris, S. Staab and N. E. O'Connor, editors,
Proc. First International Conference on Semantics and Digital Media Technology (SAMT), volume 4306, series LNCS, pages 56-70.
Springer, Heidelberg, 2006.
Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme.
As the number of resources on the web far exceeds the number of documents one can track, it becomes increasingly difficult to remain up to date on one's own areas of interest. The problem becomes more severe with the increasing fraction of multimedia data, from which it is difficult to extract a conceptual description of the contents. Social bookmark tools, which are rapidly emerging on the web, offer one way to overcome this problem. In such systems, users set up lightweight conceptual structures called folksonomies and thus overcome the knowledge acquisition bottleneck. As more and more people participate in the effort, the use of a common vocabulary becomes more and more stable. We present an approach for discovering topic-specific trends within folksonomies. It is based on a differential adaptation of the PageRank algorithm to the triadic hypergraph structure of a folksonomy. The approach allows for any kind of data, as it does not rely on the internal structure of the documents. In particular, this allows different data types to be considered in the same analysis step. We run experiments on a large-scale real-world snapshot of a social bookmarking system.
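The sketch below illustrates what a differential adaptation of PageRank to a folksonomy can look like. It assumes the triadic hypergraph is flattened into an undirected weighted graph (each (user, tag, resource) assignment adds weight 1 to the user-tag, tag-resource and user-resource edges), runs a PageRank-style iteration w ← d·A·w + (1 − d)·p twice, once with a uniform preference vector p and once with extra preference on the topic of interest, and reports the difference. The flattening, the damping value, the `boost` parameter and all function names are assumptions for illustration; this is not necessarily the exact procedure or parameterization used in the paper.

```python
from collections import defaultdict


def build_graph(assignments):
    """Flatten (user, tag, resource) triples into an undirected weighted graph."""
    weights = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        u, t, r = ("user", user), ("tag", tag), ("resource", resource)
        for a, b in ((u, t), (t, r), (u, r)):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def spread(weights, preference, damping=0.7, iterations=50):
    """Iterate w <- d * A w + (1 - d) * p, where A passes each node's weight
    to its neighbours in proportion to the edge weights."""
    nodes = list(weights)
    total_pref = sum(preference.values())
    p = {n: preference.get(n, 0.0) / total_pref for n in nodes}
    w = {n: 1.0 / len(nodes) for n in nodes}
    degree = {n: sum(weights[n].values()) for n in nodes}
    for _ in range(iterations):
        new_w = {n: (1 - damping) * p[n] for n in nodes}
        for n in nodes:
            for m, weight in weights[n].items():
                new_w[m] += damping * w[n] * weight / degree[n]
        w = new_w
    return w


def differential_rank(assignments, focus, boost=10.0, damping=0.7, iterations=50):
    """Topic-biased ranking minus uniform ranking; positive scores mark nodes
    that gain importance with respect to the focus nodes."""
    weights = build_graph(assignments)
    uniform = {n: 1.0 for n in weights}
    biased = {n: 1.0 + (boost if n in focus else 0.0) for n in weights}
    baseline = spread(weights, uniform, damping, iterations)
    topical = spread(weights, biased, damping, iterations)
    return {n: topical[n] - baseline[n] for n in weights}


assignments = [("alice", "python", "r1"), ("alice", "pandas", "r2"),
               ("bob", "python", "r1"), ("bob", "java", "r3")]
scores = differential_rank(assignments, focus={("tag", "python")})
for node in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(node, round(scores[node], 4))
```

Taking the difference removes the part of a node's score that is due only to its overall popularity in the graph, so the ranking emphasizes what is specific to the chosen topic.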