Journal Articles
Finding community structure in very large networks.
Physical Review E, 70:066111, 2004.
Aaron Clauset, M. E. J. Newman and Cristopher Moore.
Miscellaneous
Finding community structure in very large networks.
2004.
Aaron Clauset, M. E. J. Newman and Cristopher Moore.
Abstract:
The discovery and analysis of community structure in networks is a topic of
considerable recent interest within the physics community, but most methods
proposed so far are unsuitable for very large networks because of their
computational cost. Here we present a hierarchical agglomeration algorithm for
detecting community structure which is faster than many competing algorithms:
its running time on a network with n vertices and m edges is O(m d log n) where
d is the depth of the dendrogram describing the community structure. Many
real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in
which case our algorithm runs in essentially linear time, O(n log^2 n). As an
example of the application of this algorithm we use it to analyze a network of
items for sale on the web-site of a large online retailer, items in the network
being linked if they are frequently purchased by the same buyer. The network
has more than 400,000 vertices and 2 million edges. We show that our algorithm
can extract meaningful communities from this network, revealing large-scale
patterns present in the purchasing habits of customers.
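
For sparse, hierarchical networks the abstract's bound simplifies directly: substituting m ~ n and d ~ log n into O(m d log n) gives O(n log n log n) = O(n log^2 n). The agglomeration idea itself can be illustrated with a deliberately naive sketch. It assumes, from outside the abstract, that the quantity being greedily increased is Newman's modularity Q, and it rescans every connected pair of communities at each step, so it does not reproduce the efficient bookkeeping behind the stated running time. The function name `greedy_modularity` and the edge-list input format are hypothetical choices for illustration; the input is assumed to be a simple undirected graph with no self-loops or repeated edges.

```python
def greedy_modularity(edges):
    """Naive greedy agglomeration on an undirected simple graph.

    Starts with every vertex in its own community and repeatedly merges the
    connected pair of communities whose merge increases modularity Q the
    most, stopping when no merge increases Q. Returns (communities, Q).
    """
    m = len(edges)
    nodes = sorted({v for edge in edges for v in edge})
    label = {v: i for i, v in enumerate(nodes)}
    members = {i: {v} for v, i in label.items()}

    # e[i][j]: (edges between communities i and j) / (2m), for i != j
    # a[i]:    (edge ends attached to community i) / (2m)
    e = {i: {} for i in members}
    a = {i: 0.0 for i in members}
    for u, v in edges:
        i, j = label[u], label[v]
        e[i][j] = e[i].get(j, 0.0) + 1 / (2 * m)
        e[j][i] = e[j].get(i, 0.0) + 1 / (2 * m)
        a[i] += 1 / (2 * m)
        a[j] += 1 / (2 * m)

    # With singleton communities there are no internal edges yet.
    q = -sum(x * x for x in a.values())

    while True:
        # Find the connected pair with the largest modularity gain.
        best = None
        for i in e:
            for j, eij in e[i].items():
                if i < j:
                    dq = 2 * (eij - a[i] * a[j])
                    if best is None or dq > best[0]:
                        best = (dq, i, j)
        if best is None or best[0] <= 0:
            break

        dq, i, j = best
        # Merge community j into community i.
        for k, ejk in e[j].items():
            if k == i:
                continue
            e[i][k] = e[i].get(k, 0.0) + ejk
            e[k][i] = e[k].get(i, 0.0) + ejk
            del e[k][j]
        del e[i][j]
        del e[j]
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq

    return sorted(sorted(c) for c in members.values()), q


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
communities, q = greedy_modularity(toy_edges)
print(communities, round(q, 4))
```

On this toy graph the sketch recovers the two triangles as communities. Each full rescan costs time proportional to the number of connected community pairs, which is why a version like this would not scale to the network sizes the abstract describes.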
Articles in Conference Proceedings
Tractable Group Detection on Large Link Data Sets.
In: X. Wu, A. Tuzhilin and J. Shavlik (editors):
The Third IEEE International Conference on Data Mining, pages 573-576.
IEEE Computer Society, 2003.
Jeremy Kubica, Andrew Moore and Jeff Schneider.
Technical Reports
K-groups: Tractable Group Detection on Large Link Data Sets.
Robotics Institute, Carnegie Mellon University, 2003. Report number CMU-RI-TR-03-32.
Jeremy Martin Kubica, Andrew Moore and Jeff Schneider.
Abstract:
Discovering underlying structure from co-occurrence data is an important task in many fields, including insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. For example, a store may wish to identify underlying sets of items purchased together, or a human resources department may wish to identify groups of employees that collaborate with each other.
Previously, Kubica et al. presented the group detection algorithm (GDA), an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, making it potentially infeasible for many real-world data sets.
To this end, we present k-groups, an algorithm that uses an approach similar to that of k-means (hard clustering and localized updates) to significantly accelerate the discovery of the underlying groups while retaining GDA's probabilistic model. In addition, we show that k-groups is guaranteed to converge to a local minimum. We also compare the performance of GDA and k-groups on several real-world and artificial data sets, showing that k-groups' sacrifice in solution quality is significantly offset by its increase in speed. This trade-off makes group detection tractable on significantly larger data sets.