TY - GEN AU - Yu, Hsiang-Fu AU - Jain, Prateek AU - Kar, Purushottam AU - Dhillon, Inderjit S. A2 - T1 - Large-scale Multi-label Learning with Missing Labels JO - PB - C1 - PY - 2013/ VL - IS - SP - EP - UR - http://arxiv.org/abs/1307.5101 DO - KW - classification KW - kallimachos KW - label KW - large KW - learning KW - multi L1 - N1 - Large-scale Multi-label Learning with Missing Labels N1 - AB - The multi-label classification problem has generated significant interest in
recent years. However, existing approaches do not adequately address two key
challenges: (a) the ability to tackle problems with a large number (say
millions) of labels, and (b) the ability to handle data with missing labels. In
this paper, we directly address both these problems by studying the multi-label
problem in a generic empirical risk minimization (ERM) framework. Our
framework, despite being simple, is surprisingly able to encompass several
recent label-compression based methods which can be derived as special cases of
our method. To optimize the ERM problem, we develop techniques that exploit the
structure of specific loss functions - such as the squared loss function - to
offer efficient algorithms. We further show that our learning framework admits
formal excess risk bounds even in the presence of missing labels. Our risk
bounds are tight and demonstrate better generalization performance for low-rank
promoting trace-norm regularization when compared to (rank insensitive)
Frobenius norm regularization. Finally, we present extensive empirical results
on a variety of benchmark datasets and show that our methods perform
significantly better than existing label compression based methods and can
scale up to very large datasets such as the Wikipedia dataset.
ER -
TY - JOUR
AU - Binder, M.
AU - Hibbett, D. S.
AU - Larsson, K. H.
AU - Larsson, E.
AU - Langer, E.
AU - Langer, G.
T1 - The phylogenetic distribution of resupinate forms across the major clades of mushroom-forming fungi (Homobasidiomycetes)
JO - Systematics and Biodiversity
PY - 2005/06
VL - 3
IS - 2
SP - 113
EP - 157
UR - /brokenurl# considerable recent interest within the physics community, but most methods
proposed so far are unsuitable for very large networks because of their
computational cost. Here we present a hierarchical agglomeration algorithm for
detecting community structure which is faster than many competing algorithms:
its running time on a network with n vertices and m edges is O(m d log n) where
d is the depth of the dendrogram describing the community structure. Many
real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in
which case our algorithm runs in essentially linear time, O(n log^2 n). As an
example of the application of this algorithm we use it to analyze a network of
items for sale on the web-site of a large online retailer, items in the network
being linked if they are frequently purchased by the same buyer. The network
has more than 400,000 vertices and 2 million edges. We show that our algorithm
can extract meaningful communities from this network, revealing large-scale
patterns present in the purchasing habits of customers.
ER -
TY - CONF
AU - Kubica, Jeremy
AU - Moore, Andrew
AU - Schneider, Jeff
A2 - Wu, Xindong
A2 - Tuzhilin, Alex
A2 - Shavlik, Jude
T1 - Tractable Group Detection on Large Link Data Sets
T2 - The Third IEEE International Conference on Data Mining
PB - IEEE Computer Society
C1 -
PY - 2003/november
CY -
VL -
IS -
SP - 573
EP - 576
UR -
DO -
KW - large
KW - detection
KW - community
KW - network
KW - gda
L1 -
SN -
N1 -
N1 -
AB -
ER -
TY - RPRT
AU - Kubica, Jeremy Martin
AU - Moore, Andrew
AU - Schneider, Jeff
A2 -
T1 - K-groups: Tractable Group Detection on Large Link Data Sets
PB - Robotics Institute, Carnegie Mellon University
AD - Pittsburgh, PA
PY - 2003/10
VL -
IS - CMU-RI-TR-03-32
SP -
EP -
UR - http://www.ri.cmu.edu/pubs/pub_4489.html
DO -
KW - large
KW - detection
KW - gda
KW - network
KW - community
L1 -
N1 -
N1 -
N1 -
AB - Discovering underlying structure from co-occurrence data is an important task in many fields, including: insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. For example a store may wish to identify underlying sets of items purchased together or a human resources department may wish to identify groups of employees that collaborate with each other.
Previously Kubica et. al. presented the group detection algorithm (GDA) - an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, making it potentially infeasible for many real world data sets.
To this end, we present k-groups - an algorithm that uses an approach similar to that of k-means (hard clustering and localized updates) to significantly accelerate the discovery of the underlying groups while retaining GDA's probabilistic model. In addition, we show that k-groups is guaranteed to converge to a local minimum. We also compare the performance of GDA and k-groups on several real world and artificial data sets, showing that k-groups' sacrifice in solution quality is significantly offset by its increase in speed. This trade-off makes group detection tractable on significantly larger data sets.
ER -
TY - BOOK
AU - Cullum, J.K.
AU - Willoughby, R.A.
A2 -
T1 - Lanczos algorithms for large symmetric eigenvalue computations: Theory
PB - Society for Industrial Mathematics
C1 -
PY - 2002/
VL -
IS -
SP -
EP -
UR - http://scholar.google.de/scholar.bib?q=info:zshJq2GVHO8J:scholar.google.com/&output=citation&hl=de&ct=citation&cd=0
DO -
KW - algorithms
KW - eigenvalue
KW - large
KW - symmetric
L1 -
SN -
N1 -
N1 -
AB -
ER -
TY - CONF
AU - Wang, Haixun
AU - 0010, Wei Wang
AU - Yang, Jiong
AU - Yu, Philip S.
A2 - Franklin, Michael J.
A2 - Moon, Bongki
A2 - Ailamaki, Anastassia
T1 - Clustering by pattern similarity in large data sets.
T2 - SIGMOD Conference
PB - ACM
C1 -
PY - 2002/
CY -
VL -
IS -
SP - 394
EP - 405
UR - http://dblp.uni-trier.de/db/conf/sigmod/sigmod2002.html#WangWYY02
DO -
KW - algorithm
KW - clustering
KW - data
KW - fca?
KW - large
KW - pClusters
KW - paper
KW - pattern
L1 -
SN - 1-58113-497-5
N1 - dblp
N1 -
AB -
ER -
TY - CONF
AU - Hovy, E.H.
A2 -
T1 - Combining and Standardizing Large-Scale, Practical Ontologies for Machine Translation and Other Uses
T2 - Proc. 1st Intl. Conf. on Language Resources and Evaluation (LREC)
PB -
C1 - Granada
PY - 1998/
CY -
VL -
IS -
SP -
EP -
UR - http://www.isi.edu/natural-language/people/hovy/publications.html
DO -
KW - ontology
KW - large
KW - scale
KW - application
KW - practical
L1 -
SN - 3-540-41066-X
N1 -
N1 -
AB -
ER -
TY - CONF
AU - Zhang, Tian
AU - Ramakrishnan, Raghu
AU - Livny, Miron
A2 -
T1 - BIRCH: an efficient data clustering method for very large databases
T2 - Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD'96)
PB -
C1 -
PY - 1996/
CY -
VL -
IS -
SP - 103
EP - 114
UR - http://citeseer.ist.psu.edu/zhang96birch.html
DO -
KW - birch
KW - clustering
KW - database
KW - large
KW - tree
L1 -
SN -
N1 -
N1 -
AB -
ER -
TY - JOUR
AU - Berry, M.
AU - Do, T.
AU - OâBrien, G.
AU - Krishna, V.
AU - Varadhan, S.
T1 - SVDPACKC (version 1.0) user's guide
JO - University of Tennessee
PY - 1993/
VL -
IS -
SP -
EP -
UR - http://scholar.google.de/scholar.bib?q=info:Y2fs0GQQ_LIJ:scholar.google.com/&output=citation&hl=de&ct=citation&cd=0
DO -
KW - large
KW - lib
KW - scale
KW - sparse
KW - svd
L1 -
SN -
N1 -
N1 -
AB -
ER -
TY - JOUR
AU - Sherman, Andrew H
T1 - Algorithms for sparse Gaussian elimination with partial pivoting
JO - ACM Trans. Math. Softw.
PY - 1978/
VL - 4
IS - 4
SP - 330
EP - 338
UR - http://portal.acm.org/citation.cfm?id=356494
DO - http://doi.acm.org/10.1145/356502.356494
KW - large
KW - math
KW - matrix
KW - scale
KW - svd
L1 -
SN -
N1 -
N1 -
AB -
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
HeBIS: Universitaetsbibliothek Kassel
Feedback
(564 KB)
ACM Transactions on Mathematical Software (TOMS)
Volume 4 , Issue 4 (December 1978)
Pages: 330 - 338
Year of Publication: 1978
ISSN:0098-3500
Author
Downloads (6 Weeks): 2, Downloads (12 Months): 58, Citation Count: 2
references
cited by
index terms
collaborative colleagues
peer to peer
1
2
CURTIS, A.R., AND REID, J.K. FORTRAN subroutines for the solution of sparse sets of hnear equations. AERE Rep. R6844, AERE HarweU, England, 1971.
3
CURTIS, A.R, AND REID, J.K The solution of large sparse unsymmetnc systems of hnear equations. Information Processing '71 (1971), 1240-1245.
4
DUFF, I.S., AND REID, J.K A comparison of sparsity ordermgs for obtaining a pivotal sequence in Gausslan elm~matlon. AERE Rep. T.P. 526, AERE Harwell, England, 1973
5
6
FORSYTHE, G.E., AND MOLER, C B Computer Solutton of Linear Algebraic Systems Prentice- Hall, Englewood Cliffs, N J, 1967
7
G.
G.1
G.1.3
Subjects:
Sparse, structured, and very large systems (direct and iterative methods)
Additional Classification:
G.
G.1
G.1.3
Subjects:
Linear systems (direct and iterative methods)
Peer to Peer - Readers of this Article have also read:
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE Design Automation Conference on
Gwo-Dong Chen
, Daniel D. Gajski
ER -
Subscribe (Full Service)
Register (Limited Service, Free)
Algorithms for sparse Gaussian elimination with partial pivoting
Full text
Source
Publisher
Bibliometrics
Additional Information: