%0 %0 Generic %A Rubin, Timothy N.; Chambers, America; Smyth, Padhraic & Steyvers, Mark %D 2011 %T Statistical Topic Models for Multi-Label Document Classification %E %B %C %I %V %6 %N %P %& %Y %S %7 %8 %9 %? %! %Z %@ %( %) %* %L %M %1 %2 Statistical Topic Models for Multi-Label Document Classification %3 misc %4 %# %$ %F Rubin2011 %K mining, model, text, tm, topic, toread %X Machine learning approaches to multi-label document classification have (to date) largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies. %Z cite arxiv:1107.2462 %U http://arxiv.org/abs/1107.2462 %+ %^ %0 %0 Journal Article %A Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V. & Oliver, J. L. 
%D 2009 %T Level statistics of words: Finding keywords in literary texts and symbolic sequences %E %B Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) %C %I APS %V 79 %6 %N 3 %P 035102 %& %Y %S %7 %8 %9 %? %! %Z %@ %( %) %* %L %M %1 %2 Level statistics of words: Finding keywords in literary texts and symbolic sequences %3 article %4 %# %$ %F carpena:035102 %K analysis, extraction, keyword, statistical, text, tm, topic, toread %X %Z %U http://bioinfo2.ugr.es/TextKeywords/ %+ %^ %0 %0 Conference Proceedings %A Huang, Anna; Milne, David N.; Frank, Eibe & Witten, Ian H. %D 2009 %T Clustering Documents Using a Wikipedia-Based Concept Representation. %E Theeramunkong, Thanaruk; Kijsirikul, Boonserm; Cercone, Nick & Ho, Tu Bao %B PAKDD %C %I Springer %V 5476 %6 %N %P 628-636 %& %Y %S Lecture Notes in Computer Science %7 %8 %9 %? %! %Z %@ 978-3-642-01306-5 %( %) %* %L %M %1 %2 dblp %3 inproceedings %4 conf/pakdd/2009 %# %$ %F conf/pakdd/HuangMFW09 %K background, clustering, knowledge, ontology, tm, wikipedia %X %Z %U http://dblp.uni-trier.de/db/conf/pakdd/pakdd2009.html#HuangMFW09 %+ %^ %0 %0 Book %A Heyer, Gerhard; Quasthoff, Uwe & Wittig, Thomas %D 2008 %T Text Mining: Wissensrohstoff Text %E %B IT lernen %C Herdecke ; Bochum %I W3L-Verl. %V %6 %N %P %& %Y %S %7 1. korr. Nachdr. %8 %9 %? %! %Z %@ 978-3-937137-30-8 %( %) %* %L %M %1 %2 Konzepte, Algorithmen, Ergebnisse %3 book %4 %# %$ %F UBMA_280507895 %K einführung, mining, text, tm %X %Z %U http://aleph.bib.uni-mannheim.de/F/?func=find-b&request=280507895&find_code=020&adjacent=N&local_base=MAN01PUBLIC&x=0&y=0 %+ %^ %0 %0 Book %A %D 2007 %T From Web to Social Web: Discovering and Deploying User and Content Profiles %E Berendt, B.; Hotho, A.; Mladenic, D. & Semeraro, G. %B LNCS %C %I Springer %V 4736 %6 %N %P %& %Y %S %7 %8 %9 %? %! %Z %@ 978-3-540-74950-9 %( %) %* %L %M %1 %2 From Web to Social Web: Discovering and Deploying User and Cont... 
- Data Mi...Journals, Books & Online Media | Springer %3 book %4 %# %$ %F Berendt2007 %K 2007, data, dm, mining, myown, social, tm, web %X This book constitutes the refereed proceedings of the Workshop on Web Mining, WebMine 2006, held in Berlin, Germany, September 18th, 2006. Topics included are data mining based on analysis of bloggers and tagging, web mining, XML mining and further techniques of knowledge discovery. The book is especially valuable for those interested in the aspects of the Social Web (Web 2.0) and its inherent dynamic and diversity of user-generated content. %Z %U http://www.springer.com/dal/home?SGWID=1-102-22-173759307-0&changeHeader=true&referer=www.springeronline.com&SHORTCUT=www.springer.com/978-3-540-74950-9 %+ %^ %0 %0 Book %A Feldman, Ronen & Sanger, James %D 2007 %T The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data %E %B %C %I Cambridge University Press %V %6 %N %P %& %Y %S %7 %8 %9 %? %! %Z %@ 0521836573 %( %) %* %L %M %1 %2 Amazon.com: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data (9780521836579): Ronen Feldman, James Sanger: Books %3 book %4 %# %$ %F feldman2006mining %K mining, text, tm %X %Z %U http://www.amazon.com/Text-Mining-Handbook-Approaches-Unstructured/dp/0521836573/ref=sr_1_1?s=books&ie=UTF8&qid=1295265273&sr=1-1 %+ %^ %0 %0 Journal Article %A Colas, Fabrice & Brazdil, Pavel %D 2006 %T On the Behavior of SVM and Some Older Algorithms in Binary Text Classification Tasks %E %B Text, Speech and Dialogue %C %I %V %6 %N %P 45--52 %& %Y %S %7 %8 %9 %? %! %Z %@ %( %) %* %L %M %1 %2 SpringerLink - Buchkapitel %3 article %4 %# %$ %F colas2006behavior %K classification, knn, nb, preprocessing, svm, text, tm, toread %X Document classification has already been widely studied. In fact, some studies compared feature selection techniques or feature space transformation whereas some others compared the performance of different algorithms. 
Recently, following the rising interest towards the Support Vector Machine, various studies showed that the SVM outperforms other classification algorithms. So should we just not bother about other classification algorithms and opt always for SVM? %Z %U http://dx.doi.org/10.1007/11846406_6 %+ %^ %0 %0 Journal Article %A Crane, Gregory %D 2006 %T What Do You Do with a Million Books? %E %B D-Lib Magazine %C %I %V 12 %6 %N 3 %P %& %Y %S %7 %8 March %9 %? %! %Z %@ 1082-9873 %( %) %* %L %M %1 %2 %3 article %4 %# %$ %F march06crane %K Book, Mining, Text, google, tm, toread %X %Z %U http://www.dlib.org/dlib/march06/crane/03crane.html %+ %^ %0 %0 Book %A Weiss, Sholom M.; Indurkhya, Nitin & Zhang, T. %D 2004 %T Text Mining. Predictive Methods for Analyzing Unstructured Information %E %B %C %I Springer, Berlin %V %6 %N %P %& %Y %S %7 1 %8 %9 %? %! %Z %@ 0387954333 %( %) %* %L %M %1 %2 Amazon.de: Text Mining. Predictive Methods for Analyzing Unstructured Information: Sholom M. Weiss,Nitin Indurkhya,T. Zhang: English Books %3 book %4 %# %$ %F 0387954333 %K dm, mining, nlp, software, text, tm %X %Z %U http://www.amazon.de/gp/redirect.html%3FASIN=0387954333%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0387954333%253FSubscriptionId=13CT5CVB80YFWJEPWS02 %+ %^ %0 %0 Conference Proceedings %A Hotho, Andreas; Maedche, Alexander & Staab, Steffen %D 2001 %T Text Clustering Based on Good Aggregations %E %B ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining %C Washington, DC, USA %I IEEE Computer Society %V %6 %N %P 607--608 %& %Y %S %7 %8 %9 %? %! %Z %@ 0-7695-1119-8 %( %) %* %L %M %1 %2 Text Clustering Based on Good Aggregations %3 inproceedings %4 %# %$ %F 658040 %K 2001, clustering, gruppenbildung, kmeans, myown, ontology, text, tm %X %Z %U http://portal.acm.org/citation.cfm?id=658040 %+ %^