TY - GEN AU - Rubin, Timothy N. AU - Chambers, America AU - Smyth, Padhraic AU - Steyvers, Mark A2 - T1 - Statistical Topic Models for Multi-Label Document Classification JO - PB - AD - PY - 2011/ VL - IS - SP - EP - UR - http://arxiv.org/abs/1107.2462 M3 - KW - mining KW - model KW - text KW - tm KW - topic KW - toread L1 - N1 - Statistical Topic Models for Multi-Label Document Classification N1 - AB - Machine learning approaches to multi-label document classification have (to date) largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies. ER - TY - JOUR AU - Carpena, P. AU - Bernaola-Galván, P. AU - Hackenberg, M. AU - Coronado, A. V. AU - Oliver, J. L. 
T1 - Level statistics of words: Finding keywords in literary texts and symbolic sequences JO - Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) PY - 2009/ VL - 79 IS - 3 SP - EP - UR - http://bioinfo2.ugr.es/TextKeywords/ M3 - 10.1103/PhysRevE.79.035102 KW - analysis KW - extraction KW - keyword KW - statistical KW - text KW - tm KW - topic KW - toread L1 - SN - N1 - Level statistics of words: Finding keywords in literary texts and symbolic sequences N1 - AB - ER - TY - CONF AU - Huang, Anna AU - Milne, David N. AU - Frank, Eibe AU - Witten, Ian H. A2 - Theeramunkong, Thanaruk A2 - Kijsirikul, Boonserm A2 - Cercone, Nick A2 - Ho, Tu Bao T1 - Clustering Documents Using a Wikipedia-Based Concept Representation. T2 - PAKDD PB - Springer CY - PY - 2009/ M2 - VL - 5476 IS - SP - 628 EP - 636 UR - http://dblp.uni-trier.de/db/conf/pakdd/pakdd2009.html#HuangMFW09 M3 - KW - background KW - clustering KW - knowledge KW - ontology KW - tm KW - wikipedia L1 - SN - 978-3-642-01306-5 N1 - dblp N1 - AB - ER - TY - BOOK AU - Heyer, Gerhard AU - Quasthoff, Uwe AU - Wittig, Thomas A2 - T1 - Text Mining: Wissensrohstoff Text PB - W3L-Verl. AD - Herdecke ; Bochum PY - 2008/ VL - IS - SP - EP - UR - http://aleph.bib.uni-mannheim.de/F/?func=find-b&request=280507895&find_code=020&adjacent=N&local_base=MAN01PUBLIC&x=0&y=0 M3 - KW - einführung KW - mining KW - text KW - tm L1 - SN - 978-3-937137-30-8 N1 - Konzepte, Algorithmen, Ergebnisse N1 - AB - ER - TY - BOOK AU - A2 - Berendt, B. A2 - Hotho, A. A2 - Mladenic, D. A2 - Semeraro, G. 
T1 - From Web to Social Web: Discovering and Deploying User and Content Profiles PB - Springer AD - PY - 2007/ VL - 4736 IS - SP - EP - UR - http://www.springer.com/dal/home?SGWID=1-102-22-173759307-0&changeHeader=true&referer=www.springeronline.com&SHORTCUT=www.springer.com/978-3-540-74950-9 M3 - KW - 2007 KW - data KW - dm KW - mining KW - myown KW - social KW - tm KW - web L1 - SN - 978-3-540-74950-9 N1 - From Web to Social Web: Discovering and Deploying User and Cont... - Data Mi...Journals, Books & Online Media | Springer N1 - AB - This book constitutes the refereed proceedings of the Workshop on Web Mining, WebMine 2006, held in Berlin, Germany, September 18th, 2006. Topics included are data mining based on analysis of bloggers and tagging, web mining, XML mining and further techniques of knowledge discovery. The book is especially valuable for those interested in the aspects of the Social Web (Web 2.0) and its inherent dynamic and diversity of user-generated content. ER - TY - BOOK AU - Feldman, Ronen AU - Sanger, James A2 - T1 - The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data PB - Cambridge University Press AD - PY - 2007/ VL - IS - SP - EP - UR - http://www.amazon.com/Text-Mining-Handbook-Approaches-Unstructured/dp/0521836573/ref=sr_1_1?s=books&ie=UTF8&qid=1295265273&sr=1-1 M3 - KW - mining KW - text KW - tm L1 - SN - 0521836573 N1 - Amazon.com: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data (9780521836579): Ronen Feldman, James Sanger: Books N1 - AB - ER - TY - JOUR AU - Colas, Fabrice AU - Brazdil, Pavel T1 - On the Behavior of SVM and Some Older Algorithms in Binary Text Classification Tasks JO - Text, Speech and Dialogue PY - 2006/ VL - IS - SP - 45 EP - 52 UR - http://dx.doi.org/10.1007/11846406_6 M3 - KW - classification KW - knn KW - nb KW - preprocessing KW - svm KW - text KW - tm KW - toread L1 - SN - N1 - SpringerLink - Buchkapitel N1 - AB - Document classification has already been widely 
studied. In fact, some studies compared feature selection techniques or feature space transformation whereas some others compared the performance of different algorithms. Recently, following the rising interest towards the Support Vector Machine, various studies showed that the SVM outperforms other classification algorithms. So should we just not bother about other classification algorithms and opt always for SVM? ER - TY - JOUR AU - Crane, Gregory T1 - What Do You Do with a Million Books? JO - D-Lib Magazine PY - 2006/march VL - 12 IS - 3 SP - EP - UR - http://www.dlib.org/dlib/march06/crane/03crane.html M3 - 10.1045/march2006-crane KW - Book KW - Mining KW - Text KW - google KW - tm KW - toread L1 - SN - N1 - N1 - AB - ER - TY - BOOK AU - Weiss, Sholom M. AU - Indurkhya, Nitin AU - Zhang, T. A2 - T1 - Text Mining. Predictive Methods for Analyzing Unstructured Information PB - Springer, Berlin AD - PY - 2004/ VL - IS - SP - EP - UR - http://www.amazon.de/gp/redirect.html%3FASIN=0387954333%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0387954333%253FSubscriptionId=13CT5CVB80YFWJEPWS02 M3 - KW - dm KW - mining KW - nlp KW - software KW - text KW - tm L1 - SN - 0387954333 N1 - Amazon.de: Text Mining. Predictive Methods for Analyzing Unstructured Information: Sholom M. Weiss,Nitin Indurkhya,T. Zhang: English Books N1 - AB - ER - TY - CONF AU - Hotho, Andreas AU - Maedche, Alexander AU - Staab, Steffen A2 - T1 - Text Clustering Based on Good Aggregations T2 - ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining PB - IEEE Computer Society CY - Washington, DC, USA PY - 2001/ M2 - VL - IS - SP - 607 EP - 608 UR - http://portal.acm.org/citation.cfm?id=658040 M3 - KW - 2001 KW - clustering KW - gruppenbildung KW - kmeans KW - myown KW - ontology KW - text KW - tm L1 - SN - 0-7695-1119-8 N1 - Text Clustering Based on Good Aggregations N1 - AB - ER -