Design and Update of a Classification System: The UCSD Map of Science.
PLoS ONE, 7(7):e39464, 2012.
Katy Börner, Richard Klavans, Michael Patek, Angela M. Zoss, Joseph R. Biberstine, Robert P. Light, Vincent Larivière and Kevin W. Boyack.
Global maps of science can be used as a reference system to chart career trajectories, the location of emerging research frontiers, or the expertise profiles of institutes or nations. This paper details data preparation, analysis, and layout performed when designing and subsequently updating the UCSD map of science and classification system. The original classification and map use 7.2 million papers and their references from Elsevier’s Scopus (about 15,000 source titles, 2001–2005) and Thomson Reuters’ Web of Science (WoS) Science, Social Science, Arts & Humanities Citation Indexes (about 9,000 source titles, 2001–2004), about 16,000 unique source titles in total. The updated map and classification add six years (2005–2010) of WoS data and three years (2006–2008) from Scopus to the existing category structure, increasing the number of source titles to about 25,000. To our knowledge, this is the first time that a widely used map of science has been updated. A comparison of the original 5-year and the new 10-year maps and classification systems shows (i) an increase of 9,409 in the total number of journals that can be mapped (an 80% increase for the social sciences, 119% for the humanities, 32% for the medical sciences, and 74% for the natural sciences), (ii) a simplification of the map by assigning all but five highly interdisciplinary journals to exactly one discipline, (iii) a more even distribution of journals over the 554 subdisciplines and 13 disciplines, as measured by the coefficient of variation, and (iv) a better reflection of journal clusters when compared with paper-level citation data. When evaluated against a list of desirable features for maps of science, the updated map is shown to have, among other things, higher mapping accuracy, easier understandability because fewer journals are multiply classified, and higher usability for the generation of data overlays.
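The coefficient of variation in point (iii) is the standard deviation of a distribution divided by its mean; a lower value means journals are spread more evenly across categories. The following is a minimal Python sketch, assuming the population standard deviation (the abstract does not say which variant was used) and purely hypothetical per-subdiscipline journal counts, not data from the paper:

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean (assumed variant)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean must be non-zero")
    return pstdev(counts) / m


# Hypothetical journal counts per subdiscipline (illustrative only).
uneven = [5, 5, 5, 85]
even = [25, 25, 25, 25]
print(round(coefficient_of_variation(uneven), 3))
print(coefficient_of_variation(even))
```

Both toy distributions have the same mean (25), so the CV isolates the spread: the skewed one scores about 1.386 and the perfectly even one 0.0.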
Full-Text Citation Analysis: A New Method to Enhance Scholarly Network.
Journal of the American Society for Information Science and Technology, 2012.
Xiaozhong Liu, Jinsong Zhang and Chun Guo.
A Classification-based Approach for Bibliographic Metadata Deduplication.
Proceedings of the IADIS International Conference WWW/Internet 2011:221-228, 2011.
Eduardo N. Borges, Karin Becker, Carlos A. Heuser and Renata Galante.
Digital libraries of scientific articles describe them using a set of metadata, including bibliographic references. These references can be represented in several formats and styles. Considerable content variations can occur in some metadata fields such as title, author names, and publication venue. Moreover, references commonly omit some metadata fields, such as page numbers. Duplicate entries affect the quality of digital library services, so they need to be identified and handled appropriately. This paper presents a comparative analysis of different data classification algorithms used to identify duplicate bibliographic metadata records. We investigated the discovered patterns by comparing the rules and the decision tree with the heuristics adopted in previous work. Our experiments show that combining previously proposed specific-purpose similarity functions with classification algorithms yields an improvement of up to 12% over our original approach.
Web page classification: Features and algorithms.
ACM Comput. Surv., 41(2):1-31, 2009.
Xiaoguang Qi and Brian D. Davison.
Playing games as a way to improve automatic image annotation.
Computer Vision and Pattern Recognition Workshops, 2008. CVPR Workshops 2008. IEEE Computer Society Conference on:1-8, 2008.
R. Jesus, D. Goncalves, A.J. Abrantes and N. Correia.
Image annotation is hard to automate. In this paper, we propose a framework for image annotation that combines the benefits of three paradigms: automatic annotation, human intervention, and entertainment activities. Within this framework we also describe our proposal, the ASAA (Application for Semi-Automatic Annotation) interface, a new computer game for image tagging. The application has a 3D game interface and is supported by a game engine that uses an automatic image classification system and gestural input to play the game. We present performance results for semantic models trained on a training set enlarged with images annotated during the game, as well as usability tests of the application.
The Anti-Social Tagger - Detecting Spam in Social Bookmarking Systems.
In:
AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web, pages 61-68.
ACM, New York, NY, USA, 2008.
Beate Krause, Christoph Schmitz, Andreas Hotho and Gerd Stumme.
The annotation of web sites in social bookmarking systems has become a popular way to manage and find information on the web. The community structure of such systems attracts spammers: recent post pages, popular pages or specific tag pages can be manipulated easily. As a result, searching or tracking recent posts does not deliver quality results annotated in the community, but rather unsolicited, often commercial, web sites. To retain the benefits of sharing one’s web content, spam-fighting mechanisms that can face the flexible strategies of spammers need to be developed.
Exploring Automatic Citation Classification.
PhD thesis, School of Computer Science, University of Waterloo, 2008.
Radoslav Radoulov.
Currently, citation indexes used by digital libraries are very limited. They only provide raw citation counts and link scientific articles through their citations. There is more than one type of citation, but citation indexes treat all citations equally. One way to improve citation indexes is to determine the types of citations in scientific articles (background, support, perfunctory reference, etc.). This would enable researchers to query citation indexes more efficiently by locating articles grouped by citation type. For example, a researcher could locate all the background material needed to understand a specific article by retrieving all of its "background" citations. Many classification schemes currently exist. However, manually annotating all existing digital documents is infeasible because of the sheer volume of digital content, which creates the need to automate the annotation process, yet little research has been done in this area. One obstacle to research on automated citation classification is the lack of annotated corpora. This thesis explores automated citation classification and makes several contributions to the field. We present a new citation scheme that is easier to work with than most existing ones. We also present a document acquisition and citation annotation tool that supports the development of annotated citation corpora. Finally, we present some experiments with automating citation classification.
SD-Map - A Fast Algorithm for Exhaustive Subgroup Discovery.
In: J. Fürnkranz, T. Scheffer and M. Spiliopoulou
(editors):
PKDD, Volume 4213 of Lecture Notes in Computer Science, pages 6-17.
Springer, 2006.
Martin Atzmüller and Frank Puppe.
In this paper we present the novel SD-Map algorithm for exhaustive but efficient subgroup discovery. In contrast to heuristic or sampling-based methods, SD-Map is guaranteed to identify all interesting subgroup patterns contained in a data set. The SD-Map algorithm uses the well-known FP-growth method for mining association rules, adapted to the subgroup discovery task. We show how SD-Map can handle missing values and provide an experimental evaluation of the algorithm's performance on synthetic data.
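SD-Map itself builds on FP-growth, which is too involved to reproduce here. As a simpler illustration of what exhaustive subgroup discovery means, the sketch below enumerates every conjunction of attribute-value selectors up to a fixed depth and ranks them by weighted relative accuracy (WRAcc), a common subgroup quality function. This is a brute-force baseline under those assumptions, not the SD-Map algorithm; the function names and toy data are hypothetical, and missing values are simply skipped when building selectors.

```python
from itertools import combinations


def wracc(rows, selectors, target):
    """Weighted relative accuracy: coverage * (target share in subgroup - overall target share)."""
    n = len(rows)
    covered = [r for r in rows if all(r.get(a) == v for a, v in selectors)]
    if not covered:
        return 0.0
    p0 = sum(bool(r[target]) for r in rows) / n
    p = sum(bool(r[target]) for r in covered) / len(covered)
    return (len(covered) / n) * (p - p0)


def exhaustive_subgroups(rows, attributes, target, max_depth=2):
    """Score every conjunction of up to max_depth selectors, best first."""
    # Candidate selectors: every observed (attribute, value) pair; None marks a missing value.
    selectors = sorted({(a, r[a]) for r in rows for a in attributes if r.get(a) is not None})
    results = []
    for depth in range(1, max_depth + 1):
        for combo in combinations(selectors, depth):
            attrs = [a for a, _ in combo]
            if len(set(attrs)) < depth:
                continue  # two values of the same attribute can never both hold
            results.append((wracc(rows, combo, target), combo))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


rows = [
    {"fever": "yes", "age": "old", "sick": True},
    {"fever": "yes", "age": "young", "sick": True},
    {"fever": "no", "age": "old", "sick": False},
    {"fever": "no", "age": "young", "sick": False},
]
for score, combo in exhaustive_subgroups(rows, ["fever", "age"], "sick")[:3]:
    print(round(score, 3), combo)
```

On this toy data the top-ranked subgroup is fever = yes with a WRAcc of 0.25. The number of candidates grows combinatorially with depth, which is the cost that exhaustive but efficient methods such as SD-Map are designed to avoid.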
Automatic Bookmark Classification - A Collaborative Approach.
In:
Proceedings of the 2nd Workshop in Innovations in Web Infrastructure (IWI2) at WWW2006.
Edinburgh, Scotland, 2006.
Dominik Benz, Karen H. L. Tso and Lars Schmidt-Thieme.
Bookmarks (or Favorites, Hotlists) are a popular strategy for relocating interesting websites on the WWW by creating a personalized local URL repository. Most current browsers offer a facility to store and manage bookmarks in a hierarchy of folders; however, as the collection grows, users reportedly have trouble creating and maintaining a stable taxonomy. This paper presents a novel collaborative approach to ease bookmark management, especially the “classification” of new bookmarks into a folder. We propose a methodology that realizes the collaborative classification idea of considering how similar users have classified a bookmark. A combination of nearest-neighbour classifiers is used to derive a recommendation from similar users on where to store a new bookmark. Additionally, a procedure for generating keyword recommendations is proposed to ease the annotation of new bookmarks. A prototype system called CariBo has been implemented as a plugin for the central bookmark server software SiteBar. A case study conducted with real user data supports the validity of the approach.
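The collaborative idea of asking similar users where they filed a bookmark can be sketched in Python as follows. This is not CariBo's implementation: it uses a single similarity-weighted k-nearest-neighbour vote rather than the combination of classifiers described above, and it assumes that users are compared by Jaccard similarity of their bookmarked URLs and that folder labels are comparable across users. The data model and function names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_folder(target_user, url, folders_by_user, k=3):
    """Suggest a folder for `url` from the k most similar users who already stored it.

    folders_by_user maps each user to a dict {url: folder_label} (hypothetical data model).
    Returns None when no other user has stored the URL.
    """
    target_urls = set(folders_by_user[target_user])
    neighbours = []
    for user, mapping in folders_by_user.items():
        if user == target_user or url not in mapping:
            continue
        neighbours.append((jaccard(target_urls, set(mapping)), user))
    # Most similar users first; ties broken by user name for determinism.
    neighbours.sort(key=lambda item: (-item[0], item[1]))
    votes = defaultdict(float)
    for similarity, user in neighbours[:k]:
        votes[folders_by_user[user][url]] += similarity
    if not votes:
        return None
    return max(votes, key=votes.get)


bookmarks = {
    "alice": {"a.example": "news", "b.example": "tech"},
    "bob": {"a.example": "news", "b.example": "tech", "c.example": "tech"},
    "carol": {"c.example": "fun", "d.example": "music"},
}
print(recommend_folder("alice", "c.example", bookmarks))
```

The call prints tech, because bob, whose bookmarks overlap most with alice's, filed the URL under that label, while carol shares no bookmarks with alice and so contributes a zero-weight vote.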
Collaborative Tagging as a Knowledge Organisation and Resource Discovery Tool.
Library Review, 55(5):291-300, 2006.
George Macgregor and Emma McCulloch.
The purpose of the paper is to provide an overview of the collaborative tagging phenomenon and explore some of the reasons for its emergence. Design/methodology/approach - The paper reviews the related literature and discusses some of the problems associated with, and the potential of, collaborative tagging approaches for knowledge organisation and general resource discovery. A definition of controlled vocabularies is proposed and used to assess the efficacy of collaborative tagging. An exposition of the collaborative tagging model is provided and a review of the major contributions to the tagging literature is presented. Findings - There are numerous difficulties with collaborative tagging systems (e.g. low precision, lack of collocation, etc.) that originate from the absence of properties that characterise controlled vocabularies. However, such systems cannot be dismissed. Librarians and information professionals have lessons to learn from the interactive and social aspects exemplified by collaborative tagging systems, as well as from their success in engaging users with information management. The future co-existence of controlled vocabularies and collaborative tagging is predicted, with each appropriate for use within distinct information contexts: formal and informal. Research limitations/implications - Librarians and information professional researchers should be playing a leading role in research aimed at assessing the efficacy of collaborative tagging in relation to information storage, organisation, and retrieval, and in influencing the future development of collaborative tagging systems. Practical implications - The paper indicates clear areas where digital libraries and repositories could innovate in order to better engage users with information. Originality/value - At the time of writing there were no literature reviews summarising the main contributions to the collaborative tagging research or debate.
The Language of Folksonomies: What Tags Reveal About User Classification.
In: C. Kop, G. Fliedl, H. C. Mayr and E. Métais
(editors):
Natural Language Processing and Information Systems, Volume 3999 of Lecture Notes in Computer Science, pages 58-69.
Springer, Berlin/Heidelberg, 2006.
Csaba Veres.
Folksonomies are classification schemes that emerge from the collective actions of users who tag resources with an unrestricted set of key terms. There has been a flurry of activity in this domain recently, with a number of high-profile web sites and search engines adopting the practice. They have sparked a great deal of excitement and debate in the popular and technical literature, accompanied by a number of analyses of the statistical properties of tagging behavior. However, none has addressed the deep nature of folksonomies. What is the nature of a tag? Where does it come from? How is it related to a resource? In this paper we present a study in which the linguistic properties of folksonomies reveal them to contain, on the one hand, tags that are similar to standard categories in taxonomies. But on the other hand, they contain additional tags to describe class properties. The implications of the findings for the relationship between folksonomy and ontology are discussed.
Semi-Automatic Visual Subgroup Mining using VIKAMINE.
Journal of Universal Computer Science, 11(11):1752-1765, 2005.
Martin Atzmüller and Frank Puppe.
Visual mining methods enable the direct integration of the user to overcome major problems of automatic data mining methods, e.g., the presentation of uninteresting results, lack of acceptance of the discovered findings, or limited confidence in these. We present a novel subgroup mining approach for explorative and descriptive data mining implemented in the VIKAMINE system. We propose several integrated visualization methods to support subgroup mining. Furthermore, we describe three case studies using data from fielded systems in the medical domain.
Citation Classification and its Applications.
In:
Proceedings of the International Conference on Knowledge Management, pages 287-298.
World Scientific Publishing, 2005.
Selcuk Aya, Carl Lagoze and Thorsten Joachims.
Citation analysis has been used to study various aspects of scholarly communication. In general, these studies have not differentiated among the multiple reasons for citations. However, authors cite other works for a number of reasons including demonstrating knowledge of the field, establishing the placement of the citing work in the field, comparing and criticizing other works, and paying homage to seminal work by pioneers in the field. In this paper, we present a number of applications in which distinguishing among authors' motivations for citations might be useful and present a machine learning approach to automatically classifying citations according to these motivations. Our approach to citation classification makes use of the structure and the argumentative nature of the scientific papers. We present the results of experiments we ran on papers in the computer science field. The results are encouraging and give us hope that we can use our citation classifier in analyzing large corpora of scientific papers.
Hedging in Scientific Articles as a Means of Classifying Citations.
In:
Proc. AAAI Spring Symposium.
2004.
Chrysanne Di Marco and Robert E. Mercer.
Citations in scientific writing fulfil an important role in creating relationships among mutually relevant articles within a research field. These inter-article relationships reinforce the argumentation structure intrinsic to all scientific writing. Therefore, determining the nature of the exact relationship between a citing and cited paper requires an understanding of the rhetorical relations within the argumentative context in which a citation is placed. To automatically determine these relations, we have suggested that various stylistic and rhetorical cues will be significant. One such cue that we are studying is the use of hedging to modify the affect of a scientific claim. We have previously shown that hedging occurs more frequently in citation contexts than in the text as a whole. With this information we conjecture that hedging is a significant aspect of the rhetorical structure of citation contexts and that the pragmatics of hedges may help in determining the rhetorical purpose of citations.
Folksonomies - Cooperative Classification and Communication Through Shared Metadata.
2004. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html.
Adam Mathes.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
In:
Proceedings of the Eighteenth International Conference on Machine Learning, pages 282-289.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.
John D. Lafferty, Andrew McCallum and Fernando C. N. Pereira.
Classification of research papers using citation links and citation types: Towards automatic review article generation.
11th ASIS SIG/CR Classification Research Workshop:117-134, 2000.
H. Nanba, N. Kando and M. Okumura.
We are investigating automatic generation of a review (or survey) article in a specific subject domain. In a research paper, there are passages where the author describes the essence of a cited paper and the differences between the current paper and the cited paper (we call them citing areas). These passages can be considered a kind of summary of the cited paper from the current author's viewpoint. From the collection of citing areas, one can learn the state of the art in a specific subject domain. Further, if these citing areas are properly classified and organized, they can act as a kind of review article. In our previous research, we proposed the automatic extraction of citing areas. Then, using the information in the citing areas, we automatically identified the types of citation relationships that indicate the reasons for citation (we call them citation types). Citation types offer a useful clue for organizing citing areas. In addition, to support writing a review article, it is necessary to take into account the contents of the papers together with the citation links and citation types. In this paper, we propose several methods for classifying papers automatically. We found that among our proposed methods, BCCT-C (bibliographic coupling considering only type C citations, i.e., citations that point out problems or gaps in related work) is more effective than the others. We also implemented a prototype system, based on our proposed method, to support writing a review article.
Recommendation as classification: using social and content-based information in recommendation.
In:
AAAI '98/IAAI '98: Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, pages 714-720.
American Association for Artificial Intelligence, Menlo Park, CA, USA, 1998.
Chumki Basu, Haym Hirsh and William Cohen.