R. Radoulov. School of Computer Science, University of Waterloo, (May 2008)
Currently, the citation indexes used by digital libraries are very limited. They provide only raw citation counts and link scientific articles through their citations. There is more than one type of citation, but citation indexes treat all citations equally. One way to improve citation indexes is to determine the types of the citations in scientific articles (background, support, perfunctory reference, etc.). This would enable researchers to query citation indexes more efficiently by locating articles grouped by citation type. For example, a researcher could locate all the background material needed to understand a specific article by retrieving all of its "background" citations. Many classification schemes currently exist. However, manual annotation of all existing digital documents is infeasible because of the sheer volume of digital content, so the annotation process needs to be automated. Not much research has been done in this area. One of the obstacles to research on automated citation classification is the lack of annotated corpora. This thesis explores automated citation classification and makes several contributions to the field. We present a new citation scheme that is easier to work with than most existing schemes. We also present a document acquisition and citation annotation tool that helps with the development of annotated citation corpora. Finally, we present some experiments on automating citation classification.