Li, Y.; Wen, A.; Lin, Q.; Li, R. & Lu, Z. (2011), Incorporating User Feedback into Name Disambiguation of Scientific Cooperation Network, in Haixun Wang; Shijun Li; Satoshi Oyama; Xiaohua Hu & Tieyun Qian, ed., 'Web-Age Information Management', Springer, Berlin/Heidelberg, pp. 454–466.

In scientific cooperation networks, author names can be ambiguous because different authors may share the same name. Users of these networks usually want to know exactly who wrote a paper, but no unique identifier is available to distinguish such authors. In this paper, we address this problem and propose a new method that incorporates user feedback into the model for name disambiguation in scientific cooperation networks. A perceptron is used as the classifier. Two features and a constraint derived from user feedback are incorporated into the perceptron to improve disambiguation performance. Specifically, we treat user feedback as a training stream and refine the perceptron continuously. Experimental results show that the proposed algorithm can learn continuously and significantly outperforms previous methods that do not incorporate user interaction.
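The abstract does not specify the features or the constraint, so the following is only a minimal sketch of the general idea of refining a perceptron from a stream of user feedback. It assumes a pairwise formulation (two papers that share an author name, labelled +1 for "same author" and -1 for "different authors"), hypothetical numeric features, and a hypothetical constraint under which pairs a user has already labelled are answered from that feedback instead of by the model. `FeedbackPerceptron`, `update`, and `predict` are illustrative names, not the authors' code.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical pair of paper IDs that carry the same ambiguous author name.
Pair = Tuple[str, str]


class FeedbackPerceptron:
    """Online perceptron refined by a stream of user feedback (illustrative sketch)."""

    def __init__(self, n_features: int, lr: float = 1.0) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b: float = 0.0
        self.lr = lr
        # Hypothetical constraint: user-confirmed labels override the model.
        self.confirmed: Dict[FrozenSet[str], int] = {}

    def score(self, x: List[float]) -> float:
        """Linear score w.x + b for a feature vector x."""
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, pair: Pair, x: List[float]) -> int:
        """Return +1 (same author) or -1 (different authors)."""
        key = frozenset(pair)
        if key in self.confirmed:  # feedback already given for this pair
            return self.confirmed[key]
        return 1 if self.score(x) > 0 else -1

    def update(self, pair: Pair, x: List[float], y: int) -> None:
        """Consume one feedback item, with y in {+1, -1}."""
        self.confirmed[frozenset(pair)] = y
        # Standard perceptron rule: adjust weights only on a mistake or zero margin.
        if y * self.score(x) <= 0:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y


# Usage with a toy feedback stream (feature values are invented for illustration).
model = FeedbackPerceptron(n_features=2)
stream = [
    (("p1", "p2"), [1.0, 0.8], 1),
    (("p1", "p3"), [0.0, 0.1], -1),
    (("p2", "p4"), [0.9, 0.7], 1),
]
for pair, x, y in stream:
    model.update(pair, x, y)
```

Because each feedback item triggers at most one cheap weight update, the model can keep learning for as long as users keep supplying corrections, which matches the "training stream" framing in the abstract.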

Garbin, E. & Mani, I. (2005), Disambiguating toponyms in news, in 'Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing', Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 363–370.

This research addresses the problem of disambiguating toponyms (place names) in terms of a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, of toponyms in news. We found that 67.82% of the corpus toponyms that were ambiguous in a gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine learning to develop disambiguation rules. Toponyms were automatically tagged with the information found about them in a gazetteer, and a toponym that was ambiguous in the gazetteer was automatically disambiguated using preference heuristics. This automatically tagged data was used to train a machine learner, which disambiguated toponyms in a human-annotated news corpus with 78.5% accuracy.
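The abstract does not name the preference heuristics used for automatic tagging. As a minimal sketch, assuming two hypothetical heuristics (first prefer a reading located in a country already evidenced by toponyms that are unambiguous in the same article, then prefer the more populous reading) and a toy gazetteer with invented entries, the tagging step might look like this. `Candidate`, `auto_tag`, and the gazetteer fields are hypothetical; the resulting tags would serve as noisy training labels for a classifier, which is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One gazetteer reading of a place name (fields are hypothetical)."""
    place_id: str
    country: str
    population: int


Gazetteer = Dict[str, List[Candidate]]


def auto_tag(doc_toponyms: List[str], gaz: Gazetteer) -> Dict[str, Optional[str]]:
    """Assign a place_id to each toponym using two hypothetical preference heuristics."""
    # Countries implied by toponyms that have exactly one gazetteer reading.
    doc_countries = Counter(
        gaz[t][0].country for t in doc_toponyms if len(gaz.get(t, [])) == 1
    )
    tags: Dict[str, Optional[str]] = {}
    for t in doc_toponyms:
        cands = gaz.get(t, [])
        if not cands:
            tags[t] = None  # not in the gazetteer: leave untagged
            continue
        # Heuristic 1: prefer a country already evidenced in the document.
        # Heuristic 2: fall back to (or break ties by) larger population.
        best = max(cands, key=lambda c: (doc_countries[c.country], c.population))
        tags[t] = best.place_id
    return tags


# Toy gazetteer with fictional places and invented values.
gaz: Gazetteer = {
    "Newport": [Candidate("newport-A", "A", 1000), Candidate("newport-B", "B", 5000)],
    "Eastham": [Candidate("eastham-A", "A", 300)],
}
print(auto_tag(["Newport", "Eastham"], gaz))  # context favours country A
print(auto_tag(["Newport"], gaz))             # no context: larger population wins
```

With no local discriminator in the text, the sketch falls back to the population heuristic alone, which illustrates why the ambiguity statistic reported above makes the problem hard.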