Disambiguating toponyms in news
Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 363--370. Stroudsburg, PA, USA, Association for Computational Linguistics, (2005)

This research is aimed at the problem of disambiguating toponyms (place names) in terms of a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, for toponyms in news. We found that 67.82% of the toponyms found in a corpus that were ambiguous in a gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine learning to develop disambiguation rules. Toponyms were automatically tagged with information about them found in a gazetteer. A toponym that was ambiguous in the gazetteer was automatically disambiguated based on preference heuristics. This automatically tagged data was used to train a machine learner, which disambiguated toponyms in a human-annotated news corpus at 78.5% accuracy.
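
As a rough illustration of the two steps the abstract describes (measuring ambiguity with respect to a gazetteer, then resolving ambiguous toponyms with preference heuristics), the following minimal Python sketch may help. Everything in it is an assumption made for illustration: the toy gazetteer, its made-up population figures, the `Entry` type, and the particular preference order (country before capital before city, then larger population) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One gazetteer record for a place name (hypothetical schema)."""
    name: str
    feature_class: str  # e.g. "country", "capital", "city"
    population: int     # made-up illustrative values, not real data


# Hypothetical toy gazetteer: a name maps to every place that bears it.
GAZETTEER = {
    "Georgia": [Entry("Georgia", "country", 1000), Entry("Georgia", "city", 500)],
    "Paris": [Entry("Paris", "capital", 2000), Entry("Paris", "city", 50)],
    "Tokyo": [Entry("Tokyo", "capital", 9000)],
}

# Assumed preference order: lower rank wins. Not the paper's actual rules.
CLASS_RANK = {"country": 0, "capital": 1, "city": 2}


def is_ambiguous(name: str) -> bool:
    """A toponym is ambiguous if the gazetteer lists more than one entry."""
    return len(GAZETTEER.get(name, [])) > 1


def ambiguity_rate(mentions: list[str]) -> float:
    """Fraction of gazetteer-known mentions that are ambiguous."""
    known = [m for m in mentions if m in GAZETTEER]
    if not known:
        return 0.0
    return sum(is_ambiguous(m) for m in known) / len(known)


def prefer(name: str) -> Optional[Entry]:
    """Pick one entry by the assumed heuristic: best class, then population."""
    candidates = GAZETTEER.get(name, [])
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda e: (CLASS_RANK.get(e.feature_class, len(CLASS_RANK)),
                       -e.population),
    )


if __name__ == "__main__":
    mentions = ["Paris", "Tokyo", "Georgia", "Atlantis"]
    print(ambiguity_rate(mentions))
    print(prefer("Paris"))
```

In a pipeline like the one the abstract outlines, labels produced by a function such as `prefer` would serve as automatically generated training data for a machine learner, rather than as the final output.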