Disambiguating toponyms in news
E. Garbin and I. Mani.
Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 363--370. Stroudsburg, PA, USA, Association for Computational Linguistics, (2005)

This research is aimed at the problem of disambiguating toponyms (place names) in terms of a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, for toponyms in news. We found that 67.82% of the toponyms found in a corpus that were ambiguous in a gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine learning to develop disambiguation rules. Toponyms were automatically tagged with information about them found in a gazetteer. A toponym that was ambiguous in the gazetteer was automatically disambiguated based on preference heuristics. This automatically tagged data was used to train a machine learner, which disambiguated toponyms in a human-annotated news corpus at 78.5% accuracy.

URL

http://dx.doi.org/10.3115/1220575.1220621

@inproceedings{garbin2005disambiguating,
  abstract = {This research is aimed at the problem of disambiguating toponyms (place names) in terms of a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, for toponyms in news. We found that 67.82% of the toponyms found in a corpus that were ambiguous in a gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine learning to develop disambiguation rules. Toponyms were automatically tagged with information about them found in a gazetteer. A toponym that was ambiguous in the gazetteer was automatically disambiguated based on preference heuristics. This automatically tagged data was used to train a machine learner, which disambiguated toponyms in a human-annotated news corpus at 78.5% accuracy.},
  acmid = {1220621},
  added-at = {2012-10-03T09:34:09.000+0200},
  address = {Stroudsburg, PA, USA},
  author = {Garbin, Eric and Mani, Inderjeet},
  biburl = {https://puma.uni-kassel.de/bibtex/2de574cf3bff3a3748fcd9bd5a9a0f3d1/jaeschke},
  booktitle = {Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing},
  doi = {10.3115/1220575.1220621},
  interhash = {566910cb6e9745ee70da19d2ccafaffa},
  intrahash = {de574cf3bff3a3748fcd9bd5a9a0f3d1},
  keywords = {disambiguation extraction geo map news toponym},
  location = {Vancouver, British Columbia, Canada},
  numpages = {8},
  pages = {363--370},
  publisher = {Association for Computational Linguistics},
  timestamp = {2012-10-03T09:34:09.000+0200},
  title = {Disambiguating toponyms in news},
  url = {http://dx.doi.org/10.3115/1220575.1220621},
  year = 2005
}

%0 Conference Paper
%1 garbin2005disambiguating
%A Garbin, Eric
%A Mani, Inderjeet
%B Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
%C Stroudsburg, PA, USA
%D 2005
%I Association for Computational Linguistics
%K disambiguation extraction geo map news toponym
%P 363--370
%R 10.3115/1220575.1220621
%T Disambiguating toponyms in news
%U http://dx.doi.org/10.3115/1220575.1220621
%X This research is aimed at the problem of disambiguating toponyms (place names) in terms of a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, for toponyms in news. We found that 67.82% of the toponyms found in a corpus that were ambiguous in a gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine learning to develop disambiguation rules. Toponyms were automatically tagged with information about them found in a gazetteer. A toponym that was ambiguous in the gazetteer was automatically disambiguated based on preference heuristics. This automatically tagged data was used to train a machine learner, which disambiguated toponyms in a human-annotated news corpus at 78.5% accuracy.
