A comparison of layout based bibliographic metadata extraction techniques

M. Granitzer, M. Hristakeva, R. Knight, K. Jack, and R. Kern. Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, pages 19:1--19:8. New York, NY, USA, ACM, (2012)

Abstract

Social research networks such as Mendeley and CiteULike offer various services for collaboratively managing bibliographic metadata. Compared with traditional libraries, metadata quality is of crucial importance in order to create a crowdsourced bibliographic catalog for search and browsing. Artifacts, in particular PDFs which are managed by the users of the social research networks, become one important metadata source and the starting point for creating a homogeneous, high quality, bibliographic catalog. Natural Language Processing and Information Extraction techniques have been employed to extract structured information from unstructured sources. However, given highly heterogeneous artifacts that cover a range of publication styles, stemming from different publication sources, and imperfect PDF processing tools, how accurate are metadata extraction methods in such real-world settings? This paper focuses on answering that question by investigating the use of Conditional Random Fields and Support Vector Machines on real-world data gathered from Mendeley and Linked-Data repositories. We compare style and content features on existing state-of-the-art methods on two newly created real-world data sets for metadata extraction. Our analysis shows that two-stage SVMs provide reasonable performance in solving the challenge of metadata extraction for crowdsourcing bibliographic metadata management.

Description

A comparison of layout based bibliographic metadata extraction techniques

Links and resources

URL:

http://doi.acm.org/10.1145/2254129.2254154

BibTeX key:

granitzer2012comparison

Cite this publication

@inproceedings{granitzer2012comparison,
  abstract = {Social research networks such as Mendeley and CiteULike offer various services for collaboratively managing bibliographic metadata. Compared with traditional libraries, metadata quality is of crucial importance in order to create a crowdsourced bibliographic catalog for search and browsing. Artifacts, in particular PDFs which are managed by the users of the social research networks, become one important metadata source and the starting point for creating a homogeneous, high quality, bibliographic catalog. Natural Language Processing and Information Extraction techniques have been employed to extract structured information from unstructured sources. However, given highly heterogeneous artifacts that cover a range of publication styles, stemming from different publication sources, and imperfect PDF processing tools, how accurate are metadata extraction methods in such real-world settings? This paper focuses on answering that question by investigating the use of Conditional Random Fields and Support Vector Machines on real-world data gathered from Mendeley and Linked-Data repositories. We compare style and content features on existing state-of-the-art methods on two newly created real-world data sets for metadata extraction. Our analysis shows that two-stage SVMs provide reasonable performance in solving the challenge of metadata extraction for crowdsourcing bibliographic metadata management.},
  acmid = {2254154},
  added-at = {2012-07-05T15:46:16.000+0200},
  address = {New York, NY, USA},
  articleno = {19},
  author = {Granitzer, Michael and Hristakeva, Maya and Knight, Robert and Jack, Kris and Kern, Roman},
  biburl = {https://puma.uni-kassel.de/bibtex/27194c862da359af9aa18b4d865cbce55/jaeschke},
  booktitle = {Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics},
  description = {A comparison of layout based bibliographic metadata extraction techniques},
  doi = {10.1145/2254129.2254154},
  interhash = {bfa622b68be4bb039ca0516b3b33ec40},
  intrahash = {7194c862da359af9aa18b4d865cbce55},
  isbn = {978-1-4503-0915-8},
  keywords = {extraction ie information},
  location = {Craiova, Romania},
  numpages = {8},
  pages = {19:1--19:8},
  publisher = {ACM},
  timestamp = {2012-07-05T15:46:16.000+0200},
  title = {A comparison of layout based bibliographic metadata extraction techniques},
  url = {http://doi.acm.org/10.1145/2254129.2254154},
  year = 2012
}

%0 Conference Paper
%1 granitzer2012comparison
%A Granitzer, Michael
%A Hristakeva, Maya
%A Knight, Robert
%A Jack, Kris
%A Kern, Roman
%B Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
%C New York, NY, USA
%D 2012
%I ACM
%K extraction ie information
%P 19:1--19:8
%R 10.1145/2254129.2254154
%T A comparison of layout based bibliographic metadata extraction techniques
%U http://doi.acm.org/10.1145/2254129.2254154
%X Social research networks such as Mendeley and CiteULike offer various services for collaboratively managing bibliographic metadata. Compared with traditional libraries, metadata quality is of crucial importance in order to create a crowdsourced bibliographic catalog for search and browsing. Artifacts, in particular PDFs which are managed by the users of the social research networks, become one important metadata source and the starting point for creating a homogeneous, high quality, bibliographic catalog. Natural Language Processing and Information Extraction techniques have been employed to extract structured information from unstructured sources. However, given highly heterogeneous artifacts that cover a range of publication styles, stemming from different publication sources, and imperfect PDF processing tools, how accurate are metadata extraction methods in such real-world settings? This paper focuses on answering that question by investigating the use of Conditional Random Fields and Support Vector Machines on real-world data gathered from Mendeley and Linked-Data repositories. We compare style and content features on existing state-of-the-art methods on two newly created real-world data sets for metadata extraction. Our analysis shows that two-stage SVMs provide reasonable performance in solving the challenge of metadata extraction for crowdsourcing bibliographic metadata management.
%@ 978-1-4503-0915-8
