Publications

Information Extraction: Past, Present and Future

Piskorski, J. & Yangarber, R.

Poibeau, T.; Saggion, H.; Piskorski, J. & Yangarber, R., ed., 'Multi-source, Multilingual Information Extraction and Summarization', Springer Berlin Heidelberg, 23-49 (2013) [pdf]

In this chapter we present a brief overview of Information Extraction, which is an area of natural language processing that deals with finding factual information in free text. In formal terms,

Multi-source, multilingual information extraction and summarization

2013, Poibeau, T.; Saggion, H.; Piskorski, J. & Yangarber, R., ed., Springer, Berlin; New York [pdf]

Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherent multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but rather large-scale repositories and streams--in general, in multiple languages--containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought--names vs. events--and the nature of the sources--news streams vs. image captions vs. scientific research papers, etc. This volume aims to offer a broad and representative sample of studies from this very active research field.

The Role of Social Networks in Information Diffusion

Bakshy, E.; Rosenn, I.; Marlow, C. & Adamic, L.

(2012) [pdf]

Online social networking technologies enable individuals to simultaneously share information with any number of peers. Quantifying the causal effect of these technologies on the dissemination of information requires not only identification of who influences whom, but also of whether individuals would still propagate information in the absence of social signals about that information. We examine the role of social networks in online information diffusion with a large-scale field experiment that randomizes exposure to signals about friends' information sharing among 253 million subjects in situ. Those who are exposed are significantly more likely to spread information, and do so sooner than those who are not exposed. We further examine the relative role of strong and weak ties in information propagation. We show that, although stronger ties are individually more influential, it is the more abundant weak ties who are responsible for the propagation of novel information. This suggests that weak ties may play a more dominant role in the dissemination of information online than currently believed.

Collective Information Extraction with Context-Specific Consistencies.

Klügl, P.; Toepfer, M.; Lemmerich, F.; Hotho, A. & Puppe, F.

Flach, P. A.; Bie, T. D. & Cristianini, N., ed., 'ECML/PKDD (1)', 7523, Lecture Notes in Computer Science, Springer, 728-743 (2012) [pdf]

Conditional Random Fields For Local Adaptive Reference Extraction

Toepfer, M.; Kluegl, P.; Hotho, A. & Puppe, F.

Atzmüller, M.; Benz, D.; Hotho, A. & Stumme, G., ed., 'Proceedings of LWA2010 - Workshop-Woche: Lernen, Wissen & Adaptivitaet', Kassel, Germany (2010) [pdf]

The accurate extraction of bibliographic information from scientific publications is an active field of research. Machine learning and sequence labeling approaches like Conditional Random Fields (CRF) are often applied for this reference extraction task, but still suffer from the ambiguity of reference notation. Reference sections apply a predefined style guide and contain only homogeneous references. Therefore, other references of the same paper or journal often provide evidence of how the fields of a reference should be labeled. We propose a novel approach that exploits the similarities within a document. Our process model uses information from unlabeled documents directly during the extraction task in order to automatically adapt to the perceived style guide. This is implemented by changing the manifestation of the features for the applied CRF. The experimental results show considerable improvements compared to the common approach. We achieve an average F1 score of 96.7% and an instance accuracy of 85.4% on the test data set.

Introduction to Information Retrieval

Manning, C. D.; Raghavan, P. & Schütze, H.

2008, Cambridge University Press

Cross-lingual Information Retrieval with Explicit Semantic Analysis

Sorg, P. & Cimiano, P.

'Working Notes for the CLEF 2008 Workshop' (2008) [pdf]

Improving Tag-Clouds as Visual Information Retrieval Interfaces

Hassan-Montero, Y. & Herrero-Solana, V.

'InScit2006: International Conference on Multidisciplinary Information Sciences and Technologies' (2006) [pdf]

Tagging-based systems enable users to categorize web resources by means of tags (freely chosen keywords), in order to re-find these resources later. Tagging is implicitly also a social indexing process, since users share their tags and resources, constructing a social tag index, a so-called folksonomy. Alongside tagging-based systems, an interface model for visual information retrieval known as the Tag-Cloud has been popularised. In this model, the most frequently used tags are displayed in alphabetical order. This paper presents a novel approach to Tag-Cloud's tags selection, and proposes the use of clustering algorithms for visual layout, with the aim of improving the browsing experience. The results suggest that the presented approach reduces the semantic density of the tag set, and improves the visual consistency of the Tag-Cloud layout.

A Survey of Web Information Extraction Systems

Kayed, M. & Shaalan, K. F.

IEEE Transactions on Knowledge and Data Engineering, 18(10) 1411-1428 (2006)

Tree-Structured Conditional Random Fields for Semantic Annotation.

Tang, J.; Hong, M.; Li, J.-Z. & Liang, B.

Cruz, I. F.; Decker, S.; Allemang, D.; Preist, C.; Schwabe, D.; Mika, P.; Uschold, M. & Aroyo, L., ed., 'International Semantic Web Conference', 4273, Lecture Notes in Computer Science, Springer, 640-653 (2006) [pdf]

Extracting social networks and contact information from email and the Web

Culotta, A.; Bekkerman, R. & McCallum, A.

'Proc. Conference on Email and Anti-Spam (CEAS)', Mountain View, USA (2004)

Accurate Information Extraction from Research Papers using Conditional Random Fields

Peng, F. & McCallum, A.

'HLT-NAACL', 329-336 (2004) [pdf]

Information Retrieval: Suchmodelle und Data-Mining-Verfahren für Textsammlungen und das Web

Ferber, R.

2003, dpunkt Verlag, Heidelberg [pdf]

The structure and function of complex networks

Newman, M. E. J.

(2003) [pdf]

Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.

SEAL -- Tying up Information Integration and Web Site Management by Ontologies

Maedche, A.; Staab, S.; Studer, R.; Sure, Y. & Volz, R.

IEEE-CS Data Engineering Bulletin, Special Issue on Organizing and Discovering the Semantic Web (2002)

Modern Information Retrieval

Baeza-Yates, R. A. & Ribeiro-Neto, B. A.

1999, ACM Press / Addison-Wesley [pdf]

Application of Spreading Activation Techniques in Information Retrieval

Crestani, F.

Artificial Intelligence Review, 11(6) 453-482 (1997) [pdf]

This paper surveys the use of Spreading Activation techniques on Semantic Networks in Associative Information Retrieval. The major Spreading Activation models are presented and their applications to IR are surveyed. A number of works in this area are critically analyzed in order to study the relevance of Spreading Activation for associative IR.

Readings in Information Retrieval

1997, Sparck-Jones, K. & Willett, P., ed., Morgan Kaufmann

Conceptual Structures: Information Processing in Mind and Machine

Sowa, J. F.

1984, Addison-Wesley Publishing Company, Reading, MA

Information retrieval

van Rijsbergen, C. J.

1979, Butterworths, London [pdf]