TY  - CONF
AU  - Chai, Xiaoyong
AU  - Vuong, Ba-Quy
AU  - Doan, AnHai
AU  - Naughton, Jeffrey F.
T1  - Efficiently incorporating user feedback into information extraction and integration programs
T2  - Proceedings of the 35th SIGMOD international conference on Management of data
PB  - ACM
CY  - New York, NY, USA
PY  - 2009
SP  - 87
EP  - 100
UR  - http://doi.acm.org/10.1145/1559845.1559857
M3  - 10.1145/1559845.1559857
KW  - information
KW  - ie
KW  - human
KW  - intelligence
KW  - computing
KW  - cirg
KW  - collective
KW  - extraction
KW  - toread
SN  - 978-1-60558-551-2
AB  - Many applications increasingly employ information extraction and integration (IE/II) programs to infer structures from unstructured data. Automatic IE/II are inherently imprecise. Hence such programs often make many IE/II mistakes, and thus can significantly benefit from user feedback. Today, however, there is no good way to automatically provide and process such feedback. When finding an IE/II mistake, users often must alert the developer team (e.g., via email or Web form) about the mistake, and then wait for the team to manually examine the program internals to locate and fix the mistake, a slow, error-prone, and frustrating process. In this paper we propose a solution for users to directly provide feedback and for IE/II programs to automatically process such feedback. In our solution a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. Next, the so-augmented program P is executed, then enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P, and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and then various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
ER  -