TY - CONF
AU - Chai, Xiaoyong
AU - Vuong, Ba-Quy
AU - Doan, AnHai
AU - Naughton, Jeffrey F.
T1 - Efficiently incorporating user feedback into information extraction and integration programs
T2 - Proceedings of the 35th SIGMOD international conference on Management of data
PB - ACM
CY - New York, NY, USA
PY - 2009/
SP - 87
EP - 100
UR - http://doi.acm.org/10.1145/1559845.1559857
M3 - 10.1145/1559845.1559857
KW - information
KW - ie
KW - human
KW - intelligence
KW - computing
KW - cirg
KW - collective
KW - extraction
KW - toread
SN - 978-1-60558-551-2
AB - Many applications increasingly employ information extraction and integration (IE/II) programs to infer structures from unstructured data. Automatic IE/II programs are inherently imprecise. Hence such programs often make many mistakes, and thus can benefit significantly from user feedback. Today, however, there is no good way to automatically provide and process such feedback. When users find an IE/II mistake, they often must alert the developer team (e.g., via email or a Web form) and then wait for the team to manually examine the program internals to locate and fix the mistake: a slow, error-prone, and frustrating process.

In this paper we propose a solution that lets users provide feedback directly and lets IE/II programs process such feedback automatically. In our solution, a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. The augmented program P is then executed and enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
ER -