S. Jeffery, M. Franklin, and A. Halevy. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 847–860. New York, NY, USA, ACM, (2008)
A primary challenge in large-scale data integration is creating semantic equivalences between elements from different data sources that correspond to the same real-world entity or concept. Dataspaces propose a pay-as-you-go approach: automated mechanisms such as schema matching and reference reconciliation provide initial correspondences, termed candidate matches, and user feedback is then used to incrementally confirm these matches. The key question in this approach is the order in which to solicit user feedback for confirming candidate matches.

In this paper, we develop a decision-theoretic framework for ordering candidate matches for user confirmation using the concept of the value of perfect information (VPI). At the core of this concept is a utility function that quantifies the desirability of a given state; we therefore devise a utility function for dataspaces based on query result quality. We show how to apply VPI efficiently in practice, in concert with this utility function, to order user confirmations. A detailed experimental evaluation on both real and synthetic datasets shows that the ordering of user feedback produced by this VPI-based approach yields a dataspace with significantly higher utility than a wide range of other ordering strategies. Finally, we outline the design of Roomba, a system that uses this decision-theoretic framework to guide a dataspace in soliciting user feedback in a pay-as-you-go manner.
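The abstract does not spell out the VPI computation, so the following is only a minimal sketch of the general decision-theoretic idea, not the paper's utility function or algorithm. It assumes a hypothetical toy model: each candidate match has a matcher confidence `p_correct` and a total workload weight `query_weight` for the queries that use it; using a correct match gains that weight, using a wrong one costs `PENALTY` times that weight, and for an unconfirmed match the system picks whichever of "use" or "ignore" has higher expected utility. The VPI of asking about a match is the expected utility after learning the truth minus the current expected utility. All names (`CandidateMatch`, `PENALTY`, `utility`, `vpi`, `order_for_feedback`) are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical cost of answering queries with a wrong match, relative to a
# gain of 1.0 per unit of query weight for a correct match (an assumption).
PENALTY = 2.0


@dataclass(frozen=True)
class CandidateMatch:
    name: str
    p_correct: float     # assumed confidence from an automatic matcher
    query_weight: float  # assumed total weight of queries that use this match


def match_utility(m, status):
    """Utility contributed by one match given its status.

    status is "correct", "incorrect", or None (unconfirmed).
    """
    w = m.query_weight
    if status == "correct":
        return w    # confirmed: use it, every dependent query gains
    if status == "incorrect":
        return 0.0  # rejected: drop it, no gain and no penalty
    # Unconfirmed: choose the better of using it (expected value) or ignoring it.
    expected_if_used = w * (m.p_correct - (1 - m.p_correct) * PENALTY)
    return max(expected_if_used, 0.0)


def utility(matches, statuses):
    """Toy dataspace utility: additive over matches (a simplifying assumption)."""
    return sum(match_utility(m, statuses.get(m.name)) for m in matches)


def vpi(m, matches, statuses):
    """Expected utility after perfect information about m, minus current utility."""
    base = utility(matches, statuses)
    if_correct = utility(matches, {**statuses, m.name: "correct"})
    if_incorrect = utility(matches, {**statuses, m.name: "incorrect"})
    return m.p_correct * if_correct + (1 - m.p_correct) * if_incorrect - base


def order_for_feedback(matches, statuses=None):
    """Unconfirmed matches, highest VPI first."""
    statuses = statuses or {}
    pending = [m for m in matches if m.name not in statuses]
    return sorted(pending, key=lambda m: vpi(m, matches, statuses), reverse=True)


if __name__ == "__main__":
    candidates = [
        CandidateMatch("a", 0.90, 1.0),
        CandidateMatch("b", 0.50, 3.0),
        CandidateMatch("c", 0.60, 0.5),
        CandidateMatch("d", 0.99, 2.0),
    ]
    for m in order_for_feedback(candidates):
        print(m.name, round(vpi(m, candidates, {}), 2))
```

With these made-up inputs the suggested order is b, c, a, d: the highly confident match d is asked about last even though it is used heavily, because confirming it barely changes any decision, while the uncertain, heavily used match b comes first. In a pay-as-you-go loop one would record each answer in `statuses` and recompute the order. Because this toy utility is additive, each match's VPI is independent of the others; a utility based on query result quality, as the abstract describes, can couple matches that appear in the same queries.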