Publications
An elaborated model of social search
Evans, B. M. & Chi, E. H.
Information Processing & Management, 46(6), 656-678 (2010) [pdf]
Search engine researchers typically depict search as the solitary activity of an individual searcher. In contrast, results from our critical-incident survey of 150 users on Amazon's Mechanical Turk service suggest that social interactions play an important role throughout the search process. A second survey, also of 150 users but focused on difficulties encountered during searches, suggests similar conclusions. These social interactions range from highly coordinated collaborations with shared goals to loosely coordinated collaborations in which only advice is sought. Our main contribution is an integration of models from previous work in sensemaking and information-seeking behavior into a canonical social model of user activities before, during, and after a search episode, suggesting where in the search process both explicitly and implicitly shared information may be valuable to individual searchers. We situate collaboration in these search episodes within the context of our model of social search. We discuss factors that influence social interactions and content sharing during search activities, and explore the relationship between social interactions, motivations, and query needs. Finally, we introduce preliminary findings from the second survey on difficult and failed search efforts, discussing how query needs and social interactions may differ in cases of search failure.
Mathematical Modeling of Social Games
Chan, K. T.; King, I. & Yuen, M.-C.
Proceedings of the International Conference on Computational Science and Engineering (CSE '09), vol. 4, 1205-1210 (2009) [doi:10.1109/CSE.2009.166] [pdf]
Human computation is a technique that makes use of human abilities to solve computational problems. Social games harness the power of Internet game players to solve human computation problems. In previous work, many social games were proposed and were quite successful, but no formal framework exists for designing social games in general. A formal framework is important because it lists the design elements of a social game, the characteristics of a human computation problem, and the relationships between them, thereby simplifying the design of a social game for a specific problem. In this paper, our contributions are: (1) we formulate a formal model of social games, (2) analyze the framework and derive some interesting properties based on the model's interactions, (3) illustrate how some current social games can be realized within the proposed formal model, and (4) describe how to design a social game for solving a specific problem using the proposed formal model. This paper presents a set of design guidelines derived from the formal model and demonstrates that the model can help design a social game for solving a specific problem in a formal and structured way.
Reading Tea Leaves: How Humans Interpret Topic Models
Chang, J.; Boyd-Graber, J. L.; Gerrish, S.; Wang, C. & Blei, D. M.
In Bengio, Y.; Schuurmans, D.; Lafferty, J. D.; Williams, C. K. I. & Culotta, A. (eds.), Advances in Neural Information Processing Systems (NIPS), Curran Associates, Inc., 288-296 (2009) [pdf]
Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Practitioners typically assume that the latent space is semantically meaningful. It is used to check models, summarize the corpus, and guide exploration of its contents. However, whether the latent space is interpretable is in need of quantitative evaluation. In this paper, we present new quantitative methods for measuring semantic meaning in inferred topics. We back these measures with large-scale user studies, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood. Surprisingly, topic models which perform better on held-out likelihood may infer less semantically meaningful topics.
Measuring article quality in wikipedia: models and evaluation
Hu, M.; Lim, E.-P.; Sun, A.; Lauw, H. W. & Vuong, B.-Q.
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07), ACM, New York, NY, USA, 243-252 (2007) [doi:10.1145/1321440.1321476] [pdf]
Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in the collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our Basic model is designed around the mutual dependency between article quality and author authority. The PeerReview model introduces review behavior into the measurement of article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate how effectively our quality measurement models approximate human judgement.
Pseudo-models and propositional Horn inference
Ganter, B. & Krauße, R.
Discrete Applied Mathematics, 147(1), 43-55 (2005) [pdf]
A well-known result is that the inference problem for propositional Horn formulae can be solved in linear time. We show that this remains true even in the presence of arbitrary (static) propositional background knowledge. Our main tool is the notion of a cumulated clause, a slight generalization of the usual clauses in Propositional Logic. We show that each propositional theory has a canonical irredundant base of cumulated clauses, and present an algorithm to compute this base.
Document quality models for web ad hoc retrieval
Zhou, Y. & Croft, W. B.
Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM '05), ACM, New York, NY, USA, 331-332 (2005) [doi:10.1145/1099554.1099652] [pdf]
The quality of document content, which is an issue that is usually ignored for the traditional ad hoc retrieval task, is a critical issue for Web search. Web pages have a huge variation in quality relative to, for example, newswire articles. To address this problem, we propose a document quality language model approach that is incorporated into the basic query likelihood retrieval model in the form of a prior probability. Our results demonstrate that, on average, the new model is significantly better than the baseline (query likelihood model) in terms of precision at the top ranks.
A Brief History of Generative Models for Power Law and Lognormal Distributions
Mitzenmacher, M.
Internet Mathematics, 1(2), 226-251 (2004) [pdf]
Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying to learn enough about these distributions to settle the question, I found a rich and long history, spanning many fields. Indeed, several recently proposed models from the computer science community have antecedents in work from decades ago. Here, I briefly survey some of this history, focusing on underlying generative models that lead to these distributions. One finding is that lognormal and power law distributions connect quite naturally, and hence, it is not surprising that lognormal distributions have arisen as a possible alternative to power law distributions across many fields.
An extensible approach for Modeling Ontologies in RDF(S)
Staab, S.; Erdmann, M.; Maedche, A. & Decker, S.
Proc. of the First Workshop on the Semantic Web at the Fourth European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2000), Lisbon, Portugal, 18-20 September 2000 (2000) [pdf]
RDF(S) constitutes a newly emerging standard for metadata that is about to turn the World Wide Web into a machine-understandable knowledge base. It is an XML application that allows for the denotation of facts and schemata in a web-compatible format, building on an elaborate object model for describing concepts and relations. Thus, it might appear a natural choice for a widely usable ontology description language. However, its lack of capabilities for describing the semantics of concepts and relations beyond those provided by inheritance mechanisms makes it a rather weak language for even the most austere knowledge-based system. This paper presents an approach for modeling ontologies in RDF(S) that also treats axioms as objects describable in RDF(S). Thus, we provide flexible, extensible, and adequate means for accessing and exchanging axioms in RDF(S). Our approach follows the spirit of the World Wide Web, as we do not assume a global axiom specification language that is too intractable for one purpose and too weak for the next, but rather a methodology that allows (communities of) users to specify which axioms are interesting in their domain.
A distributed trust model
Abdul-Rahman, A. & Hailes, S.
NSPW '97: Proceedings of the 1997 Workshop on New Security Paradigms, ACM, New York, NY, USA, 48-60 (1997) [doi:10.1145/283699.283739] [pdf]
The widespread use of the Internet signals the need for a better understanding of trust as a basis for secure on-line interaction. In the face of increasing uncertainty and risk, users must be allowed to reason effectively about the trustworthiness of on-line entities. In this paper, we outline the shortcomings of current security approaches for managing trust and propose a model for trust, based on distributed recommendations.
A cookbook for using the model-view controller user interface paradigm in Smalltalk-80
Krasner, G. E. & Pope, S. T.
Journal of Object Oriented Programming, 1(3), 26-49 (1988)