Publications

Collaborative data networks for public service: governance, management, and performance.

Chen, Y.-C. & Lee, J.

Public Management Review, 20(5) (2018)

This study aims to advance the theory and practice of managing collaborative data networks for information and decision-support services that exist in over 400 US metropolitan areas. Integrating insights from collaborative governance, network management, and cross-boundary information sharing, this study develops a framework to outline the interplay between context, management, collaborative dynamics, technology, and performance. This study further utilizes the framework to conduct an exploratory in-depth case study of a metropolitan transportation data network to examine such interplay. The findings suggest ways to improve the performance of collaborative data networks and their implications are discussed.

Never-Ending Learning

Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; Krishnamurthy, J.; Lao, N.; Mazaitis, K.; Mohammad, T.; Nakashole, N.; Platanios, E.; Ritter, A.; Samadi, M.; Settles, B.; Wang, R.; Wijaya, D.; Gupta, A.; Chen, X.; Saparov, A.; Greaves, M. & Welling, J.

'AAAI' (2015)

ASTERIX: an open source system for "Big Data" management and analysis (demo)

Alsubaiee, S.; Altowim, Y.; Altwaijry, H.; Behm, A.; Borkar, V.; Bu, Y.; Carey, M.; Grover, R.; Heilbron, Z.; Kim, Y.-S.; Li, C.; Onose, N.; Pirzadeh, P.; Vernica, R. & Wen, J.

Proceedings of the VLDB Endowment, 5(12) 1898-1901 (2012)

At UC Irvine, we are building a next generation parallel database system, called ASTERIX, as our approach to addressing today's "Big Data" management challenges. ASTERIX aims to combine time-tested principles from parallel database systems with those of the Web-scale computing community, such as fault tolerance for long running jobs. In this demo, we present a whirlwind tour of ASTERIX, highlighting a few of its key features. We will demonstrate examples of our data definition language to model semi-structured data, and examples of interesting queries using our declarative query language. In particular, we will show the capabilities of ASTERIX for answering geo-spatial queries and fuzzy queries, as well as ASTERIX' data feed construct for continuously ingesting data.

ASTERIX: towards a scalable, semistructured data platform for evolving-world models

Behm, A.; Borkar, V.; Carey, M.; Grover, R.; Li, C.; Onose, N.; Vernica, R.; Deutsch, A.; Papakonstantinou, Y. & Tsotras, V.

Distributed and Parallel Databases, 29(3) 185-216 (2011)

ASTERIX is a new data-intensive storage and computing platform project spanning UC Irvine, UC Riverside, and UC San Diego. In this paper we provide an overview of the ASTERIX project, starting with its main goal: the storage and analysis of data pertaining to evolving-world models. We describe the requirements and associated challenges, and explain how the project is addressing them. We provide a technical overview of ASTERIX, covering its architecture, its user model for data and queries, and its approach to scalable query processing and data management. ASTERIX utilizes a new scalable runtime computational platform called Hyracks that is also discussed at an overview level; we have recently made Hyracks available in open source for use by other interested parties. We also relate our work on ASTERIX to the current state of the art and describe the research challenges that we are currently tackling as well as those that lie ahead.

Community structure of the physical review citation network

Chen, P. & Redner, S.

Journal of Informetrics, 4(3) 278-290 (2010)

We investigate the community structure of physics subfields in the citation network of all Physical Review publications between 1893 and August 2007. We focus on well-cited publications (those receiving more than 100 citations), and apply modularity maximization to uncover major communities that correspond to clearly identifiable subfields of physics. While most of the links between communities connect those with obvious intellectual overlap, there sometimes exist unexpected connections between disparate fields due to the development of a widely applicable theoretical technique or to cross-fertilization between theory and experiment. We also examine communities decade by decade and uncover a small number of significant links between communities that are widely separated in time.
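The modularity score that such a method maximizes can be illustrated with a minimal sketch. This is not the authors' code: it assumes an undirected, unweighted graph (a citation network is directed), and the function name `modularity` and the toy graph are hypothetical. It only evaluates the standard Newman-Girvan modularity of a given partition; it does not search for the best partition.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ],
    where L_c is the number of edges inside c, d_c is the total
    degree of the nodes in c, and m is the number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both ends in c
    degree = defaultdict(int)    # d_c: summed degree of nodes in c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy example (hypothetical data): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, community), 4))  # 0.3571
```

Placing all six nodes in a single community gives Q = 0, so splitting the graph into its two triangles is the better partition under this score.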

Claper: Recommend classical papers to beginners

Wang, Y.; Zhai, E.; Hu, J. & Chen, Z.

'Proceedings of the seventh International Conference on Fuzzy Systems and Knowledge Discovery', 6, IEEE, [10.1109/FSKD.2010.5569227], 2777-2781 (2010)

Classical papers are of great help for beginners to get familiar with a new research area. However, digging them out is a difficult problem. This paper proposes Claper, a novel academic recommendation system based on two proven principles: the Principle of Download Persistence and the Principle of Citation Approaching (we prove them based on real-world datasets). The principle of download persistence indicates that the download frequencies of classical papers decrease little after publication. The principle of citation approaching indicates that a paper which cites a classical paper is likely to cite citations of that classical paper. Our experimental results based on large-scale real-world datasets show that Claper can effectively recommend classical papers of high quality to beginners and thus help them enter their research areas.

A Method for Comparing Human Postures from Motion Capture Data

Yang, W.-T.; Luo, Z.; Chen, I.-M. & Yeo, S.

Parenti Castelli, V. & Schiehlen, W., ed., 'ROMANSY 18 Robot Design, Dynamics and Control', 524, Springer Vienna, 441-448 (2010)

Interaction design guidelines on critiquing-based recommender systems

Chen, L. & Pu, P.

User Modeling and User-Adapted Interaction, 19(3) 167-206 (2009)

A critiquing-based recommender system acts like an artificial salesperson. It engages users in a conversational dialog where users can provide feedback in the form of critiques to the sample items that were shown to them. The feedback, in turn, enables the system to refine its understanding of the user’s preferences and prediction of what the user truly wants. The system is then able to recommend products that may better stimulate the user’s interest in the next interaction cycle. In this paper, we report our extensive investigation of comparing various approaches in devising critiquing opportunities designed in these recommender systems. More specifically, we have investigated two major design elements which are necessary for a critiquing-based recommender system:

Information extraction challenges in managing unstructured data

Doan, A.; Naughton, J. F.; Ramakrishnan, R.; Baid, A.; Chai, X.; Chen, F.; Chen, T.; Chu, E.; DeRose, P.; Gao, B.; Gokhale, C.; Huang, J.; Shen, W. & Vuong, B.-Q.

SIGMOD Record, 37(4) 14-20 (2009)

Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wiki-based user interface for feedback, best-effort IE, pushing IE into RDBMSs, and more. Our work suggests that IE in managing unstructured data can open up many interesting research challenges, and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community.

An Overview of Learning to Rank for Information Retrieval.

Dong, X.; Chen, X.; Guan, Y.; Yu, Z. & Li, S.

Burgin, M.; Chowdhury, M. H.; Ham, C. H.; Ludwig, S. A.; Su, W. & Yenduri, S., ed., 'CSIE (3)', IEEE Computer Society, 600-606 (2009)

Personalized tag recommendation using graph-based ranking on multi-type interrelated objects.

Guan, Z.; Bu, J.; Mei, Q.; Chen, C. & Wang, C.

Allan, J.; Aslam, J. A.; Sanderson, M.; Zhai, C. & Zobel, J., ed., 'SIGIR', ACM, 540-547 (2009)

Exploit the tripartite network of social tagging for web clustering

Lu, C.; Chen, X. & Park, E. K.

'Proceeding of the 18th ACM conference on Information and knowledge management', CIKM '09, ACM, New York, NY, USA, [10.1145/1645953.1646167], 1545-1548 (2009)

In this poster, we investigate how to enhance web clustering by leveraging the tripartite network of social tagging systems. We propose a clustering method, called "Tripartite Clustering", which clusters the three types of nodes (resources, users and tags) simultaneously based on the links in the social tagging network. The proposed method is evaluated on a real-world social tagging dataset sampled from del.icio.us. We also compare the proposed clustering approach with K-means. All the clustering results are evaluated against a human-maintained web directory. The experimental results show that Tripartite Clustering significantly outperforms the content-based K-means approach and achieves performance close to that of social annotation-based K-means while generating much more useful information.

A Survey of Human Computation Systems

Yuen, M.-C.; Chen, L.-J. & King, I.

'Proceedings of the International Conference on Computational Science and Engineering, CSE '09', 4, [10.1109/CSE.2009.395], 723-728 (2009)

Human computation is a technique that makes use of human abilities for computation to solve problems. Human computation problems are those that computers are not good at solving but that are trivial for humans. In this paper, we give a survey of various human computation systems, which are categorized into initiatory human computation, distributed human computation and social game-based human computation with volunteers, paid engineers and online players. For the large number of existing social games, some previous works defined various types of social games, but recently developed social games cannot be categorized based on the previous works. In this paper, we define the categories and the characteristics of social games which are suitable for all existing ones. Besides, we present a survey on the performance aspects of human computation systems. This paper gives a better understanding of human computation systems.

Enhancing text clustering by leveraging Wikipedia semantics.

Hu, J.; Fang, L.; Cao, Y.; Zeng, H.-J.; Li, H.; Yang, Q. & Chen, Z.

Myaeng, S.-H.; Oard, D. W.; Sebastiani, F.; Chua, T.-S. & Leong, M.-K., ed., 'SIGIR', ACM, 179-186 (2008)

The Art of Tagging: Measuring the Quality of Tags

Krestel, R. & Chen, L.

'Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web', ASWC '08, Springer-Verlag, Berlin, Heidelberg, [10.1007/978-3-540-89704-0_18], 257-271 (2008)

Collaborative tagging, supported by many social networking websites, is currently enjoying increasing popularity. The usefulness of this largely available tag data has been explored in many applications including web resources categorization, deriving emergent semantics, web search etc. However, since tags are supplied by users freely, not all of them are useful and reliable, especially when they are generated by spammers with malicious intent. Therefore, identifying tags of high quality is crucial in improving the performance of applications based on tags. In this paper, we propose TRP-Rank (Tag-Resource Pair Rank), an algorithm to measure the quality of tags by manually assessing a seed set and propagating the quality through a graph. The three dimensional relationship among users, tags and web resources is first represented by a graph structure. A set of seed nodes, where each node represents a tag annotating a resource, is then selected and their quality is assessed. The quality of the remaining nodes is calculated by propagating the known quality of the seeds through the graph structure. We evaluate our approach on a public data set where tags generated by suspicious spammers were manually labelled. The experimental results demonstrate the effectiveness of this approach in measuring the quality of tags.
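The general idea of propagating seed quality through a graph can be sketched as follows. This is an illustration only and not TRP-Rank itself: the abstract does not give the update rule, so the sketch assumes a simple neighbor-averaging scheme with clamped seeds, and the function name `propagate_quality`, the default score of 0.5, and the toy graph are all hypothetical.

```python
def propagate_quality(graph, seeds, iterations=50, default=0.5):
    """Spread known seed scores to the other nodes of an undirected graph.

    graph: dict mapping each node to a list of its neighbors
           (every neighbor must also be a key of the dict).
    seeds: dict mapping manually assessed nodes to scores in [0, 1].
    Assumed rule (not from the paper): seeds stay fixed, and every other
    node repeatedly takes the mean score of its neighbors.
    """
    scores = {node: seeds.get(node, default) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds or not neighbors:
                updated[node] = scores[node]  # clamp seeds, skip isolated nodes
            else:
                updated[node] = sum(scores[n] for n in neighbors) / len(neighbors)
        scores = updated
    return scores

# Toy chain a - b - c - d: "a" is a trusted seed, "d" a known spam seed.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = propagate_quality(graph, seeds={"a": 1.0, "d": 0.0})
print(round(scores["b"], 3), round(scores["c"], 3))  # 0.667 0.333
```

In this toy run the unassessed nodes end up with scores that reflect how close they sit to the trusted seed versus the spam seed.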

A user reputation model for a user-interactive question answering system

Chen, W.; Zeng, Q.; Wenyin, L. & Hao, T.

Concurrency and Computation: Practice and Experience, 19(15) 2091-2103 (2007)

In this paper, we propose a user reputation model and apply it to a user-interactive question answering system. It combines the social network analysis approach and the user rating approach. Social network analysis is applied to analyze the impact of participant users' relations to their reputations. User rating is used to acquire direct judgment of a user's reputation based on other users' experiences with this user. Preliminary experiments show that the computed reputations based on our proposed reputation model can reflect the actual reputations of the simulated roles and therefore can fit in well with our user-interactive question answering system.

An experimental study on large-scale web categorization

Liu, T.-Y.; Yang, Y.; Wan, H.; Zhou, Q.; Gao, B.; Zeng, H.-J.; Chen, Z. & Ma, W.-Y.

'Special interest tracks and posters of the 14th international conference on World Wide Web', WWW '05, ACM, New York, NY, USA, [10.1145/1062745.1062891], 1106-1107 (2005)

Taxonomies of the Web typically have hundreds of thousands of categories and skewed category distribution over documents. It is not clear whether existing text classification technologies can perform well on and scale up to such large-scale applications. To understand this, we evaluated several representative methods (Support Vector Machines, k-Nearest Neighbor and Naive Bayes) with Yahoo! taxonomies. In particular, we evaluated the effectiveness/efficiency tradeoff in classifiers in a hierarchical setting compared to a conventional (flat) setting, and tested popular threshold tuning strategies for their scalability and accuracy in large-scale classification problems.

Web-page classification through summarization

Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.-J.; Zhang, B.; Lu, Y. & Ma, W.-Y.

'Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval', SIGIR '04, ACM, New York, NY, USA, [10.1145/1008992.1009035], 242-249 (2004)

Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement compared to a pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about 12.9% improvement over pure-text-based methods.