Collaborative data networks for public service: governance, management, and performance.
Chen, Y.-C. & Lee, J.
Public Management Review, 20(5) (2018)
This study aims to advance the theory and practice of managing collaborative data networks for information and decision-support services that exist in over 400 US metropolitan areas. Integrating insights from collaborative governance, network management, and cross-boundary information sharing, this study develops a framework to outline the interplay between context, management, collaborative dynamics, technology, and performance. This study further utilizes the framework to conduct an exploratory in-depth case study of a metropolitan transportation data network to examine such interplay. The findings suggest ways to improve the performance of collaborative data networks and their implications are discussed.
Never-Ending Learning
Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; Krishnamurthy, J.; Lao, N.; Mazaitis, K.; Mohamed, T.; Nakashole, N.; Platanios, E.; Ritter, A.; Samadi, M.; Settles, B.; Wang, R.; Wijaya, D.; Gupta, A.; Chen, X.; Saparov, A.; Greaves, M. & Welling, J.
ASTERIX: an open source system for "Big Data" management and analysis (demo)
Alsubaiee, S.; Altowim, Y.; Altwaijry, H.; Behm, A.; Borkar, V.; Bu, Y.; Carey, M.; Grover, R.; Heilbron, Z.; Kim, Y.-S.; Li, C.; Onose, N.; Pirzadeh, P.; Vernica, R. & Wen, J.
At UC Irvine, we are building a next generation parallel database system, called ASTERIX, as our approach to addressing today's "Big Data" management challenges. ASTERIX aims to combine time-tested principles from parallel database systems with those of the Web-scale computing community, such as fault tolerance for long running jobs. In this demo, we present a whirlwind tour of ASTERIX, highlighting a few of its key features. We will demonstrate examples of our data definition language to model semi-structured data, and examples of interesting queries using our declarative query language. In particular, we will show the capabilities of ASTERIX for answering geo-spatial queries and fuzzy queries, as well as ASTERIX' data feed construct for continuously ingesting data.
ASTERIX: towards a scalable, semistructured data platform for evolving-world models
Behm, A.; Borkar, V.; Carey, M.; Grover, R.; Li, C.; Onose, N.; Vernica, R.; Deutsch, A.; Papakonstantinou, Y. & Tsotras, V.
ASTERIX is a new data-intensive storage and computing platform project spanning UC Irvine, UC Riverside, and UC San Diego. In this paper we provide an overview of the ASTERIX project, starting with its main goal: the storage and analysis of data pertaining to evolving-world models. We describe the requirements and associated challenges, and explain how the project is addressing them. We provide a technical overview of ASTERIX, covering its architecture, its user model for data and queries, and its approach to scalable query processing and data management. ASTERIX utilizes a new scalable runtime computational platform called Hyracks that is also discussed at an overview level; we have recently made Hyracks available in open source for use by other interested parties. We also relate our work on ASTERIX to the current state of the art and describe the research challenges that we are currently tackling as well as those that lie ahead.
Claper: Recommend classical papers to beginners
Wang, Y.; Zhai, E.; Hu, J. & Chen, Z.
Classical papers are of great help for beginners to get familiar with a new research area. However, digging them out is a difficult problem. This paper proposes Claper, a novel academic recommendation system based on two proven principles: the Principle of Download Persistence and the Principle of Citation Approaching (we prove them based on real-world datasets). The principle of download persistence indicates that the download frequencies of classical papers decrease little after publication. The principle of citation approaching indicates that a paper which cites a classical paper is likely to cite citations of that classical paper. Our experimental results based on large-scale real-world datasets illustrate that Claper can effectively recommend classical papers of high quality to beginners and thus help them enter their research areas.
Interaction design guidelines on critiquing-based recommender systems
Chen, L. & Pu, P.
A critiquing-based recommender system acts like an artificial salesperson. It engages users in a conversational dialog where users can provide feedback in the form of critiques to the sample items that were shown to them. The feedback, in turn, enables the system to refine its understanding of the user’s preferences and prediction of what the user truly wants. The system is then able to recommend products that may better stimulate the user’s interest in the next interaction cycle. In this paper, we report our extensive investigation of comparing various approaches in devising critiquing opportunities designed in these recommender systems. More specifically, we have investigated two major design elements which are necessary for a critiquing-based recommender system:
Exploit the tripartite network of social tagging for web clustering
Lu, C.; Chen, X. & Park, E. K.
In this poster, we investigate how to enhance web clustering by leveraging the tripartite network of social tagging systems. We propose a clustering method, called "Tripartite Clustering", which clusters the three types of nodes (resources, users and tags) simultaneously based on the links in the social tagging network. The proposed method is evaluated on a real-world social tagging dataset sampled from del.icio.us. We also compare the proposed clustering approach with K-means. All the clustering results are evaluated against a human-maintained web directory. The experimental results show that Tripartite Clustering significantly outperforms the content-based K-means approach and achieves performance close to that of social annotation-based K-means while generating much more useful information.
A Survey of Human Computation Systems
Yuen, M.-C.; Chen, L.-J. & King, I.
Human computation is a technique that makes use of human abilities for computation to solve problems. Human computation problems are problems that computers are not good at solving but that are trivial for humans. In this paper, we give a survey of various human computation systems, which are categorized into initiatory human computation, distributed human computation and social game-based human computation with volunteers, paid engineers and online players. Given the large number of existing social games, some previous works defined various types of social games, but recently developed social games cannot be categorized based on those works. In this paper, we define categories and characteristics of social games that are suitable for all existing ones. In addition, we present a survey of the performance aspects of human computation systems. This paper gives a better understanding of human computation systems.
The Art of Tagging: Measuring the Quality of Tags
Krestel, R. & Chen, L.
Collaborative tagging, supported by many social networking websites, is currently enjoying an increasing popularity. The usefulness of this largely available tag data has been explored in many applications including web resources categorization, deriving emergent semantics, web search etc. However, since tags are supplied by users freely, not all of them are useful and reliable, especially when they are generated by spammers with malicious intent. Therefore, identifying tags of high quality is crucial in improving the performance of applications based on tags. In this paper, we propose TRP-Rank (Tag-Resource Pair Rank), an algorithm to measure the quality of tags by manually assessing a seed set and propagating the quality through a graph. The three-dimensional relationship among users, tags and web resources is first represented by a graph structure. A set of seed nodes, where each node represents a tag annotating a resource, is then selected and their quality is assessed. The quality of the remaining nodes is calculated by propagating the known quality of the seeds through the graph structure. We evaluate our approach on a public data set where tags generated by suspicious spammers were manually labelled. The experimental results demonstrate the effectiveness of this approach in measuring the quality of tags.
A user reputation model for a user-interactive question answering system
Chen, W.; Zeng, Q.; Wenyin, L. & Hao, T.
In this paper, we propose a user reputation model and apply it to a user-interactive question answering system. It combines the social network analysis approach and the user rating approach. Social network analysis is applied to analyze the impact of participant users' relations to their reputations. User rating is used to acquire direct judgment of a user's reputation based on other users' experiences with this user. Preliminary experiments show that the computed reputations based on our proposed reputation model can reflect the actual reputations of the simulated roles and therefore can fit in well with our user-interactive question answering system. Copyright © 2006 John Wiley & Sons, Ltd.
An experimental study on large-scale web categorization
Liu, T.-Y.; Yang, Y.; Wan, H.; Zhou, Q.; Gao, B.; Zeng, H.-J.; Chen, Z. & Ma, W.-Y.
Taxonomies of the Web typically have hundreds of thousands of categories and skewed category distribution over documents. It is not clear whether existing text classification technologies can perform well on and scale up to such large-scale applications. To understand this, we conducted the evaluation of several representative methods (Support Vector Machines, k-Nearest Neighbor and Naive Bayes) with Yahoo! taxonomies. In particular, we evaluated the effectiveness/efficiency tradeoff in classifiers with hierarchical setting compared to conventional (flat) setting, and tested popular threshold tuning strategies for their scalability and accuracy in large-scale classification problems.
Web-page classification through summarization
Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.-J.; Zhang, B.; Lu, Y. & Ma, W.-Y.
Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement over the pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about 12.9% improvement over pure-text-based methods.