Inverting a Steady-State.
In:
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, series WSDM '15, pages 359-368.
ACM, New York, NY, USA, 2015.
Ravi Kumar, Andrew Tomkins, Sergei Vassilvitskii and Erik Vee.
We consider the problem of inferring choices made by users based only on aggregate data containing the relative popularity of each item. We propose a framework that models the problem as that of inferring a Markov chain given a stationary distribution. Formally, we are given a graph and a target steady-state distribution on its nodes. We are also given a mapping from per-node scores to a transition matrix, from a broad family of such mappings. The goal is to set the scores of each node such that the resulting transition matrix induces the desired steady state. We prove sufficient conditions under which this problem is feasible and, for the feasible instances, obtain a simple algorithm for a generic version of the problem. This iterative algorithm provably finds the unique solution to this problem and has a polynomial rate of convergence; in practice we find that the algorithm converges after fewer than ten iterations. We then apply this framework to choice problems in online settings and show that our algorithm is able to explain the observed data and predict the user choices much better than other competing baselines across a variety of diverse datasets.
Never-Ending Learning.
In:
AAAI.
2015.
T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohammad, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves and J. Welling.
Human-level control through deep reinforcement learning.
Nature, 518(7540):529-533, 2015.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis.
An Overview of Microsoft Academic Service (MAS) and Applications.
In: A. Gangemi, S. Leonardi and A. Panconesi
(editors):
WWW (Companion Volume), pages 243-246.
ACM, 2015.
Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Paul Hsu and Kuansan Wang.
Semantic Annotation for Microblog Topics Using Wikipedia Temporal Information.
In:
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Association for Computational Linguistics, 2015.
Tuan Tran, Nam-Khanh Tran, Asmelash Teka Hadgu and Robert Jäschke.
In this paper we study the problem of semantic annotation for a trending hashtag which is the crucial step towards analyzing user behavior in social media, yet has been largely unexplored. We tackle the problem via linking to entities from Wikipedia. We incorporate the social aspects of trending hashtags by identifying prominent entities for the annotation so as to maximize the information spreading in entity networks. We exploit temporal dynamics of entities in Wikipedia, namely Wikipedia edits and page views to improve the annotation quality. Our experiments show that we significantly outperform the established methods in tweet annotation.
Large-scale factorization of type-constrained multi-relational data.
In:
International Conference on Data Science and Advanced Analytics, DSAA 2014, Shanghai, China, October 30 - November 1, 2014, pages 18-24.
IEEE, 2014.
Denis Krompass, Maximilian Nickel and Volker Tresp.
Linguistic Regularities in Sparse and Explicit Word Representations.
In: R. Morante and W.-t. Yih
(editors):
CoNLL, pages 171-180.
ACL, 2014.
Omer Levy and Yoav Goldberg.
From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering.
In: H. Blockeel, K. Kersting, S. Nijssen and F. Zelezný
(editors):
ECML/PKDD (2), volume 8189, series Lecture Notes in Computer Science, pages 628-642.
Springer, 2013.
Ramnath Balasubramanyan, Bhavana Bharat Dalvi and William W. Cohen.
Text as data: The promise and pitfalls of automatic content analysis methods for political texts.
Political Analysis, mps028, 2013.
Justin Grimmer and Brandon M Stewart.
Improved Bibliographic Reference Parsing Based on Repeated Patterns.
In:
Proceedings of the Second International Conference on Theory and Practice of Digital Libraries, series TPDL'12, pages 370-382.
Springer-Verlag, Berlin, Heidelberg, 2012.
Guido Sautter and Klemens Böhm.
Parsing details like author names and titles out of bibliographic references of scientific publications is an important issue. However, most existing techniques are tailored to the highly standardized reference styles used in the last two to three decades. Their performance tends to degrade when faced with the wider variety of reference styles used in older, historic publications. Thus, existing techniques are of limited use when creating comprehensive bibliographies covering both historic and contemporary scientific publications. This paper presents RefParse, a generic approach to bibliographic reference parsing that is independent of any specific reference style. Its core feature is an inference mechanism that exploits the regularities inherent in any list of references to deduce its format. Our evaluation shows that RefParse outperforms existing parsers both for contemporary and for historic reference lists.
Descriptive matrix factorization for sustainability: Adopting the principle of opposites.
Data Mining and Knowledge Discovery, 24(2):325-354, 2012.
Christian Thurau, Kristian Kersting, Mirwaes Wahabzada and Christian Bauckhage.
Climate change, the global energy footprint, and strategies for sustainable development have become topics of considerable political and public interest. The public debate is informed by an exponentially growing amount of data and there are diverse partisan interests when it comes to interpretation. We therefore believe that data analysis methods are called for that provide results which are intuitively understandable even to non-experts. Moreover, such methods should be efficient so that non-expert users can perform their own analysis at low expense in order to understand the effects of different parameters and influential factors. In this paper, we discuss a new technique for factorizing data matrices that meets both these requirements. The basic idea is to represent a set of data by means of convex combinations of extreme data points. This often accommodates human cognition. In contrast to established factorization methods, the approach presented in this paper can also determine over-complete bases. At the same time, convex combinations allow for highly efficient matrix factorization. Based on techniques adopted from the field of distance geometry, we derive a linear time algorithm to determine suitable basis vectors for factorization. By means of the example of several environmental and developmental data sets we discuss the performance and characteristics of the proposed approach and validate that significant efficiency gains are obtainable without performance decreases compared to existing convexity constrained approaches.
Anthropogenic noise exposure in protected natural areas: estimating the scale of ecological consequences.
Landscape Ecology, 26(9):1281-1295, 2011.
Jesse R. Barber, Chris L. Burdett, Sarah E. Reed, Katy A. Warner, Charlotte Formichella, Kevin R. Crooks, Dave M. Theobald and Kurt M. Fristrup.
The extensive literature documenting the ecological effects of roads has repeatedly implicated noise as one of the causal factors. Recent studies of wildlife responses to noise have decisively identified changes in animal behaviors and spatial distributions that are caused by noise. Collectively, this research suggests that spatial extent and intensity of potential noise impacts to wildlife can be studied by mapping noise sources and modeling the propagation of noise across landscapes. Here we present models of energy extraction, aircraft overflight and roadway noise as examples of spatially extensive sources and to present tools available for landscape scale investigations. We focus these efforts in US National Parks (Mesa Verde, Grand Teton and Glacier) to highlight that ecological noise pollution is not a threat restricted to developed areas and that many protected natural areas experience significant noise loads. As a heuristic tool for understanding past and future noise pollution we forecast community noise utilizing a spatially-explicit land-use change model that depicts the intensity of human development at sub-county resolution. For road noise, we transform effect distances from two studies into sound levels to begin a discussion of noise thresholds for wildlife. The spatial scale of noise exposure is far larger than any protected area, and no site in the continental US is free from noise. The design of observational and experimental studies of noise effects should be informed by knowledge of regional noise exposure patterns.
Context Sensitive Topic Models for Author Influence in Document Networks.
2011.
Saurabh Kataria, Prasenjit Mitra, Cornelia Caragea and C. Giles.
In a document network such as a citation network of scientific documents, web-logs etc., the content produced by authors exhibits their interest in certain topics. In addition, some authors influence other authors' interests. In this work, we propose to model the influence of cited authors along with the interests of citing authors. Moreover, we hypothesize that, for citations present in documents, the context surrounding the citation mention provides extra topical information about the cited authors. However, associating terms in the context to the cited authors remains an open problem. We propose novel document generation schemes that incorporate the context while simultaneously modeling the interests of citing authors and influence of the cited authors. Our experiments show significant improvements over baseline models for various evaluation criteria such as link prediction between document and cited author, and quantitatively explaining unseen text.
Sequential Latent Dirichlet Allocation: Discover Underlying Topic Structures within a Document.
In: G. I. Webb, B. Liu, C. Zhang, D. Gunopulos and X. Wu
(editors):
ICDM, pages 148-157.
IEEE Computer Society, 2010.
Lan Du, Wray Lindsay Buntine and Huidong Jin.
Boilerplate Detection using Shallow Text Features.
In:
Proc. of 3rd ACM International Conference on Web Search and Data Mining, New York City, NY, USA (WSDM 2010).
2010.
Christian Kohlschütter, Peter Fankhauser and Wolfgang Nejdl.
Dynamic Auto-Encoders for Semantic Indexing.
In:
Proceedings of the NIPS 2010 Workshop on Deep Learning.
2010.
Piotr Mirowski, Marc'Aurelio Ranzato and Yann LeCun.
h-Index: A review focused in its variants, computation and standardization for different scientific fields.
Journal of Informetrics, 3(4):273-289, 2009.
S. Alonso, F.J. Cabrerizo, E. Herrera-Viedma and F. Herrera.
A survey of statistical network models.
2009. arXiv:0912.5410. Comment: 96 pages, 14 figures, 333 references.
Anna Goldenberg, Alice X Zheng, Stephen E Fienberg and Edoardo M Airoldi.
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.
Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-labeled Corpora.
In:
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, series EMNLP '09, pages 248-256.
Association for Computational Linguistics, Stroudsburg, PA, USA, 2009.
Daniel Ramage, David Hall, Ramesh Nallapati and Christopher D. Manning.
A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
Measuring Media Bias: A Content Analysis of Time and Newsweek Coverage of Domestic Social Issues, 1975–2000.
Social Science Quarterly, 88(3):690-706, 2007.
Tawnya J. Adkins Covert and Philo C. Wasburn.
Objective. This study is an effort to produce a more systematic, empirically-based, historical-comparative understanding of media bias than generally is found in previous works. Methods. The research employs a quantitative measure of ideological bias in a formal content analysis of the United States' two largest circulation news magazines, Time and Newsweek. Findings are compared with the results of an identical examination of two of the nation's leading partisan journals, the conservative National Review and the liberal Progressive. Results. Bias scores reveal stark differences between the mainstream and the partisan news magazines' coverage of four issue areas: crime, the environment, gender, and poverty. Conclusion. Data provide little support for those claiming significant media bias in either ideological direction.