Publications

The Socialbot Network: When Bots Socialize for Fame and Money

Boshmaf, Y.; Muslukhov, I.; Beznosov, K. & Ripeanu, M.

'Proc. of the Annual Computer Security Applications Conference 2011', ACM (2011) [pdf]

Online Social Networks (OSNs) have become an integral part of today's Web. Politicians, celebrities, revolutionists, and others use OSNs as a podium to deliver their message to millions of active web users. Unfortunately, in the wrong hands, OSNs can be used to run astroturf campaigns to spread misinformation and propaganda. Such campaigns usually start off by infiltrating a targeted OSN on a large scale. In this paper, we evaluate how vulnerable OSNs are to a large-scale infiltration by socialbots: computer programs that control OSN accounts and mimic real users. We adopted a traditional web-based botnet design and built a Socialbot Network (SbN): a group of adaptive socialbots that are orchestrated in a command-and-control fashion. We operated such an SbN on Facebook, a 750 million user OSN, for about 8 weeks. We collected data related to users' behavior in response to a large-scale infiltration where socialbots were used to connect to a large number of Facebook users. Our results show that (1) OSNs, such as Facebook, can be infiltrated with a success rate of up to 80%, (2) depending on users' privacy settings, a successful infiltration can result in privacy breaches where even more users' data are exposed when compared to a purely public access, and (3) in practice, OSN security defenses, such as the Facebook Immune System, are not effective enough in detecting or stopping a large-scale infiltration as it occurs.

Text Linkage in the Wiki Medium - A comparative study

Mehler, A.

Karlgren, J., ed., 'Proceedings of the EACL 2006 Workshop on New Text - Wikis and blogs and other dynamic text sources', 1-8 (2006) [pdf]

Sybil-resilient online content rating

Tran, D.; Min, B.; Li, J. & Subramanian, L.

(2009) [pdf]

Social Structure of Facebook Networks

Traud, A. L.; Mucha, P. J. & Porter, M. A.

(2011) [pdf]

We study the social structure of Facebook "friendship" networks at one hundred American colleges and universities at a single point in time, and we examine the roles of user attributes - gender, class year, major, high school, and residence - at these institutions. We investigate the influence of common attributes at the dyad level in terms of assortativity coefficients and regression models. We then examine larger-scale groupings by detecting communities algorithmically and comparing them to network partitions based on the user characteristics. We thereby compare the relative importances of different characteristics at different institutions, finding for example that common high school is more important to the social organization of large institutions and that the importance of common major varies significantly between institutions. Our calculations illustrate how microscopic and macroscopic perspectives give complementary insights on the social organization at universities and suggest future studies to investigate such phenomena further.

Social Information Processing in Social News Aggregation

Lerman, K.

arXiv (2007) [pdf]

The rise of social media sites, such as blogs, wikis, Digg and Flickr among others, underscores the transformation of the Web to a participatory medium in which users are collaboratively creating, evaluating and distributing information. The innovations introduced by social media have led to a new paradigm for interacting with information, what we call 'social information processing'. In this paper, we study how social news aggregator Digg exploits social information processing to solve the problems of document recommendation and rating. First, we show, by tracking stories over time, that social networks play an important role in document recommendation. The second contribution of this paper consists of two mathematical models. The first model describes how collaborative rating and promotion of stories emerges from the independent decisions made by many users. The second model describes how a user's influence, the number of promoted stories and the user's social network, changes in time. We find qualitative agreement between predictions of the model and user data gathered from Digg.

SCAN: a structural clustering algorithm for networks

Xu, X.; Yuruk, N.; Feng, Z. & Schweiger, T. A. J.

'KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining', ACM, New York, NY, USA, [http://doi.acm.org/10.1145/1281192.1281280], 824-833 (2007) [pdf]

Ranking scientific publications using a model of network traffic

Walker, D.; Xie, H.; Yan, K.-K. & Maslov, S.

Journal of Statistical Mechanics: Theory and Experiment, 2007(06) P06010 (2007) [pdf]

To account for strong ageing characteristics of citation networks, we modify the PageRank algorithm by initially distributing random surfers exponentially with age, in favour of more recent publications. The output of this algorithm, which we call CiteRank, is interpreted as approximate traffic to individual publications in a simple model of how researchers find new information. We optimize parameters of our algorithm to achieve the best performance. The results are compared for two rather different citation networks: all American Physical Society publications between 1893 and 2003 and the set of high-energy physics theory (hep-th) preprints. Despite major differences between these two networks, we find that their optimal parameters for the CiteRank algorithm are remarkably similar. The advantages and performance of CiteRank over more conventional methods of ranking publications are discussed.
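The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `citerank`, the parameters `tau` (decay time of the starting distribution) and `alpha` (probability that a surfer stops at each paper), and their default values are assumptions made for illustration. Surfers start at papers with probability decaying exponentially in age, then follow references, and the accumulated visits are reported as traffic.

```python
import math


def citerank(citations, ages, tau=2.6, alpha=0.5, max_steps=50):
    """Toy age-weighted citation traffic.

    citations: dict mapping each paper to the list of papers it cites.
    ages:      dict mapping each paper to its age (e.g. in years).
    tau, alpha, max_steps: illustrative assumptions, not values from the abstract.
    """
    papers = list(ages)
    # Starting distribution: exponentially favours recent papers.
    weights = {p: math.exp(-ages[p] / tau) for p in papers}
    total = sum(weights.values())
    current = {p: w / total for p, w in weights.items()}
    traffic = dict(current)
    for _ in range(max_steps):
        following = {p: 0.0 for p in papers}
        for paper, mass in current.items():
            references = citations.get(paper, [])
            if not references:
                continue  # surfers at a paper without references stop here
            # With probability (1 - alpha) a surfer follows one reference.
            share = (1.0 - alpha) * mass / len(references)
            for cited in references:
                following[cited] += share
        current = following
        for p in papers:
            traffic[p] += current[p]
    return traffic


example = citerank({"new": ["old"], "old": []}, {"new": 0.0, "old": 10.0})
```

In this two-paper example the old paper starts with almost no surfers and receives most of its traffic through the citation from the new one.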

Promise and Pitfalls of Extending Google's PageRank Algorithm to Citation Networks

Maslov, S. & Redner, S.

(2009) [pdf]

We review our recent work on applying the Google PageRank algorithm to find scientific "gems" among all Physical Review publications, and its extension to CiteRank, to find currently popular research directions. These metrics provide a meaningful extension to traditionally-used importance measures, such as the number of citations and journal impact factor. We also point out some pitfalls of over-relying on quantitative metrics to evaluate scientific quality.

Peer and Authority Pressure in Information-Propagation Models

Anagnostopoulos, A.; Brova, G. & Terzi, E.

'Proceedings of the ECML/PKDD 2011' (2011)

Online Social Networks – Ein sozialer und technischer Überblick

Heidemann, J.

Informatik-Spektrum - (2009) [pdf]

Online social networks such as Xing.com or Facebook.com are among the fastest-growing services on the Internet. In 2008, an estimated 580 million people worldwide used these services. Accordingly, online social networks have developed within a few years from a niche phenomenon into a worldwide medium of IT-supported communication. Owing in particular to rapidly growing membership numbers, online social networks are gaining considerable social and economic importance. Against this background, this article aims to examine more closely the concept and characteristics, the origin and development, and the potential benefits and challenges of online social networks.

Identifying influential spreaders in complex networks

Kitsak, M.; Gallos, L. K.; Havlin, S.; Liljeros, F.; Muchnik, L.; Stanley, H. E. & Makse, H. A.

(2010) [pdf]

Networks portray a multitude of interactions through which people meet, ideas are spread, and infectious diseases propagate within a society. Identifying the most efficient "spreaders" in a network is an important step to optimize the use of available resources and ensure the more efficient spread of information. Here we show that, in contrast to common belief, the most influential spreaders in a social network do not correspond to the best connected people or to the most central people (high betweenness centrality). Instead, we find: (i) The most efficient spreaders are those located within the core of the network as identified by the k-shell decomposition analysis. (ii) When multiple spreaders are considered simultaneously, the distance between them becomes the crucial parameter that determines the extent of the spreading. Furthermore, we find that, in the case of infections that do not confer immunity on recovered individuals, the infection persists in the high k-shell layers of the network under conditions where hubs may not be able to preserve the infection. Our analysis provides a plausible route for an optimal design of efficient dissemination strategies.
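For readers unfamiliar with the k-shell decomposition mentioned in this abstract, the following is a minimal sketch of the standard peeling procedure: nodes of remaining degree at most k are removed repeatedly, and each node is labelled with the k at which it was removed. The function name `k_shell` and the toy graph are assumptions for illustration and are not taken from the paper.

```python
def k_shell(adjacency):
    """Return a dict node -> k-shell index for an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k belong to shell k.
        peel = [node for node in remaining if degree[node] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already removed via an earlier entry
            remaining.discard(node)
            shell[node] = k
            for other in adjacency[node]:
                if other in remaining:
                    degree[other] -= 1
                    if degree[other] <= k:
                        peel.append(other)
    return shell


# Toy graph (hypothetical): a triangle a-b-c, plus a hub h that is linked
# to a and to four leaves. The hub has the highest degree but sits in a
# lower shell than the triangle.
toy = {
    "a": {"b", "c", "h"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "h": {"a", "l1", "l2", "l3", "l4"},
    "l1": {"h"}, "l2": {"h"}, "l3": {"h"}, "l4": {"h"},
}
shells = k_shell(toy)
```

The toy graph shows the distinction the abstract draws: the best-connected node is not necessarily in the core of the network.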

Identifying and facilitating social interaction with a wearable wireless sensor network

Paradiso, J.; Gips, J.; Laibowitz, M.; Sadi, S.; Merrill, D.; Aylward, R.; Maes, P. & Pentland, A.

Personal and Ubiquitous Computing, 14(2) 137-152 (2010) [pdf]

We have designed a highly versatile badge system to facilitate a variety of interaction at large professional or social events and serve as a platform for conducting research into human dynamics. The badges are equipped with a large LED display, wireless infrared and radio frequency networking, and a host of sensors to collect data that we have used to develop features and algorithms aimed at classifying and predicting individual and group behavior. This paper overviews our badge system, describes the interactions and capabilities that it enabled for the wearers, and presents data collected over several large deployments. This data is analyzed to track and socially classify the attendees, predict their interest in other people and demonstration installations, profile the restlessness of a crowd in an auditorium, and otherwise track the evolution and dynamics of the events at which the badges were run.

Group formation in large social networks: membership, growth, and evolution

Backstrom, L.; Huttenlocher, D. P.; Kleinberg, J. M. & Lan, X.

Eliassi-Rad, T.; Ungar, L. H.; Craven, M. & Gunopulos, D., ed., 'KDD', ACM, 44-54 (2006) [pdf]

Extracting Social Networks among Various Entities on the Web

Jin, Y.; Matsuo, Y. & Ishizuka, M.

Franconi, E.; Kifer, M. & May, W., ed., 'Proceedings of the European Semantic Web Conference, ESWC2007', 4519, Lecture Notes in Computer Science, Springer-Verlag (2007) [pdf]

Community Detection as an Inference Problem

Hastings, M. B.

(2006) [pdf]

We express community detection as an inference problem of determining the most likely arrangement of communities. We then apply belief propagation and mean-field theory to this problem, and show that this leads to fast, accurate algorithms for community detection.

Building an Effective Representation for Dynamic Networks

Hill, S.; Agarwal, D. K.; Bell, R. & Volinsky, C.

Journal of Computational & Graphical Statistics, 15, 584-608 (2006) [pdf]

A dynamic network is a special type of network composed of connected transactors which have repeated evolving interaction. Data on large dynamic networks such as telecommunications networks and the Internet are pervasive. However, representing dynamic networks in a manner that is conducive to efficient large-scale analysis is a challenge. In this article, we represent dynamic graphs using a data structure introduced in an earlier article. We advocate their representation because it accounts for the evolution of relationships between transactors through time, mitigates noise at the local transactor level, and allows for the removal of stale relationships. Our work improves on their heuristic arguments by formalizing the representation with three tunable parameters. In doing this, we develop a generic framework for evaluating and tuning any dynamic graph. We show that the storage saving approximations involved in the representation do not affect predictive performance, and typically improve it. We motivate our approach using a fraud detection example from the telecommunications industry, and demonstrate that we can outperform published results on the fraud detection task. In addition, we present a preliminary analysis on Web logs and e-mail networks.

Birds of a Feather: Homophily in Social Networks

McPherson, M.; Smith-Lovin, L. & Cook, J. M.

Annual Review of Sociology, 27, 415-444 (2001) [pdf]

Similarity breeds connection. This principle, the homophily principle, structures network ties of every type, including marriage, friendship, work, advice, support, information transfer, exchange, comembership, and other types of relationship. The result is that people's personal networks are homogeneous with regard to many sociodemographic, behavioral, and intrapersonal characteristics. Homophily limits people's social worlds in a way that has powerful implications for the information they receive, the attitudes they form, and the interactions they experience. Homophily in race and ethnicity creates the strongest divides in our personal environments, with age, religion, education, occupation, and gender following in roughly that order. Geographic propinquity, families, organizations, and isomorphic positions in social systems all create contexts in which homophilous relations form. Ties between nonsimilar individuals also dissolve at a higher rate, which sets the stage for the formation of niches (localized positions) within social space. We argue for more research on: (a) the basic ecological processes that link organizations, associations, cultural communities, social movements, and many other social forms; (b) the impact of multiplex ties on the patterns of homophily; and (c) the dynamics of network change over time through which networks and other social entities co-evolve.

A survey of statistical network models

Goldenberg, A.; Zheng, A. X.; Fienberg, S. E. & Airoldi, E. M.

(2009) [pdf]

Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.

A generative model for feedback networks

White, D. R.; Kejzar, N.; Tsallis, C.; Farmer, D. & White, S.

(2005) [pdf]

We investigate a simple generative model for network formation. The model is designed to describe the growth of networks of kinship, trading, corporate alliances, or autocatalytic chemical reactions, where feedback is an essential element of network growth. The underlying graphs in these situations grow via a competition between cycle formation and node addition. After choosing a given node, a search is made for another node at a suitable distance. If such a node is found, a link is added connecting this to the original node, and increasing the number of cycles in the graph; if such a node cannot be found, a new node is added, which is linked to the original node. We simulate this algorithm and find that we cannot reject the hypothesis that the empirical degree distribution is a q-exponential function, which has been used to model long-range processes in nonequilibrium statistical mechanics.
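The growth rule described in this abstract can be sketched as a short simulation. The abstract does not say how the starting node, the distance, or the search are chosen, so everything below is an assumption made for illustration: the starting node and the distance are drawn uniformly, the search is a self-avoiding random walk, and the names `grow_feedback_network` and `max_distance` are hypothetical.

```python
import random


def grow_feedback_network(steps, max_distance=4, seed=0):
    """Grow an undirected graph by cycle formation or node addition.

    Each step picks a node, walks `distance` hops away from it, and links
    back to the node reached (closing a cycle). If the walk gets stuck or
    the node reached is already a neighbour, a new node is attached instead.
    Selection rules here are illustrative assumptions, not the paper's.
    """
    rng = random.Random(seed)
    adjacency = {0: set()}
    for _ in range(steps):
        start = rng.choice(sorted(adjacency))
        distance = rng.randint(2, max_distance)
        path = [start]
        for _ in range(distance):
            options = sorted(n for n in adjacency[path[-1]] if n not in path)
            if not options:
                break  # the search failed before reaching the distance
            path.append(rng.choice(options))
        target = path[-1]
        found = len(path) == distance + 1 and target not in adjacency[start]
        if found:
            # Cycle formation: link the node found back to the original node.
            adjacency[start].add(target)
            adjacency[target].add(start)
        else:
            # Node addition: attach a new node to the original node.
            new_node = len(adjacency)
            adjacency[new_node] = {start}
            adjacency[start].add(new_node)
    return adjacency


graph = grow_feedback_network(200)
edge_count = sum(len(neighbours) for neighbours in graph.values()) // 2
```

Every step adds exactly one edge, so the number of independent cycles equals the number of steps in which the search succeeded.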