TY - JOUR
AU - Bollen, Johan
AU - Van de Sompel, Herbert
AU - Hagberg, Aric
AU - Bettencourt, Luis
AU - Chute, Ryan
AU - Rodriguez, Marko A.
AU - Balakireva, Lyudmila
T1 - Clickstream Data Yields High-Resolution Maps of Science
JO - PLoS ONE
PY - 2009/03
VL - 4
IS - 3
SP -
EP -
UR - http://dx.doi.org/10.1371/journal.pone.0004803
M3 - 10.1371/journal.pone.0004803
KW - navigation
KW - markov
KW - clickstream
KW - path
L1 -
SN -
N1 -
N1 -
AB - Background: Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science.
Methodology: Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Architecture and Art Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences.
Conclusions: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.
ER -
TY - JOUR
AU - Bollen, Johan
AU - Van de Sompel, Herbert
AU - Hagberg, Aric
AU - Bettencourt, Luis
AU - Chute, Ryan
AU - Rodriguez, Marko A.
AU - Balakireva, Lyudmila
T1 - Clickstream Data Yields High-Resolution Maps of Science
JO - PLoS ONE
PY - 2009/03
VL - 4
IS - 3
SP -
EP -
UR - http://dx.doi.org/10.1371/journal.pone.0004803
M3 - 10.1371/journal.pone.0004803
KW - scientometrics
KW - science
KW - analysis
KW - citation
KW - toread
L1 -
SN -
N1 -
N1 -
AB - Background: Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science. Methodology: Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Architecture and Art Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences. Conclusions: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.
ER -
TY - GEN
AU - Gomes, Luiz H.
AU - Almeida, Rodrigo B.
AU - Bettencourt, Luis M. A.
AU - Almeida, Virgilio
AU - Almeida, Jussara M.
A2 -
T1 - Comparative Graph Theoretical Characterization of Networks of Spam and Legitimate Email
JO -
PB -
AD -
PY - 2005/
VL -
IS -
SP -
EP -
UR - http://www.citebase.org/abstract?id=oai:arXiv.org:physics/0504025
M3 -
KW - email
KW - graph
KW - network
KW - spam
L1 -
N1 - [physics/0504025] Comparative Graph Theoretical Characterization of Networks of Spam and Legitimate Email
N1 -
AB - Email is an increasingly important and ubiquitous means of communication, both facilitating contact between private individuals and enabling rises in the productivity of organizations. However, the relentless rise of automatic unauthorized emails, a.k.a. spam, is eroding away much of the attractiveness of email communication. Most of the attention dedicated to date to spam detection has focused on the content of the emails or on the addresses or domains associated with spam senders. Although methods based on these easily changeable identifiers work reasonably well, they miss the fundamental nature of spam as an opportunistic relationship, very different from the normal mutual relations between senders and recipients of legitimate email. Here we present a comprehensive graph theoretical analysis of email traffic that captures these properties quantitatively. We identify several simple metrics that serve both to distinguish between spam and legitimate email and to provide a statistical basis for models of spam traffic.
ER -