Web Archives for Researchers: Representations, Expectations and Potential Uses.
D-Lib Magazine, 18(3/4), 2012.
Peter Stirling, Philippe Chevallier and Gildas Illien.
[doi]
[Abstract]
[BibTeX]
The Internet has been covered by legal deposit legislation in France since 2006, making web archiving one of the missions of the Bibliothèque nationale de France (BnF). Access to the web archives has been provided in the library on an experimental basis since 2008. In the context of increasing interest in many countries in web archiving and how it may best serve the needs of researchers, especially in the expanding field of Internet studies in the social sciences, a qualitative study was performed, based on interviews with potential users of the web archives held at the BnF, and particularly researchers working in various areas related to the Internet. The study aimed to explore their needs in terms of both content and services, and also to analyse different ways of representing the archives, in order to identify ways of increasing their use. While the value of maintaining the "memory" of the web is obvious to researchers, they are faced with the difficulty of defining, in what is a seemingly limitless space, meaningful collections of documents. Cultural heritage institutions such as national libraries are perceived as trusted third parties capable of creating rationally-constructed and well-documented collections, but such archives raise certain ethical and methodological questions.
Development of computer science disciplines: a social network analysis approach.
Social Network Analysis and Mining, 1(4):321-340, 2011.
Manh Pham, Ralf Klamma and Matthias Jarke.
[doi]
[Abstract]
[BibTeX]
In contrast to many other scientific disciplines, computer science gives substantial weight to conference publications. Conferences have the advantage of providing fast publication of papers and of bringing researchers together to present and discuss their work with peers. Previous work on knowledge mapping focused on mapping all sciences or a particular domain based on the ISI-published Journal Citation Report (JCR). Although these data cover most of the important journals, they lack computer science conference and workshop proceedings, which results in an imprecise and incomplete analysis of computer science knowledge. This paper presents an analysis of the computer science knowledge network constructed from all types of publications, aiming to provide a complete view of computer science research. Based on the combination of two important digital libraries (DBLP and CiteSeerX), we study the knowledge network created at journal/conference level using citation linkage, in order to identify the development of sub-disciplines. We investigate the collaborative and citation behavior of journals/conferences by analyzing the properties of their co-authorship and citation subgraphs. The paper draws several important conclusions. First, conferences constitute social structures that shape computer science knowledge. Second, computer science is becoming more interdisciplinary. Third, experts are the key success factor for the sustainability of journals/conferences.
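The venue-level citation linkage described in the abstract can be sketched as a simple aggregation of paper-to-paper citations up to the journal/conference level. The paper ids and venue names below are hypothetical and chosen purely for illustration; this is a minimal sketch of the idea, not the authors' actual pipeline.

```python
from collections import Counter

def venue_citation_graph(papers, citations):
    """Aggregate paper-to-paper citations to venue level.

    papers:    dict mapping paper id -> venue name
    citations: iterable of (citing_paper, cited_paper) pairs
    """
    edges = Counter()
    for citing, cited in citations:
        if citing in papers and cited in papers:
            src, dst = papers[citing], papers[cited]
            if src != dst:  # ignore within-venue self-citations
                edges[(src, dst)] += 1
    return edges

# Toy data: hypothetical paper ids assigned to hypothetical venues
papers = {"p1": "VLDB", "p2": "SIGMOD", "p3": "VLDB", "p4": "TODS"}
citations = [("p1", "p2"), ("p3", "p2"), ("p3", "p4"), ("p1", "p3")]
print(venue_citation_graph(papers, citations))
```

The weighted edges of the resulting graph are exactly the kind of citation subgraph whose structural properties (density, clustering, connectivity between sub-disciplines) the paper analyzes.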
Clickstream Data Yields High-Resolution Maps of Science.
PLoS ONE, 4(3):e4803, 2009.
Johan Bollen, Herbert van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez and Lyudmila Balakireva.
[doi]
[Abstract]
[BibTeX]
Background: Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science. Methodology: Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Architecture and Art Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences. Conclusions: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.
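A first-order Markov chain like the journal clickstream model mentioned in the abstract can be estimated by counting transitions between consecutive journal visits within each user session and normalizing the counts into probabilities. The journal names and sessions below are invented for illustration; the actual model was extracted from portal log data at a vastly larger scale.

```python
from collections import defaultdict

def clickstream_markov(sessions):
    """Estimate a first-order Markov chain over journals from
    clickstreams: each session is one user's sequence of journal visits."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in zip(session, session[1:]):  # consecutive visit pairs
            counts[a][b] += 1
    # normalize each row into transition probabilities
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: n / total for b, n in row.items()}
    return model

# Hypothetical sessions (real input would come from portal logs)
sessions = [
    ["Nature", "Science", "Cell"],
    ["Nature", "Cell"],
    ["Science", "Nature", "Science"],
]
model = clickstream_markov(sessions)
print(model["Nature"])
```

The transition probabilities then serve as weighted edges of a journal network, which is what the paper visualizes as a map of science.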
The Scholarly Database and its utility for scientometrics research.
Scientometrics, 79(2):219-234, 2009.
Gavin La Rowe, Sumeet Ambre, John Burgoon, Weimao Ke and Katy Börner.
[doi]
[Abstract]
[BibTeX]
The Scholarly Database aims to serve researchers and practitioners interested in the analysis, modelling, and visualization of large-scale data sets. A specific focus of this database is to support macro-evolutionary studies of science and to communicate findings via knowledge-domain visualizations. Currently, the database provides access to about 18 million publications, patents, and grants. About 90% of the publications are available in full text. Except for some datasets with restricted access conditions, the data can be retrieved in raw or pre-processed formats using either a web-based or a relational database client. This paper motivates the need for the database from the perspective of bibliometric/scientometric research. It explains the database design, setup, etc., and reports the temporal, geographical, and topic coverage of data sets currently served via the database. Planned work and the potential for this database to become a global testbed for information science research are discussed at the end of the paper.
DBLP: some lessons learned.
Proceedings of the VLDB Endowment, 2(2):1493-1500, 2009.
Michael Ley.
[doi]
[Abstract]
[BibTeX]
The DBLP Computer Science Bibliography evolved from an early small experimental Web server into a popular service for the computer science community. Many design decisions and details of the public XML records behind DBLP were never documented. This paper is a review of the evolution of DBLP. The main perspective is data modeling. In DBLP, persons play a central role; our discussion of person names may be applicable to many other databases. All DBLP data are available for your own experiments. You may either download the complete set or use a simple XML-based API described in an online appendix.
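DBLP's public XML records can be processed with any standard XML parser. The snippet below parses a record written in the general style of DBLP's format using Python's standard library; the record key and the exact field layout shown here are simplified assumptions for illustration, not a verbatim copy of a real DBLP record.

```python
import xml.etree.ElementTree as ET

# A minimal record in the style of DBLP's public XML (fields abbreviated,
# key and contents assumed for this example)
record = """
<article key="journals/pvldb/Ley09">
  <author>Michael Ley</author>
  <title>DBLP - Some Lessons Learned.</title>
  <journal>Proc. VLDB Endow.</journal>
  <volume>2</volume>
  <year>2009</year>
</article>
"""

elem = ET.fromstring(record)
authors = [a.text for a in elem.findall("author")]  # repeated element per author
print(elem.get("key"), authors, elem.findtext("year"))
```

Because persons are central in DBLP's data model, author fields appear as repeated elements rather than a single delimited string, which is why `findall("author")` is the natural access pattern.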
Stanford WebBase components and applications.
ACM Transactions on Internet Technology, 6(2):153-186, 2006.
Junghoo Cho, Hector Garcia-Molina, Taher Haveliwala, Wang Lam, Andreas Paepcke, Sriram Raghavan and Gary Wesley.
[doi]
[Abstract]
[BibTeX]
We describe the design and performance of WebBase, a tool for Web research. The system includes a highly customizable crawler, a repository for collected Web pages, an indexer for both text and link-related page features, and a high-speed content distribution facility. The distribution module enables researchers world-wide to retrieve pages from WebBase, and stream them across the Internet at high speed. The advantage for the researchers is that they need not all crawl the Web before beginning their research. WebBase has been used by scores of research and teaching organizations world-wide, mostly for investigations into Web topology and linguistic content analysis. After describing the system's architecture, we explain our engineering decisions for each of the WebBase components, and present respective performance measurements.
The structure of scientific collaboration networks.
Proceedings of the National Academy of Sciences, 98(2):404-409, 2001.
M. E. J. Newman.
[doi]
[Abstract]
[BibTeX]
The structure of scientific collaboration networks is investigated. Two scientists are considered connected if they have authored a paper together, and explicit networks of such connections are constructed using data drawn from a number of databases, including MEDLINE (biomedical research), the Los Alamos e-Print Archive (physics), and NCSTRL (computer science). I show that these collaboration networks form "small worlds," in which randomly chosen pairs of scientists are typically separated by only a short path of intermediate acquaintances. I further give results for the mean and distribution of numbers of collaborators of authors, demonstrate the presence of clustering in the networks, and highlight a number of apparent differences in the patterns of collaboration between the fields studied.
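The coauthorship construction in the abstract ("connected if they have authored a paper together") and the short-path property can be illustrated with a toy graph. Author names and papers below are hypothetical; path lengths between authors are computed by breadth-first search, a standard way to measure the separation the abstract describes.

```python
from collections import deque
from itertools import combinations

def coauthor_graph(papers):
    """Build an undirected graph linking every pair of coauthors."""
    adj = {}
    for authors in papers:
        for a, b in combinations(authors, 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def distance(adj, src, dst):
    """BFS shortest-path length between two authors (None if disconnected)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

# Hypothetical author lists, one list per paper
papers = [["Alice", "Bob"], ["Bob", "Carol"], ["Carol", "Dave", "Eve"]]
adj = coauthor_graph(papers)
print(distance(adj, "Alice", "Eve"))  # path Alice-Bob-Carol-Eve
```

In a "small world" network, such pairwise distances stay short even as the network grows, which is the property Newman verifies empirically at the scale of whole research fields.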