Technical Reports
Of course we share! Testing Assumptions about Social Tagging Systems.
2014. cite arxiv:1401.0629.
Stephan Doerfel, Daniel Zoller, Philipp Singer, Thomas Niebler, Andreas Hotho and Markus Strohmaier.
Social tagging systems have established themselves as an important part in today's web and have attracted the interest from our research community in a variety of investigations. The overall vision of our community is that simply through interactions with the system, i.e., through tagging and sharing of resources, users would contribute to building useful semantic structures as well as resource indexes using uncontrolled vocabulary not only due to the easy-to-use mechanics. Henceforth, a variety of assumptions about social tagging systems have emerged, yet testing them has been difficult due to the absence of suitable data. In this work we thoroughly investigate three available assumptions - e.g., is a tagging system really social? - by examining live log data gathered from the real-world public social tagging system BibSonomy. Our empirical results indicate that while some of these assumptions hold to a certain extent, other assumptions need to be reflected and viewed in a very critical light. Our observations have implications for the design of future search and other algorithms to better reflect the actual user behavior.
Journal Articles
Analysis of Search and Browsing Behavior of Young Users on the Web.
ACM Transactions on the Web, 8(2):7:1-7:54, 2014.
Sergio Duarte Torres, Ingmar Weber and Djoerd Hiemstra.
The Internet is increasingly used by young children for all kinds of purposes. Nonetheless, there are not many resources especially designed for children on the Internet and most of the content online is designed for grown-up users. This situation is problematic if we consider the large differences between young users and adults since their topic interests, computer skills, and language capabilities evolve rapidly during childhood. There is little research aimed at exploring and measuring the difficulties that children encounter on the Internet when searching for information and browsing for content. In the first part of this work, we employed query logs from a commercial search engine to quantify the difficulties children of different ages encounter on the Internet and to characterize the topics that they search for. We employed query metrics (e.g., the fraction of queries posed in natural language), session metrics (e.g., the fraction of abandoned sessions), and click activity (e.g., the fraction of ad clicks). The search logs were also used to retrace stages of child development. Concretely, we looked for changes in interests (e.g., the distribution of topics searched) and language development (e.g., the readability of the content accessed and the vocabulary size). In the second part of this work, we employed toolbar logs from a commercial search engine to characterize the browsing behavior of young users, particularly to understand the activities on the Internet that trigger search. We quantified the proportion of browsing and search activity in the toolbar sessions and we estimated the likelihood of a user to carry out search on the Web vertical and multimedia verticals (i.e., videos and images) given that the previous event is another search event or a browsing event. We observed that these metrics clearly demonstrate an increased level of confusion and unsuccessful search sessions among children. 
We also found a clear relation between the reading level of the clicked pages and characteristics of the users such as age and educational attainment. In terms of browsing behavior, children were found to start their activities on the Internet with a search engine (instead of directly browsing content) more often than adults. We also observed a significantly larger amount of browsing activity for the case of teenager users. Interestingly we also found that if children visit knowledge-related Web sites (i.e., information-dense pages such as Wikipedia articles), they subsequently do more Web searches than adults. Additionally, children and especially teenagers were found to have a greater tendency to engage in multimedia search, which calls to improve the aggregation of multimedia results into the current search result pages.
Web log analysis: a review of a decade of studies about information acquisition, inspection and interpretation of user interaction.
Data Mining and Knowledge Discovery, 24(3):663-696, 2012.
Maristella Agosti, Franco Crivellari and Giorgio Maria Di Nunzio.
In the last decade, the importance of analyzing information management systems logs has grown, because log data constitute a relevant aspect in evaluating the quality of such systems. A review of 10 years of research on log analysis is presented in this paper. About 50 papers and posters from five major conferences and about 30 related journal papers have been selected to trace the history of the state-of-the-art in this field. The paper presents an overview of two main themes: Web search engine log analysis and Digital Library System log analysis. The problem of the analysis of different sources of log data and the distribution of data are investigated.
Understanding why users tag: A survey of tagging motivation literature and results from an empirical study.
Web Semantics: Science, Services and Agents on the World Wide Web, 17(0):1 - 11, 2012.
Markus Strohmaier, Christian Körner and Roman Kern.
While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses three issues related to this question. (1) What distinctions of user motivations are identified by previous research, and in what ways are the motivations of users amenable to quantitative analysis? (2) To what extent does tagging motivation vary across different social tagging systems? (3) How does variability in user motivation influence resulting tags and folksonomies? In this paper, we present measures to detect whether a tagger is primarily motivated by categorizing or describing resources, and apply these measures to datasets from seven different tagging systems. Our results show that (a) users’ motivation for tagging varies not only across, but also within tagging systems, and that (b) tag agreement among users who are motivated by categorizing resources is significantly lower than among users who are motivated by describing resources. Our findings are relevant for (1) the development of tag-based user interfaces, (2) the analysis of tag semantics and (3) the design of search algorithms for social tagging systems.
Conference Papers
A statistical comparison of tag and query logs.
In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, series SIGIR '09, pages 123-130.
ACM, New York, NY, USA, 2009.
Mark J. Carman, Mark Baillie, Robert Gwadera and Fabio Crestani.
We investigate tag and query logs to see if the terms people use to annotate websites are similar to the ones they use to query for them. Over a set of URLs, we compare the distribution of tags used to annotate each URL with the distribution of query terms for clicks on the same URL. Understanding the relationship between the distributions is important to determine how useful tag data may be for improving search results and conversely, query data for improving tag prediction. In our study, we compare both term frequency distributions using vocabulary overlap and relative entropy. We also test statistically whether the term counts come from the same underlying distribution. Our results indicate that the vocabulary used for tagging and searching for content are similar but not identical. We further investigate the content of the websites to see which of the two distributions (tag or query) is most similar to the content of the annotated/searched URL. Finally, we analyze the similarity for different categories of URLs in our sample to see if the similarity between distributions is dependent on the topic of the website or the popularity of the URL.
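The two comparison measures named in this abstract, vocabulary overlap and relative entropy, can be illustrated with a minimal sketch. The abstract does not say how the authors defined overlap or handled unseen terms, so the Jaccard ratio and the add-alpha smoothing below are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def vocabulary_overlap(tag_terms, query_terms):
    """Jaccard overlap of the two vocabularies (assumed definition)."""
    tags, queries = set(tag_terms), set(query_terms)
    if not tags and not queries:
        return 0.0
    return len(tags & queries) / len(tags | queries)


def relative_entropy(p_terms, q_terms, alpha=1.0):
    """KL(P || Q) in bits between two term frequency distributions.

    Add-alpha smoothing over the joint vocabulary is an assumption;
    it keeps the divergence finite when a term is missing on one side.
    """
    p_counts, q_counts = Counter(p_terms), Counter(q_terms)
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for term in vocab:
        p = (p_counts[term] + alpha) / p_total
        q = (q_counts[term] + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence
```

For one URL, `tag_terms` would hold every tag assigned to it and `query_terms` every term from queries that led to a click on it. Relative entropy is asymmetric, so the order of the two arguments matters.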
Journal Articles
Website usage metrics: A re-assessment of session data.
Information Processing & Management, 44(1):358 - 372, 2008.
Evaluation of Interactive Information Retrieval Systems
Paul Huntington, David Nicholas and Hamid R. Jamali.
Metrics derived from user visits or sessions provide a means of evaluating Websites and an important insight into online information seeking behaviour, the most important of them being the duration of sessions and the number of pages viewed in a session, a possible busyness indicator. However, the identification of session (termed often ‘sessionization’) is fraught with difficulty in that there is no way of determining from a transactional log file that a user has ended their session. No one logs out. Instead a session delimiter has to be applied and this is typically done on the basis of a standard period of inactivity. To date researchers have discussed the issue of a time out delimiter in terms of a single value and if a page view time exceeds the cut-off value the session is deemed to have ended. This approach assumes that page view time is a single distribution and that the cut-off value is one point on that distribution. The authors however argue that page time distribution is composed of a number of quite separate view time distributions because of the marked differences in view times between pages (abstract, contents page, full text). This implies that a number of timeout delimiters should be applied. Employing data from a study of the OhioLINK digital journal library, the authors demonstrate how the setting of a time out delimiter impacts on the estimate of page view time and the number of estimated session. Furthermore, they also show how a number of timeout delimiters might apply and they argue that this gives a better and more robust estimate of the number of sessions, session time and page view time compared to an application of a single timeout delimiter.
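The idea of applying several timeout delimiters, one per page type, instead of a single cut-off can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name, the event layout, and all timeout values are hypothetical.

```python
def sessionize(events, timeouts, default_timeout=1800):
    """Split one user's page views into sessions by inactivity.

    events: list of (timestamp_in_seconds, page_type), sorted by time.
    timeouts: dict mapping page_type to its cut-off in seconds
              (hypothetical values; pass {} to apply one single cut-off).
    A new session starts when the gap after a page view exceeds the
    timeout that applies to that page's type.
    """
    sessions = []
    current = []
    for timestamp, page_type in events:
        if current:
            previous_timestamp, previous_type = current[-1]
            limit = timeouts.get(previous_type, default_timeout)
            if timestamp - previous_timestamp > limit:
                sessions.append(current)
                current = []
        current.append((timestamp, page_type))
    if current:
        sessions.append(current)
    return sessions
```

With `timeouts={}` every gap is compared against `default_timeout`, which corresponds to the single-delimiter approach the abstract criticises. Supplying a longer cut-off for full-text pages than for abstracts or contents pages changes where the same log is split, and therefore the estimated number of sessions.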
The folksonomy tag cloud: when is it useful?
Journal of Information Science, 34(1):15-29, 2008.
James Sinclair and Michael Cardew-Hall.
The weighted list, known popularly as a 'tag cloud', has appeared on many popular folksonomy-based web-sites. Flickr, Delicious, Technorati and many others have all featured a tag cloud at some point in their history. However, it is unclear whether the tag cloud is actually useful as an aid to finding information. We conducted an experiment, giving participants the option of using a tag cloud or a traditional search interface to answer various questions. We found that where the information-seeking task required specific information, participants preferred the search interface. Conversely, where the information-seeking task was more general, participants preferred the tag cloud. While the tag cloud is not without value, it is not sufficient as the sole means of navigation for a folksonomy-based dataset.
Miscellaneous
Tagging, Folksonomy & Co - Renaissance of Manual Indexing?
2007. cite arxiv:cs/0701072. Comment: Preprint. 12 pages, 1 figure, 54 references.
Jakob Voss.
This paper gives an overview of current trends in manual indexing on the Web. Along with a general rise of user generated content there are more and more tagging systems that allow users to annotate digital resources with tags (keywords) and share their annotations with other users. Tagging is frequently seen in contrast to traditional knowledge organization systems or as something completely new. This paper shows that tagging should better be seen as a popular form of manual indexing on the Web. Difference between controlled and free indexing blurs with sufficient feedback mechanisms. A revised typology of tagging systems is presented that includes different user roles and knowledge organization systems with hierarchical relationships and vocabulary control. A detailed bibliography of current research in collaborative tagging is included.
Journal Articles
End user searching: A Web log analysis of NAVER, a Korean Web search engine.
Library & Information Science Research, 27(2):203 - 221, 2005.
Soyeon Park, Joon Ho Lee and Hee Jin Bae.
Transaction logs of NAVER, a major Korean Web search engine, were analyzed to track the information-seeking behavior of Korean Web users. These transaction logs include more than 40 million queries collected over 1 week. This study examines current transaction log analysis methodologies and proposes a method for log cleaning, session definition, and query classification. A term definition method which is necessary for Korean transaction log analysis is also discussed. The results of this study show that users behave in a simple way: they type in short queries with a few query terms, seldom use advanced features, and view few results' pages. Users also behave in a passive way: they seldom change search environments set by the system. It is of interest that users tend to change their queries totally rather than adding or deleting terms to modify the previous queries. The results of this study might contribute to the development of more efficient and effective Web search engines and services.
Tagging and Why it Matters.
SSRN eLibrary, 2005.
David Weinberger.