Jäschke, R.; Hotho, A.; Mitzlaff, F. & Stumme, G. (2012), 'Challenges in Tag Recommendations for Collaborative Tagging Systems', in Janusz Kacprzyk & Lakhmi C. Jain, eds., 'Recommender Systems for the Social Web', Springer, Berlin/Heidelberg, pp. 65--87.
Originally introduced by social bookmarking systems, collaborative tagging, or social tagging, has been widely adopted by many web-based systems such as wikis, e-commerce platforms, and social networks. Collaborative tagging systems allow users to annotate resources with freely chosen keywords, so-called tags. These tags help users find and retrieve resources, discover new resources, and navigate through the system. Since tagging resources is laborious, most systems support their users with tag recommender components that suggest tags in a personalized way. The Discovery Challenges 2008 and 2009 of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) tackled the problem of tag recommendation in collaborative tagging systems. Researchers were invited to test their methods in a competition on datasets from the social bookmark and publication sharing system BibSonomy. Moreover, the 2009 challenge included an online task in which the recommender systems were integrated into BibSonomy and provided recommendations in real time. In this chapter we review, evaluate, and summarize the submissions to the two Discovery Challenges and thus lay the groundwork for continuing research in this area.
Herlocker, J. L.; Konstan, J. A.; Terveen, L. G. & Riedl, J. T. (2004), 'Evaluating collaborative filtering recommender systems', ACM Trans. Inf. Syst. 22, 5--53.
Recommender systems have been evaluated in many, often incomparable, ways. In this article, we review the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, we present empirical results from the analysis of various accuracy metrics on one content domain, where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalence class were strongly correlated, while metrics from different equivalence classes were uncorrelated.