Kohavi, R. (2012). Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics. Keynote at RecSys '12: Proceedings of the Sixth ACM Conference on Recommender Systems, New York, NY, USA: ACM.

Thelwall, M. (2012). Journal impact evaluation: a webometric perspective. Scientometrics, 92, 429--441. doi: 10.1007/s11192-012-0669-x

Alonso, O., Rose, D. E. & Stewart, B. (2008). Crowdsourcing for relevance evaluation. SIGIR Forum, 42, 9--15. doi: 10.1145/1480506.1480508

de Wit, J. (2008). Evaluating Recommender Systems. Unpublished master's thesis, University of Twente.

Völker, J., Vrandečić, D., Sure, Y. & Hotho, A. (2008). AEON - An approach to the automatic evaluation of ontologies. Applied Ontology, 3, 41--62.

Davis, J. & Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. ICML '06: Proceedings of the 23rd International Conference on Machine Learning (pp. 233--240), New York, NY, USA: ACM. ISBN: 1-59593-383-2

Joachims, T., Granka, L., Pan, B., Hembrooke, H. & Gay, G. (2005). Accurately interpreting clickthrough data as implicit feedback. SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 154--161), New York, NY, USA: ACM. ISBN: 1-59593-034-5

Herlocker, J. L., Konstan, J. A., Terveen, L. G. & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22, 5--53. doi: 10.1145/963770.963772

Järvelin, K. & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20, 422--446. doi: 10.1145/582415.582418

Järvelin, K. & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. SIGIR '00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 41--48), New York, NY, USA: ACM. ISBN: 1-58113-226-3

Lewis, D. D. (1991). Evaluating text categorization. Proceedings of the Speech and Natural Language Workshop (pp. 312--318), February. San Mateo, CA: Morgan Kaufmann.