Publications
The Intention Behind Web Queries
Baeza-Yates, R.; Calderón-Benavides, L. & González-Caro, C.
String Processing and Information Retrieval 98-109 (2006)
The identification of the user’s intention or interest through the queries that they submit to a search engine can be very useful to offer them more adequate results. In this work we present a framework for the identification of the user’s interest in an automatic way, based on the analysis of query logs. This identification is made from two perspectives: the objectives or goals of a user and the categories in which these aims are situated. A manual classification of the queries was made in order to have a reference point, and then we applied supervised and unsupervised learning techniques. The results obtained show that for a considerable number of cases supervised learning is a good option; however, through unsupervised learning we found relationships between users and behaviors that are not easy to detect just by taking the query words. Also, through unsupervised learning we established that there are categories that we are not able to determine, in contrast with other classes that were not considered but naturally appear after the clustering process. This allowed us to establish that the combination of supervised and unsupervised learning is a good alternative to find the user’s goals. From supervised learning we can identify the user’s interest given certain established goals and categories; on the other hand, with unsupervised learning we can validate the goals and categories used, refine them, and select the most appropriate to the user’s needs.