TY  - CONF
AU  - Liu, Minghai
AU  - Cai, Rui
AU  - Zhang, Ming
AU  - Zhang, Lei
T1  - User browsing behavior-driven web crawling
T2  - Proceedings of the 20th ACM international conference on Information and knowledge management
PB  - ACM
C1  - New York, NY, USA
PY  - 2011/
SP  - 87
EP  - 92
UR  - http://doi.acm.org/10.1145/2063576.2063593
DO  - 10.1145/2063576.2063593
KW  - engine
KW  - crowd
KW  - search
KW  - crawling
KW  - web
SN  - 978-1-4503-0717-8
AB  - To optimize the performance of web crawlers, various page importance measures have been studied to select and order URLs in crawling. Most sophisticated measures (e.g. breadth-first and PageRank) are based on link structure. In this paper, we treat the problem from another perspective and propose to measure page importance through mining user interest and behaviors from web browse logs. Unlike most existing approaches which work on single URL, in this paper, both the log mining and the crawl ordering are performed at the granularity of URL pattern. The proposed URL pattern-based crawl orderings are capable to properly predict the importance of newly created (unseen) URLs. Promising experimental results proved the feasibility of our approach.
ER  - 