PUMA publications for /user/jaeschke/crowd
https://puma.uni-kassel.de/user/jaeschke/crowd
PUMA RSS feed for /user/jaeschke/crowd
2024-03-29T15:58:09+01:00

User browsing behavior-driven web crawling
https://puma.uni-kassel.de/bibtex/23ce89bd8a3d3eb6306b739fe1f4088df/jaeschke
jaeschke
2012-09-06T10:52:55+02:00
crawling crowd engine search web
<span class="authorEditorList"><span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Minghai Liu" itemprop="url" href="/author/Minghai%20Liu"><span itemprop="name">M. Liu</span></a></span>, <span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Rui Cai" itemprop="url" href="/author/Rui%20Cai"><span itemprop="name">R. Cai</span></a></span>, <span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Ming Zhang" itemprop="url" href="/author/Ming%20Zhang"><span itemprop="name">M. Zhang</span></a></span>, and <span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Lei Zhang" itemprop="url" href="/author/Lei%20Zhang"><span itemprop="name">L. Zhang</span></a></span>. </span><span itemtype="http://schema.org/Book" itemscope="itemscope" itemprop="isPartOf"><em><span itemprop="name">Proceedings of the 20th ACM international conference on Information and knowledge management</span>, </em></span><em>pp. <span itemprop="pagination">87--92</span>. </em><em>New York, NY, USA, </em><em><span itemprop="publisher">ACM</span>, </em>(<em><span>2011<meta content="2011" itemprop="datePublished"/></span></em>)
To optimize the performance of web crawlers, various page importance measures have been studied to select and order URLs in crawling. Most sophisticated measures (e.g. 
breadth-first and PageRank) are based on link structure. In this paper, we treat the problem from another perspective and propose to measure page importance through mining user interest and behaviors from web browse logs. Unlike most existing approaches, which work on single URLs, in this paper both the log mining and the crawl ordering are performed at the granularity of URL patterns. The proposed URL pattern-based crawl orderings are able to properly predict the importance of newly created (unseen) URLs. Promising experimental results demonstrate the feasibility of our approach.

Discovering URLs through user feedback
https://puma.uni-kassel.de/bibtex/24e73c9d6ed79931ccdfcfda938e3be62/jaeschke
jaeschke
2012-09-06T10:23:10+02:00
crawling crowd feedback search user web
<span class="authorEditorList"><span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Xiao Bai" itemprop="url" href="/author/Xiao%20Bai"><span itemprop="name">X. Bai</span></a></span>, <span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="B. Barla Cambazoglu" itemprop="url" href="/author/B.%20Barla%20Cambazoglu"><span itemprop="name">B. Cambazoglu</span></a></span>, and <span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Flavio P. Junqueira" itemprop="url" href="/author/Flavio%20P.%20Junqueira"><span itemprop="name">F. Junqueira</span></a></span>. </span><span itemtype="http://schema.org/Book" itemscope="itemscope" itemprop="isPartOf"><em><span itemprop="name">Proceedings of the 20th ACM international conference on Information and knowledge management</span>, </em></span><em>pp. <span itemprop="pagination">77--86</span>. 
</em><em>New York, NY, USA, </em><em><span itemprop="publisher">ACM</span>, </em>(<em><span>2011<meta content="2011" itemprop="datePublished"/></span></em>)
Search engines rely upon crawling to build their Web page collections. A Web crawler typically discovers new URLs by following the link structure induced by links on Web pages. As the number of documents on the Web is large, discovering newly created URLs may take arbitrarily long, and depending on how a given page is connected to others, such a crawler may miss the pages altogether. In this paper, we evaluate the benefits of integrating a passive URL discovery mechanism into a Web crawler. This mechanism is passive in the sense that it does not require the crawler to actively fetch documents from the Web to discover URLs. We focus here on a mechanism that uses toolbar data as a representative source for new URL discovery. We use the toolbar logs of Yahoo! to characterize the URLs that are accessed by users via their browsers, but not discovered by the Yahoo! Web crawler. We show that a high fraction of URLs that appear in toolbar logs are not discovered by the crawler. We also reveal that a certain fraction of URLs are discovered by the crawler later than the time they are first accessed by users. 
One important conclusion of our work is that web search engines can benefit greatly from user feedback in the form of toolbar logs for passive URL discovery.

Cloud computing for citizen science
https://puma.uni-kassel.de/bibtex/2d9e22a1a5e9404a805aee5cb0fd406c4/jaeschke
jaeschke
2012-07-13T17:21:58+02:00
citizen cloud collective computing crowd intelligence science sourcing
<meta content="thesis" itemprop="educationalUse"/><span class="authorEditorList"><span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Michael J. Olson" itemprop="url" href="/author/Michael%20J.%20Olson"><span itemprop="name">M. Olson</span></a></span>. </span><em>California Institute of Technology, </em><em><span itemprop="educationalUse">Master's thesis</span>, </em>(<em><span>2012<meta content="2012" itemprop="datePublished"/></span></em>)
My thesis describes the design and implementation of systems that empower individuals to help their communities respond to critical situations and to participate in research that helps them understand and improve their environments. People want to help their communities respond to threats such as earthquakes, wildfires, mudslides and hurricanes, and they want to participate in research that helps them understand and improve their environment. “Citizen Science” projects that facilitate this interaction include projects that monitor climate change, water quality and animal habitats. My thesis explores the design and analysis of community-based sense and response systems that enable individuals to participate in critical community activities and scientific research that monitors their environments.

Cloud computing for citizen science - CaltechTHESIS