PUMA References output

Author	Title	Year	Journal/Proceedings	Reftype	DOI/URL
Berendt, B., Hotho, A. & Stumme, G.	Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0	2010	Web Semantics: Science, Services and Agents on the World Wide Web	article	DOI URL
BibTeX: @article{berendt2010bridging, author = {Berendt, Bettina and Hotho, Andreas and Stumme, Gerd}, title = {Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0}, journal = {Web Semantics: Science, Services and Agents on the World Wide Web}, year = {2010}, volume = {8}, number = {2-3}, pages = {95--96}, note = {Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0; The Future of Knowledge Dissemination: The Elsevier Grand Challenge for the Life Sciences}, url = {http://www.sciencedirect.com/science/article/B758F-4YXK4HW-1/2/4cb514565477c54160b5e6eb716c32d7}, doi = {10.1016/j.websem.2010.04.008} }
Carpineto, C., Osiński, S., Romano, G. & Weiss, D.	A survey of Web clustering engines	2009	ACM Comput. Surv.	article	DOI URL
Abstract: Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.
BibTeX: @article{Carpineto:2009:SWC:1541880.1541884, author = {Carpineto, Claudio and Osi\'{n}ski, Stanislaw and Romano, Giovanni and Weiss, Dawid}, title = {A survey of Web clustering engines}, journal = {ACM Comput. Surv.}, publisher = {ACM}, year = {2009}, volume = {41}, pages = {17:1--17:38}, url = {http://doi.acm.org/10.1145/1541880.1541884}, doi = {http://dx.doi.org/10.1145/1541880.1541884} }
Hazman, M., El-Beltagy, S. R. & Rafea, A.	Ontology learning from domain specific web documents	2009	International Journal of Metadata, Semantics and Ontologies	article	DOI URL
Abstract: Ontologies play a vital role in many web- and internet-related applications. This work presents a system for accelerating the ontology building process via semi-automatically learning a hierarchal ontology given a set of domain-specific web documents and a set of seed concepts. The methods are tested with web documents in the domain of agriculture. The ontology is constructed through the use of two complementary approaches. The presented system has been used to build an ontology in the agricultural domain using a set of Arabic extension documents and evaluated against a modified version of the AGROVOC ontology.
BibTeX: @article{Hazman:30May2009:1744-2621:24, author = {Hazman, Maryam and El-Beltagy, Samhaa R. and Rafea, Ahmed}, title = {Ontology learning from domain specific web documents}, journal = {International Journal of Metadata, Semantics and Ontologies}, year = {2009}, volume = {4}, pages = {24--33}, url = {http://www.ingentaconnect.com/content/ind/ijmso/2009/00000004/F0020001/art00003}, doi = {http://dx.doi.org/10.1504/IJMSO.2009.026251} }
Lu, C., Chen, X. & Park, E. K.	Exploit the tripartite network of social tagging for web clustering	2009	Proceeding of the 18th ACM conference on Information and knowledge management	inproceedings	DOI URL
Abstract: In this poster, we investigate how to enhance web clustering by leveraging the tripartite network of social tagging systems. We propose a clustering method, called "Tripartite Clustering", which cluster the three types of nodes (resources, users and tags) simultaneously based on the links in the social tagging network. The proposed method is experimented on a real-world social tagging dataset sampled from del.icio.us. We also compare the proposed clustering approach with K-means. All the clustering results are evaluated against a human-maintained web directory. The experimental results show that Tripartite Clustering significantly outperforms the content-based K-means approach and achieves performance close to that of social annotation-based K-means whereas generating much more useful information.
BibTeX: @inproceedings{Lu:2009:ETN:1645953.1646167, author = {Lu, Caimei and Chen, Xin and Park, E. K.}, title = {Exploit the tripartite network of social tagging for web clustering}, booktitle = {Proceeding of the 18th ACM conference on Information and knowledge management}, publisher = {ACM}, year = {2009}, pages = {1545--1548}, url = {http://doi.acm.org/10.1145/1645953.1646167}, doi = {http://dx.doi.org/10.1145/1645953.1646167} }
Qi, X. & Davison, B. D.	Web page classification: Features and algorithms	2009	ACM Comput. Surv.	article	DOI URL
Abstract: Classification of Web page content is essential to many tasks in Web information retrieval such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in Web page classification, we note the importance of these Web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
BibTeX: @article{qi2009classification, author = {Qi, Xiaoguang and Davison, Brian D.}, title = {Web page classification: Features and algorithms}, journal = {ACM Comput. Surv.}, publisher = {ACM}, year = {2009}, volume = {41}, pages = {12:1--12:31}, url = {http://doi.acm.org/10.1145/1459352.1459357}, doi = {http://dx.doi.org/10.1145/1459352.1459357} }
	Proceedings of the Dagstuhl Seminar on Social Web Communities	2008		book	URL
BibTeX: @book{alani2008proceedings, title = {Proceedings of the Dagstuhl Seminar on Social Web Communities}, publisher = {Schloss Dagstuhl}, year = {2008}, url = {http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=08391} }
Krause, B., Schmitz, C., Hotho, A. & Stumme, G.	The Anti-Social Tagger - Detecting Spam in Social Bookmarking Systems	2008	Proc. of the Fourth International Workshop on Adversarial Information Retrieval on the Web	inproceedings	URL
BibTeX: @inproceedings{krause2008antisocialb, author = {Krause, Beate and Schmitz, Christoph and Hotho, Andreas and Stumme, Gerd}, title = {The Anti-Social Tagger - Detecting Spam in Social Bookmarking Systems}, booktitle = {Proc. of the Fourth International Workshop on Adversarial Information Retrieval on the Web}, year = {2008}, url = {http://airweb.cse.lehigh.edu/2008/submissions/krause_2008_anti_social_tagger.pdf} }
Bollegala, D., Matsuo, Y. & Ishizuka, M.	Measuring semantic similarity between words using web search engines	2007	WWW '07: Proceedings of the 16th international conference on World Wide Web	inproceedings	DOI URL
BibTeX: @inproceedings{bollegala2007measuring, author = {Bollegala, Danushka and Matsuo, Yutaka and Ishizuka, Mitsuru}, title = {Measuring semantic similarity between words using web search engines}, booktitle = {WWW '07: Proceedings of the 16th international conference on World Wide Web}, publisher = {ACM}, year = {2007}, pages = {757--766}, url = {http://www2007.org/papers/paper632.pdf}, doi = {http://doi.acm.org/10.1145/1242572.1242675} }
Rattenbury, T., Good, N. & Naaman, M.	Towards automatic extraction of event and place semantics from flickr tags	2007	SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval	inproceedings	DOI URL
Abstract: We describe an approach for extracting semantics of tags, unstructured text-labels assigned to resources on the Web, based on each tag's usage patterns. In particular, we focus on the problem of extracting place and event semantics for tags that are assigned to photos on Flickr, a popular photo sharing website that supports time and location (latitude/longitude) metadata. We analyze two methods inspired by well-known burst-analysis techniques and one novel method: Scale-structure Identification. We evaluate the methods on a subset of Flickr data, and show that our Scale-structure Identification method outperforms the existing techniques. The approach and methods described in this work can be used in other domains such as geo-annotated web pages, where text terms can be extracted and associated with usage patterns.
BibTeX: @inproceedings{rattenbury2007towards, author = {Rattenbury, Tye and Good, Nathaniel and Naaman, Mor}, title = {Towards automatic extraction of event and place semantics from flickr tags}, booktitle = {SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval}, publisher = {ACM Press}, year = {2007}, pages = {103--110}, url = {http://dx.doi.org/10.1145/1277741.1277762}, doi = {http://dx.doi.org/10.1145/1277741.1277762} }
Angelova, R. & Weikum, G.	Graph-based text classification: learn from your neighbors	2006	Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval	inproceedings	DOI URL
Abstract: Automatic classification of data items, based on training samples, can be boosted by considering the neighborhood of data items in a graph structure (e.g., neighboring documents in a hyperlink environment or co-authors and their publications for bibliographic data entries). This paper presents a new method for graph-based classification, with particular emphasis on hyperlinked text documents but broader applicability. Our approach is based on iterative relaxation labeling and can be combined with either Bayesian or SVM classifiers on the feature spaces of the given data items. The graph neighborhood is taken into consideration to exploit locality patterns while at the same time avoiding overfitting. In contrast to prior work along these lines, our approach employs a number of novel techniques: dynamically inferring the link/class pattern in the graph in the run of the iterative relaxation labeling, judicious pruning of edges from the neighborhood graph based on node dissimilarities and node degrees, weighting the influence of edges based on a distance metric between the classification labels of interest and weighting edges by content similarity measures. Our techniques considerably improve the robustness and accuracy of the classification outcome, as shown in systematic experimental comparisons with previously published methods on three different real-world datasets.
BibTeX: @inproceedings{angelova2006graphbased, author = {Angelova, Ralitsa and Weikum, Gerhard}, title = {Graph-based text classification: learn from your neighbors}, booktitle = {Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval}, publisher = {ACM}, year = {2006}, pages = {485--492}, url = {http://doi.acm.org/10.1145/1148170.1148254}, doi = {http://dx.doi.org/10.1145/1148170.1148254} }
Liu, V. & Curran, J. R.	Web Text Corpus for Natural Language Processing.	2006	EACL	inproceedings	URL
BibTeX: @inproceedings{liu2006web, author = {Liu, Vinci and Curran, James R.}, title = {Web Text Corpus for Natural Language Processing.}, booktitle = {EACL}, publisher = {The Association for Computer Linguistics}, year = {2006}, url = {http://dblp.uni-trier.de/db/conf/eacl/eacl2006.html#LiuC06} }
Choi, B. & Yao, Z.	Web Page Classification	2005	Foundations and Advances in Data Mining	incollection	DOI URL
Abstract: This chapter describes systems that automatically classify web pages into meaningful categories. It first defines two types of web page classification: subject based and genre based classifications. It then describes the state of the art techniques and subsystems used to build automatic web page classification systems, including web page representations, dimensionality reductions, web page classifiers, and evaluation of web page classifiers. Such systems are essential tools for Web Mining and for the future of Semantic Web.
BibTeX: @incollection{choi2005classification, author = {Choi, B. and Yao, Z.}, title = {Web Page Classification}, booktitle = {Foundations and Advances in Data Mining}, publisher = {Springer}, year = {2005}, volume = {180}, pages = {221-274}, url = {http://dx.doi.org/10.1007/11362197_9}, doi = {http://dx.doi.org/10.1007/11362197_9} }
Liu, T.-Y., Yang, Y., Wan, H., Zhou, Q., Gao, B., Zeng, H.-J., Chen, Z. & Ma, W.-Y.	An experimental study on large-scale web categorization	2005	Special interest tracks and posters of the 14th international conference on World Wide Web	inproceedings	DOI URL
Abstract: Taxonomies of the Web typically have hundreds of thousands of categories and skewed category distribution over documents. It is not clear whether existing text classification technologies can perform well on and scale up to such large-scale applications. To understand this, we conducted the evaluation of several representative methods (Support Vector Machines, k-Nearest Neighbor and Naive Bayes) with Yahoo! taxonomies. In particular, we evaluated the effectiveness/efficiency tradeoff in classifiers with hierarchical setting compared to conventional (flat) setting, and tested popular threshold tuning strategies for their scalability and accuracy in large-scale classification problems.
BibTeX: @inproceedings{liu2005experimental, author = {Liu, Tie-Yan and Yang, Yiming and Wan, Hao and Zhou, Qian and Gao, Bin and Zeng, Hua-Jun and Chen, Zheng and Ma, Wei-Ying}, title = {An experimental study on large-scale web categorization}, booktitle = {Special interest tracks and posters of the 14th international conference on World Wide Web}, publisher = {ACM}, year = {2005}, pages = {1106--1107}, url = {http://doi.acm.org/10.1145/1062745.1062891}, doi = {http://dx.doi.org/10.1145/1062745.1062891} }
Shen, D., Chen, Z., Yang, Q., Zeng, H.-J., Zhang, B., Lu, Y. & Ma, W.-Y.	Web-page classification through summarization	2004	Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval	inproceedings	DOI URL
Abstract: Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement as compared to pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about 12.9% improvement over pure-text based methods.
BibTeX: @inproceedings{Shen:2004:WCT:1008992.1009035, author = {Shen, Dou and Chen, Zheng and Yang, Qiang and Zeng, Hua-Jun and Zhang, Benyu and Lu, Yuchang and Ma, Wei-Ying}, title = {Web-page classification through summarization}, booktitle = {Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval}, publisher = {ACM}, year = {2004}, pages = {242--249}, url = {http://doi.acm.org/10.1145/1008992.1009035}, doi = {http://dx.doi.org/10.1145/1008992.1009035} }
Omelayenko, B.	Learning of Ontologies for the Web: the Analysis of Existent Approaches	2001	Proceedings of the International Workshop on Web Dynamics, held in conj. with the 8th International Conference on Database Theory (ICDT’01), London, UK	inproceedings	URL
Abstract: The next generation of the Web, called Semantic Web, has to improve the Web with semantic (ontological) page annotations to enable knowledge-level querying and searches. Manual construction of these ontologies will require tremendous efforts that force future integration of machine learning with knowledge acquisition to enable highly automated ontology learning. In the paper we present the state-of-the-art in the field of ontology learning from the Web to see how it can contribute to the task of semantic Web querying. We consider three components of the query processing system: natural language ontologies, domain ontologies and ontology instances. We discuss the requirements for machine learning algorithms to be applied for the learning of the ontologies of each type from the Web documents, and survey the existent ontology learning and other closely related approaches.
BibTeX: @inproceedings{omelayenko2001learning, author = {Omelayenko, Borys}, title = {Learning of Ontologies for the Web: the Analysis of Existent Approaches}, booktitle = {Proceedings of the International Workshop on Web Dynamics, held in conj. with the 8th International Conference on Database Theory (ICDT’01), London, UK}, year = {2001}, url = {http://www.dcs.bbk.ac.uk/webDyn/webDynPapers/omelayenko.pdf} }
Dumais, S. & Chen, H.	Hierarchical classification of Web content	2000	Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval	inproceedings	DOI URL
BibTeX: @inproceedings{Dumais:2000:HCW:345508.345593, author = {Dumais, Susan and Chen, Hao}, title = {Hierarchical classification of Web content}, booktitle = {Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval}, publisher = {ACM}, year = {2000}, pages = {256--263}, url = {http://doi.acm.org/10.1145/345508.345593}, doi = {http://dx.doi.org/10.1145/345508.345593} }
Chakrabarti, S., Dom, B., Agrawal, R. & Raghavan, P.	Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies	1998	The VLDB Journal	article	DOI URL
Abstract: We explore how to organize large text databases hierarchically by topic to aid better searching, browsing and filtering. Many corpora, such as internet directories, digital libraries, and patent databases are manually organized into topic hierarchies, also called taxonomies. Similar to indices for relational data, taxonomies make search and access more efficient. However, the exponential growth in the volume of on-line textual information makes it nearly impossible to maintain such taxonomic organization for large, fast-changing corpora by hand. We describe an automatic system that starts with a small sample of the corpus in which topics have been assigned by hand, and then updates the database with new documents as the corpus grows, assigning topics to these new documents with high speed and accuracy. To do this, we use techniques from statistical pattern recognition to efficiently separate the feature words, or discriminants, from the noise words at each node of the taxonomy. Using these, we build a multilevel classifier. At each node, this classifier can ignore the large number of “noise” words in a document. Thus, the classifier has a small model size and is very fast. Owing to the use of context-sensitive features, the classifier is very accurate. As a by-product, we can compute for each document a set of terms that occur significantly more often in it than in the classes to which it belongs. We describe the design and implementation of our system, stressing how to exploit standard, efficient relational operations like sorts and joins. We report on experiences with the Reuters newswire benchmark, the US patent database, and web document samples from Yahoo!. We discuss applications where our system can improve searching and filtering capabilities.
BibTeX: @article{Chakrabarti:1998:SFS:765529.765533, author = {Chakrabarti, Soumen and Dom, Byron and Agrawal, Rakesh and Raghavan, Prabhakar}, title = {Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies}, journal = {The VLDB Journal}, publisher = {Springer-Verlag New York, Inc.}, year = {1998}, volume = {7}, pages = {163--178}, url = {http://dx.doi.org/10.1007/s007780050061}, doi = {http://dx.doi.org/10.1007/s007780050061} }

Created by JabRef export filters on 18/04/2024 by the social publication management platform PUMA