Liu, X.; Lu, M.; Ooi, B. C.; Shen, Y.; Wu, S. & Zhang, M.
(2012):
CDAS: a crowdsourcing data analytics system.
In: Proceedings of the VLDB Endowment,
Number: 10,
Vol. 5,
Publisher: VLDB Endowment.
Year: 2012.
Pages: 1040-1051.
Some complex problems, such as image tagging and natural language processing, are very challenging for computers, where even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution (employing human participation) to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks. A complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively.
To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core part of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine to process and monitor the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy user-required accuracy, the model guides the crowdsourcing query engine in the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performance. When verifying the quality of the result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS: a Twitter sentiment analytics job and an image tagging job. We use real Twitter and Flickr data as our queries, respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve a much higher accuracy. By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.
@article{liu2012crowdsourcing,
author = {Liu, Xuan and Lu, Meiyu and Ooi, Beng Chin and Shen, Yanyan and Wu, Sai and Zhang, Meihui},
title = {CDAS: a crowdsourcing data analytics system},
journal = {Proceedings of the VLDB Endowment},
publisher = {VLDB Endowment},
year = {2012},
volume = {5},
number = {10},
pages = {1040--1051},
url = {http://dl.acm.org/citation.cfm?id=2336664.2336676},
issn = {2150-8097},
keywords = {analytics, cdas, collective, crowdsourcing, data, intelligence, mining, web},
abstract = {Some complex problems, such as image tagging and natural language processing, are very challenging for computers, where even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution -- employing human participation -- to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks. A complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively. To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core part of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine to process and monitor the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy user-required accuracy, the model guides the crowdsourcing query engine in the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performance. When verifying the quality of the result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS: a Twitter sentiment analytics job and an image tagging job. We use real Twitter and Flickr data as our queries, respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve a much higher accuracy. By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.}
}
%0 = article
%A = Liu, Xuan and Lu, Meiyu and Ooi, Beng Chin and Shen, Yanyan and Wu, Sai and Zhang, Meihui
%D = 2012
%I = VLDB Endowment
%T = CDAS: a crowdsourcing data analytics system
%U = http://dl.acm.org/citation.cfm?id=2336664.2336676
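The CDAS abstract says the answering model estimates the accuracy of each result from workers' historical performance and verifies quality with an online strategy. The paper's exact formulas are not reproduced here; what follows is a minimal Python sketch under stated assumptions: workers answer independently, a worker with historical accuracy a picks the correct label with probability a and otherwise picks uniformly among the remaining labels, and votes are consumed one at a time until the leading answer's estimated accuracy meets a user-required threshold. The function names `estimate_accuracy` and `answer_online`, the early-stop rule, and all example data are hypothetical.

```python
from math import prod


def estimate_accuracy(votes, accuracy, labels):
    """Hypothetical helper: posterior probability of each label given votes.

    votes: list of (worker_id, label) pairs
    accuracy: dict worker_id -> historical accuracy, strictly in (0, 1)
    labels: all possible answers (at least two)
    Assumes independent workers; a wrong worker picks uniformly among
    the other len(labels) - 1 labels. Uniform prior over labels.
    """
    m = len(labels)
    scores = {
        label: prod(
            accuracy[w] if v == label else (1 - accuracy[w]) / (m - 1)
            for w, v in votes
        )
        for label in labels
    }
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def answer_online(vote_stream, accuracy, labels, required):
    """Hypothetical online check: consume votes one at a time and stop as
    soon as the best label's estimated accuracy reaches `required`."""
    seen = []
    best, conf = None, 0.0
    for vote in vote_stream:
        seen.append(vote)
        posterior = estimate_accuracy(seen, accuracy, labels)
        best = max(posterior, key=posterior.get)
        conf = posterior[best]
        if conf >= required:
            break
    return best, conf, len(seen)


# Illustrative data only: three workers with assumed historical accuracies.
accuracy = {"w1": 0.9, "w2": 0.8, "w3": 0.6}
labels = ["pos", "neg", "neutral"]
votes = [("w1", "pos"), ("w2", "pos"), ("w3", "neg")]
best, conf, used = answer_online(votes, accuracy, labels, required=0.95)
```

Here the first two workers agree on "pos", so the estimate rises to 0.72 / 0.73 (about 0.986) after two votes and the third answer never needs to be awaited; the waiting-time saving an online strategy aims for comes from exactly this kind of early stop.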
Rauber, A. & Kaiser, M.
(2009):
Webarchivierung und Web Archive Mining: Notwendigkeit, Probleme und Lösungsansätze.
In: HMD Praxis der Wirtschaftsinformatik,
Vol. 268,
Publisher: dpunkt.verlag.
Year: 2009.
In recent years, libraries and archives have increasingly taken on the task of collecting content from the World Wide Web alongside conventional publications, in order to preserve this valuable part of our cultural heritage and to keep important information available over the long term. These massive data collections offer fascinating opportunities for quick access to important information that has already been lost from the live Web. They are an indispensable source for scholars who, in the future, will want to trace the social and technological developments of our time. On the other hand, such a data collection constitutes an entirely new kind of data holding that raises not only legal but also numerous ethical questions concerning its use. These questions will grow to the extent that the technical means for automatically analyzing and interpreting these data become more powerful. Since most Web archiving initiatives are aware of this problem, use of the data currently remains heavily restricted in most cases, or some kind of "opt-out" option is provided through which website owners can exclude their pages from a Web archive. With either approach, Web archives cannot realize their full potential for use. This article begins with a brief description of the technologies used to collect Web content for archiving purposes. It questions assumptions concerning the free availability of the data and different types of use. Building on this, it identifies a number of open questions whose resolution could allow broader access to, and better use of, Web archives.
@article{rauber2009webarchivierung,
author = {Rauber, Andreas and Kaiser, Max},
title = {Webarchivierung und Web Archive Mining: Notwendigkeit, Probleme und Lösungsansätze},
editor = {Knoll, Matthias and Meier, Andreas},
journal = {HMD Praxis der Wirtschaftsinformatik},
publisher = {dpunkt.verlag},
year = {2009},
volume = {268},
url = {http://hmd.dpunkt.de/268/03.php},
issn = {1436-3011},
keywords = {archive, law, mining, privacy, web},
abstract = {In recent years, libraries and archives have increasingly taken on the task of collecting content from the World Wide Web alongside conventional publications, in order to preserve this valuable part of our cultural heritage and to keep important information available over the long term. These massive data collections offer fascinating opportunities for quick access to important information that has already been lost from the live Web. They are an indispensable source for scholars who, in the future, will want to trace the social and technological developments of our time. On the other hand, such a data collection constitutes an entirely new kind of data holding that raises not only legal but also numerous ethical questions concerning its use. These questions will grow to the extent that the technical means for automatically analyzing and interpreting these data become more powerful. Since most Web archiving initiatives are aware of this problem, use of the data currently remains heavily restricted in most cases, or some kind of "opt-out" option is provided through which website owners can exclude their pages from a Web archive. With either approach, Web archives cannot realize their full potential for use. This article begins with a brief description of the technologies used to collect Web content for archiving purposes. It questions assumptions concerning the free availability of the data and different types of use. Building on this, it identifies a number of open questions whose resolution could allow broader access to, and better use of, Web archives.}
}
%0 = article
%A = Rauber, Andreas and Kaiser, Max
%D = 2009
%I = dpunkt.verlag
%T = Webarchivierung und Web Archive Mining: Notwendigkeit, Probleme und Lösungsansätze
%U = http://hmd.dpunkt.de/268/03.php
Pang, B. & Lee, L.
(2008):
Opinion Mining and Sentiment Analysis.
In: Foundations and Trends in Information Retrieval,
Number: 1-2,
Vol. 2,
Publisher: Now Publishers Inc.
Year: 2008.
Pages: 1-135.
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.
@article{pang2008opinion,
author = {Pang, Bo and Lee, Lillian},
title = {Opinion Mining and Sentiment Analysis},
journal = {Foundations and Trends in Information Retrieval},
publisher = {Now Publishers Inc.},
address = {Hanover, MA, USA},
year = {2008},
volume = {2},
number = {1-2},
pages = {1--135},
url = {http://portal.acm.org/citation.cfm?id=1454712},
doi = {10.1561/1500000011},
issn = {1554-0669},
keywords = {analysis, mining, opinion, sentiment, social, web},
abstract = {An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.}
}
%0 = article
%A = Pang, Bo and Lee, Lillian
%C = Hanover, MA, USA
%D = 2008
%I = Now Publishers Inc.
%T = Opinion Mining and Sentiment Analysis
%U = http://portal.acm.org/citation.cfm?id=1454712
Wu, X.; Kumar, V.; Quinlan, J. R.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.; Ng, A.; Liu, B.; Yu, P.; Zhou, Z.-H.; Steinbach, M.; Hand, D. & Steinberg, D.
(2008):
Top 10 algorithms in data mining.
In: Knowledge and Information Systems,
Number: 1,
Vol. 14,
Publisher: Springer.
Year: 2008.
Pages: 1-37.
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.
@article{wu2008wu,
author = {Wu, Xindong and Kumar, Vipin and Quinlan, J. Ross and Ghosh, Joydeep and Yang, Qiang and Motoda, Hiroshi and McLachlan, Geoffrey and Ng, Angus and Liu, Bing and Yu, Philip and Zhou, Zhi-Hua and Steinbach, Michael and Hand, David and Steinberg, Dan},
title = {Top 10 algorithms in data mining},
journal = {Knowledge and Information Systems},
publisher = {Springer},
address = {London},
year = {2008},
volume = {14},
number = {1},
pages = {1--37},
url = {http://dx.doi.org/10.1007/s10115-007-0114-2},
issn = {0219-1377},
keywords = {algorithm, data, icdm, ieee, mining, top},
abstract = {This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.}
}
%0 = article
%A = Wu, Xindong and Kumar, Vipin and Quinlan, J. Ross and Ghosh, Joydeep and Yang, Qiang and Motoda, Hiroshi and McLachlan, Geoffrey and Ng, Angus and Liu, Bing and Yu, Philip and Zhou, Zhi-Hua and Steinbach, Michael and Hand, David and Steinberg, Dan
%C = London
%D = 2008
%I = Springer
%T = Top 10 algorithms in data mining
%U = http://dx.doi.org/10.1007/s10115-007-0114-2
Kim, H. L.; Hwang, S. H. & Kim, H. G.
(2007):
FCA-based approach for mining contextualized folksonomy.
In: SAC '07: Proceedings of the 2007 ACM symposium on Applied computing,
New York, NY, USA.
@inproceedings{1244292,
author = {Kim, Hak Lae and Hwang, Suk Hyung and Kim, Hong Gee},
title = {FCA-based approach for mining contextualized folksonomy},
booktitle = {SAC '07: Proceedings of the 2007 ACM symposium on Applied computing},
publisher = {ACM Press},
address = {New York, NY, USA},
year = {2007},
pages = {1340--1345},
url = {http://portal.acm.org/citation.cfm?id=1244002.1244292&coll=GUIDE&dl=},
doi = {10.1145/1244002.1244292},
isbn = {1-59593-480-4},
keywords = {concept, fca, folksonomy, formal, mining, tagging, analysis, network, sna, social}
}
%0 = inproceedings
%A = Kim, Hak Lae and Hwang, Suk Hyung and Kim, Hong Gee
%B = SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
%C = New York, NY, USA
%D = 2007
%I = ACM Press
%T = FCA-based approach for mining contextualized folksonomy
%U = http://portal.acm.org/citation.cfm?id=1244002.1244292&coll=GUIDE&dl=
Romero, C. & Ventura, S.
(2007):
Educational data mining: A survey from 1995 to 2005.
In: Expert Syst. Appl.,
Number: 1,
Vol. 33,
Publisher: Pergamon Press, Inc.
Year: 2007.
Pages: 135-146.
Currently there is increasing interest in data mining and educational systems, making educational data mining a new, growing research community. This paper surveys the application of data mining to traditional educational systems, particularly web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems. Each of these systems has different data sources and objectives for knowledge discovery. After preprocessing the available data in each case, data mining techniques can be applied: statistics and visualization; clustering, classification and outlier detection; association rule mining and pattern mining; and text mining. Despite the success of this plentiful work, much more specialized work is needed in order for educational data mining to become a mature area.
@article{romero07,
author = {Romero, C. and Ventura, S.},
title = {Educational data mining: A survey from 1995 to 2005},
journal = {Expert Syst. Appl.},
publisher = {Pergamon Press, Inc.},
address = {Tarrytown, NY, USA},
year = {2007},
volume = {33},
number = {1},
pages = {135--146},
url = {http://portal.acm.org/citation.cfm?id=1223659},
doi = {10.1016/j.eswa.2006.04.005},
issn = {0957-4174},
keywords = {data, dm, e-learning, mining, survey, webzu},
abstract = {Currently there is increasing interest in data mining and educational systems, making educational data mining a new, growing research community. This paper surveys the application of data mining to traditional educational systems, particularly web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems. Each of these systems has different data sources and objectives for knowledge discovery. After preprocessing the available data in each case, data mining techniques can be applied: statistics and visualization; clustering, classification and outlier detection; association rule mining and pattern mining; and text mining. Despite the success of this plentiful work, much more specialized work is needed in order for educational data mining to become a mature area.}
}
%0 = article
%A = Romero, C. and Ventura, S.
%C = Tarrytown, NY, USA
%D = 2007
%I = Pergamon Press, Inc.
%T = Educational data mining: A survey from 1995 to 2005
%U = http://portal.acm.org/citation.cfm?id=1223659
Hotho, A.; Jäschke, R.; Schmitz, C. & Stumme, G.
(2006):
Information Retrieval in Folksonomies: Search and Ranking.
In: The Semantic Web: Research and Applications,
Heidelberg.
Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. At the moment, however, the information retrieval support is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The proposed algorithm is also applied to find communities within the folksonomy and is used to structure search results. All findings are demonstrated on a large scale dataset.
@inproceedings{hotho2006information,
author = {Hotho, Andreas and Jäschke, Robert and Schmitz, Christoph and Stumme, Gerd},
title = {Information Retrieval in Folksonomies: Search and Ranking},
editor = {Sure, York and Domingue, John},
booktitle = {The Semantic Web: Research and Applications},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
address = {Heidelberg},
year = {2006},
volume = {4011},
pages = {411--426},
keywords = {2006, folkrank, folksonomy, graph, iccs_example, information, l3s, mining, myown, ol_tut2010, rank, ranking, retrieval, search, seminar2006, trias_example, webzu, pagerank},
abstract = {Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. At the moment, however, the information retrieval support is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The proposed algorithm is also applied to find communities within the folksonomy and is used to structure search results. All findings are demonstrated on a large scale dataset.}
}
%0 = inproceedings
%A = Hotho, Andreas and Jäschke, Robert and Schmitz, Christoph and Stumme, Gerd
%B = The Semantic Web: Research and Applications
%C = Heidelberg
%D = 2006
%I = Springer
%T = Information Retrieval in Folksonomies: Search and Ranking
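The Hotho et al. abstract introduces FolkRank, a search algorithm that exploits the structure of the folksonomy. Under one reading of the paper, users, tags and resources become nodes of an undirected graph whose edge weights count co-occurrences in tag assignments; PageRank-style weight spreading is run once with a preference vector biased toward the topic of interest and once as an unbiased baseline, and the difference of the two weight vectors is the ranking. The Python sketch below follows that reading. The damping factor 0.7, the size of the preference boost, the fixed iteration count, the function names and the toy data are illustrative assumptions, not values from the paper.

```python
from collections import defaultdict


def build_graph(assignments):
    """Undirected weighted graph over users, tags and resources.

    Each (user, tag, resource) assignment adds 1 to the three edges it
    induces, so an edge weight counts how often the two nodes co-occur.
    Nodes are (kind, name) pairs so that equal names never collide.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for user, tag, res in assignments:
        u, t, r = ("user", user), ("tag", tag), ("res", res)
        for a, b in ((u, t), (t, r), (u, r)):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def spread(graph, pref, d, iters=100):
    """Iterate w <- d * (weight spread along edges) + (1 - d) * pref."""
    nodes = list(graph)
    degree = {v: sum(graph[v].values()) for v in nodes}
    w = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) * pref[v] for v in nodes}
        for u in nodes:
            # Each node passes its weight to neighbours in proportion to edge weight.
            share = d * w[u] / degree[u]
            for v, weight in graph[u].items():
                new[v] += share * weight
        w = new
    return w


def folkrank(assignments, focus, d=0.7, boost=1.0, iters=100):
    """Differential ranking: preference-biased run minus unbiased baseline."""
    graph = build_graph(assignments)
    nodes = list(graph)
    # Assumed preference: base weight 1 everywhere, plus boost * |nodes| on focus nodes.
    raw = {v: 1.0 + (boost * len(nodes) if v in focus else 0.0) for v in nodes}
    total = sum(raw.values())
    pref = {v: x / total for v, x in raw.items()}
    baseline = spread(graph, pref, d=1.0, iters=iters)  # d = 1: preference ignored
    biased = spread(graph, pref, d=d, iters=iters)
    return {v: biased[v] - baseline[v] for v in nodes}


# Toy folksonomy (hypothetical data).
assignments = [
    ("alice", "python", "r1"), ("alice", "python", "r2"),
    ("bob", "python", "r1"), ("bob", "java", "r3"),
    ("carol", "java", "r3"), ("carol", "cooking", "r4"),
]
ranks = folkrank(assignments, focus={("tag", "python")})
top_resources = sorted(
    (v for v in ranks if v[0] == "res"), key=ranks.get, reverse=True
)
```

Subtracting the baseline removes the advantage that densely connected nodes get from plain weight spreading, so resources close to the "python" tag (r1, r2) rise while the unrelated r4 falls below zero.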
Lucchese, C.; Orlando, S. & Perego, R.
(2006):
Fast and Memory Efficient Mining of Frequent Closed Itemsets.
In: IEEE Transactions on Knowledge and Data Engineering,
Number: 1,
Vol. 18,
Year: 2006.
Pages: 21-36.
@article{tkde06,
author = {Lucchese, Claudio and Orlando, Salvatore and Perego, Raffaele},
title = {Fast and Memory Efficient Mining of Frequent Closed Itemsets},
journal = {IEEE Transactions on Knowledge and Data Engineering},
year = {2006},
volume = {18},
number = {1},
pages = {21--36},
keywords = {association, closed, fca, frequent, itemset, mining, rule}
}
%0 = article
%A = Lucchese, Claudio and Orlando, Salvatore and Perego, Raffaele
%D = 2006
%T = Fast and Memory Efficient Mining of Frequent Closed Itemsets
Schmitz, C.; Hotho, A.; Jäschke, R. & Stumme, G.
(2006):
Mining Association Rules in Folksonomies.
In: Data Science and Classification,
Berlin, Heidelberg.
@inproceedings{schmitz2006mining,
author = {Schmitz, Christoph and Hotho, Andreas and Jäschke, Robert and Stumme, Gerd},
title = {Mining Association Rules in Folksonomies},
editor = {Batagelj, V. and Bock, H.-H. and Ferligoj, A. and Žiberna, A.},
booktitle = {Data Science and Classification},
series = {Studies in Classification, Data Analysis, and Knowledge Organization},
publisher = {Springer},
address = {Berlin, Heidelberg},
year = {2006},
pages = {261--270},
isbn = {978-3-540-34415-5},
keywords = {2006, association, folksonomy, iccs_example, l3s, mining, myown, ol_tut2010, rule, trias_example}
}
%0 = inproceedings
%A = Schmitz, Christoph and Hotho, Andreas and Jäschke, Robert and Stumme, Gerd
%B = Data Science and Classification
%C = Berlin, Heidelberg
%D = 2006
%I = Springer
%T = Mining Association Rules in Folksonomies
(2005):
Proc. of the European Web Mining Forum 2005.
@proceedings{berendt05european,
title = {Proc. of the European Web Mining Forum 2005},
editor = {Berendt, Bettina and Hotho, Andreas and Mladenic, Dunja and Semeraro, Giovanni and Spiliopoulou, Myra and Stumme, Gerd and van Someren, Maarten},
year = {2005},
url = {http://www.kde.cs.uni-kassel.de/ws/ewmf05},
keywords = {2005, europe, iccs_example, mining, proceeding, trias_example, web}
}
%0 = proceedings
%D = 2005
%T = Proc. of the European Web Mining Forum 2005
%U = http://www.kde.cs.uni-kassel.de/ws/ewmf05
Berendt, B.; Hotho, A. & Stumme, G.
(2005):
Semantic Web Mining and the Representation, Analysis, and Evolution of Web Space.
In: Proc. of the 1st Intl. Workshop on Representation and Analysis of Web Space,
@inproceedings{berendt05semantic,
author = {Berendt, Bettina and Hotho, Andreas and Stumme, Gerd},
title = {Semantic Web Mining and the Representation, Analysis, and Evolution of Web Space},
editor = {Svatek, Vojtech and Snasel, Vaclav},
booktitle = {Proc. of the 1st Intl. Workshop on Representation and Analysis of Web Space},
publisher = {Technical University of Ostrava},
year = {2005},
pages = {1--16},
isbn = {80-248-0864-1},
keywords = {iccs_example, mining, semantic, trias_example, web}
}
%0 = inproceedings
%A = Berendt, Bettina and Hotho, Andreas and Stumme, Gerd
%B = Proc. of the 1st Intl. Workshop on Representation and Analysis of Web Space
%D = 2005
%I = Technical University of Ostrava
%T = Semantic Web Mining and the Representation, Analysis, and Evolution of Web Space
Berendt, B.; Hotho, A. & Stumme, G.
(2004):
Usage Mining for and on the Semantic Web.
In: Data Mining Next Generation Challenges and Future Directions.
Editors: Kargupta, H.; Joshi, A.; Sivakumar, K. & Yesha, Y.
Publisher: AAAI Press,
Boston.
Year: 2004.
Pages: 461-481.
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data Web mining operates on, the discovery of meaning is impossible based on these data only. Therefore, formalizations of the semantics of Web resources and navigation behavior are increasingly being used. This fits exactly with the aims of the Semantic Web: the Semantic Web enriches the WWW by machine-processable information which supports the user in his tasks. In this paper, we discuss the interplay of the Semantic Web with Web Mining, with a specific focus on usage mining.
@incollection{berendt04usage,
author = {Berendt, Bettina and Hotho, Andreas and Stumme, Gerd},
title = {Usage Mining for and on the Semantic Web},
editor = {Kargupta, Hillol and Joshi, Anupam and Sivakumar, Krishnamoorthy and Yesha, Yelena},
booktitle = {Data Mining Next Generation Challenges and Future Directions},
publisher = {AAAI Press},
address = {Boston},
year = {2004},
pages = {461--481},
url = {http://www.kde.cs.uni-kassel.de/stumme/papers/2004/berendt04usage.pdf},
isbn = {0-262-61203-8},
keywords = {iccs_example, mining, semantic, trias_example, usage, web},
abstract = {Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data Web mining operates on, the discovery of meaning is impossible based on these data only. Therefore, formalizations of the semantics of Web resources and navigation behavior are increasingly being used. This fits exactly with the aims of the Semantic Web: the Semantic Web enriches the WWW by machine-processable information which supports the user in his tasks. In this paper, we discuss the interplay of the Semantic Web with Web Mining, with a specific focus on usage mining.}
}
%0 = incollection
%A = Berendt, Bettina and Hotho, Andreas and Stumme, Gerd
%B = Data Mining Next Generation Challenges and Future Directions
%C = Boston
%D = 2004
%I = AAAI Press
%T = Usage Mining for and on the Semantic Web
%U = http://www.kde.cs.uni-kassel.de/stumme/papers/2004/berendt04usage.pdf
Orlando, S.; Palmerini, P.; Perego, R. & Silvestri, F.
(2003):
An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets.
In: High Performance Computing for Computational Science — VECPAR 2002,
Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms were proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach which explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-node multithreading effectively exploits the memory hierarchies of each SMP node, while inter-node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In-depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performance under a variety of conditions.
@inproceedings{orlando02efficient,
author = {Orlando, Salvatore and Palmerini, Paolo and Perego, Raffaele and Silvestri, Fabrizio},
title = {An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets},
booktitle = {High Performance Computing for Computational Science — VECPAR 2002},
year = {2003},
pages = {3--29},
url = {http://dx.doi.org/10.1007/3-540-36569-9_28},
keywords = {algorithm, fca, frequent, itemset, mining, parallel, set},
abstract = {Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms were proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count \& Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach which explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-node multithreading effectively exploits the memory hierarchies of each SMP node, while inter-node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In-depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performance under a variety of conditions.}
}
%0 = inproceedings
%A = Orlando, Salvatore and Palmerini, Paolo and Perego, Raffaele and Silvestri, Fabrizio
%B = High Performance Computing for Computational Science — VECPAR 2002
%D = 2003
%T = An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets
%U = http://dx.doi.org/10.1007/3-540-36569-9_28
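The ParDCI abstract builds on DCI's counting and intersection techniques for finding frequent sets in transactional databases. The sketch below illustrates only the intersection side of that idea, in a single Python process: each item's transaction-ID list is stored as a bitset (a Python int), and a candidate itemset's support is the popcount of the intersection of its two parent bitsets, inside an Apriori-style level-wise search. It is neither DCI's multi-strategy counting nor ParDCI's parallel design; the function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def popcount(x):
    """Number of set bits, i.e. the number of transactions in a tid bitset."""
    return bin(x).count("1")


def vertical_bitsets(transactions):
    """Map each item to an int whose bit i is set iff transaction i contains it."""
    bits = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bits[item] = bits.get(item, 0) | (1 << tid)
    return bits


def frequent_itemsets(transactions, min_support):
    """Level-wise frequent itemset mining with support counted by tidset intersection."""
    bits = vertical_bitsets(transactions)
    level = {(item,): b for item, b in bits.items() if popcount(b) >= min_support}
    result = {itemset: popcount(b) for itemset, b in level.items()}
    k = 1
    while level:
        nxt = {}
        for a, b in combinations(sorted(level), 2):
            # Join two sorted k-itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if any(sub not in level for sub in combinations(cand, k)):
                continue
            # The candidate's transactions are those containing both a and b.
            tids = level[a] & level[b]
            support = popcount(tids)
            if support >= min_support:
                nxt[cand] = tids
                result[cand] = support
        level = nxt
        k += 1
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = frequent_itemsets(transactions, min_support=3)
```

Because support is just a popcount, a bitset could be split into contiguous transaction-ID ranges, counted separately and the partial counts summed; that is one natural place for a data-partitioned variant to divide work, although the sketch does not reproduce ParDCI's actual partitioning or multithreading.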
Berendt, B.; Hotho, A. & Stumme, G.
(2002):
Towards Semantic Web Mining.
In: The Semantic Web - ISWC 2002,
Heidelberg.
@inproceedings{berendt02towards,
author = {Berendt, B. and Hotho, A. and Stumme, G.},
title = {Towards Semantic Web Mining},
editor = {Horrocks, I. and Hendler, J.},
booktitle = {The Semantic Web -- ISWC 2002},
series = {LNCS},
publisher = {Springer},
address = {Heidelberg},
year = {2002},
pages = {264--278},
url = {http://www.kde.cs.uni-kassel.de/stumme/papers/2002/ISWC02.pdf},
keywords = {iccs_example, mining, semantic, trias_example, web}
}
%0 = inproceedings
%A = Berendt, B. and Hotho, A. and Stumme, G.
%B = The Semantic Web -- ISWC 2002
%C = Heidelberg
%D = 2002
%I = Springer
%T = Towards Semantic Web Mining
%U = http://www.kde.cs.uni-kassel.de/stumme/papers/2002/ISWC02.pdf
Hartmann, J.; Hotho, A. & Stumme, G.
(2002):
Semantic Web Mining for Building Information Portals (Position Paper).
In: Proc. Arbeitskreistreffen Knowledge Discovery,
Oldenburg.
@inproceedings{hartmann02semanticweb,
author = {Hartmann, J. and Hotho, A. and Stumme, G.},
title = {Semantic Web Mining for Building Information Portals (Position Paper)},
booktitle = {Proc. Arbeitskreistreffen Knowledge Discovery},
address = {Oldenburg},
year = {2002},
url = {http://www.kde.cs.uni-kassel.de/stumme/papers/2002/hartmann2002semanticweb.pdf},
keywords = {iccs_example, mining, semantic, trias_example, web}
}
%0 = inproceedings
%A = Hartmann, J. and Hotho, A. and Stumme, G.
%B = Proc. Arbeitskreistreffen Knowledge Discovery
%C = Oldenburg
%D = 2002
%T = Semantic Web Mining for Building Information Portals (Position Paper)
%U = http://www.kde.cs.uni-kassel.de/stumme/papers/2002/hartmann2002semanticweb.pdf
Stumme, G.; Berendt, B. & Hotho, A.
(2002):
Usage Mining for and on the Semantic Web.
In: Proc. NSF Workshop on Next Generation Data Mining,
Baltimore.
@inproceedings{stumme02usage,
author = {Stumme, G. and Berendt, B. and Hotho, A.},
title = {Usage Mining for and on the Semantic Web},
booktitle = {Proc. NSF Workshop on Next Generation Data Mining},
address = {Baltimore},
year = {2002},
pages = {77--86},
url = {http://www.kde.cs.uni-kassel.de/stumme/papers/2002/NSF-NGDM02.pdf},
keywords = {iccs_example, mining, semantic, trias_example, usage, web}
}
%0 = inproceedings
%A = Stumme, G. and Berendt, B. and Hotho, A.
%B = Proc. NSF Workshop on Next Generation Data Mining
%C = Baltimore
%D = 2002
%T = Usage Mining for and on the Semantic Web
%U = http://www.kde.cs.uni-kassel.de/stumme/papers/2002/NSF-NGDM02.pdf
Zhang, D. & Dong, Y.
(2002):
A novel Web usage mining approach for search engines.
In: Computer Networks,
Number: 3,
Vol. 39,
Publisher: Elsevier.
Year: 2002.
Pages: 303-310.
Web usage mining can be very useful to search engines. This paper proposes a novel effective approach to exploit the relationships among users, queries and resources based on the search engine's log. How this method can be applied is illustrated by a Chinese image search engine.
@article{zhang2002web,
author = {Zhang, Dell and Dong, Yisheng},
title = {A novel Web usage mining approach for search engines},
editor = {Akyildiz, Ian F. and Rudin, Harry},
journal = {Computer Networks},
publisher = {Elsevier},
year = {2002},
volume = {39},
number = {3},
pages = {303--310},
url = {http://www.sciencedirect.com/science/article/B6VRG-45H0GV7-5/2/16726cebdcde67ba7aeb95cc91e797bf},
doi = {10.1016/S1389-1286(02)00211-6},
issn = {1389-1286},
keywords = {engine, mining, search, usage, web},
abstract = {Web usage mining can be very useful to search engines. This paper proposes a novel effective approach to exploit the relationships among users, queries and resources based on the search engine's log. How this method can be applied is illustrated by a Chinese image search engine.}
}
%0 = article
%A = Zhang, Dell and Dong, Yisheng
%D = 2002
%I = Elsevier
%T = A novel Web usage mining approach for search engines
%U = http://www.sciencedirect.com/science/article/B6VRG-45H0GV7-5/2/16726cebdcde67ba7aeb95cc91e797bf
Stumme, G.; Hotho, A. & Berendt, B. (Eds.)
(2001):
Semantic Web Mining.
Freiburg.
@proceedings{stumme01semantic,
title = {Semantic Web Mining},
editor = {Stumme, G. and Hotho, A. and Berendt, B.},
address = {Freiburg},
year = {2001},
url = {http://semwebmine2001.aifb.uni-karlsruhe.de/online.html},
keywords = {iccs_example, mining, proceeding, semantic, trias_example, web, workshop}
}
%0 = proceedings
%C = Freiburg
%D = 2001
%T = Semantic Web Mining
%U = http://semwebmine2001.aifb.uni-karlsruhe.de/online.html
Tufte, E. R.
(2001):
The Visual Display of Quantitative Information.
Second edition.
Year: 2001.
Publisher: Graphics Press.
@book{tufte2001visual,
author = {Tufte, Edward R.},
title = {The Visual Display of Quantitative Information},
publisher = {Graphics Press},
year = {2001},
edition = {Second},
url = {http://www.amazon.com/Visual-Display-Quantitative-Information-2nd/dp/0961392142%3FSubscriptionId%3D192BW6DQ43CK9FN0ZGG2%26tag%3Dws%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0961392142},
isbn = {0961392142},
keywords = {data, information, mining, toread, visualization}
}
%0 = book
%A = Tufte, Edward R.
%D = 2001
%I = Graphics Press
%T = The Visual Display of Quantitative Information
%U = http://www.amazon.com/Visual-Display-Quantitative-Information-2nd/dp/0961392142%3FSubscriptionId%3D192BW6DQ43CK9FN0ZGG2%26tag%3Dws%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0961392142
Fayyad, U. M.; Piatetsky-Shapiro, G. & Smyth, P.
(1996):
From data mining to knowledge discovery: an overview.
In: Advances in knowledge discovery and data mining.
Editors: Fayyad, U. M.; Piatetsky-Shapiro, G.; Smyth, P. & Uthurusamy, R.
Publisher: American Association for Artificial Intelligence,
Menlo Park, CA, USA.
Year: 1996.
Pages: 1-34.
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
@incollection{fayyad1996data,
author = {Fayyad, Usama M. and Piatetsky-Shapiro, Gregory and Smyth, Padhraic},
title = {From data mining to knowledge discovery: an overview},
editor = {Fayyad, Usama M. and Piatetsky-Shapiro, Gregory and Smyth, Padhraic and Uthurusamy, Ramasamy},
booktitle = {Advances in knowledge discovery and data mining},
publisher = {American Association for Artificial Intelligence},
address = {Menlo Park, CA, USA},
year = {1996},
pages = {1--34},
url = {http://portal.acm.org/citation.cfm?id=257942},
isbn = {0-262-56097-6},
keywords = {data, discovery, kdd, knowledge, mining},
abstract = {Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.}
}
%0 = incollection
%A = Fayyad, Usama M. and Piatetsky-Shapiro, Gregory and Smyth, Padhraic
%B = Advances in knowledge discovery and data mining
%C = Menlo Park, CA, USA
%D = 1996
%I = American Association for Artificial Intelligence
%T = From data mining to knowledge discovery: an overview
%U = http://portal.acm.org/citation.cfm?id=257942