Publications

2012

Liu, X.; Lu, M.; Ooi, B. C.; Shen, Y.; Wu, S. & Zhang, M. (2012): CDAS: a crowdsourcing data analytics system. In: Proceedings of the VLDB Endowment, Number: 10, Vol. 5, Publisher: VLDB Endowment. Year: 2012. Pages: 1040-1051. [Full text]

@article{liu2012crowdsourcing,
  author = {Liu, Xuan and Lu, Meiyu and Ooi, Beng Chin and Shen, Yanyan and Wu, Sai and Zhang, Meihui},
  title = {CDAS: a crowdsourcing data analytics system},
  journal = {Proceedings of the VLDB Endowment},
  publisher = {VLDB Endowment},
  year = {2012},
  volume = {5},
  number = {10},
  pages = {1040--1051},
  url = {http://dl.acm.org/citation.cfm?id=2336664.2336676},
  issn = {2150-8097},
  keywords = {analytics, cdas, collective, crowdsourcing, data, intelligence, mining, web},
  abstract = {Some complex problems, such as image tagging and natural language processing, are very challenging for computers, where even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution -- employing human participation -- to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks. A complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively. To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core part of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine to process and monitor the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy user-required accuracy, the model guides the crowdsourcing query engine for the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performances. When verifying the quality of the result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS, a Twitter sentiment analytics job and an image tagging job. We use real Twitter and Flickr data as our queries respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve a much higher accuracy. By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.}
  }

2009

Rauber, A. & Kaiser, M. (2009): Webarchivierung und Web Archive Mining: Notwendigkeit, Probleme und Lösungsansätze. In: HMD Praxis der Wirtschaftsinformatik, Vol. 268, Publisher: dpunkt.verlag. Year: 2009. [Full text]

@article{rauber2009webarchivierung,
  author = {Rauber, Andreas and Kaiser, Max},
  title = {Webarchivierung und Web Archive Mining: Notwendigkeit, Probleme und Lösungsansätze},
  editor = {Knoll, Matthias and Meier, Andreas},
  journal = {HMD Praxis der Wirtschaftsinformatik},
  publisher = {dpunkt.verlag},
  year = {2009},
  volume = {268},
  url = {http://hmd.dpunkt.de/268/03.php},
  issn = {1436-3011},
  keywords = {archive, law, mining, privacy, web},
  abstract = {In recent years, libraries and archives have increasingly taken on the task of collecting content from the World Wide Web alongside conventional publications, in order to preserve this valuable part of our cultural heritage and to keep important information available in the long term. These massive data collections offer fascinating opportunities to quickly access important information that has already been lost from the live Web. They are an indispensable source for researchers who, in the future, will want to trace the social and technological development of our time. On the other hand, such a data collection constitutes an entirely new body of data that raises not only legal but also numerous ethical questions regarding its use. These questions will grow as the technical means for automatically analyzing and interpreting the data become more powerful. Because most Web archiving initiatives are aware of this problem, use of the data currently remains heavily restricted in most cases, or some form of "opt-out" option is provided that lets website owners exclude their pages from a Web archive. With either approach, Web archives cannot realize their full potential for use. This article first briefly describes the technologies used to collect Web content for archiving purposes. It questions assumptions concerning the free availability of the data and different kinds of use. Building on this, it identifies a number of open questions whose resolution could allow broader access to and better use of Web archives.}
  }

2008

Pang, B. & Lee, L. (2008): Opinion Mining and Sentiment Analysis. In: Foundations and Trends in Information Retrieval, Number: 1-2, Vol. 2, Publisher: Now Publishers Inc. Year: 2008. Pages: 1-135. [Full text]

Wu, X.; Kumar, V.; Quinlan, J. R.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.; Ng, A.; Liu, B.; Yu, P.; Zhou, Z.-H.; Steinbach, M.; Hand, D. & Steinberg, D. (2008): Top 10 algorithms in data mining. In: Knowledge and Information Systems, Number: 1, Vol. 14, Publisher: Springer. Year: 2008. Pages: 1-37. [Full text]

2007

Kim, H. L.; Hwang, S. H. & Kim, H. G. (2007): FCA-based approach for mining contextualized folksonomy. In: SAC '07: Proceedings of the 2007 ACM symposium on Applied computing, New York, NY, USA. [Full text]

Romero, C. & Ventura, S. (2007): Educational data mining: A survey from 1995 to 2005. In: Expert Syst. Appl., Number: 1, Vol. 33, Publisher: Pergamon Press, Inc. Year: 2007. Pages: 135-146. [Full text]

2006

Hotho, A.; Jäschke, R.; Schmitz, C. & Stumme, G. (2006): Information Retrieval in Folksonomies: Search and Ranking. In: The Semantic Web: Research and Applications, Heidelberg.

Schmitz, C.; Hotho, A.; Jäschke, R. & Stumme, G. (2006): Mining Association Rules in Folksonomies. In: Data Science and Classification, Berlin, Heidelberg.

2005

Berendt, B.; Hotho, A. & Stumme, G. (2005): Semantic Web Mining and the Representation, Analysis, and Evolution of Web Space. In: Proc. of the 1st Intl. Workshop on Representation and Analysis of Web Space.

2004

Berendt, B.; Hotho, A. & Stumme, G. (2004): Usage Mining for and on the Semantic Web. In: Data Mining Next Generation Challenges and Future Directions. Editors: Kargupta, H.; Joshi, A.; Sivakumar, K. & Yesha, Y. Publisher: AAAI Press, Boston. Year: 2004. Pages: 461-481. [Full text]

2003

Orlando, S.; Palmerini, P.; Perego, R. & Silvestri, F. (2003): An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets. In: High Performance Computing for Computational Science — VECPAR 2002. [Full text]

2002

Hartmann, J.; Hotho, A. & Stumme, G. (2002): Semantic Web Mining for Building Information Portals (Position Paper). In: Proc. Arbeitskreistreffen Knowledge Discovery, Oldenburg. [Full text]

Zhang, D. & Dong, Y. (2002): A novel Web usage mining approach for search engines. In: Computer Networks, Number: 3, Vol. 39, Publisher: Elsevier. Year: 2002. Pages: 303-310. [Full text]

2001

Tufte, E. R. (Ed.) (2001): The Visual Display of Quantitative Information. Second edition. Year: 2001. Publisher: Graphics Press. [Full text]

1996

Fayyad, U. M.; Piatetsky-Shapiro, G. & Smyth, P. (1996): From data mining to knowledge discovery: an overview. In: Advances in knowledge discovery and data mining. Editors: Fayyad, U. M.; Piatetsky-Shapiro, G.; Smyth, P. & Uthurusamy, R. Publisher: American Association for Artificial Intelligence, Menlo Park, CA, USA. Year: 1996. Pages: 1-34. [Full text]