Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0.
Web Semantics: Science, Services and Agents on the World Wide Web, 8(2-3):95-96, 2010.
Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0; The Future of Knowledge Dissemination: The Elsevier Grand Challenge for the Life Sciences
Bettina Berendt, Andreas Hotho and Gerd Stumme.
Publikationsmanagement mit BibSonomy - ein Social-Bookmarking-System für Wissenschaftler.
HMD -- Praxis der Wirtschaftsinformatik, Heft 271:47-58, 2010.
Andreas Hotho, Dominik Benz, Folke Eisterlehner, Robert Jäschke, Beate Krause, Christoph Schmitz and Gerd Stumme.
Collaborative tagging or social bookmarking systems such as Delicious, Mister Wong, or our own system BibSonomy are becoming increasingly popular and form a central part of today's Web 2.0. In such systems, users create lightweight conceptual structures, so-called folksonomies, which structure the users' data. Ease of use, ubiquity, constant availability, and the ability to spontaneously discover like-minded people in such systems, or simply to use them as an information source, are reasons for their current success. The article introduces the notion of social bookmarking and discusses its central elements (such as browsing and search) using BibSonomy as an example, following the typical workflows of a researcher. We describe the architecture of BibSonomy as well as ways of integrating and linking BibSonomy with content management systems and websites. The article concludes with cross-references to current research questions in the area of social bookmarking.
Stop Thinking, Start Tagging - Tag Semantics Emerge from Collaborative Verbosity.
In: Proceedings of the 19th International World Wide Web Conference (WWW 2010).
ACM, Raleigh, NC, USA, 2010.
Christian Körner, Dominik Benz, Markus Strohmaier, Andreas Hotho and Gerd Stumme.
Recent research provides evidence for the presence of emergent semantics in collaborative tagging systems. While several methods have been proposed, little is known about the factors that influence the evolution of semantic structures in these systems. A natural hypothesis is that the quality of the emergent semantics depends on the pragmatics of tagging: Users with certain usage patterns might contribute more to the resulting semantics than others. In this work, we propose several measures which enable a pragmatic differentiation of taggers by their degree of contribution to emerging semantic structures. We distinguish between categorizers, who typically use a small set of tags as a replacement for hierarchical classification schemes, and describers, who are annotating resources with a wealth of freely associated, descriptive keywords. To study our hypothesis, we apply semantic similarity measures to 64 different partitions of a real-world and large-scale folksonomy containing different ratios of categorizers and describers. Our results not only show that ‘verbose’ taggers are most useful for the emergence of tag semantics, but also that a subset containing only 40% of the most ‘verbose’ taggers can produce results that match and even outperform the semantic precision obtained from the whole dataset. Moreover, the results suggest that there exists a causal link between the pragmatics of tagging and resulting emergent semantics. This work is relevant for designers and analysts of tagging systems interested (i) in fostering the semantic development of their platforms, (ii) in identifying users introducing “semantic noise”, and (iii) in learning ontologies.
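The distinction between categorizers and describers drawn in this abstract can be illustrated with a very simple measure. The sketch below is only an assumption for illustration: the abstract does not define its measures, and the ratio of distinct tags to annotated resources, the function name `tags_per_resource`, and the toy posts are all hypothetical.

```python
# Hypothetical posts as (user, resource, tags); not data from the paper.
POSTS = [
    ("alice", "r1", {"web"}),
    ("alice", "r2", {"web"}),
    ("alice", "r3", {"python"}),
    ("alice", "r4", {"python"}),
    ("bob", "r1", {"semantic", "web", "w3c", "rdf"}),
    ("bob", "r2", {"tagging", "folksonomy", "study"}),
]


def tags_per_resource(posts):
    """Return, per user, the number of distinct tags divided by the
    number of distinct resources that user annotated."""
    vocab = {}
    resources = {}
    for user, resource, tags in posts:
        vocab.setdefault(user, set()).update(tags)
        resources.setdefault(user, set()).add(resource)
    return {user: len(vocab[user]) / len(resources[user]) for user in vocab}


ratios = tags_per_resource(POSTS)
for user, ratio in sorted(ratios.items()):
    print(user, ratio)
```

Under this assumed measure, a low ratio (a small vocabulary reused across many resources) points toward categorizer-like behaviour, and a high ratio toward describer-like behaviour.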
Visit me, click me, be my friend: An analysis of evidence networks of user relationships in BibSonomy.
In: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia.
Toronto, Canada, 2010.
Folke Mitzlaff, Dominik Benz, Gerd Stumme and Andreas Hotho.
Semantic Grounding of Tag Relatedness in Social Bookmarking Systems.
In: A. Sheth, S. Staab, M. Dean, M. Paolucci, D. Maynard, T. Finin and K. Thirunarayan, editors,
The Semantic Web - ISWC 2008, pages 615-631.
Springer Berlin / Heidelberg, 2008.
Ciro Cattuto, Dominik Benz, Andreas Hotho and Gerd Stumme.
Collaborative tagging systems have nowadays become important data sources for populating semantic web applications. For tasks like synonym detection and discovery of concept hierarchies, many researchers have introduced measures of tag similarity. Even though most of these measures appear very natural, their design often seems to be rather ad hoc, and the underlying assumptions on the notion of similarity are not made explicit. A more systematic characterization and validation of tag similarity in terms of formal representations of knowledge is still lacking. Here we address this issue and analyze several measures of tag similarity: Each measure is computed on data from the social bookmarking system del.icio.us and a semantic grounding is provided by mapping pairs of similar tags in the folksonomy to pairs of synsets in WordNet, where we use validated measures of semantic distance to characterize the semantic relation between the mapped tags. This exposes important features of the investigated similarity measures and indicates which ones are better suited in the context of a given semantic application.
Semantic Network Analysis of Ontologies.
In: Y. Sure and J. Domingue, editors,
The Semantic Web: Research and Applications, volume 4011, series LNAI, pages 514-529.
Springer, Heidelberg, 2006.
Bettina Hoser, Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme.
A key argument for modeling knowledge in ontologies is the easy re-use and re-engineering of the knowledge. However, besides consistency checking, current ontology engineering tools provide only basic functionalities for analyzing ontologies. Since ontologies can be considered as (labeled, directed) graphs, graph analysis techniques are a suitable answer for this need. Graph analysis has been performed by sociologists for over 60 years, and resulted in the vivid research area of Social Network Analysis (SNA). While social network structures in general currently receive high attention in the Semantic Web community, there are only very few SNA applications up to now, and virtually none for analyzing the structure of ontologies. We illustrate in this paper the benefits of applying SNA to ontologies and the Semantic Web, and discuss which research topics arise on the edge between the two areas. In particular, we discuss how different notions of centrality describe the core content and structure of an ontology. From the rather simple notion of degree centrality over betweenness centrality to the more complex eigenvector centrality based on Hermitian matrices, we illustrate the insights these measures provide on two ontologies, which are different in purpose, scope, and size.
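As a rough illustration of the simplest measure named in this abstract, the sketch below computes normalized in- and out-degree centrality on an ontology viewed as a labeled, directed graph. The toy triples and the function name `degree_centrality` are hypothetical and not taken from the paper; betweenness and eigenvector centrality are not covered here.

```python
from collections import defaultdict

# Hypothetical toy ontology as (subject, relation, object) triples.
TRIPLES = [
    ("Student", "subClassOf", "Person"),
    ("Professor", "subClassOf", "Person"),
    ("Professor", "teaches", "Course"),
    ("Student", "attends", "Course"),
]


def degree_centrality(triples):
    """Return {node: (in_degree, out_degree)}, each divided by (n - 1),
    where n is the number of nodes in the graph."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for subj, _relation, obj in triples:
        out_deg[subj] += 1   # edge leaves the subject
        in_deg[obj] += 1     # edge enters the object
        nodes.update((subj, obj))
    scale = max(len(nodes) - 1, 1)
    return {n: (in_deg[n] / scale, out_deg[n] / scale) for n in sorted(nodes)}


centrality = degree_centrality(TRIPLES)
for node, (indeg, outdeg) in centrality.items():
    print(f"{node}: in={indeg:.2f} out={outdeg:.2f}")
```

In this toy graph the concepts that many others point to (here `Person` and `Course`) receive the highest in-degree, which is the kind of "core content" signal the abstract alludes to.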
WikiRelate! Computing Semantic Relatedness Using Wikipedia.
In: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, series AAAI'06, pages 1419-1424.
AAAI Press, 2006.
Michael Strube and Simone Paolo Ponzetto.
Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts.
Semantic Web Mining - State of the Art and Future Directions.
Journal of Web Semantics, 4(2):124-143, 2006.
Gerd Stumme, Andreas Hotho and Bettina Berendt.
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: an increasing number of researchers is working on improving the results of Web Mining by exploiting semantic structures in the Web, and they make use of Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The Semantic Web is the second-generation WWW, enriched by machine-processable information which supports the user in his tasks. Given the enormous size even of today’s Web, it is impossible to manually enrich all of these resources. Therefore, automated schemes for learning the relevant information are increasingly being used. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data being mined, the discovery of meaning is impossible based on these data only. Therefore, formalizations of the semantics of Web sites and navigation behavior are becoming more and more common. Furthermore, mining the Semantic Web itself is another upcoming application. We argue that the two areas Web Mining and Semantic Web need each other to fulfill their goals, but that the full potential of this convergence is not yet realized. This paper gives an overview of where the two areas meet today, and sketches ways of how a closer integration could be profitable.
A Roadmap for Web Mining: From Web to Semantic Web.
In: B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou and G. Stumme, editors,
Web Mining: From Web to Semantic Web, volume 3209, pages 1-22.
Springer, Heidelberg, 2004.
Bettina Berendt, Andreas Hotho, Dunja Mladenic, Maarten van Someren, Myra Spiliopoulou and Gerd Stumme.
The purpose of Web mining is to develop methods and systems for discovering models of objects and processes on the World Wide Web and for web-based systems that show adaptive performance. Web Mining integrates three parent areas: Data Mining (we use this term here also for the closely related areas of Machine Learning and Knowledge Discovery), Internet technology and World Wide Web, and the more recent Semantic Web. The World Wide Web has made an enormous amount of information electronically accessible. The use of email, news and markup languages like HTML allows users to publish and read documents at a world-wide scale and to communicate via chat connections, including information in the form of images and voice records. The HTTP protocol that enables access to documents over the network via Web browsers created an immense improvement in communication and access to information. For some years these possibilities were used mostly in the scientific world but recent years have seen an immense growth in popularity, supported by the wide availability of computers and broadband communication. The use of the internet for tasks other than finding information and direct communication is increasing, as can be seen from the interest in “e-activities” such as e-commerce, e-learning, e-government, e-science.
Usage Mining for and on the Semantic Web.
In: H. Kargupta, A. Joshi, K. Sivakumar and Y. Yesha, editors,
Data Mining Next Generation Challenges and Future Directions, pages 461-481.
AAAI Press, Boston, 2004.
Bettina Berendt, Andreas Hotho and Gerd Stumme.
Semantic Web Mining aims at combining the two fast-developing
research areas Semantic Web and Web Mining.
Web Mining aims at discovering insights about the meaning of Web
resources and their usage. Given the primarily syntactical nature
of data Web mining operates on, the discovery of meaning is
impossible based on these data only. Therefore, formalizations of
the semantics of Web resources and navigation behavior are
increasingly being used. This fits exactly with the aims of the
Semantic Web: the Semantic Web enriches the WWW by
machine-processable information which supports the user in his
tasks. In this paper, we discuss the interplay of the Semantic Web
with Web Mining, with a specific focus on usage mining.
Web Mining: From Web to Semantic Web, First European Web
Mining Forum, EWMF 2003, Cavtat-Dubrovnik, Croatia, September
22, 2003, Revised Selected and Invited Papers.
LNAI, volume 3209.
Springer, Heidelberg, 2004.
http://km.aifb.uni-karlsruhe.de/ws/ewmf03/.
Bettina Berendt, Andreas Hotho, Dunja Mladenic, Maarten van Someren, Myra Spiliopoulou and Gerd Stumme.
Semantic resource management for the web: an e-learning application.
In: Proc. 13th International World Wide Web Conference (WWW 2004), pages 1-10.
2004.
Julien Tane, Christoph Schmitz and Gerd Stumme.
Semantic Methods and Tools for Information Portals.
In: K. Dittrich, W. König, A. Oberweis, K. Rannenberg and W. Wahlster, editors,
INFORMATIK 2003 - Innovative Informatikanwendungen (Band 1), volume 34, series LNI, pages 116-131.
Gesellschaft für Informatik, Bonn, 2003.
Sudhir Agarwal, Peter Fankhauser, Jorge Gonzalez-Ollala, Jens Hartmann, Silvia Hollfelder, Anthony Jameson, Stefan Klink, Patrick Lehti, Michael Ley, Emma Rabbidge, Eric Schwarzkopf, Nitesh Shrestha, Nenad Stojanovic, Rudi Studer, Gerd Stumme and Bernd Walter.
The paper describes a set of approaches for representing and
accessing information within a semantically structured information
portal, while offering the possibility to integrate one's own
information. It discusses research performed within the project
'Semantic Methods and Tools for Information Portals (SemIPort)'.
In particular, it focuses on (1) the development of scalable
storing, processing and querying methods for semantic data, (2)
visualization and browsing of complex data inventories, (3)
personalization and agent-based interaction, and (4) the
enhancement of web mining approaches for use within a
semantics-based portal.
Text Clustering Based on Background Knowledge.
Technical Report, University of Karlsruhe, Institute AIFB, 2003.
Andreas Hotho, Steffen Staab and Gerd Stumme.
Text document clustering plays an important role in providing intuitive
navigation and browsing mechanisms by organizing large amounts of information
into a small number of meaningful clusters. Standard partitional or agglomerative
clustering methods efficiently compute results to this end.
However, the bag of words representation used for these clustering methods is often
unsatisfactory as it ignores relationships between important terms that do not
co-occur literally. Also, it is mostly left to the user to find out why a particular partitioning
has been achieved, because it is only specified extensionally. In order to
deal with the two problems, we integrate background knowledge into the process of
clustering text documents.
First, we preprocess the texts, enriching their representations by background knowledge
provided in a core ontology (in our application, WordNet). Then, we cluster
the documents by a partitional algorithm. Our experimental evaluation on Reuters
newsfeeds compares clustering results with pre-categorizations of news. In the experiments,
improvements of results by background knowledge compared to the baseline
can be shown for many interesting tasks.
Second, the clustering partitions the large number of documents into a relatively small
number of clusters, which may then be analyzed by conceptual clustering. In our approach,
we applied Formal Concept Analysis. Conceptual clustering techniques are
known to be too slow for directly clustering several hundreds of documents, but they
give an intensional account of cluster results. They allow for a concise description
of commonalities and distinctions of different clusters. With background knowledge
they even find abstractions like “food” (vs. specializations like “beef” or “corn”).
Thus, in our approach, partitional clustering first reduces the size of the problem
such that it becomes tractable for conceptual clustering, which then facilitates the
understanding of the results.
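The preprocessing step described in this abstract, enriching a bag-of-words representation with concepts from background knowledge, can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny `HYPERNYMS` dictionary is a hypothetical stand-in for a WordNet lookup, the sample documents are invented, and cosine similarity is used here only to show the effect on document similarity (the report itself goes on to cluster the enriched documents with a partitional algorithm).

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet hypernym lookup.
HYPERNYMS = {"beef": "food", "corn": "food"}


def enrich(tokens):
    """Return a bag of words with the hypernym concept of each known
    term added alongside the original terms."""
    bag = Counter(tokens)
    for token in tokens:
        if token in HYPERNYMS:
            bag[HYPERNYMS[token]] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two term-frequency bags."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = ["beef", "prices", "rise"]
doc2 = ["corn", "exports", "fall"]

plain = cosine(Counter(doc1), Counter(doc2))   # no literal term overlap
enriched = cosine(enrich(doc1), enrich(doc2))  # both now share "food"
print(plain, enriched)
```

The two documents share no literal term, so their plain similarity is zero; after enrichment both contain the concept "food", which is the kind of relationship between non-co-occurring terms that the abstract says a plain bag of words ignores.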
Semantic Web Mining. Proc. of the Semantic Web Mining Workshop of the 13th Europ. Conf. on
Machine Learning (ECML'02) / 6th Europ. Conf. on Principles and
Practice of Knowledge Discovery in Databases (PKDD'02).
Helsinki, Finland, 2002.
B. Berendt, A. Hotho and G. Stumme.
Towards Semantic Web Mining.
In: I. Horrocks and J. Hendler, editors,
The Semantic Web - ISWC 2002, series LNCS, pages 264-278.
Springer, Heidelberg, 2002.
B. Berendt, A. Hotho and G. Stumme.
KAON - Towards a large scale Semantic Web.
In: K. Bauknecht, A. M. Tjoa and G. Quirchmayr, editors,
Proceedings of the Third International Conference on E-Commerce and Web Technologies (EC-Web 2002), Aix-en-Provence, France, volume 2455, series LNCS, pages 304-313.
Springer, 2002.
E. Bozsak, Marc Ehrig, Siegfried Handschuh, Andreas Hotho, Alexander Maedche, Boris Motik, Daniel Oberle, Christoph Schmitz, Steffen Staab, Ljiljana Stojanovic, Nenad Stojanovic, Rudi Studer, Gerd Stumme, York Sure, Julien Tane, Raphael Volz and Valentin Zacharias.
Semantic Methods and Tools for Information Portals - The SemIPort Project (Project Description).
In: B. Berendt, A. Hotho and G. Stumme, editors,
Semantic Web Mining. Proc. of the Semantic Web Mining Workshop of the 13th Europ. Conf., page 90.
Helsinki, 2002.
J. Gonzalez-Olalla and G. Stumme.
Usage Mining for and on the Semantic Web.
In: Proc. NSF Workshop on Next Generation Data Mining, pages 77-86.
Baltimore, 2002.
G. Stumme, B. Berendt and A. Hotho.
Using Ontologies and Formal Concept Analysis for Organizing Business Knowledge.
In: J. Becker and R. Knackstedt, editors,
Wissensmanagement mit Referenzmodellen - Konzepte für die Anwendungssystem- und Organisationsgestaltung, pages 163-174.
Physica, Heidelberg, 2002.
G. Stumme.