TY - CONF AU - Pereira Nunes, Bernardo AU - Kawase, Ricardo AU - Dietze, Stefan AU - Taibi, Davide AU - Casanova, Marco Antonio AU - Nejdl, Wolfgang A2 - Rizzo, Giuseppe A2 - Mendes, Pablo A2 - Charton, Eric A2 - Hellmann, Sebastian A2 - Kalyanpur, Aditya T1 - Can Entities be Friends? T2 - Proceedings of the Web of Linked Entities Workshop in conjuction with the 11th International Semantic Web Conference PB - CY - PY - 2012/november M2 - VL - 906 IS - SP - 45 EP - 57 UR - http://ceur-ws.org/Vol-906/paper6.pdf M3 - KW - data KW - detection KW - entity KW - graph KW - linked KW - relation KW - web L1 - SN - N1 - N1 - AB - The richness of the (Semantic) Web lies in its ability to link related resources as well as data across the Web. However, while relations within particular datasets are often well defined, links between disparate datasets and corpora of Web resources are rare. The increasingly widespread use of cross-domain reference datasets, such as Freebase and DBpedia for annotating and enriching datasets as well as document corpora, opens up opportunities to exploit their inherent semantics to uncover semantic relationships between disparate resources. In this paper, we present an approach to uncover relationships between disparate entities by analyzing the graphs of used reference datasets. We adapt a relationship assessment methodology from social network theory to measure the connectivity between entities in reference datasets and exploit these measures to identify correlated Web resources. Finally, we present an evaluation of our approach using the publicly available datasets Bibsonomy and USAToday. ER - TY - JOUR AU - Borges, Eduardo N. AU - Becker, Karin AU - Heuser, Carlos A. 
AU - Galante, Renata T1 - A Classification-based Approach for Bibliographic Metadata Deduplication JO - Proceedings of the IADIS International Conference WWW/Internet 2011 PY - 2011/ VL - IS - SP - 221 EP - 228 UR - http://www.eduardo.c3.furg.br/arquivos/download/www-internet2011.pdf M3 - KW - bibliographic KW - classification KW - detection KW - duplicate KW - metadata L1 - SN - N1 - N1 - AB - Digital libraries of scientific articles describe them using a set of metadata, including bibliographic references. These references can be represented by several formats and styles. Considerable content variations can occur in some metadata fields such as title, author names and publication venue. Besides, it is quite common to find references that omit some metadata fields such as page numbers. Duplicate entries influence the quality of digital library services, since they need to be appropriately identified and treated. This paper presents a comparative analysis among different data classification algorithms used to identify duplicated bibliographic metadata records. We have investigated the discovered patterns by comparing the rules and the decision tree with the heuristics adopted in a previous work. Our experiments show that the combination of specific-purpose similarity functions previously proposed and classification algorithms represents an improvement of up to 12% when compared to the experiments using our original approach. 
ER - TY - CONF AU - Mitzlaff, Folke AU - Benz, Dominik AU - Stumme, Gerd AU - Hotho, Andreas A2 - T1 - Visit me, click me, be my friend: an analysis of evidence networks of user relationships in BibSonomy T2 - HT '10: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia PB - ACM CY - New York, NY, USA PY - 2010/ M2 - VL - IS - SP - 265 EP - 270 UR - http://portal.acm.org/citation.cfm?id=1810617.1810664 M3 - 10.1145/1810617.1810664 KW - bibsonomy KW - collaborative KW - community KW - detection KW - evidence KW - folksonomy KW - network KW - tagging L1 - SN - 978-1-4503-0041-4 N1 - N1 - AB - The ongoing spread of online social networking and sharing sites has reshaped the way people interact with each other. Analyzing the relatedness of different users within the resulting large populations of these systems plays an important role for tasks like user recommendation or community detection. Algorithms in these fields typically face the problem that explicit user relationships (like friend lists) are often very sparse. Surprisingly, implicit evidence (like click logs) of user relations has hardly been considered to this end. Based on our long-time experience with running BibSonomy [4], we identify in this paper different evidence networks of user relationships in our system. We broadly classify each network based on whether the links are explicitly established by the users (e.g., friendship or group membership) or accrue implicitly in the running system (e.g., when user u copies an entry of user v). We systematically analyze structural properties of these networks and whether topological closeness (in terms of the length of shortest paths) coincides with semantic similarity between users. ER - TY - CONF AU - Voss, Jakob AU - Hotho, Andreas AU - Jäschke, Robert A2 - Kuhlen, Rainer T1 - Mapping Bibliographic Records with Bibliographic Hash Keys T2 - Information: Droge, Ware oder Commons? 
PB - Verlag Werner Hülsbusch CY - PY - 2009/ M2 - VL - IS - SP - EP - UR - http://eprints.rclis.org/15953/ M3 - KW - 2009 KW - bibkey KW - bibliographic KW - bibtex KW - detection KW - duplicate KW - hash KW - key KW - myown L1 - SN - N1 - N1 - AB - This poster presents a set of hash keys for bibliographic records called bibkeys. Unlike other methods of duplicate detection, bibkeys can directly be calculated from a set of basic metadata fields (title, authors/editors, year). It is shown how bibkeys are used to map similar bibliographic records in BibSonomy and among distributed library catalogs and other distributed databases. ER - TY - CONF AU - Java, Akshay AU - Joshi, Anupam AU - Finin, Tim A2 - T1 - Detecting Communities via Simultaneous Clustering of Graphs and Folksonomies T2 - WebKDD 2008 Workshop on Web Mining and Web Usage Analysis PB - CY - PY - 2008/08 M2 - VL - IS - SP - EP - UR - M3 - KW - clustering KW - community KW - detection L1 - SN - N1 - N1 - AB - ER - TY - JOUR AU - Barber, M. J. T1 - Modularity and community detection in bipartite networks JO - Physical Review E PY - 2007/ VL - 76 IS - 6 SP - EP - UR - http://arxiv.org/abs/arXiv:0707.1616 M3 - 10.1103/PhysRevE.76.066102 KW - bipartite KW - clustering KW - community KW - detection KW - graph KW - modularity KW - network L1 - SN - N1 - N1 - AB - The modularity of a network quantifies the extent, relative to a null model network, to which vertices cluster into community groups. We define a null model appropriate for bipartite networks, and use it to define a bipartite modularity. The bipartite modularity is presented in terms of a modularity matrix B; some key properties of the eigenspectrum of B are identified and used to describe an algorithm for identifying modules in bipartite networks. The algorithm is based on the idea that the modules in the two parts of the network are dependent, with each part mutually being used to induce the vertices for the other part into the modules. 
We apply the algorithm to real-world network data, showing that the algorithm successfully identifies the modular structure of bipartite networks. ER - TY - JOUR AU - Guimerà, R. AU - Sales-Pardo, M. AU - Amaral, L.A.N. T1 - Module identification in bipartite and directed networks JO - Physical review. E, Statistical, nonlinear, and soft matter physics PY - 2007/ VL - 76 IS - 3 Pt 2 SP - EP - UR - http://arxiv.org/abs/physics/0701151 M3 - 10.1103/PhysRevE.76.036102 KW - bipartite KW - clustering KW - community KW - detection KW - graph KW - modularity KW - module KW - network L1 - SN - N1 - N1 - AB - Modularity is one of the most prominent properties of real-world complex networks. Here, we address the issue of module identification in two important classes of networks: bipartite networks and directed unipartite networks. Nodes in bipartite networks are divided into two non-overlapping sets, and the links must have one end node from each set. Directed unipartite networks only have one type of nodes, but links have an origin and an end. We show that directed unipartite networks can be conveniently represented as bipartite networks for module identification purposes. We report a novel approach especially suited for module detection in bipartite networks, and define a set of random networks that enable us to validate the new approach. ER - TY - CONF AU - Hotho, Andreas AU - Jäschke, Robert AU - Schmitz, Christoph AU - Stumme, Gerd A2 - Avrithis, Yannis S. A2 - Kompatsiaris, Yiannis A2 - Staab, Steffen A2 - O'Connor, Noel E. T1 - Trend Detection in Folksonomies T2 - Proc. 
First International Conference on Semantics And Digital Media Technology (SAMT) PB - Springer CY - Heidelberg PY - 2006/12 M2 - VL - 4306 IS - SP - 56 EP - 70 UR - http://www.kde.cs.uni-kassel.de/stumme/papers/2006/hotho2006trend.pdf M3 - KW - 2006 KW - detection KW - folksonomy KW - l3s KW - myown KW - trend L1 - SN - 3-540-49335-2 N1 - N1 - AB - As the number of resources on the web exceeds by far the number of documents one can track, it becomes increasingly difficult to remain up to date on one's own areas of interest. The problem becomes more severe with the increasing fraction of multimedia data, from which it is difficult to extract some conceptual description of their contents.

One way to overcome this problem is social bookmark tools, which are rapidly emerging on the web. In such systems, users are setting up lightweight conceptual structures called folksonomies, and thus overcome the knowledge acquisition bottleneck. As more and more people participate in the effort, the use of a common vocabulary becomes more and more stable. We present an approach for discovering topic-specific trends within folksonomies. It is based on a differential adaptation of the PageRank algorithm to the triadic hypergraph structure of a folksonomy. The approach allows for any kind of data, as it does not rely on the internal structure of the documents. In particular, this allows different data types to be considered in the same analysis step. We run experiments on a large-scale real-world snapshot of a social bookmarking system. ER - TY - CONF AU - Jäschke, Robert AU - Hotho, Andreas AU - Schmitz, Christoph AU - Stumme, Gerd A2 - Braß, Stefan A2 - Hinneburg, Alexander T1 - Wege zur Entdeckung von Communities in Folksonomies T2 - Proc. 18. Workshop Grundlagen von Datenbanken PB - Martin-Luther-Universität CY - Halle-Wittenberg PY - 2006/06 M2 - VL - IS - SP - 80 EP - 84 UR - http://www.kde.cs.uni-kassel.de/jaeschke/pub/jaeschke2006wege_gvd.pdf M3 - KW - 2006 KW - community KW - detection KW - folksonomy KW - iccs_example KW - l3s KW - myown KW - trias_example L1 - SN - N1 - N1 - AB - Folksonomies are an important building block of the newly discovered World Wide Web, the "Web 2.0". In these systems, users can jointly manage resources and annotate them with tags. The resulting conceptual structures are an interesting field of research. This article examines approaches and ways to discover and structure user groups ("communities") in folksonomies. ER - TY - CONF AU - Aggarwal, Charu C. AU - Yu, Philip S. A2 - T1 - Online Analysis of Community Evolution in Data Streams. T2 - SDM PB - CY - PY - 2005/ M2 - VL - IS - SP - EP - UR - http://web.mit.edu/charu/www/aggar142.pdf M3 - KW - data KW - detection KW - stream KW - analysis KW - community L1 - SN - N1 - N1 - AB - ER - TY - JOUR AU - Duch, J. AU - Arenas, A. T1 - Community detection in complex networks using Extremal Optimization JO - Physical Review E PY - 2005/ VL - 72 IS - SP - EP - UR - http://www.citebase.org/abstract?id=oai:arXiv.org:cond-mat/0501368 M3 - KW - community KW - complex KW - detection KW - network L1 - SN - N1 - Citebase - Community detection in complex networks using Extremal Optimization N1 - AB - We propose a novel method to find the community structure in complex networks based on an extremal optimization of the value of modularity. The method outperforms the optimal modularity found by the existing algorithms in the literature. We present the results of the algorithm for computer simulated and real networks and compare them with other approaches. The efficiency and accuracy of the method make it feasible to be used for the accurate identification of community structure in large complex networks. ER - TY - THES AU - Trier, Matthias T1 - IT-supported Visualization and Evaluation of Virtual Knowledge Communities. Applying Social Network Intelligence Software in Knowledge Management to enable knowledge oriented People Network Management PY - 2005/ PB - SP - EP - UR - http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:kobv:83-opus-10720 M3 - KW - social KW - detection KW - knowledge KW - management KW - community KW - network L1 - N1 - N1 - AB - ER - TY - CONF AU - Almeida, Rodrigo B. 
AU - Almeida, Virgilio A. F. A2 - T1 - A community-aware search engine T2 - Proceedings of the 13th international conference on World Wide Web PB - ACM Press CY - New York, NY, USA PY - 2004/ M2 - VL - IS - SP - 413 EP - 421 UR - http://doi.acm.org/10.1145/988672.988728 M3 - KW - search KW - engine KW - detection KW - hits KW - community KW - network L1 - SN - 1-58113-844-X N1 - N1 - AB -
Current search technologies work in a "one size fits all" fashion. Therefore, the answer to a query is independent of specific user information need. In this paper we describe a novel ranking technique for personalized search services that combines content-based and community-based evidence. The community-based information is used in order to provide context for queries and is influenced by the current interaction of the user with the service. Our algorithm is evaluated using data derived from an actual service available on the Web, an online bookstore. We show that the quality of content-based ranking strategies can be improved by the use of community information as another evidential source of relevance. In our experiments the improvements reach up to 48% in terms of average precision. ER - TY - GEN AU - Radicchi, Filippo AU - Castellano, Claudio AU - Cecconi, Federico AU - Loreto, Vittorio AU - Parisi, Domenico A2 - T1 - Defining and identifying communities in networks JO - PB - AD - PY - 2004/02 VL - IS - SP - EP - UR - http://arxiv.org/abs/cond-mat/0309488 M3 - KW - graph KW - gn KW - detection KW - network KW - community L1 - N1 - N1 - AB - The investigation of community structures in networks is an important issue in many domains and disciplines. This problem is relevant for social tasks (objective analysis of relationships on the web), biological inquiries (functional studies in metabolic, cellular or protein networks) or technological problems (optimization of large infrastructures). Several types of algorithm exist for revealing the community structure in networks, but a general and quantitative definition of community is still lacking, leading to an intrinsic difficulty in the interpretation of the results of the algorithms without any additional non-topological information. In this paper we face this problem by introducing two quantitative definitions of community and by showing how they are implemented in practice in the existing algorithms. In this way the algorithms for the identification of the community structure become fully self-contained. Furthermore, we propose a new local algorithm to detect communities which outperforms the existing algorithms with respect to the computational cost, keeping the same level of reliability. The new algorithm is tested on artificial and real-world graphs. In particular we show the application of the new algorithm to a network of scientific collaborations, which, for its size, cannot be attacked with the usual methods. This new class of local algorithms could open the way to applications to large-scale
technological and biological applications. ER - TY - GEN AU - Almeida, R.B. AU - Almeida, V.A.F. A2 - T1 - Design and evaluation of a user-based community discovery technique JO - PB - AD - PY - 2003/ VL - IS - SP - 17 EP - 23 UR - citeseer.ist.psu.edu/almeida03design.html M3 - KW - detection KW - hits KW - community KW - network L1 - N1 - N1 - AB - ER - TY - CONF AU - Kubica, Jeremy AU - Moore, Andrew AU - Schneider, Jeff A2 - Wu, Xindong A2 - Tuzhilin, Alex A2 - Shavlik, Jude T1 - Tractable Group Detection on Large Link Data Sets T2 - The Third IEEE International Conference on Data Mining PB - IEEE Computer Society CY - PY - 2003/november M2 - VL - IS - SP - 573 EP - 576 UR - M3 - KW - large KW - detection KW - community KW - network KW - gda L1 - SN - N1 - N1 - AB - ER - TY - RPRT AU - Kubica, Jeremy Martin AU - Moore, Andrew AU - Schneider, Jeff A2 - T1 - K-groups: Tractable Group Detection on Large Link Data Sets PB - Robotics Institute, Carnegie Mellon University AD - Pittsburgh, PA PY - 2003/10 VL - IS - CMU-RI-TR-03-32 SP - EP - UR - http://www.ri.cmu.edu/pubs/pub_4489.html M3 - KW - large KW - detection KW - gda KW - network KW - community L1 - N1 - N1 - N1 - AB - Discovering underlying structure from co-occurrence data is an important task in many fields, including: insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. For example a store may wish to identify underlying sets of items purchased together or a human resources department may wish to identify groups of employees that collaborate with each other.

Previously, Kubica et al. presented the group detection algorithm (GDA) - an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, making it potentially infeasible for many real world data sets.

To this end, we present k-groups - an algorithm that uses an approach similar to that of k-means (hard clustering and localized updates) to significantly accelerate the discovery of the underlying groups while retaining GDA's probabilistic model. In addition, we show that k-groups is guaranteed to converge to a local minimum. We also compare the performance of GDA and k-groups on several real world and artificial data sets, showing that k-groups' sacrifice in solution quality is significantly offset by its increase in speed. This trade-off makes group detection tractable on significantly larger data sets. ER - TY - CONF AU - Kubica, Jeremy AU - Moore, Andrew AU - Schneider, Jeff AU - Yang, Yiming A2 - T1 - Stochastic Link and Group Detection T2 - Proceedings of the Eighteenth National Conference on Artificial Intelligence PB - AAAI Press/MIT Press CY - PY - 2002/07 M2 - VL - IS - SP - 798 EP - 804 UR - M3 - KW - detection KW - community KW - network KW - gda L1 - SN - N1 - N1 - AB - ER - TY - CONF AU - Borodin, Allan AU - Roberts, Gareth O. AU - Rosenthal, Jeffrey S. 
AU - Tsaparas, Panayiotis A2 - T1 - Finding authorities and hubs from link structures on the World Wide Web T2 - Proceedings of the 10th international conference on World Wide Web PB - ACM Press CY - New York, NY, USA PY - 2001/ M2 - VL - IS - SP - 415 EP - 429 UR - http://doi.acm.org/10.1145/371920.372096 M3 - KW - detection KW - hits KW - community KW - network L1 - SN - N1 - N1 - AB - ER - TY - JOUR AU - Tejada, Sheila AU - Knoblock, Craig A AU - Minton, Steven T1 - Learning object identification rules for information integration JO - Information Systems PY - 2001/12 VL - 26 IS - 8 SP - 607 EP - 633 UR - http://www.sciencedirect.com/science/article/pii/S0306437901000424 M3 - 10.1016/S0306-4379(01)00042-4 KW - detection KW - duplicate KW - entity KW - extraction KW - identification KW - information KW - integration L1 - SN - N1 - N1 - AB - When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it difficult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects’ shared attributes in order to identify matching objects. Certain attributes are more important for deciding if a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules or mapping rules for determining the mappings between objects. This manual process is time consuming and error-prone. In our approach, Active Atlas learns to tailor mapping rules, through limited user input, to a specific application domain. The experimental results demonstrate that we achieve higher accuracy and require less user involvement than previous methods across various application domains. ER -