Publications

Graph Neural Networks Designed for Different Graph Types: A Survey

Thomas, J.; Moallemy-Oureh, A.; Beddar-Wiesing, S. & Holzhüter, C.

Transactions on Machine Learning Research (2023) [pdf]

Graphs are ubiquitous in nature and can therefore serve as models for many practical but also
theoretical problems. For this purpose, they can be defined as many different types which
suitably reflect the individual contexts of the represented problem. To address cutting-edge
problems based on graph data, the research field of Graph Neural Networks (GNNs) has
emerged. Despite the field’s youth and the speed at which new models are developed, many
recent surveys have been published to keep track of them. Nevertheless, it has not yet
been gathered which GNN can process what kind of graph types. In this survey, we give a
detailed overview of already existing GNNs and, unlike previous surveys, categorize them
according to their ability to handle different graph types and properties. We consider GNNs
operating on static and dynamic graphs of different structural constitutions, with or without
node or edge attributes. Moreover, we distinguish between GNN models for discrete-time or
continuous-time dynamic graphs and group the models according to their architecture. We
find that there are still graph types that are not or only rarely covered by existing GNN
models. We point out where models are missing and give potential reasons for their absence.

Large-scale factorization of type-constrained multi-relational data

Krompass, D.; Nickel, M. & Tresp, V.

'International Conference on Data Science and Advanced Analytics, DSAA 2014, Shanghai, China, October 30 - November 1, 2014', IEEE, [10.1109/DSAA.2014.7058046], 18-24 (2014) [pdf]

An analysis of tag-recommender evaluation procedures

Doerfel, S. & Jäschke, R.

'Proceedings of the 7th ACM conference on Recommender systems', RecSys '13, ACM, New York, NY, USA, [10.1145/2507157.2507222], 343-346 (2013) [pdf]

Since the rise of collaborative tagging systems on the web, the tag recommendation task -- suggesting suitable tags to users of such systems while they add resources to their collection -- has been tackled. However, the (offline) evaluation of tag recommendation algorithms usually suffers from difficulties like the sparseness of the data or the cold start problem for new resources or users. Previous studies therefore often used so-called post-cores (specific subsets of the original datasets) for their experiments. In this paper, we conduct a large-scale experiment in which we analyze different tag recommendation algorithms on different cores of three real-world datasets. We show that a recommender's performance depends on the particular core and explore correlations between performances on different cores.

Internet-Graphen

Heidtmann, K.

Informatik-Spektrum, 36(5) 440-448 (2013) [pdf]

While the nuclei of the Internet were still small and simply structured networks, both its physical and its logical topologies later grew rapidly. On the one hand, the network of computers as nodes and connecting lines as edges kept growing; on the other hand, more and more applications simultaneously made use of this infrastructure to weave ever larger and more complex virtual networks on top of it, e.g. the WWW or online social networks. At every level of this hierarchy, the respective network topologies can be described by graphs and thus studied mathematically. This yields interesting insights into the structural properties of different graph types, which strongly influence the performance of the Internet. To this end, characteristic properties and corresponding metrics of various graph types are considered, such as node degree, average distance, the variation of edge density in different parts of the network, and topological robustness as resilience against failures and attacks. Reference is made to analytical, simulation-based, and numerous empirical studies of the Internet, and pointers are given to simulation programs and to depictions of Internet graphs on the Internet.

Deeper Into the Folksonomy Graph: FolkRank Adaptations and Extensions for Improved Tag Recommendations

Landia, N.; Doerfel, S.; Jäschke, R.; Anand, S. S.; Hotho, A. & Griffiths, N.

cs.IR, 1310.1498() (2013) [pdf]

The information contained in social tagging systems is often modelled as a graph of connections between users, items and tags. Recommendation algorithms such as FolkRank have the potential to leverage complex relationships in the data, corresponding to multiple hops in the graph. We present an in-depth analysis and evaluation of graph models for social tagging data and propose novel adaptations and extensions of FolkRank to improve tag recommendations. We highlight implicit assumptions made by the widely used folksonomy model, and propose an alternative and more accurate graph representation of the data. Our extensions of FolkRank address the new item problem by incorporating content data into the algorithm, and significantly improve prediction results on unpruned datasets. Our adaptations address issues in the iterative weight spreading calculation that potentially hinder FolkRank's ability to leverage the deep graph as an information source. Moreover, we evaluate the benefit of considering each deeper level of the graph, and present important insights regarding the characteristics of social tagging data in general. Our results suggest that the base assumption made by conventional weight propagation methods, that closeness in the graph always implies a positive relationship, does not hold for the social tagging domain.

Full-Text Citation Analysis: A New Method to Enhance Scholarly Network

Liu, X.; Zhang, J. & Guo, C.

Journal of the American Society for Information Science and Technology (2012) [pdf]

Can Entities be Friends?

Pereira Nunes, B.; Kawase, R.; Dietze, S.; Taibi, D.; Casanova, M. A. & Nejdl, W.

Rizzo, G.; Mendes, P.; Charton, E.; Hellmann, S. & Kalyanpur, A., ed., 'Proceedings of the Web of Linked Entities Workshop in conjunction with the 11th International Semantic Web Conference', 906(), CEUR-WS.org, 45-57 (2012) [pdf]

The richness of the (Semantic) Web lies in its ability to link related resources as well as data across the Web. However, while relations within particular datasets are often well defined, links between disparate datasets and corpora of Web resources are rare. The increasingly widespread use of cross-domain reference datasets, such as Freebase and DBpedia for annotating and enriching datasets as well as document corpora, opens up opportunities to exploit their inherent semantics to uncover semantic relationships between disparate resources. In this paper, we present an approach to uncover relationships between disparate entities by analyzing the graphs of used reference datasets. We adapt a relationship assessment methodology from social network theory to measure the connectivity between entities in reference datasets and exploit these measures to identify correlated Web resources. Finally, we present an evaluation of our approach using the publicly available datasets Bibsonomy and USAToday.

Fast algorithms for determining (generalized) core groups in social networks

Batagelj, V. & Zaveršnik, M.

Advances in Data Analysis and Classification, 5(2) 129-145 (2011) [pdf]

The structure of a large network (graph) can often be revealed by partitioning it into smaller and possibly more dense sub-networks that are easier to handle. One such decomposition is based on “k-cores”, proposed in 1983 by Seidman. Together with connectivity components, cores are one of the few concepts that provide efficient decompositions of large graphs and networks. In this paper we propose an efficient algorithm for determining the cores decomposition of a given network with complexity $$O(m)$$, where m is the number of lines (edges or arcs). In the second part of the paper the classical concept of k-core is generalized in a way that uses a vertex property function instead of the degree of a vertex. For local monotone vertex property functions the corresponding generalized cores can be determined in $$O(m \cdot \max(\Delta, \log n))$$ time, where n is the number of vertices and Δ is the maximum degree. Finally the proposed algorithms are illustrated by the analysis of a collaboration network in the field of computational geometry.
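
To make the notion of a cores decomposition concrete, here is a minimal Python sketch of degree-based peeling with buckets. It is an illustration in the spirit of the linear-time idea summarized above, not the authors' implementation; the function name `core_numbers` and the adjacency-dictionary input format are assumptions made for this example, and the graph is assumed to be simple and undirected.

```python
def core_numbers(adj):
    """Return the core number of every vertex.

    adj: dict mapping each vertex to the set of its neighbors
         (assumed symmetric, no self-loops, no multi-edges).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    if not degree:
        return {}

    # buckets[d] holds the not-yet-removed vertices whose current degree is d
    buckets = [set() for _ in range(max(degree.values()) + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    for d in range(len(buckets)):
        # Peel vertices of current degree d; neighbors may drop into bucket d
        while buckets[d]:
            v = buckets[d].pop()
            core[v] = d
            for u in adj[v]:
                # Only lower degrees that are still above the current level
                if u not in core and degree[u] > d:
                    buckets[degree[u]].remove(u)
                    degree[u] -= 1
                    buckets[degree[u]].add(u)
    return core


# A triangle (a, b, c) with a pendant vertex d and an isolated vertex e
example = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
cores = core_numbers(example)
```

In this example the triangle vertices end up in the 2-core, the pendant vertex in the 1-core, and the isolated vertex in the 0-core. Each edge is inspected a constant number of times, which is where the linear dependence on m comes from in this sketch.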

The Anatomy of the Facebook Social Graph

Ugander, J.; Karrer, B.; Backstrom, L. & Marlow, C.

(2011) [pdf]

We study the structure of the social graph of active Facebook users, the
largest social network ever analyzed. We compute numerous features of the graph
including the number of users and friendships, the degree distribution, path
lengths, clustering, and mixing patterns. Our results center around three main
observations. First, we characterize the global structure of the graph,
determining that the social network is nearly fully connected, with 99.91% of
individuals belonging to a single large connected component, and we confirm the
"six degrees of separation" phenomenon on a global scale. Second, by studying
the average local clustering coefficient and degeneracy of graph neighborhoods,
we show that while the Facebook graph as a whole is clearly sparse, the graph
neighborhoods of users contain surprisingly dense structure. Third, we
characterize the assortativity patterns present in the graph by studying the
basic demographic and network properties of users. We observe clear degree
assortativity and characterize the extent to which "your friends have more
friends than you". Furthermore, we observe a strong effect of age on friendship
preferences as well as a globally modular community structure driven by
nationality, but we do not find any strong gender homophily. We compare our
results with those from smaller social networks and find mostly, but not
entirely, agreement on common structural network characteristics.

Index design and query processing for graph conductance search

Chakrabarti, S.; Pathak, A. & Gupta, M.

The VLDB Journal 1-26 (2010) [pdf]

Graph conductance queries, also known as personalized PageRank and related to random walks with restarts, were originally proposed to assign a hyperlink-based prestige score to Web pages. More general forms of such queries are also very useful for ranking in entity-relation (ER) graphs used to represent relational, XML and hypertext data. Evaluation of PageRank usually involves a global eigen computation. If the graph is even moderately large, interactive response times may not be possible. Recently, the need for interactive PageRank evaluation has increased. The graph may be fully known only when the query is submitted. Browsing actions of the user may change some inputs to the PageRank computation dynamically. In this paper, we describe a system that analyzes query workloads and the ER graph, invests in limited offline indexing, and exploits those indices to achieve essentially constant-time query processing, even as the graph size scales. Our techniques (data and query statistics collection, index selection and materialization, and query-time index exploitation) have parallels in the extensive relational query optimization literature, but are applied to supporting novel graph data repositories. We report on experiments with five temporal snapshots of the CiteSeer ER graph having 74–702 thousand entity nodes, 0.17–1.16 million word nodes, 0.29–3.26 million edges between entities, and 3.29–32.8 million edges between words and entities. We also used two million actual queries from CiteSeer’s logs. Queries run 3–4 orders of magnitude faster than whole-graph PageRank, the gap growing with graph size. Index size is smaller than a text index. Ranking accuracy is 94–98% with reference to whole-graph PageRank.
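
For orientation, the whole-graph computation that serves as the baseline in this abstract can be sketched as a plain power iteration for a random walk with restart. This is only an illustrative sketch and not the indexing system described in the paper; the function name `personalized_pagerank`, the restart probability `alpha=0.15`, and the fixed iteration count are assumptions chosen for this example.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Random walk with restart to `seed`, by power iteration.

    adj:  dict mapping each node to a list of its out-neighbors
          (every neighbor is assumed to appear as a key as well).
    seed: node the walk restarts at with probability alpha.
    """
    rank = {v: 0.0 for v in adj}
    rank[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, out in adj.items():
            if out:
                # Spread the non-restarting mass evenly over out-links
                share = (1.0 - alpha) * rank[v] / len(out)
                for u in out:
                    nxt[u] += share
            else:
                # Dangling node: send its mass back to the seed
                nxt[seed] += (1.0 - alpha) * rank[v]
        # Restart mass; the total rank stays at 1 throughout
        nxt[seed] += alpha
        rank = nxt
    return rank


# Undirected path a - b - c, queried from node "a"
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = personalized_pagerank(path, "a")
```

Every query touches the entire graph in each iteration, which illustrates why interactive response times become difficult as the graph grows.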

What's in a crowd? Analysis of face-to-face behavioral networks

Isella, L.; Stehlé, J.; Barrat, A.; Cattuto, C.; Pinton, J.-F. & den Broeck, W. V.

CoRR, abs/1006.1260() (2010) [pdf]

Visit me, click me, be my friend: An analysis of evidence networks of user relationships in Bibsonomy

Mitzlaff, F.; Benz, D.; Stumme, G. & Hotho, A.

'Proceedings of the 21st ACM conference on Hypertext and hypermedia', Toronto, Canada (2010) [pdf]

RTG: A Recursive Realistic Graph Generator Using Random Typing.

Akoglu, L. & Faloutsos, C.

Buntine, W. L.; Grobelnik, M.; Mladenic, D. & Shawe-Taylor, J., ed., 'ECML/PKDD (1)', 5781(), Lecture Notes in Computer Science, Springer, 13-28 (2009) [pdf]

Mining Graph Evolution Rules.

Berlingerio, M.; Bonchi, F.; Bringmann, B. & Gionis, A.

Buntine, W. L.; Grobelnik, M.; Mladenic, D. & Shawe-Taylor, J., ed., 'ECML/PKDD (1)', 5781(), Lecture Notes in Computer Science, Springer, 115-130 (2009) [pdf]

Binary Decomposition Methods for Multipartite Ranking

Fürnkranz, J.; Hüllermeier, E. & Vanderlooy, S.

Machine Learning and Knowledge Discovery in Databases 359-374 (2009) [pdf]

Bipartite ranking refers to the problem of learning a ranking function from a training set of positively and negatively labeled examples. Applied to a set of unlabeled instances, a ranking function is expected to establish a total order in which positive instances precede negative ones. The performance of a ranking function is typically measured in terms of the AUC. In this paper, we study the problem of multipartite ranking, an extension of bipartite ranking to the multi-class case. In this regard, we discuss extensions of the AUC metric which are suitable as evaluation criteria for multipartite rankings. Moreover, to learn multipartite ranking functions, we propose methods on the basis of binary decomposition techniques that have previously been used for multi-class and ordinal classification. We compare these methods both analytically and experimentally, not only against each other but also to existing methods applicable to the same problem.

GMap: Drawing Graphs as Maps

Gansner, E. R.; Hu, Y. & Kobourov, S. G.

cs.CG, arXiv:0907.2585v1() (2009) [pdf]

Information visualization is essential in making sense out of large data sets. Often, high-dimensional data are visualized as a collection of points in 2-dimensional space through dimensionality reduction techniques. However, these traditional methods often do not capture well the underlying structural information, clustering, and neighborhoods. In this paper, we describe GMap: a practical tool for visualizing relational data with geographic-like maps. We illustrate the effectiveness of this approach with examples from several domains. All the maps referenced in this paper can be found at http://www.research.att.com/~yifanhu/GMap

Structure of Heterogeneous Networks

Ghosh, R. & Lerman, K.

(2009) [pdf]

Heterogeneous networks play a key role in the evolution of communities and
the decisions individuals make. These networks link different types of
entities, for example, people and the events they attend. Network analysis
algorithms usually project such networks onto simple graphs composed of
entities of a single type. In the process, they conflate relations between
entities of different types and lose important structural information. We
develop a mathematical framework that can be used to compactly represent and
analyze heterogeneous networks that combine multiple entity and link types. We
generalize Bonacich centrality, which measures connectivity between nodes by
the number of paths between them, to heterogeneous networks and use this
measure to study network structure. Specifically, we extend the popular
modularity-maximization method for community detection to use this centrality
metric. We also rank nodes based on their connectivity to other nodes. One
advantage of this centrality metric is that it has a tunable parameter we can
use to set the length scale of interactions. Studying how rankings change
with this parameter allows us to identify important nodes in the network. We
apply the proposed method to analyze the structure of several heterogeneous
networks. We show that exploiting additional sources of evidence corresponding
to links between, as well as among, different entity types yields new insights
into network structure.

Simulated Iterative Classification: A New Learning Procedure for Graph Labeling.

Maes, F.; Peters, S.; Denoyer, L. & Gallinari, P.

Buntine, W. L.; Grobelnik, M.; Mladenic, D. & Shawe-Taylor, J., ed., 'ECML/PKDD (2)', 5782(), Lecture Notes in Computer Science, Springer, 47-62 (2009) [pdf]

Collective classification refers to the classification of interlinked and relational objects described as nodes in a graph. The Iterative Classification Algorithm (ICA) is a simple, efficient and widely used method to solve this problem. It is representative of a family of methods for which inference proceeds as an iterative process: at each step, nodes of the graph are classified according to the current predicted labels of their neighbors. We show that learning in this class of models suffers from a training bias. We propose a new family of methods, called Simulated ICA, which helps reduce this training bias by simulating inference during learning. Several variants of the method are introduced. They are simple and efficient, and they scale well. Experiments performed on a series of 7 datasets show that the proposed methods outperform representative state-of-the-art algorithms while keeping a low complexity.

Modularities for Bipartite Networks

Murata, T.

'HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia', ACM, New York, NY, USA (2009)

Real-world relations are often represented as bipartite networks, such as paper-author networks and event-attendee networks. Extracting dense subnetworks (communities) from bipartite networks and evaluating their quality are practically important research topics. To evaluate divisions of bipartite networks, Guimera and Barber proposed bipartite modularities. This paper discusses the properties of these bipartite modularities and proposes another bipartite modularity that allows one-to-many correspondence of communities of different vertex types. Preliminary experimental results for the bipartite modularities are also described.

Eigenvalues and Structures of Graphs

Butler, S.

2008, PhD thesis, University of California, San Diego

A survey of kernel and spectral methods for clustering

Filippone, M.; Camastra, F.; Masulli, F. & Rovetta, S.

Pattern recognition, 41(1) 176-190 (2008)

Average Distance, Diameter, and Clustering in Social Networks with Homophily

Jackson, M.

Internet and Network Economics 4-11 (2008) [pdf]

I examine a random network model where nodes are categorized by type and linking probabilities can differ across types. I show that as homophily increases (so that the probability to link to other nodes of the same type increases and the probability of linking to nodes of some other types decreases) the average distance and diameter of the network are unchanged, while the average clustering in the network increases.

Extending the definition of modularity to directed graphs with overlapping communities

Nicosia, V.; Mangioni, G.; Carchiolo, V. & Malgeri, M.

(2008) [pdf]

Complex networks topologies present interesting and surprising properties,
such as community structures, which can be exploited to optimize communication,
to find new efficient and context-aware routing algorithms or simply to
understand the dynamics and meaning of relationships among nodes. Complex
networks are gaining more and more importance as a reference model and are a
powerful interpretation tool for many different kinds of natural, biological
and social networks, where directed relationships and contextual belonging of
nodes to many different communities is a matter of fact. This paper starts from
the definition of modularity function, given by M. Newman to evaluate the
goodness of network community decompositions, and extends it to the more
general case of directed graphs with overlapping community structures.
Interesting properties of the proposed extension are discussed, a method for
finding overlapping communities is proposed and results of its application to
benchmark case-studies are reported. We also propose a new dataset which could
be used as a reference benchmark for overlapping community structures
identification.

Tag recommendations based on tensor dimensionality reduction

Symeonidis, P.; Nanopoulos, A. & Manolopoulos, Y.

'RecSys '08: Proceedings of the 2008 ACM conference on Recommender systems', ACM, New York, NY, USA, [http://doi.acm.org/10.1145/1454008.1454017], 43-50 (2008) [pdf]

Generating Graphs with Predefined k-Core Structure

Baur, M.; Gaertler, M.; Görke, R.; Krug, M. & Wagner, D.

'Proceedings of the European Conference of Complex Systems' (2007) [pdf]

The modeling of realistic networks is of great importance for modern complex systems research. Previous procedures typically model the natural growth of networks by means of iteratively adding nodes, geometric positioning information, a definition of link connectivity based on the preference for nearest neighbors or already highly connected nodes, or combine several of these approaches. Our novel model is based on the well-known concept of k-cores, originally introduced in social network analysis. Recent studies exposed the significant k-core structure of several real world systems, e.g. the AS network of the Internet. We present a simple and efficient method for generating networks which strictly adhere to the characteristics of a given k-core structure, called core fingerprint. We showcase our algorithm in a comparative evaluation with two well-known AS network generators.

On Finding Graph Clusterings with Maximum Modularity

Brandes, U.; Delling, D.; Gaertler, M.; Görke, R.; Hoefer, M.; Nikoloski, Z. & Wagner, D.

Brandstädt, A.; Kratsch, D. & Müller, H., ed., 'Graph-Theoretic Concepts in Computer Science', 4769(), Springer, Berlin / Heidelberg, 121-132 (2007) [pdf]

Modularity is a recently introduced quality measure for graph clusterings. It has immediately received considerable attention in several disciplines, and in particular in the complex systems literature, although its properties are not well understood. We study the problem of finding clusterings with maximum modularity, thus providing theoretical foundations for past and present work based on this measure. More precisely, we prove the conjectured hardness of maximizing modularity both in the general case and with the restriction to cuts, and give an Integer Linear Programming formulation. This is complemented by first insights into the behavior and performance of the commonly applied greedy agglomeration approach.

A tutorial on spectral clustering

Luxburg, U.

Statistics and Computing, 17(4) 395-416 (2007) [pdf]

Graph clustering

Schaeffer, S.

Computer Science Review, 1(1) 27-64 (2007) [pdf]

Spectral Graph Theory and its Applications

Spielman, D.

Foundations of Computer Science, 2007. FOCS '07. 48th Annual IEEE Symposium on 29-38 (2007)

Spectral graph theory is the study of the eigenvalues and eigenvectors of matrices associated with graphs. In this tutorial, we will try to provide some intuition as to why these eigenvectors and eigenvalues have combinatorial significance, and will survey some of their applications.

Analysis of the Wikipedia Category Graph for NLP Applications

Zesch, T. & Gurevych, I.

'Proceedings of the TextGraphs-2 Workshop (NAACL-HLT)', Association for Computational Linguistics, Rochester, 1-8 (2007) [pdf]

In this paper, we discuss two graphs in Wikipedia (i) the article graph, and (ii) the category graph. We perform a graph-theoretic analysis of the category graph, and show that it is a scale-free, small world graph like other well-known lexical semantic networks. We substantiate our findings by transferring semantic relatedness algorithms defined on WordNet to the Wikipedia category graph. To assess the usefulness of the category graph as an NLP resource, we analyze its coverage and the performance of the transferred semantic relatedness algorithms.

Spectral Graph Theory: Applications of Courant Fischer

Butler, S.

(2006)

Spectral Graph Theory: Cheeger constants and discrepancy

Butler, S.

(2006)

Spectral Graph Theory: Three common spectra

Butler, S.

(2006)

Building Emergent Social Networks and Group Profiles by Semantic User Preference Clustering

Cantador, I. & Castells, P.

(2006)

Comparison of Graph Clustering Approaches

Frivolt, G. & Pok, O.

(2006)

Information Retrieval in Folksonomies: Search and Ranking

Hotho, A.; Jäschke, R.; Schmitz, C. & Stumme, G.

Sure, Y. & Domingue, J., ed., 'The Semantic Web: Research and Applications', 4011(), Lecture Notes in Computer Science, Springer, Heidelberg, 411-426 (2006)

Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. At the moment, however, the information retrieval support is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The proposed algorithm is also applied to find communities within the folksonomy and is used to structure search results. All findings are demonstrated on a large scale dataset.

Modularity and community structure in networks

Newman, M. E. J.

Proceedings of the National Academy of Sciences, 103(23) 8577-8582 (2006)

Many networks of interest in the sciences, including social networks, computer networks, and metabolic and regulatory networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure is one of the outstanding issues in the study of networked systems. One highly effective approach is the optimization of the quality function known as “modularity” over the possible divisions of a network. Here I show that the modularity can be expressed in terms of the eigenvectors of a characteristic matrix for the network, which I call the modularity matrix, and that this expression leads to a spectral algorithm for community detection that returns results of demonstrably higher quality than competing methods in shorter running times. I illustrate the method with applications to several published network data sets.
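
As a small illustration of the quality function this abstract refers to, the following Python sketch evaluates the modularity Q = Σ_c [ e_c / (2m) − (d_c / (2m))² ] of a given division of an undirected graph, where e_c counts the ordered intra-community neighbor pairs and d_c is the total degree of community c. It only scores a division that is already given and does not implement the spectral algorithm from the paper; the function name `modularity` and the input format are assumptions made for this example.

```python
def modularity(adj, community):
    """Modularity Q of a division of a simple undirected graph.

    adj:       dict mapping each node to the set of its neighbors (symmetric).
    community: dict mapping each node to a community label.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # equals 2 * number of edges
    if two_m == 0:
        return 0.0

    intra = {}    # ordered intra-community neighbor pairs per community
    deg_sum = {}  # total degree per community
    for v, nbrs in adj.items():
        c = community[v]
        deg_sum[c] = deg_sum.get(c, 0) + len(nbrs)
        intra[c] = intra.get(c, 0) + sum(1 for u in nbrs if community[u] == c)

    return sum(intra[c] / two_m - (deg_sum[c] / two_m) ** 2 for c in deg_sum)


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2 - 3
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(two_triangles, split)
q_single = modularity(two_triangles, {v: "A" for v in two_triangles})
```

Splitting the example graph into its two triangles gives a positive score, while placing all nodes in one community gives zero, which matches the intuition that modularity rewards divisions with more internal edges than expected by degree alone.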

Finding community structure in networks using the eigenvectors of matrices

Newman, M.

Physical Review E, 74(3) 36104 (2006)

Content Aggregation on Knowledge Bases using Graph Clustering

Schmitz, C.; Hotho, A.; Jäschke, R. & Stumme, G.

Sure, Y. & Domingue, J., ed., 'The Semantic Web: Research and Applications', 4011(), LNAI, Springer, Heidelberg, 530-544 (2006) [pdf]

Recently, research projects such as PADLR and SWAP
have developed tools like Edutella or Bibster, which are targeted at
establishing peer-to-peer knowledge management (P2PKM) systems. In
such a system, it is necessary to obtain brief semantic
descriptions of peers, so that routing algorithms or matchmaking
processes can make decisions about which communities peers should
belong to, or to which peers a given query should be forwarded.
This paper provides a graph clustering technique on
knowledge bases for that purpose. Using this clustering, we can show
that our strategy requires up to 58% fewer queries than the
baselines to yield full recall in a bibliographic P2PKM scenario.

A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts

Dhillon, I. S.; Guan, Y. & Kulis, B.

2005, Technical report, University of Texas Dept. of Computer Science [pdf]

Recently, a variety of clustering algorithms have been proposed to handle data that is not linearly separable. Spectral clustering and kernel k-means are two such methods that are seemingly quite different. In this paper, we show that a general weighted kernel k-means objective is mathematically equivalent to a weighted graph partitioning objective. Special cases of this graph partitioning objective include ratio cut, normalized cut and ratio association. Our equivalence has important consequences: the weighted kernel k-means algorithm may be used to directly optimize the graph partitioning objectives, and conversely, spectral methods may be used to optimize the weighted kernel k-means objective. Hence, in cases where eigenvector computation is prohibitive, we eliminate the need for any eigenvector computation for graph partitioning. Moreover, we show that the Kernighan-Lin objective can also be incorporated into our framework, leading to an incremental weighted kernel k-means algorithm for local optimization of the objective. We further discuss the issue of convergence of weighted kernel k-means for an arbitrary graph affinity matrix and provide a number of experimental results. These results show that non-spectral methods for graph partitioning are as effective as spectral methods and can be used for problems such as image segmentation in addition to data clustering.

Generating bicliques of a graph in lexicographic order

Dias, V. M.; de Figueiredo, C. M. & Szwarcfiter, J. L.

Theoretical Computer Science, 337(1-3) 240 - 248 (2005) [pdf]

An independent set of a graph is a subset of pairwise non-adjacent vertices. A complete bipartite set B is a subset of vertices admitting a bipartition B = X ∪ Y, such that both X and Y are independent sets, and all vertices of X are adjacent to those of Y. If both X, Y ≠ ∅, then B is called proper. A biclique is a maximal proper complete bipartite set of a graph. We present an algorithm that generates all bicliques of a graph in lexicographic order, with polynomial-time delay between the output of two successive bicliques. We also show that there is no polynomial-time delay algorithm for generating all bicliques in reverse lexicographic order, unless P=NP. The methods are based on those by Johnson, Papadimitriou and Yannakakis, in the solution of these two problems for independent sets, instead of bicliques.

Graph Theory

Diestel, R.

2005, Springer-Verlag Heidelberg, New York [pdf]

Role Assignments

Lerner, J.

Brandes, U. & Erlebach, T., ed., 'Network Analysis', 3418(), Springer, Berlin / Heidelberg, 216-252 (2005) [pdf]

9.0. 9.0.1. Preliminaries 9.0.2. Role Graph 9.1. Structural Equivalence 9.1.1. Lattice of Equivalence Relations 9.1.2. Lattice of Structural Equivalences 9.1.3. Computation of Structural Equivalences 9.2. Regular Equivalence 9.2.1. Elementary Properties 9.2.2. Lattice Structure and Regular Interior 9.2.3. Computation of Regular Interior 9.2.4. The Role Assignment Problem 9.2.5. Existence of k-Role Assignments 9.3. Other Equivalences 9.3.1. Exact Role Assignments 9.3.2. Automorphic and Orbit Equivalence 9.3.3. Perfect Equivalence 9.3.4. Relative Regular Equivalence 9.4. Graphs with Multiple Relations 9.5. The Semigroup of a Graph 9.5.1. Winship-Pattison Role Equivalence 9.6. Chapter Notes

A spectral clustering approach to finding communities in graphs

White, S. & Smyth, P.

(2005)

Characterizing and Mining the Citation Graph of the Computer Science Literature

An, Y.; Janssen, J. & Milios, E. E.

Knowl. Inf. Syst., 6() 664-678 (2004) [pdf]

The Diameter of a Scale-Free Random Graph

Bollobás, B. & Riordan, O.

Combinatorica, 24(1) 5-34 (2004) [pdf]

We consider a random graph process in which vertices are added to the graph one at a time and joined to a fixed number m of earlier vertices, where each earlier vertex is chosen with probability proportional to its degree. This process was introduced by Barabási and Albert [3], as a simple model of the growth of real-world graphs such as the world-wide web. Computer experiments presented by Barabási, Albert and Jeong [1,5] and heuristic arguments given by Newman, Strogatz and Watts [23] suggest that after n steps the resulting graph should have diameter approximately log n. We show that while this holds for m = 1, for m ≥ 2 the diameter is asymptotically log n/log log n.
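
The growth process described in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the precise model analysed in the paper: the function name, the seeding with a single vertex carrying m self-loops, and the decision to allow repeated targets (multi-edges) are all choices made for this example.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow a graph on n vertices; each new vertex gets m edges to earlier vertices.

    Targets are drawn with probability proportional to current degree.
    Simplification (assumption): repeated targets are allowed, so multi-edges
    can occur, and the process starts from one vertex with m self-loops.
    """
    rng = random.Random(seed)
    edges = [(0, 0)] * m
    # Each vertex appears in `endpoints` once per unit of degree, so a uniform
    # draw from this list is a degree-proportional draw.
    endpoints = [0] * (2 * m)
    for v in range(1, n):
        # Draw all targets before adding v, so only earlier vertices are chosen.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for t in targets:
            edges.append((v, t))
            endpoints.extend((v, t))
    return edges


sample_edges = preferential_attachment(200, 2)
```

Keeping one list entry per unit of degree trades memory for a very simple sampling step; a larger experiment would probably want a more compact structure.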

Clustering large graphs via the singular value decomposition

Drineas, P.; Frieze, A.; Kannan, R.; Vempala, S. & Vinay, V.

Machine Learning, 56(1) 9-33 (2004) [pdf]

Graph clustering and minimum cut trees

Flake, G.; Tarjan, R. & Tsioutsiouliklis, K.

Internet Mathematics, 1(4) 385-408 (2004) [pdf]

Deeper inside pagerank

Langville, A. & Meyer, C.

Internet Mathematics, 1(3) 335-380 (2004) [pdf]

An O(m) Algorithm for Cores Decomposition of Networks

Batagelj, V. & Zaversnik, M.

(2003) [pdf]

The structure of large networks can be revealed by partitioning them to
smaller parts, which are easier to handle. One of such decompositions is based
on $k$-cores, proposed in 1983 by Seidman. In the paper an efficient, $O(m)$,
$m$ is the number of lines, algorithm for determining the cores decomposition
of a given network is presented.
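
To make the notion of a cores decomposition concrete, here is a minimal Python sketch that assigns each vertex its core number by repeatedly removing a vertex of minimum remaining degree. It is a plain peeling procedure written for readability and is not the $O(m)$ algorithm of the paper, which needs a more careful ordering of vertices by degree; the function name and the input format (a dict of neighbour sets) are assumptions for this example.

```python
def core_numbers(adj):
    """Return {vertex: core number} by repeatedly removing a minimum-degree vertex.

    adj maps each vertex to the set of its neighbours (undirected, no loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # linear scan, kept simple
        k = max(k, degree[v])  # core numbers never decrease along the peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy graph (hypothetical): a triangle 0-1-2, a pendant vertex 3, an isolated vertex 4.
TOY = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
toy_cores = core_numbers(TOY)
```

In the toy graph the triangle forms the 2-core, the pendant vertex belongs only to the 1-core, and the isolated vertex has core number 0.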

Experiments on graph clustering algorithms

Brandes, U.; Gaertler, M. & Wagner, D.

Lecture notes in computer science 568-579 (2003) [pdf]

Spectral measures of bipartivity in complex networks

Estrada, E. & Rodriguez-Velázquez, J.

Phys. Rev. E, 72() 046105 (2003)

The second eigenvalue of the Google matrix

Haveliwala, T. & Kamvar, S.

A Stanford University Technical Report http://dbpubs.stanford.edu (2003)

The structure and function of complex networks

Newman, M. E. J.

SIAM Review, 45(2) 167-256 (2003)

A comparison of spectral clustering algorithms

Verma, D. & Meila, M.

University of Washington, Tech. Rep. UW-CSE-03-05-01 (2003)

Multiclass Spectral Clustering

Yu, S. X. & Shi, J.

, 'Proc. International Conference on Computer Vision (ICCV 03)', Nice, France (2003)

Graph Separators

Blelloch, G.

(2002)

Visualization of bibliographic networks with a reshaped landscape metaphor

Brandes, U. & Willhalm, T.

, 'Proceedings of the symposium on Data Visualisation 2002', VISSYM '02, Eurographics Association, Aire-la-Ville, Switzerland, 159-ff (2002) [pdf]

We describe a novel approach to visualize bibliographic networks that facilitates the simultaneous identification of clusters (e.g., topic areas) and prominent entities (e.g., surveys or landmark papers). While employing the landscape metaphor proposed in several earlier works, we introduce new means to determine relevant parameters of the landscape. Moreover, we are able to compute prominent entities, clustering of entities, and the landscape's surface in a surprisingly simple and uniform way. The effectiveness of our network visualizations is illustrated on data from the graph drawing literature.

Markov chain Monte Carlo estimation of exponential random graph models

Snijders, T.

Journal of Social Structure, 3(2) 1-40 (2002)

General formalism for inhomogeneous random graphs

Soderberg, B.

Phys. Rev. E, 66(6) 066121 (2002)

Co-clustering documents and words using bipartite spectral graph partitioning

Dhillon, I. S.

, 'KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining', ACM Press, New York, NY, USA, [10.1145/502512.502550], 269-274 (2001) [pdf]

On Spectral Bounds for the k-Partitioning of Graphs

Monien, B.

(2001)

Random graphs with arbitrary degree distributions and their applications

Newman, M.; Strogatz, S. & Watts, D.

Arxiv preprint cond-mat/0007235 (2001)

On spectral clustering: Analysis and an algorithm

Ng, A. Y.; Jordan, M. I. & Weiss, Y.

, 'Advances in Neural Information Processing Systems 14', MIT Press, 849-856 (2001)

Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the data), there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems.

A random graph model for massive graphs

Aiello, W.; Chung, F. & Lu, L.

171-180 (2000) [pdf]

Cospectral graphs for both the adjacency and normalized Laplacian matrices

Butler, S.

(2000)

An open graph visualization system and its applications to software engineering

Gansner, E. R. & North, S. C.

Software Practice & Experience, 30(11) 1203-1233 (2000) [pdf]

We describe a package of practical tools and libraries for manipulating graphs and their drawings. Our design, which aimed at facilitating the combination of the package components with other tools, includes stream and event interfaces for graph operations, high-quality static and dynamic layout algorithms, and the ability to handle sizable graphs. We conclude with a description of the applications of this package to a variety of software engineering tools.

Theory of random graphs

Janson, S.; Luczak, T. & Rucinski, A.

2000, John Wiley & Sons, New York; Chichester [pdf]

Some uses of spectral methods

Ranade, A.

(2000)

A p* primer: Logit models for social networks

Anderson, C.; Wasserman, S. & Crouch, B.

Social Networks, 21(1) 37-66 (1999)

Emergence of scaling in random networks

Barabasi, A. L. & Albert, R.

Science, 286(5439) 509-512 (1999) [pdf]

Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.

Partitioning Approach to Visualization of Large Graphs

Batagelj, V.; Mrvar, A. & Zaveršnik, M.

Kratochvíl, J., ed., 'Graph Drawing', 1731(), Springer, Berlin / Heidelberg, 90-97 (1999) [pdf]

The structure of large graphs can be revealed by partitioning graphs to smaller parts, which are easier to handle. In the paper we propose the use of core decomposition as an efficient approach for partitioning large graphs. On the selected subgraphs, computationally more intensive, clustering and blockmodeling can be used to analyze their internal structure. The approach is illustrated by an analysis of Snyder & Kick’s world trade graph.

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Brin, S. & Page, L.

, 'Computer Networks and ISDN Systems', 107-117 (1998) [pdf]

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://infolab.stanford.edu/~backrub/google.html. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description w...

On the quality of spectral separators

Guattery, S. & Miller, G.

SIAM Journal on Matrix Analysis and Applications, 19(3) 701-719 (1998)

Multilevel k-way Hypergraph Partitioning

Karypis, G. & Kumar, V.

, 'In Proceedings of the Design and Automation Conference', 343-348 (1998)

In this paper, we present a new multilevel k-way hypergraph partitioning algorithm that substantially outperforms the existing state-of-the-art K-PM/LR algorithm for multi-way partitioning, both for optimizing local as well as global objectives. Experiments on the ISPD98 benchmark suite show that the partitionings produced by our scheme are on the average 15% to 23% better than those produced by the K-PM/LR algorithm, both in terms of the hyperedge cut as well as the (K - 1) metric. Furthermore, our algorithm is significantly faster, requiring 4 to 5 times less time than that required by K-PM/LR. 1 Introduction Hypergraph partitioning is an important problem with extensive application to many areas, including VLSI design [10], efficient storage of large databases on disks [14], and data mining [13]. The problem is to partition the vertices of a hypergraph into k roughly equal parts, such that a certain objective function defined over the hyperedges is optimized. A commonly used obje...

Spectral Graph Theory

Chung, F. R. K.

1997, American Mathematical Society

Multilevel hypergraph partitioning: Application in VLSI domain

Karypis, G.; Aggarwal, R.; Kumar, V. & Shekhar, S.

526-529 (1997)

Some applications of Laplace eigenvalues of graphs

Mohar, B.

Graph Symmetry: Algebraic Methods and Applications, 497() 227-275 (1997)

Spectral Partitioning Works: Planar Graphs and Finite Element Meshes

Spielman, D. A. & Teng, S.

1996, Berkeley, CA, USA

Spectral partitioning: The more eigenvectors, the better

Alpert, C. J.; Kahng, A. B. & zen Yao, S.

, 'Proc. ACM/IEEE Design Automation Conf', 195-200 (1995)

A critical point for random graphs with a given degree sequence

Molloy, M. & Reed, B.

Random Structures & Algorithms, 6() 161-179 (1995) [pdf]

Spectra and optimal partitions of weighted graphs

Bolla, M. & Tusnády, G.

Discrete Math., 128(1-3) 1-20 (1994) [pdf]

Spectral K-way ratio-cut partitioning and clustering.

Chan, P. K.; Schlag, M. D. F. & Zien, J. Y.

IEEE Trans. on CAD of Integrated Circuits and Systems, 13(9) 1088-1096 (1994) [pdf]

New spectral methods for ratio cut partitioning and clustering.

Hagen, L. W. & Kahng, A. B.

IEEE Trans. on CAD of Integrated Circuits and Systems, 11(9) 1074-1085 (1992) [pdf]

The Laplacian spectrum of graphs

Mohar, B.

Graph Theory, Combinatorics, and Applications, 2() 871-898 (1991)

Partitioning Sparse Matrices with Eigenvectors of Graphs

Pothen, A.; Simon, H. & Liou, K.

SIAM J. MATRIX ANAL. APPLIC., 11(3) 430-452 (1990) [pdf]

Random sampling and social networks: a survey of various approaches

Frank, O.

Math. Sci. Humaines, 104() 19-33 (1988)

On generating all maximal independent sets

Johnson, D. S. & Papadimitriou, C. H.

Inf. Process. Lett., 27(3) 119-123 (1988) [pdf]

Network structure and minimum degree

Seidman, S. B.

Social Networks, 5(3) 269 - 287 (1983) [pdf]

Social network researchers have long sought measures of network cohesion. Density has often been used for this purpose, despite its generally admitted deficiencies. An approach to network cohesion is proposed that is based on minimum degree and which produces a sequence of subgraphs of gradually increasing cohesion. The approach also associates with any network measures of local density which promise to be useful both in characterizing network structures and in comparing networks.

A review of random graphs

Karonski, M.

Journal of Graph Theory, 6(4) (1982)

The diameter of random graphs

Bollobas, B.

Transactions of the American Mathematical Society 41-52 (1981)

A Set of Measures of Centrality Based on Betweenness

Freeman, L. C.

Sociometry, 40(1) 35-41 (1977) [pdf]

A family of new measures of point and graph centrality based on early intuitions of Bavelas (1948) is introduced. These measures define centrality in terms of the degree to which a point falls on the shortest path between others and therefore has a potential for control of communication. They may be used to index centrality in any large or small network of symmetrical relations, whether connected or unconnected.
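
The point centrality described in this abstract can be computed as sketched below. The sketch uses breadth-first search with a backward accumulation pass, a later computational technique that does not come from this paper; the function name and the input format (a dict mapping each point to a list of its neighbours) are assumptions for this example. Scores are unnormalised: for each point, they sum over all unordered pairs of other points the fraction of shortest paths between the pair that pass through it.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness of each vertex in an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest vertices inwards.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each of its endpoints.
    return {v: c / 2.0 for v, c in score.items()}


# Toy networks (hypothetical): a path 0-1-2-3 and a star with centre "c".
PATH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
STAR = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
path_scores = betweenness(PATH)
star_scores = betweenness(STAR)
```

Unreachable points are simply never visited by the search, so the sketch also handles unconnected networks, in line with the abstract's remark.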

A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory

Fiedler, M.

Czechoslovak Mathematical Journal, 25(100) 619-633 (1975)

Lower bounds for the partitioning of graphs

Donath, W. & Hoffman, A.

IBM Journal of Research and Development, 17(5) 420-425 (1973)