Publications

A measure of betweenness centrality based on random walks

Newman, M. E. J.

(2003) [pdf]

Betweenness is a measure of the centrality of a node in a network, and is normally calculated as the fraction of shortest paths between node pairs that pass through the node of interest. Betweenness is, in some sense, a measure of the influence a node has over the spread of information through the network. By counting only shortest paths, however, the conventional definition implicitly assumes that information spreads only along those shortest paths. Here we propose a betweenness measure that relaxes this assumption, including contributions from essentially all paths between nodes, not just the shortest, although it still gives more weight to short paths. The measure is based on random walks, counting how often a node is traversed by a random walk between two other nodes. We show how our measure can be calculated using matrix methods, and give some examples of its application to particular networks.
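
The abstract says the measure can be calculated using matrix methods but does not give the steps. The sketch below is one way to do it, assuming the usual electrical-current formulation: invert the graph Laplacian with one row and column removed, treat the result as node voltages for a unit current injected at s and extracted at t, and average the current through each node over all pairs. The function name `random_walk_betweenness` and the restriction to a connected, undirected, unweighted graph are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def random_walk_betweenness(A):
    """Random-walk (current-flow) betweenness for a connected undirected graph.

    A is a symmetric 0/1 adjacency matrix. Assumption: the graph is connected,
    otherwise the reduced Laplacian below is singular.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    laplacian = np.diag(A.sum(axis=1)) - A

    # Remove the last row and column, invert, and pad back with zeros.
    T = np.zeros((n, n))
    T[:-1, :-1] = np.linalg.inv(laplacian[:-1, :-1])

    b = np.zeros(n)
    for s in range(n):
        for t in range(s + 1, n):
            # Node voltages for unit current entering at s and leaving at t.
            v = T[:, s] - T[:, t]
            # Current through node i: half the sum of absolute edge currents.
            flow = 0.5 * (A * np.abs(v[:, None] - v[None, :])).sum(axis=1)
            # Source and target each carry the full unit of current.
            flow[s] = 1.0
            flow[t] = 1.0
            b += flow

    # Average over all n(n-1)/2 source/target pairs.
    return b / (n * (n - 1) / 2)
```

On a three-node path 0-1-2 this gives 1 for the middle node and 2/3 for each end node, because endpoints of a walk are counted as traversed. The double loop over pairs makes the sketch suitable only for small graphs.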

The origin of degree correlations in the Internet and other networks

Park, J. & Newman, M. E. J.

Physical Review E, 68 026112 (2003) [pdf]

Dynamical and correlation properties of the Internet

Pastor-Satorras, R.; Vázquez, A. & Vespignani, A.

Physical Review Letters, 87(25) 258701 (2001) [pdf]

The GRACE French Part-of-Speech Tagging Evaluation Task

Adda, G.; Mariani, J.; Lecomte, J.; Paroubek, P. & Rajman, M.

'Proceedings of the First International Conference on Language Resources and Evaluation (LREC)', 433-441 (1998)

Collective dynamics of 'small-world' networks

Watts, D. J. & Strogatz, S. H.

Nature, 393(6684) 440-442 (1998) [pdf]

Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

Jiang, J. & Conrath, D.

'Proc. of the Int'l. Conf. on Research in Computational Linguistics', 19-33 (1997) [pdf]

This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.
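
The abstract describes the measure only in words. A minimal sketch follows, assuming the simplified form in which the distance between two concepts is IC(c1) + IC(c2) - 2 * IC(lcs), where IC(c) = -log P(c) is the information content estimated from corpus counts and lcs is the lowest concept in the taxonomy that subsumes both. The toy taxonomy, the counts, and all function names are hypothetical and chosen only for illustration; the edge-weighting factors the abstract alludes to are not modeled.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and corpus counts, illustration only.
PARENT = {"entity": None, "animal": "entity", "vehicle": "entity",
          "dog": "animal", "cat": "animal", "car": "vehicle"}
COUNTS = {"dog": 40, "cat": 30, "car": 30}

def ancestors(concept):
    """Return the concept followed by its ancestors, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def concept_probabilities():
    """Each occurrence of a word also counts toward every concept above it."""
    total = sum(COUNTS.values())
    freq = {c: 0 for c in PARENT}
    for word, n in COUNTS.items():
        for a in ancestors(word):
            freq[a] += n
    return {c: f / total for c, f in freq.items()}

def lowest_common_subsumer(c1, c2):
    """First ancestor of c2 (walking upward) that also subsumes c1."""
    above_c1 = set(ancestors(c1))
    for a in ancestors(c2):
        if a in above_c1:
            return a
    return None  # not reached in a single-rooted taxonomy

def jc_distance(c1, c2, prob):
    """Simplified distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    ic = lambda c: -math.log(prob[c])
    return ic(c1) + ic(c2) - 2 * ic(lowest_common_subsumer(c1, c2))

prob = concept_probabilities()
```

With these made-up counts, `jc_distance("dog", "cat", prob)` is smaller than `jc_distance("dog", "car", prob)`, since dog and cat share the more informative subsumer "animal" while dog and car meet only at the root, whose information content is zero.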