**Saerens, Marco** [UCL]; **Fouss, François** [UCL]; **Yen, Luh** [UCL]; **Dupont, Pierre** [UCL]
This work presents a novel procedure for computing (1) distances between the nodes of a weighted, undirected graph, called the Euclidean commute time distance (ECTD), and (2) a subspace projection of the nodes of the graph that preserves as much variance as possible in terms of the ECTD, that is, a principal components analysis of the graph. It is based on a Markov-chain model of a random walk through the graph. The model assigns transition probabilities to the links between nodes, so that a random walker can jump from node to node. A quantity called the average commute time is the average number of steps a random walker starting from node i takes to reach node j for the first time and then return to node i. The square root of this quantity, the ECTD, is a distance measure between any two nodes. It has the desirable property of decreasing when the number of paths connecting the two nodes increases and when the "length" of any path decreases. The ECTD can be computed from the pseudoinverse of the Laplacian matrix of the graph, which is a kernel (a symmetric positive semidefinite matrix). We finally define the principal components analysis (PCA) of a graph as the subspace projection that preserves as much variance as possible in terms of the ECTD. This graph PCA has some interesting links with spectral graph theory, in particular with spectral clustering.
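
The quantities described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a connected graph given as a dense symmetric weight matrix `A`, and the function names (`transition_matrix`, `laplacian_pinv`, `commute_time`, `ectd`, `graph_pca`) are hypothetical. It relies on the standard identity n(i, j) = V_G (l+_ii + l+_jj - 2 l+_ij), where V_G is the sum of all node degrees and l+_ij are the entries of the Laplacian pseudoinverse L+; `np.linalg.pinv` is used for convenience and is not necessarily how the paper computes L+.

```python
import numpy as np


def transition_matrix(A):
    """Random-walk transition probabilities p_ij = a_ij / a_i. (row-normalised weights)."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def laplacian_pinv(A):
    """Moore-Penrose pseudoinverse L+ of the Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.pinv(L)


def commute_time(A):
    """Average commute times n(i, j) = V_G * (l+_ii + l+_jj - 2 l+_ij)."""
    A = np.asarray(A, dtype=float)
    Lp = laplacian_pinv(A)
    vol = A.sum()  # V_G: sum of all node degrees (assumed notation)
    d = np.diag(Lp)
    return vol * (d[:, None] + d[None, :] - 2.0 * Lp)


def ectd(A):
    """Euclidean commute time distance: square root of the average commute time."""
    # Clip tiny negative round-off before taking the square root.
    return np.sqrt(np.maximum(commute_time(A), 0.0))


def graph_pca(A, k):
    """Project the nodes onto the k eigenvectors of L+ with the largest eigenvalues."""
    A = np.asarray(A, dtype=float)
    vals, vecs = np.linalg.eigh(laplacian_pinv(A))  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]                # indices of the k largest eigenvalues
    vals, vecs = np.maximum(vals[top], 0.0), vecs[:, top]
    # Row i is sqrt(V_G) * Lambda^{1/2} U^T e_i restricted to the kept components,
    # so Euclidean distances between rows approximate the ECTD.
    return np.sqrt(A.sum()) * vecs * np.sqrt(vals)
```

For the unit-weight path graph 0-1-2, `commute_time` should give n(0, 2) = 8 (volume 4 times effective resistance 2); adding the edge 0-2 to form a triangle lowers it to 4, in line with the ECTD shrinking as more paths connect two nodes. With k = n - 1 on a connected graph, `graph_pca` reproduces the ECTD exactly as Euclidean distances between rows; a smaller k keeps the directions of largest variance, which are the eigenvectors of L+ with the largest eigenvalues (equivalently, those of L with the smallest nonzero eigenvalues, as in Laplacian-based spectral clustering).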

Bibliographic reference:
Saerens, Marco; Fouss, François; Yen, Luh; Dupont, Pierre. *The principal components analysis of a graph, and its relationships to spectral clustering.* Machine Learning: ECML 2004. 15th European Conference on Machine Learning. Proceedings (Pisa, Italy, 20-24 September 2004). In: Boulicaut, J.-F.; Esposito, F.; Giannotti, F.; Pedreschi, D. (eds.), *Machine Learning: ECML 2004. 15th European Conference on Machine Learning. Proceedings (Lecture Notes in Artificial Intelligence Vol. 3201)*, Springer-Verlag, 2004, pp. 371-383.

Permanent URL: http://hdl.handle.net/2078.1/67992