Abstract |
: |
[eng] Traditional approaches from statistics, pattern recognition, machine-learning, or data-mining usually assume an independent and identically-distributed (i.i.d.) sample of objects, which normally leads to a double-entry table containing features from the given sample of the population. However, in some research fields such as spatial statistics, relational learning or link analysis, it is intended to take into account - and analyze - the links between objects which could be of various types and involved in different kinds of relationships. The sample can be therefore seen as a network or a graph where nodes are objects and the links between them are the edges. Within this context, an interesting research topic consists on quantifying the proximity between the network actors from the underlying graph structure. The primary objective of this thesis is precisely the study and the development of dissimilarity measures on a graph. Three different dissimilarities are proposed here, all derived from random walk models on a graph. Roughly, a Markov chain state is associated to every node of the graph and a transition probability is defined for each edge, the measures being derived from this Markov chain. The advantage is that the defined dissimilarity measures take into account not only the length of indirect paths connecting two nodes, but also their number. The studied dissimilarities are assessed in several node clustering, classification, and network representation problems. More specifically, five clustering methods as well as a two-step procedure for the visualization of a subset of the graph nodes are proposed. |